
VB.NET PDF Text Converter Library
How to convert PDF to a text file using VB.NET in .NET Windows Forms, WPF, and console applications

VB.NET Guide and Sample Code to Convert PDF to Text in a .NET Project

In this tutorial, you will learn how to convert a PDF document to a text (.txt) file using VB.NET in .NET Windows Forms and ASP.NET MVC web applications.

  • PDF to Text file (.txt) conversion
  • Convert PDF text content to lines

How to convert PDF to Text file using Visual Basic .NET

  1. Download the XDoc.PDF text conversion library for VB.NET
  2. Install the VB.NET library to convert PDF documents to text files
  3. Step by Step Tutorial

RasterEdge .NET Imaging SDK includes several image processing library controls for editing images and documents in .NET applications. Among its DLL components is a PDF processing library that lets developers convert PDF documents to text files in Visual Basic .NET. With this VB.NET PDF text conversion API, you can convert a whole PDF file or a single page to text and save the result as a new .txt file.

Before you get started, make sure that you have installed Microsoft .NET Framework 2.0 or above and Microsoft Visual Studio 2005 or later. The example below shows how to extract text from a PDF document in a Visual Basic .NET imaging application. If you are a Visual C# .NET programmer, see the Visual C# tutorial for PDF to text conversion in .NET projects.

How to read and convert PDF to a text file using VB.NET

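Below are the steps and VB.NET demo source code for extracting the text of one PDF page programmatically. The sample prints the extracted characters, words, and lines, together with their boundaries, to the console.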

  1. Create a new PDFDocument object that loads an existing PDF file.
  2. Get a PDFTextMgr object for that document by calling PDFTextHandler.ExportPDFTextManager().
  3. Get a PDFPage object for the first page of the PDF document.
  4. Use the PDFTextMgr.ExtractTextCharacter() method to extract all characters from the PDF page.
  5. Use the PDFTextMgr.ExtractTextWord() method to extract all words from the PDF page.
  6. Use the PDFTextMgr.ExtractTextLine() method to extract all text lines from the PDF page.

Imports System
Imports System.Collections.Generic
Imports RasterEdge.XDoc.PDF ' XDoc.PDF namespace; adjust if your SDK version differs

Module PdfTextDemo
    Sub Main()
        ' open a document
        Dim inputFilePath As String = "C:\2.pdf"
        Dim doc As PDFDocument = New PDFDocument(inputFilePath)
        ' get text manager from the document
        Dim textMgr As PDFTextMgr = PDFTextHandler.ExportPDFTextManager(doc)

        ' extract different text content from the first page (index 0)
        Dim pageIndex As Integer = 0
        ' GetPage may return a base page type, so cast explicitly
        Dim page As PDFPage = DirectCast(doc.GetPage(pageIndex), PDFPage)

        ' get all characters in the page
        Dim allChars As List(Of PDFTextCharacter) = textMgr.ExtractTextCharacter(page)
        ' report each character with its boundary
        For Each obj As PDFTextCharacter In allChars
            Console.WriteLine("Char: {0}; Boundary: {1}", obj.GetChar(), obj.GetBoundary().ToString())
        Next

        ' get all words in the page
        Dim allWords As List(Of PDFTextWord) = textMgr.ExtractTextWord(page)
        ' report each word with its boundary
        For Each obj As PDFTextWord In allWords
            Console.WriteLine("Word: {0}; Boundary: {1}", obj.GetContent(), obj.GetBoundary().ToString())
        Next

        ' get all lines in the page
        Dim allLines As List(Of PDFTextLine) = textMgr.ExtractTextLine(page)
        ' report each text line with its boundary
        For Each obj As PDFTextLine In allLines
            Console.WriteLine("Line: {0}; Boundary: {1}", obj.GetContent(), obj.GetBoundary().ToString())
        Next
    End Sub
End Module
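The sample above only prints the extracted text to the console. To produce an actual .txt file, one option is to collect the text of each extracted line and write it with the standard System.IO classes. The sketch below is an illustration under stated assumptions, not part of the XDoc.PDF API: `SaveLinesAsText` is a hypothetical helper name, and the sample strings stand in for the values that `GetContent()` returns for each `PDFTextLine`.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module TextFileWriter
    ' Hypothetical helper (not part of XDoc.PDF): writes one string per line
    ' to a UTF-8 text file (no byte order mark) and returns the line count.
    Function SaveLinesAsText(lines As IEnumerable(Of String), outputPath As String) As Integer
        Dim count As Integer = 0
        ' False = overwrite any existing file instead of appending
        Using writer As New StreamWriter(outputPath, False, New UTF8Encoding(False))
            For Each line As String In lines
                writer.WriteLine(line)
                count += 1
            Next
        End Using
        Return count
    End Function

    Sub Main()
        ' Stand-in data; in the tutorial code these strings would come from
        ' obj.GetContent() for each PDFTextLine returned by ExtractTextLine(page).
        Dim sample As New List(Of String) From {"First line", "Second line"}
        Dim outputPath As String = Path.Combine(Path.GetTempPath(), "page1.txt")
        Dim written As Integer = SaveLinesAsText(sample, outputPath)
        Console.WriteLine("Wrote {0} lines to {1}", written, outputPath)
    End Sub
End Module
```

To connect it to the tutorial code, you could fill a `List(Of String)` inside the `PDFTextLine` loop with `lineTexts.Add(obj.GetContent())` (assuming `GetContent()` returns a `String`) and then pass that list to `SaveLinesAsText` together with your output path. Writing UTF-8 keeps non-ASCII characters from the PDF intact in the resulting file.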