PM > Install-Package XDoc.PDF


C# PDF Reader Library
How to open, display, view, and read Adobe PDF files, and extract their text and image contents, using the C# .NET PDF viewer control


How to read, extract, and explore PDF contents in C#.NET for ASP.NET, Web Forms, AJAX, WinForms, WPF, and Azure projects





Download and Install XDoc.PDF Reader C# library



  1. Download XDoc.PDF Reader C# library
  2. Install C# library to convert PDF pages to other file formats
  3. Step by Step Tutorial






Read PDF document



Read, extract text from PDF

The following C# source code shows how to extract text content from a PDF document line by line.

  • Load an existing PDF file into a PDFDocument object
  • Get a PDFTextMgr object from PDFTextHandler.ExportPDFTextManager()
  • Get the first page of the PDF document
  • Use PDFTextMgr.ExtractTextLine() to get all lines of text inside the first PDF page

//  open a document
String inputFilePath = Program.RootPath + "\\" + "2.pdf";
PDFDocument doc = new PDFDocument(inputFilePath);
//  get text manager from the document
PDFTextMgr textMgr = PDFTextHandler.ExportPDFTextManager(doc);

//  extract different text content from the first page
int pageIndex = 0;
PDFPage page = (PDFPage)doc.GetPage(pageIndex);

//  get all lines in the page
List<PDFTextLine> allLines = textMgr.ExtractTextLine(page);
//  report each line's content and boundary
foreach (PDFTextLine obj in allLines)
{
    Console.WriteLine("Line: " + obj.GetContent() + "; Boundary: " + obj.GetBoundary().ToString());
}
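
The sample above builds the input path by concatenating Program.RootPath, a backslash, and the file name. As a hedged alternative, the sketch below uses System.IO.Path.Combine from the .NET base class library, which adds the platform's directory separator when the root folder does not already end with one. BuildInputPath and rootPath are hypothetical names introduced for illustration; rootPath stands in for Program.RootPath, which belongs to the demo project and is not part of XDoc.PDF.

```csharp
using System;
using System.IO;

public static class PathDemo
{
    // Hypothetical helper: joins a root folder and a file name
    // without hard-coding the "\\" separator.
    public static string BuildInputPath(string rootPath, string fileName)
    {
        return Path.Combine(rootPath, fileName);
    }

    public static void Main()
    {
        // Placeholder root folder (assumption); stands in for Program.RootPath.
        string rootPath = Path.GetTempPath();
        string inputFilePath = BuildInputPath(rootPath, "2.pdf");

        Console.WriteLine("Input file: " + inputFilePath);
        // Checking for the file up front gives a clearer error than a failed load.
        Console.WriteLine("Exists:     " + File.Exists(inputFilePath));
    }
}
```

The resulting string can be passed to the PDFDocument constructor in the same way as the concatenated path in the sample above.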

Besides reading lines of text from a PDF page, PDFTextMgr also supports reading the characters, words, and paragraphs of PDF pages in C#. For details, see: How to read, extract text from PDF using C#?



Read images from PDF

You can extract all images inside a PDF document. The C# code below shows the steps.

//  Open a document.
PDFDocument doc = new PDFDocument(@"C:\demo.pdf");

//  Extract all images in the document.
List<PDFImage> allImages = PDFImageHandler.ExtractImages(doc);


Read font resources from PDF

Using the XDoc.PDF C# library, you can get font information from a PDF document, such as the number of embedded fonts and the name of each font.

//  Open file
PDFDocument doc = new PDFDocument(@"C:\demo.pdf");
//  Retrieve all embedded font information
List<PDFEmbedFontInfo> fontInfos = PDFTextHandler.GetEmbeddedFontInfo(doc);
//  Number of embedded fonts in the document.
Console.WriteLine("Embedded Font Count: {0}", fontInfos.Count);
foreach (PDFEmbedFontInfo fontInfo in fontInfos)
{
    //  Font name of the embedded font.
    Console.WriteLine("Font Name:      {0}", fontInfo.FontName);
    //  Object number in the PDF file.
    Console.WriteLine("Object Number:  {0}", fontInfo.ObjectNumber);
}


Read document metadata from PDF

The C# example source code here shows how to read PDF file metadata, such as the author name and the creation and last-modified dates.

//  Retrieve PDF document metadata.
String inputFilePath = @"C:\demo.pdf";
PDFDocument doc = new PDFDocument(inputFilePath);
PDFMetadata metadata = doc.GetDescription();
Console.WriteLine("Title:         " + metadata.Title);
Console.WriteLine("Author:        " + metadata.Author);
Console.WriteLine("Subject:       " + metadata.Subject);
Console.WriteLine("Keywords:      " + metadata.Keywords);
Console.WriteLine("Creator:       " + metadata.Creator);
Console.WriteLine("Producer:      " + metadata.Producer);
Console.WriteLine("Create Date:   " + metadata.CreatedDate.ToString());
Console.WriteLine("Modified Date: " + metadata.ModifiedDate.ToString());

Using the XDoc.PDF C# library, you can read, update, and delete PDF metadata and XMP data in a PDF file from a C# application. For details, see: How to read, edit PDF metadata, XMP in C#?



Read AcroForm data from PDF

One of the most common operations in a PDF library is reading user-filled AcroForm data from PDFs in C#.

String inputFilePath = Program.RootPath + "\\" + "1_AF_Filled.pdf";

List<BaseFormField> fields = PDFFormHandler.GetFormFields(inputFilePath);
Console.WriteLine("Number of Fields: " + fields.Count);
if (fields.Count > 0)
{
    foreach (BaseFormField field in fields)
    {
        Console.WriteLine("Field");
        Console.WriteLine("  Name:      " + field.Name);

        if (field is AFCheckBox)
        {
            Console.WriteLine("  Type:      " + "CheckBox");
            Console.WriteLine("  IsChecked: " + ((AFCheckBox)field).IsChecked);
        }
        else if (field is AFRadioButton)
        {
            Console.WriteLine("  Type:      " + "RadioButton");
            Console.WriteLine("  IsChecked: " + ((AFRadioButton)field).IsChecked);
        }
        else if (field is AFTextBox)
        {
            Console.WriteLine("  Type:      " + "TextBox");
            Console.WriteLine("  Content:   " + ((AFTextBox)field).Text);
        }
        else if (field is AFListBox)
        {
            Console.WriteLine("  Type:                " + "ListBox");
            //  Prints the first selected index; indexing [0] fails if SelectedIndexes is empty.
            Console.WriteLine("  Selected Item Index: " + ((AFListBox)field).SelectedIndexes[0]);
        }
        else if (field is AFComboBox)
        {
            Console.WriteLine("  Type:                " + "ComboBox");
            Console.WriteLine("  Selected Item Index: " + ((AFComboBox)field).SelectedIndex);
        }
    }
}
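
The list box branch above reads SelectedIndexes[0], which assumes at least one item is selected. The sketch below shows one way to guard that access. It is not part of XDoc.PDF: FirstSelectedIndexOrMinusOne is a hypothetical helper, and because the source does not show the type of SelectedIndexes, the helper accepts IReadOnlyList<int>, which both int[] and List<int> implement. Returning -1 for "no selection" is also an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SelectionDemo
{
    // Hypothetical helper: returns the first selected index,
    // or -1 when the collection is null or empty.
    public static int FirstSelectedIndexOrMinusOne(IReadOnlyList<int> selectedIndexes)
    {
        if (selectedIndexes == null || selectedIndexes.Count == 0)
        {
            return -1;
        }
        return selectedIndexes[0];
    }

    public static void Main()
    {
        // Stand-in data; in the sample above the values would come from a form field.
        Console.WriteLine(FirstSelectedIndexOrMinusOne(new int[] { 2, 5 }));  // prints 2
        Console.WriteLine(FirstSelectedIndexOrMinusOne(new List<int>()));     // prints -1
        Console.WriteLine(FirstSelectedIndexOrMinusOne(null));                // prints -1
    }
}
```

If SelectedIndexes turns out to be a collection type that does not implement IReadOnlyList<int>, the same null-and-empty check can be written inline before the indexer is used.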


Read annotation and markup from PDF

You can read all annotation and markup data from a PDF document using C# in ASP.NET and Windows applications.

String inputFilePath = Program.RootPath + "\\" + "Annot_9.pdf";

PDFDocument doc = new PDFDocument(inputFilePath);
List<IPDFAnnot> annots = PDFAnnotHandler.GetAllAnnotations(doc);
foreach (IPDFAnnot annot in annots)
{
    if (annot is PDFAnnotTextBox)
    {
        PDFAnnotTextBox obj = (PDFAnnotTextBox)annot;
        Console.WriteLine("Textbox Boundary: " + obj.Boundary.ToString());
        Console.WriteLine("Textbox Border Color: " + obj.LineColor.ToString());
        Console.WriteLine("Textbox Border Width: " + obj.LineWidth);
        Console.WriteLine("Textbox Fill Color: " + obj.FillColor.ToString());
    }
}