C# PDF Reader Library
How to open, display, view, and read Adobe PDF files and extract their text and image contents using the C# .NET PDF viewer control
How to read, extract, and explore PDF contents in C#.NET for ASP.NET, WebForms, AJAX, WinForms, WPF, and Azure projects
Download and Install XDoc.PDF Reader C# library
Read PDF document
Read, extract text from PDF
The following C# source code shows how to extract text content from a PDF document line by line.
- Load an existing PDF file into a PDFDocument object
- Get a PDFTextMgr object from PDFTextHandler.ExportPDFTextManager()
- Get the first page of the PDF document
- Use PDFTextMgr.ExtractTextLine() to get all lines of text in the first PDF page
// open a document
String inputFilePath = Program.RootPath + "\\" + "2.pdf";
PDFDocument doc = new PDFDocument(inputFilePath);
// get text manager from the document
PDFTextMgr textMgr = PDFTextHandler.ExportPDFTextManager(doc);
// get the first page of the document (page indexes start at 0)
int pageIndex = 0;
PDFPage page = (PDFPage)doc.GetPage(pageIndex);
// get all lines in the page
List<PDFTextLine> allLines = textMgr.ExtractTextLine(page);
// report the content and boundary of each line
foreach (PDFTextLine obj in allLines)
{
    Console.WriteLine("Line: " + obj.GetContent() + "; Boundary: " + obj.GetBoundary().ToString());
}
Besides reading lines of text from a PDF page, PDFTextMgr also supports reading characters, words, and paragraphs from PDF pages in C#.
View details here: How to read, extract text from PDF using C#?
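The snippet above prints each line to the console. As a minimal sketch of a common follow-up, the code below collects the line contents of the first page into one string and saves it to a text file. It uses only the XDoc.PDF calls already shown in this section. The `using RasterEdge.XDoc.PDF;` directive and both file paths are assumptions and should be adjusted to match your installed assemblies and files.

```csharp
using System;
using System.IO;
using System.Text;
// Assumed namespace for the XDoc.PDF types; adjust to match your referenced assemblies.
using RasterEdge.XDoc.PDF;

class ExtractFirstPageText
{
    static void Main()
    {
        // Hypothetical input and output paths.
        string inputFilePath = @"C:\demo.pdf";
        string outputFilePath = @"C:\demo_page1.txt";

        // Same calls as in the example above.
        PDFDocument doc = new PDFDocument(inputFilePath);
        PDFTextMgr textMgr = PDFTextHandler.ExportPDFTextManager(doc);
        PDFPage page = (PDFPage)doc.GetPage(0);

        // Append the content of every extracted line, one per output line.
        StringBuilder text = new StringBuilder();
        foreach (PDFTextLine line in textMgr.ExtractTextLine(page))
        {
            text.AppendLine(line.GetContent());
        }

        File.WriteAllText(outputFilePath, text.ToString());
        Console.WriteLine("Wrote " + text.Length + " characters to " + outputFilePath);
    }
}
```

Building the text with a StringBuilder avoids repeated string concatenation when a page contains many lines.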
Read images from PDF
You can easily extract all images inside a PDF document. The C# code below shows the steps.
// Open a document.
PDFDocument doc = new PDFDocument(@"C:\demo.pdf");
// Extract all images in the document.
List<PDFImage> allImages = PDFImageHandler.ExtractImages(doc);
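The extraction call returns a standard .NET list, so a quick sanity check is to report how many images were found. The following is a small sketch built only from the calls shown above plus `List<T>.Count`; the `using RasterEdge.XDoc.PDF;` directive is an assumption about where the library's types live.

```csharp
using System;
using System.Collections.Generic;
// Assumed namespace for the XDoc.PDF types; adjust to match your referenced assemblies.
using RasterEdge.XDoc.PDF;

class CountPdfImages
{
    static void Main()
    {
        // Open a document and extract all of its images, as in the example above.
        PDFDocument doc = new PDFDocument(@"C:\demo.pdf");
        List<PDFImage> allImages = PDFImageHandler.ExtractImages(doc);

        // Report how many images the document contains.
        Console.WriteLine("Image count: " + allImages.Count);
    }
}
```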
Read font resources from PDF
Using the XDoc.PDF C# library, you can easily get font information from a PDF document, such as the number of embedded fonts and the name of each font.
// Open file
PDFDocument doc = new PDFDocument(@"C:\demo.pdf");
// Retrieve all embedded font information
List<PDFEmbedFontInfo> fontInfos = PDFTextHandler.GetEmbeddedFontInfo(doc);
// Number of embedded fonts in the document.
Console.WriteLine("Embedded Font Count: {0}", fontInfos.Count);
foreach (PDFEmbedFontInfo fontInfo in fontInfos)
{
    // Font name of the embedded font.
    Console.WriteLine("Font Name: {0}", fontInfo.FontName);
    // Object number in the PDF file.
    Console.WriteLine("Object Number: {0}", fontInfo.ObjectNumber);
}
Read document metadata from PDF
The C# example source code here shows how to read PDF file metadata, such as the author name and the creation and last-modified dates.
// Retrieve PDF document metadata.
String inputFilePath = @"C:\demo.pdf";
PDFDocument doc = new PDFDocument(inputFilePath);
PDFMetadata metadata = doc.GetDescription();
Console.WriteLine("Title: " + metadata.Title);
Console.WriteLine("Author: " + metadata.Author);
Console.WriteLine("Subject: " + metadata.Subject);
Console.WriteLine("Keywords: " + metadata.Keywords);
Console.WriteLine("Creator: " + metadata.Creator);
Console.WriteLine("Producer: " + metadata.Producer);
Console.WriteLine("Create Date: " + metadata.CreatedDate.ToString());
Console.WriteLine("Modified Date: " + metadata.ModifiedDate.ToString());
Using the XDoc.PDF C# library, you can easily read, update, and delete PDF metadata and XMP data in a C# application.
View details here: How to read, edit PDF metadata, XMP in C#?
Read AcroForm data from PDF
One of the most popular operations in a PDF library is reading user-filled AcroForm data from PDFs in C#.
String inputFilePath = Program.RootPath + "\\" + "1_AF_Filled.pdf";
List<BaseFormField> fields = PDFFormHandler.GetFormFields(inputFilePath);
Console.WriteLine("Number of Fields: " + fields.Count);
if (fields.Count > 0)
{
    foreach (BaseFormField field in fields)
    {
        Console.WriteLine("Field");
        Console.WriteLine(" Name: " + field.Name);
        if (field is AFCheckBox)
        {
            Console.WriteLine(" Type: " + "CheckBox");
            Console.WriteLine(" IsChecked: " + ((AFCheckBox)field).IsChecked);
        }
        else if (field is AFRadioButton)
        {
            Console.WriteLine(" Type: " + "RadioButton");
            Console.WriteLine(" IsChecked: " + ((AFRadioButton)field).IsChecked);
        }
        else if (field is AFTextBox)
        {
            Console.WriteLine(" Type: " + "TextBox");
            Console.WriteLine(" Content: " + ((AFTextBox)field).Text);
        }
        else if (field is AFListBox)
        {
            Console.WriteLine(" Type: " + "ListBox");
            Console.WriteLine(" Selected Item Index: " + ((AFListBox)field).SelectedIndexes[0]);
        }
        else if (field is AFComboBox)
        {
            Console.WriteLine(" Type: " + "ComboBox");
            Console.WriteLine(" Selected Item Index: " + ((AFComboBox)field).SelectedIndex);
        }
    }
}
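Before inspecting individual fields, it can help to see how many fields of each kind a form contains. The sketch below is one possible way to do that: it groups the fields returned by `PDFFormHandler.GetFormFields` by their runtime type name, using only that call and standard .NET members. The `using RasterEdge.XDoc.PDF;` directive and the path `C:\form.pdf` are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
// Assumed namespace for the XDoc.PDF types; adjust to match your referenced assemblies.
using RasterEdge.XDoc.PDF;

class SummarizeFormFields
{
    static void Main()
    {
        // Hypothetical path to a filled-in AcroForm PDF.
        string inputFilePath = @"C:\form.pdf";
        List<BaseFormField> fields = PDFFormHandler.GetFormFields(inputFilePath);

        // Count fields per runtime type name (for example AFTextBox or AFCheckBox).
        Dictionary<string, int> countsByType = new Dictionary<string, int>();
        foreach (BaseFormField field in fields)
        {
            string typeName = field.GetType().Name;
            int count;
            // TryGetValue leaves count at 0 when the type has not been seen yet.
            countsByType.TryGetValue(typeName, out count);
            countsByType[typeName] = count + 1;
        }

        foreach (KeyValuePair<string, int> entry in countsByType)
        {
            Console.WriteLine(entry.Key + ": " + entry.Value);
        }
    }
}
```

Grouping by `GetType().Name` also reports any field types that the if/else chain above does not handle explicitly.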
Read annotation and markup from PDF
You can easily read all annotations and markup data from a PDF document using C# in ASP.NET and Windows applications.
String inputFilePath = Program.RootPath + "\\" + "Annot_9.pdf";
PDFDocument doc = new PDFDocument(inputFilePath);
List<IPDFAnnot> annots = PDFAnnotHandler.GetAllAnnotations(doc);
foreach (IPDFAnnot annot in annots)
{
    if (annot is PDFAnnotTextBox)
    {
        PDFAnnotTextBox obj = (PDFAnnotTextBox)annot;
        Console.WriteLine("Textbox Boundary: " + obj.Boundary.ToString());
        Console.WriteLine("Textbox Border Color: " + obj.LineColor.ToString());
        Console.WriteLine("Textbox Border Width: " + obj.LineWidth);
        Console.WriteLine("Textbox Fill Color: " + obj.FillColor.ToString());
    }
}