
C#.NET PDF - Extract Text from Scanned PDF Using OCR SDK for C#.NET


How to Extract Text from Adobe PDF Document Using .NET OCR Library in Visual C#




Overview



Best OCR SDK for Visual Studio .NET


Scan text content from Adobe PDF documents in .NET WinForms


Specify any area of PDF to perform OCR


.NET library for batch OCR of PDF text content


.NET DLLs can be easily integrated into ASP.NET projects


Support .NET WinForms, ASP.NET MVC in IIS, ASP.NET Ajax, Azure cloud service, DNN (DotNetNuke), SharePoint


Recognize the whole PDF document and get all text content


Recognize a page of PDF document and extract its text content


Recognize scanned PDF file and output OCR result to Adobe PDF file


Recognize scanned PDF document and output OCR result to MS Word file


Online C# class source code for OCR text extraction in .NET


Free components and controls to download and use in the .NET Framework




Extract Text from Whole PDF Document in C#.NET



Add the necessary references to your C#.NET project. Right-click the project and select "Add Reference..." to locate and add the following DLLs as project references:


  RasterEdge.Imaging.Basic.dll


  RasterEdge.Imaging.Basic.Codec.dll


  RasterEdge.Imaging.Drawing.dll


  RasterEdge.Imaging.Font.dll


  RasterEdge.Imaging.Processing.dll


  RasterEdge.XImage.Raster.dll


  RasterEdge.XImage.Raster.Core.dll


  RasterEdge.XDoc.PDF.dll


  RasterEdge.XImage.AdvancedCleanup.Core.dll


  RasterEdge.XImage.OCR.dll


  RasterEdge.XImage.OCR.Tesseract.dll
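
If a reference is missed, the failure usually shows up only at run time. As a minimal sketch (not part of the SDK), the following helper checks whether the DLLs listed above are present in the application's output folder. The class name ReferenceCheck and the method FindMissing are hypothetical names chosen for illustration; only standard .NET file APIs are used.

```csharp
using System;
using System.IO;
using System.Linq;

static class ReferenceCheck
{
    // DLL names copied from the reference list above.
    static readonly string[] RequiredDlls =
    {
        "RasterEdge.Imaging.Basic.dll",
        "RasterEdge.Imaging.Basic.Codec.dll",
        "RasterEdge.Imaging.Drawing.dll",
        "RasterEdge.Imaging.Font.dll",
        "RasterEdge.Imaging.Processing.dll",
        "RasterEdge.XImage.Raster.dll",
        "RasterEdge.XImage.Raster.Core.dll",
        "RasterEdge.XDoc.PDF.dll",
        "RasterEdge.XImage.AdvancedCleanup.Core.dll",
        "RasterEdge.XImage.OCR.dll",
        "RasterEdge.XImage.OCR.Tesseract.dll"
    };

    // Returns the required DLLs that are not found in the given folder.
    public static string[] FindMissing(string folder)
    {
        return RequiredDlls
            .Where(name => !File.Exists(Path.Combine(folder, name)))
            .ToArray();
    }

    static void Main()
    {
        // The folder the application was started from (typically bin\Debug or bin\Release).
        string binFolder = AppDomain.CurrentDomain.BaseDirectory;

        foreach (string name in FindMissing(binFolder))
        {
            Console.WriteLine("Missing: " + name);
        }
    }
}
```

Running this once after adding the references gives a quick list of any DLL that was not copied to the output folder.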


Use the corresponding namespaces:


  RasterEdge.Imaging.Basic;


  RasterEdge.XDoc.PDF;


  RasterEdge.XImage.OCR;


Note: If you get the error "Could not load file or assembly 'RasterEdge.Imaging.Basic' or one of its dependencies. An attempt was made to load a program with an incorrect format" (or the same error for any other assembly), check your project configuration as follows:

       If you are using the x64 libraries/DLLs, right-click the project -> Properties -> Build -> Platform target: x64.

       If you are using the x86 libraries/DLLs, the platform target should be x86.
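
To confirm which platform target a build is actually running under, a small diagnostic can help. This is a sketch that uses only the standard Environment.Is64BitProcess and Environment.Is64BitOperatingSystem properties (available since .NET Framework 4.0); the class name PlatformCheck and the method ExpectedTarget are hypothetical names for illustration.

```csharp
using System;

static class PlatformCheck
{
    // Maps the process bitness to the matching "Platform target" value.
    public static string ExpectedTarget(bool is64BitProcess)
    {
        return is64BitProcess ? "x64" : "x86";
    }

    static void Main()
    {
        // Bitness of the running process; the referenced DLLs must match it.
        Console.WriteLine("Process is running as: " + ExpectedTarget(Environment.Is64BitProcess));

        // Bitness of the operating system, shown for comparison.
        Console.WriteLine("Operating system is 64-bit: " + Environment.Is64BitOperatingSystem);
    }
}
```

If the reported process bitness does not match the set of DLLs you referenced, change the platform target as described in the note above and rebuild.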


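Add the following C# PDF OCR demo code to your project. The sample recognizes the first page of the document (page index 0) and saves the result as a text file; repeat the same steps for each page to process the whole document.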




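// Folder that contains the OCR training data; adjust to your installation.
String ocrSource = @"D:\Alice\DLL\Source\";
OCRHandler.SetTrainResourcePath(ocrSource);

// Load the scanned PDF and render its first page (index 0) to a bitmap.
PDFDocument pdf = new PDFDocument(@"C:\input.pdf");
BasePage page = pdf.GetPage(0);
Bitmap bmp = page.ConvertToImage();

// Run OCR on the rendered page and save the recognized text to a file.
OCRPage ocrPage = OCRHandler.Import(bmp);
ocrPage.Recognize();
ocrPage.SaveTo(MIMEType.TXT, @"C:\output.txt");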