C# OCR Library
Demos for Image Content Extraction Using OCR
Free OCR Demos and Sample Code for Extracting Content from Various Images
In addition to extracting document page content in a C# OCR project,
you can also extract text from various image files (scanned PDF, Jpeg, Png, Bmp, Gif, and Tiff) and output it to text and PDF files.
This page offers free demos and sample code for these OCR functions. See the sections below for details.
Note: to extract content from common raster images such as Jpeg, Png, Bmp, and Gif, only four required assemblies need to be referenced in your C# application. If the target file is a Tiff or PDF document, the corresponding DLL libraries must be referenced as well.
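The dependency rule above can be made explicit in code. The following is a minimal sketch, assuming the input format can be inferred from the file extension; `OcrInputKind` and `OcrInputClassifier` are hypothetical names for illustration and are not part of the OCR SDK.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the OCR SDK): groups input files by the
// dependency rule described above.
public enum OcrInputKind { CommonRaster, Tiff, Pdf, Unsupported }

public static class OcrInputClassifier
{
    // Classifies a file by its extension only; the file content is not inspected.
    // Assumes path is not null.
    public static OcrInputKind Classify(string path)
    {
        string ext = Path.GetExtension(path).ToLowerInvariant();
        switch (ext)
        {
            case ".jpg":
            case ".jpeg":
            case ".png":
            case ".bmp":
            case ".gif":
                // Only the core assemblies are needed.
                return OcrInputKind.CommonRaster;
            case ".tif":
            case ".tiff":
                // The Tiff library must be referenced as well.
                return OcrInputKind.Tiff;
            case ".pdf":
                // The PDF library must be referenced as well.
                return OcrInputKind.Pdf;
            default:
                return OcrInputKind.Unsupported;
        }
    }
}
```

A caller could use the result to report a missing reference early, before any OCR call is made, instead of failing at load time.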
C# Sample Code for Scanned PDF Text Extraction
Copy the C# OCR sample code below to extract text from a scanned PDF document and save it to pdf.txt.
// Requires System.IO (StreamWriter) and System.Drawing (Bitmap) in addition to the OCR and PDF libraries.
// Set the training data path. Please put eng.traineddata (for English) under the path specified.
OCRHandler.SetTrainResourcePath(resourcePath);
// Set the supported language. You can also set this attribute in OCRPage or OCRZone.
OCRHandler.Settings.LanguagesEnabled.Add(RasterEdge.Imaging.OCR.Language.Eng);
// Load the PDF document.
PDFDocument pdf = new PDFDocument(@"C:\sample.pdf");
int pageCount = pdf.GetPageCount();
// The using statement closes the writer even if recognition fails; errors are no longer silently swallowed.
using (StreamWriter writer = new StreamWriter(@"C:\pdf.txt"))
{
    for (int i = 0; i < pageCount; i++)
    {
        // Load the page to recognize.
        PDFPage page = (PDFPage)pdf.GetPage(i);
        // The default resolution is 96. A larger value (192, 288, ...) helps recognition, but it should not be too large.
        Bitmap bmp = page.ConvertToImage(192);
        // Import the rendered page and recognize it.
        OCRPage oPage = OCRHandler.Import(bmp);
        oPage.Recognize();
        writer.WriteLine(oPage.GetText());
    }
}
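To see why the rendering resolution in the PDF sample should not be set too high, it may help to estimate how large the rendered bitmap becomes. The sketch below is an illustration only: it assumes page sizes given in PDF points (72 points per inch) and 4 bytes per pixel, which may not match the format the SDK actually uses. `RenderSizeEstimator` is a hypothetical helper, not an SDK class.

```csharp
using System;

// Hypothetical helper (not part of the OCR SDK): estimates the size of a
// page rendered at a given resolution.
public static class RenderSizeEstimator
{
    // PDF page sizes are expressed in points; 72 points make one inch.
    private const double PointsPerInch = 72.0;

    // Converts a length in points to pixels at the given resolution, rounding up.
    public static int PixelsFor(double points, int dpi)
    {
        return (int)Math.Ceiling(points * dpi / PointsPerInch);
    }

    // Assumption: 4 bytes per pixel (32-bit color); the SDK's real format may differ.
    public static long EstimatedBytes(double widthPoints, double heightPoints, int dpi)
    {
        return (long)PixelsFor(widthPoints, dpi) * PixelsFor(heightPoints, dpi) * 4L;
    }
}
```

For an A4 page of roughly 595 x 842 points, `EstimatedBytes(595, 842, 96)` gives about 3.6 MB, while `EstimatedBytes(595, 842, 288)` gives about 32 MB. Tripling the resolution multiplies the memory per page by about nine, which adds up quickly in a loop over many pages.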
C# Sample Code for Jpeg Image Text Extraction
Use the following OCR sample code to extract text from a Jpeg image and save it to jpeg.txt in a C# program.
// Set the training data path. Please put eng.traineddata (for English) under the directory you specified.
OCRHandler.SetTrainResourcePath(resourcePath);
REImage img = new REImage(@"C:\sample.jpeg");
// Recognize characters from this image. Default language is English.
OCRPage page = OCRHandler.Import(img);
page.Recognize();
page.SaveTo(MIMEType.TXT, @"C:\jpeg.txt");
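The samples on this page all depend on eng.traineddata being present under the training data path. A quick check before calling `OCRHandler.SetTrainResourcePath` can turn a confusing recognition failure into a clear error. This is a minimal sketch, assuming the file sits directly inside the directory passed as `resourcePath`; `TrainDataCheck` is a hypothetical helper, not an SDK class.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the OCR SDK): verifies that the language
// data file is where the samples expect it.
public static class TrainDataCheck
{
    // Returns true when <resourcePath>/<languageCode>.traineddata exists.
    // Assumption: the file is placed directly in resourcePath, not in a subfolder.
    public static bool HasTrainedData(string resourcePath, string languageCode)
    {
        if (string.IsNullOrEmpty(resourcePath) || string.IsNullOrEmpty(languageCode))
        {
            return false;
        }
        string file = Path.Combine(resourcePath, languageCode + ".traineddata");
        return File.Exists(file);
    }
}
```

In an application, this could be called as `TrainDataCheck.HasTrainedData(resourcePath, "eng")`, throwing a `FileNotFoundException` with the expected path when it returns false.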
C# Sample Code for Png Image Text Extraction
This C# OCR demo code illustrates how to extract text from a Png image and save it to png.pdf.
// Set the training data path. Please put eng.traineddata (for English) under the directory you specified.
OCRHandler.SetTrainResourcePath(resourcePath);
REImage img = new REImage(@"C:\sample.png");
// Recognize characters from this image. Default language is English.
OCRPage page = OCRHandler.Import(img);
page.Recognize();
page.SaveTo(MIMEType.PDF, @"C:\png.pdf");