
C#: Demos for Image Content Extraction Using OCR


Free OCR Demos and Sample Code for Extracting Content from Various Images




In addition to extracting document page content in a C# OCR project, you can extract content (text) from various image files (scanned PDF, Jpeg, Png, Bmp, Gif, and Tiff) and output it to text and PDF files. This page offers free demos and sample code showing how to perform these OCR functions. See the following parts for details.




Note: to extract content from common raster images such as Jpeg, Png, Bmp, and Gif, only four essential assemblies need to be referenced in your C# application. If the target file is a Tiff or PDF document, the corresponding DLL libraries must be referenced as well.




Page Content



Demo For Scanned PDF Content (Text) Extraction
Demo For Jpeg Content (Text) Extraction
Demo For Png Content (Text) Extraction
Demo For Bmp Content (Text) Extraction
Demo For Gif Content (Text) Extraction
Demo For Tiff Content (Text) Extraction



C# Project DLLs: Extract Image Content



To run the following image text extraction sample code successfully, set up your project as follows:


Add References

  RasterEdge.XImage.OCR.dll
  RasterEdge.XImage.OCR.Tesseract.dll
  RasterEdge.Imaging.Basic.dll
  RasterEdge.Imaging.Basic.Codec.dll
  RasterEdge.Imaging.Drawing.dll
  RasterEdge.Imaging.Font.dll
  RasterEdge.Imaging.Processing.dll
  RasterEdge.XImage.AdvancedCleanup.Core.dll
  RasterEdge.XImage.Raster.Core.dll
  RasterEdge.XImage.Raster.dll
  RasterEdge.XDoc.PDF.dll


Using Namespaces

  using RasterEdge.XDoc.PDF;
  using RasterEdge.XImage.OCR;
  using RasterEdge.Imaging.Basic;


Note: If you get the error "Could not load file or assembly 'RasterEdge.Imaging.Basic' or one of its dependencies. An attempt was made to load a program with an incorrect format." (for this or any other assembly), check your configuration as follows:

       If you are using x64 libraries/DLLs, right-click the project -> Properties -> Build -> Platform target: x64.

       If you are using x86 libraries/DLLs, the platform target should be x86.
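
To confirm which bitness your application actually runs with, a quick check like the one below may help. It is a minimal sketch that uses only standard .NET APIs; the class name PlatformCheck and the method Describe are illustrative and not part of the RasterEdge SDK.

```csharp
using System;

public static class PlatformCheck
{
    // Maps a bitness flag to a human-readable label.
    public static string Describe(bool is64Bit)
    {
        return is64Bit ? "64-bit" : "32-bit";
    }

    public static void Main()
    {
        // The process bitness must match the bitness of the DLLs being loaded.
        Console.WriteLine("Process: " + Describe(Environment.Is64BitProcess));
        Console.WriteLine("Operating system: " + Describe(Environment.Is64BitOperatingSystem));
        // IntPtr.Size is 8 in a 64-bit process and 4 in a 32-bit process.
        Console.WriteLine("Pointer size: " + IntPtr.Size + " bytes");
    }
}
```

If the process reports 32-bit while the referenced libraries are x64 (or the reverse), the platform target setting described above is the likely place to look.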




C# Sample Code for Scanned PDF Text Extraction



Copy the C# OCR sample code below to extract text from a scanned PDF document and save it to pdf.txt.




// Requires: using System.Drawing; using System.IO;
// plus the RasterEdge namespaces listed under "Using Namespaces".

// Set the training data path. Put eng.traineddata (for English) under the path specified.
OCRHandler.SetTrainResourcePath(resourcePath);

// Set supported language. You can also set this attribute in OCRPage or OCRZone.
OCRHandler.Settings.LanguagesEnabled.Add(RasterEdge.Imaging.OCR.Language.Eng);

// Load the PDF document and open the output text file.
PDFDocument pdf = new PDFDocument(@"C:\sample.pdf");
int pageCount = pdf.GetPageCount();

// The using block closes the writer even if recognition throws.
using (StreamWriter writer = new StreamWriter(@"C:\pdf.txt"))
{
        for (int i = 0; i < pageCount; i++)
        {
                // Load the page to recognize.
                PDFPage page = (PDFPage)pdf.GetPage(i);

                // The default resolution is 96. A larger value (192, 288, ...) helps recognition, but it should not be too large.
                Bitmap bmp = page.ConvertToImage(192);

                // Import the page image and recognize it.
                OCRPage oPage = OCRHandler.Import(bmp);
                oPage.Recognize();
                writer.WriteLine(oPage.GetText());
        }
}
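
The resolution comment in the sample above says that larger values help recognition but should not be too large. The sketch below gives a rough sense of why: it estimates the pixel dimensions and the uncompressed memory footprint of a US Letter page at 96, 192, and 288 dpi. It assumes the rendered bitmap scales linearly with the resolution and uses 32 bits per pixel; both are assumptions for illustration, not documented behavior of ConvertToImage. The class RenderSize and its methods are hypothetical helpers.

```csharp
using System;

public static class RenderSize
{
    // PDF page sizes are expressed in points; by default one point is 1/72 inch.
    private const double PointsPerInch = 72.0;

    // Converts a length in points to pixels at the given resolution.
    public static int ToPixels(double points, int dpi)
    {
        return (int)Math.Round(points / PointsPerInch * dpi);
    }

    // Approximate size of an uncompressed bitmap at 4 bytes per pixel (assumed).
    public static long BitmapBytes(double widthPt, double heightPt, int dpi)
    {
        return (long)ToPixels(widthPt, dpi) * ToPixels(heightPt, dpi) * 4;
    }

    public static void Main()
    {
        // US Letter: 612 x 792 points (8.5 x 11 inches).
        foreach (int dpi in new[] { 96, 192, 288 })
        {
            Console.WriteLine("{0} dpi: {1} x {2} px, about {3:F1} MiB",
                dpi, ToPixels(612, dpi), ToPixels(792, dpi),
                BitmapBytes(612, 792, dpi) / (1024.0 * 1024.0));
        }
    }
}
```

Under these assumptions, doubling the resolution quadruples the pixel count, so memory use and recognition time can grow quickly for multi-page documents.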





C# Sample Code for Jpeg Image Text Extraction



Use the following OCR sample code to extract text from a Jpeg image and save it to jpeg.txt in a C# program.




// Set the training data path. Please put eng.traineddata (for English) under the directory you specified.
OCRHandler.SetTrainResourcePath(resourcePath);
REImage img = new REImage(@"C:\sample.jpeg");

// Recognize characters from this image. Default language is English.
OCRPage page = OCRHandler.Import(img);
page.Recognize();
page.SaveTo(MIMEType.TXT, @"C:\jpeg.txt");
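
Each sample on this page depends on eng.traineddata being present under the directory passed to SetTrainResourcePath. A small pre-check like the following can make a missing file easier to diagnose. It is a sketch using only System.IO; the class TrainDataCheck and the method HasTrainedData are hypothetical, and it assumes the file sits directly in the given folder, as the sample comments suggest.

```csharp
using System;
using System.IO;

public static class TrainDataCheck
{
    // Returns true when <language>.traineddata exists directly under resourcePath.
    public static bool HasTrainedData(string resourcePath, string language)
    {
        string candidate = Path.Combine(resourcePath, language + ".traineddata");
        return File.Exists(candidate);
    }

    public static void Main(string[] args)
    {
        // Pass the training data folder as the first argument;
        // the current directory is only a placeholder default.
        string resourcePath = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();

        if (HasTrainedData(resourcePath, "eng"))
        {
            Console.WriteLine("eng.traineddata found under " + resourcePath);
        }
        else
        {
            Console.WriteLine("eng.traineddata is missing under " + resourcePath);
        }
    }
}
```

Running the check before the OCR calls separates a setup problem (missing training data) from a recognition problem.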





C# Sample Code for Png Image Text Extraction



This C# OCR demo code illustrates how to extract text from a Png image and save it to png.pdf.




// Set the training data path. Please put eng.traineddata (for English) under the directory you specified.
OCRHandler.SetTrainResourcePath(resourcePath);
REImage img = new REImage(@"C:\sample.png");

// Recognize characters from this image. Default language is English.
OCRPage page = OCRHandler.Import(img);
page.Recognize();
page.SaveTo(MIMEType.PDF, @"C:\png.pdf");


     



C# Sample Code for Bmp Image Text Extraction



The following OCR demo code will help you extract text from a Bmp image and save it to bmp.txt in a C# project.




// Set the training data path. Please put eng.traineddata (for English) under the directory you specified.
OCRHandler.SetTrainResourcePath(resourcePath);
REImage img = new REImage(@"C:\sample.bmp");

// Recognize characters from this image. Default language is English.
OCRPage page = OCRHandler.Import(img);
page.Recognize();
page.SaveTo(MIMEType.TXT, @"C:\bmp.txt");


     



C# Sample Code for Gif Image Text Extraction



This is a simple example of how to use XImage.OCR for .NET in C# to extract text from a Gif image and save it to gif.txt.




// Set the training data path. Please put eng.traineddata (for English) under the directory you specified.
OCRHandler.SetTrainResourcePath(resourcePath);
REImage img = new REImage(@"C:\sample.gif");

// Recognize characters from this image. Default language is English.
OCRPage page = OCRHandler.Import(img);
page.Recognize();
page.SaveTo(MIMEType.TXT, @"C:\gif.txt");
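
The Jpeg, Png, Bmp, and Gif samples follow the same steps, so a batch job mostly needs to decide which files to process and where to write each result. The sketch below shows only that planning step, using standard .NET APIs. The class OcrBatchPlan, its methods, and the file names are hypothetical; Tiff is left out because, as noted earlier on this page, it needs additional DLLs.

```csharp
using System;
using System.IO;

public static class OcrBatchPlan
{
    // Extensions for the formats this page calls common raster images.
    private static readonly string[] RasterExtensions = { ".jpg", ".jpeg", ".png", ".bmp", ".gif" };

    // True when the file extension matches one of the common raster formats.
    public static bool IsCommonRaster(string path)
    {
        string ext = Path.GetExtension(path);
        return Array.Exists(RasterExtensions,
            e => string.Equals(e, ext, StringComparison.OrdinalIgnoreCase));
    }

    // Output text file with the same base name as the input.
    public static string TextOutputFor(string path)
    {
        return Path.ChangeExtension(path, ".txt");
    }

    public static void Main()
    {
        string[] inputs = { "sample.jpeg", "sample.PNG", "sample.tif", "notes.docx" };
        foreach (string input in inputs)
        {
            if (IsCommonRaster(input))
            {
                // The per-image steps from the samples would run here:
                // import the image, call Recognize(), then save as text.
                Console.WriteLine(input + " -> " + TextOutputFor(input));
            }
            else
            {
                Console.WriteLine(input + " skipped (not a common raster image)");
            }
        }
    }
}
```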