C#: Demos for Image Content Extraction Using OCR


Free OCR Demos and Sample Code for Extracting Content from Various Images




In addition to extracting document page content in a C# OCR project, you can extract content (text) from various image files (scanned PDF, Jpeg, Png, Bmp, Gif, and Tiff) and output the result to text or PDF files. This page provides free demos and sample code for these OCR functions. See the following sections for details.



Note that extracting content from common raster images (Jpeg, Png, Bmp, and Gif) requires only four assemblies in your C# application. If the target file is a Tiff or PDF document, the corresponding DLL libraries must be referenced as well.




Page Content



Demo For Scanned PDF Content (Text) Extraction
Demo For Jpeg Content (Text) Extraction
Demo For Png Content (Text) Extraction
Demo For Bmp Content (Text) Extraction
Demo For Gif Content (Text) Extraction
Demo For Tiff Content (Text) Extraction



C# Project DLLs: Extract Image Content



To run the following image text extraction sample code successfully, set up your project as follows:


Add References


  RasterEdge.XImage.OCR.dll


  RasterEdge.XImage.OCR.Tesseract.dll


  RasterEdge.Imaging.Basic.dll


  RasterEdge.Imaging.Basic.Codec.dll


  RasterEdge.Imaging.Drawing.dll


  RasterEdge.Imaging.Font.dll


  RasterEdge.Imaging.Processing.dll


  RasterEdge.XImage.AdvancedCleanup.Core.dll


  RasterEdge.XImage.Raster.Core.dll


  RasterEdge.XImage.Raster.dll


  RasterEdge.XDoc.PDF.dll


Using Namespaces


  using RasterEdge.XDoc.PDF;


  using RasterEdge.XImage.OCR;


  using RasterEdge.Imaging.Basic;


Note: If you get the error "Could not load file or assembly 'RasterEdge.Imaging.Basic' or any other assembly or one of its dependencies. An attempt was made to load a program with an incorrect format", check your project configuration as follows:

       If you are using the x64 libraries/DLLs, right-click the project -> Properties -> Build -> Platform target: x64.

       If you are using the x86 libraries/DLLs, the platform target should be x86.
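
Because this error usually means the running process and the referenced DLLs target different architectures, it can help to print what the process actually is before changing the project settings. The following is a minimal sketch that uses only standard .NET APIs (Environment.Is64BitProcess, Environment.Is64BitOperatingSystem, IntPtr.Size). The class name PlatformCheck and the method DescribeProcess are hypothetical helpers, not part of the RasterEdge SDK.

```csharp
using System;

public static class PlatformCheck
{
    // Returns "64-bit" when the current process runs as 64-bit, otherwise "32-bit".
    // This reports the bitness of the process, not the architecture of any DLL.
    public static string DescribeProcess()
    {
        return Environment.Is64BitProcess ? "64-bit" : "32-bit";
    }

    public static void Main()
    {
        Console.WriteLine("Process: " + DescribeProcess());
        Console.WriteLine("Pointer size in bytes: " + IntPtr.Size);
        Console.WriteLine("64-bit operating system: " + Environment.Is64BitOperatingSystem);
    }
}
```

If the output says 32-bit while the project references the x64 DLLs (or the reverse), the platform target is the likely cause of the load error.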




C# Sample Code for Scanned PDF Text Extraction



Copy the C# OCR sample code below to extract text from a scanned PDF document and save it to pdf.txt.




// Requires: using System.Drawing; using System.IO;

// Set the training data path. Please put eng.traineddata (for English) under the directory you specified.
OCRHandler.SetTrainResourcePath(resourcePath);

// Set supported language. You can also set this attribute in OCRPage or OCRZone.
OCRHandler.Settings.LanguagesEnabled.Add(RasterEdge.Imaging.OCR.Language.Eng);

// Load the PDF document.
PDFDocument pdf = new PDFDocument(@"C:\sample.pdf");
int pageCount = pdf.GetPageCount();

// The using block closes the writer even if recognition throws,
// and errors are reported instead of being silently swallowed.
using (StreamWriter writer = new StreamWriter(@"C:\pdf.txt"))
{
        for (int i = 0; i < pageCount; i++)
        {
                // Load the page to recognize.
                PDFPage page = (PDFPage)pdf.GetPage(i);

                // The default resolution is 96. A larger value (192, 288, ...) helps
                // recognition, but it should not be set too large.
                using (Bitmap bmp = page.ConvertToImage(192))
                {
                        // Import the page image and recognize it.
                        OCRPage oPage = OCRHandler.Import(bmp);
                        oPage.Recognize();
                        writer.WriteLine(oPage.GetText());
                }
        }
}
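
The comment in the sample above says a larger resolution helps recognition but should not be too large. One way to see why is to look at how the rendered bitmap grows with the resolution. The sketch below is plain arithmetic rather than an SDK call. It assumes a US Letter page (8.5 x 11 inches) and 4 bytes per pixel, both chosen only for illustration, and RenderCost is a hypothetical helper name.

```csharp
using System;

public static class RenderCost
{
    // Pixels along one side of a page rendered at the given DPI.
    public static int PixelsFor(double inches, int dpi)
    {
        return (int)Math.Round(inches * dpi);
    }

    // Approximate uncompressed bitmap size in bytes, assuming 4 bytes per pixel.
    public static long BytesFor(double widthInches, double heightInches, int dpi)
    {
        long w = PixelsFor(widthInches, dpi);
        long h = PixelsFor(heightInches, dpi);
        return w * h * 4;
    }

    public static void Main()
    {
        foreach (int dpi in new[] { 96, 192, 288 })
        {
            long bytes = BytesFor(8.5, 11.0, dpi);
            Console.WriteLine(dpi + " DPI: " + PixelsFor(8.5, dpi) + " x "
                + PixelsFor(11.0, dpi) + " px, " + bytes + " bytes");
        }
    }
}
```

Under these assumptions a page is 816 x 1056 pixels at 96 DPI and 1632 x 2112 pixels at 192 DPI, so doubling the resolution quadruples the pixel count and the memory needed per page.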





C# Sample Code for Jpeg Image Text Extraction



Use the following OCR sample code to extract text from a Jpeg image and save it to jpeg.txt in a C# program.




// Set the training data path. Please put eng.traineddata (for English) under the directory you specified.
OCRHandler.SetTrainResourcePath(resourcePath);
REImage img = new REImage(@"C:\sample.jpeg");

// Recognize characters from this image. Default language is English.
OCRPage page = OCRHandler.Import(img);
page.Recognize();
page.SaveTo(MIMEType.TXT, @"C:\jpeg.txt");
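
The samples pass resourcePath to OCRHandler.SetTrainResourcePath without showing how it is defined, and they expect eng.traineddata to be in that directory. A check like the following, run before the OCR calls, can make a missing file easier to diagnose. This is a sketch using only System.IO. The TrainingData class and its Require method are hypothetical, and building the file name as the language code plus ".traineddata" is an assumption generalized from the eng.traineddata name given on this page.

```csharp
using System;
using System.IO;

public static class TrainingData
{
    // Returns the full path of the traineddata file for a language code
    // (for example "eng"), or throws if the file is not in the directory.
    public static string Require(string resourcePath, string languageCode)
    {
        string file = Path.Combine(resourcePath, languageCode + ".traineddata");
        if (!File.Exists(file))
        {
            throw new FileNotFoundException(
                "Training data not found. Expected: " + file, file);
        }
        return Path.GetFullPath(file);
    }

    public static void Main(string[] args)
    {
        // The directory is an assumption: pass your own as the first argument.
        string resourcePath = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
        try
        {
            Console.WriteLine("Found: " + Require(resourcePath, "eng"));
        }
        catch (FileNotFoundException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

Calling TrainingData.Require(resourcePath, "eng") just before OCRHandler.SetTrainResourcePath(resourcePath) turns a missing file into an explicit error message that names the expected path.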





C# Sample Code for Png Image Text Extraction



This C# OCR demo code illustrates how to extract text from a Png image and save it to png.pdf.




// Set the training data path. Please put eng.traineddata (for English) under the directory you specified.
OCRHandler.SetTrainResourcePath(resourcePath);
REImage img = new REImage(@"C:\sample.png");

// Recognize characters from this image. Default language is English.
OCRPage page = OCRHandler.Import(img);
page.Recognize();
page.SaveTo(MIMEType.PDF, @"C:\png.pdf");


     



C# Sample Code for Bmp Image Text Extraction



The following OCR demo code extracts text from a Bmp image and saves it to bmp.txt in a C# project.




// Set the training data path. Please put eng.traineddata (for English) under the directory you specified.
OCRHandler.SetTrainResourcePath(resourcePath);
REImage img = new REImage(@"C:\sample.bmp");

// Recognize characters from this image. Default language is English.
OCRPage page = OCRHandler.Import(img);
page.Recognize();
page.SaveTo(MIMEType.TXT, @"C:\bmp.txt");


     



C# Sample Code for Gif Text Extraction



This is a simple example of how to use XImage.OCR for .NET in C# to extract text from a Gif image and save it to gif.txt.




// Set the training data path. Please put eng.traineddata (for English) under the directory you specified.
OCRHandler.SetTrainResourcePath(resourcePath);
REImage img = new REImage(@"C:\sample.gif");

// Recognize characters from this image. Default language is English.
OCRPage page = OCRHandler.Import(img);
page.Recognize();
page.SaveTo(MIMEType.TXT, @"C:\gif.txt");
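
The Jpeg, Png, Bmp, and Gif samples differ only in the input and output file names, so the same steps could be applied to a whole folder. The sketch below covers only the file-handling part: it pairs each raster image in a directory with a .txt output path and leaves the OCR calls to the samples above. OcrBatchPlan and its methods are hypothetical names, and including the ".jpg" extension alongside ".jpeg" is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class OcrBatchPlan
{
    // Raster formats covered by the samples on this page (".jpg" added as an assumption).
    private static readonly HashSet<string> RasterExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { ".jpg", ".jpeg", ".png", ".bmp", ".gif" };

    public static bool IsRasterImage(string path)
    {
        return RasterExtensions.Contains(Path.GetExtension(path));
    }

    // Pairs each raster image in inputDir with a .txt path in outputDir.
    // Files that share a base name (a.png, a.gif) map to the same output path.
    public static List<KeyValuePair<string, string>> Plan(string inputDir, string outputDir)
    {
        return Directory.EnumerateFiles(inputDir)
            .Where(IsRasterImage)
            .OrderBy(p => p, StringComparer.OrdinalIgnoreCase)
            .Select(p => new KeyValuePair<string, string>(
                p, Path.Combine(outputDir, Path.GetFileNameWithoutExtension(p) + ".txt")))
            .ToList();
    }

    public static void Main(string[] args)
    {
        string inputDir = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
        string outputDir = args.Length > 1 ? args[1] : inputDir;
        foreach (var pair in Plan(inputDir, outputDir))
        {
            // The Import / Recognize / SaveTo calls from the samples would go here.
            Console.WriteLine(pair.Key + " -> " + pair.Value);
        }
    }
}
```

Inside the loop, pair.Key would take the place of the input path passed to REImage and pair.Value the output path passed to SaveTo.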