C#: Demos for Image Content Extraction Using OCR


Free OCR Demos and Sample Code for Extracting Content from Various Images




In addition to extracting document page content in a C# OCR project, you can extract text from scanned PDF documents and from Jpeg, Png, Bmp, Gif, and Tiff images, and then save the recognized text to text or PDF files. This page provides free demos and C# sample code for each of these OCR tasks, as described in the following sections.


Note: to extract content from common raster images such as Jpeg, Png, Bmp, and Gif, only four core assemblies need to be referenced in your C# application. If the target file is a Tiff or PDF document, the corresponding Tiff or PDF DLL must be referenced as well.
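Because the required references depend on the input type, an application that accepts arbitrary files may want to decide up front which case applies. The following is a minimal sketch using only the .NET base class library. The `InputKind` enum and the `OcrInputClassifier.Classify` method are hypothetical helpers, not part of XImage.OCR, and the extension spellings are assumptions based on the formats named above.

```csharp
using System;
using System.IO;

// Hypothetical categories mirroring the note above; not an XImage.OCR type.
public enum InputKind
{
    Raster,      // Jpeg, Png, Bmp, Gif: core OCR assemblies only
    Tiff,        // also needs the corresponding Tiff library
    Pdf,         // also needs RasterEdge.XDoc.PDF.dll
    Unsupported
}

public static class OcrInputClassifier
{
    // Classifies a file by its extension only (case-insensitive); the file is not opened.
    public static InputKind Classify(string path)
    {
        string ext = (Path.GetExtension(path) ?? string.Empty).ToLowerInvariant();
        switch (ext)
        {
            case ".jpg":
            case ".jpeg":
            case ".png":
            case ".bmp":
            case ".gif":
                return InputKind.Raster;
            case ".tif":
            case ".tiff":
                return InputKind.Tiff;
            case ".pdf":
                return InputKind.Pdf;
            default:
                return InputKind.Unsupported;
        }
    }
}
```

Checking the extension is cheap but not authoritative; a renamed file would be misclassified, so the OCR call itself should still be guarded.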




Page Content



Demo for Scanned PDF Content (Text) Extraction


Demo for Jpeg Content (Text) Extraction


Demo for Png Content (Text) Extraction


Demo for Bmp Content (Text) Extraction


Demo for Gif Content (Text) Extraction


Demo for Tiff Content (Text) Extraction



C# Project DLLs: Extract Image Content



To run the following sample code successfully, add the references and namespaces below:


Add References


  RasterEdge.XImage.OCR.dll


  RasterEdge.XImage.OCR.Tesseract.dll


  RasterEdge.Imaging.Basic.dll


  RasterEdge.Imaging.Basic.Codec.dll


  RasterEdge.Imaging.Drawing.dll


  RasterEdge.Imaging.Font.dll


  RasterEdge.Imaging.Processing.dll


  RasterEdge.XImage.AdvancedCleanup.Core.dll


  RasterEdge.XImage.Raster.Core.dll


  RasterEdge.XImage.Raster.dll


  RasterEdge.XDoc.PDF.dll


Using Namespaces


  using RasterEdge.XDoc.PDF;


  using RasterEdge.XImage.OCR;


  using RasterEdge.Imaging.Basic;


  using System.Drawing;


  using System.IO;




C# Sample Code for Scanned PDF Text Extraction



Copy the C# OCR sample code below to extract text from a scanned PDF document and save it to pdf.txt.




// Hypothetical folder: replace with the directory that contains eng.traineddata (English training data).
string resourcePath = @"C:\tessdata\";

// Set the training data path.
OCRHandler.SetTrainResourcePath(resourcePath);

// Enable English recognition. The language can also be set on OCRPage or OCRZone.
OCRHandler.Settings.LanguagesEnabled.Add(RasterEdge.Imaging.OCR.Language.Eng);

// Load the scanned PDF document.
PDFDocument pdf = new PDFDocument(@"C:\sample.pdf");
int pageCount = pdf.GetPageCount();

// The using block closes the writer even if recognition throws; errors are no longer silently swallowed.
using (StreamWriter writer = new StreamWriter(@"C:\pdf.txt"))
{
        for (int i = 0; i < pageCount; i++)
        {
                // Load the page to recognize.
                PDFPage page = (PDFPage)pdf.GetPage(i);

                // Rasterize the page to a bitmap; dispose it once the page has been recognized.
                using (Bitmap bmp = page.GetBitmap())
                {
                        // Import the bitmap, run recognition, and append the page text to pdf.txt.
                        OCRPage oPage = OCRHandler.Import(bmp);
                        oPage.Recognize();
                        writer.WriteLine(oPage.GetText());
                }
        }
}
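Each sample passes a folder to `OCRHandler.SetTrainResourcePath`, and that folder must contain the training data file, such as eng.traineddata for English. A missing file is easier to diagnose if the application checks for it before OCR starts. The sketch below uses only System.IO; the `TrainingDataCheck.Require` helper is hypothetical, and it assumes the `<language>.traineddata` file naming used in the sample comments.

```csharp
using System;
using System.IO;

public static class TrainingDataCheck
{
    // Hypothetical helper: returns the full path of <languageCode>.traineddata inside
    // resourcePath, or throws if the folder or the file is missing.
    public static string Require(string resourcePath, string languageCode = "eng")
    {
        if (string.IsNullOrWhiteSpace(resourcePath))
            throw new ArgumentException("Training data path is empty.", nameof(resourcePath));
        if (!Directory.Exists(resourcePath))
            throw new DirectoryNotFoundException("Training data folder not found: " + resourcePath);

        string file = Path.Combine(resourcePath, languageCode + ".traineddata");
        if (!File.Exists(file))
            throw new FileNotFoundException("Missing training data file.", file);
        return file;
    }
}
```

In the samples, it would be called as `TrainingDataCheck.Require(resourcePath);` immediately before `OCRHandler.SetTrainResourcePath(resourcePath);`.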





C# Sample Code for Jpeg Image Text Extraction



Use the following OCR sample code in your C# program to extract text from a Jpeg image and save it to jpeg.txt.




// Hypothetical folder: replace with the directory that contains eng.traineddata (English training data).
string resourcePath = @"C:\tessdata\";

// Set the training data path.
OCRHandler.SetTrainResourcePath(resourcePath);
REImage img = new REImage(@"C:\sample.jpeg");

// Recognize characters from this image. Default language is English.
OCRPage page = OCRHandler.Import(img);
page.Recognize();
page.SaveTo(MIMEType.TXT, @"C:\jpeg.txt");





C# Sample Code for Png Image Text Extraction



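This C# OCR demo code illustrates how to extract text from a Png image and save it to png.pdf. Unlike the other image samples, it saves the result as a PDF file (MIMEType.PDF) rather than plain text.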




// Hypothetical folder: replace with the directory that contains eng.traineddata (English training data).
string resourcePath = @"C:\tessdata\";

// Set the training data path.
OCRHandler.SetTrainResourcePath(resourcePath);
REImage img = new REImage(@"C:\sample.png");

// Recognize characters from this image. Default language is English.
OCRPage page = OCRHandler.Import(img);
page.Recognize();
page.SaveTo(MIMEType.PDF, @"C:\png.pdf");





C# Sample Code for Bmp Image Text Extraction



The following OCR demo code extracts text from a Bmp image and saves it to bmp.txt in a C# project.




// Hypothetical folder: replace with the directory that contains eng.traineddata (English training data).
string resourcePath = @"C:\tessdata\";

// Set the training data path.
OCRHandler.SetTrainResourcePath(resourcePath);
REImage img = new REImage(@"C:\sample.bmp");

// Recognize characters from this image. Default language is English.
OCRPage page = OCRHandler.Import(img);
page.Recognize();
page.SaveTo(MIMEType.TXT, @"C:\bmp.txt");





C# Sample Code for Gif Image Text Extraction



This simple example shows how to use XImage.OCR for .NET in C# to extract text from a Gif image and save it to gif.txt.




// Hypothetical folder: replace with the directory that contains eng.traineddata (English training data).
string resourcePath = @"C:\tessdata\";

// Set the training data path.
OCRHandler.SetTrainResourcePath(resourcePath);
REImage img = new REImage(@"C:\sample.gif");

// Recognize characters from this image. Default language is English.
OCRPage page = OCRHandler.Import(img);
page.Recognize();
page.SaveTo(MIMEType.TXT, @"C:\gif.txt");
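The Jpeg, Png, Bmp, and Gif samples differ only in the input path and the output file, so a whole folder can be processed by running the same calls in a loop. The sketch below only plans that work with System.IO and LINQ: it pairs each supported image with a text output path. The `OcrBatchPlan.Build` name, the extension list, and the choice to name each output after the full input file name (so sample.jpeg and sample.png do not collide) are all assumptions. The XImage.OCR calls from the samples appear as comments where they would go.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class OcrBatchPlan
{
    // Raster formats covered by the samples on this page (assumed extension spellings).
    private static readonly HashSet<string> SupportedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { ".jpg", ".jpeg", ".png", ".bmp", ".gif" };

    // Hypothetical helper: pairs each supported image in inputDir with
    // outputDir\<file name>.txt, sorted by input path for a stable order.
    public static List<KeyValuePair<string, string>> Build(string inputDir, string outputDir)
    {
        return Directory.EnumerateFiles(inputDir)
            .Where(f => SupportedExtensions.Contains(Path.GetExtension(f)))
            .OrderBy(f => f, StringComparer.OrdinalIgnoreCase)
            .Select(f => new KeyValuePair<string, string>(
                f, Path.Combine(outputDir, Path.GetFileName(f) + ".txt")))
            .ToList();
    }
}

// For each pair, the calls from the samples above would run as:
//   REImage img = new REImage(pair.Key);
//   OCRPage page = OCRHandler.Import(img);
//   page.Recognize();
//   page.SaveTo(MIMEType.TXT, pair.Value);
```

Separating the planning step from the OCR step keeps the file handling testable without the OCR assemblies and makes it easy to log or skip files before recognition starts.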