
C# Imaging - OCR Recognition in C#.NET

Comprehensive Visual C# Code for OCR Recognition in the .NET Imaging Library

If this is your first time using our DocImageSDK, we strongly suggest reading How to Start first.

RasterEdge provides a comprehensive Optical Character Recognition (OCR) SDK that is accurate and easy to use in C#.NET, VB.NET, ASP.NET web and .NET WinForms programming environments. This tutorial covers RasterEdge's high-level OCR Add-On toolkit for C#. With this C# imaging OCR SDK, users can extract text from scanned documents or image-only PDFs and quickly convert images to text-searchable formats.
The RasterEdge OCR Add-On provides flexible recognition, detection and processing options for C# applications:
  • Improve OCR color image reading quality
  • Distinguish between OCR region types
  • Clean up after text translation
  • Cancel an OCR operation in progress in your C# application
  • Count total number of each word in images
  • Support font mapping in your C# image project
This page covers the following topics on the OCR SDK technology in the RasterEdge .NET Imaging DLL library:
  • Overview of the benefits of OCR recognition technology in Visual C# programs
  • C# Windows imaging project creation tutorial for developers and end users
  • Methods for extracting text from an image using the OCR reader
  • Complete managed C# sample code for accurately extracting and processing text from images with our C# OCR image reading control library
Overview of Visual C# OCR SDK Technology Benefits
  • Free to implement the reliable and high performance Optical Character Recognition in any .NET application or environment
  • Simple to integrate .NET Imaging OCR Add-on into a C# Windows desktop application
  • Support reading image text in over 10 languages and character sets
  • Able to recognize images captured by a digital camera, scanned documents or image-only PDFs using C# OCR reading project
  • Support recognition of both monochrome (bitonal) and color images for scanned documents and pictures in C#
  • Complete and rapid report of extracted text, including size, font, location, character attribute, etc.
C# Project for OCR Recognition
In this part, you will learn how to create a C#.NET project that uses the RasterEdge .NET Imaging SDK and OCR Add-On components. The steps are as follows.
  1. Download the RasterEdge .NET Imaging SDK, which contains the OCR Imaging Add-On;
  2. Create a .NET project with the Visual C# programming language in Visual Studio 2005 or any later version;
  3. Add the RasterEdge .NET Imaging SDK and RasterEdge OCR Add-On DLLs to the created C#.NET project:
    • RasterEdge.Imaging.Basic.dll
    • RasterEdge.Imaging.OCR.dll
VB.NET developers, please go to OCR recognition for VB.NET. To view image and document OCR functions in WinForms or web applications, please go to OCR recognition in WinForms and web document image OCR recognition.
Methods to Extract Text from Image in C# Project
The following C# methods extract text from an image using the RasterEdge .NET Imaging SDK and OCR Image Add-On.
// Requires "using System;" and "using System.Drawing;" plus the RasterEdge
// namespaces exposed by the DLLs referenced above.
WorkRegistry.Reset();
// Load the JPEG image.
REImage img = new REImage(@"c:\test1.jpeg");
// Set the training data path; put eng.traineddata (for English)
// in the directory you specify.
OCRHandler.SetTrainResourcePath(@"c:\source");
// Resize the image to improve accuracy; skip this step if the image is already clear enough.
img = img.Resize(new Size((int)img.Width * 2, (int)img.Height * 2));
// Import the REImage for OCR.
OCRPage page = OCRHandler.Import(img);
// Recognize characters in the image; the default language is English.
page.Recognize();
Console.WriteLine(page.GetText());
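Once the recognized text is available as a string, counting how often each word occurs needs nothing beyond the standard .NET library. The sketch below is an illustration only and is not part of the RasterEdge API: the WordCounter class and its CountWords method are hypothetical names, and a "word" is assumed to be any run of letters, digits or apostrophes, compared case-insensitively.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the RasterEdge SDK): counts words in OCR output.
public static class WordCounter
{
    // Returns a case-insensitive map from each word to its number of occurrences.
    // Assumption: a word is a run of letters, digits or apostrophes.
    public static Dictionary<string, int> CountWords(string text)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        if (string.IsNullOrEmpty(text))
        {
            return counts;
        }

        foreach (Match match in Regex.Matches(text, @"[\p{L}\p{N}']+"))
        {
            int current;
            // TryGetValue leaves current at 0 when the word has not been seen yet.
            counts.TryGetValue(match.Value, out current);
            counts[match.Value] = current + 1;
        }

        return counts;
    }
}
```

With the snippet above, the call would look like WordCounter.CountWords(page.GetText()), and the resulting dictionary can be iterated to print each word with its count.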
OCR Text from Scanned PDF or TIFF Documents
The following C# demo code shows how to extract text from PDF and TIFF documents using the OCR APIs.
// Requires "using System;" and "using System.Drawing;" plus the RasterEdge
// namespaces exposed by the DLLs referenced above.
// Set the training data path; put eng.traineddata (for English)
// in the directory you specify.
OCRHandler.SetTrainResourcePath(@"c:\source");
// Set the supported language; you can also set this attribute on OCRPage or OCRZone.
OCRHandler.Settings.LanguagesEnabled.Add(Language.Eng);
// Load the PDF or TIFF document.
PDFDocument doc = new PDFDocument(@"C:\sample.pdf");
//TIFFDocument doc = new TIFFDocument(@"c:\sample.tif");
// Get the first page to recognize.
PDFPage page = (PDFPage)doc.GetPage(0);
//TIFFPage page = (TIFFPage)doc.GetPage(0);
// Rasterize the page with a scale factor.
Bitmap bmp = page.GetBitmap(1.5f);
// Import the rasterized page for recognition.
OCRPage oPage = OCRHandler.Import(bmp);
oPage.Recognize();
// Save the OCR result in another document format (TXT, PDF, SVG).
oPage.SaveTo(MIMEType.TXT, @"c:\sample.txt");
// Alternatively, output the text directly.
Console.WriteLine(oPage.GetText());
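Text recognized from scanned pages often contains stray spacing, extra blank lines and words split by a hyphen at the end of a line. The following is a minimal clean-up sketch using only standard .NET classes; it is not part of the RasterEdge API, and the OcrTextCleaner class and Normalize method are hypothetical names chosen for this example. Note that the hyphen rule will also join genuine compound words that happen to break at a line end.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the RasterEdge SDK): tidies raw OCR text.
public static class OcrTextCleaner
{
    public static string Normalize(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        // Unify Windows and old Mac line endings to "\n".
        string s = text.Replace("\r\n", "\n").Replace('\r', '\n');

        // Rejoin words hyphenated across a line break: "recog-\nnition" becomes "recognition".
        s = Regex.Replace(s, @"(\p{L})-\n(\p{L})", "$1$2");

        // Collapse runs of spaces and tabs into a single space.
        s = Regex.Replace(s, @"[ \t]+", " ");

        // Remove a space directly before or after a line break.
        s = Regex.Replace(s, @" ?\n ?", "\n");

        // Reduce three or more consecutive line breaks to one blank line.
        s = Regex.Replace(s, @"\n{3,}", "\n\n");

        return s.Trim();
    }
}
```

With the snippet above, the call would look like OcrTextCleaner.Normalize(oPage.GetText()) before the text is printed or stored.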
What Is Optical Character Recognition?
What do you do when you want to digitize a paper document, such as a magazine article, a booklet, or a printout of a PDF, Word or other document? A practical solution is to scan the material and use OCR software to convert it into a readable digital format.
OCR, short for Optical Character Recognition, is a technology that enables users to convert different types of scanned documents, PDF files or images captured by a digital camera into editable and searchable digital data. With an OCR recognizer, users can quickly and easily extract all the characters, as well as their properties, from an image or document.

