C# Imaging - OCR Recognition in C#.NET

Comprehensive Visual C# Code for OCR Recognition in .NET Image Library

If this is your first time using our DocImageSDK, we strongly suggest reading How to Start first!

RasterEdge provides a comprehensive Optical Character Recognition (OCR) SDK that is accurate and easy to work with in C#.NET, VB.NET, ASP.NET web and .NET WinForms programming environments. This tutorial covers RasterEdge's high-level OCR Add-on toolkit for C#. With this C# imaging OCR SDK, users can extract text from scanned documents or image-only PDFs and quickly convert images to text-searchable formats.
To deploy OCR in your application, the RasterEdge OCR Add-On provides flexible C# options for OCR recognition, detection and processing:
  • Improve OCR color image reading quality
  • Distinguish between OCR region types
  • Clean up after text translation
  • Cancel an OCR operation in progress from your C# application
  • Count the occurrences of each word in an image
  • Support font mapping in your C# image project
This page covers the following aspects of the OCR SDK technology in the RasterEdge .NET Imaging DLL library:
  • Overview of the benefits of OCR recognition technology in a Visual C# program
  • C# Windows imaging project creation tutorial for developers and end users
  • Methods for extracting text from an image using the OCR reader
  • Complete managed C# sample code for accurately extracting and processing text from an image with our C# OCR image reading control library
Overview of Visual C# OCR SDK Technology Benefits
  • Free to implement reliable, high-performance Optical Character Recognition in any .NET application or environment
  • Simple to integrate the .NET Imaging OCR Add-on into a C# Windows desktop application
  • Support for reading image text in over 10 languages and character sets
  • Able to recognize images captured by a digital camera, scanned documents or image-only PDFs in a C# OCR reading project
  • Support for recognizing both color and bitonal (monochrome) images of scanned documents and pictures in C#
  • Complete and rapid report of extracted text, including size, font, location and character attributes
C# Project for OCR Recognition
In this part, you will learn how to create a C#.NET project that uses the RasterEdge .NET Imaging SDK and OCR Add-On components. The steps are as follows.
  1. Download the RasterEdge .NET Imaging SDK, which contains the OCR Imaging Add-On;
  2. Create a .NET project with the Visual C# programming language in Visual Studio 2005 or any later version;
  3. Copy the RasterEdge license text file to the new C# project folder;
  4. Add the RasterEdge .NET Imaging SDK and RasterEdge OCR Add-On DLLs to the created C#.NET project:
    • RasterEdge.Imaging.Basic.dll
    • RasterEdge.Imaging.TesseractOCR.dll
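  5. Add the following using statements for the .NET Imaging SDK and OCR Image Add-On (System.Collections.Generic is needed for the List<Word> result used below):
using System.Collections.Generic;
using System.Drawing;
using System.Diagnostics;
using RasterEdge.Imaging.TesseractOCR;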
VB.NET developers, please go to OCR recognition for VB.NET. To view image and document OCR functions in WinForms or web applications, please go to OCR recognition in WinForms and web document image OCR recognition.
Methods to Extract Text from Image in C# Project
The following C# methods are used to extract text from an image with the RasterEdge .NET Imaging SDK and OCR Image Add-On.
// fileName is the path of the image to recognize
Bitmap bmp = new Bitmap(fileName);

Tesseract ocr = new Tesseract();
// Set the parent folder of the "tessdata" folder
ocr.Init(@"C:\", "eng", false);
List<Word> result = ocr.DoOCR(bmp, Rectangle.Empty);

// Print the recognized text line by line
int lineCount = Tesseract.LineCount(result);
for (int i = 0; i < lineCount; i++)
{
    string val = Tesseract.GetLineText(result, i);
    Debug.WriteLine("Line " + i + ": " + val);
}
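Once the recognized words are available, counting how often each word occurs needs only standard .NET collections. The following is a minimal sketch that works on plain strings and does not call the SDK; the WordStats class and CountWords method are illustrative names introduced here, and the choice to ignore case and surrounding punctuation is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class WordStats
{
    // Counts how often each word occurs, ignoring case and
    // leading/trailing punctuation. Null or empty entries are skipped.
    public static Dictionary<string, int> CountWords(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string raw in words)
        {
            if (raw == null) continue;

            string word = raw.Trim().Trim('.', ',', ';', ':', '!', '?', '"', '(', ')');
            if (word.Length == 0) continue;

            // TryGetValue leaves current at 0 when the word is not yet present
            counts.TryGetValue(word, out int current);
            counts[word] = current + 1;
        }
        return counts;
    }
}
```

With the OCR result from the methods above, this could be called as WordStats.CountWords(result.ConvertAll(w => w.Text)), assuming Word.Text holds the recognized string.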
Entire C# Demo Code to Extract Text from Image
The following C# demo code shows how to extract text from an image using the API methods above.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Drawing;
using System.Drawing.Imaging;
using System.Windows.Forms;
using RasterEdge.Imaging.TesseractOCR;

namespace RE__Test
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        public static string FolderName = "c:/";

        private void button1_Click(object sender, EventArgs e)
        {
            string fileName = FolderName + "SampleOCR.bmp";

            Bitmap bmp = new Bitmap(fileName);

            Tesseract ocr = new Tesseract();
            // Set the parent folder of the "tessdata" folder
            ocr.Init(@"C:\", "eng", false);
            List<Word> result = ocr.DoOCR(bmp, Rectangle.Empty);

            // Report the confidence, text, position and size of each word
            foreach (Word word in result)
            {
                Debug.WriteLine(word.Confidence + ": " + word.Text
                    + " (" + word.Left + ", " + word.Top + ")"
                    + " W=" + (word.Right - word.Left)
                    + ", H=" + (word.Bottom - word.Top));
            }

            // Draw a red box around each word on a copy of the image.
            // The copy uses the default 32-bit pixel format because
            // Graphics.FromImage throws for indexed pixel formats.
            using (Bitmap newBmp = new Bitmap(bmp.Width, bmp.Height))
            {
                using (Graphics g = Graphics.FromImage(newBmp))
                {
                    // Explicit size keeps a 1:1 pixel mapping regardless of DPI
                    g.DrawImage(bmp, 0, 0, bmp.Width, bmp.Height);

                    foreach (Word word in result)
                    {
                        g.DrawRectangle(Pens.Red, word.Left, word.Top,
                            word.Right - word.Left, word.Bottom - word.Top);
                    }
                }

                // State the format explicitly so the file content matches its .bmp extension
                newBmp.Save(fileName + ".new.bmp", ImageFormat.Bmp);
            }

            // Print the recognized text line by line
            int lineCount = Tesseract.LineCount(result);
            Debug.WriteLine("Line Count: " + lineCount);
            for (int i = 0; i < lineCount; i++)
            {
                string val = Tesseract.GetLineText(result, i);
                Debug.WriteLine("Line " + i + ": " + val);
            }

            // Release the source image and its file handle
            bmp.Dispose();
        }
    }
}
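The demo derives each word's width and height from its Left, Top, Right and Bottom edges. The same arithmetic can produce a single rectangle around several words, for instance to highlight a whole line instead of each word. A minimal sketch follows; WordBox and BoxMath are hypothetical helper types introduced here for illustration and are not part of the RasterEdge SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the edges of a recognized word; not an SDK type.
public readonly struct WordBox
{
    public WordBox(int left, int top, int right, int bottom)
    {
        Left = left;
        Top = top;
        Right = right;
        Bottom = bottom;
    }

    public int Left { get; }
    public int Top { get; }
    public int Right { get; }
    public int Bottom { get; }

    // Same arithmetic as the demo: size is the distance between opposite edges
    public int Width => Right - Left;
    public int Height => Bottom - Top;
}

public static class BoxMath
{
    // Returns the smallest box that encloses all the given boxes.
    public static WordBox Union(IReadOnlyList<WordBox> boxes)
    {
        if (boxes == null || boxes.Count == 0)
            throw new ArgumentException("At least one box is required.", nameof(boxes));

        int left = boxes[0].Left, top = boxes[0].Top;
        int right = boxes[0].Right, bottom = boxes[0].Bottom;

        for (int i = 1; i < boxes.Count; i++)
        {
            left = Math.Min(left, boxes[i].Left);
            top = Math.Min(top, boxes[i].Top);
            right = Math.Max(right, boxes[i].Right);
            bottom = Math.Max(bottom, boxes[i].Bottom);
        }
        return new WordBox(left, top, right, bottom);
    }
}
```

Each WordBox would be filled from the Left, Top, Right and Bottom values of one recognized word, and the resulting box could then be drawn with a single DrawRectangle call.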
What Is Optical Character Recognition?
What do you do when you want to digitize a document, such as a printed magazine article or booklet, or a PDF that contains only scanned images? The practical solution is to scan the material where necessary and use OCR software to convert it into a readable digital format.
OCR, short for Optical Character Recognition, is a technology that enables users to convert different types of scanned documents, PDF files or images captured by a digital camera into editable and searchable digital data. With an OCR recognizer, users can quickly and easily extract all the characters, as well as their properties, from an image or document.

