If this is your first time using our DocImageSDK, we strongly suggest reading How to Start first.
RasterEdge provides a comprehensive Optical Character Recognition (OCR) SDK that is accurate and easy to use in C#.NET, VB.NET, ASP.NET web, and .NET WinForms programming environments. This tutorial covers RasterEdge's high-level OCR Add-On toolkit for C#. With this C# imaging OCR SDK, users can extract text from scanned documents or image-only PDFs and convert images to text-searchable formats quickly.
If you want to add OCR to your application, the RasterEdge OCR Add-On is a strong choice. It provides flexible C# options for recognition, detection, and processing so that you can tune performance.
More detail on the OCR technology in the RasterEdge .NET Imaging DLL library is available here:
Overview of Visual C# OCR SDK Technology Benefits
C# Project for OCR Recognition
This part shows how to create a C#.NET project that uses the RasterEdge .NET Imaging SDK and OCR Add-On components. Specific steps are as follows.
VB.NET developers, please go to OCR recognition for VB.NET. To view image and document OCR functions in WinForms or web applications, please go to OCR recognition in WinForms and web document image OCR recognition.
Methods to Extract Text from Image in C# Project
The following C# code extracts text from an image using the RasterEdge .NET Imaging SDK and the OCR Image Add-On.
// Requires: using System; using System.Drawing;
// plus the RasterEdge SDK namespaces that define REImage, OCRHandler, and OCRPage.
WorkRegistry.Reset();

// Load the JPEG image.
REImage img = new REImage(@"c:\test1.jpeg");

// Set the training data path. Put eng.traineddata (for English)
// in the directory you specify here.
OCRHandler.SetTrainResourcePath(@"c:\source");

// Upscale the image to improve accuracy. Skip this step if the image
// is already clear enough.
img = img.Resize(new Size((int)img.Width * 2, (int)img.Height * 2));

// Import the REImage for OCR (one import is enough).
OCRPage page = OCRHandler.Import(img);

// Recognize characters in the image. The default language is English.
page.Recognize();
Console.WriteLine(page.GetText());
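The code comments above ask you to place eng.traineddata in the directory passed to OCRHandler.SetTrainResourcePath. As a minimal sketch, a small pre-flight check could confirm the file is there before any OCR call is made. The TrainedDataCheck class and the HasTrainedData method are hypothetical helpers and are not part of the RasterEdge SDK. They use only System.IO, and they assume that other languages follow the same "<language>.traineddata" naming as the English file.

```csharp
using System.IO;

// Hypothetical helper, not part of the RasterEdge SDK.
public static class TrainedDataCheck
{
    // Returns true when "<lang>.traineddata" exists directly inside trainDir.
    // Assumption: language files are named like eng.traineddata.
    public static bool HasTrainedData(string trainDir, string lang = "eng")
    {
        // A missing or empty directory path can never contain the file.
        if (string.IsNullOrEmpty(trainDir) || !Directory.Exists(trainDir))
        {
            return false;
        }

        string candidate = Path.Combine(trainDir, lang + ".traineddata");
        return File.Exists(candidate);
    }
}
```

One way to use it is to call TrainedDataCheck.HasTrainedData(@"c:\source") before OCRHandler.SetTrainResourcePath(@"c:\source") and report a clear error when it returns false, rather than discovering the missing file during recognition.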
OCR Text from Scanned PDF or TIFF Documents
The following C# demo code shows how to extract text from PDF and TIFF documents using the OCR APIs.
// Requires: using System; using System.Drawing;
// plus the RasterEdge SDK namespaces that define the types used below.

// Set the training data path. Put eng.traineddata (for English)
// in the directory you specify here.
OCRHandler.SetTrainResourcePath(@"c:\source");

// Set the supported language. You can also set this attribute on OCRPage or OCRZone.
OCRHandler.Settings.LanguagesEnabled.Add(Language.Eng);

// Load the PDF or TIFF document.
PDFDocument doc = new PDFDocument(@"C:\sample.pdf");
//TIFFDocument doc = new TIFFDocument(@"c:\sample.tif");

// Load the first page to recognize.
PDFPage page = (PDFPage)doc.GetPage(0);
//TIFFPage page = (TIFFPage)doc.GetPage(0);

// Rasterize the page with a scale factor.
Bitmap bmp = page.GetBitmap(1.5f);

// Import the rasterized page and recognize it.
OCRPage oPage = OCRHandler.Import(bmp);
oPage.Recognize();

// Save the OCR result in another document format (TXT, PDF, SVG).
oPage.SaveTo(MIMEType.TXT, @"c:\sample.txt");

// Or output the text directly.
Console.WriteLine(oPage.GetText());
What Is Optical Character Recognition?
What do you do when you need to digitize a document such as a magazine article, a booklet, a PDF, a Word file, or another document? The usual solution is to use a scanner together with OCR software to convert the required material into a readable digital format.
OCR, short for Optical Character Recognition, is a technology that converts different types of scanned documents, PDF files, or images captured by a digital camera into editable and searchable digital data. With an OCR recognizer, users can quickly extract the characters, along with their properties, from an image or document.