
C#: Extract Text from Tiff File Using OCR SDK


Online C# Guide for Text Extraction from Tiff File Using .NET OCR SDK





Overview



With XImage.OCR for .NET, C# programmers can implement mature and fast OCR for Tiff files, scanned PDF files and many other image formats such as Jpeg, Bmp, Png and Gif. This guide focuses on OCR for Tiff image files. It shows C# programmers how to perform the following tasks, and each task has demo code in its own section.


Extract text from whole Tiff file


Extract text from specified Tiff page


Extract text from specified zone in Tiff page


Scan image and output OCR result to PDF document


Scan image and output OCR result to Word document


Before you use the C# demo code, install XImage.OCR for .NET in your C# project. You also need to add the format library for the file type you want to OCR as a project reference. For Tiff files, add RasterEdge.XDoc.TIFF.dll.




C# Project DLLs: Extract Text from Tiff File Using OCR SDK



To run the Tiff OCR samples below, set up your project as follows:


Add References


  RasterEdge.XImage.OCR.dll


  RasterEdge.XImage.OCR.Tesseract.dll


  RasterEdge.Imaging.Basic.dll


  RasterEdge.Imaging.Basic.Codec.dll


  RasterEdge.Imaging.Drawing.dll


  RasterEdge.Imaging.Font.dll


  RasterEdge.Imaging.Processing.dll


  RasterEdge.XImage.AdvancedCleanup.Core.dll


  RasterEdge.XImage.Raster.Core.dll


  RasterEdge.XImage.Raster.dll


  RasterEdge.XDoc.TIFF.dll


Using Namespaces


  using RasterEdge.XDoc.TIFF;


  using RasterEdge.XImage.OCR;


  using System.Drawing;


  using System.IO;


Note: You may get the error "Could not load file or assembly 'RasterEdge.Imaging.Basic' or one of its dependencies. An attempt was made to load a program with an incorrect format." The same error can also name another RasterEdge assembly. In either case, check your build configuration as follows:

       If you are using the x64 DLLs, right-click the project, choose Properties -> Build, and set Platform target to x64.

       If you are using the x86 DLLs, set Platform target to x86.
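This error usually means that the process bitness does not match the DLLs you referenced. An Any CPU build with "Prefer 32-bit" enabled, for example, can still run as a 32-bit process on 64-bit Windows. You can confirm the mode your application actually runs in by logging it at startup. The minimal sketch below uses only standard .NET APIs (`Environment.Is64BitProcess`, `Environment.Is64BitOperatingSystem`, `RuntimeInformation.ProcessArchitecture`). The `PlatformCheck` class is a hypothetical name, not part of the RasterEdge SDK.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper (not part of the RasterEdge SDK): describes the bitness of the
// running process so you can compare it with the x86/x64 DLLs you referenced.
public static class PlatformCheck
{
    public static string Describe()
    {
        // True when the current process runs as a 64-bit process.
        string process = Environment.Is64BitProcess ? "64-bit" : "32-bit";

        // True when the operating system itself is 64-bit.
        string os = Environment.Is64BitOperatingSystem ? "64-bit" : "32-bit";

        // CPU architecture the process runs under (X86, X64, Arm64, ...).
        Architecture arch = RuntimeInformation.ProcessArchitecture;

        return $"Process: {process} ({arch}), OS: {os}";
    }
}
```

To use it, call `Console.WriteLine(PlatformCheck.Describe());` at application startup. If the output says "32-bit" but you referenced the x64 DLLs, or the other way around, set Platform target as described above.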




C# OCR: Extract Text from Whole Tiff File




            // Open the Tiff file.
            String inputFilePath = @"C:\input.tif";
            TIFFDocument doc = new TIFFDocument(inputFilePath);

            // The folder that contains the '.traineddata' files (replace with your own path).
            OCRHandler.SetTrainResourcePath(@"D:\Alice\DLL\Source\");

            // Write the recognized text of every page to one text file.
            // The using block closes the writer even if recognition throws.
            String outputFilePath = @"C:\Output.txt";
            using (StreamWriter writer = new StreamWriter(outputFilePath))
            {
                int pageCount = doc.GetPageCount();
                for (int i = 0; i < pageCount; i++)
                {
                    BasePage page = doc.GetPage(i);
                    // Render the page at 96 dpi (the default resolution). Higher values such as 192 or 288
                    // can help recognition, but very large values use much more memory.
                    using (Bitmap bmp = page.ConvertToImage(96))
                    {
                        OCRPage ocrPage = OCRHandler.Import(bmp);
                        ocrPage.Recognize();
                        writer.WriteLine(ocrPage.GetText());
                    }
                }
            }
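The advice that the resolution "can't be too large" is easier to judge with numbers. The hypothetical helper below estimates the size of the rendered bitmap. It rests on two assumptions that the SDK documentation does not confirm: the value passed to `ConvertToImage` is a DPI, and the bitmap uses 4 bytes per pixel (32 bpp).

```csharp
using System;

// Hypothetical helper for estimating render size. Assumptions: the ConvertToImage argument
// is a DPI value, and the rendered bitmap uses 4 bytes per pixel (32 bpp).
public static class RenderSizeEstimate
{
    // Pixel count along one page edge of the given physical length (in inches) at the given DPI.
    public static int Pixels(double inches, int dpi)
    {
        return (int)Math.Round(inches * dpi);
    }

    // Approximate memory, in bytes, of a 32 bpp bitmap for a page of the given size.
    public static long Bytes32bpp(double widthInches, double heightInches, int dpi)
    {
        long width = Pixels(widthInches, dpi);
        long height = Pixels(heightInches, dpi);
        return width * height * 4;
    }
}
```

Consider a US Letter page (8.5 x 11 in). At 96 dpi the bitmap is 816 x 1056 pixels, about 3.4 MB. At 288 dpi it is 2448 x 3168 pixels, about 31 MB. Tripling the resolution multiplies memory use by nine, so moderate values are usually the better trade-off.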





C# OCR: Extract Text from Specified Tiff Page




            // Open the Tiff file and get the first page (page indexes start at 0).
            String inputFilePath = @"C:\input.tif";
            TIFFDocument doc = new TIFFDocument(inputFilePath);
            BasePage page = doc.GetPage(0);

            // The folder that contains the '.traineddata' files (replace with your own path).
            OCRHandler.SetTrainResourcePath(@"D:\Alice\DLL\Source\");

            // Render at 96 dpi (the default resolution). 192 or 288 can help recognition but uses more memory.
            using (Bitmap bmp = page.ConvertToImage(96))
            {
                OCRPage ocrPage = OCRHandler.Import(bmp);
                ocrPage.Recognize();
                // Save the recognized text to a text file.
                ocrPage.SaveTo(MIMEType.TXT, @"C:\output.txt");
            }





C# OCR: Extract Text from Specified Zone in Tiff Page




            // Placeholder folders: replace with your own paths.
            String defaultSourceFolder = @"D:\Alice\DLL\Source\";  // contains the '.traineddata' files
            String rootFolder = @"C:\";                             // contains the input and output files

            // Set the folder that contains the '.traineddata' files.
            OCRHandler.SetTrainResourcePath(defaultSourceFolder);

            // Set the input and output file paths.
            String inputFilePath = Path.Combine(rootFolder, "Test.tif");
            String outputFilePath = Path.Combine(rootFolder, "Output2.txt");

            // Import the TIFF file.
            OCRDocument doc = OCRHandler.Import(inputFilePath);

            // Get the first page.
            OCRPage page = doc.GetPage(0);

            // Create a zone that starts at point (10, 10) with width 400 and height 300.
            OCRZone pageZone = page.CreateZone(new Rectangle(10, 10, 400, 300));

            // Recognize only the text inside the zone.
            pageZone.Recognize();

            // Save the result to a text file.
            pageZone.SaveTo(MIMEType.TXT, outputFilePath);
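Zone rectangles built from user input, or reused across pages of different sizes, can extend past the page edge. One defensive option is to clip the rectangle to the page bounds before you create the zone. The sketch below uses the standard `System.Drawing.Rectangle.Intersect` method. The `ZoneHelper` class is hypothetical, and the sketch assumes you already know the page width and height in the same units as the zone rectangle. This guide does not show how to read those values from the SDK.

```csharp
using System;
using System.Drawing;

// Hypothetical helper (not part of the RasterEdge SDK): clips a requested OCR zone
// to the page bounds. Assumes pageWidth/pageHeight use the same units as the zone.
public static class ZoneHelper
{
    public static Rectangle ClampToPage(Rectangle requested, int pageWidth, int pageHeight)
    {
        var pageBounds = new Rectangle(0, 0, pageWidth, pageHeight);

        // Intersect returns the overlapping area, or an empty rectangle when there is no overlap.
        Rectangle clipped = Rectangle.Intersect(requested, pageBounds);

        // Reject zones with no usable area (fully outside the page or only touching its edge).
        if (clipped.Width <= 0 || clipped.Height <= 0)
        {
            throw new ArgumentException("The requested zone lies outside the page.");
        }
        return clipped;
    }
}
```

For example, `ZoneHelper.ClampToPage(new Rectangle(10, 10, 400, 300), 300, 200)` returns a rectangle at (10, 10) with width 290 and height 190. You could then pass that rectangle to `CreateZone`.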





C# OCR: Scan Tiff and Output OCR Result to PDF



Add Reference (Extra)


  RasterEdge.XDoc.PDF.dll




            // Open a TIFF file.
            String inputFilePath = @"C:\input.tif";
            TIFFDocument doc = new TIFFDocument(inputFilePath);

            // The folder that contains '.traineddata' files (replace with your own path).
            OCRHandler.SetTrainResourcePath(@"D:\Alice\DLL\Source\");

            // OCR each page into its own in-memory PDF stream.
            int pageCount = doc.GetPageCount();
            Stream[] streams = new Stream[pageCount];
            for (int i = 0; i < pageCount; i++)
            {
                BasePage page = doc.GetPage(i);
                streams[i] = new MemoryStream();
                // Render at 96 dpi (the default resolution). 192 or 288 can help recognition but uses more memory.
                using (Bitmap bmp = page.ConvertToImage(96))
                {
                    OCRPage ocrPage = OCRHandler.Import(bmp);
                    ocrPage.Recognize();
                    ocrPage.SaveTo(MIMEType.PDF, streams[i]);
                }
                // Rewind so that the combine step reads each stream from the beginning.
                streams[i].Seek(0, SeekOrigin.Begin);
            }

            // Merge the per-page PDFs into one output file, then release the streams.
            PDFDocument.CombineDocument(streams, @"C:\output.pdf");
            foreach (Stream s in streams)
                s.Dispose();





C# OCR: Scan Tiff and Output OCR Result to Word




            // Open a TIFF file.
            String inputFilePath = @"C:\input.tif";
            TIFFDocument doc = new TIFFDocument(inputFilePath);

            // The folder that contains '.traineddata' files (replace with your own path).
            OCRHandler.SetTrainResourcePath(@"D:\Alice\DLL\Source\");

            // OCR each page into its own in-memory DOCX stream.
            int pageCount = doc.GetPageCount();
            Stream[] streams = new Stream[pageCount];
            for (int i = 0; i < pageCount; i++)
            {
                BasePage page = doc.GetPage(i);
                streams[i] = new MemoryStream();
                // Render at 96 dpi (the default resolution). 192 or 288 can help recognition but uses more memory.
                using (Bitmap bmp = page.ConvertToImage(96))
                {
                    OCRPage ocrPage = OCRHandler.Import(bmp);
                    ocrPage.Recognize();
                    ocrPage.SaveTo(MIMEType.DOCX, streams[i]);
                }
                // Rewind so that the combine step reads each stream from the beginning.
                streams[i].Seek(0, SeekOrigin.Begin);
            }

            // Merge the per-page Word documents into one output file, then release the streams.
            DOCXDocument.CombineDocument(streams, @"C:\output.docx");
            foreach (Stream s in streams)
                s.Dispose();