C# TIFF Image Library
Extract Text from TIFF File in C#.NET
Complete C# .NET Tutorial for How to Extract Text from TIFF File
C# Extract Text from TIFF File Overview
Using RasterEdge XDoc.Tiff for .NET together with the .NET OCR SDK, C# programmers can implement high-performance text extraction from TIFF image files. The SDKs provide mature, reliable .NET APIs for extracting text from TIFF files in Visual C# .NET projects. The text content, style, and format of the original TIFF image can be retained during extraction.
By integrating these .NET SDKs, C# users can add text extraction to a .NET TIFF image processing application. If you have already added the respective DLL assemblies to your C# project as references, you can run a quick test with the following C# sample code.
C# Code to Extract Certain Page Text from Multi-page TIFF
The following C# example extracts the text of the first page of a multi-page TIFF file and saves the result as a text file. You may also render it to a PDF, Word or SVG file.
// Set the training data path. Put eng.traineddata (for English) in the folder specified.
OCRHandler.SetTrainResourcePath(@"D:\Alice\DLL\Source\");
// Set supported language. You can also set this attribute in OCRPage or OCRZone.
OCRHandler.Settings.LanguagesEnabled.Add(Language.Eng);
// Load Tiff document.
TIFFDocument doc = new TIFFDocument(@"C:\demo1.tif");
// Load the first page to recognize (page indexes start at 0).
TIFFPage page = (TIFFPage)doc.GetPage(0);
// Import the page to recognize.
OCRPage oPage = OCRHandler.Import(page);
oPage.Recognize();
String outputTxt = @"C:\tiffpage0.txt";
// Save the OCR result in another document format, such as txt, pdf, or svg.
oPage.SaveTo(MIMEType.TXT, outputTxt);
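The sample above assumes that eng.traineddata is already in the folder passed to SetTrainResourcePath. As a minimal sketch, the helper below checks for that file before OCR starts. TrainDataCheck and its methods are hypothetical names, not part of the RasterEdge SDK; the code uses only System.IO, and the file naming pattern (language code plus .traineddata) is an assumption inferred from the eng.traineddata example.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the RasterEdge SDK.
public static class TrainDataCheck
{
    // Builds the expected path of the training file for a language code,
    // e.g. "eng" -> <folder>/eng.traineddata (naming pattern is an assumption).
    public static string GetTrainedDataPath(string folder, string languageCode)
    {
        if (folder == null) throw new ArgumentNullException(nameof(folder));
        if (languageCode == null) throw new ArgumentNullException(nameof(languageCode));
        return Path.Combine(folder, languageCode + ".traineddata");
    }

    // Returns true only when the folder exists and contains the training file.
    public static bool HasTrainedData(string folder, string languageCode)
    {
        return Directory.Exists(folder)
            && File.Exists(GetTrainedDataPath(folder, languageCode));
    }
}
```

A caller might test TrainDataCheck.HasTrainedData(@"D:\Alice\DLL\Source\", "eng") and report a clear error when it returns false, rather than letting recognition fail later.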
C# Code to Extract Certain Page Text from Multi-page TIFF and Save to PDF
The following C# example extracts the text of the first page of a multi-page TIFF file and saves the result as a PDF file. You may also render it to a text, Word or SVG file.
// Set the training data path. Put eng.traineddata (for English) in the folder specified.
OCRHandler.SetTrainResourcePath(@"D:\Alice\DLL\Source\");
// Set supported language. You can also set this attribute in OCRPage or OCRZone.
OCRHandler.Settings.LanguagesEnabled.Add(Language.Eng);
// Load Tiff document.
TIFFDocument doc = new TIFFDocument(@"C:\demo1.tif");
// Load the first page to recognize (page indexes start at 0).
TIFFPage page = (TIFFPage)doc.GetPage(0);
// Import the page to recognize.
OCRPage oPage = OCRHandler.Import(page);
oPage.Recognize();
String outputPdf = @"C:\tiffpage0.pdf";
// Save the OCR result in another document format, such as txt, pdf, or svg.
oPage.SaveTo(MIMEType.PDF, outputPdf);
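Both single-page samples pass 0 to GetPage to select the first page. To extract some other page, the index has to stay within the document's page count. The sketch below is one way to convert a 1-based page number into a validated zero-based index; PageIndexGuard is a hypothetical helper, not an SDK class, and it assumes the page count comes from the document's GetPageCount method used in the multi-page sample.

```csharp
using System;

// Hypothetical helper, not part of the RasterEdge SDK.
public static class PageIndexGuard
{
    // Converts a 1-based page number into the zero-based index expected by
    // GetPage, throwing if the number falls outside 1..pageCount.
    public static int ToZeroBasedIndex(int pageNumber, int pageCount)
    {
        if (pageCount <= 0)
        {
            throw new ArgumentOutOfRangeException(
                nameof(pageCount), "The document has no pages.");
        }
        if (pageNumber < 1 || pageNumber > pageCount)
        {
            throw new ArgumentOutOfRangeException(
                nameof(pageNumber),
                "Page number must be between 1 and " + pageCount + ".");
        }
        return pageNumber - 1;
    }
}
```

For example, PageIndexGuard.ToZeroBasedIndex(3, doc.GetPageCount()) yields the index to pass to doc.GetPage for the third page, or throws before any OCR work begins.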
C# Code to Extract Text from Multi-page TIFF Document and Save to PDF
The following C# example extracts text from every page of a multi-page TIFF file and saves the combined result as a single PDF file. You may also render it to a text, Word or SVG file.
// The folder that contains '.traineddata' files.
OCRHandler.SetTrainResourcePath(@"D:\Alice\DLL\Source\");
// Set input file path.
String inputFilePath = @"C:\input.tif";
// Set output file path.
String outputFilePath = @"C:\Output.pdf";
TIFFDocument tiff = new TIFFDocument(inputFilePath);
int pageCount = tiff.GetPageCount();
MemoryStream[] stream = new MemoryStream[pageCount];
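for (int i = 0; i < pageCount; i++)
{
    BasePage page = tiff.GetPage(i);
    // Rendered bitmap of the page; it is not used by the OCR calls below.
    Bitmap bmp = page.ConvertToImage();
    // Recognize the page and write its result to an in-memory PDF.
    OCRPage ocrPage = OCRHandler.Import(page);
    ocrPage.Recognize();
    stream[i] = new MemoryStream();
    ocrPage.SaveTo(MIMEType.PDF, stream[i]);
    // Rewind so the stream can be read from the start when combining.
    stream[i].Seek(0, SeekOrigin.Begin);
}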
PDFDocument.CombineDocument(stream, outputFilePath);
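The multi-page sample leaves one MemoryStream per page open after the PDFs are combined. As a hedged sketch, the helper below closes them once they are no longer needed. StreamCleanup is a hypothetical name, not an SDK class, and the code relies only on System.IO.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the RasterEdge SDK.
public static class StreamCleanup
{
    // Disposes every non-null stream in the array and clears its slot,
    // so calling the method twice is harmless.
    public static void DisposeAll(MemoryStream[] streams)
    {
        if (streams == null)
        {
            return;
        }
        for (int i = 0; i < streams.Length; i++)
        {
            if (streams[i] != null)
            {
                streams[i].Dispose();
                streams[i] = null;
            }
        }
    }
}
```

One way to use it is to wrap the page loop and the CombineDocument call in a try block and call StreamCleanup.DisposeAll(stream) in the finally block, so the streams are released even if recognition fails partway through.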