
C# PDF Text Search Library
How to search text and get its coordinates from a PDF file using C# .NET


C# guide: how to search text in a PDF document and obtain text content and location information with the .NET PDF control










  • Best Visual Studio .NET PDF document SDK, built on .NET Framework 2.0 and compatible with the Windows operating system
  • Free components and library that integrate easily into .NET WinForms and ASP.NET applications for searching Adobe PDF text from a C# class
  • Supports .NET WinForms, ASP.NET MVC in IIS, ASP.NET Ajax, Azure cloud service, DNN (DotNetNuke), and SharePoint
  • C# class sample code for searching text on specified PDF pages in a .NET console application
  • Finds PDF text and returns its position details in a C#.NET application
  • Searches a specified PDF page or the whole document
  • Supports various search options, such as whole word, ignore case, and match string
  • Searches and replaces PDF text programmatically in ASP.NET






About text search on PDF



Using the XDoc.PDF for .NET SDK, you can easily search for text in a PDF document. You can find and locate text in the following ways:


  1. Search for and find text
  2. Search for and find text using a regular expression
  3. Find all text inside a page region
  4. Find the text character at a given page position




Text search options


Using C#, you can run searches to find specific text items in a PDF file. You can run a simple text search, looking for a search term within a list of PDF pages or within a page region, or you can apply advanced search options to the search. The search options and example C# source code follow:


  1. WholeWord: Finds only occurrences of the complete word. For example, if you search for the word side, the words inside and sides aren't found.
  2. IgnoreCase: When true, capitalization is ignored. For example, if you search for the word White, the words white and WHITE are also found. When false, only occurrences that match the capitalization you provide are found.
  3. ContextExpansion: The number of surrounding characters returned together with the matched text.


RESearchOption searchOps = new RESearchOption();
searchOps.MatchString = "RasterEdge";
searchOps.IgnoreCase = true;
searchOps.WholeWord = false;
searchOps.ContextExpansion = 0;
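To make the effect of the three options concrete, here is a minimal sketch that emulates them on a plain string with the standard .NET `Regex` class. It does not call XDoc.PDF at all: `SearchOptionSketch` and `Find` are hypothetical names, and the assumption that ContextExpansion adds the given number of characters on each side of the match is an illustration only, not confirmed behavior of the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of XDoc.PDF: emulates the options on a plain string.
public static class SearchOptionSketch
{
    // Returns every match together with up to contextExpansion characters
    // on each side (assumed meaning of ContextExpansion).
    public static List<string> Find(string text, string term,
        bool ignoreCase, bool wholeWord, int contextExpansion)
    {
        // Escape the term so it is matched literally.
        string pattern = Regex.Escape(term);
        // Whole-word matching via word boundaries; this assumes the term
        // starts and ends with a letter, digit, or underscore.
        if (wholeWord)
            pattern = @"\b" + pattern + @"\b";

        RegexOptions options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;

        List<string> results = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern, options))
        {
            // Clamp the context window to the bounds of the text.
            int start = Math.Max(0, m.Index - contextExpansion);
            int end = Math.Min(text.Length, m.Index + m.Length + contextExpansion);
            results.Add(text.Substring(start, end - start));
        }
        return results;
    }
}
```

For the text "RasterEdge and rasteredge, RasterEdgeSDK", calling `Find` with ignoreCase = true and wholeWord = false yields three matches; switching wholeWord to true drops the one inside RasterEdgeSDK, and additionally switching ignoreCase to false leaves only the first.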




Text search with regular expression


In C#, you can run an advanced text search with a regular expression. The following C# example source code defines a search pattern for URLs.



//  Search pattern for URL
String pattern = @"\b(\S+)://(\S+)\b";
RegexOptions regexOps = RegexOptions.IgnoreCase;
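Before running the pattern against a PDF, it can help to check what it matches on an ordinary string. The sketch below uses only the standard .NET `Regex` class with the same pattern and option as the snippet above; `UrlPatternSketch` and `FindUrls` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: applies the URL pattern to a plain string.
public static class UrlPatternSketch
{
    // Same pattern as in the snippet above: scheme, "://", then the rest.
    private const string Pattern = @"\b(\S+)://(\S+)\b";

    public static List<string> FindUrls(string text)
    {
        List<string> urls = new List<string>();
        foreach (Match m in Regex.Matches(text, Pattern, RegexOptions.IgnoreCase))
            urls.Add(m.Value);   // m.Groups[1] is the scheme, m.Groups[2] the remainder
        return urls;
    }
}
```

Because the pattern ends with `\b`, a match always stops at a word character, so a trailing slash or full stop is not part of the result: for "Visit http://www.rasteredge.com/ for details." the single match is http://www.rasteredge.com.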








Using C#, do text search on PDF document


This section explains how to search for text in a whole PDF document, on a specified page, or within a page region.







C# search text from whole pdf document


The C# code below shows how to do a text search on a pdf document.



        #region search text from pdf document
        internal static void searchTextFromDocument()
        {
            String inputFilePath = @"C:\demo.pdf";
            // Open a document.
            PDFDocument doc = new PDFDocument(inputFilePath);
            // Set the search options
            RESearchOption option = new RESearchOption();
            option.IgnoreCase = true;
            option.WholeWord = true;
            option.ContextExpansion = 10;

            // Search text and save it to SearchResult.
            SearchResult results = doc.Search("RasterEdge", option);
        }
        #endregion






C# search text from specified pdf page


The C# code below shows how to do a text search on a pdf page.



String inputFilePath = @"C:\1.pdf";

//  Open file
PDFDocument doc = new PDFDocument(inputFilePath);

//  Search text "RasterEdge"
String matchString = "RasterEdge";
//  Set search option
RESearchOption searchOps = new RESearchOption();
searchOps.MatchString = matchString;
searchOps.IgnoreCase = true;
searchOps.WholeWord = false;
searchOps.ContextExpansion = 10;
//  Set search range (on the first page)
int pageOffset = 0;
int pageCount = 1;

//  Apply searching
SearchResult result = doc.Search(matchString, searchOps, pageOffset, pageCount);

//  Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.Result)
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine("  {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}






C# search text from consecutive pdf pages


The C# code below shows how to do a text search on a range of PDF pages.



String inputFilePath = @"C:\1.pdf";

//  Open file
PDFDocument doc = new PDFDocument(inputFilePath);

//  Search text "RasterEdge"
String matchString = "RasterEdge";
//  Set search option
RESearchOption searchOps = new RESearchOption();
searchOps.MatchString = matchString;
searchOps.IgnoreCase = true;
searchOps.WholeWord = false;
searchOps.ContextExpansion = 10;
//  Set search page range (from page 1 to 3)
int pageOffset = 0;
int pageCount = 3;

//  Apply searching
SearchResult result = doc.Search(matchString, searchOps, pageOffset, pageCount);

//  Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.Result)
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine("  {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}
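The examples above express the range as a page offset plus a page count, with "page 1 to 3" written as offset 0 and count 3. If your own code works with 1-based first and last page numbers, a small conversion helper avoids off-by-one mistakes. This is a sketch under that reading of the two parameters; `PageRangeSketch` and `ToOffsetAndCount` are hypothetical names, not part of the SDK.

```csharp
using System;

// Hypothetical helper: converts 1-based page numbers to offset and count.
public static class PageRangeSketch
{
    // firstPage and lastPage are 1-based and inclusive.
    public static void ToOffsetAndCount(int firstPage, int lastPage,
        out int pageOffset, out int pageCount)
    {
        if (firstPage < 1 || lastPage < firstPage)
            throw new ArgumentOutOfRangeException("firstPage",
                "Expected 1 <= firstPage <= lastPage.");

        pageOffset = firstPage - 1;            // zero-based index of the first page
        pageCount = lastPage - firstPage + 1;  // number of pages in the range
    }
}
```

Calling it with 1 and 3 gives the offset 0 and count 3 used above; the resulting values can then be passed as the last two arguments of the Search call.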






Search text from the specified page region


The C# code below shows how to do a text search within a region of a PDF page.



String inputFilePath = @"C:\1.pdf";

//  Open file
PDFDocument doc = new PDFDocument(inputFilePath);

//  Search text "RasterEdge"
String matchString = "RasterEdge";
//  Set search option
RESearchOption searchOps = new RESearchOption();
searchOps.MatchString = matchString;
searchOps.IgnoreCase = true;
searchOps.WholeWord = false;
searchOps.ContextExpansion = 10;
//  Set target page region in the 1st page.
int pageIndex = 0;
//  Region: start point (0,0), width = 500, height = 300. Unit: pixel (at 96 dpi).
RectangleF pageRegion = new RectangleF(0, 0, 500, 300);

//  Apply searching
SearchResult result = doc.Search(matchString, searchOps, pageIndex, pageRegion);

//  Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.Result)
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine("  {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}
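The region above is given in pixels at 96 dpi, whereas PDF page sizes are usually quoted in points (1 point = 1/72 inch). The following sketch converts between the two so you can size a region against a known page size. It is plain arithmetic and does not use any XDoc.PDF call; `UnitSketch` and its methods are hypothetical names.

```csharp
using System;

// Hypothetical helper: converts between 96-dpi pixels and PDF points.
public static class UnitSketch
{
    private const float PixelsPerInch = 96f;  // resolution stated in the comment above
    private const float PointsPerInch = 72f;  // 1 PDF point = 1/72 inch

    public static float PixelsToPoints(float pixels)
    {
        return pixels * PointsPerInch / PixelsPerInch;
    }

    public static float PointsToPixels(float points)
    {
        return points * PixelsPerInch / PointsPerInch;
    }
}
```

For instance, the 500 x 300 pixel region corresponds to 375 x 225 points, and a US Letter page of 612 x 792 points is 816 x 1056 pixels at 96 dpi, so the region covers only part of such a page.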








Using C#, do text search with regular expression on PDF document


This section explains how to search for text with a regular expression in a whole PDF document, on specified pages, or within a page region.







Search text with regular expression from the specified page(s)


The C# code below shows how to do a text search with regular expression on pdf pages.



String inputFilePath = @"C:\1.pdf";

//  Open file
PDFDocument doc = new PDFDocument(inputFilePath);

//  Search pattern for URL
String pattern = @"\b(\S+)://(\S+)\b";
RegexOptions regexOps = RegexOptions.IgnoreCase;
//  Set search range (from page 1 to 3)
int pageOffset = 0;
int pageCount = 3;

//  Apply searching
MatchResult result = doc.Search(pattern, regexOps, pageOffset, pageCount);

//  Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.GetResult())
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine("  {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}
else
    Console.WriteLine("No Matched Item");






Search text with regular expression from specified page region


The C# code below shows how to do a text search with regular expression on a pdf page region.



String inputFilePath = @"C:\1.pdf";

//  Open file
PDFDocument doc = new PDFDocument(inputFilePath);

//  Search pattern for URL
String pattern = @"\b(\S+)://(\S+)\b";
RegexOptions regexOps = RegexOptions.IgnoreCase;
//  Set target page region in the 1st page.
int pageIndex = 0;
//  Region: start point (0,0), width = 500, height = 300. Unit: pixel (at 96 dpi).
RectangleF pageRegion = new RectangleF(0, 0, 500, 300);

//  Apply searching
MatchResult result = doc.Search(pattern, regexOps, pageIndex, pageRegion);

//  Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.GetResult())
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine("  {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}
else
    Console.WriteLine("No Matched Item");




C# search and replace text from pdf document


The C# code below shows how to search for and replace text in a PDF. To learn more about the text replace function for PDF, see http://www.rasteredge.com/how-to/csharp-imaging/pdf-text-edit-replace/.



        #region search and replace text from pdf document
        internal static void searchAndReplaceTextFromDocument()
        {
            String inputFilePath = @"C:\demo.pdf";
            // Open a document.
            PDFDocument doc = new PDFDocument(inputFilePath);

            // Set the search options.
            RESearchOption option = new RESearchOption();
            option.IgnoreCase = true;
            option.WholeWord = true;
            option.ContextExpansion = 10;

            // Replace "RasterEdge" with "Image".
            doc.Replace("RasterEdge", "Image", option);
            doc.Save(@"C:\output.pdf");
        }
        #endregion