
C# PDF Text Search Library
How to search text and get its coordinates in a PDF file using C# .NET


A C# guide to searching text with regular expressions in a PDF document and obtaining the search results with coordinates, for C# ASP.NET and Windows applications



In this C# tutorial, you will learn how to search text in a PDF file in C# ASP.NET Core, MVC, Web, and Windows applications.

  • Search specified text in PDF document, pages, page regions
  • Search horizontal or vertical text using regular expressions
  • Search and get the coordinates of text search results in a PDF document
  • Search, find, and replace or mark up text within a PDF document using the C# .NET API for .NET Core and .NET Framework.

How to search text and get its coordinates in PDFs programmatically using C#

  1. Download XDoc.PDF Text Reader C# library
  2. Install C# library to search text in PDF document
  3. Step by Step Tutorial




















  • A Visual Studio .NET PDF document SDK, built on .NET Framework 2.0 and compatible with Windows operating systems
  • C# PDF text library: extract text from PDF, replace text in PDF, remove text from PDF, remove images from PDF, extract images from PDF, and add images to PDF in C#
  • Free components and libraries that are easy to integrate into .NET WinForms and ASP.NET applications for searching PDF text from a C# class
  • Supports .NET Core, ASP.NET Core MVC, .NET WinForms, ASP.NET MVC in IIS, ASP.NET Ajax, Azure cloud service, DNN (DotNetNuke), and SharePoint
  • C# class sample code for searching text in specified PDF pages in a .NET console application
  • Finds PDF text and returns its position details in a C#.NET application
  • Searches a specified PDF page or the whole document
  • Supports various search options, such as whole word, ignore case, and match string
  • Searches and replaces PDF text in ASP.NET programmatically






About text search on PDF



Using the XDoc.PDF for .NET SDK, you can easily run a text search on a PDF document. You can find and locate text through the following methods:


  1. Search for and find plain text
  2. Search for and find text using a regular expression
  3. Find all text inside a page region
  4. Find the text character at a given page position




Text search options


Using C#, you can run searches to find specific text items in a PDF file. You can run a simple text search that looks for a search term within a list of PDF pages or within a page region, or you can use the advanced search options below to refine the search. The search options and example C# source code:


  1. WholeWord: Finds only occurrences of the complete word. For example, if you search for the word side, the occurrence within the longer word inside isn't found.
  2. IgnoreCase: When true, finds occurrences regardless of capitalization. For example, if you search for the word White, the words white and WHITE are also found. When false, only occurrences that match the capitalization you provide are found.
  3. ContextExpansion: The number of surrounding characters returned together with the matched text (reported as the context string of each result)


RESearchOption searchOps = new RESearchOption();
searchOps.MatchString = "RasterEdge";
searchOps.IgnoreCase = true;
searchOps.WholeWord = false;
searchOps.ContextExpansion = 0;
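
To make the meaning of these options concrete, the sketch below mimics them on a plain string with the standard .NET Regex class. It is not XDoc.PDF code: the class SearchOptionSketch and its helpers Build and WithContext are hypothetical names introduced only for this illustration, and the assumption that ContextExpansion counts characters on each side of the match is ours, not something the library documentation above confirms.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustration only: mimics WholeWord / IgnoreCase / ContextExpansion on a plain
// string. This is NOT how XDoc.PDF implements its search internally.
public static class SearchOptionSketch
{
    // Hypothetical helper: builds a regex that behaves like the two boolean options.
    public static Regex Build(string term, bool wholeWord, bool ignoreCase)
    {
        // Escape the term so characters such as '.' are matched literally.
        string pattern = Regex.Escape(term);
        if (wholeWord)
            pattern = @"\b" + pattern + @"\b"; // require word boundaries on both sides
        RegexOptions ops = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
        return new Regex(pattern, ops);
    }

    // Hypothetical helper: returns the match plus up to `expansion` characters
    // on each side (assumed meaning of ContextExpansion).
    public static string WithContext(string text, Match m, int expansion)
    {
        int start = Math.Max(0, m.Index - expansion);
        int end = Math.Min(text.Length, m.Index + m.Length + expansion);
        return text.Substring(start, end - start);
    }

    public static void Main()
    {
        string text = "The White House has a white fence inside.";
        Console.WriteLine(Build("white", false, true).Matches(text).Count);  // 2
        Console.WriteLine(Build("white", false, false).Matches(text).Count); // 1
        Console.WriteLine(Build("side", true, true).Matches(text).Count);    // 0
        Console.WriteLine(Build("side", false, true).Matches(text).Count);   // 1
        Match m = Build("fence", true, true).Match(text);
        Console.WriteLine(WithContext(text, m, 6));
    }
}
```

With WholeWord enabled, "side" no longer matches inside "inside", and with IgnoreCase enabled, "white" matches both capitalizations.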




Text search with regular expression


In C#, you can run an advanced text search with a regular expression. The following C# example source code defines a pattern that searches for URLs.



//  Search pattern for URL
String pattern = @"\b(\S+)://(\S+)\b";
RegexOptions regexOps = RegexOptions.IgnoreCase;
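
Before applying the pattern to a PDF, it can help to check what it matches on an ordinary string. The following is a minimal sketch that uses only the standard .NET Regex class; the class UrlPatternSketch and the method FindUrls are hypothetical names for this illustration and are not part of XDoc.PDF.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustration only: tries the URL pattern from this tutorial on a plain string.
public static class UrlPatternSketch
{
    // Same pattern as above: a scheme, "://", then the rest of the URL.
    public const string Pattern = @"\b(\S+)://(\S+)\b";

    // Hypothetical helper: returns every substring matched by the pattern.
    public static string[] FindUrls(string text)
    {
        MatchCollection matches = Regex.Matches(text, Pattern, RegexOptions.IgnoreCase);
        string[] urls = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            urls[i] = matches[i].Value;
        return urls;
    }

    public static void Main()
    {
        string text = "See https://www.rasteredge.com and FTP://example.org/files today.";
        foreach (string url in FindUrls(text))
            Console.WriteLine(url);
        // Prints:
        //   https://www.rasteredge.com
        //   FTP://example.org/files
    }
}
```

Because the pattern ends with \b, trailing punctuation such as a closing slash or period is left out of the match, which is worth keeping in mind when comparing matched strings.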




Get text search results coordinates


After you run a text search on a PDF file using C#, you get a list of SearchResultItem objects. Each SearchResultItem object has a CombinedResultArea property, a collection of SearchResultLocation objects. Each location carries the PageIndex of the match and an Area holding the text coordinates:

  1. Area.X: the X value of the top-left point of the text area on the PDF page
  2. Area.Y: the Y value of the top-left point of the text area on the PDF page
  3. Area.Width: the width of the text area
  4. Area.Height: the height of the text area


//  Apply searching
SearchResult result = doc.Search(matchString, searchOps, pageOffset, pageCount);

//  Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.Result)
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine("  {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}
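
The coordinates above are printed in pixels, measured from the top-left corner of the page. If you need them in native PDF units (points, 72 per inch, with the origin at the bottom-left of the page), a conversion along the following lines may help. This is a sketch under stated assumptions: it assumes ToPixel() reports values at 96 dpi, the same resolution this tutorial states for page regions, and the class UnitSketch and its methods are hypothetical helpers, not XDoc.PDF API.

```csharp
using System;

// Illustration only: converts between 96 dpi pixels (top-left origin) and
// PDF points (72 per inch, bottom-left origin). The 96 dpi value is an assumption.
public static class UnitSketch
{
    public const float PixelsPerInch = 96f; // assumed resolution of ToPixel()
    public const float PointsPerInch = 72f; // PDF default user space unit

    public static float PixelsToPoints(float px) => px * PointsPerInch / PixelsPerInch;

    public static float PointsToPixels(float pt) => pt * PixelsPerInch / PointsPerInch;

    // Returns the bottom edge of a text area in PDF coordinates (points),
    // given its top-left Y and height in pixels and the page height in points.
    public static float TopLeftYToPdfY(float yPx, float heightPx, float pageHeightPt)
        => pageHeightPt - PixelsToPoints(yPx + heightPx);

    public static void Main()
    {
        Console.WriteLine(PixelsToPoints(96f));              // 72
        Console.WriteLine(PointsToPixels(72f));              // 96
        // A 48 px tall area starting 96 px from the top of a 792 pt (US Letter) page.
        Console.WriteLine(TopLeftYToPdfY(96f, 48f, 792f));   // 684
    }
}
```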








Using C#, do text search on PDF document


This section explains how to run a text search on a whole PDF document, a specified page, or a page region.







C# search text from whole pdf document


The C# code below shows how to do a text search on a pdf document.



        #region search text from pdf document
        internal static void searchTextFromDocument()
        {
            String inputFilePath = @"C:\demo.pdf";
            // Open a document.
            PDFDocument doc = new PDFDocument(inputFilePath);
            // Set the search options
            RESearchOption option = new RESearchOption();
            option.IgnoreCase = true;
            option.WholeWord = true;
            option.ContextExpansion = 10;

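            // Search the whole document and keep the matches in a SearchResult.
            SearchResult results = doc.Search("RasterEdge", option);
            // Report whether anything was found.
            Console.WriteLine("Matched: {0}", results.HaveMatched);
        }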
        #endregion






C# search text from specified pdf page


The C# code below shows how to do a text search on a pdf page.



String inputFilePath = @"C:\1.pdf";

//  Open file
PDFDocument doc = new PDFDocument(inputFilePath);

//  Search text "RasterEdge"
String matchString = "RasterEdge";
//  Set search option
RESearchOption searchOps = new RESearchOption();
searchOps.MatchString = matchString;
searchOps.IgnoreCase = true;
searchOps.WholeWord = false;
searchOps.ContextExpansion = 10;
//  Set search range (on the first page)
int pageOffset = 0;
int pageCount = 1;

//  Apply searching
SearchResult result = doc.Search(matchString, searchOps, pageOffset, pageCount);

//  Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.Result)
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine("  {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}






C# search text from consecutive pdf pages


The C# code below shows how to run a text search on a range of consecutive PDF pages.



String inputFilePath = @"C:\1.pdf";

//  Open file
PDFDocument doc = new PDFDocument(inputFilePath);

//  Search text "RasterEdge"
String matchString = "RasterEdge";
//  Set search option
RESearchOption searchOps = new RESearchOption();
searchOps.MatchString = matchString;
searchOps.IgnoreCase = true;
searchOps.WholeWord = false;
searchOps.ContextExpansion = 10;
//  Set search page range (from page 1 to 3)
int pageOffset = 0;
int pageCount = 3;

//  Apply searching
SearchResult result = doc.Search(matchString, searchOps, pageOffset, pageCount);

//  Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.Result)
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine("  {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}
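
In the examples above, pageOffset is the zero-based index of the first page and pageCount is the number of pages to search, so "pages 1 to 3" becomes offset 0 and count 3. If you prefer to think in 1-based page numbers, a small helper such as the one sketched below can do the arithmetic. PageRangeSketch and FromPageNumbers are hypothetical names used for illustration; they are not part of XDoc.PDF.

```csharp
using System;

// Illustration only: converts a 1-based inclusive page range into the
// zero-based offset and page count used by the search calls in this tutorial.
public static class PageRangeSketch
{
    public static (int pageOffset, int pageCount) FromPageNumbers(int firstPage, int lastPage)
    {
        if (firstPage < 1)
            throw new ArgumentOutOfRangeException(nameof(firstPage), "Page numbers start at 1.");
        if (lastPage < firstPage)
            throw new ArgumentOutOfRangeException(nameof(lastPage), "Last page must not precede first page.");
        // Offset is zero-based; count includes both end pages.
        return (firstPage - 1, lastPage - firstPage + 1);
    }

    public static void Main()
    {
        var range = FromPageNumbers(1, 3);
        Console.WriteLine("{0}, {1}", range.pageOffset, range.pageCount); // 0, 3
    }
}
```

The two values can then be passed as the pageOffset and pageCount arguments shown above.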






Search text from the specified page region


The C# code below shows how to run a text search on a PDF page region.



String inputFilePath = @"C:\1.pdf";

//  Open file
PDFDocument doc = new PDFDocument(inputFilePath);

//  Search text "RasterEdge"
String matchString = "RasterEdge";
//  Set search option
RESearchOption searchOps = new RESearchOption();
searchOps.MatchString = matchString;
searchOps.IgnoreCase = true;
searchOps.WholeWord = false;
searchOps.ContextExpansion = 10;
//  Set target page region in the 1st page.
int pageIndex = 0;
//  Region: start point (0,0), width = 500, height = 300. Unit: pixel (in 96 dpi).
RectangleF pageRegion = new RectangleF(0, 0, 500, 300);

//  Apply searching
SearchResult result = doc.Search(matchString, searchOps, pageIndex, pageRegion);

//  Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.Result)
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine("  {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}
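
Whether a match that only partly overlaps the region is reported is not stated here, so you may want to check the returned areas yourself. The sketch below shows one way to do that with the standard System.Drawing.RectangleF type, assuming you have already copied each result area's pixel values into a RectangleF. RegionSketch, IsInside, and Touches are hypothetical names for illustration only.

```csharp
using System;
using System.Drawing;

// Illustration only: client-side checks of a result rectangle against a page
// region, both expressed in pixels. Not part of XDoc.PDF.
public static class RegionSketch
{
    // True when the hit lies completely within the region.
    public static bool IsInside(RectangleF region, RectangleF hit) => region.Contains(hit);

    // True when the hit overlaps the region at all.
    public static bool Touches(RectangleF region, RectangleF hit) => region.IntersectsWith(hit);

    public static void Main()
    {
        RectangleF region = new RectangleF(0, 0, 500, 300);
        RectangleF inside = new RectangleF(100, 100, 40, 12);
        RectangleF straddling = new RectangleF(480, 100, 40, 12); // crosses the right edge

        Console.WriteLine(IsInside(region, inside));      // True
        Console.WriteLine(IsInside(region, straddling));  // False
        Console.WriteLine(Touches(region, straddling));   // True
    }
}
```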








Using C#, do text search with regular expression on PDF document


This section explains how to run a text search with a regular expression on a whole PDF document, specified pages, or a page region.







Search text with regular expression from the specified page(s)


The C# code below shows how to do a text search with regular expression on pdf pages.



String inputFilePath = @"C:\1.pdf";

//  Open file
PDFDocument doc = new PDFDocument(inputFilePath);

//  Search pattern for URL
String pattern = @"\b(\S+)://(\S+)\b";
RegexOptions regexOps = RegexOptions.IgnoreCase;
//  Set search range (from page 1 to 3)
int pageOffset = 0;
int pageCount = 3;

//  Apply searching
MatchResult result = doc.Search(pattern, regexOps, pageOffset, pageCount);

//  Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.GetResult())
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine("  {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}
else
    Console.WriteLine("No Matched Item");






Search text with regular expression from specified page region


The C# code below shows how to do a text search with regular expression on a pdf page region.



String inputFilePath = @"C:\1.pdf";

//  Open file
PDFDocument doc = new PDFDocument(inputFilePath);

//  Search pattern for URL
String pattern = @"\b(\S+)://(\S+)\b";
RegexOptions regexOps = RegexOptions.IgnoreCase;
//  Set target page region in the 1st page.
int pageIndex = 0;
//  Region: start point (0,0), width = 500, height = 300. Unit: pixel (in 96 dpi).
RectangleF pageRegion = new RectangleF(0, 0, 500, 300);

//  Apply searching
MatchResult result = doc.Search(pattern, regexOps, pageIndex, pageRegion);

//  Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.GetResult())
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine("  {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}
else
    Console.WriteLine("No Matched Item");




C# search and replace text from pdf document


The C# code below shows how to search and replace text in a PDF. To learn more about the text replace function in PDF, see https://www.rasteredge.com/how-to/csharp-imaging/pdf-text-edit-replace/.



        #region search and replace text from pdf document
        internal static void searchAndReplaceTextFromDocument()
        {
            String inputFilePath = @"C:\demo.pdf";
            // Open a document.
            PDFDocument doc = new PDFDocument(inputFilePath);

            // Set the search options.
            RESearchOption option = new RESearchOption();
            option.IgnoreCase = true;
            option.WholeWord = true;
            option.ContextExpansion = 10;

            // Replace "RasterEdge" with "Image".
            doc.Replace("RasterEdge", "Image", option);
            doc.Save(@"C:\output.pdf");
        }
        #endregion