C# PDF Text Search Library
How to get and search text with coordinates in a PDF file using C# .NET
C# guide to searching text with regular expressions in a PDF document and obtaining the search results with coordinates in C# ASP.NET and Windows applications
In this C# tutorial, you will learn how to search text in a PDF file in C# ASP.NET Core, MVC, Web, and Windows applications.
- Search for specified text in a PDF document, pages, or page regions
- Search horizontal or vertical text using regular expressions
- Search and get the coordinates of text search results in a PDF document
- Search, find, and replace or mark up text within a PDF document using the C# .NET API for .NET Core and .NET Framework
How to search for text and get its coordinates in PDFs programmatically using C#
- Visual Studio .NET PDF document SDK, built on .NET Framework 2.0 and compatible with the Windows operating system
- Related C# PDF text library topics: extract text from PDF, replace text in PDF, remove text from PDF, remove images from PDF, extract images from PDF, add images to PDF
- Free components and libraries integrate easily into .NET WinForms and ASP.NET applications for searching PDF text from a C# class
- Support .NET Core, ASP.NET Core MVC, .NET WinForms, ASP.NET MVC in IIS, ASP.NET Ajax, Azure cloud service, DNN (DotNetNuke), SharePoint
- C# class sample code for searching text from specified PDF pages in .NET console application
- Find PDF text and get its position details in a C#.NET application
- Search a specified PDF page or the whole document
- Search a PDF file with various search options, such as whole word, ignore case, and match string
- Search and replace PDF text in ASP.NET programmatically
About text search on PDF
Using the XDoc.PDF for .NET SDK, you can easily search text in a PDF document. You can find and locate text with the following methods:
- Search for and find text
- Use a regular expression to search for and find text
- Find all text inside a page region
- Find the text character at a page position
Text search options
Using C#, you can run searches to find specific text in a PDF file. You can run a simple text search, looking for a search term within a list of PDF pages or a page region.
Or you can use advanced search options when searching a PDF document. The search options, followed by example C# source code:
- WholeWord: Finds only occurrences of the complete word. For example, if you search for the word side, the word inside isn't found.
- IgnoreCase: When true, capitalization is ignored, so a search for the word White also finds white and WHITE. When false, only occurrences that match the capitalization you provide are found.
- ContextExpansion: The number of surrounding characters returned as context along with the matched text
RESearchOption searchOps = new RESearchOption();
searchOps.MatchString = "RasterEdge";
searchOps.IgnoreCase = true;
searchOps.WholeWord = false;
searchOps.ContextExpansion = 0;
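To make the three options concrete, here is a minimal sketch that mimics them on a plain string with the standard .NET Regex class. It is an illustration of the semantics described above, not how XDoc.PDF implements them; the class SearchOptionSketch and its Find method are hypothetical names introduced for this example, and the assumption that ContextExpansion widens the match equally on both sides is mine.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of XDoc.PDF: mimics the options on a plain string.
public static class SearchOptionSketch
{
    // Returns (matched text, context text) pairs for every occurrence of term in text.
    public static List<(string Matched, string Context)> Find(
        string text, string term, bool ignoreCase, bool wholeWord, int contextExpansion)
    {
        // Escape the term so it is matched literally, then optionally require word boundaries.
        string pattern = Regex.Escape(term);
        if (wholeWord)
            pattern = @"\b" + pattern + @"\b";

        RegexOptions options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;

        var results = new List<(string Matched, string Context)>();
        foreach (Match m in Regex.Matches(text, pattern, options))
        {
            // Assumption: context widens the match by contextExpansion characters per side,
            // clamped to the bounds of the text.
            int start = Math.Max(0, m.Index - contextExpansion);
            int end = Math.Min(text.Length, m.Index + m.Length + contextExpansion);
            results.Add((m.Value, text.Substring(start, end - start)));
        }
        return results;
    }
}
```

For instance, calling SearchOptionSketch.Find("White, white and WHITE", "White", true, false, 0) finds all three spellings, while passing false for ignoreCase finds only the first.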
Text search with regular expression
In C#, you can run an advanced text search with a regular expression. The following C# example source code defines a search pattern for URLs.
// Search pattern for URL
String pattern = @"\b(\S+)://(\S+)\b";
RegexOptions regexOps = RegexOptions.IgnoreCase;
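Before running the pattern against a PDF, it can help to see what it captures on ordinary text. The sketch below applies the same pattern with the standard System.Text.RegularExpressions API; the class UrlPatternSketch and its FindUrls method are hypothetical names for this example and are not part of the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: runs the URL pattern from this guide on a plain string.
public static class UrlPatternSketch
{
    // Same pattern as in the snippet above.
    public const string Pattern = @"\b(\S+)://(\S+)\b";

    // Returns the two captured groups (text before and after "://") for each match.
    public static List<(string Scheme, string Rest)> FindUrls(string text)
    {
        var found = new List<(string Scheme, string Rest)>();
        foreach (Match m in Regex.Matches(text, Pattern, RegexOptions.IgnoreCase))
            found.Add((m.Groups[1].Value, m.Groups[2].Value));
        return found;
    }
}
```

Note that the pattern is deliberately loose: it accepts any run of non-whitespace characters on either side of "://", and the trailing \b makes a match end on a word character, so closing punctuation such as a final period or slash is left out.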
Get text search results coordinates
After you run a text search on a PDF file using C#, you get a list of SearchResultItem objects. Each SearchResultItem object has a CombinedResultArea property, which contains the text coordinate information as one or more SearchResultLocation entries, each with a PageIndex and an Area:
- Area.X: X value of the top-left point of the text area on the PDF page
- Area.Y: Y value of the top-left point of the text area on the PDF page
- Area.Width: width of the text area
- Area.Height: height of the text area
// Apply searching
SearchResult result = doc.Search(matchString, searchOps, pageOffset, pageCount);
// Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.Result)
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine(" {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}
```
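The coordinates printed above are pixel values measured from the top-left corner of the page. If you need them in native PDF units instead, a conversion along the following lines may help. This is a sketch under stated assumptions: that the pixel values are at 96 dpi (the resolution this guide states for page regions), that the page is unrotated, and that the page origin is not shifted; the class PixelToPointSketch and its members are hypothetical names and are not part of the SDK. PDF default user space uses points (1/72 inch) with the origin at the bottom-left.

```csharp
using System;

// Hypothetical helper: converts top-left-origin pixel rectangles to PDF user space.
public static class PixelToPointSketch
{
    public const double PixelsPerInch = 96.0; // assumed resolution of the pixel values
    public const double PointsPerInch = 72.0; // PDF default user-space unit

    // Converts a length in pixels to a length in points.
    public static double ToPoints(double pixels) => pixels * PointsPerInch / PixelsPerInch;

    // Converts a top-left-origin pixel rectangle to a bottom-left-origin rectangle in points.
    // pageHeightPoints is the page height in points (for example, 792 for US Letter).
    public static (double X, double Y, double Width, double Height) ToPdfSpace(
        double xPx, double yPx, double widthPx, double heightPx, double pageHeightPoints)
    {
        double x = ToPoints(xPx);
        double w = ToPoints(widthPx);
        double h = ToPoints(heightPx);
        // Flip the vertical axis: PDF user space measures Y upward from the bottom edge,
        // so the rectangle's lower edge is the page height minus its pixel-space bottom.
        double y = pageHeightPoints - ToPoints(yPx) - h;
        return (x, y, w, h);
    }
}
```

For example, a 192 x 48 pixel area at pixel position (96, 96) on a 792-point-tall page maps to a 144 x 36 point rectangle whose lower-left corner is at (72, 684).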
Using C#, run a text search on a PDF document
This section explains how to run a text search on a whole PDF document, a specified page, a range of pages, or a page region.
C# search text from the whole PDF document
The C# code below shows how to run a text search on a PDF document.
#region search text from pdf document
internal static void searchTextFromDocument()
{
    String inputFilePath = @"C:\demo.pdf";
    // Open a document.
    PDFDocument doc = new PDFDocument(inputFilePath);
    // Set the search options
    RESearchOption option = new RESearchOption();
    option.IgnoreCase = true;
    option.WholeWord = true;
    option.ContextExpansion = 10;
    // Search text and save it to SearchResult.
    SearchResult results = doc.Search("RasterEdge", option);
}
#endregion
C# search text from a specified PDF page
The C# code below shows how to run a text search on a single PDF page.
String inputFilePath = @"C:\1.pdf";
// Open file
PDFDocument doc = new PDFDocument(inputFilePath);
// Search text "RasterEdge"
String matchString = "RasterEdge";
// Set search option
RESearchOption searchOps = new RESearchOption();
searchOps.MatchString = matchString;
searchOps.IgnoreCase = true;
searchOps.WholeWord = false;
searchOps.ContextExpansion = 10;
// Set search range (on the first page)
int pageOffset = 0;
int pageCount = 1;
// Apply searching
SearchResult result = doc.Search(matchString, searchOps, pageOffset, pageCount);
// Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.Result)
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine(" {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}
C# search text from consecutive PDF pages
The C# code below shows how to run a text search on a range of PDF pages.
String inputFilePath = @"C:\1.pdf";
// Open file
PDFDocument doc = new PDFDocument(inputFilePath);
// Search text "RasterEdge"
String matchString = "RasterEdge";
// Set search option
RESearchOption searchOps = new RESearchOption();
searchOps.MatchString = matchString;
searchOps.IgnoreCase = true;
searchOps.WholeWord = false;
searchOps.ContextExpansion = 10;
// Set search page range (from page 1 to 3)
int pageOffset = 0;
int pageCount = 3;
// Apply searching
SearchResult result = doc.Search(matchString, searchOps, pageOffset, pageCount);
// Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.Result)
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine(" {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}
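The page examples above use a zero-based pageOffset together with a pageCount: offset 0 with count 1 for the first page, and offset 0 with count 3 for pages 1 to 3. If your application works with 1-based page numbers, a small conversion like the following sketch may be convenient. The class PageRangeSketch and its FromPageNumbers method are hypothetical names for this example and are not part of the SDK.

```csharp
using System;

// Hypothetical helper: maps a 1-based inclusive page range to (pageOffset, pageCount).
public static class PageRangeSketch
{
    public static (int PageOffset, int PageCount) FromPageNumbers(int firstPage, int lastPage)
    {
        if (firstPage < 1)
            throw new ArgumentOutOfRangeException(nameof(firstPage), "Page numbers start at 1.");
        if (lastPage < firstPage)
            throw new ArgumentOutOfRangeException(nameof(lastPage), "lastPage must not precede firstPage.");

        // The offset is zero-based; the count includes both the first and the last page.
        return (firstPage - 1, lastPage - firstPage + 1);
    }
}
```

The two values returned for pages 1 to 3 are the same pageOffset and pageCount used in the example above; the sketch does not check the range against the document's actual page count.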
Search text from a specified page region
The C# code below shows how to run a text search on a PDF page region.
String inputFilePath = @"C:\1.pdf";
// Open file
PDFDocument doc = new PDFDocument(inputFilePath);
// Search text "RasterEdge"
String matchString = "RasterEdge";
// Set search option
RESearchOption searchOps = new RESearchOption();
searchOps.MatchString = matchString;
searchOps.IgnoreCase = true;
searchOps.WholeWord = false;
searchOps.ContextExpansion = 10;
// Set target page region in the 1st page.
int pageIndex = 0;
// Region: start point (0,0), width = 500, height = 300. Unit: pixel (in 96 dpi).
// RectangleF is in the System.Drawing namespace.
RectangleF pageRegion = new RectangleF(0, 0, 500, 300);
// Apply searching
SearchResult result = doc.Search(matchString, searchOps, pageIndex, pageRegion);
// Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.Result)
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine(" {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}
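This guide does not say whether a match that only partly overlaps the page region is reported. If that distinction matters in your application, you could test each result area against the region yourself, as in the sketch below. It uses only the standard System.Drawing.RectangleF methods Contains and IntersectsWith; the class RegionCheckSketch and its Classify method are hypothetical names, and both rectangles are assumed to be in the same pixel units.

```csharp
using System.Drawing;

// Hypothetical helper: classifies a result rectangle against a search region.
public static class RegionCheckSketch
{
    // Returns "inside", "partial", or "outside" for resultArea relative to region.
    public static string Classify(RectangleF region, RectangleF resultArea)
    {
        // Fully enclosed by the region.
        if (region.Contains(resultArea)) return "inside";
        // Overlaps the region but extends beyond at least one edge.
        if (region.IntersectsWith(resultArea)) return "partial";
        return "outside";
    }
}
```

With the 500 x 300 region from the example above, an area at (100, 50) sized 80 x 12 is classified as inside, while one at (480, 290) sized 60 x 20 crosses the right and bottom edges and is classified as partial.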
Using C#, run a text search with a regular expression on a PDF document
This section explains how to run a text search with a regular expression on specified PDF pages or a page region.
Search text with a regular expression from the specified page(s)
The C# code below shows how to run a text search with a regular expression on PDF pages.
String inputFilePath = @"C:\1.pdf";
// Open file
PDFDocument doc = new PDFDocument(inputFilePath);
// Search pattern for URL
String pattern = @"\b(\S+)://(\S+)\b";
// RegexOptions is in the System.Text.RegularExpressions namespace.
RegexOptions regexOps = RegexOptions.IgnoreCase;
// Set search range (from page 1 to 3)
int pageOffset = 0;
int pageCount = 3;
// Apply searching
MatchResult result = doc.Search(pattern, regexOps, pageOffset, pageCount);
// Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.GetResult())
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine(" {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}
else
    Console.WriteLine("No Matched Item");
Search text with a regular expression from a specified page region
The C# code below shows how to run a text search with a regular expression on a PDF page region.
String inputFilePath = @"C:\1.pdf";
// Open file
PDFDocument doc = new PDFDocument(inputFilePath);
// Search pattern for URL
String pattern = @"\b(\S+)://(\S+)\b";
// RegexOptions is in the System.Text.RegularExpressions namespace.
RegexOptions regexOps = RegexOptions.IgnoreCase;
// Set target page region in the 1st page.
int pageIndex = 0;
// Region: start point (0,0), width = 500, height = 300. Unit: pixel (in 96 dpi).
// RectangleF is in the System.Drawing namespace.
RectangleF pageRegion = new RectangleF(0, 0, 500, 300);
// Apply searching
MatchResult result = doc.Search(pattern, regexOps, pageIndex, pageRegion);
// Show result
if (result.HaveMatched)
{
    foreach (SearchResultItem item in result.GetResult())
    {
        Console.WriteLine("Matched String: '{0}'", item.MatchedString);
        Console.WriteLine("Context String: '{0}'", item.ContextString);
        Console.WriteLine("Result Area(s):");
        foreach (SearchResultLocation area in item.CombinedResultArea)
            Console.WriteLine(" {0}: {1},{2}; W={3}; H={4}", area.PageIndex,
                area.Area.X.ToPixel(), area.Area.Y.ToPixel(),
                area.Area.Width.ToPixel(), area.Area.Height.ToPixel());
    }
}
else
    Console.WriteLine("No Matched Item");
C# search and replace text in a PDF document
The C# code below shows how to search and replace text in a PDF. To learn more about the text replace function in PDF, see
https://www.rasteredge.com/how-to/csharp-imaging/pdf-text-edit-replace/.
#region search and replace text from pdf document
internal static void searchAndReplaceTextFromDocument()
{
    String inputFilePath = @"C:\demo.pdf";
    // Open a document.
    PDFDocument doc = new PDFDocument(inputFilePath);
    // Set the search options.
    RESearchOption option = new RESearchOption();
    option.IgnoreCase = true;
    option.WholeWord = true;
    option.ContextExpansion = 10;
    // Replace "RasterEdge" with "Image".
    doc.Replace("RasterEdge", "Image", option);
    // Save the modified document to a new file.
    doc.Save(@"C:\output.pdf");
}
#endregion