
Page events
When creating a document from scratch, iText can trigger a series of events, for instance when a
new page starts or ends, when a paragraph is added, and so on…
How to add a rectangle to every page of a document?
I’m using iText to create a PDF document. Right now I am trying to get a rectangle on
every single page of the document but I’m not sure how to do this. I tried adding this at
the end of my code:
PdfContentByte cb = writer.getDirectContent();
for (int pgCnt = 1; pgCnt <= writer.getPageNumber(); pgCnt++) {
    cb.saveState();
    cb.setColorStroke(new CMYKColor(1f, 0f, 0f, 0f));
    cb.setColorFill(new CMYKColor(1f, 0f, 0f, 0f));
    cb.rectangle(20, 10, 10, 820);
    cb.fill();
    cb.restoreState();
}
but this only adds the rectangle on the last page, and it kind of makes sense because I’m not
using pgCnt anywhere. How can I specify that I want the rectangle on page number pgCnt,
so I can add the rectangle on every page?
Posted on StackOverflow on Mar 19, 2013⁶⁴ by Carla Stabille⁶⁵
Please take a look at the entries for the keyword Page events⁶⁶ on the official iText site. You need to
extend the PdfPageEventHelper⁶⁷ class and add your code to the onEndPage() method.
⁶⁴ http://stackoverflow.com/questions/16638406/how-can-i-add-rectangle-on-every-page-of-a-document-using-itext
⁶⁵ http://stackoverflow.com/users/1883606/carla-stabile
⁶⁶ http://itextpdf.com/themes/keyword.php?id=204
⁶⁷ http://api.itextpdf.com/itext/com/itextpdf/text/pdf/PdfPageEventHelper.html
public void onEndPage(PdfWriter writer, Document document) {
    // Direct content is drawn on top of the content already on the page
    PdfContentByte cb = writer.getDirectContent();
    cb.saveState();
    cb.setColorStroke(new CMYKColor(1f, 0f, 0f, 0f));
    cb.setColorFill(new CMYKColor(1f, 0f, 0f, 0f));
    cb.rectangle(20, 10, 10, 820);
    cb.fill();
    cb.restoreState();
}
Create an instance of your custom page event class, and declare it to the writer before opening the
document:
writer.setPageEvent(myPageEventInstance);
Now your rectangle will be drawn on every page, on top of the existing content. If you want the
rectangle under the existing content, replace getDirectContent() with getDirectContentUnder().
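The difference between drawing once at the end and drawing from a page event can be illustrated with a toy model in plain Java. This is only a sketch: ToyWriter and ToyPageEvent are hypothetical stand-ins invented for this illustration, not iText classes, and the model assumes nothing more than what is described above (a finished page can no longer be changed, and the event fires for each page).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a page-event callback. All types here are hypothetical, not iText classes. */
final class PageEventSketch {

    /** Stand-in for a page event listener such as a PdfPageEventHelper subclass. */
    interface ToyPageEvent {
        void onEndPage(List<String> pageContent);
    }

    /** Stand-in for a writer that finalizes each page as soon as it is complete. */
    static final class ToyWriter {
        private final List<List<String>> finishedPages = new ArrayList<>();
        private List<String> currentPage = new ArrayList<>();
        private ToyPageEvent pageEvent;

        void setPageEvent(ToyPageEvent pageEvent) {
            this.pageEvent = pageEvent;
        }

        /** Adds content to the page that is currently open. */
        void add(String content) {
            currentPage.add(content);
        }

        /** Fires the event, then finalizes the page; it cannot be changed afterwards. */
        void newPage() {
            if (pageEvent != null) {
                pageEvent.onEndPage(currentPage);
            }
            finishedPages.add(new ArrayList<>(currentPage));
            currentPage = new ArrayList<>();
        }

        /** Finalizes the last open page and returns all pages. */
        List<List<String>> close() {
            newPage();
            return finishedPages;
        }
    }

    /** Builds three pages; the rectangle comes either from the event or from one late call. */
    static List<List<String>> build(boolean usePageEvent) {
        ToyWriter writer = new ToyWriter();
        if (usePageEvent) {
            writer.setPageEvent(page -> page.add("rectangle"));
        }
        writer.add("text 1");
        writer.newPage();
        writer.add("text 2");
        writer.newPage();
        writer.add("text 3");
        if (!usePageEvent) {
            writer.add("rectangle"); // added once, at the end: only the open page receives it
        }
        return writer.close();
    }

    public static void main(String[] args) {
        System.out.println(build(false));
        System.out.println(build(true));
    }
}
```

In this model, build(false) leaves the first two pages without a rectangle, while build(true) adds one to all three pages because the callback runs just before each page is finalized.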
How can I add an image to all pages of my PDF?
I have been trying to add an image to all pages using iTextSharp. The image needs
to be OVER all content of every page. I have used the following code below all the
other doc.add():
Document doc = new Document(iTextSharp.text.PageSize.A4, 10, 10, 30, 1);
PdfWriter writer = PdfWriter.GetInstance(doc,
    new FileStream(Server.MapPath("~/pdf/" + fname), FileMode.Create));
doc.Open();
Image image = Image.GetInstance(Server.MapPath("~/images/draft.png"));
image.SetAbsolutePosition(12, 300);
writer.DirectContent.AddImage(image, false);
doc.Close();
The above code only inserts an image in the last page. Is there any way to insert the image
in the same way in all pages?
Posted on StackOverflow on Feb 20, 2014⁶⁸ by Neville Nazerane⁶⁹
It’s normal that the image is only added once; after all, you’re adding it only once.
You should create a document in 5 steps and add an event in step 2:
⁶⁸ http://stackoverflow.com/questions/21908651/add-an-image-in-all-pages-of-pdf
⁶⁹ http://stackoverflow.com/users/991609/neville-nazerane
// step 1
Document document = new Document();
// step 2
PdfWriter writer = PdfWriter.GetInstance(document, stream);
// "event" is a reserved word in C#, so the variable needs another name
MyEvent pageEvent = new MyEvent();
writer.PageEvent = pageEvent;
// step 3
document.Open();
// step 4
// Add whatever content you want to add
// step 5
document.Close();
You have to write the MyEvent class yourself:
protected class MyEvent : PdfPageEventHelper {
    Image image;
    // Load the image once, when the document is opened
    public override void OnOpenDocument(PdfWriter writer, Document document) {
        image = Image.GetInstance(Server.MapPath("~/images/draft.png"));
        image.SetAbsolutePosition(12, 300);
    }
    // Reuse the same Image object on every page
    public override void OnEndPage(PdfWriter writer, Document document) {
        writer.DirectContent.AddImage(image);
    }
}
The OnEndPage() method in class MyEvent will be triggered every time the PdfWriter has finished a page.
Hence the image will be added on every page.
Caveat: it is important to create the image object outside the OnEndPage() method, otherwise the
image bytes risk being added as many times as there are pages in your PDF (leading to a bloated
PDF).
How to set a fixed background image for all my pages?
On button click, I generate 4 pages in my PDF. I added this image to provide a background
image:
string imageFilePath = parent + "/Images/bg_image.jpg";
iTextSharp.text.Image jpg = iTextSharp.text.Image.GetInstance(imageFilePath);
jpg.ScaleToFit(1700, 1000);
jpg.Alignment = iTextSharp.text.Image.UNDERLYING;
jpg.SetAbsolutePosition(0, 0);
document.Add(jpg);
It works only with 1 page, but when I generate a PDF that contains many records and has
several pages, the bg image is only on the last page. I want to apply the background image
to all of the pages.
Posted on StackOverflow on Nov 1, 2014⁷⁰ by dandy⁷¹
It is normal that the background is added only once, because you’re adding it only once.
If you want to add content to every page, you should not do that manually because you don’t know
when a new page will be created by iText. Instead you should use a page event.
The idea is to create an implementation of the PdfPageEvent interface, for instance by extending
the PdfPageEventHelper class and overriding the OnEndPage() method:
class TemplateHelper : PdfPageEventHelper {
    private Stationery instance;
    public TemplateHelper() { }
    public TemplateHelper(Stationery instance) {
        this.instance = instance;
    }
    /**
     * @see com.itextpdf.text.pdf.PdfPageEventHelper#onEndPage(
     *      com.itextpdf.text.pdf.PdfWriter, com.itextpdf.text.Document)
     */
    public override void OnEndPage(PdfWriter writer, Document document) {
        writer.DirectContentUnder.AddTemplate(instance.page, 0, 0);
    }
}
⁷⁰ http://stackoverflow.com/questions/26688288/set-a-fix-background-image-for-all-my-pages-in-pdf-itext-asp-c-sharp
⁷¹ http://stackoverflow.com/users/4131886/dandy
In this case, we add a PdfTemplate, but it is very easy to add an Image by replacing the
Stationery instance with an Image instance and replacing the AddTemplate() method with the
AddImage() method.
Once you have an instance of your custom page event, you need to declare it to the PdfWriter
instance:
writer.PageEvent = new TemplateHelper(this);
From that moment on, your OnEndPage() method will be executed each time a page is finalized.
Warning: as documented, you shall not use the OnStartPage() method to add content in a page
event!
If we adapt the above example to your requirement, the final result would look more or less like
this:
class ImageBackgroundHelper : PdfPageEventHelper {
    private Image img;
    public ImageBackgroundHelper(Image img) {
        this.img = img;
    }
    /**
     * @see com.itextpdf.text.pdf.PdfPageEventHelper#onEndPage(
     *      com.itextpdf.text.pdf.PdfWriter, com.itextpdf.text.Document)
     */
    public override void OnEndPage(PdfWriter writer, Document document) {
        // DirectContentUnder places the image beneath the page content
        writer.DirectContentUnder.AddImage(img);
    }
}
Now you can use this event like this:
string imageFilePath = parent + "/Images/bg_image.jpg";
iTextSharp.text.Image jpg = iTextSharp.text.Image.GetInstance(imageFilePath);
jpg.ScaleToFit(1700, 1000);
jpg.SetAbsolutePosition(0, 0);
writer.PageEvent = new ImageBackgroundHelper(jpg);
Note that 1700 and 1000 seem quite big. Are you sure those are the dimensions of your page?
Parsing XML and XHTML
There are a lot of questions about HTMLWorker on StackOverflow. Many of these questions remain
unanswered as HTMLWorker has been abandoned in favor of XML Worker. HTMLWorker was initially
meant as a parser for a small selection of HTML tags. People started using it as if it were a full-blown
HTML to PDF converter and then complained because HTMLWorker doesn’t support CSS parsing. The
HTMLWorker code grew organically up until a point where it was no longer maintainable.
We started another project, called XML Worker. It can be used to convert XHTML to PDF. It’s not
a URL to PDF converter in the sense that it won’t “print your web site to PDF”. In HTML, you can
encounter content at the end of the file that needs to be added at the start of the document. When
this happens, one would expect that the start of the document is the first page. That isn’t possible
with iText as iText flushes finished pages to the OutputStream as soon as possible and there is no
way to return to a previous page to add the extra content.
XML Worker is meant to create simple reports using an easy language such as HTML (and some
CSS). It won’t resolve ASP pages, nor execute JavaScript. It will only deal with finished XHTML.
Why is it so difficult to convert XML to PDF?
Could anybody explain to me why it is so complicated to create a pdf file from an xml sheet?
Acrobat can create an XML file, but when I want to do this the other way round it suddenly gets
complicated. I would like to find some simple application which would allow me to create
a pdf file out of xml. Is it possible?
Posted on StackOverflow on Jun 13, 2013⁷² by DDEX⁷³
XML is a bunch of ingredients, PDF is the finished meal.
He or she who knows how to cook can create a wide variety of meals using the same ingredients.
With a potato, he can create soup, mashed potatoes, crisps, french fries,… There’s an almost endless
list of possibilities.
He or she who can’t cook will stare at the potato and wonder: How on earth can I turn this ugly
vegetable into a nice croquette?
The answer is: you need a recipe. That recipe could be an XSL:FO file, the XHTML specification, a
DocBook implementation, an XFA template,… Without that recipe, you’ll never be able to turn your
XML into PDF.
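As a small illustration of what a "recipe" can look like, the sketch below applies an XSLT stylesheet to a piece of XML using only the Java standard library. XSLT is chosen here purely for illustration and is not named in the answer above; the menu data, the stylesheet, and the class name RecipeSketch are all hypothetical. The result is XHTML, which would still have to be handed to a PDF-producing tool in a separate step.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch: XML data plus a stylesheet ("recipe") yields XHTML. All names are hypothetical. */
final class RecipeSketch {

    // The ingredients: data only, with no information about presentation.
    static final String XML = "<menu><dish>soup</dish><dish>croquette</dish></menu>";

    // The recipe: an XSLT stylesheet that decides how the data is presented.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/menu'>"
        + "<html><body><ul>"
        + "<xsl:for-each select='dish'><li><xsl:value-of select='.'/></li></xsl:for-each>"
        + "</ul></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML and returns the transformed markup. */
    static String toXhtml(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXhtml(XML, XSLT));
    }
}
```

A different stylesheet applied to the same XML would give a different document, which is the point of the cooking analogy: the data alone does not determine the layout.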
⁷² http://stackoverflow.com/questions/17081907/why-is-it-so-difficult-to-convert-xml-to-pdf
⁷³ http://stackoverflow.com/users/1934834/ddex
How to add external CSS while generating PDF?
Currently I am using the following code to generate a PDF in a JSP file:
response.setContentType("application/force-download");
response.setHeader("Content-Disposition", "attachment;filename=reports.pdf");
Document document = new Document();
document.setPageSize(PageSize.A1);
PdfWriter writer = null;
writer = PdfWriter.getInstance(document, response.getOutputStream());
document.open();
ByteArrayInputStream bis =
    new ByteArrayInputStream(htmlSource.toString().getBytes());
XMLWorkerHelper.getInstance().parseXHtml(writer, document, bis);
document.close();
With this code, I’m able to generate a PDF, but I would like to add a CSS file while generating
the PDF.
Posted on StackOverflow on Jul 16, 2014⁷⁴ by Yella Goud⁷⁵
Please take a look at the ParseHtmlTable1⁷⁶ example. In this example, we have HTML stored in a
StringBuilder object and some CSS stored in a String. In my example, I convert the sb object and
the CSS object to an InputStream. If you have files with the HTML and the CSS, you could easily
use a FileInputStream.
Once you have an InputStream for the HTML and the CSS, you can use this code:
// CSS
CSSResolver cssResolver = new StyleAttrCSSResolver();
CssFile cssFile = XMLWorkerHelper.getCSS(new ByteArrayInputStream(CSS.getBytes()));
cssResolver.addCss(cssFile);
// HTML
HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
// Pipelines
PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
// XML Worker
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
p.parse(new ByteArrayInputStream(sb.toString().getBytes()));
⁷⁴ http://stackoverflow.com/questions/24777549/how-to-add-external-css-while-generating-pdf
⁷⁵ http://stackoverflow.com/users/2436481/yella-goud
⁷⁶ http://itextpdf.com/sandbox/xmlworker/ParseHtmlTable1
Or, if you don’t like all that code:
ByteArrayInputStream bis =
    new ByteArrayInputStream(htmlSource.toString().getBytes());
ByteArrayInputStream cis =
    new ByteArrayInputStream(cssSource.toString().getBytes());
XMLWorkerHelper.getInstance().parseXHtml(writer, document, bis, cis);
How to do HTML to XML conversion to generate closed tags?
When I try converting html to pdf using iText and XML Worker, I’m asked to give the
closing tag for <hr> and <br> tags. It works if I do this manually, but I don’t want to add
each closing tag manually. How can I do this in an automated way?
Posted on StackOverflow on Oct 30, 2014⁷⁷ by Kannu Verma⁷⁸
You are experiencing this problem because you are feeding HTML to iText’s XML Worker. XML
Worker requires XML, so you need to convert your HTML into XHTML.
There is an example on how to do this on the official iText site: D00_XHTML⁷⁹
public static void tidyUp(String path) throws IOException {
    File html = new File(path);
    // Parse the HTML with Jsoup and serialize it again as a byte array
    byte[] xhtml = Jsoup.parse(html, "US-ASCII").html().getBytes();
    File dir = new File("results/xml");
    dir.mkdirs();
    FileOutputStream fos = new FileOutputStream(new File(dir, html.getName()));
    fos.write(xhtml);
    fos.close();
}
⁷⁷ http://stackoverflow.com/questions/26652029/how-to-do-xml-to-html-conversion-to-generate-closed-tags
⁷⁸ http://stackoverflow.com/users/4197576/kannu-verma
⁷⁹ http://itextpdf.com/sandbox/xmlworker/D00_XHTML
In this example, we get a path to an ordinary HTML file (similar to what you have). We then use
the Jsoup⁸⁰ library to parse the HTML into an XHTML byte array. In this example, we use that byte
array to write an XHTML file to disk. You can use the byte array directly as input for XML Worker.
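To see why unclosed <br> and <hr> tags are a problem for any XML-based parser, the following sketch feeds two snippets to the XML parser that ships with the JDK. It is only an illustration: it does not use XML Worker or Jsoup, and the class name WellFormedCheck and the sample markup are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch: checks whether markup is well-formed XML. The class name is hypothetical. */
final class WellFormedCheck {

    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input, for instance because a tag was never closed
            return false;
        } catch (Exception e) {
            throw new IllegalStateException("Could not run the XML parser", e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><p>a<br>b</p><hr></body></html>";
        String xhtml = "<html><body><p>a<br/>b</p><hr/></body></html>";
        System.out.println("HTML well-formed:  " + isWellFormedXml(html));
        System.out.println("XHTML well-formed: " + isWellFormedXml(xhtml));
    }
}
```

A check like this can be run on the converted byte array before it is passed on, to confirm that the conversion step really produced XML.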
⁸⁰ http://jsoup.org/
Inspect a PDF with iText
iText can tell you more about a PDF. What is the size of a page? Which measurement unit is used?
All of these questions can be answered with a simple example using PdfReader.
Why do I get an “InvalidPdfException: PDF header signature not found”?
I have some code that reads pdf files. The code fails at the line:
iTextSharp.text.pdf.PRTokeniser.CheckPdfHeader() at
iTextSharp.text.pdf.PdfReader.ReadPdf()
I know from other entries that this issue is coming from some invalid formatting in the
PDF. However I’m not in a position to tell my users to redo their PDFs. Is there some other
way around this issue, that can allow reading of the pdf despite this problem?
Posted on StackOverflow on Sep 10, 2012⁸¹ by David Choi⁸²
If a file doesn’t start with %PDF- then there’s nothing to fix: the file isn’t a PDF file.
However, there may be another problem: maybe you’re trying to access a file that has zero length due
to some problem while creating the InputStream. Another context in which I’ve seen this happen
is a PDF loaded from a server, where the server returned a 404 message in HTML instead of a PDF
file ;-)
Whenever that exception happens, you should store the bytes somewhere, and examine them.
Without those bytes, nobody will be able to give you useful advice.
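Examining the stored bytes can start with a very small check. The sketch below uses only the Java standard library and distinguishes the three cases mentioned above: an empty input, bytes that begin with %PDF-, and bytes that look like an HTML page. The class name PdfHeaderCheck, the returned labels, and the HTML heuristic are assumptions made for this illustration; the check is deliberately strict about the first five bytes and does not reproduce iText's own header detection.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Sketch: classifies the first bytes of a supposed PDF. Names and labels are hypothetical. */
final class PdfHeaderCheck {

    private static final byte[] SIGNATURE = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns "empty", "pdf", "html" or "unknown" for the given bytes. */
    static String diagnose(byte[] data) {
        if (data.length == 0) {
            return "empty"; // for example a stream that was never filled
        }
        if (startsWith(data, SIGNATURE)) {
            return "pdf";
        }
        // Look at a short prefix as text to spot an HTML error page
        int n = Math.min(data.length, 64);
        String head = new String(data, 0, n, StandardCharsets.ISO_8859_1)
                .trim().toLowerCase(Locale.ROOT);
        if (head.startsWith("<!doctype html") || head.startsWith("<html")) {
            return "html";
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] errorPage = "<html><body>404 Not Found</body></html>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(diagnose(errorPage));
    }
}
```

Logging this label together with the byte count before handing the data to PdfReader makes it much easier to tell a broken download from a file that really is a damaged PDF.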
⁸¹ http://stackoverflow.com/questions/12357126/invalidpdfexception-pdf-header-signature-not-found
⁸² http://stackoverflow.com/users/1426199/david-choi