Extracting text from PDFs
RenderFilter[] filter = {new RegionTextRenderFilter(rect)};
ITextExtractionStrategy strategy;
StringBuilder sb = new StringBuilder();
for (int i = 1; i <= reader.NumberOfPages; i++) {
    strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter);
    sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i, strategy));
}
Now you’ll get all the text snippets that intersect with rect (so part of the text may be outside rect; iText doesn’t cut text snippets into pieces).
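The difference between "intersects" and "lies inside" can be pictured with plain rectangles. The following Java sketch is only an illustration under stated assumptions: Box and its methods are hypothetical names, not iText classes, and iText's own filter logic is not reproduced here. It shows that a snippet whose bounding box straddles the edge of the region still counts as a match.

```java
/** Hypothetical axis-aligned box in PDF coordinates (y grows upwards); not an iText class. */
final class Box {
    final double left, bottom, right, top;

    Box(double left, double bottom, double right, double top) {
        this.left = left;
        this.bottom = bottom;
        this.right = right;
        this.top = top;
    }

    /** True if the two boxes share any area. */
    boolean intersects(Box other) {
        return left < other.right && other.left < right
            && bottom < other.top && other.bottom < top;
    }

    /** True if the other box lies completely inside this one. */
    boolean contains(Box other) {
        return left <= other.left && other.right <= right
            && bottom <= other.bottom && other.top <= top;
    }

    public static void main(String[] args) {
        Box region = new Box(100, 700, 300, 750);
        Box snippet = new Box(80, 710, 160, 722);       // starts left of the region
        System.out.println(region.intersects(snippet)); // true: the snippet would be kept
        System.out.println(region.contains(snippet));   // false: part of it lies outside
    }
}
```

If you need only text that lies entirely inside the region, a containment test like the second method is the stricter check to apply to whatever bounding boxes you have available.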
Note that you can get the MediaBox of a page using:
Rectangle mediabox = reader.GetPageSize(pagenum);
The coordinate of the lower-left corner is x = mediabox.Left and y = mediabox.Bottom; the coordinate of the upper-right corner is x = mediabox.Right and y = mediabox.Top.
The values of x increase from left to right; the values of y increase from bottom to top. The unit of the measurement system in PDF is called “user unit”. By default one user unit coincides with one point (this can change, but you won’t find many PDFs with a different UserUnit value). In normal circumstances, 72 user units = 1 inch.
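To make the unit arithmetic concrete, here is a minimal Java sketch that converts lengths in user units to inches and millimeters. It assumes that the UserUnit value scales the default of 1/72 inch; the class and method names are hypothetical and not part of iText, and the page is modelled by four plain numbers rather than by an iText Rectangle.

```java
/** Hypothetical helper, not part of iText: converts PDF user units to physical lengths. */
final class UserUnits {
    /** With the default UserUnit of 1, one user unit equals one point, and 72 points make an inch. */
    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    private UserUnits() {}

    /** Converts a length in user units to inches; userUnit is the page's UserUnit value (1 by default). */
    static double toInches(double length, double userUnit) {
        return length * userUnit / POINTS_PER_INCH;
    }

    /** Converts a length in user units to millimeters. */
    static double toMillimeters(double length, double userUnit) {
        return toInches(length, userUnit) * MM_PER_INCH;
    }

    public static void main(String[] args) {
        // Assumed example values: an A4 MediaBox is roughly [0 0 595 842] in user units.
        double left = 0, bottom = 0, right = 595, top = 842;
        double widthMm = toMillimeters(right - left, 1.0);
        double heightMm = toMillimeters(top - bottom, 1.0);
        System.out.printf("%.1f mm x %.1f mm%n", widthMm, heightMm);
    }
}
```

The width and height are computed as right minus left and top minus bottom because the lower-left corner of a MediaBox is not guaranteed to be the origin.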
General questions about iText
These are some questions about iText in general. They aren’t always about a technical problem, but they can be about a basic concept that is explained in more detail in one of the later chapters.
Unit Testing and Automated Testing Questions
I have been searching for some unit tests for the program iText with no luck. Is anyone aware of any such tests? Also, does anyone know if the developers use any automatic testing tools on iText, such as Jenkins?
Posted on StackOverflow on Feb 21, 2014¹¹⁵ by user3338813¹¹⁶
Internally, we use Jenkins as well as TeamCity.
We have two types of tests:
1. The tests that are added when new core functionality is added. You can find these where Maven expects them: each Maven project has a src directory with 2 subdirectories: main and test. For instance: if you look at iText core, you’ll find the released stuff here¹¹⁷ and the tests here¹¹⁸. Most of these tests are built on top of our testutils¹¹⁹.
2. The tests that are added when we get questions on SO or when we create code samples for the books. For these we use generic test classes such as GenericTest¹²⁰ and SandboxSampleWrapper¹²¹. The wrapper class makes creating a test a no-brainer. All you need to do to turn a sample into a test is add the @WrapToTest annotation. Well, actually there’s more involved: you need to follow a specific pattern when writing a sample: always use SRC and DEST for source PDFs and resulting PDFs, always use a createPdf() or manipulatePdf() method, and always give the cmp file the same name as the DEST file prefixed with cmp_.
In both cases, you’ll find PDF files of which the name starts with cmp_, see for instance the cmpfiles folder¹²² for the examples. In both cases, you’ll find references to Ghostscript and a compare tool (you’ll need to configure these).
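The naming rule for comparison files is simple enough to sketch in a few lines of Java. This is an illustration only: CmpFileNames and cmpFileFor are hypothetical names, the example paths are made up, and how the real wrapper classes locate their cmp files is not shown in the text above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper illustrating the cmp_ naming convention; not part of the iText sandbox. */
final class CmpFileNames {
    private CmpFileNames() {}

    /** Returns the comparison file for a result file: same file name, prefixed with cmp_, in cmpDir. */
    static Path cmpFileFor(Path dest, Path cmpDir) {
        String name = dest.getFileName().toString();
        return cmpDir.resolve("cmp_" + name);
    }

    public static void main(String[] args) {
        // Assumed example paths, chosen for illustration only.
        Path dest = Paths.get("results", "tables", "simple_table.pdf");
        Path cmp = cmpFileFor(dest, Paths.get("cmpfiles"));
        System.out.println(cmp.getFileName()); // cmp_simple_table.pdf
    }
}
```

A test harness built on this convention only needs the DEST path of a sample to know which reference PDF to compare against.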
¹¹⁵ http://stackoverflow.com/questions/21944424/itext-unit-testing-and-automated-testing-questions
¹¹⁶ http://stackoverflow.com/users/3338813/user3338813
¹¹⁷ http://sourceforge.net/p/itext/code/HEAD/tree/trunk/itext/src/main/
¹¹⁸ http://sourceforge.net/p/itext/code/HEAD/tree/trunk/itext/src/test/
¹¹⁹ http://sourceforge.net/p/itext/code/HEAD/tree/trunk/itext/src/main/java/com/itextpdf/testutils/
¹²⁰ http://sourceforge.net/p/itext/code/HEAD/tree/trunk/sandbox/src/test/java/sandbox/GenericTest.java
¹²¹ http://sourceforge.net/p/itext/code/HEAD/tree/trunk/sandbox/src/test/java/sandbox/SandboxSampleWrapper.java
¹²² http://sourceforge.net/p/itext/code/HEAD/tree/trunk/sandbox/cmpfiles/
Why do I get a “Could not find PdfGraphics2D” error?
I have come across a runtime exception “Could not find class com.itextpdf.awt.PdfGraphics2D”. I wanted to create a PDF document from an Android device. For that I used the iText library. This is my code for creating the PDF:
Document document = new Document();
PdfWriter.getInstance(document, outStream);
document.open();
document.add(new Paragraph(data));
document.close();
The code works fine and creates the PDF successfully, but it gives me a runtime exception:
06-14 10:09:20.491: W/dalvikvm(764):
Unable to resolve superclass of Lcom/itextpdf/awt/PdfGraphics2D; (1251)
06-14 10:09:20.491: W/dalvikvm(764):
Link of class 'Lcom/itextpdf/awt/PdfGraphics2D;' failed
06-14 10:09:20.491: E/dalvikvm(764):
Could not find class 'com.itextpdf.awt.PdfGraphics2D',
referenced from method com.itextpdf.text.pdf.PdfContentByte.createGraphics
06-14 10:09:20.491: W/dalvikvm(764):
VFY: unable to resolve new-instance 480
(Lcom/itextpdf/awt/PdfGraphics2D;) in Lcom/itextpdf/text/pdf/PdfContentByte;
06-14 10:09:25.280: E/dalvikvm(764):
Could not find class 'org.bouncycastle.cert.X509CertificateHolder',
referenced from method com.itextpdf.text.pdf.PdfReader.readDecryptedDocObj
06-14 10:09:25.280: W/dalvikvm(764):
VFY: unable to resolve new-instance 1612
(Lorg/bouncycastle/cert/X509CertificateHolder;) in Lcom/itextpdf/text/pdf/PdfReader;
I have done a clean and build, added the jar to the libs folder, selected it under Order and Export, and done a lot of research over the past 2 days, but nothing helped. As far as I know, there are these possibilities: (1) the external jar isn’t loaded properly, or (2) the class PdfGraphics2D extends java.awt.Graphics2D, which is not available on Android.
Posted on StackOverflow on Jun 14, 2013¹²³ by R9J¹²⁴
¹²³ http://stackoverflow.com/questions/17102533/could-not-find-class-com-itextpdf-awt-pdfgraphics2d
¹²⁴ http://stackoverflow.com/users/1912085/r9j
You’ve discovered that PdfGraphics2D extends java.awt.Graphics2D, and as you already know, Graphics2D is a forbidden class on Android.
You’ve also encountered problems related to BouncyCastle.
This tells me that you’re using the Java version of iText instead of the Android port¹²⁵. In the Android port, we replaced BouncyCastle by SpongyCastle (as recommended when using encryption on Android) and we removed all references to forbidden classes (for instance in the awt and nio packages).
Please switch to using the Android port of iText. It is called iTextG¹²⁶.
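If you want to check which classes a given runtime actually provides, a small reflection probe can help. The Java sketch below is a diagnostic illustration only: ClassProbe and isAvailable are hypothetical names, not part of iText or the Android SDK, and what the program prints depends entirely on the runtime and class path it is run on.

```java
/** Hypothetical diagnostic helper; not part of iText or the Android SDK. */
final class ClassProbe {
    private ClassProbe() {}

    /** Returns true if the named class can be loaded by the class loader of this helper. */
    static boolean isAvailable(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose superclass or dependencies cannot be resolved
            return false;
        }
    }

    public static void main(String[] args) {
        // The two classes named in the log output of the question.
        String[] names = {
            "java.awt.Graphics2D",
            "org.bouncycastle.cert.X509CertificateHolder"
        };
        for (String name : names) {
            System.out.println(name + " available: " + isAvailable(name));
        }
    }
}
```

A probe like this only tells you what is missing; it does not make the missing classes appear, so switching to a build made for the platform remains the actual fix.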
Why do I get a “getOutputStream() has already been called for this response” error in JSP?
I’m using JDBC to fetch data from a database and then I use iText to create a PDF file which can be downloaded on the client machine. The application is coded in HTML/JSP and runs on Apache Tomcat.
I use response.getOutputStream to create an output PDF file immediately. However, I get the following error:
getOutputStream() has already been called for this response
How can I generate a dynamic PDF file which can be downloaded by the client machine?
Posted on StackOverflow on Jun 13, 2013¹²⁷ by Sahil Sharma¹²⁸
¹²⁵ http://itextpdf.com/product/itextg
¹²⁶ http://itextpdf.com/product/itextg
¹²⁷ http://stackoverflow.com/questions/17083318/how-to-insert-image-in-pdf-using-itext-and-download-to-client-machine
¹²⁸ http://stackoverflow.com/users/2367475/sahil-sharma
When you write JSP, you probably like white space and indentation, for instance:
<% //a line of code %>
<%
// some more code
%>
<% // another line of code %>
<%
response.getOutputStream();
%>
This will always cause the exception “getOutputStream() has already been called for this response” regardless of whether you’re using iText or not. The getOutputStream() method was called the moment you introduced your first white space character in your JSP script.
To fix this, you need to remove all white space:
<% //a line of code %><%
// some more code
%><% // another line of code %><%
response.getOutputStream();
%>
Not a single character is accepted outside the <% and %> markers. As explained in the better JSP manuals, you shouldn’t use JSP to create binary files. Why not? Because JSP introduces white space characters at arbitrary places in your binary file. That results in corrupt files. Use Servlets instead!
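The either/or rule behind this error can be pictured with a toy model. The Java sketch below is an illustration under stated assumptions, not the Servlet API: ToyResponse is a hypothetical class that only mimics the rule that a response hands out either its binary stream or its character writer, never both. In a real container, text outside the <% and %> markers travels through the character side, which is why stray white space collides with a call to getOutputStream(); which of the two calls trips the check first depends on the container and its buffering.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintWriter;

/** Toy stand-in for a servlet response; hypothetical, NOT the javax.servlet API. */
final class ToyResponse {
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();
    private boolean streamUsed;
    private boolean writerUsed;

    /** Binary channel, for instance for PDF bytes. */
    OutputStream getOutputStream() {
        if (writerUsed) {
            throw new IllegalStateException("getWriter() has already been called for this response");
        }
        streamUsed = true;
        return body;
    }

    /** Character channel, for instance for template text such as white space between scriptlets. */
    PrintWriter getWriter() {
        if (streamUsed) {
            throw new IllegalStateException("getOutputStream() has already been called for this response");
        }
        writerUsed = true;
        return new PrintWriter(body, true);
    }

    public static void main(String[] args) {
        ToyResponse response = new ToyResponse();
        response.getOutputStream();           // a scriptlet asks for the binary stream
        try {
            response.getWriter().print("\n"); // white space between scriptlets needs the writer
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A servlet avoids the conflict because it contains no template text at all: the only output is what your code writes to the one channel it asked for.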
Legal questions
Although StackOverflow is a forum where developers post technical questions and technical questions only, we notice that some developers also want to know more about the legal aspects of using open source, more specifically: is it legal to use iText for free? Is there a license fee involved?
What is the difference between Lowagie and iText?
What is the difference between lowagie and iText? Is this just a version difference or an upgrade of the library? Which one is recommended?
Posted on StackOverflow on Nov 22, 2012¹²⁹ by Adeeb Cheulkar¹³⁰
I am Lowagie, the lowagie you refer to. I’m the original author of iText and the author of the “iText in Action” books.
As explained in the Sales FAQ¹³¹, you should use the latest version of iText.
The differences between old versions of iText (iText 2.x.y dates from July 2009 or earlier) and newer versions of iText can be found in the changelogs¹³².
The 5.0.0 version had the following substantial changes:
• iText and iTextSharp started using the same version numbers.
• The iText.jar is compiled using Java 5 (instead of with the JDK 1.4).
• The F/OSS license has been upgraded from MPL/LGPL to AGPL.
• The package names have changed from com.lowagie to com.itextpdf.
• The toolbox and RTF support have been removed: they are now in a separate project at SourceForge.
Numerous bugs have been fixed since July 2009. Functionality that makes your PDFs future-proof, such as updates regarding new digital signature standards and new standards such as PDF/UA, PDF/A-2 and PDF/A-3, is only available in the more recent iText versions.
¹²⁹ http://stackoverflow.com/questions/13515210/difference-between-lowagie-and-itext
¹³⁰ http://stackoverflow.com/users/1771109/adeeb-cheulkar
¹³¹ http://itextpdf.com/salesfaq
¹³² http://itextpdf.com/changelog
Can iText 2.1.7 or earlier be used commercially?
Can iText 2.1.7 (MPL/GPL licence) be used in commercial projects? I am not a legal guy but lots of discussion threads suggest that there is no issue using the earlier version (2.1.7) of iText in commercial projects as that version is bound by terms & conditions governed by the MPL¹³³/GPL license.
However, if we look at iText’s official website¹³⁴, it says that, as the licence has been upgraded to the AGPL licence, one has to buy the software before using it commercially. See the topic entitled Why shouldn’t I use iText 2.x (or iTextSharp 4.x)?
LEGAL REASONS: Older versions of iText under the free model may contain code fragments that infringe other people’s copyrights or intellectual property rights. iText Software Group has done a significant investment in identifying and eliminating all those cases as of version 5.1, which is one of the reasons why it is now a paying commercial version. We do not recommend the use of versions prior to 5.1 for commercial projects as your company could be liable for copyright or IP infringements.
Of course, this seems to be a warning only. Discouraging the use of earlier iText versions for technical reasons could be understood, but the legal reasons are not convincing.
What about the commercial projects that have been using iText 2.1.7 since before the licence upgrade happened in iText? Would they now have to change their whole project planning because iText has now changed its mind and decided not to distribute it commercially? Of course iText might have done significant investment in upgrading the version technically but what about the investment one might have done in his commercial project using iText 2.1.7 or earlier?
Could someone who understands the legal implications of both licences please clarify this confusion? iText can use such a warning to encourage its sales, but is there anything substantial in such a warning? Can one use iText version 2.1.7 or earlier commercially? Comments from Mr. Bruno Lowagie, the original author of iText, are highly appreciated.
Posted on StackOverflow on Sep 6, 2014¹³⁵ by Devendra Sharma¹³⁶
The first iText company was founded in 2008. The purpose of this company was to put all the Intellectual Property of the code into one legal entity. This was achieved by identifying [1.] every third party project from which code was borrowed, as well as [2.] every individual developer who contributed code.
¹³³ https://www.mozilla.org/MPL/1.1/
¹³⁴ http://itextpdf.com/salesfaq
¹³⁵ http://stackoverflow.com/questions/25696851/can-itext-2-1-7-or-earlier-can-be-used-commercially
¹³⁶ http://stackoverflow.com/users/2881228/devendra-sharma
[1.] Some code snippets were borrowed from projects with an ambiguous license. For instance: we had a snippet that was released under Sun’s Example License (which allowed us to use the code), but in the comment section of the class, it said that the code was proprietary to SUN (which prevented us from using the code). Which of the two prevailed? Being an ignorant developer at that time, I thought the Example License was the one I could use, just like some people claim that you can use iText 2.1.7 today. Lawyers, however, disagreed: they said that the most strict license was the valid one.
We solved these problems by (1) asking permission to use code with ambiguous licenses, (2) refactoring code if we didn’t get permission, (3) removing code we couldn’t refactor.
We did the same with contributions from individual developers.
[2.] The IP from individual developers was transferred to iText Group NV (formerly known as 1T3XT BVBA) by asking every developer who contributed 20 lines of code or more to sign a Contributor License Agreement.
Two problems arose:
1. Individual developers could not be reached. For example: we dropped the RTF package completely because we couldn’t find a couple of the core developers of the RTF functionality.
2. In a couple of cases, we had to negotiate about the CLA. For example: one company didn’t like the CLA. Instead, this company released the contribution of its employees under an MIT license, so that we could use it anyway. Another organization was really slow in agreeing with the CLA. It took us until September 2009 before we received formal approval. Only after this approval, we switched to the AGPL. I can’t disclose the document (it was different from the CLA), nor the name of the organization (I hope I don’t break the NDA just by writing this). I can only say that we only had full coverage of the code base after that document was signed.
Ignorant developers claim that the LGPL/MPL header “protects” them, but what if some proprietary code was accidentally added to a class with such a header? Does this make that proprietary code “available under the MPL/LGPL”? If it did, it would be sufficient to take proprietary code, add an MPL/LGPL header and publish it. Doing this on purpose would be illegal. Doing this out of ignorance can be pardoned if there is a willingness to fix the issue.
In the early years of open source, it did occur that proprietary code got mixed into an open source project by accident. At iText, we have invested a lot of time and effort into cleaning up the code base. Since that exercise, we are very disciplined with respect to code contributions. This is one of the core tasks of a professional open source company.
After we fixed all the issues, we removed all copies of those old iText versions from our servers to make sure we were in the clear. If a company decides to use some rogue version of iText 2.1.7 that is outside of our control, this company does so willingly and knowingly, in other words: at its own risk! There is no way such a company can claim: “We didn’t know there was a possible IP issue with the code.”
If you want to use iText 2.1.7, you need to do the exercise we have done between 2007-2009 at your own expense. This will cost you more than the price of a license. For instance: the individual developers gave permission to iText Group NV to do business with iText, but will they give that permission to you? How will you identify those individual developers?
Moreover: iText 2.1.7 dates from July 2009, meaning that it is more than 5 years old. Many bugs have been fixed since that date. Should you knowingly introduce those bugs into the code base of your customer, then your customer may claim that you had an alternative: you could have used a more recent version of iText…
As for your question “what about the investment one might have done in his commercial project using iText 2.1.7 or earlier?” That investment must have been done at least 3 years ago, because we’ve been informing people that they should upgrade for at least that long. Upgrading to a recent version is an investment that should be categorized as a maintenance cost. It should be an affordable cost because whoever has been using iText 2.1.7 for that long in a commercial project has been making money thanks to iText for that long. Claiming that “iText has now changed its mind” is not correct unless now is marked as a synonym of 5 years ago in your dictionary.
To be continued…
All the answers and the many code samples I have provided on StackOverflow were written in the hope that they are helpful. I leave it up to the reader of this “Best of” selection to decide whether or not “I’m kind of a dick” as the people who down-voted some of my answers claim. I just love answering questions, and where love is involved, there’s also pain, for instance the pain if the love isn’t returned. Some people seem to make a sport out of it to beg for an answer and then to thank me by saying: we’re never going to be a customer of iText Software. Somehow that doesn’t compute. I hope you understand.
Obviously, a book like this is never finished. New questions about iText are posted every day. I expect that this book will grow over the years. Some answers may become obsolete, some new functionality will require more clarification. This clarification may be provided in the form of an answer to a new question, or as a topic in one of the other upcoming books:
• The ABC of PDF¹³⁷
• Create your PDFs with iText¹³⁸
• Update your PDFs with iText¹³⁹
• Sign your PDFs with iText¹⁴⁰
All of these books are available for free. No donation is expected.
¹³⁷ https://leanpub.com/itext_pdfabce
¹³⁸ https://leanpub.com/itext_pdfcreate
¹³⁹ https://leanpub.com/itext_pdfupdate
¹⁴⁰ https://leanpub.com/itext_pdfsign