
Chapter 2
Existing users
(including everyone who uses a browser)
2.1 What do I have to do to use XML?
To read it: use any modern web browser; to create it: use an XML editor.
As an average user of the Web, you don’t need anything except a browser that works with XML (see the question about browsers). Remember that new XML-related facilities are being invented or implemented all the time (see the W3C website), so some recent features may not work in all browsers yet.
You can use XML-conformant browsers to look at some of the stable XML material, such as Jon Bosak’s Shakespeare plays and the molecular experiments of the Chemical Markup Language (CML). There are some more example sources listed at http://xml.coverpages.org/xml.html#examples, and you will find XML (particularly in the guise of XHTML) being introduced in places where it won’t break older browsers.
If you want to start preparing to create your own XML files, see the questions in the Authors’ Section and the Developers’ Section, particularly Question 4.10 on page 82.
2.2 What does XML look like (inside)?
Pointy brackets like HTML
The basic structure of XML is similar to other applications of SGML, including HTML. The basic components can be seen in the following examples. An XML document starts with an optional Prolog, which can have two (optional) parts:
1. The XML Declaration:
<?xml version="1.0" encoding="utf-8"?>
This specifies that this is an XML document and that it uses the UTF-8 character encoding (the default; others are available, but XML processors are only required to support UTF-8 and UTF-16);
2. A Document Type Declaration, if you are using a DTD:
<!DOCTYPE report SYSTEM "http://sales.acme.corp/dtds/salesrep.dtd">
which identifies the type of document (here, ‘report’) and says where the Document Type Definition (DTD) is stored;
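To see what the encoding declaration does in practice, here is a minimal sketch in Python using the standard library’s xml.etree.ElementTree (our choice of tool; any conformant parser should behave the same way). The parser reads the raw bytes and decodes them according to the declared encoding, falling back to UTF-8 when there is no declaration. The element name word and the helper read_word are invented for the example.

```python
import xml.etree.ElementTree as ET

# The same word, "café", stored as bytes in two different encodings.
# In ISO-8859-1 the é is the single byte 0xE9.
latin1_doc = b'<?xml version="1.0" encoding="iso-8859-1"?><word>caf\xe9</word>'
# No XML Declaration at all: the parser assumes UTF-8.
utf8_doc = "<word>café</word>".encode("utf-8")


def read_word(raw: bytes) -> str:
    """Parse raw bytes; the parser decodes them as the declaration says."""
    return ET.fromstring(raw).text


print(read_word(latin1_doc))  # café
print(read_word(utf8_doc))    # café
```

Without its declaration the Latin-1 version would be rejected, because the byte 0xE9 followed by `<` is not valid UTF-8.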
The Prolog is followed by the Document Instance:
1. A root element, which is the outermost (top-level) element (start-tag plus end-tag) which encloses everything else: in the examples below the root elements are conversation and titlepage;
2. A structured mix of descriptive or prescriptive elements enclosing the character data content (text), and optionally any attributes (‘name="value"’ pairs) inside some start-tags.
XML documents can be very simple, with straightforward nested markup of your own design:
<?xml version="1.0" standalone="yes"?>
<conversation>
<greeting>Hello, world!</greeting>
<response>Stop the planet, I want to get off!</response>
</conversation>
Or they can be more complicated, with a Schema or DTD, and maybe an internal subset (local DTD changes in [square brackets] within the Document Type Declaration, like the ENTITY declaration below), and an arbitrarily complex nested structure:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE titlepage
SYSTEM "http://www.foo.bar/dtds/typo.dtd"
[<!ENTITY % active.links "INCLUDE">]>
<titlepage xml:id="BG12273624">
<white-space type="vertical" amount="36"/>
<title font="Baskerville" alignment="centered"
size="24/30">Hello, world!</title>
<white-space type="vertical" amount="12"/>
<!-- In some copies the following
decoration is hand-colored, presumably
by the author -->
<image location="http://www.foo.bar/fleuron.eps"
type="URI" alignment="centered"/>
<white-space type="vertical" amount="24"/>
<author font="Baskerville" size="18/22"
style="italic">Vitam capias</author>
<white-space type="vertical" role="filler"/>
</titlepage>
Or they can be anywhere in between: a lot will depend on how you want to define your document type (or whose you use) and what it will be used for. Database-generated or program-generated XML documents used in e-commerce are usually unformatted because they are for machine consumption, not for human reading, and they may use very long names or values, with multiple redundancy and sometimes no character data content at all, just values in attributes:
<?xml version="1.0"?>
<ORDER-UPDATE AUTHMD5="4baf7d7cff5faa3ce67acf66ccda8248"
ORDER-UPDATE-ISSUE="193E22C2-EAF3-11D9-9736-CAFC705A30B3"
ORDER-UPDATE-DATE="2005-07-01T15:34:22.46"
ORDER-UPDATE-DESTINATION="6B197E02-EAF3-11D9-85D5-997710D9978F"
ORDER-UPDATE-ORDERNO="8316ADEA-EAF3-11D9-9955-D289ECBC99F3">
<ORDER-UPDATE-DELTA-MODIFICATION-DETAIL ORDER-UPDATE-XML:ID="BAC352437484">
<ORDER-UPDATE-DELTA-MODIFICATION-VALUE ORDER-UPDATE-ITEM="56"
ORDER-UPDATE-QUANTITY="2000"/>
</ORDER-UPDATE-DELTA-MODIFICATION-DETAIL>
</ORDER-UPDATE>
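Because everything in this style lives in attributes, a program reads it by asking for attribute values rather than element content. Here is a minimal Python sketch with the standard library’s xml.etree.ElementTree, working on a trimmed copy of the order above. The ORDER-UPDATE-XML:ID attribute is left out because ElementTree is namespace-aware and treats ORDER-UPDATE-XML as an undeclared namespace prefix, so it would reject the document. The variable names order_no and changes are our own.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the order update above, without ORDER-UPDATE-XML:ID.
doc = b"""<?xml version="1.0"?>
<ORDER-UPDATE ORDER-UPDATE-DATE="2005-07-01T15:34:22.46"
 ORDER-UPDATE-ORDERNO="8316ADEA-EAF3-11D9-9955-D289ECBC99F3">
<ORDER-UPDATE-DELTA-MODIFICATION-DETAIL>
<ORDER-UPDATE-DELTA-MODIFICATION-VALUE ORDER-UPDATE-ITEM="56"
 ORDER-UPDATE-QUANTITY="2000"/>
</ORDER-UPDATE-DELTA-MODIFICATION-DETAIL>
</ORDER-UPDATE>"""

root = ET.fromstring(doc)
print(root.tag)  # ORDER-UPDATE (the root element)

order_no = root.get("ORDER-UPDATE-ORDERNO")
# The payload is all in attributes; the elements carry no character data.
changes = {
    int(v.get("ORDER-UPDATE-ITEM")): int(v.get("ORDER-UPDATE-QUANTITY"))
    for v in root.iter("ORDER-UPDATE-DELTA-MODIFICATION-VALUE")
}
print(order_no, changes)  # 8316ADEA-EAF3-11D9-9955-D289ECBC99F3 {56: 2000}
```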
2.3 Should I use XML instead of HTML?
Yes, if you need robustness, accuracy, and persistence.
XML allows authors and providers to design their own document markup instead of being limited by HTML. Document types can be explicitly tailored to an application, so the cumbersome fudging and poodle-faking that has to take place with HTML becomes a thing of the past: your markup can always say what it means. Trivial example:
<date YYYY-MM-DD="2005-12-26">last Monday</date>
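The attribute carries a value a program can use without guessing, while the element content stays free for human wording. A short illustration of our own in Python, using xml.etree.ElementTree and datetime from the standard library:

```python
import datetime
import xml.etree.ElementTree as ET

elem = ET.fromstring('<date YYYY-MM-DD="2005-12-26">last Monday</date>')

# The attribute holds a machine-readable value; the content is for humans.
when = datetime.date.fromisoformat(elem.get("YYYY-MM-DD"))
print(repr(elem.text), "->", when.isoformat(), when.strftime("%A"))
```

A program can sort, compare, or reformat `when` safely, whereas ‘last Monday’ only makes sense relative to when it was written.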
• Information content can be richer and easier to use, because the descriptive and hypertext linking abilities of XML are much greater than those available in HTML.
• XML can provide more and better facilities for browser presentation and performance, using XSLT and CSS stylesheets;
• It removes many of the underlying complexities of SGML-format HTML (which led to them being ignored and broken) in favour of a more flexible model, so writing programs to handle XML is much easier than doing the same for all the old broken HTML.
• Information becomes more accessible and reusable, because the more flexible markup of XML can be used by any XML software instead of being restricted to specific manufacturers as has become the case with HTML.
• XML files can be used outside the Web as well, in existing document-handling environments (eg publishing).
If your information is transient, or completely static and unreferenced, or very short and simple, and unlikely to need updating, HTML may be all you need.
2.4 Someone sent me an XML file. How do I read it?
Open it in an XML browser or XML editor.
If the file is well-formed or valid XML, you can just open it with any XML-conformant browser (see Question 2.1 on page 21 and Question 2.6 on page 28). This will display the file in an unformatted view, showing all the markup in a format that lets you fold up or unfold the nested hierarchy (click on the little plus and minus symbols), which will at least let you read something.
If the file contains a link to an XSLT or CSS stylesheet (and the stylesheet was provided or is web-accessible), then the browser should format the file in a readable manner (but beware that in-browser formatting is not robust).
If you want to edit the file, you need an XML editor (see Question 4.10 on page 82). Unless you are very skilled with pointy-bracket markup, do not try to edit XML files with non-XML editors.
2.5 How do I control the formatting of XML?
Use CSS or an XSLT2 stylesheet.
In HTML, default styling was built into the browsers because the tagset of HTML was predefined and hardwired into browsers. This is still true for XHTML and HTML5 to some extent. In other XML, where you can define your own tagset, browsers cannot possibly be expected to guess or know in advance what names you are going to use and what they will mean, so you need a stylesheet if you want to display formatted text.
Browsers which read XML will accept and use a CSS stylesheet at a minimum, but you can also use the more powerful XSLT stylesheet language to transform your XML into HTML, which browsers, of course, already know how to display (and that HTML can still use a CSS stylesheet). This way you get all the document management benefits of using XML, but you don’t have to worry about your readers needing XML smarts in their browsers.
This transformation is usually done by the document owner, on their server, so you just get the HTML anyway, possibly unaware that it was XML originally. But it is also possible to use the (rather limited) built-in XSLT 1.0 transformer in some browsers, and server operators can now also use Saxon CE, which is a downloadable in-browser implementation of XSLT2.
Mike Brown writes:
XSLT is an XML document processing language that uses source code that happens to be written in XML. An XSLT document declares a set of rules for an XSLT processor to use when interpreting the contents of an XML document. These rules tell the XSLT processor how to generate a new XML-like data structure and how that data should be emitted: as an XML document, as an HTML document, as plain text, or perhaps in some other format.
This transformation can be done either inside the browser, or by the server before the file is sent. Transformation in the browser offloads the processing from the server, but may introduce browser dependencies, leading to some of your readers being excluded. Transformation in the server makes the process browser-independent, but places a heavier processing load on the server.
As with any system where files can be viewed at random by arbitrary users, the author cannot know what resources (such as fonts) are on the user’s system, so the same care is needed as with HTML using fonts. To invoke a stylesheet from an XML file for standalone processing in the browser, include one of the stylesheet declarations:
<?xml-stylesheet href="foo.xsl" type="text/xsl"?>
<?xml-stylesheet href="foo.css" type="text/css"?>
(substituting the URI of your stylesheet, of course). See http://www.w3.org/TR/xml-stylesheet/ for the full details. The Cascading Stylesheet Specification (CSS) provides a simple syntax for assigning styles to elements, and has been implemented in most browsers.
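A program can find out which stylesheet a document asks for by looking at these processing instructions. The sketch below assumes Python’s xml.dom.minidom, which keeps processing instructions that appear before the root element as children of the document node; the function stylesheet_links is hypothetical. The contents of a processing instruction are not real attributes, just text that looks like them, so the sketch splits them with a regular expression that only handles double-quoted values.

```python
import re
from xml.dom import minidom

doc = minidom.parseString(
    b'<?xml version="1.0"?>\n'
    b'<?xml-stylesheet href="foo.xsl" type="text/xsl"?>\n'
    b'<?xml-stylesheet href="foo.css" type="text/css"?>\n'
    b"<doc>Hello</doc>"
)


def stylesheet_links(document):
    """Return (type, href) pairs from the document's xml-stylesheet PIs."""
    links = []
    for node in document.childNodes:  # top-level nodes: prolog PIs and the root
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # node.data is plain text such as: href="foo.xsl" type="text/xsl"
            pseudo = dict(re.findall(r'([\w-]+)="([^"]*)"', node.data))
            links.append((pseudo.get("type"), pseudo.get("href")))
    return links


print(stylesheet_links(doc))  # [('text/xsl', 'foo.xsl'), ('text/css', 'foo.css')]
```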
Dave Pawson maintains a comprehensive XSL FAQ at http://www.dpawson.co.uk/xsl/, and his book [Pawson, 2002] (the ‘Fox book’) is available from O’Reilly. XSL uses XML syntax (an XSL stylesheet is just an XML file) and has widespread support from several major browser vendors (see the questions on browsers and other software).
XSL comes in two flavours:
• XSL itself, which is a pure formatting language, outputting a Formatting Objects (FO) file, which needs a text formatter like FOP, XEP, or others to create printable (PDF) output (but see Question 2.5). Currently I am not aware of any Web browsers which support direct XSL rendering to PDF;
• XSLT (T for Transformation), which is a language to specify transformations of XML into HTML either inside the browser or at the server before transmission. It can also specify transformations from one vocabulary of XML to another, and from XML to plain text (which can be any format, including RTF and LaTeX).
All current versions of Microsoft Internet Explorer, Firefox, Chrome, Mozilla, Safari, and Opera handle XSLT 1.0 inside the browser. Beware obsolete browsers like MSIE 5.5, which needs some post-installation surgery to remove the long-obsolete WD-xsl and replace it with the current XSL-Transform processor.
WYSIWYG for XSL
There have been attempts to produce pseudo-WYSIWYG editors for creating XSL[T] stylesheets, but they have mostly been restricted to simple mapping between input elements and output elements (eg a DocBook para to an HTML p). Anything beyond this seems likely to fail because of the infinite complexity of what people want to do with their information. If you have access to the ACM database, see the paper by Pietriga, Vion-Dury, and Quint on VXT, from the ACM DocEng’01 (Atlanta) Proceedings.
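The kind of simple one-to-one mapping such editors manage can be expressed in a few lines of ordinary code. The sketch below is Python with xml.etree.ElementTree, not XSLT: the RULES table and the transform function are hypothetical, and attributes are simply dropped. Anything more ambitious (reordering, generated text, cross-references) is where a real XSLT stylesheet is needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-to-one rules: source (DocBook) element -> HTML element.
RULES = {"article": "div", "title": "h1", "para": "p", "emphasis": "em"}


def transform(src: ET.Element) -> ET.Element:
    """Copy the tree, renaming each element by RULES (attributes are dropped)."""
    out = ET.Element(RULES.get(src.tag, "span"))  # unmapped elements become <span>
    out.text, out.tail = src.text, src.tail       # keep text and trailing text
    for child in src:
        out.append(transform(child))
    return out


docbook = ET.fromstring(
    "<article><title>Hello</title>"
    "<para>A <emphasis>simple</emphasis> mapping.</para></article>"
)
print(ET.tostring(transform(docbook), encoding="unicode"))
# <div><h1>Hello</h1><p>A <em>simple</em> mapping.</p></div>
```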
Generating HTML on the server
There is a growing use of server-side processors like Cocoon and others, which let you create, store, and manage your information in XML but serve it auto-converted to HTML or some other format, thus allowing the output to be used by any browser. XSLT is also widely used to transform XML into non-SGML formats for input to other systems (for example to transform XML into LaTeX for typesetting).
Alternatives to XSL:FO
Instead of generating PDF via an FO processor, it is possible to use XSLT2 to transform XML to LaTeX for typesetting PDF (as is done for the print versions of this FAQ, from DocBook to LaTeX). This has the advantage of being able to make use of LaTeX’s extensive library of prewritten formatting modules (‘packages’), which avoids much of the wheel-reinventing currently required with XSL:FO.
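Any XML-to-LaTeX conversion, whether written in XSLT2 or anything else, has two jobs: map elements to LaTeX commands, and escape the characters LaTeX treats as markup. Here is a minimal Python sketch of both, assuming a hypothetical `<section>` holding one `<title>` and some `<para>` elements. It is an illustration only, not the conversion used for this FAQ.

```python
import xml.etree.ElementTree as ET

# Characters LaTeX treats as markup, and their text-mode replacements.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape(text: str) -> str:
    """Escape one character at a time, so replacements are never re-escaped."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def to_latex(section: ET.Element) -> str:
    """Map a hypothetical <section> (one <title>, some <para>s) to LaTeX."""
    lines = [r"\section{" + escape(section.findtext("title", "")) + "}"]
    for para in section.iter("para"):
        lines.append("")  # a blank line separates LaTeX paragraphs
        lines.append(escape("".join(para.itertext())))  # flatten inline markup
    return "\n".join(lines)


src = ET.fromstring(
    "<section><title>Costs &amp; fees</title>"
    "<para>Prices rose 5% in Q1.</para></section>"
)
print(to_latex(src))
```

Note that `&amp;` in the XML arrives in Python as a plain `&`, which then has to be re-escaped as `\&` for LaTeX: each format has its own special characters.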
Alternatively, David Carlisle’s xmltex reads XML directly, offering another practical if experimental solution to typesetting XML. One use of a TeX system that can typeset XML files is as a backend processor for XSL:FO, serialised as XML. Sebastian Rahtz’s PassiveTeX uses xmltex to achieve this end.
The TeX FAQ is at http://www.tex.ac.uk/faq. Silmaril maintains the online version of Peter Flynn’s book on LaTeX, Formatting Information, which has some examples of XSLT2 conversion [Flynn, 2014].
SGML systems used a similar stylesheet mechanism: some of the common ones were the FOSI (Formatted Output Specification Instance), which was standard in defence and industrial engineering applications, especially when using the Arbortext editor (Adept, then Epic, probably something else next week); the DynaText/DynaWeb stylesheet used in SGML publishing to the web; and the Synex stylesheet used in browsers based on the Synex engine (eg Panorama, whose styling interface was partly adopted in XMetaL), the expertise of whose designers persists in the DocZilla browser.
2.6 Where can I get an XML browser?
All modern browsers support XML
Current state of existing browser support for XML (1 August 2014):
• Current versions of Microsoft Internet Explorer, Firefox, Safari, Chrome, Mozilla, and Opera all appear to support XML with CSS and/or XSLT 1.0 stylesheets. The editor would welcome additional information and corrections.
• Don’t use Netscape (any version), Internet Explorer 6 or earlier, or any early versions of Mozilla if you want XML support: they either lack it or have a hopelessly broken implementation. Upgrade to a modern browser as soon as possible.
The remainder of this list is of historical interest only.
• Microsoft Internet Explorer 5.0 and 5.5 handled XML, processing it by default using a built-in stylesheet written in a Microsoft-specific, obsolete predecessor of XSLT called XSL (not to be confused with the real XSLT). The output of the stylesheet is DHTML, which, when rendered in the browser, shows a coloured, syntax-highlighted version of the XML document, with collapsible views. If the XML document references a stylesheet, that stylesheet will be used instead, within the limitations of MSIE’s incomplete implementation of CSS. MSIE 5.0 and 5.5 can also use stylesheets in another obsolete syntax called WD-xsl, which should be avoided. These versions can be upgraded to support real XSLT: see the MSXML FAQ.
MSIE 6.0 and later use real XSLT 1.0, but can use both the obsolete syntaxes as well.
• Mozilla Firefox 0.9 up, Netscape 6 and 7 (there is no Netscape 5), and Galeon all have full XML support with XSLT and CSS. In general, Firefox is more robust than MSIE, and provides better standards adherence.
I have a user report that Netscape 4.6 and 4.8 support XML, but no independent verification.
• The authors of the former MultiDocPro SGML browser, CITEC (whose engine was also used in Panorama and other browsers), joined forces with Mozilla to produce a multi-everything browser called DocZilla, which read HTML, XML, and SGML, with XSLT and CSS stylesheets. This ran under Windows and Linux and was at release 1.0 at the time it became unavailable. This was by far the most ambitious browser project, and was backed by very solid markup-handling expertise.
I have less information on the XML capabilities of the Mac OS X browser Safari, which is based on the KHTML engine used in Konqueror. Konqueror itself does not appear to support XML or XSLT (at least in KDE under Fedora Core, for example), but Safari 1.3.2 (v312.6) under OS 10.3 did provide partial support for XML, although it did not honour an external DTD modified by an internal subset (thanks to John Haynie for testing this).
Mike Brown writes:
The concept of ‘browsing’ is primarily the result of HTML having the semantics that it does. In an HTML document there are sections of text called anchors that are ‘hyperlinked’ to other documents that might be at remote locations on a network or filesystem. HTML documents provide cues to a web browser regarding how the document should be displayed and what kind of behaviours are expected of the browser when the user interacts with it. The HTML specification provides many suggestions and requirements for the browser, and provides specific meanings for many different examples of markup, such as the fact that an <img> element refers to an image that should be retrieved by the browser and rendered inline with the adjacent text.
Unlike HTML, XML does not have such inherent semantics at all. There is no prescribed method for rendering XML documents. Therefore, what it means to ‘browse’ XML is open to interpretation. For example, an XML document describing the characteristics of a machine part does not carry any information about how that information should be presented to a user. An application is free to use the data to produce an image of the part, generate a formatted text listing of the information, display the XML document’s markup with a pretty color scheme, or restructure the data into a format for storage in a database, transmission over a network, or input to another program.
However, despite the fact that XML documents are purely descriptive data files, it is possible to ‘browse’ them in a sense, by rendering them with stylesheets. A stylesheet is a separate document that provides hints and algorithms for rendering or transforming the data in the XML document. HTML users may be familiar with Cascading Style Sheets (CSS). The CSS stylesheet language is general and powerful enough to be applied to XML documents, although it is oriented toward visual rendering of the document and does not allow for complex processing of the document’s data. By associating an XML document with a CSS stylesheet, it may be possible to load an XML document in a CSS-aware web browser, and the browser may be able to provide some kind of rendering of it, even if the browser does not otherwise know how to read and process XML documents. However, not all web browsers will load an XML document correctly, and they are not required to recognise the XML markup that associates the document with a stylesheet, so one cannot assume that XML documents can be opened with just any web browser.
A more complex and powerful stylesheet language is XSLT, the