
A positive point about this tool is that it marks the pages with information. At the top of each page,
it places an anchor whose name is the page number. It also shows the page number at
the end of each page. This page numbering and naming makes the recognition of page breaks very
easy, and is a nice feature.
During the testing of this program, I did not find many PDF files that the program was unable to
convert. The only such files were PDF files of particularly bad quality, which I suspect were
documents that may have contained unknown fonts, so that improvisation had to be carried out
by the tool that created the PDF. The tool had absolutely no problems converting PDF files that
were created from Microsoft Word documents, and the HTML source for PDF files converted from
such documents was exactly the same as for those converted from non-Microsoft documents.
Although the tool does place bold tags and italic tags into the HTML source, this is not 100%
reliable, as the tagging information is a little erratic. Take for example the following extract:
<i>Klim</i><i>a-</i>
<i>u</i><i>n</i><i>d</i>
<i>K</i><i>ält</i><i>eanlagen</i><br>
<i>du</i><i>rc</i><i>h</i>
<i>E</i><i>ins</i><i>a</i><i>tz</i>
<i>v</i><i>o</i><i>n</i>
<i>e</i><i>lek</i><i>t</i><i>ronis</i><i>c</i><i>h</i>
<i>geregelt</i><i>e</i><i>r</i>
<i>P</i><i>um</i><i>pen.</i>
<i>KI</i>
<i>Luf</i><i>t-</i>
<i>und</i>
<i>Kältet</i><i>ec</i><i>hni</i><i>k</i><i>,</i>
<i>page</i>
<i>20-</i><br>
<i>23</i><i>,</i>
<i>J</i><i>anuary</i>
Figure 9. HTML source produced by pdftohtml. Note the pointless repetition of HTML tagging information.
Notice from the above extract that the tags are closed and re-opened pointlessly for parts of a
word, where there should really be one set of tags enclosing the entire word. In fact, often there should
really be one set of tags enclosing the entire section of italicised text. Of course this does not make a
difference to the browser, but if we are interested in the source, then there is a lot of cleaning to be
done upon it.
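Part of that cleaning can be sketched in a few lines of Python. This is only an illustration of the idea, not something the tool provides: the function name `merge_adjacent_tags` is hypothetical, and it only handles the "<i>" and "<b>" tags discussed here.

```python
import re

# A closing tag followed (after optional whitespace) by the same opening tag,
# e.g. "</i><i>". Removing the pair does not change what the browser shows.
_REDUNDANT_PAIR = re.compile(r"</(i|b)>(\s*)<\1>", re.IGNORECASE)


def merge_adjacent_tags(html: str) -> str:
    """Collapse runs such as <i>Klim</i><i>a-</i> into <i>Klima-</i>.

    Whitespace between the two tags is kept, so line breaks in the
    source survive. A "<br>" between the tags prevents the merge.
    """
    return _REDUNDANT_PAIR.sub(r"\2", html)
```

Applied to the first line of the extract, this would turn "<i>Klim</i><i>a-</i>" into "<i>Klima-</i>". It does nothing about the erratic choice of which text is tagged, only about the redundant closing and re-opening.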
The ‘pdftohtml’ Tool: Testing Conclusion
It is a shame that the HTML source produced by this tool is so broken up. It would have been
preferable if it was only broken where the relevant HTML break tags were. However, despite this
fact, the HTML output is displayed very well in a browser, and so if this is the goal, then this tool
would be perfect. It even has the capability to include pictures, make frames, etc.
Unfortunately, although the tool does place bold and italicising mark-up tags into the HTML
source, it is not reliable enough for my purposes. I had wanted the tool to mark up all headers,
preferably with "<H>" tags, so that I could recognise section headers etc. However, the tool does
not use "<H>" tags, and it would sometimes mark up a header, but other times it would not. There is
nothing to be gained from checking for marked-up headers in the source produced, as it uses
“<B>” tags to mark headers up, but also marks up normal text with these bold tags, hence
destroying any way of uniquely identifying the headers by mark-up.
I would say that as far as PDF-to-HTML converters go, this tool is very good for creating HTML
that is intended for display in a browser, and I would strongly recommend the tool if this is the
desired operation. However, it must be realised that if the source is wanted for any further
examination, it will be necessary to do considerable cleaning on it before it is ready.
Verdict:
Recommended for PDF-to-HTML conversions to be viewed in a browser.
Not reliable if mark-up of various sections is required.
The ‘pstotext’ Tool
Source: <http://research.compaq.com/SRC/virtualpaper/pstotext.html>
The pstotext tool is freely available from the above URL. It was written by Andrew Birrell, as a
spin-off from a project known as "Virtual Paper".
The pstotext tool is primarily a PostScript to text converter, but can also be used for converting
PDF documents to text (although the documentation for the tool claims that this is slightly less
reliable).
The pstotext tool requires version 3.33 or later of Ghostscript in order to work.
Installing pstotext
The pstotext tool is downloaded from the above URL as a zipped ".tar" file. Its installation is really
very simple, it only being necessary to build the tool with the make command.
Using pstotext
The pstotext tool has the following usage information:
Usage: pstotext [option|file]...
Options:
-cork
assume Cork encoding for dvips output
-landscape
rotate 270 degrees
-landscapeOther
rotate 90 degrees
-portrait
don't rotate (default)
-bboxes
output one word per line with bounding box
-debug
show Ghostscript output and error messages
-gs "command"
Ghostscript command
-
read from stdin (default if no files specified)
-output file
output results to "file" (default is stdout)
Essentially, I used the tool with none of the options, in the form:
«
pstotext [filename]
»
This streamed the output to STDOUT, which I found to be a good feature, as it allows the output to
be directly captured and manipulated instead of having to be written to a file, which would take up
extra time.
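As an illustration of capturing that output directly, the following Python sketch runs the converter as a child process and returns its STDOUT as a string. The function name `convert_to_text` is hypothetical, the command defaults to the plain `pstotext [filename]` form shown above, and decoding the bytes as Latin-1 is an assumption, since the output encoding is not stated here.

```python
import subprocess
from typing import Sequence


def convert_to_text(path: str, command: Sequence[str] = ("pstotext",)) -> str:
    """Run `<command> <path>` and return whatever it writes to stdout."""
    result = subprocess.run(
        [*command, path],
        capture_output=True,  # collect stdout/stderr instead of printing them
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    # Assumption: Latin-1 maps every byte to a character, so decoding
    # cannot fail even if the converter emits unexpected bytes.
    return result.stdout.decode("latin-1")
```

A call such as `convert_to_text("paper.ps")` (the file name is only an example) would then hand the text straight to the calling program without an intermediate file.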
I found that the pstotext tool worked fairly quickly. For a PostScript file of around 322KB, it took
approximately 50 seconds to make the conversion to text, including writing it to a file.
On the whole, the quality of the output produced by pstotext was very good. There were rarely
many words incorrectly broken with spaces. Lines took the same lengths as they did in the original
file (i.e. they are wrapped in the same places as they were in the PostScript file). This wrapping
however was done by inserting a newline character into the line, thus breaking it into separate
lines. This is perhaps a little unfortunate, as it would be nice in the interests of parsing the results if
the tool were to only insert newline characters where there was really supposed to be a new line
(not where lines were wrapped for formatting purposes), as it would eliminate the need to rebuild
lines at a later stage.
One problem that I did encounter in the output was with the word "different". In the output text
from a document, it kept outputting "di#erent" where the word "different" was supposed to be. I
believe that this must be something to do with 2 'f' characters being next to each other - perhaps
some sort of character encoding problem. This could cause big problems if the text was to be
searched for certain keywords, as strange characters in the middle of words in the text could
prevent their matching.
Another drawback to the tool's output (albeit a small one) was that when a word was split and
hyphenated (due to word wrapping in the PostScript), this tool made no effort to remove the
hyphenation and place the word back together as one word. This is unfortunate, as it would have
helped to improve the quality of the output.
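Rejoining such words afterwards is possible in a rough way. The sketch below is a heuristic of my own, not part of pstotext: the name `dehyphenate` is hypothetical, and it assumes that a hyphen at the end of a line followed by a lower-case letter on the next line marks a wrapped word.

```python
import re

# A word fragment ending in "-" at the end of a line, followed by a
# lower-case continuation at the start of the next line.
_WRAPPED_WORD = re.compile(r"(\w+)-\n\s*([a-z]\w*)")


def dehyphenate(text: str) -> str:
    """Join words that were hyphenated across a line break."""
    return _WRAPPED_WORD.sub(r"\1\2", text)
```

The heuristic cannot tell a wrapping hyphen from a genuine one, so a compound such as "Value-added" that happens to break at its hyphen would be joined incorrectly. Number ranges such as "20-" followed by "23" are left alone because the continuation must start with a lower-case letter.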
The pstotext tool does not insert any kind of page break information such as a line of hyphens as in
the output of the prescript tool. All it does is print the page number (if the document had one),
along with a '\f' character, which is a form-feed character, and serves as a page break. This is
perhaps unfortunate, as it would aid the clarity of the output to have a line of hyphens or other such
characters instead of a '\f' character.
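The form-feed character is still enough to split the output into pages programmatically. The following Python sketch shows one way to do so; the name `split_pages` is hypothetical, and treating a final digits-only line as the printed page number is an assumption based on the behaviour described above.

```python
from typing import List, Optional, Tuple


def split_pages(text: str) -> List[Tuple[Optional[int], str]]:
    """Split converter output on form feeds into (page_number, body) pairs."""
    pages = []
    for chunk in text.split("\f"):
        lines = chunk.strip("\n").split("\n")
        number = None
        # Assumption: a last line consisting only of digits is the printed
        # page number, not part of the final reference on the page.
        if lines[-1].strip().isdigit():
            number = int(lines.pop().strip())
        body = "\n".join(lines).strip("\n")
        if body or number is not None:
            pages.append((number, body))
    return pages
```

Separating the page number from the body in this way also keeps it from being read as part of the preceding reference. The assumption fails if a reference genuinely ends with a bare number on its own line, so the result would need checking on real documents.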
The following extract from a converted PostScript document shows the quality of text produced
by the pstotext tool for PostScript documents:
[9] Sandra Payette and Carl Lagoze. Value-added surrogates for dis-
tributed content. D-Lib Magazine: The Magazine of Digital Li-
brary Research, 6(6), June 2000.
18
[10] Andy Quick. Java HTML tidy.
<http://www3.sympatico.ca/ac.quick/jtidy.html>
[11] K. G. Saur. Functional requirements for bibliographic records,
1998. UBCIM Publications - New Series Vol. 19.
[12] Karen Sollins and Larry Masinter. Functional require-
ments for uniform resource names, December 1994.
http://www.ietf.org/rfc/rfc1737.txt.
[13] Elaine Svenonius. The Intellectual Foundation of Information
Organization. M.I.T. Press, 2000.
[14] Herbert Van de Sompel and Carl Lagoze. The Santa Fe Con-
vention of the Open Archives Initiative. D-Lib Magazine: The
Magazine of Digital Library Research, 6(2), February 2000.
19
Figure 10. Text produced by the ‘pstotext’ tool. The quality is fairly good.
Notice in the above figure the page number (18) that comes after the 9th reference. This could
easily be mistaken for another part of that reference during parsing.
When the tool was tried using PDF files as input, the results of the conversion were fair, but
a lack of quality was certainly visible. In fact, in some PDF conversions, the document was cut
short. This suggests to me that the pstotext tool possibly has some difficulties in reading the
internal references to the objects that make up the PDF file. This is not to say that the tool
was useless for converting PDF documents to text, just certainly not perfect.
The following extract shows part of a conversion of a PDF document:
[37] ATLAS home page http://atlasinfo.cern.ch:80/Atlas/Welcome.html
[38] ATLAS Trigger/DAQ Prototype-1 home page http://atddoc.cern.ch/Atlas/
[39] Applications of Corba in the Atlas prototype DAQ, S. Kolos, R. Jones. L. Mapelli, Y. Ryabov,
11th
IEEE NPSS Real Time Conference Proceedings, 1999, pp 469-474
[40] Textor-94 experiment home page is http://www.fz-juelich.de/ipp
[41] Objectivity/Corba distributed database performance on gigabit SUN-Ultra-10 cluster,
L. Gommans and others, 11th IEEE NPSS Real Time Conference Proceedings, 1999, 442-445
[42] Overview of PHENIX Online System, C. Witzig, 10th IEEE Real Time Conference
Proceedings,
1998, pp 541-543
[43] Use of CORBA in the PHENIX Distributed Online Computing System, E. Desmond and
others,
11th IEEE NPSS Real Time Conference Proceedings, 1999, pp 487-491
[44] BaBar home page http://www.slac.stanford.edu/BFROOT/
[45] Ambient and Configuration Databases for the BaBar Online System, G. Zioulas and others,
11th
IEEE NPSS Real Time Conference Proceedings, 1999, pp 548-550
Figure 11. Text produced by pstotext (converted from a PDF document).
As can be seen from this extract, the quality is fairly high. It is a shame that this can't be
guaranteed every time.
Unfortunately, during testing, I discovered that the pstotext tool was very unreliable when
attempting to convert a PostScript file that was created from a Microsoft document. I attempted to
convert several PostScript files that had been created from Microsoft Word and PowerPoint files by
the CERN Conversion Service, and obtained "garbage" output similar to the following:
-
--
--
--
--
--
--
-
--
-
--
.
.
.
-
Figure 12. Garbage output obtained when pstotext attempts to convert a PostScript file produced from a
Microsoft document.
It can be seen from the above extract that the output for this Microsoft-created PostScript that has
been converted to text is completely useless. What is worse is that the tool does not appear to
output any sort of error message saying that it cannot properly understand the file that has been
passed to it as input.
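Because no error is reported, a calling program would have to detect such output itself. One crude possibility is sketched below in Python; the name `looks_like_garbage` and the threshold of 0.5 are my own assumptions and have not been tuned against real conversions.

```python
def looks_like_garbage(text: str, min_letter_ratio: float = 0.5) -> bool:
    """Return True when too few of the visible characters are letters.

    Empty output is also treated as garbage, since a successful
    conversion is expected to produce some text.
    """
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return True
    letters = sum(ch.isalpha() for ch in visible)
    return letters / len(visible) < min_letter_ratio
```

Output like that in the figure above, which consists only of hyphens and full stops, has a letter ratio of zero and would be flagged. Pages that are legitimately dominated by numbers (tables, for example) could be flagged wrongly, so the check is only a first filter.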
The ‘pstotext’ Tool: Testing Conclusion
I have found that the pstotext tool is capable of producing nice output that is fairly easy to parse.
There have been a few shortcomings with this output, such as the lack of page break information etc,
but this is not too crucial, as it still outputs the form-feed character.
However, on the downside, although very nice output was often obtained from a PDF
file conversion, sometimes part of the file would be lost. This makes it partly unreliable for PDF
conversions.
The biggest letdown of all for the pstotext tool is that it seems to be very unreliable at converting
Microsoft PostScript documents to text. This, in my opinion, makes the tool unsuitable for use in a
production environment where we cannot determine the sources and creators of the PostScript/PDF
files that we want to convert. In fact, it is quite likely that many of the files that we would expect to
convert would have been created from Microsoft Word documents, so clearly, this tool is
unsuitable.
Verdict: Unsuitable.
The ‘Prescript’ Tool
Source: <http://www.nzdl.org/html/prescript.html>
Prescript is freely available from the above URL. It was written by people at the New Zealand
Digital Library organisation as a translator to change PostScript documents into text. It also,
however, offers support for a simple HTML output of the document. Unfortunately, because
prescript is a PostScript to text translator, it does not offer PDF to text translation capabilities.
When prescript is used to produce HTML, it has the capability only to introduce certain HTML
tags. These are the "<P>", "<BR>", "<HR>" and "<I>...</I>" tags. It also of course inserts the
"<HTML>", "<HEAD>", etc tags into the document. According to the NZDL site, prescript also
attempts to detect paragraph boundaries by using the line spacing and indentation in the
document.
According to the NZDL site, prescript also attempts to de-hyphenate words that have been
hyphenated by the PostScript, which could be a fairly useful feature. It also attempts some ligature
translation for TeX documents.
Prescript requires version 4.01 or higher of the Ghostscript utility in order to work. It is also
written in the Python language, and so requires the Python interpreter.
There are 2 main versions of prescript available. These are "PreScript 0.1", which is the stable
version of the program, and is recommended by the authors as the version that should be used for
any serious work that is to be undertaken. There is also however, "PreScript 2.2", which is the
latest version of the tool. The authors claim that this version of the tool is a lot faster, and generally
better, including better prediction of line, page and paragraph breaks.
Installing prescript
Having downloaded the prescript tool, which came as a "tar" package, a makefile was used in order
to install it. However, many things needed to be done manually, such as making the various
directories that it needed, as it could not successfully do this during the attempted builds. It was
also necessary for me to change the pointer to the Python interpreter, etc. It was a little awkward,
but no major problems were encountered.
Using prescript:
The tool could be invoked as follows:
«
prescript <plain|html|arff> <input> [output]
»
It was necessary to specify for the tool which format the output should take, the name of the input
file (the PostScript file to be converted), and the name of the output file, to which the output was to
be written. Unfortunately, it is not possible to tell the utility to simply write its output to the
STDOUT stream - it appears to need to actually write it to a file. This is a definite downside to the
tool, as for my purposes, I simply want to call the tool from within another program, feeding its
output directly to STDOUT, and retrieving it for use by my own program. An intermediate stage of
writing a file would be a performance drawback.
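The intermediate file can at least be hidden from the calling program. The Python sketch below wraps a file-writing converter so that the caller simply receives a string. The function name `run_file_writing_converter` is hypothetical, the argument order (format, input, output) is taken from the usage line above and has not been checked against the tool itself, and reading the result as Latin-1 is an assumption.

```python
import os
import subprocess
import tempfile
from typing import Sequence


def run_file_writing_converter(
    input_path: str,
    fmt: str = "plain",
    command: Sequence[str] = ("prescript",),
) -> str:
    """Run `<command> <fmt> <input> <output>` and return the output file's text."""
    # The temporary directory and everything in it are removed on exit.
    with tempfile.TemporaryDirectory() as workdir:
        output_path = os.path.join(workdir, "converted.out")
        subprocess.run([*command, fmt, input_path, output_path], check=True)
        # Assumption: Latin-1 is used so that any byte can be decoded.
        with open(output_path, encoding="latin-1") as handle:
            return handle.read()
```

This does not remove the cost of writing and re-reading the file, it only keeps the bookkeeping in one place.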
Having learned how to use the tool, I tried it with several PostScript files, testing both the HTML
output and the plain-text ASCII output. I found that on the whole, the tool gave some very clean
and encouraging results.
First of all, the HTML results shall be discussed. I shall also include some short extracts from the
converted output of a file so that they can be appreciated within this document.
When viewed within a browser, the HTML results are very nice. Although the text is in fairly short
lines, it is well broken up into paragraphs. It is very clear to read in this manner. One problem that
was encountered however, was that when there is an image in the PostScript document and this
image contains words, the words are unfortunately translated and placed within the output text.
Often, this can be nonsense because they mean nothing without the rest of the image, and it would
in my opinion have been better to simply leave them out of the translation. Unfortunately, this is a
problem common to all of the conversion tools tested for the purposes of this report. It is my
feeling that the tools cannot distinguish between text written on top of an image, and text written on
the rest of the PS/PDF canvas. Presumably, the image is represented as binary information, but its
text remains as a series of characters written on the canvas with the usual operators.
As my goal is the extraction of reference information from the reference section however, it should
usually cause no problems. However, because the references section can sometimes contain
images and figures, any text from these could pollute the references.
When during the translation process the tool discovers a new page in the PostScript document, it is
marked in the HTML output with a "<HR>" tag. This could be very useful, as it would allow any
parsing tool to easily and unambiguously identify any new pages in the output. The tool also marks
each page with the page number just before the "<HR>" tag.
A downside to the HTML produced is that the prescript tool does not make any effort to mark up
title sections with "<H>" tags or "<B>" tags. This is certainly unfortunate, as it would be very
useful for a parser attempting to extract references to have title sections marked up. The authors'
information about the tool does say that it uses "<I>" tags, but only for marking up header and
footer sections. Very disappointing.
Regarding the source of the HTML itself, I can only say that it is very good. With other PostScript
to text conversion tools that I have seen, the quality of the output is often messy, with hyphenated
words frequently occurring, and with words not properly recognised and ending up with spaces in
the middle of a word, hence creating further parsing difficulties for any tools that use the output to
attempt to recognise words and information. With the HTML source produced by the prescript tool
however, this is not the case. I did not once see a word that had been erroneously split.
Lines were broken at the end of the line as it appears in the PostScript document. The only reason
that these lines were broken at these points in the PostScript document is that the formatting of the
text in the PostScript requires lines to be wrapped in order that they fit the page. In our output
HTML however, it would be preferable if the lines did not keep this format of line breaks unless
there is really supposed to be one. However, with the HTML output, it was not a large problem
because with the HTML markup, it would be easy to replace all carriage returns in the text with
spaces, unless there was a "<BR>" or "<P>" tag present, in which case the carriage return in the
text would be justified.
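That replacement can be sketched in Python as follows. The name `unwrap_lines` is hypothetical, and the sketch makes two assumptions: that a line break counts as justified when the previous line ends with a break tag or the next line begins with one, and that "<HR>" should be treated like "<BR>" and "<P>".

```python
import re

# A break tag at the very end of a line, or at the very start of one.
_BREAK_AT_END = re.compile(r"<(br|p|hr)\b[^>]*>\s*$", re.IGNORECASE)
_BREAK_AT_START = re.compile(r"^\s*<(br|p|hr)\b[^>]*>", re.IGNORECASE)


def unwrap_lines(html: str) -> str:
    """Replace line breaks with spaces unless a break tag justifies them."""
    lines = html.split("\n")
    merged = [lines[0]]
    for line in lines[1:]:
        if _BREAK_AT_END.search(merged[-1]) or _BREAK_AT_START.search(line):
            merged.append(line)  # genuine break: keep the newline
        else:
            merged[-1] = merged[-1].rstrip() + " " + line.lstrip()
    return "\n".join(merged)
```

With this, a reference that the converter wrapped over three lines becomes a single line, while each line that starts with "<p>" still begins a new one.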
The following extract shows some HTML source created by the prescript program for a document.
<p>[1] Donna Bergmark. Automatic extraction of reference linking information
from online documents. Technical Report TR2000-1821,
Cornell Computer Science Department, October 2000.
<p>[2] Priscilla Caplan and William Arms. Reference linking
for journal articles. D-Lib Magazine: The Magazine
of Digital Library Research, 5(7/8), July/August 1999.
&lt;http://www.dlib.org/dlib/july99/caplan/07caplan.html&gt;
<p>[3] James Davis and Carl Lagoze. NCSTRL: design and deployment
of a globally distributed digital library. IEEE Computer, February
1999.
<p>[4] Steve Hitchcock, Les Carr, Wendy Hall, Stephen Harris, S. Probets,
D. Evans, and D. Brailsford. Linking electronic journals:
Lessons from the Open Journal project. D-Lib Magazine: The
Magazine of Digital Library Research, December 1998.
<p>[5] C. Lagoze and J. Davis. Dienst: An architecture for distributed
document libraries. Communications of the ACM, 38(4):47, April
1995.
<p>[6] Steve Lawrence, C. Lee Giles, and Kurt Bollacker. Digital libraries
and autonomous citation indexing. IEEE Computer,
32(6):67{71, 1999. &lt;http://www.researchindex.com&gt;
<p>[7] Norman Paskin. E-citations: actionable identifiers and scholarly
referencing, 1999. &lt;http://www.doi.org/citations.pdf&gt;
<p>[8] S. Payette and C. Lagoze. Flexible and extensible digital object
and repository architecture (FEDORA). In Second European
Conference on Research and Advanced Technology for Digital Libraries,
Heraklion, Crete, 1998.
<p>[9] Sandra Payette and Carl Lagoze. Value-added surrogates for distributed
content. D-Lib Magazine: The Magazine of Digital Library
Research, 6(6), June 2000.
<p><!--Page No--><p><b><center>18</center></b><p>
<!--EndOfPage--><p><hr><p>
<p>[10] Andy Quick.
Java HTML
tidy.
&lt;http://www3.sympatico.ca/ac.quick/jtidy.html&gt;
<p>[11] K. G. Saur. Functional requirements for bibliographic records,
1998. UBCIM Publications - New Series Vol. 19.
<p>[12] Karen Sollins and Larry Masinter. Functional requirements
for uniform resource names, December 1994.
http://www.ietf.org/rfc/rfc1737.txt.
<p>[13] Elaine Svenonius. The Intellectual Foundation of Information
Organization. M.I.T. Press, 2000.
<p>[14] Herbert Van de Sompel and Carl Lagoze. The Santa Fe Convention
of the Open Archives Initiative. D-Lib Magazine: The
Magazine of Digital Library Research, 6(2), February 2000.
<p><!--Page No--><p><b><center>19</center></b><p>
Figure 13. A sample of the HTML source created by the prescript tool. It is of a very high quality.
Notice from the above HTML source that although each reference is split into several lines of
text (wrapped), each reference is separated from the previous one by a "<P>" tag. This would make it
very easy for a parser to rebuild the complete reference line, and indeed to separate several
reference lines from each other. Notice also the way that the end of a page is marked with the
page number (and there is also a comment to let us know that this is the page number):
"<p><!--Page No--><p><b><center>19</center></b><p>"
This would make it very easy for a parser to recognise that the page has reached its end, and
therefore to recognise the pattern of newlines etc that come with the end of the page, and thus
remove them appropriately.
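A parser along those lines might look like the following Python sketch. The name `extract_references` is hypothetical. The patterns are modelled on the page-number block quoted above and on the end-of-page comment in the extract, with optional whitespace allowed inside them because I am not certain of the exact spacing the tool emits; treating a block as a reference only when it starts with a bracketed number is a further assumption.

```python
import re
from typing import List

# The page-number block and the end-of-page marker, as seen in the extract.
_PAGE_NO = re.compile(
    r"<p>\s*<!--\s*Page\s*No\s*-->\s*<p>\s*<b>\s*<center>\s*\d+\s*"
    r"</center>\s*</b>\s*<p>",
    re.IGNORECASE,
)
_END_OF_PAGE = re.compile(
    r"<!--\s*End\s*Of\s*Page\s*-->\s*<p>\s*<hr>\s*<p>", re.IGNORECASE
)


def extract_references(html: str) -> List[str]:
    """Return one string per reference, with page markers removed."""
    html = _PAGE_NO.sub("", html)
    html = _END_OF_PAGE.sub("", html)
    references = []
    for block in re.split(r"<p>", html, flags=re.IGNORECASE):
        text = " ".join(block.split())  # rebuild the wrapped lines
        if re.match(r"\[\d+\]", text):  # keep only "[n] ..." blocks
            references.append(text)
    return references
```

Each returned string is a complete reference on one line, so a reference that straddles a page break in the extract no longer has the page number sitting next to it.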
There is also the following end of page comment after the page number has been displayed:
"<!--EndOfPage--><p><hr><p>"
Overall, the HTML source produced by prescript is of a high quality. The figure below shows a
screenshot of its appearance in a browser:
Figure 14. HTML produced by the prescript tool as it appears in a browser.
Text Output
Now that the HTML created by prescript from the PostScript document has been discussed, it is
necessary to discuss the text output of the tool. The discussion of the text output of the prescript
tool provided here will be fairly short, because the text output is very similar to the HTML output.
Essentially, it is the same as the HTML output, but without any markup tags.