
E112
MAPS 39
Péter Szabó
Optimizing PDF output size of TeX documents
Abstract
There are several tools for generating PDF output from a TeX document. By choosing the appropriate tools and configuring them properly, it is possible to reduce the PDF output size by a factor of 3 or even more, thus reducing document download times, hosting and archiving costs. We enumerate the most common tools, and show how to configure them to reduce the size of text, fonts, images and cross-reference information embedded into the final PDF. We also analyze image compression in detail.
We present a new tool called pdfsizeopt.py which optimizes the size of embedded images and Type 1 fonts, and removes object duplicates. We also propose a workflow for PDF size optimization, which involves configuration of TeX tools, running pdfsizeopt.py and the Multivalent PDF compressor as well.
1 Introduction
1.1 What does a PDF document contain
PDF is a popular document file format designed for printing and on-screen viewing. PDF faithfully preserves the design elements of the document, such as fonts, line breaks, page breaks, exact spacing, text layout, vector graphics and image resolution. Thus the author of a PDF document has precise control over the document’s appearance, no matter what operating system or renderer software is used for viewing or printing the PDF. From the viewer’s perspective, a PDF document is a sequence of rectangular pages containing text, vector graphics and pixel-based images. In addition, some rectangular page regions can be marked as hyperlinks, and Unicode annotations can also be added to the regions, so text may be copy-pasted from the documents. (Usually the copy-paste yields only a sequence of characters, with all formatting and positioning lost. Depending on the software and the annotation, the bold and italics properties can be preserved.) A tree-structured table of contents can be added as well, each node consisting of an unformatted caption and a hyperlink within the document.
Additional features of PDF include forms (the user fills some fields with data, clicks on the submit button, and the data is sent to a server in an HTTP request), event handlers in JavaScript, embedded multimedia files, encryption and access protection.
PDF has almost the same 2D graphics model (text, fonts, colors, vector graphics) as PostScript, one of the most widespread page description and printer control languages. So it is possible to convert between PDF and PostScript without loss of information, except for a few constructs, e.g. transparency and color gradients are not supported by PostScript. Conversion from PDF to PostScript may blow up the file size if there are many repetitions in the PDF (e.g. a logo drawn on each page). Some of the interactive features of PDF (such as forms, annotations and bookmarks) have no PostScript equivalent either; other nonprintable elements (such as hyperlinks and the document outline) are supported in PostScript using pdfmark, but many PDF-to-PostScript converters just ignore them.
1.2 How to create PDF
Since PDF contains little or no structural and semantic information (such as in which order the document should be read, which regions are titles, how the tables are built and how the charts are generated), word processors, drawing programs and typesetting systems usually can export to PDF, but for loading and saving they keep using their own file format which preserves semantics. PDF is usually not involved while the author is composing (or typesetting) the document, but once a version of a document is ready, a PDF can be exported and distributed. Should the author distribute the document in the native file format of the word processor, he might risk that the document doesn’t get rendered as he intended, due to software version differences or because slightly different fonts are installed on the rendering computer, or the page layout settings in the word processor are different.
Most word processors, drawing programs and image editors support exporting as PDF. It is also possible to generate a PDF even if the software doesn’t have a PDF export feature. For example, it may be possible to install a printer driver which generates PDF instead of sending the document to a real printer. (On Windows, PDF Creator [22] is such an open-source driver.) Some old programs can emit PostScript, but not PDF. The ps2pdf [28] tool (part of Ghostscript) can be used to convert the PostScript to PDF.
Optimizing PDF output size of TeX documents  EUROTEX 2009  E113
There are several options for PDF generation from TeX documents, including pdfTeX, dvipdfmx and dvips + ps2pdf. Depending on how the document uses hyperlinks and PostScript programming in graphics, some of these would not work. See the details in Subsection 2.1. See [13] for more information about PDF and generating it with LaTeX.
1.3 Motivation for making PDF files smaller
Our goal is to reduce the size of PDF files, focusing on those created from TeX documents. Having smaller PDF files reduces download times, web hosting costs and storage costs as well. Although there is no urgent need for reducing PDF storage costs for personal use (since hard drives in modern PCs are large enough), storage costs are significant for publishing houses, print shops, e-book stores and hosting services, libraries and archives [26]. Usually lots of copies and backups are made of PDF files originating from such places; saving 20% of the file size right after generating the PDF would save 20% of all future costs associated with the file.
Although e-book readers can store lots of documents (e.g. a 4 GB e-book reader can store 800 PDF books of 5 MB average reasonable file size), they get full quickly if we don’t pay attention to optimized PDF generation. One can easily get a PDF file 5 times larger than reasonable by generating it with software which doesn’t pay attention to size, or by not setting the export settings properly. Upgrading or changing the generator software is not always feasible. A PDF recompressor becomes useful in these cases.
It is not our goal to propose or use alternative file formats which support a more compact document representation or more aggressive compression than PDF. An example of such an approach is the Multivalent compact PDF file format [25]; see Section 5 for more details. There is no technical reason against using a compact format for storage, and converting it on the fly to regular PDF before processing if needed. The disadvantage of a nonstandard compact format is that most PDF viewers and tools don’t support it by default, so the user has to install and run the conversion tool, which some users can’t or won’t do just for viewing a PDF. When archiving compact PDF files for a long term, we have to make sure that we’ll have a working converter at restore time. With Multivalent, this is possible by archiving the .jar file containing the code of the converter. But this may not suit all needs, because Multivalent is not open source, there are no alternative implementations, and there is no detailed open specification for its compact PDF file format.
Apixel-based(xedresolution)alternativeof
PDF
is
DjVu(seeSection5).
It is possible to save space in a PDF by removing non-printed information such as hyperlinks, document outline elements, forms, text-to-Unicode mapping or user annotations. Removing these does not affect the output when the PDF is printed, but it degrades the user experience when the PDF is viewed on a computer, and it may also degrade navigation and searchability. Another option is to remove embedded fonts. In such a case, the PDF viewer will pick a font with similar metrics if the font is not installed on the viewer machine. Please note that unembedding the font doesn’t change the horizontal distance between glyphs, so the page layout will remain the same, but the glyphs may look funny or be hard to read. Yet another option to save space is to reduce the resolution of the embedded images. We will not use any of the techniques mentioned in this paragraph, because our goal is to reduce redundancy and make the byte representation more effective, while preserving visual and semantic information in the document.
1.4 PDF file structure
It is possible to save space in the PDF by serializing the same information more effectively and/or using better compression. This section gives a high-level introduction to the data structures and their serialization in the PDF file, focusing on size optimization. For a full description of the PDF file format, see [3].
PDF supports integer, real number, boolean, null, string and name as simple data types. A string is a sequence of 8-bit bytes. A name is also a sequence of 8-bit bytes, usually a concatenation of a few English words in CamelCase, often used as a dictionary key (e.g. /MediaBox) or an enumeration value (e.g. /DeviceGray). Composite data types are the list and the dictionary. A dictionary is an unordered sequence of key–value pairs, where keys must be names. Values in dictionaries and list items can be primitive or composite. There is a simple serialization of values to 8-bit strings, compatible with PostScript Language Level 2. For example,
<</Integer 5 /Real -6.7 /Null null
/StringInHex <Face> /String ((C)2009\\)
/Boolean true /Name /Foo /List [3 4 5]>>
defines a dictionary with values of various types. All data types are immutable.
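The following minimal Python sketch turns Python values into PDF syntax, which makes the serialization rules concrete. The `Name` class and `serialize` function are hypothetical helpers written for illustration; they are not part of pdfsizeopt.py or any PDF library. The sketch ignores `#xx` escaping in names, hex strings, and reals that need more than six decimal places.

```python
class Name(str):
    """A PDF name such as /Foo, stored without the leading slash."""


def serialize(value) -> str:
    """Serialize a Python value to PDF syntax (simplified sketch)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        # PDF reals have no exponent notation; trim trailing zeros.
        return f"{value:.6f}".rstrip("0").rstrip(".")
    if isinstance(value, Name):
        return "/" + value
    if isinstance(value, bytes):
        # Literal string: escape the backslash first, then the parentheses.
        escaped = (value.replace(b"\\", b"\\\\")
                   .replace(b"(", b"\\(").replace(b")", b"\\)"))
        return "(" + escaped.decode("latin-1") + ")"
    if isinstance(value, list):
        return "[" + " ".join(serialize(item) for item in value) + "]"
    if isinstance(value, dict):
        # Dictionary keys must be names.
        items = " ".join(f"/{key} {serialize(item)}" for key, item in value.items())
        return "<<" + items + ">>"
    raise TypeError(f"cannot serialize {type(value).__name__}")


print(serialize({"Integer": 5, "Real": -6.7, "Null": None,
                 "Boolean": True, "Name": Name("Foo"), "List": [3, 4, 5]}))
# prints: <</Integer 5 /Real -6.7 /Null null /Boolean true /Name /Foo /List [3 4 5]>>
```

Because every value has a single canonical textual form here, a size optimizer could serialize two objects this way and compare the strings to spot duplicates.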
Itispossibletodeneavalueforfutureusebydening
anobject.Forexample,
12  obj j [/
PDF
/Text] endobj
denesobjectnumber12tobeanarrayoftwoitems
(
/
PDF
and
/Text
).Thenumber0inthedenitionisthe
so-calledgenerationnumber,signifyingthattheobject
hasnotbeenmodiedsincethe
PDF
wasgenerated.
PDF
C# PDF insert image Library: insert images into PDF in C#.net, ASP
Import graphic picture, digital photo, signature and logo Ability to put image into specified PDF page component supports inserting image to PDF in preview
copy images from pdf; how to copy text from pdf image to word
VB.NET PDF insert image library: insert images into PDF in vb.net
Import graphic picture, digital photo, signature and logo into Insert images into PDF form field in VB.NET. component supports inserting image to PDF in preview
copying images from pdf files; copy image from pdf to
E114
MAPS39
PéterSzabó
makesitpossibletostoreoldversionsofanobjectwith
dierentgenerationnumbers,theonewiththehighest
numberbeingthemostrecent. Sincemostofthetools
justcreateanew
PDF
insteadofupdatingpartsofan
existingone,wecanassumeforsimplicitythatthegener-
ationnumberisalwayszero.Onceanobjectisdened
itispossibletorefertoit(e.g.
12   R
)insteadoftyping
itsvalue.Itispossibletodeneself-referentiallistsand
dictionariesusingobjectdenitions.The
PDF
specica-
tionrequiresthatsome
PDF
structureelements(suchas
the
/FontDescriptor
value)beanindirectreference,i.e.
denedasanobject. Suchelementscannotbeinlined
intootherobject,buttheymustbereferredto.
A PDF file contains a header, a list of objects, a trailer dictionary, cross-reference information (offsets of object definitions, sorted by object number), and the end-of-file marker. The header contains the PDF version (PDF-1.7 being the latest). All of the file elements above except for the PDF version, the list of objects and the trailer are redundant, and can be regenerated if lost. The parsing of the PDF starts at the trailer dictionary. Its /Root value refers to the catalog dictionary object, whose /Pages value refers to a dictionary object containing the list of pages. The interpretation of each object depends on the reference path which leads to that object from the trailer. In addition, dictionary objects may have the /Type and/or /Subtype value indicating the interpretation. For example, <</Subtype /Image ...>> defines a pixel-based image.
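The reference path from the trailer can be followed mechanically. In the Python sketch below, an indirect reference such as `12 0 R` is modelled with a hypothetical `Ref` class, and the document is an in-memory object table: a dict from object number to value. The sketch walks from the trailer's /Root to the catalog and then to the /Pages dictionary. It assumes generation number 0, as discussed above, and it does not guard against a chain of references that loops back on itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """An indirect reference such as `12 0 R` (generation number assumed 0)."""
    num: int
    gen: int = 0


def resolve(value, objects):
    """Follow indirect references until a direct value is reached."""
    while isinstance(value, Ref):
        value = objects[value.num]
    return value


def page_count(trailer, objects):
    """Walk trailer -> /Root (catalog) -> /Pages and read its /Count."""
    catalog = resolve(trailer["Root"], objects)
    pages = resolve(catalog["Pages"], objects)
    return resolve(pages["Count"], objects)


# A tiny document: the catalog (1), the page tree root (2) and one page (3).
# Objects 2 and 3 refer to each other, a self-referential structure.
EXAMPLE_OBJECTS = {
    1: {"Type": "Catalog", "Pages": Ref(2)},
    2: {"Type": "Pages", "Kids": [Ref(3)], "Count": 1},
    3: {"Type": "Page", "Parent": Ref(2)},
}
EXAMPLE_TRAILER = {"Root": Ref(1)}
print(page_count(EXAMPLE_TRAILER, EXAMPLE_OBJECTS))  # prints 1
```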
In addition to the data types above, PDF supports streams as well. A stream object is a dictionary augmented by the stream data, which is a byte sequence. The syntax is
X Y obj <<dict-items>> stream stream-data endstream endobj
The stream data can be compressed or otherwise encoded (such as in hex). The /Filter and /DecodeParms values in the dictionary specify how to uncompress/decode the stream data. It is possible to specify multiple such filters, e.g. /Filter [/ASCIIHexDecode /FlateDecode] says that the bytes after stream should be decoded as a hex string, and then uncompressed using PDF’s ZIP implementation. (Please note that the use of /ASCIIHexDecode is just a waste of space unless one wants to create an ASCII PDF file.) The three most common uses for streams are: image pixel data, embedded font files and content streams. A content stream contains the instructions to draw the contents of the page. The stream data is ASCII, with a syntax similar to PostScript, but with different operators. For example,
BT /F 20 Tf 1 0 0 1 8 9 Tm (Hello world) Tj ET
draws the text “Hello world” with the font /F at size 20 units, shifted right by 8 units and up by 9 units (according to the transformation matrix 1 0 0 1 8 9).
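The last two operands of Tm carry the translation. Under the standard PDF convention, the matrix [a b c d e f] maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f). A few lines of Python show where the text origin ends up; `apply_matrix` is a hypothetical helper written for this illustration.

```python
def apply_matrix(matrix, x, y):
    """Map the point (x, y) through a PDF matrix [a b c d e f]."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


TM = [1, 0, 0, 1, 8, 9]        # operands of "1 0 0 1 8 9 Tm"
print(apply_matrix(TM, 0, 0))  # prints (8, 9): the origin moves 8 right and 9 up
```

The first four operands form the identity, so the text is only translated, not scaled or rotated. The glyph size comes from the Tf operator instead.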
Streams can use the following generic compression methods: ZIP (also called flate), LZW and RLE (run-length encoding). ZIP is almost always superior. In addition to those, PDF supports some image-specific compression methods as well: JPEG and JPEG 2000 for true-color images, and JBIG2 and G3 fax (also called CCITT fax) for bilevel (two-color) images. JPEG and JPEG 2000 are lossy methods; they usually yield the same size at the same quality settings, but JPEG 2000 is more flexible. JBIG2 is superior to G3 fax and ZIP for bilevel images. Any number of compression filters can be applied to a stream, but usually applying more than one yields a larger compressed stream size than just applying one.
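Decoding a filter chain such as /Filter [/ASCIIHexDecode /FlateDecode] means applying the decoders in the listed order. The Python sketch below covers only these two filters. It relies on PDF’s ZIP (Flate) data using the zlib format, so `zlib.decompress` can undo it. The `DECODERS` table and the `decode_stream` function are illustrative names, not an existing API.

```python
import binascii
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """/ASCIIHexDecode: hex digits, whitespace ignored, '>' marks end of data."""
    digits = data.split(b">", 1)[0]
    digits = bytes(ch for ch in digits if ch not in b" \t\r\n\f\x00")
    if len(digits) % 2:  # an odd final digit is padded with 0
        digits += b"0"
    return binascii.unhexlify(digits)


DECODERS = {"ASCIIHexDecode": ascii_hex_decode, "FlateDecode": zlib.decompress}


def decode_stream(data: bytes, filters) -> bytes:
    """Apply the decoders in the order they are listed in /Filter."""
    for name in filters:
        data = DECODERS[name](data)
    return data


raw = b"BT /F 20 Tf 1 0 0 1 8 9 Tm (Hello world) Tj ET"
flate = zlib.compress(raw, 9)
encoded = binascii.hexlify(flate) + b">"  # what the stream would contain
assert decode_stream(encoded, ["ASCIIHexDecode", "FlateDecode"]) == raw
print(len(flate), len(encoded))  # the hex layer roughly doubles the size
```

The last line illustrates the remark above: the hex layer costs about twice the bytes of the compressed data and gains nothing except 7-bit cleanliness.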
ZIP and LZW support predictors as well. A predictor is an easy-to-compute, invertible filter which is applied to the stream data before compression, to make the data more compressible. One possible predictor subtracts the previous data value from the current one, and sends the difference to the compressor. This helps reduce the file size if the difference between adjacent data values is mostly small, which is true for some images with a small number of colors.
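The “subtract the previous value” predictor is easy to sketch in Python. The sketch assumes one byte per pixel and a fixed row width. That corresponds to the simplest case of the TIFF-style predictor (/Predictor 2) for 8-bit grayscale data. PDF’s PNG predictors (/Predictor 10 to 15) also store a filter-type byte at the start of each row, which this sketch omits. The function names are hypothetical.

```python
import zlib


def sub_predict(data: bytes, width: int) -> bytes:
    """Replace each byte by its difference (mod 256) from the byte to its left.

    Each row of `width` bytes starts fresh, as if preceded by a 0 byte.
    """
    out = bytearray(len(data))
    for i, value in enumerate(data):
        left = data[i - 1] if i % width else 0
        out[i] = (value - left) % 256
    return bytes(out)


def sub_unpredict(data: bytes, width: int) -> bytes:
    """Invert sub_predict by accumulating the differences row by row."""
    out = bytearray(len(data))
    for i, delta in enumerate(data):
        left = out[i - 1] if i % width else 0
        out[i] = (delta + left) % 256
    return bytes(out)


# A 256x16 grayscale gradient: every row is 0, 1, 2, ..., 255.
image = bytes(range(256)) * 16
predicted = sub_predict(image, 256)
print(len(zlib.compress(image, 9)), len(zlib.compress(predicted, 9)))
```

On this gradient, each predicted row is a 0 followed by 255 bytes of 1, so ZIP compresses it much better than the raw ramp. On noisy photographs the gain is typically small.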
There is cross-reference information near the end of the PDF file, which contains the start byte offset of all object definitions. Using this information it is possible to render parts of the file without reading the whole file. The most common format for cross-reference information is the cross-reference table (starting with the keyword xref). Each item in the table consumes 20 bytes and contains an object byte offset. The object number is encoded by the position of the item. For PDFs with several thousand objects, the space occupied by the cross-reference table is not negligible.
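The following minimal Python sketch writes an uncompressed PDF, which shows where the 20 bytes per entry come from. It emits the header, the numbered object definitions and a cross-reference table, followed by the trailer and the end-of-file marker. Each table entry is a 10-digit offset, a 5-digit generation number, a type letter and a two-byte end-of-line. `build_pdf` is a hypothetical helper; it takes already-serialized object bodies and makes object 1 the catalog. At 20 bytes per entry, a file with 5,000 objects spends 100,000 bytes on the table alone.

```python
def build_pdf(bodies):
    """Write objects 1..N (generation 0) plus an xref table; object 1 is /Root."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj"
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(bodies) + 1)
    out += b"0000000000 65535 f \n"  # object 0: head of the free list
    for offset in offsets:
        entry = b"%010d 00000 n \n" % offset
        assert len(entry) == 20  # fixed size, so object numbers are implicit
        out += entry
    out += b"trailer\n<</Size %d /Root 1 0 R>>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


pdf = build_pdf([
    b"<</Type /Catalog /Pages 2 0 R>>",
    b"<</Type /Pages /Kids [3 0 R] /Count 1>>",
    b"<</Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources <<>>>>",
])
print(pdf.decode("latin-1"))
```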
PDF 1.5 introduces cross-reference streams, which store the cross-reference information in compact form in a stream. Such streams are usually compressed as well, using ZIP and a predictor. The benefit of the predictor is that adjacent offsets are close to each other, so their difference will contain lots of zeros, which can be compressed better.
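The effect of the predictor on cross-reference streams can be reproduced in a few lines of Python. The sketch packs each entry into fixed-width binary fields, here an assumed /W [1 3 1] for type, offset and generation. It then applies the PNG “Up” predictor (/Predictor 12), which prefixes every row with the filter-type byte 2 and stores the byte-wise difference from the previous row. Finally it compresses the result with zlib. The offsets are synthetic, and the function names are hypothetical.

```python
import zlib

WIDTHS = (1, 3, 1)  # assumed /W [1 3 1]: type, byte offset, generation


def xref_rows(offsets):
    """One binary row per object: type 1 (in use), offset, generation 0."""
    return [b"".join(v.to_bytes(w, "big") for v, w in zip((1, off, 0), WIDTHS))
            for off in offsets]


def png_up(rows):
    """PNG 'Up' predictor: tag byte 2, then each byte minus the byte above it."""
    out, prev = bytearray(), bytes(len(rows[0]))
    for row in rows:
        out.append(2)
        out += bytes((a - b) % 256 for a, b in zip(row, prev))
        prev = row
    return bytes(out)


def undo_png_up(data, width):
    """Invert png_up for rows of `width` bytes."""
    rows, prev = [], bytes(width)
    for i in range(0, len(data), width + 1):
        if data[i] != 2:
            raise ValueError("only the Up filter (2) is handled here")
        row = bytes((d + p) % 256 for d, p in zip(data[i + 1:i + 1 + width], prev))
        rows.append(row)
        prev = row
    return rows


offsets = [15 + 120 * i for i in range(1000)]  # synthetic, evenly spaced objects
rows = xref_rows(offsets)
raw = b"".join(rows)
predicted = png_up(rows)
print(len(zlib.compress(raw, 9)), len(zlib.compress(predicted, 9)))
```

After prediction, most rows consist of the tag byte, a few zero bytes and the low byte of the offset difference, so they are nearly identical and ZIP compresses them very well.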
Compression cannot be applied to the PDF file as a whole; only individual parts (such as stream data and cross-reference information) can be compressed. However, there can be lots of small object definitions in the file which are not streams. To compress those, PDF 1.5 introduces object streams. The data in an object stream contains a concatenation of any number of non-stream object definitions. Object streams can be compressed just as regular stream data. This makes it possible to squeeze repetitions spanning multiple object definitions. Thus, with PDF 1.5, most of the PDF file can be stored in compressed streams. Only a few dozen header bytes and end-of-file markers and the stream dictionaries remain uncompressed.
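The layout of an object stream can be sketched in Python as follows. In the PDF 1.5 format, the decoded stream starts with pairs of integers: an object number and a byte offset relative to the first object. The concatenated object values follow. The stream dictionary records the number of objects in /N and the start of the first object in /First. The `build_object_stream` helper is hypothetical, and the link annotations it packs are made-up sample data.

```python
import zlib


def build_object_stream(objects):
    """Pack non-stream objects {number: serialized value} into an /ObjStm.

    Returns the stream dictionary and the Flate-compressed stream data.
    """
    pairs, body = [], bytearray()
    for num, value in objects.items():
        pairs.append(b"%d %d" % (num, len(body)))  # offset relative to /First
        body += value + b"\n"
    header = b" ".join(pairs) + b"\n"
    data = zlib.compress(header + bytes(body), 9)
    dictionary = (b"<</Type /ObjStm /N %d /First %d /Filter /FlateDecode /Length %d>>"
                  % (len(objects), len(header), len(data)))
    return dictionary, data


# 200 similar link annotations, a typical source of many small objects.
links = {
    n: b"<</Type /Annot /Subtype /Link /Border [0 0 0] /Rect [%d 700 %d 712]>>" % (n, n + 50)
    for n in range(10, 210)
}
dictionary, data = build_object_stream(links)
# Size of the same objects written uncompressed, as before PDF 1.5.
uncompressed = sum(len(b"%d 0 obj\n%s\nendobj\n" % (n, v)) for n, v in links.items())
print(uncompressed, len(dictionary) + len(data))
```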
Table 1: Output file sizes of PDF generation from The TeXbook, with various methods. The PDF was optimized with pdfsizeopt.py, then with Multivalent.

method          PDF bytes    optimized PDF bytes
pdfTeX          2283510      1806887
dvipdfm         2269821      1787039
dvipdfmx        2007012      1800270
dvips+ps2pdf    3485081      3181869
2 Making PDF files smaller
2.1 How to prepare a small, optimizable PDF with TeX
When aiming for a small PDF, one approach is to use the best tools with the proper settings to create the smallest possible PDF from the start. Another approach is to create a PDF without paying attention to the tools and their settings, and then optimize the PDF with a PDF size optimizer tool. The approach we suggest in this paper is a mixture of the two: pay attention to the PDF generator tools and their fundamental settings, thus generating a PDF which is small enough for temporary use and also easy to optimize further; and use an optimizer to create the final, even smaller PDF.
This section enumerates the most common tools which can generate the temporary PDF from a .tex source. As part of this, it explains how to enforce the proper compression and font settings, and how to prepare vector and pixel-based images so they don’t become unnecessarily large.
Pick the best PDF generation method. Table 2 lists features of the 3 most common methods (also called drivers) which produce a PDF from a TeX document, and Table 1 compares the file size they produce when compiling The TeXbook. There is no single best driver because of the different feature sets, but looking at how large the output of dvips is, the preliminary conclusion would be to use pdfTeX or dvipdfm(x) except if advanced PostScript features are needed (such as for psfrag and pstricks).
We continue with presenting and analyzing the methods mentioned.
dvips This approach converts TeX source → DVI → PostScript → PDF, using dvips [29] for creating the PostScript file, and ps2pdf [28] (part of Ghostscript) for creating the PDF file. Example command-lines for compiling doc.tex to doc.pdf:
$ latex doc
$ dvips doc
$ ps2pdf14 -dPDFSETTINGS=/prepress doc.ps
Table 2. Features supported by various PDF output methods.

Feature               pdfTeX   dvipdfm(x)   dvips
hyperref              +        +            +
TikZ                  +        +            +
beamer.cls            +        +o           +u
include PDF           +        +b           +
embed bitmap font     +        +            +
embed Type 1 font     +        +            +
embed TrueType font   +        +
include EPS                    +            +
include JPEG          +        +x
include PNG           +        +x
include MetaPost      +m       +m           +r
psfrag                f        f            +
pstricks              f        f            +
pdfpages              +
line break in link    +        +

b: bounding box detection with ebb or pts-graphics-helper
f: see [21] for workarounds
m: convenient with \includegraphicsmps defined in pts-graphics-helper
r: rename file to .eps manually
o: with \documentclass[dvipdfm]{beamer}
u: use dvips -t unknown doc.dvi to get the paper size right
x: with \usepackage[dvipdfmx]{graphics} and shell escape running extractbb
dvipdfmx The tool dvipdfmx [7] converts from DVI to PDF, producing a very small output file. dvipdfmx is part of TeX Live 2008, but since it’s quite new, it may be missing from other TeX distributions. Its predecessor, dvipdfm, has not been updated since March 2007. Notable new features in dvipdfmx are: support for non-Latin scripts and fonts; emitting the Type 1 fonts in CFF (that’s the main reason for the size difference in Table 1); parsing pdfTeX-style font .map files. Example command-lines:
$ latex doc
$ dvipdfmx doc
pdfTeX The commands pdftex or pdflatex [41] generate PDF directly from the .tex source, without any intermediate files. An important advantage of pdfTeX over the other methods is that it integrates nicely with the editors TeXShop and TeXworks. The single-step approach ensures that there are no glitches (e.g. images misaligned or not properly sized) caused by tools which are not integrated properly. Example command-line:
$ pdflatex doc
The command latex doc is run for both dvips and dvipdfm(x). Since these two drivers expect slightly different \specials in the DVI file, the driver name has to be communicated to the TeX macros generating the \specials.
For LaTeX, dvips is the default. To get dvipdfm(x) right, pass dvipdfm (or dvipdfmx) as an option to \documentclass or to both \usepackage{graphicx} and \usepackage{hyperref}. The package pts-graphics-helper [34] sets up dvipdfm as default unless the document is compiled with pdflatex.
Unfortunately, some graphics packages (such as psfrag and pstricks) require a PostScript backend such as dvips, and pdfTeX or dvipdfmx don’t provide that. See [21] for a list of workarounds. They rely on running dvips on the graphics, possibly converting its output to PDF, and then including those files in the main compilation. Most of the extra work can be avoided if graphics are created as external PDF files (without text replacements), TikZ [8] figures or MetaPost figures. TikZ and MetaPost support text captions typeset by TeX. Inkscape users can use textext [46] within Inkscape to make TeX typeset the captions.
The \includegraphics command of the standard graphicx LaTeX package accepts a PDF as the image file. In this case, the first page of the specified PDF will be used as a rectangular image. With dvipdfm(x), one also needs a .bb (or .bbx) file containing the bounding box. This can be generated with the ebb tool (or the extractbb tool shipping with dvipdfm(x)). Alternatively, it is possible to use the pts-graphics-helper package [34], which can find the PDF bounding box directly (most of the time).
dvipdfm(x) contains special support for embedding figures created by MetaPost. For pdfTeX, the graphicx package loads supp-pdf.tex, which can parse the output of MetaPost and embed it into the document. Unfortunately, the graphicx package is not smart enough to recognize MetaPost output files (jobname.1, jobname.2 etc.) by extension. The pts-graphics-helper package overcomes this limitation by defining \includegraphicsmps, which can be used in place of \includegraphics for including figures created by MetaPost. The package works consistently with dvipdfm(x) and pdfTeX.
With pdfTeX, it is possible to embed page regions from an external PDF file, using the pdfpages LaTeX package. Please note that due to a limitation in pdfTeX, hyperlinks and outlines (table of contents) in the embedded PDF will be lost.
Although dvipdfm(x) supports PNG and JPEG image inclusion, calculating the bounding box may be cumbersome. It is recommended that all external images be converted to PDF first. The recommended software for that conversion is sam2p [38, 39], which creates a small PDF (or EPS) quickly.
Considering all of the above, we recommend using pdfTeX for compiling TeX documents to PDF. If, for some reason, using pdfTeX is not feasible, we recommend dvipdfmx from TeX Live 2008 or later. If a 1% decrease in file size is worth the trouble of getting fonts right, we recommend dvipdfm. In all these cases, the final PDF should be optimized with pdfsizeopt.py (see later).
Get rid of complex graphics. Some computer algebra programs and vector modeling tools emit very large PDF (or similar vector graphics) files. This can be because they draw the graphics using too many little parts (e.g. they draw a sphere using several thousand triangles), or they draw too many parts which would be invisible anyway since other parts cover them. Converting or optimizing such PDF files usually doesn’t help, because the optimizers are not smart enough to rearrange the drawing instructions and then skip some of them. A good rule of thumb is that if a figure in an optimized PDF file is larger than the corresponding PNG file rendered at 600 DPI, then the figure is too complex. To reduce the file size, it is recommended to export the figure as a PNG (or JPEG) image from the program, and embed that bitmap image.
Downsample high-resolution images. For most printers it doesn’t make a visible difference to print at a resolution higher than 600 DPI. Sometimes even the difference between 300 DPI and 600 DPI is negligible. So converting the embedded images down to 300 DPI may save significant space without too much quality degradation. Downsampling before the image is included is a bit of manual work for each image, but there are a lot of free software tools to do it (such as GIMP [10] and the convert tool of ImageMagick). It is possible to downsample after the PDF has been created, for example with the commercial software PDF Enhancer [20] or Adobe Acrobat. ps2pdf (using Ghostscript’s -sDEVICE=pdfwrite, and setdistillerparams to customize; see the parameters in [28]) can read PDF files and downsample images within as well, but it usually grows other parts of the file too much (a 15% increase in file size for The TeXbook), and it may lose some information (it does keep hyperlinks and the document outline, though).
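Counting pixels gives a quick estimate of the saving. The Python sketch below uses hypothetical helper names and assumes a 6×4 inch image printed at full size with 8-bit RGB data. It shows that halving the resolution from 600 DPI to 300 DPI divides the number of pixels, and hence the uncompressed data, by four. How much the compressed size shrinks depends on the image content and the compression method.

```python
def pixels_at(width_in, height_in, dpi):
    """Pixel dimensions of an image printed at width_in x height_in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_rgb_bytes(width_px, height_px):
    """Uncompressed size of 8-bit RGB data: 3 bytes per pixel."""
    return 3 * width_px * height_px


for dpi in (600, 300):
    w, h = pixels_at(6, 4, dpi)
    print(dpi, "DPI:", w, "x", h, "pixels,", raw_rgb_bytes(w, h), "bytes raw")
# 600 DPI: 3600 x 2400 pixels, 25920000 bytes raw
# 300 DPI: 1800 x 1200 pixels, 6480000 bytes raw
```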
Crop large images. If only parts of a large image contain useful and relevant information, one can save space by cropping the image.
Choose the JPEG quality. When using JPEG (or JPEG 2000) compression, there is a tradeoff between quality and file size. Most JPEG encoders based on libjpeg accept an integer quality value between 1 and 100. For true color photos, a quality below 40 produces a severely degraded, hard-to-recognize image; with 75 we get some harmless glitches, and with 85 the degradation is hard to notice. If the document contains lots of large JPEG images, it is worth reencoding those with a lower quality setting to get a smaller PDF file. PDF Enhancer can reencode JPEG images in an existing PDF, but sometimes not all the images have to be reencoded. With GIMP it is possible to get a real-time preview of the quality degradation before saving, by moving the quality slider.
Please note that some cameras don’t encode JPEG files effectively when saving to the memory card, and it is possible to save a lot of space by reencoding on the computer, even with high quality settings.
Optimize poorly exported images. Not all image processing programs pay attention to the size of the image file they save or export. They might not use compression by default; or they compress with suboptimal settings; or (for EPS files) they try to save the file in some compatibility mode, encoding and compressing the data poorly; or they add lots of unneeded metadata. These poorly exported images make TeX and the drivers run slowly, and they waste disk space (both on the local machine and in the revision control repository). A good rule of thumb to detect a poorly exported image is to use sam2p to convert the exported image to JPEG and PNG (sam2p -c ijg:85 exported.img test.jpg; sam2p exported.img test.png). If either of these files is a lot smaller than the exported image, then the image was exported poorly. Converting the exported image with sam2p (to any of EPS, PDF, JPEG and PNG) is a fast and effective way to reduce the exported image size. Although sam2p, with its default settings, doesn’t create the smallest possible file, it runs very quickly, and it creates an image file which is small enough to be embedded in the temporary PDF.
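The rule of thumb above is easy to script. The Python sketch below runs the two sam2p commands from the text and compares the resulting file sizes with the exported image. It assumes sam2p is on the PATH. The 0.5 threshold for “a lot smaller” is an arbitrary, hypothetical choice, and so are the function names.

```python
import os
import subprocess
import tempfile

A_LOT_SMALLER = 0.5  # hypothetical threshold: less than half the size


def looks_poorly_exported(exported_size, jpeg_size, png_size, factor=A_LOT_SMALLER):
    """True if either sam2p conversion is a lot smaller than the exported file."""
    return min(jpeg_size, png_size) < factor * exported_size


def check_image(path):
    """Convert `path` with sam2p to JPEG (quality 85) and PNG, then compare sizes."""
    with tempfile.TemporaryDirectory() as tmp:
        jpg = os.path.join(tmp, "test.jpg")
        png = os.path.join(tmp, "test.png")
        subprocess.run(["sam2p", "-c", "ijg:85", path, jpg], check=True)
        subprocess.run(["sam2p", path, png], check=True)
        return looks_poorly_exported(os.path.getsize(path),
                                     os.path.getsize(jpg), os.path.getsize(png))
```

The size comparison is kept separate from the sam2p calls, so the decision rule can be tuned or tested without running the converter.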
Embed vector fonts instead of bitmap fonts. Most fonts used with TeX nowadays are available in Type 1 vector format. (These fonts include the Computer Modern families, the Latin Modern families, the URW versions of the base 14 and some other Adobe fonts, the TeX Gyre families, the Vera families, the Palatino family, the corresponding math fonts, and some symbol and drawing fonts.) This is a significant shift from the original TeX (+dvips) concept, which used bitmap fonts generated by MetaFont. While drivers still support embedding bitmap fonts into the PDF, this is not recommended, because bitmaps (at 600 DPI) are larger than their vector equivalents, they render more slowly and they look uglier in some PDF viewers.
Ifafontismissingfromthefont.maple,driverstend
togenerateabitmapfontautomatically,andembedthat.
Tomakesurethisdidn’thappen,itispossibletodetect
thepresenceofbitmapfontsina
PDF
byrunninggrep -a
Table3: Font.mapfilesusedbyvariousdriversandtheirsymlink
targets(defaultfirst)inT
E
XLive2008.
Driver
Font.mapfile
xdvi
ps2pk.map
dvips
psfonts.map→
psfonts_t1.map|(psfonts_pk.map)
pdfT
E
X
pdftex.map→
pdftex_dl14.map|(pdftex_ndl14.map)
dvipdfm(x)
dvipdfm.map→
dvipdfm_dl14.map|(dvipdfm_ndl14.map)
"/Subtype */Type3" doc.pdf
. Hereishowtoinstruct
pdfT
E
Xtousebitmapfontsonly(fordebuggingpurposes):
pdflatex "\pdfmapfile\input"doc.Themostcommon
reasonforthedrivernotndingacorrespondingvector
fontisthatthe
.map
leiswrongorthewrongmaple
isused.WithT
E
XLive,theupdmaptoolcanbeusedto
regeneratethe.maplesfortheuser,andtheupdmap-sys
commandregeneratesthesystem-level.maples.Table3
showswhichdriverreadswhich
.map
le.Copyingover
pdftex_dl14.maptothecurrentdirectoryasthedriver-
specic
.map
leusuallymakesthedriverndthefont.
Old TeX distributions had quite a lot of problems finding fonts; upgrading to TeX Live 2008 or newer is strongly recommended.
Some other popular fonts (such as the Microsoft web fonts) are available in TrueType, another vector format. dvipdfm(x) and pdfTeX can embed TrueType fonts, but dvips cannot (it just dumps the .ttf file into the .ps file, rendering it unparsable).
OpenType fonts with advanced tables for script and feature selection and glyph substitution are supported by Unicode-aware TeX derivatives such as XeTeX, and also by dvipdfmx.
Omit the base 14 fonts. The base 14 fonts are Times (in 4 styles), Helvetica (in 4 styles), Courier (in 4 styles), Symbol and ZapfDingbats. To reduce the size of the PDF, it is possible to omit them from the PDF file, because PDF viewers tend to have them. However, omitting the base 14 fonts is deprecated since PDF 1.5. Adobe Reader 6.0 or newer, and other PDF viewers (such as xpdf and evince) don’t contain those fonts either, but they can find them as system fonts. On Debian-based Linux systems, those fonts are in the gsfonts package.
In TeX Live, the directives pdftexDownloadBase14 and dvipdfmDownloadBase14 etc. in the configuration file texmf-config/web2c/updmap.cfg specify whether to embed the base 14 fonts. After modifying this file (either the system-wide one or the one in $HOME/.texlive2008) and running the updmap command, the following font map files would be created:
pdftex_dl14.map FontmapleforpdfT
E
Xwiththe
base14fontsembedded.Thisisthedefault.
pdftex_ndl14.map FontmapleforpdfT
E
Xwiththe
base14fontsomitted.
pdftex.map FontmapleusedbypdfT
E
Xbydefault.
Identicaltooneofthetwoabove,basedonthe
pdftexDownloadBase14setting.
dvipdfm_dl14.map Fontmaplefordvipdfm(x)with
thebase14fontsembedded.Thisisthedefault.
dvipdfm_ndl14.map Fontmaplefordvipdfm(x)
withthebase14fontsomitted.
dvipdfm.map Fontmapleusedbydvipdfm(x)by
default.Identicaltooneofthetwoabove,basedon
thedvipdfmDownloadBase14setting.
It is possible to specify the base 14 embedding settings without modifying configuration files or generating .map files. Example command-line for pdfTeX (type it without line breaks):
pdflatex "\pdfmapfile{pdftex_ndl14.map}
\input" doc.tex
However, this will display the warning No flags specified for non-embedded font. To get rid of this, use
pdflatex "\pdfmapfile{=pdftex_ndl14_extraflag.map}
\input" doc.tex
instead. Get the .map file from [34].
The .map file syntax for dvipdfm is different, but dvipdfmx can use a .map file of pdfTeX syntax, like this:
dvipdfmx -f pdftex_dl14.map doc.dvi
Please note that dvipdfmx loads the .map files specified in dvipdfmx.cfg first, and the .map files loaded with the -f flag override entries loaded previously from the configuration file. To have the base 14 fonts omitted, run (without a line break):
dvipdfmx -f pdftex_ndl14.map
-f dvipdfmx_ndl14_extra.map doc.dvi
Again, you can get the last .map file from [34]. Without dvipdfmx_ndl14_extra.map, a bug in dvipdfm prevents it from writing a PDF file without the font; it would embed a rendered bitmap font instead.
Subset fonts. Font subsetting is the process in which the driver selects and embeds only the glyphs of a font which are actually used in the document. Font subsetting is turned on by default for dvips, dvipdfm(x) and pdfTeX when emitting glyphs produced by TeX.
2.2 Extra manual tweaks on TeX-to-PDF compilation
This section shows a couple of methods to reduce the size of the PDF created by a TeX compilation manually. It is not necessary to implement these methods if the temporary PDF gets optimized by pdfsizeopt.py + Multivalent, because this combination implements the methods discussed here.
Set the ZIP compression level to maximum. For pdfTeX, the assignment \pdfcompresslevel9 selects maximum PDF compression. With TeX Live 2008, this is the default. Here is how to specify it on the command-line (without line breaks):
pdflatex "\pdfcompresslevel9
\input" doc.tex
For dvipdfm(x), the command-line flag -z9 can be used to maximize compression. This is also the default. PDF itself supports redundancy elimination in many different places (see Subsection 2.3) in addition to setting the ZIP compression level.
There is no need to pay attention to this tweak, because Multivalent recompresses all ZIP streams with maximum effort.
Generate object streams and cross-reference streams. pdfTeX can generate object streams and cross-reference streams to save about 10% of the PDF file size, or even more if the file contains lots of hyperlinks. (The actual saving depends on the file structure.) Example command-line for enabling it (without line breaks):
pdflatex "\pdfminorversion5
\pdfobjcompresslevel3
\input" doc.tex
According to [27], if ZIP compression is used to compress the object streams, in some rare cases it is possible to save space by starting a new block within the ZIP stream just at the right points.
There is no need to pay attention to this tweak, because Multivalent generates object streams and cross-reference streams by default.
Encode Type 1 fonts as CFF. CFF [2] (Type 2 or /Subtype /Type1C) is an alternative, compact, highly compressible binary font format that can represent Type 1 font data without loss. By embedding vector fonts in CFF instead of Type 1, one can save a significant portion of the PDF file, especially if the document is 10 pages or less (e.g. reducing the PDF file size from 200 kB to 50 kB). dvipdfmx does this by default, but the other drivers (pdfTeX, dvipdfm, ps2pdf with dvips) don't support CFF embedding so far.
There is no need to pay attention to this tweak, because pdfsizeopt.py converts Type 1 fonts in the PDF to CFF.
Create graphics with font subsetting in mind. For glyphs coming from external sources such as included PostScript and PDF graphics, the driver is usually not smart enough to recognize the fonts already embedded and unify them with the fonts in the main document. Let's suppose that the document contains included graphics with text captions, each graphics source (PostScript or PDF) having its font subsets embedded. No matter whether dvips, dvipdfm(x) or pdfTeX is the driver, it will not be smart enough to unify these subsets into a single font. Thus space would be wasted: the final PDF file would contain multiple subsets of the same font, possibly storing duplicate versions of some glyphs.
It is possible to avoid this waste by using a graphics package implemented in pure TeX (such as TikZ) or by using MetaPost (for which there is special support in dvips, dvipdfm(x) and pdfTeX to avoid font and glyph duplication). The package psfrag doesn't suffer from this problem either if the EPS files don't contain any embedded fonts.
There is no need to pay attention to this tweak, because pdfsizeopt.py unifies font subsets.
Disable font subsetting before concatenation. If a PDF document is a concatenation of several smaller PDF files (such as in journal volumes and conference proceedings), and each PDF file contains its own subsetted fonts, then it depends on the concatenator tool whether those subsets are unified or not. Most concatenator tools (pdftk, Multivalent, pdfpages, ps2pdf; see [32] for more) don't unify these font subsets.
However, if you use ps2pdf for PDF concatenation, you can get both font subsetting and subset unification by disabling font subsetting when generating the small PDF files. In this case, Ghostscript (run by ps2pdf) will notice that the document contains the exact same font many times, and it will subset only one copy of the font.
There is no need to pay attention to this tweak, because pdfsizeopt.py unifies font subsets.
Embedeachgraphicsleonce. Whenthesamegraphics
le(suchasthecompanylogoonpresentationslides)is
includedmultipletimes,itdependsonthedriverwhether
thegraphicsdataisduplicatedinthenal
PDF
.pdfT
E
X
doesn’tduplicate,dvipdfm(x)duplicatesonlyMetaPost
graphics,anddvipsalwaysduplicates.
There is no need to pay attention to this tweak, because both pdfsizeopt.py and Multivalent eliminate duplicates of identical objects.
2.3 How PDF optimizers save space
This subsection describes some methods PDF optimizers use to reduce the file size. We focus on ideas and methods relevant to TeX documents.
Use cross-reference streams compressed with the y-predictor. Each offset entry in an (uncompressed) cross-reference table consumes 20 bytes. This can be reduced by using compressed cross-reference streams and enabling the y-predictor. As shown in column xref of Table 4, a reduction factor of 180 is possible if the PDF file contains many objects (e.g. more than 10^5 objects in pdfref, with less than 12000 bytes in the cross-reference stream).
The reason why the y-predictor can make a difference of a factor of 2 or even more is the following. The y-predictor encodes each byte in a rectangular array of bytes by subtracting the original byte above the current byte from the current byte. So if each row of the rectangular array contains an object offset, and the offsets are increasing, then most of the bytes in the output of the y-predictor have a small absolute value, mostly zero. Thus the output of the y-predictor can be compressed better with ZIP than the original byte array.
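The following Python sketch makes this effect concrete. It builds hypothetical cross-reference stream rows with field widths /W [1 4 1] (type, offset, generation) for objects of random size. It then applies the y-predictor as described above: each byte minus the byte directly above it, modulo 256, with an all-zero row assumed above the first. Finally it compares the zlib-compressed sizes. The row layout, the object-size range and the helper names are illustrative assumptions, not the data behind Table 4.

```python
import random
import zlib

ROW_WIDTH = 6  # assumed /W [1 4 1]: 1 type byte, 4 offset bytes, 1 generation byte


def build_xref_rows(num_objects: int, seed: int = 0) -> list[bytes]:
    """Hypothetical xref stream rows for objects written one after another."""
    rng = random.Random(seed)
    rows, offset = [], 15  # the first object starts after a short header
    for _ in range(num_objects):
        rows.append(bytes([1]) + offset.to_bytes(4, "big") + bytes([0]))
        offset += rng.randint(30, 600)  # size of the object just written
    return rows


def y_predict(data: bytes, width: int) -> bytes:
    """Replace each byte by (byte - byte directly above it) mod 256."""
    out = bytearray(data[:width])  # the row above the first row counts as zeros
    for i in range(width, len(data)):
        out.append((data[i] - data[i - width]) & 0xFF)
    return bytes(out)


def y_unpredict(data: bytes, width: int) -> bytes:
    """Inverse of y_predict: add back the already decoded byte above."""
    out = bytearray(data[:width])
    for i in range(width, len(data)):
        out.append((data[i] + out[i - width]) & 0xFF)
    return bytes(out)


raw = b"".join(build_xref_rows(5000))
predicted = y_predict(raw, ROW_WIDTH)
print("raw bytes:          ", len(raw))
print("zlib, no predictor: ", len(zlib.compress(raw, 9)))
print("zlib, y-predictor:  ", len(zlib.compress(predicted, 9)))
```

After prediction the type, high offset and generation columns become almost all zeros. The offset column shrinks to small deltas, which is what lets ZIP do better.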
Some tools such as Multivalent implement the y-predictor with PNG predictor 12, but using TIFF predictor 2 avoids stuffing in the extra byte for each row; pdfsizeopt.py does that.
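The sketch below compares the two encodings. With PNG predictor 12 (Up), every row carries a leading filter-type byte. One way to get the same differences from TIFF predictor 2 is to treat the whole cross-reference stream as a single image row whose "pixels" are the xref entries, with /Colors equal to the entry width; for example, /DecodeParms << /Predictor 2 /Colors 6 /BitsPerComponent 8 /Columns N >> for N entries of 6 bytes. Each byte is then differenced with the same byte of the previous entry, exactly like the y-predictor, but no tag bytes are added. This mapping is our assumption about how the trick can work, not a description of pdfsizeopt.py's code. The function names are hypothetical.

```python
def png_up_encode(data: bytes, width: int) -> bytes:
    """PNG predictor 12 (Up): a tag byte 2 before each row, then Up differences."""
    out = bytearray()
    prev = bytes(width)  # the row above the first row is all zeros
    for i in range(0, len(data), width):
        row = data[i:i + width]
        out.append(2)  # per-row filter type byte
        out.extend((c - p) & 0xFF for c, p in zip(row, prev))
        prev = row
    return bytes(out)


def tiff2_encode(data: bytes, colors: int) -> bytes:
    """TIFF predictor 2 on one row of 8-bit samples with `colors` components:
    each sample minus the same component of the previous pixel."""
    out = bytearray(data[:colors])
    for i in range(colors, len(data)):
        out.append((data[i] - data[i - colors]) & 0xFF)
    return bytes(out)


# Hypothetical 6-byte xref entries: type 1, 4-byte offset, generation 0.
entries = b"".join(bytes([1]) + (100 * k).to_bytes(4, "big") + bytes([0])
                   for k in range(50))
png = png_up_encode(entries, 6)
tiff = tiff2_encode(entries, 6)
print(len(entries), len(png), len(tiff))  # prints: 300 350 300
```

Dropping the tag bytes from the PNG output gives exactly the TIFF output, so the TIFF variant saves one byte per row before compression.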
Use object streams. It is possible to save space in the PDF by concatenating small (non-stream) objects into an object stream, and compressing the stream as a whole. One can even sort objects by type first, so that similar objects are placed next to each other and fit into the 32 kB long ZIP compression window.
Please note that both object streams and cross-reference streams are PDF 1.5 features, and cross-reference streams must also be used when object streams are used.
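Here is a minimal sketch of the saving, using 200 similar link annotations as hypothetical input. The stream layout follows the PDF 1.5 object stream format: a header of "object-number offset" pairs, then the objects. The stream dictionary would carry /Type /ObjStm, /N and /First. Its zlib-compressed size is compared with writing each object uncompressed as "N 0 obj ... endobj". The cross-reference stream that must accompany it is not counted, and the helper names are ours.

```python
import zlib


def classic_size(objects: dict[int, bytes]) -> int:
    """Bytes needed to write each object uncompressed as 'N 0 obj ... endobj'."""
    return sum(len(b"%d 0 obj\n" % num) + len(body) + len(b"\nendobj\n")
               for num, body in objects.items())


def object_stream(objects: dict[int, bytes]) -> tuple[bytes, int]:
    """Return (data, first): 'objnum offset' pairs, then the object bodies.

    Offsets are relative to the first body byte; `first` is the header length,
    i.e. the value for /First. /N would be len(objects)."""
    pairs, bodies = [], bytearray()
    for num, body in objects.items():
        pairs.append(b"%d %d" % (num, len(bodies)))
        bodies += body + b"\n"
    header = b" ".join(pairs) + b"\n"
    return header + bytes(bodies), len(header)


# Hypothetical input: 200 similar link annotations, objects 10..209.
annots = {n: b"<< /Type /Annot /Subtype /Link /Border [0 0 0] /Rect [72 %d 200 %d] >>"
             % (n, n + 12) for n in range(10, 210)}
data, first = object_stream(annots)
print("separate, uncompressed objects:", classic_size(annots))
print("one compressed object stream:  ", len(zlib.compress(data, 9)))
```

Sorting the dictionary by object type before calling object_stream would implement the "similar objects next to each other" refinement mentioned above.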
Use better stream compression. In PDF, any stream can be compressed with any compression filter (or a combination of filters). ZIP is the most effective general-purpose compression, which is recommended for compressing content streams, object streams, cross-reference streams and font data (such as CFF). For images, however, there are specialized filters (see later in this section).
Most PDF generators (such as dvipdfm(x) and pdfTeX) and optimization tools (such as Multivalent) use the zlib code for general-purpose ZIP compression. zlib lets the user specify the effort parameter between 0 (no compression) and 9 (slowest compression, smallest output) to balance compression speed versus compressed data size. There are, however, alternative ZIP compressor implementations (such as the one in KZIP [30] and PNGOUT [31, 9]) which provide an even higher effort, but the author doesn't know of any PDF optimizers using those algorithms.
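Python's zlib binding exposes the effort parameter directly. This sketch compresses a hypothetical, repetitive sample that resembles a content stream at every level from 0 to 9, so the trade-off can be inspected. Level 0 only stores the data, which is why it comes out slightly larger than the input. The sample text and function name are illustrative.

```python
import zlib


def sizes_by_level(data: bytes) -> dict[int, int]:
    """Compressed size of `data` at every zlib effort level, 0 (store) to 9."""
    return {level: len(zlib.compress(data, level)) for level in range(10)}


# Hypothetical page content: one text line per row, at varying positions.
sample = b"".join(b"BT /F1 10 Tf 72 %d Td (Line %d of the page) Tj ET\n" % (800 - 12 * i, i)
                  for i in range(60))
for level, size in sizes_by_level(sample).items():
    print(level, size)
```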
Recompress pixel-based images. PDF supports more than 6 compression methods (and any combination of them) and more than 6 predictors, so there are lots of possibilities to make images smaller. Here we focus on lossless compression (thus excluding JPEG and JPEG 2000, which are used for compressing photos). An image is a rectangular array of pixels. Each pixel is encoded as a vector of one or more components in the color space of the image. Typical color spaces are RGB (/DeviceRGB), grayscale (/DeviceGray), CMYK (/DeviceCMYK), color spaces where colors are device-independent, and the palette (indexed) versions of those. Each color component of each pixel is encoded as a nonnegative integer with a fixed number of bits (bits-per-component, BPC; can be 1, 2, 4, 8, 12 or 16). The image data can be compressed with any combination of the PDF compression methods.
Before recompressing the image, it is usually worth extracting the raw RGB or CMYK (or device-independent) image data, and then compressing the image as well as we can. Partial approaches such as optimizing only the palette are usually suboptimal, because they may be incapable of converting an indexed image to grayscale to save the storage space needed by the palette.
To pick the best encoding for the image, we have to decide which color space, bits-per-component, compression method(s) and predictor to use. We have to choose a color space which can represent all the colors in the image. We may convert a grayscale image to an RGB image (and back if all pixels are grayscale). We may also convert a grayscale image to a CMYK image (and maybe back). If the image doesn't have more than 256 different colors, we can use an indexed version of the color space.
A good rule of thumb (no matter the compression) is to pick the color space + bits-per-component combination which needs the least number of bits per pixel. On a draw, pick the one which doesn't need a palette. These ideas can also be applied if the image contains an alpha channel (which allows for transparent or semi-transparent pixels).
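Here is a sketch of that rule of thumb for 8-bit RGB input. It considers only /DeviceGray, /DeviceRGB and an /Indexed palette over RGB. It ignores the palette's own storage cost and alpha, and it breaks ties in favor of the palette-free encoding. With the default /Decode array, PDF maps a BPC-bit sample k to k/(2^BPC - 1). So an 8-bit value fits a smaller BPC exactly when it is a multiple of 255/(2^BPC - 1). The function names are hypothetical.

```python
Pixel = tuple[int, int, int]  # 8-bit (r, g, b)


def min_bpc(samples: set[int]) -> int:
    """Smallest BPC in 1, 2, 4, 8 that represents every 8-bit sample exactly."""
    for bpc in (1, 2, 4):
        step = 255 // ((1 << bpc) - 1)  # 255, 85, 17
        if all(v % step == 0 for v in samples):
            return bpc
    return 8


def pick_encoding(pixels: list[Pixel]) -> tuple[str, int, int]:
    """Return (color space, BPC, bits per pixel) needing the fewest bits per pixel."""
    colors = set(pixels)
    candidates = []  # (bits per pixel, needs palette, color space, BPC)
    if all(r == g == b for r, g, b in colors):
        bpc = min_bpc({r for r, _, _ in colors})
        candidates.append((bpc, False, "/DeviceGray", bpc))
    bpc = min_bpc({c for color in colors for c in color})
    candidates.append((3 * bpc, False, "/DeviceRGB", bpc))
    if len(colors) <= 256:
        bpc = next(b for b in (1, 2, 4, 8) if len(colors) <= 1 << b)
        candidates.append((bpc, True, "/Indexed", bpc))
    bits, _, space, bpc = min(candidates)  # on a draw, False (no palette) wins
    return space, bpc, bits


print(pick_encoding([(0, 0, 0), (255, 255, 255)]))  # prints ('/DeviceGray', 1, 1)
```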
It is possible to further optimize some corner cases. For example, if the image has only a single color, then it is worth encoding it as vector graphics filling a rectangle of that color. Or, when the image is a grid of rectangles where each rectangle contains a single color, it is worth encoding a lower resolution image and increasing the scale factor in the image transformation matrix to draw the larger image.
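Both corner cases can be detected with one check. The sketch below takes pixels as a hypothetical list of rows of comparable values. It finds the largest horizontal and vertical factors (each dividing the image size) such that the image consists of uniform fx-by-fy blocks. A result equal to the full width and height means the image is a single color. As a simplifying assumption it only handles a uniform grid, and the function names are ours.

```python
from collections.abc import Sequence


def block_factors(pixels: Sequence[Sequence[object]]) -> tuple[int, int]:
    """Largest (fx, fy), dividing width and height, such that the image is made
    of uniform fx-by-fy blocks; it can then be stored at 1/fx by 1/fy resolution."""
    height, width = len(pixels), len(pixels[0])

    def uniform(fx: int, fy: int) -> bool:
        # Every pixel must equal the top-left pixel of its block.
        return all(pixels[y][x] == pixels[y - y % fy][x - x % fx]
                   for y in range(height) for x in range(width))

    fx = max(f for f in range(1, width + 1) if width % f == 0 and uniform(f, 1))
    fy = max(f for f in range(1, height + 1) if height % f == 0 and uniform(fx, f))
    return fx, fy


def downsample(pixels: Sequence[Sequence[object]], fx: int, fy: int) -> list[list[object]]:
    """Keep the top-left pixel of each block."""
    return [list(row[::fx]) for row in pixels[::fy]]


img = [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
fx, fy = block_factors(img)
print(fx, fy, downsample(img, fx, fy))  # prints 2 2 [[1, 2], [3, 4]]
```

The two factors can be searched separately, because an image is uniform in fx-by-fy blocks exactly when it is uniform in fx-by-1 and 1-by-fy blocks.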
High-eort
ZIP
isthebestcompressionmethodsup-
ported by
PDF
, except forbilevel(two-color) images,
where
JBIG
2canyieldasmallerresultforsomeinputs.
JBIG
2ismosteectiveonimageswithlotsof2
D
repe-
titions,e.g.imagescontaininglotsoftext(becausethe
lettersarerepeating).Otherlosslesscompressionmethods
supportedby
PDF
(suchas
RLE
,
LZW
and
G
3fax)areinfe-
riorto
ZIP
and/or
JBIG
2.Sometimestheimageissosmall
(like10×10pixels)thatcompressingwouldincreaseits
size. Mostoftheimagesdon’tbenetfromapredictor
(usedtogetherwith
ZIP
compression),butsomeofthem
do.
PDF supports the PNG predictor image data format, which makes it possible to choose a different predictor for each scanline (image row). The heuristic default algorithm in pnmtopng calculates all 5 scanline variations, and picks the one having the smallest sum of absolute values. This favors bytes with small absolute values in the uncompressed image data, so the Huffman coding in ZIP can compress it effectively.
Most of the time it is not possible to tell in advance whether ZIP or JBIG2 should be used, or whether a predictor should be used with ZIP or not. To get the smallest possible output, it is recommended to run all 3 variations and pick the one yielding the smallest image object. For very small images, the uncompressed version should be considered as well. If the image is huge and has lots of repetitive regions, it may be worth applying ZIP more than once. Please note that metadata (such as specifying the decompression filter(s) to use) also contributes to the image size.
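This sketch shows the "try every variant, keep the smallest" approach for an 8-bit image, counting the filter-specifying dictionary text as metadata. It uses Python's zlib at level 9 and a PNG Up predictor as stand-ins. JBIG2 is left out because the Python standard library has no JBIG2 encoder. The metadata strings approximate what an optimizer would write, and the function names are hypothetical.

```python
import zlib


def up_predict(data: bytes, row_bytes: int) -> bytes:
    """PNG Up predictor with a tag byte 2 in front of every row."""
    out, prev = bytearray(), bytes(row_bytes)
    for i in range(0, len(data), row_bytes):
        row = data[i:i + row_bytes]
        out.append(2)
        out.extend((c - p) & 0xFF for c, p in zip(row, prev))
        prev = row
    return bytes(out)


def smallest_encoding(pixels: bytes, width: int, colors: int) -> tuple[str, bytes]:
    """Return (filter keys for the image dictionary, stream data) minimizing total size."""
    def z(data: bytes) -> bytes:
        return zlib.compress(data, 9)

    parms = f"/DecodeParms<</Predictor 12/Colors {colors}/BitsPerComponent 8/Columns {width}>>"
    candidates = {
        "": pixels,                                          # uncompressed
        "/Filter/FlateDecode": z(pixels),                    # ZIP
        "/Filter[/FlateDecode/FlateDecode]": z(z(pixels)),   # ZIP applied twice
        "/Filter/FlateDecode" + parms: z(up_predict(pixels, width * colors)),  # predictor + ZIP
    }
    return min(candidates.items(), key=lambda kv: len(kv[0]) + len(kv[1]))


keys, data = smallest_encoding(bytes([0, 255, 255, 0]), 2, 1)
print(repr(keys), len(data))  # a 2x2 image stays uncompressed: prints '' 4
```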
Most PDF optimizers use the zlib code for ZIP compression in images. The output of some other image compressors (most notably PNGOUT [31]; see also OptiPNG [43], [42] for a list of 11 other PNG optimization tools, and more tools in [15]) is smaller than what zlib produces with its highest effort, but those other compressors usually run 100 times slower than zlib, or even slower.
How much a document's size decreases because of image recompression depends on the structure of the document (how many images there are, how large they are, and how large a part of the file size is occupied by images) and on how effectively the PDF was generated. The percentage savings in the image column of Table 4 suggest that only a little saving is possible (about 5%) if the user pays attention to embedding the images effectively, according to the image-related guidelines presented in Section 2.1. It is possible to save lots of space by decreasing the image resolution, or by decreasing the image quality using some lossy compression method (such as JPEG or JPEG 2000) with lower quality settings. These kinds of optimizations are supported by Adobe Acrobat Pro and PDF Enhancer, but they are out of the scope of our goal of decreasing the file size without changing its rendered appearance.
JPEG files could benefit from a lossless transformation, such as removing EXIF tags and other metadata. Compressing JPEG data further with ZIP wouldn't save space. The program packJPG [33] applies custom lossless compression to JPEG files, saving about 20%. Unfortunately, PDF doesn't have a decompression filter for that.
Convert some inline images to objects. It is possible to inline images into content streams. This PDF feature saves about 30 bytes per image compared to having the image as a standalone image object. However, inline images cannot be shared. So in order to save the most space, inline images which are used more than once should be converted to objects, and image objects used only once should be converted to inline images. Images whose palette duplicates that of other images should be image objects, so the palette can be shared.
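The rule above fits in a few lines. In this sketch, the image identifiers, the usage counts and the set of palette-sharing images are hypothetical inputs that an optimizer would have to collect from the content streams first. Only the decision itself is shown.

```python
def plan_image_storage(use_counts: dict[str, int],
                       palette_sharers: set[str]) -> dict[str, str]:
    """Decide 'inline' or 'object' for each image, following the rule above.

    use_counts:      hypothetical image id -> number of times it is drawn
    palette_sharers: ids whose palette also occurs in another image
    """
    plan = {}
    for image, count in use_counts.items():
        if count > 1 or image in palette_sharers:
            plan[image] = "object"  # shared: the object overhead is paid only once
        else:
            plan[image] = "inline"  # drawn once: inlining saves about 30 bytes
    return plan


print(plan_image_storage({"logo": 5, "photo": 1, "chart": 1}, {"chart"}))
```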
Unify duplicate objects. If two or more PDF objects share the same serialized value, it is natural to save space by keeping only the first one, and modifying references to the rest so that they refer to the first one. It is possible to optimize even more by constructing equivalence classes and keeping only one object per class. For example, if the PDF contains
    5 0 obj << /Next 6 0 R /Prev 5 0 R >> endobj
    6 0 obj << /Next 5 0 R /Prev 6 0 R >> endobj
    7 0 obj << /First 6 0 R >> endobj
then objects 5 and 6 are equivalent, so we can rewrite the PDF to
    5 0 obj << /Next 5 0 R /Prev 5 0 R >> endobj
    7 0 obj << /First 5 0 R >> endobj
PDF generators usually don't emit duplicate objects on purpose; it just happens by chance that some object values are equal. If the document contains the same page content, font, font encoding, image or graphics more than once, and the PDF generator fails to notice that, then these would most probably become duplicate objects, which can be optimized away. The dvips + ps2pdf method usually produces lots of duplicate objects if the document contains lots of duplicate content, such as \includegraphics loading the same graphics many times.
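Building the equivalence classes is a form of partition refinement, sketched below. Objects are given as hypothetical PDF source strings whose only references have the form "N 0 R". A real optimizer would parse the objects rather than use a regular expression. All objects start in one class. Each round splits classes by the object's text with every reference replaced by the class of its target, until no class splits. Then the lowest-numbered object of each class is kept, and references are redirected to it.

```python
import re

REF = re.compile(r"\b(\d+) 0 R\b")


def unify_equivalent(objects: dict[int, str]) -> dict[int, str]:
    """Keep one object per equivalence class and redirect references to it."""
    cls = {num: 0 for num in objects}  # start with a single class

    def shape(text: str) -> str:
        # Replace each reference by the class of the object it points to.
        def class_of(m: re.Match) -> str:
            target = int(m.group(1))
            return f"<{cls[target]}>" if target in cls else m.group(0)
        return REF.sub(class_of, text)

    while True:
        ids: dict[tuple[int, str], int] = {}
        new_cls = {num: ids.setdefault((cls[num], shape(objects[num])), len(ids))
                   for num in sorted(objects)}
        if len(ids) == len(set(cls.values())):
            break  # no class was split: the partition is stable
        cls = new_cls

    rep: dict[int, int] = {}
    for num in sorted(objects):
        rep.setdefault(cls[num], num)  # lowest object number represents its class

    def redirect(m: re.Match) -> str:
        target = int(m.group(1))
        return f"{rep[cls[target]]} 0 R" if target in cls else m.group(0)

    return {num: REF.sub(redirect, text)
            for num, text in objects.items() if rep[cls[num]] == num}


example = {5: "<< /Next 6 0 R /Prev 5 0 R >>",
           6: "<< /Next 5 0 R /Prev 6 0 R >>",
           7: "<< /First 6 0 R >>"}
print(unify_equivalent(example))
```

On the example above this keeps objects 5 and 7 and rewrites both to refer to object 5, matching the rewritten PDF shown earlier.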
Remove image duplicates, based on visible pixel value. Different color space, bits-per-pixel and compression settings can cause many different representations of the same image (rectangular pixel array) to be present in the document. This can indeed happen if different parts of the PDF were created with different drivers (e.g. one with pdfTeX, another with dvips), and the results were concatenated. To save space, the optimizer can keep only the smallest image object and update references.
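The sketch below groups images by their visible content. It assumes the images have already been decoded and normalized to 8-bit RGB, which is not shown. Images with the same width, height and pixels form one group, and every object in a group is mapped to the smallest stored object. The ImageInfo record and its fields are hypothetical, and details such as /Decode arrays or soft masks are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageInfo:
    num: int          # object number of the image XObject
    width: int
    height: int
    rgb: bytes        # decoded pixels, normalized to 8-bit RGB
    stored_size: int  # bytes the object occupies in the PDF


def dedupe_images(images: list[ImageInfo]) -> dict[int, int]:
    """Map every image object number to the smallest object showing the same pixels."""
    best: dict[tuple[int, int, bytes], ImageInfo] = {}
    for img in images:
        key = (img.width, img.height, img.rgb)
        if key not in best or img.stored_size < best[key].stored_size:
            best[key] = img
    return {img.num: best[(img.width, img.height, img.rgb)].num for img in images}


red = bytes([255, 0, 0]) * 4
imgs = [ImageInfo(3, 2, 2, red, 120), ImageInfo(8, 2, 2, red, 45), ImageInfo(9, 4, 1, red, 40)]
print(dedupe_images(imgs))  # prints {3: 8, 8: 8, 9: 9}
```

Object 9 holds the same bytes but has a different shape, so it is a different image and stays separate.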
Remove unused objects. Some PDF files contain objects which are not reachable from the /Root or trailer objects. These may be present because of incremental updates, concatenations or conversions, or because the file is a linearized PDF. It is safe to save space by removing those unused objects. A linearized PDF provides a better web experience to the user, because it makes the first page of the PDF appear earlier. Since a linearized PDF can be automatically generated from a non-linearized one at any time, there is no point in keeping a linearized PDF when optimizing for size.
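Finding the unused objects is a graph reachability problem. In this sketch, objects are hypothetical PDF source strings with references of the form "N 0 R". The object numbers referenced from the trailer (such as /Root and /Info) are given as starting points, and a breadth-first search keeps everything it can reach. A real optimizer would parse the objects rather than scan them with a regular expression.

```python
import re
from collections import deque

REF = re.compile(r"\b(\d+) 0 R\b")


def reachable(objects: dict[int, str], roots: list[int]) -> set[int]:
    """Object numbers reachable from `roots` by following 'N 0 R' references."""
    seen: set[int] = set()
    queue = deque(roots)
    while queue:
        num = queue.popleft()
        if num in seen or num not in objects:
            continue  # already visited, or a dangling reference
        seen.add(num)
        queue.extend(int(m.group(1)) for m in REF.finditer(objects[num]))
    return seen


def remove_unused(objects: dict[int, str], trailer_refs: list[int]) -> dict[int, str]:
    """Drop every object that cannot be reached from the trailer."""
    keep = reachable(objects, trailer_refs)
    return {num: text for num, text in objects.items() if num in keep}


doc = {1: "<< /Type /Catalog /Pages 2 0 R >>",
       2: "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
       3: "<< /Type /Page /Parent 2 0 R >>",
       4: "<< /Type /Page /Parent 2 0 R >>"}  # left over, e.g. from an incremental update
print(sorted(remove_unused(doc, [1])))  # prints [1, 2, 3]
```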
Extract large parts of objects. Unifying duplicate objects can save space only if a whole object is duplicated. If a paragraph is repeated on a page, it will most probably remain duplicated, because the duplication is within a single object (the content stream). So the optimizer can save space by detecting content duplication at the sub-object level (outside stream data and inside content stream data), and extracting the duplicated parts to individual objects, which can then be unified. Although this extraction would usually be too slow if applied to all data structures in the PDF, it may be worth applying it to some large structures such as image palettes (whose maximum size is 768 bytes for RGB images).
Reorganize content streams and form XObjects. Instructions for drawing a single page can span multiple content streams and form XObjects. To save space, it is possible to concatenate those into a single content stream, and compress the stream as a whole. After all those concatenations, large common instruction sequences can be extracted to form XObjects to make code reuse possible.
Remove unnecessary indirect references. The PDF specification defines whether a value within a compound PDF value must be an indirect reference. If a particular value in the PDF file is an indirect reference but doesn't have to be, and no other objects refer to that object, then inlining the value of the object saves space. Some PDF generators emit lots of unnecessary indirect references, because they generate the PDF file sequentially, and for some objects they don't know the full value when they are generating the object. So they replace parts of the value by indirect references, whose definitions they give later. This strategy can save some RAM during PDF generation, but it makes the PDF about 40 bytes larger than necessary for each such reference.
Convert Type 1 fonts to CFF. Drivers embed Type 1 fonts into the PDF as Type 1 (except for dvipdfmx, which emits CFF). CFF can represent the same font with fewer bytes (because of the binary format and the smart defaults), and it is also more compressible (because it doesn't have encryption). So it is natural to save space by converting Type 1 fonts in the PDF to CFF.
Subset fonts. This can be done by finding unused glyphs in fonts and getting rid of them. Usually this doesn't save any space for TeX documents, because drivers subset fonts by default.
Unify subsets of the same font. As discussed in Section 2.1, a PDF file may end up containing multiple subsets of the same font when typesetting a collection of