management - Electronic document file format for long-term preservation (working in ISO
Technical committee 171), based on PDF 1.4 and later also ISO 32000-1 - PDF 1.7
PDF/E (since 2008 - ISO 24517) - a.k.a. "PDF for Engineering" - Document management -
Engineering document format using PDF (working in ISO Technical committee 171), based on
PDF/VT (since 2010 - ISO 16612-2) - a.k.a. "PDF for exchange of variable data and
transactional (VT) printing" - Graphic technology - Variable data exchange (working in ISO
Technical committee 130), based on PDF 1.6 as restricted by PDF/X-4 and PDF/X-5
PDF/UA (since 2012 - ISO 14289-1) - a.k.a. "PDF for Universal Access" - Document
management applications - Electronic document file format enhancement for accessibility
(working in ISO Technical committee 171), based on ISO 32000-1 - PDF 1.7
There is also the PDF/H, a.k.a. PDF Healthcare, a best practices guide (BPG), supplemented by an
Implementation Guide (IG), published in 2008. PDF Healthcare is not a standard or proposed
standard, but only a guide for use with existing standards and other technologies. It is supported by the
standards development organizations ASTM and AIIM. PDF/H BPG is based on PDF 1.6.
Full function PDF
The final revised documentation for PDF 1.7 was approved by ISO Technical Committee 171 in
January 2008 and published as ISO 32000-1:2008 on July 1, 2008 and titled Document
management—Portable document format—Part 1: PDF 1.7.
ISO 32000-1:2008 is the first ISO standard for full function PDF. The previous ISO PDF standards
(PDF/A, PDF/X, etc.) are intended for more specialized uses. ISO 32000-1 includes all of the
functionality previously documented in the Adobe PDF Specifications for versions 1.0 through 1.6.
Adobe removed certain features of PDF from previous versions; these features are not contained in
PDF 1.7 either.
The ISO 32000-1 document was prepared by Adobe Systems Incorporated based upon PDF
Reference, sixth edition, Adobe Portable Document Format version 1.7, November 2006. It was
reviewed, edited and adopted under a special fast-track procedure, by ISO Technical Committee
171 (ISO/TC 171), Document management application, Subcommittee SC 2, Application issues, in
parallel with its approval by the ISO member bodies.
According to the ISO PDF standard abstract:
ISO 32000-1:2008 specifies a digital form for representing electronic documents to
enable users to exchange and view electronic documents independent of the
environment they were created in or the environment they are viewed or printed in. It is
intended for the developer of software that creates PDF files (conforming writers),
software that reads existing PDF files and interprets their contents for display and
interaction (conforming readers) and PDF products that read and/or write PDF files for a
variety of other purposes (conforming products).
As of September 2012, a new version of the PDF standard is under development under the name
ISO/DIS 32000-2 - Document management - Portable document format - Part 2: PDF 2.0.
The project was accepted by ISO as a new proposal in 2009 (ISO/NP 32000-2). The TC 171 SC 2 WG 8
Committee working on ISO 32000-2 (PDF 2.0) is continuing to actively develop the document,
processing hundreds of technical and editorial comments and operating eight ad hoc committees
comprising numerous interested parties, including Adobe Systems. To provide more time to develop
the document the original ISO project was cancelled in 2012 and a New Project item was
Adobe has submitted the Adobe Extension Level 5 and Adobe Extension Level 3 specifications to
ISO for inclusion into the ISO 32000-2 specification, but only some of their features have been
accepted.
PDF 2.0 will reference Adobe's XML Forms Architecture 3.1. In 2011 the ISO Committee urged
Adobe Systems to submit the XFA Specification, XML Forms Architecture (XFA), to ISO for
standardization and requested Adobe Systems to stabilize the XFA specification. The ISO
Committee expressed its concerns about the stability of the XFA specification.
ISO TC 171 SC 2 WG 8
Formed in 2008 to curate the PDF Reference as an ISO Standard, Working Group 8 typically meets
twice a year, with members from ten or more countries attending in each instance. Meetings of the
ISO Committee for ISO 32000 are open to accredited Subject Matter Experts. Interested parties
should contact their respective ISO Member Body for information about joining ISO 32000.
Current Project Leadership: Cherie Ekholm, Microsoft & Duff Johnson, Independent Consultant
(http://www.duff-johnson.com), Project Co-Leaders
Past Project Leadership: 2008-2011: James King, PhD, Adobe Systems
Secretary: Betsy Fanning, AIIM
Anyone may create applications that can read and write PDF files without having to pay royalties to
Adobe Systems; Adobe holds patents to PDF, but licenses them for royalty-free use in developing
software complying with its PDF specification.
The PDF combines three technologies:
A subset of the PostScript page description programming language, for generating the layout
A font-embedding/replacement system to allow fonts to travel with the documents.
A structured storage system to bundle these elements and any associated content into a single
file, with data compression where appropriate.
PostScript is a page description language run in an interpreter to generate an image, a process
requiring many resources. It can handle not just graphics, but standard features of programming
languages such as if and loop commands. PDF is largely based on PostScript but simplified to
remove flow control features like these, while graphics commands such as lineto remain.
Often, the PostScript-like PDF code is generated from a source PostScript file. The graphics
commands that are output by the PostScript code are collected and tokenized; any files, graphics, or
fonts to which the document refers also are collected; then, everything is compressed to a single file.
Therefore, the entire PostScript world (fonts, layout, measurements) remains intact.
As a document format, PDF has several advantages over PostScript:
PDF contains tokenized and interpreted results of the PostScript source code, for direct
correspondence between changes to items in the PDF page description and changes to the
resulting page appearance.
PDF (from version 1.4) supports true graphic transparency; PostScript does not.
PostScript is an interpreted programming language with an implicit global state, so instructions
accompanying the description of one page can affect the appearance of any following page.
Therefore, all preceding pages in a PostScript document must be processed to determine the
correct appearance of a given page, whereas each page in a PDF document is unaffected by
the others. As a result, PDF viewers allow the user to quickly jump to the final pages of a long
document, whereas a Postscript viewer needs to process all pages sequentially before being
able to display the destination page (unless the optional PostScript Document Structuring
Conventions have been carefully complied with).
A PDF file consists primarily of objects, of which there are eight types:
Boolean values, representing true or false
Numbers, either integer or real
Strings, sequences of bytes
Names, identifiers beginning with a forward slash (/)
Arrays, ordered collections of objects
Dictionaries, collections of objects indexed by Names
Streams, usually containing large amounts of data
The null object
Objects may be either direct (embedded in another object) or indirect. Indirect objects are numbered
with an object number and a generation number. An index table called the xref table gives the byte
offset of each indirect object from the start of the file.
This design allows for efficient random
access to the objects in the file, and also allows for small changes to be made without rewriting the
entire file (incremental update). Beginning with PDF version 1.5, indirect objects may also be located
in special streams known as object streams. This technique reduces the size of files that have large
numbers of small indirect objects and is especially useful for Tagged PDF.
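The role of the xref table can be illustrated with a minimal sketch. The helper names build_minimal_pdf and read_object are hypothetical, and the code assumes a classic (non-stream) xref table with a single subsection and generation number 0 throughout; real files may use several subsections, incremental updates, or xref streams, none of which are handled here.

```python
def build_minimal_pdf(objects):
    """Serialize indirect objects followed by a classic xref table.

    `objects` is a list of bytes bodies; object numbers are assigned 1..n,
    all with generation number 0 (an assumption of this sketch).
    """
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset from the start of the file
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 heads the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # fixed-width, 20 bytes per entry
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def read_object(pdf, number):
    """Fetch one indirect object via the xref table, without scanning the file."""
    marker = b"startxref"
    start = int(pdf[pdf.rindex(marker) + len(marker):].split()[0])
    lines = pdf[start:].split(b"\n")
    if lines[0] != b"xref":
        raise ValueError("expected a classic xref table")
    first, _count = map(int, lines[1].split())
    offset, _generation, kind = lines[2 + number - first].split()
    if kind != b"n":
        raise ValueError("object %d is not in use" % number)
    end = pdf.index(b"endobj", int(offset))
    return pdf[int(offset):end + len(b"endobj")]


pdf = build_minimal_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [] /Count 0 >>",
])
print(read_object(pdf, 2).decode("ascii"))
```

Because each xref entry has a fixed width and stores an absolute byte offset, a reader can jump straight to any object, which is the random access property described above.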
PDF files come in two layouts: non-linear (not "optimized") and linear ("optimized").
Non-linear PDF files consume less disk space than their linear counterparts, though they are slower to
access because portions of the data required to assemble pages of the document are scattered
throughout the PDF file. Linear PDF files (also called "optimized" or "web optimized" PDF files) are
constructed in a manner that enables them to be read in a Web browser plugin without waiting for the
entire file to download, since they are written to disk in a linear (as in page order) fashion.
PDF files may be optimized using Adobe Acrobat software or QPDF.
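A rough way to tell the two layouts apart is to look for the linearization parameter dictionary, which carries a /Linearized key and must sit within the first 1024 bytes of a linear file. The function name looks_linearized is hypothetical, and the check is only a heuristic: it does not verify that the rest of the file actually follows the linear layout.

```python
import re


def looks_linearized(data: bytes) -> bool:
    """Heuristic check for a /Linearized key near the start of the file.

    Only the first 1024 bytes are inspected, since the linearization
    parameter dictionary is required to fit within that window.
    """
    head = data[:1024]
    return re.search(rb"/Linearized\s*[0-9]", head) is not None


sample = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
print(looks_linearized(sample))
```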
The basic design of how graphics are represented in PDF is very similar to that of PostScript, except
for the use of transparency, which was added in PDF 1.4.
PDF graphics use a device independent Cartesian coordinate system to describe the surface of a
page. A PDF page description can use a matrix to scale, rotate, or skew graphical elements. A key
concept in PDF is that of the graphics state, which is a collection of graphical parameters that may be
changed, saved, and restored by a page description. PDF has (as of version 1.6) 24 graphics state
properties, of which some of the most important are:
The current transformation matrix (CTM), which determines the coordinate system
The clipping path
The color space
The alpha constant, which is a key component of transparency
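The interplay between the current transformation matrix and the save/restore mechanism can be sketched as follows. PDF writes a transformation as six numbers (a, b, c, d, e, f) and maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f). The class and function names below are hypothetical, and only the CTM is tracked; a real graphics state holds many more parameters.

```python
import math

IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def multiply(m, n):
    """Return the product m x n of two matrices stored as (a, b, c, d, e, f)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def translate(tx, ty):
    return (1.0, 0.0, 0.0, 1.0, tx, ty)


def scale(sx, sy):
    return (sx, 0.0, 0.0, sy, 0.0, 0.0)


def rotate(theta):
    c, s = math.cos(theta), math.sin(theta)
    return (c, s, -s, c, 0.0, 0.0)


def apply(m, x, y):
    """Map a point from user space through matrix m."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


class GraphicsState:
    """Tracks only the CTM, with a stack for save and restore."""

    def __init__(self):
        self.ctm = IDENTITY
        self._stack = []

    def save(self):
        self._stack.append(self.ctm)

    def restore(self):
        self.ctm = self._stack.pop()

    def concat(self, m):
        # A new matrix is premultiplied onto the current CTM.
        self.ctm = multiply(m, self.ctm)


gs = GraphicsState()
gs.save()
gs.concat(translate(100, 50))
gs.concat(scale(2, 2))
print(apply(gs.ctm, 1, 1))  # point (1, 1) is scaled, then translated
gs.restore()
print(gs.ctm == IDENTITY)
```

Premultiplying means the most recently concatenated matrix is applied to coordinates first, so a translate followed by a scale moves the origin and then enlarges whatever is drawn around it.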
Vector graphics in PDF, as in PostScript, are constructed with paths. Paths are usually composed of
lines and cubic Bézier curves, but can also be constructed from the outlines of text. Unlike PostScript,
PDF does not allow a single path to mix text outlines with lines and curves. Paths can be stroked,
filled, or used for clipping. Strokes and fills can use any color set in the graphics state, including
patterns.
PDF supports several types of patterns. The simplest is the tiling pattern in which a piece of artwork is
specified to be drawn repeatedly. This may be a colored tiling pattern, with the colors specified in the
pattern object, or an uncolored tiling pattern, which defers color specification to the time the pattern is
drawn. Beginning with PDF 1.3 there is also a shading pattern, which draws continuously varying
colors. There are seven types of shading pattern of which the simplest are the axial shade (Type 2)
and radial shade (Type 3).
Raster images in PDF (called Image XObjects) are represented by dictionaries with an associated
stream. The dictionary describes properties of the image, and the stream contains the image data.
(Less commonly, a raster image may be embedded directly in a page description as an inline
image.) Images are typically filtered for compression purposes. Image filters supported in PDF
include the general purpose filters:
ASCII85Decode, a filter used to put the stream into 7-bit ASCII
ASCIIHexDecode, similar to ASCII85Decode but less compact
FlateDecode, a commonly used filter based on the zlib/deflate algorithm (a.k.a. gzip, but not zip)
defined in RFC 1950 and RFC 1951; introduced in PDF 1.2; it can use one of two groups of
predictor functions for more compact zlib/deflate compression: Predictor 2 from the TIFF 6.0
specification and predictors (filters) from the PNG specification (RFC 2083)
LZWDecode, a filter based on LZW compression; it can use one of two groups of predictor
functions for more compact LZW compression: Predictor 2 from the TIFF 6.0 specification and
predictors (filters) from the PNG specification
RunLengthDecode, a simple compression method for streams with repetitive data using the
run-length encoding algorithm
and the image-specific filters:
DCTDecode, a lossy filter based on the JPEG standard
CCITTFaxDecode, a lossless bi-level (black/white) filter based on the Group 3 or Group 4
CCITT (ITU-T) fax compression standard defined in ITU-T T.4 and T.6
JBIG2Decode, a lossy or lossless bi-level (black/white) filter based on the JBIG2 standard,
introduced in PDF 1.4
JPXDecode, a lossy or lossless filter based on the JPEG 2000 standard, introduced in PDF 1.5
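Several of the general purpose filters can be approximated with the Python standard library. The sketch below is illustrative only: the function names and the DECODERS table are hypothetical, predictor functions are not applied, and LZWDecode and the image-specific filters are left out. It assumes that a stream's filters are undone in the order in which they are listed.

```python
import base64
import binascii
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """ASCIIHexDecode: hex digits, whitespace ignored, '>' ends the data."""
    digits = b"".join(data.split(b">", 1)[0].split())
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is treated as if followed by 0
    return binascii.unhexlify(digits)


def ascii85_decode(data: bytes) -> bytes:
    """ASCII85Decode: base-85 text terminated by '~>'."""
    return base64.a85decode(data.split(b"~>", 1)[0])


def run_length_decode(data: bytes) -> bytes:
    """RunLengthDecode: a length byte selects a literal run or a repeat."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        if length == 128:  # end-of-data marker
            break
        if length < 128:  # copy the next length + 1 bytes literally
            out += data[i + 1:i + 2 + length]
            i += 2 + length
        else:  # repeat the next byte 257 - length times
            out += data[i + 1:i + 2] * (257 - length)
            i += 2
    return bytes(out)


DECODERS = {
    "FlateDecode": zlib.decompress,
    "ASCIIHexDecode": ascii_hex_decode,
    "ASCII85Decode": ascii85_decode,
    "RunLengthDecode": run_length_decode,
}


def decode_stream(data: bytes, filters) -> bytes:
    """Undo a chain of filters in the order given."""
    for name in filters:
        data = DECODERS[name](data)
    return data


raw = b"BT /F1 12 Tf (Hello) Tj ET\n" * 10
encoded = binascii.hexlify(zlib.compress(raw)) + b">"
print(decode_stream(encoded, ["ASCIIHexDecode", "FlateDecode"]) == raw)
```

Chaining an ASCII filter with a compression filter, as in the usage lines, keeps the stream printable at the cost of some size.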
Normally all image content in a PDF is embedded in the file. But PDF allows image data to be stored
in external files by the use of external streams or Alternate Images. Standardized subsets of PDF,
including PDF/A and PDF/X, prohibit these features.
Text in PDF is represented by text elements in page content streams. A text element specifies that
characters should be drawn at certain positions. The characters are specified using the encoding of a
selected font resource.
A font object in PDF is a description of a digital typeface. It may either describe the characteristics of
a typeface, or it may include an embedded font file. The latter case is called an embedded font while
the former is called an unembedded font. The font files that may be embedded are based on widely
used standard digital font formats: Type 1 (and its compressed variant CFF), TrueType, and
(beginning with PDF 1.6) OpenType. Additionally PDF supports the Type 3 variant in which the
components of the font are described by PDF graphic operators.
Standard Type 1 Fonts (Standard 14 Fonts)
Fourteen typefaces, known as the standard 14 fonts, have a special significance in PDF documents:
Times (v3) (in regular, italic, bold, and bold italic)
Courier (in regular, oblique, bold and bold oblique)
Helvetica (v3) (in regular, oblique, bold and bold oblique)
Symbol
Zapf Dingbats
These fonts are sometimes called the base fourteen fonts. These fonts, or suitable substitute fonts
with the same metrics, must always be available in all PDF readers and so need not be embedded in
a PDF. PDF viewers must know about the metrics of these fonts. Other fonts may be substituted if
they are not embedded in a PDF.
Within text strings, characters are shown using character codes (integers) that map to glyphs in the
current font using an encoding. There are a number of predefined encodings, including WinAnsi,
MacRoman, and a large number of encodings for East Asian languages, and a font can have its own
built-in encoding. (Although the WinAnsi and MacRoman encodings are derived from the historical
properties of the Windows and Macintosh operating systems, fonts using these encodings work
equally well on any platform.) PDF can specify a predefined encoding to use, the font's built-in
encoding or provide a lookup table of differences to a predefined or built-in encoding (not
recommended with TrueType fonts).
The encoding mechanisms in PDF were designed for Type 1
fonts, and the rules for applying them to TrueType fonts are complex.
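The "lookup table of differences" can be pictured as a patch applied to a base mapping from character codes to glyph names. In the sketch below, the function names and the tiny BASE_EXCERPT table are hypothetical stand-ins for illustration, not a complete predefined encoding. The differences list alternates between an integer starting code and one or more glyph names, each name taking the next consecutive code.

```python
def apply_differences(base, differences):
    """Return a copy of `base` (code -> glyph name) patched by a differences list."""
    encoding = dict(base)
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item  # start (or restart) numbering at this code
        else:
            if code is None:
                raise ValueError("differences must begin with a character code")
            encoding[code] = item
            code += 1
    return encoding


def glyph_names(encoding, text: bytes):
    """Map each byte of a text string to a glyph name via the encoding."""
    return [encoding.get(code, ".notdef") for code in text]


# Hypothetical excerpt of a base encoding, for illustration only.
BASE_EXCERPT = {65: "A", 66: "B", 67: "C"}

patched = apply_differences(BASE_EXCERPT, [66, "beta", "gamma", 200, "Agrave"])
print(glyph_names(patched, b"ABC"))
print(glyph_names(BASE_EXCERPT, b"ABC"))
```

The same bytes select different glyphs once the table is patched, which is why the character codes in a text string cannot be interpreted without knowing the font's encoding.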
For large fonts or fonts with non-standard glyphs, the special encodings Identity-H (for horizontal
writing) and Identity-V (for vertical) are used. With such fonts it is necessary to provide a ToUnicode
table if semantic information about the characters is to be preserved.
The original imaging model of PDF was, like PostScript's, opaque: each object drawn on the page
completely replaced anything previously marked in the same location. In PDF 1.4 the imaging model
was extended to allow transparency. When transparency is used, new objects interact with previously
marked objects to produce blending effects. The addition of transparency to PDF was done by means
of new extensions that were designed to be ignored in products written to the PDF 1.3 and earlier
specifications. As a result, files that use a small amount of transparency might view acceptably in older
viewers, but files making extensive use of transparency could be viewed incorrectly in an older viewer