Preface
Our world is getting more digital by the day. A lot of information and documents exist only in digital form today, but will they still be legible “tomorrow”? That was the theme of an interesting TV show appropriately called “The Digital Disaster”. It began with cave drawings from the Stone Age and papyrus rolls from ancient Egypt, both of which have survived as documents for thousands of years. What documents from the 21st century will future generations be able to find and still read? But the problem is arriving much sooner than you may realize. I always carry a 3½ inch floppy disk in my pocket, and it demonstrates many of the problems of long-term archiving. It begins with the hardware: where can you buy a 3½ inch floppy disk drive today? And even if you find one, there’s a good chance that the disk itself is physically damaged. If these two hardware hurdles are successfully cleared, what kind of software or document will we find on the floppy disk? Are the appropriate viewing and processing programs still available? And this example is a mere 15 years old!
My short anecdote leads us to the requirements for the long-term archiving of documents. Electronic archiving is critical for businesses and organizations, because documents today often exist only in digital format. The length of time that business documents have to be archived varies from sector to sector and from country to country, but some examples can help us get an idea. Federal laws often require an archiving period of around 10 years. Banks and insurance companies require customer dossiers to be retained for more than 50 years. In engineering, archival periods of 100 years are common for aircraft; bridges will hopefully last a whole lot longer.
And saving documents in proprietary formats for this length of time is really not a good idea. This leads to the second problem in the digital document world: many users already have a real “format zoo”, which can quickly become unmanageable (if it isn’t already). Proprietary document formats have to be migrated on a regular basis so that newer versions of the processing software can still read them.
Employees working on customer dossiers aren’t really impressed when 10 different viewing programs are open at the same time. In some of the programs they might not even know how to navigate around a document. To solve this problem, a document and archiving format is needed that guarantees the required long-term archiving period and offers the option of a single format type.
This is where PDF/A, the ISO standard for long-term archiving, enters the stage. The “A” stands for “Archive”, and the PDF/A standard was created specifically for long-term archiving. It envisions a single PDF/A archive for all documents in an organization, from input through to output, including all of the areas in between.
You will find many more advantages of PDF/A on the following pages, which aim to translate the very formal ISO standard into a form that is easy to understand and enhanced with practical examples. Since PDF/A resolves many of the critical problems that users face, the PDF/A Competence Center was formed as an association with the aim of providing information about PDF/A, promoting the distribution of the standard, and acting as a central point of contact for your questions about PDF/A. We hope that this booklet gives you a good overview of and introduction to PDF/A, and also motivates you to implement the standard.
Berlin, September 2007
Thomas Zellmann
Chairman, PDF/A Competence Center
PS: Special thanks go to our member callas software GmbH, who initiated the German version of this booklet and provided it to the PDF/A Competence Center for translation into English and for further distribution.
PDF/A
in a Nutshell
Throughout history, it has always been important to preserve our past for future generations. Until the last 20 years, in our paper-centric world, this was a fairly easy task. One would simply take the folders of papers or other objects to be preserved and send them off to an archive for safekeeping, or place them in a fire-retardant container. With electronic documents this task is not as easily approached, which is how PDF/Archive, or PDF/A, came into being.
PDF/Archive addresses the growing need to archive documents electronically in a way that ensures preservation of their contents over an extended period of time. Additionally, it ensures that the documents can be retrieved and rendered with a consistent and predictable result each time they are viewed.
AIIM, the Enterprise Content Management Association, and NPES (The Association for Suppliers of Printing, Publishing and Converting Technologies) were approached by numerous organizations that were faced with the need to preserve large quantities of electronic documents over long periods of time. After reviewing the options of maintaining this electronic history in TIFF, XML, native format or PDF, it was decided that PDF would be the best format, as it enables the document to be rendered accurately, exactly as it was intended to be displayed. However, to ensure the long-term preservation of the electronic documents, PDF would need to be enhanced slightly.
The joint effort of AIIM and NPES brought together document and content management experts with the graphics experts who had already developed the PDF/X family of standards. When we announced the proposed work to develop a subset of PDF tags for long-term preservation of electronic documents, we were overwhelmed by the interest in participating from virtually every region of the world.
As an accredited standards developer and the secretariat of ISO TC 171, Document Management Applications, and ISO TC 171 SC2, Document Applications, AIIM brought to the project the means for gaining ISO approval and wider adoption of the standard. ISO 19005-1, Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1), became an approved ISO standard within 22 months of its introduction as a new project, through the dedicated efforts of many records managers, archivists, software developers and end users.
While adoption of the standard has been a little slower than we had anticipated, we are encouraged by the continuing interest in and growing adoption of the standard. This book, along with the continuing efforts of AIIM and the PDF/A Competence Center, will continue to increase the adoption rate of PDF/A in the industry.
Silver Spring, September 2007
Betsy Fanning
AIIM, Director, Standards
Table of Contents

Durable documents with the PDF/A standard
  Open files are not always complete
  TIFF as an archive format
  PDF data containers
  Why PDF/A and not PDF?
  The introduction of the PDF/A standard
  How to create archive PDFs
  Who stands to benefit from PDF/A?
  Table: Comparison between PDF/A-1a and PDF/A-1b
  Overview: Which file formats are suitable for archiving?
  Is XPS an alternative to PDF/A?

PDF/A creation: Analog, digital, and mass processing
  PDF/A from scanned documents
  Scanning options in Acrobat 8 Professional
  Converting pages that have already been scanned to PDF/A
  The Distiller engine
  PDF/A document generation using the Distiller
  Office and administration
  PDF/A in Office 2007
  Office 2003 and the PDFMaker
  PDF/A using the 3-Heights PDF Producer
  PDF/A ‘en masse’
  PDF/A ‘from nothing’
  Creating PDF/A from print data streams

Illustrations: PixelQuelle.de
From PDF to PDF/A: Converting PDFs to archive PDFs
  PDF/A generation with Preflight
  Converting PDF to PDF/A with pdfaPilot

Is this really a PDF/A file? PDF/A validation
  Validation with Preflight
  pdfaPilot PDF/A

Archive PDFs in everyday life: What issues might arise?
  Images
  Resolution is not part of the PDF/A standard
  Permitted and prohibited compression types
  Transparency
  Colors
  Fonts
  Metadata
  PDF/A and metadata
  Accessibility
  Creating an accessible PDF file from Word
  Interactive PDF files
  Comments and annotations
  Forms
  Embedding fonts for PDF/A forms
  PDF/A for design drawings
  Electronic signatures
  Security levels
  Digital signatures in PDF with Acrobat
  Challenges in practice

Illustrations: photocase.com/de
The outlook: PDF/A in the future
  Enhancements in PDF/A-2
  Looking towards PDF/A-3
  PDF/A-1 developments
  PDF/A in one hundred years’ time

What the error messages mean
  Preflight results and troubleshooting for PDF/A

Glossary
  Explanation of terms relating to PDF/A

About:
  The PDF/A Competence Center
  AIIM
Sepp Huberbauer – photocase.com/de
Durable documents with the PDF/A standard
There are certain documents that people want to keep because of their sentimental value: love letters, photographs of their first day at school, or holiday snaps, for example. Other documents have to be kept for legal reasons. These documents include birth certificates, academic certificates and reports, invoices that are needed for tax purposes, insurance documents, and contracts.
In the days when everything existed on paper, in the pre-digital era, the main problem was remembering which index file, folder, or shoe box you’d used to store your letters or contracts. In today’s world of digital documents, the task of archiving is fundamentally different. Thanks to search functions or database solutions, even the most forgetful of us can easily find a particular document or photo on our computers. In addition, any space problems can be solved simply by purchasing additional storage. However, there are certain risks and uncertainties that might affect the shelf life of digital documents. These risks do not arise only from the physical durability of the data carriers used, although it is clear that magnetic tape, CD-ROMs, and DVDs will not necessarily last any longer than paper and ink. Photographic prints dating from 1900 still exist today, but it’s debatable whether we will similarly be able to view, in 2107 for example, the millions of digital snapshots now being taken and stored on mobile phone memory cards all over the world.
In addition to the restrictions imposed by the limited lifetime of data carriers, the document format and software used also present a considerable challenge for the durability of electronic documents.

Markus Imorde – photocase.com/de

Yesterday’s, today’s, and tomorrow’s software
It’s a common problem: opening old documents in brand-new programs doesn’t always work. The rate of success in the opposite direction (new documents in old programs) is even less encouraging. Software developers do try to achieve backward compatibility, so that files that are, say, five years old can be opened with a current program release. However, this can change the layout and page rendering, meaning that not everything is displayed exactly as it ought to be. More recent software tends to generate documents with additional features that older versions may not be able to display. In some cases, it is not even possible to open current files in previous versions of a program. For example, whereas a Microsoft Word 95 file can normally be opened in Word 2003, it is not possible to open a Word 2003 document in Word 95.
Because software production cycles are becoming ever shorter (one major release per year is not unusual), the challenge arising from new program developments is greater than that caused by the aging of storage media. The successful long-term archiving of digital files is at least as threatened by the constant rollout of new program versions as by damaged data or data carriers.
Open files are not always complete
File formats are not all equally suitable for the long-term, secure archiving of content. If a file format cannot store all the elements required to display the content completely (graphics and fonts as well as text), problems may arise when someone tries to use the file later on. If, for example, the program cannot find linked external images, a page cannot be displayed as intended. Instead, the frame where the image should appear shows only a rough preview of the image or a question mark. The problem of open files for which not all illustrations and fonts are available has been causing irritating delays for printers and their suppliers for a long time. However, the introduction of PDF, a format that can store all the components required for a printed document, has greatly simplified work in this area. In addition, layout files such as XPress or InDesign files are becoming increasingly rare in printers’ archives. Instead, printers are storing the actual PDF documents that were used for the print job.
TIFF as an archive format
For a long time, many public authorities and companies that need to store large quantities of correspondence, records, invoices, contracts, and similar information in digital archives have been using the pixel image format TIFF (Tagged Image File Format). This format digitizes originals containing text and images pixel by pixel. TIFF is an established image file format with both advantages and disadvantages. Pixel-based formats store the appearance of the originals. Problems with missing graphics and fonts do not occur, since the format stores all of the page elements as an image. Since TIFF is widespread and causes few file-handling complications when upgrading to a new program version, many users believe that the future of the format is guaranteed. However, while TIFF may indeed be a de facto standard, it is not an official standard for safe archiving. Other disadvantages include the relatively large file size and the fact that scanned text cannot be searched without OCR (optical character recognition), since this format converts it to image elements.
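To put the file-size point into numbers, the short Python sketch below computes the uncompressed raster size of a single scanned page. The page size (A4, 210 × 297 mm), the 300 dpi resolution, and the helper name `raw_page_bytes` are illustrative assumptions, not figures from this booklet. Real TIFF files are usually compressed, so actual sizes are smaller.

```python
MM_PER_INCH = 25.4


def raw_page_bytes(width_mm: float, height_mm: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of one scanned page (hypothetical helper).

    Assumes no compression and no row padding; every pixel stores
    `bits_per_pixel` bits.
    """
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # Assumed example: an A4 page scanned at 300 dpi (2480 x 3508 pixels)
    bilevel = raw_page_bytes(210, 297, 300, 1)   # black and white, 1 bit per pixel
    color = raw_page_bytes(210, 297, 300, 24)    # 24-bit RGB color
    print(f"bilevel: {bilevel:,} bytes")  # bilevel: 1,087,480 bytes
    print(f"color:   {color:,} bytes")    # color:   26,099,520 bytes
```

Under these assumptions, a black-and-white page comes to roughly 1.1 MB before compression and a color page to roughly 26 MB. This is why compact black-and-white compression schemes matter for large TIFF archives.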
TIFF-G4, a black-and-white TIFF variant that uses a compression method developed for fax technology, is commonly used for archiving.
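One way to check whether an archived TIFF file actually uses this fax-derived compression is to read the Compression tag (tag 259) from the file's first image file directory (IFD). In the TIFF 6.0 specification, the value 4 denotes CCITT T.6, i.e. Group 4. The Python sketch below assumes a classic (non-BigTIFF) file and inspects only the first page. The helper names `tiff_compression` and `is_tiff_g4` are hypothetical. It illustrates the file structure and is not a validator.

```python
import struct
from typing import Optional

COMPRESSION_TAG = 259  # TIFF tag number for "Compression"
CCITT_GROUP4 = 4       # Compression value for CCITT T.6 (Group 4 fax)


def tiff_compression(data: bytes) -> Optional[int]:
    """Return the Compression value of the first IFD, or None if the tag is absent."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file: unknown byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        (tag,) = struct.unpack(endian + "H", data[start:start + 2])
        if tag == COMPRESSION_TAG:
            # A single SHORT value sits left-justified in the entry's 4-byte value field
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return value
    return None


def is_tiff_g4(data: bytes) -> bool:
    """True if the first image in the TIFF data is Group 4 compressed."""
    return tiff_compression(data) == CCITT_GROUP4


if __name__ == "__main__":
    # Hypothetical minimal little-endian TIFF: header plus a one-entry IFD
    header = b"II" + struct.pack("<HI", 42, 8)
    entry = struct.pack("<HHIHH", COMPRESSION_TAG, 3, 1, CCITT_GROUP4, 0)
    ifd = struct.pack("<H", 1) + entry + struct.pack("<I", 0)  # next-IFD offset 0
    print(is_tiff_g4(header + ifd))  # True
```

For a real file, pass the raw bytes, e.g. `is_tiff_g4(open("scan.tif", "rb").read())`, where `scan.tif` is a placeholder name. A multi-page archive file would need the chain of next-IFD offsets followed as well.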