Text recognition, accessibility, and metadata functions can be used
to give the new PDF additional features.
The text recognition function creates searchable text (otherwise,
the scanned pages in the PDF simply take the form of pixels).
Accessibility (enabling access to content for the visually impaired)
gives the PDF structural information that describes the order in
which it is to be read. This is required to optimize the use of
screen readers. Metadata can be considered as a kind of digital
label that contains information such as the title of the document,
copyright, keywords, and author. This is useful for administering
digital archives.
Detailed description of text recognition
Additional settings for text recognition can be made using the
‘Options’ button. This area includes a language setting and other
fine-tuning settings for text recognition. For example, the user can
define whether the scan output should be a searchable image or
formatted text with graphics. However, note the following: Even the
second, superior option is no guarantee of PDF/A-1a compliance,
since errors can still occur when structures are reconstructed. This
is why the restricted version, PDF/A-1b, is used here, too.
Converting pages that have already been
scanned to PDF/A
The procedure used to convert scanned documents that already exist
in the form of pixel data to PDF documents in Acrobat Professional
is rather different. First, the image file (TIFF or JPEG) is
imported by choosing ‘File’ → ‘Create PDF’ → ‘From File’ from the
menu. It is then converted to a PDF file. The ‘Document’ menu
contains the ‘Optimize Scanned PDF’ function. Once the document has
been converted into a PDF, the user can use this function to improve
it before subjecting it to text recognition.
Text recognition is also called from the
‘Document’ menu item. It is triggered with
the ‘OCR Text Recognition’ → ‘Recognize
Text Using OCR’ command.
The user can then check that the process worked correctly: Clicking
‘Find All OCR Suspects’ triggers a search for image elements that
could not be converted to text.
Recognize Text – Settings: The ‘PDF Output Style’ field contains
options for generating a simple PDF image with searchable text or a
more complex PDF file with separate areas for text and graphics
where possible.
Text Recognition and Metadata: These options give the PDF additional
features such as searchable text and metadata. These features are
not PDF/A-relevant, but they do enhance the functionality of the PDF
file.
Document optimization and text recognition: The ‘Optimize Scanned
PDF’ function can be used to enhance the source material for text
recognition, e.g. by removing edge shadows. Following this process,
the user can use Adobe’s ‘Recognize Text Using OCR’ function to
generate searchable text.
PDF/A creation
PDF/A
in a Nutshell
Saving or exporting the document as a PDF/A
The PDF document must now be converted to PDF/A. This can be
achieved in just a few steps using the Export function or with the
‘Save As’ command. Both methods involve the use of the integrated
Acrobat Professional Preflight engine, which carries out the
conversion to PDF/A. Regardless of whether the user chooses the
Export option or ‘Save As’, the conversion will only succeed if the
PDF/A-1b level is selected in the ‘Settings’.
Even after text recognition, metadata input, and the integration of
structural information for accessibility, scanned documents do not
automatically have advanced PDF/A-1a features.
When the user clicks ‘OK’, Acrobat generates a PDF/A file from the
PDF document.
Saving space when generating PDF files from
scanned documents:
The generation of PDF files from digitized paper documents has a
disadvantage – the image data for such files normally requires more
memory capacity than digital pages of text. A PDF generated from a
Word document will be considerably smaller than a PDF file that is
generated from a Word printout using a scanner.
This comparatively large file size is particularly cumbersome if a
large number of documents with many pages need to be archived. There
is a big difference between 10,000 x 40 KB and 10,000 x 400 KB:
400 MB will still fit on a CD-ROM; 4 GB will not.
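The comparison above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes decimal units (1 MB = 1,000 KB) and a nominal CD-ROM capacity of 700 MB, neither of which is stated in the text, and the function names are purely illustrative.

```python
# Assumptions (not from the text): decimal units and a 700 MB CD-ROM.
CD_ROM_CAPACITY_MB = 700


def archive_size_mb(documents: int, kb_per_document: int) -> float:
    """Total size in MB of an archive of equally sized documents."""
    return documents * kb_per_document / 1000


def fits_on_cd(documents: int, kb_per_document: int) -> bool:
    """True if the whole archive fits on one assumed 700 MB CD-ROM."""
    return archive_size_mb(documents, kb_per_document) <= CD_ROM_CAPACITY_MB


small_archive = archive_size_mb(10_000, 40)    # 10,000 files of 40 KB
large_archive = archive_size_mb(10_000, 400)   # 10,000 files of 400 KB

print(small_archive, fits_on_cd(10_000, 40))
print(large_archive, fits_on_cd(10_000, 400))
```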
An important factor for determining the size of PDF files is whether
the document is read in black and white (line scan), grayscale, or
in color – color data consists of much more information than bitonal
data and the resulting data quantity is therefore also larger.
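To give a feeling for the orders of magnitude, the following sketch estimates the uncompressed size of one scanned page in each mode. The figures are assumptions chosen for illustration, not values from the text: 1, 8, and 24 bits per pixel for bitonal, grayscale, and color, and an A4 page at 300 dpi taken as roughly 2480 x 3508 pixels.

```python
# Assumed bit depths for the three scan modes (illustrative values).
BITS_PER_PIXEL = {"bitonal": 1, "grayscale": 8, "color": 24}

# Assumed page size: A4 at 300 dpi, roughly 2480 x 3508 pixels.
A4_300DPI = (2480, 3508)


def raw_size_bytes(width_px: int, height_px: int, mode: str) -> float:
    """Uncompressed image size in bytes for the given scan mode."""
    return width_px * height_px * BITS_PER_PIXEL[mode] / 8


sizes = {mode: raw_size_bytes(*A4_300DPI, mode) for mode in BITS_PER_PIXEL}

for mode, size in sizes.items():
    print(f"{mode:10s} {size / 1_000_000:6.1f} MB uncompressed")
```

Under these assumptions a grayscale page holds eight times as much raw data as a bitonal page, and a color page three times as much again. Compression changes the absolute numbers considerably, but the ranking stays the same.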
Various image compression types have been developed over the past
years to enable users to save memory space when storing image data.
The best known of these methods is JPEG compression. PDF/A permits
compression, but not all types. JPEG and JBIG2 are permitted, but
JPEG2000 is not. In addition to the type of compression, the
compression level is also important for scanned text, because it
affects readability: higher compression levels render the image and
its text progressively less clear.
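The rule about permitted compression types can be written down as a small lookup. The sketch below is an illustration only: it maps the compression types named in this book to the standard PDF stream filter names (DCTDecode for JPEG, JBIG2Decode for JBIG2, FlateDecode for ZIP, JPXDecode for JPEG2000), covers no other filters, and uses a hypothetical function name. It is no substitute for a real PDF/A validator.

```python
# Standard PDF stream filter names for the compression types named in
# the text. Deliberately incomplete: other filters are not covered.
PERMITTED = {"DCTDecode": "JPEG", "JBIG2Decode": "JBIG2", "FlateDecode": "ZIP"}
NOT_PERMITTED = {"JPXDecode": "JPEG2000"}


def classify_filter(filter_name: str) -> str:
    """Classify a stream filter with respect to PDF/A-1 (sketch only)."""
    if filter_name in PERMITTED:
        return "permitted"
    if filter_name in NOT_PERMITTED:
        return "not permitted"
    return "not covered by this sketch"


for name in ("DCTDecode", "JBIG2Decode", "FlateDecode", "JPXDecode"):
    print(name, "->", classify_filter(name))
```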
Berlin-based LuraTech has been working on effective image
compression for digitized company documents for years. During the
development of PDF/A, LuraTech has enhanced the product and service
scope of its scan-to-image and scan-to-PDF solutions by adding a
scan-to-PDF/A function. The JBIG2 compression used has been improved
by a type of layer technology that enables color documents to be
digitized in a legible manner while using relatively low amounts of
memory.
In addition to compression, there are various text
recognition functions and options for integrating
metadata into PDF/A files.
More information on the Internet at:
www.luratech.com
A multitude of formats for saving documents: This long list of
formats includes the PDF/A standard.
Scanned documents are always converted to PDF/A-1b: To ensure a
successful PDF/A conversion, the preset PDF/A-1b-compliance
specification must not be changed.
The Distiller engine
For a long time, the Distiller was the only recommended way of
producing faultless PDF files for tasks such as professional
printing from certain programs. In the light of constantly improving
PDF creation functions such as those used in new versions of widely
used programs like Microsoft Office and InDesign or at
operating-system level (for instance, Mac OS X), the Distiller is
losing some of its importance. However, it is still an important
component of the Acrobat package.
The Distiller takes a somewhat roundabout route to convert various
file formats into the PDF format: It uses the PostScript format that
is generated temporarily when printing files. Because PostScript and
PDF are related to each other both with regard to development and on
a structural level, conversion from PostScript to PDF is normally
easy to achieve – as long as the appropriate printer drivers and a
PostScript to PDF converter, like Adobe Distiller, are available.
Since all popular programs have print functions, the generation of
PDFs using a combination of print data and the Distiller is an
all-purpose method. In addition, the related format EPS
(Encapsulated PostScript) can also be directly ‘distilled’.
When is the Distiller useful? The Distiller is the appropriate tool
for creating PDFs from applications that do not offer a PDF export
function or an option for saving files as PDFs. However, the
Distiller can do more than that. Watched folders enable the creation
of PDFs to be automated and standardized – a useful feature for many
usage environments.
PDF/A document generation using the
Distiller
With Acrobat 8, Adobe has implemented presettings for
standard-compliant conversion to PDF/A. The Distiller can only be
used to create PDF/A-1b-compliant documents; PDF/A-1a documents
cannot be created for technical reasons, since the required
structural information is not generated or passed using the
PostScript method.
PDF/A using Distiller 8: There are two
PDF/A settings, each with a different color
space. This enables the creation of PDF/A
files in RGB (for displaying on monitors)
and in CMYK (for printing).
Acrobat Distiller 8: The Distiller is a program that is contained in
the Acrobat package. Even Acrobat 1 was shipped with a Distiller.
Reliable PDF/A creation functions were introduced with Distiller 8.
PostScript and EPS files can also be converted to PDF by dragging
them into the Acrobat Distiller window or onto the Distiller icon.
This means that there is no need to use the ‘Open...’ command.
Users can choose between two default settings in the main window of
Acrobat Distiller. PDF/A in the RGB color space is mainly suited for
use on computer screens. CMYK PDF/A is intended for printing, either
on an office printer or by professional four-color offset printing.
PDF/A settings in detail
Changes to the preset default settings for the generation of PDF/A
should only be made after due consideration to avoid creating
non-compliant documents. These settings can be modified by choosing
‘Settings’ → ‘Edit Adobe PDF Settings’.
Settings that influence the resolution and compression of images can
be made in the ‘Images’ section. Files with lower resolution and
higher compression values are smaller, but this can worsen the
display quality. Alternatively, the compression type can be changed
to ZIP, which is lossless and therefore does not impair the image
quality.
When creating PDF/A with the CMYK color space, European users should
take a look at the ‘Standards’ section. The Output Intent presetting
here is intended for the US market. (The term ‘output intent’ comes
from the color management field and refers to the regulation of
color settings for printing.) In this area, users can select an
output intent that is more suited for use in Europe, such as the
European ICC profile ‘ISO Coated FOGRA27’, which is included with
Acrobat 8.
If a change is made to a default profile, the changed profile is
saved as a copy; Distiller default settings cannot be overwritten.
PDF/A settings: Users can change the compression level and
resolution in the ‘Images’ section. The compression type can be
changed to ‘ZIP’. The preset sRGB output intent in the Standards
section is the generally recommended intent for RGB. In the case of
CMYK, the preset US profile can be changed to a profile more suited
to the European market.
lio – photocase.com/de
Additional throughput with watched folders
PDF settings (the settings that specify how PDF files are to be
‘distilled’) can also be assigned to file-system folders. Such
folders are called ‘watched folders’ or ‘hot folders’.
Hot folders can be set up in just a few steps in the Distiller. The
user specifies which file-system folders are to be watched, selects
the required post-processing setting – in this case, the setting for
PDF/A – and the Distiller then creates two new folders (‘In’ and
‘Out’) in each hot folder, as well as a ‘joboptions’ instruction
file.
If a print file is now saved in this folder, the job specifications
are implemented automatically without user intervention. Users can
save multiple documents in a watched folder for processing. In
addition to the fact that this process can be carried out
automatically, another advantage is that the quality of the files
remains constant.
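The hot-folder principle itself is easy to illustrate. The following Python sketch is not how the Distiller is implemented and does not call it; it only mimics the pattern described above under stated assumptions. The function names are hypothetical, the conversion step is a placeholder that simply copies the file, and removing the processed print file from ‘In’ is an assumption made for this example.

```python
import shutil
import tempfile
from pathlib import Path


def setup_hot_folder(root: Path) -> tuple[Path, Path]:
    """Create the 'In' and 'Out' subfolders of a watched folder."""
    in_dir, out_dir = root / "In", root / "Out"
    in_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    return in_dir, out_dir


def convert_stub(source: Path, target: Path) -> None:
    """Hypothetical placeholder for the real PostScript-to-PDF/A step."""
    shutil.copyfile(source, target)


def process_once(root: Path) -> list[Path]:
    """Process every print file waiting in 'In' with the same settings."""
    in_dir, out_dir = setup_hot_folder(root)
    produced = []
    for source in sorted(in_dir.glob("*.ps")):
        target = out_dir / (source.stem + ".pdf")
        convert_stub(source, target)
        source.unlink()  # assumption: processed input leaves 'In'
        produced.append(target)
    return produced


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    in_dir, _ = setup_hot_folder(Path(tmp))
    (in_dir / "report.ps").write_text("%!PS\n")
    print([path.name for path in process_once(Path(tmp))])
```

Because every file passes through the same step with the same settings, the results are uniform, which is the quality advantage mentioned above. A real watched folder would repeat `process_once` at regular intervals.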
However, as far as licensing is concerned, the Distiller is not
intended to be used to enable entire departments to access watched
folders on the server. Adobe markets a server version of the
Distiller for the mass creation of PDFs in companies. This
high-throughput solution has since been renamed and is now marketed
as ‘LiveCycle PDF Generator PostScript’.
Watched folders: The Distiller enables watched folders to be set up.
Why not create separate ‘hot folders’ for PDF/A (RGB) and PDF/A
(CMYK) in order to create PDF files more effectively?
In and Out: PostScript print files are sent to the ‘In’ folder. The
Distiller processes each file in accordance with the settings in the
job options (in this case, the setting for PDF/A files in the RGB
color space). The new PDF/A files are sent to the ‘Out’ folder. The
process creates log files containing information on the process
flow.
Office and
administration
Many users all over the world use Microsoft Office programs to
create their documents. Frequently, working files in DOC, PPT, or
Excel format are used for internal and external communication and
for storing files in archives. This process can sometimes cause
problems for recipients and is not ideal for long-term storage. PDF
– or, even better, PDF/A – minimizes if not totally eliminates
problems that occur when exchanging and archiving files. The
creation of PDFs in the new Microsoft Office 2007 is slightly
different from the process in previous versions of Office.
PDF/A in Office 2007
Unlike the previous versions of Microsoft Office, Office 2007
enables the export of PDF/A files without requiring the use of
Acrobat or the Distiller.
Before the Office 2007 package was rolled out at the end of 2006,
discussions took place on PDF features. Adobe Systems and Microsoft
disagreed about the integration of a direct PDF output function in
Office 2007 programs. As a solution to the dispute, users must now
download a separate ‘Save As PDF or XPS’ add-in from the Microsoft
Web site and install it in the application package afterwards. The
following Office 2007 programs benefit from the new export function:
Access, Excel, InfoPath, OneNote, PowerPoint, Publisher, Visio, and
Word.
Once the PDF export add-in has been installed, an extra option is
added to the Save As command that enables Office documents to be
saved as PDFs. The ‘Options’ dialog box in the ‘Publish As’ area is
a useful feature. Users can select a checkbox here to create PDF
files in accordance with the PDF/A standard: ‘ISO 19005-1 compliant
(PDF/A)’. When the user clicks ‘OK’, the program creates a
PDF/A-1b-compliant file.
If users wish to proceed as in Office 2003
and use a connection to the PDF/Distiller
Only available as an add-in: Users can only benefit from the
function that enables documents to be published as PDF files in
Office 2007 after downloading and installing a free add-in.
Internet: www.microsoft.com
PDF/A from Office 2007: The Options in
the PDF export dialog include the ‘ISO
19005-1 compliant (PDF/A)’ setting,
which enables the generation of a
level B PDF/A.
XPS is a device-independent document format developed by Microsoft.
The abbreviation stands for ‘XML Paper Specification’.
settings to export PDFs from Office 2007, they should expect to
experience problems in conjunction with Acrobat 8.0. According to
the manufacturers, this is due to the fact that the rollout dates of
the two software solutions were so close together. An Acrobat update
to version 8.1 should solve these incompatibility issues.
Office 2003 and the PDFMaker
It is only possible to generate PDF/A documents from Office 2003
using the PDFMaker add-in and a connection to Acrobat (or the Adobe
Distiller). Acrobat 8 Professional provides current conversion
settings for PDF/A. Users can create both PDF/A-1a-compliant and
PDF/A-1b-compliant files from Office programs.
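The conversion routes described in this chapter differ in the conformance level they can reach. As a memory aid, the sketch below collects these statements in a small lookup table. The route keys and the function name are hypothetical labels chosen for this example; the levels themselves are the ones stated in the text.

```python
# Conformance levels each route can reach, as stated in this chapter.
# The route keys are hypothetical labels used only for this sketch.
ACHIEVABLE_LEVELS = {
    "scanned_document": {"PDF/A-1b"},
    "distiller": {"PDF/A-1b"},
    "office_2007_addin": {"PDF/A-1b"},
    "office_2003_pdfmaker": {"PDF/A-1a", "PDF/A-1b"},
}


def can_produce(route: str, level: str) -> bool:
    """True if the given route can produce the given PDF/A level."""
    return level in ACHIEVABLE_LEVELS[route]


for route in ACHIEVABLE_LEVELS:
    print(route, "->", sorted(ACHIEVABLE_LEVELS[route]))
```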
Settings for PDF/A-1b
The Office application menu (for example, in Word) has an ‘Adobe
PDF’ entry that enables the triggering of PDF generation and access
to the presettings. The ‘Change Conversion Settings’ command opens a
dialog box where users can select options and make additional
settings.
The ‘Settings’ tab consists of a dropdown
menu with various options delivered with
Adobe Distiller. There are two PDF/A-1b
variants here – one for four-color CMYK
output, and one for RGB monitor output.
In this example, the RGB variant is used.
Clicking the ‘Advanced Settings ...’ button opens detailed Adobe PDF
settings. Users can change the image resolution and compression type
here, but it is important to take care not to make changes that
could endanger the PDF/A compliance of files (for example, changes
to the Acrobat compatibility setting). However, let us return to the
conversion settings tabs in the Microsoft application.
Be careful with the security settings
Because security settings – passwords for opening, printing, or
changing PDF files – are not permitted in PDF/A files, users should
not make any changes on the ‘Security’ tab. Users who wish to
protect their PDF/A files must protect the storage location of these
files. This can be achieved by implementing password protection for
a folder or drive, for example.
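Whether security settings have slipped into a file can be checked roughly from outside Acrobat. A password-protected PDF refers to an encryption dictionary through an ‘/Encrypt’ entry in its trailer, so the sketch below simply searches the raw bytes for that key. This is a crude heuristic with hypothetical function names, not a validation: the byte sequence could in principle occur elsewhere in a file, and only a Preflight check gives a reliable answer.

```python
from pathlib import Path


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: an '/Encrypt' key suggests security settings are set."""
    return b"/Encrypt" in pdf_bytes


def file_looks_encrypted(path: Path) -> bool:
    """Apply the heuristic to a file on disk."""
    return looks_encrypted(path.read_bytes())


# Synthetic trailer fragments, used here only to exercise the check.
protected = b"trailer\n<< /Size 4 /Root 1 0 R /Encrypt 3 0 R >>\n"
unprotected = b"trailer\n<< /Size 3 /Root 1 0 R >>\n"

print(looks_encrypted(protected), looks_encrypted(unprotected))
```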
Adobe PDF in Word 2003: The Conversion Settings are essential tools
for successfully producing PDF/A files. The settings for PDF/A-1b
produce files that are suitable for long-term archiving. PDF
documents in RGB color mode are intended to be displayed on
monitors; those in CMYK mode are primarily intended to be printed.
The conversion setting for PDF/A-1a can be activated by selecting
the relevant checkbox.
Acrobat 7 offered support for preliminary versions of the PDF/A
standard. As of Acrobat 8, full support of the final PDF/A standard
is offered.