Chapter 3: PDFlib Programming
3.1 General Programming
It is important to understand that the generated PDF document cannot be finished
after an exception has occurred. The only method which can safely be called after an
exception is PDF_delete( ). In the C language binding PDF_get_apiname( ), PDF_get_errnum( ),
and PDF_get_errmsg( ) may also be called. Calling any other PDFlib method after an
exception may lead to unexpected results. The exception (or the data passed to the C error
handler) will contain the following information:
> A unique error number (see Table 3.2);
> The name of the PDFlib API function which caused the exception;
> A descriptive text containing details of the problem.
C language clients can fetch this information using dedicated functions (PDF_get_
errnum( ), PDF_get_apiname( ), and PDF_get_errmsg( )), while in other languages it will be
part of the exception object.
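Since the error number alone already identifies the library layer that raised the exception, a client-side error handler can use it for a first triage. The following is a minimal sketch of such a mapping based on the ranges in Table 3.2; classify_errnum( ) and origin_name( ) are hypothetical helper names and not part of the PDFlib API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers (not PDFlib functions): map an error number to the
 * library named for its range in Table 3.2. */
typedef enum {
    ERR_ORIGIN_UNKNOWN,
    ERR_ORIGIN_PDCORE,    /* 1000 - 1999 */
    ERR_ORIGIN_PDFLIB,    /* 2000 - 2999 */
    ERR_ORIGIN_RESERVED,  /* 3000 - 3999 */
    ERR_ORIGIN_PDI        /* 4000 - 4999 */
} err_origin;

err_origin classify_errnum(int errnum)
{
    if (errnum >= 1000 && errnum <= 1999) return ERR_ORIGIN_PDCORE;
    if (errnum >= 2000 && errnum <= 2999) return ERR_ORIGIN_PDFLIB;
    if (errnum >= 3000 && errnum <= 3999) return ERR_ORIGIN_RESERVED;
    if (errnum >= 4000 && errnum <= 4999) return ERR_ORIGIN_PDI;
    return ERR_ORIGIN_UNKNOWN;
}

/* Returns a printable label for logging. */
const char *origin_name(err_origin origin)
{
    static const char *const names[] = {
        "unknown", "PDCORE library", "PDFlib library",
        "reserved", "PDF import library (PDI)"
    };
    assert((size_t)origin < sizeof names / sizeof names[0]);
    return names[origin];
}
```

In a C error handler the value returned by PDF_get_errnum( ) could be passed to classify_errnum( ), for example to log PDI errors (corrupt input documents) separately from configuration problems.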
Disabling exceptions. Some exceptions can be disabled. These fall into two categories:
non-fatal errors (warnings) and errors which may or may not justify an exception de-
pending on client preferences.
Warnings generally indicate some problem in your PDFlib code which you should in-
vestigate more closely. However, processing may continue in case of non-fatal errors.
For this reason, you can suppress warnings using the following function call:
PDF_set_parameter(p, "warning", "false");
The suggested strategy is to enable warnings during the development cycle (and closely
examine possible warnings), and disable warnings in a production system.
Certain operations may be considered fatal for some clients, while others are pre-
pared to deal with the situation. In these cases the behavior of the respective PDFlib API
function changes according to a parameter. This distinction is implemented for loading
fonts, images, PDF import documents, and ICC profiles. For example, if a font cannot be
loaded due to some configuration problem one client may simply give up, while anoth-
er may choose another font instead. When the parameter fontwarning is set to true, an
exception will be thrown when the font cannot be loaded. Otherwise the function will
return an error code instead. The parameter can be set as follows:
PDF_set_parameter(p, "fontwarning", "false");
Table 3.2 Ranges of PDFlib exception numbers
error ranges   reasons
1000 – 1999    (PDCORE library): memory, I/O, arguments, parameters/values, options
2000 – 2999    (PDFlib library): configuration, scoping, graphics and text, color, images, fonts, encodings, hypertext, PDF/X
3000 – 3999    (reserved)
4000 – 4999    (PDF import library PDI): configuration and parameter, corrupt PDF (file, object, or stream level)
3.1.4 Option Lists
Option lists are a powerful yet easy method to control PDFlib operations. Instead of
requiring a multitude of function parameters, many PDFlib API methods support option
lists, or optlists for short. These are strings which may contain an arbitrary number of
options. Optlists support various data types and composite data like arrays. In most
languages optlists can easily be constructed by concatenating the required keywords and
values. C programmers may want to use the sprintf( ) function in order to construct
optlists. An optlist is a string containing one or more pairs of the form
name value(s)
Names and values, as well as multiple name/value pairs can be separated by arbitrary
whitespace characters (space, tab, carriage return, newline) and/or an equal sign ’=’.
Simple values may use any of the following data types:
> Boolean: true or false; if the value of a boolean option is omitted the value true is as-
sumed. As a shorthand notation noname can be used instead of name false.
> String: strings containing whitespace must be bracketed with { and }. An empty
string can be constructed with { }. The characters {, }, and \ must be preceded by an
additional \ character if they are supposed to be part of the string.
> Keyword: one of a predefined list of fixed keywords
> Float and integer: decimal floating point or integer numbers; point and comma can
be used as decimal separators.
> Handle: several PDFlib-internal object handles, e.g., font handles, image handles.
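The quoting rules for string values are easy to get wrong when the string comes from user input (e.g., a password). The following sketch shows one way to apply them in C. optlist_quote( ) is a hypothetical helper, not a PDFlib function; it always brackets the value, which is only required when the string contains whitespace:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes src as a bracketed optlist string value
 * into dst. The characters '{', '}' and '\' are preceded by an extra '\'.
 * Returns 0 on success, -1 if dst is too small. */
int optlist_quote(char *dst, size_t dstsize, const char *src)
{
    size_t n = 0;

    assert(dst != NULL && src != NULL);

    /* Quick lower bound: both braces plus the terminating NUL. */
    if (strlen(src) + 3 > dstsize)
        return -1;

    dst[n++] = '{';
    for (; *src != '\0'; src++) {
        int special = (*src == '{' || *src == '}' || *src == '\\');

        /* Room for this character (plus escape), closing brace and NUL. */
        if (n + (special ? 2 : 1) + 2 > dstsize)
            return -1;
        if (special)
            dst[n++] = '\\';
        dst[n++] = *src;
    }
    dst[n++] = '}';
    dst[n] = '\0';
    return 0;
}
```

The result can be placed directly after an option name, for example after the password option of PDF_open_pdi( ).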
Options can have list values, i.e., a group of multiple simple values. Lists are bracketed
with { and }, for example
dasharray {11 22 33}
Depending on the type and interpretation of an option additional restrictions may ap-
ply. For example, integer or float options may be restricted to a certain range of values;
handles must be valid for the corresponding type of object, etc. Conditions for options
are documented in their respective descriptions.
The following lines contain examples of option lists for various PDFlib functions
which support option lists (see Chapter 7 for details on available options):
PDF_fit_image( ):
boxsize {500 600} position 50 fitmethod nofit scale .5
PDF_load_font( ):
embedding=true subsetting=true subsetlimit=50 kerning=false
PDF_load_font( ):
embedding subsetting subsetlimit=50 nokerning
PDF_open_pdi( ):
password {secret string}
PDF_create_gstate( ):
linewidth 0.5 blendmode overlay opacityfill 0.75
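As mentioned above, C programmers will usually assemble optlists with the sprintf( ) family. The following sketch uses snprintf( ) so that an overlong optlist is detected instead of overflowing the buffer. The helper names optlist_add( ) and optlist_add_int( ) are assumptions made for this example and are not part of PDFlib:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends "name=value" to the optlist in buf,
 * separated from earlier options by a space. buf must hold a valid
 * (possibly empty) string. Returns 0 on success; on overflow returns -1
 * and leaves the previous contents of buf unchanged. */
int optlist_add(char *buf, size_t bufsize, const char *name, const char *value)
{
    size_t used;
    int n;

    assert(buf != NULL && name != NULL && value != NULL);
    used = strlen(buf);
    assert(used < bufsize);

    n = snprintf(buf + used, bufsize - used, "%s%s=%s",
                 used > 0 ? " " : "", name, value);
    if (n < 0 || (size_t)n >= bufsize - used) {
        buf[used] = '\0';   /* undo the partial append */
        return -1;
    }
    return 0;
}

/* Same as optlist_add() for integer values. */
int optlist_add_int(char *buf, size_t bufsize, const char *name, int value)
{
    char number[32];

    snprintf(number, sizeof number, "%d", value);
    return optlist_add(buf, bufsize, name, number);
}
```

Starting with an empty string and adding embedding, subsetlimit, and kerning in turn yields an optlist equivalent to the PDF_load_font( ) examples above, which can then be passed to the function as its optlist argument.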
3.1.5 The PDFlib Virtual File System (PVF)
In addition to disk files a facility called PDFlib Virtual File System (PVF) allows clients to di-
rectly supply data in memory without any disk files involved. This offers performance
benefits and can be used for data fetched from a database which does not even exist on
an isolated disk file, as well as other situations where the client already has the required
data available in memory as a result of some processing.
PVF is based on the concept of named virtual read-only files which can be used just
like regular file names with any API function. They can even be used in UPR configura-
tion files. Virtual file names can be generated in an arbitrary way by the client. Obvious-
ly, virtual file names must be chosen such that name clashes with regular disk files are
avoided. For this reason a hierarchical naming convention for virtual file names is rec-
ommended as follows (filename refers to a name chosen by the client which is unique in
the respective category). It is also recommended to keep standard file name suffixes:
> Raster image files: /pvf/image/filename
> Font outline and metrics files (it is recommended to use the actual font name as the
base portion of the file name): /pvf/font/filename
> ICC profiles: /pvf/iccprofile/filename
> Encodings and codepages: /pvf/codepage/filename
> PDF documents: /pvf/pdf/filename
When searching for a named file PDFlib will first check whether the supplied file name
refers to a known virtual file, and then try to open the named file on disk.
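A client which creates many virtual files may want to generate the names in one place so that the recommended hierarchy is applied consistently. The following sketch shows one possible approach; pvf_make_name( ) and the pvf_category type are hypothetical, and rejecting file names which contain a slash is an extra safety assumption of this example, not a PDFlib requirement:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One constant per category of the recommended naming convention. */
typedef enum {
    PVF_IMAGE, PVF_FONT, PVF_ICCPROFILE, PVF_CODEPAGE, PVF_PDF
} pvf_category;

/* Hypothetical helper: builds "/pvf/<category>/<filename>" in buf.
 * Returns 0 on success, -1 if the name is rejected or does not fit. */
int pvf_make_name(char *buf, size_t bufsize, pvf_category cat,
                  const char *filename)
{
    static const char *const dirs[] = {
        "image", "font", "iccprofile", "codepage", "pdf"
    };
    int n;

    assert(buf != NULL && filename != NULL && bufsize > 0);
    assert((size_t)cat < sizeof dirs / sizeof dirs[0]);

    /* Assumption of this sketch: keep names flat within a category. */
    if (strchr(filename, '/') != NULL)
        return -1;

    n = snprintf(buf, bufsize, "/pvf/%s/%s", dirs[cat], filename);
    return (n < 0 || (size_t)n >= bufsize) ? -1 : 0;
}
```

The resulting name would then be used both when creating the virtual file with PDF_create_pvf( ) and wherever a file name is expected by a PDFlib function.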
Lifetime of virtual files. Some functions will immediately consume the data supplied
in a virtual file, while others will read only parts of the file, with other fragments being
used at a later point in time. For this reason close attention must be paid to the lifetime
of virtual files. PDFlib will place an internal lock on every virtual file, and remove the
lock only when the contents are no longer needed. Unless the client requested PDFlib to
make an immediate copy of the data (using the copy option in PDF_create_pvf( )), the vir-
tual file’s contents must only be modified, deleted, or freed by the client when it is no
longer locked by PDFlib. PDFlib will automatically delete all virtual files in PDF_delete( ).
However, the actual file contents (the data comprising a virtual file) must always be
freed by the client.
Different strategies. PVF supports different approaches with respect to managing the
memory required for virtual files. These are governed by the fact that PDFlib may need
access to a virtual file’s contents after the API call which accepted the virtual file name,
but never needs access to the contents after PDF_close( ). Remember that calling PDF_
delete_pvf( ) does not free the actual file contents (unless the copy option has been sup-
plied), but only the corresponding data structures used for PVF file name administra-
tion. This gives rise to the following strategies:
> Minimize memory usage: it is recommended to call PDF_delete_pvf( ) immediately af-
ter the API call which accepted the virtual file name, and another time after PDF_
close( ). The second call is required because PDFlib may still need access to the data so
that the first call refuses to unlock the virtual file. However, in some cases the first
call will already free the data, and the second call doesn’t do any harm. The client
may free the file contents only when PDF_delete_pvf( ) succeeded.
> Optimize performance by reusing virtual files: some clients may wish to reuse some
data (e.g., font definitions) within various output documents, and avoid multiple
create/delete cycles for the same file contents. In this case it is recommended not to
call PDF_delete_pvf( ) as long as more PDF output documents using the virtual file
will be generated.
> Lazy programming: if memory usage is not a concern the client may elect not to call
PDF_delete_pvf( ) at all. In this case PDFlib will internally delete all pending virtual
files in PDF_delete( ).
In all cases the client may free the corresponding data only when PDF_delete_pvf( ) re-
turned successfully, or after PDF_delete( ).
3.1.6 Resource Configuration and File Searching
In most advanced applications PDFlib needs access to resources such as font files,
encoding definitions, ICC color profiles, etc. In order to make PDFlib’s resource handling
platform-independent and customizable, a configuration file can be supplied for describing
the available resources along with the names of their corresponding disk files. In addi-
tion to a static configuration file, dynamic configuration can be accomplished at run-
time by adding resources with PDF_set_parameter( ). For the configuration file we dug
out a simple text format called Unix PostScript Resource (UPR) which came to life in the
era of Display PostScript, and is still in use on several systems. However, we extended
the original UPR format for our purposes. The UPR file format as used by PDFlib will be
described below. There is a utility called makepsres (often distributed as part of the X
Window System) which can be used to automatically generate UPR files from PostScript
font outline and metrics files.
Resource categories. The resource categories supported by PDFlib are listed in Table
3.3. Other resource categories may be present in the UPR file for compatibility with Dis-
play PostScript installations, but they will silently be ignored.
Redundant resource entries should be avoided. For example, do not include multiple
entries for a certain font’s metrics data. Also, the font name as configured in the UPR file
should exactly match the actual font name in order to avoid confusion (although
PDFlib does not enforce this restriction).
In Mac OS Classic the colon character ’:’ must be used as a directory separator. The
font names of resource-based PostScript Type 1 fonts (LWFN fonts) must be specified us-
ing the full path including volume name, for example:
Foo-Italic=Classic:Data:Fonts:FooIta
The UPR file format. UPR files are text files with a very simple structure that can easily
be written in a text editor or generated automatically. To start with, let’s take a look at
some syntactical issues:
> Lines can have a maximum of 255 characters.
> A backslash ’\’ escapes newline characters. This may be used to extend lines.
> An isolated period character ’ . ’ serves as a section terminator.
> All entries are case-sensitive.
> Comment lines may be introduced with a percent ’%’ character, and terminated by
the end of the line.
> Whitespace is ignored everywhere except in resource names and file names.
Table 3.3 Resource categories supported in PDFlib
resource category name   explanation
SearchPath               relative or absolute path name of directories containing data files
FontAFM                  PostScript font metrics file in AFM format
FontPFM                  PostScript font metrics file in PFM format
FontOutline              PostScript, TrueType or OpenType font outline file
Encoding                 text file containing an 8-bit encoding or code page table
HostFont                 name of a font installed on the system (1)
ICCProfile               name of an ICC color profile (1)
StandardOutputIntent     name of a standard output condition for PDF/X
(1) Resources in this category do not necessarily require any value.
UPR files consist of the following components:
> A magic line for identifying the file. It has the following form:
PS-Resources-1.0
> A section listing all resource categories described in the file. Each line describes one
resource category. The list is terminated by a line with a single period character.
Available resource categories are described below.
> A section for each of the resource categories listed at the beginning of the file. Each
section starts with a line showing the resource category, followed by an arbitrary
number of lines describing available resources. The list is terminated by a line with a
single period character. Each resource data line contains the name of the resource
(equal signs have to be quoted). If the resource requires a file name, this name has to
be added after an equal sign. The SearchPath (see below) will be applied when PDFlib
searches for files listed in resource entries.
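To illustrate the structure described above, the following sketch looks up one resource entry in UPR text held in memory. It is not PDFlib’s own parser: upr_lookup( ) and its helpers are hypothetical, and the sketch deliberately ignores backslash line continuation, quoted equal signs, and whitespace around the equal sign.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies the next line (without newline) into line and advances *cursor.
 * Returns 0 when the input is exhausted. Lines longer than the buffer
 * are truncated; UPR lines are limited to 255 characters anyway. */
static int next_line(const char **cursor, char *line, size_t size)
{
    const char *s = *cursor;
    size_t len;

    if (*s == '\0')
        return 0;
    len = strcspn(s, "\n");
    snprintf(line, size, "%.*s", (int)len, s);
    *cursor = s + len + (s[len] == '\n');
    return 1;
}

/* Strips leading and trailing blanks, tabs and carriage returns in place. */
static char *trim(char *s)
{
    char *end;

    while (*s == ' ' || *s == '\t' || *s == '\r')
        s++;
    end = s + strlen(s);
    while (end > s && (end[-1] == ' ' || end[-1] == '\t' || end[-1] == '\r'))
        end--;
    *end = '\0';
    return s;
}

/* Returns 1 and stores the file name (possibly empty) in value if the
 * resource is found in the given category, 0 if it is not found, and
 * -1 if the magic line is missing. */
int upr_lookup(const char *text, const char *category, const char *resource,
               char *value, size_t valsize)
{
    char raw[256];
    int header_done = 0;      /* category list has been terminated */
    int section_start = 0;    /* next line names a resource section */
    int in_section = 0;       /* inside the requested category */
    size_t namelen;

    assert(text != NULL && category != NULL && resource != NULL && value != NULL);
    namelen = strlen(resource);

    if (!next_line(&text, raw, sizeof raw) ||
        strcmp(trim(raw), "PS-Resources-1.0") != 0)
        return -1;

    while (next_line(&text, raw, sizeof raw)) {
        char *line = trim(raw);

        if (*line == '\0' || *line == '%')      /* blank line or comment */
            continue;
        if (strcmp(line, ".") == 0) {           /* section terminator */
            header_done = 1;
            section_start = 1;
            in_section = 0;
            continue;
        }
        if (!header_done)                       /* entry of the category list */
            continue;
        if (section_start) {                    /* section heading */
            in_section = (strcmp(line, category) == 0);
            section_start = 0;
            continue;
        }
        if (in_section && strncmp(line, resource, namelen) == 0 &&
            (line[namelen] == '=' || line[namelen] == '\0')) {
            snprintf(value, valsize, "%s",
                     line[namelen] == '=' ? line + namelen + 1 : "");
            return 1;
        }
    }
    return 0;
}
```

The returned file name is exactly what appears after the equal sign; applying the SearchPath entries to it is a separate step.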
File searching and the SearchPath resource category. PDFlib reads a variety of data
items, such as raster images, font outline and metrics information, encoding defini-
tions, PDF documents, and ICC color profiles from disk files. In addition to relative or ab-
solute path names you can also use file names without any path specification. The
SearchPath resource category can be used to specify a list of path names for directories
containing the required data files. When PDFlib must open a file it will first use the file
name exactly as supplied and try to open the file. If this attempt fails PDFlib will try to
open the file in the directories specified in the SearchPath resource category one after
another until it succeeds. SearchPath entries can be accumulated, and will be searched in
reverse order (paths set at a later point in time will be searched before earlier ones). This
feature can be used to separate the PDFlib application from platform-specific file sys-
tem schemes. In order to disable the search you can use a fully specified path name in
the PDFlib functions.
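The search order can be made explicit with a small sketch which lists the candidate names in the order in which they would be tried: first the name as supplied, then the SearchPath entries, most recently added first. search_candidates( ) is a hypothetical helper written for this illustration; it always joins directory and file name with a slash, which would not be appropriate on systems with a different directory separator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CAND_LEN 256   /* maximum length of one candidate name */

/* Hypothetical helper: fills out[] with the file names to try, in order.
 * paths[] holds the SearchPath entries in the order they were added.
 * Returns the number of candidates written (at most maxout). */
size_t search_candidates(const char *name,
                         const char *const paths[], size_t npaths,
                         char out[][CAND_LEN], size_t maxout)
{
    size_t count = 0, i;

    assert(name != NULL && out != NULL && (paths != NULL || npaths == 0));
    if (maxout == 0)
        return 0;

    /* 1. The file name exactly as supplied. */
    snprintf(out[count++], CAND_LEN, "%s", name);

    /* 2. SearchPath entries in reverse order of registration. */
    for (i = npaths; i > 0 && count < maxout; i--) {
        const char *dir = paths[i - 1];
        size_t len = strlen(dir);
        const char *sep = (len > 0 && dir[len - 1] == '/') ? "" : "/";

        snprintf(out[count++], CAND_LEN, "%s%s%s", dir, sep, name);
    }
    return count;
}
```

A client could walk through the returned list and open the first name which exists, mirroring the lookup described above.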
On Windows PDFlib will initialize the SearchPath resource category with an entry
read from the following registry entry:
HKLM\SOFTWARE\PDFlib\PDFlib\5.0.1\SearchPath
This registry entry may contain a list of path names separated by a semicolon ’;’ char-
acter.
On IBM iSeries the SearchPath resource category will be initialized with the following
values:
/pdflib/5.0.1/fonts
/pdflib/5.0.1/bind/data
Sample UPR file. The following listing gives an example of a UPR configuration file as
used by PDFlib. It describes some font metrics and outline files plus a custom encoding:
PS-Resources-1.0
SearchPath
FontAFM
FontPFM
FontOutline
Encoding
ICCProfile
.
SearchPath
/usr/local/lib/fonts
Classic:Data:Fonts
C:/psfonts/pfm
C:/psfonts
/users/kurt/my_images
.
FontAFM
Code-128=Code_128.afm
.
FontPFM
Foobar-Bold=foobb___.pfm
Mistral=c:/psfonts/pfm/mist____.pfm
.
FontOutline
Code-128=Code_128.pfa
ArialMT=Arial.ttf
.
Encoding
myencoding=myencoding.enc
.
ICCProfile
highspeedprinter=cmykhighspeed.icc
.
Searching for the UPR resource file. If only the built-in resources (e.g., PDF core fonts,
built-in encodings, sRGB ICC profile) or system resources (host fonts) are to be used, a
UPR configuration file is not required, since PDFlib will find all necessary resources
without any additional configuration.
If other resources are to be used you can specify such resources via calls to PDF_set_
parameter( ) (see below) or in a UPR resource file. PDFlib reads this file automatically
when the first resource is requested. The detailed process is as follows:
> If the environment variable PDFLIBRESOURCE is defined PDFlib takes its value as the
name of the UPR file to be read. If this file cannot be read an exception will be
thrown.
> If the environment variable PDFLIBRESOURCE is not defined PDFlib tries to open a file
with the following name:
upr (on MVS; a dataset is expected)
pdflib/<version>/fonts/pdflib.upr (on IBM eServer iSeries)
pdflib.upr (Windows, Unix, and all other systems)
If this file cannot be read no exception will be thrown.
> On Windows PDFlib will additionally try to read the registry entry
HKLM\SOFTWARE\PDFlib\PDFlib\5.0.1\resourcefile
The value of this entry (which will be created by the PDFlib installer, but can also be
created by other means) will be taken as the name of the resource file to be used. If
this file cannot be read an exception will be thrown.
> The client can force PDFlib to read a resource file at runtime by explicitly setting the
resourcefile parameter:
PDF_set_parameter(p, "resourcefile", "/path/to/pdflib.upr");
This call can be repeated arbitrarily often; the resource entries will be accumulated.
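The first two steps of this process differ in an important detail: a file named via PDFLIBRESOURCE must be readable, while the default file may silently be missing. The following sketch captures this distinction for the »Windows, Unix, and all other systems« case only. It does not cover MVS, iSeries, or the Windows registry entry, and upr_choose( ) as well as the upr_choice structure are hypothetical names introduced for this example:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    const char *name;   /* UPR file name to try */
    int required;       /* 1: an unreadable file is an error; 0: it is skipped */
} upr_choice;

/* Hypothetical helper: env_value is the value of PDFLIBRESOURCE,
 * or NULL if the variable is not defined. */
upr_choice upr_choose(const char *env_value)
{
    upr_choice choice;

    if (env_value != NULL) {
        choice.name = env_value;      /* explicitly configured file */
        choice.required = 1;
    } else {
        choice.name = "pdflib.upr";   /* default name, may be absent */
        choice.required = 0;
    }
    assert(choice.name != NULL);
    return choice;
}

/* Convenience wrapper which reads the environment variable itself. */
upr_choice upr_from_environment(void)
{
    return upr_choose(getenv("PDFLIBRESOURCE"));
}
```

Separating upr_choose( ) from the getenv( ) call keeps the decision testable without modifying the process environment.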
Configuring resources at runtime. In addition to using a UPR file for the configuration,
it is also possible to directly configure individual resources within the source code via
the PDF_set_parameter( ) function. This function takes a category name and a corre-
sponding resource entry as it would appear in the respective section of this category in
a UPR resource file, for example:
PDF_set_parameter(p, "FontAFM", "Foobar-Bold=foobb___.afm");
PDF_set_parameter(p, "FontOutline", "Foobar-Bold=foobb___.pfa");
3.1.7 Generating PDF Documents in Memory
In addition to generating PDF documents on a file, PDFlib can also be instructed to gen-
erate the PDF directly in memory (in-core). This technique offers performance benefits
since no disk-based I/O is involved, and the PDF document can, for example, directly be
streamed via HTTP. Webmasters will be especially happy to hear that their server will
not be cluttered with temporary PDF files. Unix users can write the generated PDF to the
stdout channel and consume it in a pipe process by supplying »-« as filename for
PDF_open_file( ).
You may, at your option, periodically collect partial data (e.g., every time a page has
been finished), or fetch the complete PDF document in one big chunk at the end (after
PDF_close( )). Interleaving production and consumption of the PDF data has several ad-
vantages. Firstly, since not all data must be kept in memory, the memory requirements
are reduced. Secondly, such a scheme can boost performance since the first chunk of
data can be transmitted over a slow link while the next chunk is still being generated.
However, the total length of the generated data will only be known when the complete
document is finished.
The active in-core PDF generation interface. In order to generate PDF data in memory,
simply supply an empty filename to PDF_open_file( ), and retrieve the data with PDF_
get_buffer( ):
PDF_open_file(p, "");
... create document ...
PDF_close(p);
buf = PDF_get_buffer(p, &size);
... use the PDF data contained in the buffer ...
PDF_delete(p);
Note The PDF data in the buffer must be treated as binary data.
This is considered »active« mode since the client decides when to fetch the
buffer contents. Active mode is available for all supported language bindings.
Note C and C++ clients must not free the returned buffer.
The passive in-core PDF generation interface. In »passive« mode, which is only avail-
able in the C and C++ language bindings, the user installs (via PDF_open_mem( )) a call-
back function which will be called at unpredictable times by PDFlib whenever PDF data
is waiting to be consumed. Timing and buffer size constraints related to flushing (trans-
ferring the PDF data from the library to the client) can be configured by the client in or-
der to provide for maximum flexibility. Depending on the environment, it may be ad-
vantageous to fetch the complete PDF document at once, in multiple chunks, or in
many small segments in order to prevent PDFlib from increasing the internal docu-
ment buffer. The flushing strategy can be set using PDF_set_parameter( ) and the flush
parameter values detailed in Table 3.4.
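Whichever flushing strategy is chosen, the client needs a place to put the chunks it receives. The following sketch shows a simple growable buffer which collects chunks of binary data. The names pdf_sink, sink_write( ), and sink_free( ) are hypothetical, and the signature of sink_write( ) is chosen for this example only; a thin adapter with the callback signature expected by PDF_open_mem( ) would be needed to use it in passive mode.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Collects all chunks of one PDF document in a single memory block. */
typedef struct {
    unsigned char *data;
    size_t size;       /* number of bytes stored so far */
    size_t capacity;   /* number of bytes allocated */
} pdf_sink;

/* Appends one chunk. The data is treated as binary, so the length is
 * always passed explicitly and never derived with strlen().
 * Returns the number of bytes consumed, or 0 if memory is exhausted. */
size_t sink_write(pdf_sink *sink, const void *chunk, size_t size)
{
    assert(sink != NULL && (chunk != NULL || size == 0));

    if (sink->size + size > sink->capacity) {
        size_t newcap = sink->capacity ? sink->capacity : 4096;
        unsigned char *p;

        while (newcap < sink->size + size)
            newcap *= 2;                 /* grow geometrically */
        p = realloc(sink->data, newcap);
        if (p == NULL)
            return 0;
        sink->data = p;
        sink->capacity = newcap;
    }
    if (size > 0)
        memcpy(sink->data + sink->size, chunk, size);
    sink->size += size;
    return size;
}

/* Releases the collected data and resets the sink for reuse. */
void sink_free(pdf_sink *sink)
{
    assert(sink != NULL);
    free(sink->data);
    sink->data = NULL;
    sink->size = 0;
    sink->capacity = 0;
}
```

A client which forwards each chunk immediately (e.g., over a network connection) would replace the memcpy( ) step with its own output routine and would not need the buffer to grow at all.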
3.1.8 Using PDFlib on EBCDIC-based Platforms
The operators and structure elements in the PDF file format are based on ASCII, making
it difficult to mix text output and PDF operators on EBCDIC-based platforms such as
IBM eServer iSeries 400 and zSeries S/390. However, a special mainframe version of
PDFlib has been carefully crafted in order to allow mixing of ASCII-based PDF operators
and EBCDIC (or other) text output. The EBCDIC-safe version of PDFlib is available for
various operating systems and machine architectures.
In order to leverage PDFlib’s features on EBCDIC-based platforms the following items
are expected to be supplied in EBCDIC text format (more specifically, in code page 037
on iSeries, and code page 1047 on zSeries):
> PFA font files, UPR configuration files, AFM font metrics files
> encoding and code page files
> string parameters to PDFlib functions
> input and output file names
> environment variables (if supported by the runtime environment)
> PDFlib error messages will also be generated in EBCDIC format (except in Java).
If you prefer to use input text files (PFA, UPR, AFM, encodings) in ASCII format you can
set the asciifile parameter to true (default is false). PDFlib will then expect these files in
ASCII encoding. String parameters will still be expected in EBCDIC encoding, however.
In contrast, the following items must always be treated in binary mode (i.e., any con-
version must be avoided):
> PDF input and output files
> PFB font outline and PFM font metrics files
> TrueType and OpenType font files
> image files and ICC profiles
Table 3.4 Controlling PDFlib’s flushing strategy with the flush parameter
flush parameter   flushing strategy                                               benefits
none              flush only once at the end of the document                      complete PDF document can be fetched by the client in one chunk
page              flush at the end of each page                                   generating and fetching pages can be nicely interleaved
content           flush after all fonts, images, file attachments, and pages      even better interleaving, since large items won’t clog the buffer
heavy             always flush when the internal 64 KB document buffer is full    PDFlib’s internal buffer will never grow beyond a fixed size