Chapter 4: Text Handling
Acrobat supports a number of standard fonts for CJK text. These fonts are supplied
with the Acrobat installation (or the Asian FontPack), and therefore don’t have to be
embedded in the PDF file. These fonts contain all characters required for common
encodings, and support both horizontal and vertical writing modes. The standard fonts and
CMaps are documented in Table 4.5. The Acrobat 4 fonts can also be used with Acrobat 5,
but the corresponding Acrobat 5 fonts will be used for display and printing if a required
font is not installed on the system.
Note Acrobat’s standard CJK fonts do not support bold and italic variations. However, these can be
simulated with the artificial font style feature (see Section 4.6.3, »Text Variations«, page 98).
As can be seen from the table, the default CMaps support most CJK encodings used on
Mac, Windows, and Unix systems, as well as several other vendor-specific encodings. In
particular, the major Japanese encoding schemes Shift-JIS, EUC, ISO 2022, and Unicode
(UCS-2) are supported. Tables with all supported characters are available from Adobe
(see footnote 1); CMap descriptions can be found in Table 4.6.
Note Unicode-capable language bindings must only use UCS2-compatible CMaps. Other CMaps are
not supported.
Horizontal and vertical writing mode. PDFlib supports both horizontal and vertical
writing modes for standard CJK fonts and CMaps. The mode is selected along with the
encoding by choosing the appropriate CMap name. CMaps with names ending in -H
select horizontal writing mode, while the -V suffix selects vertical writing mode.
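The suffix rule can be checked in client code before a font is loaded. The following is only a sketch: cmap_writing_mode( ) and the WritingMode type are hypothetical names introduced for illustration and are not part of PDFlib.

```c
#include <stddef.h>
#include <string.h>

/* Writing modes that a standard CMap name can select. */
typedef enum { WMODE_UNKNOWN, WMODE_HORIZONTAL, WMODE_VERTICAL } WritingMode;

/* Hypothetical helper (not part of PDFlib): derive the writing mode from
 * the last component of a CMap name such as "UniJIS-UCS2-V". The Japanese
 * ISO-2022-JP CMaps are simply named "H" and "V", so a name without any
 * hyphen is compared as a whole. */
WritingMode cmap_writing_mode(const char *cmap)
{
    const char *last;

    if (cmap == NULL || *cmap == '\0')
        return WMODE_UNKNOWN;

    last = strrchr(cmap, '-');               /* final "-X" component */
    last = (last != NULL) ? last + 1 : cmap;

    if (strcmp(last, "H") == 0)
        return WMODE_HORIZONTAL;
    if (strcmp(last, "V") == 0)
        return WMODE_VERTICAL;
    return WMODE_UNKNOWN;
}
```

A name that ends in neither suffix yields WMODE_UNKNOWN, which a caller may want to treat as an error rather than silently defaulting to horizontal mode.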
1. See partners.adobe.com/asn/tech/type/cidfonts.jsp for a wealth of resources related to CID fonts, including tables with
all supported glyphs (search for »character collection«).
Table 4.5 Acrobat’s standard fonts and CMaps (encodings) for Japanese, Chinese, and Korean text
Columns: locale, font name, supported CMaps (encodings). The sample column contained font images and is omitted.
Simplified Chinese
    font names: STSong-Light (1), STSongStd-Light-Acro (2)
    supported CMaps: GB-EUC-H, GB-EUC-V, GBpc-EUC-H, GBpc-EUC-V,
        GBK-EUC-H, GBK-EUC-V, GBKp-EUC-H, GBKp-EUC-V,
        GBK2K-H, GBK2K-V, UniGB-UCS2-H, UniGB-UCS2-V
Traditional Chinese
    font names: MHei-Medium (1), MSung-Light (1), MSungStd-Light-Acro (2)
    supported CMaps: B5pc-H, B5pc-V, HKscs-B5-H, HKscs-B5-V, ETen-B5-H,
        ETen-B5-V, ETenms-B5-H, ETenms-B5-V, CNS-EUC-H,
        CNS-EUC-V, UniCNS-UCS2-H, UniCNS-UCS2-V
Japanese
    font names: HeiseiKakuGo-W5 (1), HeiseiMin-W3 (1), KozMinPro-Regular-Acro (2)
    supported CMaps: 83pv-RKSJ-H, 90ms-RKSJ-H, 90ms-RKSJ-V, 90msp-RKSJ-H,
        90msp-RKSJ-V, 90pv-RKSJ-H, Add-RKSJ-H, Add-RKSJ-V, EUC-H, EUC-V,
        Ext-RKSJ-H, Ext-RKSJ-V, H, V, UniJIS-UCS2-H, UniJIS-UCS2-V,
        UniJIS-UCS2-HW-H, UniJIS-UCS2-HW-V
Korean
    font names: HYGoThic-Medium (1), HYSMyeongJo-Medium (1), HYSMyeongJoStd-Medium-Acro (2)
    supported CMaps: KSC-EUC-H, KSC-EUC-V, KSCms-UHC-H, KSCms-UHC-V,
        KSCms-UHC-HW-H, KSCms-UHC-HW-V, KSCpc-EUC-H, UniKS-UCS2-H, UniKS-UCS2-V
(1) Available in Acrobat 4; Acrobat 5 will substitute these with different fonts.
(2) Available in Acrobat 5 only.
4.7 Chinese, Japanese, and Korean Text 103
Table 4.6 Predefined CMaps for Japanese, Chinese, and Korean text (from the PDF Reference)
Columns: locale, supported CMaps, description.
Simplified Chinese
    UniGB-UCS2-H, UniGB-UCS2-V
        Unicode (UCS-2) encoding for the Adobe-GB1 character collection
    GB-EUC-H, GB-EUC-V
        Microsoft Code Page 936 (charset 134), GB 2312-80 character set, EUC-CN encoding
    GBpc-EUC-H, GBpc-EUC-V
        Macintosh, GB 2312-80 character set, EUC-CN encoding, Script Manager code 2
    GBK-EUC-H, GBK-EUC-V
        Microsoft Code Page 936 (charset 134), GBK character set, GBK encoding
    GBKp-EUC-H (1), GBKp-EUC-V (1)
        Same as GBK-EUC-H, but replaces half-width Latin characters with
        proportional forms and maps code 0x24 to dollar ($) instead of yuan (¥).
    GBK2K-H (1), GBK2K-V (1)
        GB 18030-2000 character set, mixed 1-, 2-, and 4-byte encoding
Traditional Chinese
    UniCNS-UCS2-H, UniCNS-UCS2-V
        Unicode (UCS-2) encoding for the Adobe-CNS1 character collection
    B5pc-H, B5pc-V
        Macintosh, Big Five character set, Big Five encoding, Script Manager code 2
    HKscs-B5-H (1), HKscs-B5-V (1)
        Hong Kong SCS (Supplementary Character Set), an extension to the Big
        Five character set and encoding
    ETen-B5-H, ETen-B5-V
        Microsoft Code Page 950 (charset 136), Big Five character set with ETen extensions
    ETenms-B5-H, ETenms-B5-V
        Same as ETen-B5-H, but replaces half-width Latin characters with proportional forms
    CNS-EUC-H, CNS-EUC-V
        CNS 11643-1992 character set, EUC-TW encoding
Japanese
    UniJIS-UCS2-H, UniJIS-UCS2-V
        Unicode (UCS-2) encoding for the Adobe-Japan1 character collection
    UniJIS-UCS2-HW-H, UniJIS-UCS2-HW-V
        Same as UniJIS-UCS2-H, but replaces proportional Latin characters with half-width forms
    83pv-RKSJ-H
        Macintosh, JIS X 0208 character set with KanjiTalk6 extensions, Shift-JIS
        encoding, Script Manager code 1
    90ms-RKSJ-H, 90ms-RKSJ-V
        Microsoft Code Page 932 (charset 128), JIS X 0208 character set with NEC
        and IBM extensions
    90msp-RKSJ-H, 90msp-RKSJ-V
        Same as 90ms-RKSJ-H, but replaces half-width Latin characters with proportional forms
    90pv-RKSJ-H
        Macintosh, JIS X 0208 character set with KanjiTalk7 extensions, Shift-JIS
        encoding, Script Manager code 1
    Add-RKSJ-H, Add-RKSJ-V
        JIS X 0208 character set with Fujitsu FMR extensions, Shift-JIS encoding
    EUC-H, EUC-V
        JIS X 0208 character set, EUC-JP encoding
    Ext-RKSJ-H, Ext-RKSJ-V
        JIS C 6226 (JIS78) character set with NEC extensions, Shift-JIS encoding
    H, V
        JIS X 0208 character set, ISO-2022-JP encoding
Korean
    UniKS-UCS2-H, UniKS-UCS2-V
        Unicode (UCS-2) encoding for the Adobe-Korea1 character collection
    KSC-EUC-H, KSC-EUC-V
        KS X 1001:1992 character set, EUC-KR encoding
    KSCms-UHC-H, KSCms-UHC-V
        Microsoft Code Page 949 (charset 129), KS X 1001:1992 character set plus
        8822 additional hangul, Unified Hangul Code (UHC) encoding
    KSCms-UHC-HW-H, KSCms-UHC-HW-V
        Same as KSCms-UHC-H, but replaces proportional Latin characters with half-width forms
    KSCpc-EUC-H
        Macintosh, KS X 1001:1992 character set with Mac OS KH extensions, Script
        Manager code 3
(1) Only available for PDF 1.4 / Acrobat 5 and above
Note Some PDFlib functions change their semantics according to the writing mode. For example,
PDF_continue_text( ) should not be used in vertical writing mode, and the character spacing
must be negative in order to spread characters apart in vertical writing mode.
CJK text encoding for standard CMaps. The client is responsible for supplying text
encoded such that it matches the requested CMap. PDFlib does not check whether the
supplied text conforms to the requested CMap.
For multi-byte encodings, the high-order byte of a character must appear first.
Alternatively, the byte ordering and text format can be selected with the textformat
parameter (see Section 4.5.1, »Unicode for Page Descriptions«, page 91) provided a UCS2-based
CMap is used.
Since several of the supported encodings may contain null characters in the text
strings, C developers must not use PDF_show( ) and similar functions, but should
instead use PDF_show2( ) and its relatives, which accept arbitrary binary strings along
with a length parameter. For all other language bindings, the text functions support
binary strings, and PDF_show2( ) etc. are not required.
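The byte-order rule and the null-byte problem can be seen together in a small experiment. The sketch below does not call PDFlib at all; ucs2_be_encode( ) is a hypothetical helper introduced for illustration, and it only handles code points that fit into 16 bits.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PDFlib): write 16-bit code points as
 * UCS-2 with the high-order byte first. Returns the number of bytes
 * written, or 0 if the output buffer is too small. */
size_t ucs2_be_encode(const unsigned short *cp, size_t n,
                      char *out, size_t outsize)
{
    size_t i;

    if (outsize < 2 * n)
        return 0;
    for (i = 0; i < n; i++) {
        out[2 * i]     = (char)((cp[i] >> 8) & 0xFF);   /* high byte first */
        out[2 * i + 1] = (char)(cp[i] & 0xFF);          /* then low byte   */
    }
    return 2 * n;
}
```

Encoding U+4E00, U+500B, U+4EBA this way yields the six bytes 4E 00 50 0B 4E BA. Because the second byte is zero, strlen( ) on that buffer reports 1 instead of 6, which is why a C client has to pass the length returned by the encoder explicitly.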
Restrictions for standard CJK fonts and CMaps. The following features are not supported
for standard CJK fonts in combination with CMaps other than UCS2:
>calculating the extent of text with PDF_stringwidth( ) (but see Section 4.7.4, »Forcing
monospaced Fonts«, page 106)
>box formatting with PDF_show_boxed( ) (doesn’t work with any standard CMap, even
UCS2)
>activating underline/overline/strikeout mode
>retrieving the textx/texty position
These restrictions hold for standard CJK fonts. Note that although the widths of CJK text
cannot be queried in these cases, the width will nevertheless be generated correctly in
the PDF output. Also note the above features are well supported for custom CJK fonts.
Note The UniJIS-UCS2-HW-H/V CMaps are incorrectly treated as monospaced. This will be fixed in a
future release.
Note These restrictions will be lifted in a future release. We intend to offer extended support for
other major CJK encodings including SJIS and EUC.
Standard CJK font example. Standard CJK fonts can be selected with the PDF_load_font( )
interface, supplying the CMap name as the encoding parameter. However, you
must take into account that a given CJK font supports only a certain set of CMaps (see
Table 4.5), and that Unicode-aware language bindings support only UCS2-compatible
CMaps. The KozMinPro-Regular-Acro sample in Table 4.5 can be generated with the
following code:
font = PDF_load_font(p, "KozMinPro-Regular-Acro", 0, "UniJIS-UCS2-H", "");
PDF_setfont(p, font, 24);
PDF_set_text_pos(p, 50, 500);
/* We use UTF-16 format with little-endian (LE) byte ordering */
PDF_set_parameter(p, "textformat", "utf16le");
PDF_show(p, "\xE5\x65\x2C\x67\x9E\x8A");
These statements locate one of the Japanese standard fonts, choosing a Unicode-compatible
CMap (UniJIS-UCS2) and horizontal writing mode (H). The fontname parameter must
be the exact name of the font without any encoding or writing mode suffixes. The
encoding parameter is the name of one of the supported CMaps (the choice depends on
the font) and will also indicate the writing mode (see above). PDFlib supports all of
Acrobat’s default CMaps, and will complain when it detects a mismatch between the
requested font and the CMap. For example, PDFlib will reject a request to use a Korean
font with a Japanese encoding.
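A client that wants to catch such a mismatch before calling PDFlib could keep its own copy of the pairing in Table 4.5. The sketch below covers only the plain UCS-2 CMaps (the ones usable from Unicode-aware bindings), not the half-width HW variants or the legacy encodings. ucs2_cmap_matches_font( ) is a hypothetical helper, not a PDFlib function, and says nothing about how PDFlib performs its own check.

```c
#include <stddef.h>
#include <string.h>

/* Standard CJK font names and the stem of their UCS-2 CMaps,
 * copied from Table 4.5. */
static const struct { const char *font; const char *stem; } cjk_fonts[] = {
    { "STSong-Light",               "UniGB-UCS2"  },
    { "STSongStd-Light-Acro",       "UniGB-UCS2"  },
    { "MHei-Medium",                "UniCNS-UCS2" },
    { "MSung-Light",                "UniCNS-UCS2" },
    { "MSungStd-Light-Acro",        "UniCNS-UCS2" },
    { "HeiseiKakuGo-W5",            "UniJIS-UCS2" },
    { "HeiseiMin-W3",               "UniJIS-UCS2" },
    { "KozMinPro-Regular-Acro",     "UniJIS-UCS2" },
    { "HYGoThic-Medium",            "UniKS-UCS2"  },
    { "HYSMyeongJo-Medium",         "UniKS-UCS2"  },
    { "HYSMyeongJoStd-Medium-Acro", "UniKS-UCS2"  }
};

/* Hypothetical helper (not part of PDFlib): return 1 if cmap is the
 * horizontal or vertical UCS-2 CMap listed for font, 0 otherwise
 * (including unknown fonts and the -HW- variants). */
int ucs2_cmap_matches_font(const char *font, const char *cmap)
{
    size_t i, len;

    for (i = 0; i < sizeof cjk_fonts / sizeof cjk_fonts[0]; i++) {
        if (strcmp(font, cjk_fonts[i].font) != 0)
            continue;
        len = strlen(cjk_fonts[i].stem);
        if (strncmp(cmap, cjk_fonts[i].stem, len) != 0)
            return 0;                     /* CMap belongs to another locale */
        return strcmp(cmap + len, "-H") == 0 ||
               strcmp(cmap + len, "-V") == 0;
    }
    return 0;                             /* font not in the table */
}
```

With this table, a Korean font such as HYGoThic-Medium combined with UniJIS-UCS2-H is rejected, while KozMinPro-Regular-Acro with the same CMap is accepted.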
4.7.3 Custom CJK Fonts
In addition to Acrobat’s standard CJK fonts PDFlib supports custom CJK fonts (fonts
outside the list in Table 4.5) in the TrueType (including TrueType Collections, TTC) and
OpenType formats. A custom CJK font will be processed as follows:
>The font will be converted to a CID font and embedded in the PDF output regardless
of the embedding setting provided by the client. Since PDFlib respects font embedding
restrictions which may be defined in a font, fonts which do not allow embedding
can not be used as custom CJK fonts.
>By default, font subsetting will be applied to all embedded custom CJK fonts; this can
be controlled with various parameters, see Section 4.3, »Font Embedding and Subsetting«,
page 80.
>Proportional Latin characters and half-width characters are fully supported for
custom CJK fonts.
>Japanese host font names can be supplied to PDF_load_font( ) as UTF-8 with initial
BOM, or UCS-2.
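For the UTF-8 variant, the byte order mark is the three-byte sequence EF BB BF placed in front of the name. A minimal sketch of building such a string follows; utf8_with_bom( ) is a hypothetical helper, not a PDFlib function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PDFlib): copy a UTF-8 font name into
 * out, preceded by the UTF-8 byte order mark EF BB BF. Returns the length
 * of the result without the terminating null, or 0 if out is too small. */
size_t utf8_with_bom(const char *name, char *out, size_t outsize)
{
    static const char bom[] = "\xEF\xBB\xBF";
    size_t len = strlen(name);

    if (outsize < 3 + len + 1)
        return 0;
    memcpy(out, bom, 3);
    memcpy(out + 3, name, len + 1);    /* also copies the terminating null */
    return 3 + len;
}
```

The resulting buffer is an ordinary null-terminated C string, since UTF-8 never produces embedded null bytes for non-null characters.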
Note Original Composite Fonts (OCF) and raw PostScript CID fonts are not supported. Windows
EUDC fonts are supported.
Supported encodings for custom CJK fonts. Custom CJK fonts can be used with the
following encodings:
>unicode encoding
>8-bit encodings (although these are unlikely to be useful for CJK text)
>glyphid addressing (see Section 4.4.3, »Glyph ID Addressing for TrueType and
OpenType Fonts«, page 89)
The textformat parameter will be evaluated for custom CJK fonts.
Restrictions for custom CJK fonts. The following features are currently not supported
for custom CJK fonts:
>Encodings other than those listed above can not be used. In particular, the CMaps
listed in Table 4.6 can not be used with custom CJK fonts, but only with the standard
CJK fonts.
>Vertical writing mode is not implemented.
Custom CJK font example. The following example uses the ArialUnicodeMS font to
display some Chinese text. The font must either be installed on the system or must be
configured according to Section 4.3.1, »How PDFlib Searches for Fonts«, page 80:
/* This is not required if the font is installed on the system */
PDF_set_parameter(p, "FontOutline", "Arial Unicode MS=ARIALUNI.TTF");
font = PDF_load_font(p, "Arial Unicode MS", 0, "unicode", "");
PDF_setfont(p, font, 24);
PDF_set_text_pos(p, x, y);
/* We use UTF-16 format with big-endian (BE) byte ordering */
PDF_set_parameter(p, "textformat", "utf16be");
PDF_show2(p, "\x4e\x00\x50\x0b\x4e\xba", 6);
4.7.4 Forcing monospaced Fonts
Some applications are not prepared to deal with proportional CJK fonts, and calculate
the extent of text based on a constant glyph width and the number of glyphs. PDFlib
can be instructed to force monospaced glyphs even for fonts that usually have glyphs
with varying widths. Use the monospace option of PDF_load_font( ) to specify the desired
width for all glyphs. For standard CJK fonts the value 1000 produces pleasing results:
font = PDF_load_font(p, "KozMinPro-Regular-Acro", 0, "UniJIS-UCS2-H", "monospace 1000");
The monospace option is only recommended for standard CJK fonts.
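With a fixed glyph width, an application can compute the extent of a text run itself, as described above. The following sketch assumes that the monospace value is expressed in units of 1/1000 of the font size; monospaced_text_width( ) is a hypothetical helper, not a PDFlib function.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PDFlib): extent of a run of glyphs
 * when every glyph has the same advance width. glyph_width is assumed
 * to be given in units of 1/1000 of the font size. */
double monospaced_text_width(size_t nglyphs, double glyph_width,
                             double fontsize)
{
    return (double)nglyphs * glyph_width / 1000.0 * fontsize;
}
```

Under that assumption, three glyphs loaded with monospace 1000 and set at 24 points occupy 72 points.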
4.8 Placing and Fitting Text
The function PDF_fit_textline( ) for placing a single line of text on a page offers a wealth
of formatting options. The most important options will be discussed in this section
using some common application examples. A complete description of these options can
be found in Table 7.10. Most options for PDF_fit_textline( ) are identical to those of
PDF_fit_image( ). Therefore we will only use text-related examples here; it is recommended
to take a look at the examples in Section 5.3, »Placing Images and Imported PDF Pages«,
page 121, for an introduction.
The examples below demonstrate only the relevant call of the function
PDF_fit_textline( ), assuming that the required font has already been loaded and set in the
desired font size.
PDF_fit_textline( ) uses the so-called text box to determine the positioning of the text:
the width of the text box is identical to the width of the text, and the box height is
identical to the height of capital letters in the font. The text box can be extended to the left
and right or top and bottom using the margin option. The margin will be scaled along
with the text line.
4.8.1 Simple Text Placement
Placing text in the bottom center. We place text at the reference point such that the
text box will be positioned with the center of its bottom line at the reference point (see
Figure 4.6):
PDF_fit_textline(p, text, 297, 0, "position {50 0}");
This code fragment places the text box with the bottom center (position {50 0}) at the
reference point (297, 0).
Placing text in the top right corner. Now we place the text at the reference point such
that the text box will be placed with the upper right corner at the reference point (see
Figure 4.7):
PDF_fit_textline(p, text, 595, 842, "position 100");
Fig. 4.6 Placing text in the bottom center
Fig. 4.7 Placing text in the upper right corner
This code fragment places the text box with the upper right corner (position 100) at the
reference point (595, 842).
Placing text with a margin. To extend the previous example we can add a horizontal
margin to the text to achieve a certain distance to the right. This may be useful for
placing text in table columns:
PDF_fit_textline(p, text, 595, 842, "position 100 margin {20 0}");
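The arithmetic behind the position and margin options in the examples of this section can be sketched as follows. This is an illustration under stated assumptions, not PDFlib's implementation: the margin is assumed to widen the text box by the same amount on both sides, no scaling or rotation is applied, and place_text_box( ) and TextOrigin are hypothetical names.

```c
/* Lower left corner of the text itself (without margins). */
typedef struct { double x, y; } TextOrigin;

/* Hypothetical helper (not part of PDFlib): compute where the lower left
 * corner of the text lands when the text box (text width by capital
 * height, grown by the margins on each side) is attached to the
 * reference point. pos_x and pos_y are percentages: 0 means left or
 * bottom, 50 means center, 100 means right or top. */
TextOrigin place_text_box(double ref_x, double ref_y,
                          double text_w, double cap_h,
                          double pos_x, double pos_y,
                          double margin_x, double margin_y)
{
    TextOrigin o;
    double box_w = text_w + 2.0 * margin_x;   /* box including margins */
    double box_h = cap_h + 2.0 * margin_y;

    /* Shift the box so that its (pos_x, pos_y) point sits on the
     * reference point, then step inward by the margin. */
    o.x = ref_x - pos_x / 100.0 * box_w + margin_x;
    o.y = ref_y - pos_y / 100.0 * box_h + margin_y;
    return o;
}
```

For a text that is 100 units wide with a capital height of 20, position {50 0} at (297, 0) puts the text origin at (247, 0). With position 100 and margin {20 0} at (595, 842), the origin is (475, 822), so the right edge of the text ends 20 units left of the reference point.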
4.8.2 Placing Text in a Box
Placing centered text in a box. We define a box and place the text centered within the
box (see Figure 4.8):
PDF_fit_textline(p, text, 10, 200, "boxsize {500 220} position 50");
This code fragment places the text centered (position 50) in a box with the lower left
corner at (10, 200), 500 units wide and 220 units high (boxsize {500 220}).
Proportionally fitting text to a box. We extend the previous example and fit the text
into the box completely (see Figure 4.9):
PDF_fit_textline(p, text, 10, 200, "boxsize {500 220} position 50 fitmethod meet");
Note that the font size will be changed when text is fit into the box with fitmethod meet.
In order to prevent the text from being scaled up use auto instead of meet.
Completely fitting text to a box. We can further modify the previous example such
that the text will not be fit into the box proportionally, but completely covers the box.
However, this combination will only rarely be used since the text may be distorted (see
Figure 4.10):
PDF_fit_textline(p, text, 10, 200, "boxsize {500 220} position 50 fitmethod entire");
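The difference between the three fit methods used in this section comes down to how the scaling factors are chosen. The sketch below models only what the text states: meet scales proportionally until the text fills the box in one direction, auto does the same but never scales up, and entire scales each direction independently. fit_scale( ), FitScale and FitMethod are hypothetical names, not part of PDFlib.

```c
/* Horizontal and vertical scaling factors applied to the text. */
typedef struct { double sx, sy; } FitScale;

typedef enum { FIT_MEET, FIT_AUTO, FIT_ENTIRE } FitMethod;

/* Hypothetical helper (not part of PDFlib): scaling factors needed to
 * fit a text box of text_w by text_h into a box of box_w by box_h. */
FitScale fit_scale(double text_w, double text_h,
                   double box_w, double box_h, FitMethod method)
{
    FitScale s;
    double fx = box_w / text_w;               /* factor that fills the width  */
    double fy = box_h / text_h;               /* factor that fills the height */
    double uniform = (fx < fy) ? fx : fy;     /* largest proportional factor  */

    if (method == FIT_ENTIRE) {
        s.sx = fx;                            /* may distort the text */
        s.sy = fy;
    } else {
        if (method == FIT_AUTO && uniform > 1.0)
            uniform = 1.0;                    /* auto never enlarges */
        s.sx = uniform;
        s.sy = uniform;
    }
    return s;
}
```

For a text box of 100 by 20 units in the 500 by 220 box of the examples, meet yields a factor of 5 in both directions, auto leaves the text unscaled, and entire stretches it by 5 horizontally and 11 vertically.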
Fig. 4.8 Placing centered text in a box
Fig. 4.9 Proportionally fitting text to a box
Fig. 4.10 Completely fitting text to a box
4.8.3 Aligning Text
Simple alignment. Our next goal is to rotate text such that its original lower left
corner will be placed at a given reference point (see Figure 4.11). This may be useful, for
example, for placing a rotated column heading in a table header:
PDF_fit_textline(p, text, 5, 5, "orientate west");
This code fragment orientates the text to the west (90° counterclockwise) and then
translates the lower left corner of the rotated text to the reference point (5, 5).
Aligning text at a vertical line. Positioning text along a vertical line (i.e., a box with
zero width) is a somewhat extreme case which may be useful nevertheless (see Figure
4.12):
PDF_fit_textline(p, text, 0, 0, "boxsize {0 600} position {0 50} orientate west");
This code fragment rotates the text, and places it at the center of the line from (0, 0) to
(0, 600).
Fig. 4.11 Simple aligning
Fig. 4.12 Aligning text at a vertical line