Chapter 2: PDFlib Language Bindings
More than one way of string handling. Depending on the requirements of your application, you can work with UTF-8, UTF-16, or legacy encodings. The following code snippets demonstrate all three variants. All examples create the same Japanese output (the three characters 日本語, U+65E5 U+672C U+8A9E) but accept the string input in different formats.
The first example works with Unicode UTF-8 and uses the Unicode::String module (which ships with some Perl distributions and is available from CPAN). Since Perl works with UTF-8 internally, no explicit UTF-8 conversion is required:
use Unicode::String qw(utf8 utf16 uhex);
...
PDF_set_parameter($p, "textformat", "utf8");
$font = PDF_load_font($p, "Arial Unicode MS", "unicode", "");
PDF_setfont($p, $font, 24.0);
PDF_set_text_pos($p, 50, 700);
PDF_show($p, uhex("U+65E5 U+672C U+8A9E"));
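If Unicode::String is not installed, the same UTF-8 byte string can be produced with the core Encode module. The sketch below does not call PDFlib; it only builds the UTF-8 form of the three ideographs so you can see which bytes a UTF-8 text string for this example consists of.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The three ideographs from the example, written as Unicode code points.
my $text = "\x{65E5}\x{672C}\x{8A9E}";

# Encode the Perl character string into UTF-8 octets (3 bytes per character here).
my $utf8 = encode("UTF-8", $text);

print unpack("H*", $utf8), "\n";   # prints e697a5e69cace8aa9e
```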
The second example works with Unicode UTF-16 and little-endian byte order:
PDF_set_parameter($p, "textformat", "utf16le");
$font = PDF_load_font($p, "Arial Unicode MS", "unicode", "");
PDF_setfont($p, $font, 24.0);
PDF_set_text_pos($p, 50, 700);
PDF_show($p, "\xE5\x65\x2C\x67\x9E\x8A");
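Rather than typing the UTF-16LE bytes by hand, you can let Encode generate them. This is a minimal sketch that does not involve PDFlib; it only confirms that the literal in the example is the little-endian UTF-16 form (without a byte order mark) of the same three characters.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "\x{65E5}\x{672C}\x{8A9E}";

# UTF-16LE: each BMP character becomes two bytes, low byte first, no BOM.
my $utf16le = encode("UTF-16LE", $text);

print unpack("H*", $utf16le), "\n";   # prints e5652c679e8a
```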
The third example works with Shift-JIS. Except on Windows systems, it requires access to the 90ms-RKSJ-H CMap for string conversion:
PDF_set_parameter($p, "SearchPath", "../../../resource/cmap");
$font = PDF_load_font($p, "Arial Unicode MS", "cp932", "");
PDF_setfont($p, $font, 24.0);
PDF_set_text_pos($p, 50, 700);
PDF_show($p, "\x93\xFA\x96\x7B\x8C\xEA");
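The Shift-JIS bytes in the example can likewise be derived with Encode, which supports the "cp932" encoding (Microsoft's Shift-JIS variant) through its bundled Encode::JP module. This sketch only shows where the byte values come from and checks the round trip; it does not replace the CMap that PDFlib itself needs for the conversion.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text = "\x{65E5}\x{672C}\x{8A9E}";

# Unicode characters -> cp932 (Shift-JIS) bytes, two bytes per ideograph.
my $sjis = encode("cp932", $text);

# Decoding the bytes again must give back the original characters.
my $back = decode("cp932", $sjis);

print unpack("H*", $sjis), "\n";   # prints 93fa967b8cea
```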
Unicode and legacy encoding conversion. For the convenience of PDFlib users, we list some useful string conversion methods here. Please refer to the Perl documentation for more details. The following string literal creates a Unicode string from hexadecimal code points:
$logos = "\x{039b}\x{03bf}\x{03b3}\x{03bf}\x{03c3}\x{0020}";
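Each \x{...} escape yields one character, not one byte, so the resulting string has six characters; only encoding it produces bytes. The short sketch below lists the code points and compares the character count with the UTF-8 byte count (five two-byte Greek letters plus a one-byte space).

```perl
use strict;
use warnings;
use Encode qw(encode);

my $logos = "\x{039b}\x{03bf}\x{03b3}\x{03bf}\x{03c3}\x{0020}";

# List the code points of the character string.
print join(" ", map { sprintf "U+%04X", ord } split //, $logos), "\n";
# prints U+039B U+03BF U+03B3 U+03BF U+03C3 U+0020

# Characters vs. encoded bytes.
my $bytes = encode("UTF-8", $logos);
print length($logos), " characters, ", length($bytes), " UTF-8 bytes\n";
# prints 6 characters, 11 UTF-8 bytes
```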
The following escape creates a Unicode string from a Unicode character name (Perl versions before 5.16 require the charnames pragma for \N{...}):
use charnames ':full';
$delta = "\N{GREEK CAPITAL LETTER DELTA}";
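The charnames pragma also works in the opposite direction: charnames::viacode() returns the official name for a code point. The sketch below checks that the \N{...} escape produced U+0394 and maps it back to its name.

```perl
use strict;
use warnings;
use charnames ':full';

my $delta = "\N{GREEK CAPITAL LETTER DELTA}";

# Code point of the character, and its name looked up again.
my $name = charnames::viacode(ord $delta);
printf "U+%04X %s\n", ord($delta), $name;
# prints U+0394 GREEK CAPITAL LETTER DELTA
```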
The Encode module supports many encodings and provides interfaces for converting between them:
use Encode qw(decode encode);
$data = decode("iso-8859-3", $data);   # legacy ISO 8859-3 bytes -> Perl Unicode string
$utf8 = encode("UTF-8", $data);        # Perl Unicode string -> UTF-8 bytes
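As a concrete round trip, the sketch below takes the single ISO 8859-3 (Latin-3) byte 0xA1, which encodes LATIN CAPITAL LETTER H WITH STROKE (U+0126), decodes it to a Perl character string and encodes that as UTF-8. It also shows Encode's from_to(), which performs the same byte-to-byte conversion in place. The sample byte is chosen only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode from_to);

my $latin3 = "\xA1";                          # one ISO 8859-3 byte
my $chars  = decode("iso-8859-3", $latin3);   # character string "\x{0126}"
my $utf8   = encode("UTF-8", $chars);         # UTF-8 bytes C4 A6

# from_to() converts a byte string from one encoding to another in place.
my $copy = $latin3;
from_to($copy, "iso-8859-3", "UTF-8");

print unpack("H*", $utf8), "\n";   # prints c4a6
```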