25.2 The Golden Rule
So the rule of thumb: if you are not dealing with binary data, work with Unicode.
What does working with Unicode in Python 2.x mean?
• as long as you are using ASCII code points only (basically numbers, some special
characters and Latin letters without umlauts or anything fancy) you can use regular
string literals ('Hello World').
• if you need anything other than ASCII in a string you have to mark this string as
a Unicode string by prefixing it with a lowercase u (like u'Hänsel und Gretel').
• if you are using non-ASCII characters in your Python files you have to tell
Python which encoding your file uses. Again, I recommend UTF-8 for this purpose.
To tell the interpreter your encoding you can put # -*- coding: utf-8 -*- into
the first or second line of your Python source file.
• Jinja is configured to decode the template files from UTF-8. So make sure to tell
your editor to save the file as UTF-8 there as well.
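The rules above can be sketched in a short module. This is only a minimal illustration, not code from Flask itself, and the variable names are made up for the example. It is written for Python 2.x but should also run on Python 3.3 and later, where the u prefix is accepted again.

```python
# -*- coding: utf-8 -*-
# The coding declaration above tells Python 2 that this file is UTF-8.

# ASCII-only text can stay a regular string literal.
greeting = 'Hello World'

# Anything beyond ASCII is marked as a Unicode string with the u prefix.
title = u'Hänsel und Gretel'

# Encoding turns the Unicode string into bytes; decoding reverses it.
encoded = title.encode('utf-8')
decoded = encoded.decode('utf-8')
```

The Unicode string has 17 characters, while its UTF-8 form takes 18 bytes because the ä is stored as two bytes.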
25.3 Encoding and Decoding Yourself
If you are talking with a filesystem or something else that is not really based on
Unicode, you will have to ensure that you decode properly when working with a
Unicode interface. So for example if you want to load a file from the filesystem and
embed it into a Jinja2 template, you will have to decode it from the encoding of that
file. Here the old problem that text files do not specify their encoding comes into
play. So do yourself a favour and limit yourself to UTF-8 for text files as well.
To load such a file as Unicode you can use the built-in str.decode()
method:
def read_file(filename, charset='utf-8'):
    with open(filename, 'r') as f:
        # In Python 2, f.read() returns a byte string; decode it to unicode.
        return f.read().decode(charset)
To go from Unicode into a specific charset such as UTF-8 you can use the
unicode.encode() method:
def write_file(filename, contents, charset='utf-8'):
    with open(filename, 'w') as f:
        # Encode the unicode contents to a byte string before writing.
        f.write(contents.encode(charset))
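As an alternative sketch, the standard library's io.open() accepts an encoding argument and does the decoding and encoding for you. The helper names read_text and write_text are hypothetical and only mirror read_file and write_file above; this variant should work on Python 2.6 and later as well as on Python 3.

```python
# -*- coding: utf-8 -*-
import io


def read_text(filename, charset='utf-8'):
    # io.open decodes while reading, so the result is already Unicode.
    with io.open(filename, 'r', encoding=charset) as f:
        return f.read()


def write_text(filename, contents, charset='utf-8'):
    # contents must be a Unicode string; io.open encodes it on write.
    with io.open(filename, 'w', encoding=charset) as f:
        f.write(contents)
```

With these helpers, write_text('story.txt', u'Hänsel und Gretel') followed by read_text('story.txt') gives back the same Unicode string, assuming the file is only ever written as UTF-8.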
25.4 Configuring Editors
Most editors save as UTF-8 by default nowadays, but in case your editor is not
configured to do this you have to change it. Here are some common ways to set your
editor to store as UTF-8: