25.2 The Golden Rule
So the rule of thumb: if you are not dealing with binary data, work with Unicode.
What does working with Unicode in Python 2.x mean?
• as long as you are using ASCII characters only (basically digits, some special
characters, and Latin letters without umlauts or anything fancy) you can use regular
string literals ('Hello World').
• if you need anything other than ASCII in a string you have to mark this string as
a Unicode string by prefixing it with a lowercase u (like u'Hänsel und Gretel').
• if you are using non-ASCII characters in your Python files you have to tell
Python which encoding your file uses. Again, I recommend UTF-8 for this purpose.
To tell the interpreter your encoding, put # -*- coding: utf-8 -*- into the
first or second line of your Python source file.
• Jinja is configured to decode the template files from UTF-8. So make sure to tell
your editor to save the file as UTF-8 there as well.
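The rules above can be put together in a short sketch. The variable names are
illustrative assumptions, and the file is assumed to be saved as UTF-8:

```python
# -*- coding: utf-8 -*-
# The line above declares the source encoding; Python 2 needs it as soon as
# the file contains non-ASCII bytes.

# ASCII only: a regular string literal is fine.
greeting = 'Hello World'

# Anything beyond ASCII: mark the literal as Unicode with a lowercase u.
title = u'Hänsel und Gretel'

# Encode to bytes only at the boundary (files, sockets and so on).
encoded = title.encode('utf-8')

# Decoding the bytes with the same charset gives the Unicode string back.
decoded = encoded.decode('utf-8')
```

Here title holds 17 characters while encoded holds 18 bytes, because ä takes two
bytes in UTF-8.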
25.3 Encoding and Decoding Yourself
If you are talking with a filesystem or something that is not really based on Unicode
you will have to ensure that you decode properly when working with a Unicode
interface. So for example if you want to load a file on the filesystem and embed it
into a Jinja2 template you will have to decode it from the encoding of that file. Here
the old problem that text files do not specify their encoding comes into play. So do
yourself a favour and limit yourself to UTF-8 for text files as well.
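To see why guessing the encoding is risky, here is a small sketch. The byte strings
are made up for illustration; they stand for the word u'H\xe4nsel' as it would be
stored on disk in two different encodings:

```python
# The same word stored as UTF-8 and as Latin-1.
utf8_bytes = b'H\xc3\xa4nsel'
latin1_bytes = b'H\xe4nsel'

# Correct charset: the original text comes back.
good = utf8_bytes.decode('utf-8')

# Wrong charset, but no error: Latin-1 accepts every byte value, so the
# result is silently garbled (two characters where one was meant).
garbled = utf8_bytes.decode('latin-1')

# Wrong charset with an error: these bytes are not valid UTF-8.
try:
    latin1_bytes.decode('utf-8')
    failed = False
except UnicodeDecodeError:
    failed = True
```

The silent case is the dangerous one, which is another reason to settle on a
single encoding for all text files.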
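Anyways. To load such a file with Unicode you can use the built-in str.decode()
method:

    def read_file(filename, charset='utf-8'):
        with open(filename, 'r') as f:
            # read the raw bytes, then decode them into a Unicode string
            return f.read().decode(charset)
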
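To go from Unicode into a specific charset such as UTF-8 you can use the
unicode.encode() method:

    def write_file(filename, contents, charset='utf-8'):
        with open(filename, 'w') as f:
            # encode the Unicode string into bytes before writing
            f.write(contents.encode(charset))
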
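As an alternative sketch, and not the approach described in this section, io.open
accepts an encoding argument and does the decoding and encoding for you. It is
available from Python 2.6 onwards and in Python 3. The helper names read_text and
write_text and the temporary demo file are hypothetical:

```python
import io
import os
import tempfile


def read_text(filename, charset='utf-8'):
    # io.open decodes while reading, so read() already returns Unicode.
    with io.open(filename, 'r', encoding=charset) as f:
        return f.read()


def write_text(filename, contents, charset='utf-8'):
    # contents must be a Unicode string; io.open encodes it on write.
    with io.open(filename, 'w', encoding=charset) as f:
        f.write(contents)


# Round trip through a temporary file.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    write_text(path, u'H\xe4nsel und Gretel')
    roundtrip = read_text(path)
finally:
    os.remove(path)
```

The design choice is the same either way: bytes exist only at the file boundary,
and everything inside the application stays Unicode.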
25.4 Configuring Editors
Most editors save as UTF-8 by default nowadays, but in case your editor is not
configured to do this you have to change it. Here are some common ways to set your
editor to store as UTF-8: