Python Module of the Week, Release 1.132
The pattern in the example matches the first or last word of the input. It matches line. at the end of the string, even
though there is no newline.
$ python re_flags_multiline.py

Text        : 'This is some text -- with punctuation.\nAnd a second line.'
Pattern     : (^\w+)|(\w+\S*$)
Single Line : [('This', ''), ('', 'line.')]
Multline    : [('This', ''), ('', 'punctuation.'), ('And', ''), ('', 'line.')]
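The effect of MULTILINE on each anchor can also be seen in isolation. The following is a minimal sketch, not one of the module's own example scripts: the two simplified patterns and the variable names are assumptions chosen for illustration, and the single-argument print() calls are used so the snippet runs under both Python 2 and Python 3.

```python
import re

text = 'This is some text -- with punctuation.\nAnd a second line.'

# '^' anchor: first word of the string vs. first word of every line.
first_single = re.findall(r'^\w+', text)
first_multi = re.findall(r'^\w+', text, re.MULTILINE)

# '$' anchor: last token of the string vs. last token of every line.
last_single = re.findall(r'\S+$', text)
last_multi = re.findall(r'\S+$', text, re.MULTILINE)

print(first_single)  # ['This']
print(first_multi)   # ['This', 'And']
print(last_single)   # ['line.']
print(last_multi)    # ['punctuation.', 'line.']
```

In both modes $ matches at the very end of the string, which is why line. is found even though no newline follows it.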
DOTALL is the other flag related to multiline text. Normally the dot character . matches everything in the input text
except a newline character. The flag allows dot to match newlines as well.
import re

text = 'This is some text -- with punctuation.\nAnd a second line.'
pattern = r'.+'
no_newlines = re.compile(pattern)
dotall = re.compile(pattern, re.DOTALL)

print 'Text        :', repr(text)
print 'Pattern     :', pattern
print 'No newlines :', no_newlines.findall(text)
print 'Dotall      :', dotall.findall(text)
Without the flag, each line of the input text matches the pattern separately. Adding the flag causes the entire string to
be consumed.
$ python re_flags_dotall.py

Text        : 'This is some text -- with punctuation.\nAnd a second line.'
Pattern     : .+
No newlines : ['This is some text -- with punctuation.', 'And a second line.']
Dotall      : ['This is some text -- with punctuation.\nAnd a second line.']
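DOTALL matters most when a single match has to span a line break. As a rough sketch under that assumption, the pattern below (a hypothetical one, not taken from the examples above) can only succeed if the dot is allowed to cross the newline. The snippet uses single-argument print() calls so it runs under both Python 2 and Python 3.

```python
import re

text = 'This is some text -- with punctuation.\nAnd a second line.'

# Hypothetical pattern for illustration: 'text' is on the first line and
# 'second' is on the next one, so '.+' must consume the newline between them.
spanning = r'text.+second'

without_flag = re.search(spanning, text)
with_flag = re.search(spanning, text, re.DOTALL)

print(without_flag)              # None
print(repr(with_flag.group(0)))  # 'text -- with punctuation.\nAnd a second'
```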
Unicode
Under Python 2, str objects use the ASCII character set, and regular expression processing assumes that the pattern
and input text are both ASCII. The escape codes described earlier are defined in terms of ASCII by default. Those
assumptions mean that the pattern \w+ will match the word “French” but not “Français”, since the ç is not part of the
ASCII character set. To enable Unicode matching in Python 2, add the UNICODE flag when compiling the pattern.
# -*- coding: utf-8 -*-
import re
import codecs
import sys

# Set standard output encoding to UTF-8.
sys.stdout = codecs.getwriter('UTF-8')(sys.stdout)

text = u'Français złoty Österreich'
pattern = ur'\w+'
ascii_pattern = re.compile(pattern)
unicode_pattern = re.compile(pattern, re.UNICODE)

print 'Text    :', text
print 'Pattern :', pattern
print 'ASCII   :', u', '.join(ascii_pattern.findall(text))
print 'Unicode :', u', '.join(unicode_pattern.findall(text))
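For comparison only, the sketch below assumes Python 3, which the listing above does not target. There the default is reversed: str patterns match Unicode word characters without any flag, and the re.ASCII flag restores ASCII-only matching. The variable names are chosen for illustration.

```python
import re

text = 'Français złoty Österreich'
pattern = r'\w+'

# Python 3 assumption: str patterns are Unicode-aware by default.
unicode_default = re.findall(pattern, text)

# re.ASCII limits \w to [a-zA-Z0-9_], so non-ASCII letters split the words.
ascii_only = re.findall(pattern, text, re.ASCII)

print(unicode_default == text.split())  # True: each word is matched whole
print(ascii_only)  # ['Fran', 'ais', 'z', 'oty', 'sterreich']
```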
Chapter 6. String Services