characters to be used in a matched string; what they do specify, however, are legal
positions at which a match can occur. Sometimes these elements are called
regular-expression anchors because they anchor the pattern to a specific position in
the search string. The most commonly used anchor elements are ^, which ties the pattern
to the beginning of the string, and $, which anchors the pattern to the end of the string.
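As a minimal sketch of the two basic anchors, the following snippet tests ^ and $ against a few sample strings. The variable names and sample strings are hypothetical and chosen only for illustration.

```javascript
// Hypothetical patterns and sample strings, used only to illustrate ^ and $.
const startsWithJava = /^Java/;      // "Java" must appear at the start of the string
const endsWithScript = /Script$/;    // "Script" must appear at the end of the string
const wholeString = /^JavaScript$/;  // the entire string must be exactly "JavaScript"

const anchorResults = [
  startsWithJava.test("JavaScript"),     // true
  startsWithJava.test("I like Java"),    // false: "Java" is not at the start
  endsWithScript.test("JavaScript"),     // true
  wholeString.test("JavaScript rocks"),  // false: extra text follows the word
];
console.log(anchorResults);
```

Neither anchor consumes a character: each only constrains where the rest of the pattern is allowed to match.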
If you want to search for “Java” as a word by itself (not as a prefix, as it is in
“JavaScript”), you can try the pattern /\sJava\s/, which requires a
space before and after the word. But there are two problems with this solution. First,
it does not match “Java” at the beginning or the end of a string, but only if it appears
with space on either side. Second, when this pattern does find a match, the matched
string it returns has leading and trailing spaces, which is not quite what’s needed. So
instead of matching actual space characters with \s, match (or anchor to) word
boundaries with \b. The resulting expression is /\bJava\b/. The element \B anchors the
match to a location that is not a word boundary. Thus, the pattern /\B[Ss]cript/
matches “JavaScript” and “postscript”, but not “script” or “Scripting”.
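The difference between matching whitespace and anchoring to a word boundary can be sketched as follows. The sample strings are hypothetical; the patterns use only the standard \s, \b, and \B elements.

```javascript
// Matching literal whitespace: the spaces become part of the match,
// and "Java" at the very start of a string is missed.
const spaced = /\sJava\s/;
const spacedMatch = spaced.exec("I like Java a lot")[0]; // " Java " (spaces included)
const spacedAtStart = spaced.test("Java is fun");        // false: no space before "Java"

// Anchoring to word boundaries: no extra characters are consumed.
const bounded = /\bJava\b/;
const boundedMatch = bounded.exec("Java is fun")[0];     // "Java"
const boundedPrefix = bounded.test("JavaScript");        // false: no boundary after "Java"

// \B requires a position that is NOT a word boundary.
const inner = /\B[Ss]cript/;
const innerResults = ["JavaScript", "postscript", "script", "Scripting"]
  .map((word) => inner.test(word));                      // [true, true, false, false]

console.log(JSON.stringify(spacedMatch), spacedAtStart, boundedMatch, boundedPrefix, innerResults);
```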
You can also use arbitrary regular expressions as anchor conditions. If you include an
expression within (?= and ) characters, it is a lookahead assertion, and it specifies that
the enclosed characters must match, without actually matching them. For example, to
match the name of a common programming language, but only if it is followed by a
colon, you could use /[Jj]ava([Ss]cript)?(?=\:)/. This pattern matches the word
“JavaScript” in “JavaScript: The Definitive Guide”, but it does not match “Java” in
“Java in a Nutshell”, because it is not followed by a colon.
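A short sketch of a positive lookahead in practice: the pattern below requires a colon after the language name but leaves the colon out of the matched text. The variable names and the two sample titles are used here only as illustrative inputs.

```javascript
// (?=\:) asserts that a colon follows, without consuming it.
const languageBeforeColon = /[Jj]ava([Ss]cript)?(?=\:)/;

const withColon = languageBeforeColon.exec("JavaScript: The Definitive Guide");
const withoutColon = languageBeforeColon.exec("Java in a Nutshell");

console.log(withColon[0]);   // "JavaScript" (the colon is not part of the match)
console.log(withoutColon);   // null: "Java" is not followed by a colon
```

Because the lookahead consumes nothing, a subsequent search on the same string would resume at the colon rather than after it.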
If you instead introduce an assertion with (?!, it is a negative lookahead assertion,
which specifies that the following characters must not match. For example,
/Java(?!Script)([A-Z]\w*)/ matches “Java” followed by a capital letter and any number of
additional ASCII word characters, as long as “Java” is not followed by “Script”. It
matches “JavaBeans” but not “Javanese”, and it matches “JavaScrip” but not
“JavaScript” or “JavaScripter”.
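The negative lookahead described above can be checked with a small sketch. The word list is a hypothetical test input; the pattern is built from the description in the text (“Java”, not followed by “Script”, then a capital letter and further word characters).

```javascript
// (?!Script) rejects any position where "Script" immediately follows "Java".
const javaButNotScript = /Java(?!Script)([A-Z]\w*)/;

const candidates = ["JavaBeans", "Javanese", "JavaScrip", "JavaScript", "JavaScripter"];
const accepted = candidates.filter((word) => javaButNotScript.test(word));

console.log(accepted); // ["JavaBeans", "JavaScrip"]
```

“Javanese” is rejected by the [A-Z] requirement rather than by the lookahead, while “JavaScrip” passes because the lookahead needs the full six letters of “Script” in order to veto the match.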
Table 10-5 summarizes regular-expression anchors.
Table 10-5. Regular-expression anchor characters

Character   Meaning
^           Match the beginning of the string and, in multiline searches, the beginning of a line.
$           Match the end of the string and, in multiline searches, the end of a line.
\b          Match a word boundary. That is, match the position between a \w character and a \W
            character or between a \w character and the beginning or end of a string. (Note,
            however, that [\b] matches backspace.)
\B          Match a position that is not a word boundary.
(?=p)       A positive lookahead assertion. Require that the following characters match the
            pattern p, but do not include those characters in the match.
(?!p)       A negative lookahead assertion. Require that the following characters do not match
            the pattern p.
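The multiline behavior of ^ and $ noted in the table can be sketched with the m flag. The three-line sample string is hypothetical; the last two lines illustrate the separate point that \b inside a character class means backspace.

```javascript
// A hypothetical three-line input.
const lines = "Java\nJavaScript\nScript";

// Without the m flag, ^ matches only at the start of the whole string.
const startOfString = lines.match(/^Java/g);   // ["Java"]

// With the m flag, ^ and $ also match at the start and end of each line.
const startOfLines = lines.match(/^Java/gm);   // ["Java", "Java"]
const endOfLines = lines.match(/Script$/gm);   // ["Script", "Script"]

// Inside a character class, \b is a backspace character, not a word boundary.
const matchesBackspace = /[\b]/.test("\b");    // true
const matchesBoundary = /[\b]/.test("a b");    // false: no backspace character present

console.log(startOfString, startOfLines, endOfLines, matchesBackspace, matchesBoundary);
```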
258 | Chapter 10: Pattern Matching with Regular Expressions