Rich Text Format Specification v. 1.2
Page 3
A group consists of text and control words or control symbols enclosed in braces ({}). The opening brace ({)
indicates the start of the group and the closing brace (}) indicates the end of the group. Each group specifies
the text affected by the group and the different attributes of that text. The RTF file can also include groups
for fonts, styles, screen color, pictures, footnotes, annotations, headers and footers, summary information,
fields, and bookmarks, as well as document-, section-, paragraph-, and character-formatting properties. If the
font, style, screen-color, and summary-information groups and document-formatting properties are included,
they must precede the first plain-text character in the document. These groups form the RTF file header. If
the group for fonts is included, it should precede the group for styles. If any group is not used, it can be
omitted. The groups are discussed in the following sections.
Certain control words control properties (such as bold, italic, keep together, and so forth) that have only
two states. When such a control word has no parameter or has a non-zero parameter, it is assumed that the
control word turns on the property. When such a control word has a parameter of 0 (zero), it is assumed that
the control word turns off the property. For example, \b turns on bold, whereas \b0 turns off bold.
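The two-state rule above can be sketched in a few lines. This is an illustrative sketch only; the function name toggle_state and the use of None to mean "no parameter" are assumptions made for the example, not part of the specification.

```python
def toggle_state(parameter):
    """Return True if a two-state control word turns its property on.

    `parameter` is the numeric parameter of the control word, or None
    when the control word carries no parameter (hypothetical convention).
    """
    # No parameter, or any non-zero parameter, turns the property on.
    # A parameter of exactly 0 turns it off.
    return parameter is None or parameter != 0


# \b  -> toggle_state(None) -> bold on
# \b0 -> toggle_state(0)    -> bold off
```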
Certain control words, referred to as destinations, mark the beginning of a collection of related text which
could appear at another position, or destination, within the document. Destinations may also be text which
is used but should not appear within the document at all. An example of a destination is the \footnote
group, where the footnote text follows the control word. Destination control words and their following text
must be enclosed in braces. Destinations added after the RTF specification published in the March 1987
Microsoft Systems Journal may be preceded by the control symbol \*. This control symbol identifies
destinations whose related text should be ignored if the RTF reader does not recognize the destination. (RTF
writers should follow the convention of using this control symbol when adding new destinations or
groups.) Destinations whose related text should be inserted into the document even if the RTF reader does
not recognize the destination should not use \*. All destinations that were not included in the March 1987
revision of the RTF specification are shown with \* as part of the control word.
Formatting specified within a group affects only the text within that group. Generally, text within a group
inherits the formatting of the text in the preceding group. However, Microsoft implementations of RTF
assume that the footnote, annotation, header, and footer groups (described later in this chapter) do not inherit
the formatting of the preceding text. Therefore, to ensure that these groups are always formatted correctly,
you should set the formatting within these groups to the default with the \sectd, \pard, and \plain
control words, and then add any desired formatting.
The control words, control symbols, and braces constitute control information. All other characters in the
file are plain text. Here is an example of plain text that does not exist within a group:
{\rtf\ansi\deff0{\fonttbl{\f0\froman Tms Rmn;}{\f1\fdecor
Symbol;}{\f2\fswiss Helv;}}{\colortbl;\red0\green0\blue0;
\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;
\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;
\red255\green255\blue255;}{\stylesheet{\fs20 \snext0Normal;}}
{\info{\author John Doe}
{\creatim\yr1990\mo7\dy30\hr10\min48}{\version1}{\edmins0}
{\nofpages1}{\nofwords0}{\nofchars0}{\vern8351}}\widowctrl\ftnbj \sectd\linex0\endnhere
\pard\plain \fs20 This is plain text.\par}
The phrase “This is plain text” is not part of a group and is treated as document text.
As previously mentioned, the backslash (\) and braces ({ }) have special meaning in RTF. To use these
characters as text, precede them with a backslash, as in \\, \{, and \}.
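As an illustration of the escaping rule above, an RTF writer might pass document text through a helper like the following. This is a minimal sketch; the function name escape_rtf_text is hypothetical, and it handles only the three characters named in this section.

```python
def escape_rtf_text(text):
    """Precede each backslash and brace with a backslash so that it is
    read as plain text rather than as control information."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)   # \ -> \\, { -> \{, } -> \}
        else:
            out.append(ch)
    return "".join(out)
```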
Conventions of an RTF Reader
The reader of an RTF stream is concerned with the following:
• Separating control information from plain text.
• Acting on control information.
• Collecting and properly inserting text into the document, as directed by the current group state.
Acting on control information is designed to be a relatively simple process. Some control information
simply contributes special characters to the plain text stream. Other information serves to change the
program state, which includes properties of the document as a whole, or to change any of a collection of
group states, which apply to parts of the document.
As previously mentioned, a group state can specify the following:
• The destination, or part of the document that the plain text is constructing.
• Character-formatting properties, such as bold or italic.
• Paragraph-formatting properties, such as justified or centered.
• Section-formatting properties, such as the number of columns.
• Table-formatting properties, which define the number of cells and dimensions of a table row.
In practice, an RTF reader will evaluate each character it reads in sequence as follows:
• If the character is an opening brace ({), the reader stores its current state on the stack. If the character is
  a closing brace (}), the reader retrieves the current state from the stack.
• If the character is a backslash, the reader collects the control word or control symbol and its parameter,
  if any, and looks up the control word or control symbol in a table that maps control words to actions.
  It then carries out the action prescribed in the table. (The possible actions are discussed below.) The
  read pointer is left before or after a control-word delimiter, as appropriate.
• If the character is anything other than opening brace ({), closing brace (}), or backslash (\), the reader
  assumes that the character is plain text and writes the character to the current destination using current
  formatting properties.
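The three cases above can be sketched as a small loop. This is a simplified illustration, not a complete reader: the function name read_rtf is hypothetical, the group state tracks only a single "bold" flag, and the control-word scanning (ASCII letters, an optional signed numeric parameter, one optional space delimiter, carriage returns and linefeeds ignored) is assumed from the general RTF syntax rather than taken from this section.

```python
def read_rtf(source):
    """Walk an RTF string and return (text, controls).

    text     -- list of (character, bold) pairs written to the destination
    controls -- list of (control word or symbol, parameter) pairs seen
    """
    state = {"bold": False}          # tiny stand-in for the full group state
    stack, text, controls = [], [], []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch == "{":                # opening brace: save the current state
            stack.append(dict(state))
            i += 1
        elif ch == "}":              # closing brace: restore the saved state
            state = stack.pop()
            i += 1
        elif ch == "\\":             # control word or control symbol
            i += 1
            if i < n and source[i].isascii() and source[i].isalpha():
                start = i
                while i < n and source[i].isascii() and source[i].isalpha():
                    i += 1
                word = source[start:i]
                start = i
                if i + 1 < n and source[i] == "-" and source[i + 1] in "0123456789":
                    i += 1
                while i < n and source[i] in "0123456789":
                    i += 1
                param = int(source[start:i]) if i > start else None
                if i < n and source[i] == " ":
                    i += 1           # one space delimiter belongs to the control word
                controls.append((word, param))
                if word == "b":      # the only action in this sketch's look-up "table"
                    state["bold"] = param is None or param != 0
            elif i < n:
                symbol = source[i]
                i += 1
                if symbol in "\\{}":
                    text.append((symbol, state["bold"]))   # escaped literal character
                else:
                    controls.append((symbol, None))
        elif ch in "\r\n":
            i += 1                   # assumption: line breaks in the file are not text
        else:
            text.append((ch, state["bold"]))               # plain text
            i += 1
    return text, controls
```

Saving a copy of the state at each opening brace is what makes formatting set inside a group disappear at the matching closing brace. A real reader would replace the single "bold" check with the look-up table of actions described below and would guard against unbalanced braces.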
If the RTF reader cannot find a particular control word or control symbol in the look-up table described
above, the control word or control symbol should be ignored. If a control word or control symbol is
preceded by an opening brace ({), it is part of a group. The current state should be saved on the stack, but no
state change should occur. When a closing brace (}) is encountered, the current state should be retrieved from
the stack, thereby resetting the current state. If the \* control symbol precedes a control word, then it
defines a destination group and was itself preceded by an opening brace ({). The RTF reader should discard
all text up to and including the closing brace (}) that closes this group. All RTF readers must recognize all
destinations defined in the March 1987 RTF specification. The reader may skip past the group, but it is not
allowed to simply discard the control word. Destinations defined since March 1987 are marked with the \*
control symbol.
Note
All RTF readers must implement the \* control symbol to be able to read RTF files written by newer RTF
writers.
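A reader's handling of \* can be sketched as follows: when a group opens with \* followed by a control word the reader does not know, everything up to and including the matching closing brace is dropped. The function names and the `known` set are hypothetical, and the sketch assumes the group contains no raw binary data that could hold unbalanced braces.

```python
import re

# Matches the start of a group of the form {\*\word
_STAR_DEST = re.compile(r"\{\\\*\\([A-Za-z]+)")


def find_group_end(source, open_index):
    """Return the index just past the '}' that closes the '{' at open_index."""
    depth = 0
    i = open_index
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            i += 2                   # skip the backslash and the character after it
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced braces")


def drop_unknown_star_destinations(source, known):
    """Remove every {\\*\\word ...} group whose word is not in `known`."""
    out = []
    i = 0
    while i < len(source):
        if source[i] == "\\" and i + 1 < len(source):
            out.append(source[i:i + 2])   # copy control symbols and escapes untouched
            i += 2
            continue
        match = _STAR_DEST.match(source, i)
        if match and match.group(1) not in known:
            i = find_group_end(source, i)  # discard through the closing brace
            continue
        out.append(source[i])
        i += 1
    return "".join(out)
```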
For control words or control symbols that the RTF reader can find in the look-up table, the possible actions
are as follows.
Change Destination
The RTF reader changes the destination to the destination described in the table entry. Destination changes
are legal only immediately after an opening brace ({). (Other restrictions may also apply; for example,
footnotes cannot be nested.) Many destination changes imply that the current property settings will be reset
to their default settings. Examples of control words that change destination are \footnote, \header,
\footer, \pict, \info, \fonttbl, \stylesheet, and \colortbl. This chapter identifies all destination
control words where they appear in control-word tables.
Change Formatting Property
The RTF reader changes the property as described in the table entry. The entry will specify whether a
parameter is required. “Alphabetic List of RTF Keywords,” later in this chapter, also specifies which control
words require parameters. If a parameter is needed and not specified, then a default will be used. The default
value used depends on the control word. If the control word does not specify a default, then all RTF readers
should assume a default of 0.
Insert Special Character
The reader inserts into the document the character code or codes described in the table entry.
Insert Special Character and Perform Action
The reader inserts into the document the character code or codes described in the table entry and performs
whatever other action the entry specifies. For example, when Microsoft Word interprets \par, a paragraph
mark is inserted in the document and special code is run to record the paragraph properties belonging to that
paragraph mark.
Formal Syntax
This chapter describes RTF using the following syntax, based on Backus-Naur Form:
Syntax      Meaning
#PCDATA     Text (without control words)
#SDATA      Hexadecimal data
#BDATA      Binary data
'c'         A literal
<text>      A non-terminal
a           The (terminal) control word a, without a parameter.
a           The (terminal) control word a, with a parameter.
a?          Item a is optional.
a+          One or more repetitions of item a.
a*          Zero or more repetitions of item a.
a b         Item a followed by item b.
a | b       Item a or item b.
a & b       Item a and/or item b, in any order.
Contents of an RTF File
An RTF file has the following syntax:
<File>      '{' <header> <document> '}'
This syntax is overly strict; all RTF readers must read RTF that does not conform to this syntax. However,
all RTF readers must correctly read RTF written according to this syntax. If you write RTF that conforms
to this syntax, all correct RTF readers will read it.
Header
The header has the following syntax:
<header>    \rtf <charset> \deff? <fonttbl> <colortbl> <stylesheet>?
RTF Version
An entire RTF file is considered a group and must be enclosed in braces. The control word \rtfN must
follow the opening brace. The numeric parameter N identifies the version of the RTF standard used. The
RTF standard described in this chapter corresponds to RTF Specification Version 1.
Character Set
After specifying the RTF version, you must declare the character set used in this document. The control
word for the character set must precede any plain text or any table control words. The RTF specification
currently supports the following character sets:
Control word  Character set
\ansi         ANSI (default)
\mac          Apple Macintosh
\pc           IBM PC code page 437
\pca          IBM PC code page 850, used by IBM Personal System/2 (not implemented in version 1
              of Word for OS/2)
Font Table
The \fonttbl control word introduces the font table group. This group defines the fonts available in the
document and has the following syntax:
<fonttbl>      '{' \fonttbl (<fontinfo> | ('{' <fontinfo> '}'))+ '}'
<fontinfo>     <fontnum><fontfamily><fcharset><fprq><fontemb>?<codepage>?
               <fontname><fontaltname> ';'
<fontnum>      \f
<fontfamily>   \fnil | \froman | \fswiss | \fmodern | \fscript | \fdecor | \ftech | \fbidi
<fcharset>     \fcharset
<fprq>         \fprq
<fontname>     #PCDATA
<fontaltname>  '{\*' \falt #PCDATA '}'
<fontemb>      '{\*' \fontemb <fonttype> <fontfname>? <data>? '}'
<fonttype>     \ftnil | \fttruetype
<fontfname>    '{\*' \fontfile <codepage>? #PCDATA '}'
<codepage>     \cpg
Note for <fontemb> that either <fontfname> or <data> must be present, although both may be present.
All fonts available to the RTF writer can be included in the font table, even if the document doesn't use all
the fonts.
RTF also supports font families, so that applications can attempt to intelligently choose fonts if the exact
font is not present on the reading system. RTF uses the following control words to describe the various
font families.
Control word  Font family
\fnil         Unknown or default fonts (default)
\froman       Roman, proportionally spaced serif fonts (Tms Rmn, Palatino, etc.)
\fswiss       Swiss, proportionally spaced sans serif fonts (Swiss, etc.)
\fmodern      Fixed-pitch serif and sans serif fonts (Courier, Pica, etc.)
\fscript      Script fonts (Cursive, etc.)
\fdecor       Decorative fonts (Old English, ITC Zapf Chancery, etc.)
\ftech        Technical, symbol, and mathematical fonts (Symbol, etc.)
\fbidi        Arabic, Hebrew, or other bi-directional font (Miriam, etc.)
If an RTF file uses a default font, the default font number is specified with the \deffN control word, which
must precede the font-table group. The RTF writer supplies the default font number used in the creation of
the document as the numeric argument N. The RTF reader then translates this number through the font
table into the most similar font available on the reader's system.
The following control words specify the character set and pitch of a font in the font table:
Control word  Definition
\fcharsetN    Specifies the character set of a font in the font table.
\fprqN        Specifies the pitch of a font in the font table.
If \fcharset is specified, the N argument can be one of the following types:
Character set              N value
ANSI_CHARSET               0
SYMBOL_CHARSET             2
SHIFTJIS_CHARSET           128
GREEK_CHARSET              161
TURKISH_CHARSET            162
HEBREW_CHARSET             177
ARABICSIMPLIFIED_CHARSET   178
ARABICTRADITIONAL_CHARSET  179
ARABICUSER_CHARSET         180
HEBREWUSER_CHARSET         181
CYRILLIC_CHARSET           204
EASTERNEUROPE_CHARSET      238
PC437_CHARSET              254
OEM_CHARSET                255
If \fprq is specified, the N argument can be one of the following values:
Pitch           Value
Default pitch   0
Fixed pitch     1
Variable pitch  2
Code Page Support
A font may have a different character set from the character set of the document. For example, the Symbol
font has the same characters in the same positions on both the Macintosh and Windows. RTF describes this
with the \cpg control word, which names the character set used by the font. In addition, file names (used in
field instructions and in embedded fonts) may not necessarily use the same character set as the
document, and the \cpg control word can change the character set for these file names, as well. However,
all RTF documents must still declare a character set, to maintain backwards compatibility with older RTF
readers.
The table below describes valid values for \cpg:
Value  Description
437    United States IBM
708    Arabic (ASMO 708)
709    Arabic (ASMO 449+, BCON V4)
710    Arabic (Transparent Arabic)
711    Arabic (Nafitha Enhanced)
720    Arabic (Transparent ASMO)
819    Windows 3.1 (United States & Western Europe)
850    IBM Multilingual
852    Eastern European
860    Portuguese
862    Hebrew
863    French Canadian
864    Arabic
865    Norwegian
866    Soviet Union
932    Japanese
1250   Windows 3.1 (Eastern European)
1251   Windows 3.1 (Soviet Union)
Font Embedding
RTF supports embedded fonts with the \fontemb group located inside a font definition. An embedded font
can be specified by a file name, or the actual font data may be located inside the group. If a file name is
specified, it is contained in the \fontfile group. The \cpg control word can be used to specify the character
set for the file name.
RTF supports TrueType™ and other embedded fonts. The type of the embedded font is described by the
following control words:
Control word  Embedded font type
\ftnil        Unknown or default font type (default)
\fttruetype   TrueType font
The File Table
The \filetbl control word introduces the file table destination, a new destination. This group defines the
files referenced in the document and has the following syntax:
<filetbl>     '{\*' \filetbl ('{' <fileinfo> '}')+ '}'
<fileinfo>    '{' \file <filenum><relpath>?<osnum>? <filesource>+ <filename> ';}'
<filenum>     \fid
<relpath>     \frelative
<osnum>       \fosnum
<filesource>  \fvalidmac | \fvaliddos | \fvalidntfs | \fvalidhpfs | \fnetwork
<filename>    #PCDATA
Note that the filename can be any valid alphanumeric string for the named file system, giving the complete
path and filename.
Control word  Definition
\filetbl      A structure analogous to the style or font table, the file table is a list of documents
              referenced by the current document. This is a destination control word output as part of
              the document header.
\file         This marks the beginning of a file group, which lists relevant information about the
              referenced file. This is a destination control word.
\fidN         File ID number. Files are referenced later in the document using this number.
\frelativeN   The character position within the path (starting at zero) where the referenced file's path
              starts to be relative to the path of the owning document. For example, if a document is
              saved to the path c:\private\resume\foo.doc and its file table contains the path
              c:\private\resume\edu\bar.doc, then that entry in the file table will be \frelative18, to
              point at the character 'e' in "edu". This is to allow preservation of relative paths.
\fosnumN      Currently only filled in for paths from the Macintosh file system. It is an OS-specific
              number for identifying the file, which may be used to speed up access to the file, or find
              it if it has been moved to another folder on disk. The MacOS name for this number is
              the "file id". Additional meanings of the \fosnumN may be defined for other file
              systems in the future.
\fvalidmac    Macintosh file system.
\fvaliddos    MS-DOS file system.
\fvalidntfs   NTFS file system.
\fvalidhpfs   HPFS file system.
\fnetwork     Network file system. This keyword may be used in conjunction with any of the previous
              file source keywords.
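The \frelativeN example above can be reproduced with a short calculation: N is the length of the owning document's directory prefix, including the trailing backslash. The sketch below assumes backslash-separated paths and an exact, case-sensitive prefix match; the function name frelative_index and the choice to return None when the referenced file is not under the owning document's directory are assumptions for illustration.

```python
def frelative_index(owner_path, referenced_path):
    """Return N for \\frelativeN, or None if no relative position applies."""
    # Directory of the owning document, including the trailing backslash,
    # for example "c:\private\resume\" for "c:\private\resume\foo.doc".
    owner_dir = owner_path[: owner_path.rfind("\\") + 1]
    if owner_dir and referenced_path.startswith(owner_dir):
        return len(owner_dir)       # zero-based position of the first relative character
    return None


n = frelative_index(r"c:\private\resume\foo.doc", r"c:\private\resume\edu\bar.doc")
# n is 18, the position of the 'e' in "edu"
```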
Color Table
The \colortbl control word introduces the color table group, which defines screen colors, character colors,
and other color information. This group has the following syntax:
<colortbl>  '{' \colortbl <colordef>+ '}'
<colordef>  \red ? & \green ? & \blue ? ';'
The following are valid control words for this group:
Control word  Meaning
\redN         Red index
\greenN       Green index
\blueN        Blue index
Each definition must be delimited by a semicolon, even if the definition is omitted. If a color definition is
omitted, the RTF reader uses its default color. In the example below, three colors are defined. The first color
is omitted, as shown by the semicolon following the \colortbl control word.
{\colortbl;\red0\green0\blue0;\red0\green0\blue255;}
The foreground and background colors use indexes into the color table to define a color. For more
information on color setup, see your Windows documentation.
The following example defines a block of text in color (where supported). Note that the cf/cb index is the
index of an entry in the color table, which represents a red/green/blue color combination.
{\f1\cb1\cf2 This is colored text. The background is color
1 and the foreground is color 2.}
If the file is translated for software that does not display color, the reader ignores the color-table group.
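To make the semicolon rule concrete, the following sketch turns a color table group into a list indexed the same way \cf and \cb index it. The function name parse_color_table is hypothetical; an omitted definition is represented as None (the reader's default color), and a missing \red, \green, or \blue component is assumed to default to 0.

```python
import re


def parse_color_table(group):
    """Parse a '{\\colortbl ...}' group into a list of (r, g, b) tuples.

    An omitted definition (an empty entry before a semicolon) becomes None.
    """
    body = group.strip()
    prefix = "{\\colortbl"
    if not (body.startswith(prefix) and body.endswith("}")):
        raise ValueError("not a color table group")
    body = body[len(prefix):-1]
    entries = body.split(";")[:-1]      # every definition ends with a semicolon
    colors = []
    for entry in entries:
        found = dict(re.findall(r"\\(red|green|blue)(\d+)", entry))
        if not found:
            colors.append(None)         # omitted: use the reader's default color
        else:
            colors.append(tuple(int(found.get(k, 0)) for k in ("red", "green", "blue")))
    return colors


table = parse_color_table(r"{\colortbl;\red0\green0\blue0;\red0\green0\blue255;}")
# table[0] is None (default), table[1] is black, table[2] is blue
```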
Style Sheet
The \stylesheet control word introduces the style sheet group, which contains definitions and
descriptions of the various styles used in the document. All styles in the document's style sheet can be
included, even if not all the styles are used. In RTF, a style is a shorthand used to specify a set of character,
paragraph, or section formatting.
The style-sheet group has the following syntax:
<stylesheet>  '{' \stylesheet <style>+ '}'
<style>       '{' <styledef>?<keycode>? <formatting> <additive>? <based>? <next>? <stylename>? ';' '}'
<styledef>    \s | \cs | \ds
<keycode>     '{' \keycode <keys> '}'
<additive>    \additive
<based>       \sbasedon
<next>        \snext
<formatting>  (<brdrdef> | <parfmt> | <apoctl> | <tabdef> | <shading> | <chrfmt>)+
<stylename>   #PCDATA
<keys>        ( \shift? & \ctrl? & \alt?) <key>
<key>         \fn | #PCDATA
For <style>, both <styledef> and <stylename> are optional; the default is paragraph style 0. Note for
<stylename> that Microsoft Word for the Macintosh interprets commas in #PCDATA as separating style
synonyms. Also, for <key>, the data must be exactly one character.
Control word  Meaning
\additive     Used in a character style definition ({\*\cs_). Indicates that style attributes are to be
              applied in addition to current attributes, rather than setting the character attributes to only
              the style definition.
\sbasedonN    Defines the number of the style on which the current style is based (the default is 222,
              meaning no style).
\snextN       Defines the next style associated with the current style; if omitted, the next style is the
              current style.
\keycode      This group is specified within the description of a style in the style sheet in the RTF
              header. The syntax for this group is {\*\keycode Keys} where Keys are the characters used
              in the key code. For example, a style, Normal, may be defined {\s0 {\*\keycode
              \shift\ctrl n}Normal;} within the RTF style sheet. See the Special Character control
              words for the characters outside of the alphanumeric range that may be used.
\alt          The ALT modifier key. Used to describe quick-key codes for styles.
\shift        The SHIFT modifier key. Used to describe quick-key codes for styles.
\ctrl         The CTRL modifier key. Used to describe quick-key codes for styles.
\fnN          Specifies a function key where N is the function key number. Used to describe quick-key
              codes for styles.
The following is an example of an RTF style sheet:
{\stylesheet{\fs20 \sbasedon222\snext0{\*\keycode \shift\ctrl n}
Normal;}{\s1\qr \fs20 \sbasedon0\snext1 FLUSHRIGHT;}
{\s2\fi-720\li720\fs20\ri2880\sbasedon0\snext2 IND;}}
and RTF paragraphs to which the styles are applied:
\widowctrl\ftnbj\ftnrestart \sectd \linex0\endnhere \pard\plain
\fs20 This is Normal style.
\par \pard\plain \s1
This is right justified. I call this style FLUSHRIGHT.
\par \pard\plain \s2
This is an indented paragraph. I call this style IND. It produces
a hanging indent.
\par}
In the preceding example, the PostScript style is declared but not used. Some of the control words in this
example are discussed in later sections.
Revision Marks
This table allows tracking of multiple authors and reviewers of a document, and is used in conjunction with
the character properties for revision marks.
Control word  Definition
\revtbl       This group consists of subgroups that each identify the author of a revision in the
              document, as in {Author1;}. This is a destination control word.
              Revision conflicts, such as one author deleting another's additions, are stored as one
              group, in the following form:
              CurrentAuthor\'00\'<length of previousauthor's name>PreviousAuthor\'00 PreviousRevisionTime
              The four bytes of the DTTM structure are emitted as ASCII characters, so values > 127
              should be emitted as quoted hex values.
All time references for revision marks use the following bit field structure, DTTM:
Bit numbers  Information   Range
0–5          Minute        0–59
6–10         Hour          0–23
11–15        Day of month  1–31
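The three fields listed above can be unpacked with shifts and masks. This sketch covers only the bit ranges shown in this table (bits 0 through 15) and assumes bit 0 is the least significant bit of the value; the function name decode_dttm_partial is hypothetical.

```python
def decode_dttm_partial(value):
    """Extract the minute, hour, and day-of-month fields from a DTTM value.

    Only bits 0-15 are interpreted; any higher bits are ignored here.
    """
    minute = value & 0x3F          # bits 0-5   (6 bits, 0-59)
    hour = (value >> 6) & 0x1F     # bits 6-10  (5 bits, 0-23)
    day = (value >> 11) & 0x1F     # bits 11-15 (5 bits, 1-31)
    return {"minute": minute, "hour": hour, "day": day}


# Packing 10:48 on day 30 and decoding it again:
packed = 48 | (10 << 6) | (30 << 11)
fields = decode_dttm_partial(packed)
```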