The National Archives A Guide to Formats Version: 1
Some user-editable document formats track changes to the content (but usually not to all kinds of
content), and allow different parties to review and comment on it. User-defined
fields may exist to contain defined data (e.g. to support mail-merge functionality). Many
document formats have specifically defined fields to hold user metadata, such as the author of a
document. They may also have embedded dependencies on external data (e.g. a link to
another file on a disk, which can break if either file is moved), and cross-links within the
document which can also break.
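
As a rough illustration of how broken external dependencies might be detected, the following Python sketch checks whether a set of link targets still exists on disk. It is only a sketch under stated assumptions: the function name find_broken_links is hypothetical, the link targets are assumed to have been extracted from the document already by some other tool, and relative targets are assumed to resolve against the folder holding the document, which may not hold for every format.

```python
from pathlib import Path


def find_broken_links(document: Path, link_targets: list[str]) -> list[str]:
    """Return the link targets that no longer resolve to an existing file.

    Assumption: relative targets are resolved against the folder that holds
    the document. Some formats resolve links differently.
    """
    base = document.parent
    broken = []
    for target in link_targets:
        candidate = Path(target)
        if not candidate.is_absolute():
            candidate = base / candidate
        if not candidate.exists():
            broken.append(target)
    return broken
```

Running a check of this kind before and after moving a set of files gives an early indication of which documents have lost a dependency. It does not cover cross-links within a document, which need format-specific inspection.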
Some features of document file formats only exist to preserve backwards compatibility with
documents written in earlier formats. While this mitigates some continuity risks, it also further
increases the complexity of the formats going forwards.
All document migration carries risk, due to the complexity of document formats. It is entirely
normal that a document migration will lose or change some features of the original, unless the
document is very simple. In many cases, the change or loss can be quite minimal and may not
be considered vital (e.g. the style of a heading changes slightly). However, it is essential that all
document migrations are tested thoroughly on a selected set of candidate documents, to ensure
that essential features are not lost in the process. Document migrations fall into three broad
types, which typically carry different risks:
• within a family of file formats (e.g. Microsoft Word 95 to Microsoft Word 97-2003)
• across format families (e.g. Microsoft Rich Text Format to OpenDocument Text 1.1)
• from a user-editable to a page-layout format (e.g. OpenDocument Text 1.1 to PDF 1.7).
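
One part of testing a migration on a sample set can be sketched in Python as a comparison of the textual content before and after. This is a minimal sketch, not a complete test: the function names and the 0.99 threshold are illustrative assumptions, the text of each document is assumed to have been extracted already by a separate tool, and the check says nothing about styles, embedded objects, tracked changes or metadata, which need their own tests.

```python
import difflib
import re


def normalise(text: str) -> str:
    """Collapse runs of whitespace so that layout-only changes are ignored."""
    return re.sub(r"\s+", " ", text).strip()


def text_similarity(original: str, migrated: str) -> float:
    """Return a ratio from 0.0 to 1.0; 1.0 means the normalised text is identical."""
    matcher = difflib.SequenceMatcher(
        None, normalise(original), normalise(migrated), autojunk=False
    )
    return matcher.ratio()


def failed_migrations(
    samples: dict[str, tuple[str, str]], threshold: float = 0.99
) -> list[str]:
    """Return the names of sample documents whose similarity is below the threshold.

    Each value in samples is an (original text, migrated text) pair.
    The default threshold is an illustrative assumption, not a recommendation.
    """
    return [
        name
        for name, (original, migrated) in samples.items()
        if text_similarity(original, migrated) < threshold
    ]
```

Documents flagged by such a check still need manual review, since a small textual difference may be acceptable in one document and significant in another.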
Within a family of file formats
Upgrading within a family of file formats generally poses few direct continuity risks, as most file
formats are specifically engineered to be backwards-compatible with earlier versions of the
‘same’ format. However, migration is never risk-free, and some small changes to documents
may be found – e.g. styles and formatting may change. By contrast, downgrading to earlier
versions may entirely lose formatting, embedded objects, programmatic code or other advanced
features depending on what is supported in the earlier versions. The textual content itself is
usually preserved when downgrading.
Across format families
Migrating from one broad type of document file format to an entirely different one poses the
highest direct continuity risks. No two broad families of document file format support exactly the
same features, in the same ways, so some change and loss to a document should be expected.