The parts form a tree. If a part has child parts, it must have a relationships part which identifies these.
The part which contains the main text of the document is the Main Document Part. Each Part has a
name. The name of the Main Document Part is usually "/word/document.xml".
If the document has a header, then the main document part would have a header child part, and this
would be described in the main document part's relationships part.
Similarly for any images. To see the structure of any given document, see "Parts List" further below.
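The parent/child arrangement described above can be sketched in plain Java. This is only an illustration and does not use the docx4j API: the class PartTreeSketch, its methods, and the sample part names /word/header1.xml and /word/media/image1.png are assumptions made up for the example. The one rule it relies on is the Open Packaging Conventions naming convention, under which the relationships part for a part sits in a _rels folder beside it, with .rels appended to the file name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of docx4j.
class PartTreeSketch {

    // OPC naming convention: the relationships part for /dir/name
    // is /dir/_rels/name.rels
    static String relsPartNameFor(String partName) {
        int slash = partName.lastIndexOf('/');
        String dir = partName.substring(0, slash + 1);
        String file = partName.substring(slash + 1);
        return dir + "_rels/" + file + ".rels";
    }

    // Parent part name -> child part names its relationships part identifies.
    // The child names here are assumed sample values.
    static Map<String, List<String>> sampleRelationships() {
        Map<String, List<String>> rels = new LinkedHashMap<>();
        rels.put("/word/document.xml",
                List.of("/word/header1.xml", "/word/media/image1.png"));
        return rels;
    }

    // Depth-first walk of the tree, indenting each level by two spaces.
    static List<String> describe(String partName,
                                 Map<String, List<String>> rels, int depth) {
        List<String> lines = new ArrayList<>();
        lines.add("  ".repeat(depth) + partName);
        for (String child : rels.getOrDefault(partName, List.of())) {
            lines.addAll(describe(child, rels, depth + 1));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rels = sampleRelationships();
        describe("/word/document.xml", rels, 0).forEach(System.out::println);
        System.out.println("relationships part: "
                + relsPartNameFor("/word/document.xml"));
    }
}
```

In a real package the child names would be read from the relationships part itself rather than hard-coded as they are here.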
An introduction to WordML is beyond the scope of this document. You can find a very readable
introduction in the 1st edition Part 3 (Primer) at http://www.ecma-
(a better link for the 1st edition (Dec 2006), since it's not zipped up).
The Office Open XML file formats were standardised between December 2006 and November 2008,
first by the Ecma International consortium (where they became ECMA-376), and subsequently ..
by the ISO's Joint Technical Committee 1 (where they became ISO/IEC 29500:2008).
The Ecma-376.htm link also contains the 2nd edition documents (of Dec 2008), which are
"aligned with ISO/IEC 29500".
Office 2007 SP2 implements ECMA-376 1st Edition; this is what docx4j implements.
ISO/IEC 29500 (ECMA-376 2nd Edition) has Strict and Transitional conformance classes.
Office 2010 supports Transitional, and also has read-only support for Strict.
Docx4j has 3 layers:
OpenPackaging handles things at the Open Packaging Conventions level: unzipping a docx into a
WordprocessingMLPackage and a set of objects inheriting from Part; allowing parts to be
added/deleted; saving the docx