Recommendations for metadata and data formats for
online availability and long-term preservation, version
Succeed is supported by the European Union under FP7-ICT and coordinated by Universidad de Alicante.
The most popular descriptive metadata format is Dublin Core (in full, the Dublin
Core Metadata Element Set, abbreviated as DCMES), which is a globally recognized ISO
standard. 71% of existing recommendations and 59% of survey respondents
indicated it as the main format for descriptive metadata in the context of long-term
preservation. It is a simple and easy-to-use XML-based format. The simplicity of DCMES
is both an advantage and a disadvantage. It is an advantage because many institutions
can adopt it easily. It is a disadvantage because the meaning of particular
elements in the standard is not strictly defined, which may lead to inconsistent
interpretation. If a more detailed description is needed, the Dublin Core Metadata
Initiative Terms (DCTerms) can be used, as they include all the elements from DCMES
and add further ones that allow for more precise description.
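The difference between a loosely defined DCMES element and a DCTerms refinement can be illustrated with a minimal Python sketch. It uses only the standard library; the wrapper element `record`, the helper `build_record`, and the sample values are assumptions made for illustration and are not part of either standard.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the DCMES 1.1 elements and of DCTerms.
DC_NS = "http://purl.org/dc/elements/1.1/"
DCTERMS_NS = "http://purl.org/dc/terms/"

ET.register_namespace("dc", DC_NS)
ET.register_namespace("dcterms", DCTERMS_NS)


def build_record(fields):
    """Build a flat XML record from (namespace, element, value) triples."""
    root = ET.Element("record")  # hypothetical wrapper, not defined by Dublin Core
    for ns, name, value in fields:
        child = ET.SubElement(root, f"{{{ns}}}{name}")
        child.text = value
    return root


record = build_record([
    (DC_NS, "title", "Example digitised newspaper"),
    # dc:date alone does not say which date is meant (creation, issue, digitisation).
    (DC_NS, "date", "1905"),
    # dcterms:issued narrows the meaning to the date of formal issuance.
    (DCTERMS_NS, "issued", "1905-03-14"),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The same record carries both the generic `dc:date` and the more specific `dcterms:issued`, which shows how DCTerms can reduce the ambiguity without abandoning the simple element set.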
The MODS format is quite popular, with relatively high adoption in the user community
(16% of respondents use it for preservation, and 47% of existing recommendations
indicate it as a good option). MODS is based on XML and can contain a richer
description than Dublin Core. It is also derived from MARC21 (though it cannot carry
full MARC21 records), so MODS records can be easily created from existing MARC21 records.
MARC21 was also indicated in existing recommendations and in the survey. Nevertheless,
it is not highly recommended, as it has several interoperability issues. It has a
specific encoding scheme for transportation purposes (the MARC21 communication format),
but it is not simple, not self-descriptive, and definitely not human-readable. An
additional complication is that MARC21 records can be encoded using different character
encodings. This may cause further issues: for instance, the offsets indicated in the
MARC21 leader (header) depend on characters and not bytes, and some characters can
occupy more than one byte, depending on the encoding. This means that the encoding
needs to be known before processing, and it is not available in the file itself. For
these reasons the MARC21 format is proposed as an alternative.
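The mismatch between character counts and byte counts can be shown with a small Python sketch. This is a toy illustration, not a MARC21 parser: the two sample field values and the idea of storing them back to back are assumptions chosen only to show why an offset is ambiguous when the encoding is unknown.

```python
# Two hypothetical field values stored one after another in a record body.
first = "Dvořák, Antonín"
second = "Symphony no. 9"
body = first + second

# Length of the first field counted in characters.
char_offset = len(first)

# Length of the same field in bytes under two different encodings.
utf8_offset = len(first.encode("utf-8"))
latin2_offset = len(first.encode("iso-8859-2"))

print(char_offset, utf8_offset, latin2_offset)

# With a single-byte encoding, the character offset also works on the raw bytes.
raw_latin2 = body.encode("iso-8859-2")
assert raw_latin2[char_offset:].decode("iso-8859-2") == second

# With UTF-8, the character offset points into the middle of the first field,
# so the second field is found only when the byte offset is used.
raw_utf8 = body.encode("utf-8")
assert raw_utf8[char_offset:] != second.encode("utf-8")
assert raw_utf8[utf8_offset:].decode("utf-8") == second
```

The three accented letters each take two bytes in UTF-8 but one byte in ISO 8859-2, so the same offset value selects different data depending on the encoding that the reader assumes.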
Structural metadata format
For structural metadata the only option is the METS format; in practice there is no
real alternative. It is already used by 36% of survey respondents and is
indicated by existing recommendations in 59% of cases. It is an XML-based open
standard that is simple to apply and supports various specific formats, including MODS,
ALTO, TextMD, MIX and PREMIS (all of which are recommended by the Succeed project). It is
therefore the best option (and in practice the only one) for structural metadata
in long-term preservation.
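How METS acts as a container for other formats can be sketched in Python with the standard library. The example below embeds a minimal MODS description in a METS descriptive metadata section and links it from a structural map. The identifier `DMD1`, the `TYPE` values, the helper names, and the sample title are assumptions for illustration, and the output is not validated against the METS or MODS schemas.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def q(ns, name):
    """Return a namespace-qualified tag name in ElementTree notation."""
    return f"{{{ns}}}{name}"


def build_mets(title):
    """Build a minimal METS document that wraps one MODS record."""
    mets = ET.Element(q(METS_NS, "mets"))

    # Descriptive metadata section: MODS embedded via mdWrap/xmlData.
    dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    mods = ET.SubElement(xml_data, q(MODS_NS, "mods"))
    title_info = ET.SubElement(mods, q(MODS_NS, "titleInfo"))
    ET.SubElement(title_info, q(MODS_NS, "title")).text = title

    # Structural map: one division that points back to the MODS record.
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"), {"TYPE": "PHYSICAL"})
    ET.SubElement(struct_map, q(METS_NS, "div"), {"TYPE": "book", "DMDID": "DMD1"})
    return mets


doc = build_mets("Example digitised book")
print(ET.tostring(doc, encoding="unicode"))
```

Administrative metadata such as PREMIS, MIX or TextMD would be wrapped in a similar way inside an administrative metadata section, which is why a single METS file can serve as the structural backbone of a preservation package.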
Administrative metadata format
Recommended: PREMIS, MIX, TextMD