into a single searchable resource, allowing users to search across multiple titles with a
single query. To achieve this while allowing institutions the flexibility to incorporate
materials into their own catalog systems and online services, NDNP awardees must
ensure LC has access to updated title-level bibliographic records from CONSER and
metadata for various levels of granularity within the digital reproductions.
Each newspaper digitized through NDNP must be supported by coherent metadata, to
provide intellectual access and support navigation of the structure of the publication, by
date, section, etc. The tables in Appendix A list the elements appropriate at the
newspaper title level, the issue/edition level, and the page level. [The tables indicate
whether elements are mandatory and whether they are repeatable.] The access interface
will permit direct identification and citation at each level through persistent identifiers.
The identification of newspaper titles will be based on Library of Congress Control
Numbers (LCCNs), since not all historical newspapers have been assigned International
Standard Serial Numbers (ISSNs) or another unique identifier. These metadata
specifications will be discussed at the awardees’ annual meeting.
All newspaper titles selected for digitization under NDNP must be under bibliographic
control per U.S. newspaper cataloging guidelines maintained by the Cooperative Online
Serials Cataloging (CONSER) program and included in the CONSER database hosted
within the OCLC Online Union Catalog (WorldCat). Each title must have a full
bibliographic record at the title level for the original materials (not microfilm) and
associated holdings information. If pre-existing, the CONSER records must be reviewed
and updated as necessary by the awardee institution and exported and delivered to LC
before submission of associated digitized pages. Such export records should be in
MARC 21 Communications format, UTF-8 encoding.
All LCCNs provided in metadata must be normalized to the MARC 21 standard.
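As an illustration of what LCCN normalization can involve, the following minimal sketch applies the normalization steps LC publishes for the LCCN namespace: remove blanks, drop any slash suffix, and zero-pad the digits after a hyphen to six places. Treating that procedure as the normalized form required here is an assumption, and the function name `normalize_lccn` is hypothetical.

```python
def normalize_lccn(raw: str) -> str:
    """Return a normalized LCCN (assumed rules; not taken from this document)."""
    # 1. Remove all blanks.
    value = "".join(raw.split())
    # 2. Drop a forward slash and everything to its right.
    value = value.split("/", 1)[0]
    # 3. Remove a hyphen and left-pad the digits after it to six places.
    if "-" in value:
        prefix, serial = value.split("-", 1)
        if not serial.isdigit() or len(serial) > 6:
            raise ValueError(f"unexpected serial part in LCCN: {raw!r}")
        value = prefix + serial.zfill(6)
    return value
```

Under these assumed rules, `normalize_lccn("sn 83045462")` yields `sn83045462`.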
Provide issue/edition metadata for all known issue/edition occurrences. That is, if a microfilm
reel includes information (a target or Guide to Contents) indicating that an issue/edition was
known to be published but is not available as a digital asset at this time, create a record
for that issue/edition and use the Issue Present Indicator to indicate that the issue/edition the
record describes is not available.
Provide page metadata for all known page occurrences. That is, if a microfilm reel includes
information (a target or Guide to Contents) indicating that a page was known to be published
but is not available as a digital asset at this time, create a record for that page and use the
Page Present Indicator to indicate that the page the record describes is not available. Note,
however, that a page record should not be created for a page if the issue of which the page is
a part has been identified as missing.
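The two rules above (record every known issue and page, but create no page records under a missing issue) can be sketched as follows. This is a minimal illustration only: the class names, the dictionary keys, and the function `records_for_issue` are hypothetical and do not correspond to the element names in the NDNP XML templates.

```python
from dataclasses import dataclass, field


@dataclass
class KnownPage:
    sequence: int
    present: bool  # False: known to have been published, but not digitized


@dataclass
class KnownIssue:
    date: str           # issue date, e.g. "1900-01-01"
    edition_order: int
    present: bool       # False: known to have been published, but not digitized
    pages: list = field(default_factory=list)


def records_for_issue(issue: KnownIssue) -> list:
    """Return the metadata records to create for one known issue/edition."""
    records = [{
        "level": "issue",
        "date": issue.date,
        "edition_order": issue.edition_order,
        "issue_present": issue.present,
    }]
    # No page records are created when the issue itself is missing.
    if not issue.present:
        return records
    for page in issue.pages:
        records.append({
            "level": "page",
            "sequence": page.sequence,
            "page_present": page.present,
        })
    return records
```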
For issue, the combination of LCCN, Issue Date, and Edition Order can be used as a
unique identifier. For page, the combination of LCCN, Issue Date, Edition Order, and
Page Sequence Number will be unique.
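A brief sketch of how these composite identifiers might be modeled and checked for collisions before delivery. The type names and the helper `find_duplicates` are hypothetical, introduced only for illustration.

```python
from typing import NamedTuple


class IssueKey(NamedTuple):
    lccn: str
    issue_date: str
    edition_order: int


class PageKey(NamedTuple):
    lccn: str
    issue_date: str
    edition_order: int
    page_sequence: int


def find_duplicates(keys):
    """Return the set of keys that occur more than once."""
    seen, duplicates = set(), set()
    for key in keys:
        if key in seen:
            duplicates.add(key)
        else:
            seen.add(key)
    return duplicates
```

Because named tuples compare by value, two records with the same LCCN, issue date, and edition order produce equal keys, so a non-empty result signals that the uniqueness expectation has been violated.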
Library of Congress
08/19/2011
Page 11
In addition to issue and page metadata, also produce reel metadata objects that describe
individual scanned reels and filmed targets. Some fields, as indicated in the Metadata
Dictionary and XML templates, are optional and not used within the NDNP system to
manage or provide access to data. Awardees should use their own discretion in
determining whether capture of this data is useful for their own needs.
Awardees will deliver all digital assets in METS object structure (Metadata Encoded
Transmission Schema), according to an XML Batch template structure. (See Appendix C
– XML Metadata Templates.)
Technical Validation of Digital Objects
All NDNP Award digital objects must be validated prior to delivery to LC. NDNP
utilizes a program-specific software application - distributed to all awardees and updated
as needed - to ensure technical conformance with the digital object profiles and
specifications. The software is distributed as the NDNP Digital Viewer and Validator
(DVV), and allows users to view and validate a batch through a Windows graphical user
interface, or to validate from a DOS or Linux command line.
NDNP has developed the validation process by using and extending the JHOVE
(JSTOR/Harvard Object Validation Environment – see <http://hul.harvard.edu/jhove>)
toolkit.
JHOVE enables the identification, validation, and characterization of files. Each
file format, e.g., TIFF, is supported by a separate module. The NDNP Validation Library,
included in the NDNP DVV, "wraps" JHOVE and extends JHOVE's existing TIFF, PDF,
and JPEG2000 modules with the NDNP-specific validation rules. In addition, the
Validation Library uses a combination of existing XML schemas and Schematron
schemas, implementing validation in a custom JHOVE module, and uses JHOVE’s
format characterization abilities to populate the PREMIS and MIX sections of Issue and
Reel METS objects.
For more on the technical approach of digital object validation, see Justin Littman, “A
Technical Approach and Distributed Model for Validation of Digital Objects.” D-Lib
Magazine, May 2006 (http://www.dlib.org/dlib/may06/littman/05littman.html).
Summary of All Digital Asset Deliverables
1. Validated Master digital page image format = TIFF 6.0 uncompressed,
2. Validated OCR text file with bounding-box coordinates = 1 text file per page,
3. Validated PDF Image with Hidden Text = 1 PDF per page,
4. Validated derivative digital page image format = JPEG2000 (.JP2) using specified
compression options,
5. Validated metadata using METS in accordance with guidelines in Appendices A
and C.
Note: The four digital files associated directly with a newspaper page (.TIF, .JP2, .PDF,
and OCR) are expected to use the same file identifiers with distinct file extensions.
Valid file format examples are available for download at http://www.loc.gov/ndnp/.
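The note above on shared file identifiers lends itself to a simple pre-delivery check. The sketch below groups file names by identifier and reports any page that lacks one of the four expected files. It assumes the OCR file carries an `.xml` extension (the text above does not name the OCR extension) and that all names passed in belong to a single issue directory. The function name is hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# ".xml" for the OCR file is an assumption; the other three come from the note above.
REQUIRED_EXTENSIONS = {".tif", ".jp2", ".pdf", ".xml"}


def missing_page_files(filenames):
    """Map each file identifier to the sorted list of extensions it is missing."""
    found = defaultdict(set)
    for name in filenames:
        path = PurePosixPath(name)
        found[path.stem].add(path.suffix.lower())
    return {
        stem: sorted(REQUIRED_EXTENSIONS - extensions)
        for stem, extensions in found.items()
        if REQUIRED_EXTENSIONS - extensions
    }
```

An empty result means every identifier seen has all four companion files.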
Delivery of Digital Assets
Awardees will deliver all digital assets to LC in a METS object structure (Metadata
Encoded Transmission Schema), according to an XML Batch template structure. (See
Appendix C – XML Metadata Templates.)
For delivery, the awardee shall organize the page images and related files for each
newspaper title in a hierarchical directory structure sufficient for identification of the
individual digital assets from the metadata provided. (See Appendix D – File and
Directory Structure on Delivery Media.) Assets delivered to LC as prescribed in this
directory structure are converted by LC to conformance with the “BagIt” specification, a
hierarchical package format for transferring digital content (see
http://www.cdlib.org/inside/diglib/bagit/bagitspec.html
for background information).
A given delivery device should encompass a single batch. Awardees will name each
batch conforming to NDNP batch naming specifications. The precise directory structure
and batch naming specification will be discussed at the post-award awardee meeting and
include successive sub-directories based on LCCN, reel number, and issue date with
edition sequence. An XML Batch file should be created per the template in Appendix C.
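As a rough illustration of the nesting described above (LCCN, then reel number, then issue date with edition sequence), the sketch below builds a relative directory path. The exact layout is defined in Appendix D and at the awardee meeting, so the date format (YYYYMMDD followed by a two-digit edition number), the function name, and the sample values used with it are all assumptions.

```python
from datetime import date
from pathlib import PurePosixPath


def issue_directory(lccn: str, reel_number: str,
                    issue_date: date, edition_order: int) -> PurePosixPath:
    """Build <LCCN>/<reel number>/<issue date + edition> (assumed layout)."""
    # Assumed encoding: eight-digit date followed by a two-digit edition sequence.
    issue_part = f"{issue_date:%Y%m%d}{edition_order:02d}"
    return PurePosixPath(lccn) / reel_number / issue_part
```

With the hypothetical values `issue_directory("sn83045462", "00280776129", date(1900, 1, 1), 1)`, the result is `sn83045462/00280776129/1900010101`.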
Delivery of digital assets to LC should primarily be via tracked shipment of durable
external hard drives (preferably both USB 2.0- and FireWire-enabled). The possibility of
delivery via Internet2-enabled server-to-server file transfer will be discussed at the annual
awardees’ conference (resource planning should be based on use of durable external hard
drives). Awardees should plan for adequate temporary storage locally (approx. 54 MB per
page – including TIFF, JP2, PDF, OCR, metadata) during the transfer and verification
process at LC. Awardees should plan to deliver data batches to LC monthly (no more
than 10,000 pages per month), with an expected response time of 6-8 weeks for LC data
acceptance and ingestion.
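The storage figures above translate into a simple estimate. The sketch below reads the per-page figure as megabytes and uses decimal units (1 GB = 1,000 MB); both readings are assumptions, as is the function name.

```python
MB_PER_PAGE = 54              # approximate per-page total from the text above
MAX_PAGES_PER_MONTH = 10_000  # monthly delivery ceiling from the text above


def batch_storage_gb(pages: int, mb_per_page: float = MB_PER_PAGE) -> float:
    """Estimate temporary storage for a batch, in decimal gigabytes."""
    return pages * mb_per_page / 1000
```

A full 10,000-page monthly batch therefore needs about 540 GB. Because LC acceptance may take 6-8 weeks, more than one batch may need to be retained locally at the same time, so a planning figure above 540 GB is prudent.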
Further options and specifications for delivery will be specified at the initial 2011-13
awardees’ meeting, post-award.
Appendices
(NOTE: Latest versions of these specifications (in use by 2010-12 NDNP Awardees) are
available on the Profiles and Specifications page of the LC NDNP Web Site at
http://www.loc.gov/ndnp/ )
Appendix A: Digital Asset Metadata Elements - Dictionary
NOTES:
- Metadata elements below are described by original object. Elements may appear in more than one digital object per NDNP specifications.
Each element is described by the following columns:
- Data Description
- Data Type
- Example
- Notes
- Repeatable: R = repeatable; NR = non-repeatable
- Mandatory: M = mandatory; MA = mandatory, if available; O = optional
- Xpath (see XML templates) and/or data location
General Information
Awardee Name
- Data Type: string
- Example: New York Public Library
- Notes: Name of institution that received the NEH award
- Repeatable: NR
- Mandatory: M
- Xpath and/or data location:
  mets:mets[@TYPE="urn:library-of-congress:ndnp:mets:newspaper:issue"]/metsHdr/mets:agent/mets:name
Award Year
- Data Type: enumeration
- Example: 2011
- Notes: Year of NEH award under which the digitization of this content was funded. Valid values are: 2008, 2009, 2010, 2011
- Repeatable: NR
- Mandatory: M
- Xpath and/or data location:
  Xml:xml[@TYPE="urn:library-of-congress:ndnp:batch"]/batchHdr/batch:agent/batch:awardYear
Original Source Repository
- Data Type: string
- Example: Multiple examples: Library of Congress; Washington, DC or New York Public Library; New York, NY
- Notes: Owner of original source that was digitized (microfilm or paper); city and state postal abbreviation
- Repeatable: NR
- Mandatory: M
- Xpath and/or data location:
  mets:mets[@TYPE="urn:library-of-congress:ndnp:mets:newspaper:issue"]/mets:dmdSec[@ID="pageModsBib1"]/mets:mdWrap/mets:xmlData/mods:mods/mods:relatedItem[@type="original"]/mods:location/mods:physicalLocation/@displayLabel
  ~or~
  mets:mets[@TYPE="urn:library-of-congress:ndnp:mets:microfilmReel"]/mets:dmdSec[@ID="targetModsBib1"]/mets:mdWrap/mets:xmlData/mods:mods/mods:relatedItem/mods:location/mods:physicalLocation/@displayLabel
Original Source Repository Code
- Data Type: enumeration
- Example: dlc
- Notes: Normalized MARC organization code of owner of source. See http://www.loc.gov/marc/organizations/org-search.php for more information and code list.
- Repeatable: NR
- Mandatory: MA
- Xpath and/or data location:
  mets:mets[@TYPE="urn:library-of-congress:ndnp:mets:newspaper:issue"]/mets:dmdSec[@ID="pageModsBib1"]/mets:mdWrap/mets:xmlData/mods:mods/mods:relatedItem[@type="original"]/mods:location/mods:physicalLocation
  ~or~
  mets:mets[@TYPE="urn:library-of-congress:ndnp:mets:microfilmReel"]/mets:dmdSec[@ID="targetModsBib1"]/mets:mdWrap/mets:xmlData/mods:mods/mods:relatedItem/mods:location/mods:physicalLocation
Digital Responsible Institution
- Data Type: string
- Example: Multiple examples: Library of Congress; Washington, DC or Library of Virginia; Richmond, VA
- Notes: Awardee institution; city and state postal abbreviation
- Repeatable: NR
- Mandatory: M
- Xpath and/or data location:
  mets:mets[@TYPE="urn:library-of-congress:ndnp:mets:newspaper:issue"]/mets:dmdSec[@ID="pageModsBib1"]/mets:mdWrap/mets:xmlData/mods:mods/mods:note[@type="agencyResponsibleForReproduction"]/@displayLabel
  ~or~
  mets:mets[@TYPE="urn:library-of-congress:ndnp:mets:microfilmReel"]/mets:dmdSec[@ID="techTargetModsBib"]/mets:mdWrap/mets:xmlData/mods:mods/mods:note/@displayLabel
  ~or~
  mets:mets[@TYPE="urn:library-of-congress:ndnp:mets:microfilmReel"]/mets:dmdSec[@ID="targetModsBib1"]/mets:mdWrap/mets:xmlData/mods:mods/mods:note/@displayLabel
  ~or~
  TIFF: ImageProducer
  ~or~
  PDF: rdf:Description/dc:description/rdf:Alt/rdf:li
  ~or~
  JPEG2000: rdf:Description/dc:description/rdf:Alt/rdf:li
Digital Responsible Institution Code
- Data Type: enumeration
- Example: Multiple examples: dlc or vi
- Notes: Normalized MARC organization code of Awardee. See http://www.loc.gov/marc/organizations/org-search.php.
- Repeatable: NR
- Mandatory: MA
- Xpath and/or data location:
  mets:mets[@TYPE="urn:library-of-congress:ndnp:mets:newspaper:issue"]/mets:dmdSec[@ID="pageModsBib1"]/mets:mdWrap/mets:xmlData/mods:mods/mods:note[@type="agencyResponsibleForReproduction"]
  ~or~
  mets:mets[@TYPE="urn:library-of-congress:ndnp:mets:microfilmReel"]/mets:dmdSec[@ID="techTargetModsBib"]/mets:mdWrap/mets:xmlData/mods:mods/mods:note
  ~or~
  mets:mets[@TYPE="urn:library-of-congress:ndnp:mets:microfilmReel"]/mets:dmdSec[@ID="targetModsBib1"]/mets:mdWrap/mets:xmlData/mods:mods/mods:note
  ~or~
  Xml:xml[@TYPE="urn:library-of-congress:ndnp:batch"]/batchHdr/batch:agent/batch:awardee
Batch name (Sample) and Batch name (Production)
- Data Type: String (both)
- Example: batch_dlc_2009sample (sample); batch_dlc_alpha (production)
- Notes: For initial sample batch, use this naming structure: batch_[MARC organization code*]_[year of award]sample
  For production batches, use this naming structure: batch_[MARC organization code*]_[keyword**].
  * For MARC Organization code, see http://www.loc.gov/marc/organizations/org-search.php.
  ** Batch keywords are unique within the deliveries of a given awardee throughout their
- Repeatable: NR
- Mandatory: M
- Xpath and/or data location:
  Xml:xml[@TYPE="urn:library-of-congress:ndnp:batch"]/batchHdr/batch:agent/batch:name
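The two batch naming structures described for this element can be sketched as below. The character sets accepted for the MARC organization code and the keyword (lowercase letters, digits, and, for the code, hyphens) are assumptions, since the dictionary does not state them. All function and constant names are hypothetical.

```python
import re

# Assumed character sets; the dictionary entry does not specify them.
SAMPLE_RE = re.compile(r"batch_([a-z0-9-]+)_(\d{4})sample")
PRODUCTION_RE = re.compile(r"batch_([a-z0-9-]+)_([a-z0-9]+)")


def sample_batch_name(org_code: str, award_year: int) -> str:
    """batch_[MARC organization code]_[year of award]sample"""
    return f"batch_{org_code}_{award_year}sample"


def production_batch_name(org_code: str, keyword: str) -> str:
    """batch_[MARC organization code]_[keyword]"""
    return f"batch_{org_code}_{keyword}"


def classify_batch_name(name: str):
    """Return "sample", "production", or None if the name fits neither pattern."""
    # Sample names are tested first because they also fit the looser production pattern.
    if SAMPLE_RE.fullmatch(name):
        return "sample"
    if PRODUCTION_RE.fullmatch(name):
        return "production"
    return None
```

The two examples given for this element, `batch_dlc_2009sample` and `batch_dlc_alpha`, classify as sample and production respectively.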