FILENAMES AND DELIVERY DIRECTORIES - LOT 1 AND LOT 2
The contractor shall assign a digital-image filename to each image captured as part of the initial image-capture process, and deliver
these files to the Library in a specified arrangement of directories and subdirectories, following the specifications outlined in Section J,
Attachment 4. These are called delivery directories. The filename and directory structure is essential because it will facilitate future access
to the images and texts. The contractor shall deliver the images and texts in delivery directories, which the Library will archive in
repository directories that parallel those created for delivery. These directories and the names of the files they contain provide the
structure for the Library's digital repository, the institution's archive of digital information. The directory names and filenames link the
images and texts to elements in the Library's collection-retrieval system. The content of the digital repository is stored on UNIX-based
servers at the Library of Congress. The Library, however, anticipates that content will be produced and delivered using equipment that
employs the MS-DOS operating system. In addition, sets of images may be delivered to third parties who use IBM-compatible, DOS-based
computers. For this reason, the directory names and filenames shall conform to DOS naming conventions. Because UNIX treats uppercase
and lowercase letters as distinct, any letters in file or directory names shall be lowercase. Since filename extensions will be assigned
according to file type (e.g., .tif, .jif or .jpg), the first eight characters (the filename proper) are especially important.
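As an illustration only, the naming constraints above can be checked mechanically. The sketch below assumes that "DOS naming conventions" means the 8.3 format (a name of at most eight characters plus an optional extension of at most three) and accepts only lowercase letters, digits, and the underscore, which is stricter than DOS itself; the function name is_valid_name is hypothetical, and Section J, Attachment 4 remains the governing specification.

```python
import re

# Assumption: DOS 8.3 names restricted to lowercase letters, digits, and "_".
# (DOS permits a few more punctuation characters; this check is deliberately stricter.)
NAME_83 = re.compile(r"[a-z0-9_]{1,8}(\.[a-z0-9_]{1,3})?")


def is_valid_name(name: str) -> bool:
    """Return True if name is a lowercase DOS-style 8.3 file or directory name."""
    return NAME_83.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["bj06001", "00000001.tif", "BJ06001", "page0001a.jpg", "0001.tiff"]:
        print(f"{candidate:15} {is_valid_name(candidate)}")
```

Only the first two candidates pass: BJ06001 uses uppercase letters, page0001a has a nine-character name proper, and .tiff is a four-character extension.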
Identifiers Used to Name Directories
Specifications for file- and directory-naming are outlined in Section J, Attachment 4. The particular file- and directory names shall be
assigned by interpreting these general specifications. The Library will specify an identifier for a delivery directory. Identifiers
are unique names which distinguish one item from another. Under this contract, an item may be any of the following: a book or
pamphlet, a folder in a manuscript collection, or a document within a folder of a manuscript collection. An identifier is the prefix or
left-side (right-truncated) portion of a name that may contain as many as eight characters. For example, the identifier bj06 might be
used as the basis for assigning the directory names bj06001 (for the first 300 files), bj06002 (for the second 300 files), bj06003 (for the
third 300 files), and bj06004 (for the fourth 300 files). Other, similar patterns may also be specified for other identifiers, as outlined in
Section J, Attachment 4. The identifiers used to name directories also appear in the cataloging or finding-aid data the Library employs
in its retrieval systems. When a researcher has found an item of interest in a catalog or finding aid and executes a fetch command, the
retrieval system uses the identifier to locate the appropriate repository directory in the Library's digital archive and proceeds to retrieve
the appropriate set of image or text files.
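The bj06 example can be sketched as follows, assuming a three-digit, zero-padded sequence suffix and 300 files per directory as in that example (other identifiers may follow other patterns under Section J, Attachment 4). The function names directory_names and directory_for_file are hypothetical.

```python
import math

FILES_PER_DIR = 300  # taken from the bj06 example; other identifiers may differ


def directory_names(identifier: str, file_count: int,
                    files_per_dir: int = FILES_PER_DIR) -> list[str]:
    """Return bj06001, bj06002, ... style names for file_count files."""
    n_dirs = math.ceil(file_count / files_per_dir)
    names = [f"{identifier}{i:03d}" for i in range(1, n_dirs + 1)]
    for name in names:
        # Identifier plus three-digit suffix must fit the 8-character DOS limit.
        if len(name) > 8 or name != name.lower():
            raise ValueError(f"not a valid lowercase 8-character name: {name}")
    return names


def directory_for_file(identifier: str, file_index: int,
                       files_per_dir: int = FILES_PER_DIR) -> str:
    """Return the directory that holds the file at 1-based position file_index."""
    return f"{identifier}{(file_index - 1) // files_per_dir + 1:03d}"


if __name__ == "__main__":
    print(directory_names("bj06", 1200))
    print(directory_for_file("bj06", 301))
```

For 1,200 files this yields bj06001 through bj06004, and file 301 falls in bj06002. Because the identifier and suffix together must fit in eight characters, this particular pattern leaves room for an identifier of at most five characters.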
File and Directory Structures
Assigning filenames and naming directories for the collections shall be performed according to the four structures identified below and
in accordance with the detailed specifications found in Section J, Attachment 4.
1. Unnumbered documents in folder structure
2. Bibliographic record/print-page number structure
a. When printed page numbers are tracked
b. When printed page numbers are not tracked
3. Serials structure
a. When printed page numbers are tracked
b. When printed page numbers are not tracked
4. Copyright-registration-number and technical-document structure
The identifying target for each item will indicate the naming scheme to be applied to that item.
To assign filenames and enter data into the scanning log correctly, the contractor shall take the actions specified below for the various
document and collection features.
New Folders in Manuscript Collections
The start of each new file folder shall be identified during the scanning of folders within a manuscript collection. If not already marked
on the folder, a number shall be assigned to the folder. These numbers or names shall be used to assign names to the delivery directories.
New Documents in Manuscript Folders
The beginning of new documents (reports, letters, etc.) shall be identified and "new document" shall be indicated as a feature.
Features and Page Numbers in Printed Matter
For some books or other printed matter, the presence of at least four types of features: title pages, tables of contents, lists of
illustrations, indexes, and/or cumulative tables of contents or indexes (for serials). In addition, the actual printed page numbers (when
present) for certain books or magazines shall also be identified. Special codes shall be used to embed feature-identifiers in the
Derivative Images or Multiple Versions of Entire Page
When derivative images or multiple versions of page images are required, e.g., to successfully reproduce printed halftones and finely
inscribed line art in a second image, filenames shall be assigned to the images in accordance with the specifications in Section J, Attachment 4.
Images of Segments of Pages
When multiple segments of a single sheet or page are created, filenames shall be assigned to the images in accordance with the
specifications in Section J, Attachment 4.7.
Resolution targets shall be scanned according to the specifications in C.4.8 and named in accordance with the specifications in Section
J, Attachment 4.
TEXT CONVERSION AND SGML-ENCODING - LOT 1
The Library of Congress encodes converted texts with Standard Generalized Markup Language (SGML) following the guidelines of the Text
Encoding Initiative (TEI). Encoding permits the retention of certain elements and features that would be lost in simple word-based ASCII
conversion. These include structural elements (front matter, chapters, illustration location) and such features as highlighted text (for
example, bold or italics). The retention of these elements and features permits the interchange of texts with reduced loss of meaning,
the loading of texts into access software that interprets and displays materials as encoded, and the use of retrieval software that can, for
example, give added weight to certain portions of a text. In 1992, the Library developed a Document Type Definition (DTD) for the
SGML-encoding of its historical materials. The contractor shall comply with the DTD, Tag Library, and supplementary keying
instructions provided by the Library for the conversion of textual materials. The contractor shall synthesize these Library-furnished
materials in order to create a single instruction set. The historical documents (American Memory) DTD and associated keying and
encoding instructions are provided in Section J, Attachment 6 and Attachment 7.
The texts shall be delivered in the ISO 646 character set. The IBM-extended characters (letters modified with diacritics), however, shall be
coded as entity references drawn from the standard public entity sets declared in ISO 8879.
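The character-set rule can be illustrated with a small conversion routine. This is a sketch under stated assumptions: it uses Python's html.entities.codepoint2name table, whose names for Latin-1 characters (for example eacute, ntilde) match those of the ISO 8879 public entity sets, though the Library's DTD and keying instructions determine which entity sets are actually declared. It also assumes that "IBM-extended characters" means IBM PC code page 437, and that the SGML markup characters & and < in running text have already been handled. The function name to_iso646 is hypothetical; characters with no known entity name are reported as errors rather than guessed.

```python
from html.entities import codepoint2name


def to_iso646(text: str) -> str:
    """Keep 7-bit characters; replace others with named entity references."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)  # 7-bit character: passes through unchanged
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # e.g. U+00E9 -> &eacute;
        else:
            # Do not guess: an unnamed character needs a human decision.
            raise ValueError(f"no entity name known for U+{code:04X}")
    return "".join(out)


if __name__ == "__main__":
    raw = b"caf\x82 con le\xa4a"  # assumed code page 437 bytes: 0x82 = e-acute, 0xA4 = n-tilde
    print(to_iso646(raw.decode("cp437")))  # caf&eacute; con le&ntilde;a
```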
Text Conversion and SGML-encoding from Image Sets Scanned by the Contractor
The contractor shall create SGML-encoded, machine-readable texts from image sets that have been scanned under this contract. The
contractor shall be responsible for preparing and delivering image sets for conversion; this will include activities such as checking that
all required rework has been completed, integrating rework into image sets, copying image sets onto a suitable delivery medium, etc.
The contractor shall also be responsible for tracking the progress of conversion work.
Text Conversion and SGML-encoding from Existing Image Sets
The contractor shall create SGML-encoded, machine-readable texts from image sets provided by the Library and not produced under
this contract when those image sets meet the image-type and image-quality requirements set forth in this document. These image sets
will be provided by the Library on write-once CD-ROM disks in directories and with filenames that meet the requirements set forth in
this document. Paper targets to accompany each document represented in the image set will be provided by the Library; these targets
will contain all the information needed for the document's text header, as described in the Keying Instructions, Section J, Attachment
The delivered SGML-encoded texts from image sets provided by the Library shall meet the requirements for text provided in
preceding sections. If the conversion process requires it, the contractor shall reprocess the image sets or copy them onto other
media.
Prior to assigning a task order for the conversion of image sets not produced under this contract, the contractor will be given an
opportunity to examine the images and media and to determine if the minimum image requirements of this contract can be met.
Texts and Associated Files
All SGML-encoded texts shall be named according to specifications and delivered as ASCII files on the media indicated. Each
delivery shall be accompanied by a printed memo and printed directory list. Several types of associated files shall also be provided for
each SGML-encoded text: page information group files, reference files, omission report files and ENTITY files. A description of each
file follows. (See Section J, Attachment 8, for examples of each.)
Page Information Group Files
For every SGML-encoded document, the Library will confirm that: control page numbers match document filenames, control page
numbers and print page numbers progress in the proper sequence, and there are no unaccountable variations in the relationship
between control page numbers and print page numbers. Therefore, each delivery of SGML-encoded text shall include a set of
machine-readable ASCII files, one file for each full-text document, that contains a list of all the page information group tags and their
contents, in the order in which they appear in the document.
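The page information group file lends itself to simple automation. In the sketch below the tag name pginfo and the control attribute are hypothetical placeholders (the real names come from the Library's DTD and Tag Library), and the regular expression assumes that every page information group has an explicit end tag, which SGML does not require in general. The second function flags places where control page numbers do not advance by exactly one, which is one of the checks the Library describes.

```python
import re

# Hypothetical names: replace with the page-information tag and attribute in the Library's DTD.
PAGE_TAG = "pginfo"
CONTROL_ATTR = re.compile(r'\bcontrol\s*=\s*"?(\d+)', re.IGNORECASE)


def page_info_tags(sgml: str, tag: str = PAGE_TAG) -> list[str]:
    """Return every <tag ...>...</tag> element in document order, whitespace collapsed."""
    name = re.escape(tag)
    pattern = re.compile(rf"<{name}\b[^>]*>.*?</{name}\s*>", re.IGNORECASE | re.DOTALL)
    return [" ".join(m.group(0).split()) for m in pattern.finditer(sgml)]


def sequence_breaks(tags: list[str]) -> list[tuple[int, int]]:
    """Return (previous, current) control page numbers that do not advance by one."""
    numbers = [int(m.group(1)) for t in tags if (m := CONTROL_ATTR.search(t))]
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b != a + 1]


if __name__ == "__main__":
    doc = ('<text><pginfo control="0001">p. i</pginfo><p>Body text</p>\n'
           '<PGINFO control="0003">\np. ii</PGINFO></text>')
    tags = page_info_tags(doc)
    print("\n".join(tags))        # one line per tag, as in the delivered file
    print(sequence_breaks(tags))  # control number skips 2
```

Writing the returned lines to one ASCII file per document would produce the list the Library asks for; tag-name matching is case-insensitive because SGML's reference concrete syntax folds element names.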
Reference Files
Pointers to internal and external references in SGML-encoded documents (such as all occurrences of illustration, table, and note
reference tags with attributes) shall be reported in a set of machine-readable ASCII files, one file for each full-text document.
Omission Report Files
Text that is illegible is marked with an SGML tag to show its placement. A report of the line-number location of any omitted text shall
be delivered in a set of machine-readable ASCII files, one file for each full-text document.
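A sketch of an omission report, assuming the illegible-text tag is named gap (the name the TEI Guidelines use for omissions; the Library's DTD may differ) and that line numbers are counted from 1 in the delivered file. The function names and the report wording are hypothetical.

```python
import re


def omission_report(sgml: str, tag: str = "gap") -> list[int]:
    """Return 1-based line numbers of lines that contain an omission tag."""
    start_tag = re.compile(rf"<{re.escape(tag)}\b", re.IGNORECASE)
    return [n for n, line in enumerate(sgml.splitlines(), start=1) if start_tag.search(line)]


def format_report(doc_name: str, lines: list[int]) -> str:
    """Render the report as plain ASCII text, one omission per line."""
    return "\n".join(f"{doc_name}: omitted text at line {n}" for n in lines)


if __name__ == "__main__":
    text = "<p>First line\n<p>The <gap reason=illegible> word\n<p>Fine\n<p><GAP> again"
    print(format_report("bj06001.sgm", omission_report(text)))
```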
ENTITY Files
ENTITY references are used to link to external files. ENTITY references in the SGML file point to an external entity file that lists
each ENTITY value, its corresponding filename and MIME type. Therefore, for all converted text a set of machine-readable ASCII
files, one file for each full-text document, shall be provided that contains a list of the ENTITY values found in the document plus the
filename associated with the ENTITY value.
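A sketch of the ENTITY file, assuming external entities are declared in standard SGML form (<!ENTITY name SYSTEM "filename" NDATA notation>) and referenced in the text through an attribute named entity. That attribute name and the MISSING marker for undeclared values are illustrative assumptions rather than Library requirements, and the function name entity_list is hypothetical. Each ENTITY value is listed once, in order of first occurrence.

```python
import re

# Standard SGML external entity declaration: <!ENTITY name SYSTEM "file" ...>
DECL = re.compile(r'<!ENTITY\s+(\S+)\s+SYSTEM\s+"([^"]+)"', re.IGNORECASE)
# Hypothetical: the text refers to external entities via an attribute named "entity".
REF = re.compile(r'\bentity\s*=\s*"?([\w.-]+)', re.IGNORECASE)


def entity_list(entity_file: str, sgml: str) -> list[tuple[str, str]]:
    """Return (ENTITY value, filename) pairs for each value used in sgml."""
    filenames = dict(DECL.findall(entity_file))
    used = dict.fromkeys(REF.findall(sgml))  # de-duplicate, keep first-seen order
    return [(value, filenames.get(value, "MISSING")) for value in used]


if __name__ == "__main__":
    decls = ('<!ENTITY bj060001 SYSTEM "bj060001.gif" NDATA gif>\n'
             '<!ENTITY bj060002 SYSTEM "bj060002.gif" NDATA gif>')
    doc = '<figure entity="bj060002"><p>Text</p><figure entity=bj060002><figure entity="bj060009">'
    for value, filename in entity_list(decls, doc):
        print(value, filename)
```

Flagging undeclared values instead of silently dropping them keeps the output useful for quality review as well as for delivery.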
Customized Error Diagnostic Software and Other Quality Review Tools
Currently no commercially available off-the-shelf (COTS) software has been determined to be adequate for the full range of quality
review activity required to ensure that all conversion requirements are met. Some customization of error diagnostic software may be
required to meet the quality review requirements. Any customized error diagnostic software or other automated tools used by the
contractor in quality review shall be furnished to the Library for the contract's period of performance at no additional cost. This
software (or combination of software) must be able to produce user-friendly output stating errors found and their locations. The
contractor shall also provide the Library with instructions for using this software. This software must be provided to the Library before
the delivery of the test-batch SGML-encoded texts. Customized additions and modifications to the diagnostic software shall be
provided to the Library as they are put into active use by the contractor.
RELATED SERVICES AND ACTIVITIES - LOT 1 AND LOT 2
Photocopying of Source Material
During the scanning process, certain source materials will require photocopying. When color pages are scanned as bitonal or
grayscale, it will be necessary to produce a photocopy of the original in order to compensate for the scanner's color-blindness. The
Library will provide the photocopier when scanning takes place onsite and will permit copying at no cost to the contractor.
Printed Copies of Scanned Images
Printed copies are required for the following types of scanned images:
Black and white hardcopy of bitonal images and grayscale or color images, halftoned at print-time
Grayscale hardcopy of grayscale images
Color hardcopy of color images
The need for printed copies of scanned images will also be specified for the task order (when applicable).
Programming and Processing Activities
The capability to provide different levels of technical expertise is required. It is anticipated that additional programming or processing
steps associated with scanning or conversion, modifications to the SGML DTD and related files, and adjustments to the workflow
tracking system may be required. These tasks may require different levels of technical expertise, including a processing technician or a
computer programmer, and will be specified for task orders as applicable.
CONTRACT STARTUP AND TESTING ACTIVITY - LOT 1 AND LOT 2
Because of the complexity of the Library's requirements and the variation in the Library's original materials, the first task to be
performed under this contract for both LOT 1 and LOT 2 shall entail the study of a representative cross section of items and the
production of a set of test images. For LOT 1, the task will also include the production of a set of SGML-encoded texts. The startup
and testing phase shall provide a period during which the contractor and NDLP staff shall work together to reach a
mutually agreed-upon definition of particular matters related to these technical requirements, such as the handling of printed halftone
illustrations and, for LOT 1 only, the clarification of keying and coding instructions for text conversion and SGML encoding. The
startup activity for LOT 1 shall be eight weeks, while the activity for LOT 2 shall be five weeks. The outcome of the startup and
testing activity shall 1) establish the specifications for the first task order and 2) provide for the provisional establishment of
specifications for other materials likely to be encountered in later tasks under the contract. The startup and testing activities for LOT 1
and LOT 2 will also provide an opportunity for the Library and the contractor to finalize the details of data entry in the Library's
workflow tracking system (see C.14).
Representative Types of Materials
During the contract's startup phase, the Library will furnish the following representative examples of the types of materials from which
digital images and texts shall be produced. The sample materials will be accompanied by instructions regarding the filenaming and
directory structure to be employed.
Furnished for LOT 1
Approximately 500 pages of material from a manuscript collection. These will be unnumbered documents in file folders.
30 pages of material from a copyright-deposit-series collection, e.g., sheet music. These materials will be unbound separate
sheets, possibly including left and right pages, to be imaged separately.
4 books (approximately 800 pages). 2 books from a local history collection for the states of Michigan, Wisconsin, and Minnesota
and 2 books from the Journals of the House of Representatives. These books will serve as a representative sample of bound book
collections and both images and SGML-encoded texts will be produced.
The books will be selected to represent the most frequently encountered types for scanning, e.g., books that must be scanned face up
with and without a cradle.
Furnished for LOT 2
1 book (ca. 400 pages) that is large, cumbersome, and fragile.
1 book (ca. 200 pages) containing color illustrations.
The startup phase for both LOT 1 and LOT 2 shall include the following actions:
The Library will make the items listed above available to the contractor at the Library of Congress, along with guidelines
for handling, filenaming, etc.
The contractor project manager and other contractor designated staff shall meet with the Library project manager (COTR)
and other Library staff to discuss the sample materials and delineate the various options for scanning, the SGML markup
of text, and the delivery directory structure(s).
The Library's Conservation Office will conduct an orientation session on the safe handling of originals, including use of
the contractor's book cradle, if applicable.
The contractor shall deliver and set up scanning equipment in the designated location.
The contractor shall scan the materials in the manner approved during the preceding meetings. The scanned images shall then be
forwarded to the Library as both digital files and printed copies for each image.
The contractor and the Library shall install the workflow tracking system. The Library will provide training for its use.
The Library will review the digital samples and printed copies and provide a written response to the contractor concerning
acceptability, to be followed by discussions that may be needed to resolve any issues.
The contractor shall provide feedback concerning the functionality of the installed workflow tracking system, as far as image
scanning is concerned. Revisions and adjustments will be discussed.
The startup phase for LOT 1 shall continue with the following additional actions:
The contractor performs analysis of documents for text conversion and encoding and reports its findings to the Library. After
approval by the Library, the contractor proceeds with the conversion of texts to machine-readable form with SGML encoding.