Substantive metadata is embedded in the document it describes and remains with the
document when it is moved or copied. . . .
b. System Metadata
System metadata “reflects information created by the user or by the organization's
information management system.” (citation omitted) This data may not be embedded
within the file it describes, but can usually be easily retrieved from whatever operating
system is in use. . . . Examples of system metadata include data concerning “the author,
date and time of creation, and the date a document was modified.” . . . Courts have
commented that most system (and substantive) metadata lacks evidentiary value because
it is not relevant. (citations omitted) System metadata is relevant, however, if the
authenticity of a document is questioned or if establishing “who received what
information and when” is important to the claims or defenses of a party. (citation
omitted) This type of metadata also makes electronic documents more functional because
it significantly improves a party's ability to access, search, and sort large numbers of
documents efficiently. . . . .
c. Embedded Metadata
Embedded metadata consists of “text, numbers, content, data, or other information that is
directly or indirectly inputted into a [n]ative [f]ile by a user and which is not typically
visible to the user viewing the output display” of the native file. . . . Examples include
spreadsheet formulas, hidden columns, externally or internally linked files (such as sound
files), hyperlinks, references and fields, and database information. . . . This type of
metadata is often crucial to understanding an electronic document. For instance, a
complicated spreadsheet may be difficult to comprehend without the ability to view the
formulas underlying the output in each cell. . . . .
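As an illustration of the system metadata described under item b above, the following is a minimal Python sketch that reads information the operating system keeps about a file rather than inside it. The function name system_metadata and the sample file memo.txt are hypothetical. Only the file name, size and last-modified time are shown, because author and creation-date fields vary by platform and are not assumed here.

```python
import os
import tempfile
from datetime import datetime, timezone


def system_metadata(path):
    """Return file-system metadata kept by the operating system for `path`."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        # Last modification time, expressed as a UTC timestamp.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


if __name__ == "__main__":
    # Hypothetical document created only for this illustration.
    with tempfile.TemporaryDirectory() as folder:
        sample = os.path.join(folder, "memo.txt")
        with open(sample, "w", encoding="utf-8") as handle:
            handle.write("Draft memo")
        for field, value in system_metadata(sample).items():
            print(field, value)
```

Because these values live in the file system, copying or moving a file can change them even when the document's content is untouched.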
In computer terminology, a database is simply a collection of mutually related data or
information stored in computer record fields. Databases are organized collections of information
similar to index cards, phone books, or file cabinets of documents. In business, all kinds of data,
from e-mail and contact information to financial data and records of sales, are stored in some
form of a database. Databases track employee information, payrolls, job classifications,
retirement benefits and a host of other business-related information. The Fed. R. Civ. P. 34,
Advisory Committee Note of 2006 recognized “dynamic databases” and how databases may
store different forms of ESI, "[e]lectronically stored information may exist in dynamic databases
and other forms far different from fixed expression on paper. . . . Using current technology, for
example, a party might be called upon to produce word processing documents, email messages,
electronic spreadsheets, different image or sound files, and material from databases." A subset of
a database can be produced by the use of queries and reports. The results can be exported into a
comma-delimited, spreadsheet or other file format.
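Producing a subset of a database through a query and exporting it to a comma-delimited file can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 and csv modules. The employees table, its columns and the function name export_query_to_csv are hypothetical and do not come from any cited case.

```python
import csv
import io
import sqlite3


def export_query_to_csv(connection, query, params=()):
    """Run `query` and return the result set as comma-delimited text."""
    cursor = connection.execute(query, params)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # First line: column names, serving as field descriptors.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical employee table, invented for this illustration.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
    db.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [("Ames", "Sales", 52000), ("Baker", "Legal", 71000), ("Cole", "Sales", 48000)],
    )
    print(export_query_to_csv(
        db,
        "SELECT name, salary FROM employees WHERE department = ? ORDER BY name",
        ("Sales",),
    ))
```

The query determines which records and fields are produced, so the exported file reflects only the selected subset and not the database as a whole.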
Arkfeld on Electronic Discovery and Evidence, §§ 3.11 Structure and Type of Electronic
Information, 5.3, ESI Forms and Disclosure Formats, and 7.7(G), ESI Form(s).
N.A.A.C.P. v. Acusport Corp., 210 F.R.D. 268, 278-279 (E.D.N.Y. Sept. 18, 2002) (case
provides extensive discussion of the different components of a database)
Cook v. Deloitte & Touche, LLP, No. 03-3926, 2005 U.S. Dist. LEXIS 22252, at *8, 14 (D.N.Y.
Sept. 30, 2005) (database contained a file for each employee reflecting each telephone call or
letter between [employer] and the employee, including notes summarizing the substance of the
contact . . . [and a database] describing vacant positions).
In re Seroquel Prods. Liab. Litig., MDL 1769, 2007 U.S. Dist. LEXIS 5877, at *8 (D. Fla. Jan.
26, 2007) (Databases holding information about customers, production, employee performance,
internal processes, etc.)
Spreadsheet application programs perform simple and complex mathematical calculations
automatically. Spreadsheets can be used in a variety of business functions, and oftentimes are
used by individuals to keep their financial and other records. Spreadsheets have long been used
in business and are of interest for electronic discovery because of their content. Calculations and
mathematical analyses, mailing lists, to-do lists, attendance rosters, invoices, real estate closing
statement calculations, truth in lending statements and mortgage payments, loan calculations and
amortization schedules are a few of the uses of spreadsheets. Database information can be
downloaded into a spreadsheet format.
"Native" spreadsheet computer files contain:
Calculations or formulas that are not visible in a printout version (only the result of the
calculation is visible);
Hidden cells, columns, rows and post-it style comments;
Hidden formulas; and
Display of all rows and columns.
Arkfeld on Electronic Discovery and Evidence, § 3.12 Structure and Type of Electronic
Information and § 7.7(G), ESI Form(s)
Public Citizen v. Carlin, 2 F. Supp.2d 1, 14 (D.D.C. 1997). (Paper print-outs of computer
spreadsheets only display the results of calculations made on the spreadsheet, while the actual
electronic version of the spreadsheet will show the formula used to make the calculations.)
Williams v. Sprint/United Mgmt. Co., 230 F.R.D. 640 (D. Kan. 2005) (The court ordered an
employer in an employment discrimination case to restore the metadata it had scrubbed or erased
from Excel spreadsheet files and unlock them.)
Imaging is a process where paper or ESI is scanned into a system and stored electronically as a
“picture.” These digitized computer files of documents are known as “images.” Images may be
searchable or non-searchable. Metadata from the original document is generally not carried over
when the producing party converts or prints a file to a TIFF or PDF format. ESI can be converted
to images for disclosure purposes, redaction or presentation in the courtroom. In addition, images
may be linked to a database to assist in the retrieval process.
If an image is not searchable it can be converted into searchable text by using optical character
recognition (OCR). OCR is a process of converting letters or numbers that appear on an image or
printed page to a bit mapped image and then into ASCII that can be searched. However, the
process is not 100% accurate and it may be expensive to "clean up" the converted image data. An
image that is created from ESI is generally searchable since the ESI is converted directly into an
image with the accompanying “text.” This is usually accomplished by using the PDF image
format. In addition, ESI can be converted to a nonsearchable TIFF image and be accompanied
by searchable “text.”
The two most popular image forms for litigation are TIFF and PDF.
TIFF (Tagged Image File Format). A TIFF is a standard proprietary file format for storing
images as bit maps. It is used especially for scanning documents because it can support any size,
resolution and color depth. TIFF images are not searchable. If the document is OCR’ed, which is
not 100% accurate, then it will become partially searchable.
PDF (Portable Document Format). PDF is also a
proprietary file format (Adobe, Inc.) that preserves the fonts, images, graphics and layouts of any
source document, such as a Microsoft Word document, regardless of the application and
platform used to create it. After a file is converted to PDF format, anyone with the free Adobe
Reader software can view the document as it originally appeared in the application program.
This eliminates the need to obtain a licensed copy of the application program to
view the document. There are two different types of PDF files:
IMAGE ONLY format is an exact electronic picture of the paper document. It cannot be word
searched unless the image is subsequently OCR’ed.
IMAGE with SEARCHABLE TEXT format is an electronic picture or image of the document that
also contains background “hidden” text that can be word searched using Adobe Reader software.
Documents, spreadsheets, e-mail and graphics can all be converted to TIFF or PDF.
Arkfeld on Electronic Discovery and Evidence, § 5.3 (D), ESI Forms and Disclosure Formats and
§ 7.7(G), ESI Form(s)
Fed. R. Civ. P. 34(a) states: “(a) Scope. Any party may serve on any other party a request (1) to
produce . . . documents or electronically stored information — including . . . images . . . stored in
any medium from which information can be obtained (emphasis added).”
The Fed. R. Civ. P. 34, Advisory Committee Note of 2006 introduced new terminology for the
ESI “image” form and stated, “[i]mages, for example, might be hard-copy documents or
electronically stored information.”
Williams v. Sprint/United Mgmt. Co., 230 F.R.D. 640, 643, n.8 (D. Kan. 2005) (“TIFF (Tagged
Image File Format) is one of the most widely used and supported graphic file formats for storing
bit-mapped images, with many different compression formats and resolutions. A TIFF file is
characterized by its ‘.tif’ file name extension. (citation omitted)”).
In re Payment Card Interchange Fee & Merch. Disc. Antitrust Litig., No. 05-1720, 2007 U.S.
Dist. LEXIS 2650, at *6,7,15 (E.D.N.Y. Jan. 12, 2007). (TIFF images without searchable text are
an unacceptable form of production and a violation of the Rules Advisory Committee Proviso
that data ordinarily kept in electronically searchable form should not be produced in a form that
removes or significantly degrades this feature.)
CP Solutions PTE, Ltd. v. GE, No. 04-2150, 2006 U.S. Dist. LEXIS 27053, at *1-15 (D. Conn.
Feb. 6, 2006)
Text, ASCII, and Conversion Formats
Text or ASCII. “Text” or "ASCII" (American Standard Code for Information Interchange)
documents are those documents that have the “text” of a document stored in a computer file.
These documents can be word or phrase searched and one can instantly access the exact location
of the words in the text documents. ASCII is a format that most computer programs recognize
for transferring data between programs and to conduct “text” searches. Essentially any document
produced is in a “text” format, if it is in an ASCII format. Once in an ASCII format, it can be
imported and searched in a search and retrieval software program. An example of a “text”
document is the deposition of a witness. Other examples of “text” documents include business
documents, trial transcripts, witness interviews, expert reports, etc.
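The word and phrase searching described above, including locating where each match occurs, can be sketched in a few lines of Python. This is a simplified illustration and not the method of any particular search and retrieval program. The function name find_phrase and the sample transcript are hypothetical.

```python
def find_phrase(text, phrase):
    """Return (line_number, column) pairs for each case-insensitive match."""
    needle = phrase.lower()
    if not needle:
        return []
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        haystack = line.lower()
        start = haystack.find(needle)
        while start != -1:
            hits.append((line_number, start + 1))  # columns are 1-based
            start = haystack.find(needle, start + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical deposition excerpt, invented for this illustration.
    transcript = (
        "Q. Did you review the loan file?\n"
        "A. I reviewed the loan file on Monday.\n"
        "Q. Who else saw the file?\n"
    )
    for line_number, column in find_phrase(transcript, "loan file"):
        print(f"line {line_number}, column {column}")
```

A search of this kind works only on text. An image of the same transcript would first have to be converted to text before it could be searched.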
Conversion Formats. Generally, in order to transfer data between different database, spreadsheet
and automated litigation support (ALS) programs you have to convert data into a format that is
recognized by both programs. For example, one common format for transferring data from one
application to another is “comma-delimited” in which each piece of data is separated by a
“comma.” Most database and spreadsheet programs are able to import and export
“comma-delimited data.” Any character can be used to separate the data, but the common separators or
delimiters are the comma (usually referred to as CSV (comma-separated values)), text, vertical
bar, space and tab. Column headers are usually included as the first line and used as
“field descriptors” for identification purposes.
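Converting data from one delimiter to another can be sketched with Python's built-in csv module, as shown below. This is a minimal illustration under the assumption that the first line holds the column headers. The function name convert_delimiter and the sample data are hypothetical.

```python
import csv
import io


def convert_delimiter(data, source_delimiter, target_delimiter):
    """Rewrite delimited text using a different field separator."""
    reader = csv.reader(io.StringIO(data), delimiter=source_delimiter)
    output = io.StringIO()
    writer = csv.writer(output, delimiter=target_delimiter, lineterminator="\n")
    # Rows are copied unchanged; only the separator between fields differs.
    writer.writerows(reader)
    return output.getvalue()


if __name__ == "__main__":
    # Hypothetical comma-delimited export; the first line holds the field descriptors.
    comma_data = 'name,city\n"Smith, J.",Tucson\n'
    print(convert_delimiter(comma_data, ",", "|"))
```

A field that itself contains the delimiter, such as the name in the sample data, must be quoted in the source file. A conversion that splits on every comma without honoring quotes would break that field in two.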
Zakre v. Norddeutsche Landesbank Girozentrale, No. 03-257, 2004 U.S. Dist. LEXIS 6026, at
*1-2 (D.N.Y. April 9, 2004) (plaintiff is able to search the ESI provided in a text-searchable
format “for single words or phrases, or combinations of words or phrases”).
In re Payment Card Interchange Fee & Merch. Disc. Antitrust Litig., No. 05-1720, 2007 U.S.
Dist. LEXIS 2650, at *12-14 (D.N.Y. Jan. 12, 2007) (“‘OCR’ refers to ‘optical character
recognition,’ a computer software program that translates images of text into a format that can be
searched or ‘read’ electronically.”).
Pace v. Int’l Mill Serv., No. 05-69, 2007 U.S. Dist. LEXIS 34104, at *1-2 (D. Ind. May 7, 2007).
In a pre-amendment case, the Court denied plaintiff’s motion to compel and noted that plaintiff’s
consultant alleged that the defendant provided “a DVD that contained ‘Microsoft Office Excel
comma separated value files,’ rather than PDF files. . . . [and] that the data . . . was missing field
descriptions and was provided in comma-separated, rather than the requested text-delineated
format. The lack of field descriptions was corrected by [the defendant].”
J.C. Assocs. v. Fid. & Guar. Ins. Co., No. 01-2437, 2006 U.S. Dist. LEXIS 32919 (D.D.C. May
25, 2006). The Court ordered the plaintiff to make available to the defendant an OCR-scan
program (OmniPage Pro) to permit the defendant to convert and search these files for specific
keywords, check for privilege and then disclose to the plaintiff.
Fed. R. Civ. P. 34, Advisory Committee Note of 2006 recognizes that: “[t]he form of production
is more important to the exchange of electronically stored information than of hard-copy
materials, although a party might specify hard copy as the requested form.”
In re Bristol-Myers Squibb Sec. Litig., 205 F.R.D. 437, 443-444 (D.N.J. 2002). The Court noted
that, “[o]f course, in some instances, paper, rather than electronic, production may still be the
preferable method of discovery.”
MacNamara v. City of New York, No. 04-9216, 2006 U.S. Dist. LEXIS 82926, at *16-17 (D.N.Y.
Nov. 13, 2006). The defendants produced electronic arrest records of the plaintiffs and others to
which the plaintiffs objected. The plaintiffs contended that “significant errors, edits and
omissions” occurred at the data entry stage for arrest records and requested “the handwritten
worksheets.” The Court agreed and ordered the city “to produce . . . Worksheets for non-party
arrestees as well as named plaintiffs, subject to the ‘attorneys’-eyes-only’ designation . . . .”
Automated Litigation Support (ALS) Form and Online ESI Depository
Technically a disclosure of ESI in an automated litigation support (ALS) format is not a “form.”
However, many litigants, courts and agencies will agree or order that the exchange of ESI be
provided in an ALS format. For example, the Court may require that all ESI be provided in a
Summation “load” file which will enable the requesting party to immediately load, search,
analyze and produce ESI reports in Summation. These “load” files may contain a database,
images and text. Essentially, ESI is preprocessed to be used immediately with ALS systems.
Automated litigation support (ALS) generally refers to computer operations that support legal
functions in litigation. Summation and Concordance are two of the leading litigation ALS
systems. An ALS system can manage a large volume of documents, electronic data, transcripts
and other data in a secure environment for quick retrieval and analysis during litigation. In
addition, online ESI depositories, such as Lextranet or CaseVault, are web-based ALS
systems and are increasingly being used for hosting and management of ESI.
Arkfeld on Electronic Discovery and Evidence, § 5.2(B) Automated Litigation Support System
and § 5.2(C), Online ESI Depository
O’Bar v. Lowe’s Home Ctrs., Inc., No. 04-00019, 2007 U.S. Dist. LEXIS 32497, at *21 (D.N.C.
May 2, 2007) (“If load files were created in the process of converting Native Files to Static
Images, or if load files may be created without undue burden or cost, load files should be
produced together with Static Images.”).
Quinby v. WestLB AG, No. 04-7406, 2006 U.S. Dist. LEXIS 64531, at *17-18 (D.N.Y. Sept. 5,
2006). The Court denied defendant’s request to shift the costs of reformatting data from a “.tif”
to a “.dii” format because the defendant failed to raise the issue in its initial motion. “[n9 ‘.tif’ or
‘Tagged Image Format’ is a commonly used electronic format for digitized pictures of
documents. It is often used to produce e-mails for loading onto Summation, a software program
designed to assist attorneys in searching, organizing and analyzing documents . . . ] [n10 ‘.dii’ is
a format which facilitates the loading of files onto Summation and may be created from .tif
Audio and Video
Audio has moved from analog recording with LPs (long-playing records) and tape cassettes as
the playback medium to digital recording using computers with a digital sound playback. Digital
audio files are usually compressed to reduce storage requirements and speed transmission. The
most popular audio file format today is MP3. MP3 is a standard technology and format for
compressing a sound sequence into a very small file while largely preserving the perceived sound
quality when it is played.
Video generally refers to recording, manipulating and displaying moving images in a format that
can be presented on a television or on a computer monitor. It is a recording produced with a
video recorder (camcorder) or some other device that captures full motion.
Arkfeld on Electronic Discovery and Evidence, § 3.18, Video; § 3.20, Audio