
NAVAL POSTGRADUATE SCHOOL
MONTEREY, CALIFORNIA 
THESIS 
Approved for public release; distribution is unlimited 
AUTOMATED METADATA EXTRACTION 
by 
James Migletz 
June 2008  
Thesis Advisor: Simson Garfinkel
Second Reader: Kevin Squire
REPORT DOCUMENTATION PAGE 
Form Approved OMB No. 0704-0188
Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instruction, 
searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send 
comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to 
Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 
22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188) Washington DC 20503.
1. AGENCY USE ONLY (Leave blank) 
2. REPORT DATE   
June 2008 
3. REPORT TYPE AND DATES COVERED 
Master’s Thesis 
4. TITLE AND SUBTITLE  Automated Metadata Extraction 
5. FUNDING NUMBERS 
6. AUTHOR(S)  James Migletz 
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 
Naval Postgraduate School 
Monterey, CA  93943-5000 
8. PERFORMING ORGANIZATION REPORT NUMBER
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)
N/A
10. SPONSORING/MONITORING AGENCY REPORT NUMBER
11. SUPPLEMENTARY NOTES  The views expressed in this thesis are those of the author and do not reflect the official policy 
or position of the Department of Defense or the U.S. Government. 
12a. DISTRIBUTION / AVAILABILITY STATEMENT   
Approved for public release; distribution is unlimited 
12b. DISTRIBUTION CODE 
13. ABSTRACT (maximum 200 words)  
Metadata is data that describes data. There are many computer forensic uses of metadata, and being able to extract 
metadata automatically benefits forensic analysis. This thesis presents a new technique for batch 
processing disk images and automatically extracting metadata from files and file contents. The technique is embodied 
in a program called fiwalk that has a plug-in architecture allowing new metadata extractors to be readily incorporated. 
Output from fiwalk can be provided in multiple formats such as ARFF and text. The plug-ins created for this thesis 
include one created by Simson Garfinkel for extracting metadata from .jpeg files, two for Microsoft Office documents 
(one for releases prior to Office 2007 and one for Office 2007), and a default plug-in for extracting metadata 
from .gif, .pdf, and .mp3 files. To better understand the metadata available in common file formats such as .doc, 
.docx, .odt, .pdf, .mp3, .mp4, .jpeg, .tiff, and .gif, an examination of these formats is provided. 
14. SUBJECT TERMS
Metadata, Metadata Extraction, Fiwalk, WV, Libextractor, File Formats, ARFF
15. NUMBER OF PAGES
83
16. PRICE CODE
17. SECURITY CLASSIFICATION OF REPORT
Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE
Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT
Unclassified
20. LIMITATION OF ABSTRACT
UU
NSN 7540-01-280-5500 
Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. 239-18
Approved for public release; distribution is unlimited 
AUTOMATED METADATA EXTRACTION 
James J. Migletz 
Major, United States Marine Corps  
B.S., Northwest Missouri State University, 1991 
Submitted in partial fulfillment of the 
requirements for the degree of 
MASTER OF SCIENCE IN COMPUTER SCIENCE 
from the 
NAVAL POSTGRADUATE SCHOOL 
June 2008 
Author: 
James J. Migletz 
Approved by:  
Simson Garfinkel 
Thesis Advisor 
Kevin Squire 
Second Reader 
Peter J. Denning 
Chairman, Department of Computer Science 
ABSTRACT 
Metadata is data that describes data. There are many computer forensic uses of 
metadata, and being able to extract metadata automatically benefits forensic 
analysis. This thesis presents a new technique for batch processing disk images and 
automatically extracting metadata from files and file contents. The technique is embodied 
in a program called fiwalk that has a plug-in architecture allowing new metadata 
extractors to be readily incorporated. Output from fiwalk can be provided in multiple 
formats such as ARFF and text. The plug-ins created for this thesis include one created 
by Simson Garfinkel for extracting metadata from .jpeg files, two for Microsoft Office 
documents (one for releases prior to Office 2007 and one for Office 2007), and a 
default plug-in for extracting metadata from .gif, .pdf, and .mp3 files. To better 
understand the metadata available in common file formats such as .doc, .docx, .odt, .pdf, 
.mp3, .mp4, .jpeg, .tiff, and .gif, an examination of these formats is provided. 
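
The plug-in architecture and ARFF output described above can be illustrated with a minimal Python sketch. It is not fiwalk's code or interface: the registry, the `register` decorator, and the `extract` and `to_arff` functions are hypothetical names chosen for illustration, and the single example extractor reads only the width and height fields of a GIF header.

```python
import os

# Hypothetical registry mapping a file extension to an extractor function.
# These names are illustrative assumptions, not part of fiwalk.
PLUGINS = {}


def register(*extensions):
    """Decorator that registers an extractor for the given extensions."""
    def wrap(func):
        for ext in extensions:
            PLUGINS[ext.lower()] = func
        return func
    return wrap


@register(".gif")
def gif_dimensions(data):
    """Read width and height from the GIF logical screen descriptor."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        return {}
    width = int.from_bytes(data[6:8], "little")
    height = int.from_bytes(data[8:10], "little")
    return {"width": width, "height": height}


def extract(filename, data):
    """Dispatch to the plug-in registered for the file's extension."""
    ext = os.path.splitext(filename)[1].lower()
    metadata = {"filename": filename, "filesize": len(data)}
    plugin = PLUGINS.get(ext)
    if plugin is not None:
        metadata.update(plugin(data))
    return metadata


def to_arff(relation, records):
    """Render metadata dicts as ARFF text; missing values become '?'."""
    names = []
    for rec in records:
        for key in rec:
            if key not in names:
                names.append(key)
    lines = ["@RELATION " + relation, ""]
    for name in names:
        numeric = all(isinstance(r[name], int) for r in records if name in r)
        kind = "NUMERIC" if numeric else "STRING"
        lines.append("@ATTRIBUTE %s %s" % (name, kind))
    lines += ["", "@DATA"]
    for rec in records:
        row = []
        for name in names:
            value = rec.get(name)
            if value is None:
                row.append("?")
            elif isinstance(value, int):
                row.append(str(value))
            else:
                text = str(value).replace("\\", "\\\\").replace("'", "\\'")
                row.append("'" + text + "'")
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"
```

Supporting a new file type in this sketch means registering one more function. Files with no registered extractor still yield a record, with `?` in the columns they cannot fill.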
TABLE OF CONTENTS 
I.    INTRODUCTION
      A.  PURPOSE OF STUDY
      B.  THESIS ORGANIZATION
II.   METADATA
      A.  FILE SYSTEM METADATA
      B.  METADATA IN MEDIA FILES
          1.  GIF
          2.  JPEG
          3.  Music: MP3 and AAC
          4.  Tagged Image File Format (TIFF)
      C.  METADATA IN DOCUMENT FILES
          1.  Microsoft Office
          2.  Office Open XML Format (Microsoft Office 2007)
          3.  Open Office
          4.  Portable Document Format (PDF)
III.  A FRAMEWORK FOR AUTOMATED METADATA EXTRACTION
      A.  FIWALK
          1.  fiwalk Introduction
          2.  fiwalk Algorithm
          3.  fiwalk Evaluation
      B.  USING THE SLEUTHKIT PROGRAMMATICALLY
      C.  DOMEX-GATEWAY INTERFACE (DGI)
      D.  ARFF
IV.   PLUG-INS FOR AUTOMATED METADATA EXTRACTION
      A.  JPEG PLUG-IN (JPEG_EXTRACT)
      B.  MICROSOFT OFFICE PLUG-IN
          1.  WV PLUG-IN (Word_Extract)
          2.  DOCX_Extractor
      C.  DEFAULT PLUG-IN (LIBEXTRACT_PLUGIN)
V.    ANALYSIS OF OPEN OFFICE AND OFFICE OPEN XML FILES
      A.  EXAMINATION OF MICROSOFT OFFICE 2007 DOCUMENTS
          1.  Document.xml file
          2.  Content Controls
          3.  Identifiers
      B.  TIMESTAMPS
      C.  ENCRYPTION
      D.  THUMBNAILS
VI.   PRIOR AND RELATED WORK
      A.  OTHER APPROACHES TO AUTOMATIC METADATA EXTRACTION
          1.  Metadata Extraction in EnCase
          2.  Metadata Extraction in the Sleuthkit
      B.  USES OF METADATA IN COMPUTER FORENSICS
          1.  File Feature Extraction and Cross Drive Analysis
          2.  Uses of Metadata in Data Mining
      C.  USES OF METADATA IN SEARCHES
          1.  FileHold
          2.  Google Desktop Search/Google Search Appliance
          3.  Oracle Data Integrator
VII.  CONCLUSION
      A.  FINDINGS
          1.  Automated Extraction
          2.  Metadata Extraction Opportunities
          3.  Metadata Comparison
          4.  Deficiencies in Metadata Extraction Tools
      B.  FUTURE WORK
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST