ImageMAKER Development Inc.
Discovery Assistant Specifications.
Copyright © 2004-2007 ImageMAKER Development Inc.
• Export TIFF, PDF, TEXT, Source and Metadata to CSV, Concordance, Summation,
RingTail, and Introspect Case Management load file formats.
Discovery Assistant Downloads:
Main Application:
Discovery Assistant.
Add-Ins:
Postscript Add-on - Supports text searchable PDF output.
24 bit Color Add-on - Supports 24 bit TIFF color output.
Additional Tools:
Report Manager - Converts native XML projects to MDB, XLS and TXT.
TeraBite - Break millions of files into manageable import file lists.
TIFF print driver - Stand alone TIFF print driver.
Introspec IDX tool - Conversion add-in tool for Introspec IDX format.
Review and search tools:
QuickReview - HTML based client review tool with built in search.
Summary Sheet:
Batch Processing - Load, Process, Review, Export
Supported File Types - Outlook (PST, MSG), Lotus Notes (NSF), EML, Zip, DOC, XLS, PPT, PDF, Html, Txt, TIF, Jpeg, BMP, Gif, Png + many more.
Specialized processing - XLS, DOC, PPT, OLE, email, zip.
File De-Duplication - At File and Message level. Local and Global.
Metadata Extraction - 99 separate Metadata fields
Export Load Files - CSV, Summation, Concordance, Opticon, Ipro, Ringtail, Introspec.
Export File Formats - Single and Multi-page B&W and Color TIFF, scanned PDF, searchable PDF, Postscript, Paper, Text.
Export Files - Image, Metadata, Text, original Source
Export Folder Format Options - Flat, Volume/Box, Mirror Source, Bates Folder.
Status Reporting - Load, Convert, Process, Export
Blank Page Removal - Extremely fast
Assign Doc ID’s - Up to 20 alphanumeric chars
Assign Bates Numbers - Up to 20 alphanumeric chars
Bates Stamping - Supports force white space, 6 page locations.
Print to Paper - TIFF or Postscript Blowback
Document Pass-Through - For files that can’t be converted
Quality Control Review Module - Accept, Reject, Skip, Replace, OCR, View Source
Manual TIFF File Replacement - Manual ‘Print To Tiff’ override
Native Text Extraction - From file, or from file print stream
OCR Text Extraction - Handles portrait and Landscape images.
OLE Embedded Object Extraction - DOC, PPT and XLS OLE extraction.
Support for Foreign Character Sets - MBCS only
Intelligent Process Monitoring - Monitors 6 timeout values, auto-close dialogs.
Parent Child Relationships - Maintains the parent/child/sibling hierarchy
Advanced Email Handling - PST, MSG, NSF, EML types
Upgrade Path to Multiple Servers - Scalable architecture
Microsoft Excel Spreadsheet formatting - 20+ print control settings
Microsoft Word formatting - 10+ print control settings
Microsoft PowerPoint formatting - 4+ print control settings
Built in Review and Search Tool - Distributable HTML based client review tool with support for indexed search.
Batch Processing
Discovery Assistant allows the following functions to be batched:
File Import with local de-duplication.
File Conversion to TIFF, PDF, Text, and Metadata.
Global de-duplication.
Removal of Blank Pages.
OCR text extraction.
Bates Stamping.
Export.
Printing.
All processed data is stored within a project directory.
Files can be loaded into a Discovery Assistant project in one of 4 ways:
Add File.
Add Folder.
Add from file list.
Drag from Windows Explorer into the All Files tab.
To batch load large file sets, download and install the TeraBite utility. This program allows the user to
create an ordered file list for loading into Discovery Assistant.
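The idea of an ordered file list broken into manageable pieces can be illustrated with a short sketch. This is not TeraBite's implementation or file format; the function name split_file_list and the case-insensitive sort order are assumptions made for the example.

```python
def split_file_list(paths, max_per_list):
    """Sort paths and split them into consecutive lists of limited size.

    Hypothetical helper: the sort key and list size are illustrative choices,
    not documented TeraBite behavior.
    """
    if max_per_list < 1:
        raise ValueError("max_per_list must be at least 1")
    ordered = sorted(paths, key=str.lower)  # case-insensitive ordering
    return [ordered[i:i + max_per_list]
            for i in range(0, len(ordered), max_per_list)]
```

Each returned list could then be written to its own text file, one path per line, and loaded with the 'Add from file list' option.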
Documents are categorized as follows:
Convertible - Document can be converted on conversion machine.
Non-Convertible - Document cannot be converted. Requires user analysis.
Queued for Conversion - In list of files to convert.
Converted - Converted to TIFF, Text and Metadata.
Stamped - Bates Stamped.
Failed - Failed while converting. Requires user analysis.
During the conversion process, files move from the 'all files' category to the 'stamped' category
through a series of steps. Each step is represented by a tab containing the subset of the 'all files'
list that has reached that stage of the conversion process.
Processing can be stopped and re-started at any time.
At any point during the batch process the user can click on a processed file to view:
Converted TIFF / PDF file
Extracted Text
Extracted MetaData
Native Source Document
Children, Parent, and Duplicates
Supported File Types
Discovery Assistant processes a wide variety of input file types. These include but are not limited to:
Microsoft Word documents, Excel spreadsheets, PowerPoint presentations, Outlook email files (PST,
MSG), Outlook Express Files (EML), Lotus Notes files (NSF), WordPerfect documents, rich text format
files (RTF), Microsoft Visio files, Corel Draw files, CAD/CAM files, Lotus 123 spreadsheets, text files,
HTML documents, Adobe Acrobat documents (PDF), compressed archives (ZIP), images (TIF, JPG,
BMP, etc.), scanned files and more.
Discovery Assistant uses the native application to petrify (convert to TIFF) documents, so any document
type that has a print or printto command on your system can be processed. At time of installation,
Discovery Assistant produces a list that identifies every document type with a ‘print’ or ‘printto’ file association.
Discovery Assistant contains a conditional filter to exclude executable files, hidden files, and system files,
and an option to skip email attachments or sub-directories.
Document type is determined by file content, ensuring that misnamed documents are properly processed.
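Content-based type detection is commonly done by checking the leading bytes of a file against known signatures. The sketch below shows that general technique only; the signature table, labels, and function names are assumptions for illustration and do not describe Discovery Assistant's internal detection logic.

```python
# Hypothetical signature table: (leading bytes, type label).
# These are well-known file signatures; the labels are illustrative.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"PK\x03\x04", "ZIP"),  # also the container for DOCX, XLSX, PPTX
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2"),  # legacy DOC, XLS, PPT, MSG
    (b"II*\x00", "TIFF"),
    (b"MM\x00*", "TIFF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF8", "GIF"),
    (b"BM", "BMP"),  # short signature, so false positives are possible
]


def sniff_type(header: bytes) -> str:
    """Return a type label based on the first bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 16 bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_type(fh.read(16))
```

With this approach a PDF saved as report.doc is still classified as PDF, because the file extension is never consulted.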
Common Supported file types:
Microsoft Word (DOC, DOCX) - All versions, including Office 2007
Excel (XLS, XLSX, CSV) - All versions, including Office 2007
PowerPoint (PPT, PPTX) - All versions, including Office 2007
Outlook Database (PST, MSG) - Recognizes 14 different message types
Outlook Express (DBX, EML) - EML only. DBX must be converted to PST first
Rich Text Format (RTF) - Uses Word to render
Visio (VSD) - 2000/2003/2007
Corel Draw (CDR) - Requires Corel Draw be installed
Corel Photo Paint (CPT) - Requires Corel Photo Paint be installed
Cad Cam (DWG, DXF, DWF) - Uses ABViewer from Cad Soft Tools
WordPerfect - Uses Word Perfect
Lotus 1-2-3 (WKS) - Uses Lotus 1-2-3
Lotus Notes Database (NSF) - Requires Notes be installed
Text (TXT, DAT, LOG, BAT) - ASCII and UNICODE
HTML - Uses Internet Explorer to render
Adobe Acrobat (PDF) - Uses latest version of Adobe Reader
Compressed Archives (ZIP) - Maintains parent/child relationships
Images (TIF, JPG, BMP, PNG, GIF, DCX) - Internal built-in Viewer
Scanned files - Copy-thru option to improve speed
OLE documents - Word, Excel, PowerPoint OLE embedded docs
File De-duplication
De-duplication can be done at the file level, or message level (multiple attachments).
Global De-duplication is supported across multiple projects.
Uses MD5 Hash code to identify duplicates, and a full binary compare to confirm matches.
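The two-stage check described above (hash first, then a full binary compare to confirm) can be sketched as follows. This is a minimal file-level illustration using the Python standard library; the function names and the returned structure are assumptions, and message-level or cross-project de-duplication is not covered.

```python
import filecmp
import hashlib
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return {primary_path: [duplicate_paths]} for files with duplicates.

    The first file seen with given content is treated as the primary.
    A matching hash only nominates a candidate; a byte-for-byte compare
    confirms it, so a hash collision is never reported as a duplicate.
    """
    primaries_by_hash = defaultdict(list)
    duplicates = {}
    for path in paths:
        file_hash = md5_of(path)
        for primary in primaries_by_hash[file_hash]:
            if filecmp.cmp(primary, path, shallow=False):  # full binary compare
                duplicates[primary].append(path)
                break
        else:
            primaries_by_hash[file_hash].append(path)
            duplicates[path] = []
    return {p: dups for p, dups in duplicates.items() if dups}
```

Hashing keeps the number of expensive full comparisons small: only files whose digests already match are compared byte for byte.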
Extraction of Metadata and Text
Discovery Assistant supports full metadata extraction from source documents, including MS Office
specific tags, Microsoft Outlook email specific tags, and Lotus Notes specific tags. Standard email tags
include Date Sent, Time Sent, Subject, Text Body, Html Body, Filename, Author, File Size, File Date, File
Time, email header information, To, From. Lotus Notes specific metadata includes IMLog, Appointment,
Bookmark, Notice, Phone Message, Return Receipt, and Task Form.
Document text is extracted at time of printing, either from the document itself, or from the print stream.
This ensures 100% accuracy. If the document text cannot be extracted (perhaps because the document
is an image file) then the user has the option to OCR the document after conversion to TIFF
(petrification). Discovery Assistant uses the Microsoft Office OCR engine to extract text from image files
(99% accuracy). If the Word 2003 or Word 2007 OCR engine is not available, Discovery Assistant uses
its own built-in OCR engine.
The full set of extractable and exportable Metadata fields is:
1 ITEMID - Discovery Assistant file ID
2 BEGINDOC - Export file title of first page
3 ENDDOC - Export file title of last page
4 APPLICATION_NAME - Name of creating application
5 ATTACHMENTSCOUNT - Count of attachments
6 ATTACHLIST - List of export file titles of attachments
7 ATTACHMENTRANGE - Range of export file titles of attachments
8 GROUPRANGE - Range of export file titles that belong as a group, e.g. an email and its attachments or a zip file and its contents
9 BATESGROUPRANGE - Range of Bates Numbers that belong as a group, e.g. an email and its attachments or a zip file and its contents
10 BEGATTACH - Export file title of first page of group, e.g. an email and its attachments or a zip file and its contents
11 ENDATTACH - Export file title of last page of group, e.g. an email and its attachments or a zip file and its contents
12 ATTACHTITLE - File title of attachment
13 AUTHOR - Document author
14 LASTAUTHOR - Last Document author
15 REVNUM - Document revision number
16 BATESBEG - Beginning Bates number
17 BATESEND - Ending Bates number
18 BATESBEGGROUP - Beginning Bates number for group, e.g. an email and its attachments or a zip file and its contents
19 BATESENDGROUP - Ending Bates number for group, e.g. an email and its attachments or a zip file and its contents
20 BCC - Blind Carbon Copy recipient
21 CC - Carbon Copy recipient
22 DACOMMENT - Discovery Assistant Pass-Through comment
23 DOCTEXT - Document Text
24 FILECREATIONDATE - Source document creation date
25 FILECREATIONTIME - Source document creation time
26 FILEMODIFYDATE - Source document modified date
27 FILEMODIFYTIME - Source document modified time
28 FILEACCESSDATE - Source Document Last Access Date
29 FILEACCESSTIME - Source Document Last Access Time
30 FILEPRINTDATE - Source Document Last Print Date
31 FILEPRINTTIME - Source Document Last Print Time
32 SENTDATE - Email sent date
33 SENTTIME - Email sent time
34 RECEIVEDDATE - Email received date
35 RECEIVEDTIME - Email received time
36 DOCTITLE - Document Title
37 DUPPATHS - Source document paths of duplicate items
38 BODY - Body of email
39 FILEEXTENSION - Source file extension
40 FILEPATHNAME - Source file path
41 EXPORTEDSOURCEFILEPATHNAME - Exported source file path
42 FILENAME - Source file name (including extension)
43 FILEDISPLAYNAME - Source file title
44 FILETYPENAME - Source file type name
45 PARENT - Email parent folder name
46 FROM - Email From address
47 HASHCODE - MD5 hash code value for source document
48 ISDUP - True/False is duplicate
49 MSGID - Email message ID
50 DOCUMENTPAGES - Output file page count
51 PARENTID - Export file title of parent item
52 SHORTFILETITLE - Short file title
53 OBJECTSIZE - Source file size on disk
54 STOREID - Message store identifier
55 STORENAME - Message store source file name
56 SUBJECT - Email subject
57 SENTTO - Email To address
58 ITEMINDEX - Item Index
59 INETHEADER - Internet Header
60 DOCID - Document ID
61 ALTRCPALLOW - Alternate Recipient Allowed
62 AUTOFWD - Auto Forwarded
63 BILLINFO - Billing Information
64 CATEGOR - Categories
65 COMPANIES - Companies
66 CNVINDEX - Conversation Index
67 CNVTOPIC - Conversation Topic
68 DEFDLVDATE - Deferred Delivery Date
69 DEFDLVTIME - Deferred Delivery Time
70 DELAFTSUB - Delete After Submit
71 EXPIRYDATE - Expiry Date
72 EXPIRYTIME - Expiry Time
73 HTMLBODY - HTML Message Body
74 IMPORTANCE - Importance
75 MSGCLASS - Message Class
76 MSGMLG - Message Mileage
77 NOAGING - No Aging
78 DLVRPTREQ - Originator Delivery Report Requested
79 OLINTVER - Outlook Internal Version
80 OLVER - Outlook Version
81 RDRECREQ - Read Receipt Requested
82 RCVBYNAME - Received By Name
83 RCVONBEHALFNAME - Received On Behalf Of Name
84 RCPREASSPROHIB - Recipient Reassignment Prohibited
85 REPLRECIPS - Reply Recipients
86 SAVED - Saved
87 SENSITIVITY - Sensitivity
88 SENT - Sent
89 SNTONBEHALFNAME - Sent On Behalf Of Name
90 SUBMITTED - Submitted
91 UNREAD - UnRead
92 READ - Message read y/n?
93 VOTINGOPT - Voting Options
94 VOTINGRESP - Voting Response
95 GLOBALPRIMARY - 'Yes' if this is the first occurrence of this item in the global table.
96 GLOBALCOUNT - Count of occurrences of this item in the Global Project table.
97 SRCCUSTOD - Source Custodian. Obtained from third to last directory name in source file path.
98 SRCBOX - Source Box. Obtained from second to last directory name in source file path.
99 SRCFOLDER - Source Folder. Obtained from last directory name in source file path.
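The derivation of fields 97 to 99 from the source file path can be illustrated with a short sketch. The function name source_fields is hypothetical, and returning an empty string when the path has too few directory levels is an assumption; the specification does not say what happens in that case.

```python
from pathlib import PureWindowsPath


def source_fields(file_path: str) -> dict:
    """Derive custodian, box, and folder from the directories above a file."""
    parent = PureWindowsPath(file_path).parent
    # Drop the drive/root anchor (e.g. 'C:\\') so only directory names remain.
    dirs = [part for part in parent.parts if part != parent.anchor]

    def nth_from_last(n):
        # Assumption: an empty value when the path is too shallow.
        return dirs[-n] if len(dirs) >= n else ""

    return {
        "SRCCUSTOD": nth_from_last(3),  # third to last directory
        "SRCBOX": nth_from_last(2),     # second to last directory
        "SRCFOLDER": nth_from_last(1),  # last directory
    }
```

For example, the path C:\Cases\Smith\Box01\Folder3\memo.doc yields Smith as custodian, Box01 as box, and Folder3 as folder, which suggests arranging source data as custodian\box\folder before loading.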
Export options
Discovery Assistant supports a variety of export options including:
Export Formats:
Summation eDii Load File
Concordance DAT or DCB files
Opticon Log files
Ringtail Load File
Ipro Load File
Comma Separated Value (CSV)
Introspec IDX Load File
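As a rough illustration of what a metadata load file contains, the sketch below writes the same records as CSV and as a Concordance-style DAT file. It assumes the commonly used Concordance default delimiters (character 20 between fields, character 254 around values, character 174 in place of embedded newlines); the function names are hypothetical and the exact output produced by Discovery Assistant may differ.

```python
import csv
import io

# Assumed Concordance-style default delimiters.
DAT_FIELD_SEP = "\x14"  # character 20, separates fields
DAT_QUOTE = "\xfe"      # character 254 (thorn), wraps each value
DAT_NEWLINE = "\xae"    # character 174, replaces newlines inside values


def to_csv(rows, fields):
    """Render rows (a list of dicts) as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_concordance_dat(rows, fields):
    """Render rows as DAT text: a header line, then one line per record."""
    def dat_line(values):
        cleaned = (str(v).replace("\n", DAT_NEWLINE) for v in values)
        return DAT_FIELD_SEP.join(f"{DAT_QUOTE}{v}{DAT_QUOTE}" for v in cleaned)

    lines = [dat_line(fields)]
    lines += [dat_line(row.get(f, "") for f in fields) for row in rows]
    return "\n".join(lines) + "\n"
```

Both functions take the list of metadata field names to export, so a load file can carry any subset of the 99 fields in a chosen column order.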
File formats:
Single and multi page B&W TIFF.
Single and multi page Color TIFF.
Multi-page scanned PDF.
Multi-page text searchable PDF.
Multi-page Postscript.
Paper only (print direct to printer).
Text only
Supported Output resolutions and page size (determined at time of conversion):
Resolutions: 100x100, 200x200, 300x300, 400x400
Letter, Legal and A4 output page sizes
Additional export file types:
Metadata Text.
Document Text.
Original source file.
Export Folder Structure:
Flat
Volume / Box
Mirror Source
Bates Folder
User configurable Export File Naming Schemes can include the following fields:
Project ID
File ID
File Title
Short File Title
File Extension
Bates Start
Bates End
Page Number
Bates Number
Document ID
Status Reporting
Discovery Assistant generates summary reports during critical phases of the process. Reports are stored
with the project, and can be reviewed chronologically using Windows Explorer file manager.
Reports indicate any warnings, errors, time to complete the function, and any additional information that
might be necessary to track the progress of one or more files through the process.
Reports are generated at time of:
File Import
File Conversion
File Export
Batch de-blanking
Batch OCR
Batch Bates Stamping
Blank Page Removal (Page Deblanking)
By default, spreadsheet files are set to print the whole sheet rather than the default print range, so the
printing process can generate a large number of blank pages.
Discovery Assistant can batch process files to detect and eliminate blank pages. The process is
extremely fast and can scan thousands of pages per second looking for blank pages.
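The specification does not describe how blank pages are detected. One common approach, sketched below purely as an assumption, treats a page as blank when the fraction of dark pixels falls below a small threshold, which tolerates a few specks of noise. Pages are modeled here as lists of pixel rows with 1 for dark and 0 for white; the names and the threshold value are hypothetical.

```python
def is_blank(page, max_dark_ratio=0.001):
    """Return True if the share of dark pixels is at or below the threshold.

    page is a list of rows, each row a list of 0 (white) or 1 (dark).
    """
    total = sum(len(row) for row in page)
    if total == 0:
        return True  # an empty page has nothing on it
    dark = sum(sum(row) for row in page)
    return dark / total <= max_dark_ratio


def remove_blank_pages(pages, max_dark_ratio=0.001):
    """Keep only the pages that are not blank, preserving their order."""
    return [page for page in pages if not is_blank(page, max_dark_ratio)]
```

A ratio threshold rather than an exact all-white test is a design choice: it lets pages with isolated stray pixels count as blank while pages with real content are kept.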
Bates Numbers and Document IDs
Converted documents can be assigned both a Bates Number and a Document ID.
Assigned IDs support up to 20 alphanumeric characters.
Bates Range values are automatically assigned to parent items when Bates Numbers are assigned in
‘child next’ order.
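A sketch of sequential Bates numbering may help show why ordering matters for group ranges. It assumes that 'child next' order means each parent is followed immediately by its children, so that a parent's group occupies one contiguous range. The prefix, the zero-padding width, and the function names are hypothetical, and only one level of parent/child nesting is handled.

```python
MAX_ID_LENGTH = 20  # IDs support up to 20 alphanumeric characters


def bates_number(prefix, seq, width):
    """Format one Bates number, e.g. prefix 'ABC', seq 12, width 7."""
    number = f"{prefix}{seq:0{width}d}"
    if len(number) > MAX_ID_LENGTH:
        raise ValueError(f"{number!r} exceeds {MAX_ID_LENGTH} characters")
    return number


def assign_bates(items, prefix="ABC", start=1, width=7):
    """Number pages sequentially; extend each parent's group range.

    items: list of (item_id, page_count, parent_id or None), with every
    parent listed immediately before its children and page_count >= 1.
    """
    seq = start
    result = {}
    for item_id, pages, parent_id in items:
        beg = bates_number(prefix, seq, width)
        end = bates_number(prefix, seq + pages - 1, width)
        seq += pages
        result[item_id] = {"BATESBEG": beg, "BATESEND": end}
        if parent_id is None:
            # A top-level item starts its own group.
            result[item_id].update(BATESBEGGROUP=beg, BATESENDGROUP=end)
        else:
            # A child extends the group range of its parent.
            result[parent_id]["BATESENDGROUP"] = end
    return result
```

With an email of 2 pages followed by attachments of 3 pages and 1 page, the email receives numbers 1 to 2 and a group range of 1 to 6. If the children were numbered elsewhere in the sequence, the group could not be expressed as a single begin/end pair.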
Stamping
Discovery Assistant includes the ability to stamp converted files with specific information. The user is able
to control the stamp content from the Stamp Options. Possible stamps are: Bates number, document ID,
file name, file path, file type, page number, number of pages and many more. Discovery Assistant
supports a user selectable option to shrink the document image so that stamps do not obscure image
data.
Stamp fields can be placed in one of 6 positions: Top Left, Top Center, Top Right, Bottom
Left, Bottom Center, Bottom Right.
Stamp text fields can include a combination of the following data fields (plus user defined strings):
Project ID
File ID
File Name
File Path
File Type
Page Number
Total Pages
Bates Page
Bates Number
Document ID
Confidential
Global Primary
Global Count
Hash Code
Sort Options
Any field tracked and displayed by Discovery Assistant can be used for sorting. Sortable columns
include status, filename, source path, size, and date/time, among others.
Printing a Hard Copy
Stamped or unstamped processed TIFF / Postscript files can be spooled to a standard printer. Metadata
pages can be printed as slip sheets.
Pass Through
Some files cannot be converted, perhaps because they are password protected, corrupt, or unprintable
(such as WAV files). These files can be ‘passed through’ as a converted file by creating a placeholder
file. The placeholder file includes the file’s metadata fields and a user configurable message.
Quality Control Review Module
The QC module allows users to review the conversion results, and to re-queue any incorrectly converted
documents, and/or to manually replace TIFF files.
Functional features include:
Accept
Reject
Skip
Replace
OCR
View original native file.
Documents can be reviewed one page at a time. Text scrolls to match the current page.