© ABBYY. All rights reserved. Page 3 of 53
2.2.3. Correspondence between input and output files .......... 25
2.3. Notifications .......... 26
2.3.1. Including server and workflow names into the text of notification messages .......... 26
2.3.2. Notification about near license expiry .......... 26
2.4. Job rejection without loss of files .......... 26
2.5. Interface improvements .......... 27
2.5.1. Main window of Administration Console .......... 27
2.5.2. Workflow status pane .......... 27
2.6. Soft stop of the workflow processing .......... 28
3. Workflow settings .......... 29
3.1. Document Library workflow type .......... 29
3.1.1. Periodical crawling of document libraries .......... 29
3.2. Input settings .......... 30
3.2.1. Processing SharePoint libraries .......... 30
3.2.2. Using IFilter for processing PDF files in MS SharePoint .......... 31
3.2.3. Filtering files for processing and settings for unprocessed files .......... 31
3.2.4. Using the SSL protocol for data protection .......... 33
3.3. Processing settings .......... 33
3.3.1. Special mode for processing technical drawings .......... 33
3.3.2. Despeckle images option .......... 34
3.3.3. Setup the color of filling the document edges after deskew .......... 34
3.3.4. Additional fonts .......... 35
3.3.5. To speed up processing, text in pictures is not recognized by default .......... 35
3.3.6. Blank page detection settings .......... 35
3.4. PDF processing options .......... 36
3.4.1. Improved MRC compression method of output PDF files .......... 36
3.4.2. Version, format, and other parameters of an output PDF file .......... 37
3.4.3. Export to PDF/A-3 format .......... 37
3.4.4. Tagged PDF enabled by default .......... 37
3.4.5. Possibility to skip processing PDFs with a text layer .......... 37
3.4.6. Ability to embed a text layer and keep the image and all PDF file properties .......... 38
3.4.7. Enabling and disabling Fast Web View for PDF files .......... 39
3.4.8. Using PDF text layer for recognition results improvement .......... 39
3.4.9. Using PDF text layer for generating quality output files of different formats .......... 39
3.5. Output settings .......... 40
3.5.1. Overwriting files in an output folder .......... 40
3.5.2. Export format compatible with FineReader Engine 11 .......... 41
3.5.3. KeepPages parameter .......... 41
3.5.4. Export to specific column types in SharePoint .......... 41
3.5.5. Export to ePub3 format .......... 42
3.5.6. Settings of units measurement for export to ALTO XML .......... 42
4. Document processing .......... 43
4.1. Improved recognition of Arabic texts .......... 43
4.2. Ability to limit the number of processed pages in input files .......... 43
4.3. Support of new barcode type - USPS-4CB (Intelligent Mail Barcode) .......... 43
4.4. Disabled image compression of lossy JBIG2 type .......... 43
5. Scanning Station .......... 44
5.1. Sending registration parameters values to index fields .......... 44
6. Verification and Indexing Stations .......... 45
6.1. Manual selection of documents for verification and indexing .......... 45
6.2. Saving documents .......... 46
6.3. Timeout of inactivity .......... 46
6.4. Improved work with document types and index fields on Indexing Stations .......... 46
6.4.1. Import of index fields from files .......... 46
6.4.2. Quick input of index fields .......... 47
6.4.3. Possibility to combine values from several regions into a one index field .......... 47
6.5. User interface changes .......... 48
6.5.1. Verification Station .......... 48
6.5.2. Indexing Station .......... 48
7. Operating systems .......... 48
7.1. Support for Windows Server 2012 Release 2 .......... 48
7.2. Discontinued support for Windows XP and Windows Server 2003 .......... 49
8. Scripting .......... 49
8.1. Access to subsequent pages from the document assembly script .......... 49
8.2. Detecting the workflow name by script .......... 49
9. Changes in the COM-based API and Web API .......... 49
9.1. Namespace changes .......... 49
9.2. Compatible API .......... 49
9.3. Automatic API deployment on 64x operating systems .......... 49
9.4. Added objects .......... 49
9.4.1. Correspondence between input and output files .......... 50
9.4.2. Support of the recognition service scenario (for NLC) .......... 50
9.4.3. Deleting of jobs .......... 52
10. UI and Documentation localization .......... 52
Introduction
About the document
This document describes new features that are implemented in ABBYY Recognition Server 4.
About the product
ABBYY Recognition Server 4 introduces improved recognition of Arabic texts, better integration with SharePoint,
new PDF processing features, and other improvements. Core server characteristics such as stability, performance,
and auto-recovery were revised and improved. This version also adds processing of read-only folders, advanced
logging, UI changes, and bug fixes.
Release 3 – Key features and enhancements
Part #: 1135/9, build # 4.0.4.1425, OCR Technologies build # 13.0.20.54, release date: 15/06/2015.
• Conversion of Office file formats
• Processing the entire SharePoint portal with child sites within one workflow
• Saving output files in input folders
• Writing original documents as attachments to PDF/A and PDF documents
• Improvements in export to ALTO XML
• Sending notifications to the Administrator via an SMTP server
Release 2 - Key features and enhancements
Part #: 1135/6, build # 4.0.3.1167, OCR Technologies build # 13.0.15.131, release date: 14/11/2014
New features and changes in Release 2 are marked in blue here and in the rest of the document.
The major features:
• Improved MRC compression method
• Using IFilter for processing PDF files in MS SharePoint
• Processing the SharePoint document libraries:
  o Crawling of the complete SharePoint site (including multiple libraries and folders)
  o Periodical re-crawling settings
• Export to specific column types in SharePoint (support of Date, Number, and selected other formats)
• Export to PDF/A-3
Other improvements:
• Improved e-mail notifications:
  o Advance notifications about license expiry
  o Information on server name in the message text
• Sending registration parameter values from Scanning Station to index fields
• Soft stop of the workflow processing
• Support for failover clusters
• Using PDF text layer for generating output files
• Blank page detection parameters
• New barcode type - USPS-4CB (Intelligent Mail Barcode)
• New export format: ePub3
• Units of measurement settings for export to ALTO XML
• Disabled image compression of lossy JBIG2 type
• Tagged PDF enabled by default
• Possibility to combine values from several areas into one index field
• Access to subsequent pages from the document assembly script
• Detecting the workflow name by script
Release 1 Multilingual - Key features and enhancements
Part #: 1135/5, build # 4.0.2.952, OCR Technologies build number 13.0.13.21, release date: 14/08/2014
• Translation of UI and help into the following languages:
  o French
  o German
  o Italian
  o Spanish
  o Chinese
  o Portuguese (Brazil)
  o Czech
  o Hungarian
  o Polish
Release 1 (English and Russian User Interface) - Key features and enhancements
Part #: 1135/4, build # 4.0.2.943, OCR Technologies build number 13.0.13.15, release date: May 27, 2014
• Improved fault tolerance and logging
• Processing documents in “read-only” mode
• Processing of documents in SharePoint libraries
• Enhanced work with PDF files
• Better support for construction drawings
• Faster recognition of Arabic texts
• User management via Active Directory
Installing the new version
Recognition Server 4 can be installed on the same computer where Recognition Server 3.5 or previous versions
were installed.
Configuration of a previous version of ABBYY Recognition Server can be imported into ABBYY Recognition Server 4.
For further information, see the System Administrator’s Guide, “Upgrade from the previous versions of ABBYY
Recognition Server.”
Note. Some changes have been made to the XML Result file schema and the corresponding API object. Custom code
written to integrate ABBYY Recognition Server with third-party systems may need to be modified. Details are given
below in this document and in the XMLResult description article in the help file.
Licensing
Recognition Server 4 requires licenses generated specifically for this version of the product. It cannot work with a
license generated for Recognition Server 3.5 or earlier.
New Features and Improvements
Release 3
1. Import
1.1. Conversion of Office file formats
General description
It is now possible to process digitally created documents (e.g. in Microsoft Office file formats) together with images
and PDF files.
This feature enables simultaneous input of documents in various formats. Any digital library can be normalized,
made searchable, and prepared for long-term storage.
Imported documents are processed according to the workflow settings. The most common scenario is to import
various files and convert them to PDF or PDF/A. However, other output formats supported by Recognition Server
can also be used.
This kind of conversion requires Microsoft Office 2007 or later, LibreOffice 4.2 or later, or another third-party
application to be installed on the computer with the Recognition Server components (server and/or station).
Files converted to PDF via Microsoft Office have superior visual quality and a text layer inherited from the
original document.
How to enable
Step 1. Configure an input handler that opens Office files and converts them into a PDF suitable for further
processing by Recognition Server. Input handlers can be selected in the Input Handlers dialog box (1. Input tab,
Handlers… button).
Converted by        Supported Formats
Microsoft Office    DOC, DOCX, RTF, TXT, HTML, HTM, XLS, XLSX, PPT, PPTX
LibreOffice         DOC, DOCX, RTF, ODT, XLS, XLSX, ODS, PPT, PPTX, ODP
For conversion via Microsoft Office or LibreOffice, enable the event handler named When Office File Is Received
(Settings).
In the handler properties, the user can specify:
• an application for converting the files (Microsoft Office or LibreOffice)
• a processing component of Recognition Server (server or station)
• (for Microsoft Office only) user account permissions to run the application, if required. The user
credentials should be specified if Microsoft Office runs under a user account different from the
account used for running the Recognition Server service.
For conversion via a third-party application, use input handlers based on a custom script.
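The choice of converter in Step 1 depends on the file extension. The following Python sketch only restates the table of supported formats as a lookup; the function and variable names are illustrative assumptions and are not part of Recognition Server.

```python
from pathlib import Path

# Extension lists copied from the table of supported formats in this section.
SUPPORTED_FORMATS = {
    "Microsoft Office": {"doc", "docx", "rtf", "txt", "html", "htm",
                         "xls", "xlsx", "ppt", "pptx"},
    "LibreOffice": {"doc", "docx", "rtf", "odt", "xls", "xlsx", "ods",
                    "ppt", "pptx", "odp"},
}


def converters_for(file_name: str) -> list[str]:
    """Return the converters whose format list includes the file's extension."""
    extension = Path(file_name).suffix.lstrip(".").lower()
    return [name for name, formats in SUPPORTED_FORMATS.items()
            if extension in formats]


print(converters_for("report.DOCX"))
print(converters_for("notes.odt"))
print(converters_for("scan.tiff"))
```

A file whose extension appears in neither list (such as the TIFF above) returns an empty list, which corresponds to the case where a script-based handler and a third-party application would be needed.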
Step 2. For new workflows of the Hot Folder type, conversion of all files is allowed by default (*.* is specified in
the “Process files:” mask). For new workflows of the Document Library type, the desired Office file extensions must
be added to the mask manually (this prevents documents from being converted by mistake).
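The effect of the two mask settings can be illustrated with ordinary wildcard matching. This is a minimal sketch that assumes the “Process files:” mask behaves like a standard glob pattern and is case-insensitive; the Document Library mask shown is a hypothetical example, not a product default.

```python
from fnmatch import fnmatchcase


def matches_mask(file_name: str, masks: list[str]) -> bool:
    """True if the file name matches at least one mask, ignoring case."""
    lowered = file_name.lower()
    return any(fnmatchcase(lowered, mask.lower()) for mask in masks)


hot_folder_masks = ["*.*"]                    # default stated for Hot Folder workflows
library_masks = ["*.pdf", "*.tif", "*.docx"]  # hypothetical mask with DOCX added manually

for name in ["Budget.XLSX", "Contract.docx"]:
    print(name, matches_mask(name, hot_folder_masks), matches_mask(name, library_masks))
```

With the hypothetical library mask, Contract.docx is picked up while Budget.XLSX is ignored until *.xlsx is added, which is the protection against accidental conversion described above.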
To specify the Office file formats to be processed by Recognition Server, users can also use the Configuration.xml
file (<OnFileReceivedCustomOffice>). In this parameter, the user can disable the conversion of Office files or
specify the extensions of files that should be processed.
Note: Import of Office files is not available in Microsoft Search IFilter and Google Search Appliance Connector
workflows. These files do not require any processing before being indexed by search engines.
Implemented in: Release 3.
1.2. Import Event Handlers
New scripts were added to handle the import events managed by the server and/or by a station (1. Input tab,
Handlers… button). Users can fine-tune document preprocessing using scripts that alter or improve input files.
There are two script-based event handlers: When File Is Received by Server (Custom Script) and When File Is
Received by Station (Custom Script). The script runs separately for each file.
Using a script, the user can analyze the input file (name, extension), then preprocess the file and send it for
processing, or exclude the file from processing (mark it as processed), and add a notification to the Event Log.
For instance, a document can be preprocessed by an external application (converted to a format suitable for
further processing by Recognition Server), or its resolution can be changed prior to recognition.
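The per-file decision that such a handler makes can be sketched as follows. This is written in Python purely for illustration and does not use the Recognition Server scripting objects; the extension rules, the Decision type, and the function name are all hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical rules for illustration; a real handler would define its own.
EXCLUDED_EXTENSIONS = {".tmp", ".db"}
NEEDS_CONVERSION = {".dwg"}


@dataclass
class Decision:
    action: str  # "process", "convert_then_process", or "exclude"
    event_log: list[str] = field(default_factory=list)


def on_file_received(file_name: str) -> Decision:
    """Decide what to do with one input file, based on its extension."""
    extension = Path(file_name).suffix.lower()
    if extension in EXCLUDED_EXTENSIONS:
        # Excluded files are treated as already processed and are logged.
        return Decision("exclude", [f"{file_name}: marked as processed"])
    if extension in NEEDS_CONVERSION:
        # An external application would convert the file before recognition.
        return Decision("convert_then_process",
                        [f"{file_name}: sent to external converter"])
    return Decision("process")


for name in ["Thumbs.db", "plan.dwg", "scan.TIF"]:
    print(name, on_file_received(name).action)
```

Because the handler is called once per file, the function takes a single file name and returns a single decision, mirroring the per-file execution described above.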
A third event type enables preprocessing of Office files by Microsoft Office or LibreOffice. It does not require
scripting.
Implemented in: Release 3.
1.3. Processing the entire SharePoint portal with child sites within one workflow
Processing the entire SharePoint portal, including its multiple child sites, can now be configured within a single
workflow.
In the previous release, the user had to create several workflows to access individual child sites.
After specifying the connection settings to the SharePoint portal, the complete structure of the portal is shown in
the Select SharePoint Libraries dialog box. Here it is possible to select any child sites and their libraries to be
processed.
Note: Processing Microsoft SharePoint libraries now requires .NET Framework 4.5.2, which can be installed
separately or by enabling the Microsoft SharePoint Support option when installing the Server Manager.
Implemented in: Release 3.
1.4. Ability to process files only after an XML ticket is added
Recognition Server now supports a mode in which documents are processed only after an XML ticket has been added.
This is useful when documents are placed into an input folder before the XML ticket arrives: waiting for the
ticket prevents files from being processed incorrectly.
To enable this feature, modify the input file mask so that it allows only XML files (*.xml). Documents will not be
processed until an XML ticket is placed into the input folder.
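The gating behavior can be sketched in Python as follows. The ticket layout used here (a Ticket element containing InputFile elements with a Name attribute) is a hypothetical stand-in for illustration; the actual XML ticket format is described in the product help and may differ.

```python
import xml.etree.ElementTree as ET


def released_documents(file_names: list[str], ticket_texts: dict[str, str]) -> list[str]:
    """Return documents that are present and referenced by a ticket in the folder."""
    present = set(file_names)
    released: list[str] = []
    for name in file_names:
        if not name.lower().endswith(".xml"):
            continue  # with the *.xml mask, only tickets are picked up directly
        root = ET.fromstring(ticket_texts[name])
        for node in root.iter("InputFile"):  # hypothetical element name
            document = node.get("Name")      # hypothetical attribute name
            if document in present and document not in released:
                released.append(document)
    return released


ticket = '<Ticket><InputFile Name="a.pdf"/></Ticket>'
print(released_documents(["a.pdf", "b.pdf"], {}))
print(released_documents(["a.pdf", "b.pdf", "job1.xml"], {"job1.xml": ticket}))
```

In the first call no ticket has arrived, so nothing is released. In the second call only a.pdf is released, because b.pdf is not named in any ticket.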
Implemented in: Release 3.