ABBYY FineReader 12 User‘s Guide
Important! When you save a FineReader document, any user patterns and languages that
were created when you were working with this document are saved in addition to page
images and text.
Closing a document
To close a document page, click Close Current Page on the Document menu.
To close the entire document, click Close FineReader Document on the File menu.
Splitting FineReader documents
When processing large numbers of multi-page documents, it is often more practical to scan
all the documents first and only then analyze and recognize them. However, to preserve the
original formatting of each paper document correctly, ABBYY FineReader must process each
of them as a separate FineReader document. ABBYY FineReader includes tools for grouping
scanned pages into separate documents.
To split a FineReader document into several documents:
1. On the File menu, click Split FineReader Document… or select pages in the Pages
pane, right-click the selection, and then click Move Pages to New Document…
2. In the dialog box that opens, create the necessary number of documents by clicking the
Add document button.
3. Move pages from the Pages window into their appropriate documents displayed in the
New Documents pane using one of the following three methods:
o Select pages and drag them with the mouse.
Note: You can also use drag-and-drop to move pages between documents.
o Click the Move button to move the selected pages into the current document
displayed in the New Documents pane, or click the Return button to return them
to the Pages window.
o Use keyboard shortcuts: press Ctrl+Right Arrow to move selected pages from the
Pages window to the selected document in the New Documents pane, and
Ctrl+Left Arrow or Delete to move them back.
4. Once you are finished moving pages into the new FineReader documents, click the Create
All button to create all documents at once or click the Create button in each of the
documents individually.
Tip: You can also drag-and-drop selected pages from the Pages pane into any other
ABBYY FineReader window. A new FineReader document will be created for these pages.
Ordering pages in a FineReader document
1. Select one or more pages in the Pages window.
2. Right-click the selection and then click Reorder Pages… on the shortcut menu.
3. In the Reorder Pages dialog box, choose one of the following:
o Reorder pages (cannot be undone)
This changes all page numbers successively, starting with the selected page.
o Restore original page order after duplex scanning
This option restores the original page numbering of a document with double-sided
pages if you used a scanner with an automatic feeder to first scan all the odd-numbered
pages and then all the even-numbered pages. You can choose between
the normal and the reverse order for the even-numbered pages.
Important! This option will only work if 3 or more consecutively numbered pages
are selected.
o Swap book pages
This option is useful if you scan a book written in a right-to-left script (such as Arabic,
Hebrew, or Yiddish) and split the facing pages without first specifying the correct
recognition language.
Important! This option will only work for 2 or more consecutively numbered
pages, including at least 2 facing pages.
Note: To cancel this operation, select Undo last operation.
4. Click OK.
The order of the pages in the Pages window will change to reflect the new numbering.
Note:
1. To change the number of one page, click its number in the Pages window and enter the
new number in the field.
2. In the Thumbnails mode, you can change page numbering simply by dragging selected
pages to the desired place in the document.
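The two automatic reordering options above, Restore original page order after duplex scanning and Swap book pages, amount to simple list operations. The Python sketch below illustrates that ordering logic only; restore_duplex_order and swap_book_pages are hypothetical names, not part of ABBYY FineReader, whose internal implementation is not described in this guide. It assumes the selected pages arrive in scan order with the odd-numbered pass first, and that swapping exchanges each consecutive pair starting from the first selected page.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def restore_duplex_order(scanned: Sequence[T], evens_reversed: bool = False) -> List[T]:
    """Rebuild reading order from a scan of all odd pages followed by all even pages.

    Hypothetical helper for illustration; not an ABBYY FineReader API.
    """
    if len(scanned) < 3:
        # Mirrors the guide's requirement of 3 or more consecutively numbered pages.
        raise ValueError("select at least 3 consecutively numbered pages")
    # With an odd page total, the odd-page pass holds one extra page.
    n_odd = (len(scanned) + 1) // 2
    odd = list(scanned[:n_odd])
    even = list(scanned[n_odd:])
    if evens_reversed:
        # The "reverse order" case: even pages were scanned last page first.
        even.reverse()
    ordered: List[T] = []
    for i, page in enumerate(odd):
        ordered.append(page)          # page 1, 3, 5, ...
        if i < len(even):
            ordered.append(even[i])   # page 2, 4, 6, ...
    return ordered


def swap_book_pages(pages: Sequence[T]) -> List[T]:
    """Exchange the two halves of each split spread: (a, b) becomes (b, a).

    A trailing unpaired page, if any, stays where it is.
    """
    swapped = list(pages)
    for i in range(0, len(swapped) - 1, 2):
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    return swapped


if __name__ == "__main__":
    # Odd pages scanned first, then the even pages came back in reverse.
    scan = ["p1", "p3", "p5", "p6", "p4", "p2"]
    print(restore_duplex_order(scan, evens_reversed=True))
    print(swap_book_pages(["spread1-a", "spread1-b", "spread2-a", "spread2-b"]))
```

Run as a script, the first call prints the pages in the order p1 through p6, and the second prints each spread with its two halves exchanged.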
Document properties
Document properties contain information about the document (the extended title of the
document, author, subject, keywords, etc.). Document properties can be used to sort your
files. Additionally, you can search for documents by their properties and edit the properties
of a document.
When recognizing PDF documents and certain types of image files, ABBYY FineReader will
export the properties of the source document. You can then edit these properties.
To add or modify document properties:
1. Click Tools > Options…
2. Click the Document tab, and in the Document properties group, specify the title,
author, subject, and keywords.
Patterns and languages
You can save pattern and language settings and load settings from files.
To save patterns and languages to a file:
1. Open the Options dialog box (Tools > Options…) and then click the Read tab.
2. Under User patterns and languages, click the Save to File… button.
3. In the dialog box that opens, type in a name for your file and specify a storage location.
This file will contain the path to the folder where user languages, language groups,
dictionaries, and patterns are stored.
To load patterns and languages:
1. Open the Options dialog box (Tools > Options…) and then click the Read tab.
2. Under User patterns and languages, click the Load from File… button.
3. In the Load Options dialog box, select the file that contains the desired user patterns and
languages (it should have the extension *.fbt) and click Open.
Document Features to Consider Prior to OCR
The quality of images has a significant impact on recognition quality. This section explains
what factors you should take into account before recognizing images.
Document languages
Print type
Print quality
Color mode
Document languages
ABBYY FineReader recognizes both single- and multi-language documents (i.e., documents
written in two or more languages). For multi-language documents, you need to select several
recognition languages.
To specify an OCR language for your document, in the Document Language drop-down
list on the main toolbar or in the Task window, select one of the following:
Autoselect
ABBYY FineReader will automatically select the appropriate languages from the
user-defined list of languages. To modify this list:
1. Select More languages…
2. In the Language Editor dialog box, select the Automatically select document
languages from the following list option.
3. Click the Specify… button.
4. In the Languages dialog box, select the desired languages.
A language or a combination of languages
Select a language or a language combination. The list of languages includes recently used
recognition languages, as well as English, German, and French.
More languages…
Select this option if the language you need is not visible in the list.
In the Language Editor dialog box, select the Specify languages manually option and
then select the desired language or languages by checking the appropriate boxes. If you
often use a particular language combination, you can create a new group for these
languages.
If a language is not in the list, it is either:
1. not supported by ABBYY FineReader, or
2. not supported by your copy of the software.
The complete list of languages available in your copy can be found in the Licenses
dialog box (Help > About… > License Info).
In addition to using built–in languages and language groups, you can create your own. For
details, see "If the Program Fails to Recognize Some of the Characters."
Print type
Documents may be printed on various devices such as typewriters and fax machines. OCR
quality can be improved by selecting the correct Document type in the Options dialog
box.
For most documents, the program will detect the print type automatically. For automatic
print type detection, the Auto option must be selected under Document type in the
Options dialog box (Tools > Options…). You can process the document in full-color or
black-and-white mode.
You may also choose to manually select the print type as needed.
An example of typewritten text. All letters are of equal width
(compare, for example, "w" and "t"). For texts of this type, select
Typewriter.
An example of a text produced by a fax machine. As you can see
from the example, the letters are not clear in some places, in
addition to noise and distortion. For texts of this type, select Fax.
Tip: After recognizing typewritten texts or faxes, be sure to select Auto before processing
regular printed documents.
Print quality
Poor-quality documents with "noise" (i.e. random black dots or speckles), blurred and
uneven letters, or skewed lines and shifted table borders may require specific scanning
settings.
[Figure: example images of poor-quality documents (Fax, Newspaper)]
Poor-quality documents are best scanned in grayscale. When scanning in grayscale, the
program will select the optimal brightness value automatically.
The grayscale scanning mode retains more information about the letters in the scanned
text to achieve better OCR results when recognizing documents of medium to poor quality.
You can also correct some of the defects manually using the image editing tools available
in the Image Editor. For details, see "Image Preprocessing."
Color mode
If you do not need to preserve the original colors of a full-color document, you can process
the document in black-and-white mode. This will greatly reduce the size of the resulting
FineReader document and speed up the OCR process. However, processing low-contrast
images in black and white may result in poor OCR quality. We also do not recommend
black-and-white processing for photos, magazine pages, and texts in Chinese, Japanese, and
Korean.
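To give a sense of why black-and-white processing saves so much space, and why grayscale keeps more information about each letter, the Python sketch below computes the raw size of one scanned page at three color depths. The page size (A4), resolution (300 dpi), and the absence of compression and row padding are assumptions made for illustration only; FineReader documents store images in their own formats, so actual file sizes will differ. The helper names pixels and raw_size_bytes are hypothetical.

```python
# Rough, uncompressed size of one scanned page at different color depths.
# Assumptions (not from the guide): A4 page, 300 dpi, no compression, no row padding.

MM_PER_INCH = 25.4


def pixels(mm: float, dpi: int) -> int:
    """Convert a page dimension in millimetres to a whole number of pixels."""
    return round(mm / MM_PER_INCH * dpi)


def raw_size_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Bytes needed to store the raster without any compression."""
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    w, h = pixels(210, 300), pixels(297, 300)  # A4 at 300 dpi is 2480 x 3508 pixels
    for label, bpp in [("color (24-bit)", 24), ("grayscale (8-bit)", 8), ("black-and-white (1-bit)", 1)]:
        size = raw_size_bytes(w, h, bpp)
        print(f"{label:<24} {size / 1_000_000:6.2f} MB")
```

Under these assumptions a color page needs about 26.1 MB and a black-and-white page about 1.09 MB, a 24:1 ratio, while grayscale keeps 8 bits per pixel instead of 1, which is the extra detail that helps with medium- to poor-quality originals.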
Note: You can also speed up recognition of color and black-and-white documents by
selecting the Fast reading option on the Read tab of the Options dialog box. For more
information about the recognition modes, see "OCR Options."
To select a color mode:
Use the Color mode drop-down list in the Task dialog box, or select one of the options
under Color mode on the Document tab of the Options dialog box (Tools > Options…).
Important! Once the document is converted to black-and-white, you will not be able to
restore the colors. To get a color document, open the file with color images or scan the
paper document in color mode.
OCR Options
Selecting the right OCR options is important if you want fast and accurate results. When
deciding which options you want to use, you should consider not only the type and
complexity of your document, but also how you intend to use the results. The following
groups of options are available:
Reading mode
Detect structural elements
Training
User patterns and languages
Fonts
Barcodes
You can find the OCR options on the Read tab of the Options dialog box (Tools >
Options…).
Important! ABBYY FineReader automatically recognizes any pages you add to a
FineReader document. The currently selected options will be used for recognition. You can
turn off automatic analysis and OCR of newly added images on the Scan/Open tab of the
Options dialog box (Tools > Options…).
Note: If you change the OCR options after a document has been recognized, run the OCR
process again to recognize the document with the new options.
Reading mode
There are two reading modes in ABBYY FineReader 12:
Thorough reading
In this mode, ABBYY FineReader analyzes and recognizes both simple documents and
documents with complex layouts, even those with text printed on a colored background
and documents with complex tables (including tables with white grid lines and tables with
color cells).
Note: Compared to the Fast mode, the Thorough mode takes more time but ensures
better recognition quality.
Fast reading
This mode is recommended for processing large documents with simple layouts and good
quality images.
Detect structural elements
Select the structural elements you want the program to detect: headers and footers,
footnotes, tables of contents and lists. The selected elements will be clickable when the
document is saved.
Training
Recognition with training is used to recognize the following types of text:
Text with decorative elements
Text with special symbols (e.g., uncommon mathematical symbols)
Large volumes of text from low-quality images (over 100 pages)
The Read with training option is disabled by default. Enable this option to train ABBYY
FineReader when recognizing text.
You can use built–in or custom patterns for recognition. Select one of the options under
Training to choose which patterns you want to use.
User patterns and languages
You can save and load user pattern and language settings.
Fonts
Here you can select the fonts to be used when saving recognized text.
To select fonts:
1. Click the Fonts… button.
2. Select the desired fonts and click OK.
Barcodes
If your document contains barcodes and you wish them to be converted into strings of
letters and digits rather than saved as pictures, select Look for barcodes. This feature is
disabled by default.
Working with Complex-Script Languages
With ABBYY FineReader, you can recognize documents in Arabic, Hebrew, Yiddish, Thai,
Chinese, Japanese, and Korean. Some additional considerations must be taken into account
when working with documents in Chinese, Japanese or Korean and documents in which a
combination of CJK and European languages is used.
Installing language support
Recommended fonts
Disabling automatic image processing
Recognizing documents written in more than one language
If non-European characters are not displayed in the Text window
Changing the direction of recognized text
Installing language support
To be able to recognize texts written in Arabic, Hebrew, Yiddish, Thai, Chinese, Japanese,
and Korean, you may need to install these languages.
Microsoft Windows 8, Windows 7, and Windows Vista support these languages by default.
To install new languages in Microsoft Windows XP:
1. Click Start on the taskbar.
2. Click Control Panel > Regional and Language Options.
3. Click the Languages tab and select the following options:
o Install files for complex script and right-to-left languages (including Thai)
  to enable support for Arabic, Hebrew, Yiddish, and Thai
o Install files for East Asian languages
  to enable support for Japanese, Chinese, and Korean
4. Click OK.
Recommended fonts
Recognition of text in Arabic, Hebrew, Yiddish, Thai, Chinese, Japanese, and Korean may
require the installation of additional fonts in Windows. The table below lists the
recommended fonts for texts in these languages.
OCR language               Recommended fonts
Arabic                     Arial™ Unicode™ MS*
Hebrew                     Arial™ Unicode™ MS*
Yiddish                    Arial™ Unicode™ MS*
Thai                       Arial™ Unicode™ MS*, Aharoni, David, Levenim MT,
                           Miriam, Narkisim, Rod
Chinese (Simplified),      Arial™ Unicode™ MS*; SimSun fonts, such as
Chinese (Traditional),     SimSun (Founder Extended), SimSun-18030, and NSimSun;
Japanese, Korean,          SimHei; YouYuan; PMingLiU; MingLiU;
Korean (Hangul)            Ming(for-ISO10646); STSong
* This font is installed together with Microsoft Windows XP and Microsoft Office 2000 or
later.
The sections below contain advice on improving recognition accuracy.
Disabling automatic processing
By default, any pages you add to a FineReader document are automatically recognized.
However, if your document contains text in a CJK language combined with a European
language, we recommend disabling automatic page orientation detection and using the
dual page splitting option only if all of the page images have the correct orientation (e.g.,
they were not scanned upside down).
The Detect page orientation and Split facing pages options can be enabled and
disabled on the Scan/Open tab of the Options dialog box.
Note: To split facing pages in Arabic, Hebrew, or Yiddish, be sure to select the
corresponding recognition language first and only then select the Split facing pages
option. This will ensure that the pages are arranged in the correct order. You can also
restore the original page numbering by selecting the Swap book pages option. For
details, see "What Is a FineReader Document?"
If your document has a complex structure, we recommend disabling automatic analysis and
OCR for images and performing these operations manually.
To disable automatic analysis and OCR:
1. Open the Options dialog box (Tools > Options…).
2. Clear the Automatically process pages as they are added option on the Scan/Open
tab.
3. Click OK.
Recognizing documents written in more than one language
The instructions below are provided as an example and explain how to recognize a
document that contains both English and Chinese text. Documents that contain other
languages can be recognized in a similar manner.