Chapter 4 Proofing and editing 55
in different orientations. The program can handle these; in the output they appear right-rotated.
Beside the language list, the Verify language choices option invokes automatic language detection, which warns of differences between the detected language and the language setting. It works at page level and identifies four categories: Japanese, Chinese, Korean and non-Asian. It cannot distinguish between Traditional and Simplified Chinese or between individual non-Asian languages; the non-Asian category means that no Japanese, Chinese or Korean characters were detected. Verification takes place during image pre-processing, so the required recognition language must be set before the image is loaded.
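The four detection categories can be illustrated with a minimal sketch. It assumes the decision is made from Unicode script ranges in already-recognized text, which is NOT how OmniPage works (its detection runs on the page image during pre-processing); the function names and the language-setting strings are hypothetical. It does show why Traditional and Simplified Chinese, which share the Han script, fall into one category.

```python
from typing import Optional

# Hypothetical sketch: classify page text into the four categories named above
# by counting characters in Unicode script ranges. Not OmniPage's detector.


def _script_counts(text: str) -> dict:
    counts = {"kana": 0, "hangul": 0, "han": 0}
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:            # Hiragana and Katakana
            counts["kana"] += 1
        elif (0xAC00 <= cp <= 0xD7AF          # Hangul syllables
              or 0x1100 <= cp <= 0x11FF       # Hangul Jamo
              or 0x3130 <= cp <= 0x318F):     # Hangul compatibility Jamo
            counts["hangul"] += 1
        elif 0x4E00 <= cp <= 0x9FFF:          # CJK Unified Ideographs (Han)
            counts["han"] += 1
    return counts


def classify_page(text: str) -> str:
    """Return 'Japanese', 'Chinese', 'Korean' or 'non-Asian'."""
    c = _script_counts(text)
    if c["hangul"] > 0 and c["hangul"] >= c["kana"]:
        return "Korean"
    if c["kana"] > 0:
        return "Japanese"      # Japanese text mixes kana with Han characters
    if c["han"] > 0:
        return "Chinese"       # Traditional and Simplified both end up here
    return "non-Asian"         # no Japanese, Chinese or Korean characters found


def category_of_setting(language: str) -> str:
    # Assumed setting names; every "Chinese ..." variant maps to one category.
    for cat in ("Japanese", "Chinese", "Korean"):
        if language.startswith(cat):
            return cat
    return "non-Asian"


def verify_language(text: str, language: str) -> Optional[str]:
    """Return a warning if the detected category differs from the setting."""
    detected = classify_page(text)
    expected = category_of_setting(language)
    if detected != expected:
        return f"Detected {detected} text, but the language setting is {language}"
    return None
```

For example, `verify_language("Hello", "Japanese")` returns a warning, while `verify_language("中文文本", "Chinese (Traditional)")` returns `None` because the sketch, like the real option, cannot tell the two Chinese variants apart.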
Auto-layout and auto-zoning are recommended for Asian pages. They place all detected text into text zones. By choosing an Asian recognition language, you set Asian OCR to run in these zones; it can automatically detect and transmit the text direction, coping with pages that mix areas of horizontal and vertical text.
However, the vertical Asian zoning tool lets you force vertical Asian recognition by manual zoning. Draw only rectangular zones with this tool. To manually zone horizontal Asian text, use the usual text zone type. Do not use the two other vertical-text tools on Asian text. Drawing a vertical Asian zone neither automatically enables an Asian language nor influences language auto-detection.
Digital camera images are accepted for Asian languages. However, the automatic 3D deskew algorithm is unlikely to be useful, and certainly not for vertical text. Instead, load images with the standard image loading command and, if required, perform manual 3D deskewing with the relevant SET tool. In general, SET tools can be used on Asian images.
Recognized Asian pages appear in the Text Editor, always with horizontal text direction, provided your system supports East Asian languages. There is no need to specify Asian fonts under Options/OCR; a default font, typically Arial Unicode MS, is applied automatically. Other Asian-capable fonts on your system can be chosen in the Text Editor. The Text Editor supports viewing and verifying text; Formatted Text is the recommended formatting level. Large-scale editing and spell-checking are better done in the target application.
Proofing, training and dictionary support are not available for Asian text. Therefore, before performing Asian OCR, go to the Proofing panel under Options, disable dictionary word marking, automatic proofreading and IntelliTrain, and ensure that no training file is loaded.
Redaction can be applied to Asian texts, either by selection or searching. The workflow step
Form Data Extraction should not be applied to Asian pages.
Typical output converters for Asian texts are RTF, Microsoft Word, Searchable PDF or XPS.
The text direction will be as detected during pre-processing. Changes made in the Text Editor, where text is always shown horizontally, are exported, including to text that is vertical in the output. Plain Text converters are available (Unicode TXT, Notepad), but with these the text direction is always horizontal.
Training
Training is the process of changing the OCR solutions assigned to character shapes in the
image. It is useful for uniformly degraded documents or when an unusual typeface is used
throughout a document. OmniPage offers two types of training: manual training and
automatic training (IntelliTrain). Data from both types of training is combined and can be
saved to a training file.
When you leave a page on which training data was generated, you will be asked how to apply
it to other existing pages in the document.
Manual training
To do manual training, place the insertion point in front of the character you want to train, or
select a group of characters (up to one word) and choose Train Character... from the Tools
menu or the shortcut menu. You will see an enlarged view of the character(s) to be trained,
along with the current OCR solution. Change this to the desired solution and click OK. The
program takes this training and examines the rest of the page. If it finds candidate words to
change, the Check Training dialog box lists these. Incorrect words should be re-trained before
the list is approved.
IntelliTrain
IntelliTrain is an automated form of training. It takes input from the corrections you make
during proofing. When you make a change, it remembers the character shape involved and
your correction. It then searches the document for similar character shapes, especially in
suspect words, and assesses whether to apply your correction to them.
You can turn IntelliTrain on or off in the Proofing panel of the Options dialog box.
IntelliTrain remembers the training data it collects, and adds it to any manual training you
have done. This training can be saved to a training file for future use with similar documents.
For examples of IntelliTrain, see Help.
Training files
Whenever you close a document or switch to another one while unsaved training data exists, a
dialog box appears allowing you to save it. To save a training file into an OPD, load it from
Tools > Training File, click Embed, and save the document as file type OmniPage Document.
Saving training to file, loading, editing and unloading training files are all done in the Training
Files dialog box.
Unsaved training can be edited in the Edit Training dialog box; an asterisk is then displayed in the
title bar in place of a training file name. Save it in the Training Files dialog box.
A training file can also be edited; its name appears in the title bar. If unsaved training has been
added to it, an asterisk appears after its name. Both the unsaved and the modified training are
saved when you close the dialog box.
The Edit Training dialog box displays frames, each containing a character shape and the OCR
solution assigned to that shape. Click a frame to select it. You can then delete it with the
Delete key or change its assigned solution. Use the arrow keys to move to the next or previous frame.
Figure: the Edit Training dialog box, with these callouts:
You are editing your unsaved training.
This frame has been deleted. To undelete it, select it again and press the Delete key.
This frame is selected.
Top part: image shape. Bottom part: OCR solution.
Double-click a frame or press Enter to change its OCR solution.
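The dialog behavior described above (title-bar asterisk, Delete acting as a toggle, changing a solution, arrow-key navigation) can be summarized in a small model. This is a hypothetical sketch: the class names, fields and the returned mapping are illustrative assumptions, not an OmniPage API or file format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical model of the Edit Training dialog box; names are illustrative only.


@dataclass
class Frame:
    shape_id: str          # stands in for the character image (top part)
    solution: str          # OCR solution assigned to the shape (bottom part)
    deleted: bool = False


@dataclass
class EditTrainingDialog:
    frames: List[Frame]
    file_name: Optional[str] = None   # None when editing unsaved training only
    has_unsaved: bool = True          # unsaved training present or added to the file
    selected: int = 0

    def title(self) -> str:
        # Unsaved training only: an asterisk replaces the file name.
        if self.file_name is None:
            return "*"
        # Training file with unsaved training added: asterisk after the name.
        return self.file_name + ("*" if self.has_unsaved else "")

    def press_delete(self) -> None:
        # Delete toggles: pressing it on a deleted frame undeletes it.
        frame = self.frames[self.selected]
        frame.deleted = not frame.deleted

    def change_solution(self, new_solution: str) -> None:
        # Corresponds to double-clicking the frame or pressing Enter.
        self.frames[self.selected].solution = new_solution

    def move(self, step: int) -> None:
        # Arrow keys: +1 for the next frame, -1 for the previous one (clamped).
        self.selected = max(0, min(len(self.frames) - 1, self.selected + step))

    def close(self) -> Dict[str, str]:
        # On close, the non-deleted frames are what gets kept as training data.
        return {f.shape_id: f.solution for f in self.frames if not f.deleted}
```

In this sketch, selecting a deleted frame and calling `press_delete()` again restores it, matching the undelete callout; the file name `invoices_training` used in the checks below is purely hypothetical.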
Text and image editing
OmniPage has a WYSIWYG Text Editor, providing many editing facilities. These work very
similarly to those in leading word processors.
Editing character attributes
In all formatting levels except Plain Text, you can change the font type, size and attributes (bold,
italic, underlined) for selected text.
Editing paragraph attributes
In all formatting levels except Plain Text, you can change the alignment of selected paragraphs
and apply bulleting to paragraphs.
Paragraph styles
Paragraph styles are auto-detected during recognition. A list of styles is built up and presented
in a selection box on the left of the Formatting toolbar. Use this to assign a style to selected
paragraphs.
Graphics
You can edit the contents of a selected graphic if you have an image editor on your computer.
Click Edit Picture With in the Format menu. Here you can choose the image editor associated
with BMP files in your Windows system to load the graphic. Alternatively, use the Choose
Program... item to select another program; it will then replace the Default Image Editor item.
Edit the graphic, then close the editor to have it re-embedded in the Text Editor. Do not change
the graphic’s size, resolution or type, because this prevents re-embedding.
You can also edit images before recognition using the Image Enhancement tools.
Tables
Tables are displayed in the Text Editor in grids. When you move the cursor into a table area, it changes
appearance, allowing you to move gridlines. You can also use the Text Editor’s rulers to modify
a table. Modify the placement of text in table cells with the alignment buttons in the Formatting
toolbar and the tab controls in the ruler.
Hyperlinks
Web page and e-mail addresses can be detected and placed as links in recognized text. Choose
Hyperlink... in the Format menu to edit an existing link or create a new one.
Editing in True Page
Page elements are contained in text boxes, table boxes and picture boxes. These usually
correspond to text, table and graphic zones in the image. Click inside an element to see its box
border, which has the same color as the corresponding zone. The Help topic True Page
provides details on the operations summarized here.
Frames have gray borders and enclose one or more boxes. They are placed when a visible
border is detected in an image. Format frame and table borders and shading with a shortcut
menu or by choosing Table... in the Format menu. Text box shading can be specified from its
shortcut menu.
Multicolumn areas have orange borders and enclose one or more boxes. They are auto-detected
and show which text will be treated as flowing columns when exported with the
Flowing Page formatting level.
Reading order can be displayed and changed. Click the Show reading order tool in the
Formatting toolbar to have the order shown by arrows. Click again to remove the arrows.
Click the Change reading order tool to replace the Formatting toolbar with a set of
reordering buttons. A changed order is applied in the Plain Text and Formatted Text
formatting levels. When the page is exported as True Page, it modifies the way the cursor
moves through the page.
On-the-fly editing
This allows you to modify a recognized page through re-zoning, without having to re-process
the whole page. When on-the-fly editing is enabled, zone changes (deleting, drawing,
resizing, changing type) immediately make changes in the recognized page. Conversely, when
you modify elements in the Text Editor’s True Page formatting level, this changes the zones
on that page.
Two linked tools on the Image toolbar control on-the-fly zoning. One of these tools is always
active whenever no recognition is in progress.
Click this to activate on-the-fly editing. The red signal shows there are no stored zoning
changes.
Click this to turn on-the-fly editing off. Your zoning changes are stored; the on-the-fly
tool displays a green signal to show there are stored changes. To activate these changes,
do one of the following:
Click the on-the-fly tool with a green signal. The zoning changes will cause changes in the
Text Editor.
Click the Perform OCR button to have the whole page (re)recognized, including your
zone changes.
For details on how changes are handled in on-the-fly zoning and their effects in the Text Editor,
see On-the-fly processing in Help.
Marking and redacting
The Mark Text toolbar gives you tools to mark text (highlight or strike it out)
and to redact text. Use the View menu to display this toolbar. You
can float or dock this tool group. Each tool has an equivalent menu item in
the Format menu or the Text Editor shortcut menu.
Redacting blacks out confidential information, making it unreadable and
unsearchable. To mark and redact text manually, click the Mark for
Redacting tool and use its cursor to select all the text parts you want to redact. They appear
with a gray highlight. When you are ready, click the Redact Document tool. Choose whether to
redact a copy (safer) or the original document. If you redact a copy, both the
copy and the original remain open in OmniPage, ready to be saved.
WARNING: If you redact the original document, you cannot retrieve the information you
have blacked out.
To find and redact text by searching, select Find and Mark Text from the Edit menu to display
the Find, Replace and Mark Text dialog box. Search for text to be marked for redaction. Step
through all occurrences and decide in each case whether to redact immediately or mark for
redaction. In the latter case, perform the redaction by choosing Close and Redact Document in
the dialog box, or click the Redact Document button later.
You can apply highlighting and striking out either by selection or searching.
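The find-and-mark workflow and the copy-versus-original choice can be illustrated with a string-level analogy. This is a hypothetical sketch: `Document`, `mark_matches` and `redact` are illustrative names, and OmniPage redacts within its own document model rather than in a plain string. Overwriting the characters is what makes the redacted text both unreadable and unsearchable.

```python
import copy
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical analogy of marking and redacting; not an OmniPage API.

REDACTION_CHAR = "\u2588"  # full block character used to black out text


@dataclass
class Document:
    text: str
    marked: List[Tuple[int, int]] = field(default_factory=list)  # [start, end) spans


def mark_matches(doc: Document, query: str) -> int:
    """Mark every literal, case-sensitive occurrence of query; return the count."""
    if not query:
        return 0
    spans = [m.span() for m in re.finditer(re.escape(query), doc.text)]
    doc.marked.extend(spans)
    return len(spans)


def redact(doc: Document, in_copy: bool = True) -> Document:
    """Black out marked spans, in a copy (default, safer) or in the original."""
    target = copy.deepcopy(doc) if in_copy else doc
    chars = list(target.text)
    for start, end in target.marked:
        for i in range(start, end):
            if not chars[i].isspace():   # keep spacing so the layout is preserved
                chars[i] = REDACTION_CHAR
    target.text = "".join(chars)
    target.marked.clear()                # spans are now redacted, no longer just marked
    return target
```

Redacting a copy leaves the original readable, which mirrors why the manual calls that option safer; with `in_copy=False` the characters are overwritten in place and cannot be recovered.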
Reading text aloud
The Nuance RealSpeak® speech facility is provided for the visually impaired, but it can also
be useful to anyone during text checking and verification. Speech is controlled by
movements of the insertion point in the Text Editor, which can be mouse- or keyboard-driven.
To hear text                                Use these keys
One character at a time, forward or back    Right or left arrow (letter, number or punctuation names are spoken)
Current word                                Ctrl + Numpad 1
One word to the right                       Ctrl + right arrow
One word to the left                        Ctrl + left arrow
A single line                               Place the insertion point in the line
Next line                                   Down arrow
Previous line                               Up arrow