For more information, go to Scanning Settings in Chapter 12,
Working with Settings.
Static Thresholding Setting
Each pixel is a shade of gray with a value between 0 and 255. A
pixel whose value is zero is completely white. The higher the
number, the darker the pixel. With Static thresholding, Kurzweil
1000 simplifies that number into one of two states: zero for white,
and one for black. The resulting image, made up of those two-state
pixels, is a binary image. Kurzweil 1000 treats all shades of gray
higher than a particular number, called the threshold, as black.
Shades below or equal to the threshold are white.
When you use Static thresholding with the brightness setting, the
higher the brightness value, the more gray values convert into white
pixels, so the resulting image is brighter.
Static thresholding has its advantages. It gives you the greatest
possible control over the brightness of the final image. It is also very
fast because a binary image is smaller than a grayscale one.
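The rule described above can be sketched in a few lines of Python. This is only an illustration of the thresholding rule as this guide states it (0 is white, larger values are darker, values above the threshold become black). The function name, the sample data, and the idea that a higher brightness value corresponds to a higher threshold are assumptions made for the example, not a description of how Kurzweil 1000 is implemented.

```python
def static_threshold(gray_rows, threshold):
    """Convert rows of gray values (0-255) to a binary image.

    Convention from this guide: 0 is white and larger values are darker.
    Values above the threshold become 1 (black); values at or below the
    threshold become 0 (white).
    """
    return [[1 if value > threshold else 0 for value in row] for row in gray_rows]


# A tiny 2 x 4 "page" of gray values (hypothetical sample data).
page = [
    [10, 120, 130, 250],
    [0, 128, 129, 200],
]

# Assumption: a higher brightness value acts like a higher threshold,
# so more gray values end up white.
low = static_threshold(page, threshold=100)
high = static_threshold(page, threshold=128)

print(low)   # more black pixels
print(high)  # more white pixels
```

With the higher threshold, the mid-gray values 120 and 128 turn white instead of black, which matches the statement that a higher brightness value produces a brighter image.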
Dynamic Thresholding Setting
Dynamic thresholding, the second approach, breaks up each page
into regions, and then tries to decide what would be the best
threshold for each region. One page can end up with several
different thresholds, which reduces the control you have over the
brightness of the final image.
This setting is particularly well-suited for multi-colored pages.
However, because more data has to go from the scanner to the
software, this setting can slow down the scan process.
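As a rough illustration of the per-region idea, the sketch below splits a page into square tiles and picks a separate threshold for each one. This guide does not say how Kurzweil 1000 chooses a region's threshold, so using the tile's mean gray value is purely an assumption for the example, as are the function name and the sample data.

```python
def dynamic_threshold(gray_rows, region_size):
    """Binarize a page tile by tile, with one threshold per tile.

    Assumption for this sketch: each tile's threshold is the mean of its
    gray values. The real selection method is not documented here.
    """
    height = len(gray_rows)
    width = len(gray_rows[0]) if height else 0
    binary = [[0] * width for _ in range(height)]
    thresholds = {}

    for top in range(0, height, region_size):
        for left in range(0, width, region_size):
            rows = range(top, min(top + region_size, height))
            cols = range(left, min(left + region_size, width))
            values = [gray_rows[r][c] for r in rows for c in cols]
            threshold = sum(values) / len(values)
            thresholds[(top, left)] = threshold
            for r in rows:
                for c in cols:
                    # Darker than the tile's threshold -> 1 (black),
                    # otherwise 0 (white).
                    binary[r][c] = 1 if gray_rows[r][c] > threshold else 0
    return binary, thresholds


# Hypothetical page: light background on the left, dark background on the right.
page = [
    [20, 60, 180, 240],
    [20, 60, 180, 240],
]

binary, thresholds = dynamic_threshold(page, region_size=2)
print(binary)
print(thresholds)
```

In this sample, a single page-wide threshold of 128 would turn the whole left half white and the whole right half black. With one threshold per tile, the darker pixels in each half stay distinguishable from their own background, which is why this approach suits multi-colored pages.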
Grayscale Setting
The Grayscale option does away with thresholding altogether. The
full grayscale image is sent to the character recognition engine.
This option increases the size of the file when you save the full
grayscale image. You can also expect that this will be the slowest
approach to scanning and recognition. When you use the
Grayscale setting with the latest version of the FineReader Engine,
however, you can improve recognition of difficult documents.
Color Setting
Select color scanning if you want the image to be viewable in color
or to create a color TIFF format file. In color scanning, Kurzweil
1000 uses black and white mode, while retaining the color data.
This can greatly reduce the speed of the process.
This option also affects the resolution of the resulting color image,
which displays at one half the resolution of the original scanned
image. For example, if you scan at 300 DPI, the resulting image is
at 150 DPI.
Interrupting and Saving Recognition
When opening a multi-page image or PDF file, Kurzweil 1000 tells
you how many pages the file contains. At any time during
recognition, you can press Escape to interrupt the process. To
resume recognition, press F9 or Start New Scan.
If, however, you want to resume recognition at a later time, you can
save the recognition.
To save recognition:
1. During recognition, press Escape to interrupt the process.
2. Use File Save (Alt+F+S) to save the file. Note where the file
is saved and under what name, because you need that
information to find and open it when you are ready to resume
recognition.
3. Close the file.
4. When you re-open the file, Kurzweil 1000 tells you how many
pages remain to be recognized and asks if you want to continue
recognition. Select Yes to continue at the point where you left
off. If you select No, recognition does not take place and you
can continue to read the document as is.
Optimizing Recognition
Like Optimize Scanning, Optimize Recognition directs Kurzweil
1000 to automatically find the best settings for producing optimal
OCR results for an open page. Accessible from the Scan menu, this
command opens a dialog containing two settings: Text Quality and
Engine; both are set by default to Optimize.
When Kurzweil 1000 completes the recognition, you can opt to
keep the newly recognized page by pressing the ENTER key or
discard it by pressing the CANCEL or ESCAPE key.
A few notes about this feature:
• Because this feature works with a page image, the Keep Images
setting in General Settings must be enabled when scanning the
page.
• Optimize Recognition is not savable to a Settings file.
• If you opt to Not Optimize in one session, Kurzweil 1000 does
not keep that setting for the next session. Instead, it resets the
setting to the Optimize default.
Looking at Recognition Statistics
An important tool for choosing the appropriate scanning settings,
such as brightness, is a recognition statistics report.
There are three types of reports that give you information about the
performance of the character recognition engine:
• Last Page provides statistics about the last page that was
recognized.
• Cumulative provides statistics about all the pages the system
has recognized.
• Poorly Recognized gives you a list of those pages in the
current document whose Confidence level falls below a threshold
that you set.
To view Recognition Statistics:
Choose Recognition Statistics from the Tools menu or use the
mnemonics ALT+O+O.
If you do not have a file open, the Cumulative dialog opens.
If you have a file open, use the UP or DOWN ARROW key to select
Last Page, Cumulative, or Poorly Recognized, or press their
mnemonic, L, C, or P, respectively, then press the ENTER key.
Recognition Statistics for the Last Page
The Last Page dialog contains the following indicators:
• Scan Time (ALT+N) tells you the time elapsed, in seconds,
between the press of the Scan button and the completion of scanning.
• Speed (ALT+S) at which the page was recognized.
• Confidence level (ALT+C) tells you how confident the system is
about the recognition process. The higher the number, the more
confident Kurzweil 1000 is about the accuracy of the process.
• Spelling Accuracy (ALT+L) tells you the percentage of words
on the last recognized page that were spelled correctly.
• Character Count (ALT+H) is the number of characters
recognized.
• Illegible Characters (ALT+I) are counted when a recognition
engine thinks that a particular blob on the page is probably a
character, but it has no idea what that character is.
• Questionable Characters (ALT+Q) are counted when a
recognition engine thinks that it has recognized a character, but
is not particularly certain of its recognition.
• Touching Characters (ALT+T) and Broken Characters
(ALT+B) are useful in determining what brightness setting to
use. A high count of touching characters is an indication that
your brightness setting may be too low. A high count of broken
characters is an indication that your brightness setting may be
too high. These two statistics are available only for the RTK engine.
• Elapsed Time (ALT+E) is measured in seconds, and only
includes the time it took to do automatic orientation, despeckling,
column finding, and recognition. It does not include scanning
time.
• Automatic Corrections (ALT+A) indicates the number of
errors Kurzweil 1000 corrected.
• The number of Images Remaining (ALT+R) is available if you
are doing batch recognition, and indicates how many page
images remain to be recognized.
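The guidance on Touching Characters and Broken Characters can be turned into a simple rule of thumb, sketched below in Python. The function name is hypothetical, and the cutoff for what counts as a "high" count (5 percent of the Character Count) is an arbitrary assumption; this guide gives no numeric cutoff.

```python
def suggest_brightness_change(touching, broken, character_count, high_ratio=0.05):
    """Suggest a brightness adjustment from Last Page statistics.

    Returns "increase", "decrease", or "keep". The high_ratio cutoff is a
    hypothetical value chosen for illustration only.
    """
    if character_count <= 0:
        return "keep"

    touching_ratio = touching / character_count
    broken_ratio = broken / character_count

    # Many touching characters suggest the brightness may be too low.
    if touching_ratio > high_ratio and touching_ratio >= broken_ratio:
        return "increase"
    # Many broken characters suggest the brightness may be too high.
    if broken_ratio > high_ratio:
        return "decrease"
    return "keep"


print(suggest_brightness_change(touching=200, broken=10, character_count=2000))
print(suggest_brightness_change(touching=10, broken=200, character_count=2000))
print(suggest_brightness_change(touching=10, broken=10, character_count=2000))
```

Because these two statistics are available only for the RTK engine, a rule like this would apply only to pages recognized with that engine.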
Cumulative Recognition Statistics
The Cumulative dialog contains all the controls in the Last Page
dialog, except for Touching Characters and Broken Characters. It
presents statistics for all recognized pages since recognition began
on your system. A Clear button allows you to reset values to zero.
Poorly Recognized Pages
By default, the Poorly Recognized Pages list (ALT+P), in the
Poorly Recognized Pages dialog, lists pages in the current
document whose Confidence level falls below a certain threshold.
The pages are sorted by ascending Confidence level, so the
topmost page is likely to be the most poorly recognized. You can,
however, sort by a number of other parameters available from the
Sort by list, and you can group the pages within the list using the
Group by options. The controls in this dialog are:
• Sort by list (ALT+S). Select Confidence Level, Page Number
with the confidence threshold, Spelling Percentage, or
Page Number with the spelling threshold.
If you are in the Poorly Recognized Pages (ALT+P) list, you
can use CTRL+S to cycle through the Sort by options, thereby
changing the pages included on the list and their ordering. A
page’s presence on the list is based on either its Confidence or
Spelling Threshold value, and the list order is either by
ascending page number or threshold value. The order is also
affected by the option selected in the Group by list.
When you exit Kurzweil 1000, the system does not save this
selection.
• Group by list (ALT+G).
You can select:
Nothing to leave the pages ungrouped.
Image State to group the pages so that those with images are
presented first, followed by those without images. Note: Pages
only have images when the Keep Images option in the General
settings is enabled during recognition.
Edit State to group pages so that those that have not been
marked as "edited" are presented first, followed by those that
have been edited.
Both Image State and Edit Status to split the list into four
groups: unedited pages with images, unedited pages without
images, edited pages with images, and edited pages without
images.
• Confidence Threshold text box (ALT+T). Enter a threshold
from 0 to 100.
• Spelling Threshold text box (ALT+L). Enter a threshold from 0
to 100.
• Toggle Edit Status button (ALT+E). Mark the page as edited.
You can also toggle a page’s Edit Status by pressing CTRL+E
when you are positioned in the list of poorly recognized pages.
• Apply button (ALT+ENTER). Applies changes to the Confidence
Threshold, Spelling Threshold, Group by, and Sort by settings
without your having to choose OK and leave the dialog box.
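The way the Confidence Threshold, the default sort order, and the Group by options combine can be sketched as follows. The `PageStats` record, the function name, and the sample values are hypothetical and exist only to illustrate the ordering described above; they are not part of Kurzweil 1000.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    number: int
    confidence: int  # 0 to 100
    has_image: bool
    edited: bool


def poorly_recognized(pages, confidence_threshold,
                      group_by_image=False, group_by_edit=False):
    """List pages below the threshold, grouped, then by ascending confidence."""
    poor = [p for p in pages if p.confidence < confidence_threshold]

    def sort_key(page):
        key = []
        if group_by_edit:
            key.append(page.edited)         # unedited pages first
        if group_by_image:
            key.append(not page.has_image)  # pages with images first
        key.append(page.confidence)         # lowest confidence first
        return tuple(key)

    return sorted(poor, key=sort_key)


pages = [
    PageStats(1, 95, True, False),
    PageStats(2, 40, False, False),
    PageStats(3, 55, True, True),
    PageStats(4, 60, True, False),
    PageStats(5, 30, False, True),
]

ungrouped = [p.number for p in poorly_recognized(pages, 70)]
by_image = [p.number for p in poorly_recognized(pages, 70, group_by_image=True)]
by_both = [p.number for p in poorly_recognized(
    pages, 70, group_by_image=True, group_by_edit=True)]

print(ungrouped)
print(by_image)
print(by_both)
```

Page 1 never appears because its Confidence level is above the threshold of 70. With both groupings enabled, the order follows the four groups listed above: unedited with images, unedited without images, edited with images, and edited without images.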
To open and edit a poorly recognized page:
1. Select the page you want from the list and press the ENTER
key.
2. In the page, you might change some recognition parameters,
rerecognize (if the page has an image), and/or rescan the page.
During recognition, if the Confidence level falls below the
threshold, the system immediately announces the threshold
level value, which indicates a poorly recognized page.
3. After you have finished working on the page, press ESCAPE to
return to the Poorly Recognized Pages dialog to continue editing
additional pages. Don’t forget to mark the pages you’ve edited
by using CTRL+E.
Batch Scanning
Batch scanning lets you scan a number of pages at once without
recognizing or reading them. Instead, you can store them as image
files, which are like snapshots of the original document, then
perform the recognition process later.
Batch scanning saves time during the actual scanning process, as
the system does not recognize each page as it is scanned. Since
the recognition process is completely automated, Kurzweil 1000
can perform this step while the system is unattended.
To perform a batch scan using menus:
1. Open the Settings menu and choose Scanning. In the tab
page that appears, press TAB to go to the Mode option. Use
the arrow keys to choose Image Scanning Only, then press
ENTER.