What is an OCR system?
An OCR (Optical Character Recognition) system enables you to input printed documents
into your computer automatically via a scanner.
FineReader is an omnifont optical text recognition system: it can recognize texts set in
practically any font without any prior training. FineReader owes its high recognition
accuracy and low sensitivity to print defects to special recognition technology based on
the principles of Integral Purposeful Adaptive (IPA) perception.
The document input process can be divided into two stages:
1. Scanning. During the first stage the scanner acts as the computer’s “eye”. It
looks at the image and transfers it to the computer. The acquired image is
nothing more than a picture, a set of black, white, and color dots impossible
to edit in any word processor.
2. Recognition. During the second stage FineReader carries out OCR image
processing.
Let’s take a closer look at the second stage.
FineReader OCR image processing involves analyzing the image file transmitted by the
scanner (layout analysis) and recognizing each character. The layout analysis (selecting the
recognition areas, tables, pictures, lines, and individual characters) and image reading
processes are closely related. Page layout analysis is more accurate if the nature of the text
is known to the application.
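As a rough illustration of what locating lines and individual characters can involve, the sketch below uses projection profiles, a common textbook technique. This guide does not describe how FineReader performs layout analysis, so the approach, the tiny sample bitmap, and every name here (`ink_runs`, `find_text_lines`, `find_characters`, `PAGE`) are assumptions made for the example.

```python
def ink_runs(profile):
    """Return (start, end) index pairs for each consecutive run of non-zero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            runs.append((start, i - 1))    # the run ended on the previous index
            start = None
    if start is not None:                  # ink reaches the last index
        runs.append((start, len(profile) - 1))
    return runs


def find_text_lines(bitmap):
    """Find bands of rows that contain at least one black dot."""
    row_profile = [sum(row) for row in bitmap]
    return ink_runs(row_profile)


def find_characters(bitmap, line):
    """Within one text line, find groups of columns that contain black dots."""
    top, bottom = line
    band = bitmap[top:bottom + 1]
    column_profile = [sum(column) for column in zip(*band)]
    return ink_runs(column_profile)


# Hypothetical scanned page: 1 = black dot, 0 = white dot.
PAGE = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

for line in find_text_lines(PAGE):
    print(line, find_characters(PAGE, line))
```

A real page also contains tables, pictures, skew, and touching characters, none of which this sketch handles.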
As mentioned previously, the image recognition process is based on the principles of
Integral Purposeful Adaptive (IPA) perception:
• Integrity – the identification of recognition objects based on a set of basic
elements and their interrelations.
• Purposefulness – the generation and purposeful verification of recognition
hypotheses.
• Adaptability – the system’s ability to learn and be trained.
These three principles determine the system’s behavior. The system generates a hypothesis
concerning a recognition object (a character, part of a character, or several glued
characters) and then accepts or rejects this hypothesis according to whether the structural
elements are present. These structural elements are computer equivalents of character parts
crucial for human perception (arcs, circles, dots etc.). The application then adapts itself to
the text according to the degree of accuracy attained. Purposeful searching and context
information enable the system to recognize even torn and distorted characters, rendering it
almost insensitive to print defects.
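The generate-and-verify cycle described above can be sketched in a few lines. This is a toy model under stated assumptions, not FineReader's implementation: the `STRUCTURE` table, the element names, and the functions `generate_hypotheses`, `verify`, and `recognize` are all hypothetical, and the adaptation step is left out.

```python
# Hypothetical table: the structural elements each character must contain.
STRUCTURE = {
    "o": {"circle"},
    "i": {"stem", "dot"},
    "l": {"stem"},
    "b": {"stem", "right_arc"},
    "d": {"stem", "left_arc"},
}


def generate_hypotheses(observed):
    """Propose characters sharing at least one element with the image, best overlap first."""
    candidates = [c for c, parts in STRUCTURE.items() if parts & observed]
    return sorted(candidates, key=lambda c: len(STRUCTURE[c] & observed), reverse=True)


def verify(char, observed):
    """Accept a hypothesis only if all of its structural elements were found."""
    return STRUCTURE[char] <= observed


def recognize(observed):
    """Test hypotheses in order and return the first accepted one, or None."""
    for char in generate_hypotheses(observed):
        if verify(char, observed):
            return char
    return None


print(recognize({"stem", "dot"}))                 # hypothesis "i" is accepted
print(recognize({"stem"}))                        # "i" is rejected (no dot), "l" is accepted
print(recognize({"stem", "right_arc", "speck"}))  # the stray "speck" is ignored
```

Because verification asks only whether the required elements are present, extra marks such as a speck of dirt do not block recognition in this sketch. Handling torn characters with missing elements would need the context information mentioned above, which is not modeled here.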
The final result is the recognized text that you see in the FineReader window, a text you
can edit and save in any convenient format.
ABBYY FineReader 6.0 User’s Guide