FileConvert can use a number of different OCR engines. Each engine has its own strengths and
weaknesses:
Standard
This is a basic engine whose biggest advantage is speed. Its accuracy is typically
above 90%, high enough to make your documents searchable. It tolerates poor images,
but its accuracy degrades as image quality goes down. This engine recognizes only
English characters. It also does not recognize rotated pages, so all scanned images
must have upright text.
Advanced
The Advanced engine is somewhat slower than the Standard engine, but its accuracy is
much better, usually above 97%. This engine supports and automatically detects Danish,
Dutch, English, French, German, Italian, Norwegian, Portuguese, Spanish, and Swedish. It
handles light, dark, and dirty backgrounds quite well. It can recognize rotated pages.
Microsoft Office (MODI)
Microsoft Office 2003 and newer include Microsoft Office Document Imaging, or MODI.
MODI includes a fast, capable OCR engine. The MODI engine is licensed from
ScanSoft, the maker of OmniPage and PaperPort, so its performance is comparable to
that of those products. It does not recognize rotated pages.
IMPORTANT: FileConvert bundles and installs only the Standard and Advanced engines. Any other
engine, such as MODI, is offered as an option only if it is detected on your system.
Page Timeout
Some pages can cause the OCR engine to hang. To let the OCR engine move on past pages like
these, you can specify a page timeout. If the engine hasn't been able to successfully OCR a page
before the timeout lapses, it will give up and move on to the next page. Failed pages will be
reported in the log.
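The page timeout behavior can be illustrated with a minimal Python sketch. This is not FileConvert's implementation: the names ocr_with_timeout and fake_engine are hypothetical, and the sketch assumes the OCR call can be run in a background thread. A thread that hangs is simply abandoned here; a real tool would more likely run the engine in a separate process so that a hung page can be terminated.

```python
import threading
import time


def ocr_with_timeout(ocr_page, pages, timeout_seconds):
    """Run ocr_page on each page, giving up on any page that exceeds the timeout.

    Returns (results, failed): results maps page index to recognized text,
    and failed lists the indexes of pages that timed out or raised an error.
    """
    results = {}
    failed = []
    for index, page in enumerate(pages):
        outcome = {}

        def worker(page=page, outcome=outcome):
            try:
                outcome["text"] = ocr_page(page)
            except Exception as exc:  # an engine error also counts as a failed page
                outcome["error"] = exc

        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        thread.join(timeout_seconds)  # wait at most timeout_seconds for this page

        if thread.is_alive() or "error" in outcome:
            failed.append(index)  # this is what would be reported in the log
        else:
            results[index] = outcome["text"]
    return results, failed


def fake_engine(page):
    # Hypothetical stand-in for a real OCR call: the page named "hang" stalls.
    if page == "hang":
        time.sleep(3)
    return page.upper()


results, failed = ocr_with_timeout(fake_engine, ["a", "hang", "b"], 0.5)
print("recognized:", results)
print("failed pages:", failed)
```

In this sketch the second page exceeds the timeout, so the loop moves on to the third page and records the second as failed.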
Line Break Options
Use these options to tell FileConvert where to insert line breaks (hard returns) in the OCR text.
NOTE: These options are only relevant if you choose Text as the output file format. When you
convert to PDF and embed hidden text, the OCR text will always be character-aligned behind the
image.
By Paragraph
FileConvert will try to figure out where paragraphs end based on punctuation. For
example, if a "." falls at the end of a line, it's probably the last sentence in the paragraph.
FileConvert will insert two returns wherever it thinks a paragraph ends. If you choose this
option, you should proofread your OCR text to make sure all of the line breaks were
handled correctly.
By Line
FileConvert will preserve the original lines from the document. This means that wherever a
line wraps in the document, FileConvert will insert a line break. FileConvert will not try to
figure out paragraph endings.
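The difference between the By Paragraph and By Line options can be sketched in Python as follows. This is only an illustration of the heuristic described above, not FileConvert's actual algorithm. The function names are hypothetical, and two details are assumptions: lines inside a paragraph are joined with a single space, and "!" and "?" are treated like "." as sentence endings.

```python
# Assumption: the text above only mentions "."; "!" and "?" are added here.
SENTENCE_ENDINGS = (".", "!", "?")


def break_by_line(ocr_lines):
    """Keep every original line: one line break wherever the document wrapped."""
    return "\n".join(ocr_lines)


def break_by_paragraph(ocr_lines):
    """Guess paragraph ends from punctuation and separate them with two returns."""
    paragraphs = []
    current = []
    for line in ocr_lines:
        text = line.strip()
        if not text:
            continue  # ignore empty lines in the OCR output
        current.append(text)
        if text.endswith(SENTENCE_ENDINGS):
            # A line ending in sentence punctuation is taken as a paragraph end.
            paragraphs.append(" ".join(current))
            current = []
    if current:  # trailing text with no closing punctuation
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


lines = [
    "The quick brown fox",
    "jumps over the dog.",
    "A new paragraph",
    "starts here.",
]
print(break_by_paragraph(lines))
print(break_by_line(lines))
```

The heuristic misfires whenever a sentence happens to end exactly at a line wrap in the middle of a paragraph, which is why proofreading is recommended for the By Paragraph option.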
None