Page Timeout. Sometimes an OCR engine can get stuck on a really tough page: poor resolution,
poor contrast, a lot of images, etc. The OCR Timeout option sets the “give up” point.
If the OCR engine hasn’t finished recognizing a page after __ seconds, FileCenter will give up
on that page and move on to the next one. For reference, most pages, even tough ones, can be
recognized in under 30 seconds.
Page Threads. To speed up OCR, you can have FileCenter OCR more than one page at a time.
Most modern hardware can handle this without bogging down. Use this setting to specify the
number of pages you want OCRed simultaneously (current limit: 2).
Pro Only: This option is only available in FileCenter Professional.
Page Text. When FileCenter creates a searchable PDF, it puts the hidden text behind the scan,
aligned with the words it belongs to. Some users prefer to have the text lumped in a single
hidden paragraph at the top of the page, which reduces the file size slightly and reduces the
number of text objects in the document. To do this, change the Embedded Text option to Top of
Page. Most users should leave this set to Word-Aligned.
Line Breaks. When you’re sending the OCR text to a word processor, FileCenter needs to know
where to put line breaks. You have three options:
By Paragraph. FileCenter will try to figure out where paragraphs end based on punctuation. For
example, if a period falls at the end of a line, it’s probably the last sentence in the paragraph.
FileCenter will insert two returns wherever it thinks a paragraph ends. If you choose this option,
you should proofread the text to make sure all of the line breaks were handled correctly.
By Line. FileCenter will preserve the original lines from the document: wherever a line wraps
in the document, FileCenter will insert a line break, so one horizontal line of text in the
document becomes one line of text in the output. FileCenter will not try to figure out
paragraph endings.
None. FileCenter will not insert any line breaks. The text will come out as one continuous line of
text.
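To make the difference between the three options concrete, here is a minimal Python sketch of how each one might combine the recognized lines. It is an illustration only, not FileCenter’s actual code: the function name join_ocr_lines, the mode names, the rule that a line ending in ".", "!" or "?" closes a paragraph, and the choice to join lines with single spaces are all assumptions.

```python
def join_ocr_lines(lines, mode):
    """Combine recognized text lines using one of three line-break modes.

    Illustrative only: the mode names and the punctuation rule are
    assumptions, not FileCenter's actual implementation.
    """
    # Drop empty lines and surrounding whitespace so the joins are clean.
    lines = [line.strip() for line in lines if line.strip()]

    if mode == "line":
        # By Line: one line in the document becomes one line of output.
        return "\n".join(lines)

    if mode == "none":
        # None: no line breaks at all, just one continuous line of text.
        return " ".join(lines)

    if mode == "paragraph":
        # By Paragraph: guess paragraph endings from end-of-line punctuation
        # and insert two returns there; otherwise keep the text flowing.
        pieces = []
        for index, line in enumerate(lines):
            pieces.append(line)
            if index == len(lines) - 1:
                break  # nothing follows the last line
            if line.endswith((".", "!", "?")):
                pieces.append("\n\n")
            else:
                pieces.append(" ")
        return "".join(pieces)

    raise ValueError(f"unknown mode: {mode!r}")


sample = [
    "The quick brown fox",
    "jumps over the dog.",
    "A new paragraph",
    "starts here.",
]
print(join_ocr_lines(sample, "paragraph"))
```

Note that a punctuation rule like this one breaks a paragraph in the wrong place whenever a sentence in the middle of a paragraph happens to end at the end of a line, which is why the By Paragraph option calls for proofreading.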
Limit OCR To __ Pages. If your main reason for running OCR is to make the PDF searchable, you
might find that most of the important information is in the first few pages of the document. If
that’s true, you can save a lot of time by limiting OCR to the first few pages, especially if you
scan documents that are dozens or hundreds of pages long.
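Use the Limit OCR option to choose how many pages of the document you want to OCR. OCR
will stop once those pages are done. Keep in mind that pages past the limit are not
recognized, so text that appears only on those pages won’t turn up in searches.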