When the starting point for a PDF file is a set of images, or a scanning
process, this text layer is not present and the result is an image-only
PDF. When the starting point is an editable document, the text layer can
be created and the PDF is called 'Normal' or 'Searchable'. The creator of a
PDF can require a password to allow access to the text layer.
How does PDF Converter work?
PDF Converter has the ability to perform Optical Character Recognition
(OCR). This is the process of extracting text from an image. It does not
need OCR to unlock PDF or XPS files that have an accessible text layer.
For those files it must instead capture the page layout and arrange the
existing text and other elements correctly on each page in the new document.
Optical Character Recognition (OCR) is normally used only for input
pages without an accessible text layer or when non-standard character
encoding is detected, but you can require it for any conversion under
Processing Options in the Converter Assistant.
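The rule above can be summarized as a small decision function. This is
only an illustrative sketch in Python, not the program's actual logic;
the function name needs_ocr and its parameters are hypothetical and do
not correspond to any setting name in the Converter Assistant.

```python
def needs_ocr(has_accessible_text_layer: bool,
              has_standard_encoding: bool,
              force_ocr: bool = False) -> bool:
    """Return True when a page would be sent through OCR.

    Hypothetical model of the rule described in the text:
    - OCR is used when the page has no accessible text layer,
    - or when non-standard character encoding is detected,
    - or when the user requires OCR for every conversion.
    """
    if force_ocr:
        return True
    return (not has_accessible_text_layer) or (not has_standard_encoding)


if __name__ == "__main__":
    # A normal, searchable page with standard encoding: no OCR needed.
    print(needs_ocr(True, True))
    # An image-only page: OCR is needed.
    print(needs_ocr(False, True))
    # A searchable page, but the user required OCR for all conversions.
    print(needs_ocr(True, True, force_ocr=True))
```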
Handling Image-only Pages
Pages without a text layer are a special case for conversion. You can
decide how the program should handle these pages: convert them with
the built-in Optical Character Recognition (OCR), transfer them as
images to the target document or skip them. You can have the program
inspect the first pages (up to ten) of the files you open. Optionally, you
can set conversion to stop if no text-layer pages are detected.
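The choices described above can be pictured with a short Python sketch.
It is a simplified model under stated assumptions, not the program's
implementation: the names ImageOnlyAction, plan_conversion,
inspect_first and stop_if_no_text are hypothetical, and the sketch
assumes that the stop option is evaluated on the inspected pages only.

```python
from enum import Enum
from typing import List, Optional

MAX_INSPECTED_PAGES = 10  # the text allows inspecting up to ten pages


class ImageOnlyAction(Enum):
    """Hypothetical names for the three ways to handle image-only pages."""
    OCR = "ocr"            # convert with built-in OCR
    AS_IMAGE = "as_image"  # transfer the page as an image
    SKIP = "skip"          # leave the page out of the target document


def plan_conversion(has_text_layer: List[bool],
                    action: ImageOnlyAction,
                    inspect_first: int = 0,
                    stop_if_no_text: bool = False) -> Optional[List[str]]:
    """Return one handling label per page, or None if conversion is stopped.

    has_text_layer holds one flag per page: True if the page has an
    accessible text layer, False if it is image-only.
    """
    # Inspect at most ten of the first pages, as described in the text.
    inspected = has_text_layer[:min(inspect_first, MAX_INSPECTED_PAGES)]

    # Assumption: stop only when pages were inspected and none has text.
    if stop_if_no_text and inspected and not any(inspected):
        return None

    # Pages with a text layer are converted directly; the rest follow
    # the action the user selected.
    return ["text" if has_text else action.value for has_text in has_text_layer]


if __name__ == "__main__":
    pages = [True, False, True]
    print(plan_conversion(pages, ImageOnlyAction.AS_IMAGE))
```

In this model a file made only of scanned pages is stopped early when
both inspection and the stop option are enabled, while a mixed file is
converted page by page with the selected action.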
If you have Nuance® OmniPage®, you can use it to gain more control
over the recognition process.
PDF Converter supports over 100 languages, including Danish, Dutch,
English, Finnish, French, German, Italian, Norwegian, Polish,
Portuguese, Spanish and Swedish. The program can convert
multi-lingual documents. A full list of supported languages is provided
in Help. Correct language choice is important for converting image-only