Defines the method for extracting images from PDFs :
0 – Auto
1 – via Bitmap
2 – Extract TIFFs
3 – Convert to TIFF
4 – In-Place
Ensure that the output file is PDF/A compliant. Note that this cannot be
applied to In-Place PDF extraction conversions.
Specifies a temporary folder for the bitmap images used during OCR
processing. If this is not specified, the first of the following environment
variables that is defined will be used : TMP, TMPDIR, TEMP.
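The fallback order described above can be sketched in Python as follows. This illustrates only the documented lookup order. The function name `resolve_temp_folder` is hypothetical, and treating an empty variable as undefined is an assumption, not confirmed product behaviour.

```python
import os
from typing import Mapping, Optional

# Lookup order taken from the text above: TMP, then TMPDIR, then TEMP.
ENV_FALLBACK_ORDER = ("TMP", "TMPDIR", "TEMP")


def resolve_temp_folder(explicit: Optional[str] = None,
                        env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: pick the folder for temporary OCR bitmaps.

    An explicitly specified folder wins; otherwise the first defined
    variable in ENV_FALLBACK_ORDER is used. Returns None if none is set
    (what the product does in that case is not covered here).
    """
    if explicit:
        return explicit
    for name in ENV_FALLBACK_ORDER:
        value = env.get(name)
        if value:  # assumption: an empty string counts as "not defined"
            return value
    return None


# TMP is absent, so TMPDIR is chosen ahead of TEMP.
print(resolve_temp_folder(env={"TEMP": r"C:\Temp", "TMPDIR": "/var/tmp"}))  # → /var/tmp
```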
Specify that JBIG2 compression should be used for bitonal images.
There are two options that can be used to control how the OCR engine
processes parts of the document image that appear to be graphics areas.
By default, if an area of the document is identified as a graphic area, no
OCR processing is run on that area. However, certain documents may include
areas or boxes that are identified as “graphic” or “picture” areas but that
actually do contain useful text.
To force the OCR engine to process such areas, there are
two options :
“Treat all Graphics Areas as Text”. This option will ensure the entire
document is processed as text. To use this option from the command line use
“Remove Box Lines in OCR Processing”. This option is ideal for forms where
sometimes boxes around text can cause an area to be identified as graphics.
This option removes boxes from the temporary copy of the image used by the
OCR engine. It does not remove boxes from the final image. Technically, this
option removes connected elements whose area is at least a minimum size (by default 100
pixels). To use this option from the command line use -9 100 (or replace 100
with a different value greater than 10 if desired). This option is currently only applied for
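Conceptually, removing connected elements above a minimum area is connected-component filtering on a bitonal image: large connected shapes such as box outlines are cleared, while small shapes such as characters survive. The sketch below is a hypothetical illustration of that general technique, not Aquaforest's implementation. It assumes a list-of-lists image with 1 for black pixels, 8-connectivity, and that components with at least `min_area` pixels are the ones removed. It returns a modified copy, mirroring the point that only the OCR engine's temporary image is affected.

```python
from collections import deque
from typing import List

Grid = List[List[int]]  # 1 = black (ink) pixel, 0 = white


def remove_large_components(image: Grid, min_area: int = 100) -> Grid:
    """Return a copy of a bitonal image in which every 8-connected group
    of black pixels containing at least min_area pixels is set to white."""
    h = len(image)
    w = len(image[0]) if h else 0
    out = [row[:] for row in image]  # work on a copy; the input is untouched
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill collects one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Large components (e.g. box lines) are erased from the copy.
            if len(component) >= min_area:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out
```

For example, with `min_area=20`, a 12 × 12 box outline (44 black pixels) drawn around a separate 2 × 2 mark would be erased while the mark is kept.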
This command line option should generally only be used under guidance from
technical support. It can control the way that color images are processed and
force binarization with a particular threshold (for example, -q 127).
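Binarization with a fixed threshold classifies each pixel as black or white by comparing its gray level with the threshold value. The following is a generic sketch of that technique, not the product's algorithm. The 8-bit gray scale, the ITU-R BT.601 luma weights used to reduce color to gray, and treating a value equal to the threshold as white are all assumptions.

```python
from typing import List


def rgb_to_gray(r: int, g: int, b: int) -> int:
    """Reduce an 8-bit RGB pixel to gray using BT.601 luma weights
    (one common choice; the product's conversion is not documented)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def binarize(gray: List[List[int]], threshold: int = 127) -> List[List[int]]:
    """Map 8-bit gray values (0 = black, 255 = white) to a bitonal image.

    Pixels darker than `threshold` become 1 (black ink); all others become 0.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# A dark red pixel (gray value 81) falls below 127 and becomes black.
print(binarize([[rgb_to_gray(200, 30, 30), 200]]))  # → [[1, 0]]
```

Raising the threshold turns more mid-tone pixels black, which can recover faint text but also adds background noise.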
Line Removal - This removes lines and boxes during OCR processing to
improve recognition – particularly in cases where characters “touch” lines.
This option is available via the GUI drop-down or via the command line flag
-y lr100.5; the values of 100 and 5 are defaults and should only be changed
with guidance from Aquaforest technical support.
Other advanced Image Morphology options are available using the -y flag.
These are rarely required and should be used only under guidance from
Aquaforest technical support.
Blank page removal. This option can be used when converting TIFF files to
Searchable PDFs. A value should be provided that specifies the pixel
threshold used to determine whether a page is blank. A suggested
value is 100, i.e. using the advanced flag as shown below :