Chapter 8: Image Extraction
8.4 Page-based and Resource-based Image Loops
The distinction between placed images and image resources gives rise to two
fundamentally different approaches to image extraction: page-based and resource-based
image extraction loops. Both methods can be used to extract images to a disk file or to
memory.
Page-based image extraction loop. In this case the application is interested in the ex-
act page layout and placed images, but doesn’t care about duplicated image data. Ex-
tracting images with a page-based loop creates an image file for each placed image, and
may write the same image data more than once if an image is placed several times. The
application can avoid image duplication by checking for duplicate image IDs. However,
unique image resources can more easily be extracted with the resource-based image
extraction loop (see below).
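The duplicate check mentioned above can be sketched as follows. This is a minimal,
self-contained model in Python and not TET API code: the PlacedImage record, the
extract_page_based function and the generated file names are hypothetical and only
stand in for what a real page-based loop would retrieve and write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacedImage:
    """Hypothetical stand-in for one placed image on a page."""
    page: int       # page number on which the image is placed
    image_id: int   # ID of the underlying image resource


def extract_page_based(placed, skip_duplicates=False):
    """Return the (made-up) file names a page-based loop would produce."""
    seen_ids = set()
    files = []
    for img in placed:
        if skip_duplicates and img.image_id in seen_ids:
            continue  # the same image data has already been extracted
        seen_ids.add(img.image_id)
        files.append(f"page{img.page}_image{img.image_id}")
    return files


# Image 0 (e.g. a logo) is placed on both pages; image 1 only on page 2.
placed = [PlacedImage(1, 0), PlacedImage(2, 0), PlacedImage(2, 1)]

all_files = extract_page_based(placed)
unique_files = extract_page_based(placed, skip_duplicates=True)
```

Without the check the model yields one file per placed image; with the check, image
data that has already been seen is skipped.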
The page-based image extraction loop can be activated in the TET command-line
tool with the option --imageloop page. Code for page-based image extraction at the API
level is demonstrated in the images_per_page and images_in_memory topics in the TET
Cookbook. The images_per_page Cookbook topic and the image_extractor mini sample in
the TET packages also show how to retrieve the image geometry.
Details of the page-based image extraction loop (please refer to the sample code
mentioned above): TET_get_image_info( ) retrieves geometric information about a
placed image as well as the pCOS image ID (in the imageid field) of the underlying image
data. This ID can be used to retrieve more image details with TET_pcos_get_number( ),
such as the color space, width and height in pixels, etc., as well as the actual pixel data
with TET_write_image_file( ) or TET_get_image_data( ). TET_get_image_info( ) does not
touch the actual pixel data of the image. If the same image is referenced multiple times
on one or more pages, the corresponding IDs will be the same.
Resource-based image extraction loop. In this case the application is interested in the
image resources of the document, but doesn’t care which image is used on which page.
Image resources which are placed more than once (on one or more pages) are extracted
only once. On the other hand, image resources which are not placed at all on any page
will also be extracted.
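The relationship between resources and placements can be illustrated with a small
model. The document layout below (three image resources, two pages) is invented for
illustration, and no TET functions are called; the sketch only shows that a
resource-based loop visits each resource once, independent of its placements.

```python
from collections import Counter

# Hypothetical document model: three image resources with IDs 0..2, and the
# resource ID of each placed image, page by page. Resource 2 is never placed.
resource_count = 3
placements_by_page = {1: [0], 2: [0, 1]}

# Count how often each resource is placed anywhere in the document.
placement_counts = Counter(
    image_id for ids in placements_by_page.values() for image_id in ids
)

# A resource-based loop visits every resource ID exactly once ...
extracted = list(range(resource_count))

# ... regardless of how often (or whether) the resource is placed on a page.
times_placed = {image_id: placement_counts[image_id] for image_id in extracted}
```

In this model resource 0 is placed twice but extracted once, while resource 2 is
extracted although it does not appear on any page.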
The resource-based image extraction loop can be activated in the TET command-line
tool with the option --imageloop resource. Code for resource-based image extraction at
the API level is demonstrated in the image_resources mini sample and Cookbook topic.
The pCOS Path Reference contains more information regarding the pCOS interface.
Details of the resource-based image extraction loop (please refer to the sample code
mentioned above): all pages are opened before extracting image resources to make sure
that image merging is activated; if image merging is not relevant this step can be
skipped. In order to extract an image, the corresponding image ID is required. The code
enumerates all image IDs starting at 0. The number of image resources is queried with
TET_pcos_get_number( ) as the value of the pCOS path length:images; the highest image ID
is one less than this value. The mergetype pCOS pseudo object of each image resource is
examined in order to skip image parts which have been consumed by the image merging
process (e.g. the strips of a multi-strip image), since only the resulting merged image
is of interest. Once an image ID has been determined, one of the functions
TET_write_image_file( ) or TET_get_image_data( ) can be called to write the image data to
a disk file or pass the pixel data in memory, respectively.
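The enumeration and skipping logic described above can be sketched as follows. The
sketch is a self-contained Python model and does not call the TET API: the
image_resources list stands in for the pCOS queries, and the numeric encoding of
mergetype (0 = normal, 1 = artificial, 2 = consumed) is an assumption which should
be checked against the pCOS Path Reference.

```python
# Assumed encoding of the mergetype pseudo object (verify in the documentation).
NORMAL, ARTIFICIAL, CONSUMED = 0, 1, 2

# Hypothetical stand-in for the image resources of a document. The list index
# plays the role of the image ID, and len() that of the length:images query.
image_resources = [
    {"mergetype": NORMAL},      # ordinary image
    {"mergetype": CONSUMED},    # strip 1 of a multi-strip image
    {"mergetype": CONSUMED},    # strip 2 of the same image
    {"mergetype": ARTIFICIAL},  # merged result built from the two strips
]


def ids_to_extract(resources):
    """Enumerate all image IDs and drop parts consumed by image merging."""
    n_images = len(resources)           # models the length:images query
    selected = []
    for image_id in range(n_images):    # IDs 0 .. n_images - 1
        if resources[image_id]["mergetype"] == CONSUMED:
            continue                    # only the merged result is of interest
        selected.append(image_id)
    return selected


selected_ids = ids_to_extract(image_resources)
```

In a real application each selected ID would then be passed to
TET_write_image_file( ) or TET_get_image_data( ).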