Robert Buckley and Simon Tanner
August 2009
6
© Buckley & Tanner, KCL 2009
compression with 5 resolution levels is recommended for images of this and
similar sizes, which are typical of the sample images provided. In practice, the
number of resolution levels would vary with the original image size so that the
lowest resolution sub-image has the desired dimensions.
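
As a rough illustration of how the number of levels could be chosen from the image size, the following sketch picks the smallest number of levels whose lowest-resolution sub-image fits within a target size. It assumes that "resolution levels" means wavelet decomposition levels (as set by Clevels), that each level halves both dimensions with rounding up, and that the image has no offset on the reference grid. The function names, the 4000-by-6000 image size and the 200-pixel target are hypothetical and do not come from the report.

```python
import math


def levels_for_target(width, height, target_long_edge):
    """Smallest number of levels whose lowest-resolution sub-image has a
    long edge no larger than target_long_edge pixels (illustrative only)."""
    long_edge = max(width, height)
    levels = 0
    # Each decomposition level halves both dimensions, rounding up.
    while math.ceil(long_edge / 2 ** levels) > target_long_edge:
        levels += 1
    return levels


def lowest_resolution_size(width, height, levels):
    """Dimensions of the smallest sub-image after the given number of levels."""
    scale = 2 ** levels
    return math.ceil(width / scale), math.ceil(height / scale)


# Hypothetical 4000 x 6000 pixel scan with a thumbnail of at most 200 pixels.
levels = levels_for_target(4000, 6000, 200)
thumbnail = lowest_resolution_size(4000, 6000, levels)
print(levels, thumbnail)
```

For this hypothetical image the sketch selects 5 levels, which matches the recommendation above for images of similar size.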
3.3 Multiple quality layers
There are two main reasons for using multiple quality layers. One is so that it is
possible to decompress fewer layers and therefore less compressed data when
accessing lower resolution sub-images. This speeds up decompression without
affecting quality since the incremental quality due to the discarded layers is not
noticeable at reduced resolutions. The second reason is that multiple quality
layers make it possible to deliver subsets of the compressed image
corresponding to higher compression ratios, which may be acceptable in some
applications. This means there is less data to transmit and process, which
improves performance and reduces access times. It also means that it is
possible for the access format to be a subset of the preservation format,
derived from it by discarding quality layers as the application and quality
requirements warrant. Quality layers also make it possible to reduce storage
needs retroactively, should requirements be revised downward: discarding
quality layers in the preservation format turns images compressed at, for
example, 4:1 or 5:1 into images compressed at 8:1 or higher, depending on
where the quality layer boundaries are defined.
3.4 Example: TIFF to JP2 conversion
For example, the following command line uses the Kakadu [3] compress function
(kdu_compress) to convert a TIFF image to a JP2 file that contains an
irreversible JPEG 2000 datastream. In particular, it contains a lossy JPEG 2000
datastream with 5 resolution levels and 8 quality layers, corresponding to
compression ratios of 4, 8, 16, 32, 64, 128, 256 and 512 to 1 for a 24-bit color
image. These correspond to compressed bit rates of 6, 3, 1.5, 0.75, 0.375,
0.1875, 0.09375 and 0.046875 bits per pixel. (A compression ratio of 4 to 1
applied to a color image that originally had 24 bits per pixel means the
compressed image will equivalently have a compressed bit rate of 6 bits per
pixel.) The Kakadu command line uses bit rates rather than compression ratios
to specify the amount of compression.
kdu_compress -i in.tif -o out.jp2 -rate
6,3,1.5,0.75,0.375,0.1875,0.09375,0.046875 Creversible=no
Clevels=5 Stiles={1024,1024} Cblk={64,64} Corder=RPCL
The JPEG 2000 datastream created in this example has 1024-by-1024 tiles, 64-
by-64 codeblocks and a resolution-major progressive order RPCL, so that the
compressed data for the lowest resolution (and therefore smallest) sub-image
occurs first in the datastream, followed by the compressed data needed to
reconstruct the next lowest resolution sub-image and so on. This data ordering
means that the data for a thumbnail image occurs in a contiguous block at the
start of the datastream where it can be easily and speedily accessed. This data
organization makes it possible to obtain a screen-resolution image quickly from
a megabyte- or gigabyte-sized image compressed using JPEG 2000. Tiles and
codeblocks are used to partition the image for processing and make it possible
to access portions of the datastream corresponding to sub-regions of the
image.
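
The bit rates passed to -rate in the kdu_compress command above follow directly from the compression ratios: each rate is the original bit depth divided by the ratio. A minimal sketch of that arithmetic, assuming a 24-bit color original as stated in the text (the function name is illustrative):

```python
def rates_from_ratios(ratios, bits_per_pixel=24):
    """Convert compression ratios (N:1) to compressed bit rates in bits per pixel."""
    return [bits_per_pixel / ratio for ratio in ratios]


ratios = [4, 8, 16, 32, 64, 128, 256, 512]
rates = rates_from_ratios(ratios)

# Format the rates as a comma-separated list with no spaces.
rate_arg = ",".join(f"{rate:g}" for rate in rates)
print(rate_arg)
```

The resulting string is the comma-separated list used after -rate in the example.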
[3] http://www.kakadusoftware.com/
3.5 Minimally Lossy Compression
The JPEG 2000 coder in this example would discard transformed and
compressed data to obtain a compressed file size corresponding to 4:1
compression. This needs to be compared with the performance of the
minimally lossy coder, where no data is discarded but which is still lossy
because of the use of the irreversible transforms. In some cases, depending on
the image content, as shown in Section 3.6, the minimally lossy coder can give
higher compression ratios than 4:1. Accordingly, it is recommended that a
minimally lossy format with multiple quality layers and multiple resolution
levels be used for the preservation format. The access format would use
reduced quality subsets of the preservation format optionally obtained by
discarding layers and using reduced resolution levels.
3.6 Testing reversible and irreversible compression
The reason to use irreversible compression is that it gives better compression
than reversible compression, at the cost of introducing errors (or differences) in
the reconstructed image. This section examines this performance tradeoff.
Reversible and irreversible compression were applied to four images provided
by the Wellcome Digital Library (Figure 1). A variation on irreversible
compression was tested which used coder bypass mode, in which the coder
skipped the compression of some of the data. This gave a little less
compression, but made the coder (and decoder) run about 20% faster. The
Kakadu commands used in these tests are given in Appendix 2.
The compression ratios obtained with these three tests are shown in the
following table.
For these particular images, the compression ratio for irreversible JPEG 2000
was from about 40% to almost 80% higher than it was for reversible, and
irreversible coding was on average over 30% faster (with a further 20% boost
with coder bypass mode).
The cost of irreversible compared to reversible is the error it introduces. The
error or difference between the reversibly and irreversibly compressed images,
expressed as a peak signal-to-noise ratio (PSNR), is about 50 dB, which means
the average absolute error value was about 0.5.
For one of the sample images, 99.99% of the green component values were
the same after decompression as they were before, or at most two counts
different. (For the red and blue components, the percentages were 99.79 and
99.35.) This is within the tolerance for scanners: in other words, minimally
lossy irreversible JPEG 2000 compression adds about as much noise to an
image as a good scanner does.
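
To put the 50 dB figure in context, the sketch below relates a signal-to-noise figure in dB to an error in sample counts. It assumes the figure is a peak signal-to-noise ratio computed against a peak value of 255 (8 bits per component); the report does not state the formula used, so this is an illustration rather than the authors' method.

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    if len(original) != len(decoded):
        raise ValueError("sequences must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical data, no error
    return 10 * math.log10(peak ** 2 / mse)


def rmse_from_psnr(psnr_db, peak=255):
    """Root-mean-square error implied by a PSNR value."""
    return peak / 10 ** (psnr_db / 20)


# RMS error in counts implied by 50 dB under the 8-bit assumption.
print(round(rmse_from_psnr(50), 3))

# Tiny made-up example: two of four samples differ by one count.
print(round(psnr([10, 20, 30, 40], [10, 21, 30, 39]), 2))
```

Under these assumptions 50 dB corresponds to a root-mean-square error of roughly 0.8 counts. The mean absolute error can never exceed the root-mean-square error, so this is consistent with an average absolute error of about 0.5.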
A region was cropped from one image so that the visual effects of this error
could be examined more closely (Figure 2). On inspection, the differences
were not perceptible on screen or on paper. Unless being able to reconstruct
the original scan exactly is a requirement, legal or otherwise, irreversible
compression has a clear advantage over reversible compression.
Original                       Reversible   Irreversible   Irreversible w/bypass
L0051262_Manuscript_Page       2.25         3.45           3.42
L0051320_Line_Drawing          1.82         2.52           2.51
L0051761_Painting              2.46         3.96           3.90
L0051440_Archive_Collection    2.52         4.47           4.41
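
The gain of irreversible over reversible compression quoted in this section can be checked from the compression ratios in the table. The sketch below uses only the reversible and irreversible columns; the variable and function names are illustrative.

```python
# (reversible, irreversible) compression ratios taken from the table.
ratios = {
    "L0051262_Manuscript_Page": (2.25, 3.45),
    "L0051320_Line_Drawing": (1.82, 2.52),
    "L0051761_Painting": (2.46, 3.96),
    "L0051440_Archive_Collection": (2.52, 4.47),
}


def percent_gain(reversible, irreversible):
    """Percentage by which the irreversible ratio exceeds the reversible one."""
    return (irreversible / reversible - 1) * 100


gains = {name: round(percent_gain(*pair)) for name, pair in ratios.items()}
for name, gain in gains.items():
    print(f"{name}: {gain}% higher")
```

The gains range from 38% for the line drawing to 77% for the archive collection image, in line with the "about 40% to almost 80%" range given in the text.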
3.7 Further compression findings
In these tests, the compressed file sizes (and compression ratios) were image
dependent and varied with the image content. Images with less detail or
variation than these samples would give even higher compression ratios.
An advantage of JPEG 2000 is that it lets the user set the compression ratio, or
equivalently the compressed file size, to a specific target value, which the
coder achieves by discarding compressed image data. While this feature was
not used to set the overall compressed file size in the minimally lossy
compression case, it can be used to set the sizes of intermediate images
corresponding to the different quality layers. The following Kakadu command
line generates a JP2 file with a minimally lossy irreversible JPEG 2000
datastream that complies with the recommendation in this report:
kdu_compress -i in.tif -o out.jp2 -rate
-,4,2.34,1.36,0.797,0.466,0.272,0.159,0.0929,0.0543,0.0317,0.0185
Creversible=no Clevels=5 Stiles={1024,1024} Cblk={64,64}
Corder=RPCL Cmodes=BYPASS
The JPEG 2000 datastream in this example has 5 resolution levels and 12
quality layers. Using all 12 layers gives a decompressed image with minimal
loss. The intermediate layer boundaries are at pre-set compressed bit rates,
starting at 4 bits per pixel, corresponding to a compression ratio of 6:1,
assuming a 24-bit color original. Thereafter, the layer boundaries are
distributed logarithmically up to a compression ratio of 1296:1. The exact
values are not critical. What is important is the range of values and there being
sufficient values to provide an adequate sampling within the range.
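
The logarithmic spacing of the layer boundaries can be sketched as a geometric sequence running from 4 bits per pixel (6:1) down to the rate that corresponds to 1296:1. This is one plausible way to generate such values, not necessarily how the authors derived theirs; it assumes a 24-bit color original, and the function name is illustrative. The computed rates agree with those in the command line to within rounding.

```python
def log_spaced_rates(first_rate, last_ratio, count, bits_per_pixel=24):
    """Bit rates spaced geometrically from first_rate down to the rate
    that corresponds to a compression ratio of last_ratio:1."""
    last_rate = bits_per_pixel / last_ratio
    step = (last_rate / first_rate) ** (1 / (count - 1))
    return [first_rate * step ** i for i in range(count)]


# 11 explicit boundaries; the 12th (topmost) layer is the "-" entry,
# which leaves the final layer unconstrained.
rates = log_spaced_rates(first_rate=4.0, last_ratio=1296, count=11)
ratios = [24 / rate for rate in rates]

rate_arg = "-," + ",".join(f"{rate:.3g}" for rate in rates)
print(rate_arg)
print([round(ratio, 1) for ratio in ratios])
```

The second printed list gives the compression ratio obtained by truncating the datastream at each boundary, starting at 6:1 and followed by about 10.3:1.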
When a datastream has multiple quality layers, it is possible to truncate it at
points corresponding to the layer boundaries and obtain derivative datastreams
that correspond to higher compression ratios (or lower compressed bit rates).
In the previous example, discarding the topmost quality layer produces a
datastream corresponding to a compression ratio of 6:1 (compressed bit rate of
4 bits per pixel). Discarding the next layer produces a datastream with a
compression ratio of 10.3:1, and so on. As noted previously, some images may
have a minimally lossy compression ratio greater than 6:1; the layer settings
can be adjusted when this happens.
Using layers adds overhead that increases the size of the datastream and
therefore decreases the compression ratio. To assess this effect as well as the
overhead due to the use of tiles, the four sample images were compressed with
one layer and no tiles, with one layer and 1024x1024 tiles, and with 12 layers
and no tiles. As the following table shows, adding layers and tiles did decrease
the minimally lossy compression ratio, but the reduction was less than 0.4% in
every case, affecting only the second or third place after the decimal, and was
therefore judged insignificant in comparison to the advantages of using them.
Original                       No tiles,   1024x1024 tiles,   No tiles,
                               1 layer     1 layer            12 layers
L0051262_Manuscript_Page       3.452       3.450              3.443
L0051320_Line_Drawing          2.522       2.521              2.517
L0051761_Painting              3.961       3.957              3.948
L0051440_Archive_Collection    4.477       4.473              4.461
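
The size of the overhead can be expressed as a percentage drop in compression ratio relative to the one-layer, no-tiles case. The sketch below works only from the values in the table; the names are illustrative.

```python
# (no tiles / 1 layer, 1024x1024 tiles / 1 layer, no tiles / 12 layers)
table = {
    "L0051262_Manuscript_Page": (3.452, 3.450, 3.443),
    "L0051320_Line_Drawing": (2.522, 2.521, 2.517),
    "L0051761_Painting": (3.961, 3.957, 3.948),
    "L0051440_Archive_Collection": (4.477, 4.473, 4.461),
}


def percent_drop(baseline, value):
    """Percentage reduction of value relative to baseline."""
    return (1 - value / baseline) * 100


tile_drops = {n: percent_drop(b, t) for n, (b, t, _) in table.items()}
layer_drops = {n: percent_drop(b, l) for n, (b, _, l) in table.items()}

for name in table:
    print(f"{name}: tiles {tile_drops[name]:.2f}%, layers {layer_drops[name]:.2f}%")
```

For these four images the drop due to tiling is about 0.1% or less, and the drop due to 12 layers stays below 0.4%.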
Besides tiles and layers, other datastream components that can improve
performance and access within the datastream are markers, such as Tile-part
Length Markers (TLM), which can aid in searching for tile boundaries in a
datastream. Their effectiveness depends on whether or not the decoder or
access protocol makes use of them. As a result, recommendations regarding
their use depend on the choice of codec.
4 Implementation solutions / discussion
This section discusses the file format and metadata recommendations.
One function of a file format is packaging the datastream with metadata that
can be used to render, interpret and describe the image in the file. Besides
defining the JPEG 2000 datastream and core decoder, Part 1 of the JPEG 2000
standard also defines the JP2 file format which applications may use to
encapsulate a JPEG 2000 datastream. A minimal JP2 file consists of four
structures or “boxes”:
1. JPEG 2000 Signature Box, which identifies the file as a member of the
JPEG 2000 file format family
2. File Type Box, which identifies which member of the family it is, the
version number and the members of the family it is compatible with
3. JP2 Header Box, which contains image parameters such as resolution
and color specification needed for rendering the image
4. Contiguous Codestream Box, which contains the JPEG 2000 datastream
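
The four-box structure listed above can be explored with a short parser. The sketch below walks the top-level boxes of a JP2 byte string, relying on the general box layout of the JP2 format: a 4-byte big-endian length, a 4-byte type, an optional 8-byte extended length when the length field is 1, and a length of 0 meaning the box runs to the end of the file. The sample data is synthetic and not a decodable image, and the helper names are illustrative.

```python
import struct


def iter_boxes(data):
    """Yield (type, payload) for each top-level box in a JP2 byte string."""
    pos = 0
    while pos + 8 <= len(data):
        length, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if length == 1:
            # Extended length: an 8-byte size follows the type field.
            (length,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif length == 0:
            # The box extends to the end of the file.
            length = len(data) - pos
        if length < header or pos + length > len(data):
            raise ValueError(f"malformed box at offset {pos}")
        yield box_type.decode("latin-1"), data[pos + header:pos + length]
        pos += length


def make_box(box_type, payload):
    """Build a box with a plain 4-byte length field (test helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic skeleton of a minimal JP2 file, for illustration only.
sample = (
    make_box(b"jP  ", b"\r\n\x87\n")                                # signature
    + make_box(b"ftyp", b"jp2 " + struct.pack(">I", 0) + b"jp2 ")  # file type
    + make_box(b"jp2h", b"")         # a real header holds sub-boxes
    + make_box(b"jp2c", b"\xff\x4f")  # start-of-codestream marker only
)

box_types = [box_type for box_type, _ in iter_boxes(sample)]
print(box_types)
```

Running the same loop over a real JP2 file read in binary mode should list these four types in this order, possibly with optional boxes in between.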
4.1 Color Specification
How an image was captured or created determines the parameters in the JP2
Header Box, which are subsequently used to render and interpret the image.
Among these parameters are the number of components (i.e. whether the
image is grayscale or color), an optional resolution value for capture or display,
and the color specification. In general the color content of an image can be
specified in one of two ways: directly using a named color space, such as
sRGB, Adobe RGB 98 or CIELAB, or indirectly using an ICC profile.
The digitization process and the nature of the material being digitized, not the
file format, drive the color specification requirements of the application. The
issue for the file format is whether or not it supports the color encoding used
by the digital materials. What’s significant about the JP2 file format is that it
supports a limited set of color specifications. For example, the only color space
it supports directly is sRGB, including its grayscale and luminance-chrominance
analogues. This is a consequence of the JP2 file format having been originally
designed with digital cameras in mind.
Besides sRGB, the JP2 file format supports a restricted set of ICC profiles,
namely gamma-matrix-style ICC profiles. This style of profile can represent the
data encoded by RGB color spaces other than sRGB. The image data is still
RGB; it’s just that it is specified indirectly by means of an ICC profile. This does
not necessarily mean that non-sRGB systems must support ICC workflows; it
does mean more sophisticated handling of the color specification in the JP2 file.
For example, the system may recognize that the JP2 file contains the ICC
profile for Adobe RGB 98 and use an Adobe RGB 98 workflow.
An alternative to JP2 is the Baseline JPX file format, defined in Part 2 of the
JPEG 2000 standard. JPX is an extended version of JP2 which, among other
things, specifies additional named color spaces, including Adobe RGB 98 and
ProPhoto RGB. There are some RGB spaces, such as eciRGBv2, which JPX does
not support directly and for which ICC profiles would still be needed for them to
be used. The best thing is to use the JP2 file format as long as possible, since it
is more widely supported than JPX and its use avoids the need to support the
more advanced features of JPX when only extended color space support is desired.
4.2 Capture Resolution
The JP2 Header Box may also contain capture or display resolution values,
indicating the resolution at which the image was captured or the resolution at
which it should be displayed. While the JP2 file is required to contain a color
specification, it is not required to have either resolution value. Instead, it is up
to the application to require it. This report recommends that the JP2 Header
Box in the JP2 file contain a capture resolution value, indicating the resolution
at which the image contained in the file was scanned. The JP2 file format
specification requires that the resolution value be given in pixels per meter.
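
Scanning resolutions are usually quoted in pixels per inch, so a conversion is needed before the value is written to the file. A minimal sketch of that conversion, using exact rational arithmetic; the 300 pixels-per-inch value and the names are hypothetical examples, not values from the report:

```python
from fractions import Fraction

# One inch is exactly 0.0254 meters.
INCH_IN_METERS = Fraction(254, 10000)


def ppi_to_pixels_per_meter(ppi):
    """Convert a resolution in pixels per inch to pixels per meter."""
    return Fraction(ppi) / INCH_IN_METERS


# Hypothetical scan at 300 pixels per inch.
capture_resolution = ppi_to_pixels_per_meter(300)
print(capture_resolution, round(float(capture_resolution), 2))
```

Keeping the value as a fraction avoids rounding until it is encoded in whatever form the writing software expects.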
4.3 Metadata
In addition to the four boxes that a JP2 is required to contain, it may optionally
contain XML and UUID boxes. Each can contain vendor or application specific
information, encoded in an XML box using XML or in a UUID box in a way that
is interpreted according to the UUID code (UUID stands for Universally Unique
Identifier). These two types of boxes are used to embed metadata in a JP2 file.
For example, UUID boxes are used for IPTC [4] or EXIF [5] metadata. An XML box
can be used for any XML-encoded data, such as MIX.
While the application and system normally determine the nature and format of
the metadata associated with an image, JPEG 2000-specific administrative or
technical metadata is within scope for this report. While such metadata may or
may not be embedded in a JP2 file, this report recommends that it be
embedded.
JPEG 2000-specific metadata in the JP2 file should follow the ANSI/NISO
Z39.87-2006 standard. This standard defines a data dictionary with technical
metadata for digital still images. It lists “image/jp2” as an example of a
formatName value and lists “JPEG2000 Lossy” and “JPEG2000 Lossless” as
compressionScheme values. Files that implement this recommendation would
have “JPEG2000 Lossy” as their compressionScheme value and would also
contain a rational compressionRatio value.
Compression
  compressionScheme    JPEG2000 Lossy
  compressionRatio     <rational value>
While “JPEG2000 Lossy” is the compressionScheme value for all files that follow
this recommendation and the compressionRatio value can be derived from file
size and parameters in the JP2 Header Box, it is recommended that they be
specified explicitly.
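
As an illustration of how a compressionRatio value might be derived, the sketch below divides the uncompressed image size by the compressed file size and keeps the result as a rational number. It assumes the ratio is taken against the whole file size (including the JP2 boxes) and that all components share one bit depth; the image dimensions and file size are hypothetical.

```python
from fractions import Fraction


def compression_ratio(width, height, components, bits_per_component,
                      file_size_bytes):
    """Uncompressed size divided by compressed size, as an exact fraction."""
    uncompressed_bits = width * height * components * bits_per_component
    return Fraction(uncompressed_bits, file_size_bytes * 8)


# Hypothetical 4000 x 6000 pixel, 24-bit color image stored in 16,000,000 bytes.
ratio = compression_ratio(4000, 6000, 3, 8, 16_000_000)
print(ratio.numerator, ratio.denominator, float(ratio))
```

The numerator and denominator can then be recorded as the rational value, here 9 and 2 for a ratio of 4.5:1.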
The Z39.87 standard also defines a SpecialFormatCharacteristics container to
document attributes that are unique to a particular file format and datastream.
In the case of JPEG 2000, this container has two sub-containers: one for
CodecCompliance and the other for EncodingOptions. The elements in the
CodecCompliance container identify by name and version the coder that
created the datastream, the profile to which the datastream conforms (Part 1
of the JPEG 2000 standard defines codestream or datastream profiles), and the
class of the decoder needed to decompress the image (Part 4 of the JPEG 2000
standard defines compliance classes). The elements in the EncodingOptions container
give the size of the tiles, the number of quality layers and the number of
resolution levels.
[4] International Press Telecommunications Council, which creates standards for
photo metadata (http://www.iptc.org/IPTC4XMP/)
[5] Exchangeable image file format, a standard file format with metadata tags for
digital cameras (http://www.exif.org/)