ABSTRACT
Image coding requires an effective representation of images to provide dimensionality reduction, a quantization
strategy to maintain image quality, and finally the error-free encoding of quantized coefficients. In the coding of
quantized coefficients, Huffman coding and arithmetic coding have been used most commonly and are suggested
as alternatives in the JPEG standard. In some recent work, zerotree coding has been proposed as an alternative
method that considers the dependence of quantized coefficients from subband to subband, and thus appears
as a generalization of the context-based approach often used with arithmetic coding.
In this paper, we propose to review these approaches and discuss them as special cases of an analysis based
approach to the coding of coefficients. The requirements on causality and computational complexity implied
by arithmetic and zerotree coding will be studied, and other schemes, suggested by image analysis, will be
proposed for the choice of predictive coefficient contexts.
Keywords: analysis based modeling, wavelet coding, zerotree coding, image modeling, context-based modeling,
embedded zerotree wavelet (EZW) code.
1 INTRODUCTION
Lossy encoding of images often consists of three stages: representation, quantization and error-free encoding.
Block-based transforms (such as the DCT of JPEG) and subband and wavelet decompositions are commonly
used to convert an image into a representation with good energy compaction. The transform coefficients are
then quantized to reduce the information and achieve the desired bitrate. The quantized coefficient image is then
entropy encoded in a lossless fashion. We have shown (for wavelets) that good results can be obtained with such
a framework [1,2]. Recently, however, more sophisticated techniques have surfaced which, in some sense, analyze
the image to exploit higher level correlations that exist in the transform domain. One such technique, which has
received a lot of attention, is the embedded zerotree wavelet (EZW) code [5].
In this paper, after introducing lossy image compression, we discuss the EZW code, analyze its behavior,
and then propose a model which uses simple, but more conventional, modeling techniques to achieve better
performance. As such, we show that the zerotree data structure, from a coding efficiency perspective, is of little
use.
2.2 Quantization
Typically, the number of samples resulting from image transformations remains the same, but the precision
required to specify the transform coefficients increases. Often, the output of the representation is a set of real-
valued coefficients, which we cannot encode with a finite number of bits. Thus, quantization is required to reduce
the coefficients to finite precision. Furthermore, quantization is often the only way we can reduce the information
content of the source in a controlled fashion.
In all common transformations, there is some notion of frequency in the transform domain, and better quantiz-
ers exploit the human visual system by quantizing higher frequencies, where errors are less visible, more coarsely
than lower frequencies. Theoretically, vector quantization (VQ) results in better performance, but is much more
complex to implement. Furthermore, the gain of VQ over scalar quantization (SQ) is reduced for decorrelated
(transform) coefficients. Recent wavelet transform encoding techniques attempt to exploit the benefits of VQ,
while minimizing the computational burden [3].
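As a concrete illustration of these ideas, the following is a minimal sketch of a deadzone uniform scalar quantizer with frequency-dependent step sizes; the step values and the band-to-step mapping are our own illustrative choices, not those of any particular coder.

```python
import numpy as np

def quantize(coeffs, step):
    """Deadzone uniform scalar quantizer: real-valued coefficients
    are mapped to signed integer indices; magnitudes below one step
    fall into the zero (dead) zone."""
    return np.sign(coeffs) * np.floor(np.abs(coeffs) / step)

def dequantize(indices, step):
    """Reconstruct each nonzero bin at its midpoint."""
    return np.sign(indices) * (np.abs(indices) + 0.5) * step

# Illustrative HVS-style weighting: higher-frequency subbands get
# coarser steps, since errors there are less visible.
base_step = 8.0                                   # assumed, for illustration
steps = [base_step * 2.0 ** k for k in range(4)]  # k = 0 is the lowest band
```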
2.3 Modeling
We typically divide the coding process into two components: modeling and error-free encoding. The goal of
modeling is to predict the distribution to be used to encode each pixel. Error-free encoding will be discussed in
the next section.
In coding, our goal is to encode each sequence of symbols {s_1, s_2, ..., s_n} with -log_2 p(s_1, s_2, ..., s_n) bits. If
the symbols are independent and identically distributed (i.i.d.), this reduces to -n log_2 p(s), where p(s) = p(s_i) for all i.
If not, then modeling is the task of estimating p(s_1, s_2, ..., s_n). Clearly, even for a binary source, the number of
possible sequences is unmanageable and simplifications must be made. Reformulating the problem as that of
estimating a set of conditional probabilities, i.e.,

p(s_i | s_{i-1}, s_{i-2}, ..., s_1),  i = 1, ..., n,

does not reduce the complexity, but does lead to a related formulation which is useful in practice. To reduce the
complexity, we instead estimate

p(s_i | f(s_{i-1}, s_{i-2}, ..., s_1)),

where f is some unknown, but to be determined, function. If f is the identity mapping, then both formulations
are identical. Our goal, however, is to reduce the problem of estimating the symbol distributions to a manageable
size. This is accomplished by restricting the range of f to a small set of states. Now, we can associate with
each state a conditional source comprised of all symbols which occur in that state. The success of our model is
determined by the extent to which the conditional sources are decorrelated from one another and are i.i.d. random
processes. Unfortunately, the determination of the best f for a given source is extremely difficult, and we must
resort to heuristic techniques and intuition.
What is our intuition? First, we have at our disposal all previously transmitted symbols. Typically, due to
complexity constraints, we must also reduce the domain of f. This is done by choosing a set of pixels (a context)
from the set of all previously transmitted pixels. This choice is of utmost importance, since our prediction
(classification) is based entirely upon these pixels. For raster scan techniques, this typically corresponds to
neighboring pixels to the left and above the current pixel. These should be the pixels that supply the most
information about the current pixel. Often, f is taken to be the identity function on this reduced set of pixels,
so that its design consists entirely of their selection.
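As a minimal sketch of this scheme for a binary image, the hypothetical context_state below plays the role of f, mapping the left and above neighbors to one of four states, and each state owns an adaptively estimated conditional source; all names and the context choice are illustrative.

```python
import numpy as np

def context_state(img, r, c):
    """f: map the previously transmitted neighbors (left, above) of
    pixel (r, c) to one of four states. Neighbors outside the image
    are treated as zero. Assumes a binary image."""
    left = img[r, c - 1] if c > 0 else 0
    above = img[r - 1, c] if r > 0 else 0
    return 2 * above + left

def code_cost(img):
    """Ideal codelength (bits) of a binary image under the model:
    one adaptive conditional source per state, from which
    p(s_i | f(s_1, ..., s_{i-1})) is estimated on the fly."""
    counts = np.ones((4, 2))  # Laplace smoothing
    bits = 0.0
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            s = context_state(img, r, c)
            p = counts[s, img[r, c]] / counts[s].sum()
            bits += -np.log2(p)
            counts[s, img[r, c]] += 1  # adapt after each symbol
    return bits
```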
Note that in the above discussion, we have mentioned previously transmitted pixels. This suggests that the
order in which we scan the data is also of importance, and adds another level of complexity to image encoding.
Causality is an ill-defined property in 2-D, especially when a frame buffer is available. Hierarchical pyramids,
for instance, correspond to a reordering of the data which, hopefully, has desirable properties. Finally, we realize
that, in some fashion, the probabilities determined by the encoder must be communicated to the decoder so that
the data can be correctly decompressed.
Figure 1: Zerotree representation of a 2-level dyadic wavelet decomposition with 4 significant pixels.
Figure 2: Example scanning paths for a 2-level dyadic wavelet decomposition. Left to right: raster scan, Morton
order, and Peano scan. The dotted lines define the connectivity between subbands.
augmented significance map. Some example orderings are shown in Figure 2. The only restriction placed on them
is that a pixel's parent (in the zerotree) should be scanned before the pixel. This corresponds, roughly, to a depth-first
traversal of the zerotree. Alternatively, one could just encode an in-order traversal of the tree, but there may
be correlations that are better exploited by more specific scanning patterns. The chosen ordering is an important
consideration with respect to the embedded nature of the algorithm, and emphasizes the fact that, for images,
there really is no well-defined notion of causality. Intuitively, it would seem that we should transmit the lower
frequency components first.
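For instance, the Morton order of Figure 2 can be generated by de-interleaving the bits of the scan index; the sketch below assumes a square image whose side is a power of two, with the decomposition stored in place, so that a parent at Morton index m is visited before its four children at indices 4m through 4m+3.

```python
def morton_order(n):
    """(row, col) pairs of an n-by-n image, n a power of two, in
    Morton (Z) order: even bits of the scan index give the column,
    odd bits the row."""
    def deinterleave(v):
        r = c = 0
        for b in range(v.bit_length()):
            if b % 2 == 0:
                c |= ((v >> b) & 1) << (b // 2)
            else:
                r |= ((v >> b) & 1) << (b // 2)
        return r, c
    return [deinterleave(i) for i in range(n * n)]

# e.g. morton_order(4)[:4] -> [(0, 0), (0, 1), (1, 0), (1, 1)]
```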
For entropy encoding, Shapiro conditions the zerotree symbols using the significance of a pixel's parent and
the previous pixel in the defined ordering. An isolated zero cannot occur at the leaf nodes, so a ternary
alphabet can be used for the highest frequency bands. The bits encoded in the subordinate pass are encoded in
a single context, without any conditioning. All subsources are encoded using an adaptive arithmetic code [8] with
a maximum frequency count of 256.
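The role of the maximum frequency count is easy to see in a sketch: symbol counts grow with use and are halved once their total reaches the ceiling, so recent symbols dominate the estimate. This is a generic model in the style of the coder of Witten, Neal, and Cleary [8], not their exact implementation.

```python
class AdaptiveModel:
    """Adaptive frequency model for an arithmetic coder: counts
    adapt to the source and are rescaled at a maximum total of 256,
    which lets the model track local statistics."""
    MAX_TOTAL = 256

    def __init__(self, nsymbols):
        self.counts = [1] * nsymbols

    def prob(self, symbol):
        return self.counts[symbol] / sum(self.counts)

    def update(self, symbol):
        self.counts[symbol] += 1
        if sum(self.counts) >= self.MAX_TOTAL:
            # Halve all counts, rounding up so none reaches zero.
            self.counts = [(c + 1) // 2 for c in self.counts]
```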
To completely specify the EZW coder, as described, we must also specify the set of wavelets, the normalization
used in the wavelet transform, a scanning order, and the minimum significance threshold, Tmin, or M.
3.1 Implementation
We have implemented the EZW coder, as described above, except that we use the binary arithmetic QM-coder
as our entropy encoder. Thus, we do not have control over the adaptivity of the encoder (since it is built in) and
must map the multi-alphabet sources onto binary trees before encoding. Furthermore, we have made many parts
of the algorithm optional so that we can evaluate their contribution to its performance. For all results presented,
we use a 6-level dyadic wavelet decomposition of Lena, based on the biorthogonal 9/7 wavelets of Barlaud, which
we [1,2] and others [7] have found to be useful in coding applications.
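The mapping of a multi-alphabet source onto binary decisions can be done with a small fixed tree. The sketch below shows one possible decomposition of the four dominant-pass symbols; the tree shape is our own illustration, and each branch would be coded in its own QM-coder context.

```python
def binarize(symbol):
    """Map a four-symbol dominant-pass alphabet onto binary
    decisions for a binary coder such as the QM-coder. First bit:
    is the pixel significant? Second bit: zerotree root versus
    isolated zero, or the sign of a significant pixel."""
    tree = {
        'ZTR': (0, 1),  # insignificant, zerotree root
        'IZ':  (0, 0),  # insignificant, not a zerotree root
        'POS': (1, 0),  # significant, positive
        'NEG': (1, 1),  # significant, negative
    }
    return tree[symbol]
```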
Figure 3: Usefulness of various options in the design of a zerotree-based encoder. All comparisons are in terms of
the output bitrate relative to pcx.
Figure 4: bpp versus binary symbols encoded (1e6). (Caption not recovered.)
the wavelet normalization strategy. Varying Tmin causes the peaks to shift, and the resulting performance at a
given bitrate can vary by more than 1 dB (see Figure 5). This emphasizes the fact that, even though the EZW
coder is an embedded algorithm, obtaining its optimum performance at a given bitrate requires a costly optimization
(over Tmin). This optimization is similar to common quantizer design approaches encountered in more
standard encoding frameworks.
Although never explicitly mentioned in Shapiro's paper, the normalization factor chosen in the forward and
inverse wavelet transforms is very important. Given a normalization factor s, we scale the results by 1/s and s,
respectively, in the forward and inverse 1-D wavelet transforms. With this definition, a normalized transform, in
which the range of the transform is commensurate with the range of the original image, corresponds to s = 2.
With s = 1, we get an amplification of the LL band coefficients by a factor of 4 for each iteration in the wavelet
decomposition. Recall that the EZW code uses an amplitude-first decomposition; therefore, this range expansion
corresponds to a reordering of the data to be encoded and has a significant effect on its performance. It can also
be interpreted as a frequency-weighted quantization method (see DISCUSSION). Our experimental results are
shown in Figure 5, where we see that values between 0.8 and 1.2 are fairly interchangeable, but that outside this
range performance degrades quickly.
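As a sketch of this convention, the following substitutes a simple Haar pair for the 9/7 filters (purely for brevity); the forward results are scaled by 1/s and the inverse by s, so s = 2 preserves the signal range while s = 1 doubles the lowpass output per 1-D transform, giving the factor of 4 per 2-D iteration noted above.

```python
import numpy as np

def forward_1d(x, s):
    """One level of a two-channel 1-D transform, scaled by 1/s.
    A Haar pair stands in for the 9/7 analysis filters."""
    x = np.asarray(x, dtype=float)
    lo = (x[0::2] + x[1::2]) / s
    hi = (x[0::2] - x[1::2]) / s
    return lo, hi

def inverse_1d(lo, hi, s):
    """Inverse transform, scaled by s (perfect reconstruction
    for any s != 0)."""
    x = np.empty(2 * lo.size)
    x[0::2] = (lo + hi) * s / 2
    x[1::2] = (lo - hi) * s / 2
    return x
```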
Figure 5: Left: performance as a function of Tmin (t = 1/Tmin). Right: performance as a function of the wavelet
normalization factor. Both results are for ezw-pcx.
Figure 6: Analysis based models for wavelet transform image encoding. Left: performance of simple models versus
ezw-pcx. Right: relative performance of extended models versus ezw-pcx.
and refinement contexts, respectively, 1-6 is the scale (or frequency band) in the transform, s and i denote the
significance and insignificance of the parent and previous pixels, and 0, +, and - are used to represent their signs.
Thus, we use a total of 102 binary subsources. Note that we no longer have multi-alphabet sources, as when
encoding a zerotree; all symbols are binary.
The code simply consists of scanning each plane, in the prescribed order, and encoding the symbols with
respect to the corresponding contexts described above. We no longer have dominant and subordinate passes as in
the EZW code. We present the results, with Tmin = 1, a wavelet normalization factor of 1, and our three scanning
patterns, in Figure 6. From the curves we see that this simple model indeed outperforms the EZW code, while
preserving the embedded nature of the code, further emphasizing that the gain is primarily due to modeling, not
the zerotree data structure. To be fair, more than twice as many binary decisions must be encoded by the new
scheme. Using a Peano scan can reduce the bitrate by as much as 2%. Although we have not explicitly presented
it here, the gains we have obtained are almost entirely in the encoding of the location information, suggesting
that extending the contexts for the sign and refinement bits is of little use.
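A minimal sketch of one significance pass under such a model follows; the context composition and the helper names (models, coder, sig) are illustrative placeholders, while the full model described above conditions the sign and refinement bits as well.

```python
def significance_context(scale, parent_sig, prev_sig):
    """Context id for a significance decision: the subband scale
    (1-6) combined with the significance of the parent and of the
    previously scanned pixel."""
    return (scale - 1) * 4 + 2 * int(parent_sig) + int(prev_sig)

def significance_pass(bits, sig, scale_of, parent_sig_of, models, coder):
    """One raster-order pass over the bitplane 'bits': each pixel
    not yet significant emits one binary decision, coded by an
    adaptive binary model selected by its context. 'models' maps
    context ids to such models; 'coder' is any binary arithmetic
    coder; 'sig' is the running significance map."""
    rows, cols = bits.shape
    for r in range(rows):
        for c in range(cols):
            if sig[r, c]:
                continue  # already significant: refined elsewhere
            prev = sig[r, c - 1] if c > 0 else 0
            ctx = significance_context(scale_of(r, c),
                                       parent_sig_of(r, c), prev)
            coder.encode(bits[r, c], models[ctx])
            if bits[r, c]:
                sig[r, c] = 1  # becomes significant; sign coded next
```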
4.2 Extensions
An extensive analysis of correlations in the transformed image would be required to determine the optimum
contexts for encoding each decision in the embedded code. However, as a step in the right direction, we present
some simple extensions to the above model to indicate that further gains are possible, if more pertinent modeling
is employed.
We present 4 modifications to the model used above (with a raster scan ordering), which we denote ws-r. In
ws-rX (for extended), we add two additional conditioning pixels in the same band, representing the significance of
the neighboring pixels in the previous significance map to the right and below the pixel currently being encoded.
In ws-rXs, we augment ws-rX by adding 2 sibling pixels, that is, pixels at the same location, but in alternate
frequency bands. For example, if we are encoding a pixel in the LH2 band,¶ we use the corresponding pixels in
the HL2 and HH2 bands.
†But we have made no attempt to minimize this computational burden.
¶We number our subbands incrementally from the lowest to highest frequencies.
5 DISCUSSION
We have shown that there are no inherent advantages to using the zerotree data structure, with the exception
that there are fewer decisions to encode with the binary arithmetic encoder. Using simple model design techniques,
a more efficient model can be developed that does not rely on complicated data structures. This is somewhat
intuitively pleasing, since in some respects the zerotree is like a block-based code, which has been shown to be
inferior to conditional codes of the same complexity [4].
Furthermore, we have shown that additional gains are possible by defining better conditioning contexts. The
zerotree does not exploit the dependencies between pixels in neighboring blocks effectively, and it does not exploit
the dependencies between siblings in the representation at all. The extent of the gains possible is not known, and
the results we obtained were modest. Intuitively, however, we see significant structure in the wavelet transform
images, and techniques which use higher level descriptions of the edges at lower resolutions to predict higher
frequency bands appear to have merit. We leave this for future work.
Although the EZW code claims to eliminate the need for quantization, there is an underlying quantization
mechanism at work, as we have implied earlier. The choice of wavelet normalization and the set of slice thresholds
defines a set of quantizers. Indeed, the EZW code specifies a (heuristically designed) set of quantizers, and a
method for interpolating between them. Although never explicitly mentioned in the EZW discussions, an HVS
weighted quantizer could just as easily be applied to the wavelet transform before the bitplanes are encoded. This
would reorder the data transmitted by the code, and could be made more effective from an HVS viewpoint.
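The equivalence can be made explicit: after the final pass at threshold Tmin, each coefficient magnitude is known to within a bin of width Tmin, so the code behaves like a deadzone uniform quantizer with step Tmin and midpoint reconstruction. A sketch of that equivalent quantizer:

```python
import math

def ezw_equivalent_quantizer(x, t_min):
    """Deadzone quantizer equivalent to EZW coding completed down
    to threshold t_min: magnitudes below t_min never become
    significant and reconstruct to zero; the rest reconstruct at
    the midpoint of a bin of width t_min. A sketch of the
    equivalence, not the coder itself."""
    if abs(x) < t_min:
        return 0.0
    q = math.floor(abs(x) / t_min)
    return math.copysign((q + 0.5) * t_min, x)
```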
7 REFERENCES
[1] Jian Lu, V. Ralph Algazi, and Robert R. Estes, Jr. Comparison of wavelet image coders using the Picture
Quality Scale (PQS). In Proceedings of the SPIE, Wavelet Applications II, volume 2491, pages 1119-1130,
April 1995.
[2] Jian Lu, V. Ralph Algazi, and Robert R. Estes, Jr. A comparative study of wavelet image coders. Submitted
to Optical Engineering special issue on VCIP, 1995.
[3] Michael T. Orchard and Kannan Ramchandran. An investigation of wavelet-based image coding using an
entropy-constrained quantization framework. In Storer and Cohn [6], pages 341-350.
[4] J. J. Rissanen and G. G. Langdon, Jr. Universal modeling and coding. IEEE Transactions on Information
Theory, 27(1):12-23, January 1981.
[5] Jerome M. Shapiro. Embedded image coding using zerotrees of wavelet coefficients. IEEE Transactions on
Signal Processing, 41(12):3445-3462, December 1993.
[6] James A. Storer and Martin Cohn, editors. DCC '94: Data Compression Conference, Snowbird, Utah, March
1994. IEEE Computer Society Press.
[7] John D. Villasenor, Benjamin Belzer, and Judy Liao. Filter evaluation and selection in wavelet image com-
pression. In Storer and Cohn [6], pages 351-360.
[8] Ian H. Witten, Radford M. Neal, and John G. Cleary. Arithmetic coding for data compression. Communications
of the ACM, 30(6):520-540, June 1987.