08-3 - Image Compression
Image compression methods and standards
Spring2008 ELEN4304/5365DIP 1
by Gleb V. Tcheslavski: [email protected]
http://ee.lamar.edu/gleb/dip/index.htm
JPEG
JPEG, one of the most popular continuous-tone image compression standards, defines three basic coding schemes:
1) A lossy baseline coding system based on the DCT;
2) An extended coding system for greater compression, higher precision, or progressive reconstruction applications;
3) A lossless independent coding system for reversible compression.
In the baseline format, the image is subdivided into 8x8 pixel blocks, which are processed left to right, top to bottom. For each block, its 64 pixels are level-shifted by subtracting $2^{k-1}$, where $2^k$ is the maximum number of intensity levels. Next, a 2D DCT of the block is computed, quantized, and reordered using the zigzag pattern to form a 1D sequence of quantized coefficients. The nonzero AC coefficients are then coded using a variable-length code. The DC coefficient is difference coded relative to the DC coefficient of the previous block.
4/28/2008
JPEG
The JPEG recommended luminance quantization array can be scaled to provide a variety of compression levels (that is, to select the quality of JPEG compression).
Consider compression and reconstruction of the following 8x8 subimage.
The original image with 256 = $2^8$ levels:
52 55 61  66  70  61 64 73
63 59 66  90 109  85 69 72
62 59 68 113 144 104 66 73
63 58 71 122 154 106 70 69
67 61 68 104 126  88 68 70
79 65 60  70  77  63 58 75
85 71 64  59  55  61 65 83
87 79 69  68  65  76 78 94
The image level-shifted by subtracting $2^7$ = 128:
-76 -73 -67 -62 -58 -67 -64 -55
-65 -69 -62 -38 -19 -43 -59 -56
-66 -69 -60 -15  16 -24 -62 -55
-65 -70 -57  -6  26 -22 -58 -59
-61 -67 -60 -24  -2 -40 -60 -58
-49 -63 -68 -58 -51 -65 -70 -53
-43 -57 -64 -69 -73 -67 -63 -45
-41 -49 -59 -60 -63 -52 -50 -34
JPEG
[Figures: DCT of the level-shifted image; quantized transformed array]
Next, the zigzag ordering pattern leads to
[-26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 EOB]
where EOB is a special end-of-block symbol.
Next, the difference between the current block's DC symbol and the DC symbol from the previous block is computed and coded. The nonzero AC coefficients are coded according to another code table.
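The baseline steps (level shift, 2D DCT, quantization, zigzag reordering) can be sketched for the 8x8 subimage of this example. This is a minimal illustration, not a full JPEG encoder: the quantization array `Q` is assumed to be the example luminance table from Annex K of the JPEG standard, and the helper names (`dct_matrix`, `zigzag_indices`, `jpeg_block_forward`) are hypothetical. Entropy coding is omitted.

```python
import numpy as np

# 8x8 subimage from the example (256 intensity levels, so k = 8).
block = np.array([
    [52, 55, 61, 66, 70, 61, 64, 73],
    [63, 59, 66, 90, 109, 85, 69, 72],
    [62, 59, 68, 113, 144, 104, 66, 73],
    [63, 58, 71, 122, 154, 106, 70, 69],
    [67, 61, 68, 104, 126, 88, 68, 70],
    [79, 65, 60, 70, 77, 63, 58, 75],
    [85, 71, 64, 59, 55, 61, 65, 83],
    [87, 79, 69, 68, 65, 76, 78, 94],
], dtype=float)

# Assumed quantization array: example luminance table of the JPEG standard.
Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=float)


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that the 2D DCT is C @ B @ C.T."""
    x = np.arange(n)
    u = x.reshape(-1, 1)
    c = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * u * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def zigzag_indices(n=8):
    """(row, col) pairs in zigzag order: anti-diagonals, alternating direction."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    return sorted(cells, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else -p[0]))


def jpeg_block_forward(b, q, k=8):
    """Level shift, DCT, quantize, and zigzag-scan one block."""
    shifted = b - 2 ** (k - 1)
    c = dct_matrix(b.shape[0])
    coeffs = c @ shifted @ c.T
    quantized = np.round(coeffs / q).astype(int)
    return [int(quantized[i, j]) for i, j in zigzag_indices(b.shape[0])]


sequence = jpeg_block_forward(block, Q)
print(sequence[:8])
```

The first entry of `sequence` is the quantized DC coefficient; a real encoder would truncate the trailing run of zeros and emit EOB instead.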
JPEG
JPEG decompression begins with decoding the DC and AC coefficients and recovering the array of quantized coefficients from the 1D zigzag sequence.
[Figures: de-normalized DCT coefficients; IDCT array; reconstructed image (IDCT array shifted back by +128)]
JPEG
The error between the original and reconstructed images is due to the lossy nature of JPEG compression. The rms error is approximately 5.8 intensity levels.
JPEG
[Figure: JPEG approximation with compression 25:1, rms error 5.4]
[Figure: JPEG approximation with compression 52:1, rms error 10.7]
Predictive coding
Predictive coding is based on eliminating the redundancies of closely
spaced pixels in space and/or in time by extracting and coding only
the new information in each pixel. The new information is defined as
the difference between the actual and predicted value of the pixel.
Lossless predictive coding
[Figure: lossless predictive coding system]
The predictor generates the expected value of each sample based on a specified number of past samples.
Predictive coding
The predictor's output is rounded to the nearest integer and is used to compute the prediction error
$$e(n) = f(n) - \hat{f}(n)$$
The prediction error is encoded by a variable-length code to generate the next element of the encoded data stream. The decoder reconstructs e(n) from the encoded data and performs the inverse operation
$$f(n) = e(n) + \hat{f}(n)$$
Various local, global, and adaptive methods can be used to generate $\hat{f}(n)$. Often, the prediction is a linear combination of m previous samples:
$$\hat{f}(n) = \mathrm{round}\left[\sum_{i=1}^{m} \alpha_i f(n-i)\right]$$
Predictive coding
Here m is the order of the linear predictor, $\alpha_i$, $i = 1, \dots, m$, are the prediction coefficients, and f(n) are the input pixels. The m samples used for prediction can be taken from the current scan line (1D linear predictive coding, LPC), from the current and previous lines (2D LPC), or from the current image and the previous images in an image sequence (3D LPC). The 1D LPC is
$$\hat{f}(x,y) = \mathrm{round}\left[\sum_{i=1}^{m} \alpha_i f(x, y-i)\right]$$
which is a function of the previous pixels in the current line. Note that the prediction cannot be formed for the first m pixels. These pixels are coded by other means (for example, a Huffman code).
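A minimal sketch of 1D lossless predictive coding on a single scan line follows. The function names `lpc_encode` and `lpc_decode` and the sample values are illustrative assumptions; the first m pixels are simply passed through unchanged here, and the variable-length coding of the errors is left out.

```python
import numpy as np


def lpc_encode(row, alpha=(1.0,)):
    """Return the first m pixels and the integer prediction errors e(n)."""
    row = np.asarray(row, dtype=int)
    m = len(alpha)
    errors = []
    for n in range(m, len(row)):
        # Prediction: rounded linear combination of the m previous pixels.
        pred = int(np.rint(sum(a * row[n - 1 - i] for i, a in enumerate(alpha))))
        errors.append(int(row[n]) - pred)
    return row[:m].copy(), np.array(errors, dtype=int)


def lpc_decode(head, errors, alpha=(1.0,)):
    """Invert lpc_encode: f(n) = e(n) + prediction from decoded pixels."""
    out = [int(v) for v in head]
    for e in errors:
        pred = int(np.rint(sum(a * out[-1 - i] for i, a in enumerate(alpha))))
        out.append(int(e) + pred)
    return np.array(out, dtype=int)


row = [14, 15, 14, 15, 13, 15, 15, 14, 20, 26]
head, err = lpc_encode(row)              # previous pixel predictor, alpha = 1
print(err.tolist())
print(lpc_decode(head, err).tolist())    # identical to row
```

Because the decoder forms the same rounded prediction from already decoded pixels, the reconstruction is exact for any choice of coefficients.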
Predictive coding
For the image shown, form a first-order (m = 1) LPC of the form
$$\hat{f}(x,y) = \mathrm{round}\left[\alpha f(x, y-1)\right]$$
Such a predictor is called a previous pixel predictor, and the coding procedure is differential (previous pixel) coding.
[Figure: prediction error image, up-scaled by 128]
The average prediction error is 0.26.
The entropy reduction is due to removal of spatial redundancy.
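The entropy reduction can be illustrated numerically. The sketch below uses a small synthetic image as a stand-in for the image on the slide (an assumption, so its numbers are not comparable to the 0.26 above). It estimates the first-order entropy of the pixels and of the previous pixel prediction error with $\alpha = 1$. The name `entropy_bits` is hypothetical.

```python
import numpy as np


def entropy_bits(values):
    """First-order entropy estimate (bits/sample) from the value histogram."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


# Synthetic, spatially smooth test image (assumed data, not the slide's image).
x = np.arange(64).reshape(-1, 1)
y = np.arange(64).reshape(1, -1)
image = x + 2 * y + (x * y) % 3

# Previous pixel predictor along each line: e(x, y) = f(x, y) - f(x, y - 1).
residual = image[:, 1:] - image[:, :-1]

print(entropy_bits(image), entropy_bits(residual))
```

The residual takes only a handful of distinct values, so its entropy is much lower than that of the pixels themselves.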
Predictive coding
The compression achieved in predictive coding is related to the entropy reduction resulting from mapping an input image into a prediction error sequence. The pdf of the prediction error is (in general) highly peaked at 0 and has a relatively small variance compared to the input image. It is often modeled by a zero-mean uncorrelated Laplacian pdf:
$$p_e(e) = \frac{1}{\sqrt{2}\,\sigma_e} \exp\left(-\frac{\sqrt{2}\,|e|}{\sigma_e}\right)$$
where $\sigma_e$ is the standard deviation of e.
Predictive coding
[Figure: two successive frames of Earth taken by a NASA spacecraft]
Using the first-order (m = 1) LPC
$$\hat{f}(x,y,t) = \mathrm{round}\left[\alpha f(x, y, t-1)\right]$$
with $\alpha = 1$, the pixel intensities in the second frame can be predicted from the intensities in the first frame.
[Figure: the residual image]
The considerable decrease in standard deviation and in entropy indicates that significant compression can be achieved.
Predictive coding
Motion compensated prediction residuals
Since successive frames in a video sequence are often quite similar,
coding their differences can reduce temporal redundancy and provide
significant compression. On the other hand, when a frame sequence
contains rapidly moving objects, the similarity between neighboring
frames is reduced. The attempt to use LPC on images with little
temporal redundancy may lead to data expansion. Video
compression systems avoid the problem of data expansion by:
1. Tracking object movement and compensating for it during the prediction and differencing process;
2. Switching to an alternative coding method when there is
insufficient inter-frame correlation (similarity between frames) to
make predictive coding advantageous.
Predictive coding
Basics of motion compensation:
Each video frame is divided into non-overlapping rectangular regions (typically of size 4x4 to 16x16) called macroblocks. The movement of each macroblock with respect to the previous frame (reference frame) is encoded in a motion vector that describes the motion by defining the vertical and horizontal displacement from the most likely position. This displacement is usually specified to the nearest pixel, 1/2 pixel, or 1/4 pixel precision. If sub-pixel precision is used, the prediction must be interpolated from a combination of pixels in the reference frame.
Predictive coding
An encoded frame that is based on the previous frame (forward prediction) is called a predictive frame (P-frame); a frame that is also based on the subsequent frame (backward prediction) is called a bidirectional frame (B-frame). B-frames require the compressed code stream to be reordered.
Finally, some frames are encoded independently, without reference to any of the neighboring frames (like JPEG). Such frames are called intraframes or independent frames (I-frames) and are ideal starting points for the generation of prediction residuals. Also, I-frames can be easily accessed without decoding the stream.
Predictive coding
Motion estimation is the key concept of motion compensation. During it, the motion of objects is measured and encoded into motion vectors. The search for the best motion vector requires specification of an optimality criterion. For instance, motion vectors may be selected on the basis of maximum correlation or minimum error between macroblock pixels and the predicted (or interpolated) pixels for the chosen reference frame. One of the most frequently used error measures is the mean absolute distortion (MAD):
$$MAD(x,y) = \frac{1}{mn} \sum_{i=1}^{m}\sum_{j=1}^{n} \left| f(x+i, y+j) - p(x+i+dx, y+j+dy) \right|$$
where x and y are the coordinates of the upper-left pixel of the m x n macroblock being coded, dx and dy are displacements from the reference frame, and p is an array of predicted macroblock pixels.
Predictive coding
Typically, dx and dy must fall within a limited search region around each macroblock.
Values from ±8 to ±64 pixels are common, and the horizontal search area often is significantly larger than the vertical search area. Another, more computationally efficient measure is the sum of absolute distortions (SAD), which omits the 1/mn factor.
For the specified selection criterion (say, MAD), motion estimation is performed by searching for the dx and dy minimizing MAD(x,y) over the allowed range of motion vector displacements; this process is called block matching. An exhaustive search guarantees the best match but is computationally expensive; fast algorithms are cheaper but do not guarantee an optimum.
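Exhaustive block matching with the MAD criterion can be sketched as follows. The function names, the 8x8 block size, and the ±4 search range are illustrative assumptions, and arrays are indexed as `[x, y]` with 0-based offsets rather than the 1-based sums of the formula. Candidate positions that fall outside the reference frame are skipped.

```python
import numpy as np


def mad(block, candidate):
    """Mean absolute distortion between two equally sized blocks."""
    return float(np.mean(np.abs(block.astype(float) - candidate.astype(float))))


def best_motion_vector(current, reference, x, y, size=8, search=4):
    """Exhaustive search for the (dx, dy) minimizing MAD for one macroblock."""
    block = current[x:x + size, y:y + size]
    best, best_err = (0, 0), float("inf")
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            if (rx < 0 or ry < 0 or rx + size > reference.shape[0]
                    or ry + size > reference.shape[1]):
                continue  # candidate block leaves the reference frame
            err = mad(block, reference[rx:rx + size, ry:ry + size])
            if err < best_err:
                best, best_err = (dx, dy), err
    return best, best_err


# Synthetic test: the current frame is the reference shifted by (2, 3) pixels.
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(32, 32))
current = np.roll(reference, shift=(2, 3), axis=(0, 1))

mv, err = best_motion_vector(current, reference, x=8, y=8)
print(mv, err)
```

Dropping the `np.mean` in favor of `np.sum` gives the SAD variant; the minimizing displacement is the same.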
Predictive coding
[Figure: two images differing by 13 frames]
[Figure: motion vectors; they are highly correlated and can be coded with a variable-length code]
[Figure: difference image; standard deviation of error = 12.73, entropy = 4.2]
[Figure: motion-compensated difference image]
Predictive coding
The motion-compensated prediction residual was computed by dividing the later frame into 16x16 macroblocks and comparing each macroblock to every possible 16x16 macroblock in the earlier frame within ±16 pixels of its position. The MAD criterion was used. The resulting standard deviation was 5.62 and the entropy was 3.04 bits/pixel.
We observe that there is no motion in the lower portion of the image corresponding to the space shuttle. Therefore, no motion vectors are shown. The macroblocks in this area are predicted from similarly located macroblocks in the reference frame.
Predictive coding
Prediction accuracy can be increased using sub-pixel motion compensation.
[Figure: prediction residual with no motion compensation: Stdev = 12.7; Entropy = 4.17]
[Figure: residual with 1 pixel motion compensation: Stdev = 4.4; Entropy = 3.34]
[Figure: residual with 1/2 pixel motion compensation: Stdev = 4; Entropy = 3.35]
[Figure: residual with 1/4 pixel motion compensation: Stdev = 3.8; Entropy = 3.34]
Predictive coding
Motion estimation is a computationally intensive process. Fortunately, only the encoder must estimate the macroblock motion. Given the known motion vectors of the macroblocks, the decoder simply accesses the areas of the reference frames that were used in the encoder to form the prediction residuals.
For this reason, most video compression standards do not include motion estimation. Instead, compression standards focus on the decoder: they place constraints on macroblock dimensions, motion vector precision, horizontal and vertical displacement ranges, etc.
Predictive coding
Most video compression standards use an 8x8 DCT for I-frame encoding but specify a larger area (16x16 macroblocks) for motion compensation. Additionally, even the P- and B-frame prediction residuals are transform coded because DCT coefficient quantization is so effective. H.264 (MPEG-4 AVC) also supports intraframe predictive coding (in I-frames) to reduce spatial redundancy.
Predictive coding
[Figure: a typical motion-compensated video encoder]
An encoder exploits redundancies within and between adjacent video frames, as well as the psychovisual properties of the human visual system.
Predictive coding
The encoder's input is a sequence of macroblocks. For color video, each macroblock consists of a luminance block and 2 chrominance blocks. The human eye has less spatial acuity for color than for luminance; therefore, the chrominance blocks are often sampled at half the horizontal and vertical resolution of the luminance block.
[Figure: luma and chroma sampling]
The grayed elements of the encoder correspond to a JPEG encoder that may operate on conventional macroblocks (I-frames) or their differences (P- and B-frames). The inverse mapper performs the IDCT.
Predictive coding
[Figure: 1 minute HD (1280x720) full-color video containing 150 frames: fade-in from black (frames 21, 44), fade to black (frames 1595, 1609, 1652), abrupt changes (frames 1303, 1304)]
H.264 compression requires 44.56 MB of storage as compared to about 5 GB uncompressed.
MPEG-2
MPEG-2 Profiles
Abbr.   | Name                       | Picture coding types | Chroma format  | Aspect ratios               | Scalable modes
SP      | Simple profile             | I, P                 | 4:2:0          | square pixels, 4:3, or 16:9 | none
MP      | Main profile               | I, P, B              | 4:2:0          | square pixels, 4:3, or 16:9 | none
SNR     | SNR Scalable profile       | I, P, B              | 4:2:0          | square pixels, 4:3, or 16:9 | SNR (signal-to-noise ratio) scalable
Spatial | Spatially Scalable profile | I, P, B              | 4:2:0          | square pixels, 4:3, or 16:9 | SNR- or spatial-scalable
HP      | High profile               | I, P, B              | 4:2:2 or 4:2:0 | square pixels, 4:3, or 16:9 | SNR- or spatial-scalable
MPEG-2
MPEG-2 Levels
Abbr. | Name       | Frame rates (Hz)                          | Max h. res. | Max v. res. | Max luminance samples/s                                           | Max bit rate (Mbit/s)
LL    | Low Level  | 23.976, 24, 25, 29.97, 30                 | 352         | 288         | 3,041,280                                                         | 4
ML    | Main Level | 23.976, 24, 25, 29.97, 30                 | 720         | 576         | 10,368,000; HP: 14,475,600 for 4:2:0 and 11,059,200 for 4:2:2     | 15
H-14  | High 1440  | 23.976, 24, 25, 29.97, 30, 50, 59.94, 60  | 1440        | 1152        | 47,001,600; HP with 4:2:0: 62,668,800                             | 60
HL    | High Level | 23.976, 24, 25, 29.97, 30, 50, 59.94, 60  | 1920        | 1152        | 62,668,800; HP with 4:2:0: 83,558,400                             | 80
MPEG-2
Allowed resolutions
720 x 480, 704 x 480, 352 x 480, 352 x 240 pixels (NTSC)
720 x 576, 704 x 576, 352 x 576, 352 x 288 pixels (PAL)
Allowed aspect ratios (display AR)
4:3
16:9
(1.85:1 and 2.35:1, among others, are often listed as valid DVD aspect ratios, but are actually just a 16:9 image with the top and bottom of the frame masked in black)
Allowed frame rates
29.97 frame/s (NTSC)
25 frame/s (PAL)
Note: By using a pattern of REPEAT_FIRST_FIELD flags on the headers of encoded pictures, pictures can be displayed for either two or three fields and almost any picture display rate (minimum 2/3 of the frame rate) can be achieved. This is most often used to display 23.976 (approximately film rate) video on NTSC.
Audio+video bitrate
Video peak 9.8 Mbit/s
Total peak 10.08 Mbit/s
Minimum 300 kbit/s
Lossy predictive coding
Lossy predictive coding is achieved by including a quantizer, which takes the place of the nearest-integer rounding of the error-free encoder and maps the prediction error into a limited set of outputs.
[Figure: lossy predictive coding system; the quantizer output is the mapped prediction error $\dot{e}(n)$]
The decompressed sequence
$$\dot{f}(n) = \dot{e}(n) + \hat{f}(n)$$
is also the predictor's input.
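Lossy predictive coding
Delta modulation
Delta modulation is a simple form of lossy predictive coding, where the predictor and quantizer are defined as
$$\hat{f}(n) = \alpha \dot{f}(n-1)$$
$$\dot{e}(n) = \begin{cases} +\zeta & \text{for } e(n) > 0 \\ -\zeta & \text{otherwise} \end{cases}$$
where $\alpha$ is the prediction coefficient and $\zeta$ is a positive constant.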
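Delta modulation is easy to trace by hand, so a short sketch may help. The function name `delta_modulate`, the sample values, and the defaults for the prediction coefficient (1.0) and the quantizer output magnitude (6.5) are illustrative assumptions. The first sample is assumed to be transmitted without error to start the predictor.

```python
def delta_modulate(samples, alpha=1.0, zeta=6.5):
    """Return the 1-bit codes and the decoded sequence of a delta modulator."""
    decoded = [float(samples[0])]   # first sample sent as-is (assumption)
    bits = []
    for f in samples[1:]:
        pred = alpha * decoded[-1]          # prediction from the decoded output
        e = f - pred                        # prediction error
        e_q = zeta if e > 0 else -zeta      # two-level quantizer
        bits.append(1 if e > 0 else 0)
        decoded.append(e_q + pred)          # what the decoder reconstructs
    return bits, decoded


bits, decoded = delta_modulate([14, 15, 14, 15, 13, 15])
print(bits)
print(decoded)
```

On this nearly flat input the output oscillates between 14 and 20.5. This is granular noise: the step is too large for the small changes in the signal. A step that is too small for rapid changes would instead cause slope overload.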
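Optimal predictors
The optimal predictor minimizes the encoder's mean-square prediction error
$$E\{e^2(n)\} = E\left\{\left[f(n) - \hat{f}(n)\right]^2\right\}$$
with the constraint that
$$\dot{f}(n) = \dot{e}(n) + \hat{f}(n) \approx e(n) + \hat{f}(n) = f(n)$$
and
$$\hat{f}(n) = \sum_{i=1}^{m} \alpha_i f(n-i)$$
The minimizing coefficients are $\boldsymbol{\alpha} = R^{-1}\mathbf{r}$, where
$$R = \begin{bmatrix} E\{f(n-1)f(n-1)\} & E\{f(n-1)f(n-2)\} & \cdots & E\{f(n-1)f(n-m)\} \\ E\{f(n-2)f(n-1)\} & E\{f(n-2)f(n-2)\} & \cdots & E\{f(n-2)f(n-m)\} \\ \vdots & \vdots & & \vdots \\ E\{f(n-m)f(n-1)\} & E\{f(n-m)f(n-2)\} & \cdots & E\{f(n-m)f(n-m)\} \end{bmatrix}$$
$$\mathbf{r} = \begin{bmatrix} E\{f(n)f(n-1)\} \\ E\{f(n)f(n-2)\} \\ \vdots \\ E\{f(n)f(n-m)\} \end{bmatrix}$$
The variance of the resulting prediction error is
$$\sigma_e^2 = \sigma^2 - \boldsymbol{\alpha}^T \mathbf{r} = \sigma^2 - \sum_{i=1}^{m} E\{f(n)f(n-i)\}\,\alpha_i$$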
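As a small numerical check of these relations, the sketch below solves the normal equations $R\boldsymbol{\alpha} = \mathbf{r}$ for an assumed zero-mean, stationary first-order Markov source with autocorrelation $E\{f(n)f(n-k)\} = \sigma^2\rho^k$. The source model, the function name `optimal_coefficients`, and the numeric values are assumptions made for illustration only.

```python
import numpy as np


def optimal_coefficients(autocorr, m):
    """Solve R alpha = r given autocorr[k] = E{f(n) f(n-k)}, k = 0..m."""
    R = np.array([[autocorr[abs(i - j)] for j in range(m)] for i in range(m)],
                 dtype=float)
    r = np.array(autocorr[1:m + 1], dtype=float)
    alpha = np.linalg.solve(R, r)
    error_variance = autocorr[0] - alpha @ r   # sigma^2 - alpha^T r
    return alpha, float(error_variance)


rho, sigma2 = 0.9, 4.0
autocorr = [sigma2 * rho ** k for k in range(3)]

alpha, err_var = optimal_coefficients(autocorr, m=2)
print(alpha, err_var)
```

For this source the second coefficient comes out as zero: once the previous sample is known, older samples add no information. The error variance drops from $\sigma^2$ to $\sigma^2(1-\rho^2)$.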
Optimal predictors
However, computing the autocorrelations is very difficult in practice. Usually, a set of global coefficients is computed by assuming a simple input model and substituting the corresponding autocorrelations. For instance, when a 2D Markov image source with the separable autocorrelation function
$$E\{f(x,y)\,f(x-i, y-j)\} = \sigma^2 \rho_v^i \rho_h^j$$
and a generalized 4th-order linear predictor
$$\hat{f}(x,y) = \alpha_1 f(x,y-1) + \alpha_2 f(x-1,y-1) + \alpha_3 f(x-1,y) + \alpha_4 f(x-1,y+1)$$
are assumed, the resulting optimal coefficients are
$$\alpha_1 = \rho_h, \quad \alpha_2 = -\rho_v\rho_h, \quad \alpha_3 = \rho_v, \quad \alpha_4 = 0$$
where $\rho_h$ and $\rho_v$ are the horizontal and vertical correlation coefficients of the considered image. The sum of the prediction coefficients usually satisfies
$$\sum_{i=1}^{m} \alpha_i \le 1$$
which ensures that the predictor's output is within the allowed range and reduces the impact of transmission noise (usually seen as horizontal streaks). Reducing the decoder's susceptibility to input noise is important since a single error may propagate to all future outputs; that is, the decoder's output may become unstable. If the sum is strictly less than one, an input error will affect only a small number of outputs.
Optimal predictors
Consider the prediction error resulting from DPCM coding the monochrome Lena image, assuming zero quantization error and using each of the 4 predictors:
$$\hat{f}(x,y) = 0.97 f(x,y-1)$$
$$\hat{f}(x,y) = 0.5 f(x,y-1) + 0.5 f(x-1,y)$$
$$\hat{f}(x,y) = 0.75 f(x,y-1) + 0.75 f(x-1,y) - 0.5 f(x-1,y-1)$$
$$\hat{f}(x,y) = \begin{cases} 0.97 f(x,y-1) & \text{if } \Delta h \le \Delta v \\ 0.97 f(x-1,y) & \text{otherwise} \end{cases}$$
where
$$\Delta h = |f(x-1,y) - f(x-1,y-1)|$$
$$\Delta v = |f(x,y-1) - f(x-1,y-1)|$$
are the horizontal and vertical gradients at point (x,y).
Optimal predictors
Observe that the 4th (adaptive) predictor is designed to improve edge rendition by computing a local measure of the directional properties of the image.
[Figure: prediction error images computed for the 4 predictors]
The visually perceptible error decreases as the predictor's order increases.
The standard deviations are 11.1, 9.8, 9.1, and 9.7 intensity levels.
Optimal quantization
The staircase quantization function t = q(s) is an odd function of s (i.e., q(-s) = -q(s)) and can be described completely by the L/2 values of $s_i$ and $t_i$ shown in the first quadrant of the graph. These break points define the function discontinuities and are called the quantizer's decision and reconstruction levels.
By convention, s is mapped to $t_i$ if it lies in $(s_i, s_{i+1}]$.
The quantizer design problem is to select the best $s_i$ and $t_i$ for a particular optimization criterion and input pdf p(s).
Optimal quantization
If the optimization criterion (which can be either a statistical or a psychovisual measure) is the minimization of the mean-square quantization error $E\{(s_i - t_i)^2\}$ and p(s) is an even function, the conditions for minimal error are
$$\int_{s_{i-1}}^{s_i} (s - t_i)\,p(s)\,ds = 0, \qquad i = 1, 2, \dots, L/2$$
$$s_i = \begin{cases} 0 & i = 0 \\ \dfrac{t_i + t_{i+1}}{2} & i = 1, 2, \dots, L/2 - 1 \\ \infty & i = L/2 \end{cases}$$
$$s_{-i} = -s_i, \qquad t_{-i} = -t_i$$
Optimal quantization
Therefore, the reconstruction levels are the centroids of the area
under p(s) over the specified decision intervals; the decision levels
are halfway between the reconstruction levels; q is an odd function.
Th ti ti f i th b diti i ti l i th The quantizer satisfying the above conditions is optimal in the mean-
square error sense and is called the L-level Lloyd-Max quantizer.
For a unit variance Laplacian pdf, the 2-, 4-, and 8-level Lloyd-Max
decision and reconstruction levels (computed numerically) are:
Spring2008 ELEN4304/5365DIP 44
Optimal quantization
The 3 quantizers shown provide fixed output rates of 1, 2, and 3 bits/pixel. If a non-unit variance pdf is considered, the reconstruction and decision levels are obtained by multiplying the shown values by the standard deviation of the considered pdf.
The last row in the table shows the step size $\theta$ satisfying
$$t_i - t_{i-1} = s_i - s_{i-1} = \theta$$
The Lloyd-Max quantizer is not adaptive. However, adjusting the quantization levels based on the local behavior of an image can be very beneficial. In theory, slowly changing regions can be finely quantized, while rapidly changing areas are quantized more coarsely. This approach reduces both the granular noise and the slope overload at the cost of a minimal increase in code rate and a more complex quantizer.
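The centroid and midpoint conditions of the Lloyd-Max quantizer suggest a simple fixed-point iteration: alternately place the decision levels halfway between the reconstruction levels and move each reconstruction level to the centroid of its interval. The sketch below does this numerically for a Laplacian pdf. The function name, the integration grid, the starting values, and the iteration count are assumptions, and the levels it prints are not claimed to match the table on the slide.

```python
import numpy as np


def lloyd_max_laplacian(levels, sigma=1.0, iters=300):
    """Positive-side decision and reconstruction levels for a Laplacian pdf."""
    half = levels // 2
    s = np.linspace(0.0, 12.0 * sigma, 60001)            # integration grid
    p = np.exp(-np.sqrt(2.0) * s / sigma) / (np.sqrt(2.0) * sigma)
    t = (np.arange(half) + 0.5) * sigma                   # initial guess
    for _ in range(iters):
        # Decision levels: 0, midpoints between reconstruction levels, infinity.
        d = np.concatenate(([0.0], (t[:-1] + t[1:]) / 2.0, [np.inf]))
        # Reconstruction levels: centroids of p(s) over each decision interval.
        for i in range(half):
            mask = (s >= d[i]) & (s < d[i + 1])
            t[i] = np.sum(s[mask] * p[mask]) / np.sum(p[mask])
    d = np.concatenate(([0.0], (t[:-1] + t[1:]) / 2.0, [np.inf]))
    return d[1:], t


for n_levels in (2, 4):
    decision, reconstruction = lloyd_max_laplacian(n_levels)
    print(n_levels, np.round(decision, 3), np.round(reconstruction, 3))
```

Only the positive half is computed because q is odd. For a non-unit variance, passing `sigma` scales all levels, as described above.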
Wavelet compression
The idea of wavelet-based compression is similar to the idea of DCT compression: the transform coefficients can be stored more efficiently than the pixels themselves, since a transform decorrelates the information stored in pixels. If the transform basis functions (in this case wavelets) pack most of the visual information into a few coefficients, the remaining coefficients can be quantized coarsely or zeroed with little image distortion.
[Figure: a typical wavelet coding system]
Wavelet compression
To encode a $2^J$ x $2^J$ image, an analyzing wavelet and a minimum decomposition level J-P are selected and used to compute the DWT of the image. If the wavelet has a complementary scaling function $\varphi$, the fast wavelet transform can be used. The transform converts a large portion of the original image to horizontal, vertical, and diagonal coefficients with zero mean and a Laplacian-like distribution. Many of these coefficients carry little visual information and can be quantized and coded to reduce inter-coefficient and coding redundancy.
Since the wavelet transform is both computationally efficient and inherently local (its basis functions have limited duration), subdividing the image into blocks is not needed. This eliminates the blocking artifact and is the major difference from block transform coding systems.
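The idea of transforming, zeroing small detail coefficients, and inverting can be sketched with the simplest wavelet. The code below implements one level of an orthonormal 2D Haar transform directly in NumPy. The choice of Haar, the threshold of 1.5, the synthetic ramp image, and all function names are assumptions made for illustration; a practical coder would use several scales and then quantize and entropy code the coefficients.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar2d(img):
    """One-level orthonormal 2D Haar transform: returns (LL, LH, HL, HH)."""
    a = np.asarray(img, dtype=float)
    lo = (a[:, 0::2] + a[:, 1::2]) / SQRT2      # filter along each row
    hi = (a[:, 0::2] - a[:, 1::2]) / SQRT2
    ll = (lo[0::2] + lo[1::2]) / SQRT2          # then along each column
    lh = (lo[0::2] - lo[1::2]) / SQRT2
    hl = (hi[0::2] + hi[1::2]) / SQRT2
    hh = (hi[0::2] - hi[1::2]) / SQRT2
    return ll, lh, hl, hh


def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d."""
    lo = np.empty((2 * ll.shape[0], ll.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2], lo[1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    hi[0::2], hi[1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    out = np.empty((lo.shape[0], 2 * lo.shape[1]))
    out[:, 0::2], out[:, 1::2] = (lo + hi) / SQRT2, (lo - hi) / SQRT2
    return out


def threshold_details(img, threshold):
    """Zero detail coefficients with magnitude below threshold; reconstruct."""
    ll, lh, hl, hh = haar2d(img)
    details = [np.where(np.abs(d) < threshold, 0.0, d) for d in (lh, hl, hh)]
    zeroed = sum(int(np.sum(d == 0.0)) for d in details)
    total = sum(d.size for d in details)
    return ihaar2d(ll, *details), zeroed / total


image = np.add.outer(np.arange(16), np.arange(16)).astype(float)  # smooth ramp
approx, fraction = threshold_details(image, threshold=1.5)
rms = float(np.sqrt(np.mean((image - approx) ** 2)))
print(fraction, rms)
```

On this smooth ramp every detail coefficient falls below the threshold, yet the rms error stays below one intensity level, which illustrates how the approximation band carries most of the information.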
Wavelet compression
Wavelet selection
The selection of a wavelet affects the computational complexity of the transforms and the system's ability to compress and reconstruct images with acceptable error.
When a wavelet has a companion scaling function, the transformations can be implemented as a sequence of filtering operations. The ability of the wavelet to pack information into a small number of transform coefficients determines its compression and reconstruction performance.
The most frequently used are Daubechies wavelets and biorthogonal wavelets.
Wavelet compression
[Figure: 3-scale Haar wavelet transform]
[Figure: 3-scale Daubechies wavelet transform]
[Figure: 3-scale symlet transform]
[Figure: 3-scale Cohen-Daubechies-Feauveau biorthogonal wavelet transform]
Wavelet compression
Symletsare an extension of Daubechies wavelets with increased
symmetry. As seen in the table, computational intensity increases
(from 4 to 28 multiplications and additions per coefficient) when
i t l l t ll th i f ti moving to more complex wavelets, as well as the information
packing performance.
Spring2008 ELEN4304/5365DIP 50
The coefficients below 1.5 were set to zero. The potential
compression ability of biorthogonal wavelet is almost 10% higher
than the one of Haar wavelet.
Wavelet compression
Decomposition level selection
Since a P-scale fast wavelet transform involves P filter banks, the number of iterations in the computation of the forward and inverse transforms increases with the number of decomposition levels. In many applications (like searching image databases or transmitting images for progressive reconstruction), the resolution of the stored or transmitted images and the scale of the lowest useful approximation normally determine the number of transform levels.
[Table: results for a biorthogonal wavelet and a global threshold of 25]
Wavelet compression
Quantizer design
The most important factor affecting wavelet coding compression and reconstruction error is coefficient quantization. The most widely used quantizers are uniform. However, the effectiveness of the quantization can be improved significantly by
1) Introducing a larger quantization interval around zero (a dead zone);
2) Adapting the size of the quantization interval from scale to scale.
In either case, the selected quantization intervals must be transmitted to the decoder with the encoded image bit stream. The intervals themselves may be determined heuristically or computed automatically based on the image being compressed.
Wavelet compression
[Figure: the impact of dead zone interval size on the percentage of truncated detail coefficients for a 3-scale biorthogonal wavelet-based encoding]
As the size of the dead zone increases, the number of truncated coefficients increases too. There is almost no gain beyond 5.
The rms reconstruction error due to the dead zone thresholding increases from 0 to 1.94 (at threshold 5). If every detail coefficient were eliminated, approximately 97.92% of coefficients would be truncated and the reconstruction error would reach 12.3 levels.
Wavelet compression: JPEG-2000
JPEG-2000 is based on the wavelet coding technique and provides increased flexibility in both the compression of continuous-tone still images and access to the compressed data. Portions of the compressed image can be extracted for retransmission, storage, display, or editing. Coefficient quantization is adapted to individual scales and subbands, and the quantized coefficients are arithmetically coded on a bit-plane basis.
The first step of the encoding process is to level-shift the image intensities (or the three component images in the case of color images) by subtracting $2^{Ssiz-1}$, where Ssiz is the number of bits used to represent the samples. If there are exactly three components, they may be optionally decorrelated using a reversible or nonreversible linear combination of the components.
Wavelet compression: JPEG-2000
For instance, the irreversible component transform of JPEG-2000 is:
$$Y_0(x,y) = 0.299\,I_0(x,y) + 0.587\,I_1(x,y) + 0.114\,I_2(x,y)$$
$$Y_1(x,y) = -0.16875\,I_0(x,y) - 0.33126\,I_1(x,y) + 0.5\,I_2(x,y)$$
$$Y_2(x,y) = 0.5\,I_0(x,y) - 0.41869\,I_1(x,y) - 0.08131\,I_2(x,y)$$
where $I_0$, $I_1$, and $I_2$ are the level-shifted input components and $Y_0$, $Y_1$, and $Y_2$ are the corresponding decorrelated components. If the input components are the red, green, and blue planes, the equation approximates the R'G'B' to Y'CbCr color video transform. The goal of the transformation is to improve compression efficiency, since the transformed components $Y_1$ and $Y_2$ are difference images whose histograms are highly peaked around zero.
Wavelet compression: JPEG-2000
After the image has been level shifted and optionally decorrelated, its components can be divided into tiles: rectangular arrays of pixels that are processed independently, providing a simple mechanism to access and/or manipulate a limited region of a coded image.
For instance, an image with a 16:9 aspect ratio can be subdivided into tiles such that one of its tiles is a subimage with a 4:3 aspect ratio. That tile could be reconstructed without accessing the other tiles in the compressed image.
If the image is not subdivided into tiles, it is called a single tile.
Wavelet compression: JPEG-2000
Next, the 1D DWT of the rows and columns of each tile component is computed. For error-free compression, the transform uses a biorthogonal 5-3 coefficient scaling and wavelet vector. A rounding procedure is used for the transform coefficients having non-integer values.
In lossy applications, a 9-7 coefficient scaling-wavelet vector is used. In either case, the transform is computed using either the fast wavelet transform or a complementary lifting-based approach.
[Table: the coefficients used to construct the 9-7 FWT analysis filter bank]
Wavelet compression: JPEG-2000
The complementary lifting-based implementation involves 6 sequential lifting/scaling operations:
$$\begin{aligned}
Y(2n+1) &= X(2n+1) + \alpha\left[X(2n) + X(2n+2)\right], & i_0 - 3 \le 2n+1 < i_1 + 3 \\
Y(2n) &= X(2n) + \beta\left[Y(2n-1) + Y(2n+1)\right], & i_0 - 2 \le 2n < i_1 + 2 \\
Y(2n+1) &= Y(2n+1) + \gamma\left[Y(2n) + Y(2n+2)\right], & i_0 - 1 \le 2n+1 < i_1 + 1 \\
Y(2n) &= Y(2n) + \delta\left[Y(2n-1) + Y(2n+1)\right], & i_0 \le 2n < i_1 \\
Y(2n+1) &= -K \cdot Y(2n+1), & i_0 \le 2n+1 < i_1 \\
Y(2n) &= Y(2n)/K, & i_0 \le 2n < i_1
\end{aligned}$$
Here X is the tile component being transformed, Y is the resulting transform, and $i_0$ and $i_1$ define the position of the tile component. The constants are
$$\alpha = -1.586134342; \quad \beta = -0.052980118; \quad \gamma = 0.882911075; \quad \delta = 0.443506852; \quad K = 1.230174105$$
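The six lifting/scaling steps and their inversion can be sketched for a 1D signal of even length. The constants are the 9-7 lifting parameters as given in JPEG-2000 Part 1 (with $\delta = 0.443506852$). The function names are hypothetical, and the simple mirror handling at the two ends is an assumed stand-in for the symmetric extension, not a full treatment of tile-component boundaries.

```python
import numpy as np

ALPHA, BETA = -1.586134342, -0.052980118
GAMMA, DELTA = 0.882911075, 0.443506852
K = 1.230174105


def _lift_odd(odd, even, c):
    """odd[n] += c * (even[n] + even[n+1]), mirroring at the right edge."""
    right = np.append(even[1:], even[-1])
    return odd + c * (even + right)


def _lift_even(even, odd, c):
    """even[n] += c * (odd[n-1] + odd[n]), mirroring at the left edge."""
    left = np.insert(odd[:-1], 0, odd[0])
    return even + c * (left + odd)


def fwt97(x):
    """One level of the 9-7 lifting transform; returns (approx, detail)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2].copy(), x[1::2].copy()
    odd = _lift_odd(odd, even, ALPHA)
    even = _lift_even(even, odd, BETA)
    odd = _lift_odd(odd, even, GAMMA)
    even = _lift_even(even, odd, DELTA)
    return even / K, -K * odd


def ifwt97(approx, detail):
    """Undo fwt97 by reversing the steps with negated coefficients."""
    even, odd = K * approx, detail / (-K)
    even = _lift_even(even, odd, -DELTA)
    odd = _lift_odd(odd, even, -GAMMA)
    even = _lift_even(even, odd, -BETA)
    odd = _lift_odd(odd, even, -ALPHA)
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x


signal = np.array([3.0, 7.0, 1.0, 8.0, 2.0, 9.0, 4.0, 6.0])
a, d = fwt97(signal)
print(np.round(ifwt97(a, d), 6))
```

Each lifting step changes only one half of the samples using the other half, so it can be undone exactly by subtracting the same quantity; this is why the inverse needs no separate filter design.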
Wavelet compression: JPEG-2000
The described transformation produces 4 subbands: a low-resolution approximation of the tile component and the component's horizontal, vertical, and diagonal detail images. Repeating the transformation $N_L$ times produces an $N_L$-scale wavelet transform. Adjacent scales are related spatially by powers of 2.
[Figure: JPEG-2000 two-scale ($N_L$ = 2) wavelet transform tile-component notation]
When each of the tile components has been processed, the total number of transform coefficients is equal to the number of samples (pixels) in the original image. However, the important visual information is concentrated in a few coefficients.
Wavelet compression: JPEG-2000
To reduce the number of bits needed to store the information, coefficient $a_b(u,v)$ of subband b is quantized to the value $q_b(u,v)$ by
$$q_b(u,v) = \mathrm{sign}\left[a_b(u,v)\right] \cdot \mathrm{floor}\left[\frac{|a_b(u,v)|}{\Delta_b}\right]$$
where the quantization step size is
$$\Delta_b = 2^{R_b - \varepsilon_b}\left(1 + \frac{\mu_b}{2^{11}}\right)$$
$R_b$ is the nominal dynamic range of subband b (the sum of the number of bits used to represent the original image and the analysis gain bits for subband b), and $\varepsilon_b$ and $\mu_b$ are the number of bits allotted to the exponent and mantissa of the subband's coefficients.
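The step size formula and the quantizer translate directly into a few lines. The function names `step_size` and `quantize` and the parameter values are illustrative assumptions. Note that the floor of the magnitude creates a dead zone of width $2\Delta_b$ around zero.

```python
import numpy as np


def step_size(r_b, eps_b, mu_b):
    """Quantization step: 2^(R_b - eps_b) * (1 + mu_b / 2^11)."""
    return 2.0 ** (r_b - eps_b) * (1.0 + mu_b / 2.0 ** 11)


def quantize(a, delta):
    """q = sign(a) * floor(|a| / delta), applied elementwise."""
    a = np.asarray(a, dtype=float)
    return (np.sign(a) * np.floor(np.abs(a) / delta)).astype(int)


delta = step_size(r_b=10, eps_b=8, mu_b=1024)    # 4 * 1.5 = 6.0
print(delta, quantize([-13.0, -5.9, 0.0, 5.9, 13.0], delta).tolist())
```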
Wavelet compression: JPEG-2000
For error-free compression, $\mu_b = 0$, $R_b = \varepsilon_b$, and $\Delta_b = 1$. For irreversible compression, no particular quantization step size is specified in the standard. Instead, the number of exponent and mantissa bits must be provided to the decoder either on a subband basis, called subband quantization, or for the $N_L$LL subband only, called derived quantization. In the latter case, the remaining subbands are quantized using extrapolated $N_L$LL subband parameters. If $\varepsilon_0$ and $\mu_0$ are the number of bits allocated to the $N_L$LL subband, the extrapolated parameters for subband b are
$$\mu_b = \mu_0$$
$$\varepsilon_b = \varepsilon_0 + n_b - N_L$$
where $n_b$ is the number of subband decomposition levels from the original image tile component to subband b.
Wavelet compression: JPEG-2000
In the final steps of the encoding, the coefficients of each transformed tile-component's subbands are arranged into rectangular code blocks, which are coded individually, one bit plane at a time. Starting from the most significant bit plane with a nonzero element, each bit plane is processed in three passes. Each bit is coded in only one of the three passes, which are called significance propagation, magnitude refinement, and cleanup. The outputs are then arithmetically coded and grouped with similar passes from other code blocks to form layers (arbitrary numbers of groupings of coding passes from each code block). The resulting layers are finally partitioned into packets, the fundamental units of the encoded code stream, which provide an additional method of extracting a spatial region of interest from the code stream.
Wavelet compression: JPEG-2000
JPEG-2000 decoders invert the previously described operations. After reconstructing the subbands of the tile-components from the arithmetically coded packets, a user-selected number of subbands is decoded. Any nondecoded bits are set to zero and the resulting coefficients $\bar{q}_b(u,v)$ are inverse quantized using
$$R_{q_b}(u,v) = \begin{cases} \left(\bar{q}_b(u,v) + r \cdot 2^{M_b - N_b(u,v)}\right)\Delta_b & \bar{q}_b(u,v) > 0 \\ \left(\bar{q}_b(u,v) - r \cdot 2^{M_b - N_b(u,v)}\right)\Delta_b & \bar{q}_b(u,v) < 0 \\ 0 & \bar{q}_b(u,v) = 0 \end{cases}$$
where $R_{q_b}(u,v)$ is the inverse-quantized transform coefficient, $M_b$ is the number of bit planes for a particular subband, and $N_b$ is the number of decoded bit planes. The reconstruction parameter r is chosen by the decoder to produce the best visual or objective quality of reconstruction: $0 \le r < 1$ (commonly r = 1/2).
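The inverse quantization rule can be sketched as below. The function name `dequantize` and the sample values are assumptions, and a single number of decoded bit planes is used for the whole array for simplicity. With r = 1/2 and all bit planes decoded, each nonzero index is reconstructed at the midpoint of its quantization interval.

```python
import numpy as np


def dequantize(q, delta, m_b, n_b, r=0.5):
    """Inverse quantization with reconstruction parameter r."""
    q = np.asarray(q, dtype=float)
    offset = r * 2.0 ** (m_b - n_b)
    # sign(q) is 0 for q == 0, so zero indices stay exactly zero.
    return np.sign(q) * (np.abs(q) + offset) * delta


print(dequantize([-2, 0, 2], delta=1.0, m_b=8, n_b=8).tolist())
print(dequantize([-4, 0, 4], delta=6.0, m_b=8, n_b=6).tolist())
```

In the second call two bit planes were not decoded, so the offset grows to $r \cdot 2^2$ to center the estimate within the wider uncertainty interval.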
Wavelet compression: JPEG-2000
The inverse-quantized coefficients are then inverse transformed by column and by row using an IFWT filter bank whose coefficients are obtained from the table or via the lifting-based operations:
$$\begin{aligned}
X(2n) &= K \cdot Y(2n), & i_0 - 3 \le 2n < i_1 + 3 \\
X(2n+1) &= (-1/K) \cdot Y(2n+1), & i_0 - 2 \le 2n+1 < i_1 + 2 \\
X(2n) &= X(2n) - \delta\left[X(2n-1) + X(2n+1)\right], & i_0 - 3 \le 2n < i_1 + 3 \\
X(2n+1) &= X(2n+1) - \gamma\left[X(2n) + X(2n+2)\right], & i_0 - 2 \le 2n+1 < i_1 + 2 \\
X(2n) &= X(2n) - \beta\left[X(2n-1) + X(2n+1)\right], & i_0 - 1 \le 2n < i_1 + 1 \\
X(2n+1) &= X(2n+1) - \alpha\left[X(2n) + X(2n+2)\right], & i_0 \le 2n+1 < i_1
\end{aligned}$$
The parameters $\alpha$, $\beta$, $\gamma$, $\delta$, and K are defined as previously, and the inverse-quantized coefficient row or column element Y(n) is symmetrically extended if necessary.
Wavelet compression: JPEG-2000
Finally, the component tiles are assembled, inverse component transformed (if needed), and DC-level shifted. For irreversible coding, the inverse component transformation is
$$I_0(x,y) = Y_0(x,y) + 1.402\,Y_2(x,y)$$
$$I_1(x,y) = Y_0(x,y) - 0.34413\,Y_1(x,y) - 0.71414\,Y_2(x,y)$$
$$I_2(x,y) = Y_0(x,y) + 1.772\,Y_1(x,y)$$
and the transformed pixels are shifted by $+2^{Ssiz-1}$.
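A quick round trip through the irreversible component transform and its inverse shows that the two are consistent up to the rounding of the published coefficients. The coefficient values are those quoted in these slides; the function names, the 8-bit sample depth, and the test pixels are illustrative assumptions.

```python
import numpy as np

# Forward irreversible component transform (rows give Y0, Y1, Y2).
FORWARD = np.array([
    [0.299, 0.587, 0.114],
    [-0.16875, -0.33126, 0.5],
    [0.5, -0.41869, -0.08131],
])


def forward_ict(pixels, ssiz=8):
    """Level shift by 2^(Ssiz-1), then decorrelate the three components."""
    shifted = np.asarray(pixels, dtype=float) - 2 ** (ssiz - 1)
    return shifted @ FORWARD.T


def inverse_ict(ycc, ssiz=8):
    """Inverse component transform followed by the +2^(Ssiz-1) shift."""
    y0, y1, y2 = ycc[..., 0], ycc[..., 1], ycc[..., 2]
    i0 = y0 + 1.402 * y2
    i1 = y0 - 0.34413 * y1 - 0.71414 * y2
    i2 = y0 + 1.772 * y1
    return np.stack([i0, i1, i2], axis=-1) + 2 ** (ssiz - 1)


pixels = np.array([[255, 0, 0], [12, 200, 77], [200, 200, 200]], dtype=float)
ycc = forward_ict(pixels)
print(np.round(ycc, 2))
print(np.round(inverse_ict(ycc), 2))
```

For the gray pixel in the last row the two difference components come out near zero, which is the behavior that makes their histograms peak around zero.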
JPEG-2000
[Figures: JPEG-2000 approximations with C = 25 (rms error = 3.86), C = 52 (rms error = 5.77), C = 75, and C = 105]
Much better visual quality and smaller error than with JPEG.
JPEG-2000 provides usable images compressed by more than 100:1.