The JPEG Image Compression Algorithm
John W. O'Brien (obrienjw@colorado.edu)
APPM-3310 Final Project, December 2, 2005
i, j, and k) gives the number of units along the three principal axes: x, y, and z. For instance, the vector (3, 5, 7) identifies the point three units along x, five units along y, and seven units along z. Now consider a different, still orthonormal, basis:

v1 = (-1/√2, 1/√2, 0)^T,  v2 = (1/√2, 1/√2, 0)^T,  v3 = (0, 0, 1)^T. (1)

Note that one of the vectors did not change. With this set of basis vectors, the same set of coefficients (3, 5, 7) identifies a different point, (√2, 8/√2, 7).
In the preceding example, we observed how one set of coefficients can represent two different points in space depending on the basis used. Next, and more relevant to the DCT, we show how a single point can be represented by two different sets of coefficients.
Let

v = (4/√2, 4/√2, 8)^T = c1 v1 + c2 v2 + c3 v3. (2)
Since the new basis defined in (1) is orthonormal, we can find the coefficients c_i by taking the inner product of v with each basis vector in turn. To show why this works, here is the full computation for the first coefficient, beginning with a substitution from (2) and then using the properties of bilinearity (inner products in general) and of orthonormality (these basis vectors in particular):
⟨v, v1⟩ = ⟨c1 v1 + c2 v2 + c3 v3, v1⟩
        = c1 ⟨v1, v1⟩ + c2 ⟨v2, v1⟩ + c3 ⟨v3, v1⟩
        = c1 (1) + c2 (0) + c3 (0)
        = c1 (3)
We form a new vector w with these coefficients,

w = (c1, c2, c3)^T = (0, 4, 8)^T. (4)
The reader might wish to apply (1) and (2) to (4) to verify
this result.
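That verification can be automated. The following Python sketch (Python rather than the MATLAB of Appendix I) recomputes each c_i as the inner product ⟨v, v_i⟩, using the basis of (1) and the vector of (2) as reconstructed here:

```python
import math

# Basis from (1): v1 and v2 are e1, e2 rotated 45 degrees; v3 = e3 is unchanged.
s = 1 / math.sqrt(2)
basis = [(-s, s, 0.0), (s, s, 0.0), (0.0, 0.0, 1.0)]

# Vector from (2): v = (4/sqrt(2), 4/sqrt(2), 8)
v = (4 * s, 4 * s, 8.0)

def inner(a, b):
    """Euclidean inner product."""
    return sum(x * y for x, y in zip(a, b))

# Coefficients by inner product, as in (3)
w = [inner(v, vi) for vi in basis]
print([round(c, 10) for c in w])  # → [0.0, 4.0, 8.0], matching (4)
```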
2) Extending and generalizing: It might not be immediately apparent why this concept is especially useful in the application of image compression. To begin to uncover the true utility of this approach, let us examine the above example more closely. Notice that the original vector v has three non-zero coefficients, while the transformed vector w has a coefficient that is zero. In this relatively simple situation (vectors in R^3), sending one of the coefficients to zero is no great feat. What if, instead of a three-dimensional space, we were working in a higher-dimensional space? It might be the case that several, even most, of the coefficients in a vector are transformed to zero. That would introduce the possibility of representing the vector in a compact way. If you think about it, we do something like this all the time: an expression like k = {1, 2, 3, . . . , 99} really means that k is the first ninety-nine integers. The ". . . , 99" is a six-character shorthand for ninety-six of the members of k, which would otherwise have taken one hundred eighty-six digits and ninety-five commas (two hundred eighty-one more characters). By adjusting the representation we are able to hold a lot of information in a small space. However, this alternate representation depends heavily on special attributes of the information being represented.
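The same instinct can be made mechanical with run-length encoding, which JPEG later applies to runs of zero coefficients. A minimal Python sketch (the `rle` helper is hypothetical, not part of the JPEG standard):

```python
def rle(values):
    """Collapse a list into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]

# A vector whose tail has been transformed to zeros stores compactly:
print(rle([16, 4, 0, 0, 0, 0, 0, 0]))  # → [(16, 1), (4, 1), (0, 6)]
```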
3) Images as vectors: Now we take this idea and apply it to the task of image compression. Despite the fact that any given sub-pixel (e.g. the green component) in an image can take on any integer on the interval [0, 255], the relation between a pixel and its neighbors is typically not that arbitrary. Pick up any photograph and look for patterns in the variation of color and brightness across the image. Depending on the particular image you may see large patches with very little, or very gradual, variation. Other areas of the image will have larger fluctuations from one point to an adjacent point, or fluctuations that are spaced more closely, but they might form some repeating pattern. There are many combinations as well, but these patterns represent a property of the image that we call spatial frequency content. Just as a passage of music with bass notes has low temporal frequency components, an image with slow changes in tone has low spatial frequency components. The next step is to quantify these frequency components.
4) Intensity to frequency: Consider a line of eight adjacent pixels. Think of it as some part of one of the rows or columns from an image, or just think of it as an independent bunch of pixels that decided to hang out together and form a line. Now we treat this row of pixels as a vector: a list of coefficients indicating how much of the appropriate color component is present at each spatial location. Each basis vector in this model would be just like the standard Euclidean basis in geometry, and have the unit value in one of the eight positions with zeros in the rest. The unit value, in this case, is equivalent to one 255th of the maximum possible intensity. A minor adjustment at this stage, as mentioned at the end of (I-D), is to use a signed integer representation of the pixels, subtracting 128 from each intensity sample so that they now fall on the interval [-128, 127]. Later on when we decompress a JPEG file, the last step in the decoding sequence will be to add 128 back to every sample.
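The level shift is one line in each direction. A Python sketch, using the top row of the example block that appears later in (14):

```python
row = [188, 145, 88, 58, 67, 110, 134, 134]  # eight 8-bit intensity samples

# Encoder: shift the [0, 255] range down to [-128, 127]
shifted = [p - 128 for p in row]
print(shifted)  # → [60, 17, -40, -70, -61, -18, 6, 6]

# Decoder: the final step adds 128 back
restored = [p + 128 for p in shifted]
assert restored == row
```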
Now we select an alternate basis that will exploit the
patterns that are generally found in groups of pixels like this.
Since we are interested in the spatial frequency content of the
group, we choose basis vectors that are parameterized in a
frequency variable: those produced by cosine functions.
The basis vectors of the DCT are given by the function

d_ω[t] = (C(ω)/2) cos((2t + 1)ωπ/16),
C(ω) = 1/√2 if ω = 0, 1 otherwise,
for t = {0, 1, . . . , 7} and ω = {0, 1, . . . , 7}, (5)

where t is the coefficient index, and ω identifies one of the eight basis vectors by the frequency used to generate it. This definition is chosen to give the basis set the property of orthonormality.² Fig. 1 shows what each of these vectors would look like on an intensity versus position plot. The highest values would correspond to the brightest pixels while the lowest (largest negative) values are the darkest pixels.
Fig. 1. A largely qualitative view of the DCT basis vectors (one panel for each of ω = 0 through ω = 7).
If we now apply the method described above in (II-B.1), we can express the pixel vector as a linear combination of the sinusoidal basis vectors defined in (5). The remarkable feature of the transformed vector is that many of the high frequency coefficients are often zero or close to zero. This happens because continuous tone digital images typically have little or no high frequency spatial variation. Look again at the ω = 7 basis vector in Fig. 1. The corresponding pixel values would alternate between very bright and very dark pixels across the whole row. Given that computer monitors have pixels that are about 0.3 mm wide, one may readily imagine that an image with a drastically different color every fraction of an inch would be unpleasant to look at.
In a group of eight frequency coefficients it would not be out of the question to expect three or four of them to be zero, and certainly four or five could be very close to zero. If a coefficient is nearly zero, it means that the contribution of the corresponding frequency component is small, perhaps even sufficiently insignificant that it would not be missed if it were to be discarded.
² Orthonormality can be proved for this set of vectors by simply testing every pair (including each vector with itself) under the dot product. There are 36 such pairs, each requiring 8 multiplications and 7 additions for the dot product: a mere 540 arithmetic operations.
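The exhaustive check described in the footnote is easy to automate. This Python sketch builds the eight vectors of (5) and tests every pair under the dot product:

```python
import math

def dct_basis(w):
    """d_w[t] from (5): (C(w)/2) cos((2t+1) w pi / 16)."""
    c = 1 / math.sqrt(2) if w == 0 else 1.0
    return [(c / 2) * math.cos((2 * t + 1) * w * math.pi / 16) for t in range(8)]

D = [dct_basis(w) for w in range(8)]

# Every self product should be 1 and every cross product 0,
# to floating point precision.
for a in range(8):
    for b in range(8):
        dot = sum(x * y for x, y in zip(D[a], D[b]))
        expected = 1.0 if a == b else 0.0
        assert abs(dot - expected) < 1e-12
print("orthonormal")
```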
5) Spatial frequency in two directions: There is one more
extension of the DCT that is important for the JPEG algorithm.
Since images are represented as 2-D arrays of pixels, it is
advantageous to incorporate a quantitative measure of both
vertical and horizontal spatial frequency, not just one or the
other. The discussion in (II-B.4) dealt only with a line
of pixels. While an image can certainly be separated into
many such lines, and transformed accordingly, it is better to
transform blocks (sub images, so to speak).
6) Definition of the DCT: The 2-D DCT used for JPEG image compression is defined as follows.³ We use the general form of the DCT basis vectors from (5).

F(v, u) = Σ_{x=0}^{7} Σ_{y=0}^{7} p(y, x) d_u[x] d_v[y] (6)

where F(v, u) is the frequency coefficient with vertical frequency v and horizontal frequency u, and p(y, x) gives the value of the pixel in row y and column x of the block.
This definition can be framed in terms of matrix operations. Let

F = [ f00 ... f07
      ...
      f70 ... f77 ],  f_vu = F(v, u), (7)

P = [ p00 ... p07
      ...
      p70 ... p77 ],  p_yx = p(y, x), (8)
and

D = [ d00 ... d07
      ...
      d70 ... d77 ],  d_ωt = d_ω[t]. (9)
That is, the matrices F and P are direct analogs to the digital storage of frequency coefficients and pixels respectively, while D has the DCT basis vectors as its rows. By using the element-wise definitions of (7), (8), and (9) to substitute into (6), we see a slightly different form of the DCT equation.
f_vu = Σ_{x=0}^{7} Σ_{y=0}^{7} p_yx d_ux d_vy (10)

By rearranging factors and grouping appropriately⁴ we are able to recognize (10) as the element-wise definition of matrix multiplication:

F = D P D^T (11)

³ The term two-dimensional (2-D) in this context refers to the shape of the 8 × 8 pixel block, not the number of basis vectors of the DCT.
⁴ Recall also that the subscripts of a matrix element can be reversed by taking the transpose of the matrix.
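The equivalence of the double sum (10) and the matrix product (11) can be checked numerically. A Python sketch (the 8 × 8 test block is made up; any block works):

```python
import math

def dct_basis(w):
    c = 1 / math.sqrt(2) if w == 0 else 1.0
    return [(c / 2) * math.cos((2 * t + 1) * w * math.pi / 16) for t in range(8)]

D = [dct_basis(w) for w in range(8)]  # rows are the basis vectors, as in (9)
P = [[(3 * y + 5 * x) % 17 - 8 for x in range(8)] for y in range(8)]  # made-up pixels

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(8)) for j in range(8)] for i in range(8)]

def transpose(A):
    return [list(r) for r in zip(*A)]

# (10): element-wise double sum
F_sum = [[sum(P[y][x] * D[u][x] * D[v][y] for x in range(8) for y in range(8))
          for u in range(8)] for v in range(8)]

# (11): F = D P D^T
F_mat = matmul(matmul(D, P), transpose(D))

assert all(abs(F_sum[v][u] - F_mat[v][u]) < 1e-9 for v in range(8) for u in range(8))
```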
7) The inverse DCT (IDCT): Until now, this discussion has been exclusively concerned with the compression of an image. Let us not overlook the fact that a compression scheme is of no use unless the process can be reversed and the information decompressed. We now have all we need to prove that the DCT is reversible. The proof is very nice: the matrix D has full rank because its rows are a basis, so D and its transpose are invertible. Since the rows of D are orthonormal, D is an orthogonal matrix, so D^T = D^{-1}. The conclusion of the proof gives us an equation for the IDCT.

D^{-1} F (D^T)^{-1} = D^{-1} D P D^T (D^T)^{-1} = P
P = D^T F D (12)

If we take the transpose of (12), we find that the IDCT keeps the same form when it operates on the transposed coefficients:

P^T = D^T F^T D (13)
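The reversibility claim of (12) can be confirmed numerically as well. A Python sketch (again with a made-up block):

```python
import math

def dct_basis(w):
    c = 1 / math.sqrt(2) if w == 0 else 1.0
    return [(c / 2) * math.cos((2 * t + 1) * w * math.pi / 16) for t in range(8)]

D = [dct_basis(w) for w in range(8)]
DT = [list(r) for r in zip(*D)]  # D transpose, which equals D inverse

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(8)) for j in range(8)] for i in range(8)]

P = [[((x * y) % 7) - 3 for x in range(8)] for y in range(8)]  # made-up block
F = matmul(matmul(D, P), DT)        # forward DCT, (11)
P_back = matmul(matmul(DT, F), D)   # IDCT, (12): P = D^T F D

# The round trip reproduces the block to floating point precision.
assert all(abs(P_back[y][x] - P[y][x]) < 1e-9 for y in range(8) for x in range(8))
```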
P0 =
[ 188 145  88  58  67 110 134 134
  187 193 152 125 115 130 137 139
  166 184 201 194 198 195 151 139
  152 168 188 214 229 225 172 143
  156 159 165 181 201 199 169 144
  168 163 164 158 167 174 156 145
  174 171 169 158 155 163 160 144
  170 172 167 161 159 167 169 147 ] (14)
The visual-numerical analog can be seen, for instance, by noting the association between the lowest value in the matrix (58, or 22.7% white, at location (0,3)) and the dark pixel in the top row of the block, and between the highest value (229, or 89.8% white, at location (3,4)) and the nearly white pixel near the center of the block.
B. Level shift
The sample range is shifted to be zero-centered by subtracting 128 from every sample:

P0' = P0 - 128 =
[  60  17 -40 -70 -61 -18   6   6
   59  65  24  -3 -13   2   9  11
   38  56  73  66  70  67  23  11
   24  40  60  86 101  97  44  15
   28  31  37  53  73  71  41  16
   40  35  36  30  39  46  28  17
   46  43  41  30  27  35  32  16
   42  44  39  33  31  39  41  19 ] (15)
C. Forward DCT
Switching to a floating point representation, we apply (11) to P0' to find the frequency coefficients.

F0 = DCT[P0'] =
[  263.00   46.69  -10.84   45.62  -28.00   -4.91    5.84   -5.57
   -67.22   24.13   56.18    6.96   -2.32   -8.65    0.87   -5.86
  -119.71   32.57  129.33   -7.86   -8.71   10.05   -8.83    1.19
   -87.58   -7.34   71.10   16.89    4.75    4.86   -1.32    2.37
   -11.75  -33.87   -3.76   23.69    4.75    5.40   -3.85   -0.44
     3.79  -10.86  -13.13    8.78    1.96    4.01    6.98    0.26
    -1.56   -6.20   -7.08    4.27    1.17   -0.42    8.42    2.77
    -2.73   -0.94   -1.93    1.05    0.31    0.95    2.29    3.48 ] (16)
D. Quantization
The quantization table we will use is chosen from an
example in [1]. Note that the values in the upper-left corner
of the matrix, which correspond to DC and low-frequency
components, are fairly low (10-20), whereas the values in the
lower-right region of the matrix, which are used to quantize
the high frequency components, are much higher (80-120).
Q0 =
[ 16  11  10  16  24  40  51  61
  12  12  14  19  26  58  60  55
  14  13  16  24  40  57  69  56
  14  17  22  29  51  87  80  62
  18  22  37  56  68 109 103  77
  24  35  55  64  81 104 113  92
  49  64  78  87 103 121 120 101
  72  92  95  98 112 100 103  99 ] (17)
Element-wise division F0[v, u]/Q0[v, u] and rounding gives the block of quantized coefficients.

G0 =
[ 16   4  -1   3  -1   0   0   0
  -6   2   4   0   0   0   0   0
  -9   3   8   0   0   0   0   0
  -6   0   3   1   0   0   0   0
  -1  -2   0   0   0   0   0   0
   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0 ] (18)
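The quantize/dequantize pair is a one-liner in each direction. A Python sketch using the top rows of F0 and Q0 from this worked example (signs as reconstructed here):

```python
F_row0 = [263.00, 46.69, -10.84, 45.62, -28.00, -4.91, 5.84, -5.57]
Q_row0 = [16, 11, 10, 16, 24, 40, 51, 61]

# Quantize: divide element-wise and round to the nearest integer
# (exact ties do not arise for these values).
G_row0 = [round(f / q) for f, q in zip(F_row0, Q_row0)]
print(G_row0)  # → [16, 4, -1, 3, -1, 0, 0, 0]

# Dequantize (done by the decoder): multiply back; precision is lost.
F1_row0 = [g * q for g, q in zip(G_row0, Q_row0)]
print(F1_row0)  # → [256, 44, -10, 48, -24, 0, 0, 0]
```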
E. Zig-zag sequence
Most computers store 2-D arrays in memory in a row-wise sequence. That is, all elements from the top row are stored consecutively, followed by all elements from the next row, and so forth. For the purpose of JPEG compression, on the other hand, it is desirable to have as many of the zero coefficients next to each other as possible to maximize the compression available with the RLE and Huffman coding. Consequently, the 1-D representation chosen for this 2-D array starts in the upper left corner and sweeps back and forth diagonally. The linear representation of G0 would then be

G0 = {16, 4, -6, -9, 2, -1, 3, 4, 3, -6, -1, 0, 8,
0, -1, 0, 0, 0, 3, -2, 0, 0, 0, 0, 1, 0, . . . , 0} (19)
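The diagonal sweep can be generated by grouping indices by anti-diagonal (constant row + column) and alternating direction. A Python sketch of the traversal:

```python
def zigzag_indices(n=8):
    """(row, col) pairs in JPEG zig-zag order for an n x n block."""
    order = []
    for d in range(2 * n - 1):  # anti-diagonal index: row + col = d
        cells = [(y, d - y) for y in range(n) if 0 <= d - y < n]
        # Even diagonals are traversed upward (bottom-left to top-right);
        # odd diagonals are traversed downward.
        order.extend(cells if d % 2 else reversed(cells))
    return order

print(zigzag_indices()[:10])
# → [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2), (2, 1), (3, 0)]
```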
F. Intermediate coding
Though the principal thrust of this paper is not entropy
coding, it is worthwhile to glance at an example. A general
sense of what is happening can be gained without extensive
analysis of the theoretical background of the approach.
1) Representing values: Commonly, the elements of G0 would be stored in the computer as fixed-length binary integers. Since the quantized coefficients are so small, that choice would waste many bits; a small number can be represented with a small number of bits. The biggest number in this array can be specified with five bits (10000). To exploit this fact, a set of variable length integer (VLI) codes are specified by the JPEG standard. The ones that will be used for this example are given in Table I.

value   code    size
  -9    0110     4
  -6    001      3
  -2    01       2
  -1    0        1
   1    1        1
   2    10       2
   3    11       2
   4    100      3
   8    1000     4
  16    10000    5

TABLE I
VARIABLE LENGTH INTEGERS USED IN THE EXAMPLE OF (III-F.1).
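Table I follows directly from the JPEG VLI rule: the size is the bit length of |value|, a positive value is coded as plain binary, and a negative value is coded as the one's complement of |value| in that many bits. A Python sketch:

```python
def vli(value):
    """JPEG variable-length-integer (size, code) for a nonzero value."""
    size = abs(value).bit_length()
    if value > 0:
        bits = value
    else:
        bits = value + (1 << size) - 1  # one's complement of |value|
    return size, format(bits, "0{}b".format(size))

for v in (-9, -6, -2, -1, 1, 2, 3, 4, 8, 16):
    print(v, vli(v))
# e.g. vli(-9) → (4, '0110') and vli(16) → (5, '10000'), as in Table I
```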
2) Intermediate and entropy coding: The block G0 will be entropy encoded starting at the first element and stepping through to the last element.⁶ Each piece of coded information will have the form (run, size, value). As per the brief discussion in (II-D), the intermediate code of G0 would be
G
0
= {(0, 5), 10000, (0, 3), 100, (0, 3), 001, (0, 4), 0110,
(0, 2), 10, (0, 1), 0, (0, 2), 11, (0, 3), 100, (0, 2), 11,
(0, 3), 001, (0, 1), 0, (1, 4), 1000, (1, 1), 0, (3, 2), 11,
(0, 2), 01, (4, 1), 1, (0, 0)} (20)
Each (run, size) pair gives the length of a run of zeros and the size of the value that follows, while the numbers between the pairs are the VLI values from Table I. The block is terminated with the pair (0, 0), indicating that there are no more non-zero elements.
It is the (run, size) pairs that receive the Huffman coding. Table II shows all of the pairs that appear in (20), as well as how many times they appear in this block.
Huffman codes have the following properties:
1) No code is a prefix of any other code (e.g. if 010 is a valid code, then no longer code may begin with 010).
2) Codes that are more likely to occur typically have shorter lengths.
Since we are only concerned for the moment with data from a single block, it is possible to develop a Huffman table that is beautifully suited to this block. The choice is shown in Table II. Substituting from that table into (20) gives this string of bits:
⁶ DC coefficients are actually treated differently than AC coefficients. The values are stored as differences relative to previous blocks, and a different coding table is used. However, this detail will be neglected for the purpose of this example.
# pair code
4 (0,2) 00
4 (0,3) 01
2 (0,1) 100
1 (0,0) 101
1 (0,4) 11000
1 (0,5) 11001
1 (1,1) 11010
1 (1,4) 11011
1 (3,2) 11100
1 (4,1) 11101
TABLE II
A LIST OF (RUN,SIZE) PAIRS FROM (20), NUMBER OF OCCURRENCES, AND
HUFFMAN CODES.
G0 = {
1100110000011000100
1110000110001010000
0110110000110100110
0011011100011010011
100110001111011101
} (21)
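As a check that (21) really is decodable, this Python sketch walks the bit string using the Huffman codes of Table II and the VLI rule of Table I, both embedded as plain dictionaries:

```python
huffman = {"00": (0, 2), "01": (0, 3), "100": (0, 1), "101": (0, 0),
           "11000": (0, 4), "11001": (0, 5), "11010": (1, 1),
           "11011": (1, 4), "11100": (3, 2), "11101": (4, 1)}

def vli_decode(bits):
    """Invert the JPEG VLI: a leading 0 marks a negative value."""
    n = int(bits, 2)
    return n if bits[0] == "1" else n - (1 << len(bits)) + 1

stream = ("1100110000011000100" "1110000110001010000" "0110110000110100110"
          "0011011100011010011" "100110001111011101")

decoded, i = [], 0
while i < len(stream):
    for j in range(i + 1, len(stream) + 1):  # prefix-free: extend until a code matches
        if stream[i:j] in huffman:
            run, size = huffman[stream[i:j]]
            i = j
            break
    if (run, size) == (0, 0):                # end-of-block marker
        break
    decoded.append((run, size, vli_decode(stream[i:i + size])))
    i += size

print(decoded[:3])  # → [(0, 5, 16), (0, 3, 4), (0, 3, -6)]
```

The 16 decoded triples reproduce the intermediate code (20) exactly.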
That is 94 bits. The original block of intensity samples would have been 4096 bits long in memory. Of course, this is not counting the storage space required to store the quantization table(s) and the Huffman coding table(s), but they are shared among many blocks so the storage expense per block is negligible. Assuming that we have already taken care of storing the Huffman and quantization tables, the compression ratio for this block is 2.3%, or about 1.5 bits per pixel.
G. Decompression
If the inverse of this process is applied to the compressed data, a block of pixels P1 will be generated. This block is intended to be a reconstruction of the original P0, acknowledging that there will be some difference due to the loss and finite precision of the system. The reconstructed block is

P1 =
[ 187 146  88  53  63 100 127 134
  190 175 148 126 125 135 137 131
  171 185 197 198 194 183 160 139
  142 165 193 212 220 210 180 150
  147 155 167 184 202 203 177 146
  176 168 158 158 172 179 160 134
  183 175 162 154 159 165 155 138
  164 167 164 158 159 166 163 154 ] (22)
To see how much these samples vary from the original we
will also compute a percent error block. Note that most pixels
have changed by 1-5% and none more than 10%. There are
even some that have been reconstructed exactly.
E =
[ -0.5   0.7   0.0  -8.6  -6.0  -9.1  -5.2   0.0
   1.6  -9.3  -2.6   0.8   8.7   3.8   0.0  -5.8
   3.0   0.5  -2.0   2.1  -2.0  -6.2   6.0   0.0
  -6.6  -1.8   2.7  -0.9  -3.9  -6.7   4.7   4.9
  -5.8  -2.5   1.2   1.7   0.5   2.0   4.7   1.4
   4.8   3.1  -3.7   0.0   3.0   2.9   2.6  -7.6
   5.2   2.3  -4.1  -2.5   2.6   1.2  -3.1  -4.2
  -3.5  -2.9  -1.8  -1.9   0.0  -0.6  -3.6   4.8 ] (23)
IV. CONCLUSION
This examination of the JPEG compression algorithm, in particular the DCT operation, has demonstrated both the qualitative concepts behind the technique as well as the quantitative processes that are used to apply them.
The important results that can be drawn from this work include an understanding of the scope of the technique. Performing a DCT on blocks in an image gives easily compressible data because of the content of most images, not because of an inherent constraint of the DCT. The other important result is the framing of the DCT in terms of matrix operations. It is the first step in developing computational methods for computing the transform quickly.
Study that could logically follow from this work would be: 1) an exploration of Huffman coding in the context of probability and information theory; 2) a review of the other modes of operation of the JPEG algorithm (Progressive, Hierarchical); and 3) applications of the DCT or similar transforms to the compression and manipulation of other kinds of data (like audio).
V. ACKNOWLEDGMENT
The author wishes to recognize and thank Dr. Anne
Dougherty for her thoughtfulness and high standards.
REFERENCES
[1] G. K. Wallace, "The JPEG still picture compression standard," Communications of the ACM, Apr. 1991.
[2] Recommendation T.81, International Telecommunication Union (ITU) Std., Sept. 1992, Joint Photographic Experts Group (JPEG). [Online]. Available: http://www.w3.org/Graphics/JPEG/itu-t81.pdf
[3] C. L. Phillips, J. M. Parr, and E. A. Riskin, Signals, Systems, and
Transforms, 3rd ed. Upper Saddle River, NJ: Prentice Hall, 2003.
[4] P. J. Olver and C. Shakiban, Applied Linear Algebra. Upper Saddle
River, NJ: Prentice Hall, 2006.
APPENDIX I
MATLAB CODE
% FILE: jpeg.m
% DATE: 2005-12-01
% DESCRIPTION: Example code to demonstrate
%   the JPEG compression algorithm

% Block of pixel values
P0 = [188 145  88  58  67 110 134 134;
      187 193 152 125 115 130 137 139;
      166 184 201 194 198 195 151 139;
      152 168 188 214 229 225 172 143;
      156 159 165 181 201 199 169 144;
      168 163 164 158 167 174 156 145;
      174 171 169 158 155 163 160 144;
      170 172 167 161 159 167 169 147];

% Quantization table
quant = [16 11 10 16 24 40 51 61;
         12 12 14 19 26 58 60 55;
         14 13 16 24 40 57 69 56;
         14 17 22 29 51 87 80 62;
         18 22 37 56 68 109 103 77;
         24 35 55 64 81 104 113 92;
         49 64 78 87 103 121 120 101;
         72 92 95 98 112 100 103 99];

% Perform level shift
P = P0 - 128;

% Construct elementary row operation matrix
% to normalize the DC basis vector
S = eye(8)/2;
S(1,1) = 1/2/sqrt(2);

% DCT basis vectors (rows of D)
D = zeros(8,8);
for t = 0:7
    for w = 0:7
        D(w+1,t+1) = cos((2*t+1)*w*pi/16);
    end
end
D = S*D;

% Perform forward DCT: F = D*P*D'
F = D*P*D';

% Perform quantization of coefficients
Q = round(F./quant);

% Dequantize
F1 = quant.*Q;

% Perform IDCT: P = D'*F*D
P1 = D'*F1*D;

% Level shift up
P1 = P1 + 128;

% Integer round
P1 = round(P1);

% Compute percent error
% to one decimal place
E = round((P1 - P0)./P0*1000)/10;