EE4740 Lecture 4 Slides
Last lecture:
● Non-singular, uniquely decodable, and prefix codes
● Kraft inequality and bounds on the optimal code length
Today:
● Constructions of prefix codes: Shannon, Huffman, Arithmetic
● Optimality of Huffman code
Reference: Cover and Thomas, Chapters 5 and 13 (particularly, the parts
presented on the slides)
2 / 38
Recap
● The expected length L(C) of a source code C(x) for a random variable X with
probability mass function p(x) is given by
L(C) = ∑_{x∈X} p(x) l(x)
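As a quick illustration (my own, not from the slides), a minimal Python sketch that evaluates this sum for a hypothetical code:

p = {"A": 0.5, "B": 0.25, "C": 0.25}   # hypothetical source distribution
l = {"A": 1, "B": 2, "C": 2}           # lengths of the corresponding codewords
L = sum(p[x] * l[x] for x in p)        # expected code length L(C)
print(L)                               # 1.5 bits per symbol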
3 / 38
Recap: Types of Codes
4 / 38
Recap: Optimal Prefix Code
Kraft inequality: the codeword lengths l_1, ..., l_m of any D-ary prefix code satisfy ∑_i D^{-l_i} ≤ 1.
Conversely, given a set of codeword lengths that satisfy this inequality, there exists a
prefix code with these word lengths.
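A minimal sketch of the converse direction (my own illustration, for a binary code alphabet): given lengths satisfying the Kraft inequality, assign codewords greedily from the shortest length up; each codeword is the first l bits of the binary expansion of the running sum of 2^{-l}.

def prefix_code_from_lengths(lengths):
    # Assumes the binary Kraft inequality sum(2^-l) <= 1 holds.
    assert sum(2.0 ** -l for l in lengths) <= 1.0 + 1e-12
    codewords, f = [], 0.0
    for l in sorted(lengths):
        # First l bits of the binary expansion of the running sum f.
        codewords.append(format(int(f * 2 ** l), "0{}b".format(l)))
        f += 2.0 ** -l
    return codewords

print(prefix_code_from_lengths([2, 2, 2, 2]))  # ['00', '01', '10', '11']
print(prefix_code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']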
5 / 38
Optimal Code
H_D(X) ≤ L < H_D(X) + 1
● The Kraft–McMillan theorem says that any uniquely decodable code satisfies the Kraft
inequality, so there is no better choice than a prefix code
Today:
● Constructions of prefix codes: Shannon, Huffman, Arithmetic
● Optimality of Huffman code
6 / 38
Representing Prefix-free Code
Symbol Codeword
A      0
B      10
C      110
D      1110
E      1111
[Figure: binary code tree — the root branches on 0/1 and each codeword sits at a leaf]
● Interval representation: Codewords correspond to intervals, which are
non-intersecting subsets of [0, 1]
● Given codeword c = (c_1, ..., c_l) ∈ {0, 1}^l, the interval is I_c = [0.c_1c_2...c_l, 0.c_1c_2...c_l + 2^{-l})
7 / 38
Interval Representation
Given codeword c = (c_1, ..., c_l) ∈ {0, 1}^l, the interval is
I_c = [0.c_1c_2...c_l, 0.c_1c_2...c_l + 2^{-l}), where 0.c_1c_2...c_l = ∑_{i=1}^{l} c_i 2^{-i}
The interval I_c consists of all b ∈ [0, 1] whose binary expansion starts with c. Since no
codeword is a prefix of another, the intervals are non-intersecting.
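A small sketch (my own, not from the slides) that computes I_c for each codeword of the example code and shows the intervals are disjoint:

def codeword_interval(c):
    # c is a bit string, e.g. "110"; returns [low, high) with high - low = 2^-len(c)
    low = sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(c))
    return low, low + 2.0 ** -len(c)

for c in ["0", "10", "110", "1110", "1111"]:
    print(c, codeword_interval(c))
# "0" -> [0.0, 0.5), "10" -> [0.5, 0.75), "110" -> [0.75, 0.875), ...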
8 / 38
Construction 1: Shannon Code
Symbol pi Fi li
A 4/10 0 2
B 3/10 4/10 2
C 2/10 7/10 3
D 1/10 9/10 4
● The code for F_i differs from all succeeding ones in one or more of its l_i places,
since all the succeeding F_k are at least 2^{-l_i} larger
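A minimal Python sketch of this construction (my own illustration, assuming binary codewords and symbols already sorted by decreasing probability): take l_i = ⌈− log p_i⌉ and use the first l_i bits of the binary expansion of the cumulative probability F_i.

import math

def shannon_code(probs):
    # probs: dict symbol -> probability, assumed sorted by decreasing probability
    code, F = {}, 0.0
    for sym, p in probs.items():
        l = math.ceil(-math.log2(p))
        # Codeword: first l bits of the binary expansion of F.
        code[sym] = format(int(F * 2 ** l), "0{}b".format(l))
        F += p
    return code

print(shannon_code({"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}))
# {'A': '00', 'B': '01', 'C': '101', 'D': '1110'}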
9 / 38
Decimal Fractions to Binary: Algorithm
1 Multiply the fraction by 2, noting down the integer and fractional parts of the product
2 Keep multiplying each successive fractional part by 2 until the fractional part becomes
zero (or until enough bits have been produced)
3 The integer parts produced in each step, read in order, are the bits of the binary
expansion (see the sketch below)
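A short sketch of this procedure (my own illustration), with a cap on the number of bits in case the expansion does not terminate:

def fraction_to_binary(f, max_bits=16):
    # Repeatedly multiply by 2; the integer parts are the bits after the binary point.
    bits = ""
    while f > 0 and len(bits) < max_bits:
        f *= 2
        bit, f = int(f), f - int(f)
        bits += str(bit)
    return bits

print(fraction_to_binary(0.75))     # '11'
print(fraction_to_binary(0.4, 8))   # '01100110' (0.4 has a non-terminating binary expansion)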
10 / 38
Shannon Code
Symbol pi    Fi    li Code
AA     4/10  0     2  00
AB     3/10  4/10  2  01
BA     2/10  7/10  3  101
BB     1/10  9/10  4  1110
H(X) = 1.846,  L_2 = 2.4
11 / 38
Shannon Code: Tree Representation
● Compute the codeword lengths l_i = ⌈− log p_i⌉, which satisfy
− log p_i ≤ l_i < − log p_i + 1
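Averaging this bound over the source distribution gives the expected-length guarantee quoted earlier (a standard one-line derivation, added here for completeness):

H(X) = ∑_i p_i (− log p_i) ≤ ∑_i p_i l_i = L < ∑_i p_i (− log p_i + 1) = H(X) + 1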
12 / 38
Shannon Code: Some Observations
● Shannon code may be much worse than the optimal code. For example, when
p_1 = 0.99 and p_2 = 0.01, it assigns the second symbol ⌈− log p_2⌉ = 7 bits, while the
optimal code needs only 1 bit per symbol
13 / 38
Example: Shannon Code
Symbol pi    Fi    ⌈− log p(x)⌉ Code
A      0.55  0     1            0
B      0.25  0.55  2            10
C      0.1   0.8   4            1100
D      0.1   0.9   4            1110
[Figure: code tree for this Shannon code, with codewords 0, 10, 1100, 1110 at the leaves]
What is the optimal (shortest expected length) prefix code? Huffman code!
14 / 38
Huffman Code
● An optimal (shortest expected length) prefix code for a given distribution can be
constructed by a simple algorithm discovered by Huffman
● Huffman coding arranges the symbols in order of decreasing probability and joins the
two least probable symbols together
● The merged pair becomes a new symbol with the combined probability; the list is
reordered and the two least probable symbols are joined again. Repeat this process
until only two symbols are left
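A compact sketch of the construction (my own illustration, using Python's heapq to repeatedly pull out the two least probable entries):

import heapq

def huffman_code(probs):
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # least probable group
        p2, _, c2 = heapq.heappop(heap)   # second least probable group
        # Prepend 0 to one group and 1 to the other, then merge them.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        count += 1
        heapq.heappush(heap, (p1 + p2, count, merged))
    return heap[0][2]

print(huffman_code({"A": 0.55, "B": 0.25, "C": 0.1, "D": 0.1}))
# One optimal assignment: lengths 1, 2, 3, 3 -> expected length 1.65 bits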
15 / 38
Example: Huffman Code
16 / 38
Some Examples
17 / 38
Huffman Code on English Text
Image courtesy: Twitter, Simon Pampena and Stanford data compression course notes
18 / 38
Practical prefix-free coding
● Huffman codes are actually used in practice due to their optimality and easy
construction
● Examples:
● http/2 header compression
● ZLIB, DEFLATE, GZIP, PNG compression
● JPEG compression
19 / 38
Non-unique Optimal Codes
(2, 2, 2, 2) OR (1, 2, 3, 3)
● In the second code, the third symbol has length 3, which is greater than
⌈−log 1/4⌉ = 2
20 / 38
Optimality of Codes
Conditions satisfied by any optimal prefix code (referred to below as Conditions 1 and 2):
(1) if p_k > p_j, then l_k ≤ l_j; (2) the two longest codewords have the same length.
21 / 38
Proof of Condition 1
● Let's assume that our code C is optimal, but for some p_k > p_j it has l_k > l_j.
● Now, let's consider another prefix-free code Ĉ where we exchange the codewords
corresponding to k and j. Then
L(Ĉ) − L(C) = ∑_{i=1}^{m} p_i (l̂_i − l_i) = p_k (l_j − l_k) + p_j (l_k − l_j) = (p_k − p_j)(l_j − l_k) < 0,
which contradicts the optimality of C.
22 / 38
Proof of Condition 2
● If the two longest codewords are not of the same length, one can delete the last
bit of the longer one, preserving the prefix property and achieving lower expected
codeword length.
● Hence, the two longest codewords must have the same length.
● For example, suppose 0110101 is the unique longest codeword. Since no other
codeword has length 7, dropping its last bit still gives a prefix code, and we get a
shorter average code length!
23 / 38
Huffman Code and Optimality Conditions
● Symbols with higher probability are merged later, so their code-lengths are smaller,
satisfying Condition 1.
● We always choose the two symbols with the smallest probability and combine
them, so Condition 2 is also satisfied.
We have verified that the Huffman code satisfies these necessary conditions, but this
alone does not prove optimality. See the C&T textbook for a rigorous proof.
24 / 38
Huffman Code: Some Observations
● The main property is that after merging the two smallest-probability symbols of the
distribution (p_1, p_2, ..., p_m), the remaining tree is optimal for the reduced
distribution (p_1, p_2, ..., p_{m−1} + p_m) obtained after merging
● Huffman coding is a “greedy” algorithm in that it coalesces the two least likely
symbols at each stage.
● A typical prefix-free decoder works by walking down the prefix-free tree bit by bit
until a leaf node is reached; the leaf gives the decoded symbol (see the sketch below)
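A minimal decoding sketch (my own illustration) that accumulates bits until they match a codeword; because the code is prefix-free, the first match is always the right one:

def decode_prefix_code(bits, code):
    # code: dict symbol -> codeword; invert it, since codewords are distinct
    inv = {cw: sym for sym, cw in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b               # walk one edge of the (implicit) code tree
        if buf in inv:         # reached a leaf: emit the symbol, restart at the root
            out.append(inv[buf])
            buf = ""
    return "".join(out)

code = {"A": "0", "B": "10", "C": "110", "D": "111"}
print(decode_prefix_code("010110111", code))  # ABCD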
25 / 38
Issues with Symbol Codes
26 / 38
A New Optimal Code
● The issue is that, as we increase the block size n, the codebook size increases
exponentially as D^n
● The larger the codebook size the more complicated the encoding/decoding
becomes, the more memory we need, the higher the latency etc.
● The idea doesn't hold up in practice. Arithmetic coding addresses this
issue.
27 / 38
Arithmetic Coding
● Arithmetic coding achieves almost the same compression as the Huffman code applied
to the source extension, but it is practical.
28 / 38
An Example: Encoding
29 / 38
Example: Finding The Interval
30 / 38
Example: Finding The Interval
Find an interval (or range) [L, H) ⊂ [0, 1) corresponding to the entire sequence.
We start with [L, H) = [0, 1), and then subdivide the interval as we see each symbol,
depending on its probability (see the sketch below).
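A sketch of the interval-narrowing step (my own illustration; the source distribution below is a placeholder, and practical coders use scaled integer arithmetic instead of floats):

def find_interval(sequence, probs):
    # probs: dict symbol -> probability; the cumulative sums partition [0, 1)
    cum, c = {}, 0.0
    for s, p in probs.items():
        cum[s] = c
        c += p
    L, H = 0.0, 1.0
    for s in sequence:
        width = H - L
        H = L + width * (cum[s] + probs[s])   # upper end of the symbol's sub-interval
        L = L + width * cum[s]                # lower end of the symbol's sub-interval
    return L, H

probs = {"A": 0.2, "B": 0.5, "C": 0.2, "D": 0.1}   # placeholder distribution
print(find_interval("BAC", probs))
# interval of width p(B) * p(A) * p(C) = 0.02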
31 / 38
Decoding
● 0.1011111000100 → v1 = 0.74267578125
● We can start decoding from the first interval I0 (x n ) = [0, 1) by comparing with
the cumulative distribution
v_2 = (v_1 − 0.7) / 0.2 = 0.21337890625 ∈ [0.2, 0.7) ⟹ B
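A sketch of this compare-and-rescale loop (my own illustration; the distribution below is a placeholder chosen so that B occupies [0.2, 0.7) as on the slide):

def arithmetic_decode(v, probs, n):
    # v: a number inside the sequence's interval; n: number of symbols to decode
    out = []
    for _ in range(n):
        c = 0.0
        for s, p in probs.items():       # find the sub-interval [c, c + p) containing v
            if c <= v < c + p:
                out.append(s)
                v = (v - c) / p          # rescale v back to [0, 1) and repeat
                break
            c += p
    return "".join(out)

probs = {"A": 0.2, "B": 0.5, "C": 0.2, "D": 0.1}   # placeholder distribution
print(arithmetic_decode(0.74267578125, probs, 3))
# 'CBA' here: the first step gives C since v ∈ [0.7, 0.9), the second reproduces v_2 ∈ [0.2, 0.7) ⟹ B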
32 / 38
Natural Example
33 / 38
Out of Order Example
34 / 38
Unique Representation
● We can uniquely represent the interval [L, H) by picking the midpoint: (L + H)/2
● The midpoint could have a very long expansion, so we round it off after B bits
to get the codeword w
− log(H − L) ≤ B ≤ − log(H − L) + 1
I_n = 0.0002 ⟹ B = 13
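A sketch of this final step (my own illustration): take the midpoint of [L, H) and round it to B ≈ ⌈− log2(H − L)⌉ bits to obtain the codeword.

import math

def interval_to_codeword(L, H):
    B = math.ceil(-math.log2(H - L))    # number of bits, per the bound above
    mid = (L + H) / 2
    code_int = round(mid * 2 ** B)      # round the midpoint to B binary places
    return format(code_int, "0{}b".format(B))

print(interval_to_codeword(0.27, 0.29))  # '010010': 6 bits for the width-0.02 interval from the encoding sketch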
35 / 38
Optimality of Arithmetic Codes
● What is the size of the interval (H − L) for the input x^n?
H − L = ∏_{i=1}^{n} p_i = p(x^n)
● Expected codelength is therefore E[B] ≤ E[− log p(X^n)] + 1 = H(X^n) + 1, i.e., within
one bit of the entropy of the entire sequence
36 / 38
Huffman Vs Arithmetic Coding
37 / 38
Summary
● Shannon code
● Huffman code
● Optimality of Huffman code
● Arithmetic codes
38 / 38