IT8_02_Source_Coding
Łukasz Dębowski
[email protected]
Definition (injection)
Function f is called an injection if x ≠ y implies f(x) ≠ f(y).
Definition (code)
Any injection B : X → Y∗ will be called a code.
We will consider mostly binary codes, i.e., codes for which Y = {0, 1}. On
the other hand, the alphabet X may consist of letters, digits, or other symbols.
Example of a code
Example
An example of a code:
symbol x: code word B(x):
a 0
b 1
c 10
d 11
Definition (uniquely decodable code)
Code B : X → Y∗ is called uniquely decodable if its extension B∗ : X∗ → Y∗,
defined as B∗(x1, ..., xn) := B(x1)...B(xn), is also an injection.
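For instance, the example code above is an injection, yet it is not uniquely decodable, because its extension B∗ is not an injection. A minimal sketch in Python (the dictionary form of the code is ours, not the slides'):

B = {"a": "0", "b": "1", "c": "10", "d": "11"}

def extension(symbols):
    # B*(x1, ..., xn) = B(x1)...B(xn): concatenate the code words.
    return "".join(B[x] for x in symbols)

print(extension("ba"))  # "10"
print(extension("c"))   # "10" -- two preimages, so B* is not an injection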
Examples of codes
Comma-separated codes
Theorem
Each comma-separated code is uniquely decodable.
Proof
For a comma-separated code B, let us decompose B(x) = φ(x)c, where the
comma symbol c does not occur in φ(x). We first observe that
B(x1)...B(xn) = B(y1)...B(ym) holds only if n = m (the same number of c's
on both sides of the equality) and φ(xi) = φ(yi) for i = 1, ..., n. Next, we
observe that function φ is a code. Hence string B(x1)...B(xn) may only be
the image of (x1, ..., xn) under the mapping B∗. This means that code B is
uniquely decodable.
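The proof translates into a simple decoding procedure. A sketch in Python, where we use an ordinary comma character as the comma symbol c (an assumption of ours):

phi = {"a": "0", "b": "1", "c": "10", "d": "11"}
inverse_phi = {w: x for x, w in phi.items()}  # phi is a code, so it inverts

def encode(symbols):
    # B(x) = phi(x) followed by the comma.
    return "".join(phi[x] + "," for x in symbols)

def decode(string):
    # The chunks between commas are exactly phi(x1), ..., phi(xn).
    return [inverse_phi[chunk] for chunk in string.split(",")[:-1]]

print(encode("cab"))          # "10,0,1,"
print(decode(encode("cab")))  # ['c', 'a', 'b']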
Fixed-length codes
Theorem
Each fixed-length code is uniquely decodable.
Proof
Consider a fixed-length code B. We observe that
B(x1)...B(xn) = B(y1)...B(ym) holds only if n = m (the same length of
strings on both sides of the equality) and B(xi) = B(yi) for i = 1, ..., n.
Because B is an injection, string B(x1)...B(xn) may only be the image of
(x1, ..., xn) under the mapping B∗. Hence, code B is uniquely decodable.
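Again the proof is constructive. A decoding sketch in Python (the concrete code words are assumed for illustration):

B = {"a": "00", "b": "01", "c": "10", "d": "11"}
inverse = {w: x for x, w in B.items()}
L = 2  # the common length of all code words

def decode(string):
    # Cut the string into consecutive blocks of length L and invert B.
    return [inverse[string[i:i + L]] for i in range(0, len(string), L)]

print(decode("100001"))  # ['c', 'a', 'b']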
Example
Consider the following distribution and a comma-separated code, where C denotes the comma symbol:
symbol x: P(X = x): code word B(x):
a 1/2 0C
b 1/6 1C
c 1/6 10C
d 1/6 11C
We have
$$\mathrm{E}\,|B(X)| = 2 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{1}{6} + 3 \cdot \tfrac{1}{6} + 3 \cdot \tfrac{1}{6} = 2\tfrac{1}{3}.$$
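This arithmetic is easy to check directly. A sketch that also computes the entropy H(X), anticipating the bound proved below:

from math import log2

p = {"a": 1/2, "b": 1/6, "c": 1/6, "d": 1/6}
length = {"a": 2, "b": 2, "c": 3, "d": 3}  # lengths of 0C, 1C, 10C, 11C

expected = sum(p[x] * length[x] for x in p)
entropy = -sum(q * log2(q) for q in p.values())
print(expected)  # 2.333... = 2 1/3
print(entropy)   # about 1.79, so indeed E|B(X)| >= H(X)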
We are interested in codes that minimize the expected code length for
a given probability distribution. In this regard, both comma-separated codes
and fixed-length codes have advantages and drawbacks. If certain symbols
appear more often than others, then comma-separated codes allow us to
encode them as shorter strings and thus to save space. On the other hand,
if all symbols are equiprobable, then a fixed-length code without a comma
occupies less space than the same code with a comma.
Kraft inequality
Theorem (Kraft inequality)
For any uniquely decodable code B : X → {0, 1}∗, the code word lengths satisfy
$$\sum_{x \in X} 2^{-|B(x)|} \le 1.$$
Proof
Consider an arbitrary L. Let a(m, n, L) denote the number of sequences
(x1, ..., xn) such that |B(xi)| ≤ L and the length of B∗(x1, ..., xn) equals m.
We have
$$\Bigg( \sum_{x : |B(x)| \le L} 2^{-|B(x)|} \Bigg)^{\!n} = \sum_{m=1}^{nL} a(m, n, L) \cdot 2^{-m}.$$
Since B∗ is an injection, we have a(m, n, L) ≤ 2^m, so the right-hand side is
at most nL. Hence the sum on the left is at most (nL)^{1/n}, which tends to 1
as n → ∞. Letting L → ∞ yields the claim.
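The Kraft sum is easy to compute for the codes seen so far (a sketch):

def kraft_sum(code):
    return sum(2 ** -len(word) for word in code.values())

print(kraft_sum({"a": "00", "b": "01", "c": "10", "d": "11"}))  # 1.0
print(kraft_sum({"a": "0", "b": "1", "c": "10", "d": "11"}))    # 1.5 > 1,
# consistent with the fact that this code is not uniquely decodable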
Theorem
For any uniquely decodable code B : X → {0, 1}∗, the expected length of the
code satisfies the inequality
$$\mathrm{E}\,|B(X)| \ge H(X).$$
Proof
Introduce probability distributions p(x) = P(X = x) and
$$r(x) = \frac{2^{-|B(x)|}}{\sum_{y \in X} 2^{-|B(y)|}}.$$
We have
$$\mathrm{E}\,|B(X)| - H(X) = \sum_{x : p(x) > 0} p(x) \log \frac{p(x)}{r(x)} - \log \Bigg( \sum_{x \in X} 2^{-|B(x)|} \Bigg) = D(p \,\|\, r) - \log \Bigg( \sum_{x \in X} 2^{-|B(x)|} \Bigg).$$
Both terms are nonnegative: D(p||r) ≥ 0, while by the Kraft inequality
∑_{x∈X} 2^{−|B(x)|} ≤ 1, so its logarithm is at most 0.
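The decomposition can be verified numerically for the comma-separated example above (a sketch; the lengths 2, 2, 3, 3 are those of the code words 0C, 1C, 10C, 11C):

from math import log2

p = {"a": 1/2, "b": 1/6, "c": 1/6, "d": 1/6}
length = {"a": 2, "b": 2, "c": 3, "d": 3}

kraft = sum(2 ** -length[x] for x in p)
r = {x: 2 ** -length[x] / kraft for x in p}

# E|B(X)| - H(X) on the left, D(p||r) - log(Kraft sum) on the right.
lhs = sum(p[x] * length[x] for x in p) + sum(p[x] * log2(p[x]) for x in p)
rhs = sum(p[x] * log2(p[x] / r[x]) for x in p) - log2(kraft)
print(lhs, rhs)  # both about 0.54, and both terms of rhs are nonnegative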
Theorem
Any prefix-free or suffix-free code is uniquely decodable.
Proof
Without loss of generality we shall restrict ourselves to prefix-free codes. The
proof for suffix-free codes is mirror-like. Let B be a prefix-free code and assume
that B(x1)...B(xn) = B(y1)...B(ym). By the prefix-free property, the initial
segments B(x1) and B(y1) must match exactly, so x1 = y1. The analogous
argument applied by induction yields xi = yi for i = 2, ..., n and n = m. Thus
code B is uniquely decodable.
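The proof also yields a left-to-right decoder: emit a symbol as soon as the bits read so far form a code word. A sketch with an assumed prefix-free code:

B = {"a": "0", "b": "10", "c": "110", "d": "111"}
inverse = {w: x for x, w in B.items()}

def decode(string):
    out, word = [], ""
    for bit in string:
        word += bit
        if word in inverse:      # unique match by the prefix-free property
            out.append(inverse[word])
            word = ""
    return out

print(decode("101100"))  # ['b', 'c', 'a']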
Complete codes
Definition (complete code)
A code B is called complete if the Kraft inequality holds for it with equality,
i.e., ∑_{x∈X} 2^{−|B(x)|} = 1.
Example
A code which is prefix-free, suffix-free, and complete.
symbol x: code word B(x):
a 00
b 01
c 10
d 11
Example
Another code which is prefix-free, suffix-free, and complete.
symbol x: code word B(x):
a 01
b 000
c 100
d 110
e 111
f 0010
g 0011
h 1010
i 1011
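Both examples indeed have Kraft sum exactly 1 (a sketch):

code1 = {"a": "00", "b": "01", "c": "10", "d": "11"}
code2 = {"a": "01", "b": "000", "c": "100", "d": "110", "e": "111",
         "f": "0010", "g": "0011", "h": "1010", "i": "1011"}

for code in (code1, code2):
    print(sum(2 ** -len(w) for w in code.values()))  # 1.0 in both cases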
Shannon-Fano code
Definition (Shannon-Fano code)
A Shannon-Fano code for the distribution of X is a prefix-free code B with
code word lengths |B(x)| = ⌈− log P(X = x)⌉.
Theorem
Shannon-Fano codes exist for any distribution and satisfy
$$H(X) \le \mathrm{E}\,|B(X)| < H(X) + 1.$$
Proof
We have
$$\sum_{x \in X} 2^{-\lceil - \log P(X=x) \rceil} \le \sum_{x \in X} 2^{\log P(X=x)} \le 1,$$
so by the converse of the Kraft inequality a prefix-free code with lengths
⌈− log P(X = x)⌉ exists. Moreover, |B(x)| < − log P(X = x) + 1, which
yields E |B(X)| < H(X) + 1.
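A sketch of the construction of the code word lengths, using the distribution from the next example:

from math import ceil, log2

p = {"a": 1 - 2 ** -5, "b": 2 ** -6, "c": 2 ** -6}

lengths = {x: ceil(-log2(q)) for x, q in p.items()}
print(lengths)                                 # {'a': 1, 'b': 6, 'c': 6}
print(sum(2 ** -l for l in lengths.values()))  # about 0.53 <= 1, so a
# prefix-free code with these lengths exists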
Example
Consider the following distribution and codes:
symbol x: P(X = x): code word B(x): code word C(x)
a 1 − 2^{−5} 0 0
b 2^{−6} 100000 10
c 2^{−6} 100001 11
Code C is also prefix-free but achieves a smaller expected length than the
Shannon-Fano code B, so Shannon-Fano codes need not be optimal. A code
that minimizes the expected code length is known under the name of the
Huffman code.
Definition (path)
We say that a binary tree contains a path w ∈ {0, 1}∗ if there is a sequence of
links starting from the root node and labeled with the consecutive symbols of
w. We say that the path ends with symbol a ∈ X if the last link of the
sequence leads to a node labeled with symbol a.
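These notions can be sketched with a code tree represented as nested dictionaries (a representation we assume for illustration):

tree = {"0": {"0": "a", "1": "b"}, "1": {"0": "c", "1": "d"}}

def contains(node, path):
    # Follow the links labeled with the consecutive symbols of the path.
    for bit in path:
        if not isinstance(node, dict) or bit not in node:
            return False
        node = node[bit]
    return True

print(contains(tree, "01"))   # True; this path ends with symbol b
print(contains(tree, "001"))  # False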
Code trees
Example
[Figure: example code trees with links labeled 0 and 1 and leaves labeled with the symbols a, b, c, d, e.]
Example
[Figure: a code tree with probabilities. The root (probability 1) splits into two subtrees of probability 0.5 each; the leaves are a, 0.2 and b, 0.3 on the left and c, 0.1 and d, 0.4 on the right, i.e., at the code words 00, 01, 10, 11.]
Huffman code
Example
[Figure: the Huffman code tree for the probabilities a: 0.1, b: 0.3, c: 0.2, d: 0.4. The root splits into a node of probability 0.6 (link 0) and the leaf d, 0.4 (link 1); the 0.6 node splits into a node of probability 0.3 (link 0) and the leaf b, 0.3 (link 1); the 0.3 node splits into the leaves a, 0.1 (link 0) and c, 0.2 (link 1). The resulting code words are a = 000, c = 001, b = 01, d = 1.]
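Huffman's construction can be sketched with a priority queue. The code words returned below may differ from the figure by swapping the 0/1 labels at some nodes, but the lengths, and hence the expected length 1.9, agree:

import heapq

def huffman(p):
    # Heap entries: (probability, tie-breaker, {symbol: code word so far}).
    heap = [(q, i, {x: ""}) for i, (x, q) in enumerate(p.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Merge the two least probable nodes into one.
        q0, _, left = heapq.heappop(heap)
        q1, _, right = heapq.heappop(heap)
        merged = {x: "0" + w for x, w in left.items()}
        merged.update({x: "1" + w for x, w in right.items()})
        heapq.heappush(heap, (q0 + q1, count, merged))
        count += 1
    return heap[0][2]

p = {"a": 0.1, "b": 0.3, "c": 0.2, "d": 0.4}
code = huffman(p)
print(code)  # {'d': '0', 'b': '10', 'a': '110', 'c': '111'}
print(sum(q * len(code[x]) for x, q in p.items()))  # 1.9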
Theorem
For any probability distribution, the Huffman code is optimal.
Lemma
Consider the two symbols x and y with the smallest probabilities. Then there is
an optimal code tree C such that these two symbols are sibling leaves in the
lowest level of C’s code tree.
Proof
Every internal node in a code tree for an optimal code must have two children.
Let B be an optimal code and let symbols a and b be two siblings at the
maximal depth of B's code tree. Assume without loss of generality that
p(x) ≤ p(y) and p(a) ≤ p(b). We have p(x) ≤ p(a), p(y) ≤ p(b),
|B(a)| ≥ |B(x)|, and |B(b)| ≥ |B(y)|. Now let C's code tree differ from
B's code tree by switching a ↔ x and b ↔ y. Then we obtain
$$\mathrm{E}\,|C(X)| - \mathrm{E}\,|B(X)| = (p(a) - p(x))(|B(x)| - |B(a)|) + (p(b) - p(y))(|B(y)| - |B(b)|) \le 0.$$
Hence code C is optimal as well, and in C's code tree the symbols x and y
are sibling leaves at the lowest level.
On the other hand, let B′ be the Huffman code for X′ and let B be the code
constructed from B′ by adding leaves with symbols x and y to the node with
symbol z. By construction, code B is the Huffman code for X. We have
$$\mathrm{E}\,|B(X)| = \mathrm{E}\,|B'(X')| + p(x) + p(y),$$
so the optimality of B′ for X′ carries over to B for X.