

Information Theory and Statistics


Lecture 2: Source coding

Łukasz Dębowski
[email protected]

Ph. D. Programme 2013/2014

Project co-financed by the European Union within the framework of the European Social Fund

Injections and codes

Definition (injection)
Function f is called an injection if x ≠ y implies f(x) ≠ f(y).

In coding theory we consider injections that map elements of a countable set X
into strings over a countable set Y. The set of these strings is denoted as
Y⁺ = ⋃_{n=1}^∞ Yⁿ. Sometimes we also consider the set Y∗ = {λ} ∪ Y⁺, where λ is
the empty string. Sets X and Y are called alphabets.

Definition (code)
Any injection B : X → Y∗ will be called a code.

We will consider mostly binary codes, i.e., codes for which Y = {0, 1}. On
the other hand, the alphabet X may consist of letters, digits, or other symbols.
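
To make the definition concrete, a finite code can be represented as a dictionary from symbols to code words; the following minimal Python sketch (the names are illustrative, not from the lecture) checks the injection property for the example code on the next slide.

# A code maps symbols injectively to strings over Y = {0, 1}.
B = {"a": "0", "b": "1", "c": "10", "d": "11"}

def is_injection(code):
    """Check that no two symbols share a code word."""
    words = list(code.values())
    return len(words) == len(set(words))

assert is_injection(B)  # B is a valid code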


Example of a code

Example

An example of a code:
symbol x: code word B(x):
a 0
b 1
c 10
d 11


Uniquely decodable codes

The original purpose of coding is to transmit representations of strings
written with symbols from an alphabet X through a communication channel
which passes only strings written with symbols from a smaller alphabet Y.
A particularly good code should therefore allow us to reconstruct the coded
symbols from the concatenation of their code words.
Formally speaking, the following property is desired.

Definition (uniquely decodable code)


Code B : X → Y∗ is called uniquely decodable if the code extension

B∗ : X∗ ∋ (x1, ..., xn) ↦ B(x1)...B(xn) ∈ Y∗

is also an injection.


Examples of codes

Example (a code which is not uniquely decodable)

symbol x: code word B(x):


a 0
b 1
c 10
d 11

Example (a uniquely decodable code)

symbol x: code word B(x):


a 0c
b 1c
c 10c
d 11c
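
The first code above is not uniquely decodable because, for instance, the string 10 is both B(c) and B(b)B(a). A brute-force sketch in Python (illustrative only) enumerates all parses of a given string:

B = {"a": "0", "b": "1", "c": "10", "d": "11"}

def parses(code, s):
    """Return all symbol sequences whose concatenated code words equal s."""
    if s == "":
        return [[]]
    result = []
    for sym, w in code.items():
        if s.startswith(w):
            result += [[sym] + rest for rest in parses(code, s[len(w):])]
    return result

print(parses(B, "10"))  # [['b', 'a'], ['c']] -- two parses, hence not uniquely decodable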


Comma-separated codes

Definition (comma-separated code)


Let c ∉ Y. Code B : X → (Y ∪ {c})∗ is called comma-separated if for each
x ∈ X there exists a string w ∈ Y∗ such that B(x) = wc. Symbol c is called
the comma.

Theorem
Each comma-separated code is uniquely decodable.

Proof
For a comma-separated code B, let us decompose B(x) = φ(x)c. We first
observe that B(x1)...B(xn) = B(y1)...B(ym) holds only if n = m (the same
number of c's on both sides of the equality) and φ(xi) = φ(yi) for i = 1, ..., n.
Next, we observe that the function φ is a code. Hence the string B(x1)...B(xn)
can only be the image of (x1, ..., xn) under the mapping B∗. This means that code
B is uniquely decodable.
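
The proof mirrors how such a code is decoded in practice: split the stream on the comma and invert the comma-free parts. A sketch using the example code above (illustrative):

B = {"a": "0c", "b": "1c", "c": "10c", "d": "11c"}
phi_inv = {w[:-1]: sym for sym, w in B.items()}  # strip the comma and invert phi

def decode(stream):
    """Split on the comma 'c' and map each part back to its symbol."""
    parts = stream.split("c")[:-1]  # the trailing comma yields an empty last part
    return [phi_inv[p] for p in parts]

print(decode("0c10c11c"))  # ['a', 'c', 'd']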


Fixed-length codes

Definition (fixed-length code)


Let n be a fixed natural number. Code B : X → Yⁿ is called a fixed-length
code.

Example

An example of a fixed-length code:


symbol x: code word B(x):
a 00
b 01
c 10
d 11


Fixed-length codes (continued)

Theorem
Each fixed-length code is uniquely decodable.

Proof
Consider a fixed-length code B. We observe that
B(x1)...B(xn) = B(y1)...B(ym) holds only if n = m (the same length of
strings on both sides of the equality) and B(xi) = B(yi) for i = 1, ..., n. Because
B is an injection, the string B(x1)...B(xn) can only be the image of (x1, ..., xn)
under the mapping B∗. Hence, code B is uniquely decodable.


Expected code length


Let |w| denote the length of a string w ∈ Y∗, measured in the number of
symbols. For a random variable X : Ω → X, we will be interested in the
expected code length

E |B(X)| = ∑_{x∈X} P(X = x) |B(x)| .

Example
Consider the following distribution and a code:
symbol x: P(X = x): code word B(x):
a 1/2 0C
b 1/6 1C
c 1/6 10C
d 1/6 11C
We have E |B(X)| = 2 · 1/2 + 2 · 1/6 + 3 · 1/6 + 3 · 1/6 = 7/3 = 2 1/3.
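
The computation can be checked directly; a minimal Python sketch (exact arithmetic via fractions):

from fractions import Fraction

p = {"a": Fraction(1, 2), "b": Fraction(1, 6), "c": Fraction(1, 6), "d": Fraction(1, 6)}
B = {"a": "0C", "b": "1C", "c": "10C", "d": "11C"}

E_len = sum(p[x] * len(B[x]) for x in p)
print(E_len)  # 7/3, i.e., 2 1/3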


What is the shortest code?

We are interested in codes that minimize the expected code length for
a given probability distribution.
In this regard, both comma-separated codes and fixed-length codes have
advantages and drawbacks.
If certain symbols appear more often than others, then comma-separated
codes allow us to encode them as shorter strings and thus save space.
On the other hand, if all symbols are equiprobable, then a fixed-length
code without a comma occupies less space than the same code with
a comma.


Kraft inequality

Theorem (Kraft inequality)

For any uniquely decodable code B : X → {0, 1}∗ we have


∑_{x∈X} 2^{−|B(x)|} ≤ 1.
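
The inequality yields a quick necessary condition: if the Kraft sum exceeds 1, the code cannot be uniquely decodable. An illustrative sketch:

def kraft_sum(code):
    """Sum of 2^(-|B(x)|) over all binary code words."""
    return sum(2.0 ** -len(w) for w in code.values())

print(kraft_sum({"a": "0", "b": "1", "c": "10", "d": "11"}))    # 1.5 > 1: not uniquely decodable
print(kraft_sum({"a": "00", "b": "01", "c": "10", "d": "11"}))  # 1.0: consistent with the inequality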


Kraft inequality (proof)

Proof
Consider an arbitrary L. Let a(m, n, L) denote the number of sequences
(x1, ..., xn) such that |B(xi)| ≤ L and the length of B∗(x1, ..., xn) equals m.
We have

(∑_{x: |B(x)|≤L} 2^{−|B(x)|})^n = ∑_{m=1}^{nL} a(m, n, L) · 2^{−m}.

Because the code is uniquely decodable, we have a(m, n, L) ≤ 2^m, so the
right-hand side is at most ∑_{m=1}^{nL} 2^m · 2^{−m} = nL. Therefore

∑_{x: |B(x)|≤L} 2^{−|B(x)|} ≤ (nL)^{1/n} → 1 as n → ∞.

Letting L → ∞, we obtain the Kraft inequality.


Source coding inequality

Theorem (source coding inequality)

For any uniquely decodable code B : X → {0, 1}∗, the expected length of the
code satisfies the inequality

E |B(X)| ≥ H(X),

where H(X) is the entropy of X.
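
A numerical illustration (the prefix-free code below is an assumed example, not from the slides):

import math

p = {"a": 1/2, "b": 1/6, "c": 1/6, "d": 1/6}
B = {"a": "0", "b": "10", "c": "110", "d": "111"}  # a prefix-free code

H = -sum(q * math.log2(q) for q in p.values())
E_len = sum(p[x] * len(B[x]) for x in p)
print(H, E_len)  # approx. 1.79 <= approx. 1.83, as the theorem predicts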


Source coding inequality (proof)

Proof
Introduce probability distributions p(x) = P(X = x) and

r(x) = 2^{−|B(x)|} / ∑_{y∈X} 2^{−|B(y)|}.

We have

E |B(X)| − H(X) = ∑_{x: p(x)>0} p(x) log (p(x)/r(x)) − log (∑_{x∈X} 2^{−|B(x)|})
               = D(p||r) − log (∑_{x∈X} 2^{−|B(x)|}).

This difference is nonnegative by the nonnegativity of the Kullback-Leibler
divergence and by the Kraft inequality.


Prefix-free and suffix-free codes

Definition (prefix-free code)


A code B is called prefix-free if no code word B(x) is a prefix of another code
word B(y), i.e., it is not true that B(y) = B(x)u for x ≠ y and u ∈ Y∗.

Definition (suffix-free code)


A code B is called suffix-free if no code word B(x) is a suffix of another code
word B(y), i.e., it is not true that B(y) = uB(x) for x ≠ y and u ∈ Y∗.

Example (a code which is prefix-free but not suffix-free)

symbol x: code word B(x):


a 10
b 0
c 11


Prefix-free and suffix-free codes (continued)

Theorem
Any prefix-free or suffix-free code is uniquely decodable.

Proof
Without loss of generality we shall restrict ourselves to prefix-free codes; the
proof for suffix-free codes is symmetric, reading strings from the end. Let B be
a prefix-free code and assume that B(x1)...B(xn) = B(y1)...B(ym). One of the
initial code words B(x1) and B(y1) is a prefix of the other, so by the prefix-free
property B(x1) = B(y1), and hence x1 = y1. The analogous argument applied by
induction yields xi = yi for i = 2, ..., n and n = m. Thus
code B is uniquely decodable.
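
The proof reflects how a prefix-free code is decoded in practice: scan the stream from left to right and emit a symbol as soon as the buffer matches a code word. A sketch with an assumed prefix-free code:

B = {"a": "0", "b": "10", "c": "110", "d": "111"}
inv = {w: sym for sym, w in B.items()}

def decode_prefix_free(stream):
    """Greedy left-to-right decoding; unambiguous because no code word extends another."""
    out, buf = [], ""
    for bit in stream:
        buf += bit
        if buf in inv:  # at most one code word can ever match
            out.append(inv[buf])
            buf = ""
    return out

print(decode_prefix_free("010110111"))  # ['a', 'b', 'c', 'd']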


A theorem converse to Kraft inequality

Theorem

If a function l : X → ℕ satisfies the inequality

∑_{x∈X} 2^{−l(x)} ≤ 1

then we may construct a prefix-free code B : X → {0, 1}∗ such that

|B(x)| = l(x).
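
One standard construction (a sketch of the canonical-code idea; the lecture may intend a different construction) sorts the lengths and assigns consecutive binary numerals as code words:

def code_from_lengths(lengths):
    """Build a prefix-free binary code realizing lengths that satisfy the Kraft inequality."""
    items = sorted(lengths.items(), key=lambda kv: kv[1])
    code, c, prev_len = {}, 0, 0  # c is the next code word, read as an integer
    for sym, l in items:
        c <<= l - prev_len                       # pad to the current length
        code[sym] = format(c, "0{}b".format(l))  # write c in binary on l digits
        c += 1
        prev_len = l
    return code

print(code_from_lengths({"a": 1, "b": 2, "c": 3, "d": 3}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'}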


Complete codes

Definition (complete code)


A code B : X → {0, 1}∗ is called complete if

∑_{x∈X} 2^{−|B(x)|} = 1.

Example
A code which is prefix-free, suffix-free, and complete.
symbol x: code word B(x):
a 00
b 01
c 10
d 11


Complete codes (continued)

Example
Another code which is prefix-free, suffix-free, and complete.
symbol x: code word B(x):
a 01
b 000
c 100
d 110
e 111
f 0010
g 0011
h 1010
i 1011


Shannon-Fano code

Definition (Shannon-Fano code)


A prefix-free code B : X → {0, 1}∗ is called a Shannon-Fano code if

|B(x)| = ⌈− log P(X = x)⌉.

Theorem
Shannon-Fano codes exist for any distribution and satisfy

H(X) ≤ E |B(X)| ≤ H(X) + 1.


Shannon-Fano code (continued)

Proof
We have

∑_{x∈X} 2^{−⌈− log P(X=x)⌉} ≤ ∑_{x∈X} 2^{log P(X=x)} ≤ 1.

Hence, by the converse of the Kraft inequality, Shannon-Fano codes exist.
The other claim follows by taking expectations in

− log P(X = x) ≤ |B(x)| ≤ − log P(X = x) + 1.
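
In code, the Shannon-Fano lengths are just rounded-up negative logarithms, and the Kraft sum confirms that a prefix-free code with these lengths exists (it can then be built, e.g., with the construction sketched after the previous theorem). A minimal sketch:

import math

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = {x: math.ceil(-math.log2(q)) for x, q in p.items()}
print(lengths)                                   # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
print(sum(2.0 ** -l for l in lengths.values()))  # 1.0 <= 1, so such a prefix-free code exists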


Drawbacks of the Shannon-Fano code

The Shannon-Fano code is not necessarily the shortest possible code.

Example
Consider the following distribution and codes:
symbol x: P(X = x): code word B(x): code word C(x):
a 1 − 2^{−5} 0 0
b 2^{−6} 100000 10
c 2^{−6} 100001 11

Code B is a Shannon-Fano code, whereas code C is an alternative prefix-free code.
We have H(X) = 0.231..., E |B(X)| = 1.15625, and E |C(X)| = 1.03125.
Code C is no worse than code B for any symbol, and for the less probable
symbols it is much better.

A code that minimizes the expected code length is known as a Huffman code.


Trees and paths

Definition (binary tree)


A binary tree is a directed acyclic connected graph where each node has at
most two children nodes and at most one parent node. The node which has no
parents is called the root node. The nodes which have no children are called
leaf nodes. We assume that links to the left children are labeled with 0’s
whereas links to the right children are labeled with 1’s. Moreover, some nodes
may be labeled with some symbols as well.

Definition (path)
We say that a binary tree contains a path w ∈ {0, 1}∗ if there is a sequence of
links starting from the root node and labeled with the consecutive symbols of
w. We say that the path is ended with symbol a ∈ X if the last link of the
sequence ends in a node labeled with symbol a.


Code trees

Definition (code tree)


The code tree for a code B : X → {0, 1}∗ is a labeled binary tree which
contains a path w if and only if B(a) = w for some a ∈ X, and exactly in that
case we require that path w ends with symbol a.
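
A code tree can be represented as a nested dictionary whose keys are the link labels 0 and 1, with symbols attached to the nodes where code words end; an illustrative sketch:

def code_tree(code):
    """Build a binary trie whose paths spell the code words."""
    root = {}
    for sym, w in code.items():
        node = root
        for bit in w:
            node = node.setdefault(bit, {})  # follow or create the labeled link
        node["symbol"] = sym                 # the path w ends with symbol sym
    return root

print(code_tree({"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}))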


Examples of code trees

Example

symbol x: code word B(x): code word C(x):


a 0 00
b 1 01
c 10 10
d 11 110
e 00 111

The code trees for these codes are:


[Figure: two code trees. In the tree for B, the nodes reached by links 0 and 1 from the root are labeled a and b; the node of a has a child e (path 00), and the node of b has children c (path 10) and d (path 11), so labeled nodes occur inside the tree. In the tree for C, all symbols label leaves: a (00), b (01), c (10), d (110), e (111).]


Weighted code trees

Definition (weighted code tree)


The weighted code tree for a prefix-free code B : X → {0, 1}∗ and a probability
distribution p : X → [0, 1] is the code tree for code B where the nodes are
enhanced with the following weights: (1) to a leaf node with symbol a, we ascribe
weight p(a); (2) to the other (internal) nodes, we ascribe weights equal to the sum
of the weights of their children.


Example of a weighted code tree

Example

symbol x: p(x): code word C(x):


a 0.2 00
b 0.3 01
c 0.1 10
d 0.4 11

[Figure: the weighted code tree for C. The root has weight 1 and two children of weight 0.5 (links 0 and 1); the leaves are a, 0.2 (path 00), b, 0.3 (path 01), c, 0.1 (path 10), and d, 0.4 (path 11).]


Huffman code

Definition (Huffman code)


The Huffman code for a probability distribution p : X → [0, 1] is a code whose
weighted code tree is constructed by the following algorithm (see the sketch after this list):
1. Create a leaf node for each symbol and add all the leaves to a list.
2. While there is more than one node in the list:
   2.1 Remove the two nodes of lowest weight from the list.
   2.2 Create a new internal node with these two nodes as children
       and with weight equal to the sum of the two nodes' weights.
   2.3 Add the new node to the list.
3. The remaining node is the root node and the tree is complete.
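
A compact Python rendering of this algorithm using a priority queue (a sketch; tie-breaking may yield a different but equally optimal code):

import heapq
from itertools import count

def huffman(p):
    """Return a Huffman code (symbol -> binary string) for a distribution p."""
    tick = count()  # tie-breaker so that heapq never compares trees directly
    heap = [(weight, next(tick), sym) for sym, weight in p.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # remove the two nodes of lowest weight
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tick), (t1, t2)))  # new internal node
    code = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):  # internal node: label the links with 0 and 1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                        # leaf node: record the code word
            code[tree] = prefix or "0"
    walk(heap[0][2], "")
    return code

print(huffman({"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}))
# code word lengths: d -> 1, b -> 2, a -> 3, c -> 3, matching the example that follows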


Example of Huffman code

Example

symbol x: p(x): Huffman code B(x):


a 0.2 000
b 0.3 01
c 0.1 001
d 0.4 1

[Figure: the weighted Huffman code tree. The root (weight 1) has a child of weight 0.6 (link 0) and the leaf d, 0.4 (link 1). The weight-0.6 node has a child of weight 0.3 (link 0) and the leaf b, 0.3 (link 1). The weight-0.3 node has the leaves a, 0.2 (path 000) and c, 0.1 (path 001).]


Optimality of Huffman code

A code B will be called optimal for a given distribution p(x) = P(X = x) if
E |B(X)| achieves the minimum.

Theorem
For any probability distribution, the Huffman code is optimal.

To prove the theorem, we will use the following fact:

Lemma
Consider the two symbols x and y with the smallest probabilities. Then there is
an optimal code C such that these two symbols are sibling leaves at the
lowest level of C's code tree.


Proof of the lemma

Proof
Every internal node in the code tree of an optimal code must have two children;
otherwise the code word passing through it could be shortened. Let B be an
optimal code and let symbols a and b be two siblings at the maximal depth of
B's code tree. Assume without loss of generality that p(x) ≤ p(y) and
p(a) ≤ p(b). Then p(x) ≤ p(a), p(y) ≤ p(b), |B(a)| ≥ |B(x)|, and
|B(b)| ≥ |B(y)|. Now let C's code tree differ from B's code tree by switching
a ↔ x and b ↔ y. Then we obtain

E |C(X)| − E |B(X)| = (p(a) − p(x))(|B(x)| − |B(a)|)
                    + (p(b) − p(y))(|B(y)| − |B(b)|) ≤ 0.

Hence code C is also optimal, and in C's tree the symbols x and y are sibling
leaves at the lowest level.


Proof of Huffman code’s optimality

Now we will proceed by induction on the number of symbols in the alphabet X.
If X contains only two symbols, then the Huffman code (with code words 0 and 1)
is optimal. In the induction step, we assume that the Huffman code is optimal
for n − 1 symbols and prove its optimality for n symbols. Let C be an optimal
code for n symbols. By the lemma, without loss of generality we may assume that
the symbols x and y having the smallest probabilities occupy two sibling leaves
at the lowest level of C's code tree. Then from the weighted code tree of C we
construct a code C′ for n − 1 symbols by removing the nodes with symbols x and y
and ascribing a new symbol z to their parent node. Hence we have

E |C′(X′)| = E |C(X)| − p(x) − p(y),

where the variable X′ = z if X ∈ {x, y} and X′ = X otherwise.


Proof of Huffman code’s optimality (continued)

On the other hand, let B′ be the Huffman code for X′ and let B be the code
constructed from B′ by adding leaves with symbols x and y to the node with
symbol z. By construction, code B is the Huffman code for X. We have

E |B′(X′)| = E |B(X)| − p(x) − p(y).

Because E |B′(X′)| ≤ E |C′(X′)| by the optimality of the Huffman code B′, we
obtain E |B(X)| ≤ E |C(X)|. Hence the Huffman code B is also optimal.
