
Notes on Shannon theorem for noiseless coding

Antonio Bonafonte

February 24, 2016

For any uniquely decodable code with lengths li,

$$H \le R$$

Furthermore, a coder can easily be defined so that

$$R < H + 1$$

To prove this, we assume the following two results, which will be proved below:

1. Kraft inequality:

$$\sum_{i=1}^{M} 2^{-l_i} \le 1$$

2.

$$H = -\sum_{i=1}^{M} p_i \log_2 p_i \;\le\; -\sum_{i=1}^{M} p_i \log_2 q_i \qquad \forall q_i : q_i > 0,\ \sum_i q_i = 1$$

Obviously, this is also true if ∑ qi < 1. Equality holds only if qi = pi.

For any uniquely decodable code with lengths li, we define qi = 2^−li. Then ∑ qi ≤ 1 (Kraft inequality). Therefore,

$$H \le -\sum_{i=1}^{M} p_i \log_2 q_i$$

But the right-hand term is the rate:

$$-\sum_{i=1}^{M} p_i \log_2 q_i = -\sum_{i=1}^{M} p_i \log_2 2^{-l_i} = \sum_{i=1}^{M} p_i\, l_i = R$$

Remember that the equality (H = R) applies only if qi = pi, i.e., if we define li = −log2 pi. However, li must be an integer and, in general, −log2 pi is not.
In order to prove that it is possible to define a code with R < H + 1, Shannon defined a code with lengths

$$l_i = \lceil -\log_2 p_i \rceil = -\log_2 p_i + \epsilon_i, \qquad 0 \le \epsilon_i < 1$$

which satisfy the Kraft inequality:

$$\sum_{i=1}^{M} 2^{-l_i} = \sum_{i=1}^{M} 2^{\log_2 p_i - \epsilon_i} = \sum_{i=1}^{M} \frac{p_i}{2^{\epsilon_i}} \le 1$$

because, as 0 ≤ εi < 1, then 1 ≤ 2^εi < 2.


Then, the rate is:

$$R = \sum_{i=1}^{M} p_i \left( -\log_2 p_i + \epsilon_i \right) \le H + \max_i \{\epsilon_i\} \sum_{i=1}^{M} p_i < H + 1$$

since ∑ pi = 1 and max{εi} < 1.
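These bounds are easy to check numerically. The following is a minimal sketch (in Python, which the notes themselves do not use; all names are illustrative) that computes the Shannon lengths and verifies both the Kraft inequality and H ≤ R < H + 1 on a dyadic distribution, where −log2 pi is already an integer and the bound is tight:

```python
import math

def shannon_lengths(p):
    # Shannon code lengths: l_i = ceil(-log2 p_i)
    return [math.ceil(-math.log2(pi)) for pi in p]

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p)

def rate(p, lengths):
    return sum(pi * li for pi, li in zip(p, lengths))

# Dyadic probabilities: eps_i = 0 for every symbol, so R = H exactly.
p = [0.5, 0.25, 0.125, 0.125]
l = shannon_lengths(p)                          # [1, 2, 3, 3]
assert sum(2.0 ** -li for li in l) <= 1.0       # Kraft inequality
assert entropy(p) <= rate(p, l) < entropy(p) + 1
print(entropy(p), rate(p, l))                   # 1.75 1.75
```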

Exercise: An information source produces independent symbols from the alphabet
A = {a, b, c, d, e}, with probabilities (0.03, 0.38, 0.51, 0.07, 0.01). Find the lengths of Shannon's code, compute the rate and the redundancy, and define a prefix code with these lengths.
Solution
The length of Shannon's code is ⌈−log2 pi⌉. The next table shows the lengths and a prefix code defined using the code tree sketched below:

ai    pi     −log2 pi   li = ⌈−log2 pi⌉   code
a     0.03   5.059      6                 000000
b     0.38   1.396      2                 01
c     0.51   0.971      1                 1
d     0.07   3.837      4                 0001
e     0.01   6.644      7                 0000010

H = 1.51                R = 1.80

(Figure: binary code tree yielding the codewords in the table: 1 → c, 01 → b, 0001 → d, 000000 → a, 0000010 → e.)

The redundancy is ρ = R − H = 0.29
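As a quick numerical check of the table (a Python sketch using the probabilities and codewords above):

```python
import math

# Probabilities and codewords from the table above.
code = {'a': '000000', 'b': '01', 'c': '1', 'd': '0001', 'e': '0000010'}
p    = {'a': 0.03, 'b': 0.38, 'c': 0.51, 'd': 0.07, 'e': 0.01}

H = -sum(pi * math.log2(pi) for pi in p.values())
R = sum(p[s] * len(w) for s, w in code.items())

# A code is prefix-free iff no codeword is a prefix of another one.
words = list(code.values())
prefix_free = not any(u != v and v.startswith(u) for u in words for v in words)

print(round(H, 2), round(R, 2), round(R - H, 2), prefix_free)
# -> 1.51 1.8 0.29 True
```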


Note that a better prefix code can be obtained just by removing the nodes with only one child:

ai    pi     li   code
a     0.03   4    0000
b     0.38   2    01
c     0.51   1    1
d     0.07   3    001
e     0.01   4    0001

H = 1.51     R = 1.64

In this case, the redundancy decreases to ρ = 0.13.
Exercise: Exactly the same as the previous one, but with a larger alphabet. Repeat with the alphabet A = {a, b, c, d, e, f, g, h} and probabilities
(0.32, 0.25, 0.15, 0.13, 0.07, 0.05, 0.02, 0.01).

ai    pi     −log2 pi   li = ⌈−log2 pi⌉   code
a     0.32   1.644      2                 11
b     0.25   2.000      2                 10
c     0.15   2.737      3                 011
d     0.13   2.943      3                 010
e     0.07   3.837      4                 0001
f     0.05   4.322      5                 00001
g     0.02   5.644      6                 000001
h     0.01   6.644      7                 0000000

H = 2.48                R = 2.70

(Figure: binary code tree yielding the codewords in the table above.)

ρ = 0.22

Note that a better prefix code can be obtained just by removing the nodes with only one child. In this particular case, it can be shown that the rate is the same as the rate of an optimal code (Huffman).

(Figure: the pruned code tree yielding the codewords in the next table.)

ai    pi     li   code
a     0.32   2    11
b     0.25   2    10
c     0.15   3    011
d     0.13   3    010
e     0.07   3    001
f     0.05   4    0001
g     0.02   5    00001
h     0.01   5    00000

H = 2.48     R = 2.54
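That claim can be verified numerically. Below is a minimal Huffman sketch (assuming Python and the standard heapq module; the tie-break id exists only to keep heap entries comparable) whose average length should come out to 2.54, the rate of the pruned code:

```python
import heapq

p = [0.32, 0.25, 0.15, 0.13, 0.07, 0.05, 0.02, 0.01]

# Heap entries: (probability, tie-break id, {symbol index: leaf depth}).
heap = [(pi, i, {i: 0}) for i, pi in enumerate(p)]
heapq.heapify(heap)
uid = len(p)
while len(heap) > 1:
    p1, _, d1 = heapq.heappop(heap)   # two least probable subtrees
    p2, _, d2 = heapq.heappop(heap)
    merged = {s: d + 1 for s, d in {**d1, **d2}.items()}  # one level deeper
    heapq.heappush(heap, (p1 + p2, uid, merged))
    uid += 1

depths = heap[0][2]
R_huffman = sum(p[s] * depths[s] for s in depths)
print(round(R_huffman, 2))   # -> 2.54, matching the rate of the pruned code
```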

Now, let's prove the two results which were taken for granted. The first one:

$$H = -\sum_{i=1}^{M} p_i \log_2 p_i \;\le\; -\sum_{i=1}^{M} p_i \log_2 q_i \qquad \forall q_i : q_i > 0,\ \sum_i q_i = 1$$

First, recall that ln x ≤ x − 1, with equality at x = 1. (The logarithm is a concave function that passes through (1, 0) with slope 1, so it lies below its tangent line y = x − 1.)

Then,

$$\sum_{i=1}^{M} p_i \ln \frac{q_i}{p_i} \;\le\; \sum_{i=1}^{M} p_i \left( \frac{q_i}{p_i} - 1 \right) = \sum_{i=1}^{M} q_i - \sum_{i=1}^{M} p_i = 1 - 1 = 0$$

This holds with equality if pi = qi.


We can substitute ln with log2 in the result, since log2 x = ln x / ln 2 and dividing by ln 2 > 0 preserves the inequality:

$$\sum_{i=1}^{M} p_i \log_2 q_i - \sum_{i=1}^{M} p_i \log_2 p_i \le 0 \qquad\Longrightarrow\qquad H \le -\sum_{i=1}^{M} p_i \log_2 q_i$$
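A small numerical illustration of this inequality (a Python sketch; the distributions are invented examples): the cross term −∑ pi log2 qi is minimized exactly at qi = pi, where it equals H:

```python
import math

def cross_entropy(p, q):
    # -sum_i p_i log2 q_i
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]
for q in ([0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [1/3, 1/3, 1/3]):
    print(q, round(cross_entropy(p, q), 4))
# q = p gives H(p) ≈ 1.4855; every other q gives a strictly larger value.
```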

A side consequence: given an ergodic source with unknown probabilities pi, let qi be an estimate of the probabilities. The entropy of this estimate over a test set (x1 . . . xN), HQ, is:

$$H_Q = \lim_{N \to \infty} -\frac{1}{N} \sum_{n=1}^{N} \log_2 q_{x_n} = -\sum_{i=1}^{M} p_i \log_2 q_i \;\ge\; H$$

The minimum value, H, is achieved when we estimate the correct probabilities, qi = pi.

Even if we do not know the real probabilities pi, we can compare two probability estimates, qi^a and qi^b. The first one is better if HQa < HQb.
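A sketch of that comparison (Python; the source, the sample size, and both candidate models are invented for illustration):

```python
import math
import random

random.seed(0)
symbols = ['a', 'b', 'c']
p_true = [0.5, 0.3, 0.2]          # hidden source probabilities (invented)
sample = random.choices(symbols, weights=p_true, k=100_000)

qa = {'a': 0.48, 'b': 0.32, 'c': 0.20}   # candidate model A (invented)
qb = {'a': 1/3, 'b': 1/3, 'c': 1/3}      # candidate model B (invented)

def HQ(xs, q):
    # Empirical estimate of H_Q = -(1/N) sum_n log2 q_{x_n}
    return -sum(math.log2(q[x]) for x in xs) / len(xs)

print(round(HQ(sample, qa), 3), round(HQ(sample, qb), 3))
# HQ(qa) < HQ(qb): model A is the better estimate, even though p is unknown.
```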

This can be used to prove that

$$H(X) \le \log_2 M$$

If we select qi = 1/M, then

$$H \le -\sum_{i=1}^{M} p_i \log_2 \frac{1}{M} = \log_2 M$$

Finally, let's illustrate the Kraft inequality:

$$\sum_{i=1}^{M} 2^{-l_i} \le 1$$

First, let's assume that the code is a prefix code. The minimum length of any codeword is 1. Let's start with only two symbols.

(Figure: depth-1 tree with branches 0 → a and 1 → b.)

In the figure, there are two symbols of length 1, so $\sum_{i=1}^{2} 2^{-1} = 1$. It is not possible to select lengths smaller than that, and if we had selected a longer code for any symbol, the sum would be smaller than 1.
We can add a new symbol if we replace a codeword of length li, whose contribution to the sum is 2^−li, by two codewords of lengths lj, lk ≥ li + 1, whose contribution is

$$2^{-l_j} + 2^{-l_k} \le \frac{2^{-l_i}}{2} + \frac{2^{-l_i}}{2} = 2^{-l_i}$$

In fact, it makes no sense to choose lj, lk ≠ li + 1, as it increases the length of that code with no benefit.
In the next figure, the lengths are li : (1, 2, 4, 4, 3) and the sum is exactly 1.

(Figure: code tree with codewords a = 0, b = 10, e = 111, c = 1100, d = 1101.)

Furthermore, given a code with lengths that satisfy the Kraft inequality, we can easily build a prefix code.
Example: suppose the alphabet A = {a, b, c, d, e}, and a uniquely decodable code:
ai   code   li   2^−li
a    11     2    1/4
b    10     2    1/4
c    01     2    1/4
d    100    3    1/8
e    1000   4    1/16
(Puzzle: decode 10010010011010)


These lengths satisfy the Kraft inequality: the sum is 3/4 + 1/8 + 1/16 = 15/16 ≤ 1.
We can plot a prefix tree and define the prefix code:

(Figure: the prefix tree is built step by step, placing leaves a, b, c at depth 2, d at depth 3, and e at depth 4.)
Prefix code:
ai li code
a 2 00
b 2 01
c 2 10
d 3 110
e 4 1110
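The codeword assignment in this table can be reproduced mechanically: process the lengths in increasing order and give each symbol the next binary value, left-shifted whenever the length grows. A minimal sketch (Python; this canonical-code assignment is not spelled out in the notes, but it yields exactly the table above):

```python
from math import fsum

lengths = {'a': 2, 'b': 2, 'c': 2, 'd': 3, 'e': 4}
assert fsum(2.0 ** -l for l in lengths.values()) <= 1.0   # Kraft holds

code = {}
next_value, prev_len = 0, 0
for sym, l in sorted(lengths.items(), key=lambda kv: kv[1]):
    next_value <<= (l - prev_len)          # pad with zeros to the new length
    code[sym] = format(next_value, f'0{l}b')
    next_value += 1
    prev_len = l

print(code)   # {'a': '00', 'b': '01', 'c': '10', 'd': '110', 'e': '1110'}
```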
