Information Theory and Coding - Chapter 2
Text Book: B.P. Lathi, "Modern Digital and Analog Communication Systems", 3rd edition, Oxford University Press, Inc., 1998
Reference: A. Papoulis, "Probability, Random Variables, and Stochastic Processes", McGraw-Hill, 2005
The less probable an event is, the more information we gain when it occurs.
Units of information
The base of the logarithm in Equation (9.4), which defines the amount of information as I(sk) = log(1/pk), is quite arbitrary. Nevertheless, it is standard practice today to use a logarithm to base 2; the resulting unit of information is called the bit.
When pk = 1/2, we have I(sk) = 1 bit. Hence, one bit is the amount of information that we gain when one of two possible and equally likely (i.e., equiprobable) events occurs.
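As a quick illustration (a Python sketch; the function name is my own), the amount of information gained for a few event probabilities:

import math

def self_information(p, base=2):
    # Information gained when an event of probability p occurs.
    # base=2 gives bits, base=10 gives Hartleys, base=math.e gives nats.
    return math.log(1.0 / p, base)

print(self_information(0.5))    # 1.0 bit: two equally likely events
print(self_information(0.25))   # 2.0 bits: the less probable event carries more information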
H(S) = ∑ pk log2(1/pk)   bits/source symbol   (sum over k = 1, . . . , q)

∑ nj r^(−j) ≤ 1   (sum over j = 1, . . . , l),  or equivalently  ∑ r^(−li) ≤ 1   (sum over i = 1, . . . , q)
Where r is the code radix (the number of symbols in the code alphabet; r = 2 for a binary code), nj is the # of codewords of length j, and l is the maximum codeword length. Moreover, if a prefix code has been constructed for a discrete memoryless source with source alphabet (s1, s2, . . . , sq) and source statistics (P1, P2, . . . , Pq), and the codeword for symbol si has length li, i = 1, 2, . . . , q, then the codeword lengths must satisfy the above inequality, known as the Kraft-McMillan inequality. The inequality does not tell us that a source code is a prefix code; rather, it is merely a condition on the codeword lengths of the code and not on the codewords themselves. Referring to the three codes listed in Table 9.2: Code I violates the Kraft-McMillan inequality and therefore cannot be a prefix code, while the Kraft-McMillan inequality is satisfied by both Codes II and III, but only Code II is a prefix code.
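A minimal Python sketch of this check (the codeword-length sets below are invented for illustration):

def kraft_sum(lengths, r=2):
    # Sum of r**(-l) over the codeword lengths; a prefix code with these
    # lengths exists if and only if the sum does not exceed 1.
    return sum(r ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> a prefix code with these lengths exists
print(kraft_sum([1, 1, 2, 2]))   # 1.5  -> no uniquely decodable code is possible

Note that, as stated above, a sum of at most 1 only tells us that such lengths are feasible; it does not make a particular code with those lengths a prefix code.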
Kraft-McMillan Inequality
Prefix codes are distinguished from other uniquely decodable codes by the fact that the end of a codeword is always recognizable. Hence, the decoding of a prefix code can be accomplished as soon as the binary sequence representing a source symbol is fully received. For this reason, prefix codes are also referred to as instantaneous codes.
[Table 9.2: codewords of Codes I, II, and III]
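As an illustration of instantaneous decoding, a minimal Python sketch using a hypothetical prefix code (the codewords below are invented for illustration; they are not the codes of Table 9.2):

def decode_prefix(bits, codebook):
    # With a prefix code, a codeword is recognized as soon as its last binit arrives.
    inverse = {cw: sym for sym, cw in codebook.items()}
    symbols, current = [], ""
    for b in bits:
        current += b
        if current in inverse:       # end of a codeword is recognizable
            symbols.append(inverse[current])
            current = ""
    return symbols

code = {"s1": "0", "s2": "10", "s3": "110", "s4": "111"}   # hypothetical prefix code
print(decode_prefix("0110100", code))                      # ['s1', 's3', 's2', 's1']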
Coding Efficiency
Assume the source has an alphabet with q different symbols, and that the ith symbol si
occurs with probability pi , i = 1, 2,. . . , q. Let the binary code word assigned to
symbol si by the encoder have length li measured in binits.
Then, the average code-word length, L, of the source encoder is defined as

L = ∑ pi li   (sum over i = 1, . . . , q)
In physical terms, the parameter L represents the average number of binits per source
symbol used in the source encoding process. Let Lmin denote the minimum possible
value of L, then, the coding efficiency of the source encoder is defined as
η = Lmin/ L
With L ≥ Lmin we clearly have η ≤1. The source encoder is said to be efficient when η
approaches unity.
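As a small numerical sketch of this definition (the source statistics and codeword lengths below are invented for illustration):

p = [0.5, 0.25, 0.125, 0.125]      # hypothetical source statistics
l = [1, 2, 3, 3]                   # hypothetical codeword lengths, in binits

L_avg = sum(pi * li for pi, li in zip(p, l))
print(L_avg)                        # 1.75 binits per source symbol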
Data Compaction
A common characteristic of signals generated by physical sources is that,
in their natural form, they contain a significant amount of information
that is redundant. The transmission of such redundancy is therefore
wasteful of primary communication resources. For efficient signal
transmission, the redundant information should be removed from the
signal prior to transmission.
This operation, performed with no loss of information, is ordinarily carried out on a signal in digital form, in which case it is called data compaction or lossless data compression.
According to the source-coding theorem, the entropy H(S) represents a fundamental limit on the removal of redundancy from the data; that is, the average number of bits per source symbol necessary to represent a discrete memoryless source can be made as small as, but no smaller than, the entropy H(S).
Thus with Lmin = H(S), the efficiency of a source encoder may be
rewritten in terms of the source entropy H(S) as
η = H(S)/ L
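Continuing the hypothetical four-symbol example used above, a Python sketch of this efficiency calculation:

import math

p = [0.5, 0.25, 0.125, 0.125]      # hypothetical source statistics
l = [1, 2, 3, 3]                   # hypothetical codeword lengths

H = sum(pi * math.log2(1.0 / pi) for pi in p)    # source entropy, bits/source symbol
L_avg = sum(pi * li for pi, li in zip(p, l))     # average code-word length
print(H, L_avg, H / L_avg)                       # 1.75, 1.75, efficiency = 1.0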
Data Compaction
Problem
Find the efficiency of source codes I, II, and III.
[Table 9.2: codewords of Codes I, II, and III]
Huffman Code
An important class of prefix codes is known as Huffman codes. The Huffman code is, by construction, the most efficient code for a given source; that is, it achieves the highest possible coding efficiency without coding of a source extension.
The radix-r Huffman coding algorithm proceeds as follows (a code sketch is given after the list):
1. The source symbols are listed in order of decreasing probability.
2. The total # of source symbols q should equal b(r − 1) + 1 for some b = 0, 1, 2, 3, …; otherwise, dummy symbols with zero probability should be appended to the end of the list until this condition holds.
3. The r source symbols of lowest probability are combined into a new source symbol with probability equal to the sum of the original r probabilities. The list of source symbols is thereby reduced in size by r − 1. The probability of the new symbol is placed in the list in accordance with its value (keeping the list in descending order of probability at all times).
4. The procedure is repeated until we are left with a final list of r combined symbols, to each of which a code symbol is assigned.
5. The code for each (original) source symbol is found by working backward and tracing the sequence of code symbols assigned to that source symbol as well as its successors.
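A Python sketch of this procedure, assuming we only need the resulting code-word lengths (reading off the actual code symbols, steps 4 and 5, is not shown; the function name and the five-symbol source are my own choices):

import heapq

def huffman_lengths(probs, r=2):
    # Code-word lengths of a radix-r Huffman code for the given probabilities.
    q = len(probs)
    probs = list(probs)
    # Step 2: append zero-probability dummy symbols until q = b(r - 1) + 1
    while (len(probs) - 1) % (r - 1) != 0:
        probs.append(0.0)
    # Each heap entry: (probability, tie-breaker, symbols grouped under this entry)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    # Step 3: repeatedly combine the r entries of lowest probability
    while len(heap) > 1:
        total, members = 0.0, []
        for _ in range(r):
            p, _, syms = heapq.heappop(heap)
            total += p
            members += syms
        for s in members:
            lengths[s] += 1            # every combination adds one code symbol
        heapq.heappush(heap, (total, min(members), members))
    return lengths[:q]                 # drop the dummy symbols

print(huffman_lengths([0.4, 0.2, 0.2, 0.1, 0.1], r=2))   # -> [2, 2, 2, 3, 3]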
Example 1: Huffman Binary Code (HC)
∑ r^(−li) = ∑ 2^(−li) = 3 × 2^(−2) + 2 × 2^(−3) = 1  ⇒  a prefix code exists

η = H(S) / L = 2.12 / 2.2 ≈ 0.96
Example 2: Huffman Binary Code (HC)
Si    Pi     HC1  |  1st reduction    |  2nd reduction    |  3rd reduction
s1    0.7    0    |  s1   0.7   0     |  s1    0.7   0    |  s1     0.7   0
s2    0.1    100  |  s45  0.1   11    |  s23   0.2   10   |  s2-5   0.3   1
s3    0.1    101  |  s2   0.1   100   |  s45   0.1   11   |
s4    0.05   110  |  s3   0.1   101   |                   |
s5    0.05   111  |                   |                   |
Si    Pi     HC2   |  1st reduction    |  2nd reduction    |  3rd reduction
s1    0.7    0     |  s1   0.7   0     |  s1    0.7   0    |  s1     0.7   0
s2    0.1    11    |  s2   0.1   11    |  s345  0.2   10   |  s2-5   0.3   1
s3    0.1    100   |  s3   0.1   100   |  s2    0.1   11   |
s4    0.05   1010  |  s45  0.1   101   |                   |
s5    0.05   1011  |                   |                   |
Problem 1
Consider a zero memory binary source S with P(s1) = 0.8 & P(s2) = 0.2 :
a) Construct the 2nd and 3rd extensions of the source, find the corresponding probability distribution of each extension, and find the entropy (a code sketch for this part appears after the list below).
b) Write down the binary code of the 2nd extension of the source [T ≡ S2] using each of the
following binary decision trees:
c) Find the average code word length L for each binary code.
d) Encode the following source symbol stream using each of the above binary codes:
s2 s1 s1 s1 s1 s2 s2 s2 s1 s1
e) Calculate the binit rate in binits/sec. of each one if the source S emits 2000 symbols/sec.
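A sketch for part (a), assuming the usual product rule for the extensions of a zero-memory source (the function name is mine):

import math
from itertools import product

p = {"s1": 0.8, "s2": 0.2}

def extension(p, n):
    # Probability distribution of the nth extension S^n of a memoryless source.
    return {"".join(block): math.prod(p[s] for s in block)
            for block in product(p, repeat=n)}

for n in (1, 2, 3):
    dist = extension(p, n)
    H = sum(q * math.log2(1.0 / q) for q in dist.values())
    print(n, dist, H)          # H(S^n) = n * H(S), with H(S) about 0.722 bits/symbol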
Problem 2
Consider a zero-memory, statistically independent binary source S with two source symbols s1 and s2. If P(s1) = 0.85, calculate:
a) The amount of information of source symbol s1, I(s1), in bits.
b) The amount of information of source symbol s2, I(s2), in bits.
c) The statistical average of information of the source, H(S), in bits/source symbol.
d) The joint information of the events A = {s1 s2} and B = {s1 s1}, in Hartleys.
e) The conditional information of the event A = {s1 | s2}, in nats. (A numerical sketch follows below.)
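A numerical sketch of parts (a) to (e), assuming the usual unit conventions (bits: log base 2, Hartleys: log base 10, nats: natural log):

import math

p1 = 0.85
p2 = 1.0 - p1                                  # 0.15

I1_bits = math.log2(1.0 / p1)                  # a) I(s1) in bits
I2_bits = math.log2(1.0 / p2)                  # b) I(s2) in bits
H_bits  = p1 * I1_bits + p2 * I2_bits          # c) H(S) in bits/source symbol

# d) for a statistically independent source, P(s1 s2) = p1 * p2 and P(s1 s1) = p1 * p1
I_A_hartley = math.log10(1.0 / (p1 * p2))
I_B_hartley = math.log10(1.0 / (p1 * p1))

# e) independence means P(s1 | s2) = P(s1)
I_cond_nats = math.log(1.0 / p1)

print(I1_bits, I2_bits, H_bits, I_A_hartley, I_B_hartley, I_cond_nats)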
Problem 2 - Solution
Consider a zero-memory, statistically independent binary source S with two source symbols s1 and s2. If P(s1) = 0.85, calculate:
5) Calculate the average code length of source (T) and the code efficiency for each
code (LI, 𝜂I, LII, 𝜂II)
6) Encode the following source symbol stream using each of the above binary code
(b a c c a a b b a c b a )
7) Calculate the binit rate in binits/sec. of each code if the source S emits 3000
symbols/sec.
Problem 3 - Solution
Consider 3-symbol, zero memory source S (a, b, c) with P(a) = 0.8 and P(b) = 0.05.
1) Encode the source S symbols using a binary code. Calculate the average code length
P(a) = 0.8, P(b) = 0.05, P(c) = 1 − 0.8 − 0.05 = 0.15

a   0.8    0
b   0.05   10
c   0.15   11
(L = 0.8 + 2× 0.05 + 3 × 0.15 = 1.35
L = 0.8 + 3 × 0.05 + 2 × 0.15 = 1.25)
3) Construct the second extension of the source [T ≡ S2] and find its probability
distribution.
6) Encode the following source symbol stream using each of the above binary code:
baccaabbacba
T: t5 t7 t1 t4 t3 t5
Code I 000 0100 11 010111 10 000
Code II 000 011 10 11111 110 000
Problem 3 - Solution
7) Calculate the binit rate in binits/sec. of each code if the source S emits 3000
symbols/sec.
Note that:
The binit rate without extension = 1.2 × 3000 = 3600 binits/sec = 3.6 kbinits/sec
Problem 4
Can an instantaneous (prefix) code be constructed with the following codeword lengths? Find the corresponding code using the decision tree for each eligible case (a numerical check of the Kraft sums appears after the list below).
a) {1,2,3,3,4,4,5,5}, r = 2
b) {1,1,2,2,3,3,4,4}, r = 3
c) {1,1,1,2,2,2,2}, r = 4
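As a quick numerical check of the Kraft sums for the three cases (a sketch; the worked decision trees asked for are not constructed here):

def kraft_sum(lengths, r):
    return sum(r ** (-l) for l in lengths)

cases = {
    "a": ([1, 2, 3, 3, 4, 4, 5, 5], 2),
    "b": ([1, 1, 2, 2, 3, 3, 4, 4], 3),
    "c": ([1, 1, 1, 2, 2, 2, 2], 4),
}
for name, (lengths, r) in cases.items():
    s = kraft_sum(lengths, r)
    print(name, "r =", r, round(s, 4), "eligible" if s <= 1 else "not eligible")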
Problem 4 - Solution
Problem 5
A zero memory source S emits one of six symbols randomly with probabilities
{0.25, 0.25, 0.2, 0.1, 0.1, 0.1}
1. Construct a Huffman ternary code.
2. Calculate the average length of this code.
3. Calculate the code efficiency.
4. Calculate the code redundancy (γ = 1 − η). (A numerical sketch follows below.)
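A numerical sketch for steps 2 to 4, assuming the ternary construction (with one zero-probability dummy symbol added, since q = 6) yields the code-word lengths {1, 1, 2, 2, 3, 3}; the radix-r sketch given after the Huffman algorithm above can be used to verify this assumption:

import math

probs   = [0.25, 0.25, 0.2, 0.1, 0.1, 0.1]
lengths = [1, 1, 2, 2, 3, 3]       # assumed result of the ternary Huffman construction

L  = sum(p * l for p, l in zip(probs, lengths))      # average length, ternary digits/symbol
H3 = sum(p * math.log(1.0 / p, 3) for p in probs)    # source entropy in ternary units
eta   = H3 / L                                       # code efficiency
gamma = 1 - eta                                      # code redundancy
print(L, H3, eta, gamma)           # roughly 1.7, 1.553, 0.913, 0.087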
Problem 9
Complete the following probability distribution of the second extension T of a memoryless source S of 3 symbols {a, b & c}.
[Table: probability distribution of T (partially given); the entries shown include 0.25 and 0.01]
1. Find the zero memory source S probability
distribution.
2. Calculate the source entropy H(T).
3. Find the ternary Huffman code for the above
source second extension T and calculate the code
efficiency and redundancy. (Hint: you do not
need to add dummy symbol with zero
probability)
Code Variance
As a measure of the variability in code-word lengths of a source code, the variance of the code-word length about the average code-word length L over the ensemble of source symbols is defined as

σ² = ∑ pk (lk − L)²   (sum over k = 0, 1, . . . , K − 1)

where p0, p1, . . . , pK−1 are the source statistics, and lk is the length of the code word assigned to source symbol sk. It is usually found that when a combined symbol is moved as high as possible in the list, the resulting Huffman code has a significantly smaller variance σ² (which is better) than when it is moved as low as possible. On this basis, it is reasonable to choose the former Huffman code over the latter.
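As a numerical illustration of this point, a Python sketch using the source statistics and the two codes HC1 and HC2 of Example 2 above:

p   = [0.7, 0.1, 0.1, 0.05, 0.05]    # source statistics of Example 2
hc1 = [1, 3, 3, 3, 3]                # HC1: combined symbol moved as high as possible
hc2 = [1, 2, 3, 4, 4]                # HC2: combined symbol moved as low as possible

def avg_and_var(p, l):
    L = sum(pk * lk for pk, lk in zip(p, l))
    var = sum(pk * (lk - L) ** 2 for pk, lk in zip(p, l))
    return L, var

print(avg_and_var(p, hc1))   # -> about (1.6, 0.84): same average length,
print(avg_and_var(p, hc2))   # -> about (1.6, 1.04): but HC1 has the smaller variance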