Information Theory and Coding - Chapter 2

Information Theory - Discrete Memoryless Source - Information Measure - Uncertainty, Surprise, and Information - Units of information - Source Entropy - Source Coding Theory - Prefix (Instantaneous) Code - Decision Tree - Kraft-McMillan Inequality - Coding Efficiency - Data Compaction - Huffman Code - Code Variance - Solved Problems.

Mustaqbal University

College of Engineering & Computer Sciences


Electronics and Communication Engineering Department

Course: EE301: Probability Theory and Applications


Prerequisite: Stat 219

Text Book: B.P. Lathi, “Modern Digital and Analog Communication Systems”, 3rd edition, Oxford University Press, Inc., 1998
Reference: A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, 2005

Dr. Aref Hassan Kurdali


Application: Information Theory
• In the context of communications, information theory deals with mathematical modeling and analysis of a communication system rather than with physical sources and physical channels.
• In particular, it provides answers to two fundamental questions (among others):

1) What is the minimum number of binits (binary digits) per source symbol required to fully represent the source with acceptable quality? (Most efficient source coding)
2) What is the ultimate (highest) transmission binit rate for reliable (error-free) communication over a noisy channel? (Most efficient channel coding)
The answers to these two questions lie in the entropy of a source and the
capacity of a channel respectively.

Entropy is defined in terms of the probabilistic behavior of a source of information (how much average uncertainty does an information source carry?); it is so named because of the parallel use of this concept in thermodynamics (how much average instability does a physical system exhibit?).

Capacity is defined as the basic ability of a channel to transmit information; it is naturally related to the noise characteristics of the channel.

A remarkable result that emerges from information theory is that if the entropy of the source is less than the capacity of the channel, then error-free communication over the channel can be achieved.
Discrete Memoryless Source
The discrete source output is modeled as a discrete random variable, S,
which takes on symbols from a fixed finite alphabet

S = {s1, s2, s3, …, sq}

with probability distribution P(S = si) = pi, i = 1, 2, 3, …, q,
where pi ≥ 0 and Σ pi = 1.

A discrete memoryless source (zero-memory source) emits statistically independent symbols during successive signaling intervals; the symbol emitted at any time is independent of previously emitted symbols.
Information Measure
How much information I(a) is associated with an event ‘a’ whose probability is p(a) = p?
The information measure I(a) should have several properties:
1. Information is a non-negative quantity: I(a) ≥ 0.
2. If an event has probability 1, we get no information from the
occurrence of that event, i.e. I(a) = 0 if p (a) =1.
3. If two independent events (a & b) occur (whose joint probability is the
product of their individual probabilities i.e. p(ab) = p(a)p(b)), then the
total information we get from observing these two events is the sum of
the two informations:
I(ab) = I(a)+I(b). (This is the critical property . . . )
4. The information measure should be a continuous (and, in fact,
monotonic) function of the probability (slight changes in probability
should result in slight changes in information).
Since, for two independent occurrences of a, I(a^2) = I(aa) = I(a) + I(a) = 2 I(a),
by continuity we get, for 0 < p(a) ≤ 1 and any real n > 0:
I(a^n) = n · I(a)
From this, the information can be measured by the logarithm function,
i.e. I(a) = −logb(p(a)) = logb(1/p(a)) for some base b.
The base b determines the unit of information used.
The unit can be changed by changing the base, using the following formula: for b1, b2 and x > 0,

logb1(x) = logb2(x) / logb2(b1)


Uncertainty, Surprise, and Information
The amount of uncertainty (before), surprise (at the moment of), and information gained (after) observing the event S = sk, which occurs with probability pk, is therefore defined using the logarithmic function

I(sk) = log(1/pk) = −log(pk)
The occurrence of an event S = sk either provides some or no information, but never brings about a loss of information.

The less probable an event is, the more information we gain when it occurs.
Units of information
The base of the logarithm in this definition is quite arbitrary. Nevertheless, it is standard practice today to use a logarithm to base 2; the resulting unit of information is called the bit.

When pk = 1/2, we have I(sk) = 1 bit. Hence, one bit is the amount of information that we gain when one of two possible and equally likely (i.e., equiprobable) events occurs.

If a logarithm to base 10 is used, the resulting unit of information is called the hartley. When pk = 1/10, we have I(sk) = 1 hartley.

If a logarithm to base e is used, the resulting unit of information is called the nat. When pk = 1/e, we have I(sk) = 1 nat.
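The three units can be checked numerically; a minimal Python sketch (the helper name information() is ours, for illustration):

import math

def information(p, base=2):
    # Self-information I = log_base(1/p) of an event of probability p
    return math.log(1.0 / p, base)

print(information(0.5, base=2))               # 1 bit
print(information(0.1, base=10))              # 1 hartley
print(information(1 / math.e, base=math.e))   # ~1 nat
# Changing units via logb1(x) = logb2(x) / logb2(b1): 1 bit = ln(2) ~ 0.693 nat
print(information(0.5, base=2) * math.log(2))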
Source Entropy H(S)
The entropy of a discrete memoryless source is defined as

H(S) = Σ pi log(1/pi), summed over i = 1, 2, …, q

It is the average amount of information content per source symbol.


The source entropy is bounded as follows:
0 ≤ H(S) ≤ log q

where q is the radix (number of symbols) of the alphabet of the source.


Furthermore, we may make two statements:
1. H(S) = 0, if and only if the probability pi = 1 for some i, and the
remaining probabilities in the set are all zero; this lower bound on
entropy corresponds to no uncertainty.
2. H(S) = log q, if and only if pi = 1/q for all i (i.e., all the symbols in the
alphabet are equiprobable); this upper bound on entropy corresponds to
maximum uncertainty.
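A minimal Python sketch of H(S) and its bounds (the helper name entropy() is ours):

import math

def entropy(probs, base=2):
    # H(S) = sum_i p_i * log(1/p_i); terms with p_i = 0 contribute nothing
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 = log2(4): upper bound for q = 4
print(entropy([1.0, 0.0, 0.0, 0.0]))       # 0.0: lower bound, no uncertainty
print(entropy([0.2, 0.8]))                 # ~0.7219 bits/SS: binary entropy h(0.2)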
Consider a binary source for which
symbol 0 occurs with probability p0 and
symbol 1 with probability p1 = 1 – p0.
The source is memoryless so that
successive symbols emitted by the
source are statistically independent.
The entropy of the binary source is usually called the entropy function:

h(p0) = p0 log(1/p0) + (1 − p0) log(1/(1 − p0))


We often find it useful to consider blocks rather than individual symbols, with each block consisting of n successive source symbols. We may view each such block as being produced by an extended source with a source alphabet that has q^n distinct blocks, where q is the number of distinct symbols in the source alphabet of the original source.
a) In the case of a discrete memoryless source, the source symbols are statistically independent. Hence, the probability of an extended-source symbol is equal to the product of the probabilities of the n original source symbols constituting that extended-source symbol. Thus, it is intuitive to expect that H(S^n), the entropy of the extended source, is equal to n times H(S), the entropy of the original source. That is, we may write

H(S^n) = n H(S)
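A minimal Python check of H(S^n) = n H(S) for the binary source of the problems below (P(s1) = 0.8, P(s2) = 0.2):

import itertools, math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def extension(probs, n):
    # n-th extension of a memoryless source: each block probability is the
    # product of the probabilities of its n constituent symbols
    return [math.prod(block) for block in itertools.product(probs, repeat=n)]

p = [0.8, 0.2]
print(entropy(p))                   # H(S)   ~ 0.7219 bits/SS
print(entropy(extension(p, 2)))     # H(S^2) ~ 1.4439 = 2 H(S)
print(entropy(extension(p, 3)))     # H(S^3) ~ 2.1658 = 3 H(S)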
Problems
1. Find the entropy of a 7-symbol source at uniform distribution.
(Answer: 2.81 bits of information/SS)
2. Given a five-symbol source with the probability distribution {1/2, 1/4, 1/8, 1/16, 1/16}, calculate the average amount of information per source symbol. (Answer: 1.875 bits/SS)
3. Given a 3-symbol, zero-memory source S (a, b, c) with joint information I(bc) = log(12) bits of information, find a possible probability distribution of the source S. (Answer: {5/12, 1/3, 1/4})
4. Consider a zero memory binary source S with P(s1) = 0.8 &
P(s2) = 0.2.
a) Construct 2nd and 3rd extensions of the source S.
b) Find the corresponding probability distribution of each extension.
c) Calculate the average amount of information per source symbol (H(S2) and
H(S3)).
Source Coding Theory
The process by which data generated by a discrete source (with a finite source alphabet) is represented efficiently is called source encoding. The device that performs this representation is called a source encoder. For the source encoder to be efficient, knowledge of the statistics of the source is required. In particular, if some source symbols are known to be more probable than others, then this feature may be exploited in the generation of a source code by assigning short code words to frequent source symbols and long code words to rare source symbols, in order to achieve a lower code rate (# of code symbols/sec.) and hence use less communication channel bandwidth in Hz for transmission, or fewer memory bits for storage. Such a source code is called a variable-length code.
Let r represent the code radix (the number of symbols in the code alphabet): r = 2 for a binary code, r = 8 for an octal code, r = 10 for a decimal code, and so on. Let j be a codeword length (# of code symbols per codeword) and nj the # of codewords of length j.
An efficient source encoder should satisfy two functional requirements:
1. The code words produced by the encoder are in binary form.
2. The source code is uniquely decodable, so that the original source sequence can
be reconstructed perfectly from the encoded binary sequence.
Prefix (Instantaneous) Code
(Entropy Code - Lossless Data Compression)
For a source variable length code to be of practical use, the code has to
be uniquely decodable (The code and all its extensions must be unique).
This restriction ensures that for each finite sequence of symbols emitted
by the source, the corresponding sequence of code words is unique and
different from the sequence of code words corresponding to any other
source sequence. A prefix (instantaneous) code (a subclass of uniquely decodable codes) is defined as a code in which no code word is the prefix of any other code word.

Only Code II is a prefix code, which is always a uniquely decodable code.

Code III is also a uniquely decodable code, since the bit 0 indicates the beginning of each code word, but it is not an instantaneous code. Each codeword of an instantaneous code can be decoded directly once it is completely received. (Code I is not uniquely decodable; for example, when 00 is received, it could be either s2 or s0 s0.)
Decision Tree
The shown decision tree is a graphical
representation of the code words which has
an initial state and four terminal states
corresponding to source symbols s0, s1, s2,
and s3. Source symbols must not be in
intermediate states to satisfy the prefix
condition. The decoder always
starts at the initial state. The first received bit
moves the decoder to the terminal state s0
if it is 0, or else to a second decision point if
it is 1. In the latter case, the second bit moves
the decoder one step further down the tree,
either to terminal state s2 if it is 0, or else to
a third decision point if it is 1, and so on.
Once each terminal state emits its symbol, the decoder is reset to its initial state. Note
also that each bit in the received encoded sequence is examined only once.
For example, the encoded sequence 1011111000 . . . is readily decoded as the source
sequence s1 s3 s2 s0 s0 . . .
Kraft-McMillan Inequality

Σ nj r^(−j) ≤ 1, summed over codeword lengths j = 1, 2, …, l
Where r is the code radix (number of symbols in the code alphabet, r =2 for
binary code), nj is the # of codewords of length j and l is the maximum
codeword length. Moreover, if a prefix code has been constructed for a discrete
memoryless source with source alphabet (s1, s2, . . . , sq) and source statistics
(P1, P2 , . . . , Pq) and the codeword for symbol si has length li, i = 1, 2, . . . , q,
then the codeword lengths must satisfy the above inequality known as the
Kraft-McMillan Inequality. It does not tell us that a source code is a prefix
code. Rather, it is merely a condition on the codeword lengths of the code and
not on the code words themselves. Referring to the three codes listed in Table 9.2: Code I violates the Kraft-McMillan inequality and therefore cannot be a prefix code, while the Kraft-McMillan inequality is satisfied by both Codes II and III; but only Code II is a prefix code.
Kraft-McMillan Inequality
Prefix codes are distinguished from other uniquely decodable codes by the fact
that the end of the code word is always recognizable. Hence, the decoding of a
prefix code can be accomplished as soon as the binary sequence representing a
source symbol is fully received. For this reason, prefix codes are also referred
to as instantaneous codes.
(Kraft-McMillan sums evaluated for Code I, Code II, and Code III.)
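A minimal numeric check of the inequality; the length set {1, 2, 3, 3} is that of the prefix code in the decision-tree example above (codewords 0, 10, 110, 111), while the second set is a hypothetical one chosen to violate the bound:

def kraft_sum(lengths, r=2):
    # Kraft-McMillan sum: sum over codewords of r^(-length)
    return sum(r ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0 <= 1 -> these lengths admit a prefix code
print(kraft_sum([1, 1, 2, 2]))   # 1.5 >  1 -> no uniquely decodable code has these lengths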
Coding Efficiency
Assume the source has an alphabet with q different symbols, and that the ith symbol si
occurs with probability pi , i = 1, 2,. . . , q. Let the binary code word assigned to
symbol si by the encoder have length li measured in binits.
Then the average code-word length, L, of the source encoder is defined as

L = Σ pi li, summed over i = 1, 2, …, q
In physical terms, the parameter L represents the average number of binits per source
symbol used in the source encoding process. Let Lmin denote the minimum possible value of L; then the coding efficiency of the source encoder is defined as

η = Lmin/ L

With L ≥ Lmin we clearly have η ≤1. The source encoder is said to be efficient when η
approaches unity.
Data Compaction
A common characteristic of signals generated by physical sources is that,
in their natural form, they contain a significant amount of information
that is redundant. The transmission of such redundancy is therefore
wasteful of primary communication resources. For efficient signal
transmission, the redundant information should be removed from the
signal prior to transmission.
This operation, with no loss of information, is ordinarily performed on a signal in digital form, in which case it is called data compaction or lossless data compression.
According to the source-coding theorem, the entropy H(S) represents a
fundamental limit on the removal of redundancy from the data. i.e. the
average number of bits per source symbol necessary to represent a
discrete memoryless source can be made as small as, but no smaller than,
the entropy H(S).
Thus with Lmin = H(S), the efficiency of a source encoder may be
rewritten in terms of the source entropy H(S) as
η = H(S)/ L
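As a small illustration (using the 3-symbol source and code of Problem 3 later in this chapter), the efficiency η = H(S)/L can be computed directly:

import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def avg_length(probs, lengths):
    # L = sum_i p_i * l_i, in binits per source symbol
    return sum(p * l for p, l in zip(probs, lengths))

probs, lengths = [0.8, 0.05, 0.15], [1, 2, 2]   # code a -> 0, b -> 10, c -> 11
L = avg_length(probs, lengths)
print(L)                       # 1.2 binits/SS
print(entropy(probs) / L)      # efficiency ~ 0.7368 (73.68%)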
Data Compaction
Problem
Find the efficiency of source codes I, II, and III.

(Codeword tables for Code I, Code II, and Code III.)
Huffman Code
An important class of prefix codes is known as Huffman codes. The Huffman code is by definition the most efficient code (highest possible efficiency without coding a source extension).
The radix-r Huffman algorithm proceeds as follows:
1. The source symbols are listed in order of decreasing probability.
2. The total # of source symbols q should equal b(r − 1) + 1 for some b = 0, 1, 2, 3, …; otherwise, dummy symbols with zero probability are appended to the end of the list.
3. The r source symbols of lowest probability are combined into a new source symbol with probability equal to the sum of the original r probabilities. The list of source symbols is therefore reduced in size by (r − 1). The probability of the new symbol is placed in the list in accordance with its value (keep the probabilities in descending order at all times).
4. The procedure is repeated until we are left with a final list of r combined symbols, to each of which a code symbol is assigned.
5. The code for each (original) source symbol is found by working backward and tracing the sequence of the code symbols assigned to that source symbol and to its successors.
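A minimal Python sketch of the binary (r = 2) case of this procedure, using a heap to repeatedly merge the two least probable entries; the 0/1 labels chosen at each merge and the tie-breaking order are arbitrary, so the bit patterns (but not the codeword lengths) may differ from the hand-built tables in the examples below.

import heapq, itertools

def huffman_binary(probs):
    # Repeatedly merge the two least probable entries and prepend
    # one code bit to every codeword in each merged branch.
    counter = itertools.count()          # breaks ties between equal probabilities
    heap = [(p, next(counter), {i: ""}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # lowest probability
        p1, _, c1 = heapq.heappop(heap)  # second lowest
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]                    # maps symbol index -> codeword

# Source of Example 2 below: probabilities {0.7, 0.1, 0.1, 0.05, 0.05}
code = huffman_binary([0.7, 0.1, 0.1, 0.05, 0.05])
print(code)                              # codeword lengths {1, 3, 3, 3, 3}, L = 1.6 binits/SS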
Example 1: Huffman Binary Code (HC)

Kraft-McMillan check: Σ r^(−li) = Σ 2^(−li) = 3 × 2^(−2) + 2 × 2^(−3) = 1 ⇒ the code is a prefix code

η = H(S)/L = 2.12/2.2 = 0.96
Example 2: Huffman Binary Code (HC)
Si    Pi      HC1
s1    0.7     0
s2    0.1     100
s3    0.1     101
s4    0.05    110
s5    0.05    111
Reduction stages for HC1: {s4, s5} → s45 (0.1, code 11); {s2, s3} → s23 (0.2, code 10); {s23, s45} → s2-5 (0.3, code 1); final list {s1 (0.7, code 0), s2-5 (0.3, code 1)}.

Si    Pi      HC2
s1    0.7     0
s2    0.1     11
s3    0.1     100
s4    0.05    1010
s5    0.05    1011
Reduction stages for HC2: {s4, s5} → s45 (0.1, code 101); {s3, s45} → s345 (0.2, code 10); {s2, s345} → s2-5 (0.3, code 1); final list {s1 (0.7, code 0), s2-5 (0.3, code 1)}.
Problem 1
Consider a zero memory binary source S with P(s1) = 0.8 & P(s2) = 0.2 :
a) Construct the 2nd and 3rd extensions of the source, find the corresponding probability distribution of each extension, and find its entropy.
b) Write down the binary code of the 2nd extension of the source [T ≡ S2] using each of the
following binary decision trees:

c) Find the average code word length L for each binary code.
d) Encode the following source symbol stream using each of the above binary code:
s2 s1 s1 s1 s1 s2 s2 s2 s1 s1
e) Calculate the binit rate in binits/sec. of each one if the source S emits 2000 symbols/sec.
Problem 2
Consider a zero-memory (statistically independent) binary source S with two source symbols s1 and s2. If P(s1) = 0.85, calculate:
a) The amount of information of source symbol s1 = I(s1) in bits of information.
b) The amount of information of source symbol s2 = I(s2) in bits of information.
c) The statistical average of information of the source S = H(S) in bits/source symbol.
d) The joint information of the events A = {s1s2} and B = {s1s1} in hartleys.
e) The conditional information of the event A = {s1/s2} in nats.
Problem 2 - Solution
Consider a zero-memory (statistically independent) binary source S with two source symbols s1 and s2. If P(s1) = 0.85, calculate:

a) The amount of information of source symbol s1 = I(s1) in bits of information:

I(s1) = log2(1/0.85) = 0.2345 bit

b) The amount of information of source symbol s2 = I(s2) in bits of information:

I(s2) = log2(1/0.15) = 2.737 bits

c) The statistical average of information of the source S = H(S) in bits/source symbol:

H(S) = 0.85 × 0.2345 + 0.15 × 2.737 = 0.61 bits/SS
d) The joint information of the events A = {s1s2} and B = {s1s1} in hartleys:
I(A) = log10(1/(0.85 × 0.15)) = log10(1/0.1275) = 0.8945 hartley
I(B) = log10(1/(0.85 × 0.85)) = log10(1/0.7225) = 0.1412 hartley

e) The conditional information of the event A = {s1/s2} in nats:

P(s1/s2) = P(s1) (by statistical independence)
I(A) = ln(1/0.85) = 0.1625 nat
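These values can be reproduced with a few lines of Python (math.log2, math.log10, and math.log give the bit, hartley, and nat measures, respectively):

import math

p1, p2 = 0.85, 0.15
print(math.log2(1 / p1))                              # I(s1) ~ 0.2345 bit
print(math.log2(1 / p2))                              # I(s2) ~ 2.737 bits
print(p1 * math.log2(1/p1) + p2 * math.log2(1/p2))    # H(S)  ~ 0.61 bits/SS
print(math.log10(1 / (p1 * p2)))                      # I(A)  ~ 0.8945 hartley
print(math.log10(1 / (p1 * p1)))                      # I(B)  ~ 0.1412 hartley
print(math.log(1 / p1))                               # I(s1/s2) ~ 0.1625 nat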
Problem 3
Consider 3-symbol, zero memory source S (a, b, c) with P(a) = 0.8 and P(b) = 0.05.
1) Encode the source S symbols using a binary code. Calculate the average code
length L.
2) Calculate the source entropy H(S). Calculate the code efficiency η = H(S)/L
3) Construct the second extension of the source [T ≡ S2] and find its probability
distribution.
4) Write down the binary code of the source (T) symbols using each of the following
binary decision trees:

5) Calculate the average code length of source (T) and the code efficiency for each
code (LI, 𝜂I, LII, 𝜂II)
6) Encode the following source symbol stream using each of the above binary code
(b a c c a a b b a c b a )
7) Calculate the binit rate in binits/sec. of each code if the source S emits 3000
symbols/sec.
Problem 3 - Solution
Consider 3-symbol, zero memory source S (a, b, c) with P(a) = 0.8 and P(b) = 0.05.
1) Encode the source S symbols using a binary code. Calculate the average code length L.

P(a) = 0.8
P(b) = 0.05
P(c) = 0.15 (the probabilities must sum to 1)

0.8    a    0
0.05   b    10
0.15   c    11

(Assigning a length-3 codeword to b or to c instead would give L = 0.8 + 2 × 0.05 + 3 × 0.15 = 1.35 or L = 0.8 + 3 × 0.05 + 2 × 0.15 = 1.25 binits/SS.)

L = 1 × 0.8 + 2 × 0.05 + 2 × 0.15 = 0.8 + 2 × 0.2 = 1.2 binits/SS


Problem 3 - Solution
2) Calculate the source entropy H(S). Calculate the code efficiency η = H(S)/L

H(S) = 0.8 log(1/0.8) + 0.05 log(1/0.05) + 0.15 log(1/0.15) = 0.884 bits/SS

η = H(S)/L = 0.884/1.2 = 73.68%

3) Construct the second extension of the source [T ≡ S2] and find its probability
distribution.

P(t1) = P(aa) = 0.8^2 = 0.64
P(t2) = P(ab) = 0.8 × 0.05 = 0.04
P(t3) = P(ac) = 0.8 × 0.15 = 0.12
P(t4) = P(bb) = 0.05^2 = 0.0025
P(t5) = P(ba) = 0.05 × 0.8 = 0.04
P(t6) = P(bc) = 0.05 × 0.15 = 0.0075
P(t7) = P(cc) = 0.15^2 = 0.0225
P(t8) = P(cb) = 0.15 × 0.05 = 0.0075
P(t9) = P(ca) = 0.15 × 0.8 = 0.12
Problem 3 - Solution
4) Write down the binary code of the source (T) symbols using each of the following
binary decision trees.
Problem 3 - Solution
5) Calculate the average code length of source (T) and the code efficiency for each code
(LI, 𝜂I, LII, 𝜂II)

Code I word lengths: {2, 2, 3, 3, 3, 4, 5, 6, 6}
L = 2 × 0.76 + 3 × 0.2 + 4 × 0.0225 + 5 × 0.0075 + 6 × 0.01 = 2.3075 binits/2SS
ηI = H(T)/L = 2H(S)/L = 2 × 0.884/2.3075 = 76.62%

Code II word lengths: {2, 3, 3, 3, 3, 3, 4, 5, 5}
L = 2 × 0.64 + 3 × 0.3425 + 4 × 0.0075 + 5 × 0.01 = 2.3875 binits/2SS
ηII = H(T)/L = 2H(S)/L = 2 × 0.884/2.3875 = 74.05%

6) Encode the following source symbol stream using each of the above binary codes:
baccaabbacba
T: t5 t7 t1 t4 t3 t5
Code I:  000 0100 11 010111 10 000
Code II: 000 011 10 11111 110 000
Problem 3 - Solution
7) Calculate the binit rate in binits/sec. of each code if the source S emits 3000
symbols/sec.

(binit rate = symbol rate × average code length; the extension T emits 3000/2 = 1500 symbols/sec)

Code I binit rate = 2.3075 × 1500 = 3.461 kb/sec
Code II binit rate = 2.3875 × 1500 = 3.581 kb/sec

Noteworthy that:
The binit rate without extension = 1.2 × 3000 = 3600 binits/sec = 3.6 kb/sec
Problem 4
Can an instantaneous (prefix) code be constructed with the following codeword lengths? Find the corresponding code using a decision tree for each eligible case.

a) {1,2,3,3,4,4,5,5}, r = 2
b) {1,1,2,2,3,3,4,4}, r = 3
c) {1,1,1,2,2,2,2}, r = 4
Problem 4 - Solution
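The eligibility of each case can be checked numerically with the Kraft-McMillan sum; a minimal sketch (drawing the decision trees for the eligible cases completes the solution):

def kraft_sum(lengths, r):
    # Kraft-McMillan sum: sum over codewords of r^(-length); <= 1 means feasible
    return sum(r ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3, 4, 4, 5, 5], r=2))   # 1.1875 > 1       -> no prefix code exists
print(kraft_sum([1, 1, 2, 2, 3, 3, 4, 4], r=3))   # 80/81 ~ 0.988 <= 1 -> feasible
print(kraft_sum([1, 1, 1, 2, 2, 2, 2], r=4))      # 1.0 <= 1          -> feasible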
Problem 5

A zero memory source S emits one of eight symbols randomly every 1 microsecond with probabilities
{0.13, 0.2, 0.16, 0.3, 0.07, 0.05, 0.03, 0.06}
1. Calculate the source entropy H(S).
2. Construct a Huffman binary code.
3. Calculate the code efficiency.
4. Find the encoder output average binit rate.
Problem 6

A zero memory source S emits one of five symbols randomly every 2 microseconds with probabilities
{0.25, 0.25, 0.2, 0.15, 0.15}
1. Calculate the source entropy H(S).
2. Construct a Huffman binary code.
3. Calculate the average length of this code.
4. Calculate the code efficiency.
5. Find the encoder output average binit rate.
Problem 7

A zero memory source S emits one of five symbols randomly every 2 microseconds with probabilities
{0.25, 0.25, 0.2, 0.15, 0.15}
1. Construct a Huffman ternary code.
2. Calculate the average length of this code.
3. Calculate the code efficiency.
4. Calculate the code redundancy (𝜸 = 1 − 𝜂).
Problem 8
If r ≥ 3, we may not have a sufficient number of symbols so that we can
combine them r at a time. In such a case, we add dummy symbols to the end of
the set of symbols. The dummy symbols have probability 0 and are inserted to
fill the tree. Since at each stage of the reduction, the number of symbols is
reduced by r − 1, we want the total number of symbols to be 1 + k(r − 1),
where k is the number of merges. Hence, we add enough dummy symbols so
that the total number of symbols is of this form. For example:

A zero memory source S emits one of six symbols randomly with probabilities
{0.25, 0.25, 0.2, 0.1, 0.1, 0.1}
1. Construct a Huffman ternary code.
2. Calculate the average length of this code.
3. Calculate the code efficiency.
4. Calculate the code redundancy (𝜸 = 1 − 𝜂).
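The required number of dummy symbols can be computed directly; a minimal sketch (the helper name num_dummy_symbols() is ours):

def num_dummy_symbols(q, r):
    # Smallest number of zero-probability symbols to append so that the
    # total symbol count has the form 1 + k*(r - 1)
    return (1 - q) % (r - 1)

print(num_dummy_symbols(6, 3))   # 1 -> 7 symbols = 1 + 3*(3 - 1), as in Problem 8
print(num_dummy_symbols(5, 3))   # 0 -> no dummy needed (Problem 7)
print(num_dummy_symbols(6, 4))   # 1 -> 7 symbols = 1 + 2*(4 - 1)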
Problem 9
Complete the following probability distribution of the second extension T of a zero-memory source S of 3 symbols {a, b & c}.
[Table: symbols of T with their probabilities, partially given; the listed entries include 0.25 and 0.01.]
1. Find the zero memory source S probability distribution.
2. Calculate the source entropy H(T).
3. Find the ternary Huffman code for the above source second extension T and calculate the code efficiency and redundancy. (Hint: you do not need to add a dummy symbol with zero probability.)
Code Variance
As a measure of the variability in codeword lengths of a source code, the variance of the codeword lengths about the average code-word length L over the ensemble of source symbols is defined as

σ² = Σ pk (lk − L)², summed over k = 0, 1, …, K − 1

where p0, p1, …, pK−1 are the source statistics and lk is the length of the code word assigned to source symbol sk. It is usually found that when a combined symbol is moved as high as possible, the resulting Huffman code has a significantly smaller variance σ² (which is better) than when it is moved as low as possible. On this basis, it is reasonable to choose the former Huffman code over the latter.
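A minimal sketch computing σ² for the two Huffman codes of Example 2 (codeword lengths {1, 3, 3, 3, 3} for HC1 and {1, 2, 3, 4, 4} for HC2, read off the tables above):

def code_variance(probs, lengths):
    # sigma^2 = sum_k p_k * (l_k - L)^2, with L the average codeword length
    L = sum(p * l for p, l in zip(probs, lengths))
    return sum(p * (l - L) ** 2 for p, l in zip(probs, lengths))

probs = [0.7, 0.1, 0.1, 0.05, 0.05]
print(code_variance(probs, [1, 3, 3, 3, 3]))   # HC1: sigma^2 = 0.84 (both codes have L = 1.6)
print(code_variance(probs, [1, 2, 3, 4, 4]))   # HC2: sigma^2 = 1.04, so HC1 is preferred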
