STAT343 - Module-I Notes
Module I (Weightage: 25%)
Introduction to Information Theory, Communication Process, Model for a Communication System, Intuitive idea of the Fundamental Theorem of Information Theory, A Measure of Uncertainty: Entropy, Properties of Entropy function, Joint and Conditional Entropy for Discrete Random Variables, Relative Entropy and Mutual Information, Relationship Between Relative Entropy and Mutual Information, Chain Rules for Entropy, Relative Entropy and Mutual Information.
Information:
The information conveyed by the occurrence of an event is a measure of how unexpected that event is. It satisfies
I ∝ log(1/p)
where p is the probability of the event. Information is inversely related to probability: the lower the probability, the higher the information.
Signal:
A signal is a physical quantity that carries information between at least two points. It is expressed as a function of the independent variable t (time). There are two types of signals:
a) Analog Signal: These are continuous signals, e.g. sine and cosine waves. They involve transmission of continuously varying values (an infinite number of levels) over a defined interval of time.
b) Digital/Discrete Signal: A digital signal takes on only a finite number of levels (for a binary digital signal, the two levels 0 and 1), while a discrete signal is defined only at discrete instants of time.
[Figure: a digital signal alternating between the levels 0 and 1, and a discrete signal with sample values (4.1, 2.5, 1.5, ...) at t = 0, 1, ..., 7.]
Modulator:
A modulator is a circuit that combines two different signals in such a way that they can later be pulled apart and the information recovered. In other words, a modulator is a device that performs modulation. Common types of modulation include:
• Digital baseband modulation: The aim of digital baseband modulation methods, or line coding, is to transfer a digital bit stream over a baseband channel, typically a non-filtered copper wire such as a serial bus or a wired local area network.
• Pulse modulation: The aim of pulse modulation methods is to transfer a narrowband analog signal, for example a phone call, over a wideband baseband channel or, in some of the schemes, as a bit stream over another digital transmission system.
(A channel is narrowband when the bandwidth of the message does not significantly exceed the coherence bandwidth of the channel, and wideband when the message bandwidth significantly exceeds it.)

Table 1: Conversion of decimal numbers into binary numbers
Decimal   Sum of powers of 2        Binary (2^8 2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0)
0         -                         0 0 0 0 0 0 0 0 0
1         2^0                       0 0 0 0 0 0 0 0 1
2         2^1                       0 0 0 0 0 0 0 1 0
3         2^1+2^0                   0 0 0 0 0 0 0 1 1
4         2^2                       0 0 0 0 0 0 1 0 0
5         2^2+2^0                   0 0 0 0 0 0 1 0 1
6         2^2+2^1                   0 0 0 0 0 0 1 1 0
7         2^2+2^1+2^0               0 0 0 0 0 0 1 1 1
8         2^3                       0 0 0 0 0 1 0 0 0
9         2^3+2^0                   0 0 0 0 0 1 0 0 1
10        2^3+2^1                   0 0 0 0 0 1 0 1 0
11        2^3+2^1+2^0               0 0 0 0 0 1 0 1 1
12        2^3+2^2                   0 0 0 0 0 1 1 0 0
13        2^3+2^2+2^0               0 0 0 0 0 1 1 0 1
14        2^3+2^2+2^1               0 0 0 0 0 1 1 1 0
15        2^3+2^2+2^1+2^0           0 0 0 0 0 1 1 1 1
16        2^4                       0 0 0 0 1 0 0 0 0
17        2^4+2^0                   0 0 0 0 1 0 0 0 1
18        2^4+2^1                   0 0 0 0 1 0 0 1 0
19        2^4+2^1+2^0               0 0 0 0 1 0 0 1 1
20        2^4+2^2                   0 0 0 0 1 0 1 0 0
21        2^4+2^2+2^0               0 0 0 0 1 0 1 0 1
22        2^4+2^2+2^1               0 0 0 0 1 0 1 1 0
23        2^4+2^2+2^1+2^0           0 0 0 0 1 0 1 1 1
24        2^4+2^3                   0 0 0 0 1 1 0 0 0
25        2^4+2^3+2^0               0 0 0 0 1 1 0 0 1
26        2^4+2^3+2^1               0 0 0 0 1 1 0 1 0
27        2^4+2^3+2^1+2^0           0 0 0 0 1 1 0 1 1
28        2^4+2^3+2^2               0 0 0 0 1 1 1 0 0
29        2^4+2^3+2^2+2^0           0 0 0 0 1 1 1 0 1
30        2^4+2^3+2^2+2^1           0 0 0 0 1 1 1 1 0
31        2^4+2^3+2^2+2^1+2^0       0 0 0 0 1 1 1 1 1
32        2^5                       0 0 0 1 0 0 0 0 0
...       ...                       ...
64        2^6                       0 0 1 0 0 0 0 0 0
...       ...                       ...
128       2^7                       0 1 0 0 0 0 0 0 0
...       ...                       ...
256       2^8                       1 0 0 0 0 0 0 0 0
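As a quick check of Table 1 (my own illustration, not part of the original notes; the helper names to_binary9 and powers_of_two are arbitrary), the following Python sketch reproduces a few rows of the table:

```python
# Minimal sketch: reproduce rows of Table 1 by converting decimals to 9-bit binary.
def to_binary9(n: int) -> str:
    """Return the 9-bit binary representation of 0 <= n <= 511."""
    return format(n, "09b")

def powers_of_two(n: int) -> str:
    """Express n as a sum of powers of 2, e.g. 10 -> '2^3+2^1'."""
    terms = [f"2^{k}" for k in range(n.bit_length() - 1, -1, -1) if n & (1 << k)]
    return "+".join(terms) if terms else "-"

for n in (5, 10, 31, 256):
    print(n, powers_of_two(n), to_binary9(n))
# e.g. 10 -> 2^3+2^1 -> 000001010, matching the corresponding row of the table
```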
Information Theory:
Information theory deals with the quantitative measurement and the transmission of information through a channel.
Transmission Goal:
1. Fast encoding of information.
2. Easy transmission of encoded messages.
3. Fast decoding of received messages.
4. Reliable correction of errors introduced in the channel.
5. Maximum transfer of information per unit time.
6. Security.
Block diagram of a digital communication system:
Transmitter: Analog input source → Transducer and A-to-D converter → Source encoder → Channel encoder → Digital modulator and D-to-A converter
→ Channel →
Receiver: Digital demodulator and A-to-D converter → Channel decoder → Source decoder → Transducer and D-to-A converter → Analog output
The figure shows the basic operation of a digital communication system. The source and the destination are two physically separate points. When the signal travels through the communication channel, noise interferes with it; because of this interference, a disturbed version of the input signal is received at the receiver, so the received signal may not be correct. At the channel encoder, extra (redundant) bits are added to protect the message against such errors.
A to D Converter: The analog-to-digital converter converts the analog signal into a discrete signal, which is not continuous in nature. The output of the analog-to-digital converter is therefore a discrete information source.
Information Rate: The information rate is
R = symbol rate × source entropy = r H(X)
and its unit is bits/second.
Source Encoder and Decoder: Symbols produced by the information source are given to the source encoder. These symbols cannot be transmitted directly; they are first converted into digital form (0s and 1s) by the source encoder. The source encoder assigns a codeword to each symbol, and for every distinct symbol there is a unique codeword. Codewords can be 4, 8, 16, 32, or 64 bits in length. For example,
8 bits give 2^8 = 256 distinct codewords.
Therefore, 8 bits can be used to represent 256 symbols.
Data Rate:
Data rate = symbol rate × codeword length
For example, with a symbol rate of 10 symbols/second and 8-bit codewords, data rate = 10 × 8 = 80 bits/second.
At the receiver, the source decoder performs the reverse operation of the source encoder: it converts the binary output of the channel decoder back into the symbol sequence.
Channel Encoder and Decoder: The communication channel adds some noise and interference to the signal being transmitted. Therefore, errors are introduced into the binary sequence received at the receiver end. To combat these errors, channel coding is done. The channel encoder adds some redundant binary bits to the input sequence. These redundant bits are added according to a properly defined rule.
For example, suppose the codeword from the source encoder is 3 bits long and 1 redundant bit is added to make it 4 bits long. This 4th bit is chosen as 1 or 0 so that the total number of 1s in the encoded word is even. This is called even parity.
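A minimal sketch of the even-parity rule just described (illustrative only; the helper names add_even_parity and check_even_parity are my own):

```python
# Append an even-parity bit so the total number of 1s in the codeword is even.
def add_even_parity(bits: str) -> str:
    parity = bits.count("1") % 2          # 1 if the number of 1s is currently odd
    return bits + str(parity)

def check_even_parity(word: str) -> bool:
    """A received word passes the check if its number of 1s is even."""
    return word.count("1") % 2 == 0

print(add_even_parity("101"))     # '1010' (two 1s already, so the parity bit is 0)
print(add_even_parity("110"))     # '1100'
print(check_even_parity("1011"))  # False: a single-bit error is detected
```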
Whenever the modulating signal is discrete (a binary codeword), a digital modulation technique is used. The carrier signal used by the digital modulator is always a continuous sinusoidal waveform of high frequency. The digital modulator maps the input sequence of 1s and 0s to an analog signal. If 1 bit at a time is to be transmitted, the digital modulator output is S1(t) to transmit '0' and S2(t) to transmit '1'.
The signal S1(t) has a lower frequency than S2(t). Thus, even though the modulated signal appears continuous, the modulation is discrete. On the receiver side, the digital demodulator converts the modulated signal back into the sequence of binary bits.
Communication Channel:
Types of communication channels:
• Wireless
• Wirelines
• Optical Fibers
• Pen Drives
• Optical Disks
• Magnetic Tapes
The connection between the transmitter and the receiver is established through the channel. Noise, attenuation, distortion and dispersion are introduced in the channel.
Units of Information:
Different units of information are defined for different bases of the logarithm:
Base ‘2’ - Bit
Base ‘e’ - Nat
Base ‘10’ - Decit or Hartley
The base-2 (binary) system is of practical importance; hence, when no base is mentioned, base 2 is assumed by default.
Units            Bit (base 2)                             Nat (base e)                          Decit (base 10)
Bit (base 2)     -                                        1 Bit = 1/log2(e) ≈ 0.693147 Nat      1 Bit = 1/log2(10) ≈ 0.301 Decit
Nat (base e)     1 Nat = 1/ln 2 = 1/loge(2) ≈ 1.443 Bit   -                                     1 Nat = 1/ln 10 ≈ 0.4342 Decit
Decit (base 10)  1 Decit = 1/log10(2) ≈ 3.3219 Bit        1 Decit = 1/log10(e) ≈ 2.302585 Nat   -
where ln 2 = loge 2.
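The conversion factors in the table can be checked numerically. The sketch below is my own illustration (the function name convert_information is arbitrary); it expresses every unit in nats internally:

```python
import math

LN2, LN10 = math.log(2), math.log(10)

def convert_information(value: float, frm: str, to: str) -> float:
    """Convert an amount of information between 'bit', 'nat' and 'decit'."""
    in_nats = {"bit": LN2, "nat": 1.0, "decit": LN10}   # one unit expressed in nats
    return value * in_nats[frm] / in_nats[to]

print(convert_information(1, "bit", "nat"))    # ~0.693147, as in the table
print(convert_information(1, "nat", "bit"))    # ~1.442695
print(convert_information(1, "decit", "bit"))  # ~3.321928
print(convert_information(1, "decit", "nat"))  # ~2.302585
```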
Information:
Mathematical presentation
I(x_i) = log₂[1 / P(x_i)] = log₂ 1 − log₂ P(x_i) = −log₂ P(x_i) ≈ −3.32 log₁₀ P(x_i)
where,
𝑥𝑖 = message or source, and 𝐼(𝑥𝑖 ) = information carried by message or source 𝑥𝑖
Q. If 𝐼(𝑥1 ) is the information carried by message 𝑥1 and 𝐼(𝑥2 ) is the information carried by
𝑥2 then prove that amount of information carried compositely due to 𝑥1 and 𝑥2 is
𝐼(𝑥1 , 𝑥2 ) = 𝐼(𝑥1 ) + 𝐼(𝑥2 ).
Solution: We know that I(x_i) = log₂[1/P(x_i)], so
I(x1) = log₂[1/P(x1)] and I(x2) = log₂[1/P(x2)].
Since the messages x1 and x2 are independent, the probability of the composite message is P(x1) × P(x2). Therefore, the information carried compositely due to x1 and x2 is
I(x1, x2) = log₂[1 / (P(x1) ∙ P(x2))] = log₂[(1/P(x1)) × (1/P(x2))]
          = log₂[1/P(x1)] + log₂[1/P(x2)]
          = I(x1) + I(x2)
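A short numerical illustration of the last two results (self-information and its additivity for independent messages); this sketch is mine and not part of the notes:

```python
import math

def self_information(p: float) -> float:
    """I(x) = log2(1/p) = -log2(p), in bits."""
    return -math.log2(p)

p1, p2 = 0.5, 0.125
print(self_information(p1))        # 1.0 bit
print(self_information(p2))        # 3.0 bits
# Independent messages: probabilities multiply, so the information adds.
print(self_information(p1 * p2))   # 4.0 bits = 1.0 + 3.0
```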
Entropy (H):
1. Entropy is defined as average information per message.
2. It is a measure of uncertainty.
3. It is a quantitative measure of the disorder of a system and is inversely related to the amount of energy available to do work in an isolated system.
4. High entropy indicates less energy available for work, and low entropy indicates more energy available for work.
5. Entropy is maximum when all the messages are equiprobable.
Mathematically,
H(X) = ∑_{i=1}^{m} P(x_i) I(x_i) = −∑_{i=1}^{m} P(x_i) log P(x_i)
Entropy does not take negative values.
Example: For a source with probabilities 0.1, 0.2, 0.3 and 0.4,
H(X) = −[P(x1) log P(x1) + P(x2) log P(x2) + P(x3) log P(x3) + P(x4) log P(x4)]
     = −[0.1 × log(0.1) + 0.2 × log(0.2) + 0.3 × log(0.3) + 0.4 × log(0.4)]
     = −[0.1(log 1 − log 10) + 0.2(log 1 − log 5) + 0.3(log 3 − log 10) + 0.4(log 2 − log 5)]
     ≈ 1.8464 bits/message
(The change of base formula is log_a x = log_b x / log_b a.)
Q. A source has symbols with probabilities 0.25, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05 and 0.05. Find the entropy.
Solution: Here 0.25 appears once, 0.2 appears twice, 0.1 appears twice and 0.05 appears three times. Now
H(X) = ∑_{i=1}^{8} P(x_i) log₂[1/P(x_i)] = −∑_{i=1}^{8} P(x_i) log₂ P(x_i)
= 2.741446 bits/message
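The value can be verified with a few lines of Python (my own check, not part of the notes):

```python
import math

def entropy(probs) -> float:
    """H(X) = -sum p*log2(p), skipping zero-probability symbols."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.25, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05, 0.05]
print(entropy(probs))   # ~2.741446 bits/message, matching the answer above
```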
Q. Find the entropy of a binary source that emits the symbols '0' and '1' with equal probability.
Solution: Let X be a binary source which is random in nature. It emits independent symbols '0' and '1' with equal probabilities, i.e. P(0) = P(1) = 1/2 = P, and the source entropy is given by
H(X) = −[½ log ½ + (1 − ½) log(1 − ½)]
     = −[½ log ½ + ½ log ½]
     = −log ½
     = −[log 1 − log 2]
     = −[0 − 1]
     = 1 bit/message
[Figure: the binary entropy H_binary plotted against P(x1); it is zero at P(x1) = 0 and P(x1) = 1 and reaches its maximum at P(x1) = ½.]
Entropy goes to 0 (zero) when either message has probability 1, i.e. P(x1) = 1 or P(x2) = 1.
If P(x1) = 1, message x1 will be sent all the time with probability 1.
If P(x1) = 0, then message x2 will be sent all the time. In these cases no information is transmitted by sending the message. Therefore, entropy is maximum when the messages are equally likely.
Entropy is maximum for equally likely messages x1 and x2:
Let P(x1) = P, so that H = −[P log P + (1 − P) log(1 − P)]. For H to be maximum,
dH/dP = 0
Now
dH/dP = −[P/P + log P − (1 − P)/(1 − P) − log(1 − P)]
Putting dH/dP = 0:
⇒ −[P/P + log P − (1 − P)/(1 − P) − log(1 − P)] = 0 ⇒ −[1 + log P − 1 − log(1 − P)] = 0
⇒ −log P + log(1 − P) = 0 ⇒ log[(1 − P)/P] = 0
⇒ (1 − P)/P = 1, i.e. P = ½
H_max = −[½ log ½ + (1 − ½) log(1 − ½)]
      = −[½{log 1 − log 2} + ½{log 1 − log 2}]
      = 1 bit/message
If the source is transmitting three messages, then dH/dP = 0 gives P = 1/3, i.e. the entropy of a source is maximum when all the messages from the source are equally likely.
If the source is transmitting m messages, then for H_max, P(m1) = P(m2) = P(m3) = ⋯ = 1/m. Therefore, in that case
H_max = −[(1/m) log₂(1/m) + (1/m) log₂(1/m) + ⋯] = (m/m) log₂ m = log₂ m bits/message.
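An illustrative sketch (mine, not from the notes) showing that a two-symbol source has maximum entropy at P = ½ and that m equally likely messages give H = log₂ m:

```python
import math

def entropy(probs) -> float:
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Binary source: H is largest at P = 1/2, where it equals log2(2) = 1 bit.
for P in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(P, round(entropy([P, 1 - P]), 4))

# m equally likely messages: H = log2(m).
for m in (2, 4, 8):
    print(m, entropy([1 / m] * m), math.log2(m))
```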
Properties of Entropy:
1) 0 ≤ 𝐻(𝑋) ≤ log 2 𝑚, where m is the number of symbols of the alphabet of source ‘X’.
2) When all the events are equally likely, the average uncertainty takes its largest value, i.e. H(X) = log₂ m; in general H(X) ≤ log₂ m.
4) H(X) = 0, if all the P(X𝑖 ) are zero except for one symbol with 𝑝 = 1.
Example: For the source considered earlier, with
P(x1) = 0.25, P(x2) = 0.2, P(x3) = 0.2, P(x4) = 0.1, P(x5) = 0.1, P(x6) = 0.05, P(x7) = 0.05, P(x8) = 0.05,
H(X) = −[0.25 log 0.25 + 0.2 log 0.2 + 0.2 log 0.2 + 0.1 log 0.1 + 0.1 log 0.1 + 0.05 log 0.05 + 0.05 log 0.05 + 0.05 log 0.05]
     = −[0.25 log 0.25 + 0.4 log 0.2 + 0.2 log 0.1 + 0.15 log 0.05]
     = −[0.25 log(1/4) + 0.4 log(1/5) + 0.2 log(1/10) + 0.15 log(1/20)]
     = −[0.25{log 1 − log 4} + 0.4{log 1 − log 5} + 0.2{log 1 − log 10} + 0.15{log 1 − log 20}]
     = 2.741446 bits/message
Information rate: R = r H(X). With a message bandwidth f_m = 5 kHz, the message rate is
r = 2 f_m = 2 × 5 kHz = 10 kHz = 10000 messages/sec,
so for this source R = r H(X) = 10000 × 2.741446 ≈ 27414.5 bits/sec.
Example (discrete random variable): the number of heads in three tosses of a fair coin.
Number of Heads:  0     1     2     3
Probability:      1/8   3/8   3/8   1/8
Here, the total outcomes are {TTT, HTT, THT, TTH, HHT, HTH, THH, HHH}.
A discrete random variable has a countable number of possible values. In this example the number of heads can only take the values 0, 1, 2, 3, so the variable is discrete; it is a valid random variable since the probabilities sum to 1.
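A small sketch (my own illustration) that rebuilds this distribution by enumerating the 8 equally likely outcomes and, as an extra check not done in the notes, evaluates the entropy of the number of heads:

```python
import math
from collections import Counter
from itertools import product

# Enumerate the 8 outcomes of three fair coin tosses and count the heads.
counts = Counter(seq.count("H") for seq in product("HT", repeat=3))
probs = {k: v / 8 for k, v in sorted(counts.items())}
print(probs)   # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

# Entropy of the number of heads: ~1.811 bits.
H = -sum(p * math.log2(p) for p in probs.values())
print(H)
```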
Information Source:
An information source is characterized by its type (continuous or discrete) and its classification (memoryless or with memory).
1) Type:
A continuous source has an infinite number of possible output symbols, whereas a discrete source has a finite number of possible output symbols.
2) Classification:
A memoryless information source is one whose output depends only on the present input and not on previous inputs. In a source with memory, the output depends on the present as well as past values of the input.
Communication Channel:
[Figure: discrete memoryless channel. The transmitter sends source symbols X = {x1, x2, ..., xm}; the receiver observes Y = {y1, y2, ..., yn}; the channel is described by the transition probabilities P(y_j|x_i).]
If the alphabets X and Y are infinite, then the channel is a continuous channel. The discrete memoryless channel shown in the figure has m inputs (x1, x2, ..., xm) and n outputs (y1, y2, ..., yn).
The channel is represented by the conditional probabilities P(y_j|x_i), where P(y_j|x_i) is the conditional probability of obtaining the output y_j when the input x_i is sent; it is called the channel transition probability. A channel is therefore completely characterized by its channel transition probabilities, arranged in the form of a channel matrix:
[P(Y|X)] = [ P(y1|x1)  P(y2|x1)  ⋯  P(yn|x1)
             P(y1|x2)  P(y2|x2)  ⋯  P(yn|x2)
             ⋮          ⋮              ⋮
             P(y1|xm)  P(y2|xm)  ⋯  P(yn|xm) ]
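A minimal sketch (mine, not from the notes) of how a channel matrix can be stored and sanity-checked: every row of [P(Y|X)] must sum to 1, and scaling row i by P(x_i) gives the joint matrix. The numbers are taken from the worked example further below:

```python
# Each row i holds P(y_j | x_i); every row of a channel matrix must sum to 1.
P_Y_given_X = [
    [0.8, 0.2, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.3, 0.7],
]

for i, row in enumerate(P_Y_given_X):
    assert abs(sum(row) - 1.0) < 1e-9, f"row {i} does not sum to 1"
print("valid channel matrix")

# With input probabilities P(X), the joint matrix is P(x_i, y_j) = P(x_i) * P(y_j|x_i).
P_X = [0.3, 0.4, 0.3]
P_XY = [[P_X[i] * p for p in row] for i, row in enumerate(P_Y_given_X)]
print(P_XY)   # ≈ [[0.24, 0.06, 0], [0, 0.40, 0], [0, 0.09, 0.21]]
```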
Mutual Information:
The information gained about x_i by the reception of y_j, i.e. the net reduction in its uncertainty, is called the mutual information.
𝑰(𝒙𝒊 , 𝒚𝒋 ) = 𝑰𝒏𝒊𝒕𝒊𝒂𝒍 𝑼𝒏𝒄𝒆𝒓𝒕𝒂𝒊𝒏𝒕𝒚 – 𝑭𝒊𝒏𝒂𝒍 𝑼𝒏𝒄𝒆𝒓𝒕𝒂𝒊𝒏𝒕𝒚
I(x_i, y_j) = −log P(x_i) − [−log P(x_i|y_j)] = log P(x_i|y_j) − log P(x_i)
            = log[ P(x_i|y_j) / P(x_i) ] = log[ P(x_i, y_j) / (P(x_i) ∙ P(y_j)) ]          [since P(A|B) = P(A,B)/P(B)]
            = log[ P(y_j|x_i) / P(y_j) ]
𝐼(𝑥𝑖 , 𝑦𝑗 ) = 𝐼(𝑦𝑗 , 𝑥𝑖 )
Mutual information is symmetrical in nature.
For a received symbol y_j, the conditional (a posteriori) probabilities of the transmitted symbols form the row vector
P[X|y_j] = [P(x1|y_j), P(x2|y_j), ..., P(xm|y_j)]
         = [P(x1, y_j)/P(y_j), P(x2, y_j)/P(y_j), ..., P(xm, y_j)/P(y_j)]          … (2)
Therefore, ∑_{i=1}^{m} P(x_i|y_j) = 1.
Hence, the sum of the elements of the matrix given by equation (2) is unity.
Derivations:
H(X|Y) = −∑_{i=1}^{m} ∑_{j=1}^{n} P(x_i, y_j) log₂ P(x_i|y_j)          … (3)
H(Y|X) = −∑_{i=1}^{m} ∑_{j=1}^{n} P(x_i, y_j) log₂ P(y_j|x_i)          … (4)
Proof of equation (3):
H(X|Y) = ∑_{j=1}^{n} P(y_j) H(X|y_j)
       = −∑_{j=1}^{n} P(y_j) ∑_{i=1}^{m} P(x_i|y_j) log₂ P(x_i|y_j)
       = −∑_{i=1}^{m} ∑_{j=1}^{n} P(y_j) P(x_i|y_j) log₂ P(x_i|y_j)
       = −∑_{i=1}^{m} ∑_{j=1}^{n} P(x_i, y_j) log₂ P(x_i|y_j)          Hence proved.
Proof of equation (4):
H(Y|X) = ∑_{i=1}^{m} P(x_i) H(Y|x_i)
       = −∑_{i=1}^{m} P(x_i) ∑_{j=1}^{n} P(y_j|x_i) log₂ P(y_j|x_i)
       = −∑_{i=1}^{m} ∑_{j=1}^{n} P(x_i) P(y_j|x_i) log₂ P(y_j|x_i)
       = −∑_{i=1}^{m} ∑_{j=1}^{n} P(x_i, y_j) log₂ P(y_j|x_i)          Hence proved.
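A compact sketch (my own, not from the notes) that evaluates equations (3) and (4) directly from a joint probability matrix; here it is applied to the joint matrix of Q1 below, so the printed values are approximate:

```python
import math

def conditional_entropies(P_XY):
    """Return H(X|Y) and H(Y|X) in bits from the joint matrix P(x_i, y_j)."""
    P_X = [sum(row) for row in P_XY]                 # marginal of X (row sums)
    P_Y = [sum(col) for col in zip(*P_XY)]           # marginal of Y (column sums)
    H_X_given_Y = -sum(p * math.log2(p / P_Y[j])
                       for i, row in enumerate(P_XY)
                       for j, p in enumerate(row) if p > 0)
    H_Y_given_X = -sum(p * math.log2(p / P_X[i])
                       for i, row in enumerate(P_XY)
                       for j, p in enumerate(row) if p > 0)
    return H_X_given_Y, H_Y_given_X

# Joint matrix taken from Q1 further below in these notes.
P_XY = [[0.3, 0.05, 0.0], [0.0, 0.25, 0.0], [0.0, 0.15, 0.05], [0.0, 0.05, 0.15]]
print(conditional_entropies(P_XY))   # roughly (1.00, 0.53) bits
```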
Chain Rule:
1. 𝐻(𝑋, 𝑌) = 𝐻(𝑋|𝑌) + 𝐻(𝑌)
2. 𝐻(𝑋, 𝑌) = 𝐻(𝑌|𝑋) + 𝐻(𝑋)
Proof:
1. H(X,Y) = −∑_{i=1}^{m} ∑_{j=1}^{n} P(x_i, y_j) log P(x_i, y_j)
          = −∑_{i=1}^{m} ∑_{j=1}^{n} P(x_i, y_j) log[P(x_i|y_j) ∙ P(y_j)]
          = −∑_{i=1}^{m} ∑_{j=1}^{n} P(x_i, y_j) [log P(x_i|y_j) + log P(y_j)]
          = −∑_{i=1}^{m} ∑_{j=1}^{n} [P(x_i, y_j) log P(x_i|y_j) + P(x_i, y_j) log P(y_j)]
          = H(X|Y) − ∑_{j=1}^{n} [∑_{i=1}^{m} P(x_i, y_j)] log P(y_j)
          = H(X|Y) − ∑_{j=1}^{n} P(y_j) log P(y_j)          [∵ P(y_j) = ∑_{i=1}^{m} P(x_i, y_j)]
   ⇒ H(X,Y) = H(X|Y) + H(Y).
2. H(X,Y) = −∑_{i=1}^{m} ∑_{j=1}^{n} P(x_i, y_j) log P(x_i, y_j)
          = −∑_{i=1}^{m} ∑_{j=1}^{n} P(x_i, y_j) log[P(y_j|x_i) ∙ P(x_i)]
          = −∑_{i=1}^{m} ∑_{j=1}^{n} P(x_i, y_j) [log P(y_j|x_i) + log P(x_i)]
          = −∑_{i=1}^{m} ∑_{j=1}^{n} [P(x_i, y_j) log P(y_j|x_i) + P(x_i, y_j) log P(x_i)]
          = H(Y|X) − ∑_{i=1}^{m} [∑_{j=1}^{n} P(x_i, y_j)] log P(x_i)
          = H(Y|X) − ∑_{i=1}^{m} P(x_i) log P(x_i)          [∵ P(x_i) = ∑_{j=1}^{n} P(x_i, y_j)]
   ⇒ H(X,Y) = H(Y|X) + H(X).
Relation Between Mutual Information and Entropy:
I(X;Y) = ∑_{i=1}^{m} ∑_{j=1}^{n} P(x_i, y_j) I(x_i, y_j)
       = ∑_{i=1}^{m} ∑_{j=1}^{n} P(x_i, y_j) log[ P(x_i|y_j) / P(x_i) ]
       = ∑_{i=1}^{m} ∑_{j=1}^{n} P(x_i, y_j) [log P(x_i|y_j) − log P(x_i)]
       = −∑_{i=1}^{m} ∑_{j=1}^{n} P(x_i, y_j) log P(x_i) − [−∑_{i=1}^{m} ∑_{j=1}^{n} P(x_i, y_j) log P(x_i|y_j)]
       = −∑_{i=1}^{m} [∑_{j=1}^{n} P(x_i, y_j)] log P(x_i) − H(X|Y)
       = −∑_{i=1}^{m} P(x_i) log P(x_i) − H(X|Y)
       = H(X) − H(X|Y)
       = H(X) − [H(X,Y) − H(Y)]
       = H(X) + H(Y) − H(X,Y)
       = H(Y) − H(Y|X)
Mutual information I(X;Y) does not depend on the individual symbols x_i and y_j; it is a property of the whole communication system. On the other hand, I(x_i, y_j) depends upon the individual symbols x_i and y_j.
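A short illustrative check (mine, not from the notes) that the three expressions for I(X;Y) derived above agree, using an arbitrary joint distribution as a test case:

```python
import math

def H(probs) -> float:
    """Entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Arbitrary joint distribution P(x_i, y_j), used purely as a test case.
P_XY = [[0.2, 0.1], [0.05, 0.25], [0.3, 0.1]]
P_X = [sum(row) for row in P_XY]
P_Y = [sum(col) for col in zip(*P_XY)]

H_X, H_Y = H(P_X), H(P_Y)
H_XY = H([p for row in P_XY for p in row])
H_X_given_Y = H_XY - H_Y          # chain rule
H_Y_given_X = H_XY - H_X

# All three forms of the mutual information coincide (up to rounding).
print(H_X - H_X_given_Y)
print(H_Y - H_Y_given_X)
print(H_X + H_Y - H_XY)
```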
Q1. A transmitter has an alphabet of four letters [𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 ] and the receiver has an alphabet
of three letters [𝑦1 , 𝑦2 , 𝑦3 ]. The joint probability matrix is:
[P(X,Y)] = [ 0.3   0.05   0
             0     0.25   0
             0     0.15   0.05
             0     0.05   0.15 ].
Calculate the entropies.
Solution: We have
              y1     y2     y3
[P(X,Y)] = x1 [ 0.3    0.05   0    ]
           x2 [ 0      0.25   0    ]
           x3 [ 0      0.15   0.05 ]
           x4 [ 0      0.05   0.15 ]
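The arithmetic is not carried out in the notes; as a hedged completion (my own computation), the sketch below finds the marginals and the entropies for this joint matrix. The printed values are approximate:

```python
import math

def H(probs) -> float:
    return -sum(p * math.log2(p) for p in probs if p > 0)

P_XY = [[0.3, 0.05, 0.0], [0.0, 0.25, 0.0], [0.0, 0.15, 0.05], [0.0, 0.05, 0.15]]
P_X = [sum(row) for row in P_XY]               # [0.35, 0.25, 0.2, 0.2]
P_Y = [sum(col) for col in zip(*P_XY)]         # [0.3, 0.5, 0.2]

H_X = H(P_X)                                   # ~1.959 bits
H_Y = H(P_Y)                                   # ~1.485 bits
H_XY = H([p for row in P_XY for p in row])     # ~2.490 bits
print(H_X, H_Y, H_XY)
print(H_XY - H_Y, H_XY - H_X)                  # H(X|Y) ~1.005, H(Y|X) ~0.532
```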
Q3. Find the mutual information for the channel whose joint probability matrix is:
[P(X,Y)] = [ 0.125   0.075   0.05
             0.125   0.075   0.05
             0.125   0.075   0.05
             0.125   0.075   0.05 ].
Solution: 𝐼(𝑋, 𝑌) = 𝐻(𝑋) + 𝐻(𝑌) − 𝐻(𝑋, 𝑌)
[𝑃(𝑋)] = [0.25 0.25 0.25 0.25] and [𝑃(𝑌)] = [0.5 0.3 0.2]
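The notes stop here. As an illustrative completion (mine): every entry satisfies P(x_i, y_j) = P(x_i)P(y_j), so X and Y are independent and the mutual information is zero, which the sketch below confirms numerically:

```python
import math

def H(probs) -> float:
    return -sum(p * math.log2(p) for p in probs if p > 0)

P_XY = [[0.125, 0.075, 0.05]] * 4               # all four rows are identical
P_X = [sum(row) for row in P_XY]                # [0.25, 0.25, 0.25, 0.25]
P_Y = [sum(col) for col in zip(*P_XY)]          # [0.5, 0.3, 0.2]

I = H(P_X) + H(P_Y) - H([p for row in P_XY for p in row])
print(I)   # ~0.0 bits: the rows are proportional, so X and Y are independent
```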
Relative Entropy:
The relative entropy (Kullback–Leibler distance) between two probability mass functions p(x) and q(x) is
D(p||q) = ∑_x p(x) log[ p(x)/q(x) ] = E_p[ log( p(x)/q(x) ) ]
with the conventions 0 log(0/0) = 0, 0 log(0/q) = 0 and p log(p/0) = ∞.
Interpretation: The relative entropy is a measure of the distance between two distributions.
In statistics, it arises as an expected logarithm of likelihood ratio.
Relative entropy is a measure of the inefficiency of assuming that the distribution is q when
the true distribution is p.
The mutual information is the relative entropy between the joint distribution and the product of the marginals:
I(X;Y) = D[ p(x,y) || p(x)p(y) ] = E_{p(x,y)}[ log( p(X,Y) / (p(X)p(Y)) ) ]
The conditional relative entropy is defined as
D[ p(y|x) || q(y|x) ] = E_{p(x,y)}[ log( p(Y|X) / q(Y|X) ) ]
Now the chain rule for relative entropy is
D[p(x,y)||q(x,y)] = D[p(x)||q(x)] + D[p(y|x)||q(y|x)]
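A minimal sketch (my own, not from the notes) of computing D(p||q) for two discrete distributions; it illustrates that the relative entropy is non-negative, is zero when the distributions coincide, and is not symmetric:

```python
import math

def kl_divergence(p, q) -> float:
    """D(p||q) = sum p(x) log2(p(x)/q(x)), using the convention 0*log(0/q) = 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]
print(kl_divergence(p, q))   # > 0
print(kl_divergence(q, p))   # a different value: D(p||q) != D(q||p) in general
print(kl_divergence(p, p))   # 0.0: the relative entropy vanishes only when p = q
```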
Channel Capacity:
The channel capacity is the maximum of the mutual information over all input distributions, denoted by C:
C = max I(X;Y)
  = max[H(X) − H(X|Y)]
  = max[H(Y) − H(Y|X)]
Mutual information is the difference between two entropies, so C is measured in bits/symbol (or bits/message); when multiplied by the symbol rate, the channel capacity can also be expressed in bits/sec.
1. Lossless Channel:
[Channel diagram: each output symbol is produced by only one input symbol; e.g. x2 → y3 (1/3), x2 → y4 (2/3), x3 → y5 (1).]
For a lossless channel 𝐻(𝑋|𝑌) = 0. The channel capacity per symbol for a lossless
channel is given by 𝐶 = 𝑚𝑎𝑥[𝐻(𝑋)] = log 2 𝑚. Also, 𝐼(𝑋; 𝑌) = 𝐻(𝑋) − 𝐻(𝑋|𝑌) =
𝐻(𝑋) − 0 = 𝐻(𝑋). The mutual information 𝐼(𝑋; 𝑌) is equal to the source entropy
𝐻(𝑋) and no source information is lost in transmission.
2. Deterministic Channel:
[Channel diagram: x1 → y1, x2 → y1, x3 → y2, x4 → y2, x5 → y3, each with probability 1.]
Here, H(Y|X) = 0, while H(X) and H(Y) need not be equal. When a given source symbol is transmitted over a deterministic channel, it is certain which output symbol will be received.
3. Noiseless Channel:
[Channel diagram: x1 → y1, x2 → y2, x3 → y3, x4 → y4, each with probability 1.]
4. Binary Symmetric Channel (BSC): A BSC has two inputs {𝑥1 = 0, 𝑥2 = 1} and two
outputs {𝑦1 = 0, 𝑦2 = 1}, here 𝑚 = 𝑛 = 2. The channel matrix is
              y1     y2
[P(Y|X)] = x1 [ p      1−p ]  =  [ p   q ]
           x2 [ 1−p    p   ]     [ q   p ]
where, p is the transition probability.
Signal flow graph (channel diagram):
[x1 → y1 with probability p and x1 → y2 with probability 1−p; x2 → y2 with probability p and x2 → y1 with probability 1−p.]
It is the most common and widely used channel. It is called a Binary Symmetric Channel because the probability of receiving a '1' when a '0' is sent is the same as the probability of receiving a '0' when a '1' is sent.
5. Binary Erasure Channel (BEC):
[Channel diagram: input 0 is received as 0 with probability p and as the erasure symbol y with probability q; input 1 is received as 1 with probability p and as y with probability q.]
The BEC is one of the important types of channels used in digital communication. Whenever an error occurs, the symbol is received as y and no decision is made about the information; instead, an immediate request is made for retransmission, rejecting what has been received. This ensures 100% correct data recovery. This channel is also a type of symmetric channel.
𝑃(𝑥1 = 0) = 𝛼, 𝑃(𝑥2 = 1) = 1 − 𝛼
H(Y|X) = p log(1/p) + q log(1/q)
H(X) = α log(1/α) + (1 − α) log(1/(1 − α))
The joint probability matrix [P(X,Y)] can be found by multiplying the rows of [P(Y|X)] by α and (1 − α) respectively:
[P(X,Y)] = [ αp    αq        0
             0     (1−α)q    (1−α)p ]
H(X|Y) = −[P(x1,y1) log P(x1|y1) + P(x1,y2) log P(x1|y2) + ⋯ + P(x2,y3) log P(x2|y3)]
       = q H(X) = (1 − p) H(X)
[Figure: two binary symmetric channels in cascade, X → Y → Z, with transition probabilities p, q in the first stage and p′, q′ in the second stage.]
A channel is symmetric if the rows and the columns of the channel matrix [P(Y|X)] are separately identical except for permutation. If the channel matrix is square, then for a symmetric channel the rows and the columns contain the same set of entries, so in particular the sum of each column equals the sum of each row. For example,
a) [P(Y|X)] = [ 1/2   1/4   1/4
                1/4   1/2   1/4
                1/4   1/4   1/2 ]
This is a symmetric channel, as the rows and the columns are identical except for permutations (each row and each column contains one ½ and two ¼).
b) [P(Y|X)] = [ 1/2   1/4   1/4
                1/4   1/4   1/2
                1/2   1/4   1/4 ]
This is not a symmetric channel: although the rows are identical except for permutations, the columns are not identical.
Q. A discrete source transmits messages x1, x2, x3 with the probabilities 0.3, 0.4, 0.3
respectively. The source is connected to the channel as given in the diagram. Calculate
all entropies.
[Channel diagram: x1 → y1 (0.8), x1 → y2 (0.2), x2 → y2 (1), x3 → y2 (0.3), x3 → y3 (0.7).]
Solution: It is given that [𝑃(𝑋)] = [𝑝(𝑥1 ) 𝑝(𝑥2 ) 𝑝(𝑥3 )] = [0.3 0.4 0.3]
The conditional probability (channel) matrix for the given diagram is
              y1     y2     y3
[P(Y|X)] = x1 [ 0.8    0.2    0   ]
           x2 [ 0      1      0   ]
           x3 [ 0      0.3    0.7 ]
The joint probability matrix [𝑃(𝑋, 𝑌)] can be found by multiplying the rows of [𝑃(𝑌|𝑋)] by
0.3, 0.4 and 0.3 respectively.
0.24 0.06 0
[P(𝑋, 𝑌)] = [ 0 0.40 0 ]
0 0.09 0.21
H(X|Y) = −[P(x1,y1) log P(x1|y1) + P(x1,y2) log P(x1|y2) + ⋯ + P(x3,y3) log P(x3|y3)]
       = −[0.24 log 1 + 0.06 log(6/55) + 0.4 log(8/11) + 0.09 log(9/55) + 0.21 log 1]
       = −[0 + 0.06(log 6 − log 55) + 0.4(log 8 − log 11) + 0.09(log 9 − log 55) + 0]
       ≈ 0.6106 bits/message
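The remaining entropies are not evaluated in the notes; as a hedged completion (my own arithmetic), the sketch below computes all of them and the mutual information for this example. The printed values are approximate:

```python
import math

def H(probs) -> float:
    return -sum(p * math.log2(p) for p in probs if p > 0)

P_X = [0.3, 0.4, 0.3]
P_Y_given_X = [[0.8, 0.2, 0.0], [0.0, 1.0, 0.0], [0.0, 0.3, 0.7]]
P_XY = [[P_X[i] * p for p in row] for i, row in enumerate(P_Y_given_X)]
P_Y = [sum(col) for col in zip(*P_XY)]         # [0.24, 0.55, 0.21]

H_X = H(P_X)                                   # ~1.571 bits
H_Y = H(P_Y)                                   # ~1.441 bits
H_XY = H([p for row in P_XY for p in row])     # ~2.052 bits
print(H_X, H_Y, H_XY)
print(H_XY - H_Y)                              # H(X|Y) ~0.611, matching the value above
print(H_XY - H_X)                              # H(Y|X) ~0.481
print(H_X + H_Y - H_XY)                        # I(X;Y) ~0.960 bits
```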
Q. Repeat the calculation for the channel shown below.
[Channel diagram, with the branch labels read as conditional probabilities P(X|Y): y1 → x1 (0.8), y1 → x2 (0.2), y2 → x2 (1), y3 → x2 (0.1), y3 → x3 (0.9).]
Solution: In this case the channel matrix will be
              y1     y2     y3
[P(X|Y)] = x1 [ 0.8    0      0   ]
           x2 [ 0.2    1      0.1 ]
           x3 [ 0      0      0.9 ]
Here, P(Y) is not given, so take it as P(Y) = [1/3  1/3  1/3].
Now, P(x_i, y_j) = P(y_j) P(x_i|y_j), i.e. each column of [P(X|Y)] is multiplied by the corresponding P(y_j):
[P(X,Y)] = [ 4/15    0      0
             1/15    1/3    1/30
             0       0      3/10 ]
[P(X)] = [ 4/15   13/30   3/10 ]
[P(Y|X)] is obtained by dividing each row of [P(X,Y)] by the corresponding P(x_i):
[P(Y|X)] = [ 1       0       0
             2/13    10/13   1/13
             0       0       1 ]
H(X) = −∑_{i=1}^{3} P(x_i) log P(x_i) = −[P(x1) log P(x1) + P(x2) log P(x2) + P(x3) log P(x3)]
     = −[(4/15) log(4/15) + (13/30) log(13/30) + (3/10) log(3/10)]
     = −[(4/15)(−1.906891) + (13/30)(−1.206451) + (3/10)(−1.736965)]
     ≈ 1.5524 bits/message
Q2. Given
[P(X,Y)] = [ 0       1/20
             1/5     3/10
             1/20    2/5  ]
with P(x1) = 1/20, P(x2) = 1/2, P(x3) = 9/20, find P(Y|X), ...
[Channel diagram: binary channel with inputs x1 (0), x2 (1), outputs y1 (0), y2 (1) and transition probabilities p11, p12, p21, p22.]
To find the channel capacity of a non-symmetric binary channel, the auxiliary variables Q1 and Q2 are defined by
[𝑃][𝑄] = −[𝐻]
[ P11  P12 ] [ Q1 ]  =  −[ −P11 log P11 − P12 log P12 ]  =  [ P11 log P11 + P12 log P12 ]
[ P21  P22 ] [ Q2 ]       [ −P21 log P21 − P22 log P22 ]     [ P21 log P21 + P22 log P22 ]
Q. Find the mutual information and channel capacity when 𝑃(𝑥1 ) = 0.6 and 𝑃(𝑥2 ) = 0.4
and channel diagram is
[Channel diagram: x1 → y1 (0.8), x1 → y2 (0.2), x2 → y1 (0.3), x2 → y2 (0.7).]
Solution:
              y1     y2
[P(Y|X)] = x1 [ 0.8    0.2 ]
           x2 [ 0.3    0.7 ]
The joint probability matrix is obtained by multiplying the rows of [P(Y|X)] by P(x1) = 0.6 and P(x2) = 0.4:
[P(X,Y)] = [ 0.48   0.12
             0.12   0.28 ]
For the channel capacity, P11 = 0.8, P12 = 0.2, P21 = 0.3, P22 = 0.7, and
[P][Q] = −[H]
[ P11  P12 ] [ Q1 ]  =  [ P11 log P11 + P12 log P12 ]
[ P21  P22 ] [ Q2 ]     [ P21 log P21 + P22 log P22 ]
[ 0.8  0.2 ] [ Q1 ]  =  [ 0.8 log 0.8 + 0.2 log 0.2 ]  =  [ −0.25754 − 0.46438 ]  =  [ −0.72192 ]
[ 0.3  0.7 ] [ Q2 ]     [ 0.3 log 0.3 + 0.7 log 0.7 ]     [ −0.52109 − 0.36020 ]     [ −0.88129 ]
Solving, [ Q1 ]  =  [ −0.658172 ]
         [ Q2 ]     [ −0.976912 ]
The channel capacity is C = log₂(2^Q1 + 2^Q2) = log₂(2^−0.658172 + 2^−0.976912) ≈ 0.1912 bits/message.
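A sketch (mine, not from the notes) of this auxiliary-variable method for a general 2×2 channel; it reproduces Q1, Q2 and C for the matrix above and also evaluates the mutual information for the given input distribution P(X) = [0.6, 0.4]:

```python
import math

def binary_channel_capacity(P):
    """Capacity of a 2x2 channel matrix P[i][j] = P(y_j|x_i): solve [P][Q] = [H],
    where H_i = sum_j P_ij*log2(P_ij), then C = log2(2**Q1 + 2**Q2)."""
    h = [sum(p * math.log2(p) for p in row if p > 0) for row in P]
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    Q1 = (P[1][1] * h[0] - P[0][1] * h[1]) / det     # Cramer's rule
    Q2 = (P[0][0] * h[1] - P[1][0] * h[0]) / det
    return Q1, Q2, math.log2(2 ** Q1 + 2 ** Q2)

P = [[0.8, 0.2], [0.3, 0.7]]
print(binary_channel_capacity(P))   # Q1 ~ -0.658, Q2 ~ -0.977, C ~ 0.191 bit

# Mutual information for the particular input distribution P(X) = [0.6, 0.4]:
P_X = [0.6, 0.4]
H = lambda ps: -sum(p * math.log2(p) for p in ps if p > 0)
P_Y = [sum(P_X[i] * P[i][j] for i in range(2)) for j in range(2)]
I = H(P_Y) - sum(P_X[i] * H(P[i]) for i in range(2))
print(I)                            # ~0.185 bit, slightly below the capacity
```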
For a symmetric channel, every row of [P(Y|X)] has the same entropy, say A, so that
I(X;Y) = H(Y) − H(Y|X) = H(Y) − ∑_{i=1}^{m} P(x_i) A = H(Y) − A ∑_{i=1}^{m} P(x_i)
Now, ∑_{i=1}^{m} P(x_i) = 1, therefore
I(X;Y) = H(Y) − A
Channel Capacity of the Binary Symmetric Channel:
Q. For the binary symmetric channel shown below, find the channel capacity for (i) p = 0.9, (ii) p = 0.3, (iii) p = 0.7.
[Channel diagram: x1 (0) → y1 (0) and x2 (1) → y2 (1) with probability p; cross-over transitions with probability 1 − p.]
Solution:
              y1     y2
[P(Y|X)] = x1 [ p      1−p ]  =  [ p   q ]
           x2 [ 1−p    p   ]     [ q   p ]
Now, since the BSC is a symmetric channel, every row has the same entropy A = −[p log₂ p + (1 − p) log₂(1 − p)], and the capacity per symbol is
C = max[H(Y)] − A = 1 + p log₂ p + (1 − p) log₂(1 − p).
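The numerical part is not completed in the notes; as a hedged completion (my own), the sketch below evaluates C = 1 + p log₂ p + (1 − p) log₂(1 − p) for the three requested values (note that p and 1 − p give the same capacity):

```python
import math

def bsc_capacity(p: float) -> float:
    """Capacity of a binary symmetric channel in bits/symbol: C = 1 - H(p)."""
    return 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

for p in (0.9, 0.3, 0.7):
    print(p, round(bsc_capacity(p), 4))
# 0.9 -> ~0.5310, 0.3 -> ~0.1187, 0.7 -> ~0.1187
```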
Q. Write the matrices [𝑃(𝑌|𝑋)], [𝑃(𝑋|𝑌)] and [𝑃(𝑋, 𝑌)] for following channel diagrams:
(iii) [Channel diagram: inputs x1, x2 and outputs y1, y2, y3 with branch probabilities 0.25, 0.15, 0.25, 0.1, 0.15, 0.1.]
(iv) [Channel diagram: x1 → y1 (0.8), x1 → y2 (0.2), x2 → y2 (1), x3 → y2 (0.3), x3 → y3 (0.7).]
Solution:
Q. Find the entropies and mutual information for the following channel diagrams:
(i) [Channel diagram with joint probabilities: x1 → y1 (0.15), x1 → y2 (0.15), x2 → y1 (0.2), x2 → y2 (0.2), x3 → y1 (0.15), x3 → y2 (0.15).]
(ii) [Channel diagram: x1 → y1 (0.8), x1 → y2 (0.2), x2 → y2 (1), x3 → y2 (0.3), x3 → y3 (0.7).]
Solution:
(i) [P(X,Y)] = [ 0.15   0.15
                0.2    0.2
                0.15   0.15 ]
    H(Y) = 1
(ii) [P(X|Y)] = [ 0.8   0     0
                 0.2   1     0.3
                 0     0     0.7 ]
     H(X|Y) = 0.40861
     H(Y|X) = 0.57841
Q. Find the channel capacity of the binary channel shown below.
[Channel diagram: two binary channels in cascade with branch probabilities 0.9, 0.1 and 0.2, 0.8 in the first stage and 0.8, 0.2 and 0.1, 0.9 in the second stage; the overall transition matrix is the [0.73, 0.27; 0.24, 0.76] matrix used below.]
Solution:
[Equivalent binary channel: x1 → y1, x2 → y2 with transition probabilities P11, P12, P21, P22.]
[P][Q] = −[H]
[ P11  P12 ] [ Q1 ]  =  [ P11 log P11 + P12 log P12 ]
[ P21  P22 ] [ Q2 ]     [ P21 log P21 + P22 log P22 ]
[ 0.73  0.27 ] [ Q1 ]  =  [ −0.33144 − 0.51002 ]
[ 0.24  0.76 ] [ Q2 ]     [ −0.4941 − 0.300905 ]
Solving, [ Q1 ]  =  [ −0.8670576531 ]
         [ Q2 ]     [ −0.7722515153 ]
Therefore,
C = log₂(2^Q1 + 2^Q2) = 0.1811240 bit/message