
[STAT343] Statistical Information Theory

Module I (Weightage: 25%)
Introduction to Information Theory, Communication Process, Model for a Communication System, Intuitive Idea of the Fundamental Theorem of Information Theory, A Measure of Uncertainty: Entropy, Properties of the Entropy Function, Joint and Conditional Entropy for Discrete Random Variables, Relative Entropy and Mutual Information, Relationship Between Relative Entropy and Mutual Information, Chain Rules for Entropy, Relative Entropy and Mutual Information.

Information:
The information carried by an event is determined by its probability of occurrence. It is denoted by I and satisfies

I ∝ log(1/p)

where p is the probability of the event. Information is inversely related to probability: the lower the probability, the higher the information.

Information theory is concerned with the analysis of an object called a “communication system”.

Signal:
A signal is the physical quantity by which information is carried between at least two objects. It is expressed as a function of the independent variable ‘t’ (time). There are two types of signals:
a) Analog Signal: These are continuous signals, e.g. sine wave, cosine wave, etc. They involve the transmission of continuous values (an infinite number of levels) over a defined interval of time.
b) Digital Signal: These signals are discrete or discontinuous, e.g. the displays of a calculator, speedometer or mobile phone, typically shown on a seven-segment display. Digital communication uses ICs (chips). A digital signal must take values from a finite set of possible values; here these values are 0 and 1, e.g. the bit sequence 0 1 0 1 0 1 0.

Difference between digital and discrete:
A digital signal has only the two values 0 and 1, whereas a discrete signal may take any value, e.g. 0, 1.5, 0.1, 0.3, 1, 4, 4.1, etc.

[Figure: a digital waveform alternating between 0 and 1, alongside a discrete-valued sequence taking values such as 0.1, 0.3, 1, 1.5, 2.5, 4 and 4.1.]

Comparison of analog and digital communication:

| S. No. | Analog Communication | Digital Communication |
|---|---|---|
| 1 | Analog communication uses analog signals for the transmission of information. | Digital communication uses digital signals for the transmission of information. |
| 2 | Analog communication is affected by noise. | Digital communication is largely immune to noise and distortion. |
| 3 | The error probability is high, e.g. due to parallax. | The error probability is low. |
| 4 | The hardware is complicated and less flexible than a digital system. | The hardware is flexible and less complicated than an analog system. |
| 5 | Analog communication systems are low cost. | Digital communication systems are high cost. |
| 6 | Low bandwidth is required. | High bandwidth is required. |
| 7 | High power is required to run analog communication systems. | Low power is required. |
| 8 | These systems are less portable due to heavy components. | These systems are more portable due to compact components. |
| 9 | Amplitude and angle modulation are used. | Pulse coded modulation (PCM), DPCM, etc. are used. |
| 10 | An analog signal can be represented by a sine wave. | A digital signal is represented by a square wave. |
| 11 | An analog signal consists of continuous values. | A digital signal consists of discrete values. |
| 12 | Analog signals carry voice, sound, etc. | Digital signals are used in computers. |
| 13 | Processing and storage of analog information is not easy. | Processing and storage of digital information is very easy. |
| 14 | Errors cannot be detected and corrected easily, and it is very difficult to secure the information. | The chance of error is very small, and the information can be secured easily. |

Advantages of digital communication:


There are many advantages of using digital communication over analog communication. Some of them are as follows:
− Digital communication mostly uses a common structure for encoding a signal, so the devices used are largely similar.
− Reduction of noise, distortion and other impairments are possible.
− In digital communication regeneration of signal is easier.
− Error detection and correction is easily possible.
− Storage is quite possible due to use of the digital computers.
− Secure communication is possible due to encryption and decryption.
− Compression of data is only possible in digital communication.
− Digital circuits are more reliable than the analog circuits.
− Digital Communication is cheaper than Analog Communication.

Disadvantages of digital communication:


− Digital communication generally requires more bandwidth than the analog
communication.
− Synchronization is required in case of synchronous modulation.
Modulation:
Modulation is defined as the process by which some characteristic of a carrier wave is varied
in accordance with an information-bearing signal.
Or
The process of varying one or more properties of a periodic waveform, called the carrier signal, is called modulation.

Modulator:
A modulator is a circuit that combines two different signals in such a way that they can be pulled apart later and the information recovered. In other words, a modulator is a device that performs modulation. The main types of modulation are:

• Analog modulation: The aim of analog modulation is to transfer an analog baseband (or lowpass) signal, for example an audio signal or a TV signal. [Note: a lowpass filter passes signals with a frequency lower than a selected cutoff frequency and weakens signals with frequencies higher than the cutoff frequency.]

• Digital modulation: The aim of digital modulation is to transfer a digital bit stream over an analog communication channel, for example over the public switched telephone network (where a bandpass filter limits the frequency range to 300–3400 Hz) or over a limited radio frequency band. [Note: a bandpass filter is a device that passes frequencies within a certain range and rejects frequencies outside that range.]

• Digital baseband modulation: The aim of digital baseband modulation methods, or line coding, is to transfer a digital bit stream over a baseband channel, typically a non-filtered copper wire such as a serial bus or a wired local area network.

• Pulse modulation: The aim of pulse modulation methods is to transfer a narrowband analog signal, for example a phone call, over a wideband baseband channel or, in some of the schemes, as a bit stream over another digital transmission system. [Note: a narrowband channel is one in which the bandwidth of the message does not significantly exceed the coherence bandwidth of the channel; a system is wideband when the message bandwidth significantly exceeds the coherence bandwidth of the channel.]

Table 1: Conversion of decimal numbers into binary numbers
| Decimal | Sum of powers of 2 | Binary (2^8 2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0) |
|---|---|---|
| 0 | – | 0 0 0 0 0 0 0 0 0 |
| 1 | 2^0 | 0 0 0 0 0 0 0 0 1 |
| 2 | 2^1 | 0 0 0 0 0 0 0 1 0 |
| 3 | 2^1+2^0 | 0 0 0 0 0 0 0 1 1 |
| 4 | 2^2 | 0 0 0 0 0 0 1 0 0 |
| 5 | 2^2+2^0 | 0 0 0 0 0 0 1 0 1 |
| 6 | 2^2+2^1 | 0 0 0 0 0 0 1 1 0 |
| 7 | 2^2+2^1+2^0 | 0 0 0 0 0 0 1 1 1 |
| 8 | 2^3 | 0 0 0 0 0 1 0 0 0 |
| 9 | 2^3+2^0 | 0 0 0 0 0 1 0 0 1 |
| 10 | 2^3+2^1 | 0 0 0 0 0 1 0 1 0 |
| 11 | 2^3+2^1+2^0 | 0 0 0 0 0 1 0 1 1 |
| 12 | 2^3+2^2 | 0 0 0 0 0 1 1 0 0 |
| 13 | 2^3+2^2+2^0 | 0 0 0 0 0 1 1 0 1 |
| 14 | 2^3+2^2+2^1 | 0 0 0 0 0 1 1 1 0 |
| 15 | 2^3+2^2+2^1+2^0 | 0 0 0 0 0 1 1 1 1 |
| 16 | 2^4 | 0 0 0 0 1 0 0 0 0 |
| 17 | 2^4+2^0 | 0 0 0 0 1 0 0 0 1 |
| 18 | 2^4+2^1 | 0 0 0 0 1 0 0 1 0 |
| 19 | 2^4+2^1+2^0 | 0 0 0 0 1 0 0 1 1 |
| 20 | 2^4+2^2 | 0 0 0 0 1 0 1 0 0 |
| 21 | 2^4+2^2+2^0 | 0 0 0 0 1 0 1 0 1 |
| 22 | 2^4+2^2+2^1 | 0 0 0 0 1 0 1 1 0 |
| 23 | 2^4+2^2+2^1+2^0 | 0 0 0 0 1 0 1 1 1 |
| 24 | 2^4+2^3 | 0 0 0 0 1 1 0 0 0 |
| 25 | 2^4+2^3+2^0 | 0 0 0 0 1 1 0 0 1 |
| 26 | 2^4+2^3+2^1 | 0 0 0 0 1 1 0 1 0 |
| 27 | 2^4+2^3+2^1+2^0 | 0 0 0 0 1 1 0 1 1 |
| 28 | 2^4+2^3+2^2 | 0 0 0 0 1 1 1 0 0 |
| 29 | 2^4+2^3+2^2+2^0 | 0 0 0 0 1 1 1 0 1 |
| 30 | 2^4+2^3+2^2+2^1 | 0 0 0 0 1 1 1 1 0 |
| 31 | 2^4+2^3+2^2+2^1+2^0 | 0 0 0 0 1 1 1 1 1 |
| 32 | 2^5 | 0 0 0 1 0 0 0 0 0 |
| … | … | … |
| 64 | 2^6 | 0 0 1 0 0 0 0 0 0 |
| … | … | … |
| 128 | 2^7 | 0 1 0 0 0 0 0 0 0 |
| … | … | … |
| 256 | 2^8 | 1 0 0 0 0 0 0 0 0 |

Information Theory:
It deals with the measurement of information and its transmission through a channel.
Transmission Goal:
1. Fast encoding of information.
2. Easy transmission of encoded messages.
3. Fast decoding of received messages.
4. Reliable correction of errors introduced in the channel.
5. Maximum transfer of information per unit time.
6. Security.

Elements of a digital communication system, or model of a digital communication system:

Q. Using a block diagram, discuss the digital communication system, or discuss the model of a digital communication system in detail.

[Block diagram of a digital communication system]
Transmitter: Analog source → Input transducer and A-to-D converter → Source encoder → Channel encoder → Digital modulator and D-to-A converter → Channel
Receiver: Channel → Digital demodulator and A-to-D converter → Channel decoder → Source decoder → Output transducer and D-to-A converter → Analog output
The figure shows the basic operation of a digital communication system. The source and the destination are two physically separate points. When the signal travels through the communication channel, noise interferes with it; because of this interference, a disturbed version of the input signal is received at the receiver, so the received signal may not be correct. At the channel encoder, extra bits are added to protect the message against such errors.
A to D Converter: The analog-to-digital converter converts the analog signal into a discrete signal, which is not continuous in nature. Therefore, the output of the analog-to-digital converter is a discrete information source.
Information Rate: Its unit is bit/second. It is equal to
Symbol Rate × Source Entropy

Source Encoder and Decoder: The symbols produced by the information source are given to the source encoder. These symbols cannot be transmitted directly; they are first converted into digital form (0s and 1s) by the source encoder. The source encoder assigns codewords to the symbols, and for every distinct symbol there is a unique codeword. Codewords can be 4, 8, 16, 32 or 64 bits in length. For example, 8 bits give 2^8 = 256 distinct codewords, so 8 bits can be used to represent 256 symbols.

Data Rate:
Data rate = Symbol rate × Codeword length
For example, with a symbol rate of 10 symbols/second and 8-bit codewords, Data rate = 10 × 8 = 80 bits/second.
At the receiver, the source decoder performs the reverse operation to that of the source encoder: it converts the binary output of the channel decoder back into the symbol sequence.

Channel Encoder and Decoder: The communication channel adds noise and interference to the signal being transmitted, so errors are introduced into the binary sequence received at the receiver end. To handle these errors, channel coding is done. The channel encoder adds redundant binary bits to the input sequence according to a properly defined rule.
For example, suppose the codeword from the source encoder is 3 bits long and 1 redundant bit is added to make it 4 bits long. This 4th bit is set to 1 or 0 so that the total number of 1s in the encoded word is even. This is called even parity.

| Output of source encoder (b3 b2 b1) | Bit added by channel encoder (b0) | Output of channel encoder (b3 b2 b1 b0) |
|---|---|---|
| 1 1 0 | 0 | 1 1 0 0 |
| 0 1 0 | 1 | 0 1 0 1 |
| 0 0 0 | 0 | 0 0 0 0 |
| 1 1 1 | 1 | 1 1 1 1 |

Every codeword at the output of the channel encoder contains an even number of 1s. If the receiver detects an odd number of 1s, it knows that there is an error in the received signal.
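A minimal Python sketch of the even-parity idea above (the helper names are illustrative, not from the notes), showing how the redundant bit is chosen and how an odd count of 1s flags an error at the receiver:

```python
# A minimal sketch of even parity for 3-bit source codewords, as in the table above.
def add_even_parity(bits):
    """Append a parity bit so that the total number of 1s is even."""
    parity = sum(bits) % 2          # 1 if the count of 1s is odd, else 0
    return bits + [parity]

def has_error(codeword):
    """Receiver-side check: an odd number of 1s signals a transmission error."""
    return sum(codeword) % 2 != 0

encoded = add_even_parity([1, 1, 0])   # -> [1, 1, 0, 0]
print(encoded, has_error(encoded))     # [1, 1, 0, 0] False
corrupted = [1, 0, 0, 0]               # one bit flipped in transit
print(has_error(corrupted))            # True -> error detected
```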

Digital Modulator and Digital to Analog Converter:

[Figure: the digitally modulated waveform for the bit sequence 1 0 1 — the signal S2(t) is transmitted for bit ‘1’ and S1(t) for bit ‘0’.]

Whenever, the modulating signal is discrete (binary codeword) then, digital modulation
technique is used. The carrier signal used by the digital modulator is always continuous
sinusoidal wave form of high frequency. The digital modulator maps the input sequence 1’s
and 0’s to the analog signal. If 1 bit at a time is to be transmitted, then digital modulator signal
is 𝑆1 (𝑡) to transmit ‘0’ and 𝑆2 (𝑡) to transmit ‘1’.
The signal S1(t) has a low frequency compared to the signal S2(t). Thus, even though the modulated signal appears to be continuous, the modulation is discrete. On the receiver side, the digital demodulator converts the modulated signal back into the sequence of binary bits.

Communication Channel:
Types of communication channels:
• Wireless
• Wirelines
• Optical Fibers
• Pen Drives
• Optical Disks
• Magnetic Tapes

The connection between the transmitter and the receiver is established through the channel. Noise, attenuation, distortion and dispersion are introduced in the channel.

Units of Information:
Different units of information can be defined for different base of logarithm:
Base ‘2’ - Bit
Base ‘e’ - Nat
Base ‘10’ - Decit or Hartley
Base ‘2’ or binary system is of practical importance. Hence, when no base is mentioned then
by default it is assumed as base ‘2’.

Conversion of Information Units:

| Units | Bit (base 2) | Nat (base e) | Decit (base 10) |
|---|---|---|---|
| 1 Bit = | – | 1/log2 e ≈ 0.693147 Nat | 1/log2 10 ≈ 0.301 Decit |
| 1 Nat = | 1/ln 2 = 1/log_e 2 ≈ 1.443 Bit | – | 1/ln 10 ≈ 0.4342 Decit |
| 1 Decit = | log2 10 ≈ 3.3219 Bit | 1/log10 e = ln 10 ≈ 2.302585 Nat | – |

where ln 2 = log_e 2.
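A minimal sketch of these unit conversions (the helper name is an assumption for illustration), using the fact that an amount of information measured with log base a equals log(a)/log(b) of the same amount measured with base b:

```python
# A minimal sketch converting an amount of information between bits, nats and decits.
import math

def convert(amount, from_unit, to_unit):
    """Convert information 'amount' between 'bit', 'nat' and 'decit'."""
    base = {'bit': 2.0, 'nat': math.e, 'decit': 10.0}
    # amount * ln(from_base) / ln(to_base) gives the value in the target unit
    return amount * math.log(base[from_unit]) / math.log(base[to_unit])

print(convert(1, 'nat', 'bit'))     # ~1.442695  (1 nat = 1/ln 2 bits)
print(convert(1, 'decit', 'bit'))   # ~3.321928  (1 decit = log2 10 bits)
print(convert(1, 'bit', 'nat'))     # ~0.693147
```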

Information:
Mathematical representation:

I(x_i) = log2 [1 / P(x_i)] = log2 1 − log2 P(x_i) = − log2 P(x_i) ≈ −3.32 log10 P(x_i)

where x_i is a message (or source symbol) and I(x_i) is the information carried by the message or source x_i.

Q. If I(x1) is the information carried by message x1 and I(x2) is the information carried by x2, then prove that the amount of information carried jointly by x1 and x2 is I(x1, x2) = I(x1) + I(x2).

Solution: We know that I(x_i) = log2 [1 / P(x_i)], so

I(x1) = log2 [1 / P(x1)]   and   I(x2) = log2 [1 / P(x2)]

Since the messages x1 and x2 are independent, the probability of the composite message is P(x1) × P(x2). Therefore, the information carried jointly by x1 and x2 is

I(x1, x2) = log2 [1 / (P(x1) ∙ P(x2))] = log2 [1 / P(x1)] + log2 [1 / P(x2)] = I(x1) + I(x2)
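A minimal numerical sketch of this additivity property, with assumed example probabilities (0.5 and 0.125) for two independent messages:

```python
# Verify that the self-information of independent messages adds: I(x1, x2) = I(x1) + I(x2).
import math

def self_information(p, base=2):
    """Information carried by an event of probability p, in units set by the log base."""
    return -math.log(p, base)

p_x1, p_x2 = 0.5, 0.125                  # assumed probabilities of two independent messages
I_x1 = self_information(p_x1)            # 1 bit
I_x2 = self_information(p_x2)            # 3 bits
I_joint = self_information(p_x1 * p_x2)  # information of the composite message

print(I_x1, I_x2, I_joint)               # 1.0 3.0 4.0 -> I_joint == I_x1 + I_x2
```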
Entropy (H):
1. Entropy is defined as average information per message.
2. It is a measure of uncertainty.
3. It is a quantitative measure of the disorder of a system and inversely related to the
amount of energy available to do the work in an isolated system.
4. High entropy indicates less energy available for the work and low entropy indicates
high energy available for work.
5. Entropy is maximum when all the messages are equiprobable.

H(X) of a Discrete Random Variable X:

H(X) = Σ_{i=1}^{m} P(x_i) log [1 / P(x_i)]          [Note: here log means log2]
     = Σ_{i=1}^{m} P(x_i) {log 1 − log P(x_i)}
     = Σ_{i=1}^{m} P(x_i) {0 − log P(x_i)}
     = − Σ_{i=1}^{m} P(x_i) log P(x_i)              [the negative sign comes from inverting the argument of the log]
     = Σ_{i=1}^{m} P(x_i) I(x_i)

Entropy is never negative.

Q. A quaternary source generates information with probabilities P1 = 0.1, P2 = 0.2, P3 = 0.3 and P4 = 0.4. Find the entropy.

Solution: Here, H(X) = Σ_{i=1}^{4} P(x_i) log [1 / P(x_i)] = − Σ_{i=1}^{4} P(x_i) log P(x_i)
= −[P(x1) log P(x1) + P(x2) log P(x2) + P(x3) log P(x3) + P(x4) log P(x4)]
= −[0.1 log(0.1) + 0.2 log(0.2) + 0.3 log(0.3) + 0.4 log(0.4)]          [change of base: log_a x = log_b x / log_b a]
= −[0.1(log 1 − log 10) + 0.2(log 1 − log 5) + 0.3(log 3 − log 10) + 0.4(log 2 − log 5)]
= −[0.1(0 − 3.321928) + 0.2(0 − 2.321928) + 0.3(1.584963 − 3.321928) + 0.4(1 − 2.321928)]
= −[0.1(−3.321928) + 0.2(−2.321928) + 0.3(−1.736965) + 0.4(−1.321928)]
= −(−0.3321928 − 0.4643856 − 0.5210895 − 0.5287712)
= −(−1.8464391) = 1.8464391 bits/message

Q. A source has symbol probabilities 0.25, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05 and 0.05. Find the entropy.

Solution: Here 0.25 occurs once, 0.2 occurs twice, 0.1 occurs twice and 0.05 occurs three times. Now

H(X) = Σ_{i=1}^{8} P(x_i) log2 [1 / P(x_i)] = − Σ_{i=1}^{8} P(x_i) log2 P(x_i)
     = −[P(x1) log P(x1) + P(x2) log P(x2) + ⋯ + P(x8) log P(x8)]
     = −[0.25 log(0.25) + 2(0.2) log(0.2) + 2(0.1) log(0.1) + 3(0.05) log(0.05)]
     = −[−0.25(2) − 2(0.2)(2.321928) − 2(0.1)(3.321928) − 3(0.05)(4.321928)]
     = −[−0.5 − 0.4(2.321928) − 0.2(3.321928) − 0.15(4.321928)]
     = −(−0.5 − 0.9287712 − 0.6643856 − 0.6482892) = −(−2.741446)
     = 2.741446 bits/message
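A minimal sketch that checks both of the entropy calculations above numerically:

```python
# Verify the two hand-computed entropies above.
import math

def entropy(probs, base=2):
    """H(X) = -sum p_i log p_i, ignoring zero-probability symbols."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(entropy([0.1, 0.2, 0.3, 0.4]))                                  # ~1.8464 bits/message
print(entropy([0.25, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05, 0.05]))          # ~2.7414 bits/message
```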

How can the encoding be made better?

Suppose we are watching cars passing on a highway. For simplicity, 50% of the cars are black, 25% are white, 12.5% are red and 12.5% are blue.
Consider the flow of cars as the output of a source with four symbols: Black, White, Red and Blue. A simple way of encoding this source into a binary signal would be to associate each colour with two bits, i.e. black with 00, white with 01, red with 10 and blue with 11. This gives an average of 2.00 bits per colour, which is not the best encoding. Another way is to assign the number of bits according to the frequencies of the cars. In that case,
Black - 0 (1 bit)
White - 10 (2 bits)
Red - 110 (3 bits)
Blue - 111 (3 bits)

Why is this second encoding scheme better?

Answer:  0.5 (Black) × 1 bit = 0.5
         0.25 (White) × 2 bits = 0.5
         0.125 (Red) × 3 bits = 0.375
         0.125 (Blue) × 3 bits = 0.375
         Average = 1.75 bits per colour
Since the average of the first method is 2.00 bits, which is greater than that of the second method, the second way is better. In fact, the average length of the second code (1.75 bits per colour) equals the entropy of the source, so no encoding can do better on average.
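A minimal sketch comparing the two encodings of the car-colour source above; the variable-length codewords follow the list just given:

```python
# Compare the fixed 2-bit code with the variable-length code matched to the frequencies.
import math

probs   = {'black': 0.5, 'white': 0.25, 'red': 0.125, 'blue': 0.125}
fixed   = {'black': '00', 'white': '01', 'red': '10',  'blue': '11'}
matched = {'black': '0',  'white': '10', 'red': '110', 'blue': '111'}

def avg_length(code):
    return sum(probs[c] * len(code[c]) for c in probs)

entropy = -sum(p * math.log2(p) for p in probs.values())
print(avg_length(fixed))    # 2.0 bits per colour
print(avg_length(matched))  # 1.75 bits per colour
print(entropy)              # 1.75 bits -> the matched code achieves the source entropy
```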

Q. Prove that entropy is maximum when sources are equiprobable.


Or
The entropy of the source will be maximum when all the messages from the source are equally likely.
Or
For a binary source prove that 𝐻𝑚𝑎𝑥 = log 2 𝑚 bits/message.
Or
If the source is transmitting ‘m’ symbols, say m1, m2, m3, …, then prove that
H_max = log2 m bits/message.

Solution:
Let X be a binary source which is random in nature. It emits independent symbols ‘0’ and ‘1’ with equal probabilities, i.e. P(0) = P(1) = ½ = P, and the source entropy is given by

H(X) = − Σ_{i=1}^{m} P(x_i) log2 P(x_i)
     = −[P log P + (1 − P) log(1 − P)]
     = −[½ log ½ + (1 − ½) log(1 − ½)]
     = −[½ log ½ + ½ log ½]
     = − log ½
     = −[log 1 − log 2]
     = −(0 − 1)
     = 1 bit/message

Further, the source entropy H(X) satisfies the relation

0 ≤ H(X) ≤ log2 M

where M is the number of symbols in the alphabet of ‘X’.
The entropy of a binary communication system with two messages x1 and x2, with probabilities P(x1) and P(x2) = 1 − P(x1), is

H_binary = P(x1) log [1 / P(x1)] + [1 − P(x1)] log [1 / (1 − P(x1))]
         = −[P(x1) log P(x1) + {1 − P(x1)} log{1 − P(x1)}]

[Figure: plot of H_binary against P(x1); it is zero at P(x1) = 0 and P(x1) = 1 and reaches its maximum of 1 bit at P(x1) = ½.]

Entropy goes to 0 (zero) when either message has probability 1, i.e. P(x1) = 1 or P(x2) = 1.
If P(x1) = 1, message x1 will be sent all the time with probability 1.
If P(x1) = 0, then message x2 will be sent all the time. So in these cases no information is transmitted by sending the message. Therefore, entropy is maximum for equally likely messages.
Entropy is maximum for equally likely messages x1 and x2:
P(x1) = ½ = P, and for H to be maximum,
dH/dP = 0

H = H_binary = −[P(x1) log P(x1) + (1 − P(x1)) log(1 − P(x1))]
             = −[P log P + (1 − P) log(1 − P)]

dH/dP = −[P/P + log P − (1 − P)/(1 − P) − log(1 − P)]

Putting dH/dP = 0:

⇒ −[P/P + log P − (1 − P)/(1 − P) − log(1 − P)] = 0   ⇒   −[1 + log P − 1 − log(1 − P)] = 0
⇒ − log P + log(1 − P) = 0   ⇒   log[(1 − P)/P] = 0
⇒ (1 − P)/P = 1   or   P = ½

H_max = −[½ log ½ + (1 − ½) log(1 − ½)]
      = −[½ {log 1 − log 2} + ½ {log 1 − log 2}]
      = −{log 1 − log 2} = log 2 = 1 bit/message

If the source is transmitting three messages, then dH/dP = 0 gives P = 1/3, i.e. the entropy of a source is maximum when all the messages from the source are equally likely.
If the source is transmitting ‘m’ messages, then for H_max, P(m1) = P(m2) = P(m3) = ⋯ = 1/m.
Therefore, in that case

H_max = −[(1/m) log2 (1/m) + (1/m) log2 (1/m) + ⋯] = (m/m) log2 m = log2 m bits/message.

Properties of Entropy:

1) 0 ≤ H(X) ≤ log2 m, where m is the number of symbols in the alphabet of source ‘X’.

2) When all the events are equally likely, the average uncertainty has the largest value, i.e. H(X) ≤ log2 m.

3) If the probability of occurrence of the events is changed slightly, the measure of uncertainty associated with the system should vary accordingly in a continuous manner.

4) H(X) = 0 if all the P(x_i) are zero except for one symbol with p = 1.

5) Entropy must be a symmetric function of its arguments, i.e. it is unchanged if the probabilities are reordered.

6) Partitioning of symbols into sub-symbols cannot decrease the entropy.

Q. Discuss the properties of entropy.

The intuitive and the engineering interpretations of entropy:

We have two physical interpretations of information. From the engineering point of view, the information content of a message is equal to the minimum number of digits required to encode the message, and therefore the entropy is equal to the minimum number of digits per message required on average for encoding. From the intuitive standpoint, on the other hand, information is associated with the amount of surprise or uncertainty attached to the event: a smaller probability of occurrence implies more uncertainty about the event.

Information Rate (R):


If the message source X generates messages at the rate of ‘r’ messages per second (symbols/sec), then the information rate ‘R’ of the source is
𝑅 = 𝑟𝐻(𝑋) bit/sec.
A source which has higher message rate will have a higher demand on the communication
channel than the source with a smaller message rate.

Q. A continuous signal is band limited to 5 kHz. A signal is quantized in 8 levels of PCM


(Pulse Code Modulation) with probabilities 0.25, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05 and 0.05. Calculate
the entropy and rate of information.

Solution: We are given

P(x1) = 0.25, P(x2) = 0.2, P(x3) = 0.2, P(x4) = 0.1, P(x5) = 0.1, P(x6) = 0.05, P(x7) = 0.05, P(x8) = 0.05

H(X) = − Σ_{i=1}^{8} P(x_i) log P(x_i)
     = −[P(x1) log P(x1) + P(x2) log P(x2) + ⋯ + P(x8) log P(x8)]
     = −[0.25 log 0.25 + 0.2 log 0.2 + 0.2 log 0.2 + 0.1 log 0.1 + 0.1 log 0.1 + 0.05 log 0.05 + 0.05 log 0.05 + 0.05 log 0.05]
     = −[0.25 log 0.25 + 0.4 log 0.2 + 0.2 log 0.1 + 0.15 log 0.05]
     = −[0.25 log(1/4) + 0.4 log(1/5) + 0.2 log(1/10) + 0.15 log(1/20)]
     = −[0.25{log 1 − log 4} + 0.4{log 1 − log 5} + 0.2{log 1 − log 10} + 0.15{log 1 − log 20}]
     = −[0.25(0 − 2) + 0.4(0 − 2.321928) + 0.2(0 − 3.321928) + 0.15(0 − 4.321928)]
     = −0.25(−2) − 0.4(−2.321928) − 0.2(−3.321928) − 0.15(−4.321928)
     = 0.5 + 0.9287712 + 0.6643856 + 0.6482892 = 2.741446 bits/message

R = rH(X)

f_m = 5 kHz
r = 2 f_m = 2 × 5 kHz = 10 kHz = 10000 messages/sec.

So, R = 10000 × 2.741446 = 27414.46 ≈ 27414 bits/sec.
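A minimal sketch of the same information-rate calculation, R = r·H(X) with r = 2·f_m messages per second for a signal band-limited to f_m:

```python
# Reproduce the information-rate example above: R = r * H(X).
import math

probs = [0.25, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05, 0.05]
H = -sum(p * math.log2(p) for p in probs)   # ~2.741446 bits/message

f_m = 5_000                 # Hz (band limit)
r = 2 * f_m                 # 10,000 messages/sec
R = r * H
print(H, R)                 # ~2.7414 bits/message, ~27414 bits/sec
```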

Discrete Random Variable:


If a coin is tossed three times, the number of head obtained can be 0, 1, 2, 3. The probability
of these possibilities can be tabulated as:

Number of Heads: 0 1 2 3
Probability: 1/8 3/8 3/8 1/8

Here, total outcomes are: {TTT, HTT, THT, TTH, HHT, HTH, THH, HHH}
A discrete random variable has a countable number of possible values. In this example the number of heads can only take the values 0, 1, 2, 3, so the variable is discrete; it is a random variable, and the probabilities of its possible values sum to 1.

Information Source:

Type: Discrete or Continuous
Classification: Memoryless or Having Memory

1) Type:
A continuous source has infinite number of symbols as the possible outcome, whereas
discrete source has a finite number of symbols as the possible outcome.

2) Classification:
A memoryless information source is one in which the output depends only upon the present input, not on previous inputs. In a source with memory, the output depends upon the present as well as past values of the input.
Communication Channel:

[Channel diagram: source X with symbols x1, x2, …, xm at the transmitter (T_X), connected through the channel transition probabilities P(y_j|x_i) to the receiver (R_Y) with symbols y1, y2, …, yn.]

Discrete Memoryless Channel (DMC):

A communication channel is the medium through which the signals generated by the source flow to the receiver. A discrete memoryless channel is a model with an input X and an output Y which accepts symbols.

If the alphabets X and Y are infinite, then the channel is a continuous channel.

If the alphabets X and Y are finite, then the channel is a discrete channel.

The discrete memoryless channel shown in the figure has ‘m’ inputs (x1, x2, …, xm) and ‘n’ outputs (y1, y2, …, yn).

The channel is represented by the conditional probability P(y_j|x_i), the probability of obtaining the output y_j when the input x_i is known; it is called the channel transition probability. A channel is therefore completely characterized by its channel transition probabilities in the form of the channel matrix:

             [ P(y1|x1)  P(y2|x1)  ⋯  P(yn|x1) ]
[P(Y|X)] =   [ P(y1|x2)  P(y2|x2)  ⋯  P(yn|x2) ]
             [    ⋮          ⋮             ⋮    ]
             [ P(y1|xm)  P(y2|xm)  ⋯  P(yn|xm) ]

where [P(Y|X)] is called the channel matrix or channel transition matrix.

Properties of Channel Matrix:


• Each input to the channel, results in some output.
• Each row of the channel matrix must sum to unity.
• If the input probability 𝑃(𝑋) is represented by the row matrix
[𝑃(𝑋)] = [𝑃(𝑥1 ) 𝑃(𝑥2 ) ⋯ 𝑃(𝑥𝑚 )]
and output probability 𝑃(𝑌) by [𝑃(𝑌)] = [𝑃(𝑦1 ) 𝑃(𝑦2 ) ⋯ 𝑃(𝑦𝑛 )]
then, [𝑃(𝑌)] = [𝑃(𝑋)] ∙ [𝑃(𝑌|𝑋)]
The entropies associated with the channel are:

H(X) = − Σ_{i=1}^{m} P(x_i) log2 P(x_i)

H(Y) = − Σ_{j=1}^{n} P(y_j) log2 P(y_j)

H(X, Y) = − Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) log2 P(x_i, y_j)

H(X|Y) = − Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) log2 P(x_i|y_j)

H(Y|X) = − Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) log2 P(y_j|x_i)

where
P(x_i|y_j) = P(x_i, y_j) / P(y_j),
P(y_j|x_i) = P(x_i, y_j) / P(x_i),
so that P(x_i, y_j) = P(y_j) P(x_i|y_j) = P(x_i) P(y_j|x_i).

Joint Probability Matrix:

             [ P(x1, y1)  P(x1, y2)  ⋯  P(x1, yn) ]
[P(X, Y)] =  [ P(x2, y1)  P(x2, y2)  ⋯  P(x2, yn) ]
             [     ⋮           ⋮             ⋮     ]
             [ P(xm, y1)  P(xm, y2)  ⋯  P(xm, yn) ]
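A minimal NumPy sketch of the channel-matrix relations above — the output distribution [P(Y)] = [P(X)]·[P(Y|X)] and the joint matrix obtained by scaling row i of [P(Y|X)] by P(x_i) — using the source and channel from the worked example Q2 later in these notes:

```python
# Compute P(Y) and the joint matrix from P(X) and the channel matrix P(Y|X).
import numpy as np

P_X = np.array([0.3, 0.4, 0.3])                     # input (row) distribution
P_Y_given_X = np.array([[0.8, 0.2, 0.0],            # each row sums to 1
                        [0.0, 1.0, 0.0],
                        [0.0, 0.3, 0.7]])

P_Y = P_X @ P_Y_given_X                             # [P(Y)] = [P(X)] . [P(Y|X)]
P_XY = P_Y_given_X * P_X[:, None]                   # joint matrix: scale row i by P(x_i)

print(P_Y)                                          # [0.24 0.55 0.21]
print(P_XY)                                         # rows: [0.24 0.06 0], [0 0.4 0], [0 0.09 0.21]
print(P_XY.sum())                                   # 1.0 (a valid joint distribution)
```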

Mutual Information:
The information gained about x_i by the reception of y_j, i.e. the net reduction in its uncertainty, is called mutual information.

I(x_i, y_j) = Initial Uncertainty − Final Uncertainty
            = − log P(x_i) − [− log P(x_i|y_j)]
            = log [P(x_i|y_j) / P(x_i)]
            = log [P(x_i, y_j) / (P(x_i) P(y_j))]          [since P(A|B) = P(A, B)/P(B)]
            = log [P(y_j|x_i) / P(y_j)]

I(x_i, y_j) = I(y_j, x_i), i.e. mutual information is symmetric in nature.

Relation Between Entropy and Probability:

1. H(X, Y) = − Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) log2 P(x_i, y_j)          … (1)

2. H(X|Y) is the average of H(X|y_j) over j, and H(Y|X) is the average of H(Y|x_i) over i, where
   P(x_i) = Σ_{j=1}^{n} P(x_i, y_j)   and   P(y_j) = Σ_{i=1}^{m} P(x_i, y_j).

y_j may occur in conjunction with x1, x2, …, xm. Thus [X|y_j] = [x1|y_j, x2|y_j, …, xm|y_j] with the associated probabilities

P[X|y_j] = [P(x1|y_j), P(x2|y_j), …, P(xm|y_j)]
         = [P(x1, y_j)/P(y_j), P(x2, y_j)/P(y_j), …, P(xm, y_j)/P(y_j)]          … (2)

Now, P(x1, y_j) + P(x2, y_j) + ⋯ + P(xm, y_j) = P(y_j).

Therefore, Σ_{i=1}^{m} P(x_i|y_j) = 1.

Hence the sum of the elements of the matrix given by equation (2) is unity, so an entropy may be associated with this conditional distribution:

H(X|y_j) = − Σ_{i=1}^{m} [P(x_i, y_j)/P(y_j)] log [P(x_i, y_j)/P(y_j)]
         = − Σ_{i=1}^{m} P(x_i|y_j) log P(x_i|y_j)

Derivations:
H(X|Y) = − Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) log2 P(x_i|y_j)   … (3)
H(Y|X) = − Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) log2 P(y_j|x_i)   … (4)

Proof of (3):
H(X|Y) = Σ_{j=1}^{n} P(y_j) H(X|y_j)
       = − Σ_{j=1}^{n} P(y_j) Σ_{i=1}^{m} P(x_i|y_j) log2 P(x_i|y_j)
       = − Σ_{i=1}^{m} Σ_{j=1}^{n} P(y_j) P(x_i|y_j) log2 P(x_i|y_j)
       = − Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) log2 P(x_i|y_j)          Hence proved.

Proof of (4):
H(Y|X) = Σ_{i=1}^{m} P(x_i) H(Y|x_i)
       = − Σ_{i=1}^{m} P(x_i) Σ_{j=1}^{n} P(y_j|x_i) log2 P(y_j|x_i)
       = − Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i) P(y_j|x_i) log2 P(y_j|x_i)
       = − Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) log2 P(y_j|x_i)          Hence proved.

Chain Rule:
1. H(X, Y) = H(X|Y) + H(Y)
2. H(X, Y) = H(Y|X) + H(X)

Proof:
1. H(X, Y) = − Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) log P(x_i, y_j)
           = − Σ_i Σ_j P(x_i, y_j) log[P(x_i|y_j) ∙ P(y_j)]
           = − Σ_i Σ_j P(x_i, y_j) [log P(x_i|y_j) + log P(y_j)]
           = − Σ_i Σ_j [P(x_i, y_j) log P(x_i|y_j) + P(x_i, y_j) log P(y_j)]
           = H(X|Y) − Σ_j [Σ_i P(x_i, y_j)] log P(y_j)
           = H(X|Y) − Σ_j P(y_j) log P(y_j)          [∵ P(y_j) = Σ_i P(x_i, y_j)]
⇒ H(X, Y) = H(X|Y) + H(Y).

2. H(X, Y) = − Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) log P(x_i, y_j)
           = − Σ_i Σ_j P(x_i, y_j) log[P(y_j|x_i) ∙ P(x_i)]
           = − Σ_i Σ_j P(x_i, y_j) [log P(y_j|x_i) + log P(x_i)]
           = − Σ_i Σ_j [P(x_i, y_j) log P(y_j|x_i) + P(x_i, y_j) log P(x_i)]
           = H(Y|X) − Σ_i [Σ_j P(x_i, y_j)] log P(x_i)
           = H(Y|X) − Σ_i P(x_i) log P(x_i)          [∵ P(x_i) = Σ_j P(x_i, y_j)]
⇒ H(X, Y) = H(Y|X) + H(X).
Relation Between Mutual Information and Entropy:

I(X; Y) = Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) I(x_i, y_j)
        = Σ_i Σ_j P(x_i, y_j) log [P(x_i|y_j) / P(x_i)]
        = Σ_i Σ_j P(x_i, y_j) [log P(x_i|y_j) − log P(x_i)]
        = − Σ_i Σ_j P(x_i, y_j) log P(x_i) − [− Σ_i Σ_j P(x_i, y_j) log P(x_i|y_j)]
        = − Σ_i [Σ_j P(x_i, y_j)] log P(x_i) − H(X|Y)
        = − Σ_i P(x_i) log P(x_i) − H(X|Y)
        = H(X) − H(X|Y)
        = H(X) − [H(X, Y) − H(Y)]
        = H(X) + H(Y) − H(X, Y)
        = H(Y) − H(Y|X)

The mutual information I(X; Y) does not depend on the individual symbols x_i and y_j; it is a property of the whole communication system. On the other hand, I(x_i, y_j) depends upon the individual symbols x_i and y_j.

Q1. A transmitter has an alphabet of four letters [x1, x2, x3, x4] and the receiver has an alphabet of three letters [y1, y2, y3]. The joint probability matrix is:

                  y1    y2    y3
[P(X, Y)] =  x1 [ 0.3   0.05  0    ]
             x2 [ 0     0.25  0    ]
             x3 [ 0     0.15  0.05 ]
             x4 [ 0     0.05  0.15 ]

Calculate the entropies.

Solution: The marginal probabilities are obtained by summing the rows and columns of [P(X, Y)]:

P(X) = [P(x1) P(x2) P(x3) P(x4)] = [0.35 0.25 0.2 0.2]

P(Y) = [P(y1) P(y2) P(y3)] = [0.3 0.5 0.2]

H(X) = − Σ_{i=1}^{4} P(x_i) log P(x_i)
     = −[P(x1) log P(x1) + P(x2) log P(x2) + P(x3) log P(x3) + P(x4) log P(x4)]
     = −[0.35 log(0.35) + 0.25 log(0.25) + 0.2 log(0.2) + 0.2 log(0.2)]
     = −[0.35 log(0.35) + 0.25 log(0.25) + 0.4 log(0.2)]
     = −[0.35(log 7 − log 20) + 0.25(log 1 − log 4) + 0.4(log 1 − log 5)]
     = −[0.35(2.807355 − 4.321928) + 0.25(0 − 2) + 0.4(0 − 2.321928)]
     = −(−0.53010055 − 0.5 − 0.9287712)
     = 1.95887175 bits/message

H(Y) = − Σ_{j=1}^{3} P(y_j) log P(y_j)
     = −[P(y1) log P(y1) + P(y2) log P(y2) + P(y3) log P(y3)]
     = −[0.3 log(0.3) + 0.5 log(0.5) + 0.2 log(0.2)]
     = −[0.3(log 3 − log 10) + 0.5(log 1 − log 2) + 0.2(log 1 − log 5)]
     = −[0.3(1.584963 − 3.321928) + 0.5(0 − 1) + 0.2(0 − 2.321928)]
     = −(−0.5210895 − 0.5 − 0.4643856)
     = 1.4854751 bits/message
H(X, Y) = − Σ_{i=1}^{4} Σ_{j=1}^{3} P(x_i, y_j) log P(x_i, y_j)
        = −[P(x1, y1) log P(x1, y1) + ⋯ + P(x4, y3) log P(x4, y3)]
        = −[0.3 log(0.3) + 0.05 log(0.05) + 0.25 log(0.25) + 0.15 log(0.15) + 0.05 log(0.05) + 0.05 log(0.05) + 0.15 log(0.15)]
        = −[0.3(log 3 − log 10) + 3(0.05)(log 1 − log 20) + 0.25(log 1 − log 4) + 2(0.15)(log 3 − log 20)]
        = −[0.3(1.584963 − 3.321928) + 0.15(0 − 4.321928) + 0.25(0 − 2) + 0.3(1.584963 − 4.321928)]
        = −(−0.5210895 − 0.6482892 − 0.5 − 0.8210895)
        = 2.4904682 bits/message

H(X|Y) = H(X, Y) − H(Y) = 2.4904682 − 1.4854751 = 1.0049931 bits/message

H(Y|X) = H(X, Y) − H(X) = 2.4904682 − 1.95887175 = 0.53159645 bits/message
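A minimal NumPy sketch that verifies the entropies of Q1 directly from its joint probability matrix and also evaluates the mutual information I(X; Y) = H(X) + H(Y) − H(X, Y):

```python
# Verify the entropies of Q1 from the joint matrix.
import numpy as np

P_XY = np.array([[0.3, 0.05, 0.0],
                 [0.0, 0.25, 0.0],
                 [0.0, 0.15, 0.05],
                 [0.0, 0.05, 0.15]])

def H(p):
    p = p[p > 0]                         # 0 log 0 is taken as 0
    return -np.sum(p * np.log2(p))

P_X, P_Y = P_XY.sum(axis=1), P_XY.sum(axis=0)
H_X, H_Y, H_XY = H(P_X), H(P_Y), H(P_XY.flatten())

print(H_X, H_Y, H_XY)                    # ~1.9589, ~1.4855, ~2.4905 bits/message
print(H_XY - H_Y, H_XY - H_X)            # H(X|Y) ~1.0050, H(Y|X) ~0.5316
print(H_X + H_Y - H_XY)                  # I(X;Y) ~0.9539 bits/message
```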


Q2. A discrete source transmits messages x1, x2, x3 with probabilities 0.3, 0.4, 0.3 respectively. The conditional probability matrix is given as:

                  y1    y2    y3
[P(Y|X)] =  x1 [ 0.8   0.2   0   ]
            x2 [ 0     1     0   ]
            x3 [ 0     0.3   0.7 ]

Calculate all the entropies.

Solution:
[P(X)] = [P(x1) P(x2) P(x3)] = [0.3 0.4 0.3]
The joint probability matrix [P(X, Y)] is obtained by multiplying the rows of [P(Y|X)] by P(x1), P(x2), P(x3), i.e. 0.3, 0.4, 0.3 respectively:

                   y1     y2     y3
[P(X, Y)] =  x1 [ 0.24   0.06   0    ]
             x2 [ 0      0.4    0    ]
             x3 [ 0      0.09   0.21 ]

The probabilities P(y1), P(y2), P(y3) are obtained by adding the columns of [P(X, Y)]:
[P(Y)] = [P(y1) P(y2) P(y3)] = [0.24 0.55 0.21]
The conditional probability matrix [P(X|Y)] is obtained by dividing the columns of the joint probability matrix by P(y1), P(y2), P(y3) respectively:

             [ 1   0.109   0 ]
[P(X|Y)] =   [ 0   0.727   0 ]
             [ 0   0.164   1 ]

H(X) = − Σ_{i=1}^{3} P(x_i) log P(x_i)
     = −[P(x1) log P(x1) + P(x2) log P(x2) + P(x3) log P(x3)]
     = −[0.3 log(0.3) + 0.4 log(0.4) + 0.3 log(0.3)]
     = −[0.6 log(0.3) + 0.4 log(0.4)] = −[0.6(log 3 − log 10) + 0.4(log 2 − log 5)]
     = −[0.6(1.584963 − 3.321928) + 0.4(1 − 2.321928)]
     = −(−1.042179 − 0.5287712)
     = 1.5709502 bits/message

H(Y) = − Σ_{j=1}^{3} P(y_j) log P(y_j)
     = −[P(y1) log P(y1) + P(y2) log P(y2) + P(y3) log P(y3)]
     = −[0.24 log(0.24) + 0.55 log(0.55) + 0.21 log(0.21)]
     = −[0.24(log 6 − log 25) + 0.55(log 11 − log 20) + 0.21(log 21 − log 100)]
     = −[0.24(2.584963 − 4.643856) + 0.55(3.459432 − 4.321928) + 0.21(4.392317 − 6.643856)]
     = −(−0.49413432 − 0.4743728 − 0.47282319)
     = 1.44133031 bits/message

H(X, Y) = − Σ_{i=1}^{3} Σ_{j=1}^{3} P(x_i, y_j) log P(x_i, y_j)
        = −[P(x1, y1) log P(x1, y1) + ⋯ + P(x3, y3) log P(x3, y3)]
        = −[0.24 log(0.24) + 0.06 log(0.06) + 0.4 log(0.4) + 0.09 log(0.09) + 0.21 log(0.21)]
        = −[0.24(log 6 − log 25) + 0.06(log 3 − log 50) + 0.4(log 2 − log 5) + 0.09(log 9 − log 100) + 0.21(log 21 − log 100)]
        = −[0.24(2.584963 − 4.643856) + 0.06(1.584963 − 5.643856) + 0.4(1 − 2.321928) + 0.09(3.169925 − 6.643856) + 0.21(4.392317 − 6.643856)]
        = −(−0.49413432 − 0.24353358 − 0.5287712 − 0.31265379 − 0.47282319)
        = 2.05191608 bits/message

H(X|Y) = H(X, Y) − H(Y) = 2.05191608 − 1.44133031 = 0.61058577 bits/message
H(Y|X) = H(X, Y) − H(X) = 2.05191608 − 1.5709502 = 0.48096588 bits/message

Q3. Find the mutual information for the channel whose joint probability matrix is:

             [ 0.125  0.075  0.05 ]
[P(X, Y)] =  [ 0.125  0.075  0.05 ]
             [ 0.125  0.075  0.05 ]
             [ 0.125  0.075  0.05 ]

Solution: I(X; Y) = H(X) + H(Y) − H(X, Y)

[P(X)] = [0.25 0.25 0.25 0.25] and [P(Y)] = [0.5 0.3 0.2]

H(X) = − Σ_{i=1}^{4} P(x_i) log P(x_i)
     = −[0.25 log 0.25 + 0.25 log 0.25 + 0.25 log 0.25 + 0.25 log 0.25]
     = −[4(0.25) log 0.25] = −[log 1 − log 4] = −(0 − 2) = 2 bits/message

H(Y) = − Σ_{j=1}^{3} P(y_j) log P(y_j)
     = −[0.5 log 0.5 + 0.3 log 0.3 + 0.2 log 0.2]
     = −[0.5(−1) + 0.3(−1.736965) + 0.2(−2.321928)]
     = −[−0.5 − 0.5210895 − 0.4643856] = 1.4854751 bits/message

H(X, Y) = − Σ_{i=1}^{4} Σ_{j=1}^{3} P(x_i, y_j) log P(x_i, y_j)
        = −4[0.125 log(0.125) + 0.075 log(0.075) + 0.05 log(0.05)]
        = −4[0.125(log 1 − log 8) + 0.075(log 3 − log 40) + 0.05(log 1 − log 20)]
        = −4[0.125(−3) + 0.075(−3.736965) + 0.05(−4.321928)]
        = −4(−0.375 − 0.28027238 − 0.2160964) = 3.48547512 bits/message

I(X; Y) = 2 + 1.4854751 − 3.48547512 = 0

Here the rows of [P(X, Y)] are identical, so X and Y are independent and the channel conveys no information: the mutual information is exactly zero.

Relative Entropy (or Kullback–Leibler Divergence) Between p(x) and q(x):

The relative entropy is a measure of the distance between two probability distributions on a random variable. It is not a true distance, since it is not symmetric and does not satisfy the triangle inequality. It is also called the Kullback–Leibler divergence. [Note: the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side.]

For two probability distributions p(x) and q(x) over a discrete random variable X, the relative entropy is

D(p||q) = D_KL(p||q) = Σ_{x∈𝒳} p(x) log [p(x)/q(x)] = E_p [log (p(x)/q(x))]

with the conventions 0 log(0/0) = 0, 0 log(0/q) = 0 and p log(p/0) = ∞.

Interpretation: The relative entropy is a measure of the distance between two distributions. In statistics it arises as the expected logarithm of a likelihood ratio. Relative entropy measures the inefficiency of assuming that the distribution is q when the true distribution is p.

Relation Between Relative Entropy and Mutual Information:

Let (X, Y) ~ p(x, y) with marginal mass functions p(x) and p(y). Then the mutual information I(X; Y) is the relative entropy between the joint distribution p(x, y) and the product distribution p(x) p(y):

I(X; Y) = D[p(x, y) || p(x) p(y)]
        = Σ_{x∈𝒳} Σ_{y∈𝒴} p(x, y) log [p(x, y) / (p(x) p(y))]
        = E_{p(x,y)} [log (p(X, Y) / (p(X) p(Y)))]

Chain Rule for Relative Entropy:

The conditional relative entropy is

D[p(y|x) || q(y|x)] = Σ_x p(x) Σ_y p(y|x) log [p(y|x) / q(y|x)] = E_{p(x,y)} [log (p(Y|X) / q(Y|X))]

The chain rule for relative entropy is

D[p(x, y) || q(x, y)] = D[p(x) || q(x)] + D[p(y|x) || q(y|x)]

Channel Capacity:
It is the maximum of the mutual information, denoted by C:
C = max I(X; Y) = max[H(X) − H(X|Y)] = max[H(Y) − H(Y|X)]
where the maximum is taken over the input distribution P(X). Since mutual information is a difference of entropies, channel capacity is measured in bits/message (or in bits/sec when multiplied by the message rate).

Transmission Efficiency or Channel Efficiency:

It is denoted by η (eta):

η = (Actual transinformation) / (Maximum transinformation) = I(X; Y) / max I(X; Y) = I(X; Y) / C
Redundancy:
Redundancy is the reduction in the information content of a message from its maximum value. It is denoted by E:

E = 1 − η = 1 − I(X; Y)/C = [C − I(X; Y)] / C

or   E = 1 − H(Y|X)/H(X) = 1 − R/C = 1 − rH(X)/C          [R = rH(X) is the information rate]
or   E = 1 − H(X|Y)/H(Y) = 1 − R/C = 1 − rH(Y)/C

Types of Channels: (Discrete and Random Channels)


1. Lossless channel
2. Deterministic channel
3. Noiseless channel
4. Binary symmetric channel (BSC)
5. Binary erasure channel
6. Cascaded channel
7. Symmetric and uniform channel
8. Channel with independent I/P (input) and independent O/P (output)

1. Lossless Channel: A lossless channel is one in which no source information is lost in transmission. Its channel matrix has only one non-zero element in each column, for example:

                     y1    y2    y3    y4   y5
   [P(Y|X)] =  x1 [ 3/4   1/4   0     0    0 ]
               x2 [ 0     0     1/3   2/3  0 ]
               x3 [ 0     0     0     0    1 ]

   [Signal flow graph: x1 → y1 (3/4) and x1 → y2 (1/4); x2 → y3 (1/3) and x2 → y4 (2/3); x3 → y5 (1).]

For a lossless channel 𝐻(𝑋|𝑌) = 0. The channel capacity per symbol for a lossless
channel is given by 𝐶 = 𝑚𝑎𝑥[𝐻(𝑋)] = log 2 𝑚. Also, 𝐼(𝑋; 𝑌) = 𝐻(𝑋) − 𝐻(𝑋|𝑌) =
𝐻(𝑋) − 0 = 𝐻(𝑋). The mutual information 𝐼(𝑋; 𝑌) is equal to the source entropy
𝐻(𝑋) and no source information is lost in transmission.

2. Deterministic Channel: A channel is termed a deterministic channel if its channel matrix has only one non-zero element in each row, which must therefore be unity. The channel matrix is, for example,

                    y1  y2  y3
   [P(Y|X)] = x1 [ 1   0   0 ]
              x2 [ 1   0   0 ]
              x3 [ 0   1   0 ]
              x4 [ 0   1   0 ]
              x5 [ 0   0   1 ]

   [Signal flow graph: x1 → y1 (1), x2 → y1 (1), x3 → y2 (1), x4 → y2 (1), x5 → y3 (1).]

   Here H(Y|X) = 0, and in general H(Y) ≤ H(X). When a given source symbol is transmitted through a deterministic channel, it is certain which output symbol will be received.

3. Noiseless Channel: A channel which has the characteristics of both a lossless and a deterministic channel is called a noiseless channel. The channel matrix has exactly one non-zero element in each row and each column, and this element is unity:

                    y1  y2  y3  y4
   [P(Y|X)] = x1 [ 1   0   0   0 ]
              x2 [ 0   1   0   0 ]
              x3 [ 0   0   1   0 ]
              x4 [ 0   0   0   1 ]

   [Signal flow graph: x1 → y1, x2 → y2, x3 → y3, x4 → y4, each with probability 1.]

Some properties are:


• [𝑃(𝑌|𝑋)] = [𝑃(𝑋|𝑌)]
• 𝐻(𝑋, 𝑌) = 𝐻(𝑋) = 𝐻(𝑌)
• 𝐻(𝑌|𝑋) = 𝐻(𝑋|𝑌) = −𝑚(1 log 1) = 0
• 𝐼(𝑋; 𝑌) = 𝐻(𝑋) − 𝐻(𝑋|𝑌) = 𝐻(𝑋) = 𝐻(𝑌) = 𝐻(𝑋, 𝑌)

4. Binary Symmetric Channel (BSC): A BSC has two inputs {x1 = 0, x2 = 1} and two outputs {y1 = 0, y2 = 1}; here m = n = 2. The channel matrix is

                     y1     y2
   [P(Y|X)] =  x1 [ p     1−p ]   [ p  q ]
               x2 [ 1−p   p   ] = [ q  p ]

   where p is the transition probability.

   [Signal flow graph: x1 → y1 and x2 → y2 with probability p; x1 → y2 and x2 → y1 with probability 1 − p.]

   It is the most common and widely used channel. It is called a binary symmetric channel because the probability of receiving a ‘1’ when a ‘0’ is sent is the same as the probability of receiving a ‘0’ when a ‘1’ is sent.

5. Binary Erasure Channel (BEC): The channel matrix of the BEC is

                   0   y   1
   [P(Y|X)] =  0 [ p   q   0 ]
               1 [ 0   q   p ]

   [Signal flow graph: input 0 → output 0 with probability p and 0 → y with probability q; input 1 → output 1 with probability p and 1 → y with probability q.]
BEC is one of the important types of channel used in digital communication.
Whenever an error occurs the symbol will be received as y and no decision will be
made about the information, but an immediate request will be made for
retransmission, rejecting what has been received. Thus, ensuring 100% correct data
recovery. This channel is also a type of symmetric channel.

P(x1 = 0) = α,   P(x2 = 1) = 1 − α

H(Y|X) = p log(1/p) + q log(1/q)

H(X) = α log(1/α) + (1 − α) log(1/(1 − α))

The joint probability matrix [P(X, Y)] is found by multiplying the rows of [P(Y|X)] by α and (1 − α) respectively:

[P(X, Y)] = [ αp   αq        0        ]
            [ 0    (1 − α)q  (1 − α)p ]

[P(Y)] = [αp   q   (1 − α)p]

[P(X|Y)] is found by dividing the columns of the joint probability matrix by p(y1), p(y2) and p(y3) respectively:

[P(X|Y)] = [ 1   α       0 ]
           [ 0   1 − α   1 ]

H(X|Y) = − Σ_{i=1}^{2} Σ_{j=1}^{3} P(x_i, y_j) log P(x_i|y_j)
       = −[P(x1, y1) log P(x1|y1) + P(x1, y2) log P(x1|y2) + ⋯ + P(x2, y3) log P(x2|y3)]
       = −[αp log 1 + αq log α + (1 − α)q log(1 − α) + (1 − α)p log 1]
       = −q[α log α + (1 − α) log(1 − α)]
       = qH(X) = (1 − p)H(X)

I(X; Y) = H(X) − H(X|Y) = H(X) − (1 − p)H(X) = pH(X)

C = max I(X; Y) = max[pH(X)] = p max[H(X)] = p          [∵ max H(X) = 1]


6. Cascaded Channel:

   [Signal flow graph: two binary symmetric channels in cascade. In each stage, x1 → y1 and x2 → y2 with probability p, and x1 → y2 and x2 → y1 with probability q; the outputs of the first stage feed the second stage, whose outputs are z1 and z2.]

   The channel diagram of a cascaded channel is a merger of two BSC diagrams. The equivalent circuit is a single binary channel from x1, x2 to z1, z2 with probabilities p′ (correct transmission) and q′ (error).

   The message from x1 reaches z1 in two ways, x1 → y1 → z1 and x1 → y2 → z1, with probabilities p·p and q·q respectively. Now,

   p′ = p² + q² = (p + q)² − 2pq = 1 − 2pq

   Similarly, the message from x1 reaches z2 in two ways, x1 → y1 → z2 and x1 → y2 → z2. Therefore,

   q′ = pq + qp = 2pq

   [P(Z|X)] = [ 1 − 2pq   2pq     ]   [ p′  q′ ]
              [ 2pq       1 − 2pq ] = [ q′  p′ ]

   Thus, the cascaded channel is equivalent to a single binary symmetric channel with error probability 2pq.
   Now, the capacity of a BSC is
   C = 1 − H(q)
   so the capacity of the cascaded channel is
   C = 1 − H(q′) = 1 − H(2pq)
   For 0.5 > q > 0, 2pq is always greater than q. Hence the channel capacity of two cascaded binary symmetric channels is less than that of a single binary symmetric channel, as expected.
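A minimal NumPy sketch of the cascade (assumed value p = 0.9 for illustration): the equivalent channel matrix is the product of the two stage matrices, its off-diagonal entry equals 2pq, and the cascade's capacity 1 − H(2pq) is smaller than 1 − H(q):

```python
# Cascade of two identical binary symmetric channels.
import numpy as np

def bsc(p):
    q = 1 - p
    return np.array([[p, q],
                     [q, p]])

def binary_entropy(e):
    return 0.0 if e in (0.0, 1.0) else -(e * np.log2(e) + (1 - e) * np.log2(1 - e))

p = 0.9                                   # assumed correct-transmission probability
cascade = bsc(p) @ bsc(p)                 # equivalent single channel [P(Z|X)]
err = cascade[0, 1]                       # = 2pq = 0.18
print(cascade)                            # [[1-2pq, 2pq], [2pq, 1-2pq]]
print(1 - binary_entropy(1 - p))          # capacity of a single BSC, 1 - H(q) ~ 0.531
print(1 - binary_entropy(err))            # capacity of the cascade, 1 - H(2pq) ~ 0.320
```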

7. Symmetric Channel: A symmetric channel is defined as one for which
   a) H(Y|x_i) is independent of i, and
   b) Σ_{i=1}^{m} P(y_j|x_i) is independent of j.

   A channel is symmetric if the rows and the columns of the channel matrix [P(Y|X)] are separately identical except for permutation. If the channel matrix is a square matrix, then for a symmetric channel the rows and columns are identical up to permutation, i.e. the sum of each column equals the sum of each row. For example,

   a) [P(Y|X)] = [ 1/2  1/4  1/4 ]
                 [ 1/4  1/2  1/4 ]
                 [ 1/4  1/4  1/2 ]

   This is a symmetric channel, as the rows and the columns are identical except for permutations (each row and each column contains one 1/2 and two 1/4).

   b) [P(Y|X)] = [ 1/2  1/4  1/4 ]
                 [ 1/4  1/4  1/2 ]
                 [ 1/2  1/4  1/4 ]

   This is not a symmetric channel: although the rows are identical except for permutations, the columns are not.

Q. A discrete source transmits messages x1, x2, x3 with probabilities 0.3, 0.4, 0.3 respectively. The source is connected to the channel given in the diagram. Calculate all the entropies.

[Channel diagram: x1 → y1 (0.8) and x1 → y2 (0.2); x2 → y2 (1); x3 → y2 (0.3) and x3 → y3 (0.7).]

Solution: It is given that [P(X)] = [p(x1) p(x2) p(x3)] = [0.3 0.4 0.3].
The conditional probability matrix (channel matrix) for the given diagram is

                  y1    y2    y3
[P(Y|X)] =  x1 [ 0.8   0.2   0   ]
            x2 [ 0     1     0   ]
            x3 [ 0     0.3   0.7 ]

The joint probability matrix [P(X, Y)] is found by multiplying the rows of [P(Y|X)] by 0.3, 0.4 and 0.3 respectively:

[P(X, Y)] = [ 0.24  0.06  0    ]
            [ 0     0.40  0    ]
            [ 0     0.09  0.21 ]

[P(Y)] = [p(y1) p(y2) p(y3)] = [0.24 0.55 0.21]

[P(X|Y)] is found by dividing the columns of the joint probability matrix by p(y1), p(y2) and p(y3) respectively:

            [ 1   6/55   0 ]
[P(X|Y)] =  [ 0   8/11   0 ]
            [ 0   9/55   1 ]

H(X|Y) = − Σ_{i=1}^{3} Σ_{j=1}^{3} P(x_i, y_j) log P(x_i|y_j)
       = −[P(x1, y1) log P(x1|y1) + P(x1, y2) log P(x1|y2) + ⋯ + P(x3, y3) log P(x3|y3)]
       = −[0.24 log 1 + 0.06 log(6/55) + 0.4 log(8/11) + 0.09 log(9/55) + 0.21 log 1]
       = −[0 + 0.06(log 6 − log 55) + 0.4(log 8 − log 11) + 0.09(log 9 − log 55) + 0]
       = −[0.06(2.584963 − 5.78136) + 0.4(3 − 3.459432) + 0.09(3.169925 − 5.78136)]
       = −(−0.19178382 − 0.1837728 − 0.23502915) = 0.61058577 bit/message

Q. A discrete source is connected to the channel given in the diagram. Calculate all the entropies.

[Channel diagram with transition probabilities 0.8, 0.2, 1, 0.1 and 0.9; in the solution these labels are read as the conditional probability matrix P(X|Y) given below.]
Solution: In this case the channel matrix is

                  y1    y2   y3
[P(X|Y)] =  x1 [ 0.8   0    0   ]
            x2 [ 0.2   1    0.1 ]
            x3 [ 0     0    0.9 ]

Here P(Y) is not given, so take it as P(Y) = [1/3  1/3  1/3].

Now [P(X, Y)], obtained by multiplying each column of [P(X|Y)] by the corresponding P(y_j), is

[P(X, Y)] = [ 4/15   0     0    ]
            [ 1/15   1/3   1/30 ]
            [ 0      0     3/10 ]

[P(X)] = [4/15   13/30   3/10]

[P(Y|X)] = [P(X, Y)] with each row divided by P(x_i):

[P(Y|X)] = [ 1      0       0    ]
           [ 2/13   10/13   1/13 ]
           [ 0      0       1    ]

H(X) = − Σ_{i=1}^{3} P(x_i) log P(x_i) = −[P(x1) log P(x1) + P(x2) log P(x2) + P(x3) log P(x3)]
     = −[(4/15) log(4/15) + (13/30) log(13/30) + (3/10) log(3/10)]
     = −[(4/15)(−1.906891) + (13/30)(−1.206451) + (3/10)(−1.736965)]
     = −[−0.5085043 − 0.5227954 − 0.5210895] = 1.5523892 bits/message

Similarly, H(Y) = − Σ_{j=1}^{3} P(y_j) log P(y_j) = log2 3 = 1.584963 bits/message

H(X, Y) = − Σ_{i=1}^{3} Σ_{j=1}^{3} P(x_i, y_j) log P(x_i, y_j) ≈ 1.9819 bits/message

H(Y|X) = H(X, Y) − H(X) ≈ 1.9819 − 1.5524 = 0.4295 bits/message
H(X|Y) = H(X, Y) − H(Y) ≈ 1.9819 − 1.5850 = 0.3970 bits/message

Q1. If the conditional probability (channel) matrix is

[P(Y|X)] = [ 1/2  1/2  0 ]
           [ 0    0    1 ]

then a) draw the channel diagram, b) find the channel capacity.

Solution: a)

[Channel diagram: x1 → y1 (0.5) and x1 → y2 (0.5); x2 → y3 (1).]

b) Let P(x1) = α, P(x2) = 1 − α.

[P(X, Y)] = [ α/2  α/2  0     ],   [P(Y)] = [α/2  α/2  1 − α]
            [ 0    0    1 − α ]

H(Y) = −α log(α/2) − (1 − α) log(1 − α),   H(Y|X) = α,

I(X; Y) = H(Y) − H(Y|X) = −α log(α/2) − (1 − α) log(1 − α) − α

C = max I(X; Y), which occurs at α = ½, i.e.

C = −(1/2) log(1/4) − (1/2) log(1/2) − 1/2 = −0.5(log 1 − log 4) − 0.5(log 1 − log 2) − 0.5
  = −0.5(0 − 2) − 0.5(0 − 1) − 0.5 = 1 + 0.5 − 0.5 = 1 bit/message

(This agrees with the lossless-channel result, since each output is produced by only one input: C = log2 m = log2 2 = 1.)

Q2. Given

[P(X, Y)] = [ 1/20   0    ]
            [ 1/5    3/10 ]
            [ 1/20   2/5  ]

with P(x1) = 1/20, P(x2) = 1/2, P(x3) = 9/20, find P(Y|X), H(X), H(Y), H(X, Y) and H(Y|X).

Solution:

P(Y|X) = [ 1     0    ]
         [ 2/5   3/5  ]
         [ 1/9   8/9  ]

P(Y) = [3/10   7/10],   H(X) = 1.2343 bits/message,   H(Y) = 0.8813 bits/message

H(X, Y) = 1.94642 bits/message,   H(Y|X) = 0.71197 bits/message

8. Binary Channel (Non-symmetric): The channel matrix of this channel is

   [P(Y|X)] = [ P11  P12 ]
              [ P21  P22 ]

   [Signal flow graph: x1(0) → y1(0) with probability p11 and x1(0) → y2(1) with probability p12; x2(1) → y1(0) with probability p21 and x2(1) → y2(1) with probability p22.]

To find the channel capacity of a non-symmetric binary channel, auxiliary variables Q1 and Q2 are defined by

[P][Q] = −[H]

[ P11  P12 ] [ Q1 ]   [ P11 log P11 + P12 log P12 ]
[ P21  P22 ] [ Q2 ] = [ P21 log P21 + P22 log P22 ]

The channel capacity is C = log2(2^Q1 + 2^Q2).

Q. Find the mutual information and channel capacity when P(x1) = 0.6 and P(x2) = 0.4 and the channel diagram is

[Channel diagram: x1 → y1 (0.8) and x1 → y2 (0.2); x2 → y1 (0.3) and x2 → y2 (0.7).]

Solution:
                  y1    y2
[P(Y|X)] =  x1 [ 0.8   0.2 ]
            x2 [ 0.3   0.7 ]

[P(X)] = [P(x1) P(x2)] = [0.6 0.4]

The joint probability matrix is found by multiplying the rows of [P(Y|X)] by P(x1) and P(x2):

[P(X, Y)] = [ 0.48  0.12 ]
            [ 0.12  0.28 ]

[P(Y)] = [p(y1) p(y2)] = [0.6 0.4]

[P(X|Y)] is found by dividing the columns of [P(X, Y)] by p(y1) and p(y2):

[P(X|Y)] = [ 0.8  0.3 ]
           [ 0.2  0.7 ]

I(X; Y) = H(X) − H(X|Y)   or   I(X; Y) = H(Y) − H(Y|X)

H(X) = − Σ_{i=1}^{2} P(x_i) log P(x_i) = 0.9709502 bit/message

H(X, Y) = − Σ_{i=1}^{2} Σ_{j=1}^{2} P(x_i, y_j) log P(x_i, y_j) = 1.7566 bit/message

H(Y) = − Σ_{j=1}^{2} P(y_j) log P(y_j) = 0.9709502 bit/message

H(Y|X) = H(X, Y) − H(X) = 0.78565 bit/message

H(X|Y) = H(X, Y) − H(Y) = 0.78565 bit/message

I(X; Y) = 0.9709502 − 0.78565 = 0.1853 bit/message

For the channel capacity, P11 = 0.8, P12 = 0.2, P21 = 0.3, P22 = 0.7.

[P][Q] = −[H]

[ P11  P12 ] [ Q1 ]   [ P11 log P11 + P12 log P12 ]
[ P21  P22 ] [ Q2 ] = [ P21 log P21 + P22 log P22 ]

[ 0.8  0.2 ] [ Q1 ]   [ 0.8 log 0.8 + 0.2 log 0.2 ]   [ −0.25754 − 0.46438 ]   [ −0.72192 ]
[ 0.3  0.7 ] [ Q2 ] = [ 0.3 log 0.3 + 0.7 log 0.7 ] = [ −0.52109 − 0.36020 ] = [ −0.88129 ]

[ Q1 ]   [ −0.658172 ]
[ Q2 ] = [ −0.976912 ]

Channel capacity: C = log2(2^Q1 + 2^Q2) = log2(2^−0.658172 + 2^−0.976912) = log2 1.141747 ≈ 0.1912 bit/message
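A minimal NumPy sketch of this auxiliary-variable method: solve [P][Q] = −[H] for Q1, Q2 and evaluate C = log2(2^Q1 + 2^Q2), using the channel matrix of this example:

```python
# Capacity of a non-symmetric binary channel via the auxiliary variables Q1, Q2.
import numpy as np

P = np.array([[0.8, 0.2],
              [0.3, 0.7]])

# Right-hand side: row i gives sum_j P_ij log2 P_ij (i.e. -H(Y|x_i))
rhs = np.sum(P * np.log2(P), axis=1)

Q = np.linalg.solve(P, rhs)                 # Q1 ~ -0.6582, Q2 ~ -0.9769
C = np.log2(np.sum(2.0 ** Q))               # ~0.1912 bit/message
print(Q, C)
```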

Channel Capacity of a Symmetric Channel:

I(X; Y) = H(Y) − H(Y|X)
        = H(Y) − Σ_{i=1}^{m} H(Y|x_i) P(x_i)
        = H(Y) − A Σ_{i=1}^{m} P(x_i)

where A = H(Y|x_i) is independent of i and hence can be taken out of the summation.

Now, Σ_{i=1}^{m} P(x_i) = 1, so

I(X; Y) = H(Y) − A

Channel Capacity:

C = max[I(X; Y)] = max[H(Y) − A] = max H(Y) − A = log2 n − A

Q. For the binary symmetric channel

[Channel diagram: x1(0) → y1(0) and x2(1) → y2(1) with probability p; x1(0) → y2(1) and x2(1) → y1(0) with probability 1 − p.]

find the channel capacity for (i) p = 0.9, (ii) p = 0.8, (iii) p = 0.7.

Solution:
                     y1     y2
   [P(Y|X)] =  x1 [ p     1−p ]   [ p  q ]
               x2 [ 1−p   p   ] = [ q  p ]

C = log2 n − A = log2 2 − H(Y|x_i) = 1 − [− Σ_{j=1}^{2} P(y_j|x_i) log P(y_j|x_i)]
  = 1 + [p log p + (1 − p) log(1 − p)] = 1 + p log p + q log q
  = 1 − H(p) = 1 − H(q)          [the channel is binary, so H(p) = H(q)]

Now,

(i)   For p = 0.9, C = 0.53100
(ii)  For p = 0.8, C = 0.27807
(iii) For p = 0.7, C = 0.11871
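A minimal sketch evaluating the BSC capacity C = 1 − H(p) at the values used above:

```python
# BSC capacity C = 1 - H(p).
import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for p in (0.9, 0.8, 0.7):
    print(p, 1 - binary_entropy(p))   # ~0.5310, ~0.2781, ~0.1187 bit/message
```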

Q. Write the matrices [P(Y|X)], [P(X|Y)] and [P(X, Y)] for the following channel diagrams:

[Channel diagrams (i)–(iv): (i) a three-input, three-output channel; (ii) a three-input, two-output channel; (iii) a two-input, three-output channel; (iv) a three-input, three-output channel with edge probabilities 0.8, 0.2, 1, 0.3, 0.7. In (i)–(iii) the edge labels are joint probabilities; in (iv) they are conditional probabilities.]

Solution:

(i) [P(X, Y)] = [ 0.2  0.1  0   ]
                [ 0.1  0.2  0.1 ]
                [ 0    0.1  0.2 ]

(ii) [P(X, Y)] = [ 0.15  0.15 ]
                 [ 0.2   0.2  ]
                 [ 0.15  0.15 ]

(iii) [P(X, Y)] = [ 0.25  0.15  0.1 ]
                  [ 0.25  0.15  0.1 ]

(iv) [P(X|Y)] = [ 0.8  0  0   ]
                [ 0.2  1  0.3 ]
                [ 0    0  0.7 ]

(In diagram (iv) the labels are conditional probabilities, so the matrix obtained directly from it is [P(X|Y)]; its columns sum to 1.)

Q. Find the entropies and mutual information for the following channel diagrams:

[Channel diagrams: (i) a three-input, two-output channel whose edge labels are the joint probabilities 0.15, 0.15, 0.2, 0.2, 0.15, 0.15; (ii) a three-input, three-output channel with conditional probabilities 0.8, 0.2, 1, 0.3, 0.7, the same channel as (iv) above.]

Solution:

(i) [P(X, Y)] = [ 0.15  0.15 ]
                [ 0.2   0.2  ]
                [ 0.15  0.15 ]

[𝑃(𝑋)] = [0.3 0.4 0.3]

[𝑃(𝑌)] = [0.5 0.5]

𝐻(𝑋) = −[−0.521 − 0.5288 − 0.521] = 1.5708

𝐻(𝑌) = 1

𝐻(𝑋, 𝑌) = −[4(−0.4105) + 2(−0.4644)] = 2.5708

𝐼(𝑋; 𝑌) = 𝐻(𝑋) + 𝐻(𝑌) − 𝐻(𝑋, 𝑌) = 1.5708 + 1 − 2.5708 = 0

(ii) [P(X|Y)] = [ 0.8  0  0   ]
                [ 0.2  1  0.3 ]
                [ 0    0  0.7 ]

[P(Y)] = [0.2 0.5 0.3]

[P(X, Y)] = [ 0.16  0    0    ]
            [ 0.04  0.5  0.09 ]
            [ 0     0    0.21 ]

[𝑃(𝑋)] = [0.16 0.63 0.21]

𝐻(𝑋) = −[−0.423 − 0.4199 − 0.4728] = 1.3157

𝐻(𝑌) = −[−0.4644 − 0.5 − 0.5211] = 1.4855

𝐻(𝑋, 𝑌) = −[−0.423 − 0.1857 − 0.5 − 0.3126 − 0.47281] = 1.89411

𝐻(𝑋|𝑌) = 0.40861

𝐻(𝑌|𝑋) = 0.57841

𝐼(𝑋; 𝑌) = 𝐻(𝑋) + 𝐻(𝑌) − 𝐻(𝑋, 𝑌) = 1.3157 + 1.4855 − 1.89411 = 0.90709

Q. Find the channel capacity of the channel diagram shown below:

[Channel diagram: two binary channels in cascade; the first has matrix [[0.9, 0.1], [0.2, 0.8]] and the second [[0.8, 0.2], [0.1, 0.9]].]

Solution:

First we form the equivalent single channel.

[Equivalent channel diagram: x1 → y1 with probability P11 and x1 → y2 with probability P12; x2 → y1 with probability P21 and x2 → y2 with probability P22.]

P11 = (0.9 × 0.8) + (0.1 × 0.1) = 0.73
P12 = (0.9 × 0.2) + (0.1 × 0.9) = 0.27
P21 = (0.8 × 0.1) + (0.2 × 0.8) = 0.24
P22 = (0.8 × 0.9) + (0.2 × 0.2) = 0.76

[P(Y|X)] = [ 0.73  0.27 ]
           [ 0.24  0.76 ]

This is a binary non-symmetric channel, as the matrix is not of the form [ p  1−p ; 1−p  p ].
Now find the auxiliary variables Q1 and Q2:

[P][Q] = −[H]

[ P11  P12 ] [ Q1 ]   [ P11 log P11 + P12 log P12 ]
[ P21  P22 ] [ Q2 ] = [ P21 log P21 + P22 log P22 ]

[ 0.73  0.27 ] [ Q1 ]   [ −0.33144 − 0.51002 ]
[ 0.24  0.76 ] [ Q2 ] = [ −0.49414 − 0.30091 ]

[ Q1 ]   [ −0.8670 ]
[ Q2 ] = [ −0.7723 ]

Therefore,
C = log2(2^Q1 + 2^Q2) ≈ 0.1811 bit/message

Fundamental Theorem of Information Theory:


Statement: Given a discrete memoryless channel with capacity C > 0 and a positive number R < C, there exists a sequence of codes 𝒜1, 𝒜2, 𝒜3, … such that 𝒜n is a ([2^{nR}], n, λ_n) code and λ_n → 0 as n → ∞.
Thus it is possible, by choosing n sufficiently large, to reduce the maximum probability of error to a figure as low as desired while at the same time maintaining the transmission rate R.
The fundamental theorem of information theory states that digits can be transmitted
through a noisy channel with an arbitrarily small probability of error at any rate less than a
certain limit known as channel capacity.
