
Information Theory and Coding

IT-6th Sem.

Praveen Yadav
Unit:1 INFORMATION THEORY

Topic: 1 Uncertainty, Information, Mutual information


Information Theory
• Information Theory is a branch of applied mathematics and electrical engineering that deals with the quantification, storage, and communication of information.
• It provides a framework to measure the efficiency of data transmission and compression in communication systems.
• It was developed by Claude Shannon in the 1940s.
• Information theory is vital in data compression, communication, cryptography, machine learning, and more.
• It helps solve real-world problems by enabling more efficient data storage, secure communication, and intelligent decision-making.
Outline
• Uncertainty
• Information
• Mutual information
• Exercise (L-1, L-2, L-3, L-4)
Uncertainty & Information, Mutual information

• Uncertainty represents the lack of complete knowledge about an event or system. It indicates the range of possibilities before the outcome is known.
Example: Flipping a fair coin.

• Information is a measure of the reduction in uncertainty upon receiving new data or observing an event.
Example: Once you flip the coin and observe the outcome (say it lands on Heads), you gain information that resolves the uncertainty.
Information
• Information is a function of the probability of the outcome.
Example:

• Amount of Information
I = log2(1/P(Outcome))   …(1)
I = −log2 P(Outcome)   …(2)
When the log base is 2 (binary system), the unit is the bit. When the base is e, the unit is the nat; when the base is 10, the unit is the Hartley.
1 nat ≈ 1.4427 bits, 1 Hartley ≈ 3.3219 bits.
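As a quick numerical check of equations (1)–(2) and the unit conversions, here is a minimal Python sketch; the probability 0.25 is an arbitrary illustrative value:

```python
import math

def self_information(p, base=2.0):
    """Self-information of an outcome with probability p, in units set by the log base."""
    return -math.log(p) / math.log(base)

p = 0.25  # example outcome probability (illustrative assumption)
print(self_information(p, 2))         # 2.0 bits
print(self_information(p, math.e))    # ~1.386 nats
print(self_information(p, 10))        # ~0.602 Hartleys
print(self_information(p, math.e) * 1.4427)  # nats converted to bits, again ~2.0
```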
Mutual information
• What is Mutual Information?
A measure of the amount of information one variable reveals about another.

• Mutual information quantifies the reduction in uncertainty about one random variable due to knowledge of another.

• Example: Weather and dressing choices.
Cont.…Mutual Information
Let x and y be the transmitted and received symbols, respectively.
• Marginal Probability
P(X=x): Probability of a specific value of X.
P(Y=y): Probability of a specific value of Y.
• Joint Probability
P(X=x, Y=y): Probability of X and Y occurring together.
• Conditional Probability
P(X=x/Y=y): Probability of X given Y.
P(X=x/Y=y) = P(X=x, Y=y)/P(Y=y)   …(3)

The information gained about (X=x) by the reception of (Y=y) is the net reduction in its uncertainty, and is known as the mutual information I(x, y).
Cont.…Mutual Information

I(x, y) = Initial uncertainty – Final uncertainty
        = log(1/P(X=x)) – log(1/P(X=x/Y=y))
        = –log P(X=x) + log P(X=x/Y=y)
I(x, y) = log(P(X=x/Y=y)/P(X=x))   …(4)

Something interesting…
From equations (3) and (4):
I(x, y) = log(P(X=x, Y=y)/(P(Y=y) P(X=x)))
        = log(P(Y=y/X=x)/P(Y=y))   …(5)
        = I(y, x)
I(x, y) = I(y, x)   …(6)
Mutual information is symmetrical in x and y.
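To make the symmetry in equation (6) concrete, the following sketch computes I(x, y) and I(y, x) from an assumed joint distribution (the probability values are illustrative):

```python
import math

# Assumed joint distribution P(X, Y) for two binary symbols (illustrative values)
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

p_x = {x: sum(v for (xx, _), v in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(v for (_, yy), v in p_xy.items() if yy == y) for y in (0, 1)}

x, y = 0, 0
i_xy = math.log2(p_xy[(x, y)] / (p_x[x] * p_y[y]))    # log P(x,y) / (P(x) P(y)), equation (4)
i_yx = math.log2((p_xy[(x, y)] / p_x[x]) / p_y[y])    # log P(y|x) / P(y), equation (5)
print(i_xy, i_yx)  # identical values, confirming I(x, y) = I(y, x)
```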
Unit:1 INFORMATION THEORY
Topic: 1 Uncertainty, Information, Mutual information

L-1: A digital assistant predicts whether you will choose to watch a movie or listen to music based on historical data. The probabilities are:
• Watching a movie: P(Movie) = 0.7
• Listening to music: P(Music) = 0.3
(a) What is the amount of uncertainty (in bits) for each option?
(b) Which activity has higher uncertainty, and why?

L-2: A delivery company uses GPS data to predict whether a package will arrive on time or be delayed:
• On-time: P(O) = 0.8
• Delayed: P(D) = 0.2
(a) If the probability of delay increases to 0.5, calculate the average information and compare the two scenarios.
(b) What does the change in average information tell you about the uncertainty of the delivery prediction system?
Cont…Topic: 1 Uncertainty, Information, Mutual information
L-3: In a communication system, a transmitter sends a
binary signal (0 or 1) through a noisy channel. The
receiver decodes the signal with the following
probabilities:
• If 0 is sent, the receiver detects 0 with probability 0.9 and 1 with probability 0.1.
• If 1 is sent, the receiver detects 1 with probability 0.85 and 0 with probability 0.15.
(a) How much information is gained when the receiver
correctly detects a transmitted signal?
(b) How this information can help improve the system’s
reliability.
Cont…Topic: 1 Uncertainty, Information, Mutual information
L-4: A weather monitoring station reports the likelihood of
three events based on real-time data:
Sunny: P(Sunny)=0.5
Cloudy: P(Cloudy)=0.3
Rainy: P(Rainy)=0.2
A prediction model updates the probabilities after
observing the wind speed:
Sunny: P(Sunny | Wind)=0.7
Cloudy: P(Cloudy | Wind)=0.2
Rainy: P(Rainy | Wind)=0.1
(a) Calculate the information gained about the weather
condition after observing the wind speed for each
event.
(b) How such updates in information can improve real-
time weather prediction systems.
Topic: 2 Marginal, conditional and joint Entropies
Mutual information for noisy channel:

• Consider the set of symbols x1, x2, …, xn that the transmitter Tx may produce. The receiver Rx may receive y1, y2, …, ym. Theoretically, if noise and jamming are neglected, then set X = set Y.

• The amount of information that yj provides about xi is called the mutual information between xi and yj.
Properties of I(xi, yj):

1. It is symmetric, I(xi, yj) = I(yj, xi).

2. I(xi, yj) > 0 if the a posteriori probability > the a priori probability; yj provides +ve information about xi.

3. I(xi, yj) = 0 if the a posteriori probability = the a priori probability, which is the case of statistical independence, when yj provides no information about xi.

4. I(xi, yj) < 0 if the a posteriori probability < the a priori probability; yj provides -ve information about xi, or yj adds ambiguity.
1. Joint entropy:
In information theory, joint entropy is a measure of the
uncertainty associated with a set of variables.
Entropy:
H(X) = − Σ_{i=1}^{n} P(xi) log2 P(xi)

Joint Entropy (P(x, y) is the joint probability):
H(X, Y) = H(XY) = − Σ_{j=1}^{m} Σ_{i=1}^{n} P(xi, yj) log2 P(xi, yj)   bits/symbol
2. Conditional entropy:
In information theory, the conditional entropy quantifies the
amount of information needed to describe the outcome of a
random variable Y given that the value of another random variable
X is known.

Entropy:
H(X) = − Σ_{i=1}^{n} P(xi) log2 P(xi)

Conditional Entropy (P(x | y) is the conditional probability):
H(Y | X) = − Σ_{j=1}^{m} Σ_{i=1}^{n} P(xi, yj) log2 P(yj | xi)   bits/symbol
3. Marginal Entropies:
Marginal entropy is a term usually used to denote both the source entropy H(X), defined as before, and the receiver entropy H(Y), given by:

Source Entropy:
H(X) = − Σ_{i=1}^{n} P(xi) log2 P(xi)

Receiver Entropy:
H(Y) = − Σ_{j=1}^{m} P(yj) log2 P(yj)   bits/symbol
4. Relationship between joint, conditional entropy and transinformation:

Noise entropy: H(Y | X) = H(X, Y) − H(X)
Loss entropy: H(X | Y) = H(X, Y) − H(Y)

Also we have the transinformation (average mutual information):
I(X, Y) = H(X) − H(X | Y)
I(X, Y) = H(Y) − H(Y | X)
Example: The joint probability of a system is given by (rows x1, x2, x3; columns y1, y2):

P(X, Y) =  x1 [ 0.5      0.25   ]
           x2 [ 0        0.125  ]
           x3 [ 0.0625   0.0625 ]

Find:
1. Marginal entropies.
2. Joint entropy.
3. Conditional entropies.
4. The transinformation.
Solution:
P(x) = [0.75  0.125  0.125],   P(y) = [0.5625  0.4375]

1. Marginal entropies:
H(x) = − Σ_{i=1}^{n} P(xi) log2 P(xi) = −[0.75 ln 0.75 + 2 × 0.125 ln 0.125] / ln 2
     = 1.06127 bits/symbol

H(y) = − Σ_{j=1}^{m} P(yj) log2 P(yj) = −[0.5625 ln 0.5625 + 0.4375 ln 0.4375] / ln 2
     = 0.9887 bits/symbol

2. Joint entropy:
H(x, y) = − Σ_{j=1}^{m} Σ_{i=1}^{n} P(xi, yj) log2 P(xi, yj)
        = −[0.5 log2 0.5 + 0.25 log2 0.25 + 0.125 log2 0.125 + 2 × 0.0625 log2 0.0625]
        = 1.875 bits/symbol
3. Conditional entropies:
H(y|x) = H(x, y) − H(x) = 1.875 − 1.06127 = 0.813 bits/symbol
H(x|y) = H(x, y) − H(y) = 1.875 − 0.9887 = 0.886 bits/symbol

4. The transinformation:
I(x, y) = H(x) − H(x|y) = 1.06127 − 0.886 = 0.175 bits/symbol
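The worked example can be verified numerically; this is a minimal sketch that recomputes every quantity from the given joint probability matrix:

```python
import numpy as np

# Joint probability matrix P(X, Y) from the example (rows x1..x3, columns y1, y2)
P = np.array([[0.5, 0.25],
              [0.0, 0.125],
              [0.0625, 0.0625]])

def H(probs):
    """Entropy in bits, ignoring zero-probability entries."""
    p = probs[probs > 0]
    return -np.sum(p * np.log2(p))

Px, Py = P.sum(axis=1), P.sum(axis=0)     # marginal distributions P(x), P(y)
Hx, Hy, Hxy = H(Px), H(Py), H(P.ravel())  # marginal and joint entropies
print(Hx, Hy, Hxy)          # ~1.061, ~0.989, 1.875
print(Hxy - Hx, Hxy - Hy)   # H(y|x) ~0.814, H(x|y) ~0.886
print(Hx - (Hxy - Hy))      # transinformation I(x, y) ~0.175
```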


Topic: 2 Marginal, conditional and joint Entropies
L-1: A TV broadcasting station sends one of three types of programs during primetime:
• News: P(N) = 0.5
• Sports: P(S) = 0.3
• Movies: P(M) = 0.2
Calculate the marginal entropy H(X) of the program selection, where X represents the type of program broadcast.
L-2: A temperature sensor sends readings to a control system with two variables:
X: the actual temperature status {High, Low},
Y: the sensor's output {High, Low}.
The joint probabilities are:
P(X,Y) = [P(High, High)=0.6, P(High, Low)=0.1, P(Low, High)=0.2, P(Low, Low)=0.1]
Compute the quantity that represents the uncertainty in the sensor's output given the actual temperature.
Cont…Topic: 2 Marginal, conditional and joint Entropies
L-3: A communication channel transmits two binary
signals X and Y with the following joint probabilities:
P(X,Y) = [P(0,0)=0.3, P(0,1)=0.2, P(1,0)=0.1,
P(1,1)=0.4]
Explain how joint entropy represents the total uncertainty
in the transmitted and received signals.
L-4: In a machine learning model, the variables X (input
features) and Y(model predictions) are related as follows:
P(X,Y)=[P(A, Correct)=0.4, P(A, Incorrect)=0.1,
P(B, Correct)=0.3, P(B, Incorrect)=0.2]
(a) Calculate all the entropies.
(b) Discuss how understanding these entropies can help
improve the reliability of the machine learning model.
Topic: 3- Shannon’s concept of information:
Information as a Measure of Uncertainty, Entropy-A Measure
of Average Information, Communication System Model,
Channel Capacity
Shannon's Information Theory

Claude Shannon: "A Mathematical Theory of Communication," Bell System Technical Journal, 1948

• Shannon's measure of information is the number of bits to represent the amount of uncertainty (randomness) in a data source, and is defined as entropy:

H = − Σ_{i=1}^{n} p_i log(p_i)

where there are n symbols 1, 2, …, n, each with probability of occurrence p_i.
Shannon's Entropy
• Consider the following string consisting of symbols a and b:

abaabaababbbaabbabab…

• On average, there are equal numbers of a's and b's.

• The string can be considered as the output of the source below, which emits symbol a or symbol b with equal probability:

[Source diagram: a source that outputs a with probability 0.5 and b with probability 0.5.]

We want to characterize the average information generated by the source!
Intuition on Shannon's Entropy

Why H = − Σ_{i=1}^{n} p_i log(p_i)?

Suppose you have a long random string of two binary symbols 0 and 1, and the probabilities of symbols 0 and 1 are p_0 and p_1.

Ex: 00100100101101001100001000100110001…

If any string is long enough, say N, it is likely to contain N·p_0 0's and N·p_1 1's. The probability that this string pattern occurs is

p ≈ p_0^(N·p_0) · p_1^(N·p_1)

Hence, the # of possible patterns is 1/p ≈ p_0^(−N·p_0) · p_1^(−N·p_1)

The # of bits to represent all possible patterns is

log(p_0^(−N·p_0) · p_1^(−N·p_1)) = − Σ_{i=0}^{1} N·p_i log p_i

The average # of bits to represent each symbol is therefore

− Σ_{i=0}^{1} p_i log p_i
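To see the counting argument numerically, the sketch below uses assumed values p_0 = 0.75 and N = 1000 (illustrative only) and compares the per-symbol bit count with the entropy:

```python
import math

p0, p1 = 0.75, 0.25     # assumed symbol probabilities (illustrative)
N = 1000                # length of the string

# Entropy per symbol in bits
H = -(p0 * math.log2(p0) + p1 * math.log2(p1))

# log2 of the number of typical patterns, i.e. log2(1 / (p0^(N*p0) * p1^(N*p1)))
log2_patterns = -(N * p0 * math.log2(p0) + N * p1 * math.log2(p1))

print(H)                    # ~0.811 bits/symbol
print(log2_patterns / N)    # same value: bits per symbol needed to index the patterns
```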
More Intuition on Entropy
• Assume a binary memoryless source, e.g., a flip of a coin. How much information do we receive when we are told that the outcome is heads?

• If it's a fair coin, i.e., P(heads) = P(tails) = 0.5, we say that the amount of information is 1 bit.

• If we already know that it will be (or was) heads, i.e., P(heads) = 1, the amount of information is zero!

• If the coin is not fair, e.g., P(heads) = 0.9, the amount of information is more than zero but less than one bit!

• Intuitively, the amount of information received is the same if P(heads) = 0.9 or P(heads) = 0.1.
Self Information
• So, let's look at it the way Shannon did.
• Assume a memoryless source with
  • alphabet A = (a1, …, an)
  • symbol probabilities (p1, …, pn).
• How much information do we get when finding out that the next symbol is ai?
• According to Shannon, the self information of ai is i(ai) = −log(pi).

Why?
Assume two independent events A and B, with probabilities P(A) = pA and P(B) = pB.

For both events to happen, the probability is pA · pB. However, the amounts of information should be added, not multiplied.

Logarithms satisfy this!

Further, we want the information to increase with decreasing probability, so let's use the negative logarithm.
Self Information
Example 1:

Example 2:

Which logarithm? Pick the one you like! If you pick the natural log,
you’ll measure in nats, if you pick the 10-log, you’ll get Hartleys,
if you pick the 2-log (like everyone else), you’ll get bits.
Self Information

On average over all the symbols, we get:

H(X) = Σ_i p_i · i(a_i) = − Σ_i p_i log(p_i)

H(X) is called the first order entropy of the source.

This can be regarded as the degree of uncertainty about the following symbol.
Entropy
Example: Binary Memoryless Source
BMS → 01101000…

Let P(1) = p and P(0) = 1 − p. The entropy is then

H = −p log2 p − (1 − p) log2(1 − p), often denoted Hb(p).

[Plot: Hb(p) versus p for 0 ≤ p ≤ 1, with a maximum of 1 bit at p = 0.5.]

The uncertainty (information) is greatest when p = 0.5.
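A minimal sketch of the binary entropy function Hb(p) described above; it reproduces the behaviour of the plotted curve without the plot:

```python
import math

def binary_entropy(p):
    """H_b(p) in bits for a binary memoryless source with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, binary_entropy(p))  # maximum of 1 bit at p = 0.5, symmetric about 0.5
```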
Example
Three symbols a, b, c with corresponding probabilities:

P = {0.5, 0.25, 0.25}

What is H(P)?

Three weather conditions in Corvallis: rain, sunny, cloudy, with corresponding probabilities:

Q = {0.48, 0.32, 0.20}

What is H(Q)?
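The two entropies asked for above can be computed directly with the first-order entropy definition; a short sketch:

```python
import math

def entropy(probs):
    """First-order entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

P = [0.5, 0.25, 0.25]
Q = [0.48, 0.32, 0.20]
print(entropy(P))  # 1.5 bits
print(entropy(Q))  # ~1.499 bits, almost the same average uncertainty
```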
Entropy: Three properties
1. It can be shown that 0 ≤ H ≤ log N.

2. Maximum entropy (H = log N) is reached when all symbols are equiprobable, i.e., pi = 1/N.

3. The difference log N − H is called the redundancy of the source.
Communication System Model
Channel Capacity
In 1948, Claude Shannon developed a mathematical model for the channel such that, insofar as the model is realistic, there exists an upper limit on the rate at which information can be transmitted from source to user error-free. This upper limit is called the channel capacity, C. It is a function of the bandwidth (B) and the signal-to-noise ratio (S/N or SNR):

C = B log2(1 + S/N)
Topic: 3- Shannon’s concept of information: Information as a
Measure of Uncertainty, Entropy-A Measure of Average
Information, Communication System Model, Channel Capacity
L-1: A weather forecasting system predicts two possible outcomes for tomorrow's weather:
• Sunny: P(S) = 0.8
• Rainy: P(R) = 0.2

(a) How much information (in bits) is gained when the forecast predicts it will be sunny?
(b) Which weather outcome carries more uncertainty, and why?

L-2: A messaging app compresses text messages by analyzing character usage. The probabilities of four characters A, B, C, D in the messages are as follows:
P(A) = 0.4, P(B) = 0.3, P(C) = 0.2, P(D) = 0.1
(a) Calculate the entropy H(X) of the character distribution.
(b) How does this entropy value relate to the efficiency of text compression in the app?
Cont…Topic: 3- Shannon’s concept of information
L-3: In a communication system, a transmitter sends three
types of signals X1, X2, X3​through a noisy channel. The
channel distorts the signals, and the receiver receives
corresponding signals Y1, Y2, Y3​. The joint probabilities
are:

(a) Compute the mutual information I(X;Y) to quantify how


much information is shared between the transmitter and
receiver.
(b) How this mutual information reflects the effectiveness
of the communication channel.
Cont…Topic: 3- Shannon’s concept of information
L-4: "A wireless communication channel has a bandwidth
of 1 MHz and operates with a signal-to-noise ratio (SNR)
of 15 dB. What is the effect on channel capacity if the
SNR decreases to 10 dB? Discuss the impact of SNR on
channel capacity for this given bandwidth."
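One possible way to set up the comparison in L-4 numerically (a sketch, not a model answer):

```python
import math

def capacity_bps(bandwidth_hz, snr_db):
    """Shannon capacity for a given bandwidth and SNR expressed in dB."""
    snr = 10 ** (snr_db / 10)          # dB -> linear power ratio
    return bandwidth_hz * math.log2(1 + snr)

B = 1e6  # 1 MHz
c15 = capacity_bps(B, 15)   # ~5.03 Mbit/s
c10 = capacity_bps(B, 10)   # ~3.46 Mbit/s
print(c15, c10, c15 - c10)  # dropping the SNR from 15 dB to 10 dB lowers capacity by ~1.57 Mbit/s
```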
Topic: 4- Shannon’s measure of information: Properties of
Shannon’s Measure of Information: Non-Negativity, Higher for
Rare Events, Additivity for Independent Events
L-1: An email filtering system needs to classify incoming emails as
spam or not spam. The system assigns a probability to each word
appearing in an email based on past data.
1. Given that a rare word like “lottery” has a probability of 0.001 in
normal emails but 0.1 in spam emails, how much information is
gained when the word “lottery” appears in an email?
2. Use Shannon’s measure of information to decide whether seeing
this word should significantly impact the classification.

L-2: A smart traffic system analyzes road congestion at different times of the day. The probability of a road being congested at 8 AM is 0.8, whereas the probability of it being congested at 3 AM is 0.05. According to Shannon's measure of information, which time gives higher information content about road congestion?
How can this information help in optimizing traffic signal timing to reduce congestion?
Cont…Topic: 4- Shannon’s measure of information
L-3: A company needs to improve its password security. The probability of a user choosing "password123" is 0.1, while the probability of choosing a random 12-character password is 10^−12. Explain why using a rare password increases security.
Propose an optimized method for users to create secure but memorable passwords using Shannon's principle.
L-4: A stock market analyst tracks daily price fluctuations of a
company's stock.
The probability of the stock price increasing by 1% is 0.5, while the
probability of it increasing by 10% in a day is 0.01.
1. Use Shannon’s measure to determine which event carries more
information.
2. How can investors use this principle to optimize risk assessment
strategies?
3. Given that stock movements are often independent events, how
does the additivity property of information help in predicting multi-
day trends?
Topic: 5- Shannon's model for Communication system: Information source, Transmitter, Channel, Receiver, Noise and Its Impact
L-1: A telecom company wants to improve its network so that users
experience fewer call drops. Based on Shannon’s model of
communication, identify the main sources of noise affecting the
signal and propose simple solutions to minimize call drops.
1. How does noise (e.g., signal interference, weak signals) impact
communication?
2. Suggest two practical ways to reduce call drops and improve
signal reception.

L-2: Many people experience slow Wi-Fi in certain areas of their house. Using Shannon's communication model, analyze the communication process and determine how noise affects the signal.
1. What acts as the transmitter, channel, and receiver in a home Wi-Fi network?
2. Propose three practical ways to optimize Wi-Fi performance and minimize interference.
Cont…Topic: 5- Shannon's model for Communication system

L-3: An online banking system must ensure that customer data is transmitted securely and efficiently over the internet. Based on Shannon's model, analyze how information is transmitted and how noise (e.g., cyber threats) can impact data security.
1. What represents the information source, transmitter, channel, receiver, and noise in an online banking transaction?
2. How can cyber attacks (e.g., hacking, phishing, data corruption) be considered as noise in the communication process?

L-4: A space agency is developing a satellite-based internet service for rural and remote areas. However, signal degradation due to atmospheric interference is a major challenge.
1. How does Shannon's communication model apply to satellite communication?
2. How can error correction codes and adaptive modulation be used to optimize communication?
Topic: 6- Source coding and line/channel coding: Definition, difference, importance, types.
L-1: A smart phone user notices that sending images and
videos over messaging apps consumes too much mobile
data.
1. How does source coding (compression) help in reducing the
size of multimedia files?
2. Suggest two practical ways to optimize data usage while
maintaining acceptable quality.

L-2: During voice calls, background noise often makes it hard to understand the speaker. Mobile networks use error correction techniques (channel coding) to improve call clarity.
1. How does channel coding (e.g., Hamming codes, convolutional codes) help in improving call quality?
2. Propose two ways mobile networks can use coding techniques to minimize noise and improve speech clarity.
Cont…Topic: 6- Source coding and line/channel coding

L-3: A user frequently experiences buffering when streaming videos over a slow or unstable internet connection.
1. How does source coding (e.g., H.264, AV1) help in compressing videos for efficient transmission?
2. How does channel coding (e.g., Forward Error Correction, Reed-Solomon codes) help prevent video corruption?
3. Propose an optimized method where both source coding and channel coding work together to improve streaming quality with minimal buffering.

L-4: In disaster-stricken areas, communication networks are often disrupted. Satellite-based emergency communication systems must transmit critical messages and images with minimal data loss.
1. How does channel coding help protect data from errors caused by weak signals?
2. Suggest an optimized encoding method for emergency communication that ensures fast transmission, low bandwidth usage, and high accuracy.
Topic: 7- Bandwidth and Channel Capacity: Mutual
Information, Shannon’s Channel Capacity Theorem, Bandwidth
and Channel Capacity
Bandwidth and Channel Capacity
• Information transferring across channels
  – Channel characteristics and binary symmetric channel
  – Average mutual information

• Average mutual information tells us what happens to information transmitted across the channel, or it "characterises" the channel
  – But average mutual information is a bit too mathematical (too abstract)
  – As an engineer, one would rather characterise the channel by its physical quantities, such as bandwidth, signal power and noise power, or SNR

• Also, intuitively, given a source with information rate R, one would like to know if the channel is capable of "carrying" the amount of information transferred across it
  – In other words, what is the channel capacity?
Review of Channel Assumptions
• No amplitude or phase distortion by the channel, and the only disturbance is due to additive white Gaussian noise (AWGN), i.e. an ideal channel

• In the simplest case, this can be modelled by a binary symmetric channel (BSC)

• The channel error probability p_e of the BSC depends on the noise power N_P relative to the signal power S_P, i.e. SNR = S_P / N_P

• Hence p_e could be made arbitrarily small by increasing the signal power

• The channel noise power can be shown to be N_P = N_0 · B, where N_0/2 is the power spectral density of the noise and B the channel bandwidth

• Our aim is to determine the channel capacity C, the maximum possible error-free information transmission rate across the channel
Channel Capacity for Discrete Channels
• Shannon's channel capacity C is based on the average mutual information (average conveyed information across the channel), and one possible definition is

C = max{ I(X, Y) } = max{ H(Y) − H(Y|X) }   (bits/symbol)

where H(Y) is the average information per symbol at the channel output (destination entropy), and H(Y|X) is the error entropy

• Let t_i be the symbol duration for X_i and t_av be the average time for transmission of a symbol; the channel capacity can also be defined as

C = max{ I(X, Y) / t_av }   (bits/second)

• C becomes maximum if H(Y|X) = 0 (no errors) and the symbols are equiprobable (assuming constant symbol durations t_i)

• Channel capacity can be expressed in either (bits/symbol) or (bits/second)
Channel Capacity: Noise-Free Case
• Now I(X, Y) = H(Y) = H(X), but the entropy of the source is given by:

H(X) = − Σ_{i=1}^{q} P(X_i) log2 P(X_i)   (bits/symbol)

• Let t_i be the symbol duration for X_i; the average time for transmission of a symbol is

t_av = Σ_{i=1}^{q} P(X_i) · t_i   (second/symbol)

• By definition, the channel capacity is C = max{ H(X) / t_av }   (bits/second)

• Assuming constant symbol durations t_i = T_s, the maximum (the capacity) is obtained with equiprobable source symbols, C = log2 q / T_s, and this is the capacity of the noise-free channel
Channel Capacity for BSC
• BSC with equiprobable source symbols P(X_0) = P(X_1) = 0.5 and variable channel error probability p_e (due to the symmetry of the BSC, P(Y_0) = P(Y_1) = 0.5)

• The channel capacity C (in bits/symbol) is given as

C = 1 + (1 − p_e) log2(1 − p_e) + p_e log2 p_e

[Plot: C versus the channel error probability p_e (log scale, 10^-4 to 10^-1).]

If p_e = 0.5 (worst case), C = 0; and if p_e = 0 (best case), C = 1
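The BSC capacity formula above is straightforward to evaluate; a short sketch:

```python
import math

def bsc_capacity(pe):
    """Capacity of a binary symmetric channel in bits/symbol."""
    if pe in (0.0, 1.0):
        return 1.0  # deterministic channel (always correct or always flipped): full capacity
    return 1 + (1 - pe) * math.log2(1 - pe) + pe * math.log2(pe)

for pe in (1e-4, 1e-2, 1e-1, 0.5):
    print(pe, bsc_capacity(pe))  # capacity falls from ~1 toward 0 as pe approaches 0.5
```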
Channel Capacity for BSC (Derivation)

[BSC transition diagram:]
P(X_0) = 1/2, P(X_1) = 1/2;   P(Y_0) = 1/2, P(Y_1) = 1/2
P(Y_0|X_0) = P(Y_1|X_1) = 1 − p_e,   P(Y_1|X_0) = P(Y_0|X_1) = p_e

P(X_0, Y_0) = P(X_0) P(Y_0|X_0) = (1 − p_e)/2,   P(X_0, Y_1) = P(X_0) P(Y_1|X_0) = p_e/2
P(X_1, Y_0) = p_e/2,   P(X_1, Y_1) = (1 − p_e)/2

I(X, Y) = P(X_0, Y_0) log2[P(Y_0|X_0)/P(Y_0)] + P(X_0, Y_1) log2[P(Y_1|X_0)/P(Y_1)]
        + P(X_1, Y_0) log2[P(Y_0|X_1)/P(Y_0)] + P(X_1, Y_1) log2[P(Y_1|X_1)/P(Y_1)]

        = (1/2)(1 − p_e) log2 2(1 − p_e) + (1/2) p_e log2 2p_e + (1/2) p_e log2 2p_e + (1/2)(1 − p_e) log2 2(1 − p_e)

        = 1 + (1 − p_e) log2(1 − p_e) + p_e log2 p_e   (bits/symbol)
Channel Capacity and Channel
Coding

• Shannon’s theorem: If information rate R ≤ C , there exists a coding


technique such that information can be transmitted over the
channel with arbitrarily small error probability; if R > C , error-free
transmission is impossible

• C is the maximum possible error-free information transmission rate

• Even in noisy channel, there is no obstruction of reliable


transmission, but only a limitation of the rate at which
transmission can take place

• Shannon’s theorem does not tell how to construct such a capacity-


approaching code

• Most practical channel coding schemes are far from optimal,


but capacity- approaching codes exist, e.g. turbo codes and low-
density parity check codes
Capacity for Continuous Channels
• The entropy of a continuous (analogue) source, where the source output x is described by the PDF p(x), is defined by

H(x) = − ∫_{−∞}^{+∞} p(x) log2 p(x) dx

• According to Shannon, this entropy attains its maximum for Gaussian PDFs p(x) (equivalent to equiprobable symbols in the discrete case)

• Gaussian PDF with zero mean and variance σ_x²:

p(x) = (1/√(2π σ_x²)) · e^(−x²/(2σ_x²))

• The maximum entropy can be shown to be

H_max(x) = log2 √(2πe σ_x²) = (1/2) log2(2πe σ_x²)
Capacity for Continuous Channels (continued)
• The signal power at the channel input is S_P = σ_x²

• Assuming the AWGN channel noise is independent of the transmitted signal, the received signal power is σ_y² = S_P + N_P, hence

H_max(y) = (1/2) log2 2πe(S_P + N_P)

• Since I(x, y) = H(y) − H(y|x), and H(y|x) = H(ε) with ε being the AWGN,

H(y|x) = (1/2) log2 2πe N_P

• Therefore, the average mutual information is

I(x, y) = (1/2) log2(1 + S_P/N_P)
Shannon-Hartley Law
• With a sampling rate of f_s = 2·B, the analogue channel capacity is given by

C = f_s · I(x, y) = B · log2(1 + S_P/N_P)   (bits/second)

where B is the signal bandwidth

• For digital communications, B (Hz) is equivalent to the channel bandwidth, and f_s is the symbol rate (symbols/second)

• Channel noise power is N_P = N_0 · B, where N_0 is the power spectral density of the channel AWGN

• Obvious implications:
  – Increasing the SNR S_P/N_P increases the channel capacity
  – Increasing the channel bandwidth B increases the channel capacity
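A small sketch of the Shannon-Hartley formula with assumed example numbers (a 3 kHz channel at an S_P/N_P of 1000, i.e. 30 dB; both values are illustrative):

```python
import math

def shannon_hartley(B_hz, sp_over_np):
    """C = B * log2(1 + S_P/N_P), in bits/second."""
    return B_hz * math.log2(1 + sp_over_np)

B = 3000.0        # assumed bandwidth: 3 kHz (telephone-like channel)
snr = 1000.0      # assumed S_P/N_P = 1000 (30 dB)
print(shannon_hartley(B, snr))  # ~29.9 kbit/s
```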
Bandwidth and SNR Trade-off
• From the definition of channel capacity, we can trade the channel bandwidth B for the SNR or signal power S_P, and vice versa

• Depending on whether B or S_P is more precious, we can increase one and reduce the other, and yet maintain the same channel capacity

• A noiseless analogue channel (S_P/N_P = ∞) has an infinite capacity

• C increases as B increases, but it does not go to infinity as B → ∞; rather, C approaches an upper limit. We have

C_∞ = lim_{B→∞} C = (S_P/N_0) log2 e = 1.44 S_P/N_0
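The limit above can be illustrated numerically; the assumed ratio S_P/N_0 = 1000 below is only an example:

```python
import math

SP_over_N0 = 1000.0  # assumed ratio of signal power to noise spectral density (illustrative)

def capacity(B):
    """C = B * log2(1 + S_P / (N_0 * B)) for an AWGN channel of bandwidth B (Hz)."""
    return B * math.log2(1 + SP_over_N0 / B)

for B in (100, 1_000, 10_000, 100_000, 1_000_000):
    print(B, capacity(B))    # grows with B but flattens out...

print(1.44 * SP_over_N0)     # ...approaching ~1.44 * S_P/N_0 bits/second
```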
Bandwidth and SNR Trade-off – Example
• Q: A channel has an SNR of 15. If the channel bandwidth is reduced by half, determine the increase in the signal power required to maintain the same channel capacity

• A:
B · log2(1 + S_P/(N_0 B)) = B′ · log2(1 + S_P′/(N_0 B′))

With B′ = B/2 and S_P/(N_0 B) = 15:

4·B = (B/2) · log2(1 + 30 · S_P′/S_P)

8 = log2(1 + 30 · S_P′/S_P)

256 = 1 + 30 · S_P′/S_P   →   S_P′ = 8.5 S_P
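A quick numerical check of the example, solving the same capacity equality for the power ratio:

```python
import math

snr = 15.0          # original SNR S_P / (N_0 * B)
B = 1.0             # normalised bandwidth
C = B * math.log2(1 + snr)   # original capacity = 4 bits per unit bandwidth

# Halve the bandwidth and solve C = (B/2) * log2(1 + 2 * snr * k) for k = S_P'/S_P
k = (2 ** (C / (B / 2)) - 1) / (2 * snr)
print(k)  # 8.5, i.e. the signal power must increase 8.5 times
```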
Summary

• Channel capacity for discrete channels

• Channel capacity for continuous channels

• Shannon theorem

• Bandwidth and signal power trade off

Topic: 7- Bandwidth and Channel Capacity: Mutual
Information, Shannon’s Channel Capacity Theorem, Bandwidth
and Channel Capacity
L-1: A family is experiencing slow Wi-Fi speeds when multiple
devices are connected.
1. What factors affect the channel capacity of a home Wi-Fi
network?
2. How does bandwidth impact internet speed?
3. Suggest three practical ways to improve Wi-Fi speed based
on Shannon’s Channel Capacity theorem.

L-2: A remote worker faces lag and poor quality during video calls, especially when multiple apps are running.
1. What is the role of mutual information in determining the quality of transmitted video and audio?
2. Suggest two optimizations (e.g., adjusting bandwidth allocation, using adaptive coding) to ensure smooth video calls.
Cont…Topic: 7- Bandwidth and Channel Capacity:

L-3: A telecom company wants to improve 5G network performance in a densely populated area where many users compete for bandwidth.
1. How does channel capacity limit data rates in a congested area?
2. What techniques (e.g., MIMO, adaptive modulation, error correction) can be used to increase data transmission efficiency?

L-4: A government is launching a satellite-based internet service to provide online education in rural areas with poor connectivity.
1. How does Shannon's Channel Capacity theorem help determine the maximum reliable data rate for satellite communication?
2. What role does mutual information play in reducing errors in long-distance communication?
