
Information Theory and Coding

IT-6th Sem.

Praveen Yadav
Unit:1 INFORMATION THEORY

Topic: 1 Uncertainty, Information, Mutual information


Information Theory
• Information Theory is a branch of applied mathematics and electrical engineering that deals with the quantification, storage, and communication of information.
• It provides a framework to measure the efficiency of data transmission and compression in communication systems.
• It was developed by Claude Shannon in the 1940s.
• Information theory is vital in data compression, communication, cryptography, machine learning, and more.
• It helps solve real-world problems by enabling more efficient data storage, secure communication, and intelligent decision-making.
Outline
• Uncertainty
• Information
• Mutual information
• Exercise (L-1, L-2, L-3, L-4)
Uncertainty & Information, Mutual information

• Uncertainty represents the lack of complete knowledge about an event or system. It indicates the range of possibilities before the outcome is known.
Example: Flipping a fair coin.

• Information is a measure of the reduction in uncertainty upon receiving new data or observing an event.
Example: Once you flip the coin and observe the outcome (say it lands on Heads), you gain information that resolves the uncertainty.
Information
• Information is a function of the probability of the outcome.
Example:

• Amount of Information
I = log2(1/P(Outcome))   …(1)
I = −log2 P(Outcome)   …(2)
When the log base is 2 (binary system), the unit is the bit. When the base is e, the unit is the nat; when the base is 10, the unit is the Hartley.
1 nat ≈ 1.4427 bits, 1 Hartley ≈ 3.3219 bits.
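As a quick numerical check of equations (1)–(2) and the unit conversions, here is a minimal Python sketch; the probability 0.25 is an arbitrary illustrative value:

```python
import math

def self_information(p, base=2.0):
    """Self-information of an outcome with probability p, in units set by the log base."""
    return -math.log(p) / math.log(base)

p = 0.25  # example outcome probability (illustrative assumption)
print(self_information(p, 2))         # 2.0 bits
print(self_information(p, math.e))    # ~1.386 nats
print(self_information(p, 10))        # ~0.602 Hartleys
print(self_information(p, math.e) * 1.4427)  # nats converted to bits, again ~2.0
```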
Mutual information
• What is Mutual Information?
A measure of the amount of information one variable reveals about another.

• Mutual information quantifies the reduction in uncertainty about one random variable due to knowledge of another.

• Example: Weather and dressing choices.
Cont.…Mutual Information
Let x and y be the transmitted and received symbols, respectively.
• Marginal Probability
P(X=x): Probability of a specific value of X.
P(Y=y): Probability of a specific value of Y.
• Joint Probability
P(X=x, Y=y): Probability of X and Y occurring together.
• Conditional Probability
P(X=x/Y=y): Probability of X given Y.
P(X=x/Y=y) = P(X=x, Y=y)/P(Y=y)   …(3)

The information gained about (X=x) by the reception of (Y=y) is the net reduction in its uncertainty, and is known as the mutual information I(x, y).
Cont.…Mutual Information

I(x, y) = Initial uncertainty – Final uncertainty
        = log(1/P(X=x)) – log(1/P(X=x/Y=y))
        = –log P(X=x) + log P(X=x/Y=y)
I(x, y) = log(P(X=x/Y=y)/P(X=x))   …(4)

Something interesting…
From equations (3) and (4):
I(x, y) = log(P(X=x, Y=y)/(P(Y=y) P(X=x)))
        = log(P(Y=y/X=x)/P(Y=y))   …(5)
        = I(y, x)
I(x, y) = I(y, x)   …(6)
Mutual information is symmetrical in x and y.
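To make the symmetry in equation (6) concrete, the following sketch computes I(x, y) and I(y, x) from an assumed joint distribution (the probability values are illustrative):

```python
import math

# Assumed joint distribution P(X, Y) for two binary symbols (illustrative values)
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

p_x = {x: sum(v for (xx, _), v in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(v for (_, yy), v in p_xy.items() if yy == y) for y in (0, 1)}

x, y = 0, 0
i_xy = math.log2(p_xy[(x, y)] / (p_x[x] * p_y[y]))    # log P(x,y) / (P(x) P(y)), equation (4)
i_yx = math.log2((p_xy[(x, y)] / p_x[x]) / p_y[y])    # log P(y|x) / P(y), equation (5)
print(i_xy, i_yx)  # identical values, confirming I(x, y) = I(y, x)
```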
Unit:1 INFORMATION THEORY
Topic: 1 Uncertainty, Information, Mutual information

L-1: A digital assistant predicts whether you will choose to watch a movie or listen to music based on historical data. The probabilities are:
• Watching a movie: P(Movie) = 0.7
• Listening to music: P(Music) = 0.3
(a) What is the amount of uncertainty (in bits) for each option?
(b) Which activity has higher uncertainty, and why?

L-2: A delivery company uses GPS data to predict whether a package will arrive on time or be delayed:
• On-time: P(O) = 0.8
• Delayed: P(D) = 0.2
(a) If the probability of delay increases to 0.5, calculate the average information and compare the two scenarios.
(b) What does the change in average information tell you about the uncertainty of the delivery prediction system?
Cont…Topic: 1 Uncertainty, Information, Mutual information
L-3: In a communication system, a transmitter sends a
binary signal (0 or 1) through a noisy channel. The
receiver decodes the signal with the following
probabilities:
• If 0 is sent, the receiver detects 0 with probability 0.9 and 1 with probability 0.1.
• If 1 is sent, the receiver detects 1 with probability 0.85 and 0 with probability 0.15.
(a) How much information is gained when the receiver
correctly detects a transmitted signal?
(b) How this information can help improve the system’s
reliability.
Cont…Topic: 1 Uncertainty, Information, Mutual information
L-4: A weather monitoring station reports the likelihood of
three events based on real-time data:
Sunny: P(Sunny)=0.5
Cloudy: P(Cloudy)=0.3
Rainy: P(Rainy)=0.2
A prediction model updates the probabilities after
observing the wind speed:
Sunny: P(Sunny | Wind)=0.7
Cloudy: P(Cloudy | Wind)=0.2
Rainy: P(Rainy | Wind)=0.1
(a) Calculate the information gained about the weather
condition after observing the wind speed for each
event.
(b) How such updates in information can improve real-
time weather prediction systems.
Topic: 2 Marginal, conditional and joint Entropies
Mutual information for noisy channel:

• Consider the set of symbols x1, x2, …, xn that the transmitter Tx may produce. The receiver Rx may receive y1, y2, …, ym. Theoretically, if noise and jamming are neglected, then set X = set Y.

• The amount of information that yj provides about xi is called the mutual information between xi and yj.
Properties of I(xi, yj):

1. It is symmetric, I(xi, yj) = I(yj, xi).

2. I(xi, yj) > 0 if the a posteriori probability > the a priori probability; yj provides +ve information about xi.

3. I(xi, yj) = 0 if the a posteriori probability = the a priori probability, which is the case of statistical independence, when yj provides no information about xi.

4. I(xi, yj) < 0 if the a posteriori probability < the a priori probability; yj provides -ve information about xi, or yj adds ambiguity.
1. Joint entropy:
In information theory, joint entropy is a measure of the
uncertainty associated with a set of variables.
Entropy:
H(X) = − Σ_{i=1}^{n} P(xi) log2 P(xi)

Joint Entropy (P(x, y) is the joint probability):
H(X, Y) = H(XY) = − Σ_{j=1}^{m} Σ_{i=1}^{n} P(xi, yj) log2 P(xi, yj)   bits/symbol
2. Conditional entropy:
In information theory, the conditional entropy quantifies the
amount of information needed to describe the outcome of a
random variable Y given that the value of another random variable
X is known.

Entropy:
H(X) = − Σ_{i=1}^{n} P(xi) log2 P(xi)

Conditional Entropy (P(x | y) is the conditional probability):
H(Y | X) = − Σ_{j=1}^{m} Σ_{i=1}^{n} P(xi, yj) log2 P(yj | xi)   bits/symbol
3. Marginal Entropies:
Marginal entropy is a term usually used to denote both the source entropy H(X), defined as before, and the receiver entropy H(Y), given by:

Source Entropy:
H(X) = − Σ_{i=1}^{n} P(xi) log2 P(xi)

Receiver Entropy:
H(Y) = − Σ_{j=1}^{m} P(yj) log2 P(yj)   bits/symbol
4. Relationship between joint, conditional entropy and transinformation:

Noise entropy: H(Y | X) = H(X, Y) − H(X)
Loss entropy: H(X | Y) = H(X, Y) − H(Y)

Also we have the transinformation (average mutual information):
I(X, Y) = H(X) − H(X | Y)
I(X, Y) = H(Y) − H(Y | X)
Example: The joint probability of a system is given by (rows x1, x2, x3; columns y1, y2):

P(X, Y) =  x1 [ 0.5      0.25   ]
           x2 [ 0        0.125  ]
           x3 [ 0.0625   0.0625 ]

Find:
1. Marginal entropies.
2. Joint entropy.
3. Conditional entropies.
4. The transinformation.
Solution:
P(x) = [0.75  0.125  0.125],   P(y) = [0.5625  0.4375]

1. Marginal entropies:
H(x) = − Σ_{i=1}^{n} P(xi) log2 P(xi) = −[0.75 ln 0.75 + 2 × 0.125 ln 0.125] / ln 2
     = 1.06127 bits/symbol

H(y) = − Σ_{j=1}^{m} P(yj) log2 P(yj) = −[0.5625 ln 0.5625 + 0.4375 ln 0.4375] / ln 2
     = 0.9887 bits/symbol

2. Joint entropy:
H(x, y) = − Σ_{j=1}^{m} Σ_{i=1}^{n} P(xi, yj) log2 P(xi, yj)
        = −[0.5 log2 0.5 + 0.25 log2 0.25 + 0.125 log2 0.125 + 2 × 0.0625 log2 0.0625]
        = 1.875 bits/symbol
3. Conditional entropies:
H(y|x) = H(x, y) − H(x) = 1.875 − 1.06127 = 0.813 bits/symbol
H(x|y) = H(x, y) − H(y) = 1.875 − 0.9887 = 0.886 bits/symbol

4. The transinformation:
I(x, y) = H(x) − H(x|y) = 1.06127 − 0.886 = 0.175 bits/symbol
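The worked example can be verified numerically; this is a minimal sketch that recomputes every quantity from the given joint probability matrix:

```python
import numpy as np

# Joint probability matrix P(X, Y) from the example (rows x1..x3, columns y1, y2)
P = np.array([[0.5, 0.25],
              [0.0, 0.125],
              [0.0625, 0.0625]])

def H(probs):
    """Entropy in bits, ignoring zero-probability entries."""
    p = probs[probs > 0]
    return -np.sum(p * np.log2(p))

Px, Py = P.sum(axis=1), P.sum(axis=0)     # marginal distributions P(x), P(y)
Hx, Hy, Hxy = H(Px), H(Py), H(P.ravel())  # marginal and joint entropies
print(Hx, Hy, Hxy)          # ~1.061, ~0.989, 1.875
print(Hxy - Hx, Hxy - Hy)   # H(y|x) ~0.814, H(x|y) ~0.886
print(Hx - (Hxy - Hy))      # transinformation I(x, y) ~0.175
```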


Topic: 2 Marginal, conditional and joint Entropies
L-1: A TV broadcasting station sends one of three types of programs during primetime:
• News: P(N) = 0.5
• Sports: P(S) = 0.3
• Movies: P(M) = 0.2
Calculate the marginal entropy H(X) of the program selection, where X represents the type of program broadcast.
L-2: A temperature sensor sends readings to a control system with two variables:
X: the actual temperature status {High, Low},
Y: the sensor's output {High, Low}.
The joint probabilities are:
P(X,Y) = [P(High, High)=0.6, P(High, Low)=0.1, P(Low, High)=0.2, P(Low, Low)=0.1]
Compute the quantity that represents the uncertainty in the sensor's output given the actual temperature.
Cont…Topic: 2 Marginal, conditional and joint Entropies
L-3: A communication channel transmits two binary
signals X and Y with the following joint probabilities:
P(X,Y) = [P(0,0)=0.3, P(0,1)=0.2, P(1,0)=0.1,
P(1,1)=0.4]
Explain how joint entropy represents the total uncertainty
in the transmitted and received signals.
L-4: In a machine learning model, the variables X (input
features) and Y(model predictions) are related as follows:
P(X,Y)=[P(A, Correct)=0.4, P(A, Incorrect)=0.1,
P(B, Correct)=0.3, P(B, Incorrect)=0.2]
(a) Calculate all the entropies.
(b) Discuss how understanding these entropies can help
improve the reliability of the machine learning model.
Topic: 3- Shannon’s concept of information:
Information as a Measure of Uncertainty, Entropy-A Measure
of Average Information, Communication System Model,
Channel Capacity
Shannon's Information Theory

Claude Shannon: "A Mathematical Theory of Communication," Bell System Technical Journal, 1948

• Shannon's measure of information is the number of bits to represent the amount of uncertainty (randomness) in a data source, and is defined as entropy:

H = − Σ_{i=1}^{n} p_i log(p_i)

where there are n symbols 1, 2, …, n, each with probability of occurrence p_i.
Shannon's Entropy
• Consider the following string consisting of symbols a and b:

abaabaababbbaabbabab…

• On average, there are equal numbers of a's and b's.

• The string can be considered as the output of the source below, which emits symbol a or symbol b with equal probability:

[Source diagram: a source that outputs a with probability 0.5 and b with probability 0.5.]

We want to characterize the average information generated by the source!
Intuition on Shannon's Entropy

Why H = − Σ_{i=1}^{n} p_i log(p_i)?

Suppose you have a long random string of two binary symbols 0 and 1, and the probabilities of symbols 0 and 1 are p_0 and p_1.

Ex: 00100100101101001100001000100110001…

If any string is long enough, say N, it is likely to contain N·p_0 0's and N·p_1 1's. The probability that this string pattern occurs is

p ≈ p_0^(N·p_0) · p_1^(N·p_1)

Hence, the # of possible patterns is 1/p ≈ p_0^(−N·p_0) · p_1^(−N·p_1)

The # of bits to represent all possible patterns is

log(p_0^(−N·p_0) · p_1^(−N·p_1)) = − Σ_{i=0}^{1} N·p_i log p_i

The average # of bits to represent each symbol is therefore

− Σ_{i=0}^{1} p_i log p_i
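To see the counting argument numerically, the sketch below uses assumed values p_0 = 0.75 and N = 1000 (illustrative only) and compares the per-symbol bit count with the entropy:

```python
import math

p0, p1 = 0.75, 0.25     # assumed symbol probabilities (illustrative)
N = 1000                # length of the string

# Entropy per symbol in bits
H = -(p0 * math.log2(p0) + p1 * math.log2(p1))

# log2 of the number of typical patterns, i.e. log2(1 / (p0^(N*p0) * p1^(N*p1)))
log2_patterns = -(N * p0 * math.log2(p0) + N * p1 * math.log2(p1))

print(H)                    # ~0.811 bits/symbol
print(log2_patterns / N)    # same value: bits per symbol needed to index the patterns
```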
More Intuition on Entropy
• Assume a binary memoryless source, e.g., a flip of a coin. How much information do we receive when we are told that the outcome is heads?

• If it's a fair coin, i.e., P(heads) = P(tails) = 0.5, we say that the amount of information is 1 bit.

• If we already know that it will be (or was) heads, i.e., P(heads) = 1, the amount of information is zero!

• If the coin is not fair, e.g., P(heads) = 0.9, the amount of information is more than zero but less than one bit!

• Intuitively, the amount of information received is the same if P(heads) = 0.9 or P(heads) = 0.1.
Self Information
• So, let's look at it the way Shannon did.
• Assume a memoryless source with
  • alphabet A = (a1, …, an)
  • symbol probabilities (p1, …, pn).
• How much information do we get when finding out that the next symbol is ai?
• According to Shannon, the self information of ai is i(ai) = −log(pi).

Why?
Assume two independent events A and B, with probabilities P(A) = pA and P(B) = pB.

For both events to happen, the probability is pA · pB. However, the amounts of information should be added, not multiplied.

Logarithms satisfy this!

Further, we want the information to increase with decreasing probability, so let's use the negative logarithm.
Self Information
Example 1:

Example 2:

Which logarithm? Pick the one you like! If you pick the natural log,
you’ll measure in nats, if you pick the 10-log, you’ll get Hartleys,
if you pick the 2-log (like everyone else), you’ll get bits.
Self Information

On average over all the symbols, we get:

H(X) = Σ_i p_i · i(a_i) = − Σ_i p_i log(p_i)

H(X) is called the first order entropy of the source.

This can be regarded as the degree of uncertainty about the following symbol.
Entropy
Example: Binary Memoryless Source
BMS → 01101000…

Let P(1) = p and P(0) = 1 − p. The entropy is then

H = −p log2 p − (1 − p) log2(1 − p), often denoted Hb(p).

[Plot: Hb(p) versus p for 0 ≤ p ≤ 1, with a maximum of 1 bit at p = 0.5.]

The uncertainty (information) is greatest when p = 0.5.
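A minimal sketch of the binary entropy function Hb(p) described above; it reproduces the behaviour of the plotted curve without the plot:

```python
import math

def binary_entropy(p):
    """H_b(p) in bits for a binary memoryless source with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, binary_entropy(p))  # maximum of 1 bit at p = 0.5, symmetric about 0.5
```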
Example
Three symbols a, b, c with corresponding probabilities:

P = {0.5, 0.25, 0.25}

What is H(P)?

Three weather conditions in Corvallis: rain, sunny, cloudy, with corresponding probabilities:

Q = {0.48, 0.32, 0.20}

What is H(Q)?
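The two entropies asked for above can be computed directly with the first-order entropy definition; a short sketch:

```python
import math

def entropy(probs):
    """First-order entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

P = [0.5, 0.25, 0.25]
Q = [0.48, 0.32, 0.20]
print(entropy(P))  # 1.5 bits
print(entropy(Q))  # ~1.499 bits, almost the same average uncertainty
```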
Entropy: Three properties
1. It can be shown that 0 ≤ H ≤ log N.

2. Maximum entropy (H = log N) is reached when all symbols are equiprobable, i.e., pi = 1/N.

3. The difference log N − H is called the redundancy of the source.
Communication System Model
Channel Capacity
In 1948, Claude Shannon developed a mathematical model for the channel such that, insofar as the model is realistic, there exists an upper limit on the rate at which information can be transmitted from source to user error-free. This upper limit is called the channel capacity, C. It is a function of the bandwidth (B) and the signal-to-noise ratio (S/N or SNR):

C = B log2(1 + S/N)
Topic: 3- Shannon’s concept of information: Information as a
Measure of Uncertainty, Entropy-A Measure of Average
Information, Communication System Model, Channel Capacity
L-1: A weather forecasting system predicts two possible outcomes for tomorrow's weather:
• Sunny: P(S) = 0.8
• Rainy: P(R) = 0.2

(a) How much information (in bits) is gained when the forecast predicts it will be sunny?
(b) Which weather outcome carries more uncertainty, and why?

L-2: A messaging app compresses text messages by analyzing character usage. The probabilities of four characters A, B, C, D in the messages are as follows:
P(A) = 0.4, P(B) = 0.3, P(C) = 0.2, P(D) = 0.1
(a) Calculate the entropy H(X) of the character distribution.
(b) How does this entropy value relate to the efficiency of text compression in the app?
Cont…Topic: 3- Shannon’s concept of information
L-3: In a communication system, a transmitter sends three
types of signals X1, X2, X3​through a noisy channel. The
channel distorts the signals, and the receiver receives
corresponding signals Y1, Y2, Y3​. The joint probabilities
are:

(a) Compute the mutual information I(X;Y) to quantify how


much information is shared between the transmitter and
receiver.
(b) How this mutual information reflects the effectiveness
of the communication channel.
Cont…Topic: 3- Shannon’s concept of information
L-4: "A wireless communication channel has a bandwidth
of 1 MHz and operates with a signal-to-noise ratio (SNR)
of 15 dB. What is the effect on channel capacity if the
SNR decreases to 10 dB? Discuss the impact of SNR on
channel capacity for this given bandwidth."
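One possible way to set up the comparison in L-4 numerically (a sketch, not a model answer):

```python
import math

def capacity_bps(bandwidth_hz, snr_db):
    """Shannon capacity for a given bandwidth and SNR expressed in dB."""
    snr = 10 ** (snr_db / 10)          # dB -> linear power ratio
    return bandwidth_hz * math.log2(1 + snr)

B = 1e6  # 1 MHz
c15 = capacity_bps(B, 15)   # ~5.03 Mbit/s
c10 = capacity_bps(B, 10)   # ~3.46 Mbit/s
print(c15, c10, c15 - c10)  # dropping the SNR from 15 dB to 10 dB lowers capacity by ~1.57 Mbit/s
```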
Topic: 4- Shannon’s measure of information: Properties of
Shannon’s Measure of Information: Non-Negativity, Higher for
Rare Events, Additivity for Independent Events
L-1: An email filtering system needs to classify incoming emails as
spam or not spam. The system assigns a probability to each word
appearing in an email based on past data.
1. Given that a rare word like “lottery” has a probability of 0.001 in
normal emails but 0.1 in spam emails, how much information is
gained when the word “lottery” appears in an email?
2. Use Shannon’s measure of information to decide whether seeing
this word should significantly impact the classification.

L-2: A smart traffic system analyzes road congestion at different times of the day. The probability of a road being congested at 8 AM is 0.8, whereas the probability of it being congested at 3 AM is 0.05. According to Shannon's measure of information, which time gives higher information content about road congestion?
How can this information help in optimizing traffic signal timing to reduce congestion?
Cont…Topic: 4- Shannon’s measure of information
L-3: A company needs to improve its password security. The probability of a user choosing "password123" is 0.1, while the probability of choosing a random 12-character password is 10^−12. Explain why using a rare password increases security.
Propose an optimized method for users to create secure but memorable passwords using Shannon's principle.
L-4: A stock market analyst tracks daily price fluctuations of a
company's stock.
The probability of the stock price increasing by 1% is 0.5, while the
probability of it increasing by 10% in a day is 0.01.
1. Use Shannon’s measure to determine which event carries more
information.
2. How can investors use this principle to optimize risk assessment
strategies?
3. Given that stock movements are often independent events, how
does the additivity property of information help in predicting multi-
day trends?
Topic: 5- Shannon's model for Communication system: Information source, Transmitter, Channel, Receiver, Noise and Its Impact
L-1: A telecom company wants to improve its network so that users
experience fewer call drops. Based on Shannon’s model of
communication, identify the main sources of noise affecting the
signal and propose simple solutions to minimize call drops.
1. How does noise (e.g., signal interference, weak signals) impact
communication?
2. Suggest two practical ways to reduce call drops and improve
signal reception.

L-2: Many people experience slow Wi-Fi in certain areas of their house. Using Shannon's communication model, analyze the communication process and determine how noise affects the signal.
1. What acts as the transmitter, channel, and receiver in a home Wi-Fi network?
2. Propose three practical ways to optimize Wi-Fi performance and minimize interference.
Cont…Topic: 5- Shannon's model for Communication system

L-3: An online banking system must ensure that customer data is transmitted securely and efficiently over the internet. Based on Shannon's model, analyze how information is transmitted and how noise (e.g., cyber threats) can impact data security.
1. What represents the information source, transmitter, channel, receiver, and noise in an online banking transaction?
2. How can cyber attacks (e.g., hacking, phishing, data corruption) be considered as noise in the communication process?

L-4: A space agency is developing a satellite-based internet service for rural and remote areas. However, signal degradation due to atmospheric interference is a major challenge.
1. How does Shannon's communication model apply to satellite communication?
2. How can error correction codes and adaptive modulation be used to optimize communication?
Topic: 6- Source coding and line/channel coding: Definition, difference, importance, types.
L-1: A smart phone user notices that sending images and
videos over messaging apps consumes too much mobile
data.
1. How does source coding (compression) help in reducing the
size of multimedia files?
2. Suggest two practical ways to optimize data usage while
maintaining acceptable quality.

L-2: During voice calls, background noise often makes it hard to understand the speaker. Mobile networks use error correction techniques (channel coding) to improve call clarity.
1. How does channel coding (e.g., Hamming codes, convolutional codes) help in improving call quality?
2. Propose two ways mobile networks can use coding techniques to minimize noise and improve speech clarity.
Cont…Topic: 6- Source coding and line/channel coding

L-3: A user frequently experiences buffering when streaming videos over a slow or unstable internet connection.
1. How does source coding (e.g., H.264, AV1) help in compressing videos for efficient transmission?
2. How does channel coding (e.g., Forward Error Correction, Reed-Solomon codes) help prevent video corruption?
3. Propose an optimized method where both source coding and channel coding work together to improve streaming quality with minimal buffering.

L-4: In disaster-stricken areas, communication networks are often disrupted. Satellite-based emergency communication systems must transmit critical messages and images with minimal data loss.
1. How does channel coding help protect data from errors caused by weak signals?
2. Suggest an optimized encoding method for emergency communication that ensures fast transmission, low bandwidth usage, and high accuracy.
Topic: 7- Bandwidth and Channel Capacity: Mutual
Information, Shannon’s Channel Capacity Theorem, Bandwidth
and Channel Capacity
Bandwidth and Channel Capacity
• Information transferring across channels
  – Channel characteristics and binary symmetric channel
  – Average mutual information

• Average mutual information tells us what happens to information transmitted across the channel, or it "characterises" the channel
  – But average mutual information is a bit too mathematical (too abstract)
  – As an engineer, one would rather characterise the channel by its physical quantities, such as bandwidth, signal power and noise power, or SNR

• Also, intuitively, given a source with information rate R, one would like to know if the channel is capable of "carrying" the amount of information transferred across it
  – In other words, what is the channel capacity?
Review of Channel Assumptions
• No amplitude or phase distortion by the channel, and the only disturbance is due to additive white Gaussian noise (AWGN), i.e. an ideal channel

• In the simplest case, this can be modelled by a binary symmetric channel (BSC)

• The channel error probability p_e of the BSC depends on the noise power N_P relative to the signal power S_P, i.e. SNR = S_P / N_P

• Hence p_e could be made arbitrarily small by increasing the signal power

• The channel noise power can be shown to be N_P = N_0 · B, where N_0/2 is the power spectral density of the noise and B the channel bandwidth

• Our aim is to determine the channel capacity C, the maximum possible error-free information transmission rate across the channel
Channel Capacity for Discrete Channels
• Shannon's channel capacity C is based on the average mutual information (average conveyed information across the channel), and one possible definition is

C = max{ I(X, Y) } = max{ H(Y) − H(Y|X) }   (bits/symbol)

where H(Y) is the average information per symbol at the channel output (destination entropy), and H(Y|X) is the error entropy

• Let t_i be the symbol duration for X_i and t_av be the average time for transmission of a symbol; the channel capacity can also be defined as

C = max{ I(X, Y) / t_av }   (bits/second)

• C becomes maximum if H(Y|X) = 0 (no errors) and the symbols are equiprobable (assuming constant symbol durations t_i)

• Channel capacity can be expressed in either (bits/symbol) or (bits/second)
Channel Capacity: Noise-Free Case
• Now I(X, Y) = H(Y) = H(X), but the entropy of the source is given by:

H(X) = − Σ_{i=1}^{q} P(X_i) log2 P(X_i)   (bits/symbol)

• Let t_i be the symbol duration for X_i; the average time for transmission of a symbol is

t_av = Σ_{i=1}^{q} P(X_i) · t_i   (second/symbol)

• By definition, the channel capacity is C = max{ H(X) / t_av }   (bits/second)

• Assuming constant symbol durations t_i = T_s, the maximum (the capacity) is obtained with equiprobable source symbols, C = log2 q / T_s, and this is the capacity of the noise-free channel
Channel Capacity for BSC
• BSC with equiprobable source symbols P(X_0) = P(X_1) = 0.5 and variable channel error probability p_e (due to the symmetry of the BSC, P(Y_0) = P(Y_1) = 0.5)

• The channel capacity C (in bits/symbol) is given as

C = 1 + (1 − p_e) log2(1 − p_e) + p_e log2 p_e

[Plot: C versus the channel error probability p_e (log scale, 10^-4 to 10^-1).]

If p_e = 0.5 (worst case), C = 0; and if p_e = 0 (best case), C = 1
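The BSC capacity formula above is straightforward to evaluate; a short sketch:

```python
import math

def bsc_capacity(pe):
    """Capacity of a binary symmetric channel in bits/symbol."""
    if pe in (0.0, 1.0):
        return 1.0  # deterministic channel (always correct or always flipped): full capacity
    return 1 + (1 - pe) * math.log2(1 - pe) + pe * math.log2(pe)

for pe in (1e-4, 1e-2, 1e-1, 0.5):
    print(pe, bsc_capacity(pe))  # capacity falls from ~1 toward 0 as pe approaches 0.5
```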
Channel Capacity for BSC (Derivation)

[BSC transition diagram:]
P(X_0) = 1/2, P(X_1) = 1/2;   P(Y_0) = 1/2, P(Y_1) = 1/2
P(Y_0|X_0) = P(Y_1|X_1) = 1 − p_e,   P(Y_1|X_0) = P(Y_0|X_1) = p_e

P(X_0, Y_0) = P(X_0) P(Y_0|X_0) = (1 − p_e)/2,   P(X_0, Y_1) = P(X_0) P(Y_1|X_0) = p_e/2
P(X_1, Y_0) = p_e/2,   P(X_1, Y_1) = (1 − p_e)/2

I(X, Y) = P(X_0, Y_0) log2[P(Y_0|X_0)/P(Y_0)] + P(X_0, Y_1) log2[P(Y_1|X_0)/P(Y_1)]
        + P(X_1, Y_0) log2[P(Y_0|X_1)/P(Y_0)] + P(X_1, Y_1) log2[P(Y_1|X_1)/P(Y_1)]

        = (1/2)(1 − p_e) log2 2(1 − p_e) + (1/2) p_e log2 2p_e + (1/2) p_e log2 2p_e + (1/2)(1 − p_e) log2 2(1 − p_e)

        = 1 + (1 − p_e) log2(1 − p_e) + p_e log2 p_e   (bits/symbol)
Channel Capacity and Channel
Coding

• Shannon’s theorem: If information rate R ≤ C , there exists a coding


technique such that information can be transmitted over the
channel with arbitrarily small error probability; if R > C , error-free
transmission is impossible

• C is the maximum possible error-free information transmission rate

• Even in noisy channel, there is no obstruction of reliable


transmission, but only a limitation of the rate at which
transmission can take place

• Shannon’s theorem does not tell how to construct such a capacity-


approaching code

• Most practical channel coding schemes are far from optimal,


but capacity- approaching codes exist, e.g. turbo codes and low-
density parity check codes
Capacity for Continuous Channels
• The entropy of a continuous (analogue) source, where the source output x is described by the PDF p(x), is defined by

H(x) = − ∫_{−∞}^{+∞} p(x) log2 p(x) dx

• According to Shannon, this entropy attains its maximum for Gaussian PDFs p(x) (equivalent to equiprobable symbols in the discrete case)

• Gaussian PDF with zero mean and variance σ_x²:

p(x) = (1/√(2π σ_x²)) · e^(−x²/(2σ_x²))

• The maximum entropy can be shown to be

H_max(x) = log2 √(2πe σ_x²) = (1/2) log2(2πe σ_x²)
Capacity for Continuous Channels (continued)
• The signal power at the channel input is S_P = σ_x²

• Assuming the AWGN channel noise is independent of the transmitted signal, the received signal power is σ_y² = S_P + N_P, hence

H_max(y) = (1/2) log2 2πe(S_P + N_P)

• Since I(x, y) = H(y) − H(y|x), and H(y|x) = H(ε) with ε being the AWGN,

H(y|x) = (1/2) log2 2πe N_P

• Therefore, the average mutual information is

I(x, y) = (1/2) log2(1 + S_P/N_P)
Shannon-Hartley Law
• With a sampling rate of f_s = 2·B, the analogue channel capacity is given by

C = f_s · I(x, y) = B · log2(1 + S_P/N_P)   (bits/second)

where B is the signal bandwidth

• For digital communications, B (Hz) is equivalent to the channel bandwidth, and f_s is the symbol rate (symbols/second)

• Channel noise power is N_P = N_0 · B, where N_0 is the power spectral density of the channel AWGN

• Obvious implications:
  – Increasing the SNR S_P/N_P increases the channel capacity
  – Increasing the channel bandwidth B increases the channel capacity
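A small sketch of the Shannon-Hartley formula with assumed example numbers (a 3 kHz channel at an S_P/N_P of 1000, i.e. 30 dB; both values are illustrative):

```python
import math

def shannon_hartley(B_hz, sp_over_np):
    """C = B * log2(1 + S_P/N_P), in bits/second."""
    return B_hz * math.log2(1 + sp_over_np)

B = 3000.0        # assumed bandwidth: 3 kHz (telephone-like channel)
snr = 1000.0      # assumed S_P/N_P = 1000 (30 dB)
print(shannon_hartley(B, snr))  # ~29.9 kbit/s
```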
Bandwidth and SNR Trade-off
• From the definition of channel capacity, we can trade the channel bandwidth B for the SNR or signal power S_P, and vice versa

• Depending on whether B or S_P is more precious, we can increase one and reduce the other, and yet maintain the same channel capacity

• A noiseless analogue channel (S_P/N_P = ∞) has an infinite capacity

• C increases as B increases, but it does not go to infinity as B → ∞; rather, C approaches an upper limit. We have

C_∞ = lim_{B→∞} C = (S_P/N_0) log2 e = 1.44 S_P/N_0
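The limit above can be illustrated numerically; the assumed ratio S_P/N_0 = 1000 below is only an example:

```python
import math

SP_over_N0 = 1000.0  # assumed ratio of signal power to noise spectral density (illustrative)

def capacity(B):
    """C = B * log2(1 + S_P / (N_0 * B)) for an AWGN channel of bandwidth B (Hz)."""
    return B * math.log2(1 + SP_over_N0 / B)

for B in (100, 1_000, 10_000, 100_000, 1_000_000):
    print(B, capacity(B))    # grows with B but flattens out...

print(1.44 * SP_over_N0)     # ...approaching ~1.44 * S_P/N_0 bits/second
```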
Bandwidth and SNR Trade-off – Example
• Q: A channel has an SNR of 15. If the channel bandwidth is reduced by half, determine the increase in the signal power required to maintain the same channel capacity

• A:
B · log2(1 + S_P/(N_0 B)) = B′ · log2(1 + S_P′/(N_0 B′))

With B′ = B/2 and S_P/(N_0 B) = 15:

4·B = (B/2) · log2(1 + 30 · S_P′/S_P)

8 = log2(1 + 30 · S_P′/S_P)

256 = 1 + 30 · S_P′/S_P   →   S_P′ = 8.5 S_P
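A quick numerical check of the example, solving the same capacity equality for the power ratio:

```python
import math

snr = 15.0          # original SNR S_P / (N_0 * B)
B = 1.0             # normalised bandwidth
C = B * math.log2(1 + snr)   # original capacity = 4 bits per unit bandwidth

# Halve the bandwidth and solve C = (B/2) * log2(1 + 2 * snr * k) for k = S_P'/S_P
k = (2 ** (C / (B / 2)) - 1) / (2 * snr)
print(k)  # 8.5, i.e. the signal power must increase 8.5 times
```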
Summary

• Channel capacity for discrete channels

• Channel capacity for continuous channels

• Shannon theorem

• Bandwidth and signal power trade off

Topic: 7- Bandwidth and Channel Capacity: Mutual
Information, Shannon’s Channel Capacity Theorem, Bandwidth
and Channel Capacity
L-1: A family is experiencing slow Wi-Fi speeds when multiple
devices are connected.
1. What factors affect the channel capacity of a home Wi-Fi
network?
2. How does bandwidth impact internet speed?
3. Suggest three practical ways to improve Wi-Fi speed based
on Shannon’s Channel Capacity theorem.

L-2: A remote worker faces lag and poor quality during video calls, especially when multiple apps are running.
1. What is the role of mutual information in determining the quality of transmitted video and audio?
2. Suggest two optimizations (e.g., adjusting bandwidth allocation, using adaptive coding) to ensure smooth video calls.
Cont…Topic: 7- Bandwidth and Channel Capacity:

L-3: A telecom company wants to improve 5G network performance in a densely populated area where many users compete for bandwidth.
1. How does channel capacity limit data rates in a congested area?
2. What techniques (e.g., MIMO, adaptive modulation, error correction) can be used to increase data transmission efficiency?

L-4: A government is launching a satellite-based internet service to provide online education in rural areas with poor connectivity.
1. How does Shannon's Channel Capacity theorem help determine the maximum reliable data rate for satellite communication?
2. What role does mutual information play in reducing errors in long-distance communication?
