
Week 3 - AEP (Chapter 3) and Channel Coding (Chapter 7)
Convergence of random variables
Convergence Recap
A sequence xn is said to converge to a limit l if, for every ϵ > 0, ∃ an Nϵ such that
∀ n ≥ Nϵ, |xn − l| < ϵ. In other words, all terms of the sequence after
n = Nϵ lie in the ϵ-neighborhood of l.

We want to extend this idea to random variables, but there is a
problem: random variables are not just numbers; there are probabilities
associated with them.
Types of Convergence
Convergence in distribution: Xn →d X
Convergence in probability: Xn →p X
Convergence in mean: Xn →Lr X
Almost sure convergence: Xn →a.s. X

a.s. ⇒ p ⇒ d
Lr ⇒ p ⇒ d
Examples
If you have done a course on probability, you should be familiar with these:
Central Limit Theorem (CLT)
Weak law of large numbers (WLLN): For IID Xi's, (1/n) Σi Xi →p E[Xi]
Strong law of large numbers (SLLN)
The CLT is an example of convergence in distribution, the WLLN is an
example of convergence in probability, and the SLLN is an example of almost
sure convergence. For more on these, you may refer to
https://www.probabilitycourse.com/chapter7/7_2_0_convergence_of_random_variables.php
We need convergence in probability for this course:
Convergence in probability
Xn →p X if, for every ϵ > 0, lim_{n→∞} P[|Xn − X| > ϵ] = 0

Prove Markov's and Chebyshev's inequalities and use them to show the WLLN for IID RVs.
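As a numerical companion to this exercise, the following sketch (a minimal illustration of my own, assuming numpy is available) estimates P[|(1/n) Σi Xi − E[Xi]| > ϵ] for IID Bernoulli samples and compares it with the Chebyshev bound Var(Xi)/(n ϵ^2). Both shrink toward 0 as n grows, which is exactly convergence in probability of the sample mean.

```python
import numpy as np

def empirical_tail_prob(n, p=0.3, eps=0.05, trials=200000, seed=0):
    """Estimate P[|sample mean - p| > eps] over many independent experiments."""
    rng = np.random.default_rng(seed)
    means = rng.binomial(n, p, size=trials) / n     # sample means of n IID Bernoulli(p) draws
    return float((np.abs(means - p) > eps).mean())

p, eps = 0.3, 0.05
for n in [10, 100, 1000, 10000]:
    chebyshev = p * (1 - p) / (n * eps**2)          # Chebyshev upper bound (may exceed 1)
    print(f"n={n:6d}  empirical={empirical_tail_prob(n, p, eps):.4f}  Chebyshev <= {chebyshev:.4f}")
```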
WLLN and AEP
WLLN → AEP
WLLN states that for IID Xi's, (1/n) Σi Xi →p E[Xi].
Hence, since p(X1, X2, ..., Xn) = Πi p(Xi) for an IID source,
(1/n) log2(1/p(X1, X2, ..., Xn)) →p E[log2(1/p(Xi))] = H(X)

Typical Set (Aϵ^(n))
We define the typical set Aϵ^(n) to be the set of sequences (x1, x2, ..., xn) ∈ X^n for which
2^(−n(H(X)+ϵ)) ≤ p(x1, x2, ..., xn) ≤ 2^(−n(H(X)−ϵ)) ⇔
|(1/n) log2(1/p(x1, x2, ..., xn)) − H(X)| ≤ ϵ
The number of bits needed to represent such a sequence is then restricted to the
interval n(H(X) ± ϵ), so the average codeword length per symbol is H(X) ± ϵ.
Hence, we consider those sequences whose joint probabilities are such that the
number of bits required to represent them is bounded in terms of the entropy.

By the AEP, the normalized log-probability of the sequences in the typical set converges in
probability to the entropy, and hence the typical set satisfies some useful properties.
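To make the definition concrete, here is a minimal sketch (my own illustration, assuming an IID Bernoulli(p) source and base-2 logarithms) that tests whether a binary sequence belongs to Aϵ^(n) by comparing its per-symbol log-probability with H(X).

```python
import math

def entropy(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def is_typical(seq, p, eps):
    """True if seq lies in A_eps^(n) for an IID Bernoulli(p) source."""
    n = len(seq)
    k = sum(seq)                                     # number of ones in the sequence
    log_prob = k * math.log2(p) + (n - k) * math.log2(1 - p)
    return abs(-log_prob / n - entropy(p)) <= eps    # |(1/n) log2 1/p(x^n) - H(X)| <= eps

p, eps = 0.3, 0.1
print(is_typical([1, 0, 0, 1, 0, 0, 0, 1, 0, 0], p, eps))   # 3 ones in 10: typical -> True
print(is_typical([1] * 10, p, eps))                         # all ones: atypical -> False
```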
Properties of Aϵ^(n)
If x = (x1, x2, ..., xn) ∈ Aϵ^(n), then |(1/n) log2(1/p(x1, x2, ..., xn)) − H(X)| ≤ ϵ:
already shown (this is just the definition).
P[Aϵ^(n)] > 1 − ϵ for sufficiently large n: follows directly from the AEP.
|Aϵ^(n)| ≤ 2^(n(H(X)+ϵ))
|Aϵ^(n)| ≥ (1 − ϵ) 2^(n(H(X)−ϵ)) for sufficiently large n

Cardinality Proofs
These kinds of proofs are quite common in information theory. The idea
is to put the probabilities and the sets under consideration to good use:
1 = Σ_{x ∈ X^n} p(x) ≥ Σ_{x ∈ Aϵ^(n)} p(x) ≥ Σ_{x ∈ Aϵ^(n)} 2^(−n(H(X)+ϵ)) = |Aϵ^(n)| 2^(−n(H(X)+ϵ))

For the last property, we use the 2nd property: for sufficiently large n,
1 − ϵ < P[Aϵ^(n)] ≤ Σ_{x ∈ Aϵ^(n)} 2^(−n(H(X)−ϵ)) = |Aϵ^(n)| 2^(−n(H(X)−ϵ))
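These bounds can be checked by brute force for small n. The sketch below (again my own illustration, for a Bernoulli(0.3) source) enumerates all 2^n binary sequences, collects the typical set, and prints P[Aϵ^(n)], |Aϵ^(n)|, and the two cardinality bounds.

```python
import math
from itertools import product

def entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def typical_set_stats(n, p, eps):
    """Enumerate {0,1}^n and return (P[A_eps], |A_eps|, H) for a Bernoulli(p) source."""
    H = entropy(p)
    prob_mass, size = 0.0, 0
    for seq in product([0, 1], repeat=n):
        k = sum(seq)
        prob = (p ** k) * ((1 - p) ** (n - k))
        if abs(-math.log2(prob) / n - H) <= eps:
            prob_mass += prob
            size += 1
    return prob_mass, size, H

n, p, eps = 16, 0.3, 0.1
mass, size, H = typical_set_stats(n, p, eps)
print(f"P[A_eps] = {mass:.3f}  (tends to 1 as n grows)")
print(f"|A_eps| = {size}")
print(f"upper bound 2^(n(H+eps))         = {2 ** (n * (H + eps)):.1f}")
print(f"lower bound (1-eps) 2^(n(H-eps)) = {(1 - eps) * 2 ** (n * (H - eps)):.1f}")
```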
Consequences of AEP
Let X^n be i.i.d. ∼ p(x) and let ϵ > 0. Then ∃ a code that maps sequences x^n of
length n into binary strings such that the mapping is one-to-one (and hence
invertible) and E[(1/n) l(X^n)] ≤ H(X) + ϵ for n sufficiently large. Thus, we
can represent sequences X^n using nH(X) bits on the average.
Coding Scheme: for x^n ∈ Aϵ^(n), a flag bit 0 followed by an index of at most n(H + ϵ) + 1 bits;
otherwise, a flag bit 1 followed by an index of at most n log|X| + 1 bits.

E[l(X^n)] = Σ_{x^n} p(x^n) l(x^n)
= Σ_{x^n ∈ Aϵ^(n)} p(x^n) l(x^n) + Σ_{x^n ∈ Aϵ^(n)c} p(x^n) l(x^n)
≤ P[Aϵ^(n)] (n(H + ϵ) + 2) + P[Aϵ^(n)c] (n log|X| + 2)
≤ n(H + ϵ) + ϵ(n log|X|) + 2 = n(H(X) + ϵ′)

Thus, the AEP suggests that there is scope for source coding.
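To put numbers on this bound, the sketch below (a rough illustration under my own bookkeeping of the flag and rounding bits, for an IID Bernoulli(0.05) source) computes the exact expected per-symbol length of the two-part code and compares it with H(X); for a source this skewed, the code comes in well below the 1 bit/symbol of the raw representation.

```python
import math

def entropy(p):
    """Binary entropy H(p) in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def two_part_code_rate(n, p, eps):
    """Expected per-symbol length of the two-part typical-set code for Bernoulli(p)."""
    H = entropy(p)
    len_typical = math.ceil(n * (H + eps)) + 2    # flag bit + index into A_eps (+1 rounding)
    len_atypical = n + 2                          # flag bit + raw n-bit sequence (+1)
    prob_typical = 0.0
    for k in range(n + 1):                        # k = number of ones in the sequence
        per_symbol = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if abs(per_symbol - H) <= eps:
            prob_typical += math.comb(n, k) * (p ** k) * ((1 - p) ** (n - k))
    expected = prob_typical * len_typical + (1 - prob_typical) * len_atypical
    return expected / n, prob_typical, H

n, p, eps = 1000, 0.05, 0.05
rate, pa, H = two_part_code_rate(n, p, eps)
print(f"H(X) = {H:.3f} bits, P[A_eps] = {pa:.3f}, E[l(X^n)]/n = {rate:.3f} bits/symbol")
```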
Channel Model

Figure: Communication Channel

A message W is drawn from the index set {1, 2, ..., M}, resulting in a
signal X^n(W) which is received at the receiver as a random sequence
Y^n ∼ p(y^n | x^n). The Rx guesses the index W by the decoding rule
Ŵ = g(Y^n).
Discrete Channel
This is denoted by (X, p(y|x), Y) where, for every x ∈ X, we have
p(y|x) ≥ 0 for y ∈ Y, with the property Σ_y p(y|x) = 1.
Discrete Memoryless Channel (DMC) with nth extension
This is the channel (X^n, p(y^n|x^n), Y^n), where p(yk | x^k, y^(k−1)) = p(yk | xk).
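A convenient way to work with a DMC in code (an illustrative convention of mine, not from the slides) is a |X| × |Y| row-stochastic matrix W with W[x, y] = p(y|x). The sketch below builds the binary erasure channel this way and checks that every row is a probability distribution.

```python
import numpy as np

def bec_matrix(alpha):
    """p(y|x) for a binary erasure channel: rows x in {0, 1}, columns y in {0, e, 1}."""
    return np.array([[1 - alpha, alpha, 0.0],
                     [0.0,       alpha, 1 - alpha]])

W = bec_matrix(0.2)
assert np.allclose(W.sum(axis=1), 1.0)   # each row p(.|x) must sum to 1
print(W)
```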
Channel Capacity

Let us define C = max_{p(x)} I(X; Y). Recall from Week 2 that we had
shown that, for a given channel p(y|x), I(X; Y) is concave w.r.t. p(x), and hence a
local maximum is a global maximum.
Channel with no error (noiseless binary channel)

C = max_{p(x)} I(X; Y)
= max_{p(x)} [H(X) − H(X|Y)]
= max_{p(x)} H(X) = 1 bit per use of the channel,
since H(X|Y) = 0: given Y there is no uncertainty in X.

Please check out the examples in the book to find the channel capacity of the
noisy typewriter, the BSC and the erasure channel.
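For channels without an obvious closed form, I(X;Y) can also be maximized numerically. Below is a minimal sketch (my own helper names, numpy assumed) that computes I(X;Y) from an input distribution and a transition matrix and does a 1-D grid search over p(X=0) for a binary-input channel; for a BSC with crossover 0.1 it recovers C = 1 − H(0.1) ≈ 0.531 bits/use.

```python
import numpy as np

def mutual_information(px, W):
    """I(X;Y) in bits for input distribution px and channel matrix W[x, y] = p(y|x)."""
    joint = px[:, None] * W                       # p(x, y)
    py = joint.sum(axis=0)                        # p(y)
    mask = joint > 0
    ratio = joint[mask] / (px[:, None] * py[None, :])[mask]
    return float((joint[mask] * np.log2(ratio)).sum())

def binary_input_capacity(W, grid=1001):
    """1-D grid search over p(X=0); adequate for binary-input channels, not a general method."""
    return max(mutual_information(np.array([p0, 1 - p0]), W)
               for p0 in np.linspace(1e-6, 1 - 1e-6, grid))

alpha = 0.1
bsc = np.array([[1 - alpha, alpha],
                [alpha, 1 - alpha]])
H_alpha = -alpha * np.log2(alpha) - (1 - alpha) * np.log2(1 - alpha)
print(f"BSC(0.1): grid search C ~ {binary_input_capacity(bsc):.4f}, closed form 1 - H(0.1) = {1 - H_alpha:.4f}")
```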
Capacity of Binary Erasure Channel

A bit is erased with probability α. Let π = [p0, p1] be the distribution of X.

H(Y|X) = Σ_x p(x) H(Y|X = x)
H(Y|X = 0) = −[(1 − α) log(1 − α) + α log α] = H(α).
Similarly, H(Y|X = 1) = H(α), and hence these terms do not depend on p(x).
Thus, H(Y|X) = H(α) Σ_x p(x) = H(α).

H(Y) = −[p0(1 − α) log(p0(1 − α)) + p1(1 − α) log(p1(1 − α)) + α log α]
= (1 − α)[H(p) − log(1 − α)] − α log α = (1 − α)H(p) + H(α)
Hence, I(X; Y) = H(Y) − H(Y|X) = (1 − α)H(p) ⇒ max_{p(x)} I(X; Y) = (1 − α), with H(p) = 1 for
p0 = p1 = 1/2.
Suppose we feed back the erased positions to the transmitter. Can we improve the capacity?
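A quick numerical check of this derivation (same illustrative mutual-information helper as before, numpy assumed): for several input distributions, I(X;Y) matches (1 − α)H(p) and peaks at 1 − α when p0 = 1/2.

```python
import numpy as np

def mutual_information(px, W):
    """I(X;Y) in bits for input distribution px and channel matrix W[x, y] = p(y|x)."""
    joint = px[:, None] * W
    py = joint.sum(axis=0)
    mask = joint > 0
    ratio = joint[mask] / (px[:, None] * py[None, :])[mask]
    return float((joint[mask] * np.log2(ratio)).sum())

alpha = 0.25
bec = np.array([[1 - alpha, alpha, 0.0],         # x = 0 -> y in {0, e, 1}
                [0.0,       alpha, 1 - alpha]])  # x = 1
for p0 in [0.1, 0.3, 0.5, 0.7]:
    px = np.array([p0, 1 - p0])
    Hp = -p0 * np.log2(p0) - (1 - p0) * np.log2(1 - p0)
    print(f"p0={p0}: I(X;Y) = {mutual_information(px, bec):.4f}, (1-alpha)H(p) = {(1 - alpha) * Hp:.4f}")
# the maximum is at p0 = 0.5, where I(X;Y) = 1 - alpha = 0.75
```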
Intuition behind Channel Capacity
We know Aϵ^(n) to be a typical set for X^n with roughly 2^(nH(X)) elements that is highly
probable for large n. We are interested in how many non-overlapping/disjoint sets of output
sequences the channel p(y|x) can map these input typical sequences into.
Given an n-length sequence from X^n, the uncertainty in Y^n is nH(Y|X) (recall the
input symbols are i.i.d.), so roughly 2^(nH(Y|X)) output sequences are associated with
each input sequence.
The total number of typical sequences in Y^n is roughly 2^(nH(Y)).
The number of non-overlapping/disjoint output sets, so that decoding is possible with
low probability of error, is at most 2^(nH(Y)) / 2^(nH(Y|X)) = 2^(nI(X;Y)).
Thus, we can send at most 2^(nI(X;Y)) distinguishable sequences of length n.
However, we still need to check whether this is actually achievable for a given p(x).
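To attach numbers to this counting argument, here is a tiny worked computation of mine for a BSC with crossover 0.1, uniform input, and n = 100:

```python
import math

def H2(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, alpha = 100, 0.1              # block length and BSC crossover probability
H_Y_given_X = H2(alpha)          # per-symbol output uncertainty given the input
H_Y = 1.0                        # uniform input through a BSC gives a uniform output
I = H_Y - H_Y_given_X
print(f"each input fans out to about 2^{n * H_Y_given_X:.0f} output sequences")
print(f"about 2^{n * H_Y:.0f} typical output sequences in total")
print(f"=> at most about 2^{n * I:.0f} distinguishable inputs, i.e. I(X;Y) = {I:.3f} bits/use")
```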
