IT_w3
(Chapter 7)
Convergence of random variables
Convergence Recap
A sequence $x_n$ is said to converge to a limit $l$ if, for every $\epsilon > 0$, there exists an $N_\epsilon$ such that $\forall\, n \geq N_\epsilon$, $|x_n - l| < \epsilon$; in other words, all terms of the sequence after $n = N_\epsilon$ lie in the $\epsilon$-neighborhood of $l$.
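For instance (a standard illustration, not from the slides), the sequence $x_n = 1 + \frac{1}{n}$ converges to $l = 1$: given any $\epsilon > 0$, take $N_\epsilon = \lfloor 1/\epsilon \rfloor + 1$; then for all $n \geq N_\epsilon$, $|x_n - 1| = \frac{1}{n} < \epsilon$.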
a.s. ⇒ p ⇒ d
$L^r$ ⇒ p ⇒ d
Examples
If you have done a course on probability, you should be familiar with these:
Central Limit Theorem (CLT)
Weak law of large numbers (WLLN): For IID $X_i$s, $\frac{1}{n}\sum_i X_i \xrightarrow{p} E[X_i]$
Strong law of large numbers (SLLN)
The CLT is an example of convergence in distribution. WLLN is an
example of convergence in probability while SLLN is an example of almost
sure convergence. For more on these, you may refer to
https://www.probabilitycourse.com/chapter7/7_2_0_convergence_of_random_variables.php
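As a quick illustration of the WLLN (a minimal sketch, not from the slides; the Bernoulli source and the sample sizes are example choices), the sample mean of IID draws concentrates around the expectation as $n$ grows:

import numpy as np

# Minimal sketch: sample means of IID Bernoulli(p) draws concentrate
# around E[X_i] = p as n grows (example parameter p chosen here).
rng = np.random.default_rng(0)
p = 0.3
for n in [10, 1_000, 100_000]:
    x = rng.binomial(1, p, size=n)
    print(n, abs(x.mean() - p))   # deviation from E[X_i] shrinks with n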
We need convergence in probability for this course:
Convergence in probability
$X_n \xrightarrow{p} X$ if $\lim_{n \to \infty} P[|X_n - X| > \epsilon] = 0$ for every $\epsilon > 0$
Prove Markov and Chebyshev’s inequalities and show WLLN for IID RVs
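A sketch of the standard chain of implications (assuming finite variance $\sigma^2 = \mathrm{Var}(X_i)$, a hypothesis not stated on the slide): Markov's inequality says $P[Y \geq a] \leq E[Y]/a$ for any nonnegative $Y$ and $a > 0$. Applying it to $Y = (X - E[X])^2$ with $a = \epsilon^2$ gives Chebyshev's inequality, $P[|X - E[X]| \geq \epsilon] \leq \mathrm{Var}(X)/\epsilon^2$. For the WLLN, let $\bar{X}_n = \frac{1}{n}\sum_i X_i$; since the $X_i$ are IID, $\mathrm{Var}(\bar{X}_n) = \sigma^2/n$, so by Chebyshev $P[|\bar{X}_n - E[X_i]| \geq \epsilon] \leq \frac{\sigma^2}{n\epsilon^2} \to 0$ as $n \to \infty$.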
WLLN and AEP
WLLN → AEP
WLLN states that for IID $X_i$s, $\frac{1}{n}\sum_i X_i \xrightarrow{p} E[X_i]$.
Since $p(X_1, X_2, \ldots, X_n) = \prod_i p(X_i)$, the quantity $\frac{1}{n}\log\frac{1}{p(X_1, X_2, \ldots, X_n)} = \frac{1}{n}\sum_i \log\frac{1}{p(X_i)}$ is a sample mean of IID random variables. Hence,
$$\frac{1}{n}\log\frac{1}{p(X_1, X_2, \ldots, X_n)} \xrightarrow{p} E\left[\log\frac{1}{p(X_i)}\right] = H(X_i)$$
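A minimal numerical sketch of this convergence (not from the slides; the Bernoulli source and the parameters are example choices):

import numpy as np

# Minimal sketch: for an IID Bernoulli(p) source, -(1/n) log2 p(X_1,...,X_n)
# concentrates around H(X) as n grows, which is the AEP statement above.
rng = np.random.default_rng(0)
p = 0.3                                                # example source parameter
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))       # entropy of the source
for n in [100, 10_000, 1_000_000]:
    x = rng.binomial(1, p, size=n)
    k = x.sum()                                        # number of ones in x^n
    log2_p_xn = k * np.log2(p) + (n - k) * np.log2(1 - p)
    print(n, -log2_p_xn / n, H)                        # the two values approach each other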
Typical Set ($A_\epsilon^{(n)}$)
We define the typical set $A_\epsilon^{(n)}$ to be the set of sequences $(x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ such that
$$2^{-n(H(X)+\epsilon)} \leq p(x_1, x_2, \ldots, x_n) \leq 2^{-n(H(X)-\epsilon)} \;\Leftrightarrow\; \left|\frac{1}{n}\log_2\frac{1}{p(x_1, x_2, \ldots, x_n)} - H(X)\right| \leq \epsilon$$
The number of bits needed to represent such a sequence lies in the interval $n(H(X) \pm \epsilon)$, so the per-symbol (average) codeword length is $H(X) \pm \epsilon$. Hence, we focus on those sequences whose joint probabilities are such that the number of bits required to represent them is bounded in terms of the entropy. By the AEP, $\frac{1}{n}\log_2\frac{1}{p(X_1, \ldots, X_n)}$ converges in probability to the entropy, and hence the typical set satisfies the following properties.
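A minimal sketch of the definition in code (not from the slides; the Bernoulli source, $p$, $n$ and $\epsilon$ are example choices), checking whether a sampled sequence satisfies the typicality condition:

import numpy as np

# Minimal sketch: test membership of a sampled sequence in A_eps^(n)
# for a Bernoulli(p) source (example parameters chosen here).
p, n, eps = 0.3, 200, 0.05
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
x = np.random.default_rng(1).binomial(1, p, size=n)
k = int(x.sum())
log2_p_xn = k * np.log2(p) + (n - k) * np.log2(1 - p)
in_typical_set = abs(-log2_p_xn / n - H) <= eps        # the defining condition above
print(in_typical_set)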
Properties of $A_\epsilon^{(n)}$
If $x = (x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$, then $\left|\frac{1}{n}\log_2\frac{1}{p(x_1, x_2, \ldots, x_n)} - H(X)\right| \leq \epsilon$: already proved
$P[A_\epsilon^{(n)}] > 1 - \epsilon$ for sufficiently large $n$: obvious from the AEP
$|A_\epsilon^{(n)}| \leq 2^{n(H(X)+\epsilon)}$
$|A_\epsilon^{(n)}| \geq (1-\epsilon)\,2^{n(H(X)-\epsilon)}$ for sufficiently large $n$
Cardinality Proofs
These kinds of proofs are quite common in Information Theory. The idea is to put the probabilities and the sets under consideration to good use:
$$1 = \sum_{x \in \mathcal{X}^n} p(x) \geq \sum_{x \in A_\epsilon^{(n)}} p(x) \geq \sum_{x \in A_\epsilon^{(n)}} 2^{-n(H(X)+\epsilon)} = |A_\epsilon^{(n)}|\, 2^{-n(H(X)+\epsilon)}$$
For the last property, we use the 2nd property: for sufficiently large $n$,
$$1 - \epsilon < P[A_\epsilon^{(n)}] \leq \sum_{x \in A_\epsilon^{(n)}} 2^{-n(H(X)-\epsilon)} = |A_\epsilon^{(n)}|\, 2^{-n(H(X)-\epsilon)},$$
hence $|A_\epsilon^{(n)}| \geq (1-\epsilon)\,2^{n(H(X)-\epsilon)}$.
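A brute-force numerical check of the cardinality bounds for a small $n$ (a sketch, not from the slides; the Bernoulli source and the parameters are example choices):

from itertools import product
from math import log2

# Minimal sketch: enumerate {0,1}^n, collect the typical set, and compare its
# size and probability mass with the bounds above (example parameters).
p, n, eps = 0.3, 12, 0.1
H = -(p * log2(p) + (1 - p) * log2(1 - p))
size, prob_mass = 0, 0.0
for x in product([0, 1], repeat=n):
    k = sum(x)
    px = p**k * (1 - p)**(n - k)                        # probability of the sequence
    if abs(-log2(px) / n - H) <= eps:                   # typicality condition
        size += 1
        prob_mass += px
print(size, 2**(n * (H + eps)))    # the upper bound on |A_eps^(n)| always holds
print(prob_mass)                   # exceeds 1 - eps only once n is sufficiently large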
Consequences of AEP
Let $X^n$ be i.i.d. $\sim p(x)$ and let $\epsilon > 0$. Then there exists a code that maps sequences $x^n$ of length $n$ into binary strings such that the mapping is one-to-one (and hence invertible) and $E\left[\frac{1}{n} l(X^n)\right] \leq H(X) + \epsilon$ for $n$ sufficiently large. Thus, we can represent sequences $X^n$ using $nH(X)$ bits on average.
Coding Scheme: `0' followed by an index of at most $n(H+\epsilon)+1$ bits for $x^n \in A_\epsilon^{(n)}$, and `1' followed by an index of at most $n\log|\mathcal{X}|+1$ bits for $x^n \notin A_\epsilon^{(n)}$.
$$E[l(X^n)] = \sum_{x^n} p(x^n)\, l(x^n) = \sum_{x^n \in A_\epsilon^{(n)}} p(x^n)\, l(x^n) + \sum_{x^n \in A_\epsilon^{(n)c}} p(x^n)\, l(x^n)$$
$$\leq P[A_\epsilon^{(n)}]\,\big(n(H+\epsilon)+2\big) + P[A_\epsilon^{(n)c}]\,\big(n\log|\mathcal{X}|+2\big) \leq n(H+\epsilon) + \epsilon\,(n\log|\mathcal{X}|) + 2 = n(H(X)+\epsilon')$$
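A numerical sketch of this bound for a Bernoulli source (not from the slides; $p$, $\epsilon$ and the block lengths are example choices, and $\log_2|\mathcal{X}| = 1$ for the binary alphabet):

from math import lgamma, exp, log, log2

# Minimal sketch: upper bound on the per-symbol expected length of the
# two-part code above, computed via the binomial distribution (example parameters).
p, eps = 0.3, 0.1
H = -(p * log2(p) + (1 - p) * log2(1 - p))
for n in [100, 1_000, 10_000]:
    p_typical = 0.0
    for k in range(n + 1):                               # k = number of ones in x^n
        rate = -(k * log2(p) + (n - k) * log2(1 - p)) / n
        if abs(rate - H) <= eps:                         # all such sequences are typical
            log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                       + k * log(p) + (n - k) * log(1 - p))
            p_typical += exp(log_pmf)                    # P[x^n has exactly k ones]
    E_len = p_typical * (n * (H + eps) + 2) + (1 - p_typical) * (n + 2)
    print(n, E_len / n)                                  # approaches H(X) + eps' from above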
$C = \max_{p(x)} I(X;Y) = \max_{p(x)} [H(X) - H(X|Y)] = \max_{p(x)} H(X) = 1$ bit per use of the channel. $H(X|Y) = 0$ since given $Y$ there is no uncertainty in $X$.
Please check out the examples in the book for the channel capacities of the Noisy Typewriter, the BSC, and the Erasure Channel.
Capacity of Binary Erasure Channel
$H(Y|X) = \sum_x p(x) H(Y|X=x)$
$H(Y|X=0) = -[(1-\alpha)\log(1-\alpha) + \alpha\log\alpha] = H(\alpha)$.
Similarly, $H(Y|X=1) = H(\alpha)$, and hence these terms do not depend on $p(x)$.
Thus, $H(Y|X) = H(\alpha)\sum_x p(x) = H(\alpha)$.
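To complete the picture numerically (a sketch, not from the slides; the helper `mutual_info` and the value of $\alpha$ are assumptions made here for illustration), a grid search over $p(x)$ recovers the standard BEC capacity $C = 1 - \alpha$:

import numpy as np

# Minimal sketch: maximize I(X;Y) = H(Y) - H(Y|X) = H(Y) - H(alpha) over p(x)
# for the binary erasure channel (alpha is an example erasure probability).
alpha = 0.25

def mutual_info(p1):                        # p1 = P[X = 1]; hypothetical helper
    # Output alphabet {0, erasure, 1}; the erasure occurs with probability alpha
    py = np.array([(1 - p1) * (1 - alpha), alpha, p1 * (1 - alpha)])
    H_Y = -sum(q * np.log2(q) for q in py if q > 0)
    H_alpha = -(alpha * np.log2(alpha) + (1 - alpha) * np.log2(1 - alpha))
    return H_Y - H_alpha

grid = np.linspace(0.01, 0.99, 99)
C = max(mutual_info(p1) for p1 in grid)
print(C, 1 - alpha)                         # maximum attained near p1 = 1/2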