
MIT Lec03

The document discusses various types of convergence in probability theory, including the weak and strong laws of large numbers. It then introduces the asymptotic equipartition property, which states that for an i.i.d. sequence, −(1/n) times the log of the sequence's probability converges in probability to the entropy H(X). The concept of a typical set is also introduced as the set of sequences whose probabilities are close to 2^(−nH(X)).

Uploaded by

Mohamed Mohamed

LECTURE 3

Convergence and Asymptotic Equipartition Property

Last time:
• Convexity and concavity
• Jensen’s inequality
• Positivity of mutual information
• Data processing theorem
• Fano’s inequality

Lecture outline
• Types of convergence
• Weak Law of Large Numbers
• Strong Law of Large Numbers
• Asymptotic Equipartition Property

Reading: Scts. 3.1-3.2.


Types of convergence

Recall what a random variable is: a mapping from its set of sample values Ω into R

$$X: \Omega \to \mathbb{R}, \qquad \xi \mapsto X(\xi)$$

In the cases we have been discussing, $\Omega = \mathcal{X}$ and we map onto [0, 1]
Types of convergence
• Sure convergence: a random sequence $X_1, X_2, \ldots$ converges surely to r.v. X if $\forall \xi \in \Omega$ the sequence $X_n(\xi)$ converges to $X(\xi)$ as $n \to \infty$
• Almost sure convergence (also called convergence with probability 1): the random sequence converges a.s. (w.p. 1) to X if the sequence $X_1(\xi), X_2(\xi), \ldots$ converges to $X(\xi)$ for all ξ except possibly on a subset of Ω of probability 0
• Mean-square convergence: $X_1, X_2, \ldots$ converges in the m.s. sense to r.v. X if

$$\lim_{n \to \infty} E\left[|X_n - X|^2\right] = 0$$

• Convergence in probability: the sequence converges in probability to X if $\forall \epsilon > 0$

$$\lim_{n \to \infty} \Pr\left[|X_n - X| > \epsilon\right] = 0$$

• Convergence in distribution: the sequence converges in distribution if the cumulative distribution function $F_n(x) = \Pr(X_n \leq x)$ satisfies $\lim_{n \to \infty} F_n(x) = F_X(x)$ at all x for which $F_X$ is continuous.
Relations among types of convergence

[Figure: Venn diagram of the relations among the types of convergence]
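The diagram itself is not reproduced in this text. As a stand-in, the standard implications among the modes of convergence listed earlier (stated here from general probability theory, not recovered from the slide) can be written as:

```latex
\begin{align*}
\text{sure} &\;\Rightarrow\; \text{almost sure} \;\Rightarrow\; \text{in probability} \;\Rightarrow\; \text{in distribution} \\
\text{mean-square} &\;\Rightarrow\; \text{in probability}
\end{align*}
```

None of the reverse implications holds in general.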


Weak Law of Large Numbers

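$X_1, X_2, \ldots$ i.i.d. with finite mean $\mu$ and variance $\sigma_X^2$

$$M_n = \frac{X_1 + \cdots + X_n}{n}$$

• $E[M_n] = \mu$

• $\mathrm{Var}(M_n) = \dfrac{\sigma_X^2}{n}$

$$\Pr(|M_n - \mu| \geq \epsilon) \leq \frac{\sigma_X^2}{n\epsilon^2}$$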
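As an illustration of the bound, the following minimal sketch compares the exact tail probability of the sample mean with $\sigma_X^2/(n\epsilon^2)$. The Bernoulli(p) source, the function names, and the values p = 0.3 and ε = 0.1 are assumptions chosen for the example; they do not come from the lecture.

```python
from math import comb


def sample_mean_tail(n, p, eps):
    """Exact Pr(|M_n - mu| >= eps) when M_n averages n i.i.d. Bernoulli(p) variables (mu = p)."""
    total = 0.0
    for k in range(n + 1):  # k = number of ones, so M_n = k / n
        if abs(k / n - p) >= eps:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


def wlln_bound(n, p, eps):
    """Chebyshev-based bound sigma^2 / (n eps^2), with sigma^2 = p(1 - p)."""
    return p * (1 - p) / (n * eps**2)


# Both columns shrink as n grows; the exact tail stays below the bound.
for n in (10, 100, 1000):
    print(n, sample_mean_tail(n, 0.3, 0.1), wlln_bound(n, 0.3, 0.1))
```

The bound is loose (for small n it can exceed 1), but it is enough to show that the tail probability goes to 0, which is all the WLLN needs.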
Weak Law of Large Numbers

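Consequence of Chebyshev’s inequality: for a random variable X,

$$\sigma_X^2 = \sum_{x \in \mathcal{X}} (x - E[X])^2 P_X(x)$$

$$\sigma_X^2 \geq c^2 \Pr(|X - E[X]| \geq c)$$

$$\Pr(|X - E[X]| \geq c) \leq \frac{\sigma_X^2}{c^2}$$

$$\Pr(|X - E[X]| \geq k\sigma_X) \leq \frac{1}{k^2}$$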
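A small worked check of Chebyshev's inequality, using exact rational arithmetic. The fair six-sided die and the threshold c = 2 are assumptions picked for illustration only.

```python
from fractions import Fraction


def chebyshev_check(pmf, c):
    """Return (Pr(|X - E[X]| >= c), sigma^2 / c^2) for a finite pmf {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    tail = sum(p for x, p in pmf.items() if abs(x - mean) >= c)
    return tail, var / c**2


# Fair die: E[X] = 7/2 and sigma^2 = 35/12.
die = {x: Fraction(1, 6) for x in range(1, 7)}
tail, bound = chebyshev_check(die, Fraction(2))
print(tail, bound)  # tail probability and its Chebyshev bound
```

Only the outcomes 1 and 6 lie at distance at least 2 from the mean, so the tail probability is 1/3, below the bound 35/48.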
Strong Law of Large Numbers

Theorem (SLLN): If the $X_i$ are IID and $E_X[|X|] < \infty$, then

$$M_n = \frac{X_1 + \cdots + X_n}{n} \to E_X[X] \quad \text{w.p. 1.}$$
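A single simulated sample path can suggest (though of course not prove) the almost-sure convergence of $M_n$. The sketch below assumes fair die rolls, for which $E_X[X] = 3.5$; the seed and sample size are arbitrary choices.

```python
import random


def running_means(samples):
    """Return the list of sample means M_1, ..., M_n of the given samples."""
    means, total = [], 0.0
    for n, x in enumerate(samples, start=1):
        total += x
        means.append(total / n)
    return means


rng = random.Random(0)  # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(100_000)]
means = running_means(rolls)

# Along this one path, M_n settles near E[X] = 3.5 as n grows.
for n in (10, 1000, 100_000):
    print(n, means[n - 1])
```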
AEP

If $X_1, \ldots, X_n$ are IID with distribution $P_X$, then

$$-\frac{1}{n} \log\left(P_{X_1,\ldots,X_n}(x_1, \ldots, x_n)\right) \to H(X) \quad \text{in probability}$$

Notation: $X_i^j = (X_i, \ldots, X_j)$ (if $i = 1$, generally omitted)

Proof: create r.v. Y that takes the value $y_i = -\log(P_X(x_i))$ with probability $P_X(x_i)$ (note that the value of Y is related to its probability distribution)

We now apply the WLLN to Y

AEP

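$$-\frac{1}{n} \log\left(P_{X^n}(x^n)\right) = -\frac{1}{n} \sum_{i=1}^{n} \log\left(P_X(x_i)\right) = \frac{1}{n} \sum_{i=1}^{n} y_i$$

Using the WLLN on Y:

$$\frac{1}{n} \sum_{i=1}^{n} y_i \to E_Y[Y] \quad \text{in probability}$$

$$E_Y[Y] = -E_Z[\log(P_X(Z))] = H(X)$$

for some r.v. Z identically distributed with X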
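The proof can be mirrored numerically: draw an i.i.d. sequence, average $y_i = -\log_2 P_X(x_i)$, and compare with $H(X)$. This is a minimal sketch; the three-symbol pmf, the base-2 logarithm, and the function names are assumptions made for the example.

```python
import math
import random


def entropy(pmf):
    """Entropy H(X) in bits of a pmf given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def normalized_log_prob(seq, pmf):
    """-(1/n) log2 P_{X^n}(x^n) for an i.i.d. sequence, i.e. the mean of y_i = -log2 P_X(x_i)."""
    return sum(-math.log2(pmf[x]) for x in seq) / len(seq)


pmf = {"a": 0.5, "b": 0.25, "c": 0.25}  # H(X) = 1.5 bits
rng = random.Random(1)                   # fixed seed for repeatability
symbols, weights = zip(*pmf.items())

# The second column approaches the third (the entropy) as n grows.
for n in (10, 100, 10_000):
    seq = rng.choices(symbols, weights=weights, k=n)
    print(n, normalized_log_prob(seq, pmf), entropy(pmf))
```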
Consequences of the AEP: the typical set

Definition: $A_\epsilon^{(n)}$ is a typical set with respect to $P_X(x)$ if it is the set of sequences, among all possible sequences $x^n \in \mathcal{X}^n$, with probability:

$$2^{-n(H(X)+\epsilon)} \leq P_{X^n}(x^n) \leq 2^{-n(H(X)-\epsilon)}$$

equivalently

$$H(X) - \epsilon \leq -\frac{1}{n} \log\left(P_{X^n}(x^n)\right) \leq H(X) + \epsilon$$

As n increases, the bounds get closer together, so we are considering a smaller range of probabilities

We shall use the typical set to describe a set with characteristics that belong to the majority of elements in that set.

Note: the variance of the entropy is finite


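Consequences of the AEP: the typical set

Why is it typical? AEP says $\forall \epsilon > 0$, $\forall \delta > 0$, $\exists n_0$ such that $\forall n > n_0$

$$\Pr(A_\epsilon^{(n)}) \geq 1 - \delta$$

(note: δ can be ε)

How big is the typical set?

$$
\begin{aligned}
1 &= \sum_{x^n \in \mathcal{X}^n} P_{X^n}(x^n) \\
&\geq \sum_{x^n \in A_\epsilon^{(n)}} P_{X^n}(x^n) \\
&\geq \sum_{x^n \in A_\epsilon^{(n)}} 2^{-n(H(X)+\epsilon)} \\
&= |A_\epsilon^{(n)}| \, 2^{-n(H(X)+\epsilon)}
\end{aligned}
$$

$$\Rightarrow |A_\epsilon^{(n)}| \leq 2^{n(H(X)+\epsilon)}$$

$$\Pr(A_\epsilon^{(n)}) \geq 1 - \epsilon$$

$$\Rightarrow 1 - \epsilon \leq \sum_{x^n \in A_\epsilon^{(n)}} P_{X^n}(x^n) \leq |A_\epsilon^{(n)}| \, 2^{-n(H(X)-\epsilon)}$$

$$\Rightarrow |A_\epsilon^{(n)}| \geq 2^{n(H(X)-\epsilon)} (1 - \epsilon)$$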
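The size and probability bounds can be checked numerically for a simple source. The sketch below assumes a Bernoulli(p) source (so that all sequences with the same number of ones share one probability and can be counted with a binomial coefficient) and entropy in bits; the function name and parameter values are illustrative assumptions.

```python
from math import comb, log2


def typical_set_stats(n, p, eps):
    """Entropy, size and probability of the eps-typical set for n i.i.d. Bernoulli(p) symbols."""
    h = -p * log2(p) - (1 - p) * log2(1 - p)
    size, prob = 0, 0.0
    for k in range(n + 1):  # k ones: every such sequence has probability p^k (1-p)^(n-k)
        per_symbol = -(k * log2(p) + (n - k) * log2(1 - p)) / n
        if h - eps <= per_symbol <= h + eps:
            size += comb(n, k)
            prob += comb(n, k) * p**k * (1 - p) ** (n - k)
    return h, size, prob


n, p, eps = 1000, 0.2, 0.1
h, size, prob = typical_set_stats(n, p, eps)
print("Pr(typical set) =", prob)
print("log2 |typical set| / n =", log2(size) / n, "vs H(X) =", h)
```

For these values the typical set carries almost all of the probability, yet it contains only a vanishing fraction of the $2^n$ binary sequences, since $H(X) + \epsilon < 1$.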

[Figure: visualization of the typical set]
Consequences of the AEP: using the typical set for compression

Description in typical set requires no more than $n(H(X) + \epsilon) + 1$ bits (correction of 1 bit because of integrality)

Description in atypical set ${A_\epsilon^{(n)}}^{C}$ requires no more than $n \log(|\mathcal{X}|) + 1$ bits

Add another bit to indicate whether in $A_\epsilon^{(n)}$ or not to get whole description
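Consequences of the AEP: using the typical set for compression

Let $l(x^n)$ be the length of the binary description of $x^n$

$\forall \epsilon > 0$, $\exists n_0$ s.t. $\forall n > n_0$,

$$
\begin{aligned}
E_{X^n}[l(X^n)] &= \sum_{x^n \in A_\delta^{(n)}} P_{X^n}(x^n) \, l(x^n) + \sum_{x^n \in {A_\delta^{(n)}}^{C}} P_{X^n}(x^n) \, l(x^n) \\
&\leq \sum_{x^n \in A_\delta^{(n)}} P_{X^n}(x^n) \left(n(H(X) + \delta) + 2\right) + \sum_{x^n \in {A_\delta^{(n)}}^{C}} P_{X^n}(x^n) \left(n \log(|\mathcal{X}|) + 2\right) \\
&\leq nH(X) + n\epsilon + 2
\end{aligned}
$$

for δ small enough with respect to ε: since $\Pr({A_\delta^{(n)}}^{C}) \leq \delta$ for n large, it suffices that $\delta + \delta \log(|\mathcal{X}|) \leq \epsilon$

so $E_{X^n}\left[\frac{1}{n} l(X^n)\right] \leq H(X) + \epsilon + \frac{2}{n}$; since ε is arbitrary and $2/n \to 0$, the expected description length per symbol can be brought arbitrarily close to $H(X)$ for n sufficiently large.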
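To make the expected-length bound concrete, the following sketch evaluates it for a Bernoulli(p) source. The source model, the function name, and the values n = 1000, p = 0.2, δ = 0.05 are assumptions for illustration; the two code lengths are the ones stated in the lecture (typical-set index or raw sequence, each with the integrality bit and the flag bit).

```python
from math import comb, log2


def two_part_code_rate(n, p, delta, alphabet_size=2):
    """Upper bound on E[l(X^n)] / n for the typical-set code on a Bernoulli(p) source."""
    h = -p * log2(p) - (1 - p) * log2(1 - p)
    prob_typical = 0.0
    for k in range(n + 1):  # sequences with k ones all have the same probability
        per_symbol = -(k * log2(p) + (n - k) * log2(1 - p)) / n
        if h - delta <= per_symbol <= h + delta:
            prob_typical += comb(n, k) * p**k * (1 - p) ** (n - k)
    typical_bits = n * (h + delta) + 2            # index in typical set, plus flag bit
    atypical_bits = n * log2(alphabet_size) + 2   # raw sequence, plus flag bit
    expected_bits = prob_typical * typical_bits + (1 - prob_typical) * atypical_bits
    return h, expected_bits / n


h, rate = two_part_code_rate(n=1000, p=0.2, delta=0.05)
print("H(X) =", h, "bits/symbol; bound on expected rate =", rate)
```

The resulting rate sits above $H(X)$ but well below the 1 bit per symbol of the uncompressed binary sequence.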
Jointly typical sequences

$A_\epsilon^{(n)}$ is a typical set with respect to $P_{X,Y}(x, y)$ if it is the set of sequences, among all possible sequences $(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$, with probability:

$$2^{-n(H(X)+\epsilon)} \leq P_{X^n}(x^n) \leq 2^{-n(H(X)-\epsilon)}$$

$$2^{-n(H(Y)+\epsilon)} \leq P_{Y^n}(y^n) \leq 2^{-n(H(Y)-\epsilon)}$$

$$2^{-n(H(X,Y)+\epsilon)} \leq P_{X^n,Y^n}(x^n, y^n) \leq 2^{-n(H(X,Y)-\epsilon)}$$

for $(X^n, Y^n)$ sequences of length n IID according to $P_{X^n,Y^n}(x^n, y^n) = \prod_{i=1}^{n} P_{X,Y}(x_i, y_i)$

$$\Pr\left((X^n, Y^n) \in A_\epsilon^{(n)}\right) \to 1 \text{ as } n \to \infty$$
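The three conditions of joint typicality can be tested directly for a given pair of sequences. This is a minimal sketch: the function name, the dictionary representation of the joint pmf, and the example channel (Y equals a uniform bit X with probability 0.9) are assumptions, and every observed pair is assumed to have nonzero probability under the pmf.

```python
from math import log2


def is_jointly_typical(xs, ys, joint, eps):
    """Check the three typicality conditions for (x^n, y^n) under a joint pmf {(x, y): prob}."""
    n = len(xs)
    px, py = {}, {}
    for (x, y), p in joint.items():  # marginals P_X and P_Y
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p

    def entropy(pmf):
        return -sum(p * log2(p) for p in pmf.values() if p > 0)

    def rate(seq, pmf):
        return -sum(log2(pmf[s]) for s in seq) / n  # -(1/n) log2 of the sequence probability

    checks = [
        (rate(xs, px), entropy(px)),
        (rate(ys, py), entropy(py)),
        (rate(list(zip(xs, ys)), joint), entropy(joint)),
    ]
    return all(abs(r - h) <= eps for r, h in checks)


joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
xs = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
one_flip = xs[:-1] + [0]  # disagrees with xs in 1 of 10 positions
print(is_jointly_typical(xs, one_flip, joint, eps=0.1))
print(is_jointly_typical(xs, xs, joint, eps=0.1))
```

In this example both marginals are uniform, so every individual sequence is typical and only the joint condition discriminates: a pair that disagrees in one tenth of its positions matches $H(X, Y)$, while a pair that never disagrees is too probable to be jointly typical.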
Jointly typical sequences

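Use the union bound

$$
\begin{aligned}
\Pr\left((X^n, Y^n) \notin A_\epsilon^{(n)}\right)
&\leq \Pr\left((X^n, Y^n) \notin A_\epsilon^{\prime\prime\prime\,(n)}\right) \\
&\quad + \Pr\left(X^n \notin A_\epsilon^{\prime\prime\,(n)}\right) \\
&\quad + \Pr\left(Y^n \notin A_\epsilon^{\prime\,(n)}\right)
\end{aligned}
$$

where $A_\epsilon^{\prime\prime\prime\,(n)}$ is the typical set for the pair taken as a single sequence, $A_\epsilon^{\prime\prime\,(n)}$ the typical set for X and $A_\epsilon^{\prime\,(n)}$ the typical set for Y

each term on the RHS goes to 0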


MIT OpenCourseWare
http://ocw.mit.edu

6.441 Information Theory
Spring 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
