LECTURE 3

Convergence and Asymptotic Equipartition Property

Last time:
• Convexity and concavity
• Jensen’s inequality
• Positivity of mutual information
• Data processing theorem
• Fano’s inequality

Lecture outline
• Types of convergence
• Weak Law of Large Numbers
• Strong Law of Large Numbers
• Asymptotic Equipartition Property

Reading: Scts. 3.1-3.2.

Types of convergence

Recall what a random variable is: a mapping from its set of sample values Ω into R

$X : \Omega \to \mathbb{R}$
$\xi \mapsto X(\xi)$

In the cases we have been discussing, Ω = $\mathcal{X}$ and we map into [0, 1]
Types of convergence
• Sure convergence: a random sequence $X_1, X_2, \ldots$ converges surely to the r.v. $X$ if, $\forall \xi \in \Omega$, the sequence $X_n(\xi)$ converges to $X(\xi)$ as $n \to \infty$
• Almost sure convergence (also called convergence with probability 1): the random sequence converges a.s. (w.p. 1) to $X$ if the sequence $X_1(\xi), X_2(\xi), \ldots$ converges to $X(\xi)$ for all $\xi$ except possibly on a subset of Ω of probability 0
• Mean-square convergence: $X_1, X_2, \ldots$ converges in the m.s. sense to the r.v. $X$ if

$$\lim_{n \to \infty} E\left[|X_n - X|^2\right] = 0$$

• Convergence in probability: the sequence converges in probability to $X$ if $\forall \epsilon > 0$

$$\lim_{n \to \infty} \Pr\left[|X_n - X| > \epsilon\right] = 0$$

• Convergence in distribution: the sequence converges in distribution to $X$ if the cumulative distribution function $F_n(x) = \Pr(X_n \le x)$ satisfies $\lim_{n \to \infty} F_n(x) = F_X(x)$ at all $x$ for which $F_X$ is continuous.
Relations among types of convergence

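Venn diagram of relation: [figure not reproduced]

The standard implications are: sure ⇒ almost sure ⇒ in probability ⇒ in distribution, and mean-square ⇒ in probability.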
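One of these relations can be checked in a single line. The sketch below (a standard argument, not taken from the slides) shows why mean-square convergence implies convergence in probability, by applying Markov's inequality to the non-negative r.v. $|X_n - X|^2$:

```latex
\Pr\left[|X_n - X| > \epsilon\right]
  = \Pr\left[|X_n - X|^2 > \epsilon^2\right]
  \le \frac{E\left[|X_n - X|^2\right]}{\epsilon^2}
  \xrightarrow[n \to \infty]{} 0
  \qquad \text{for every fixed } \epsilon > 0 .
```

The converse does not hold in general, which is why the two notions occupy different regions of the diagram.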

Weak Law of Large Numbers

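$X_1, X_2, \ldots$ i.i.d. with finite mean $\mu$ and variance $\sigma_X^2$

$$M_n = \frac{X_1 + \cdots + X_n}{n}$$

• $E[M_n] = \mu$

• $\mathrm{Var}(M_n) = \sigma_X^2 / n$

$$\Pr(|M_n - \mu| \ge \epsilon) \le \frac{\sigma_X^2}{n \epsilon^2}$$

Hence $M_n \to \mu$ in probability.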
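As a numerical check of this bound, the sketch below computes $\Pr(|M_n - \mu| \ge \epsilon)$ from the binomial distribution and compares it with $\sigma_X^2 / (n\epsilon^2)$. The Bernoulli(p) source, the parameter values and the function names are assumptions chosen for illustration; they do not come from the lecture.

```python
from math import comb

def exact_tail(n, p, eps):
    """Pr(|M_n - p| >= eps) for M_n the mean of n i.i.d. Bernoulli(p) variables."""
    total = 0.0
    for k in range(n + 1):  # k = number of ones, so M_n = k / n
        # Small tolerance so that boundary cases are not lost to rounding.
        if abs(k - n * p) >= n * eps - 1e-9:
            total += comb(n, k) * p**k * (1 - p)**(n - k)
    return total

def chebyshev_bound(n, p, eps):
    """Chebyshev upper bound sigma^2 / (n * eps^2), with sigma^2 = p(1 - p)."""
    return p * (1 - p) / (n * eps**2)

if __name__ == "__main__":
    p, eps = 0.3, 0.1  # assumed example values
    for n in (10, 100, 1000):
        print(n, exact_tail(n, p, eps), chebyshev_bound(n, p, eps))
```

The bound is loose (it can exceed 1 for small n), but it decays as 1/n, which is all the WLLN needs.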
Weak Law of Large Numbers

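Consequence of Chebyshev's inequality: for a random variable $X$

$$\sigma_X^2 = \sum_{x \in \mathcal{X}} (x - E[X])^2 P_X(x)$$

$$\sigma_X^2 \ge c^2 \Pr(|X - E[X]| \ge c)$$

$$\Pr(|X - E[X]| \ge c) \le \frac{\sigma_X^2}{c^2}$$

$$\Pr(|X - E[X]| \ge k\sigma_X) \le \frac{1}{k^2}$$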
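The step from the definition of the variance to the inequality is left implicit on the slide. A worked version, under the same discrete-alphabet assumption, keeps only the terms of the sum that lie at least $c$ away from the mean and then lower-bounds each of them by $c^2$:

```latex
\sigma_X^2
  = \sum_{x \in \mathcal{X}} (x - E[X])^2 P_X(x)
  \ge \sum_{x :\, |x - E[X]| \ge c} (x - E[X])^2 P_X(x)
  \ge c^2 \sum_{x :\, |x - E[X]| \ge c} P_X(x)
  = c^2 \Pr(|X - E[X]| \ge c).
```

Applying this to $M_n$, which has mean $\mu$ and variance $\sigma_X^2/n$, with $c = \epsilon$ gives the WLLN bound $\Pr(|M_n - \mu| \ge \epsilon) \le \sigma_X^2/(n\epsilon^2)$.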
Strong Law of Large Numbers

Theorem (SLLN): If the $X_i$ are IID and $E[|X|] < \infty$, then

$$M_n = \frac{X_1 + \cdots + X_n}{n} \to E[X] \quad \text{w.p. 1.}$$
AEP

If $X_1, \ldots, X_n$ are IID with distribution $P_X$, then

$$-\frac{1}{n} \log P_{X_1, \ldots, X_n}(x_1, \ldots, x_n) \to H(X) \quad \text{in probability}$$

Notation: $X_i^j = (X_i, \ldots, X_j)$ (if $i = 1$, the subscript is generally omitted)

Proof: create a r.v. $Y$ that takes the value $y_i = -\log P_X(x_i)$ with probability $P_X(x_i)$ (note that the value of $Y$ is related to its probability distribution)

we now apply the WLLN to $Y$

AEP

$$-\frac{1}{n} \log P_{X^n}(x^n) = -\frac{1}{n} \sum_{i=1}^{n} \log P_X(x_i) = \frac{1}{n} \sum_{i=1}^{n} y_i$$

using the WLLN on $Y$

$$\frac{1}{n} \sum_{i=1}^{n} y_i \to E_Y[Y] \quad \text{in probability}$$

$$E_Y[Y] = -E_Z[\log P_X(Z)] = H(X)$$

for some r.v. $Z$ identically distributed with $X$
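The convergence can be observed numerically. The sketch below draws IID sequences from a small, hypothetical three-symbol source (the pmf, the seed and the function names are assumptions for illustration) and compares $-\frac{1}{n}\log_2 P_{X^n}(x^n)$ with $H(X)$, both in bits.

```python
import math
import random

def entropy_bits(pmf):
    """Entropy H(X) in bits of a pmf given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def sample_entropy_bits(seq, pmf):
    """-(1/n) log2 P_{X^n}(x^n) for a sequence drawn i.i.d. from pmf."""
    return -sum(math.log2(pmf[x]) for x in seq) / len(seq)

if __name__ == "__main__":
    pmf = {"a": 0.5, "b": 0.25, "c": 0.25}  # hypothetical source, H(X) = 1.5 bits
    rng = random.Random(0)                  # fixed seed for repeatability
    symbols, weights = zip(*pmf.items())
    for n in (10, 1000, 100000):
        seq = rng.choices(symbols, weights=weights, k=n)
        print(n, sample_entropy_bits(seq, pmf), entropy_bits(pmf))
```

Each term $-\log_2 P_X(x_i)$ plays the role of $y_i$ in the proof, so the printed value is the sample mean of $Y$.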
Consequences of the AEP: the typical set

Definition: $A_\epsilon^{(n)}$ is the typical set with respect to $P_X(x)$ if it is the set of sequences $x^n \in \mathcal{X}^n$ whose probability satisfies

$$2^{-n(H(X)+\epsilon)} \le P_{X^n}(x^n) \le 2^{-n(H(X)-\epsilon)}$$

equivalently

$$H(X) - \epsilon \le -\frac{1}{n} \log P_{X^n}(x^n) \le H(X) + \epsilon$$

As n increases, both bounds tend to 0 and so get closer together in absolute terms (although their ratio $2^{2n\epsilon}$ grows), so we are considering a smaller range of probabilities

We shall use the typical set to describe sequences whose characteristics are shared by the bulk of the probability mass of $\mathcal{X}^n$.

Note: the variance of $-\log P_X(X)$, whose mean is the entropy, is finite, as the WLLN argument requires

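Consequences of the AEP: the typical set

Why is it typical? The AEP says $\forall \epsilon > 0$, $\forall \delta > 0$, $\exists n_0$ such that $\forall n > n_0$

$$\Pr(A_\epsilon^{(n)}) \ge 1 - \delta$$

(note: $\delta$ can be $\epsilon$)

How big is the typical set?

$$1 = \sum_{x^n \in \mathcal{X}^n} P_{X^n}(x^n) \ge \sum_{x^n \in A_\epsilon^{(n)}} P_{X^n}(x^n) \ge \sum_{x^n \in A_\epsilon^{(n)}} 2^{-n(H(X)+\epsilon)} = |A_\epsilon^{(n)}| \, 2^{-n(H(X)+\epsilon)}$$

$$\Rightarrow |A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$$

For n large enough that $\Pr(A_\epsilon^{(n)}) \ge 1 - \epsilon$:

$$1 - \epsilon \le \sum_{x^n \in A_\epsilon^{(n)}} P_{X^n}(x^n) \le |A_\epsilon^{(n)}| \, 2^{-n(H(X)-\epsilon)}$$

$$\Rightarrow |A_\epsilon^{(n)}| \ge (1 - \epsilon) \, 2^{n(H(X)-\epsilon)}$$

Visualize: [figure not reproduced]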
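Both bounds on the size of the typical set can be checked exactly for a binary source, because every sequence with k ones has the same probability. The sketch below assumes an IID Bernoulli(p) source and logarithms in base 2; the function name and the parameter values are illustrative only.

```python
from math import comb, log2

def typical_set_stats(n, p, eps):
    """Return (size, probability, H) for A_eps^(n) of an i.i.d. Bernoulli(p) source."""
    q = 1 - p
    H = -(p * log2(p) + q * log2(q))
    size, prob = 0, 0.0
    for k in range(n + 1):  # k = number of ones in the sequence
        sample_entropy = -(k * log2(p) + (n - k) * log2(q)) / n
        if abs(sample_entropy - H) <= eps:
            size += comb(n, k)                      # number of sequences with k ones
            prob += comb(n, k) * p**k * q**(n - k)  # their total probability
    return size, prob, H

if __name__ == "__main__":
    n, p, eps = 1000, 0.3, 0.1  # assumed example values
    size, prob, H = typical_set_stats(n, p, eps)
    print("Pr(A) =", prob)
    print("lower bound exponent:", log2(1 - eps) + n * (H - eps))
    print("log2 |A|            :", log2(size))
    print("upper bound exponent:", n * (H + eps))
```

The sizes are compared through their base-2 logarithms to avoid overflowing floating-point numbers. Out of $2^n$ sequences, the typical set contains only about $2^{nH(X)}$ and yet carries almost all of the probability.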
Consequences of the AEP: using the typical set for compression

Description in the typical set requires no more than $n(H(X) + \epsilon) + 1$ bits (correction of 1 bit because of integrality)

Description in the atypical set $A_\epsilon^{(n)C}$ requires no more than $n \log(|\mathcal{X}|) + 1$ bits

Add another bit to indicate whether in $A_\epsilon^{(n)}$ or not to get the whole description
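The scheme above can be sketched for a toy case. The code below assumes a binary IID Bernoulli(p) source and builds the typical set by brute-force enumeration, so it is only practical for small n. The name `build_code` and the bit layout (a flag, then a fixed-width index or the raw bits) are illustrative choices, not the lecture's construction.

```python
from itertools import product
from math import ceil, log2

def build_code(n, p, eps):
    """Two-part code for length-n binary sequences from a Bernoulli(p) source.

    Typical sequences: flag '0' followed by a fixed-width index into the typical set.
    Atypical sequences: flag '1' followed by the n raw bits.
    """
    q = 1 - p
    H = -(p * log2(p) + q * log2(q))
    typical = []
    for seq in product((0, 1), repeat=n):
        k = sum(seq)
        sample_entropy = -(k * log2(p) + (n - k) * log2(q)) / n
        if abs(sample_entropy - H) <= eps:
            typical.append(seq)
    # ceil(log2 |A|) index bits, which is at most n(H + eps) + 1.
    index_bits = max(1, ceil(log2(len(typical)))) if typical else 0
    index_of = {seq: i for i, seq in enumerate(typical)}

    def encode(seq):
        seq = tuple(seq)
        if seq in index_of:
            return "0" + format(index_of[seq], f"0{index_bits}b")
        return "1" + "".join(map(str, seq))

    def decode(bits):
        if bits[0] == "0":
            return typical[int(bits[1:], 2)]
        return tuple(int(b) for b in bits[1:])

    return encode, decode, index_bits, H

if __name__ == "__main__":
    encode, decode, index_bits, H = build_code(10, 0.2, 0.25)
    x = (0, 1, 0, 0, 0, 0, 1, 0, 0, 0)
    print(encode(x), decode(encode(x)) == x, index_bits)
```

Because the flag bit tells the decoder which fixed length follows, the code is prefix-free and every sequence is recovered exactly.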
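Consequences of the AEP: using the typical set for compression

Let $l(x^n)$ be the length of the binary description of $x^n$

$\forall \epsilon > 0$, $\exists n_0$ s.t. $\forall n > n_0$,

$$
\begin{aligned}
E_{X^n}[l(X^n)]
&= \sum_{x^n \in A_\delta^{(n)}} P_{X^n}(x^n) \, l(x^n) + \sum_{x^n \in A_\delta^{(n)C}} P_{X^n}(x^n) \, l(x^n) \\
&\le \sum_{x^n \in A_\delta^{(n)}} P_{X^n}(x^n) \, (n(H(X) + \delta) + 2) + \sum_{x^n \in A_\delta^{(n)C}} P_{X^n}(x^n) \, (n \log(|\mathcal{X}|) + 2) \\
&\le n(H(X) + \delta) + \delta \, n \log(|\mathcal{X}|) + 2 \\
&\le nH(X) + n\epsilon + 2
\end{aligned}
$$

using $\Pr(A_\delta^{(n)}) \le 1$ and $\Pr(A_\delta^{(n)C}) \le \delta$ for $n > n_0$, and for $\delta$ small enough with respect to $\epsilon$

so $E_{X^n}\left[\frac{1}{n} l(X^n)\right] \le H(X) + \epsilon + \frac{2}{n}$, and since $\epsilon$ is arbitrary the $2/n$ term can be absorbed: $E_{X^n}\left[\frac{1}{n} l(X^n)\right] \le H(X) + \epsilon$ for n sufficiently large.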
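The expected length of such a two-part code can be evaluated exactly for a binary source. The sketch below assumes an IID Bernoulli(p) source, a one-bit flag, a fixed-width index for typical sequences and n raw bits for atypical ones. The function name and the parameter values are illustrative assumptions.

```python
from math import ceil, comb, log2

def expected_length_per_symbol(n, p, delta):
    """Return (E[l(X^n)] / n, H) for a two-part typical-set code, Bernoulli(p) source."""
    q = 1 - p
    H = -(p * log2(p) + q * log2(q))
    size, prob = 0, 0.0
    for k in range(n + 1):  # k = number of ones
        sample_entropy = -(k * log2(p) + (n - k) * log2(q)) / n
        if abs(sample_entropy - H) <= delta:
            size += comb(n, k)
            prob += comb(n, k) * p**k * q**(n - k)
    typical_len = 1 + (ceil(log2(size)) if size > 1 else 1)  # flag + index
    atypical_len = 1 + n                                     # flag + raw bits
    rate = (prob * typical_len + (1 - prob) * atypical_len) / n
    return rate, H

if __name__ == "__main__":
    for n in (100, 1000):
        rate, H = expected_length_per_symbol(n, 0.3, 0.1)  # assumed example values
        print(n, rate, H)
```

For a fixed $\delta$ the rate settles near $H(X) + \delta$ rather than $H(X)$. Approaching the entropy requires shrinking $\delta$ as n grows, which is the role of "for $\delta$ small enough" in the argument.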
Jointly typical sequences

$A_\epsilon^{(n)}$ is the jointly typical set with respect to $P_{X,Y}(x, y)$ if it is the set of sequences $(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$ with probabilities:

$$2^{-n(H(X)+\epsilon)} \le P_{X^n}(x^n) \le 2^{-n(H(X)-\epsilon)}$$

$$2^{-n(H(Y)+\epsilon)} \le P_{Y^n}(y^n) \le 2^{-n(H(Y)-\epsilon)}$$

$$2^{-n(H(X,Y)+\epsilon)} \le P_{X^n,Y^n}(x^n, y^n) \le 2^{-n(H(X,Y)-\epsilon)}$$

for $(X^n, Y^n)$ sequences of length n IID according to $P_{X^n,Y^n}(x^n, y^n) = \prod_{i=1}^{n} P_{X,Y}(x_i, y_i)$

$$\Pr((X^n, Y^n) \in A_\epsilon^{(n)}) \to 1 \quad \text{as } n \to \infty$$
Jointly typical sequences

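Use the union bound

$$
\Pr((X^n, Y^n) \notin A_\epsilon^{(n)})
\le \Pr((X^n, Y^n) \notin A_\epsilon^{\prime\prime\prime(n)})
+ \Pr(X^n \notin A_\epsilon^{\prime\prime(n)})
+ \Pr(Y^n \notin A_\epsilon^{\prime(n)})
$$

where $A_\epsilon^{\prime\prime\prime(n)}$ is the (singly) typical set for the pair $(X, Y)$ under $P_{X,Y}$, $A_\epsilon^{\prime\prime(n)}$ the typical set for $X$ and $A_\epsilon^{\prime(n)}$ the typical set for $Y$

each term on the RHS goes to 0 by the AEP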
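The three conditions that define joint typicality translate directly into a membership test. The sketch below assumes a joint pmf given as a dictionary keyed by (x, y) pairs and logarithms in base 2; the function name and the example distribution are hypothetical.

```python
from math import log2

def is_jointly_typical(xs, ys, joint, eps):
    """Check the three joint-typicality conditions for equal-length sequences xs, ys.

    joint maps (x, y) -> P_{X,Y}(x, y).
    """
    n = len(xs)
    px, py = {}, {}
    for (x, y), p in joint.items():  # marginals P_X and P_Y
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p

    def entropy(pmf):
        return -sum(p * log2(p) for p in pmf.values() if p > 0)

    def sample_entropy(seq, pmf):
        return -sum(log2(pmf[s]) for s in seq) / n

    pairs = list(zip(xs, ys))
    if any(joint.get(pair, 0) == 0 for pair in pairs):
        return False  # the pair sequence has probability zero
    return (abs(sample_entropy(xs, px) - entropy(px)) <= eps
            and abs(sample_entropy(ys, py) - entropy(py)) <= eps
            and abs(sample_entropy(pairs, joint) - entropy(joint)) <= eps)

if __name__ == "__main__":
    joint = {(0, 0): 0.5, (1, 1): 0.5}  # hypothetical source with Y = X
    print(is_jointly_typical([0, 1, 0, 1], [0, 1, 0, 1], joint, 0.1))
    print(is_jointly_typical([0, 1, 0, 1], [1, 0, 1, 0], joint, 0.1))
```

In the second call each sequence is typical on its own, but the pair is not jointly typical, which shows that the joint condition is not implied by the two marginal ones.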

MIT OpenCourseWare
http://ocw.mit.edu

6.441 Information Theory

Spring 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
