
Limit Theorems

Limit Theorems: Motivation


𝑋1 , ⋯ , 𝑋𝑛 are i.i.d. random variables. Let

  𝑀𝑛 = (𝑋1 + ⋯ + 𝑋𝑛)/𝑛 .

What happens to 𝑀𝑛 as 𝑛 → ∞ ?

• A tool: Several inequalities in probability

• Convergence “in probability”

• Convergence “with probability 1”


2
Markov Inequality
• For a nonnegative random variable 𝑋,

  𝐏(𝑋 ≥ 𝑎) ≤ 𝐄[𝑋]/𝑎   for all 𝑎 > 0

• Why? : Let 𝑌𝑎 = 0 if 𝑋 < 𝑎, and 𝑌𝑎 = 𝑎 if 𝑋 ≥ 𝑎.
  [Figure: the PDF 𝑓𝑋(𝑥) of 𝑋 and the two-point PMF of 𝑌𝑎 , with masses 𝐏(𝑌𝑎 = 0) at 0 and 𝐏(𝑌𝑎 = 𝑎) at 𝑎]

  Then, 𝑌𝑎 ≤ 𝑋 and 𝐄[𝑌𝑎] ≤ 𝐄[𝑋].

  On the other hand, 𝐄[𝑌𝑎] = 𝑎𝐏(𝑌𝑎 = 𝑎) = 𝑎𝐏(𝑋 ≥ 𝑎),

  from which we get the result.
3
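The bound above is easy to sanity-check numerically. Below is a minimal sketch (an addition, not part of the original slides) that compares the Markov bound 𝐄[𝑋]/𝑎 with an empirical estimate of 𝐏(𝑋 ≥ 𝑎); the choice of an Exp(1) test distribution and the sample size are illustrative assumptions.

```python
import random

# Monte Carlo sanity check of the Markov inequality P(X >= a) <= E[X]/a.
# X ~ Exp(1) is an arbitrary nonnegative example distribution.
random.seed(0)
samples = [random.expovariate(1.0) for _ in range(100_000)]
mean_x = sum(samples) / len(samples)

for a in [1.0, 2.0, 4.0]:
    empirical = sum(x >= a for x in samples) / len(samples)
    print(f"a={a}: P(X>=a) ~ {empirical:.4f}  <=  Markov bound {mean_x / a:.4f}")
```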
Generalized Markov Inequality
• We now have, for a nonnegative random variable 𝑋,

  𝐏(𝑋 ≥ 𝑎) ≤ 𝐄[𝑋]/𝑎   for all 𝑎 > 0

• Next, we can generalize the Markov inequality: we may substitute any positive non-decreasing function 𝑓 : ℝ → ℝ+ ,

  𝐏(𝑋 ≥ 𝑎) = 𝐏(𝑓(𝑋) ≥ 𝑓(𝑎)) ≤ 𝐄[𝑓(𝑋)]/𝑓(𝑎)

• If we pick 𝑓 judiciously, we can obtain better bounds.

4
Chebyshev Inequality
• For a random variable 𝑋 with mean 𝐄[𝑋] and variance 𝜎𝑋²,

  𝐏(|𝑋 − 𝐄[𝑋]| ≥ 𝑐) ≤ 𝜎𝑋²/𝑐²   for all 𝑐 > 0

• Why? : As a first application of the generalized Markov bound, apply it to the nonnegative random variable |𝑋 − 𝐄[𝑋]| with 𝑓(𝑥) = 𝑥². Then,

  𝐏(|𝑋 − 𝐄[𝑋]| ≥ 𝑐) = 𝐏((𝑋 − 𝐄[𝑋])² ≥ 𝑐²) ≤ 𝐄[(𝑋 − 𝐄[𝑋])²]/𝑐² = 𝜎𝑋²/𝑐²

• For 𝑐 = 𝑘𝜎𝑋 ,

  𝐏(|𝑋 − 𝐄[𝑋]| ≥ 𝑘𝜎𝑋) ≤ 1/𝑘²

5
Example: Chebyshev bound is conservative
• The Chebyshev bound is more powerful than the Markov bound because it also uses the variance. But since the mean and variance are only a rough summary of a distribution's properties, we cannot expect the bound to be a close approximation of the exact value.
• If 𝑋 ~ U[0, 4], we have 𝐄[𝑋] = 2, 𝜎𝑋² = (4 − 0)²/12 = 4/3, and for 𝑐 = 1,

  𝐏(|𝑋 − 2| ≥ 1) ≤ 4/3,

  which is uninformative compared to the exact value 1/2.
• Let 𝑋 ~ Exp(𝜆 = 1), so that 𝐄[𝑋] = 1, 𝜎𝑋² = 1. For 𝑐 > 1,

  𝐏(𝑋 ≥ 𝑐) = 𝐏(𝑋 − 1 ≥ 𝑐 − 1) ≤ 𝐏(|𝑋 − 1| ≥ 𝑐 − 1) ≤ 1/(𝑐 − 1)²,

  which is again conservative compared to the exact value 𝐏(𝑋 > 𝑐) = 𝑒^{−𝑐}.
6
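The looseness can be quantified directly; the sketch below (an added illustration, not from the slides) prints the exact tail probabilities for the two examples next to the corresponding Chebyshev bounds.

```python
import math

# Exact tail probabilities vs Chebyshev bounds for the two examples above.

# X ~ U[0, 4]: exact P(|X - 2| >= 1) = 1/2, Chebyshev bound = (4/3)/1^2
print(f"U[0,4]:  exact {0.5:.3f},  Chebyshev bound {4/3:.3f}")

# X ~ Exp(1): exact P(X >= c) = e^{-c}, Chebyshev bound = 1/(c-1)^2
for c in [2, 3, 5]:
    print(f"Exp(1), c={c}: exact {math.exp(-c):.4f},  Chebyshev bound {1/(c-1)**2:.4f}")
```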
Example: Upper bound of Chebyshev Ineq.
• If 𝑋 takes values in [𝑎, 𝑏], we claim the conservative bound 𝜎𝑋² ≤ (𝑏 − 𝑎)²/4. If 𝜎𝑋² is unknown, we may use 𝜎𝑋² ≤ (𝑏 − 𝑎)²/4 and claim

  𝐏(|𝑋 − 𝐄[𝑋]| ≥ 𝑐) ≤ (𝑏 − 𝑎)²/(4𝑐²)

• Why? : For any constant 𝛾, we have

  𝐄[(𝑋 − 𝛾)²] = 𝐄[𝑋²] − 2𝐄[𝑋]𝛾 + 𝛾²,

  and this is minimized when 𝛾 = 𝐄[𝑋]. Thus,

  𝐄[(𝑋 − 𝛾)²] ≥ 𝐄[(𝑋 − 𝐄[𝑋])²] = 𝜎𝑋² , for all 𝛾.

  By setting 𝛾 = (𝑎 + 𝑏)/2, we have

  𝜎𝑋² ≤ 𝐄[(𝑋 − (𝑎 + 𝑏)/2)²] = 𝐄[(𝑋 − 𝑎)(𝑋 − 𝑏)] + (𝑏 − 𝑎)²/4 ≤ (𝑏 − 𝑎)²/4,

  where the last inequality follows from (𝑥 − 𝑎)(𝑥 − 𝑏) ≤ 0 for all 𝑥 in the range [𝑎, 𝑏].
7
Chernoff Bound (1)
• Chernoff bounds are typically (but not always) tighter than Markov and Chebyshev bounds, but require stronger assumptions. Let 𝑋 be a sum of 𝑛 independent Bernoulli random variables {𝑋𝑖}, 𝑋 = Σ𝑖 𝑋𝑖 with 𝐄[𝑋𝑖] = 𝑝𝑖 . Let 𝜇 = 𝐄[𝑋]. Then we have

  𝜇 = 𝐄[𝑋] = 𝐄[Σ𝑖 𝑋𝑖] = Σ𝑖 𝐄[𝑋𝑖] = Σ𝑖 𝑝𝑖

• We pick 𝑓(𝑥) = 𝑒^{𝑡𝑥} with 𝑡 > 0. Then,

  𝐏(𝑋 ≥ (1 + 𝛿)𝜇) = 𝐏(𝑒^{𝑡𝑋} ≥ 𝑒^{(1+𝛿)𝜇𝑡}) ≤ 𝐄[𝑒^{𝑡𝑋}]/𝑒^{(1+𝛿)𝜇𝑡}    (1)

• We will establish a bound on 𝐄[𝑒^{𝑡𝑋}]:

  𝐄[𝑒^{𝑡𝑋}] = 𝐄[𝑒^{𝑡 Σ𝑖 𝑋𝑖}] = 𝐄[Π𝑖 𝑒^{𝑡𝑋𝑖}] = Π𝑖 𝐄[𝑒^{𝑡𝑋𝑖}]
            = Π𝑖 (𝑝𝑖 𝑒^𝑡 + (1 − 𝑝𝑖) · 1) = Π𝑖 (1 + 𝑝𝑖(𝑒^𝑡 − 1))

8


Chernoff Bound (2)
• We now use the inequality 1 + 𝑦 ≤ 𝑒^𝑦, valid for all 𝑦 ∈ ℝ. Taking 𝑦 = 𝑝𝑖(𝑒^𝑡 − 1),

  𝐄[𝑒^{𝑡𝑋}] = Π𝑖 (1 + 𝑝𝑖(𝑒^𝑡 − 1)) ≤ Π𝑖 𝑒^{𝑝𝑖(𝑒^𝑡 − 1)} = 𝑒^{Σ𝑖 𝑝𝑖(𝑒^𝑡 − 1)} = 𝑒^{(𝑒^𝑡 − 1)𝜇}

  Substituting this into eq.(1), we get that for all 𝑡 > 0,

  𝐏(𝑋 ≥ (1 + 𝛿)𝜇) ≤ 𝑒^{(𝑒^𝑡 − 1)𝜇}/𝑒^{(1+𝛿)𝜇𝑡}    (2)

• In order to make the bound as tight as possible, we find the value of 𝑡 that minimizes the upper bound of eq.(2), 𝑡 = ln(1 + 𝛿). Substituting this into eq.(2), we obtain, for all 𝛿 ≥ 0:

  𝐏(𝑋 ≥ (1 + 𝛿)𝜇) ≤ 𝑒^{(𝑒^{ln(1+𝛿)} − 1)𝜇 − (1+𝛿) ln(1+𝛿) 𝜇} = 𝑒^{[𝛿 − (1+𝛿) ln(1+𝛿)]𝜇}    (3)
9
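To get a feel for how tight eq.(3) is, the following sketch (an added illustration; the parameters 𝑛, 𝑝, and the 𝛿 values are arbitrary choices) evaluates the bound for a Binomial(𝑛, 𝑝) sum of Bernoullis and compares it with the exact upper-tail probability.

```python
import math

# Compare the Chernoff bound of eq.(3) with the exact upper tail of a
# Binomial(n, p) sum; n, p, and the delta values are illustrative.
n, p = 100, 0.3
mu = n * p

def exact_tail(k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

for delta in [0.2, 0.5, 1.0]:
    k = math.ceil((1 + delta) * mu)
    bound = math.exp((delta - (1 + delta) * math.log(1 + delta)) * mu)  # eq.(3)
    print(f"delta={delta}: exact {exact_tail(k):.3e}, eq.(3) bound {bound:.3e}")
```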
Chernoff Bound (3)
• We will now obtain a simpler form of the above bound. In particular, we use the Taylor series expansion of ln(1 + 𝛿), given by

  ln(1 + 𝛿) = Σ_{𝑖≥1} (−1)^{𝑖+1} 𝛿^𝑖/𝑖.

  Therefore,

  (1 + 𝛿) ln(1 + 𝛿) = 𝛿 + Σ_{𝑖≥2} (−1)^𝑖 𝛿^𝑖 (1/(𝑖−1) − 1/𝑖)

  Assuming that 0 < 𝛿 < 1, and ignoring the higher-order terms,

  (1 + 𝛿) ln(1 + 𝛿) > 𝛿 + 𝛿²/2 − 𝛿³/6 > 𝛿 + 𝛿²/3

  Plugging this into eq.(3), we obtain

  𝐏(𝑋 ≥ (1 + 𝛿)𝜇) ≤ 𝑒^{−𝛿²𝜇/3}    (0 < 𝛿 < 1)

• A very similar calculation shows that

  𝐏(𝑋 < (1 − 𝛿)𝜇) ≤ 𝑒^{−𝛿²𝜇/2}    (0 < 𝛿 < 1)

10
A More General Chernoff Bound
• We observe that ln(1 + 𝛿) > 2𝛿/(2 + 𝛿) for 𝛿 > 0. This implies that

  𝛿 − (1 + 𝛿) ln(1 + 𝛿) ≤ −𝛿²/(2 + 𝛿).

  Hence, using eq.(3) we obtain the following bound, which works for all positive 𝛿,

  𝐏(𝑋 ≥ (1 + 𝛿)𝜇) ≤ 𝑒^{−𝛿²𝜇/(2+𝛿)}    (𝛿 > 0)

  Similarly, it can be shown that

  𝐏(𝑋 < (1 − 𝛿)𝜇) ≤ 𝑒^{−𝛿²𝜇/(2+𝛿)}    (𝛿 > 0)

• We can combine both inequalities into one, called the two-sided Chernoff bound:

  𝐏(|𝑋 − 𝜇| ≥ 𝛿𝜇) ≤ 2𝑒^{−𝛿²𝜇/(2+𝛿)}    (𝛿 > 0)
11
Example: Fair Coin Tossing
• Suppose you toss a fair coin 200 times. How likely is it that you see at least 120 heads?

  First note that 𝜇 = 𝑛/2 = 100, and from 120 = (1 + 𝛿)𝜇 we see 𝛿 = 0.2. Then the Chernoff bound says

  𝐏(𝑋 ≥ 120) ≤ 𝑒^{−0.2²×100/(2+0.2)} = 𝑒^{−20/11} ≈ 0.162

• Let us compare this with the Chebyshev bound. Note that 𝜎² = 𝑛/4 = 50, and from 120 = (1 + 𝛿)𝜇 we see 𝜇𝛿 = 20. Then the Chebyshev bound is

  𝐏(𝑋 ≥ 120) ≤ 𝜎²/(𝜇𝛿)² = 50/20² = 0.125

  This result shows that the Chernoff bound is not always tighter than the Chebyshev bound.
12
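The sketch below (an added illustration) reproduces both bounds and also reports the exact binomial tail for reference.

```python
import math

# Fair coin: n = 200 tosses, P(X >= 120) via Chernoff, Chebyshev, and exactly.
n, p, k = 200, 0.5, 120
mu, var = n * p, n * p * (1 - p)       # 100 and 50
delta = k / mu - 1                     # 0.2

chernoff = math.exp(-delta**2 * mu / (2 + delta))   # e^(-20/11) ~ 0.162
chebyshev = var / (mu * delta) ** 2                 # 50/20^2 = 0.125
exact = sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n

print(f"exact P(X >= 120) = {exact:.4f}")
print(f"Chernoff bound    = {chernoff:.4f}")
print(f"Chebyshev bound   = {chebyshev:.4f}")
```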
Convergence of a deterministic sequence

• We have a sequence of real numbers 𝑎𝑛 and a number 𝑎

• We say that 𝑎𝑛 converges to 𝑎, and write lim_{𝑛→∞} 𝑎𝑛 = 𝑎,

  − if (intuitively): 𝑎𝑛 eventually gets and stays (arbitrarily) close to 𝑎

  − if (rigorously): for every 𝜖 > 0, there exists some 𝑛0 such that for all 𝑛 ≥ 𝑛0 , |𝑎𝑛 − 𝑎| < 𝜖

13
Convergence “in probability”

• We have a sequence of random variables 𝑌𝑛

• We say that 𝑌𝑛 converges to a number 𝑎 in probability,

  − if (intuitively): "almost all" of the PMF/PDF of 𝑌𝑛 eventually gets concentrated (arbitrarily) close to 𝑎

  − if (rigorously): for every 𝜖 > 0, we have

    lim_{𝑛→∞} 𝐏(|𝑌𝑛 − 𝑎| < 𝜖) = 1

14
Example: Convergence
• One might be tempted to believe that if a sequence 𝑌𝑛 converges in probability to a number 𝑎, then 𝐄[𝑌𝑛] must also converge to 𝑎. The following example shows this need not be the case.
• Consider a sequence of random variables with the following sequence of PMFs:

  𝐏(𝑌𝑛 = 𝑦) = 1 − 1/𝑛 for 𝑦 = 0,  and  1/𝑛 for 𝑦 = 𝑛²

  [Figure: PMF of 𝑌𝑛, with mass 1 − 1/𝑛 at 0 and mass 1/𝑛 at 𝑛²]

• For every 𝜖 > 0, we have

  lim_{𝑛→∞} 𝐏(|𝑌𝑛 − 0| ≥ 𝜖) = lim_{𝑛→∞} 1/𝑛 = 0.

  Thus, 𝑌𝑛 converges to 0 in probability.
• 𝐄[𝑌𝑛] = 𝑛² × 1/𝑛 = 𝑛, which goes to ∞ as 𝑛 increases.
15
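A small simulation (an added sketch; the sample sizes are arbitrary) makes this concrete: the fraction of draws of 𝑌𝑛 equal to 0 approaches 1, while the sample average grows like 𝑛.

```python
import random

# Simulate the sequence above: Y_n = n^2 with probability 1/n, else 0.
random.seed(0)

def sample_Y(n):
    return n**2 if random.random() < 1 / n else 0

for n in [10, 100, 1000]:
    draws = [sample_Y(n) for _ in range(50_000)]
    frac_zero = sum(d == 0 for d in draws) / len(draws)
    mean = sum(draws) / len(draws)
    print(f"n={n}: P(Y_n = 0) ~ {frac_zero:.3f},  sample mean ~ {mean:.1f} (theory: {n})")
```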
Convergence “with probability 1” (1)
• We have a sequence of random variables 𝑌1 , 𝑌2 , 𝑌3 , …
(not necessarily i.i.d.)
• We say that 𝑌𝑛 converges to 𝑎 with probability 1 (wp1) (or almost surely (a.s.)) if

  𝐏(lim_{𝑛→∞} 𝑌𝑛 = 𝑎) = 1

• Convergence with probability 1 implies convergence in probability, but the converse is not necessarily true

16
Convergence “with probability 1” (2)
• Consider a sequence 𝑌1 , 𝑌2 , 𝑌3 , … . If for all 𝜖 > 0 we have

  Σ_{𝑛=1}^{∞} 𝐏(|𝑌𝑛 − 𝑎| > 𝜖) < ∞,

  then 𝑌𝑛 → 𝑎 a.s. This provides only a sufficient condition for almost sure convergence.
• In the case Σ_{𝑛=1}^{∞} 𝐏(|𝑌𝑛 − 𝑎| > 𝜖) = ∞, we can use the following necessary and sufficient condition for almost sure convergence. Define the events

  𝑆𝑚 = {|𝑌𝑛 − 𝑎| < 𝜖, for all 𝑛 ≥ 𝑚}.

  Then 𝑌𝑛 → 𝑎 a.s. if and only if, for any 𝜖 > 0,

  lim_{𝑚→∞} 𝐏(𝑆𝑚) = lim_{𝑚→∞} 𝐏(|𝑌𝑛 − 𝑎| < 𝜖, for all 𝑛 ≥ 𝑚) = 1
17
Convergence “with probability 1” (3)
• Example: Let 𝑋1 , 𝑋2 , … be i.i.d. Bernoulli(1/2), and define 𝑌𝑛 = 2^𝑛 Π_{𝑖=1}^{𝑛} 𝑋𝑖 . Then for any 0 < 𝜖 ≤ 2^𝑚 ,

  𝐏{|𝑌𝑛 − 0| < 𝜖 for all 𝑛 ≥ 𝑚}
  = 𝐏{𝑋𝑛 = 0 for some 𝑛 ≤ 𝑚}
  = 1 − 𝐏{𝑋𝑛 = 1 for all 𝑛 ≤ 𝑚}
  = 1 − (1/2)^𝑚 ,

  which converges to 1 as 𝑚 → ∞. Hence, the sequence 𝑌𝑛 converges to 0 almost surely.
• Exercise: Let 𝑋𝑛 be independent Bernoulli(1/𝑛) rvs for 𝑛 = 2, 3, … . The goal is to check whether 𝑋𝑛 → 0 a.s.
  (a) Check that Σ_{𝑛=2}^{∞} 𝐏(|𝑋𝑛 − 0| > 𝜖) = ∞ .
  (b) Show that 𝑋𝑛 does not converge to 0 almost surely.
18
Convergence of Sample Mean
• Let 𝑋1 , ⋯ , 𝑋𝑛 be i.i.d. rvs with mean 𝜇 and variance 𝜎², and let the sample mean be

  𝑀𝑛 = (𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛)/𝑛

• Mean: 𝐄[𝑀𝑛] = 𝜇
• Variance: 𝐕(𝑀𝑛) = 𝜎²/𝑛
• Chebyshev: 𝐏(|𝑀𝑛 − 𝐄[𝑀𝑛]| ≥ 𝜖) ≤ 𝐕(𝑀𝑛)/𝜖², i.e.,

  𝐏(|𝑀𝑛 − 𝜇| ≥ 𝜖) ≤ 𝜎²/(𝑛𝜖²)

19
WLLN and SLLN
• Let 𝑋1 , ⋯ , 𝑋𝑛 be i.i.d. with finite mean 𝜇 and variance 𝜎²
• Weak Law of Large Numbers (WLLN)
  𝑀𝑛 converges to 𝜇 in probability: for every 𝜖 > 0,

  𝐏(|𝑀𝑛 − 𝜇| ≥ 𝜖) → 0 , as 𝑛 → ∞

• Strong Law of Large Numbers (SLLN)
  𝑀𝑛 converges to 𝜇 with probability 1, in the sense that

  𝐏(lim_{𝑛→∞} 𝑀𝑛 = 𝜇) = 1

20
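The WLLN statement can be observed empirically; the sketch below (an added illustration, with Bernoulli(1/2) summands and arbitrary 𝑛 values) estimates 𝐏(|𝑀𝑛 − 𝜇| ≥ 𝜖) by simulation.

```python
import random

# Estimate P(|M_n - mu| >= eps) for i.i.d. Bernoulli(1/2) draws;
# the WLLN says this probability vanishes as n grows.
random.seed(0)
p = mu = 0.5
eps, trials = 0.05, 1000

for n in [10, 100, 1000, 5000]:
    count = 0
    for _ in range(trials):
        m_n = sum(random.random() < p for _ in range(n)) / n   # sample mean M_n
        count += abs(m_n - mu) >= eps
    print(f"n={n}: P(|M_n - mu| >= {eps}) ~ {count / trials:.3f}")
```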
The Pollster’s Problem (1)
• 𝑝: proportion of the population that does something
• 𝑖th person polled ~ Bernoulli(𝑝): 𝑋𝑖 = 1 if "Yes", 0 if "No"
• 𝑀𝑛 = (𝑋1 + ⋯ + 𝑋𝑛)/𝑛 = sample proportion of "Yes", used as our estimate of 𝑝
• How many persons should be polled to satisfy
  𝐏(|𝑀𝑛 − 𝑝| ≥ 0.01) ≤ 0.05 ?
• The Chebyshev bound is 𝐏(|𝑀𝑛 − 𝐄[𝑀𝑛]| ≥ 𝜖) ≤ 𝐕(𝑀𝑛)/𝜖².
  We have 𝜖 = 0.01, 𝐄[𝑀𝑛] = 𝑝, 𝐕(𝑀𝑛) = 𝑝(1 − 𝑝)/𝑛 ≤ 1/(4𝑛)
  (∵ when 𝑋 takes values in [𝑎, 𝑏], 𝜎𝑋² ≤ (𝑏 − 𝑎)²/4, so 𝜎𝑋𝑖² = 𝑝(1 − 𝑝) ≤ 1/4)
  Thus, 𝐏(|𝑀𝑛 − 𝑝| ≥ 0.01) ≤ 1/(4𝑛 × 0.01²) ≤ 0.05
• If we choose 𝑛 large enough to satisfy the above bound, we get a conservative value of 𝑛 ≥ 50,000
21
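The sample-size requirement follows from solving 1/(4𝑛𝜖²) ≤ 0.05 for 𝑛; a short computation (added for illustration):

```python
import math

# Solve the Chebyshev requirement 1/(4 n eps^2) <= alpha for n.
eps, alpha = 0.01, 0.05
n_required = math.ceil(1 / (4 * alpha * eps**2))
print(f"Chebyshev-based sample size: n >= {n_required}")   # 50,000
```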
Central Limit Theorem (1)
• Let 𝑋1 , ⋯ , 𝑋𝑛 be a sequence of i.i.d. rvs with finite mean 𝜇 and variance 𝜎²
• Look at three variants of their sum:
  − 𝑆𝑛 = 𝑋1 + ⋯ + 𝑋𝑛 : its variance 𝑛𝜎² increases to ∞
  − 𝑀𝑛 = 𝑆𝑛/𝑛 (variance 𝜎²/𝑛) converges "in probability" to 𝜇 by the WLLN
  − 𝑆𝑛/√𝑛 keeps the variance at a constant level 𝜎²
• We define a "standardized" sum

  𝑍𝑛 = (𝑀𝑛 − 𝐄[𝑀𝑛])/𝜎𝑀𝑛 = (𝑀𝑛 − 𝜇)/(𝜎/√𝑛) = (𝑛𝑀𝑛 − 𝑛𝜇)/(𝜎√𝑛) = (𝑆𝑛 − 𝑛𝜇)/(𝜎√𝑛),

  from which 𝐄[𝑍𝑛] = 0 and 𝐕(𝑍𝑛) = 1
22
Central Limit Theorem (2)
• Then, the CDF of 𝑍𝑛 converges to the standard normal CDF in the sense that

  lim_{𝑛→∞} 𝐏(𝑍𝑛 ≤ 𝑧) = Φ(𝑧), for every 𝑧

  where Φ(𝑧) is the standard normal CDF

  Φ(𝑧) = (1/√(2𝜋)) ∫_{−∞}^{𝑧} 𝑒^{−𝑥²/2} 𝑑𝑥
• This is called the Central Limit Theorem (CLT).

23
What exactly does the CLT say?
• CDF of 𝑍𝑛 converges to Φ(𝑧)
− Not a statement about convergence of PDFs or PMFs.

• Normal Approximation:
− Treat 𝑍𝑛 as if normal (CLT)
− Also treat 𝑆𝑛 as if normal (NA)

• Can we use it when 𝑛 is “moderate” ?


− Yes, but no nice theorems about the value of 𝑛

24
Normal Approximation based on CLT
• If 𝑛 is large, the probability 𝐏(𝑆𝑛 ≤ 𝑠) can be
approximated by treating 𝑆𝑛 as if it were normal,
according to the following procedure:

1. Calculate the mean 𝑛𝜇 and the variance 𝑛𝜎 2 of 𝑆𝑛 .


2. Calculate the normalized value 𝑧 = (𝑠 − 𝑛𝜇)/(𝜎√𝑛).
3. Use the approximation
   𝐏(𝑆𝑛 ≤ 𝑠) ≈ 𝐏(𝑍𝑛 ≤ 𝑧) = Φ(𝑧),
   where Φ(𝑧) is available from the standard normal CDF table.

25
Example: CLT (1)
• We load on a plane 100 packages whose weights are i.i.d. rvs
that are uniformly distributed between 5 and 50 kgs. What is
𝐏(𝑆100 > 3000 kgs) ?
• 𝜇 = (5 + 50)/2 = 27.5,  𝜎² = (50 − 5)²/12 = 168.75

  𝑧 = (3000 − 100 × 27.5)/√(100 × 168.75) = 1.92

  Use the standard normal table to get the approximation
  𝐏(𝑆100 ≤ 3000) ≈ Φ(1.92) = 0.9726.
  Thus, the desired probability is
  𝐏(𝑆100 > 3000) = 1 − 𝐏(𝑆100 ≤ 3000) ≈ 1 − 0.9726 = 0.0274.

26
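The three-step procedure of the previous slide can be carried out in a few lines; the sketch below (added, using the error function in place of a normal table) reproduces this example.

```python
import math

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, s = 100, 3000
mu = (5 + 50) / 2              # 27.5
var = (50 - 5) ** 2 / 12       # 168.75
z = (s - n * mu) / math.sqrt(n * var)
print(f"z = {z:.2f},  P(S_100 > 3000) ~ {1 - phi(z):.4f}")
```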
Example: CLT (2)
• The production times of machine parts are i.i.d. rvs, uniformly
distributed in [1, 5] minutes. What is the probability that the
number of parts produced within 320 minutes, 𝑁320 , is at least
100?
• Let 𝑋𝑖 be the processing time of the 𝑖th part and let 𝑆100 be the
total processing time of the 100 parts. Note that the event {𝑁320 ≥
100} is the same as the event {𝑆100 ≤ 320}.
  𝜇 = (1 + 5)/2 = 3,  𝜎² = (5 − 1)²/12 = 4/3,  𝑧 = (320 − 100 × 3)/√(100 × 4/3) = 1.73

  Thus, the desired probability is
  𝐏(𝑁320 ≥ 100) = 𝐏(𝑆100 ≤ 320) ≈ Φ(1.73) = 0.9582.

  [Figure: arrival times 𝑆𝑛 on a time axis, illustrating that in general {𝑁𝑡 ≥ 𝑛} = {𝑆𝑛 ≤ 𝑡}, where 𝑁𝑡 counts events up to time 𝑡]

27
Continuity Correction (1)
• Let us assume that 𝑌 ~ Bin(𝑛 = 20, 𝑝 = 1/2), and suppose that we are interested in 𝐏(8 ≤ 𝑌 ≤ 10). Then,
  𝑌 = 𝑋1 + ⋯ + 𝑋𝑛 with 𝑋𝑖 ~ Bernoulli(𝑝 = 1/2).
• We can apply the CLT to approximate

  𝐏(8 ≤ 𝑌 ≤ 10) = 𝐏((8 − 𝑛𝜇)/(𝜎√𝑛) ≤ (𝑌 − 𝑛𝜇)/(𝜎√𝑛) ≤ (10 − 𝑛𝜇)/(𝜎√𝑛))
                = 𝐏((8 − 10)/√5 ≤ 𝑍 ≤ (10 − 10)/√5)
                ≈ Φ(0) − Φ(−2/√5) = 0.3145

• We can also find the exact value

  𝐏(8 ≤ 𝑌 ≤ 10) = Σ_{𝑘=8}^{10} C(20, 𝑘) (1/2)^𝑘 (1 − 1/2)^{20−𝑘} = 0.4565

28
Continuity Correction (2)
• We notice that our approximation is not good. Part of the error
comes from the fact that 𝑌 is a discrete rv and we are using a
continuous distribution. Here is a trick to get a better result, called
continuity correction.
• Since 𝑌 can only take integer values, we can write

  𝐏(8 ≤ 𝑌 ≤ 10) = 𝐏(7.5 ≤ 𝑌 ≤ 10.5)
                = 𝐏((7.5 − 10)/√5 ≤ (𝑌 − 𝑛𝜇)/(𝜎√𝑛) ≤ (10.5 − 10)/√5)
                ≈ Φ(0.5/√5) − Φ(−2.5/√5) = 0.4567
• As we can see, our approximation improved significantly. The
continuity correction is particularly useful when we use the
normal approximation to the binomial distribution.
29
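The sketch below (an added illustration) computes the exact binomial probability, the plain CLT approximation, and the continuity-corrected approximation side by side for this example.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 20, 0.5
mu, sd = n * p, math.sqrt(n * p * (1 - p))     # 10 and sqrt(5)

exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(8, 11))
plain = phi((10 - mu) / sd) - phi((8 - mu) / sd)
corrected = phi((10.5 - mu) / sd) - phi((7.5 - mu) / sd)

print(f"exact              = {exact:.4f}")      # 0.4565
print(f"CLT, no correction = {plain:.4f}")      # 0.3145
print(f"CLT, corrected     = {corrected:.4f}")  # 0.4567
```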
Continuity Correction (3)
𝑌 is at least 8    = {𝑌 ≥ 8}   (includes 8 and above)
𝑌 is more than 8   = {𝑌 > 8}   (doesn't include 8)
𝑌 is at most 8     = {𝑌 ≤ 8}   (includes 8 and below)
𝑌 is fewer than 8  = {𝑌 < 8}   (doesn't include 8)
𝑌 is exactly 8     = {𝑌 = 8}
30
The Pollster’s Problem (2)
• Suppose we want 𝐏(|𝑀𝑛 − 𝑝| ≥ 0.01) ≤ 0.05, with 𝐄[𝑆𝑛] = 𝑛𝑝, 𝜎𝑆𝑛² = 𝑛𝜎², and 𝜎² ≤ 1/4
  (∵ since 𝑋𝑖 takes values in [0, 1], 𝜎² = 𝑝(1 − 𝑝) ≤ (1 − 0)²/4)

• Event of interest: {|𝑀𝑛 − 𝑝| ≥ 0.01}

  = {|𝑋1 + ⋯ + 𝑋𝑛 − 𝑛𝑝|/𝑛 ≥ 0.01}
  = {|𝑋1 + ⋯ + 𝑋𝑛 − 𝑛𝑝|/(𝜎√𝑛) ≥ 0.01√𝑛/𝜎}
  = {|𝑍𝑛| ≥ 0.01√𝑛/𝜎} ≈ {|𝑍| ≥ 0.01√𝑛/𝜎}

  ⇒ 𝐏(|𝑀𝑛 − 𝑝| ≥ 0.01) ≈ 𝐏(|𝑍| ≥ 0.01√𝑛/𝜎)

• Obtain an upper bound by assuming that 𝑝 has the largest possible variance, 𝜎² = 1/4, which corresponds to 𝑝 = 1/2:

  ⇒ 𝐏(|𝑀𝑛 − 𝑝| ≥ 0.01) ≈ 𝐏(|𝑍| ≥ 0.02√𝑛)
31
The Pollster’s Problem (3)
• How large a sample size 𝑛 is needed if we want 𝐏(|𝑀𝑛 − 𝑝| ≥ 0.01) ≤ 0.05 ?

  ⇒ 𝐏(|𝑀𝑛 − 𝑝| ≥ 0.01) ≈ 𝐏(|𝑍| ≥ 0.02√𝑛)
    = 2 − 2𝐏(𝑍 ≤ 0.02√𝑛) = 2 − 2Φ(0.02√𝑛) ≤ 0.05,

  or Φ(0.02√𝑛) ≥ 0.975

• From the standard normal table, Φ(1.96) = 0.975, so

  0.02√𝑛 ≥ 1.96,  or  𝑛 ≥ 1.96²/0.02² = 9604

• Compare to the 𝑛 ≥ 50,000 that we derived using Chebyshev's inequality
32
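The value 9604 comes from inverting Φ; the sketch below (added; it finds the 0.975 quantile by bisection rather than a table lookup) reproduces the calculation.

```python
import math

def phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def phi_inv(q):
    """Standard normal quantile by bisection (instead of a table lookup)."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if phi(mid) < q else (lo, mid)
    return (lo + hi) / 2

z = phi_inv(0.975)                       # ~ 1.96
n_required = math.ceil((z / 0.02) ** 2)  # ~ 9604
print(f"z_0.975 ~ {z:.3f},  required n ~ {n_required}")
```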
Usefulness of the CLT
• Only means and variances matter
• Much more accurate than Chebyshev’s inequality
• Useful computational shortcut, even if we have a
formula for the distribution of 𝑆𝑛
• Justification of models involving normal rvs
− Noise in electrical components
− Motion of a particle suspended in a fluid (Brownian
motion)

33
CLT Summary

• 𝑋1 , ⋯ , 𝑋𝑛 are i.i.d. with finite 𝜇 and 𝜎 2


• 𝑆𝑛 = 𝑋1 + ⋯ + 𝑋𝑛 with mean 𝑛𝜇 and variance 𝑛𝜎 2
• 𝑍𝑛 = (𝑆𝑛 − 𝑛𝜇)/(𝜎√𝑛) → 𝑍
where 𝑍 is standard normal (zero mean, unit variance)
• CLT: For every 𝑐, 𝐏 𝑍𝑛 ≤ 𝑐 → 𝐏 𝑍 ≤ 𝑐 = Φ(𝑐)
• Normal approximation: Treat 𝑆𝑛 as if normal

34
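A simulation (an added sketch; Exp(1) summands and the listed 𝑛 values are arbitrary choices) shows the empirical CDF of 𝑍𝑛 approaching Φ(𝑐) at a few points 𝑐.

```python
import math
import random

# Empirical CDF of Z_n = (S_n - n*mu)/(sigma*sqrt(n)) vs the standard normal CDF.
random.seed(0)

def phi(c):
    return 0.5 * (1 + math.erf(c / math.sqrt(2)))

mu = sigma = 1.0   # Exp(1) has mean 1 and standard deviation 1
trials = 20_000

for n in [5, 30, 200]:
    zs = [(sum(random.expovariate(1.0) for _ in range(n)) - n * mu) / (sigma * math.sqrt(n))
          for _ in range(trials)]
    for c in [-1.0, 0.0, 1.0]:
        emp = sum(z <= c for z in zs) / trials
        print(f"n={n:3d}, c={c:+.1f}: P(Z_n <= c) ~ {emp:.3f},  Phi(c) = {phi(c):.3f}")
```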
Proof of the CLT
• Assume for simplicity 𝜇 = 𝐄[𝑋] = 0, 𝐄[𝑋²] = 𝜎² = 1
• We want to show that 𝑍𝑛 = (𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛)/√𝑛 converges to the standard normal, or equivalently that the MGF of 𝑍𝑛 tends to that of the standard normal distribution:

  𝑀𝑍𝑛(𝑠) = 𝐄[𝑒^{𝑠𝑍𝑛}] = 𝐄[𝑒^{(𝑠/√𝑛)(𝑋1+⋯+𝑋𝑛)}]

  𝐄[𝑒^{𝑠𝑋/√𝑛}] ≈ 1 + (𝑠/√𝑛)𝐄[𝑋] + (𝑠²/2𝑛)𝐄[𝑋²] = 1 + 𝑠²/2𝑛

  Thus, 𝑀𝑍𝑛(𝑠) = (𝐄[𝑒^{𝑠𝑋/√𝑛}])^𝑛 ≈ (1 + 𝑠²/2𝑛)^𝑛 → 𝑒^{𝑠²/2},

  which is the MGF of the standard normal distribution.

  Note) The MGF of 𝑁(𝜇, 𝜎²) is exp(𝜇𝑠 + 𝜎²𝑠²/2)

35


Homework #6
Textbook “Introduction to Probability”, 2nd Edition, D. Bertsekas and J. Tsitsiklis
Chapter 5, pp. 284-294, Problems 1, 4, 5, 8, 9, 10, 11
Due date: check the assignment posting on 아주BB

36
