Probability and Statistics 1 (확통1), LectureNote06 on Limit Theorems
Markov Inequality

• For a nonnegative random variable $X$,
$$\mathbf{P}(X \ge a) \le \frac{\mathbf{E}[X]}{a} \quad \text{for all } a > 0$$
• Why? : Let
$$Y_a = \begin{cases} 0, & \text{if } X < a \\ a, & \text{if } X \ge a \end{cases}$$
Then $Y_a \le X$, so
$$\mathbf{E}[X] \ge \mathbf{E}[Y_a] = 0 \cdot \mathbf{P}(Y_a = 0) + a \cdot \mathbf{P}(Y_a = a) = a\,\mathbf{P}(X \ge a).$$
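As a quick numerical check (not from the lecture), the bound can be compared with an exact tail probability. A minimal sketch, assuming $X \sim \mathrm{Exp}(1)$, so that $\mathbf{E}[X] = 1$ and $\mathbf{P}(X \ge a) = e^{-a}$:

```python
import math

# Markov bound P(X >= a) <= E[X]/a, illustrated for X ~ Exp(1),
# where E[X] = 1 and the exact tail is P(X >= a) = exp(-a).
mean_X = 1.0
for a in [1.0, 2.0, 5.0, 10.0]:
    markov_bound = mean_X / a
    exact_tail = math.exp(-a)
    print(f"a={a:5.1f}  Markov bound={markov_bound:.4f}  exact={exact_tail:.6f}")
```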
Chebyshev Inequality
• For a random variable $X$ with mean $\mathbf{E}[X]$ and variance $\sigma_X^2$,
$$\mathbf{P}(|X - \mathbf{E}[X]| \ge c) \le \frac{\sigma_X^2}{c^2} \quad \text{for all } c > 0$$
• Why? : As a first application of the generalized Markov bound, we pick $f(X) = X^2$. Then,
$$\mathbf{P}(|X - \mathbf{E}[X]| \ge c) = \mathbf{P}\big((X - \mathbf{E}[X])^2 \ge c^2\big) \le \frac{\mathbf{E}\big[(X - \mathbf{E}[X])^2\big]}{c^2} = \frac{\sigma_X^2}{c^2}$$
• For $c = k\sigma_X$,
$$\mathbf{P}(|X - \mathbf{E}[X]| \ge k\sigma_X) \le \frac{1}{k^2}$$
Example: Chebyshev bound is conservative
• The Chebyshev bound is more powerful than the Markov bound, because it also uses the variance. But since the mean and variance are only a rough summary of the distribution of $X$, we cannot expect the bound to be a close approximation to the exact value.
• If $X \sim U[0,4]$, we have $\mathbf{E}[X] = 2$, $\sigma_X^2 = (4-0)^2/12 = 4/3$, and for $c = 1$,
$$\mathbf{P}(|X - 2| \ge 1) \le \frac{4}{3},$$
which is uninformative compared to the exact value $1/2$.
• Let $X \sim \mathrm{Exp}(\lambda = 1)$, so that $\mathbf{E}[X] = 1$, $\sigma_X^2 = 1$. For $c > 1$,
$$\mathbf{P}(X \ge c) = \mathbf{P}(X - 1 \ge c - 1) \le \mathbf{P}(|X - 1| \ge c - 1) \le \frac{1}{(c-1)^2},$$
which is again conservative compared to the exact value $\mathbf{P}(X > c) = e^{-c}$.
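Both comparisons are easy to reproduce; a minimal sketch (exact values computed from the uniform CDF and the exponential tail):

```python
import math

# Chebyshev bound vs. exact tail for the two examples above.

# (1) X ~ Uniform[0, 4]: E[X] = 2, Var(X) = (4-0)^2/12 = 4/3, c = 1.
var_unif = (4 - 0) ** 2 / 12
cheb_unif = var_unif / 1 ** 2                 # 4/3, larger than 1 -> uninformative
exact_unif = (1 - 0) / 4 + (4 - 3) / 4        # P(X <= 1) + P(X >= 3) = 1/2
print(f"U[0,4]:  Chebyshev={cheb_unif:.4f}  exact={exact_unif:.4f}")

# (2) X ~ Exp(1): E[X] = 1, Var(X) = 1, exact tail P(X >= c) = exp(-c).
for c in [2.0, 3.0, 5.0]:
    cheb_exp = 1 / (c - 1) ** 2               # P(X >= c) <= P(|X-1| >= c-1) <= 1/(c-1)^2
    exact_exp = math.exp(-c)
    print(f"Exp(1) c={c:.0f}:  Chebyshev={cheb_exp:.4f}  exact={exact_exp:.6f}")
```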
Example: Upper bound of Chebyshev Ineq.
• If $X$ takes values in $[a, b]$, we claim the conservative bound $\sigma_X^2 \le (b-a)^2/4$. If $\sigma_X^2$ is unknown, we may use $\sigma_X^2 = (b-a)^2/4$ and claim
$$\mathbf{P}(|X - \mathbf{E}[X]| \ge c) \le \frac{(b-a)^2}{4c^2}$$
• Why? : For any constant $\gamma$, we have
$$\mathbf{E}[(X - \gamma)^2] = \mathbf{E}[X^2] - 2\gamma\,\mathbf{E}[X] + \gamma^2,$$
and this is minimized when $\gamma = \mathbf{E}[X]$. Thus,
$$\mathbf{E}[(X - \gamma)^2] \ge \mathbf{E}\big[(X - \mathbf{E}[X])^2\big] = \sigma_X^2, \quad \text{for all } \gamma.$$
By setting $\gamma = (a+b)/2$, we have
$$\sigma_X^2 \le \mathbf{E}\left[\left(X - \frac{a+b}{2}\right)^2\right] = \mathbf{E}[(X-a)(X-b)] + \frac{(b-a)^2}{4} \le \frac{(b-a)^2}{4},$$
where the last inequality follows from $(x-a)(x-b) \le 0$ for all $x$ in the range $[a, b]$.
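A small numerical check of the claim, using a few example distributions on $[0,1]$ (the choice of examples is mine; the Bernoulli(1/2) case attains the bound):

```python
import random
import statistics

# Empirical check of Var(X) <= (b - a)^2 / 4 for rvs supported on [a, b] = [0, 1].
random.seed(0)
N = 200_000
samples = {
    "Uniform[0,1]":   [random.random() for _ in range(N)],
    "Bernoulli(1/2)": [float(random.random() < 0.5) for _ in range(N)],  # attains the bound
    "U^2":            [random.random() ** 2 for _ in range(N)],
}
bound = (1 - 0) ** 2 / 4
for name, xs in samples.items():
    print(f"{name:14s}  Var≈{statistics.pvariance(xs):.4f}   bound={bound:.4f}")
```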
Chernoff Bound (1)
• Chernoff bounds are typically (but not always) tighter than Markov and Chebyshev bounds, but require stronger assumptions. Let $X$ be a sum of $n$ independent Bernoulli random variables $\{X_i\}$, $X = \sum_i X_i$ with $\mathbf{E}[X_i] = p_i$. Let $\mu = \mathbf{E}[X]$. Then we have
$$\mu = \mathbf{E}[X] = \mathbf{E}\Big[\sum_i X_i\Big] = \sum_i \mathbf{E}[X_i] = \sum_i p_i$$
• We pick $f(X) = e^{tX}$ for $t > 0$. Then,
$$\mathbf{P}[X \ge (1+\delta)\mu] = \mathbf{P}\big(e^{tX} \ge e^{(1+\delta)\mu t}\big) \le \frac{\mathbf{E}[e^{tX}]}{e^{(1+\delta)\mu t}} \quad (1)$$
• We will establish a bound on $\mathbf{E}[e^{tX}]$:
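Since the $X_i$ are independent, $\mathbf{E}[e^{tX}] = \prod_i (1 - p_i + p_i e^t)$, so the right-hand side of (1) can already be evaluated and minimized over $t > 0$ numerically. A minimal sketch, assuming equal $p_i = 0.1$ so that the exact tail is a binomial sum:

```python
import math

# Generic Chernoff bound (1): P(X >= (1+delta)*mu) <= min_{t>0} E[e^{tX}] / e^{(1+delta)*mu*t}
# for X = sum of n independent Bernoulli(p_i); here p_i = p for all i.
n, p, delta = 100, 0.1, 1.0
mu = n * p
threshold = (1 + delta) * mu                      # = 20

def chernoff_bound(t):
    mgf = (1 - p + p * math.exp(t)) ** n          # E[e^{tX}] = prod_i (1 - p_i + p_i e^t)
    return mgf / math.exp(t * threshold)

best = min(chernoff_bound(t) for t in [k / 1000 for k in range(1, 3000)])

# Exact tail of Binomial(n, p) for comparison.
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
            for k in range(math.ceil(threshold), n + 1))

print(f"Chernoff bound: {best:.4e}   exact P(X >= {threshold:.0f}): {exact:.4e}")
```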
Convergence “in probability”
• We say that a sequence of random variables $Y_n$ converges to $a$ in probability if, for every $\epsilon > 0$,
$$\lim_{n \to \infty} \mathbf{P}(|Y_n - a| \ge \epsilon) = 0$$
Example: Convergence
• One might be tempted to believe that if a sequence $Y_n$ converges in probability to a number $a$, then $\mathbf{E}[Y_n]$ must also converge to $a$. The following example shows this need not be the case.
• Consider a sequence of random variables with the following sequence of PMFs:
$$\mathbf{P}(Y_n = y) = \begin{cases} 1 - \dfrac{1}{n}, & y = 0 \\[4pt] \dfrac{1}{n}, & y = n^2 \end{cases}$$
• For every $\epsilon > 0$, we have
$$\lim_{n \to \infty} \mathbf{P}(|Y_n - 0| \ge \epsilon) = \lim_{n \to \infty} \frac{1}{n} = 0.$$
Thus, $Y_n$ converges to $0$ in probability.
• $\mathbf{E}[Y_n] = n^2 \times \dfrac{1}{n} = n$, which goes to $\infty$ as $n$ increases.
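Both statements can be checked by simulating draws of $Y_n$; a minimal sketch (sample sizes are arbitrary):

```python
import random

random.seed(1)

def sample_Y(n):
    """One draw of Y_n: value n**2 with probability 1/n, else 0."""
    return n ** 2 if random.random() < 1 / n else 0

for n in [10, 100, 1000, 10000]:
    draws = [sample_Y(n) for _ in range(100_000)]
    prob_far_from_0 = sum(y != 0 for y in draws) / len(draws)   # ≈ P(|Y_n - 0| >= eps) = 1/n
    emp_mean = sum(draws) / len(draws)                          # ≈ E[Y_n] = n
    print(f"n={n:6d}  P(|Y_n|>=eps)≈{prob_far_from_0:.4f}  E[Y_n]≈{emp_mean:9.1f}")
```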
Convergence “with probability 1” (1)
• We have a sequence of random variables 𝑌1 , 𝑌2 , 𝑌3 , …
(not necessarily i.i.d.)
• We say that 𝑌𝑛 converges to 𝑎 with probability 1 (wp1)
(or almost surely (a.s.)) if
$$\mathbf{P}\Big(\lim_{n \to \infty} Y_n = a\Big) = 1$$
Convergence “with probability 1” (2)
• Consider a sequence $Y_1, Y_2, Y_3, \ldots$. If for all $\epsilon > 0$ we have
$$\sum_{n=1}^{\infty} \mathbf{P}(|Y_n - a| > \epsilon) < \infty,$$
then $Y_n \xrightarrow{\text{a.s.}} a$. This provides only a sufficient condition for almost sure convergence.
• In the case $\sum_{n=1}^{\infty} \mathbf{P}(|Y_n - a| > \epsilon) = \infty$, we can instead use the following necessary and sufficient condition for almost sure convergence: Define the set of events
$$S_m = \{|Y_n - a| < \epsilon, \text{ for all } n \ge m\}.$$
Then, $Y_n \xrightarrow{\text{a.s.}} a$ if and only if for any $\epsilon > 0$, we have
$$\lim_{m \to \infty} \mathbf{P}(S_m) = 1.$$
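For independent $Y_n$, $\mathbf{P}(S_m)$ is just a product of terms $1 - \mathbf{P}(|Y_n - a| \ge \epsilon)$, so both criteria can be examined numerically. A minimal sketch, assuming independent $Y_n$ with $\mathbf{P}(Y_n = 1) = q_n$ and $Y_n = 0$ otherwise (so $a = 0$ and any $\epsilon < 1$): with $q_n = 1/n^2$ the sum converges and $\mathbf{P}(S_m) \to 1$ as $m$ grows, while with $q_n = 1/n$ the sum diverges and the product tends to $0$ as the horizon grows.

```python
import math

# Independent Y_n with P(Y_n = 1) = q_n, P(Y_n = 0) = 1 - q_n; candidate limit a = 0, eps < 1.
# Then P(S_m) = P(Y_n = 0 for all n >= m) = prod_{n >= m} (1 - q_n),
# computed here up to a finite horizon N.
def P_Sm(q, m, N=500_000):
    log_p = sum(math.log1p(-q(n)) for n in range(m, N + 1))
    return math.exp(log_p)

for m in [2, 10, 100, 1000]:
    p_sq = P_Sm(lambda n: 1 / n**2, m)   # sum q_n < inf  -> P(S_m) -> 1: a.s. convergence
    p_hm = P_Sm(lambda n: 1 / n, m)      # sum q_n = inf  -> product -> 0: no a.s. convergence
    print(f"m={m:5d}   q_n=1/n^2: P(S_m)≈{p_sq:.4f}   q_n=1/n: P(S_m)≈{p_hm:.2e}")
```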
WLLN and SLLN
• Let $X_1, \cdots, X_n$ be i.i.d. with finite mean $\mu$ and variance $\sigma^2$, and let $M_n = (X_1 + \cdots + X_n)/n$ be the sample mean.
• Weak Law of Large Numbers (WLLN): For every $\epsilon > 0$, $M_n$ converges to $\mu$ in probability:
$$\mathbf{P}(|M_n - \mu| \ge \epsilon) \to 0, \quad \text{as } n \to \infty.$$
This follows from the Chebyshev inequality applied to $M_n$:
$$\mathbf{P}(|M_n - \mu| \ge \epsilon) \le \frac{\sigma^2}{n\epsilon^2}.$$
• Strong Law of Large Numbers (SLLN): $M_n$ converges to $\mu$ with probability 1:
$$\mathbf{P}\Big(\lim_{n \to \infty} M_n = \mu\Big) = 1$$
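A short simulation of the WLLN together with the Chebyshev bound on the sample mean; a minimal sketch, assuming Bernoulli(0.3) samples (so $\mu = 0.3$, $\sigma^2 = 0.21$):

```python
import random

random.seed(2)
p, eps = 0.3, 0.05                 # mu = p = 0.3, sigma^2 = p(1-p) = 0.21
trials = 1000

for n in [10, 100, 1000, 10000]:
    # Estimate P(|M_n - mu| >= eps) by repeating the experiment many times.
    hits = 0
    for _ in range(trials):
        m_n = sum(random.random() < p for _ in range(n)) / n
        hits += abs(m_n - p) >= eps
    cheb = p * (1 - p) / (n * eps ** 2)        # Chebyshev bound sigma^2 / (n * eps^2)
    print(f"n={n:6d}  P(|M_n-mu|>=eps)≈{hits/trials:.4f}  Chebyshev bound={min(cheb, 1):.4f}")
```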
The Pollster’s Problem (1)
• $p$: proportion of the population that answers “Yes” (the quantity we want to estimate)
• $i$th person polled $\sim$ Bernoulli($p$):
$$X_i = \begin{cases} 1 & \text{if “Yes”} \\ 0 & \text{if “No”} \end{cases}$$
• $M_n = \dfrac{\sum_{i=1}^{n} X_i}{n}$ = sample proportion of “Yes”, used as our estimate of $p$
• How many persons should be polled to satisfy
$$\mathbf{P}(|M_n - p| \ge 0.01) \le 0.05\,?$$
• The Chebyshev bound is $\mathbf{P}(|M_n - \mathbf{E}[M_n]| \ge \epsilon) \le \dfrac{\mathbf{V}(M_n)}{\epsilon^2}$.
We have $\epsilon = 0.01$, $\mathbf{E}[M_n] = p$, $\mathbf{V}(M_n) = \dfrac{p(1-p)}{n} \le \dfrac{1}{4n}$
($\because$ when $X$ takes values in $[a, b]$, $\sigma_X^2 \le (b-a)^2/4$; here $X_i \in [0, 1]$, so $\sigma_{X_i}^2 = p(1-p) \le 1/4$).
Thus,
$$\mathbf{P}(|M_n - p| \ge 0.01) \le \frac{1}{4n(0.01)^2} \le 0.05$$
• If we choose $n$ large enough to satisfy the above bound, we have a conservative value of $n \ge 50{,}000$.
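The sample-size calculation amounts to solving $1/(4n\epsilon^2) \le 0.05$ for the smallest integer $n$; a one-line check:

```python
import math

# Smallest n with 1 / (4 * n * eps^2) <= alpha (Chebyshev-based, worst case sigma^2 = 1/4).
eps, alpha = 0.01, 0.05
n_required = math.ceil(1 / (4 * alpha * eps ** 2))
print(n_required)    # 50000
```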
Central Limit Theorem (1)
• Let $X_1, \cdots, X_n$ be a sequence of i.i.d. rvs with finite mean $\mu$ and variance $\sigma^2$
• Look at three variants of their sum:
− $S_n = X_1 + \cdots + X_n$: its variance $n\sigma^2$ increases to $\infty$
− $M_n = S_n/n$: its variance $\sigma^2/n$ shrinks to $0$, and $M_n$ converges “in probability” to $\mu$ by the WLLN
− $S_n/\sqrt{n}$: its variance stays at the constant level $\sigma^2$
• We define a “standardized” sum
$$Z_n = \frac{M_n - \mathbf{E}(M_n)}{\sigma_{M_n}} = \frac{M_n - \mu}{\sigma/\sqrt{n}} = \frac{nM_n - n\mu}{\sigma\sqrt{n}} = \frac{S_n - n\mu}{\sigma\sqrt{n}},$$
from which $\mathbf{E}[Z_n] = 0$ and $\mathbf{V}(Z_n) = 1$.
Central Limit Theorem (2)
• Then, the CDF of $Z_n$ converges to the standard normal CDF in the sense that
$$\lim_{n \to \infty} \mathbf{P}(Z_n \le z) = \Phi(z), \quad \text{for every } z,$$
where $\Phi(z)$ is the standard normal CDF
$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-x^2/2}\, dx$$
• This is called the Central Limit Theorem (CLT).
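The statement is about CDF values, so it can be checked pointwise by simulation. A minimal sketch, assuming exponential summands ($\mu = \sigma^2 = 1$), comparing the empirical CDF of $Z_n$ with $\Phi(z)$ at a few points:

```python
import math
import random

random.seed(3)

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 1.0, 1.0          # X_i ~ Exp(1): mean 1, variance 1
reps = 20_000

for n in [1, 4, 32, 256]:
    zs = []
    for _ in range(reps):
        s_n = sum(random.expovariate(1.0) for _ in range(n))
        zs.append((s_n - n * mu) / (sigma * math.sqrt(n)))    # standardized sum Z_n
    for z in [-1.0, 0.0, 1.0]:
        emp = sum(v <= z for v in zs) / reps                  # empirical P(Z_n <= z)
        print(f"n={n:4d}  z={z:+.1f}  P(Z_n<=z)≈{emp:.3f}  Phi(z)={phi(z):.3f}")
```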
What exactly does the CLT say?
• CDF of 𝑍𝑛 converges to Φ(𝑧)
− Not a statement about convergence of PDFs or PMFs.
• Normal Approximation:
− Treat 𝑍𝑛 as if normal (CLT)
− Also treat 𝑆𝑛 as if normal (NA)
Normal Approximation based on CLT
• If $n$ is large, the probability $\mathbf{P}(S_n \le s)$ can be approximated by treating $S_n$ as if it were normal, according to the following procedure:
1. Compute the mean $n\mu$ and the variance $n\sigma^2$ of $S_n$.
2. Compute the normalized value $z = (s - n\mu)/(\sigma\sqrt{n})$.
3. Use the approximation $\mathbf{P}(S_n \le s) \approx \Phi(z)$, where $\Phi(z)$ is obtained from standard normal tables.
Example: CLT (1)
• We load on a plane 100 packages whose weights are i.i.d. rvs
that are uniformly distributed between 5 and 50 kgs. What is
𝐏(𝑆100 > 3000 kgs) ?
• $\mu = \dfrac{5 + 50}{2} = 27.5$, $\quad \sigma^2 = \dfrac{(50 - 5)^2}{12} = 168.75$
$$z = \frac{3000 - 100 \times 27.5}{\sqrt{100 \times 168.75}} = 1.92$$
Use the standard normal tables to get the approximation
$$\mathbf{P}(S_{100} \le 3000) \approx \Phi(1.92) = 0.9726.$$
Thus, the desired probability is
$$\mathbf{P}(S_{100} > 3000) = 1 - \mathbf{P}(S_{100} \le 3000) \approx 1 - 0.9726 = 0.0274.$$
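The numbers above are easy to reproduce, with $\Phi$ evaluated via `math.erf` instead of a table:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, a, b, s = 100, 5.0, 50.0, 3000.0
mu = (a + b) / 2                    # 27.5
var = (b - a) ** 2 / 12             # 168.75
z = (s - n * mu) / math.sqrt(n * var)
print(f"z = {z:.2f},  P(S_100 > 3000) ≈ {1 - phi(z):.4f}")   # z ≈ 1.92, prob ≈ 0.027
```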
Example: CLT (2)
• The production times of machine parts are i.i.d. rvs, uniformly
distributed in [1, 5] minutes. What is the probability that the
number of parts produced within 320 minutes, 𝑁320 , is at least
100?
• Let 𝑋𝑖 be the processing time of the 𝑖th part and let 𝑆100 be the
total processing time of the 100 parts. Note that the event {𝑁320 ≥
100} is the same as the event {𝑆100 ≤ 320}.
$$\mu = \frac{1+5}{2} = 3, \quad \sigma^2 = \frac{(5-1)^2}{12} = 4/3, \quad z = \frac{320 - 100 \times 3}{\sqrt{100 \times 4/3}} = 1.73$$
Thus, the desired probability is
$$\mathbf{P}(N_{320} \ge 100) = \mathbf{P}(S_{100} \le 320) \approx \Phi(1.73) = 0.9582.$$
(Figure: timeline of the arrival process, illustrating that $\{N_t \ge n\} = \{S_n \le t\}$, where $N_t$ is the number of events by time $t$ and $S_n$ is the time of the $n$th event.)
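The identity $\{N_{320} \ge 100\} = \{S_{100} \le 320\}$ also makes a direct Monte Carlo check easy; a minimal sketch (run count is arbitrary):

```python
import random

random.seed(4)
runs = 50_000
count = 0
for _ in range(runs):
    s_100 = sum(random.uniform(1, 5) for _ in range(100))   # total time for 100 parts
    count += s_100 <= 320                                    # same event as N_320 >= 100
print(f"P(N_320 >= 100) ≈ {count / runs:.4f}")               # CLT approximation gave 0.9582
```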
Continuity Correction (1)
• Let us assume that 𝑌~Bin(𝑛 = 20, 𝑝 = 1/2), and suppose that
we are interested in 𝐏(8 ≤ 𝑌 ≤ 10). Then,
𝑌 = 𝑋1 + ⋯ + 𝑋𝑛 with 𝑋𝑖 ~Bernoulli 𝑝 = 1/2 .
• We can apply the CLT to approximate
$$\mathbf{P}(8 \le Y \le 10) = \mathbf{P}\left(\frac{8 - n\mu}{\sigma\sqrt{n}} \le \frac{Y - n\mu}{\sigma\sqrt{n}} \le \frac{10 - n\mu}{\sigma\sqrt{n}}\right) = \mathbf{P}\left(\frac{8 - 10}{\sqrt{5}} \le Z \le \frac{10 - 10}{\sqrt{5}}\right) \approx \Phi(0) - \Phi\left(-\frac{2}{\sqrt{5}}\right) = 0.3145$$
• We can also find the exact value
$$\mathbf{P}(8 \le Y \le 10) = \sum_{k=8}^{10} \binom{20}{k} \left(\frac{1}{2}\right)^k \left(1 - \frac{1}{2}\right)^{20-k} = 0.4565$$
Continuity Correction (2)
• We notice that our approximation is not good. Part of the error
comes from the fact that 𝑌 is a discrete rv and we are using a
continuous distribution. Here is a trick to get a better result, called
continuity correction.
• Since 𝑌 can only take integer values, we can write
$$\mathbf{P}(8 \le Y \le 10) = \mathbf{P}(7.5 \le Y \le 10.5) = \mathbf{P}\left(\frac{7.5 - 10}{\sqrt{5}} \le \frac{Y - n\mu}{\sigma\sqrt{n}} \le \frac{10.5 - 10}{\sqrt{5}}\right) \approx \Phi\left(\frac{0.5}{\sqrt{5}}\right) - \Phi\left(-\frac{2.5}{\sqrt{5}}\right) = 0.4567$$
• As we can see, our approximation improved significantly. The
continuity correction is particularly useful when we use the
normal approximation to the binomial distribution.
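The exact value and both approximations can be computed side by side; a short sketch:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 20, 0.5
mu, sd = n * p, math.sqrt(n * p * (1 - p))                        # 10, sqrt(5)

exact = sum(math.comb(n, k) * 0.5 ** n for k in range(8, 11))     # P(8 <= Y <= 10) exactly
plain = phi((10 - mu) / sd) - phi((8 - mu) / sd)                  # CLT, no correction
corrected = phi((10.5 - mu) / sd) - phi((7.5 - mu) / sd)          # CLT with continuity correction
print(f"exact={exact:.4f}  CLT={plain:.4f}  CLT+correction={corrected:.4f}")
# exact=0.4565  CLT=0.3145  CLT+correction=0.4567
```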
Continuity Correction (3)
• $\{Y \text{ is at least } 8\} = \{Y \ge 8\}$ (includes 8 and above): with the continuity correction, approximate by $\mathbf{P}(Y \ge 7.5)$
• $\{Y \text{ is at most } 8\} = \{Y \le 8\}$ (includes 8 and below): approximate by $\mathbf{P}(Y \le 8.5)$
• $\{Y \text{ is exactly } 8\} = \{Y = 8\}$: approximate by $\mathbf{P}(7.5 \le Y \le 8.5)$
The Pollster’s Problem (2)
• Suppose we want $\mathbf{P}(|M_n - p| \ge 0.01) \le 0.05$, with $\mathbf{E}[S_n] = np$, $\sigma_{S_n}^2 = n\sigma^2$, and $\sigma^2 \le 1/4$
($\because$ each $X_i$ takes values in $[0, 1]$, so $\sigma^2 = p(1-p) \le (1-0)^2/4 = 1/4$).
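The slides completing this calculation are not reproduced here; the sketch below is one standard way the CLT-based requirement can be worked out, treating $M_n - p$ as approximately normal with variance $\sigma^2/n \le 1/(4n)$, so the resulting figure is my own calculation rather than a value quoted from the lecture:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

eps, alpha, sigma = 0.01, 0.05, 0.5        # worst-case sigma = sqrt(1/4)

# Normal approximation: P(|M_n - p| >= eps) ≈ 2 * (1 - Phi(eps * sqrt(n) / sigma)) <= alpha.
n = 1
while 2 * (1 - phi(eps * math.sqrt(n) / sigma)) > alpha:
    n += 1
print(n)    # around 9604, far smaller than the Chebyshev-based 50,000
```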
CLT Summary
Proof of the CLT
• Assume for simplicity $\mu = \mathbf{E}[X] = 0$, $\mathbf{E}[X^2] = \sigma^2 = 1$
• We want to show that $Z_n = \dfrac{X_1 + X_2 + \cdots + X_n}{\sqrt{n}}$ converges to the standard normal, or equivalently, show that the MGF of $Z_n$ tends to that of the standard normal distribution:
$$M_{Z_n}(s) = \mathbf{E}[e^{sZ_n}] = \mathbf{E}\big[e^{(s/\sqrt{n})(X_1 + \cdots + X_n)}\big]$$
$$\mathbf{E}\big[e^{sX/\sqrt{n}}\big] \approx 1 + \frac{s}{\sqrt{n}}\mathbf{E}[X] + \frac{s^2}{2n}\mathbf{E}[X^2] \approx 1 + \frac{s^2}{2n}$$
Thus,
$$M_{Z_n}(s) = \Big(\mathbf{E}\big[e^{sX/\sqrt{n}}\big]\Big)^n \approx \left(1 + \frac{s^2}{2n}\right)^n \to e^{s^2/2},$$
which is the MGF of the standard normal distribution.
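The limit $(1 + s^2/2n)^n \to e^{s^2/2}$ and the MGF convergence can be checked numerically; a minimal sketch, assuming $X_i = \pm 1$ with probability $1/2$ each (mean 0, variance 1), for which $\mathbf{E}[e^{sX/\sqrt{n}}] = \cosh(s/\sqrt{n})$:

```python
import math

# For X_i = +/-1 with probability 1/2 each (mean 0, variance 1):
#   M_{Z_n}(s) = (E[e^{s X / sqrt(n)}])^n = cosh(s / sqrt(n))^n  ->  e^{s^2/2}.
s = 1.5
target = math.exp(s ** 2 / 2)
for n in [1, 10, 100, 1000, 10000]:
    mgf_exact = math.cosh(s / math.sqrt(n)) ** n
    mgf_approx = (1 + s ** 2 / (2 * n)) ** n        # the approximation used in the proof
    print(f"n={n:6d}  exact MGF={mgf_exact:.5f}  (1+s^2/2n)^n={mgf_approx:.5f}  e^(s^2/2)={target:.5f}")
```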