Confidence Intervals
[Figure: Sampling Distribution; x-axis: x (mean year)]
Standard Error (SE): The standard deviation of the sampling distribution of a statistic.
Confidence Interval: An interval, given by a lower and an upper bound, that captures most of the sampling distribution of the statistic.
[Figure: Population Distribution; x-axis: Food Safety Scores, y-axis: Proportion]
¹ The term census refers to a setting where you have access to the entire population.
The population distribution is skewed left, with a long left tail.
The highest possible score is 100. For scores in the 90s, even
scores appear to be more common than odd scores; in fact, there
are no scores of 99, 97, or 95.
We can calculate two parameters of this population:
• The population mean, 𝜇, is 87.6.
• The population SD, 𝜎, is 8.9.
Population parameters, like the parameters of probability
distributions, are usually given a Greek letter. The population mean
is 𝜇, said “myoo”, and the population standard deviation is 𝜎,
said “sigma”.
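When the whole population is available, as in a census, the two parameters can be computed directly. A minimal sketch, assuming a short hypothetical list `scores` in place of the real restaurant data; `statistics.pstdev` divides by 𝑁, which is the divisor a population SD uses.

```python
import statistics

# Hypothetical stand-in for a census: every unit in the population is listed.
scores = [100, 96, 92, 90, 88, 86, 82, 78, 70]

mu = statistics.mean(scores)       # population mean
sigma = statistics.pstdev(scores)  # population SD: divides by N, not N - 1

print(f"mu = {mu:.1f}, sigma = {sigma:.1f}")
```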
The Empirical Distribution
Although we have data on all of the restaurants in the city,
imagine that you’re an inspector who has visited a simple
random sample of 100 restaurants. That is, you draw 100
times without replacement from the population, with each unit
equally likely to be selected. This leads to a representative
sample that will have no selection bias.
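A simple random sample can be sketched with the standard library's `random.sample`, which picks distinct positions, so no unit is drawn twice and every unit is equally likely to end up in the sample. The population of 5000 scores below is hypothetical (randomly generated, not the inspection data); only the sample size of 100 comes from the text.

```python
import random

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Hypothetical population of 5000 integer scores from 60 to 100 (not the real data).
population = [rng.randint(60, 100) for _ in range(5000)]

# SRS of 100: pick 100 distinct positions without replacement, each unit equally likely.
idx = rng.sample(range(len(population)), k=100)
sample = [population[i] for i in idx]

print(len(sample), len(set(idx)))  # 100 draws, 100 distinct units
```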
The distribution of this sample (an empirical distribution) looks
like:
[Figure: Empirical distribution; x-axis: Food Safety Scores, y-axis: Proportion]
Observe that the empirical distribution resembles the population
distribution because we are using a sampling method without
selection bias. It’s not a perfect match, but the shape is similar.
The sample average (𝑥̄) and the sample SD (𝑠) are also close to,
but not the same as, the population average (𝜇) and SD (𝜎).
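To see this numerically, one sketch computes 𝑥̄ and 𝑠 from an SRS alongside 𝜇 and 𝜎. The population is again hypothetical, and the sketch assumes the usual 𝑛 − 1 divisor for 𝑠 (`statistics.stdev`), while 𝜎 uses 𝑁 (`statistics.pstdev`).

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of scores (placeholder, not the real inspections).
population = [rng.randint(60, 100) for _ in range(5000)]
sample = rng.sample(population, k=100)  # SRS of size 100

mu, sigma = statistics.mean(population), statistics.pstdev(population)
x_bar, s = statistics.mean(sample), statistics.stdev(sample)  # s uses n - 1

print(f"mu = {mu:.2f}   x_bar = {x_bar:.2f}")
print(f"sigma = {sigma:.2f}   s = {s:.2f}")
```

The two pairs should be close but not identical; rerunning with a different seed gives a different 𝑥̄ and 𝑠, while 𝜇 and 𝜎 stay fixed for a fixed population.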
Sampling Distribution
[Figure: x-axis: Average Food Safety Scores, y-axis: Proportion]
We can consider numerical summaries of this distribution:
[Figure: three panels side by side; x-axes: Food Safety Scores, Food Safety Scores, Average Food Safety Scores; y-axes: Proportion]
Observe that:
2. 𝜎 and the SE of the sample averages are related in the
following way²:

$$SE(\bar{x}) \approx \frac{\sigma}{\sqrt{n}}$$

$$\mu \approx \bar{x}, \qquad \sigma \approx s$$
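The relationship between 𝜎 and the SE can be checked by simulation: draw many SRSs of size 𝑛 from a hypothetical population, record each sample average, and compare the SD of those averages with 𝜎/√𝑛. The population, the 2000 repetitions, and the seed are illustrative assumptions.

```python
import math
import random
import statistics

rng = random.Random(1)

# Hypothetical population; N is large relative to n, so the two SEs should be close.
population = [rng.randint(60, 100) for _ in range(5000)]
sigma = statistics.pstdev(population)
n = 100

# Approximate the sampling distribution of x_bar with many SRSs.
averages = [statistics.mean(rng.sample(population, k=n)) for _ in range(2000)]

se_simulated = statistics.pstdev(averages)  # SD of the simulated sample averages
se_formula = sigma / math.sqrt(n)           # sigma / sqrt(n)

print(f"simulated SE = {se_simulated:.3f}, sigma/sqrt(n) = {se_formula:.3f}")
```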
² This approximation becomes an equality for a random sample drawn
with replacement. When we have an SRS, the exact formula is

$$SE(\bar{x}) = \sqrt{\frac{N-n}{N-1}}\,\frac{\sigma}{\sqrt{n}}.$$

The additional term, called the finite population correction factor,
adjusts for the fact that we are drawing without replacement. Here 𝑁
is the number of tickets in the box (the size of the population) and 𝑛
is the number of tickets drawn from the box (the size of the sample).
To help make sense of this correction factor, think about the following
two cases:
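Two extremes show how the correction factor behaves. This sketch (the function names `fpc` and `se_mean_srs` are hypothetical, and 𝑁 = 5766 is borrowed from the restaurant count later in these notes purely for illustration) evaluates it for a single draw, where it equals 1 and sampling with or without replacement makes no difference, and for a draw of the whole population, where it equals 0 because the sample is the population and 𝑥̄ = 𝜇 exactly.

```python
import math

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def se_mean_srs(sigma, N, n):
    """SE of the sample average for an SRS of size n from N tickets."""
    return fpc(N, n) * sigma / math.sqrt(n)

N = 5766  # illustrative population size
print(fpc(N, 1))    # one draw: factor is 1, same as with replacement
print(fpc(N, N))    # whole population drawn: factor is 0, so SE is 0
print(fpc(N, 100))  # n much smaller than N: factor is close to 1
```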
Inference for a Population Average
Standard Error
$$SE(\bar{x}) \approx \frac{s}{\sqrt{n}}$$
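In practice 𝜎 is unknown, so the estimate plugs in the sample SD 𝑠. A minimal sketch, using a short hypothetical list of sampled scores in place of a real sample:

```python
import math
import statistics

# Hypothetical sample of inspection scores (placeholder values).
sample = [94, 88, 100, 82, 90, 96, 71, 86, 92, 98]

n = len(sample)
s = statistics.stdev(sample)   # sample SD, n - 1 divisor
se_hat = s / math.sqrt(n)      # estimated standard error of x_bar

print(f"x_bar = {statistics.mean(sample):.2f}, s = {s:.2f}, SE ~ {se_hat:.2f}")
```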
Confidence Intervals
³ This is not always the case. We’ll come back to this point later.
be 95% confident that the population parameter is within the
following interval.
For the particular interval that you have created, you don’t
know whether it contains the population mean. This is why we
describe it with the term confidence instead of probability.
Probability comes into play when taking the sample; after that,
our confidence interval is a known, observed value with nothing
left to chance.
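A sketch of a normal-approximation 95% interval for the mean, 𝑥̄ ± 1.96·𝑠/√𝑛. The 1.96 multiplier is an assumption here, chosen to match the proportion interval later in these notes; the function name `mean_ci_95` and the sample values are hypothetical.

```python
import math
import statistics

def mean_ci_95(sample):
    """Approximate 95% CI for the population mean: x_bar +/- 1.96 * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    se_hat = statistics.stdev(sample) / math.sqrt(n)
    return x_bar - 1.96 * se_hat, x_bar + 1.96 * se_hat

# Hypothetical sample; once computed, the interval is a fixed pair of numbers.
sample = [94, 88, 100, 82, 90, 96, 71, 86, 92, 98]
low, high = mean_ci_95(sample)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```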
[Figure: y-axis: Iteration; x-axis: x]
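One way to see what “95% confident” refers to is to repeat the sampling many times and count how often the interval covers 𝜇. This sketch assumes a hypothetical population, 100 iterations, and samples of size 100; roughly 95 of the intervals should contain 𝜇, but the exact count depends on the seed.

```python
import math
import random
import statistics

rng = random.Random(7)

# Hypothetical population; mu is known here only because we built it.
population = [rng.randint(60, 100) for _ in range(5000)]
mu = statistics.mean(population)

def mean_ci_95(sample):
    """x_bar +/- 1.96 * s / sqrt(n)."""
    x_bar = statistics.mean(sample)
    se_hat = statistics.stdev(sample) / math.sqrt(len(sample))
    return x_bar - 1.96 * se_hat, x_bar + 1.96 * se_hat

covered = 0
for _ in range(100):  # one SRS and one interval per iteration
    low, high = mean_ci_95(rng.sample(population, k=100))
    if low <= mu <= high:
        covered += 1

print(f"{covered} of 100 intervals contain mu")
```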
Suppose we only want to eat at restaurants with food safety
scores above 95. Let’s make a confidence interval for the proportion
of restaurants in San Francisco with scores that are
“excellent” (scores over 95). To tackle this problem, we can
modify our population. Since we only need to keep track of
whether a score is excellent, we can replace the scores on the
tickets with 0s and 1s, where 1 indicates an excellent score. Of
the 5766 restaurants in San Francisco, 1240 are excellent. We
can think of our population as a box with 5766 tickets in it:
1240 are marked 1 and 4526 are marked 0. This time, let’s
take an SRS of 25.
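The box model translates directly into code. This sketch uses only the counts given above (1240 ones among 5766 tickets) and shows that the mean of a 0/1 box is the proportion 𝑝 while its SD is √(𝑝(1 − 𝑝)); the seed for the SRS of 25 is arbitrary.

```python
import math
import random
import statistics

# Box of 5766 tickets: 1 = excellent score (over 95), 0 = otherwise.
box = [1] * 1240 + [0] * 4526

p = statistics.mean(box)        # mean of a 0/1 box is the proportion of 1s
sigma = statistics.pstdev(box)  # equals sqrt(p * (1 - p)) for a 0/1 box

rng = random.Random(3)
sample = rng.sample(box, k=25)  # SRS of 25 tickets
p_hat = sum(sample) / len(sample)

print(f"p = {p:.4f}, sigma = {sigma:.4f}, sqrt(p(1-p)) = {math.sqrt(p * (1 - p)):.4f}")
print(f"p_hat = {p_hat:.2f}")
```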
The thought experiment looks like this:
[Figure: three panels, Population, Empirical Distribution, and Sampling Distribution; x-axes: Excellent score?, Excellent scores?, Proportion excellent; y-axes: Proportion]
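$$SE(\hat{p}) \approx \frac{\sqrt{\hat{p}(1-\hat{p})}}{\sqrt{n}}$$

$$\left[\hat{p} - 1.96\,\frac{\sqrt{\hat{p}(1-\hat{p})}}{\sqrt{n}},\ \ \hat{p} + 1.96\,\frac{\sqrt{\hat{p}(1-\hat{p})}}{\sqrt{n}}\right]$$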
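A direct translation of the interval above, as a sketch; the function name `proportion_ci_95` is hypothetical and the counts (6 excellent tickets out of 25) are made up for illustration.

```python
import math

def proportion_ci_95(ones, n):
    """Approximate 95% CI: p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = ones / n
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - 1.96 * se_hat, p_hat + 1.96 * se_hat

# Hypothetical SRS of 25 tickets in which 6 are marked 1.
low, high = proportion_ci_95(6, 25)
print(f"[{low:.3f}, {high:.3f}]")
```

The bounds are not clipped, so with a small 𝑛 or a 𝑝̂ near 0 or 1 this approximate interval can extend below 0 or above 1.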
Summary
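Here 𝑌 is the number of tickets marked 1 in the sample, so 𝑝̂ = 𝑌/𝑛. When the draws are made with replacement, 𝑌 is binomial with Var(𝑌) = 𝑛𝑝(1 − 𝑝), and

$$
\begin{aligned}
\mathrm{Var}(\hat{p}) &= \mathrm{Var}\!\left(\frac{1}{n}\,Y\right) && (1)\\
&= \frac{1}{n^2}\,\mathrm{Var}(Y) && (2)\\
&= \frac{1}{n^2}\,np(1-p) && (3)\\
&= \frac{p(1-p)}{n} && (4)
\end{aligned}
$$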
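The derivation can be checked numerically. Step (3) uses the binomial variance, which holds for draws with replacement, so this sketch draws with replacement (`random.choices`) from the 0/1 box of the restaurant example and compares the simulated variance of 𝑝̂ with 𝑝(1 − 𝑝)/𝑛. The 20000 repetitions and the seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(11)

box = [1] * 1240 + [0] * 4526  # the 0/1 box from the restaurant example
p = 1240 / 5766
n = 25

# Each repetition: n draws WITH replacement, then the sample proportion p_hat.
p_hats = [sum(rng.choices(box, k=n)) / n for _ in range(20000)]

var_simulated = statistics.pvariance(p_hats)
var_formula = p * (1 - p) / n

print(f"simulated Var(p_hat) = {var_simulated:.5f}, p(1-p)/n = {var_formula:.5f}")
```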
The confidence intervals that we have made are approximate in
the following sense: