STAT 206 - Chapter 7 (Sampling Distributions)
Chapter 7
Sampling Distributions
Ideas in Chapter 7
• The concept of the sampling distribution
• To compute probabilities related to the sample mean and
the sample proportion
• The importance of the Central Limit Theorem
• Remember a previous question?
Question:
• The quality control manager for a manufacturing
company wishes to choose a sample of 10 ball bearings
from a lot of 100 ball bearings to check the average radius
of the ball bearings. How many different ways could the
sample be chosen?
A. $k^n = 10^{100}$
B. $_nP_x = \dfrac{n!}{(n-x)!} = \dfrac{100!}{(100-10)!}$
C. $_nC_x = \dfrac{n!}{x!\,(n-x)!} = \dfrac{100!}{10!\,(100-10)!} = 17{,}310{,}309{,}456{,}440$
7.1 Sampling Distributions
• SAMPLING DISTRIBUTION is a distribution of all of the possible
values of a sample statistic for a given sample size selected from a
population
• EXAMPLE: Cereal plant Operations Manager (OM) monitors the
amount of cereal in each box. Main plant fills thousands of boxes
of cereal during each shift. Speed of process produces variability.
Average weight must be 368 grams of cereal. To maintain quality
control, does OM take just ONE sample?
• No… Takes many samples!
• How many different sample means would the OM have?
• How are the sample mean values distributed?
• EXAMPLE: Suppose you sample 50 students from USC and compute
their mean GPA. If you obtained many different samples of size 50,
you would generally compute a different mean for each sample.
• We are interested in the distribution of all potential means for a
particular sample size (n is the same for each sample)
Developing a Sampling Distribution
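• Assume there is a population …
• Population size N = 4 (individuals A, B, C, D)
• Random variable, X, is age of individuals
• Values of X: 18, 20, 22, 24 (years)
• Population mean: $\mu = \dfrac{\sum X_i}{N} = \dfrac{18 + 20 + 22 + 24}{4} = 21$
• Population standard deviation: $\sigma = \sqrt{\dfrac{\sum (X_i - \mu)^2}{N}} = 2.236$
[Figure: population distribution P(x) for x = 18, 20, 22, 24 (individuals A, B, C, D); each value is equally likely (uniform)]
Pearson slide (Chapter 7, slides 4-5)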
Developing a Sampling Distribution
Now we want to consider all possible samples of size n=2
How many outcomes of size 2 are there if we are sampling with
replacement?
A. 2*4 = 8 outcomes
B. outcomes
C. $k^n = 4^2 = 16$ outcomes
Developing a Sampling Distribution
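Now consider all possible samples of size n=2: there are 16 sample means.
[Figure: sampling distribution of the 16 sample means, with $\bar{X}$ = 18, 19, 20, 21, 22, 23, 24 on the horizontal axis (no longer uniform)]
Pearson slide (Chapter 7, slides 6-7)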
Developing A Sampling Distribution
Summary Measures of this Sampling Distribution:
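$\mu_{\bar{X}} = \dfrac{\sum \bar{X}_i}{16} = \dfrac{18 + 19 + 19 + \cdots + 24}{16} = 21$
$\sigma_{\bar{X}} = \sqrt{\dfrac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{16}} = 1.58$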
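The small population above can be checked by brute force. The following is a minimal sketch in Python (not part of the original slides; the variable names are illustrative) that lists all 16 samples of size 2 drawn with replacement and summarizes the resulting sample means.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

ages = [18, 20, 22, 24]  # population values from the slide (N = 4)

# Every ordered sample of size n = 2 drawn with replacement: 4**2 = 16 samples.
samples = list(product(ages, repeat=2))
sample_means = [mean(s) for s in samples]

# Relative frequency of each possible sample mean (18, 19, ..., 24).
distribution = Counter(sample_means)
for value in sorted(distribution):
    print(value, distribution[value] / len(samples))

mu = mean(ages)                    # population mean
sigma = pstdev(ages)               # population standard deviation
mu_xbar = mean(sample_means)       # mean of the sampling distribution
sigma_xbar = pstdev(sample_means)  # standard error of the mean

print("population:     ", mu, round(sigma, 3))
print("sample means:   ", mu_xbar, round(sigma_xbar, 3))
print("sigma / sqrt(2):", round(sigma / 2 ** 0.5, 3))
```

The mean of the sample means matches the population mean, and their standard deviation matches σ/√n with n = 2, which is smaller than σ.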
EXAMPLE:
Cereal plant Operations Manager (OM) monitors the amount of cereal
in each box. Amount of cereal in boxes (X) is approximately normal
with mean = 368 grams and standard deviation = 15.
• What is the probability that a randomly chosen box contains less
than 360 grams of cereal?
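What do we know?
• X ~ normal
• μ = 368
• σ = 15
• x = 360
[Figure: normal curve centered at 368 with the area below x = 360 marked]
What is the value of Z for x = 360 grams of cereal?
A. z = -1.00
B. $z = \dfrac{X - \mu}{\sigma} = \dfrac{360 - 368}{15} = -0.533$
C. $z = \dfrac{X - \mu}{\sigma} = \dfrac{368 - 360}{15} = 0.533$
P(Z < –0.53) = 0.2981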
EXAMPLE:
Cereal plant Operations Manager (OM) monitors the amount of cereal
in each box. Amount of cereal in boxes (X) is approximately normal
with mean = 368 grams and standard deviation = 15.
• What is the probability that a random sample of size 25 has a mean
of less than 360 grams of cereal?
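What do we know?
• X ~ normal, so $\bar{X}$ ~ normal
• Sample with n = 25
• $\mu_{\bar{X}} = \mu = 368$
• $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{15}{\sqrt{25}} = \dfrac{15}{5} = 3$
• $z = \dfrac{\bar{x} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \dfrac{360 - 368}{3} = -2.67$
P(Z < –2.67) = 0.0038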
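The two cereal questions differ only in the standard deviation used: σ for a single box, σ/√n for the mean of n boxes. The sketch below (an illustration, not part of the slides) computes both probabilities with `statistics.NormalDist` from the Python standard library. The results differ slightly from the table values in the last decimal place because the table rounds z to two decimals first.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 368, 15  # population mean and standard deviation (grams)
n = 25               # sample size for the second question
x = 360              # cutoff of interest (grams)

# One randomly chosen box: X ~ Normal(mu, sigma)
z_single = (x - mu) / sigma
p_single = NormalDist(mu, sigma).cdf(x)

# Mean of a random sample of 25 boxes: X-bar ~ Normal(mu, sigma / sqrt(n))
std_error = sigma / sqrt(n)
z_mean = (x - mu) / std_error
p_mean = NormalDist(mu, std_error).cdf(x)

print(f"single box:  z = {z_single:.2f}, P(X < 360)     = {p_single:.4f}")
print(f"mean of 25:  z = {z_mean:.2f}, P(X-bar < 360) = {p_mean:.4f}")
```

A single box under 360 grams is fairly common, while a sample mean under 360 grams for 25 boxes is rare, because the sample mean varies much less than an individual box.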
Question:
Cereal plant Operations Manager (OM) monitors the amount of
cereal in each box. Amount of cereal in boxes (X) is approximately
normal with mean = 368 grams and standard deviation = 15. The OM
is interested in determining whether a random sample of size 25 has
a mean of less than 360 grams of cereal.
A.
B.
C. $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{15}{\sqrt{25}} = \dfrac{15}{5} = 3$
D. $\sigma_{\bar{X}} = 15$
Review:
• SAMPLING DISTRIBUTION is a distribution of all of the possible
values of a sample statistic for a given sample size selected from a
population
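• If a population is normal with mean μ and standard deviation σ,
the sampling distribution of $\bar{X}$ (statistic) is also normally distributed
with mean $\mu_{\bar{X}} = \mu$ and
standard deviation (standard error of the mean) $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
• If the underlying population is NORMAL, this is true for ANY size sample
• Z-value for the sampling distribution of $\bar{X}$: $Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
[Figure: sampling distribution of $\bar{X}$ centered at μ, with the curve for a smaller sample size labeled]
Pearson slide (Chapter 7, slides 14)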
Interval Including A Fixed Proportion of the
Sample Means?
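• For the distribution with
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{15}{\sqrt{25}} = \dfrac{15}{5} = 3$
find interval (symmetrically) around µ that will include 95% of the sample
means.
• From the standardized normal table, the Z score with 2.5% (0.0250) below it
is -1.96 and the Z score with 2.5% (0.0250) above it is 1.96
$\bar{X}_L = \mu + Z\dfrac{\sigma}{\sqrt{n}} = 368 + (-1.96)\dfrac{15}{\sqrt{25}} = 362.12$
$\bar{X}_U = \mu + Z\dfrac{\sigma}{\sqrt{n}} = 368 + (1.96)\dfrac{15}{\sqrt{25}} = 373.88$
[Figure: normal curve of sample means (axis from 359 to 377, centered at 368) with central area 0.95 between Z = -1.96 and Z = 1.96, and 0.05/2 = 0.025 in each tail]
• 95% of all sample means of sample size 25 are between 362.12 and 373.88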
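The interval above can be reproduced without a printed table. This is a small sketch (illustrative only) that uses `NormalDist().inv_cdf` to look up the Z score and then applies μ ± Z·σ/√n; the variable `coverage` is a name introduced here for the 95% figure.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 368, 15, 25
coverage = 0.95  # proportion of sample means the interval should contain

std_error = sigma / sqrt(n)                # standard error of the mean
tail = (1 - coverage) / 2                  # area left in each tail
z = NormalDist().inv_cdf(1 - tail)         # Z score with `tail` above it

lower = mu - z * std_error
upper = mu + z * std_error

print(f"z = {z:.2f}")
print(f"{coverage:.0%} of sample means fall between {lower:.2f} and {upper:.2f}")
```

Changing `coverage` to 0.90 or 0.99 gives the corresponding narrower or wider interval around μ.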
But what if our population is NOT normal?
• CENTRAL LIMIT THEOREM!
• Even if the population is not normal,
• …sample means from the population will be
approximately normal as long as the sample size is large
enough
• Properties: $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
Central Limit Theorem
As the sample size gets large enough (n↑), the sampling distribution of
the sample mean becomes almost normal regardless of shape of population.
[Figure: sampling distribution of $\bar{x}$ becoming closer to a normal curve as n increases]
Pearson slide (Chapter 7, slides 18)
The Sampling Distribution of a Sample Mean
A remarkable statistical fact, called the central limit
theorem, says that as we take more and more observations at
random from any population, the distribution of the mean of
these observations eventually gets close to a Normal
distribution.
Concepts and Controversies slide (Chapter 21, slide 21)
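The central limit theorem can be seen in a quick simulation. The sketch below is an illustration under stated assumptions, not material from the slides: it assumes a strongly right-skewed population (exponential with mean 1 and standard deviation 1), an arbitrary random seed, and 2,000 repetitions per sample size. Skewness near 0 is used as a rough sign that the distribution of sample means is close to symmetric and bell-shaped.

```python
import random
from math import sqrt

random.seed(206)  # arbitrary seed, only so that reruns give the same numbers


def average(values):
    return sum(values) / len(values)


def std_dev(values):
    m = average(values)
    return sqrt(average([(v - m) ** 2 for v in values]))


def skewness(values):
    m, s = average(values), std_dev(values)
    return average([((v - m) / s) ** 3 for v in values])


def simulate_sample_means(n, repetitions=2000):
    """Means of `repetitions` random samples of size n from an Exponential(1) population."""
    return [average([random.expovariate(1.0) for _ in range(n)])
            for _ in range(repetitions)]


results = {}
for n in (1, 5, 30, 100):
    means = simulate_sample_means(n)
    results[n] = (average(means), std_dev(means), skewness(means))
    print(f"n={n:3d}  mean={results[n][0]:.3f}  sd={results[n][1]:.3f}  "
          f"sigma/sqrt(n)={1 / sqrt(n):.3f}  skewness={results[n][2]:.2f}")
```

As n grows, the mean of the sample means stays near the population mean, their standard deviation tracks σ/√n, and the skewness shrinks toward 0.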
Show overhead from Agresti and Franklin, Statistics: The Art and
Science of Learning from Data, 3rd edition, p. 322
How large is “large enough”?
• For most distributions, n > 30 will give a sampling distribution that
is nearly normal
• For fairly symmetric distributions, n > 15
• For a normal population distribution, the sampling distribution of
the mean is always normally distributed
• EXAMPLE: Suppose a population has mean μ = 8 and standard
deviation σ = 3. Suppose a random sample of size n = 36 is
selected. What is the probability that the sample mean is between
7.8 and 8.2?
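What do we know?
• n = 36 > 30, so $\bar{X}$ ~ approximately normal
• $\mu_{\bar{X}} = \mu = 8$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{3}{\sqrt{36}} = 0.5$
• $P(7.8 < \bar{X} < 8.2) = P\left(\dfrac{7.8 - 8}{0.5} < Z < \dfrac{8.2 - 8}{0.5}\right) = P(-0.4 < Z < 0.4)$
= 0.6554 – 0.3446 = 0.3108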
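The same answer can be checked numerically. This minimal sketch (illustrative, not from the slides) treats the sample mean as normal with mean μ and standard deviation σ/√n, which the central limit theorem justifies here because n = 36 > 30.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36

std_error = sigma / sqrt(n)        # standard error of the mean
xbar = NormalDist(mu, std_error)   # approximate sampling distribution of the mean

prob = xbar.cdf(8.2) - xbar.cdf(7.8)
print(f"standard error = {std_error}")
print(f"P(7.8 < X-bar < 8.2) = {prob:.4f}")
```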
Example:
Physicians’ assistants The 2006 AAPA survey of the population of
physicians’ assistants who were working full time reported a mean
annual income of $84,396 and standard deviation of $21,975.
(Source: Data from 2006 AAPA survey [ www.aapa.org])
a. Suppose the AAPA had randomly sampled 100 physicians’
assistants instead of collecting data for all of them. Describe
the mean, standard deviation, and shape of the sampling
distribution of the sample mean.
a. What do we know?
– n = 100
– Salary is quantitative, so we use the mean and standard deviation
– μ = 84,396 and σ = 21,975
– The data cover the whole population, so these are population parameters
The mean of the sampling distribution would be: $\mu_{\bar{x}} = \mu = 84{,}396$
The standard error would be: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{21{,}975}{\sqrt{100}} = 2{,}197.5$
The sampling distribution would have a bell shape because the
sample size is greater than 30.
A.
B.
C.
D. 275
Review:
• CENTRAL LIMIT THEOREM!
• Even if the underlying population is not normal, sample means
from the population will be approximately normal as long as the
sample size is large enough
• Large enough (for means) is 30
• Properties: $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
• If the underlying data are normally distributed, the sample size
DOES NOT matter
REMINDER: Excel Can Be Used To Find
Normal Probabilities
• Find P(X < 9) where X is
normal with a mean of 7
and standard deviation of 2
[Screenshot: Excel worksheet with inputs x, µ, and σ]
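For readers without Excel at hand, the same probability can be checked in Python. This is only an illustrative alternative to the Excel approach on the slide, using `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

mu, sigma = 7, 2  # mean and standard deviation from the slide
x = 9

z = (x - mu) / sigma                          # (9 - 7) / 2 = 1.0
p_less_than_x = NormalDist(mu, sigma).cdf(x)  # cumulative probability P(X < 9)

print(f"z = {z}, P(X < 9) = {p_less_than_x:.4f}")
```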
Textbook example:
7.38 Physicians’ assistants The 2006 AAPA survey of the population of
physicians’ assistants who were working full time reported a mean
annual income of $84,396 and standard deviation of $21,975.
(Source: Data from 2006 AAPA survey [ www.aapa.org])
c. Interval within 4,000 of the mean: (84,396 - 4,000 , 84,396 + 4,000) = (80,396 , 88,396)
$z = \pm\dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \pm\dfrac{4000}{2197.5} = \pm 1.82$
P(-1.82 ≤ z ≤ 1.82)
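The probability P(-1.82 ≤ z ≤ 1.82) can be evaluated as follows. This sketch assumes that part (c) asks for the chance that the mean of a random sample of 100 salaries falls within $4,000 of the population mean; the wording of part (c) is not shown on the slide, so that reading is an assumption.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 84_396, 21_975, 100
margin = 4_000  # assumed distance from the population mean, in dollars

std_error = sigma / sqrt(n)  # standard error of the mean
z = margin / std_error       # number of standard errors in the margin

standard_normal = NormalDist()
prob_within = standard_normal.cdf(z) - standard_normal.cdf(-z)

print(f"standard error = {std_error}")
print(f"interval = ({mu - margin}, {mu + margin})")
print(f"z = ±{z:.2f}, P(-z <= Z <= z) = {prob_within:.4f}")
```

Under that assumption, roughly 93% of samples of size 100 would give a sample mean between $80,396 and $88,396.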
Z-value for Proportions
• To standardize a value of $\hat{p}$ calculated from a sample: $Z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$
What do we know?
• n=200
Example:
Random variability in baseball A baseball player in the major leagues
who plays regularly will have about 500 at-bats (that is, about 500 times, he
can be the hitter in a game) during the season. Suppose a player has a 0.300
probability of getting a hit in an at-bat. His batting average at the end of the
season is the number of hits divided by the number of at-bats. This batting
average is a sample proportion, so it has a sampling distribution describing
where it is likely to fall.
a. Describe the shape, mean and standard deviation of the sampling
distribution of the player’s batting average after a season of 500 at-bats.
• What do we know?
• Hit/no hit (2 outcomes), p = 0.300, (1-p) = 0.700, n = 500
• If we have np ≥ 5 and n(1-p) ≥ 5, the distribution of sample proportions is
bell-shaped
• 500(0.300) = 150 and 500(0.700) = 350, so the distribution is bell-shaped
• We are estimating the probability of getting a hit ($\hat{p}$) for a player
• $\mu_{\hat{p}}$ = p = 0.300
• Standard error = $\sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{0.300(0.700)}{500}}$ = 0.0205
• The shape is approximately normal with a mean of 0.30 and standard
deviation of 0.0205
Example (continued):
Random variability in baseball Same player as above: about 500 at-bats in a
season and a 0.300 probability of getting a hit in an at-bat.
b. Explain why a batting average of 0.320 or of 0.280 would or would not
be especially unusual for this player’s year-end batting average.
(That is, you should not conclude that someone with a batting average of 0.320
one year is necessarily a better hitter than a player with a batting average of 0.280.)
• Since shape is approximately normal with a mean of 0.30 and
standard deviation of 0.0205, and sample size is large enough:
• $z_{320} = \dfrac{0.320 - 0.300}{0.0205} = 0.98$ and $z_{280} = \dfrac{0.280 - 0.300}{0.0205} = -0.98$
• Since both 0.32 and 0.280 are about a standard deviation from
the mean (0.300), they would not be considered unusual for this
player’s year-end batting average
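Both parts of the baseball example can be sketched in a few lines. The code below is illustrative only; `z_score` is a helper name introduced here. It checks the large-sample condition, computes the standard error of the sample proportion, and standardizes the two batting averages.

```python
from math import sqrt

p, n = 0.300, 500  # probability of a hit and number of at-bats

# Large-sample condition from the slide: np >= 5 and n(1 - p) >= 5
large_enough = n * p >= 5 and n * (1 - p) >= 5

# Standard error of the sample proportion: sqrt(p(1 - p) / n)
std_error = sqrt(p * (1 - p) / n)


def z_score(p_hat):
    """Number of standard errors between a sample proportion and p."""
    return (p_hat - p) / std_error


z_320 = z_score(0.320)
z_280 = z_score(0.280)

print(f"large enough: {large_enough}")
print(f"standard error = {std_error:.4f}")
print(f"z for 0.320 = {z_320:.2f}, z for 0.280 = {z_280:.2f}")
```

Both averages sit about one standard error from 0.300, which is why neither is treated as unusual.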
• 39 STAT 201 students were randomly selected and asked whether they took STAT 110. 60%
of STAT 201 students have taken STAT 110.
A. Quantitative: use the sampling distribution of the sample mean to calculate probabilities
B. Categorical: use the sampling distribution of the sample proportion to calculate probabilities
• The 2006 AAPA survey of 100 physicians’ assistants who were working full time
reported an average annual income of $84,396 and standard deviation of
$21,975. (Source: Data from 2006 AAPA survey [ www.aapa.org])
A. Quantitative: use the sampling distribution of the sample mean to calculate probabilities
B. Categorical: use the sampling distribution of the sample proportion to calculate probabilities
• The following question was asked on a StatCrunch survey administered to STAT
201 students in Spring 2011: “How many hours did you sleep last night?” 90
students responded.
A. Quantitative: use the sampling distribution of the sample mean to calculate probabilities
B. Categorical: use the sampling distribution of the sample proportion to calculate probabilities
• According to a Boston Globe story, only about 1 in 6 Americans have blue eyes,
whereas in 1900 about half had blue eyes (Source: Data from The Boston Globe,
October 17, 2006). For a random sample of 100 living Americans, find the mean
and standard deviation.
A. Quantitative: use the sampling distribution of the sample mean to calculate probabilities
B. Categorical: use the sampling distribution of the sample proportion to calculate probabilities
Review: CENTRAL LIMIT THEOREM!
• QUANTITATIVE VARIABLES: MEAN
• If the underlying data are normally distributed, the sample size CAN BE ANYTHING
• Even if the underlying population is not normal, sample means from the population
will be approximately normal as long as the sample size is large enough
(Large enough (for means) is 30)
• Properties of distribution of $\bar{x}$ values: $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
• Normal probabilities are calculated in the same way using the
Normal Probability Table (E2)
• CATEGORICAL VARIABLES: PROPORTIONS
• p = proportion of the population (parameter) having some characteristic
• $\hat{p} = \dfrac{x}{n}$ = sample proportion; $\hat{p}$ is an estimate of parameter, p
• Distribution of $\hat{p}$ values for all possible samples of size n is approximately normal
when n is large (n is assumed to be “large” if np ≥ 5 and n(1-p) ≥ 5)
• Properties of distribution of $\hat{p}$ values: $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$
where p is the population proportion
• Normal probabilities are calculated in the same way using the
Normal Probability Table (E2)
Central Limit Theorem: Proportions AND Means
RULE: If many samples or repetitions of the SAME SIZE are taken,
the frequency curve made from STATISTICS from the SAMPLES will be
approximately normally distributed
Categorical (2 outcomes)
PROPORTIONS ($\hat{p}$’s):
• Assumptions:
1. Population w/fixed proportion
2. Random sample from population
3. np ≥ 5 and n(1-p) ≥ 5 (“large” samples)
• MEAN of sample $\hat{p}$’s will be population proportion (p)
Quantitative (Measurement)
MEANS ($\bar{x}$’s):
• Conditions/Assumptions
1. If population bell-shaped (normal), random sample of any size
2. If population not bell-shaped, a large random sample (n ≥ 30)
• MEAN of sample means ($\bar{x}$’s) will be population mean (μ)