STAT 206 - Chapter 7 (Sampling Distributions)

The document discusses sampling distributions and how they relate to calculating probabilities of sample statistics. It provides examples of how sampling distributions are developed and their key properties when the underlying population is normal. Formulas are given for computing z-scores and standard errors of sample means.

STAT 206:

Chapter 7
Sampling Distributions

Ideas in Chapter 7
• The concept of the sampling distribution
• To compute probabilities related to the sample mean and
the sample proportion
• The importance of the Central Limit Theorem
• Remember a previous question?
Question:
• The quality control manager for a manufacturing
company wishes to choose a sample of 10 ball bearings
from a lot of 100 ball bearings to check the average radius
of the ball bearings. How many different ways could the
sample be chosen?

A. \(k^n = 10^{100}\)

B. \({}_nP_x = \dfrac{n!}{(n-x)!} = \dfrac{100!}{(100-10)!}\)

C. \({}_nC_x = \dfrac{n!}{x!\,(n-x)!} = \dfrac{100!}{10!\,(100-10)!} = 17{,}310{,}309{,}456{,}440\)
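The count in option C can be checked numerically. The sketch below is only an illustration, assuming Python's standard `math` module; the variable names are made up for this example.

```python
import math

lot_size = 100      # ball bearings in the lot
sample_size = 10    # ball bearings drawn

# Order does not matter and there is no replacement, so count combinations.
ways_unordered = math.comb(lot_size, sample_size)

# For contrast: the number of ordered selections (permutations, option B).
ways_ordered = math.perm(lot_size, sample_size)

print(f"Combinations: {ways_unordered:,}")
print(f"Permutations: {ways_ordered:,}")
```

Each unordered sample corresponds to 10! ordered selections, which is why the permutation count is so much larger.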
7.1 Sampling Distributions
• SAMPLING DISTRIBUTION is a distribution of all of the possible
values of a sample statistic for a given sample size selected from a
population
• EXAMPLE: Cereal plant Operations Manager (OM) monitors the
amount of cereal in each box. Main plant fills thousands of boxes
of cereal during each shift. Speed of process produces variability.
Average weight must be 368 grams of cereal. To maintain quality
control, does OM take just ONE sample?
• No… Takes many samples!
• How many different sample means would the OM have?
• How are the sample mean values distributed?
• EXAMPLE: Suppose you sample 50 students from USC regarding
their mean GPA. If you obtained many different samples of size 50,
you will compute a different mean for each sample.
• We are interested in the distribution of all potential means for a
particular sample size (n is the same for each sample)
Developing a Sampling Distribution
• Assume there is a population …
• Population size N=4 (individuals A, B, C, D)
• Random variable, X,
is age of individuals
• Values of X:
18, 20, 22, 24 (years)

\(\mu = \dfrac{\sum X_i}{N} = \dfrac{18 + 20 + 22 + 24}{4} = 21\)

\(\sigma = \sqrt{\dfrac{\sum (X_i - \mu)^2}{N}} = 2.236\)

[Figure: population distribution P(x) for x = 18, 20, 22, 24 (individuals A, B, C, D)]
Pearson slide (Chapter 7, slides 4-5)
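As a quick check of the two population summaries, here is a minimal sketch using Python's standard `statistics` module (the variable names are illustrative). `pstdev` divides by N, which matches the population formula on the slide.

```python
from statistics import mean, pstdev

ages = [18, 20, 22, 24]   # the whole population, N = 4

mu = mean(ages)           # population mean
sigma = pstdev(ages)      # population standard deviation (divides by N)

print(f"mu = {mu}, sigma = {sigma:.3f}")
```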
Developing a Sampling Distribution
Now we want to consider all possible samples of size n=2
How many outcomes of size 2 are there if we are sampling with
replacement?

A. 2*4=8 outcomes
B. outcomes
C. \(k^n = 4^2 = 16\) outcomes
Developing a Sampling Distribution
Now consider all possible samples of size n=2

Counting Rule 1: \(k^n = 4^2 = 16\) outcomes

16 possible samples (sampling with replacement)

1st Obs   2nd Observation
          18      20      22      24
18        18,18   18,20   18,22   18,24
20        20,18   20,20   20,22   20,24
22        22,18   22,20   22,22   22,24
24        24,18   24,20   24,22   24,24

16 Sample Means

1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24

[Figure: Sample Means Distribution, \(P(\bar{X})\) for \(\bar{X}\) = 18, 19, 20, 21, 22, 23, 24 (no longer uniform)]
Pearson slide (Chapter 7, slides 6-7)
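The 16 samples and their means can also be enumerated by machine. The following is a sketch under the same assumptions as the slide (population 18, 20, 22, 24; samples of size 2 drawn with replacement); all names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from statistics import mean, pstdev

ages = [18, 20, 22, 24]

# Every ordered pair (1st observation, 2nd observation), with replacement.
samples = list(product(ages, repeat=2))
sample_means = [sum(s) / len(s) for s in samples]

# Probability of each possible sample mean.
counts = Counter(sample_means)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

# Summary measures of the sampling distribution (divide by 16, not 15).
mu_xbar = mean(sample_means)
sigma_xbar = pstdev(sample_means)

for m, prob in distribution.items():
    print(f"mean {m:4.1f}: probability {prob}")
print(f"mu_xbar = {mu_xbar}, sigma_xbar = {sigma_xbar:.2f}")
```

The probabilities rise toward the middle value 21 and fall off symmetrically, which is the "no longer uniform" shape, and the spread of the sample means is smaller than the population's 2.236.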
Developing A Sampling Distribution
Summary Measures of this Sampling Distribution:

Note: Here we divide by 16 because there are 16 different samples
(i.e. population) of n=2

Population, N = 4: \(\mu = 21\), \(\sigma = 2.236\)
Sample Means Distribution, n = 2: \(\mu_{\bar{X}} = 21\), \(\sigma_{\bar{X}} = 1.58\)

[Figure: population distribution P(x) for x = 18, 20, 22, 24 beside the sample means distribution \(P(\bar{X})\) for \(\bar{X}\) = 18 to 24]

Pearson slide (Chapter 7, slides 8-9)


7.2 Sampling Distributions of the Mean
• SAMPLING DISTRIBUTION is a distribution of all of the possible
values of a sample statistic for a given sample size selected from a
population. We’re going to look at the statistic – SAMPLE MEAN
• Different samples of the SAME SIZE from the SAME POPULATION
will yield different SAMPLE MEANS (that is, there is sampling error
due to chance involved in choosing the different samples)
• Mean of all possible sample means (same sample size=n) is μ,
which is the population mean (that is, \(\mu_{\bar{X}} = \mu\))
• Since we have sampling error for our many calculated means, we
are able to calculate the standard deviation of the sample means,
called STANDARD ERROR OF THE MEAN
• If a population is normal with mean μ and standard deviation σ,
the sampling distribution of \(\bar{X}\) is also normally distributed with
mean \(\mu_{\bar{X}} = \mu\) and
standard deviation (standard error of the mean) \(\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}\)
Pearson slide (Chapter 7, slides 12-13)
Z-value for Sampling Distribution of the Mean
• Z-value for the sampling distribution of \(\bar{X}\):

\(Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}\)

where: \(\bar{X}\) = sample mean
μ = population mean
σ = population standard deviation
n = sample size

\(\mu_{\bar{x}} = \mu\) (i.e. \(\bar{x}\) is unbiased)

[Figure: Normal Population Distribution centered at μ and Normal Sampling Distribution (has the same mean) centered at \(\mu_{\bar{x}}\)]
EXAMPLE:
Cereal plant Operations Manager (OM) monitors the amount of cereal
in each box. Amount of cereal in boxes (X) is approximately normal
with mean = 368 grams and standard deviation = 15.
• What is the probability that a randomly chosen box contains less
than 360 grams of cereal?
What do we know?
X ~ normal
μ = 368
σ = 15
x = 360
[Figure: normal curve with 360 and 368 marked on the x axis]
What is the value of Z for x = 360 grams of cereal?
A. z = -1.00
B. \(z = \dfrac{X - \mu}{\sigma} = \dfrac{360 - 368}{15} = -0.533\)
C. \(z = \dfrac{\mu - X}{\sigma} = \dfrac{368 - 360}{15} = 0.533\)
P(Z < –0.53) = 0.2981
EXAMPLE:
Cereal plant Operations Manager (OM) monitors the amount of cereal
in each box. Amount of cereal in boxes (X) is approximately normal
with mean = 368 grams and standard deviation = 15.
• What is the probability that a random sample of size 25 has a mean
of less than 360 grams of cereal?
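What do we know?
X ~ normal
Sample with n=25

\(\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{15}{\sqrt{25}} = \dfrac{15}{5} = 3\)

\(Z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \dfrac{360 - 368}{3} = -2.67\)

P(Z < –2.67) = 0.0038

[Figure: sampling distribution of \(\bar{X}\) with 360 and 368 marked on the x axis]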
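The two cereal questions differ only in which standard deviation is used: σ for a single box, σ/√n for the mean of 25 boxes. A minimal sketch of both calculations, assuming Python's standard `statistics.NormalDist` in place of the printed table (names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 368, 15   # grams, from the example
x = 360

# One randomly chosen box: use the population standard deviation.
p_single_box = NormalDist(mu, sigma).cdf(x)

# Mean of a random sample of n = 25 boxes: use the standard error.
n = 25
standard_error = sigma / sqrt(n)
p_sample_mean = NormalDist(mu, standard_error).cdf(x)

print(f"P(X < 360)    = {p_single_box:.4f}")
print(f"P(Xbar < 360) = {p_sample_mean:.4f}")
```

The single-box result differs slightly from the table value because the table lookup rounds z to two decimals (–0.53) while the code uses the unrounded z.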
Question:
Cereal plant Operations Manager (OM) monitors the amount of
cereal in each box. Amount of cereal in boxes (X) is approximately
normal with mean = 368 grams and standard deviation = 15. The OM
is interested in determining whether a random sample of size 25 has
a mean of less than 360 grams of cereal.

What is the standard error of the mean?

A.

B.

C. \(\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{15}{\sqrt{25}} = \dfrac{15}{5} = 3\)

D. \(\sigma_{\bar{X}} = \sigma = 15\)
Review:
• SAMPLING DISTRIBUTION is a distribution of all of the possible
values of a sample statistic for a given sample size selected from a
population
• If a population is normal with mean μ and standard deviation σ,
the sampling distribution of \(\bar{X}\) (statistic) is also normally distributed
with mean \(\mu_{\bar{X}} = \mu\) and
standard deviation (standard error of the mean) \(\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}\)
• If the underlying population is NORMAL, this is true for ANY size sample
• Z-value for the sampling distribution of \(\bar{X}\): \(Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}\)
• As sample size, n, increases, standard error decreases
• REMEMBER: When calculating (x or \(\bar{X}\)) values, remember that Z is
the number of standard deviations associated with the probability
in the tail(s); the probability is found INSIDE the normal
probability table, and the z-score is read from the values on
the outside of the table
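Sampling Distribution Properties

As n increases, \(\sigma_{\bar{x}}\) decreases

[Figure: two sampling distributions centered at μ; the larger sample size gives the narrower curve, the smaller sample size the wider curve]
Pearson slide (Chapter 7, slide 14)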
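To see how quickly the standard error shrinks, the sketch below evaluates σ/√n for a few sample sizes. It reuses σ = 15 from the cereal example; the list of sample sizes is an arbitrary choice for illustration.

```python
from math import sqrt

sigma = 15                            # population standard deviation (cereal example)
sample_sizes = [1, 4, 25, 100, 400]   # illustrative values only

standard_errors = {n: sigma / sqrt(n) for n in sample_sizes}

for n, se in standard_errors.items():
    print(f"n = {n:>3}: standard error = {se:.2f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.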
Interval Including A Fixed Proportion of the
Sample Means?
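• For the distribution with
\(\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{15}{\sqrt{25}} = \dfrac{15}{5} = 3\)
find the interval (symmetrically) around µ that will include 95% of the sample
means.
• From the standardized normal table, the Z score with 2.5% (0.0250) below it
is -1.96 and the Z score with 2.5% (0.0250) above it is 1.96

\(\bar{X}_L = \mu + Z\dfrac{\sigma}{\sqrt{n}} = 368 + (-1.96)\dfrac{15}{\sqrt{25}} = 362.12\)

\(\bar{X}_U = \mu + Z\dfrac{\sigma}{\sqrt{n}} = 368 + (1.96)\dfrac{15}{\sqrt{25}} = 373.88\)

[Figure: sampling distribution of \(\bar{X}\) from 359 to 377, centered at 368; the middle area is 0.95 and each tail beyond Z = -1.96 and Z = 1.96 holds 0.05/2 = 0.025]

• 95% of all sample means of sample size 25 are between 362.12 and 373.88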
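The interval can be reproduced without the table by asking for the z-score that leaves 2.5% in the upper tail. This is a sketch assuming Python's `statistics.NormalDist.inv_cdf`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 368, 15, 25
coverage = 0.95

standard_error = sigma / sqrt(n)

# z-score with (1 - coverage) / 2 of the area above it.
tail = (1 - coverage) / 2
z = NormalDist().inv_cdf(1 - tail)

lower = mu - z * standard_error
upper = mu + z * standard_error

print(f"z = {z:.2f}")
print(f"95% of sample means fall between {lower:.2f} and {upper:.2f}")
```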
But what if our population is NOT normal?
• CENTRAL LIMIT THEOREM!
• Even if the population is not normal,
• …sample means from the population will be
approximately normal as long as the sample size is large
enough
• Properties:

\(\mu_{\bar{x}} = \mu\) and \(\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\)
Central Limit Theorem
As the sample size gets large enough (n↑), the sampling distribution of
the sample mean becomes almost normal regardless of shape of
population

[Figure: sampling distribution of \(\bar{x}\)]
Pearson slide (Chapter 7, slide 18)
The Sampling Distribution of a Sample Mean
A remarkable statistical fact, called the central limit
theorem, says that as we take more and more observations at
random from any population, the distribution of the mean of
these observations eventually gets close to a Normal
distribution.
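A small simulation can make the theorem concrete. The sketch below is an illustration only: it assumes a strongly skewed population (exponential with mean 1 and standard deviation 1), a sample size of 36, and 5,000 repetitions, none of which come from the slides. The seed is arbitrary and is set only so the run is repeatable.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(206)  # arbitrary seed so the run is repeatable

population_mean = 1.0   # exponential with rate 1 has mean 1
population_sd = 1.0     # and standard deviation 1
n = 36                  # size of each sample
replications = 5000     # number of samples drawn

# Draw many samples of size n and record each sample mean.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(replications)
]

observed_mean = mean(sample_means)
observed_se = pstdev(sample_means)
theoretical_se = population_sd / sqrt(n)

# A normal curve puts roughly 68% of values within one standard error.
inside = sum(1 for m in sample_means if abs(m - population_mean) <= theoretical_se)
share_within_one_se = inside / replications

print(f"mean of sample means: {observed_mean:.3f} (theory: {population_mean})")
print(f"sd of sample means:   {observed_se:.3f} (theory: {theoretical_se:.3f})")
print(f"share within one SE:  {share_within_one_se:.3f}")
```

Even though individual exponential values are far from bell-shaped, the sample means cluster around the population mean with a spread close to σ/√n.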

Concepts and Controversies slide (Chapter 21, slide 21)
Show overhead from Agresti and Franklin, Statistics: The Art and
Science of Learning from Data, 3rd edition, p. 322

How large is “large enough”?
• For most distributions, n > 30 will give a sampling distribution that
is nearly normal
• For fairly symmetric distributions, n > 15
• For a normal population distribution, the sampling distribution of
the mean is always normally distributed
• EXAMPLE: Suppose a population has mean μ = 8 and standard
deviation σ = 3. Suppose a random sample of size n = 36 is
selected. What is the probability that the sample mean is between
7.8 and 8.2?
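What do we know?
• n=36 > 30 → \(\bar{X}\) ~ normal (approximately)
• \(\mu_{\bar{X}} = \mu = 8\) and \(\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{3}{\sqrt{36}} = 0.5\)
• \(Z = \dfrac{7.8 - 8}{0.5} = -0.4\) and \(Z = \dfrac{8.2 - 8}{0.5} = 0.4\)

[Figure: sampling distribution of \(\bar{X}\) from 6.5 to 9.5, centered at 8.0, with 7.8 and 8.2 corresponding to Z = -0.4 and Z = 0.4]

\(P(7.8 < \bar{X} < 8.2) = P(-0.4 < Z < 0.4)\) = 0.6554 – 0.3446 = 0.3108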
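The same probability can be computed directly as a difference of two cumulative probabilities. A minimal sketch, assuming Python's `statistics.NormalDist` and the Central Limit Theorem approximation used on the slide:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36

# Sampling distribution of the mean (approximately normal since n > 30).
sampling_dist = NormalDist(mu, sigma / sqrt(n))

probability = sampling_dist.cdf(8.2) - sampling_dist.cdf(7.8)

print(f"P(7.8 < Xbar < 8.2) = {probability:.4f}")
```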
Example:
Physicians’ assistants The 2006 AAPA survey of the population of
physicians’ assistants who were working full time reported a mean
annual income of $84,396 and standard deviation of $21,975.
(Source: Data from 2006 AAPA survey [ www.aapa.org])
a. Suppose the AAPA had randomly sampled 100 physicians’
assistants instead of collecting data for all of them. Describe
the mean, standard deviation, and shape of the sampling
distribution of the sample mean.
b. Using this sampling distribution, find the z-score for a sample
mean of $80,000.

a. What do we know?
– n=100
– Salary → quantitative → mean and standard deviation → μ and σ
– The survey data give the population parameters…
The mean of the sampling distribution would be: \(\mu_{\bar{x}} = \mu = 84{,}396\) (dollars)
The standard error would be: \(\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{21{,}975}{\sqrt{100}} = 2{,}197.5\) (dollars)
The sampling distribution would have a bell shape because the
sample size is greater than 30.

b. \(\bar{x}\) = $80,000 and \(z = \dfrac{80{,}000 - 84{,}396}{2{,}197.5} = -2.00\)
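Parts a and b reduce to two lines of arithmetic. The sketch below uses only the survey figures quoted in the example; the variable names are illustrative.

```python
from math import sqrt

mu = 84_396      # population mean income (dollars)
sigma = 21_975   # population standard deviation (dollars)
n = 100          # sample size in part a

# Part a: standard error of the sample mean.
standard_error = sigma / sqrt(n)

# Part b: z-score for a sample mean of $80,000.
sample_mean = 80_000
z = (sample_mean - mu) / standard_error

print(f"standard error = {standard_error:,.2f}")
print(f"z = {z:.2f}")
```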
Question:
The 2006 American Association of Physicians’ Assistants survey of
the population of physicians’ assistants who were working full time
reported a mean annual income of $84,396 and standard deviation
of $21,975. (Source: Data from 2006 AAPA survey [ www.aapa.org]) If
a random sample of size n=64 were taken, what is the probability
that the average salary would be less than $80,000?

A.

B.

C.

D. 275
Review:
• CENTRAL LIMIT THEOREM!
• Even if the underlying population is not normal, sample means
from the population will be approximately normal as long as the
sample size is large enough
• Large enough (for means) is 30
• Properties: \(\mu_{\bar{x}} = \mu\) and \(\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\)
• If the underlying data are normally distributed, the sample size
DOES NOT matter
• Normal probabilities are calculated in the same way using the
Normal Probability Table (E2)
REMINDER: Excel Can Be Used To Find
Normal Probabilities
• Find P(X < 9) where X is
normal with a mean of 7
and standard deviation of 2

[Screenshot: Excel inputs x, µ, σ for the normal probability]
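For readers working outside Excel, the same probability can be obtained in a couple of lines. This is a sketch assuming Python's standard `statistics.NormalDist`, not a reproduction of the spreadsheet on the slide.

```python
from statistics import NormalDist

mu, sigma = 7, 2
x = 9

# Cumulative probability P(X < 9) for X ~ Normal(7, 2).
probability = NormalDist(mu, sigma).cdf(x)

print(f"P(X < 9) = {probability:.4f}")
```

Here x is exactly one standard deviation above the mean, so the result is the familiar area below z = 1.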
Textbook example:
7.38 Physicians’ assistants The 2006 AAPA survey of the population of
physicians’ assistants who were working full time reported a mean
annual income of $84,396 and standard deviation of $21,975.
(Source: Data from 2006 AAPA survey [ www.aapa.org])
c. Using parts a and b, find the probability that the sample
mean would fall within approximately $4000 of the
population mean.
d. What is the probability that the sample average salary is
less than $80,396?

c. ($84,396 - $4,000 , $84,396 + $4,000) = ($80,396 , $88,396)
\(\pm z = \pm\dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \pm\dfrac{4000}{2197.5} = \pm 1.82\)
→ P(-1.82 ≤ z ≤ 1.82)
= P(z ≤ 1.82) – P(z ≤ -1.82)
= 0.9656 – 0.0344
= 0.9312

d. \(P(\bar{x} < 80{,}396)\) = P(z < -1.82) = 0.0344
Equivalently, 1.0000 - 0.9312 = 0.0688 lies in the two tails together, so each tail holds 0.0688/2 = 0.0344.

[Figure: standard normal curve with area 0.9312 between z = -1.82 and z = 1.82 and 0.0344 in each tail]
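Parts c and d can be verified numerically. The sketch below assumes the sample size of 100 from part a and uses Python's `statistics.NormalDist`; small differences from the table values come from rounding z to two decimals.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 84_396, 21_975, 100   # n = 100 as in part a
margin = 4_000

sampling_dist = NormalDist(mu, sigma / sqrt(n))

# Part c: sample mean within $4,000 of the population mean.
p_within = sampling_dist.cdf(mu + margin) - sampling_dist.cdf(mu - margin)

# Part d: sample mean below $80,396 (the lower tail).
p_below = sampling_dist.cdf(mu - margin)

print(f"P(within $4,000) = {p_within:.4f}")
print(f"P(below $80,396) = {p_below:.4f}")
```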


Proportions
• p = the proportion of the population (parameter) having some
characteristic (that is the proportion of “successes”)
• \(\hat{p}\) = sample proportion provides an estimate of the parameter, p
• \(\hat{p}\) is approximately normally distributed when n is large
(assuming sampling without replacement from an “infinite”
population or sampling with replacement from a finite population)
• n is assumed to be “large” if \(np \ge 5\) and \(n(1-p) \ge 5\)
• Mean of the \(\hat{p}\) values: \(\mu_{\hat{p}} = p\) and \(\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}\),
where p is the population proportion
Z-value for Proportions
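• To standardize a value of \(\hat{p}\) calculated from a sample:

\(Z = \dfrac{\hat{p} - p}{\sigma_{\hat{p}}} = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}\)

• EXAMPLE: If the true proportion of voters who support a particular
candidate is p = 0.40, what is the probability that a sample of size
200 yields a sample proportion (\(\hat{p}\)) between 0.40 and 0.45?

What do we know?
• n=200
• Check: np=80 and n(1-p)=120
• \(\sigma_{\hat{p}} = \sqrt{\dfrac{0.40(0.60)}{200}} = 0.03464\)
• Z for 0.40 is 0; Z for 0.45 is \(\dfrac{0.45 - 0.40}{0.03464} = 1.44\)

\(P(0.40 \le \hat{p} \le 0.45) = P(0 \le Z \le 1.44)\) = 0.9251 – 0.5000 = 0.4251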
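The voter example follows the same pattern as the examples for means, with the standard error of a proportion in place of σ/√n. A minimal sketch, assuming Python's `statistics.NormalDist` (names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

p = 0.40   # true population proportion
n = 200    # sample size

# Large-sample check from the slides: both counts should be at least 5.
large_enough = n * p >= 5 and n * (1 - p) >= 5

standard_error = sqrt(p * (1 - p) / n)
z_upper = (0.45 - p) / standard_error

standard_normal = NormalDist()
probability = standard_normal.cdf(z_upper) - standard_normal.cdf(0.0)

print(f"large enough: {large_enough}")
print(f"standard error = {standard_error:.5f}, z = {z_upper:.2f}")
print(f"P(0.40 <= p_hat <= 0.45) = {probability:.4f}")
```

The code keeps the unrounded z, so its answer differs from the table-based 0.4251 in the fourth decimal place.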
Example:
Random variability in baseball A baseball player in the major leagues
who plays regularly will have about 500 at-bats (that is, about 500 times, he
can be the hitter in a game) during the season. Suppose a player has a 0.300
probability of getting a hit in an at-bat. His batting average at the end of the
season is the number of hits divided by the number of at-bats. This batting
average is a sample proportion, so it has a sampling distribution describing
where it is likely to fall.
a. Describe the shape, mean and standard deviation of the sampling
distribution of the player’s batting average after a season of 500 at-bats.

• What do we know?
• Hit/no hit (2 outcomes), p = 0.300, (1-p) = 0.700, n = 500
• If we have np ≥ 5 and n(1-p) ≥ 5, the distribution of sample proportions is
bell-shaped
• 500(0.300) = 150 and 500(0.700) = 350 → bell-shaped
• We are estimating the probability of getting a hit (\(\hat{p}\)) for a player
• \(\mu_{\hat{p}}\) = p = 0.300
• Standard error = \(\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{0.300(0.700)}{500}}\) = 0.0205
• The shape is approximately normal with a mean of 0.30 and standard
deviation of 0.0205
Example:
Random variability in baseball A baseball player in the major leagues who plays
regularly will have about 500 at-bats (that is, about 500 times, he can be the
hitter in a game) during the season. Suppose a player has a 0.300 probability of
getting a hit in an at-bat. His batting average at the end of the season is the
number of hits divided by the number of at-bats. This batting average is a sample
proportion, so it has a sampling distribution describing where it is likely to fall.
b. Explain why a batting average of 0.320 or of 0.280 would or would not
be especially unusual for this player’s year-end batting average.

• Since shape is approximately normal with a mean of 0.30 and
standard deviation of 0.0205, and sample size is large enough
• \(z_{0.320} = \dfrac{0.320 - 0.300}{0.0205} = 0.98\) and
\(z_{0.280} = \dfrac{0.280 - 0.300}{0.0205} = -0.98\)

Since both 0.320 and 0.280 are about a standard deviation from
the mean (0.300), they would not be considered unusual for this
player’s year-end batting average.
(That is, you should not conclude that someone with a batting average of 0.320
one year is necessarily a better hitter than a player with a batting average of 0.280.)
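The baseball calculation can be sketched in code as follows. The figures (p = 0.300, n = 500, averages of 0.320 and 0.280) are from the example; the variable names are illustrative.

```python
from math import sqrt

p = 0.300   # probability of a hit in one at-bat
n = 500     # at-bats in a season

# Standard error of the season batting average (a sample proportion).
standard_error = sqrt(p * (1 - p) / n)

# How many standard errors 0.320 and 0.280 lie from the mean of 0.300.
z_high = (0.320 - p) / standard_error
z_low = (0.280 - p) / standard_error

print(f"standard error = {standard_error:.4f}")
print(f"z for 0.320 = {z_high:.2f}, z for 0.280 = {z_low:.2f}")
```

Both averages sit about one standard error from 0.300, which is well within ordinary season-to-season variation for the same underlying hitting ability.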
• 39 STAT 201 students were randomly selected and asked whether they had taken STAT 110. 60%
of STAT 201 students have taken STAT 110.
A. Quantitative → use \(\mu_{\bar{x}}\) and \(\sigma_{\bar{x}}\) to calculate probabilities
B. Categorical → use \(\mu_{\hat{p}}\) and \(\sigma_{\hat{p}}\) to calculate probabilities
• The 2006 AAPA survey of 100 physicians’ assistants who were working full time
reported an average annual income of $84,396 and standard deviation of
$21,975. (Source: Data from 2006 AAPA survey [ www.aapa.org])
A. Quantitative → use \(\mu_{\bar{x}}\) and \(\sigma_{\bar{x}}\) to calculate probabilities
B. Categorical → use \(\mu_{\hat{p}}\) and \(\sigma_{\hat{p}}\) to calculate probabilities
• The following question was asked on a StatCrunch survey administered to STAT
201 students in Spring 2011: “How many hours did you sleep last night?” 90
students responded.
A. Quantitative → use \(\mu_{\bar{x}}\) and \(\sigma_{\bar{x}}\) to calculate probabilities
B. Categorical → use \(\mu_{\hat{p}}\) and \(\sigma_{\hat{p}}\) to calculate probabilities
• According to a Boston Globe story, only about 1 in 6 Americans have blue eyes,
whereas in 1900 about half had blue eyes (Source: Data from The Boston Globe,
October 17, 2006). For a random sample of 100 living Americans, find the mean
and standard deviation.
A. Quantitative → use \(\mu_{\bar{x}}\) and \(\sigma_{\bar{x}}\) to calculate probabilities
B. Categorical → use \(\mu_{\hat{p}}\) and \(\sigma_{\hat{p}}\) to calculate probabilities
Review: CENTRAL LIMIT THEOREM!
• QUANTITATIVE VARIABLES → MEAN
• If the underlying data are normally distributed, the sample size CAN BE ANYTHING
• Even if the underlying population is not normal, sample means from the population
will be approximately normal as long as the sample size is large enough
(Large enough (for means) is 30)
• Properties of distribution of \(\bar{x}\) values: \(\mu_{\bar{x}} = \mu\) and \(\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\)
• Normal probabilities are calculated in the same way using the
Normal Probability Table (E2)
• CATEGORICAL VARIABLES → PROPORTIONS
• p = proportion of the population (parameter) having some characteristic
• \(\hat{p}\) = sample proportion, is an estimate of parameter, p
• Distribution of \(\hat{p}\) values for all possible samples of size n is approximately normal
when n is large (n is assumed to be “large” if \(np \ge 5\) and \(n(1-p) \ge 5\))
• Properties of distribution of \(\hat{p}\) values: \(\mu_{\hat{p}} = p\) and \(\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}\),
where p is the population proportion
• Normal probabilities are calculated in the same way using the
Normal Probability Table (E2)
Central Limit Theorem: Proportions AND Means
RULE: If many samples or repetitions of the SAME SIZE are taken,
the frequency curve made from STATISTICS from the SAMPLES will be
approximately normally distributed

Categorical (2 outcomes)
PROPORTIONS (\(\hat{p}\)’s):
• Assumptions:
1. Population w/fixed proportion
2. Random sample from population
3. np ≥ 5 and n(1-p) ≥ 5 (“large” samples)
• MEAN of sample \(\hat{p}\)’s will be population proportion (p)
• STANDARD DEVIATION of the sample proportions (\(\hat{p}\)’s) will be: \(\sqrt{\dfrac{p(1-p)}{n}}\)

Quantitative (Measurement)
MEANS (\(\bar{x}\)’s):
• Conditions/Assumptions
1. If population bell-shaped (normal), random sample of any size
2. If population not bell-shaped, a large random sample (≥ 30)
• MEAN of sample means (\(\bar{x}\)’s) will be population mean (μ)
• STANDARD DEVIATION of the sample means (\(\bar{x}\)’s) will be: \(\dfrac{\sigma}{\sqrt{n}}\)