Chap005
Introductory Statistical
Methods
Preparatory Module
5-2
Chapter 5
5 LEARNING OBJECTIVES
After studying this chapter you should be able to:
Take random samples from populations
Distinguish between population parameters and sample statistics
Apply the central limit theorem
Derive sampling distributions of sample means and proportions
Explain why sample statistics are good estimators of population
parameters
Judge one estimator as better than another based on desirable
properties of estimators
Apply the concept of degrees of freedom
Identify special sampling methods
Compute sampling distributions and related results
5-5
Make generalizations about the characteristics of a population on the basis of observations of a sample, a part of a population.
5-6
[Figure: two samples drawn from a population of Democrats and Republicans.]
Unbiased sample: an unbiased, representative sample drawn at random from the entire population.
Biased sample: a biased, unrepresentative sample drawn from people who have cars and/or telephones and/or read the Digest.
5-7
• An estimator of a population parameter is a sample statistic used to estimate or predict the population parameter.
• An estimate of a parameter is a particular numerical value of a sample statistic obtained through sampling.
• A point estimate is a single value used as an estimate of a population parameter.
5-8
Estimators
• The sample mean, $\bar{X}$, is the most common estimator of the population mean, $\mu$.
• The sample variance, $s^2$, is the most common estimator of the population variance, $\sigma^2$.
• The sample standard deviation, $s$, is the most common estimator of the population standard deviation, $\sigma$.
• The sample proportion, $\hat{p}$, is the most common estimator of the population proportion, $p$.
5-9
[Figure: sample points (X) plotted along an axis, with the sample mean $\bar{X}$ marked.]
5-11
Population distribution of X (uniform on 1, 2, ..., 8)

X       P(X)     X P(X)    $X-\mu$    $(X-\mu)^2$    $P(X)(X-\mu)^2$
1       0.125    0.125     -3.5       12.25          1.53125
2       0.125    0.250     -2.5        6.25          0.78125
3       0.125    0.375     -1.5        2.25          0.28125
4       0.125    0.500     -0.5        0.25          0.03125
5       0.125    0.625      0.5        0.25          0.03125
6       0.125    0.750      1.5        2.25          0.28125
7       0.125    0.875      2.5        6.25          0.78125
8       0.125    1.000      3.5       12.25          1.53125
Total   1.000    4.500                               5.25000

[Figure: bar chart of P(X) against X = 1, ..., 8.]

$E(X) = \mu = 4.5$
$V(X) = \sigma^2 = 5.25$
$SD(X) = \sigma = 2.2913$
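The three summary values can be checked with a few lines of Python. This is a minimal sketch that assumes the population is uniform on 1 through 8 (each value with probability 0.125, as the table indicates); the variable names are illustrative only.

```python
from math import sqrt

# Assumption: population is uniform on 1..8, each value with probability 1/8
values = range(1, 9)
prob = 1 / 8

# E(X) = sum of x * P(x)
mu = sum(x * prob for x in values)

# V(X) = sum of P(x) * (x - mu)^2
sigma2 = sum(prob * (x - mu) ** 2 for x in values)

# SD(X) = square root of the variance
sigma = sqrt(sigma2)

print(mu, sigma2, round(sigma, 4))  # 4.5 5.25 2.2913
```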
5-14
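Sampling distribution of the mean for samples of size n = 2 from the population uniform on 1, 2, ..., 8

$\bar{X}$    $P(\bar{X})$    $\bar{X}P(\bar{X})$    $\bar{X}-\mu_{\bar{X}}$    $(\bar{X}-\mu_{\bar{X}})^2$    $P(\bar{X})(\bar{X}-\mu_{\bar{X}})^2$
1.0     0.015625    0.015625    -3.5    12.25    0.191406
1.5     0.031250    0.046875    -3.0     9.00    0.281250
2.0     0.046875    0.093750    -2.5     6.25    0.292969
2.5     0.062500    0.156250    -2.0     4.00    0.250000
3.0     0.078125    0.234375    -1.5     2.25    0.175781
3.5     0.093750    0.328125    -1.0     1.00    0.093750
4.0     0.109375    0.437500    -0.5     0.25    0.027344
4.5     0.125000    0.562500     0.0     0.00    0.000000
5.0     0.109375    0.546875     0.5     0.25    0.027344
5.5     0.093750    0.515625     1.0     1.00    0.093750
6.0     0.078125    0.468750     1.5     2.25    0.175781
6.5     0.062500    0.406250     2.0     4.00    0.250000
7.0     0.046875    0.328125     2.5     6.25    0.292969
7.5     0.031250    0.234375     3.0     9.00    0.281250
8.0     0.015625    0.125000     3.5    12.25    0.191406
Total   1.000000    4.500000                     2.625000

[Figure: bar chart of $P(\bar{X})$ against $\bar{X}$ = 1.0, 1.5, ..., 8.0.]

$E(\bar{X}) = \mu_{\bar{X}} = 4.5$
$V(\bar{X}) = \sigma^2_{\bar{X}} = 2.625$
$SD(\bar{X}) = \sigma_{\bar{X}} = 1.6202$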
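The sampling distribution of the mean can be rebuilt by listing every possible sample. The sketch below assumes samples of size 2 drawn with replacement from the values 1 through 8, which is consistent with the totals 4.5 and 2.625; the names `pmf`, `mean_xbar`, and `var_xbar` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

population = range(1, 9)  # assumed uniform population 1..8
n = 2                     # assumed sample size

# Every ordered sample of size n (with replacement) is equally likely
samples = list(product(population, repeat=n))

# Count how many samples give each value of the sample mean
counts = Counter(Fraction(sum(s), n) for s in samples)
pmf = {xbar: Fraction(c, len(samples)) for xbar, c in sorted(counts.items())}

mean_xbar = sum(xbar * p for xbar, p in pmf.items())
var_xbar = sum(p * (xbar - mean_xbar) ** 2 for xbar, p in pmf.items())

for xbar, p in pmf.items():
    print(f"{float(xbar):4.1f}  {float(p):.6f}")
print(float(mean_xbar), float(var_xbar), round(sqrt(float(var_xbar)), 4))
```

Exact fractions are used so that the probabilities match the tabulated values without rounding error.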
5-16
Comparing the population distribution and the sampling distribution of the mean:
• The sampling distribution is more bell-shaped and symmetric.
• Both have the same center.
• The sampling distribution of the mean is more compact, with a smaller variance.
[Figure: two bar charts, the population distribution P(X) for X = 1, ..., 8 and the Sampling Distribution of the Mean for $\bar{X}$ = 1.0, 1.5, ..., 8.0.]
5-17
The expected value of the sample mean is equal to the population mean:
$$E(\bar{X}) = \mu_{\bar{X}} = \mu_X$$
The variance of the sample mean is equal to the population variance divided by the sample size:
$$V(\bar{X}) = \sigma^2_{\bar{X}} = \frac{\sigma^2_X}{n}$$
The standard deviation of the sample mean, known as the standard error of the mean, is equal to the population standard deviation divided by the square root of the sample size:
$$SD(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$$
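A short derivation sketch may help show where the division by n comes from. It assumes the n observations $X_1, \dots, X_n$ are drawn independently from a population with mean $\mu_X$ and variance $\sigma^2_X$; the independence assumption is not stated on the slide and is added here for the argument.

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\, n\,\mu_X = \mu_X \\
V(\bar{X}) &= V\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i)
            = \frac{1}{n^2}\, n\,\sigma^2_X = \frac{\sigma^2_X}{n} \\
SD(\bar{X}) &= \sqrt{V(\bar{X})} = \frac{\sigma_X}{\sqrt{n}}
\end{align*}
```

For the population with $\sigma^2_X = 5.25$ and $n = 2$, this gives $5.25/2 = 2.625$ and $\sqrt{2.625} \approx 1.6202$.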
5-18
This means that, as the sample size increases, the sampling distribution of the sample mean remains centered on the population mean, but becomes more compactly distributed around that population mean.
[Figure: Sampling Distribution of the Sample Mean. Density f(X) for a normal population and for the sampling distributions with n = 2, n = 4, and n = 16.]
5-19
When sampling from a population with mean $\mu$ and finite standard deviation $\sigma$, the sampling distribution of the sample mean will tend to a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ as the sample size becomes large ($n > 30$).
For "large enough" n: $\bar{X} \sim N(\mu, \sigma^2/n)$
[Figure: sampling distributions of the sample mean, P(X) for n = 5 and n = 20, and density f(X) for large n.]
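A small simulation can illustrate the behavior described here. The sketch below is not from the slides: it assumes a skewed Exponential(1) population (mean 1, standard deviation 1), and the sample sizes, seed, and repetition count are arbitrary choices. It compares the simulated mean and standard deviation of the sample means with $\mu$ and $\sigma/\sqrt{n}$.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(n, reps=5000):
    """Means of `reps` samples of size n from an Exponential(1) population."""
    return [fmean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

mu, sigma = 1.0, 1.0  # mean and sd of the assumed Exponential(1) population

results = {}
for n in (5, 20, 50):
    means = sample_means(n)
    # store (simulated mean of the means, simulated sd of the means)
    results[n] = (fmean(means), pstdev(means))
    print(n, round(results[n][0], 3), round(results[n][1], 3),
          round(sigma / sqrt(n), 3))
```

The last two columns printed for each n should be close to each other, and plotting a histogram of `means` would show the shape becoming more symmetric as n grows.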
5-20
[Figure: population distributions and the corresponding sampling distributions of $\bar{X}$ for n = 2 and n = 30.]
5-21
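With $\mu = 220$, $\sigma = 15$, and $n = 100$:
$$P(\bar{X} < 217) = P\left(Z < \frac{217 - 220}{15/\sqrt{100}}\right) = P\left(Z < \frac{217 - 220}{15/10}\right) = P(Z < -2) = 0.0228$$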
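The same probability can be computed directly. This sketch uses the standard library's `statistics.NormalDist` and reads the values 220, 15, and 100 from the formula as the population mean, population standard deviation, and sample size; that reading is an assumption based on the arithmetic shown.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 220, 15, 100   # values read from the worked formula
se = sigma / sqrt(n)          # standard error of the mean: 15 / 10 = 1.5
z = (217 - mu) / se           # standardized value
prob = NormalDist().cdf(z)    # P(Z < z) under the standard normal

print(z, round(prob, 4))  # -2.0 0.0228
```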
5-22
Example 5-2
[Figure: histogram for Example 5-2. Horizontal axis "Range" with classes 4.00 - 4.49, 4.50 - 4.99, 5.00 - 5.49, 5.50 - 5.99, 6.00 - 6.49, 6.50 - 6.99, 7.00 - 7.49, 7.50 - 7.99; vertical axis from 0 to 15.]
5-23
Student’s t Distribution
If the population standard deviation, $\sigma$, is unknown, replace $\sigma$ with the sample standard deviation, $s$. If the population is normal, the resulting statistic:
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
has a t distribution with (n - 1) degrees of freedom.
• The t is a family of bell-shaped and symmetric distributions, one for each number of degrees of freedom.
• The expected value of t is 0.
• The variance of t is greater than 1, but approaches 1 as the number of degrees of freedom increases. The t is flatter and has fatter tails than does the standard normal.
• The t distribution approaches a standard normal as the number of degrees of freedom increases.
[Figure: density curves for the standard normal, t with df = 20, and t with df = 10.]
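The claim that the variance of t exceeds 1 can be illustrated by simulation. The sketch below assumes a standard normal population and samples of size 11 (so df = 10); the seed and repetition count are arbitrary. The comparison value df/(df - 2) is the standard formula for the variance of a t distribution with more than 2 degrees of freedom; it does not appear on the slide.

```python
import random
from math import sqrt

random.seed(1)  # fixed seed so the run is repeatable

def t_statistic(n, mu=0.0, sigma=1.0):
    """t = (xbar - mu) / (s / sqrt(n)) for one normal sample of size n."""
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s = sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return (xbar - mu) / (s / sqrt(n))

n = 11       # assumed sample size, giving df = n - 1 = 10
df = n - 1
ts = [t_statistic(n) for _ in range(20000)]

mean_t = sum(ts) / len(ts)
var_t = sum((t - mean_t) ** 2 for t in ts) / (len(ts) - 1)

# simulated mean, simulated variance, theoretical variance df / (df - 2)
print(round(mean_t, 3), round(var_t, 3), df / (df - 2))
```

Rerunning with a larger n should bring the simulated variance closer to 1, in line with the approach to the standard normal.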
5-24
The sample proportion is the percentage of successes in n binomial trials. It is the number of successes, X, divided by the number of trials, n.
Sample proportion: $\hat{p} = \frac{X}{n}$
As the sample size, n, increases, the sampling distribution of $\hat{p}$ approaches a normal distribution with mean $p$ and standard deviation $\sqrt{\frac{p(1-p)}{n}}$.
[Figure: three bar charts of P(X): one with X = 0, 1, 2; one for n = 10, p = 0.3; and one for n = 15, p = 0.3, whose axis is also labeled in units of $\hat{p}$ = X/15.]
5-25
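$n = 100$, $p = 0.25$
$np = (100)(0.25) = 25 = E(X)$, so $E(\hat{p}) = p = 0.25$
$V(\hat{p}) = \frac{p(1-p)}{n} = \frac{(.25)(.75)}{100} = 0.001875$
$SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{0.001875} = 0.04330127$
$$P(\hat{p} > 0.20) = P\left(\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} > \frac{.20 - p}{\sqrt{p(1-p)/n}}\right) = P\left(z > \frac{.20 - .25}{\sqrt{(.25)(.75)/100}}\right) = P\left(z > \frac{-.05}{.0433}\right) = P(z > -1.15) = 0.8749$$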
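The calculation can be reproduced with the standard library. This is a sketch under the normal approximation used on the slide; `prob_table_z` rounds z to two decimals, as a printed z table would, while `prob_exact_z` keeps the unrounded z, so the two differ slightly in the third decimal.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.25
se = sqrt(p * (1 - p) / n)   # standard deviation of the sample proportion
z = (0.20 - p) / se          # standardized value of 0.20

# P(p_hat > 0.20) under the normal approximation
prob_exact_z = 1 - NormalDist().cdf(z)            # unrounded z
prob_table_z = 1 - NormalDist().cdf(round(z, 2))  # z rounded to -1.15

print(round(se, 8), round(z, 4), round(prob_table_z, 4))
```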
5-26
An estimator of a population parameter is a sample statistic used to estimate the parameter. The most commonly used estimator of the:

Population Parameter                       Sample Statistic
Mean ($\mu$)                     is the    Mean ($\bar{X}$)
Variance ($\sigma^2$)            is the    Variance ($s^2$)
Standard Deviation ($\sigma$)    is the    Standard Deviation ($s$)
Proportion ($p$)                 is the    Proportion ($\hat{p}$)

• Desirable properties of estimators include:
Unbiasedness
Efficiency
Consistency
Sufficiency
5-27
Unbiasedness
An estimator is said to be unbiased if its expected value is equal to the population parameter it estimates.
For example, $E(\bar{X}) = \mu$, so the sample mean is an unbiased estimator of the population mean. Unbiasedness is an average or long-run property. The mean of any single sample will probably not equal the population mean, but the average of the means of repeated independent samples from a population will equal the population mean.
Any systematic deviation of the estimator from the population parameter of interest is called a bias.
5-28
[Figure: bias, shown as the gap between the center of an estimator's distribution and the parameter.]
Efficiency
An estimator is efficient if it has a relatively small variance (and standard deviation).
Consistency
An estimator is said to be consistent if its probability of being close to the parameter it estimates increases as the sample size increases.
[Figure: sampling distributions of an estimator for n = 10 and n = 100.]
Sufficiency
An estimator is said to be sufficient if it contains all the information in the data about the parameter it estimates.
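Consistency can be illustrated with a rough simulation. The sketch below is an assumption-laden example, not part of the slides: it draws samples from a population uniform on 1 through 8 (mean 4.5) and estimates how often the sample mean lands within 0.5 of the population mean for n = 10 and n = 100. The tolerance, seed, and repetition count are arbitrary.

```python
import random

random.seed(2)  # fixed seed so the run is repeatable

population = range(1, 9)  # assumed uniform population 1..8
mu = 4.5                  # its mean

def share_close(n, tol=0.5, reps=2000):
    """Fraction of simulated sample means within `tol` of mu."""
    close = 0
    for _ in range(reps):
        xbar = sum(random.choice(population) for _ in range(n)) / n
        if abs(xbar - mu) <= tol:
            close += 1
    return close / reps

share_10 = share_close(10)
share_100 = share_close(100)
print(share_10, share_100)
```

The share for n = 100 should be clearly larger than the share for n = 10, which is the pattern the definition describes.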
5-31
The sample variance (the sum of the squared deviations from the sample mean divided by (n - 1)) is an unbiased estimator of the population variance. In contrast, the average squared deviation from the sample mean is a biased (though consistent) estimator of the population variance.
$$E(s^2) = E\left[\frac{\sum (x - \bar{x})^2}{n - 1}\right] = \sigma^2$$
$$E\left[\frac{\sum (x - \bar{x})^2}{n}\right] < \sigma^2$$
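The two expectations can be checked exactly on a small case by averaging over every possible sample. The sketch below assumes samples of size 3 drawn with replacement from a population uniform on 1 through 8 (population variance 5.25); the sample size and helper name `sum_sq_dev` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

population = range(1, 9)  # assumed uniform population 1..8, variance 5.25
n = 3                     # assumed sample size

# All 8**3 ordered samples with replacement are equally likely
samples = list(product(population, repeat=n))

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean (exact)."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

# Average each estimator over every possible sample
mean_unbiased = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
mean_biased = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(float(mean_unbiased), float(mean_biased))  # 5.25 3.5
```

Dividing by n - 1 recovers the population variance exactly, while dividing by n gives (n - 1)/n times it, which is smaller.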
5-33
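If only two data points and the sample mean are known:
$$12 + 14 + x_3 + x_4 = 56$$
so $x_3 + x_4 = 30$, and the two remaining data points are not individually determined.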
5-35
The number of degrees of freedom is equal to the total number of measurements (these are not always raw data points), less the total number of restrictions on the measurements. A restriction is a quantity computed from the measurements.
The sample mean is a restriction on the sample measurements, so after calculating the sample mean there are only (n - 1) degrees of freedom remaining with which to calculate the sample variance.
The sample variance is based on only (n - 1) free data points:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
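One way to see the restriction at work: deviations from the sample mean always sum to zero, so the last deviation is fixed once the other n - 1 are known. The sketch below uses hypothetical data (four values chosen to sum to 56); the values and variable names are illustrative only.

```python
# Hypothetical sample of size 4 (sums to 56, so the mean is 14)
data = [12, 14, 17, 13]
n = len(data)

xbar = sum(data) / n
deviations = [x - xbar for x in data]

# The first n - 1 deviations determine the last one,
# because all n deviations must sum to zero
implied_last = -sum(deviations[:-1])

# Sample variance: squared deviations divided by the n - 1 free data points
s2 = sum(d ** 2 for d in deviations) / (n - 1)

print(deviations, implied_last, round(s2, 4))
```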
5-36
Example 5-4
A sample of size 10 is given below. We are to choose three different numbers from which the deviations are to be taken. The first number is to be used for the first five sample points; the second number is to be used for the next three sample points; and the third number is to be used for the last two sample points.

Sample #        1    2    3    4    5    6    7    8    9    10
Sample Point    93   97   60   72   96   83   59   66   88   53

ii. Calculate the SSD with chosen numbers.
Solution: SSD = 2030.367. See table on next slide for calculations.
iii. What is the df for the calculated SSD?
Solution: df = 10 - 3 = 7.
iv. Calculate an unbiased estimate of the population variance.
Solution: An unbiased estimate of the population variance is SSD/df = 2030.367/7 = 290.05.
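The figures in this example can be reproduced as follows. The sketch assumes that the three chosen numbers are the means of the three groups of sample points (the choice that minimizes the SSD); this assumption is not stated in the surviving text, but it does reproduce the stated SSD of 2030.367.

```python
sample = [93, 97, 60, 72, 96, 83, 59, 66, 88, 53]

# First five points, next three points, last two points
groups = [sample[:5], sample[5:8], sample[8:]]

# Assumption: each chosen number is the mean of its group
centers = [sum(g) / len(g) for g in groups]

# Sum of squared deviations of each point from its group's chosen number
ssd = sum((x - c) ** 2 for g, c in zip(groups, centers) for x in g)

# One restriction per chosen number: df = 10 - 3 = 7
df = len(sample) - len(groups)

variance_estimate = ssd / df

print([round(c, 3) for c in centers], round(ssd, 3), df,
      round(variance_estimate, 2))
# [83.6, 69.333, 70.5] 2030.367 7 290.05
```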
5-38