08 Learning About Mean Difference
8: Learning about a Population Mean Difference
Part 1: Distribution for a Sample Mean of Paired Differences
The Paired Data Scenario
An important special case of a single mean of a population occurs when two quantitative
variables are collected in pairs, and we desire information about the difference between the two
variables. Here are some ways that paired data can occur:
- Each person or unit is measured twice. The two measurements of the same
  characteristic or trait are made under different conditions. An example is measuring
  a quantitative response both before and after treatment.
- Similar individuals or units are paired prior to an experiment. During the experiment,
  each member of a pair receives a different treatment. The same quantitative
  response variable is measured for all individuals.
For paired data designs, it is the differences that we are interested in examining. By focusing on
the differences we again have just one sample of observations (the differences). Sometimes you
may see a d in the subscript of the mean to represent the mean of the population of
differences: $\mu_d$; and the data may be represented generically as $d_1, d_2, \ldots, d_n$.
Sampling Distribution for the Sample Mean of Paired Differences
The sampling distribution results for the mean of paired differences are really the same as those for
a regular sample mean. Since the measurements are differences, the sample mean of the data,
$\bar{x}$, is just written as $\bar{d}$, to emphasize that this is a paired design.
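As a minimal sketch of how paired data reduce to a single sample of differences, the Python snippet below takes hypothetical before/after measurements (the numbers are made up for illustration and are not from these notes), forms one difference per pair, and computes $\bar{d}$ as an ordinary sample mean.

```python
from statistics import mean

# Hypothetical paired measurements (illustrative values only, not from the notes).
before = [150.0, 162.5, 141.0, 175.0, 158.0]
after = [153.0, 165.0, 140.0, 181.5, 162.0]

# One difference per pair: d_i = after_i - before_i.
differences = [a - b for a, b in zip(after, before)]

# The sample mean difference d-bar is just the ordinary mean of the differences.
d_bar = mean(differences)

print(differences)
print(d_bar)
```

Once the differences are formed, everything that follows is a one-sample analysis of that single list.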
Freshmen Weight
A study was conducted to learn about the average weight gain in the first year of college for
students. A sample of 60 students resulted in an average weight gain of 4.2 pounds (over the
first 12 weeks of college).
Population = all first-year college students (and their weight gains)
Parameter = $\mu_d$ = population average weight gain
for all first-year college students (unknown)
Sample = the 60 first-year college students sampled
Statistic = d-bar = $\bar{d}$ = the sample average weight gain for these 60 students
= 4.2 pounds (known for a given selected sample)
Can anyone say how close this observed sample mean difference $\bar{d}$ of 4.2 pounds is to the true
population mean difference $\mu_d$? ___No___ If we were to take another random sample of the
same size, would we get the same value for the sample mean difference? ___Probably NOT___.
So what are the possible values for the sample mean difference $\bar{d}$ if we took many random
samples of the same size from this population? What would the distribution of the possible $\bar{d}$
values look like? What can we say about the distribution of the sample mean difference?
Think: if $\bar{d}$ is really just like an $\bar{x}$, then don't you already know about how sample means vary?
Distribution of the Sample Mean Difference: Main Results
Let $\mu_d$ = mean of the differences in the population of interest.
Let $\sigma_d$ = standard deviation for the differences in the population of interest.
Let $\bar{d}$ = the sample mean of the differences for a random sample of size n.
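(1) If the population of differences is normal (bell-shaped), and a random sample of any size is
obtained, then the distribution of the sample mean difference $\bar{d}$ is also normal, with a mean
of $\mu_d$ and a standard deviation of $\text{s.d.}(\bar{d}) = \sigma_d/\sqrt{n}$.
(2) If the population of differences is not normal (bell-shaped), but a large random sample of size
n is obtained, then the distribution of the sample mean difference $\bar{d}$ is approximately normal,
with a mean of $\mu_d$ and a standard deviation of $\text{s.d.}(\bar{d}) = \sigma_d/\sqrt{n}$.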
(3) In practice, the population standard deviation $\sigma_d$ is rarely known, so the sample standard
deviation $s_d$ is used instead. When making this substitution we call the result a standard error.
Standard error of the sample mean difference is given by:
$$\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}}$$
We can interpret the standard error of the sample mean difference as estimating,
approximately, the average distance of the possible $\bar{d}$ values (for repeated samples of
the same size n) from the population mean difference $\mu_d$.
Moreover, we can use this standard error of the sample mean difference to produce a range of
values that we are very confident will contain the population mean difference $\mu_d$, namely,
$\bar{d} \pm$ (a few) s.e.($\bar{d}$). This is the basis for the confidence interval for the population
mean difference $\mu_d$, discussed in Part 2.
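The idea that $\bar{d}$ varies from sample to sample with a standard deviation of $\sigma_d/\sqrt{n}$ can be checked with a small simulation. The sketch below assumes a hypothetical normal population of differences; the mean of 4.2 and n = 60 echo the weight-gain example, while the population standard deviation of 6.0 is an invented value used only for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(250)  # fixed seed so the sketch is reproducible

# Assumed (hypothetical) normal population of differences.
MU_D = 4.2      # population mean difference
SIGMA_D = 6.0   # population standard deviation of the differences (invented)
N = 60          # number of differences in each sample
REPS = 2000     # number of repeated random samples

# Draw many samples of size N and record the sample mean difference of each.
d_bars = [
    mean([random.gauss(MU_D, SIGMA_D) for _ in range(N)])
    for _ in range(REPS)
]

theoretical_sd = SIGMA_D / sqrt(N)   # s.d.(d-bar) = sigma_d / sqrt(n)
simulated_sd = stdev(d_bars)         # spread of the simulated d-bar values

print(round(mean(d_bars), 2), round(simulated_sd, 3), round(theoretical_sd, 3))
```

The simulated d-bar values should center near the assumed population mean, and their spread should land close to the theoretical value.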
Looking ahead:
We will use the standard error of the sample mean difference to compute a standardized test
statistic for testing hypotheses about the population mean difference $\mu_d$, namely,
$$\frac{\text{Sample statistic} - \text{Null value}}{\text{(Null) standard error}}$$
This is the basis for testing about a population mean difference covered in Part 3.
Looking ahead:
Do you think the standardized test statistic will be a z statistic or a t statistic?
It will be a t statistic, as we are testing a theory about a population mean, the population std
dev is unknown, so the standard error is in the denominator, and the sample size may not
necessarily be large.
What do you think will be the most common null value?
H0: $\mu_d$ = ___0___
Additional Notes
A place to jot down questions you may have and ask during office hours, take a few extra notes, write
out an extra problem or summary completed in lecture, create your own summary about these concepts.
Stat 250 Gunderson Lecture Notes
Part 2: Confidence Interval for a Population Mean of Paired Differences
Confidence Interval for the Population Mean of Paired Differences $\mu_d$
Recall that an important special case of a single mean of a population occurs when two
quantitative variables are collected in pairs, and we desire information about the difference
between the two variables. For paired data designs, it is the differences that we are interested
in analyzing. By focusing on the differences we again have just one sample of observations (the
differences) and are able to use the confidence interval for the population mean difference.
Notation:
Population Parameter: $\mu_d$ = population mean difference (avg of all differences in population)
Sample Estimate: $\bar{d}$ = sample mean difference (avg of all differences in sample)
Standard Error: $\text{s.e.}(\bar{d}) = s_d/\sqrt{n}$
($s_d$ = standard deviation of the sampled differences)
We use the sample estimate and its standard error to form a confidence interval estimate for
the parameter using the following form:
Sample Estimate ± Multiplier × Standard error
The multiplier used will depend on the confidence level, the sample size, and the type of
parameter being estimated. In this case, since we are estimating a single population mean, the
multiplier will be a t* value. Here is the summary for a paired data confidence interval:
One-sample t Confidence Interval for the Population Mean Difference $\mu_d$
$$\bar{d} \pm t^{*} \, \text{s.e.}(\bar{d})$$
where $t^{*}$ is the appropriate value for a $t(n-1)$ distribution.
Note that n is the number of pairs, or the number of differences. This interval requires that the
differences can be considered a random sample from a normal population. If the sample size
is large, the assumption of normality is not so crucial and the result is approximate.
Try It! Changes in Reasoning Scores
Do piano lessons improve spatial-temporal reasoning of preschool children?
Data: The change in reasoning score (after piano lessons minus before piano lessons), with larger
values indicating better reasoning, for a random sample of n = 34 preschool children.
2  3  3  5  4  4  7  9  6  2  4  7
2  5  2  7  2  7  4  9  3  1  6  3
0  0  4  7  3  4  3  6  4  1
(a) Display the data, summarize the distribution.
These data were entered into R to produce the following histogram.
[Figure: histogram of the changes in reasoning score]
Notes:
1. Diff = after - before, so we want to see large (positive) differences.
2. Sample mean difference = 3.62 => descriptively improved.
3. Normality of the response (the difference) for the population?
   Seems reasonable, no outliers, and we have n = 34.
Some summary measures were obtained using R Commander and entered into the table:

mean      sd        IQR  0%  25%  50%  75%  100%  n
3.617647  3.055196  4    -3  2    4    6    9     34

Summary Statistics
Sample size (n)   Mean diff ($\bar{d}$)   Std. Dev ($s_d$)   Std. Error
34                3.62                    3.06               0.52
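As a worked check, the Std. Error entry follows from the standard error formula applied to the reported standard deviation and sample size:

```latex
\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}} = \frac{3.055196}{\sqrt{34}} \approx \frac{3.055196}{5.831} \approx 0.52
```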
(b) Give a 95% confidence interval for the population mean improvement in reasoning scores.
$\bar{d} \pm t^{*} \, \text{s.e.}(\bar{d})$ => 3.62 ± (2.04)(0.52) => 3.62 ± 1.06 => (2.56, 4.68)
using conservative df = 30
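The hand computation in (b) can be reproduced with a few lines of Python. This is only a sketch: `paired_t_interval` is a hypothetical helper name, and the inputs are the rounded summary values and the conservative t* = 2.04 (df = 30) used in these notes.

```python
def paired_t_interval(d_bar, std_error, t_star):
    """Return (lower, upper) for d_bar +/- t_star * std_error."""
    margin = t_star * std_error
    return d_bar - margin, d_bar + margin

# Rounded summary values from the reasoning-score example.
d_bar = 3.62      # sample mean difference
std_error = 0.52  # s.e.(d-bar) = s_d / sqrt(n)
t_star = 2.04     # t* for 95% confidence, conservative df = 30 (value from the notes)

lower, upper = paired_t_interval(d_bar, std_error, t_star)
print(f"({lower:.2f}, {upper:.2f})")
# → (2.56, 4.68)
```

Software that uses the exact df = 33 and unrounded summaries will give slightly different endpoints.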
(c) What value is of particular interest to see whether or not it is in the interval?
We see that the value of 0 is not in the interval of reasonable values for the population mean
difference. A value of 0 would imply no improvement in reasoning scores on average. Not only
does our interval not have 0 in it, but all of the reasonable values are positive (the entire interval
is above 0). Based on our interval, we would estimate that the population mean improvement is
somewhere between 2.56 and 4.68 points.
(d) A student in your class wrote the following interpretation about the 95% confidence level
used in making the interval. Is it a correct interpretation? If not, update it to make it correct.
If this study were repeated many times, we would expect 95% of the resulting
confidence intervals to contain the sample mean improvement in reasoning scores.
This is not quite correct. We know that about 95% of the intervals made with this method would
contain the POPULATION mean improvement (not the sample mean). Each sample mean (for each
repetition) would be in the corresponding interval; it would be the midpoint of the interval.
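The corrected interpretation can be illustrated by simulation. The sketch below assumes a hypothetical normal population of differences (mean 3.6 and standard deviation 3.0 are invented values), builds an interval from each repeated sample of n = 34 using the conservative t* = 2.04 from these notes, and counts how often the interval contains the population mean versus its own sample mean.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(34)  # fixed seed so the sketch is reproducible

# Assumed (hypothetical) normal population of differences.
MU_D = 3.6      # population mean difference, known here only because we simulate
SIGMA_D = 3.0   # population standard deviation of the differences (invented)
N = 34          # number of differences per sample
T_STAR = 2.04   # conservative t* (df = 30) used in the notes for 95% confidence
REPS = 4000     # number of repeated studies

covers_population_mean = 0
covers_own_sample_mean = 0
for _ in range(REPS):
    sample = [random.gauss(MU_D, SIGMA_D) for _ in range(N)]
    d_bar = mean(sample)
    margin = T_STAR * stdev(sample) / sqrt(N)
    lower, upper = d_bar - margin, d_bar + margin
    covers_population_mean += lower <= MU_D <= upper
    covers_own_sample_mean += lower <= d_bar <= upper

coverage = covers_population_mean / REPS
print(coverage)                       # proportion containing the population mean
print(covers_own_sample_mean / REPS)  # every interval contains its own midpoint
```

The first proportion should be close to 0.95, while the second is 1 by construction, which is why a statement about containing the sample mean says nothing about the confidence level.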
R Note:
The differences were already computed and entered as the data. So to make a confidence interval
with R Commander we would need to perform a single-sample t-test on the differences (and leave the
null hypothesis value at the default of 0). Be sure the confidence level is the one you want,
namely .95.

Paired T Results
df   Mean diff ($\bar{d}$)   95% CI Lower   95% CI Upper
33   3.62                    2.55           4.68
If the before and after scores were entered into R, then we would use a paired t-test option.
R would compute the differences for us and provide the confidence interval results. Details of
the R steps for analyzing paired data can be found in your Lab Workbook.
Part 3: Testing about a Population Mean of Paired Differences
Testing Hypotheses about the Population Mean of Paired Differences $\mu_d$
An important special case of a single mean of a population occurs when two quantitative
variables are collected in pairs, and we desire information about the difference between the two
variables. For paired data designs, it is the differences that we are interested in analyzing. By
focusing on the differences we again have just one sample of observations (the differences) and
are able to perform a one-sample t-test on the differences.
The procedure for the significance test is really the same; the sample mean of the data, $\bar{x}$, is
just written as $\bar{d}$, primarily to emphasize that this is a paired design with the differences being
analyzed. The commonly used null hypothesis is that the population mean difference is $\mu_d = 0$.
Possible null and alternative hypotheses:
1. H0: $\mu_d = 0$ versus Ha: $\mu_d \neq 0$
2. H0: $\mu_d = 0$ versus Ha: $\mu_d > 0$
3. H0: $\mu_d = 0$ versus Ha: $\mu_d < 0$
Note: The format of the alternative hypothesis depends on the research question of interest
and the order in which the differences were taken.
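To see why the order of subtraction matters, the short sketch below uses hypothetical paired times (invented values, not from these notes) and takes the differences both ways. Reversing the order flips the sign of every difference, so a research question that corresponds to "less than 0" for one order corresponds to "greater than 0" for the other.

```python
from statistics import mean

# Hypothetical paired times for two conditions (illustrative values only).
right = [98.0, 110.0, 105.0, 92.0]
left = [113.0, 121.0, 104.0, 109.0]

# Differences taken as right - left.
d_right_minus_left = [r - l for r, l in zip(right, left)]
# The same pairs with the order reversed: left - right.
d_left_minus_right = [l - r for r, l in zip(right, left)]

print(mean(d_right_minus_left))  # negative when right tends to be smaller
print(mean(d_left_minus_right))  # same size, opposite sign
```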
Test Statistic and Conditions for the Test
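$$\text{Test statistic} = \frac{\text{Sample statistic} - \text{Null value}}{\text{Standard error}} = \frac{\bar{d} - 0}{\text{s.e.}(\bar{d})} = \frac{\bar{d}}{s_d/\sqrt{n}}$$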
If H0 is true, this test statistic has a ___ $t(n-1)$ ___ distribution.
We use this distribution to report the (bounds for the) p-value.
Conditions for the test: The difference is assumed to be normally distributed for the population
(but if the sample size is large, this condition is less crucial). So you need to examine the
differences graphically and assess if there are any extreme outliers or skewness in the
differences. If so, either the sample size needs to be large or an alternative testing method may
be required.
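The test statistic can be computed directly from a list of differences. The sketch below is one way to do it under the stated formula; `paired_t_statistic` is a hypothetical helper name and the differences are invented for illustration. It does not compute a p-value, which would still be read from the $t(n-1)$ distribution (table or software).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(differences, null_value=0.0):
    """Return (t, df) for a one-sample t-test on paired differences."""
    n = len(differences)
    d_bar = mean(differences)
    std_error = stdev(differences) / sqrt(n)   # s.e.(d-bar) = s_d / sqrt(n)
    t = (d_bar - null_value) / std_error
    return t, n - 1

# Hypothetical differences (illustrative values only, not from the notes).
differences = [3.0, 2.5, -1.0, 6.5, 4.0]
t, df = paired_t_statistic(differences)
print(round(t, 3), df)
```

With so few differences, the normality condition described above would matter a great deal; the tiny sample here is only meant to keep the arithmetic easy to follow.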
Try It! Knob Turning
A study involved n = 25 right-handed students and a device with two different knobs (right-hand
thread and left-hand thread). The response of interest is the time it takes to move the knob indicator
a fixed distance. The question of interest is to assess if right-hand threads are easier to turn on
average. Use a 5% significance level.
a. Why is this a paired design and how should randomization be used in the experiment?
This is a paired design with 2 treatments on each subject. Randomization should be
used to determine which knob is used first by each subject.
b. State the hypotheses. H0: ___ $\mu_d = 0$ ___ versus Ha: ___ $\mu_d < 0$ ___
Diff = RT time - LT time (see output); so $\mu_d = \mu_{RT} - \mu_{LT}$. If RT is easier we would expect to see differences < 0.
Here are a few summaries of each set of responses separately and then of the paired data:
Paired Samples Statistics
                  Mean     N    Std. Deviation   Std. Error Mean
Pair 1  RTHREAD   104.00   25   15.93            3.19
        LTHREAD   117.44   25   27.26            5.45
Below are the t-test results generated using R Commander and selecting Statistics > Means
> Paired t-test and the correct direction for the alternative hypothesis. Notice that a 95%
one-sided confidence bound is provided since our test alternative was one-sided to the left.
If you wanted to also report a regular 95% confidence interval, you would run a two-sided
hypothesis test in R.
Summary Statistics
Sample size (n)   Mean diff ($\bar{d}$)   Std. Dev ($s_d$)   Std. Error
25                -13.44                  23.06              4.61

Paired T Results
t        df   p-value   95% CI Lower   95% CI Upper
-2.914   24   0.004     ***            -5.55
c. Perform the test.
$$t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{-13.44}{23.06/\sqrt{25}} = \frac{-13.44}{4.61} \approx -2.914$$
Since the p-value (0.004) is less than 0.05, we reject H0 and the results are statistically significant.
There is sufficient evidence to support that right-hand threads (RT) are easier to turn than
left-hand threads (LT) for right-handed students on average.
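As a check on the arithmetic in part c, the test statistic can be rebuilt from the summary values in the output. This is a sketch only; it reproduces t and df but not the p-value, which is taken from the R output.

```python
from math import sqrt

# Summary values from the knob-turning output (Diff = RT time - LT time).
mean_rt = 104.00   # mean time, right-hand thread
mean_lt = 117.44   # mean time, left-hand thread
s_d = 23.06        # standard deviation of the 25 differences
n = 25             # number of pairs

d_bar = mean_rt - mean_lt     # sample mean difference
std_error = s_d / sqrt(n)     # s.e.(d-bar) = s_d / sqrt(n)
t = (d_bar - 0) / std_error   # null value is 0
df = n - 1

print(round(d_bar, 2), round(std_error, 3), round(t, 3), df)
# → -13.44 4.612 -2.914 24
```

The negative sign of t matches the left-sided alternative, since faster right-hand-thread times make the differences negative.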
d. Which are assumptions required for performing the paired t-test?
- the turning times for the right-hand threaded knob are independent of the turning times
  for the left-hand threaded knob.
- the turning times for the right-hand threaded knob are normally distributed.
- the difference in turning times (diff = RT - LT) is normally distributed.