08 Learning About Mean Difference

The document discusses the distribution of sample means for paired data, where two measurements are taken for each individual or unit. It states that the distribution of the sample mean of differences (d-bar) is approximately normal, with a mean of the population mean difference (μd) and a standard deviation of σd/√n. It also covers how to construct a confidence interval for the population mean difference μd using a t-statistic, based on the sample mean difference, standard error, and degrees of freedom equal to n-1, where n is the number of pairs. An example uses data on changes in reasoning scores before and after piano lessons to illustrate calculating a confidence interval.

Uploaded by

JustinMalin

Stat 250 Gunderson Lecture Notes

8: Learning about a Population Mean Difference

Part 1: Distribution for a Sample Mean of Paired Differences

The Paired Data Scenario
An important special case of a single mean of a population occurs when two quantitative
variables are collected in pairs, and we desire information about the difference between the two
variables. Here are some ways that paired data can occur:
• Each person or unit is measured twice. The two measurements of the same
  characteristic or trait are made under different conditions. An example is measuring
  a quantitative response both before and after treatment.

• Similar individuals or units are paired prior to an experiment. During the experiment,
  each member of a pair receives a different treatment. The same quantitative
  response variable is measured for all individuals.

For paired data designs, it is the differences that we are interested in examining. By focusing on
the differences we again have just one sample of observations (the differences). Sometimes you
may see a d in the subscript of the mean to represent the mean of the population of
differences: $\mu_d$; and the data may be represented generically as $d_1, d_2, \ldots, d_n$.

Sampling Distribution for the Sample Mean of Paired Differences
The sampling distribution results for the mean of paired differences are really the same as those
for a regular sample mean. Since the measurements are differences, the sample mean of the data,
$\bar{x}$, is just written as $\bar{d}$, to emphasize that this is a paired design.

Freshmen Weight
A study was conducted to learn about the average weight gain in the first year of college for
students. A sample of 60 students resulted in an average weight gain of 4.2 pounds (over the
first 12 weeks of college).

Population = All first-year college students (and their weight gains)
Parameter = $\mu_d$ = population average weight gain
            for all first-year college students (unknown)
Sample = the 60 first-year college students sampled
Statistic = d-bar = $\bar{d}$ = the sample average weight gain for these 60 students
            = 4.2 pounds (known for a given selected sample)

Can anyone say how close this observed sample mean difference $\bar{d}$ of 4.2 pounds is to the true
population mean difference $\mu_d$? ___No___ If we were to take another random sample of the
same size, would we get the same value for the sample mean difference? __Probably NOT__.
So what are the possible values for the sample mean difference $\bar{d}$ if we took many random
samples of the same size from this population? What would the distribution of the possible $\bar{d}$
values look like? What can we say about the distribution of the sample mean difference?
Think: if $\bar{d}$ is really just like an $\bar{x}$, then don't you already know about how sample means vary?


Distribution of the Sample Mean Difference: Main Results

Let $\mu_d$ = mean of the differences in the population of interest.
Let $\sigma_d$ = standard deviation for the differences in the population of interest.
Let $\bar{d}$ = the sample mean of the differences for a random sample of size n.

If the population of differences is normal (bell-shaped), and a random sample of any size is
obtained, then the distribution of the sample mean difference $\bar{d}$ is also normal, with a mean
of $\mu_d$ and a standard deviation of

$$\text{s.d.}(\bar{d}) = \frac{\sigma_d}{\sqrt{n}}.$$

If the population of differences is not normal (bell-shaped), but a large random sample of size
n is obtained, then the distribution of the sample mean difference $\bar{d}$ is approximately
normal, with a mean of $\mu_d$ and a standard deviation of

$$\text{s.d.}(\bar{d}) = \frac{\sigma_d}{\sqrt{n}}.$$
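As a rough check on these results, a small simulation can be run in R. This is only a sketch: the values of mu_d, sigma_d, n, and reps below are assumptions chosen for illustration and do not come from any study in these notes. It draws many random samples of differences from a normal population and compares the spread of the resulting sample mean differences with $\sigma_d/\sqrt{n}$.

```r
# Illustrative simulation; all four values below are assumed for this sketch.
set.seed(250)
mu_d    <- 4     # assumed population mean difference
sigma_d <- 6     # assumed population standard deviation of the differences
n       <- 36    # number of pairs (differences) in each sample
reps    <- 5000  # number of repeated random samples

# Draw `reps` samples of n differences and record each sample mean difference.
d_bar <- replicate(reps, mean(rnorm(n, mean = mu_d, sd = sigma_d)))

# Theoretical standard deviation of d-bar: sigma_d / sqrt(n).
theory_sd <- sigma_d / sqrt(n)

cat("mean of the simulated d-bar values:", mean(d_bar), "\n")
cat("s.d. of the simulated d-bar values:", sd(d_bar), "\n")
cat("theoretical s.d.(d-bar):           ", theory_sd, "\n")
```

A histogram of the simulated values, hist(d_bar), should look roughly bell-shaped and centered near mu_d.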
Notes:
(1) An arbitrary level for what is "large enough" has been 30. However, if any of the differences
    are extreme outliers, it is better to have a larger sample size.
(2) The standard deviation of $\bar{d}$ is a measure of the accuracy of the process of using a sample
    mean difference to estimate the population mean difference:

$$\text{s.d.}(\bar{d}) = \frac{\sigma_d}{\sqrt{n}}$$

(3) In practice, the population standard deviation $\sigma_d$ is rarely known, so the sample standard
    deviation $s_d$ is used instead. When making this substitution we call the result a standard error.

Standard error of the sample mean difference is given by:

$$\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}}$$

We can interpret the standard error of the sample mean difference as estimating,
approximately, the average distance of the possible $\bar{d}$ values (for repeated samples of
the same size n) from the population mean difference $\mu_d$.

Moreover, we can use this standard error of the sample mean difference to produce a range of
values that we are very confident will contain the population mean difference $\mu_d$, namely,
$\bar{d} \pm (\text{a few}) \times \text{s.e.}(\bar{d})$. This is the basis for the confidence interval for the population
mean difference $\mu_d$, discussed in Part 2.


Looking ahead:

Do you think the "few" in the expression $\bar{d} \pm (\text{a few}) \times \text{s.e.}(\bar{d})$ will be a z* value or a t* value?

It will be a t*, as we are learning about a population mean, the population std dev is unknown,
and the sample size may not necessarily be large.
What degrees of freedom will be used?
The df for a single sample = n - 1, where n is the number of pairs or number of differences.

We will use the standard error of the sample mean difference to compute a standardized test
statistic for testing hypotheses about the population mean difference $\mu_d$, namely,

$$\frac{\text{Sample statistic} - \text{Null value}}{\text{(Null) standard error}}.$$

This is the basis for testing about a population mean difference, covered in Part 3.

Looking ahead:

Do you think the standardized test statistic will be a z statistic or a t statistic?
It will be a t statistic, as we are testing a theory about a population mean, the population std
dev is unknown, so the standard error is in the denominator, and the sample size may not
necessarily be large.
What do you think will be the most common null value?
$H_0$: $\mu_d$ = ___0___


Additional Notes
A place to jot down questions you may have and ask during office hours, take a few extra notes, write
out an extra problem or summary completed in lecture, create your own summary about these concepts.


Part 2: Confidence Interval for a Population Mean of Paired Differences

Confidence Interval for the Population Mean of Paired Differences $\mu_d$

Recall that an important special case of a single mean of a population occurs when two
quantitative variables are collected in pairs, and we desire information about the difference
between the two variables. For paired data designs, it is the differences that we are interested
in analyzing. By focusing on the differences we again have just one sample of observations (the
differences) and are able to use the confidence interval for the population mean difference.

Notation:

Population Parameter: $\mu_d$ = population mean difference (avg of all differences in population)

Data: $d_1, d_2, \ldots, d_n$, a random sample of n differences from the population

Sample Estimate: $\bar{d}$ = sample mean difference (avg of all differences in sample)

Standard Error: $\text{s.e.}(\bar{d}) = \dfrac{s_d}{\sqrt{n}}$   ($s_d$ = standard deviation of the sampled differences)

We use the sample estimate and its standard error to form a confidence interval estimate for
the parameter using the following form:

Sample Estimate ± Multiplier x Standard error

The multiplier used will depend on the confidence level, the sample size, and the type of
parameter being estimated. In this case, since we are estimating a single population mean, the
multiplier will be a t* value. Here is the summary for a paired data confidence interval:

One-sample t Confidence Interval for the Population Mean Difference $\mu_d$

$$\bar{d} \pm t^{*}\,\text{s.e.}(\bar{d})$$

where $t^{*}$ is the appropriate value for a t(n - 1) distribution.

Note that n is the number of pairs, or the number of differences. This interval requires that the
differences can be considered a random sample from a normal population. If the sample size
is large, the assumption of normality is not so crucial and the result is approximate.


Try It! Changes in Reasoning Scores
Do piano lessons improve spatial-temporal reasoning of preschool children?
Data: The change in reasoning score, after piano lessons - before piano lessons, with larger values
indicating better reasoning, for a random sample of n = 34 preschool children.

2  5  7  2  2  7  4  1  0  7  3  4
3  4  9  4  5  2  9  6  0  3  6  1
3  4  6  7  2  7  3  3  4  4

(a) Display the data, summarize the distribution.
These data were entered into R to produce the following histogram.

[Figure: histogram of the change in reasoning scores]

Notes:
1. Diff = after - before, so we want to see large (positive) differences.
2. Sample mean difference = 3.62 => descriptively improved.
3. Normality of the response (the difference) for the population?
   Seems reasonable, no outliers, and we have n = 34.

Some summary measures were obtained using R Commander and entered into the table:

> numSummary(Dataset[,"ChangeReas"], statistics=c("mean", "sd", "IQR",
+   "quantiles"), quantiles=c(0,.25,.5,.75,1))
     mean       sd IQR 0% 25% 50% 75% 100%  n
 3.617647 3.055196   4 -3   2   4   6    9 34

Summary Statistics
Mean diff ($\bar{d}$)    Std. Dev ($s_d$)    Sample size (n)    Std. Error
3.62                    3.06                34                 0.52

(b) Give a 95% confidence interval for the population mean improvement in reasoning scores.

$\bar{d} \pm t^{*}\,\text{s.e.}(\bar{d})$  =>  3.62 ± (2.04)(0.52)  =>  3.62 ± 1.06  =>  (2.56, 4.68)

using conservative df = 30

(c) What value is of particular interest to see whether or not it is in the interval?

We see that the value of 0 is not in the interval of reasonable values for the population mean
difference. A value of 0 would imply no improvement in reasoning scores on average. Not only
does our interval not have 0 in it, but all of the reasonable values are positive (the entire interval
is above 0). Based on our interval, we would estimate that the population mean improvement is
somewhere between 2.56 and 4.68 points.


(d) A student in your class wrote the following interpretation about the 95% confidence level
used in making the interval. Is it a correct interpretation? If not, update it to make it correct.

If this study were repeated many times, we would expect 95% of the resulting
confidence intervals to contain the sample mean improvement in reasoning scores.

This is not quite correct. We know that about 95% of the intervals made with this method would
contain the POPULATION mean improvement (not the sample mean). Each sample mean (for each
repetition) would be in the corresponding interval; it would be the midpoint of the interval.
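The meaning of the 95% confidence level can be illustrated with a short simulation in R. This is a sketch under assumed conditions: the population values mu_d and sigma_d and the number of repetitions are made up for illustration, and the differences are assumed to come from a normal population. Each repetition draws a new sample, builds a 95% t interval, and records whether that interval contains the population mean difference.

```r
# Illustrative coverage simulation; mu_d, sigma_d and reps are assumed values.
set.seed(250)
mu_d    <- 3.5   # assumed population mean difference
sigma_d <- 3     # assumed population standard deviation of the differences
n       <- 34    # number of differences per sample
reps    <- 5000  # number of repeated studies

covers <- replicate(reps, {
  d  <- rnorm(n, mean = mu_d, sd = sigma_d)    # one random sample of differences
  ci <- t.test(d, conf.level = 0.95)$conf.int  # 95% one-sample t interval
  ci[1] <= mu_d && mu_d <= ci[2]               # does it contain mu_d?
})

coverage <- mean(covers)  # proportion of intervals containing mu_d
cat("proportion of intervals containing mu_d:", coverage, "\n")
```

The proportion should come out close to 0.95. Every interval contains its own sample mean by construction, which is why the student's statement is not informative.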

R Note:
The differences were already computed and entered as the data. So to make a confidence interval
with R Commander we would need to perform a single-sample t-Test on the differences (and leave the
null hypothesis value at the default of 0). Be sure the confidence level is the one you want,
namely .95.

> with(Dataset, (t.test(ChangeReas, alternative='two.sided', mu=0.0,
+   conf.level=.95)))

        One Sample t-test

data:  ChangeReas
t = 6.9044, df = 33, p-value = 6.919e-08
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 2.551639 4.683655
sample estimates:
mean of x
 3.617647

Paired T Results
Mean diff ($\bar{d}$)    df    95% CI Lower    95% CI Upper
3.62                    33    2.55            4.68
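The interval and t statistic reported by R for the reasoning scores can be reproduced from the summary statistics alone. The sketch below uses only the n, mean, and sd values printed in the numSummary output; the variable names are chosen here for illustration.

```r
# Summary statistics copied from the numSummary output for ChangeReas.
n     <- 34
d_bar <- 3.617647   # sample mean difference
s_d   <- 3.055196   # sample standard deviation of the differences

se     <- s_d / sqrt(n)            # standard error of d-bar
t_star <- qt(0.975, df = n - 1)    # t* multiplier for 95% confidence, df = 33
ci     <- d_bar + c(-1, 1) * t_star * se
t_stat <- (d_bar - 0) / se         # test statistic for the null value 0

cat("s.e.(d-bar):", round(se, 2), "\n")
cat("95% CI:     ", round(ci, 2), "\n")
cat("t statistic:", round(t_stat, 4), "\n")
```

Using the exact df = 33 gives a t* slightly smaller than the table value 2.04 for the conservative df = 30, which is why the hand calculation gives a lower endpoint of 2.56 rather than 2.55.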

If the before and after scores were entered into R, then we would use a paired t-test option.
R would compute the differences for us and provide the confidence interval results. Details of
the R steps for analyzing paired data can be found in your Lab Workbook.
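The equivalence between the two routes can be sketched directly in base R. The before and after scores below are hypothetical values made up purely for illustration (they are not the piano-lesson data); the point is only that a paired t-test on the two columns gives the same results as a one-sample t-test on the differences.

```r
# Hypothetical before/after scores, invented only to illustrate the equivalence.
before <- c(10, 12, 9, 14, 11, 13)
after  <- c(13, 15, 9, 18, 12, 17)

# Route 1: give R both columns and ask for a paired t-test.
paired_result <- t.test(after, before, paired = TRUE, conf.level = 0.95)

# Route 2: compute the differences first, then run a one-sample t-test.
diff_result <- t.test(after - before, mu = 0, conf.level = 0.95)

print(paired_result$conf.int)
print(diff_result$conf.int)
```

Both calls use df = n - 1, where n is the number of pairs.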


Part 3: Testing about a Population Mean of Paired Differences

Testing Hypotheses about the Population Mean of Paired Differences $\mu_d$

An important special case of a single mean of a population occurs when two quantitative
variables are collected in pairs, and we desire information about the difference between the two
variables. For paired data designs, it is the differences that we are interested in analyzing. By
focusing on the differences we again have just one sample of observations (the differences) and
are able to perform a one-sample t-test on the differences.

The procedure for the significance test is really the same; the sample mean of the data, $\bar{x}$, is
just written as $\bar{d}$, primarily to emphasize that this is a paired design with the differences being
analyzed. The commonly used null hypothesis is that the population mean difference is $\mu_d = 0$.

Possible null and alternative hypotheses.

1. $H_0$: $\mu_d = 0$   versus   $H_a$: $\mu_d \neq 0$

2. $H_0$: $\mu_d = 0$   versus   $H_a$: $\mu_d > 0$

3. $H_0$: $\mu_d = 0$   versus   $H_a$: $\mu_d < 0$

Note: The format of the alternative hypothesis depends on the research question of interest
and the order in which the differences were taken.

Test Statistic and Conditions for the Test

$$t = \frac{\text{Sample statistic} - \text{Null value}}{\text{Standard error}} = \frac{\bar{d} - 0}{\text{s.e.}(\bar{d})} = \frac{\bar{d}}{s_d/\sqrt{n}}$$

If $H_0$ is true, this test statistic has a ___t(n - 1)___ distribution.

We use this distribution to report the (bounds for the) p-value.

Conditions for the test: The difference is assumed to be normally distributed for the population
(but if the sample size is large, this condition is less crucial). So you need to examine the
differences graphically and assess if there are any extreme outliers or skewness in the
differences. If so, either the sample size needs to be large or an alternative testing method may
be required.


Try It! Knob Turning
A study involved n = 25 right-handed students and a device with two different knobs (right-hand
thread and left-hand thread). The response of interest is the time it takes to move the knob indicator
a fixed distance. The question of interest is to assess if right-hand threads are easier to turn on
average. Use a 5% significance level.

a. Why is this a paired design and how should randomization be used in the experiment?

This is a paired design with 2 treatments on each subject. Randomization should be
used to determine which knob is used first by each subject.

b. State the hypotheses. $H_0$: ___$\mu_d = 0$___ versus $H_a$: ___$\mu_d < 0$___

Diff = RT time - LT time (see output); so $\mu_d = \mu_{RT} - \mu_{LT}$. If RT is easier we would expect to see differences < 0.

Here are a few summaries of each set of responses separately and then of the paired data:

Paired Samples Statistics
                     Mean      N     Std. Deviation    Std. Error Mean
Pair 1   RTHREAD     104.00    25    15.93             3.19
         LTHREAD     117.44    25    27.26             5.45

Below are the t-test results generated using R Commander and selecting Statistics > Means
> Paired t-Test and the correct direction for the alternative hypothesis. Notice that a 95%
one-sided confidence bound is provided since our test alternative was one-sided to the left.
If you wanted to also report a regular 95% confidence interval, you would run a two-sided
hypothesis test in R.

Summary Statistics
Mean diff ($\bar{d}$)    Std. Dev ($s_d$)    Sample size (n)    Std. Error
-13.44                  23.06               25                 4.61

Paired T Results
t         df    p-value    95% CI Lower    95% CI Upper
-2.914    24    0.004      ***             -5.55

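c. Perform the test.

$$t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{-13.44}{23.06/\sqrt{25}} = \frac{-13.44}{4.61} = -2.914$$

Since the p-value is less than 0.05, we reject $H_0$ and the results are statistically significant. There
is sufficient evidence to support that right-hand threads (RT) are easier to turn than left-hand
threads (LT) for right-handed (RH) students on average.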
d. Which are assumptions required for performing the paired t-test?
☐ The turning times for the right-hand threaded knob are independent of the turning times
  for the left-hand threaded knob.
☐ The turning times for the right-hand threaded knob are normally distributed.
☑ The difference in turning times (diff = RT - LT) is normally distributed.
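The knob-turning test can be checked from its summary statistics with a few lines of R. This sketch takes the mean difference as RT mean - LT mean = 104.00 - 117.44 = -13.44, with $s_d$ = 23.06 and n = 25 as reported above; the variable names are chosen here for illustration. Because the alternative is $\mu_d < 0$, the p-value is the area to the left of the test statistic under the t(24) distribution.

```r
# Summary statistics for the differences (RT time - LT time).
n     <- 25
d_bar <- 104.00 - 117.44   # mean difference = -13.44
s_d   <- 23.06             # standard deviation of the differences

se      <- s_d / sqrt(n)             # standard error of d-bar
t_stat  <- (d_bar - 0) / se          # test statistic under H0: mu_d = 0
p_value <- pt(t_stat, df = n - 1)    # left-tail area, since Ha: mu_d < 0

# One-sided 95% upper confidence bound that goes with a left-sided alternative.
upper <- d_bar + qt(0.95, df = n - 1) * se

cat("s.e.(d-bar):     ", round(se, 2), "\n")
cat("t statistic:     ", round(t_stat, 3), "\n")
cat("p-value:         ", round(p_value, 3), "\n")
cat("95% upper bound: ", round(upper, 2), "\n")
```

The lower limit is reported as *** in the results table because a one-sided bound to the left has no finite lower endpoint.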
