and Statistics
Twelfth Edition
Chapter 11
The Analysis of Variance
Some graphic screen captures from Seeing Statistics ® Copyright ©2006 Brooks/Cole
Some images © 2001-(current year) www.arttoday.com A division of Thomson Learning, Inc.
Experimental Design
• The sampling plan or experimental design
determines the way that a sample is selected.
• In an observational study, the experimenter
observes data that already exist. The sampling
plan is a plan for collecting this data.
• In a designed experiment, the experimenter
imposes one or more experimental conditions
on the experimental units and records the
response.
Definitions
• An experimental unit is the object on which a
measurement (or measurements) is taken.
• A factor is an independent variable whose
values are controlled and varied by the
experimenter.
• A level is the intensity setting of a factor.
• A treatment is a specific combination of factor
levels.
• The response is the variable being measured by
the experimenter.
Example
• A group of people is randomly divided into
an experimental and a control group. The
control group is given an aptitude test after
having eaten a full breakfast. The
experimental group is given the same test
without having eaten any breakfast.
Experimental unit = person
Response = score on test
Factor = meal
Levels = breakfast or no breakfast
Treatments: breakfast or no breakfast
Example
• The experimenter in the previous example
also records the person’s gender. Describe
the factors, levels and treatments.
Experimental unit = person
Response = score
Factor #1 = meal; levels = breakfast or no breakfast
Factor #2 = gender; levels = male or female
Treatments:
male and breakfast, female and breakfast, male
and no breakfast, female and no breakfast
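Because a treatment is a specific combination of factor levels, the treatments in this example can be listed mechanically as the Cartesian product of the two sets of levels. The following is a minimal sketch; the names `factors` and `treatments` are illustrative and do not come from the text.

```python
from itertools import product

# Factor levels taken from the example above.
factors = {
    "meal": ["breakfast", "no breakfast"],
    "gender": ["male", "female"],
}


def treatments(factors):
    """Return every combination of factor levels, one tuple per treatment."""
    return list(product(*factors.values()))


all_treatments = treatments(factors)

# Print each treatment in the "gender and meal" order used on the slide.
for meal, gender in all_treatments:
    print(f"{gender} and {meal}")
```

With two levels of each factor there are 2 × 2 = 4 treatments, matching the four listed above.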
The Analysis of Variance
(ANOVA)
• All measurements exhibit variability.
• The total variation in the response
measurements is broken into portions that
can be attributed to various factors.
• These portions are used to judge the effect
of the various factors on the experimental
response.
2. Assumptions regarding the sampling procedures are specified for each design.
• Analysis of variance procedures are fairly robust
when sample sizes are equal and when the data are
fairly mound-shaped.
Three Designs
• Completely randomized design: an extension of the two independent sample
t-test.
• Randomized block design: an extension of the paired difference test.
• a × b Factorial experiment: we study two experimental factors and their effect
on the response.
Source       df      SS         MS          F
Treatments   k - 1   SST        SST/(k-1)   MST/MSE
Error        n - k   SSE        SSE/(n-k)
Total        n - 1   Total SS

Source       df      SS         MS          F
Treatments   2       64.6667    32.3333     5.00
Error        9       58.25      6.4722
Total        11      122.9167

We reject H0 and conclude that there is a difference in average attention spans.
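The arithmetic in the table above can be checked with a short sketch. It assumes k = 3 treatments and n = 12 observations (inferred from the 2 and 11 degrees of freedom), and the critical value 4.26 is the tabled F value for α = .05 with 2 and 9 df, which is an assumption since the slides do not quote it. The function name `one_way_anova` is illustrative.

```python
def one_way_anova(sst, sse, k, n):
    """Fill in the ANOVA table for a completely randomized design."""
    df_treat = k - 1
    df_error = n - k
    mst = sst / df_treat   # mean square for treatments
    mse = sse / df_error   # mean square for error
    return {
        "df_treat": df_treat,
        "df_error": df_error,
        "MST": mst,
        "MSE": mse,
        "F": mst / mse,
    }


# Sums of squares from the table above; k and n are inferred from the df column.
table = one_way_anova(sst=64.6667, sse=58.25, k=3, n=12)

# Assumed critical value: tabled F for alpha = .05 with 2 and 9 df.
F_CRIT = 4.26
reject_h0 = table["F"] > F_CRIT

print(table, reject_h0)
```

The computed F is slightly under 5.00 because the sums of squares are rounded; it still exceeds the assumed critical value, which agrees with the conclusion on the slide.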
Confidence Intervals
• If a difference exists between the treatment
means, we can explore it with confidence
intervals.
A single mean, $\mu_i$: $\bar{x}_i \pm t_{\alpha/2}\,\dfrac{s}{\sqrt{n_i}}$
Difference $\mu_i - \mu_j$: $(\bar{x}_i - \bar{x}_j) \pm t_{\alpha/2}\sqrt{s^2\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}$
where $s = \sqrt{\text{MSE}}$ and $t$ is based on error df.
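As a rough illustration of the two interval formulas, the sketch below plugs in MSE = 6.4722 with 9 error df from the ANOVA table, together with the breakfast-problem means 9.25 and 14.75 (n = 4 each) that appear elsewhere in these slides. The 95% level and the critical value t = 2.262 (tabled t.025 with 9 df) are assumptions, as are the function names.

```python
from math import sqrt


def ci_single_mean(xbar, mse, n, t_crit):
    """Interval for one treatment mean: xbar +/- t * s / sqrt(n), s = sqrt(MSE)."""
    half_width = t_crit * sqrt(mse) / sqrt(n)
    return xbar - half_width, xbar + half_width


def ci_difference(xbar_i, xbar_j, mse, n_i, n_j, t_crit):
    """Interval for mu_i - mu_j: difference +/- t * sqrt(s^2 (1/n_i + 1/n_j))."""
    half_width = t_crit * sqrt(mse * (1 / n_i + 1 / n_j))
    diff = xbar_i - xbar_j
    return diff - half_width, diff + half_width


MSE = 6.4722     # from the ANOVA table
T_CRIT = 2.262   # assumed: tabled t_.025 with 9 error df

single = ci_single_mean(9.25, MSE, 4, T_CRIT)
difference = ci_difference(14.75, 9.25, MSE, 4, 4, T_CRIT)

print(single)
print(difference)
```

Both intervals use the pooled estimate s = √MSE rather than a per-sample standard deviation, which is why the error degrees of freedom drive the choice of t.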
Tukey’s Method for
Paired Comparisons
• Designed to test all pairs of population means
simultaneously, with an overall error rate of α.
• Based on the studentized range, the
difference between the largest and smallest of
the k sample means.
• Assume that the sample sizes are equal and
calculate a “ruler” that measures the distance
required between any pair of means to declare
a significant difference.
Tukey’s Method
Calculate: $\omega = q_\alpha(k, df)\,\dfrac{s}{\sqrt{n_i}}$
where k = number of treatment means
$s = \sqrt{\text{MSE}}$
df = error df
$n_i$ = common sample size
$q_\alpha(k, df)$ = value from Table 11.
If any pair of means differ by more than $\omega$, they are declared different.
The Breakfast Problem
Use Tukey’s method to determine which of the
three population means differ from the others.
          No Breakfast    Light Breakfast    Full Breakfast
Totals    T1 = 37         T2 = 59            T3 = 53
Means     37/4 = 9.25     59/4 = 14.75       53/4 = 13.25

$\omega = q_{.05}(3, 9)\,\dfrac{s}{\sqrt{4}} = 3.95\,\dfrac{\sqrt{6.4722}}{\sqrt{4}} = 5.02$
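The "ruler" comparison can be sketched in a few lines: compute ω once, then flag every pair of sample means that differ by more than ω. The tabled value q = 3.95, MSE = 6.4722 and the three means come from the slides; the function and variable names are illustrative.

```python
from itertools import combinations
from math import sqrt


def tukey_omega(q, mse, n):
    """Tukey's yardstick: omega = q * s / sqrt(n), with s = sqrt(MSE)."""
    return q * sqrt(mse) / sqrt(n)


# Sample means from the breakfast problem (n = 4 per treatment).
means = {
    "No Breakfast": 9.25,
    "Light Breakfast": 14.75,
    "Full Breakfast": 13.25,
}

omega = tukey_omega(q=3.95, mse=6.4722, n=4)

# A pair is declared different when its means differ by more than omega.
different = [
    (a, b)
    for a, b in combinations(means, 2)
    if abs(means[a] - means[b]) > omega
]

print(round(omega, 2), different)
```

Only the no-breakfast and light-breakfast means (9.25 versus 14.75) differ by more than ω; the other two differences, 4.0 and 1.5, fall short of it.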
Block C 10 15 13 10
Block = location (b = 4)
Is the average growth different for the 3 soil preps?
The Randomized
Block Design
• Let $x_{ij}$ be the response for the i-th
treatment applied to the j-th block.
– i = 1, 2, …, k; j = 1, 2, …, b
• The total variation in the experiment is
measured by the total sum of squares:
$\text{Total SS} = \sum (x_{ij} - \bar{x})^2$
The Analysis of Variance
The Total SS is divided into 3 parts:
SST (sum of squares for treatments): measures
the variation among the k treatment means
SSB (sum of squares for blocks): measures the
variation among the b block means
SSE (sum of squares for error): measures the
random variation or experimental error
in such a way that:
$\text{Total SS} = \text{SST} + \text{SSB} + \text{SSE}$
Computing Formulas
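$\text{CM} = \dfrac{G^2}{n}$ where $G = \sum x_{ij}$
$\text{Total SS} = \sum x_{ij}^2 - \text{CM}$
$\text{SST} = \sum \dfrac{T_i^2}{b} - \text{CM}$ where $T_i$ = total for treatment i
$\text{SSB} = \sum \dfrac{B_j^2}{k} - \text{CM}$ where $B_j$ = total for block j
$\text{SSE} = \text{Total SS} - \text{SST} - \text{SSB}$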
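The computing formulas translate directly into code. In the sketch below, only the third data row (10, 15, 13, 10) survives in these slides; the first two rows are illustrative values assumed for the sake of a runnable example, and the function name is hypothetical. The error degrees of freedom (k − 1)(b − 1) is the standard result for this design rather than something stated on the slide.

```python
def randomized_block_ss(data):
    """Sums of squares for a randomized block design.

    data[i][j] is the response for treatment i in block j.
    """
    k = len(data)        # number of treatments
    b = len(data[0])     # number of blocks
    n = k * b

    grand_total = sum(sum(row) for row in data)
    cm = grand_total ** 2 / n                       # correction for the mean

    total_ss = sum(x ** 2 for row in data for x in row) - cm
    sst = sum(sum(row) ** 2 for row in data) / b - cm
    block_totals = [sum(row[j] for row in data) for j in range(b)]
    ssb = sum(t ** 2 for t in block_totals) / k - cm
    sse = total_ss - sst - ssb

    # Standard error df for this design: (k - 1)(b - 1).
    mse = sse / ((k - 1) * (b - 1))
    return {"Total SS": total_ss, "SST": sst, "SSB": ssb, "SSE": sse, "MSE": mse}


# Rows = soil preparations, columns = the 4 locations (blocks).
growth = [
    [11, 13, 16, 10],   # assumed illustrative values (not in the slides)
    [15, 17, 20, 12],   # assumed illustrative values (not in the slides)
    [10, 15, 13, 10],   # row C as it appears in the slides
]

result = randomized_block_ss(growth)
print(result)
```

With these assumed rows the MSE works out to about 1.8889, the same value used in the Tukey calculation for this design.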
$\omega = q_{.05}(3, 6)\,\dfrac{s}{\sqrt{4}} = 4.34\,\dfrac{\sqrt{1.8889}}{\sqrt{4}} = 2.98$
beneficial.
Remember that you cannot construct
$\text{Total SS} = \text{SSA} + \text{SSB} + \text{SS(AB)} + \text{SSE}$
Computing Formulas
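$\text{CM} = \dfrac{G^2}{n}$ where $G = \sum x_{ijk}$
$\text{Total SS} = \sum x_{ijk}^2 - \text{CM}$
$\text{SSA} = \sum \dfrac{A_i^2}{br} - \text{CM}$ where $A_i$ = total for level i of A
$\text{SSB} = \sum \dfrac{B_j^2}{ar} - \text{CM}$ where $B_j$ = total for level j of B
$\text{SS(AB)} = \sum \dfrac{AB_{ij}^2}{r} - \text{CM} - \text{SSA} - \text{SSB}$
where $AB_{ij}$ = total for level i of A and level j of B
$\text{SSE} = \text{Total SS} - \text{SSA} - \text{SSB} - \text{SS(AB)}$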
The Drug Manufacturer
• Each of the two supervisors works at each of
three different shift times and the shift’s
output is measured on three randomly
selected days.
Supervisor   Day    Swing   Night   Ai
1            571    480     470     4650
             610    474     430
             625    540     450
2            480    625     630     5238
             516    600     680
             465    581     661
Bj           3267   3300    3321    9888
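The factorial computing formulas can be applied to this table directly. The sketch below assumes factor A is the supervisor (a = 2), factor B is the shift (b = 3), and r = 3 days per cell; the function name and data layout are illustrative.

```python
def factorial_ss(cells):
    """Sums of squares for an a x b factorial with r replicates per cell.

    cells[i][j] is the list of r responses at level i of A and level j of B.
    """
    a = len(cells)
    b = len(cells[0])
    r = len(cells[0][0])
    n = a * b * r

    grand_total = sum(x for row in cells for cell in row for x in cell)
    cm = grand_total ** 2 / n                       # correction for the mean
    total_ss = sum(x ** 2 for row in cells for cell in row for x in cell) - cm

    a_totals = [sum(sum(cell) for cell in row) for row in cells]
    b_totals = [sum(sum(cells[i][j]) for i in range(a)) for j in range(b)]
    ab_totals = [sum(cell) for row in cells for cell in row]

    ssa = sum(t ** 2 for t in a_totals) / (b * r) - cm
    ssb = sum(t ** 2 for t in b_totals) / (a * r) - cm
    ssab = sum(t ** 2 for t in ab_totals) / r - cm - ssa - ssb
    sse = total_ss - ssa - ssb - ssab
    return {"SSA": ssa, "SSB": ssb, "SS(AB)": ssab, "SSE": sse, "Total SS": total_ss}


# Output by supervisor (rows) and shift (Day, Swing, Night), three days each.
output = [
    [[571, 610, 625], [480, 474, 540], [470, 430, 450]],   # supervisor 1
    [[480, 516, 465], [625, 600, 581], [630, 680, 661]],   # supervisor 2
]

ss = factorial_ss(output)
print(ss)
```

The resulting sums of squares (19208 for supervisor, 247 for shift, 81127 for interaction, 8640 for error, 109222 in total) agree with the Minitab output for this example.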
The ANOVA Table
Total df = n – 1 = abr – 1              Mean Squares
Factor A df = a – 1                     MSA = SSA/(a-1)
Factor B df = b – 1                     MSB = SSB/(b-1)
Interaction df = (a-1)(b-1)             MS(AB) = SS(AB)/(a-1)(b-1)
Error df = by subtraction               MSE = SSE/ab(r-1)

Source        df           SS         MS                  F
A             a - 1        SSA        SSA/(a-1)           MSA/MSE
B             b - 1        SSB        SSB/(b-1)           MSB/MSE
Interaction   (a-1)(b-1)   SS(AB)     SS(AB)/(a-1)(b-1)   MS(AB)/MSE
Error         ab(r-1)      SSE        SSE/ab(r-1)
Total         abr - 1      Total SS
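Given the four sums of squares, the remaining columns of the table follow mechanically. The sketch below is one way to fill them in, using the sums of squares from the supervisor-by-shift example in these slides (a = 2, b = 3, r = 3); the function name `two_way_table` is illustrative.

```python
def two_way_table(ssa, ssb, ssab, sse, a, b, r):
    """Degrees of freedom, mean squares and F ratios for an a x b factorial."""
    df = {
        "A": a - 1,
        "B": b - 1,
        "AB": (a - 1) * (b - 1),
        "Error": a * b * (r - 1),
    }
    ss = {"A": ssa, "B": ssb, "AB": ssab, "Error": sse}
    ms = {source: ss[source] / df[source] for source in ss}
    # Each effect is tested against the error mean square.
    f = {source: ms[source] / ms["Error"] for source in ("A", "B", "AB")}
    return df, ms, f


df, ms, f = two_way_table(ssa=19208, ssb=247, ssab=81127, sse=8640, a=2, b=3, r=3)

for source in ("A", "B", "AB"):
    print(source, df[source], ms[source], round(f[source], 2))
print("Error", df["Error"], ms["Error"])
```

The error df computed as ab(r − 1) equals the value obtained by subtraction: 17 − 1 − 2 − 2 = 12.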
The Drug Manufacturer
• We generate the ANOVA table using
Minitab (Stat → ANOVA → Two-way).
Two-way ANOVA: Output versus Supervisor, Shift
Source DF SS MS F P
Supervisor 1 19208 19208.0 26.68 0.000
Shift 2 247 123.5 0.17 0.844
Interaction 2 81127 40563.5 56.34 0.000
Error 12 8640 720.0
Total 17 109222
Supervisor 1 does better earlier in the day, while supervisor 2 does better at night.
[Interaction plot: Mean versus Shift (1, 2, 3)]
2. Assumptions regarding the sampling procedures are specified for each design.
If not, you will often see the pattern fail in the tails of the graph.
[Normal probability plot of the residuals (response is Growth): Percent versus Residual]
If not, you will see a pattern
[Plot of residuals versus Fitted Value]