Chapter 16 (Technical English for Statistics)
EXAMPLE 1. Suppose that in an agricultural experiment four different chemical treatments of soil produced mean
wheat yields of 28, 22, 18, and 24 bushels per acre, respectively. Is there a significant difference in these means, or is
the observed spread due simply to chance?
Problems such as this can be solved by using an important technique known as analysis of variance, developed
by Fisher. It makes use of the F distribution already considered in Chapter 11.
We shall denote by $\bar{X}_{j.}$ the mean of the measurements in the jth row. We have
$$\bar{X}_{j.} = \frac{1}{b} \sum_{k=1}^{b} X_{jk}, \qquad j = 1, 2, \ldots, a \qquad (1)$$
Copyright © 2008, 1999, 1988, 1961 by The McGraw-Hill Companies, Inc.
404 ANALYSIS OF VARIANCE [CHAP. 16
The dot in $\bar{X}_{j.}$ is used to show that the index k has been summed out. The values $\bar{X}_{j.}$ are called group
means, treatment means, or row means. The grand mean, or overall mean, is the mean of all the measurements
in all the groups and is denoted by $\bar{X}$:
$$\bar{X} = \frac{1}{ab} \sum_{j=1}^{a} \sum_{k=1}^{b} X_{jk} \qquad (2)$$
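As a minimal sketch of equations (1) and (2), the following Python snippet computes the treatment means and the grand mean for a small table. The numbers and the variable names (`data`, `row_means`, `grand_mean`) are illustrative assumptions, not values from the text; each inner list is one treatment (row) with b observations.

```python
# Illustrative data (an assumption): a = 3 treatments, b = 4 observations each.
data = [
    [48, 49, 50, 49],
    [47, 49, 48, 48],
    [49, 51, 50, 50],
]

a = len(data)      # number of treatments (rows)
b = len(data[0])   # number of observations per treatment

# Equation (1): treatment (row) means, with the index k summed out.
row_means = [sum(row) / b for row in data]

# Equation (2): grand mean over all a*b measurements.
grand_mean = sum(sum(row) for row in data) / (a * b)

print(row_means)    # [49.0, 48.0, 50.0]
print(grand_mean)   # 49.0
```

Because every treatment here has the same number of observations, the grand mean equals the plain average of the treatment means.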
We call the first summation on the right-hand side of equations (5) and (6) the variation within treatments
(since it involves the squares of the deviations of $X_{jk}$ from the treatment means $\bar{X}_{j.}$) and denote it by $V_W$.
Thus
$$V_W = \sum_{j,k} (X_{jk} - \bar{X}_{j.})^2 \qquad (7)$$
The second summation on the right-hand side of equations (5) and (6) is called the variation between
treatments (since it involves the squares of the deviations of the various treatment means $\bar{X}_{j.}$ from the
grand mean $\bar{X}$) and is denoted by $V_B$. Thus
$$V_B = \sum_{j,k} (\bar{X}_{j.} - \bar{X})^2 = b \sum_{j} (\bar{X}_{j.} - \bar{X})^2 \qquad (8)$$
$$V_B = \frac{1}{b} \sum_{j} T_{j.}^2 - \frac{T^2}{ab} \qquad (11)$$
$$V_W = V - V_B \qquad (12)$$
where $T$ is the total of all values $X_{jk}$ and where $T_{j.}$ is the total of all values in the jth treatment:
$$T = \sum_{j,k} X_{jk}, \qquad T_{j.} = \sum_{k} X_{jk} \qquad (13)$$
In practice, it is convenient to subtract some fixed value from all the data in the table in order to simplify
the calculation; this has no effect on the final results.
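The sketch below compares the long method (sums of squared deviations) with the short method (totals only) and checks that subtracting a fixed value from every entry leaves the variations unchanged. The data, the function names, and the shift of 45 are illustrative assumptions. The total variation in the short method uses the standard identity $V = \sum_{j,k} X_{jk}^2 - T^2/(ab)$, which is assumed here to correspond to equation (10), since that equation is not reproduced in this section.

```python
def variations_long(data):
    """Long method: between and within variations from squared deviations."""
    a, b = len(data), len(data[0])
    row_means = [sum(row) / b for row in data]
    grand = sum(map(sum, data)) / (a * b)
    v_b = b * sum((m - grand) ** 2 for m in row_means)
    v_w = sum((x - m) ** 2 for row, m in zip(data, row_means) for x in row)
    return v_b, v_w


def variations_short(data):
    """Short method: between and within variations from totals only."""
    a, b = len(data), len(data[0])
    t_rows = [sum(row) for row in data]   # treatment totals T_j.
    t = sum(t_rows)                       # grand total T
    v = sum(x * x for row in data for x in row) - t * t / (a * b)  # total variation
    v_b = sum(tj * tj for tj in t_rows) / b - t * t / (a * b)      # equation (11)
    return v_b, v - v_b                                            # equation (12)


# Illustrative data (an assumption): 3 treatments, 4 observations each.
data = [
    [48, 49, 50, 49],
    [47, 49, 48, 48],
    [49, 51, 50, 50],
]
# Subtract a fixed value (45, chosen arbitrarily) to get smaller numbers.
shifted = [[x - 45 for x in row] for row in data]

print(variations_long(data))      # (8.0, 6.0)
print(variations_short(data))     # (8.0, 6.0)
print(variations_short(shifted))  # (8.0, 6.0)
```

The shift leaves the result unchanged because every deviation $X_{jk} - \bar{X}_{j.}$ and $\bar{X}_{j.} - \bar{X}$ is a difference, so the constant cancels.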
the population for all treatments and if we let $\alpha_j = \mu_j - \mu$, so that $\mu_j = \mu + \alpha_j$, then equation (14)
becomes
$$X_{jk} = \mu + \alpha_j + \varepsilon_{jk} \qquad (15)$$
where $\sum_j \alpha_j = 0$ (see Problem 16.9). From equation (15) and the assumption that the $\varepsilon_{jk}$ are normally
distributed with mean 0 and variance $\sigma^2$, we conclude that the $X_{jk}$ can be considered random variables
that are normally distributed with mean $\mu_j$ and variance $\sigma^2$.
The null hypothesis that all treatment means are equal is given by ($H_0$: $\alpha_j = 0$; $j = 1, 2, \ldots, a$) or,
equivalently, by ($H_0$: $\mu_j = \mu$; $j = 1, 2, \ldots, a$). If $H_0$ is true, the treatment populations will all have the
same normal distribution (i.e., with the same mean and variance). In such cases there is just one treatment
population (i.e., all treatments are statistically identical); in other words, there is no significant
difference between the treatments.
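so that
$$\hat{S}_W^2 = \frac{V_W}{a(b - 1)} \qquad (20)$$
is always a best (unbiased) estimate of $\sigma^2$ regardless of whether $H_0$ is true. On the other hand, we see
from equations (16) and (18) that only if $H_0$ is true (i.e., $\alpha_j = 0$) will we have
$$E\left(\frac{V_B}{a - 1}\right) = \sigma^2 \quad \text{and} \quad E\left(\frac{V}{ab - 1}\right) = \sigma^2 \qquad (21)$$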
Theorem 2: Under the null hypothesis $H_0$, $V_B/\sigma^2$ and $V/\sigma^2$ are chi-square distributed
with $a - 1$ and $ab - 1$ degrees of freedom, respectively.
It is important to emphasize that Theorem 1 is valid whether or not H0 is assumed, whereas Theorem 2 is
valid only if H0 is assumed.
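The distinction between results that hold with or without $H_0$ can be illustrated by a small Monte Carlo sketch. It draws data from the model $X_{jk} = \mu + \alpha_j + \varepsilon_{jk}$ and averages the two mean squares over many trials. All parameter values (`mu`, `sigma`, `b`, `trials`, `seed`, and the two sets of treatment effects) are assumptions chosen for illustration; this is a numerical check, not a proof.

```python
import random


def mean_squares(data):
    """Return (V_B/(a-1), V_W/(a(b-1))) for a rows of b observations each."""
    a, b = len(data), len(data[0])
    row_means = [sum(row) / b for row in data]
    grand = sum(row_means) / a   # valid because all rows have the same size
    v_b = b * sum((m - grand) ** 2 for m in row_means)
    v_w = sum((x - m) ** 2 for row, m in zip(data, row_means) for x in row)
    return v_b / (a - 1), v_w / (a * (b - 1))


def simulate(alphas, mu=10.0, sigma=2.0, b=5, trials=4000, seed=0):
    """Average both mean squares over repeated samples from the model."""
    rng = random.Random(seed)
    sum_b = sum_w = 0.0
    for _ in range(trials):
        data = [[mu + alpha + rng.gauss(0.0, sigma) for _ in range(b)]
                for alpha in alphas]
        ms_b, ms_w = mean_squares(data)
        sum_b += ms_b
        sum_w += ms_w
    return sum_b / trials, sum_w / trials


# H0 true: all treatment effects are zero.
null_b, null_w = simulate([0.0, 0.0, 0.0, 0.0])
# H0 false: effects sum to zero but are not all zero.
alt_b, alt_w = simulate([-1.5, -0.5, 0.5, 1.5])

print("sigma^2 =", 2.0 ** 2)
print("H0 true :", null_b, null_w)
print("H0 false:", alt_b, alt_w)
```

Under these assumptions, the within-treatments mean square should stay close to $\sigma^2 = 4$ in both runs, while the between-treatments mean square should be close to $\sigma^2$ only when all $\alpha_j$ are zero and noticeably larger otherwise.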
Theorem 3 enables us to test the null hypothesis at some specified significance level by using a one-tailed
test of the F distribution (discussed in Chapter 11).
ANALYSIS-OF-VARIANCE TABLES
The calculations required for the above test are summarized in Table 16.2, which is called an
analysis-of-variance table. In practice, we would compute $V$ and $V_B$ by using either the long method
[equations (3) and (8)] or the short method [equations (10) and (11)] and then compute
$V_W = V - V_B$. It should be noted that the degrees of freedom for the total variation (i.e., $ab - 1$)
are equal to the sum of the degrees of freedom for the between-treatments and within-treatments
variations: $(a - 1) + a(b - 1) = ab - 1$.
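Table 16.2

| Variation | Degrees of Freedom | Mean Square | F |
|---|---|---|---|
| Between treatments, $V_B = b \sum_j (\bar{X}_{j.} - \bar{X})^2$ | $a - 1$ | $\hat{S}_B^2 = \dfrac{V_B}{a - 1}$ | $\dfrac{\hat{S}_B^2}{\hat{S}_W^2}$ with $a - 1$ and $a(b - 1)$ degrees of freedom |
| Within treatments, $V_W = V - V_B$ | $a(b - 1)$ | $\hat{S}_W^2 = \dfrac{V_W}{a(b - 1)}$ | |
| Total, $V = V_B + V_W = \sum_{j,k} (X_{jk} - \bar{X})^2$ | $ab - 1$ | | |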
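The entries of Table 16.2 can be computed in a few lines. The following sketch assumes equal numbers of observations per treatment; the function name `anova_table`, the dictionary keys, and the data are illustrative assumptions. It stops at the F statistic: deciding significance still requires comparing F with the tabulated critical value of the F distribution for the stated degrees of freedom.

```python
def anova_table(data):
    """Compute the Table 16.2 quantities for rows with equal sample sizes."""
    a, b = len(data), len(data[0])
    row_means = [sum(row) / b for row in data]
    grand = sum(map(sum, data)) / (a * b)

    v = sum((x - grand) ** 2 for row in data for x in row)   # total variation
    v_b = b * sum((m - grand) ** 2 for m in row_means)        # between treatments
    v_w = v - v_b                                             # within treatments

    df_b, df_w, df_total = a - 1, a * (b - 1), a * b - 1
    ms_b = v_b / df_b    # mean square between treatments
    ms_w = v_w / df_w    # mean square within treatments
    return {
        "V_B": v_b, "V_W": v_w, "V": v,
        "df": (df_b, df_w, df_total),
        "S_B2": ms_b, "S_W2": ms_w,
        "F": ms_b / ms_w,
    }


# Illustrative data (an assumption): 3 treatments, 4 observations each.
data = [
    [48, 49, 50, 49],
    [47, 49, 48, 48],
    [49, 51, 50, 50],
]
table = anova_table(data)
for name, value in table.items():
    print(name, value)
```

For this data the variations are $V_B = 8$, $V_W = 6$, and $V = 14$, giving an F statistic of about 6 with 2 and 9 degrees of freedom.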
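$$V_B = \sum_{j,k} (\bar{X}_{j.} - \bar{X})^2 = \sum_{j} N_j (\bar{X}_{j.} - \bar{X})^2 = \sum_{j} \frac{T_{j.}^2}{N_j} - \frac{T^2}{N} \qquad (25)$$
$$V_W = V - V_B \qquad (26)$$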
where $\sum_{j,k}$ denotes the summation over k from 1 to $N_j$ and then the summation over j from 1 to a.
Table 16.3 is the analysis-of-variance table for this case.
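Table 16.3

| Variation | Degrees of Freedom | Mean Square | F |
|---|---|---|---|
| Between treatments, $V_B = \sum_j N_j (\bar{X}_{j.} - \bar{X})^2$ | $a - 1$ | $\hat{S}_B^2 = \dfrac{V_B}{a - 1}$ | $\dfrac{\hat{S}_B^2}{\hat{S}_W^2}$ with $a - 1$ and $N - a$ degrees of freedom |
| Within treatments, $V_W = V - V_B$ | $N - a$ | $\hat{S}_W^2 = \dfrac{V_W}{N - a}$ | |
| Total, $V = V_B + V_W = \sum_{j,k} (X_{jk} - \bar{X})^2$ | $N - 1$ | | |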
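For treatments with unequal numbers of observations, the same computation goes through with $N_j$ in place of $b$ and $N$ in place of $ab$. The sketch below computes $V_B$ in two of the forms given in equation (25) so that they can be compared, then fills in the Table 16.3 quantities. The group data and all names are illustrative assumptions.

```python
def anova_unequal(groups):
    """Compute the Table 16.3 quantities for groups of unequal sizes."""
    a = len(groups)
    sizes = [len(g) for g in groups]     # N_j
    n = sum(sizes)                       # N
    totals = [sum(g) for g in groups]    # T_j.
    t = sum(totals)                      # T
    grand = t / n
    means = [tj / nj for tj, nj in zip(totals, sizes)]

    # Equation (25), deviation form and totals form.
    v_b_dev = sum(nj * (m - grand) ** 2 for nj, m in zip(sizes, means))
    v_b_tot = sum(tj * tj / nj for tj, nj in zip(totals, sizes)) - t * t / n

    v = sum((x - grand) ** 2 for g in groups for x in g)   # total variation
    v_w = v - v_b_dev                                      # equation (26)

    df_b, df_w = a - 1, n - a
    return {
        "V_B": v_b_dev, "V_B_totals": v_b_tot, "V_W": v_w, "V": v,
        "df": (df_b, df_w, n - 1),
        "F": (v_b_dev / df_b) / (v_w / df_w),
    }


# Illustrative data (an assumption): three treatments with 3, 4, and 5 observations.
groups = [
    [12, 14, 13],
    [9, 10, 11, 10],
    [14, 16, 15, 17, 13],
]
result = anova_unequal(groups)
for name, value in result.items():
    print(name, value)
```

The two forms of $V_B$ should agree up to floating-point rounding; the totals form is usually the more convenient one for hand calculation.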