Chapter 16 (Technical English for Statistics)

The document discusses Analysis of Variance (ANOVA), a statistical method used to test the significance of differences between three or more sample means. It explains the one-way classification method, total variation, and the distinction between variation within treatments and variation between treatments. The document also covers the mathematical model for ANOVA, the F test for null hypothesis testing, and provides analysis-of-variance tables for both equal and unequal numbers of observations.


Analysis of Variance

THE PURPOSE OF ANALYSIS OF VARIANCE


In Chapter 8 we used sampling theory to test the significance of differences between two sample means. We assumed that the two populations from which the samples were drawn had the same variance. In many situations there is a need to test the significance of differences between three or more sample means or, equivalently, to test the null hypothesis that the sample means are all equal.

EXAMPLE 1. Suppose that in an agricultural experiment four different chemical treatments of soil produced mean
wheat yields of 28, 22, 18, and 24 bushels per acre, respectively. Is there a significant difference in these means, or is
the observed spread due simply to chance?
Problems such as this can be solved by using an important technique known as analysis of variance, developed
by Fisher. It makes use of the F distribution already considered in Chapter 11.

ONE-WAY CLASSIFICATION, OR ONE-FACTOR EXPERIMENTS


In a one-factor experiment, measurements (or observations) are obtained for $a$ independent groups of samples, where the number of measurements in each group is $b$. We speak of $a$ treatments, each of which has $b$ repetitions, or $b$ replications. In Example 1, $a = 4$.
The results of a one-factor experiment can be presented in a table having $a$ rows and $b$ columns, as shown in Table 16.1. Here $X_{jk}$ denotes the measurement in the $j$th row and $k$th column, where $j = 1, 2, \ldots, a$ and $k = 1, 2, \ldots, b$. For example, $X_{35}$ refers to the fifth measurement for the third treatment.
Table 16.1

Treatment 1: $X_{11}, X_{12}, \ldots, X_{1b}$, with row mean $\bar{X}_{1\cdot}$
Treatment 2: $X_{21}, X_{22}, \ldots, X_{2b}$, with row mean $\bar{X}_{2\cdot}$
  $\vdots$
Treatment $a$: $X_{a1}, X_{a2}, \ldots, X_{ab}$, with row mean $\bar{X}_{a\cdot}$

We shall denote by $\bar{X}_{j\cdot}$ the mean of the measurements in the $j$th row. We have

$$\bar{X}_{j\cdot} = \frac{1}{b}\sum_{k=1}^{b} X_{jk} \qquad j = 1, 2, \ldots, a \qquad (1)$$


The dot in $\bar{X}_{j\cdot}$ is used to show that the index $k$ has been summed out. The values $\bar{X}_{j\cdot}$ are called group means, treatment means, or row means. The grand mean, or overall mean, is the mean of all the measurements in all the groups and is denoted by $\bar{X}$:

$$\bar{X} = \frac{1}{ab}\sum_{j=1}^{a}\sum_{k=1}^{b} X_{jk} \qquad (2)$$
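The following short Python sketch, not part of the original text, illustrates equations (1) and (2); the per-plot yields are hypothetical numbers chosen so that the treatment means match Example 1.

import numpy as np

# Hypothetical yields for a = 4 treatments, each with b = 5 replications.
X = np.array([
    [27, 29, 28, 26, 30],   # treatment 1
    [21, 23, 22, 20, 24],   # treatment 2
    [17, 19, 18, 16, 20],   # treatment 3
    [23, 25, 24, 22, 26],   # treatment 4
], dtype=float)

row_means  = X.mean(axis=1)   # treatment (row) means, equation (1)
grand_mean = X.mean()         # grand mean, equation (2)

print("treatment means:", row_means)    # [28. 22. 18. 24.]
print("grand mean:     ", grand_mean)   # 23.0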

TOTAL VARIATION, VARIATION WITHIN TREATMENTS, AND VARIATION BETWEEN TREATMENTS

We define the total variation, denoted by $V$, as the sum of the squares of the deviations of each measurement from the grand mean $\bar{X}$:

$$\text{Total variation} = V = \sum_{j,k} (X_{jk} - \bar{X})^2 \qquad (3)$$
By writing the identity

$$X_{jk} - \bar{X} = (X_{jk} - \bar{X}_{j\cdot}) + (\bar{X}_{j\cdot} - \bar{X}) \qquad (4)$$

and then squaring and summing over $j$ and $k$, we have (see Problem 16.1)

$$\sum_{j,k}(X_{jk} - \bar{X})^2 = \sum_{j,k}(X_{jk} - \bar{X}_{j\cdot})^2 + \sum_{j,k}(\bar{X}_{j\cdot} - \bar{X})^2 \qquad (5)$$

or

$$\sum_{j,k}(X_{jk} - \bar{X})^2 = \sum_{j,k}(X_{jk} - \bar{X}_{j\cdot})^2 + b\sum_{j}(\bar{X}_{j\cdot} - \bar{X})^2 \qquad (6)$$

We call the first summation on the right-hand side of equations (5) and (6) the variation within treatments (since it involves the squares of the deviations of $X_{jk}$ from the treatment means $\bar{X}_{j\cdot}$) and denote it by $V_W$. Thus

$$V_W = \sum_{j,k}(X_{jk} - \bar{X}_{j\cdot})^2 \qquad (7)$$

The second summation on the right-hand side of equations (5) and (6) is called the variation between treatments (since it involves the squares of the deviations of the various treatment means $\bar{X}_{j\cdot}$ from the grand mean $\bar{X}$) and is denoted by $V_B$. Thus

$$V_B = \sum_{j,k}(\bar{X}_{j\cdot} - \bar{X})^2 = b\sum_{j}(\bar{X}_{j\cdot} - \bar{X})^2 \qquad (8)$$

Equations (5) and (6) can thus be written

$$V = V_W + V_B \qquad (9)$$
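Continuing with the same hypothetical yield data, a minimal numerical check of the decomposition (9):

import numpy as np

X = np.array([[27, 29, 28, 26, 30],
              [21, 23, 22, 20, 24],
              [17, 19, 18, 16, 20],
              [23, 25, 24, 22, 26]], dtype=float)
a, b = X.shape

grand_mean = X.mean()
row_means  = X.mean(axis=1)

V   = ((X - grand_mean) ** 2).sum()               # total variation, equation (3)
V_W = ((X - row_means[:, None]) ** 2).sum()       # within treatments, equation (7)
V_B = b * ((row_means - grand_mean) ** 2).sum()   # between treatments, equation (8)

print(V, V_W, V_B)                 # 300.0 40.0 260.0 for this data
assert np.isclose(V, V_W + V_B)    # equation (9)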

SHORTCUT METHODS FOR OBTAINING VARIATIONS


To minimize the labor of computing the above variations, the following forms are convenient:

$$V = \sum_{j,k} X_{jk}^2 - \frac{T^2}{ab} \qquad (10)$$

$$V_B = \frac{1}{b}\sum_{j} T_{j\cdot}^2 - \frac{T^2}{ab} \qquad (11)$$

$$V_W = V - V_B \qquad (12)$$

where $T$ is the total of all values $X_{jk}$ and where $T_{j\cdot}$ is the total of all values in the $j$th treatment:

$$T = \sum_{j,k} X_{jk} \qquad\qquad T_{j\cdot} = \sum_{k} X_{jk} \qquad (13)$$

In practice, it is convenient to subtract some fixed value from all the data in the table in order to simplify the calculation; this has no effect on the final results.
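A brief sketch confirming, for the hypothetical data above, that the shortcut forms (10) to (12) reproduce the same values, and that subtracting a fixed constant from every entry leaves them unchanged:

import numpy as np

X = np.array([[27, 29, 28, 26, 30],
              [21, 23, 22, 20, 24],
              [17, 19, 18, 16, 20],
              [23, 25, 24, 22, 26]], dtype=float)
a, b = X.shape

T   = X.sum()                # grand total
T_j = X.sum(axis=1)          # treatment (row) totals

V   = (X ** 2).sum() - T ** 2 / (a * b)        # equation (10)
V_B = (T_j ** 2).sum() / b - T ** 2 / (a * b)  # equation (11)
V_W = V - V_B                                  # equation (12)

print(V, V_B, V_W)   # 300.0 260.0 40.0 for this data

# Subtracting a fixed value (here 20) from every entry leaves the variations unchanged.
Y = X - 20
assert np.isclose(V, (Y ** 2).sum() - Y.sum() ** 2 / (a * b))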

MATHEMATICAL MODEL FOR ANALYSIS OF VARIANCE


We can consider each row of Table 16.1 to be a random sample of size $b$ from the population for that particular treatment. The $X_{jk}$ will differ from the population mean $\mu_j$ for the $j$th treatment by a chance error, or random error, which we denote by $\varepsilon_{jk}$; thus

$$X_{jk} = \mu_j + \varepsilon_{jk} \qquad (14)$$

These errors are assumed to be normally distributed with mean 0 and variance $\sigma^2$. If $\mu$ is the mean of the population for all treatments and if we let $\alpha_j = \mu_j - \mu$, so that $\mu_j = \mu + \alpha_j$, then equation (14) becomes

$$X_{jk} = \mu + \alpha_j + \varepsilon_{jk} \qquad (15)$$

where $\sum_j \alpha_j = 0$ (see Problem 16.9). From equation (15) and the assumption that the $\varepsilon_{jk}$ are normally distributed with mean 0 and variance $\sigma^2$, we conclude that the $X_{jk}$ can be considered random variables that are normally distributed with mean $\mu_j = \mu + \alpha_j$ and variance $\sigma^2$.
The null hypothesis that all treatment means are equal is given by $H_0\!: \alpha_j = 0,\ j = 1, 2, \ldots, a$, or, equivalently, by $H_0\!: \mu_j = \mu,\ j = 1, 2, \ldots, a$. If $H_0$ is true, the treatment populations will all have the same normal distribution (i.e., with the same mean and variance). In such case there is just one treatment population (i.e., all treatments are statistically identical); in other words, there is no significant difference between the treatments.
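The model (15) is easy to simulate; the sketch below uses illustrative values of $\mu$, $\alpha_j$, and $\sigma$ that are assumptions, not values given in the text.

import numpy as np

rng = np.random.default_rng(0)

a, b  = 4, 5
mu    = 23.0
alpha = np.array([5.0, -1.0, -5.0, 1.0])   # treatment effects, chosen to sum to zero
sigma = 2.0

# X_jk = mu + alpha_j + eps_jk, with eps_jk ~ N(0, sigma^2)   -- equation (15)
eps = rng.normal(0.0, sigma, size=(a, b))
X   = mu + alpha[:, None] + eps

print(X.round(1))
print("row means:", X.mean(axis=1).round(2))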

EXPECTED VALUES OF THE VARIATIONS


It can be shown (see Problem 16.10) that the expected values of $V_W$, $V_B$, and $V$ are given by

$$E(V_W) = a(b-1)\sigma^2 \qquad (16)$$

$$E(V_B) = (a-1)\sigma^2 + b\sum_j \alpha_j^2 \qquad (17)$$

$$E(V) = (ab-1)\sigma^2 + b\sum_j \alpha_j^2 \qquad (18)$$

From equation (16) it follows that

$$E\!\left[\frac{V_W}{a(b-1)}\right] = \sigma^2 \qquad (19)$$

so that

$$\hat{S}_W^2 = \frac{V_W}{a(b-1)} \qquad (20)$$

is always a best (unbiased) estimate of $\sigma^2$ regardless of whether $H_0$ is true. On the other hand, we see from equations (17) and (18) that only if $H_0$ is true (i.e., $\alpha_j = 0$) will we have

$$E\!\left(\frac{V_B}{a-1}\right) = \sigma^2 \qquad\text{and}\qquad E\!\left(\frac{V}{ab-1}\right) = \sigma^2 \qquad (21)$$

so that only in such case will

$$\hat{S}_B^2 = \frac{V_B}{a-1} \qquad\text{and}\qquad \hat{S}^2 = \frac{V}{ab-1} \qquad (22)$$

provide unbiased estimates of $\sigma^2$. If $H_0$ is not true, however, then from equation (17) we have

$$E(\hat{S}_B^2) = \sigma^2 + \frac{b}{a-1}\sum_j \alpha_j^2 \qquad (23)$$
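The bias described by equation (23) can be checked by simulation. The following sketch, with illustrative parameters, averages $\hat{S}_W^2$ and $\hat{S}_B^2$ over many simulated one-factor experiments.

import numpy as np

rng = np.random.default_rng(1)
a, b, sigma = 4, 5, 2.0
alpha = np.array([5.0, -1.0, -5.0, 1.0])    # try alpha = np.zeros(a) to see the H0 case
n_sim = 20000

S2_W, S2_B = [], []
for _ in range(n_sim):
    X = 23.0 + alpha[:, None] + rng.normal(0.0, sigma, size=(a, b))
    row_means, grand_mean = X.mean(axis=1), X.mean()
    V_W = ((X - row_means[:, None]) ** 2).sum()
    V_B = b * ((row_means - grand_mean) ** 2).sum()
    S2_W.append(V_W / (a * (b - 1)))        # equation (20)
    S2_B.append(V_B / (a - 1))              # equation (22)

print("mean of S^2_W:", np.mean(S2_W))      # close to sigma^2 = 4
print("mean of S^2_B:", np.mean(S2_B))      # close to the value predicted by equation (23)
print("predicted    :", sigma**2 + b / (a - 1) * (alpha**2).sum())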

DISTRIBUTIONS OF THE VARIATIONS


Using the additive property of chi-square, we can prove the following fundamental theorems concerning the distributions of the variations $V_W$, $V_B$, and $V$:

Theorem 1: $V_W/\sigma^2$ is chi-square distributed with $a(b-1)$ degrees of freedom.

Theorem 2: Under the null hypothesis $H_0$, $V_B/\sigma^2$ and $V/\sigma^2$ are chi-square distributed with $a-1$ and $ab-1$ degrees of freedom, respectively.

It is important to emphasize that Theorem 1 is valid whether or not $H_0$ is assumed, whereas Theorem 2 is valid only if $H_0$ is assumed.
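Theorem 1 can also be checked empirically; this sketch, again with illustrative parameters, compares simulated values of $V_W/\sigma^2$ with the chi-square distribution having $a(b-1)$ degrees of freedom.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a, b, sigma = 4, 5, 2.0
n_sim = 5000

vw_scaled = np.empty(n_sim)
for i in range(n_sim):
    X = rng.normal(23.0, sigma, size=(a, b))   # H0 happens to be true here, but Theorem 1 holds either way
    V_W = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum()
    vw_scaled[i] = V_W / sigma**2

# Compare with a chi-square distribution having a(b-1) = 16 degrees of freedom
print(stats.kstest(vw_scaled, stats.chi2(df=a * (b - 1)).cdf))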

THE F TEST FOR THE NULL HYPOTHESIS OF EQUAL MEANS


If the null hypothesis $H_0$ is not true (i.e., if the treatment means are not equal), we see from equation (23) that we can expect $\hat{S}_B^2$ to be greater than $\sigma^2$, with the effect becoming more pronounced as the discrepancy between the means increases. On the other hand, from equations (19) and (20) we can expect $\hat{S}_W^2$ to be equal to $\sigma^2$ regardless of whether the means are equal. It follows that a good statistic for testing hypothesis $H_0$ is provided by $\hat{S}_B^2/\hat{S}_W^2$. If this statistic is significantly large, we can conclude that there is a significant difference between the treatment means and can thus reject $H_0$; otherwise, we can either accept $H_0$ or reserve judgment, pending further analysis.
In order to use the $\hat{S}_B^2/\hat{S}_W^2$ statistic, we must know its sampling distribution. This is provided by Theorem 3.

Theorem 3: The statistic $F = \hat{S}_B^2/\hat{S}_W^2$ has the $F$ distribution with $a-1$ and $a(b-1)$ degrees of freedom.

Theorem 3 enables us to test the null hypothesis at some specified significance level by using a one-tailed
test of the F distribution (discussed in Chapter 11).
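As a concrete sketch of the test based on Theorem 3, using the hypothetical yield data from earlier rather than data given in the text, the code below computes $F = \hat{S}_B^2/\hat{S}_W^2$, compares it with the 0.05-level critical value of the $F$ distribution, and checks the result against scipy.

import numpy as np
from scipy import stats

X = np.array([[27, 29, 28, 26, 30],
              [21, 23, 22, 20, 24],
              [17, 19, 18, 16, 20],
              [23, 25, 24, 22, 26]], dtype=float)
a, b = X.shape

row_means, grand_mean = X.mean(axis=1), X.mean()
V_B = b * ((row_means - grand_mean) ** 2).sum()
V_W = ((X - row_means[:, None]) ** 2).sum()

S2_B = V_B / (a - 1)                 # 86.67 for this data
S2_W = V_W / (a * (b - 1))           # 2.5
F    = S2_B / S2_W                   # 34.67

F_crit = stats.f.ppf(0.95, a - 1, a * (b - 1))   # one-tailed test at the 0.05 level
print(F, F_crit, F > F_crit)          # reject H0 if F exceeds the critical value

# The same statistic computed directly by scipy:
print(stats.f_oneway(*X))

For this hypothetical data the statistic, about 34.7, far exceeds the critical value of roughly 3.24 for 3 and 16 degrees of freedom, so $H_0$ would be rejected.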

ANALYSIS-OF-VARIANCE TABLES
The calculations required for the above test are summarized in Table 16.2, which is called an analysis-of-variance table. In practice, we would compute $V$ and $V_B$ by using either the long method [equations (3) and (8)] or the short method [equations (10) and (11)] and then compute $V_W = V - V_B$. It should be noted that the degrees of freedom for the total variation (i.e., $ab - 1$) are equal to the sum of the degrees of freedom for the between-treatments and within-treatments variations.

Table 16.2

Between treatments: variation $V_B = b\sum_j (\bar{X}_{j\cdot} - \bar{X})^2$; degrees of freedom $a - 1$; mean square $\hat{S}_B^2 = \dfrac{V_B}{a-1}$; statistic $F = \hat{S}_B^2/\hat{S}_W^2$, with $a - 1$ and $a(b-1)$ degrees of freedom.

Within treatments: variation $V_W = V - V_B$; degrees of freedom $a(b-1)$; mean square $\hat{S}_W^2 = \dfrac{V_W}{a(b-1)}$.

Total: variation $V = V_B + V_W = \sum_{j,k} (X_{jk} - \bar{X})^2$; degrees of freedom $ab - 1$.
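A table in the layout of Table 16.2 can also be produced programmatically; this sketch, not part of the original text, prints one for the hypothetical data used above.

import numpy as np

X = np.array([[27, 29, 28, 26, 30],
              [21, 23, 22, 20, 24],
              [17, 19, 18, 16, 20],
              [23, 25, 24, 22, 26]], dtype=float)
a, b = X.shape

row_means, grand_mean = X.mean(axis=1), X.mean()
V_B = b * ((row_means - grand_mean) ** 2).sum()
V   = ((X - grand_mean) ** 2).sum()
V_W = V - V_B

rows = [("Between treatments", V_B, a - 1),
        ("Within treatments",  V_W, a * (b - 1)),
        ("Total",              V,   a * b - 1)]

print(f"{'Variation':<20}{'SS':>10}{'df':>6}{'Mean square':>14}")
for name, ss, df in rows:
    ms = f"{ss / df:.2f}" if name != "Total" else ""
    print(f"{name:<20}{ss:>10.2f}{df:>6}{ms:>14}")

print("F =", (V_B / (a - 1)) / (V_W / (a * (b - 1))))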

MODIFICATIONS FOR UNEQUAL NUMBERS OF OBSERVATIONS


In case the treatments $1, \ldots, a$ have different numbers of observations, equal to $N_1, \ldots, N_a$ respectively, the above results are easily modified. Thus we obtain

$$V = \sum_{j,k}(X_{jk} - \bar{X})^2 = \sum_{j,k} X_{jk}^2 - \frac{T^2}{N} \qquad (24)$$

$$V_B = \sum_{j,k}(\bar{X}_{j\cdot} - \bar{X})^2 = \sum_j N_j(\bar{X}_{j\cdot} - \bar{X})^2 = \sum_j \frac{T_{j\cdot}^2}{N_j} - \frac{T^2}{N} \qquad (25)$$

$$V_W = V - V_B \qquad (26)$$

where $N = N_1 + \cdots + N_a$ is the total number of observations and $\sum_{j,k}$ denotes the summation over $k$ from 1 to $N_j$ and then the summation over $j$ from 1 to $a$.
Table 16.3 is the analysis-of-variance table for this case.

Table 16.3

Between treatments: variation $V_B = \sum_j N_j(\bar{X}_{j\cdot} - \bar{X})^2$; degrees of freedom $a - 1$; mean square $\hat{S}_B^2 = \dfrac{V_B}{a-1}$; statistic $F = \hat{S}_B^2/\hat{S}_W^2$, with $a - 1$ and $N - a$ degrees of freedom.

Within treatments: variation $V_W = V - V_B$; degrees of freedom $N - a$; mean square $\hat{S}_W^2 = \dfrac{V_W}{N-a}$.

Total: variation $V = V_B + V_W = \sum_{j,k} (X_{jk} - \bar{X})^2$; degrees of freedom $N - 1$.
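For unequal group sizes a ragged list of arrays replaces the rectangular table; the sketch below, with hypothetical data, applies equations (24) to (26) and the degrees of freedom from Table 16.3.

import numpy as np
from scipy import stats

# Hypothetical treatments with N_1 = 5, N_2 = 4, N_3 = 6 observations
groups = [np.array([27., 29., 28., 26., 30.]),
          np.array([21., 23., 22., 20.]),
          np.array([17., 19., 18., 16., 20., 18.])]

a    = len(groups)
N_j  = np.array([len(g) for g in groups])
N    = N_j.sum()
T_j  = np.array([g.sum() for g in groups])
T    = T_j.sum()
allx = np.concatenate(groups)

V   = (allx ** 2).sum() - T ** 2 / N            # equation (24)
V_B = (T_j ** 2 / N_j).sum() - T ** 2 / N       # equation (25)
V_W = V - V_B                                   # equation (26)

F = (V_B / (a - 1)) / (V_W / (N - a))           # a-1 and N-a degrees of freedom (Table 16.3)
p = stats.f.sf(F, a - 1, N - a)
print(F, p)
print(stats.f_oneway(*groups))                   # agrees with the hand computation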

TWO-WAY CLASSIFICATION, OR TWO-FACTOR EXPERIMENTS


The ideas of analysis of variance for one-way classification, or one-factor experiments, can be
generalized. Example 2 illustrates the procedure for two-way classification, or two-factor experiments.
