
Chapter 14

Analysis of Variance

Copyright © 2009 Cengage Learning 14.1


Analysis of Variance
Analysis of variance is a technique that
allows us to compare two or more
populations of interval data.

Analysis of variance is:
• an extremely powerful and widely used procedure.
• a procedure that determines whether differences exist between population means.
• a procedure that works by analyzing sample variance.
Copyright © 2009 Cengage Learning 14.2
One-Way Analysis of Variance
Independent samples are drawn from k populations.

Note: These populations are referred to as treatments.

It is not a requirement that n1 = n2 = … = nk.

Copyright © 2009 Cengage Learning 14.3


One Way Analysis of Variance
New Terminology:

• x is the response variable, and its values are responses.

• xij refers to the ith observation in the jth sample. E.g. x35 is the third observation of the fifth sample.

• The grand mean, x̿, is the mean of all n observations:

  x̿ = (Σj Σi xij) / n,  where n = n1 + n2 + … + nk
Copyright © 2009 Cengage Learning 14.4
One Way Analysis of Variance
More New Terminology:

• The population classification criterion is called a factor.

• Each population is a factor level.

Copyright © 2009 Cengage Learning 14.5


Example 14.1
In the last decade stockbrokers have
drastically changed the way they do business.
It is now easier and cheaper to invest in the
stock market than ever before.

What are the effects of these changes?

To help answer this question a financial analyst randomly sampled 366 American households and asked each to report the age of the head of the household and the proportion of their financial assets that are invested in the stock market.
Copyright © 2009 Cengage Learning 14.6
Example 14.1
The age categories are
Young (Under 35)
Early middle-age (35 to 49)
Late middle-age (50 to 65)
Senior (Over 65)
The analyst was particularly interested in
determining whether the ownership of stocks
varied by age. Xm14-01

Do these data allow the analyst to determine that there are differences in stock ownership between the four age groups?
Copyright © 2009 Cengage Learning 14.7
Example 14.1 Terminology

• Percentage of total assets invested in the stock market is the response variable; the actual percentages are the responses in this example.

• The population classification criterion is called a factor. The age category is the factor we're interested in. This is the only factor under consideration (hence the term "one-way" analysis of variance).

• Each population is a factor level. In this example, there are four factor levels: Young, Early middle age, Late middle age, and Senior.
Copyright © 2009 Cengage Learning 14.8
Example 14.1 IDENTIFY

The null hypothesis in this case is:

H0: µ1 = µ2 = µ3 = µ4

i.e. there are no differences between population means.

Our alternative hypothesis becomes:

H1: at least two means differ

OK. Now we need some test statistics…

Copyright © 2009 Cengage Learning 14.9


Test Statistic
Since µ1 = µ2 = µ3 = µ4 is of interest to
us, a statistic that measures the
proximity of the sample means to each
other would also be of interest.

Such a statistic exists, and is called the between-treatments variation. It is denoted SST, short for "sum of squares for treatments", and is calculated by summing across the k treatments:

SST = Σj nj (x̄j − x̿)²

A large SST indicates large variation between sample means, which supports H1.
Copyright © 2009 Cengage Learning 14.10
Test Statistic
When we performed the equal-variances test to determine whether two means differed (Chapter 13) we used

t = (x̄1 − x̄2) / √( s²p (1/n1 + 1/n2) )

where

s²p = [ (n1 − 1)s²1 + (n2 − 1)s²2 ] / (n1 + n2 − 2)

The numerator measures the difference between sample means and the denominator measures the variation in the samples.

Copyright © 2009 Cengage Learning 14.11


Test Statistic
SST gave us the between-treatments variation.
A second statistic, SSE (Sum of Squares for
Error) measures the within-treatments
variation.

SSE is given by:

SSE = Σj Σi (xij − x̄j)²

or, equivalently:

SSE = (n1 − 1)s²1 + (n2 − 1)s²2 + … + (nk − 1)s²k

In the second formulation, it is easier to see that SSE provides a measure of the amount of variation we can expect from the random variable we've observed.
Copyright © 2009 Cengage Learning 14.12
Example 14.1 COMPUTE

If it were the case that:

x̄1 = x̄2 = x̄3 = x̄4

then SST = 0 and our null hypothesis, H0: µ1 = µ2 = µ3 = µ4, would be supported.

More generally, a small value of SST supports the null hypothesis, while a large value of SST supports the alternative hypothesis. The question is, how large is "large enough"?
Copyright © 2009 Cengage Learning 14.13
Example 14.1 COMPUTE

The following sample statistics and grand mean were computed:

x̄1 = 44.40
x̄2 = 52.47
x̄3 = 51.14
x̄4 = 51.84
x̿ = 50.18

Copyright © 2009 Cengage Learning 14.14


Example 14.1 COMPUTE

Hence, the between-treatments variation, the sum of squares for treatments, is

SST = 84(x̄1 − x̿)² + 131(x̄2 − x̿)² + 93(x̄3 − x̿)² + 58(x̄4 − x̿)²
    = 84(44.40 − 50.18)² + 131(52.47 − 50.18)² + 93(51.14 − 50.18)² + 58(51.84 − 50.18)²
    = 3,741.4

Is SST = 3,741.4 "large enough"?

Copyright © 2009 Cengage Learning 14.15


Example 14.1 COMPUTE

We calculate the sample variances as:

s²1 = 386.55,  s²2 = 469.44,  s²3 = 471.82,  s²4 = 444.79

and from these, calculate the within-treatments variation (sum of squares for error) as:

SSE = (n1 − 1)s²1 + (n2 − 1)s²2 + (n3 − 1)s²3 + (n4 − 1)s²4
    = (84 − 1)(386.55) + (131 − 1)(469.44) + (93 − 1)(471.82) + (58 − 1)(444.79)
    = 161,871.0

We still need a couple more quantities in order to relate SST and SSE together in a meaningful way…

Copyright © 2009 Cengage Learning 14.16
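The arithmetic above is easy to verify. Here is a minimal Python sketch (not part of the textbook, which uses Excel) that recomputes the grand mean, SST, and SSE from the group sizes, sample means, and sample variances reported on the previous slides:

n = [84, 131, 93, 58]                    # group sizes
xbar = [44.40, 52.47, 51.14, 51.84]      # sample means
s2 = [386.55, 469.44, 471.82, 444.79]    # sample variances

grand = sum(ni * xi for ni, xi in zip(n, xbar)) / sum(n)       # grand mean, ~50.18

SST = sum(ni * (xi - grand) ** 2 for ni, xi in zip(n, xbar))   # ~3,741 (rounding of the means explains small drift)
SSE = sum((ni - 1) * vi for ni, vi in zip(n, s2))              # ~161,871

print(round(grand, 2), round(SST, 1), round(SSE, 1))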


Mean Squares
The mean square for treatments (MST) is given by:

MST = SST / (k − 1)

The mean square for error (MSE) is given by:

MSE = SSE / (n − k)

And the test statistic:

F = MST / MSE

is F-distributed with k − 1 and n − k degrees of freedom.
Aha! We must be close…
Copyright © 2009 Cengage Learning 14.17
Example 14.1 COMPUTE

We can calculate the mean square for treatments and the mean square for error as:

MST = SST / (k − 1) = 3,741.4 / 3 = 1,247.12

MSE = SSE / (n − k) = 161,871.0 / 362 = 447.16

Giving us our F-statistic of:

F = MST / MSE = 1,247.12 / 447.16 = 2.79
Does F = 2.79 fall into a rejection region
or not? What is the p-value?

Copyright © 2009 Cengage Learning 14.18
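As a rough check on the Excel output that follows, the same F-statistic and its upper-tail p-value can be computed with SciPy (a sketch, assuming SciPy is installed; the textbook itself works in Excel):

from scipy import stats

SST, SSE = 3741.4, 161871.0
k, n = 4, 366

MST = SST / (k - 1)                     # 1,247.1
MSE = SSE / (n - k)                     # 447.2
F = MST / MSE                           # ~2.79

p_value = stats.f.sf(F, k - 1, n - k)   # upper-tail area P(F > Fstat), ~0.0405
print(F, p_value)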


Example 14.1 INTERPRET

Since the purpose of calculating the F-statistic is to determine whether the value of SST is large enough to reject the null hypothesis, if SST is large, F will be large.

P-value = P(F > Fstat)

Copyright © 2009 Cengage Learning 14.19


Example 14.1 COMPUTE

Using Excel:
Click Data, Data Analysis, Anova:
Single Factor

Copyright © 2009 Cengage Learning 14.20


Example 14.1 COMPUTE
A B C D E F G
1 Anova: Single Factor
2
3 SUMMARY
4 Groups Count Sum Average Variance
5 Young 84 3729.5 44.40 386.55
6 Early Middle Age 131 6873.9 52.47 469.44
7 Late Middle Age 93 4755.9 51.14 471.82
8 Senior 58 3006.6 51.84 444.79
9
10
11 ANOVA
12 Source of Variation SS df MS F P-value F crit
13 Between Groups 3741.4 3 1247.12 2.79 0.0405 2.6296
14 Within Groups 161871.0 362 447.16
15
16 Total 165612.3 365

Copyright © 2009 Cengage Learning 14.21


Example 14.1 INTERPRET

Since the p-value is .0405, which is small, we reject the null hypothesis (H0: µ1 = µ2 = µ3 = µ4) in favor of the alternative hypothesis (H1: at least two population means differ).

That is, there is enough evidence to infer that the mean percentages of assets invested in the stock market differ between the four age categories.

Copyright © 2009 Cengage Learning 14.22


ANOVA Table
The results of analysis of variance are
usually reported in an ANOVA table…
Source of Variation   d.f.   Sum of Squares   Mean Square
Treatments            k–1    SST              MST = SST/(k–1)
Error                 n–k    SSE              MSE = SSE/(n–k)
Total                 n–1    SS(Total)

F-stat = MST/MSE

Copyright © 2009 Cengage Learning 14.23


ANOVA and t-tests of 2 means
Why do we need the analysis of variance? Why not test every pair of means? For example, say k = 6. There are C(6,2) = 6(5)/2 = 15 different pairs of means:

1&2 1&3 1&4 1&5 1&6
2&3 2&4 2&5 2&6
3&4 3&5 3&6
4&5 4&6
5&6

If we test each pair with α = .05 we increase the probability of making a Type I error. If there are no differences, then the probability of making at least one Type I error is 1 − (.95)^15 = 1 − .463 = .537.

Copyright © 2009 Cengage Learning 14.24
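A quick Python check of this arithmetic, under the slide's assumption that the 15 tests are independent:

# probability of at least one Type I error across 15 independent tests at alpha = .05
print(1 - 0.95 ** 15)   # ~0.537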


Checking the Required Conditions
The F-test of the analysis of variance requires
that the random variable be normally
distributed with equal variances. The normality
requirement is easily checked graphically by
producing the histograms for each sample.

The equality of variances is examined by printing the sample standard deviations or variances. The similarity of sample variances allows us to assume that the population variances are equal.

Copyright © 2009 Cengage Learning 14.25


Violation of the Required Conditions
If the data are not normally distributed we can replace the one-way analysis of variance with its nonparametric counterpart, the Kruskal-Wallis test. (See Section 19.3.)

If the population variances are unequal, we can use several methods to correct the problem. However, these corrective measures are beyond the level of this book.
Copyright © 2009 Cengage Learning 14.26
Identifying Factors
Factors that Identify the One-Way Analysis of Variance:
1. Problem objective: to compare two or more populations
2. Data type: interval
3. Experimental design: independent samples

Copyright © 2009 Cengage Learning 14.27


Multiple Comparisons
When we conclude from the one-way analysis of variance that at least two treatment means differ (i.e. we reject the null hypothesis H0: µ1 = µ2 = … = µk), we often need to know which treatment means are responsible for these differences.

We will examine three statistical inference procedures that allow us to determine which population means differ:
• Fisher's least significant difference (LSD) method,
• the Bonferroni adjustment, and
• Tukey's multiple comparison method.
Copyright © 2009 Cengage Learning 14.28
Multiple Comparisons
Two means are considered different if the difference between the corresponding sample means is larger than a critical number. The general case for this is:

IF |x̄i − x̄j| > NCritical

THEN we conclude that µi and µj differ.

The larger sample mean is then believed to be associated with a larger population mean.
Copyright © 2009 Cengage Learning 14.29
Fisher’s Least Significant Difference
What is this critical number, NCritical? Recall that in Chapter 13 we had the confidence interval estimator of µ1 − µ2:

(x̄1 − x̄2) ± tα/2 √( s²p (1/n1 + 1/n2) )

If the interval excludes 0 we can conclude that the population means differ. So another way to conduct a two-tail test is to determine whether

|x̄1 − x̄2|

is greater than

tα/2 √( s²p (1/n1 + 1/n2) )
Copyright © 2009 Cengage Learning 14.30
Fisher’s Least Significant Difference
However, we have a better estimator of the pooled variance: MSE. We substitute MSE in place of s²p. Thus we compare the difference between means to the Least Significant Difference (LSD), given by:

LSD = tα/2 √( MSE (1/ni + 1/nj) )

LSD will be the same for all pairs of means if all k sample sizes are equal. If some sample sizes differ, LSD must be calculated for each combination.
Copyright © 2009 Cengage Learning 14.31
Example 14.2
North American automobile manufacturers have
become more concerned with quality because of
foreign competition.

One aspect of quality is the cost of repairing damage caused by accidents. A manufacturer is considering several new types of bumpers.

To test how well they react to low-speed collisions, 10 bumpers of each of four different types were installed on mid-size cars, which were then driven into a wall at 5 miles per hour.
Copyright © 2009 Cengage Learning 14.32
Example 14.2
The cost of repairing the damage in each
case was assessed. Xm14-02

a. Is there sufficient evidence to infer that the bumpers differ in their reactions to low-speed collisions?

b. If differences exist, which bumpers differ?

Copyright © 2009 Cengage Learning 14.33


Example 14.2
The problem objective is to compare four
populations, the data are interval, and the
samples are independent. The correct
statistical method is the one-way analysis of
variance.
A B C D E F G
11 ANOVA
12 Source of Variation SS df MS F P-value F crit
13 Between Groups 150,884 3 50,295 4.06 0.0139 2.8663
14 Within Groups 446,368 36 12,399
15
16 Total 597,252 39

F = 4.06, p-value = .0139. There is enough evidence to infer that a difference exists between the four bumpers. The question is now: which bumpers differ?
Copyright © 2009 Cengage Learning 14.34
Example 14.2
The sample means are

x̄1 = 380.0
x̄2 = 485.9
x̄3 = 483.8
x̄4 = 348.2

and MSE = 12,399. Thus

LSD = tα/2 √( MSE (1/ni + 1/nj) ) = 2.030 √( 12,399 (1/10 + 1/10) ) = 101.09

Copyright © 2009 Cengage Learning 14.35


Example 14.2
We calculate the absolute values of the differences between means and compare them to LSD = 101.09.

|x̄1 − x̄2| = |380.0 − 485.9| = 105.9
|x̄1 − x̄3| = |380.0 − 483.8| = 103.8
|x̄1 − x̄4| = |380.0 − 348.2| = 31.8
|x̄2 − x̄3| = |485.9 − 483.8| = 2.1
|x̄2 − x̄4| = |485.9 − 348.2| = 137.7
|x̄3 − x̄4| = |483.8 − 348.2| = 135.6

Hence, µ1 and µ2, µ1 and µ3, µ2 and µ4, and µ3 and µ4 differ.

The other two pairs, µ1 and µ4, and µ2 and µ3, do not differ.

Copyright © 2009 Cengage Learning 14.36
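For readers who prefer code to hand calculation, here is a minimal Python sketch of the LSD procedure for this example. The means and MSE are taken from the slides; scipy.stats.t supplies the exact critical value, which is why Data Analysis Plus reports 100.99 rather than the 101.09 obtained from the rounded table value 2.030:

from itertools import combinations
from math import sqrt
from scipy import stats

means = {1: 380.0, 2: 485.9, 3: 483.8, 4: 348.2}     # sample means by bumper
MSE, n_per, df = 12_399, 10, 36                      # df = n - k = 40 - 4

t_crit = stats.t.ppf(1 - 0.05 / 2, df)               # ~2.028
LSD = t_crit * sqrt(MSE * (1 / n_per + 1 / n_per))   # ~101

for i, j in combinations(means, 2):
    diff = abs(means[i] - means[j])
    verdict = "differ" if diff > LSD else "do not differ"
    print(f"bumper {i} vs bumper {j}: |diff| = {diff:5.1f} -> {verdict}")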


Example 14.2 Excel
Click Add-Ins > Data Analysis Plus >
Multiple Comparisons

Copyright © 2009 Cengage Learning 14.37


Example 14.2 Excel
A B C D E
1 Multiple Comparisons
2
3 LSD Omega
4 Treatment Treatment Difference Alpha = 0.05 Alpha = 0.05
5 Bumper 1 Bumper 2 -105.9 100.99 133.45
6 Bumper 3 -103.8 100.99 133.45
7 Bumper 4 31.8 100.99 133.45
8 Bumper 2 Bumper 3 2.1 100.99 133.45
9 Bumper 4 137.7 100.99 133.45
10 Bumper 3 Bumper 4 135.6 100.99 133.45
Hence, µ1 and µ2, µ1 and µ3, µ2 and µ4, and µ3
and µ4 differ.

The other two pairs, µ1 and µ4, and µ2 and µ3, do not differ.
Copyright © 2009 Cengage Learning 14.38
Bonferroni Adjustment to LSD Method…
Fisher's method may result in an increased probability of committing a Type I error.

We can adjust Fisher's LSD calculation by using the "Bonferroni adjustment".

Where we previously used a significance level α (say .05), we now use an adjusted value:

α = αE / C

where αE is the experimentwise error rate and C = k(k − 1)/2 is the number of pairwise comparisons.
Copyright © 2009 Cengage Learning 14.39


Example 14.2
If we perform the LSD procedure with the Bonferroni adjustment, the number of pairwise comparisons is C = k(k − 1)/2 = 4(3)/2 = 6.

We set α = .05/6 = .0083. Thus, tα/2,36 = 2.794 (available from Excel and difficult to approximate manually) and

LSD = tα/2 √( MSE (1/ni + 1/nj) ) = 2.794 √( 12,399 (1/10 + 1/10) ) = 139.13

Copyright © 2009 Cengage Learning 14.40
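The Bonferroni-adjusted LSD can be sketched the same way in Python (an illustration, not the textbook's method; scipy.stats.t.ppf replaces the Excel lookup for the awkward critical value):

from math import sqrt
from scipy import stats

k, MSE, n_per, df = 4, 12_399, 10, 36
C = k * (k - 1) // 2                                 # 6 pairwise comparisons
alpha = 0.05 / C                                     # ~.0083

t_crit = stats.t.ppf(1 - alpha / 2, df)              # ~2.79
LSD = t_crit * sqrt(MSE * (1 / n_per + 1 / n_per))   # ~139
print(t_crit, LSD)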


Example 14.2 Excel
Click Add-Ins > Data Analysis Plus >
Multiple Comparisons

Copyright © 2009 Cengage Learning 14.41


Example 14.2 Excel
A B C D E
1 Multiple Comparisons
2
3 LSD Omega
4 Treatment Treatment Difference Alpha = 0.0083 Alpha = 0.05
5 Bumper 1 Bumper 2 -105.9 139.11 133.45
6 Bumper 3 -103.8 139.11 133.45
7 Bumper 4 31.8 139.11 133.45
8 Bumper 2 Bumper 3 2.1 139.11 133.45
9 Bumper 4 137.7 139.11 133.45
10 Bumper 3 Bumper 4 135.6 139.11 133.45

Now, none of the six pairs of means differ.

Copyright © 2009 Cengage Learning 14.42


Tukey’s Multiple Comparison Method
As before, we are looking for a critical number to compare the differences of the sample means against. In this case:

ω = qα(k, ν) √( MSE / ng )

where qα(k, ν) is the critical value of the Studentized range with ν = n − k degrees of freedom (Table 7, Appendix B) and ng is the harmonic mean of the sample sizes.

Note: ω is a lowercase omega, not a "w".

Copyright © 2009 Cengage Learning 14.43


Tukey's Multiple Comparison Method
k = number of treatments
n = number of observations (n = n1 + n2 + … + nk)
ν = number of degrees of freedom associated with MSE (ν = n − k)
ng = number of observations in each of the k samples
α = significance level
qα(k, ν) = critical value of the Studentized range

Copyright © 2009 Cengage Learning 14.44


Example 14.2
k = 4
n1 = n2 = n3 = n4 = ng = 10
ν = n − k = 40 − 4 = 36
MSE = 12,399
q.05(4,36) ≈ q.05(4,40) = 3.79

Thus,

ω = qα(k, ν) √( MSE / ng ) = 3.79 √( 12,399 / 10 ) = 133.45

Copyright © 2009 Cengage Learning 14.45
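A hedged Python sketch of the same computation; note that scipy.stats.studentized_range returns the exact critical value for ν = 36, which is slightly larger than the table value 3.79 read at ν = 40:

from math import sqrt
from scipy import stats

k, nu, MSE, n_g = 4, 36, 12_399, 10

q = stats.studentized_range.ppf(1 - 0.05, k, nu)   # exact q.05(4, 36), ~3.81
omega = q * sqrt(MSE / n_g)                        # ~134 (the table value gives 133.45)
print(q, omega)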


Example 14.2 • Tukey's Method
A B C D E
1 Multiple Comparisons
2
3 LSD Omega
4 Treatment Treatment Difference Alpha = 0.05 Alpha = 0.05
5 Bumper 1 Bumper 2 -105.9 100.99 133.45
6 Bumper 3 -103.8 100.99 133.45
7 Bumper 4 31.8 100.99 133.45
8 Bumper 2 Bumper 3 2.1 100.99 133.45
9 Bumper 4 137.7 100.99 133.45
10 Bumper 3 Bumper 4 135.6 100.99 133.45

Using Tukey's method, µ2 and µ4, and µ3 and µ4 differ.

Copyright © 2009 Cengage Learning 14.46


Which method to use?
If you have identified two or three
pairwise comparisons that you wish to
make before conducting the analysis of
variance, use the Bonferroni method.

If you plan to compare all possible combinations, use Tukey's comparison method.

Copyright © 2009 Cengage Learning 14.47


Analysis of Variance Experimental Designs
Experimental design determines which
analysis of variance technique we use.

In an earlier example we compared three populations on the basis of one factor: advertising strategy.

One-way analysis of variance is only one of many different experimental designs of the analysis of variance.

Copyright © 2009 Cengage Learning 14.48


Analysis of Variance Experimental Designs
A multifactor experiment is one where there are
two or more factors that define the treatments.

For example, if instead of just varying the advertising strategy for our new apple juice product we also varied the advertising medium (e.g. television or newspaper), then we have a two-factor analysis of variance situation.

The first factor, advertising strategy, still has three levels (convenience, quality, and price) while the second factor, advertising medium, has two levels (TV or print).

Copyright © 2009 Cengage Learning 14.49


Independent Samples and Blocks
Similar to the ‘matched pairs experiment’,
a randomized block design experiment
reduces the variation within the samples,
making it easier to detect differences
between populations.

The term block refers to a matched group of observations from each population.

We can also perform a blocked experiment by using the same subject for each treatment in a "repeated measures" experiment.
Copyright © 2009 Cengage Learning 14.50
Independent Samples and Blocks
The randomized block experiment is also
called the two-way analysis of variance,
not to be confused with the two-factor
analysis of variance. To illustrate where we're headed, we'll develop the randomized block design first.

Copyright © 2009 Cengage Learning 14.51


Randomized Block Analysis of Variance
The purpose of designing a randomized block
experiment is to reduce the within-
treatments variation to more easily detect
differences between the treatment means.

In this design, we partition the total variation into three sources of variation:

SS(Total) = SST + SSB + SSE

where SSB, the sum of squares for blocks, measures the variation between the blocks.

Copyright © 2009 Cengage Learning 14.52


Randomized Blocks…
In addition to the k treatments, we introduce notation for the b blocks in our experimental design:

x̄[T]j = mean of the observations in the jth treatment

x̄[B]i = mean of the observations in the ith block
Copyright © 2009 Cengage Learning 14.53
Sum of Squares : Randomized Block…
Squaring the "distance" from the grand mean leads to the following set of formulae:

SS(Total) = Σj Σi (xij − x̿)²
SST = b Σj (x̄[T]j − x̿)²
SSB = k Σi (x̄[B]i − x̿)²
SSE = Σj Σi (xij − x̄[T]j − x̄[B]i + x̿)²

Test statistic for treatments: F = MST / MSE
Test statistic for blocks: F = MSB / MSE

Copyright © 2009 Cengage Learning 14.54
ANOVA Table…
We can summarize this new information in
an analysis of variance (ANOVA) table
for the randomized block analysis of
variance as follows…
Source of Variation   d.f.       Sum of Squares   Mean Square             F Statistic
Treatments            k–1        SST              MST = SST/(k–1)         F = MST/MSE
Blocks                b–1        SSB              MSB = SSB/(b–1)         F = MSB/MSE
Error                 n–k–b+1    SSE              MSE = SSE/(n–k–b+1)
Total                 n–1        SS(Total)

Copyright © 2009 Cengage Learning 14.55


Example 14.3
Many North Americans suffer from high levels of
cholesterol, which can lead to heart attacks.
For those with very high levels (over 280),
doctors prescribe drugs to reduce cholesterol
levels. A pharmaceutical company has recently
developed four such drugs. To determine whether
any differences exist in their benefits, an
experiment was organized. The company selected
25 groups of four men, each of whom had
cholesterol levels in excess of 280. In each
group, the men were matched according to age
and weight. The drugs were administered over a
2-month period, and the reduction in
cholesterol was recorded (Xm14-03). Do these
results allow the company to conclude that
differences exist between the four new drugs?
Copyright © 2009 Cengage Learning 14.56
Example 14.3 IDENTIFY

The hypotheses to test in this case are:

H0:µ1 = µ2 = µ3 = µ4
H1: At least two means differ

Copyright © 2009 Cengage Learning 14.57


Example 14.3 IDENTIFY

Each of the four drugs can be considered a treatment.

Each group of four men is a block, because the men within a group are matched by age and weight.

By setting up the experiment this way, we eliminate the variability in cholesterol reduction related to different combinations of age and weight. This helps detect differences in the mean cholesterol reduction attributed to the different drugs.

Copyright © 2009 Cengage Learning 14.58


Example 14.3 The Data

Treatment →    Drug 1   Drug 2   Drug 3   Drug 4
Block 1          6.6     12.6      2.7      8.7
Block 2          7.1      3.5      2.4      9.3
Block 3          7.5      4.4      6.5     10.0
Block 4          9.9      7.5     16.2     12.6
Block 5         13.8      6.4      8.3     10.6
Block 6         13.9     13.5      5.4     15.4
(first 6 of the 25 blocks shown)

There are b = 25 blocks and k = 4 treatments in this example.

Copyright © 2009 Cengage Learning 14.59


Example 14.3 COMPUTE

Click Data, Data Analysis, Anova: Two-Factor Without Replication (a.k.a. randomized block).

Copyright © 2009 Cengage Learning 14.60


Example 14.3 COMPUTE
A B C D E F G
1 Anova: Two-Factor Without Replication
2
3 SUMMARY Count Sum Average Variance
4 1 4 30.60 7.65 17.07
5 2 4 22.30 5.58 10.20
25 22 4 112.10 28.03 5.00
26 23 4 89.40 22.35 13.69
27 24 4 93.30 23.33 7.11
28 25 4 113.10 28.28 4.69
29
30 Drug 1 25 438.70 17.55 32.70
31 Drug 2 25 452.40 18.10 73.24
32 Drug 3 25 386.20 15.45 65.72
33 Drug 4 25 483.00 19.32 36.31
34
35
36 ANOVA
37 Source of Variation SS df MS F P-value F crit
38 Rows 3848.7 24 160.36 10.11 0.0000 1.67
39 Columns 196.0 3 65.32 4.12 0.0094 2.73
40 Error 1142.6 72 15.87
41
42 Total 5187.2 99
Copyright © 2009 Cengage Learning 14.61
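For readers working outside Excel, the randomized block F-tests can be reproduced with statsmodels. This is a sketch under assumptions: it presumes Xm14-03 has been reshaped into a long-format CSV whose column names (reduction, drug, block) are chosen for illustration, not taken from the data file:

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# hypothetical long-format file: one row per observation with columns
# reduction (response), drug (treatment), block (matched group of men)
df = pd.read_csv("Xm14-03-long.csv")

model = smf.ols("reduction ~ C(drug) + C(block)", data=df).fit()
print(anova_lm(model))   # F-tests for treatments (drugs) and for blocks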
Checking the Required Conditions
The F-test of the randomized block
design of the analysis of variance has
the same requirements as the
independent samples design.

That is, the random variable must be normally distributed and the population variances must be equal.

The histograms (not shown) appear to support the validity of our results; the reductions appear to be normal.
Copyright © 2009 Cengage Learning 14.62
Violation of the Required Conditions

When the response is not normally distributed, we can replace the randomized block analysis of variance with the Friedman test, which is introduced in Section 19.4.

Copyright © 2009 Cengage Learning 14.63


Developing an Understanding of Statistical Concepts

As we explained previously, the randomized block experiment is an extension of the matched pairs experiment discussed in Section 13.3.

In the matched pairs experiment, we simply remove the effect of the variation caused by differences between the experimental units.

The effect of this removal is seen in the decrease in the value of the standard error (compared to the independent-samples design).
Copyright © 2009 Cengage Learning 14.64
Developing an Understanding of Statistical Concepts

In the randomized block experiment of the analysis of variance, we actually measure the variation between the blocks by computing SSB.

The sum of squares for error is reduced by SSB, making it easier to detect differences between the treatments.

Additionally, we can test to determine whether the blocks differ, a procedure we were unable to perform in the matched pairs experiment.
Copyright © 2009 Cengage Learning 14.65
Identifying Factors
Factors that Identify the Randomized Block of the Analysis of Variance:
1. Problem objective: to compare two or more populations
2. Data type: interval
3. Experimental design: blocked samples

Copyright © 2009 Cengage Learning 14.66


Two-Factor Analysis of Variance…
In Section 14.1, we addressed problems where the data were generated from single-factor experiments.

In Example 14.1, the treatments were the four age categories. Thus, there were four levels of a single factor. In this section, we address the problem where the experiment features two factors.

The general term for such data-gathering procedures is factorial experiment.
Copyright © 2009 Cengage Learning 14.67


Two-Factor Analysis of Variance…
In factorial experiments, we can examine the effect on the response variable of two or more factors, although in this book we address the problem of only two factors.

We can use the analysis of variance to determine whether the levels of each factor are different from one another.
Copyright © 2009 Cengage Learning 14.68
Example 14.4
One measure of the health of a nation's economy is how quickly it creates jobs.

One aspect of this issue is the number of jobs individuals hold.

As part of a study on job tenure, a survey was conducted wherein Americans aged between 37 and 45 were asked how many jobs they have held in their lifetimes. Also recorded were gender and educational attainment.

Copyright © 2009 Cengage Learning 14.69


Example 14.4
The education categories are:
Less than high school (E1)
High school (E2)
Some college/university but no degree (E3)
At least one university degree (E4)

The data were recorded for each of the eight categories of gender and education. Xm14-04

Can we infer that differences exist between genders and educational levels?

Copyright © 2009 Cengage Learning 14.70


Example 14.4
Male E1 Male E2 Male E3 Male E4 Female E1 Female E2 Female E3 Female E4
10 12 15 8 7 7 5 7
9 11 8 9 13 12 13 9
12 9 7 5 14 6 12 3
16 14 7 11 6 15 3 7
14 12 7 13 11 10 13 9
17 16 9 8 14 13 11 6
13 10 14 7 13 9 15 10
9 10 15 11 11 15 5 15
11 5 11 10 14 12 9 4
15 11 13 8 12 13 8 11

Copyright © 2009 Cengage Learning 14.71


Example 14.4 IDENTIFY

We begin by treating this example as a one-way analysis of variance with eight treatments.

However, the treatments are defined by two different factors. One factor is gender, which has two levels. The second factor is educational attainment, which has four levels.
Copyright © 2009 Cengage Learning 14.72
Example 14.4 IDENTIFY

We can proceed to solve this problem in the same way we did in Section 14.1: that is, we test the following hypotheses:

H0: µ1 = µ2 = µ3 = µ4 = µ5 = µ6 = µ7 = µ8

H1: At least two means differ.

Copyright © 2009 Cengage Learning 14.73


Example 14.4 COMPUTE

  A B C D E F G
1 Anova: Single Factor
2
3 SUMMARY
4 Groups Count Sum Average Variance
5 Male E1 10 126 12.60 8.27
6 Male E2 10 110 11.00 8.67
7 Male E3 10 106 10.60 11.60
8 Male E4 10 90 9.00 5.33
9 Female E1 10 115 11.50 8.28
10 Female E2 10 112 11.20 9.73
11 Female E3 10 94 9.40 16.49
12 Female E4 10 81 8.10 12.32
13
14
15 ANOVA
16 Source of Variation SS df MS F P-value F crit
17 Between Groups 153.35 7 21.91 2.17 0.0467 2.1397
18 Within Groups 726.20 72 10.09
19
20 Total 879.55 79

Copyright © 2009 Cengage Learning 14.74


Example 14.4 INTERPRET

The value of the test statistic is F = 2.17 with a p-value of .0467.

We conclude that there are differences in the number of jobs between the eight treatments.

Copyright © 2009 Cengage Learning 14.75


Example 14.4
This statistical result raises more questions.

Namely, can we conclude that the differences in the mean number of jobs are caused by differences between males and females?

Or are they caused by differences between educational levels?

Or, perhaps, are there combinations, called interactions, of gender and education that result in especially high or low numbers?

Copyright © 2009 Cengage Learning 14.76


Terminology
• A complete factorial experiment is an
experiment in which the data for all
possible combinations of the levels of the
factors are gathered. This is also known as
a two-way classification.

• The two factors are usually labeled A and B, with the number of levels of each factor denoted by a and b respectively.

• The number of observations for each combination is called a replicate, and is denoted by r. For our purposes, the number of replicates will be the same for each treatment; that is, they are balanced.
Copyright © 2009 Cengage Learning 14.77
Terminology Xm14-04a
Male Female
Less than high school 10 7
9 13
12 14
16 6
14 11
17 14
13 13
9 11
11 14
15 12
High School 12 7
11 12
9 6
14 15
12 10
16 13
10 9
10 15
5 12
11 13
Less than Bachelor's degree 15 5
8 13
7 12
7 3
7 13
9 11
14 15
15 5
11 9
13 8
At least one Bachelor's degree 8 7
9 9
5 3
11 7
13 9
8 6
7 10
11 15
10 4
8 11

Copyright © 2009 Cengage Learning 14.78


Terminology
Thus, we use a complete factorial experiment where the number of treatments is ab, with r replicates per treatment.

In Example 14.4, a = 2, b = 4, and r = 10.

As a result, we have 10 observations for each of the eight treatments.

Copyright © 2009 Cengage Learning 14.79


Example 14.4
If you examine the ANOVA table, you can see that the total variation is SS(Total) = 879.55, the sum of squares for treatments is SST = 153.35, and the sum of squares for error is SSE = 726.20.

The variation caused by the treatments is measured by SST.

In order to determine whether the differences are due to factor A, factor B, or some interaction between the two factors, we need to partition SST into three sources: SS(A), SS(B), and SS(AB).

Copyright © 2009 Cengage Learning 14.80


ANOVA Table… Table 14.8

Source of Variation   d.f.         Sum of Squares   Mean Square                     F Statistic
Factor A              a–1          SS(A)            MS(A) = SS(A)/(a–1)             F = MS(A)/MSE
Factor B              b–1          SS(B)            MS(B) = SS(B)/(b–1)             F = MS(B)/MSE
Interaction           (a–1)(b–1)   SS(AB)           MS(AB) = SS(AB)/[(a–1)(b–1)]    F = MS(AB)/MSE
Error                 n–ab         SSE              MSE = SSE/(n–ab)
Total                 n–1          SS(Total)

Copyright © 2009 Cengage Learning 14.81


Example 14.4
Test for the differences between the Levels
of Factor A…
H0: The means of the a levels of Factor A
are equal
H1: At least two means differ
Test statistic: F = MS(A) / MSE

Example 14.4: Are there differences in the mean number of jobs between men and women?

H0: µmen = µwomen
H1: At least two means differ

Copyright © 2009 Cengage Learning 14.82


Example 14.4
Test for the differences between the levels of Factor B…
H0: The means of the b levels of Factor B are equal
H1: At least two means differ
Test statistic: F = MS(B) / MSE

Example 14.4: Are there differences in the mean number of jobs between the four educational levels?

H0: µE1 = µE2 = µE3 = µE4
H1: At least two means differ

Copyright © 2009 Cengage Learning 14.83
Example 14.4
Test for interaction between Factors A and
B…
H0: Factors A and B do not interact to affect
the mean responses.
H1: Factors A and B do interact to affect the
mean responses.
Test statistic: F = MS(AB) / MSE

Example 14.4: Are there differences in the mean number of jobs caused by interaction between gender and educational level?
Copyright © 2009 Cengage Learning 14.84
Example 14.4 COMPUTE
Click Data, Data Analysis, Anova: Two-Factor With Replication

Copyright © 2009 Cengage Learning 14.85


Example 14.4 COMPUTE

The ANOVA table part of the Excel printout:

A B C D E F G
35 ANOVA
36 Source of Variation SS df MS F P-value F crit
37 Sample 135.85 3 45.28 4.49 0.0060 2.7318
38 Columns 11.25 1 11.25 1.12 0.2944 3.9739
39 Interaction 6.25 3 2.08 0.21 0.8915 2.7318
40 Within 726.20 72 10.09
41
42 Total 879.55 79

In the ANOVA table, Sample refers to factor B (educational level) and Columns refers to factor A (gender). Thus, MS(B) = 45.28, MS(A) = 11.25, MS(AB) = 2.08, and MSE = 10.09. The F-statistics are 4.49 (educational level), 1.12 (gender), and .21 (interaction).

Copyright © 2009 Cengage Learning 14.86
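The same two-factor table can be reproduced with statsmodels (a sketch; the file name and the column names jobs, gender, and educ are illustrative assumptions about how Xm14-04 might be reshaped to long format):

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# hypothetical long-format file: columns jobs (response), gender, educ
df = pd.read_csv("Xm14-04-long.csv")

model = smf.ols("jobs ~ C(gender) * C(educ)", data=df).fit()
print(anova_lm(model, typ=2))   # gender, education, and interaction F-tests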


Example 14.4 INTERPRET

There are significant differences between the mean number of jobs held by people with different educational backgrounds.

There is no difference between the mean number of jobs held by men and women.

Finally, there is no interaction.

Copyright © 2009 Cengage Learning 14.87


Order of Testing in the Two-Factor Analysis of Variance
In the two versions of Example 14.4, we conducted the tests of each factor and then the test for interaction.

However, if there is evidence of interaction, the tests of the factors are irrelevant: there may or may not be differences between the levels of factor A and the levels of factor B.

Accordingly, we change the order of conducting the F-tests.
Copyright © 2009 Cengage Learning 14.88
Order of Testing in the Two-Factor
Analysis of Variance
Test for interaction first.

If there is enough evidence to infer that there is interaction, do not conduct the other tests.

If there is not enough evidence to conclude that there is interaction, proceed to conduct the F-tests for factors A and B.
Copyright © 2009 Cengage Learning 14.89
Identifying Factors…
Factors that Identify the Independent Samples Two-Factor Analysis of Variance:
1. Problem objective: to compare two or more populations
2. Data type: interval
3. Experimental design: independent samples with two factors

Copyright © 2009 Cengage Learning 14.90


Summary of ANOVA…
• one-way analysis of variance (one factor, independent samples)
• two-way analysis of variance, a.k.a. randomized blocks (one factor, blocked samples)
• two-factor analysis of variance (two factors, independent samples)

Copyright © 2009 Cengage Learning 14.91
