Prerequisites
Introduction to ANOVA, ANOVA Designs, Multi-Factor ANOVA, Difference Between Means, Correlated Pairs
Learning Objectives
Within-subjects factors involve comparisons of the same subjects under different conditions. For example, in the ADHD Treatment
Study, each child's performance was measured four times, once after being on each of four drug doses for a week. Therefore, each
subject's performance was measured at each of the four levels of the factor "Dose." Note the difference from between-subjects
factors for which each subject's performance is measured only once and the comparisons are among different groups of subjects. A
within-subjects factor is sometimes referred to as a repeated measures factor since repeated measurements are taken on each
subject. An experimental design in which the independent variable is a within-subjects factor is called a within-subjects design.
An advantage of within-subjects designs is that individual differences in subjects' overall levels of performance are
controlled. This is important because subjects invariably will differ from one another. In an experiment on problem solving, some
subjects will be better than others regardless of the condition they are in. Similarly, in a study of blood pressure some subjects will
have higher blood pressure than others regardless of the condition. Within-subjects designs control these individual differences by
comparing the scores of a subject in one condition to the scores of the same subject in other conditions. In this sense each subject
serves as his or her own control. This typically gives within-subjects designs considerably more power than between-subjects
designs.
One-factor Designs
Let's consider how to analyze the data from the ADHD treatment case study. These data consist of the scores of 24 children
with ADHD on a delay of gratification (DOG) task. Each child was tested under four dosage levels. For now we will be concerned
only with testing the difference between the mean in the placebo condition (the lowest dosage, D0) and the mean in the highest
dosage condition (D60). The details of the computations are relatively unimportant since they are almost universally done by
computers. Therefore we jump right to the ANOVA Summary table shown in Table 1.
Source df SSQ MS F p
Total 47 630.48
The first source of variation, "Subjects," refers to the differences among subjects. If all the subjects had exactly the same mean
(across the two dosages) then the sum of squares for subjects would be zero; the more subjects differ from each other, the larger
the sum of squares for subjects.
Dosage refers to the differences between the two dosage levels. If the means for the two dosage levels were equal, the sum
of squares would be zero. The larger the difference between means, the larger the sum of squares.
The error reflects the degree to which the effect of dosage is different for different subjects. If subjects all responded very
similarly to the drug, then the error would be very low. For example, if all subjects performed moderately better with the high dose
than they did with the placebo, then the error would be low. On the other hand, if some subjects did better with the placebo while
others did better with the high dose, then the error would be high. It should make intuitive sense that the less consistent the effect
of the drug, the larger the drug effect would have to be in order to be significant. The degree to which the effect of the drug differs
depending on the subject is the Subjects x Drug interaction. Recall that an interaction occurs when the effect of one variable differs
depending on the level of another variable. In this case, the size of the error term is the extent to which the effect of the variable
"Drug" differs depending on the level of the variable "Subjects." Note that each subject is a different level of the variable
"Subjects."
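The partitioning described above can be sketched in a few lines of Python. This is only an illustrative sketch: the scores are made up (they are not the ADHD data), and the function name `within_subjects_anova` is a hypothetical helper, not part of any library. It shows how the total variation splits into Subjects, the within-subjects factor, and an error term that is simply whatever is left over (the Subjects x factor interaction).

```python
# Hypothetical scores: one row per subject, one column per condition
# (for example, placebo and high dose). These numbers are invented.
scores = [
    [10, 14],
    [12, 15],
    [8, 13],
    [15, 16],
    [11, 11],
    [9, 14],
]


def within_subjects_anova(data):
    """Partition the sum of squares for a one-factor within-subjects design."""
    n = len(data)      # number of subjects
    k = len(data[0])   # number of conditions (levels of the factor)
    grand_mean = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    condition_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    # Differences among subjects' overall means.
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    # Differences among the condition means.
    ss_condition = n * sum((m - grand_mean) ** 2 for m in condition_means)
    # What remains is the Subjects x Condition interaction, used as error.
    ss_error = ss_total - ss_subjects - ss_condition

    df_condition = k - 1
    df_error = (n - 1) * (k - 1)
    f_ratio = (ss_condition / df_condition) / (ss_error / df_error)
    return {
        "ss_total": ss_total,
        "ss_subjects": ss_subjects,
        "ss_condition": ss_condition,
        "ss_error": ss_error,
        "F": f_ratio,
    }


result = within_subjects_anova(scores)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

If every subject improved by exactly the same amount, the error sum of squares would be zero; the more the size of the improvement varies from subject to subject, the larger the error term and the smaller the F.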
Other portions of the summary table have the same meaning as in between-subjects ANOVA. The F for dosage is the mean
square for dosage divided by the mean square error. For these data, the F is significant with p = 0.004. Notice that this F test is
equivalent to the t test for correlated pairs, with F = t².
Table 2 shows the ANOVA Summary Table when all four doses are included in the analysis. Since there are now four dosage
levels rather than two, the df for dosage is three rather than one. Since the error is the Subjects x Dosage interaction, the df for
error is the df for "Subjects" (23) times the df for Dosage (3) and is equal to 69.
Source df SSQ MS F p
Total 95 12099.74
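The degrees of freedom in both summary tables follow from the number of subjects and the number of levels of the factor. The sketch below is a small bookkeeping helper (the name `within_subjects_df` is hypothetical) that reproduces the values quoted in the text for 24 subjects with two and with four dosage levels.

```python
def within_subjects_df(n_subjects, n_levels):
    """Degrees of freedom for a one-factor within-subjects ANOVA."""
    df_subjects = n_subjects - 1
    df_factor = n_levels - 1
    # The error term is the Subjects x factor interaction,
    # so its df is the product of the two df values.
    df_error = df_subjects * df_factor
    df_total = n_subjects * n_levels - 1
    return {
        "Subjects": df_subjects,
        "Dosage": df_factor,
        "Error": df_error,
        "Total": df_total,
    }


two_doses = within_subjects_df(24, 2)    # placebo versus highest dose
four_doses = within_subjects_df(24, 4)   # all four doses
print(two_doses)
print(four_doses)
```

In each case the df for Subjects, Dosage, and Error add up to the df for Total.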
Carry-over effects
Often performing in one condition affects performance in a subsequent condition in such a way as to make a within-subjects design
impractical. For example, consider an experiment with two conditions. In both conditions subjects are presented with pairs of
words. In Condition A subjects are asked to judge whether the words have similar meaning whereas in Condition B subjects are
asked to judge whether they sound similar. In both conditions, subjects are given a surprise memory test at the end of the
presentation. If condition were a within-subjects variable, then there would be no surprise after the second presentation and it
is likely that the subjects would have been trying to memorize the words.
Not all carry-over effects cause such serious problems. For example, if subjects get fatigued by performing a task, then they
would be expected to do worse on the second condition they were in. However, as long as half the subjects are in Condition A first
and Condition B second, the fatigue effect itself would not invalidate the results, although it would add noise and reduce power. The
carryover effect is symmetric in that having Condition A first affects performance in Condition B to the same degree that having
Condition B first affects performance in Condition A.
Asymmetric carryover effects cause more serious problems. For example, suppose performance in Condition B were much
better if preceded by Condition A whereas performance in Condition A was approximately the same regardless of whether it was
preceded by Condition B. With this kind of carryover effect it is probably better to use a between-subjects design.
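The contrast between symmetric and asymmetric carryover can be made concrete with a little arithmetic. The sketch below assumes made-up true means for the two conditions and a counterbalanced design in which half the subjects receive each order; the function name and all numbers are hypothetical. With a symmetric effect the observed difference between condition means equals the true difference, whereas with an asymmetric effect it does not.

```python
# Hypothetical true means; the true B - A difference is 2.
TRUE_A, TRUE_B = 10.0, 12.0


def observed_difference(effect_on_a_when_second, effect_on_b_when_second):
    """Mean B - A difference in a counterbalanced two-condition design.

    effect_on_a_when_second: change in the A score when A follows B.
    effect_on_b_when_second: change in the B score when B follows A.
    """
    # Half the subjects: A first, then B (B carries the second-position effect).
    a_then_b = (TRUE_A, TRUE_B + effect_on_b_when_second)
    # Other half: B first, then A (A carries the second-position effect).
    b_then_a = (TRUE_A + effect_on_a_when_second, TRUE_B)

    mean_a = (a_then_b[0] + b_then_a[0]) / 2
    mean_b = (a_then_b[1] + b_then_a[1]) / 2
    return mean_b - mean_a


# Symmetric: fatigue lowers whichever condition comes second by 2 points.
symmetric = observed_difference(-2.0, -2.0)
# Asymmetric: B improves by 3 points after A, but A is unaffected by B.
asymmetric = observed_difference(0.0, 3.0)

print("true difference:", TRUE_B - TRUE_A)
print("observed, symmetric carryover:", symmetric)
print("observed, asymmetric carryover:", asymmetric)
```

In the symmetric case the fatigue cancels out of the comparison, although in real data it would still add variability. In the asymmetric case counterbalancing cannot remove the distortion.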
In the Stroop Interference case study, subjects performed three tasks: naming colors, reading color words, and naming the ink
color of color words. Some of the subjects were males and some of the subjects were females. Therefore this design had two
factors: gender and task. The ANOVA Summary Table for this design is shown in Table 3.
Source df SSQ MS F p
The computations for the sum of squares will not be covered since computations are normally done by software. However, there are
some important things to learn from the summary table. First notice that there are two error terms: one for the between-subjects
variable Gender and one for both the within-subjects variable Task and the interaction of the between-subjects variable and the
within-subjects variable. Typically, the mean square error for the between-subjects variable will be higher than the other mean
square error. In this example, the mean square error for Gender is about twice as large as the other mean square error.
The degrees of freedom for the between-subjects variable is equal to the number of levels of the between-subjects variable
minus one. In this example it is one since there are two levels of gender. Similarly, the degrees of freedom for the within-subjects
variable is equal to the number of levels of the variable minus one. In this example, it is two since there are three tasks. The
degrees of freedom for the interaction is the product of the degrees of freedom of the two variables. For the Gender x Task
interaction, the degrees of freedom is the product of degrees of freedom Gender (which is 1) and the degrees of freedom Task
(which is 2), and is therefore equal to 2.
Assumption of Sphericity
Within-subjects ANOVA makes a restrictive assumption about the variances and the correlations among the dependent variables.
Although the details of the assumption are beyond the scope of this book, it is approximately correct to say that it is assumed that
all the correlations are equal and all the variances are equal. Table 4 shows the correlations among the three dependent variables
in the Stroop Interference case study.
Note that the correlation between the word reading and the color naming variables of 0.7013 is much higher than the correlation
between either of these variables with the interference variable. Moreover, as shown in Table 5, the variances among the variables
differ greatly.
Table 5. Variances.
Variable Variance
word reading 15.77
color naming 13.92
Interference 55.07
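A rough check of this kind is easy to carry out. The sketch below computes the variance of each dependent variable and the correlation between each pair, using invented scores for six subjects (not the Stroop data); the helper `pearson_r` is written out by hand so that only the standard library is needed. Markedly unequal variances or correlations in a sample are a warning sign, although the assumption itself concerns the population.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical scores for six subjects on three tasks (invented numbers).
data = {
    "word reading": [16, 20, 14, 22, 18, 15],
    "color naming": [19, 24, 18, 25, 20, 19],
    "interference": [35, 48, 30, 40, 52, 33],
}


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Sample variance of each variable.
variances = {name: variance(values) for name, values in data.items()}

# Correlation for each distinct pair of variables.
names = list(data)
correlations = {
    (a, b): pearson_r(data[a], data[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

for name, v in variances.items():
    print(f"variance of {name}: {v:.2f}")
for (a, b), r in correlations.items():
    print(f"r({a}, {b}) = {r:.3f}")
```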
Naturally, the assumption of sphericity, like all assumptions, refers to populations not samples. However, it is clear from these
sample data that the assumption is not met in the population.
Although ANOVA is robust to most violations of its assumptions, the assumption of sphericity is an exception: Violating the
assumption of sphericity leads to a substantial increase in the Type I error rate. Moreover, this assumption is rarely met in practice.
Although violations of this assumption had at one time received little attention, the current
consensus of data analysts is that it is no longer considered acceptable to ignore them.
Approaches to Dealing with Violations of Sphericity
If an effect is highly significant, there is a conservative test that can be used to protect against an inflated Type I error rate. This
test consists of adjusting the degrees of freedom for all within-subjects variables as follows: the numerator and denominator
degrees of freedom are each divided by the number of scores per subject minus one. Consider the effect of Task shown in Table 3. There are
three scores per subject and therefore the degrees of freedom should be divided by two. The adjusted degrees of freedom are:
(2)(1/2) = 1 for the numerator and
(90)(1/2) = 45 for the denominator.
The probability value is obtained using the F probability calculator with the new degrees of freedom parameters. The probability of
an F of 228.06 or larger with 1 and 45 degrees of freedom is less than 0.001. Therefore, there is no need to worry about the
possible violation of sphericity for this effect.
Possible violation of sphericity does make a difference in the interpretation of the analysis shown in Table 2. The probability
value of an F of 5.18 with 1 and 23 degrees of freedom is 0.032, a value that would lead to a more cautious conclusion than the p
value shown in Table 2.
The correction described above is very conservative and should only be used when, as in Table 3, the probability value is
very low. A better correction, but one that is very complicated to calculate, is to multiply the degrees of freedom by a quantity called
ε. There are two methods of calculating ε. The correction called the Huynh-Feldt (or H-F) is slightly preferred to the one called the
Greenhouse-Geisser (or G-G), although both work well. The G-G correction is generally considered a little too conservative.
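Both kinds of correction can be viewed as multiplying the degrees of freedom by a value of ε. The conservative test corresponds to the smallest possible ε, 1/(k - 1), where k is the number of scores per subject. The sketch below illustrates this, and also shows one common textbook formulation of the G-G estimate computed from a covariance matrix of the repeated measures. The function names are hypothetical, the covariance matrix is invented, and the G-G formula is an assumption brought in for illustration rather than something taken from this text; in practice statistical software reports ε directly.

```python
def lower_bound_epsilon(k):
    """Smallest possible epsilon for k scores per subject (conservative test)."""
    return 1.0 / (k - 1)


def adjust_df(df_num, df_den, epsilon):
    """Multiply numerator and denominator degrees of freedom by epsilon."""
    return df_num * epsilon, df_den * epsilon


def greenhouse_geisser_epsilon(cov):
    """G-G style epsilon from a k x k covariance matrix (list of lists).

    Assumed formulation: double-center the matrix, then take
    trace^2 / ((k - 1) * sum of squared elements).
    """
    k = len(cov)
    row_means = [sum(row) / k for row in cov]
    grand_mean = sum(row_means) / k
    # Double-centering; the matrix is symmetric, so column means equal row means.
    centered = [
        [cov[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centered[i][i] for i in range(k))
    sum_of_squares = sum(v * v for row in centered for v in row)
    return trace ** 2 / ((k - 1) * sum_of_squares)


# Conservative adjustment for the two analyses discussed in the text.
task_df = adjust_df(2, 90, lower_bound_epsilon(3))     # three tasks
dosage_df = adjust_df(3, 69, lower_bound_epsilon(4))   # four doses

# Hypothetical covariance matrices for three repeated measures.
equal_cov = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
unequal_cov = [[16.0, 10.0, 4.0], [10.0, 14.0, 3.0], [4.0, 3.0, 55.0]]

print("adjusted df for Task:", task_df)
print("adjusted df for Dosage:", dosage_df)
print("epsilon, equal variances and covariances:", greenhouse_geisser_epsilon(equal_cov))
print("epsilon, unequal variances and covariances:", greenhouse_geisser_epsilon(unequal_cov))
```

When all variances are equal and all covariances are equal, ε is 1 and no adjustment is made; the further the data depart from that pattern, the closer ε moves toward its lower bound. If SciPy is available, `scipy.stats.f.sf(F, df_num, df_den)` can be used to obtain the probability value for the adjusted degrees of freedom.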
A final method for dealing with violations of sphericity is to use a multivariate approach to within-subjects variables. This
method has much to recommend it, but it is beyond the scope of this text.