Just Learn Stats
Introduction to ANOVA
Table of Contents
What is ANOVA?
Assumptions of ANOVA
One-Way ANOVA
Two-Way ANOVA
https://leaps.analyttica.com
What is ANOVA?
Analysis of Variance (ANOVA) is a statistical method used to test differences between
two or more means by analysing the variations in observations between and within
different groups. It was developed by Ronald Fisher.
ANOVA is based on the principle of partitioning the total variance: the total observed variation is split into two components, namely the variation between the groups and the variation within the groups. Comparing these two components lets us test statistically whether the groups differ significantly. This process is explained later in this document.
ANOVA experiments are commonly built on the following principles of experimental design:
1) Randomization: Suppose a hospital is analysing the effect of three drugs, Drug A, Drug B and Drug C, which are to be tested on (say) 10 patients. Randomization means that each patient is equally likely to receive any of the three drugs, so there is no pre-experiment bias in deciding which patient receives which drug. One-way ANOVA relies on this randomization principle, and the corresponding design is called a completely randomized design (CRD). In this example, patients 1, 3, 4 and 7 may receive Drug A, patients 2, 5 and 9 may receive Drug B, and the remaining patients (6, 8 and 10) may receive Drug C.
2) Replication: Suppose a farmer wants to test the effectiveness of three fertilizers, Fertilizer A, Fertilizer B and Fertilizer C, on his crops. He also wants to account for the effect of soil type. Suppose his plot of land has five different types of soil, and he wants to plant 15 crops in total. The ideal design in such a case would be to divide the land into five blocks, each block corresponding to one particular type of soil (so the blocks differ from one another while conditions within a block are similar), and each block having three crops. He would then take each block and apply the three fertilizers to its crops in a random manner. One thing to keep in mind is that he would have to apply all three fertilizers in each block. Such a design is called a randomized block design and is analysed with two-way ANOVA. It takes into consideration both the randomization principle, as within each block the treatments are applied randomly, and the replication principle, as the same treatments are applied in every block.
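Both designs above can be generated with a few lines of code. The following is a minimal sketch using only Python's standard library. The function names, the fixed seed and the soil-type labels are hypothetical. In the CRD, the patients are shuffled and dealt out to the drugs in a shuffled order, so every patient is equally likely to receive each drug and the group sizes differ by at most one (4, 3 and 3 for 10 patients). In the randomized block design, each block receives every fertilizer exactly once, in an order shuffled independently within that block.

```python
import random


def completely_randomized_design(units, treatments, rng):
    """CRD: shuffle the units, then deal them out to the treatments like cards.

    The treatment order is shuffled too, so each unit is equally likely to get
    any treatment, and group sizes differ by at most one.
    """
    treatments = list(treatments)
    shuffled_units = list(units)
    rng.shuffle(shuffled_units)
    order = list(treatments)
    rng.shuffle(order)  # decides at random which treatment gets any leftover unit
    plan = {t: [] for t in treatments}
    for position, unit in enumerate(shuffled_units):
        plan[order[position % len(order)]].append(unit)
    return {t: sorted(group) for t, group in plan.items()}


def randomized_block_design(blocks, treatments, rng):
    """RBD: every block gets every treatment once, in an independently shuffled order."""
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)     # randomization happens separately inside each block
        layout[block] = order  # order[j] = treatment applied to plot j of this block
    return layout


rng = random.Random(2024)  # fixed seed so the plan can be reproduced
drug_plan = completely_randomized_design(range(1, 11), ["Drug A", "Drug B", "Drug C"], rng)
field_layout = randomized_block_design(
    [f"Soil type {i}" for i in range(1, 6)],  # hypothetical block labels
    ["Fertilizer A", "Fertilizer B", "Fertilizer C"],
    rng,
)
for drug, patients in drug_plan.items():
    print(drug, patients)
for soil, fertilizers in field_layout.items():
    print(soil, fertilizers)
```

Passing an explicit `random.Random` object keeps the randomization reproducible, which is useful when the allocation has to be documented before the experiment starts.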
Assumptions of ANOVA
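While carrying out ANOVA, one must keep in mind the following assumptions:
1) Independence: the observations are independent of one another, both within and across groups.
2) Normality: the dependent variable is normally distributed within each group.
3) Homogeneity of variance: all groups have the same population variance.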
Although these assumptions are needed to derive the theoretical results of ANOVA, in practice few real-life datasets satisfy all of them exactly. In such cases, the user may still choose to carry out ANOVA, relying on its robustness to moderate violations, while keeping in mind that the results are then only approximate.
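Before relying on ANOVA, the assumptions can be checked. Normality is usually judged from a Q-Q plot of the residuals or with a test such as Shapiro-Wilk (available as `scipy.stats.shapiro`). One common check of the equal-variance assumption is Levene's test, which is simply a one-way ANOVA F statistic computed on the absolute deviations of each observation from its group's centre. The sketch below computes Levene's statistic W using only the standard library. The function name is hypothetical, and W would still have to be compared with an F distribution with k - 1 and N - k degrees of freedom to obtain a p-value.

```python
from statistics import fmean, median


def levene_statistic(groups, center="mean"):
    """Levene's W for a list of groups (each a list of observations).

    center="mean" is Levene's original test; center="median" gives the
    Brown-Forsythe variant. A large W suggests the group variances differ.
    """
    if center not in ("mean", "median"):
        raise ValueError("center must be 'mean' or 'median'")
    locate = fmean if center == "mean" else median
    # Absolute deviations from each group's own centre
    z = []
    for g in groups:
        c = locate(g)
        z.append([abs(y - c) for y in g])
    k = len(z)
    n_total = sum(len(g) for g in z)
    z_means = [fmean(g) for g in z]
    z_grand = fmean([v for g in z for v in g])
    # One-way ANOVA on the deviations: between and within sums of squares
    between = sum(len(g) * (m - z_grand) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    if within == 0:
        raise ValueError("deviations are constant within every group; W is undefined")
    return (n_total - k) / (k - 1) * between / within


print(levene_statistic([[1, 2, 3], [11, 12, 13]]))  # same spread in both groups
print(levene_statistic([[1, 2, 3], [0, 10, 20]]))   # second group far more spread out
```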
One-Way ANOVA
One-way ANOVA involves one independent categorical variable and one dependent continuous variable. This technique is used to analyse whether the mean of the dependent variable differs significantly across the categories of the independent variable. It involves dividing the total variation in the dependent variable into the explained variation and the unexplained variation.
The explained variation is due to the application of the different treatments. The unexplained variation is the variation that cannot be attributed to the treatments. It may be due to experimental error or sampling error.
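Suppose the independent variable has $k$ classes, and $y$ represents the dependent variable. Consider the following notations:
$y_{ij}$: the $j$-th observation in the $i$-th class, for $i = 1, \dots, k$ and $j = 1, \dots, n_i$
$n_i$: the number of observations in the $i$-th class
$\bar{y}_{i.}$: the mean of the observations in the $i$-th class
$\bar{y}_{..}$: the overall (grand) mean of all observations
The total sum of squares then splits as follows: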
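$$
\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_{..}\right)^2
= \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(\bar{y}_{i.}-\bar{y}_{..}\right)^2
+ \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_{i.}\right)^2
$$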
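The left-hand side is the Total Sum of Squares. The first term on the right-hand side is called the Treatment Sum of Squares, or the Between Sum of Squares, and the second term on the right-hand side is called the Error Sum of Squares, or the Within Sum of Squares. Because the first term does not depend on $j$, it simplifies to $\sum_{i=1}^{k} n_i\left(\bar{y}_{i.}-\bar{y}_{..}\right)^2$.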
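The decomposition can be checked numerically. The minimal sketch below, using only the standard library, computes the three sums of squares for groups of any size and confirms that Total = Between + Within. The function name and the toy scores are hypothetical and are not the data behind this document's example.

```python
from statistics import fmean


def one_way_sums_of_squares(groups):
    """Return (total, between, within) sums of squares.

    groups[i] holds the observations y_ij of class i (group sizes n_i may differ).
    """
    all_obs = [y for g in groups for y in g]
    grand_mean = fmean(all_obs)               # y-bar_..
    group_means = [fmean(g) for g in groups]  # y-bar_i.
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
    # The sum over j of (y-bar_i. - y-bar_..)^2 is n_i times the same square
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    return ss_total, ss_between, ss_within


# Toy data: hypothetical scores for k = 3 classes of unequal size
scores = [[62, 70, 68, 75], [80, 85, 78], [55, 60, 58, 65, 61]]
total, between, within = one_way_sums_of_squares(scores)
print(f"Total = {total:.2f}, Between = {between:.2f}, Within = {within:.2f}")
print("Decomposition holds:", abs(total - (between + within)) < 1e-9)
```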
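Using these values, our objective is to test the null hypothesis $H_0$ against the alternative hypothesis $H_1$, where, with $\mu_i$ denoting the population mean of the dependent variable in class $i$,
$H_0$: $\mu_1 = \mu_2 = \dots = \mu_k$ (all class means are equal)
$H_1$: at least two of the class means differ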
As we see from the table, the F value is high, which indicates that we should reject the
null hypothesis.
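In the standard one-way ANOVA, the F value in such a table is the ratio of the between-group and within-group mean squares, each obtained by dividing a sum of squares by its degrees of freedom. Here class $i$ has $n_i$ observations and $N$ is the total number of observations:

$$
F = \frac{MS_{\text{Between}}}{MS_{\text{Within}}}
  = \frac{\dfrac{1}{k-1}\sum_{i=1}^{k} n_i\left(\bar{y}_{i.}-\bar{y}_{..}\right)^2}
         {\dfrac{1}{N-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_{i.}\right)^2},
  \qquad N = \sum_{i=1}^{k} n_i
$$

Under $H_0$, and when the assumptions hold, $F$ follows an F distribution with $k-1$ and $N-k$ degrees of freedom. A large F means the group means are spread out far more than within-group variation alone would explain.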
Another way of testing the hypothesis is the p-value. The p-value is the probability, computed assuming that the null hypothesis is true, of obtaining an F value at least as large as the one observed. It is not the probability that the null hypothesis is true. Usually, we use a 5% level of significance. This means that if the p-value is less than 5%, we reject the null hypothesis; otherwise, we fail to reject it. However, the desired level of significance can change depending on our requirements, and correspondingly, our inference and our decision to reject or not reject the null hypothesis may change.
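Statistical software reports this p-value directly. For example, SciPy's `scipy.stats.f.sf` gives the upper-tail probability of the F distribution. As a self-contained sketch that uses only the Python standard library, the same tail probability can be computed from the regularized incomplete beta function, here evaluated with a standard continued-fraction expansion (modified Lentz method). The function names are hypothetical, and a tested library routine would normally be preferred.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), the CDF of a Beta(a, b) variable at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use whichever side converges quickly, via I_x(a, b) = 1 - I_{1-x}(b, a)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_survival(f_value, df1, df2):
    """P(F > f_value) for F ~ F(df1, df2): the p-value of an ANOVA F test."""
    if f_value <= 0.0:
        return 1.0
    x = df2 / (df2 + df1 * f_value)
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, x)


def f_test_decision(f_value, df1, df2, alpha=0.05):
    """Return the p-value and the decision at significance level alpha."""
    p_value = f_survival(f_value, df1, df2)
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return p_value, decision


# Hypothetical F value with k - 1 = 2 and N - k = 27 degrees of freedom
print(f_test_decision(5.2, 2, 27))
```

Changing `alpha` changes only the decision, not the p-value, which is why the inference can depend on the chosen level of significance.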
In the above example, the p-value is extremely low, so we can safely reject the null hypothesis at the 5% level of significance. Our final inference is that not all study groups have the same mean test score in Mathematics. In other words, study group has a significant effect on the Maths score.
Two-way ANOVA
In two-way ANOVA, we have two independent categorical variables and one continuous dependent variable. Along with testing the equality of means across the levels of each categorical variable, two-way ANOVA also tests whether there is a significant interaction between the two independent variables, that is, whether the effect of one of them on the dependent variable depends on the level of the other.
The calculations and hypothesis tests are similar to those of one-way ANOVA. The only difference is that there are additional terms: the sum of squares for the second categorical variable and the interaction sum of squares.
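For a balanced design, where every combination of levels of the two variables has the same number of observations, these sums of squares have simple closed forms. The minimal sketch below, using only the standard library, computes them and confirms that they add up to the total sum of squares. The function name and the toy numbers are hypothetical. Unbalanced designs need more care, because the individual sums of squares then depend on the order or type of decomposition used.

```python
from itertools import product
from statistics import fmean


def two_way_sums_of_squares(cells):
    """cells maps (level_of_A, level_of_B) -> list of observations in that cell.

    Assumes a balanced design: every combination of levels is present and has
    the same number of observations n.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    sizes = {len(cells.get(key, [])) for key in product(a_levels, b_levels)}
    if len(sizes) != 1 or 0 in sizes:
        raise ValueError("every (A, B) cell needs the same, non-zero number of observations")
    n = sizes.pop()
    p, q = len(a_levels), len(b_levels)

    all_obs = [y for obs in cells.values() for y in obs]
    grand = fmean(all_obs)
    mean_a = {a: fmean([y for b in b_levels for y in cells[(a, b)]]) for a in a_levels}
    mean_b = {b: fmean([y for a in a_levels for y in cells[(a, b)]]) for b in b_levels}
    mean_ab = {key: fmean(obs) for key, obs in cells.items()}

    ss_a = q * n * sum((mean_a[a] - grand) ** 2 for a in a_levels)
    ss_b = p * n * sum((mean_b[b] - grand) ** 2 for b in b_levels)
    # Interaction: what is left of each cell mean after removing both main effects
    ss_ab = n * sum((mean_ab[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
                    for a, b in product(a_levels, b_levels))
    ss_error = sum((y - mean_ab[key]) ** 2 for key, obs in cells.items() for y in obs)
    ss_total = sum((y - grand) ** 2 for y in all_obs)
    return {"A": ss_a, "B": ss_b, "AB": ss_ab, "Error": ss_error, "Total": ss_total}


# Toy data: 2 study groups x 2 preparation levels, 2 observations per cell
toy = {
    ("Group 1", "Low"): [1, 3], ("Group 1", "High"): [5, 7],
    ("Group 2", "Low"): [2, 4], ("Group 2", "High"): [10, 12],
}
ss = two_way_sums_of_squares(toy)
print(ss)
print("Parts add up:", abs(ss["A"] + ss["B"] + ss["AB"] + ss["Error"] - ss["Total"]) < 1e-9)
```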
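If we have two categorical variables A and B, then, for a balanced design (the same number of observations in every combination of levels), the total sum of squares can be written as
$$
SS_{\text{Total}} = SS_A + SS_B + SS_{AB} + SS_E
$$
where $SS_{AB}$ is the interaction sum of squares and $SS_E$ is the error (within) sum of squares.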
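For the example, with study group and test preparation level as the two categorical variables and the Maths score as the dependent variable, the null hypotheses are:
$H_{01}$: the mean Maths score is the same across all study groups
$H_{02}$: the mean Maths score is the same across all test preparation levels
$H_{03}$: there is no interaction effect between study group and test preparation on the Maths score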
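The corresponding alternative hypotheses are:
$H_{11}$: the mean Maths score differs across at least two study groups
$H_{12}$: the mean Maths score differs across at least two test preparation levels
$H_{13}$: there is an interaction effect between study group and test preparation on the Maths score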
As we see, the p-values for both Study Group and Test Preparation are very small, indicating that the Maths score differs significantly between different Study Groups and between different Test Preparation levels.
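Each of these p-values comes from its own F ratio. For a balanced fixed-effects design with $p$ levels of A, $q$ levels of B and $n$ observations per cell, where $SS_A$, $SS_B$, $SS_{AB}$ and $SS_E$ are the sums of squares for A, B, their interaction and error, the standard ratios divide each effect's mean square by the error mean square:

$$
F_A = \frac{SS_A/(p-1)}{SS_E/\big(pq(n-1)\big)}, \qquad
F_B = \frac{SS_B/(q-1)}{SS_E/\big(pq(n-1)\big)}, \qquad
F_{AB} = \frac{SS_{AB}/\big((p-1)(q-1)\big)}{SS_E/\big(pq(n-1)\big)}
$$

Each ratio is compared with an F distribution whose numerator degrees of freedom are those of the effect and whose denominator degrees of freedom are $pq(n-1)$.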
Advantages
1) ANOVA is fairly robust to moderate violations of its assumptions, particularly when the groups are of equal size.
2) ANOVA tests the equality of several means in a single test, so the overall Type I error rate stays at the chosen level of significance; running a separate t-test for every pair of groups would inflate it.
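A quick calculation shows how fast the error rate grows if every pair of groups is tested separately. The sketch below treats the pairwise tests as independent, which they are not exactly, so the figures are only an approximation meant to show the trend. The function name is hypothetical.

```python
from math import comb


def familywise_error_rate(num_groups, alpha=0.05):
    """Approximate chance of at least one false rejection across all pairwise tests.

    Assumes the comparisons are independent (an approximation, since pairwise
    tests that share groups are correlated).
    """
    num_tests = comb(num_groups, 2)
    return 1 - (1 - alpha) ** num_tests


for k in (2, 3, 5, 10):
    print(f"{k} groups: {comb(k, 2)} pairwise tests, "
          f"approx. familywise error {familywise_error_rate(k):.3f}")
```

A single ANOVA F test keeps this rate at `alpha` no matter how many groups are compared.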
Limitations
1) It assumes that the population distribution within each group is normal and that all groups have equal variances, which may not hold in practice.
2) A one-way ANOVA can indicate that at least two groups differ from each other; however, it does not identify which groups differ. If $H_0$ is rejected, finding out which groups have different means requires a post hoc procedure, such as Fisher’s LSD or pairwise t-tests.
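Fisher's LSD is usually applied only after the overall F test rejects $H_0$. It compares each pair of group means with a t statistic that uses the pooled within-group mean square from the ANOVA, on N - k degrees of freedom. The sketch below computes these statistics with the standard library. The function name and the toy data are hypothetical, and the critical value (for example from a t table, or from `scipy.stats.t.ppf(1 - alpha / 2, df)`) still has to be supplied to decide which pairs differ.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def fisher_lsd_t_statistics(groups):
    """Pairwise t statistics that use the pooled within-group mean square (MSW).

    groups maps a group label to its list of observations. Returns the
    statistics for every pair and the error degrees of freedom N - k.
    """
    k = len(groups)
    n_total = sum(len(obs) for obs in groups.values())
    means = {g: fmean(obs) for g, obs in groups.items()}
    ss_within = sum((y - means[g]) ** 2 for g, obs in groups.items() for y in obs)
    ms_within = ss_within / (n_total - k)
    stats = {}
    for g1, g2 in combinations(groups, 2):
        # Standard error of the difference in means, using the pooled MSW
        se = sqrt(ms_within * (1 / len(groups[g1]) + 1 / len(groups[g2])))
        stats[(g1, g2)] = (means[g1] - means[g2]) / se
    return stats, n_total - k


toy_groups = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
lsd_stats, df_error = fisher_lsd_t_statistics(toy_groups)
for pair, t_value in lsd_stats.items():
    print(pair, round(t_value, 3), "df =", df_error)
```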
Write to us at
USA Address
Analyttica Datalab Inc.
1007 N. Orange St, Floor-4,
Wilmington, Delaware - 19801
Tel: +1 917 300 3289/3325
India Address
Analyttica Datalab Pvt. Ltd.
702, Brigade IRV Centre, 2nd Main Rd,
Nallurhalli,
Whitefield, Bengaluru - 560066.
Tel : +91 80 4650 7300