Analysis of Variance

This document provides an introduction to Analysis of Variance (ANOVA), covering key concepts such as null and alternative hypotheses, types of errors, and the assumptions required for ANOVA. It explains the procedure for one-way and two-way ANOVA, including the calculation of the F-statistic and the interpretation of results. Additionally, it includes examples and practice exercises to illustrate the application of ANOVA in testing for differences in population means.

Uploaded by

harold chisunzi
Copyright: © All Rights Reserved

Unit 2

Introduction to Analysis of Variance (ANOVA)

C. Chafuwa, April 2019


Null and Alternative hypothesis

• In hypothesis testing we begin by making a tentative assumption about a population parameter. This tentative assumption is called the null hypothesis and is denoted by H0.
• We then define another hypothesis, called the alternative hypothesis, which is the opposite of what is stated in the null hypothesis. The alternative hypothesis is denoted by Ha.

Some concepts
• Type I Error: Rejecting the null hypothesis when it is true.
• Type II Error: Accepting the null hypothesis when it is false.
• Level of significance (α): the probability of committing a Type I Error.
• β is the probability of committing a Type II Error.
• Power (1 - β) is the probability of rejecting the null hypothesis when it is in fact false.
• P-value: a probability that measures the evidence against the null hypothesis provided by the sample. Smaller p-values indicate more evidence against H0.

Errors and correct conclusions in hypothesis testing

                 Population condition
Conclusion       H0 True               Ha True
Accept H0        Correct conclusion    Type II Error
Reject H0        Type I Error          Correct conclusion

ANOVA
• ANOVA – Analysis of Variance
• ANOVA can be used to test for the equality of k population means using data obtained from a completely randomized design as well as data obtained from an observational study.
• ANOVA is the statistical procedure used to determine whether the observed differences in three or more sample means are large enough to reject H0.

Introduction to ANOVA
• The null hypothesis is that the several population means are mutually equal.
• The sampling procedure is to collect several independent random samples, one for each of the data categories (treatment levels).
• The assumption underlying the use of the analysis of variance is that the several samples were obtained from normally distributed populations having the same variance.

Assumptions for ANOVA
Three assumptions are required to use analysis of variance.
1. For each population, the response variable is normally distributed.
– Populations follow a normal distribution.
2. The variance of the response variable, denoted σ², is the same for all of the populations.
– σ1² = σ2² = σ3² = … = σk²
– This means the populations have equal standard deviations.
3. The samples are drawn independently from each population, so the observations are independent.

When these conditions are met, the F-statistic is used as the test statistic.

Test statistic for ANOVA

Recall:
• To compare the means of 2 groups we use a z or a t statistic.
– Compare the means from two groups to see whether they are so far apart that the observed difference cannot reasonably be attributed to sampling variability.
• To compare the means of more than 2 groups, we use a new test called ANOVA and a new statistic called the F-statistic.
– Compare the means from two or more groups to see whether they are so far apart that the observed differences cannot all reasonably be attributed to sampling variability.

Between- and within-group variability

• Variability between groups: the amount of variation among sample means due to assigned causes or treatments.
• Variability within groups: the amount of variation within the sample observations due to error (or unexplained chance causes).

What are the characteristics of the F distribution?
• The F distribution is continuous. This means that it can assume an infinite number of values between zero and positive infinity.
• The F distribution cannot be negative. The smallest value F can assume is 0.
• It is positively skewed. The long tail of the distribution is to the right-hand side. As the number of degrees of freedom increases in both the numerator and the denominator, the distribution approaches a normal distribution.
• It is asymptotic. As the value of F increases, the F curve approaches the horizontal axis but never touches it. This is similar to the behavior of the normal distribution.

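The characteristics listed above can be checked numerically. The sketch below evaluates the standard F density using only the Python standard library. The function name `f_pdf` and the chosen degrees of freedom (2 and 9) are illustrative assumptions, not part of the slides.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F distribution with d1 (numerator) and d2 (denominator) df.

    For simplicity this sketch returns 0 for x <= 0, since F cannot be negative.
    """
    if x <= 0:
        return 0.0
    # log of the Beta function B(d1/2, d2/2), computed via log-gamma for stability
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = ((d1 / 2) * math.log(d1 / d2)
               + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
               - log_beta)
    return math.exp(log_pdf)


# Illustrative degrees of freedom (assumed): 2 in the numerator, 9 in the denominator
for x in (0.5, 1.0, 5.0, 50.0):
    print(f"F = {x:5.1f}  density = {f_pdf(x, 2, 9):.6g}")
```

With these degrees of freedom the density stays positive but keeps shrinking as F grows, which is the long right tail and the asymptotic behavior described in the bullets.
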
Two types of ANOVA

One-way ANOVA
• The different populations are classified according to one
attribute or factor
– E.g. levels of crop production classified according to
type of fertilizer used

Two-way ANOVA
• The different populations are classified according to 2
attributes or factors
– E.g. levels of crop production classified according to
type of fertilizer used and seed type

One-way ANOVA
• Classification of populations is based on a single factor, attribute or treatment.
• To make inferences about several population means based on the sample data, the hypotheses for k population means are:
– H0: μ1 = μ2 = · · · = μk
– Ha: Not all k means are equal

What are the steps for testing the equality of means using the one-way ANOVA procedure?
• Step 1: State the null and alternative hypotheses as follows:
– H0: μ1 = μ2 = · · · = μk
– Ha: Not all k means are equal.
• Step 2: Use the F-distribution table and the level of significance, α, to determine the rejection region.
• Step 3: Build the ANOVA table, and from the table determine the computed value of the F-ratio.
• Step 4: State your conclusion. The null hypothesis is rejected if the computed value of the test statistic falls in the rejection region. Otherwise, the null hypothesis is not rejected.

Testing procedure
• Estimate the population variance from the variation between the sample means (sum of squares between treatments, SSTR).
• Treatment variation: the sum of squared differences between each treatment mean and the grand (overall) mean, weighted by sample size.
– Calculate the mean of each sample.
– Calculate the grand mean.
– Calculate the difference between each sample mean and the grand mean, square the difference, multiply by that sample's size, and sum over the k samples. This gives SSTR, the sum of squares between sample means.
– Mean square between treatments: MSTR = SSTR/(k - 1)

Testing procedure cont’d

• Estimate the population variance from the variation within the samples.
• Random variation: the sum of squared differences between each observation and its treatment mean.
– Calculate the mean of each sample.
– Calculate the difference between each observation and the mean of its own sample, for all k samples.
– Square these differences and sum them to obtain the sum of squares within samples (SSE, the error sum of squares).
– Compute the within-samples variance estimate: MSE = SSE/(n - k), where n is the total number of observations.

Testing procedure cont’d

• Calculate the F-ratio (F-test statistic) from the two estimates of the population variance:
– F = MSTR/MSE
• Using the calculated F-value, make a decision on the null hypothesis (no difference among the means).
• Decision rule:
– If Fcal > Ftab, reject the null hypothesis.
– Ftab is read from the F-table at the given level of significance, with k - 1 and n - k degrees of freedom.

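The quantities in the testing procedure can be written compactly as follows. The notation is an assumption introduced here for illustration and does not appear in the slides: x_ij is the i-th observation in sample j, n_j is the size of sample j, x̄_j is the mean of sample j, x̄ is the grand mean, k is the number of samples, and n is the total number of observations.

```latex
\begin{aligned}
\mathrm{SSTR} &= \sum_{j=1}^{k} n_j \left(\bar{x}_j - \bar{x}\right)^2,
  & \mathrm{MSTR} &= \frac{\mathrm{SSTR}}{k-1}, \\
\mathrm{SSE} &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_j\right)^2,
  & \mathrm{MSE} &= \frac{\mathrm{SSE}}{n-k}, \\
F &= \frac{\mathrm{MSTR}}{\mathrm{MSE}},
  & &\text{with } (k-1,\; n-k) \text{ degrees of freedom.}
\end{aligned}
```
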
Alternative short-cut method

• Calculate the total of the observations in each sample.
• Calculate the correction factor (CF).
• Calculate the total sum of squares (SST).
– Total variation: sum of squared differences between each observation and the overall mean
• Calculate the treatment sum of squares (between samples) (SSTR) and MSTR.
• Calculate the error sum of squares (within samples) (SSE) and MSE.
• Compute F with k - 1 and n - k degrees of freedom.

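The short-cut steps can be sketched in Python as below. The slides do not define the correction factor, so this sketch assumes the usual definition CF = (grand total)² / n, with SST and SSTR computed from raw squares and sample totals. The function name `one_way_anova_shortcut` and the data are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def one_way_anova_shortcut(samples):
    """One-way ANOVA from sample totals and a correction factor (CF).

    Assumes CF = (grand total)^2 / n, which the slides do not spell out.
    """
    k = len(samples)                           # number of samples (treatments)
    n = sum(len(s) for s in samples)           # total number of observations
    totals = [sum(s) for s in samples]         # total of each sample
    cf = sum(totals) ** 2 / n                  # correction factor
    sst = sum(x * x for s in samples for x in s) - cf                    # total SS
    sstr = sum(t * t / len(s) for t, s in zip(totals, samples)) - cf     # between samples
    sse = sst - sstr                           # within samples, by subtraction
    mstr = sstr / (k - 1)
    mse = sse / (n - k)
    return {"CF": cf, "SST": sst, "SSTR": sstr, "SSE": sse,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse, "df": (k - 1, n - k)}


# Hypothetical data: three samples of three observations each
samples = [[4, 6, 8], [7, 9, 11], [10, 12, 14]]
result = one_way_anova_shortcut(samples)
for name, value in result.items():
    print(name, value)
```

For this made-up data set, CF = 729, SST = 78, SSTR = 54 and SSE = 24, giving F = 27/4 = 6.75 with (2, 6) degrees of freedom. SSE is obtained by subtraction, which is what makes the method a short cut.
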
Example
Four NGOs were randomly selected in three districts in Malawi to test whether there is significant variation in how they help farmers adopt farm technologies. The following table records the number of farmers each NGO has reached in disseminating the farm technology.

          District 1   District 2   District 3
NGO1      20           15           16
NGO2      10           10           5
NGO3      12           15           20
NGO4      15           7            5

Example cont’d
• Formulate the hypotheses to test whether there are significant differences in the mean number of farmers reached in disseminating the farm technology.
• Set up the ANOVA table, clearly showing your calculations.
• Are there significant differences in the mean number of farmers reached? Use the 95% confidence level (α = 0.05).

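One possible way to check the hand calculations for this example is sketched below. The slides do not say which classification is the treatment, so the sketch assumes the three districts are the treatments (k = 3) and the four NGOs in each district are the observations (n = 12). Treating the NGOs as treatments instead would change the groups and the degrees of freedom. The critical value 4.26 is the usual F-table entry for α = 0.05 with (2, 9) degrees of freedom.

```python
from statistics import mean

# Data from the example table: rows are NGO1..NGO4, columns are District 1..3
table = [
    [20, 15, 16],
    [10, 10, 5],
    [12, 15, 20],
    [15, 7, 5],
]

# ASSUMPTION: districts are the treatments, so each column is one sample
samples = [list(col) for col in zip(*table)]

k = len(samples)                       # number of treatments
n = sum(len(s) for s in samples)       # total number of observations
grand_mean = mean(x for s in samples for x in s)

# Between-treatments and within-treatments sums of squares
sstr = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
sse = sum((x - mean(s)) ** 2 for s in samples for x in s)

mstr = sstr / (k - 1)
mse = sse / (n - k)
f_cal = mstr / mse
F_TAB = 4.26                           # F table, alpha = 0.05, df = (2, 9)

# ANOVA table
print(f"{'Source':<12}{'SS':>8}{'df':>4}{'MS':>9}{'F':>7}")
print(f"{'Treatments':<12}{sstr:>8.2f}{k - 1:>4}{mstr:>9.2f}{f_cal:>7.2f}")
print(f"{'Error':<12}{sse:>8.2f}{n - k:>4}{mse:>9.2f}")
print(f"{'Total':<12}{sstr + sse:>8.2f}{n - 1:>4}")
print("Reject H0" if f_cal > F_TAB else "Do not reject H0")
```

Under this assumption SSTR = 18.5 and SSE = 280.5, so F is about 0.30, well below 4.26, and H0 is not rejected at the 5% level.
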
Two-way ANOVA

• Variation within samples can be due to some measurable factor rather than pure error.
• SSE is partitioned into two parts:
– Unwanted variation due to some excluded variables
– Actual variation due to random error
• Therefore, investigate two factors of interest when testing the difference between sample means.
• Also consider interaction between the two factors under investigation (beyond the scope of this course).

Two-way ANOVA

• The two-way ANOVA tests the null hypotheses of equal means for each factor.
• E.g. suppose that an agricultural experiment consists of examining the yields per acre of 4 different varieties of wheat, where each variety is grown on 5 different plots of land.
– n = 20
– Yield differences can be due to either of the 2 factors or classifications: (1) the type of wheat grown, (2) the block (or plot) used.
– The 2 factors are referred to as treatments and blocks, or simply factor 1 and factor 2.
• Assume we have a treatments (factor 1) and b blocks (factor 2).
• It is supposed that there is one experimental value (such as yield per acre) corresponding to each treatment and block combination.

Practice exercise
Table 1 gives the daily earnings (in thousands of MK) of fresh graduates with bachelor’s degrees from 5 colleges, for 3 class rankings at graduation.

Test at the 5% level of significance that the means are identical
(a) for college populations and
(b) for class-ranking populations

Table 1

          Bunda   Chanco   Poly   CoM   Nursing
Top       20      18       16     14    12
Middle    19      16       13     12    8
Bottom    18      14       10     10    10

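A minimal sketch of how the practice exercise could be checked in Python follows. The slides do not give the two-way formulas, so this assumes the standard two-way layout with one observation per cell and no interaction: the total sum of squares is split into a college part, a class-ranking part and an error part, and the error has (rows - 1)(columns - 1) degrees of freedom. The variable names are illustrative, and the critical values 3.84 and 4.46 are the usual F-table entries for α = 0.05 with (4, 8) and (2, 8) degrees of freedom.

```python
colleges = ["Bunda", "Chanco", "Poly", "CoM", "Nursing"]
rankings = ["Top", "Middle", "Bottom"]

# Table 1: rows are class rankings, columns are colleges
earnings = [
    [20, 18, 16, 14, 12],
    [19, 16, 13, 12, 8],
    [18, 14, 10, 10, 10],
]

r, c = len(rankings), len(colleges)
n = r * c
grand_total = sum(sum(row) for row in earnings)
cf = grand_total ** 2 / n                                   # correction factor

sst = sum(x * x for row in earnings for x in row) - cf      # total SS
ss_rank = sum(sum(row) ** 2 for row in earnings) / c - cf   # between class rankings
ss_college = sum(sum(col) ** 2 for col in zip(*earnings)) / r - cf  # between colleges
sse = sst - ss_rank - ss_college                            # error SS, by subtraction

df_rank, df_college = r - 1, c - 1
df_error = df_rank * df_college
mse = sse / df_error

f_college = (ss_college / df_college) / mse
f_rank = (ss_rank / df_rank) / mse

F_TAB_COLLEGE = 3.84   # F table, alpha = 0.05, df = (4, 8)
F_TAB_RANK = 4.46      # F table, alpha = 0.05, df = (2, 8)

print(f"(a) colleges:       F = {f_college:.2f}, reject H0: {f_college > F_TAB_COLLEGE}")
print(f"(b) class rankings: F = {f_rank:.2f}, reject H0: {f_rank > F_TAB_RANK}")
```

Under these assumptions the sums of squares are 150 for colleges, 33.6 for class rankings and 10.4 for error, giving F values of about 28.85 and 12.92. Both exceed their critical values, so both null hypotheses would be rejected at the 5% level.
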
