unit5
ANALYSIS OF VARIANCE
F-test – ANOVA – estimating effect size – multiple comparisons – case studies – Analysis of
variance with repeated measures – Two-factor experiments – three F-tests – two-factor ANOVA –
other types of ANOVA – Introduction to chi-square tests
5.1 F-TEST
The F-test is a statistical test used to compare the variances (or standard deviations) of two or more
groups. There are two main types of F-test: one for the equality of two variances and another for the
equality of more than two variances. This section covers the F-test for the equality of two variances.
Consider two independent samples for which we want to test whether the population variances
are equal. The null hypothesis (H0) is that the variances are equal, and the alternative
hypothesis (H1) is that the variances are not equal.
EXAMPLE:
1. Suppose you have two independent samples, Sample A and Sample B, and you want to test
whether their variances are equal. The following are the sample variances calculated from each
sample:
If the p-value is less than 0.05 (assuming a significance level of 0.05), reject the null hypothesis. If
using critical values, compare the calculated F-statistic to the critical F-value.
7. Interpret Results:
If the null hypothesis is rejected, conclude that there is evidence to suggest that the variances are
not equal.
2. Test whether there is any significant difference between the variances of the populations from
which the following samples are taken:
Sample I: 20 16 26 27 23 22
Sample II: 27 33 42 35 34 38
Solution: Given n1 = 6, n2 = 6
          x1     x1²      x2     x2²
          20     400      27     729
          16     256      33    1089
          26     676      42    1764
          27     729      35    1225
          23     529      34    1156
          22     484      38    1444
Total    134    3074     209    7407
$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{134}{6} = 22.33$
$\bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{209}{6} = 34.83$
$S_1^2 = \frac{\sum x_1^2}{n_1} - (\bar{x}_1)^2 = \frac{3074}{6} - (22.33)^2 = 13.70$
$S_2^2 = \frac{\sum x_2^2}{n_2} - (\bar{x}_2)^2 = \frac{7407}{6} - (34.83)^2 = 21.37$
$S_2^2 > S_1^2$
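As a cross-check on the arithmetic for Sample I and Sample II, the variances and their ratio can be computed with a short Python sketch. It assumes the common convention of placing the larger variance estimate in the numerator. The names `variance_ratio`, `sample_1`, and `sample_2` are illustrative and do not come from these notes. Because the script does not round the means to two decimals, its variances differ slightly from the hand-computed values.

```python
from statistics import pvariance, variance

sample_1 = [20, 16, 26, 27, 23, 22]
sample_2 = [27, 33, 42, 35, 34, 38]


def variance_ratio(a, b):
    """Return (F, df_num, df_den), larger unbiased variance in the numerator.

    Assumption: the unbiased estimates (divide by n - 1) are used for F.
    """
    var_a, var_b = variance(a), variance(b)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1


# Variances as in the hand calculation: sum(x^2)/n - mean^2 (divide by n).
s1_sq = pvariance(sample_1)
s2_sq = pvariance(sample_2)

f_stat, df_num, df_den = variance_ratio(sample_1, sample_2)
print(f"s1^2 = {s1_sq:.2f}, s2^2 = {s2_sq:.2f}")
print(f"F = {f_stat:.3f} with ({df_num}, {df_den}) degrees of freedom")
```

The resulting F would then be compared with the tabulated critical value for those degrees of freedom at the chosen significance level.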
Analysis of Variance (ANOVA) is a statistical method used to compare means among three or
more groups. ANOVA tests whether there are any statistically significant differences between the means
of the groups. It does this by partitioning the total variability in the data into two components: variability
between groups and variability within groups.
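In symbols, this partition is the standard identity below, written for k groups with n_j observations in group j. The notation (x_{ij} for an observation, \bar{x}_j for a group mean, \bar{x} for the grand mean) is chosen here for illustration and is not taken from these notes.

```latex
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2}_{\text{total}}
= \underbrace{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2}_{\text{between groups}}
+ \underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2}_{\text{within groups}}
```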
Here are the key concepts and steps involved in conducting an ANOVA:
7. Degrees of Freedom:
8. Mean Squares:
9. F-Statistic:
If the p-value is less than the significance level, reject the null hypothesis. If using critical values,
compare the calculated F-statistic to the critical F-value.
EXAMPLE :
1. The following are the numbers of mistakes made on 5 successive days by 4 technicians working
for a photographic laboratory.
Technician
    I    II   III    IV
    6    14    10     9
   14     9    12    12
   10    12     7     8
    8    10    15    10
   11    14    11    11
Test at the level of significance α = 0.01 whether the differences among the four sample means
can be attributed to chance.
Solution:
Code the data by subtracting 10 from each value:
         X1    X2    X3    X4   Total   X1²   X2²   X3²   X4²
         -4     4     0    -1     -1     16    16     0     1
          4    -1     2     2      7     16     1     4     4
          0     2    -3    -2     -3      0     4     9     4
         -2     0     5     0      3      4     0    25     0
          1     4     1     1      7      1    16     1     1
Total    -1     9     5     0     13     37    37    39    10
Step 1: N = 20
Step 2: T = 13
Step 3: $\frac{T^2}{N} = \frac{(13)^2}{20} = 8.45$
Step 4:
$TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \sum X_4^2 - \frac{T^2}{N}$
$= 37 + 37 + 39 + 10 - 8.45$
$= 114.55$
Step 5:
$SSC = \frac{(-1)^2}{N_1} + \frac{(9)^2}{N_1} + \frac{(5)^2}{N_1} + 0 - 8.45$
$= \frac{1}{5} + \frac{81}{5} + 5 - 8.45$
$= 0.2 + 16.2 + 5 - 8.45$
$= 12.95$
SSE = TSS - SSC
$= 114.55 - 12.95 = 101.6$
Step 6: ANOVA
Step 7: Conclusion:
∴ We accept H0.
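The sums of squares in this example can be reproduced with a short Python sketch. It works on the raw counts rather than the coded values, since subtracting a constant from every observation leaves all sums of squares unchanged. The function name `one_way_anova` and the dictionary keys are illustrative choices, not part of the original solution.

```python
# Mistakes on 5 successive days, one list per technician (I, II, III, IV).
groups = [
    [6, 14, 10, 8, 11],
    [14, 9, 12, 10, 14],
    [10, 12, 7, 15, 11],
    [9, 12, 8, 10, 11],
]


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, and F for a one-way layout."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    correction = sum(all_values) ** 2 / n_total            # T^2 / N
    tss = sum(x * x for x in all_values) - correction      # total SS
    ssc = sum(sum(g) ** 2 / len(g) for g in groups) - correction  # between columns
    sse = tss - ssc                                        # error (within) SS
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ssc / df_between) / (sse / df_within)
    return {
        "TSS": tss,
        "SSC": ssc,
        "SSE": sse,
        "df_between": df_between,
        "df_within": df_within,
        "F": f_stat,
    }


result = one_way_anova(groups)
for name, value in result.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

With these data the F ratio is below 1, which is consistent with accepting H0, because an upper-tail critical value of F at the 1% level is always greater than 1.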
Two-way ANOVA (Analysis of Variance) is a statistical method used to analyze the effects of two
independent categorical variables on a dependent variable. This type of ANOVA allows for the
examination of the interaction between the two factors and their individual effects on the outcome.
In the context of two-factor ANOVA, there are two main factors (independent variables) that are
being considered. Each factor has two or more levels, and the goal is to understand how these
factors, individually and in combination, affect the dependent variable.
The two factors are typically referred to as Factor A and Factor B. The levels or categories of
Factor A are crossed with the levels or categories of Factor B, resulting in different combinations or
cells. The response variable, or dependent variable, is measured for each combination of Factor A
and Factor B.
The two-way ANOVA model can be expressed as follows:
1. Perform a 2-way ANOVA on the data given below:
Treatment 2      1     2     3
1               30    26    38
2               24    29    28
3               33    24    35
4               36    31    30
5               27    35    33
Use the coding method, subtracting 30 from each given number. (A/M 2017)
Solution:
H0: There is no significant difference between the levels of treatment 2 (rows).
Code the data by subtracting 30 from each value.
          1     2     3   Total
1         0    -4     8      4
2        -6    -1    -2     -9
3         3    -6     5      2
4         6     1     0      7
5        -3     5     3      5
Total     0    -5    14      9 (T)
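Step 2: Correction factor (C.F) $= \frac{T^2}{N} = \frac{9^2}{15} = 5.4$
SSC (sum of squares between columns)
$= \frac{0^2}{5} + \frac{(-5)^2}{5} + \frac{(14)^2}{5} - \text{C.F}$
= 44.2 – 5.4
= 38.8
d.f = c - 1 = 3 - 1 = 2
SSR (sum of squares between rows)
$= \frac{4^2}{3} + \frac{(-9)^2}{3} + \frac{2^2}{3} + \frac{7^2}{3} + \frac{5^2}{3} - \text{C.F}$
= 58.3 – 5.4
= 52.9
TSS = 265.6
Step 6: SSE = Residual sum of squares
= 265.6 – (38.8 + 52.9)
= 173.9
Step 7: d.f = (c - 1)(r - 1) = (2)(4) = 8.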
ANOVA TABLE
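The quantities that an ANOVA table for this example would list can be cross-checked with the Python sketch below. It assumes one observation per cell (no replication), so the residual sum of squares serves as the error term. The function name `two_way_anova`, its `shift` parameter, and the dictionary keys are hypothetical names introduced for illustration. Each F ratio is computed as mean square over error mean square, and conventions for handling ratios below 1 vary between textbooks.

```python
# Rows: levels 1-5 of Treatment 2; columns: the three column levels.
data = [
    [30, 26, 38],
    [24, 29, 28],
    [33, 24, 35],
    [36, 31, 30],
    [27, 35, 33],
]


def two_way_anova(table, shift=0):
    """Two-way ANOVA without replication; `shift` is the coding constant."""
    coded = [[x - shift for x in row] for row in table]
    r, c = len(coded), len(coded[0])
    total = sum(sum(row) for row in coded)
    cf = total ** 2 / (r * c)                                  # correction factor
    tss = sum(x * x for row in coded for x in row) - cf        # total SS
    ssr = sum(sum(row) ** 2 / c for row in coded) - cf         # between rows
    ssc = sum(sum(col) ** 2 / r for col in zip(*coded)) - cf   # between columns
    sse = tss - ssr - ssc                                      # residual SS
    df_r, df_c = r - 1, c - 1
    df_e = df_r * df_c
    mse = sse / df_e
    return {
        "CF": cf, "TSS": tss, "SSR": ssr, "SSC": ssc, "SSE": sse,
        "df_rows": df_r, "df_cols": df_c, "df_error": df_e,
        "F_rows": (ssr / df_r) / mse,
        "F_cols": (ssc / df_c) / mse,
    }


table = two_way_anova(data, shift=30)
for name, value in table.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

The unrounded row sum of squares is about 52.93, so the script's SSE is about 173.87 rather than the hand-rounded 173.9.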
1. Formulate Hypotheses: Define the null hypothesis (H0) and alternative hypothesis based on
the specific chi-square test.
2. Collect Data: Obtain data in the form of a contingency table or a set of observed frequencies.
3. Calculate Expected Frequencies: If required (e.g., for tests of independence or homogeneity),
calculate the expected frequencies for each cell under the assumption of independence or the
specified distribution.
4. Calculate Test Statistic: Compute the chi-square test statistic based on the observed and
expected frequencies.
5. Determine Critical Value or p-value: Compare the calculated test statistic to the critical value
from the chi-square distribution or obtain a p-value.
6. Make a Decision: If the p-value is less than the significance level (commonly 0.05), reject the null
hypothesis in favor of the alternative.
Chi-square tests are valuable tools in statistics for analyzing categorical data and detecting relationships
between variables. They are widely used in various fields, including social sciences, biology, market
research, and epidemiology.
The chi-square test for goodness of fit is a test to find whether the deviation of the experiment
from theory is just by chance or is due to the inadequacy of the theory to fit the observed data.
By this test, we test whether the differences between the observed and expected frequencies are
significant or not.
E - Expected frequency
EXAMPLE:
1. The following data give the number of aircraft accidents that occurred on the various days of a week.
Find whether the accidents are uniformly distributed over the week.
Days               Sun   Mon   Tue   Wed   Thu   Fri   Sat
No. of accidents    14    16     8    12    11     9    14
Solution:
(i) The parameter of interest is whether the accidents are uniformly distributed over the week.
(ii) Null Hypothesis H0: The accidents are uniformly distributed over the week.
(iii) Alternative Hypothesis H1: The accidents are not uniformly distributed over the week.
(iv) Level of significance α = 0.05, d.f = n − 1 = 7 − 1 = 6
(v) The test statistic is $\chi^2 = \sum \frac{(O-E)^2}{E}$
(vi) Reject H0 if $\chi^2 > 12.592$
(vii) Computation:
On the assumption H0, the expected number of accidents on each day is $\frac{84}{7} = 12$.
Observed frequency (O)   Expected frequency (E)   (O−E)²   (O−E)²/E
        14                       12                  4      0.3333
        16                       12                 16      1.3333
         8                       12                 16      1.3333
        12                       12                  0      0
        11                       12                  1      0.0833
         9                       12                  9      0.75
        14                       12                  4      0.3333
Total                                                       4.1667
$\chi^2 = \sum \frac{(O-E)^2}{E} = 4.1667$
(viii) Conclusion:
Here $\chi^2 = 4.1667 < 12.592$, so we accept H0 at the 5% level of significance.
Therefore the accidents are uniformly distributed over the week.
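The computation in this example can be checked with a few lines of Python. This is a minimal sketch that assumes a uniform expected distribution, as in the null hypothesis. The function name `chi_square_uniform` is illustrative, and the critical value 12.592 is taken from the example rather than computed.

```python
observed = [14, 16, 8, 12, 11, 9, 14]  # accidents, Sunday to Saturday


def chi_square_uniform(observed):
    """Return (chi-square statistic, expected count per category, d.f.)."""
    expected = sum(observed) / len(observed)   # equal share under H0
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return chi2, expected, len(observed) - 1


chi2, expected, dof = chi_square_uniform(observed)

CRITICAL_VALUE = 12.592  # 5% level, 6 d.f., as quoted in the example
reject_h0 = chi2 > CRITICAL_VALUE

print(f"expected per day = {expected}, d.f. = {dof}")
print(f"chi-square = {chi2:.4f}, reject H0: {reject_h0}")
```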
Chi-square goodness of fit test example
Let’s use the bags of candy as an example. We collect a random sample of ten bags. Each bag has 100
pieces of candy and five flavors. Our hypothesis is that the proportions of the five flavors in each bag are
the same.
Let’s start by answering: Is the Chi-square goodness of fit test an appropriate method to evaluate the
distribution of flavors in bags of candy?
Our categorical variable is the flavors of candy. We have the count of each flavor in 10 bags of
candy. We meet this requirement.
Each bag has 100 pieces of candy. Each bag has five flavors of candy. We expect to have equal
numbers for each flavor. This means we expect 100 / 5 = 20 pieces of candy in each flavor from
each bag. For 10 bags in our sample, we expect 10 x 20 = 200 pieces of candy in each flavor.
This is more than the requirement of five expected values in each category.
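The expected-count check for the candy example amounts to the small calculation sketched below. The variable names are illustrative, and the minimum of five expected values per category is the requirement stated in the text.

```python
bags = 10
pieces_per_bag = 100
flavors = 5

# Equal proportions: each flavor is expected to take an equal share of every bag.
expected_per_bag = pieces_per_bag / flavors
expected_per_flavor = bags * expected_per_bag

MIN_EXPECTED = 5  # requirement stated in the text
meets_requirement = expected_per_flavor >= MIN_EXPECTED

print(f"expected per flavor in one bag: {expected_per_bag}")
print(f"expected per flavor in the sample: {expected_per_flavor}")
print(f"meets the minimum expected count: {meets_requirement}")
```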