
UNIT – V

ANALYSIS OF VARIANCE

F-test – ANOVA – estimating effect size – multiple comparisons – case studies – Analysis of
variance with repeated measures – Two-factor experiments – three F-tests – two-factor ANOVA –
other types of ANOVA – Introduction to chi-square tests

5.1 F-TEST

The F-test is a statistical test used to compare the variances (or standard deviations) of two or more
groups. There are two main types of F-test: one for the equality of two variances and another for the
equality of more than two variances. This section explains the F-test for the equality of two variances.

F-Test for Equality of Two Variances:

Let's consider two independent samples, and we want to test whether the variances of these
samples are equal. The null hypothesis (H0) is that the variances are equal, and the alternative
hypothesis (H1) is that the variances are not equal.
EXAMPLE:

1. Suppose you have two independent samples, Sample A and Sample B, and you want to test
whether their variances are equal. The following are the sample variances calculated from each
sample:

 Sample A variance (sA²): 25

 Sample B variance (sB²): 20

Assuming a significance level (α) of 0.05, perform the F-test to determine whether there is a
significant difference in variances between the two samples. Compute the test statistic with the
larger variance in the numerator: F = sA²/sB² = 25/20 = 1.25. The degrees of freedom depend on
the sample sizes, which are not stated here.
Make a decision:

 If the p-value is less than 0.05 (the chosen significance level), reject the null hypothesis. If
using critical values, compare the calculated F-statistic to the critical F-value.

Interpret results:

 If the null hypothesis is rejected, conclude that there is evidence that the variances are
not equal.
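This decision procedure can be sketched in a few lines of plain Python. The sample sizes are not given in the example, so n = 21 per sample is assumed here purely for illustration, and the critical value is taken from a standard F table for those assumed degrees of freedom:

```python
# Two-variance F-test for the example above (illustrative sketch).
var_a, var_b = 25.0, 20.0
n_a, n_b = 21, 21                            # assumed sample sizes (not in the text)

F = max(var_a, var_b) / min(var_a, var_b)    # larger variance in the numerator
df1, df2 = n_a - 1, n_b - 1                  # (20, 20) under the assumption above

# F_0.025(20, 20) ≈ 2.46 for a two-sided test at α = 0.05
# (assumed table value; verify against your own F table).
F_CRIT = 2.46
decision = "reject H0" if F > F_CRIT else "fail to reject H0"
print(F, decision)                           # 1.25 fail to reject H0
```

With these assumed sample sizes the observed ratio 1.25 is well below the critical value, so the variances would not be judged significantly different.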
2. Test whether there is any significant difference between the variances of the populations from
which the following samples are taken:

Sample I:  20 16 26 27 23 22
Sample II: 27 33 42 35 34 38
Solution: Given n1 = 6, n2 = 6
x1      x1²      x2      x2²
20      400      27      729
16      256      33      1089
26      676      42      1764
27      729      35      1225
23      529      34      1156
22      484      38      1444
134     3074     209     7407

x̄1 = Σx1 / n1 = 134/6 = 22.33

x̄2 = Σx2 / n2 = 209/6 = 34.83

S1² = Σx1²/n1 − (x̄1)² = 3074/6 − (22.33)² = 13.70

S2² = Σx2²/n2 − (x̄2)² = 7407/6 − (34.83)² = 21.37

Since S2² > S1², the larger variance S2² is placed in the numerator.

(i) The parameters are σ1² and σ2².

(ii) Null hypothesis H0: σ1² = σ2² (there is no significant difference).

(iii) Alternative hypothesis H1: σ1² ≠ σ2².

(iv) Level of significance α = 0.05. Degrees of freedom: v1 = n1 − 1 = 5, v2 = n2 − 1 = 5.

(v) The test statistic is F = S2²/S1² (larger variance in the numerator).

(vi) Reject H0 if F > F(5, 5) = 5.05 (from the F table).

(vii) Computation:

F = S2²/S1² = 21.37/13.70 = 1.56

(viii) Conclusion

Since F = 1.56 < 5.05, we accept the null hypothesis H0: there is no significant difference between the population variances.
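The arithmetic in this solution can be checked with a short script (an illustrative sketch, not part of the original solution). It uses the same biased variance S² = Σx²/n − x̄² as the text; the text rounds the means to two decimals first, which is why its S² values differ slightly from these:

```python
# Numerical check of the two-sample variance F-test worked above.
sample1 = [20, 16, 26, 27, 23, 22]
sample2 = [27, 33, 42, 35, 34, 38]

def biased_variance(xs):
    """Biased (divide-by-n) variance, matching the formula used in the text."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean * mean

s1_sq = biased_variance(sample1)            # ≈ 13.56 (text: 13.70 after rounding)
s2_sq = biased_variance(sample2)            # ≈ 21.14 (text: 21.37 after rounding)
F = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)   # larger variance in the numerator
print(round(F, 2))                          # 1.56
```

The ratio agrees with the hand computation, and the conclusion (accept H0) is unchanged.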
5.2 ANOVA

Analysis of Variance (ANOVA) is a statistical method used to compare means among three or
more groups. ANOVA tests whether there are any statistically significant differences between the means
of the groups. It does this by partitioning the total variability in the data into two components: variability
between groups and variability within groups.

Here are the key concepts and steps involved in conducting an ANOVA:

1. Null and Alternative Hypotheses:

 Null Hypothesis (H0): The means of all groups are equal.


 Alternative Hypothesis (H1): At least one group mean is different from the others.
2. Data:

 Collect data from three or more groups.


3. Calculate Group Means:

 Calculate the mean for each group.


4. Calculate Total Sum of Squares (SST):

 SST measures the total variability in the data.

5. Calculate Between-Group Sum of Squares (SSB):

 SSB measures the variability between the group means.

6. Calculate Within-Group Sum of Squares (SSW):


 SSW measures the variability within each group.

7. Degrees of Freedom:

 Degrees of freedom for between-group variation: dfB = k − 1; for within-group variation: dfW = N − k,

where k is the number of groups and N is the total number of observations.

8. Mean Squares:

 Calculate the mean squares for between-group and within-group variation: MSB = SSB/dfB and MSW = SSW/dfW.

9. F-Statistic:

 Calculate the F-statistic: F = MSB/MSW.

10. Critical Value or P-Value:

 Determine the critical F-value or p-value for a given significance level.

11. Make a Decision:

 If the p-value is less than the significance level, reject the null hypothesis. If using critical values,
compare the calculated F-statistic to the critical F-value.

12. Interpret Results:


 If the null hypothesis is rejected, conclude that there is at least one group mean that is different
from the others.
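The steps above can be sketched as a short function in plain Python (an illustrative sketch; the three small groups at the end are made-up data, not from the text):

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)                            # number of groups
    N = sum(len(g) for g in groups)            # total observations
    grand_mean = sum(sum(g) for g in groups) / N

    # Between-group sum of squares: spread of group means around the grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_b, df_w = k - 1, N - k
    msb, msw = ssb / df_b, ssw / df_w          # mean squares
    return msb / msw, df_b, df_w

# Made-up example: three small groups
F, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(round(F, 2), df_b, df_w)                 # 21.0 2 6
```

The resulting F would then be compared with the critical F-value for (df_b, df_w) degrees of freedom, exactly as in steps 10 to 12.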

EXAMPLE :

1. The following are the numbers of mistakes made in 5 successive days by 4 technicians working
for a photographic laboratory (level of significance α = 0.01).

Technician
I II III IV
6 14 10 9
14 9 12 12
10 12 7 8
8 10 15 10
11 14 11 11
Test whether the differences among the four sample means can be attributed to chance.
Solution:

Null hypothesis H0: There is no significant difference between the technicians.

Alternative hypothesis H1: There is a significant difference between the technicians.

We shift the origin to 10.

X1    X2    X3    X4    Total    X1²    X2²    X3²    X4²
-4     4     0    -1     -1      16     16      0      1
 4    -1     2     2      7      16      1      4      4
 0     2    -3    -2     -3       0      4      9      4
-2     0     5     0      3       4      0     25      0
 1     4     1     1      7       1     16      1      1
-1     9     5     0     13      37     37     39     10
Step 1: N=20

Step 2: T=13

Step 3: T²/N = (13)²/20 = 8.45

Step 4:

TSS = ΣX1² + ΣX2² + ΣX3² + ΣX4² − T²/N

= 37 + 37 + 39 + 10 − 8.45

= 114.55

Step 5:

SSC = (ΣX1)²/N1 + (ΣX2)²/N1 + (ΣX3)²/N1 + (ΣX4)²/N1 − T²/N, where N1 is the number of elements in each column (here N1 = 5)

= (−1)²/5 + (9)²/5 + (5)²/5 + (0)²/5 − 8.45

= 0.2 + 16.2 + 5 + 0 − 8.45

= 12.95

SSE = TSS − SSC

= 114.55 − 12.95 = 101.6
Step 6: ANOVA table

Source of       Sum of        Degrees of         Mean square             Variance ratio             Table value
variation       squares       freedom                                                               at 1% level
Between         SSC = 12.95   c − 1 = 4 − 1 = 3  MSC = SSC/(c − 1)       Fc = MSE/MSC               FC(16, 3) = 5.29
columns                                          = 12.95/3 = 4.317       = 6.35/4.317 = 1.471
                                                                         (since MSC < MSE, the
                                                                         larger mean square MSE
                                                                         is the numerator)
Error           SSE = 101.6   N − c = 20 − 4     MSE = SSE/(N − c)
                              = 16               = 101.6/16 = 6.35
Total           114.55        19

Step 7: Conclusion:

Since the calculated Fc = 1.471 is less than the table value 5.29, we accept H0: the differences among the four sample means can be attributed to chance.
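The column sums of squares in this example can be verified directly from the raw data, without the origin shift (an illustrative sketch, not part of the original solution):

```python
# Numerical check of the technician one-way ANOVA worked above.
data = [
    [6, 14, 10, 8, 11],    # Technician I
    [14, 9, 12, 10, 14],   # Technician II
    [10, 12, 7, 15, 11],   # Technician III
    [9, 12, 8, 10, 11],    # Technician IV
]
N = sum(len(g) for g in data)                              # 20 observations
grand_mean = sum(sum(g) for g in data) / N
ssc = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in data)
sse = sum((x - sum(g) / len(g)) ** 2 for g in data for x in g)
msc = ssc / (len(data) - 1)                                # 12.95 / 3
mse = sse / (N - len(data))                                # 101.6 / 16
F = mse / msc         # larger mean square in the numerator, as in the text
print(round(ssc, 2), round(sse, 2), round(F, 3))           # 12.95 101.6 1.471
```

The values match the hand computation exactly, confirming that shifting the origin to 10 does not change the sums of squares.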

5.3 THREE F-TESTS

In a two-factor experiment, three separate F-tests are carried out: one for the main effect of Factor A, one for the main effect of Factor B, and one for the A × B interaction (the interaction test requires more than one observation per cell). Each F-ratio compares the corresponding mean square (MSA, MSB, or MSAB) with the error mean square MSE, and each is referred to the F table with the matching degrees of freedom.
5.4 TWO-FACTOR ANOVA

 Two-way ANOVA (Analysis of Variance) is a statistical method used to analyze the effects of two
independent categorical variables on a dependent variable. This type of ANOVA allows for the
examination of the interaction between the two factors and their individual effects on the outcome.
 In the context of two-factor ANOVA, there are two main factors (independent variables) that are
being considered. Each factor has two or more levels, and the goal is to understand how these
factors, individually and in combination, affect the dependent variable.
 The two factors are typically referred to as Factor A and Factor B. The levels or categories of
Factor A are crossed with the levels or categories of Factor B, resulting in different combinations or
cells. The response variable, or dependent variable, is measured for each combination of Factor A
and Factor B.
 The two-way ANOVA model (without replication) can be expressed as: Xij = μ + αi + βj + εij, where μ is the overall mean, αi is the effect of the i-th row level, βj is the effect of the j-th column level, and εij is the random error.
1. Perform a two-way ANOVA on the data given below:

                Treatment 1
Treatment 2     1     2     3
     1         30    26    38
     2         24    29    28
     3         33    24    35
     4         36    31    30
     5         27    35    33

Use the coding method, subtracting 30 from each value. (A/M 2017)
Solution:

Null hypothesis: There is no significant difference between the levels of treatment 1 (columns), and no significant difference between the levels of treatment 2 (rows).

Code the data by subtracting 30 from each value:

1 2 3 Total
1 0 -4 8 4
2 -6 -1 -2 -9
3 3 -6 5 2
4 6 1 0 7
5 -3 5 3 5
Total 0 -5 14 9(T)

Step 1: Grand total T = 9

Step 2: Correction factor (C.F.) = T²/N = 9²/15 = 5.4

Step 3: SSC = sum of squares between columns

= (0)²/5 + (−5)²/5 + (14)²/5 − C.F.

= 44.2 − 5.4

= 38.8

d.f. = c − 1 = 3 − 1 = 2

Step 4: SSR = sum of squares between rows

= (4)²/3 + (−9)²/3 + (2)²/3 + (7)²/3 + (5)²/3 − C.F.

= 58.3 − 5.4

= 52.9

d.f. = r − 1 = 5 − 1 = 4

Step 5: TSS = total sum of squares

= 0² + (−4)² + 8² + (−6)² + (−1)² + (−2)² + 3² + (−6)² + 5² + 6² + 1² + 0² + (−3)² + 5² + 3² − 5.4

= 271 − 5.4

TSS = 265.6
Step 6: SSE = residual sum of squares

= TSS − (SSC + SSR)

= 265.6 − (38.8 + 52.9)

= 173.9

Step 7: d.f. = (c − 1)(r − 1) = (2)(4) = 8

ANOVA TABLE

Source of        Sum of        Degrees of     Mean square          F-ratio
variation        squares       freedom
Between columns  SSC = 38.8    2              MSC = SSC/d.f.       Fc = MSE/MSC
                                              = 19.4               = 21.74/19.4 = 1.121
Between rows     SSR = 52.9    4              MSR = SSR/d.f.       FR = MSE/MSR
                                              = 13.23              = 21.74/13.23 = 1.643
Residual         SSE = 173.9   8              MSE = SSE/d.f.
                                              = 21.74

The tabulated value of F for (8, 2) d.f. at the 5% level of significance is 19.37. Since Fc = 1.121 < Ftab, we accept the null hypothesis H0.

That is, there is no significant difference between columns.

The tabulated value of F for (8, 4) d.f. at the 5% level of significance is 6.04. Since FR = 1.643 < Ftab, we accept the null hypothesis H0.

That is, there is no significant difference between rows.
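The coding-method computation above can be verified with a short script (an illustrative sketch, not part of the original solution), working directly on the coded table:

```python
# Two-way ANOVA sums of squares for the coded data (original values minus 30).
coded = [
    [0, -4, 8],
    [-6, -1, -2],
    [3, -6, 5],
    [6, 1, 0],
    [-3, 5, 3],
]
r, c = len(coded), len(coded[0])
N = r * c                                                # 15 observations
T = sum(sum(row) for row in coded)                       # grand total, 9
cf = T * T / N                                           # correction factor, 5.4

col_totals = [sum(row[j] for row in coded) for j in range(c)]
row_totals = [sum(row) for row in coded]

ssc = sum(t * t for t in col_totals) / r - cf            # between columns
ssr = sum(t * t for t in row_totals) / c - cf            # between rows
tss = sum(x * x for row in coded for x in row) - cf      # total
sse = tss - ssc - ssr                                    # residual
print(round(ssc, 1), round(ssr, 1), round(tss, 1), round(sse, 1))
```

The printed values (38.8, 52.9, 265.6, 173.9) agree with Steps 3 to 6, illustrating why coding by a constant leaves all the sums of squares unchanged.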

5.5 INTRODUCTION TO CHI-SQUARE TESTS


 Chi-square tests are a family of statistical tests that assess the association between categorical
variables. These tests are commonly used when dealing with data that can be organized into
contingency tables, where each cell in the table represents the frequency of a combination of
categories from two or more variables. There are different types of chi-square tests, each designed
for specific scenarios.
1. Chi-square Test of Independence:
 Objective: To determine if there is a significant association between two categorical
variables.
 Null Hypothesis (H0): There is no association between the variables; they are
independent.
 Test Statistic: The chi-square statistic (χ²) is calculated based on the differences
between observed and expected frequencies in each cell of the contingency table.
 Example: Assessing whether there is a significant association between gender and
smoking habits.
2. Chi-square Test for Goodness of Fit:
 Objective: To assess if observed categorical frequencies match the expected frequencies.
 Null Hypothesis (H0): The observed frequencies are consistent with the expected
frequencies.
 Test Statistic: The chi-square statistic is used to compare observed and expected
frequencies in a single categorical variable.
 Example: Checking whether the distribution of blood types in a population fits the expected
distribution.
3. Chi-square Test for Homogeneity:
 Objective: To compare the distribution of a categorical variable across different groups.
 Null Hypothesis (H0): The distribution of the variable is the same across all groups.
 Test Statistic: Similar to the chi-square test of independence, it assesses differences in
observed and expected frequencies.
 Example: Comparing the voting preferences of different age groups in an election.
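The test of independence described above can be sketched for a small contingency table (the counts below are made up purely for illustration, loosely following the gender-and-smoking example):

```python
# Chi-square test of independence for a 2×2 contingency table (made-up counts).
table = [[30, 20],   # group 1: category A, category B
         [10, 40]]   # group 2: category A, category B

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand   # expected count under H0
        chi_sq += (obs - exp) ** 2 / exp
df = (len(table) - 1) * (len(table[0]) - 1)           # 1 for a 2×2 table
print(round(chi_sq, 2), df)                           # 16.67 1
```

The statistic would then be compared with the chi-square table value for the computed degrees of freedom, as in the steps below.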
Steps to Perform a Chi-square Test:

1. Formulate Hypotheses: Define the null hypothesis (H0) and alternative hypothesis based on
the specific chi-square test.
2. Collect Data: Obtain data in the form of a contingency table or a set of observed frequencies.
3. Calculate Expected Frequencies: If required (e.g., for tests of independence or homogeneity),
calculate the expected frequencies for each cell under the assumption of independence or the
specified distribution.
4. Calculate Test Statistic: Compute the chi-square test statistic based on the observed and
expected frequencies.
5. Determine Critical Value or p-value: Compare the calculated test statistic to the critical value
from the chi-square distribution or obtain a p-value.
6. Make a Decision: If the p-value is less than the significance level (commonly 0.05), reject the null
hypothesis in favor of the alternative.
Chi-square tests are valuable tools in statistics for analyzing categorical data and detecting relationships
between variables. They are widely used in various fields, including social sciences, biology, market
research, and epidemiology.

 Chi-square test for goodness of fit is a test to find whether the deviation of the experiment from

theory is just by chance or is due to the inadequacy of the theory to fit the observed data.

By this test, we check whether the differences between observed and expected frequencies are

significant or not.

The chi-square test for goodness of fit is defined by

χ² = Σ (O − E)²/E, where O is the observed frequency and E is the expected frequency.

EXAMPLE:

1. The following data give the number of aircraft accidents that occurred during the various days of a week.
Find whether the accidents are uniformly distributed over the week.

Days              Sun  Mon  Tue  Wed  Thu  Fri  Sat
No. of accidents   14   16    8   12   11    9   14
Solution:
(i) The parameter of interest: whether the accidents are uniformly distributed over the week.
(ii) Null hypothesis H0: The accidents are uniformly distributed over the week.
(iii) Alternative hypothesis H1: The accidents are not uniformly distributed over the week.
(iv) Level of significance α = 0.05, d.f. = n − 1 = 7 − 1 = 6.
(v) The test statistic is χ² = Σ (O − E)²/E.
(vi) Reject H0 if χ² > 12.592 (the table value for 6 d.f. at the 5% level).
(vii) Computation:
On the assumption of H0, the expected number of accidents on each day is 84/7 = 12.
7

Observed          Expected          (O − E)²    (O − E)²/E
frequency (O)     frequency (E)
14                12                 4           0.333
16                12                16           1.333
8                 12                16           1.333
12                12                 0           0
11                12                 1           0.083
9                 12                 9           0.75
14                12                 4           0.333
                                    Total        4.165

χ² = Σ (O − E)²/E = 4.165

(viii) Conclusion
Since χ² = 4.165 < 12.592, we accept H0 at the 5% level of significance.
Therefore the accidents are uniformly distributed over the week.
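This worked example can be checked numerically (an illustrative sketch; the hand computation obtains about 4.165 by rounding each term, while the exact value is 50/12 ≈ 4.167, so the conclusion is unchanged):

```python
# Numerical check of the aircraft-accident goodness-of-fit example above.
observed = [14, 16, 8, 12, 11, 9, 14]
expected = sum(observed) / len(observed)           # 84 / 7 = 12 accidents per day
chi_sq = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1                             # 6
print(round(chi_sq, 3), df)                        # 4.167 6
```

Since 4.167 is well below the table value 12.592 for 6 d.f. at the 5% level, H0 is accepted either way.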
Chi-square goodness of fit test example
Let’s use the bags of candy as an example. We collect a random sample of ten bags. Each bag has 100
pieces of candy and five flavors. Our hypothesis is that the proportions of the five flavors in each bag are
the same.

Let’s start by answering: Is the Chi-square goodness of fit test an appropriate method to evaluate the
distribution of flavors in bags of candy?

 We have a simple random sample of 10 bags of candy. We meet this requirement.

 Our categorical variable is the flavors of candy. We have the count of each flavor in 10 bags of
candy. We meet this requirement.

 Each bag has 100 pieces of candy. Each bag has five flavors of candy. We expect to have equal
numbers for each flavor. This means we expect 100 / 5 = 20 pieces of candy in each flavor from
each bag. For 10 bags in our sample, we expect 10 x 20 = 200 pieces of candy in each flavor.
This is more than the requirement of five expected values in each category.
