Chi Square
(Chi-Square Test)
Parametric & Non-parametric Tests
• In some test procedures we make assumptions about the
population distribution or its parameters. For example, in the
Z test we assume that the samples are drawn from a normally
distributed population. A test that relies on such assumptions
is known as a parametric test.
• When no assumptions can be made about the distribution of
the population from which the samples are drawn, we use a
non-parametric test. The χ2 test is an example of a
non-parametric test.
χ2 test
• A statistical test in which the test statistic follows a
χ2 distribution is called a χ2 test. The test was developed
by Karl Pearson in 1900. In the χ2 test we test the
significance of the differences between observed
frequencies and the corresponding theoretical
frequencies of a distribution (the expected frequencies of
the population), without any assumption about the form of
the population distribution.
Characteristics of χ2 test
1. χ2 test is a non-parametric test. Assumptions about the form of the
distribution or parameter are not required.
2. χ2 test is a distribution free test, which can be used in any type of
distribution of population.
3. It analyses the differences between a set of observed frequencies
and a set of corresponding expected frequencies.
4. χ2 value ranges from 0 to infinity. It is 0 when the expected and
observed frequencies completely coincide. So, greater the value
of χ2 , greater is the discrepancy between observed and expected
frequencies.
Features of the Chi Square Distribution:
The Chi Square distribution with k degrees of freedom is the
distribution of the sum of k squared independent standard
normal deviates.
1. The mean of a Chi Square distribution is its degrees of
freedom.
2. Chi Square distributions are positively skewed, with the
degree of skew decreasing with increasing degrees of
freedom.
3. As the degrees of freedom increase, the Chi Square
distribution approaches a normal distribution.
4. Figure 1 shows density functions for three Chi Square
distributions.
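The features listed above can be checked with a small simulation. The sketch below is illustrative only: the function names, the number of trials and the fixed seed are assumptions, not part of the slides. It builds Chi Square values as sums of squared standard normal deviates and prints the sample mean and skewness for a few degrees of freedom.

```python
import random

def simulate_chi_square(df, trials, seed=0):
    """Return `trials` values, each the sum of `df` squared standard normal deviates."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(trials)]

def mean(values):
    return sum(values) / len(values)

def skewness(values):
    """Moment coefficient of skewness: third central moment / (variance ** 1.5)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

for df in (2, 4, 8):
    sample = simulate_chi_square(df, trials=20000)
    # The mean should be close to df, and the skewness should fall as df grows.
    print(df, round(mean(sample), 2), round(skewness(sample), 2))
```

The sample mean should land near the degrees of freedom, and the skewness should shrink as the degrees of freedom increase, in line with features 1 and 2.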
Uses of χ2 test
The χ2 test is one of the most useful statistical tests. It is
applicable to a very large number of problems in practice.
1. Useful for the test of goodness of fit:
The χ2 test can be used to ascertain how well a theoretical
distribution fits the data, that is, whether the observed
frequencies agree with the expected frequencies. The goodness
of fit test is used to test whether sample data fit a
distribution from a certain population (e.g. a population with
a normal distribution or one with a Poisson distribution).
2. Useful for the test of independence of attributes:
With the help of χ2 test we can find out whether two attributes
are associated or not.
3. Useful for testing homogeneity:
Tests of homogeneity are concerned with whether different
samples come from the same population
4. Useful for testing given population variance:
χ2 Test can be used for testing whether the given population
variance is acceptable on the basis of sample drawn from the
population
1. Test of goodness of fit
If we have a set of frequencies of a distribution obtained by an
experiment and if we are interested in knowing whether these
frequencies are consistent with those which may be obtained
based on some theory, then we use Chi square test of goodness
of fit for this purpose.
For example, if a frequency distribution like binomial or poisson
or normal distribution is applicable, the expected frequencies
would be derived using that distribution.
Steps for the χ2 test of goodness of fit
Step 1: Set up the null hypothesis:
H0: There is goodness of fit between observed & expected
frequencies
Step 2: Compute the test statistic $\chi^2 = \sum \frac{(O - E)^2}{E}$, where ‘O’
stands for the observed frequencies and ‘E’ stands for the expected
frequencies. Observed frequencies are given in the problem, but in
certain cases the expected frequencies have to be computed.
• Degrees of freedom = n-r-1, where ‘n’ is the number of classes
(frequency counts) and ‘r’ is the number of parameters estimated
from the data.
• The degrees of freedom are the number of frequencies that can be
chosen freely once the total and the estimated parameters are fixed.
• In a binomial distribution with given p (r=0: n-0-1), the degrees of
freedom are n-1
• In a Poisson distribution (r=1, the mean is estimated: n-1-1), the
degrees of freedom are n-2
• In a normal distribution (r=2, the mean and SD are estimated: n-2-1),
the degrees of freedom are n-3
Step 3: Obtain the table value of χ2 for the degrees of freedom
and for the desired level of significance.
Step 4: If the calculated value of χ2 is less than the table value,
we accept H0 and conclude that there is goodness of fit between
observed frequencies and theoretical (expected) frequencies.
Otherwise we reject H0.
Conditions for applying χ2 test
• The total frequencies (N) must be reasonably large, say at least 50
• Expected frequencies less than 5 are pooled with the preceding or
succeeding frequency so that no expected frequency is less than 5.
Then the degree of freedom is based on the resulting number of
frequencies
• The distribution should not be of proportion/ percentage, but it
should be of original units
Problem-1
• A sample analysis of the examination results of 200 students was
made. It was found that 46 students had failed, 68 secured 3rd
class, 62 secured 2nd class and the rest were placed in the first
division. Are these figures commensurate with the general
examination results, which are in the ratio 2:3:3:2 for the various
categories respectively?
• Use χ2 test to test goodness of fit.
Solution
• The ‘E’ values are obtained by dividing the total of 200 in the ratio 2:3:3:2
• That is, 200*2/10=40, 200*3/10=60, 200*3/10=60, 200*2/10=40
• Test procedure:
• H0: There is goodness of fit between observed & expected
frequencies
O      E      (O-E)^2    (O-E)^2/E
46     40     36         0.900
68     60     64         1.067
62     60     4          0.067
24     40     256        6.400
Total  ∑[(O-E)^2/E]      8.43
• χ2=∑[(O-E)^2/E]= 8.43
• Degrees of freedom = n-r-1 = 4-0-1 = 3
• Table value at the 0.05 level of significance = 7.815
• The calculated value is greater than the table value. Therefore we
reject H0: there is no goodness of fit between observed & expected
frequencies, i.e. the figures are not commensurate with the general
examination results.
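The arithmetic of Problem-1 can be reproduced with a short script. This is a minimal sketch: the helper names are illustrative, and the table value 7.815 is simply copied from the χ2 table for 3 degrees of freedom at the 0.05 level.

```python
def expected_from_ratio(total, ratio):
    """Split `total` in the given ratio, e.g. 200 in 2:3:3:2."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [46, 68, 62, 24]              # failed, 3rd class, 2nd class, 1st division
expected = expected_from_ratio(200, [2, 3, 3, 2])
chi2 = chi_square_statistic(observed, expected)
dof = len(observed) - 0 - 1              # n - r - 1, no parameter estimated from the data
TABLE_VALUE = 7.815                      # copied from the chi-square table (3 d.f., 0.05)

print("expected:", expected)
print("chi-square:", round(chi2, 3), "d.f.:", dof)
print("reject H0" if chi2 > TABLE_VALUE else "accept H0")
```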
Qn 2
Test whether the accidents occur uniformly over week days on
the basis of the following information:
Days: Sun Mon Tue Wed Thu Fri Sat
No of accidents: 11 13 14 13 15 14 18
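One way to work Qn 2 is sketched below. Under the hypothesis of uniform occurrence every day has the same expected frequency, the total divided by 7. The 5% level of significance is an assumption here (the question does not state one); the table value 12.592 for 6 degrees of freedom is the one quoted elsewhere in these notes.

```python
accidents = {"Sun": 11, "Mon": 13, "Tue": 14, "Wed": 13, "Thu": 15, "Fri": 14, "Sat": 18}

total = sum(accidents.values())
expected_per_day = total / len(accidents)   # uniform hypothesis: same E for every day

chi2 = sum((o - expected_per_day) ** 2 / expected_per_day for o in accidents.values())
dof = len(accidents) - 0 - 1                # n - r - 1, nothing estimated from the data
TABLE_VALUE = 12.592                        # 6 d.f. at the (assumed) 0.05 level

print("expected per day:", expected_per_day)
print("chi-square:", round(chi2, 3), "d.f.:", dof)
print("accept H0 (uniform)" if chi2 < TABLE_VALUE else "reject H0")
```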
Qn 3
• 8 coins were tossed 256 times. The results obtained are given
below. Test whether the coins are unbiased applying χ2 test
No.of heads 0 1 2 3 4 5 6 7 8
Frequencies 2 10 25 50 75 58 21 9 6
Solution
• p = probability of success, q = probability of failure
• Here p = P(getting a head in a toss) = 1/2
• q = 1-p, that is 1-1/2 = 1/2
• n = 8 (number of coins tossed each time)
• Here x (the number of heads) follows a binomial distribution
• The binomial distribution is $P(r) = \binom{n}{r} p^r q^{n-r}$
• Let us find the expected frequencies by using the binomial
distribution
Expected frequencies
No. of heads (r)   P(r) = 8Cr (1/2)^r (1/2)^(8-r)   Expected frequency = P(r)*256
0                  8C0 (1/2)^0 (1/2)^8 = 1/256      1
1                  8C1 (1/2)^1 (1/2)^7 = 8/256      8
2                  8C2 (1/2)^2 (1/2)^6 = 28/256     28
3                  8C3 (1/2)^3 (1/2)^5 = 56/256     56
4                  8C4 (1/2)^4 (1/2)^4 = 70/256     70
5                  8C5 (1/2)^5 (1/2)^3 = 56/256     56
6                  8C6 (1/2)^6 (1/2)^2 = 28/256     28
7                  8C7 (1/2)^7 (1/2)^1 = 8/256      8
8                  8C8 (1/2)^8 (1/2)^0 = 1/256      1
• The expected frequencies are 1, 8, 28, 56, 70, 56, 28, 8, 1
• Test procedure
• H0: There is goodness of fit between observed frequencies
and theoretical frequencies
• The first expected frequency (1) is less than 5, so it is pooled
with the second: 1+8=9. The corresponding observed frequencies
are also added: 2+10=12. The last expected & observed
frequencies are dealt with in the same way: 8+1=9 and 9+6=15.
No.of heads 0 1 2 3 4 5 6 7 8
Observed Frequencies 2 10 25 50 75 58 21 9 6
Expected Frequencies 1 8 28 56 70 56 28 8 1
O           E          (O-E)^2    (O-E)^2/E
12 (2+10)   9 (1+8)    9          1.00
25          28         9          0.32
50          56         36         0.64
75          70         25         0.36
58          56         4          0.07
21          28         49         1.75
15 (9+6)    9 (8+1)    36         4.00
Total                             8.14
• χ2=∑(O-E)2/E= 8.14
• Degree of freedom= n-r-1
• 7-0-1=6
• Level of significance =0.05 (5%)
• Table value =12.592
• Since the calculated value is less than the table value, we accept
the hypothesis that there is goodness of fit between observed
and expected values. Therefore, the coins are unbiased
• Here, the parameters are n and p which are available in the
data. So, they are not computed therefore r=0
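The coin-tossing solution above can be sketched in code as follows. The helper `pool_end_classes` is a hypothetical name, and it only merges classes at the two ends of the distribution, which is all this problem needs; it is not a general pooling routine.

```python
from math import comb

def binomial_expected(n, p, trials):
    """Expected frequency of r = 0..n successes in `trials` repetitions."""
    return [trials * comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(n + 1)]

def pool_end_classes(observed, expected, minimum=5):
    """Merge end classes into their neighbours while an end expected frequency is below `minimum`."""
    obs, exp_freq = list(observed), list(expected)
    while len(exp_freq) > 1 and exp_freq[0] < minimum:
        first_e, first_o = exp_freq.pop(0), obs.pop(0)
        exp_freq[0] += first_e
        obs[0] += first_o
    while len(exp_freq) > 1 and exp_freq[-1] < minimum:
        last_e, last_o = exp_freq.pop(), obs.pop()
        exp_freq[-1] += last_e
        obs[-1] += last_o
    return obs, exp_freq

observed = [2, 10, 25, 50, 75, 58, 21, 9, 6]
expected = binomial_expected(n=8, p=0.5, trials=256)
obs_pooled, exp_pooled = pool_end_classes(observed, expected)

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
dof = len(obs_pooled) - 0 - 1   # degrees of freedom use the number of classes after pooling

print(obs_pooled, exp_pooled)
print("chi-square:", round(chi2, 2), "d.f.:", dof)
```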
Qn 4
A systematic sample of 100 pages was taken from the Oxford dictionary
and the observed frequency distribution of foreign words per page was
found to be as follows:
No of foreign words per page: 0 1 2 3 4 5 6
Frequency : 48 27 12 7 4 1 1
Test whether the distribution conforms to a Poisson distribution
Answer
Mean = 99/100 = 0.99, so m = 0.99
$P(x) = \frac{e^{-m} m^x}{x!}$
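The expected frequencies for Qn 4 follow from the Poisson formula with m estimated from the data. The sketch below stops at the expected frequencies; the remaining steps (pooling classes whose expected frequency is below 5, then computing χ2 with n-2 degrees of freedom) are as described in the steps for goodness of fit. Variable names are illustrative.

```python
from math import exp, factorial

words_per_page = [0, 1, 2, 3, 4, 5, 6]
frequency = [48, 27, 12, 7, 4, 1, 1]

total_pages = sum(frequency)
# Mean estimated from the sample: sum of x * f divided by the number of pages.
m = sum(x * f for x, f in zip(words_per_page, frequency)) / total_pages

def poisson_probability(x, m):
    """P(x) = e^(-m) * m^x / x!"""
    return exp(-m) * m ** x / factorial(x)

expected = [total_pages * poisson_probability(x, m) for x in words_per_page]

for x, o, e in zip(words_per_page, frequency, expected):
    print(x, o, round(e, 2))
```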
Question (No.5)
=SD= 5.26
Answer
Mean= 16.65, SD=5.25
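Z = (X - 16.65)/5.25; areas are taken from the normal table (area between 0 and Z)
Class           Z at lower limit   Z at upper limit   Area for class               Expected frequency (area*100)
Below 9.5       -∞                 -1.36              0.5000 - 0.4131 = 0.0869     9
9.5 to 14.5     -1.36              -0.40              0.4131 - 0.1554 = 0.2577     26
14.5 to 19.5    -0.40              0.54               0.1554 + 0.2054 = 0.3608     36
19.5 to 24.5    0.54               1.50               0.4332 - 0.2054 = 0.2278     22
Above 24.5      1.50               ∞                  0.5000 - 0.4332 = 0.0668     7
Total                                                                              100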
O      E      (O-E)^2    (O-E)^2/E
10     9      1          0.11
22     26     16         0.62
40     36     16         0.44
21     22     1          0.05
7      7      0          0
Total                    1.22
• χ2=∑(O-E)2/E= 1.22
• Degree of freedom= n-r-1
• 5-2-1=2
• Level of significance =0.05 (5%)
• Table value =5.991
• Since the calculated value is less than the table value,
we accept the hypothesis that there is goodness of fit
between observed and expected values.
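The class areas used in this answer can also be computed directly from the normal distribution function instead of a printed table. The sketch below assumes the class boundaries, mean, SD, total of 100 and observed frequencies shown in the answer; because it does not round Z or the areas, its χ2 value differs slightly from the hand calculation.

```python
from math import erf, sqrt, inf

MEAN, SD, TOTAL = 16.65, 5.25, 100

def normal_cdf(x, mean=MEAN, sd=SD):
    """Cumulative probability of a normal distribution, via the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

boundaries = [-inf, 9.5, 14.5, 19.5, 24.5, inf]
areas = [normal_cdf(hi) - normal_cdf(lo) for lo, hi in zip(boundaries, boundaries[1:])]
expected = [TOTAL * a for a in areas]

observed = [10, 22, 40, 21, 7]               # O column of the answer
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 2 - 1                  # mean and SD estimated from the data
TABLE_VALUE = 5.991                          # 2 d.f. at the 0.05 level, as quoted above

print([round(e, 2) for e in expected])
print("chi-square:", round(chi2, 2), "accept H0" if chi2 < TABLE_VALUE else "reject H0")
```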
Contingency Table
• A contingency table is a frequency table in which a sample from
the population is classified according to two or more attributes,
each of which is divided into two or more classes. When there are
only two divisions for each attribute, the contingency table is
known as a 2*2 contingency table. For example, consider the two
attributes ‘smoking’ & ‘drinking’.
• A 2*2 contingency table for these 2 attributes can be shown as
follows
          Column 1         Column 2         Total
Row 1     a                b                Row 1 Total
Row 2     c                d                Row 2 Total
Total     Column 1 Total   Column 2 Total   Grand Total
For finding the expected frequency of a cell, use the following formula:
(Corresponding Row Total X Corresponding Column Total)/Grand Total
For example, the expected frequency of a is (Row 1 Total X Column 1 Total)/Grand Total
How to find expected values in a contingency table?
• Let a, b, c, d be the observed frequencies. If the two attributes are
independent, the expected frequencies corresponding to a, b, c, d are
respectively:
• (a + b)(a + c)/(a + b + c + d)
• (a + b)(b + d)/(a + b + c + d)
• (c + d)(a + c)/(a + b + c + d)
• (c + d)(b + d)/(a + b + c + d)
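The rule "row total X column total / grand total" can be written once for a table of any size. This is a minimal sketch; the function name and the sample values of a, b, c, d are illustrative only.

```python
def expected_frequencies(table):
    """Expected frequency of each cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Illustrative 2*2 table with cells a, b in row 1 and c, d in row 2.
a, b, c, d = 10, 40, 15, 85
expected = expected_frequencies([[a, b], [c, d]])

for row in expected:
    print([round(e, 2) for e in row])
```

Each row of expected frequencies adds up to the same row total as the observed table, and likewise for the columns.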
2. Testing independence of two attributes
(for contingency table)
Steps:
• H0: The two attributes are independent (i.e. they are not associated)
• Compute the test statistic by the formula $\chi^2 = \sum \frac{(O - E)^2}{E}$
• where ‘O’ refers to the observed frequencies and ‘E’ refers to the
expected frequencies.
• Degrees of freedom = (r-1)*(c-1), where ‘r’ is the number of rows and ‘c’ is
the number of columns.
• Obtain the table value for the degrees of freedom and the desired level of
significance.
• If the calculated value of χ2 is less than the table value, accept the null
hypothesis. Otherwise reject it.
Qn
• From the following data use χ2 test and conclude whether
inoculation is effective in preventing tuberculosis.
O       E       (O-E)^2    (O-E)^2/E
31      54      529        9.80
469     446     529        1.19
185     162     529        3.27
1315    1338    529        0.40
Total   ∑(O-E)^2/E         14.66
• χ2 = ∑(O-E)^2/E = 14.66
• Degrees of freedom = (r-1)(c-1) = 1*1 = 1
• Table value of χ2 for one degree of freedom at the 5% level of
significance is 3.841
• The calculated value is greater than the table value. Therefore
we reject H0
• Attack and inoculation are not independent, i.e. the inoculation
is effective
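The working for this question can be checked from the four observed frequencies alone. In the sketch below the arrangement of the 2*2 table (31 and 469 in one row, 185 and 1315 in the other) is an assumption inferred from the expected values 54, 446, 162 and 1338. It also shows why (O-E)^2 is the same in every cell of a 2*2 table: all four deviations have the same size. Summing the unrounded terms gives a value slightly below the hand total of rounded terms.

```python
observed = [[31, 469],
            [185, 1315]]   # assumed layout, inferred from the expected frequencies

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
deviations = [[o - e for o, e in zip(o_row, e_row)]
              for o_row, e_row in zip(observed, expected)]

chi2 = sum(dev ** 2 / e
           for d_row, e_row in zip(deviations, expected)
           for dev, e in zip(d_row, e_row))

print("expected:", expected)
print("deviations:", deviations)
print("chi-square:", round(chi2, 2))
```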
Qn 7
• From the following data use χ2 test and conclude whether social
status and intelligence are related(associated)
Column 1 Column 2
Row 1 10 40 50
Row 2 15 85 100
25 125 150
Column 1 Column 2
Row 1 4 40 44
Row 2 15 85 100
19 125 144
                                     Economic Status
                                     Rich    Poor    Total
Attitude towards   Favourable        50      155     205
election           Not favourable    90      110     200
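                               Gender Status
                               Male    Female   Total
Tea habit   Tea drinkers       19      12       31
            Not tea drinkers   32      37       69
            Total              51      49       100
$\chi^2 = \frac{(|ad - bc|)^2 \times \text{Grand Total}}{\text{Column 1 Total} \times \text{Column 2 Total} \times \text{Row 1 Total} \times \text{Row 2 Total}} = 1.90$
Table value at the 5% level of significance for 1 degree of freedom is 3.84.
The calculated value is less than the table value, so the null hypothesis is
accepted. The two attributes are independent (i.e. not associated).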
Answer
Town B: Null Hypothesis: Gender status and tea habit are independent
                               Gender Status
                               Male    Female   Total
Tea habit   Tea drinkers       17      9        26
            Not tea drinkers   29      45       74
            Total              46      54       100
$\chi^2 = \frac{(|ad - bc|)^2 \times \text{Grand Total}}{\text{Column 1 Total} \times \text{Column 2 Total} \times \text{Row 1 Total} \times \text{Row 2 Total}} = 5.32$
Table value at the 5% level of significance for 1 degree of freedom is 3.84.
The calculated value is greater than the table value, so the null hypothesis is
rejected. The two attributes are dependent (i.e. associated).
Gender and tea habit are associated in Town B only.
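The shortcut formula for a 2*2 table used in the two tea-habit tables can be sketched as a single function. The function and variable names are illustrative; a, b are the cells of row 1 and c, d the cells of row 2.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut for a 2*2 table: (ad - bc)^2 * N / (column totals * row totals)."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + c) * (b + d) * (a + b) * (c + d))

TABLE_VALUE = 3.84   # 1 d.f. at the 5% level, as quoted in the text

first_table = chi_square_2x2(19, 12, 32, 37)   # the table with totals 51, 49, 31, 69
town_b = chi_square_2x2(17, 9, 29, 45)         # the Town B table

for name, value in (("first table", first_table), ("Town B", town_b)):
    verdict = "associated" if value > TABLE_VALUE else "independent"
    print(name, round(value, 2), verdict)
```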
3. χ2 test as a test of homogeneity
Here we have more than one sample unlike the test of independence
where there is only one sample. We want to test whether these
samples are homogeneous as far as a particular attribute is
concerned. When there is homogeneity, we conclude that the
samples belong to the same population or identical population.
The null hypothesis in this case is that there is homogeneity.
The test is performed in the same manner as in the case of test of
independence.
When null hypothesis is accepted, we conclude that there is
homogeneity
Qn 12
• From the adult population of four large cities, random samples were
selected and the number of married and unmarried men was
recorded as below:
cities
A B C D Total
Married 137 164 152 147 600
Single 32 57 56 35 180
total 169 221 208 182 780
• Is there significant variation among the cities in the tendency of men
to marry?
Solution:
Expected frequencies (column total * row total / grand total):
          A                 B                 C                 D                 Total
Married   169*600/780=130   221*600/780=170   208*600/780=160   182*600/780=140   600
Single    39                51                48                42                180
Total     169               221               208               182               780
H0: The four cities are homogeneous
O      E      (O-E)^2    (O-E)^2/E
137    130    49         0.4
32     39     49         1.3
164    170    36         0.2
57     51     36         0.7
152    160    64         0.4
56     48     64         1.3
147    140    49         0.4
35     42     49         1.2
Total                    5.9
χ2 =∑(O-E)2/E=5.9
Degree of freedom=(r-1) (c-1)=(2-1) (4-1)= 3
• Level of significance = 0.05
• Table value=7.82
• Calculated value < table value
• We accept H0 : There is homogeneity
• The cities are identical in the tendency of their men to marry
• There is no significant variation among cities in the tendency of men
to marry
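The same calculation for a table with more than two columns can be sketched as below, using the married/single data of the four cities. The function name is illustrative. Because the script does not round each term to one decimal place, its χ2 value comes out slightly below the hand-computed total; the conclusion is the same.

```python
def chi_square_contingency(table):
    """Return (chi-square, degrees of freedom) for an r x c table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

cities = [[137, 164, 152, 147],   # married men in cities A, B, C, D
          [32, 57, 56, 35]]       # single men in cities A, B, C, D

chi2, dof = chi_square_contingency(cities)
TABLE_VALUE = 7.82                # 3 d.f. at the 0.05 level, as quoted above

print("chi-square:", round(chi2, 2), "d.f.:", dof)
print("homogeneous" if chi2 < TABLE_VALUE else "not homogeneous")
```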
Qn 13
• In a diet survey the following results were obtained
Hindus Muslims
Families taking tea 124 16
Total 180 26
χ2 =∑(O-E)2/E=117.76
Degree of freedom=(r-1) (c-1)=(2-1) (3-1)= 2
• Level of significance = 0.05
• Table value=5.991
• Calculated value > table value
• We reject H0 : There is no homogeneity
• The religions are not identical in respect of drinking habit.
4. χ2 test for population variance
The χ2 test can be used for testing a given population variance (to test
whether there is any significant difference between the sample variance and
the population variance) when the sample is small.
The test statistic is obtained by the formula $\chi^2 = \frac{n s^2}{\sigma^2}$
n = sample size
s2 = sample variance (computed with divisor n)
σ2 = population variance
Degrees of freedom = n-1
Note: For a large sample apply the Z test; for a small sample apply the chi square
test for population variance (ns2/σ2)
Problem
Weights in kg of 10 students are given below:
38, 40, 45, 53, 47, 43, 55, 48, 52, 49
Can we say that the variance of the distribution of weights of all students
from which the above sample of 10 students was drawn is equal to 20
square kg?
Solution
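x       dx = x - 43    dx^2
38      -5             25
40      -3             9
45      +2             4
53      +10            100
47      +4             16
43      0              0
55      +12            144
48      +5             25
52      +9             81
49      +6             36
Total   40             440
$s^2 = \frac{\sum dx^2}{n} - \left(\frac{\sum dx}{n}\right)^2 = \frac{440}{10} - \left(\frac{40}{10}\right)^2 = 44 - 16 = 28$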
Null hypothesis (H0): Variance of the population is 20
Alternative hypothesis (H1):Variance of the population is not 20
χ2 = ns2/ σ2
=(10*28)÷20=14
Degree of freedom=n-1=10-1=9
Level of significance=5%
Table value of χ2 for 9 degree of freedom at 0.05 level of significance=16.919
The calculated value is less than the table value of χ2. We accept the null hypothesis:
the population variance may be taken as 20 square kg.
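The weight problem can be verified from the raw data. The sketch below computes the sample variance about the actual mean with divisor n, which gives the same s2 as the assumed-mean working in the solution. Variable names are illustrative, and the table value 16.919 is the one quoted in the solution.

```python
weights = [38, 40, 45, 53, 47, 43, 55, 48, 52, 49]
hypothesised_variance = 20

n = len(weights)
mean_weight = sum(weights) / n
# Sample variance with divisor n, matching the formula ns^2 / sigma^2 used in the text.
sample_variance = sum((x - mean_weight) ** 2 for x in weights) / n

chi2 = n * sample_variance / hypothesised_variance
dof = n - 1
TABLE_VALUE = 16.919   # 9 d.f. at the 0.05 level

print("mean:", mean_weight, "s^2:", sample_variance)
print("chi-square:", chi2, "d.f.:", dof)
print("accept H0" if chi2 < TABLE_VALUE else "reject H0")
```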
Qn (15)
• The Standard Deviation of a sample of 10 observations from a normal
population was found to be 5. Examine whether this is consistent
with the hypothesis that the SD of the population is 5.3
Answer
Null hypothesis (H0): Standard Deviation of the population is 5.3
Alternative hypothesis (H1): Standard Deviation of the population is not 5.3
$\chi^2 = \frac{n s^2}{\sigma^2}$
$= (10 \times 5^2) \div 5.3^2 = 250 \div 28.09 = 8.9$
Degrees of freedom = n-1 = 10-1 = 9
Level of significance = 5% (i.e. 0.05)
Table value of χ2 for 9 degrees of freedom at 0.05 level of significance = 16.9
The calculated value is less than the table value of χ2. We accept the null
hypothesis: the sample is consistent with a population SD of 5.3
Qn (17)
A random sample of size 20 from a normal population gives a standard
deviation of 6.
Test the hypothesis that the population SD is 9.
State clearly the alternative hypothesis you allow for and the level of
significance adopted.
(Answer: χ2 = 8.89, table value = 30.144)
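When only the sample size and standard deviations are given, as in Qn (15) and Qn (17), the statistic can be computed directly from them. This is a small sketch with an illustrative function name; the table values are those quoted in the text, and the comparison uses the upper table value only, as the worked answers do.

```python
def chi_square_for_sd(n, sample_sd, population_sd):
    """Test statistic n * s^2 / sigma^2, with n - 1 degrees of freedom."""
    return n * sample_sd ** 2 / population_sd ** 2

qn15 = chi_square_for_sd(n=10, sample_sd=5, population_sd=5.3)
qn17 = chi_square_for_sd(n=20, sample_sd=6, population_sd=9)

# (statistic, table value at the 0.05 level) for 9 and 19 degrees of freedom
for label, value, table_value in (("Qn 15", qn15, 16.919), ("Qn 17", qn17, 30.144)):
    verdict = "accept H0" if value < table_value else "reject H0"
    print(label, round(value, 2), verdict)
```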
HW 25,26,27,28,29,30,31