CHI-SQUARE
The χ² test (pronounced "chi-square" test) is an important and popular test of hypothesis which falls in the category of non-parametric tests. The test was first introduced by Karl Pearson in 1900.
It is used to find out whether there is any significant difference between observed frequencies and
expected frequencies pertaining to any particular phenomenon. Here frequencies are shown in the
different cells (categories) of a so-called contingency table. It is noteworthy that the observations are taken in categorical form or rank order, not as continuous measurements drawn from a normal distribution. The test
is applied to assess how likely the observed frequencies would be assuming the null hypothesis is
true. This test is also useful in ascertaining the independence of two random variables based on
observations of these variables. This non-parametric test is extensively used for the following reasons:
1. It is a distribution-free method, which does not rely on the assumption that the data are drawn from a given parametric family of probability distributions.
2. It is easier to compute and simpler to understand than parametric tests.
3. It can be used in situations where parametric tests are not appropriate or where the level of measurement prohibits their use.
It is defined as:

χ² = Σ (O − E)² / E

where O denotes the observed frequency and E the expected (theoretical) frequency in each cell, and the sum runs over all cells.
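As a numerical illustration of the formula, the statistic can be computed directly. The die-roll counts below are a hypothetical example, not data from the text; under the null hypothesis of a fair die, each face is expected 10 times in 60 rolls.

```python
# Hypothetical example: 60 rolls of a die, null hypothesis of equal frequencies.
observed = [8, 12, 9, 11, 6, 14]
expected = [10, 10, 10, 10, 10, 10]

# chi-square = sum over cells of (O - E)^2 / E
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 2))  # 4.2
```

The resulting value would then be compared with the table value of χ² for the appropriate degrees of freedom, as described below.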
The chi-square test has a large number of applications where parametric tests cannot be applied. Its uses can be summarized as follows:
This test is helpful in detecting the association between two or more attributes. Suppose we have N
observations classified according to two attributes. By applying this test on the given observations
(data), we try to find out whether the attributes have some association or are independent. The association, if present, may be positive or negative, or there may be no association at all.
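A minimal sketch of the independence computation on a 2×2 contingency table. The counts are assumed for illustration, not taken from the text; the expected frequency of each cell under independence is (row total × column total) / N.

```python
# Hypothetical 2x2 contingency table of N observations classified
# according to two attributes (rows: attribute A; columns: attribute B).
table = [[30, 20],
         [20, 30]]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Expected cell frequency under independence: E_ij = R_i * C_j / N
chi_square = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (observed - expected) ** 2 / expected

print(round(chi_square, 2))  # 4.0
```

A large value of the statistic relative to the table value (here with (2−1)(2−1) = 1 degree of freedom) would lead us to reject the hypothesis that the two attributes are independent.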
Testing goodness of fit is the most important use of the chi-square test. The method attempts to establish whether an observed frequency distribution differs from an expected (theoretical) frequency distribution. When an ideal frequency curve, whether normal or of some other type, is fitted to the data, we are interested in finding out how well this curve fits the observed facts.
The following steps are followed for this purpose:
i. A null and an alternative hypothesis pertaining to the enquiry are established.
ii. A level of significance is chosen for rejection of the null hypothesis.
iii. A random sample of observations is drawn from the relevant statistical population.
iv. On the basis of the actual observations, expected or theoretical frequencies are derived through probability. This generally takes the form of assuming that a particular probability distribution applies to the statistical population under consideration.
v. The observed frequencies are compared with the expected or theoretical frequencies.
vi. If the calculated value of χ² is less than the table value at a certain level of significance (generally the 5% level) and for the relevant degrees of freedom, the fit is considered good, i.e., the divergence between the actual and expected frequencies is attributed to fluctuations of simple sampling. On the other hand, if the calculated value of χ² is greater than the table value, the fit is considered poor, i.e., the divergence cannot be attributed to fluctuations of simple sampling; rather, it is due to the inadequacy of the theory to fit the observed facts.
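The steps above can be sketched for a simple goodness-of-fit check. The weekday counts and the uniform null model are illustrative assumptions, not data from the text; the 5% critical value for k − 1 = 4 degrees of freedom (9.488) is the standard table value.

```python
# Hypothetical data: counts observed over five categories, with the null
# hypothesis that all categories are equally likely (steps i-iv).
observed = [18, 22, 21, 19, 20]
expected = [20] * 5

# Step v: compare observed with expected frequencies via the statistic.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Step vi: compare with the 5% table value for k - 1 = 4 degrees of freedom.
critical_value = 9.488

if chi_square < critical_value:
    # Divergence attributed to fluctuations of simple sampling.
    print("good fit")
else:
    print("poor fit")
```

Here the calculated value (0.5) falls well below the table value, so the fit would be considered good.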
The χ² test of homogeneity is an extension of the χ² test of independence. Such tests indicate whether two or more independent samples are drawn from the same population or from different populations. Instead of one sample, as in the independence problem, we now have two or more samples. Suppose a test is given to students in two different higher secondary schools, with the same sample size in both cases. The question we have to ask is: is there any difference
between the two higher secondary schools? In order to find the answer, we have to set up the null
hypothesis that the two samples came from the same population. The word ‘homogeneous’ is used
frequently in Statistics to indicate ‘the same’ or ‘equal’. Accordingly, we can say that we want to test
in our example whether the two samples are homogeneous. Thus, the test is called a test of
homogeneity.
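A sketch of the homogeneity test for the two-school example. The pass/fail counts are hypothetical; mechanically the computation is the same as the independence test, with rows representing the two samples and columns the outcomes.

```python
# Hypothetical results of the same test given in two schools.
# Rows: school 1, school 2; columns: pass, fail.
table = [[60, 40],
         [50, 50]]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Same statistic as the independence test: sum of (O - E)^2 / E with
# E_ij = R_i * C_j / N under the null hypothesis of homogeneity.
chi_square = sum(
    (obs - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i, row in enumerate(table)
    for j, obs in enumerate(row)
)

# 5% table value for (2 - 1) * (2 - 1) = 1 degree of freedom.
critical_value = 3.841
homogeneous = chi_square < critical_value
```

With these counts the statistic (about 2.02) stays below the table value, so the null hypothesis that the two samples come from the same population would not be rejected.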