
Chi Square

The document describes how to conduct a chi-square test to analyze differences between expected and observed frequencies in categories. It provides an example of using a chi-square test to analyze data from a study on providing a pneumonia vaccine to employees. The study found fewer cases of pneumococcal pneumonia among vaccinated employees compared to unvaccinated employees. The chi-square statistic was calculated to determine if the difference was statistically significant.

Uploaded by

NOYON KUMAR DA
Copyright
© All Rights Reserved

Course Code: IPE-324
Course Title: Measurement & Instrumentation Sessional
Experiment No. 04
Name of the Experiment: Study of Normality Test
Group No. 05

NOYON KUMAR DA
Reg. No. 2016334044
Session: 2016-17
3rd Year 2nd Semester
Dept. of Industrial & Production Engineering
Shahjalal University of Science & Technology, Sylhet

Date of Experiment: 27/08/19
Date of Submission: 17/09/19

OBJECTIVES
i. To study the chi-square test.
ii. To perform a normality test.

INTRODUCTION:
A chi-squared test, also written as χ² test, is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Without other qualification, 'chi-squared test' is often used as shorthand for Pearson's chi-squared test. The chi-squared test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.

In the standard applications of this test, the observations are classified into mutually exclusive classes, and there is some theory, or null hypothesis, that gives the probability that any observation falls into the corresponding class. The purpose of the test is to evaluate how likely the observed frequencies would be, assuming the null hypothesis is true.
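As a minimal sketch of the idea described above, the Python function below (the name `pearson_gof_statistic` is illustrative, not from any library) computes Pearson's statistic for a single categorical variable: each expected count is E = n·p for the class probability p supplied by the null hypothesis, and the terms (O − E)²/E are summed over the classes. The die-roll counts are hypothetical and serve only as an illustration.

```python
def pearson_gof_statistic(observed, probabilities):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E with E = n * p."""
    if len(observed) != len(probabilities):
        raise ValueError("need exactly one probability per class")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("class probabilities must sum to 1")
    n = sum(observed)
    statistic = 0.0
    for o, p in zip(observed, probabilities):
        expected = n * p  # expected count for this class if the null hypothesis is true
        statistic += (o - expected) ** 2 / expected
    return statistic


# Hypothetical data: 60 rolls of a die that the null hypothesis says is fair.
rolls = [8, 9, 12, 11, 6, 14]
stat = pearson_gof_statistic(rolls, [1 / 6] * 6)
df = len(rolls) - 1  # classes minus one; nothing is estimated from the data
print(f"chi-square = {stat:.2f} with {df} df")
```

When no parameters are estimated from the data, the degrees of freedom are the number of classes minus one, and the statistic would then be compared with a χ² table.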

Chi-squared tests are often constructed from a sum of squared errors, or through
the sample variance. Test statistics that follow a chi-squared distribution arise
from an assumption of independent normally distributed data, which is valid in
many cases due to the central limit theorem. A chi-squared test can be used to test, and possibly reject, the null hypothesis that the data are independent.

Also considered a chi-squared test is a test in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-squared distribution as closely as desired by making the sample size large enough.

DESCRIPTION:

The Chi-square statistic is a non-parametric (distribution-free) tool designed to analyze group differences when the dependent variable is measured at a nominal level. Like all non-parametric statistics, the Chi-square is robust with respect to the distribution of the data. Specifically, it does not require equality of variances among the study groups or homoscedasticity in the data. It permits evaluation of both dichotomous independent variables and multiple-group studies. Unlike many other non-parametric and some parametric statistics, the calculations needed to compute the Chi-square provide considerable information about how each of the groups performed in the study. This richness of detail allows the researcher to understand the results and thus to derive more detailed information from this statistic than from many others.

The Chi-square is a significance statistic and should be followed by a strength statistic. Cramér’s V is the most common strength test used to test the data when a significant Chi-square result has been obtained. Advantages of the Chi-square include its robustness with respect to distribution of the data, its ease of computation, the detailed information that can be derived from the test, its use in studies for which parametric assumptions cannot be met, and its flexibility in handling data from both two-group and multiple-group studies. Limitations include its sample size requirements, difficulty of interpretation when there are large numbers of categories (20 or more) in the independent or dependent variables, and the tendency of Cramér’s V to produce relatively low correlation measures, even for highly significant results.
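A minimal Python sketch of the strength statistic mentioned above, assuming the usual definition of Cramér's V, V = √(χ² / (n·(k − 1))), where n is the total count and k is the smaller of the number of rows and columns. The helper names are illustrative, and the counts are those of the case-study table (Table 1) below.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square for an r x c table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (n * (k - 1))), k = min(rows, columns); ranges from 0 to 1."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square_statistic(table) / (n * (k - 1)))


# Rows: pneumococcal, other pneumonia, stayed healthy; columns: unvaccinated, vaccinated.
case_table = [[23, 5], [8, 10], [61, 77]]
print(f"Cramer's V = {cramers_v(case_table):.2f}")
```

For the case-study table this gives V ≈ 0.27, a fairly modest association even though the χ² result is highly significant, which is the tendency noted among the limitations above.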
The Chi-square test is a non-parametric statistic, also called a distribution-free test. Non-parametric tests should be used when any one of the following conditions pertains to the data:
1. The level of measurement of all the variables is nominal or ordinal.

2. The sample sizes of the study groups are unequal; for the χ², the groups may be of equal or unequal size, whereas some parametric tests require groups of equal or approximately equal size.

3. The original data were measured at an interval or ratio level, but violate one of the following assumptions of a parametric test:

a. The distribution of the data was seriously skewed or kurtotic (parametric tests assume approximately normal distribution of the dependent variable), and thus the researcher must use a distribution-free statistic rather than a parametric statistic.

b. The data violate the assumptions of equal variance or homoscedasticity.

c. For any of a number of reasons (1), the continuous data were collapsed into
a small number of categories, and thus the data are no longer interval or ratio.

Assumptions of the Chi-square


As with parametric tests, the non-parametric tests, including the χ², assume the data were obtained through random selection. However, it is not uncommon to find inferential statistics used when data are from convenience samples rather than random samples. (To have confidence in the results when the random sampling assumption is violated, several replication studies should be performed with essentially the same result obtained.) Each non-parametric test has its own specific assumptions as well. The assumptions of the Chi-square include:

1. The data in the cells should be frequencies, or counts of cases, rather than percentages or some other transformation of the data.

2. The levels (or categories) of the variables are mutually exclusive. That is, a particular subject fits into one and only one level of each of the variables.

3. Each subject may contribute data to one and only one cell in the χ². If, for example, the same subjects are tested over time such that the comparisons are of the same subjects at Time 1, Time 2, Time 3, etc., then χ² may not be used.

4. The study groups must be independent. This means that a different test must be used if the two groups are related. For example, a different test must be used if the researcher’s data consist of paired samples, such as in studies in which a parent is paired with his or her child.

5. There are 2 variables, and both are measured as categories, usually at the nominal level. However, the data may be ordinal. Interval or ratio data that have been collapsed into ordinal categories may also be used. While Chi-square has no rule about limiting the number of cells (by limiting the number of categories for each variable), a very large number of cells (over 20) can make it difficult to meet assumption #6 below, and to interpret the meaning of the results.

6. The cell expected values should be 5 or more in at least 80% of the cells, and no cell should have an expected value of less than one (3). This assumption is most likely to be met if the sample size equals at least the number of cells multiplied by 5. Essentially, this assumption specifies the number of cases (sample size) needed to use the χ² for any number of cells in that χ².
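The expected-count rule stated in the last assumption above is easy to check mechanically. The sketch below, with illustrative function names, computes every cell's expected value from the marginals and applies the two conditions exactly as stated in the text (at least 80% of cells with E ≥ 5, and no cell with E < 1). The second table is a made-up small example that fails the rule.

```python
def expected_counts(table):
    """Expected count for every cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_expected_count_rule(table, min_share=0.8):
    """True if at least `min_share` of cells have E >= 5 and no cell has E < 1."""
    cells = [e for row in expected_counts(table) for e in row]
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    return share_at_least_5 >= min_share and min(cells) >= 1


case_table = [[23, 5], [8, 10], [61, 77]]   # counts from the case study
small_table = [[3, 1], [2, 0]]              # hypothetical table with too few cases
print(meets_expected_count_rule(case_table))   # True
print(meets_expected_count_rule(small_table))  # False
```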

Case study
To illustrate the calculation and interpretation of the χ² statistic, the following case example will be used:

The owner of a laboratory wants to keep sick leave as low as possible by keeping
employees healthy through disease prevention programs. Many employees have
contracted pneumonia leading to productivity problems due to sick leave from
the disease. There is a vaccine for pneumococcal pneumonia, and the owner
believes that it is important to get as many employees vaccinated as possible. Due
to a production problem at the company that produces the vaccine, there is only
enough vaccine for half the employees. In effect, there are two groups:
employees who received the vaccine and employees who did not receive the
vaccine. The company sent a nurse to every employee who contracted
pneumonia to provide home health care and to take a sputum sample for culture
to determine the causative agent. They kept track of the number of employees
who contracted pneumonia and which type of pneumonia each had. The data
were organized as follows:

Group 1: Not provided with the vaccine (unvaccinated control group, N = 92)

Group 2: Provided with the vaccine (vaccinated experimental group, N = 92)

In this case, the independent variable is vaccination status (vaccinated versus unvaccinated). The dependent variable is health outcome, with three levels:
contracted pneumococcal pneumonia;
contracted another type of pneumonia; and
did not contract pneumonia.

The company wanted to know if providing the vaccine made a difference. To answer this question, they must choose a statistic that can test for differences when all the variables are nominal. The χ² statistic was used to test the question, “Was there a difference in incidence of pneumonia between the two groups?” At the end of the winter, Table 1 was constructed to illustrate the occurrence of pneumonia among the employees.

Table 1
Results of the vaccination program.

Health outcome                        Unvaccinated  Vaccinated
Sick with pneumococcal pneumonia                23           5
Sick with non-pneumococcal pneumonia             8          10
No pneumonia                                    61          77

Calculating Chi-square
With the data in table form, the researcher can proceed with calculating the χ² statistic to find out if the vaccination program made any difference in the health outcomes of the employees. The formula for calculating a Chi-Square is:

$$\sum \chi^2_{i-j} = \sum \frac{(O - E)^2}{E}$$

Where:

O = Observed (the actual count of cases in each cell of the table)
E = Expected value (calculated below)
χ² = The cell Chi-square value
Σχ² = Formula instruction to sum all the cell Chi-square values
χ²_{i−j} = The subscript i−j is the notation used to represent all the cells, from the first cell (i) to the last cell (j); in this case, Cell 1 (i) through Cell 6 (j).

The first step in calculating a χ² is to calculate the sum of each row and the sum of each column. These sums are called the “marginals”, and there are row marginal values and column marginal values. The marginal values for the case study data are presented in Table 2.

Table 2
Calculation of marginals.

Health outcome                        Not vaccinated (Col 1)  Vaccinated (Col 2)  Row marginals
Sick with pneumococcal pneumonia                          23                   5             28
Sick with non-pneumococcal pneumonia                       8                  10             18
Stayed healthy                                            61                  77            138
Column marginals (sum of the column)                      92                  92        N = 184
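The first step can be reproduced with a few lines of Python. This is a minimal sketch (the function name is illustrative) that stores Table 1 as a list of rows, one per health outcome, with the columns in the order not vaccinated, vaccinated.

```python
# Table 1 counts: one row per health outcome; columns are (not vaccinated, vaccinated).
observed = [
    [23, 5],   # sick with pneumococcal pneumonia
    [8, 10],   # sick with non-pneumococcal pneumonia
    [61, 77],  # stayed healthy
]


def marginals(table):
    """Return (row marginals, column marginals, N) for a table of counts."""
    row_marginals = [sum(row) for row in table]
    column_marginals = [sum(column) for column in zip(*table)]
    return row_marginals, column_marginals, sum(row_marginals)


rows, columns, n = marginals(observed)
print("Row marginals:", rows)         # [28, 18, 138]
print("Column marginals:", columns)   # [92, 92]
print("N =", n)                       # 184
```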
The second step is to calculate the expected values for each cell. In the Chi-square statistic, the “expected” values represent an estimate of how the cases would be distributed if there were NO vaccine effect. Expected values must reflect both the incidence of cases in each category and the unbiased distribution of cases if there is no vaccine effect. This means the statistic cannot just count the total N and divide by 6 for the expected number in each cell. That would not take account of the fact that more subjects stayed healthy regardless of whether they were vaccinated or not. Chi-square expected values are calculated as follows:

$$E = \frac{M_R \times M_C}{n}$$

Where:

E represents the cell expected value,
M_R represents the row marginal for that cell,
M_C represents the column marginal for that cell, and
n represents the total sample size.
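A minimal Python sketch of this formula (the function name is illustrative): it derives the row and column marginals from the observed counts of Table 1 and then applies E = (M_R × M_C)/n to every cell.

```python
observed = [[23, 5], [8, 10], [61, 77]]  # Table 1; columns: not vaccinated, vaccinated


def expected_values(table):
    """E = (row marginal * column marginal) / n for every cell of the table."""
    row_m = [sum(row) for row in table]
    col_m = [sum(col) for col in zip(*table)]
    n = sum(row_m)
    return [[r * c / n for c in col_m] for r in row_m]


expected = expected_values(observed)
labels = ["Pneumococcal pneumonia", "Other pneumonia", "Stayed healthy"]
for label, row in zip(labels, expected):
    print(f"{label:<24}" + "".join(f"{e:>10.2f}" for e in row))
```

Because both column marginals equal 92 (half of N = 184), each expected value in this particular table is exactly half of its row marginal.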
Specifically, for each cell, its row marginal is multiplied by its column marginal, and that product is divided by the sample size. For Cell 1, the math is as follows: (28 × 92)/184 = 14.00. Because both column marginals equal 92, each expected value in this table is simply half of its row marginal. Table 3 provides the results of this calculation for each cell.
Once the expected values have been calculated, the cell χ² values are calculated with the following formula:

$$\chi^2 = \frac{(O - E)^2}{E}$$

The cell χ² for the first cell in the case study data is calculated as follows: (23 − 14.00)²/14.00 = 81/14 = 5.79. The cell χ² value for each cell is the value in parentheses in each of the cells in Table 3.

Table 3
Cell expected values and (cell Chi-square values).

Health outcome                        Not vaccinated      Vaccinated
Sick with pneumococcal pneumonia        14.00 (5.79)    14.00 (5.79)
Sick with non-pneumococcal pneumonia     9.00 (0.11)     9.00 (0.11)
Stayed healthy                          69.00 (0.93)    69.00 (0.93)

Once the cell χ² values have been calculated, they are summed to obtain the χ² statistic for the table. In this case, the χ² is 13.65 (rounded). The Chi-square table requires the table’s degrees of freedom (df) in order to determine the significance level of the statistic. The degrees of freedom for a χ² table are calculated with the formula:

df = (Number of rows − 1) × (Number of columns − 1).

For example, a 2 × 2 table has (2−1) × (2−1) = 1 df. A 3 × 3 table has (3−1) × (3−1) = 4 df. A 4 × 5 table has (4−1) × (5−1) = 3 × 4 = 12 df. Assuming a χ² value of 13.65 at each of these df levels (1, 4, and 12), the significance levels from a table of χ² values are: df = 1, P < 0.001; df = 4, P < 0.01; and df = 12, P > 0.10. Note that as the degrees of freedom increase, the same χ² value becomes less significant, until at df = 12 the χ² value of 13.65 is no longer statistically significant at the 0.05 level, because P is greater than 0.10.

For the sample table with 3 rows and 2 columns, df = (3−1) × (2−1) = 2 × 1 = 2. A Chi-square table of significances is available in many elementary statistics texts and on many Internet sites. Using a χ² table, the significance of a Chi-square value of 13.65 with 2 df equals P < 0.005. This value may be rounded to P < 0.01 for convenience. The exact significance when the Chi-square is calculated through a statistical program is found to be P = 0.0011.

As the P-value of the table is less than 0.05, the researcher rejects the null hypothesis and accepts the alternate hypothesis: “There is a difference in occurrence of pneumococcal pneumonia between the vaccinated and unvaccinated groups.” However, this result does not specify what that difference might be. To fully interpret the result, it is useful to look at the cell χ² values.

Interpreting cell χ² values

It can be seen in Table 3 that the largest cell χ² values, 5.79 each, occur in Cells 1 and 2. In Cell 1, the observed value is 23 while only 14 were expected. Therefore, this cell has a much larger number of observed cases than would be expected by chance. Cell 1 reflects the number of unvaccinated employees who contracted pneumococcal pneumonia. This means that the number of unvaccinated people who contracted pneumococcal pneumonia was significantly greater than expected. Cell 2 has an equally large cell χ² value, but in this cell the number of observed cases was much lower than expected (Observed = 5, Expected = 14). This means that a significantly lower number of vaccinated subjects contracted pneumococcal pneumonia than would be expected if the vaccine had no effect. No other cell has a cell χ² value greater than 0.93.
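The remaining steps (cell χ² values, their sum, the degrees of freedom and the P-value) can be checked with the sketch below. It is self-contained Python and assumes no statistics library: instead of a printed χ² table, the upper-tail probability is computed from the standard closed-form expressions for integer degrees of freedom (an exponential series for even df, and the complementary error function plus a finite series for odd df). Function names are illustrative; cells are numbered row by row, as in the text.

```python
import math

observed = [[23, 5], [8, 10], [61, 77]]  # Table 1; columns: not vaccinated, vaccinated


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in sqrt(x)
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * k + 1)
    return total


def chi_square_test(table):
    """Return (chi-square, df, P, cells); each cell is (number, O, E, cell chi-square)."""
    row_m = [sum(row) for row in table]
    col_m = [sum(col) for col in zip(*table)]
    n = sum(row_m)
    cells = []
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_m[i] * col_m[j] / n
            cells.append((i * len(row) + j + 1, o, e, (o - e) ** 2 / e))
    stat = sum(cell[3] for cell in cells)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df), cells


stat, df, p, cells = chi_square_test(observed)
print(f"chi-square = {stat:.2f}, df = {df}, P = {p:.4f}")
# Cells ranked by their contribution, largest first
for number, o, e, contribution in sorted(cells, key=lambda c: c[3], reverse=True):
    print(f"Cell {number}: O = {o}, E = {e:.2f}, cell chi-square = {contribution:.2f}")
# The same chi-square value judged against other degrees of freedom
for other_df in (1, 4, 12):
    print(f"same chi-square with df = {other_df}: P = {chi2_sf(stat, other_df):.4f}")
```

For the case-study table, the exact P printed here agrees with the P = 0.0011 reported in the text, and the last loop shows how the same χ² value loses significance as the degrees of freedom grow.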

A cell χ² value less than 1.0 should be interpreted as the number of observed cases being approximately equal to the number of expected cases, meaning there is no vaccination effect on any of the other cells. In the case study example, all other cells produced cell χ² values below 1.0. Therefore the company can conclude that there was no difference between the two groups for incidence of non-pneumococcal pneumonia. It can be seen that for both groups, the majority of employees stayed healthy. The meaningful result was that there were significantly fewer cases of pneumococcal pneumonia among the vaccinated employees and significantly more cases among the unvaccinated employees. As a result, the company should conclude that the vaccination program did reduce the incidence of pneumococcal pneumonia.

Very few statistical programs provide tables of cell expected and cell χ² values as part of the default output. Some programs will produce those tables as an option, and that option should be used to examine the cell χ² values. If the program provides an option to print out only the cell χ² value (but not the cell expected value), the sign of the χ² value provides information about direction. A positive cell χ² value means that the observed value is higher than the expected value, and a negative cell χ² value (e.g. −12.45) means the observed cases are fewer than the expected number of cases. When the program does not provide either option, all the researcher can conclude is this: the overall table either provides evidence that the two groups differ, that is, that the two variables are not independent (P < 0.05), or it provides no such evidence (P > 0.05). Most researchers inspect the table to estimate which cells are overrepresented with a large number of cases versus those which have a small number of cases. However, without access to cell expected or cell χ² values, the interpretation of the direction of the group differences is less precise. Given the ease of calculating the cell expected and χ² values, researchers may want to hand-calculate those values to enhance interpretation.
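When the researcher hand-calculates the cell values as suggested, the sign convention described above can be sketched as follows. This is an illustrative Python helper, not the output format of any particular statistics package: each cell χ² value is given the sign of O − E, so positive values mark cells with more cases than expected and negative values mark cells with fewer.

```python
import math

observed = [[23, 5], [8, 10], [61, 77]]  # Table 1 counts
outcomes = ["Pneumococcal pneumonia", "Other pneumonia", "Stayed healthy"]
groups = ["Not vaccinated", "Vaccinated"]


def signed_cell_chi_square(table):
    """Cell chi-square values carrying the sign of (O - E)."""
    row_m = [sum(row) for row in table]
    col_m = [sum(col) for col in zip(*table)]
    n = sum(row_m)
    signed = []
    for i, row in enumerate(table):
        signed_row = []
        for j, o in enumerate(row):
            e = row_m[i] * col_m[j] / n
            # copysign keeps the magnitude (O - E)^2 / E and takes the sign of O - E
            signed_row.append(math.copysign((o - e) ** 2 / e, o - e))
        signed.append(signed_row)
    return signed


for outcome, row in zip(outcomes, signed_cell_chi_square(observed)):
    cells = ", ".join(f"{g}: {v:+.2f}" for g, v in zip(groups, row))
    print(f"{outcome}: {cells}")
```

Many packages instead report Pearson residuals, (O − E)/√E, which carry the same sign and whose squares are the cell χ² values.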

DISCUSSION:
The chi square test for independence is an extremely flexible and useful test. The test can be used to examine the relationship between any two variables, with any type of measurement (nominal, ordinal, interval or ratio, and discrete or continuous). While chi square tests of independence are very flexible, and can be
used with any cross classification, a researcher must be careful not to either
overemphasize or hide a relationship between two variables. In doing this, the chi
square test itself is not the problem. The methodological problem is the difficulty
of deciding the proper approach to the grouping of the data. There are no strict
guidelines concerning how data is to be grouped properly. In many cases, the
researcher may try several groupings, and observe what happens as these
groupings change. If the results change little as the groupings change, then the
relationship is likely to be quite apparent. Where the relationship seems to
change as the grouping of the data changes, considerably more effort may have
to be made to discern the exact nature of the relationship between the variables.
In our study, we recorded 120 small roller diameters and plotted the data according to the chi-square distribution. We then tested normality under this distribution, although we did not find the exact value.
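The report does not reproduce the roller measurements or the procedure in detail, so the following is only a minimal sketch of one common way to check normality with a chi-square goodness-of-fit test, assuming equal-probability bins: the mean and standard deviation are estimated from the sample, the fitted normal curve is cut into k bins that each have probability 1/k, and the observed bin counts are compared with the expected n/k. Because two parameters are estimated, df = k − 3. It uses only Python's standard `statistics` module (`NormalDist`, `fmean`, `stdev`, available from Python 3.8). The sample below is synthetic, generated from normal quantiles, and is not the experiment's data.

```python
from statistics import NormalDist, fmean, stdev


def chi_square_normality(data, bins=6):
    """Chi-square goodness-of-fit statistic for normality using equal-probability bins.

    Returns (statistic, df) with df = bins - 1 - 2, since the mean and the
    standard deviation are both estimated from the data.
    """
    n = len(data)
    fitted = NormalDist(fmean(data), stdev(data))
    # Interior bin edges at the fitted normal's quantiles 1/bins, 2/bins, ...
    edges = [fitted.inv_cdf(k / bins) for k in range(1, bins)]
    observed = [0] * bins
    for x in data:
        observed[sum(x > edge for edge in edges)] += 1  # index of the bin holding x
    expected = n / bins  # same for every bin; should be at least 5
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, bins - 3


# Synthetic stand-in for 120 diameters (hypothetical 10.00 mm mean, 0.05 mm spread).
sample = [NormalDist(10.0, 0.05).inv_cdf((i + 0.5) / 120) for i in range(120)]
stat, df = chi_square_normality(sample)
CRITICAL_5_PERCENT_DF3 = 7.815  # tabulated chi-square value for alpha = 0.05, df = 3
print(f"chi-square = {stat:.2f}, df = {df}, reject normality: {stat > CRITICAL_5_PERCENT_DF3}")
```

With a real sample, the statistic would be compared with the tabulated critical value for the computed df, just as with the χ² table in the case study, and the rule of at least 5 expected cases per bin still applies.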

CONCLUSION:
The chi square distribution is a theoretical or mathematical distribution which has wide applicability in statistical work. The term ‘chi square’ (pronounced with a hard ‘ch’) is used because the Greek letter χ is used to define this distribution. It will be seen that the elements on which this distribution is based are squared, so that the symbol χ² is used to denote the distribution. Each χ² distribution has a number of degrees of freedom associated with it, so that there are many different chi squared distributions. The χ² statistic appears quite different from the other statistics which have been used in the previous hypothesis tests. It also appears to bear little resemblance to the theoretical chi square distribution just described.
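The statement that the χ² distribution is built from squared elements can be illustrated by simulation. The sketch below (standard-library Python; the function name, seed and number of draws are arbitrary illustrative choices) sums df squared standard normal draws and compares the sample mean and variance with the theoretical values df and 2·df of a χ² distribution with df degrees of freedom.

```python
import random


def simulate_chi_square(df, draws=100_000, seed=1):
    """Return `draws` values of Z1^2 + ... + Zdf^2 for independent standard normal Z."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(draws)]


for df in (1, 2, 5):
    sample = simulate_chi_square(df)
    mean = sum(sample) / len(sample)
    variance = sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)
    # Theory: mean = df and variance = 2 * df for a chi-square distribution with df degrees of freedom
    print(f"df = {df}: mean = {mean:.2f} (theory {df}), variance = {variance:.2f} (theory {2 * df})")
```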
