Maths report (2)
Chi-Square Distribution
What is a Chi-Square Test?
Chi-Square Test Formula
Degrees of Freedom
Steps for Chi-Square Test
Example for Chi-Square Test
C – Program
Applications of Chi-Square Distribution
Advantages
Limitations
Conclusion
References
CHI SQUARE DISTRIBUTION
χ² = Σ (Oi - Ei)² / Ei
Where,
• Σ (sigma): The summation symbol. The term (Oi - Ei)² / Ei is computed for
every cell of the contingency table and the results are added together.
• Oi: The observed frequency, i.e. the number of observations actually
counted in a given cell of the contingency table.
• Ei: The expected frequency, i.e. the number of observations you would
expect in that cell if the null hypothesis (no association) were true.
• (Oi - Ei): The difference between the observed and expected frequencies
for the cell. It is squared and divided by Ei before summing.
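The definitions above translate directly into a short loop. The following is a minimal C sketch, not the report's own program; the function name chi_square_stat and the use of plain double arrays are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of (O_i - E_i)^2 / E_i over n cells.
   chi_square_stat is a hypothetical helper name, not taken from the report. */
double chi_square_stat(const double *observed, const double *expected, size_t n)
{
    double chi_square = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(expected[i] > 0.0);      /* E_i = 0 would divide by zero */
        double diff = observed[i] - expected[i];
        chi_square += diff * diff / expected[i];
    }
    return chi_square;
}
```

The assertion reflects the fact that the statistic is undefined when an expected frequency is zero; a production version might report an error instead of aborting.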
Expected Value (Ei)
The expected values are the theoretical frequencies that would occur if
the null hypothesis were true. They are calculated based on the assumption of
no association (for independence tests) or a specific distribution (for
goodness-of-fit tests).
• When investigating the relationship between two categorical variables, a
contingency table is used to record every combination of their categories.
The categories of one variable form the columns and the categories of the
other form the rows. For instance, such a table can show how many females
chose vanilla ice cream.
Suppose the hypothesis is that men prefer vanilla while women prefer
chocolate. To test it, we record how many respondents of each gender chose
each flavor.
Here's an example of what a contingency table might look like:
In this table:
• The table has two dimensions: gender and ice cream flavor. The rows are
the male and female categories, and the columns are the chocolate, vanilla
and strawberry flavors.
• Each cell contains the count for one combination of categories.
• A chi-square test can be conducted on this table to examine the
association between these two categorical variables.
• Expected Frequency Calculation: The expected frequency of each cell is
computed from the table's marginal totals. Multiply the total of the cell's
row by the total of its column, then divide by the total number of
observations in the table.
Formula :
Expected frequency = (Row total × Column total) / Total number of observations
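The row-total times column-total rule can be sketched in C as follows. This is an illustrative sketch only: the function name expected_counts and the row-by-row flat array layout are assumptions, not part of the report.

```c
#include <assert.h>
#include <stddef.h>

/* Fill expected[r * cols + c] = (row total * column total) / grand total
   for a rows x cols table of observed counts stored row by row.
   expected_counts is a hypothetical helper name. */
void expected_counts(const double *observed, double *expected,
                     size_t rows, size_t cols)
{
    double grand_total = 0.0;
    for (size_t i = 0; i < rows * cols; i++)
        grand_total += observed[i];
    assert(grand_total > 0.0);          /* an empty table has no expectations */

    for (size_t r = 0; r < rows; r++) {
        double row_total = 0.0;
        for (size_t c = 0; c < cols; c++)
            row_total += observed[r * cols + c];

        for (size_t c = 0; c < cols; c++) {
            double col_total = 0.0;
            for (size_t k = 0; k < rows; k++)
                col_total += observed[k * cols + c];
            expected[r * cols + c] = row_total * col_total / grand_total;
        }
    }
}
```

For a made-up 2 × 3 table with rows {20, 30, 10} and {30, 20, 40} (hypothetical counts, not the report's data), the row totals are 60 and 90, every column total is 50 and the grand total is 150, so the expected frequency is 20 in each cell of the first row and 30 in each cell of the second.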
• A chi-square table can be used to find the p-value for the calculated
chi-square statistic (χ²) at the given degrees of freedom (df). The table
lists the critical values of the chi-square statistic for various
probabilities and degrees of freedom.
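A table lookup of this kind can be mimicked with a small array. The sketch below hard-codes the upper-tail critical values at the 0.05 level for 1 to 10 degrees of freedom, as printed in standard chi-square tables; the names CRITICAL_005, critical_value_005 and exceeds_critical_005 are hypothetical, and the restriction to one significance level is a simplification.

```c
#include <assert.h>

/* Upper-tail critical values of the chi-square distribution at the 0.05
   level for df = 1..10, copied from a standard chi-square table. */
static const double CRITICAL_005[10] = {
    3.841, 5.991, 7.815, 9.488, 11.070,
    12.592, 14.067, 15.507, 16.919, 18.307
};

/* Look up the 0.05 critical value; only df 1..10 are tabulated here. */
double critical_value_005(int df)
{
    assert(df >= 1 && df <= 10);
    return CRITICAL_005[df - 1];
}

/* Returns 1 when the statistic exceeds the critical value,
   which is equivalent to a p-value below 0.05. */
int exceeds_critical_005(double chi_square, int df)
{
    return chi_square > critical_value_005(df);
}
```

Comparing the statistic with a critical value gives the same decision as comparing the p-value with the significance level, but it does not yield the p-value itself.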
Step 7: Interpret Results
• If the p-value is less than the chosen significance level, commonly
denoted by α (e.g., 0.05), we reject the null hypothesis. This means there
is a statistically significant association between the categorical
variables. It does not by itself indicate how strong that association is.
EXAMPLE-
1. A die is thrown 60 times and the frequency distribution for the number
appearing on the face x is given by the following table:
Faces       1   2   3   4   5   6
Frequency  15   6   4   7  11  17
SOLUTION -
Calculate the expected frequency for each face under the null hypothesis
that the die is unbiased. An unbiased die shows each face equally often,
so the expected frequency for each face is 60/6 = 10.
Ei = N × P(Xi) = 60 × 1/6 = 10
Observed frequencies (Oi): 15, 6, 4, 7, 11, 17
Face    Oi    Ei    (Oi - Ei)²    (Oi - Ei)² / Ei
1       15    10    25            2.5
2        6    10    16            1.6
3        4    10    36            3.6
4        7    10     9            0.9
5       11    10     1            0.1
6       17    10    49            4.9
Total                             13.6
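The calculation for the die can be reproduced with a short C function. This is a minimal sketch for the equal-probability case only; the name chi_square_uniform is hypothetical and not taken from the report.

```c
#include <assert.h>
#include <stddef.h>

/* Chi-square statistic for k observed counts against equal expected
   frequencies E = N / k (for k = 6, the hypothesis of an unbiased die).
   chi_square_uniform is a hypothetical helper name. */
double chi_square_uniform(const int *observed, size_t k)
{
    assert(k > 0);
    long total = 0;
    for (size_t i = 0; i < k; i++)
        total += observed[i];
    assert(total > 0);

    double expected = (double)total / (double)k;
    double chi_square = 0.0;
    for (size_t i = 0; i < k; i++) {
        double diff = observed[i] - expected;
        chi_square += diff * diff / expected;
    }
    return chi_square;
}
```

Called with the counts 15, 6, 4, 7, 11, 17 it returns 13.6, with 6 - 1 = 5 degrees of freedom. Standard chi-square tables list 11.070 as the critical value for 5 degrees of freedom at the 0.05 level, so a statistic of 13.6 would lead to rejecting the hypothesis that the die is unbiased at that level.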
C – Program to compute Chi–Square Distribution -
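#include <stdio.h>
#include <math.h>
int main()
{
    int n, i;
    float observed, expected, chi_square = 0.0;
    /* Assumed input format: the number of categories, then one
       "observed expected" pair per category */
    if (scanf("%d", &n) != 1) return 1;
    for (i = 0; i < n; i++) {
        if (scanf("%f %f", &observed, &expected) != 2 || expected <= 0) return 1;
        chi_square += pow(observed - expected, 2) / expected;
    }
    printf("Chi-square value = %.2f\n", chi_square);
    return 0;
}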
OUTPUT -
Applications of Chi-Square Distribution in Computer Science-
• Chi-Square tests are used to determine whether two categorical variables
are independent of each other in large datasets.
• This is useful in association rule mining (e.g., market basket analysis) to
identify dependencies between items.
• Example: Finding relationships between products in transaction data to
discover patterns like "people who buy X often buy Y."
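For a pair of items X and Y, the independence test reduces to a 2 × 2 table of basket counts. The sketch below uses the standard 2 × 2 shortcut formula; the function name chi_square_2x2 and the way the four counts are passed are assumptions made for illustration.

```c
#include <assert.h>

/* Chi-square statistic for a 2x2 table of transaction counts:
     a = baskets with both X and Y,   b = baskets with X only,
     c = baskets with Y only,         d = baskets with neither.
   Uses the 2x2 shortcut n (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)),
   which equals the sum of (O - E)^2 / E over the four cells. */
double chi_square_2x2(double a, double b, double c, double d)
{
    double n = a + b + c + d;
    double denom = (a + b) * (c + d) * (a + c) * (b + d);
    assert(denom > 0.0);    /* every row and column must be non-empty */
    double cross = a * d - b * c;
    return n * cross * cross / denom;
}
```

A 2 × 2 table has (2 - 1)(2 - 1) = 1 degree of freedom, for which standard tables give a critical value of 3.841 at the 0.05 level. With made-up counts a = 30, b = 20, c = 10, d = 40 the statistic is about 16.67, which would indicate that buying X and buying Y are not independent in that hypothetical data.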
4. Anomaly Detection
• In text mining and NLP, the Chi-Square test helps identify significant
terms for document classification.
• Chi-Square evaluates the dependency between words (features) and
document classes.
• Example: Determining which words are most significant for categorizing
emails as spam or non-spam.
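Ranking words by their chi-square score against the spam / non-spam label can be sketched as below. The struct term_counts layout and the function names term_chi_square and most_informative_term are hypothetical; real feature-selection code would normally work from a document-term matrix.

```c
#include <assert.h>
#include <stddef.h>

/* Document counts for one term (hypothetical layout):
   spam_with / spam_without = spam emails containing / lacking the term,
   ham_with  / ham_without  = non-spam emails containing / lacking it. */
struct term_counts {
    double spam_with, spam_without, ham_with, ham_without;
};

/* Sum of (O - E)^2 / E over the four cells,
   with E = row total * column total / N. */
double term_chi_square(const struct term_counts *t)
{
    double obs[2][2] = { { t->spam_with, t->spam_without },
                         { t->ham_with,  t->ham_without  } };
    double row[2] = { obs[0][0] + obs[0][1], obs[1][0] + obs[1][1] };
    double col[2] = { obs[0][0] + obs[1][0], obs[0][1] + obs[1][1] };
    double n = row[0] + row[1];
    double score = 0.0;
    for (int r = 0; r < 2; r++) {
        for (int c = 0; c < 2; c++) {
            double e = row[r] * col[c] / n;
            assert(e > 0.0);            /* empty row or column: undefined */
            score += (obs[r][c] - e) * (obs[r][c] - e) / e;
        }
    }
    return score;
}

/* Index of the term with the largest chi-square score. */
size_t most_informative_term(const struct term_counts *terms, size_t count)
{
    assert(count > 0);
    size_t best = 0;
    for (size_t i = 1; i < count; i++)
        if (term_chi_square(&terms[i]) > term_chi_square(&terms[best]))
            best = i;
    return best;
}
```

A term that appears equally often in both classes scores 0, while a term concentrated in one class scores high, so keeping the highest-scoring terms is one simple way to choose features for a classifier.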
6. Image Processing
7. Quality of Software Testing
Advantages -
Limitations -
• Assumption of Independence: The test assumes that the observations are
independent of one another, which may not always hold in real-world data.
Conclusion -
The conclusion for a Chi-Square distribution analysis depends on the
context in which it is applied. However, a general conclusion for this
statistical method includes the following points:
1. Purpose:
The Chi-Square test evaluates whether there is a significant association
or difference between observed and expected frequencies in categorical
data.
2. Key Outcomes:
3. Interpretation:
4. Limitations:
• The test assumes a sufficiently large sample size and that expected
frequencies in each category are generally ≥ 5. If these assumptions
are violated, the results may not be reliable.
References :
1. GeeksforGeeks
https://www.geeksforgeeks.org/chi-square-test/
2. Wikipedia
https://en.wikipedia.org/wiki/Chi-squared_distribution
3. https://math.arizona.edu/~jwatkins/chi-square-table.pdf
4. https://www.simplilearn.com/tutorials/statistics-tutorial/chi-square-test
5. https://www.w3schools.com/python/numpy/numpy_random_chisquare.asp