Ombc 106 Notes u11
1.0 Introduction
What is a Chi-Square Goodness-of-Fit Test? - A Chi-Square Goodness-of-Fit test is a statistical test used to
determine whether sample data fit a specified distribution. It compares the observed frequency in each category
with the frequency expected under that theoretical distribution.
When to Use:
• Categorical Data: When your data is categorical (e.g., colors, brands, ratings).
• Theoretical Distribution: You have a specific theoretical distribution in mind (e.g., uniform, normal,
binomial, Poisson).
Hypotheses:
• Null Hypothesis (H₀): The data follow the specified distribution, so the observed frequencies are consistent
with the expected frequencies.
• Alternative Hypothesis (H₁): The data do not follow the specified distribution.
Test Statistic:
χ² = Σ [(O - E)² / E]
where:
χ²: Chi-square test statistic
O: Observed frequency
E: Expected frequency
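Degrees of Freedom (df) = k - 1, where k is the number of categories (cells). This applies when no parameters
of the theoretical distribution are estimated from the sample; each estimated parameter reduces df by one more.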
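The statistic and degrees of freedom defined above can be sketched in a few lines of Python. The function name chi_square_statistic and the coin-flip counts are hypothetical, chosen only to illustrate the formula.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: a coin flipped 100 times, tested against a 50/50 split.
observed = [60, 40]
expected = [50, 50]

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1  # k - 1, with k = 2 categories

print(chi2, df)  # 4.0 1
```

Each category contributes (10)² / 50 = 2.0, so the two categories sum to 4.0.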
Decision Rule:
• Critical Value Approach: Compare the calculated χ² value with the critical χ² value from the chi-square
distribution table for the chosen significance level (α) and degrees of freedom. Reject H₀ if the calculated
value exceeds the critical value.
• p-value Approach: Calculate the p-value associated with the calculated χ² value and compare it to the
significance level (α). Reject H₀ if the p-value is less than α.
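The two decision rules can be compared side by side in a short sketch. It assumes a hand-written upper-tail function, chi2_sf, that uses the closed-form expressions available for integer degrees of freedom in place of a table or statistical software. The function name and the test statistic 12.5 are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction sum.
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2 * x / math.pi) * math.exp(-half) * total)

alpha = 0.05
df = 5
critical_value = 11.07  # table value for alpha = 0.05, df = 5
chi2_value = 12.5       # hypothetical test statistic

# Critical value approach: reject H0 if the statistic exceeds the critical value.
reject_by_critical = chi2_value > critical_value

# p-value approach: reject H0 if the p-value is below alpha.
p_value = chi2_sf(chi2_value, df)
reject_by_p = p_value < alpha

print(reject_by_critical, reject_by_p)  # True True
```

Both approaches use the same distribution, so they lead to the same decision for a given α and df.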
Example:
Suppose we have a six-sided die and we want to test whether it is fair. We roll the die 60 times and
observe the following frequencies:
Face    Observed Frequency (O)
1       12
2       10
3       8
4       15
5       9
6       6
Expected Frequency (E) for each face: 60/6 = 10
Test Statistic:
χ² = [(12 - 10)² + (10 - 10)² + (8 - 10)² + (15 - 10)² + (9 - 10)² + (6 - 10)²] / 10
   = (4 + 0 + 4 + 25 + 1 + 16) / 10 = 50/10 = 5.0
Degrees of Freedom: df = 6 - 1 = 5
Decision:
Using a significance level of α = 0.05, the critical value for a chi-square distribution with 5 degrees of
freedom is approximately 11.07.
Since the calculated χ² value (5.0) is less than the critical value, we fail to reject the null hypothesis.
Conclusion: There is not enough evidence to conclude that the die is unfair. The observed frequencies are
consistent with the expected frequencies of a fair die.
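As a check on the arithmetic, the die example can be recomputed in Python. This is a minimal sketch using only the counts and the critical value given in the example; the variable names are illustrative.

```python
# Observed counts for faces 1 to 6, from 60 rolls.
observed = [12, 10, 8, 15, 9, 6]

n = sum(observed)        # 60 rolls in total
k = len(observed)        # 6 categories
expected = [n / k] * k   # a fair die gives 10 per face

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1

critical_value = 11.07   # table value for alpha = 0.05, df = 5
reject = chi2 > critical_value

print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {reject}")
# chi2 = 5.00, df = 5, reject H0: False
```

The squared deviations sum to 50 and each is divided by 10, giving a statistic of 5.0, well below 11.07.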
Note: The chi-square goodness-of-fit test is sensitive to sample size. A larger sample size can increase the power
of the test to detect small deviations from the expected distribution.
What is a Chi-Square Test for Independence? - A Chi-Square Test for Independence is a statistical test used to
determine whether there is a significant association between two categorical variables. It helps us understand if
the variables are independent or dependent.
When to Use:
• Categorical Data: Both variables should be categorical.
• Contingency Table: The data is often presented in a contingency table.
Hypotheses:
• Null Hypothesis (H₀): The two variables are independent.
• Alternative Hypothesis (H₁): The two variables are dependent.
Test Statistic:
χ² = Σ [(O - E)² / E]
where:
χ²: Chi-square test statistic
O: Observed frequency
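E: Expected frequency = (row total × column total) / grand total
Degrees of Freedom (df) = (r - 1) × (c - 1), where r is the number of rows and c the number of columns.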
Decision Rule:
• Critical Value Approach: Compare the calculated χ² value with the critical χ² value from the chi-square
distribution table for the chosen significance level (α) and degrees of freedom. Reject H₀ if the calculated
value exceeds the critical value.
• p-value Approach: Calculate the p-value associated with the calculated χ² value and compare it to the
significance level (α). Reject H₀ if the p-value is less than α.
Example:
Suppose we want to test whether there is a relationship between gender and preference for a particular
brand of soda. We collect data from 100 people and organize it into a contingency table:
Gender/Soda Preference    Brand A    Brand B    Total
Male                      25         25         50
Female                    25         25         50
Total                     50         50         100
Expected Frequencies (each cell = row total × column total / grand total, e.g. 50 × 50 / 100 = 25):
Gender/Soda Preference    Brand A    Brand B    Total
Male                      25         25         50
Female                    25         25         50
Total                     50         50         100
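Test Statistic: Every observed frequency equals its expected frequency, so each term (O - E)² / E is 0 and χ² = 0.
Degrees of Freedom: df = (2 - 1) × (2 - 1) = 1
Decision: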
Using a significance level of α = 0.05, the critical value for a chi-square distribution with 1 degree of
freedom is approximately 3.84.
Since the calculated χ² value (0) is less than the critical value, we fail to reject the null hypothesis.
Conclusion: There is not enough evidence to conclude that there is a relationship between gender and
soda preference. The two variables appear to be independent.
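The soda example can be reproduced with a short Python sketch that derives the expected frequencies from the row and column totals. Only the table values and the critical value come from the example; the variable names are illustrative.

```python
# Observed counts: rows are Male, Female; columns are Brand A, Brand B.
observed = [[25, 25],
            [25, 25]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

critical_value = 3.84  # table value for alpha = 0.05, df = 1
reject = chi2 > critical_value

print(expected, chi2, df, reject)
# [[25.0, 25.0], [25.0, 25.0]] 0.0 1 False
```

Because the observed table matches the expected table exactly, the statistic is 0, the smallest value it can take.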
Note: The chi-square test for independence is sensitive to sample size and the expected cell frequencies. It's
important to ensure that the expected frequencies are not too small, as this can affect the accuracy of the test.
4.0 A CHI-SQUARE TEST FOR THE EQUALITY OF MORE THAN TWO POPULATION PROPORTIONS
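This statistical test is used to determine whether the proportions of a categorical variable are equal across
three or more populations. The null hypothesis is H₀: p₁ = p₂ = ... = pₖ; the alternative hypothesis is that at
least one population proportion differs from the others. The test is applied in many fields, from the social
sciences to medical research.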
Assumptions:
1. Independence: Observations within and between groups are independent.
2. Sample Size: Expected cell counts should be at least 5 for the chi-square approximation to be valid.
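The sample-size assumption can be checked before running the test. The sketch below assumes a hypothetical helper, small_expected_cells, and a made-up 2 × 3 table; it flags every cell whose expected count falls below 5.

```python
def expected_counts(table):
    """Expected counts under H0: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def small_expected_cells(table, minimum=5):
    """Return (row index, column index, expected count) for cells below the minimum."""
    return [
        (i, j, e)
        for i, row in enumerate(expected_counts(table))
        for j, e in enumerate(row)
        if e < minimum
    ]

# Hypothetical observed counts: 2 categories (rows) by 3 populations (columns).
table = [[3, 12, 15],
         [7, 28, 35]]

flagged = small_expected_cells(table)
print(flagged)  # [(0, 0, 3.0)]
```

Here the first cell has an expected count of 30 × 10 / 100 = 3, so the chi-square approximation may be unreliable for this table.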
Steps Involved:
1. Create a Contingency Table:
○ Organize the data into a contingency table with rows representing the categories of the categorical
variable and columns representing the different populations.
2. Calculate Expected Frequencies:
○ For each cell in the table, calculate the expected frequency under the null hypothesis. This is done by
multiplying the row total by the column total and dividing by the grand total.
3. Calculate the Chi-Square Test Statistic:
○ Use the following formula:
χ² = Σ [(O - E)² / E]
where:
χ²: Chi-square test statistic
O: Observed frequency in a cell
E: Expected frequency in a cell
Σ: Summation over all cells in the table
4. Determine Degrees of Freedom:
○ Calculate the degrees of freedom: df = (r - 1) × (c - 1)
where:
r: Number of rows in the contingency table
c: Number of columns in the contingency table
5. Find the Critical Value or p-value:
○ Critical Value Approach:
Use a chi-square distribution table to find the critical value corresponding to the desired
significance level (α) and degrees of freedom.
○ p-value Approach:
Use statistical software or a chi-square calculator to find the p-value associated with the
calculated chi-square test statistic and degrees of freedom.
6. Make a Decision:
○ Critical Value Approach:
If the calculated chi-square test statistic is greater than the critical value, reject the null
hypothesis.
○ p-value Approach:
If the p-value is less than the significance level (α), reject the null hypothesis.
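The six steps above can be sketched end to end in Python. The data are hypothetical: 100 people sampled from each of three populations, with 30, 40 and 50 answering "yes". The p-value line relies on the fact that for df = 2 the chi-square upper-tail probability is exp(-χ²/2); for other df, use a table or statistical software.

```python
import math

# Step 1: contingency table. Rows are categories (yes, no);
# columns are the three hypothetical populations.
observed = [[30, 40, 50],
            [70, 60, 50]]

# Step 2: expected frequencies = row total * column total / grand total.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Step 3: chi-square statistic, summed over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

# Step 4: degrees of freedom = (r - 1) * (c - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Step 5: critical value from a table, and p-value (closed form valid for df = 2 only).
alpha = 0.05
critical_value = 5.991  # table value for alpha = 0.05, df = 2
p_value = math.exp(-chi2 / 2)

# Step 6: decision by either approach.
reject_by_critical = chi2 > critical_value
reject_by_p = p_value < alpha

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
print(reject_by_critical, reject_by_p)  # True True
```

With expected counts of 40 and 60 in every column, the statistic is 200/40 + 200/60, about 8.33, which exceeds 5.991, so the hypothesis of equal proportions is rejected for these made-up data.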
Interpretation:
• Rejecting the null hypothesis indicates that there is evidence to suggest that at least one population
proportion is different from the others.
• Failing to reject the null hypothesis suggests that there is not enough evidence to conclude that the
population proportions are different.
Additional Considerations:
• Post-hoc Tests: If the null hypothesis is rejected, post-hoc tests can be used to determine which specific
pairs of populations have significantly different proportions.
• Assumptions: It's important to check the assumptions of independence and expected cell sizes before
conducting the chi-square test.
By following these steps and considering the assumptions, you can effectively use the chi-square test to compare
proportions across multiple populations.