Chapter 15
Chi-Square Tests
Chi-Square Test for Independence
Chi-Square Tests for Goodness-of-Fit
Uniform Goodness-of-Fit Test
Poisson Goodness-of-Fit Test
Normal Chi-Square Goodness-of-Fit Test
ECDF Tests (Optional)
McGraw-Hill/Irwin Copyright 2009 by The McGraw-Hill Companies, Inc. All rights reserved.
Table 15.1
Figure 15.1
Figure 15.2
Figure 15.3
Step 5: Make the Decision
Reject H0 if the test statistic exceeds the right-tail critical value of χ², or if the p-value < α.
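As a sketch of this decision rule (not part of the original slides), the test statistic, critical value, and p-value can be obtained in Python with SciPy; the 2×2 contingency table below is hypothetical:

```python
import numpy as np
from scipy import stats

alpha = 0.05
observed = np.array([[30, 20],
                     [15, 35]])          # hypothetical 2x2 contingency table

# chi2_contingency applies Yates' continuity correction for 2x2 tables
chi2_calc, p_value, dof, expected = stats.chi2_contingency(observed)
chi2_crit = stats.chi2.ppf(1 - alpha, dof)   # right-tail critical value

# Reject H0 when the test statistic exceeds the critical value,
# or equivalently when the p-value falls below alpha.
reject = chi2_calc > chi2_crit
print(chi2_calc, chi2_crit, p_value, reject)
```

The two conditions are equivalent because both are computed from the same statistic and chi-square distribution.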
The chi-square test is unreliable if the expected frequencies are too small. Rules of thumb: Cochran's Rule requires that ejk > 5 for all cells, though up to 20% of the cells may have ejk < 5. Most agree that a chi-square test is infeasible if ejk < 1 in any cell. If this happens, try combining adjacent rows or columns to enlarge the expected frequencies.
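A minimal sketch (not from the slides) of checking these rules of thumb on a hypothetical table, using the usual expected-frequency formula ejk = (row total)(column total)/n:

```python
import numpy as np

observed = np.array([[12, 8, 5],
                     [9, 6, 4]])                 # hypothetical counts
row_tot = observed.sum(axis=1, keepdims=True)
col_tot = observed.sum(axis=0, keepdims=True)
n = observed.sum()
expected = row_tot @ col_tot / n                 # e_jk = (row)(col)/n

small = (expected < 5).mean()    # proportion of cells with e_jk < 5
feasible = expected.min() >= 1   # test infeasible if any e_jk < 1
print(expected.round(2), small, feasible)
```

If `small` exceeded 0.20 or `feasible` were False, the rules of thumb suggest combining adjacent rows or columns before testing.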
For example, the variables Infant Deaths per 1,000 and Doctors per 100,000 can each be coded into various categories:
Figure 15.6
A multinomial distribution is defined by any k probabilities p1, p2, …, pk that sum to unity. For example, consider the following official proportions of M&M colors.
H1: At least one of the pj differs from the hypothesized value.
No parameters are estimated (m = 0) and there are c = 6 classes, so the degrees of freedom are ν = c − m − 1 = 6 − 0 − 1 = 5.
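A sketch of this multinomial goodness-of-fit test in Python; the color proportions and sample counts below are illustrative placeholders, not the official M&M figures:

```python
from scipy import stats

p_hyp = [0.13, 0.13, 0.14, 0.16, 0.20, 0.24]   # hypothesized p_j (sum to 1)
f_obs = [20, 25, 30, 28, 40, 57]               # hypothetical sample counts
n = sum(f_obs)
e_exp = [n * p for p in p_hyp]                 # expected frequencies n*p_j

# m = 0 parameters estimated, c = 6 classes -> df = c - m - 1 = 5
chi2_calc, p_value = stats.chisquare(f_obs, f_exp=e_exp)
print(round(chi2_calc, 4), round(p_value, 4))
```

With `ddof` left at its default of 0, `scipy.stats.chisquare` uses df = c − 1 = 5, which matches ν here because no parameters were estimated.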
The blank may contain the name of any theoretical distribution (e.g., uniform, Poisson, normal).
χ²calc = Σ (fj − ej)² / ej   (summed over j = 1, …, c)
where
fj = the observed frequency of observations in class j
ej = the expected frequency in class j if H0 were true
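The statistic can be computed directly from this definition; the frequencies below are hypothetical, chosen only to illustrate the arithmetic:

```python
f = [18, 22, 30, 30]   # observed frequencies f_j (hypothetical)
e = [25, 25, 25, 25]   # expected frequencies e_j under H0

# Sum of squared deviations, each scaled by its expected frequency
chi2_calc = sum((fj - ej) ** 2 / ej for fj, ej in zip(f, e))
print(chi2_calc)   # (49 + 9 + 25 + 25) / 25 = 4.32
```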
Mixtures: A Problem
Mixtures occur when more than one data-generating process is superimposed on another.
The test can be performed on data that are already tabulated into groups. Calculate the expected frequency ej for each cell. The degrees of freedom are ν = c − 1 since there are no parameters for the uniform distribution. Obtain the critical value χ²α from Appendix E for the desired level of significance α. The p-value can be obtained from Excel. Reject H0 if the p-value < α.
First form c bins of equal width and create a frequency distribution. Calculate the observed frequency fj for each bin. Define ej = n/c. Perform the chi-square calculations. The degrees of freedom are ν = c − 1 since there are no parameters for the uniform distribution. Obtain the critical value from Appendix E for a given significance level α and make the decision.
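The binning procedure above can be sketched in Python (the sample is simulated, not from the text):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.uniform(0, 1, size=300)            # simulated sample

c = 6                                      # number of equal-width bins
f_obs, _ = np.histogram(x, bins=c, range=(0, 1))   # observed f_j
e_exp = np.full(c, x.size / c)             # e_j = n / c for every bin

# df = c - 1 = 5 (no parameters estimated for the uniform)
chi2_calc, p_value = stats.chisquare(f_obs, e_exp)
print(f_obs, round(chi2_calc, 3), round(p_value, 3))
```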
λ̂ = (Σ xj fj) / n
Using this estimated mean, calculate the Poisson probabilities either by using the Poisson formula P(x) = λ^x e^(−λ) / x! or Excel.
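A sketch of this estimation step on hypothetical tabulated counts; in practice the top class would be open-ended (e.g., x ≥ 4) so the expected frequencies sum to n:

```python
from math import exp, factorial

x_vals = [0, 1, 2, 3, 4]          # number of events x_j
f_obs  = [30, 36, 20, 10, 4]      # hypothetical observed frequencies
n = sum(f_obs)

# lambda-hat = sum(x_j * f_j) / n, the sample mean from grouped data
lam = sum(x * f for x, f in zip(x_vals, f_obs)) / n

def pois(x, lam):
    """Poisson probability P(x) = lambda^x * e^(-lambda) / x!"""
    return lam ** x * exp(-lam) / factorial(x)

e_exp = [n * pois(x, lam) for x in x_vals]   # expected frequencies
print(lam, [round(e, 2) for e in e_exp])
```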
Count the sample observations fj within intervals of the form x̄ ± ks and compare them with the expected frequencies ej based on the normal distribution.
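This interval-counting idea can be sketched as follows (simulated data, not from the text); the expected counts come from the standard normal probabilities for ±k standard deviations:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(loc=50, scale=10, size=500)   # simulated sample
xbar, s = x.mean(), x.std(ddof=1)

for k in (1, 2, 3):
    f_obs = np.sum(np.abs(x - xbar) <= k * s)        # observed count in xbar +/- k*s
    p_k = stats.norm.cdf(k) - stats.norm.cdf(-k)     # P(-k < Z < k)
    e_exp = x.size * p_k                             # expected count
    print(k, int(f_obs), round(e_exp, 1))
```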
Table 15.16
The fitted normal histogram gives visual clues as to the likely outcome of the GOF test. Histograms reveal outliers or other non-normality issues, but further tests are needed since histogram appearance varies from sample to sample.
Figure 15.15
Table 15.19
ECDF Tests
Kolmogorov-Smirnov and Lilliefors Tests
There are many alternatives to the chi-square test based on the Empirical Cumulative Distribution Function (ECDF). The Kolmogorov-Smirnov (K-S) test statistic D is the largest absolute difference between the actual and expected cumulative relative frequency of the n data values: D = Max |Fa − Fe|. The K-S test is not recommended for grouped data.
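A sketch of a K-S test in Python (simulated data, not from the text), with the hypothesized normal distribution fully specified in advance rather than estimated from the sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0, scale=1, size=200)   # simulated sample

# D = max |F_actual - F_expected| against a fully specified N(0, 1);
# if the parameters were estimated from x, a Lilliefors test is needed instead.
D, p_value = stats.kstest(x, "norm", args=(0, 1))
print(round(D, 4), round(p_value, 4))
```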
Fa is the actual cumulative frequency at observation i. Fe is the expected cumulative frequency at observation i under the assumption that the data came from the hypothesized distribution. The K-S test assumes that no parameters are estimated. If parameters are estimated, use a Lilliefors test. Both of these tests are done by computer.
Figure 15.20
Figure 15.21
ECDF Tests
Anderson-Darling Tests
The Anderson-Darling (A-D) test is widely used for non-normality because of its power. The A-D test is based on a probability plot. When the data fit the hypothesized distribution closely, the probability plot will be close to a straight line. The A-D test statistic measures the overall distance between the actual and the hypothesized distributions, using a weighted squared distance.
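An A-D normality test can be sketched with SciPy (simulated data; the slides use MINITAB, but the statistic is analogous):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=100)                 # simulated sample

result = stats.anderson(x, dist="norm")
# Reject normality at a given significance level if the A-D statistic
# exceeds that level's critical value.
print(round(result.statistic, 3),
      result.critical_values,            # critical values at 15%, 10%, 5%, 2.5%, 1%
      result.significance_level)
```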
Anderson-Darling Tests with MINITAB
Figure 15.22
End of Chapter 15