4. Goodness of Fit and Contingency Tables

This document discusses goodness of fit tests and contingency tables. It provides examples of using chi-square tests to determine if observed data fits several theoretical distributions including the normal, binomial, and Poisson distributions. Steps for conducting chi-square tests are outlined and calculations are shown for several examples.

Uploaded by

加赛 郭
Copyright
© All Rights Reserved

Goodness of Fit and Contingency Tables

Lecturer: 刘子真
Chi-square Distribution

If $Z_1, Z_2, \dots, Z_k$ are independent random variables, each following the standard normal distribution, then

$Q = Z_1^2 + Z_2^2 + \dots + Z_k^2 \sim \chi^2_k$, with degrees of freedom $= k$.
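The definition can be checked informally by simulation. The following is a minimal sketch, not part of the original slides: the names `sample_q`, `k` and `n_samples` and the choice of seed are illustrative assumptions. It draws many values of Q for k = 5 and compares the sample mean with k (the mean of a chi-square variable equals its degrees of freedom), and the fraction of values above 11.070 with 5% (11.070 is the 5% critical value for five degrees of freedom quoted in these notes).

```python
import random

random.seed(0)  # fixed seed (an arbitrary choice) so the run is repeatable


def sample_q(k):
    """Draw one value of Q = Z_1^2 + ... + Z_k^2 with each Z_i ~ N(0, 1)."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))


k = 5
n_samples = 20_000
samples = [sample_q(k) for _ in range(n_samples)]

mean_q = sum(samples) / n_samples
tail_fraction = sum(q > 11.070 for q in samples) / n_samples

print(f"sample mean of Q: {mean_q:.3f} (theory: {k})")
print(f"fraction above 11.070: {tail_fraction:.3f} (theory: about 0.05)")
```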
Number on die    1    2    3    4    5    6
Observed        23   15   25   18   21   18
Expected        20   20   20   20   20   20

$H_0$: there is no difference between the observed and the theoretical distribution.

$H_1$: there is a difference between the observed and the theoretical distribution.

The measure of goodness of fit:

$X^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{n-1}$

Degrees of freedom = number of cells − number of constraints


Number on die             1      2      3      4      5      6
Observed                 23     15     25     18     21     18
Expected                 20     20     20     20     20     20
$(O_i - E_i)^2 / E_i$  0.45   1.25   1.25   0.20   0.05   0.20

$X^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 3.4$

Degrees of freedom: $\nu = 6 - 1 = 5$

$\chi^2_5(5\%) = 11.070$

Since $3.4 < 11.070$, do not reject the null hypothesis. There is no evidence that the die is biased.
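The arithmetic for the die example can be reproduced with a few lines of Python. This is a small sketch using only the figures given above; the variable names are illustrative, and the critical value 11.070 is copied from the notes rather than computed.

```python
observed = [23, 15, 25, 18, 21, 18]   # counts for faces 1..6 (120 throws)
expected = [20] * 6                   # a fair die: 120 / 6 per face

# Contribution of each cell to the statistic: (O - E)^2 / E
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
x2 = sum(contributions)

nu = len(observed) - 1                # one constraint: the totals must agree
critical = 11.070                     # chi-square, 5 d.o.f., 5% (from the notes)
reject = x2 > critical

print("contributions:", [round(c, 2) for c in contributions])
print(f"X^2 = {x2:.2f}, nu = {nu}, reject H0: {reject}")
```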
1. Determine which distribution is likely to be a good model.

2. Set the significance level, for example 5%.

3. Estimate the parameters (if necessary).

4. Form your hypotheses.

5. Calculate the expected frequencies.

6. Combine cells so that no expected frequency is less than 5.

7. Find $\nu$ = number of cells after combining − number of constraints.

8. Find the critical value of $\chi^2$.

9. Calculate $X^2 = \sum \frac{(O_i - E_i)^2}{E_i}$.

10. See whether your value is significant.

11. Draw the conclusion and interpret it.
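Steps 7 to 10 of this procedure can be wrapped in a small helper. The sketch below is one possible arrangement, not the lecturer's code: the function name `goodness_of_fit` and the lookup table `CRITICAL_5_PERCENT` are hypothetical, and the table only holds the 5% critical values that are quoted in these notes. It assumes the cells have already been combined (step 6) and refuses to run otherwise.

```python
# 5% critical values of chi-square, keyed by degrees of freedom.
# Only the values quoted in these notes are included (an assumption of
# this sketch); a full table or a statistics library would cover more.
CRITICAL_5_PERCENT = {2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 9: 16.919}


def goodness_of_fit(observed, expected, n_constraints=1):
    """Return (X^2, nu, critical value, reject?) for already-combined cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if min(expected) < 5:
        raise ValueError("combine cells first: an expected frequency is below 5")
    nu = len(observed) - n_constraints                 # step 7
    critical = CRITICAL_5_PERCENT[nu]                  # step 8
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))  # step 9
    return x2, nu, critical, x2 > critical             # step 10


# Usage with the die data from the first example.
x2, nu, critical, reject = goodness_of_fit([23, 15, 25, 18, 21, 18], [20] * 6)
print(f"X^2 = {x2:.2f}, nu = {nu}, critical = {critical}, reject H0: {reject}")
```

Each estimated parameter adds one to `n_constraints`, which is why the later examples with an estimated p or λ use 2 instead of 1.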


Digit       0    1    2    3    4    5    6    7    8    9
Observed   11    8    8    7    8    9   12    9   13   15

Could the digits be from a random number table? Test at the 0.05 level.

$H_0$: A discrete uniform distribution is a suitable model.

$H_1$: A discrete uniform distribution is not a suitable model.

Digit                     0     1     2     3     4     5     6     7     8     9
Observed                 11     8     8     7     8     9    12     9    13    15
Expected                 10    10    10    10    10    10    10    10    10    10
$(O_i - E_i)^2 / E_i$   0.1   0.4   0.4   0.9   0.4   0.1   0.4   0.1   0.9   2.5

$\nu = 10 - 1 = 9$, $\chi^2_9(5\%) = 16.919$

$X^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 6.2$

Since $6.2 < 16.919$, do not reject the null hypothesis. There is no evidence that the digits are not random.
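For the digit example, the expected frequencies follow directly from the discrete uniform hypothesis: the total count is shared equally among the ten digits. A short sketch with illustrative variable names, again taking the critical value 16.919 from the notes:

```python
observed = [11, 8, 8, 7, 8, 9, 12, 9, 13, 15]    # counts of digits 0..9
total = sum(observed)                            # 100 digits in all

# Under H0 every digit is equally likely, so each expects total / 10.
expected = [total / len(observed)] * len(observed)

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
x2 = sum(contributions)
nu = len(observed) - 1
critical = 16.919                                # chi-square, 9 d.o.f., 5%

print("contributions:", [round(c, 1) for c in contributions])
print(f"X^2 = {x2:.1f}, nu = {nu}, reject H0: {x2 > critical}")
```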
x            0    1    2    3    4    5    6    7    8
Frequency   12   28   28   17    7    4    2    2    0

The data are thought to be modelled by a binomial B(10, 0.2). Use the table for the binomial cumulative distribution function to find expected values, and conduct a test to see if this is a good model. Use a 5% significance level.

$H_0$: A B(10, 0.2) distribution is a suitable model.

$H_1$: A B(10, 0.2) distribution is not a suitable model.

x             0        1        2        3        4        5        6        7        8
Probability   0.1074   0.2684   0.3020   0.2013   0.0881   0.0264   0.0055   0.0008   0.0001
Expected      10.74    26.84    30.20    20.13    8.81     2.64     0.55     0.08     0.01

Combining the cells for x ≥ 4 so that no expected frequency is less than 5:

x                        0        1        2        3        ≥4
Observed                 12       28       28       17       15
Expected                 10.74    26.84    30.20    20.13    12.09
$(O_i - E_i)^2 / E_i$    0.1478   0.0501   0.1603   0.4867   0.7004

$\nu = 5 - 1 = 4$, $\chi^2_4(5\%) = 9.488$

$X^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 1.5453$

Since $1.5453 < 9.488$, do not reject the null hypothesis. There is no evidence that a B(10, 0.2) distribution is not a suitable model.
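The B(10, 0.2) example can be sketched in Python as follows. One assumption differs from the worked solution: the probabilities are computed exactly with `math.comb` rather than read from a four-decimal table, so the statistic agrees with 1.5453 only to about two decimal places. The helper name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(n, p, x):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n_trials, p = 10, 0.2
observed = [12, 28, 28, 17, 7, 4, 2, 2, 0]    # frequencies for x = 0..8
total = sum(observed)                         # 100 observations

# Keep x = 0..3 as separate cells and pool everything from x = 4 upwards,
# so the last cell also absorbs the (tiny) probabilities of x = 9 and 10.
expected_head = [total * binomial_pmf(n_trials, p, x) for x in range(4)]
pooled_expected = expected_head + [total - sum(expected_head)]
pooled_observed = observed[:4] + [sum(observed[4:])]

x2 = sum((o - e) ** 2 / e for o, e in zip(pooled_observed, pooled_expected))
nu = len(pooled_observed) - 1                 # no parameter was estimated
critical = 9.488                              # chi-square, 4 d.o.f., 5%

print("expected:", [round(e, 2) for e in pooled_expected])
print(f"X^2 = {x2:.3f}, nu = {nu}, reject H0: {x2 > critical}")
```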
Girls       0    1    2    3    4    5
Observed   13   18   38   20   10    1

A study of the number of girls in families with five children was done on 100 such families.

a. Test, at the 5% significance level, whether a B(5, 0.5) distribution is a good model.

$H_0$: A B(5, 0.5) distribution is a suitable model.

$H_1$: A B(5, 0.5) distribution is not a suitable model.


Girls       0       1       2       3       4       5
Observed    13      18      38      20      10      1
Expected    3.12    15.63   31.25   31.25   15.63   3.12

Combining the end cells so that no expected frequency is less than 5:

Girls       0 or 1   2       3       4 or 5
Observed    31       38      20      11
Expected    18.75    31.25   31.25   18.75

$\nu = 4 - 1 = 3$, $\chi^2_3(5\%) = 7.815$

$X^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 16.714$

Since $16.714 > 7.815$, reject the null hypothesis. There is evidence that a B(5, 0.5) distribution is not a suitable model.
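Part (a) needs cells pooled at both ends of the table. The sketch below shows one way to do that by listing which original cells go into each pooled cell; `pool_cells` and `groups` are hypothetical names introduced for illustration, and the grouping itself is the one used in the worked solution.

```python
from math import comb


def binomial_pmf(n, p, x):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def pool_cells(values, groups):
    """Add up the entries of `values` within each group of cell indices."""
    return [sum(values[i] for i in group) for group in groups]


observed = [13, 18, 38, 20, 10, 1]            # families with 0..5 girls
total = sum(observed)                         # 100 families
expected = [total * binomial_pmf(5, 0.5, x) for x in range(6)]

# Pool "0 or 1" and "4 or 5" because 3.125 is below 5.
groups = [[0, 1], [2], [3], [4, 5]]
pooled_observed = pool_cells(observed, groups)
pooled_expected = pool_cells(expected, groups)

x2 = sum((o - e) ** 2 / e for o, e in zip(pooled_observed, pooled_expected))
nu = len(groups) - 1
critical = 7.815                              # chi-square, 3 d.o.f., 5%

print("pooled observed:", pooled_observed)
print("pooled expected:", pooled_expected)
print(f"X^2 = {x2:.3f}, nu = {nu}, reject H0: {x2 > critical}")
```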
b. Test, at the 5% significance level, whether a binomial distribution is a good model.

Estimate p from the data (199 girls among 100 × 5 children):

$p = \frac{199}{100 \times 5} = 0.398$

$H_0$: A B(5, 0.398) distribution is a suitable model.

$H_1$: A B(5, 0.398) distribution is not a suitable model.

Girls       0       1       2       3       4       5
Observed    13      18      38      20      10      1
Expected    7.91    26.14   34.56   22.85   7.55    0.99

Girls       0       1       2       3       4 or 5
Observed    13      18      38      20      11
Expected    7.91    26.14   34.56   22.85   8.54

Because p was estimated from the data, there are two constraints:

$\nu = 5 - 2 = 3$, $\chi^2_3(5\%) = 7.815$

$X^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 7.22$

Since $7.22 < 7.815$, do not reject the null hypothesis. There is no evidence that a B(5, 0.398) distribution is not a suitable model.
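A sketch of part (b) in Python, taking the number of trials as n = 5 because each family has five children, and estimating p as the overall proportion of girls. The variable names are illustrative, and the probabilities are computed exactly, so the statistic matches 7.22 only after rounding.

```python
from math import comb

observed = [13, 18, 38, 20, 10, 1]            # families with 0..5 girls
n_children = 5
families = sum(observed)                      # 100 families

# Estimate p as (total girls) / (total children).
girls = sum(x * f for x, f in enumerate(observed))
p_hat = girls / (families * n_children)

expected = [
    families * comb(n_children, x) * p_hat ** x * (1 - p_hat) ** (n_children - x)
    for x in range(n_children + 1)
]

# Pool "4 or 5" because the expected frequency for 5 girls is below 5.
pooled_observed = observed[:4] + [sum(observed[4:])]
pooled_expected = expected[:4] + [sum(expected[4:])]

x2 = sum((o - e) ** 2 / e for o, e in zip(pooled_observed, pooled_expected))
nu = len(pooled_observed) - 2                 # totals agree + one estimated parameter
critical = 7.815                              # chi-square, 3 d.o.f., 5%

print(f"girls = {girls}, p_hat = {p_hat:.3f}")
print("expected:", [round(e, 2) for e in pooled_expected])
print(f"X^2 = {x2:.2f}, nu = {nu}, reject H0: {x2 > critical}")
```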
x            0    1    2    3    4    5    6    7    8
Frequency    8   19   26   13    7    5    1    1    0

The number of telephone calls arriving at an exchange in each six-minute period was recorded over a period of 8 hours. Can these results be modelled by a Poisson distribution? Test at the 5% significance level.

Estimate the mean as total calls divided by the number of six-minute periods:

$\lambda = \frac{176}{8 \times 60 \div 6} = \frac{176}{80} = 2.2$

$H_0$: A Po(2.2) distribution is a suitable model.

$H_1$: A Po(2.2) distribution is not a suitable model.

$P(X = n) = \frac{2.2^n}{n!} e^{-2.2}$

x    0       1        2        3        4       5       ≥6
E    8.864   19.504   21.448   15.728   8.656   3.808   1.992

Combining the cells for x ≥ 5 so that no expected frequency is less than 5:

x    0       1        2        3        4       ≥5
O    8       19       26       13       7       7
E    8.864   19.504   21.448   15.728   8.656   5.8

$\nu = 6 - 2 = 4$, $\chi^2_4(5\%) = 9.488$

$X^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 2.1016$

Since $2.1016 < 9.488$, do not reject the null hypothesis. There is no evidence that a Po(2.2) distribution is not a suitable model.
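The Poisson example can be sketched the same way. Two assumptions are worth stating: the frequency for x = 1 is taken as 19, which is the value in the pooled table (8, 19, 26, 13, 7, 7) and the one consistent with 176 calls in 80 periods; and the probabilities are computed exactly, so the statistic differs from 2.1016 in the third decimal place. The helper name `poisson_pmf` is illustrative.

```python
from math import exp, factorial


def poisson_pmf(lam, n):
    """P(X = n) for X ~ Po(lam)."""
    return lam ** n * exp(-lam) / factorial(n)


# Frequencies for x = 0..8 calls per six-minute period (x = 1 taken as 19).
frequencies = [8, 19, 26, 13, 7, 5, 1, 1, 0]
periods = sum(frequencies)                             # 80 periods in 8 hours
calls = sum(x * f for x, f in enumerate(frequencies))  # total calls
lam = calls / periods                                  # estimated mean

# Keep x = 0..4 separate and pool x >= 5 into one open-ended cell.
expected_head = [periods * poisson_pmf(lam, n) for n in range(5)]
pooled_expected = expected_head + [periods - sum(expected_head)]
pooled_observed = frequencies[:5] + [sum(frequencies[5:])]

x2 = sum((o - e) ** 2 / e for o, e in zip(pooled_observed, pooled_expected))
nu = len(pooled_observed) - 2                 # totals agree + estimated lambda
critical = 9.488                              # chi-square, 4 d.o.f., 5%

print(f"periods = {periods}, calls = {calls}, lambda = {lam}")
print("expected:", [round(e, 3) for e in pooled_expected])
print(f"X^2 = {x2:.3f}, nu = {nu}, reject H0: {x2 > critical}")
```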
Height (cm)   150-154   155-159   160-169   170-174   175-179   180-184   185-189   190-194
Observed      11        8         8         7         8         9         12        9

a. Test at the 5% significance level to see if the height could be modelled by a normal distribution with mean 172 and standard deviation 6.

$H_0$: The N(172, 36) distribution is a suitable model.

$H_1$: The N(172, 36) distribution is not a suitable model.

$\nu = 5 - 1 = 4$

$\chi^2_4(5\%) = 9.488$

$X^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 12.1074$

Since $12.1074 > 9.488$, reject the null hypothesis. There is evidence that the N(172, 36) distribution is not a suitable model.
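b. Describe how you would modify this test if the mean and variance were unknown.

1. Estimate the parameters from the data:

$\bar{x} = 173.15$, $S^2 = 58.22$

2. Calculate the degrees of freedom. There are now three constraints (the totals agree, and two parameters were estimated), so

$\nu$ = number of cells after combining − 3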


Chi-square Test of Independence

Observed    Employed   Unemployed   Total
Middle      1153       110          1263
High        3585       206          3791
College     3297       146          3443
Total       8035       462          8497

Expected    Employed               Unemployed            Total
Middle      8035 × (1263/8497)     462 × (1263/8497)     1263
High        8035 × (3791/8497)     462 × (3791/8497)     3791
College     8035 × (3443/8497)     462 × (3443/8497)     3443
Total       8035                   462                   8497

$H_0$: There is no association between employment and education.

$H_1$: There is an association between employment and education.
$\nu = (3 - 1) \times (2 - 1) = 2$, $\chi^2_2(5\%) = 5.991$

$X^2 = \sum \frac{(O - E)^2}{E} = 35.892 > 5.991$, so reject $H_0$.
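The expected frequencies and the test statistic for the employment table can be computed from the row and column totals. The sketch below uses plain Python lists with illustrative variable names; the critical value 5.991 is copied from the notes.

```python
# Observed counts: rows are Middle, High, College;
# columns are Employed, Unemployed.
observed = [
    [1153, 110],
    [3585, 206],
    [3297, 146],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Under independence: expected = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
nu = (len(row_totals) - 1) * (len(col_totals) - 1)
critical = 5.991                              # chi-square, 2 d.o.f., 5%

for label, exp_row in zip(["Middle", "High", "College"], expected):
    print(label, [round(e, 2) for e in exp_row])
print(f"X^2 = {x2:.3f}, nu = {nu}, reject H0: {x2 > critical}")
```

The largest contribution comes from the Middle/Unemployed cell, where 110 were observed against roughly 68.67 expected.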
