Intro ChiSqr

Uploaded by

sujithreddy765

Section 12.2
Tests for Independence and the Homogeneity of Proportions

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.


Characteristics of the Chi-Square Distribution

1. It is not symmetric.
2. Its shape depends on the degrees of freedom, just like Student’s t-distribution.
3. As the number of degrees of freedom increases, it becomes more nearly symmetric.
4. The values of χ² are nonnegative. That is, the values of χ² are greater than or equal to 0.
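As a rough illustration of characteristics 1 and 3, the sketch below uses the standard result that a chi-square distribution with k degrees of freedom has skewness sqrt(8/k). The function name is hypothetical and chosen only for this example.

```python
import math

def chi_square_skewness(df):
    """Skewness of a chi-square distribution with df degrees of freedom."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return math.sqrt(8.0 / df)

# Skewness is always positive (right-skewed) and falls toward 0 as df grows.
skewness_by_df = {df: chi_square_skewness(df) for df in (1, 2, 5, 10, 50, 100)}
for df, skew in skewness_by_df.items():
    print(f"df = {df:3d}  skewness = {skew:.3f}")
```

The skewness never reaches 0, which matches "more nearly symmetric" rather than "symmetric".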

Objectives
1. Perform a test for independence
2. Perform a test for homogeneity of
proportions



Objective 1
• Perform a Test for Independence



The chi-square test for independence is used
to determine whether there is an association
between a row variable and column variable in
a contingency table constructed from sample
data. The null hypothesis is that the variables
are not associated; in other words, they are
independent. The alternative hypothesis is that
the variables are associated, or dependent.



“In Other Words”

In a chi-square independence test, the null hypothesis is always
H0: The variables are independent

The alternative hypothesis is always
H1: The variables are not independent



Test Statistic for the Test of Independence

Let Oi represent the observed number of counts in the ith cell and Ei represent the expected number of counts in the ith cell. Then

$$\chi^2_0 = \sum \frac{(O_i - E_i)^2}{E_i}$$

approximately follows the chi-square distribution with (r – 1)(c – 1) degrees of freedom, where r is the number of rows and c is the number of columns in the contingency table, provided that (1) all expected frequencies are greater than or equal to 1 and (2) no more than 20% of the expected frequencies are less than 5.
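A minimal sketch of the test statistic and its degrees of freedom follows. The helper names and the 2 x 2 table are hypothetical, made up only to show the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over every cell of the table."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total

def degrees_of_freedom(table):
    """(r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical 2 x 2 table, made up purely for illustration.
observed = [[30, 20], [20, 30]]
expected = [[25.0, 25.0], [25.0, 25.0]]

statistic = chi_square_statistic(observed, expected)
df = degrees_of_freedom(observed)
print(statistic, df)
```

Each cell here contributes (5)^2 / 25 = 1, so the four cells sum to 4 with (2 – 1)(2 – 1) = 1 degree of freedom.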



Cramer’s V statistic

$$V = \sqrt{\frac{\chi^2}{N \min(r - 1,\; c - 1)}}$$

This is a standardized form of the chi-square statistic.
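The formula can be sketched in a few lines, assuming N is the total number of observations in the table and r and c are its row and column counts. The function name and the input values are hypothetical.

```python
import math

def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi^2 / (N * min(r - 1, c - 1)))."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))

# Hypothetical inputs: a chi-square of 4.0 from a 2 x 2 table of 100 observations.
v = cramers_v(4.0, 100, 2, 2)
print(round(v, 3))
```

Dividing by N and by min(r – 1, c – 1) puts V on a 0-to-1 scale, so tables of different sizes can be compared.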



The idea behind testing these types of claims is
to compare actual counts to the counts we would
expect if the null hypothesis were true (if the
variables are independent). If a significant
difference between the actual counts and
expected counts exists, we would take this as
evidence against the null hypothesis.



If two events are independent, then
P(E and F) = P(E)P(F)

We can use the Multiplication Principle for Independent Events to obtain the expected proportion of observations within each cell under the assumption of independence and multiply this result by n, the sample size, in order to obtain the expected count within each cell.



Parallel Example 1: Determining the Expected Counts in a
Test for Independence

In a poll, 883 males and 893 females were asked “If you
could have only one of the following, which would you
pick: money, health, or love?” Their responses are
presented in the table below. Determine the expected
counts within each cell assuming that gender and
response are independent.

Source: Based on a Fox News Poll conducted in January, 1999


Solution

Step 1: We first compute the row and column totals:

                Money   Health   Love   Row totals
Men                82      446    355          883
Women              46      574    273          893
Column totals     128     1020    628         1776



Solution

Step 2: Next compute the relative marginal frequencies for the row variable and column variable:

                     Money               Health               Love                Relative Frequency
Men                  82                  446                  355                 883/1776 ≈ 0.4972
Women                46                  574                  273                 893/1776 ≈ 0.5028
Relative Frequency   128/1776 ≈ 0.0721   1020/1776 ≈ 0.5743   628/1776 ≈ 0.3536   1



Solution

Step 3: Assuming gender and response are independent, we use the Multiplication Rule for Independent Events to compute the proportion of observations we would expect in each cell.

        Money    Health   Love
Men     0.0358   0.2855   0.1758
Women   0.0362   0.2888   0.1778



Solution

Step 4: We multiply the expected proportions from step 3 by 1776, the sample size, to obtain the expected counts under the assumption of independence.

        Money                     Health                     Love
Men     1776(0.0358) ≈ 63.5808    1776(0.2855) ≈ 507.048     1776(0.1758) ≈ 312.2208
Women   1776(0.0362) ≈ 64.2912    1776(0.2888) ≈ 512.9088    1776(0.1778) ≈ 315.7728
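Steps 1 through 4 can be retraced with a short script. This is a sketch that assumes each expected proportion is rounded to four decimal places before it is scaled by the sample size, which is what the slide values suggest. The variable names are illustrative.

```python
observed = {
    "Men":   {"Money": 82, "Health": 446, "Love": 355},
    "Women": {"Money": 46, "Health": 574, "Love": 273},
}

# Step 1: row totals, column totals, and the table total.
row_totals = {g: sum(cells.values()) for g, cells in observed.items()}
responses = list(next(iter(observed.values())))
col_totals = {r: sum(observed[g][r] for g in observed) for r in responses}
n = sum(row_totals.values())

# Steps 2 and 3: multiply the marginal relative frequencies, then round the
# expected proportion to four decimal places (assumed rounding convention).
expected_prop = {
    g: {r: round((row_totals[g] / n) * (col_totals[r] / n), 4) for r in responses}
    for g in observed
}

# Step 4: scale each expected proportion by the sample size.
expected_count = {
    g: {r: n * expected_prop[g][r] for r in responses} for g in observed
}

for g in observed:
    print(g, [round(expected_count[g][r], 4) for r in responses])
```

Because the proportions are rounded first, these counts differ slightly from what an unrounded calculation would give.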



Expected Frequencies in a Chi-Square
Test for Independence

To find the expected frequencies in a cell when performing a chi-square independence test, multiply the cell’s row total by its column total and divide this result by the table total. That is,

$$\text{Expected frequency} = \frac{(\text{row total})(\text{column total})}{\text{table total}}$$
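A minimal sketch of this shortcut, applied to the poll data from Parallel Example 1, is shown below. The function name is hypothetical.

```python
def expected_frequencies(observed):
    """Expected count for each cell: (row total)(column total) / (table total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

# Poll data from the example: rows are Men, Women; columns are Money, Health, Love.
observed = [[82, 446, 355], [46, 574, 273]]
expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 3) for e in row])
```

The shortcut involves no intermediate rounding, so its results differ slightly from the four-step values (for example, about 63.64 rather than 63.5808 for men who chose money).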





Chi-Square Test for Independence
To test the association (or independence of) two variables in a contingency table, we use the steps that follow:
Step 1: Determine the null and alternative hypotheses.
H0: The row variable and column variable are independent.
H1: The row variable and column variable are dependent.
Step 2: Choose a level of significance, α,
depending on the seriousness of making
a Type I error.



Step 3:
a) Calculate the expected frequencies (counts) for each cell in the contingency table.
b) Verify that the requirements for the chi-square test for independence are satisfied:
1. All expected frequencies are greater than or equal to 1 (all Ei ≥ 1).
2. No more than 20% of the expected frequencies are less than 5.
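The two requirements in part (b) can be checked mechanically. The sketch below uses a hypothetical helper name and made-up expected counts.

```python
def requirements_satisfied(expected):
    """Check the two conditions on the expected frequencies."""
    cells = [e for row in expected for e in row]
    all_at_least_one = all(e >= 1 for e in cells)
    share_below_five = sum(1 for e in cells if e < 5) / len(cells)
    return all_at_least_one and share_below_five <= 0.20

# Hypothetical expected counts, made up for illustration.
ok_table = [[12.0, 8.5, 4.2], [20.1, 15.3, 9.9]]   # 1 of 6 cells below 5
bad_table = [[3.0, 4.0], [0.5, 30.0]]              # one cell below 1
print(requirements_satisfied(ok_table), requirements_satisfied(bad_table))
```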



Classical Approach
Step 3:
c) Compute the test statistic:

$$\chi^2_0 = \sum \frac{(O_i - E_i)^2}{E_i}$$

Note: Oi is the observed count for the ith category.



Classical Approach

Step 4: Determine the critical value. All chi-square tests for independence are right-tailed tests, so the critical value is $\chi^2_\alpha$ with (r – 1)(c – 1) degrees of freedom, where r is the number of rows and c is the number of columns in the contingency table.

Classical Approach

Compare the critical value to the test statistic. If $\chi^2_0 > \chi^2_\alpha$, reject the null hypothesis.



P-Value Approach
By Hand Step 3:
c) Compute the test statistic:

$$\chi^2_0 = \sum \frac{(O_i - E_i)^2}{E_i}$$

Note: Oi is the observed count for the ith category.



P-Value Approach

d) Use Table VII to determine an approximate P-value by determining the area under the chi-square distribution with (r – 1)(c – 1) degrees of freedom to the right of the test statistic.
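The right-tail area can also be computed directly instead of read from a table. The sketch below is not the textbook's method. It assumes a positive whole number of degrees of freedom and relies on standard closed forms: a finite sum for even degrees of freedom, and the complementary error function plus a finite sum for odd ones. The function name is hypothetical.

```python
import math

def chi_square_sf(x, df):
    """Right-tail area P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) times a finite sum of y^k / k!.
        terms = sum(y ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-y) * terms
    # Odd df: start from df = 1 (erfc) and add one term per extra 2 df.
    m = (df - 1) // 2
    terms = sum(y ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, m + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * terms

for x, df in [(3.841, 1), (5.991, 2), (7.815, 3)]:
    print(df, round(chi_square_sf(x, df), 4))
```

The three inputs are familiar 5% critical values, so each printed area should be close to 0.05.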



P-Value Approach
Technology Step 3:
c) Use a statistical spreadsheet or calculator with statistical capabilities to obtain the P-value. The directions for obtaining the P-value using the TI-83/84 Plus graphing calculator, MINITAB, Excel, and StatCrunch are in the Technology Step-by-Step in the text.



P-Value Approach

Step 4: If the P-value < α, reject the null hypothesis.



Step 5: State the conclusion.



Parallel Example 2: Performing a Chi-Square Test for
Independence

In a poll, 883 males and 893 females were asked “If you
could have only one of the following, which would you
pick: money, health, or love?” Their responses are
presented in the table below. Test the claim that gender
and response are independent at the α = 0.05 level of
significance.

Source: Based on a Fox News Poll conducted in January, 1999



Solution

Step 1: We want to know whether gender and response are dependent or independent, so the hypotheses are:
H0: gender and response are independent
H1: gender and response are dependent

Step 2: The level of significance is α = 0.05.



Solution
Step 3:
(a) The expected frequencies were computed in Example 1 and are given in parentheses in the table below, along with the observed frequencies.

        Money           Health           Love
Men     82 (63.5808)    446 (507.048)    355 (312.2208)
Women   46 (64.2912)    574 (512.9088)   273 (315.7728)



Solution
Step 3:
(b) Since none of the expected frequencies are less than 5, the requirements for the chi-square test for independence are satisfied.
(c) The test statistic is

$$\chi^2_0 = \frac{(82 - 63.5808)^2}{63.5808} + \frac{(446 - 507.048)^2}{507.048} + \cdots + \frac{(273 - 315.7728)^2}{315.7728} \approx 36.82$$
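The sum in part (c) has six terms, one per cell. As a check, the sketch below evaluates all of them, using the expected counts exactly as they appear in the table for this example.

```python
observed = [[82, 446, 355], [46, 574, 273]]
# Expected counts as given in the example (rounded proportions times 1776).
expected = [[63.5808, 507.048, 312.2208], [64.2912, 512.9088, 315.7728]]

# Contribution of each cell: (O - E)^2 / E.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi_square = sum(sum(row) for row in contributions)

for row in contributions:
    print([round(c, 3) for c in row])
print(round(chi_square, 2))
```

Printing the per-cell contributions shows which cells drive the statistic. Here the Health column contributes the most for both men and women.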
Solution: Classical Approach

There are r = 2 rows and c = 3 columns, so we find the critical value using (2 – 1)(3 – 1) = 2 degrees of freedom. The critical value is $\chi^2_{0.05} = 5.99$.





Solution: Classical Approach

Step 4: Since the test statistic, $\chi^2_0 = 36.82$, is greater than the critical value $\chi^2_{0.05} = 5.99$, we reject the null hypothesis.
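For the special case of 2 degrees of freedom, the critical value has a closed form, which gives a quick check on the table lookup. This is a sketch under that assumption only. It does not generalize to other degrees of freedom, and the function name is hypothetical.

```python
import math

def critical_value_df2(alpha):
    """Chi-square critical value for 2 degrees of freedom.

    With 2 degrees of freedom the right-tail area beyond x is exp(-x / 2),
    so solving exp(-x / 2) = alpha gives x = -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)

test_statistic = 36.82          # value computed in the example
critical = critical_value_df2(0.05)
reject_null = test_statistic > critical
print(round(critical, 3), reject_null)
```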





Solution: P-Value Approach

There are r = 2 rows and c = 3 columns, so we find the P-value using (2 – 1)(3 – 1) = 2 degrees of freedom. The P-value is the area under the chi-square distribution with 2 degrees of freedom to the right of $\chi^2_0 = 36.82$, which is approximately 0.

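To see how small "approximately 0" is, the P-value can be evaluated directly. This sketch uses the fact that the right-tail area of a chi-square distribution with 2 degrees of freedom is exp(-x/2). The function name is hypothetical.

```python
import math

def p_value_df2(test_statistic):
    """Right-tail area for a chi-square variable with 2 degrees of freedom."""
    return math.exp(-test_statistic / 2.0)

p_value = p_value_df2(36.82)
print(f"{p_value:.2e}")
```

The result is on the order of one in a hundred million, far below any conventional level of significance.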


Solution: P-Value Approach

Step 4: Since the P-value is less than the level of significance α = 0.05, we reject the null hypothesis.



Solution

Step 5: There is sufficient evidence to conclude that gender and response are dependent at the α = 0.05 level of significance.



To see the relation between response and gender,
we draw bar graphs of the conditional
distributions of response by gender. Recall that
a conditional distribution lists the relative
frequency of each category of a variable, given a
specific value of the other variable in a
contingency table.



Parallel Example 3: Constructing a Conditional Distribution
and Bar Graph

Find the conditional distribution of response by gender for the data from the previous example, reproduced below.

Source: Based on a Fox News Poll conducted in January, 1999



Solution

We first compute the conditional distribution of response by gender.

        Money             Health             Love
Men     82/883 ≈ 0.0929   446/883 ≈ 0.5051   355/883 ≈ 0.4020
Women   46/893 ≈ 0.0515   574/893 ≈ 0.6428   273/893 ≈ 0.3057
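The conditional distribution amounts to dividing each count by its row total. A minimal sketch with a hypothetical helper name:

```python
observed = {
    "Men":   {"Money": 82, "Health": 446, "Love": 355},
    "Women": {"Money": 46, "Health": 574, "Love": 273},
}

def conditional_distribution(table):
    """Relative frequency of each response within each row (gender)."""
    result = {}
    for group, cells in table.items():
        group_total = sum(cells.values())
        result[group] = {k: v / group_total for k, v in cells.items()}
    return result

conditional = conditional_distribution(observed)
for group, dist in conditional.items():
    print(group, {k: round(p, 4) for k, p in dist.items()})
```

Each row sums to 1, so the two genders can be compared directly even though the group sizes differ.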



Solution

[Figure: bar graphs of the conditional distributions of response by gender]



Objective 2
• Perform a Test for Homogeneity of
Proportions



In a chi-square test for homogeneity of
proportions, we test whether different
populations have the same proportion of
individuals with some characteristic.



The procedures for performing a test of
homogeneity are identical to those for a
test of independence.



Parallel Example 5: A Test for Homogeneity of Proportions

The following question was asked of a random sample of individuals in 1992, 2002, and 2008: “Would you tell me if you feel being a teacher is an occupation of very great prestige?” The results of the survey are presented below:

       1992   2002   2008
Yes     418    479    525
No      602    541    485

Test the claim that the proportion of individuals that feel being a
teacher is an occupation of very great prestige is the same for each
year at the α = 0.01 level of significance.
Source: The Harris Poll
Solution

Step 1: The null hypothesis is a statement of “no difference,” so the proportions for each year who feel that being a teacher is an occupation of very great prestige are equal. We state the hypotheses as follows:
H0: p1 = p2 = p3
H1: At least one of the proportions is different from the others.
Step 2: The level of significance is α = 0.01.
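Before testing, it can help to look at the sample proportions that the hypotheses refer to. The sketch below computes the proportion answering "Yes" in each year and the pooled proportion across all three years. The variable names are illustrative.

```python
# Survey counts from the example, keyed by year.
yes = {1992: 418, 2002: 479, 2008: 525}
no = {1992: 602, 2002: 541, 2008: 485}

sample_proportions = {year: yes[year] / (yes[year] + no[year]) for year in yes}
pooled_proportion = sum(yes.values()) / (sum(yes.values()) + sum(no.values()))

for year, p_hat in sample_proportions.items():
    print(year, round(p_hat, 4))
print("pooled", round(pooled_proportion, 4))
```

Under H0 each year's proportion estimates the same value, and the pooled proportion is what produces the expected counts.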



Solution
Step 3:
(a) The expected frequencies are found by multiplying the appropriate row and column totals and then dividing by the total sample size. They are given in parentheses in the table below, along with the observed frequencies.

       1992            2002            2008
Yes    418 (475.554)   479 (475.554)   525 (470.892)
No     602 (544.446)   541 (544.446)   485 (539.108)



Solution
Step 3:
(b) Since none of the expected frequencies are less than 5, the requirements are satisfied.



Solution: Classical Approach
Step 3:
(c) The test statistic is

$$\chi^2_0 = \frac{(418 - 475.554)^2}{475.554} + \frac{(479 - 475.554)^2}{475.554} + \cdots + \frac{(485 - 539.108)^2}{539.108} \approx 24.74$$


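The expected counts and the test statistic for the survey data can be reproduced in a few lines. This is a sketch using only the observed table from the example. The variable names are illustrative.

```python
# Rows: Yes, No. Columns: 1992, 2002, 2008.
observed = [[418, 479, 525], [602, 541, 485]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count per cell: (row total)(column total) / (table total).
expected = [[r * c / n for c in col_totals] for r in row_totals]
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print([[round(e, 3) for e in row] for row in expected])
print(round(chi_square, 2))
```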
Solution: Classical Approach

Step 4: There are r = 2 rows and c = 3 columns, so we find the critical value using (2 – 1)(3 – 1) = 2 degrees of freedom. The critical value is $\chi^2_{0.01} = 9.210$.





Solution: Classical Approach

Because the test statistic, $\chi^2_0 = 24.74$, is greater than the critical value $\chi^2_{0.01} = 9.210$, we reject the null hypothesis.





Solution: P-Value Approach
By Hand Step 3:
(c) The test statistic is

$$\chi^2_0 = \frac{(418 - 475.554)^2}{475.554} + \frac{(479 - 475.554)^2}{475.554} + \cdots + \frac{(485 - 539.108)^2}{539.108} \approx 24.74$$


Solution: P-Value Approach

d) There are r = 2 rows and c = 3 columns, so we find the P-value using (2 – 1)(3 – 1) = 2 degrees of freedom. The P-value is the area under the chi-square distribution with 2 degrees of freedom to the right of $\chi^2_0 = 24.74$, which is approximately 0.





Solution: P-Value Approach

Step 4: Because the P-value is less than the level of significance α = 0.01, we reject the null hypothesis.



Solution

Step 5: There is sufficient evidence to reject the null hypothesis at the α = 0.01 level of significance. We conclude that the proportion of individuals who believe that teaching is a very prestigious career is different for at least one of the three years.
