Analysis For Business Chi-Square Test

The chi-square test is a non-parametric test used to analyze categorical data that does not meet the assumptions of parametric tests. It can test if there are significant relationships between two variables organized in a contingency table. The chi-square test statistic compares observed and expected frequencies in each cell of the table if the null hypothesis of independence is true. A larger chi-square statistic value indicates a poorer fit between the observed and expected values and potential rejection of the null hypothesis.

Uploaded by

thamirad

Analysis for Business

Chi-square Test
Introduction
• We learned how to test hypotheses using data from
one or two samples.
• These tests are known as parametric tests, because
they involve testing the parameters of a population,
such as the mean and proportions.
• They use the parametric statistics of samples that
come from the population being tested.
• To formulate these tests, we make assumptions about
the population, for example, that the population is
normally distributed.
Introduction
• There are certain kinds of data that cannot be tested in
this way such as: data which was not collected in a
random sample and therefore does not have a normal
distribution; ordinal data; ranked data; and data from
more than two populations.
• In business, we often encounter data of this type,
such as: the results of a survey of which brand of
washing powder consumers prefer, an analysis of the
arrival of customers at supermarket checkouts, a
survey of employees' attitudes towards performance
appraisal in different departments, and a study of
whether male staff have been more successful in
passing professional examinations than female staff.
• For these types of data, it is necessary to use
tests which do not make restrictive assumptions
about the shape of population distributions.
• These are known as non-parametric tests. We are
going to consider one of the most commonly used
non-parametric tests, called the chi-squared test.
Chi-Square as a Statistical Test
• Chi-square test: an inferential statistics
technique designed to test for significant
relationships between two variables
organized in a bivariate table.

• Chi-square requires no assumptions about
the shape of the population distribution
from which a sample is drawn.
The Chi Square Test
• A statistical method used to determine goodness of fit
– Goodness of fit refers to how close the observed data
are to those predicted from a hypothesis

• Note:
– The chi square test does not prove that a hypothesis is
correct
• It evaluates to what extent the data and the
hypothesis have a good fit
Example
Left-Handed vs. Gender
Dominant Hand: Left vs. Right
Gender: Male vs. Female

• 2 categories for each variable, so this is called a 2 x 2 table
• Suppose we examine a sample of 300 children
Example

Sample results organized in a contingency table (sample size = n = 300):

• 120 females, 12 were left-handed
• 180 males, 24 were left-handed

              Hand Preference
Gender      Left     Right     Total
Female        12       108       120
Male          24       156       180
Total         36       264       300
χ² Test for the Difference Between Two Proportions
H0: π1 = π2 (the proportion of females who are left-handed is equal to
the proportion of males who are left-handed)
H1: π1 ≠ π2 (the two proportions are not the same)

• If H0 is true, then the proportion of left-handed females should be
the same as the proportion of left-handed males
• Both proportions should then also equal the proportion of
left-handed people overall
The Chi-Square Test Statistic
The chi-square test statistic is:
$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$
• where:
fo = observed frequency in a particular cell
fe = expected frequency in a particular cell if H0 is true

$\chi^2_{STAT}$ for the 2 x 2 case has 1 degree of freedom

(Assumed: each cell in the contingency table has an expected
frequency of at least 5)
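Decision Rule
The $\chi^2_{STAT}$ test statistic approximately follows a
chi-squared distribution with one degree of freedom

Decision Rule:
If $\chi^2_{STAT} > \chi^2_{\alpha}$, reject H0;
otherwise, do not reject H0

[Figure: chi-squared distribution curve starting at 0, with the "Do not reject H0" region to the left of the critical value $\chi^2_{\alpha}$ and the "Reject H0" region to its right]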
Computing the Average Proportion
The average proportion is:
$$p = \frac{X_1 + X_2}{n_1 + n_2} = \frac{X}{n}$$

Here (120 females, 12 were left-handed; 180 males, 24 were left-handed):
$$p = \frac{12 + 24}{120 + 180} = \frac{36}{300} = 0.12$$

i.e., of all the children the proportion of left-handers is 0.12,
that is, 12%
Finding Expected
Frequencies
• To obtain the expected frequency for left handed
females, multiply the average proportion left handed (p)
by the total number of females
• To obtain the expected frequency for left handed males,
multiply the average proportion left handed (p) by the
total number of males

If the two proportions are equal, then
P(Left Handed | Female) = P(Left Handed | Male) = 0.12

i.e., we would expect
(0.12)(120) = 14.4 females to be left-handed
(0.12)(180) = 21.6 males to be left-handed
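The two steps above (pool the proportion, then scale it by each group size) can be sketched in Python. This is only a minimal illustration using the counts from the example; the variable names are hypothetical and not part of the original slides.

```python
# Observed data from the example: left-handed counts and group sizes.
left_handed = {"Female": 12, "Male": 24}
group_size = {"Female": 120, "Male": 180}

# Average (pooled) proportion of left-handers: p = X / n.
p = sum(left_handed.values()) / sum(group_size.values())

# Expected counts under H0: every group shares the same proportion p.
expected_left = {g: p * n for g, n in group_size.items()}
expected_right = {g: (1 - p) * n for g, n in group_size.items()}

for g in group_size:
    # Rounded to one decimal place to hide floating-point noise.
    print(g, round(expected_left[g], 1), round(expected_right[g], 1))
```

The expected right-handed counts follow from the complementary proportion 1 - p, so each row of expected counts still adds up to the group size.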
Observed vs. Expected Frequencies
                    Hand Preference
Gender    Left               Right               Total
Female    Observed = 12      Observed = 108      120
          Expected = 14.4    Expected = 105.6
Male      Observed = 24      Observed = 156      180
          Expected = 21.6    Expected = 158.4
Total     36                 264                 300
The Chi-Square Test Statistic
Using the observed and expected frequencies from the table above, the test statistic is:
$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$
$$= \frac{(12 - 14.4)^2}{14.4} + \frac{(108 - 105.6)^2}{105.6} + \frac{(24 - 21.6)^2}{21.6} + \frac{(156 - 158.4)^2}{158.4} = 0.7576$$
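As a quick check of the arithmetic, the sum over all four cells can be reproduced with a few lines of Python. This is a sketch only; `chi_square_stat` is a hypothetical helper name, and the cell values are the observed and expected frequencies from the hand-preference example.

```python
def chi_square_stat(observed, expected):
    """Return the sum of (fo - fe)^2 / fe over all cells."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Cells in the order: female-left, female-right, male-left, male-right.
observed = [12, 108, 24, 156]
expected = [14.4, 105.6, 21.6, 158.4]

stat = chi_square_stat(observed, expected)
print(round(stat, 4))  # about 0.7576
```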
Decision Rule
The test statistic is $\chi^2_{STAT} = 0.7576$; $\chi^2_{0.05}$ with 1 d.f. = 3.841

Decision Rule:
If $\chi^2_{STAT} > 3.841$, reject H0;
otherwise, do not reject H0

Here, $\chi^2_{STAT} = 0.7576 < \chi^2_{0.05} = 3.841$,
so we do not reject H0 and conclude that there is not
sufficient evidence that the two proportions are different at α = 0.05

[Figure: chi-squared distribution curve starting at 0, with the "Do not reject H0" region to the left of the critical value $\chi^2_{0.05} = 3.841$ and the "Reject H0" region of area 0.05 to its right]
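The same decision can be reached through a p-value instead of a table lookup. The sketch below relies on a standard fact that the slides do not state: a chi-squared variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability is erfc(sqrt(x / 2)). The function name `chi2_sf_1df` is hypothetical, and the shortcut is valid for 1 d.f. only.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability P(X > x) for chi-squared with 1 d.f. only."""
    return math.erfc(math.sqrt(x / 2.0))

stat = 0.7576   # test statistic from the example
alpha = 0.05    # significance level used in the example

p_value = chi2_sf_1df(stat)
reject = p_value < alpha

print(round(p_value, 3), "reject H0" if reject else "do not reject H0")
```

Comparing the p-value with α is equivalent to comparing the statistic with the critical value: evaluating `chi2_sf_1df(3.841)` gives roughly 0.05.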
χ² Test for Differences Among More Than Two Proportions
• Extend the χ² test to the case with more
than two independent populations:

H0: π1 = π2 = … = πc
H1: Not all of the πj are equal (j = 1, 2, …, c)
The Chi-Square Test Statistic
The chi-square test statistic is:
$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$
• where:
fo = observed frequency in a particular cell of the 2 x c table
fe = expected frequency in a particular cell if H0 is true

$\chi^2_{STAT}$ for the 2 x c case has (2 - 1)(c - 1) = c - 1 degrees of freedom

(Assumed: each cell in the contingency table has an expected
frequency of at least 1)
Computing the Overall Proportion
The overall proportion is:
$$p = \frac{X_1 + X_2 + \cdots + X_c}{n_1 + n_2 + \cdots + n_c} = \frac{X}{n}$$

• Expected cell frequencies for the c categories are calculated as in
the 2 x 2 case, and the decision rule is the same:

Decision Rule:
If $\chi^2_{STAT} > \chi^2_{\alpha}$, reject H0;
otherwise, do not reject H0

where $\chi^2_{\alpha}$ is from the chi-squared distribution with
c - 1 degrees of freedom
χ² Test of Independence

• Similar to the χ² test for equality of more
than two proportions, but extends the
concept to contingency tables with r rows
and c columns

H0: The two categorical variables are independent
(i.e., there is no relationship between them)
H1: The two categorical variables are dependent
(i.e., there is a relationship between them)
χ² Test of Independence
(continued)

The chi-square test statistic is:
$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$
• where:
fo = observed frequency in a particular cell of the r x c table
fe = expected frequency in a particular cell if H0 is true

$\chi^2_{STAT}$ for the r x c case has (r - 1)(c - 1) degrees of freedom

(Assumed: each cell in the contingency table has an expected
frequency of at least 1)
Expected Cell Frequencies

• Expected cell frequencies:
$$f_e = \frac{\text{row total} \times \text{column total}}{n}$$

Where:
row total = sum of all frequencies in the row
column total = sum of all frequencies in the column
n = overall sample size
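As a rough sketch of the row total × column total / n rule, the hypothetical helper `expected_frequencies` below works on any r x c table given as a list of rows. It is applied here to the 2 x 2 hand-preference counts from earlier, where it reproduces the expected values obtained with the average-proportion method.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Female, Male. Columns: Left, Right.
hand_preference = [[12, 108],
                   [24, 156]]

expected = expected_frequencies(hand_preference)
for row in expected:
    print([round(fe, 1) for fe in row])
```

Because the function only needs row and column sums, the same call should work unchanged for larger tables.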
Decision Rule

• The decision rule is

If $\chi^2_{STAT} > \chi^2_{\alpha}$, reject H0;
otherwise, do not reject H0

where $\chi^2_{\alpha}$ is from the chi-squared distribution
with (r - 1)(c - 1) degrees of freedom
Example
• The meal plan selected by 200 students is shown below:

                   Number of meals per week
Class Standing   20/week   10/week   none   Total
Fresh.           24        32        14     70
Soph.            22        26        12     60
Junior           10        14        6      30
Senior           14        16        10     40
Total            70        88        42     200
Example
(continued)

• The hypothesis to be tested is:


H0: Meal plan and class standing are independent
(i.e., there is no relationship between them)
H1: Meal plan and class standing are dependent
(i.e., there is a relationship between them)
Example: Expected Cell Frequencies
(continued)
Observed:
                   Number of meals per week
Class Standing   20/wk   10/wk   none   Total
Fresh.           24      32      14     70
Soph.            22      26      12     60
Junior           10      14      6      30
Senior           14      16      10     40
Total            70      88      42     200

Expected cell frequencies if H0 is true:
                   Number of meals per week
Class Standing   20/wk   10/wk   none   Total
Fresh.           24.5    30.8    14.7   70
Soph.            21.0    26.4    12.6   60
Junior           10.5    13.2    6.3    30
Senior           14.0    17.6    8.4    40
Total            70      88      42     200

Example for one cell (Junior, 20/wk):
$$f_e = \frac{\text{row total} \times \text{column total}}{n} = \frac{30 \times 70}{200} = 10.5$$
Example: The Test Statistic
(continued)

• The test statistic value is:
$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(24 - 24.5)^2}{24.5} + \frac{(32 - 30.8)^2}{30.8} + \cdots + \frac{(10 - 8.4)^2}{8.4} = 0.709$$

$\chi^2_{0.05}$ = 12.592 from the chi-squared distribution
with (4 - 1)(3 - 1) = 6 degrees of freedom
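The full calculation for the meal plan example, including the cells hidden behind the ellipsis, can be checked with a short Python sketch. It assumes the observed counts from the meal plan table; the variable names are illustrative only.

```python
# Observed counts: rows are Fresh., Soph., Junior, Senior;
# columns are 20/week, 10/week, none.
observed = [
    [24, 32, 14],
    [22, 26, 12],
    [10, 14, 6],
    [14, 16, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Accumulate (fo - fe)^2 / fe over all 12 cells.
stat = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n  # expected count under H0
        stat += (fo - fe) ** 2 / fe

# Degrees of freedom: (r - 1)(c - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(stat, 3), df)
```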
Example: Decision and Interpretation
(continued)

The test statistic is $\chi^2_{STAT} = 0.709$; $\chi^2_{0.05}$ with 6 d.f. = 12.592

Decision Rule:
If $\chi^2_{STAT} > 12.592$, reject H0;
otherwise, do not reject H0

Here, $\chi^2_{STAT} = 0.709 < \chi^2_{0.05} = 12.592$,
so do not reject H0

Conclusion: there is not sufficient evidence that meal
plan and class standing are related at α = 0.05

[Figure: chi-squared distribution curve starting at 0, with the "Do not reject H0" region to the left of the critical value $\chi^2_{0.05} = 12.592$ and the "Reject H0" region of area 0.05 to its right]
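A p-value gives the same conclusion without a table of critical values. The sketch below uses a closed form that the slides do not cover and that holds only when the degrees of freedom are even: for df = 2k, the upper-tail probability is exp(-x/2) times the sum of (x/2)^i / i! for i = 0, ..., k - 1. The function name `chi2_sf_even_df` is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for chi-squared with even d.f."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0   # current term (x/2)^i / i!, starting at i = 0
    total = 1.0  # running sum of the terms
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

stat = 0.709  # test statistic from the meal plan example
alpha = 0.05

p_value = chi2_sf_even_df(stat, 6)
reject = p_value < alpha

print(round(p_value, 3), "reject H0" if reject else "do not reject H0")
```

As a sanity check, `chi2_sf_even_df(12.592, 6)` comes out at roughly 0.05, matching the tabulated critical value. For odd degrees of freedom other than 1, a statistics library would be the safer choice.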
