Chi-Square

Heibatollah Baghi and Mastee Badii

Different Scales, Different Measures of Association

Scale of Both Variables    Measures of Association
Nominal Scale              Pearson Chi-Square: χ²
Ordinal Scale              Spearman’s rho
Interval or Ratio Scale    Pearson r

Chi-Square (χ²) and Frequency Data

Up to this point, inference to the population has been concerned with “scores” on one or more variables, such as CAT scores, mathematics achievement, and hours spent on the computer.
We used these scores to make inferences about population means. To be sure, not all research questions involve score data.
Today the data that we analyze consist of frequencies; that is, the number of individuals falling into each category. In other words, the variables are measured on a nominal scale.
The test statistic for frequency data is the Pearson Chi-Square.
The magnitude of the Pearson Chi-Square reflects the amount of discrepancy between observed frequencies and expected frequencies.

Steps in Test of Hypothesis
1. Determine the appropriate test
2. Establish the level of significance: α
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare the computed test statistic against a tabled/critical value

1. Determine Appropriate Test

Chi Square is used when both variables are measured on a nominal scale.
It can be applied to interval or ratio data that have
been categorized into a small number of groups.
It assumes that the observations are randomly
sampled from the population.
All observations are independent (an individual
can appear only once in a table and there are no
overlapping categories).
It does not make any assumptions about the shape
of the distribution nor about the homogeneity of
variances.
2. Establish Level of Significance

α is a predetermined value
The convention
• α = .05
• α = .01
• α = .001

3. Determine The Hypothesis: Whether There is an Association or Not
Ho : The two variables are independent
Ha : The two variables are associated

4. Calculating Test Statistics
The test statistic contrasts the observed frequencies in each cell of a contingency table with the expected frequencies.
The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e., the nominal variables are unrelated).
The expected frequency of a cell under the null hypothesis is the product of its row frequency and its column frequency divided by the number of cases:
$F_e = \frac{F_r F_c}{N}$

4. Calculating Test Statistics

$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$

where $F_o$ is the observed frequency and $F_e$ is the expected frequency in each cell.

5. Determine Degrees of Freedom

df = (R-1)(C-1)

where R is the number of levels in the row variable and C is the number of levels in the column variable.

6. Compare computed test statistic
against a tabled/critical value
The computed value of the Pearson chi-square statistic is compared with the critical value to determine whether the computed value is improbable.
The critical tabled values are based on sampling distributions of the Pearson chi-square statistic.
If the calculated χ² is greater than the tabled χ² value, reject Ho.

Example

Suppose a researcher is interested in voting preferences on gun control issues.
A questionnaire was developed and sent to
a random sample of 90 voters.
The researcher also collects information
about the political party membership of the
sample of 90 respondents.

Bivariate Frequency Table or Contingency Table

             Favor   Neutral   Oppose   f row
Democrat     10      10        30       50
Republican   15      15        10       40
f column     25      25        40       n = 90

The cell entries are the observed frequencies; f row gives the row frequencies and f column gives the column frequencies.

1. Determine Appropriate Test

1. Party Membership (2 levels): nominal
2. Voting Preference (3 levels): nominal


2. Establish Level of Significance

Alpha of .05
3. Determine The Hypothesis

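• Ho: There is no association between party membership and responses to the gun control survey in the population (Democrats and Republicans do not differ in their opinions on the gun control issue).
• Ha: There is an association between responses to the gun control survey and party membership in the population.
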
4. Calculating Test Statistics

             Favor        Neutral      Oppose       f row
Democrat     fo = 10      fo = 10      fo = 30      50
             fe = 13.9    fe = 13.9    fe = 22.2
Republican   fo = 15      fo = 15      fo = 10      40
             fe = 11.1    fe = 11.1    fe = 17.8
f column     25           25           40           n = 90

For example, the expected frequency for Democrats who favor is fe = 50*25/90 = 13.9, and for Republicans who favor it is fe = 40*25/90 = 11.1.
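
The expected frequencies in this table can be reproduced with a short Python sketch. The function name `expected_frequencies` and the list-of-rows layout are illustrative assumptions, not part of the original slides; only the observed counts come from the example.

```python
# Observed counts from the gun control contingency table
# (rows: Democrat, Republican; columns: Favor, Neutral, Oppose).
observed = [
    [10, 10, 30],
    [15, 15, 10],
]


def expected_frequencies(table):
    """Return fe = (row total * column total) / N for every cell.

    Hypothetical helper name, introduced only for this illustration.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for label, row in zip(["Democrat", "Republican"], expected):
    print(label, [round(fe, 1) for fe in row])
# Democrat [13.9, 13.9, 22.2]
# Republican [11.1, 11.1, 17.8]
```

The row and column totals are recomputed from the cells, so the sketch does not depend on the marginal totals being typed in correctly.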

$$\chi^2 = \frac{(10 - 13.89)^2}{13.89} + \frac{(10 - 13.89)^2}{13.89} + \frac{(30 - 22.2)^2}{22.2} + \frac{(15 - 11.11)^2}{11.11} + \frac{(15 - 11.11)^2}{11.11} + \frac{(10 - 17.8)^2}{17.8} = 11.03$$
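
As a check on the hand calculation, the same sum can be sketched in Python. The function name `chi_square` is an assumption made for illustration; the code simply applies the Pearson formula to every cell using unrounded expected frequencies.

```python
# Observed counts from the gun control example
# (rows: Democrat, Republican; columns: Favor, Neutral, Oppose).
observed = [
    [10, 10, 30],
    [15, 15, 10],
]


def chi_square(table):
    """Sum (fo - fe)^2 / fe over all cells (hypothetical helper name)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected frequency
            total += (fo - fe) ** 2 / fe
    return total


chi2 = chi_square(observed)
print(round(chi2, 3))  # 11.025
```

With unrounded expected frequencies the statistic is 11.025, which rounds to the 11.03 shown on the slide.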

5. Determine Degrees of Freedom

df = (R-1)(C-1) = (2-1)(3-1) = 2
6. Compare computed test statistic
against a tabled/critical value
α = 0.05
df = 2
Critical tabled value = 5.991
Test statistic, 11.03, exceeds critical value
Null hypothesis is rejected
Democrats & Republicans differ
significantly in their opinions on gun
control issues
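
The decision can also be sketched in Python without a printed table. This sketch relies on a shortcut that holds only for df = 2, where the upper-tail probability of the chi-square distribution is exp(-x/2); for other degrees of freedom a statistical table or library would be needed. The variable names are illustrative assumptions.

```python
import math

alpha = 0.05
chi2 = 11.025            # Pearson chi-square from the gun control example
df = (2 - 1) * (3 - 1)   # (R - 1)(C - 1) for a 2 x 3 table

# Shortcut valid ONLY for df = 2: P(X > x) = exp(-x / 2).
critical_value = -2 * math.log(alpha)  # value with upper-tail area alpha
p_value = math.exp(-chi2 / 2)          # upper-tail area beyond chi2
reject_null = chi2 > critical_value

print(df, round(critical_value, 3), round(p_value, 3), reject_null)
# 2 5.991 0.004 True
```

Comparing the statistic with the critical value and comparing the p value with α lead to the same decision.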
SPSS Output for Gun Control
Example

Chi-Square Tests

                               Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square             11.025(a)  2    .004
Likelihood Ratio               11.365     2    .003
Linear-by-Linear Association   8.722      1    .003
N of Valid Cases               90
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 11.11.

Additional Information in SPSS
Output
Exceptions that might distort χ² assumptions:
– Associations in some but not all categories
– Low expected frequency per cell
Extent of association is not the same as statistical significance (demonstrated through an example).

Another Example: Heparin Lock Placement

Complication Incidence * Heparin Lock Placement Time Group Crosstabulation
Heparin Lock Time: 1 = 72 hrs, 2 = 96 hrs

                                                               Heparin Lock Placement Time Group
Complication Incidence                                         1         2         Total
Had Compilca      Count                                        9         11        20
                  Expected Count                               10.0      10.0      20.0
                  % within Heparin Lock Placement Time Group   18.0%     22.0%     20.0%
Had NO Compilca   Count                                        41        39        80
                  Expected Count                               40.0      40.0      80.0
                  % within Heparin Lock Placement Time Group   82.0%     78.0%     80.0%
Total             Count                                        50        50        100
                  Expected Count                               50.0      50.0      100.0
                  % within Heparin Lock Placement Time Group   100.0%    100.0%    100.0%

from Polit Text: Table 8-1


Hypotheses in Heparin Lock Placement

Ho: There is no association between complication incidence and length of heparin lock placement. (The variables are independent).
Ha: There is an association between
complication incidence and length of
heparin lock placement. (The variables are
related).
More of SPSS Output

Pearson Chi-Square = .250, p = .617
Since p > .05, we fail to reject the null hypothesis that the complication rate is unrelated to heparin lock placement time.
Continuity correction is used in situations in which the expected frequency for any cell in a 2 by 2 table is less than 10.
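
The reported values can be reproduced with a small Python sketch. Two assumptions are made here that the slides do not state: the p value uses the df = 1 identity P(X > x) = erfc(sqrt(x/2)), and the continuity correction is taken to be the Yates form, which subtracts 0.5 from each absolute difference before squaring.

```python
import math

# Observed counts from the heparin lock crosstabulation
# (rows: had / had no complication; columns: time group 1, time group 2).
observed = [
    [9, 11],
    [41, 39],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
chi2_corrected = 0.0  # Yates form, assumed here for illustration
min_expected = float("inf")
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        min_expected = min(min_expected, fe)
        chi2 += (fo - fe) ** 2 / fe
        chi2_corrected += (abs(fo - fe) - 0.5) ** 2 / fe

# Identity valid ONLY for df = 1.
p_value = math.erfc(math.sqrt(chi2 / 2))

# Rule from the slide: correct when any expected frequency is below 10.
needs_correction = min_expected < 10

print(round(chi2, 3), round(p_value, 3))            # 0.25 0.617
print(needs_correction, round(chi2_corrected, 4))   # False 0.0625
```

Every expected frequency in this table is at least 10, so by the rule stated on the slide the uncorrected Pearson value is the one to report.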

More SPSS Output

Symmetric Measures

                                              Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Nominal by Nominal     Phi                    -.050                                         .617
                       Cramer's V             .050                                          .617
Interval by Interval   Pearson's R            -.050   .100                   -.496          .621(c)
Ordinal by Ordinal     Spearman Correlation   -.050   .100                   -.496          .621(c)
N of Valid Cases                              100
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.

Phi Coefficient

Pearson Chi-Square provides information about the existence of a relationship between 2 nominal variables, but not about the magnitude of the relationship.
Phi coefficient is the measure of the strength of the association:

$\phi = \sqrt{\frac{\chi^2}{N}}$
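
A quick Python sketch applies this formula to the heparin lock example. The square-root formula gives only the magnitude of phi; the signed version below uses the standard 2 by 2 cross-product form, which is an addition for illustration and does not appear on the slides.

```python
import math

chi2 = 0.250  # Pearson chi-square from the heparin lock example
n = 100       # number of valid cases

# Magnitude of phi from the chi-square statistic.
phi = math.sqrt(chi2 / n)

# Signed phi for a 2 x 2 table with cells a, b (first row) and c, d (second row).
a, b, c, d = 9, 11, 41, 39
phi_signed = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(round(phi, 3), round(phi_signed, 3))  # 0.05 -0.05
```

The magnitude matches the .050 in the SPSS output, and the cross-product form recovers the negative sign that SPSS reports for Phi.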
Cramer’s V

When the table is larger than 2 by 2, a different index must be used to measure the strength of the relationship between the variables. One such index is Cramer’s V.
If Cramer’s V is large, it means that there is a tendency for particular categories of the first variable to be associated with particular categories of the second variable.

$V = \sqrt{\frac{\chi^2}{N(k - 1)}}$

where N is the number of cases and k is the smaller of the number of rows and the number of columns.
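
The formula for Cramer’s V can be sketched in Python and applied to both examples from these slides. The function name `cramers_v` and its parameter names are assumptions made for this illustration.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi2 / (N * (k - 1))), k = min(rows, columns).

    Hypothetical helper name, introduced only for this illustration.
    """
    k = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (k - 1)))


# Heparin lock example: 2 x 2 table, chi-square = .250, N = 100.
v_heparin = cramers_v(0.250, 100, 2, 2)

# Gun control example: 2 x 3 table, chi-square = 11.025, N = 90.
v_gun = cramers_v(11.025, 90, 2, 3)

print(round(v_heparin, 3), round(v_gun, 3))  # 0.05 0.35
```

For a 2 by 2 table k - 1 = 1, so V equals the magnitude of phi, which is why SPSS reports .050 for both in the heparin lock output.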

Take Home Lesson

How to Test Association between Frequency of Two Nominal Variables
