U9.2-ContingencyTables
independence and
homogeneity (§10.5)
How to test hypotheses of independence (association) and
homogeneity (similarity) for general two-way cross
classifications of count data.
Terms:
Contingency Table
Cross-Classification Table
Measure of association
Independence in two-way tables
Chi-Square Test for Independence or Homogeneity
Test of Independence or Association
A university conducted a study concerning faculty teaching
evaluation classification by students. A sample of 467
faculty is randomly selected, and each person is classified
according to rank (Instructor, Assistant Professor, etc.) and
teaching evaluation (Above, Average, Below).
Person   Rank                  Evaluation
1        Professor             Above
2        Instructor            Average
3        Professor             Below
4        Assistant Professor   Average
5        Associate Professor   Average
.        .                     .
.        .                     .
.        .                     .
Each person has two categorical responses.
Data can be formatted into a cross-tabulation or contingency table.
                                      Rank
Teaching                     Assistant   Associate
Evaluation      Instructor   Professor   Professor   Professor
Above Average       36          62          45          50
Average             48          50          35          43
Below Average       30          13          20          35
What are we interested in from this two-way
classification table?
                                        Rank
Teaching                       Assistant   Associate                        Relative
Evaluation        Instructor   Professor   Professor   Professor    Sum     Frequency
Above Average         36          62          45          50        193      0.413
Average               48          50          35          43        176      0.377
Below Average         30          13          20          35         98      0.210
Sum                  114         125         100         128        467      1.000
Relative Frequency  0.244       0.268       0.214       0.274      1.000
Is the level of teaching evaluation related to rank?
Are Professors more likely to be judged above average than other ranks?
H0: Teaching Evaluation and Rank are independent variables.
Two variables that have been categorized in a two-way table are independent
if the probability that a measurement is classified into a given cell of the table is
equal to the probability of being classified into that row times the probability of
being classified into that column. This must be true for all cells of the table.
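To make the definition concrete, the following minimal sketch compares the observed proportion in one cell with the product of its row and column proportions, using the counts from the faculty table. The variable and function names are illustrative assumptions, not part of the original slides. A sample proportion will rarely equal the product exactly even when independence holds, which is why a formal test is needed.

```python
# Observed counts from the faculty table (rows: evaluation, columns: rank).
ranks = ["Instructor", "Assistant Professor", "Associate Professor", "Professor"]
counts = {
    "Above Average": [36, 62, 45, 50],
    "Average": [48, 50, 35, 43],
    "Below Average": [30, 13, 20, 35],
}

n = sum(sum(row) for row in counts.values())

# Marginal (row and column) sample proportions.
row_prop = {ev: sum(row) / n for ev, row in counts.items()}
col_prop = [sum(row[j] for row in counts.values()) / n for j in range(len(ranks))]


def compare(evaluation, rank):
    """Return (observed cell proportion, row proportion * column proportion)."""
    j = ranks.index(rank)
    return counts[evaluation][j] / n, row_prop[evaluation] * col_prop[j]


cell_prop, product = compare("Above Average", "Professor")
print(f"observed cell proportion:      {cell_prop:.3f}")
print(f"row prop. x column prop.:      {product:.3f}")
```

For the (Above Average, Professor) cell the two values are about 0.107 and 0.113.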
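Under H0:
$$E_{ij} = n\,\pi_{i\cdot}\,\pi_{\cdot j}$$
Expected:
$$E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}, \qquad df = (r-1)(c-1)$$
r = #rows = 3, c = #cols = 4: a 3 × 4 table.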
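The expected-count and chi-square formulas above can be sketched in a few lines of plain Python. This is an illustrative calculation on the faculty counts, with variable names chosen here for readability. It is not the code used to produce the software output shown on later slides.

```python
# Observed counts n_ij: rows = Above Average, Average, Below Average;
# columns = Instructor, Assistant Professor, Associate Professor, Professor.
observed = [
    [36, 62, 45, 50],
    [48, 50, 35, 43],
    [30, 13, 20, 35],
]

row_totals = [sum(row) for row in observed]        # n_i.
col_totals = [sum(col) for col in zip(*observed)]  # n_.j
n = sum(row_totals)

# E_ij = n_i. * n_.j / n
expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]

# Pearson chi-square: sum over all cells of (n_ij - E_ij)^2 / E_ij
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (r - 1)(c - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"E_11 = {expected[0][0]:.3f}")
print(f"chi-square = {chi_sq:.4f}, df = {df}")
```

For example, the first expected count is 193 × 114 / 467 ≈ 47.113.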
Observed Counts
                                        Rank
Teaching                       Assistant   Associate                        Relative
Evaluation        Instructor   Professor   Professor   Professor    Sum     Frequency
Above Average         36          62          45          50        193      0.413
Average               48          50          35          43        176      0.377
Below Average         30          13          20          35         98      0.210
Sum                  114         125         100         128        467      1.000
Relative Frequency  0.244       0.268       0.214       0.274      1.000
Expected Counts
                                        Rank
Teaching                       Assistant   Associate
Evaluation        Instructor   Professor   Professor   Professor    Sum
Above Average       47.113      51.660      41.328      52.899      193
Average             42.964      47.109      37.687      48.240      176
Below Average       23.923      26.231      20.985      26.861       98
Sum                  114         125         100         128        467
Tabulated Statistics: eval, rank
Rows: eval   Columns: rank

             1        2        3        4      All
1           30       13       20       35       98
         23.92    26.23    20.99    26.86    98.00
          1.24    -2.58    -0.22     1.57       --

2           48       50       35       43      176
         42.96    47.11    37.69    48.24   176.00
          0.77     0.42    -0.44    -0.75       --

3           36       62       45       50      193
         47.11    51.66    41.33    52.90   193.00
         -1.62     1.44     0.57    -0.40       --

All        114      125      100      128      467
        114.00   125.00   100.00   128.00   467.00
            --       --       --       --       --

Cell Contents -- Count
                 Exp Freq
                 Std. Resid

Square roots of individual chi-square values:
$$\frac{n_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$
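The standardized residuals in the output above can be reproduced by hand as the signed square roots of the individual chi-square contributions. The sketch below is a plain-Python illustration with names chosen here. It lists the rows as Above, Average, Below, whereas the tabulated output appears to code eval as 1 = Below Average, 2 = Average, 3 = Above Average, judging from the counts in each row.

```python
import math

evals = ["Above Average", "Average", "Below Average"]
ranks = ["Instructor", "Assistant Professor", "Associate Professor", "Professor"]
observed = [
    [36, 62, 45, 50],
    [48, 50, 35, 43],
    [30, 13, 20, 35],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Residual for cell (i, j): (n_ij - E_ij) / sqrt(E_ij), with E_ij = n_i. * n_.j / n
residuals = []
for i, row in enumerate(observed):
    res_row = []
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        res_row.append((o - e) / math.sqrt(e))
    residuals.append(res_row)

# Locate the cell that contributes most to the chi-square statistic.
i_max, j_max = max(
    ((i, j) for i in range(len(evals)) for j in range(len(ranks))),
    key=lambda ij: abs(residuals[ij[0]][ij[1]]),
)
largest = (evals[i_max], ranks[j_max])

for name, res_row in zip(evals, residuals):
    print(f"{name:14s}", "  ".join(f"{r:6.2f}" for r in res_row))
print("largest |residual|:", largest)
```

The largest residual in absolute value belongs to the (Below Average, Assistant Professor) cell: Assistant Professors received far fewer below-average evaluations than independence would predict.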
SAS
data eval;
input job $ rating $ number;
datalines;
Instructor Above 36
Instructor Average 48
Instructor Below 30
Assistant Above 62
Assistant Average 50
Assistant Below 13
Associate Above 45
Associate Average 35
Associate Below 20
Professor Above 50
Professor Average 43
Professor Below 35
;
run;

proc freq data=eval;
weight number;
table job*rating / chisq;
run;

job       rating

Frequency|
Percent  |
Row Pct  |
Col Pct  |Above   |Average |Below   |  Total
---------+--------+--------+--------+
Assistan |     62 |     50 |     13 |    125
         |  13.28 |  10.71 |   2.78 |  26.77
         |  49.60 |  40.00 |  10.40 |
         |  32.12 |  28.41 |  13.27 |
---------+--------+--------+--------+
Associat |     45 |     35 |     20 |    100
         |   9.64 |   7.49 |   4.28 |  21.41
         |  45.00 |  35.00 |  20.00 |
         |  23.32 |  19.89 |  20.41 |
---------+--------+--------+--------+
Instruct |     36 |     48 |     30 |    114
         |   7.71 |  10.28 |   6.42 |  24.41
         |  31.58 |  42.11 |  26.32 |
         |  18.65 |  27.27 |  30.61 |
---------+--------+--------+--------+
Professo |     50 |     43 |     35 |    128
         |  10.71 |   9.21 |   7.49 |  27.41
         |  39.06 |  33.59 |  27.34 |
         |  25.91 |  24.43 |  35.71 |
---------+--------+--------+--------+
Total         193      176       98      467
            41.33    37.69    20.99   100.00
The FREQ Procedure
SPSS
First you need to tell SPSS that each observation
must be weighted by the cell count.
R
> score <- c(36,48,30,62,50,13,45,35,20,50,43,35)
> mscore <- matrix(score,3,4)
> mscore
     [,1] [,2] [,3] [,4]
[1,]   36   62   45   50
[2,]   48   50   35   43
[3,]   30   13   20   35
> chisq.test(mscore)
data:  mscore
X-squared = 17.4354, df = 6, p-value = 0.00781
$parameter
df
 6
$p.value
[1] 0.00780959
$method
[1] "Pearson's Chi-squared test"
$data.name
[1] "mscore"
$observed
     [,1] [,2] [,3] [,4]
[1,]   36   62   45   50
[2,]   48   50   35   43
[3,]   30   13   20   35
$expected
         [,1]     [,2]     [,3]     [,4]
[1,] 47.11349 51.65953 41.32762 52.89936
[2,] 42.96360 47.10921 37.68737 48.23983
[3,] 23.92291 26.23126 20.98501 26.86081
$residuals
           [,1]       [,2]       [,3]       [,4]
[1,] -1.6191155  1.4386830  0.5712511 -0.3986361
[2,]  0.7683695  0.4211764 -0.4377528 -0.7544218
[3,]  1.2424774 -2.5834003 -0.2150237  1.5704402

Square roots of individual chi-square values:
$$\frac{n_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$
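The reported p-value can be checked without a statistics package. For an even number of degrees of freedom, the upper-tail probability of the chi-square distribution has a closed form: P(X > x) = exp(-x/2) · Σ_{k=0}^{df/2-1} (x/2)^k / k!. The sketch below applies it to the statistic from the output above. The function name `chi2_sf_even_df` is made up for this illustration, and the shortcut does not apply to odd degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with EVEN df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only applies to positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # update to (x/2)^k / k!
        total += term
    return math.exp(-half) * total


# Statistic and degrees of freedom from the faculty evaluation example.
p_value = chi2_sf_even_df(17.4354, 6)
print(f"p-value = {p_value:.5f}")
```

With a p-value below 0.01, the hypothesis that evaluation and rank are independent would be rejected at the usual 5% and 1% levels.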
Test of Homogeneity
Homogeneity Null Hypothesis
In general, if the column categories represent c distinct subpopulations,
random samples of size n1, n2, …, nc are selected from each and classified
into the r values of a categorical variable represented by the rows of the
contingency table. The hypothesis of interest here is whether there is a
difference in the distribution of subpopulation units among the r levels of
the categorical variable, i.e., whether the subpopulations are homogeneous or not.
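$\pi_{ij}$ = proportion of subpop j subjects (j = 1, …, c) that fall in category i (i = 1, …, r).
$$\sum_{i=1}^{r} \pi_{ij} = 1, \quad \text{for each } j = 1, \ldots, c$$
Null hypothesis of homogeneity:
$$\begin{aligned}
\pi_{11} &= \pi_{12} = \cdots = \pi_{1c} \\
\pi_{21} &= \pi_{22} = \cdots = \pi_{2c} \\
&\;\;\vdots \\
\pi_{r1} &= \pi_{r2} = \cdots = \pi_{rc}
\end{aligned}$$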
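As a small illustration of the notation, the sketch below estimates each π_ij by the within-column proportion n_ij / n_.j. It reuses the faculty counts purely for convenience, treating the four ranks as if they were the subpopulations. That is an assumption made for this example only, since those data were described earlier as a single random sample. All names are illustrative.

```python
ranks = ["Instructor", "Assistant Professor", "Associate Professor", "Professor"]
evals = ["Above Average", "Average", "Below Average"]
observed = [
    [36, 62, 45, 50],
    [48, 50, 35, 43],
    [30, 13, 20, 35],
]

col_totals = [sum(col) for col in zip(*observed)]  # n_.j, one per subpopulation

# pi_hat[i][j]: estimated proportion of subpopulation j falling in category i.
pi_hat = [
    [observed[i][j] / col_totals[j] for j in range(len(ranks))]
    for i in range(len(evals))
]

# Each column of pi_hat is a distribution over the r categories, so it sums to 1.
col_sums = [sum(pi_hat[i][j] for i in range(len(evals))) for j in range(len(ranks))]

for name, row in zip(evals, pi_hat):
    print(f"{name:14s}", "  ".join(f"{p:.3f}" for p in row))
```

Under the homogeneity null hypothesis, the entries within each printed row would differ only by sampling variation.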
Example: Myocardial Infarction (MI)
Data was collected to determine if there is an association between
myocardial infarction and smoking in women. 262 women suffering
from MI were classified according to whether they had ever smoked
or not. Two controls (patients with other acute disorders) were
matched to every case.
              Myocardial Infarction
Smoked        Yes      No     Totals
Yes           172     173      345
No             90     346      436
Totals        262     519      781

Observed counts with expected counts beneath:
          MI Yes    MI No    Total
1            172      173      345
          115.74   229.26
2             90      346      436
          146.26   289.74
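The expected counts in the tabulated output can be reproduced from the four observed cell counts (172, 173, 90, 346), and the same calculation gives the chi-square statistic for this 2 × 2 table. The sketch below is an illustrative hand calculation with names chosen here. It also prints the proportion of smokers in each column, which is the quantity the homogeneity hypothesis compares.

```python
# Rows: ever smoked (Yes, No); columns: MI (Yes = cases, No = controls).
observed = [
    [172, 173],
    [90, 346],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under homogeneity: E_ij = n_i. * n_.j / n
expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Proportion who ever smoked within each column (cases, then controls).
smoker_prop = [observed[0][j] / col_totals[j] for j in range(2)]

print("expected:", [[round(e, 2) for e in row] for row in expected])
print(f"chi-square = {chi_sq:.2f}, df = {df}")
print(f"smokers among MI cases: {smoker_prop[0]:.3f}, among controls: {smoker_prop[1]:.3f}")
```

About 66% of the MI cases had smoked, compared with about 33% of the controls, and the large chi-square value on 1 degree of freedom reflects that gap.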