U9.2-ContingencyTables

Uploaded by

naveenfatimanibd
Contingency Tables: Tests for independence and homogeneity (§10.5)

How to test hypotheses of independence (association) and homogeneity (similarity) for general two-way cross classifications of count data.

Terms:
Contingency Table
Cross-Classification Table
Measure of association
Independence in two-way tables
Chi-Square Test for Independence or Homogeneity

Test of Independence or Association
A university conducted a study of how students evaluate faculty teaching.
A sample of 467 faculty members was randomly selected, and each person
was classified according to rank (Instructor, Assistant Professor, etc.)
and teaching evaluation (Above Average, Average, Below Average).
Person  Rank                 Evaluation
1       Professor            Above
2       Instructor           Average
3       Professor            Below
4       Assistant Professor  Average
5       Associate Professor  Average
.       .                    .
.       .                    .
.       .                    .

Each person has two categorical responses. Data can be formatted into a cross-tabulation or contingency table.

                                      Rank
Teaching                         Assistant  Associate
Evaluation           Instructor  Professor  Professor  Professor
Above Average                36         62         45         50
Average                      48         50         35         43
Below Average                30         13         20         35

What are we interested in from this two-way
classification table?
                                      Rank
Teaching                         Assistant  Associate                     Relative
Evaluation           Instructor  Professor  Professor  Professor    Sum  Frequency
Above Average                36         62         45         50    193      0.413
Average                      48         50         35         43    176      0.377
Below Average                30         13         20         35     98      0.210
Sum                         114        125        100        128    467      1.000
Relative Frequency        0.244      0.268      0.214      0.274  1.000

Is the level of teaching evaluation related to rank?

Are Professors more likely to be judged above average than other ranks?
Ho: Teaching Evaluation and Rank are independent variables.

Two variables that have been categorized in a two-way table are independent
if the probability that a measurement is classified into a given cell of the table is
equal to the probability of being classified into that row times the probability of
being classified into that column. This must be true for all cells of the table.
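
As a rough numerical illustration of this definition, the Python sketch below compares each observed cell proportion in the faculty table with the product of its row and column proportions. The variable names are illustrative and do not come from the slides.

```python
# Observed counts from the slides: rows = evaluation, columns = rank.
observed = [
    [36, 62, 45, 50],  # Above Average
    [48, 50, 35, 43],  # Average
    [30, 13, 20, 35],  # Below Average
]

n = sum(sum(row) for row in observed)
row_prop = [sum(row) / n for row in observed]
col_prop = [sum(col) / n for col in zip(*observed)]

# Under independence each cell proportion should be close to
# (row proportion) * (column proportion).
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        cell = count / n
        product = row_prop[i] * col_prop[j]
        print(f"cell ({i + 1},{j + 1}): observed {cell:.3f}, product {product:.3f}")
```

Large gaps between the two printed columns hint at a departure from independence; the chi-square test formalizes that comparison.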
                                      Rank
Teaching                         Assistant  Associate                     Relative
Evaluation           Instructor  Professor  Professor  Professor    Sum  Frequency
Above Average               π11        π12        π13        π14    193        π1.
Average                     π21        π22        π23        π24    176        π2.
Below Average               π31        π32        π33        π34     98        π3.
Sum                         114        125        100        128    467      1.000
Relative Frequency          π.1        π.2        π.3        π.4  1.000

The row sums are $n_{i\cdot}$ and the column sums are $n_{\cdot j}$.

The independence assumption: $\pi_{ij} = \pi_{i\cdot}\,\pi_{\cdot j}$ for all $i, j$.

Observed: $n_{ij}$ (so $\hat\pi_{ij} = n_{ij}/n$)

Expected: $E_{ij} = n\,\hat\pi_{i\cdot}\,\hat\pi_{\cdot j} = \dfrac{n_{i\cdot}\,n_{\cdot j}}{n}$

Test Statistic: $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \dfrac{(n_{ij} - E_{ij})^2}{E_{ij}}$, df = (r-1)(c-1)

r = #rows = 3, c = #cols = 4: a 3 × 4 table.
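
The formulas above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `expected_counts` and `pearson_chi_square` are hypothetical helpers, not part of any package shown in the slides.

```python
def expected_counts(table):
    """E_ij = (row total i) * (column total j) / n."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Faculty data: rows = evaluation (Above, Average, Below), columns = rank.
observed = [
    [36, 62, 45, 50],
    [48, 50, 35, 43],
    [30, 13, 20, 35],
]
stat, df = pearson_chi_square(observed)
print(f"chi-square = {stat:.3f}, df = {df}")
```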
Observed Counts

                                      Rank
Teaching                         Assistant  Associate                     Relative
Evaluation           Instructor  Professor  Professor  Professor    Sum  Frequency
Above Average                36         62         45         50    193      0.413
Average                      48         50         35         43    176      0.377
Below Average                30         13         20         35     98      0.210
Sum                         114        125        100        128    467      1.000
Relative Frequency        0.244      0.268      0.214      0.274  1.000
Expected Counts

                                      Rank
Teaching                         Assistant  Associate
Evaluation           Instructor  Professor  Professor  Professor    Sum
Above Average            47.113     51.660     41.328     52.899    193
Average                  42.964     47.109     37.687     48.240    176
Below Average            23.923     26.231     20.985     26.861     98
Sum                         114        125        100        128    467

$E_{ij} = \dfrac{n_{i\cdot}\,n_{\cdot j}}{n}$

Assumptions: no $E_{ij} < 1$, and no more than 20% of $E_{ij} < 5$.
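
The two rules of thumb on expected counts are easy to check mechanically. The sketch below is one possible way to do so in Python; `expected_counts_ok` is a hypothetical helper name, and the expected counts are copied from the table above.

```python
def expected_counts_ok(expected):
    """Check: no expected count below 1, and at most 20% of them below 5."""
    cells = [e for row in expected for e in row]
    none_below_one = all(e >= 1 for e in cells)
    share_below_five = sum(1 for e in cells if e < 5) / len(cells)
    return none_below_one and share_below_five <= 0.20


# Expected counts from the slide (rows = evaluation, columns = rank).
expected = [
    [47.113, 51.660, 41.328, 52.899],
    [42.964, 47.109, 37.687, 48.240],
    [23.923, 26.231, 20.985, 26.861],
]
print(expected_counts_ok(expected))
```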
Individual Cell Chi Square Values

                                      Rank
Teaching                         Assistant  Associate
Evaluation           Instructor  Professor  Professor  Professor
Above Average            2.6215     2.0698     0.3263     0.1589
Average                  0.5904     0.1774     0.1916     0.5692
Below Average            1.5438     6.6740     0.0462     2.4663

$\chi^2 = 2.62 + \cdots + 2.47 = 17.44 > \chi^2_{6,\,0.95} = 12.59 \Rightarrow$ Reject Ho

There is evidence of an association between rank and evaluation. Note that we observed fewer Assistant Professors getting below average evaluations (13) than we would expect under independence (26.2). That cell's chi-square contribution, 6.67, is the largest in the table.
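
To see where the decision comes from, the Python sketch below recomputes the cell contributions and an upper-tail p-value. It relies on the closed-form chi-square tail probability that holds for even degrees of freedom, which is enough for df = 6 here; `chi2_sf_even_df` is a hypothetical helper, not a library function.

```python
import math


def chi2_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


observed = [
    [36, 62, 45, 50],  # Above Average
    [48, 50, 35, 43],  # Average
    [30, 13, 20, 35],  # Below Average
]
n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Contribution of each cell: (observed - expected)^2 / expected.
contributions = [
    [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]
chi_square = sum(sum(row) for row in contributions)
p_value = chi2_sf_even_df(chi_square, df=6)
largest = max(max(row) for row in contributions)
print(f"chi-square = {chi_square:.2f}, p-value = {p_value:.4f}, largest cell = {largest:.2f}")
```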
Minitab

Input data in this way:

rank  eval  count
1     1     30
1     2     48
1     3     36
2     1     13
2     2     50
2     3     62
3     1     20
3     2     35
3     3     45
4     1     35
4     2     43
4     3     50

STAT > TABLES > Cross Tabs
Classification Variables: rank eval
Check Chi-square Analysis, and Above and Std. residual
Frequencies are in: count
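
The long (rank, eval, count) layout carries the same information as the two-way table. As an illustration only, the Python sketch below rebuilds the table from those records; the numeric codes are the ones used in the Minitab worksheet above, and the variable names are made up for this example.

```python
# (rank, eval, count) records, as entered in the Minitab worksheet.
records = [
    (1, 1, 30), (1, 2, 48), (1, 3, 36),
    (2, 1, 13), (2, 2, 50), (2, 3, 62),
    (3, 1, 20), (3, 2, 35), (3, 3, 45),
    (4, 1, 35), (4, 2, 43), (4, 3, 50),
]

ranks = sorted({rank for rank, _, _ in records})
evals = sorted({ev for _, ev, _ in records})

# Rows = eval, columns = rank, matching "Rows: eval Columns: rank".
table = [[0] * len(ranks) for _ in evals]
for rank, ev, count in records:
    table[evals.index(ev)][ranks.index(rank)] += count

for ev, row in zip(evals, table):
    print(ev, row)
```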
Tabulated Statistics: eval, rank

Rows: eval   Columns: rank

          1        2        3        4      All

1        30       13       20       35       98
      23.92    26.23    20.99    26.86    98.00
       1.24    -2.58    -0.22     1.57       --

2        48       50       35       43      176
      42.96    47.11    37.69    48.24   176.00
       0.77     0.42    -0.44    -0.75       --

3        36       62       45       50      193
      47.11    51.66    41.33    52.90   193.00
      -1.62     1.44     0.57    -0.40       --

All     114      125      100      128      467
     114.00   125.00   100.00   128.00   467.00
         --       --       --       --       --

Cell Contents --
   Count
   Exp Freq
   Std. Resid

The Std. Resid entries are the signed square roots of the individual chi-square values: $(n_{ij} - E_{ij}) / \sqrt{E_{ij}}$

Chi-Square = 17.435, DF = 6, P-Value = 0.008
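
The standardized residuals in this output can be reproduced directly from the counts. The following Python sketch is one way to do it, with rows ordered as in the Minitab output (eval 1 to 3) and columns by rank; the names are illustrative.

```python
import math

# Rows = eval codes 1..3 (Below, Average, Above), columns = rank codes 1..4.
observed = [
    [30, 13, 20, 35],
    [48, 50, 35, 43],
    [36, 62, 45, 50],
]
n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Standardized (Pearson) residual: (n_ij - E_ij) / sqrt(E_ij).
residuals = [
    [(o - r * c / n) / math.sqrt(r * c / n) for o, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]

for row in residuals:
    print("  ".join(f"{value:6.2f}" for value in row))
```

Squaring and summing these residuals gives back the chi-square statistic, which is why large absolute residuals point to the cells driving the result.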


SAS

options ls=79 ps=40 nocenter;
data eval;
input job $ rating $ number;
datalines;
Instructor Above 36
Instructor Average 48
Instructor Below 30
Assistant Above 62
Assistant Average 50
Assistant Below 13
Associate Above 45
Associate Average 35
Associate Below 20
Professor Above 50
Professor Average 43
Professor Below 35
;
run;

proc freq data=eval;
weight number;
table job*rating / chisq ;
run;

Table of job by rating

job       rating

Frequency|
Percent  |
Row Pct  |
Col Pct  |Above   |Average |Below   |  Total
---------+--------+--------+--------+
Assistan |     62 |     50 |     13 |    125
         |  13.28 |  10.71 |   2.78 |  26.77
         |  49.60 |  40.00 |  10.40 |
         |  32.12 |  28.41 |  13.27 |
---------+--------+--------+--------+
Associat |     45 |     35 |     20 |    100
         |   9.64 |   7.49 |   4.28 |  21.41
         |  45.00 |  35.00 |  20.00 |
         |  23.32 |  19.89 |  20.41 |
---------+--------+--------+--------+
Instruct |     36 |     48 |     30 |    114
         |   7.71 |  10.28 |   6.42 |  24.41
         |  31.58 |  42.11 |  26.32 |
         |  18.65 |  27.27 |  30.61 |
---------+--------+--------+--------+
Professo |     50 |     43 |     35 |    128
         |  10.71 |   9.21 |   7.49 |  27.41
         |  39.06 |  33.59 |  27.34 |
         |  25.91 |  24.43 |  35.71 |
---------+--------+--------+--------+
Total         193      176       98      467
            41.33    37.69    20.99   100.00
The FREQ Procedure

Statistics for Table of job by rating

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     6     17.4354    0.0078
Likelihood Ratio Chi-Square    6     18.7430    0.0046
Mantel-Haenszel Chi-Square     1     10.8814    0.0010
Phi Coefficient                       0.1932
Contingency Coefficient               0.1897
Cramer's V                            0.1366

Sample Size = 467
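
Several of the statistics in this output are simple functions of the Pearson chi-square and the sample size. The Python sketch below recomputes them using the usual textbook definitions, which are assumed (not confirmed by the slides) to be the ones SAS applies; the Mantel-Haenszel statistic is left out because it depends on row and column scores.

```python
import math

observed = [
    [36, 62, 45, 50],  # Above Average
    [48, 50, 35, 43],  # Average
    [30, 13, 20, 35],  # Below Average
]
n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
expected = [[r * c / n for c in col_totals] for r in row_totals]

cells = [(o, e) for o_row, e_row in zip(observed, expected) for o, e in zip(o_row, e_row)]
chi_square = sum((o - e) ** 2 / e for o, e in cells)

# Likelihood ratio chi-square: 2 * sum of o * ln(o / e).
likelihood_ratio = 2 * sum(o * math.log(o / e) for o, e in cells)

phi = math.sqrt(chi_square / n)
contingency_coefficient = math.sqrt(chi_square / (chi_square + n))
smaller_dim = min(len(observed), len(observed[0])) - 1
cramers_v = math.sqrt(chi_square / (n * smaller_dim))

print(f"LR chi-square = {likelihood_ratio:.4f}")
print(f"phi = {phi:.4f}, C = {contingency_coefficient:.4f}, V = {cramers_v:.4f}")
```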
SPSS

First you need to tell SPSS that each observation must be weighted by the cell count.

DATA > WEIGHT CASES

Then you choose the analysis.

ANALYZE > DESCRIPTIVE STATISTICS > CROSS TABS
R

> score <- c(36,48,30,62,50,13,45,35,20,50,43,35)
> mscore <- matrix(score,3,4)
> mscore
[,1] [,2] [,3] [,4]
[1,] 36 62 45 50
[2,] 48 50 35 43
[3,] 30 13 20 35
> chisq.test(mscore)

Pearson's Chi-squared test

data: mscore
X-squared = 17.4354, df = 6, p-value = 0.00781

> out <- chisq.test(mscore)


> out[1:length(out)]
$statistic
X-squared
17.43537

$parameter
df
6

$p.value
[1] 0.00780959
$method
[1] "Pearson's Chi-squared test"

$data.name
[1] "mscore"

$observed
[,1] [,2] [,3] [,4]
[1,] 36 62 45 50
[2,] 48 50 35 43
[3,] 30 13 20 35

$expected
         [,1]     [,2]     [,3]     [,4]
[1,] 47.11349 51.65953 41.32762 52.89936
[2,] 42.96360 47.10921 37.68737 48.23983
[3,] 23.92291 26.23126 20.98501 26.86081

$residuals
           [,1]       [,2]       [,3]       [,4]
[1,] -1.6191155  1.4386830  0.5712511 -0.3986361
[2,]  0.7683695  0.4211764 -0.4377528 -0.7544218
[3,]  1.2424774 -2.5834003 -0.2150237  1.5704402

The residuals are the signed square roots of the individual chi-square values: $(n_{ij} - E_{ij}) / \sqrt{E_{ij}}$

Test of Homogeneity

Suppose we wish to determine if there is an association between a rare disease and another more common categorical variable (e.g. smoking). We can’t just take a random sample of subjects and hope to get enough cases (subjects with the disease).

One solution is to choose a fixed number of cases, and a fixed number of controls, and classify each according to whether they are smokers or not. The same chi square test of independence applies here, but since we are sampling within subpopulations (have fixed margin totals), this is now called a chi square test of homogeneity (of distributions).

Homogeneity Null Hypothesis
In general, if the column categories represent c distinct subpopulations,
random samples of size n1, n2, …, nc are selected from each and classified
into the r values of a categorical variable represented by the rows of the
contingency table. The hypothesis of interest here is whether there is a
difference in the distribution of subpopulation units among the r levels of
the categorical variable, i.e. are the subpopulations homogeneous or not.

Subpop 1   Subpop 2   ...   Subpop c
π11        π12        ...   π1c
π21        π22        ...   π2c
:          :          :     :
πr1        πr2        ...   πrc

$\pi_{ij}$ = proportion of subpop j subjects (j=1,…,c) that fall in category i (i=1,…,r).

$\sum_{i=1}^{r} \pi_{ij} = 1$, for each $j = 1, \ldots, c$
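Null hypothesis of homogeneity

$$H_0:\ \begin{pmatrix} \pi_{11} \\ \pi_{21} \\ \vdots \\ \pi_{r1} \end{pmatrix} = \begin{pmatrix} \pi_{12} \\ \pi_{22} \\ \vdots \\ \pi_{r2} \end{pmatrix} = \cdots = \begin{pmatrix} \pi_{1c} \\ \pi_{2c} \\ \vdots \\ \pi_{rc} \end{pmatrix}$$
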
Example: Myocardial Infarction (MI)
Data was collected to determine if there is an association between
myocardial infarction and smoking in women. 262 women suffering
from MI were classified according to whether they had ever smoked
or not. Two controls (patients with other acute disorders) were
matched to every case.
             Myocardial Infarction
Smoked       Yes      No     Totals
Yes          172     173        345
No            90     346        436
Totals       262     519        781

Is the incidence of smoking the same for MI and non-MI sufferers?

Ho: the incidence of MI is homogeneous with respect to smoking

Ho: $\pi_{11} = \pi_{12}$ and $\pi_{21} = \pi_{22}$
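
The quantities compared under this null hypothesis are the within-column proportions. As a small illustration, the Python sketch below estimates them from the MI counts; the variable names are illustrative only.

```python
# Rows = smoked (Yes, No), columns = myocardial infarction (Yes, No).
counts = [
    [172, 173],
    [90, 346],
]
col_totals = [sum(col) for col in zip(*counts)]

# Estimated pi_ij: share of column j (subpopulation) falling in row i.
pi_hat = [[cell / total for cell, total in zip(row, col_totals)] for row in counts]

for label, row in zip(("Smoked: Yes", "Smoked: No"), pi_hat):
    print(label, "  ".join(f"{p:.3f}" for p in row))
```

Under homogeneity the two entries in each printed row would be estimates of the same proportion.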


Example: MI results in MTB
Stat -> Tables -> Chi-Square Test
--------------------------------------------------------------------------------------------
Chi-Square Test: MI Yes, MI No
Expected counts are printed below observed counts

MI Yes MI No Total
1 172 173 345
115.74 229.26

2 90 346 436
146.26 289.74

Total 262 519 781

Chi-Sq = 27.352 + 13.808 + 21.643 + 10.926 = 73.729


DF = 1, P-Value = 0.000

Conclude: there is evidence of lack of homogeneity of incidence of MI with respect to smoking.
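
The Minitab figures can be cross-checked with a short Python sketch. For a 2 × 2 table df = 1, and the chi-square upper tail with one degree of freedom can be written with the complementary error function from the standard library; the variable names here are illustrative.

```python
import math

# Rows = smoked (Yes, No), columns = MI (Yes, No).
observed = [
    [172, 173],
    [90, 346],
]
n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

# For df = 1: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))
print(f"chi-square = {chi_square:.3f}, df = 1, p-value = {p_value:.3g}")
```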
Odds and Odds Ratios
Sometimes probabilities are expressed as odds, e.g.
• Gambling circles. (Why?)
• Biomedical studies. (Easy interpretation in logistic regression, etc.)

Odds of Event A = P(A) / (1-P(A))

P(A) = Odds of A / (1 + Odds of A)

Ex: A horse has odds of 3 to 2 of winning. This means that in every 3+2=5 races the horse wins 3 and loses 2. So P(Wins) = 3/5.
To use the above formula express the odds as d to 1, so 1.5 to 1 in this case. Thus
P(Wins) = 1.5 / (1+1.5) = 1.5 / 2.5 = 3/5.
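
The two conversion formulas are inverses of each other, which the short Python sketch below illustrates with the horse-racing example. The function names `odds_from_prob` and `prob_from_odds` are hypothetical.

```python
def odds_from_prob(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def prob_from_odds(odds):
    """Probability of an event whose odds are given as 'odds to 1'."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


# Odds of 3 to 2 are the same as 1.5 to 1.
p_win = prob_from_odds(3 / 2)
print(p_win, odds_from_prob(p_win))
```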
Example: MI and Odds Ratios
For women sufferers of MI, the proportion who ever smoked is
172/262 = 0.656, i.e. $\hat\pi_{11} = 0.656$. In other words, the odds
that a woman MI sufferer is a smoker are 0.656/(1-0.656) = 1.9.

For women non-sufferers of MI, the proportion who ever smoked is
173/519 = 0.333, i.e. $\hat\pi_{12} = 0.333$. In other words, the odds
that a woman non-MI sufferer is a smoker are 0.333/(1-0.333) = 0.5.

We can now calculate the odds ratio of being a smoker for MI sufferers
relative to non-sufferers:
OR = 1.91/0.50 = 3.82

The odds of being a smoker among MI sufferers are about 4 times the odds
among non-sufferers. Put another way: a randomly selected MI sufferer is
about twice as likely (.656/.333) to be a smoker as a randomly selected
non-sufferer.
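
The odds ratio calculation can be sketched in Python as follows. The cross-product form shown at the end is a standard equivalent way to compute the same quantity for a 2 × 2 table; the variable names are illustrative.

```python
# Smokers and non-smokers among MI cases and among controls.
mi_smokers, mi_nonsmokers = 172, 90
control_smokers, control_nonsmokers = 173, 346

odds_smoker_given_mi = mi_smokers / mi_nonsmokers
odds_smoker_given_no_mi = control_smokers / control_nonsmokers
odds_ratio = odds_smoker_given_mi / odds_smoker_given_no_mi

# Equivalent cross-product form: (a * d) / (b * c).
cross_product = (mi_smokers * control_nonsmokers) / (control_smokers * mi_nonsmokers)

print(f"odds (MI) = {odds_smoker_given_mi:.2f}, odds (no MI) = {odds_smoker_given_no_mi:.2f}")
print(f"odds ratio = {odds_ratio:.2f}")
```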
Measures of Risk for Binary Outcomes
Consider again the MI (disease) and smoking (risk factor) data.

             Myocardial Infarction
Smoked       Yes      No     Totals
Yes          172     173        345
No            90     346        436
Totals       262     519        781

Risk: proportion of subjects with disease in two risk factor groups (smoke vs. no-smoke).
p1 = prop. of exposed subjs who develop disease
p2 = prop. of unexposed subjs who develop disease
Risk (Smoke=Y) = 172/345 = 0.499
Risk (Smoke=N) = 90/436 = 0.206

Risk difference (Excess risk): difference in the proportion of
subjects with outcome in the two risk factor groups, p1−p2.
Risk Difference = 172/345 − 90/436 = 0.292
Use the difference of two sample proportions CI discussed earlier to
obtain a confidence interval.

Risk Ratio (Relative Risk): ratio of risks (to get disease) in the two
risk factor groups, p1/p2.
RR = Relative Risk = 0.499/0.206 = 2.42
“Probability of smokers developing MI is approx 2.4 times that for
non-smokers”.
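
The risk measures can be sketched in Python as below. Two assumptions are made here and are not taken from the slides: the exposed row total is 172 + 173 = 345, as in the Minitab output earlier in the deck, and the confidence interval is the usual large-sample (Wald) interval for a difference of two proportions with z = 1.96.

```python
import math

# Assumption: smokers total 172 + 173 = 345, matching the Minitab output.
mi_smokers, n_smokers = 172, 345
mi_nonsmokers, n_nonsmokers = 90, 436

p1 = mi_smokers / n_smokers          # risk among exposed (smokers)
p2 = mi_nonsmokers / n_nonsmokers    # risk among unexposed (non-smokers)

risk_difference = p1 - p2
relative_risk = p1 / p2

# Assumption: large-sample (Wald) 95% interval for p1 - p2.
z = 1.96
std_error = math.sqrt(p1 * (1 - p1) / n_smokers + p2 * (1 - p2) / n_nonsmokers)
ci_low = risk_difference - z * std_error
ci_high = risk_difference + z * std_error

print(f"risk difference = {risk_difference:.3f} ({ci_low:.3f}, {ci_high:.3f})")
print(f"relative risk = {relative_risk:.2f}")
```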
