3 SAS 1 Independence

This document discusses chi-square statistics for analyzing 2x2 contingency tables. It defines the chi-square test statistic Q, which follows a chi-square distribution with 1 degree of freedom under the null hypothesis of no association between the row and column variables. It also discusses the Pearson chi-square statistic QP, which is asymptotically equivalent to Q. The document provides an example analysis using PROC FREQ in SAS to calculate Q, QP, and other statistics for a 2x2 table comparing treatment and outcome. Both Q and QP are highly significant, indicating a strong association between treatment and response.

Uploaded by

vaxor paradose


2.2 Chi-Square Statistics

Table 2.2 displays the generic 2 × 2 table, including row and column marginal totals.

Table 2.2 2 × 2 Contingency Table

Row        Column
Levels     1            2            Total
1          $n_{11}$     $n_{12}$     $n_{1+}$
2          $n_{21}$     $n_{22}$     $n_{2+}$
Total      $n_{+1}$     $n_{+2}$     $n$

Under the randomization framework that produced Table 2.1, the row marginal totals $n_{1+}$ and
$n_{2+}$ are fixed since 60 patients were randomly allocated to one of the treatment groups and 64 to
the other. The column marginal totals can be regarded as fixed under the null hypothesis of no
treatment difference for each patient (since each patient would have the same response regardless
of the assigned treatment, under this null hypothesis). Then, given that all of the marginal totals
$n_{1+}$, $n_{2+}$, $n_{+1}$, and $n_{+2}$ are fixed under the null hypothesis, the probability distribution from the
randomized allocation of patients to treatment can be written

$$\Pr\{n_{ij}\} = \frac{n_{1+}!\,n_{2+}!\,n_{+1}!\,n_{+2}!}{n!\,n_{11}!\,n_{12}!\,n_{21}!\,n_{22}!}$$

which is the hypergeometric distribution. The expected value of $n_{ij}$ is

$$E\{n_{ij} \mid H_0\} = \frac{n_{i+}\,n_{+j}}{n} = m_{ij}$$

and the variance is

$$V\{n_{ij} \mid H_0\} = \frac{n_{1+}\,n_{2+}\,n_{+1}\,n_{+2}}{n^2\,(n-1)} = v_{ij}$$

For a sufficiently large sample, $n_{11}$ approximately has a normal distribution, which implies that

$$Q = \frac{(n_{11} - m_{11})^2}{v_{11}}$$

approximately has a chi-square distribution with one degree of freedom. It is the ratio of a squared
difference from the expected value to its variance, and such quantities follow the chi-square
distribution when the variable is distributed normally. $Q$ is often called the randomization (or
Mantel-Haenszel) chi-square. It doesn’t matter how the rows and columns are arranged; $Q$ takes
the same value since

$$|n_{11} - m_{11}| = |n_{ij} - m_{ij}| = \frac{|n_{11}n_{22} - n_{12}n_{21}|}{n} = \frac{n_{1+}\,n_{2+}}{n}\,|p_1 - p_2|$$

where $p_i = n_{i1}/n_{i+}$ is the observed proportion in column 1 for the $i$th row.
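
The two equalities in this identity can be checked by expanding the marginal totals. The short derivation below is added for completeness and uses only the definitions in this section, writing $n_{1+}$ and $n_{2+}$ for the row totals, $n_{+1}$ for the first column total, and $n$ for the overall total.

```latex
\begin{align*}
n_{11} - m_{11}
  &= n_{11} - \frac{n_{1+}\,n_{+1}}{n} \\
  &= \frac{n_{11}(n_{11}+n_{12}+n_{21}+n_{22}) - (n_{11}+n_{12})(n_{11}+n_{21})}{n} \\
  &= \frac{n_{11}n_{22} - n_{12}n_{21}}{n}, \\[6pt]
p_1 - p_2
  &= \frac{n_{11}}{n_{1+}} - \frac{n_{21}}{n_{2+}}
   = \frac{n_{11}(n_{21}+n_{22}) - n_{21}(n_{11}+n_{12})}{n_{1+}\,n_{2+}}
   = \frac{n_{11}n_{22} - n_{12}n_{21}}{n_{1+}\,n_{2+}}.
\end{align*}
```

Multiplying the second result by $n_{1+}n_{2+}/n$ gives the first, and taking absolute values gives the stated identity. Because the cross-product difference $n_{11}n_{22} - n_{12}n_{21}$ changes only in sign when rows or columns are swapped, the squared quantity in $Q$ is unaffected by the arrangement.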

A related statistic is the Pearson chi-square statistic. This statistic is written

$$Q_P = \sum_{i=1}^{2}\sum_{j=1}^{2}\frac{(n_{ij} - m_{ij})^2}{m_{ij}} = \frac{n}{n-1}\,Q = \frac{(p_1 - p_2)^2}{\{(1/n_{1+} + 1/n_{2+})\,p_+(1 - p_+)\}}$$

where $p_+ = n_{+1}/n$ is the proportion in column 1 for the pooled rows.
If the cell counts are sufficiently large, $Q_P$ is distributed as chi-square with one degree of freedom.
As $n$ grows large, $Q_P$ and $Q$ converge. A useful rule for determining adequate sample size for
both $Q$ and $Q_P$ is that the expected value $m_{ij}$ should exceed 5 (and preferably 10) for all of the
cells. While $Q$ is discussed here in the framework of a randomized allocation of patients to two
groups, $Q$ and $Q_P$ are also appropriate for investigating the hypothesis of no association for all of
the sampling frameworks described previously.
The following PROC FREQ statements produce a frequency table and the chi-square statistics
for the data in Table 2.1. The data are supplied in frequency (count) form. An observation
is supplied for each configuration of the values of the variables TREAT and OUTCOME. The
variable COUNT holds the total number of observations that have that particular configuration.
The WEIGHT statement tells the FREQ procedure that the data are in frequency form and names
the variable that contains the frequencies. Alternatively, the data could be provided as case records
for the individual patients; with this data structure, there would be 124 data lines corresponding to
the 124 patients, and neither the variable COUNT nor the WEIGHT statement would be required.
The CHISQ option in the TABLES statement produces chi-square statistics.

data respire;
input treat $ outcome $ count;
datalines;
placebo f 16
placebo u 48
test f 40
test u 20
;

proc freq;
weight count;
tables treat*outcome / chisq;
run;
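
The case-record alternative mentioned above can be sketched by expanding the frequency-form data set into one observation per patient. This is a minimal illustration rather than part of the original program; the data set name respire_cases and the loop index i are hypothetical names introduced here.

```sas
/* Sketch: expand the frequency-form data into one record per patient. */
/* respire_cases and i are illustrative names, not from the original.  */
data respire_cases;
   set respire;
   do i = 1 to count;      /* write COUNT copies of each configuration */
      output;
   end;
   drop i count;           /* case records carry no frequency variable */
run;

/* With case records, no WEIGHT statement is needed. */
proc freq data=respire_cases;
   tables treat*outcome / chisq;
run;
```

The expanded data set should contain 124 observations, and the resulting table and chi-square statistics should match those obtained from the frequency form with the WEIGHT statement.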

Output 2.1 displays the data in a 2 × 2 table. With an overall sample size of 124, and all expected
cell counts greater than 10, the sampling assumptions for the chi-square statistics are met. PROC
FREQ prints out a warning message when more than 20% of the cells in a table have expected
counts less than 5. (You can specify the EXPECTED option in the TABLES statement to produce
the expected cell counts along with the cell percentages.)
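
As a brief sketch of the EXPECTED option, the earlier PROC FREQ step can be rerun with the option added. It assumes the respire data set created above is still available; naming it in the DATA= option is a choice made here, not something the original statements do.

```sas
/* Sketch: request expected cell counts alongside the chi-square tests. */
proc freq data=respire;
   weight count;
   tables treat*outcome / chisq expected;
run;
```

From the formula for $m_{ij}$, the expected counts for these data work out to about 28.90 and 35.10 for the placebo row and 27.10 and 32.90 for the test row, so every cell exceeds 10.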

Output 2.1 Frequency Table

Table of treat by outcome

Frequency
Percent
Row Pct
Col Pct

treat        outcome
             f          u          Total

placebo      16         48         64
             12.90      38.71      51.61
             25.00      75.00
             28.57      70.59

test         40         20         60
             32.26      16.13      48.39
             66.67      33.33
             71.43      29.41

Total        56         68         124
             45.16      54.84      100.00

Output 2.2 contains the table with the chi-square statistics.

Output 2.2 Chi-Square Statistics

Statistic                      DF      Value       Prob
Chi-Square                      1    21.7087     <.0001
Likelihood Ratio Chi-Square     1    22.3768     <.0001
Continuity Adj. Chi-Square      1    20.0589     <.0001
Mantel-Haenszel Chi-Square      1    21.5336     <.0001
Phi Coefficient                      -0.4184
Contingency Coefficient               0.3860
Cramer's V                           -0.4184

Fisher's Exact Test
Cell (1,1) Frequency (F)              16
Left-sided Pr <= F             2.838E-06
Right-sided Pr >= F               1.0000
Table Probability (P)          2.397E-06
Two-sided Pr <= P              4.754E-06

Sample Size = 124

The randomization statistic $Q$ is labeled “Mantel-Haenszel Chi-Square,” and the Pearson chi-square
$Q_P$ is labeled “Chi-Square.” $Q$ has a value of 21.5336 and p < 0.0001; $Q_P$ has a value
of 21.7087 and p < 0.0001. Both of these statistics are clearly significant. There is a strong
association between treatment and outcome such that the test treatment results in a more favorable
response outcome than the placebo. The row percentages in Output 2.1 show that the test treatment
resulted in 67% favorable response and the placebo treatment resulted in 25% favorable response.
The output also includes a statistic labeled “Likelihood Ratio Chi-Square.” This statistic, often
written $Q_L$, is asymptotically equivalent to $Q$ and $Q_P$. The statistic $Q_L$ is described in Chapter
8 in the context of hypotheses for the odds ratio, which receives some consideration in Section
2.5. $Q_L$ is not often used in the analysis of 2 × 2 tables. Some of the other statistics are discussed
in the next section.
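
To connect the formulas of this section with the PROC FREQ output, the following DATA step is a minimal sketch that recomputes the randomization and Pearson statistics by hand from the four cell counts. All variable names are illustrative and are not part of the original program; the PROBCHI function supplies the chi-square distribution function.

```sas
/* Sketch: reproduce Q and QP for the treat-by-outcome table from the formulas. */
/* All variable names below are illustrative.                                   */
data _null_;
   n11 = 16; n12 = 48;                  /* placebo: favorable, unfavorable */
   n21 = 40; n22 = 20;                  /* test:    favorable, unfavorable */
   n1p = n11 + n12;  n2p = n21 + n22;   /* row totals                      */
   np1 = n11 + n21;  np2 = n12 + n22;   /* column totals                   */
   n   = n1p + n2p;                     /* overall sample size             */

   m11 = n1p * np1 / n;                              /* expected value of n11   */
   v11 = n1p * n2p * np1 * np2 / (n**2 * (n - 1));   /* hypergeometric variance */
   q   = (n11 - m11)**2 / v11;                       /* randomization statistic */
   qp  = n / (n - 1) * q;                            /* Pearson statistic       */

   /* Same Pearson statistic written in terms of proportions */
   p1 = n11 / n1p;  p2 = n21 / n2p;  pp = np1 / n;
   qp_alt = (p1 - p2)**2 / ((1/n1p + 1/n2p) * pp * (1 - pp));

   p_q  = 1 - probchi(q, 1);            /* upper-tail p-values, 1 df */
   p_qp = 1 - probchi(qp, 1);
   put m11= v11= q= qp= qp_alt= p_q= p_qp=;
run;
```

The values written to the log should agree with Output 2.2 to rounding: about 21.5336 for q and 21.7087 for both qp and qp_alt.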

2.3 Exact Tests

Sometimes your data include small and zero cell counts. For example, consider the data in Table 2.3
from a study on treatments for healing severe infections. Randomly assigned test treatment and
control are compared to determine whether the rates of favorable response are the same.

Table 2.3 Severe Infection Treatment Outcomes

Treatment    Favorable    Unfavorable    Total
Test         10           2              12
Control      2            4              6
Total        12           6              18

Obviously, the sample size requirements for the chi-square tests described in Section 2.2 are not
met by these data. However, if you can consider the margins (12, 6, 12, 6) to be fixed, then the
random assignment and the null hypothesis of no association imply the hypergeometric distribution

$$\Pr\{n_{ij}\} = \frac{n_{1+}!\,n_{2+}!\,n_{+1}!\,n_{+2}!}{n!\,n_{11}!\,n_{12}!\,n_{21}!\,n_{22}!}$$

The row margins may be fixed by the treatment allocation process; that is, subjects are randomly
assigned to Test and Control. The column totals can be regarded as fixed by the null hypothesis;
there are 12 patients with favorable response and 6 patients with unfavorable response, regardless of
treatment. If the data are the result of a sample of convenience, you can still condition on marginal
totals being fixed by addressing the null hypothesis that the patients are interchangeable; that is,
the observed distributions of outcome for the two treatments are compatible with what would be
expected from random assignment. That is, all possible assignments of the outcomes for 12 of the
patients to Test and for 6 to Control are equally likely.
Recall that a p-value is the probability of the observed data or more extreme data occurring under
the null hypothesis. With Fisher’s exact test, you determine the p-value for this table by summing
the probabilities of the tables that are as likely or less likely, given the fixed margins. Table 2.4
includes all possible table configurations and their associated probabilities.
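
The summation described above can be sketched in a DATA step that enumerates every table consistent with the fixed margins of Table 2.3. This is an illustration only: the data set name fisher_sketch and all variable names are hypothetical, and the small relative tolerance in the comparison is an assumption added to guard against floating-point ties. The COMB function gives binomial coefficients, so each probability is the hypergeometric formula written in an equivalent form.

```sas
/* Sketch: hypergeometric probabilities and the two-sided Fisher p-value */
/* for Table 2.3. Names are illustrative, not from the original text.    */
data fisher_sketch;
   n   = 18;          /* total sample size             */
   n1p = 12;          /* Test row total                */
   np1 = 12;          /* Favorable column total        */
   n11_obs = 10;      /* observed Test/Favorable count */

   denom = comb(n, np1);
   p_obs = comb(n1p, n11_obs) * comb(n - n1p, np1 - n11_obs) / denom;

   p_two = 0;
   /* n11 can range only over values compatible with the fixed margins */
   do n11 = max(0, n1p + np1 - n) to min(n1p, np1);
      prob = comb(n1p, n11) * comb(n - n1p, np1 - n11) / denom;
      /* accumulate tables that are as likely or less likely than the observed one */
      if prob <= p_obs * (1 + 1e-7) then p_two = p_two + prob;
      output;
   end;
   keep n11 prob p_obs p_two;
run;

proc print data=fisher_sketch;
run;
```

Here p_two is a running total, so its value in the last printed row is the two-sided p-value; for these margins it should come to roughly 0.107.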
