Exact Statistical Inference
for Categorical Data
Guogen Shan
University of Nevada, Las Vegas, NV, USA
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying, recording, or any information storage and retrieval system,
without permission in writing from the publisher. Details on how to seek permission, further
information about the Publisher’s permissions policies and our arrangements with organizations such as
the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website:
www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the
Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience
broaden our understanding, changes in research methods, professional practices, or medical treatment
may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating
and using any information, methods, compounds, or experiments described herein. In using such
information or methods they should be mindful of their own safety and the safety of others, including
parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume
any liability for any injury and/or damage to persons or property as a matter of products liability,
negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas
contained in the material herein.
ISBN: 978-0-08-100681-8
List of Figures
2.2 Type I error rate for the five exact approaches when K = 3
and a sample size of 30 per group at the significance level of
α = 0.05, the left plot with the score value d = (0, 1, 2), and
the right plot with the score value d = (0, 1, 4).
2.3 Power comparison among the five exact approaches when
K = 4, a sample size of 30 per group, and the score value
d = (0, 1, 2, 3).
2.4 Power comparisons among the C approach, the M approach,
and the C+M approach, using the χ² test with total
sample sizes of 25, 50, 100, and 300 from the first row to
the fourth row.
LIST OF TABLES
PREFACE
The first chapter reviews the three sampling methods to generate data, then
presents the five exact approaches. Data that can be organized in a 2 × 2
contingency table is considered in this chapter. Among the five approaches,
one is the exact conditional approach, and the remaining four are
unconditional. This book is the first to comprehensively compare the performance
of the five exact approaches for data from a binomial comparative study.
Chapter 2 deals with data from a 2 × K table by applying the exact
approaches. Such data is commonly obtained from a dose-response study
or a genetic study. The last chapter is devoted to sample size determination
based on exact approaches. Power analysis is an essential part of a research
proposal, and accurate sample size determination would increase the chance
of the proposal being funded and finished in a timely manner.
CHAPTER 1
Exact Statistical Inference for a 2 × 2 Table
Independent Study
In the first sampling method for a 2 × 2 independent study, both marginal
totals of a 2 × 2 table are considered as fixed, that is, the sample sizes (n1, n2) for
factor A and (m1, m2) for factor B are known before the study. One
classical example is the experiment described by Fisher [2]: a lady claimed
that she could tell whether milk was poured before or after tea in a cup. In
this interesting experiment, eight cups were prepared with four of each kind.
The lady was informed that among these eight cups, four were prepared with
tea first and the remaining four with milk first. After the lady tasted all eight
cups, she reported which four cups she thought had milk added first. In
this experiment both marginal totals are clearly fixed: four cups of
each kind in the truth and four of each kind in the lady's answer. Such
studies are relatively rare in practice because subjects in the study
are informed about the number of each kind; for this reason, a 2 × 2
independent study will not be discussed further here.
Comparative Study
A 2 × 2 comparative study involves two independent groups with sample
sizes n1 and n2. At the end of the study, the associated number of events
(e.g., response, survival) x1 and x2 are observed from the first group and the
second group, respectively. It is often interesting to compare the response
rate between the two groups. The following Phase III study is used to
illustrate the setting of a 2 × 2 comparative study.
Exact Statistical Inference for Categorical Data. http://dx.doi.org/10.1016/B978-0-08-100681-8.00001-4
Copyright © 2016 Elsevier Ltd. All rights reserved.
Ha : p1 = p2 .
This test is available through the function prop.test in the statistical software R
and the freq procedure in SAS for comparing two independent proportions.
It should be noted that the χ² test statistic is equivalent to the Z test statistic
with a pooled variance estimate, which is given as
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}},$$
where p̂1 = x1/n1, p̂2 = x2/n2, and p̂ = (x1 + x2)/(n1 + n2). The Z test
statistic can be applied to a one-sided problem, but the χ² test
statistic Tχ² is only used for a two-sided problem.
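The relationship between the two statistics can be checked numerically. The following is a minimal sketch, assuming the uncorrected Pearson χ² statistic (no continuity correction) and hypothetical counts chosen only for illustration; the function names are not from the book.

```python
import math

def pooled_z(x1, n1, x2, n2):
    """Z statistic with the pooled variance estimate."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled response rate
    return (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

def pearson_chi2(x1, n1, x2, n2):
    """Uncorrected Pearson chi-square statistic for the 2 x 2 table."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

# Hypothetical data: 12 of 30 events in group 1, 5 of 30 in group 2.
z = pooled_z(12, 30, 5, 30)
chi2 = pearson_chi2(12, 30, 5, 30)
print(z ** 2, chi2)  # the two values agree up to rounding error
```

The sign of Z carries the direction of the difference, which is why it suits a one-sided problem, whereas squaring discards that direction.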
The asymptotic limiting distributions of the χ² test and the Z test are
often used for statistical inference, and they are appropriate for use in
practice only when cell frequencies are large enough. The χ² test is not
recommended for use when the smallest expected frequency among the four
cells is less than 5 [4, 5]. However, Cochran [6] argued that the cut point
value 5 is chosen arbitrarily, and this cut point may be modified when new
evidence from data becomes available. For data with small cell frequencies,
exact approaches (e.g., Fisher’s exact conditional approach) are generally
recommended [2, 7, 8]. Several exact approaches [2, 8–15] will be discussed
later in Section 1.1.
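The expected-frequency rule of thumb above can be checked directly from a table. This is a small sketch with a hypothetical table; the helper name and the data are assumptions made for illustration, and the threshold 5 is the conventional cut point discussed in the text.

```python
def min_expected(table):
    """Smallest expected cell frequency under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return min(r * c / total for r in row_totals for c in col_totals)

# Hypothetical 2 x 2 table with small counts.
smallest = min_expected([[3, 7], [1, 9]])
if smallest < 5:
    print("smallest expected frequency is", smallest, "- consider an exact approach")
```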
The Pearson χ² test was used for testing the association, and the p-value
was found to be much less than 0.05. Then, the authors concluded that
smoking was significantly associated with the occurrence of synchronous
primaries in UADT. They also mentioned that Yates' correction was
used in the χ² test statistic as small expected frequencies were observed
from the table.
$$T_{LR} = 2 \sum_{i=1}^{2} \sum_{j=1}^{2} n_{ij} \log\left(\frac{n_{ij} N}{m_i n_j}\right),$$
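As an illustration of the likelihood ratio statistic, the sketch below evaluates T_LR for a table given as rows, with m_i taken as row totals and n_j as column totals. Treating a zero cell as contributing zero is a common convention that is assumed here, and the tables are hypothetical.

```python
import math

def likelihood_ratio_stat(table):
    """T_LR = 2 * sum of n_ij * log(n_ij * N / (m_i * n_j))."""
    row_totals = [sum(row) for row in table]        # m_i
    col_totals = [sum(col) for col in zip(*table)]  # n_j
    total = sum(row_totals)                         # N
    acc = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij > 0:  # assumed convention: 0 * log(0) = 0
                acc += n_ij * math.log(n_ij * total / (row_totals[i] * col_totals[j]))
    return 2 * acc

# A table whose rows are exactly proportional shows no association.
t_independent = likelihood_ratio_stat([[10, 20], [20, 40]])
t_associated = likelihood_ratio_stat([[12, 18], [5, 25]])
print(t_independent, t_associated)
```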
Let pij = nij/N be the probability for the i-th level of factor A and the
j-th level of factor B. Suppose p1 = p11 + p21 and p2 = p11 + p12 are
the marginal probabilities. The difference between these two proportions is
often the parameter of interest, p1 − p2, or equivalently p21 − p12. To make a
statistical inference for this parameter, the most commonly used test statistic
is the McNemar test [19]:
$$T_{MC} = \frac{(n_{21} - n_{12})^2}{n_{21} + n_{12}}.$$
It should be noted that only the off-diagonal numbers, n12 and n21, from a
2 × 2 table are used in the test statistic, and the diagonal values, n11 and n22,
have no influence on computing the test statistic and the p-value calculation.
There has been a long-term debate over whether all values should be used
in the test statistic.
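The point about the diagonal cells can be seen in a short sketch of the McNemar statistic. The tables below are hypothetical, and raising an error when there are no discordant pairs is a choice made here for illustration, not something prescribed by the text.

```python
def mcnemar_stat(table):
    """McNemar statistic from the off-diagonal cells n12 and n21."""
    n12 = table[0][1]
    n21 = table[1][0]
    if n12 + n21 == 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    return (n21 - n12) ** 2 / (n21 + n12)

# Two hypothetical tables that differ only on the diagonal.
t_a = mcnemar_stat([[10, 5], [15, 20]])
t_b = mcnemar_stat([[90, 5], [15, 1]])
print(t_a, t_b)  # identical values: the diagonal cells play no role
```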
Under the conditional framework with both marginal totals fixed, the
value at the (1, 1) cell, n11 , determines the other three values in a 2 × 2
contingency table. Therefore, we use n11 to represent the complete data
(n11, n12, n21, n22) for simplicity. The probability of a 2 × 2 table as in Table
1.1 under the conditional approach is computed as
$$P_C(n_{11}) = \frac{m_1!\, m_2!\, n_1!\, n_2!}{N!\, n_{11}!\, n_{12}!\, n_{21}!\, n_{22}!}. \tag{1.1}$$
Then, the p-value is computed by adding the probabilities of the given table
and other more extreme tables. For example, in a one-sided hypothesis
problem that rejects the null hypothesis with a large test statistic, all
tables whose test statistic values are larger than or equal to that of
the given table are in the rejection region, and their probabilities are
added together to compute the p-value. Although the assumption for the
limiting distribution of a test statistic is not needed in exact approaches,
some assumptions related to a study itself must be satisfied, for example,
the independence assumption of participants. These assumptions can be
checked from the study.
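The p-value calculation described above can be sketched for the tea-tasting margins (four cups of each kind, eight cups in total). Two assumptions are made for illustration: n11 itself serves as the test statistic, so larger n11 is more extreme, and the observed values of n11 are hypothetical because the outcome of the experiment is not stated here. The function names are not from the book.

```python
from math import factorial

def prob_conditional(n11, n1, m1, total):
    """Equation (1.1): n1 is the first column total, m1 the first row total."""
    n2, m2 = total - n1, total - m1
    n12 = m1 - n11  # remainder of row 1
    n21 = n1 - n11  # remainder of column 1
    n22 = total - n11 - n12 - n21
    numerator = factorial(m1) * factorial(m2) * factorial(n1) * factorial(n2)
    denominator = (factorial(total) * factorial(n11) * factorial(n12)
                   * factorial(n21) * factorial(n22))
    return numerator / denominator

def one_sided_p_value(n11_obs, n1, m1, total):
    """Sum the probabilities of tables with n11 >= the observed value."""
    upper = min(n1, m1)
    return sum(prob_conditional(k, n1, m1, total)
               for k in range(n11_obs, upper + 1))

# Hypothetical outcomes: three or four cups identified correctly.
p_three = one_sided_p_value(3, 4, 4, 8)
p_four = one_sided_p_value(4, 4, 4, 8)
print(p_three, p_four)
```

With these margins there are 70 equally likely ways to choose four cups, so the two p-values correspond to 17/70 and 1/70.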
approach is the alternative that should be used to guarantee the type I error
rate. The exact conditional approach is widely available in the majority
of statistical software (SAS, R, StatXact, Stata, SPSS, etc.). For a simple
2 × 2 table, the p-value calculation should not take a long time even for a
large sample size.
The data from Example 1.1 is used to illustrate the application of the
conditional approach. This is a randomized, placebo-controlled, two-arm study
for patients with chronic non-cancer-related pain [3]. If it is assumed in advance
that the treatment response rate is higher than that of the placebo
group, then a one-sided hypothesis is appropriate, with the
alternative hypothesis Ha : pt > pc, where pt and pc are the response rates
for the treatment group and the placebo group, respectively. In this study, the
response rates for the treatment group and the placebo group are estimated as
pt = 27.1% and pc = 18.9%, respectively, from the observed data in Table
1.2. When both marginal totals are fixed as in the conditional approach,
the value, n11 , determines the other three values, and the sample space can
be simplified as a collection of all possible n11 values. Given the marginal
totals, n1 and n2 for column totals, m1 and m2 for row totals as in Table
1.1, the range of the possible values for n11 is from max(0, n1 + m1 − N)
to min(n1, m1). For this particular example, this range is from 0 to 99;
therefore, the size of the sample space for the conditional approach is 100.
For each data point in the sample space, its probability can be computed
from Equation (1.1). The null hypothesis will be rejected when n11 ≥ 58;
therefore, the p-value is calculated as $\sum_{n_{11}=58}^{99} P_C(n_{11}) = 0.028$.
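The construction of the conditional sample space can be sketched as follows. Because the counts of Table 1.2 are not reproduced here, the sketch uses the tea-tasting margins and one further set of hypothetical margins; it writes the probability in the binomial-coefficient form, which is algebraically the same as Equation (1.1). The function names are assumptions made for illustration.

```python
from math import comb

def sample_space(n1, m1, total):
    """All possible n11 values given column total n1 and row total m1."""
    lower = max(0, n1 + m1 - total)
    upper = min(n1, m1)
    return list(range(lower, upper + 1))

def prob_conditional(n11, n1, m1, total):
    """Binomial-coefficient form of Equation (1.1)."""
    return comb(n1, n11) * comb(total - n1, m1 - n11) / comb(total, m1)

# Tea-tasting margins: n1 = m1 = 4, N = 8.
tea_space = sample_space(4, 4, 8)
tea_total_prob = sum(prob_conditional(k, 4, 4, 8) for k in tea_space)

# Hypothetical margins where the lower bound is above zero.
other_space = sample_space(6, 5, 8)
other_total_prob = sum(prob_conditional(k, 6, 5, 8) for k in other_space)

print(tea_space, other_space)
```

Summing the probabilities over the whole sample space should give 1, which is a convenient check before summing only the tables in the rejection region.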