The Control of the False Discovery Rate in Multiple Testing Under Dependency
This content downloaded from 61.231.157.215 on Fri, 30 May 2025 09:45:48 UTC
All use subject to https://about.jstor.org/terms
The Annals of Statistics
2001, Vol. 29, No. 4, 1165-1188
Benjamini and Hochberg suggest that the false discovery rate may be
the appropriate error rate to control in many applied multiple testing
problems. A simple procedure was given there as an FDR controlling procedure
for independent test statistics and was shown to be much more powerful
than comparable procedures which control the traditional familywise error
rate. We prove that this same procedure also controls the false discovery
rate when the test statistics have positive regression dependency on each of
the test statistics corresponding to the true null hypotheses. This condition
for positive dependency is general enough to cover many problems of
practical interest, including the comparisons of many treatments with a single
control, multivariate normal test statistics with positive correlation matrix
and multivariate t. Furthermore, the test statistics may be discrete, and
the tested hypotheses composite without posing special difficulties. For all
other forms of dependency, a simple conservative modification of the
procedure controls the false discovery rate. Thus the range of problems for which
a procedure with proven FDR control can be offered is greatly increased.
1. Introduction.
1166 Y. BENJAMINI AND D. YEKUTIELI
between the treatment and the control groups. These endpoints included,
among others, the number of patients developing hypercalcemia, the number
of episodes, the time the episodes first appeared, number of fractures
and morbidity. As is clear from the condensed information in the abstract,
the researchers were interested in all 18 particular potential benefits of the
treatment.
All six p-values less than 0.05 are reported as significant findings. No
adjustment for multiplicity was tried nor even a concern voiced.
CONTROLLING THE FDR UNDER DEPENDENCY 1167
between Scylla and Charybdis" in Lander and Kruglyak (1995)] has been
heavily debated.
1.2. The false discovery rate. The false discovery rate (FDR), suggested by
Benjamini and Hochberg (1995), is a new and different point of view for how
the errors in multiple testing could be considered. The FDR is the expected
proportion of erroneous rejections among all rejections. If all tested hypotheses
are true, controlling the FDR controls the traditional FWE. But when many
of the tested hypotheses are rejected, indicating that many hypotheses are
not true, the error from a single erroneous rejection is not always as crucial
for drawing conclusions from the family tested, and the proportion of errors
is controlled instead. Thus we are ready to bear with more errors when many
hypotheses are rejected, but with fewer when fewer are rejected. (This
frequentist goal has a Bayesian flavor.) In many applied problems it has been argued
that the control of the FDR at some specified level is the more appropriate
response to the multiplicity concern: examples are given in Section 2.1 and
discussed in Section 4.
The practical difference between the two approaches is neither trivial nor
small, and the larger the problem the more dramatic the difference is. Let us
demonstrate this point by comparing two specific procedures, as applied to
Example 1.1. To fix notation, let us assume that of the $m$ hypotheses tested
$\{H_1^0, H_2^0, \ldots, H_m^0\}$, $m_0$ are true null hypotheses, the number and identity of
which are unknown. The other $m - m_0$ hypotheses are false. Denote the
corresponding random vector of test statistics $\{X_1, X_2, \ldots, X_m\}$, and the
corresponding p-values (observed significance levels) by $\{P_1, P_2, \ldots, P_m\}$, where
$P_i = 1 - F_{H_i^0}(X_i)$.
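Benjamini and Hochberg (1995) showed that when the test statistics are
independent the following procedure controls the FDR at level $q \cdot m_0/m \le q$.
THE BENJAMINI HOCHBERG PROCEDURE. Let $P_{(1)} \le P_{(2)} \le \cdots \le P_{(m)}$ be the
ordered observed p-values. Define
(1)    $k = \max\{i : P_{(i)} \le \frac{i}{m} q\}$,
and reject $H_{(1)}^0, \ldots, H_{(k)}^0$. If no such $i$ exists, reject no hypothesis.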
cedure rejects the two hypotheses with p-values less than 0.001, just as the
Bonferroni procedure does. The FDR controlling procedure rejects the four
hypotheses with p-values less than 0.01. In this study the ninth p-value is
compared with 0.005 if FWE control is required, with 0.025 if FDR control is
desired.
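The two thresholds quoted for the ninth p-value can be checked with a few lines of arithmetic. The sketch below assumes m = 18 endpoints and level 0.05, and it assumes that the FWE comparison uses the step-up constants of Hochberg (1988), level/(m - i + 1); the text gives only the resulting numbers, so that choice of FWE procedure is an assumption. The variable names are illustrative.

```python
# Thresholds for the i-th smallest p-value out of m, at overall level 0.05.
m = 18        # number of endpoints in the example
level = 0.05  # FWE level alpha, also used as the FDR level q
i = 9         # rank of the p-value being examined

# Assumed FWE step-up constant (Hochberg 1988): alpha / (m - i + 1).
fwe_threshold = level / (m - i + 1)

# FDR step-up constant from the procedure in the text: i * q / m.
fdr_threshold = i * level / m

print(f"FWE threshold: {fwe_threshold:.3f}")
print(f"FDR threshold: {fdr_threshold:.3f}")
```

The FDR constant is five times the FWE constant at this rank, and the ratio grows with the rank and with the size of the family.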
More details about the concept and procedures, other connections and
historical references are discussed in Section 2.2.
1.3. The problem. When trying to use the FDR approach in practice,
dependent test statistics are encountered more often than independent ones,
the multiple endpoints example above being a case in point. A simulation
study by Benjamini, Hochberg and Kling (1997) showed that the same
procedure controls the FDR for equally positively correlated normally distributed
(possibly Studentized) test statistics. The study also showed, as demonstrated
above, that the gain in power is large. In the current paper we prove that the
procedure controls the FDR in families with positively dependent test
statistics (including the case investigated in the mentioned simulation study). In
other cases of dependency, we prove that the procedure can still be easily
modified to control the FDR, although the resulting procedure is more conservative.
Since we prove the theorem for the case when not all tested hypotheses
are true, the structure of the dependency assumed may be different for the
set of the true hypotheses and for the false. We shall obviously assume that
at least one of the hypotheses is true, otherwise the FDR is trivially 0. The
following property, which we call positive regression dependency on each one
from a subset $I_0$, or PRDS on $I_0$, captures the positive dependency structure
for which our main result holds. Recall that a set $D$ is called increasing if
$x \in D$ and $y \ge x$ imply that $y \in D$ as well.
PROPERTY PRDS. For any increasing set $D$, and for each $i \in I_0$,
$P(X \in D \mid X_i = x)$ is nondecreasing in $x$.
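As a small illustration of the property, consider a standard bivariate normal pair with correlation rho and the increasing set D = {(x1, x2): x2 > c}. The conditional law of X2 given X1 = x is normal with mean rho*x and variance 1 - rho^2, a standard fact, so the conditional probability is available in closed form. The sketch below evaluates it on a grid; the function name and the particular values of c and rho are assumptions made for illustration only.

```python
import math
from statistics import NormalDist

def prob_upper_given(x, c, rho):
    """P(X2 > c | X1 = x) for a standard bivariate normal with correlation rho."""
    cond_mean = rho * x                    # conditional mean of X2 given X1 = x
    cond_sd = math.sqrt(1.0 - rho ** 2)    # conditional standard deviation
    return 1.0 - NormalDist(cond_mean, cond_sd).cdf(c)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
# Positive correlation: the probability of the increasing set grows with x.
positive = [prob_upper_given(x, c=1.0, rho=0.5) for x in xs]
# Negative correlation: the probability shrinks with x, so the property fails.
negative = [prob_upper_given(x, c=1.0, rho=-0.5) for x in xs]

print([round(v, 4) for v in positive])
print([round(v, 4) for v in negative])
```

This checks a single increasing set only; the property itself quantifies over all increasing sets.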
1.4. The results. We are now able to state our main theorems.
THEOREM 1.2. If the joint distribution of the test statistics is PRDS on the
subset of test statistics corresponding to true null hypotheses, the Benjamini
Hochberg procedure controls the FDR at level less than or equal to $\frac{m_0}{m} q$.
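A Monte Carlo check can make the statement of Theorem 1.2 concrete, although it proves nothing. The sketch below assumes equicorrelated normal test statistics with correlation rho >= 0 (one of the positively dependent families named in the abstract), one-sided p-values P_i = 1 - Phi(X_i), and a hypothetical mean shift for the false nulls. All parameter values and the function name are illustrative assumptions.

```python
import math
import numpy as np

def simulate_bh_fdr(m=8, m0=5, rho=0.5, shift=2.5, q=0.1, n_sim=20000, seed=1):
    """Estimate the FDR of the step-up procedure under equicorrelated normals."""
    rng = np.random.default_rng(seed)
    # The first m0 hypotheses are true nulls (mean 0); the rest are shifted.
    means = np.concatenate([np.zeros(m0), np.full(m - m0, shift)])
    common = rng.standard_normal((n_sim, 1))   # shared component gives correlation rho
    own = rng.standard_normal((n_sim, m))
    x = means + math.sqrt(rho) * common + math.sqrt(1.0 - rho) * own
    # One-sided p-values: upper tail of the standard normal.
    upper_tail = np.vectorize(lambda z: 0.5 * math.erfc(z / math.sqrt(2.0)))
    p = upper_tail(x)
    thresholds = q * np.arange(1, m + 1) / m   # i*q/m for i = 1..m
    total = 0.0
    for row in p:
        order = np.argsort(row)
        passed = np.nonzero(row[order] <= thresholds)[0]
        if passed.size:
            k = passed[-1] + 1                 # number of rejections
            total += np.count_nonzero(order[:k] < m0) / k
    return total / n_sim

estimate = simulate_bh_fdr()
print(f"estimated FDR {estimate:.4f}, bound {5 / 8 * 0.1:.4f}")
```

Up to simulation error, the estimate should not exceed the bound (m0/m) q = 0.0625 for these settings.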
As can be seen from the above summary, the results of this article greatly
increase the range of problems for which a powerful procedure with proven
FDR control can be offered.
2. Background.
$$Q = \begin{cases} V/R, & \text{if } R > 0, \\ 0, & \text{otherwise.} \end{cases}$$
Then the FDR is simply $E(Q)$. Their approach calls for controlling the FDR
at a desired level $q$, while maximizing $E(R)$.
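For a single realization, Q can be computed directly once the rejections and the (normally unknown) truth status of each hypothesis are given. The sketch below takes V to be the number of rejected true null hypotheses and R the total number of rejections; the function and argument names are hypothetical.

```python
def false_discovery_proportion(rejected, is_true_null):
    """Return Q = V/R if R > 0 and 0 otherwise, for one set of decisions."""
    R = sum(1 for r in rejected if r)                                 # all rejections
    V = sum(1 for r, t in zip(rejected, is_true_null) if r and t)     # erroneous ones
    return V / R if R > 0 else 0.0

# Three rejections, one of which is a true null: Q = 1/3.
q_some = false_discovery_proportion([True, True, False, True],
                                    [True, False, False, False])
# No rejections at all: Q is defined to be 0.
q_none = false_discovery_proportion([False, False], [True, False])
print(q_some, q_none)
```

The FDR is the expectation of this quantity over repeated experiments, so it cannot be read off a single data set.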
If all null hypotheses are true (the intersection null hypothesis holds) the
FDR is the same as the probability of making even one error. Thus controlling
the FDR controls the latter, and $q$ may be chosen at the conventional levels
for $\alpha$. Otherwise, when some of the hypotheses are true and some are false, the
FDR is smaller [Benjamini and Hochberg (1995)]. The control of FDR assumes
that when many of the tested hypotheses are rejected it may be preferable to
control the proportion of errors rather than the probability of making even
one error.
The FDR criterion, and the step-up procedure that controls it, have been
used successfully in some very large problems: thresholding of wavelet
coefficients [Abramovich and Benjamini (1996)], studying weather maps [Yekutieli
and Benjamini (1999)] and multiple trait location in genetics [Weller et al.
(1998)], among others. Another attractive feature of the FDR criterion is that
if it is controlled separately in several families at some level, then it is also
controlled at the same level at large (as long as the families are large enough,
and do not consist only of true null hypotheses).
Although the FDR controlling procedure has been implemented in standard
computer packages (MULTPROC in SAS), one of its merits is the simplicity
with which it can be performed by succinct examination of the ordered list
of p-values from the largest to the smallest: compare each $P_{(i)}$ to $i$
times $q/m$, stop the first time the former is smaller than or equal to the latter,
and reject that hypothesis and all hypotheses with smaller p-values. Rough
arithmetic is usually enough.
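The scan from the largest p-value downward can be sketched as follows. This is a minimal illustration, not the implementation found in any package; the function name and the demonstration p-values are hypothetical.

```python
def bh_scan(pvalues, q):
    """Return sorted indices of hypotheses rejected by the step-up scan."""
    m = len(pvalues)
    # order[i - 1] is the index of the i-th smallest p-value.
    order = sorted(range(m), key=lambda j: pvalues[j])
    # Examine P(m), P(m-1), ..., P(1); stop at the first P(i) <= i*q/m.
    for i in range(m, 0, -1):
        if pvalues[order[i - 1]] <= i * q / m:
            return sorted(order[:i])   # reject H(1), ..., H(i)
    return []                          # no p-value met its constant

# Hypothetical p-values, listed in arbitrary order.
demo = [0.5, 0.028, 0.005, 0.035, 0.025]
rejected = bh_scan(demo, q=0.05)
print(rejected)
```

In this example the p-value 0.025 exceeds its own constant 2(0.05)/5 = 0.02 but is rejected anyway, because the larger p-value 0.035 meets its constant 4(0.05)/5 = 0.04. That is the step-up behaviour.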
2.3. Historical background and related results. The FDR controlling
multiple testing procedure [Benjamini and Hochberg (1995)], given by (1), is a
step-up procedure that involves a linear set of constants on the p-value scale
(step-up in terms of test statistics, not p-values). The FDR controlling
procedure is related to the global test for the intersection hypothesis, which is
defined in terms of the same set of constants: reject the single intersection
hypothesis if there exists an $i$ s.t. $P_{(i)} \le \frac{i}{m}\alpha$. Simes (1986) showed that when
the test statistics are continuous and independent, and all hypotheses are
true, the level of the test is $\alpha$. The equality is referred to as Simes' equality,
and the test has been known in recent years as Simes' global test. However,
the result had already been proved by Seeger (1968). [Shaffer (1995) brought
this forgotten reference to the current literature.] See Sen (1999a, b) for an
even earlier, though indirect, reference.
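The global test described above returns a single decision about the intersection hypothesis. A minimal sketch, with a hypothetical function name and illustrative p-values:

```python
def simes_global_test(pvalues, alpha):
    """Reject the intersection hypothesis if some P(i) <= i*alpha/m."""
    ordered = sorted(pvalues)
    m = len(ordered)
    return any(p_i <= i * alpha / m for i, p_i in enumerate(ordered, start=1))

# The smallest p-value, 0.02, exceeds alpha/m = 0.0167, but the second
# smallest, 0.03, is below 2*alpha/m = 0.0333, so the test rejects.
reject_a = simes_global_test([0.04, 0.03, 0.02], alpha=0.05)
reject_b = simes_global_test([0.2, 0.3, 0.4], alpha=0.05)
print(reject_a, reject_b)
```

The function returns only True or False. It does not say which individual hypotheses are false, which is the distinction between a global test and a multiple testing procedure.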
Simes (1986) also suggested the procedure given by (1) as an informal
multiple testing procedure, and so did Elkund, some 20 years earlier [Seeger
(1968)]. The distinction between a global test and a multiple testing
procedure is important. If the single intersection hypothesis is rejected by a global
test, one cannot further point at the individual hypotheses which are false.
When some hypotheses are true while others are false (i.e., when $m_0 < m$),
Seeger (1968) showed, referring to Elkund, and Hommel (1988) showed,
referring to Simes, that the multiple testing procedure does not necessarily control
the FWE at the desired level. Therefore, from the perspective of FWE control,
it should not be used as a multiple testing procedure. Other multiple testing
procedures that control the FWE have been derived from the Seeger-Simes
equality, for example, by Hochberg (1988) and Hommel (1988).
Interest in the performance of the global test when the test statistics are
dependent started with Simes (1986), who investigated whether the procedure
is conservative under some dependency structures, using simulations. On the
negative side, it has been established by Hommel (1988) that the FWE can
get as high as $\alpha(1 + 1/2 + \cdots + 1/m)$. The joint distribution for which this
upper bound is achieved is quite bizarre, and rarely encountered in practice.
But even with tamed distributions, the global test does not always control
the FWE at level $\alpha$. For example, when two test statistics are normally
distributed with negative correlation the FWE is greater than $\alpha$, even though the
difference is very small for conventional levels [Hochberg and Rom (1995)].
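To give a sense of scale for Hommel's bound alpha(1 + 1/2 + ... + 1/m), the sketch below tabulates it for a few family sizes. The function name and the chosen values of m are illustrative assumptions.

```python
def hommel_upper_bound(alpha, m):
    """Worst-case level alpha * (1 + 1/2 + ... + 1/m) under arbitrary dependency."""
    return alpha * sum(1.0 / j for j in range(1, m + 1))

bounds = {m: hommel_upper_bound(0.05, m) for m in (1, 2, 10, 100)}
for m, bound in bounds.items():
    print(f"m = {m:3d}: bound = {bound:.4f}")
```

The harmonic sum grows roughly like log m, so the worst-case inflation over the nominal level increases slowly with the number of hypotheses.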
On the other hand, extensive simulation studies had shown that for
positively dependent test statistics, the test is generally conservative. These results
were followed by efforts to extend theoretically the scope of conservativeness,
starting with Hochberg and Rom (1995). These efforts have been reviewed in
the most recent addition to this line of research by Sarkar (1998). An
extensive discussion with many references can be found in Hochberg and Hommel
(1998).
Directly relevant to our work are the two strongest results for positively
dependent test statistics, those of Chang, Rom and Sarkar (1996) and of
Sarkar (1998); the latter proved the conservativeness for multivariate
distributions with MTP$_2$ densities. The condition for positive dependency is weaker
in the first, but the proof applies to bivariate distributions only. Theorem 1.2,
when applied to the limited situation where all null hypotheses are true,
generalizes the result of Chang, Rom and Sarkar (1996) to multivariate
distributions. Although the final result is somewhat stronger than that of
Sarkar (1998), the generalization is hardly of importance for the limited case
in which all tested hypotheses are true. The full strength of Theorem 1.2 is in
the situation when some hypotheses may be true and some may be false, where
the full strength of a multiple testing procedure is needed. For this situation
the results of Section 2.1 for independent test statistics are the only ones available.
3.1. Distributions.
PROOF. For any $i \in I_0$, denote by $X^{(i)}$ the remaining $m - 1$ test statistics;
$\mu^{(i)}$ is its mean vector, $\Sigma_{(i)i}$ is the column of covariances of $X_i$ with $X^{(i)}$, and
$\Sigma_{(i,i)}$ is $\Sigma$ after dropping the $i$th row and column.
The distribution of $X^{(i)}$ given $X_i = x_i$ is $N(\tilde\mu^{(i)}, \tilde\Sigma^{(i)})$, where
as $x_i$ increases; that is, for any increasing function $f$, if $x_i \le x_i'$ then
The proof of this lemma is somewhat delicate and lengthy and is given in
the Appendix. Condition (d) of the lemma depends on both the transformation
$g_i$ and the distribution of $Y_i$ and $U$. In the following example condition (d) is
asserted via the stronger TP$_2$ condition.
the TP$_2$ property for each pair $(U_i, W)$, $i = 0, 1$. Since for $i = 0, 1$,
EXAMPLE 3.4. The study of uterine weights of mice reported by Steel and
Torrie (1980) and discussed in Westfall and Young (1993) comprised a
comparison of six groups receiving different solutions to one control group. The
lower-tailed p-values of the pooled variance t-statistics are 0.183, 0.101, 0.028,
0.012, 0.003, 0.002. Westfall and Young (1993) show that, using p-value
resampling and step-down testing, three hypotheses are rejected at FWE 0.05. Four
hypotheses are rejected when applying procedure (1) using FDR level of 0.05.
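The count of four rejections can be reproduced from the six p-values quoted in the example. The sketch below applies the step-up rule with q = 0.05; the function name is hypothetical, and only the FDR result is recomputed, since the resampling analysis of Westfall and Young requires the raw data.

```python
def bh_rejection_count(pvalues, q):
    """Largest k such that the k-th smallest p-value is <= k*q/m (0 if none)."""
    ordered = sorted(pvalues)
    m = len(ordered)
    k = 0
    for i, p_i in enumerate(ordered, start=1):
        if p_i <= i * q / m:
            k = i   # keep the largest rank that meets its constant
    return k

# Lower-tailed p-values of the six comparisons with the control group.
uterine_pvalues = [0.183, 0.101, 0.028, 0.012, 0.003, 0.002]
n_rejected = bh_rejection_count(uterine_pvalues, q=0.05)
print(n_rejected)
```

The fourth smallest p-value, 0.028, is compared with 4(0.05)/6, about 0.033, and the fifth, 0.101, with 5(0.05)/6, about 0.042, which is where the procedure stops.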
EXAMPLE 3.5 (Low lead levels and IQ). Needleman, Gunnoe, Leviton,
Reed, Presie, Maher and Barret (1979) studied the neuropsychologic effects
of unidentified childhood exposure to lead by comparing various
psychological and classroom performances between two groups of children differing
in the lead level observed in their shed teeth. While there is no doubt that
high levels of lead are harmful, Needleman's findings regarding exposure to
low lead levels, especially because of their contribution to the Environmental
Protection Agency's review of lead exposure standards, are controversial.
Needleman's study was attacked on the ground of methodological flaws; for
details see Westfall and Young (1993). One of the methodological flaws pointed
out is control of multiplicity. Needleman et al. (1979) present three families of
TABLE 1
The critics argue that multiplicity should be controlled for all families
jointly. Using Hochberg's method at 0.05 level, correcting within each
family, six hypotheses are rejected. Correcting for all 35 responses, lead is found
to have an adverse effect in only two out of 35 endpoints.
Applying procedure (1) at 0.05 FDR level, the attack on Needleman's findings
on grounds of inadequate multiplicity control is unjustified; whether analyzed
jointly or each family separately, lead was found to have an adverse effect in
more than a quarter of the endpoints.
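4. Proof of theorem. For ease of exposition let us denote the set of
constants in (1), which define the procedure, by
$$q_i = \frac{i}{m} q, \qquad i = 1, 2, \ldots, m.$$
Let $A_{v,s}$ denote the event that the Benjamini Hochberg procedure rejects
exactly $v$ true and $s$ false null hypotheses. The FDR is then
$$E(Q) = \sum_{s=0}^{m_1} \sum_{v=1}^{m_0} \frac{v}{v+s} \Pr(A_{v,s}),$$
where $m_1 = m - m_0$.
LEMMA 4.1.
$$\Pr(A_{v,s}) = \frac{1}{v} \sum_{i=1}^{m_0} \Pr\left(\{P_i \le q_{v+s}\} \cap A_{v,s}\right).$$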
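PROOF. For a fixed $v$ and $s$, let $\omega$ denote a subset of $\{1, \ldots, m_0\}$ of size $v$,
and $A_{v,s}^{\omega}$ the event in $A_{v,s}$ that the $v$ true null hypotheses rejected are $\omega$. Note
that $\Pr(\{P_i \le q_{v+s}\} \cap A_{v,s}^{\omega})$ equals $\Pr(A_{v,s}^{\omega})$ if $i \in \omega$, and is otherwise 0.
$$\sum_{i=1}^{m_0} \Pr\left(\{P_i \le q_{v+s}\} \cap A_{v,s}\right) = \sum_{i=1}^{m_0} \sum_{\omega} \Pr\left(\{P_i \le q_{v+s}\} \cap A_{v,s}^{\omega}\right)$$
(7)    $$= \sum_{\omega} \sum_{i=1}^{m_0} \mathbf{1}(i \in \omega) \Pr(A_{v,s}^{\omega}) = \sum_{\omega} v \Pr(A_{v,s}^{\omega}) = v \Pr(A_{v,s}).$$
$$E(Q) = \sum_{s=0}^{m_1} \sum_{v=1}^{m_0} \frac{v}{v+s} \left\{ \frac{1}{v} \sum_{i=1}^{m_0} \Pr\left(\{P_i \le q_{v+s}\} \cap A_{v,s}\right) \right\}$$
(8)    $$= \sum_{i=1}^{m_0} \left\{ \sum_{s=0}^{m_1} \sum_{v=1}^{m_0} \frac{1}{v+s} \Pr\left(\{P_i \le q_{v+s}\} \cap A_{v,s}\right) \right\}.$$
Denote by $C_k^{(i)} = \bigcup \{C_{v,s}^{(i)} : v + s = k\}$. For each $i$ the $C_k^{(i)}$ are disjoint, so the
FDR can be expressed as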
can also be described using the ordered set of the p-values in the range of
$p^{(i)}$, $p_{(1)}^{(i)} \le \cdots \le p_{(m-1)}^{(i)}$, in the following way:
(11)    $$D_k^{(i)} = \left\{p : q_{k+1} < p_{(k)}^{(i)},\ q_{k+2} < p_{(k+1)}^{(i)},\ \ldots,\ q_m < p_{(m-1)}^{(i)}\right\}$$
for $k = 1, \ldots, m - 1$, and $D_m^{(i)}$ is simply the entire space. Expressing $D_k^{(i)}$ as
above, it becomes clear that for each $k$, $D_k^{(i)}$ is a nondecreasing set.
We now shall make use of the PRDS property, which states that for p < p',
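$$\frac{\Pr\left(\{P_i \le q_k\} \cap D_k^{(i)}\right)}{\Pr(P_i \le q_k)} + \frac{\Pr\left(\{P_i \le q_{k+1}\} \cap C_{k+1}^{(i)}\right)}{\Pr(P_i \le q_{k+1})}$$
(15)    $$\le \frac{\Pr\left(\{P_i \le q_{k+1}\} \cap D_k^{(i)}\right)}{\Pr(P_i \le q_{k+1})} + \frac{\Pr\left(\{P_i \le q_{k+1}\} \cap C_{k+1}^{(i)}\right)}{\Pr(P_i \le q_{k+1})}$$
$$= \frac{\Pr\left(\{P_i \le q_{k+1}\} \cap D_{k+1}^{(i)}\right)}{\Pr(P_i \le q_{k+1})}.$$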
Now, start by noting that $C_1^{(i)} = D_1^{(i)}$, and repeatedly use the above inequality
for $k = 1, \ldots, m - 1$, to fold the sum on the left into a single expression,
because $\Pr(P_i \le q_k) \le q_k = \frac{k}{m} q$ under the null hypothesis (with equality for
continuous test statistics where each $P_i$ is uniform), so finally, invoking (16),
REMARK 4.2. Note that PRDS is a sufficient but not a necessary condition.
In particular the PRDS property need not hold for all monotone sets $D$ and
all values of $p_i$. According to inequality (12), it is enough that they hold for
monotone sets of the form of (11) and $p_i \in [0, q]$.
This remark is used to establish that Theorem 1.2 holds for one-sided
multivariate $t$ and $q < 1/2$, even though the distribution is not PRDS.
(19)    $$E(Q) = \sum_{i=1}^{m_0} \sum_{k=1}^{m} \frac{1}{k} \Pr\left(P_i \le \frac{k}{m} q\right) \Pr\left(C_k^{(i)}\right)$$
THEOREM 5.1. For independent test statistics, the Benjamini Hochberg
procedure controls the FDR at level less than or equal to $\frac{m_0}{m} q$. If the test statistics are
also continuous, the FDR is exactly $\frac{m_0}{m} q$.
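The equality in Theorem 5.1 can be examined by simulation. The sketch below draws independent p-values, uniform for the true nulls and, as an arbitrary assumption, Beta(0.2, 1) for the false nulls, and averages the false discovery proportion of the step-up procedure. The function names, the Beta alternative and all parameter values are illustrative choices, not part of the theorem.

```python
import numpy as np

def bh_reject(p, q):
    """Boolean mask of hypotheses rejected by the step-up rule with constants i*q/m."""
    m = p.size
    order = np.argsort(p)
    passed = np.nonzero(p[order] <= q * np.arange(1, m + 1) / m)[0]
    rejected = np.zeros(m, dtype=bool)
    if passed.size:
        rejected[order[: passed[-1] + 1]] = True
    return rejected

def estimate_fdr(m=10, m0=6, q=0.1, n_sim=20000, seed=0):
    """Monte Carlo estimate of E(Q) for independent, continuous p-values."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_sim):
        null_p = rng.uniform(size=m0)             # true nulls: uniform p-values
        alt_p = rng.beta(0.2, 1.0, size=m - m0)   # false nulls: assumed, skewed toward 0
        rejected = bh_reject(np.concatenate([null_p, alt_p]), q)
        r = rejected.sum()
        if r > 0:
            total += rejected[:m0].sum() / r      # V/R for this replication
    return total / n_sim

fdr_hat = estimate_fdr()
print(f"estimated FDR {fdr_hat:.4f}, theoretical value {6 / 10 * 0.1:.4f}")
```

Replacing the Beta alternative by any other continuous distribution should leave the estimate near (m0/m) q = 0.06, which is the point of the theorem.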
The argument leading to the above theorem used only the fact that for
discrete test statistics the tail probabilities are smaller. Thus, in a similar
way, it follows that the FDR is controlled when the procedure is used for
testing composite null hypotheses, as in one-sided tests.
The surprising part of Theorem 5.1 is that equality holds no matter what
the distributions of the test statistics corresponding to the false null
hypotheses are. The following theorem shows that this is a unique property of the
step-up procedure which uses the constants $\{\frac{k}{m} q\}$. More generally, we can
define step-up procedures SU($\alpha$), using any other monotone series of
constants $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_m$: let $k = \max\{i : P_{(i)} \le \alpha_i\}$, and if such $k$ exists
reject $H_{(1)}, \ldots, H_{(k)}$.
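A generic step-up procedure SU(alpha) differs from the procedure of the previous sections only in its constants. The sketch below takes the constants as an argument and compares two choices on hypothetical p-values: the linear constants k q/m, and, as an assumed second example, the constants alpha/(m - k + 1) of Hochberg (1988). The function and variable names are illustrative.

```python
def step_up(pvalues, constants):
    """Reject H(1..k) with k = max{i : P(i) <= constants[i-1]}; return sorted indices."""
    order = sorted(range(len(pvalues)), key=lambda j: pvalues[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        if pvalues[j] <= constants[rank - 1]:
            k = rank   # remember the largest rank meeting its constant
    return sorted(order[:k])

p_demo = [0.005, 0.025, 0.028, 0.035, 0.5]   # hypothetical p-values
m_demo, level = len(p_demo), 0.05

linear = [k * level / m_demo for k in range(1, m_demo + 1)]           # k*q/m
hochberg = [level / (m_demo - k + 1) for k in range(1, m_demo + 1)]   # assumed alternative

rejected_linear = step_up(p_demo, linear)
rejected_hochberg = step_up(p_demo, hochberg)
print(rejected_linear, rejected_hochberg)
```

Both sequences are nondecreasing, as the definition requires, but the ratio alpha_k/k is constant only for the linear constants.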
THEOREM 5.3. Testing $m$ hypotheses with SU($\alpha$), assume that the
distribution of the p-values, $P = (P_0, P_1)$, is jointly independent.
(i) If the ratio $\alpha_k/k$ is increasing in $k$, as the distribution of $P_1$ increases
stochastically the FDR decreases.
(ii) If the ratio $\alpha_k/k$ is decreasing in $k$, as the distribution of $P_1$ increases
stochastically the FDR increases.
PROOF. Given the set of critical values $\alpha_k$ for $k = 1, \ldots, m$ we define the
following sets:
(22)    $$C_k(\alpha) = \left\{p^{(i)} : p_{(k-1)}^{(i)} \le \alpha_k,\ p_{(k)}^{(i)} > \alpha_{k+1},\ \ldots,\ p_{(m-1)}^{(i)} > \alpha_m\right\}$$
Thus if $p^{(i)} \in C_k(\alpha)$ and $P_i \le \alpha_k$ then $H_i^0$ is rejected along with $k - 1$ other
hypotheses, but if $P_i > \alpha_k$, $H_i^0$ is not rejected. Notice that the sets $C_k(\alpha)$ are
ordered. If $p^{(i)} \in C_k(\alpha)$ and $p^{(i)} \le p'^{(i)}$, then all ordered coordinates of $p'^{(i)}$
are greater than or equal to the corresponding coordinates of $p^{(i)}$. Therefore for $j =
k, \ldots, m - 1$, $p_{(j)}'^{(i)} > \alpha_{j+1}$, thus $p'^{(i)} \in C_l(\alpha)$ for some $l \le k$.
Next we define the function $f_\alpha$, $f_\alpha : [0, 1]^{m-1} \to \Re$,
(30) E (m < q.
Other procedures, which get closer to controlling the FDR at the desired level,
have been offered for independent test statistics in Benjamini and Hochberg
(2000), and in Benjamini and Wei (1999). Only little is known about the
performance of the first for dependent test statistics [Benjamini, Hochberg and
Kling (1997)], and nothing about the second.
Finally, recall the resampling based procedure of Yekutieli and Benjamini
(1999), which tries to cope with the above problem and at the same time
utilize the information about the dependency structure derived from the sample.
The resampling based procedure is more powerful, at the expense of greater
complexity and only approximate FDR control.
APPENDIX
$$\Pr(X \in D \mid X_i = x)$$
is increasing in $x$. We will achieve this by expressing
We prove the lemma in two steps.
1. For each $x \le x'$ we construct a new random variable $U'$ whose marginal
distribution is stochastically smaller than the marginal distribution of $U$,
but its conditional distribution given $X_i = x'$ is identical to the conditional
distribution of $U$ given $X_i = x$.
2. We show that the newly defined random variable $U'$ satisfies
$$U' = h_{x,x'}(U)$$
and is, from (35), stochastically smaller than $U$. Because $g$, $Y$ and $U$ are
continuous, the conditional distribution of $U$ given $X_i$ is continuous, hence
$h_{x,x'}$ and its inverse $h_{x',x}$ can be defined. Using the notation
(i) $u \le u'$, again because of (35), and $h_{x',x}$ being its inverse.
(ii) $F_{U \mid X_i = x}(u) = F_{U \mid X_i = x'}(u')$, which follows directly from the definition of
$h_{x,x'}$.
(iii) The events $U \le u'$ and $U' \le u$ are identical, as $U'$ is a monotone
function of $U$.
REMARK A.1. Note that the seemingly simple route of proving Lemma 3.1
via showing
$$\Pr(X \in D \mid Y_i = y, U = u)$$
in $y$ and in $u$. However, because $g_i$ is increasing, fixing $X_i$ and increasing $U$
will decrease $Y_i$, because $Y$ is PRDS, and
(41)    $$\Pr(X \in D \mid X_i = x, U = u)$$
does not necessarily increase in $u$. If expression (41) increases in $u$, for example
when the components of $Y$ are independent, proof of Lemma 3.2 is immediate
because the distribution of $U \mid X_i = x'$ is stochastically greater than the
distribution of $U \mid X_i = x$.
REMARK A.4. If conditions (a)-(c) of the lemma are met, while condition
(d), that $U$ and $Y_i$ are PRDS on $X_i$, is only true for $X_i$ such that $X_i > x_i$, then,
altering the proof accordingly, $X$ is PRDS on $X_i > x_i$.
REFERENCES
HOCHBERG, Y. and HOMMEL, G. (1998). Step-up multiple testing procedures. Encyclopedia Statist.
Sci. (Supp.) 2.
HOCHBERG, Y. and ROM, D. (1995). Extensions of multiple testing procedures based on Simes'
test. J. Statist. Plann. Inference 48 141-152.
HOCHBERG, Y. and TAMHANE, A. (1987). Multiple Comparison Procedures. Wiley, New York.
HOLLAND, P. W. and ROSENBAUM, P. R. (1986). Conditional association and unidimensionality in
monotone latent variable models. Ann. Statist. 14 1523-1543.
HOLM, S. (1979). A simple sequentially rejective multiple test procedure. Scand. J. Statist. 6 65-70.
HOMMEL, G. (1988). A stage-wise rejective multiple test procedure based on a modified Bonferroni
test. Biometrika 75 383-386.
HSU, J. (1996). Multiple Comparisons Procedures. Chapman and Hall, London.
KARLIN, S. and RINOTT, Y. (1980). Classes of orderings of measures and related correlation
inequalities I. Multivariate totally positive distributions. J. Multivariate Anal. 10
467-498.
KARLIN, S. and RINOTT, Y. (1981). Total positivity properties of absolute value multinormal
variable with applications to confidence interval estimates and related probabilistic
inequalities. Ann. Statist. 9 1035-1049.
LANDER, E. S. and BOTSTEIN, D. (1989). Mapping Mendelian factors underlying quantitative traits
using RFLP linkage maps. Genetics 121 185-190.
LANDER, E. S. and KRUGLYAK, L. (1995). Genetic dissection of complex traits: guidelines for
interpreting and reporting linkage results. Nature Genetics 11 241-247.
LEHMANN, E. L. (1966). Some concepts of dependence. Ann. Math. Statist. 37 1137-1153.
NEEDLEMAN, H., GUNNOE, C., LEVITON, A., REED, R., PRESIE, H., MAHER, C. and BARRET, P. (1979).
Deficits in psychologic and classroom performance of children with elevated dentine
lead levels. New England J. Medicine 300 689-695.
PATERSON, A. H. G., POWLES, T. J., KANIS, J. A., MCCLOSKEY, E., HANSON, J. and ASHLEY, S. (1993).
Double-blind controlled trial of oral clodronate in patients with bone metastases from
breast cancer. J. Clinical Oncology 1 59-65.
ROSENBAUM, P. R. (1984). Testing the conditional independence and monotonicity assumptions of
item response theory. Psychometrika 49 425-436.
SARKAR, T. K. (1969). Some lower bounds of reliability. Technical Report, 124, Dept. Operation
Research and Statistics, Stanford Univ.
SARKAR, S. K. (1998). Some probability inequalities for ordered MTP2 random variables: a proof
of Simes' conjecture. Ann. Statist. 26 494-504.
SARKAR, S. K. and CHANG, C. K. (1997). The Simes method for multiple hypotheses testing with
positively dependent test statistics. J. Amer. Statist. Assoc. 92 1601-1608.
SEEGER, P. (1968). A note on a method for the analysis of significances en masse. Technometrics 10
586-593.
SEN, P. K. (1999a). Some remarks on Simes-type multiple tests of significance. J. Statist. Plann.
Inference 82 139-145.
SEN, P. K. (1999b). Multiple comparisons in interim analysis. J. Statist. Plann. Inference 82
5-23.
SHAFFER, J. P. (1995). Multiple hypotheses-testing. Ann. Rev. Psychol. 46 561-584.
SIMES, R. J. (1986). An improved Bonferroni procedure for multiple tests of significance.
Biometrika 73 751-754.
STEEL, R. G. D. and TORRIE, J. H. (1980). Principles and Procedures of Statistics: A Biometrical
Approach, 2nd ed. McGraw-Hill, New York.
TAMHANE, A. C. (1996). Multiple comparisons. In Handbook of Statistics (S. Ghosh and C. R. Rao,
eds.) 13 587-629. North-Holland, Amsterdam.
TAMHANE, A. C. and DUNNETT, C. W. (1999). Stepwise multiple test procedures with biometric
applications. J. Statist. Plann. Inference 82 55-68.
TROENDLE, J. (2000). Stepwise normal theory tests procedures controlling the false discovery rate.
J. Statist. Plann. Inference 84 139-158.
WASSMER, G., REITMER, P., KIESER, M. and LEHMACHER, W. (1999). Procedures for testing multiple
endpoints in clinical trials: an overview. J. Statist. Plann. Inference 82 69-81.
WELLER, J. I., SONG, J. Z., HEYEN, D. W., LEWIN, H. A. and RON, M. (1998). A new approach to the
problem of multiple comparison in the genetic dissection of complex traits. Genetics
150 1699-1706.
WESTFALL, P. H. and YOUNG, S. S. (1993). Resampling Based Multiple Testing. Wiley, New York.
WILLIAMS, V. S. L., JONES, L. V. and TUKEY, J. W. (1999). Controlling error in multiple comparisons,
with special attention to the National Assessment of Educational Progress. J. Behav.
Educ. Statist. 24 42-69.
YEKUTIELI, D. and BENJAMINI, Y. (1999). A resampling based false discovery rate controlling
multiple test procedure. J. Statist. Plann. Inference 82 171-196.