Board of The Foundation of The Scandinavian Journal of Statistics
Scand J Statist 6: 65-70, 1979
ABSTRACT. This paper presents a simple and widely applicable multiple test procedure of the sequentially rejective type, i.e. hypotheses are rejected one at a time until no further rejections can be done. It is shown that the test has a prescribed level of significance protection against error of the first kind for any combination of true hypotheses. The power properties of the test and a number of possible applications are also discussed.

Key words: multiple test, simultaneous test

1. Introduction

The statistical problems arising in applications often involve a number of detail problems, i.e. there are often a number of interesting parameters to be estimated and/or a number of interesting hypotheses to be tested. In some cases these detail problems may be treated separately without any connection to each other. But in most cases the detail problems are connected to each other and the totality of solutions to the detail problems are used to get a general picture. In this latter case the statistician is faced with a multiple statistical inference problem, where he has to take into consideration that the different detail problems should be treated simultaneously.

Multiple statistical inference has been a vital research area within statistical inference theory the past 50 years, and methods have been proposed for several situations of practical interest. A good presentation of the earlier main results is given by Miller (1966). The multiple statistical inference methods are separated into two main types, multiple confidence interval methods and multiple test methods.

For multiple test procedures there has been suggested several types of properties, which the tests should have in order to give satisfactory protection against wrong decisions. Some of those are based on decision theoretic conceptions, while others are based on probabilities of making wrong decisions. In this paper we will study multiple test procedures and we will use the most common type of protection against error of the first kind by requiring the tests to have a small probability of rejecting any true hypotheses. The methodological motivation and exact definition is the following.

Let the (detail) hypotheses in a multiple test problem be denoted by $H_1, H_2, \ldots, H_n$ and the alternatives to those by $K_1, K_2, \ldots, K_n$. A (non-randomized) multiple test procedure is a rule assigning to each outcome a set of rejected hypotheses (which might be empty). This means that there are also $n$ critical regions $C_1, C_2, \ldots, C_n$ consisting of those outcomes for which the corresponding hypotheses are rejected.

In a test of a single null hypothesis $H_1$ against an alternative $K_1$ the size of the test is defined as the supremum of the probability of the critical region $C_1$ when the hypothesis $H_1$ is true. This probability of error of the first kind is always kept at (or below) a small predetermined level $\alpha$. The philosophical reason for this is that when we have made a 'discovery' by rejecting the null hypothesis we can quite safely claim that the null hypothesis is not true, because if it was true, we should have accepted it with a probability of at least $1 - \alpha$. This also implies that we do not make any 'discovery' by accepting the null hypotheses, because we do not have such a protection against errors of the second kind.

In a multiple test of a number of hypotheses $H_1, H_2, \ldots, H_n$ there are a lot of possible combinations of null hypotheses. If we want to make our 'discoveries' in form of rejected null hypotheses to be safely claimed, we must keep the probability of rejecting any true null hypotheses small, how many and which the true hypotheses may be. Thus we are led to the following definition.

Definition. A multiple test procedure with critical regions $C_1, C_2, \ldots, C_n$ for testing hypotheses $H_1, H_2, \ldots, H_n$ is said to have a multiple level of significance $\alpha$ (for free combinations) if for any non-empty index set $I \subseteq \{1, 2, 3, \ldots, n\}$ the supremum of the probability $P(\bigcup_{i \in I} C_i)$ when $H_i$ are true for all $i \in I$ is smaller than or equal to $\alpha$.

The words 'for free combinations' are put into the definition in order to underline that all subsets of
null hypotheses could appear as the set of true hypotheses. There might be situations in which all subsets are not allowed for some reason, for instance situations where the truth of two hypotheses implies the truth or falseness of a third hypothesis. It is to be observed that a multiple level of significance $\alpha$ for some restricted combinations imposes fewer conditions on the test procedure than a multiple level of significance $\alpha$ for free combinations, i.e. a test procedure with multiple level of significance $\alpha$ for free combinations has a multiple level of significance $\alpha$ for any type of restricted combinations.

In our setting the basic hypotheses $H_1, H_2, \ldots, H_n$ are minimal in the sense of Gabriel (1969). This means that if $\omega_1, \omega_2, \ldots, \omega_n$ are the parameter sets where the hypotheses $H_1, H_2, \ldots, H_n$ are true then the only (secondary) hypotheses to be tested are the hypotheses that the parameter belongs to intersections $\bigcap_{i \in I} \omega_i$ of sets $\omega_i$ for different index sets $I \subseteq \{1, 2, \ldots, n\}$.

We will exclusively discuss a type of multiple test procedures, which may be called sequentially rejective because basic hypotheses are rejected one at a time according to certain rules. Thus we do not make separate tests of all the (secondary) hypotheses that the parameter belongs to intersections $\bigcap_{i \in I} \omega_i$ for different $I \subseteq \{1, 2, \ldots, n\}$. We always consider such (secondary) hypotheses to be rejected as soon as any of the included basic hypotheses are rejected.

A test procedure is called coherent if it prevents the contradiction of rejecting a hypothesis without also rejecting all other hypotheses implying it. It is called consonant if it avoids dissonances consisting in rejecting a hypothesis and not rejecting any other hypotheses implied by it. (See Gabriel, 1969, pp. 229 and 231.) The sequentially rejective tests are coherent and consonant by their very definition.

In many applications there are logical implications among the basic hypotheses, i.e. some combinations of falseness of different basic hypotheses are not allowed because there are no possible parameter points corresponding to those combinations. Then we do not want the multiple test procedure to end up with a statement that the parameter belongs to such an empty set. This requirement has to be studied separately for each kind of logical implication. We will consider only the type of logical implications arising when we have two-sided alternatives for some parameters, and want to make one-sided statements.

The sequentially rejective multiple tests are not completely new. Tests of the same type are discussed by Naik (1975, p. 522), and the consonant closed procedures discussed by Marcus et al. (1976, p. 656) are equivalent to sequentially rejective tests. Marcus et al. (1976) give one particular example of such a test in an analysis of variance situation, and indicate that others can be constructed. But they do not seem to have thought of the simple and general procedure we present in the next section, because that procedure can easily be used to make one-sided rejections, which they have posed as a difficult problem. Our test is based on the simple Boole inequality and can be applied to any parametric or non-parametric model, but yet it has good power properties. It will be shown by examples that it may have considerably higher power than classical multiple test procedures. It also has surprisingly small loss of power compared to the special sequentially rejective tests (or equivalent consonant closed tests) that can be constructed in different parametric models, for instance analysis of variance models.

2. A simple sequentially rejective test

In this section we will present a simple sequentially rejective test, which is based on the Boole inequality. The use of the Boole inequality within multiple inference theory is usually called the Bonferroni technique, and for this reason we will call our test the sequentially rejective Bonferroni test.

When the $n$ hypotheses $H_1, H_2, \ldots, H_n$ are tested separately by using tests with the level $\alpha/n$ it follows immediately from the Boole inequality that the probability of rejecting any true hypotheses is smaller than or equal to $\alpha$. This constitutes then a multiple test procedure with the multiple level of significance $\alpha$ for free combinations, the classical Bonferroni multiple test procedure.

The separate tests in the classical Bonferroni multiple test are usually performed by using some test statistics, which we will denote here by $Y_1, Y_2, \ldots, Y_n$. We suppose now that this is the case, and also that these test statistics have a tendency of obtaining greater values when the corresponding hypothesis is not true. The critical level $\hat{\alpha}_k(y)$ for the outcome $y$ of the test statistic $Y_k$ is then equal to the supremum of the probability $P(Y_k \ge y)$ when the hypothesis $H_k$ is true. Defining now the obtained levels $R_1, R_2, \ldots, R_n$ by

$$R_k = \hat{\alpha}_k(Y_k)$$

the classical Bonferroni test can be performed by comparing all the obtained levels $R_1, R_2, \ldots, R_n$ with $\alpha/n$.

The sequentially rejective Bonferroni test will also be defined by the obtained levels. Denoting by $R^{(1)} \le R^{(2)} \le \ldots \le R^{(n)}$ the ordered obtained levels and by $H^{(1)}, H^{(2)}, \ldots, H^{(n)}$ the corresponding hypotheses, the procedure can most easily be described by scheme 1, where $\alpha$, $0 < \alpha < 1$, is a fixed number.
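The comparison of all obtained levels with $\alpha/n$ in the classical Bonferroni test, and the step-by-step comparison of the ordered levels with $\alpha/n, \alpha/(n-1), \ldots, \alpha/1$ in scheme 1, can be illustrated by a minimal Python sketch. It assumes that the obtained levels $R_1, \ldots, R_n$ have already been computed and that a hypothesis is rejected when its level is smaller than or equal to the comparison constant. The function names and the numerical levels are hypothetical and serve only as an illustration.

```python
from typing import List


def bonferroni_reject(levels: List[float], alpha: float) -> List[bool]:
    """Classical Bonferroni test: reject H_k when R_k <= alpha / n."""
    n = len(levels)
    return [r <= alpha / n for r in levels]


def sequentially_rejective_bonferroni(levels: List[float], alpha: float) -> List[bool]:
    """Step-down test following scheme 1 (assumed rejection rule: level <= constant).

    The ordered levels are compared with alpha/n, alpha/(n-1), ..., alpha/1.
    The procedure stops at the first ordered level that exceeds its constant,
    and all remaining hypotheses are accepted.
    """
    n = len(levels)
    # Indexes of the hypotheses, ordered by increasing obtained level.
    order = sorted(range(n), key=lambda k: levels[k])
    reject = [False] * n
    for step, k in enumerate(order):
        if levels[k] <= alpha / (n - step):
            reject[k] = True
        else:
            break  # accept this hypothesis and all later ones
    return reject


# Hypothetical obtained levels for n = 4 hypotheses.
levels = [0.011, 0.020, 0.004, 0.300]
print(bonferroni_reject(levels, 0.05))
print(sequentially_rejective_bonferroni(levels, 0.05))
```

With these illustrative levels the classical test rejects the first and third hypotheses only, while the step-down test also rejects the second one, since 0.020 is compared with 0.05/2 rather than with 0.05/4.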
Scheme 1
Start.
Step $k$ ($k = 1, 2, \ldots, n$): Is $R^{(k)} \le \alpha/(n-k+1)$?
No: accept $H^{(k)}, \ldots, H^{(n)}$ and stop.
Yes: reject $H^{(k)}$ and go to step $k+1$; stop after rejecting $H^{(n)}$.

The test is performed by starting at the top of the scheme and going down step by step until no further rejection can be done. This can happen either by accepting all remaining hypotheses or rejecting the last hypothesis $H^{(n)}$.

Theorem 1. The sequentially rejective Bonferroni test described by scheme 1 has the multiple level of significance $\alpha$ for free combinations.

Proof. Let $I$ be the set of indexes of the true hypotheses. By the Boole inequality we then have

$$P\left(R_i > \frac{\alpha}{m} \text{ for all } i \in I\right) = 1 - P\left(R_i \le \frac{\alpha}{m} \text{ for some } i \in I\right) \ge 1 - \sum_{i \in I} P\left(R_i \le \frac{\alpha}{m}\right) \ge 1 - \sum_{i \in I} \frac{\alpha}{m} = 1 - \alpha,$$

where $m$ is the number of elements in $I$. But if the event

$$\left\{R_i > \frac{\alpha}{m} \text{ for all } i \in I\right\}$$

occurs then [...] $\alpha/n, \alpha/(n-1), \ldots$, whereas in the classical Bonferroni test they are compared to $\alpha/n$. This means that the probability of rejecting any set of (false) hypotheses using the classical Bonferroni test is smaller than or equal to the same probability using the sequentially rejective Bonferroni test based on the same test statistics. The classical Bonferroni test has been used mainly in situations where no other (more special) multiple test procedure is available. It can always be replaced by the corresponding sequentially rejective Bonferroni test without losing any probability of rejecting false hypotheses. Except in trivial non-interesting cases the sequentially rejective Bonferroni test has strictly larger probability of rejecting false hypotheses and thus it ought to replace the classical Bonferroni test at all instants where the latter usually is applied.

The power gain obtained by using a sequentially rejective Bonferroni test instead of a classical Bonferroni test depends very much upon the alternative. It is small if all the hypotheses are 'almost true', but it may be considerable if a number of hypotheses are 'completely wrong'. If $m$ of the $n$ basic hypotheses are 'completely wrong' the corresponding levels attain small values, and these hypotheses are rejected in the first $m$ steps with a big probability. The other levels are then compared to $\alpha/k$ for $k = n-m, n-m-1, n-m-2, \ldots, 2, 1$, which is equivalent to performing a sequentially rejective Bonferroni test only on those hypotheses that are not 'completely wrong'.

A very simple example will indicate how big the power gain may be. Suppose that $Y_k$, $k = 1, 2, \ldots, 10$ are independent and normally distributed with parameters $\mu_k$ and 1 for $k = 1, 2, \ldots, 10$ and that we want to test the hypotheses $H_k$: $\mu_k = 0$ against the alternatives $\mu_k > 0$ for $k = 1, 2, \ldots, 10$ at a multiple level of significance 0.05. If four of the $\mu_k$'s are equal to 0.0, four of them are equal to 6.0 and the remaining two are equal to 3.0, the classical Bonferroni test rejects
both the latter hypotheses with probability 0.439, while the sequentially rejective Bonferroni procedure rejects both with probability 0.565.

The great advantage with the sequentially rejective Bonferroni test (as well as with the classical Bonferroni test) is its flexibility. There are no restrictions on the type of tests, the only requirement being that it should be possible to calculate the obtained level for each separate test. Further there are no problems in including in the analysis only the a priori interesting hypotheses, while more special multiple tests usually include all hypotheses of a certain kind. But when there exist logical implications among the hypotheses problems arise which we have to take into consideration.

Let as before $\omega_1, \omega_2, \omega_3, \ldots, \omega_n$ denote the parameter sets where the hypotheses $H_1, H_2, H_3, \ldots, H_n$ are true. Then there exists a logical implication as soon as there is some index set $I$ such that $\bigcap_{i \in I} \complement\omega_i = \emptyset$. In words this means that some combination of falseness of the different hypotheses is not possible, and the natural condition is of course that we should not end up the multiple test with a statement that the true parameter point is in an empty set. Each type of logical implication requires a special analysis of the properties of the test statistics in order to ensure that the test can not end up with such statements. The only type of logical implication we will consider is the one arising in connection with two-sided rejections.

Let $\gamma$ be a (one-dimensional) parameter and suppose that $H_1$: $\gamma \le \gamma_0$ and $H_2$: $\gamma \ge \gamma_0$ are basic hypotheses in a multiple test problem. Then $\complement\omega_1 \cap \complement\omega_2 = \emptyset$ and both these hypotheses should not be rejected in the multiple test procedure. It is natural to use the same test statistic to test both hypotheses and since we have the convention of rejecting the hypotheses for high values of the test statistics we should have $Y_2 = -Y_1$. Now for the outcomes $y_1$ of $Y_1$ and $y_2 = -y_1$ for $Y_2$ the obtained levels $\hat{\alpha}_1(y_1)$ and $\hat{\alpha}_2(y_2)$ satisfy

$$
\begin{aligned}
\hat{\alpha}_2(y_2) &= \sup_{\gamma \ge \gamma_0} P(Y_2 \ge y_2) \\
&\ge \sup_{\gamma = \gamma_0} P(Y_2 \ge y_2) = \sup_{\gamma = \gamma_0} \bigl(1 - P(Y_2 < y_2)\bigr) \\
&= \sup_{\gamma = \gamma_0} \bigl(1 - P(Y_1 > y_1)\bigr) \\
&= 1 - \inf_{\gamma = \gamma_0} P(Y_1 > y_1) \\
&\ge 1 - \inf_{\gamma = \gamma_0} P(Y_1 \ge y_1) \\
&\ge 1 - \sup_{\gamma = \gamma_0} P(Y_1 \ge y_1) \\
&\ge 1 - \sup_{\gamma \le \gamma_0} P(Y_1 \ge y_1) = 1 - \hat{\alpha}_1(y_1).
\end{aligned}
$$

This means that for any outcomes $y_1$ and $y_2 = -y_1$ of $Y_1$ and $Y_2$ at least one of the obtained levels $\hat{\alpha}_1(y_1)$ and $\hat{\alpha}_2(y_2)$ is $\ge \tfrac{1}{2}$, and thus both hypotheses $H_1$ and $H_2$ can not be rejected in a sequentially rejective Bonferroni test (or a classical Bonferroni test) for any multiple level of significance $\alpha \le \tfrac{1}{2}$.

If there are a number of pairs of one-sided hypotheses and no logical implications beside those within the pairs all illogical statements will still be avoided if the same statistics with opposite signs are used within the pairs and the multiple level of significance $\alpha$ is smaller than or equal to $\tfrac{1}{2}$. These tests are also coherent and consonant.

3. Applications and extensions

The sequentially rejective Bonferroni test can be applied in all situations where the classical Bonferroni test is usually applied. And it ought to replace the classical Bonferroni test in these cases because it gives only slightly more complicated computations and a non-negligible increase of power. It should however be noted that the sequentially rejective Bonferroni test can not be used to construct smaller confidence sets than those constructed by the classical Bonferroni test. This is so because the confidence set consists of the parameter points that would not be rejected as true parameter points in separate single tests. And when a confidence set is constructed from multiple tests it consists of the parameter points for which none of the detail hypotheses are rejected, which is in fact a special construction of a single test from a multiple test. If the sequentially rejective Bonferroni test is used in this way it is equivalent to the classical Bonferroni test.

The great advantage of the sequentially rejective Bonferroni test (as well as the classical Bonferroni test) is its computational simplicity, which arises from the reduction of the distributional problems to one dimension when the Boole inequality is used. The same computational simplicity is obtained when the test statistics are independent. It is easily seen that a sequentially rejective procedure with multiple level of significance $\alpha$ can be constructed by replacing the comparison constants $\alpha/n, \alpha/(n-1), \ldots, \alpha/1$ in the sequentially rejective Bonferroni test by $1 - (1-\alpha)^{1/n}, 1 - (1-\alpha)^{1/(n-1)}, \ldots, 1 - (1-\alpha)^{1}$, which are greater. This means that we get a more powerful test, but the increase in power is not very big. Among the numerous possible applications of the sequentially rejective Bonferroni test we will next mention a few.

The problem of comparing several treatments with one control has been studied by several authors. For the case of normally distributed observations the multiple test procedure suggested by Dunnett (1955) is commonly used. It requires the same number of
Scheme 2 [flow chart]

Now suppose that the event

$$\left\{S_i > \frac{\alpha}{\sum_{j \in I} c_j} \text{ for all } i \in I\right\}$$

occurs, and let $\nu$ be the smallest order number in the series $S^{(1)} \le S^{(2)} \le \ldots \le S^{(n)}$ attained by the variables $\{S_i : i \in I\}$. Then

$$S^{(\nu)} > \frac{\alpha}{\sum_{j \in I} c_j} \ge \frac{\alpha}{\sum_{j=\nu}^{n} c^{(j)}}$$

which implies that the procedure will stop in step $\nu$ or earlier and that all true hypotheses will be accepted.

From the definition of the generalized sequentially rejective Bonferroni test and the proof of Theorem 2 it can easily be seen what role is played by the constants $c_1, c_2, \ldots, c_n$. At each step in the procedure the obtained levels for the not yet rejected hypotheses

Acknowledgement

The referee's suggestions for improving the presentation are gratefully acknowledged.

References

Dunnett, C. W. (1955). A multiple comparison procedure for comparing several treatments with a control. J. Amer. Statist. Assoc. 50, 1096-1121.
Gabriel, K. R. (1969). Simultaneous test procedures-some theory of multiple comparisons. Ann. Math. Statist. 40, 224-250.
Haberman, S. J. (1974). The analysis of frequency data. The University of Chicago Press.
Holm, S. A. (1977). Sequentially rejective multiple test procedures. Statistical research report 1977-1, University of Umeå, Sweden.
Marcus, R., Peritz, E. & Gabriel, K. R. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika 63, 655-660.
Miller, R. G., Jr. (1966). Simultaneous statistical inference. McGraw-Hill, New York.
Naik, U. D. (1975). Some selection rules for comparing p processes with a standard. Communications in Statistics 4, 519-535.

Sture Holm
Department of Mathematics
Chalmers University of Technology
S-412 96 Göteborg
Sweden