Tutorial: Cluster RCT in Stata
The Stata Journal (2013) 13, Number 1, pp. 114–135

Sample-size calculations in cluster randomized controlled trials

K. Hemming and J. Marsh
1 Introduction
Sample-size calculations are frequently undertaken for cluster randomized controlled tri-
als (RCTs). This is usually done by prespecifying the average cluster size, obtaining the
sample size required under individual randomization, and inflating by the design effect
(DE), which is a simple function of the intracluster correlation (ICC) (Donner and Klar
2000). Alternatively, heterogeneity between clusters can be parameterized by the coefficient
of variation (standard deviation divided by the mean) of the outcome, and similar two-step
procedures used (Hayes and Bennett 1999). However, these two-step procedures are some-
times not efficient (for example, when many calculations are required) and sometimes
not quite so straightforward. The reasons are outlined below.
Cluster sample-size calculations are not completely straightforward in a number
of situations. Complexity arises in cases when the user prespecifies the number of
clusters available (as opposed to the average cluster size); when the user requires a
power or detectable difference calculation (as opposed to a sample-size calculation);
and particularly when the calculation involves binary outcomes. This is because the
conventional inflation by the DE is only useful when the user specifies the cluster size
and needs to obtain an estimate of the number of clusters needed. When the user
specifies the number of clusters available and needs to obtain an estimate of the cluster
size, the inflation over that which is required under individual randomization depends
on the very quantity the user is trying to compute, the cluster size.
Additionally, because gains in precision diminish as cluster sizes increase, some
designs will be infeasible (Guittet, Giraudeau, and Ravaud 2005). That is, irrespective
of how large the clusters are made, a fixed number of available clusters might mean
there is insufficient power to detect the required difference. When the objective is to
calculate power or detectable difference under cluster RCT designs of fixed sample sizes
for continuous outcomes, the user can use the simple relationships that exist between
those power and detectable differences obtainable under individual randomization and
those obtainable under cluster randomization (Hemming et al. 2011). To obtain an
estimate of the detectable difference for binary outcomes where the variance depends
on the proportion, the user must solve a quadratic equation. This is also the case
for the computation of detectable differences for continuous outcomes when the cluster
heterogeneity is parameterized by the coefficient of variation.
Currently, several options are available to Stata users planning a cluster RCT. The
sampsi command may be used to estimate the required number of clusters (for both
binary and continuous outcomes) via a two-step procedure that involves calculating the
sample size under individual randomization and inflating this by a self-computed DE. To
estimate power for continuous outcomes, the user could also use sampsi after inflating
the estimated standard deviation by the DE. For cluster designs, sampsi cannot be used
to estimate detectable differences, power for binary outcomes, or the number of clusters
required.
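For reference, a minimal sketch of this two-step procedure follows, using the values of example 1 below (proportions of 0.4 and 0.5, an assumed average cluster size of 23, and an assumed ICC of 0.005) and assuming that sampsi leaves the per-arm sample size in r(N_1):

. * Step 1: per-arm sample size under individual randomization
. sampsi 0.4 0.5, power(0.80) alpha(0.05)
. local nI = r(N_1)
. * Step 2: inflate by a self-computed design effect, DE = 1 + (m - 1)*rho
. local m   = 23
. local rho = 0.005
. local DE  = 1 + (`m' - 1)*`rho'
. display ceil(`nI'*`DE'/`m')    // 19 clusters per arm; clustersampsi would add one more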
Another two-step method consists of using the sampclus command (Garrett 2001),
which again requires the user to calculate the sample size required under individual
randomization immediately before implementing the command. With sampclus, the
user is permitted to specify either the number of clusters available or the cluster size, and
the command returns whichever is not specified. In cases where the number of clusters
available is insufficient to detect the required difference at the prespecified power level,
the user is alerted and informed of the minimum number of clusters required. sampclus
does not compute power (to detect a prespecified difference for a fixed sample size) or
detectable difference (to detect a prespecified power for a fixed sample size).
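As a sketch of this workflow, the user would first run sampsi and then call sampclus; the obsclus(), numclus(), and rho() option names used here are assumptions drawn from Garrett (2001) rather than verified against the command's help file:

. * two-step approach: run sampsi first, then sampclus
. sampsi 0.4 0.5, power(0.80) alpha(0.05)
. * either fix the cluster size and obtain the number of clusters needed ...
. sampclus, obsclus(23) rho(0.005)
. * ... or fix the number of clusters and obtain the required cluster size
. sampclus, numclus(20) rho(0.005)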
The command clsampsi (Batistatou and Roberts 2010) was developed primarily
for designs with differential clustering between arms. Differential clustering occurs, for
example, when the individuals in the intervention arm are grouped (say, group therapy)
but there is no grouping in the control arm. While clsampsi does offer a single-step
procedure that calculates both the power (for a prespecified difference for a fixed sample
size) and the sample size (either the number of clusters or the cluster size), it does not
compute the detectable difference and does not alert the user to infeasible designs.
2.1 Background
Suppose a trial will test the null hypothesis H0 : µ1 = µ2 , where µ1 and µ2 represent the
means of two populations, by using a two-sample t test and assuming that var(µ1) =
σ1² and var(µ2) = σ2². Suppose further that an equal number of individuals will be
randomized to both arms, letting d denote the difference to be detected such that
d = µ1 − µ2 , 1 − β denote the power, and α denote the significance level. Alternatively,
we may be interested in comparing two proportions, p1 and p2 , or two rates, λ1 and λ2 .
We limit our consideration to trials with two equal-sized parallel arms (two-sided t tests).
Then we assume normality of outcomes and approximate the variance of the difference
of the two proportions or two rates (Hemming et al. 2011). The approximations made
for binomial proportions (Armitage, Berry, and Matthews 2002) are slightly different
from those made in the sampsi command (details in appendix).
The sample size required under individual randomization is then inflated by the DE; for clusters of equal size m,

DE = 1 + (m − 1)ρ
where ρ is the ICC coefficient. This DE is modified for varying cluster sizes by a function
that depends on the coefficient of variation of the cluster sizes, cvsizes (Eldridge, Ashby,
and Kerry 2006; this term is not to be confused with the coefficient of variation of
outcomes, cvclusters , described above).
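As a minimal numeric illustration of how strongly the DE depends on the ICC, using the assumed values of example 1 below (equal cluster sizes of 23, so cvsizes = 0) and its two candidate ICC estimates:

. display 1 + (23 - 1)*0.005    // DE = 1.11
. display 1 + (23 - 1)*0.07     // DE = 2.54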
From this total sample size, the number of clusters (k) required per arm can be
calculated. We round up the number of clusters so that the total sample size is a
multiple of the cluster size (using the ceiling function). Additionally, we add one extra
cluster to each arm to allow for the use of the t distribution (Hayes and Bennett 1999).
If the user instead specifies the number of clusters available and needs to determine the
average cluster size (m) required, the formula can be rearranged to determine the cluster
size as a function of the sample size required under individual randomization,
the ICC, and the number of clusters (and also cvsizes). More detailed mathematical
formulas are provided in the appendix.
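For illustration, a sketch of this rearrangement with equal cluster sizes, using the values that appear in example 1 below (385 individuals per arm under individual randomization, ICC of 0.005, and 20 clusters per arm):

. * average cluster size given k clusters per arm: m = nI(1 - rho)/(k - 1 - rho*nI)
. display 385*(1 - 0.005)/(20 - 1 - 0.005*385)          // about 22.4
. display ceil(385*(1 - 0.005)/(20 - 1 - 0.005*385))    // rounded up to 23 per cluster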
The between-cluster heterogeneity may be parameterized using either the ICC coeffi-
cient or cvclusters ; clustersampsi permits specification of either parameter. The sample-
size formula for the cvclusters method is outlined below (Hayes and Bennett 1999). The
number of clusters k required is
k = 1 + nI/m + CVIF        (1)

where the coefficient of variation inflation factor (CVIF) is

CVIF = cvclusters²(µ1² + µ2²)(zα/2 + zβ)²/d²
where zα/2 denotes the upper 100α/2 standard normal centile.
It can be shown that when you parameterize the heterogeneity by using the ICC, the power for cluster
RCTs is the power available under individual randomization for a standardized effect
size that is deflated by the square root of the DE. Similarly, for cluster RCTs of fixed
sample size and prespecified power, the detectable difference is that of a trial using
individual randomization inflated by the square root of the DE (Hemming et al. 2011).
When parameterizing the heterogeneity with cvclusters , the power available is obtained
by a simple rearrangement of the sample-size formula [(1) above], whereas obtaining the
detectable difference involves solving a quadratic formula.
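A minimal sketch of the first of these relationships, using normal approximations and the values of example 1 below (proportions of 0.4 and 0.5, 20 clusters of 23 per arm, and an ICC of 0.005); the result is a little above the nominal 80% because the cluster size is rounded up and one cluster is allowed for the t distribution:

. * power under cluster randomization: power under individual randomization with
. * the standardized effect deflated by the square root of the DE
. local n  = 20*23                        // individuals per arm
. local DE = 1 + (23 - 1)*0.005           // design effect
. local v  = (0.4*0.6 + 0.5*0.5)/`n'      // variance of the difference in proportions
. display normal(0.1/sqrt(`DE'*`v') - invnormal(0.975))    // about 0.83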
When you parameterize the heterogeneity by the ICC, the following inequality must hold for the design to be feasible:

k > (nI × ρ) + 1
These formulas differ slightly from those reported elsewhere because of the addition of
one more cluster in each arm (to allow for the use of the t distribution). When you
parameterize the heterogeneity by the coefficient of variation, the following inequality
must hold for the design to be feasible:
k > CVIF + 1
Where these inequalities do not hold, the clustersampsi command will determine the
maximum available power to detect the prespecified difference, the minimum detectable
difference under the prespecified value for power, and the minimum number of clusters
required to detect the prespecified difference at the prespecified value of the power
(Hemming et al. 2011).
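A minimal sketch of these checks for example 1 below with the higher ICC (385 per arm under individual randomization, ICC of 0.07, difference of 0.1, equal cluster sizes), using normal approximations; the results are consistent with the minimum of 28 clusters per arm and maximum power of about 65% reported there:

. * feasibility requires k > rho*nI + 1
. display 0.07*385 + 1    // 27.95, so at least 28 clusters per arm
. * maximum achievable power with only k = 20 clusters per arm
. display normal(sqrt((20 - 1)*0.1^2/(0.07*(0.4*0.6 + 0.5*0.5))) - invnormal(0.975))    // about 0.65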
The clustersampsi dialog box is organized into three tabs: Main, Options, and Values.

1. The Main tab allows users to specify whether the calculation to be performed is
a sample-size calculation (default), a power calculation, or a detectable difference
calculation and whether this calculation is for binary outcomes, rates, or continuous
outcomes (default). If users specify a sample-size calculation, then they must also specify
whether they wish to prespecify the average cluster size (the default, in which
case the command computes the number of clusters required) or the number of clusters
available (in which case the command computes the average cluster size needed). On this
tab, the user also specifies the estimated ICC coefficient or the coefficient of variation.
2. The Options tab allows the user to specify the significance level (default 0.05),
the power (default 0.8), the number of clusters per arm, the cluster size (or average
cluster size), and cvsizes (default 0, indicating all the clusters are the same size).
The variables that must be specified on the Options tab depend on those
specified on the Main tab, and the user will only be able to input the variables
relevant to the calculation specified on the Main tab. For example, if the user
specifies a power calculation on the Main tab, the power option on the Options
tab will be shaded out. If the user specifies a sample-size calculation, then only
one of the number of clusters or the cluster size should be specified.
3. The Values tab allows the user to specify the proportion, rate, or mean (and
standard deviation) values for the two arms, along with an estimate of correla-
tion between any before-and-after measurements or the correlation between any
covariates and the outcome (default value of 0). The command is limited to a max-
imum of one before and one after measurement (that is, it cannot accommodate
additional repeated measurements). Once again, depending on the calculations
requested on the Main tab (that is, sample size, power, detectable difference, and
binary or continuous outcomes), those values not relevant are shaded out.
3 Examples
3.1 Example 1: Illustration of infeasible designs
In a real example, a cluster RCT will be designed to evaluate the effectiveness of support
to promote breastfeeding. Randomization will be carried out at a single point in time,
randomizing teams of midwives (the clusters) to either the intervention arm or the
standard care arm. The trial will be carried out within a single primary care trust,
so the number of clusters is limited to the 40 midwifery teams delivering care within
the region. A clinically important difference to detect is an increase in the rate of
breastfeeding from about 40% to 50%. Estimates of ICC range from 0.005 to 0.07 in
similar trials (MacArthur et al. 2003; MacArthur et al. 2009). Using these values, we
illustrate how clustersampsi can be used to determine the required cluster size.
Figure 1 shows a screenshot of the Main tab for this calculation to determine the
sample size for a Two sample comparison of proportions with an ICC of 0.005 (the lower
of the two ICC estimates).
Figure 2 shows the corresponding Options tab specifying a Significance level of 0.05
and 80% power. On this Options tab, the Number of clusters per arm is set at 20. The
Average cluster size is shaded out because this is a sample-size calculation specifying the
number of clusters and obtaining an estimate of the average cluster size required. The
Coefficient of variation of cluster sizes is left at the default value of 0 and so assumes
the cluster sizes are equal.
Figure 3 shows the Values tab for this calculation. Because this is a comparison
of binary proportions, the mean, standard deviation, and rate values are shaded out.
Proportion 1 is set at 0.4 and Proportion 2 at 0.5. The correlation between before-and-
after measurements is set at 0 because no baseline measurements are anticipated in this
cross-sectional study.
The Stata output from the command is shown below. The output shows that under
individual randomization, 385 individuals would be required per arm to detect a change
in proportions from 0.4 to 0.5 at 80% power and a 5% significance level. Allowing
for cluster randomization with 20 clusters per arm, a total of 23 individuals would be
required per cluster, equating to a total sample size of 460 per arm.
In a variation of this example, the ICC is replaced by the higher of the two estimates
of 0.07. The output for this computation is provided below. Under this estimate of the
ICC, the design becomes infeasible; that is, however many individuals are recruited per
cluster, it will not be possible to obtain 80% power to detect a difference between 0.4
and 0.5. In this scenario, the command alerts the user to this fact. The user is told
that the minimum number of clusters required to detect a change from 0.4 to 0.5 at
80% power is 28 per arm. Alternatively, the user is told that because of the prespecified
number of clusters (here, 20 per arm), the maximum achievable power would be in the
region of 65% (that is, with 20 clusters per arm to detect a difference from 0.4 to 0.5,
the study would have 65% power), and the minimum detectable difference is 0.12; that
is, the design would have 80% power to detect a change from 0.4 to 0.52.
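As a check on the reported minimum detectable difference of 0.12, a sketch of the quadratic solution given in the appendix (equal cluster sizes, no baseline adjustment, and the factor B taken as 1), with k = 20, ρ = 0.07, and p1 = 0.4:

. * coefficients of the quadratic 0 = a*p2^2 + b*p2 + c (see the appendix)
. local a1 = (20 - 1)/(0.07*(invnormal(0.975) + invnormal(0.8))^2)
. local a  = -1 - `a1'
. local b  = 1 + 2*`a1'*0.4
. local c  = 0.4*0.6 - `a1'*0.4^2
. display (-`b' + sqrt(`b'^2 - 4*`a'*`c'))/(2*`a')    // about 0.29 (a decrease)
. display (-`b' - sqrt(`b'^2 - 4*`a'*`c'))/(2*`a')    // about 0.52 (an increase)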
We illustrate how clustersampsi can be used to determine the effect sizes detectable
at 80% power under both estimates for the ICC for the fixed sample size. Initially, we
perform the calculations assuming the ICC is 0.018. The output for this calculation is
provided below and illustrates the use of cvsizes . The detectable event rate under the
intervention arm is 0.053 (assuming a decreasing event rate), which equates to a relative
risk of 0.69, that is, a relative risk reduction of 31%.
Because estimation of the ICC is subject to much uncertainty, we have also carried
out the calculation assuming the ICC is 0.038. Again the output is provided below.
Here the detectable event rate under the intervention arm is 0.049 (again assuming a
decreasing event rate), which equates to a relative risk of 0.63, that is, a 37% relative
risk reduction.
A clinically important relative risk is in the region of 0.65, which equates to an event
rate in the treatment group of 0.05. If the ICC is as high as 0.038, then the trial will
have less than 80% power to detect this difference. We illustrate how clustersampsi
can be used to determine the power available to detect the clinically important relative
risk, assuming the ICC is 0.038:
The power available to detect this difference is 75%, close to 80%. Thus the trial will
almost be sufficiently powered to detect this difference.
This is very close to the 36.2 reported by Hayes and Bennett. In the trial, only 28
clusters were recruited. We can therefore use clustersampsi to evaluate the power
that the trial would have had if limited to 28 clusters:
clustersampsi estimates the power to be about 69%, again similar to that reported
by Hayes and Bennett.
4 Conclusion
While cluster sample-size calculations are, for the most part, simple extensions of those
required under individual randomization, specific commands in Stata for this class of
problems should prove very useful. Some commands are currently available in Stata to
perform these calculations, but one is very basic and requires a two-step approach, and
the other is specifically designed for trials in which there is no clustering in the control
arm.
The command outlined here, clustersampsi, allows not only for clustering but also
for varying cluster sizes, for baseline measurements, or for adjustment for predictive
covariates. It also incorporates calculations of sample sizes, power, and detectable
differences. It will alert the user to infeasible designs and suggest possible options. The
user can parameterize cluster heterogeneity by using either the ICC coefficient or the
coefficient of variation. The dialog box for clustersampsi should allow straightforward
implementation for the most common types of cluster RCTs.
When we compare the output of clustersampsi with that of sampclus, the es-
timates from clustersampsi tend to result in slightly higher sample sizes because it
rounds up to a multiple of the average cluster size and because it adds one to the num-
ber of clusters. On the other hand, compared with the estimates from clsampsi, the
estimates from clustersampsi tend to be less conservative (that is, a slightly lower
estimated sample size or slightly higher estimated power) because of the noncentral F
distribution used by clsampsi. These differences are more marked at the parameter
boundaries (such as small proportions or few clusters).
We have used a number of approximations here. First, we have approximated the
variance of proportions and rates, we have assumed normality, and we have not made
continuity corrections. Continuity-corrected sample-size calculations are more conserva-
tive but are not considered optimal by everyone (Royston and Babiker 2002). More im-
portantly, we have also approximated the variance reduction due to correlation between
any baseline measurements for binary outcomes by using normality approximations.
For continuous outcome measurements in RCTs, adjustment for baseline measurements
will always lead to a reduction in the standard deviation by a factor that depends
on the correlation between the before-and-after measurements (Robinson and Jewell
1991). For binary outcomes (as opposed to continuous outcomes), although adjustment
for baseline measures will lead to an increase in power, this is not necessarily by the
same factor. However, it has been shown by others to provide a good approximation
(Hernández, Steyerberg, and Habbema 2004).
5 Appendix: Formulas
The formulas follow those already published (Hemming et al. 2011; Hayes and Bennett
1999), with some minor modifications. When the heterogeneity between clusters is
specified by the ICC, then the formulas in Hemming et al. (2011) are used but with the
addition of one to the number of clusters in each arm to account for the t distribution
rather than the normal distribution (as recommended by Hayes and Bennett [1999]).
When the heterogeneity between clusters is specified by the coefficient of variation,
then the formulas follow those in Hayes and Bennett (1999). The essential formulas for
both methods are described below.
When the heterogeneity is specified by the ICC, the sample size required under individual randomization is inflated by the variance inflation factor

VIF = 1 + (m − 1)ρ

For binary variables p1 and p2, we approximate sd1² = p1(1 − p1) and similarly for sd2².
For rates λ1 and λ2, we approximate the variances sd1² = λ1 and sd2² = λ2.
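A sketch of the resulting per-arm sample size under individual randomization for binary outcomes, nI = (zα/2 + zβ)²{p1(1 − p1) + p2(1 − p2)}/d², using the proportions 0.4 and 0.5 of example 1; the ceiling reproduces the 385 per arm reported there:

. local num = (invnormal(0.975) + invnormal(0.8))^2*(0.4*0.6 + 0.5*0.5)
. display ceil(`num'/0.1^2)    // 385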
The above formulas may be simply rearranged to compute power and detectable dif-
ferences for mean values. For detectable differences for binary outcomes, it is necessary
to solve the following quadratic to find the detectable difference p2:

0 = a p2² + b p2 + c

where

a = −1 − a1
b = 1 + 2a1p1
c = p1(1 − p1) − a1p1²

and where

a1 = (k − 1)m / {B × VIF × (zα/2 + zβ)²}
This provides two values for p2 that correspond to increasing and decreasing values.
If the user is limited to a fixed number of clusters and needs to determine the
number of observations per cluster, then (5) can be rearranged to give the number
of observations required for each cluster. So, where the clusters are of fixed size, the
number of observations per cluster is
m = nI(1 − ρ) / (k − 1 − ρnI)
so that the number of clusters required to make this design feasible is greater than
ρnI + 1. If the clusters are of varying size, then using the alternative VIF in (6) gives
the number of observations required per cluster as
m = nI(1 − ρ) / {k − 1 − ρnI(cvsizes² + 1)}

and, in this case, the minimum number of clusters required to make this design feasible
is ρ(cvsizes² + 1)nI + 1.
As well as computing the minimum number of clusters required under a design that
is infeasible, clustersampsi computes the maximum power value and the minimum
detectable difference available with the limited number of clusters. These values are
obtained by finding the maximum value for zβ or the minimum value for d², which
would result in k − 1 − nIρ(cvsizes² + 1) > 0. So, for example, the maximum available
power for a fixed number of clusters is

zβ = √[(k − 1)d² / {ρ(cvsizes² + 1)(σ1² + σ2²)}] − zα/2

For binary outcomes, the minimum detectable difference is given by (4) except that a1
is replaced by

a1 = (k − 1) / {(zα/2 + zβ)²(Bcvsizes² + 1)ρ}
When the heterogeneity is parameterized by the coefficient of variation, the detectable difference for continuous outcomes is found by solving the quadratic

0 = aµ2² + bµ2 + c

where

a = cv² − a2
b = 2a2µ1
c = (σ1² + σ2²)/m − a2µ1² + cv²µ1²

and where a2 is as in the binary case above.
Again, if the user is limited to a prespecified number of clusters, then it is possible
to determine the required average cluster size:
m = nI / (k − 1 − CVIF)
Certain designs will be infeasible; for a feasible design, the number of clusters required
is greater than CVIF + 1. Alternatively, limited to this number of clusters, the design
will become feasible on either lowering the power or increasing the difference to be de-
tected. The maximum available power and minimum detectable difference are obtained
by determining the maximum value for zβ or minimum value for d2 , which results in
k − 1 − CVIF > 0.
The maximum available power for both continuous and binary outcomes is
zβ = √[(k − 1)d² / {Bcvclusters²(µ1² + µ2²)}] − zα/2
The minimum detectable difference for both continuous and binary outcomes again
involves solving a quadratic whose coefficients are
a = 1 − a3
b = 2a3µ1
c = µ1² − a3µ1²

and where

a3 = (k − 1) / {B × (zα/2 + zβ)²cvclusters²}
All functions use ceiling values throughout, so for example, if the number of clusters is
estimated to be 7.1, this will be rounded up to 8.
clustersampsi will not give identical results to sampsi for the sample size un-
der individual randomization with binary data (hence, any cluster sample sizes calcu-
lated via a two-step approach from results of sampsi will not tally with results from
clustersampsi). This is due to an approximation in the case of equal allocation to
treatment group: sampsi uses no approximation (equation 3.2 in Machin et al. [1997])
but clustersampsi does (equation 3.8 in Machin et al. [1997]). Practically speaking,
the difference in sample sizes is only large (more than 10% of the exact sample size re-
quired) where small sample sizes (fewer than about 50) are called for. In such situations,
the more pressing issue is the use of a cluster design with small samples rather than the
precise size of said sample. Power will also differ for comparisons of proportions because
of the use of this approximation. Generally, this difference is negligible but may be of
concern when looking for particularly large effects.
6 Funding acknowledgment
Karla Hemming was partially funded by a National Institute for Health Research (NIHR)
grant for Collaborations for Leadership in Applied Health Research and Care (CLAHRC)
for the duration of this work. The views expressed in this publication are not necessarily
those of the NIHR or the Department of Health.
7 References
Armitage, P., G. Berry, and J. N. S. Matthews. 2002. Statistical Methods in Medical
Research. 4th ed. Oxford: Blackwell.
Donner, A., and N. Klar. 2000. Design and Analysis of Cluster Randomization Trials
in Health Research. London: Arnold.
Eldridge, S. M., D. Ashby, and S. Kerry. 2006. Sample size for cluster randomized trials:
Effect of coefficient of variation of cluster size and analysis method. International
Journal of Epidemiology 35: 1292–1300.
Garrett, J. M. 2001. sxd4: Sample size estimation for cluster designed samples. Stata
Technical Bulletin 60: 41–45. Reprinted in Stata Technical Bulletin Reprints, vol. 10,
pp. 387–393. College Station, TX: Stata Press.
Guittet, L., B. Giraudeau, and P. Ravaud. 2005. A priori postulated and real power in
cluster randomized trials: Mind the gap. BMC Medical Research Methodology 5: 25.
Hayes, R. J., and S. Bennett. 1999. Simple sample size calculation for cluster-randomized
trials. International Journal of Epidemiology 28: 319–326.
Hemming, K., A. J. Girling, A. J. Sitch, J. Marsh, and R. J. Lilford. 2011. Sample size
calculations for cluster randomised controlled trials with a fixed number of clusters.
BMC Medical Research Methodology 11: 102.
Machin, D., M. J. Campbell, P. M. Fayers, and A. Pinol. 1997. Sample Size Tables for
Clinical Studies. 2nd ed. Oxford: Blackwell Science.
Robinson, L. D., and N. P. Jewell. 1991. Some surprising results about covariate ad-
justment in logistic regression models. International Statistical Review 58: 227–240.
Royston, P., and A. Babiker. 2002. A menu-driven facility for complex sample size
calculation in randomized controlled trials with a survival or a binary outcome. Stata
Journal 2: 151–163.