Sample Size
Meridith Blevins, MS
Vanderbilt University
Department of Biostatistics
http://biostat.mc.vanderbilt.edu/MeridithBlevins
[1] Inouye & Fiellin, “An Evidence-Based Guide to Writing Grant Proposals for Clinical Research”, Annals of Internal Medicine, 142.4 (2005): 274-282.
Meridith Blevins, MS (Vandy Biostats) Sample Size 3 / 33
Underlying principles
Research hypothesis:
Specific version of the research question that summarizes the
main elements of the study – the sample, and the predictor and
outcome variables – in a form that establishes the basis for the
statistical hypothesis tests. [2]
Should be simple (ie, contain one predictor and one outcome
variable); specific (ie, leave no ambiguity about the subjects and
variables or about how the statistical hypothesis will be applied);
and stated in advance.
Example: Use of tricyclic antidepressant medications, assessed
with pharmacy records, is more common in patients hospitalized
with an admission of myocardial infarction at Longview Hospital
in the past year than in controls hospitalized for pneumonia.
[2] NOTE: Hypotheses are not needed for descriptive studies – more to come.
Underlying principles, cont’d
Null hypothesis:
Formal basis for testing statistical significance; states that there
is no association, difference, or effect.
eg, Alcohol consumption (in mg/day) is not associated with a
risk of proteinuria (>300 mg/day) in patients with diabetes.
Alternative hypothesis:
Proposition of an association, difference, or effect.
Can be one-sided (ie, specifies a direction).
eg, Alcohol consumption is associated with an increased risk of
proteinuria in patients with diabetes.
However, the alternative is most often two-sided (no direction specified).
Most reviewers expect a two-sided test and are very critical of a one-sided one.
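To illustrate why the choice of a one-sided or two-sided alternative matters, here is a minimal sketch that converts the same test statistic into both kinds of p-value. It assumes a normally distributed (z) test statistic; the function names and the value z = 1.8 are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def two_sided_p(z):
    """P-value when the alternative does not specify a direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def one_sided_p(z):
    """P-value when the alternative specifies an increase (upper tail)."""
    return 1 - NormalDist().cdf(z)

z = 1.8  # hypothetical observed z statistic
print(f"two-sided p = {two_sided_p(z):.3f}")
print(f"one-sided p = {one_sided_p(z):.3f}")
```

For a positive z, the one-sided p-value is half the two-sided one, so the same data can cross the 0.05 threshold under a one-sided alternative and miss it under a two-sided one.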
Rules of thumb:
Smaller sample size needed for paired groups – SD of the
difference in a variable usually smaller than the SD of a variable.
Sample size decreases as the difference in the mean values
increases (holding SD constant).
Sample size increases as SD increases (holding the difference in
the mean values constant).
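The rules of thumb above can be checked numerically. The sketch below assumes the usual normal-approximation formulas for comparing two means (equal SD in both groups, two-sided alpha); these formulas, the function names, and the default alpha = 0.05 and power = 0.80 are assumptions for illustration, not values taken from the slides.

```python
from math import ceil
from statistics import NormalDist

def _z_sum_sq(alpha, power):
    """(z_{1-alpha/2} + z_{power})^2 for a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Subjects per group for two independent groups with common SD."""
    return ceil(2 * _z_sum_sq(alpha, power) * sd ** 2 / delta ** 2)

def n_pairs(delta, sd_diff, alpha=0.05, power=0.80):
    """Number of pairs, given the SD of the within-pair difference."""
    return ceil(_z_sum_sq(alpha, power) * sd_diff ** 2 / delta ** 2)

print(n_per_group(delta=5, sd=10))   # baseline
print(n_per_group(delta=10, sd=10))  # larger difference, same SD
print(n_per_group(delta=5, sd=20))   # same difference, larger SD
print(n_pairs(delta=5, sd_diff=6))   # paired design, smaller SD of differences
```

With these hypothetical inputs, doubling the difference cuts the required sample size to roughly a quarter, doubling the SD roughly quadruples it, and the paired design needs far fewer observations when the SD of the differences is small.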
Estimating a Mean:
\[
N = \frac{(Z_{1-\alpha/2})^{2}\,(\sigma^{2}/n)}{D^{2}}
\]
where
ψ = Se1 + Se2 − 2 × Se2 × P(T1 = 1 | T2 = 1)
Se1 and Se2 are the presumed values of sensitivity from the
alternative hypothesis, and P(T1 = 1 | T2 = 1) is the probability that
test 1 is positive given that test 2 is positive. The value of ψ
ranges from ∆1 (perfect correlation of test results) to
Se1 × (1 − Se2) + (1 − Se1) × Se2 (zero correlation).
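As a quick numerical check of the range of ψ, the sketch below evaluates the definition at the two extremes. The function names and the sensitivities 0.9 and 0.8 are hypothetical. It assumes that zero correlation means P(T1 = 1 | T2 = 1) = Se1, and that perfect correlation with Se1 ≥ Se2 means P(T1 = 1 | T2 = 1) = 1.

```python
def psi(se1, se2, p_t1_given_t2):
    """psi = Se1 + Se2 - 2 * Se2 * P(T1 = 1 | T2 = 1)."""
    return se1 + se2 - 2 * se2 * p_t1_given_t2

def psi_zero_correlation(se1, se2):
    """Upper end of the range: Se1*(1 - Se2) + (1 - Se1)*Se2."""
    return se1 * (1 - se2) + (1 - se1) * se2

se1, se2 = 0.9, 0.8  # hypothetical sensitivities

# Zero correlation: test 1 is positive with probability Se1 regardless of test 2.
print(psi(se1, se2, p_t1_given_t2=se1), psi_zero_correlation(se1, se2))

# Perfect correlation (assumed here to mean test 1 is always positive when test 2 is).
print(psi(se1, se2, p_t1_given_t2=1.0))
```

Under these assumptions the zero-correlation value matches the closed form, and the perfect-correlation value reduces to Se1 − Se2, the difference in sensitivities.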
[1] Hayes RJ, Bennett S. Simple sample size calculation for cluster-randomized trials. Int J Epidemiol 1999;28:319-326.
Cohort and Case-control
The sample size depends on the outcome; see the reference.
[1] Schlesselman JJ. Sample size requirements in cohort and case-control studies of disease. Am J Epidemiol 1974;99(6):381-384.
The 10-20 Rule
A fitted regression model is likely to be reliable when the number of
predictors is less than m/10 or m/20 where m is the ‘limiting sample
size’.
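The rule translates into a simple bound on model complexity. In the sketch below, m is taken as given (how the limiting sample size is determined for a particular outcome type is not covered here), and the function name and the value m = 200 are hypothetical.

```python
def max_predictors(m):
    """Return (conservative, liberal) predictor limits: m/20 and m/10."""
    return m // 20, m // 10

conservative, liberal = max_predictors(200)  # hypothetical limiting sample size
print(f"Consider at most {conservative} to {liberal} predictors")
```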
[1] Harrell FE. Regression Modeling Strategies. New York, NY: Springer; 2001.