
Sample Size Planning,

Calculation, and Justification

Meridith Blevins, MS

Vanderbilt University
Department of Biostatistics
[email protected]
http://biostat.mc.vanderbilt.edu/MeridithBlevins



Introduction
. After you’ve decided what and whom you’re going to study and the
design to be used, you must decide how many ‘subjects’ to sample.
Even the most rigorously executed study may fail to answer its
research question if the sample size is too small.
If the sample size is too large, the study will be more difficult
and costly than necessary while unnecessarily exposing a number
of ‘subjects’ to possible harm.

. Goal: to estimate an appropriate number of ‘subjects’ for a given study design.
ie, the number needed to find the results you’re looking for.



Introduction, cont’d
. Sample size calculations are only as accurate as the data and estimates on which they are
based, which are often just informed guesses.
. The calculation often reveals that the research design is not feasible or that
different predictor or outcome variables are needed.
. TAKE HOME MESSAGE: Sample size should be estimated early in
the design phase of the study, when major changes are still possible.

. In addition to the statistical analysis plan, the sample size section is critical to an
IRB proposal and any kind of grant.
42% of R01 applications examined in one review were criticized for
their sample size justifications or analysis plans.1

1. Inouye & Fiellin, “An Evidence-Based Guide to Writing Grant Proposals for Clinical Research”, Annals of
Internal Medicine, 142.4 (2005): 274-282.
Underlying principles
. Research hypothesis:
Specific version of the research question that summarizes the
main elements of the study – the sample, and the predictor and
outcome variables – in a form that establishes the basis for the
statistical hypothesis tests.2
Should be simple (ie, contain one predictor and one outcome
variable); specific (ie, leave no ambiguity about the subjects and
variables or about how the statistical test will be applied);
and stated in advance.
Example: Use of tricyclic antidepressant medications, assessed
with pharmacy records, is more common in patients hospitalized
with an admission of myocardial infarction at Longview Hospital
in the past year than in controls hospitalized for pneumonia.
2. NOTE: Hypotheses are not needed for descriptive studies – more to come.
Underlying principles, cont’d
. Null hypothesis:
Formal basis for testing statistical significance; states that there
is no association, difference, or effect.
eg, Alcohol consumption (in mg/day) is not associated with a
risk of proteinuria (>300 mg/day) in patients with diabetes.
. Alternative hypothesis:
Proposition of an association, difference, or effect.
Can be one-sided (ie, specifies a direction).
eg, Alcohol consumption is associated with an increased risk of
proteinuria in patients with diabetes.
However, the alternative hypothesis is most often two-sided (no direction specified).
Two-sided tests are expected by most reviewers, who tend to be very critical of one-sided tests.



1-sided versus 2-sided
. In general, a one-sided test is appropriate when a large difference in
one direction would lead to the same action as no difference at all.
eg, Test the null hypothesis that the conception rate after
laparoscopy was less than or equal to that before. The one-sided
alternative hypothesis is that the conception rate after
laparoscopy was higher than that before.
. Expectation of a difference in a particular direction is not adequate
justification.
TAKE HOME: Two-sided tests should be used unless there is a very
good reason for doing otherwise. If one-sided tests are to be used,
the direction of the test must be specified in advance. One-sided
tests should never be used simply as a device to make a
conventionally non-significant difference significant.
1. Bland J.M. and Altman D.G. Statistics Notes: One and two sided tests of significance. BMJ, 1994; 309:248.
Underlying principles, cont’d
. General process used in hypothesis testing:
Presume the null hypothesis (eg, no association between the
predictor and outcome variables in the population).
Based on the data collected in the sample, use statistical tests
to determine whether there is sufficient evidence to reject the
null hypothesis in favor of the alternative hypothesis (eg, there is
an association in the population).
. Reaching a wrong conclusion:
Type I error: false-positive; rejecting a null hypothesis that is
actually true in the population.
Type II error: false-negative; failing to reject a null hypothesis
that is actually false in the population.
Neither can be avoided entirely.



Underlying principles, cont’d
. Effect size:
Size of the association/difference/effect you expect/wish to be
present in the sample.
Selecting an appropriate effect size is the most difficult aspect of sample
size planning.
REMEMBER: Sample size calculations are only as accurate as the
data/estimates on which they are based.
Find data from prior studies to make an informed guess – needs
to be as similar as possible to what you expect to see in your
study.
Pilot study/studies sometimes needed first.
Good rule of thumb: choose the smallest effect size that would
be clinically meaningful (and you would hate to miss).
Will be okay if true effect size ends up being larger.
Underlying principles, cont’d
. Establish the maximum chance that you will tolerate of making
wrong conclusions:
α: probability of committing a type I error; aka ‘level of
statistical significance’.
β: probability of making a type II error.
Power: 1 − β; probability of correctly rejecting the null
hypothesis in the sample if the actual effect in the population is
equal to (or greater than) the effect size.
Aim: choose a sufficient number of ‘subjects’ to keep α and β at
an acceptably low level without making the study unnecessarily
large (ie, expensive or difficult).
α and β decrease as sample size increases.
Often α = 0.05 and β = 0.20 or 0.10 (power = 80% or 90%).



Additional considerations
. Variability of the effect size:
Statistical tests depend on being able to show a difference
between the groups being compared.
The greater the variability (spread) in the outcome variable
among the subjects, the more likely it is that the values in the
groups will overlap, and the more difficult it will be to
demonstrate an overall difference between them.
Use the most precise measurements/variables possible.
. Often have >1 hypothesis, but should specify a single primary
hypothesis for sample size planning.
Helps to focus the study on its main objective and provides a
clear basis for the main sample size calculation.
Useful to rank other research questions/specific aims as
secondary, etc.
Calculating sample size
. Specific method used depends on
The specific aim(s)/objective(s).
The study design, including the planned number of
measurements per ‘subject’.
The outcome(s) and predictor(s).
The proposed statistical analysis plan.
. Will also need to consider:
Accrual/Enrollment (response rate for questionnaires).
Drop-outs (ie, lost to follow-up) and missing data.
Budgetary constraints.
. Requires you to make assumptions.
Assume specific effect size (variability), α, power, etc.



Calculating sample size for analytic studies
. Often want to show a significant difference/association between 2
groups.
. Most traditional ‘recipe’ in this case:
1 State the null and 1- or 2-sided alternative hypothesis.
2 Select the appropriate statistical test based on the type of
predictor and outcome variables in the hypotheses.
3 Choose a reasonable effect size (and variability, if necessary).
4 Specify α and power.
5 Use an appropriate table, formula, or software program to
estimate the sample size.
. Most commonly used statistical tests for comparing 2 groups:
The t-test.
The Chi-squared test.
Calculating sample size for analytic studies, cont’d
. The t-test:
Commonly used to determine whether the mean value of a
continuous outcome variable in one group differs significantly
from that in another group.
Assumes that the distribution of the variables in each of the 2
groups is approximately normal (bell-shaped).
Assumptions:
Whether the 2 groups are paired or independent.
Example ‘paired’: Comparing the mean BMI of ‘subjects’ before
and after a weight loss program.
Example ‘independent’: Comparing the mean depression score in
‘subjects’ treated with 2 different antidepressants.
The mean value of the variable in each group.
Calculation actually uses the ‘effect size’ (the difference in the
mean values between the 2 groups).



Calculating sample size for analytic studies, cont’d
. The t-test, cont’d:
Assumptions, cont’d:
The standard deviation (SD) of the variable.
If 2 groups are paired: SD of the difference in the variable
between matched pairs.
If 2 groups are independent: SD of the variable itself.

Rules of thumb:
Smaller sample size needed for paired groups – SD of the
difference in a variable usually smaller than the SD of a variable.
Sample size decreases as the difference in the mean values
increases (holding SD constant).
Sample size increases as SD increases (holding the difference in
the mean values constant).



Calculating sample size for analytic studies, cont’d
. Example using the t-test:
Research question: Is there a difference in the efficacy of salbutamol and
ipratropium bromide for the treatment of asthma?
Planned study: randomized trial of the effect of these drugs on FEV1
(forced expiratory volume in 1 second) after 2 weeks of treatment.
Previous data: mean FEV1 in persons with asthma treated with ipratropium
was 2.0 liters, with a SD of 1.0 liter.
Wish: to be able to detect a difference of ≥10% in mean FEV1 between the
2 treatment groups.
Assumptions: α (two-sided) = 0.05; power = 0.80; effect size = 0.2 liters
(10% × 2.0 liters); SD = 1.0 liter.
Calculation: A sample size of 394 patients per group is needed to detect a
difference of ≥10% in mean FEV1 between the 2 (independent) treatment
groups with 80% power, using a two-sample t-test and assuming a
(two-sided) α of 0.05, a mean FEV1 of 2.0 liters in the ipratropium group,
and a SD of 1.0 liter.
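As a rough check on this calculation, the difference-of-independent-means formula given in the appendix can be evaluated directly. The sketch below (Python with scipy, my choice of tooling rather than anything on the original slides) gives roughly 393 per group with the normal approximation; the 394 quoted above is consistent with the slightly larger exact, t-distribution-based calculation.

```python
# Hypothetical check of the FEV1 example using the normal-approximation formula
# for a difference of independent means (see appendix):
#   N per group = (z_{1-a/2} + z_{1-b})^2 (sd1^2 + sd2^2) / (mu1 - mu2)^2
from math import ceil
from scipy.stats import norm

alpha, power = 0.05, 0.80
sd1 = sd2 = 1.0          # SD of FEV1 (liters), assumed equal in both arms
delta = 0.2              # detectable difference: 10% of 2.0 liters

z_alpha = norm.ppf(1 - alpha / 2)   # ~1.96
z_beta = norm.ppf(power)            # ~0.84

n_per_group = (z_alpha + z_beta) ** 2 * (sd1 ** 2 + sd2 ** 2) / delta ** 2
print(ceil(n_per_group))  # ~393; the 394 on the slide reflects the exact t-based calculation
```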
Calculating sample size for analytic studies, cont’d
. The Chi-square test:
Commonly used to determine whether the proportion of
‘subjects’ who have a binary outcome variable in one group
differs significantly from that in another group.
Assumptions:
Whether the 2 groups are matched/paired or independent.
The proportion with the outcome in each group.
Rule of thumb: both the value of the proportions and the
difference between them affect sample size – sample size much
larger for small proportions.
Can also calculate assuming a relative risk (instead of 2
proportions).



Calculating sample size for analytic studies, cont’d
. Example using the Chi-square test:
Research question: Is there a difference in the adherence to antiretroviral
medication between HIV patients enrolled to standard of care (SOC) and
intervention groups?
Previous data: the 1-year adherence to ART is about 0.60 in HIV patients
on SOC.
Wish: to determine that the 1-year adherence is ≥0.75 in HIV patients on
intervention.
Assumptions: α (two-sided) = 0.05; power = 0.80; P1 (1-year adherence in
SOC) = 0.60; P2 (1-year adherence in intervention) = 0.75.
Calculation: A sample size of 152 HIV patients on SOC and 152 HIV
patients on intervention is needed to determine that the 1-year adherence
to ART is ≥0.75 in the intervention group with 80% power, using a
Chi-square test and assuming a (two-sided) α of 0.05 and a 1-year
adherence to ART of 0.60 in the SOC group.
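As a hedged check, the difference-of-independent-proportions formula from the appendix can be evaluated directly (Python/scipy, my choice, not part of the slides). The simple normal approximation gives roughly 150 per group; methods that add a continuity correction or use an arcsine transformation give values near the 152 quoted above, so the exact figure depends on the approximation the software uses.

```python
# Hypothetical check of the ART-adherence example using the
# difference-of-proportions formula from the appendix:
#   N per group = (z_{1-a/2} + z_{1-b})^2 [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
from math import ceil
from scipy.stats import norm

alpha, power = 0.05, 0.80
p1, p2 = 0.60, 0.75      # 1-year adherence: SOC vs intervention

z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)

n_per_group = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
print(ceil(n_per_group))  # ~150; approximations with continuity or arcsine
                          # corrections land close to the 152 quoted on the slide
```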



Calculating sample size for analytic studies, cont’d
. Alternative recipes – useful when sample size is fixed or the
number of subjects who are available or affordable is limited:
Calculate the power for a given effect and sample size (see the sketch below).
Calculate the effect size for a given power and sample size.
Good general rule: always assume ≥80% power.
. Can also calculate the sample size required to determine whether
the correlation coefficient between 2 continuous variables is significant
(ie, significantly differs from 0) – see ‘Designing Clinical Research’.
. ‘Adjusting for covariates’: when designing a study, you may decide
that ≥1 variable will confound the association between the predictor
and outcome, and plan to use regression analysis to adjust for these
confounders.
Calculated sample size will need to be increased (‘10:1 rule’) –
see a statistician.
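A minimal sketch of the first ‘alternative recipe’ – the power achievable with a fixed sample size – assuming a two-sided comparison of two independent means and using the normal approximation. The function name and example numbers are illustrative assumptions, not from the slides.

```python
# Hypothetical sketch: power achieved for a *fixed* sample size, using the
# normal approximation for a two-sided test of two independent means.
from scipy.stats import norm

def approx_power_two_means(n_per_group, delta, sd1, sd2, alpha=0.05):
    """Approximate power to detect a mean difference `delta` with
    `n_per_group` subjects per (independent) group."""
    se = ((sd1 ** 2 + sd2 ** 2) / n_per_group) ** 0.5   # SE of the difference
    z_alpha = norm.ppf(1 - alpha / 2)
    return norm.cdf(abs(delta) / se - z_alpha)

# eg, if only 200 subjects per group were affordable in the FEV1 trial:
print(round(approx_power_two_means(200, delta=0.2, sd1=1.0, sd2=1.0), 2))  # ~0.52
```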
Calculating sample size for descriptive studies
. Usually do not involve hypotheses; goal is to calculate descriptive
statistics (eg, means and proportions).
. Approach: calculate sample size required to estimate a confidence
interval (CI) of a specified confidence level (eg, 95%) and total width
(ie, precision).
For a continuous variable: interested in the CI around the mean
value of that variable.
For a binary variable: interested in the CI around the estimated
proportion of ‘subjects’ with one of the values.
. Rules of thumb:
A larger sample size is needed for a ‘tighter’ (ie, smaller total
width; more precise) CI of any confidence level.
For a given total width, a larger sample size is needed for a CI
with a higher confidence level.
Sample size for descriptive studies, cont’d
. Assumptions made for calculating the sample size:
For a continuous variable: the standard deviation of the variable.
For a binary variable: the expected proportion with the variable
of interest in the population.
If more than half of the population is expected to have the
characteristic, then plan the sample size based on the
proportion expected not to have the characteristic.
For both:
(1) the desired precision (total width) of the CI.
(2) the confidence level for the interval (eg, 95 or 99%).
. Example: A sample size of 166 ‘subjects’ is needed to estimate the
mean IQ among 3rd graders in an urban area with a 99% CI of ±3
points (ie, a total width of 6 points), assuming a standard deviation of
15 points.
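A quick check of this example using the precision-of-a-mean formula from the appendix (a Python/scipy sketch, not part of the original slides):

```python
# Hypothetical check of the IQ example: n to estimate a mean with a
# 99% CI of +/- D points, assuming SD sigma (formula in the appendix).
from math import ceil
from scipy.stats import norm

conf, sigma, D = 0.99, 15.0, 3.0
z = norm.ppf(1 - (1 - conf) / 2)           # ~2.576
print(ceil(z ** 2 * sigma ** 2 / D ** 2))  # 166
```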
Sample size for descriptive studies, cont’d
. IMPORTANT: descriptive studies of binary variables include
studies of the sensitivity and specificity of a diagnostic test.
Goal of a diagnostic test (or procedure): to correctly classify
‘subjects’ as having or not having a ‘disease’ (or symptom).
Sensitivity: proportion of true positives; probability of testing
positive when you truly have the ‘disease’.
Specificity: proportion of true negatives; probability of testing
negative when you truly do not have the ‘disease’.
Example: A sample size of 246 ‘diseased subjects’ is needed to
estimate a 95% CI for the sensitivity of a new diagnostic test for
pancreatic cancer, assuming 80±5% of the patients with
pancreatic cancer will have positive tests (see the sketch below).
When studying the specificity of a test, must estimate the
required number of ‘subjects’ who do not have the ‘disease’.
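The 246-subject figure in the sensitivity example above follows from the ‘accuracy of one test’ formula in the appendix (the same as estimating a proportion). A minimal check in Python/scipy, assuming Se = 0.80 and a half-width D = 0.05:

```python
# Hypothetical check of the sensitivity example: diseased subjects needed to
# estimate Se = 0.80 with a 95% CI of +/- 0.05 (proportion-precision formula).
from math import ceil
from scipy.stats import norm

se, D, alpha = 0.80, 0.05, 0.05
z = norm.ppf(1 - alpha / 2)
print(ceil(z ** 2 * se * (1 - se) / D ** 2))  # 246
```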
Sample Size Software Programs
. PS: Power and Sample Size Calculation
Free from http://biostat.mc.vanderbilt.edu/PowerSampleSize.
Available for Windows and can run on Linux using Wine.
Handles several common analyses including t-test, Chi-square
test (‘Dichotomous’), and Survival.
Can generate graphs (eg, power vs sample size) and keeps ‘log’.
. Simulations can also be done for any statistical technique.
Most valuable for complex analyses, such as mixed effects or
GEE models – statistician (most likely) needed.
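A minimal sketch of the simulation idea: generate many trials under the assumed alternative, run the planned analysis on each, and report the fraction of significant results. The two-sample t-test setup, function name, and numbers below are illustrative assumptions; for mixed-effects or GEE analyses you would simulate and fit the actual planned model instead.

```python
# Hypothetical sketch of simulation-based power: simulate many trials under the
# assumed alternative, analyze each as planned, and report the rejection rate.
import numpy as np
from scipy.stats import ttest_ind

def simulated_power(n_per_group, mean_diff, sd, alpha=0.05, n_sims=5000, seed=1):
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, sd, n_per_group)
        b = rng.normal(mean_diff, sd, n_per_group)
        if ttest_ind(a, b).pvalue < alpha:   # the planned analysis
            rejections += 1
    return rejections / n_sims

# eg, the FEV1 example: roughly 0.80 power at 394 per group
print(simulated_power(394, mean_diff=0.2, sd=1.0))
```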



Additional thoughts/considerations
. Consider strategies for minimizing sample size and maximizing
power, which include using
Continuous variables,
Paired measurements,
Unequal group sizes, and
A more common (ie, prevalent) binary outcome.
. Useful to calculate (and report) a range of sample sizes by
assuming different combinations of parameter values – take the
largest sample size to ‘cover all bases’.
. Always justify the feasibility of the calculated sample size.
How long would it take to accrue/enroll the subjects?
Need to consider the source of subjects, the inclusion/exclusion
criteria, the prevalence of the outcome, etc.
What do I include in my sample size write-up?
. Key: state all the information assumed such that anyone reading
your ‘Sample Size section’ would be able to re-calculate the sample
size given. This includes
The (primary) specific aim.
The outcome and predictor variable(s).
Primary comparison of interest (if applicable).
Parameter estimates (ie, α, power, effect size, variability, etc).
Data on which you based your assumptions.
Statistical test used (if applicable).
. Including graphs can be very helpful.
. Show the reviewers that you have solid reasoning behind your
calculations (as well as your statistical analysis plan).
Acknowledge whether the study will be a pilot or feasibility study.
Take home message
. Sample size justification is an essential part of every research
study, and thus any IRB proposal or grant application.
. Sample size should be estimated early in the design phase of the
study, when major changes are still possible.

. References & acknowledgments:


Designing Clinical Research (3rd edition) by Hulley, et al.
Terri Scott, Ayumi Shintani & Jennifer Thompson.



Common Values for Critical Regions
$Z_{1-\alpha/2} \approx 1.96$, when $\alpha = 0.05$
$Z_{1-\alpha/2} \approx 2.58$, when $\alpha = 0.01$
$Z_{1-\beta} \approx 0.84$, when $\beta = 0.20$
$Z_{1-\beta} \approx 1.28$, when $\beta = 0.10$
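These are standard normal quantiles and can be reproduced with any statistics package; for example, in Python with scipy (an illustration, not part of the slides):

```python
# The tabled values are standard normal quantiles:
from scipy.stats import norm

print(norm.ppf(1 - 0.05 / 2))  # Z_{1-alpha/2} ~ 1.96
print(norm.ppf(1 - 0.01 / 2))  # Z_{1-alpha/2} ~ 2.58
print(norm.ppf(1 - 0.20))      # Z_{1-beta}    ~ 0.84
print(norm.ppf(1 - 0.10))      # Z_{1-beta}    ~ 1.28
```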



Precision of Estimate
Estimating a Proportion:
$$N = \frac{Z_{1-\alpha/2}^2\, p(1-p)}{D^2}$$

Estimating a Mean:
$$N = \frac{Z_{1-\alpha/2}^2\, \sigma^2}{D^2}$$

where $D$ is the half-width of the confidence interval (estimate $\pm D$).



Detecting a Difference
Difference of (Independent) Proportions:
$$N = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,[p_1(1-p_1) + p_2(1-p_2)]}{(p_1 - p_2)^2}$$

Difference of (Independent) Means:
$$N = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,(\sigma_1^2 + \sigma_2^2)}{(\mu_1 - \mu_2)^2}$$

where $N$ is the required number of ‘subjects’ per group (assuming equal group sizes).



Diagnostic Tests
Accuracy of One Test (same as estimating a proportion):
$$N = \frac{Z_{1-\alpha/2}^2\, Se(1-Se)}{D^2}$$

Accuracy of Two Tests1: The null and alternative hypotheses are
$$H_0: \vartheta_1 = \vartheta_2 \qquad H_a: \vartheta_1 \neq \vartheta_2$$
where $\vartheta_1$ is the diagnostic accuracy of test 1 and $\vartheta_2$ is the
diagnostic accuracy of test 2. The presumed value of the difference
in sensitivity is denoted $\Delta_1$.
$$N = \frac{\left[Z_{1-\alpha/2}\sqrt{V_0(\hat{\vartheta}_1 - \hat{\vartheta}_2)} + Z_{1-\beta}\sqrt{V_A(\hat{\vartheta}_1 - \hat{\vartheta}_2)}\right]^2}{\Delta_1^2}$$
1. Zhou X-H, Obuchowski NA, McClish DK. Statistical Methods in Diagnostic Medicine. New York, NY: Wiley; 2002.
Diagnostic Tests, cont’d
With a paired-study design, the variance functions under the null and
alternative hypotheses are given by:
$$V_0(\hat{Se}_1 - \hat{Se}_2) = \psi \qquad V_A(\hat{Se}_1 - \hat{Se}_2) = \psi - \Delta_1^2$$

where
$$\psi = Se_1 + Se_2 - 2 \times Se_2 \times P(T_1 = 1 \mid T_2 = 1)$$

$Se_1$ and $Se_2$ are the presumed values of sensitivity under the
alternative hypothesis, and $P(T_1 = 1 \mid T_2 = 1)$ is the probability that
test 1 is positive given that test 2 is positive. The value of $\psi$
ranges from $\Delta_1$ (perfect correlation of test results) to
$Se_1 \times (1 - Se_2) + (1 - Se_1) \times Se_2$ (zero correlation).
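A minimal sketch wiring the two-test formulas from these slides together in Python/scipy. The sensitivities and the conditional probability in the example call are made-up assumptions, chosen only so that $\psi$ falls inside its admissible range.

```python
# Hypothetical sketch of the Zhou et al. paired two-test calculation from the
# last two slides; the input values below are illustrative assumptions only.
from math import ceil, sqrt
from scipy.stats import norm

def n_paired_sensitivity(se1, se2, p_t1_given_t2, alpha=0.05, power=0.80):
    """Diseased subjects needed to compare the sensitivities of two tests
    applied to the same subjects (paired design)."""
    delta = se1 - se2
    psi = se1 + se2 - 2 * se2 * p_t1_given_t2   # depends on test correlation
    v0 = psi                                    # variance under H0
    va = psi - delta ** 2                       # variance under Ha
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return ceil((z_a * sqrt(v0) + z_b * sqrt(va)) ** 2 / delta ** 2)

# eg, Se1 = 0.90 vs Se2 = 0.80 with P(T1=1|T2=1) = 0.95 (so psi = 0.18):
print(n_paired_sensitivity(se1=0.90, se2=0.80, p_t1_given_t2=0.95))  # ~139
```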



Cluster Randomized Trials
Our objective is to compare the population proportions for
intervention and control groups of randomized clusters. Suppose
there are n study subjects in each of c clusters per group. For an
unmatched design, the number of clusters required per group is:
$$c = 1 + (z_{\alpha/2} + z_{\beta})^2\,\frac{\pi_0(1-\pi_0)/n + \pi_1(1-\pi_1)/n + k_m^2(\pi_0^2 + \pi_1^2)}{(\pi_0 - \pi_1)^2}$$
where $\pi_1$ and $\pi_0$ are the true proportions for the intervention and control
groups, and $k_m$ is the coefficient of variation (between-cluster variation in the true proportions).

Clusters (eg, communities) are matched on the basis of factors that
are expected to be correlated with the main study outcomes, with the
aim of minimizing the degree of between-cluster variation within
matched pairs. Then c, the number of clusters required per group, is given by:
$$c = 2 + (z_{\alpha/2} + z_{\beta})^2\,\frac{\pi_0(1-\pi_0)/n + \pi_1(1-\pi_1)/n + k_m^2(\pi_0^2 + \pi_1^2)}{(\pi_0 - \pi_1)^2}$$

1. Hayes RJ, Bennett S. Simple sample size calculation for cluster-randomized trials. Int J Epidemiol 1999;28:319-326.
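A minimal sketch of the Hayes & Bennett calculation above (Python/scipy); the function name and the example values for π0, π1, n, and km are illustrative assumptions, not from the slides.

```python
# Hypothetical sketch of the Hayes & Bennett cluster calculation above;
# the input values are illustrative assumptions only.
from math import ceil
from scipy.stats import norm

def clusters_per_group(pi0, pi1, n, k, alpha=0.05, power=0.80, matched=False):
    """Clusters per group to compare proportions pi0 (control) and pi1
    (intervention), with n subjects per cluster and between-cluster
    coefficient of variation k."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    variance = pi0 * (1 - pi0) / n + pi1 * (1 - pi1) / n + k ** 2 * (pi0 ** 2 + pi1 ** 2)
    return ceil((2 if matched else 1) + z ** 2 * variance / (pi0 - pi1) ** 2)

# eg, detect a drop from 15% to 10% with 500 subjects per community, k = 0.25:
print(clusters_per_group(pi0=0.15, pi1=0.10, n=500, k=0.25))  # ~9 clusters per group
```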
Cohort and Case-control
Depends on the outcome, see reference.

1. Schlesselman JJ. Sample size requirements in cohort and case-control studies of disease. Am J Epidemiol 1974;99(6):381-384.
The 10-20 Rule
A fitted regression model is likely to be reliable when the number of
predictors is less than m/10 or m/20 where m is the ‘limiting sample
size’.

Table: Limiting Sample Sizes for Various Response Variables

Type of Response Variable    Limiting Sample Size m
Continuous                   n (total sample size)
Binary                       min(n1, n2)
Ordinal (k categories)       $n - \frac{1}{n^2}\sum_{i=1}^{k} n_i^3$
Failure (survival) time      number of failures

1. Harrell FE. Regression Modeling Strategies. New York, NY: Springer; 2001.
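As an illustration of the ordinal row of the table above, a small helper (the function name and category counts are assumptions of mine) computes the limiting sample size m from the category frequencies:

```python
# Hypothetical helper for the ordinal row of the table above:
# m = n - (1/n^2) * sum_i n_i^3, where n_i are the category frequencies.
def limiting_sample_size_ordinal(category_counts):
    n = sum(category_counts)
    return n - sum(c ** 3 for c in category_counts) / n ** 2

# eg, a 4-level ordinal outcome observed on 200 subjects:
print(limiting_sample_size_ordinal([80, 60, 40, 20]))  # 180.0 -> roughly 9 (m/20) to 18 (m/10) predictors
```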
