

Biostatistics Series Module 5: Determining Sample Size

Article  in  Indian Journal of Dermatology · September 2016


DOI: 10.4103/0019-5154.190119


2 authors: Avijit Hazra (Institute of Post-Graduate Medical Education and Research and Seth Sukhlal Karnan…) and Nithya J Gogtay (KEM Hospital)


IJD® MODULE ON BIOSTATISTICS AND RESEARCH METHODOLOGY FOR THE DERMATOLOGIST
MODULE EDITOR: SAUMYA PANDA

Biostatistics Series Module 5: Determining Sample Size


Avijit Hazra, Nithya Gogtay1

Abstract

Determining the appropriate sample size for a study, whatever its type, is a fundamental aspect of biomedical research. An adequate sample ensures that the study will yield reliable information, regardless of whether the data ultimately suggest a clinically important difference between the interventions or elements being studied.

The essential determinants of sample size in interventional studies are:
• the probabilities of Type 1 and Type 2 errors,
• the expected variance in the sample, and
• the effect size.

Any method for deriving a conclusion from experimental data carries some risk of drawing a false conclusion. Two types of false conclusion may occur, called Type 1 and Type 2 errors, whose probabilities are denoted by the symbols α and β.
• A Type 1 error occurs when one concludes that a difference exists between the groups being compared when, in reality, it does not. This is akin to a false positive result.
• A Type 2 error occurs when one concludes that a difference does not exist when, in reality, a difference does exist and is equal to or larger than the effect size defined by the alternative to the null hypothesis. This may be viewed as a false negative result.

When considering the risk of Type 2 error, it is more intuitive to think in terms of the power of the study, or (1 − β). Power denotes the probability of detecting a difference when a difference does exist between the groups being compared. Smaller α or larger power will increase sample size. Conventional acceptable values when calculating sample size are 80% or above for power and 5% or below for α. Increasing variance in the sample tends to increase the sample size required to achieve a given power level.

The effect size is the smallest clinically important difference that is sought to be detected. Rather than statistical convention, it is a matter of past experience and clinical judgment. Larger samples are required if smaller differences are to be detected.

Although the principles have long been known, sample size determination has historically been difficult because of relatively complex mathematical considerations and numerous different formulas. Of late, however, there has been remarkable improvement in the availability, capability, and user-friendliness of power and sample size determination software. Many programs can execute routines for determining sample size and power for a wide variety of research designs and statistical tests. With the drudgery of mathematical calculation gone, researchers must now concentrate on determining appropriate sample size and achieving these targets, so that study conclusions can be accepted as meaningful.

From the Department of Pharmacology, Institute of Postgraduate Medical Education and Research, Kolkata, West Bengal, 1Department of Clinical Pharmacology, Seth GS Medical College and KEM Hospital, Mumbai, Maharashtra, India

Address for correspondence: Dr. Avijit Hazra, Department of Pharmacology, Institute of Postgraduate Medical Education and Research, 244B Acharya J. C. Bose Road, Kolkata - 700 020, West Bengal, India. E-mail: [email protected]

Key Words: Effect size, power, sample size, Type 1 error, Type 2 error

This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as the author is credited and the new creations are licensed under the identical terms.

For reprints contact: [email protected]

Access this article online
Website: www.e-ijd.org
DOI: 10.4103/0019-5154.190119

How to cite this article: Hazra A, Gogtay N. Biostatistics series module 5: Determining sample size. Indian J Dermatol 2016;61:496-504.

Received: August, 2016. Accepted: August, 2016.

496 © 2016 Indian Journal of Dermatology | Published by Wolters Kluwer - Medknow

Introduction

In most areas of life, it is very difficult to work with populations. During general elections, for instance, a newspaper is likely to interview a few thousand people at most and predict results based on their responses. Similarly, in a factory that manufactures light bulbs, a few bulbs are chosen at random to assess the quality of the entire batch.

The inherent difficulty in working with populations makes researchers choose samples to work with. These difficulties include cost, time considerations, logistics, and also ethics, as it is usually unethical to study an entire population when


Hazra and Gogtay: Determining sample size

a subset of the population could answer the research question.

Studies that cover entire populations go by the generic term "census." In India, we have a decennial census that is held once every 10 years. This is only a demographic and socioeconomic census that aims to capture data on a limited range of demographic, social, and economic indicators. However, every Indian citizen has to be covered. This makes it a huge exercise and requires the Government of India to maintain an elaborate machinery called the Office of the Registrar General and Census Commissioner under the Ministry of Home Affairs (popularly called the Census Bureau). Although this census aims to capture only limited data, it is such an elaborate affair that by the time all the data have been collected, collated, processed and analyzed, it is almost time for the next census.

What, then, will the Indian Government do if it requires quick answers, such as the immunization coverage in a particular district or the malnutrition prevalence in a particular region? For this, the government maintains another machinery called the National Sample Survey Organization (NSSO), now known as the National Sample Survey Office, under the Ministry of Statistics. The NSSO conducts periodic surveys, not of the entire country's population but of a sample of randomly selected "NSSO blocks," and its answers are generally available within a 2–3 year timeframe. It routinely collects data on several socioeconomic, health, industrial, agricultural, and price indicators.

Most biomedical researchers will never have the luxury of conducting a census. Instead, they will have to depend on a population subset, called the sample, to seek reasonably valid answers to their research questions.

Look at Figure 1, where the ellipse represents a population. Suppose a researcher draws a sample represented by the first circle (Sample 1) to answer a research question. Another researcher may be trying to answer the same research question using another sample (Sample 2), represented by the second circle. The two circles differ in their size (radius) and location (center).
• The first researcher is like a group studying the situation in a particular region of India.
• The second researcher is like a group with better manpower and funding who are trying to cover all the regions of the country.
• A third group may have sampled only a tiny circle (Sample 3), say a single town.

It stands to reason that we will accept the results of the second group as representative of the entire country. We might also accept the results of the first group as representative of the entire country if there is not much reason to suspect large regional differences. However, it is unlikely that we will accept the results of the third group as representing the entire country.

Thus, whenever we work with samples rather than populations, it is important to ensure that the sample is optimally sized and adequately representative of the entire population. It does not matter what type of study we are doing: a clinical trial, laboratory experiment, field survey, or quality control. Everywhere, these two issues, sample size and sampling, are of paramount importance. If we do not get them right, our sample results are not generalizable to the population we intended to study, and all our efforts will go in vain.

"How much is enough?" is a question that often plagues researchers and clinicians alike. While sample size calculations immediately bring to mind complex formulas, the aim of this module is not to present a pantheon of fearsome formulas. Rather, it aims to familiarize the reader with the principles through a few representative formulas with solved examples.

Estimation of the minimum sample size required for any study can have technical variations, but the concepts underlying most methods are similar. These concepts are important because they enable researchers to draw strong (valid and robust) conclusions from the minimum necessary number of research participants. It is also important to remember that, whatever formulas are used, small differences in selected parameters (described below) can lead to large differences in the calculated sample size. Thus, any sample size calculation, however carefully done, will always remain approximate.

In most studies, there is a primary question that the researcher wishes to answer, and sample size calculations are based on this primary objective. Finally, before locking the sample size to work with, one must take into account:
• available funding,
• manpower,
• logistics, and
• most important of all in clinical studies, the ethics of subjecting human participants to potentially harmful interventions.

Figure 1: Population vis-à-vis samples (an ellipse representing the population, containing circles labeled Sample 1, Sample 2 and Sample 3)

Elements in Sample Size Calculation for Randomized Controlled Trials – An Understanding of Key Concepts

Determining the sample size required to answer the research question is one of the first steps in designing a study. The main aim of sample size calculation for a clinical trial is to determine the number of participants


needed to detect a clinically relevant treatment effect. As a general rule, the greater the variability in the outcome variable, the larger the sample size needed to assess whether the observed effect (that seen when the study is completed) is a true effect.

Here, we will discuss the principles of sample size calculation for two-group randomized controlled trials (RCTs). The calculation of sample size for RCTs requires the investigator to specify certain factors, outlined below.

Broadly, as we have seen earlier in this series, data can be categorized as numerical (quantitative) or categorical (qualitative).
• For numerical data, the mean responses in the two groups, µ1 and µ2, are required, together with the standard deviations (SDs) or a common (pooled) SD for the two groups.
• For categorical data, the proportions of successes in the two groups, p1 and p2, are needed.

Such information is usually obtained from published literature or a pilot study, or is at times "guesstimated." The other two key components are the Type 1 (alpha) and Type 2 (beta) error probabilities. Apart from this, one needs to know whether the data are normally distributed (following the Gaussian curve) or otherwise.

An understanding of the null hypothesis is also useful at this stage. The null hypothesis states that the two groups being compared are not different, while the alternative hypothesis states that the two groups are actually different.

The four elements that enter into sample size calculation formulas are:
• Effect size (d or δ): The size of the effect that is clinically worthwhile to detect, that is, the smallest difference that is clinically meaningful. For numerical data, this is the difference between µ1 and µ2, the anticipated outcome means in the two groups. For categorical data, it is the difference between p1 and p2, the proportions of successful outcomes in the two groups. The effect size is clinically meaningful in the sense that it may make the physician change his practice. As stated earlier, this difference can be chosen on the basis of existing literature or a pilot study. For numerical data, the ratio of effect size to SD has been called the "standardized difference" or the "standardized effect."
• Type 1 error: The probability of falsely rejecting a null hypothesis when it is true (α). This represents the false positive error and can be regarded as the probability of finding a difference between two groups where none exists. It is also called the regulator's error, as regulatory decisions are made on the basis of such comparisons. Note that the α error is akin to the significance level of a study. Conventionally, α is set at 5% or lower. The lower the value, the larger the sample size required.
• Type 2 error: The probability of failing to reject a false null hypothesis (β). This represents the false negative error and is the probability of not finding a difference between the two groups studied when one actually exists. It is also called the investigator's error. Conventionally, β is set at 20% or lower. The lower the value, the larger the sample size required.
  Since β is the probability of failing to detect a difference, (1 − β) is the probability of detecting a difference should one actually exist. This is referred to as the "power" of a study. Conventionally, a study should have at least 80% power to detect a difference; a β value of 10% confers 90% power.
  Note that α and β are inversely related: if one tries to lower α, β will go up unless the sample size is increased. The only way to achieve zero α and zero β errors is to work with an infinite sample size, which is not possible.
  The selected α and β values, in turn, determine the standard normal deviates Zα and Zβ, which are the values actually entered into the formulas [Table 1]. The formulas incorporate a factor (Zα + Zβ)², which has been referred to as the power index.
• Standard deviation (SD) of the outcome measure of interest in the underlying population (SD or σ): This is the variability or spread associated with quantitative data. The larger the variability, the larger the sample size required to attain a given power at the chosen level of significance. In many cases, although the SD is not exactly known, one has a rough estimate, and the sample size may be calculated based on the maximum variance that is likely. If the variance in the observed data turns out to be smaller, the study will attain higher power. If the SD is completely unknown, the solution is to conduct a pilot study to estimate the variance of the outcome measure.

Table 1: Commonly used standard normal deviate values used in sample size calculations

Zα (Type 1 error)
  Two-sided, α = 0.05: Zα = 1.960
  Two-sided, α = 0.025: Zα = 2.241
  Two-sided, α = 0.01: Zα = 2.576
  One-sided, α = 0.05: Zα = 1.645
  One-sided, α = 0.025: Zα = 1.960
  One-sided, α = 0.01: Zα = 2.326
Zβ (Type 2 error)
  β = 0.20 (80% power): Zβ = 0.840
  β = 0.10 (90% power): Zβ = 1.282

Although study groups are often assumed to have the same variance, this is not always the case. If the variance of the outcome measure varies widely between the groups, a "pooled" SD value can be used. This can be calculated for two groups as:


$$\mathrm{SD}_{\text{pooled}} = \sqrt{\frac{\mathrm{SD}_1^2 + \mathrm{SD}_2^2}{2}}$$

It is also important to decide whether to use two-tailed or one-tailed testing. In most situations, two-tailed testing is the norm, although it requires a larger sample size. One-tailed testing is acceptable only if one can be sure that a change or difference can occur in one direction only.

A Few Solved Examples Based on Formulas

Sample size tables can provide useful guides for determining sample size, but they are based on commonly used values for the different parameters. One may therefore need to calculate the sample size for other combinations of Type 1 error probability, power, and variability, and formulas are needed to obtain exact estimates in such situations. There are umpteen formulas to suit various study designs. We will consider a few common examples here.

Sample size for one mean, normal distribution

$$N = \frac{(Z_\alpha + Z_\beta)^2 \times \sigma^2}{\delta^2}$$

Problem 1: A physician wants to know if the mean heart rate after use of a new drug differs from the healthy population rate of 72 beats/min. He considers a mean difference of 6 beats/min to be clinically meaningful. Based on a previously published study, he takes 9.1 beats/min as the expected SD. How many patients will he need to carry out the study with 5% probability of Type 1 error and 80% power?

In this example, we have:
• Z value (two-sided) related to the probability of falsely rejecting a true null hypothesis (α = 0.05): Zα = 1.96
• Z value related to the probability of failing to reject a false null hypothesis (β = 0.20, i.e., 80% power): Zβ = 0.84
• SD of the outcome being studied (SD or σ) = 9.1 beats/min
• Size of the effect that is clinically worthwhile to detect (δ) = 6 beats/min.

Substituting in the formula, we get:

$$N = \frac{(1.96 + 0.84)^2 \times 9.1^2}{6^2} = 18.03$$

Rounding up to the next whole number, 19 patients need to be studied with the new drug.

Sample size for two means, quantitative data

$$N = \frac{(Z_\alpha + Z_\beta)^2 \times \sigma^2}{\delta^2}$$

where δ = (µ1 − µ2)/2, and N is the total number of subjects across both groups.

Problem 2: Channick et al. studied the effects of the dual endothelin-receptor antagonist bosentan in patients with pulmonary hypertension in a randomized placebo-controlled trial (Channick RN, Simonneau G, Sitbon O, Robbins IM, Frost A, Tapson VF, et al. Effects of the dual endothelin-receptor antagonist bosentan in patients with pulmonary hypertension: A randomised placebo-controlled study. Lancet 2001;358:1119-23). They proposed to detect a mean difference of 50 m in the 6-min walk test, given a common SD of 50 m in the two groups, at 80% power and 5% significance using a one-sided test. The anticipated dropout rate was 25%. How many patients did they require?

In this example, we have:
• Z value (one-sided) related to the probability of falsely rejecting a true null hypothesis (α = 0.05): Zα = 1.645
• Z value related to the probability of failing to reject a false null hypothesis (β = 0.20, i.e., 80% power): Zβ = 0.84
• SD of the outcome being studied (SD or σ) = 50 m
• Size of the effect that is clinically worthwhile to detect (µ1 − µ2) = 50 m, so that δ = 50/2 = 25 m.

Substituting in the formula, we get:

$$N = \frac{(1.645 + 0.84)^2 \times 50^2}{25^2} = 24.7$$

Thus, approximately 24 patients needed to be randomized to the two study groups. To allow for the anticipated 25% dropout rate, the recruitment target was 32 (24/0.75).

The authors used a one-sided test because the outcome of interest was the 6-min walk test, and the placebo is not expected to worsen the baseline distance the patient can walk. However, one must be careful in choosing a one-sided test. It should be used only when the outcome of interest can truly change in one direction only.

An alternative version of the formula given above is:

$$N = 2 \times (Z_\alpha + Z_\beta)^2 \times \frac{\sigma_1^2 + \sigma_2^2}{\delta^2}$$

In this version, we can accommodate two separate SD values for the two groups, and δ is simply (µ1 − µ2).

Sample size for two proportions, categorical data

$$N = \frac{4 \times (Z_\alpha + Z_\beta)^2}{\delta^2}, \qquad \delta = \frac{p_1 - p_2}{\sqrt{\bar{p}(1 - \bar{p})}}, \qquad \bar{p} = \frac{p_1 + p_2}{2}$$

Problem 3: The Bypass versus Angioplasty in Severe Ischaemia of the Leg (BASIL) trial is reported in Adam DJ, Beard JD, Cleveland T, Bell J, Bradbury AW, Forbes JF, et al. Bypass versus angioplasty in severe ischaemia of the leg [BASIL]: Multicentre, randomised controlled trial. Lancet 2005;366:1925-34. In this trial, the statistical calculations were based on the 3-year


survival value of 50% in the angioplasty group and 65% in the bypass group. At 5% significance and 90% power, how many patients would be needed to detect a difference between the two groups?

In this example, we have:
• Z value (two-sided) related to the probability of falsely rejecting a true null hypothesis (α = 0.05): Zα = 1.96
• Z value related to the probability of failing to reject a false null hypothesis (β = 0.10, i.e., 90% power): Zβ = 1.282
• Proportion of success in one group (p1) = 0.65
• Proportion of success in the other group (p2) = 0.50.

Therefore, p̄ = (0.65 + 0.50)/2 = 0.575, and δ will be (0.65 − 0.50)/√(0.575 × 0.425) = 0.15/0.49434 = 0.30343.

Substituting in the formula, we get:

$$N = \frac{4 \times (1.96 + 1.282)^2}{(0.30343)^2} = 456.63$$

Rounding up, 229 patients per group, or 458 overall, are needed to conduct the study (assuming groups of equal size).

An alternative version of the formula given above is:

$$N = 2 \times (Z_\alpha + Z_\beta)^2 \times \frac{p_1(1 - p_1) + p_2(1 - p_2)}{(p_1 - p_2)^2}$$

In this version, we can directly accommodate the two proportions.

Alternatives to General Formulas

Beyond working through formulas by hand, sample size can also be estimated using:
• tables based on these general formulas,
• quick formulas,
• graphical methods (nomograms), and
• software.

Quick formulas

Various simplified versions of the general formulas have been proposed to enable rapid sample size calculations for standard situations. For instance, Lehr's formula can be used for quick calculation of sample size when comparing two equal-sized groups with 80% power and a two-sided significance level of 0.05. The required size of each group is:

$$n = \frac{16}{(\text{standardized difference})^2}$$

For 90% power, the numerator becomes 21. These numerators approximate 2 × (Zα + Zβ)², that is, 2 × (1.96 + 0.84)² ≈ 15.7 and 2 × (1.96 + 1.282)² ≈ 21.0. However, if the standardized difference is small, this formula tends to overestimate the sample size in comparison to the general formula presented above.

Recall that the standardized difference is the ratio of effect size to SD. For the unpaired t-test, the standardized difference is calculated simply as δ/σ. Jacob Cohen dealt elaborately with effect sizes and standardized differences in his 1988 book (Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988), where he proposed rules of thumb for these. They have been widely used in psychology, but many feel that predetermined standardized differences are too restrictive.

Nomograms

Nomograms offer a graphical alternative method of sample size calculation. They are cleverly designed on the basis of general formulas.

Altman's nomogram was devised by Doug Altman and published in his 1991 book (Altman DG. Practical statistics for medical research. London: Chapman and Hall, 1991). It is a popular graphical method that enables sample size estimation for paired and unpaired t-tests and the Chi-square test. You can easily download a copy from the internet and try it. To use the nomogram, we first need to translate our required difference into a standardized difference.

Various other nomograms have been devised:
• For studies of diagnostic tests, Malhotra and Indrayan have proposed a convenient nomogram for sample size based on anticipated sensitivity/specificity and estimated prevalence (Malhotra RK, Indrayan A. A simple nomogram for sample size for estimating sensitivity and specificity of medical tests. Indian J Ophthalmol 2010;58:519-52).
• The Schoenfeld and Richter nomograms give the sample size for detecting a difference in median survival between two treatment groups in survival analysis (Schoenfeld DA, Richter JR. Nomograms for calculating the number of patients needed for a clinical trial with survival as an endpoint. Biometrics 1982;38:163-70).

There are other nomograms for other study designs.

Software

An understanding of the principles helps in entering the desired information into software and getting the calculations done quickly. Many software packages that provide sample size calculation routines are available. Some come as part of larger statistical packages, while others are standalone power and sample size programs.
• Power and Sample Size Calculation (PS) is free software (current version 3.1.2; 2014) developed by the Department of Biostatistics, Vanderbilt University, USA. It provides sample size routines for both interventional and observational studies. It can be downloaded (http://biostat.mc.vanderbilt.edu/wiki/main/powersamplesize), installed and used without restrictions.
• Power Analysis and Sample Size Software (PASS) is a comprehensive commercial package developed by NCSS, a company based in Kaysville, Utah, USA. It is regarded as an industry standard and provides sample size calculations for over 650 statistical tests and confidence interval (CI) scenarios (http://www.ncss.com/software/pass/).
• nMaster (current version 2.0; 2011), developed by the Department of Biostatistics, Christian Medical College, Vellore, India, is a very affordable


package that also provides a large number of sample size routines required for most academic studies.

It is important to remember, however, that sample size calculations are prone to errors. Even with software, it is good practice to have these calculations verified by someone experienced in the field before embarking upon the study.

Adjustments to Calculated Sample Size

Once a "raw" sample size is calculated, various adjustments may be needed to accommodate variations in study objectives and to keep an adequate safety margin for potential attrition. Adjustments may be needed for multiple outcomes, unequal group sizes, dropouts, planned subgroup analysis, cluster sampling, and so on.

For instance, sample size should be calculated on the basis of a single primary outcome. If there are multiple primary outcomes, there is no alternative but to calculate the sample size separately for each outcome and then work with the largest, so that the rest are covered as well. Other common adjustments are discussed below.

Adjusting for Unequal Sized Groups

The methods described above assume that the comparison is to be made across two equal-sized groups. However, this may not be the case in practice, for example in an observational study or in an RCT with unequal randomization. In this case, it is possible to adjust the numbers to reflect this inequality.

The first step is to calculate the total (across both groups) sample size, assuming that the groups are of equal size. This total sample size (N) can then be adjusted according to the actual ratio of the two groups (k). The revised total sample size (N') is:

$$N' = \frac{N(1 + k)^2}{4k}$$

For instance, consider a placebo-controlled trial that requires a total sample size of 120 if the two groups are of equal size. If it is decided that twice as many subjects will be randomized to the active treatment group as to the placebo group (k = 2), the new sample size N' would be:

$$N' = \frac{120 \times (1 + 2)^2}{4 \times 2} = \frac{120 \times 9}{8} = 135$$

Thus, 135/3 = 45 patients need to be allocated to the placebo group and 135 × 2/3 = 90 to the active treatment group.

Adjusting for Consent Refusals, Withdrawals, and Missing Data

Any sample size calculation is based on the total number of participants needed in the final analysis. In practice, eligible subjects will not always be willing to take part or to continue, so it will be necessary to approach more subjects than are needed in the first instance.

In addition, even in the best designed and conducted studies, it is unusual to finish with a data set in which complete data are available for every subject. Subjects may fail to turn up or refuse to be examined, or their samples may be lost. In studies involving long follow-up, there will always be a substantial degree of attrition. It is therefore often necessary to estimate the number of subjects that need to be approached to achieve the final desired sample size.

The adjustment may be done as follows. Suppose a total of N subjects is required in the final study, but a proportion (q) are expected to refuse to participate or to drop out before the study ends. Then the total number of subjects (N'') who would have to be approached at the outset to ensure that the final sample size is achieved is:

$$N'' = \frac{N}{1 - q}$$

Thus, if 135 is the estimated total sample size required and a maximum of 20% of recruited subjects are expected to drop out before the study ends, the recruitment target would be:

$$N'' = \frac{135}{1 - 0.2} = \frac{135}{0.8} = 168.75$$

Thus, 169 eligible subjects would need to be recruited so that at least 135 subjects complete the study.

Using the formula 100/(100 − x), where x is the estimated maximum dropout fraction in percent, one can easily derive the set of correction factors in [Table 2]. These may be applied to the total sample size to derive the screening or recruitment target sample size.

Table 2: Correction factors (to total sample size) for different dropout fractions

Estimated maximum dropout fraction (%): total sample size to be multiplied by
  5: 1.05
  10: 1.11
  15: 1.18
  20: 1.25
  25: 1.33
  30: 1.43
  35: 1.54
  40: 1.67

As with other aspects of sample size calculation, the proportion of eligible subjects who will refuse to participate or drop out will be unknown at the start of the study. However, good estimates will often be possible using information from similar studies in comparable populations or from an appropriate pilot study. It is particularly important to account for dropouts when budgeting studies in which initial recruitment costs or treatment costs are likely to be high.
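The formulas and adjustments described in this module can be chained in a short script. Such a script is a convenient way to double-check hand calculations before having them verified by someone experienced.

The following is a minimal Python sketch. It assumes only the standard library (`statistics.NormalDist`, available from Python 3.8). All function names are hypothetical helpers introduced here for illustration; they are not part of PS, PASS, nMaster or any other package. The script reproduces:
• the solved examples (18.03, 24.7 and 456.63, before rounding up),
• the unequal-allocation example (120 becomes 135, split 45 and 90),
• the dropout example (135 becomes 168.75, i.e., 169), and
• the correction factors of Table 2.

```python
import math
from statistics import NormalDist
from typing import Tuple


# --- Standard normal deviates (cf. Table 1) --------------------------------
def z_alpha(alpha: float, two_sided: bool = True) -> float:
    """Z value for a Type 1 error probability alpha."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


def z_beta(beta: float) -> float:
    """Z value for a Type 2 error probability beta (power = 1 - beta)."""
    return NormalDist().inv_cdf(1 - beta)


# --- Raw sample sizes (formulas used in the solved examples) ---------------
def n_one_mean(za: float, zb: float, sigma: float, delta: float) -> float:
    """N = (Za + Zb)^2 * sigma^2 / delta^2."""
    return (za + zb) ** 2 * sigma ** 2 / delta ** 2


def n_two_means(za: float, zb: float, sigma: float, mean_diff: float) -> float:
    """Total N for both groups: same formula with delta = (mu1 - mu2) / 2."""
    return n_one_mean(za, zb, sigma, mean_diff / 2)


def n_two_proportions(za: float, zb: float, p1: float, p2: float) -> float:
    """Total N = 4 (Za + Zb)^2 / delta^2, delta = (p1 - p2) / sqrt(pbar (1 - pbar))."""
    pbar = (p1 + p2) / 2
    delta = (p1 - p2) / math.sqrt(pbar * (1 - pbar))
    return 4 * (za + zb) ** 2 / delta ** 2


# --- Adjustments to the raw sample size ------------------------------------
def adjust_for_unequal_groups(n_equal: float, k: float) -> float:
    """N' = N (1 + k)^2 / (4k) for a k:1 allocation ratio."""
    if k <= 0:
        raise ValueError("allocation ratio k must be positive")
    return n_equal * (1 + k) ** 2 / (4 * k)


def split_groups(n_total: float, k: float) -> Tuple[float, float]:
    """Split a total into (reference group, group k times as large)."""
    reference = n_total / (1 + k)
    return reference, n_total - reference


def adjust_for_dropout(n_final: float, q: float) -> float:
    """N'' = N / (1 - q): number to recruit if a fraction q may drop out."""
    if not 0 <= q < 1:
        raise ValueError("dropout fraction q must be in [0, 1)")
    return n_final / (1 - q)


def correction_factor(dropout_percent: float) -> float:
    """Table 2 multiplier 100 / (100 - x) for a maximum dropout of x percent."""
    return 100 / (100 - dropout_percent)


if __name__ == "__main__":
    # Solved examples, using the rounded Z values quoted in the text
    print(round(n_one_mean(1.96, 0.84, 9.1, 6), 2))              # Problem 1
    print(round(n_two_means(1.645, 0.84, 50, 50), 2))            # Problem 2
    print(round(n_two_proportions(1.96, 1.282, 0.65, 0.50), 2))  # Problem 3

    # 2:1 allocation applied to 120, then 20% dropout applied to 135 completers
    n_prime = adjust_for_unequal_groups(120, 2)
    print(n_prime, split_groups(n_prime, 2))
    print(math.ceil(adjust_for_dropout(135, 0.20)))

    # Regenerate Table 2
    for x in range(5, 45, 5):
        print(x, round(correction_factor(x), 2))
```

The helpers deliberately return unrounded values. In practice, each final figure is rounded up to the next whole number (for example with `math.ceil`). The Z values can be taken from Table 1 or computed with `z_alpha` and `z_beta`, which give slightly more precise deviates (for example, about 0.842 rather than 0.84 for 80% power).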


Other Considerations in Sample Size Determination

There are some additional issues in sample size determination that are not often discussed but may nevertheless be relevant in certain situations and therefore merit attention.

First, the above approaches to sample size have assumed that the sampling design is a simple random sample. More complex designs, such as stratified random samples or cluster samples, must take into account the variances of subpopulations, strata, or clusters before an estimate of the variability in the population as a whole can be made.

A second consideration is the extent of the analysis that is planned. If only descriptive summaries and simple inferential statistics are planned, then almost any sample size is good enough. On the other hand, larger samples are required if multivariate analyses, such as multiple regression, analysis of covariance, logistic regression, log-linear analysis, or Cox proportional hazards analysis, are planned for a more rigorous assessment of the combined effect of multiple variables. Methods for estimating the optimum sample size for such multivariate techniques are still evolving. In studies with an “adaptive design,” the sample size may keep changing as the study progresses, based on the results obtained. These calculations are quite complex.

Finally, an adjustment in the sample size may be needed to accommodate a comparative analysis of subgroups. There are various suggestions but no hard and fast recommendations on this. Even if the total sample is relatively large, a skewed distribution of the variables in question can result in erroneous conclusions from subgroup analysis. The safest approach is to plan all subgroup analyses beforehand and to ensure that the likely subgroup sample sizes would have adequate power to demonstrate a clinically important effect when subgroups are compared.

The sample size for surveys requires different considerations. This is highlighted in Box 1.

Precision versus Effect Size

Frequently, studies are designed to yield an estimate of some parameter of interest, for example, the mean of a continuous variable, a proportion, or a difference in proportions. A difference in proportions can be measured as a raw difference, a relative risk, an odds ratio, and in other ways. In each case, however, the precision of the final estimate can be measured by the width of the 95% CI. The larger the sample size used to calculate the final CI, the narrower the CI will be. Therefore, the desired final CI width can be used, instead of the desired effect size, to determine the appropriate sample size for a study. Since the predicted precision or CI width depends mostly on the variance of the data and much less on the effect size, a study can be planned to yield a given precision or CI width without choosing a likely effect size.

When designing a study to yield a CI with a given width, the general technique is to choose a sample size so that the ‘average’ resulting CI will have the given width. It is important to realize that, given such a sample size, there is a 50-50 chance that the final width will be narrower or wider than the desired width, even if the estimate used for the variance of the outcome data is accurate. Alternatively, one can choose a sample size so that there is a defined probability (e.g., 0.80 or 0.95) that the final CI width is below a given value. This calculation is more analogous to a typical power calculation, but it is more complex and not generally part of currently available sample size software.

Post hoc Power Analysis

A power calculation needs to be performed before a study is initiated to determine the appropriate sample size. A power calculation can always be done once the study data have been generated, but it is poor, and rather unfair, statistical practice to tweak the parameters of the a priori power analysis on the basis of a post hoc power analysis.

How can one interpret data from a negative study in which no power calculation was initially performed? Although tempting, performing a post hoc power analysis to estimate the effect size that could have been found with the actual sample size and a given power is inappropriate and should not be done. The correct approach to such data is to calculate the 95% CI for the outcome of interest, based on the final data, and use this interval to guide interpretation of the study results. Incidentally, a CI width-based power and sample size calculation can also be done before initiating the study. This is routinely done in cohort, case–control, and other epidemiological studies.

What if There is No Choice About Sample Size

Finally, before we close this chapter, let us address this common dilemma. Often, a study has a limited budget, and this curtails the possibility of a “comfortably” large sample. It is hard to argue with a budget. It is equally unwelcome to give up (the aptly called “terminator” approach) and say that the study cannot be done. The practical alternative is to realize that by tweaking aspects of the sample size calculation it may still be possible to execute the study within the constraints of a restricted budget and therefore the imposed sample size. It may be possible to raise the effect size, while still keeping it within a clinically plausible range. Perhaps a better instrument can be found that will reduce the variability in the measurements. It may even be feasible to modify the study design (e.g., by judicious stratification) to further reduce the variance of the estimator. As an alternative, we may consider networking with other sites and investigators willing to carry out the same study (the “Spiderman” approach) in collaboration. As a last resort, we forget about the limited sample size and concentrate instead on doing the study well (the Nike-like “just do it” approach). In this era of systematic reviews and meta-analysis, even if our relatively small study cannot achieve sufficient statistical power, as part of a sequence of studies it may still add enough muscle to arrive at a pooled conclusion and contribute to the body of evidence-based medicine.

Conclusion

Determining the appropriate sample size for a study is a fundamental aspect of ethical research. Performing a valid sample size calculation requires estimates of the permissible Type I error rate (α) and power, the variability in the data, the effect size sought, and a planned method of analysis. Although the concepts underlying power analysis and sample size determination are relatively simple, the large number of different study designs and analysis methods results in a bewildering number of different sample size formulas. Therefore, the use of power and sample size software is helpful and is gaining widespread usage.

Box 1: Sample size for surveys

Determining the sample size for a simple survey requires consideration of four elements:

ME: This denotes the extent of deviation from the true result that you are willing to tolerate. Thus, if the true prevalence of a disease in a population is, say, 30%, and we are happy if our result is in the 25%–35% range, then we are tolerating a ±5% margin of error. A lower margin of error requires a larger sample size; in fact, the sample size can change dramatically with a change in the margin of error. By convention, however, this figure should not exceed 5%.

CL: This is the extent of uncertainty we can tolerate. In the above example, if we are happy with a result in the 25%–35% range, how confident are we that the true figure will actually lie in this range? A higher confidence level requires a larger sample size, but 95% is generally the minimum confidence level we should work with. With this choice, when you survey a sample of the population, you do not know that you have found the correct answer, but you do know that there is a 95% chance that a sample drawn this way gives a result within the margin of error of the correct answer.

Population size: We need some idea of the size of the population we are sampling from. However, the sample size does not change much for populations larger than 20,000. This is convenient, because if the population size is unknown, we can simply use an arbitrarily large figure of 20,000 or more.

RD: This may also be thought of as the crude prevalence. If you test a random sample of 10 people for a disease and 9 of them test “positive,” then the prediction that you make about the general population is different from the prediction if, say, 5 test positive. Therefore, this factor enters into the calculation. Setting the response distribution to 50% is the most conservative choice, as it assumes the most heterogeneous population and returns the largest sample size. If we have a crude idea of the response distribution or prevalence, we can use it in the calculation. Any figure closer to 0% or 100% than 50% will return a smaller sample size, the other variables remaining constant.

The formula that we use is:

$$\text{Sample size} = \frac{RD \times (1 - RD)}{\left(ME / \text{CL score}\right)^2}$$

where the CL score is the standard normal (z) score for the chosen confidence level (1.96 for 95%).

If we are working with a finite population, we apply a finite population correction:

$$\text{Corrected sample size} = \frac{\text{Sample size} \times \text{Population}}{\text{Sample size} + \text{Population} - 1}$$

Let us work out an example. Suppose we want to find out what proportion of dermatology residents, from a population of 1000 residents, hate biostatistics. We will assume a 5% margin of error and a 95% confidence level. We have no prior idea of the response distribution and so will use 50%.

Applying the first formula, we get:

$$\text{Sample size} = \frac{0.5 \times (1 - 0.5)}{(0.05/1.96)^2} = \frac{0.25}{0.00065077} = 384.16$$

Since our population is relatively small, we apply a finite population correction:

$$\text{Corrected sample size} = \frac{384.16 \times 1000}{384.16 + 999} = \frac{384160}{1383.16} = 277.74$$

Thus, we need to survey about 278 respondents to answer our question. Note that if we reduce our margin of error to 3% (with the same population of 1000), the required sample size jumps to 517.

In real-life surveys, we need to adjust the calculated sample size for the anticipated nonresponse rate, and also for the anticipated attrition rate if the survey is to be administered more than once. If we adopt a stratified or cluster sampling strategy while selecting respondents, adjustments also need to be made on these counts.

An easy-to-use, free survey sample size calculator can be found online at http://www.raosoft.com/samplesize.html

ME: Margin of error, CL: Confidence level, RD: Response distribution


Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

Further Reading

1. Julious SA. Sample sizes for clinical trials with normal data. Stat Med 2004;23:1921-86.
2. Devane D, Begley CM, Clarke M. How many do I need? Basic principles of sample size estimation. J Adv Nurs 2004;47:297-302.
3. Karlsson J, Engebretsen L, Dainty K, ISAKOS Scientific Committee. Considerations on sample size and power calculations in randomized clinical trials. Arthroscopy 2003;19:997-9.
4. Statistical power and sample size. In: Cleophas TJ, Zwinderman AH, Cleophas TF, Cleophas EP. Statistics applied to clinical trials. 4th ed. Amsterdam: Springer; 2009. p. 81-97.
5. Calculation of required sample size. In: Kirkwood BR, Sterne JA. Essential medical statistics. 2nd ed. Oxford: Blackwell Science; 2003. p. 413-28.
