9sample Size Determination

This document discusses determining appropriate sample sizes for studies. It covers estimating sample sizes for single populations, comparing two populations, and hypothesis testing. Key factors that influence sample size calculations include the required precision, confidence level, estimates of variance, and desired power.

Uploaded by

Muhe Man
Sample Size Determination

Wakgari Deressa, PhD


School of Public Health
Addis Ababa University
• An essential part of planning any study
is to decide how many people need to
be studied
Sample Size
• Sample Size: The number of study
subjects selected to represent a given
study population.
• Important to make inferences based on
the findings from the sample.
• Should be sufficient to represent the
characteristics of interest of the study
population.
• In estimating a certain characteristic of
a population, sample size calculations
are important to ensure that estimates
are obtained with required precision or
confidence
• The accuracy required of the envisaged
results determines the size of the sample
Example
• A prevalence of 10% from a sample size of
20
– would have a 95% CI of 1% to 31%,
– which is not very precise or informative.
• But, a prevalence of 10% from a sample
of size 400
– would have a 95% CI of 7% to 13%,
– which may be considered sufficiently
accurate.
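The n = 400 interval above can be checked with a minimal Python sketch. It assumes the normal-approximation (Wald) interval p ± 1.96·sqrt(p(1-p)/n); the function name `wald_ci` is illustrative only. For n = 20 the normal approximation is unreliable, so this sketch is not expected to reproduce the 1% to 31% interval quoted for the small sample.

```python
import math

def wald_ci(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Prevalence of 10% observed in a sample of 400
lo, hi = wald_ci(0.10, 400)
print(f"n = 400: {lo:.1%} to {hi:.1%}")
```

Rounded to whole percentages, the result agrees with the 7% to 13% interval in the example.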
• In studies concerned with detecting an
effect (e.g. a difference between two
groups), sample size calculations are
important to ensure that an association
can be detected if one truly exists.
• A large sample size can produce statistical
significance even when the difference (Δ) is
very small (not of practical, clinical or
public health importance).
• A small sample size can produce a
statistically non-significant finding even
when Δ is large (of practical, clinical or
public health importance).
• If the sample is too small, then even if
large differences are observed, it will be
impossible to show that these are due to
anything more than sampling variation.
Sample size determination depends on the:
– Objective of the study
– Design of the study
• Descriptive/Analytic
– Accuracy of the measurements to be made
– Degree of precision required for generalization
– Plan for statistical analysis
– Degree of confidence with which to conclude
• Common question:
– “How many subjects should I study?”
– Too small a sample = waste of time and resources; results have no practical use
– Too large a sample = waste of resources; data quality compromised
When deciding on sample size, balance PRECISION against COST:

Larger sample size = higher precision = higher cost


• The feasible sample size is also
determined by the availability of
resources:
– time
– manpower
– transport
– available facility, and
– money
1. Sample Size: Single Sample
• The aim is to have a large enough sample
with which to estimate a population mean
or proportion within a narrow interval with
high reliability.
• Concerned with the precision of the
estimate (“narrowness of the CI”).
estimate ± d units
Sample size for single sample
includes:
A. Sample size for estimating a single
population mean
B. Sample size to estimate a single
population proportion
A. Sample size for estimating
a single population mean
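• AIM: Estimate µ
• WANT: Estimate (x̄) ± d units
where d = margin of error
= absolute precision
= half of the width (w) of the CI
Steps:
1. Specify d (or w = 2d)
2. Use known σ² or estimate it using s²
3. Solve for n, where d = Zα/2 × (standard error of the
estimator of the parameter of interest):

$$d = Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad n = \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{d^{2}}$$

Where d = e in some text books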


Example:
1. Find the minimum sample size needed to estimate
the drop in heart rate (µ) for a new study using a
higher dose of propranolol than the standard one.
We require that the two-sided 95% CI for µ be no
wider than 5 beats per minute and the sample sd
for change in heart rate equals 10 beats per
minute.
n = (1.96)²(10)²/(2.5)² = 61.5, i.e. 62 patients (d = 5/2 = 2.5)
2. Suppose that for a certain group of cancer patients, we
are interested in estimating the mean age at diagnosis.
We would like a 95% CI that is 5 years wide. If the
population SD is 12 years, how large should our
sample be?
n = (1.96)²(12)²/(2.5)² = 88.5, i.e. 89 patients
• Suppose d = 1
• Then the sample size increases:
n = (1.96)²(12)²/(1)² = 553.2, i.e. 554 patients
3. A hospital director wishes to estimate the
mean weight of babies born in the
hospital. How large a sample of birth
records should be taken if she/he wants a
95% CI 0.5 wide? Assume that a
reasonable estimate of σ is 2. Ans: 246
birth records.
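The three worked examples above can be checked with a short Python sketch. It assumes the normal-approximation formula n = (Zσ/d)² with Z = 1.96 and d taken as half of the stated CI width; the function name `n_for_mean` is illustrative only.

```python
import math

def n_for_mean(sigma, d, z=1.96):
    """Unrounded sample size for estimating a mean to within +/- d."""
    return (z * sigma / d) ** 2

# (label, sigma, d) for the worked examples; d is half the CI width
examples = [
    ("heart rate, CI width 5", 10, 2.5),
    ("age at diagnosis, CI width 5", 12, 2.5),
    ("age at diagnosis, d = 1", 12, 1.0),
    ("birth weight, CI width 0.5", 2, 0.25),
]

for label, sigma, d in examples:
    raw = n_for_mean(sigma, d)
    # Round up so that the achieved precision is at least as good as requested
    print(f"{label}: raw n = {raw:.1f}, required n = {math.ceil(raw)}")
```

Halving d quadruples the required sample size, which is why tightening the margin from 2.5 to 1 year raises n so sharply.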
But the population variance σ² is usually
unknown
As a result, it has to be estimated from:
• Pilot or preliminary sample:
– Select a pilot sample and estimate σ² with
the sample variance, s²
• Previous or similar studies
B. Sample size to estimate a single
population proportion
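• Aim: Estimate p
• Want: Estimate (p̂) ± d units, where d = Z•SE
(95% CI of width = 2d)
Steps:
1. Specify d (or w = 2d)
2. Use estimated p (use p = 0.5 if no
information)
3. Solve for n:

$$d = Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \quad\Rightarrow\quad n = \frac{Z_{\alpha/2}^{2}\,p(1-p)}{d^{2}}$$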
1. Suppose that you are interested in the
proportion of infants breastfed beyond 18
months of age in a rural area. Suppose that in
a similar area, the proportion (p) of breastfed
infants was found to be 0.20. What sample
size is required to estimate the true proportion
within ±3 percentage points with 95% confidence?
Let p = 0.20, d = 0.03, α = 5%
n = (1.96)²(0.20)(0.80)/(0.03)² = 682.95, i.e. 683
• Suppose there is no prior information
about the proportion (p) who breastfeed
• Assume p = q = 0.5 (most conservative)
• Then the required sample size increases:
n = (1.96)²(0.5)(0.5)/(0.03)² = 1067.1, i.e. 1068
• An estimate of p is not always available.
• However, the formula may also be used
for sample size calculation based on
various assumptions for the values of p.
• With d = 0.05:
P = 0.1 → n = (1.96)²(0.1)(0.9)/(0.05)² = 138
P = 0.2 → n = (1.96)²(0.2)(0.8)/(0.05)² = 246
P = 0.3 → n = (1.96)²(0.3)(0.7)/(0.05)² = 323
P = 0.5 → n = (1.96)²(0.5)(0.5)/(0.05)² = 384
P = 0.7 → n = (1.96)²(0.7)(0.3)/(0.05)² = 323
P = 0.8 → n = (1.96)²(0.8)(0.2)/(0.05)² = 246
• For a fixed absolute precision (d), the
required sample size increases as P
increases from 0 to 0.5, and then
decreases in the same way as P
approaches 1.
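The dependence of n on P can be reproduced with a minimal Python sketch, assuming the normal-approximation formula n = Z²·p(1-p)/d² with Z = 1.96. The function name `n_for_proportion` is illustrative only. The values listed for d = 0.05 correspond to rounding to the nearest integer; rounding up instead would be slightly more conservative.

```python
import math

def n_for_proportion(p, d, z=1.96):
    """Unrounded sample size for estimating a proportion to within +/- d."""
    return z ** 2 * p * (1 - p) / d ** 2

# Sweep the assumed proportion at a fixed absolute precision of 0.05
for p in (0.1, 0.2, 0.3, 0.5, 0.7, 0.8):
    raw = n_for_proportion(p, 0.05)
    print(f"P = {p}: raw n = {raw:.1f}, nearest = {round(raw)}, rounded up = {math.ceil(raw)}")

# Breastfeeding example: p = 0.20, d = 0.03
print(math.ceil(n_for_proportion(0.20, 0.03)))
```

Because p(1-p) peaks at p = 0.5, that value gives the largest n and is the safe choice when no prior estimate exists.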
2. A survey is planned to determine what
proportion of medical students have
regularly chewed khat. If no estimate of p is
available and a pilot sample cannot be
drawn, what sample size would be required
if 95% confidence is desired and d = 0.04
is to be used?
Ans: 600 students
2. Sample Size: Two Samples
A. Estimation of the difference between two
population means
B. Estimation of the difference between two
population proportions
A. Sample size for estimating
a difference in two means
• Aim: Estimate μ1 - μ2
• Want: within ± d units,
where d = Zα/2 × SE
(95% CI of width = w = 2d)
• If equal sample size in both groups is
required, then, per group:

$$n = \frac{Z_{\alpha/2}^{2}\,(\sigma_1^{2} + \sigma_2^{2})}{d^{2}}$$

• Use σ1², σ2² or estimate using s1² and s2²
B. Sample size for estimating a
difference in two proportions
• Aim: Estimate p1 - p2
• Want: within ± d units
where d = Zα/2 × SE
(95% CI of width = w = 2d)
• If equal sample sizes in both groups, then, per group:

$$n = \frac{Z_{\alpha/2}^{2}\,\left[p_1(1-p_1) + p_2(1-p_2)\right]}{d^{2}}$$

• Use estimates of p1 and p2 (or p1 = p2 = 0.5 if
unknown)
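Both two-sample estimation cases can be sketched in Python. The sketch assumes the normal-approximation formulas n = Z²(σ1² + σ2²)/d² for means and n = Z²[p1(1-p1) + p2(1-p2)]/d² for proportions, each giving n per group. The function names and all input values are hypothetical and chosen only for illustration.

```python
import math

def n_diff_means(sigma1, sigma2, d, z=1.96):
    """Per-group n to estimate mu1 - mu2 to within +/- d (equal group sizes)."""
    return z ** 2 * (sigma1 ** 2 + sigma2 ** 2) / d ** 2

def n_diff_props(p1, p2, d, z=1.96):
    """Per-group n to estimate p1 - p2 to within +/- d (equal group sizes)."""
    return z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / d ** 2

# Hypothetical inputs: SD of 15 in each group, margin of 5 units
print(math.ceil(n_diff_means(15, 15, 5)))

# Hypothetical inputs: no prior information (p1 = p2 = 0.5), margin of 0.05
print(math.ceil(n_diff_props(0.5, 0.5, 0.05)))
```

With p1 = p2 = 0.5 the proportion case needs twice the single-sample n at the same d, since the variances of the two estimates add.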
Points for Consideration
1. Sample size estimates might need to be adjusted to compensate
for non-response rate, patient dropout or loss to follow-up, lack
of compliance, etc.
2. If sampling is from a finite population of size N, then:

$$n = \frac{n_0}{1 + \dfrac{n_0}{N}}$$

where n0 is the sample size required for an infinite population. When N is
large in comparison to n (i.e., n/N ≤ 0.05), the finite population
correction may be ignored.
3. For complex cluster sampling, multiply n by the design effect.
Common values: 2, 3, … 5.
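Points 2 and 3 can be illustrated with a small Python sketch. It assumes the finite population correction n = n0/(1 + n0/N) and a design effect applied as a simple multiplier. The function names and the population sizes used are hypothetical.

```python
import math

def finite_population_correction(n0, N):
    """Adjust an infinite-population sample size n0 for a population of size N."""
    return n0 / (1 + n0 / N)

def apply_design_effect(n, deff):
    """Inflate a simple-random-sampling n by a cluster design effect."""
    return n * deff

n0 = 384  # e.g. a proportion with p = 0.5, d = 0.05, 95% confidence

# Hypothetical small population: the correction reduces n noticeably
print(math.ceil(finite_population_correction(n0, 2000)))

# Hypothetical large population: the correction is negligible
print(math.ceil(finite_population_correction(n0, 1_000_000)))

# Cluster sampling with an assumed design effect of 2
print(apply_design_effect(n0, 2))
```

The correction always lowers n, while the design effect raises it, so the order in which the two are applied should be stated in the study protocol.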
3. Sample Size Based on
Hypothesis Testing
• The method of determining sample size in
the preceding sections takes into account
the probability of a type I error, but not a
type II error, since only the confidence
level (1-α) enters the calculation.
• However, in many statistical inference
procedures, both type I and type II errors
are considered when determining the sample
size.
Significant Difference Between Two Groups

• Using the power of a study to determine
sample size = detecting a significant difference
= hypothesis testing

• Aim: Have large enough samples to
detect a difference in population means
(or in population proportions)
• We would like to maintain a low probability of
a Type I error (α) and a low probability of a
Type II error (β) [high power = 1 - β].

Significance level of a test = α = probability of a Type I error

1 – α = Confidence; 1 – β = Power
• Type I error (α) = The probability of
rejecting Ho when it is true
• Type II error (β) = The probability of not
rejecting Ho when it is false
• Power (1-β) = The probability that Ho is rejected
given that it is false
= P(rejecting Ho | HA is true)
• If the power of a test is low, then there is
little chance of detecting a difference even if
one really exists
• Power is an important part of the design of a
study
– Power (1 - β) = 50%, Zβ = 0.00
– Power (1 - β) = 75%, Zβ = 0.67
– Power (1 - β) = 80%, Zβ = 0.84
– Power (1 – β) = 90%, Zβ = 1.28
• Power is one-sided, so Zβ is always a
one-sided value
• Most studies use a power of 80%.
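The Zβ values listed above are quantiles of the standard normal distribution at the chosen power. A minimal Python sketch using the standard library's `statistics.NormalDist` (the helper name `z_beta` is illustrative only):

```python
from statistics import NormalDist

def z_beta(power):
    """One-sided standard normal quantile corresponding to a given power."""
    return NormalDist().inv_cdf(power)

for power in (0.50, 0.75, 0.80, 0.90):
    print(f"Power = {power:.0%}: Z_beta = {z_beta(power):.2f}")
```

The same call with 1 - α/2 in place of the power gives the two-sided Zα/2, for example about 1.96 for α = 0.05.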
Factors affecting the power
• If α decreases, the power decreases
• When the difference between Ho and HA
increases, then the power increases
• When σ increases, then the power
decreases
• If the sample size (n) increases, the power
increases
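The four statements above can be checked numerically. The sketch below assumes a two-sided z-test comparing two means with a common σ and n subjects per group, and approximates power as Φ(Δ/(σ·sqrt(2/n)) - Zα/2), ignoring the negligible opposite tail. The function name and baseline values are illustrative only.

```python
import math
from statistics import NormalDist

def power_two_means(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided z-test for two means (n per group)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    se = sigma * math.sqrt(2 / n)  # standard error of the difference in means
    return nd.cdf(abs(delta) / se - z_alpha)

base = power_two_means(delta=5, sigma=15, n=142)
print(f"baseline power:       {base:.3f}")
print(f"smaller alpha (0.01): {power_two_means(5, 15, 142, alpha=0.01):.3f}")
print(f"larger difference:    {power_two_means(10, 15, 142):.3f}")
print(f"larger sigma:         {power_two_means(5, 20, 142):.3f}")
print(f"larger n:             {power_two_means(5, 15, 300):.3f}")
```

Changing one input at a time against the baseline shows each direction of effect separately.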
Factors affecting the sample size
• The sample size increases as σ² increases
• The sample size increases as the
significance level (α) is made smaller (α
decreases)
• The sample size increases as the required
power increases
• The sample size decreases as the absolute
value of the difference between Ho and
HA increases
Ho = There is no difference between the
two groups
Ho: µ1 - µ2 = 0 (or P1 - P2 = 0)
HA = There is a difference between the
two groups
HA: µ1 - µ2 ≠ 0 (or P1 - P2 ≠ 0)
A. Comparison between two
means (Equal sample sizes)

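$$n = \frac{(\sigma_1^{2} + \sigma_2^{2})\,(Z_{\alpha/2} + Z_{\beta})^{2}}{\Delta^{2}} \quad \text{in each group}$$

where ∆ = |μ1 - μ2|

The means and variances of the two respective groups
are (µ1, σ1²) and (µ2, σ2²).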
Example
1. Determine the sample sizes required to detect a
difference of 5 mm in mean blood pressure
between individuals receiving placebo and those
receiving drug with α = 5% and power of 0.80
• Assume σ1 = σ2 = 15 mm in each group.
• We are interested in testing:
Ho: μ1 - μ2 = 0, HA: μ1 - μ2 ≠ 0, with ∆ = 5
n = (15² + 15²)(1.96 + 0.84)²/(5)² = 141.1
• We would need 142 individuals in each group


2. Suppose that the true blood pressure distribution
among OC users is normal with mean µ1 and variance σ1².
Similarly, for non-users the distribution is normal
with mean µ2 and variance σ2². We wish to test the hypothesis
Ho: µ1 = µ2 versus HA: µ1 ≠ µ2. Determine the
appropriate sample size for the study using α
= 0.05 and a power of 80%. A small study
gave: sample mean1 = 132.86,
s1 = 15.34, sample mean2 = 127.44, and s2 = 18.23.
Use the sample data to estimate population
parameters.
n = (15.34² + 18.23²)(1.96 + 0.84)²/(132.86 - 127.44)²
= 152 in each group
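Both examples can be reproduced with a short Python sketch, assuming the formula n = (σ1² + σ2²)(Zα/2 + Zβ)²/Δ² per group and the rounded values Zα/2 = 1.96 and Zβ = 0.84 used in the examples. The function name `n_per_group_means` is illustrative only.

```python
import math

def n_per_group_means(sigma1, sigma2, delta, z_alpha=1.96, z_beta=0.84):
    """Unrounded per-group n to detect a difference delta between two means."""
    return (sigma1 ** 2 + sigma2 ** 2) * (z_alpha + z_beta) ** 2 / delta ** 2

# Example 1: blood pressure, sigma = 15 in both groups, difference of 5
n_bp = math.ceil(n_per_group_means(15, 15, 5))

# Example 2: OC users vs non-users, using the pilot estimates
n_oc = math.ceil(n_per_group_means(15.34, 18.23, 132.86 - 127.44))

print(n_bp, n_oc)
```

Results are rounded up so that the achieved power is not below the target.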
B. Comparison between two
means (Unequal sample sizes)

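$$n_1 = \frac{(\sigma_1^{2} + \sigma_2^{2}/\lambda)\,(Z_{\alpha/2} + Z_{\beta})^{2}}{\Delta^{2}}, \qquad n_2 = \lambda\, n_1$$

where λ = n2/n1
In some text books, λ = k = r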
3. Suppose we anticipate twice as many
non-OC users as OC users entering the study
in the previous example. Determine
the sample size needed to achieve 80% power
in the study using α = 0.05. λ = 2.

n1 = (15.34² + 18.23²/2)(1.96 + 0.84)²/(5.42)²
= 108 OC users
and n2 = 2(108) = 216 non-OC users.
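The unequal-allocation example can be checked with a minimal Python sketch, assuming n1 = (σ1² + σ2²/λ)(Zα/2 + Zβ)²/Δ² and n2 = λ·n1 with Zα/2 = 1.96 and Zβ = 0.84. The function name `n_unequal_means` is illustrative only.

```python
import math

def n_unequal_means(sigma1, sigma2, delta, ratio, z_alpha=1.96, z_beta=0.84):
    """Return (n1, n2) for comparing two means when n2 = ratio * n1."""
    raw_n1 = (sigma1 ** 2 + sigma2 ** 2 / ratio) * (z_alpha + z_beta) ** 2 / delta ** 2
    n1 = math.ceil(raw_n1)
    return n1, math.ceil(ratio * n1)

# OC example with twice as many non-users as users (ratio = 2)
n1, n2 = n_unequal_means(15.34, 18.23, 5.42, ratio=2)
print(n1, n2)
```

Setting `ratio=1` recovers the equal-allocation result of 152 per group. The total of 324 under 1:2 allocation exceeds the 304 needed with equal groups, which is the usual cost of unbalanced designs.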
C. Comparison between two
proportions (Equal sample sizes)
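• To test the hypothesis
Ho: p1 - p2 = 0 vs HA: p1 - p2 ≠ 0,
with |p1 - p2| = ∆, significance level α and power (1-β):

$$n = \frac{\left[Z_{\alpha/2}\sqrt{2\bar{p}\bar{q}} + Z_{\beta}\sqrt{p_1 q_1 + p_2 q_2}\right]^{2}}{\Delta^{2}} \quad \text{in each group}$$

where p̄ = (p1 + p2)/2, q̄ = 1 - p̄, q1 = 1 - p1, q2 = 1 - p2
and ∆ = |p1 - p2|
• Let p1 = 0.35, p2 = 0.25, and Δ = p1 - p2 = 0.35 -
0.25 = 0.10, with α = 5% and power of 80%
• We would need approximately 329
subjects in each group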
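The figure of 329 can be reproduced with a short Python sketch. It assumes the pooled-variance formula n = [Zα/2·sqrt(2p̄q̄) + Zβ·sqrt(p1q1 + p2q2)]²/Δ² with α = 0.05 and 80% power (Zα/2 = 1.96, Zβ = 0.84); these settings are an assumption, chosen because they match the stated result. The function name is illustrative only.

```python
import math

def n_per_group_props(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Unrounded per-group n to detect a difference between two proportions."""
    p_bar = (p1 + p2) / 2              # pooled proportion under Ho
    delta = abs(p1 - p2)
    null_term = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return (null_term + alt_term) ** 2 / delta ** 2

n = math.ceil(n_per_group_props(0.35, 0.25))
print(n)
```

The simpler unpooled form (p1q1 + p2q2)(Zα/2 + Zβ)²/Δ² gives a slightly smaller number for the same inputs, so the choice of formula should be reported alongside the result.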
D. Comparison between two
proportions (Unequal sample sizes)

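$$n_1 = \frac{\left[Z_{\alpha/2}\sqrt{\bar{p}\bar{q}\,(1 + 1/\lambda)} + Z_{\beta}\sqrt{p_1 q_1 + p_2 q_2/\lambda}\right]^{2}}{\Delta^{2}}, \qquad n_2 = \lambda\, n_1$$

where λ = n2/n1, p̄ = (p1 + λp2)/(1 + λ) and q̄ = 1 - p̄

Note: This formula is quite general, and applies to cross-sectional,
case-control and cohort studies.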
Example
• A study is proposed to study the effect of a new
anticoagulant therapy. Patients are to be
randomly divided into two groups: one receives
the anticoagulant, and the other placebo. The
groups are then followed for the incidence of
major bleeding events over 3 years. Suppose
that 5% of treated patients and 22% of controls
are anticipated to experience a major event over
3 years. How large should such a study
be to have an 80% chance of finding a
significant difference at α = 5%, with treated and
control patients allocated in a ratio of 1:2?
Solution
p1 = 0.05, p2 = 0.22, p̄ = (0.05 + 2 × 0.22)/(1 + 2) = 0.16
q1 = 0.95, q2 = 0.78, ∆ = 0.22 - 0.05 = 0.17
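The calculation can be carried through with a Python sketch. It assumes the pooled-variance formula n1 = [Zα/2·sqrt(p̄q̄(1 + 1/λ)) + Zβ·sqrt(p1q1 + p2q2/λ)]²/Δ² with n2 = λ·n1, Zα/2 = 1.96 and Zβ = 0.84. The final answer from the original slide is not preserved here, so the output should be read as the result of this assumed formula, not as the author's figure. The function name is illustrative only.

```python
import math

def n_unequal_props(p1, p2, ratio, z_alpha=1.96, z_beta=0.84):
    """Return (n1, n2) for comparing two proportions when n2 = ratio * n1."""
    p_bar = (p1 + ratio * p2) / (1 + ratio)   # weighted pooled proportion
    q_bar = 1 - p_bar
    delta = abs(p1 - p2)
    null_term = z_alpha * math.sqrt(p_bar * q_bar * (1 + 1 / ratio))
    alt_term = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2) / ratio)
    n1 = math.ceil((null_term + alt_term) ** 2 / delta ** 2)
    return n1, math.ceil(ratio * n1)

# Anticoagulant example: 5% events in treated, 22% in controls, 1:2 allocation
n_treated, n_control = n_unequal_props(0.05, 0.22, ratio=2)
print(n_treated, n_control)
```

The sketch uses the unrounded p̄ of about 0.163; rounding p̄ to 0.16 before the calculation can change the rounded result slightly.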
• If the OR or RR and one of the
proportions are known, we can compute
the unknown proportion by:

$$P_1 = \frac{P_2}{P_2 + \dfrac{1 - P_2}{OR}} \qquad \text{or} \qquad P_1 = P_2 \times RR$$
Example
• A case-control study is planned to compare the efficacy of
a vaccine for the prevention of childhood
tuberculosis with a placebo. Let the proportion of
unvaccinated children be 30%, with an estimated
OR of at least 2.
P2 = 0.3, q2 = 0.7, OR = 2.0
P1 = 0.3/(0.3 + 0.7/2) = 0.462
• With equal numbers of cases and controls, what sample size
is required to detect this difference with 80% power at α =
5%?
n = 140 in each group
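The two steps of this example, deriving P1 from the OR and then sizing the study, can be sketched in Python. The sketch assumes P1 = P2/(P2 + (1 - P2)/OR) and the equal-allocation pooled-variance formula for two proportions with Zα/2 = 1.96 and Zβ = 0.84. Function names are illustrative only.

```python
import math

def p1_from_or(p2, odds_ratio):
    """Proportion in group 1 implied by p2 and an odds ratio."""
    return p2 / (p2 + (1 - p2) / odds_ratio)

def n_per_group_props(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Unrounded per-group n to detect a difference between two proportions."""
    p_bar = (p1 + p2) / 2
    null_term = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return (null_term + alt_term) ** 2 / (p1 - p2) ** 2

p2 = 0.3
p1 = round(p1_from_or(p2, 2.0), 3)   # rounded to 3 decimals, as in the example
n = math.ceil(n_per_group_props(p1, p2))
print(p1, n)
```

The example rounds P1 to 0.462 before computing n; carrying the unrounded P1 through gives a slightly larger n.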
Summary
• Sample size calculations depend on a
number of assumptions:
– the hypothesized difference of interest, Δ
– the probability of Type I error (α)
– the probability of Type II error (β)
– the variance
• Choice of sample size depends on a
balance of reasonable assumptions, time,
effort, and expense
• Calculated sample sizes give the minimum
number of subjects needed for the study
