
HT For PHO

The document discusses hypothesis testing and sample size determination. It defines hypothesis, types of hypotheses and errors. It also describes the steps of hypothesis testing which includes identifying hypotheses, selecting test statistics, determining critical values, and making decisions. Additionally, it discusses one and two tailed tests and calculating p-values.

Uploaded by

Bekalu Endale

Debre Birhan University

Asrat Woldeyes Health Science Campus

School of Public health

Department of Epidemiology and Biostatistics

Hypothesis testing and Sample size determination

For public health officer students


By: Zenebe A.(BSc in PHO, MPH in Biostatistics )
Email: [email protected]
May, 2023
Debre Birhan, Ethiopia
5/22/2023 Zenebe A. 1
Objectives
At the end of this lesson students are expected to:
• List the types of hypothesis

• Differentiate the types of error in hypothesis testing

• Describe the steps of hypothesis testing

• Apply hypothesis test for different scenarios

• Determine sample size for cross-sectional studies

HYPOTHESIS TESTING
A hypothesis is a testable statement that describes the nature of the proposed relationship between two or more variables of interest.

Researchers are interested in answering many types of questions.

For example, a physician might want to know whether a new medication will lower a person's blood pressure.

These types of questions can be addressed through statistical hypothesis testing, which is a decision-making process for evaluating claims about a population.
Hypothesis Testing

• The formal process of hypothesis testing provides us with a means of answering research questions.

• In hypothesis testing, the researcher must define the population under study, state the particular hypotheses that will be investigated, give the significance level, select a sample from the population, collect the data, perform the calculations required for the statistical test, and reach a conclusion.
Type of Hypotheses
• Null hypothesis (represented by H0) is the statement about the value of the population parameter. That is, the null hypothesis postulates that 'there is no difference between factor and outcome' or 'there is no intervention effect'.
• Alternative hypothesis (represented by HA) states the 'opposing' view that 'there is a difference between factor and outcome' or 'there is an intervention effect'.
Methods of hypothesis testing
• Hypotheses are statements about population parameters which may or may not be true

• Examples

• The mean height of the AWHSC students is 1.63 m.

• There is no difference between the distribution of Pf and Pv malaria in Ethiopia (they are distributed in equal proportions).
Steps in hypothesis testing

1. Identify the null hypothesis H0 and the alternate hypothesis HA.

2. Choose α. The value should be small, usually less than 10%. It is important to consider the consequences of both types of errors.

3. Select the test statistic and determine its value from the sample data. This value is called the observed value of the test statistic. Remember that a t statistic is usually appropriate for a small number of samples; for a larger number of samples, a z statistic can work well if the data are normally distributed.

4. Compare the observed value of the statistic to the critical value obtained for the chosen α.

5. Make a decision.

6. Conclusion.
Test Statistics
• Because of random variation, even an unbiased sample may not accurately represent the population as a whole.

• As a result, it is possible that any observed differences or associations may have occurred by chance.

• A test statistic is a value we can compare with the known distribution of what we expect when the null hypothesis is true.

• The general formula of the test statistic is:

$$\text{Test statistic} = \frac{\text{Observed value} - \text{Hypothesized value}}{\text{Standard error}}$$

• The known distributions include the normal distribution, Student's t distribution, the chi-square distribution, and others.
Critical value
• The critical value separates the critical region from the noncritical
region for a given level of significance

Decision making in hypothesis test
• Accept (fail to reject) or reject the null hypothesis
• There are 2 types of errors

Type of decision    H0 true                   H0 false
Reject H0           Type I error (α)          Correct decision (1-β)
Accept H0           Correct decision (1-α)    Type II error (β)

• Type I error is usually treated as the more serious error; its probability is the level of significance (α).

• Power is the probability of rejecting a false null hypothesis and is given by 1-β.
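Types of tests

One tailed test (left tail): H0: μ = μ0 vs H1: μ < μ0. The rejection region has area α and lies below the critical value in the left tail.

One tailed test (right tail): H0: μ = μ0 vs H1: μ > μ0. The rejection region has area α and lies above the critical value in the right tail.

Two tailed test: H0: μ = μ0 vs H1: μ ≠ μ0. The rejection region is split between both tails, with area α/2 beyond each critical value.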
• Hypothesis test for different scenarios

1. Hypothesis testing about a Population mean (μ)
Two Tailed Test:
The large sample (n ≥ 30) test of hypothesis about a population mean μ is as follows:

1. H0: μ = μ0
   HA: μ ≠ μ0

$$z_{cal} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

$$z_{tabulated} = z_{\alpha/2} \text{ for a two tailed test}$$

Decision: if |z_cal| ≥ z_tab, reject H0;
          if |z_cal| < z_tab, do not reject H0.
Steps in hypothesis testing…..

If the test statistic falls in the critical region:
Reject H0 in favour of HA.

If the test statistic does not fall in the critical region:
Conclude that there is not enough evidence to reject H0.
One tailed tests
2. H0: μ = μ0
   HA: μ < μ0

$$z_{cal} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}, \qquad z_{tabulated} = z_{\alpha} \text{ for a one tailed test}$$

Decision: if z_cal ≤ -z_tab, reject H0;
          if z_cal > -z_tab, do not reject H0.

3. H0: μ = μ0
   HA: μ > μ0

Decision: if z_cal ≥ z_tab, reject H0;
          if z_cal < z_tab, do not reject H0.
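The two tailed and one tailed decision rules can be sketched in Python as follows. This is only an illustration: the function name `z_test`, its `tail` argument and the numbers in the usage lines are hypothetical, and the critical value is taken from the standard library's `NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Large-sample z test about a mean; returns (z_cal, z_tab, reject).

    tail: "two" (HA: mu != mu0), "left" (HA: mu < mu0), "right" (HA: mu > mu0).
    """
    z_cal = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "two":
        z_tab = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
        reject = abs(z_cal) >= z_tab
    elif tail == "left":
        z_tab = std_normal.inv_cdf(1 - alpha)      # z_{alpha}
        reject = z_cal <= -z_tab
    elif tail == "right":
        z_tab = std_normal.inv_cdf(1 - alpha)      # z_{alpha}
        reject = z_cal >= z_tab
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z_cal, z_tab, reject


# Hypothetical data: sample mean 52, hypothesized mean 50, sigma 8, n = 64
z_cal, z_tab, reject = z_test(xbar=52.0, mu0=50.0, sigma=8.0, n=64)
left_reject = z_test(xbar=52.0, mu0=50.0, sigma=8.0, n=64, tail="left")[2]
```

With these made-up numbers the two tailed test rejects H0, while the left tailed test does not, because the observed mean lies on the opposite side of μ0.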
The P- Value
• In most applications, the outcome of performing a hypothesis test is to produce a p-value.

• The p-value is the probability of obtaining a difference at least as large as the one observed by chance alone, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true.

• That is, a small p-value suggests that there might be sufficient evidence for rejecting the null hypothesis.

• The p-value is defined as the probability of observing the computed test statistic value or a more extreme one, if H0 is true. For example, P[Z ≥ z_cal | H0 true].
P-value……
• A p-value is the probability of getting the observed difference, or one more extreme, in the sample purely by chance from a population where the true difference is zero.

• If the p-value is greater than 0.05 then, by convention, we conclude that the observed difference could have occurred by chance and there is no statistically significant evidence (at the 5% level) for a difference between the groups in the population.
How to calculate P-value
• Use statistical software like SPSS, SAS, and others
• Hand calculations
  • Obtain the test statistic (z-calculated or t-calculated)
  • Find the probability for the test statistic from the standard normal table (the area between 0 and the calculated value)
  • Subtract the probability from 0.5
  • The result is the p-value
• Note: if the test is two tailed, multiply the result by 2.
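The hand calculation can be mirrored in a few lines of Python. This is a minimal sketch for a z statistic only; the helper name `p_value_from_z` is hypothetical, and the one tailed branch assumes the statistic falls on the side named by HA.

```python
from statistics import NormalDist


def p_value_from_z(z_cal, two_tailed=True):
    """p-value for a z statistic using the standard normal distribution."""
    # Tail area beyond |z_cal|; equals 0.5 minus the table area between 0 and |z_cal|
    tail_area = 1 - NormalDist().cdf(abs(z_cal))
    # Double the tail area for a two tailed test
    return 2 * tail_area if two_tailed else tail_area


p_two = p_value_from_z(1.96)                    # about 0.05
p_one = p_value_from_z(1.96, two_tailed=False)  # about 0.025
```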
P-value and confidence interval
• Confidence intervals and p-values are based upon the same theory and mathematics and will lead to the same conclusion about whether a population difference exists.

• Confidence intervals are preferable because they give information about the size of any difference in the population, and they also (very usefully) indicate the amount of uncertainty remaining about the size of the difference.

• When the null hypothesis is rejected in a hypothesis-testing situation, the confidence interval for the mean using the same level of significance will not contain the hypothesized mean.
The P- Value …..
• But for what values of the p-value should we reject the null hypothesis?

• By convention, a p-value of 0.05 or smaller is considered sufficient evidence for rejecting the null hypothesis.

• By using a cut-off of 0.05, we are allowing a 5% chance of wrongly rejecting the null hypothesis when it is in fact true.

• When the p-value is less than or equal to 0.05, we often say that the result is statistically significant.
Hypothesis testing for single population mean…..

EXAMPLE 1: A researcher reports that the mean IQ of a sample of 16 students is 110, while the expected value for the whole population is 100 with a standard deviation of 10. Test the hypothesis.
• Solution
1. H0: µ = 100 vs HA: µ ≠ 100
2. Assume α = 0.05
3. Test statistic: z = (110 - 100)/(10/√16) = (110 - 100)(4)/10 = 4
4. z-critical at α/2 = 0.025 is equal to 1.96.
5. Decision: reject the null hypothesis since 4 ≥ 1.96
6. Conclusion: the mean IQ of the population is different from 100 at the 5% level of significance.
Example: 2
Suppose that the hypothesized population mean is 3.1 and that, in a sample of n = 20 people, x̄ = 4.5 and s = 5.5 were found. Our test proceeds as follows:
1. H0: μ = 3.1
   HA: μ ≠ 3.1
2. α = 0.05 (95% confidence); the two tailed critical value is t(0.025, 19) = 2.09
3. Test statistic:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{4.5 - 3.1}{5.5 / \sqrt{20}} = 1.14$$

4. The observed value of the test statistic falls within the range of the critical values (-2.09 to 2.09).
5. We do not reject H0 and conclude that there is not enough evidence to reject the null hypothesis.
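The arithmetic of Example 2 can be checked with a short sketch. The function name `one_sample_t` is hypothetical, and the critical value 2.09 is simply the tabulated value quoted in the example, since the Python standard library has no t distribution.

```python
from math import sqrt


def one_sample_t(xbar, mu0, s, n):
    """t statistic for a single mean when sigma is estimated by s."""
    return (xbar - mu0) / (s / sqrt(n))


t_cal = one_sample_t(xbar=4.5, mu0=3.1, s=5.5, n=20)
t_tab = 2.09                  # tabulated two tailed 5% value for 19 df (from the example)
reject = abs(t_cal) >= t_tab  # False: the statistic lies inside the critical values
```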
Hypothesis testing for single proportions
Example: In a study of childhood abuse in psychiatric patients, Brown found that 166 in a sample of 947 patients reported histories of physical or sexual abuse.
a) Construct a 95% confidence interval for the population proportion.
b) Test the hypothesis that the true population proportion is 30%.
• Solution (a)
– The 95% CI for P is given by

$$\begin{aligned}
\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}
&= 0.175 \pm 1.96 \sqrt{\frac{0.175 \times 0.825}{947}} \\
&= 0.175 \pm 1.96 \times 0.0124 \\
&= [0.151 ; 0.2]
\end{aligned}$$
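As a check on part (a), the interval can be computed directly from the counts. This is a sketch under the normal approximation used above; the function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a single proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lower, upper = proportion_ci(166, 947)  # roughly 0.151 to 0.200
```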
Example……
• To test the hypothesis we need to follow these steps
Step 1: State the hypothesis
H0: P = P0 = 0.3
HA: P ≠ P0 (P ≠ 0.3)
Step 2: Fix the level of significance (α = 0.05)
Step 3: Compute the calculated and tabulated value of the test statistic

$$z_{cal} = \frac{\hat{p} - P_0}{\sqrt{\frac{P_0(1 - P_0)}{n}}} = \frac{0.175 - 0.3}{\sqrt{\frac{0.3(0.7)}{947}}} = \frac{-0.125}{0.0149} = -8.39, \qquad |z_{cal}| = 8.39$$

$$z_{tab} = 1.96$$
Example……
• Step 4: Comparison of the calculated and tabulated values of the test statistic

• Step 5: Since the tabulated value (1.96) is smaller than the absolute calculated value of the test statistic (8.39), we reject the null hypothesis.

• Step 6: Conclusion

• Hence we conclude that the proportion of childhood abuse in psychiatric patients is different from 0.3.

• If the sample size is small (if np < 5 and n(1-p) < 5) then use Student's t-statistic for the tabulated value of the test statistic.
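The test in part (b) can be reproduced from the raw counts. This is a sketch only: `one_proportion_z` is a hypothetical helper, and because it uses the unrounded sample proportion 166/947, its result differs slightly in the second decimal from the hand calculation with 0.175.

```python
from math import sqrt


def one_proportion_z(successes, n, p0):
    """z statistic for H0: P = p0, with the standard error computed under H0."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


z_cal = one_proportion_z(166, 947, 0.30)  # about -8.4
z_tab = 1.96                              # two tailed critical value at alpha = 0.05
reject = abs(z_cal) >= z_tab
```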
Statistical Inferences Based on Two Samples

Comparing Two Population Means;

• Independent Samples: Variances Known

• Independent Samples: Variances Unknown

Paired Difference Experiments

• Paired/matched/repeated sampling

Comparing Two Population Proportions

• Large, Independent Samples case
Case-1: Independent Samples, Variances Known

Comparing Two Population Means;
Independent Samples, Vars Known cont'd…

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Comparing Two Population Means;
Ind't Samples, Vars Known cont'd…

• In testing hypotheses, the z value can then be calculated as:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Hypothesis testing for two sample means
• The steps to test the hypothesis for a difference of means are the same as for a single mean
Step 1: State the hypothesis
H0: µ1-µ2 = 0
VS
HA: µ1-µ2 ≠ 0, HA: µ1-µ2 < 0, HA: µ1-µ2 > 0
Step 2: Significance level (α)
Step 3: Test statistic

$$z_{cal} = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Hypothesis …
$$z_{tabulated} = z_{\alpha/2} \text{ for a two tailed test}$$

$$z_{tabulated} = z_{\alpha} \text{ for a one tailed test}$$

For HA: μ1 - μ2 ≠ 0: if |z_cal| ≥ z_tab, reject H0; if |z_cal| < z_tab, do not reject H0.
For HA: μ1 - μ2 < 0: if z_cal ≤ -z_tab, reject H0; if z_cal > -z_tab, do not reject H0.
For HA: μ1 - μ2 > 0: if z_cal ≥ z_tab, reject H0; if z_cal < z_tab, do not reject H0.
Example :
• Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down's syndrome.

• The data consist of serum uric acid readings on 12 individuals with Down's syndrome and 15 normal individuals. The means are 4.5 mg/100 ml and 3.4 mg/100 ml, with standard deviations of 2.9 and 3.5 mg/100 ml respectively.

H0: μ1 - μ2 = 0
HA: μ1 - μ2 ≠ 0
SOLUTION

$$z_{cal} = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(4.5 - 3.4) - 0}{\sqrt{\frac{2.9^2}{12} + \frac{3.5^2}{15}}} = \frac{1.1}{\sqrt{1.5175}} = \frac{1.1}{1.23} = 0.89$$

$$z_{\alpha/2} = z_{0.025} = 1.96$$

Decision: do not reject the null hypothesis, since 0.89 < 1.96.

Conclusion: at the 5% level of significance there is not enough evidence to conclude that the mean serum uric acid levels of normal individuals and individuals with Down's syndrome are different.
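The uric acid calculation can be verified with a small sketch. The function name `two_sample_z` is hypothetical, and the standard deviations are treated as known, as the worked solution does.

```python
from math import sqrt


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, d0=0.0):
    """z statistic for a difference of two means with known variances."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
    return ((mean1 - mean2) - d0) / se


# Down's syndrome group first, normal group second
z_cal = two_sample_z(4.5, 2.9, 12, 3.4, 3.5, 15)
reject = abs(z_cal) >= 1.96  # False at the 5% level
```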
Case-2: Independent Samples, Variances Unknown
Comparing Two Population Means;
Ind’t Samples, Vars Unknown cont’d…

A. Assume that the unknown variances are equal: σ1² = σ2² = σ²

• The pooled estimate of σ² is the weighted average of the two sample variances, s1² and s2²
• The pooled estimate of σ² is denoted by sp².

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

• The estimate of the standard deviation of the sampling distribution is:

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$
Comparing Two Population Means;
Ind’t Samples, Vars Unknown cont’d…

• The sampling distribution in this case is approximately normal when both n1 and n2 are large (>30), irrespective of the distribution of the population.

• So, in this case, the calculated value of z will be:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \qquad \text{where } D_0 = (\mu_1 - \mu_2)_0$$
Comparing Two Population Means;
Ind’t Samples, Vars Unknown cont’d…

• The sampling distribution will follow a t-distribution, with n1 + n2 - 2 degrees of freedom, in case n1 and/or n2 are small (≤30), irrespective of the distribution of the population.

• So, in this case, the calculated value of t will be:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$
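A minimal sketch of the pooled-variance calculation follows. The helper `pooled_t` is hypothetical, and reusing the uric acid summary figures here is only an illustration that treats the quoted standard deviations as sample estimates; the slides analyse that example with the z test.

```python
from math import sqrt


def pooled_t(mean1, s1, n1, mean2, s2, n2, d0=0.0):
    """Return (t_cal, pooled variance, degrees of freedom) for equal variances."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    t_cal = ((mean1 - mean2) - d0) / se
    return t_cal, sp2, n1 + n2 - 2


# Illustration only: uric acid summary data, with SDs treated as sample SDs
t_cal, sp2, df = pooled_t(4.5, 2.9, 12, 3.4, 3.5, 15)
```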
Comparing Two Population Means;
Ind’t Samples, Vars Unknown cont’d…

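• When the unknown variances are not assumed to be equal, the two sample variances are used separately. For large samples:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

• For small samples:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad df = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\frac{\left( s_1^2/n_1 \right)^2}{n_1 - 1} + \frac{\left( s_2^2/n_2 \right)^2}{n_2 - 1}}$$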
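The unequal-variance statistic and its degrees of freedom can be sketched as below. The function name `unequal_variance_t` is hypothetical, and the uric acid figures are reused purely as illustrative inputs.

```python
from math import sqrt


def unequal_variance_t(mean1, s1, n1, mean2, s2, n2, d0=0.0):
    """Return (t_cal, df) when the two variances are not assumed equal."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t_cal = ((mean1 - mean2) - d0) / sqrt(v1 + v2)
    # Approximate degrees of freedom from the df formula above
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_cal, df


# Illustration only: uric acid summary data, with SDs treated as sample SDs
t_cal, df = unequal_variance_t(4.5, 2.9, 12, 3.4, 3.5, 15)
```

The resulting df is generally not a whole number; a common convention is to round it down before consulting a t table.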
Case-3: Paired/matched/repeated sampling
• Arises from two different processes on the same study units (e.g. "before" and "after" treatments) or two different processes on paired/matched study units (e.g. pair-matched case-control studies).

• Use of the same/matched individuals eliminates any differences in the individuals themselves (confounding factors).

• Inference concerning the difference between two population means is similar to that for one population mean, except that we will be working with the differences (the di's) here.
Paired sampling cont’d…

Hypothesis testing for two proportions

• Suppose that n1 and n2 are large enough so that:
– n1·p1 ≥ 5, n1·(1 - p1) ≥ 5, n2·p2 ≥ 5, and n2·(1 - p2) ≥ 5

• Then the population of all possible values of p̂1 - p̂2:
– Has approximately a normal distribution
– Has mean µ(p̂1 - p̂2) = p1 - p2
– Has standard deviation:

$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
Hypothesis testing for two proportions

• To test the hypothesis
H0: π1-π2 = 0
VS
HA: π1-π2 ≠ 0
the test statistic is given by

$$z_{cal} = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}$$
Small sample size

• If the sample size is small and n1p1 < 5 and n2p2 < 5, then use Student's t-test with n1+n2-2 degrees of freedom at a given level of significance.
Comparing Two Population Proportions cont’d…

Example 10: A study was conducted to look at the effects of oral contraceptives (OC) on heart disease in women 40–44 years of age. It is found that among n1 = 500 current OC users, 13 develop a myocardial infarction (MI) over a three-year period, while among n2 = 1000 non-OC users, seven develop an MI over a three-year period. Then:

A. Construct a 95% confidence interval for the difference of MI rates between OC users and non-users.

B. Can you conclude that the rate of MI is significantly greater among OC users? (Report the P-value for your test)
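The slides leave Example 10 as an exercise. One possible way to work it, sketched below, uses the unpooled standard error from the test statistic given earlier and a one tailed p-value for part B; the function name `two_proportion_summary` is hypothetical, and a pooled standard error (often preferred for the test) would give a slightly different z.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_summary(x1, n1, x2, n2, confidence=0.95):
    """Return (difference, CI lower, CI upper, z_cal, one tailed p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled standard error
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_cal = diff / se                     # H0: pi1 - pi2 = 0
    p_one = 1 - NormalDist().cdf(z_cal)   # HA: pi1 - pi2 > 0
    return diff, diff - z_crit * se, diff + z_crit * se, z_cal, p_one


diff, lower, upper, z_cal, p_one = two_proportion_summary(13, 500, 7, 1000)
```

Under these assumptions the interval excludes zero and the one tailed p-value is well below 0.05, which points to a higher MI rate among OC users.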
Chi-Square Test
• The Chi-squared test measures the disparity between observed frequencies (data from the sample) and expected frequencies (probability distribution)

• The Chi-Square test (χ²) allows us to test for association between two categorical variables.
The Chi-squared test (χ²-test with d.f. = (r-1)×(c-1))

$$\chi^2 = \frac{n(ad - bc)^2}{(a + c)(b + d)(a + b)(c + d)} \quad \text{for a } 2 \times 2 \text{ table}$$

$$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \quad \text{for an } r \times c \text{ table}$$

$$E_{ij} = \frac{i\text{th row total} \times j\text{th column total}}{\text{grand total}} = \frac{R_i \times C_j}{n}$$

Oij = observed frequency, Eij = expected frequency of the cell at the juncture of the ith row and jth column
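Both formulas can be sketched in Python and checked against each other. The function names are hypothetical, and the 2×2 table in the usage lines is built from the OC/MI counts of Example 10 (13 and 487 among users, 7 and 993 among non-users) purely for illustration.

```python
def chi_square_rxc(table):
    """Chi-square statistic and df for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij = R_i * C_j / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def chi_square_2x2(a, b, c, d):
    """Shortcut formula for a 2 x 2 table with cells a, b (row 1) and c, d (row 2)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))


chi2_general, df = chi_square_rxc([[13, 487], [7, 993]])
chi2_shortcut = chi_square_2x2(13, 487, 7, 993)
```

The two routes agree, as they should for any 2×2 table without a continuity correction.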
Assumptions of the χ²-test
• No expected value in the table is <1, and

• No observed frequency is zero

• No more than 20% of the expected frequencies should be <5.

• Observations should be independent of each other
Assumptions…
• If some numbers are too small,
  • row or column categories can sometimes be combined to make the expected frequencies larger, or Yates' correction can be used, or
  • Fisher's exact test should be used instead.

• It assumes that measures are independent of each other, i.e. the categories created are mutually exclusive.

• The χ²-test assumes that there is a theoretical basis for the categorization of the variables.
• This is to ensure that the analysis will be meaningful
Testing hypothesis

• Step 1. State the hypotheses to be tested

• The null hypothesis (H0) states 'No association between exposure and disease'

• The alternative hypothesis (HA) states there is an association between exposure and disease

• Step 2. Select a sample and collect data

• Construct a cross tabulation with cells comprised of observed data or sample frequencies
Testing hypothesis ...

• Step 3. Calculate the test statistic

• Chi-squared test with df = (rows-1)(columns-1)
• Assumptions:
• no observed cell is 0
• no expected cell is less than 5

• Step 4. Evaluate the evidence against H0.

Set α (usually ≤ 0.05);
Testing hypothesis ...
• Calculate the p-value (the p-value is the probability of obtaining a test statistic equal to or greater than the calculated χ² if H0 is true).

• Reject H0 if the calculated p-value ≤ α, i.e. if the calculated χ² statistic is greater than the tabulated (critical) value.

• Step 5. Estimate the measure of strength of association with its 95% CI and give a conclusion
Characteristics of chi-square test

1. Every χ² distribution extends indefinitely to the right from 0.

2. Every χ² distribution has only one (right) tail.

3. As df increases, the χ² curves get more bell shaped and approach the normal curve in appearance (but remember that a chi-square curve starts at 0, not at -∞)
• If the value of χ² is zero, then there is perfect agreement between the observed and the expected frequencies.

• The greater the discrepancy between the observed and expected frequencies, the larger will be the value of χ².

• In order to test the significance of the χ², the calculated value of χ² is compared with the tabulated value for the given df at a certain level of significance.
Example…
• In an experiment with peas one observed 360 round and yellow, 130 round and green, 118 wrinkled and yellow and 32 wrinkled and green. According to the Mendelian theory of heredity the numbers should be in the ratio 9:3:3:1. Is there any evidence, at the 5% level of significance, that the observed plants differ from this ratio?
Example…

• H0: Ratio is 9:3:3:1 & HA: Ratio is not 9:3:3:1

• Data
Category    Oi     Proportion    Ei
RY          360    9/16          360
RG          130    3/16          120
WY          118    3/16          120
WG          32     1/16          40
• χ² calc = (360-360)²/360 + (130-120)²/120 + (118-120)²/120 + (32-40)²/40 = 0 + 0.833 + 0.033 + 1.60 = 2.466 ≈ 2.47
• χ² tab (α = 0.05, df = 3) = 7.82
• χ² calc < χ² tab ⇒ do not reject H0
• Therefore, the observed counts are consistent with the 9:3:3:1 ratio.
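The pea example can be recomputed in a few lines. This is a sketch only: the variable names are illustrative, and the critical value 7.82 is the tabulated value quoted in the example rather than something computed here.

```python
observed = [360, 130, 118, 32]   # RY, RG, WY, WG
ratio = [9, 3, 3, 1]             # Mendelian ratio under H0

total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]   # 360, 120, 120, 40

chi2_cal = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
chi2_tab = 7.82                  # tabulated value for alpha = 0.05, df = 3
reject = chi2_cal > chi2_tab     # False: do not reject H0
```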
Sample size determination…
• How Big is Big Enough?

• Generally the larger the better, but that takes more time and
money

• Answer depends on:

How different or dispersed the population is

Desired level of confidence

Desired margin of error

Sample size determination…
The prevalence/incidence of the problem

The effect size of the factors

Number of predictors
• If the sample size is too small, the study may fail to detect an important effect
• Estimates of effect may be too imprecise (wide CIs)
• Both too small and too large a sample may be unethical
Sample size calculation for qualitative
studies

• It is very difficult to set the sample size from the outset.

• Of course, with a thorough look at the resources and time one has, and some reading of similar studies, one can give a reasonable indication.

• However, it is good to leave it to the saturation of the study

• Hypothetical sample size for a phenomenological study, and for grounded theory?
Sample size determination for quantitative
study
• An adequate minimum sample size can be determined using:

• Rule of thumb,

• Precision approach (single population proportion or mean approach)

• Power approach (two population proportion or mean difference approach)

• We may be required to consider the design effect, a non-response rate (contingency), and/or the number of independent variables
Sample size determination…

• Which variables should be included in the sample size calculation?

• It should relate to the study's primary outcome variable

• If the study has secondary outcome variables which are considered important, the sample size should also be sufficient for the analysis of these variables
1. Rules of thumb approach
1. For small populations (N < 100), there is little point in sampling. Survey the entire population.

2. If the population size is around 500, 50% should be sampled.

3. If the population size is around 1500, 20% should be sampled.

4. Statisticians (maximalist view): at least 500

5. To make generalizations about an entire population, a total sample size of 200-400 is needed
Statistical approaches

• Precision approach (single population proportion or mean approach)

• Power approach (two population proportion or mean difference approach)
Statistical approach: precision approach

Mean estimation
• Given confidence level

Proportion estimation
• Given confidence level
Precision approach (Single population)
• In proportion estimation, p can be obtained from:
  • Previous studies
  • A pilot study, or
  • Taking 50%

• In mean estimation, use either
  • A previous study, or
  • A pilot study to get S
Example
• Assume that a researcher wants to estimate
the prevalence of gestational DM among
pregnant women in Debre Birhan Town.
• According to previous studies the prevalence
of GDM was 32%. The researcher wants to
calculate his sample size at 95% level of
confidence.
• Based on the above information, what is the
minimum adequate sample size needed to
answer his research objective?
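One way to work this example is with the usual single population proportion formula, n = z² p(1 - p) / d². The sketch below is not the author's solution: the function name is hypothetical, and the margin of error d = 0.05 is an assumption, because the example does not state one.

```python
from math import ceil


def sample_size_proportion(p, d, z=1.96):
    """Minimum n to estimate a proportion p within +/- d (z = 1.96 for 95%)."""
    return ceil(z**2 * p * (1 - p) / d**2)


# p = 0.32 from the example; d = 0.05 is an ASSUMED margin of error
n_gdm = sample_size_proportion(p=0.32, d=0.05)
```

A different margin of error changes the answer considerably, so d should be fixed and justified before the calculation.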
Example…
• Assume you are interested in knowing the mean systolic blood pressure among pregnant women with danger signs of pregnancy admitted at Hakim Gizaw Hospital. Based on previous studies, the SD was 25 mmHg and the desired level of precision is 5 mmHg on either side.
• The researcher wants to calculate the sample size at the 95 percent level of confidence.
• Based on the above information, what is the minimum adequate sample size needed to answer the research objective?
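This example can be worked with the usual single population mean formula, n = (z σ / d)². The sketch below is an illustration rather than the author's solution, and the function name is hypothetical.

```python
from math import ceil


def sample_size_mean(sd, d, z=1.96):
    """Minimum n to estimate a mean within +/- d, given a standard deviation sd."""
    return ceil((z * sd / d) ** 2)


# SD = 25 mmHg and precision = 5 mmHg, both taken from the example
n_sbp = sample_size_mean(sd=25, d=5)
```

The raw value is 96.04, which is rounded up to 97 because a sample size must be a whole number that meets the precision target.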
Power approach analytic or two population approach
for proportion estimation

Power approach…
• The inputs to calculate sample size are confidence level, power, p1, p2, or a measure of association (OR or RR)

• Measures of association are based on previous studies or reports and should reflect the minimum effect that the investigator considers worth detecting.

• Two examples are the relative risk and the odds ratio

• If n1 = n2 then the first (Fleiss) formula simplifies to
Power approach …
• For means estimation

• If the sample size to be taken in each group is different, we use the following formula for mean estimation

Where k is the ratio of n1 to n2; ε is a clinically meaningful mean difference between the two groups (μ1 - μ2); and σ² is the variance of either of the two groups, which is assumed to be equal
Power approach …
• If the variance of the two groups is not known, we take the pooled variance of the two samples as

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Example
• A study is being planned to test whether a dietary
supplement for pregnant women will bring change to
the birth weight of babies.
• One group of women will receive the new supplement
and the other group which has a double size of the first
group will receive the usual nutritional consultation.
• From a pilot study, the standard deviation in birth
weight is estimated as 500g and is assumed to be the
same in both groups.
• The hypothesis of no difference is to be tested at 5%
level of significance. It is desired to have 80% power of
detecting an increase of 100g

Solution

Thus samples of 294 and 588 will be taken from groups one and two, respectively
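The reported sizes can be reproduced with the sketch below. It assumes the lost formula is the standard unequal-allocation formula n2 = (1 + 1/k) σ² (z_{α/2} + z_β)² / ε² with n1 = k·n2, and it uses the rounded table values 1.96 and 0.84; the function name is hypothetical.

```python
from math import ceil


def two_mean_sample_sizes(sigma, epsilon, k, z_alpha=1.96, z_beta=0.84):
    """Return (n1, n2) for comparing two means when k = n1 / n2."""
    n2 = (1 + 1 / k) * (sigma * (z_alpha + z_beta) / epsilon) ** 2
    n1 = k * n2
    # round() guards against floating point noise before rounding up
    return ceil(round(n1, 6)), ceil(round(n2, 6))


# SD = 500 g, difference = 100 g, group two twice the size of group one (k = 0.5)
n1, n2 = two_mean_sample_sizes(sigma=500, epsilon=100, k=0.5)
```

Using unrounded normal quantiles instead of 1.96 and 0.84 would give slightly larger sizes.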
Points to be considered

Design effects
• The design effect is the loss of effectiveness caused by using cluster sampling instead of simple random sampling.

• A working definition of the design effect is the factor by which the sampling variance under the chosen sampling plan exceeds that of a simple random sample of the same size.

• It indicates how much worse your sample is than a simple random sample

• For cluster sampling, the design effect is commonly taken as 2

• For multistage sampling, the design effect is taken as equal to the number of stages
Sample size calculation for survival

• The number of events (m)

Where:
m = number of events
HR = exp(θ)
π = fraction of subjects in the first group
With equal allocation (m1 = m2), then

Then, you have to consider the proportion of withdrawal in the final sample size
Example
• Suppose you need to conduct research on time to mortality among neonates admitted at the NICU, and a previous study reported that the probability of death was 0.75.
• Besides, the commonly reported predictor of mortality was prematurity, with an HR of 2.8. Assume a withdrawal proportion of 10 percent, a 95 percent level of confidence and 80 percent power.
• Based on this, calculate the minimum adequate sample size
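A possible way to approach this example is sketched below. It assumes that the events formula missing from the slide is Schoenfeld's formula, m = (z_{α/2} + z_β)² / (π(1 - π)θ²) with θ = ln(HR), which matches the symbols defined above; that choice, the equal allocation π = 0.5 and the function name are all assumptions rather than the author's stated method.

```python
from math import ceil, log


def events_needed(hr, pi=0.5, z_alpha=1.96, z_beta=0.84):
    """Required number of events under the ASSUMED Schoenfeld formula."""
    theta = log(hr)  # HR = exp(theta)
    return (z_alpha + z_beta) ** 2 / (pi * (1 - pi) * theta**2)


m = events_needed(hr=2.8)     # about 29.6 events
events = ceil(m)              # 30 events
subjects = ceil(m / 0.75)     # 40 subjects, given a death probability of 0.75
```

The 10 percent withdrawal still has to be added to the 40 subjects, following whichever inflation convention the study protocol adopts.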
For comparative cross-sectional study

For cohort study

Reading assignment
• Sample size calculation for a continuous outcome in a comparative cross-sectional study?
• Sample size calculation for the independent t-test?
• Sample size calculation for the chi-square test?
