
Basics of Biostatistics Course

This document provides an introduction to biostatistics. It discusses what biostatistics is, the stages of statistical investigation, and classifications of statistics such as descriptive and inferential statistics. Probability is also introduced, which is the likelihood of events occurring. Key concepts in probability like random experiments, sample space, and events are outlined. The introduction serves to lay the groundwork for further exploring topics in biostatistics like probability distributions, sampling methods, sample size calculations, and sampling distributions.

Uploaded by

Mubarek Umer

Basics for Biostatistics

Zeytu Gashaw Asfaw (PhD)


Department of Biostatistics and Epidemiology
School of Public Health, Addis Ababa University
Addis Ababa, Ethiopia

June 29, 2023

Table of contents

1 Introduction

2 Probability distributions (Normal, Binomial, Poisson)

3 Types of Sampling Methods

4 Sample Size Calculation

5 Sampling Distribution

6 References

Introduction

What is Statistics?

Statistics is the science of collecting, organizing, analyzing, and
drawing meaningful inferences from data, which leads to good
decisions.
Statistics is the art and science of making decisions in the face
of uncertainty.
The field of statistics provides some of the most fundamental tools
and techniques of the scientific method:
forming hypotheses,
designing experiments and observational studies,
gathering data,
summarizing data,
drawing inferences from data (e.g., testing hypotheses).


Stages in Statistical Investigation

Data collection
Organization and presentation of data
Data analysis
Interpretation of the results


Limitations of Statistics

It does not study qualitative characteristics directly.
It does not deal with single individuals but with aggregates
of facts.
Statistical findings are approximate.
It is sensitive to misuse.


Classification of Statistics

Descriptive Statistics
It helps to describe a given set of data without going beyond that data.
It consists of the collection, organization, summarization, and analysis
of data.

Inferential Statistics
It helps to make inferences or conclusions about a population based on
a selected sample.
It consists of predicting and forecasting values of population parameters,
testing hypotheses about values of population parameters, and making decisions.


What is Probability?

Probability is the branch of mathematics concerning numerical
descriptions of how likely an event is to occur, or how likely it is
that a proposition is true.
Probability theory is a branch of mathematics concerned with
the analysis of random phenomena.
The outcome of a random event cannot be determined before
it occurs, but it may be any one of several possible outcomes.
The actual outcome is considered to be determined by chance.
The probability of an event is a number between 0 and 1, where,
roughly speaking, 0 indicates impossibility of the event and 1
indicates certainty.
The higher the probability of an event, the more likely it is that the
event will occur.


Basic concepts of probability

Random experiments
Sample space
Events
1 Mutually exclusive events (disjoint events)
2 Equally likely events - events that have an equal chance to occur.
3 Favourable events - the number of outcomes favourable to an event
in an experiment is the number of outcomes which entail the
happening of the event.
4 Exhaustive events - outcomes are said to be exhaustive when they
include all possible outcomes.
5 Independent events - events are independent if the occurrence or
non-occurrence of one event does not affect the occurrence or
non-occurrence of the other.


Examples - Applications of Probability in Real Life

Probability is used to answer the following types of questions:
What is the chance that it will rain tomorrow?
What is the chance that a stock will go up in price?
What is the chance that I will have a heart attack?
What is the chance that I will live longer than 70 years?
What is the likelihood that when rolling a pair of dice, I will roll doubles?
What is the probability that I will win the lottery?
What is the probability that I will become diabetic?


...Examples - Applications Of Probability in Real Life

Forecasting the weather.
Sports outcomes. Coaches use probability to decide the best
possible strategy to pursue in a game.
Card games and other games of chance.
Insurance.
Medical diagnosis.
This is one of the most noble applications of probability in real
life. How does your doctor know that your cough is just because of
an infection and not because of something more serious?
Election results.
Lottery probability.


Random variable

A real variable X whose value is determined by the outcome of
a random experiment is called a random variable.
A random variable, X, is a function that assigns a single, but
variable, value to each element of a sample space.
Example: A random experiment consists of two tosses of a coin.
Consider the random variable X, the number of heads (0, 1, or 2).
Outcome Value of X
HH 2
HT 1
TH 1
TT 0
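
The mapping from outcomes to values of X can be sketched in Python. This is a minimal illustration, assuming a fair coin so that the four outcomes are equally likely; the names `sample_space`, `number_of_heads`, and `distribution` are illustrative and do not come from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of two coin tosses: HH, HT, TH, TT.
sample_space = ["".join(outcome) for outcome in product("HT", repeat=2)]

def number_of_heads(outcome):
    """Random variable X: maps an outcome to its number of heads."""
    return outcome.count("H")

# Assumption: the coin is fair, so all four outcomes are equally likely.
counts = Counter(number_of_heads(o) for o in sample_space)
distribution = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```

Each outcome receives exactly one value of X, while several outcomes (HT and TH) may share the same value.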


Discrete and Continuous Random Variables

1 A discrete random variable:
has a countable number of possible values
has discrete jumps (or gaps) between successive values
has measurable probability associated with individual values
counts
2 A continuous random variable:
has an uncountably infinite number of possible values
moves continuously from value to value
has no measurable probability associated with each value
measures (e.g.: height, weight, speed, value, duration, length)


Discrete Vs Continuous Probability Distribution

Discrete Probability Distribution    Continuous Probability Distribution
Bernoulli                            Normal
Binomial                             Exponential
Poisson                              Gamma
Hypergeometric                       Logistic
Multinomial                          Weibull
Geometric                            Log-logistic
Negative Binomial                    Log-normal
...                                  ...


The Binomial Random Variable

Consider a Bernoulli process in which we have a sequence of
n identical trials satisfying the following conditions:
1 Each trial has two possible outcomes, called success and failure.
The two outcomes are mutually exclusive and exhaustive.
2 The probability of success, denoted by p, remains constant from
trial to trial. The probability of failure is denoted by q, where q = 1 − p.
3 The n trials are independent. That is, the outcome of any trial does
not affect the outcomes of the other trials.
A random variable, X, that counts the number of successes in n
Bernoulli trials, where p is the probability of success in any given
trial, is said to follow the binomial probability distribution with
parameters n (number of trials) and p (probability of success). We
call X the binomial random variable.

Probability distributions (Normal, Binomial, Poisson)

What is Probability Distribution?

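The probability distribution of a random variable is a table that lists the
possible values of the random variable and their associated
probabilities.
Example: Consider the different possible orderings of boy (B) and
girl (G) in four sequential births. There are $2 \times 2 \times 2 \times 2 = 2^4 = 16$
possibilities in the sample space. Assuming the 16 orderings are equally
likely and taking x to be the number of boys (by symmetry, the same table
holds for the number of girls), the probability distribution p(x) is:
x    p(x)
0    1/16
1    4/16
2    6/16
3    4/16
4    1/16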


The Binomial Probability Distribution

What is Binomial Distribution?

The binomial distribution is a common probability distribution that
models the number of times one of two possible outcomes is obtained
in a fixed number of trials.
It summarizes the number of successes in n trials when each trial has the
same chance of attaining one specific outcome.
The binomial distribution is frequently used to model the number of
successes in a sample of size n drawn with replacement from a
population of size N.
If the sampling is carried out without replacement, the draws are
not independent and so the resulting distribution is a hypergeometric
distribution, not a binomial one.


...The Binomial Probability Distribution

The binomial probability distribution:

$$P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x} = \frac{n!}{x!\,(n - x)!}\, p^{x} (1 - p)^{n - x}, \qquad x = 0, 1, \ldots, n$$

where
p is the probability of success in a single trial
q = 1 − p is the probability of failure in a single trial
n is the number of trials
x is the number of successes.
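
A minimal Python sketch of the binomial probability mass function, using only the standard library. The function name `binomial_pmf` and the values n = 4, p = 0.5 are illustrative assumptions, not taken from the slides.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Illustrative values: n = 4 trials, success probability p = 0.5.
n, p = 4, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(pmf)       # probabilities for x = 0, 1, ..., 4
print(sum(pmf))  # a valid pmf sums to 1
```

With n = 4 and p = 0.5 the list equals 1/16, 4/16, 6/16, 4/16, 1/16, the same values as the four-births table.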


Binomial distribution

1 Mean = $\mu = np$
2 Variance = $\sigma^2 = npq$
3 Standard Deviation = $\sigma = \sqrt{npq}$
4 Moment Generating function (MGF) = $(q + pe^{t})^{n}$
5 Characteristic function (CF) = $(q + pe^{it})^{n}$
6 Skewness = $\dfrac{q - p}{\sqrt{npq}}$
7 Kurtosis (excess) = $\dfrac{1 - 6pq}{npq}$
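
The closed-form mean, variance, and skewness can be checked numerically by summing over the probability mass function. The sketch below is only a sanity check under illustrative parameters (n = 10, p = 0.2); the helper name `binomial_pmf` is an assumption.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.2  # illustrative parameters
q = 1 - p
xs = range(n + 1)

# Moments computed directly from the definition E[g(X)] = sum g(x) P(X = x).
mean = sum(x * binomial_pmf(x, n, p) for x in xs)
variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in xs)
third_central = sum((x - mean) ** 3 * binomial_pmf(x, n, p) for x in xs)
skewness = third_central / variance**1.5

print(mean, n * p)                          # both close to 2
print(variance, n * p * q)                  # both close to 1.6
print(skewness, (q - p) / sqrt(n * p * q))  # both close to 0.474
```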


[Figure: Binomial distribution - probability mass function]


[Figure: Binomial distribution - cumulative distribution function]


Examples of the Binomial Distribution

Example 1. A family has four children. Assuming the probability of a
male birth is 1/2, find the probability that there will be
1 at least one boy
2 at least one boy and one girl
Solution:
Let X and Y denote the number of boys and girls in the
family, respectively.
Given P(boy) = 1/2 and P(girl) = 1/2,

$$P(\text{at least one boy}) = P(X \ge 1) = 1 - P(X = 0) = 1 - \binom{4}{0}\left(\frac{1}{2}\right)^{4} = 1 - \frac{1}{16} = \frac{15}{16}$$

Examples of the Binomial Distribution

...Example 1

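$$\begin{aligned}
P(\text{at least one boy and one girl}) &= P[(X \ge 1) \cap (Y \ge 1)] \\
&= 1 - P[(X < 1) \cup (Y < 1)] \\
&= 1 - [P(X = 0) + P(Y = 0)] \\
&= 1 - \left[\left(\tfrac{1}{2}\right)^{4} + \left(\tfrac{1}{2}\right)^{4}\right] \\
&= 1 - \left(\tfrac{1}{16} + \tfrac{1}{16}\right) \\
&= \tfrac{7}{8}
\end{aligned}$$

The events X = 0 and Y = 0 are mutually exclusive, so the probability of
their union is the sum of their probabilities.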
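
Both answers of Example 1 can be cross-checked by brute-force enumeration of the 16 birth orders. This is a sketch that assumes, as the example does, that boys and girls are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 16 equally likely birth orders for four children (B = boy, G = girl).
families = list(product("BG", repeat=4))

at_least_one_boy = [f for f in families if "B" in f]
boy_and_girl = [f for f in families if "B" in f and "G" in f]

p_at_least_one_boy = Fraction(len(at_least_one_boy), len(families))
p_boy_and_girl = Fraction(len(boy_and_girl), len(families))

print(p_at_least_one_boy)  # 15/16
print(p_boy_and_girl)      # 7/8
```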


...Examples of the Binomial Distribution

Example 2. In a research study, rats are injected with a drug that inhibits
the body's synthesis of protein. Previous research found
that the probability of a rat dying from the drug before the
experiment is over is 0.2. If 10 rats are used,
1 How many are expected to die before the experiment ends?
2 What is the probability that at least eight will survive?
Solution:
Let X be the random variable representing the number of rats that die
in the experiment.
Here, p = 0.2 ⇒ q = 0.8, n = 10


...Examples of the Binomial Distribution

...Example 2
Expected number of rats that die before the experiment ends = np = 10 × 0.2 = 2
P(at least eight will survive) = P(at most 2 will die)

$$\begin{aligned}
P(X \le 2) &= P(X = 0) + P(X = 1) + P(X = 2) \\
&= \binom{10}{0}(0.2)^{0}(0.8)^{10} + \binom{10}{1}(0.2)^{1}(0.8)^{9} + \binom{10}{2}(0.2)^{2}(0.8)^{8} \\
&= 0.6778
\end{aligned}$$
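
As a quick check of Example 2, the expected number of deaths and the probability that at most two rats die can be computed directly. This is a minimal sketch using the standard library; the variable names are illustrative.

```python
from math import comb

n, p = 10, 0.2  # 10 rats, each dies with probability 0.2

expected_deaths = n * p
# "At least eight survive" is the same event as "at most two die".
p_at_most_two_die = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(3))

print(expected_deaths)              # 2.0
print(round(p_at_most_two_die, 4))  # 0.6778
```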


...Examples of the Binomial Distribution

Example 3: Assume that half of the population consumes rice, so that
the chance of an individual being a consumer is 1/2, and that
100 investigators each take 10 individuals to check whether they are
consumers. How many investigators do we expect to report that three or
fewer were consumers?
Solution:
Let X be the random variable representing the number of
consumers of rice among 10 individuals.
n = 10, p = 1/2, q = 1/2

$$P(X = x) = \binom{10}{x}\left(\frac{1}{2}\right)^{x}\left(\frac{1}{2}\right)^{10 - x} = \binom{10}{x}\left(\frac{1}{2}\right)^{10}$$


...Examples of the Binomial Distribution

...Example 3

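$$\begin{aligned}
P(X \le 3) &= P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) \\
&= \binom{10}{0}\left(\tfrac{1}{2}\right)^{10} + \binom{10}{1}\left(\tfrac{1}{2}\right)^{10} + \binom{10}{2}\left(\tfrac{1}{2}\right)^{10} + \binom{10}{3}\left(\tfrac{1}{2}\right)^{10} \\
&= \left(\tfrac{1}{2}\right)^{10}(1 + 10 + 45 + 120) \\
&= \frac{176}{1024} = 0.1719
\end{aligned}$$

Hence, of the 100 investigators, about 100 × 0.1719 ≈ 17 are expected to
report three or fewer consumers.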
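
The arithmetic of Example 3 can be reproduced exactly with rational numbers. The sketch below also multiplies the probability by the 100 investigators to answer the question posed; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n = 10  # individuals examined by each investigator, p = q = 1/2

# P(X <= 3) = [C(10,0) + C(10,1) + C(10,2) + C(10,3)] / 2^10
p_at_most_three = Fraction(sum(comb(n, x) for x in range(4)), 2**n)

# Expected number of investigators (out of 100) reporting three or fewer.
expected_investigators = 100 * p_at_most_three

print(p_at_most_three, float(p_at_most_three))
print(float(expected_investigators))
```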


...Examples of the Binomial Distribution

Example 4: Number of Side Effects from Medications


Medical professionals use the binomial distribution to model the
probability that a certain number of patients will experience side
effects as a result of taking new medications.
For example, suppose it is known that 5% of adults who take a
certain medication experience negative side effects.
We can use a Binomial Distribution Calculator to find the
probability that more than a certain number of patients in a random
sample of 100 will experience negative side effects.
1 P(X > 5 patients experience side effects) = 0.38400
2 P(X > 10 patients experience side effects) = 0.01147
3 P(X > 15 patients experience side effects) = 0.0004
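
Instead of an online calculator, the same tail probabilities can be computed with a few lines of Python. This is a sketch using only the standard library; `binomial_tail` is a hypothetical helper name, and n = 100, p = 0.05 are the values from the example.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X > k) for X ~ Binomial(n, p), computed as 1 - P(X <= k)."""
    return 1 - sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

n, p = 100, 0.05  # 100 patients, 5% side-effect rate
for k in (5, 10, 15):
    print(f"P(X > {k}) = {binomial_tail(k, n, p):.5f}")
```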


...Examples of the Binomial Distribution

Example 5: Number of Spam Emails per Day


Email companies use the binomial distribution to model the
probability that a certain number of spam emails land in an inbox
per day.
For example, suppose it is known that 4% of all emails are spam.
If an account receives 20 emails in a given day, we can use a Binomial
Distribution Calculator to find the probability that a certain number
of those emails are spam:
1 P(X = 0 spam emails) = 0.44200
2 P(X = 1 spam email) = 0.36834
3 P(X = 2 spam emails) = 0.14580


The Poisson Probability Distribution

What is Poisson Distribution?


The Poisson probability distribution is useful for determining the
probability of a number of occurrences over a given period of time
or within a given area or volume.
That is, the Poisson random variable counts occurrences over
a continuous interval of time or space.
A Poisson distribution is a discrete probability distribution.
It gives the probability of an event happening a certain number of
times (k) within a given interval of time or space. The Poisson
distribution has only one parameter, λ (lambda), which is the mean
number of events.


...The Poisson Probability Distribution

The Poisson distribution represents the probability distribution of
a certain number of events occurring in a fixed time interval.
The events tend to have a constant mean rate.
Poisson distribution is further used to determine how many times
an event is likely to occur within a given time period.
The range of Poisson distribution starts at zero, and it goes
until infinity.
In Poisson distribution, the rate at which the events occur must
be constant, and the occurrence of one event must not affect the
occurrence of any other event, i.e., the events should occur
independently.


Examples of Poisson Distribution

1 Number of Network Failures per Week
2 Number of Bankruptcies Filed per Month
3 Number of Website Visitors per Hour
4 Number of Arrivals at a Restaurant
5 Number of Calls per Hour at a Call Center
6 Number of Books Sold per Week
7 Average Number of Storms in a City
8 Number of Emergency Calls Received by a Hospital Every Minute


...The Poisson Probability Distribution

A discrete random variable X is said to have a Poisson distribution,
with parameter λ > 0, if it has a probability mass function given by:

$$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$

The events are independent.
The average number of occurrences in the given period of time is
known and constant.
No two events can occur at the same time.
The Poisson distribution is the limiting form of the binomial
distribution when the number of trials n is indefinitely large.
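
A minimal Python sketch of the Poisson probability mass function. The function name `poisson_pmf` and the value λ = 1.5 are illustrative assumptions; the sum is truncated at 50 terms, which is more than enough for such a small mean.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam), defined for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)

lam = 1.5  # illustrative mean number of events per interval
probs = [poisson_pmf(x, lam) for x in range(50)]

print(round(probs[0], 4))    # P(X = 0) = e^(-1.5)
print(round(sum(probs), 6))  # the probabilities sum to 1 (up to truncation)
```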


Poisson distribution

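1 Mean = $\lambda$
2 Variance = $\lambda$
3 Standard Deviation = $\sqrt{\lambda}$
4 Moment Generating function (MGF) = $\exp[\lambda(e^{t} - 1)]$
5 Characteristic function (CF) = $\exp[\lambda(e^{it} - 1)]$
6 Skewness = $\dfrac{1}{\sqrt{\lambda}}$
7 Kurtosis (excess) = $\dfrac{1}{\lambda}$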


[Figure: Poisson distribution - probability mass function]


[Figure: Poisson distribution - cumulative distribution function]


Poisson distribution is a limiting case of the binomial distribution

1 The number of trials is indefinitely large, i.e., $n \to \infty$
2 The probability of success in each trial, p, is very small, i.e., $p \to 0$
3 $np = \lambda$ is finite, so that $p = \lambda / n$
4 $q = 1 - \lambda / n$
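
The limiting behaviour can be observed numerically: holding np = λ fixed while n grows, the binomial probability approaches the Poisson probability. The sketch below uses illustrative values (λ = 2, x = 3) that are not taken from the slides.

```python
from math import comb, exp, factorial

lam, x = 2.0, 3  # illustrative: mean 2, probability of exactly 3 events
poisson = exp(-lam) * lam**x / factorial(x)

errors = []
for n in (10, 100, 1000, 10000):
    p = lam / n  # keep np = lam fixed while n grows and p shrinks
    binomial = comb(n, x) * p**x * (1 - p) ** (n - x)
    errors.append(abs(binomial - poisson))
    print(f"n = {n:>5}: binomial = {binomial:.6f}, Poisson = {poisson:.6f}")
```

The absolute difference shrinks as n increases, which is why the Poisson distribution is a convenient approximation when n is large and p is small.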


Computations of Poisson Probabilities

Example 1: A life insurance company insures the lives of 5000 men of
age 42 years. The probability of a 42-year-old man dying in a given year
is 0.001. What is the probability that the company will have to pay 4
claims in a given year?
Solution:
Total number of men insured = n = 5000
Probability = p = 0.001
λ = np = 5000 × 0.001 = 5.
Thus,

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-5}\, 5^{x}}{x!}$$

$$P(X = 4) = \frac{e^{-5}\, 5^{4}}{4!} = 0.1755$$


Computations of Poisson Probabilities

Example 2: A travel company has two cars for hire. The demand for a
car on each day is distributed as a Poisson variate with mean 1.5.
Calculate the proportion of days on which
1 Neither car is used
2 Some demand is refused
Solution:
Let X be the random variable representing the number of demands
for cars in a day.
λ = 1.5

$$P(X = x) = P(x \text{ demands in a day}) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-1.5}(1.5)^{x}}{x!}$$


Computations of Poisson Probabilities

...Example 2
1 The proportion of days on which neither car is used:

$$P(X = 0) = \frac{e^{-1.5}(1.5)^{0}}{0!} = e^{-1.5} = 0.2231$$

2 The proportion of days on which some demand is refused.
Demand is refused when X exceeds 2, the number of cars available:

$$\begin{aligned}
P(X > 2) &= 1 - P(X \le 2) \\
&= 1 - [P(X = 0) + P(X = 1) + P(X = 2)] \\
&= 1 - \left[\frac{e^{-1.5}(1.5)^{0}}{0!} + \frac{e^{-1.5}(1.5)^{1}}{1!} + \frac{e^{-1.5}(1.5)^{2}}{2!}\right] \\
&= 1 - e^{-1.5}\left[1 + 1.5 + \frac{(1.5)^{2}}{2!}\right] \\
&= 0.1912
\end{aligned}$$
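
The two proportions in Example 2 can be recomputed without intermediate rounding. This is a sketch with the standard library; `poisson_pmf` is a hypothetical helper name, and the value 2 is the number of cars from the example.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 1.5  # mean daily demand for cars
cars = 2   # cars available for hire

p_no_demand = poisson_pmf(0, lam)
# Demand is refused when more than `cars` requests arrive in a day.
p_refused = 1 - sum(poisson_pmf(x, lam) for x in range(cars + 1))

print(round(p_no_demand, 4))
print(round(p_refused, 4))
```

Carrying full precision through the calculation gives roughly 0.19 for the second proportion; small differences in the last digits come from rounding e^(−1.5) before multiplying.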

Computations of Poisson Probabilities

Example 3: The probability of an individual suffering a bad reaction from
an injection of a certain antibiotic is 0.001. Out of 2000 individuals, find
the probability that
1 Exactly 3 suffer a bad reaction
2 More than 2 suffer a bad reaction
Solution:
Let X denote the number of individuals suffering a bad reaction.
p = 0.001 and n = 2000
Since n is large and the probability of a bad reaction is small, we can
treat X as a Poisson variate with λ = np = 2000 × 0.001 = 2.

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-2}\, 2^{x}}{x!}$$

Computations of Poisson Probabilities

...Example 3
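$$P(\text{exactly 3 suffer}) = P(X = 3) = \frac{e^{-2}\, 2^{3}}{3!} = 0.180447$$

$$\begin{aligned}
P(\text{more than 2 suffer}) &= 1 - P(X \le 2) \\
&= 1 - [P(X = 0) + P(X = 1) + P(X = 2)] \\
&= 1 - \left[\frac{e^{-2}\, 2^{0}}{0!} + \frac{e^{-2}\, 2^{1}}{1!} + \frac{e^{-2}\, 2^{2}}{2!}\right] \\
&= 1 - e^{-2}[1 + 2 + 2] \\
&= 1 - 0.6767 \\
&= 0.3233
\end{aligned}$$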
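
To see how good the Poisson approximation is in Example 3, the sketch below computes the Poisson answers and compares one of them with the exact binomial probability for n = 2000, p = 0.001. The helper names are illustrative assumptions.

```python
from math import comb, exp, factorial

n, p = 2000, 0.001
lam = n * p  # Poisson mean, equal to 2

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p_exactly_3 = poisson_pmf(3, lam)
p_more_than_2 = 1 - sum(poisson_pmf(x, lam) for x in range(3))
exact_exactly_3 = binomial_pmf(3, n, p)

print(round(p_exactly_3, 6), round(p_more_than_2, 4))
print(round(exact_exactly_3, 6))  # exact binomial value, close to the Poisson one
```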


Computations of Poisson Probabilities

Example 4: An insurance company found that only 0.01% of the
population is involved in a road accident in a particular area. If 1000 policy
holders are randomly selected from the population, what is the
probability that not more than two of its clients will be involved in such
an accident next year?
Solution:
p = 0.01% = 0.0001
n = 1000
λ = np = 0.0001 × 1000 = 0.1
Let X be the random variable counting the clients involved in an accident.

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-0.1}(0.1)^{x}}{x!}$$


Computations of Poisson Probabilities

...Example 4

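$$\begin{aligned}
P(\text{not more than two}) &= P(X \le 2) \\
&= P(X = 0) + P(X = 1) + P(X = 2) \\
&= \frac{e^{-0.1}(0.1)^{0}}{0!} + \frac{e^{-0.1}(0.1)^{1}}{1!} + \frac{e^{-0.1}(0.1)^{2}}{2!} \\
&= e^{-0.1}\left[1 + 0.1 + \frac{(0.1)^{2}}{2}\right] \\
&= 0.9998
\end{aligned}$$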


The Normal Probability Distribution

What Is a Normal Distribution?


The normal distribution, also known as the Gaussian distribution, is a
probability distribution that is symmetric about the mean, showing
that data near the mean are more frequent in occurrence than
data far from the mean.
In graphical form, the normal distribution appears as a "bell curve".


...The Normal Probability Distribution

What are the 5 properties of a normal distribution?


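It is symmetric. A normal distribution has a perfectly
symmetrical shape.
The mean, median, and mode are equal.
The middle point of a normal distribution is the point with
the maximum frequency, which means that it possesses the
most observations of the variable.
Empirical rule: about 68%, 95%, and 99.7% of observations lie within
one, two, and three standard deviations of the mean, respectively.
Skewness and kurtosis: the skewness is 0 and the excess kurtosis is 0.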


...The Normal Probability Distribution

In statistics, a normal distribution or Gaussian distribution is a type of
continuous probability distribution for a real-valued random variable. The
general form of its probability density function is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}, \qquad -\infty < x < \infty$$

The parameter µ is the mean or expectation of the distribution
(and also its median and mode), while the parameter σ is its
standard deviation.
The variance of the distribution is σ².
A random variable with a Gaussian distribution is said to be
normally distributed, and is called a normal deviate.
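
A minimal Python sketch of the normal density and its cumulative distribution function, written with the standard library's error function `math.erf`. The function names `normal_pdf` and `normal_cdf` are illustrative, and the standard normal (µ = 0, σ = 1) is used only as an example.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), expressed through the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 0.0, 1.0  # illustrative parameters: the standard normal
peak = normal_pdf(mu, mu, sigma)
within_one_sd = normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma)
within_two_sd = normal_cdf(mu + 2 * sigma, mu, sigma) - normal_cdf(mu - 2 * sigma, mu, sigma)

print(round(peak, 4))           # height of the curve at the mean
print(round(within_one_sd, 4))  # roughly 68% of the probability
print(round(within_two_sd, 4))  # roughly 95% of the probability
```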


Normal distribution

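1 Mean = $\mu$
2 Variance = $\sigma^2$
3 Standard Deviation = $\sigma$
4 Moment Generating function (MGF) = $\exp\left[\mu t + \dfrac{\sigma^{2} t^{2}}{2}\right]$
5 Characteristic function (CF) = $\exp\left[i\mu t - \dfrac{\sigma^{2} t^{2}}{2}\right]$
6 Skewness = 0
7 Kurtosis (excess) = 0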


[Figure: Normal distribution - probability density function]


[Figure: Normal distribution - cumulative distribution function]


Standard Normal Distribution

What is Standard Normal distribution?


If X is a random variable following a normal distribution with
parameters µ and σ, then $Z = \dfrac{X - \mu}{\sigma}$ is called the standard normal variate, and
the probability density function of the standard normal variate Z is given by

$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}, \qquad -\infty < z < \infty$$


Examples of Normal Distribution

1 Birthweight of Babies
2 Height of Babies
3 Shoe Sizes
4 Blood Pressure
5 Students' marks


Computations of Normal Probabilities

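Example 1. Let $X \sim N(\mu = 160, \sigma^2 = 30^2)$. Compute $P(100 \le X \le 180)$.

$$\begin{aligned}
P(100 \le X \le 180) &= P\left(\frac{100 - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{180 - \mu}{\sigma}\right) \\
&= P\left(\frac{100 - 160}{30} \le Z \le \frac{180 - 160}{30}\right) \\
&= P(-2 \le Z \le 0.6667) \\
&= 0.4772 + 0.2475 \\
&= 0.7247
\end{aligned}$$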
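
The table lookup in Example 1 can be cross-checked with Python's built-in `statistics.NormalDist` (available from Python 3.8). This is a sketch; the variable names are illustrative, and the result may differ from the table-based answer in the fourth decimal place because tables are rounded.

```python
from statistics import NormalDist

X = NormalDist(mu=160, sigma=30)

# Standardize the interval endpoints: z = (x - mu) / sigma.
z_low = (100 - X.mean) / X.stdev
z_high = (180 - X.mean) / X.stdev

# P(100 <= X <= 180) as a difference of cumulative probabilities.
prob = X.cdf(180) - X.cdf(100)

print(round(z_low, 4), round(z_high, 4))
print(round(prob, 4))
```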


Computations of Normal Probabilities

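Example 2. Let $X \sim N(\mu = 127, \sigma^2 = 22^2)$. Compute $P(X \le 150)$.

$$\begin{aligned}
P(X \le 150) &= P\left(\frac{X - \mu}{\sigma} \le \frac{150 - \mu}{\sigma}\right) \\
&= P\left(Z \le \frac{150 - 127}{22}\right) \\
&= P(Z \le 1.045) \\
&= 0.5 + 0.3520 \\
&= 0.8520
\end{aligned}$$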


Computations of Normal Probabilities

Example 3. Suppose the heights of men in a certain country are normally distributed with mean 68 inches and SD = 2.5 inches. Find the percentage of men who are between a = 66 and b = 71 inches in height.
Solution:

\[
\begin{aligned}
P(66 \le X \le 71) &= P\left(\frac{66-\mu}{\sigma} \le \frac{X-\mu}{\sigma} \le \frac{71-\mu}{\sigma}\right) \\
&= P\left(\frac{66-68}{2.5} \le Z \le \frac{71-68}{2.5}\right) \\
&= P(-0.8 \le Z \le 1.20) \\
&= P(0 \le Z \le 0.8) + P(0 \le Z \le 1.20) \\
&= 0.2881 + 0.3849 \\
&= 0.6730
\end{aligned}
\]

Approximately 67.3% of men are between 66 and 71 inches in height.

Computations of Normal Probabilities

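Example 4. If X is normally distributed with µ = 12 and σ = 4, find the probability of
1 X ≥ 20
2 X ≤ 20
3 0 ≤ X ≤ 12
Solution:

\[
\begin{aligned}
P(X \ge 20) &= P\left(\frac{X-\mu}{\sigma} \ge \frac{20-\mu}{\sigma}\right) \\
&= P\left(Z \ge \frac{20-12}{4}\right) \\
&= P(Z \ge 2) \\
&= 0.5 - 0.4772 \\
&= 0.0228
\end{aligned}
\]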

Computations of Normal Probabilities

...Example 4 Solution:

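\[ P(X \le 20) = 1 - P(X \ge 20) = 1 - 0.0228 = 0.9772 \]

\[
\begin{aligned}
P(0 \le X \le 12) &= P\left(\frac{0-\mu}{\sigma} \le \frac{X-\mu}{\sigma} \le \frac{12-\mu}{\sigma}\right) \\
&= P\left(\frac{0-12}{4} \le Z \le \frac{12-12}{4}\right) \\
&= P(-3 \le Z \le 0) \\
&= P(0 \le Z \le 3) \\
&= 0.4987
\end{aligned}
\]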
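All three parts of Example 4 can be reproduced with `statistics.NormalDist` from the Python standard library (available from Python 3.8). This is a sketch for checking the hand calculation; the variable names are illustrative.

```python
from statistics import NormalDist

# Example 4: X ~ N(mu = 12, sigma = 4)
X = NormalDist(mu=12, sigma=4)

p_ge_20 = 1 - X.cdf(20)           # part 1: P(X >= 20)
p_le_20 = X.cdf(20)               # part 2: P(X <= 20)
p_0_to_12 = X.cdf(12) - X.cdf(0)  # part 3: P(0 <= X <= 12)

print(p_ge_20, p_le_20, p_0_to_12)
```

The results match the table-based answers 0.0228, 0.9772 and 0.4987 up to rounding.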

Computations of Normal Probabilities

Exercise. If X is normally distributed with µ = 30 and σ = 5, find
1 P(26 ≤ X ≤ 40)
2 P(X ≥ 45)


What is statistical data?
When census data cannot be collected, statisticians collect sample data by developing specific experiment designs and survey samples.

How do we get representative samples?
How do we get an optimal sample?


What is sampling?
Why do we need it?
When do we need it?
How do we get it?
How much of it do we need?


Definition of some basic terms
Population: the totality of elements or units (the set of all measurements) under study.
Survey: any activity that collects information in an organized and methodical manner about characteristics of interest from some or all units of a population using well-defined concepts, methods and procedures, and compiles such information into a useful summary form (Statistics Canada, 2010).
Census survey: data are collected for all units in the population.
Sample survey: data are collected for only a fraction (typically a very small fraction) of the units of the population.
Sampling frame: a complete list of all the units of the population.
Sample: a part (subset) of the population.
Sampling: the technique of selecting a representative sample from the population.

Parameter Vs Statistic

A parameter is a number describing a whole population (e.g., population mean), while a statistic is a number describing a sample (e.g., sample mean).

Sample    Quantity              Population
n         size                  N
X̄         mean                  µ
S²        variance              σ²
s         standard deviation    σ
p̂         proportion            P


Reasons for Sampling

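Using a sample saves time, cost and human resources.
It prevents destruction of the units under study when measurement is destructive.
It can provide a higher level of accuracy, because a smaller data collection effort can be supervised more closely.
It may be the only way of undertaking the study.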


What are the limitations of sampling?

Chances of bias. A serious limitation of the sampling method is that it may involve biased selection and thereby lead us to draw erroneous conclusions.
Difficulties in selecting a truly representative sample.
Inadequate knowledge of the subject.
Changeability of units.
Impossibility of sampling.


What factors do we consider when sampling?

The reasons for and objectives of sampling.
The relationship between accuracy and precision.
The reliability of estimates with varying sample size.
The determination of safe sample sizes for surveys.
The variability of data.

Types of Sampling Methods

Probability sampling involves random selection, allowing you to make
strong statistical inferences about the whole group.
Non-probability sampling involves non-random selection based on
convenience or other criteria, allowing you to easily collect data.


Probability (random) Sampling

Simple Random Sampling


Systematic Random Sampling
Stratified Random Sampling
Cluster Random Sampling
Multistage Random Sampling
Multiphase Random Sampling
Matched Random Sampling
Panel Random Sampling


....Probability (random) Sampling


Probability (random) sampling: the selection of the sample is based purely on chance. Every unit of the population has a known, non-zero probability of being included in the sample.
Simple random sampling: every unit of the population is given an equal chance of being selected. The sample can be drawn using the lottery method or a table of random numbers.
Systematic sampling: we start at a random point in the sampling frame and, from this point, select every kth unit in the frame to form the sample.
Stratified sampling: the population is partitioned into two or more subpopulations called strata, and from each stratum a sample of the desired size is selected at random. Under proportional allocation, the sample size for stratum h is
\[ n_h = \frac{N_h}{N}\, n \]
where N_h is the size of stratum h, N is the population size and n is the total sample size.
Cluster sampling: the population is divided into groups called clusters, a random sample of clusters is selected, and then units from these selected clusters are observed.
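The first three designs above can be sketched in a few lines of Python. This is a minimal illustration, not a survey-grade implementation: the function names, the frame of 100 unit IDs and the stratum labels and sizes are all hypothetical, and the rounding used in the proportional allocation may make the stratum sizes add up to slightly more or less than n in general.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n units without replacement; every unit has the same chance."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Random start, then every k-th unit, with k = len(frame) // n."""
    k = len(frame) // n
    start = rng.randrange(k)          # random start in 0 .. k-1
    return frame[start::k][:n]

def proportional_allocation(stratum_sizes, n):
    """n_h = (N_h / N) * n for each stratum h, rounded to whole units."""
    total = sum(stratum_sizes.values())
    return {h: round(n * size / total) for h, size in stratum_sizes.items()}

rng = random.Random(2023)             # fixed seed so the sketch is repeatable
frame = list(range(1, 101))           # hypothetical sampling frame of 100 unit IDs

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
allocation = proportional_allocation(
    {"urban": 500, "rural": 300, "pastoral": 200}, 100
)
print(srs)
print(sys_sample)
print(allocation)   # {'urban': 50, 'rural': 30, 'pastoral': 20}
```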

....Probability (random) Sampling

Multistage Sampling
A complex form of cluster sampling in which two or more levels of units are embedded one in the other.
First stage: a random sample of districts is chosen in all states.
Second stage: a random sample of villages is chosen within the selected districts.
Third stage: the units are houses within the selected villages.
All ultimate units (houses, for instance) selected at the last step are surveyed.


....Probability (random) Sampling

...Multistage Sampling
This technique is essentially the process of taking random samples of preceding random samples.
It is not as effective as true random sampling, but it probably solves more of the problems inherent to random sampling.
It is an effective strategy because it banks on multiple randomizations. As such, it is extremely useful.
Multistage sampling is used frequently when a complete list of all members of the population does not exist or would be inappropriate to use.
Moreover, by avoiding the use of all sample units in all selected clusters, multistage sampling avoids the large, and perhaps unnecessary, costs associated with traditional cluster sampling.

....Probability (random) Sampling

Multiphase Sampling
Part of the information is collected from the whole sample and part from a subsample.
Example, in a TB survey:
Phase I: MT in all cases
Phase II: chest X-ray in MT +ve cases
Phase III: sputum examination in X-ray +ve cases
A survey by such a procedure is less costly, less laborious and more purposeful.


....Probability (random) Sampling

Matched Random Sampling

A method of assigning participants to groups in which pairs of participants are first matched on some characteristic and then individually assigned randomly to groups.
Two samples in which the members are clearly paired, or are matched explicitly by the researcher. For example, IQ measurements or pairs of identical twins.
Those samples in which the same attribute, or variable, is measured twice on each subject, under different circumstances. Commonly called repeated measures. Examples include the times of a group of athletes for 1500 m before and after a week of special training; the milk yields of cows before and after being fed a particular diet.


....Probability (random) Sampling

Panel Sampling
A method of first selecting a group of participants through a random sampling method and then asking that group for the same information several times over a period of time.
Each participant is therefore given the same survey or interview at two or more time points; each period of data collection is called a "wave".
This sampling methodology is often chosen for large-scale or nationwide studies in order to gauge changes in the population with regard to any number of variables, from chronic illness to job stress to weekly food expenditures.
Panel sampling can also be used to inform researchers about within-person health changes due to age, or to help explain changes in continuous dependent variables such as spousal interaction.
Several methods of analyzing panel sample data have been proposed, including growth curves.

Non-probability Sampling

Non-probability sampling: the sample is not based on chance. It is based on personal judgment.
Quota sampling
Judgment or purposive sampling
Convenience sampling
Snowball sampling
Snowball sampling, or chain-referral sampling, is defined as a non-probability sampling technique in which the samples have traits that are rare to find.
In snowball sampling, research participants recruit other participants for a test or study.
As sample members are not selected from a sampling frame, snowball samples are subject to numerous biases. For example, people who have many friends are more likely to be recruited into the sample. When virtual social networks are used, this technique is called virtual snowball sampling.

What do you need to consider before deciding on the right sampling technique?

How to Choose the Best Sampling Method

List the research goals (usually some combination of accuracy, precision, and/or cost).
Identify potential sampling methods that might effectively achieve those goals.
Test the ability of each method to achieve each goal.

What are the factors that could affect the choice of sampling method?

Choices in sample design are influenced by many factors, including the desired level of precision and detail of the information to be produced, the availability of appropriate sampling frames, and the availability of suitable auxiliary variables for stratification and sample selection.
Sample Size Calculation

Sample Size Calculation

Many of the studies published in journals have a smaller sample size than required, and hence less power.
Many articles explaining the methods of sample size calculation have been published, but a lot of confusion still exists.
It is very important to understand that the method of sample size calculation differs between study designs; one blanket formula cannot be used for all study designs.


...Sample Size Calculation

Why do we need to find the optimal sample size?

The main aim of a sample size calculation is to determine the number of participants needed to detect a clinically relevant treatment effect.
Studies should be designed to include a sufficient number of participants to adequately address the research question.
Studies that have either an inadequate number of participants or an excessively large number of participants are both wasteful in terms of participant and investigator time, resources to conduct the assessments, analytic efforts and so on.


...Sample Size Calculation

Why is sample size important?

The power of a statistical test is the probability that the test will reject the null hypothesis when the null hypothesis is false.
That is, power is the probability of not committing a Type II error.
The two major factors affecting the power of a study are the sample size and the effect size.
The larger the sample size, the smaller the effect size that can be detected.
The reverse is also true: small samples can reliably detect only large effect sizes.
While researchers generally have a strong idea of the effect size in their planned study, it is the determination of an appropriate sample size that often goes wrong and leads to an underpowered study.

...Sample Size Calculation

Why is sample size important?

The two major factors affecting the power of a study are the sample size and the effect size.
A study should only be undertaken once there is a realistic chance that the study will yield useful information.
A study whose sample size is too small may produce inconclusive results and could also be considered unethical, because it exposes human subjects or lab animals to needless risk.
A study that is too large will waste scarce resources and could expose more participants than necessary to any related risk.
Thus an appropriate determination of the sample size is a crucial step in the design of a study.

...Sample Size Calculation

Before calculating a sample size, you need to determine a few things about the target population and the sample you need:
Population size: How many total people fit your demographic?
Margin of error: No sample will be perfect, so you need to decide how much error to allow. The margin of error (the half-width of the confidence interval) determines how much higher or lower than the population mean you are willing to let your sample mean fall.
Confidence level: How confident do you want to be that the actual mean falls within your confidence interval? The most common confidence levels are 90%, 95% and 99%.
Standard deviation: How much variability do you expect in your responses?


What factors affect the sample size?
The factors affecting sample size are the study design, the method of sampling, and the outcome measures: effect size, standard deviation, study power, and significance level.
How does variation affect sample size?
As the sample size increases, the sample variance (variation between observations) stays roughly the same, settling near the population variance, but the variance of the sample mean (the squared standard error, σ²/n) decreases and hence precision increases.
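The effect of sample size on precision can be seen directly from the standard error of the mean, SD/√n. The sketch below uses an illustrative SD of 25 (an assumption chosen only for the demonstration); `standard_error` is a hypothetical helper name.

```python
import math

def standard_error(sd, n):
    """Standard error of the sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# Quadrupling the sample size halves the standard error.
for n in (25, 100, 400):
    print(n, standard_error(25.0, n))   # 5.0, then 2.5, then 1.25
```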


What is the appropriate sample size for my study?
The answer depends on a number of factors, including the purpose of the study, the population under study, the population size, the risk of selecting a bad sample and the allowable sampling error.
In addition to the purpose of the study and the population size, three criteria usually need to be specified to determine the appropriate sample size (Miaoulis and Michener, 1976):
Level of precision
Level of confidence or risk
Degree of variability in the attributes being measured.


Level of Precision

A measure of how close an estimate is to the true value of a population parameter.
It is the range in which the true value of the population is estimated to be.
This range is often expressed in percentage points, e.g. ±3%, ±5%, ±10%.
It may be expressed in absolute terms or relative to the estimate.


α and Confidence level

α: The significance level of a test, i.e. the probability of rejecting the null hypothesis when it is true (the probability of making a Type I error).
Confidence level: The probability that an estimate of a population parameter is within certain specified limits of the true value; commonly denoted by 1 − α.
Also known as the risk level.
It is based on the Central Limit Theorem (CLT): when a population is repeatedly sampled, the sample means are approximately normally distributed around the true population value.
It is expressed as a percentage, for example 95%.


Degree of Variability

It refers to the distribution of attributes in the population.
The more heterogeneous a population, the larger the sample size required to obtain a given level of precision.
The less variable (more homogeneous) a population, the smaller the sample size required.


β and Power
β: The probability of failing to reject the null hypothesis when it
is false (or the probability of making a Type II error)
Power: The probability of correctly rejecting the null hypothesis when
it is false; commonly denoted by 1 − β
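To see how α, β, sample size and effect size interact, the sketch below computes the approximate power of a two-sided one-sample z-test with known standard deviation. That test, the helper name `z_test_power` and the input values are assumptions chosen for illustration; the slides do not tie the definitions of β and power to a particular test.

```python
from statistics import NormalDist

def z_test_power(delta, sd, n, alpha=0.05):
    """Power of a two-sided one-sample z-test to detect a true mean shift `delta`.

    Assumes a known standard deviation `sd` and a sample of size `n`.
    """
    std = NormalDist()                       # standard normal distribution
    z_crit = std.inv_cdf(1 - alpha / 2)      # about 1.96 for alpha = 0.05
    shift = abs(delta) * n ** 0.5 / sd       # true shift in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)

power = z_test_power(delta=5, sd=25, n=196)
beta = 1 - power                             # probability of a Type II error
print(round(power, 2), round(beta, 2))       # 0.8 0.2
```

With these illustrative inputs the shift is 2.8 standard errors, so the power is Φ(2.8 − 1.96) = Φ(0.84) ≈ 0.80, which is why Z = 0.84 is the value used for 80% power in sample size formulas.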


Steps in Estimating Sample Size
Identify the major variables (qualitative or quantitative).
Determine the type of estimate (%, mean, ratio, proportion, ...).
Indicate the expected frequency of the factor of interest.
Decide on the desired precision of the estimate.
Decide on the acceptable risk that the estimate will fall outside its real population value.
Adjust for population size.
Adjust for the estimated design effect.
Adjust for the expected response rate.


What is the design effect in sampling?
The design effect (denoted deff) is defined as the ratio of the variance of an estimate under a given sampling plan to the variance of the same estimate from a simple random sample with the same number of observation units.
The sampling plan could be stratified sampling or another complex sample design.

What is the design effect in sample size calculation?

The design effect (DEFF) is an adjustment made to a survey sample size because a sampling method (e.g. cluster sampling, respondent-driven sampling, or stratified sampling) results in larger required sample sizes (or wider confidence intervals) than you would expect with simple random sampling (SRS).
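In practice the adjustment is applied by multiplying the sample size obtained under simple random sampling by the design effect. The sketch below also divides by the expected response rate, which is a common convention assumed here rather than something stated in the slides; the function name and the values deff = 1.5 and response rate = 0.9 are hypothetical.

```python
import math

def adjust_sample_size(n_srs, deff=1.0, response_rate=1.0):
    """Inflate an SRS sample size for design effect and expected non-response."""
    return math.ceil(n_srs * deff / response_rate)

# Hypothetical inputs: 196 subjects under SRS, deff = 1.5, 90% expected response.
print(adjust_sample_size(196, deff=1.5, response_rate=0.9))   # 327
```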

Is sample size part of study design?
Calculation of an adequate sample size is an important part of research design.
It is very important to understand that different study designs need different methods of sample size calculation; one formula cannot be used for all designs.

How can design and sample size affect results?

The sample size calculation directly influences research findings.
Very small samples undermine the internal and external validity of a study.
Very large samples tend to transform small differences into statistically significant differences, even when they are clinically insignificant.

Study Designs

Cross-sectional studies
Case control studies
Cohort studies


Sample Size Calculation for Cross-sectional studies/survey
Cross-sectional studies or cross-sectional surveys are done to estimate a population parameter, such as the prevalence of some disease in a community or the average value of some quantitative variable in a population.
The sample size formulas for qualitative and quantitative variables are different.


...Sample Size Calculation for Cross-sectional studies/survey


For Qualitative variable
Suppose an epidemiologist wants to know the proportion of children who are hypertensive in a population; then this formula should be used, as a proportion summarizes a qualitative variable.
\[ n_0 = \frac{Z_{\alpha/2}^{2}\, P(1-P)}{E^{2}} \]
For a finite population of size N with n_0/N > 5%, the sample size is adjusted to
\[ n = \frac{n_0}{1 + \dfrac{n_0}{N}} \]
where
Z_{α/2} is the value from the standard normal distribution reflecting the confidence level that will be used (e.g., Z = 1.96 for 95%).
E is the desired margin of error (absolute error or precision); it has to be decided by the researcher.
P is the expected proportion of successes in the population. Here we are planning a study to generate a 95% confidence interval for the unknown population proportion P.

Example 1

Let us assume that a researcher wants to estimate the proportion of patients having hypertension in the paediatric age group in a city.
According to previously published studies, the proportion of hypertensives may not be more than 15%.
The researcher wants to calculate the sample size with a precision (absolute error) of 5% and a Type I error of 5%.
\[ n = \frac{Z_{\alpha/2}^{2}\, P(1-P)}{E^{2}} = \frac{1.96^{2} \times 0.15\,(1-0.15)}{0.05^{2}} = 195.9 \approx 196 \]
So for this cross-sectional study the researcher has to take at least 196 subjects.
If the researcher increases the allowed error (decreases the precision), the denominator will increase and hence the sample size will decrease.
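The calculation in Example 1, together with the finite population adjustment n = n0 / (1 + n0/N), can be sketched as follows. The function name and the population size of 2,000 used in the second call are hypothetical; the 5% threshold for applying the correction follows the rule stated with the formula.

```python
import math

def sample_size_proportion(p, e, z=1.96, population=None):
    """Sample size for estimating a proportion with absolute error e.

    If a finite population size is given and n0 exceeds 5% of it,
    the finite population correction n0 / (1 + n0 / N) is applied.
    """
    n0 = z ** 2 * p * (1 - p) / e ** 2
    if population is not None and n0 / population > 0.05:
        n0 = n0 / (1 + n0 / population)
    return math.ceil(n0)   # round up to a whole subject

print(sample_size_proportion(p=0.15, e=0.05))                    # 196
print(sample_size_proportion(p=0.15, e=0.05, population=2000))   # 179
```

Doubling the allowed error to 10% cuts the required sample to roughly a quarter, which illustrates the last point of the example.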

...Sample Size Calculation for Cross-sectional studies/survey

For Quantitative variable

Suppose the same researcher is interested in knowing the average systolic blood pressure of children in the same city. Blood pressure is a quantitative variable, so the formula is

\[ n = \frac{Z_{\alpha/2}^{2}\, SD^{2}}{E^{2}} \]

where
Z_{α/2} is the value from the standard normal distribution reflecting the confidence level that will be used (e.g., Z = 1.96 for 95%).
E is the desired margin of error (absolute error or precision); it has to be decided by the researcher.
SD is the standard deviation of the variable. Its value can be taken from a previous study or from a pilot study.

Example 2

If the researcher is interested in knowing the average systolic blood pressure in the paediatric age group of that city at a Type I error of 5% and a precision of 5 mmHg on either side (more or less than the mean systolic BP), and the standard deviation, based on previous studies, is 25 mmHg, then the sample size calculation will be
\[ n = \frac{1.96^{2} \times 25^{2}}{5^{2}} = 96.04 \approx 96 \]
So the researcher will have to take the blood pressure of at least 96 children to estimate the average systolic blood pressure in the paediatric age group.
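The same calculation for a quantitative variable is sketched below. The helper name `sample_size_mean` is hypothetical, and the function returns the unrounded value so the rounding choice stays visible: the example rounds 96.04 to 96, whereas rounding up to the next whole subject would give 97.

```python
def sample_size_mean(sd, e, z=1.96):
    """Sample size for estimating a mean with absolute error e (unrounded)."""
    return z ** 2 * sd ** 2 / e ** 2

n = sample_size_mean(sd=25, e=5)
print(round(n, 2))   # 96.04
```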


Sample Size Calculation for case control studies
In case control studies, cases (the group with the disease/condition under consideration) are compared with controls (the group without the disease/condition under consideration) regarding exposure to the risk factor in question.
The formula for sample size calculation for this design also depends on the type of variable (qualitative or quantitative). The formulas given here are for independent case control studies.
For qualitative variable
Suppose a researcher wants to see the link between childhood sexual abuse and psychiatric disorder in adulthood.
He will take a sample of adults with psychiatric disorder and another sample of normal adults having no psychiatric disorder.
He will then look retrospectively at the history of childhood sexual abuse in both groups.
Exposure in both groups will be compared and the odds ratio will be calculated.

...Sample Size Calculation for case control studies
Here exposure to childhood sexual abuse is a qualitative variable, hence this formula will be used for this type of design:
\[ n = \frac{(r+1)}{r} \cdot \frac{p^{*}(1-p^{*})\,(Z_{\beta} + Z_{\alpha/2})^{2}}{(p_1 - p_2)^{2}} \]
where
r = ratio of controls to cases (1 for equal numbers of cases and controls).
p∗ = average proportion exposed = (proportion of cases exposed + proportion of controls exposed)/2.
Zβ = standard normal variate for power: 0.84 for 80% power and 1.28 for 90% power. The researcher has to select the power for the study.
Zα/2 = standard normal variate for the level of significance, as mentioned in the previous section.
p1 − p2 = effect size, i.e. the difference in proportions expected based on previous studies; p1 is the proportion in cases and p2 is the proportion in controls.

...Sample Size Calculation for case control studies
Suppose the researcher wants to calculate the sample size for the above-mentioned case control study of the link between childhood sexual abuse and psychiatric disorder in adulthood, fixes the power of the study at 80%, assumes expected proportions of 0.35 in the case group and 0.20 in the control group, and wants equal numbers of cases and controls. Then the sample size per group will be
\[ n = \frac{2\,p^{*}(1-p^{*})\,(0.84+1.96)^{2}}{1\,(0.35-0.20)^{2}} \]
where
\[ p^{*} = \frac{0.35+0.20}{2} = 0.275 \]
so that
\[ n = \frac{2(0.275)(1-0.275)(0.84+1.96)^{2}}{1\,(0.35-0.20)^{2}} = 138.9 \approx 139 \]
So the researcher has to recruit at least 139 subjects as cases and an equal number as controls.
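The worked example above can be reproduced with a short function. This is a sketch of the formula as given in the slides; the function and parameter names are hypothetical, and the defaults 1.96 and 0.84 correspond to a 5% two-sided significance level and 80% power.

```python
import math

def case_control_n_proportions(p1, p2, r=1.0, z_alpha=1.96, z_beta=0.84):
    """Cases needed in an unmatched case control study with a binary exposure.

    p1, p2: expected proportion exposed among cases and controls.
    r: ratio of controls to cases.
    """
    p_star = (p1 + p2) / 2   # average proportion exposed
    n = ((r + 1) / r) * p_star * (1 - p_star) * (z_beta + z_alpha) ** 2 / (p1 - p2) ** 2
    return math.ceil(n)

print(case_control_n_proportions(p1=0.35, p2=0.20))   # 139
```

Raising the power to 90% (z_beta = 1.28) with the same proportions increases the requirement, since the squared term grows from 2.80² to 3.24².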

...Sample Size Calculation for case control studies

For Quantitative variables


Suppose a researcher wants to see the association between birth weight and diabetes in adulthood.
Birth weight being a quantitative variable, the researcher will select one group (cases) of diabetic adults and another group (controls) of non-diabetic adults.
Both groups will be traced back for data on birth weight.
The formula for sample size calculation is
\[ n = \frac{(r+1)}{r} \cdot \frac{SD^{2}\,(Z_{\beta} + Z_{\alpha/2})^{2}}{d^{2}} \]
where
SD = standard deviation (the researcher can take the value from previously published studies); d = expected mean difference between cases and controls (may be based on previously published studies); r, Zβ and Zα/2 are as explained in the previous sections.

...Sample Size Calculation for case control studies

...For Quantitative variables


For example, if the researcher thinks that the difference in mean birth weight between cases and controls may be around 250 g (0.25 kg) and the SD is 1 kg, then with an equal number of cases and controls and 80% power the sample size will be
$$n = \frac{2\times 1^{2}\times(0.84+1.96)^{2}}{0.25^{2}} = 250.88 \approx 251.$$
Hence, the researcher has to take 251 subjects in each group (cases and controls).
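A minimal sketch of the same calculation in Python follows. The function name and parameter names are hypothetical, and the second call (two controls per case) is an illustrative input that does not appear in the slides. The standard deviation and the mean difference must be in the same unit (kg here).

```python
import math

def case_control_n_quantitative(sd, d, z_alpha_2=1.96, z_beta=0.84, r=1):
    """Cases needed when the exposure is quantitative.

    sd: standard deviation, d: expected mean difference (same unit as sd),
    r: number of controls per case.
    """
    return (r + 1) / r * sd ** 2 * (z_beta + z_alpha_2) ** 2 / d ** 2

# Worked example from the text: SD 1 kg, difference 0.25 kg, r = 1.
n_equal = case_control_n_quantitative(sd=1.0, d=0.25)
print(round(n_equal, 2), math.ceil(n_equal))

# Illustrative only: two controls per case lowers the number of cases needed.
n_two_controls = case_control_n_quantitative(sd=1.0, d=0.25, r=2)
print(round(n_two_controls, 2), math.ceil(n_two_controls))
```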


Sample Size Calculation for cohort studies

In cohort studies, healthy subjects with or without exposure to some risk factor are observed over a time period to see the event rate in both groups.
If a researcher wants to see the impact of weight training exercise on cardiovascular mortality, he will select two groups, one consisting of subjects who exercise and another consisting of those who do not.
These groups will be followed up for a specific time period.
At the end of the study period both groups will be compared for cardiovascular mortality.


...Sample Size Calculation for cohort studies

The formula for sample size is
$$n = \frac{\left[Z_{\alpha}\sqrt{\left(1+\frac{1}{m}\right)p^{*}(1-p^{*})} + Z_{\beta}\sqrt{\frac{p_1(1-p_1)}{m} + p_2(1-p_2)}\right]^{2}}{(p_1-p_2)^{2}}$$
$Z_{\alpha}$ = Standard normal variate for level of significance
m = Number of control subjects per experimental subject
$Z_{\beta}$ = Standard normal variate for power or type 2 error as explained in earlier section
$p_1$ = Probability of events in control group
$p_2$ = Probability of events in experimental group
$p^{*} = \dfrac{p_2 + m\,p_1}{m+1}$


...Sample Size Calculation for cohort studies

For example, suppose the researcher wants to see the impact of weight training exercise on cardiovascular mortality, and according to previous studies the proportion of cardiovascular deaths may be around 20% in the exercise group and around 40% in the control group. The sample size for a 5% significance level and 80% power with an equal number of subjects in each group (m = 1, so p* = 0.30) will be
$$n = \frac{\left[1.96\sqrt{\left(1+\frac{1}{1}\right)0.30(1-0.30)} + 0.84\sqrt{\frac{0.40(1-0.40)}{1} + 0.20(1-0.20)}\right]^{2}}{(0.40-0.20)^{2}} = \frac{(1.2702+0.5313)^{2}}{0.04} \approx 81.13,$$
which rounds up to 82.
Thus, the researcher has to take 82 subjects in each group.
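The cohort formula involves two square-root terms, so it is easy to slip when evaluating it by hand. The following Python sketch evaluates it step by step; the function name `cohort_n` and its argument names are hypothetical, and the Z values are the rounded table values used in the text.

```python
import math

def cohort_n(p_control, p_exposed, m=1, z_alpha=1.96, z_beta=0.84):
    """Subjects per experimental group for a cohort study.

    m is the number of control subjects per experimental subject.
    """
    # Weighted average event probability p*.
    p_star = (p_exposed + m * p_control) / (m + 1)
    # Term weighted by the significance-level quantile.
    term_alpha = z_alpha * math.sqrt((1 + 1 / m) * p_star * (1 - p_star))
    # Term weighted by the power quantile.
    term_beta = z_beta * math.sqrt(
        p_control * (1 - p_control) / m + p_exposed * (1 - p_exposed)
    )
    return (term_alpha + term_beta) ** 2 / (p_control - p_exposed) ** 2

n = cohort_n(p_control=0.40, p_exposed=0.20)
print(round(n, 2), math.ceil(n))
```

Evaluated with event probabilities 0.40 and 0.20 and m = 1, the expression comes to about 81.1, which rounds up to 82 subjects per group.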


Sample Size Calculation for comparison between two groups for quantitative data

When the variable is quantitative, like blood pressure, weight, height, etc., the following formula can be used to calculate the sample size for comparison between two groups.
$$n = \frac{2\,sd^{2}\,(Z_{\alpha/2} + Z_{\beta})^{2}}{d^{2}}$$
sd = standard deviation from previous studies or pilot study
$Z_{\alpha/2} = Z_{0.05/2} = Z_{0.025} = 1.96$ (from Z table) at type 1 error of 5%
$Z_{\beta} = Z_{0.20} = 0.842$ (from Z table) at 80% power
d = effect size = difference between mean values

...Sample Size Calculation for comparison between two groups for quantitative data

For example, suppose a researcher wants to see the effect of a potential antihypertensive drug and wants to compare the new drug with placebo.
The researcher thinks that if the new drug reduces blood pressure by 10 mmHg as compared to placebo, the effect should be considered clinically significant.
Let us assume the standard deviation found in previous studies was 25 mmHg.
Suppose the researcher selects a significance level of 5% and a power of 80%, and considers a two-tailed unpaired t test to be the suitable statistical test. The effect size in this condition is 10 mmHg. Hence the sample size will be
$$n = \frac{2\times 25^{2}\times(1.96+0.84)^{2}}{10^{2}} = 98$$
So in this case the researcher needs 98 subjects per group.
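The Z values in the formula are usually read from a table, but they can also be computed. The sketch below uses `statistics.NormalDist` from the Python standard library to obtain the quantiles from the chosen significance level and power. The function name `two_group_n` and its parameter names are hypothetical.

```python
import math
from statistics import NormalDist

def two_group_n(sd, d, alpha=0.05, power=0.80):
    """Subjects per group for comparing two means (two-sided test)."""
    z_alpha_2 = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)             # about 0.842 for 80% power
    return 2 * sd ** 2 * (z_alpha_2 + z_beta) ** 2 / d ** 2

n = two_group_n(sd=25, d=10)
print(round(n, 2), math.ceil(n))
```

With unrounded quantiles the result is about 98.1 rather than exactly 98, so rounding up gives 99. The difference comes only from rounding the Z values to two decimals in the hand calculation.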

Sample Size Calculation for comparison between two groups for qualitative data

When the endpoint of a clinical intervention study is qualitative, like alive/dead, diseased/non-diseased, male/female, etc., the sample size for comparison between two groups is calculated from the proportions of events in the two groups.
Suppose the researcher is interested in the protective effect of a drug on mortality in patients with myocardial infarction.
He selects two groups of patients with myocardial infarction; one group is given the drug and the other group is given placebo.
Both groups are kept under observation, and at the end of the study the deaths in the two groups are compared.


...Sample Size Calculation for comparison between two groups for qualitative data

For the sample size of this type of study, the formula below can be used.
$$n = \frac{2\,[Z_{\alpha/2} + Z_{\beta}]^{2}\,P(1-P)}{[p_1 - p_2]^{2}}$$
$p_1 - p_2$ = Difference in proportion of events in two groups
P = Pooled prevalence = [prevalence in case group ($p_1$) + prevalence in control group ($p_2$)]/2
In the above example, let us assume that a previous study says that 20% of patients with myocardial infarction die within a specified time.


...Sample Size Calculation for comparison between two groups for qualitative data

The researcher feels that if the drug being tested increases survival to 30%, the finding can be considered clinically significant.
The effect size is the difference between the proportions: 0.2 − 0.3 = −0.1.
Pooled prevalence = (0.20 + 0.30)/2 = 0.25.
At a 5% significance level and 80% power the sample size will be
$$n = \frac{2\times[1.96+0.84]^{2}\times 0.25(1-0.25)}{[-0.1]^{2}} = 294$$
Thus, the researcher needs 294 subjects per group.
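Because the difference in proportions is squared in the denominator, the required sample size is very sensitive to the assumed effect size. The Python sketch below illustrates this with the formula above. The function name is hypothetical, and the alternative proportions 0.25 and 0.35 are illustrative inputs that do not appear in the slides.

```python
import math

def two_group_n_binary(p1, p2, z_alpha_2=1.96, z_beta=0.84):
    """Subjects per group for comparing two proportions."""
    pooled = (p1 + p2) / 2  # pooled prevalence P
    return 2 * (z_alpha_2 + z_beta) ** 2 * pooled * (1 - pooled) / (p1 - p2) ** 2

for p2 in (0.25, 0.30, 0.35):
    n = two_group_n_binary(0.20, p2)
    # Subtract a tiny tolerance so floating-point noise does not push
    # an exact integer result up to the next whole number.
    print(f"p1 = 0.20, p2 = {p2:.2f}: n per group = {math.ceil(n - 1e-9)}")
```

Halving the difference from 0.10 to 0.05 roughly quadruples the number of subjects needed, which is why the clinically meaningful difference should be fixed before the calculation.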


Summary for Sample Size Calculation
Determining the appropriate design of a study is more important than
the statistical analysis; a poorly designed study can never be salvaged,
whereas a poorly analyzed study can be re-analyzed.
A critical component in study design is the determination of
the appropriate sample size.
The sample size must be large enough to adequately answer
the research question, yet not too large so as to involve too
many patients when fewer would have sufficed.
The determination of the appropriate sample size involves
statistical criteria as well as clinical or practical considerations.
Sample size determination involves teamwork; biostatisticians must
work closely with clinical investigators to determine the sample
size that will address the research question of interest with
adequate precision or power to produce results that are clinically
meaningful.

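...Summary
The following table summarizes the sample size formulas for each scenario described here. The formulas are organized by the proposed analysis: a confidence interval (CI) estimate or a test of hypothesis (HT).

One sample, continuous outcome
To estimate CI: $n = \dfrac{(Z_{\alpha/2}\,\sigma)^{2}}{E^{2}}$
To conduct HT: $n = \left(\dfrac{Z_{1-\alpha/2} + Z_{1-\beta}}{|\mu_1-\mu_0|/\sigma}\right)^{2}$

Two independent samples, continuous outcome
To estimate CI: $n_i = 2\,\dfrac{(Z_{\alpha/2}\,\sigma)^{2}}{E^{2}}$
To conduct HT: $n_i = 2\left(\dfrac{Z_{1-\alpha/2} + Z_{1-\beta}}{|\mu_1-\mu_2|/\sigma}\right)^{2}$

Matched pairs, continuous outcome
To estimate CI: $n = \dfrac{(Z_{\alpha/2}\,\sigma_d)^{2}}{E^{2}}$
To conduct HT: $n = \left(\dfrac{Z_{1-\alpha/2} + Z_{1-\beta}}{\mu_d/\sigma_d}\right)^{2}$

One sample, dichotomous outcome
To estimate CI: $n = p(1-p)\,\dfrac{(Z_{\alpha/2})^{2}}{E^{2}}$
To conduct HT: $n = \left(\dfrac{Z_{1-\alpha/2} + Z_{1-\beta}}{(p_1-p_0)/\sqrt{p_0(1-p_0)}}\right)^{2}$

Two independent samples, dichotomous outcome
To estimate CI: $n_i = [p_1(1-p_1) + p_2(1-p_2)]\,\dfrac{(Z_{\alpha/2})^{2}}{E^{2}}$
To conduct HT: $n_i = 2\left(\dfrac{Z_{1-\alpha/2} + Z_{1-\beta}}{|p_1-p_2|/\sqrt{p(1-p)}}\right)^{2}$, where $p = (p_1+p_2)/2$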

...Summary

Estimating a population proportion with specified absolute precision
$$n = \frac{Z_{1-\alpha/2}^{2}\,P(1-P)}{e^{2}}$$
Estimating a population proportion with specified relative precision
$$n = \frac{Z_{1-\alpha/2}^{2}\,(1-P)}{e^{2}P}$$
Hypothesis tests for a population proportion
For a one-sided test
$$n = \frac{\left[Z_{1-\alpha}\sqrt{P_0(1-P_0)} + Z_{1-\beta}\sqrt{P_a(1-P_a)}\right]^{2}}{[P_0-P_a]^{2}}$$
For a two-sided test
$$n = \frac{\left[Z_{1-\alpha/2}\sqrt{P_0(1-P_0)} + Z_{1-\beta}\sqrt{P_a(1-P_a)}\right]^{2}}{[P_0-P_a]^{2}}$$
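As a hedged illustration of the single-proportion formulas, the Python sketch below implements the absolute-precision estimate and the hypothesis-test sample size. All function names are hypothetical, and the inputs (P = 0.5 with e = 0.05, and P0 = 0.30 against Pa = 0.20 with 90% power) are illustrative values chosen here, not taken from the slides.

```python
import math
from statistics import NormalDist

def z(prob):
    """Standard normal quantile, e.g. z(0.975) is about 1.96."""
    return NormalDist().inv_cdf(prob)

def n_estimate_proportion(p, e, alpha=0.05):
    """Sample size to estimate a proportion within absolute precision e."""
    return z(1 - alpha / 2) ** 2 * p * (1 - p) / e ** 2

def n_test_proportion(p0, pa, alpha=0.05, power=0.90, two_sided=False):
    """Sample size to test H0: P = p0 against the alternative P = pa."""
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    numerator = (
        z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(pa * (1 - pa))
    ) ** 2
    return numerator / (p0 - pa) ** 2

print(math.ceil(n_estimate_proportion(0.5, 0.05)))
print(math.ceil(n_test_proportion(0.30, 0.20)))
print(math.ceil(n_test_proportion(0.30, 0.20, two_sided=True)))
```

The only difference between the one-sided and two-sided versions is the quantile used for the significance level, so a single function with a flag covers both.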


...Summary
Estimating the difference between two population proportions with specified absolute precision
$$n = \frac{Z_{1-\alpha/2}^{2}\,[P_1(1-P_1) + P_2(1-P_2)]}{e^{2}}$$
Hypothesis tests for two population proportions
For a one-sided test
$$n = \frac{\left[Z_{1-\alpha}\sqrt{2P(1-P)} + Z_{1-\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)}\right]^{2}}{[P_1-P_2]^{2}},$$
where $P = \dfrac{P_1+P_2}{2}$
For a two-sided test
$$n = \frac{\left[Z_{1-\alpha/2}\sqrt{2P(1-P)} + Z_{1-\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)}\right]^{2}}{[P_1-P_2]^{2}}$$
For a one-sided test for small proportions
$$n = \frac{[Z_{1-\alpha} + Z_{1-\beta}]^{2}}{0.00061\,\left[\arcsin\sqrt{P_2} - \arcsin\sqrt{P_1}\right]^{2}}$$
For a two-sided test for small proportions
$$n = \frac{[Z_{1-\alpha/2} + Z_{1-\beta}]^{2}}{0.00061\,\left[\arcsin\sqrt{P_2} - \arcsin\sqrt{P_1}\right]^{2}}$$


...Summary
Estimating an odds ratio with specified relative precision
$$n = \frac{Z_{1-\alpha/2}^{2}\left[\dfrac{1}{P_1^{*}(1-P_1^{*})} + \dfrac{1}{P_2^{*}(1-P_2^{*})}\right]}{[\log_e(1-e)]^{2}}$$
Hypothesis tests for an odds ratio
$$n = \frac{\left[Z_{1-\alpha/2}\sqrt{2P_2^{*}(1-P_2^{*})} + Z_{1-\beta}\sqrt{P_1^{*}(1-P_1^{*}) + P_2^{*}(1-P_2^{*})}\right]^{2}}{[P_1^{*}-P_2^{*}]^{2}}$$
In this formula the term $2P_2^{*}(1-P_2^{*})$ is used instead of $2\bar{P}^{*}(1-\bar{P}^{*})$ because the study population is likely to be made up of many more controls than cases, and the exposure rate among the controls is often known with a high degree of precision; under the null hypothesis this is the exposure rate for the cases as well. If the investigator is in doubt about the exposure rate among the controls, however, the formula should be modified and the term $2\bar{P}^{*}(1-\bar{P}^{*})$ used, where $\bar{P}^{*} = \dfrac{P_1^{*}+P_2^{*}}{2}$.

...Summary

Estimating a relative risk with specified relative precision
$$n = \frac{Z_{1-\alpha/2}^{2}\left[\dfrac{1-P_1}{P_1} + \dfrac{1-P_2}{P_2}\right]}{[\log_e(1-e)]^{2}}$$
Since $RR = \dfrac{P_1}{P_2}$, $RR \le \dfrac{1}{P_2}$. For $RR \le 1$, use the column value corresponding to $\dfrac{1}{RR}$ and the row value corresponding to $P_1$.
Hypothesis tests for a relative risk
$$n = \frac{\left[Z_{1-\alpha/2}\sqrt{2P(1-P)} + Z_{1-\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)}\right]^{2}}{[P_1-P_2]^{2}},$$
where $P = \dfrac{P_1+P_2}{2}$


...Summary
Estimating an incidence rate with specified relative precision
$$n = \left[\frac{Z_{1-\alpha/2}}{e}\right]^{2}$$
Hypothesis tests for an incidence rate
For a one-sided test
$$n = \frac{[Z_{1-\alpha}\lambda_0 + Z_{1-\beta}\lambda_a]^{2}}{[\lambda_0-\lambda_a]^{2}}$$
For a two-sided test
$$n = \frac{[Z_{1-\alpha/2}\lambda_0 + Z_{1-\beta}\lambda_a]^{2}}{[\lambda_0-\lambda_a]^{2}}$$
Hypothesis tests for two incidence rates in follow-up (cohort) studies (study duration not fixed)
For a one-sided test
$$n_1 = \frac{\left[Z_{1-\alpha}\sqrt{(1+k)\lambda^{2}} + Z_{1-\beta}\sqrt{k\lambda_1^{2} + \lambda_2^{2}}\right]^{2}}{k\,[\lambda_1-\lambda_2]^{2}}$$
For a two-sided test
$$n_1 = \frac{\left[Z_{1-\alpha/2}\sqrt{(1+k)\lambda^{2}} + Z_{1-\beta}\sqrt{k\lambda_1^{2} + \lambda_2^{2}}\right]^{2}}{k\,[\lambda_1-\lambda_2]^{2}}$$
where $\lambda = \dfrac{\lambda_1+\lambda_2}{2}$ and k is the ratio of the sample size for the second group of subjects ($n_2$) to that for the first group ($n_1$).
Sampling Distribution

Sampling distribution

The sampling distribution of a statistic is the probability distribution of all possible values the statistic may assume when computed from random samples of the same size drawn from a specified population.
A sampling distribution describes how a statistic varies from sample to sample; it is what allows probabilities to be attached to results obtained from a small group within a large population.
Its primary purpose is to show how well results from small samples represent a comparatively larger population.
Since the population is usually too large to analyze in full, you can select a smaller group and sample from the population repeatedly.
The statistic computed from the gathered data is then used to calculate the likely occurrence, or probability, of an event.

...Sampling distribution

Factors that influence sampling distribution


We can measure the variability of a sampling distribution by its standard deviation, called the standard error (for the sample mean, the standard error of the mean), or by its variance, depending on the context and the inferences you are trying to draw.
There are three primary factors that influence the variability of a sampling distribution. They are:
The number of observations in the population: The symbol for this variable is "N." It is the total number of units in the group being studied.
The number of observations in the sample: The symbol for this variable is "n." It is the number of units in a random sample drawn from the larger group.
The method of choosing the sample: How you choose the samples can account for variability in some cases.


Types of Sampling distribution


There are three standard types of sampling distributions in statistics:

Sampling distribution of mean
The most common type of sampling distribution is that of the mean.
It focuses on calculating the mean of every sample chosen from the population and plotting these means. For sufficiently large samples the graph is approximately a normal distribution whose center is the mean of the sampling distribution, which equals the mean of the entire population.

Sampling distribution of proportion
This sampling distribution focuses on proportions in a population. You select samples and calculate their proportions.
The mean of the sample proportions equals the proportion of the entire population.

T-distribution
A T-distribution is a sampling distribution used when the sample is small or when little is known about the population (for example, when its standard deviation is unknown).
It is used to estimate the mean of the population and other statistics such as confidence intervals.
The T-distribution uses a t-score to evaluate data that wouldn't be ...

...Sampling distribution

Relationships between Population Parameters and the Sampling Distribution of the Sample Mean
The expected value of the sample mean is equal to the population mean:
$$E(\bar{X}) = \mu_{\bar{X}} = \mu_X$$
The variance of the sample mean is equal to the population variance divided by the sample size:
$$V(\bar{X}) = \sigma_{\bar{X}}^{2} = \frac{\sigma_X^{2}}{n}$$
The standard deviation of the sample mean, known as the standard error of the mean, is equal to the population standard deviation divided by the square root of the sample size:
$$SD(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$$


Sampling distribution of mean

Examples
Example 1: Suppose we have a hypothetical population of size 3, consisting of three children: A is 3 years old, B is 6 years old and C is 9 years old. Construct the sampling distribution of the sample mean for samples of size 2, using sampling without replacement and with replacement.
Solution:
The mean and variance of the population are 6 and 6, respectively.
If sampling is without replacement we have $\binom{3}{2} = 3$ possible samples, with means 4.5, 6 and 7.5, so $E(\bar{X}) = 6$ and $V(\bar{X}) = 1.5$. This equals $\frac{\sigma^{2}}{n}\cdot\frac{N-n}{N-1} = \frac{6}{2}\cdot\frac{1}{2}$, the variance reduced by the finite population correction.
If sampling is with replacement we have $N^{n} = 3^{2} = 9$ possible samples, so $E(\bar{X}) = 6$ and $V(\bar{X}) = \frac{\sigma^{2}}{n} = 3$.
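Because the population in Example 1 is tiny, both sampling distributions can be enumerated exhaustively. The Python sketch below does so with the standard library. The variable names are illustrative; `combinations` lists the unordered samples drawn without replacement and `product` lists the ordered samples drawn with replacement.

```python
from itertools import combinations, product
from statistics import fmean, pvariance

ages = [3, 6, 9]  # population from Example 1
n = 2             # sample size

# Every possible sample mean under each sampling scheme.
means_without = [fmean(s) for s in combinations(ages, n)]   # 3 samples
means_with = [fmean(s) for s in product(ages, repeat=n)]    # 9 samples

print("population:", fmean(ages), pvariance(ages))
print("without replacement:", means_without,
      fmean(means_without), pvariance(means_without))
print("with replacement:", sorted(means_with),
      fmean(means_with), pvariance(means_with))
```

Both schemes give a mean of 6 for the sample mean, but the variance is 1.5 without replacement and 3 with replacement. Only the latter equals the population variance divided by n, because drawing without replacement from a finite population reduces the variability of the sample mean.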


Sampling distribution of mean
Example 2: Let $\bar{X}$ be the mean of a random sample of size 50 drawn from a population with mean 112 and standard deviation 40. Find the mean and standard deviation of $\bar{X}$.
Solution:
$$\mu_{\bar{X}} = \mu = 112$$
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{40}{\sqrt{50}} = 5.65685$$
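The standard error in Example 2 is a one-line computation, and repeating it for larger samples shows how slowly it shrinks. In the sketch below the function name is hypothetical, and the sample sizes 200 and 800 are illustrative additions, not part of the example.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: population SD divided by sqrt(n)."""
    return sigma / math.sqrt(n)

# n = 50 is the case in Example 2; the larger sizes are illustrative.
for n in (50, 200, 800):
    print(n, round(standard_error(40, n), 5))
```

Each fourfold increase in the sample size halves the standard error, since it depends on the square root of n.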


Sampling Distribution of the Sample Proportion, $\hat{p}$

If repeated random samples of a given size n are taken from a population of values for a categorical variable, where the proportion in the category of interest is p, then the mean of all sample proportions $\hat{p}$ is the population proportion (p).
The distribution of the sample proportion approximates a normal distribution under the following 2 conditions.
1 $np \ge 15$
2 $n(1-p) \ge 15$


...Sampling Distribution of the Sample Proportion, $\hat{p}$

If both of the conditions $np \ge 15$ and $n(1-p) \ge 15$ are satisfied, the sampling distribution of the sample proportion is
1 approximately normal
2 with mean $\mu = p$
3 and standard deviation [standard error] $\sigma = \sqrt{\dfrac{p(1-p)}{n}}$
If the sampling distribution of $\hat{p}$ is approximately normal, we can convert a sample proportion to a z-score using the following formula:
$$Z_{cal} = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$$
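The conversion to a z-score, together with the check of the two conditions for the normal approximation, can be sketched in Python as follows. The function name `proportion_z` and the inputs (p = 0.5, n = 100, observed proportion 0.6) are hypothetical values chosen for illustration.

```python
import math

def proportion_z(p_hat, p, n, threshold=15):
    """z-score of a sample proportion under the normal approximation.

    Raises ValueError when np or n(1 - p) falls below the threshold,
    because the approximation is then not considered adequate.
    """
    if n * p < threshold or n * (1 - p) < threshold:
        raise ValueError("normal approximation conditions are not met")
    standard_error = math.sqrt(p * (1 - p) / n)
    return (p_hat - p) / standard_error

z_cal = proportion_z(p_hat=0.6, p=0.5, n=100)
print(round(z_cal, 4))
```

With these inputs the standard error is 0.05, so an observed proportion of 0.6 lies two standard errors above the population proportion.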
