Measurement (Study Design and Inferential Statistics)

This document discusses different epidemiological study designs including experimental, observational, analytical, and descriptive designs. It describes cross-sectional, cohort, and case-control study designs. Cross-sectional studies measure prevalence at a point in time to describe disease burden and patterns. Cohort studies follow populations over time to study disease incidence and risk factors. Case-control studies compare exposures in individuals with and without a disease.

Uploaded by

alex

Epidemiological Study Designs
By: Alemayo E (BSc, MPH)

1
Epidemiological Study Designs
Study designs can be:
• Exploratory/experimental
• Observational
– Analytical, or
– Descriptive

2
Epidemiologic Study Designs

• Experimental (RCTs)
• Observational
– Analytical: cross-sectional, cohort, case-control
– Descriptive (describing patterns of disease)

3
Choosing Epidemiologic Study
Designs

4
Hierarchy of Epidemiologic Study Design

• Case report (one case of unusual findings)
• Case series (multiple case findings)
• Cross-sectional studies
• Case-control studies
• Cohort studies
• Randomized trial studies
Designs at the top of the list generate hypotheses; designs at the bottom establish causality.
5
Epidemiologic Study Designs

6
Grimes & Schulz, 2002 (www)
A cross-sectional study Design

• Definition: A cross-sectional study is one in which subjects are sampled irrespective of their disease status
• It studies a population at a particular point in time, without follow-up
• The term cross-sectional study or prevalence study usually refers to studies at the individual level

7
A cross-sectional…

It measures prevalence at a point in time. Anything that does not require follow-up to assess can be studied:
• attitudes, beliefs, behaviors,
• family history, genetic factors,
• existing or past health conditions
E.g. DHS, population census
It is commonly used to describe the burden of disease and its distribution
8
A cross-sectional…

The Magnitude of the event (Prevalence)


• Used to describe population
characteristics, often in terms of:
• person (who?)
• place (where?) &
• time (when?)
• e.g. The National Diet and Health Survey
in Ethiopia in 2007 among adults

9
Cross-sectional…
• The current or historical status of individuals is assessed and may be examined in relation to some current or past exposure
• These studies are most useful for conditions that are not rapidly fatal and not routinely brought to medical attention (e.g., elevated blood pressure, elevated blood cholesterol, many psychiatric disorders, diet, subclinical infections, etc.)
10
Cross-sectional…
• Participants for a cross-sectional study are generally chosen without previous knowledge of their disease or exposure status
• Such studies can be used to estimate the prevalence of both diseases and exposures
• They can therefore be used to compute prevalence ratios and prevalence odds ratios
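
The two measures named above can be sketched from a 2x2 table of exposure by disease status. The function name and the counts below are hypothetical and serve only to illustrate the arithmetic; they do not come from any survey.

```python
def prevalence_measures(a, b, c, d):
    """Return prevalence in each group, the prevalence ratio and the
    prevalence odds ratio from a 2x2 table (hypothetical layout):
        a = exposed with disease,    b = exposed without disease
        c = unexposed with disease,  d = unexposed without disease
    """
    prev_exposed = a / (a + b)
    prev_unexposed = c / (c + d)
    prevalence_ratio = prev_exposed / prev_unexposed
    prevalence_odds_ratio = (a * d) / (b * c)
    return prev_exposed, prev_unexposed, prevalence_ratio, prevalence_odds_ratio


# Hypothetical survey counts, for illustration only.
pe, pu, pr, por = prevalence_measures(a=40, b=160, c=30, d=270)
print(f"Prevalence (exposed):   {pe:.2f}")
print(f"Prevalence (unexposed): {pu:.2f}")
print(f"Prevalence ratio:       {pr:.2f}")
print(f"Prevalence odds ratio:  {por:.2f}")
```

With these counts the prevalence odds ratio is further from 1 than the prevalence ratio, which is the usual pattern when the condition is common.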

11
Cross-sectional…
• The researcher has no control over the research subjects' exposure of interest (no intervention)
• It involves
– identifying a defined population at a
particular point in time (target population)
– measuring a range of variables on an
individual basis
– include past and current practices/behaviors

12
Cross Sectional…
• KAP (knowledge, attitude, and practices ) study:
• KAP studies are purely descriptive and help to build up
a better understanding of the behavior of the population,
without necessarily relating this to any disease or health
outcome.
• It is a management tool: health service managers and
planners may make use of cross-sectional survey to
assess utilization and effectiveness of service
• Development of hypothesis: Hypotheses on the causes of disease may be developed using data from cross-sectional surveys
13
Limitation of cross-sectional study

• It is not possible to establish a cause-effect relationship (which comes first?): the chicken-and-egg dilemma
• Confounding factors may not be equally distributed between the groups being compared, and this unequal distribution may lead to bias and subsequent misinterpretation
• It measures the current health event or exposure, which may itself have been altered by the presence of disease
14
Limitation of…

• Recall bias: due to errors in recalling the exposure and possibly the outcome (people may forget their past history)
Advantage
• Time saving /Quick result
• Relatively less costly

15
2 . Cohort Study designs
An approach to studying disease incidence and risk
factors:
Approach:
1. Start with a population at risk (a batch of population)
2. Measure characteristics at baseline
3. Follow up the population (without the event) over time
4. Compare the rate of events (incidence) among those with and without the exposure
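
Step 4 of the approach above can be sketched as a comparison of cumulative incidence between the exposed and unexposed groups. The function name and the cohort counts are hypothetical.

```python
def cohort_risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Cumulative incidence in each group and their ratio (relative risk).

    Assumes everyone was disease-free at baseline and followed for the
    same period, as in the approach described above.
    """
    incidence_exposed = cases_exposed / n_exposed
    incidence_unexposed = cases_unexposed / n_unexposed
    return incidence_exposed, incidence_unexposed, incidence_exposed / incidence_unexposed


# Hypothetical cohort: 1000 exposed and 1000 unexposed people at risk.
ie, iu, rr = cohort_risk_ratio(30, 1000, 10, 1000)
print(f"Incidence (exposed):   {ie:.3f}")
print(f"Incidence (unexposed): {iu:.3f}")
print(f"Risk ratio:            {rr:.2f}")
```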
16
Cohort Studies

Population: people without disease
(A) Exposed → Diseased or Not diseased
(B) Not exposed → Diseased or Not diseased

17
Cohort Study..

Population - Can be large or small
Time - Can be long or short
Scope - Can be simple or elaborate
For rare outcomes, it needs many people and/or lengthy follow-up
May have to decide what characteristics to measure long in advance

18
Prospective cohort study

Exposure → Study starts → Disease occurrence (time)
Study starts → Exposure → Disease occurrence (time)
19
Retrospective cohort studies

Exposure → Disease occurrence → Study starts (time)

20
Advantages of Cohort Studies

• Can establish population-based incidence


• Accurate relative risk (risk ratio) estimation
• Can examine rare exposures (asbestos & lung cancer)
• Temporal relationship can be inferred (prospective design)
• Can be used where randomization is not possible
• Magnitude of a risk factor’s effect can be quantified
• Decreases selection and information biases
• Multiple outcomes can be studied for each exposure
• Exposure status can be determined before disease detection

21
Disadvantages of Cohort Studies

• Lengthy and expensive


• May require very large samples
• Not suitable for rare diseases
• Not suitable for diseases with long-latency
• Unexpected environmental changes may
influence the association
• High non-response, migration and loss-to-
follow-up biases

22
3.Case control Study
• A study that starts with the identification of
persons with the disease (or other outcome
variable, the case) of interest, and
• A suitable comparison, reference or control
group of persons without the disease
• And inquiries are made about past exposure
• We compare diseased (cases) and non-diseased
(controls) to find the level of their exposure
• Exposure status is traced back in time
23
Case control Study…

General population (present):
• Have the disease of interest (Cases)
• Do not have the disease of interest (Controls)
Looking back into the past (retrospective nature), each group is classified as Exposed or Not exposed.

24
Case control Study…
• A case-control study is designed to help determine if an exposure is associated with an outcome
• A case-control study is always retrospective because it starts with an outcome and then traces back to investigate exposures
• Controls must be comparable to cases in every aspect except that they do not have the disease

25
Case…
• Preferably controls are drawn from the same
population as the cases
There are two basic types of case-control studies:
A/ Non-matched case-control study
In this study design, the number of controls does
not necessarily equal the number of cases
For example, we may enroll 105 cases and 178
controls

26
Case…
B/ Matched case control
• The study has linked a case to a control based
on matching of one or more variables
• The summary table will differ for a matched
case-control study
• Let's look at an example. Suppose we plan to match cases to controls by gender and age (± 5 years). We first identify the following case:

27
Case…

In a matched study, we enroll controls based upon some characteristic(s) of the case
For example, we might match the sex or the age of the control to the sex or age of the case
The idea in matching is to match upon a potential confounding variable in order to remove its confounding effect
There are two basic types of matched designs:
One-to-one matching (i.e., one case to one control) and
One case to ‘n’ controls (n = 2-4 controls)
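
Matching on sex and age (± 5 years) can be sketched as follows. This is a minimal illustration, assuming each person is a small dictionary with hypothetical `id`, `sex` and `age` fields; real studies usually apply further eligibility rules.

```python
import random


def match_controls(cases, pool, n_controls=1, age_window=5, seed=0):
    """Pick up to n_controls controls per case, matched on sex and on
    age within +/- age_window years. Each control is used at most once."""
    rng = random.Random(seed)
    available = list(pool)
    matches = {}
    for case in cases:
        eligible = [
            person for person in available
            if person["sex"] == case["sex"]
            and abs(person["age"] - case["age"]) <= age_window
        ]
        chosen = rng.sample(eligible, min(n_controls, len(eligible)))
        for person in chosen:
            available.remove(person)  # no control is matched twice
        matches[case["id"]] = [person["id"] for person in chosen]
    return matches


# Hypothetical case and candidate controls, for illustration only.
cases = [{"id": "case1", "sex": "F", "age": 42}]
pool = [
    {"id": "c1", "sex": "F", "age": 45},  # eligible
    {"id": "c2", "sex": "M", "age": 42},  # wrong sex
    {"id": "c3", "sex": "F", "age": 60},  # outside the age window
    {"id": "c4", "sex": "F", "age": 38},  # eligible
]
print(match_controls(cases, pool, n_controls=2))
```

Setting `n_controls=1` gives one-to-one matching; values of 2 to 4 give the one-case-to-n-controls design.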

28
Case…

• Matching should be considered only for risk factors


whose confounding effects need to be controlled for
• Common examples of matching variables
(confounding factors): age, sex, level of education,
wealth (income), marital status, race
• Matching all confounding variables is not efficient and
almost impossible in practice.
• Besides, matching all confounding variables could lead
to overmatching and thus to less precise estimates

29
Advantages of Case-Control Studies

- Relatively cheap, easy and quick studies (compared with cohort & experimental)
- Multiple exposures can be examined
- Suitable when randomization is unethical (like alcohol and pregnancy outcome)

30
Disadvantages of Case-Control Studies

- Selection of case and control is tiresome


- Subject to bias (selection, recall,
misclassification)
- Direct incidence estimation is not possible
- Temporal relationship is not clear
- Multiple outcomes cannot be studied

31
4. Experimental study design
• The subjects in the study who actually
receive the treatment of interest are called
the treatment group
• The subjects who receive no treatment are
called the comparison group ( control group)
• In an experiment, we are interested in the
effect of treatment on outcome of interest

32
4.Experimental…
Before considering an experimental design, you
need to ask yourself the following questions:
• Is it possible/feasible to randomly assign people to treatment and control groups?
• Is the process of random assignment into experimental and control groups ethical and legal?
e.g. pregnant mothers and alcohol drinking (fetal alcohol syndrome)

33
4. Experimental …
There are certain procedures you need to
follow:
• First randomly assign subjects to a treatment and a control group
• Both, treatment & control groups must come
from the same population,
• Both groups should have about the same
number of subjects in them,
• As a general rule, not less than 25 subjects in
each group
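
Random assignment into two groups of about equal size can be sketched with a simple shuffle. The function name, the subject IDs and the seed are hypothetical; this is simple randomization, and real trials often use blocked or stratified schemes instead.

```python
import random


def randomize(subject_ids, seed=None):
    """Shuffle the subjects and split them into two groups of about equal size."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# 50 hypothetical subjects from the same population: 25 per group.
treatment, control = randomize(range(1, 51), seed=42)
print(len(treatment), len(control))
```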

34
4. Experimental …

• The experiment is complete when you take a


final measure, called the post test,
• Your findings should be interpreted primarily
from differences in a post test score between
experimental and control groups
• Use the blind experiment: subjects don't
know which group they're in, the
experimental or control group
• These precautions help protect your study
from the Hawthorne and the Placebo Effect

35
4. Experimental …

• The Hawthorne effect refers to the tendency of subjects to act differently when they know they are being studied, especially if they think they have been singled out for some experimental treatment
• The Placebo effect refers to the tendency of some subjects to think they are "cured" or sufficiently treated when they know about the research
• (A tendency to report a favorable response to any therapy regardless of the physiologic efficacy of the treatment)
36
4.Experimental…
Advantages
• Randomized Controlled Trials (RCTs)
– the “gold standard” of research designs
– provides most convincing evidence of
relationship between exposure and effect
Disadvantages
• Very expensive
• it may be unethical, for example, to assign
persons to certain treatment or comparison
groups
37
Chance, bias and confounding
• The observed statistical association between a
certain outcome and the hypothesized
exposure could be simply a matter of chance
• Or it could be the result of systematic errors (in data collection, sampling, disease or exposure ascertainment): the role of bias
• Or it could be due to the effect of additional
variables that might be responsible for the
observed association: the role of confounding
• Or it could be a real association
38
Confounding

• A confounder is a third factor that is associated with the


exposure and independently affects the risk of
developing the disease
• It distorts the estimate of true relationship between the
exposure and disease.
• This distortion either overestimates or underestimates the true relation between exposure and disease.
• If we eliminate the possibility that the association is due
to chance or due to a faulty study design that causes a
bias, it is likely that the association is a real one

39
Confounding…

40
Controlling chance, bias
Chance
• Obtaining adequate sample size for the study
could reduce the likelihood of chance as a
possible explanation
Bias
• selection bias is best eliminated by
randomization
• information bias can be eliminated by: using
blinding procedures

42
Confounding can be controlled by:
At design level
1. Randomization: assures equal distribution of confounders
between study and control groups e.g. selection bias
2. Restriction: subjects are restricted by the levels of a
known confounder (restricting extraneous variables)
3. Matching: potential confounding factors are kept equal
between the study groups
At analysis level
4. Stratification: for various levels of potential confounders
e.g. categorization such as age category, sex, level of
education
5. Multivariable analysis (Logistic Regression)
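
Stratification (item 4 above) can be illustrated by comparing a crude risk ratio with the risk ratios inside each level of a confounder. The counts are hypothetical and were chosen so that age is associated with both exposure and disease; the helper name is also an assumption.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


# Hypothetical counts per age stratum:
# (cases among exposed, exposed total, cases among unexposed, unexposed total)
strata = {
    "young": (10, 100, 40, 400),
    "old": (120, 400, 30, 100),
}

# Risk ratio within each stratum of the potential confounder.
stratum_rr = {name: risk_ratio(*counts) for name, counts in strata.items()}

# Crude risk ratio: collapse the strata by summing each column.
crude_counts = [sum(column) for column in zip(*strata.values())]
crude_rr = risk_ratio(*crude_counts)

print("Stratum-specific risk ratios:", stratum_rr)
print(f"Crude risk ratio: {crude_rr:.2f}")
```

Here the crude ratio suggests an association, while within each age stratum the risk ratio is 1, so the crude association is explained entirely by age.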
43
Types of data
1. Primary data: data that one has collected oneself
• Better understood by the researcher
• Usually contains few variables
2. Secondary data: data that has already been
collected by somebody else
• Not well understood by the researcher
• Very large number of variables

44
Quantitative data collection
Techniques
• Interview administered questionnaire
• Self-administered questionnaire
• Direct measurement
• Review of record

45
Qualitative data collection
Techniques
• Key informant interview
• In-depth interview
• Focus group discussions
• Observations (direct, participant)

46
Statistical inference

It is the act of using data in a sample to


make generalizations about the population.

01/12/2024 47
Basic concepts

• Parameter refers to a numerical characteristic of a statistical population.
• Statistic refers to a value calculated in a sample.
• We are particularly interested in a type of statistic known as an estimate.
• A statistical estimate is a direct reflection of an underlying population parameter.

1. Estimation

• The computation of a statistic from sample data, yielding a value that is an approximation (guess) of its target, an unknown true population parameter value.

Estimates
We can estimate a population parameter with a sample statistic (a point estimate):

Quantity                 Population parameter    Sample statistic
Proportion               P                       p
Mean                     μ                       x̄
Standard deviation       σ                       s
Mean difference          μ1 − μ2                 x̄1 − x̄2
Proportion difference    P1 − P2                 p1 − p2

Example

• The proportion of WU students who:
– Drank alcohol during the preceding 30-day period was 30.2%,
– Smoked cigarettes was 21.9%,
– Were overweight was 13.5%.
• These values are statistical estimates of population proportions.

Methods of estimation
1) Point estimate:
It is a single numerical value used to estimate the
corresponding population parameter.
It is single best guess
It has the form: [ Value ]

Methods of estimation

2)Interval estimate

• Consists of two numerical values defining a range of values


that, with a specified degree of confidence, we feel includes
the parameter being estimated.

• It has the form of a “range of plausible values”

• It has the form: [ lower limit, upper limit ]

Ctd…
• A point estimate is a single number
• A confidence interval provides additional information
on variability

Lower Confidence Limit | Point Estimate | Upper Confidence Limit
(the distance between the two limits is the width of the confidence interval)

Estimation Process
Population (mean, μ, is unknown) → Random sample (mean x̄ = 50) → "I am 95% confident that μ is between 40 & 60."
Properties of good estimator

A. Unbiased: The mean of the estimates obtained from samples of a given size is equal to the parameter being estimated.
B. Consistent: As sample size increases, the value of the estimator approaches the value of the parameter estimated.
C. Relatively efficient: It has the smallest variance among the estimators being compared.

Confidence Intervals [CI]

• Confidence interval length reflects the precision of the estimate.
– Narrow intervals reflect precision;
– Wide intervals reflect imprecision.

Higher confidence requires a wider interval

The general formula for CIs

point estimate ± (measure of how confident we want to be) × (standard error)

• Point estimate: the value of the statistic in my sample (e.g., mean, proportion, odds ratio, etc.)
• Measure of how confident we want to be: from a Z table or a T table, depending on the sampling distribution of the statistic
• Standard error: the standard error of the statistic

Lower limit

= Point Estimate - (Critical Value) x (Standard Error)

Upper limit

= Point Estimate + (Critical Value) x (Standard Error)


• A wide interval suggests imprecision of estimation.

• A narrow CI reflects a large sample size or low variability or both.

Note:
• Measure of how confident we want to be = critical value = confidence
coefficient

Schematic of a confidence interval

Definition: 95%CI
1. Probabilistic interpretation:
• If all possible random samples (an infinite number) of
a given sample size (e.g. 10 or 100) were obtained and
if each were used to obtain its own CI, then 95% of all
such CIs would contain the unknown population
parameter; the remaining 5% would not.
• It is incorrect to say “there is a 95% probability that
the CI contains the unknown population parameter”.
2. Practical interpretation

• When sampling is from a normally distributed population with known standard deviation, we are 100(1-α)% [e.g., 95%] confident that the single computed interval contains the unknown population parameter.
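
The probabilistic interpretation can be checked by simulation: draw many samples from a population whose mean is known, build a 95% CI from each, and count how often the interval covers the true mean. This is a sketch under assumed values (μ = 50, σ = 10, n = 25, 2000 repetitions); none of these numbers come from the slides.

```python
import random
from statistics import NormalDist, mean


def coverage(mu=50.0, sigma=10.0, n=25, reps=2000, conf=0.95, seed=1):
    """Fraction of simulated z-intervals (known sigma) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should be close to 0.95, and it moves closer as the number of repetitions grows.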

Five 95%CI from sampling distribution of means

Figure 1

• Depicts five 95% confidence


intervals for μ constructed
from five independent SRSs
of the same size from the
same source population.
• It so happens that four of the
five intervals captured μ in
this schematic.
Missed Interval

Confidence Limits
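• The confidence interval is
$\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$
• The endpoints of the interval are
$UCL = \bar{x} + z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$ (upper confidence limit)
$LCL = \bar{x} - z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$ (lower confidence limit)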

Margin of Error
• The confidence interval, $\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$
• Can also be written as $\bar{x} \pm ME$, where ME is called the margin of error:
$ME = z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$
• The interval width, w, is equal to twice the margin of error

Reducing the Margin of Error
$ME = z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$

The margin of error can be reduced if
• The population standard deviation can be reduced (σ↓)
• The sample size is increased (n↑)
• The confidence level is decreased, (1 – α)↓
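
The three ways of reducing the margin of error can be checked numerically. The sketch below assumes a known population standard deviation; the function name and the baseline values (σ = 10, n = 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, conf=0.95):
    """ME = z_{alpha/2} * sigma / sqrt(n) for a known population sigma."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * sigma / sqrt(n)


baseline = margin_of_error(sigma=10, n=100, conf=0.95)
print(f"Baseline ME:            {baseline:.3f}")
print(f"Smaller sigma (5):      {margin_of_error(5, 100):.3f}")
print(f"Larger n (400):         {margin_of_error(10, 400):.3f}")
print(f"Lower confidence (90%): {margin_of_error(10, 100, 0.90):.3f}")
```

Because n enters through a square root, quadrupling the sample size only halves the margin of error.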

Finding $z_{\alpha/2}$
• Consider a 95% confidence interval:
$1 - \alpha = 0.95$, so $\alpha/2 = 0.025$ in each tail
Z units: z = -1.96 (lower confidence limit), 0 (point estimate), z = 1.96 (upper confidence limit)
• Find $z_{0.025} = 1.96$ from the standard normal distribution table

Intervals and Level of Confidence
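[Figure: sampling distribution of the mean, centered at $\mu_{\bar{x}} = \mu$, with area $1 - \alpha$ in the middle and $\alpha/2$ in each tail; below it, confidence intervals built from sample means $\bar{x}_1, \bar{x}_2, \ldots$]
Intervals extend from $LCL = \bar{x} - z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$ to $UCL = \bar{x} + z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$.
100(1-α)% of intervals constructed contain μ; 100(α)% do not.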

1. CI for a Single Population Mean
Known variance (large sample size)
• Three elements to a CI:

1. Point estimate
2. Confidence coefficient
3. SE of the point estimate

The point estimate of the population mean μ is the sample mean $\bar{x}$
The standard error of the mean is $\sigma/\sqrt{n}$
Commonly used CIs are 90%, 95%, and 99%

α-level    1 − α (confidence level)    Critical value z
0.10       90%                         1.645
0.05       95%                         1.960
0.01       99%                         2.576
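
The critical values in the table can be reproduced from the standard normal distribution. A minimal sketch using Python's standard library; `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_critical(conf):
    """Two-sided critical value: the z that leaves alpha/2 in each tail."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)


for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%} CI: z = {z_critical(conf):.3f}")
```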

Degrees of Freedom (df)

• The t value depends on degrees of freedom (d.f.)
• Number of observations that are free to vary after the sample mean has been calculated

d.f. = n - 1

Finding Critical Values of t
Find the critical value for a 95% confidence level when the sample size is 15

Solution
• The degrees of freedom are d.f. = n - 1 = 15 - 1 = 14
• Using d.f. = 14 and c = 0.95, you can find the critical value in the t table: t = 2.145

Choosing between z and t
Example

A random sample of n = 20 durations (minutes) of cardiac bypass surgeries has a mean duration of x̄ = 267 minutes and a variance of s² = 36,700 minutes². Assuming the underlying distribution is normal with unknown variance, construct a 90% CI estimate of the unknown true mean, µ.
Standard error = $s/\sqrt{n} = \sqrt{36{,}700/20} = \sqrt{1835} = 42.84$ minutes
t-value at 90% CL at 19 df = 1.729

Solution
• Putting this all together:
• Lower limit
= (point estimate) - (confidence coefficient)(SE of point estimate)
= 267 - (1.729)(42.84)
= 192.93
• Upper limit
= (point estimate) + (confidence coefficient)(SE of point estimate)
= 267 + (1.729)(42.84)
= 341.07
Thus, a 90% CI for the true mean duration of surgery is (192.93, 341.07) minutes
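
The same calculation can be sketched in code, recomputing the standard error directly from the stated variance and sample size. The t critical value 1.729 (19 df, 90% confidence) is taken from the example rather than computed, because Python's standard library has no t distribution; the function name is hypothetical.

```python
from math import sqrt


def t_interval(sample_mean, sample_variance, n, t_crit):
    """CI for a mean with unknown population variance:
    mean +/- t_crit * sqrt(variance / n)."""
    se = sqrt(sample_variance / n)
    return sample_mean - t_crit * se, sample_mean + t_crit * se, se


lower, upper, se = t_interval(267, 36_700, 20, t_crit=1.729)
print(f"Standard error: {se:.2f} minutes")
print(f"90% CI: ({lower:.2f}, {upper:.2f}) minutes")
```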
Increasing the sample size
• Increases in the sample size reduce the width of the confidence interval.
Example
• If a sample of 100 is doubled to 200, and 50 of the 200 are left-handed, then the 95% interval is still centered at 0.25, but it narrows to (0.19, 0.31)
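
The effect of sample size on the interval can be reproduced with a normal-approximation (Wald) interval for a proportion. Treating the example as a 95% Wald interval is an assumption, as is the comparison sample of 25 left-handed people out of 100; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, conf=0.95):
    """Wald interval: p +/- z * sqrt(p * (1 - p) / n)."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


for successes, n in ((25, 100), (50, 200)):
    lower, upper = proportion_ci(successes, n)
    print(f"n = {n}: ({lower:.2f}, {upper:.2f}), width = {upper - lower:.2f}")
```

Both intervals are centered at 0.25; only the width changes with n.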

Hypotheses Testing

Hypothesis Testing(HT)
• A deductive procedure to judge claims about
parameters.

Hypothesis Testing(HT)
Hypothesis Testing (HT) provides an objective
framework for making decisions

The purpose of HT is to help us reach a conclusion concerning a population by examining a sample.

Types of Hypothesis
1. The Null Hypothesis, H0
• Is a statement claiming that there is no difference between the hypothesized value and the population value.
• The effect of interest is zero = no difference
• States the assumption (hypothesis) to be tested

The Null Hypothesis, H0

• States the assumption to be tested
Example: The average age of the Ethiopian population is at least fifty ($H_0: \mu \ge 50$)
• Is always about a population parameter, not about a sample statistic (even though a sample is used): write $H_0: \mu \ge 50$, not $H_0: \bar{x} \ge 50$

The Null Hypothesis…
• Begin with the assumption that the Ho is true
Similar to the notion of innocent until proven
guilty
• Always contains “=” , “ ≤” or “≥ ” sign
• May or may not be rejected

Types of hypothesis

2. The Alternative Hypothesis, HA

• Is a statement of what we believe is true if our sample data cause us to reject Ho.
• Is generally the hypothesis that is believed (or needs to be supported) by the researcher.

Alternative Hypothesis
• Is a statement that disagrees (opposes) with
Ho.
• The effect of interest is not zero

• Never contains “=” , “ ≤” or “≥ ” sign

• May or may not be accepted

Summary

Statistical Hypotheses

Guide to common hypothesis tests:
• Test on parameter of a single population
• Tests comparing parameters of two populations
• Tests comparing parameters of more than two populations

Hypothesis testing

Detail Process of Hypothesis Testing
1. Specify population parameter of interest
2. Formulate the null and alternative hypotheses
3. Specify the desired significance level, α
4. Define the rejection region
5. Take a random sample and determine whether or not
the sample result is in the rejection region
6. Reach a decision and draw a conclusion
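
The six steps above can be sketched for a lower-tailed one-sample z test of H0: μ ≥ 50 against HA: μ < 50. The sample values (x̄ = 46, n = 36), the known σ = 12 and the function name are hypothetical, chosen only to illustrate the procedure.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Steps 3 to 6 for H0: mu >= mu0 versus HA: mu < mu0 (sigma known)."""
    # Steps 3 and 4: significance level and rejection region (z < z_crit).
    z_crit = NormalDist().inv_cdf(alpha)
    # Step 5: standardize the sample result.
    z = (xbar - mu0) / (sigma / sqrt(n))
    # Step 6: decision.
    return z, z_crit, z < z_crit


# Steps 1 and 2: the parameter is the population mean; H0: mu >= 50, HA: mu < 50.
z, z_crit, reject = lower_tail_z_test(xbar=46, mu0=50, sigma=12, n=36)
print(f"z = {z:.2f}, critical value = {z_crit:.3f}")
print("Reject H0" if reject else "Fail to reject H0")
```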

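Decision rule
• The test statistic is computed from the data of the sample
• The decision to reject or not to reject the Ho is based on the magnitude of the test statistic.
• An example of a test statistic is the quantity $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, where $\mu_0$ is the hypothesized mean

• When the variance of the population is unknown, we use $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$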

Statistical Decision Rule

1. Rejection region

2. P-value approach

REJECTION REGION

• If the standardized test statistic (calculated) is:
– In the rejection region, then reject H0
– Not in the rejection region, then fail to reject H0

I. Rejection and Non-Rejection Regions…

• The values of the test statistic forming the


rejection region are less likely to occur and the
values making the acceptance (non-rejection)
region are more likely to occur if the Ho is true.

II. P-value approach

• Reject Ho if P-value < α
• Fail to reject Ho if P-value ≥ α
• The P-value is the probability of obtaining a test statistic as extreme as or more extreme than the actual test statistic obtained, if the Ho is true
• The larger the absolute value of the test statistic, the smaller the P-value. The smaller the P-value, the stronger the evidence against the Ho.
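
For a z test statistic, the P-value can be computed from the standard normal distribution. A minimal sketch; the function name and the `tail` argument are hypothetical.

```python
from statistics import NormalDist


def p_value(z, tail="two-sided"):
    """P-value of a z statistic under H0, for the chosen alternative."""
    cdf = NormalDist().cdf
    if tail == "lower":
        return cdf(z)
    if tail == "upper":
        return 1 - cdf(z)
    if tail == "two-sided":
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'lower', 'upper' or 'two-sided'")


alpha = 0.05
for z in (1.0, 1.96, 3.0):
    p = p_value(z)
    decision = "reject Ho" if p < alpha else "fail to reject Ho"
    print(f"z = {z:.2f}: P-value = {p:.4f} -> {decision}")
```

The loop shows the relationship stated above: as the absolute value of the test statistic grows, the P-value shrinks.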
P-value…
• If the P-value ≥ 0.05, then the results are considered not statistically significant.
• If 0.01 < P-value < 0.05, then the results are significant.
• If 0.001 < P-value ≤ 0.01, then the results are highly significant.
• If the P-value ≤ 0.001, then the results are very highly significant.

Level of Significance, α

• Is the probability of rejecting a true Ho
• Defines unlikely values of the sample statistic if Ho is true
– Defines the rejection region of the sampling distribution
• The decision is made on the basis of the level of significance, designated by α.
• More frequently used values of α are 0.01, 0.05 and 0.10.
• α is selected by the researcher at the beginning

Types of Errors in Hypothesis Tests

• When we reject or fail to reject the Ho, we may commit errors
These are:
– Type I error (α)
– Type II error (β)

Type I Error(False positive) Type II Error(False Negative)

Type I Error
• The error committed when a true Ho is rejected
• Considered a serious type of error
• The probability of a type I error is the probability of
rejecting the Ho when it is true
• The probability of type I error is α
• Called level of significance of the test
• Set by researcher in advance

Type II Error
• The error committed when a false Ho is not rejected.
• The probability of Type II Error is β
• Usually unknown but larger than α

Power
• The probability of rejecting the Ho when it is false.

Power = 1 – β = 1- probability of type II error

• We would like to maintain low probability of a Type I


error (α) and low probability of a Type II error (β)
[high power = 1 - β].
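
The relationship between α, β and power can be sketched for an upper-tailed one-sample z test (H0: μ = μ0 against HA: μ > μ0) with known σ. The function name and all numeric values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tail(mu0, mu1, sigma, n, alpha=0.05):
    """Power = P(reject H0 | true mean is mu1) for an upper-tailed z test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # How far the true mean sits from mu0, in standard-error units.
    shift = (mu1 - mu0) / (sigma / sqrt(n))
    return 1 - std_normal.cdf(z_crit - shift)


for n in (10, 25, 50):
    power = power_upper_tail(mu0=50, mu1=55, sigma=10, n=n)
    print(f"n = {n}: power = {power:.3f}, beta = {1 - power:.3f}")
```

With α held fixed, a larger sample lowers β and so raises power, which is how both error probabilities can be kept low at the same time.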

Action (Conclusion)    Reality: Ho True             Reality: Ho False
Do not reject Ho       Correct action               Type II error (β)
                       (Prob. = 1-α)                (Prob. = β = 1-Power)
Reject Ho              Type I error (α)             Correct action
                       (Prob. = α = Sign. level)    (Prob. = Power = 1-β)

Thank You !
