Measurement (Study Design and Inferential Statistics)
Designs
By : Alemayo E(BSc,MPH)
Epidemiological Study Designs
Study designs could be
• Exploratory/experimental
• Observational
– Analytical or
– Descriptive
Epidemiologic Study Designs
• Experimental (RCTs)
• Observational
– Analytical: cross-sectional, cohort, case-control
– Descriptive (describing patterns of disease)
Choosing Epidemiologic Study
Designs
Hierarchy of Epidemiologic Study Design
Establish causality
Epidemiologic Study Designs
Grimes & Schulz, 2002 (www)
A cross-sectional study Design
A cross-sectional…
Cross-sectional…
• The current or historical status of individuals is
assessed and may be examined in relation to
some current or past exposure
• These studies are most useful for conditions that are
not rapidly fatal but are not routinely brought to
medical attention (e.g., elevated blood pressure,
elevated blood cholesterol, many psychiatric disorders,
diet, subclinical infections)
Cross-sectional…
• Participants for a cross-sectional study are
generally chosen without previous knowledge
of their disease or exposure status
Cross-sectional…
• The researcher has no control over the exposure
of interest in the research subjects; there are no interventions
• It involves
– identifying a defined population at a
particular point in time (target population)
– measuring a range of variables on an
individual basis
– including past and current practices/behaviors
Cross Sectional…
• KAP (knowledge, attitude, and practices) study:
KAP studies are purely descriptive and help to build up
a better understanding of the behavior of the population,
without necessarily relating this to any disease or health
outcome.
• Management tool: health service managers and
planners may use cross-sectional surveys to
assess the utilization and effectiveness of services
• Development of hypotheses: hypotheses on the causes
of disease may be developed using data from
cross-sectional surveys
Limitation of cross-sectional study
2. Cohort Study Designs
An approach to studying disease incidence and risk
factors.
Approach:
1. Start with a population at risk (a defined group of
people)
2. Measure characteristics at baseline
3. Follow up the population (initially without the event)
over time
4. Compare the rate of events (incidence) among those
with and without the exposure
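Step 4 can be illustrated with a small calculation. This is a minimal sketch: the counts below are hypothetical (they do not come from the slides), and the names `cumulative_incidence`, `risk_ratio` and `risk_difference` are chosen here only for illustration.

```python
# Hypothetical cohort counts (illustrative only, not taken from the slides).
exposed_cases, exposed_total = 30, 200        # events among the exposed
unexposed_cases, unexposed_total = 15, 300    # events among the unexposed


def cumulative_incidence(cases: int, total: int) -> float:
    """Proportion of an initially event-free group that develops the event."""
    return cases / total


risk_exposed = cumulative_incidence(exposed_cases, exposed_total)
risk_unexposed = cumulative_incidence(unexposed_cases, unexposed_total)

# Two common ways to compare incidence between the groups.
risk_ratio = risk_exposed / risk_unexposed
risk_difference = risk_exposed - risk_unexposed

print(f"Incidence, exposed:   {risk_exposed:.3f}")
print(f"Incidence, unexposed: {risk_unexposed:.3f}")
print(f"Risk ratio:           {risk_ratio:.2f}")
print(f"Risk difference:      {risk_difference:.3f}")
```

With these made-up counts the exposed group has three times the incidence of the unexposed group.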
Cohort Studies
Population: people without disease
(A) Exposed → Diseased or Not diseased
(B) Not exposed → Diseased or Not diseased
Cohort Study..
Prospective cohort study
Exposure → Study starts → Disease occurrence (time →)
Study starts → Exposure → Disease occurrence (time →)
Retrospective cohort studies
Exposure → Disease occurrence → Study starts (time →)
Advantages of Cohort Studies
Disadvantages of Cohort Studies
3. Case-Control Study
• A study that starts with the identification of
persons with the disease (or other outcome
variable) of interest (the cases), and
• A suitable comparison (reference or control)
group of persons without the disease
• Inquiries are then made about past exposure
• We compare diseased (cases) and non-diseased
(controls) to find the level of their exposure
• Exposure status is traced back in time
Case control Study…
[Figure: case-control schematic; surviving labels: past, general population]
Cont
• A case-control study is designed to help
determine if an exposure is associated with an
outcome
• Case-control study is always retrospective
because it starts with an outcome then traces
back to investigate exposures
• Controls must be comparable to cases in every
aspect except that they do not have the
disease
Case…
• Preferably controls are drawn from the same
population as the cases
There are two basic types of case-control studies:
A/ Non-matched case-control study
In this study design, the number of controls does
not necessarily equal the number of cases
For example, we may enroll 105 cases and 178
controls
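For a non-matched design, the association between exposure and outcome is usually summarized with an odds ratio from a 2×2 table. The sketch below uses the group sizes from the example (105 cases, 178 controls); the split into exposed and unexposed is hypothetical, and the function name `odds_ratio` is an illustrative choice.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio from a 2x2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    return (a * d) / (b * c)


# Group sizes follow the slide; the exposure split is an assumption.
exposed_cases, unexposed_cases = 42, 63          # 105 cases in total
exposed_controls, unexposed_controls = 40, 138   # 178 controls in total

or_estimate = odds_ratio(exposed_cases, exposed_controls,
                         unexposed_cases, unexposed_controls)
print(f"Odds ratio: {or_estimate:.2f}")
```

An odds ratio above 1 would suggest that exposure is more common among cases than among controls.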
Case…
B/ Matched case-control study
• Each case is linked to a control by
matching on one or more variables
• The summary table will differ for a matched
case-control study
• Let's look at an example. Suppose we plan to
match cases to controls by gender and age (± 5
years). We first identify the following case:
Case…
Case…
Advantages of Case-Control Studies
Disadvantages of Case-Control Studies
4. Experimental study design
• The subjects in the study who actually
receive the treatment of interest are called
the treatment group
• The subjects who receive no treatment are
called the comparison group (control group)
• In an experiment, we are interested in the
effect of treatment on outcome of interest
4.Experimental…
Before considering an experimental design, you
need to ask yourself the following questions:
• Is it possible/feasible to randomly assign
people to treatment and control groups?
• Is random assignment into experimental and
control groups ethical and legal?
e.g., assigning pregnant mothers to alcohol drinking
(fetal alcohol syndrome)
4. Experimental …
There are certain procedures you need to
follow:
• First, randomly assign subjects to a treatment
group and a control group
• Both treatment and control groups must come
from the same population
• Both groups should have about the same
number of subjects
• As a general rule, not fewer than 25 subjects in
each group
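The random assignment step can be sketched as follows. This is only an illustration of simple randomization into two equal-sized groups: the subject identifiers, the seed, and the function name `randomize` are assumptions, and real trials often use more elaborate schemes.

```python
import random


def randomize(subject_ids, seed=None):
    """Shuffle the subjects and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)      # a seeded generator makes the allocation reproducible
    shuffled = list(subject_ids)   # copy, so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# 50 hypothetical subjects drawn from the same population.
subjects = [f"S{i:02d}" for i in range(1, 51)]
groups = randomize(subjects, seed=2024)

print(len(groups["treatment"]), "subjects in the treatment group")
print(len(groups["control"]), "subjects in the control group")
```

With 50 subjects each group receives 25, which meets the rule of thumb stated above.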
4. Experimental …
4. Experimental …
Confounding…
Controlling chance, bias
Chance
• Obtaining an adequate sample size for the study
reduces the likelihood of chance as a
possible explanation
Bias
• Selection bias is best minimized by
randomization
• Information bias can be reduced by using
blinding procedures
Confounding can be controlled by:
At design level
1. Randomization: tends to distribute known and unknown confounders
equally between study and control groups (it also guards against selection bias)
2. Restriction: subjects are restricted by the levels of a
known confounder (restricting extraneous variables)
3. Matching: potential confounding factors are kept equal
between the study groups
At analysis level
4. Stratification: analysis within levels of a potential confounder,
e.g., categories of age, sex, or level of
education
5. Multivariable analysis (e.g., logistic regression)
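Stratification can be illustrated by comparing a crude odds ratio with stratum-specific ones. The sketch below uses hypothetical counts for two age strata, and it adds the Mantel-Haenszel pooled odds ratio as one common way of summarizing across strata; that estimator is not named in the slides and is introduced here as an assumption.

```python
def odds_ratio(a, b, c, d):
    """a: exposed cases, b: exposed non-cases, c: unexposed cases, d: unexposed non-cases."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Pooled odds ratio across strata (Mantel-Haenszel estimator)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical 2x2 tables (a, b, c, d) for two levels of a potential confounder.
strata = {
    "younger": (5, 45, 10, 90),
    "older": (40, 40, 10, 10),
}

# Collapsing the strata gives the crude (unadjusted) table.
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)
stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}
adjusted_or = mantel_haenszel_or(strata.values())

print(f"Crude OR:    {crude_or:.2f}")
for name, value in stratum_ors.items():
    print(f"OR, {name}: {value:.2f}")
print(f"Adjusted OR: {adjusted_or:.2f}")
```

In this made-up data the crude odds ratio suggests an association, while within each age stratum the odds ratio is 1, which is the pattern expected when age confounds the relationship.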
Types of data
1. Primary data: data that one has collected oneself
• Better understood by the researcher
• Usually contains few variables
2. Secondary data: data that has already been
collected by somebody else
• Not well understood by the researcher
• Very large number of variables
Quantitative data collection
Techniques
• Interviewer-administered questionnaire
• Self-administered questionnaire
• Direct measurement
• Review of records
Qualitative data collection
Techniques
• Key informant interview
• In-depth interview
• Focus group discussions
• Observations (direct, participant)
Statistical inference
01/12/2024 47
Basic concepts
1. Estimation
Estimates
We can estimate a population parameter with a sample statistic (a point estimate):
Quantity                 Population parameter    Sample statistic
Proportion               P                       p
Mean                     μ                       x̄
Standard deviation       σ                       s
Mean difference          μ1 − μ2                 x̄1 − x̄2
Proportion difference    P1 − P2                 p1 − p2
Example
30.2%,
Methods of estimation
1) Point estimate:
It is a single numerical value used to estimate the
corresponding population parameter.
It is a single best guess
It has the form: [ Value ]
Methods of estimation
2)Interval estimate
Ctd…
• A point estimate is a single number
• A confidence interval provides additional information
on variability
Lower confidence limit ← Point estimate → Upper confidence limit
Width of confidence interval = upper confidence limit − lower confidence limit
Estimation Process
Population (mean, μ, is unknown) → Random sample (sample mean = 50) → "I am 95% confident that μ is between 40 & 60."
Properties of good estimator
Confidence Intervals[CI]
Higher confidence requires a wider interval
The general formula for CIs
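point estimate ± (measure of how confident we want to be) × (standard error)
• Point estimate: the value of the statistic in my sample (e.g., mean,
proportion, odds ratio, etc.)
• Lower limit = point estimate − (confidence coefficient) × (SE of point estimate)
• Upper limit = point estimate + (confidence coefficient) × (SE of point estimate)
Note:
• Measure of how confident we want to be = critical value = confidence
coefficient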
Schematic of a confidence interval
Definition: 95%CI
1. Probabilistic interpretation:
• If all possible random samples (an infinite number) of
a given sample size (e.g. 10 or 100) were obtained and
if each were used to obtain its own CI, then 95% of all
such CIs would contain the unknown population
parameter; the remaining 5% would not.
• It is incorrect to say “there is a 95% probability that
the CI contains the unknown population parameter”.
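The repeated-sampling interpretation can be checked with a small simulation. This is a sketch under assumed conditions: a normal population with a known standard deviation, a made-up true mean of 50, and a finite number of samples standing in for "all possible samples". The function name `coverage` and its defaults are illustrative.

```python
import random
from statistics import NormalDist, mean


def coverage(true_mean=50.0, sigma=10.0, n=25, n_samples=2000, level=0.95, seed=1):
    """Fraction of simulated z-based CIs that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # critical value, about 1.96 for 95%
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = mean(sample)
        # Each sample yields its own interval; count how many cover the true mean.
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / n_samples


print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should be close to 0.95, and each individual interval either contains the true mean or does not.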
2. Practical interpretation
Five 95%CI from sampling distribution of means
Figure 1
Confidence Limits
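• The confidence interval is
$\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
• The endpoints of the interval are
$\text{UCL} = \bar{x} + z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ (upper confidence limit)
$\text{LCL} = \bar{x} - z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ (lower confidence limit)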
Margin of Error
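• The confidence interval, $\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$, can be read as point estimate ± margin of error, where the margin of error is $z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$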
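A short sketch of the margin of error for a mean with known σ follows. The input values (sample mean 50, σ = 10, n = 25) are hypothetical, and `z_interval` is an illustrative name, not a library function.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, level=0.95):
    """Return (LCL, UCL, margin of error) for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # z for the chosen confidence level
    margin = z * sigma / sqrt(n)                    # margin of error
    return sample_mean - margin, sample_mean + margin, margin


# Hypothetical inputs, chosen only to illustrate the arithmetic.
lcl, ucl, margin = z_interval(sample_mean=50.0, sigma=10.0, n=25)
print(f"Margin of error: {margin:.2f}")
print(f"95% CI: ({lcl:.2f}, {ucl:.2f})")
```

The margin shrinks as n grows and widens as the confidence level or σ increases.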
Finding $z_{1-\alpha/2}$
• Consider a 95% confidence interval:
$1 - \alpha = .95$, so $\alpha = .05$ and $\alpha/2 = .025$ in each tail
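The critical value can also be obtained numerically instead of from a table. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is an illustrative choice.

```python
from statistics import NormalDist


def z_critical(level: float) -> float:
    """Two-sided critical value: the z that leaves alpha/2 in each tail."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

For the commonly used levels this gives approximately 1.645, 1.960 and 2.576.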
Intervals and Level of Confidence
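[Figure: sampling distribution of the mean, centered at $\mu_{\bar{x}} = \mu$, with central area $1-\alpha$ and area $\alpha/2$ in each tail; confidence intervals drawn around sample means $\bar{x}_1$ and $\bar{x}_2$]
Intervals extend from $\text{LCL} = \bar{x} - z\,\dfrac{\sigma}{\sqrt{n}}$ to $\text{UCL} = \bar{x} + z\,\dfrac{\sigma}{\sqrt{n}}$.
$100(1-\alpha)\%$ of intervals constructed contain μ; $100\alpha\%$ do not.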
1. CI for a Single Population Mean
Known variance (large sample size)
• Three elements to a CI:
1. Point estimate
2. Confidence coefficient
3. SE of the point estimate
The point estimate of the population mean μ is the sample mean $\bar{x}$
The standard error of the mean is $\sigma/\sqrt{n}$
Commonly used CIs are 90%, 95%, and 99%
α       Confidence level (%)    z
0.10    90                      1.645
0.05    95                      1.960
0.01    99                      2.576
Degrees of Freedom (df)
d.f. = n - 1
Finding Critical Values of t
Find the critical value for a 95% confidence level when the
sample size is 15
Solution
• The degrees of freedom are d.f. = n − 1 = 15 − 1 = 14
• Using d.f. = 14 and c = 0.95,
you can find the critical
value as shown by the
highlighted areas in the
table (t = 2.145).
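When a t table is not at hand, the critical value can be approximated numerically. The sketch below uses only the standard library: it integrates the t density with Simpson's rule and searches for the critical value by bisection. The function names are illustrative, and in practice a statistics library routine (for example `scipy.stats.t.ppf`) would normally be used instead.

```python
from math import gamma, pi, sqrt


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(level: float, df: int) -> float:
    """Two-sided critical value found by bisection on the CDF."""
    target = 1 - (1 - level) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 15
print(f"d.f. = {n - 1}, t = {t_critical(0.95, n - 1):.3f}")
```

For d.f. = 14 and a 95% confidence level the result agrees with the tabulated value of about 2.145.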
Choosing between z and t
Example
Solution
• Putting this all together:
• Lower limit
= (point estimate) − (confidence coefficient) × (SE of point estimate)
= 267 − (1.729) × (42.7)
= 193.17
• Upper limit
= (point estimate) + (confidence coefficient) × (SE of point estimate)
= 267 + (1.729) × (42.7)
= 340.83
Thus, a 90% CI for the true mean duration of surgery is
(193.17, 340.83) minutes
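The arithmetic of this solution can be reproduced in a few lines. The three input values are taken from the solution above; the variable names are illustrative.

```python
point_estimate = 267.0    # sample mean duration of surgery, in minutes
standard_error = 42.7     # SE of the point estimate
conf_coefficient = 1.729  # confidence coefficient used for the 90% CI

margin = conf_coefficient * standard_error
lower = point_estimate - margin
upper = point_estimate + margin

print(f"90% CI: ({lower:.2f}, {upper:.2f}) minutes")
# prints: 90% CI: (193.17, 340.83) minutes
```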
Increasing the sample size
• Increasing the sample size reduces the width of the
confidence interval.
Example
• If the sample size in the above example is doubled to
200, and 50 in the sample are left-handed, then the
interval is still centered at 0.25, but it narrows
to (0.19, 0.31)
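The effect of sample size can be checked with a normal-approximation interval for a proportion. This is a sketch: the earlier example is not reproduced in this section, so the original sample of 100 with 25 left-handed people is an assumption inferred from "doubled to 200" and the center of 0.25, and `proportion_ci` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes: int, n: int, level: float = 0.95):
    """Normal-approximation CI for a proportion: p ± z * sqrt(p(1 - p) / n)."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


small = proportion_ci(25, 100)   # assumed original sample
large = proportion_ci(50, 200)   # doubled sample from the example

print(f"n = 100: ({small[0]:.2f}, {small[1]:.2f})")
print(f"n = 200: ({large[0]:.2f}, {large[1]:.2f})")
```

Both intervals are centered at 0.25, and the one based on 200 observations is the narrower of the two.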
Hypotheses Testing
Hypothesis Testing(HT)
• A deductive procedure to judge claims about
parameters.
Hypothesis Testing(HT)
Hypothesis Testing (HT) provides an objective
framework for making decisions
Types of Hypothesis
1. The Null Hypothesis, H0
The Null Hypothesis, H0
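$H_0: \mu = 50$, not $H_0: \bar{x} = 50$ (the null hypothesis is stated about the population parameter, not the sample statistic)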
The Null Hypothesis…
• Begin with the assumption that Ho is true
• Similar to the notion of innocent until proven
guilty
• Always contains the “=”, “≤” or “≥” sign
• May or may not be rejected
Types of hypothesis
Alternative Hypothesis
• Is a statement that disagrees with (opposes)
Ho.
• States that the effect of interest is not zero
Summary
Statistical Hypotheses
Guide to common hypothesis tests:
• Test on parameter of a single population
• Tests comparing parameters of two populations
• Tests comparing parameters of more than two populations
Hypothesis testing
Detail Process of Hypothesis Testing
1. Specify population parameter of interest
2. Formulate the null and alternative hypotheses
3. Specify the desired significance level, α
4. Define the rejection region
5. Take a random sample and determine whether or not
the sample result is in the rejection region
6. Reach a decision and draw a conclusion
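The six steps can be traced in a short sketch of a two-sided one-sample z test. The test choice, the numbers (μ0 = 50, σ = 10, n = 64, sample mean 53) and the function name `z_test_two_sided` are assumptions made for illustration; the sample mean is supplied directly rather than computed from raw data.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z test of H0: mu = mu0 when sigma is known."""
    # Steps 1-2: the parameter is the population mean; H0: mu = mu0, HA: mu != mu0.
    # Step 3: alpha is the significance level chosen in advance.
    # Step 4: the rejection region is |z| > z_crit.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 5: compute the test statistic from the sample result.
    z_stat = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 6: decide whether the statistic falls in the rejection region.
    reject = abs(z_stat) > z_crit
    return z_stat, z_crit, reject


z_stat, z_crit, reject = z_test_two_sided(sample_mean=53.0, mu0=50.0, sigma=10.0, n=64)
print(f"z = {z_stat:.2f}, critical value = {z_crit:.2f}")
print("Reject Ho" if reject else "Do not reject Ho")
```

Here z = 2.40 exceeds the critical value of about 1.96, so Ho would be rejected at the 5% level.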
Decision rule
• The test statistic is computed from the data of the sample
• The decision to reject or not to reject the Ho is based on
the magnitude of the test statistic.
• An example of a test statistic is the quantity $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
Statistical Decision Rule
1. Rejection region
2. P-value approach
REJECTION REGION
I. Rejection and Non-Rejection Regions…
II. P-value approach
• Reject Ho if P-value < α; do not reject Ho if P-value ≥ α
• The P-value is the probability of obtaining a test statistic as
extreme as or more extreme than the one actually obtained,
if Ho is true
• The larger the test statistic in absolute value, the smaller the
P-value; the smaller the P-value, the stronger the evidence
against Ho.
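The relation between the size of the test statistic and the P-value can be seen numerically. The sketch assumes a two-sided z test, so the P-value is twice the upper-tail area beyond |z|; the function name `two_sided_p_value` and the z values are illustrative.

```python
from statistics import NormalDist


def two_sided_p_value(z_stat: float) -> float:
    """Probability, under H0, of a z statistic at least as extreme as the one observed."""
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))


alpha = 0.05
for z in (1.0, 1.96, 2.4, 3.0):
    p = two_sided_p_value(z)
    decision = "reject Ho" if p < alpha else "do not reject Ho"
    print(f"z = {z:.2f}  P-value = {p:.4f}  ->  {decision}")
```

As z moves further from zero the P-value falls, and Ho is rejected once the P-value drops below α.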
P-value…
Types of Errors in Hypothesis Tests
Type I Error(False positive) Type II Error(False Negative)
Type I Error
• The error committed when a true Ho is rejected
• Considered a serious type of error
• The probability of a type I error is the probability of
rejecting the Ho when it is true
• The probability of type I error is α
• Called level of significance of the test
• Set by researcher in advance
Type II Error
• The error committed when a false Ho is not
rejected.
• The probability of a Type II error is β
• Usually unknown but larger than α
Power
• The probability of rejecting the Ho when it is false.
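Power equals 1 − β, and for simple cases it can be computed directly. The sketch below assumes a two-sided one-sample z test with known σ; the scenario values (μ0 = 50, true mean 53, σ = 10) and the function name `power_two_sided_z` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """P(reject H0: mu = mu0) when the true mean is mu1, for a two-sided z test."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = (mu1 - mu0) / (sigma / sqrt(n))   # true difference in standard-error units
    # Probability that the statistic lands in the upper or the lower rejection region.
    return std.cdf(-z_crit + shift) + std.cdf(-z_crit - shift)


for n in (16, 64, 144):
    power = power_two_sided_z(mu0=50.0, mu1=53.0, sigma=10.0, n=n)
    print(f"n = {n:3d}: power = {power:.3f}, beta = {1 - power:.3f}")
```

Power rises with sample size, and when the true mean equals μ0 the rejection probability reduces to α.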
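Action (Conclusion)    Reality: Ho True     Reality: Ho False
Do not reject Ho       Correct decision     Type II error (β)
Reject Ho              Type I error (α)     Correct decision (power = 1 − β)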
Thank You !