Lecture 8b Inference June 22


Inference

and
Hypothesis Testing

Lecture 8b
June 22, 2005
Kevin Schwartzman MD
Inference & Hypothesis Testing

Reading

Fletcher, chapters 9 and 11


Inference & Hypothesis Testing - Slide 1

Objectives

Students will be able to:

1. Distinguish between association and causation

2. Describe strengths and weaknesses of various study


designs with respect to causal inference

3. Define the terms alpha (significance level), beta,


type 1 and 2 errors, and statistical power

4. Explain the conceptual relationship between sample


size requirement and statistical power

5. Distinguish between clinical and statistical significance

6. Distinguish between measures of health effect and


measures of statistical association
Inference & Hypothesis Testing - Slide 2

Objectives

7. Distinguish indices of etiologic versus clinical or


public health importance

8. Describe the phenomenon of multiple comparisons

9. Apply this concept to understanding the difference


between primary and secondary research
objectives, and hypothesis testing versus hypothesis
generation

10. Provide potential explanations for conflicting


findings
Inference & Hypothesis Testing - Slide 3

During this course, we have focused primarily on


generation of valid estimates of association between
exposure (or treatment) and outcome, using various
study designs.

Valid estimates of association require that bias be


avoided (e.g. selection, measurement) and that
potential confounders be properly addressed.

Effect measures and the related statistical tests are used


to evaluate potential associations.

Association of exposure and outcome does not


necessarily imply causation - even without obvious
confounding, there may be other factors responsible.
Inference & Hypothesis Testing - Slide 4

Causes of most diseases remain incompletely


understood.

e.g. smoking is the strongest known risk factor


for lung cancer, but
- some non-smokers develop lung cancer too
- most smokers do not develop lung cancer

Rothman points out that differences which appear to be


due to chance now may ultimately be shown
to relate to important causal factors, e.g. genetics.
Inference & Hypothesis Testing - Slide 5

Hill criteria - used to evaluate possibility of causation;


(from Rothman, Modern Epidemiology, 2nd ed., 1998)

1. Strength of association (magnitude of effect)

2. Consistency - more an argument against associations due to chance

3. Specificity - idea that a cause should lead to a single effect -


not tenable

4. Temporality

5. Biologic gradient (dose-response)

6. Biologic plausibility

7. Coherence (also biologic) with natural history of disease/condition

8. Experimental evidence

9. Analogy - parallels exposure-outcome relationship for similar
exposure and/or disease
Inference & Hypothesis Testing - Slide 6

Study Design and Inference

Ecologic Studies
- best for generating questions about
possible associations
- no idea about confounding factors
- many other potential explanations for
putative associations

Cross-Sectional Studies
- better ability to assess confounders
- significant problems with temporality
- best suited for stable exposures and outcomes
Inference & Hypothesis Testing - Slide 7

Cohort Studies
- temporality usually not a problem

- threatened by selection bias,


ascertainment issues (especially losses to follow-up),
confounding

Case-Control Studies
- temporality may be problematic
- key issue is control selection

- ascertainment issues particularly reconstruction


of exposure history
- confounders (measured, unmeasured)
Inference & Hypothesis Testing - Slide 8

Randomized Clinical Trials

May not be feasible, ethical, or appropriate


(e.g. etiologic studies)

Control of confounding is only truly optimal at


time of randomization
Selection of subjects may severely hamper
generalizability
Ascertainment issues (blinding, losses to follow-up)
Cointerventions, overlapping treatments
(contamination)
Efficacy vs. effectiveness (what is the relevant
question?)

Variable concordance between experimental and


observational studies addressing same questions.
Inference & Hypothesis Testing - Slide 9

Example: Association vs. Causation?

Spitzer et al. Use of beta-agonists and the risk of death


and near-death from asthma (NEJM 1992)

Beta-agonists, e.g. Ventolin - bronchodilators for


asthma treatment

Used Saskatchewan prescription database

Identified cohort of over 12,000 patients using asthma


medications
Inference & Hypothesis Testing - Slide 10

Nested Case-Control Study:

Identified 44 deaths probably due to asthma, plus
85 subjects with near-fatal asthma; these were the
cases

655 controls randomly selected, after matching on


region, welfare use, age, hospitalization within 2 years,
date of entry into cohort, date of index event
Inference & Hypothesis Testing - Slide 11

Investigators found increased odds of beta-agonist use


among patients with fatal or near-fatal asthma

Also found dose-response effect,


e.g., adjusted OR 8.0 for 1-2 canisters of albuterol
per month compared with no use

Does increased use of beta agonists cause


asthma fatalities?
Inference & Hypothesis Testing - Slide 12

Biologic plausibility:
Beta agonists have cardiac effects, e.g. possible
arrhythmias - but this cause of death was not
specifically evaluated (though patients in study generally
died of respiratory failure)
Other potential explanations:
- confounding by indication (severity of disease)
- difficult to adjust for severity with this study design
- patient/physician behaviour
- marker for overreliance on symptomatic treatment
as opposed to more definitive anti-asthma therapy
- may also be a marker for quality of asthma care,
some aspects of which could not be adjusted
in this type of analysis
Inference & Hypothesis Testing - Slide 13

Like other scientific research, inference in epidemiology


involves explicit hypothesis testing
This requires a clearly defined hypothesis a priori

Data obtained is used to address the primary hypothesis


using a statistical approach, i.e.

1. There is a so-called alternative hypothesis (Ha)


advanced, regarding an association between
exposure and outcome

2. There is a corresponding null hypothesis (Ho),


which states that there is no association
between exposure and outcome.
Inference & Hypothesis Testing - Slide 14

Various test statistics are calculated, depending on


the setting, which summarize the difference
between the data observed, and those expected
under the null hypothesis, while accounting for
variance (spread) in the data.

Statistical hypothesis testing involves an estimate


as to the probability of obtaining an equally or more
extreme test statistic if the null hypothesis is in fact
correct.
Inference & Hypothesis Testing - Slide 15

Alpha (α) is the preset probability level at which
we reject the null hypothesis,
i.e. it is unlikely to explain the observed findings

- this is conventionally set at α = 0.05
i.e. the probability of findings at least as extreme as those observed
is 5% or less if the null hypothesis is in fact correct

- the estimated probability of obtaining an


equally or more extreme test statistic under
the null hypothesis is the significance level
(P-value)
Inference & Hypothesis Testing - Slide 16

Use of the confidence interval (e.g. 95% CI) has


the advantage of incorporating the notion of
spread/precision while preserving the hypothesis
testing component

- the 95% confidence interval may be thought of as
a range constructed so that, if the sampling or study
were repeated many times, 95% of the intervals obtained
would be expected to contain the true value
(e.g. mean cholesterol level among patients
with first myocardial infarcts) or the true association
(e.g. odds ratio for smoking in lung cancer patients
vs controls)

- we are 95% confident that the true value lies


within the specified range

- based on observed value and its variance


Inference & Hypothesis Testing - Slide 17

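For example, the 95% CI for an OR is

$$\exp\left[\ln(\mathrm{OR}) \pm 1.96\,\sqrt{\operatorname{var}(\ln \mathrm{OR})}\right]$$

and

$$\operatorname{var}(\ln \mathrm{OR}) = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}$$

where a, b, c, and d are the four cell counts of the 2x2 table.

In general, if the 95% CI for the effect measure does
not include the null value then the association is
statistically significant at the P ≤ 0.05 threshold.
Similarly, if the 99% CI does not include the null value,
then the association is significant at the P ≤ 0.01
threshold.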
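
The interval formula above can be sketched in a few lines of Python. This is only an illustration: the function name and the 2x2 counts are hypothetical and do not come from the lecture.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) using var(ln OR) = 1/a + 1/b + 1/c + 1/d."""
    odds_ratio = (a * d) / (b * c)
    var_ln_or = 1 / a + 1 / b + 1 / c + 1 / d
    half_width = z * math.sqrt(var_ln_or)
    lower = math.exp(math.log(odds_ratio) - half_width)
    upper = math.exp(math.log(odds_ratio) + half_width)
    return odds_ratio, lower, upper

# Hypothetical counts (not from the lecture):
# a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls
or_, lo, hi = odds_ratio_ci(a=30, b=70, c=15, d=85)
print(f"OR = {or_:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
# The interval excludes the null value of 1 exactly when lo > 1 or hi < 1.
print("excludes null value:", lo > 1 or hi < 1)
```

Passing z=2.576 instead of 1.96 gives the corresponding 99% interval.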
Inference & Hypothesis Testing - Slide 18

The larger the sample size, the smaller the variance


of the various parameter estimates and of estimates
of effect measures
(increased precision with larger sample sizes)

- reflected in narrower confidence intervals

- stratification and related techniques


for statistical adjustment of confounders
lead to wider confidence intervals, because
of smaller numbers in study subgroups
Inference & Hypothesis Testing - Slide 19

Failure to reject the null hypothesis does not mean


the alternative hypothesis is false.

The sample size may simply be too small to detect a


significant deviation from values expected under
the null hypothesis.

Example: You believe a coin is loaded, i.e. not fair

Of 10 tosses, 7 are heads - could happen quite easily


with a fair coin (P=0.34, two-sided)
For 7/10, 95% CI is (0.35, 0.93)

Of 100 tosses, 70 are heads - very unlikely to happen


with a fair coin (P=0.00008, two-sided)
For 70/100, 99% CI is (0.57, 0.81)
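
The two P-values in the coin example can be checked with an exact binomial calculation. The sketch below assumes the two-sided P-value is obtained by doubling the upper tail, which is valid here because the null distribution of a fair coin is symmetric; the function name is illustrative.

```python
from math import comb

def two_sided_p_fair_coin(heads, tosses):
    """Exact two-sided P-value under Ho: P(heads) = 0.5."""
    # Take the larger of heads/tails so that we always sum the upper tail.
    k = max(heads, tosses - heads)
    upper_tail = sum(comb(tosses, i) for i in range(k, tosses + 1)) / 2 ** tosses
    # Doubling the tail is valid because the fair-coin distribution is symmetric.
    return min(1.0, 2 * upper_tail)

print(f"7/10 heads:   P = {two_sided_p_fair_coin(7, 10):.2f}")
print(f"70/100 heads: P = {two_sided_p_fair_coin(70, 100):.5f}")
```

The same 70% proportion of heads goes from unremarkable to very unlikely under Ho simply because the number of tosses increased.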
Inference & Hypothesis Testing - Slide 20

The probability of failing to reject the null hypothesis
when the alternative hypothesis is in fact true
is known as beta (β)

β depends on the number of observations,
the variance of the measurement in question, and
the magnitude of the true difference/association (Ha)
- as well as the significance threshold (α)

β decreases with: more observations
                  lower variance
                  larger magnitude of true underlying effect
                  higher α-level
Inference & Hypothesis Testing - Slide 21

α and β are analogous to false-positive and false-negative
rates with diagnostic tests:

                Ho true         Ha true (Ho false)
Accept Ho       Correct         Type 2 error
Reject Ho       Type 1 error    Correct

α = P (type 1 error) = P (reject Ho | Ho true)
β = P (type 2 error) = P (fail to reject Ho | Ha true)
Inference & Hypothesis Testing - Slide 22

The statistical power of a study is analogous to


the sensitivity of a diagnostic test

It is the ability to detect an association when it is in fact


present in the underlying population of interest

Hence the power of a study is equal to 1 - β,
just as true positive rate = 1 - false negative rate

Insufficient power leads to type 2 error--probably a


much more frequent phenomenon than type 1 error

Failure to detect an association does not prove


there is no association

Power issues should be addressed when studies are


reported
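
To make the link between sample size and power concrete, the sketch below returns to the loaded-coin setting and computes the exact power of a two-sided test of Ho: P(heads) = 0.5 at α = 0.05. The assumed true heads probability of 0.7 and the function names are illustrative choices, not values stated in the lecture.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def two_sided_p(k, n):
    """Exact two-sided P-value for k heads in n tosses under a fair coin."""
    extreme = max(k, n - k)
    tail = sum(binom_pmf(i, n, 0.5) for i in range(extreme, n + 1))
    return min(1.0, 2 * tail)

def power(n, true_p, alpha=0.05):
    """P(reject Ho) when the true heads probability is true_p; beta = 1 - power."""
    return sum(binom_pmf(k, n, true_p) for k in range(n + 1)
               if two_sided_p(k, n) <= alpha)

for n in (10, 100):
    pw = power(n, true_p=0.7)
    print(f"n = {n:3d}: power = {pw:.2f}, beta = {1 - pw:.2f}")
```

With 10 tosses a coin that truly lands heads 70% of the time is usually not detected (type 2 error); with 100 tosses it almost always is.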
Inference & Hypothesis Testing - Slide 23

In designing studies, usually a minimum power of 80%
is targeted; sometimes 90% (β = 0.20 or 0.10)

The estimated sample size requirement hinges on


the power to detect a specified magnitude of
effect/association, assuming a given variance in the
measurements of interest.

For example, the ability to detect a decrease in the


10-year cumulative incidence of first myocardial
infarction from 20% to 10% among subjects
randomized to an intensive exercise program vs. usual
care
Inference & Hypothesis Testing - Slide 24

During the design phase, this requires


a specific definition of Ha,
e.g. a decrease in risk from 20% to 10%
associated with intensive exercise
--not simply a reduction in risk

For 80% power to detect
a significant difference with α = 0.05, and
an underlying decrease in the risk of MI
from 20% to 10%,
219 subjects per study group are required

For 90% power, 286 per group are needed

Based on binomial distribution, two-tailed test
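
The slides do not state which formula produced these figures. As a sketch, one common approach (a normal approximation for two proportions with a continuity correction, an assumption here rather than the lecturer's stated method) gives the same numbers. The function name is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Subjects per group to detect risks p1 vs p2 with a two-tailed test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    diff = abs(p1 - p2)
    # Uncorrected sample size from the normal approximation.
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    # Continuity correction (assumed; it inflates n somewhat).
    n_cc = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return ceil(n_cc)

print(n_per_group_two_proportions(0.20, 0.10, power=0.80))
print(n_per_group_two_proportions(0.20, 0.10, power=0.90))
```

Changing p2 to a smaller reduction (e.g. 0.15) shows how quickly the requirement grows when Ha specifies a weaker effect.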


Inference & Hypothesis Testing - Slide 25

This may also require an estimate of the expected


variance in measurements--particularly if continuous

For example,
to detect a treatment effect of an antihypertensive drug:
If underlying mean diastolic BP is
80 mm Hg in treatment group and 90 in comparison group,
with standard deviation of 10 in both, then
16 subjects are needed per group, for 80% power
- If the standard deviation is 20 in both groups,
then 63 per group are needed for 80% power

Predictions about effect size and measurement variability


may be best guesses based on other research,
or may be obtained from pilot studies conducted in the
same or similar settings
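
For a continuous outcome, a standard normal-approximation formula for comparing two means shows how the variance drives the sample size. This is a sketch under assumptions (equal group sizes, equal standard deviations, two-tailed α = 0.05); the function name is illustrative. Under this formula, 80% power requires 16 subjects per group with a standard deviation of 10 and 63 per group with a standard deviation of 20, while 90% power with a standard deviation of 20 requires 85.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_two_means(mean_diff, sd, alpha=0.05, power=0.80):
    """Subjects per group to detect a difference in means (equal SD, equal n)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / mean_diff ** 2)

# Diastolic BP of 80 vs 90 mm Hg, i.e. a 10 mm Hg difference in means.
print(n_per_group_two_means(10, sd=10, power=0.80))
print(n_per_group_two_means(10, sd=20, power=0.80))
print(n_per_group_two_means(10, sd=20, power=0.90))
```

Doubling the standard deviation roughly quadruples the required sample size, because the variance enters the formula as the square of the standard deviation.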
Inference & Hypothesis Testing - Slide 26

If the magnitude of the true association is greater


than predicted, power will increase for a given
sample size.

If it is less than predicted, power will decrease for a


given sample size.

Hence it is best to base sample size requirements on


conservative estimates of effect/association.

Sample size calculations should also account for


refusals, incomplete data, dropouts, etc.
Inference & Hypothesis Testing - Slide 27

Statistical hypothesis testing involves a judgment


as to whether the observed findings are likely to
relate to chance, rather than to a true association

Hence an effect, association, or contrast that is


statistically significant may be considerable or it
may be trivial; statistical significance per se does
not imply causation or even clinical/scientific
importance

For a fixed contrast (e.g. 1% fewer myocardial
infarctions with a given treatment), statistical
significance increases with the number of
observations
Inference & Hypothesis Testing - Slide 28

Example:
Researcher 1 conducts a randomized clinical trial comparing
5-year mortality among smokers post-myocardial infarction
who receive vs do not receive an intensive smoking
cessation intervention.
The following results are obtained:

            Intervention    No Intervention
Dead             20               40
Alive           180              160

For death, RR = (20/200)/(40/200) = 0.5 (95% CI 0.30, 0.82)
RD = 0.1 - 0.2 = -0.1 (95% CI -0.17, -0.03)
P = 0.005
Inference & Hypothesis Testing - Slide 29

Researcher 2 conducts a randomized clinical trial


comparing 5-year mortality among post-myocardial
infarction patients who receive vs do not receive a novel
medication.
Recruitment is massive:

            Medication      No Medication
Dead          19,500           20,000
Alive         80,500           80,000

RR = (19,500/100,000)/(20,000/100,000)
= 0.975 (95% CI 0.958, 0.992)
RD = -0.005 (95% CI -0.0085, -0.0015)
P = 0.005
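
The results of both trials can be reproduced from the 2x2 tables. The sketch below assumes Wald-type 95% intervals (log scale for the risk ratio) and a two-sided z-test for two proportions with a pooled variance; the slides do not say which methods were used, but these reproduce the reported figures. The function name is illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def compare_risks(dead_trt, n_trt, dead_ctl, n_ctl, z=1.96):
    """Return RR, its CI, RD, its CI, and a two-sided P-value."""
    r1, r0 = dead_trt / n_trt, dead_ctl / n_ctl
    rr = r1 / r0
    se_ln_rr = sqrt(1 / dead_trt - 1 / n_trt + 1 / dead_ctl - 1 / n_ctl)
    rr_ci = (exp(log(rr) - z * se_ln_rr), exp(log(rr) + z * se_ln_rr))
    rd = r1 - r0
    se_rd = sqrt(r1 * (1 - r1) / n_trt + r0 * (1 - r0) / n_ctl)
    rd_ci = (rd - z * se_rd, rd + z * se_rd)
    # z-test for a difference in proportions, using the pooled risk under Ho.
    pooled = (dead_trt + dead_ctl) / (n_trt + n_ctl)
    z_stat = rd / sqrt(pooled * (1 - pooled) * (1 / n_trt + 1 / n_ctl))
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return rr, rr_ci, rd, rd_ci, p_value

study1 = compare_risks(20, 200, 40, 200)
study2 = compare_risks(19_500, 100_000, 20_000, 100_000)
for name, (rr, rr_ci, rd, rd_ci, p) in (("Study 1", study1), ("Study 2", study2)):
    print(f"{name}: RR = {rr:.3f} ({rr_ci[0]:.3f}, {rr_ci[1]:.3f}), "
          f"RD = {rd:.4f} ({rd_ci[0]:.4f}, {rd_ci[1]:.4f}), P = {p:.3f}")
```

The two studies yield essentially the same P-value even though the effect in the first is far larger, which is the point of the comparison.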
Inference & Hypothesis Testing - Slide 30

Which finding carries more clinical significance?

How many patients must be treated to save one


additional life?

For study 1, 1/RD = 1/0.1 = 10 (5.9, 33)

For study 2, 1/RD = 1/0.005 = 200 (117, 667)

The number needed to treat (NNT) is the reciprocal of


the estimated risk difference

Like the risk difference, it reflects the frequency of the


outcome in the target group, as well as the treatment
effect

Hence even if an intervention has a major protective


effect, the risk difference will be small (and the NNT
large) if the outcome of concern is uncommon
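
As a small worked sketch, the NNT and its interval follow directly from the risk difference and its confidence limits as reported for the two studies. The helper name is illustrative.

```python
def nnt(risk_difference):
    """Number needed to treat: reciprocal of the absolute risk difference."""
    return 1 / abs(risk_difference)

# Risk differences and 95% confidence limits as reported for the two studies.
for label, rd, rd_low, rd_high in (("Study 1", -0.1, -0.17, -0.03),
                                   ("Study 2", -0.005, -0.0085, -0.0015)):
    # The larger absolute RD gives the smaller NNT, so the limits swap order.
    print(f"{label}: NNT = {nnt(rd):.0f} ({nnt(rd_low):.1f}, {nnt(rd_high):.1f})")
```

Because the reciprocal is taken, an RD interval that approaches zero produces a very wide NNT interval.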
Inference & Hypothesis Testing - Slide 31

Magnitude of health effect is not the same as strength


of statistical association

A lower P value or larger test statistic does not make a


treatment more effective or a risk factor for disease
more important

For a given effect size, statistical association grows


stronger as the number of observations/subjects grows
and/or the variance among observations/subjects falls

Health effects are measured using


a) ratio measures, e.g. odds ratio, risk or rate ratio
b) difference measures e.g. risk or rate difference

Confidence intervals should accompany point estimates


of effect size (ratio or difference)
Inference & Hypothesis Testing - Slide 32

Ratio measures of effect emphasize the etiologic


importance of an exposure or intervention, i.e. the
scientific relationship between exposure and outcome

An exposure is often said to be a strong risk factor


for an outcome if the associated ratio measure
(odds ratio, risk or rate ratio) is high

Hypothetical example:
- rate of malignant mesothelioma in asbestos-exposed
persons: 100/100,000 person-years
- rate in unexposed: 0.1/100,000 person-years

IRR = 1000; ID = 99.9/100,000 person-years


Inference & Hypothesis Testing - Slide 33

The attributable fraction AMONG THE EXPOSED


is also an indication of etiologic importance,
since it reflects the magnitude of the ratio measure

e.g. (IRR-1)/IRR, which is 99.9% in this example


(99.9% of mesothelioma cases in asbestos-exposed
persons are the result of that exposure)

In contrast, the risk or rate difference emphasizes


the impact of an agent on the health of a group,
community, or population
Inference & Hypothesis Testing - Slide 34

An exposure can lead to a large risk/rate difference


if the outcome in question is common,
even if the risk/rate ratio is relatively low

Hypothetical example:
- rate of myocardial infarction among
60 year-old male smokers:
1,500/100,000 person-years
- rate among 60 year-old male nonsmokers:
1000/100,000 person-years

IRR = 1.5, but ID = 500/100,000 person-years
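
The contrast between the two hypothetical examples (mesothelioma and myocardial infarction) can be laid out side by side. The sketch below computes the rate ratio, the rate difference, and the attributable fraction among the exposed, (IRR-1)/IRR, from the rates given on the slides; the function name is illustrative.

```python
def ratio_and_difference(rate_exposed, rate_unexposed):
    """Return (IRR, rate difference, attributable fraction among the exposed)."""
    irr = rate_exposed / rate_unexposed
    rate_diff = rate_exposed - rate_unexposed
    af_exposed = (irr - 1) / irr
    return irr, rate_diff, af_exposed

# Rates per 100,000 person-years, taken from the two hypothetical examples.
examples = {
    "mesothelioma / asbestos": (100, 0.1),
    "myocardial infarction / smoking": (1500, 1000),
}
for name, (ie, io) in examples.items():
    irr, diff, af = ratio_and_difference(ie, io)
    print(f"{name}: IRR = {irr:.1f}, ID = {diff:.1f} per 100,000 PY, "
          f"AF among exposed = {af:.1%}")
```

The exposure with the far higher ratio measure produces the smaller absolute excess of cases, because the outcome itself is rare.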


Inference & Hypothesis Testing - Slide 35

Population attributable risk/rate:


the absolute reduction in risk or incidence
which would occur in the population
if the exposure of concern were removed

= It - Io, which is equivalent to Pexp x (Ie - Io)

since It = (Ie x Pexp) + [Io x (1 - Pexp)]

Probably the key parameter for public health


since it incorporates both the rate/risk difference
and the frequency of exposure
Inference & Hypothesis Testing - Slide 36

Population attributable risk/rate fraction:


the fraction of disease in the population
that would be eliminated if
the exposure of concern were removed

= (It - Io)/It, or (RR-1)/RR x P(exposure | disease)

A rare disease which is almost always seen


in association with a characteristic exposure
may have a very high PAR fraction
but a very low PAR in absolute terms
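
A short numerical check of the population attributable rate and fraction, using the smoking and myocardial infarction rates from the earlier hypothetical example. The 30% exposure prevalence is a made-up value for illustration only, and the function name is hypothetical.

```python
from math import isclose

def population_attributable(rate_exposed, rate_unexposed, p_exposed):
    """Return (It, PAR, PAR fraction) from Ie, Io and exposure prevalence Pexp."""
    rate_total = rate_exposed * p_exposed + rate_unexposed * (1 - p_exposed)
    par = rate_total - rate_unexposed
    par_fraction = par / rate_total
    return rate_total, par, par_fraction

# Rates per 100,000 person-years; the 30% prevalence is assumed, not from the lecture.
ie, io, p_exp = 1500, 1000, 0.30
it, par, paf = population_attributable(ie, io, p_exp)

# Check the two equivalent forms of the definitions.
rr = ie / io
p_exposed_given_disease = p_exp * ie / it
assert isclose(par, p_exp * (ie - io))
assert isclose(paf, (rr - 1) / rr * p_exposed_given_disease)

print(f"It = {it:.0f}, PAR = {par:.0f} per 100,000 person-years, "
      f"PAR fraction = {paf:.1%}")
```

Lowering p_exp shrinks both quantities, which is why the frequency of exposure matters for public health impact even when the rate difference is unchanged.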
Inference & Hypothesis Testing - Slide 37

Epidemiology has been defined as the study of


the distribution and determinants of disease
occurrence in human populations (MacMahon
and Pugh)
The methods of epidemiology may be used to
address questions in clinical medicine, public
health, occupational health, etc.
Inference & Hypothesis Testing - Slide 38

Clinical medicine involves management of individual


patients; management is usually based on experience
with other, similar patients (hopefully involving
published studies which were properly conducted and
reported)
Public health practice involves the prevention or
treatment of disease on a community or population
level.
Epidemiologic methods are in fact crucial to clinical and
public health research and decisions, but priorities for
clinical and public health intervention may not match
priorities in epidemiologic research.
Inference & Hypothesis Testing - Slide 39

Clinicians have a duty to individual patients

Public health practice is more explicitly utilitarian


(greatest good for the greatest number)

Some of the greatest challenges in clinical medicine


and public health involve situations where answers
to the classical etiologic or treatment questions
are well known

Such challenges may involve questions of behaviour,
uptake, adherence, accessibility, equitability, funding,
etc.
Inference & Hypothesis Testing - Slide 40

Examples:

- diagnosis of pulmonary embolism


- smoking cessation in persons at risk
for cardiovascular disease

- injury prevention
- tuberculosis treatment programs
- uptake of safer sexual practices
Inference & Hypothesis Testing - Slide 41

Regardless of the content area, it is best to focus on


one key hypothesis/comparison of interest when
designing, conducting, and reporting a study

This allows a clear idea of the target population


and sampling strategy, and of data to be collected
regarding exposure, outcome, and potential
confounders or effect modifiers

Permits unambiguous sample size estimates

Avoids the problem of multiple comparisons


Inference & Hypothesis Testing - Slide 42

Recall that α is the preset probability level at which
we reject the null hypothesis (usually no association
between exposure and outcome), based on the
probability of observing results as or more extreme
under the null hypothesis.

With the conventional α = 0.05 threshold,
we accept a 5% chance that we are erroneously
rejecting the null hypothesis, for a given
exposure-outcome association (a false-positive finding)

If we examine 2 potential exposure-outcome
associations, the chance of correctly detecting no
association when none in fact exists is (0.95)^2 =
0.9025
Inference & Hypothesis Testing - Slide 43

Hence there is a 9.75% chance of detecting at least


one spurious (false) association

The more potential associations we examine, the


greater the probability of detecting a significant
association even if the truth is that no association of
any sort exists

It is 1 - (1 - α)^x, where x is the number of
associations/comparisons examined (this assumes
that each is independent of the others)

If 10 exposures are examined in a case-control study,


there is a 40% chance that at least one will yield a P-
value of 0.05 or less, if none of the exposures is truly
related to the outcome of concern
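
The figures quoted for 2 and 10 comparisons follow from the formula for the chance of at least one false-positive finding among independent comparisons. A minimal sketch, with an illustrative function name:

```python
def familywise_error(alpha, comparisons):
    """P(at least one false positive) across independent comparisons, all Ho true."""
    return 1 - (1 - alpha) ** comparisons

for x in (1, 2, 10, 20):
    print(f"{x:2d} comparisons: {familywise_error(0.05, x):.1%}")
```

The independence assumption matters: correlated comparisons would generally give a smaller aggregate probability than this formula suggests.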
Inference & Hypothesis Testing - Slide 44

This is the phenomenon of multiple comparisons: the


more comparisons that are made, the greater the
chance of a statistically significant
difference/association in the absence of any true
effect.

This does not exclude the possibility that true


associations have indeed been identified--but it makes
them much harder to distinguish and to justify

There are statistical techniques for adjustment for


multiple comparisons

α may be decreased, such that the aggregate
false-positive probability remains low
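
The slides do not name a specific adjustment. One simple and widely used option is the Bonferroni correction, which divides α by the number of comparisons; the sketch below uses it purely as an illustration of how lowering the per-comparison threshold keeps the aggregate false-positive probability low. The function names are illustrative.

```python
def bonferroni_alpha(alpha, comparisons):
    """Per-comparison threshold under the Bonferroni correction."""
    return alpha / comparisons

def familywise_error(alpha, comparisons):
    """P(at least one false positive) across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons

per_test = bonferroni_alpha(0.05, 10)
aggregate = familywise_error(per_test, 10)
print(f"per-comparison alpha = {per_test:.3f}, aggregate false-positive "
      f"probability = {aggregate:.3f}")
```

With 10 comparisons each tested at 0.005, the aggregate probability stays just under 0.05, at the cost of reduced power for each individual comparison.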
Inference & Hypothesis Testing - Slide 45

Lowering the value of α confers its own problems

- this is usually done post hoc

- this renders both false and true associations


more difficult to detect, since the sample size is
already fixed
Inference & Hypothesis Testing - Slide 46

Researchers generally distinguish between primary


and secondary objectives or endpoints

The primary objective involves the key question


to be answered, and design of the study (including
sample size) hinges on this objective

Secondary objectives involve questions of interest


which the investigators would like to explore

It is understood that these questions may be


addressed with less precision and less power to
detect differences between groups

Detection of secondary associations or of differences


in secondary endpoints does not carry the same
weight as the primary association/endpoint of interest
(e.g. subgroup analyses)
Inference & Hypothesis Testing - Slide 47

A study is labelled as hypothesis testing when study


design, conduct, and analysis are based primarily on
one specific hypothesized association

The evaluation of secondary endpoints or multiple


associations should be considered to represent
hypothesis generation, i.e.
the identification of future primary research questions
Inference & Hypothesis Testing - Slide 48

Why do studies addressing the same question sometimes


yield conflicting results?

Internal validity issues (selection or information bias,


confounding)--especially if different study designs
are used

Differences in exposure or outcome definition

Association actually due to chance (type 1 error)

True association not detected (type 2 error)

Differences in setting (generalizability issues)


e.g. different population characteristics,
differential presence of effect modifiers

Was the same question truly addressed?


Inference & Hypothesis Testing - Slide 49

Example 1:
Many studies have evaluated the protective effect
of the BCG vaccine against tuberculosis;
estimates of efficacy range from 0-80%

Differences in who was vaccinated and when


--in some instances, vaccinated individuals were
already infected (conferring no protection)

Behr and colleagues demonstrated that the vaccine


likely lost protective elements over the years
Inference & Hypothesis Testing - Slide 50

Example 2:
Risk of drug-induced hepatitis among persons
taking isoniazid preventive therapy for tuberculosis

A US Public Health Service study in the early 1970s


found substantial risks which increased with age--
over 2% for persons over 50, and a number of deaths

A recent Seattle study found risks on the order of


0.1%, with no fatalities

The earlier study was distorted by a large number of


deaths in Baltimore, which experienced an unrelated
epidemic of hepatitis at that time

The more recent study involved stricter monitoring


techniques (and probably stricter patient selection)
Inference & Hypothesis Testing - Slide 51

Example 3:
Conflicting reports about the advisability of calcium-
channel blockers for the treatment of hypertension

Estacio, NEJM 1998:


A prospective, randomized, blinded clinical study
in a population of patients with non-insulin dependent
diabetes mellitus and hypertension demonstrated
that treatment with enalapril for a mean of five years
was associated with a lower incidence of myocardial
infarction than was treatment with nisoldipine [a
calcium-channel blocker] for the same period.
Inference & Hypothesis Testing - Slide 52

Tuomilehto, NEJM 1999:


Our trial demonstrated that [calcium-channel blocker]
-based antihypertensive treatment is particularly
beneficial in older diabetic patients with isolated
systolic hypertension. Thus our findings do not
support the hypothesis that the use of long-acting
calcium-channel blockers may be harmful in diabetic
patients.

Different target population

Different comparison:
in study 1,
it was calcium-channel blocker vs alternative drug;
in study 2
it was calcium-channel blocker vs placebo!
Inference & Hypothesis Testing - Slide 53

Example 4
Mechanical ventilation strategies for ARDS

Control groups were not managed consistently, which


altered the results of the intervention

Example 5
Post-menopausal hormone replacement therapy

- Findings from randomized controlled trial contradicted


earlier observational studies

- Presumably reflects selection and measurement


issues, other confounders
