Epidemiology 3rd Edition
Contents
PART THREE Threats to Validity and Issues of Interpretation
CHAPTER 4 Understanding Lack of Validity: Bias
4.1 Overview
4.2 Selection Bias
4.3 Information Bias
4.4 Combined Selection/Information Biases
References
Exercises
CHAPTER 5 Identifying Noncausal Associations: Confounding
5.1 Introduction
5.2 The Nature of the Association Between the Confounder, the Exposure, and the Outcome
5.3 Theoretical and Graphical Aids to Frame Confounding
5.4 Assessing the Presence of Confounding
5.5 Additional Issues Related to Confounding
5.6 Conclusion
References
Exercises
CHAPTER 6 Defining and Assessing Heterogeneity of Effects: Interaction
6.1 Introduction
6.2 Defining and Measuring Effect
6.3 Strategies to Evaluate Interaction
6.4 Assessment of Interaction in Case-Control Studies
6.5 More on the Interchangeability of the Definitions of Interaction
6.6 Which is the Relevant Model? Additive Versus Multiplicative Interaction
6.7 The Nature and Reciprocity of Interaction
6.8 Interaction, Confounding Effect, and Adjustment
6.9 Statistical Modeling and Statistical Tests for Interaction
6.10 Interpreting Interaction
6.11 Interaction and Search for New Risk Factors in Low-Risk Groups
6.12 Interaction and “Representativeness” of Associations
References
Exercises
PART FOUR Dealing with Threats to Validity
CHAPTER 7 Stratification and Adjustment: Multivariate Analysis in Epidemiology
7.1 Introduction
7.2 Stratification and Adjustment Techniques to Disentangle Confounding
7.3 Adjustment Methods Based on Stratification
7.4 Multiple Regression Techniques for Adjustment
7.5 Alternative Approaches for the Control of Confounding
7.6 Incomplete Adjustment: Residual Confounding
7.7 Over-Adjustment
7.8 Conclusion
References
Exercises
CHAPTER 8 Quality Assurance and Control
8.1 Introduction
8.2 Quality Assurance
8.3 Quality Control
8.4 Indices of Validity and Reliability
8.5 Regression to the Mean
8.6 Final Considerations
References
Exercises
PART FIVE Issues of Reporting and Application of Epidemiologic Results
CHAPTER 9 Communicating Results of Epidemiologic Studies
9.1 Introduction
9.2 What to Report
9.3 How to Report
9.4 Conclusion
References
Exercises
CHAPTER 10 Epidemiologic Issues in the Interface with Public Health Policy
10.1 Introduction
10.2 Causality: Application to Public Health and Health Policy
10.3 Decision Tree and Sensitivity Analysis
10.4 Meta-Analysis
10.5 Publication Bias
10.6 Summary
References
Exercises
APPENDIX A Standard Errors, Confidence Intervals, and Hypothesis Testing for Selected Measures of Risk and Measures of Association
APPENDIX B Test for Trend (Dose Response)
APPENDIX C Test of Homogeneity of Stratified Estimates (Test for Interaction)
APPENDIX D Quality Assurance and Quality Control Procedures Manual for Blood Pressure Measurement and Blood/Urine Collection in the ARIC Study
APPENDIX E Calculation of the Intraclass Correlation Coefficient
APPENDIX F Answers to Exercises
Index
Preface
This book was conceived as an intermediate epidemiology textbook. As in the
first and second editions, the third edition explores and discusses key epidemiologic
concepts and basic methods in more depth than that found in basic textbooks on
epidemiology. For the third edition, new examples and exercises have been added to all
chapters. In Chapters 7 and 10, respectively, we included discussions of novel
epidemiologic strategies for handling confounding (i.e., instrumental variables and
propensity scores) and of the decision tree as a decision-making tool.
As an intermediate methods text, this book is expected to have a heterogeneous
readership. Epidemiology students may wish to use it as a bridge between basic and
more advanced epidemiologic methods. Other readers may desire to advance their
knowledge beyond basic epidemiologic principles and methods but are not statistically
minded and are thus reluctant to tackle the many excellent textbooks that strongly focus
on epidemiology’s quantitative aspects. The demonstration of several epidemiologic
concepts and methods needs to rely on statistical formulations, and this text extensively
supports these formulations with real-life examples, hopefully making their logic
intuitively easier to follow. The practicing epidemiologist may find selected portions of
this book useful for an understanding of concepts beyond the basics. Thus, the common
denominators for the intended readers are familiarity with the basic strategies of
analytic epidemiology and a desire to increase their level of understanding of several
notions that are insufficiently covered (and naturally so) in many basic textbooks. The
way in which this textbook is organized makes this readily apparent.
In Chapter 1, the basic observational epidemiologic research strategies are reviewed,
including those based on studies of both groups and individuals. Although descriptive
epidemiology is not the focus of this book, birth cohort analysis is discussed in
some depth in this chapter because this approach is rarely covered in detail in basic
textbooks. Another topic in the interface between descriptive and analytical
epidemiology, namely ecological studies, is also discussed, with a view toward extending
its discussion beyond the possibility of inferential (ecological) bias. Next, the chapter
reviews observational studies based on individuals as units of observation—that is,
cohort and case-control studies. Different types of case-control design are reviewed.
The strategy of matching as an approach by which to achieve comparability prior to
data collection is also briefly discussed.
Chapters 2 and 3 cover issues of measurement of outcome frequency and measures
of association. In Chapter 2, absolute measures of outcome frequency and their
calculation methods are reviewed, including the person-time approach for the calculation
of incidence density, and both the classic life table and the Kaplan-Meier method for
the calculation of cumulative incidence. Chapter 3 deals with measures of association,
including those based on relative (e.g., relative risk, odds ratio) and absolute
(attributable risk) differences. The connections between measures of association obtained in
cohort and case-control studies are emphasized. In particular, a description is given of
the different measures of association (i.e., odds ratio, relative risk, rate ratio) that can
be obtained in case-control studies as a function of the control selection strategies that
were introduced in Chapter 1.
Chapters 4 and 5 are devoted to threats to the validity of epidemiologic studies,
namely bias and confounding. In Chapter 4, the most common types of bias are
discussed, including selection bias and information bias. In the discussion of
information bias, simple examples are given to improve the understanding of the
phenomenon of misclassification resulting from less-than-perfect sensitivity and
specificity of the approaches used for ascertaining exposure, outcome, and/or confounding
variables. This chapter also provides a discussion of cross-sectional biases and biases
associated with evaluation of screening procedures; for the latter, a simple approach
to estimate lead time bias is given, which may be useful for those involved in
evaluative studies of this sort. In Chapter 5, the concept of confounding is introduced,
and approaches to evaluate confounding are reviewed. Special issues related to
confounding are discussed, including the distinction between confounders and
intermediate variables, residual confounding, and the role of statistical significance in the
evaluation of confounding effects.
Interaction (effect modification) is discussed in Chapter 6. The chapter presents the
concept of interaction, emphasizing its pragmatic application as well as the strategies
used to evaluate the presence of additive and multiplicative interactions. Practical
issues discussed in this chapter include whether to adjust when interaction is suspected
and the importance of the additive model in public health.
The next three chapters are devoted to the approaches used to handle threats to
the validity of epidemiologic results. In Chapter 7, strategies for the adjustment of
confounding factors are presented, including the more parsimonious approaches (e.g.,
direct adjustment, Mantel-Haenszel) and the more complex (i.e., multiple regression,
instrumental variables, Mendelian randomization, and propensity scores). Emphasis is
placed on the selection of the method that is most appropriate for the study design used
(e.g., Cox proportional hazards for the analysis of survival data or Poisson regression
for the analysis of rates per person-time). Chapter 8 reviews the basic quality-control
strategies for the prevention and control of measurement error and bias. Both
qualitative and quantitative approaches used in quality control are discussed. The
most often used analytic strategies for estimating validity and reliability of data obtained in
epidemiologic studies are reviewed (e.g., unweighted and weighted kappa, correlation
coefficients) in this chapter. In Chapter 9, the key issue of communication of results
of epidemiologic studies is discussed. Examples of common mistakes made when
reporting epidemiologic data are given as a way to stress the importance of clarity in
such reports.
Chapter 10 discusses—from the epidemiologist’s viewpoint—issues relevant to the
interface between epidemiology, health policy, and public health, such as Rothman’s
causality model, proximal and distal causes, and Hill’s guidelines. This chapter also
includes brief discussions of three topics pertinent to causal inference: sensitivity
analysis, meta-analysis, and publication bias; and consideration of the decision tree as
a tool to evaluate interventions. As in the previous editions, Appendices A, B, C, and
E describe selected statistical procedures (e.g., standard errors and confidence levels,
trend test, test of heterogeneity of effects, intraclass correlation) to help the reader
to more thoroughly evaluate the measures of risk and association discussed in the
text and to expose him or her to procedures that, although relatively simple, are not
available in many statistical packages used by epidemiology students and practitioners.
Appendix D includes two sections on quality assurance and control procedures taken
from the corresponding manual of the Atherosclerosis Risk in Communities (ARIC)
Study as examples of real-life applications of some of the procedures discussed in
Chapter 8. Finally, Appendix F provides the answers to the exercises.
We encourage readers to advise us of any errors or unclear passages, and to suggest
improvements. Please email any such suggestions or comments to: [email protected].
All significant contributions will be acknowledged in the next edition.
Acknowledgments
This book is an outgrowth of an intermediate epidemiology course taught by the
authors at the Johns Hopkins Bloomberg School of Public Health. Over the years, this
course has benefited from significant intellectual input of many faculty members,
including, among others, George W. Comstock, Helen Abbey, James Tonascia, Leon
Gordis, and Mary Meyer. The authors especially acknowledge the late George W.
Comstock, a mentor to both of us, who was involved with the course for several
decades. His in-depth knowledge of epidemiologic methods and his wisdom over the
years have been instrumental to our professional growth. Dr. Comstock also kindly
provided many of the materials and examples used in Chapter 9 of this book.
We are indebted to many colleagues, including Leonelo Bautista, Daniel
Brotman, Woody Chambless, Steve Cole, Joseph Coresh, Rosa Maria Corona,
Ana Diez-Roux, Jingzhong Ding, Manning Feinleib, Leon Gordis, Eliseo Guallar,
Jay Kaufman, Kristen Malecki, Alfonso Mele, Paolo Pasquini, Paul Peppard,
Patrick Remington, Jonathan Samet, Eyal Shahar, Richey Sharrett, and Michael
Silverberg. These colleagues reviewed partial sections of this or previous editions
or provided guidance in solving conceptual or statistical riddles. We are especially
grateful to our current and former students Gabrielle Detjen, Salwa Massad,
Margarete (Grete) Wichmann, and Hannah Yang for their careful review of the
exercises. The authors are also grateful to Lauren Wisk for creating the ancillary
instructor materials for this text. Finally, we would like to extend our appreciation
to Patty Grubb, whose assistance has been instrumental in getting this edition
ready, and to Jennifer Seltzer for her administrative help.
To have enjoyed the privilege of teaching intermediate epidemiology for so many
years made us realize how much we have learned from our students, to whom we
are deeply grateful. Finally, without the support and extraordinary patience of all
members of our families, particularly our wives, Hilda and Marion, we could not
have devoted so much time and effort to writing the three editions of this text.
About the Authors
Moyses Szklo, MD, DrPH, is a Professor of Epidemiology and Medicine (Cardiology)
at the Johns Hopkins University. His current research focuses on risk factors for
subclinical and clinical atherosclerosis. He is also Editor-in-Chief of the American
Journal of Epidemiology.
F. Javier Nieto, MD, PhD, is Helfaer Professor of Public Health, and Professor and
Chair at the Department of Population Health Sciences at the University of Wisconsin
School of Medicine and Public Health. His current research focuses on epidemiology of
cardiovascular and sleep disorders as well as on population-based survey methods.
PART ONE Introduction
CHAPTER 1 Basic Study Designs in Analytical Epidemiology
1.1 INTRODUCTION: DESCRIPTIVE AND
ANALYTICAL EPIDEMIOLOGY
Epidemiology is traditionally defined as the study of the distribution and determinants
of health-related states or events in specified populations and the application of this
study to control health problems.1 Epidemiology can be classified as either “descriptive”
or “analytic.” In general terms, descriptive epidemiology makes use of available data to
examine how rates (e.g., mortality) vary according to demographic variables (e.g., those
obtained from census data). When the distribution of rates is not uniform according
to person, time, and place, the epidemiologist is able to define high-risk groups for
prevention purposes; for example, hypertension is more prevalent in US blacks than in US
whites, thus defining blacks as a high-risk group. In addition, disparities in the
distribution of rates serve to generate causal hypotheses based on the classic
agent–host–environment paradigm; for example, the hypothesis that environmental factors
to which blacks are exposed (e.g., excessive salt intake or psychosocial stress) are
responsible for their higher risk of hypertension.
A thorough review of descriptive epidemiologic approaches can be readily found in
numerous sources.2,3 For this reason and given the overall scope of this book, this chapter
focuses on study designs that are relevant to analytical epidemiology; that is, designs that
allow assessment of hypotheses of associations of suspected risk factor exposures with
health outcomes. Moreover, the main focus of this textbook is observational epidemiology,
even though many of the concepts discussed in subsequent chapters, such as measures
of risk, measures of association, interaction/effect modification and quality assurance/
control, are also relevant to experimental studies (randomized clinical trials).
In this chapter, the two general strategies used for the assessment of associations in
observational studies are discussed: (1) studies using populations or groups of individuals
as units of observation—the so-called ecologic studies; and (2) studies using individuals
as observation units, which include the prospective (or cohort), the case-control, and the
cross-sectional study designs.
Before that, however, the next section briefly discusses the analysis of birth cohorts.
The reason for including this descriptive technique here is that it often requires the
application of an analytical approach with a level of complexity usually not found in
descriptive epidemiology; furthermore, this type of analysis is frequently important for
understanding the patterns of association between age (a key determinant of health
status) and disease in cross-sectional analyses. (An additional, more pragmatic reason
for including a discussion of birth cohort analysis here is that it is usually not discussed
in detail in basic textbooks.)
1.2 ANALYSIS OF AGE, BIRTH COHORT, AND PERIOD EFFECTS
Health surveys conducted in population samples usually include participants over
a broad age range. Age is a strong risk factor for many health outcomes and is also
frequently associated with numerous exposures. Thus, even if the effect of age is not
among the primary objectives of the study, given its potential confounding effects, it is
often important to assess its relationship with exposures and outcomes.
Table 1-1 shows the results of a hypothetical cross-sectional study conducted in 2005
to assess the prevalence rates of a disease Y according to age. (A stricter use of the
term “rate” as a measure of the occurrence of incident events is defined in Section 2.2.2.
This term is also widely used in a less precise sense to refer to proportions such as
prevalence.1 It is in this more general sense that the term is used here and in other parts of
the book.)
In Figure 1-1, these results are plotted at the midpoints of 10-year age groups (e.g., for
ages 30–39, at 35 years; for ages 40–49, at 45 years; and so on). These data show that
the prevalence of Y in this population decreases with age. Does this mean that the
prevalence rates of Y decrease as individuals age? Not necessarily. For many disease processes,
Table 1-1 Hypothetical data from a cross-sectional study of prevalence of disease Y in a
population, by age, 2005.

Age group (years)   Midpoint (years)   2005 prevalence (per 1000)
30–39               35                 45
40–49               45                 40
50–59               55                 36
60–69               65                 31
70–79               75                 27
Figure 1-1 Hypothetical data from a cross-sectional study of prevalence of disease Y in a
population, by age, 2005 (based on data from Table 1-1).
[Figure: prevalence (per 1000) plotted against age (years).]
exposures have cumulative effects that are expressed over long periods of time. Long
latency periods and cumulative effects characterize, for example, numerous exposure/
disease associations, including smoking-lung cancer, radiation-thyroid cancer, and
saturated fat intake-atherosclerotic disease. Thus, the health status of a person who is
50 years old at the time of the survey may be partially dependent on this person’s past
exposures (e.g., smoking during early adulthood). Variability of past exposures across
successive generations (birth cohorts*) can distort the apparent associations between
age and health outcomes that are observed at any given point in time. This concept can
be illustrated as follows.
Suppose that the same investigator who collected the data shown in Table 1-1 is able
to recover data from previous surveys conducted in the same population in 1975, 1985,
and 1995. The resulting data, presented in Table 1-2 and Figure 1-2, show consistent
trends of decreasing prevalence of Y with age in each of these surveys. Consider now
plotting these data using a different approach, as shown in Figure 1-3. The dots in
Figure 1-3 are at the same places as in Figure 1-2, except that they are connected by
birth cohort rather than by survey date (the 2005 survey data are also plotted in
Figure 1-3). Each of the dotted lines
represents a birth cohort converging to the 2005 survey. For example, the “youngest”
age point in the 2005 cross-sectional curve represents the rate of disease Y for individuals
aged 30 to 39 years (average of 35 years) who were born between 1965 and 1974—
that is, in 1970 on average (the “1970 birth cohort”). Individuals in this 1970 birth
cohort were on average 10 years younger—that is, 25 years of age at the time of the
1995 survey and 15 years of age at the time of the 1985 survey. The line for the 1970
birth cohort thus represents how the prevalence of Y changes with increasing age for
individuals born, on average, in 1970. Evidently, the cohort pattern shown in Figure 1-3
is very different from that suggested by the cross-sectional data and is consistent for all
birth cohorts shown in Figure 1-3 in that it suggests that the prevalence of Y actually
Table 1-2 Hypothetical data from a series of cross-sectional studies of prevalence of disease Y in
a population, by age and survey date (calendar time), 1975–2005.

                                       Prevalence (per 1000), by survey date
Age group (years)   Midpoint (years)   1975   1985   1995   2005
10–19               15                 17     28
20–29               25                 14     23     35
30–39               35                 12     19     30     45
40–49               45                 10     18     26     40
50–59               55                        15     22     36
60–69               65                               20     31
70–79               75                                      27
*Birth cohort: From Latin cohors, warriors, the 10th part of a legion. The component of the population born
during a particular period and identified by period of birth so that its characteristics (e.g., causes of death and
numbers still living) can be ascertained as it enters successive time and age periods.1
Figure 1-2 Hypothetical data from a series of cross-sectional studies of prevalence of disease
Y (per 1000) in a population, by age, and survey date (calendar time), 1975, 1985, 1995, and 2005
(based on data from Table 1-2).
[Figure: prevalence (per 1000) plotted against age (years), with one line for each survey
date (1975, 1985, 1995, and 2005).]
Figure 1-3 Plotting of the data in Figure 1-2 by birth cohort (see also Table 1-3). The dotted
lines represent the different birth cohorts (from 1930 to 1970) as they converge to the 2005
cross-sectional survey (solid line, as in Figure 1-1).
[Figure: prevalence (per 1000) plotted against age (years), with dotted lines labeled 1930,
1940, 1950, 1960, and 1970 and a solid line labeled "2005 cross-sectional survey".]
increases as people age. The fact that the inverse trend is observed in the cross-sectional
data is due to a strong “cohort effect” in this example; that is, the prevalence of Y is
strongly determined by the year of birth of the person. For any given age, the prevalence
rate is higher in younger (more recent) than in older cohorts. Thus, in the 2005 cross-
sectional survey (Figure 1-1), the older subjects come from birth cohorts with relatively
lower rates, whereas the youngest come from the cohorts with higher rates. This can be
seen clearly in Figure 1-3 by selecting one age (e.g., 45 years) and observing that the rate
is lowest for the 1930 birth cohort, and increases for each subsequent birth cohort (i.e.,
the 1940, 1950, and 1960 cohorts, respectively).
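
The rearrangement of the survey data by birth cohort can be sketched in a few lines of Python. The sketch below regroups the Table 1-2 values by average birth year, taken as the survey year minus the age-group midpoint, and then checks the cross-sectional and cohort patterns described above. The dictionary layout and the function names (by_birth_cohort, values_by_age, is_increasing) are assumptions made for this illustration and are not part of the original text. The placement of each value under a survey year is also an assumption: it is the layout in which every survey column decreases with age and the 2005 column matches Table 1-1.

```python
# Prevalence of disease Y (per 1000) from Table 1-2.
# Layout (an assumption of this sketch): survey year -> {age midpoint: prevalence}.
PREVALENCE = {
    1975: {15: 17, 25: 14, 35: 12, 45: 10},
    1985: {15: 28, 25: 23, 35: 19, 45: 18, 55: 15},
    1995: {25: 35, 35: 30, 45: 26, 55: 22, 65: 20},
    2005: {35: 45, 45: 40, 55: 36, 65: 31, 75: 27},
}


def by_birth_cohort(prevalence):
    """Regroup survey data by average birth year (survey year - age midpoint)."""
    cohorts = {}
    for survey_year, by_age in prevalence.items():
        for age, rate in by_age.items():
            birth_year = survey_year - age
            cohorts.setdefault(birth_year, {})[age] = rate
    # Order cohorts by birth year and each cohort's points by age.
    return {year: dict(sorted(cohorts[year].items())) for year in sorted(cohorts)}


def values_by_age(by_age):
    """Prevalence values ordered from the youngest to the oldest age group."""
    return [by_age[age] for age in sorted(by_age)]


def is_increasing(values):
    """True if each value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


cohorts = by_birth_cohort(PREVALENCE)

# Cross-sectional view: within each survey, prevalence falls as age rises.
falls_within_surveys = all(
    is_increasing(values_by_age(by_age)[::-1]) for by_age in PREVALENCE.values()
)

# Cohort view: within each birth cohort, prevalence rises as age rises.
rises_within_cohorts = all(
    is_increasing(values_by_age(by_age)) for by_age in cohorts.values()
)

# Cohort effect at a fixed age: prevalence at age 45, by birth cohort.
at_age_45 = {year: by_age[45] for year, by_age in cohorts.items() if 45 in by_age}

for year, by_age in cohorts.items():
    print(year, by_age)
print(falls_within_surveys, rises_within_cohorts, at_age_45)
```

Under these assumptions, every survey shows prevalence falling with age, every birth cohort shows prevalence rising with age, and at age 45 the prevalence rises from 10 per 1000 in the 1930 cohort to 40 per 1000 in the 1960 cohort, which is the cohort effect described in the text.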