SAS for Epidemiologists: Applications and Methods
Charles DiMaggio, PhD
Departments of Anesthesiology
and Epidemiology
College of Physicians and Surgeons
Mailman School of Public Health
Columbia University
New York, USA
Foreword
The path to this book began, as these things so often do, when I was asked to teach
a course: in this case, a semester-long class for master’s students on epidemiologic
analyses using SAS. Over a few years of preparing for and teaching the material, I
confronted a combination of practical and conceptual considerations that led me to
believe that perhaps there was room for another book about SAS.
On a practical level, working with a program like SAS is a skill I consider
necessary for all graduating master’s epidemiologists. To be honest, the necessity
that the program actually be SAS is based on a circular argument. Many employers
of epidemiologists use SAS because their current analysts use SAS, and newly
minted analysts will compel additional future analysts to use SAS. This reliance
on SAS of potential employers of master’s-level epidemiology students may change
in the future, but my sense is that it will not be anytime soon. While the practical
motivation to learn SAS is somewhat self-fulfilling, it does not detract from the
capabilities that made SAS an important skill in the first place. Nor does it make
the choice of SAS any less necessary. As I sit and write this, a quick search on the
New York Times jobs link returns 15 epidemiology jobs in the New York City area.
A search for SAS returns 457 hits. When I do this search on the first day of class,
with generally the same results, there is invariably an increased interest among the
students in spending a few hours a week learning SAS.
The kinds of SAS-related work that master’s-level epidemiologists are called
upon to undertake do not exceed some fairly straightforward categorical and
continuous data analyses. There was, though, no book that addressed this material
in a similarly straightforward fashion. The feedback I’ve received from past
students is that the procedures covered in this material account for a good majority
of their daily activity and that knowing how to do those things helps set the stage
for learning more advanced material.
On a conceptual level, the role of statistical software in epidemiologic practice is
in a state of flux, and the kinds of data and analyses epidemiologists are being called
upon to work with are evolving into what might be called the era of “big data” and
the rise of “computational epidemiology.” SAS is tailor-made to deal with the kinds
of huge data sets that are becoming routine in epidemiology. That there has been
Preface
I was fortunate enough to learn (and hopefully pass on to you) these concepts and
procedures from a small but uniformly excellent set of books and the scientists who
wrote them [1–12]. I try to cite areas where they were particularly illustrative, but I
borrowed from them shamelessly throughout.
I am especially indebted to the SAS Institute for allowing me to draw on their
training manuals, course notes, and data sets. If you want a thorough grounding in
SAS, I strongly recommend you take one of their outstanding training classes. I also
appreciate the willingness of the New York State Department of Health to allow
me to use a version of their Statewide Planning and Research Cooperative System
(SPARCS) data that figure so prominently in these pages.
Finally, many thanks to my colleagues in the departments of anesthesiology and
epidemiology at Columbia University in New York, for their support and feedback,
and to my stellar teaching assistants, Joanne Brady and George Loo, who know
SAS so much better than I do, and who helped breathe life into this material.
Contents
1 Introduction
1.1 About SAS
1.1.1 Alternatives to SAS
1.1.2 Why SAS, Then?
1.2 About This Book
1.2.1 Goals
1.2.2 How to Use the Book
6 Descriptive Statistics
6.1 PROC MEANS
6.2 PROC FREQ
6.3 PROC TABULATE
6.3.1 Using TABULATE for Surveillance Data
Problems
7 Histograms and Plots
7.1 Introduction
7.2 PROC GCHART for Histograms
7.3 PROC GPLOT to Plot Continuous Data
Problems
8 Categorical Data Analysis I
8.1 Introduction to Categorical Outcomes
8.2 Associations
8.3 Examining Frequency Tables
8.4 Reordering Categorical Variables
8.5 Tests of Statistical Significance for Categorical Variables
8.5.1 Chi-Square
8.5.2 Exact Tests
8.5.3 The Mantel–Haenszel Chi-Square
8.5.4 The Spearman Correlation Coefficient
8.6 Significance vs. Strength
Problems
9 Categorical Data Analysis II
9.1 Probabilities and Odds
9.2 The Odds Ratio
9.2.1 Why Epidemiologists Need the Odds Ratio
9.2.2 The Disease Odds Ratio
9.2.3 The Exposure Odds Ratio
9.3 Preterm Labor and Birth Weight Example 1
9.4 Confounding
9.4.1 Identifying and Controlling Confounding
9.5 Controlling for Confounding
9.5.1 Controlling Confounding in Study Design
9.5.2 Analytic Approaches to Confounding
9.6 Preterm Labor and Birth Weight Example 2
9.7 Adjusted Odds Ratios
9.7.1 Cochran–Mantel–Haenszel Statistic
9.7.2 The Mantel–Haenszel Odds Ratio
9.8 Summarizing Exploratory Contingency Table Analyses
Problems