Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating
Second Edition
Ewout W. Steyerberg
Department of Biomedical Data Sciences
Leiden University Medical Center
Leiden, The Netherlands
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
For Aleida, Matthijs, Laurens and Suzanne
For my father Wim
Preface
The first edition of this book was written during the years 2005–2007. Since then,
many new developments have taken place, both in the general scientific
direction that prediction research is taking and in specific technical innovations. These
developments have been addressed as far as possible in the second edition. Many
new references have been added. Some detailed material has been moved from print
to the web. Many figures have been redrawn in color for better clarity and
attractiveness. In all, changes have been made to nearly every chapter.
Prediction models are important in widely diverse fields, including medicine,
physics, engineering, meteorology, and finance. Prediction models are becoming
more relevant in the medical field with the increase in biological knowledge on
potential predictors of outcome, e.g., from “omics” (including genomics,
transcriptomics, proteomics, glycomics, metabolomics). Also, the Big Data era implies
we will have increasing access to large volumes of routinely collected data. The
number of applications for prediction models will increase, e.g., with targeted early
detection of disease, and individualized approaches to diagnostic testing and
treatment.
We are moving to an era of personalized evidence-based medicine that asks for
an individualized approach to shared medical decision-making. Evidence-based
medicine has a central place for meta-analysis to summarize results from
randomized controlled trials; prediction models summarize the effects of predictors to
provide individualized predictions of the absolute risk of a diagnostic or prognostic
outcome. Prediction models and related algorithms will increasingly form the basis
for personalized evidence-based medicine and individualized decision-making.
My motivation for working on the first and second editions of this book stems
primarily from the fact that the development and applications of prediction models
are often suboptimal in medical publications. With this book, I hope to contribute to
a better understanding of relevant issues and to give practical advice on better
modeling strategies than those commonly used today.
Issues include the following:
(a) Better predictive modeling is sometimes readily possible, e.g., when a large
data set with high-quality data is available but all continuous predictors are
dichotomized, which is known to have several disadvantages.
(b) Small samples are used:
– Studies are underpowered, implying unreliable answers to difficult questions
such as “Which are the most important predictors in this prediction
problem?”
– The problem of small sample size is aggravated by doing a complete case
analysis, which discards information from nearly complete records.
Statistical imputation methods are nowadays available to exploit all
available information, especially “multiple imputation.”
– Predictors are omitted that should reasonably have been included based on
subject matter knowledge. Analysts rely too much on the limited data that
they have available in their data set, instead of wisely combining information
from several sources, such as medical literature and experts in the field.
– Stepwise selection methods are abundant when researchers apply regression
modeling, while these methods are suboptimal, especially in small data sets.
– Modeling approaches are used that require larger sample sizes. Data-hungry
techniques, such as neural network modeling, machine learning or artificial
intelligence techniques, should not be used in small data sets.
– No attempts are made towards validation, or validation is done inefficiently.
For example, a split-sample approach is followed, leading to a smaller
sample for model development and a smaller sample for model validation.
Better methods are nowadays available and should be used far more often,
specifically bootstrap resampling.
(c) Claims are exaggerated:
– Often, we see statements such as “the independent predictors were
identified”; in many instances, such findings are purely exploratory and may not
be reproducible; they may largely represent noise.
– Models are not internally valid, with overoptimistic expectations of model
performance in new patients.
– One modern machine learning method with a fancy name is claimed to be
superior to a more traditional regression approach, while no
convincing evidence is presented, and a suboptimal modeling strategy was
followed for the regression model. Fair comparisons between well-used
statistical methods and machine learning methods are required.
– Researchers are insufficiently aware of overfitting, implying that their
apparent findings are merely coincidental.
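The concerns about overfitting and overoptimistic apparent performance listed above can be illustrated with a minimal sketch of bootstrap optimism correction. It is written in Python with only the standard library, purely for illustration. The toy modeling strategy (select the single best of ten pure-noise predictors), the simulated data, and all function names are assumptions introduced here; this is not the author's implementation.

```python
import random

def c_statistic(scores, outcomes):
    """Concordance (AUC): share of event/non-event pairs ranked correctly; ties count 0.5."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    nonevents = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not nonevents:
        return 0.5  # undefined without both classes; treated as uninformative
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))

def fit(X, y):
    """Hypothetical toy strategy: pick the predictor (and sign) with the best apparent c."""
    best_j, best_sign, best_c = 0, 1, 0.0
    for j in range(len(X[0])):
        c = c_statistic([row[j] for row in X], y)
        # Reversing the sign of a score turns c into 1 - c.
        for sign, value in ((1, c), (-1, 1.0 - c)):
            if value > best_c:
                best_j, best_sign, best_c = j, sign, value
    return best_j, best_sign

def performance(model, X, y):
    """c-statistic of a selected (predictor, sign) model in the data set X, y."""
    j, sign = model
    return c_statistic([sign * row[j] for row in X], y)

def bootstrap_validate(X, y, n_boot=200, rng=None):
    """Return (apparent, optimism, optimism-corrected) c-statistic."""
    rng = rng or random.Random(0)
    n = len(y)
    apparent = performance(fit(X, y), X, y)
    optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        model_b = fit(Xb, yb)  # repeat the whole strategy, including selection
        # Optimism: performance in the bootstrap sample minus performance
        # of the same model in the original sample.
        optimism += performance(model_b, Xb, yb) - performance(model_b, X, y)
    optimism /= n_boot
    return apparent, optimism, apparent - optimism

# Simulated small data set: 60 subjects, 10 predictors unrelated to the outcome.
rng = random.Random(42)
n, p = 60, 10
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [1 if rng.random() < 0.5 else 0 for _ in range(n)]

apparent, optimism, corrected = bootstrap_validate(X, y, rng=rng)
print(f"apparent c = {apparent:.2f}, optimism = {optimism:.2f}, corrected c = {corrected:.2f}")
```

Because the predictors are noise by construction, any apparent c-statistic above 0.5 reflects selection in a small sample rather than real predictive ability. The bootstrap estimate of optimism repeats the selection step in every resample and uses all subjects for both development and validation, in contrast to a split-sample approach.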
Intended Audience
Other Sources
Many excellent textbooks exist on regression analysis techniques, but these usually
do not have a focus on modeling strategies for prediction. The main exception is
Frank Harrell’s book “Regression Modeling Strategies”. He brings advanced
biostatistical concepts to practical application, supported by the rms package for R.
Harrell’s book may, however, be too advanced for clinical and epidemiological
researchers. This also holds for the quite thorough textbook by Hastie, Tibshirani,
and Friedman, “The Elements of Statistical Learning”. These books are very useful for a
more in-depth discussion of statistical techniques and strategies. Harrell’s book
provided the main inspiration for the work presented here. Another good
companion book is “Regression Methods in Biostatistics” by Vittinghoff et al.
Various sources on the Internet explain terms used in this book.
Frank Harrell maintains a useful glossary: http://hbiostat.org/doc/glossary.pdf.
Structure
It has been found that people learn by example, by checklists, and by their own
discovery. Therefore, many examples are provided throughout the text, including the
essential computer code and output. I also suggest a checklist for prediction
modeling (Part II). Own discovery is possible with exercises per chapter, with data
sets and scripts provided at the book’s website: www.clinicalpredictionmodels.org.
Many statistical techniques and approaches are readily possible with any modern
software package. Personally, I have worked with SPSS for simple, straightforward
analyses. This package is insufficient for the more advanced analyses that are essential
in prediction modeling. The SAS computer package is more advanced, but may not
be so practical for some. A package such as Stata is very suitable. It is similar in
capabilities to R software for the key elements of prediction modeling. The R
software has several advantages: it is free, and innovations in biostatistical
methods become readily available. Therefore, R is the natural choice as the software
accompanying this book. R software is available at www.cran.r-project.org, with help
files and a tutorial.
An important disadvantage of R is a relatively slow learning curve; it takes time
and effort to learn R. Some R commands are provided in this book; full programs
Many have made small to large contributions to this book and the revision. I’m very
grateful to all. Frank Harrell has been a source of inspiration for my research in the
area of clinical prediction models, together with Hans van Houwelingen, who has
developed many of the theoretical innovations that are presented in this book. I’m
grateful to be his successor as chair of the Department of Biomedical Data
Sciences at the Leiden University Medical Center. At the Department of Public
Health, Erasmus MC, Rotterdam, Dik Habbema and René Eijkemans have
sharpened my thinking on prediction modeling. Hester Lingsma was very
supportive in the last phase of finishing the first edition of this book and has been a
wonderful colleague over many years. Lex Burdorf, Daan Nieboer, and Jan
Verbeek (Erasmus MC) provided specific comments for the second edition.
My insights into meta-analysis have benefitted from a project with Carl Moons and
Thomas Debray (Utrecht). I have enjoyed the vigorous discussions about the
evaluation of model performance with Ben van Calster, Michael Pencina, Stuart
Baker, and Andrew Vickers, which are reflected in further textual changes in the
second edition. Several Ph.D. students, colleagues, and external reviewers provided
input and made specific comments on various chapters.
I specifically would like to thank investigators who allowed their data sets to be
made available for didactic purposes, including Kerry Lee (Duke University) for the
GUSTO-I data, Andrew Maas (Antwerp University) for the IMPACT data, Yolanda
van der Graaf (Utrecht University) for the SMART data, and all other investigators
and patients who were involved in the studies used in this book. Finally, I thank my
family for their love and support over the years, and for allowing me to devote
private time to this book.
Contents
1 Introduction
1.1 Diagnosis, Prognosis, and Therapy Choice in Medicine
1.1.1 Predictions for Personalized Evidence-Based Medicine
1.2 Statistical Modeling for Prediction
1.2.1 Model Assumptions
1.2.2 Reliability of Predictions: Aleatory and Epistemic Uncertainty
1.2.3 Sample Size
1.3 Structure of the Book
1.3.1 Part I: Prediction Models in Medicine
1.3.2 Part II: Developing Internally Valid Prediction Models
1.3.3 Part III: Generalizability of Prediction Models
1.3.4 Part IV: Applications