The Cox Model and Its Applications
Mikhail Nikulin, Université Bordeaux Segalen, Bordeaux, France
Hong-Dar Isaac Wu, National Chung-Hsing University, Taichung, Taiwan
Preface
Since Sir David Cox’s pioneering work in 1972, the proportional hazards
(PH) model has become the most important model in survival analysis and in
related applications. The success of the Cox model stimulated further studies in
semiparametric and nonparametric theory, counting process models, study designs
in epidemiology, and the development of many other regression models which
could be more flexible or reasonable in data analysis. Flexible semiparametric
regression models are used increasingly often in carcinogenesis studies to relate
lifetime distributions to time-dependent explanatory variables. In addition to clas-
sical regression models such as the Cox PH model and the accelerated failure time
(AFT) model, alternative models like the linear transformation model, the frailty
model, and some varying-effect models are also considered by researchers
(Martinussen and Scheike 2006; Scheike 2006; Dabrowska 2005, 2006;
Bagdonavičius 1978; Zeng and Lin 2007). In this monograph, we discuss some
important parametric models as well as several semiparametric regression models.
Several classical examples are reconsidered and analyzed here, including the
well-known datasets concerning effects of chemotherapy and chemo- plus radio-
therapy on the survival of gastric and lung cancer patients (Stablein and
Koutrouvelis 1985; Piantadosi 1997; Kalbfleisch and Prentice 2002; Klein and
Moeschberger 2003). Following the lines of Scheike (2006), Zeng and Lin (2007),
Wu (2007), and Huber et al. (2006), we also give examples to illustrate and compare
possible applications of the Cox model (1972), the Hsieh model (2001), and the
simple cross-effect (SCE) model of Bagdonavičius and Nikulin (2002, 2005, 2006).
All three are particularly useful for analyzing survival data with one crossing
point. This monograph offers a short course or
one-semester material for undergraduate or graduate students, for biostatisticians,
and for scientific researchers who demand applications of survival analysis and
reliability theory in areas such as gerontology, demography, insurance, clinical
trials, medicine, epidemiology, and social sciences.
We are deeply grateful for the support, help, discussions, and nice papers of our
friends and colleagues, C. Huber, Z. Ying, F. Hsieh, V. Bagdonavicius, V. Solev,
W. Meeker, N. Limnios, M.L.T. Lee, S. Gross, N. Singpurwalla, F. Vonta,
W. Kahle, H. Lauter, U. Jensen, A. Lehmann, D. Dabrowska, M. Mesbah,
N. Balakrishnan, W. Nelson, V. Couallier, and L. Gerville-Reache. They introduced
us to different branches of survival analysis and the theory of reliability. Our
interest in these research fields was boosted by the appearance of the books of
Cox and Oakes (1984), Andersen et al. (1993), Lawless (2003), Meeker and
Escobar (1998), Martinussen and Scheike (2006), and Hougaard (2000). Finally, we
would like to thank all our friends and our families.
References
Andersen, P. K., Borgan, O., Gill, R., & Keiding, N. (1993). Statistical models based on counting
processes. New York: Springer.
Cox, D. R., & Oakes, D. (1984). Analysis of survival data. London: Chapman and Hall.
Hougaard, P. (2000). Analysis of multivariate survival data. New York: Springer.
Lawless, J. F. (2003). Statistical models and methods for lifetime data. New York: Wiley.
Martinussen, T., & Scheike, T. (2006). Dynamic regression models for survival data.
New York: Springer.
Meeker, W. Q., & Escobar, L. (1998). Statistical methods for reliability data. New York: Wiley.
Chapter 1
Introduction: Several Classical Data Examples for Survival Analysis
The proportional hazards (PH) model was proposed by Sir David Cox just over 40
years ago (Cox 1972). Today, the Cox model is the most important model in sur-
vival analysis, reliability and quality of life research, epidemiology, clinical trials,
and biomedical studies. There have also been numerous applications of the Cox
model in demography, econometrics, finance, pharmacology, biology, gerontology,
insurance, etc. These mark the great success of the Cox PH model, which has
further stimulated the study of competing survival regression models and the
corresponding development of semiparametric estimation theory, the likelihood
principle, and counting process modeling and its applications.
The developments in reliability and survival analysis have provided the basis and
useful methods to obtain general theory. A patient’s survival depends on his/her age,
sex, fatigue, genetic or physiological damages, the dynamics of body temperature,
body weight (or BMI), some physiological or biochemical indices, and also on the
presence of chronic disease (like cancer, diabetes mellitus, renal disease, cardiac
disease, metabolic syndrome, etc.). In general, these characteristics are coded as the
so-called covariates or explanatory variables; some of them are called degradation
processes. We suppose that the lifespan of an individual is described by covariates.
In this case, the survival (or failure) of a patient is characterized by this covariate
process and by the random moment of the patient’s potential failure. The Cox model is an
example that relates the lifetime distribution to a set of covariates by modeling
the hazard rate.
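As a compact reminder of what modeling the hazard rate means, the Cox PH model with fixed covariates is usually written as below. The notation is an assumption made here for illustration and may differ from that of later chapters: \lambda_0(t) is an unspecified baseline hazard, z is a vector of covariates, and \beta is the regression parameter.

```latex
\lambda(t \mid z) = \lambda_0(t)\,\exp(\beta^{\top} z),
\qquad
\frac{\lambda(t \mid z_1)}{\lambda(t \mid z_2)} = \exp\{\beta^{\top}(z_1 - z_2)\}.
```

The ratio on the right does not depend on t, which is the proportional hazards property, and its logarithm is linear in \beta.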
The popularity and success of the Cox model are based on the fact that there
exist simple semiparametric estimation procedures and that the regression parameter
in the PH model is easily interpreted as a (log-) hazard ratio. The hazard ratios
under different fixed covariates are usually assumed to be constant in time. In
practice, the hazard rates may approach, go away from, or even intersect each other.
In these circumstances, using the conventional Cox PH model to estimate the hazard
ratio leads to biased inference. Nonproportionality may arise in several ways.
First, some authors have considered the heterogeneity effect coming from individuals
with unobserved frailty, so that extra variation may be present (Hougaard 1984, 1986;
Aalen 1988). Second, nonproportionality is partly the result of a time-varying
effect, which could possibly be modeled by the varying coefficient Cox model
(Martinussen and Scheike 2006). Third, the interaction between time and a qualitative
covariate gives nonproportionality (O’Quigley 1991). Finally, some observable
covariates contribute both to the mean and to the variance of the lifetime variable
or its transformation (Bagdonavičius and Nikulin 1999; Hsieh 2001; Zeng and Lin 2007),
and thus produce “nonproportional hazards.” In the last case, stratification by some
variables can eliminate part of the nonproportionality. However, stratification is
not reasonable if a variable is of continuous type and, in particular, when the
sample size is not large. Nevertheless, the Cox model helps to construct dynamic
models well adapted to the study of survival functions with cross-effect. The PH
model is generalized by assuming that at any moment, the hazard ratio depends on
time-varying covariates. Relations with generalized proportional hazards, frailty,
linear transformation, Sedyakin, degradation, and cross-effect models have been
considered. Using some new flexible regression models, in this monograph we analyze
survival data of the Gastrointestinal Tumor Study Group (Stablein and Koutrouvelis
1985), the Veterans Administration lung cancer trials, the data of Piantadosi (1997)
on lung cancer patients, the Stanford Heart Transplant data, and a dataset concerning
the length of hospital stay of rehabilitating stroke patients.
© The Author(s) 2016
M. Nikulin and H.-D.I. Wu, The Cox Model and Its Applications,
SpringerBriefs in Statistics, DOI 10.1007/978-3-662-49332-8_1
These data examples illustrate the characteristics of survival data which may
be collected from clinical operation (the Stanford Heart Transplant data), a hospital
registration system (length of hospital stay for stroke patients), and clinical trials
(gastric cancer data and lung cancer data). For these data, survival estimates
obtained by the Kaplan–Meier method (see Chap. 2) are presented for different
covariate configurations to illustrate proportional hazards (see Chap. 3) or
nonproportional hazards (see Chaps. 5 and 6).
1.1 Example 1: The Stanford Heart Transplant (SHT) Data
The SHT data reported in Miller and Halpern (1982) contain 184 patients with the
following variables: survival time, dead/alive status, age, and T5 mismatch score.
Cox and Oakes (1984, Chap. 8) tabulated another version of the SHT data, which
comprises 249 patients with transplant indicators and waiting times. Here, we consider
the data presented in Miller and Halpern (1982). A complete dataset with 154
observations is used. We display the Kaplan–Meier (KM) survival estimates for different
age and mismatch score groups. The derivation of the KM estimate and its properties
are discussed in Chap. 2.
Of the 154 observed times, 102 are failures and 52 are “right-censored” (explained in
Chap. 2). The three quartiles of age are 35.0, 44.5, and 49.0. The younger two
groups (age ≤ 35.0 and 35 < age < 45) show no statistically significant difference in
lifetimes by the log-rank test (see Chap. 3), so these two groups are combined. We thus
divide the patients into three groups: “age < 45,” “45 ≤ age ≤ 49,” and “age ≥ 50.” The
survival estimates are shown in Fig. 1.1(a). The mismatch score measures the tis-
sue incompatibility between recipient and donor; it can be viewed as a continuous
random variable. The log-rank test reveals no significant difference in the lifetime
distributions among the four groups formed by the quartiles 0.69, 1.05, and 1.49. We
simply use the median (T5 = 1.05) as the cut-off point and plot the KM estimates
for the two groups (Fig. 1.1b).
These two figures show that survival differs significantly among the age groups, but
not between the (dichotomized) mismatch score groups. The group “age ≥ 50” has a sudden drop in
survival at the early stage (time < 100 days). The other two younger groups have
crossings at an early stage and at a time very close to 2000 days. It appears that the
“difference between groups” varies with time. With a proportional hazards regression
setting (Chap. 3), the effect of age cannot be modeled by a simple univariate variable
age. As indicated by this example, seeking an alternative model is important.
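The group comparisons above rely on the log-rank test. The following is a minimal Python sketch of the standard two-sample log-rank statistic, given only to make the computation concrete. The function names and the two small samples are hypothetical (the SHT data are not reproduced here), and the p-value uses the usual chi-square approximation with one degree of freedom.

```python
import math


def logrank_statistic(times_a, events_a, times_b, events_b):
    """Two-sample log-rank chi-square statistic (1 degree of freedom).

    times_*  : observed times
    events_* : 1 for an observed failure, 0 for a right-censored time
    """
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for s, _, _ in pooled if s >= t)                # at risk, both groups
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)   # at risk, group A
        d = sum(1 for s, e, _ in pooled if s == t and e == 1)     # failures at t
        d_a = sum(1 for s, e, g in pooled
                  if s == t and e == 1 and g == 0)                # failures at t, group A
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance is undefined when one subject is at risk
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0
    return observed_minus_expected ** 2 / variance


def chi2_1df_pvalue(stat):
    """Upper tail probability of a chi-square(1) variable."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical toy samples, for illustration only (not the SHT data).
young_times, young_events = [6, 13, 21, 30, 37], [1, 1, 0, 1, 1]
old_times, old_events = [2, 5, 9, 14, 18], [1, 1, 1, 0, 1]

stat = logrank_statistic(young_times, young_events, old_times, old_events)
print("log-rank statistic:", round(stat, 3),
      "p-value:", round(chi2_1df_pvalue(stat), 3))
```

A large statistic (small p-value) suggests different lifetime distributions. The test is most sensitive when hazards are proportional, so it can miss differences between curves that cross.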
Fig. 1.1 KM estimates (survival on a log scale) for a the age groups and b the mismatch score groups (score < 1.055 and score ≥ 1.055). Reprinted from … Inference, 139(12), H.-D.I. Wu, F. Hsieh, Heterogeneity and Varying Effect in Hazards Regression, pp. 4213–4222, Copyright 2015, with …
Cerebral vascular disease was among the leading causes of death in Taiwan in recent
decades (crude mortality, 53.5–78.4 cases per 10^5 person-years), and rehabilitating
stroke patients often had a long length of hospital stay (LOS). Studying
the principal factors affecting LOS is essential for the management of health-care
costs, of after-discharge home care, and of bed occupancy in hospitals of different
levels, etc. Further, LOS is a factor related to short-term prognosis and is also an
indicator of long-term survival of patients. These data offer an example of the case
of non-censoring (see Chap. 2); that is, the time of “discharge” from hospital is
treated as an “event time.” The study enrolled 586 patients who experienced their first
hemorrhagic/infarct stroke and received in-hospital rehabilitation (Lin et al. 2009).
The baseline data include age, gender, co-morbidity status, and previous history
of stroke and/or severe injury, etc. Modified Barthel index (MBI) and functional
independence measure (FIM) questionnaires were administered to patients admitted
for rehabilitation. The MBI and FIM are two different scores measuring the severity
of disability and the functional dependence/independence level of patients. These two
scores are highly correlated and both indicative of a patient’s discharge. In these
data, 24.6, 60.8, and 14.6 % of the patients had MBI = 0, 0 < MBI ≤ 30, and MBI ≥
35; and 24.4, 48.0, and 27.6 % had FIM in [18, 28], [29, 63], and [64, 125],
respectively.
The KM “survival” estimates for different MBI and FIM groups are displayed in
Fig. 1.2. The different Barthel index groups (upper panel, Fig. 1.2a) and the different
FIM groups (lower panel, Fig. 1.2b) both exhibit a proportional hazards relationship. In
Lin et al. (2009), confidence intervals for the mean LOS are constructed based on the PH
model assumption.
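One common way to read a proportional hazards relationship from survival curves is through the identity below, stated under the assumption of a PH model with fixed covariates. The symbols are introduced here for illustration only: S_0(t) is the baseline survival function, z the covariate vector, and \beta the regression parameter.

```latex
S(t \mid z) = S_0(t)^{\exp(\beta^{\top} z)}
\quad\Longrightarrow\quad
\log\{-\log S(t \mid z)\} = \beta^{\top} z + \log\{-\log S_0(t)\}.
```

Under this assumption the curves of different groups differ only by a vertical shift on the log(−log) scale, so they stay roughly parallel and do not cross.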
Fig. 1.2 a KM estimates for different BAR (or MBI) groups (BAR = 0, 0 < BAR < 35, BAR ≥ 35). b KM estimates for different FIM groups (FIM ≤ 28, 29 ≤ FIM ≤ 63, FIM ≥ 64). Both panels plot survival against days (0 to 120). Reprinted from Journal of the Formosan Medical Association, 108(8), C.-L. Lin, P.-H. Lin, L.-W. Chou, S.-J. Lan, N.-H. Meng, S.-F. Lo, H.-D.I. Wu, Model-based Prediction of Length of Stay for Rehabilitating Stroke Patients, pp. 653–662, Copyright 2015, with permission from Elsevier
1.3 Example 3: Gastric Carcinoma Data
When analyzing survival data from clinical trials, cross-effects of survival functions
are sometimes observed. A classical example is the well-known data concerning the
effects of chemotherapy (CH) and chemotherapy plus radiotherapy (CH+R) on the
survival times of gastric cancer patients studied by Stablein and Koutrouvelis (1985).
The number of patients is 90. For further details and discussions, see also
Kleinbaum and Klein (2005), Klein and Moeschberger (2003), Bagdonavičius et al.
(2002), Hsieh (2001), Bagdonavičius et al. (2004), and Zeng and Lin (2007).
Survival times of chemotherapy (group 0 of size 45) and chemotherapy plus
radiotherapy (group 1 of size 45) patients are as follows (* denotes right-censored
observations).
• Chemotherapy: 1 63 105 129 182 216 250 262 301 301 342 354 356 358 380 383
383 388 394 408 460 489 499 523 524 535 562 569 675 676 748 778 786 797 955
968 1000 1245 1271 1420 1551 1694 2363 2754* 2950*;
• Chemotherapy plus Radiotherapy: 17 42 44 48 60 72 74 95 103 108 122 144 167
170 183 185 193 195 197 208 234 235 254 307 315 401 445 464 484 528 542 567
577 580 795 855 1366 1577 2060 2412* 2486* 2796* 2802* 2934* 2988*.
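The survival times listed above are enough to compute the Kaplan–Meier estimates of the two groups directly. The sketch below is a plain-Python illustration of the standard product-limit formula, not the authors' code; the function and variable names are hypothetical, and the starred (right-censored) observations are coded with event indicator 0.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at the distinct failure times."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        while i < len(data) and data[i][0] == t:  # handle tied times together
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Value of the right-continuous step function S at time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Chemotherapy (CH): 43 failures followed by 2 right-censored times.
ch_times = [1, 63, 105, 129, 182, 216, 250, 262, 301, 301, 342, 354, 356, 358,
            380, 383, 383, 388, 394, 408, 460, 489, 499, 523, 524, 535, 562,
            569, 675, 676, 748, 778, 786, 797, 955, 968, 1000, 1245, 1271,
            1420, 1551, 1694, 2363, 2754, 2950]
ch_events = [1] * 43 + [0] * 2

# Chemotherapy plus radiotherapy (CH+R): 39 failures, 6 right-censored times.
chr_times = [17, 42, 44, 48, 60, 72, 74, 95, 103, 108, 122, 144, 167, 170, 183,
             185, 193, 195, 197, 208, 234, 235, 254, 307, 315, 401, 445, 464,
             484, 528, 542, 567, 577, 580, 795, 855, 1366, 1577, 2060, 2412,
             2486, 2796, 2802, 2934, 2988]
chr_events = [1] * 39 + [0] * 6

ch_curve = kaplan_meier(ch_times, ch_events)
chr_curve = kaplan_meier(chr_times, chr_events)

for day in (200, 1000):
    print(day,
          round(survival_at(ch_curve, day), 3),    # CH
          round(survival_at(chr_curve, day), 3))   # CH+R
# day 200:  CH 0.889, CH+R 0.578  (CH+R worse early)
# day 1000: CH 0.178, CH+R 0.2    (CH+R better later)
```

Since no observation is censored before day 1000, the estimates at these two time points reduce to the empirical proportions still alive (40/45 and 26/45 at day 200, 8/45 and 9/45 at day 1000), which makes the crossing of the two curves easy to check by hand.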
At the beginning of treatment, the mortality of CH+R patients is greater but
at a certain moment the survival functions of CH+R and CH patients intersect,
and later the mortality of CH patients is greater. That is, if patients survive CH+R
therapy during a certain period then later this treatment is more beneficial than the
CH therapy. Doses of CH and R therapy can be different so regression data can be
collected. One will observe (Fig. 1.3) this “cross-effect” phenomenon by plotting the
Kaplan–Meier estimators of the survival function for both treatment groups. The
two estimated curves indicate that radiotherapy would initially be detrimental to
a patient’s survival but become beneficial later on. We shall consider models for
analysis of data with cross-effect of survival functions under constant covariates in