Models For Multi-State Survival Data - Per Kragh Andersen, Henrik Ravn (Chapman & Hall - CRC Texts in Statistical Science) - CRC (2024)
Multi-state models provide a statistical framework for studying longitudinal data on subjects when
focus is on the occurrence of events that the subjects may experience over time. They find appli-
cation particularly in biostatistics, medicine, and public health. The book includes mathematical
detail which can be skipped by readers more interested in the practical examples. It is aimed at
biostatisticians and at readers with an interest in the topic having a more applied background, such
as epidemiology. This book builds on several courses the authors have taught on the subject.
Key Features:
Software code in R and SAS and the data used in the book can be found on the book’s webpage.
Henrik Ravn is senior statistical director at Novo Nordisk A/S, Denmark. He graduated with an MSc in theoretical statistics in 1992 from the University of Aarhus, Denmark, and completed a PhD in biostatistics in 2002 at the University of Copenhagen, Denmark. He joined Novo Nordisk in late 2015 after more than 22 years of biostatistical and epidemiological research at Statens Serum Institut, Denmark, and in Guinea-Bissau, West Africa. He has co-authored more than 160 papers, mainly within epidemiology and applications of survival analysis, and has taught several courses as an external lecturer at the Section of Biostatistics, University of Copenhagen.
Per Kragh Andersen has been professor of biostatistics at the Department of Public Health, University of Copenhagen, Denmark, since 1998. He earned a degree in mathematical statistics from the University of Copenhagen in 1978, a PhD in 1982, and a DMSc degree in 1997. From 1993 to 2002 he also worked as chief statistician at the Danish Epidemiology Science Centre. He is author or co-author of more than 125 papers on statistical methodology and more than 250 papers in the medical literature. His research has concentrated on survival analysis, and he is co-author of the 1993 book Statistical Models Based on Counting Processes. He has taught several courses, both nationally and internationally, both for students with a mathematical background and for students in medicine or public health.
CHAPMAN & HALL/CRC
Texts in Statistical Science Series
Joseph K. Blitzstein, Harvard University, USA
Julian J. Faraway, University of Bath, UK
Martin Tanner, Northwestern University, USA
Jim Zidek, University of British Columbia, Canada
Recently Published Titles
Sampling
Design and Analysis, Third Edition
Sharon L. Lohr
Bayes Rules!
An Introduction to Applied Bayesian Modeling
Alicia Johnson, Miles Ott and Mine Dogucu
Statistical Theory
A Concise Introduction, Second Edition
Felix Abramovich and Ya’acov Ritov
Applied Linear Regression for Longitudinal Data
With an Emphasis on Missing Observations
Frans E.S. Tan and Shahab Jolani
Fundamentals of Mathematical Statistics
Steffen Lauritzen
Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot as-
sume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have
attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders
if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please
write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including pho-
tocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission
from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the
Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are
not available on CCC please contact [email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for iden-
tification and explanation without intent to infringe.
DOI: 10.1201/9780429029684
Publisher’s note: This book has been prepared from camera-ready copy provided by the authors.
Preface xi
1 Introduction 1
1.1 Examples of event history data 2
1.1.1 PBC3 trial in liver cirrhosis 2
1.1.2 Guinea-Bissau childhood vaccination study 4
1.1.3 Testis cancer incidence and maternal parity 5
1.1.4 PROVA trial in liver cirrhosis 6
1.1.5 Recurrent episodes in affective disorders 7
1.1.6 LEADER cardiovascular trial in type 2 diabetes 9
1.1.7 Bone marrow transplantation in acute leukemia 9
1.1.8 Copenhagen Holter study 10
1.2 Parameters in multi-state models 13
1.2.1 Choice of time-variable 13
1.2.2 Marginal parameters 14
1.2.3 Conditional parameters 18
1.2.4 Data representation 20
1.2.5 Target parameter 25
1.3 Independent censoring and competing risks 27
1.4 Mathematical definition of parameters (*) 29
1.4.1 Marginal parameters (*) 30
1.4.2 Conditional parameters (*) 31
1.4.3 Counting processes (*) 32
1.5 Exercises 34
2.2.4 Additive regression models 54
2.2.5 Additive versus multiplicative models 57
2.3 Delayed entry 58
2.4 Competing risks 61
2.5 Recurrent events 62
2.5.1 Recurrent episodes in affective disorders 63
2.5.2 LEADER cardiovascular trial in type 2 diabetes 64
2.6 Exercises 66
3 Intensity models 69
3.1 Likelihood function (*) 69
3.2 Non-parametric models (*) 73
3.2.1 Nelson-Aalen estimator (*) 73
3.2.2 Inference (*) 74
3.3 Cox regression model (*) 76
3.4 Piece-wise constant hazards (*) 78
3.5 Additive regression models (*) 80
3.6 Examples 82
3.6.1 PBC3 trial in liver cirrhosis 82
3.6.2 Guinea-Bissau childhood vaccination study 82
3.6.3 PROVA trial in liver cirrhosis 83
3.6.4 Testis cancer incidence and maternal parity 85
3.7 Time-dependent covariates 87
3.7.1 Adapted covariates 87
3.7.2 Non-adapted covariates 88
3.7.3 Inference 88
3.7.4 Inference (*) 89
3.7.5 Recurrent episodes in affective disorders 89
3.7.6 PROVA trial in liver cirrhosis 90
3.7.7 PBC3 trial in liver cirrhosis 95
3.7.8 Bone marrow transplantation in acute leukemia 96
3.7.9 Additional issues 101
3.8 Models with shared parameters 103
3.8.1 Duplicated data set 103
3.8.2 PROVA trial in liver cirrhosis 105
3.8.3 Bone marrow transplantation in acute leukemia 105
3.8.4 Joint likelihood (*) 106
3.9 Frailty models 110
3.9.1 Inference (*) 110
3.9.2 Clustered data 111
3.9.3 Recurrent events 112
3.10 Exercises 115
6 Pseudo-values 221
6.1 Intuition 222
6.1.1 Introduction 222
6.1.2 Hazard difference 229
6.1.3 Restricted mean 230
6.1.4 Cumulative incidence 233
6.1.5 Cause-specific time lost 234
6.1.6 Non-Markov transition probabilities 234
6.1.7 Recurrent events 235
6.1.8 Covariate-dependent censoring 236
6.2 Theoretical properties (*) 237
6.3 Approximation of pseudo-values (*) 240
6.4 Goodness-of-fit (*) 241
6.5 Exercises 243
Bibliography 261
Multi-state models provide a statistical framework for studying longitudinal data on sub-
jects when focus is on the occurrence of events that the subjects may experience over time.
The simplest situation is when only a single event, ‘death’, is of interest – a situation
known as survival analysis. We shall use the phrase multi-state survival data for the data
that arise in the general case when observing subjects over time and several events may be
of interest during their life spans.
As indicated in the sub-title of the book, models for multi-state survival data can either
be specified via rates of transition between states or by directly addressing the risk of
occupying a given state at given points in time. A general approach for addressing risks and
other marginal parameters is via pseudo-values, to which a whole chapter is devoted.
The background for writing this book is our teaching of several courses on various aspects
of multi-state survival data – either for participants with a basic training in statistics or with
a clinical/epidemiological background. Several texts on multi-state models already exist.
These include the books by Beyersmann et al. (2012), Broström (2012), Geskus (2016),
and more recently, Cook and Lawless (2018). The book by Kalbfleisch and Prentice (1980,
2002) focuses on survival analysis but also discusses general multi-state models. Books
with more emphasis on mathematics are those by Andersen et al. (1993), Hougaard (2000),
Martinussen and Scheike (2006), and Aalen et al. (2008). In addition, several review papers
have appeared (e.g., Hougaard, 1999; Andersen and Keiding, 2002; Putter et al., 2007; An-
dersen and Pohar Perme, 2008; Bühler et al., 2023). In spite of the existence of these texts,
we were unable to identify a suitable common book for these different types of participants
and, importantly, none of the cited texts provide detailed discussions on the development
of methods based on pseudo-values.
With this book we aim at filling this gap (at least for ourselves) and to provide a text that
is applicable as the basis for courses for mixed groups of participants. By addressing at
least two types of readers, we deliberately run the risk of falling between two stools; however, we believe that readers with different mathematical backgrounds and interests should
all benefit from the book. Those readers who appreciate some mathematical details can
read the book from the beginning to the end, thereby first getting a (hopefully) more intu-
itive introduction to the topics, including practical examples and, subsequently, in sections
marked with ‘(*)’ get more details. This will, unavoidably, entail some repetitions. On the
other hand, readers with less interest in mathematical details can read all sections that are
not marked with ‘(*)’ without losing the flow of the book. It should be emphasized that
we will from time to time refer to (*)-marked sections and to more technical publications
from those more intuitive sections. The text includes several summary boxes that emphasize
highlights from recent sections.
The book discusses a number of practical examples of event history data that are meant to
illustrate the methods discussed. Sometimes, different statistical models that are fitted to the
same data are mathematically incompatible and we will make remarks to this effect along
the way. Software code for the calculations in the examples is not documented in the book.
Rather, code in R and SAS and data can be found on the book’s webpage. The webpage
also includes code for solving the practical exercises found at the end of each chapter and
solutions to theoretical exercises marked with (*).
The cover drawing probably calls for an explanation, as follows. When analyzing recurrent
events data and ignoring competing risks (which is quite common in applications), then
a curve like the top one on the figure may be obtained – a curve that is upwards biased.
However, the book, by the crow (Per’s middle name) and the raven (Henrik’s last name), then comes to the rescue and forces the curve downwards to avoid the bias. We wish to thank
Gustav Ravn for the cover drawing.
There are a number of other people to whom we wish to address our thanks and without
whose involvement the writing of this book would not have been possible.
First and foremost, our sincere thanks go to Julie K. Furberg who has carefully created
all figures and validated all analyses quoted in the book. She also gave valuable feedback
on the text. Eva N.S. Wandall thoroughly created solutions to practical exercises and con-
tributed to some of the examples.
Several earlier drafts of chapters were read and commented upon by Anne Katrine Duun-
Henriksen, Niels Keiding, Thomas H. Scheike, and Henrik F. Thomsen. Torben Marti-
nussen provided important input for Chapter 6.
A special thank you goes to those who have provided data for the practical examples:
Peter Aaby, Jules Angst, Flemming Bendtsen, John P. Klein, Bjørn S. Larsen, Thorkild
I.A. Sørensen, Niels Tygstrup, and Tine Westergaard. Permission to present analyses of
LEADER data was given by Novo Nordisk A/S.
We thank our employers: University of Copenhagen and Novo Nordisk A/S for letting us
work on this book project during working hours for several years. A special thanks goes to Novo Nordisk A/S for granting a stay at Favrholm Campus to finalize the book. Communication
with the publishers has been smooth and we are grateful for their patience.
Bagsværd Per Kragh Andersen
March 2023 Henrik Ravn
List of symbols and abbreviations
The following list describes main symbols and abbreviations used in the book:
α(t) Hazard (intensity, rate) function
β Regression coefficient
εh(τ) Expected length of stay in state h in [0, τ]
λ(t) Intensity process for counting process
µ(t) Mean number of recurrent events until time t; µ(t) = E(N(t))
A(t) Cumulative hazard function; A(t) = ∫₀ᵗ α(u) du
LRT Likelihood ratio test
M(t) Martingale process
N(t) Counting process; Ni(t) for subject i
P(·) Probability
PL Partial likelihood
Qh(t) Probability of being in state h at time t (state occupation probability)
R(t) Risk set at time t
S(t) Survival distribution function for the random variable T; S(t) = 1 − F(t)
SD Standard deviation
Ti Event time for individual i
U(·) Score function or other estimating function
V(t) Multi-state process
Wi Weight for subject i
Xi Observation time for subject i; Xi = min(Ti, Ci)
Y(t) Number at risk at time t
Yi(t) At-risk indicator for subject i
Zi Covariate for subject i, may be time-dependent: Zi(t)
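The software accompanying the book is in R and SAS (see the webpage). Purely as an illustration of the notation above – not code from the book – the following Python sketch computes N(t), Y(t), and the Nelson-Aalen-type cumulative hazard estimate for a small made-up sample.

```python
# Toy sample of (Xi, Di): observation time Xi = min(Ti, Ci) and event
# indicator Di (1: event observed, 0: censored). Made-up numbers.
data = [(2.0, 1), (3.0, 0), (4.0, 1), (4.0, 1), (5.0, 0), (7.0, 1)]

def N(t):
    """Aggregated counting process: number of observed events in [0, t]."""
    return sum(1 for x, d in data if d == 1 and x <= t)

def Y(t):
    """Number at risk just before time t (ties at t counted as at risk)."""
    return sum(1 for x, _ in data if x >= t)

def nelson_aalen(t):
    """Cumulative hazard estimate: sum of dN(u)/Y(u) over event times u <= t."""
    times = sorted({x for x, d in data if d == 1 and x <= t})
    return sum(sum(1 for x, d in data if d == 1 and x == u) / Y(u)
               for u in times)

print(N(4.0))                       # 3 events by time 4
print(Y(4.0))                       # 4 at risk just before time 4
print(round(nelson_aalen(7.0), 4))  # 1/6 + 2/4 + 1/1 = 1.6667
```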
Chapter 1
Introduction
In many fields of quantitative science, subjects are followed over time for the occurrence of
certain events. Examples include clinical studies where cancer patients are followed from
time of surgery until time of death from any cause (survival analysis), epidemiological co-
hort studies, e.g., registry-based, where disease-free individuals (‘exposed’ or ‘unexposed’)
are followed from a given calendar time until diagnosis of a certain disease, or demographic
studies where women are followed through child-bearing ages with the focus on ages at
which they give birth to live-born children. Data from such studies may be represented as
events occurring in continuous time and a mathematical framework in which to study such
phenomena is that of multi-state models where an event is considered a transition between
certain (discrete) states. We will denote the resulting data as multi-state survival data or
event history data.
Possible scientific questions that may be addressed in event history analyses include how
the mortality of the cancer patients is associated with individual prognostic variables such
as age, disease stage or histological features of the tumor, or what is the probability that ex-
posed or unexposed subjects are diagnosed with the disease within a certain time interval, or
what is the expected time women spend as nulliparous depending on the socio-economic status of their families.
An important feature of event history data is that of incomplete observation. This means that observation of the event(s) of interest may be precluded by the occurrence of another event,
such as end-of-study, drop-out of study, or death of the individual (in case the event of
interest is non-fatal). Here, as we shall discuss in more detail in Section 1.3, an important
distinction is between avoidable events (right-censoring) representing practical restrictions
in data collection that prevent further observation of the subject (e.g., end-of-study or drop-
out) and non-avoidable events (competing risks), such as the death of a patient. For the
former class of avoidable events, it is an important question whether the incomplete data
that are available to the investigator after censoring still suitably represent the population
for which inference is intended. This is the notion of independent censoring that will also
be further discussed in Section 1.3.
In this book we will discuss two classes of statistical models for multi-state survival data:
Intensity-based models and marginal models. Briefly, intensities or rates are parameters that
describe the immediate future development of the process conditionally on past information
on how the process has developed, while marginal parameters, such as the risk of being in
a given state at a particular time, do not involve such a conditioning. Both classes of models
often involve explanatory variables (or covariates/prognostic variables/risk factors – terms
that we will use interchangeably in the book).
The first model class targets intensities and is inspired by standard hazard models for sur-
vival data, and we shall see that models such as the Cox (1972) proportional hazards model
also play an important role for more general multi-state survival data. Throughout, we will
use the terms intensity, hazard, and rate interchangeably. Models for intensities are dis-
cussed in Chapters 2 and 3.
The second model class targets marginal parameters (e.g., risks) and, here, one approach is
plug-in methods where the marginal parameter is estimated using intensity-based models.
Thus, the results from these models are either inserted (‘plugged’) into an equation giving
the relationship between the marginal parameter and the intensities, or they are used as the
basis for simulating a large number of realizations of the multi-state process, whereby the
marginal parameter may be estimated, a technique known as micro-simulation. Another
approach is models that directly target marginal parameters, and a number of such mod-
els will also be presented. Marginal models are discussed in Chapters 4 and 5. For direct
marginal models (or simply direct models), as we shall see in Chapter 6, pseudo-values
(or pseudo-observations) are useful. In the final Chapter 7, a number of further topics are
briefly discussed.
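The micro-simulation idea can be sketched in a few lines. The following Python code is illustrative only (the book’s code is in R and SAS), with made-up constant transition intensities for an illness-death model: it simulates many paths and estimates the state occupation probabilities at a time τ as the fractions of paths occupying each state.

```python
import random

# Hypothetical constant transition hazards for an illness-death model
# (0: healthy, 1: ill, 2: dead); numbers are made up for illustration.
A01, A02, A12 = 0.10, 0.05, 0.20

def simulate_state_at(tau, rng):
    # From state 0, the first transition time is Exp(A01 + A02).
    t = rng.expovariate(A01 + A02)
    if t > tau:
        return 0
    if rng.random() < A02 / (A01 + A02):   # the transition was 0 -> 2
        return 2
    # Entered state 1 at time t; death 1 -> 2 after an Exp(A12) sojourn.
    return 1 if t + rng.expovariate(A12) > tau else 2

def occupation_probabilities(tau, n=100_000, seed=1):
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(n):
        counts[simulate_state_at(tau, rng)] += 1
    return [c / n for c in counts]

print(occupation_probabilities(5.0))
```

With constant hazards the probability of still being in state 0 at τ is exp(−(A01 + A02)τ), so the first estimated probability should be close to exp(−0.75) ≈ 0.47 for τ = 5.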
Sections marked with ‘(*)’ contain, as indicated in the Preface, more mathematical details.
Each chapter ends with a number of exercises where those marked with ‘(*)’ are more
technical.
Multi-state survival data
Multi-state survival data (event history data) represent subjects followed over time
for the occurrence of events of interest. The events occur in continuous time and an
event is considered a transition between discrete states.
Figure 1.1 The two-state model for survival data with states ‘0: Alive’ and ‘1: Dead’ and the single transition 0 → 1.
of the trial, an increased use of liver transplantation as a possible treatment for patients with
this disease forced the investigators to reconsider the trial design. Liver transplantation was
primarily offered to severely ill patients and, therefore, censoring patients at the time of
transplantation would likely leave the investigators with a sample of ‘too well’ patients
that would no longer be representative of patients with PBC. This led them to redefine the
main event of interest to be ‘failure of medical treatment’ defined as the composite end-
point of either death or liver transplantation, whichever occurred first. This is because both
death and the need of a liver transplantation signal that the medical treatment is no longer
effective. Patients were followed from randomization until treatment failure, drop-out or
January 1989; 61 patients died (CyA: 30, placebo: 31), another 29 were transplanted (CyA:
14, placebo: 15) and 4 patients were lost to follow-up before January 1989. For patients
lost to follow-up and for those alive without having had a liver transplantation on January
1989, all that is known about time to failure was that it exceeds time from randomization
to end of follow-up.
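The construction of such a composite end-point can be sketched as follows. The records are hypothetical (not the PBC3 data), and the book’s own code, in R and SAS, is on the webpage; this is only meant to make the definition concrete.

```python
# Hypothetical patient records: days to death, to transplantation, and to
# end of follow-up (drop-out or end of study). 'Failure of medical
# treatment' is the first of death or transplantation; otherwise the
# patient is censored at end of follow-up.
patients = [
    {"death": 400, "transplant": None, "censor": 1500},
    {"death": None, "transplant": 250, "censor": 1500},
    {"death": None, "transplant": None, "censor": 900},   # censored
]

def composite(p):
    events = [t for t in (p["death"], p["transplant"])
              if t is not None and t <= p["censor"]]
    if events:
        return min(events), 1    # failure of medical treatment observed
    return p["censor"], 0        # censored

print([composite(p) for p in patients])  # [(400, 1), (250, 1), (900, 0)]
```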
Figure 1.1 shows the general two-state model for survival data with states ‘0: Alive’ and ‘1:
Dead’ and one possible transition from state 0 to state 1 representing the event ‘death’. In
the PBC3 trial, this model is applicable with the two states representing: (0) ‘Alive without
transplantation’ and (1) ‘Dead or transplantation’ and the transition, 0 → 1, representing
the event of interest – failure of medical treatment.
PBC3 was a randomized trial and, therefore, the explanatory variable of primary inter-
est was the treatment indicator. However, in addition, a number of clinical, biochemical,
and histological variables were recorded at entry into the study. Studying the distribution
of such prognostic variables in the two treatment groups, it appeared that, in spite of the
randomization, the CyA group tended to present with somewhat less favorable values of
these variables than the placebo group. Therefore, evaluation of the treatment effect with
or without adjustment for explanatory variables shows some differences to be discussed in
later chapters.
An alternative to defining the composite end-point ‘failure of medical treatment’ would be to study the two events ‘death without transplantation’ and ‘liver transplantation’ separately. This would enable a study of possibly different effects of treatment (and other covariates) on each of these separate events. This situation is depicted in Figure 1.2, showing
the general competing risks model. Compared to Figure 1.1 it is seen that the initial state
‘Alive’ is the same whereas the final state ‘Dead’ is now split into a number, k, of separate states, transitions into which represent deaths from different causes. For the PBC3 trial,
Figure 1.2 The competing risks model with initial state ‘0: Alive’ and final states ‘1: Dead, cause 1’, …, ‘k: Dead, cause k’.
state 0 represents, as before, ‘Alive without transplantation’ and there are k = 2 final states
representing, respectively, ‘1: Transplantation’ and ‘2: Dead without transplantation’. The
event ‘liver transplantation’ is a 0 → 1 transition and ‘death without liver transplantation’ a
0 → 2 transition. Some patients died after liver transplantation. However, the initial medical
treatment (CyA or placebo) was no longer considered relevant after a transplantation, so,
information on mortality after transplantation was not ascertained as a part of the trial and
is not available.
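The competing risks representation replaces the single composite event indicator by a cause indicator. A Python sketch (hypothetical records, not the PBC3 data) of this data representation:

```python
# Hypothetical records as before: each patient gets one time and a cause
# indicator (1: transplantation, i.e., a 0 -> 1 transition; 2: death
# without transplantation, i.e., a 0 -> 2 transition; 0: censored).
patients = [
    {"death": 400, "transplant": None, "censor": 1500},
    {"death": None, "transplant": 250, "censor": 1500},
    {"death": None, "transplant": None, "censor": 900},
]

def competing_risks(p):
    candidates = []
    if p["transplant"] is not None and p["transplant"] <= p["censor"]:
        candidates.append((p["transplant"], 1))   # 0 -> 1 transition
    if p["death"] is not None and p["death"] <= p["censor"]:
        candidates.append((p["death"], 2))        # 0 -> 2 transition
    return min(candidates) if candidates else (p["censor"], 0)

print([competing_risks(p) for p in patients])  # [(400, 2), (250, 1), (900, 0)]
```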
As in the PBC3 example, there are two relevant states ‘Alive’ and ‘Dead’ as represented in
Figure 1.1. The censoring events include out-migration between visits and being alive at the
subsequent visit. Table 1.1 provides the basic mortality data for vaccinated and non-vaccinated children.
This study was an observational study, as allocation to vaccination groups was not random-
ized. This means that any observed association between vaccination status and later mor-
tality may be confounded because of uneven distributions of mortality risk factors among
vaccinated and non-vaccinated children. Thus, there may be a need to adjust for covariates
ascertained at the initial visit in the analysis of mortality and vaccinations.
In principle, information on vaccines received between visits was available for surviving
children at the next visit. However, these extra data are discarded because, culturally, the belongings of deceased children, including immunization cards, are destroyed, implying differential information on vaccines given between visits and leading to immortal time bias.
Treatment group      Patients   Bleedings   Deaths without bleeding   Deaths after bleeding   Drop-out
Sclerotherapy only         73          13                        13                       5          5
Propranolol only           68          12                         5                       6          7
Both treatments            73          12                        20                      10          5
No treatment               72          13                         8                       8          3
Total                     286          50                        46                      29         20
non-seminomas) and ‘Death without testis cancer’. The censoring events are emigration
and end-of-study. However, because of the rather large data set with more than a million
cohort members, the raw data set with individual records was first tabulated according to
the explanatory variables (including ‘current age of the son’) where, for each combination
of these variables, the person-years at risk and numbers of seminomas and non-seminomas
are given. In a similar fashion, the numbers of deaths could be tabulated; however, that
information was not part of the available data and this has a number of consequences for
the analyses that are possible to conduct for this study. This will be discussed later (Sec-
tion 3.6).
Figure 1.3 The illness-death model with states ‘0: Disease-free’, ‘1: Diseased’, and ‘2: Dead’, and possible transitions 0 → 1, 0 → 2, and 1 → 2.
As for the case with the PBC3 trial (Section 1.1.1), there were two censoring events:
Drop-out and end-of-study. Furthermore, a number of potential explanatory variables were
recorded at entry into the PROVA trial. These variables may be used when studying the
prognosis of the patients.
(Diagram: states ‘0: At risk’, ‘1: Not at risk’, and ‘2: Dead’; possible transitions 0 → 1, 1 → 0, 0 → 2, and 1 → 2.)
Figure 1.4 The illness-death model with recovery, applicable for recurrent episodes with a terminal
event, i.e., situations with a terminal event and with periods between times at which subjects are at
risk for a new event.
Sometimes, in spite of the fact that there are intervals between the ‘at-risk periods’, focus may be on times from the initiation of one episode to the initiation of the next rather than on times between events. In such an approach, depicted in Figure 1.5, the interval not at risk is included in the time between events so that, by definition, there are no such intervals. Note that the terminal state has been re-labelled as ‘D’. We will denote recurrent events where the events have a certain duration (as in this example) as recurrent episodes.
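The two time scales can be made concrete with hypothetical episode data (made-up numbers, not from the affective disorders study): time from the initiation of one episode to the initiation of the next, versus the at-risk gap from the end of one episode to the start of the next.

```python
# Hypothetical recurrent episodes, each with a start and an end (days).
episodes = [(0, 30), (100, 120), (250, 260)]   # (start_day, end_day)

# Time scale of Figure 1.5: initiation of one episode to initiation of
# the next, so intervals not at risk are absorbed into the waiting times.
init_to_init = [episodes[i + 1][0] - episodes[i][0]
                for i in range(len(episodes) - 1)]

# Time scale of Figure 1.4: at-risk gaps from the end of one episode to
# the start of the next.
at_risk_gaps = [episodes[i + 1][0] - episodes[i][1]
                for i in range(len(episodes) - 1)]

print(init_to_init)  # [100, 150]
print(at_risk_gaps)  # [70, 130]
```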
(Diagram: states ‘0: No event’, ‘1: 1 event’, ‘2: 2 events’, …, with transitions 0 → 1 → 2 → ··· and a transition from each state to the terminal state ‘D: Dead’.)
Figure 1.5 A multi-state model for recurrent events with a terminal event and no intervals between
at-risk periods.
In these data, a number of explanatory variables that may affect the outcome and its asso-
ciation with the initial diagnosis (unipolar vs. bipolar) were recorded at the time when the
initial diagnosis was given. These potential confounders include sex and age of the patient
and calendar time at the initial diagnosis.
In situations without a terminal event, e.g., when mortality is negligible, the models in
Figures 1.4 and 1.5 without the final ‘Dead’ state may be applicable.
to the treatment is graft versus host disease (GvHD) where the infused donor cells react
against the patient.
A data set compiled from the Center for International Blood and Marrow Transplant Re-
search (CIBMTR) was analyzed by Andersen and Pohar Perme (2008) with the main pur-
pose of studying how occurrence of the GvHD event affected relapse and death in remis-
sion. The CIBMTR is comprised of clinical and basic scientists who confidentially share
data on their blood and bone marrow transplant patients with the CIBMTR Data Collection
Center located at the Medical College of Wisconsin, Milwaukee, USA. The CIBMTR is
a repository of information about results of transplants at more than 450 transplant cen-
ters worldwide. The present data set consists of 2,009 patients from 255 different centers
who received an HLA-identical sibling transplant between 1995 and 2004 for acute myel-
ogenous leukemia (AML) or acute lymphoblastic leukemia (ALL) and were transplanted in
first complete remission, i.e., when the pre-conditioning has eliminated the leukemia symp-
toms. All patients received BM or peripheral blood (PB) stem cell transplantation. Table
1.4 gives an overview of the events observed during follow-up (until 2007). The 1,272 (= 2,009 − 737) patients who were still alive at that time are censored.
Figure 1.6 shows the states and events for this study. A number of potential prognostic
variables for the events (GvHD, relapse and death) were ascertained at time of transplan-
tation. These variables include disease type (ALL vs. AML), graft type (BM or BM/PB),
and sex and age of the patient. Sometimes, GvHD is considered, not a state but rather a
time-dependent covariate, in which case the diagram in Figure 1.3 would be applicable
with states BMT, relapse and dead.
(Diagram: states ‘0: BMT’, ‘1: GvHD’, ‘2: Relapse’, and ‘3: Dead’, with arrows for the possible transitions.)
Figure 1.6 Bone marrow transplantation (BMT) in acute leukemia: States and transitions (GvHD:
Graft versus host disease).
reported on a follow-up until 2013 of 678 participants with the purpose of studying the
association between excessive supra-ventricular ectopic activity (ESVEA, a particular kind
of irregular heart rhythm detected via the Holter monitoring) and later atrial fibrillation (AF,
a serious heart arrhythmia affecting the blood circulation) and stroke. It was well known that
ESVEA increases the incidence of AF, but one purpose of the study was to examine whether
the incidence of stroke in patients with ESVEA was increased over and above what could
be explained by an increase in the occurrence of AF. Events of AF, stroke and death during
follow-up were ascertained via the Danish National Patient Registry. Figure 1.7 shows the
possible states and transitions that can be studied based on these data. Note that, compared
to Figure 1.6, this state diagram allows a 2 → 1 transition; however, such transitions do not
impact the basic scientific question raised in the Copenhagen Holter Study. Table 1.5 shows
the number of patients who were observed to follow the different possible paths through
these states according to ESVEA at time of recruitment. From this table it appears that AF
occurred in 18% of the patients with ESVEA and in 10% of those without, stroke occurred
in 21% of patients with ESVEA and in 9% of those without. Among those who experienced
AF without stroke, 25% of patients with ESVEA later had a stroke. The corresponding fraction for patients without ESVEA was 9%. An analysis of these interacting events must account for
the fact that patients may also die without AF and/or stroke events.
(Diagram: states ‘0: No event’, ‘1: AF’, ‘2: Stroke’, and ‘3: Dead’, with arrows for the possible transitions, including 2 → 1.)
Figure 1.7 Copenhagen Holter study: States and transitions (AF: Atrial fibrillation).
Table 1.5 Copenhagen Holter study: Number of patients following different paths (ESVEA: Exces-
sive supra-ventricular ectopic activity; AF: Atrial fibrillation).
Number of patients
Observed path Without ESVEA With ESVEA Total
0 320 34 354
0 → AF 29 8 37
0 → Stroke 17 1 18
0 → AF → Stroke 3 1 4
0 → Stroke → AF 4 0 4
0 → Dead 158 32 190
0 → AF → Dead 20 4 24
0 → Stroke → Dead 25 14 39
0 → AF → Stroke → Dead 2 3 5
0 → Stroke → AF → Dead 1 2 3
Total 579 99 678
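The percentages quoted in the text can be recomputed from the path counts in Table 1.5. A small Python check (the book’s own code is in R and SAS; this is only an arithmetic verification of the table):

```python
# Path counts from Table 1.5 as (without ESVEA, with ESVEA).
paths = {
    "0":                (320, 34),
    "0>AF":             (29, 8),
    "0>Stroke":         (17, 1),
    "0>AF>Stroke":      (3, 1),
    "0>Stroke>AF":      (4, 0),
    "0>Dead":           (158, 32),
    "0>AF>Dead":        (20, 4),
    "0>Stroke>Dead":    (25, 14),
    "0>AF>Stroke>Dead": (2, 3),
    "0>Stroke>AF>Dead": (1, 2),
}

def pct(event, group):   # group 0: without ESVEA, 1: with ESVEA
    total = sum(c[group] for c in paths.values())
    hits = sum(c[group] for p, c in paths.items() if event in p)
    return round(100 * hits / total)

print(pct("AF", 1), pct("AF", 0))          # 18 10
print(pct("Stroke", 1), pct("Stroke", 0))  # 21 9

def stroke_after_af(group):
    # Paths where AF occurs before any stroke, i.e., starting '0>AF'.
    af_first = {p: c for p, c in paths.items() if p.startswith("0>AF")}
    total = sum(c[group] for c in af_first.values())
    later = sum(c[group] for p, c in af_first.items() if "Stroke" in p)
    return round(100 * later / total)

print(stroke_after_af(1), stroke_after_af(0))  # 25 9
```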
The Copenhagen Holter study was an observational study, so, adjustment for covariates
(potential confounders) may be needed when examining the association between the ‘ex-
posure’ ESVEA and later events like AF or stroke. A number of covariates were observed
at the examination at the time of recruitment, including smoking status, age, sex, blood
pressure, and body mass index. The follow-up in this study was, like in the testis cancer
incidence study, Example 1.1.3, registry-based and in Denmark this means that, in princi-
ple, there is no loss to follow-up (except for the fact that patients may emigrate before the
end of the study, which, in fact, nobody did). As a consequence, the only censoring event
is end-of-study (alive in 2013). This data set will mainly be used for practical exercises
throughout the book.
Examples of event history data
1. PBC3 trial: Randomized trial of effect of CyA vs. placebo on survival and liver
transplantation in patients with Primary Biliary Cirrhosis (n = 349).
2. Guinea-Bissau study: Observational study of effect of childhood vaccinations
on survival (n = 5,274).
3. Testis cancer study: Register study on the relationship between maternal parity
and testicular cancer rates of their sons (n = 1,015,994).
4. PROVA trial: Randomized trial of effect of propranolol and/or sclerotherapy on
the occurrence of bleeding and death in patients with liver cirrhosis (n = 286).
5. Recurrent episodes in affective disorders: Observational study of pattern of
repeated disease episodes for patients with unipolar or bipolar disorder (n = 119).
6. LEADER trial: Randomized trial in type 2 diabetics with high cardiovascular
risk – effect of liraglutide vs. placebo on cardiovascular events (n = 9, 340).
7. Bone marrow transplantation: Observational study of effect of graft versus host
disease (GvHD) on relapse and death in remission among bone marrow trans-
planted patients with leukemia (n = 2, 009).
8. Copenhagen Holter study: Observational study of the association between ex-
cessive supra-ventricular ectopic activity (ESVEA) and later atrial fibrillation
(AF) and stroke (n = 678).
As an example, we can consider the study of children in Guinea-Bissau (Example 1.1.2) where a choice of
a primary time-variable for the survival analysis is needed. For the time-variable 'time
since initial visit', say t, all children will be followed from the same time zero. This time-variable
has the advantage that all risk factors were ascertained at the same time; however,
the mortality rate for the children will likely not depend strongly on t. Thus, an alternative to
using t as the time-variable would be to use the (current) age of the children, a time-variable that
will definitely affect the mortality rates. Some children were born into the study because
the mother was followed during pregnancy, and those children will be followed from age 0.
Other children will only be included, i.e., be at risk of dying in the analysis, at a later
age, namely the age at which the child was first observed to be in state 0 (initial visit).
This is an example of delayed entry that will be further discussed in Section 2.3. Also for
Example 1.1.3 (testis cancer incidence and maternal parity), there will be delayed entry
when age is chosen as the primary time-variable because only boys born after 1968 are
followed from birth.
For illustration, consider the small set of survival data provided in Table 1.6 and Figure
1.8. The subjects were followed from a time zero of entry (recruitment). Let t be the time
since entry. Additionally, the age at time zero and at time of exit is given. Figure 1.8a
depicts the survival data using time t (time since entry) as time-variable and Figure 1.8b
the same survival data using age as time-variable, illustrating delayed entry. It is seen that
the same follow-up intervals are represented in the two figures. These intervals, however,
are re-allocated differently along the chosen time axis.
[Figure 1.8 here: panel (a) shows the follow-up intervals against time since entry (t), panel (b) against age; the vertical axis gives the subject number (1–12) and the horizontal axes run from 0 to 25.]
Figure 1.8 Small set of survival data: Twelve subjects of whom seven died during the study (dots)
and five were censored (circles).
Neither the average of all observation times nor the average of only the
uncensored survival times is applicable as an estimate of E(T ) (both will under-estimate the
mean, see Exercise 1.1). Similarly, the relative frequency of observation times greater than,
say t = 10, cannot be used as an estimator of the probability P(T > t) of surviving past
time t because of the censored observations ≤ t for which the corresponding true survival
times T may or may not exceed t.
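The downward bias of both naive averages can be illustrated by a small simulation, here a Python sketch with made-up exponential survival times and uniform censoring times (the book's own software is in R and SAS):

```python
import random

random.seed(1)
n = 10_000
true_mean = 10.0  # E(T) for exponential survival times with rate 1/10 (hypothetical)

obs_times, uncensored = [], []
for _ in range(n):
    T = random.expovariate(1 / true_mean)  # true survival time
    C = random.uniform(0, 25)              # censoring time
    X = min(T, C)                          # observed time
    obs_times.append(X)
    if T <= C:                             # event observed before censoring
        uncensored.append(X)

mean_all = sum(obs_times) / len(obs_times)
mean_unc = sum(uncensored) / len(uncensored)
print(mean_all, mean_unc)  # both fall well below the true mean of 10
```

The average of all observed times is biased because censored times are smaller than the true survival times they replace; the average of the uncensored times is biased because short survival times are the ones most likely to be observed uncensored.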
These considerations illustrate that other parameters and methods of estimation are required
for multi-state survival data and, in the following, we will discuss such parameters. Having
decided on a time zero, we let
(V (t), t ≥ 0)
be the multi-state process denoting, at time t, the state occupied at that time among a num-
ber of discrete states h = 0, . . . , k. For the two-state model for survival data in Figure 1.1,
the multi-state process at time t can take the values V (t) = 0 or V (t) = 1. One set of param-
eters of interest is the state occupation (or ‘occupancy’) probabilities at any time, t. Denote
the probability (risk) of being in state h at time t as
Qh (t) = P(V (t) = h);
then the sum of these over all possible states will be equal to 1
Q0 (t) + · · · + Qk (t) = ∑_{h=0}^{k} Qh (t) = 1.
In the two-state model for survival data, Figure 1.1, with the random variable T being time
to death, the state 0 occupation probability Q0 (t) is the survival function, i.e.,
Q0 (t) = S(t) = P(T > t),
and Q1 (t) is the failure distribution function, i.e.,
Q1 (t) = F(t) = P(T ≤ t) = 1 − S(t).
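In the absence of censoring, the state occupation probabilities of the two-state model are simply relative frequencies. A Python sketch with hypothetical exponential survival times (the book's own code is in R and SAS):

```python
import random

random.seed(2)
# Hypothetical exponential survival times (rate 0.2), observed without censoring
survival_times = [random.expovariate(0.2) for _ in range(5_000)]
n = len(survival_times)

def V(t, T):
    """Two-state process at time t: state 0 = alive (t < T), state 1 = dead."""
    return 0 if t < T else 1

for t in (1.0, 5.0, 10.0):
    Q0 = sum(V(t, T) == 0 for T in survival_times) / n  # estimates S(t) = P(T > t)
    Q1 = sum(V(t, T) == 1 for T in survival_times) / n  # estimates F(t) = P(T <= t)
    print(t, Q0, Q1)  # Q0 + Q1 = 1 at every t
```

At every t the two estimated occupation probabilities add to 1, and Q0 reproduces the empirical survival function.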
[Figure 1.9 here: panel (a) shows the estimate of S(t) with time since entry as time-variable, panel (b) the estimate of S(age) with age as time-variable; the vertical axes run from 0.0 to 1.0 and the horizontal axes from 0 to 25.]
For the small set of survival data (Table 1.6), Figure 1.9 provides estimates of the survival
function, S(·), with either (a) time since entry or (b) age as time-variable. The Kaplan-Meier
estimator, which we will return to later in the book (Sections 4.1.1 and 5.1.1), was
used for the estimation of S(·). Note that the shapes of the survival functions are somewhat
different, illustrating the importance of the choice of time zero. For any time point, the vertical
distance from the curve up to 1 represents F(t) = 1 − S(t).
The probabilities Qh (t) are examples of marginal parameters, i.e., at time t, their value is
not conditional on the past history (V (s), s < t) of the multi-state process (though they may
involve covariates recorded at time zero). Other marginal parameters include the expected
time, εh (·), spent in state h, either during all times, i.e., all the way up to infinity, εh (∞), or
up to some threshold time τ < ∞, εh (τ). The latter parameters have the property that they
add up to τ, i.e.,
ε0 (τ) + · · · + εk (τ) = ∑_{h=0}^{k} εh (τ) = τ,
because the time from 0 to τ has to be divided among the possible states. For the two-state
model (Figure 1.1), ε0 (τ) is the τ-restricted mean life time, i.e., the expected time lived
before time τ, and ε1 (τ) = τ − ε0 (τ) is the expected time lost before time τ. Figure 1.10
illustrates the estimated restricted mean life time for τ = 12, the area under the survival
curve, for the small set of survival data.
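As a concrete special case, if the hazard is constant, α(t) = λ, then S(t) = exp(−λt) and ε0(τ) = (1 − exp(−λτ))/λ in closed form. A Python sketch (hypothetical hazard value) that obtains ε0(τ) as the area under the survival curve by numerical integration:

```python
import math

lam, tau = 0.1, 12.0  # hypothetical constant hazard and time threshold

def S(t):
    """Survival function for a constant hazard: S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)

# eps0(tau): area under the survival curve on [0, tau] (trapezoidal rule)
m = 100_000
h = tau / m
eps0 = h * (0.5 * S(0) + sum(S(i * h) for i in range(1, m)) + 0.5 * S(tau))

closed_form = (1 - math.exp(-lam * tau)) / lam  # exact restricted mean here
eps1 = tau - eps0                               # expected time lost before tau
print(eps0, closed_form, eps1)
```

The numerical area matches the closed form, and ε0(τ) + ε1(τ) = τ as required.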
In cases where all subjects are in the same state ‘0’ at time zero (which is the case for all the
multi-state models depicted in Section 1.1), the distribution of the time (Th ) from time zero
until (first) entry into another state h is another marginal parameter. Examples include the
distribution of the survival time (time until entry into state 1 in Figure 1.1), time to event
no. h in a recurrent events situation (e.g., Figure 1.5), or time to relapse or to GvHD in the
model for the bone marrow transplantation data (Example 1.1.7, Figure 1.6). Note that, in
the last two examples, not all subjects will eventually enter into these states and the entry
times must be defined properly (formally, the value of these times may be infinite and Th is
denoted an improper variable).

Figure 1.10 Small set of survival data: Illustration of estimated restricted mean life time before
time τ = 12.
For the recurrent events multi-state models (Figures 1.4 and 1.5), another marginal param-
eter of interest is the expected number of events, say µ(t) = E(N(t)) before time t where
N(t) counts events before time t. In Figure 1.4, this is the expected number of times that
state 1 is visited before time t.
Marginal parameters
Marginal parameters express the following type of quantities: You place yourself
at time 0 and ask questions about aspects of the multi-state process V (·) at a later
time point t (without consideration of what may happen between 0 and t). We will
discuss the following marginal parameters:
• State occupation probability: Qh (t) = P(V (t) = h), the probability (risk) of
being in state h at time t.
• Restricted mean: εh (τ), the expected time spent in state h before time τ.
• Distribution of time Th to (first) entry into state h; relevant if everyone starts at
time 0 in the same state (0).
• Expected number of events µ(t) before time t; relevant for a recurrent events
process N(·), counting the number of events over time.
1.2.3 Conditional parameters
Another set of parameters in multi-state models for a time point t is conditional on the past
up to that time. The most important of such parameters are the transition intensities (or
rates or hazards – as emphasized in the Introduction, we will use these notions interchangeably).
For two different states, h ≠ j, these are (for a 'small' dt > 0) given by

αh j (t) ≈ P(V (t + dt) = j | V (t) = h and the past before time t)/dt,

i.e., the conditional probability per time unit of moving from state h to state j just after
time t, given the past up to time t. Closely related are the transition probabilities

Ph j (s,t) = P(V (t) = j | V (s) = h and the past before time s),

i.e., the conditional probability of being in state j at time t given state h at the earlier time
s and given the past at time s. In the common situation where all subjects are in an initial
state (say, 0) at the time origin, t = 0, we have for any state h that

P0h (0,t) = Qh (t),

the state occupation probability at time t. However, more generally, transition probabilities
are more complicated parameters than the state occupation probabilities because they may
depend on the past at time s > 0. If Ph j (s,t) only depends on the past via the state (h)
occupied at time s then the multi-state process is said to be a Markov process. Note that
the parameters αh j (s) and Ph j (s,t) involve conditioning on the past, but never on the future
beyond time s. Indeed, conditioning on the future is a quite common mistake in multi-state
survival analysis (e.g., Andersen and Keiding, 2012) – a mistake that we will sometimes
refer to in later chapters.
The simplest example of a transition intensity is the hazard function, α01 (t) = α(t), in
the two-state model in Figure 1.11. For some states, the transition intensities out of that
state are all equal to 0, i.e., no transitions out of that state are possible. An example is the
state ‘Dead’ in Figure 1.11 and such a state is said to be absorbing, whereas a state that is
not absorbing is said to be transient, an example being the state ‘Alive’ in that figure. For
the competing risks model (Figure 1.2), there is a transition intensity from the transient state
0 to each of the absorbing states, the cause-specific hazards for cause h = 1, . . . , k having
the interpretations α0h (t)dt ≈ the conditional probability of failure from cause h at time t
given no failure before time t. In Figure 1.12, we have added the cause-specific hazards to
the earlier Figure 1.2 in the same way as we did when going from Figure 1.1 to Figure 1.11.
0 (Alive)  --- α01 (t) --->  1 (Dead)
Figure 1.11 The two-state model for survival data with hazard function for the 0 → 1 transition.
             --- α01 (t) --->  1 (Dead, cause 1)
0 (Alive)           ...
             --- α0k (t) --->  k (Dead, cause k)
Figure 1.12 The competing risks model with cause-specific hazard functions.
In a similar way, transition intensities may be added to the other box-and-arrow diagrams
in Section 1.1.
Intuitively, if one knows all transition intensities at all times, then both the marginal pa-
rameters and the transition probabilities may be calculated. This is because, by knowing
the intensities, numerous paths for V (t) may be generated by moving forward in time in
small steps (of size dt), whereby Qh (t), εh (τ), and Ph j (s,t) may be computed as simple
averages over these numerous paths. This is, indeed, true and it is the idea behind micro-
simulation that we will return to in Section 5.4. In some multi-state models, including the
two-state model and the competing risks model (Figures 1.1, resp. 1.11 and 1.2, resp. 1.12),
the marginal parameters may also be computed explicitly by certain mathematical expres-
sions, e.g., the probability of staying in the initial state 0 in the two-state model (the survival
function) is given by the formula
S(t) = Q0 (t) = exp(−∫_0^t α(u)du)    (1.2)
that expresses how to get the survival function from the hazard function. Likewise, for the
competing risks model, the probability of being in the final, absorbing state h = 1, . . . , k at
time t is given by

Qh (t) = ∫_0^t S(u)α0h (u)du,    (1.3)
where, in Equation (1.2), α = α01 + · · · + α0k . This probability is frequently referred to as
the (cause h-) cumulative incidence function, Fh (t), a name that originates from epidemi-
ology where that name means ‘the cumulative risk of an event over time’, see, e.g., Szklo
and Nieto (2014, ch. 2). In Chapter 4, we will give intuitive arguments why Equations (1.2)
and (1.3) look the way they do.
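Equations (1.2) and (1.3) can be checked numerically in a simple special case with constant cause-specific hazards, where closed forms are available. A Python sketch (hypothetical hazard values; the book's software is in R and SAS):

```python
import math

a1, a2 = 0.03, 0.07  # hypothetical constant cause-specific hazards α01, α02
a = a1 + a2          # overall hazard α = α01 + α02

def S(t):
    """Equation (1.2): S(t) = exp(-∫ α) = exp(-α t) for constant hazards."""
    return math.exp(-a * t)

def Q1(t, m=100_000):
    """Equation (1.3): Q1(t) = ∫_0^t S(u) α01 du via the trapezoidal rule."""
    h = t / m
    f = lambda u: S(u) * a1
    return h * (0.5 * f(0) + sum(f(i * h) for i in range(1, m)) + 0.5 * f(t))

t = 10.0
exact = (a1 / a) * (1 - math.exp(-a * t))  # closed form under constant hazards
print(Q1(t), exact)
```

With constant hazards the cumulative incidence reduces to (α01/α)(1 − exp(−αt)), and the numerical integral of Equation (1.3) reproduces it.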
Models for multi-state survival data, e.g., regression models where adjustment for covari-
ates is performed, may conveniently be specified via the transition intensities, the Cox
(1972) regression model for survival data being one prominent such example. Intensity-based
models are studied in Chapters 2 and 3. Having modeled all intensities, marginal parameters
in simple multi-state models may be obtained by plugging the intensities into
expressions like Equation (1.2) or (1.3). However, the marginal parameters may depend on
the intensities in a non-simple fashion, and it is therefore of interest to aim at setting up
direct regression models for the way in which, e.g., εh (t), depends on covariates. Marginal
models (both models based on plug-in and direct models) are the topic of Chapters 4 and 5
(see also Chapter 6 where such direct models are based on pseudo-values).
Conditional parameters
Conditional parameters for a multi-state process V (·) quantify, at time t, the future
development of the process conditionally on the past of the process before t. We will
discuss two types of conditional parameters:
1. Transition intensities: αh j (t) gives the probability per time unit of moving to
state j right after time t given that you are in state h at t and given the past up to t:

αh j (t) ≈ P(V (t + dt) = j | V (t) = h and the past for s < t)/dt.

2. Transition probabilities: Ph j (s,t) = P(V (t) = j | V (s) = h and the past before
time s), the probability of being in state j at a later time t given the state h
occupied at time s and the past up to s.
The observed data for a single subject may be represented as the sequence

((0,V (0)), (T1 ,V (T1 )), (T2 ,V (T2 )), . . . , (X,V (X))),

with one record per subject, where T1 , T2 , . . . , TN−1 are the observed times of transition and X
is either the time TN of the last transition into an absorbing state or the time, C, of censoring.
Data organized in this way are said to be in wide format (or marked point process format). Such a format may
typically be directly obtained from raw data consisting of dates where events happened
(together with date of entry into the study and/or date of birth).
We will, in later chapters, typically assume that data for independent and identically dis-
tributed (i.i.d.) subjects i = 1, . . . , n are observed and, in situations where data may be de-
pendent, we will explicitly emphasize this.
Wide format is, however, less suitable as a basis for the analysis of the data, for which
purpose data are transformed into long format (or counting process format) where each
subject may be represented by several records. Here, each record typically corresponds to
a given type of transition, say from state h to state j and includes time of entry into state h,
time last seen in state h and information on whether the subject, at this latter time, made a
transition to state j, the ‘(Start, Stop, Status)’ triple.
As the name suggests, data in long format are closely linked to the mathematical repre-
sentation of observations from multi-state processes via counting processes. Thus, for each
possible transition, say from state h to another state j, data from a given subject, i may be
represented as the counting process
Nh ji (t) = No. of direct h → j transitions observed for subject i in the interval [0,t],
together with the indicator of being at risk for that transition at time t− (i.e., ‘just before
time t’)
Yhi (t) = I(subject i is in state h at time t−).
Here, the indicator function I(· · · ) is 1 if · · · is true and 0 otherwise. Note that, for several
multi-state models depicted in Section 1.1, the number of observed events of a given type
is at most 1 for any given subject; an exception is the model for recurrent events in Figure
1.4 where each subject may experience several 0 → 1 and 1 → 0 transitions. Counting
processes, N(t) = Nh j (t) with more than one jump may also be constructed by adding up
processes for individual subjects,
N(t) = ∑_i Nh ji (t).
Likewise, the total number at risk at time t, Y (t) = Yh (t), is obtained by adding up individual
at-risk processes,
Y (t) = ∑_i Yhi (t).
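The aggregation of individual counting and at-risk processes can be sketched in a few lines of Python; the (entry, exit, status) records below are hypothetical and only mimic the structure of the small data set:

```python
# Hypothetical (entry, exit, status) records for one transition:
# status 1 = transition observed at exit, 0 = censored at exit.
records = [(0, 3, 1), (0, 7, 0), (2, 5, 1), (4, 9, 1), (6, 8, 0)]

def N(t):
    """Aggregated counting process: transitions observed in [0, t]."""
    return sum(1 for entry, exit, status in records if status == 1 and exit <= t)

def Y(t):
    """Total number at risk just before time t (allows delayed entry)."""
    return sum(1 for entry, exit, status in records if entry < t <= exit)

print([N(t) for t in range(11)])
print([Y(t) for t in range(11)])
```

With a common entry time 0, Y(·) would be monotonically decreasing; the delayed entries at times 2, 4, and 6 make Y(·) increase as well, as with the age time-scale in Figure 1.13.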
[Figure 1.13 here: panels (a) and (b) show the counting process N(t) and the number at risk Y (t) against time t; panels (c) and (d) show N(age) and Y (age) against age. The vertical axes run from 0 to 12 and the horizontal axes from 0 to 25.]
Figure 1.13 Small set of survival data: Counting process and number at risk using t as time-variable
(a-b) and age as time-variable (c-d).
Figure 1.13 shows N(t) = N01 (t) and Y (t) = Y0 (t) for the small set of survival data from Ta-
ble 1.6. When using t (time since entry) as time-variable the at-risk function Y (·) is mono-
tonically decreasing, while using age as time-variable the at-risk function also increases
due to the delayed entries.
For a small value of dt > 0 we define the jump at time T for the counting process N(·) to be
dN(T ) = N(T + dt) − N(T −), i.e., dN(t) = 1 if t = T is an observed time of transition and
dN(t) = 0 otherwise. The representation using counting processes has turned out to be very
useful when formulating estimators in multi-state models and their statistical properties.
We will next illustrate how typical raw data in wide format (with one record per subject)
may be transformed into long format where each subject may be represented by several
records. We will use the PROVA trial (Section 1.1.4) to illustrate the ideas and assume that
the dates doe, dob, dod, and dls are defined as in Table 1.7.
Table 1.7 PROVA trial in liver cirrhosis: Date variables available (NA: Not available).

Date              Description
1  doe            Date of entry into the study, i.e., date of randomization.
2  dob (> doe)    Date of transfusion-requiring bleeding; if no bleeding was
                  observed, then dob = NA. Time of bleeding is T1 = dob − doe.
3  dod (> doe)    Date of death; if no death was observed, then dod = NA.
                  If dod ≠ NA, then time of death is T2 = dod − doe and this is
                  also the right-hand end-point, say X, for the interval of
                  observation.
4  dls (> doe)    Date last seen; if dod = NA then dls = date of censoring, in
                  which case the censoring time is C = dls − doe and this equals
                  the right-hand end-point, X, for the interval of observation.
                  If dod ≠ NA, then dls is equal to dod.
Note that the inequalities given in the table should all be checked with the data to make sure
that the recorded dates are consistent with reality. Also, if both dob and dod are observed,
then the inequality dob ≤ dod should hold. A data set with one line per subject containing
these date variables is in wide format. From the basic date-variables, times and types of
observed events may be defined as shown in Table 1.8, thereby taking the first step towards
transforming the data into long format.
Table 1.8 PROVA trial in liver cirrhosis: Observed transitions for different patterns of observed
dates (NA: Not available).
Here, times refer to time since entry, i.e., the time origin t = 0 is the time of randomization.
The resulting counting processes and at-risk processes for the data in long format are shown
in Table 1.9. Examples of how realizations of the multi-state process V (t) would look like
for specific values of T1 , T2 ,C are shown in Figure 1.14.
Based on these observations, one may now construct three data sets – one for each of the
possible transitions. Each record in the data set for the h → j transition has the structure
Table 1.9 PROVA trial in liver cirrhosis: Counting processes and at-risk processes for different
patterns of observed dates (NA: Not available).
dob dod N01 (t) N02 (t) N12 (t) Y0 (t) Y1 (t)
1 NA NA 0 0 0 I(C ≥ t) 0
2 Yes NA I(T1 ≤ t) 0 0 I(T1 ≥ t) I(T1 < t ≤ C)
3 NA Yes 0 I(T2 ≤ t) 0 I(T2 ≥ t) 0
4 Yes Yes I(T1 ≤ t) 0 I(T2 ≤ t) I(T1 ≥ t) I(T1 < t ≤ T2 )
[Figure 1.14 here: four sample paths of V (t), one per row of Tables 1.8–1.10: (1) censored at C with no events; (2) bleeding at T1 , then censored at C; (3) death at T2 without bleeding; (4) bleeding at T1 followed by death at T2 .]
Figure 1.14 PROVA trial in liver cirrhosis: The process V (t) for different patterns of observed dates
corresponding to the rows in Tables 1.8–1.10.
This is the data set in long format. In the PROVA example, each subject contributes at most
one record to each data set, as shown in Table 1.10 where the Status variable is 1 if an
h → j transition was observed at time Stop (i.e., Status = dNh j (Stop)).
Note that an observed 0 → 1 transition also gives rise to a record in the data set for 0 →
2 transitions ending in no transition (and vice versa). Also note that, in the data set for
the 1 → 2 transition, there is delayed entry meaning that subjects are not at risk of that
transition from time zero but only from a later time point (Start = T1 ) where the subject
was first observed to be in state 1. Presence of delayed entry is closely connected to the
choice of time-variable. Thus, had one chosen to consider the 1 → 2 transition intensity
in the PROVA trial to depend primarily on time since bleeding (and not on time since
PARAMETERS IN MULTI-STATE MODELS 25
Table 1.10 PROVA trial in liver cirrhosis: Records (Start, Stop, Status) in the three data sets
for different patterns of observed dates (NA: Not available).
randomization), then there would have been no delayed entry. The example also illustrates
that even though a basic time origin is chosen for a given multi-state model (here time of
randomization), there may be (later) transitions in the model for which another time origin
may be more appropriate.
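The wide-to-long transformation described in Tables 1.7–1.10 can be sketched as follows, using numeric times T1, T2, C (with infinity standing in for NA) instead of dates. This is a hedged Python illustration of the scheme, not the book's own (R/SAS) code:

```python
import math

def prova_long(T1, T2, C):
    """Split one wide-format PROVA record into (Start, Stop, Status) records,
    one per possible transition: 0->1 (bleeding), 0->2 (death without prior
    bleeding), 1->2 (death after bleeding). T1 = time of bleeding, T2 = time
    of death, C = time of censoring; math.inf plays the role of NA, and the
    times are assumed consistent (events occur before the date last seen)."""
    X = T2 if math.isfinite(T2) else C        # right-hand end-point of observation
    exit0 = min(T1, X)                        # time of leaving state 0
    records = {
        "0->1": (0.0, exit0, int(math.isfinite(T1))),
        "0->2": (0.0, exit0, int(math.isfinite(T2) and exit0 == T2)),
    }
    if math.isfinite(T1):                     # delayed entry into state 1 at T1
        records["1->2"] = (T1, X, int(math.isfinite(T2)))
    return records

inf = math.inf
print(prova_long(T1=5.0, T2=12.0, C=inf))   # bleeding at 5, death at 12
print(prova_long(T1=inf, T2=8.0, C=inf))    # death at 8 without bleeding
print(prova_long(T1=inf, T2=inf, C=20.0))   # censored at 20 without events
print(prova_long(T1=5.0, T2=inf, C=20.0))   # bleeding at 5, censored at 20
```

Note how an observed 0 → 1 transition at T1 produces a Status = 0 record in the 0 → 2 data set, and a delayed-entry record with Start = T1 in the 1 → 2 data set, matching the description of Table 1.10.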
A regression model linking covariates Z1 , Z2 , . . . , Z p to a parameter, here exemplified by a
state occupation probability Qh (t), may take the form

log(Qh (t)) = β0 + β1 Z1 + β2 Z2 + · · · + β p Z p ,
i.e., the covariate effects on Qh (t) are linear on the scale of the logarithm (the link function)
of the risk. Note that we will often refer to a coefficient, β j , as the ‘effect’ of the corre-
sponding covariate Z j – also in situations where a causal interpretation is not aimed at. The
expression on the right-hand side of this equation is the linear predictor
LP = β1 Z1 + β2 Z2 + · · · + β p Z p (1.4)
and involves regression coefficients β1 , β2 , . . . , β p (but not the intercept, β0 ). The interpretation
of a single β j is as follows. Consider two subjects differing 1 unit for covariate j
and having identical values for the remaining covariates in the model. Then the difference
between the log(risks) for those two subjects is

β j = (β0 + β1 Z1 + · · · + β j−1 Z j−1 + β j (Z j + 1) + · · · + β p Z p )
    − (β0 + β1 Z1 + · · · + β j−1 Z j−1 + β j Z j + · · · + β p Z p ).
Thus, exp(β j ) is the risk ratio for a 1 unit difference in Z j for given values of the remaining
covariates in the model. It is seen that not only does the interpretation of β j depend
on the chosen link function (here, the logarithm) but also on which other covariates
(Z1 , . . . , Z j−1 , Z j+1 , . . . , Z p ) are included in the model. Therefore, a regression coefficient,
e.g., for a treatment variable, unadjusted for other covariates is likely to differ from
one that is adjusted for sex and age. (One exception is when the model is linear, i.e., the
link function is the identity, and treatment is randomized and, thereby, independent of sex
and age and other covariates in which case the parameter is collapsible, see, e.g., Daniel
et al., 2021). Nevertheless, for a number of reasons, regression models and their estimated
coefficients are useful in connection with the analysis of multi-state survival data.
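A small numeric check of the risk-ratio interpretation, with purely hypothetical coefficient values (a Python sketch; the book's own code is in R and SAS):

```python
import math

# Hypothetical coefficients in the log-link model log(Q) = b0 + b1*Z1 + b2*Z2
b0, b1, b2 = -2.0, 0.4, -0.1

def risk(z1, z2):
    """Risk implied by the linear predictor on the log scale."""
    return math.exp(b0 + b1 * z1 + b2 * z2)

# Two subjects differing 1 unit in Z1 and identical in Z2:
rr = risk(z1=1, z2=3) / risk(z1=0, z2=3)
print(rr, math.exp(b1))  # the risk ratio equals exp(b1), whatever the value of Z2
```

The shared terms cancel in the ratio, which is why exp(β1) does not depend on the common value of Z2 (for this link function).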
First of all, regression models describe the association between covariates and intensities
or marginal parameters in multi-state models and insight may be gained from these asso-
ciations when trying to understand the development of the process. In this connection, it
is also of interest to compare estimates of β j across models with different levels of adjust-
ments, e.g., do we see similar associations with Z j with or without adjustment for other
covariates? Another major use of multi-state regression models is prediction, e.g., what is
the estimated risk of certain events for a subject with given characteristics? These aspects
will be further illustrated in the later chapters.
However, the proper answer to a scientific question posed need not be given by quoting a
coefficient from a suitable regression model, in which case other target parameters should
be considered. We will see that regression models are still useful ‘building blocks’ when
targeting alternative parameters. As an example of how a target parameter properly address-
ing the scientific question posed may be chosen, we can consider the PBC3 trial (Example
1.1.1). Here, the question of interest is whether treatment with CyA prolongs time to treat-
ment failure and, since the study was randomized, this may be answered by estimating and
comparing survival curves (S(t) – see Section 1.2.2) for the CyA and placebo groups. How-
ever, as we shall see in Section 2.2.1, randomization was not perfect and levels of important
prognostic variables (albumin and bilirubin) tended to be more beneficial in the placebo
group than in the CyA group. For this reason (but also influenced by non-collapsibility),
the estimated regression coefficients for treatment with or without adjustment for these two
variables will differ. Also, estimated survival curves for treated and control patients will
vary with their levels of albumin and bilirubin and it would be of interest to estimate one
survival curve for each treatment group that properly accounts for the covariate imbalance
between the groups. Such a parameter, the contrast (e.g., difference or ratio) between the
survival functions in the two groups had they had the same covariate distribution, may be
obtained using the g-formula (e.g., Hernán and Robins, 2020, ch. 13), which works by averaging
individually predicted curves over the observed distribution of albumin and bilirubin
(Z2 , Z3 ). Thus, two predictions are performed for each subject, i: One setting treatment (Z1 )
to CyA and one setting treatment to placebo and in both predictions keeping the observed
values (Z2i , Z3i ) for albumin and bilirubin. The predictions for each value of treatment are
then averaged over i = 1, . . . , n:

Ŝ j (t) = (1/n) ∑_{i=1}^{n} Ŝ(t | Z1 = j, Z2 = Z2i , Z3 = Z3i ),   j = CyA, placebo.    (1.5)
We will illustrate the use of the g-formula in later chapters. Here, a challenge will be to do
inference for the treatment contrast (i.e., to assess the uncertainty in the form of a confi-
dence interval) and, typically, a bootstrap procedure will be applied (e.g., Efron and Tib-
shirani, 1993, ch. 6).
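Equation (1.5) can be sketched as follows, with a purely hypothetical exponential regression model and made-up covariate values standing in for albumin and bilirubin; this is not the PBC3 analysis itself, only an illustration of the averaging step (in Python, whereas the book's code is in R and SAS):

```python
import math
import random

random.seed(3)

# Hypothetical fitted model (NOT the PBC3 model): rate of treatment failure
# exp(b0 + b1*Z1 + b2*Z2 + b3*Z3), with Z1 = 1 for CyA and 0 for placebo.
b0, b1, b2, b3 = -3.0, -0.3, -0.02, 0.01

def S_hat(t, z1, z2, z3):
    """Predicted survival function from the assumed exponential model."""
    return math.exp(-t * math.exp(b0 + b1 * z1 + b2 * z2 + b3 * z3))

# Made-up covariate values standing in for observed albumin (Z2) and bilirubin (Z3)
subjects = [(random.gauss(38, 4), random.expovariate(1 / 40)) for _ in range(200)]

def g_formula(t, z1):
    """Equation (1.5): average the individual predictions at treatment z1
    over the observed (Z2, Z3) distribution."""
    return sum(S_hat(t, z1, z2, z3) for z2, z3 in subjects) / len(subjects)

t = 24.0
print(g_formula(t, z1=1), g_formula(t, z1=0))  # standardized curves, CyA vs placebo
```

Each subject contributes two predictions, one under each treatment, and only the treatment indicator is switched; the covariate distribution over which the average is taken is the same for both curves.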
In the final chapter of the book (Section 7.3), we will also discuss under what circumstances
the resulting treatment contrast may be given a causal interpretation. There, we will also de-
fine what is meant by causal, discuss alternative approaches to causal inference, and under
which assumptions (including that of no unmeasured confounders) a causal interpretation
is possible.
A multi-state model is given by a number of different states that a subject can occupy
and the possible transitions between the states. The transitions represent the events
that may happen. Such a model can be depicted in a box-and-arrow diagram where
the transition intensities may be indicated (e.g., Figures 1.11 and 1.12).
These diagrams show the possible states in a completely observed population, i.e.,
censoring is not a state in the model. If one particular transition is of interest, then
other transitions in the multi-state model, possibly competing with it, are unavoidable
events that must be properly addressed in the analysis and should not be
treated as (potentially avoidable) censoring.
Corresponding to these states there are state occupation (or ‘occupancy’) probabilities
giving the marginal distribution over the states at time t, so we have for all t that ∑h Qh (t) =
1.
In the two-state model for survival data in Figure 1.1, Q0 (t) is the probability of still being
alive at time t, the survival function, often denoted S(t), and Q1 (t) = 1 − Q0 (t) is the failure
distribution function, F(t) = 1 − S(t). In Figure 1.2, Q0 (t) is also the survival function,
S(t), and Qh (t), h = 1, . . . , k are the cumulative incidence functions for cause h, i.e., the
probability Fh (t) of failure from cause h before time t. The probability Fh (t) is sometimes
referred to as a sub-distribution function as Fh (∞) < 1.
Another marginal parameter of interest, which may be obtained from the state occupation
probabilities, is the expected time spent in a given state (expected length of stay). For state
h, this is given by
εh (∞) = E(∫_0^∞ I(V (t) = h)dt) = ∫_0^∞ Qh (t)dt.    (1.9)
Since we have to deal with right-censoring, whereby information about the process V (t)
for large values of time t is limited, restricted means are often studied, i.e.,
εh (τ) = ∫_0^τ Qh (t)dt    (1.10)
for some suitable time threshold, τ < ∞. This is the expected time spent in state h in the
interval from 0 to τ. Since, for all t, ∑h∈S Qh (t) = 1 it follows that ∑h∈S εh (τ) = τ.
For the two-state model for survival data (Figure 1.1), ε0 (∞) is the expected life time E(T )
and ε0 (τ) is the τ-restricted mean life time E(T ∧ τ), the expected time lived before time
τ and, thus, ε1 (τ) = τ − ε0 (τ) is the expected time lost before time τ. For the competing
risks model (Figure 1.2), ε0 (τ) is the τ-restricted mean life time and, for h ≠ 0, εh (τ) is the
expected time lost due to cause h before time τ (see Section 5.1.2). For the disability model
(Figure 1.3), ε1 (τ) is the expected time lived with disability before time τ.
In the common situation where everyone is in the same state (0) at time t = 0 (i.e., P(V (0) =
0) = 1), the marginal distribution of the random variable

Th = inf{t > 0 : V (t) = h},    (1.11)

that is, the time of first entry into state h, h ≠ 0 (which may be infinite), may also be of
interest. For recurrent events, Th is the time until the hth occurrence of the event, e.g.,
the time from diagnosis to episode no. h for the psychiatric patients discussed in Section
1.1.5 (Figure 1.5). However, the most important marginal parameter for a recurrent events
process is the expected number of events in [0,t]
µ(t) = E(N(t)), (1.12)
where N(t) is the number of recurrent events in [0,t]. For the model in Figure 1.4, this is
the expected number of visits to state 1 in [0,t].
The parameters defined in this section are called marginal since, at time t, they involve no
conditioning on the past (V (s), s < t) (though they may involve time-fixed covariates).
In the two-state model (Figures 1.1 and 1.11), α01 (t) = α(t) is the hazard function

α(t) ≈ P(T ≤ t + dt | T > t)/dt

for the survival time T = T1 (time until entry into state 1). For the competing risks model,
Figure 1.2, α0h (t) is the cause-specific hazard

α0h (t) ≈ P(T ≤ t + dt, D = h | T > t)/dt,

where D = V (∞) is the cause of death and T = minh>0 Th is the survival time (time of exit
from state 0) with Th defined in (1.11). Both of these multi-state processes are Markovian.
The illness-death process of Figure 1.3 is non-Markovian if the intensity α12 (t) not only
depends on t but also on the time, d = t − T1 , spent in state 1 at time t.
The transition intensities are the most basic parameters of a multi-state model in the sense
that, in principle, if all transition intensities are specified, then all other parameters such as
state occupation probabilities, transition probabilities, and expected times spent in various
states may be derived. As we shall see in later chapters, the mapping from intensities to
other parameters is sometimes given by explicit formulas, though this depends both on the
structure of the states and possible transitions in the model and on the specific assumptions
(e.g., Markov or non-Markov) made for the intensities. Examples of such explicit formu-
las include the survival function (1.2) for the two-state model for survival data and, more
generally,

Q0 (t) = exp(−∫_0^t ∑_h α0h (u)du)
in the competing risks model (Figure 1.2). Also, the cause-h cumulative incidence function
in the competing risks model, Equation (1.3), and the probability of being in the interme-
diate ‘Diseased’ state in both the Markov and semi-Markov illness-death model (Figure
1.3) (formulas to be given in Sections 5.1.3 and 5.2.4) are explicit functions of the inten-
sities. A general way of going from intensities to marginal parameters (not building on a
mathematical expression) is to use micro-simulation (Section 5.4).
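To make the mapping from intensities to marginal parameters concrete, the following small Python sketch (ours; the book's own software is in R and SAS) evaluates Q0(t) and the cause-specific cumulative incidences in closed form, under the simplifying assumption of constant cause-specific hazards. The function name and hazard values are illustrative only.

```python
import math

def competing_risks_marginals(alphas, t):
    """For a competing risks model with constant cause-specific hazards
    alphas = [alpha_1, ..., alpha_k], return the state-0 occupation
    probability Q0(t) and the cause-specific cumulative incidences F_h(t)."""
    total = sum(alphas)
    q0 = math.exp(-total * t)              # Q0(t) = exp(-sum_h alpha_h * t)
    # With constant hazards, the cumulative incidence integral has the
    # closed form F_h(t) = (alpha_h / total) * (1 - exp(-total * t)).
    cuminc = [a / total * (1.0 - q0) for a in alphas]
    return q0, cuminc

q0, (f1, f2) = competing_risks_marginals([0.1, 0.3], t=2.0)
# The state occupation probabilities sum to 1 at any t.
assert abs(q0 + f1 + f2 - 1.0) < 1e-12
```

With non-constant intensities the integrals would be evaluated numerically, which is in the spirit of the micro-simulation approach mentioned above.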
Here, the transition intensity αh j (t) is some function of time t and the past history (Ht− )
for the interval [0,t),
Yh (t) = I(V (t−) = h) (1.19)
is the indicator for the subject of being observed to be in state h just before time t, and dNhj(t) (1.20) is the increment (0 or 1, the jump) for Nhj at time t. Since λhj(t) is fixed given Ht−,
Equation (1.18) implies that if we define

Mhj(t) = Nhj(t) − ∫₀ᵗ λhj(u)du,    (1.21)

then

E(dMhj(t) | Ht−) = 0,

from which it follows that the process Mhj(t) in (1.21) is a martingale, i.e., E(Mhj(t) | Hs) = Mhj(s) for s < t, see Exercise 1.4. The decomposition of the counting process in (1.21) into a martingale plus
the integrated intensity process (the compensator) is known as the Doob-Meyer decompo-
sition of Nh j (·). Since martingales possess a number of useful mathematical properties, in-
cluding approximate large-sample normal distributions (e.g., Andersen et al., 1993, ch. II),
this observation has the consequence that large-sample properties of estimators, estimating
equations, and test statistics may be derived when formulated via counting processes. We
will hint at this in later chapters.
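The defining property E(dM(t) | Ht−) = 0 can be illustrated by simulation. The following Python sketch (ours, not part of the book's material) simulates M(t) = N(t) − ∫₀ᵗ λ(u)du for the two-state survival model with a constant hazard and no censoring, where N(t) = I(T ≤ t) and the compensator is α · min(T, t); the empirical mean of M(t) should be close to 0.

```python
import random

random.seed(1)
alpha, t = 0.5, 2.0   # illustrative constant hazard and time point

def simulate_martingale(n=50000):
    """Monte Carlo estimate of E(M(t)) for M(t) = N(t) - alpha*min(T, t),
    the Doob-Meyer martingale of the two-state survival model."""
    total = 0.0
    for _ in range(n):
        T = random.expovariate(alpha)      # survival time with hazard alpha
        N_t = 1.0 if T <= t else 0.0       # counting process at time t
        compensator = alpha * min(T, t)    # integrated intensity process
        total += N_t - compensator
    return total / n

print(abs(simulate_martingale()))  # close to 0, since E(M(t)) = 0
```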
1.5 Exercises
Exercise 1.1 Consider the small data set in Table 1.6 and argue why both the average of all
(12) observation times and the average of the (7) uncensored times will likely underestimate
the true mean survival time from entry into the study.
Exercise 1.2
1. Consider the following records mimicking the Copenhagen Holter study (Example 1.1.8)
in wide format and transform them into long format, i.e., create one data set for each of
the possible transitions in Figure 1.7.
                  Time of
Subject   AF   stroke   death   last seen
   1      NA     NA      NA       100
   2      10     NA      NA        90
   3      NA     20      NA        80
   4      15     30      NA        85
   5      NA     NA      70        70
   6      30     NA      75        75
   7      NA     35      95        95
   8      25     50      65        65
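As a starting point for the exercise, the wide-to-long transformation can be sketched in Python (the book's material uses R and SAS). The sketch assumes the four-state structure suggested by the records, with state 0 (neither AF nor stroke), 1 (AF), 2 (stroke), and 3 (dead); the helper name `episodes` is ours.

```python
def episodes(af, stroke, death, last_seen):
    """Split one wide-format record into long-format rows
    (from-state, entry time, exit time, to-state); to-state None means
    censored. Assumes observed event times are ordered af < stroke < death."""
    jumps = [(t, s) for t, s in ((af, 1), (stroke, 2), (death, 3))
             if t is not None]
    rows, state, entry = [], 0, 0
    for t, nxt in jumps:
        rows.append((state, entry, t, nxt))
        state, entry = nxt, t
    if state != 3:                        # still alive: censored at last_seen
        rows.append((state, entry, last_seen, None))
    return rows

# Subject 8: AF at 25, stroke at 50, death at 65.
print(episodes(25, 50, 65, 65))
# -> [(0, 0, 25, 1), (1, 25, 50, 2), (2, 50, 65, 3)]
# Subject 1: no events, censored at 100.
print(episodes(None, None, None, 100))
# -> [(0, 0, 100, None)]
```

Collecting, for each transition h → j, the rows with from-state h (status 1 if to-state is j, 0 otherwise) gives one data set per possible transition.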
Exercise 1.4 (*) Argue (intuitively) how the martingale property E(M(t) | Hs ) = M(s)
follows from E(dM(t) | Ht− ) = 0 (Section 1.4.3).
Chapter 2
Intuition for Intensity Models
2.1.1 Nelson-Aalen estimator
We will use the counting process notation (where I(· · · ) is an indicator function, see Section
1.2)
dNi (t) = I(i has an event in the interval (t,t + dt))
and
Yi (t) = I(i is event-free and uncensored just before time t).
If we assume that censoring is independent (Equation (1.6)) and that all CyA treated pa-
tients have the same intensity α(t), then a natural estimator of the probability α(t)dt is the
fraction
(No. of patients with an event in (t, t + dt)) / (No. of patients at risk of an event just before time t) = dN(t)/Y(t).
Here, dN(t) = ∑i dNi (t) and Y (t) = ∑i Yi (t) are the total number of events at time t (typi-
cally 0 or 1) and the total number of subjects at risk at time t, respectively. This idea leads
to the Nelson-Aalen estimator for the cumulative hazard
A(t) = ∫₀ᵗ α(u)du,
as follows. Let 0 < X1 ≤ X2 ≤ · · · ≤ Xn be the ordered times of observation for the CyA
treated patients, i.e., an X is either an observed time of failure or a time of censoring,
whatever came first for a given subject. Then, for each such time, a term dN(X)/Y (X) is
added to the estimator, and the Nelson-Aalen estimator may then be written
Â(t) = dN(X1)/Y(X1) + dN(X2)/Y(X2) + · · · ,
where contributions from all times of observation ≤ t are added up. Since only observed
failure times (dN(X) = 1) effectively contribute to this sum (for a censoring at X, dN(X) =
0), the estimator may be re-written as
Â(t) = Σ_{event times X ≤ t} 1 / (No. at risk at X).
This estimates the cumulative hazard, and on a plot of Â(t) against t, the ‘approximate local
slope’ estimates the intensity at that point in time. To establish an interpretation of A(t), we
study the situation with survival data (Figure 1.1), where the following ‘experiment’ can be
considered: Assume that one subject is observed from time t = 0 until failure (at time X1 ,
say). At that failure time, replace the first subject by another subject who is still alive and
observe that second subject until failure (at time X2 > X1 ). At X2 , replace by a third subject
who is still alive and observe until failure (at X3 > X2 ), and so on. In that experiment A(t)
is the expected number of replacements in [0,t]. In particular, we may note that A(t) is not
a probability and its value may exceed the value 1.
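The estimator can be computed in a few lines; the following Python sketch (ours; the book's code is in R and SAS) adds dN(X)/Y(X) over the ordered observation times, handling ties by removing all subjects observed at a given time from the risk set afterwards.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen estimate of the cumulative hazard A(t).
    times: observation times X_i; events: 1 = failure, 0 = censoring.
    Returns the jump points as (event time, A-hat) pairs."""
    data = sorted(zip(times, events))
    n = len(data)
    at_risk = n
    A, curve = 0.0, []
    i = 0
    while i < n:
        t = data[i][0]
        d, j = 0, i
        while j < n and data[j][0] == t:   # count failures tied at time t
            d += data[j][1]
            j += 1
        if d > 0:
            A += d / at_risk               # the increment dN(t)/Y(t)
            curve.append((t, A))
        at_risk -= (j - i)                 # all observed at t leave the risk set
        i = j
    return curve
```

For example, `nelson_aalen([1, 2, 3], [1, 0, 1])` jumps by 1/3 at time 1 (3 at risk) and by 1 at time 3 (1 at risk), the censoring at time 2 contributing nothing.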
The standard deviation (or standard error) of the estimator Â(t) (which we will throughout abbreviate SD) can also be estimated, whereby confidence limits can be added to plots of Â(t), preferably by transforming symmetric limits for log(Â(t)). This amounts to a lower 95% confidence limit for A(t) of Â(t) exp(−1.96 · SD/Â(t)) and an upper limit of Â(t) exp(1.96 · SD/Â(t)). The SD for the cumulative hazard estimate will increase most markedly during periods at which few subjects are at risk, i.e., when Y(t) is low.

[Figure 2.1: Nelson-Aalen estimates of the cumulative hazard by treatment group (placebo, CyA); time since randomization (years).]
Figure 2.1 shows the Nelson-Aalen estimates for the two treatment groups, CyA and
placebo, from the PBC3 trial. It is seen that the curves for both treatment groups are
roughly linear (suggesting that the hazards are roughly constant) and that they are quite similar (suggesting that CyA treatment does not affect the rate of failure from medical treatment in patients with PBC). This can be emphasized by estimating the SD of Â(t). Thus,
at 2 years the estimates are, respectively, 0.183 (SD=0.035) in the placebo group and 0.167
(SD=0.034) in the CyA group, leading to the 95% confidence intervals (0.126, 0.266), re-
spectively (0.112, 0.249). To simplify the figure, confidence limits have not been added to
the curves.
In each such interval, the value of the hazard may then be estimated by an occurrence/exposure rate obtained as the ratio between the number of events occurring in that interval and the total (‘exposure’) time at risk in the interval. Thus, if αℓ is the hazard in interval no. ℓ, then it is estimated by

α̂ℓ = (No. of events in interval ℓ) / (Total time at risk in interval ℓ) = Dℓ / Yℓ.
Note that the hazard has a per time dimension and that, therefore, whenever a numerical
value of a hazard is quoted, the units in which time is measured should be given.
For the PBC3 data, working with two-year intervals of follow-up time, the resulting event and person-time counts together with the resulting estimated rates are shown in Table 2.1 together with their estimated standard deviations, SD = √Dℓ / Yℓ, and depicted in Figure 2.2.
It is seen that, judged from the SD values, the estimated hazards are, indeed, quite constant
over time and between the treatment groups. The SD is smaller when the event count is
large.
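The occurrence/exposure calculation can be sketched as follows (our Python illustration; the interval cut-points are supplied by the user, e.g., `[2, 4]` for two-year bands).

```python
import math

def occ_exp_rates(times, events, cuts):
    """Piece-wise constant hazard estimates alpha_l = D_l / Y_l.
    cuts define intervals [0,c1), [c1,c2), ..., [c_s, infinity).
    Returns one (D_l, Y_l, rate, SD) tuple per interval, SD = sqrt(D_l)/Y_l."""
    bounds = [0.0] + list(cuts) + [math.inf]
    out = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # D_l: events observed in [lo, hi)
        D = sum(1 for t, d in zip(times, events) if d == 1 and lo <= t < hi)
        # Y_l: total time at risk spent in [lo, hi) by all subjects
        Y = sum(max(0.0, min(t, hi) - lo) for t in times)
        rate = D / Y if Y > 0 else float("nan")
        sd = math.sqrt(D) / Y if Y > 0 else float("nan")
        out.append((D, Y, rate, sd))
    return out
```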
Figure 2.3 compares for the placebo group the estimated cumulative hazards using the two
different models (a step function for the non-parametric model and a broken straight line
for the piece-wise exponential model) and the two sets of estimates are seen to coincide
well.
Figure 2.2 PBC3 trial in liver cirrhosis: Estimated piece-wise exponential hazard functions by treat-
ment group, see Table 2.1.
The logrank test considers each observed time of failure, X (in either group, 0 or 1), setting up a two-by-two table summarizing the observations at that time, see Table 2.2.
Across the tables for different X, the observed, dN0(X), and expected (under the hypothesis of identical hazards in groups 0 and 1),

( Y0(X) / (Y0(X) + Y1(X)) ) · (dN0(X) + dN1(X)),

numbers of failures from one group (here group 0) are added. Denote the resulting sums by O0 and E0, respectively. Also the variances

( Y0(X)Y1(X) / (Y0(X) + Y1(X))² ) · (dN0(X) + dN1(X))

(if all failure times are distinct) are added across the tables to give v. Note that only observed
failure times (in either group) effectively contribute to these sums. The two-sample logrank test statistic is then

(O0 − E0)² / v.

Figure 2.3 PBC3 trial in liver cirrhosis: Estimated cumulative hazards for the placebo group.
For the PBC3 trial, the observed number of failures in the placebo group is O0 = 46, the
expected number is E0 = 44.68, and the variance v = 22.48 leading to a logrank test statistic
of 0.08 and the P-value 0.78. Note that the same results would be obtained by focusing,
instead, on the CyA group (because O0 + O1 = E0 + E1 = total number of observed failures
and, therefore, (O0 − E0)² = (O1 − E1)²).
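A minimal Python sketch of the two-sample logrank computation (ours; the book's code is in R and SAS), assuming, as in the variance formula above, that all failure times are distinct:

```python
def logrank(times, events, groups):
    """Two-sample logrank statistic (O0 - E0)^2 / v.
    events: 1 = failure, 0 = censoring; groups: 0 or 1.
    Assumes all failure times are distinct."""
    failure_times = sorted(t for t, d in zip(times, events) if d == 1)
    O0 = E0 = v = 0.0
    for x in failure_times:
        # Numbers at risk just before x in each group
        y0 = sum(1 for t, g in zip(times, groups) if t >= x and g == 0)
        y1 = sum(1 for t, g in zip(times, groups) if t >= x and g == 1)
        # Failures at x in each group
        d0 = sum(1 for t, d, g in zip(times, events, groups)
                 if t == x and d == 1 and g == 0)
        d1 = sum(1 for t, d, g in zip(times, events, groups)
                 if t == x and d == 1 and g == 1)
        d, y = d0 + d1, y0 + y1
        O0 += d0                        # observed failures, group 0
        E0 += y0 / y * d                # expected under identical hazards
        v += y0 * y1 / (y * y) * d      # variance contribution
    return (O0 - E0) ** 2 / v
```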
The logrank test can be extended to a comparison of more than two groups where, for comparison of k groups, the resulting test statistic is evaluated in the χ²_{k−1}-distribution (e.g., Collett, 2015, ch. 2). Also a stratified version of the logrank test is available.
Cox model
A Cox model for the PBC3 trial would assume that the hazard for the placebo group is α0 (t)
and no assumptions concerning this hazard are imposed (it is a completely unspecified non-
parametric, baseline hazard function). On the other hand, the hazard function (say, α1 (t))
for the CyA group is assumed to be proportional to the baseline hazard, i.e., there exists a
constant hazard ratio, say HR, such that, for all t,

α1(t) = HR · α0(t),

i.e.,

α1(t)/α0(t) = HR.
This specifies a regression model because, for each patient (i) in the PBC3 trial, we can
define an explanatory variable (or covariate) Zi, as follows,

Zi = 0 if patient i was in the placebo group
     1 if patient i was in the CyA group

and then the Cox model for the hazard for patient i is

αi(t) = α0(t)       if patient i was in the placebo group
        α0(t) · HR  if patient i was in the CyA group.
Thus, the proportionality assumption is the same as a constant difference between the
log(hazards) at any time t. Figure 2.4 illustrates the proportional hazards assumption for a
binary covariate Z, both on the hazard scale (a) and the log(hazard) scale (b).
Because the Cox model combines a non-parametric baseline hazard with a parametric spec-
ification of the covariate effect, it is often called semi-parametric.
Figure 2.4 Illustrations of the assumptions for a Cox model for a binary covariate Z.
REGRESSION MODELS 43
To estimate the hazard ratio, HR (or the regression coefficient β = log(HR)), the Cox log-partial likelihood function l(β) is maximized. This is

l(β) = Σ_{event times X} log( exp(β Zevent) / Σ_{j at risk at time X} exp(β Zj) )    (2.1)
and the intuition behind this is, as follows. At any event time, X, the covariate value, say
Zevent , for the individual with an event at that time ‘is compared’ to that of the subjects ( j)
who were at risk for an event at that time (i.e., still event-free and uncensored, including
the failing subject). Thus, if ‘surprisingly often’, the individual having an event is placebo-
treated compared to the distribution of the treatment variable Z j among those at risk, then
this signals that placebo treatment is a risk factor. The set of subjects who are at risk of the
event at a given time t is denoted the risk set, R(t).
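The log partial likelihood (2.1) for a single covariate can be written down and maximized directly. The following Python sketch (ours, for illustration) exploits that l(β) is concave in β and locates the maximum by a simple ternary search rather than Newton's method.

```python
import math

def cox_loglik(beta, times, events, z):
    """Cox log partial likelihood (2.1) for a single covariate,
    assuming no tied failure times."""
    ll = 0.0
    for i, (ti, di) in enumerate(zip(times, events)):
        if di == 1:
            # Risk set at ti: subjects still event-free and uncensored
            risk = sum(math.exp(beta * zj)
                       for tj, zj in zip(times, z) if tj >= ti)
            ll += beta * z[i] - math.log(risk)
    return ll

def cox_fit(times, events, z, lo=-10.0, hi=10.0, iters=100):
    """Maximize the concave l(beta) over [lo, hi] by ternary search."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if cox_loglik(m1, times, events, z) < cox_loglik(m2, times, events, z):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2
```

In practice one would use an established routine (e.g., `coxph` in the R `survival` package) which also provides model-based standard deviations.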
For the PBC3 data, a Cox model including the treatment indicator, Z, yields an estimated regression coefficient of β̂ = −0.059 with an estimated standard deviation of 0.211, leading to an estimated hazard ratio of exp(−0.059) = 0.94 with 95% confidence limits from 0.62 (= exp(−0.059 − 1.96 · 0.211)) to 1.43 (= exp(−0.059 + 1.96 · 0.211)). This contains
the null value HR = 1, in accordance with the logrank test statistic. The estimated SD is
known as a model-based standard deviation since it follows from the likelihood function
l(β). In the Cox model, the cumulative baseline hazard may be estimated using a ‘Nelson-Aalen-like’ estimator, known as the Breslow estimator:

Â0(t) = Σ_{event times X ≤ t} 1 / ( Σ_{j at risk at time X} exp(β̂ Zj) ).    (2.2)

For the PBC3 data, Â0(t) is the cumulative hazard in the placebo group, and the estimate
is shown in Figure 2.5. Note that, compared to Figure 2.1, there are many more steps in
the Breslow estimate. This is because all event times, i.e., in either treatment group, give
rise to a jump in the baseline hazard estimator. The intuition is that, due to the proportional
hazards assumption, an event for a CyA treated patient also contains information about the
hazard in the placebo group.
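The Breslow estimator (2.2) can be sketched similarly (our Python illustration); note that with β̂ = 0 it reduces to the Nelson-Aalen estimator.

```python
import math

def breslow(times, events, z, beta_hat):
    """Breslow estimate (2.2) of the cumulative baseline hazard: a jump of
    1 / sum_{j at risk} exp(beta_hat * Z_j) at each observed failure time."""
    jumps = []
    A = 0.0
    for ti, di in sorted(zip(times, events)):
        if di == 1:
            denom = sum(math.exp(beta_hat * zj)
                        for tj, zj in zip(times, z) if tj >= ti)
            A += 1.0 / denom
            jumps.append((ti, A))
    return jumps
```

This makes explicit why all event times, in either treatment group, produce a jump: every failure contributes a term, with at-risk subjects weighted by exp(β̂ Zj).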
0.4
0.2
0.0
0 1 2 3 4 5 6
Time since randomization (years)
Figure 2.5 PBC3 trial in liver cirrhosis: Breslow estimate for the cumulative baseline hazard in a
Cox model including only treatment.
More generally, a Cox model may include p covariates, αi(t) = α0(t) exp(β1 Z1 + · · · + βp Zp), where the regression coefficients have the following interpretation (see Section 1.2.5). Consider two subjects differing 1 unit for covariate j and having identical values for the remaining covariates in the model. Then the ratio between their hazards at any time t is
α0(t) exp(β1 Z1 + · · · + βj−1 Zj−1 + βj(Zj + 1) + · · · + βp Zp) / α0(t) exp(β1 Z1 + · · · + βj−1 Zj−1 + βj Zj + · · · + βp Zp) = exp(βj).
Thus, exp(β j ) is the hazard ratio for a 1 unit difference in Z j at any time t and for given
values of the remaining covariates in the model. Furthermore, the baseline hazard α0 (t) is
the hazard function for subjects where the linear predictor equals 0.
Table 2.3 PBC3 trial in liver cirrhosis: Average covariate values by treatment group.
If, for the PBC3 data, we add the covariates Z2 = albumin and Z3 = bilirubin to the model including only the treatment indicator, then the results in Table 2.4 are obtained.
It is seen that, for given value of the variables albumin and bilirubin, the log(hazard ratio)
comparing CyA with placebo is now numerically much larger and with a 95% confidence
interval for exp(β1 ) which is (0.391, 0.947) and, thus, excludes the null value. The interpre-
tation of the coefficient for albumin is that the hazard ratio when comparing two subjects
differing 1 g/L is exp(−0.116) = 0.89 for given values of treatment and bilirubin.
Table 2.4 PBC3 trial in liver cirrhosis: Estimated coefficients (and SD) from a Cox model.

Covariate                      β̂          SD
Treatment, CyA vs. placebo     −0.496      0.226
Albumin, per 1 g/L             −0.116      0.021
Bilirubin, per 1 µmol/L         0.00895    0.00098
Table 2.5 PBC3 trial in liver cirrhosis: Estimated coefficients (and SD) from a Poisson regression model.

Covariate                      β̂          SD
Treatment, CyA vs. placebo     −0.475      0.224
Albumin, per 1 g/L             −0.112      0.021
Bilirubin, per 1 µmol/L         0.00846    0.00094
Poisson model
In a similar way, a multiplicative regression model can be obtained with the starting point
being the piece-wise constant hazards model. For the PBC3 data, the model including only
treatment is
αi (t) = α0 (t) exp(β Zi ),
but now the baseline hazard, instead of being completely unspecified as in the Cox model,
is assumed to be constant in, e.g., 2-year intervals of follow-up time:

α0(t) = α1 if t < 2,
        α2 if 2 ≤ t < 4,
        α3 if 4 ≤ t.
With more covariates, the hazard is αi(t) = α0(t) exp(LP), where the linear predictor LP is given by Equation (1.4) and the baseline hazard α0(t) (the hazard function when LP = 0) is assumed piece-wise constant. For the PBC3 data, adding
albumin and bilirubin to the model yields the estimates shown in Table 2.5 which are seen
to be quite close to those from the similar Cox model (Table 2.4).
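Fitting the piece-wise constant model in practice requires splitting each subject's follow-up into one record per time-interval (person-period data), from which the Poisson regression is computed. A Python sketch of this split (ours; the interval cut-points are arguments):

```python
def split_episodes(time, event, cuts):
    """Split one subject's follow-up [0, time] into the intervals defined by
    cuts (e.g., [2, 4] for 2-year bands), for piece-wise exponential /
    Poisson modeling. Returns (interval index, exposure time, event) rows."""
    bounds = [0.0] + list(cuts)
    rows = []
    for k, lo in enumerate(bounds):
        hi = bounds[k + 1] if k + 1 < len(bounds) else float("inf")
        if time <= lo:
            break                      # follow-up ended before this interval
        exposure = min(time, hi) - lo  # time at risk spent in [lo, hi)
        d = 1 if (event == 1 and lo < time <= hi) else 0
        rows.append((k, exposure, d))
    return rows

# A subject failing at time 5 with cuts [2, 4] contributes 2 + 2 + 1
# years of exposure, with the event falling in the third interval.
print(split_episodes(5, 1, [2, 4]))
# -> [(0, 2.0, 0), (1, 2.0, 0), (2, 1.0, 1)]
```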
2.2.2 Modeling assumptions
Whenever the effect on some outcome of several explanatory variables is obtained by com-
bining the variables into a linear predictor, some assumptions are imposed:
• The effect of a quantitative covariate on the linear predictor is linear.
• For each covariate, its effect on the linear predictor is independent of other variables’
effects, i.e., there are no interactions between the covariates.
Since these assumptions are standard in models with a linear predictor, there are standard
ways of checking them. Thus, as discussed, e.g., by Andersen and Skovgaard (2006, ch.
4-5), to check linearity, extended models including non-linear effects, such as splines or
polynomials, may be fitted and compared statistically to the simple model with a linear
effect. To examine interactions, interaction terms may be added to the linear predictor and
the resulting models may be compared statistically to the simple additive model.
We will exemplify goodness-of-fit investigations using the data from the PBC3 trial.
Checking linearity
We will illustrate how to examine linearity of a quantitative covariate, Z, using either a
quadratic effect or linear splines. Both in the Cox model and in the Poisson model, either
the covariate Z 2 or covariates of the form
Z j = (Z − a j ) · I(Z > a j ), j = 1, . . . , s
may be added to the linear predictor. Here, the covariate values a1 < · · · < as are knots
to be selected. If no particular clinically relevant cut-points are available, then one would
typically use certain percentiles as knots. The spline covariate Z j gives, for subjects who
have a value of Z that exceeds the knot a j , how much the value exceeds a j . For subjects
with Z ≤ a j , the spline covariate is Z j = 0 and the linear predictor now depends on Z as
a broken straight line. Here, the interpretation of coefficient no. j is the change in slope
at the knot a j , and the coefficient for Z is the slope below the first knot, a1 . Linearity,
therefore, corresponds to the hypothesis that all coefficients for the added spline covariates
are equal to zero. In a model with a quadratic effect, i.e., including both Z and Z 2 , the
corresponding coefficients (say, β1 and β2 ) do not, themselves, have particularly simple
interpretations. However, the fact that a positive β2 suggests that the best fitting quadratic
curve for the covariate is a convex (‘happy’) parabola, while a negative β2 suggests that the
best fitting parabola for the covariate is concave (‘bad tempered’) does give some insight
into the dose-response relationship between the linear predictor and Z. In both cases, the
extreme point for the parabola (a minimum if β2 > 0 and a maximum if β2 < 0) corresponds
to Z = −β1 /(2β2 ), a fact that may give further insight.
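The spline covariates and the extreme point of the quadratic can be computed as follows (our Python illustration; the knot values below mimic multiples of the upper normal limit for bilirubin, 17.1 µmol/L):

```python
def linear_spline_basis(z, knots):
    """Linear spline covariates Z_j = (Z - a_j) * I(Z > a_j)
    for knots a_1 < ... < a_s."""
    return [max(0.0, z - a) for a in knots]

def parabola_extreme(b1, b2):
    """Extreme point -b1/(2*b2) of the quadratic contribution
    b1*Z + b2*Z^2 (a minimum if b2 > 0, a maximum if b2 < 0)."""
    return -b1 / (2.0 * b2)

# Knots at 1x, 2x, 3x the upper normal limit for bilirubin:
print(linear_spline_basis(40.0, [17.1, 34.2, 51.3]))
```

Adding the spline covariates to the linear predictor and testing that their coefficients are zero is then exactly the linearity check described above.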
For albumin, there is a normal range from 25 g/L and up, and we choose s = 1 knot placed at a1 = 25. For bilirubin, the normal range is 0 to 17.1 µmol/L and we let s = 3 and a1 = 17.1,
a2 = 2×17.1, and a3 = 3×17.1. Table 2.6 shows the results for both the Cox model and the
Poisson model. It is seen that, for albumin, there is no evidence against linearity in either
model which is also illustrated in Figure 2.6 where the estimated linear predictors under
linearity and under the spline model are shown for the Poisson case.
Table 2.6 PBC3 trial in liver cirrhosis: Estimated coefficients (and SD) from Cox and Poisson mod-
els modeling the effects of albumin, bilirubin, and log2 (bilirubin) using linear splines (S) or as
quadratic (Q); LRT denotes the appropriate likelihood ratio test for linearity. All models included
albumin and bilirubin (modeled as described) and treatment.
For bilirubin, however, linearity describes the relationship quite poorly as illustrated both by
the likelihood ratio tests and Figure 2.7 (showing the linear predictors for the Poisson model
under linearity and with linear splines). Both the negative coefficients from the models
with quadratic effects and this figure suggest that the effect of bilirubin should rather be
modeled as some concave function. The maximum point for the parabola corresponds to a
bilirubin value of 0.0200/(2 · 0.000031) = 322.6 which is compatible with the figure. The
concave curve could be approximated by a logarithmic curve and Table 2.6 (and Figure 2.8)
show the results after a log2 -transformation and using the same knots. It should be noticed
that any logarithmic transformation would have the same impact on the results, and we
chose the log2 -transformation because it enhances the interpretation, as will be explained
in what follows. Since the linear spline has no systematic deviations from a straight line,
linearity after log-transformation is no longer contraindicated, and Table 2.7 shows the
Figure 2.6 PBC3 trial in liver cirrhosis: Linear predictor as a function of albumin in two Poisson
models. Both models also included treatment and bilirubin.
Table 2.7 PBC3 trial in liver cirrhosis: Estimated coefficients (and SD) from Cox and Poisson mod-
els with linear effects of albumin and log2 (bilirubin).
estimates from Cox and Poisson models including treatment, albumin and log2 (bilirubin).
The interpretation of the Cox-coefficient for the latter covariate is that the hazard increases
by a factor of about exp(0.665) = 1.94 when comparing two subjects where one has twice
the value of bilirubin compared to the other (and similarly for the coefficient from the
Poisson model).
Checking interactions
In the models from Table 2.7, we will now study potential treatment-covariate interactions.
In Table 2.8, interactions between treatment and, in turn, albumin and log2 (bilirubin) have
been introduced, as follows. The covariate Z (albumin or log2 (bilirubin)) is replaced by two
covariates:
Z(0) = Z if treatment is placebo,
       0 if treatment is CyA,
Figure 2.7 PBC3 trial in liver cirrhosis: Linear predictor as a function of bilirubin in two Poisson
models. Both models also included treatment and albumin.
Figure 2.8 PBC3 trial in liver cirrhosis: Linear predictor as a function of log2 (bilirubin) in two
Poisson models. Both models also included treatment and albumin.
Table 2.8 PBC3 trial in liver cirrhosis: Cox and Poisson models with examination of interaction
between treatment and albumin or log2 (bilirubin); LRT denotes the likelihood ratio test for the
hypothesis of no interaction.
and

Z(1) = Z if treatment is CyA,
       0 if treatment is placebo.
The model, additionally, includes a main effect of treatment. However, since the interpre-
tation of this is the hazard ratio for treatment when Z = 0, its parameter estimate was not
included in the table. From Table 2.8 it is seen that, for both models, the interactions are
quite small both judged from the separate coefficients in the two treatment groups and the
corresponding likelihood ratio tests. If a more satisfactory interpretation of the main effect
of treatment is required, then the Z in the definition of the interaction covariates Z(0) and
Z(1) can be replaced by a centered covariate, Z − Z̄, where Z̄ is, e.g., an average Z-value. In
that case, the main effect of treatment is the hazard ratio at Z = Z̄. Since centering does not
change the coefficients for Z(0) and Z(1) , we did not make this modification in the analysis.
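Constructing the interaction covariates Z(0) and Z(1), optionally centered, can be sketched as follows (our Python illustration):

```python
def interaction_covariates(z, treated, center=0.0):
    """Split covariate Z into Z(0) (placebo rows) and Z(1) (CyA rows) to
    allow treatment-specific effects. Optional centering Z - Z_bar changes
    only the interpretation of the treatment main effect, not the
    coefficients for Z(0) and Z(1)."""
    zc = z - center
    z0 = zc if not treated else 0.0   # Z(0): covariate value, placebo only
    z1 = zc if treated else 0.0       # Z(1): covariate value, CyA only
    return z0, z1

# A CyA-treated subject with albumin 40 g/L:
print(interaction_covariates(40.0, treated=True))
# -> (0.0, 40.0)
```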
For the Cox model, an alternative to assuming proportional hazards for treatment is to stratify by treatment, leading to the stratified Cox model

αi(t) = αj0(t) exp(LPi)    (2.3)

for subject i in stratum j. Here, the linear predictor no longer includes treatment and the stratum is j = 0 for placebo
treated patients and j = 1 for patients from the CyA group. The effect of treatment is
via the two separate baseline hazards α00 (t) for placebo and α10 (t) for CyA, and these
two baseline hazards are not assumed to be proportional. Rather, like the baseline hazard
in the unstratified Cox model, they are completely unspecified. Figure 2.9 illustrates the
assumptions behind the stratified Cox model for two strata and one binary covariate Z both
on the hazard and the log(hazard) scale.
By estimating the cumulative baseline hazards A00(t) and A10(t) separately, the proportional hazards assumption may be investigated. This may be done graphically by plotting Â10(t) against Â00(t) where, under proportional hazards, the resulting curve should be close to a straight line through the point (0, 0) with a slope equal to the hazard ratio for treatment.
Note that, in Equation (2.3), the effect of the linear predictor (i.e., of the variables albumin
and log2 (bilirubin)) is the same in both treatment groups. Inference for this model builds on
a stratified Cox log partial likelihood where there are separate risk sets for the two treatment
groups (details to be given in Section 3.3).
We fitted the stratified Cox model to the PBC3 data which resulted in coefficients (SD)
−0.090 (0.022) for albumin and 0.663 (0.075) for log2 (bilirubin) close to what we have
seen before. Figure 2.10 shows the goodness-of-fit plot and suggests that proportional haz-
ards for treatment fits the PBC3 data well. The slope of the straight line in the plot is the
estimated hazard ratio for treatment exp(−0.574) found in the unstratified model. Similar
investigations could be done for albumin and log2 (bilirubin); however, for these quantitative
covariates, one would need a categorization in order to create the strata. For such covari-
ates, other ways of examining proportional hazards are better suited and will be discussed
in Sections 3.7 and 5.7.
Figure 2.9 Illustrations of the assumptions for a stratified Cox model for two strata and one binary
covariate Z.
Figure 2.10 PBC3 trial in liver cirrhosis: Cumulative baseline hazard for CyA plotted against
cumulative baseline hazard for placebo in a stratified Cox model. The straight line has slope
exp(−0.574) = 0.563.
The close agreement between the results from the Cox and the Poisson models may be expected to hold quite generally (depending, though, to some extent on how the time-intervals for the Poisson model are chosen). This is because any (hazard) function may be approximated by
a piece-wise constant function. Given this fact, the choice between the two types of model
is rather a matter of convenience. Some pros and cons may be given:
• In the Cox model a choice of time-intervals is not needed.
• For the Poisson model, estimates of the time-variable are given together with covariate
estimates. In the Cox model, the (cumulative) baseline hazard needs special considera-
tion.
• In the Poisson model, examination of proportional hazards is an integrated part of the
analysis requiring no special techniques.
• Some problems may involve several time-variables (e.g., Example 1.1.4). For the Cox
model, one of these must be selected as the ‘baseline’ time-variable, and the others can
then be included as time-dependent covariates (Section 3.7). For the Poisson model,
several (categorized) time-variables may be accounted for simultaneously.
• The Poisson model with categorical covariates may be fitted to a tabulated (and typically
much smaller) data set (Sections 3.4 and 3.6.4).
2.2.4 Additive regression models
Both the Cox model and the Poisson model resulted in hazard ratios as measures of the
association between a covariate and the hazard function. Furthermore, we saw in the PBC3
example (and tried to argue beyond that study in Section 2.2.3) that these two multiplicative
models were so closely related that the resulting hazard ratios from either would be similar.
However, other hazard regression models exist and may sometimes provide a better fit to
a given data set and/or provide estimates with a more useful and direct interpretation. One
such class of models is the class of additive hazard models among which the Aalen model
(Aalen, 1989) is the most frequently used. In this model, the hazard function for a subject
with covariates (Z1, Z2, . . . , Zp) is given by the sum

αi(t) = α0(t) + β1(t)Z1 + · · · + βp(t)Zp = α0(t) + LP(t).

Here, both the baseline hazard α0(t) and the regression functions β1(t), . . . , βp(t) are un-
specified functions of time, t. The interpretation of the baseline hazard, like the baseline
hazard α0 (t) in the Cox model, is the hazard function for a subject with LP(t) = 0, while
the value, β j (t), of the jth regression function is the hazard difference at time t for two sub-
jects who differ by 1 unit in their values for Z j and have identical values for the remaining
covariates (see also Section 1.2.5). In the Cox model, the cumulative baseline hazard could be esti-
mated using the Breslow estimator. Likewise, in the Aalen model the cumulative baseline
hazard and the cumulative regression functions
A0(t) = ∫₀ᵗ α0(u)du,  B1(t) = ∫₀ᵗ β1(u)du,  . . . ,  Bp(t) = ∫₀ᵗ βp(u)du
can be estimated. More specifically, the change in (A0 (t), B1 (t), . . . , B p (t)) at an observed
event time, X, is estimated by multiple linear regression. The subjects who enter this linear
regression are those who are at risk at time X, the outcome is 1 for the subject with an
event and 0 for the others, and this outcome is regressed linearly on the covariates for those
subjects. The resulting estimators Â0(t), B̂1(t), . . . , B̂p(t) at time t are obtained by adding up the estimated changes for event times X ≤ t. Since the estimated change at an event time X need not be positive, plots of the estimates B̂j(t) against t need not be increasing.
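One step of this estimation scheme can be sketched for p = 1 covariate (our Python illustration): at an event time, the increments (dA0, dB1) are the intercept and slope of an ordinary least squares regression of the 0/1 outcome on Z among the subjects at risk.

```python
def aalen_increment(at_risk_z, event_index):
    """One Aalen-model update at an event time, for p = 1 covariate:
    regress the outcome (1 for the failing subject, 0 otherwise) on (1, Z)
    among the subjects at risk; the fitted intercept and slope are the
    increments (dA0, dB1) of the cumulative baseline hazard and the
    cumulative regression function."""
    n = len(at_risk_z)
    y = [1.0 if i == event_index else 0.0 for i in range(n)]
    zbar = sum(at_risk_z) / n
    ybar = sum(y) / n                        # equals 1/n
    szz = sum((z - zbar) ** 2 for z in at_risk_z)
    szy = sum((z - zbar) * (yi - ybar) for z, yi in zip(at_risk_z, y))
    dB1 = szy / szz                          # slope: hazard-difference increment
    dA0 = ybar - dB1 * zbar                  # intercept: baseline increment
    return dA0, dB1
```

For a binary Z this reproduces the group-wise Nelson-Aalen increments: with two subjects at risk in each group and an event in the Z = 1 group, `aalen_increment([0, 0, 1, 1], 2)` gives dA0 = 0 and dB1 = 1/2.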
Figure 2.11 shows both the estimated cumulative baseline hazard and the estimated cu-
mulative treatment effect in an Aalen model for the PBC3 data including treatment as the
only covariate. The estimated treatment effect is equipped with 95% point-wise confidence
limits. It is seen that the cumulative baseline hazard is roughly linear suggesting (as we
have seen in previous analyses) that the baseline hazard is roughly constant. In this model
including only one binary explanatory variable, the estimated cumulative baseline hazard
Figure 2.11 PBC3 trial in liver cirrhosis: Estimated cumulative baseline hazard and cumulative
regression function for treatment (with 95% point-wise confidence limits) in an additive hazard
model.
is the Nelson-Aalen estimate in the placebo group, cf. Figure 2.1. The cumulative treatment effect, B̂1(t), is (judged from the confidence limits) close to 0, still in accordance with
previous analyses. The estimator in this model is the difference between the Nelson-Aalen
estimators for the CyA group and the placebo group, see Figure 2.1. A significance test
derived from the model confirms that there is no evidence against a null effect of treatment
in this model (P = 0.75).
It is seen that the Aalen model is very flexible including completely unspecified (non-
parametric) baseline hazard and covariate effects and that the estimates from the model are
entire curves that may be hard to communicate (though exp(−B1 (t)) is the ratio between
survival functions, see Equation (1.2)). It is, thus, of interest to simplify the model, e.g.,
by restricting the regression functions to be time-constant (e.g., Martinussen and Scheike,
2007, ch. 5). The hypothesis of a time-constant hazard difference β1(t) = β1 may also be tested within the model and results in a P-value of 0.62 and an estimate β̂1 = −0.0059 (SD = 0.021) per year corresponding to P = 0.78. Note that this coefficient has a ‘per time’
dimension relating to the units in which the time-variable was recorded. Thus, if somehow
10,000 person-years were collected for both the treated group and for the control group
then, according to this estimate, 59 fewer treatment failures are expected in the treated
group.
The simple additive model with a time-varying treatment effect may be extended with more
covariates like albumin and bilirubin. This leads to the estimated cumulative regression
functions shown in Figure 2.12. To interpret such a curve one should (as it was the case for
the Nelson-Aalen estimator) focus on its local slope which at time t is the approximate haz-
ard difference at that time when comparing subjects differing by one unit of the covariate.
It is seen that these slopes are generally negative for treatment and albumin and positive for
bilirubin in accordance with earlier results. To enhance readability of the figure, confidence
limits have been excluded. However, significance tests for the three regression functions
(Table 2.10) show that both of the biochemical variables are quite significant, but treatment
is not. Inspection of Figure 2.12 suggests that the regression functions are roughly constant
56 INTUITION FOR INTENSITY MODELS
Figure 2.12 PBC3 trial in liver cirrhosis: Estimated cumulative regression functions for treatment,
albumin, and bilirubin in an additive hazard model.
(roughly linear estimated cumulative effects) and this is also what formal significance tests
indicate (Table 2.10).
Even though there is no evidence against a constant effect for any of the three covariates, we first consider a model with a constant effect of treatment (Z1 ) and time-varying effects of albumin (Z2 ) and bilirubin (Z3 )

α(t) = α0 (t) + β1 Z1 + β2 (t)Z2 + β3 (t)Z3 .
Here, the estimated hazard difference for treatment is βb1 = −0.040 per year (0.020)
P = 0.05. This model imposes the assumptions for the linear predictor of linear ef-
fects of albumin and bilirubin and no interactions between the included covariates – now
Table 2.10 PBC3 trial in liver cirrhosis: P-values and estimated coefficients (and SD) from additive
hazard models including treatment and linear effects of albumin and bilirubin.
             P-value for         P-value for       Estimated constant effect per year
Covariate    covariate effect    constant effect       βb          SD
Treatment    0.112               0.69                 -0.041       0.020
Albumin      0.006               0.96                 -0.0084      0.0022
Bilirubin    <0.001              0.16                  0.0023      0.00038
assumptions referring to the additive hazard scale. These assumptions may be tested using
standard methods. Adding quadratic effects of, respectively, albumin and bilirubin to the
model with a constant effect of treatment results in P-values for linearity of 0.065 for al-
bumin and 0.05 for bilirubin. There seems to be no strong evidence against linearity. In the
models including quadratic effects, the estimated (constant) treatment effects are, respec-
tively, βb1 = −0.042 (0.020), and βb1 = −0.040 (0.021) (per year). This means that more
flexible models for the biochemical variables do not substantially change the estimated
treatment effect. A test for no interaction was performed in the model where all three co-
variate effects are constant (Table 2.10, last column). This was done along the same lines as
for the Cox model and identified no important interactions between treatment and albumin
(P = 0.76) and between treatment and bilirubin (P = 0.08).
Like the Cox model, an additive hazard regression model having some or all regression functions constant (β j (t) = β j ) and an unspecified baseline hazard α0 (t) is semi-parametric. A fully parametric alternative also restricts the baseline hazard, e.g., to be piece-wise constant, as in the model

α(t) = α0 (t) + β1 Z
Table 2.11 PBC3 trial in liver cirrhosis: Estimated time-constant coefficients and SD (per year)
from an additive hazard model with piece-wise constant baseline hazard including treatment and
linear effects of albumin and bilirubin.
Covariate      βb         SD
Treatment     -0.050      0.062
Albumin       -0.0083     0.0048
Bilirubin      0.0020     0.00064
where

    α0 (t) = α1   if t < 2,
             α2   if 2 ≤ t < 4,
             α3   if 4 ≤ t.
The estimates in this model are: αb1 = 0.091 (0.016), αb2 = 0.132 (0.024), αb3 = 0.094 (0.046)
(all rates per year, quite close to the similar estimates from the multiplicative Poisson
model) and βb1 = −0.0073 (0.021) (per year, quite close to the insignificant treatment ef-
fect in the Aalen model with a time-constant hazard difference). When including the two
biochemical variables, however, the convergence of the algorithm is questionable but nev-
ertheless leads to estimates quite similar to those from the time-constant Aalen model but
with larger estimated standard deviations, see Table 2.11.
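The piece-wise constant baseline hazard defined above can be coded directly. A minimal sketch in Python (illustrative only; the book's software is in R and SAS), using the cut-points 2 and 4 years and the one-covariate estimates quoted in the text:

```python
import bisect

def make_piecewise_hazard(cuts, rates):
    """Return alpha0(.) equal to rates[l] on the l-th interval given by cuts."""
    def alpha0(t):
        return rates[bisect.bisect_right(cuts, t)]
    return alpha0

def cumulative_hazard(cuts, rates, t):
    """A0(t): integral of the piece-wise constant hazard over [0, t]."""
    total, lower = 0.0, 0.0
    for cut, rate in zip(list(cuts) + [float("inf")], rates):
        upper = min(t, cut)
        if upper > lower:
            total += rate * (upper - lower)
        lower = cut
    return total

# Cut-points 2 and 4 years; rates are the estimates quoted in the text.
alpha0 = make_piecewise_hazard([2.0, 4.0], [0.091, 0.132, 0.094])
# alpha0(1.0) -> 0.091, alpha0(3.0) -> 0.132, alpha0(4.5) -> 0.094
```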
Figure 2.13 Guinea-Bissau childhood vaccination study: Estimated cumulative hazard in a Cox
model for a BCG-unvaccinated child aged 3 months at baseline. Time origin is time of first visit.
the intensity of future events and to use the non-parametric baseline hazard in a Cox model
to model its effect may not be an optimal use of that model’s possibilities. In the Bissau
study and in many other observational studies, a general alternative to using time since re-
cruitment as the time-variable for intensity models is to use (current) age. That is, the time
origin is now time of birth, time of entry is age at entry and time of event or censoring is
the corresponding age. Inference in this case has to be performed taking delayed entry into
account.
It turns out that intensity modeling as discussed earlier in this chapter, carries through with
simple modifications. Thus, the following types of model may be studied for the Bissau
data. If, for the ith child, ai is the age at entry then the mortality rate at age a, a > ai
could be α1 (a) if the child was BCG vaccinated before age ai and α0 (a) otherwise. If these
rates are not further specified then their cumulatives can be estimated using the Nelson-
Aalen estimator and they may be non-parametrically compared using the logrank test (with
delayed entry). In both cases, all that is needed is a modification of the risk set at any age a
Figure 2.14 Guinea-Bissau childhood vaccination study: Breslow estimate of the cumulative base-
line hazard in a Cox model with current age as time-variable, i.e., time origin is time of birth.
which should now be those children i for whom ai < a and ai + Xi ≥ a where, as previously,
Xi is time (since entry) of event or censoring. That is, the risk set at age a includes those
subjects who have an age at entry smaller than a and for whom event or censoring has
not yet occurred (see Figure 1.8b). Similarly, a Cox model αi (a) = α0 (a) exp(β Zi ) may
be fitted using the same risk set modification, and inference for a model with a piece-wise
constant intensity should be based on numbers of events and person-years at risk in suitably
chosen age intervals. Also, additive hazard models may be adapted to handle delayed entry
using similar risk set modifications.
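The risk-set modification for delayed entry described here is just an indicator condition and can be sketched in a few lines of Python (hypothetical data; the book's code is in R and SAS):

```python
def risk_set(a, entry_ages, followup_times):
    """Subjects at risk at age a under delayed entry: those i with
    a_i < a <= a_i + X_i, where a_i is age at entry and X_i is the
    observed time (since entry) of event or censoring."""
    return [i for i, (ai, xi) in enumerate(zip(entry_ages, followup_times))
            if ai < a <= ai + xi]

entry_ages     = [0.5, 2.0, 3.0, 1.0]   # hypothetical ages at entry (months)
followup_times = [6.0, 1.5, 4.0, 0.5]   # hypothetical times since entry
# risk_set(3.0, entry_ages, followup_times) -> [0, 1]: subjects 2 and 3 are
# excluded (not yet entered, or already censored, at age 3)
```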
Table 2.12 also shows the estimated BCG coefficient in a Cox model with age as the time-
variable and it is seen to be very similar to that from the model using follow-up time. Figure
2.14 shows the Breslow estimate of the cumulative baseline hazard (i.e., for an unvaccinated
child) in this model. The curve appears slightly convex (bending upwards) in accordance
with the positive coefficient for age at entry in the Cox model with follow-up time as time-
variable (Table 2.12). This figure also illustrates another feature of intensity models for data with delayed entry, namely that, for small values of time (here, ages less than about 1 month) there may be few subjects at risk and, thereby, few or no events. As a consequence, cumulative hazard plots may appear flat (as here) or have big steps. In such situations, one may choose to present plots where the time axis begins at a suitable value a0 > 0 of age where the risk set is sufficiently large – a choice that has no impact on the shape of the curve (i.e., the estimated hazard itself) for values of a > a0 .
In Section 1.3, the concept of independent censoring was discussed. Recall that the meaning
of that concept is that, at any time t, subjects who were censored at that time should have the
same failure rate as those who survived beyond t. Handling delayed entry as just described
relies on a similar assumption of independent delayed entry. The assumption is here that,
at any time t, subjects who enter at that time should have the same failure rate as those who
were observed to be at risk prior to t. The consequence of independent delayed entry is
that, at any time t, the observed risk set at that time is representative for all subjects in the
population who are still alive at t.
Time zero
When analyzing multi-state survival data, a time zero must be chosen. For random-
ized studies, the time of randomization where participants fulfill the relevant in-
clusion criteria and where treatment is initiated is the standard choice. In clinical
follow-up studies there may also be an initiating event that defines the inclusion into
the study and may, therefore, serve as time zero. In observational studies, the time of
entry into the study is not necessarily a suitable time zero because that date may not
be the time of any important event in the life time of the participating subjects. In
such cases, an alternative time axis to be used is (current) age. For this, subjects are
not always followed from the corresponding time zero (age at birth), and data may
be observed with delayed entry. This means that subjects are only included into the
study conditionally on being alive and event-free at the time of entry into the study.
Analysis of intensities in the presence of delayed entry requires a modification of
the risk sets, and care must be taken when the risk set, as a consequence, becomes
small for small values of age.
patients because there were more deaths than transplantations. Such patterns can of course
only be observed when actually separating the components of the composite end-point.
Note that, because the hazard for the composite end-point, α(t), is the sum α1 (t) + α2 (t)
of the cause-specific hazards, a Cox model for α(t) may be mathematically incompatible
with Cox models for the cause-specific hazards. However, such a potential lack of fit may
not be sufficiently serious to invalidate conclusions.
Readers familiar with the Fine-Gray regression model may wonder why that model is not
discussed under this section’s heading of competing risks. The explanation is that the Fine-
Gray model is not a hazard-based model, and we will, therefore, postpone the discussion of
this model to Chapters 4 and 5 where the focus is on models for marginal parameters, such
as the cumulative incidence function.
with a single baseline 0 → 1 transition intensity and a common effect of the covariate
Z (Z = 1 for bipolar patients and Z = 0 for unipolar patients). Table 2.14 (left column)
also shows the resulting βb summarizing the h-specific estimates into a single effect, and
leading to a markedly reduced SD (all re-admissions are studied here, not just the first
four episodes). This model is a simple version of what is sometimes referred to as the AG
model, so named after Andersen and Gill (1982), see, e.g., Amorim and Cai (2015), where
re-admissions for any patient are independent of previous episodes. More involved models
for these data will be studied in later sections (e.g., Section 3.7.5). A way of accounting for
previous episodes is to stratify, leading to the model

αi (t) = α01,h (t) exp(β Zi )

for episode no. h = 1, 2, 3, . . ., where there are separate baseline hazards (α01,h (t) for episode no. h) but only a single regression coefficient for Z. This model (known as the
PWP model after Prentice, Williams and Peterson, 1981, see Amorim and Cai, 2015) is
seen to provide a smaller coefficient for Z than the AG model (Table 2.14, bottom line).
This is because, by taking the number of previous episodes into account (here, via stratifi-
cation), some of the discrepancy between bipolar and unipolar patients disappears since the
occurrence of repeated episodes is itself affected by the initial diagnosis. Similar models
(AG and PWP) may be set up for gap times and the results are also shown in the right col-
umn of Table 2.14. These models rely on the assumption that gap times are independent,
not only among patients – a standard assumption – but also within patients. The latter as-
sumption may not be reasonably fulfilled and we will in later sections (e.g., Sections 3.7.5
and 3.9) discuss models where this assumption is relaxed. A discussion of the use of gap
time models for recurrent events was given by Hougaard (2022), recommending (at least
Table 2.14 Recurrent episodes of affective disorder: Estimated coefficients (and SD) from Cox mod-
els per episode, AG model, and PWP model for bipolar vs. unipolar disease.
for data from randomized trials) not to use gap time models. A further complication arising
when studying recurrent events is that censoring may depend on the number of previous
events (see, e.g., Cook and Lawless, 2007; sect. 2.6).
Figure 2.15 LEADER cardiovascular trial in type 2 diabetes: Estimated cumulative hazards
(Nelson-Aalen estimates) of recurrent myocardial infarctions by treatment group.
Table 2.15 LEADER cardiovascular trial in type 2 diabetes: Estimated coefficients (and SD) from
models for the hazard of recurrent myocardial infarctions for liraglutide vs. placebo.
Model                              βb        SD
Cox model    1st event           -0.159     0.080
AG model     Cox type            -0.164     0.072
             Piece-wise constant -0.164     0.072
PWP model    2nd event           -0.047     0.197
             3rd event           -0.023     0.400
             4th event            0.629     0.737
             5th event           -0.429     1.230
             All events          -0.130     0.072
Table 2.15 summarizes results from regression models. The coefficients for liraglutide versus placebo from a Cox model for time to first event and from an AG model are quite similar; however, the latter has a smaller SD. Notice that a Cox type model and a model with a piece-wise constant intensity give virtually identical results. As in the previous example, we can
see that estimates from PWP-type models with event-dependent coefficients become highly
variable, and that the coefficient from a PWP model with a common effect among different
event numbers gets numerically smaller compared to that from an AG model. One may
argue that the PWP models with event-dependent coefficients are not relevant for estimating
the treatment effect in a randomized trial because, for later event numbers, patients are no
longer directly comparable between the two treatment groups due to different selection into
the groups when the treatment, indeed, has an effect.
The basic parameter in multi-state models is the intensity (also called hazard or
rate). The intensity describes what happens locally in time and conditionally on the
past among those who are at risk for a given type of event at that time. Specification
of all intensities allows simulation of realizations of the multi-state process.
Intensities may typically be analyzed one transition at a time using methods as de-
scribed in this chapter. These include:
• Non-parametric estimation using the Nelson-Aalen estimator.
• Parametric estimation using a model with a piece-wise constant hazard.
• Multiplicative regression models:
– Cox model with a non-parametric baseline hazard (including the AG model
for recurrent events).
– Poisson model with a piece-wise constant baseline hazard.
• Aalen (additive) regression model.
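As the box above notes, specifying all intensities makes it possible to simulate realizations of the multi-state process. A minimal sketch for an illness-death model with constant, purely hypothetical intensities (Python, illustrative only; with constant rates the sojourn times are exponential and the destination state is drawn with probability proportional to the competing intensities):

```python
import random

def simulate_illness_death(a01, a02, a12, horizon, rng):
    """One path of the illness-death model (0 -> 1 -> 2 or 0 -> 2) with
    constant intensities a01, a02, a12; observation ends at `horizon`.
    Returns a list of (time, state) pairs starting in state 0 at time 0."""
    path = [(0.0, 0)]
    # leave state 0 after an Exp(a01 + a02) sojourn time
    t = rng.expovariate(a01 + a02)
    if t >= horizon:
        return path
    # destination drawn proportionally to the cause-specific intensities
    if rng.random() < a01 / (a01 + a02):
        path.append((t, 1))
        t2 = t + rng.expovariate(a12)
        if t2 < horizon:
            path.append((t2, 2))
    else:
        path.append((t, 2))
    return path

rng = random.Random(1)
paths = [simulate_illness_death(0.10, 0.05, 0.20, 10.0, rng) for _ in range(1000)]
```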
2.6 Exercises
Exercise 2.1 Consider the data from the Copenhagen Holter study (Example 1.1.8).
1. Estimate non-parametrically the cumulative hazards of death for subjects with or without
ESVEA.
2. Make a non-parametric test for comparison of the two.
3. Make a similar analysis based on a model where the hazard is assumed constant within
5-year intervals.
Exercise 2.2 Consider the data from the Copenhagen Holter study.
1. Make a version of the data set enabling an analysis of the composite end-point of stroke
or death without stroke (‘stroke-free survival’, i.e., define the relevant Time and Status
variables), see Section 1.2.4.
2. Estimate non-parametrically the cumulative hazards of stroke-free survival for subjects
with or without ESVEA.
3. Make a non-parametric test for comparison of the two.
4. Make a similar analysis based on a model where the hazard is assumed constant within
5-year intervals.
Exercise 2.3 Consider the data from the Copenhagen Holter study and the composite end-
point stroke-free survival.
1. Fit a Cox model and estimate the hazard ratio between subjects with or without ESVEA.
2. Fit a Poisson model where the hazard is assumed constant within 5-year intervals and
estimate the hazard ratio between subjects with or without ESVEA.
3. Compare the results from the two models.
Exercise 2.4 Consider the data from the Copenhagen Holter study and the composite end-
point stroke-free survival.
1. Fit a Cox model and estimate the hazard ratio between subjects with or without ESVEA,
now also adjusting for sex, age, and systolic blood pressure (sysBP).
2. Fit a Poisson model where the hazard is assumed constant within 5-year intervals and
estimate the hazard ratio between subjects with or without ESVEA, now also adjusting
for sex, age, and sysBP.
3. Compare the results from the two models.
Exercise 2.5
1. Check the Cox model from the previous exercise by examining proportional hazards
between subjects with or without ESVEA and between men and women.
2. Check for linearity on the log(hazard)-scale for age and sysBP.
3. Do the same for the Poisson model.
Exercise 2.6 Consider the data from the Copenhagen Holter study and focus now on the
mortality rate after stroke.
1. Estimate non-parametrically the cumulative hazards for subjects with or without ESVEA
using the time-variable ‘time since recruitment’.
2. Assume proportional hazards and estimate the hazard ratio between subjects with or
without ESVEA.
3. Repeat these two questions using now the time-variable ‘time since stroke’ and compare
the results.
Exercise 2.7
1. Consider the data from the Copenhagen Holter study and fit Cox models for the cause-
specific hazards for the outcomes stroke and death without stroke including ESVEA,
sex, age, and sysBP.
2. Compare with the results from Exercise 2.4 (first question).
Exercise 2.8 Consider the data on repeated episodes in affective disorder, Example 1.1.5.
1. Estimate non-parametrically the cumulative event intensities for unipolar and bipolar
patients.
2. Fit an AG-type model and estimate, thereby, the ratio between intensities for unipolar
and bipolar patients, adjusting for year of diagnosis.
3. Fit a PWP model and estimate, thereby, the ratio between intensities for unipolar and
bipolar patients, adjusting for year of diagnosis.
4. Compare the results from the two models.
Chapter 3
Intensity models
This chapter gives some of the mathematical details behind the intensity models introduced
in Chapter 2. The corresponding sections are marked with ‘(*)’. The chapter also contains
a less technical section dealing with examples (Section 3.6) and, finally, time-dependent
covariates (Section 3.7) and models with shared parameters (Section 3.8) are introduced, as
well as random effects (frailty) models for situations where an assumption of independence
among observations may not be justified (Section 3.9).
The intensity process of the counting process Nh j (t) of h → j transitions is assumed to be of the form λh j (t) = Yh (t)αh j (t). Here, Yh (t) = I(V (t−) = h) is the indicator for being in state h just before time t and
the transition intensity αh j (t) is some non-negative function of the past (Ht− ) and of
time t containing parameters (say, θ ) to be estimated. Estimation is based on observing
(Nh ji (t),Yhi (t)) over some time interval for n independent subjects i = 1, . . . , n possibly,
together with covariates Z i for those subjects. As explained in Section 1.4, the right-hand
end-point of the time interval of observation for subject i is either an observed time, Ci of
right-censoring or it is a point in time (say, Ti ) where the multi-state process Vi (t) reaches
an absorbing state, e.g., when subject i dies. We will denote the time interval of observa-
tion for subject i by [0, Xi ] with Xi = Ti ∧ Ci . There may also be delayed entry (Sections
1.2 and 2.3), i.e., the observation of subject i begins at a later time point Bi > 0 and i is
only observed conditionally on not having reached an absorbing state by time Bi . For the
moment, we will discuss the case of no delayed entry and return to the general case below.
We assume throughout that both censoring and delayed entry are independent (Sections 1.3
and 2.3), possibly given relevant covariates.
For a given state space S = {0, 1, . . . , k} and a given transition structure as indicated by
the box-and-arrow diagrams of Chapter 1, there is a certain number of possible transitions
and we index these by v = 1, . . . , K. Splitting the time interval of observation, t ≤ Xi , into
(small) sub-intervals, each of length dt > 0 we have, for each subject, a (long!) sequence
of multinomial experiments
(1; dN1i (t), . . . , dNKi (t), 1 − ∑v dNvi (t)).
The index parameter of the multinomial distribution equals 1 because, in continuous time,
at most 1 event can happen at time t. The probability, conditionally on the past Ht− , of ob-
serving a given configuration (dN1 (t), . . . , dNK (t)) of events at time t, where either exactly
one of the dNv (t) equals 1 or they are all equal to 0, is then
Li (t) = (λ1i (t)dt)^{dN1i (t)} · · · (λKi (t)dt)^{dNKi (t)} × (1 − ∑v λvi (t)dt)^{1−∑v dNvi (t)}
and, therefore, the contribution to the likelihood function from subject i is the product
Li = ∏_{t≤Xi } Li (t)
over all such intervals for t ≤ Xi . There will only be a finite number of intervals with an
event and letting dt → 0, the last factor ∏_{t≤Xi } (1 − ∑v λvi (t)dt) reduces to the product-integral exp(− ∑v ∫_0^{Xi } λvi (u)du) (Andersen et al., 1993, ch. II). We
will discuss the product-integral in more detail in Section 5.1. The likelihood contribution
from subject i then becomes
Li = ∏_{t≤Xi } (λ1i (t)dt)^{dN1i (t)} · · · (λKi (t)dt)^{dNKi (t)} exp(− ∑v ∫_0^{Xi } λvi (u)du). (3.1)
Equation (3.1) is the Jacod formula for the likelihood based on observing a multivariate
counting process for subject i, (Nvi (t), t ≤ Xi ; v = 1, . . . , K) (Andersen et al., 1993, ch. II).
For independent subjects, the overall likelihood is
L = ∏i Li .
Some brief remarks are in order here; for more details, the reader is referred to Andersen et
al. (1993; ch. III).
1. In the case of delayed entry at time Bi > 0, the vth observed counting process for subject
i is ∫_{Bi }^{t} dNv (u) = Nv (t) − Nv (Bi ).
This has the consequence that data may be analyzed by considering one transition at a
time (unless some parameters of interest are shared between several transition types – a
situation that will be discussed in Section 3.8).
6. For the special case of survival data (Figure 1.1) there is only one type of event and the
index v may be omitted. There is also at most one event for subject i (at time Xi ) and in
this case the likelihood contribution from subject i reduces to
Li = exp(− ∫_0^{Xi } λi (u)du) (λi (Xi ))^{dNi (Xi )} .
With Di = dNi (Xi ), and fi and Si , respectively, the density and survival function for the
survival time Ti , this becomes
Li = fi (Xi )^{Di } Si (Xi )^{1−Di } .
It is seen that, for each v, the factor (3.2) in the general multi-state likelihood has the same structure as the likelihood contribution from a subject i in the two-state model (however, possibly with more than one jump for each subject).
Likelihood factorization
Intensities are modeled in continuous time and at most one event can happen at any
given time. This has the consequence that the likelihood function (the Jacod for-
mula) factorizes and intensities may be modeled one at a time.
Under the assumption of independent censoring, the multi-state process is condi-
tionally independent of the censoring times given covariates. This has the conse-
quence that, as long as the covariates that create conditional independence are ac-
counted for, the censoring times will also give rise to separate factors in the likeli-
hood, and intensities in the multi-state process may be analyzed without specifying
a model for censoring.
As a consequence of the likelihood factorization, models for the αv (t) can be set up for one
type of transition v (∼ h → j) at a time. Possible such models (to be discussed in detail
in the following sections) include (3.3)-(3.6) below, where for ease of notation we have
dropped the index v. First,
α(·) is completely unspecified (3.3)
(Section 3.2). Here, the time-variable needs specification, the major choices being the base-
line time-variable t, or current duration in the state h, i.e., d = t − Th where Th (as in Section
1.4) denotes the time of entry into state h. In both cases, α(·) may depend on categorical co-
variates by stratification, and we shall see in Section 3.2 how non-parametric estimation of the cumulative hazard A(t) = ∫_0^t α(u)du (using the Nelson-Aalen estimator) is performed.
α(·) is piece-wise constant (3.4)
(Section 3.4). The time-variable needs specification and, in contrast to (3.3), it is now as-
sumed that time is split into a number, L, of intervals given by 0 = s0 < s1 < · · · < sL = ∞
and in each interval α(·) is constant
α(u) = α` for s`−1 ≤ u < s` .
The likelihood, as we shall see in Section 3.4, now leads to estimating α` by occur-
rence/exposure rates. As it was the case for (3.3), the hazard may depend on a categorical
covariate using stratification.
α(· | Z ) = α0 (·) exp(β T Z )   Cox regression model (3.5)
(Section 3.3). In (3.5), the hazard is allowed to depend on time-fixed covariates, Z assuming
that all hazards are proportional and the baseline hazard, α0 (·) is completely unspecified
(as in (3.3)). The resulting Cox regression model is said to be semi-parametric because it
contains both the finite parameter vector β = (β1 , . . . , β p )T and the non-parametric compo-
nent α0 (·). This model may be generalized by allowing the covariates to depend on time.
Thus, in a recurrent events process (Example 1.1.5) the intensity of events can be modeled
as depending on the number of previous events, see Section 3.7. Though the Cox model
(3.5) is the most frequently used regression model, alternatives do exist. Thus (3.4) and
(3.5) may be combined into a multiplicative hazard regression model with a piece-wise
constant baseline hazard, known as piece-wise exponential or Poisson regression (Section
3.4). Finally, as an alternative to the multiplicative model (3.5), an additive hazard model
may be studied (Section 3.5). Here, all regression functions β j (·) may be time-dependent
(leading to the Aalen model) or some or all of the β j (·) may be time-constant. The baseline
hazard α0 (·) is typically taken to be unspecified like in the Cox model, an alternative being
to assume it to be piece-wise constant.
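For the piece-wise constant model (3.4), the occurrence/exposure rates mentioned above are just event counts divided by person-time, interval by interval. A minimal sketch (Python, hypothetical data, no delayed entry; the book's code is in R and SAS):

```python
def occurrence_exposure(times, events, cuts):
    """Piece-wise constant hazard estimates: in each interval defined by
    0 = s0 < s1 < ... (cuts holds the interior cut-points), return
    (number of events in the interval) / (person-time at risk in it)."""
    bounds = [0.0] + list(cuts) + [float("inf")]
    rates = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        d = sum(1 for x, e in zip(times, events) if e == 1 and lo <= x < hi)
        exposure = sum(max(0.0, min(x, hi) - lo) for x in times)
        rates.append(d / exposure if exposure > 0 else float("nan"))
    return rates

times  = [0.5, 1.2, 2.3, 3.1, 4.0]   # hypothetical observation times X_i
events = [1, 0, 1, 1, 0]             # 1 = event, 0 = censored
rates = occurrence_exposure(times, events, [2.0])
# rates[0] = 1 / 7.7 (one event, 7.7 person-years at risk in [0, 2))
# rates[1] = 2 / 3.4 (two events, 3.4 person-years at risk in [2, infinity))
```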
The intensity process for subject i is here taken to be λi (t) = α(t)Yi (t) with Yi (t) = I(Bi < t ≤ Xi ) for some entry time Bi ≥ 0. That is, there is a common hazard,
α(t) for all subjects and this hazard is not further specified. In other words, we have a non-
parametric model for the intensity. The likelihood based on observing independent subjects
i = 1, . . . , n is

L = ∏i ∏t (α(t)Yi (t))^{dNi (t)} exp(− ∫_0^∞ α(u)Yi (u)du),
Here, integrals are written with a lower limit of 0 and an upper limit equal to ∞ because the
indicators Yi (t) take care of the proper range of integration, (Bi , Xi ] for subject i. Formally,
the derivative with respect to a single α(t) is
∑i (dNi (t)/α(t) − Yi (t)dt) (3.8)
and equating it to 0 leads to the Nelson-Aalen estimator Ab(t) = ∫_0^t dN(u)/Y (u) with N(t) = ∑i Ni (t) and Y (t) = ∑i Yi (t); it has a maximum likelihood interpretation. Note that ∑i Yi (t) is the number of subjects
observed to be at risk at time t and if X is an observed event time for (N1 (t), . . . , Nn (t)) then,
for each such time, a term 1/(∑i Yi (X)) is added to the estimator. On a plot of Ab(t) against t, the approximate local slope estimates the hazard at that point in time, see Figure 2.1 for
an example. Note that this slope may be estimated directly by smoothing the Nelson-Aalen
estimator (e.g., Andersen et al., 1993; ch. IV).
showing that the Nelson-Aalen estimator minus its target parameter (slightly modified be-
cause of the possibility that the risk set may become empty, i.e., Y (u) = 0) is a martingale
integral and, thereby, itself a martingale (Andersen et al., 1993, ch. IV). From this, point-
wise confidence limits for A(t) may be based on asymptotic normality for martingales and
on the following model-based variance estimator. By formally taking the derivative of (3.8)
with respect to α(s), we get 0 for s ≠ t and −dN(t)/α(t)^2 for s = t. Plugging in the Nelson-Aalen increments dN(t)/Y (t), this leads to the following estimator for var(Ab(t))
σb 2 (t) = ∫_0^t dN(u)/(Y (u))^2 . (3.11)
Point-wise confidence limits based on (3.11) typically build on symmetric confidence limits for log(A(t)), i.e., a 95% confidence interval for A(t) is

Ab(t) exp(±1.96 σb (t)/Ab(t)).
Simultaneous confidence bands can also be constructed (e.g., Andersen et al., 1993; ch.
IV).
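The Nelson-Aalen estimator, its variance estimator (3.11), and log-transformed point-wise confidence limits fit in a few lines. A minimal sketch (Python, illustrative only; right-censored data without delayed entry, hypothetical values, no tie handling):

```python
import math

def nelson_aalen(times, events):
    """Return [(event time, A_hat, var_hat)]: the estimator increases by
    1/Y(X) and the variance (3.11) by 1/Y(X)^2 at each event time X."""
    a_hat, var, out = 0.0, 0.0, []
    for x, d in sorted(zip(times, events)):
        if d == 1:
            y = sum(1 for t in times if t >= x)   # number at risk at x
            a_hat += 1.0 / y
            var += 1.0 / y ** 2
            out.append((x, a_hat, var))
    return out

def log_ci(a_hat, var, z=1.96):
    """Symmetric limits for log A(t), transformed back to the A(t) scale."""
    if a_hat == 0.0:
        return (0.0, 0.0)
    half = z * math.sqrt(var) / a_hat
    return (a_hat * math.exp(-half), a_hat * math.exp(half))

times  = [1.0, 2.0, 2.5, 3.0, 4.0]   # hypothetical
events = [1, 1, 0, 1, 0]
est = nelson_aalen(times, events)
# est[-1] is (3.0, 1/5 + 1/4 + 1/2, 1/25 + 1/16 + 1/4)
```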
As mentioned in Section 3.1, stratification according to a categorical covariate, Z is possible
and separate Nelson-Aalen estimators can be calculated for each category. Comparison
among two or more cumulative hazards may be performed using the logrank test or other
non-parametric tests, as follows. For the two-sample test, let Abj (t) = ∫_0^t dNj (u)/Yj (u) be
the Nelson-Aalen estimator in group j = 0, 1, where N j = ∑i in group j Ni counts failures in
group j and Y j = ∑i in group j Yi is the number at risk in that group. A general class of non-
parametric test statistics for the null hypothesis H0 : A0 (u) = A1 (u), u ≤ t can then be based
on the process

∫_0^t K(u)(dAb1 (u) − dAb0 (u)), (3.12)
where K(·) is some weight function of the observations in the interval [0,t) which is 0
whenever Y1 (u) = 0 or Y0 (u) = 0. Under H0 and using (1.21), the process (3.12) reduces to
the martingale
∫_0^t K(u)(dM1 (u)/Y1 (u) − dM0 (u)/Y0 (u)),
whereby conditions for asymptotic normality under H0 of the test statistic may be found
(Andersen et al., 1993; ch. V), see Exercise 3.1.
The most common choice of weight function is
K(t) = Y0 (t)Y1 (t)/(Y0 (t) + Y1 (t)),

leading to the logrank test; (3.12) with this weight, evaluated at t = ∞, is denoted LR(∞). Its estimated variance is obtained from the terms

vi = (Y0 (Xi )Y1 (Xi )/(Y0 (Xi ) + Y1 (Xi ))^2 ) (dN0 (Xi ) + dN1 (Xi )) × (Y0 (Xi ) + Y1 (Xi ) − dN0 (Xi ) − dN1 (Xi ))/(Y0 (Xi ) + Y1 (Xi ) − 1)
added across all observation times (Xi ) to give v. Note that the last factor in vi equals 1
when exactly 1 failure is observed at Xi , i.e., when dN0 (Xi ) + dN1 (Xi ) = 1. The resulting
two-sample logrank test statistic to be evaluated in the χ1^2 -distribution is

LR(∞)^2 / v.
The logrank test is most powerful against proportional hazards alternatives, i.e., when
α1 (t) = HRα0 (t) for some constant hazard ratio HR, but other choices of weight func-
tion K(t) provide non-parametric tests with other power properties.
Along the same lines, non-parametric tests for comparison of intensities α j (t) among m > 2
groups may be constructed, as well as stratified tests (Andersen et al., 1993, ch. V).
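A direct transcription of the two-sample logrank computation (weight K(t) = Y0 (t)Y1 (t)/(Y0 (t) + Y1 (t)), one event per event time, no delayed entry) can be sketched as follows; data are hypothetical and the code is illustrative Python rather than the book's R/SAS:

```python
def logrank(times, events, groups):
    """Return (LR(inf), LR(inf)^2 / v) for the two-sample logrank test.
    At each event time, LR accumulates 'observed minus expected' events in
    group 1 and v the corresponding variance term (no ties assumed)."""
    lr, v = 0.0, 0.0
    for x, d, g in sorted(zip(times, events, groups)):
        if d != 1:
            continue
        y0 = sum(1 for t, gg in zip(times, groups) if t >= x and gg == 0)
        y1 = sum(1 for t, gg in zip(times, groups) if t >= x and gg == 1)
        if y0 == 0 or y1 == 0:
            continue
        y = y0 + y1
        lr += (1 if g == 1 else 0) - y1 / y   # O - E for group 1
        v += y0 * y1 / y ** 2                 # variance term, single event
    return lr, lr ** 2 / v

times  = [1.0, 2.0, 3.0, 4.0]   # hypothetical
events = [1, 1, 1, 0]
groups = [0, 1, 0, 1]
lr, chi2 = logrank(times, events, groups)
# chi2 = 8/13, to be evaluated in the chi-square(1) distribution
```

With the logrank weight, K(u)(dAb1 (u) − dAb0 (u)) at a single event time reduces to the familiar 'observed minus expected' contribution for group 1, which is what the loop accumulates.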
3.3 Cox regression model (*)
Often, analyses of multi-state models involve several covariates, in which case stratification
as discussed in Section 3.2, is no longer feasible and some model specification of how
the covariates affect the hazard is needed. In the Cox regression model (Cox, 1972) this
specification is done using hazard ratios in a multiplicative model while, at the same time,
keeping the way in which time affects the hazard unspecified. Thus, in the Cox model the
hazard function for subject i with covariates Z i = (Zi1 , . . . , Zip )T is
α(t | Zi ) = α0 (t) exp(β T Zi ), (3.13)
where the baseline hazard α0 (t) is not further specified and the effect of the covariates is
via the linear predictor LPi = β T Z i involving p regression coefficients β = (β1 , . . . , β p )T
(Section 1.2.5). Inference for the baseline hazard and regression coefficients builds on the
Jacod formula
Li = exp(− ∫_0^∞ Yi (t)α0 (t) exp(β T Zi )dt) ∏t (Yi (t)α0 (t) exp(β T Zi ))^{dNi (t)}
for the likelihood contribution from subject i (Section 3.1). This leads to the total log-
likelihood
∑i ( ∫_0^∞ log(Yi (t)α0 (t) exp(β T Zi )) dNi (t) − ∫_0^∞ Yi (t)α0 (t) exp(β T Zi ) dt )
and differentiation with respect to a single α0 (t) (along the same lines as were used when
deriving the Nelson-Aalen estimator in Section 3.2.1) leads to the score equation
∑i (dNi (t) − Yi (t)α0 (t) exp(β T Zi )dt) = 0. (3.14)
\[ \frac{\partial}{\partial \beta} \log(\mathrm{PL}(\beta)) = \sum_i \int_0^\infty \Biggl( Z_i - \frac{\sum_j Y_j(t) Z_j \exp(\beta^T Z_j)}{\sum_j Y_j(t)\exp(\beta^T Z_j)} \Biggr)\, dN_i(t) \tag{3.17} \]
and solving the resulting score equation. This leads to the Cox maximum partial likelihood
estimator β̂ and inserting this into (3.15) yields the Breslow estimator of the cumulative
baseline hazard A_0(t) = ∫_0^t α_0(u)du,
\[ \hat A_0(t) = \int_0^t \frac{\sum_i dN_i(u)}{\sum_i Y_i(u)\exp(\hat\beta^T Z_i)} \tag{3.18} \]
(Breslow, 1974). Note that the sums ‘∑ Y j (t)...’ in (3.15)-(3.18) are effectively sums over
the risk set
R(t) = { j : Y j (t) = 1} (3.19)
at time t.
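The estimation steps can be made concrete for a single binary covariate: Newton-Raphson on the score (3.17) gives the maximum partial likelihood estimator, which is then inserted into the Breslow estimator (3.18), with both computed over the risk sets (3.19). A minimal sketch with invented illustrative data:

```python
import math

# Illustrative data: follow-up times, event indicators, one binary covariate.
times  = [3, 5, 6, 8, 10, 12]
events = [1, 1, 0, 1, 1, 0]
z      = [0, 1, 0, 1, 0, 1]

def score_info(beta):
    """Score U(beta) and information I(beta), summing over event times."""
    U, I = 0.0, 0.0
    for ti, di, zi in zip(times, events, z):
        if di != 1:
            continue
        risk = [j for j in range(len(times)) if times[j] >= ti]  # risk set R(t)
        s0 = sum(math.exp(beta * z[j]) for j in risk)
        s1 = sum(z[j] * math.exp(beta * z[j]) for j in risk)
        s2 = sum(z[j] ** 2 * math.exp(beta * z[j]) for j in risk)
        U += zi - s1 / s0                       # term of the score (3.17)
        I += s2 / s0 - (s1 / s0) ** 2           # minus second derivative
    return U, I

beta = 0.0
for _ in range(25):                             # Newton-Raphson iterations
    U, I = score_info(beta)
    beta += U / I

def breslow(t, beta):
    """Breslow estimate (3.18) of the cumulative baseline hazard at t."""
    return sum(1.0 / sum(math.exp(beta * z[j])
                         for j in range(len(times)) if times[j] >= ti)
               for ti, di in zip(times, events) if di == 1 and ti <= t)
```

With a single covariate the information is a scalar; for p > 1 the same iteration uses the vector score and the information matrix.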
Large-sample inference for β̂ may be based on standard likelihood results for PL(β). A
crucial step is to note that, evaluated at the true regression parameter and considered as
a process in t when based on the data from [0,t], (3.17) is a martingale (Andersen and
Gill, 1982; Andersen et al., 1993, ch. VII); see Exercise 3.2. Thus, model-based standard
deviations of β̂ may be obtained from the second derivative of log(PL(β)) and the resulting
Wald tests (as well as score and likelihood ratio tests) are also valid. Also, joint large-
sample inference for β̂ and Â_0(t) is available (Andersen and Gill, 1982; Andersen et al.,
1993, ch. VII). For a simple model including only a binary covariate, the score test reduces
to the logrank test discussed in Section 3.2.2 (see Exercise 3.3). If the simple model is a
stratified Cox model (see (3.20) below), the score test reduces to a stratified logrank test.
Since the Cox model (3.13) includes a linear predictor LPi = β T Z i , there are general meth-
ods available for checking some of the assumptions imposed in the model, such as linear
effects (on the log(hazard) scale) of quantitative covariates and absence or presence of
interactions between covariates. Examination of properties for the linear predictor was ex-
emplified in Section 2.2.2. A special feature of the Cox model that needs special attention
when examining goodness-of-fit is that of proportional hazards (no interaction between
covariates and time). This is because of the non-parametric modeling of the time effect
via the baseline hazard α0 (t). We have chosen to collect discussions of general methods
for goodness-of-fit examinations for a number of different multi-state models (including
the Cox model) in a separate Section 5.7. However, methods for the Cox model are also
described in connection with the PBC3 example in Section 2.2.2, and in Section 3.7 ex-
amination of the proportional hazards assumption using time-dependent covariates will be
discussed.
A useful extension of (3.13) when non-proportional hazards are detected, say, among the
levels j = 1, . . . , m of a categorical covariate, Z0 , is the stratified Cox model
\[ \alpha(t \mid Z_0 = j, Z) = \alpha_{j0}(t)\exp(\beta^T Z), \quad j = 1, \ldots, m. \tag{3.20} \]
In (3.20), there is an unspecified baseline hazard for each stratum, S j given by the level of
Z0 , but the same effect of Z in all strata (‘no interaction between Z0 and Z ’, though this
assumption may be relaxed). Inference still builds on the Jacod formula that now leads to a
stratified Cox partial likelihood for β
\[ \mathrm{PL}_s(\beta) = \prod_j \prod_{i \in S_j} \prod_t \Biggl( \frac{Y_i(t)\exp(\beta^T Z_i)}{\sum_{k \in S_j} Y_k(t)\exp(\beta^T Z_k)} \Biggr)^{dN_i(t)} \tag{3.21} \]
and a Breslow estimator for A_{j0}(t) = ∫_0^t α_{j0}(u)du,
\[ \hat A_{j0}(t) = \int_0^t \frac{\sum_{i \in S_j} dN_i(u)}{\sum_{i \in S_j} Y_i(u)\exp(\hat\beta^T Z_i)}. \tag{3.22} \]
3.4 Piece-wise constant hazards (*)
A simple parametric alternative is obtained by assuming the hazard to be piece-wise constant, i.e., constant within each
of a number (L) of intervals given by 0 = s_0 < s_1 < · · · < s_L = ∞. Thus, in each interval,
α(·) is assumed to be constant. This model typically provides a reasonable approximation
to any given hazard, it is flexible and, as we shall see shortly, inference for the model is
simple. The model has the drawbacks that the cut-points (s` ) need to be chosen and that
the resulting hazard is not a smooth function of time. Smooth extensions of the piece-wise
constant hazard model using, e.g., splines have been developed (e.g., Royston and Parmar,
2002) but will not be further discussed here. We will show the estimation details for the
case L = 2, the general case L > 2 is analogous. The starting point is the Jacod formula
(3.1), and the (essential part of the) associated log-likelihood (3.7) now becomes
\[ \log(L) = \sum_i \sum_{\ell=1,2} \Bigl( \int_{s_{\ell-1}}^{s_\ell} \log(\alpha_\ell)\, dN_i(t) - \alpha_\ell \int_{s_{\ell-1}}^{s_\ell} Y_i(t)\, dt \Bigr). \]
With \(D_\ell = \sum_i \int_{s_{\ell-1}}^{s_\ell} dN_i(t)\), the total number of observed events in interval ℓ, and
\(Y_\ell = \sum_i \int_{s_{\ell-1}}^{s_\ell} Y_i(t)\,dt\), the total time at risk observed in interval ℓ, the associated score is
\[ \frac{\partial}{\partial \alpha_\ell} \log(L) = \frac{D_\ell}{\alpha_\ell} - Y_\ell \]
leading to the occurrence/exposure rate
\[ \hat\alpha_\ell = \frac{D_\ell}{Y_\ell} \]
being the maximum likelihood estimator. Standard large-sample likelihood inference techniques
can be used to show that the pair \((\hat\alpha_1, \hat\alpha_2)^T\) is asymptotically normal with the proper
mean and a covariance matrix, based on the derivatives of the score, which is estimated by
\[ \begin{pmatrix} D_1/Y_1^2 & 0 \\ 0 & D_2/Y_2^2 \end{pmatrix}. \]
A crucial step in this derivation is to notice that the score ∂ log(L)/∂ α` is a martingale
when evaluated at the true parameter values and considered as a process in t, when based
on data in [0,t] (Andersen et al., 1993, ch. VI).
By the delta-method, it is seen that the \(\log(\hat\alpha_\ell)\) are asymptotically normal with mean
\(\log(\alpha_\ell)\) and a standard deviation which may be estimated by \(1/\sqrt{D_\ell}\). Furthermore, the
different \(\log(\hat\alpha_\ell)\) are asymptotically independent. This result is used for constructing 95%
confidence limits for \(\alpha_\ell\) which become
\[ \hat\alpha_\ell \exp(-1.96/\sqrt{D_\ell}) \quad \text{to} \quad \hat\alpha_\ell \exp(1.96/\sqrt{D_\ell}). \]
Comparison of piece-wise constant hazards in m strata, using the same partition of time,
0 = s_0 < s_1 < · · · < s_L = ∞ in all strata, may be performed via the likelihood ratio test which,
under the null hypothesis of equal hazards in all strata, follows an asymptotic \(\chi^2_{L(m-1)}\)-distribution.
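Because the maximized log-likelihood in each cell is D log(D/Y) − D, this likelihood ratio statistic reduces to a comparison of stratum-specific and pooled occurrence/exposure rates; a sketch with illustrative counts (cells with zero events would need merging in practice):

```python
import math

def lrt_equal_rates(D, Y):
    """LRT of equal piece-wise constant hazards across m strata.

    D[s][l], Y[s][l]: events and person-time for stratum s, interval l.
    Returns the statistic, asymptotically chi^2 with L*(m-1) df.
    Assumes all D[s][l] > 0.
    """
    m, L = len(D), len(D[0])
    stat = 0.0
    for l in range(L):
        Dl = sum(D[s][l] for s in range(m))   # pooled events, interval l
        Yl = sum(Y[s][l] for s in range(m))   # pooled person-time, interval l
        stat += 2 * sum(D[s][l] * math.log((D[s][l] / Y[s][l]) / (Dl / Yl))
                        for s in range(m))
    return stat

# Two strata, two intervals (illustrative counts): df = 2*(2-1) = 2.
lrt = lrt_equal_rates([[10, 4], [20, 16]], [[200, 160], [210, 170]])
```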
The model with a piece-wise constant hazard can also be used as baseline hazard in a
multiplicative (or additive – see Section 3.5) hazard regression model. The resulting multiplicative model is
\[ \alpha(t \mid Z_i) = \alpha_0(t)\exp(\beta^T Z_i), \]
with \(\alpha_0(t) = \alpha_{0\ell}\) when \(s_{\ell-1} \le t < s_\ell\), ℓ = 1, . . . , L. Since any baseline hazard function can
be approximated by a piece-wise constant function, the resulting Poisson or piece-wise
exponential regression model is closely related to a Cox model, and we showed in Section
2.2.1 that (for the PBC3 data) results from the two types of model were very similar. The
parameters in this model are estimated via the likelihood function (3.1), and hypothesis
testing for the parameters is also based on this.
The likelihood simplifies if Z i only consists of categorical variables, i.e., if there exist finite
sets C1 , . . . , C p such that Zi j ∈ C j , j = 1, . . . , p. In that case, Z i takes values c , say, in the
finite set C = C_1 × · · · × C_p. Letting \(\theta_c = \exp(\beta^T Z_i)\) when Z_i = c, the likelihood function
becomes
\[ \prod_i \prod_t (\lambda_i(t))^{dN_i(t)} \exp\Bigl(-\int_0^\infty \lambda_i(t)\,dt\Bigr) = \prod_{\ell=1}^{L} \prod_{c \in C} (\alpha_{0\ell}\theta_c)^{N_{\ell c}} \exp(-\alpha_{0\ell}\theta_c Y_{\ell c}) \]
with
\[ N_{\ell c} = \sum_{i: Z_i = c} \bigl(N_i(s_\ell) - N_i(s_{\ell-1})\bigr), \qquad Y_{\ell c} = \sum_{i: Z_i = c} \int_{s_{\ell-1}}^{s_\ell} Y_i(t)\,dt, \]
the total number of events in the interval from \(s_{\ell-1}\) to \(s_\ell\), respectively, the total time at risk
in that interval among subjects with Z_i = c, see Exercise 3.5.
The resulting likelihood is seen to be proportional to the likelihood obtained by, formally,
treating N`cc as independent Poisson random variables with mean α0` θcY`cc . This fact is the
origin of the name Poisson regression, and it has the consequence that parameters may
be estimated using software for fitting such a Poisson model. However, since there is no
requirement of assuming the N`cc to be Poisson distributed, this name has caused some
confusion and the resulting model should, perhaps, rather be called piece-wise exponential
regression since it derives from an assumption of piece-wise constant hazards. Another con-
sequence of this likelihood reduction is that, when fitting the model with only categorical
covariates, data may first be summarized in the cross-tables (N`cc , Y`cc ), ` = 1, . . . , L, c ∈ C .
For large data sets this may be a considerable data reduction, and we will use this fact when
analyzing the testis cancer incidence data (Example 1.1.3) in Section 3.6.4.
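The data reduction to cross-tables (N, Y) can be sketched in a few lines; the cut-points, record layout, and covariate patterns below are illustrative assumptions, and events falling exactly on a cut-point would need an explicit convention:

```python
# Summarize individual follow-up as events N and person-time Y per
# (interval, covariate pattern) cell for piece-wise exponential regression.
cuts = [0.0, 1.0, 2.0, float("inf")]   # 0 = s0 < s1 < ... < sL (illustrative)

def aggregate(records):
    """records: (entry, exit, event, pattern); returns {(l, pattern): (N, Y)}."""
    table = {}
    for entry, exit, event, c in records:
        for l in range(len(cuts) - 1):
            lo, hi = cuts[l], cuts[l + 1]
            at_risk = max(0.0, min(exit, hi) - max(entry, lo))
            if at_risk == 0.0:
                continue
            # event counted in the interval containing the exit time
            n = int(event == 1 and lo <= exit < hi)
            N, Y = table.get((l, c), (0, 0.0))
            table[(l, c)] = (N + n, Y + at_risk)
    return table

tab = aggregate([(0.0, 1.5, 1, "a"), (0.0, 2.5, 0, "a"), (0.5, 2.2, 1, "b")])
```

The resulting cells can be passed to any routine that fits a Poisson model with log(Y) as offset.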
The model, as formulated here, is multiplicative in time and covariates and, thus, assumes
proportional hazards. However, since the categorical time-variable and the covariates enter
the model on equal footings, examination of proportional hazards can be performed by
examining time×covariate interactions. Other aspects of the linear predictor may be tested
in the way described in Section 2.2.2.
3.5 Additive hazards models (*)
In the additive hazards (Aalen) model, both the baseline hazard α0(t) and the regression functions β1(t), . . . , βp(t) are
unspecified functions of time, t. The interpretation of the baseline hazard, like the baseline
hazard α0 (t) in the Cox model, is the hazard function for a subject with a linear predictor
equal to 0, while the value, β j (t), of the jth regression function is the hazard difference at
time t for two subjects who differ by 1 unit in their values for Z j and have identical values
for the remaining covariates, see Section 2.2.4.
In this model, the likelihood is intractable and other methods of estimation are most often
used (though Lu et al., 2023, studied the maximum likelihood estimator under certain
constraints). To this end, we define the cumulative baseline hazard and the cumulative regression
functions
\[ A_0(t) = \int_0^t \alpha_0(u)\,du, \qquad B_j(t) = \int_0^t \beta_j(u)\,du, \quad j = 1, \ldots, p; \]
however, more efficient, weighted versions of (3.24) exist (e.g., Martinussen and Scheike,
2006, ch. 5).
Large-sample properties of the estimators may be derived using that
\[ \hat B(t) - \int_0^t I\bigl(\mathrm{rank}(Y(u)) = p + 1\bigr)\, dB(u) \]
is a vector-valued martingale. Thereby, \(\mathrm{SD}(\hat B_j(t))\) may be estimated to obtain 95% point-wise
confidence limits for B_j(t). Furthermore, hypothesis tests for model reductions, such
as B_j(t) = 0, 0 < t ≤ τ*, for a chosen τ*, may be derived, e.g., based on \(\sup_{t \le \tau^*} |\hat B_j(t)|\)
(Martinussen and Scheike, 2006, ch. 5).
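The unweighted least squares increments of the cumulative regression functions can be sketched directly for one covariate plus intercept: at each event time, dB̂ solves the normal equations for the design rows Y_i(t)(1, Z_i), and increments are only computed while the design has full rank. Data and names below are illustrative:

```python
# Aalen additive hazard model, minimal sketch: one covariate z plus intercept.
times  = [2, 4, 5, 7, 9]
events = [1, 1, 0, 1, 1]
z      = [0.0, 1.0, 1.0, 0.0, 1.0]

def aalen_increments():
    """Returns [(t, dB0, dB1)] at event times while X'X is invertible."""
    out = []
    for ti, di in zip(times, events):
        if di != 1:
            continue
        risk = [j for j in range(len(times)) if times[j] >= ti]
        # Entries of X'X for design rows (1, z_j), j in the risk set
        a = float(len(risk))                 # sum of 1
        b = sum(z[j] for j in risk)          # sum of z
        c = sum(z[j] ** 2 for j in risk)     # sum of z^2
        det = a * c - b * b
        if det == 0.0:                       # rank(X) < p + 1: no increment
            continue
        # X'dN: the single event at ti contributes the row (1, z_event)
        e0, e1 = 1.0, z[times.index(ti)]
        dB0 = (c * e0 - b * e1) / det        # baseline increment
        dB1 = (a * e1 - b * e0) / det        # covariate-effect increment
        out.append((ti, dB0, dB1))
    return out

inc = aalen_increments()
```

Summing the increments up to t gives B̂_0(t) and B̂_1(t); the weighted versions mentioned above would rescale each row before solving.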
The Aalen model is very flexible including a completely unspecified baseline hazard and
covariate effects and, as a result, the estimates from the model are entire curves. It may,
thus, be of interest to simplify the model, e.g., by restricting some regression functions to be
time-constant. The hypothesis of a time-constant hazard difference β_j(t) = β_j, 0 < t ≤ τ*,
may be tested (e.g., using a supremum-based statistic such as \(\sup_{t \le \tau^*} |\hat B_j(t) - (t/\tau^*)\hat\beta_j|\),
where \(\hat\beta_j\) is an estimate under the hypothesis of a time-constant hazard difference for the
jth covariate). In the resulting semi-parametric model, where some covariate effects are
time-varying and others are time-constant, parameters may be estimated as described by
Martinussen and Scheike (2006, ch. 5). The ultimate model reduction leads to the additive
hazard model
α(t | Z ) = α0 (t) + β1 Z1 + · · · + β p Z p (3.25)
with a non-parametric baseline and time-constant hazard differences, much like the Cox
regression model (Lin and Ying, 1994).
The multiplicative models assuming either a non-parametric baseline (Cox) or a piece-wise
constant baseline (Poisson) were quite similar in terms of their estimates and, in a similar
fashion, an additive hazard model with a piece-wise constant baseline can be studied. This
leads to a model like (3.25) but now with α0 (t) = α0` when s`−1 ≤ t < s` , ` = 1, . . . , L. This
model is fully parametric and may be fitted using maximum likelihood. However, fitting
algorithms may be sensitive to starting values, reflecting the general difficulty in relation to
additive hazard models that estimated hazards may become negative.
Counting processes and martingales
The mathematical foundation of the models for analyzing intensities is that of count-
ing processes and martingales (e.g., Andersen et al., 1993).
3.6 Examples
This section presents a series of worked examples to illustrate the models for rates discussed
so far. We will first recap the results from the PBC3 trial (Example 1.1.1), next present
extended analyses of the childhood vaccination survival data from Guinea-Bissau (Example
1.1.2), and finally discuss Examples 1.1.4 and 1.1.3.
                          DTP doses
BCG      0               1               2              3
Yes      1159 (35.1%)    1299 (39.4%)    582 (17.6%)    261 (7.9%)
No       1942 (98.4%)    19 (1.0%)       9 (0.5%)       3 (0.2%)
Total    3101 (58.8%)    1318 (25.0%)    591 (11.2%)    264 (5.0%)
Table 3.2 Guinea-Bissau childhood vaccination study: Estimated coefficients (and SD) from Cox
models, using age as time-variable, for vaccination status at initial visit.
Since relatively few children have received multiple DTP doses, we will dichotomize that
variable in the following. Table 3.2 shows estimated regression coefficients from Cox mod-
els addressing the (separate and joint) effects of vaccinations on mortality. We have repeated
the analysis from the previous chapter where only BCG is accounted for, showing a beneficial
effect of this vaccine. It is seen that, unadjusted for BCG, there is no association
between any dose of DTP and mortality, while, adjusting for BCG, DTP tends to increase
mortality, albeit insignificantly, while the effect of BCG seems even more beneficial than
without adjustment. In this model, due to the collinearity between the two covariates, stan-
dard deviations are inflated compared to the two simple models. Finally, it is seen that there
is no important interaction between the effects of the two vaccines on mortality (though
the test for this has small power because few children received a DTP vaccination without
a previous BCG). An explanation behind these findings could be that BCG vaccination is,
indeed, beneficial as seen in the additive model, while DTP tends to be associated with
an increased mortality rate. This latter effect is, however, not apparent without adjustment
for BCG because most children who got the DTP had already received the ‘good’ BCG
vaccine. Further discussions are provided by Kristensen et al. (2000).
(a) Bleeding
Covariate                            β̂ (SD)            β̂ (SD), adjusted
Sclerotherapy vs. none               0.056 (0.392)      0.177 (0.433)
Propranolol vs. none                −0.040 (0.400)      0.207 (0.424)
Both treatments vs. none            −0.032 (0.401)      0.031 (0.421)
Sex, male vs. female                                   −0.026 (0.329)
Coagulation factors, % of normal                       −0.0207 (0.0078)
log2(bilirubin), µmol/L                                 0.191 (0.149)
Medium varices vs. small                                0.741 (0.415)
Large varices vs. small                                 1.884 (0.442)

(b) Death without bleeding
Covariate                            β̂ (SD)            β̂ (SD), adjusted
Sclerotherapy vs. none               0.599 (0.450)      0.826 (0.459)
Propranolol vs. none                −0.431 (0.570)     −0.160 (0.575)
Both treatments vs. none             1.015 (0.419)      0.910 (0.420)
Sex, male vs. female                                    0.842 (0.416)
Coagulation factors, % of normal                       −0.0081 (0.0068)
log2(bilirubin), µmol/L                                 0.445 (0.137)
Medium varices vs. small                                0.222 (0.347)
Large varices vs. small                                 0.753 (0.449)
arms may be quantified via hazard ratios from Cox models for each of the two outcomes.
The results from the Cox models are summarized in Table 3.3.
The four-sample logrank test statistics for the two outcomes take, respectively, the values
0.071 for bleeding and 12.85 for death without bleeding corresponding to P-values of 0.99
and 0.005. So, for bleeding there are no differences among the treatment groups, while
for death without bleeding there are. Inspecting the regression parameters for this outcome
in Table 3.3, it is seen that the two groups where sclerotherapy was given have higher
death rates. The Cox model with separate effects in all treatment arms can be reduced
to a model with additive (on the log-rate scale) effects of sclerotherapy and propranolol
(LRT = 1.63, 1 DF, P = 0.20) and in the resulting model, propranolol is insignificant (LRT
= 0.35, 1 DF, P = 0.56). The regression parameter for sclerotherapy in the final model is
β̂ = 1.018 (SD = 0.328).
The PROVA trial was randomized and, hence, adjustment for prognostic variables should
not change these conclusions (though, in the PBC3 trial, Example 1.1.1, such an adjustment
did change the estimated treatment effects considerably). Nevertheless, Table 3.3 for illus-
tration also shows treatment effects after such adjustments. Adjustment was made for four
covariates, of which two (coagulation factors and size of varices, the latter a three-category
variable), are associated with bleeding and two (sex and log2 (bilirubin)) with death with-
out bleeding. However, conclusions concerning treatment effects are not changed by this
Table 3.4 PROVA trial in liver cirrhosis: Cox model for the composite outcome bleeding-free sur-
vival.
Covariate                            β̂ (SD)
Sclerotherapy vs. none 0.525 0.313
Propranolol vs. none 0.100 0.338
Both treatments vs. none 0.495 0.292
Sex male vs. female 0.360 0.253
Coagulation factors % of normal -0.0136 0.0053
log2 (bilirubin) µmol/L 0.328 0.102
Medium varices vs. small 0.446 0.263
Large varices vs. small 1.333 0.301
adjustment, and the same is the case if further adjustment for age is done (results not
shown). It could be noted that, from the outset of the trial, the two end-points were consid-
ered equally important and, thus, merging them into the composite end-point ‘bleeding-free
survival’ is of interest. However, the results in Table 3.3 suggest that doing so would provide
a less clear picture. Estimates are shown in Table 3.4 where the significance of treatment
diminishes (LRT with 3 DF: P = 0.19) and, among the covariates, sex loses its significance
(Wald test: P = 0.16), while the three remaining covariates keep their significance. Note
also that Cox models for the two separate end-points are mathematically incompatible with
a Cox model for the composite end-point.
Figure 3.1 Testis cancer incidence and maternal parity: Lexis diagram showing the numbers of
person-years at risk (in units of 10,000 years) for combinations of age and birth cohort.
Fortunately, due to the likelihood factorization (3.2), and as explained in the introduction to
Chapter 2, the available data do allow a correct inference for the rate of testis cancer. Table
3.5 shows estimated log(hazard ratios) from Poisson models with age as the time-variable
(in categories 0-14, 15-19, 20-24, 25-29, and 30+ years) and including either parity alone
(1 vs. 2+) or parity together with birth cohort and mother’s age. It is seen that, unadjusted,
firstborn sons have an exp(0.217) = 1.24-fold increased rate of testicular cancer and this
estimate is virtually unchanged after adjustment for birth cohort and mother’s age. The
95% confidence interval for the adjusted hazard ratio is (1.05, 1.50), P = 0.01. The rates
increase markedly with age and LR tests for the adjusting factors are LRT= 2.53, 3 DF,
P = 0.47 for mother’s age and LRT= 9.22, 4 DF, P = 0.06 for birth cohort of son. It was
also studied whether the increased rate for firstborn sons varied among age groups (i.e., a
potential interaction between parity and age, non-proportional hazards). This was not the
case as seen by an insignificant LR-statistic for interaction (LRT= 7.76, 4 DF, P = 0.10).
It was finally studied how the rates of seminomas and non-seminomas, respectively, were
associated with parity. The hazard ratios (HRs) in relation to these competing end-points
were remarkably similar: HR= 1.23 (0.88, 1.72) for seminomas, HR= 1.27 (1.02, 1.56) for
non-seminomas.
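Hazard ratios and their confidence limits, as quoted throughout this section, follow from exponentiating β̂ ± 1.96·SD; a one-line sketch, using the adjusted parity coefficient from Table 3.5:

```python
import math

def hr_ci(beta, sd):
    """Hazard ratio with 95% limits from log(HR) estimate and its SD."""
    return (math.exp(beta),
            math.exp(beta - 1.96 * sd),
            math.exp(beta + 1.96 * sd))

# Parity 1 vs. 2+, adjusted model (Table 3.5): beta = 0.230, SD = 0.091.
hr, lo, hi = hr_ci(0.230, 0.091)
```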
TIME-DEPENDENT COVARIATES 87
Table 3.5 Testis cancer incidence and maternal parity: Poisson regression models.
Covariate                             β̂ (SD)            β̂ (SD), adjusted
Parity, 1 vs. 2+                       0.217 (0.084)      0.230 (0.091)
Age (years), 0-14 vs. 20-24           −4.031 (0.211)     −4.004 (0.239)
  15-19 vs. 20-24                     −1.171 (0.119)     −1.167 (0.125)
  25-29 vs. 20-24                      0.560 (0.098)      0.617 (0.104)
  30+ vs. 20-24                        0.753 (0.133)      0.954 (0.154)
Mother's age (years), 12-19 vs. 30+                       0.029 (0.241)
  20-24 vs. 30+                                           0.058 (0.222)
  25-29 vs. 30+                                          −0.117 (0.225)
Son cohort, 1950-57 vs. 1973+                            −0.363 (0.288)
  1958-62 vs. 1973+                                      −0.080 (0.248)
  1963-67 vs. 1973+                                       0.124 (0.237)
  1968-72 vs. 1973+                                       0.134 (0.236)
\[ \alpha_{hj}(t) \approx P\bigl(V(t + dt) = j \mid V(t) = h \text{ and the past for } s < t\bigr)/dt, \]
if the event is a transition from state h to state j (and dt > 0 is ‘small’). In the model dis-
cussions and examples given so far, the past for s < t only included time-fixed covariates,
recorded at the time of study entry. However, one of the strengths of hazard regression
models is their ability to also include covariates that are time-dependent. Time-dependent
covariates can be quite different in nature and, in what follows, we will distinguish between
adapted and non-adapted covariates and, for the latter class, between internal (or endoge-
nous) and external (or exogenous) covariates. In later Subsections 3.7.5-3.7.8 we will, via
examples, illustrate different aspects of hazard regression models with time-dependent co-
variates.
3.7.3 Inference
Inference for the parameters in the hazard model can proceed along the lines described in
earlier sections. For the Cox model, the regression coefficient β can be estimated from the
Cox partial log-likelihood
\[ l(\beta) = \sum_{\text{event times } X} \log\Biggl( \frac{\exp(\beta Z_{\text{event}}(X))}{\sum_{j \text{ at risk at time } X} \exp(\beta Z_j(X))} \Biggr), \]
where, at event time X, the covariate values at that time are used for everyone at risk at X.
Similarly, the cumulative baseline hazard A_0(t) is estimated by
\[ \hat A_0(t) = \sum_{\text{event times } X \le t} \frac{1}{\sum_{j \text{ at risk at time } X} \exp(\hat\beta Z_j(X))}. \]
Least squares estimation in the additive Aalen model (Sections 2.2.4 and 3.5) can, likewise,
be modified to include time-dependent covariates. The arguments for the Cox model depend
on whether the time-dependent covariates are adapted or not. These arguments are outlined
in the next section.
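In practice, time-dependent covariates are handled by coding the data in counting-process format, one (start, stop] interval per constant covariate value; the partial log-likelihood above can then be evaluated directly. A minimal sketch with invented illustrative records:

```python
import math

# Counting-process format: (start, stop, event at stop, z on (start, stop]).
rows = [
    (0.0, 2.0, 0, 0.0), (2.0, 5.0, 1, 1.0),   # subject 1: z switches at t = 2
    (0.0, 4.0, 1, 0.0),                        # subject 2
    (0.0, 6.0, 0, 1.0),                        # subject 3
]

def partial_loglik(beta):
    """Cox partial log-likelihood l(beta) with a time-dependent covariate."""
    ll = 0.0
    for start, stop, event, z_ev in rows:
        if event != 1:
            continue
        # Risk set at the event time: intervals covering 'stop'
        denom = sum(math.exp(beta * z) for s, t, _, z in rows if s < stop <= t)
        ll += beta * z_ev - math.log(denom)
    return ll

ll0 = partial_loglik(0.0)
```

Maximizing over β proceeds exactly as in the time-fixed case; only the construction of the risk-set covariate values changes.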
The ability to do micro-simulation (Section 5.4) depends on whether time-dependent co-
variates are adapted or not. Micro-simulation of models including non-adapted covariates
requires joint models for the multi-state process and the time-dependent covariate, and in
Section 7.4 we will briefly discuss joint models.
There is a related feature (to be discussed in Sections 4.1 and 5.2) that depends on whether
a hazard model includes time-dependent covariates or not. This is the question of whether it
is possible to estimate marginal parameters by plug-in. This will typically only be possible
with time-fixed covariates, with deterministic time-dependent covariates, or with exoge-
nous covariates, for the latter situation, see Yashin and Arjas (1988).
Covariate                β̂ (SD)           β̂ (SD)           β̂ (SD)
Bipolar vs. unipolar     0.366 (0.094)    0.318 (0.095)    0.067 (0.097)
N(t−)                                     0.126 (0.0087)   0.425 (0.032)
N(t−)²                                                    −0.0136 (0.0016)
Ni (t−) as a time-dependent covariate. This is an adapted variable. Since its effect appears
highly non-linear (P < 0.001 for linearity in a model including both Ni (t−) and Ni (t−)2 ),
we quote the hazard ratio for diagnosis from the model including the quadratic term which
is exp(0.067) = 1.069 with 95% confidence limits from 0.884 to 1.293. In a model includ-
ing only Ni (t−), it is seen that the recurrence rate increases with the number of previous
episodes and so does the rate in the quadratic model where the maximum of the estimated
parabola for Ni (t−) is −0.425/(−2 · 0.0136) = 15.6. Table 3.6 summarizes the results. In-
cluding Ni (t−) as a categorical variable instead gives the hazard ratio exp(0.090) = 1.094
(0.904, 1.324). In both cases, a substantial reduction of the hazard ratio is seen when com-
paring to the value 1.442 from the model without adjustment for Ni (t−). The explanation
was given in Section 2.5 in connection with the PWP model where adjustment for previ-
ous episodes was carried out using (time-dependent) stratification, namely that the occur-
rence of repeated episodes is itself affected by the initial diagnosis, and previous episodes,
therefore, serve as an intermediate variable between the baseline covariate and recurrent
episodes. The AG model including functions of Ni (t−) is, thus, more satisfactory in the
sense that the independence assumption is relaxed; however, it is less clear if it answers the
basic question of comparison between the two diagnostic groups. To answer this question,
models for the marginal mean number of events over time may be better suited (Sections
4.2.3 and 5.5.4).
Another time-dependent covariate which may affect the estimate is current calendar period.
This is also an adapted covariate for given value of the baseline covariate calendar time at
diagnosis. The number of available beds in psychiatric hospitals has been decreasing over
time, and if access to hospital varies between the two diagnostic groups, then adjustment
for period may give rise to a different hazard ratio for diagnosis. We, therefore, created a
covariate by categorizing current period, i.e., calendar time at diagnosis + follow-up time,
into the intervals: Before 1965, 1966-70, 1971-75, 1976-80, 1981+, and adjusted for this in
the simple AG model including only diagnosis. As seen in Table 3.7, the resulting hazard
ratio for diagnosis is only slightly smaller than without adjustment (exp(0.361) = 1.435,
95% c.i. from 1.193 to 1.728). It is also seen that the recurrence rate tends to decrease with
calendar period.
Covariate                            β̂ (SD)
Bipolar vs. unipolar 0.361 0.095
Period 1966-70 vs. 1959-65 -0.251 0.208
1971-75 vs. 1959-65 -0.179 0.331
1976-80 vs. 1959-65 -0.367 0.439
1981+ vs. 1959-65 -1.331 0.554
the basic trial question; however, it is of clinical interest to study the 1 → 2 transition rate
in the illness-death model of Figure 1.3. For this purpose, a choice of time-variable in the
model for the rate α12 (·) is needed. For the two-state model for survival analysis and for
the competing risks model, a single time origin was assumed and all intensities depended
on time t since that origin. The same is the case with the rates α01 (t) and α02 (t) in the
illness-death model. However, for the rate α12 (·), both the time-variable t (time since ran-
domization) and time since entry into state 1, duration d = d(t) = t − T1 , may play a role.
Note the similarity with the choice of time-variable for models for transition intensities in
models for recurrent events (Section 2.5). If α12 (·) only depends on t, then the multi-state
process is said to be Markovian; if it depends on d, then it is semi-Markovian; see Section
1.4. In the Markovian case, inference for α12 (t) needs to take delayed entry into account;
if α12 (·) only depends on d, then this is not the case.
Results from Cox regression analyses are displayed in Table 3.8. It should be kept in mind
when interpreting these results that this is a small data set with 50 patients and 29 deaths
(Table 1.2). We first fitted a (Markov) model with t as the baseline time-variable including
treatment, sex, and log2 (bilirubin) (top panel (a), first column in Table 3.8). The latter two
covariates were not strongly associated with the rate and their coefficients are not shown –
the same holds for subsequent models. In this model, the treatment effect was significant
(LRT= 11.12, 3 DF, P = 0.01) and there was an interaction between propranolol and scle-
rotherapy (LRT= 6.78, 1 DF, P = 0.009). The combined treatment group seems to have a
high mortality rate and the group receiving only sclerotherapy a low rate, however, it should
be kept in mind that one is no longer comparing randomized groups because the patients
entering the analysis are selected by still being alive and having experienced a bleeding –
features that may, themselves, be affected by treatment. The Breslow estimate of the cu-
mulative baseline hazard is shown in Figure 3.2 and is seen to increase sharply for small
values of t. It should be kept in mind that there is delayed entry and, as a consequence, few
patients at risk at early failure times: For the first five failure times the numbers at risk were,
respectively, 2, 3, 5, 6, and 6. The Markov assumption corresponds to no effect of duration
since bleeding on the mortality rate after bleeding, and this hypothesis may be investigated
using adapted time-dependent covariates. Defining di (t) = t − T1i where T1i is the time of
entry into state 1 for patient i, i.e., his or her time of bleeding, the following two covariates
were added to the first model
(a) Time since randomization as baseline time-variable
Covariate                               β̂ (SD)           β̂ (SD)
Sclerotherapy vs. none                 −1.413 (0.679)    −1.156 (0.684)
Propranolol vs. none                   −0.115 (0.595)    −0.024 (0.631)
Both treatments vs. none                0.733 (0.544)     0.425 (0.611)
d(t) < 5 days vs. ≥ 10 days                               2.943 (0.739)
5 days ≤ d(t) < 10 days vs. ≥ 10 days                     2.345 (0.803)

(b) Duration as baseline time-variable
Covariate                               β̂ (SD)           β̂ (SD)
Sclerotherapy vs. none                 −0.997 (0.643)    −1.019 (0.650)
Propranolol vs. none                   −0.300 (0.596)    −0.312 (0.601)
Both treatments vs. none                0.871 (0.514)     0.847 (0.524)
t < 1 year vs. t ≥ 2 years                               −0.172 (0.910)
1 year ≤ t < 2 years vs. t ≥ 2 years                     −0.221 (0.886)
see top panel (a), second column of Table 3.8. These covariates were strongly associated
with the mortality rate (P < 0.001), and the Markov assumption is clearly rejected. We
can also see that the treatment effect changes somewhat and it is no longer statistically
significant (P = 0.29). The same conclusion is arrived at if, instead, the time-dependent
covariate di (t) is included with an assumed linear effect on the log(rate) (not shown).
The coefficients for the time-dependent covariates show that the mortality rate is very high
shortly after the bleeding episode and instead of attempting to model this effect parametri-
cally, using time-dependent covariates, an alternative would be to use duration since bleed-
ing as the baseline time-variable in a Cox model. Results from such models are shown
in the lower panel (b) of Table 3.8. In the model including treatment (together with sex
and log2 (bilirubin) – coefficients not shown) this is statistically significant (P = 0.02), and
there is a tendency that the combined treatment group has the highest mortality rate and
the group receiving only sclerotherapy the lowest. The Breslow estimate of the cumulative
baseline hazard is shown in Figure 3.3 and is seen to increase sharply for small values of
duration. With this time-variable there is no left-truncation and the estimator does not have
a particularly large variability for small values of duration.
To this model, one may add functions of t as time-dependent covariates to investigate
whether time since randomization affects the mortality rate. Neither a piece-wise constant
effect (Table 3.8, lower panel (b), second column) nor a linear effect (not shown) suggested
any importance and their inclusion has little impact on the estimated treatment effects.
We will prefer the model with duration as the baseline time-variable because, of the two
time-variables, duration seems to have the strongest effect on the mortality rate and by
using this in the Cox baseline hazard, one avoids making parametric assumptions about the
way in which it affects the rate.
Figure 3.2 PROVA trial in liver cirrhosis: Breslow estimate for the cumulative baseline hazard in a
Cox model for the 1 → 2 transition rate as a function of time since randomization.
Figure 3.3 PROVA trial in liver cirrhosis: Breslow estimate for the cumulative baseline hazard in a
Cox model for the 1 → 2 transition rate as a function of time since bleeding (duration).
Table 3.9 PROVA trial in liver cirrhosis: Deaths after bleeding/person-years at risk according to
duration and time since randomization.
The choice between duration of bleeding and time since randomization as baseline time-
variable may be entirely avoided by, instead of using a Cox regression model, analyzing
the data using a Poisson regression model and splitting the follow-up time (since bleeding)
according to both time-variables. This will require a choice of cut-points for both time-
variables. Using the same intervals as in the two Cox models (i.e., 1 and 2 years for t
and 5 and 10 days for d(t)) gives the distribution of deaths and person-years at risk as
shown in Table 3.9. It is seen that no patients with a bleeding episode after 2 years since
randomization died within the first 10 days after bleeding, and one is also reminded of the
fact that this is a small data set. Having split the follow-up time after bleeding according
to the two time-variables, Poisson regression models including either time-variable or both
may be fitted. The results from these models, also including the insignificant variables sex
and log2 (bilirubin), are presented in Table 3.10. It can be noticed (cf. Section 2.2) that
results from similar Cox or Poisson models tend to be very close and, furthermore, that it
is not crucial for the estimated regression coefficients for treatment which time-variable(s)
we adjust for. This is also the case in a model allowing for an interaction between the
two time-variables (not shown). However, the fact that in the model including both time-
variables additively, duration is strongly associated with the rate (P < 0.001) but time since
randomization is not (P = 0.53) suggests that it is most important to account for duration
since bleeding.
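The splitting of follow-up time on two time scales can be sketched in code. The book's own software is R and SAS; the following is a purely illustrative Python sketch (function name, units in years, and interval coding are ours) that splits one patient's follow-up after bleeding into cells defined jointly by time since randomization t (cut-points 1 and 2 years) and duration since bleeding d(t) = t − t_bleed (cut-points 5 and 10 days):

```python
import bisect

def split_followup(t_bleed, t_exit, death, t_cuts=(1.0, 2.0),
                   d_cuts=(5 / 365.25, 10 / 365.25)):
    """Split follow-up after bleeding into cells defined jointly by time since
    randomization (t) and duration since bleeding (d = t - t_bleed), all in years.
    Returns (t_interval, d_interval, person_years, deaths) per cell."""
    # all change points on the study-time axis
    points = {t_bleed, t_exit}
    points.update(c for c in t_cuts if t_bleed < c < t_exit)
    points.update(t_bleed + c for c in d_cuts if t_bleed + c < t_exit)
    grid = sorted(points)
    cells = []
    for lo, hi in zip(grid[:-1], grid[1:]):
        mid = (lo + hi) / 2
        t_int = bisect.bisect(t_cuts, mid)            # interval index on the t-scale
        d_int = bisect.bisect(d_cuts, mid - t_bleed)  # interval index on the d-scale
        died = death and hi == t_exit                 # event falls in the last piece
        cells.append((t_int, d_int, hi - lo, int(died)))
    return cells
```

Summing person-years and deaths over all patients within each (t, d) cell then gives a table like Table 3.9, to which the Poisson models of Table 3.10 can be fitted.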
The Cox model requires a specification of the time-variable for the baseline hazard.
If several time-variables affect the intensity (e.g., both time on study and duration in
a state), there is a choice to be made: Which time-variable should be baseline, and
which can be handled using adapted time-dependent covariates? The general advice is to choose as baseline time-variable one that has a marked effect on the hazard and that may be hard to model parametrically.
For a Poisson model, several time-variables can be handled in parallel (i.e., without
pin-pointing one as ‘baseline’); however, in that case all time-variables must be
categorized with an assumption that they have a piece-wise constant effect on the
intensity.
Table 3.10 PROVA trial in liver cirrhosis: Poisson regression models for the rate of death after
bleeding accounting for duration since bleeding (a), time since randomization (b), or both (c).
(a)
Covariate β̂ SD
Sclerotherapy vs. none -1.130 0.642
Propranolol vs. none -0.314 0.589
Both treatments vs. none 0.967 0.516
d(t) <5 days vs. ≥10 days 3.602 0.439
5≤ d(t) <10 days vs. ≥10 days 2.844 0.637
(b)
Covariate β̂ SD
Sclerotherapy vs. none -1.281 0.652
Propranolol vs. none -0.432 0.566
Both treatments vs. none 0.801 0.525
t <1 year vs. t ≥2 years 1.507 0.579
1≤ t <2 years vs. t ≥2 years 0.430 0.648
(c)
Covariate β̂ SD
Sclerotherapy vs. none -1.110 0.648
Propranolol vs. none -0.318 0.579
Both treatments vs. none 0.826 0.514
d(t) <5 days vs. ≥10 days 3.350 0.464
5≤ d(t) <10 days vs. ≥10 days 2.583 0.659
t <1 year vs. t ≥2 years 0.733 0.618
1≤ t <2 years vs. t ≥2 years 0.189 0.655
any evidence against proportionality. It is also seen that, for each of the three covariates,
the coefficient has the same sign for all choices of f (t). The tendencies for treatment and
albumin are that the hazard ratio increases over time while, for bilirubin, it decreases.
is an adapted time-dependent covariate. For relapse, the estimated hazard ratio for GvHD
is exp(β̂2) = 0.858 with 95% confidence limits from 0.663 to 1.112 (P = 0.25). The proportional hazards assumption was evaluated by including the time-dependent covariate Zi(t) log(t + 1), which gives a P-value of 0.35. A graphical evaluation, following the lines from the stratified Cox model in Section 2.2, can be performed by plotting Â12(t) against Â02(t), see Figure 3.7. Under proportional hazards, the resulting curve should be a straight line through the point (0,0) with slope equal to exp(β̂2) = 0.858, and this is seen to be a good approximation. For death, the similar analyses yield exp(β̂3) = 3.113 with 95% confidence limits from 2.577 to 3.760. Addition of an interaction between GvHD and log(t + 1)
Figure 3.4 Bone marrow transplantation in acute leukemia: Cumulative mortality rate after relapse
(dashed line); cumulative mortality rate after GvHD (dotted line); cumulative mortality rate without
relapse or GvHD (solid line) (GvHD: Graft versus host disease).
Figure 3.5 Bone marrow transplantation in acute leukemia: Cumulative rate of GvHD (Graft versus
host disease).
Figure 3.6 Bone marrow transplantation in acute leukemia: Cumulative relapse rate after GvHD
(dashed line); cumulative relapse rate without GvHD (solid line) (GvHD: Graft versus host disease).
gives P = 0.11, and the goodness-of-fit plot is seen in Figure 3.8. This figure does suggest
some deviations from proportional hazards apparently caused by a too low hazard ratio
early on (convex shape of the curve); however, the formal test is insignificant.
Bone marrow transplantation studies often aim at studying the two adverse end-points re-
lapse and death without relapse, the latter often termed death in remission or treatment-
related mortality, both signaling that the treatment with BMT is no longer effective. In such
a situation, a relevant multi-state model to use would be the competing risks model, Figure
1.2, i.e., the disease course after relapse is not studied, and GvHD is no longer considered
a separate state in the model. However, in an analysis of the rates of relapse and death in
remission, it would still be of interest to study how these may be affected by occurrence of
GvHD over time. This can be done as just described, i.e., by including the time-dependent
GvHD covariate Zi (t) in the Cox models for the two rates. However, for the competing
risks model, this will no longer be an adapted time-dependent covariate but rather a non-
adapted, internal or endogenous time-dependent covariate, because the past history at time
t in the competing risks model does not contain information on GvHD and because the
existence of Zi (t) requires subject i to be alive. At this point, the distinction between these
two situations may look rather academic but when, later in the book (Section 5.2.4), we go
beyond rate models and also target marginal parameters, such as the probability of experi-
encing a relapse, the distinction will become important. We will conclude this example by
presenting results from analyses of these two event rates taking both of the time-dependent
covariates GvHD and ANC500 into account, the latter taking the value 1 at time t if, at that
time, the Absolute Neutrophil Count is above 500 cells per µL, based on repeated blood
Figure 3.7 Bone marrow transplantation in acute leukemia: Cumulative relapse rate with GvHD
plotted against cumulative relapse rate without GvHD. The dashed straight line has slope equal to
exp(βb2 ) = 0.858 (GvHD: Graft versus host disease).
Figure 3.8 Bone marrow transplantation in acute leukemia: Cumulative death rate with GvHD
plotted against cumulative death rate without GvHD. The dashed straight line has slope equal to
exp(βb3 ) = 3.113 (GvHD: Graft versus host disease).
Table 3.12 Bone marrow transplantation in acute leukemia: Cox models for relapse and death in
remission (GvHD: Graft versus host disease, BM: Bone marrow, PB: Peripheral blood, AML: Acute
myelogenous leukemia, ALL: Acute lymphoblastic leukemia, ANC: Absolute neutrophil count)
(a) Relapse
Covariate β̂ SD β̂ SD
GvHD(t) -0.184 0.134 -0.188 0.134
Age per 10 years -0.040 0.045 -0.039 0.045
Graft type BM only vs. PB/BM -0.125 0.135 -0.130 0.135
Disease ALL vs. AML 0.563 0.130 0.562 0.130
ANC500(t) -2.138 1.077
(b) Death in remission
Covariate β̂ SD β̂ SD
GvHD(t) 1.041 0.098 1.040 0.099
Age per 10 years 0.263 0.033 0.263 0.033
Graft type BM only vs. PB/BM -0.085 0.096 -0.140 0.096
Disease ALL vs. AML 0.334 0.098 0.336 0.098
ANC500(t) -2.228 0.305
samples taken during follow-up, and equal to 0 otherwise. Table 3.12 shows the results. It
is seen that the earlier results for GvHD are sustained after adjustment for age, graft type,
disease (and ANC500), i.e., GvHD tends to reduce the relapse rate and increase the death
rate. Higher age markedly increases the death rate and tends to be associated with a lower
relapse rate. Patients with ALL have higher event rates than those with AML, and patients
receiving only bone marrow tend to have lower event rates than those who also receive
peripheral blood. Finally, obtaining an ANC above 500 markedly reduces both event rates.
In principle, one could go a step further and concentrate on the mortality rate (cf. Figure
1.1), treating relapse as an internal time-dependent covariate. However, as indicated in Fig-
ure 3.4, occurrence of relapse is such a serious event that it is typically treated as a separate
end-point.
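In practice, including a time-dependent covariate such as GvHD(t) or ANC500(t) in a Cox model amounts to representing each subject in counting-process (start, stop) format. A minimal Python sketch for a single binary covariate (the function and its argument names are ours, not the book's):

```python
def to_start_stop(exit_time, event, switch_time=None):
    """Convert one subject's record into counting-process rows
    (start, stop, status, Z(t)) for a binary time-dependent covariate
    that switches from 0 to 1 at switch_time (None if it never switches)."""
    rows = []
    if switch_time is not None and switch_time < exit_time:
        rows.append((0.0, switch_time, 0, 0))            # at risk with Z(t) = 0
        rows.append((switch_time, exit_time, event, 1))  # at risk with Z(t) = 1
    else:
        rows.append((0.0, exit_time, event, 0))
    return rows
```

A subject developing GvHD at month 6 and dying at month 24 thus contributes one event-free row with GvHD = 0 and one row with GvHD = 1 carrying the event.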
• Different covariate effects and non-proportional hazards
α02(t | Z) = α02,0(t) exp(β0 Z), α12(t | Z) = α12,0(t) exp(β1 Z). (3.28)
Include the type-specific covariates (Z0, Z1) and stratify by Type (time-dependent strata). This is equivalent to fitting separate models for the two transition rates.
• Same covariate effect and non-proportional hazards
α02(t | Z) = α02,0(t) exp(β Z), α12(t | Z) = α12,0(t) exp(β Z). (3.29)
Include the original covariate Z and stratify by Type (time-dependent strata).
• Different covariate effects and proportional hazards
α02(t | Z) = α0(t) exp(β0 Z), α12(t | Z) = α0(t) exp(β1 Z + γ). (3.30)
Include the type-specific covariates (Z0, Z1) and use Type as a (time-dependent) covariate – the latter yielding the hazard ratio exp(γ).
• Same covariate effect and proportional hazards
α02(t | Z) = α0(t) exp(β Z), α12(t | Z) = α0(t) exp(β Z + γ). (3.31)
Include the original covariate Z and use Type as a (time-dependent) covariate – the latter yielding the hazard ratio exp(γ).
Models (3.28) and (3.29), respectively (3.30) and (3.31), may be compared using likelihood
ratio tests, i.e., it can be examined whether the regression coefficients for Z can be taken to
be the same for the two transition types. Comparing models (3.28) and (3.30), respectively
(3.29) and (3.31), corresponds to an examination of proportional hazards as exemplified,
e.g., in Sections 2.2.2 and 3.7.8. Multiple regression models, i.e., including more covariates
with combinations of identical and different effects on the two rates, can be set up along
these lines and will be exemplified below. For all models, model-based SD can be applied.
MODELS WITH SHARED PARAMETERS 105
The models (3.28) and (3.30) with type-specific covariates correspond to inclusion of an
interaction between Type and Z. This observation suggests how joint Poisson models for
the two mortality rates may also be fitted. This will require a duplicated data set including
cases of death and person-years at risk, both before and after disease occurrence, and where
interactions between some covariates and Type may be included.
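The data duplication trick can be sketched as follows, assuming a simple one-record-per-subject input (field names hypothetical): each subject contributes a 0 → 2 risk period and, if the disease occurred, an additional 1 → 2 risk period, with Type and the Type-by-covariate interactions (type-specific covariates z0, z1) available as columns:

```python
def stack_transitions(subjects):
    """Build a stacked data set for joint analysis of the 0->2 and 1->2
    mortality rates. subjects: dicts with 'disease_time' (None if the disease
    never occurred), 'exit', 'death', and a covariate 'z'. One row per risk
    period; 'type' serves as stratum or time-dependent covariate."""
    rows = []
    for s in subjects:
        dt = s['disease_time']
        if dt is None or dt >= s['exit']:
            # at risk for the 0->2 transition throughout follow-up
            rows.append(dict(start=0.0, stop=s['exit'], status=s['death'],
                             type=0, z0=s['z'], z1=0.0))
        else:
            # 0->2 risk period ends (event-free) at disease occurrence ...
            rows.append(dict(start=0.0, stop=dt, status=0,
                             type=0, z0=s['z'], z1=0.0))
            # ... followed by the 1->2 risk period
            rows.append(dict(start=dt, stop=s['exit'], status=s['death'],
                             type=1, z0=0.0, z1=s['z']))
    return rows
```

Fitting a Cox model to this stacked data with z0, z1 and stratification on type corresponds to model (3.28); replacing the stratification by type as a covariate gives the proportional-hazards variants.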
The data duplication trick for the illness-death model works in a similar fashion for other
multi-state models, including the competing risks model (Figure 1.2). Thus, for the PBC3
trial, models with common covariate effects on the rate of death without transplantation
and the rate of transplantation could be fitted as well as models with proportional cause-
specific hazards. However, since the interpretation of such common effects on rates of quite
different events is not attractive, we will not illustrate this feature in the following.
both covariates have quite similar effects. The cumulative baseline rates from this model are shown in Figure 3.10 and do not contra-indicate proportionality, so model (b) in the table shows results from a model assuming α13(t) = exp(γ)α03(t), corresponding to treating Type (GvHD) as a time-dependent covariate. Here, γ̂ = 0.842 (SD = 0.269). Proportionality is also supported by including the covariate GvHD(t) log(t), for which the likelihood ratio test is LRT = 2.08 with 1 DF. The resulting model is of the form (3.30) with different coefficients and proportional hazards. In model (c) of the table, coefficients are shown for the model where the two type-specific covariates for age and disease are replaced by common covariates, leading to a model of the form (3.31). The LRT for common coefficients is 0.74 with 2 DF, supporting the simpler model, in which the gain in efficiency for the regression coefficients can be noticed. In model (c), γ̂ = 1.049 (SD = 0.098).
Figure 3.9 PROVA trial in liver cirrhosis: Breslow estimates for the cumulative baseline mortality
rates in a joint Cox model for the 0 → 2 (solid line) and 1 → 2 transition rates (dashed line).
Figure 3.10 Bone marrow transplantation in acute leukemia: Breslow estimates for the cumulative
baseline rates of death in remission in a joint Cox model for the mortality rates with (dashed line)
or without GvHD (solid line) (GvHD: graft versus host disease).
Table 3.14 Bone marrow transplantation in acute leukemia: Joint Cox models for the rates of death
in remission without or with GvHD (GvHD: Graft versus host disease, disease: Acute myelogenous
leukemia (AML) vs. acute lymphoblastic leukemia (ALL)).
has to be argued that all the models studied in Section 3.8.1, i.e., with covariates having
either different or common regression coefficients for the different transition hazards and
with separate or proportional baseline hazards, may be written in the form
αhji(t) = αv0(t) exp(βT Zhji(t))
= αφ(h,j)0(t) exp(βT Zhji(t)). (3.32)
Note that (3.33) no longer factorizes over types ν, since the same β appears for all types. Transforming by the logarithm, differentiating with respect to a single αv0(t), and solving for αv0(t) as in Section 3.3 leads, for fixed β, to the estimate
α̂v0(t)dt = ∑i ∑φ(h,j)=v dNhji(t) / ∑i ∑φ(h,j)=v Yhi(t) exp(βT Zhji(t)) = dNv(t) / S0v(β, t), (3.34)
say. Inserting this into (3.33) leads to the relevant version of the stratified Cox partial likelihood
PL(β) = ∏i ∏v ∏φ(h,j)=v ∏t ( Yhi(t) exp(βT Zhji(t)) / S0v(β, t) )^dNhji(t),
and the Breslow estimator becomes
Âv0(t) = ∫_0^t dNv(u) / S0v(β̂, u)
with notation as in (3.34), see Exercise 3.6. Since the resulting estimators are likelihood-
based, model-based SD may be obtained from the second derivative of the log-likelihood,
and likelihood ratio tests are also available.
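The Breslow step in this stacked setting is straightforward to compute; a deliberately one-dimensional Python sketch (function and field names ours) that, for each transition type v, cumulates dNv(u)/S0v(β̂, u) over the observed event times:

```python
import math

def breslow(stacked, beta):
    """Breslow estimator of the cumulative baseline hazard A_v0(t) for each
    transition type v in a stacked counting-process data set, with one shared
    coefficient beta for a single covariate 'z' (an illustrative sketch)."""
    out = {}
    for v in sorted({r['type'] for r in stacked}):
        # observed event times for type v
        events = sorted(r['stop'] for r in stacked if r['type'] == v and r['status'])
        cum, curve = 0.0, []
        for u in events:
            # S_0v(beta, u): sum of exp(beta * z) over type-v rows at risk at u
            s0 = sum(math.exp(beta * r['z']) for r in stacked
                     if r['type'] == v and r['start'] < u <= r['stop'])
            cum += 1.0 / s0
            curve.append((u, cum))
        out[v] = curve
    return out
```

With β̂ plugged in for beta, this returns step functions like the curves in Figures 3.9 and 3.10.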
3.9 Frailty models
In Chapter 2 and in previous sections of the present chapter, we have shown several ex-
amples of regression models for a single transition rate in a multi-state model and unless
there were parameters that were shared among several transitions (see Section 3.8), these
intensities could be analyzed separately. In all these examples, an assumption of indepen-
dence among observational units (typically among subjects/patients) was reasonable. In
the present section, we will study situations where the independence assumption is not nec-
essarily met. First of all, correlated event history data may be a consequence of subjects
‘coming in clusters’, such as members of the same family, patients attending the same gen-
eral physician or medical center, or inhabitants in the same community. In these situations
it is likely that the event occurrences for subjects from the same cluster are more alike
than those among clusters. An example is the bone marrow transplantation study (Example
1.1.7) where the 2,009 patients were treated in one of 255 different medical centers and
where patients attending the same center may share some common traits and, thereby, be
more alike than patients from different centers. As a quite different situation, one may be
interested in the distribution of times to entry into different states in a multi-state model,
e.g., time to event no. h = 1, 2, . . . in a recurrent events situation (e.g., Figure 1.5), or time
to relapse or to GvHD in the model for the bone marrow transplantation data (Example
1.1.7, Figure 1.6). Here, within any given patient, these times will likely be dependent.
A classical way of modeling dependence among observational units in statistics is to use
random effects to represent unobserved common traits, and in event history analysis random
effects models are known as frailty models (e.g., Hougaard, 2000). In the present section,
we will discuss frailty models, first presenting some of the more technical inference details
(Section 3.9.1) and, next, focusing on two major examples of using frailty models. Thus,
in Section 3.9.2, we will study shared frailty models for clustered data, while Section 3.9.3
presents frailty models for recurrent events, possibly jointly with mortality. In Sections 4.3
and 5.6, we will return to the problem of dependent event history data and discuss marginal
hazard models, and in Section 7.2 we will summarize our discussions.
In the situation where, for each subject i = 1, . . . , n (assumed independent), we have a multi-
state model with possible transition types ν = 1, . . . , K (Section 3.1), the frailty model for
the type ν transition could be
ανi(t | Zi, Ai) = Aνi ανi^c (t | Zi) (3.36)
where the independent frailties Ai = (A1i, ..., AKi), i = 1, ..., n, follow some K-variate distribution across the population. In both cases, inference for parameters in the baseline intensities ανi^c (t | Zi), respectively αih^c (t | Zih) (the conditional hazards given covariates for a
FRAILTY MODELS 111
frailty of 1) and in the frailty distribution may, in principle, be performed using the likelihood approach as described in Section 3.1. For given frailty, observations are independent with the likelihood given by the Jacod formula (3.1), and the likelihood for the observed
data is obtained by integrating out the frailty. This may entail technical and numerical
challenges and, furthermore, for this approach to work, the assumption that censoring is
independent of the frailty must be imposed (Nielsen et al., 1992). This is because the full
likelihood, as explained in Section 3.1, in addition to the factors arising from (3.1), also in-
volves factors reflecting the censoring distribution, and if these factors depend on the frailty,
then integration of the likelihood over the frailty distribution may become intractable.
Models like (3.35) and (3.36) were discussed by Putter and van Houwelingen (2015) and
by Balan and Putter (2020). Even though models of both types may, in principle, be ana-
lyzed, these authors concluded that frailty models are most useful for clustered data and
for recurrent events (without or with competing risks). As a side remark, we can mention
that frailty models may also be studied for univariate survival data to explain effects of
omitted covariates (e.g., Aalen et al., 2008, ch. 6). However, as discussed by Putter and van
Houwelingen (2015) and Balan and Putter (2020), this may become quite speculative be-
cause information on effects of missing covariates comes from deviations from proportional
hazards and a proportional hazards model with a missing covariate and a non-proportional
hazards model will be virtually indistinguishable. We will therefore, in what follows, concentrate on frailty models for clustered data and for recurrent events.
Here, the Zih are observed individual level covariates and A1 , . . . , An are independent and
identically distributed random frailties representing unobserved covariates shared by mem-
bers of cluster i. We will assume that their distribution is independent of the observed
covariates. Standard choices for the frailty distribution include the gamma distribution with
mean E(A) = 1 and an unknown standard deviation σ = SD(A) to be estimated, and the
log-normal distribution with E(log(A)) = 0 and SD(log(A)) = σ . The parameter σ quan-
tifies the unobserved heterogeneity among clusters and, at the same time, the intra-cluster
association. We will exemplify this below. The baseline hazard could be of the Cox-form
with an unspecified α0 (t), possibly stratified, or α0 (t) could be piece-wise constant. The
regression parameters β` in the linear predictor LPih = β1 Zih1 + · · · + β p Zihp have a within-
cluster interpretation, exp(β` ) giving the hazard ratio for a one-unit difference in covariate
Zih` for given values of the remaining observed covariates, cf. Section 2.2.1, and for given
frailty.
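That σ quantifies both the heterogeneity among clusters and the intra-cluster association can be illustrated by simulation. The following is a small Python sketch under assumptions of our own choosing (gamma frailty with mean 1 and SD sigma, a constant conditional hazard base_rate, two members per cluster); it is not the book's code:

```python
import random

def simulate_cluster(sigma, base_rate, n_members, rng):
    """One cluster under a shared gamma frailty model: A has mean 1 and
    SD sigma; given A, members' event times are exponential with rate
    A * base_rate. Purely illustrative."""
    shape = 1.0 / sigma ** 2                    # Gamma(shape, scale) with mean 1
    A = rng.gammavariate(shape, 1.0 / shape)    # and variance sigma^2
    return [rng.expovariate(A * base_rate) for _ in range(n_members)]

# Within-cluster event times share the frailty A and are therefore
# positively correlated across many simulated two-member clusters.
rng = random.Random(1)
pairs = [simulate_cluster(0.5, 0.5, 2, rng) for _ in range(20000)]
```

Setting sigma close to 0 makes all clusters alike (independence), while larger sigma induces both stronger heterogeneity and stronger within-cluster association.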
Table 3.15 Bone marrow transplantation in acute leukemia: Frailty models for relapse-free survival
taking clustering by medical center into account (BM: Bone marrow, PB: Peripheral blood, AML:
Acute myelogenous leukemia, ALL: Acute lymphoblastic leukemia).
was studied by Liu et al. (2004) and Rondeau et al. (2007). Here, γ is an additional param-
eter to be estimated and Ai follows a gamma distribution with mean 1. In the former paper,
inference was based on the EM algorithm while, in the latter paper, a penalized likelihood
approach was used.
coefficients. The recurrent event under study is recurrent myocardial infarctions (MI) and
the competing event is all-cause death. The joint frailty model with Cox baseline hazards,
α0 (t), αD0 (t), did not converge when using the penalized likelihood approach of Rondeau
et al. (2007), so instead, models with piece-wise constant baseline hazards were studied.
Analyses of frailty models with Cox-type or piece-wise constant baseline hazards for the
recurrent events process alone (i.e., assuming that frailty does not affect mortality) yielded
quite similar results. This is seen in Table 3.16, where the effect of treatment (log(rate ratio) for liraglutide vs. placebo) on the recurrent MI rate is β̂ = −0.177 (SD = 0.088) for both the piece-wise constant and the Cox-type model. The estimated frailty SD (σ̂) in the two models (assuming a gamma distributed frailty) was 2.38 and 2.39, respectively.
The similar estimates from the joint frailty model, Equation (3.38), were β̂ = −0.186 (SD = 0.068) and σ̂ = 0.947 (SD = 0.031), see Table 3.17. In this model, the estimated effect of treatment on mortality was β̂D = −0.211 (SD = 0.078), and the association parameter linking the frailties for recurrent events and mortality was estimated to be γ̂ = 1.860 (SD = 0.115). The interpretation is that, for any given patient (i.e., for given frailty), treatment with liraglutide reduces the MI rate by a factor of exp(−0.186) = 0.830 and reduces the mortality rate by exp(−0.211) = 0.809. Furthermore, patients with a high rate of MI (high frailty) also have a high mortality rate (γ̂ > 0). Heterogeneity among patients (frailty SD, σ̂) appears to be considerably higher when not accounting for mortality.
Table 3.17 LEADER cardiovascular trial in type 2 diabetes: Joint frailty model for recurrent
myocardial infarctions (MI) and all-cause mortality – piece-wise constant baseline hazards and
gamma frailty distribution.
Exercise 3.1 (*) Show that, under the null hypothesis H0 : A0(t) = A1(t), the test statistic
∫_0^t K(u) (dÂ1(u) − dÂ0(u))
is a martingale (Section 3.2.2).
Exercise 3.2 (*) Show that, when evaluated at the true parameter vector β0, the Cox partial likelihood score
∫_0^t ∑i ( Zi − (∑j Yj(u) Zj exp(β0T Zj)) / (∑j Yj(u) exp(β0T Zj)) ) dNi(u)
is a martingale (Section 3.3).
Exercise 3.3 (*) Show that, for a Cox model with a single binary covariate, the score test
for the hypothesis β = 0 based on the first and second derivative of log PL(β ) (Equation
(3.16)) is equal to the logrank test.
Exercise 3.4 (*) Show that, for the stratified Cox model (3.20), the profile likelihood is
given by (3.21) and the resulting Breslow estimator by (3.22).
Exercise 3.5 (*) Consider the situation in Section 3.4 with categorical covariates and show
that the likelihood is given by
∏_{ℓ=1}^{L} ∏_{c∈C} (α0ℓ θc)^{Nℓc} exp(−α0ℓ θc Yℓc).
Exercise 3.6 (*) Derive the estimating equations for the model studied in Section 3.8.4.
Exercise 3.7 Consider the Cox model for stroke-free survival in the Copenhagen Holter
study including the covariates ESVEA, sex, age, and systolic blood pressure (Exercise 2.4).
Test, using time-dependent covariates, whether the effects of these covariates may be de-
scribed as time-constant hazard ratios.
Exercise 3.8 Consider the Cox model for stroke-free survival in the Copenhagen Holter
study including the covariates ESVEA, sex, age, and systolic blood pressure. Add to that
model the time-dependent covariate I(AF ≤ t). How does this affect the effect of ESVEA?
Exercise 3.9 Consider the Cox model for stroke-free survival in the Copenhagen Holter
study including the covariates ESVEA, sex, age, and systolic blood pressure. Add to that
model, incorrectly, the covariate AF – now considered as time-fixed. How does this affect
the AF-effect?
Exercise 3.10 Consider an illness-death model for the Copenhagen Holter study with states
‘0: Alive without AF or stroke’, ‘1: Alive with AF and no stroke’, ‘2: Dead or stroke’, see
Figures 1.3 and 1.7.
1. Fit separate Cox models for the rates of the composite end-point for subjects without or
with AF, i.e., for the 0 → 2 and 1 → 2 transitions including the covariates ESVEA, sex,
age, and systolic blood pressure. The time-variable in both models should be time since
recruitment.
2. Examine to what extent a combined model for the two intensities (i.e., possibly with
common regression coefficients and/or proportional hazards between the 0 → 2 and 1 →
2 transition rates) may be fitted.
Exercise 3.11 Consider the data on repeated episodes in affective disorder, Example 1.1.5.
1. Fit separate gamma frailty models for unipolar and bipolar patients including the co-
variate ‘number of previous events N(t−)’ assuming (not quite satisfactorily!) that the
mortality rate is independent of frailty.
2. Do the recurrence rates tend to increase with number of previous episodes?
In this chapter, we will give a less technical introduction to the different models for risks and other marginal parameters to be discussed in more mathematical detail in Chapter 5.
Along with the introduction of the models, examples will be given to illustrate how results
from analysis of these models can be interpreted. In Chapter 2, we gave an intuitive in-
troduction to models for the basic parameter in multi-state models, the transition intensity.
As explained in Section 1.2, knowing all rates in principle enables calculation of marginal
model parameters such as the probability (risk), Qh (t) of occupying state h at time t. In
some multi-state models it is possible, mathematically, to describe this relationship. This is
the case for the two-state survival model of Figure 1.1, the competing risks model (Figure
1.2), and the progressive illness-death model (Figure 1.3). This means that if estimates are
given for all transition rates, e.g., via a regression model, then the marginal parameters may
be estimated (for given covariates) by plug-in. Plug-in refers to the idea of estimating a
given function, say g(θ ), of the parameter θ , by first getting an estimate θb of θ , and then
using g(θb) as the plug-in estimate of g(θ ). This is the topic of Section 4.1. As we shall
see there, in a regression situation this activity does not provide parameters that directly
describe how the marginal parameters are associated with the covariates. Therefore, it may
be of interest to set up regression models where marginal parameters are linked directly to
covariates. This is the topic of Section 4.2. The direct model approach has the additional
advantage that while plug-in builds on correctly specified models for all intensities (and,
thereby, a risk of model misspecification is run), only a single directly specified model for
the marginal parameter needs to be correct. In Section 4.3, we introduce marginal hazard
models that may be applicable in situations where an independence assumption need not
be justified and in Section 4.4, we return to a discussion of the concept of independent cen-
soring, including ways of studying whether censoring is affected by observed covariates.
118 INTUITION FOR MARGINAL MODELS
From the hazard function, the survival function, a marginal parameter, S(t) = Q0 (t), is
derived as follows. Divide the interval from 0 to t into small intervals all of length ∆. From
Equation (4.1), the probability of surviving the next little time interval (from u to u + ∆)
given survival till u is (1 − α(u)∆) as illustrated in Figure 4.1.
Figure 4.1 The probability of surviving the time interval from u to u + ∆ given survival till u is
(1 − α(u)∆).
The marginal probability, S(t), is the product of such factors (conditional probabilities) for
u<t
S(t) = (1 − α(∆)∆) · (1 − α(2∆)∆) · · · (1 − α(u)∆) · · · (1 − α(t − ∆)∆). (4.2)
This observation leads to the Kaplan-Meier estimator for the survival function (Kaplan and
Meier, 1958), as follows. We follow the arguments in Section 2.1.1 leading to the Nelson-
Aalen estimator for the cumulative hazard, i.e., estimating α(u)∆ by the fraction dN(u)/Y(u), where dN(u), the observed number of failures at time u, is typically 0 or 1. The survival function is then estimated by plugging this fraction into the product-representation for S(t). Hereby, the Kaplan-Meier estimator is obtained:
Ŝ(t) = (1 − dN(X1)/Y(X1)) · · · (1 − dN(Xk)/Y(Xk))
= ∏_{event times X ≤ t} (1 − 1/(No. at risk at X)). (4.3)
In Equation (4.3), X1 , . . . , Xk are the individual observation times before time t (some are
event times, others are censoring times) and the second line of the equation uses the ‘prod-
uct symbol’ ∏ which is similar to the ‘summation symbol’ ∑ used previously. Note that,
for times u < t with no observed event, dN(u) is 0, and the factor 1 − dN(u)/Y (u) becomes
1, so the plug-in estimator effectively becomes a product over observed event times before
time t as seen in Equation (4.3). The standard deviation of Ŝ(t) can be estimated using the Greenwood formula, whereby confidence limits around Ŝ(t) may be obtained (Kaplan and Meier, 1958). This is typically done by taking as starting point symmetric confidence limits for log(Â(t)), see Section 2.1.1. This amounts to 95% confidence limits for S(t) obtained by raising Ŝ(t) to the powers exp(±1.96 · SD/(Ŝ(t)Â(t))), where SD is the Greenwood estimate.
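Equation (4.3) and the Greenwood sum are easy to compute directly; a minimal Python sketch (our own function, not the book's R/SAS code, and assuming the largest time is not an event shared by the whole risk set):

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimate (Equation (4.3)) with the Greenwood SD.
    times: follow-up times; events: 1 = event, 0 = censored.
    Returns (time, S(t) estimate, Greenwood SD) at each event time."""
    data = sorted(zip(times, events))
    n = len(data)
    s, green, out = 1.0, 0.0, []
    at_risk, i = n, 0
    while i < n:
        t = data[i][0]
        d = sum(1 for tt, e in data if tt == t and e)  # events at time t
        c = sum(1 for tt, e in data if tt == t)        # all leaving the risk set at t
        if d > 0:
            s *= 1.0 - d / at_risk
            green += d / (at_risk * (at_risk - d))     # Greenwood sum
            out.append((t, s, s * math.sqrt(green)))
        at_risk -= c
        i += c
    return out
```

For example, with four subjects and events at times 1 and 3, the estimate drops to 3/4 and then to 3/8, each step carrying its Greenwood SD.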
As an example, Figure 4.2 shows the Kaplan-Meier estimates for the two treatment groups
in the PBC3 trial (Example 1.1.1) for the time to the composite end-point ‘failure of medical
treatment'.
PLUG-IN METHODS 119
Figure 4.2 PBC3 trial in liver cirrhosis: Kaplan-Meier estimates of the survival probability in the placebo and CyA groups as functions of time since randomization (years).
Similar to Figure 2.1, displaying the Nelson-Aalen estimates, the figure suggests
that, unadjusted for prognostic variables, the survival probability is unaffected by treatment.
This should be no surprise since the two estimators build on exactly the same information
and, therefore, are in one-to-one correspondence with each other. One important difference,
however, lies in the interpretation of the values on the vertical axis. In Figure 2.1, the
cumulative hazard was estimated and, as discussed there, the interpretation of this quantity
is not so direct. Figure 4.2, on the other hand, depicts the fraction of patients that, over
time, is still event-free. At 2 years, the estimates are 0.846 in the CyA group and 0.832 for
placebo with values of the Greenwood SD equal to 0.029, respectively 0.030. The resulting
95% confidence interval for S(2 years) is then (0.766, 0.882) for the placebo group and
(0.800, 0.894) for CyA. To enhance readability, confidence limits have not been added to
Figure 4.2.
Let us have a closer look at the product representation Equation (4.2) for S(t) to understand this one-to-one correspondence. From the product representation we get a sum representation by using the logarithm:
− log S(t) = − ∑_{u<t} log(1 − α(u)∆) ≈ ∑_{u<t} α(u)∆,
where for the approximation we have used that − log(1 − x) ≈ x, which holds for small positive values of x as seen in Figure 4.3. It then follows that
S(t) = exp(−∫_0^t α(u) du) (4.4)
because, for small ∆, the sum ∑u≤t α(u)∆ will be equal to the integral in Equation (4.4).
Figure 4.3 The functions y = − log(1 − x) and y = x for 0 < x < 1. Note that the two functions almost
coincide for small values of x > 0.
This equation was already given in Equation (1.2). This formula expresses the one-to-one
correspondence between the survival function and the hazard. Knowing the hazard is know-
ing the survival function, and vice versa.
Suppose now that in the PBC3 example we have estimated the hazard by assuming it to be
piece-wise constant as in Table 2.1. We can then estimate S(t) by plug-in
$$\hat{S}(t) = \exp\Bigl(-\int_0^t \hat{\alpha}(u)\,du\Bigr).$$
Figure 4.4 shows the Kaplan-Meier estimator for the PBC3 placebo group together with
the plug-in estimator using a piece-wise constant hazard model for α(t) and, just like in
Figure 2.3, it is seen that the two models give quite similar results.
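As a small numerical sketch of this plug-in step (in Python; the cut-points and rates below are invented, not the Table 2.1 estimates):

```python
import math

def cumhaz_piecewise(t, cuts, rates):
    """A(t) = integral of a piece-wise constant hazard from 0 to t.
    cuts: interval start points (first must be 0); rates: hazard on each interval."""
    a = 0.0
    for i, start in enumerate(cuts):
        end = cuts[i + 1] if i + 1 < len(cuts) else float("inf")
        if t > start:
            a += rates[i] * (min(t, end) - start)
    return a

def surv_plugin(t, cuts, rates):
    """Plug-in survival estimate S(t) = exp(-A(t)), Equation (4.4)."""
    return math.exp(-cumhaz_piecewise(t, cuts, rates))

# Invented yearly rates on the intervals 0-2, 2-4, and 4+ years:
cuts, rates = [0.0, 2.0, 4.0], [0.10, 0.08, 0.06]
```

For example, `surv_plugin(3.0, cuts, rates)` evaluates $\exp(-(0.10\cdot 2 + 0.08\cdot 1))$; replacing the invented rates with fitted occurrence/exposure rates reproduces the curve compared with Kaplan-Meier in Figure 4.4.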
Having the one-to-one correspondence, Equation (4.4), a regression model for α(t) induces a regression model for S(t). Assume a Cox regression model for α(t),
$$\alpha(t \mid Z) = \alpha_0(t)\exp(\mathrm{LP}),$$
where the linear predictor LP is given by Equation (1.4). The survival function is then given by
$$S(t \mid Z) = \exp\Bigl(-\int_0^t \alpha_0(u)\exp(\mathrm{LP})\,du\Bigr) = \exp\bigl(-A_0(t)\exp(\mathrm{LP})\bigr), \qquad (4.5)$$
Figure 4.4 PBC3 trial in liver cirrhosis: Estimated survival curves for the placebo group.
where $A_0(t) = \int_0^t \alpha_0(u)\,du$. Using the complementary log-log transformation, cloglog, of the survival function we get
$$\log(-\log(S(t \mid Z))) = \log(A_0(t)) + \mathrm{LP}. \qquad (4.6)$$
The cloglog function is the link function which takes us from the marginal parameter to the
linear predictor. As an example, let us consider the models for the PBC3 trial presented
in Table 2.7. Here, the survival function at time t for a CyA treated subject (Z1 = 1) with
biochemical values albumin = Z2 , bilirubin = Z3 may (based on the Cox model) be estimated
by
$$\hat{S}(t \mid Z_1 = 1, Z_2, Z_3) = \exp\bigl(-\hat{A}_0(t)\exp(-0.574 - 0.091 Z_2 + 0.665\log_2(Z_3))\bigr)$$
while, for a placebo treated patient with the same values of the biochemical variables, the estimated survival function at time t is
$$\hat{S}(t \mid Z_1 = 0, Z_2, Z_3) = \exp\bigl(-\hat{A}_0(t)\exp(-0.091 Z_2 + 0.665\log_2(Z_3))\bigr).$$
Figure 4.5 shows the estimated survival curves for albumin = 38g/L and bilirubin =
45µmol/L – values close to the observed average values among all patients. We can see
that, on the probability scale Equation (4.5), the treatment effect is time-dependent, while
on the cloglog scale Equation (4.6), the effect is time-constant as assumed in the Cox and
Poisson models, see Figure 4.6. It is also the case that, on the probability scale, the differ-
ence between the curves will depend on the values of albumin and bilirubin. The fact that,
on the cloglog scale, vertical distances between survival curves are constant under propor-
tional hazards was classically used to construct goodness-of-fit plots for the Cox model
based on a stratified model (e.g., Andersen et al., 1993, Section VII.3) – a technique that is
still offered by standard software packages. However, since we find that these plots may be
hard to interpret, we will not provide examples of their use and prefer, instead, plots such
as those exemplified in Figure 2.10.
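Equation (4.5) turns a fitted Cox model into survival predictions. The following Python sketch uses the coefficients quoted above; the cumulative baseline hazard value is invented (in practice it comes from the Breslow estimator):

```python
import math

# Coefficients quoted in the text for the PBC3 Cox model:
B_TREAT, B_ALB, B_LOG2BILI = -0.574, -0.091, 0.665

def surv_cox(A0_t, treat, albumin, bilirubin):
    """S(t | Z) = exp(-A0(t) * exp(LP)), Equation (4.5).
    A0_t is the cumulative baseline hazard at time t (an invented value is
    used in the test below)."""
    lp = B_TREAT * treat + B_ALB * albumin + B_LOG2BILI * math.log2(bilirubin)
    return math.exp(-A0_t * math.exp(lp))
```

On the cloglog scale the two treatment curves stay a constant β1 = −0.574 apart whatever A0(t) is, which is exactly the proportional hazards structure seen in Figures 4.5 and 4.6.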
If a single set of covariate-adjusted survival curves for the two treatment groups is desired,
then this may be obtained by averaging curves such as those exemplified over the observed
distribution of Z2 , Z3 . As explained in Section 1.2.5, this is known as the g-formula and
works by performing two predictions for each subject, i, one setting treatment to CyA
and one setting treatment to placebo, and in both predictions keeping the observed values
(Z2i , Z3i ) for albumin and bilirubin. The predictions for each value of treatment are then
averaged over i = 1, . . . , n (see Equation (1.5)). The g-formula results in the curves shown
in Figure 4.7. Note that, if randomization in the PBC3 trial had been more successful, then
these curves would resemble the Kaplan-Meier estimates in Figure 4.2. Using the curves
obtained based on the g-formula, it is possible to visualize the treatment effect on the prob-
ability scale after covariate-adjustment using plug-in. At 2 years, the values of the curves in
Figure 4.7 are 0.799 for placebo and 0.867 for CyA with estimated SD, respectively, 0.025
and 0.019 – slightly smaller than what is obtained based on 1,000 bootstrap replications,
namely SD values of 0.028 and 0.022. The treatment effect (risk difference at 2 years) is
thus 0.867 − 0.799 = 0.068, and it has an estimated SD of 0.026 close to that based on
1,000 bootstrap replications which is 0.027.
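The averaging step of the g-formula itself is simple; here is a Python sketch in which `predict` stands in for a fitted model's prediction function (the one used below is invented):

```python
def g_formula(patients, predict, treat_values=(0, 1)):
    """Average predictions over the observed covariate distribution.
    For each treatment value a, predict for every subject with treatment set
    to a (keeping the subject's own covariates) and average over subjects."""
    return {a: sum(predict(a, p) for p in patients) / len(patients)
            for a in treat_values}

# Invented stand-in for a model-based 2-year survival prediction:
patients = [{"albumin": 38, "bilirubin": 45}, {"albumin": 20, "bilirubin": 90}]
def predict(a, p):
    return 0.95 - 0.002 * (45 - p["albumin"]) - 0.001 * p["bilirubin"] + 0.03 * a
```

With a Cox-model `predict`, the two averaged values correspond to the curves in Figure 4.7 evaluated at a fixed time point.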
On a technical note, one may wonder why one typically does not estimate S(t) non-parametrically by plugging the Nelson-Aalen estimator into Equation (4.4). The answer
is that for a distribution with jumps, such as the one estimated by Nelson-Aalen, the relationship between the cumulative hazard and the survival function is given by the product-representation rather than by Equation (4.4). Having said this, it should be mentioned that
computer packages often offer the estimator ‘exp(−Nelson-Aalen)’ as an alternative to
Kaplan-Meier (and ‘− log(Kaplan-Meier)’ as an alternative to Nelson-Aalen) and that this
in practice makes little difference. Following this remark, the survival function for given
covariates based on the Cox model could, alternatively, have been estimated by a product-representation based on the Breslow estimator for the cumulative baseline hazard. Figure 4.8 shows the result of using these alternative estimators when predicting the survival
function in the two treatment groups for albumin = 38 and bilirubin = 45 and, as we can
see, this makes virtually no difference compared to Figure 4.5.
Figure 4.5 PBC3 trial in liver cirrhosis: Estimated survival curves for a patient with albumin =
38 g/L and bilirubin = 45 µmol/L based on a Cox model. There is one curve for each value of
treatment.
Figure 4.6 PBC3 trial in liver cirrhosis: Estimated survival curves for a patient with albumin = 38
g/L and bilirubin = 45 µmol/L based on a Cox model. The vertical scale is cloglog-transformed,
and there is one curve for each value of treatment.
Figure 4.7 PBC3 trial in liver cirrhosis: Estimated survival curve in the two treatment groups based
on the g-formula.
Figure 4.8 PBC3 trial in liver cirrhosis: Estimated survival curves for a patient with albumin = 38
g/L and bilirubin = 45 µmol/L based on a Cox model. There is one curve for each value of treatment
and estimates are based on the product-formula.
Table 4.1 PBC3 trial in liver cirrhosis: Estimated 3-year restricted means (and SD) by treatment
group. *: Based on 1,000 bootstrap replications.
Model            (albumin, bilirubin)    Placebo              CyA
                                         ε̂0(3)    SD         ε̂0(3)    SD
Non-parametric                           2.61     0.064       2.68     0.057
                                                  0.064*               0.058*
Cox              (38, 45)                2.53     0.068*      2.72     0.054*
                 (20, 90)                0.96     0.268*      1.38     0.279*
g-formula                                2.55     0.060*      2.71     0.046*
The restricted mean life time is $\varepsilon_0(\tau) = \int_0^\tau S(t)\,dt$, Equation (4.7), i.e., it is the area under the survival function (Figure 1.10). The equation can be derived in
the following (perhaps less intuitive) way. If T is the life time, then ε0 (τ) is the expected
value of the minimum, min(T, τ), of T and the threshold τ. This random variable may be
written as
$$\min(T,\tau) = \int_0^{\min(T,\tau)} 1\,dt = \int_0^{\tau} I(T>t)\,dt$$
and, therefore, $\varepsilon_0(\tau)$, the expected value of this, is
$$E\Bigl(\int_0^{\tau} I(T>t)\,dt\Bigr) = \int_0^{\tau} E(I(T>t))\,dt = \int_0^{\tau} P(T>t)\,dt = \int_0^{\tau} S(t)\,dt,$$
which is exactly the right-hand side of Equation (4.7). A non-parametric estimator for ε0 (τ)
is obtained by plugging-in the Kaplan-Meier estimator for S(t) into Equation (4.7) while
a regression model for ε0 (τ) may be obtained by plugging-in, e.g., a Cox model-based
estimator for S(t | Z) into the equation.
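Plugging the Kaplan-Meier estimator into Equation (4.7) amounts to computing the area under a step curve; a Python sketch with an invented curve:

```python
def restricted_mean(jump_times, surv_after, tau):
    """epsilon_0(tau): area under a right-continuous step survival curve on [0, tau].
    jump_times: increasing event times; surv_after: S(t) just after each jump;
    S = 1 before the first jump."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(jump_times, surv_after):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)

# Invented Kaplan-Meier curve: drops to 0.8 at t = 1 and to 0.6 at t = 2.
```

Feeding the function the jump times and values of an actual Kaplan-Meier (or model-based) curve gives the entries of Table 4.1.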
The method is illustrated using the PBC3 data, and Table 4.1 shows the results. It is seen
that, unadjusted, the 3-year restricted means do not differ between the two treatment groups.
If based on a Cox model, the restricted means differ according to the chosen covariate pat-
tern and single, adjusted values may be obtained using the g-formula. Most of the SD values
in the table are based on 1,000 bootstrap replications, and it is seen that the scenario with
albumin and bilirubin values of 20 and 90 provides larger SD – the explanation being that
these values are more extreme compared to the observed distributions of the two biochem-
ical variables.
(Section 1.2.3). The cumulative incidence at time t is the probability that a cause h event
has happened between time 0 and time t. The probability that it happens in the little time
interval from u to u + du (with 0 < u ≤ t) is the probability S(u) of no events before time u
(being in state 0 at time u) times the conditional probability αh (u)du of cause h happening
in that little interval given no previous events as illustrated in Figure 4.9.
Figure 4.9 The probability that a cause h event happens in the little time interval from u to u + du is
the probability S(u) of no events before time u times the conditional probability αh (u)du.
Since, for different values of u, the events ‘cause h happens in the interval from u to u + du’
are exclusive, their total probability is the sum (integral) of the separate probabilities from
0 to t, i.e.,
$$F_h(t) = \int_0^t S(u)\alpha_h(u)\,du, \qquad (4.8)$$
an equation that was already given in (1.3).
Estimating S(t) by the overall Kaplan-Meier estimator and the cumulative cause h specific hazard by the Nelson-Aalen estimator, plug-in into Equation (4.8) leads to the non-parametric Aalen-Johansen estimator
$$\hat{F}_h(t) = \sum_{\text{cause } h \text{ event times } X \le t} \hat{S}(X-)\,\frac{1}{Y(X)} \qquad (4.9)$$
of the cause h cumulative incidence. In Equation (4.9), $\hat{S}(X-)$ is the Kaplan-Meier value just before the event time, X, i.e., the jump at that time is not included. Confidence limits around $\hat{F}_h(t)$ may also be computed, preferably based on a symmetric confidence interval for cloglog(Fh(t)) in the same way as for the Kaplan-Meier estimator (Section 4.1.1).
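Equation (4.9) can be sketched directly in Python (a toy implementation with invented data; each distinct time is processed once, so ties are handled):

```python
def aalen_johansen(records, cause):
    """Aalen-Johansen estimate of a cumulative incidence, Equation (4.9).
    records: (time, status) pairs, status 0 = censored, 1, 2, ... = cause.
    Returns (t, F_hat) at each cause event time."""
    data = sorted(records)
    at_risk = len(data)
    s_minus, f_hat, out = 1.0, 0.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        here = [st for (u, st) in data if u == t]
        d_all = sum(1 for st in here if st > 0)      # events of any cause at t
        d_h = sum(1 for st in here if st == cause)   # cause-h events at t
        if d_h > 0:
            f_hat += s_minus * d_h / at_risk         # S(t-) times increment of A_h
            out.append((t, f_hat))
        s_minus *= 1.0 - d_all / at_risk             # overall Kaplan-Meier update
        at_risk -= len(here)
        i += len(here)
    return out
```

Note that the overall Kaplan-Meier factor is updated from events of all causes, which is precisely why both cause-specific hazards enter the estimate.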
In a similar fashion, Cox models (i.e., regression coefficients $(\hat{\beta}_1, \hat{\beta}_2)$ and Breslow estimates $(\hat{A}_{10}, \hat{A}_{20})$) for each of the cause-specific hazards may be plugged into Equation (4.8) to obtain estimates of $F_h(t \mid Z)$ for given values of covariates Z.
It is very important to notice that, via the factor S(·), the cumulative incidence for cause
1 depends on both of the cause-specific hazards α1 (·) and α2 (·). This means that, in spite
of the fact that inference for α1 (·) could be carried out by, formally, censoring for cause
2 events, both causes must be taken into account when estimating the cumulative risk of
cause 1 events. An estimator for $F_1(t)$ obtained as ‘$1-\hat{S}_1(t)$’ where $\hat{S}_1(t)$ is a Kaplan-Meier estimator counting only cause 1 events as events (and cause 2 events as censorings) will be a biased estimator. This Kaplan-Meier curve estimates $\exp(-\int_0^t \alpha_1(u)\,du)$ – a quantity that does not possess a probability interpretation in the population where both causes are operating (the population for which we wish to make inference, cf. Section 1.3). The incorrect cumulative incidence estimator $1-\hat{S}_1(t)$ will be upwards biased because $F_1(t) \le 1-\exp(-\int_0^t \alpha_1(u)\,du)$, intuitively because, by counting cause 2 events as censorings, we pretend that, had these subjects not been ‘censored’, then they would still be at risk for the event of interest (i.e., cause 1), see, e.g., Andersen et al. (2012). In other words,
the one-to-one correspondence between a single rate (αh (t)) and the risk (Fh (t)) that we
saw for the two-state model (Section 4.1.1) does not exist in the competing risks model: To
compute the cause h risk, Fh (t), not only the rate for cause h is needed, but also the rates
for the competing cause(s) (and vice versa).
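The upward bias of ‘1 − Kaplan-Meier’ is easy to see in a toy setting with two constant cause-specific hazards a1 and a2 (invented values): Equation (4.8) then gives $F_1(t) = \frac{a_1}{a_1+a_2}(1 - e^{-(a_1+a_2)t})$, whereas censoring for cause 2 targets $1 - e^{-a_1 t}$:

```python
import math

a1, a2 = 0.3, 0.2  # invented constant cause-specific hazards

def cuminc1(t):
    """Correct cause 1 cumulative incidence, Equation (4.8), constant hazards."""
    return a1 / (a1 + a2) * (1.0 - math.exp(-(a1 + a2) * t))

def one_minus_km1(t):
    """The quantity that '1 - Kaplan-Meier' (cause 2 as censoring) estimates."""
    return 1.0 - math.exp(-a1 * t)

# F1(t) <= 1 - exp(-A1(t)) at every t: the naive estimator is upwards biased.
for t in (0.5, 1.0, 2.0, 5.0, 10.0):
    assert cuminc1(t) <= one_minus_km1(t)
```

Note also that `cuminc1` depends on both a1 and a2, while `one_minus_km1` ignores a2 entirely, which is the lack of one-to-one correspondence just described.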
We will illustrate cumulative incidences and the bias incurred when using the incorrect
estimator using the PBC3 data. Figures 4.10 and 4.11 show, for the placebo group, a
stacked plot of cumulative incidences and overall survival function computed, respectively,
correctly by using the Aalen-Johansen estimator, Equation (4.9), and incorrectly using
‘1− Kaplan-Meier’ based on a single cause. More specifically, $\hat{F}_1(t)$, $\hat{F}_1(t)+\hat{F}_2(t)$, and $\hat{F}_1(t)+\hat{F}_2(t)+\hat{S}(t)$ are plotted against t. In Figure 4.10, the latter curve is, correctly, equal
to 1 while, in Figure 4.11, this sum exceeds 1 because ‘1− Kaplan-Meier’ is an upwards
biased estimator of the cumulative incidence. Note that (in the correct Figure 4.10), the
values of the vertical axis have simple interpretations as the fractions of patients who, over
time, are expected to experience the various events.
We will also illustrate predicted cumulative incidences from cause-specific Cox models.
This can be done for a given pattern for the covariates that enter into the Cox models.
Figure 4.12 shows predicted stacked cumulative incidences and overall survival for placebo
treated female patients with, respectively (age, albumin, bilirubin) equal to (40, 38, 45), (40,
20, 90), and (60, 38, 45). It is seen that, for the second pattern, both cumulative risks are
considerably larger than for the first while, comparing the first and the last pattern, it is seen
that the older patient has a much higher risk of death and a lower risk of transplantation.
Figure 4.10 PBC3 trial in liver cirrhosis: Stacked cumulative incidence and survival curves for the
placebo group. Cumulative incidences are, correctly, estimated using the Aalen-Johansen estimator,
Equation (4.9).
Figure 4.11 PBC3 trial in liver cirrhosis: Stacked cumulative incidence and survival curves for
the placebo group. Cumulative incidences are, incorrectly, estimated using the ‘1−Kaplan-Meier’
estimators based on single causes.
(a) 40 years old, albumin = 38 g/L, bilirubin = 45 µmol/L
Figure 4.12 PBC3 trial in liver cirrhosis: Predicted, stacked cumulative incidence and survival
curves for three women in the placebo group based on cause-specific Cox models.
Table 4.2 PBC3 trial in liver cirrhosis: Estimated time lost (in years) before 3 years due to trans-
plantation (T) and death without transplantation (D) by treatment group in four scenarios (F: fe-
male).
The expected time lost ‘due to cause h’ before time τ is
$$\varepsilon_h(\tau) = \int_0^\tau F_h(u)\,du$$
(Andersen, 2013). This may be
estimated as the area under the Aalen-Johansen estimator (or under a model-predicted cu-
mulative incidence estimator) over the interval from 0 to τ.
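The years-lost quantities are thus areas under step cumulative incidence curves, and since $F_1 + F_2 + S = 1$ at every time, the two times lost and the restricted mean add up to τ, matching the unadjusted identity noted below (total time lost equals 3 minus the restricted means of Table 4.1). A Python sketch with invented curves:

```python
def step_area(times, values, tau, start_value):
    """Area on [0, tau] under a right-continuous step function that equals
    start_value before the first jump and values[i] after times[i]."""
    area, prev_t, prev_v = 0.0, 0.0, start_value
    for t, v in zip(times, values):
        if t >= tau:
            break
        area += prev_v * (t - prev_t)
        prev_t, prev_v = t, v
    return area + prev_v * (tau - prev_t)

# Invented step curves satisfying F1 + F2 + S = 1 at all times:
times = [1.0, 2.0]
S, F1, F2 = [0.7, 0.5], [0.2, 0.3], [0.1, 0.2]
tau = 3.0
eps0 = step_area(times, S, tau, 1.0)    # restricted mean (area under S)
lost1 = step_area(times, F1, tau, 0.0)  # time lost due to cause 1
lost2 = step_area(times, F2, tau, 0.0)  # time lost due to cause 2
assert abs(eps0 + lost1 + lost2 - tau) < 1e-12
```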
Table 4.2 shows the estimated expected time lost (in years) before τ = 3 years in each
treatment group of the PBC3 trial due to transplantation and death without transplantation,
respectively. The scenarios considered are either: No adjustment or adjustment for sex,
age, albumin, and bilirubin with three different covariate configurations. In the unadjusted
situation, note that the total time lost equals 3 minus the restricted means (2.61 and 2.68
years, respectively) presented in Table 4.1. The estimated values for the SD in the two
treatment groups: Placebo, respectively CyA, based on 1,000 bootstrap replications, were
0.040 and 0.030 for transplantation, and 0.053 and 0.050 for death without transplantation.
For the other three scenarios, the numbers in Table 4.2 (for the placebo group) provide,
for each cause, one-number summaries of the tendencies seen in Figure 4.12. In this scale
we can notice the beneficial effect of CyA after adjustment. For these values, estimates
of SD could also be obtained using the bootstrap (though we have not illustrated this).
Single values for each treatment group could be obtained by averaging over the observed
covariate distribution using the g-formula (Section 1.2.5). We will illustrate this based on a
direct regression model for εh (τ) in Section 4.2.2.
Disease-free (0) --α01(t)--> Diseased (1)
Figure 4.13 States and transitions in the modified illness-death model without recovery.
Figure 4.14 The probability that a 0 → 1 transition happens in the small time interval from u to u + du is the probability $Q_0(u)$ of no transition out of state 0 before time u times the conditional probability $\alpha_{01}(u)du$. The probability of no 1 → 3 transition from u to t given in state 1 at time u is $\exp(-\int_u^t \alpha_{13}(x)\,dx)$.

Summing these contributions, as in Equation (4.8), gives
$$Q_1(t) = \int_0^t Q_0(u)\,\alpha_{01}(u)\exp\Bigl(-\int_u^t \alpha_{13}(x)\,dx\Bigr)\,du$$
for the state 1 occupation probability. The semi-Markovian case is similar; however, now the probability of staying in state 1 from entry at time u and until t is $\exp(-\int_u^t \alpha_{13}(x, x-u)\,dx)$.
The probability $Q_3(t)$ equals $F_1(t)-Q_1(t)$ and, for all h, plug-in is applicable for estimation of $Q_h(t)$. The latter approach for estimation of $Q_3(t)$ is related to an idea of Pepe (1991), as follows. For both the Markovian and the semi-Markovian situation, the probability $Q_0(t)+Q_1(t)$ of being alive with or without the disease may be estimated by the Kaplan-Meier estimator, say $\hat{S}(t)$, counting all deaths and disregarding disease occurrences. This leads to the alternative Pepe estimator
$$\hat{Q}_1(t) = \hat{S}(t) - \hat{Q}_0(t).$$
Figure 4.15 Bone marrow transplantation in acute leukemia: State occupation probability and
prevalence for the relapse state.
Also of interest is the prevalence of the disease at time t,
$$\frac{Q_1(t)}{Q_0(t) + Q_1(t)},$$
i.e., the conditional probability of being diseased at time t given alive at time t. Finally, the
expected time lived with the disease before time τ is
$$\varepsilon_1(\tau) = \int_0^\tau Q_1(u)\,du.$$
To illustrate this, we use a simplified version of the model for the bone marrow transplantation data (Example 1.1.7) where graft versus host disease is not accounted for, see Figure 1.6. Figure 4.15 shows both the estimated probability, $\hat{Q}_1(t)$, of being alive with relapse and the estimated prevalence. As a consequence of the high mortality rate with relapse (Figure 3.4), both probabilities are rather low. From $\hat{Q}_1(t)$, the expected time lived with relapse before time τ can be estimated, and with τ = 120 months the estimate is $\hat{\varepsilon}_1(\tau) = 1.62$ (SD = 0.28) months (with SD in brackets based on 1,000 bootstrap samples). For state 0, the expected time spent alive without relapse before τ = 120 months is $\hat{\varepsilon}_0(\tau) = 75.78$ (SD = 1.25) months while the expected time lost due to death before τ = 120 months is 42.61 (SD = 1.23) months ($\hat{\varepsilon}_2(\tau) = 29.13$, SD = 1.13 months lost without relapse and $\hat{\varepsilon}_3(\tau) = 13.48$, SD = 0.80 months lost after relapse).
So, in the illness-death model, the use of plug-in becomes cumbersome, though still doable, and the same is the case for more complicated irreversible (‘forward-going’) models, i.e.,
Table 4.3 Recurrent episodes in affective disorders: Estimated numbers of years spent in and out of
hospital, and lost due to death before τ = 15 years for patients with unipolar or bipolar disease (SD
based on 1,000 bootstrap replications).
those for which transitions back into previous states are not possible (e.g., Figures 1.5-
1.6). However, it seems clear that a more general technique – also covering models with
back-transitions – would be preferable, and for Markov processes such a technique exists.
This technique, based on product-integration of the transition intensities, is, however, not as
intuitive as those described in the present section. We will return to a discussion of product-
integration in Section 5.1 and skip the details here. We will rather illustrate results from an
analysis using the illness-death model with recovery, i.e., the model for recurrent episodes
(recurrent events with periods between times at which subjects are at risk for a new event),
Figure 1.4, as an example.
We study Example 1.1.5 on recurrent episodes in affective disorders and refer to an ongoing
episode as ‘being in hospital’. As for the illness-death model without recovery, we may be
interested in the state occupation probabilities, Q0 (t), the probability of being out of the
hospital t years after the initial diagnosis, Q1 (t), the probability of being in the hospital
at time t, and Q2 (t), the probability of being dead at time t. Likewise, the average times,
ε0 (τ), ε1 (τ), ε2 (τ), spent in each of the states until some threshold τ may be of interest.
Figure 4.16 shows the stacked estimates of the state occupation probabilities for patients
with unipolar or bipolar disorder. It is seen that bipolar patients spend more time out of the
hospital, and unipolar patients have a higher mortality – an observation that is emphasized
by computing the one-number summaries $\hat{\varepsilon}_h(15\text{ years})$, h = 0, 1, 2, see Table 4.3. Note that,
as explained in Section 1.2.2, the estimates add up to τ = 15 years.
Figure 4.16 Recurrent episodes in affective disorders: Estimated stacked state occupation probabil-
ities for patients with unipolar or bipolar disorder.
of generalized estimating equations (GEEs) whose solutions are the regression parameters
giving this direct link. Mathematical details will be described in Section 5.5 where we will
see that this approach typically also involves estimation of the distribution of censoring
times (see also Section 4.4.1).
and, therefore,
$$\log(-\log(S(t \mid Z))) = \log(-\log(1 - F(t \mid Z))) = \log(A_0(t)) + \mathrm{LP} \qquad (4.10)$$
where $\mathrm{LP} = \beta_1 Z_1 + \cdots + \beta_p Z_p$ is the linear predictor, see Section 2.2.1 and Equation (4.6).
For an additive hazard model with constant hazard differences, $\alpha(t \mid Z) = \alpha_0(t) + \mathrm{LP}$, we have that
$$-\log(S(t \mid Z))/t = A_0(t)/t + \mathrm{LP}.$$
In both cases, a certain transformation, the link function, of the marginal parameter (here
S(t)) gives the linear predictor and, therefore, the regression parameters can be interpreted
in the scale given by this link function, see Section 1.2.5. In the case of a multiplicative hazard, the link function is the cloglog function, corresponding to $\exp(\beta)$ being hazard ratios (see Equation (4.6)), and in the additive case the link function is $-\log(\cdot)$ (note the minus sign) and the $\exp(\beta)$ coefficients correspond to ratios between survival probabilities.
Restricted mean life time
For the restricted mean life time and for marginal parameters in more complicated multi-
state models, hazard models do not provide a simple link between this parameter and co-
variates. In such situations, a way forward is to set up equations whose solutions provide
parameters that establish a direct link. As a first example, we will look at the restricted
mean life time ε0 (τ). For this parameter, Tian et al. (2014) proposed estimating equations
where some transformation, such as the logarithm or the identity function, of the restricted mean survival time is linear in the covariates, i.e.,
$$\log(\varepsilon_0(\tau)) = \beta_0 + \mathrm{LP} \quad\text{or}\quad \varepsilon_0(\tau) = \beta_0 + \mathrm{LP}.$$
We provide more mathematical details in Section 5.5.2 and here illustrate the method using
the PBC3 data.
Table 4.4 shows the estimated coefficients in a linear model for the 3-year restricted mean including treatment ($Z_1$), albumin ($Z_2$), and bilirubin ($Z_3$),
$$\varepsilon_0(3 \mid Z) = \beta_0 + \beta_1 Z_1 + \beta_2 Z_2 + \beta_3 \log_2(Z_3),$$
i.e., the link function is the identity function, meaning that it is the restricted mean itself that
is given by the linear predictor. In the estimation, it has been assumed that censoring does
not depend on covariates, see Section 4.4.1 for more details. To estimate the variability
of the estimators, robust or sandwich estimators of the SD are used (we will be using
both names in what follows). The use of the word ‘sandwich’ stems from the fact that the
mathematical expression for the collection of standard deviations and correlations consists
of two identical parts, the ‘bread’, with something different, the ‘meat’, in between. The
coefficients β1, β2, and β3 have attractive interpretations. For given values of albumin and bilirubin, a CyA-treated patient on average lives 0.168 years longer without transplantation during the first 3 years after randomization than a placebo-treated patient; for each extra 10 g/L of albumin, the average time lived without transplantation during 3 years increases by 0.31 years; and for each doubling of bilirubin, it decreases by 0.214 years. The intercept β0 is the 3-year restricted mean when
the covariates all take the value 0 – an enhanced interpretability would be obtained by
centering the quantitative covariates (Section 2.2.2). However, more informative absolute
values for the restricted mean can be obtained using the g-formula (Section 1.2.5). This
gives the values 2.559 (0.065) years for placebo and 2.729 (0.053) for CyA with estimated
SD values based on 1,000 bootstrap replications in brackets. Since the model is linear,
the difference between these two restricted means is the coefficient (β1) for treatment and, based on the bootstrap procedure, the estimated value is 0.170 years with a bootstrap SD of 0.079, almost as in Table 4.4.
Table 4.4 PBC3 trial in liver cirrhosis: Estimated coefficients (and SD) in the linear model for the 3-year restricted mean.

Covariate                        β̂        SD
Intercept                        2.376     0.381
Treatment, CyA vs. placebo       0.168     0.078
Albumin, per 1 g/L               0.031     0.008
log2(Bilirubin), per doubling   -0.214     0.034
Table 4.5 PBC3 trial in liver cirrhosis: Estimated coefficients (and SD) from Fine-Gray models for
death without transplantation and transplantation.
                                 Death without
                                 transplantation      Transplantation
Covariate                        β̂        SD          β̂        SD
Treatment, CyA vs. placebo      -0.353     0.260      -0.409     0.368
Albumin, per 1 g/L              -0.061     0.031      -0.070     0.033
log2(Bilirubin), per doubling    0.616     0.089       0.619     0.101
Sex, male vs. female            -0.415     0.317      -0.092     0.580
Age, per year                    0.087     0.016      -0.075     0.017
as F(t) = 1 − S(t) is linked to covariates in the Cox model for survival data, i.e., via the cloglog function – see Equation (4.6). The model is
$$\mathrm{cloglog}(F_h(t \mid Z)) = \log(\widetilde{A}_{0h}(t)) + \mathrm{LP}_h$$
with linear predictor $\mathrm{LP}_h = \beta_{1h}Z_1 + \cdots + \beta_{ph}Z_p$. Estimating equations for the β-parameters were proposed together with an estimator for $\widetilde{A}_{0h}(t)$, from which $F_h(t \mid Z)$ may be predicted.
Robust estimators for the associated SD were also given. We will give more details in
Section 5.5.3 and here illustrate the model using the PBC3 data.
Table 4.5 shows the estimated coefficients when fitting Fine-Gray models to the cumulative
incidences for the two competing events transplantation and death without transplantation.
In the estimation, it has been assumed that censoring does not depend on covariates. From
the negative signs of the coefficients for treatment, sex and albumin, it appears that CyA
treatment, male sex and high albumin all decrease the risks of both end-points. High biliru-
bin increases the risk of both end-points, while advanced age increases the death risk and
decreases the risk of transplantation. These results are qualitatively well in line with those
obtained when analyzing the cause-specific hazards. It is important to realize that the two
sets of models target different parameters and, as we have seen in Section 4.1.2, each cu-
mulative incidence depends on both cause-specific hazards and, therefore, a coefficient in
a Fine-Gray model depends on how the corresponding covariate is associated with both
cause-specific hazards. It follows that a situation can occur where a covariate is associated
with, e.g., an increased cause-specific hazard for cause 1 but not associated with that for
cause 2, in which case that covariate could affect (decrease) the cumulative incidence for
cause 2. This is because a high cause 1 risk ‘leaves fewer subjects to experience cause
2’. An example of this situation was provided by Andersen et al. (2012). Similar mecha-
nisms also explain differences between the coefficients from the cause-specific Cox models
(Table 2.13) and the Fine-Gray models (Table 4.5). For those covariates where the cause-
specific Cox coefficients have the same sign for both events (i.e., treatment, sex, albumin
and bilirubin), the Fine-Gray coefficients are numerically smaller while, for age where the
Cox coefficients have opposite signs, the Fine-Gray coefficients are numerically larger.
One may wonder, what is the exact interpretation of the Fine-Gray coefficients (apart from being risk differences on the cloglog scale)? When applying the cloglog(x) = log(−log(1−x)) transformation to the risk function F = 1 − S in the case of no competing risks, the result is the cumulative hazard, i.e., its slope (the hazard α(·)) has the interpretation
$$\alpha(t)\,dt \approx P(\text{event in } (t, t+dt) \mid \text{no event} < t).$$
However, application of the cloglog function to the cause h cumulative incidence in the presence of competing risks results in a function whose slope (say, $\widetilde{\alpha}_h(t)$) has the following interpretation
$$\widetilde{\alpha}_h(t)\,dt \approx P(\text{cause } h \text{ event in } (t, t+dt) \mid \text{no cause } h \text{ or a competing event} < t).$$
This function is known as the cause h sub-distribution hazard and its interpretation is not
very appealing: It gives the cause h event rate among those who have either not yet had a
cause h event or have experienced a competing event. This awkward ‘risk set’ has caused
some debate (e.g., Putter et al., 2020), but it is also the basis for the equations from which
the regression coefficients are estimated, see Section 5.5.3 for more details. In conclusion,
the Fine-Gray model has the nice feature that it provides a direct link between a cumulative
incidence and covariates, but this association, exp(β ), is expressed on the not-so-nice scale
of sub-distribution hazard ratios. The use of other link functions will be discussed in Sec-
tions 5.5.5 and 6.1.4 and, as we shall see there, these choices may entail other difficulties.
An appealing feature of the Fine-Gray model is the ease with which the cause h cumula-
tive incidence can be predicted for given covariates by combining the estimated regression
coefficients for that cause with the estimate of the baseline cumulative sub-distribution
hazard $\widetilde{A}_{0h}(t)$. Recall that, in order to predict a cause h cumulative incidence based on
cause-specific hazard models, both regression coefficients and cumulative baseline hazards
for all causes are needed. Prediction based on Fine-Gray models is exemplified in Figure
4.17 where the cumulative incidences for both treatments and both events in the PBC3 trial
are estimated for a 40-year old woman with albumin equal to 38 g/L and bilirubin equal to
45 µmol/L. The curves for placebo treatment can be compared with those in Figure 4.12a
and are seen to be close to those presented there. It should be noted that cause-specific Cox
proportional hazards models and Fine-Gray (proportional sub-distribution hazards) models
are mathematically incompatible.
To illustrate how a single, population-averaged prediction may be obtained, we use
the g-formula (Section 1.2.5) to estimate cumulative incidences at 2 years based on the
Fine-Gray models. To estimate the SD of the resulting risks of transplantation or death
without transplantation, 1,000 bootstrap samples were drawn. For transplantation, the g-
formula gives a 2-year risk of 0.069 for placebo and 0.049 for CyA. Based on the bootstrap,
the corresponding values (SD) are, respectively, 0.113 (0.198) and 0.092 (0.202); however,
DIRECT MODELS 139
excluding samples with degenerate estimates of 1 (47 samples), the bootstrap-based values
are 0.069 (0.020) for placebo and 0.048 (0.013) for CyA – in line with the risk estimates
from the original data. The estimated treatment effect (risk difference at 2 years) is 0.021
with a bootstrap SD of 0.020. For death without transplantation, the estimated 2-year risks
are 0.117 for placebo and 0.088 for CyA in accordance with the corresponding bootstrap
values (SD) of, respectively, 0.117 (0.021) and 0.088 (0.019) leading to an estimated risk
difference at 2 years of 0.030 (0.023).
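The g-formula computation itself is simple to sketch: set treatment to each arm for every subject, predict the risk from the fitted model, and average. The following Python sketch uses a hypothetical cloglog-scale risk function `predict_risk` with made-up coefficients and covariate values in place of the fitted Fine-Gray model:

```python
import math

# Hypothetical stand-in for a fitted prediction model: risk on the cloglog
# scale, F = 1 - exp(-exp(lp)); all coefficients are made up for illustration.
def predict_risk(cya, albumin, log2_bili):
    lp = -2.0 - 0.3 * cya - 0.05 * (albumin - 38.0) + 0.4 * (log2_bili - math.log2(45.0))
    return 1.0 - math.exp(-math.exp(lp))

# observed covariate values for a few hypothetical patients
patients = [(39.0, math.log2(30.0)), (35.5, math.log2(90.0)), (41.2, math.log2(20.0))]

# g-formula: set treatment to each arm for everyone and average the predictions
risk_placebo = sum(predict_risk(0, a, b) for a, b in patients) / len(patients)
risk_cya = sum(predict_risk(1, a, b) for a, b in patients) / len(patients)
risk_difference = risk_cya - risk_placebo
```

A bootstrap SD, as in the text, would simply repeat this averaging on resampled data sets.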
It is sometimes advocated that the Fine-Gray model be used only for the ‘cause of interest’
(say, cause 1). However, we believe that, in the competing risks model, all causes should be
analyzed because the association between the cause 1 cumulative incidence and a covariate
may be a result of an association between that covariate and the cause-specific hazard for
a competing cause. Therefore, Latouche et al. (2013) argued that, to get an overview, all
cause-specific hazards and all cumulative incidences should be studied. Another item to
pay attention to is the fact that separate Fine-Gray models for two competing causes may
be mathematically incompatible (and certainly incompatible with a Cox model for overall
survival) and may provide overall risk estimates exceeding 1.
[Figure 4.17: panel (a) cumulative incidence for death w/o transplantation; panel (b) cumulative incidence for transplantation; both plotted against time since randomization (years, 0–6) with curves for placebo and CyA.]
Figure 4.17 PBC3 trial in liver cirrhosis: Predicted cumulative incidence for a 40-year-old woman
with albumin equal to 38 g/L and bilirubin equal to 45 µmol/L based on Fine-Gray models.
0.967 − 1.377 = −0.410, and 0.043 − 0.080 = −0.037, respectively, for transplantation
for the three choices made, and 0.061 − 0.090 = −0.029, 0.302 − 0.364 = −0.062, and
0.256 − 0.373 = −0.117, respectively, for death without transplantation – all numbers in
years). By fitting the direct linear model, we get a single treatment effect of β̂11 = −0.067
years for transplantation and β̂21 = −0.068 years for death without transplantation. In the
final model in Table 4.6 (right column) we only adjust for the two biochemical variables
albumin and log2(bilirubin), and we can compare with the results for the 3-year restricted
mean life time in Table 4.4. This is because ε0(τ) + ε1(τ) + ε2(τ) = τ (Section 1.2.2) and,
therefore, the coefficients β0j, β1j, β2j in linear models for the three ε-parameters will satisfy
β1j + β2j = −β0j. Adding up the β-parameters for the two end-points from the last model
in Table 4.6 we get −0.068 − 0.079 = −0.147 for treatment, −0.002 − 0.028 = −0.030
for albumin, and 0.091 + 0.124 = 0.215 for log2(bilirubin), to be compared with the coefficients 0.148, 0.030, and −0.215 for the models for ε0(τ).
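The constraint ε0(τ) + ε1(τ) + ε2(τ) = τ can be checked numerically. The following Python sketch uses made-up constant cause-specific hazards, for which S(t) = exp(−(α1 + α2)t) and Fh(t) = (αh/(α1 + α2))(1 − S(t)), and integrates each ε-parameter on [0, τ] by the midpoint rule:

```python
import math

a1, a2, tau, n = 0.2, 0.5, 3.0, 100000  # made-up hazards and horizon
a = a1 + a2
dt = tau / n
e0 = e1 = e2 = 0.0
for i in range(n):
    t = (i + 0.5) * dt                 # midpoint of the i-th subinterval
    S = math.exp(-a * t)
    e0 += S * dt                       # eps0: expected time alive before tau
    e1 += (a1 / a) * (1 - S) * dt      # eps1: expected time lost to cause 1
    e2 += (a2 / a) * (1 - S) * dt      # eps2: expected time lost to cause 2
```

Because F1(t) + F2(t) = 1 − S(t) at every t, the three integrals add up to τ regardless of the hazard values chosen.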
The overall average time lost may be estimated using the g-formula (Section 1.2.5). The
last models in Table 4.6 were re-fitted on 1,000 bootstrap samples and this provided average
time lost due to transplantation of 0.143 years (bootstrap SD = 0.040) for placebo and
0.073 years (bootstrap SD = 0.030) for CyA. This gives an average treatment effect of
−0.070 years (0.050) – close to the estimated treatment effect in the linear model of Table
4.6, however with a somewhat smaller SD. The corresponding numbers for death without
transplantation were, respectively, 0.288 years (0.057) for placebo and 0.208 years (0.048)
for CyA, yielding an average treatment effect of −0.080 years (0.072), both numbers close
to the values from the table.
Table 4.6 PBC3 trial in liver cirrhosis: Estimated coefficients (and SD) from direct linear models for time lost (in years) due to death without transplantation or to transplantation before τ = 3 years (Bili: Bilirubin).

(a) Death without transplantation
Covariate                     β̂        SD      β̂        SD      β̂        SD
Intercept                      0.244
Treatment CyA vs. placebo      0.000   0.088   -0.082   0.078   -0.079   0.078
Albumin per 1 g/L                              -0.019   0.007   -0.028   0.007
log2(Bili) per doubling                         0.159   0.037    0.124   0.037
Sex male vs. female                             0.129   0.125
Age per year                                    0.013   0.004

(b) Transplantation
Covariate                     β̂        SD      β̂        SD      β̂        SD
Intercept                      0.138
Treatment CyA vs. placebo     -0.049   0.089   -0.067   0.085   -0.068   0.083
Albumin per 1 g/L                              -0.007   0.008   -0.002   0.008
log2(Bili) per doubling                         0.072   0.044    0.091   0.042
Sex male vs. female                             0.132   0.139
Age per year                                   -0.010   0.004
142 INTUITION FOR MARGINAL MODELS
Fine-Gray model
The Fine-Gray model is a direct model for the cumulative incidence that expresses
the association with covariates as sub-distribution hazard ratios. Therefore, interpre-
tation of the resulting parameters is challenging. If using the model, this should be
done for all causes in the competing risks model because the association between
a given covariate and a given cumulative incidence may be a result of the way in
which the covariate affects other causes. Based on Fine-Gray models for all causes,
the estimated overall failure risk for given subjects may exceed 1.
µ(t) = E(N(t))
where N(t) is the process counting the number of recurrent events in the interval from 0 to
t.
No terminal event
We will focus on the situation depicted in Figure 1.5 and first look at the case where the
mortality rate is negligible, i.e., state D on that figure is not relevant. In that case it turns
out that the estimating equations that are set up for µ(t) are solved by the Nelson-Aalen
estimator
    µ̂(t) = Σ_{event times X ≤ t} dN(X)/Y(X)                     (4.11)
(Lawless and Nadeau, 1995). To compute confidence limits around this estimator, robust
estimators of the SD should be used and the confidence interval will typically be based
on symmetric confidence limits for log(µ(t)) – similarly to confidence limits around the
Nelson-Aalen estimator for the cumulative hazard (Section 2.1.1). A regression model
may also be analyzed quite simply since it may be shown (Lawless and Nadeau, 1995;
Lin et al., 2000) that solving what are formally score equations based on a Cox partial log
likelihood
    l(β) = Σ_{event times X} log( exp(LP_event) / Σ_{j at risk at time X} exp(LP_j) )
leads to valid estimators for β (more details to be given in Section 5.5.4). Robust stan-
dard deviations must be used. A Breslow-type estimator for the baseline mean function
µ0(t) also exists. The model in Equation (4.12) is often denoted the LWYY model after the
authors of Lin et al. (2000). Just as it was the case for the Cox model (Section 2.2.2), attention should be paid to the goodness-of-fit of the multiplicative model in Equation (4.12).
Methods for doing this are discussed in Section 5.7.4.

Table 4.7 Recurrent episodes in affective disorders: Estimated ratios of mean numbers of
psychiatric episodes for patients with bipolar vs. unipolar diagnosis (c.i.: confidence interval).
We will exemplify this using data from Example 1.1.5 on recurrent episodes in affective
disorders. By focusing on times from one re-admission to the next, disregarding the fact
that there are in-hospital periods during which the event does not occur (see, e.g., Andersen
et al., 2019), we are in the situation of Figure 1.5. The parameter µ(t), the expected number
of re-admissions in [0,t], refers to a population where the duration of these periods has a
certain distribution, and one should realize that, in a population with another distribution of
these durations, the parameter would have been different. Most importantly, the parameter
µ(t) also refers to a population where patients cannot die. This is a completely unreason-
able assumption, and we include this example, mainly to demonstrate the bias that arises
when we, incorrectly, treat patients who die as censorings in Equation (4.11). We shall see
that this bias is similar to that seen for competing risks in Section 4.1.2 when, incorrectly,
estimating the cumulative incidence using ‘1−Kaplan-Meier’ and we will below return to
the correct analysis, properly taking mortality into account. Figure 4.18 shows the estimated
values of µ(t) for patients whose initial diagnosis was either unipolar or bipolar, obtained
using Equation (4.11). Note that (except for the fact that mortality is treated incorrectly) the
vertical axis has the attractive interpretation as the average numbers of re-admissions over
time since diagnosis and note that bipolar patients, on average, have more re-admissions
than unipolar patients. This discrepancy can be quantified using the multiplicative regres-
sion model from Equation (4.12) and, as seen in Table 4.7, the ratio (assumed constant)
between the two mean curves is estimated to be 1.52.
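The estimator in Equation (4.11) is simple to compute from pooled event times and per-subject follow-up times. The following Python sketch (the book's own code is in R and SAS; the data here are made up for three subjects) illustrates the computation:

```python
from collections import Counter

def mean_frequency(event_times, followup):
    """Sketch of the estimator in Equation (4.11): at each event time X,
    add dN(X)/Y(X), where Y(X) counts subjects still under observation."""
    d_n = Counter(event_times)
    mu, steps = 0.0, []
    for x in sorted(d_n):
        y = sum(1 for c in followup if c >= x)   # Y(X): still observed at X
        mu += d_n[x] / y                         # dN(X)/Y(X)
        steps.append((x, mu))
    return steps

# three subjects followed for 5, 3, and 2 time units; pooled event times
steps = mean_frequency([1.0, 1.0, 2.5, 4.0], followup=[5.0, 3.0, 2.0])
```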
Terminal event
To perform a satisfactory analysis of the data on recurrent episodes in affective disorders,
we need to estimate µ(t) in the presence of a non-negligible mortality rate. Let S(t) be the
(marginal) survival function, i.e., S(t) = P(TD > t) where TD is time to entry into state D of
Figure 1.5 without consideration of re-admissions before time t. The mean number of re-
current events in the interval from 0 to t may be estimated using the following modification
of the estimator in Equation (4.11)
    µ̂(t) = Σ_{event times X ≤ t} Ŝ(X−) dN(X)/Y(X)               (4.13)
(Cook and Lawless, 1997; Ghosh and Lin, 2000). We will denote Equation (4.13) the Cook-Lawless estimator. Here, Ŝ(·) is the Kaplan-Meier estimator for S and the minus sign in
Figure 4.18 Recurrent episodes in affective disorders: Estimated average numbers of psychiatric
episodes after initial diagnosis for patients with unipolar or bipolar disorder. NB: mortality is
treated as censoring.
Ŝ(X−) means that a death event at time X is not included in the Kaplan-Meier estimator
at that time. An SD for this estimator is also available whereby confidence limits may be
computed. Since the Kaplan-Meier estimator is ≤ 1, it is seen by comparing Equations
(4.11) and (4.13) that treating mortality as censoring leads to an upwards biased estima-
tor for µ(t). The intuition behind this bias is the same as that discussed when comparing
the correct Aalen-Johansen estimator and the biased ‘1−Kaplan-Meier’ estimator for the
cumulative incidence with competing risks – namely that by treating dead patients as cen-
sored we pretend that, had they not been ‘censored’, then they would still be at risk for the
recurrent event. The bias is clearly seen in Figure 4.19 when comparing with Figure 4.18
(and even clearer on the cover figure where the birds sit on the correctly estimated curve
for unipolar patients).
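The Cook-Lawless modification in Equation (4.13) only requires, in addition, a Kaplan-Meier estimate of S computed from the death and censoring times. A minimal Python sketch, with made-up data, shows how the weighting by Ŝ(X−) pulls the estimate below its Nelson-Aalen counterpart:

```python
def kaplan_meier(times, status):
    # status 1 = death, 0 = censored; returns sorted (time, S(t)) steps
    s, steps = 1.0, []
    for t in sorted(set(u for u, d in zip(times, status) if d)):
        y = sum(1 for u in times if u >= t)                      # at risk at t
        d = sum(1 for u, st in zip(times, status) if u == t and st)
        s *= 1 - d / y
        steps.append((t, s))
    return steps

def surv_before(steps, x):
    # S(x-): last Kaplan-Meier value strictly before x
    s = 1.0
    for t, v in steps:
        if t < x:
            s = v
    return s

def cook_lawless(event_times, followup, status):
    steps = kaplan_meier(followup, status)
    mu, est = 0.0, []
    for x in sorted(set(event_times)):
        y = sum(1 for c in followup if c >= x)                   # Y(X)
        mu += surv_before(steps, x) * event_times.count(x) / y   # S(X-) dN(X)/Y(X)
        est.append((x, mu))
    return est

# three subjects: follow-up 5 (death), 3 (censored), 2 (death)
est = cook_lawless([1.0, 1.0, 2.5, 4.0], [5.0, 3.0, 2.0], [1, 0, 1])
```

With these data the final value is 5/3, below the value 13/6 obtained when the two deaths are treated as censorings in Equation (4.11).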
The discrepancy between the two curves in Figure 4.19 may be quantified using a multi-
plicative regression model
µ(t | Z) = µ0 (t) exp(LP)
just like for the situation with no mortality, Equation (4.12). However, the estimating equa-
tions now need to be modified to properly account for the presence of a non-negligible
mortality rate (Ghosh and Lin, 2002; more details to be given in Section 5.5.4). We will
refer to this model as the Ghosh-Lin model. Table 4.7 also shows the estimated mean ra-
tio from this model which is seen to be 1.95. For both estimates in Table 4.7, it has been
assumed that censoring does not depend on covariates. The Ghosh-Lin estimate is seen
to be larger than that based on the incorrect assumption of no mortality. The explanation
Figure 4.19 Recurrent episodes in affective disorders: Estimated average numbers of psychiatric
admissions after initial diagnosis for patients with unipolar or bipolar disorder. NB: Mortality is
treated as a competing risk using the Cook-Lawless estimator.
is that the bias affects the curves for the two groups differently because unipolar patients
have a higher mortality rate than bipolar patients (estimated hazard ratio between bipolar
and unipolar patients in a Cox model for the marginal mortality rate is 0.410 with 95%
confidence limits from 0.204 to 0.825).
A critique that can be raised against the use of the marginal mean µ(t) in the presence of a
competing risk is that a treatment may appear beneficial if it quickly kills the patient and,
thereby, prevents further recurrent events from happening. Therefore, the occurrence of the
competing event (‘death’) must somehow be considered jointly with the recurrent events
process N(t), at the least by also quoting results from an analysis of the mortality rate.
One approach in this direction is the Mao-Lin (2016) model for the composite end-point
consisting of recurrent events and death – to be discussed in Section 5.5.4.
Figure 4.20 LEADER cardiovascular trial in type 2 diabetes: Estimated average numbers of my-
ocardial infarctions. NB: One curve for each treatment group where mortality is treated as cen-
soring and one for each group where mortality is treated as a competing risk (CL: Cook-Lawless
estimates, NA: Nelson-Aalen estimates).
between liraglutide and placebo without taking mortality into account (LWYY model) is
exp(−0.164) = 0.849 (95% confidence limits from 0.714 to 1.009), while that obtained in
the Ghosh-Lin model is exp(−0.159) = 0.853 (0.718, 1.013). The latter is slightly closer to
1 because the mortality rate in the placebo group is slightly higher (Cox model for all-cause
mortality gives a log(hazard ratio) for placebo vs. liraglutide of 0.166 (SD = 0.070)). We
note the need to study both the recurrent events and the mortality.
which is just a standard Cox model (in which stratification is also possible, i.e., different
baseline hazards in certain sub-groups). Estimation of the β coefficients and the baseline
hazard(s) follow exactly the same lines as described previously (Section 2.2.1 and Section
3.3), and the estimates are exactly the same as they would have been under independence.
The standard deviations of the estimates will be different because the cluster structure is
taken into account when computing the robust standard deviations instead of the model-
based standard deviations used in previous examples of the Cox model. The robust standard
deviations will often be larger than the model-based since the latter will over-estimate the
amount of precision by over-estimating the number of independent units in the study. This is
typically the case when there is a positive within-cluster correlation. However, in situations
with a negative within-cluster correlation or when the covariate varies within, rather than
among clusters, they may also be smaller. We will consider the bone marrow transplantation
data (Example 1.1.7) and take the cluster structure implied by patients attending different
medical centers into account. We will study models for three different outcomes: relapse,
relapse-free survival (i.e., the composite end-point of either relapse or death in remission
– leaving state 0 in Figure 4.13), or overall survival (time until entry into either state 2
or 3 in that figure). Table 4.8 shows results from models including the three covariates
graft type, disease, and age. For relapse, the estimated coefficients and the model-based
standard deviations are close to those found in Table 3.12 where adjustment for the time-
dependent covariate graft versus host disease was also conducted – without that adjustment
the two sets of estimates and standard deviations would have been identical. The robust
standard deviations tend to be larger than the model-based since a positive within-center
association is suspected. This is, in particular, the case for the covariate age that has a
larger variation among centers than within (F-ratio in a one-way ANOVA is 5.64). This
tendency is confirmed by the overall Wald significance tests for the three coefficients: 21.60
based on the model-based results and 16.36 for the robust for the outcome relapse. The
results for relapse-free survival and overall survival are similar since most events for the
composite end-point are deaths (Table 1.4). Robust standard deviations tend to be larger,
and the three degree of freedom Wald tests for all coefficients are more significant when
based on the model-based results (81.52 vs. 52.59 for relapse-free survival and 76.06 vs.
48.78 for overall survival).
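The mechanism behind the larger robust standard deviations can be illustrated outside the Cox model. In the following Python sketch (simulated data, made-up parameters), the standard error of a simple mean is computed both under an independence assumption and treating clusters as the independent units; with a positive within-cluster correlation the cluster-robust version is clearly larger:

```python
import math
import random

random.seed(2)
K, m = 50, 20                          # 50 clusters of 20 subjects each
data = []
for k in range(K):
    u = random.gauss(0, 1)             # shared cluster effect -> positive correlation
    data.append([u + random.gauss(0, 1) for _ in range(m)])

flat = [x for cl in data for x in cl]
n = len(flat)
mean = sum(flat) / n

# model-based (independence) SE of the mean
var = sum((x - mean) ** 2 for x in flat) / (n - 1)
se_naive = math.sqrt(var / n)

# cluster-robust SE: treat the K cluster means as the independent units
cmeans = [sum(cl) / m for cl in data]
se_robust = math.sqrt(sum((c - mean) ** 2 for c in cmeans) / (K - 1) / K)

# with positive within-cluster correlation the robust SE is larger
assert se_robust > se_naive
```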
MARGINAL HAZARD MODELS 149
Table 4.8 Bone marrow transplantation in acute leukemia: Estimated coefficients, model-based SD,
and robust SD from marginal hazard models for relapse, relapse-free survival, and overall survival
taking clustering by medical center into account (BM: Bone marrow, PB: Peripheral blood, AML:
Acute myelogenous leukemia, ALL: Acute lymphoblastic leukemia).
Covariate                              β̂     Model-based SD   Robust SD   Ratio
Relapse
Graft type BM only vs. BM/PB -0.108 0.134 0.138 1.025
Disease ALL vs. AML 0.549 0.129 0.174 1.345
Age per 10 years -0.045 0.044 0.075 1.686
Relapse-free survival
Graft type BM only vs. BM/PB -0.161 0.077 0.077 0.997
Disease ALL vs. AML 0.455 0.078 0.078 1.004
Age per 10 years 0.169 0.026 0.033 1.286
Overall survival
Graft type BM only vs. BM/PB -0.160 0.079 0.081 1.022
Disease ALL vs. AML 0.405 0.080 0.078 0.975
Age per 10 years 0.173 0.026 0.033 1.267
should be taken to mean that the subject either is alive at time t but has not yet experienced
event no. h, or the subject has already died at time t without having had h recurrences.
We have already dismissed in Section 1.3 the first possibility as being unrealistic, and the
second possibility means that αh (t) is a sub-distribution hazard rather than an ordinary
hazard (Section 4.2.2). In the latter case, one may turn to marginal Fine-Gray models for
the cumulative incidences for event occurrence no. h = 1, . . . , K (Zhou et al., 2010).
To have well-defined marginal hazards, the definition of ‘event occurrence no. h’ could
be modified to being the composite end-point ‘occurrence of event no. h or death’, much
like earlier definitions of recurrence-free survival in the bone marrow transplantation study,
Example 1.1.7, or failure of medical treatment (transplantation-free survival) in the PBC3
trial, Example 1.1.1. This possibility was discussed by Li and Lagakos (1997) together with
a suggestion to model the cause-specific hazards for recurrence no. h = 1, . . . , K, taking
mortality into account as a competing risk. In the latter case, hazards are no longer marginal
in the sense of Equation (4.14).
We will illustrate marginal hazard models using the data on recurrence and death in patients
with affective disorders (Example 1.1.5). As in Section 2.5 we will restrict attention to
the first K = 4 recurrences for which the numbers of events (recurrences and/or deaths)
are shown in Table 4.9. Table 4.10 shows the results from analyses including only the
covariate bipolar vs. unipolar disorder. For the composite end-point, the hazard ratios tend
to decrease with episode number (h) while there is rather an opposite trend for the models
for the cause-specific hazards for recurrence no. h = 1, 2, 3, 4. The likely explanation
is that, as seen in Table 4.9, the fraction of deaths for the composite end-point increases
with h and, as we have seen earlier (Section 4.2), mortality is higher for unipolar than for
bipolar patients. The estimates for the separate coefficients βh may be compared using a
three degree of freedom Wald test which for the composite end-point is 6.08 (P = 0.11)
and for the cause-specific hazards is 8.27 (P = 0.04). Even though the latter is borderline
statistically significant, Table 4.10 also shows, for both analyses, the estimated log(hazard
ratio) in the model where β is the same for all h.
The answer seems to be ‘no’ because the marginal hazard for relapse given in Equation
(4.14) is not well defined in the relevant population where death also operates. There have
been attempts in the literature to do this anyway (taking into account the ‘informative cen-
soring by death’) under the heading of semi-competing risks (e.g., Fine et al., 2001), but
we will not follow that idea here and refer to further discussion in Section 4.4.4. Instead,
we will proceed as in the recurrent events case and re-define the problem to jointly study
times to death and times to the composite end-point of either relapse or death (relapse-free
survival). These times are correlated within each patient, since all deaths in remission count
as events of both types but their marginal hazards are well defined. The numbers of events
are 737 overall deaths and 764 (= 259 + 737 − 232, cf. Table 1.4) occurrences of relapse
or death in remission. Table 4.11 shows results from models including the covariates graft
type, disease, and age. The models are fitted using a stratified Cox model, stratified for the
two types of events with type-specific covariates and using robust SD. Note that, for both
end-points, the estimates are the same as those found in Table 4.8. This is because they
solve the same estimating equations. The robust SD are also close but not identical because
another clustering is now taken into account (patient rather than center as in Table 4.8). The
two sets of coefficients are strongly correlated: The estimated correlations are 0.98, 0.96,
and 0.97, respectively, for graft type, disease, and age. These correlations are accounted
for in Wald tests for equality of the two sets of coefficients. These are 0.004 (P = 0.95),
5.17 (P = 0.023), and 0.32 (P = 0.57), respectively, for the three covariates. Under the
hypothesis of equal coefficients for graft type and age, the estimates are −0.161 (0.077)
and 0.171 (0.026). Note that the SDs are not much reduced as a consequence of the high
correlations.
If one, further, wishes to include ‘time to GvHD’ in the analysis, then this needs to be
defined as GvHD-free survival, i.e., events for this outcome are either GvHD or death,
whatever comes first. For this outcome there are 1,324 (= 976 + 737 − 389, cf. Table 1.4)
events of which 976 are GvHD occurrences. The results from an analysis of all three out-
comes are found in Table 4.11 (where those for overall and relapse-free survival are the
same as in the previous model). It is seen that the estimated coefficients for GvHD-free sur-
vival differ somewhat from those for the other two end-points since the majority of these
events are not deaths.
Table 4.11 Bone marrow transplantation in acute leukemia: Estimated coefficients (and robust SD)
from marginal hazard models for relapse-free survival, overall survival, and GvHD-free survival
(BM: Bone marrow, PB: Peripheral blood, AML: Acute myelogenous leukemia, ALL: Acute lym-
phoblastic leukemia).
Covariate                                        β̂      SD
Relapse-free survival
Graft type BM only vs. BM/PB -0.161 0.077
Disease ALL vs. AML 0.455 0.078
Age per 10 years 0.169 0.026
Overall survival
Graft type BM only vs. BM/PB -0.160 0.079
Disease ALL vs. AML 0.405 0.079
Age per 10 years 0.173 0.027
GvHD-free survival
Graft type BM only vs. BM/PB -0.260 0.059
Disease ALL vs. AML 0.293 0.060
Age per 10 years 0.117 0.019
A marginal hazard model describes the marginal distribution of the time to a certain
event. For clustered data, this is carried out without consideration of the event times
for other cluster members and without having to specify the within-cluster associ-
ation, and, in this situation, marginal hazard models are useful. The same may be
the case in a recurrent events situation without a terminal event, in which case the
marginal time to first, second, third, etc. event may be analyzed without having to
specify their dependence (the WLW model).
However, in situations with competing risks (both for recurrent events and other
multi-state models) the concept of a marginal hazard is less obvious.
    Ĝ(t−)Ŝ(t−) = Y(t)/n,                                       (4.15)

the fraction of subjects still in the study just before t (where, as previously, the t− means
that a possible jump in the estimator at t is not yet included).
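The identity in Equation (4.15) is easy to verify numerically: with no ties between event and censoring times, the Kaplan-Meier estimators for survival and for censoring multiply to the at-risk fraction. A Python sketch with made-up data:

```python
def kaplan_meier(times, status, target):
    # product-limit estimator treating `target` status codes as events;
    # returns S(t-) as a function (only jumps strictly before t are included)
    def S(t):
        s = 1.0
        for u, st in sorted(zip(times, status)):
            if u < t and st == target:
                y = sum(1 for v in times if v >= u)   # number at risk at u
                s *= 1.0 - 1.0 / y                    # one event per distinct time
        return s
    return S

# made-up data: five subjects, no tied times; 1 = event, 0 = censored
times = [1.0, 2.0, 3.0, 4.0, 5.0]
status = [1, 0, 1, 0, 1]
n = len(times)
S = kaplan_meier(times, status, 1)   # Kaplan-Meier for the events, S(t-)
G = kaplan_meier(times, status, 0)   # Kaplan-Meier for censoring, G(t-)
for t in times:
    Y = sum(1 for u in times if u >= t)          # Y(t): in the study just before t
    assert abs(S(t) * G(t) - Y / n) < 1e-12      # identity (4.15)
```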
Figure 4.21 PBC3 trial in liver cirrhosis: Kaplan-Meier estimates for censoring and treatment fail-
ure.
To study if censoring depends on covariates, simple Cox models were studied including,
respectively, treatment, albumin or log2 (bilirubin). The estimates (SD) were, respectively,
β̂ = 0.084 (0.126), β̂ = 0.0010 (0.013), and β̂ = −0.0025 (0.0018) with associated (Wald)
P-values 0.50, 0.94, and 0.16 showing that the censoring times (entry times) have a distri-
bution that is independent of the prognostic variables.
Figure 4.22 PROVA trial in liver cirrhosis: Kaplan-Meier estimate for censoring.
Table 4.12 PROVA trial in liver cirrhosis: Tests for association between censoring and covariates.
Covariate P-value
Treatment 0.98
Size of varices 0.10
Sex 0.59
Coagulation factors 0.28
log2 (Bilirubin) 0.18
Age 0.003
(bipolar versus unipolar, P = 0.20), but a very strong association with calendar time at
initial diagnosis (P < 0.001). It was seen in Table 3.6 that the variable was also associated
with the recurrence intensity, but also that adjustment did not much affect the estimate for
the initial diagnosis.
Figure 4.23 Recurrent episodes in affective disorders: Kaplan-Meier estimate for censoring.
For models based on intensities, the situation is different. If censoring is independent, i.e.,
censoring is conditionally independent of the multi-state process given covariates, then the
methods discussed in Chapters 2 and 3 apply. However, this strictly requires that the covari-
ates which create the conditional independence are accounted for in the hazard model. We
hinted at this in Sections 1.3 and 2.2. The mathematical explanation follows from Section
3.1 where we emphasized that observation of the censoring times gives rise to likelihood
contributions which factorize from the failure contributions under the conditional indepen-
dence assumption. This means that the contributions arising from censoring can be disre-
garded when analyzing the failure intensities, and one need not worry about the way in
which censoring is affected by covariates (‘the censoring model may be misspecified’) except for the fact that the covariates that create the conditional independence are accounted
for in the intensity model.
So, the conclusion is that intensity-based models and, thereby, inference for marginal pa-
rameters based on plug-in, are less sensitive to the way in which covariates may affect cen-
soring. However, to achieve this flexibility, one has to assume that censoring is independent
given covariates and covariates affecting the censoring distribution should be accounted for
in the hazard model provided that they are associated with the hazard. Investigations along
these lines were exemplified in the previous section.
with marginal distributions S̃h(th) = P(T̃h > th), h = 1, 2 (S̃1(t1) = S̃(t1, 0) and S̃2(t2) =
S̃(0, t2)) and associated marginal (or net) hazards. Complete observations would be the
smaller of the latent failure times, min(T̃1, T̃2), and the corresponding cause (the index of
the minimum). Incomplete observation may be a consequence of right-censoring at C, in
which case it is only known that both T̃1 and T̃2 are greater than C, and the cause of failure
is unknown.
A major problem with this approach is that, as discussed, e.g., by Tsiatis (1975), with
no further assumptions, the joint distribution S̃(t1, t2) is unidentifiable from the available
observations – also in the case of no censoring. What is identifiable are the cause-specific
hazard functions αh(t) and functions thereof, such as the overall survival function S(t) and
the cumulative incidences Fh(t) (e.g., Prentice et al., 1978), i.e., exactly the quantities that
we have been focusing on. In terms of the joint survival function, S(t) = S̃(t, t) and the
cause-specific hazards are obtained as

    αh(t) = −(∂/∂th) log S̃(t1, t2), evaluated at t1 = t2 = t.
An assumption that would make S̃(t1, t2) identifiable is independence of T̃1 and T̃2 – a
situation known as independent competing risks – and, under that assumption, the hazard
of T̃h (the net hazard) equals αh(t). However, as just mentioned, independence cannot be
identified by observing only the minimum of the latent failure times and the associated
cause. Another assumption that would make S̃(t1, t2) identifiable was discussed by Zheng
and Klein (1995) who showed that if the dependence structure is specified by some specific
shared frailty model (see Section 3.9), then S̃(t1, t2) can be estimated. The problem is that
there is no support in the data for any such dependence structure.
These problems were nicely illustrated by Kalbfleisch and Prentice (1980, ch. 7) who gave
an example of two different joint survival functions, one corresponding to independence,
the other not necessarily, with identical cause-specific hazards. More specifically, the following two joint survival functions were studied, the first being

    S̃(t1, t2) = exp(1 − α1 t1 − α2 t2 − exp(α12 (α1 t1 + α2 t2))).

For S̃, T̃1 and T̃2 are independent if α12 = 0 (in which case S̃(t1, t2) is the product
exp(−α1 t1) exp(−α2 t2)) and the parameter α12, therefore, quantifies deviations from independence. On the other hand, S̃∗(t1, t2) corresponds to independence no matter the value
of α12 (it is always a product of survival functions of t1 and t2); however, the cause-specific
hazards for S̃ and S̃∗ are the same, namely

    αh(t) = αh (1 + α12 exp(α12 (α1 + α2) t)), h = 1, 2.
So, even though α1 , α2 , α12 can all be estimated if this model for the cause-specific hazards
were postulated, the estimated value of α12 cannot be taken to quantify discrepancies from
independence, since it cannot be ascertained from the data whether Se or Se∗ (or some other
model with the same cause-specific hazards) is correctly specified.
Such considerations have had the consequence that the latent failure time approach to com-
peting risks has been more or less abandoned in biostatistics (e.g., Prentice et al., 1978;
Andersen et al., 2012), and emphasis has been on cause-specific hazards and cumulative
incidences. Even when simulating competing risks data, it has been argued (Beyersmann et
al., 2009) that one should not use latent failure times even though this provides data with
the same distribution as when using cause-specific hazards. We will return to this question
in Section 5.4.
In summary, both concepts of independent censoring and independent competing risks re-
late to independence between two random time-variables. However, it is only in the former
case that these two random variables (time to event and time to censoring) are well-defined,
and, for that reason, we find the concept of independent censoring important, whereas we
find the concept of independent competing risks (and the associated latent failure time ap-
proach) less relevant.
Exercise 4.1 Consider the data from the Copenhagen Holter study and estimate the proba-
bilities of stroke-free survival for subjects with or without ESVEA using the Kaplan-Meier
estimator.
Exercise 4.2 Consider the Cox model for stroke-free survival in the Copenhagen Holter
study including the covariates ESVEA, sex, age, and systolic blood pressure (Exercise 2.4).
1. Estimate the survival functions for a female subject aged 65 years and with systolic
blood pressure equal to 150 mmHg – either with or without ESVEA.
2. Estimate the survival functions for patients with or without ESVEA using the g-formula.
Exercise 4.3 Consider the data from the Copenhagen Holter study and fit a linear model
for the 3-year restricted mean time to the composite end-point stroke or death including
ESVEA, sex, age, and systolic blood pressure.
Exercise 4.4 Consider the Cox models for the cause-specific hazards for the outcomes
stroke and death without stroke in the Copenhagen Holter study including ESVEA, sex,
age, and systolic blood pressure (Exercise 2.7). Estimate (using plug-in) the cumulative
incidences for both end-points for a female subject aged 65 years and with systolic blood
pressure equal to 150 mmHg – either with or without ESVEA.
Exercise 4.5
1. Repeat the previous question using instead Fine-Gray models.
2. Estimate the cumulative incidence functions for patients with or without ESVEA using
the g-formula.
Exercise 4.6 Consider the data from the Copenhagen Holter study and fit linear models
for the expected time lost (numbers of years) before 3 years due to either stroke or death
without stroke including ESVEA, sex, age, and systolic blood pressure.
Exercise 4.7 Consider an illness-death model for the Copenhagen Holter study with states
‘0: Alive without AF or stroke’, ‘1: Alive with AF and no stroke’, ‘2: Dead or stroke’, see
Figures 1.3 and 1.7.
1. Estimate the prevalence of AF.
2. Estimate the expected lengths of stay in states 0 or 1 up to 3 years.
3. Evaluate the SD of the expected lengths of stay using the bootstrap.
Exercise 4.9 Consider the data on recurrent episodes in affective disorder, Example 1.1.5.
1. Estimate the mean number of episodes, µ(t), in [0,t] for unipolar and bipolar patients,
taking the mortality into account.
2. Estimate, incorrectly, the same mean curves by treating death as censoring and compare
with the correct curves from the first question, thereby, re-constructing the cover figure
from this book (unipolar patients).
Exercise 4.10 Consider the data from the Copenhagen Holter study.
1. Estimate the distribution, G(t), of censoring.
2. Examine to what extent this distribution depends on the variables ESVEA, sex, age, and
systolic blood pressure.
Chapter 5
Marginal models
This chapter explains some of the mathematical foundation for the methods illustrated via
the practical examples in Chapter 4. As we did in that chapter, we will separately study
methods based on plug-in of results from intensity models and, here, it turns out to be cru-
cial whether the multi-state process is Markovian (Section 5.1) or not (Section 5.2). Along
the way, we will also introduce methods that were not exemplified in Chapter 4, including
the techniques of landmarking and micro-simulation (Sections 5.3 and 5.4), both of which
also build on plug-in of hazard models. The second part of this chapter, Sections 5.5-5.7,
describes the background for direct models for marginal parameters based on generalized
estimating equations (GEEs), including a general method based on cumulative residuals
for assessment of goodness-of-fit. Finally, Section 5.8 provides practical examples of the
new methods presented.
P(s,t) = π_{(s,t]} (I + dA(u)) = lim_{max |ui − ui−1| → 0} ∏_i (I + A(ui) − A(ui−1))   (5.2)
for any partition s = u0 < u1 < · · · < uN = t of (s,t] (Gill and Johansen, 1990). Here, I is the
(k + 1) × (k + 1) identity matrix. We defined the Ah j as integrated intensities, but (5.2) is
164 MARGINAL MODELS
also well-defined if the Ah j have jumps, and in the case where the Ah j correspond to purely
discrete measures, the product-integral (5.2) is just a finite matrix product over the jump
times in (s,t] (reflecting the Chapman-Kolmogorov equations). In the special case where
all intensities are time-constant on (s,t], the product-integral (5.2) is the matrix exponential
P(s,t) = exp(α · (t − s)) = I + ∑_{i=1}^∞ (1/i!) (α · (t − s))^i.
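For a small intensity matrix, the truncated series above can be evaluated directly. The following is a minimal plain-Python sketch (not the book's R/SAS code; the two-state example and the hazard value are hypothetical):

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(M, terms=30):
    """P = exp(M) = I + sum_{i>=1} M^i / i!, truncated after `terms` terms."""
    n = len(M)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in P]                                        # M^0 / 0!
    for i in range(1, terms):
        term = mat_mul(term, M)          # term now holds M^i / (i-1)!
        term = [[x / i for x in row] for row in term]                   # -> M^i / i!
        P = [[P[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return P

# Two-state model (state 1 absorbing) with a hypothetical constant hazard
# alpha = 0.2 on an interval of length t - s = 3, so alpha * (t - s) = 0.6:
alpha, length = 0.2, 3.0
M = [[-alpha * length, alpha * length], [0.0, 0.0]]
P = mat_exp(M)  # P[0][0] is P00(s,t) = exp(-0.6)
```

Because the row sums of an intensity matrix are zero, the rows of the resulting P sum to one, as a transition probability matrix must.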
It now holds (Gill and Johansen, 1990), that P(s,t) is the transition probability matrix
for the Markov process V (·), i.e., element h, j is Ph j (s,t) = P(V (t) = j|V (s) = h). If A is
absolutely continuous, then, for given intensity matrix α (t), the matrix P given by (5.2)
solves the Kolmogorov forward differential equations
∂
P(s,t) = P(s,t)α
α (t), with P(s, s) = I. (5.3)
∂t
This suggests plug-in estimators for P(s,t) based on models fitted for the intensities. A
non-parametric estimator for P(s,t) for an assumed homogeneous group is obtained by
plugging in the Nelson-Aalen estimator Â, and the resulting estimator

P̂(s,t) = π_{(s,t]} (I + dÂ(u))   (5.4)
is the Aalen-Johansen estimator (Aalen and Johansen, 1978; Andersen et al., 1993; ch. IV).
If the transition intensities for a homogeneous group are assumed piece-wise constant on
(s,t], e.g., α = α 1 on (s, u] and α = α 2 on (u,t], then α 1 and α 2 are estimated by separate
occurrence/exposure rates in the two sub-intervals, and the plug-in estimator, using the
Chapman-Kolmogorov equations, is
P̂(s,t) = P̂(s, u) P̂(u,t) = exp(α̂1 · (u − s)) exp(α̂2 · (t − u)),   (5.5)
with similar expressions if there are more than two sub-intervals of (s,t] on which α is
constant. Both this expression and (5.4) also apply if the model for the intensities is a
hazard regression model with time-fixed covariates, e.g., a Cox model or a Poisson model.
The situation with time-dependent covariates is more challenging and will be discussed
in Section 5.2.4. Aalen et al. (2001) studied plug-in estimation based on additive hazards
models.
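Since a Nelson-Aalen increment matrix is purely discrete, the product-integral in (5.4) reduces to a finite matrix product over the jump times. A plain-Python sketch (hypothetical toy increments for an illness-death model, not the book's R/SAS code):

```python
def aalen_johansen(increments):
    """Product-integral over jump times: P = product over u of (I + dA(u)).

    `increments` is a list of (k+1) x (k+1) Nelson-Aalen increment matrices
    dA(u), ordered by jump time, each row summing to zero.
    """
    n = len(increments[0])
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    for dA in increments:
        step = [[(1.0 if i == j else 0.0) + dA[i][j] for j in range(n)] for i in range(n)]
        P = [[sum(P[i][k] * step[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return P

# Toy illness-death increments (hypothetical): at the first jump time one of
# four subjects in state 0 moves 0 -> 1 (dA01 = 1/4); at the second, one of two
# subjects in state 1 moves 1 -> 2 (dA12 = 1/2).
inc1 = [[-0.25, 0.25, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
inc2 = [[0.0, 0.0, 0.0], [0.0, -0.5, 0.5], [0.0, 0.0, 0.0]]
P_hat = aalen_johansen([inc1, inc2])  # P_hat[0] is (Q0, Q1, Q2) when Q0(0) = 1
```

Each factor I + dA(u) is a one-step transition matrix, so the product reproduces the Chapman-Kolmogorov structure mentioned above.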
The state occupation probabilities are

Qh(t) = P(V(t) = h) = ∑_j Qj(0) Pjh(0,t).
In the situation where all subjects are in the same state (0) at time t = 0, i.e., Q0 (0) = 1,
these are Qh (t) = P0h (0,t) and the product-limit estimator may be used for this marginal
parameter. We will pay special attention to this situation in what follows. Based on the
state occupation probabilities, another marginal parameter, the expected length of stay in
state h before time τ, is directly obtained as
εh(τ) = ∫_0^τ Qh(t) dt.
PLUG-IN FOR MARKOV PROCESSES (*) 165
Since these marginal parameters (and P(s,t)) are differentiable functionals of the intensi-
ties, large-sample properties of the resulting plug-in estimators may be derived from those
of the intensity estimators using functional delta-methods. The details are beyond the scope
of this presentation and may be found in Andersen et al. (1993; ch. II and IV).
We will now look at some of the multi-state models introduced in Section 1.1.
(∂/∂t) P00(s,t) = −P00(s,t) α(t)

with the solution P00(s,t) = exp(−∫_s^t α(u) du). In Section 4.1, an intuitive argument for this
expression was given and the survival function S(t) = P00 (0,t) may be estimated by plug-in
based on a model for α(t), e.g., a piece-wise constant hazard model. If A corresponds to
a discrete distribution, such as that estimated by the Nelson-Aalen estimator or by a Cox
model using the Breslow estimator for the cumulative baseline hazard, then it is the general
product-integral in which plug-in should be made and not the exponential expression. This
leads in the case of the Nelson-Aalen estimator to the Kaplan-Meier estimator for S(t)
Ŝ(t) = π_{[0,t]} (1 − dÂ(u)) = ∏_{Xi ≤ t} (1 − dN(Xi)/Y(Xi))
(Kaplan and Meier, 1958) and the conditional Kaplan-Meier estimator for P00 (s,t) = S(t | s)
Ŝ(t | s) = ∏_{s < Xi ≤ t} (1 − dN(Xi)/Y(Xi)),
For a Cox model with covariates Z, plug-in of the Breslow estimator into the product-integral gives

Ŝ(t | Z) = π_{[0,t]} (1 − exp(β̂ᵀ Z) dÂ0(u)) = ∏_{Xi ≤ t} (1 − exp(β̂ᵀ Z) dN(Xi)/∑_{j ∈ R(Xi)} exp(β̂ᵀ Zj)),

where R(t) = { j : Yj(t) = 1} is the risk set at time t, cf. Equation (3.19). This estimator may
become negative, and an alternative and commonly used estimator is

Ŝ(t | Z) = exp(−Â0(t) exp(β̂ᵀ Z)),
where Â0(t) is the Breslow estimator (3.18). This estimator can be criticized for using
the continuous-time version of the product-integral on a time-discrete estimator; how-
ever, from a practical point of view, the differences between these estimators tend to
be of minor importance, cf. the discussion in Section 4.1.1 about not estimating S(t) by
‘exp(−Nelson-Aalen)’.
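The product-limit form of the Kaplan-Meier estimator can be computed with one pass over the sorted data. A minimal plain-Python sketch (hypothetical toy data, not the book's R/SAS code):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier: S(t) = product over failure times X_i <= t of
    (1 - dN(X_i)/Y(X_i)).

    times: observation times X_i; events: 1 = failure observed, 0 = censored.
    Returns the jumps of the step function as sorted (time, S(time)) pairs.
    """
    data = sorted(zip(times, events))
    s, steps = 1.0, []
    for t in sorted(set(x for x, _ in data)):
        d = sum(e for x, e in data if x == t)   # failures dN(t)
        y = sum(1 for x, _ in data if x >= t)   # number at risk Y(t)
        if d > 0:
            s *= 1.0 - d / y
            steps.append((t, s))
    return steps

# Hypothetical sample: failures at times 1, 3, 4; one subject censored at 2.
km = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Note how the censored observation at time 2 leaves the curve unchanged but still reduces the risk set for later failure times.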
From estimates of the survival function S, the restricted mean life time
ε0(τ) = E(T ∧ τ) = ∫_0^τ S(t) dt
may be estimated by plug-in, also for given covariates based on a hazard regression model.
In the special case of no censoring, ε̂0(τ) = (1/n) ∑_i (Ti ∧ τ) is a simple average.
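The plug-in estimate of the restricted mean is simply the area under the step function Ŝ up to τ. A plain-Python sketch (hypothetical uncensored data, so the no-censoring sanity check applies):

```python
def rmst(steps, tau):
    """eps0(tau) = integral_0^tau S(t) dt for a right-continuous step function S
    with S(0) = 1, given by its jumps as sorted (time, value) pairs."""
    area, t_prev, s_prev = 0.0, 0.0, 1.0
    for t, s in steps:
        if t >= tau:
            break
        area += s_prev * (t - t_prev)   # rectangle up to the next jump
        t_prev, s_prev = t, s
    return area + s_prev * (tau - t_prev)

# Kaplan-Meier jumps for uncensored failure times 1, 2, 4 (hypothetical data);
# with no censoring, rmst(steps, tau) must equal the average of the T_i ^ tau.
steps = [(1, 2 / 3), (2, 1 / 3), (4, 0.0)]
```

For these data, rmst(steps, 3.0) reproduces (1 + 2 + 3)/3, the simple average mentioned above.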
where A1 , A2 are the cumulative cause-specific hazards for the two competing events. With
the P matrix

P(s,t) =
  P00(s,t)  P01(s,t)  P02(s,t)
  0         1         0
  0         0         1
(where P00 = 1 − P01 − P02 and P0h (s,t), h = 1, 2 are the conditional cumulative incidences),
the Kolmogorov forward equations become
(∂/∂t) P00(s,t) = −P00(s,t)(α1(t) + α2(t))
(∂/∂t) P0h(s,t) = P00(s,t) αh(t), h = 1, 2

with solutions

P00(s,t) = exp(−∫_s^t (α1(u) + α2(u)) du)
P0h(s,t) = ∫_s^t P00(s, u) αh(u) du, h = 1, 2.
In Section 4.1, intuitive arguments for these expressions were given. The resulting non-
parametric plug-in estimators (for s = 0) are the overall Kaplan-Meier estimator
Ŝ(t) = P̂00(0,t) = Q̂0(t) = ∏_{Xi ≤ t} (1 − dN(Xi)/Y(Xi))
(jumping at times of failure from either cause) and the cumulative incidence estimator
Q̂h(t) = F̂h(t) = P̂0h(0,t) = ∫_0^t Ŝ(u−) dNh(u)/Y(u),   h = 1, 2,   (5.6)
where Nh (·) counts failures from cause h = 1, 2 and Y (t) is the number of subjects at risk
in state 0 just before time t. The estimator (5.6) was discussed by, e.g., Aalen (1978), and
is often denoted the Aalen-Johansen estimator even though this name is also used for the
general estimator (5.4) (Aalen and Johansen, 1978). Note that, in (5.6), the Kaplan-Meier
estimator is evaluated just before a cause-h failure time at u.
In the special case of no censoring, Q̂h(t) = Nh(t)/n.
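Estimator (5.6) can be computed in one pass over the sorted data, evaluating the overall Kaplan-Meier just before each jump. A plain-Python sketch (hypothetical competing risks data, not the book's R/SAS code):

```python
def cumulative_incidence(times, causes, cause):
    """F_h(t) = sum over failure times u <= t of S(u-) dN_h(u)/Y(u), cf. (5.6),
    with S the overall Kaplan-Meier for failures from any cause.

    causes: 0 = censored, otherwise the failure cause (1 or 2).
    Returns (time, F_h(time)) pairs at the cause-h failure times.
    """
    data = sorted(zip(times, causes))
    surv, F, out = 1.0, 0.0, []
    for t in sorted(set(x for x, _ in data)):
        y = sum(1 for x, _ in data if x >= t)               # at risk just before t
        d_all = sum(1 for x, c in data if x == t and c > 0) # failures, any cause
        d_h = sum(1 for x, c in data if x == t and c == cause)
        if d_h > 0:
            F += surv * d_h / y          # uses S(u-): survival before this jump
            out.append((t, F))
        if d_all > 0:
            surv *= 1.0 - d_all / y      # update the overall Kaplan-Meier
    return out

# Hypothetical data: cause-1 failures at 1 and 4, a cause-2 failure at 2,
# and one censoring at 3.
cif1 = cumulative_incidence([1, 2, 3, 4], [1, 2, 0, 1], cause=1)
```

The order of the two updates matters: incrementing F before updating `surv` is exactly the S(u−) evaluation noted in the text.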
Expressions similar to Ŝ and (5.6) apply when estimating the state occupation probabilities
Qh(t), h = 0, 1, 2 based on, e.g., Cox models for the two cause-specific hazards: αh(t | Z) =
αh0(t) exp(βhᵀ Z). In all cases, variance estimators are available (e.g., Andersen et al., 1991;
1993, ch. VII). Plug-in based on piece-wise exponential models may also be performed.
If T, as in the two-state model, is the life time (time spent in state 0), then the restricted
mean life time is

ε0(τ) = E(T ∧ τ) = ∫_0^τ S(t) dt
and plug-in estimation is straightforward. In the competing risks model, one can also study
the random variables
Th = inf{t > 0 : V (t) = h}, h = 1, 2,
i.e., the times of entry into state h = 1, 2. These are improper random variables because,
e.g., P(T1 = ∞) = limt→∞ Q2 (t) is the positive probability that cause 1 never happens. The
restricted random variables Th ∧ τ are proper and
E(Th ∧ τ) = E ∫_0^τ I(Th > t) dt = τ − E ∫_0^τ I(Th ≤ t) dt = τ − ∫_0^τ Qh(t) dt.
It follows that

εh(τ) = ∫_0^τ Qh(t) dt
can be interpreted as the expected life time lost due to cause h before time τ and plug-in
estimation is straightforward (Andersen, 2013).
In the special case of no censoring, ε̂h(τ) = (1/n) ∑_i (τ − Thi ∧ τ) is a simple average.
with P00 = 1 − P01 − P02 and P11 = 1 − P12 . The Kolmogorov forward equations become
(∂/∂t) P00(s,t) = −P00(s,t)(α01(t) + α02(t))
(∂/∂t) P01(s,t) = P00(s,t) α01(t) − P01(s,t) α12(t)
(∂/∂t) P11(s,t) = −P11(s,t) α12(t)

with solutions

P00(s,t) = exp(−∫_s^t (α01(u) + α02(u)) du)
P11(s,t) = exp(−∫_s^t α12(u) du)
P01(s,t) = ∫_s^t P00(s, u) α01(u) P11(u,t) du.
In Section 4.1.3, intuitive arguments for the latter expression were given, and plug-in es-
timation is possible. Classical work on the illness-death model include Fix and Neyman
(1951) and Sverdrup (1965).
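For constant hazards, the integral expression for P01(s,t) can be evaluated numerically and checked against its closed form. A plain-Python sketch (hypothetical rate values, not from the book's examples):

```python
import math

def p01_constant(a01, a02, a12, t, steps=20000):
    """P01(0,t) = integral_0^t P00(0,u) a01 P11(u,t) du for constant hazards,
    evaluated with the midpoint rule; here P00(0,u) = exp(-(a01 + a02) u) and
    P11(u,t) = exp(-a12 (t - u))."""
    h = t / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += math.exp(-(a01 + a02) * u) * a01 * math.exp(-a12 * (t - u)) * h
    return total

# Hypothetical constant rates; the integral has the closed form
# a01 (exp(-a12 t) - exp(-(a01 + a02) t)) / (a01 + a02 - a12)
# whenever a01 + a02 != a12.
a01, a02, a12, t = 0.3, 0.1, 0.5, 2.0
num = p01_constant(a01, a02, a12, t)
exact = a01 * (math.exp(-a12 * t) - math.exp(-(a01 + a02) * t)) / (a01 + a02 - a12)
```

The numeric and closed-form values agree to high accuracy, which is a convenient check before applying the same quadrature with non-constant, estimated hazards.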
Additional marginal parameters for the illness-death model (with or without recovery) include

εh(τ) = ∫_0^τ Qh(t) dt,   h = 0, 1,
the expected length of stay in [0, τ], respectively, alive without or with the illness. The
prevalence is

Q1(t) / (Q0(t) + Q1(t)),
i.e., the probability of living with the illness at time t given alive at that time.
with P00 = 1 − P01 − P02 and P11 = 1 − P10 − P12 . In contrast to the progressive illness-death
model, the Kolmogorov forward equations do not have an explicit solution but for given
Nelson-Aalen estimates A bh j , P can be estimated from (5.4) using plug-in. For the recurrent
events intensity α01 (t), this corresponds to an AG-type Markov model, see Section 2.5.
Assuming, further, that Q0 (0) = 1, both the εh (τ) and the prevalence may be estimated as
in the previous section. This was exemplified in Section 4.1.3 using the data on recurrence
in affective disorders (Example 1.1.5).
For the case with no intervals between at-risk periods, a maximum number (say, K) of
recurrent events to be considered must be decided upon in order to get a transition matrix
with finite dimension. Letting, e.g., K = 2 this corresponds to having states 0 and 1 in
Figure 1.5 transient and states 2 and D absorbing (i.e., transitions out of state 2 are not
considered). Re-labelling state D as 3, this gives the A(t) matrix
A(t) =
  −A01(t) − A03(t)   A01(t)              0        A03(t)
  0                  −A12(t) − A13(t)    A12(t)   A13(t)
  0                  0                   0        0
  0                  0                   0        0
because, at some time u < t, a 0 → 2-transition must have happened and, between times u
and t, the subject must stay in state 2. For the other path, i.e., via state 1, the probability is
Q2^(b)(t) = ∫_0^t P00(0, u) α01(u) ∫_u^t P11(u, x) α12(x) P22(x,t) dx du,
reflecting that, first a 0 → 1-transition must happen (at u < t), next the subject must stay
in state 1 from time u to a time x between u and t, make a 1 → 2-transition at x and stay
in state 2 between x and t. Similar expressions, though cumbersome, may be derived for
Qh (t) parameters in other progressive processes. These are not crucial for Markov processes
as discussed so far in this chapter because the general product-integral representation is
available; however, similar arguments may be applied also to some semi-Markov processes
where intensities not only depend on time t but also on the time spent in a given state. This
will be discussed in Section 5.2.4.
Â*hj(t) = ∫_0^t dNhj(u)/Yh(u)
with Nh j ,Yh defined as previously. Extensions to the situation where censoring and V (t)
depend on (possibly, time-dependent) covariates were studied by Datta and Satten (2002)
and Gunnes et al. (2007).
The partial transition rates may be of interest in their own right and asymptotic results
follow from Glidden (2002). The partial rates are also important for marginal models for
recurrent events as we shall see in Section 5.5.4 where we will also argue why the Nelson-
Aalen estimator is consistent for A∗h j (t). However, their main interest lies in the fact that
they provide a step towards estimating state occupation probabilities. Assume for simplic-
ity that all subjects occupy the same state, 0 at time 0, i.e., Q0 (0) = 1. In that case, the top
row of the (k + 1) × (k + 1) product-integral matrix
π_{(0,t]} (I + dA*(u))
is the vector of state occupation probabilities Q(t) = (Q0 (t), Q1 (t), . . . , Qk (t)), suggesting
the plug-in estimator
Q̂(t) = (1, 0, . . . , 0) π_{(0,t]} (I + dÂ*(u))   (5.8)
which is the top row of the Aalen-Johansen estimator. Asymptotic results for (5.8) were also
given by Glidden (2002), including both a complex variance estimator and a simulation-
based way of assessing the uncertainty based on an idea of Lin et al. (1993) – an idea
that we will return to in connection with goodness-of-fit examinations using cumulative
residuals in Section 5.7.
The estimator (5.8) works for any state in a multi-state model, but for a transient state in
a progressive model, an alternative is available. This estimator, originally proposed for the
non-Markov irreversible illness-death model (Figure 1.3) by Pepe (1991) and Pepe et al.
(1991), builds on the difference between Kaplan-Meier estimators and is, as such, not a
plug-in estimator based on intensities. We discuss it here for completeness and it works,
as follows, for the illness-death model. If T0 is the time spent in the initial state and T2
is the time of death, both random variables observed, possibly with right-censoring, then
the Kaplan-Meier estimator based on T0 estimates Q0 (t) while that based on T2 estimates
1−Q2 (t) = Q0 (t)+Q1 (t), so, their difference estimates the probability Q1 (t) of being in the
transient state 1 at time t. The resulting estimator is known as the Pepe estimator, and Pepe
(1991) also provided variance estimators. Alternatively, a non-parametric bootstrap may
be applied to assess the variability of the estimator. This idea generalizes to any transient
state in a progressive model. To estimate Qh (t) for such a state, one may use the difference
between the Kaplan-Meier estimators of staying in the set of states, say Sh from which
state h is reachable and that of staying in Sh ∪ {h}.
Based on an estimator for Qh(t), the expected length of stay in that state, εh(τ) = ∫_0^τ Qh(t) dt,
may be estimated by plug-in.
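The Pepe estimator described above is just a difference of two Kaplan-Meier curves. A plain-Python sketch (hypothetical uncensored illness-death data, not the book's R/SAS code):

```python
def km_curve(times, events, grid):
    """Kaplan-Meier survival probability evaluated at each point of `grid`."""
    data = sorted(zip(times, events))
    s, jumps = 1.0, []
    for t in sorted(set(x for x, _ in data)):
        d = sum(e for x, e in data if x == t)
        y = sum(1 for x, _ in data if x >= t)
        if d > 0:
            s *= 1.0 - d / y
            jumps.append((t, s))
    out = []
    for g in grid:
        val = 1.0
        for t, s_t in jumps:
            if t <= g:
                val = s_t
        out.append(val)
    return out

def pepe(t0_times, t0_events, t2_times, t2_events, grid):
    """Pepe estimator of Q1(t): Kaplan-Meier based on T2 (estimating Q0 + Q1)
    minus Kaplan-Meier based on T0 (estimating Q0)."""
    km0 = km_curve(t0_times, t0_events, grid)
    km2 = km_curve(t2_times, t2_events, grid)
    return [b - a for a, b in zip(km0, km2)]

# Hypothetical uncensored data for three subjects: they leave the initial
# state at times 1, 2, 3 (T0) and die at times 2, 5, 4 (T2).
q1 = pepe([1, 2, 3], [1, 1, 1], [2, 5, 4], [1, 1, 1], grid=[2.5])
```

With no censoring, the difference reduces to the empirical fraction of subjects in the transient state at each grid point, which makes the estimator easy to verify on small examples.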
dN_jℓ^LM(t) = ∑_i dN_jℓi(t) Y_hi(s),   Y_j^LM(t) = ∑_i Y_ji(t) Y_hi(s),   t ≥ s.
Here ‘LM’ stands for landmarking, a common name used for restricting attention to sub-
jects who are still at risk at a landmark time point, here time s (Anderson et al., 1983; van
Houwelingen, 2007). We will return to uses of the landmarking idea in Section 5.3 where
models with time-dependent covariates are studied. The Nelson-Aalen estimators for the
partial transition rates based on these sub-sets are
Â_jℓ^*LM(t) = ∫_s^t dN_jℓ^LM(u) / Y_j^LM(u),   t ≥ s,

and the corresponding Aalen-Johansen-type estimator, given occupation of state h at time s, is

Q̂^LM(t) = Q^LM(s) π_{(s,t]} (I + dÂ^*LM(u)),   (5.9)

where Q^LM(s) is the (k + 1) row vector with element h equal to 1 and other elements equal
to 0.
to 0. For fixed s, the asymptotic properties of (5.9) follow from the results of Glidden
(2002).
PLUG-IN FOR NON-MARKOV PROCESSES (*) 173
The work by Titman (2015) on transition probabilities for non-Markov models should also
be mentioned here, even though the methods are not based on plug-in of intensities. Follow-
ing Uña-Alvarez and Meira-Machado (2015), Titman suggested a similar extension (i.e.,
based on sub-setting) of the Pepe estimator for a transient state j in a progressive model.
To estimate Ph j (s,t), one looks at the sub-set of processes Vi (·) in state h at time s and,
for fixed s, this transition probability is estimated as the difference between Kaplan-Meier
estimators of staying in sets of states Sh j and Sh j ∪ { j}, respectively, at time t where Sh j
is the set of states reachable from h and from which j can be reached. A variance estimator
was also presented.
For any state, j (absorbing or transient) in any multi-state model (progressive or not), Tit-
man (2015) also suggested another estimator for Ph j (s,t) based on sub-setting to processes
in state h at time s, as follows. Define Rh j to be the set of states reachable from h but
from which j cannot be reached. For the considered sub-set of processes, the following
competing risks process for u ≥ s is defined when j is an absorbing state
Vs*(u) = 0 if V(u) ∉ Rhj ∪ {j},
         1 if V(u) ∈ Rhj,
         2 if V(u) = j.
For the considered sub-set, this process is linked to V (t) by the relation Ph j (s,t) =
P(Vs∗ (t) = 2) and the desired transition probability can be estimated using the Aalen-
Johansen estimator for the cause 2 cumulative incidence for Vs∗ (t). More specifically, if
N*sℓ(u) counts cause ℓ = 1, 2 events and Y*s(u) is the number still at risk for cause 1 or 2
events at time u−, then the estimator is
P̂^T_hj(s,t) = ∫_s^t P̂(Vs*(u) = 0 | Vs*(s) = 0) dN*s2(u)/Y*s(u).
If j is a transient state, then the following survival process for u ≥ s is defined for the
considered sub-set of processes
Vs*(u) = 0 if V(u) ∉ Rhj,
         1 if V(u) ∈ Rhj.
For this sub-set, the process Vs∗ (t) is related to V (t) via Ph j (s,t) = P(Vs∗ (t) = 0)P(V (t) =
j | Vs∗ (t) = 0), where the first factor can be estimated by the Kaplan-Meier estimator for
Vs∗ (t). Titman (2015) proposed to estimate the second factor by the relative frequency of
processes in state j at time t among those for which Vs∗ (t) = 0, i.e., by
and since the Qh (t) may be estimated by (5.8), we can estimate µ(t) by plug-in. A difficulty
is, though, that one has to decide on the number of terms to include in the sum, a choice
that need not be clear-cut since, for a large number, h, of events, there may not be sufficient
data to properly estimate Qh(t).
We shall later (Section 5.5.4) see how direct models for µ(t) may be set up using general-
ized estimating equations. This also leads to an alternative plug-in estimator suggested by
Cook et al. (2009) that we will discuss there.
and

Q2^(b)(t) = ∫_0^t P00(0, u) α01(u) ∫_u^t P11(u, x) α12(x) P22(x,t) dx du.
Suppose now that the transition intensities out of states 1 and 2 depend on both t and d
and that we have modeled α12 (t, d), α13 (t, d) and α23 (t, d). In such a situation, for t > s,
the probability of staying in state 1 between times s and t given entry into the state at time
T1 ≤ s is given by

P11(s,t; T1) = exp(−∫_s^t (α12(u, u − T1) + α13(u, u − T1)) du),

because the waiting time distribution in state 1 given entry at T1 has hazard function
α12(u, u − T1) + α13(u, u − T1) at time u. Similarly, the probability of staying in state 2
between times s and t given entry into the state at time T2 ≤ s is given by

P22(s,t; T2) = exp(−∫_s^t α23(u, u − T2) du),
and

Q2^(b*)(t) = ∫_0^t P00(0, u) α01(u) ∫_u^t P11(u, x; u) α12(x, x − u) P22(x,t; x) dx du.
Note that P22 in this expression could also depend on the time u of 0 → 1 transition (though,
in that case the process would not be termed semi-Markov). This idea generalizes to other
progressive semi-Markov processes and to multi-state processes where the dependence on
the past is modeled using adapted time-dependent covariates; however, both the resulting
expressions and the associated variance calculations tend to get rather complex, as demon-
strated by Shu et al. (2007) for the irreversible illness-death model.
Markov and non-Markov processes
For Markov processes, the intensities at time t only depend on the past history via
the state occupied at t and, possibly, via time-fixed covariates. They have attrac-
tive mathematical properties, most importantly that transition probabilities may be
obtained using plug-in via the product-integral.
Both the two-state model and the competing risks model are born Markov; however,
for more complicated multi-state models the Markov assumption is restrictive and
may not be fulfilled in practical examples. Analysis of non-Markov processes is less
straightforward, though state occupation probabilities (and expected length of stay
in a state) may be obtained using product-integration.
5.3 Landmarking
Plug-in works for hazard models with time-fixed covariates and for some models with
adapted time-dependent covariates as exemplified in Sections 3.7, 5.1, and 5.2.4. For haz-
ard models with non-adapted time-dependent covariates Z(t), it is typically not possible to
express parameters such as transition or state occupation probabilities using only the tran-
sition intensities. This is because the future course of the process V (t) will also depend on
the way in which the time-dependent covariates develop. In such a situation, in order to
estimate these probabilities, a joint model for V (t) and Z(t) is needed and we will briefly
discuss such joint models in Section 7.4. One way of approximating these probabilities
using plug-in estimators is based on landmarking. In Sections 5.3.1-5.3.3, we will discuss
this concept in the framework of the two-state model for survival data (Figure 1.1) with an
illustration using the example on bone marrow transplantation in acute leukemia (Example
1.1.7). In Section 5.3.4, we will briefly mention the extensions needed for more complex
multi-state models, and Section 5.3.5 provides some of the mathematical details.
with the same smoothing functions for all covariates. If, in Equation (5.11), we let f1 (s1 ) =
· · · = fm (s1 ) = 0, then βk will be the effect of covariate k at the first landmark, s1 . A typical
choice could be m = 2 and

f1(s) = (s − s1)/(sL − s1),   f2(s) = f1(s)².
To fit these models, a data duplication trick, following the lines of Section 3.8 can be ap-
plied. Thus, L copies of the data set are needed, where copy number j includes all subjects
still at risk at landmark s_j, i.e., subjects i with Yi(s_j) = 1. The Cox model is stratified on
j (to yield separate baseline hazards α0s_j(t)), and the model in stratum j should include
the interactions Z · fℓ(s_j), ℓ = 1, . . . , m. For inference, robust standard deviations should be
used.

Table 5.1 Bone marrow transplantation in acute leukemia: Distribution of the time-dependent co-
variates ANC500 and GvHD at entry and at five landmarks (ANC: Absolute neutrophil count,
GvHD: Graft versus host disease).

                     Landmark s_j (months)
                   0      0.5    1.0    1.5    2.0    2.5
At risk Y(s_j)     2099   1988   1949   1905   1876   1829
ANC500(s_j) = 1    0      906    1912   1899   1874   1828
GvHD(s_j) = 1      0      180    391    481    499    495
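The data duplication trick can be sketched in plain Python (hypothetical subject records and field names, not the book's R/SAS code): each landmark contributes one copy of the at-risk subjects, with the time-dependent covariate frozen at its landmark value.

```python
def stack_landmark_data(subjects, landmarks, horizon):
    """Data-duplication trick: copy number j of the data set contains the
    subjects still at risk at landmark s_j, administratively censored at the
    horizon s_j + `horizon`; `stratum` records j for the stratified Cox fit.

    Each subject is a dict with 'time', 'status' and a function 'z' returning
    the time-dependent covariate value at a given time (names are illustrative).
    """
    rows = []
    for j, s in enumerate(landmarks):
        for sub in subjects:
            if sub["time"] <= s:                  # not at risk at landmark s_j
                continue
            t_hor = s + horizon
            time = min(sub["time"], t_hor)        # censor at the horizon
            status = sub["status"] if sub["time"] <= t_hor else 0
            rows.append({"stratum": j, "entry": s, "time": time,
                         "status": status, "z": sub["z"](s)})
    return rows

# Two hypothetical subjects; z(t) is 1 once, say, GvHD has occurred by time t.
subjects = [
    {"time": 4.0, "status": 1, "z": lambda t: 1 if t >= 1.0 else 0},
    {"time": 0.8, "status": 1, "z": lambda t: 0},
]
stacked = stack_landmark_data(subjects, landmarks=[0.5, 1.0], horizon=6.0)
```

The stacked rows, with `stratum` as the stratification variable and delayed entry at the landmark, are exactly what a stratified Cox fit with robust variances would consume.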
Having separate baseline hazards for each landmark will provide different models for the
hazard at some time points if the time horizons thor (s j ) are chosen in such a way that
prediction intervals overlap. Therefore, the baseline hazards could also be taken to vary
from one landmark to the next in a smooth way by letting

α0s(t) = α0(t) exp(∑_{ℓ=1}^{m′} ηℓ gℓ(s)),   (5.12)

with all gℓ(s1) = 0, such that α0(t) refers to the first landmark, s1. Often, one would choose
the same smoothing functions for regression coefficients and baseline hazards, i.e., m = m0
and g` = f` . This model can also be fitted using the duplicated data set; however, stratifica-
tion on j should no longer be imposed because only a single baseline hazard is needed. The
model with baseline hazard given by Equation (5.12) provides a description for all values
of s and, thereby, using this model conditional predictions may be obtained for all s, not
only for the landmarks chosen when fitting the model.
Note the similarity with the tests for proportional hazards using time-dependent covariates
as discussed in Section 3.7.7. It is seen that the idea of studying departures from proportion-
ality using suitably defined functions f j (t) may also be used to obtain flexible models with
non-proportional hazards – both for models with time-fixed and time-dependent covariates.
Table 5.3 Bone marrow transplantation in acute leukemia: Estimated (Est) smooth effects of time-
dependent covariates ANC500 and GvHD (with robust SD) based on landmark super models using
a 6-month horizon from landmarks (ANC: Absolute neutrophil count, GvHD: Graft versus host
disease).
                             Stratified            Smoothed
Covariate          Parameter   Est      SD          Est      SD
ANC500(s)          β1         -0.322   0.106       -0.298   0.094
ANC500(s) f1(s)    γ11        -1.191   1.670       -1.393   1.630
ANC500(s) f2(s)    γ12        -2.257   1.791       -2.038   1.765
GvHD(s)            β2          0.663   0.143        0.674   0.140
GvHD(s) f1(s)      γ21         0.391   0.430        0.333   0.417
GvHD(s) f2(s)      γ22        -0.226   0.342       -0.175   0.333
g1(s)              η1                               1.414   1.606
g2(s)              η2                               1.940   1.751
the effect is rather constant over time, and presence of ANC500 seems to be increasingly
protective over time. Based on this model we predict the 6-month conditional relapse-free
survival probabilities given still at risk at the respective landmarks, i.e., using thor (s j ) =
s j + 6 months. Figure 5.1 shows these predictions for subjects with either ANC500(s j ) =
GvHD(s_j) = 0 or ANC500(s_j) = GvHD(s_j) = 1. It is seen that the time-varying effect of, in
particular, ANC500 has a quite marked influence on the curves.
Next, we fit a landmark super model with coefficients given by (5.11) choosing f1 (s) =
(s − s1 )/(sL − s1 ) = (s − 0.5)/2 and f2 (s) = f1 (s)2 . Estimated coefficients are shown in
Table 5.3 (stratified), and Figure 5.2 shows the associated conditional relapse-free survival
curves. These are seen to be roughly consistent with those based on the simple landmark
model. Table 5.3 (smoothed) also shows coefficients in a super model with a single base-
line hazard α0 (t) (i.e., at s1 = 0.5 months) and later landmark-specific baseline hazards
(i.e., at s j > 0.5 months) specified by (5.12) and, thereby, varying smoothly among land-
marks. The smoothing functions were chosen as g` (s) = f` (s), ` = 1, 2. Figure 5.3 shows
the corresponding estimated conditional relapse-free survival probabilities – quite similar
to those in the previous figures.
[Figure: panel (a) ANC500(s_j) = GvHD(s_j) = 0; y-axis: conditional survival probability;
x-axis: time since bone marrow transplantation (months).]

Figure 5.1 Bone marrow transplantation in acute leukemia: Estimated 0- to 6-month conditional
relapse-free survival probabilities given survival till landmarks s_j = 0.5, 1.0, 1.5, 2.0, and 2.5
months.
[Figure: y-axis: conditional survival probability; x-axis: time since bone marrow
transplantation (months).]

Figure 5.2 Bone marrow transplantation in acute leukemia: Estimated 0- to 6-month conditional
relapse-free survival probabilities given survival till landmarks s_j = 0.5, 1.0, 1.5, 2.0, and 2.5
months. Estimates are based on a landmark super model with coefficients varying smoothly among
landmarks and landmark-specific baseline hazards.
may be chosen for the different transitions, though the same landmarks are typically used
for all transition hazards.
[Figure: y-axis: conditional survival probability; x-axis: time since bone marrow
transplantation (months).]

Figure 5.3 Bone marrow transplantation study: Estimated 0- to 6-month conditional relapse-free
survival probabilities given survival till landmarks s_j = 0.5, 1.0, 1.5, 2.0, and 2.5 months. Estimates
are based on a landmark super model with coefficients and baseline hazards varying smoothly
among landmarks.
for each of the data sub-sets that together constitute the stacked data set. The estimating
equations for βs are obtained from the Cox log-partial likelihood

∑_{i=1}^n ∫_s^{thor} Yi(s) log( exp(LP_{s,i}) / ∑_k Yk(t) exp(LP_{s,k}) ) dNi(t),   s = s1, . . . , sL,
by taking derivatives with respect to the parameters in the linear predictor LPs = β1s Z1 (s) +
LANDMARKING 183
· · · + β_ps Z_p(s) and equating to zero. The resulting Breslow estimator for the cumulative
baseline hazard A_0s(t), t < thor, is

Â_0s(t) = ∫_s^t ∑_i Yi(s) dNi(u) / ∑_i Yi(u) exp(L̂P_{s,i}).
This model contains L sets of parameters, each consisting of p regression parameters and a
baseline hazard, estimated separately for each landmark s j , j = 1, . . . , L. If the horizon time,
thor (s j ), used when analyzing data at landmark s j is no larger than the subsequent s j+1 , then
the analyses at different landmarks will be independent and inference for parameters can in
principle be performed using model-based variances; however, typically robust variances
are used for all landmarking models.
The model (5.11) is also a stratified Cox model where all strata contribute to the estimat-
ing equations for regression parameters. The estimating equations are obtained from the
log(pseudo-likelihood)
∑_{j=1}^L ∑_{i=1}^n ∫_{s_j}^{thor(s_j)} Yi(s_j) log( exp(LP_{s_j,i}) / ∑_k Yk(t) exp(LP_{s_j,k}) ) dNi(t)   (5.13)
Also in model (5.12), all strata contribute to the estimation of the coefficients in the linear
predictor
\[
\mathrm{LP}_{s_j} = \sum_{\ell=1}^{m_0} \eta_\ell g_\ell(s_j) + \sum_{k=1}^{p} \beta_{k s_j} Z_k(s_j)
= \sum_{\ell=1}^{m_0} \eta_\ell g_\ell(s_j) + \sum_{k=1}^{p} \Bigl(\beta_k + \sum_{\ell=1}^{m} \gamma_{k\ell} f_\ell(s_j)\Bigr) Z_k(s_j),
\]
and the pseudo-log-likelihood has the same form as (5.13). Model (5.12) has only one
cumulative baseline hazard which may be estimated by
Here, event times belonging to several prediction intervals give several contributions to the
estimator. In all cases, inference for regression parameters (including γs and ηs) is based
on robust variances.
5.4 Micro-simulation
In Section 4.1 and in previous sections of this chapter, we have utilized mathematical re-
lationships between transition intensities and marginal parameters to obtain estimates for
the latter via plug-in. This worked well for the simpler models like those depicted in Figures 1.1-1.3 and, more generally, for Markov models. However, when the multi-state
models become more involved (e.g., Section 5.2), this approach becomes more cumber-
some. In this section, we will briefly discuss a brute force approach to estimating marginal
parameters based on transition intensities, namely micro-simulation (or discrete event sim-
ulation) (e.g., Mitton, 2000; Rutter et al., 2011).
The relevant multinomial distribution has index parameter 1 because at most one of the
counting processes Nh j (·) jumps at t, and the probability parameters are given by the tran-
sition intensities (see also Section 3.1). The intensities may be given conditionally on time-
fixed covariates Z and also conditionally on adapted time-dependent covariates Z(t) (Sec-
tion 3.7). In these cases, the time-fixed covariate pattern for which processes are generated
must first be decided upon. Non-adapted time-dependent covariates Z(t) involve extra ran-
domness, and micro-simulations can in this case only be carried out if a model for this extra
randomness is also set up. This will require joint modeling of V (t) and Z(t) (Section 7.4)
and will not be further considered here.
It may be computationally involved to generate paths for V (t) in steps of size dt as just
described and, fortunately, the simulations may be carried out in more efficient ways. This
is because, at any time t, we have locally a competing risks situation, and methods for
generating competing risks data (e.g., Beyersmann et al., 2009) may be utilized. Following
the recommended approach from that paper, the simulation algorithm goes as follows:
1. Generate a time T_1 of transition out of the initial state 0 based on the survival function
\[
S_0(t) = \exp\Bigl(-\sum_{h \neq 0} \int_0^t \alpha_{0h}(u)\, du\Bigr) = P_{00}(0,t).
\]
Given a transition at T_1, the process moves to state h_1 ≠ 0 with probability
\[
\frac{\alpha_{0h_1}(T_1)}{\sum_{h \neq 0} \alpha_{0h}(T_1)}.
\]
2. If the state, say h̃, reached when drawing a state from this multinomial distribution is
absorbing, then stop.
3. If state h̃ is transient, then generate a time T_2 > T_1 of transition out of that state based on the (conditional) survival function
\[
S_1(t \mid T_1) = \exp\Bigl(-\sum_{h \neq \tilde h} \int_{T_1}^t \alpha_{\tilde h h}(u \mid T_1)\, du\Bigr) = P_{\tilde h \tilde h}(T_1, t).
\]
Given a transition at T_2, the process moves to state h_2 ≠ h̃ with probability
\[
\frac{\alpha_{\tilde h h_2}(T_2)}{\sum_{h \neq \tilde h} \alpha_{\tilde h h}(T_2)}.
\]
4. Go to step 2.
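The steps above can be sketched in a few lines of code. The following minimal Python illustration (the book's examples use R and SAS) is for a hypothetical Markov illness-death model with constant intensities, so the waiting time out of a state is exponential with the total out-rate and the destination is drawn with probabilities α_{·h}(T)/∑_h α_{·h}(T); the rate values are illustrative only:

```python
import random

# Hypothetical constant intensities: 0 -> 1 (illness), 0 -> 2 (death),
# 1 -> 2 (death); state 2 is absorbing.
RATES = {0: {1: 0.10, 2: 0.05}, 1: {2: 0.20}, 2: {}}

def simulate_path(rates, tau=50.0, rng=random.Random(1)):
    """Return the jump times and states visited, [(0.0, 0), (T1, h1), ...]."""
    t, state, path = 0.0, 0, [(0.0, 0)]
    while rates[state]:                     # stop once an absorbing state is hit
        total = sum(rates[state].values())
        t += rng.expovariate(total)         # time of transition out of `state`
        if t >= tau:                        # time horizon for the simulation
            break
        u, cum = rng.random() * total, 0.0  # destination drawn with probability
        for h, a in rates[state].items():   # alpha_{state,h}(T) / total
            cum += a
            if u <= cum:
                state = h
                break
        path.append((t, state))
    return path

def state_at(path, t):
    """State occupied at time t by a simulated path."""
    current = path[0][1]
    for tj, sj in path:
        if tj <= t:
            current = sj
    return current

# Plug-in estimate of Q_1(2), the probability of being in state 1 at time 2.
q1 = sum(state_at(simulate_path(RATES), 2.0) == 1 for _ in range(2000)) / 2000
```

Averaging indicators over the simulated paths in this way is exactly the "simple average" estimator of a marginal parameter discussed below.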
Note that, in step 3, as the algorithm progresses, the past will contain more and more infor-
mation in the form of previous times (T1 , T2 , . . . ) and types of transition, and the transition
intensities may depend on this information, e.g., in the form of adapted time-dependent
covariates. The simulation process stops when reaching an absorbing state. For processes
without an absorbing state (e.g., Figure 1.4 without state 2), one has to decide upon a time
horizon (say, τ) within which the processes are generated. In this case, attention must be
restricted to marginal parameters not relating to times > τ.
The algorithm works well with a parametric specification of the intensities, e.g., as piece-
wise constant functions of time (Iacobelli and Carstensen, 2013). However, for non- or
semi-parametric models, such as the Cox model, in the steps where the next state is determined given a transition at time T_ℓ, i.e., when computing the probabilities α_{h_{ℓ-1}h_ℓ}(T_ℓ)/(∑_{h≠h_{ℓ-1}} α_{h_{ℓ-1}h}(T_ℓ)) and drawing from the associated multinomial distribution, one would need the jumps of the estimated cumulative hazards Â_{h_{ℓ-1}h}(T_ℓ) at that time and, typically, at most one of these will be > 0. For such situations, an alternative version of the algorithm is needed
and, for that purpose, the method not recommended by Beyersmann et al. (2009) is appli-
cable. This method builds on the latent failure time approach to competing risks discussed
in Section 4.4. However, the method does provide data with the correct distribution. The
algorithm then goes as follows:
1. Generate independent latent potential times T̃_{11}, ..., T̃_{1k} of transition from the initial state 0 to each of the other states based on the ‘survival functions’
\[
S_{0h}(t) = \exp\Bigl(-\int_0^t \alpha_{0h}(u)\, du\Bigr), \qquad h = 1, \dots, k.
\]
Let T_1 be the minimum of T̃_{11}, ..., T̃_{1k}; if this minimum corresponds to T_1 = T̃_{1h_1}, then the process moves to state h_1 at time T_1.
2. If the state, h̃, thus reached is absorbing, then stop.
3. If state h̃ is transient, then generate independent latent potential times T̃_{2h}, h ≠ h̃, all > T_1, of transition from that state based on the (conditional) ‘survival functions’
\[
\tilde S_{\tilde h h}(t \mid T_1) = \exp\Bigl(-\int_{T_1}^{t} \alpha_{\tilde h h}(u \mid T_1)\, du\Bigr), \qquad h \neq \tilde h.
\]
Let T_2 be the minimum of T̃_{2h}, h ≠ h̃; if this minimum corresponds to T_2 = T̃_{2h_2}, then the process moves to state h_2 at time T_2.
4. Go to step 2.
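A sketch of this latent failure time version, under the same hypothetical constant intensities as above: with constant hazard a, the conditional ‘survival function’ inverts to T = T_1 - log(U)/a (1 - U is drawn below to avoid log(0)):

```python
import math
import random

# Hypothetical constant-intensity illness-death model, as in the sketch above.
RATES = {0: {1: 0.10, 2: 0.05}, 1: {2: 0.20}, 2: {}}

def simulate_path_latent(rates, tau=50.0, rng=random.Random(2)):
    """Latent failure time version: one independent latent potential time per
    possible destination; the minimum determines the transition."""
    t, state, path = 0.0, 0, [(0.0, 0)]
    while rates[state]:
        # latent times solve exp(-a (T - t)) = U for each destination h
        latent = {h: t - math.log(1.0 - rng.random()) / a
                  for h, a in rates[state].items()}
        h1 = min(latent, key=latent.get)
        if latent[h1] >= tau:
            break
        t, state = latent[h1], h1
        path.append((t, state))
    return path
```

Although it builds on latent failure times, the paths produced have the correct distribution, as noted in the text.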
Based on N processes (V_ℓ^*(t), ℓ = 1, 2, ..., N) generated in either of these ways, the desired marginal parameter may be estimated by a simple average, e.g.,
The variability of the estimator Qbh (t) obtained by simulating using the estimate θb based
on the observed data can now, for large N, be quantified by (5.14). An alternative way of
evaluating this variability would be to use the bootstrap.
[Figure: survival probability, U (y-axis, 0.0–1.0) against survival time, T (x-axis, 0–10).]
Figure 5.4 Illustration of sampling from a survival function: For the solid curve, values of U of 0.4 and 0.1, respectively, provide survival times of 1.53 and 3.84; for the dashed curve, U = 0.4 gives T = 4.59 and U = 0.1 a survival time censored at τ = 10.
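The sampling illustrated in Figure 5.4 amounts to solving S(T) = U, i.e., A(T) = -log(U) for the cumulative hazard A. A minimal Python sketch for a hypothetical piecewise constant hazard (the cut-points, rates, and horizon below are illustrative only, not those behind the figure):

```python
import math

# Hypothetical piecewise constant hazard: rate RATES[k] on [CUTS[k], CUTS[k+1]).
CUTS = [0.0, 2.0, 5.0]       # interval start points
RATES = [0.10, 0.25, 0.40]   # hazard on each interval; last one open-ended

def invert_survival(u, cuts=CUTS, rates=RATES, tau=10.0):
    """Return (time, observed): the T solving S(T) = u, censored at tau."""
    target = -math.log(u)    # required cumulative hazard A(T)
    acc = 0.0
    for k, (start, rate) in enumerate(zip(cuts, rates)):
        end = cuts[k + 1] if k + 1 < len(cuts) else float("inf")
        piece = rate * (end - start)
        if acc + piece >= target:        # A crosses the target in this interval
            t = start + (target - acc) / rate
            return (tau, False) if t > tau else (t, True)
        acc += piece
    return (tau, False)
```

Repeating this for U_1, ..., U_N drawn uniformly on (0, 1) produces a sample of (possibly τ-censored) times with survival function S, as used in the micro-simulations.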
Table 5.4 PROVA trial in liver cirrhosis: Estimates (and SD) of the probability, Q1 (t), of being in
the bleeding state 1 at time t, using either the Aalen-Johansen estimator or micro-simulation. The
SD for the estimate obtained using micro-simulation was either based on SD1 (t) given by Equation
(5.14) with N = B = 1, 000 or on 1, 000 bootstrap replications.
            Aalen-Johansen          Micro-simulation
   t       Q̂1(t)      SD        Q̂1(t)    SD1(t)   Bootstrap
  0.5      0.050     0.013      0.053     0.015     0.011
  1.0      0.081     0.016      0.074     0.020     0.016
  1.5      0.091     0.018      0.091     0.022     0.017
  2.0      0.093     0.019      0.088     0.021     0.018
  2.5      0.089     0.019      0.082     0.021     0.018
  3.0      0.089     0.019      0.075     0.022     0.019
  3.5      0.079     0.019      0.069     0.023     0.020
  4.0      0.063     0.020      0.051     0.017     0.016
From 10,000 simulated illness-death processes, the expected time (years) ε1 (τ) spent in the
bleeding state before time τ was also estimated. Figure 5.6 shows the estimate as a func-
tion of τ together with the corresponding estimate based on the integrated Aalen-Johansen
estimator. The two estimators are seen to coincide well.
In a second set of simulations, the binary covariate Z: Sclerotherapy (yes=1, no=0) was
added to the three transition intensity models assuming proportional hazards, thus requiring
three more parameters to be estimated. From N = 1, 000 processes, each re-sampled B =
1, 000 times, the probability Q1 (2) was estimated for Z = 0, 1. Histograms of the resulting
estimates are shown in Figure 5.7. It is seen that treatment with sclerotherapy reduces
[Figure: histogram of micro-simulation estimates, density against Q1(2).]
Figure 5.6 PROVA trial in liver cirrhosis: Estimates of average time, ε1(τ), spent in the bleeding state before time (year) τ; based on either the Aalen-Johansen estimator or on micro-simulation (N = 10,000).
[Figure 5.7: histograms of micro-simulation estimates, density against Q̂1(2 | Z), for Z = 0 and Z = 1.]
the probability of being in the bleeding state, likely owing to the fact that this treatment
increases the death intensity without bleeding (Table 3.3).
Micro-simulation
Micro-simulation is a general plug-in technique for marginal parameters in a multi-
state model when intensities have been specified. The strength of the method is
its generality, and it is applicable in situations where plug-in using a mathematical
expression is not feasible. This includes estimation based on a model where the
intensities are functions of the past given by adapted time-dependent covariates.
Equations (5.16) are unbiased by (5.15) since, given covariates Z, E(U(β_0)) = 0. The function A(β, Z_i) is typically the p-vector
\[
A(\beta, Z_i) = \frac{\partial}{\partial \beta_j} g^{-1}(\beta^T Z_i), \qquad j = 1, \dots, p,
\]
of partial derivatives of the mean function. The independent random variables Ti , i =
1, . . . , n could be vector-valued with a dimension that may vary among i-values reflecting
a clustered data structure, possibly with clusters of varying size, si . In that case, A(β β , Z i)
would be a (p×si )-matrix, possibly including an (si ×si ) working correlation matrix. How-
ever, we will for simplicity restrict attention to the scalar case where Ti is univariate.
The asymptotic properties of the solution β̂ to (5.16) rely on a Taylor expansion of (5.16) around the true parameter value β_0,
\[
U(\beta) \approx U(\beta_0) + DU(\beta_0)(\beta - \beta_0),
\]
The ‘meat’ of the sandwich, V̂, is the covariance of the GEE and the ‘bread’ is the inverse matrix of the partial derivatives, DU(β), of the GEE. If the GEEs are obtained as score
equations by equating log-likelihood derivatives to zero, then the variance estimator sim-
plifies because minus the second log-likelihood derivative equals the inverse variance of
the score (e.g., Andersen et al., 1993, ch. VI).
In our applications of GEE, we will often face the additional complication that ‘our random variables T_i’ are incompletely observed because of right-censoring, in which case inverse probability of censoring weighted (IPCW) GEE
\[
U(\beta) = \sum_i U_i(\beta) = \sum_i D_i \hat W_i A(\beta, Z_i)\bigl(T_i - g^{-1}(\beta^T Z_i)\bigr) = 0 \tag{5.18}
\]
\[
\varepsilon_0(\tau \mid Z) = g^{-1}(\beta^T Z)
\]
(where the vector Z now includes the constant covariate equal to 1). Typical link functions
could be g = log or g = identity. Now, β is estimated by solving the unbiased GEE
\[
U(\beta) = \sum_i \frac{D_i}{\hat G(X_i)}\, Z_i \bigl(X_i - g^{-1}(\beta^T Z_i)\bigr) = 0,
\]
where Ĝ is the Kaplan-Meier estimator for the distribution of C. Thus, subjects for whom
Ti ∧ τ was observed are up-weighted to also represent subjects who were censored. Using
counting process notation, the estimating equations become
\[
U(\beta) = \sum_i \int_0^\tau \frac{Z_i}{\hat G(t)} \bigl(t - g^{-1}(\beta^T Z_i)\bigr)\, dN_i(t) = 0.
\]
Tian et al. (2014) discussed asymptotic normality for β̂, the solution to these GEEs, assuming, among other things, that G is independent of Z with G(τ) > 0, and derived an expression for the sandwich estimator of the variance of β̂.
\[
\log(-\log(1 - F_1(t \mid Z))) = \log(\tilde A_{01}(t)) + \beta^T Z, \tag{5.19}
\]
where the risk parameter is linked to the covariates in the same way as in the Cox model for
survival data, i.e., using the cloglog link. (Fine and Gray allowed inclusion of deterministic
time-dependent covariates, but we will skip this possibility in what follows.) Indeed, the
Fine-Gray model (5.19) is a Cox model for the hazard function for the improper random
variable
T1 = inf{t : V (t) = 1},
which is the time of entry into state 1. This hazard function is the cause 1 sub-distribution
hazard given by
\[
\begin{aligned}
\tilde\alpha_1(t) &= \lim_{dt \to 0} P(T_1 \le t + dt \mid T_1 > t)/dt \\
&= \lim_{dt \to 0} P(T \le t + dt, D = 1 \mid T > t \text{ or } (T \le t \text{ and } D \neq 1))/dt. \tag{5.20}
\end{aligned}
\]
It follows from this expression that the sub-distribution hazard has a rather unintuitive in-
terpretation, being the cause-1 mortality rate among subjects who are either alive or have
already failed by a competing cause. We will discuss other choices of link function for the
cumulative incidence later (Section 5.5.5 and Chapter 6), and there we will see that this may
lead to other difficulties for the resulting model. Nice features of the link function in the
Fine-Gray model include the fact that predicted failure probabilities stay within the admis-
sible range between 0 and 1 and, as we shall see now, that it suggests estimating equations
inspired by the score equations resulting from a Cox model.
In the special case of no censoring, the ‘risk set’ for the sub-distribution hazard is completely observed, and we let Ỹ_i(t) = I(T_i ≥ t or (T_i ≤ t and D_i ≠ 1)) = 1 - N_{1i}(t-) (where N_{1i}(t) = I(T_i ≤ t, D_i = 1) is the counting process for cause 1 failures) be the membership indicator for this risk set. In this case the ‘Cox score’ is
\[
U_1(\beta) = \sum_i \int_0^\infty \Bigl(Z_i - \frac{\sum_j \tilde Y_j(t) Z_j \exp(\beta^T Z_j)}{\sum_j \tilde Y_j(t) \exp(\beta^T Z_j)}\Bigr)\, dN_{1i}(t).
\]
Fine and Gray (1999) used martingale results to ascertain that the resulting score equation is
unbiased and to obtain asymptotic normality for its solution. Similar results were obtained
in the case with right-censoring, conditionally independent of V (t) for given covariates,
and where the censoring times Ci are known for all i (e.g., administrative censoring). In
that case the risk set is re-defined to consist of subjects who are either still alive and uncensored at t or have failed from a competing cause before t and, at the same time, have a censoring time, C_i, exceeding t. The membership indicator for this risk set is Y_i^*(t) = Ỹ_i(t) I(C_i ≥ t) and, thus, subjects who fail from a competing cause stay in the risk set, not indefinitely, but until their time of censoring. This
leads to the ‘Cox score’
\[
U_1^*(\beta) = \sum_i \int_0^\infty \Bigl(Z_i - \frac{\sum_j Y_j^*(t) Z_j \exp(\beta^T Z_j)}{\sum_j Y_j^*(t) \exp(\beta^T Z_j)}\Bigr)\, dN_{1i}(t).
\]
\[
W_j(t) = \frac{I(C_j \ge T_j \wedge t)\, \hat G(t)}{\hat G(T_j \wedge C_j \wedge t)}, \tag{5.21}
\]
where Ĝ estimates the censoring distribution either non-parametrically using the Kaplan-
Meier estimator or via a regression model. There are three kinds of subjects who were not
observed to fail from cause 1 before t:
1. j is still alive and uncensored in which case W j (t) = 1,
2. j was censored before t in which case W j (t) = 0,
3. j failed from a competing cause before t in which case W_j(t) = Ĝ(t)/Ĝ(T_j), the conditional probability of being still uncensored at t given uncensored at the failure time T_j.
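The weight (5.21) and its three cases can be computed directly; a small Python sketch (Ĝ below is a hypothetical exponential censoring survival curve, used only to make the cases concrete):

```python
import math

def fine_gray_weight(t, T, C, G):
    """IPCW weight (5.21): W_j(t) = I(C_j >= T_j ^ t) G(t) / G(T_j ^ C_j ^ t).
    T: observed failure time (use float('inf') if none), C: censoring time,
    G: estimated censoring survival function."""
    if min(T, t) > C:            # case 2: censored before min(T, t)
        return 0.0
    return G(t) / G(min(T, C, t))

G_hat = lambda u: math.exp(-0.1 * u)                # hypothetical G-hat
w_alive = fine_gray_weight(2.0, 5.0, 10.0, G_hat)   # case 1: weight 1
w_cens = fine_gray_weight(5.0, 9.0, 3.0, G_hat)     # case 2: weight 0
w_comp = fine_gray_weight(5.0, 2.0, 10.0, G_hat)    # case 3: G(t)/G(T_j)
```

The three calls reproduce the three kinds of subjects listed above: still alive and uncensored, censored before t, and failed from a competing cause before t.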
The resulting GEE are U_1^W(β) = 0 where
\[
U_1^W(\beta) = \sum_i \int_0^\infty \Bigl(Z_i - \frac{\sum_j W_j(t)\, \tilde Y_j(t) Z_j \exp(\beta^T Z_j)}{\sum_j W_j(t)\, \tilde Y_j(t) \exp(\beta^T Z_j)}\Bigr) W_i(t)\, dN_{1i}(t), \tag{5.22}
\]
and Ỹ_i(t) = 1 - N_{1i}(t-), the indicator of no observed cause 1 failure before time t. Fine
and Gray (1999) showed that these equations are approximately unbiased, that their solu-
tions are asymptotically normal, and derived a consistent sandwich variance estimator. The
estimator
\[
\hat{\tilde A}_{01}(t) = \sum_i \int_0^t \frac{W_i(u)\, dN_{1i}(u)}{\sum_j W_j(u)\, \tilde Y_j(u) \exp(\hat\beta^T Z_j)}
\]
for the cumulative baseline sub-distribution hazard Ã_{01}(t) was also presented with asymptotic results.
The Fine-Gray model can be used for a single cause or for all causes – one at a time –
and, as we have seen, inference requires modeling of the censoring distribution. When all
causes are modeled, there is no guarantee that, for any given covariate pattern, one minus
the sum of the estimated cumulative incidences given that covariate pattern is a proper
survival function (e.g., Austin et al., 2021). Furthermore, the partial likelihood approach is
not fully efficient. Based on such concerns, Mao and Lin (2017) proposed an alternative
non-parametric likelihood approach to joint modeling of all cumulative incidences. The
Jacod formula (3.1) for the competing risks model was re-written in terms of the cumulative
incidences and their derivatives – the sub-distribution densities
\[
f_j(t) = \frac{d}{dt} F_j(t) = \alpha_j(t)\, S(t),
\]
as follows. For two causes of failure, the contribution to the Jacod formula from an obser-
vation at time X is, with the notation previously used,
in which the cumulative incidences may be parametrized, e.g., as in the Fine-Gray model
or using other link functions. Mao and Lin (2017) showed that, under suitable conditions,
the resulting estimators are efficient and asymptotically normal. Similar to modeling via
hazard functions, this approach does not require a model for censoring.
Cause-specific time lost
Conner and Trinquart (2021) used the approach of Tian et al. (2014) to study regression
models for the τ-restricted cause-specific time lost in the competing risks model. Following
Section 5.1.2, the parameter of interest is
\[
\varepsilon_h(\tau) = \tau - E(T_h \wedge \tau) = \int_0^\tau Q_h(u)\, du,
\]
where Th is the time of entry into state h, possibly observed with right-censoring. Let the
potential right-censoring time for subject i be Ci and assume the generalized linear model
\[
\varepsilon_h(\tau \mid Z) = g^{-1}(\beta_h^T Z)
\]
(where the vector Z includes the constant covariate equal to 1). Typical link functions could
be g = log or g = identity. Now, β h is estimated by solving the unbiased GEE
\[
U(\beta_h) = \sum_i \int_0^\tau \frac{Z_i}{\hat G(t)} \bigl(\tau - t - g^{-1}(\beta_h^T Z_i)\bigr)\, dN_{hi}(t) = 0,
\]
where Ĝ is the Kaplan-Meier estimator for the distribution of C and N_{hi}(t) the counting process for h-events for subject i.
Conner and Trinquart (2021) discussed conditions for asymptotic normality of the resulting solution β̂_h to these GEEs and derived an expression for the sandwich estimator for the variance of β̂_h.
µ(t) = E(N(t)),
where N(t) counts the number of events in [0,t]. We will distinguish between the two situa-
tions where either there are competing risks in the form of a terminal event, the occurrence
of which prevents further recurrent events from happening, or there is no such terminal
event.
No terminal event
We will begin by considering the latter situation. The parameter µ(t) is closely linked to a
partial transition rate as introduced in (5.7), as follows. The partial transition rates for this
model are (approximately for small dt > 0)
\[
\alpha^*_{h,h+1}(t) \approx P(V(t + dt) = h + 1 \mid V(t) = h)/dt,
\]
where Yi (t) = 1 if i is still uncensored at time t, i.e., Yi (t) = I(Ci > t) (note that, for recurrent
events without competing risks, times Ci of censoring will always be observed). Equation
(5.23) is solved by
\[
d\mu(t) = \frac{\sum_i Y_i(t)\, dN_i(t)}{\sum_i Y_i(t)},
\]
corresponding to estimating the mean function by the Nelson-Aalen estimator
\[
\hat\mu(t) = \int_0^t \frac{\sum_i dN_i(u)}{\sum_i Y_i(u)}. \tag{5.24}
\]
(Note that we only have dNi (t) = 1 if Yi (t) = 1.) Equation (5.23) is unbiased if censoring
is independent of the multi-state process in which case the estimator can be shown to be
consistent (Lawless and Nadeau, 1995; Lin et al., 2000). For more general censoring, (5.23)
may be replaced by the IPCW GEE
\[
\sum_i \frac{Y_i(t)}{\hat G_i(t)} \bigl(dN_i(t) - d\mu(t)\bigr) = 0,
\]
where Ĝ_i(t) estimates the probability E(Y_i(t)) that subject i is uncensored at time t, possibly
via a regression model. For both estimators, a sandwich variance estimator is available, or
bootstrap methods may be used.
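The estimator (5.24) is simple to compute directly; a minimal Python sketch (hypothetical data layout: one list of observed event times and one censoring time per subject, with Y_i(u) = I(C_i > u)):

```python
# Nelson-Aalen-type estimator (5.24) of mu(t) = E(N(t)) for recurrent events
# without a terminal event.

def mean_function(event_times, cens_times, t):
    """mu_hat(t): each observed event at time u <= t contributes
    1 / #{i : C_i > u} (the number of subjects still uncensored at u)."""
    total = 0.0
    for times, c in zip(event_times, cens_times):
        for u in times:
            if u <= t and u <= c:        # only observed events contribute
                at_risk = sum(1 for cj in cens_times if cj > u)
                total += 1.0 / at_risk
    return total
```

With no censoring before t, the estimate is simply the average number of events per subject by time t.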
A multiplicative regression model for the mean function, inspired by the Cox regression
model, is
\[
\mu(t \mid Z) = \mu_0(t) \exp(\beta^T Z), \tag{5.25}
\]
see Lawless and Nadeau (1995) and Lin et al. (2000), often referred to as the LWYY model.
Unbiased GEE may be established from a working intensity model with a Cox type inten-
sity where the score equations are
and
\[
\sum_i Y_i(t) Z_i \bigl(dN_i(t) - \exp(\beta^T Z_i)\, d\mu_0(t)\bigr) = 0. \tag{5.27}
\]
i
Equation (5.26) is, following the lines of Section 3.3, for fixed β solved by
\[
\hat\mu_0(t) = \int_0^t \frac{\sum_i dN_i(u)}{\sum_i Y_i(u) \exp(\beta^T Z_i)} \tag{5.28}
\]
and inserting this solution into (5.27) leads to the equation
\[
\sum_i \int \Bigl(Z_i - \frac{\sum_j Y_j(t) Z_j \exp(\beta^T Z_j)}{\sum_j Y_j(t) \exp(\beta^T Z_j)}\Bigr)\, dN_i(t) = 0,
\]
which is identical to the Cox score Equation (3.17). To assess the uncertainty of the estimator, a sandwich estimator, as derived by Lin et al. (2000), must be used instead of the model-based SD obtained from the derivative of the score. Further, the baseline mean function μ_0(t) may be estimated by the Breslow-type estimator obtained by inserting β̂ into (5.28).
Terminal event
The situation where there are events competing with the recurrent events process was stud-
ied by Cook and Lawless (1997) and by Ghosh and Lin (2000, 2002), see also Cook et al.
(2009). In this situation, the partial transition rate is again
\[
\alpha^*_{h,h+1}(t) \approx P(V(t + dt) = h + 1 \mid V(t) = h)/dt
\]
and, as in the case of no competing risks, it may be estimated by the Nelson-Aalen estimator
\[
\hat A^*(t) = \int_0^t \frac{\sum_i dN_i(u)}{\sum_i Y_i(u)}.
\]
The quantity A^*(t) is not of much independent interest (it conditions on the future); however, since in the case of competing risks we have
\[
E(N(t)) = \int_0^t S(u)\, dA^*(u),
\]
the mean function may be estimated by plugging in Â^*(·) together with the Kaplan-Meier estimator Ŝ(·) for S(t) = P(T_D > t). Asymptotic results for this estimator were presented by Ghosh and Lin (2000).
Regression analysis for µ(t) in the presence of a terminal event can proceed in two direc-
tions. Cook et al. (2009) discussed a plug-in estimator combining a regression model for
S(t) via a Cox model for the marginal death intensity and one for A∗ (t) using the estimat-
ing Equations (5.26)-(5.27) for µ(t | Z ) without competing risks. As it was the case for
the plug-in models discussed previously, this enables prediction of E(N(t) | Z ) but does
not provide regression parameters that directly quantify the association. To obtain this, the
direct model for µ(t | Z ) discussed by Ghosh and Lin (2002) is applicable. This model
also has the multiplicative structure (5.25), and direct IPCW GEE for this marginal param-
eter were set up, as follows. Ghosh and Lin (2002), following Fine and Gray (1999), first
considered the case with purely administrative censoring, i.e., where the censoring times
Ci are known for all subjects i and, next, for general censoring, IPCW GEE were studied.
The resulting equations are, except for the factors Ỹ_i(t) = 1 - N_i(t-) appearing in Equation (5.22) for the Fine-Gray model, identical to that equation, i.e.,
\[
U_1^W(\beta) = \sum_i \int_0^\infty \Bigl(Z_i - \frac{\sum_j W_j(t) Z_j \exp(\beta^T Z_j)}{\sum_j W_j(t) \exp(\beta^T Z_j)}\Bigr) W_i(t)\, dN_i(t), \tag{5.30}
\]
where the weights Wi (t) are given by (5.21). Ghosh and Lin (2002) presented asymptotic re-
sults for the solution β̂, including a sandwich-type variance estimator, and for the Breslow-type estimator
\[
\hat\mu_0(t) = \sum_i \int_0^t \frac{W_i(u)\, dN_i(u)}{\sum_j W_j(u) \exp(\hat\beta^T Z_j)}
\]
for the baseline mean function.
As discussed in Section 4.2.3, the occurrence of the competing event (‘death’) must be
considered jointly with the recurrent events process N(t) when a terminal event is present.
To this end, Ghosh and Lin (2002) also studied an inverse probability of survival weighted
(IPSW) estimator, as follows. In (5.30), the weights are re-defined as
I(TDi ∧Ci ≥ t)
WiD (t) =
b | Zi )
S(t
where the denominator estimates the conditional probability given covariates of survival
past time t. Ghosh and Lin showed that the corresponding GEE are approximately unbiased
and derived asymptotic properties of the resulting estimator β̂. Though the details were not given, this also provides the joint asymptotic distribution of β̂ and, say, β̂_D, the estimated regression coefficient in a Cox model for the survival time distribution. Note that, compared
to the IPCW approach, the IPSW approach has the advantage of not having to estimate the
censoring distribution, but instead the survival time distribution which is typically of greater
scientific interest.
Mao-Lin model
Mao and Lin (2016) defined a composite end-point combining information on N(t) and
survival. They considered multi-type recurrent events processes Nh (t), h = 1, . . . , k, Nh (t)
counting events of type h together with, say N0 (t), the counting process for the terminal
event and assumed that each event type and death can be equipped with a severity weight
(or utility) ch , h = 0, 1, . . . , k. They defined the weighted process
\[
\bar N(t) = \sum_{h=0}^{k} c_h N_h(t)
\]
for the regression parameter β = (β_0, β_1, ..., β_p)^T. This parameter vector includes an intercept β_0 depending on the chosen value of t_0. The model has link function g, i.e., g(E(I(V_i(t_0) = h) | Z_i)) = β^T Z_i. In (5.32), A(β, Z_i) is usually the (p+1)-vector of partial derivatives
\[
A(\beta, Z_i) = \frac{\partial}{\partial \beta_j} g^{-1}(\beta^T Z_i), \qquad j = 0, 1, \dots, p,
\]
of the mean function (see Section 5.5.1) and Ĝ is the Kaplan-Meier estimator if C_i, i = 1, ..., n are assumed i.i.d. with survival distribution G(·).
The asymptotic distribution of βb was derived by Scheike and Zhang (2007) together with
a variance estimator using the sandwich formula. However, bootstrap or an i.i.d. decompo-
sition (to be further discussed in Section 5.7) are also possible when estimating the asymp-
totic variance. Extensions to a model for several time points simultaneously have also been
considered (e.g., Grøn and Gerds, 2014). Blanche et al. (2023) compared, for the compet-
ing risks model, estimates based on (5.32) with those obtained by solving the GEE of the
form (5.18) with I(Vi (t0 ) = h) as the response variable (see also Exercise 5.4).
For the analysis of the cumulative incidence in a competing risks model, the cloglog link
function log(− log(1 − p)) will provide regression coefficients with a similar interpreta-
tion as those in the Fine-Gray model (Section 5.5.3); however, using the direct binomial
approach other link functions may also be studied. This is also possible using pseudo-
observations to be discussed in Chapter 6. As discussed, e.g., by Gerds et al. (2012), a
log-link gives parameters with a relative risk interpretation; however, this comes with the
price that estimates may be unstable for time points close to 0 and that predicted risks may
exceed 1.
\[
\lambda_i(t) = Y_i(t)\, \alpha_0(t) \exp(\beta^T Z_i(t))
\]
for the intensity process λ_i(t) for the counting process N_i(t) = I(X_i ≤ t, D_i = 1) counting occurrences of the event of interest where, as usual, Y_i(t) = I(X_i ≥ t). These are U(β) = 0 where
\[
U(\beta) = \sum_i \int_0^\infty \bigl(Z_i(t) - \bar Z(\beta, t)\bigr)\, dN_i(t),
\]
(Equation (1.21)). This shows that the score evaluated based on data on the interval [0,t]
and evaluated at the true parameter value β 0 is a martingale, and the martingale central
limit theorem may be used to show asymptotic normality of the score. Thereby, asymptotic normality of the solution β̂ follows as in Section 5.5.1 with the simplification that, as explained below (5.17), minus the derivative DU(β̂) of the score estimates the inverse variance of the score, such that the variance of β̂ may be estimated by DU(β̂)^{-1} with
\[
DU(\beta) = \sum_i \int_0^\infty \Bigl(\frac{S_2(\beta,t)}{S_0(\beta,t)} - \frac{S_1(\beta,t)\, S_1(\beta,t)^T}{S_0(\beta,t)^2}\Bigr)\, dN_i(t) \tag{5.34}
\]
and S_2(β,t) = ∑_i Y_i(t) Z_i(t) Z_i(t)^T exp(β^T Z_i(t)). This is the model-based variance estimate.
A robust estimator of the variance of β̂ may also be derived as in Section 5.5.1 following Lin and Wei (1989), where the ‘meat’ of the sandwich in (5.17) is
\[
\hat V = \sum_i \Bigl(\int_0^\infty \bigl(Z_i(t) - \bar Z(\hat\beta, t)\bigr)\, d\hat M_i(t)\Bigr)\Bigl(\int_0^\infty \bigl(Z_i(t) - \bar Z(\hat\beta, t)\bigr)\, d\hat M_i(t)\Bigr)^T. \tag{5.35}
\]
Here, M̂_i is obtained by plugging β̂ and the Breslow estimator for α_0(t)dt into the expression for M_i. The resulting robust variance-covariance matrix is then, as in (5.17), DU(β̂)^{-1} V̂ DU(β̂)^{-1}.
i.e., conditioning is only on Thi > t (and on covariates) and not on other information for
cluster no. i.
The Cox model for the marginal intensity of type h events is
\[
\lambda_{hi}(t) = Y_{hi}(t)\, \alpha_{h0}(t) \exp(\beta_h^T Z_i(t)), \qquad h = 1, \dots, K,
\]
with Y_{hi}(t) = I(X_{hi} ≥ t). Following Lin (1994) we will use the feature of type-specific covariates (see Section 3.8) and re-write the model as
\[
\lambda_{hi}(t) = Y_{hi}(t)\, \alpha_{h0}(t) \exp(\beta^T Z_{hi}(t)), \qquad h = 1, \dots, K, \tag{5.37}
\]
since this formulation, as discussed in Section 3.8, allows models with the same β for several types of events. The GEEs for β are now Ū(β) = 0 where
\[
\bar U(\beta) = \sum_i \sum_{h=1}^{K} \int_0^\infty \bigl(Z_{hi}(t) - \bar Z_h(\beta, t)\bigr)\, dN_{hi}(t).
\]
Lin (1994) also discussed a model with a common baseline hazard across event types, and a more general, stratified model generalizing both of these models was discussed by Spiekerman and Lin (1998). The derivations for these models are similar to those for (5.37) and we will omit the corresponding details.
Lin (1994) discussed asymptotic normality of the solution β̂ and presented the robust estimator of the variance-covariance matrix, which is $D\bar U(\hat\beta)^{-1}\,\hat{\bar V}\,D\bar U(\hat\beta)^{-1}$. Here, $D\bar U$ and $\hat{\bar V}$ are sums over types h of the corresponding type-specific quantities given by (5.34) and (5.35). Lin's asymptotic results hold true no matter the correlation among clusters i. However, the results build on the assumption that the vectors of event times (T_{hi}, h = 1, ..., K) and right-censoring times (C_{hi}, h = 1, ..., K) are conditionally independent given the covariates (Z_{hi}, h = 1, ..., K).
i.e., also conditioning on being alive at t, but then the parameter is no longer a marginal
hazard. This solution was also discussed by Li and Lagakos (1997) and exemplified in Table
4.10.
5.7 Goodness-of-fit
In previous chapters and sections, a number of different models for multi-state survival
data have been discussed, including models for intensities (rates) and direct models for
marginal parameters (such as risks). All models impose a number of assumptions, such as
proportional hazards, additivity of covariates (no interaction) and linearity of quantitative
covariates in a linear predictor. We have in connection with examples shown how these
assumptions may be checked, often by explicitly introducing parameters describing depar-
tures from these assumptions. Thus, interaction terms or quadratic terms have been added
to a linear predictor (e.g., Section 2.2.1) as well as time-dependent covariates expressing
interactions in a Cox model between covariates and time (e.g., Section 3.7.7).
In this section, some general techniques for assessment of goodness-of-fit will be reviewed,
building on an idea put forward for the Cox model by Lin et al. (1993). The techniques
are based on cumulative residuals and provide both a graphical model assessment and a
numerical goodness-of-fit test – both using re-sampling from an approximate large-sample
distribution and, as we shall see, they are applicable for both hazard models and marginal
models. Section 5.7.1 presents the mathematical idea for the general method, with special
attention to GEE and the Cox model. Examples and discussion of how graphs and tests are
interpreted are given in Section 5.8.4.
\[
U(\beta) = \sum_i A(\beta, Z_i)\, e_i
\]
for a suitably defined set of residuals ei , i = 1, . . . , n. This is the case for the general GEE
discussed in Section 5.5.1, but also the score equations based on Cox partial likelihood may be re-written into this form (Section 5.6.1). Solving the equations and, thereby, obtaining parameter estimates β̂, a set of observed residuals ê_i is obtained which may be used for
model checking by looking at processes of the form
\[
W(z) = W_j(z) = \sum_i h(Z_i)\, I(Z_{ij} \le z)\, \hat e_i \tag{5.38}
\]
(Lin et al., 2002). Here, the second sum is Taylor expanded around the true parameter value β_0,
\[
-\sum_i h(Z_i)\, I(Z_{ij} \le z)\, \frac{\partial}{\partial \beta} g^{-1}(\beta^*, Z_i)(\hat\beta - \beta_0),
\]
and by another Taylor expansion, see Section 5.5.1,
\[
\hat\beta - \beta_0 \approx (-DU(\hat\beta))^{-1} U(\beta_0) = (-DU(\hat\beta))^{-1} \sum_i A(\beta_0, Z_i)\, e_i.
\]
Collecting terms, the goodness-of-fit process W(z) is seen to have the same asymptotic distribution as the following sum $\sum_i f_z(V_i(\cdot))\, e_i$ of i.i.d. terms, where
\[
f_z(V_i(\cdot)) = h(Z_i)\, I(Z_{ij} \le z) \Bigl(1 - \frac{\partial}{\partial \beta} g^{-1}(\beta_0, Z_i)\, (-DU(\beta_0))^{-1} \sum_k A(\beta_0, Z_k)\Bigr).
\]
The asymptotic distribution of W(z) can now be approximated by generating i.i.d. standard normal variables (U_1, ..., U_n) and calculating $\sum_i \hat f_z(V_i(\cdot))\, \hat e_i U_i$ (sometimes referred to as the conditional multiplier theorem or the wild bootstrap, e.g., Martinussen and Scheike, 2006, ch. 2; Bluhmki et al., 2018) where, in $\hat f_z(\cdot)$, β_0 is replaced by β̂. This i.i.d. decomposition
gives rise to a number of plots of cumulative residuals and also to tests obtained by compar-
ing the observed goodness-of-fit process to those obtained by repeated generation of i.i.d.
standard normal variables. We will illustrate this in Section 5.8.4.
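The resampling step can be sketched as follows, assuming the terms f̂_z(V_i(·))ê_i have already been evaluated on a grid of z-values. This is an illustrative Python sketch (not the book's R/SAS code); the function name and the `terms` input format are our own:

```python
import numpy as np

def wild_bootstrap_sup_test(terms, n_rep=1000, seed=0):
    """Multiplier ('wild') bootstrap for a cumulative-residual process.

    terms: (n, m) array whose row i holds fhat_z(V_i(.)) * ehat_i evaluated
    on a grid of m z-values, so the observed process is W(z) = terms.sum(0).
    Returns the observed sup-statistic and its resampling p-value.
    """
    rng = np.random.default_rng(seed)
    W_obs = terms.sum(axis=0)                         # observed process on the z-grid
    sup_obs = float(np.abs(W_obs).max())
    U = rng.standard_normal((n_rep, terms.shape[0]))  # i.i.d. N(0,1) multipliers
    sups = np.abs(U @ terms).max(axis=1)              # sup_z |sum_i U_i f_z(V_i) e_i|
    pval = float((sups >= sup_obs).mean())
    return sup_obs, pval
```

The p-value is simply the fraction of resampled sup-statistics exceeding the observed one, matching the comparison of curves described above.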
with

M̂_i(t) = N_i(t) − ∫_0^t Y_i(u) exp(β̂^T Z_i) ∑_k dN_k(u) / S_0(β̂, u).
Taylor expanding M̂_i(t) as a function of β around β0 and using

β̂ − β0 ≈ (−DU(β0))^{-1} U(β0) = (−DU(β0))^{-1} ∑_i ∫_0^∞ (Z_i − Z̄(β0, t)) dM_i(t),

an approximation of W(t, z) is obtained.
In this expression, the Doob-Meyer decomposition (1.21) of N_i is used to get that W(t, z)
has the same asymptotic distribution as

∑_i ∫_0^t (h(Z_i) I(Z_ij ≤ z) − g_j(β0, u, z)) dM_i(u)
− ∑_i ∫_0^t Y_i(u) exp(β^T Z_i) I(Z_ij ≤ z) h(Z_i) (Z_i − Z̄(β0, u))^T α0(u) du
× (−DU(β0))^{-1} ∑_k ∫_0^∞ (Z_k − Z̄(β0, u)) dM_k(u),
where

g_j(β, u, z) = ∑_i Y_i(u) exp(β^T Z_i) h(Z_i) I(Z_ij ≤ z) / S_0(β, u).
This asymptotic distribution is approximated by replacing β0 by β̂, α0(u)du by the Breslow
estimator, and dM_i(t) by dN_i(t) U_i with U_1, . . . , U_n i.i.d. standard normal variables.
Lin et al. (1993) suggested using W_j(t, z) with h(·) = 1 and t = ∞ to check the functional
form for a quantitative Z_ij, i.e., to plot the cumulative martingale residuals

∑_i I(Z_ij ≤ z) M̂_i(∞)

against z, together with a large number of paths generated from the approximate asymptotic
distribution.
To examine proportional hazards, Lin et al. (1993) proposed to let h(Z) = Z_ij and z = ∞,
i.e., cumulative Schoenfeld or ‘score’ residuals

∑_i ∫_0^t (Z_ij − Z̄_j(β̂, u)) dN_i(u)

are plotted against t, where the jth Schoenfeld residual for subject i failing at time X_i is
its contribution Z_ij − Z̄_j(β̂, X_i) to the Cox score. The observed path for the goodness-of-fit
process is plotted together with a large number of paths generated from the approximate
asymptotic distribution.
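A bare-bones computation of this cumulative Schoenfeld-residual process might look as follows. This is an illustrative Python sketch (the book's own code is in R and SAS); ties between event times are handled naively:

```python
import numpy as np

def cumulative_schoenfeld(time, status, Z, beta, j=0):
    """Cumulative Schoenfeld residual process for covariate j in a Cox model.

    time, status: observation times and event indicators (1 = failure);
    Z: (n, p) covariate matrix; beta: fitted coefficient vector.
    Returns the event times and the running sum of Z_ij - Zbar_j(beta, X_i).
    """
    time = np.asarray(time, float)
    status = np.asarray(status, int)
    Z = np.asarray(Z, float)
    order = np.argsort(time)
    time, status, Z = time[order], status[order], Z[order]
    risk = np.exp(Z @ beta)                    # exp(beta^T Z_i)
    ev = np.flatnonzero(status == 1)
    resid = []
    for i in ev:
        at_risk = time >= time[i]              # risk set at this failure time
        zbar = (risk[at_risk] * Z[at_risk, j]).sum() / risk[at_risk].sum()
        resid.append(Z[i, j] - zbar)           # jth Schoenfeld residual
    return time[ev], np.cumsum(resid)
```

Plotting the returned cumulative sum against the event times gives the observed path described in the text.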
5.8 Examples
In this section, we will exemplify some of the new methods that have been introduced in
the current chapter.
[Figure: probability (0.00–0.08) plotted against time since randomization, 1–4 years; curves: LM Pepe, LM Titman, LM AaJ, AaJ]
Figure 5.8 PROVA trial in liver cirrhosis: Estimates for the transition probability P01(s,t), t > s, for s = 1 year (LM: Landmark, AaJ: Aalen-Johansen).
estimators for t < 1.8 years while, for larger values of t, it approaches the Aalen-Johansen
estimator.
We next compare with plug-in estimators using the expression

P01(s,t) = ∫_s^t P00(s, u) α01(u) P11(u,t | u) du,

and basing the probability P11(u,t | u) = exp(−∫_u^t α12(x, x − u) dx) of staying in state 1
until time t given entry into that state at the earlier time u on different models for the 1 → 2
transition intensity.
transition intensity. We consider the following models
α12 (t, d) = α12,0 (d), (5.39)
α12 (t, d) = α12,0 (t) exp(LP(d)). (5.40)
Equation (5.39) is the special semi-Markov model where the intensity only depends on d.
In (5.40), the baseline 1 → 2 intensity depends on t and functions of d are used as time-
dependent covariates (as in Table 3.8). In (5.40), the linear predictor is chosen either as a
piece-wise constant function of d or as LP(d) = β · d. Figure 5.9 shows the resulting estimates
P̂01(s,t) for s = 1 year together with the Aalen-Johansen and landmark Aalen-Johansen
estimates. It is seen that the
estimate based on a semi-Markov model with d as the baseline time-variable is close to the
landmark Aalen-Johansen estimate. On the other hand, the models with t as baseline time-
variable and duration-dependent covariates differ according to the way in which the effect
of duration is modeled: With a piece-wise constant effect it is closer to the Markov-based
Aalen-Johansen estimator, and with a linear effect it is closer to the landmark estimate.
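The plug-in construction amounts to numerical integration of P00(s,u) α01(u) P11(u,t | u) over u. A rough Python sketch (the book's analyses use R/SAS), where a01, a02, and a12 are placeholders for fitted intensity functions rather than estimators from the PROVA data:

```python
import numpy as np

def _trap(y, x):
    # simple trapezoidal rule (kept explicit to avoid NumPy-version differences)
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))

def plugin_P01(s, t, a01, a02, a12, grid=400):
    """Plug-in estimate of P01(s,t) in an illness-death model with possible
    duration dependence: a01, a02 are intensities out of state 0 as functions
    of t; a12(t, d) is the 1->2 intensity in time t and duration d."""
    u = np.linspace(s, t, grid)
    # P00(s, u) = exp(-int_s^u (a01 + a02)) via a cumulative trapezoid
    haz0 = a01(u) + a02(u)
    inc = (haz0[1:] + haz0[:-1]) / 2.0 * np.diff(u)
    P00 = np.exp(-np.concatenate([[0.0], np.cumsum(inc)]))
    # P11(u, t | u) = exp(-int_u^t a12(x, x - u) dx), one inner grid per u
    P11 = np.empty_like(u)
    for k, uk in enumerate(u):
        x = np.linspace(uk, t, grid)
        P11[k] = np.exp(-_trap(a12(x, x - uk), x))
    return _trap(P00 * a01(u) * P11, u)
```

With constant intensities this reproduces the closed-form Markov expression, which provides a simple check of the integration.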
[Figure: probability (0.00–0.06) plotted against time since randomization, 1–4 years; curves: FPD, LD, LM AaJ, PWCD, Semi, AaJ]
Figure 5.9 PROVA trial in liver cirrhosis: Estimates for the transition probability P01(s,t), t > s, for s = 1 year (FPD: Fractional polynomial duration effect in (5.40), LD: Linear duration effect in (5.40), LM: Landmark, AaJ: Aalen-Johansen, PWCD: Piece-wise constant duration effect in (5.40), Semi: Semi-Markov (5.39)).
To study whether a more detailed model for the duration effect LP(d) in (5.40) would
provide a better fit to the data, a model with a duration effect modeled using a fractional
polynomial
LP(d) = β1 d + β2 d² + β3 d³ + β4 log(d)
(e.g., Andersen and Skovgaard, 2010, ch. 4) was also studied, see Figure 5.9. It is seen that
the latter estimate is close to that using a linear duration effect.
To assess the variability of the estimators, Andersen et al. (2022) also conducted a bootstrap
experiment by sampling B = 1, 000 times with replacement from the PROVA data set and
repeating the analyses on each bootstrap sample. It was found that the Aalen-Johansen
estimator has a relatively large SD; however, since this estimator tends to be upwards biased
as seen in Figures 5.8-5.9, a more fair comparison between the estimated variabilities is
obtained by studying the relative SD, i.e., the coefficient of variation SD(P̂)/P̂. This showed
that the estimators based on sub-sampling (landmark Aalen-Johansen, Pepe, Titman) have
relatively large relative SD-values. On the other hand, the Aalen-Johansen estimator and
the plug-in estimators (5.39) and (5.40) (with a linear duration effect as covariate) have
smaller relative SD.
In conclusion, the estimators based on sub-sampling are truly non-parametric and hence
reliable; however, being based on fewer subjects, they are likely to be more variable. On
the other hand, the plug-in estimators are based on the full sample and hence less variable
though it may be a challenge to correctly model the effect of ‘the other time-variable’ using
time-dependent covariates.
Table 5.5 Bone marrow transplantation in acute leukemia: Estimated coefficients (and SD) for du-
ration effects in Cox models for the transition intensities α12 (·), α13 (·) and α23 (·) (GvHD: Graft
versus host disease).
Transition   Duration in              β̂       SD
1 → 2        GvHD state (t − T1)      0.074    0.046
1 → 3        GvHD state (t − T1)      0.050    0.021
2 → 3        Relapse state (t − T2)   −0.066   0.016
[Figure: probability (0.00–0.03) plotted against time since bone marrow transplantation, 0–60 months; curves: LinGvHD, LM Pepe, Semi, LinRel, LM AaJ, AaJ]
Figure 5.10 Bone marrow transplantation in acute leukemia: Estimates for the transition probability P02(s,t), t > s, for s = 3 months (LinGvHD: Linear duration effect in state 1, LM: Landmark, Semi: Semi-Markov, LinRel: Linear duration effect in state 2, AaJ: Aalen-Johansen).
[Figure: probability (0.00–0.03) plotted against time since bone marrow transplantation, 0–60 months; curves: LinGvHD, LM Pepe, Semi, LinRel, LM AaJ, AaJ]
Figure 5.11 Bone marrow transplantation in acute leukemia: Estimates for the transition probability P02(s,t), t > s, for s = 9 months (LinGvHD: Linear duration effect in state 1, LM: Landmark, Semi: Semi-Markov, LinRel: Linear duration effect in state 2, AaJ: Aalen-Johansen).
Table 5.6 shows estimates from direct binomial (logistic) models for the cumulative incidence of death without a liver transplantation (F2(·)), either at t0 = 2 years or simultaneously at
(t1 ,t2 ,t3 ) = (1, 2, 3) years. It is seen that, unadjusted, the odds of dying without transplan-
tation before 2 years is 1.098 = exp(0.093) times higher for a CyA-treated person com-
pared to placebo with 95% confidence interval from 0.531 to 2.27, and after adjustment
for albumin and log2 (bilirubin) the corresponding odds ratio is 0.630 = exp(−0.463) (95%
confidence interval (0.266, 1.488)). Moving from analyzing F2 (t0 | Z) to jointly analyzing
F2 (t j | Z), j = 1, 2, 3 and assuming time-constant effects, the estimated SD become smaller.
Table 5.6 PBC3 trial in liver cirrhosis: Estimated coefficients (and SD) from direct binomial (lo-
gistic) models for the cumulative incidence of death without transplantation at t0 = 2 years or
simultaneously at (t1 ,t2 ,t3 ) = (1, 2, 3) years.
(a) t0 = 2 years

Covariate                        β̂        SD       β̂        SD
Treatment CyA vs. placebo        0.093     0.371    −0.463    0.439
Albumin per 1 g/L                                   −0.147    0.037
log2(bilirubin) per doubling                         0.639    0.151

(b) (t1, t2, t3) = (1, 2, 3) years

Covariate                        β̂        SD       β̂        SD
Treatment CyA vs. placebo        −0.030    0.323    0.520     0.373
Albumin per 1 g/L                                   −0.125    0.035
log2(bilirubin) per doubling                         0.579    0.128
The martingale residual is the difference between the failure indicator Di (= 1 for a failure and = 0 for a censored observation) for
subject i and the estimated cumulative hazard evaluated at the time, Xi, of failure/censoring.
The latter has an interpretation as an ‘expected value’ of Di at time Xi . If Z j is a quantitative
covariate, then, according to Lin et al. (1993), a plot of the cumulative sum of martingale
residuals for subjects with Zi j ≤ z against z is sensitive to non-linearity of the effect of the
covariate on the linear predictor. If linearity provides a good description of the effect, then
the resulting curve should vary non-systematically around 0. A formal test for linearity may
be obtained by comparing the curve with a large number of random realizations of how the
curve should look like under linearity, e.g., focusing on the maximum value of the observed
curve compared to the maxima of the random realizations.
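A crude version of this plot-plus-test can be coded directly from the martingale residuals. The following Python sketch uses a simplified multiplier scheme that ignores the estimation effect from β̂, so it only illustrates the mechanics, not the full Lin et al. (1993) procedure; all names are ours:

```python
import numpy as np

def cum_mg_process(Z_j, mg_resid):
    """Observed cumulative martingale-residual curve against covariate values.

    Z_j: quantitative covariate; mg_resid: martingale residuals D_i - Ahat_i(X_i)
    from a fitted hazard model (computed elsewhere).  Returns the sorted
    covariate grid and W(z) = sum over subjects with Z_ij <= z of Mhat_i.
    """
    order = np.argsort(Z_j)
    return np.asarray(Z_j)[order], np.cumsum(np.asarray(mg_resid)[order])

def sup_test_mg(Z_j, mg_resid, n_rep=2000, seed=0):
    """Simplified multiplier sup-test: replace Mhat_i by Mhat_i * U_i with
    U_i i.i.d. N(0,1), and compare the maxima of the resampled curves with
    the observed maximum."""
    rng = np.random.default_rng(seed)
    _, W = cum_mg_process(Z_j, mg_resid)
    sup_obs = float(np.abs(W).max())
    r = np.asarray(mg_resid)[np.argsort(Z_j)]
    sims = np.cumsum(r * rng.standard_normal((n_rep, len(r))), axis=1)
    pval = float((np.abs(sims).max(axis=1) >= sup_obs).mean())
    return sup_obs, pval
```

A curve wandering systematically away from 0, with a small p-value, would indicate non-linearity, as in the bilirubin example below.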
In Section 2.2, a model for the rate of failure of medical treatment including the covariates
treatment, albumin, and bilirubin was fitted to the PBC3 data, see Table 2.4. To assess
linearity of the two quantitative covariates albumin and bilirubin, Figures 5.12 and 5.13
show cumulative martingale residuals plotted against the covariate. While, for albumin,
linearity is not contra-indicated, the plot for bilirubin shows clear departures from ‘random
variation around 0’. This is supported by P-values from a formal significance test (0.459
for albumin and extremely small for bilirubin). The curve for bilirubin gets negative for
small values of the covariate indicating that the ‘expected’ number of failures is too large
[Figure: cumulative martingale residuals plotted against albumin (20–50)]
Figure 5.12 PBC3 trial in liver cirrhosis: Checking linearity using cumulative martingale residuals plotted against albumin.
[Figure: cumulative martingale residuals plotted against bilirubin]
Figure 5.13 PBC3 trial in liver cirrhosis: Checking linearity using cumulative martingale residuals plotted against bilirubin.
for low values of bilirubin compared to the observed (the latter is often equal to 0). This
suggests that relatively more weight should be given to low values of bilirubin in the linear
predictor and relatively less weight to high values – something that may be achieved by a
transformation of the covariate with a concave (‘downward bending’) function, such as the
logarithm. Figure 5.14 shows the plot after transformation and now the curve is more in
accordance with what would be expected under linearity. This is supported by the P-value
of 0.481. The estimates in this model were given in Table 2.7. The plot for albumin in this
model (not shown) is not much different from what was seen in Figure 5.12.
The Schoenfeld residual is the difference between the observed covariate Zij for a subject failing at time Xi and an expected average
value for covariate j, Z̄j(Xi), among subjects at risk at that time.
al. (1993), a plot of the cumulative sum of Schoenfeld residuals for subjects with Xi ≤ t
against time t is sensitive to departures from proportional hazards. If the proportional haz-
ards assumption fits the data well, then the resulting curve should vary non-systematically
around 0, and a formal goodness-of-fit test may be obtained along the same lines as for the
plot of cumulative martingale residuals.
Figures 5.15-5.17 show plots of cumulative Schoenfeld residuals (standardized by division
by SD(β̂j)) against time for the three covariates in the model: Treatment, albumin, and
[Figure: cumulative martingale residuals plotted against log2(bilirubin)]
Figure 5.14 PBC3 trial in liver cirrhosis: Checking linearity using cumulative martingale residuals plotted against log2(bilirubin).
[Figure: standardized score process plotted against time since randomization, 0–5 years]
Figure 5.15 PBC3 trial in liver cirrhosis: Checking proportional hazards using cumulative Schoenfeld residuals for treatment (standardized) plotted against the time-variable.
[Figure: standardized score process plotted against time since randomization, 0–5 years]
Figure 5.16 PBC3 trial in liver cirrhosis: Checking proportional hazards using cumulative Schoenfeld residuals for albumin (standardized) plotted against the time-variable.
[Figure: standardized score process plotted against time since randomization, 0–5 years]
Figure 5.17 PBC3 trial in liver cirrhosis: Checking proportional hazards using cumulative Schoenfeld residuals for log2(bilirubin) (standardized) plotted against the time-variable.
log2(bilirubin). It is seen that for none of the covariates is the proportional hazards assumption contra-indicated. This is supported both by the curves and the associated P-values
(0.919, 0.418, and 0.568, respectively).
The goodness-of-fit examinations show that linearity for albumin seems to describe the data
well, while, for bilirubin, a log-transformation is needed to obtain linearity. Furthermore,
for all three variables in the model, proportional hazards is a reasonable assumption. These
conclusions are well in line with what was seen in Sections 2.2.2 and 3.7.7. An advantage of
the general approach using cumulative residuals is that one need not specify an alternative
against which linearity or proportional hazards is tested.
5.9 Exercises
Exercise 5.1 Consider a two-state reversible model with states ‘0: At risk’ and ‘1: Not at risk’, with transition intensity α01(t) from state 0 to state 1 and transition intensity α10(t) from state 1 back to state 0.
Set up the A (t) and P (s,t) matrices and express, using the Kolmogorov forward differential
equations, the transition probabilities in terms of the transition intensities (Section 5.1).
Exercise 5.2 (*) Consider the four-state model for the bone marrow transplantation study,
Figure 1.6. Set up the A (t) and P (s,t) matrices and express, using the Kolmogorov for-
ward differential equations, the transition probabilities in terms of the transition intensities
(Section 5.1).
α̃h(t) / αh(t) = S(t) / (1 − Fh(t)).
with Di(t0) the indicator I(Ti ∧ t0 ≤ Ci) of observing the state occupied at t0 and Ŵi(t0) =
1/Ĝ((t0 ∧ Xi)−) the estimated inverse probability of no censoring (strictly) before the minimum of t0 and the observation time Xi for subject i. The alternative estimating equation
(5.32) is

∑_i A(β^T Z_i) (N_hi(t0) D_i(t0) Ŵ_i(t0) − Q_h(t0 | Z_i)).
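The censoring weights Ŵi(t0) = 1/Ĝ((t0 ∧ Xi)−) can be computed from a Kaplan-Meier estimator of the censoring distribution, reversing the roles of failures and censorings. A small illustrative Python sketch (function name ours, not from the book):

```python
import numpy as np

def ipcw_weights(X, D, t0):
    """Estimated inverse probability of censoring weights What_i(t0).

    X: observation times; D: event indicators (1 = failure observed).
    Ghat is the Kaplan-Meier estimator treating censorings (D == 0) as
    the 'events', evaluated just before min(t0, X_i)."""
    X = np.asarray(X, float)
    D = np.asarray(D, int)
    cens_times = np.sort(np.unique(X[D == 0]))
    G, G_steps = 1.0, []                       # (time, Ghat(time)) after each jump
    for s in cens_times:
        n_at_risk = (X >= s).sum()
        d_cens = ((X == s) & (D == 0)).sum()
        G *= 1.0 - d_cens / n_at_risk
        G_steps.append((s, G))

    def G_minus(u):                            # Ghat(u-): value just before u
        val = 1.0
        for s, g in G_steps:
            if s < u:
                val = g
            else:
                break
        return val

    return np.array([1.0 / G_minus(min(t0, xi)) for xi in X])
```

With no censored observations all weights equal 1; otherwise subjects observed past censoring times are up-weighted to represent those lost.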
Exercise 5.5 (*) Derive the estimating equations for the landmark model (5.11).
Exercise 5.6 Consider an illness-death model for the Copenhagen Holter study with states
‘0: Alive without AF or stroke’, ‘1: Alive with AF and no stroke’, ‘2: Dead or stroke’, see
Figures 1.3 and 1.7. Examine, using a time-dependent covariate, whether this process may
be modeled as being Markovian.
Exercise 5.7 Consider the four-state model for the Copenhagen Holter study, see Figure
1.7.
1. Fit separate landmark models at times 3, 6, and 9 years for the mortality rate, including
AF, stroke, ESVEA, sex, age, and systolic blood pressure.
2. Fit landmark ‘super models’ where the coefficients vary smoothly among landmarks but
with separate baseline hazards at each landmark.
3. Fit a landmark ‘super model’ where both the coefficients and the baseline hazards vary
smoothly among landmarks.
Exercise 5.8 Consider a competing risks model for the Copenhagen Holter study with
states ‘0: Alive without AF or stroke’, ‘1: Alive with AF and no stroke’, ‘2: Dead or stroke’,
see Figures 1.2 and 1.7.
Fit, using direct binomial regression, a model for being in state 1 at time 3 years including
the covariates ESVEA, sex, age, and systolic blood pressure.
Exercise 5.9 Consider the Cox model for stroke-free survival in the Copenhagen Holter
study including the covariates ESVEA, sex, age, and systolic blood pressure (Exercises 2.4
and 3.7).
1. Investigate, using cumulative Schoenfeld residuals, whether the effects of the covariates
may be described as time-constant hazard ratios.
2. Investigate, using cumulative martingale residuals, whether the effects of age and sys-
tolic blood pressure can be considered linear on the log(hazard) scale.
Exercise 5.10 Consider the data on recurrent episodes in affective disorder, Example 1.1.5.
Fit a Mao-Lin regression model (5.31) for the mean of the composite end-point recurrent
episode or death, including initial diagnosis as the only covariate and using severity weights
equal to 1.
Chapter 6
Pseudo-values
In Sections 4.2 and 5.5, we discussed how direct regression models for marginal parameters
in a multi-state model could be set up and fitted using generalized estimating equations
(GEEs). It turned out that this could be done on a case-by-case basis and that, furthermore, it
was typically necessary to explicitly address the censoring distribution. This is because the
uncensored observations had to be re-weighted to also represent those who were censored
and this required estimation of the probability of being uncensored at the times of observed
failures. One might ask whether it would be possible to apply a more general technique
when fitting marginal models for multi-state processes. The answer to this question is ‘yes’,
under the proviso that one is content with a model for a single or a finite number of time
points. A way to do this is to apply pseudo-values (or pseudo-observations – we will use
these notions interchangeably in what follows).
The idea is as follows: With complete data, i.e., in the absence of censoring, a regression
model could be set up and fitted by standard GEE, using the relevant aspect of the
complete data as response variable, as explained in Section 5.5.1. To model the survival
probability S(t0 | Z) in the point t0 , the survival indicator I(Ti > t0 ) would be observed for
all subjects i = 1, . . . , n and could thus be used as outcome variable in the GEE. With in-
complete data this is not possible, and in this case the pseudo-values are calculated based
on the available data and they replace the incompletely observed response variables (e.g.,
Andersen et al., 2003; Andersen and Pohar Perme, 2010). This is doable because they,
under suitable assumptions on the censoring distribution, have the correct expected value
for given covariates (Graw et al., 2009; Jacobsen and Martinussen, 2016; Overgaard et al.,
2017). The pseudo-values typically build on a non-parametric estimator for the marginal
parameter, such as the Kaplan-Meier estimator for the survival function in the two-state
model (Sections 4.1.1 or 5.1.1) or the Aalen-Johansen estimator for the competing risks
cumulative incidence (Sections 4.1.2 or 5.1.2). Thereby, censoring is dealt with once and
for all leaving us with a set of n observations which are approximately independent and
identically distributed (i.i.d.). Note that, while right-censoring may be handled in this way,
data with left-truncation are typically harder to deal with (Parner et al., 2023).
In Section 6.1, the basic idea is presented in an intuitive way with several examples and in
Section 6.2, more mathematical details are provided. Section 6.3 presents a fast approxi-
mation to calculation of pseudo-values, and Section 6.4 gives a brief account of how to use
cumulative residuals when assessing goodness-of-fit of models fitted to pseudo-values.
6.1 Intuition
6.1.1 Introduction
The set-up is as follows: V (t) is a multi-state process, and interest focuses on a marginal
parameter which is the expected value, E( f (V )) = θ , say, of some function f of the process.
Examples include the following:
• V (t) is the two-state process for survival data, Figure 1.1, and θ is the state 0 occupation
probability Q0 (t0 ) at a fixed time point t0 , i.e., the survival probability S(t0 ) = P(T > t0 )
at that time.
• V (t) is the competing risks process, Figure 1.2, and θ is the state h, h > 0 occupation
probability Qh (t0 ) at a fixed time point t0 , i.e., the cause-h cumulative incidence Fh (t0 ) =
P(T ≤ t0 , D = h) at that time.
• V (t) is the two-state process for survival data, Figure 1.1, and θ is the expected time
ε0 (τ) spent in state 0 before a fixed time point τ, i.e., the τ-restricted mean survival
time.
• V (t) is the competing risks process, Figure 1.2, and θ is the expected time εh (τ) spent in
state h > 0 up to a fixed time point τ, i.e., the cause-h specific time lost due to that cause
before time τ.
• V (t) is a recurrent events process, Figures 1.4-1.5, and θ is the expected number, µ(t0 ) =
E(N(t0 )) of events at a fixed time point t0 .
In this section, we present the idea for the first of these examples. The other examples follow
the same lines and more discussion is provided in Sections 6.1.2-6.1.7. We are interested in
a regression model for the survival function at time t0 , S(t0 | Z), that is, the expected value
of the survival indicator f(T) = I(T > t0) given covariates Z. One typical model for this
could be what corresponds to a Cox model, i.e.,

log(− log S(t0 | Z)) = β0 + LP,

where the intercept is β0 = log(A0(t0)), the log(cumulative baseline hazard) at time t0, and
LP is the linear predictor LP = β1 Z1 + · · · + β p Z p . Another would correspond to an additive
hazard model
− log(S(t0 | Z))/t0 = β0 /t0 + LP
with β0 = A0 (t0 ), the cumulative baseline hazard at t0 . In general, some function g, the
link function, of the marginal parameter θ is the linear predictor. Note that such a model is
required to hold only at time t0 and not at all time points.
We first consider the unrealistic situation without censoring, i.e., survival times T1 , . . . , Tn
are observed and so are the t0 -survival indicators f (Ti ) = I(Ti > t0 ), i = 1, . . . , n. This situa-
tion serves as motivation for the way in which pseudo-observations are defined and, in this
situation, two facts can be noted:
1. The marginal mean E(f(T)) = E(I(T > t0)) = S(t0) can be estimated as a simple average

   Ŝ(t0) = (1/n) ∑_i I(Ti > t0).
2. A regression model for θ = S(t0 | Z) with link function g can be analyzed using GEE with
f (T1 ) = I(T1 > t0 ), . . . , f (Tn ) = I(Tn > t0 ) as responses. This is a standard generalized
linear model for a binary outcome with link function g.
Let Ŝ−i be the estimator for S without observation i, i.e.,

   Ŝ−i(t0) = (1/(n − 1)) ∑_{j≠i} I(Tj > t0).

Then

   n · Ŝ(t0) = f(T1) + · · · + f(Ti−1) + f(Ti) + f(Ti+1) + · · · + f(Tn),
   (n − 1) · Ŝ−i(t0) = f(T1) + · · · + f(Ti−1) + f(Ti+1) + · · · + f(Tn),

i.e.,

   n · Ŝ(t0) − (n − 1) · Ŝ−i(t0) = f(Ti).
Thus, the ith observation can be re-constructed by combining the marginal estimator based
on all observations and that obtained without observation no. i.
We now turn to the realistic scenario where some survival times are incompletely observed
because of right-censoring, i.e., the available data are (Xi , Di ), i = 1, . . . , n where Xi is the
ith time of observation, the smaller of the true survival time Ti and the censoring time
Ci , and Di is 1 if Ti is observed and 0 if the ith observation is censored. In this case it is
still possible to estimate the marginal survival function, namely using the Kaplan-Meier
estimator, Ŝ, given by Equation (4.3). Based on this, we can calculate the quantity

   θi = n · Ŝ(t0) − (n − 1) · Ŝ−i(t0),    (6.1)
where Ŝ−i(t0) is the estimator (now Kaplan-Meier) applied to the sample of size n − 1
obtained by eliminating observation no. i from the full sample. The θi, i = 1, . . . , n given by
(6.1) are the pseudo-observations for the incompletely observed survival indicators f(Ti) =
I(Ti > t0), i = 1, . . . , n.
Note that pseudo-values are computed for all subjects – whether the survival time was
observed or only a censored observation was available.
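The leave-one-out construction in (6.1) can be sketched directly. A minimal Python illustration (not the book's R/SAS code) with a bare-bones Kaplan-Meier estimator:

```python
import numpy as np

def km(X, D, t0):
    """Kaplan-Meier estimate of S(t0) from times X and event indicators D."""
    X = np.asarray(X, float)
    D = np.asarray(D, int)
    S = 1.0
    for s in np.sort(np.unique(X[D == 1])):    # distinct failure times
        if s > t0:
            break
        S *= 1.0 - ((X == s) & (D == 1)).sum() / (X >= s).sum()
    return S

def pseudo_values(X, D, t0):
    """Jackknife pseudo-observations (6.1) for the indicators I(T_i > t0),
    computed for all subjects, censored or not."""
    X = np.asarray(X, float)
    D = np.asarray(D, int)
    n = len(X)
    S_full = km(X, D, t0)
    return np.array([n * S_full - (n - 1) * km(np.delete(X, i), np.delete(D, i), t0)
                     for i in range(n)])
```

With complete data (no censoring) the pseudo-values reduce exactly to the indicators I(Ti > t0), in line with the re-construction argument above.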
The idea is now, first, to transform the data (Xi , Di ), i = 1, . . . , n into θi , i = 1, . . . , n using
(6.1), i.e., to add one more variable to each of the n lines in the data set and, next, to an-
alyze a regression model for θ by using θi , i = 1, . . . , n as responses in a GEE with the
desired link function, g. Here, typically, a normal error distribution is specified since this
will enable the correct estimating equations to be set up for the mean value (in spite of
the fact that the distribution of pseudo-values is typically far from normal). Such a proce-
dure will provide estimators for the parameters in the regression model S(t0 | Z) that have
been shown to be mathematically well-behaved if the distribution of the censoring times Ci
does not depend on the covariates, Z (Graw et al., 2009; Jacobsen and Martinussen, 2016;
Overgaard et al., 2017, 2023). The situation with covariate-dependent censoring is more
complex and will be discussed in Section 6.1.8. The standard sandwich estimator based on
the GEE is most often used, though this may be slightly conservative, i.e., a bit too large,
as will be explained in Section 6.2.
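As a sketch of the subsequent GEE step, the following Python code solves the estimating equation for a cloglog-link model for S(t0 | Z) with identity working covariance (the ‘normal error’ specification). The function name is ours, and a real analysis would use standard GEE software with a sandwich variance estimator:

```python
import numpy as np

def fit_cloglog_pseudo(theta, Xmat, n_iter=25):
    """Solve sum_i (dmu_i/dbeta) (theta_i - mu_i) = 0 for the model
    log(-log S(t0 | Z)) = Z beta, i.e., mu = exp(-exp(Z beta)).

    theta: pseudo-values (may fall outside [0, 1]); Xmat: design matrix
    including an intercept column.  Gauss-Newton iteration sketch only.
    """
    beta = np.zeros(Xmat.shape[1])
    for _ in range(n_iter):
        eta = Xmat @ beta
        mu = np.exp(-np.exp(eta))
        dmu = -np.exp(eta) * mu                    # dmu/deta
        U = Xmat.T @ (dmu * (theta - mu))          # estimating function
        J = -(Xmat * (dmu ** 2)[:, None]).T @ Xmat # approximate dU/dbeta
        beta = beta - np.linalg.solve(J, U)
    return beta
```

Because only the mean structure matters, the pseudo-values can lie outside [0, 1] without invalidating the estimating equation.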
The use of pseudo-values for fitting marginal models for multi-state parameters has a num-
ber of attractive features:
1. It can be used quite generally for marginal multi-state parameters whenever a suitable
estimator θb for the marginal mean θ = E( f (V )) is available.
2. It provides us with a set of new variables θ1 , . . . , θn for which standard models for com-
plete data can be analyzed.
3. It provides us with a set of new variables θ1 , . . . , θn to which various plotting techniques
are applicable.
4. If interest focuses on a single time point t0 , then a specification of a model for other time
points is not needed.
A number of difficulties should also be mentioned:
1. If censoring depends on covariates, then modifications of the method are necessary.
2. It only provides a model at a fixed point in time t0 , or as we shall see just below, at a
number of fixed points in time t1 , . . . ,tm , and these time points need to be specified.
3. The base estimator needs to be re-calculated n + 1 times, and if this computation is
involved and/or n is large, then obtaining the n pseudo-values may be cumbersome.
A multivariate model for S(t1 | Z), . . . , S(tm | Z) at a number, m of time points t1 , . . . ,tm can
be analyzed in a similar way. The response in the resulting GEE is m-dimensional and a
joint model for all time points is considered. The model could be what corresponds to a
Cox model, i.e.,
log(− log S(t j | Z)) = β0 j + LP,
with β0 j = log(A0 (t j )), j = 1, . . . , m, the log(cumulative baseline hazard) at t j .
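In practice, the pseudo-values at t1, . . . , tm are stacked in ‘long’ format with the time point entering as a categorical covariate (one intercept β0j per time point) and the subject id serving as the GEE cluster variable. A hypothetical sketch of that data arrangement:

```python
import numpy as np

def stack_pseudo(theta_by_time, Z):
    """Arrange pseudo-values at several time points for a joint GEE fit.

    theta_by_time: (n, m) pseudo-values at t_1..t_m; Z: (n, p) covariates.
    Returns a long-format design with one dummy column per time point
    (separate intercepts beta_0j) plus the shared covariates, the stacked
    response vector, and a subject id used as the cluster variable.
    """
    n, m = theta_by_time.shape
    rows, resp, ids = [], [], []
    for i in range(n):
        for j in range(m):
            dummies = np.eye(m)[j]             # time point as categorical
            rows.append(np.concatenate([dummies, Z[i]]))
            resp.append(theta_by_time[i, j])
            ids.append(i)
    return np.array(rows), np.array(resp), np.array(ids)
```

In this layout, interactions between covariates and the time-point dummies correspond to non-proportional hazards at the chosen time points, as noted in the example below.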
[Figure: pseudo-value trajectories (0.00–1.00) plotted against time since randomization, 0–5 years]
Figure 6.1 PBC3 trial in liver cirrhosis: Pseudo-values for the survival indicator I(T > t) as a function of follow-up time t for two subjects: A failure at T = 1 year (dashed) and a censoring at C = 1 year (dotted).
For the failure, the pseudo-values drop at T = 1 year (to negative values for t > 1 that increase towards 0), whereas, for the censored observation, the
pseudo-values decrease (without reaching 0). Even though the pseudo-values for I(T > t)
go beyond the interval [0, 1], they have approximately the correct conditional expectation
given covariates, i.e.,
E(I(Ti > t) | Zi ) ≈ E(θi | Zi )
if the censoring distribution is independent of covariates. This is why they can be used as
responses in a GEE for S(t | Z).
We next fix time and show how the pseudo-observations for all subjects in the data
set look at those time points. For illustration, we compute pseudo-values at times
(t1 ,t2 ,t3 ) = (1, 2, 3) years. Figures 6.2a-6.2c show the results which are equivalent to what
was seen in Figure 6.1. For an observed failure time at T ≤ t j the pseudo-value is negative,
for an observed censoring time at C ≤ t j the pseudo-value is between 0 and 1, and for an
observation time X > t j (both failures and censorings) the pseudo-value is slightly above 1.
We next show how pseudo-observations can be used when fitting models for S(t0 | Z) and
look at t0 = 2 years as an example. To assess models including only a single quantitative
covariate like bilirubin or log2 (bilirubin), scatter-plots may be used much like in simple
linear regression models (Andersen and Skovgaard, 2010, ch. 4). Figure 6.3 (left panel)
shows pseudo-values plotted against bilirubin. Note that adding a scatter-plot smoother to
the plot is crucial – much like when plotting binary outcome data. The resulting curve may then be transformed with the cloglog link function,
[Figure: three panels (a)-(c) showing pseudo-values (−1.0 to 1.0) for I(Ti > tl), l = 1, 2, 3, plotted against Xi (years)]
Figure 6.2 PBC3 trial in liver cirrhosis: Pseudo-values for the survival indicator I(Ti > tl = l years), l = 1, 2, 3 for all subjects, i, plotted against the observation time Xi (failures: o, censorings: x).
[Figure: left panel, pseudo-values (−0.4 to 1.2) plotted against bilirubin (0–400) with a smoother; right panel, log(−log(predicted pseudo-values)) against bilirubin]
Figure 6.3 PBC3 trial in liver cirrhosis: Pseudo-values for the survival indicator I(Ti > 2 years) for all subjects, i, plotted against the covariate Zi = bilirubin with a scatter-plot smoother super-imposed (left); in the right panel, the smoother is transformed with the cloglog link function.
[Figure: left panel, pseudo-values (−0.4 to 1.2) plotted against log2(bilirubin) (1–9) with a smoother; right panel, log(−log(predicted pseudo-values)) against log2(bilirubin)]
Figure 6.4 PBC3 trial in liver cirrhosis: Pseudo-values for the survival indicator I(Ti > 2 years) for all subjects, i, plotted against the covariate Zi = log2(bilirubin) with a scatter-plot smoother super-imposed (left); in the right panel, the smoother is transformed with the cloglog link function.
and Figure 6.3 (right panel) shows the smoother after this transformation. It is seen that
linearity does not describe the association well. Plotting, instead, against log2 (bilirubin)
(Figure 6.4) shows that using a linear model in this scale is not contra-indicated.
We can then fit a model for S(t0 | Z1 , Z2 , log2 (Z3 )) with Z1 , the indicator for CyA treatment,
Z2 = albumin, and Z3 = bilirubin using the pseudo-values at t0 = 2 years as the outcome
variable and using the cloglog link. Table 6.1 (left panel) shows the results. Compared
Table 6.1 PBC3 trial in liver cirrhosis: Estimated coefficients (and robust SD) from models for the
survival function with linear effects of albumin and log2 (bilirubin) based on pseudo-values. The
cloglog link function was used, and the SD values are based on the sandwich formula.
with the Cox model results in Table 2.7, it is seen that the estimated coefficients based
on the pseudo-values are similar. The SD values are somewhat larger which should be no
surprise since the Cox models use all data, whereas the pseudo-values concentrate on a
single point in time. A potential advantage of using the pseudo-observations is that if inter-
est does focus on a single time point, then they, in contrast to a Cox model, avoid making
modeling assumptions about the behavior at other time points. In Table 6.1 (right panel),
results are shown for a joint model for pseudo-values at times (t1 ,t2 ,t3 ) = (1, 2, 3) years:
log(− log S(t j | Z)) = β0 j + LP, j = 1, 2, 3. Now, the results are closer to those based on
the Cox model in Table 2.7. In particular, values of SD are smaller than when based on
pseudo-values at a single point in time and simulation studies (e.g., Andersen and Pohar
Perme, 2010) have shown that the SD does tend to get smaller when based on more time
points; however, more than m ∼ 5 time points will typically not add much to the preci-
sion. The model for more time points is fitted by adding the time points at which pseudo-
values are computed as a categorical covariate and the output then also includes estimates
(βb01 , βb02 , βb03 ) of the Cox log(cumulative baseline hazard) at times (t1 ,t2 ,t3 ). Note that, in
such a model, non-proportional hazards (at the chosen time points) corresponds to interac-
tions with this categorical time-variable. For the models based on pseudo-values, the SD
values are obtained using sandwich estimators from the GEE. These have been shown to
be slightly conservative, however, typically not seriously biased, see also Section 6.2.
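The construction just described — compute Ŝ(t_j) on the full sample and on each leave-one-out sample, then form θ_ij = nŜ(t_j) − (n−1)Ŝ^{−i}(t_j) — can be sketched as follows. This is an illustrative Python translation (the book's own code is in R and SAS); the data below are simulated, not the PBC3 data, and distinct observation times are assumed:

```python
import numpy as np

def km_surv(time, event, t):
    """Product-limit Kaplan-Meier estimate of S(t) (distinct times assumed)."""
    s = 1.0
    for u in np.sort(time[event == 1]):
        if u > t:
            break
        s *= 1.0 - 1.0 / np.sum(time >= u)
    return s

def pseudo_values(time, event, grid):
    """Jackknife pseudo-values theta_ij = n*S(t_j) - (n-1)*S_{-i}(t_j)."""
    n = len(time)
    full = np.array([km_surv(time, event, t) for t in grid])
    theta = np.empty((n, len(grid)))
    for i in range(n):
        keep = np.arange(n) != i
        loo = np.array([km_surv(time[keep], event[keep], t) for t in grid])
        theta[i] = n * full - (n - 1) * loo
    return theta

rng = np.random.default_rng(1)
n = 80
T = rng.exponential(3.0, n)                    # latent event times
C = rng.exponential(5.0, n)                    # censoring times
time, event = np.minimum(T, C), (T <= C).astype(int)
theta = pseudo_values(time, event, grid=[1.0, 2.0, 3.0])   # n x 3 outcome matrix
# Sanity check: with no censoring the pseudo-values reduce to I(T_i > t_j)
theta_nc = pseudo_values(T, np.ones(n, dtype=int), grid=[2.0])
assert np.allclose(theta_nc[:, 0], (T > 2.0).astype(float))
```

Stacking `theta` into long format, with the time point as a categorical covariate, gives exactly the data set used for the joint GEE fit described above; the final assertion checks the no-censoring identity with the indicators I(T_i > t_j).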
Another type of plot which is applicable when assessing a regression model based on
pseudo-observations is a residual plot. Figure 6.5 shows residuals from the model in Table
6.1 (right panel), i.e.,

r_ij = θ_ij − exp(−exp(β̂_0j + LP̂_i)),

plotted against log2(bilirubin) for subject i. Here, j = 1, 2, 3 refers to the three time points
(t1 ,t2 ,t3 ). A smoother has been superimposed for each j and it is seen that the residuals
vary roughly randomly around 0 indicating a suitable fit of the model. Note that residual
plots are applicable also for multiple regression models in contrast to the scatter-plots in
Figure 6.3 (left panel). In Section 6.4 we will briefly discuss how formal significance tests
for the goodness-of-fit of regression models based on pseudo-observations may be devised
using cumulative pseudo-residuals.
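A residual plot of this kind only needs the pseudo-values and the fitted cloglog predictor. A minimal sketch, with simulated data and hypothetical coefficients, and a crude binned-mean "smoother" standing in for the scatter-plot smoother:

```python
import numpy as np

def cloglog_fitted(beta0, lp):
    """Predicted survival S(t_j | Z) = exp(-exp(beta0_j + LP)) under the cloglog link."""
    return np.exp(-np.exp(beta0[None, :] + lp[:, None]))

def pseudo_residuals(theta, beta0, lp):
    """r_ij = theta_ij - predicted survival probability."""
    return theta - cloglog_fitted(beta0, lp)

def binned_means(x, r, nbins=8):
    """Crude smoother: mean residual within quantile bins of a covariate."""
    edges = np.quantile(x, np.linspace(0, 1, nbins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, nbins - 1)
    return np.array([r[idx == b].mean() for b in range(nbins)])

rng = np.random.default_rng(2)
n = 200
lp = 0.7 * rng.normal(size=n)              # hypothetical linear predictor
beta0 = np.array([-1.0, -0.5, -0.2])       # one intercept per time point
fitted = cloglog_fitted(beta0, lp)
theta = fitted + rng.normal(scale=0.05, size=fitted.shape)  # mock pseudo-values
r = pseudo_residuals(theta, beta0, lp)
# A correctly specified model leaves residuals varying randomly around 0:
assert abs(binned_means(lp, r[:, 0]).mean()) < 0.05
```

In practice `theta` would be the computed pseudo-values and `beta0`, `lp` the GEE estimates; the binned means play the role of the superimposed smoother in Figure 6.5.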
INTUITION 229
Figure 6.5 PBC3 trial in liver cirrhosis: Pseudo-residuals for the survival indicator I(Ti > t j ) for
all subjects, i, and for (t1 ,t2 ,t3 ) = (1, 2, 3) years plotted against log2 (bilirubin). The cloglog link
function was used.
Pseudo-values
Pseudo-observations are computed once and for all at the chosen time points. This
takes care of censoring and provides us with a set of new observations that can
be used as response variables in a GEE with the desired link function. This also
enables application of various graphical techniques for data presentation and model
assessment. In the example, this provided estimates comparable to those obtained
with a Cox model. This, however, is only a ‘poor man’s Cox model’ since fitting
the full Cox model is both easier and more efficient. So, the main argument for
using pseudo-observations is the generality of the approach: The same basic ideas
apply for a number of marginal parameters in multi-state models and for several link
functions. We will demonstrate these features in Sections 6.1.2-6.1.7.
Figure 6.6 PBC3 trial in liver cirrhosis: Pseudo-values for the survival indicator I(Ti > 2 years)
for all subjects, i, plotted against the covariate Zi = bilirubin with a scatter-plot smoother super-
imposed (left); in the right panel, the smoother is transformed with the (minus) log link function.
Table 6.2 PBC3 trial in liver cirrhosis: Estimated coefficients (and robust SD) from a model for the
survival indicator I(Ti > t j ) with linear effects of albumin and bilirubin based on pseudo-values at
(t1 ,t2 ,t3 ) = (1, 2, 3) years. The (minus) log link function was used.
Covariate                     β̂        SD
Treatment CyA vs. placebo    -0.048    0.031
Albumin per 1 g/L            -0.0097   0.0032
Bilirubin per 1 µmol/L        0.0042   0.0008
bilirubin. This is done in Figure 6.6 where a smoother has been superimposed (left), and in
the right-hand panel this smoother is log transformed. There seems to be a problem with
the fit for large values of bilirubin (where the smoother gets negative, thereby preventing a
log transform). Table 6.2 shows the estimated coefficients in a model for S(t j | Z) for the
three time points (t1 ,t2 ,t3 ) = (1, 2, 3) years, using the (minus) log link function and includ-
ing the covariates treatment, albumin and bilirubin. The coefficients have hazard difference
interpretations and may be compared to those seen in Table 2.10. The estimates based on
pseudo-values are seen to be similar, however, with larger SD values.
A residual plot may be used to assess the model fit, and Figure 6.7 shows the pseudo-
residuals from the model in Table 6.2 plotted against bilirubin. Judged from the smoothers,
the fit is not quite as good as that using the cloglog link.
Figure 6.7 PBC3 trial in liver cirrhosis: Pseudo-residuals for the survival indicator I(Ti > t j ) for all
subjects, i, and for (t1 ,t2 ,t3 ) = (1, 2, 3) years plotted against bilirubin. The model used the (minus)
log link function (Table 6.2).
Table 6.3 PBC3 trial in liver cirrhosis: Estimated coefficients (and robust SD) from a linear model
(identity link function) for the τ-restricted mean life time for τ = 3 years based on pseudo-values.
Covariate                      β̂       SD
Intercept                      2.83     0.35
Treatment CyA vs. placebo      0.148    0.073
Albumin per 1 g/L              0.023    0.0068
log2(bilirubin) per doubling  -0.243    0.032
that is,

θ_i = n ∫_0^τ Ŝ(t)dt − (n − 1) ∫_0^τ Ŝ^{−i}(t)dt.
We consider the PBC3 trial and the value τ = 3 years and compare with results using the
model by Tian et al. (2014). Figure 6.8 shows the scatter-plot where the pseudo-values θi
are plotted against the observation times Xi . It is seen that all observations Xi > τ give rise
to identical pseudo-values slightly above τ while observed failures before τ have pseudo-
values close to the observed Xi = Ti and censored observations before τ have values that
increase with Xi in the direction of τ.
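A sketch of this computation on simulated data follows (illustrative Python; the book's own code is in R and SAS). It exploits that, without censoring, the restricted-mean pseudo-values reproduce min(T_i, τ) exactly:

```python
import numpy as np

def rmst(time, event, tau):
    """Restricted mean survival: integral of the Kaplan-Meier curve over [0, tau]."""
    order = np.argsort(time)
    t, d = time[order], event[order]
    n = len(t)
    area, s, last = 0.0, 1.0, 0.0
    for k in range(n):
        if t[k] >= tau:
            break
        if d[k] == 1:
            area += s * (t[k] - last)
            s *= 1.0 - 1.0 / (n - k)   # distinct times assumed
            last = t[k]
    return area + s * (tau - last)

def rmst_pseudo(time, event, tau):
    """Jackknife pseudo-values theta_i = n*RMST - (n-1)*RMST_{-i}."""
    n = len(time)
    full = rmst(time, event, tau)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        out[i] = n * full - (n - 1) * rmst(time[keep], event[keep], tau)
    return out

rng = np.random.default_rng(3)
n, tau = 60, 3.0
T = rng.exponential(2.0, n)
# Without censoring the pseudo-values equal min(T_i, tau) exactly:
theta = rmst_pseudo(T, np.ones(n, dtype=int), tau)
assert np.allclose(theta, np.minimum(T, tau))
```

With censoring present, the same function reproduces the qualitative pattern of Figure 6.8: events before τ give values close to T_i, while observations beyond τ share a common value slightly above τ.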
Table 6.3 shows the results from a linear model (i.e., identity as link function) for ε0 (τ |
Z), τ = 3 years, based on pseudo-observations. The results are seen to coincide quite well
with those obtained using the Tian et al. model (Table 4.4). Figure 6.9 shows scatter-plots
of pseudo-values against log2 (bilirubin) and seems to not contra-indicate a linear model.
Figure 6.8 PBC3 trial in liver cirrhosis: Pseudo-values for the restricted life time min(Ti , τ) for all
subjects, i, plotted against the observation time Xi for τ = 3 years: Observed failures (o), censored
observations (x).
Figure 6.9 PBC3 trial in liver cirrhosis: Pseudo-values for the restricted life time min(Ti , τ) for
all subjects, i, plotted against log2 (bilirubin) for τ = 3 years. A scatter-plot smoother has been
superimposed.
Table 6.4 PBC3 trial in liver cirrhosis: Estimated coefficients (and robust SD) from models with
logistic and cloglog link functions for the cumulative incidence of death without transplantation
before t0 = 2 years based on pseudo-values.
Logistic link:
Covariate                      β̂       SD       β̂       SD
Treatment CyA vs. placebo      0.112    0.370   -0.574   0.506
Albumin per 1 g/L                               -0.144   0.049
log2(bilirubin) per doubling                     0.713   0.188

Cloglog link:
Covariate                      β̂       SD       β̂       SD
Treatment CyA vs. placebo      0.106    0.351   -0.519   0.425
Albumin per 1 g/L                               -0.114   0.037
log2(bilirubin) per doubling                     0.570   0.145
Table 6.5 PBC3 trial in liver cirrhosis: Estimated coefficients (and robust SD) from models for the
cumulative incidence of death without transplantation. Models use the cloglog link function based
on pseudo-values at either 3 or 10 time points.
(a) Transplantation
Covariate                      β̂       SD       β̂       SD
Treatment CyA vs. placebo     -0.056    0.051   -0.063   0.046
Albumin per 1 g/L                               -0.001   0.004
log2(bilirubin) per doubling                     0.100   0.026

Covariate                      β̂       SD       β̂       SD
Treatment CyA vs. placebo     -0.015    0.073   -0.085   0.069
Albumin per 1 g/L                               -0.022   0.007
log2(bilirubin) per doubling                     0.143   0.032
and
log(− log(E(I(TD > t j ) | Z))) = log(A0 (t j )) + βS Z,
where Z is the indicator for treatment with liraglutide. Joint estimation of the treatment
effects (βR , βS ) was based on pseudo-values for (N(t j ), I(TD > t j )), j = 1, 2, 3, using the
Table 6.7 PROVA trial in liver cirrhosis: Estimated coefficients (log(relative risk)), (with robust SD)
of sclerotherapy (yes vs. no) on the probability P01 (1,t) of being alive in the bleeding state at time
t = 2 years given alive in the initial state at time s = 1 year based on pseudo-values using different
base estimators.
Base estimator                           β̂       SD
Landmark Pepe                           -0.151    0.925
Landmark Aalen-Johansen                 -0.261    0.882
Plug-in linear, complete data           -0.849    1.061
Plug-in duration scale, complete data   -0.636    0.832
Plug-in linear, at-risk data            -0.650    0.674
Plug-in linear, landmark data           -0.079    0.920
Figure 6.10 PROVA trial in liver cirrhosis: Landmark Aalen-Johansen estimators for the probability
P01 (1,t) of being alive in the bleeding state at time t > 1 year among patients observed to be in
the initial state at time s = 1 year. Separate curves are estimated for patients treated or not with
sclerotherapy.
sandwich estimator to estimate the SD and correlations of (βbR , βbS ). Figure 6.11 shows the
non-parametric Cook-Lawless estimates for E(N(t) | Z) and the Kaplan-Meier estimates
for S(t | Z), and Table 6.8 shows the results from the bivariate pseudo-value regression.
The estimated regression coefficients are close to those based on separate Ghosh-Lin and
Cox models quoted in Section 4.2.3, i.e., −0.159 (SD = 0.088) for the log(mean ratio)
and −0.166 (SD = 0.070) for the log(hazard ratio). From the estimated joint distribution
of (βbR , βbS ), i.e., a bivariate normal distribution with SD and correlation equal to the values
from Table 6.8, it is possible to conduct a bivariate Wald test for the hypothesis (βR , βS ) =
(0, 0). The 2 DF Wald statistic takes the value 8.138 corresponding to P = 0.017.
Table 6.8 LEADER cardiovascular trial in type 2 diabetes: Parameter estimates (with robust SD) for
treatment (liraglutide vs. placebo) from a bivariate pseudo-value model with recurrent myocardial
infarctions (R) and overall survival (S) at three time points, (20, 30, 40) months.
Figure 6.11 LEADER cardiovascular trial in type 2 diabetes: Cook-Lawless estimates of the mean
number of recurrent myocardial infarctions (left) and Kaplan-Meier estimates of the survival func-
tion (right), by treatment group.
(see, e.g., Binder et al., 2014) that when survival data (X_i, D_i), i = 1, …, n are available, then the Kaplan-Meier estimator may, alternatively, be written using IPCW as

Ŝ(t) = 1 − (1/n) ∑_{i=1}^n N_i(t)/Ĝ(X_i−).   (6.2)

If censoring may depend on covariates, a modified estimator is

Ŝ^c(t) = 1 − (1/n) ∑_{i=1}^n N_i(t)/Ĝ(X_i− | Z_i),   (6.3)

with Ĝ(t | Z) now based on a regression model for the censoring distribution, e.g., a Cox model. For this situation, pseudo-values θ_i for the survival indicator can be based on Ŝ^c(t) and used as explained in Section 6.1.1. If the model for the censoring distribution is correctly specified, then the resulting estimators have the desired properties (Overgaard et al., 2019). Similar modifications may be applied to other estimators on which pseudo-values are based.
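The IPCW representation (6.2) is easy to verify numerically: with all event and censoring times distinct, it reproduces the Kaplan-Meier estimator exactly. An illustrative Python sketch on simulated data:

```python
import numpy as np

def km_curve(time, delta, t):
    """Product-limit estimator at t; delta = 1 flags the events of interest."""
    s = 1.0
    for u in np.sort(time[delta == 1]):
        if u > t:
            break
        s *= 1.0 - 1.0 / np.sum(time >= u)   # distinct times assumed
    return s

def censor_km_left(time, event, x):
    """Censoring-distribution Kaplan-Meier G(x-): product over censoring times < x."""
    g = 1.0
    for u in np.sort(time[event == 0]):
        if u >= x:
            break
        g *= 1.0 - 1.0 / np.sum(time >= u)
    return g

def ipcw_surv(time, event, t):
    """Equation (6.2): S(t) = 1 - (1/n) * sum_i N_i(t) / G(X_i -)."""
    n = len(time)
    tot = sum(1.0 / censor_km_left(time, event, time[i])
              for i in range(n) if event[i] == 1 and time[i] <= t)
    return 1.0 - tot / n

rng = np.random.default_rng(4)
n = 100
T, C = rng.exponential(2.0, n), rng.exponential(3.0, n)
time, event = np.minimum(T, C), (T <= C).astype(int)
# With distinct times, the IPCW form reproduces Kaplan-Meier exactly:
for t in [0.5, 1.0, 2.0]:
    assert abs(ipcw_surv(time, event, t) - km_curve(time, event, t)) < 1e-8
```

The agreement follows because each event at time u contributes a Kaplan-Meier jump Ŝ(u−)/Y(u), and Y(u)/n = Ŝ(u−)Ĝ(u−) holds exactly for distinct times.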
with dÂ_h(u) = dN_h(u)/Y(u) being the jumps in the Nelson-Aalen estimator of the cause-h specific cumulative hazard (Equation (3.10)) and Ŝ(u) the all-cause Kaplan-Meier estimator (Equation (4.3)). Now, by Equation (4.15), the fraction of subjects still at risk at time t−, i.e., Y(t)/n, can be re-written as

(1/n) Y(t) = Ŝ(t−) Ĝ(t−),

leading to the IPCW version of the Aalen-Johansen estimator

F̂_h(t) = ∫_0^t dN_h(u)/(n Ĝ(u−)),

compare Section 6.1.8. Here, as previously, Ĝ is the Kaplan-Meier estimator for the censoring distribution. The empirical processes on which the estimator is based are

Ĥ_Y(t) = (1/n) ∑_i Y_i(t)

and

Ĥ_h(t) = (1/n) ∑_i N_hi(t), h = 0, 1, …, k,
where N_0i is the counting process for censoring, N_0i(t) = I(X_i ≤ t, D_i = 0). With this notation, the cumulative censoring hazard is estimated by the Nelson-Aalen estimator

Â_0(t) = ∫_0^t dĤ_0(u)/Ĥ_Y(u)

and the censoring survival function by the product-integral

Ĝ(t) = ∏_{[0,t]} (1 − dÂ_0(u)).
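The same identity can be checked for the Aalen-Johansen estimator: with distinct times, each h-event at u contributes a jump Ŝ(u−)/Y(u) = 1/(nĜ(u−)). An illustrative Python sketch with two competing causes on simulated data:

```python
import numpy as np

def aalen_johansen(time, status, h, t):
    """Standard AJ estimator: F_h(t) = sum over h-events of S(u-) / Y(u)."""
    order = np.argsort(time)
    n = len(time)
    s, f = 1.0, 0.0
    for k, i in enumerate(order):
        if time[i] > t:
            break
        y = n - k                       # at risk (distinct times assumed)
        if status[i] == h:
            f += s / y
        if status[i] != 0:              # any event reduces the all-cause KM
            s *= 1.0 - 1.0 / y
    return f

def ipcw_aalen_johansen(time, status, h, t):
    """IPCW form: F_h(t) = (1/n) sum_i I(X_i <= t, D_i = h) / G(X_i -)."""
    n = len(time)
    tot = 0.0
    for i in range(n):
        if status[i] == h and time[i] <= t:
            g = 1.0
            for u in np.sort(time[status == 0]):   # censoring KM, left limit
                if u >= time[i]:
                    break
                g *= 1.0 - 1.0 / np.sum(time >= u)
            tot += 1.0 / g
    return tot / n

rng = np.random.default_rng(5)
n = 120
T1, T2, C = rng.exponential(3, n), rng.exponential(4, n), rng.exponential(2.5, n)
time = np.minimum.reduce([T1, T2, C])
status = np.where(time == C, 0, np.where(time == T1, 1, 2))
for t in [1.0, 2.0]:
    for h in (1, 2):
        assert abs(aalen_johansen(time, status, h, t)
                   - ipcw_aalen_johansen(time, status, h, t)) < 1e-8
```
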
Since observations for i = 1, …, n are independent, it follows by the law of large numbers that Ĥ = (Ĥ_Y, Ĥ_0, Ĥ_1, …, Ĥ_k) converges to a certain limit η = (η_Y, η_0, η_1, …, η_k). The Aalen-Johansen estimator F̂_h is a certain functional, say φ, of Ĥ and the true value F_h(t) = θ is the same functional applied to η.
THEORETICAL PROPERTIES (*) 239
A smooth functional such as φ may be Taylor (von Mises) expanded

φ(Ĥ) ≈ φ(η) + (1/n) ∑_i φ̇(X_i*) = θ + (1/n) ∑_i φ̇(X_i*),

where X_i* = (X_i, D_i) is the data point for subject i and φ̇ is the first order influence function for φ(·). This is defined by

φ̇(x) = (d/du) φ((1 − u)η + uδ_x) |_{u=0} = φ′_η(δ_x − η),

i.e.,

θ_i ≈ θ + φ̇(X_i*).   (6.4)
We assume a model for the cumulative incidence of the form

g(E(I(T ≤ t, D = h) | Z)) = β^T Z,

i.e., with link function g and where Z contains the constant ‘1’ and β the corresponding intercept, and estimates of β are obtained by solving the GEE

U(β) = ∑_i A(β, Z_i)(θ_i − g^{−1}(β^T Z_i)) = 0.

For this to be an unbiased estimating equation, it is needed that

E(φ̇(X_i*) | Z_i) = g^{−1}(β^T Z_i) − θ   (6.5)

and this must be verified on a case-by-case basis by explicit calculation of the influence function. This has been done by Graw et al. (2009) for the cumulative incidence and more generally by Overgaard et al. (2017) under the assumption that censoring is independent of covariates. For the cumulative incidence the influence function is

φ̇(X_i*) = ∫_0^t dN_hi(u)/G(u−) − F_h(t) + ∫_0^t (F_h(t) − F_h(u))/(S(u)G(u)) dM_0i(u),   (6.6)
where N_hi counts h-events for subject i and M_0i(t) = N_0i(t) − ∫_0^t Y_i(u)dA_0(u) is the martingale for the process N_0i counting censorings for subject i. From this expression, (6.5) may be shown using the martingale property of M_0i(t).
By the first order von Mises expansion, unbiasedness of the GEE was established and, had
the pseudo-values θ1 , . . . , θn been independent, the standard sandwich variance estimator
would apply for βb . A second order von Mises expansion gives the approximation
1
θi ≈ θ + φ̇ (Xi∗ ) + φ̈ (Xi∗ , X j∗ ) (6.7)
n−1 ∑
j6=i
where φ̈ is the second-order influence function. This may be shown to have expectation
zero (Overgaard et al., 2017); however, the presence of the second order terms shows that
θ_1, …, θ_n are not independent, meaning that the GEEs are not a sum of independent terms even when inserting the true value β_0 and, therefore, the sandwich estimator needs to be modified to properly describe the variability of β̂. The details were presented by Jacobsen
and Martinussen (2016) for the Kaplan-Meier estimator and more generally by Overgaard
et al. (2017). However, use of the standard sandwich variance estimator based on the GEE
for pseudo-values from the Aalen-Johansen estimator turns out to be only slightly conser-
vative because the extra term in the correct variance estimator arising from the second order
terms in the expansion is negative and tends to be numerically small.
where, in the latter term, estimates for the quantities appearing in the expression for the
influence function are inserted (Parner et al., 2023; Bouaziz, 2023). To illustrate this idea,
we study the Aalen-Johansen estimator Fbh (t) for which the influence function is given by
(6.6). The corresponding approximation in (6.8) is then

θ̂_i = ∫_0^t dN_hi(u)/Ĝ(u−) + ∫_0^t (F̂_h(t) − F̂_h(u))/(Y(u)/n) dM̂_0i(u),

inserting estimates for F_h, G and the cumulative censoring hazard A_0. An advantage is that
these estimates can be calculated once based on the full sample and, next, the estimated
influence function is evaluated at each observation X_i*. Thus,

φ̇̂(X_i*) = ∂F̂_h^w(t)/∂w_i |_{w=1/n}

and the approximate pseudo-value is

θ̂_i = θ̂ + φ̇̂(X_i*) (see Figure 6.12).
Figure 6.12 PBC3 trial in liver cirrhosis: Infinitesimal jackknife (IJ) pseudo-values for the survival
indicator I(Ti > 2 years) for all subjects, i, (left) and difference between IJ pseudo-values and or-
dinary pseudo-values (6.1) (right) plotted against the ordinary pseudo-values. An identity line has
been added to the left-hand plot.
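The infinitesimal jackknife (IJ) pseudo-values have a closed form once Ŝ = exp(−Â) and the at-risk process are computed; an illustrative Python sketch on simulated data follows. Because the estimated influence terms sum to zero over i, the IJ pseudo-values average exactly to the underlying estimate, which the assertion checks:

```python
import numpy as np

def ij_pseudo(time, event, t):
    """Infinitesimal-jackknife pseudo-values for S(t) based on exp(-Nelson-Aalen)."""
    n = len(time)
    ev_times = np.sort(time[event == 1])
    ev_times = ev_times[ev_times <= t]
    Y = lambda u: np.sum(time >= u)          # number at risk just before u
    A = np.sum([1.0 / Y(u) for u in ev_times])   # Nelson-Aalen at t
    S = np.exp(-A)                               # plug-in survival estimate
    theta = np.empty(n)
    for i in range(n):
        cut = min(t, time[i])
        # First term of the estimated influence function: n*N_i(t)/Y(X_i)
        term1 = n / Y(time[i]) if (event[i] == 1 and time[i] <= t) else 0.0
        # Second term: integral of n dN(u)/Y(u)^2 over [0, t ^ X_i]
        term2 = np.sum([n / Y(u) ** 2 for u in ev_times if u <= cut])
        theta[i] = S - S * (term1 - term2)
    return theta, S

rng = np.random.default_rng(6)
n = 150
T, C = rng.exponential(2, n), rng.exponential(3, n)
time, event = np.minimum(T, C), (T <= C).astype(int)
theta, S = ij_pseudo(time, event, t=1.5)
# The estimated influence terms sum to zero, so the IJ pseudo-values
# average exactly to the underlying estimate:
assert abs(theta.mean() - S) < 1e-10
```

Unlike ordinary jackknife pseudo-values, no re-estimation on n leave-one-out samples is needed; everything is evaluated once on the full sample, which is the computational advantage noted above.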
pseudo-observations. Models for the probability of no medical failure before chosen time
points were fitted to pseudo-values using the cloglog link and including only the covari-
ate bilirubin or only log(bilirubin). The conclusion was that linearity is rejected without
transforming the covariate, whereas no deviations from linearity were detected after a log-
transformation.
EXERCISES 243
6.5 Exercises
Exercise 6.1 (*) The influence function for the survival function S(t) is

φ̇(X_i*) = −S(t) ∫_0^t dM_i(u)/(S(u)G(u))

with M_i(u) = N_i(u) − ∫_0^u Y_i(s)dA(s) being the martingale for the failure counting process for subject i (Overgaard et al., 2017). The corresponding ‘plug-in’ approximation is then

φ̇̂(X_i*) = −Ŝ(t) (N_i(t)/(Y(t ∧ X_i)/n) − ∫_0^{t∧X_i} dN(u)/(Y(u)²/n)).

1. Show that, writing the estimator in the ‘exp(−Nelson-Aalen)’ form

Ŝ^w(t) = exp(−∫_0^t (∑_i w_i dN_i(u))/(∑_i w_i Y_i(u))),

this expression is obtained as

φ̇̂(X_i*) = ∂Ŝ^w(t)/∂w_i |_{w=1/n}.
it holds that

∂Ŝ^w(t)/∂w_i |_{w=1/n} = −Ŝ(t) (N_i(t)/(Y(t ∧ X_i) − dN(t ∧ X_i)) − ∫_0^{t∧X_i} dN(u)/(Y(u)(Y(u) − dN(u)))).
φ̇̂(X_i*) = ∂F̂_h^w(t)/∂w_i |_{w=1/n},

with F̂_h^w(t) given by (6.11), leads to the pseudo-value approximation obtained by plugging-in estimators into (6.10).
2. Show that, in the case of no censoring, the influence function reduces to
Exercise 6.3 Consider the Copenhagen Holter study and the composite end-point stroke-
free survival.
1. Fit, using pseudo-values, a cloglog model for experiencing that end-point before time 3
years including the covariates ESVEA, sex, age, and systolic blood pressure.
2. Compare the results with those of Exercise 2.4.
Exercise 6.4 Consider the Copenhagen Holter study and the composite end-point stroke-
free survival.
1. Fit, using pseudo-values, a linear model for the 3-year restricted mean time to the composite event including the covariates ESVEA, sex, age, and systolic blood pressure.
2. Compare with the results of Exercise 4.3.
Exercise 6.5 Consider the competing outcomes stroke and death without stroke in the
Copenhagen Holter study.
1. Fit, using pseudo-values, a cloglog-model for the cumulative incidences at 3 years in-
cluding ESVEA, sex, age, and systolic blood pressure.
2. Compare with the results of Exercises 4.4 and 5.8.
Chapter 7
Further topics
In previous chapters, we have discussed a number of methods for analyzing statistical mod-
els for multi-state survival data based on rates or on marginal parameters, such as risks of
being in certain states at certain time-points – the latter type of models sometimes based on
pseudo-values. In this final chapter, we will introduce a number of possible extensions to
these methods. For these further topics, entire books and review papers have been written,
e.g., Sun (2006) and van den Hout (2020) on interval-censored data (see also Cook and
Lawless, 2018, ch. 5), Hougaard (2000) and Prentice and Zhao (2020) on non-independent
data, Hernán and Robins (2020) on causal inference, Rizopoulos (2012) on joint models,
and Borgan and Samuelsen (2014) on cohort sampling. This means that our exposition will
be brief and we provide references for further reading.
7.1 Interval-censoring
So far, we have assumed that the multi-state process Vi (t) was observed continuously, i.e.,
exact transition times were observed up till the time Xi = Ti ∧Ci – the minimum of the time
of reaching an absorbing state and the time of right-censoring. Such an observation scheme
is not always possible. Sometimes, V_i(t) is only observed intermittently, that is, only the values V_i(J_0^i), V_i(J_1^i), …, V_i(J_{N_i}^i) at a number (N_i + 1) of inspection times J^i = (J_0^i, J_1^i, …, J_{N_i}^i) are ascertained. Typically, the first inspection time J_0^i equals 0 for all subjects i but, in general, the inspection times may vary among subjects. The resulting observations of V_i(t) are said to be interval-censored. The data arising when J^i is the same for all i are known as panel data
(e.g., Kalbfleisch and Lawless, 1985). There may also be situations where the very concept
of an exact transition time is not meaningful, e.g., the time of onset of a slowly developing
disease such as dementia. In such a case, typically only a last time seen without the disease
and a first time seen with the disease are available for any subject who develops the disease,
once more giving rise to interval-censoring.
An assumption that will be made throughout, similar to that of independent censoring (Sec-
tion 1.3), is that the inspection process J i is independent of Vi (t) (e.g., Sun, 2006, ch. 1; see
also Cook and Lawless, 2018, ch. 7).
In this section, we will give a brief account of some techniques that have been developed
for analysis of interval-censored multi-state survival data.
245
246 FURTHER TOPICS
7.1.1 Markov processes (*)
Intermittent observation of the process V (t) gives rise to a likelihood contribution from
subject i that is a product of factors
P(V_i(J_ℓ^i) = s_ℓ | V_i(J_{ℓ−1}^i) = s_{ℓ−1}), ℓ = 1, …, N_i,

each corresponding to the probability of moving from the state s_{ℓ−1} occupied at time J_{ℓ−1}^i to the state s_ℓ occupied at the next inspection time, J_ℓ^i. The resulting likelihood is tractable
if V (t) is a Markov process and transition hazards are assumed to be piece-wise constant
because then the transition probabilities are explicit functions of the transition hazards,
see Equation (5.5). Piece-wise constant hazard models for general Markov multi-state pro-
cesses were discussed by Jackson (2011) (see also van den Hout, 2017, ch. 4) and may also
be used in the special models to be discussed in the next sections.
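With piece-wise (here, for simplicity, fully) constant transition hazards, the transition probabilities are matrix exponentials of the generator, and the panel-data likelihood is a product of such probabilities. An illustrative Python sketch for an illness-death model with hypothetical rates:

```python
import numpy as np
from scipy.linalg import expm

# Transition intensity matrix Q for an illness-death model
# (states 0 = healthy, 1 = ill, 2 = dead); the rates are hypothetical.
Q = np.array([[-0.3,  0.2,  0.1],
              [ 0.0, -0.4,  0.4],
              [ 0.0,  0.0,  0.0]])

def trans_prob(Q, dt):
    """With constant intensities, P(s, s + dt) = expm(Q * dt)."""
    return expm(Q * dt)

def panel_log_lik(Q, times, states):
    """Log-likelihood for one subject observed at inspection times J_0 < J_1 < ..."""
    ll = 0.0
    for k in range(1, len(times)):
        P = trans_prob(Q, times[k] - times[k - 1])
        ll += np.log(P[states[k - 1], states[k]])
    return ll

P = trans_prob(Q, 1.0)
assert np.allclose(P.sum(axis=1), 1.0)   # each row is a probability distribution
assert np.all(P >= 0)
ll = panel_log_lik(Q, times=[0.0, 1.0, 2.5], states=[0, 1, 2])
assert ll < 0
```

Maximizing such a likelihood over the entries of Q (possibly piece-wise constant over time intervals) is what packages like `msm` (Jackson, 2011) do for general Markov multi-state models.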
where S is the survival function S(t) = P(T > t). Analysis of parametric models, including
the piece-wise exponential model, based on this likelihood is simple and asymptotic prop-
erties follow from standard likelihood theory. Non-parametric maximization leads to the
Turnbull (1976) estimator for which the large-sample distribution is more complex (e.g.,
Sun, 2006, ch. 2 and 3). Pseudo-values based on a parametric model were discussed by
Bouaziz (2023).
Regression analysis of interval-censored survival data via transformation models based on
this likelihood were studied by Zeng et al. (2016). This class of models includes the Cox
model, previously studied by Finkelstein (1986). Regression analysis of current status data
using an additive hazards model was discussed by Lin et al. (1998). For the two-state model,
panel data give rise to grouped survival data. In this situation, non-parametric estimation
of the survival function reduces to the classical life-table (e.g., Preston et al., 2000). The
Cox model for grouped survival data was studied by Prentice and Gloeckler (1978).
INTERVAL-CENSORING 247
7.1.3 Competing risks (*)
Also for the competing risks model (Figure 1.2), interval-censored data reduce to an interval
(JL , JR ] with T ∈ (JL , JR ]. Following Section 7.1.1, we will assume that, when JR < ∞, we
also observe the state V (JR ), i.e., the cause of death. When JR = ∞, observation of T is
right-censored at JL and the cause of death is unknown.
Similar to the two-state model, Section 7.1.2, mid-point imputation is a possibility but it
is generally not recommended. The likelihood contribution from subject i based on the
interval-censored competing risks data is
(∏_h (F_h(J_R^i) − F_h(J_L^i))^{I(D_i = h)}) · S(J_L^i)^{I(D_i = 0)}.
where f_θ(·) is the density function for the gamma distribution with mean 1 and SD² = θ. The survival function can be evaluated to be

S_h(t) = (1 + A_h^c(t)/θ)^{−θ}

with A_h^c(t) = ∫_0^t α_h^c(u)du (e.g., Hougaard, 2000, ch. 7).
From the assumption of conditional independence of (T_1, T_2) given the frailty, A, it follows that the bivariate survival function is

S(t_1, t_2) = P(T_1 > t_1, T_2 > t_2) = (S_1(t_1)^{−1/θ} + S_2(t_2)^{−1/θ} − 1)^{−θ},   (7.1)

(e.g., Hougaard, 2000, ch. 7; see also Cook and Lawless, 2018, ch. 6). Based on this result, the intensity process for the counting process N_h(t) for subject h = 1, 2 can be seen to equal

lim_{Δ→0} (1/Δ) P(N_h(t + Δ) − N_h(t) = 1 | H_{t−}) = Y_h(t) α_h^c(t) (θ + N_1(t−) + N_2(t−))/(θ + A_1^c(t ∧ X_1) + A_2^c(t ∧ X_2)),
where Ht , as introduced in Section 1.4, denotes the observed past in [0,t] (see, e.g., Nielsen
et al., 1992). This shows that the shared gamma frailty model induces an intensity for
members of a given cluster that, at time t, depends on the number of previously observed
events in the cluster.
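The copula representation (7.1) can be checked by simulation: draw a shared gamma frailty with mean 1 (parametrized to match the displayed formulas, i.e., shape and rate both equal to θ), generate conditionally independent life times with unit exponential baselines, and compare the empirical joint survival with the formula. An illustrative Python sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
theta = 1.0
n = 20000
# Shared gamma frailty with mean 1, parametrized to match (7.1):
# A ~ Gamma(shape = theta, rate = theta)
A = rng.gamma(shape=theta, scale=1.0 / theta, size=n)
# Given A, the members have hazard A * alpha_h with unit exponential
# baselines, so T_h = E_h / A with E_h ~ Exp(1):
T1 = rng.exponential(1.0, n) / A
T2 = rng.exponential(1.0, n) / A
# Marginal survival S_h(t) = (1 + t/theta)^(-theta)
S = lambda t: (1.0 + t / theta) ** (-theta)
# Joint survival from the copula formula (7.1)
t1, t2 = 1.0, 0.5
joint = (S(t1) ** (-1 / theta) + S(t2) ** (-1 / theta) - 1.0) ** (-theta)
emp = np.mean((T1 > t1) & (T2 > t2))
assert abs(emp - joint) < 0.02   # Monte Carlo agreement
```

The same simulation design is what a two-stage analysis inverts: the margins S_1, S_2 are estimated first and (7.1) is then used to profile out the association parameter θ.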
These derivations show how the marginal and joint distributions of T1 and T2 follow from
the shared (gamma) frailty specification where the joint distribution is expressed in terms
of the marginals by (7.1). Based on this expression, Glidden (2000), following earlier work
by Hougaard (1987) and Shih and Louis (1995), showed that it is possible to go the other
way around, i.e., first to specify the margins S1 (t) and S2 (t) and, subsequently, to combine
them into a joint survival function using (7.1). This equation is an example of a copula, i.e.,
a joint distribution on the unit square [0, 1] × [0, 1] with uniform margins. It is seen that a
shared frailty model induces a copula – other examples were given by, among others, An-
dersen (2005). Glidden (2000), for the gamma distribution, and Andersen (2005) for other
frailty distributions studied two-stage estimation, as follows. First, the marginal survival
functions S1 (t) and S2 (t) are specified and analyzed, e.g., using marginal Cox models as
in Sections 4.3 and 5.6. Next, estimates from these marginal models are inserted into (7.1)
(or another specified copula) to obtain a profile-like likelihood for the parameter(s) in the
copula, i.e., for θ in the case of the gamma distribution. This is maximized to obtain an
estimate, θb, for the association parameter. These authors derived asymptotic results for the
estimators (building, to a large extent, on Spiekerman and Lin, 1999). With a two-stage
approach, it is possible to get regression coefficients with a marginal interpretation (based
on a model for which goodness-of-fit examinations are simple) and at the same time get
a quantification of the within-cluster association, e.g., using Kendall’s coefficient of con-
cordance as exemplified in Section 3.9.2. Methods for evaluating the fit to the data of the
chosen copula are, however, not well developed (see, e.g., Andersen et al., 2005).
i.e., the difference between the means of what would be observed if every subject either
receives the treatment or every subject receives the control. In (7.2), the average causal
effect is defined as a difference; however, the causal treatment effect could equally well be
a ratio or another contrast between E( f (V z=1 )) and E( f (V z=0 )).
Examples of functions f (·) are the state h indicator f (V ) = I(V (t0 ) = h) at time t0 in which
case θZ would be the causal risk difference of occupying state h at time t0 , e.g., the t0 -year
risk difference of failure in the two-state model (Figure 1.1). Another example based on the
two-state model is f (V ) = min(T, τ), in which case θZ is the causal difference between the
τ-restricted mean life times under treatment and control. Note that θZ could not be a hazard
difference or a hazard ratio because the hazard functions
do not contrast the same population under treatment and control but, rather, they are con-
trasting the two, possibly different, sub-populations who would survive until time t under
either treatment or under control (e.g., Martinussen et al., 2020).
If treatment were randomized, then θZ would be estimable based on the observed data. This
is because for z0 = 0, 1 we have that

E(f(V^{z=z_0})) = E(f(V^{z=z_0}) | Z = z_0) = E(f(V) | Z = z_0)
and the latter mean is estimable from the subset of subjects randomized to treatment z0
– at least under an assumption of independent censoring. Note that, by the consistency
assumption, counterfactual outcomes are linked to the observed outcomes.
7.3.2 The g-formula (*)
In the previous section, we presented a formal definition of causality based on counterfactu-
als and argued why average causal effects were estimable based on data from a randomized
study under the assumption of consistency. We will now turn to observational data and dis-
cuss under which extra assumptions an average causal effect can be estimated using the
g-formula. Recall from Section 1.2.5 that the g-formula computes the average prediction

θ̂_{z_0} = (1/n) ∑_i f̂(V_i(t) | Z = z_0, Z̃_i)   (7.3)

based on some regression model for the parameter θ of interest, including treatment Z and other covariates (confounders) Z̃. The prediction is performed by setting treatment to z_0 (= 0, 1) for all subjects and keeping the observed confounders Z̃_i for subject i = 1, …, n. This estimates (under assumptions to be stated in the following) the mean E(f(V^{z=z_0})).
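A minimal numerical sketch of the g-formula (illustrative Python on simulated data, with a hypothetical linear outcome model): predictions are averaged with treatment set to 1 and to 0 for everyone. For a linear model without a treatment-confounder interaction, the resulting contrast equals the fitted treatment coefficient, which the assertion verifies:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
conf = rng.normal(size=n)                      # confounder Z-tilde (hypothetical)
p_treat = 1.0 / (1.0 + np.exp(-conf))          # confounded treatment assignment
Z = (rng.uniform(size=n) < p_treat).astype(float)
y = 1.0 + 0.5 * Z + 0.8 * conf + rng.normal(scale=0.3, size=n)

# Outcome regression: y ~ 1 + Z + conf (ordinary least squares)
X = np.column_stack([np.ones(n), Z, conf])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

def g_formula(z0):
    """Average prediction with treatment set to z0 for every subject (Eq. (7.3))."""
    Xz = np.column_stack([np.ones(n), np.full(n, z0), conf])
    return (Xz @ beta).mean()

ace = g_formula(1.0) - g_formula(0.0)
# Without a treatment-confounder interaction, the g-formula contrast
# equals the fitted treatment coefficient:
assert abs(ace - beta[1]) < 1e-10
```

In the multi-state setting, the outcome model would instead be, e.g., a Cox or Fine-Gray model from which the marginal parameter of interest is predicted, but the averaging step is the same.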
This is because we always have the identity

E(f(V^{z=z_0})) = E_{Z̃}(E(f(V^{z=z_0}) | Z̃))

and, under conditional exchangeability, the inner mean equals E(f(V^{z=z_0}) | Z̃, Z = z_0). That is, we assume that sufficiently many confounders are collected in Z̃ to obtain exchangeability for given value of Z̃ or, in other words, for given confounders those who get treatment 1 and those who get treatment 0 are exchangeable. This assumption is also known as no unmeasured confounders. Finally, consistency, i.e.,

E_{Z̃}(E(f(V^{z=z_0}) | Z̃, Z = z_0)) = E_{Z̃}(E(f(V) | Z̃, Z = z_0)),
is assumed, where the right-hand side is the quantity that is estimated by the g-formula.
In addition, an assumption of positivity should be imposed, meaning that for all values
of Z
e the probability of receiving either treatment should be positive. By this assumption,
prediction of the outcome based on the regression model for θ is feasible for all confounder
values under both treatments and, therefore, ‘every corner of the population is reached by
the predictions’. The g-formula estimate of (7.2) then becomes

θ̂_Z = (1/n) ∑_i (f̂(V_i(t) | Z = 1, Z̃_i) − f̂(V_i(t) | Z = 0, Z̃_i)).   (7.4)

7.3.3 Inverse probability of treatment weighting (*)

The probability

PS(Z̃_i) = P(Z_i = 1 | Z̃_i)   (7.5)

of subject i receiving treatment 1 is known as the propensity score and the idea in inverse probability of treatment weighting, IPTW, is to construct a re-weighted data set, replacing the outcome for subject i by a weighted outcome using the weights

Ŵ_i = Z_i/P̂S(Z̃_i) + (1 − Z_i)/(1 − P̂S(Z̃_i)),   (7.6)
where the propensity score has been estimated. That is, the outcome for subject i is weighted by the inverse probability of receiving the treatment that was actually received and, by this, the re-weighted data set becomes free of confounding because Z̃ has the same distribution among treated (Z = 1) and controls (Z = 0) (e.g., Rosenbaum and Rubin, 1983).
Therefore, a simple model including only treatment can be fitted to the re-weighted data set
to estimate θZ . This could be any of the models discussed in previous chapters from which
θ = E( f (V )) can be estimated, e.g., Cox or Fine-Gray models for risk parameters at some
time point, or direct models for the expected length of stay in a state in [0, τ].
In the situation where the outcome is represented by a pseudo-value θi (Andersen et al.,
2017) or with complete data, i.e., with no censoring, whereby f (Vi (t)) is completely ob-
servable and equals θi , see Section 6.1, the estimate is a simple difference between weighted
averages
θ̂_Z = (1/n) ∑_{i: Z_i=1} Ŵ_i θ_i − (1/n) ∑_{i: Z_i=0} Ŵ_i θ_i.
In this case, it can be seen that this actually estimates the average causal effect, as follows. The mean of the estimate in treatment group 1 (inserting the true propensity score) is

E((1/n) ∑_{i: Z_i=1} W_i θ_i) = E_{Z̃}((1/n) ∑_i E(Z_i θ_i/PS(Z̃_i) | Z̃_i)) = E_{Z̃}(E(θ_i | Z_i = 1, Z̃_i)).

An identical calculation for the control group gives the desired result. It is seen that, because we divide by PS(Z̃) or 1 − PS(Z̃) in (7.6), the assumption of positivity is needed.
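An illustrative Python sketch of IPTW with a discrete confounder (simulated data): when the propensity score is estimated non-parametrically within each confounder stratum, the IPTW contrast coincides exactly with direct standardization over the confounder distribution, which the assertion verifies:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 1000
strata = rng.integers(0, 3, size=n)              # discrete confounder Z-tilde
p = np.array([0.2, 0.5, 0.8])[strata]            # true propensity per stratum
Z = (rng.uniform(size=n) < p).astype(int)
theta = 1.0 + Z + strata + rng.normal(size=n)    # pseudo-value-like outcome

# Propensity score estimated non-parametrically within each stratum
ps_hat = np.array([Z[strata == s].mean() for s in range(3)])[strata]
W = Z / ps_hat + (1 - Z) / (1 - ps_hat)          # weights (7.6)
iptw = (W * theta * Z).sum() / n - (W * theta * (1 - Z)).sum() / n

# With a saturated propensity model, IPTW equals direct standardization
# over the confounder distribution:
std = sum((strata == s).mean()
          * (theta[(strata == s) & (Z == 1)].mean()
             - theta[(strata == s) & (Z == 0)].mean())
          for s in range(3))
assert abs(iptw - std) < 1e-10
```

With continuous confounders a parametric propensity model (e.g., logistic regression) replaces the stratum proportions, and the equivalence then holds only approximately, subject to that model being correctly specified.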
7.3.4 Summary and discussion
Sections 7.3.2 and 7.3.3 demonstrated that the average causal effect (7.2) may be estimated
in two different ways under a certain set of assumptions. The g-formula (Equations (7.3)
and (7.4)) builds on an outcome model, i.e., a model by which the marginal parameter, θ of
interest may be predicted for given values of treatment Z and confounders Z e . On the other
hand, IPTW builds on a model for treatment assignment (the propensity score, Equation
(7.5)) from which weights (7.6) are calculated and a re-weighted data set is constructed. The
re-weighted data set is free of confounding from $\widetilde{Z}$ and, therefore, the average causal effect
(7.2) may be estimated by fitting a simple model including only treatment $Z$ to this data set.
The assumptions needed for a causal interpretation of the resulting $\widehat{\theta}_Z$ include: consistency,
which links the observed outcomes to the counterfactuals (see Section 7.3.1); positivity, i.e., a
probability different from both 0 and 1 for any subject in the population of receiving either
treatment; and no unmeasured confounders, i.e., sufficiently many confounders are collected
in $\widetilde{Z}$ to ensure, for given confounder values, that those who get treatment 1 and those who
get treatment 0 are exchangeable. It is an important part of any causal inference endeavor
to discuss to what extent these conditions are likely to be fulfilled. In addition to these
assumptions, the g-formula rests on the outcome model being correctly specified and IPTW
on the propensity score model being correctly specified. Doubly robust methods have been
devised that require only one of these models to be correct, as well as even less model-dependent
techniques based on targeted maximum likelihood estimation (TMLE); see, e.g.,
van der Laan and Rose (2011).
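For comparison with IPTW, the g-formula side of this summary can be sketched in a few lines: predict the outcome for every subject with treatment set to 1 and set to 0 under a fitted outcome model, then average the contrast. The "fitted" model below is a hypothetical linear stand-in for whichever outcome model (Cox, Fine-Gray, pseudo-value regression) would actually be used:

```python
import numpy as np

def g_formula(outcome_model, Z_tilde):
    """Standardization (g-formula, cf. (7.3)-(7.4)): predict the outcome for
    every subject under treatment 1 and under treatment 0, then average the
    individual contrasts over the observed confounder distribution."""
    mu1 = outcome_model(1, Z_tilde)
    mu0 = outcome_model(0, Z_tilde)
    return np.mean(mu1 - mu0)

# Hypothetical fitted outcome model: linear in treatment and one confounder.
fitted = lambda z, zt: 1.0 * z + 2.0 * zt
zt = np.array([0.0, 0.0, 1.0, 1.0, 1.0])
print(g_formula(fitted, zt))  # 1.0: the treatment coefficient (no interaction)
```

With a treatment-confounder interaction in the outcome model, the averaged contrast would differ from any single conditional effect, which is exactly why the standardization step is needed.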
Causal inference is frequently used for the analysis of multi-state survival data as well; see,
e.g., Gran et al. (2015), while Janvin et al. (2023) discussed causal inference for recurrent
events with competing risks. In this connection, analysis with time-dependent covariates
poses particular challenges because these may, at the same time, be affected by previous
treatment allocation and be predictive of both future treatment allocation and the outcome
(known as time-dependent confounding; see, e.g., Daniel et al., 2013).
with a fixed-effects linear predictor $\mathrm{LP}^m_i(t) = \sum_\ell \gamma_\ell \widetilde{Z}_{\ell i}(t)$ depending on covariates $\widetilde{Z}$
that are either time-fixed or deterministic functions of time. The random effects,
$\log(A_{1i}), \ldots, \log(A_{ki})$, enter via $k$ fixed functions of time, $f_\ell(t), \ell = 1, \ldots, k$, where $k$ is often
taken to be 2 with $(f_1(t), f_2(t)) = (1, t)$, corresponding to random intercept and random
slope. The random effects are assumed to follow a $k$-variate normal distribution with mean
zero and some covariance. The hazard function
is assumed to depend on the true value of the time-dependent covariate and, thereby, on
the random effects, and possibly on other time-fixed covariates via the linear predictor
$\mathrm{LP}^\alpha_i = \sum_\ell \beta_\ell Z_{\ell i}$. Some components of $(Z, \widetilde{Z})$ may appear in both linear predictors $\mathrm{LP}^m$ and $\mathrm{LP}^\alpha$.
The baseline hazard is typically modeled parametrically, e.g., by assuming it to be piecewise
constant. In this model, the random effects $(A_1, \ldots, A_k)$ serve as frailties (Section 3.9)
which, at the same time, affect the longitudinal development of the time-dependent covariate.
The survival time $T$ and the time-dependent covariate are assumed to be conditionally
independent given the frailties, and measurements $Z_i(t_{i\ell_1})$ and $Z_i(t_{i\ell_2})$ taken at different time
points are also conditionally independent given the frailties. Thus, the correlation among
repeated measurements of $Z_i(\cdot)$ is given entirely by the random effects. These assumptions
are utilized when setting up the likelihood in the next section. A careful discussion of the
assumptions was given by Tsiatis and Davidian (2004).
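A small simulation sketch may help fix ideas about this model structure (all parameter values are invented for illustration; here `a1` and `a2` play the role of the normally distributed random intercept and slope, i.e., $\log(A_{1i})$ and $\log(A_{2i})$):

```python
import numpy as np

rng = np.random.default_rng(7)

# Random intercept and slope: k = 2 with (f1(t), f2(t)) = (1, t); the pair
# (a1, a2) is drawn from a bivariate normal with mean zero and an invented
# covariance matrix.
cov = np.array([[0.5, 0.1],
                [0.1, 0.2]])
a1, a2 = rng.multivariate_normal([0.0, 0.0], cov)

def true_covariate(t, gamma0=1.0, gamma1=-0.3):
    """True (error-free) value of the time-dependent covariate: fixed-effects
    linear predictor plus the random intercept and slope contributions."""
    return gamma0 + gamma1 * t + a1 + a2 * t

# Measurements at scheduled visits are the true value plus independent noise,
# so all serial correlation in the measurements comes from (a1, a2).
visits = np.array([0.0, 0.5, 1.0, 2.0])
measurements = true_covariate(visits) + rng.normal(0.0, 0.3, size=visits.size)

def hazard(t, alpha0=0.05, beta=0.4):
    """Hazard depending on the true covariate value via beta; a constant
    baseline hazard is assumed for this sketch only."""
    return alpha0 * np.exp(beta * true_covariate(t))
```

Conditionally on `(a1, a2)`, the measurement errors and the event time are independent, mirroring the conditional independence assumptions stated above.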
7.4.2 Likelihood (*)
The data for each of $n$ independent subjects include the censored event time information
$(X_i, D_i)$, covariates $(Z_i, \widetilde{Z}_i(t))$, and measurements of the time-dependent covariate
$\mathbf{Z}_i(t) = (Z_i(t_{i1}), \ldots, Z_i(t_{i n_i}))$ taken at $n_i$ time points (typically with $t_{i1} = 0$). The likelihood
contribution from the event time information $(X_i, D_i)$ for given frailties and given $(Z_i, \widetilde{Z}_i(t))$
is
$$L^\alpha_i(\theta \mid A_i) = \bigl(\alpha_i(X_i \mid A_i)\bigr)^{D_i} \exp\Bigl(-\int_0^{X_i} \alpha_i(t \mid A_i)\, dt\Bigr),$$
where $\varphi$ is the relevant normal density function. The observed-data likelihood is now
obtained by integrating over the unobserved frailties
$$L_i(\theta) = \int L^\alpha_i(\theta \mid A_i)\, L^Z_i(\theta \mid A_i)\, \varphi(A_i)\, dA_i$$
with normal density $\varphi(A_i)$. Maximization of $L(\theta) = \prod_i L_i(\theta)$ over the set of all parameters
(denoted $\theta$) involves numerical challenges which may be approached, e.g., using the EM-algorithm
(Rizopoulos, 2012, ch. 4). Also, variance estimation for the parameter estimates
$\widehat{\theta}$ may be challenging though, in principle, these estimates may be obtained from the second
derivative of $\log(L(\theta))$.
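As a numerical aside, with a single normal random effect the integral over the frailty can be approximated by Gauss-Hermite quadrature. The sketch below checks the quadrature rule against a case with a closed-form answer; the conditional likelihood `np.exp` is a toy stand-in, not the product $L^\alpha_i L^Z_i$ of the text:

```python
import numpy as np

def marginal_likelihood(cond_lik, sigma, n_nodes=20):
    """Approximate the integral of L(A) * phi(A; 0, sigma^2) dA by
    Gauss-Hermite quadrature: substituting A = sqrt(2)*sigma*x turns the
    integral into (1/sqrt(pi)) * sum_k w_k * L(sqrt(2)*sigma*x_k)."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    return (w * cond_lik(np.sqrt(2.0) * sigma * x)).sum() / np.sqrt(np.pi)

# Sanity check: for L(A) = exp(A), the integral is the normal moment
# generating function at 1, i.e., exp(sigma^2 / 2).
sigma = 0.8
approx = marginal_likelihood(np.exp, sigma)
print(approx, np.exp(sigma**2 / 2))  # the two values agree closely
```

In practice each quadrature node would carry the full conditional event-time and measurement likelihoods, and for $k = 2$ random effects the rule is applied on a two-dimensional grid.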
Figure 7.1 A cohort observed from time $t = 0$ to $\tau$ with $D = 3$ cases observed at times $t_1, t_2, t_3$, at
which $m - 1 = 2$ controls are sampled from the respective risk sets.
$$\widetilde{\mathrm{PL}}^{\mathrm{NCC}}(\beta) = \prod_{\ell=1}^{D} \frac{\exp(\beta^T Z_\ell)}{\sum_{j \in \widetilde{R}(t_\ell)} \exp(\beta^T Z_j)}, \qquad (7.7)$$
which is Equation (3.16) with the sum $\sum_j Y_j(t) \exp(\beta^T Z_j)$ over the full risk set replaced
by the corresponding sum over the sampled risk set $\widetilde{R}(t_\ell)$. Having estimated the regression
coefficients, the cumulative baseline hazard function $A_0(t) = \int_0^t \alpha_0(u)\, du$ may be estimated
by
$$\widehat{A}_{0,\mathrm{NCC}}(t) = \sum_{t_\ell \le t} \frac{1}{(Y(t_\ell)/m) \sum_{j \in \widetilde{R}(t_\ell)} \exp(\widehat{\beta}^T Z_j)}, \qquad (7.8)$$
which is the Breslow estimator (3.18) with the sum over the sampled risk set up-weighted
by the ratio between the full risk set size, $Y(t_\ell)$, and that of the sampled risk set, $m$. Large
Figure 7.2 A cohort observed from time $t = 0$ to $\tau$ with $D = 3$ cases observed at times $t_1, t_2, t_3$. A
random sub-cohort, $\widetilde{S}$, is sampled at time $t = 0$.
sample properties of (7.7) and (7.8), including estimation of SD, were discussed by Borgan
et al. (1995) who also introduced other ways of sampling from the risk set than simple
random sampling.
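The sampling scheme and one factor of (7.7) are simple to sketch (toy data with simple random sampling of $m - 1 = 2$ controls per case, as in the design described above; the function names and data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

def ncc_partial_likelihood_factor(case_z, control_z, beta):
    """One factor of (7.7): exp(beta * z_case) divided by the sum of
    exp(beta * z_j) over the sampled risk set (case plus m-1 controls)."""
    num = np.exp(beta * case_z)
    den = num + np.exp(beta * np.asarray(control_z)).sum()
    return num / den

def sample_controls(at_risk_ids, case_id, m_minus_1):
    """Simple random sampling of m-1 controls from the risk set at the
    failure time, excluding the case (the case always belongs to the
    sampled risk set)."""
    candidates = [i for i in at_risk_ids if i != case_id]
    return rng.choice(candidates, size=m_minus_1, replace=False)

# Toy risk set at one failure time: subject 0 fails, 2 controls are sampled.
z = {0: 1.0, 1: 0.0, 2: 1.0, 3: 0.0, 4: 0.0}
controls = sample_controls(list(z), case_id=0, m_minus_1=2)
factor = ncc_partial_likelihood_factor(z[0], [z[j] for j in controls], beta=0.5)
print(controls, factor)
```

Multiplying such factors over all $D$ failure times and maximizing over $\beta$ gives the nested case-control estimate of the regression coefficients.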
$$\widetilde{\mathrm{PL}}^{\mathrm{CC}}(\beta) = \prod_{\ell=1}^{D} \frac{\exp(\beta^T Z_\ell)}{\sum_{j \in \widetilde{S} \cup \{\ell\}} Y_j(t_\ell) \exp(\beta^T Z_j)}. \qquad (7.9)$$
Here, the comparison group at case time $t_\ell$ is the part of the sub-cohort $\widetilde{S}$ that is still at risk
(i.e., with $Y_j(t_\ell) = 1$), with the case $\{\ell\}$ added if this occurred outside the sub-cohort. Let
$\widetilde{Y}(t_\ell)$ be the size of this comparison group. From $\widehat{\beta}$, the cumulative baseline hazard may be
estimated by
$$\widehat{A}_{0,\mathrm{CC}}(t) = \sum_{t_\ell \le t} \frac{1}{(Y(t_\ell)/\widetilde{Y}(t_\ell)) \sum_{j \in \widetilde{S} \cup \{\ell\}} Y_j(t_\ell) \exp(\widehat{\beta}^T Z_j)}, \qquad (7.10)$$
which is the Breslow estimator (3.18) with the sum over the remaining sub-cohort at time $t_\ell$
up-weighted to represent the sum over the full risk set at that time. Large sample properties
of (7.9) and (7.10) were discussed by Self and Prentice (1988), and modifications of the
estimating equations by Borgan and Samuelsen (2014). Thus, all cases still at risk at $t_\ell$ may
be included in the comparison group when equipped with suitable weights.
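One increment of the up-weighted Breslow sum in (7.10) can be sketched as follows (a hypothetical helper; as a sanity check, with $\widehat{\beta} = 0$ the increment reduces to $1/Y(t_\ell)$, the Nelson-Aalen increment for the full cohort):

```python
import numpy as np

def breslow_increment_cc(z_comparison, n_at_risk_full, beta_hat):
    """One term of (7.10). z_comparison holds the covariate values of the
    sub-cohort members still at risk at the case time, with the case
    appended if it occurred outside the sub-cohort; its length is the
    comparison-group size, and n_at_risk_full is the full risk-set size."""
    z = np.asarray(z_comparison, dtype=float)
    y_tilde = z.size
    sampled_sum = np.exp(beta_hat * z).sum()
    # Up-weight the sampled sum by the ratio of full to sampled risk-set size.
    return 1.0 / ((n_at_risk_full / y_tilde) * sampled_sum)

inc = breslow_increment_cc([0.0, 1.0, 1.0], n_at_risk_full=100, beta_hat=0.0)
print(inc)  # approximately 0.01 = 1/Y(t_l)
```

Summing such increments over the case times $t_\ell \le t$ gives the estimate $\widehat{A}_{0,\mathrm{CC}}(t)$.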
7.5.3 Discussion
Figures 7.1 and 7.2 illustrate the basic ideas in the two sampling designs. In the nested
case-control design, controls are sampled at the observed failure times, while, in the case-
cohort design, the sub-cohort is sampled at time t = 0 and used throughout the period of
observation. It follows that the latter design is useful when more than one case series is of
interest in a given cohort, because the same sub-cohort may be used as comparison group
for all cases. This was the situation in the study by Petersen et al. (2005) where mortality
rates from a number of different causes were analyzed. In the nested case-control design,
controls are matched on time, a feature that was useful in the study by Josefsson et al.
(2000) because the smears from cases and matched controls had the same storage time and,
thereby, ‘similar quality’. If, in a nested case-control study, more case series are studied,
then new controls must be sampled for each new series since the failure times will typically
differ among case series. However, Støer and Samuelsen (2012) discussed how to re-use controls
among case series in such a situation.
In situations where both designs are an option, one may wonder about their relative
efficiencies. It appears that the efficiencies of the two designs are quite similar when based on
similar numbers of subjects for whom covariates are ascertained. The efficiency of a nested
case-control study compared to a full cohort study has been shown to be of the order of
magnitude of (m − 1)/m, see, e.g., Borgan and Samuelsen (2014).
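This rule of thumb is easy to tabulate:

```python
# Approximate relative efficiency of a nested case-control design vs. a full
# cohort analysis: (m - 1)/m when m - 1 controls are sampled per case.
def ncc_efficiency(controls_per_case):
    m = controls_per_case + 1
    return (m - 1) / m

for c in (1, 3, 5, 10):
    print(f"{c} control(s) per case: efficiency ~ {ncc_efficiency(c):.2f}")
```

With $m - 1 = 3$ controls per case, as in the Guinea-Bissau example below, the approximate efficiency is already 0.75, and the gain from sampling additional controls diminishes quickly.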
Table 7.2 Guinea-Bissau childhood vaccination study: Estimated coefficients (and SD) for BCG
vaccination (yes vs. no) from Cox models using follow-up time as the time-variable. Adjustment for
age at entry was made, and different sampling designs were used: full cohort, nested case-control
with $m - 1 = 3$ controls per case, and case-cohort with sub-cohort size $\widetilde{m} = 664$.

Design                 $\widehat{\beta}$     SD
Full cohort           -0.347   0.146
Nested case-control   -0.390   0.174
Case-cohort           -0.389   0.166
Bibliography
Andersen, P. K., Keiding, N. (2012). Interpretability and importance of functionals in com-
peting risks and multistate models. Statist. in Med., 31:1074–1088.
Andersen, P. K., Klein, J. P., Rosthøj, S. (2003). Generalized linear models for correlated
pseudo-observations, with applications to multi-state models. Biometrika, 90:15–27.
Andersen, P. K., Liestøl, K. (2003). Attenuation caused by infrequently updated covariates
in survival analysis. Biostatistics, 4:633–649.
Andersen, P. K., Pohar Perme, M. (2008). Inference for outcome probabilities in multi-state
models. Lifetime Data Analysis, 14:405–431.
– (2010). Pseudo-observations in survival analysis. Statist. Meth. Med. Res., 19:71–99.
Andersen, P. K., Pohar Perme, M., van Houwelingen, H. C., Cook, R. J., Joly, P., Mart-
inussen, T., Taylor, J. M. G., Abrahamowicz, M., Therneau, T. M. (2021). Analysis of
time-to-event for observational studies: Guidance to the use of intensity models. Statist.
in Med., 40:185–211.
Andersen, P. K., Skovgaard, L. T. (2006). Regression with Linear Predictors. New York:
Springer.
Andersen, P. K., Syriopoulou, E., Parner, E. T. (2017). Causal inference in survival analysis
using pseudo-observations. Statist. in Med., 36:2669–2681.
Andersen, P. K., Wandall, E. N. S., Pohar Perme, M. (2022). Inference for transition prob-
abilities in non-Markov multi-state models. Lifetime Data Analysis, 28:585–604.
Anderson, J. R., Cain, K. C., Gelber, R. D. (1983). Analysis of survival by tumor response.
J. Clin. Oncol., 1:710–719.
Austin, P. C., Steyerberg, E. W., Putter, H. (2021). Fine-Gray subdistribution hazard models
to simultaneously estimate the absolute risk of different event types: Cumulative total
failure probability may exceed 1. Statist. in Med., 40:4200–4212.
Azarang, L., Scheike, T., Uña-Alvarez, J. (2017). Direct modeling of regression effects for
transition probabilities in the progressive illness-death model. Statist. in Med., 36:1964–
1976.
Balan, T. A., Putter, H. (2020). A tutorial on frailty models. Statist. Meth. Med. Res.,
29:3424–3454.
Bellach, A., Kosorok, M. R., Rüschendorf, L., Fine, J. P. (2019). Weighted NPMLE for the
subdistribution of a competing risk. J. Amer. Statist. Assoc., 114:259–270.
Beyersmann, J., Allignol, A., Schumacher, M. (2012). Competing Risks and Multistate
Models with R. New York: Springer.
Beyersmann, J., Latouche, A., Bucholz, A., Schumacher, M. (2009). Simulating competing
risks data in survival analysis. Statist. in Med., 28:956–971.
Binder, N., Gerds, T. A., Andersen, P. K. (2014). Pseudo-observations for competing risks
with covariate dependent censoring. Lifetime Data Analysis, 20:303–315.
Blanche, P. F., Holt, H., Scheike, T. H. (2023). On logistic regression with right censored
data, with or without competing risks, and its use for estimating treatment effects. Life-
time Data Analysis, 29:441–482.
Bluhmki, T., Schmoor, C., Dobler, D., Pauly, M., Finke, J., Schumacher, M., Beyersmann,
J. (2018). A wild bootstrap approach for the Aalen–Johansen estimator. Biometrics,
74:977–985.
Borgan, Ø., Goldstein, L., Langholz, B. (1995). Methods for the analysis of sampled cohort
data in the Cox proportional hazards model. Ann. Statist., 23:1749–1778.
Borgan, Ø., Samuelsen, S. O. (2014). “Nested case-control and case-cohort studies”. Hand-
book of Survival Analysis. Ed. by J. P. Klein, H. C. van Houwelingen, J. G. Ibrahim, T. H.
Scheike. Boca Raton: CRC Press. Chap. 17:343–367.
Bouaziz, O. (2023). Fast approximations of pseudo-observations in the context of right-
censoring and interval-censoring. Biom. J., 65:2200071.
Breslow, N. E. (1974). Covariance analysis of censored survival data. Biometrics, 30:89–
99.
Broström, G. (2012). Event history analysis with R. London: Chapman and Hall/CRC.
Bühler, A., Cook, R. J., Lawless, J. F. (2023). Multistate models as a framework for esti-
mand specification in clinical trials of complex diseases. Statist. in Med., 42:1368–1397.
Bycott, P., Taylor, J. M. G. (1998). A comparison of smoothing techniques for CD4 data
measured with error in a time-dependent Cox proportional hazards model. Statist. in
Med., 17:2061–2077.
Clayton, D. G., Hills, M. (1993). Statistical Models in Epidemiology. Oxford: Oxford Uni-
versity Press.
Collett, D. (2015). Modelling Survival Data in Medical Research (3rd ed.) Boca Raton:
Chapman and Hall/CRC.
Conner, S. C., Trinquart, L. (2021). Estimation and modeling of the restricted mean time
lost in the presence of competing risks. Statist. in Med., 40:2177–2196.
Cook, R. J., Lawless, J. F. (1997). Marginal analysis of recurrent events and a terminating
event. Statist. in Med., 16:911–924.
– (2007). The Statistical Analysis of Recurrent Events. New York: Springer.
– (2018). Multistate Models for the Analysis of Life History Data. Boca Raton: Chapman
and Hall/CRC.
Cook, R. J., Lawless, J. F., Lakhal-Chaieb, L., Lee, K.-A. (2009). Robust estimation of
mean functions and treatment effects for recurrent events under event-dependent censor-
ing and termination: Application to skeletal complications in cancer metastatic to bone.
J. Amer. Statist. Assoc., 104:60–75.
Cox, D. R. (1972). Regression models and life-tables. J. Roy. Statist. Soc., ser. B, 34:187–
220.
– (1975). Partial likelihood. Biometrika, 62:269–276.
Crowder, M. (2001). Classical Competing Risks. London: Chapman and Hall/CRC.
Daniel, R. M., Cousens, S. N., de Stavola, B. L., Kenward, M. G., Sterne, J. A. C. (2013).
Methods for dealing with time-dependent confounding. Statist. in Med., 32:1584–1618.
Daniel, R. M., Zhang, J., Farewell, D. (2021). Making apples from oranges: Comparing
non collapsible effect estimators and their standard errors after adjustment for different
covariate sets. Biom. J., 63:528–557.
Datta, S., Satten, G. A. (2001). Validity of the Aalen-Johansen estimators of stage oc-
cupation probabilities and Nelson-Aalen estimators of integrated transition hazards for
non-Markov models. Stat. & Prob. Letters, 55:403–411.
– (2002). Estimation of integrated transition hazards and stage occupation probabilities for
non-Markov systems under dependent censoring. Biometrics, 58:792–802.
Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. SIAM,
Philadelphia: CBMS-NSF Regional Conference Series in Applied Mathematics.
Efron, B., Tibshirani, R. (1993). An Introduction to the Bootstrap. Boca Raton: Chapman
and Hall/CRC.
Fine, J. P., Gray, R. J. (1999). A proportional hazards model for the subdistribution of a
competing risk. J. Amer. Statist. Assoc., 94:496–509.
Fine, J. P., Jiang, H., Chappell, R. (2001). On semi-competing risks data. Biometrika,
88:907–919.
Finkelstein, D. M. (1986). A proportional hazards model for interval-censored failure time
data. Biometrics, 42:845–854.
Fisher, L. D., Lin, D. Y. (1999). Time-dependent covariates in the Cox proportional-hazards
regression model. Ann. Rev. Public Health, 20:145–157.
Fix, E., Neyman, J. (1951). A simple stochastic model of recovery, relapse, death and loss
of patients. Hum. Biol., 23:205–241.
Frydman, H. (1995). Nonparametric estimation of a Markov illness-death process from
interval-censored observations, with applications to diabetes survival data. Biometrika,
82:773–789.
Frydman, H., Liu, J. (2013). Nonparametric estimation of the cumulative intensities in an
interval censored competing risks model. Lifetime Data Analysis, 19:79–99.
Frydman, H., Szarek, M. (2009). Nonparametric estimation in a Markov illness-death pro-
cess from interval censored observations with missing intermediate transition status. Bio-
metrics, 65:143–151.
Furberg, J. K., Korn, S., Overgaard, M., Andersen, P. K., Ravn, H. (2023). Bivariate pseudo-
observations for recurrent event analysis with terminal events. Lifetime Data Analysis,
29:256–287.
Furberg, J. K., Rasmussen, S., Andersen, P. K., Ravn, H. (2022). Methodological challenges
in the analysis of recurrent events for randomised controlled trials with application to
cardiovascular events in LEADER. Pharmaceut. Statist., 21:241–267.
Gerds, T. A., Scheike, T. H., Andersen, P. K. (2012). Absolute risk regression for competing
risks: interpretation, link functions, and prediction. Statist. in Med., 31:3921–3930.
Geskus, R. (2016). Data Analysis with Competing Risks and Intermediate States. Boca
Raton: Chapman and Hall/CRC.
Ghosh, D., Lin, D. Y. (2000). Nonparametric analysis of recurrent events and death. Bio-
metrics, 56:554–562.
– (2002). Marginal regression models for recurrent and terminal events. Statistica Sinica,
12:663–688.
Gill, R. D., Johansen, S. (1990). A survey of product-integration with a view towards ap-
plication in survival analysis. Ann. Statist., 18:1501–1555.
Glidden, D. V. (2000). A two-stage estimator of the dependence parameter for the Clayton-
Oakes model. Lifetime Data Analysis, 6:141–156.
– (2002). Robust inference for event probabilities with non-Markov event data. Biometrics,
58:361–368.
Glidden, D. V., Vittinghoff, E. (2004). Modelling clustered survival data from multicentre
clinical trials. Statist. in Med., 23:369–388.
Gran, J. M., Lie, S. A., Øyeflaten, I., Borgan, Ø., Aalen, O. O. (2015). Causal inference in
multi-state models – Sickness absence and work for 1145 participants after work reha-
bilitation. BMC Publ. Health, 15:1082.
Graw, F., Gerds, T. A., Schumacher, M. (2009). On pseudo-values for regression analysis
in competing risks models. Lifetime Data Analysis, 15:241–255.
Grøn, R., Gerds, T. A. (2014). “Binomial regression models”. Handbook of Survival Analy-
sis. Ed. by J. P. Klein, H. C. van Houwelingen, J. G. Ibrahim, T. H. Scheike. Boca Raton:
CRC Press. Chap. 11:221–242.
Gunnes, N., Borgan, Ø., Aalen, O. O. (2007). Estimating stage occupation probabilities in
non-Markov models. Lifetime Data Analysis, 13:211–240.
Henderson, R., Diggle, P., Dobson, A. (2000). Joint modelling of longitudinal measure-
ments and event time data. Biostatistics, 1:465–480.
Hernán, M. A., Robins, J. M. (2020). Causal Inference: What If. Boca Raton: Chapman
and Hall/CRC.
Hougaard, P. (1986). A class of multivariate failure time distributions. Biometrika, 73:671–
678.
– (1999). Multi-state models: a review. Lifetime Data Analysis, 5:239–264.
– (2000). Analysis of Multivariate Survival Data. New York: Springer.
– (2022). Choice of time scale for analysis of recurrent events data. Lifetime Data Analysis,
28:700–722.
Huang, C., Wang, M. (2004). Joint modeling and estimation for recurrent event processes
and failure time data. J. Amer. Statist. Assoc., 99:1153–1165.
Hudgens, M. G., Satten, G. A., Longini, I. M. (2001). Nonparametric maximum likelihood
estimation for competing risks survival data subject to interval censoring and truncation.
Biometrics, 57:74–80.
Iacobelli, S., Carstensen, B. (2013). Multiple time scales in multi-state models. Statist. in
Med., 32:5315–5327.
Jackson, C. (2011). Multi-state models for panel data: the msm package for R. J. Statist.
Software, 38:1–27.
Jacobsen, M., Martinussen, T. (2016). A note on the large sample properties of estimators
based on generalized linear models for correlated pseudo-observations. Scand. J. Statist.,
43:845–862.
Jaeckel, L. A. (1972). The Infinitesimal Jackknife. Tech. rep. Bell Laboratories, MM 72-
1215-11.
Janvin, M., Young, J. G., Ryalen, P. C., Stensrud, M. J. (2023). Causal inference with re-
current and competing events. Lifetime Data Analysis. (in press).
Jensen, H., Benn, C. S., Nielsen, J., Lisse, I. M., Rodrigues, A., Andersen, P. K., Aaby, P.
(2007). Survival bias in observational studies of the effect of routine immunisations on
childhood survival. Trop. Med. Int. Health, 12:5–14.
Johansen, M. N., Lundbye-Christensen, S., Parner, E. T. (2020). Regression models using
parametric pseudo-observations. Statist. in Med., 39:2949–2961.
Joly, P., Commenges, D., Helmer, C., Letenneur, L. (2002). A penalized likelihood ap-
proach for an illness-death model with interval-censored data: application to age-specific
incidence of dementia. Biostatistics, 3:433–443.
Josefsson, A. M., Magnusson, P. K. E., Ylitalo, N., Sørensen, P., Qwarforth-Tubbin, P., An-
dersen, P. K., Melbye, M., Adami, H.-O., Gyllensten, U. B. (2000). Viral load of human
papilloma virus 16 as a determinant for development of cervical carcinoma in situ: a
nested case-control study. The Lancet, 355:2189–2193.
Kalbfleisch, J. D., Lawless, J. F. (1985). The analysis of panel data under a Markov as-
sumption. J. Amer. Statist. Assoc., 80:863–871.
Kalbfleisch, J. D., Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data.
(2nd ed. 2002). New York: Wiley.
Kaplan, E. L., Meier, P. (1958). Non-parametric estimation from incomplete observations.
J. Amer. Statist. Assoc., 53:457–481, 562–563.
Keiding, N. (1998). “Lexis diagram”. Encyclopedia of Biostatistics vol. 3. New York: Wi-
ley:2232–2234.
Kessing, L. V., Hansen, M. G., Andersen, P. K., Angst, J. (2004). The predictive effect
of episodes on the risk of recurrence in depressive and bipolar disorder - a life-long
perspective. Acta Psych. Scand., 109:339–344.
Kessing, L. V., Olsen, E. W., Andersen, P. K. (1999). Recurrence in affective disorder:
Analyses with frailty models. Amer. J. Epidemiol., 149:404–411.
Kristensen, I., Aaby, P., Jensen, H. (2000). Routine vaccinations and child survival: follow
up study in Guinea-Bissau, West Africa. Br. Med. J., 321:1435–1438.
Larsen, B. S., Kumarathurai, P., Falkenberg, J., Nielsen, O. W., Sajadieh, A. (2015). Exces-
sive atrial ectopy and short atrial runs increase the risk of stroke beyond atrial fibrillation.
J. Amer. College Cardiol., 66:232–241.
Latouche, A., Allignol, A., Beyersmann, J., Labopin, M., Fine, J. P. (2013). A competing
risks analysis should report results on all cause-specific hazards and cumulative inci-
dence functions. J. Clin. Epidemiol., 66:648–653.
Lawless, J. F., Nadeau, J. C. (1995). Some simple robust methods for the analysis of recur-
rent events. Technometrics, 37:158–168.
Li, J., Scheike, T. H., Zhang, M.-J. (2015). Checking Fine and Gray subdistribution hazards
model with cumulative sums of residuals. Lifetime Data Analysis, 21:197–217.
Li, Q. H., Lagakos, S. W. (1997). Use of the Wei-Lin-Weissfeld method for the analysis of
a recurrent and a terminating event. Statist. in Med., 16:925–940.
Lin, D. Y. (1994). Cox regression analysis of multivariate failure time data: the marginal
approach. Statist. in Med., 13:2233–2247.
Lin, D. Y., Oakes, D., Ying, Z. (1998). Additive hazards regression models with current
status data. Biometrika, 85:289–298.
Lin, D. Y., Wei, L. J. (1989). The robust inference for the Cox proportional hazards model.
J. Amer. Statist. Assoc., 84:1074–1078.
Lin, D. Y., Wei, L. J., Yang, I., Ying, Z. (2000). Semiparametric regression for the mean
and rate functions of recurrent events. J. Roy. Statist. Soc., ser. B, 62:711–730.
Lin, D. Y., Wei, L. J., Ying, Z. (1993). Checking the Cox model with cumulative sums of
martingale-based residuals. Biometrika, 80:557–572.
– (2002). Model-checking techniques based on cumulative residuals. Biometrics, 58:1–12.
Lin, D. Y., Ying, Z. (1994). Semiparametric analysis of the additive risk model. Biometrika,
81:61–71.
Lindsey, J. C., Ryan, L. M. (1993). A three-state multiplicative model for rodent tumori-
genicity experiments. J. Roy. Statist. Soc., ser. C, 42:283–300.
Liu, L., Wolfe, R. A., Huang, X. (2004). Shared frailty models for recurrent events and a
terminal event. Biometrics, 60:747–756.
Lombard, M., Portmann, B., Neuberger, J., Williams, R., Tygstrup, N., Ranek, L., Ring-
Larsen, H., Rodes, J., Navasa, M., Trepo, C., Pape, G., Schou, G., Badsberg, J. H., An-
dersen, P. K. (1993). Cyclosporin A treatment in primary biliary cirrhosis: results of a
long-term placebo controlled trial. Gastroenterology, 104:519–526.
Lu, C., Goeman, J., Putter, H. (2023). Maximum likelihood estimation in the additive haz-
ards model. Biometrics, 28:700–722.
Lu, X., Tsiatis, A. A. (2008). Improving the efficiency of the log-rank test using auxiliary
covariates. Biometrika, 95:679–694.
Malzahn, N., Hoff, R., Aalen, O. O., Mehlum, I. S., Putter, H., Gran, J. M. (2021). A hybrid
landmark Aalen-Johansen estimator for transition probabilities in partially non-Markov
multi-state models. Lifetime Data Analysis, 27:737–760.
Mao, L., Lin, D. Y. (2016). Semiparametric regression for the weighted composite endpoint
of recurrent and terminal events. Biostatistics, 17:390–403.
– (2017). Efficient estimation of semiparametric transformation models for the cumulative
incidence of competing risk. J. Roy. Statist. Soc., ser. B, 79:573–587.
Mao, L., Lin, D. Y., Zeng, D. (2017). Semiparametric regression analysis of interval-
censored competing risks data. Biometrics, 73:857–865.
Marso, S. P., Daniels, G. H., Brown-Frandsen, K., Kristensen, P., Mann, J. F. E., Nauck,
M. A., Nissen, S. E., Pocock, S., Poulter, N. R., Ravn, L. S., Steinberg, W. M., Stockner,
M., Zinman, B., Bergenstal, R. M., Buse, J. B., for the LEADER steering committee
(2016). Liraglutide and Cardiovascular Outcomes in Type 2 Diabetes. New Engl. J. Med.,
375:311–322.
Martinussen, T., Scheike, T. H. (2006). Dynamic Regression Models for Survival Data.
New York: Springer.
Martinussen, T., Vansteelandt, S., Andersen, P. K. (2020). Subtleties in the interpretation of
hazard contrasts. Lifetime Data Analysis, 26:833–855.
Meira-Machado, L., Uña-Alvarez, J., Cadarso-Suárez, C. (2006). Nonparametric estimation
of transition probabilities in a non-Markov illness-death model. Lifetime Data Analysis,
12:325–344.
Mitton, L., Sutherland, H., Weeks, M., (eds.) (2000). Microsimulation Modelling for Policy
Analysis. Challenges and Innovations. Cambridge: Cambridge University Press.
Nielsen, G. G., Gill, R. D., Andersen, P. K., Sørensen, T. I. A. (1992). A counting process
approach to maximum likelihood estimation in frailty models. Scand. J. Statist., 19:25–
43.
O’Hagan, A., Stevenson, M., Madan, J. (2007). Monte Carlo probabilistic sensitivity anal-
ysis for patient level simulation models: efficient estimation of mean and variance using
ANOVA. Health Economics, 16:1009–1023.
O’Keefe, A. G., Su, L., Farewell, V. T. (2018). Correlated multistate models for multiple
processes: An application to renal disease progression in systemic lupus erythematosus.
Appl. Statist., 67:841–860.
Overgaard, M. (2019). State occupation probabilities in non-Markov models. Math. Meth.
Statist., 28:279–290.
Overgaard, M., Andersen, P. K., Parner, E. T. (2023). Pseudo-observations in a multi-state
setting. The Stata Journal, 23:491–517.
Overgaard, M., Parner, E. T., Pedersen, J. (2017). Asymptotic theory of generalized esti-
mating equations based on jack-knife pseudo-observations. Ann. Statist., 45:1988–2015.
– (2019). Pseudo-observations under covariate-dependent censoring. J. Statist. Plan. and
Inf., 202:112–122.
Parner, E. T., Andersen, P. K., Overgaard, M. (2023). Regression models for censored time-
to-event data using infinitesimal jack-knife pseudo-observations, with applications to
left-truncation. Lifetime Data Analysis, 29:654–671.
Pavlič, K., Martinussen, T., Andersen, P. K. (2019). Goodness of fit tests for estimating
equations based on pseudo-observations. Lifetime Data Analysis, 25:189–205.
Pepe, M. S. (1991). Inference for events with dependent risks in multiple endpoint studies.
J. Amer. Statist. Assoc., 86:770–778.
Pepe, M. S., Longton, G., Thornquist, M. (1991). A qualifier Q for the survival function to
describe the prevalence of a transient condition. Statist. in Med., 10:413–421.
Petersen, L., Andersen, P. K., Sørensen, T. I. A. (2005). Premature death of adult adoptees:
Analyses of a case-cohort sample. Gen. Epidemiol., 28:376–382.
Prentice, R. L. (1986). A case-cohort design for epidemiologic cohort studies and disease
prevention trials. Biometrika, 73:1–11.
Prentice, R. L., Gloeckler, L. A. (1978). Regression analysis of grouped survival data with
application to breast cancer data. Biometrics, 34:57–67.
Prentice, R. L., Kalbfleisch, J. D., Peterson, A. V., Flournoy, N., Farewell, V. T., Breslow,
N. E. (1978). The analysis of failure times in the presence of competing risks. Biometrics,
34:541–554.
Prentice, R. L., Williams, B. J., Peterson, A. V. (1981). On the regression analysis of mul-
tivariate failure time data. Biometrika, 68:373–379.
Prentice, R. L., Zhao, S. (2020). The Statistical Analysis of Multivariate Failure Time Data.
Boca Raton: Chapman and Hall/CRC.
Preston, S., Heuveline, P., Guillot, M. (2000). Demography: Measuring and Modeling Pop-
ulation Processes. New York: Wiley.
PROVA study group (1991). Prophylaxis of first hemorrhage from esophageal varices
by sclerotherapy, propranolol or both in cirrhotic patients: A randomized multicenter
trial. Hepatology, 14:1016–1024.
Putter, H., Fiocco, M., Geskus, R. B. (2007). Tutorial in biostatistics: competing risks and
multi-state models. Statist. in Med., 26:2389–2430.
Putter, H., Schumacher, M., van Houwelingen, H. C. (2020). On the relation between the
cause-specific hazard and the subdistribution rate for competing risks data: The Fine-
Gray model revisited. Biom. J, 62:790–807.
Putter, H., Spitoni, C. (2018). Non-parametric estimation of transition probabilities in non-
Markov multi-state models: The landmark Aalen-Johansen estimator. Statist. Meth. Med.
Res., 27:2081–2092.
Putter, H., van Houwelingen, H. C. (2015). Frailties in multi-state models: Are they identi-
fiable? Do we need them? Statist. Meth. Med. Res., 24:675–692.
– (2022). Landmarking 2.0: Bridging the gap between joint models and landmarking.
Statist. in Med., 41:1901–1917.
Rizopoulos, D. (2012). Joint Models for Longitudinal and Time-to-Event Data. Boca Ra-
ton: Chapman and Hall/CRC.
Rodriguez-Girondo, M., Uña-Alvarez, J. (2012). A nonparametric test for Markovianity in
the illness-death model. Statist. in Med., 31:4416–4427.
Rondeau, V., Mathoulin-Pelissier, S., Jacqmin-Gadda, H., Brouste, V., Soubeyran, P.
(2007). Joint frailty models for recurring events and death using maximum penalized
likelihood estimation: Application on cancer events. Biostatistics, 8:708–721.
Rosenbaum, P. R., Rubin, D. B. (1983). The central role of the propensity score in obser-
vational studies for causal effects. Biometrika, 70:41–55.
Royston, P., Parmar, M. K. B. (2002). Flexible parametric proportional-hazards and
proportional-odds models for censored survival data, with application to prognostic mod-
elling and estimation of treatment effects. Statist. in Med., 21:2175–2197.
Rutter, C. M., Zaslavsky, A. M., Feuer, E. J. (2011). Dynamic microsimulation models for
health outcomes: a review. Med. Decision Making, 31:10–18.
Sabathé, C., Andersen, P. K., Helmer, C., Gerds, T. A., Jacqmin-Gadda, H., Joly, P. (2020).
Regression analysis in an illness-death model with interval-censored data: a pseudo-
value approach. Statist. Meth. Med. Res., 29:752–764.
Scheike, T. H., Zhang, M.-J. (2007). Direct modelling of regression effects for transition
probabilities in multistate models. Scand. J. Statist., 34:17–32.
Scheike, T. H., Zhang, M.-J., Gerds, T. A. (2008). Predicting cumulative incidence proba-
bility by direct binomial regression. Biometrika, 95:205–220.
Self, S. G., Prentice, R. L. (1988). Asymptotic distribution theory and efficiency results for
case-cohort studies. Ann. Statist., 16:64–81.
Shih, J. H., Louis, T. A. (1995). Inferences on the association parameter in copula models for bivariate survival data. Biometrics, 51:1384–1399.
Shu, Y., Klein, J. P., Zhang, M.-J. (2007). Asymptotic theory for the Cox semi-Markov
illness-death model. Lifetime Data Analysis, 13:91–117.
Spiekerman, C. F., Lin, D. Y. (1998). Marginal regression models for multivariate failure time data. J. Amer. Statist. Assoc., 93:1164–1175.
Støer, N., Samuelsen, S. O. (2012). Comparison of estimators in nested case-control studies
with multiple outcomes. Lifetime Data Analysis, 18:261–283.
Suissa, S. (2008). Immortal time bias in pharmacoepidemiology. Amer. J. Epidemiol., 167:492–499.
Sun, J. (2006). The Statistical Analysis of Interval-censored Failure Time Data. New York:
Springer.
Sverdrup, E. (1965). Estimates and test procedures in connection with stochastic models for
deaths, recoveries and transfers between different states of health. Skand. Aktuarietidskr.,
48:184–211.
Szklo, M., Nieto, F. J. (2014). Epidemiology. Beyond the Basics. Burlington: Jones and
Bartlett.
Thomas, D. C. (1977). Addendum to ‘Methods of cohort analysis: appraisal by application to asbestos mining’ by F. D. K. Liddell, J. C. McDonald, D. C. Thomas. J. Roy. Statist. Soc., ser. A, 140:469–491.
Tian, L., Zhao, L., Wei, L. J. (2014). Predicting the restricted mean event time with the
subject’s baseline covariates in survival analysis. Biostatistics, 15:222–233.
Titman, A. C. (2015). Transition probability estimates for non-Markov multi-state models.
Biometrics, 71:1034–1041.
Titman, A. C., Putter, H. (2022). General tests of the Markov property in multi-state models.
Biostatistics, 23:380–396.
Tsiatis, A. A. (1975). A nonidentifiability aspect of the problem of competing risks. Proc.
Nat. Acad. Sci. USA, 72:20–22.
Tsiatis, A. A., Davidian, M. (2004). Joint modeling of longitudinal and time-to-event data:
An overview. Statistica Sinica, 14:809–834.
Turnbull, B. W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. J. Roy. Statist. Soc., ser. B, 38:290–295.
Uña-Alvarez, J., Meira-Machado, L. (2015). Nonparametric estimation of transition prob-
abilities in the non-Markov illness-death model: A comparative study. Biometrics,
71:364–375.
van den Hout, A. (2020). Multi-State Survival Models for Interval-Censored Data. Boca
Raton: Chapman and Hall/CRC.
van der Laan, M. J., Rose, S. (2011). Targeted Learning. Causal Inference for Observa-
tional and Experimental Data. New York: Springer.
van Houwelingen, H. C. (2007). Dynamic prediction by landmarking in event history anal-
ysis. Scand. J. Statist., 34:70–85.
van Houwelingen, H. C., Putter, H. (2012). Dynamic Prediction in Clinical Survival Anal-
ysis. Boca Raton: Chapman and Hall/CRC.
Wei, L. J., Glidden, D. V. (1997). An overview of statistical methods for multiple failure
time data in clinical trials. Statist. in Med., 16:833–839.
Wei, L. J., Lin, D. Y., Weissfeld, L. (1989). Regression analysis of multivariate incomplete
failure time data by modeling marginal distributions. J. Amer. Statist. Assoc., 84:1065–
1073.
Westergaard, T., Andersen, P. K., Pedersen, J. B., Frisch, M., Olsen, J. H., Melbye, M.
(1998). Testis cancer risk and maternal parity: a population-based cohort study. Br. J.
Cancer, 77:1180–1185.
Xu, J., Kalbfleisch, J. D., Tai, B. (2010). Statistical analysis of illness-death processes and
semicompeting risks data. Biometrics, 66:716–725.
Yashin, A., Arjas, E. (1988). A note on random intensities and conditional survival func-
tions. J. Appl. Prob., 25:630–635.
Zeng, D., Mao, L., Lin, D. Y. (2016). Maximum likelihood estimation for semiparametric
transformation models with interval-censored data. Biometrika, 103:253–271.
Zheng, M., Klein, J. P. (1995). Estimates of marginal survival for dependent competing
risks based on an assumed copula. Biometrika, 82:127–138.
Zhou, B., Fine, J. P., Latouche, A., Labopin, M. (2012). Competing risks regression for
clustered data. Biostatistics, 13:371–383.
Subject Index
Page numbers followed by * refer to a (*)-marked section, and page numbers followed by
an italic b refer to a summary box.
Aalen additive hazard model, 54–58, 65b, 80–81*
  piece-wise constant baseline, 58, 81*
  semi-parametric, 81*
Aalen-Johansen estimator, 126, 167*
  confidence limits, 127
  for general Markov process, 164*
  intuitive argument, 126
  IPCW version, 238
  state occupation probability, 171*
absorbing state, 18, 31*
adapted covariate, 71*, 87–88
additive hazard model, see Aalen additive hazard model
administrative censoring, 29
AG model, 63, 65b
at risk, 21
  total number, 21
at-risk
  function, 22
  indicator, 21
  process, 21
avoidable event, 1, 28
bone marrow transplantation, 9–10
  analyzed as clustered data, 148
  expected length of stay, 133
  frailty model, 112
  illness-death model, 133
  joint Cox models, 105
  landmarking, 178–179
  marginal hazard models, 151
  non-Markov model, 211
  prevalence, 133
  time-dependent covariate, 96–100
bootstrap, 26
box-and-arrows diagram, see diagram
Breslow estimator, 43, 77*
case-cohort study, 259*
causal inference, 27, 250–251, 252*
cause-specific hazard, see hazard
cause-specific time lost, see time lost
censoring, 29b
  administrative, 29
  and competing risks, 28
  drop-out event, 29
  independent, 1, 27–29, 153–156, 159b
    definition, 28
    lack of test for, 29
  interval, 245, 246–248*
  investigation of, 153–155
  left, 246*
  non-informative, 71*
  right, 1, 27
Chapman-Kolmogorov equations, 164*
cloglog, see link function
clustered data, see also frailty model
  marginal hazard model, 148, 203*
cohort sampling, 257–260
collapsibility, 26
compensator, 33*
competing risks, 1, 27–29, 29b
  as censoring, 28
  diagram, 4, 19
  direct model, 136–141, 193–196*
  in marginal hazard model, 149
  independent, 156, 159b
  interval-censoring, 247*
  latent failure times, 157*
  plug-in, 125–130, 166–168*
  time lost, 30*, 130
composite end-point, 3, 9
condition on the future, 18, 101, 102, 198
condition on the past, 18
conditional multiplier theorem, 206*
conditional parameter, 20b
confounder, 8, 250
consistency, see causal inference
Cook-Lawless estimator, 143, 198*
Copenhagen Holter study, 10–12
copula, 250*
counterfactual, see causal inference
counting process, 21, 32, 33*, 69*
  format, 21
  jump, 22
covariate, 2, 25
  adapted, 71*, 87–88, 100b
  endogenous, 88
  exogenous, 88
  external, 88
  internal, 88
  non-adapted, 71*, 88
  time-dependent, 71*, 87–89, 94b, 100b
    Cox model, 88, 89*
    cumulative baseline hazard, 88
    immortal time bias, 102
    inference, 88–89, 89*
    interpretation, 101
    partial likelihood, 88, 89*
    type-specific, 103, 202*
Cox model, 41–44, 54, 65b, 76–78*
  baseline hazard, 41
  checking proportional hazards, 50, 95, 208*
  cloglog link, 121
  interpretation, 41, 44
  Jacod formula, 76*
  large-sample inference, 77*
  linear predictor, 43
  marginal, see marginal hazard model
  martingale residuals, 207*
  multiple, 43
  multivariate, 202–203*
  partial likelihood, 43, 76*
  profile likelihood, 76*
  Schoenfeld residuals, 208*
  score function, 76*, 201*
  stratified, 51, 77*
  survival function, 121, 166*
  time-dependent covariate, 88, 89*
  time-dependent strata, 103
  versus Poisson model, 51–53
Cox partial likelihood, 43, 76*
  intuition, 43
  time-dependent covariate, 88, 89*
cumulative hazard, 36
  interpretation, 36
  non-parametric estimator, see Nelson-Aalen estimator
cumulative incidence function, 20, 30*, 131b
  area under, 130
  biased estimator using Kaplan-Meier, 127
  cloglog link, 137, 193*
  direct model, 136, 193*
  etymology, 20
  Fine-Gray model, see Fine-Gray model
  intuition, 126
  non-parametric estimator, see Aalen-Johansen estimator
  plug-in, 126, 167*
  prediction of, 138
  pseudo-values, 233–234
cumulative mean function, see mean function
cumulative regression function, 54
current status data, 246*
data duplication trick, 105, 177
delayed entry, 13, 24, 58–61, 61b, 71*
  choice of time-variable, 24
  GEE, 192*
  independent, 61, 69*
  logrank test, 59
  pseudo-values, 221
dependent data, see frailty model and marginal hazards model
diagram
  competing risks model, 4, 19
  illness-death model, 7
  illness-death model with recovery, 8
  recurrent episodes, 8
  recurrent events, 8
  two-state model, 3, 19
direct binomial regression, 200–201*
  link function, 200*
direct marginal model, see direct model
direct model, 2, 134–146, 147b, 190–201*
  competing risk, 193–196*
  cumulative incidence function, 136, 193*
  GEE, 191–192*
  link function, 191*
  recurrent events, 142, 196*
  restricted mean, 136, 192*
  sandwich estimator, 192*
  state occupation probability, 200–201*
  survival function, 135, 192*
  time lost, 139, 196*
  two-state model, 192–193*
disability model, 6, see also illness-death model
discrete event simulation, see micro-simulation
Doob-Meyer decomposition, 33*
drop-out event, 29
event history data, 1
examples, 13b
  bone marrow transplantation, 9
  Copenhagen Holter study, 10
  Guinea-Bissau study, 4
  LEADER trial, 9
  PBC3 trial, 2
  PROVA trial, 6
  recurrent episodes in affective disorders, 7
  small set of survival data, 14
  testis cancer study, 5
exchangeability, see causal inference
expected length of stay, 16, 30*, 133, 164*
  plug-in, 168*
expected time spent in state, see expected length of stay
failure distribution function, 15, 30*
Fine-Gray model, 136, 142b, 193*
  interpretation, 138
frailty model, 110–114
  censoring assumption, 111*
  clustered data, 111
  frailty distribution, 111
  inference, 110*
  recurrent events, 112
  shared, 111, 249–250*
  two-stage estimation, 249–250*
functional delta-method, 165*
gap time model, 63
GEE, 135, 152b, 191–192*
  delayed entry, 192*
  generalized linear model, 191*
  IPCW, 192*, 194*, 197*, 199*
  sandwich estimator, 152b, 192*
generalized estimating equation, see GEE
generalized linear model, 191*
g-formula, 26, 252*
Ghosh-Lin model, 144, 199*
goodness-of-fit, see residuals
Greenwood formula, 118, 165*
Guinea-Bissau study, 4–5
  Breslow estimate, 60
  case-cohort, 260
  Cox model, 59, 82–83
  delayed entry, 58–61
  nested case-control, 260
hazard, 65b, see also intensity
  cause-specific, 18, 32*, 125
  difference, 54, 80*
    pseudo-values, 229
  integrated, see cumulative hazard
  marginal, 148, 152b, 202*
  one-to-one with survival function, 20, 120
  ratio, 41
  sub-distribution, 138, 193*
illness-death model, 6
  diagram, 7
  expected length of stay, 133
  interval-censoring, 247–248*
  intuitive argument, 132
  irreversible, 6
  marginal hazard model, 150, 204*
  plug-in, 131–134, 168*
  prevalence, 133
  progressive, 6, 168*
  semi-Markov, 132
  time with disability, 30
  with recovery, 8
immortal time bias, 102
improper random variable, 157, 167*, 187, 193*, 204*, 248
incomplete observation, see censoring
independent censoring, see censoring
independent delayed entry, see delayed entry
indicator function, 21
influence function, 238–241*
integrated hazard, see cumulative hazard
integrated intensity process, 33*
intensity, 1, 2, 18, 65b
  models with shared parameters, 103–105
    likelihood function, 106–109*
  non-parametric model, 73*
intensity process, 33*, 69*
intermediate variable, 90
intermittent observation, 246
interval-censoring, 245–248*
  competing risks, 247*
  illness-death model, 247–248*
  Markov process, 246*
  two-state model, 246*
inverse probability of
  censoring weight, 192*, see also GEE
  survival weight, 199*
  treatment weight, 252–253*
IPCW, 192*, see also GEE
irreversible model, 6, see also illness-death model
Jacod formula, 70*
joint Cox models, see intensity models with shared parameters
joint models, 254–257
  likelihood, 256*
Kaplan-Meier estimator, 117–120, 165*
  conditional, 165*
  confidence limits, 118, 166*
  intuitive argument, 118
  IPCW version, 237
  variance estimator, 165*
Kolmogorov forward differential equations, 164*
landmarking, 172*, 176–183
  Aalen-Johansen estimator, 172*
  bone marrow transplantation, 178–179
  estimating equations, 181*
  joint models, 257
  super model, 177–178
latent failure times, see competing risks
LEADER trial, 9
  AG model, 65
  bivariate pseudo-values, 235–236
  Cook-Lawless estimator, 237
  frailty model, 113
  Ghosh-Lin model, 145
  intensity models for recurrent myocardial infarctions, 64–65
  Mao-Lin model, 213
  PWP model, 65
left-truncation, see delayed entry
Lexis diagram, 85
likelihood function, 69–73*
  factorization, 71*, 72b
  Jacod formula, 70*
  multinomial experiment, 70*
  two-state model, 72*
likelihood ratio test, 38
linear predictor, 25
  checking interactions, 48
  checking linear effect, 46
link function, 25, 135, 191*, 200*
  cloglog, 121, 135, 227, 233
  identity, 136, 231, 233
  logarithm, 136, 233, 234
  –logarithm, 135, 229
  logit, 233
  pseudo-values, 223
logrank test, 38, 75*
  as score test, 77*
  delayed entry, 59
  stratified, 40, 77*
long data format, 21
LWYY model, 143, 197*
Mao-Lin model, 199*
marginal Cox model, see marginal hazard model
marginal hazard model, 147–151, 152b, 201–204*
  clustered data, 148, 203*
  competing risks, 149, 203*
  illness-death model, 150, 204*
  recurrent events, 149, 203*
  robust standard deviations, 148
  time to (first) entry, 147, 201*
  WLW model, 149, 203*
marginal parameter, 14, 16, 17b
  direct model, 134–146
  failure distribution function, 15
  for recurrent events, 31
  mean function, 142
  restricted mean, 16, 30*
  state occupation probability, 15
  survival function, 15
  time lost, 130
  time to (first) entry, 16, 30*, 147
marked point process format, 21
Markov process, 18, 31*, 132, 134, 163–170*, 176b
  Aalen-Johansen estimator, 164*
  interval-censoring, 246*
  product-integral, 163*
  property, 163*
  state occupation probability, 164*
  test for assumption of, 91, 174*
  transition probability, 164*
martingale, 33*, 82b*
matrix exponential, 164*
mean function
  Cook-Lawless estimator, 143, 198*
  critique against, 145*
  Ghosh-Lin model, 144
  LWYY model, 143, 197*
  Mao-Lin model, 199*
  Nelson-Aalen estimator, 142, 197*
  terminal event, 143, 146b
micro-simulation, 2, 19, 184–190, 190b
  PROVA trial, 187–190
multi-state
  process, 15, 30*
  survival data, 1, 2b
multi-state model, 29b
  diagram, see diagram
multinomial experiment
  likelihood function, 70*
  micro-simulation, 184*
multiplicative hazard regression model, 41–53, see also Cox model and Poisson model
Nelson-Aalen estimator, 36–37, 65b, 73–75*
  confidence limits, 36, 74*
  for recurrent events, 142, 197*
  maximum likelihood interpretation, 74*
  variance estimator, 74*
nested case-control study, 258–259*
non-adapted covariate, 71*, 88
non-avoidable event, 1, 28
non-collapsibility, 26
non-informative censoring, 71*
non-Markov process, 32*, 170–175*, 176b, 234
  Nelson-Aalen estimator, 171*
  product-integral, 171*
  recurrent events, 174*
  state occupation probability, 171*
  transition probability, 172*
observational study, 13
  bone marrow transplantation, 9
  Copenhagen Holter study, 10
  Guinea-Bissau study, 4
  recurrent episodes in affective disorders, 7
  testis cancer study, 5
occurrence/exposure rate, 38, 72*, 78*, 164*
  standard deviation, 38
panel data, 245
parametric hazard model
  piece-wise constant, 37–38, 78–80*
  Poisson, 45
partial transition rate, 171*
  recurrent events, 198*
past, 18
  condition on the, 18
  history, 16
  information, 1, 31*
path, 11, 19, 31*
  micro-simulation, 184
PBC3 trial, 2–4
  Aalen model, 54–57
  analysis of censoring, 153
  Breslow estimate, 44
  cause-specific hazard, 61
  censoring, 2
  competing risks, 61
  Cox model, 43–44
    checking linearity, 46–48, 214
    checking proportional hazards, 50, 95, 215
  cumulative incidence function
    Aalen-Johansen, 127
    direct model, 138
    plug-in from Cox models, 127
  direct binomial regression, 211–213
  Fine-Gray model, 138
  g-formula, 122
  logrank test, 40
  martingale residuals, 214
  Nelson-Aalen estimate, 37
  piece-wise constant hazards, 38
  Poisson model, 45
    checking interactions, 48
    checking linearity, 46–48
    checking proportional hazards, 50
  pseudo-values, 224–234
  residuals, 213–218
  restricted mean
    direct model, 136
    plug-in, 125
  Schoenfeld residuals, 215
  survival function
    Kaplan-Meier, 119, 121
    plug-in, 121
  time lost
    direct model, 139
    plug-in, 130
  time-dependent covariate, 95–96
Pepe estimator, 132, 172*
piece-wise constant hazards, 37–38, 65b, 78–80*
  regression, 79*
piece-wise exponential model, see piece-wise constant hazards
plug-in, 2, 20, 117–134, 134b
  competing risks, 125–130, 166–168*
  cumulative incidence function, 126, 167*
  expected length of stay, 168*
  illness-death model, 131–134, 168*
  Markov process, 163–170*
  prevalence, 169*
  recurrent events, 169–170*
  restricted mean, 122, 166*
  semi-Markov process, 174–175*
  survival function, 118, 165*
  time lost, 130, 168*
  two-state model, 117–125, 165–166*
Poisson model, 45, 65b, 79*, see also piece-wise constant hazards
  checking proportional hazards, 50
  etymology, 80*
  joint models, 105*
  versus Cox model, 51–53
population
  hypothetical, 28
  sample from, 27
  without censoring, 28
positivity, see causal inference
prediction, 26, see also landmarking and joint models
prevalence, see illness-death model
  plug-in, 169*
product-integral, 70*, 163*
  non-Markov, 171*
profile likelihood, 76*
prognostic variable, 1, 2
progressive model, 170*
propensity score, see causal inference
proportional hazards, 41
  checking, 50, 95, 208*
  same as no interaction with time, 50
PROVA trial, 6–7
  analysis of censoring, 154
  Cox model, 83–85
  delayed entry, 91
  joint Cox models, 105
  logrank test, 84
  Markov, 91
  micro-simulation, 187–190
  non-Markov, 208–210
  pseudo-values, 234
  time-dependent covariate, 90–94
pseudo-observation, see pseudo-value
pseudo-values, 2, 229, 229b
  bivariate, 235–236
  covariate-dependent censoring, 236
  cumulative incidence function, 233–234
  cumulative residuals, 241*
  delayed entry, 221
  GEE, 223
  hazard difference, 229
  infinitesimal jackknife, 241*
  influence function, 238–241*
  intuition, 222–224
  link function, 223
    cloglog, 227, 233
    identity, 231, 233
    logarithm, 233, 234
    –logarithm, 229
    logit, 233
  no censoring, 222
  non-Markov process, 234
  recurrent events, 235–236
  residual plot, 228
  restricted mean, 229
  scatter plot, 224
  survival indicator, 223
  theoretical properties, 237–241*
  time lost, 233–234
  with censoring, 223
PWP model, 63
randomized trial
  LEADER, 9
  PBC3, 2
  PROVA, 6
rate, 2, 65b, see intensity or hazard
recurrent episodes, 8, see also illness-death model with recovery and recurrent events
recurrent episodes in affective disorders, 7–9
  AG model, 63
  analysis of censoring, 154
  Cook-Lawless estimator, 144
  gap time model, 63
  Ghosh-Lin model, 143
  LWYY model, 143
  mean function, 143
  PWP model, 63
  state occupation probability, 134
  time-dependent covariate, 89–90
  WLW model, 150
recurrent events
  composite end-point, 199*
  Cook-Lawless estimator, 143, 198*
  diagram, 8
  direct model, 142–146, 196*
  frailty model, 112
  Ghosh-Lin model, 144, 199*
  LWYY model, 143, 197*
  Mao-Lin model, 199*
  marginal hazard model, 149, 152b, 203*
  mean function, 142
  partial transition rate, 198*
  plug-in, 169–170*
  probability of at least h events, 170*
  progressive model, 170*
  pseudo-values, 235–236
  PWP model, 63
  terminal event, 146b, 198*
  WLW model, 149
registry-based study
  testis cancer study, 5
regression coefficient, 25
  interpretation, 25
regression function, 54
regression model, 25
residuals
  cumulative, 205–208*
  cumulative martingale, 207*
  cumulative pseudo, 241*
  cumulative sums of, 206*
  martingale, 207*, 214
  Schoenfeld, 208*, 215
  score, 208*, 215
restricted mean, 16, 30*
  direct model, 136, 192*
  plug-in, 122, 166*
  pseudo-values, 229
risk, 1, 15
risk factor, 2
risk set, 43, 77*
robust SD, see GEE
sample, 27
sandwich estimator, see GEE
Schoenfeld residuals, see residuals
semi-competing risks, 151, 158*
semi-Markov process, 31*, 91*, 132, 174–175*
semi-parametric model, 41, 57, 73*, 81*
shared parameters for intensity models, see intensity model
small set of survival data, 14
  at-risk function, 22
  counting process, 22
  delayed entry, 16
  Kaplan-Meier estimate, 16
  restricted mean, 17
software, xii
(start, stop, status) triple, 21
state
  absorbing, 18, 31*
  expected length of stay, 30*
  space, 30*
  transient, 18, 31*
state occupancy, see state occupation
state occupation probability, 18, 30*, 164*, 171*
  direct model, 200–201*
sub-distribution
  function, 30*
  hazard, see hazard
summary box
  conditional parameters, 20
  counting processes and martingales, 82
  cumulative incidence function, 131
  direct models, 147
  examples of event history data, 13
  Fine-Gray model, 142
  independent censoring/competing risks, 159
  intensity, hazard, rate, 65
  likelihood factorization, 72
  marginal hazard models, 152
  marginal parameters, 17
  Markov and non-Markov processes, 176
  mean function and terminal event, 146
  micro-simulation, 190
  model-based and robust SD, 152
  multi-state model, competing risks, and censoring, 29
  multi-state survival data, 2
  plug-in, 134
  pseudo-values, 229
  time zero, 61
  time-dependent covariate or state, 100
  time-variable and time-dependent covariates, 94
survival function, 15, 30*
  area under, 125
  cloglog link, 121
  direct model, 135, 192*
  non-parametric estimator, see Kaplan-Meier estimator
  one-to-one with hazard, 20, 120
  plug-in estimator, 118, 122, 165*
  plug-in estimator from Cox model, 166*
  pseudo-values, 223
target parameter, 25
target population, 27
terminal event, 8, 9, 143, 198*
testis cancer study, 5–6
  Lexis diagram, 85
  Poisson model, 85–86
time axis, see time-variable
time lost, 30*
  cause-specific, 130, 196*
  competing risks, 130, 196*
  direct model, 139, 196*
  plug-in, 130, 168*
  pseudo-values, 233–234
time origin, 13
time to (first) entry, see marginal parameter
time zero, 13, 61b
time-dependent covariate, see covariate
time-dependent strata, 103
time-variable, 61b, 94b
  age, 13
  calendar time, 13
  choice of, 13, 91
  delayed entry, 24
  several, 53, 94
transient state, 18, 31*
transition
  intensity, 18, 31*, 69*
  probability, 18, 31*
Turnbull estimator, 246*
two-state model, 3
  diagram, 3, 19
  direct model, 135–136, 192–193*
  interval-censoring, 246*
  likelihood function, 72*
  plug-in, 117–125, 165–166*
type-specific covariate, see covariate
utility, 199
von Mises expansion, 239*
wide data format, 21
wild bootstrap, 206*
WLW model, 149, 152b, 203*
  competing risks, 150, 203*