
Bayesian Analysis of Failure Time Data Using P-Splines


Contents

1 Introduction
1.1 Outline
1.2 Notation

2 Basic Concepts of Failure Time Analysis
2.1 Continuous Time
2.2 Discrete Time
2.3 Likelihood Construction
2.3.1 Censoring and Truncation
2.3.2 Time Varying Covariates
2.4 Relative Risk and Log-Location-Scale Family

3 Computation and Inference
3.1 MCMC
3.1.1 Metropolis-Hastings
3.1.2 Gibbs Sampler
3.2 Inference from Simulation Output
3.3 Model Diagnostics and Comparison
3.3.1 Criterion Based Methods
3.3.2 Martingale Residuals
3.4 Bayesian P-Splines
3.4.1 Basic Concepts
3.4.2 Extended Linear Predictor

4 Discrete Time Models
4.1 Estimation Based on GLM Methodology
4.1.1 Grouped Cox
4.1.2 Logistic Model
4.2 Estimation Based on Latent Variable Representation
4.2.1 Probit Model
4.2.2 Scale Mixtures of Normals
4.2.3 Grouped Cox II

5 Application I: Unemployment Durations

6 Continuous Time Models
6.1 Lognormal and Extensions
6.2 Relative Risk Models
6.2.1 Exponential Distribution
6.2.2 Weibull Distribution
6.2.3 Other Baseline Hazards
6.3 Nonparametric Relative Risk Models
6.3.1 Piecewise Exponential Hazard
6.3.2 Nonparametric Relative Risk Models

7 Application II: Crime Recidivism

8 Summary and Outlook

Appendix A: Description of R Function

Bibliography
List of Figures

2.1 Some hazard rates
2.2 Some survivor functions

3.1 B-spline basis functions of degree 0, 1, 2, 3
3.2 Function estimation via B-splines
3.3 Influence of smoothing parameter

4.1 Common response functions
4.2 Error approximation logistic

5.1 Convergence problems
5.2 Histograms of numbers of effective parameter draws for both samplers
5.3 Fixed effects - discrete time
5.4 Density curves for martingale residuals at selected points in time. The grouped Cox model with complementary log-log link can be seen to perform slightly better; the distribution of its residuals is more centered around zero.
5.5 Nonlinear effects - discrete time (a)
5.6 Nonlinear effects - discrete time (b)

6.1 (a) Hazard, (b) density, and (c) survivor function of the lognormal distribution
6.2 (a) Hazard, (b) density, and (c) survivor function of the Weibull distribution
6.3 Example of a bathtub-shaped hazard

7.1 Fixed effects - continuous time
7.2 Effective number of parameters - continuous time
7.3 Baseline hazard for the piecewise constant model with 5% and 95% quantiles
7.4 Baseline hazard for the lognormal model with 5% and 95% posterior quantiles. Note that the posterior quantiles for the lognormal model are very narrow and barely visible.
7.5 Estimated nonlinear effects - continuous time (a)
7.6 Estimated nonlinear effects - continuous time (b)
7.7 Selected sampling paths
List of Tables

2.1 Discrete data
2.2 Discrete data - longitudinal
2.3 Episode splitting

3.1 Loss functions and corresponding estimators

4.1 Some mixtures

5.1 Model assessment

6.1 Data set expansion for piecewise constant model, given knots 0, 2, 5, 8

7.1 Model assessment

1 Introduction

Failure time analysis is a form of regression analysis in which the time until an event occurs is of interest. In this thesis, the event is generically referred to as failure, and the observational units are referred to as individuals.
Unlike most regression models, failure time models are generally not formulated for the conditional expectation. Instead, most regression models for failure time analysis are formulated in terms of the hazard rate, which gives the risk of failure and is defined precisely below. A general formulation for the hazard rate is (Cox and Oakes 1984, p. 70):

$$h(t \mid \mathbf{z}_i, \boldsymbol{\beta}) = h_0(t)\,\rho(\beta_1 z_{i1} + \dots + \beta_k z_{ik}) = h_0(t)\,\rho(\eta_i).$$

Here, the baseline hazard $h_0(t)$ gives the hazard of an individual under standard conditions, corresponding to $\mathbf{z}_i = \mathbf{0}$; $\eta_i = \mathbf{z}_i'\boldsymbol{\beta}$ is the linear predictor, and $\rho(\cdot)$ is a nonnegative function satisfying $\rho(0) = 1$. Splines allow linear effects of the form $\mathbf{z}_i'\boldsymbol{\beta}$ in the linear predictor to be replaced by more general functions. This is useful for flexible modeling of the baseline hazard, treating time formally like a covariate. A spline is a function consisting of local polynomials that are joined together at points (knots) in the domain of the covariate. Splines can be understood as a regression model: every spline can be written as a weighted sum of basis functions depending on a covariate, hence as a regression model whose regression coefficients are the weights.
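To make the basis-function view concrete, the following Python/NumPy sketch (an illustration, not code from the thesis, whose software is written in R) evaluates B-spline basis functions with the Cox-de Boor recursion on an equidistant knot grid and forms a spline as a weighted sum of them. The function name `bspline_basis`, the knot layout, and the weights are hypothetical choices.

```python
import numpy as np


def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of `degree` at the points x.

    Uses the Cox-de Boor recursion. `knots` is a nondecreasing knot vector;
    the result has shape (len(x), len(knots) - degree - 1).
    """
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # Degree 0: indicator functions of the half-open knot intervals.
    B = np.array([(knots[j] <= x) & (x < knots[j + 1])
                  for j in range(len(knots) - 1)], dtype=float).T
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        n_basis = len(knots) - d - 1
        B_new = np.zeros((len(x), n_basis))
        for j in range(n_basis):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            if left_den > 0:
                B_new[:, j] += (x - knots[j]) / left_den * B[:, j]
            if right_den > 0:
                B_new[:, j] += (knots[j + d + 1] - x) / right_den * B[:, j + 1]
        B = B_new
    return B


# Equidistant knots on [0, 1]: 5 inner intervals, extended by `degree` knots on each side.
degree, n_intervals = 3, 5
dx = 1.0 / n_intervals
knots = dx * np.arange(-degree, n_intervals + degree + 1)
x = np.linspace(0.0, 0.99, 12)
B = bspline_basis(x, knots, degree)       # design matrix, one column per basis function
weights = np.sin(np.arange(B.shape[1]))   # arbitrary illustrative weights
spline = B @ weights                      # the spline is a weighted sum of basis functions
```

Inside the knot range each row of the design matrix sums to one and has at most degree + 1 nonzero entries; this local support is one reason B-splines are numerically convenient.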
The aim of this thesis is to present Bayesian methods for models in which the hazard rate, the covariate effects, or both are modeled via splines, in discrete and continuous time. B-spline basis functions combined with a penalty to avoid overfitting (usually called P-splines) are the main building blocks used for modeling. P-splines have good numerical properties and allow fast computation. Additionally, other basis functions useful for failure time analysis will be given. Failures are always assumed to be nonrecurrent. A fully Bayesian perspective using MCMC methods is taken.

M. Kaeding, Bayesian Analysis of Failure Time Data Using P-Splines, BestMasters, DOI 10.1007/978-3-658-08393-9_1, © Springer Fachmedien Wiesbaden 2015
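A minimal sketch of the penalty idea, assuming a Gaussian response (the example used to introduce Bayesian P-splines): a B-spline basis with many equidistant knots is combined with a second-order difference penalty on neighboring coefficients, and the penalized least-squares estimate is computed. In the Bayesian formulations common in the literature, this penalty corresponds to a random walk prior on the coefficients, and `lam` plays the role of the ratio of error variance to prior variance; here it is simply fixed, whereas a fully Bayesian treatment would estimate it by MCMC. Function names, data, and settings are hypothetical.

```python
import numpy as np


def bspline_basis(x, knots, degree):
    """B-spline design matrix via the Cox-de Boor recursion."""
    B = np.array([(knots[j] <= x) & (x < knots[j + 1])
                  for j in range(len(knots) - 1)], dtype=float).T
    for d in range(1, degree + 1):
        B_new = np.zeros((len(x), len(knots) - d - 1))
        for j in range(B_new.shape[1]):
            if knots[j + d] > knots[j]:
                B_new[:, j] += (x - knots[j]) / (knots[j + d] - knots[j]) * B[:, j]
            if knots[j + d + 1] > knots[j + 1]:
                B_new[:, j] += (knots[j + d + 1] - x) / (knots[j + d + 1] - knots[j + 1]) * B[:, j + 1]
        B = B_new
    return B


def pspline_fit(x, y, n_intervals=20, degree=3, penalty_order=2, lam=1.0):
    """Penalized least squares: minimize ||y - B a||^2 + lam * ||D a||^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = (x.max() - x.min()) / n_intervals
    knots = x.min() + dx * np.arange(-degree, n_intervals + degree + 1)
    B = bspline_basis(x, knots, degree)
    # Rows of D @ a are differences of order `penalty_order` of neighboring coefficients.
    D = np.diff(np.eye(B.shape[1]), n=penalty_order, axis=0)
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ coef, coef


rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)
fit_rough, _ = pspline_fit(x, y, lam=0.01)   # weak penalty: wiggly fit
fit_smooth, _ = pspline_fit(x, y, lam=10.0)  # strong penalty: smoother fit
```

With a second-order penalty, letting `lam` grow pushes the fit toward a straight line, because linearly increasing coefficients are not penalized and B-splines reproduce linear functions.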

1.1 Outline

This thesis is structured as follows. First, the basic concepts of failure time analysis are introduced. For the statistical analysis of failure time data, time is represented by a random variable characterized by quantities specific to failure time modeling. These quantities can be used to construct the likelihood, taking into account special properties of failure time data such as censoring, which refers to failure times that are not fully observed. Next, two central model families are introduced: the relative risk and the log-location-scale model family. The subsequent chapter gives an overview of computational and inferential methods relevant for model building. It concludes with the introduction of Bayesian P-splines, using the Gaussian likelihood as an example. The sampling scheme for Gaussian responses can be adjusted for the probit model in discrete time and the lognormal model in continuous time. Subsequently, models for the analysis of discrete time are introduced. Gibbs samplers for these models are categorized here into methods embedded in the generalized linear model (GLM) framework and methods based on the latent variable framework. Based on these frameworks, efficient Bayesian sampling schemes can be constructed. From the GLM framework, iteratively weighted least squares (IWLS) proposals based on Fisher scoring can be derived for the Metropolis-Hastings algorithm. Many sampling schemes for models using P-splines, including sampling schemes for continuous time models, were developed on the basis of IWLS proposals. Discrete time models are illustrated using data on unemployment durations. Subsequently, estimation in continuous time is described. The focus is on relative risk models, but the lognormal model and extensions based on it are also discussed. The methods are illustrated using a data set on crime recidivism. The final chapter gives a summary and an outlook.

1.2 Notation

This thesis uses standard notation common in the literature. The distinction between a random variable $Y$ and its realizations $y$ is made in the introductory chapters and ignored in later chapters when the meaning is obvious. The conventions used in this thesis are listed here. Conditioning on parameters is often suppressed for notational simplicity.

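| Symbol | Explanation |
|---|---|
| $x$ | scalar |
| $\mathbf{x} = (x_1, \dots, x_n)$ | vector |
| $\mathbf{X}$ | matrix |
| $I[\cdot]$ | indicator function |
| $\operatorname{diag}(x_1, \dots, x_n)$ | diagonal matrix obtained from $\mathbf{x}$ |
| $\operatorname{bdiag}(\mathbf{A}, \mathbf{B})$ | block diagonal matrix built from matrices $\mathbf{A}$, $\mathbf{B}$ |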

The following table gives an overview of important fixed symbols.

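| Symbol | Explanation |
|---|---|
| $h(t)$ | hazard rate |
| $H(t)$ | cumulative hazard rate |
| $h_0(t)$ | baseline hazard |
| $H_0(t)$ | cumulative baseline hazard |
| $G(t)$ | survivor function |
| $D$ | available data |
| $L(\boldsymbol{\theta} \mid D)$ | likelihood |
| $v_i$ | censoring indicator |
| $\eta$ | linear predictor |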

The following table gives an overview of the shorthand used for distributions.

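| Distribution | Shorthand | Parameters |
|---|---|---|
| normal | $N(\mu, \sigma^2)$ | expectation $\mu$, variance $\sigma^2$ |
| truncated normal | $TN_{(a,b)}(\mu, \sigma^2)$ | expectation $\mu$ and variance $\sigma^2$ of the untruncated normal, support $(a, b)$ |
| lognormal | $LN(\mu, \sigma^2)$ | location $\mu$, shape $\sigma$ |
| inverse gamma | $IG(\alpha, \beta)$ | shape $\alpha$, scale $\beta$ |
| gamma | $G(\alpha, \beta)$ | shape $\alpha$, rate $\beta$ |
| Poisson | $P(\lambda)$ | mean/variance $\lambda$ |
| inverse Wishart | $IW(a, \mathbf{B})$ | degrees of freedom $a$, scale matrix $\mathbf{B}$ |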
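Because shape/rate and shape/scale conventions are easy to mix up when implementing samplers, the following sketch maps some of the shorthand above onto NumPy's random `Generator`. NumPy is an assumption here (it is not used in the thesis), and the helper names are hypothetical. NumPy's `normal` and `lognormal` take a standard deviation and its `gamma` takes a scale, so variances and rates must be converted.

```python
import numpy as np

rng = np.random.default_rng(2015)


def draw_normal(mu, sigma2, size):
    # N(mu, sigma^2): NumPy's `scale` is the standard deviation, not the variance.
    return rng.normal(loc=mu, scale=np.sqrt(sigma2), size=size)


def draw_lognormal(mu, sigma2, size):
    # LN(mu, sigma^2): log T ~ N(mu, sigma^2); NumPy's `sigma` is the sd of log T.
    return rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=size)


def draw_gamma(alpha, beta, size):
    # G(alpha, beta) with RATE beta: NumPy parameterizes by scale = 1 / rate.
    return rng.gamma(shape=alpha, scale=1.0 / beta, size=size)


def draw_inv_gamma(alpha, beta, size):
    # IG(alpha, beta) with SCALE beta: if X ~ G(alpha, rate beta), then 1 / X ~ IG(alpha, beta).
    return 1.0 / rng.gamma(shape=alpha, scale=1.0 / beta, size=size)


def draw_poisson(lam, size):
    # P(lambda): mean and variance both equal lambda.
    return rng.poisson(lam=lam, size=size)
```

For instance, draws from `draw_gamma(3.0, 2.0, n)` should have a sample mean close to $\alpha/\beta = 1.5$, and draws from `draw_inv_gamma(4.0, 3.0, n)` a sample mean close to $\beta/(\alpha - 1) = 1$.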
2 Basic Concepts of Failure Time Analysis

2.1 Continuous Time

Time is represented by the nonnegative random variable $T$ with cumulative distribution function

$$F(t) = P(T \le t)$$

and density

$$f(t) = \frac{dF(t)}{dt}.$$

For failure time analysis, $T$ is generally characterized by other quantities. The survivor function gives the probability that $T$ exceeds $t$:

$$G(t) = P(T > t) = 1 - F(t).$$

It always holds that no individual has failed at $T = 0$,

$$G(0) = 1, \tag{2.1}$$

and it is usually assumed that every subject will fail eventually,

$$\lim_{t \to \infty} G(t) = 0. \tag{2.2}$$

Variables whose survivor function does not satisfy 2.2 are called defective; for those, $E[T] = \infty$. The probability of failure in the small interval $[t, t + dt)$, given survival up to $t$, can be approximated by $h(t)\,dt$ (Aalen et al. 2008, pp. 5–17). The function $h(t)$ is the hazard rate, defined as

$$h(t) = \lim_{\Delta \to 0} \frac{P(t \le T < t + \Delta \mid T \ge t)}{\Delta}. \tag{2.3}$$

Figure 2.1: Some hazard rates (axes: t, hazard)
Figure 2.2: Some survivor functions (axes: t, survivor function)

The probability $P(t \le T < t + \Delta \mid T \ge t)$ is

$$\frac{F(t + \Delta) - F(t)}{G(t)}.$$

Hence 2.3 is

$$h(t) = \frac{1}{G(t)} \lim_{\Delta \to 0} \frac{F(t + \Delta) - F(t)}{\Delta} = \frac{F'(t)}{G(t)} = \frac{f(t)}{G(t)},$$

showing that the hazard is a conditional density.
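To see the limit in 2.3 numerically, the following sketch evaluates the quotient $P(t \le T < t + \Delta \mid T \ge t)/\Delta$ for shrinking $\Delta$ and compares it with $f(t)/G(t)$. The Weibull distribution with shape $k$ and scale $\lambda$, whose hazard is $h(t) = (k/\lambda)(t/\lambda)^{k-1}$, is only an illustrative choice, and the helper names are hypothetical.

```python
import math


def weibull_cdf(t, shape, scale):
    # F(t) = 1 - exp(-(t / scale)^shape)
    return 1.0 - math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    # h(t) = f(t) / G(t) = (shape / scale) * (t / scale)^(shape - 1)
    return (shape / scale) * (t / scale) ** (shape - 1)


def hazard_quotient(t, delta, shape, scale):
    """P(t <= T < t + delta | T >= t) / delta, the quantity inside the limit (2.3)."""
    surv_t = 1.0 - weibull_cdf(t, shape, scale)  # G(t)
    prob = (weibull_cdf(t + delta, shape, scale) - weibull_cdf(t, shape, scale)) / surv_t
    return prob / delta


for delta in (0.1, 0.01, 0.001):
    print(delta, hazard_quotient(1.5, delta, 2.0, 1.0), weibull_hazard(1.5, 2.0, 1.0))
```

The quotient approaches the hazard as $\Delta$ shrinks, with an error roughly proportional to $\Delta$.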
The cumulative hazard rate is

$$H(t) = \int_0^t h(u)\,du = \int_0^t \frac{f(u)}{G(u)}\,du = \bigl[-\log G(u)\bigr]_0^t = -\log G(t),$$

due to 2.1. Hence, the survivor function can be written in terms of the hazard rate:

$$G(t) = \exp\left(-\int_0^t h(u)\,du\right) = \exp(-H(t)). \tag{2.4}$$

The same applies to the density:

$$f(t) = h(t)\,G(t) = h(t)\exp(-H(t)).$$
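These relationships can be checked numerically: starting only from a hazard, integrate it to obtain $H(t)$, then recover $G(t)$ via 2.4 and $f(t) = h(t)G(t)$. The sketch below assumes a Weibull hazard with hypothetical parameters, because its cumulative hazard $(t/\lambda)^k$ is available in closed form for comparison; the midpoint rule is an arbitrary choice of quadrature.

```python
import math

SHAPE, SCALE = 1.5, 2.0  # hypothetical Weibull parameters for illustration


def hazard(t):
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)


def cumulative_hazard(t, n_steps=10_000):
    """H(t) = integral of h(u) from 0 to t, approximated by the midpoint rule."""
    width = t / n_steps
    return width * sum(hazard((i + 0.5) * width) for i in range(n_steps))


def survivor(t):
    return math.exp(-cumulative_hazard(t))  # G(t) = exp(-H(t)), equation (2.4)


def density(t):
    return hazard(t) * survivor(t)  # f(t) = h(t) G(t)


t = 3.0
closed_form_H = (t / SCALE) ** SHAPE  # Weibull cumulative hazard in closed form
print(cumulative_hazard(t), closed_form_H)
```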

Because of these relationships, the distribution of $T$ is fully specified by any one of these quantities. From 2.4, it can be seen that a nonnegative function $h(t)$ only needs to satisfy

$$\int_0^t h(s)\,ds < \infty \quad \text{for all } t \qquad \text{and} \qquad \int_0^\infty h(s)\,ds = \infty$$

to be the hazard rate of a nondefective continuous variable (Kalbfleisch and Prentice 2002, p. 9). Many models in failure time analysis are formulated in terms of the hazard rate first.
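The second integrability condition can be illustrated with two hypothetical hazards chosen for this purpose. For $h(t) = 1/(1+t)^2$, the cumulative hazard $H(t) = 1 - 1/(1+t)$ is bounded by 1, so by 2.4 $G(t) \to \exp(-1)$ and the variable is defective. For $h(t) = 1/(1+t)$, $H(t) = \log(1+t)$ grows without bound and $G(t) = 1/(1+t) \to 0$.

```python
import math


def H_defective(t):
    # h(t) = 1 / (1 + t)^2  =>  H(t) = 1 - 1 / (1 + t), bounded by 1
    return 1.0 - 1.0 / (1.0 + t)


def H_nondefective(t):
    # h(t) = 1 / (1 + t)  =>  H(t) = log(1 + t), unbounded
    return math.log1p(t)


def survivor(H, t):
    return math.exp(-H(t))  # G(t) = exp(-H(t)), equation (2.4)


for t in (10.0, 1e3, 1e6):
    print(t, survivor(H_defective, t), survivor(H_nondefective, t))
```

In the defective case a fraction $\exp(-1) \approx 0.37$ of individuals never fails, which violates 2.2.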

2.2 Discrete Time

In the case of grouped failure times, the range of an unobservable continuous random variable, written $T^*$ here, with survivor function $G^*$ and hazard rate $h^*$, is partitioned into $m+1$ intervals $[a_0 = 0, a_1), [a_1, a_2), \dots, [a_m, a_{m+1} = \infty)$ (Lawless 2003, p. 370). What is observed are discrete failure times from the random variable $T \in \{1, 2, \dots, m+1\}$, where $T = t$ corresponds to $T^* \in [a_{t-1}, a_t)$. The hazard in terms of $T$ is

$$h(t) = P(T = t \mid T \ge t) = \frac{P(T = t)}{P(T \ge t)} = \frac{f(t)}{G(t-1)}. \tag{2.5}$$

Expressing 2.5 in terms of $T^*$ gives

$$h(t) = \frac{G^*(a_{t-1}) - G^*(a_t)}{G^*(a_{t-1})} = 1 - \exp\left(-\int_{a_{t-1}}^{a_t} h^*(u)\,du\right).$$

This is the probability of failure in interval $t$, conditional on reaching the interval.
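The grouping step can be sketched directly: given a latent continuous variable (written $T^*$ here) and cutpoints $a_0 < a_1 < \dots < a_m$, the discrete hazards follow either from ratios of $G^*$ or from increments of the cumulative hazard $H^*$, and both routes must agree. The Weibull choice for $T^*$, the cutpoints, and the helper names are hypothetical.

```python
import math

SHAPE, SCALE = 1.5, 2.0           # hypothetical Weibull parameters of the latent T*
cutpoints = [0.0, 1.0, 2.0, 4.0]  # a_0 = 0 < a_1 < a_2 < a_3; a_4 = infinity is appended


def cum_hazard_star(t):
    return (t / SCALE) ** SHAPE  # H*(t); evaluates to inf for t = inf


def surv_star(t):
    return math.exp(-cum_hazard_star(t))  # G*(t) = exp(-H*(t))


def discrete_hazard_from_survivor(cuts):
    # h(t) = (G*(a_{t-1}) - G*(a_t)) / G*(a_{t-1})
    a = list(cuts) + [math.inf]
    return [(surv_star(a[t - 1]) - surv_star(a[t])) / surv_star(a[t - 1])
            for t in range(1, len(a))]


def discrete_hazard_from_cum_hazard(cuts):
    # h(t) = 1 - exp(-(H*(a_t) - H*(a_{t-1})))
    a = list(cuts) + [math.inf]
    return [1.0 - math.exp(-(cum_hazard_star(a[t]) - cum_hazard_star(a[t - 1])))
            for t in range(1, len(a))]


h = discrete_hazard_from_survivor(cutpoints)
print(h)
```

The hazard of the last, open-ended interval is always 1, since every individual that reaches it fails there.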

A discrete time model can be specified in terms of the observed discrete variable $T$ or of the latent continuous variable $T^*$. Failure after interval $t$ is the result of a sequence of binary trials unfolding in time (Kalbfleisch and Prentice 2002, p. 9):

$$\begin{aligned}
G(t) = P(T > t) &= P(T \neq 1 \cap T \neq 2 \cap \dots \cap T \neq t) \\
&= P(T \neq 1)\,P(T \neq 2 \mid T \neq 1)\,P(T \neq 3 \mid T \neq 1, T \neq 2) \cdots P(T \neq t \mid T \neq 1, \dots, T \neq t-1).
\end{aligned}$$

Since $P(T \neq x \mid T \neq 1, \dots, T \neq x-1) = P(T \neq x \mid T \ge x) = 1 - h(x)$, it follows that, in analogy to the continuous case, the survivor function can be expressed in terms of the hazard rate:

$$G(t) = \prod_{j=1}^{t} \bigl(1 - h(j)\bigr).$$
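The product formula, together with 2.5 rearranged as $P(T = t) = h(t)\,G(t-1)$, turns a sequence of discrete hazards into the survivor function and probability mass function. The sketch below uses hypothetical helper names and an illustrative hazard sequence; with a constant hazard of 0.2 the survivor function is $0.8^t$.

```python
def survivor_from_hazard(h):
    """G(0) = 1 and G(t) = prod_{j=1}^{t} (1 - h(j)); h[j - 1] holds h(j)."""
    G = [1.0]
    for h_j in h:
        G.append(G[-1] * (1.0 - h_j))
    return G


def pmf_from_hazard(h):
    """P(T = t) = h(t) G(t - 1), rearranging equation (2.5)."""
    G = survivor_from_hazard(h)
    return [h[t - 1] * G[t - 1] for t in range(1, len(h) + 1)]


# Hypothetical example: constant hazard 0.2 in five intervals; the last, open-ended interval has hazard 1.
hazards = [0.2] * 5 + [1.0]
G = survivor_from_hazard(hazards)
pmf = pmf_from_hazard(hazards)
```

Because the final hazard is 1, the survivor function drops to zero in the last interval and the probabilities sum to one.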

Assuming grouped failure times might not be appropriate in all cases, as some random variables are intrinsically discrete. Nevertheless, this assumption yields some helpful results, and estimation is easier when inference is based on the likelihood contributions following from 2.2; both views lead to an identical modeling framework.

2.3 Likelihood Construction

Failure time data have some special characteristics that must be accounted for when constructing the likelihood. A failure time is referred to as censored when the actual failure time is not observed and is only known to fall into an interval. Failure times are left-truncated if they are only observable when they exceed a truncation time. Time varying covariates are also often available in the data set. The following sections, based on Klein and Moeschberger (2003, pp. 63-77), clarify how these conditions are accounted for in the formulation of the likelihood. Conceptually, these adjustments can be represented in a unified framework by varying the likelihood contributions. As a consequence, the likelihood becomes more difficult to work with, but there are computational methods that simplify estimation.
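As a preview of how the contributions vary, the following sketch codes the standard log-likelihood for the most common case. It is a hypothetical helper, not the thesis's code, and it assumes independent, noninformative right censoring, left truncation handled by conditioning on survival past an entry time $u_i$, and the coding $v_i = 1$ for an observed failure. Under these assumptions individual $i$ contributes $h(t_i)^{v_i}\,G(t_i)/G(u_i)$.

```python
import math


def log_likelihood(times, status, hazard, cum_hazard, entry=None):
    """Log-likelihood for right-censored, optionally left-truncated failure times.

    times      : observed times t_i (failure or censoring time)
    status     : censoring indicator v_i, assumed coded 1 = failure observed, 0 = right-censored
    hazard     : function t -> h(t)
    cum_hazard : function t -> H(t)
    entry      : left-truncation times u_i < t_i (None means u_i = 0 for all i)

    Contribution of individual i: f(t_i)^{v_i} G(t_i)^{1 - v_i} / G(u_i),
    i.e. v_i log h(t_i) - H(t_i) + H(u_i) on the log scale.
    """
    if entry is None:
        entry = [0.0] * len(times)
    ll = 0.0
    for t, v, u in zip(times, status, entry):
        if v == 1:
            ll += math.log(hazard(t))  # observed failure: extra factor h(t_i) turns G(t_i) into f(t_i)
        ll -= cum_hazard(t)            # survival up to t_i: log G(t_i) = -H(t_i)
        ll += cum_hazard(u)            # conditioning on T > u_i: divide by G(u_i)
    return ll


# Hypothetical example: exponential model with constant hazard rate 0.5.
rate = 0.5
times, status = [1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1]
ll = log_likelihood(times, status, lambda t: rate, lambda t: rate * t)
```

For the exponential model this reduces to $d \log \lambda - \lambda \sum_i (t_i - u_i)$, where $d$ is the number of observed failures.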