Gap Time Distributions: DFCI Biostat, Nov 12, 1999
Recurrent events (time between the $j$th and $(j+1)$st events): $T_1$ = time to the $j$th event, $T_2$ = time to the $(j+1)$st event.
Comparisons of treatment groups are primarily descriptive. E.g., consider duration of response: only a subset of cases respond, and responders from different treatment groups may differ in ways other than treatment received, so the baseline randomization cannot be used to infer a causal effect of treatment.

Terminology and examples used in the rest of the talk will be for survival beyond progression, but the same issues arise for other gap time distribution inference problems.
E9486 Multiple Myeloma. Groups defined by markers measured at time of progression. Is survival beyond progression different among groups? 451 cases with disease progression, 413 dead, 41 still alive.
Following slides: Plots of the raw data, and KM estimates of survival beyond progression for groups formed by time of progression, show strong evidence for dependence.
[Figure: E9486. survtime-progtime versus progtime (years), with Dead and Alive cases marked.]
Another approach: fit a proportional hazards model with response = survival beyond progression, covariate = time to progression. Use penalized partial likelihood to give a flexible estimate of the effect.

z <- cox.spline(a,usurv-uprogtime,ustat,uprogtime)
cox.spline.plot(z,1, font=5, lwd=2,xlab="Time of Progression",
                main="Survival Beyond Progression Hazard Ratio")
[Figure: Survival Beyond Progression Hazard Ratio versus Time of Progression.]
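Let $S(y) = P(T_2 - T_1 > y)$ and $S(y \mid x) = P(T_2 - T_1 > y \mid T_1 = x)$, and let $F_1$ be the distribution function of $T_1$. Then

$$S(y) = \int_0^\infty S(y \mid x)\, dF_1(x).$$

That is, the marginal distribution $S(y)$ is the weighted average of the conditional distributions $S(y \mid x)$.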
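The weighted-average relation can be checked numerically. The sketch below is illustrative only: it assumes the model used in the simulation later in the talk ($T_1$ exponential with rate 1; gap time Weibull with shape 2 and scale $1 + .5T_1$), and the function names `cond_surv` and `marginal_surv` are hypothetical.

```python
import math

def cond_surv(y, x):
    """S(y|x) for a Weibull gap time with shape 2 and scale 1 + 0.5*x (assumed model)."""
    return math.exp(-((y / (1.0 + 0.5 * x)) ** 2))

def marginal_surv(y, upper=40.0, steps=40000):
    """S(y) = integral of S(y|x) dF1(x) with F1 = Exp(1), by the midpoint rule.

    The integral is truncated at `upper`; exp(-40) is negligible.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += cond_surv(y, x) * math.exp(-x) * h
    return total

for y in (0.5, 1.0):
    # The marginal lies between the conditional at x = 0 and at large x.
    print(y, round(cond_surv(y, 0.0), 4), round(marginal_surv(y), 4))
```

Because $S(y \mid x)$ increases with $x$ under this model, the marginal value sits above the conditional value at $x = 0$, which is one way to see that the two distributions answer different questions.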
With limited follow-up, $S(y \mid x)$ is not identifiable or estimable for all $(x, y)$. Let $c$ = the maximum follow-up. Then the data only contain information on $S(y \mid x)$ for $(x, y)$ combinations with $x + y \le c$. E.g., for E9486, $c = 11.5$ years.

In the plot of the data, $x$ is on the horizontal axis and $y$ on the vertical. Only have information on $S(y \mid x)$ for $(x, y)$ points below the line $y = 11.5 - x$.
[Figure: E9486. y = survtime-progtime versus x = progtime (years), with Dead and Alive cases marked and the lines x = 4.5 and y = 11.5 - x.]
Since we do not have information on $S(y \mid x)$ for all $(x, y)$, we cannot estimate the marginal distribution $S(y)$. Exception: If $T_1 \le b$ always, and $b < c$, then $S(y)$ is estimable for $y \le c - b$.

E.g., for duration of response, if response always occurs within 6 months of entry when it occurs at all, and $c = 5$ years, then $S(y)$ is estimable for $y \le 4.5$.
Dependent Censoring

Let $C$ be the potential censoring time measured from baseline. Suppose $C$ is independent of $(T_1, T_2)$. Actually observe $\min\{T_1, C\}$, $I(T_1 \le C)$, $\min\{T_2, C\}$, $I(T_2 \le C)$.

For the gap time distribution, the failure time is $T_2 - T_1$ and the censoring time is $C - T_1$. If $T_2 - T_1$ is correlated with $T_1$, then $C - T_1$ will be correlated with $T_2 - T_1$, so the censoring and failure times will not be independent when measured from $T_1$. In E9486, censoring from the accrual and follow-up periods should be roughly uniform over $(7, 11.5)$, which is the region between the two diagonal lines, below. (Of course, our follow-up is not quite that consistent or reliable.)
[Figure: E9486. y = survtime-progtime versus x = progtime (years), with Dead and Alive cases marked and the diagonal lines y = 7 - x and y = 11.5 - x.]
Consider $y = 1$ year. Subjects censored at this time will all have long times to progression. Since they have long times to progression, they will tend to have longer survival beyond progression, because of the correlation between the two quantities. Hence the censored subjects are not a random subset of the risk set at any given time (dependent censoring). Thus standard methods applied to the marginal gap time data (e.g., Kaplan-Meier) will be biased, even when $S(y)$ is identifiable. (They will be unbiased for gap times up to the difference between the minimum of the support of the censoring distribution and the maximum of the support of the progression distribution.)
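The induced dependence between the gap censoring time and the gap failure time can be seen in a small simulation. This is a sketch under assumed settings borrowed from the simulation later in the talk ($T_1$ exponential with rate 1, Weibull gap time with shape 2 and scale $1 + .5T_1$, $C$ uniform on $(0, 2.5)$); the function names are hypothetical.

```python
import math
import random

def simulate_progressed(n, max_follow=2.5, seed=0):
    """Return gap failure times T2 - T1 and gap censoring times C - T1
    for the subjects whose progression is observed (T1 <= C)."""
    rng = random.Random(seed)
    gap_fail, gap_cens = [], []
    for _ in range(n):
        t1 = rng.expovariate(1.0)                       # time to progression
        gap = rng.weibullvariate(1.0 + 0.5 * t1, 2.0)   # scale, then shape
        c = rng.uniform(0.0, max_follow)                # censoring from baseline
        if t1 <= c:
            gap_fail.append(gap)
            gap_cens.append(c - t1)
    return gap_fail, gap_cens

def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

gap_fail, gap_cens = simulate_progressed(50000)
print("cor(C - T1, T2 - T1) among progressed cases:",
      round(correlation(gap_fail, gap_cens), 3))
```

Although $C$ is independent of $(T_1, T_2)$ from baseline, the correlation measured from $T_1$ comes out negative: long times to progression leave little remaining follow-up and also go with long gap times.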
What to Estimate / How to Estimate it?

1. Focus on the conditional distribution $S(y \mid x)$. Identifiable for $x + y \le c$.

Generally dependent censoring will not be a problem.
Can model the dependence of the distribution of $T_2 - T_1$ on $T_1$ (e.g., Cox model).
Inferences on other factors from tests and Cox models stratified on TTP groups are approximately valid.
Can give nonparametric kernel-type estimators (e.g., Dabrowska, SJS, 1987). See the function surv.smooth() in the local S library.
Is the marginal distribution really of interest with dependence?

2. Focus on the conditional distribution

$$H(y \mid x) = P(T_2 - T_1 > y \mid T_1 \le x).$$

Identifiable for $x + y \le c$.

Lin, Sun, Ying (Bka, 1999) give a consistent estimator (below), and in unpublished work have developed a generalization of the logrank test.

Lin, Sun, Ying: Let $H(x, y) = P(T_1 \le x,\ T_2 - T_1 > y)$ and $G(t) = P(C > t)$. Index subjects with the subscript $i$, and let $\tilde T_{ji} = \min\{T_{ji}, C_i\}$, $\Delta_{ji} = I(T_{ji} \le C_i)$, $j = 1, 2$, $i = 1, \ldots, n$. Note that $H(y \mid x) = H(x, y)/H(x, 0)$.
Without censoring,

$$\frac{1}{n} \sum_i I(T_{1i} \le x,\ T_{2i} - T_{1i} > y)$$

is unbiased for $H(x, y)$. With censoring, note that $G(t) > 0$ for $t < c$. Also, $I(\tilde T_{2i} - \tilde T_{1i} > y) = 0$ when $\Delta_{1i} = 0$. If $x + y \le c$, then

$$\begin{aligned}
E\{ I(\tilde T_{1i} \le x,\ \tilde T_{2i} - \tilde T_{1i} > y)/G(y + \tilde T_{1i}) \mid T_{1i}, T_{2i} \}
&= E\{ I(T_{1i} \le x,\ T_{2i} - T_{1i} > y,\ C_i - T_{1i} > y)/G(y + T_{1i}) \mid T_{1i}, T_{2i} \} \\
&= I(T_{1i} \le x,\ T_{2i} - T_{1i} > y),
\end{aligned}$$

so

$$\frac{1}{n} \sum_i I(\tilde T_{1i} \le x,\ \tilde T_{2i} - \tilde T_{1i} > y)/G(y + \tilde T_{1i})$$

is unbiased for $H(x, y)$. Substituting a consistent estimator for $G(t)$ then gives a consistent estimator for $H(x, y)$.

Can use the Kaplan-Meier estimator $\hat G(t)$, computed from the data $(\tilde T_{2i}, 1 - \Delta_{2i})$.

Note that the full data set measured from baseline is used to estimate $G$. Asymptotic variance is not trivial to calculate.
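The inverse-weighting construction above can be sketched in a few lines. This is a minimal illustration, not the talk's S function: the names `km_censoring`, `step_value`, `lsy_conditional`, and `simulate` are hypothetical, ties are assumed absent, and the data come from an assumed model matching the simulation later in the talk.

```python
import bisect
import random

def km_censoring(times, events):
    """Kaplan-Meier estimate of G(t) = P(C > t), treating censoring as the event.
    times: observed T2 (possibly censored); events: 1 = death observed, 0 = censored.
    Returns jump times and the right-continuous step values. Assumes no ties."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk, surv = len(times), 1.0
    jump_times, surv_values = [], []
    for i in order:
        if events[i] == 0:              # a censoring "event" for G
            surv *= 1.0 - 1.0 / at_risk
            jump_times.append(times[i])
            surv_values.append(surv)
        at_risk -= 1
    return jump_times, surv_values

def step_value(t, jump_times, surv_values):
    """Evaluate the step function at t."""
    k = bisect.bisect_right(jump_times, t)
    return 1.0 if k == 0 else surv_values[k - 1]

def lsy_conditional(y, x, t1, d1, t2, g):
    """Estimate H(y|x) = H(x,y)/H(x,0) by weighting with 1/G-hat."""
    jump_times, surv_values = g
    num = den = 0.0
    for a, da, b in zip(t1, d1, t2):
        if da == 1 and a <= x:          # progression observed by time x
            den += 1.0 / step_value(a, jump_times, surv_values)
            if b - a > y:               # observed gap time exceeds y
                num += 1.0 / step_value(a + y, jump_times, surv_values)
    return num / den

def simulate(n, max_follow=2.5, rng=random):
    """Assumed model: T1 ~ Exp(1), gap ~ Weibull(scale 1+.5*T1, shape 2), C ~ U(0, max_follow)."""
    t1, d1, t2, d2 = [], [], [], []
    for _ in range(n):
        u1 = rng.expovariate(1.0)
        u2 = u1 + rng.weibullvariate(1.0 + 0.5 * u1, 2.0)
        c = rng.uniform(0.0, max_follow)
        t1.append(min(u1, c)); d1.append(1 if u1 <= c else 0)
        t2.append(min(u2, c)); d2.append(1 if u2 <= c else 0)
    return t1, d1, t2, d2

t1, d1, t2, d2 = simulate(2000, rng=random.Random(12))
g_hat = km_censoring(t2, d2)            # uses the full data set from baseline
print([round(lsy_conditional(y, 1.0, t1, d1, t2, g_hat), 3) for y in (0.5, 1.0)])
```

Here $x + y \le 2 < c = 2.5$ for every evaluation, so the weights $1/\hat G$ stay finite. Nothing in the construction forces the estimate to be monotone in $y$.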
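      h1 <- length(stime[stime-rtime > tp[j] & subr])
      out[j] <- h1/h0
    }
  } else {
    # KM estimate of the censoring distribution G, from (stime, 1-sstat)
    cd <- survfit(Surv(stime,1-sstat)~1)
    # denominator: estimate of H(x,0), weights 1/G(rtime)
    h0 <- sum(1/(summary(cd,times=sort((rtime)[subr]))$surv))
    for (j in 1:length(tp)) {
      # not identifiable when x + y exceeds the maximum follow-up
      if (tp[j]>maxstime-tp2) out[j] <- NA
      else {
        i <- stime-rtime > tp[j] & subr
        # numerator: estimate of H(x,y), weights 1/G(rtime + y)
        out[j] <- if (length(i[i])==0) 0 else {
          sum(1/(summary(cd,times=sort((rtime+tp[j])[i]))$surv))/h0
        }
      }
    }
  }
  out
}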
> # 9486; H(i|2)
> survbrec(1:10,d$progtime,d$progstat,d$survtime,d$survstat,2)
 [1] 0.35475379 0.19061398 0.12707598 0.07412766 0.04264862
 [6] 0.02346133 0.01434252 0.01304146 0.02603643         NA
> ## NOTE: not monotone
> # 9486; H(i|5)
> survbrec(1:7,d$progtime,d$progstat,d$survtime,d$survstat,5)
[1] 0.51997911 0.30615689 0.20832986 0.12561816 0.06387121
[6] 0.03879853         NA
Simulation: $T_1 \sim \mathrm{Exp}(1)$, $T_2 - T_1 \mid T_1 \sim \mathrm{Weibull}(1 + .5T_1, 2)$ (scale $1 + .5T_1$, shape 2), $\mathrm{cor}(T_1, T_2 - T_1) = .53$, $C \sim U(0, 2.5)$, $n = 200$. On average expect 73 cases without progression, 70 progressed but alive, and 57 progressed and died.
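The first of these counts can be checked by hand (a worked calculation added for illustration). A case has no observed progression when $C < T_1$, and with $T_1$ exponential with rate 1, independent of $C$ uniform on $(0, 2.5)$:

```latex
P(C < T_1) = \frac{1}{2.5}\int_0^{2.5} P(T_1 > u)\,du
           = \frac{1}{2.5}\int_0^{2.5} e^{-u}\,du
           = \frac{1 - e^{-2.5}}{2.5} \approx 0.367,
\qquad 200 \times 0.367 \approx 73 .
```

The remaining 127 or so cases have an observed progression and are split between those still alive at censoring and those who died.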
f1 <- function(tpp,cutoff,n=200,mc=2.5){
  u1 <- rexp(n) #progtime
  u2 <- u1+rweibull(n,shape=2,scale=1+.5*u1) #survtime
  truc <- truu <- tpp
  sub <- u1<=cutoff; nc <- length(sub[sub])
  for (j in 1:length(tpp)) {
    xj <- u2-u1>tpp[j]
    truu[j] <- sum(xj)/n        # true unconditional (uncensored data)
    truc[j] <- sum(xj & sub)/nc # true conditional on u1<=cutoff
  }
  ct <- mc*runif(n) #censoring
  i1 <- ifelse(ct<u1,0,1)
  u1 <- pmin(u1,ct)
  i2 <- ifelse(ct<u2,0,1)
  u2 <- pmin(u2,ct)
  d2 <- survbrec(tpp,u1,i1,u2,i2,cutoff,maxstime=mc)
  sub <- u2-u1>0 & i1 == 1
  # KM applied to the gap time data of progressed cases
  k1 <- summary(survfit(Surv(u2-u1,i2)~1,subset=sub),
                times=tpp)$surv
  cbind(truu,truc,d2,k1)
}
ntri <- 500
tpp <- c(.5,1)
out <- array(NA,c(length(tpp),4,ntri))
for (i in 1:ntri) out[,,i] <- f1(tpp,1)
dimnames(out) <- list(format(tpp),c("True Unc","True Cond","LSY","KM"),NULL)
# Estimates of means
apply(out,c(1,2),mean)
    True Unc True Cond       LSY        KM
0.5  0.87127 0.8360188 0.8393883 0.8450474
1.0  0.59050 0.4958633 0.4990447 0.5027919
> # Standard errors of means
> sqrt(apply(out,c(1,2),var)/ntri)
       True Unc   True Cond         LSY          KM
0.5 0.001071808 0.001477317 0.002585383 0.001610749
1.0 0.001596997 0.002019595 0.003563854 0.002712360
> # Estimates of variances
> apply(out,c(1,2),var)
        True Unc   True Cond         LSY          KM
0.5 0.0005743859 0.001091232 0.003342102 0.001297256
1.0 0.0012752004 0.002039382 0.006350528 0.003678449
True Unc = True unconditional probability $S(y)$
True Cond = True conditional probability $H(y \mid x)$
LSY = Lin, Sun, Ying estimator of $H(y \mid 1)$
KM = Kaplan-Meier applied to the gap time data

Conditional and marginal distributions are different.
LSY is essentially unbiased for the true conditional distribution.
KM is biased as an estimator of the true unconditional distribution.
It is coincidence that KM is also nearly unbiased for the conditional distribution: the KM would be the same for any value of cutoff above, while $H(y \mid x)$ would vary with $x$ = cutoff.
Variance of the LSY estimator is substantially larger than that of KM. How efficient is the LSY procedure?
Summary

If the time to the initiating event is correlated with the gap time to the terminating event, then:

In general the marginal gap time distribution is not identifiable.
When it is identifiable, standard methods for inference on the marginal distribution may be invalid due to dependent censoring.
Various conditional distributions can be estimated, and inference should focus on these.

The LSY estimator is not monotone, and its efficiency properties are not clear.