Quantifying Treatment Differences in Confirmatory Trials With Delayed Effects
José L. Jiménez
Novartis Pharma A.G., Basel, Switzerland
arXiv:1908.10502v1 [stat.ME] 28 Aug 2019
Abstract
Dealing with non-proportional hazards is increasingly common nowadays when designing
confirmatory clinical trials in oncology. Under these circumstances, the hazard ratio may not
be the best statistical measure of the treatment effect, nor is the log-rank test the most powerful
statistical test. Possible alternatives include the restricted mean survival time (RMST), which
does not rely on the proportional hazards assumption and is clinically interpretable, and the
weighted log-rank test, which is proven to outperform the log-rank test in delayed effects settings.
We conduct simulations to evaluate the performance and operating characteristics of RMST-based
inference and compare them to the log-rank test and the weighted log-rank test with parameter
values (ρ = 0, γ = 1), as well as to their linked hazard ratios.
ratios. The weighted log-rank test is generally the most powerful test in a delayed effects
setting, and RMST-based tests have, under certain conditions, better performance than the
log-rank test when the truncation time is reasonably close to the tail of the observed curves.
In terms of treatment effect quantification, the hazard ratio linked to the weighted log-rank
test is able to capture the maximal treatment difference and provides a valuable summary
of the treatment effect in delayed effect settings. Hence, we recommend the inclusion of the
hazard ratio linked to the weighted log-rank test among the measurements of treatment effect
in settings where there is suspicion of substantial departure from the proportional hazards
assumption.
Keywords: delayed effects; non-proportional hazards; restricted mean survival time; weighted
log-rank.
1 Introduction
Randomized controlled clinical trials are the gold standard in drug development to confirm both
safety and efficacy of new drugs. When the primary endpoint is a time-to-event endpoint, the
objective is to quantify the difference between the survival curves of the treatment arms. The most
common time-to-event endpoints used for confirmatory phase III trials in oncology are progression-free
survival (PFS) and overall survival (OS). PFS corresponds to the time from randomization
until disease progression or death, whereas OS corresponds to the time from randomization until
death.
In this article we focus on the application to the immuno-oncology (IO) space. IO agents have
an effect on both the subject’s immune system and the tumor’s microenvironment. This way, the
tumors may be eliminated from the host or the disease progression may be delayed. In contrast with
chemotherapeutic agents, the effect of an IO agent is not directed to the tumor but to the subject’s
immune system, which means that the effect is not observable immediately. This lag between
treatment administration and the activation of the immune response is known in the literature as a
delayed treatment effect. This delay induces non-proportional hazards and may translate into an
overall underestimation of the PFS or OS difference with respect to the control treatment arm
(i.e., the hazard ratio (HR) may increase towards 1 as the delay increases).
There exist multiple approaches in the literature to quantify treatment differences and to test the
null hypothesis (HR = 1) against the alternative hypothesis (HR < 1) in confirmatory clinical trials.
The weighted log-rank test with the Fleming and Harrington class of weights [3] has gained
considerable attention in recent years in the IO space, as it allows weighting late differences between
survival curves over early differences by tuning its two parameters (ρ, γ). On this matter, [5] made
an extensive evaluation of the weighted log-rank test in confirmatory trials with delayed effects.
However, this comparison was made based only on power, and did not explore the quantification of
treatment effect.
[4] explored the differences between the restricted mean survival time (RMST) and the hazard ratio
(HR) in a large number of scenarios that include both proportional and non-proportional hazards.
The RMST is a robust and clinically interpretable measure of the survival time distribution
that only depends on the selection of the cutoff (truncation) time $t^*$, which needs to be pre-specified
to avoid selection bias (see [9]). Its clear advantage over the HR in a delayed effects setting is that it
does not rely on the proportional hazards assumption. Moreover, analogous to the hazard ratio as
a measure of the relative risk of an event, a similar measure for the RMST can be obtained by
simply taking the ratio of the RMSTs between arms (control vs. experimental), with a ratio
< 1 meaning a survival benefit in the experimental arm. In the non-proportional hazards setting in
which we are interested, the work from [4] concluded that the RMST-based tests are more efficient
than the log-rank test under certain censoring conditions (i.e., they achieve higher power). However,
the HR still gives a slightly better estimate of the maximal treatment differences, although as the
dropout rate increases the differences between the HR and the RMST ratio tend to disappear. That
article, however, does not include the performance of the weighted log-rank test (and its adjusted
HR), which is proven to be more powerful than the log-rank test in settings with delayed effects.
Note that in [4] the HR used is a weighted average of the HR over time on the log scale, and not
just the HR from the standard Cox model.
In this article we extend the work of [4] and evaluate the differences in performance between RMST-based
tests and the weighted log-rank test with the parameter combination (ρ = 0, γ = 1), as well
as between the ratio of the RMSTs of the two treatment groups and the HR linked to the weighted
log-rank test, referred to as the "adjusted HR" (see section 2.2), in a setting with delayed effects.
For completeness, we also include the standard HR in the comparison.
The rest of the manuscript is structured as follows. In section 2 we describe the weighted log-rank
test, the HR that is linked to the weighted log-rank test, and the RMST. In section 3 we
present an empirical evaluation of the log-rank test, the weighted log-rank test and the RMST-based
tests in simulated scenarios with delayed effects. In section 4 we present an evaluation of a real trial
example. Last, in section 5 we discuss the major findings and conclusions of the article.
2 Method
Following the notation of [5], let T be a vector that contains the event times, $t_i$, $i = 1, 2, \ldots, D$,
between the patients' enrollment date and the patients' final event date, $t_D$, such that $t_1 < t_2 <
\cdots < t_D$. Let the number of events at time $t_i$ be denoted as $d_i$, the total number of patients at
risk at that time be denoted as $n_i$, and the effect delay (in months) be denoted as $\varepsilon$. As previously
described, if $t < \varepsilon$ both survival curves run in parallel, and once $t \ge \varepsilon$ the survival curves start
diverging. Hence, we assume the following density functions $f_j(t)$, survival functions $S_j(t)$ and
hazard functions $h_j(t)$ for the control group ($j = 1$) and for the experimental group ($j = 2$):
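A sketch of this specification, under the assumption (consistent with the delayed-effect model of [5]) of an exponential control hazard $\lambda_1$ and an experimental hazard that switches from $\lambda_1$ to $\lambda_2$ at the delay $\varepsilon$:

```latex
h_1(t) = \lambda_1, \qquad
h_2(t) =
\begin{cases}
\lambda_1, & t < \varepsilon,\\
\lambda_2, & t \ge \varepsilon,
\end{cases}
\qquad
S_1(t) = e^{-\lambda_1 t}, \qquad
S_2(t) =
\begin{cases}
e^{-\lambda_1 t}, & t < \varepsilon,\\
e^{-\lambda_1 \varepsilon - \lambda_2 (t - \varepsilon)}, & t \ge \varepsilon,
\end{cases}
```

with $f_j(t) = h_j(t)\, S_j(t)$ in each arm.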
$$A_{t_i} = \frac{w_{t_i}}{\max(w_{t_i})}. \tag{4}$$
In (4), $A_{t_i}$ is non-negative and has a maximal value of 1. The hazard function in the Cox model
proposed by [8] is defined as
$$\lambda(t_i; X) = \lambda_0\, e^{A_{t_i} \times \beta \times X} = \lambda_0\, e^{\beta \times X^{*}_{t_i}}, \tag{5}$$
where $\lambda(t_i; X)$ has a constant coefficient and a time-varying covariate $X^{*}_{t_i} = A_{t_i} \times X$ that represents
the treatment assignment weighted by the adjustment factor. The $\hat{\beta}$ from the Cox models with time-varying
coefficients are proven to be unbiased (see [1]).
Also, since $A_{t_i} \le 1$, we can interpret $\hat{\beta}$ as the estimated maximal effect. The time points
where we observe the maximal treatment difference are weighted with $A_{t_i} = 1$ in the corresponding
weighted log-rank test. Moreover, this weighted log-rank test (and consequently the score test from
this model) is optimal and has the highest power, based on Schoenfeld's proof [10].
The adjusted hazard ratio is therefore defined as
$$HR_{t_i} = \frac{\lambda_0\, e^{A_{t_i} \times \beta \times 1}}{\lambda_0\, e^{A_{t_i} \times \beta \times 0}} = e^{A_{t_i} \times \beta}, \tag{6}$$
where $e^{\beta}$ represents the maximal effect.
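As an illustration of this construction, the Fleming–Harrington weighted log-rank statistic can be sketched in a few lines of pure Python (a minimal sketch for exposition only; the simulations in this article use R, and the function name and data layout below are illustrative assumptions):

```python
import math

def fh_weighted_logrank(times, events, groups, rho=0.0, gamma=1.0):
    """Weighted log-rank statistic with Fleming-Harrington weights
    w(t_i) = S(t_i-)^rho * (1 - S(t_i-))^gamma, where S is the pooled
    left-continuous Kaplan-Meier estimate. `events` is 1 for an event and
    0 for censoring; `groups` is 0 (control) or 1 (experimental).
    Returns (z, numerator, variance)."""
    data = sorted(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    num, var, s_pooled = 0.0, 0.0, 1.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)            # at risk, pooled
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        w = (s_pooled ** rho) * ((1.0 - s_pooled) ** gamma)
        num += w * (d1 - d * n1 / n)                          # observed - expected
        if n > 1:
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        s_pooled *= 1.0 - d / n                               # KM update after t
    z = num / math.sqrt(var) if var > 0 else float("nan")
    return z, num, var
```

With (ρ = 0, γ = 1) the weight $(1 - \hat S(t^-))$ is close to 0 for early events and close to 1 for late events, matching the delayed-effects emphasis; with (ρ = 0, γ = 0) all weights equal 1 and the statistic reduces to the standard log-rank test.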
To estimate $\mu$ in (7) we can use the Kaplan–Meier (KM) estimator $\hat{S}(t)$ of $S(t)$, hence
$$\hat{\mu} = \int_0^{t^*} \hat{S}(t)\, dt. \tag{9}$$
The RMST-based test statistic standardizes the difference $\hat{\mu}_E(t^*) - \hat{\mu}_C(t^*)$ by its estimated
standard error, where $\hat{S}_E(t)$ and $\hat{S}_C(t)$ are the estimated survival curves of the experimental and
control arms respectively. The estimated variance term is defined as $V(\hat{\mu}_E(t^*)) + V(\hat{\mu}_C(t^*))$.
Since we have the RMST of both treatment arms, we can compute the ratio
$$\frac{\int_0^{t^*} \hat{S}_C(t)\, dt}{\int_0^{t^*} \hat{S}_E(t)\, dt}. \tag{12}$$
Equation (12) is, just like the hazard ratio, a relative measure of the treatment effect, with a ratio
< 1 indicating a survival improvement in the experimental arm. The variance of (12) is
estimated using the delta method.
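A hedged pure-Python sketch of this estimator (function names are illustrative assumptions; in practice the KM curve and the RMST come from standard survival software):

```python
def km_curve(times, events):
    """Kaplan-Meier estimate for one arm: returns the step function as a list
    of (event time, S(t)) pairs; `events` is 1 for an event, 0 for censoring."""
    data = sorted(zip(times, events))
    curve, s = [], 1.0
    for t in sorted({ti for ti, e in data if e == 1}):
        d = sum(1 for ti, e in data if ti == t and e == 1)
        n = sum(1 for ti, _ in data if ti >= t)   # number at risk just before t
        s *= 1.0 - d / n
        curve.append((t, s))
    return curve

def rmst(times, events, t_star):
    """Restricted mean survival time: area under the KM step function on [0, t_star]."""
    area, prev_t, s = 0.0, 0.0, 1.0
    for t, s_new in km_curve(times, events):
        if t >= t_star:
            break
        area += s * (t - prev_t)
        prev_t, s = t, s_new
    return area + s * (t_star - prev_t)
```

The RMST ratio of (12) is then the ratio of the two arms' RMSTs evaluated at a common truncation time $t^*$, e.g. the minimax event or minimax observed time.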
To have an objective evaluation, in this article $t^*$ is linked to the data and is pre-specified as i)
the minimum of the maximal observed event times (minimax event time) and ii) the minimum of the
maximal observed (event or censored) times (minimax observed time) of each treatment arm.
3 Simulation study
3.1 Setup
The survival times T are simulated by randomly drawing samples from U(0, 1) and back-transforming
them using the inverse survival function $S_j^{-1}(U)$. We assume that the dropout time variable D
follows an exponential distribution with rate parameters $\lambda_{D_E}$ and $\lambda_{D_C}$ in the experimental arm and
the control arm respectively. Again, following the notation of [4], let Y denote the time at which a subject
is enrolled in the trial; its distribution is the same in both treatment arms. We assume that T
and D are independent and that their distributions do not depend on Y. The accrual and event times
of different patients are also independent.
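The inverse-transform step can be sketched as follows for a piecewise-exponential delayed-effect model (a minimal illustration; the rate parameterization and function name are assumptions, and the article's simulations actually use the nphsim R package):

```python
import math
import random

def sample_delayed_effect_time(u, lam1, lam2, delay):
    """Invert S(t) for a survival time with hazard lam1 before `delay` and
    lam2 afterwards, given u ~ U(0, 1):
        S(t) = exp(-lam1 * t)                           for t < delay,
        S(t) = exp(-lam1 * delay - lam2 * (t - delay))  for t >= delay.
    Setting delay = 0 (or lam2 = lam1) recovers a plain exponential,
    as used for the control arm."""
    t = -math.log(u) / lam1          # candidate under the pre-delay hazard
    if t < delay:
        return t
    return delay + (-math.log(u) - lam1 * delay) / lam2

# Example: scenario-1-like rates (medians of 6 and 15 months, i.e. HR = 0.4).
rng = random.Random(2019)
lam_c = math.log(2) / 6
lam_e = math.log(2) / 15
sample = sample_delayed_effect_time(rng.random(), lam_c, lam_e, delay=3.0)
```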
In total, we implement 2 scenarios with delay times ($\varepsilon$) that go from 0 (i.e., proportional
hazards) up to a 4-month delay. The median survival of the control arm is 6 months in both scenarios,
whereas the median survival of the experimental arm is 15 and 9 months respectively. Hence, the
true hazard ratios (i.e., maximal treatment differences) are 0.4 and 0.667 respectively.
Sample size is calculated as described above using Schoenfeld's formula (see [11]). Hence,
the necessary number of events in each scenario is 52 and 258, and the total sample size 75 and 330
respectively.
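For illustration, Schoenfeld's event-count formula can be coded directly (a sketch assuming a one-sided test and r:1 allocation; small differences from the event counts above may arise from rounding conventions or additional design adjustments in the actual calculation):

```python
import math
from statistics import NormalDist

def schoenfeld_events(hr, alpha=0.025, power=0.90, r=1.0):
    """Number of events d required to detect hazard ratio `hr` with a
    one-sided level-`alpha` log-rank test and allocation ratio r:1:
        d = (z_{1-alpha} + z_{power})^2 / (p * (1 - p) * log(hr)^2),
    where p = r / (1 + r) is the allocation proportion."""
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    p = r / (1 + r)
    return math.ceil((z_a + z_b) ** 2 / (p * (1 - p) * math.log(hr) ** 2))
```

For the two scenarios above this gives roughly 51 and 257 events, in line with the 52 and 258 reported.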
The 2 scenarios with the different delays are implemented with dropout rates that
follow an exponential distribution with hazard rates of 1% and 3%. Data is generated using the
nphsim R package [12], where we incorporate an 18-month enrollment period (with ramp-up),
administrative censoring at 25 months, a 1:1 randomization ratio, a power of 90% and a one-sided level
α of 2.5%. We run M = 10,000 simulated trials, over which we calculate the empirical power defined as
$$\text{Power} = \frac{1}{M} \sum_{i=1}^{M} I(z_{\text{test},i} > z_{\alpha}). \tag{13}$$
The RMST-based test (minimax event time), the RMST-based test (minimax observed time), the
RMST ratio, the weighted log-rank test with the parameter combination (ρ = 0, γ = 1), the log-rank
test, the HR and the adjusted HR are all evaluated simultaneously in the 2 presented scenarios.
3.2 Results
In Figures 1, 2, 3 and 4 we present the empirical comparison between the RMST-based tests and the
weighted log-rank test with parameters (ρ = 0, γ = 1), together with their respective treatment effect
difference estimates. Recall that the adjusted HR is the only method studied in this article that
actually estimates the maximal treatment difference; the HR and the RMST ratios provide a treatment
difference across the entire study.
Figure 1 contains the results of scenario 1 where, under proportional hazards, the hazard ratio
is equal to 0.4 (i.e., the maximal treatment difference is 0.4). Overall, we can see that in terms of
power, and as expected from previous literature, the weighted log-rank test with the parameter
combination (ρ = 0, γ = 1) is the test with the highest power as the delay increases. When the dropout
rate is equal to 1% and the delay is equal to 0 (i.e., under proportional hazards), the log-rank
test has a power of 90% and is the most powerful test. The RMST-based test (minimax observed
time) performs slightly worse, but outperforms both the RMST-based test (minimax event time)
and the weighted log-rank test. However, when the delay starts to increase, the weighted log-rank
test achieves the highest power, outperforming the log-rank test and the RMST-based tests.
The weighted log-rank test remains the most powerful test also when the dropout rate increases
to 3%. However, it is interesting to point out that, as the dropout rate and the delay increase, the
RMST-based test (minimax observed time) becomes slightly more powerful than the log-rank test.
Figure 3 contains the estimated treatment differences in scenario 1. For both dropout
rates, we observe that the HR provides a treatment difference across the entire study of 0.41 under
proportional hazards, which increases up to 0.68 with a 4-month delay. Regarding the RMST
ratios, the one using the minimax event time provides an estimated treatment difference of
0.62 under proportional hazards, increasing up to 0.83 with a 4-month delay. The RMST ratio
that uses the minimax observed time provides an estimated treatment difference of 0.59 under
proportional hazards that increases up to 0.78 with a 4-month delay. The adjusted HR provides a
maximal treatment difference of 0.42 under proportional hazards, which increases up to 0.54 with
a 4-month delay.
Figure 2 contains the results of scenario 2 where, under proportional hazards, the hazard ratio
is equal to 0.667 (i.e., the maximal treatment difference is 0.667). Overall we can see that, just
like in scenario 1, in terms of power the weighted log-rank test with the parameter combination
(ρ = 0, γ = 1) is the test with the highest power as the delay increases. When the dropout rate is
equal to 1% and the delay is equal to 0 (i.e., under proportional hazards), the log-rank test has
a power of 90% and is the most powerful test. The RMST-based test (minimax observed time)
performs slightly worse, but outperforms both the RMST-based test (minimax event time) and
the weighted log-rank test. However, when the delay starts to increase, the weighted log-rank test
achieves the highest power, outperforming the log-rank test and the RMST-based tests.
The weighted log-rank test remains the most powerful test also when the dropout rate increases
to 3%. However, it is interesting to point out that, as the dropout rate and the delay increase, in
this scenario, which has a smaller treatment difference between arms than scenario 1 (i.e.,
the maximal treatment difference in scenario 1 is 0.4 and in scenario 2 is 0.667), both RMST-based
tests become more powerful than the log-rank test. This conclusion is in line with the conclusions
made by [4].
Figure 4 contains the estimated treatment differences in scenario 2. With a dropout
rate of 1%, we observe that the HR provides a treatment difference across the entire study of 0.67
under proportional hazards, which increases up to 0.81 with a 4-month delay. Regarding the RMST
ratios, the one using the minimax event time provides an estimated treatment difference of 0.75
under proportional hazards, increasing up to 0.86 with a 4-month delay. The RMST ratio that uses
the minimax observed time provides an estimated treatment difference of 0.73 under proportional
hazards that increases up to 0.85 with a 4-month delay. The adjusted HR provides a maximal
treatment difference of 0.67 under proportional hazards, which increases up to 0.73 with a 4-month
delay. With a dropout rate of 3%, we observe that the HR provides a treatment difference across
the entire study of 0.67 under proportional hazards, which increases up to 0.81 with a 4-month
delay. Regarding the RMST ratios, the one using the minimax event time provides an estimated
treatment difference of 0.71 under proportional hazards, increasing up to 0.81 with a 4-month
delay. The RMST ratio that uses the minimax observed time provides an estimated treatment
difference of 0.71 under proportional hazards that increases up to 0.81 with a 4-month delay. The
adjusted HR provides a maximal treatment difference of 0.67 under proportional hazards, which
increases up to 0.72 with a 4-month delay.
Overall, the simulations performed in this article provide the following conclusions:
• In line with [5], the weighted log-rank test with parameters (ρ = 0, γ = 1) is the method that
provides the highest power in a setting with delayed effects.
• In line with the conclusions presented by [8], the adjusted HR that is linked to the weighted
log-rank test with parameters (ρ = 0, γ = 1) captures very well the maximal treatment
difference between two treatment arms in the presence of delayed effects.
• In line with the conclusions from [4], the RMST-based test using the minimax observed
time outperforms the log-rank test in terms of power in the presence of delayed effects and
increasing dropout rates. However, the HR yields a treatment difference across the entire
study that is closer to the maximal treatment difference than the RMST ratios.
• Even though the HR and the RMST ratios do not aim to estimate the maximal treatment
difference, when used for this purpose, as is done in current practice, their estimate of the
maximal treatment difference is, by far, not as good as the one provided by the adjusted HR.
Figure 1: Empirical power in scenario 1 for the log-rank test, the RMST-based test (minimax event
time), the RMST-based test (minimax observed time) and the weighted log-rank test with the
parameter combination (ρ = 0, γ = 1), with dropout rates of 1% and 3%.
Figure 2: Empirical power in scenario 2 for the log-rank test, the RMST-based test (minimax event
time), the RMST-based test (minimax observed time) and the weighted log-rank test with the
parameter combination (ρ = 0, γ = 1), with dropout rates of 1% and 3%.
Figure 3: Treatment difference estimations in scenario 1 using the HR, the adjusted HR and the
RMST ratios, with dropout rates of 1% and 3%.
Figure 4: Treatment difference estimations in scenario 2 using the HR, the adjusted HR and the
RMST ratios, with dropout rates of 1% and 3%.
Figure 5: Overall survival Kaplan-Meier curves of the phase 3 randomized study in patients with
relapsed or refractory, CD22-positive, Philadelphia chromosome (Ph)-positive or Ph-negative acute
lymphoblastic leukemia. A total of 326 patients were randomized 1:1 to receive either inotuzumab
ozogamicin (inotuzumab ozogamicin group) or standard intensive chemotherapy (standard-therapy
group) (source: [6]).
It is not the objective of this article to assess the results of this particular clinical trial. Its only
purpose is to show the performance of the methodology used in this article in a real setting.
5 Discussion
Nowadays, it is quite common to find studies where the proportional hazards assumption does not
hold (i.e., with the use of novel cancer therapies such as targeted therapies or immunotherapies).
However, despite the fact that the HR lacks interpretability under non-proportional hazards, it
is still the standard method to quantify treatment differences.
In this article we present a comparison between the log-rank test, the weighted log-rank test with
parameters (ρ = 0, γ = 1) and the RMST-based tests, together with their linked treatment difference
estimates (i.e., the HR, the adjusted HR and the RMST ratios), which are widely used in clinical trials with delayed
effects. This article represents an extension of the work done by [4]. In that article, a
comparison is made between the log-rank and RMST-based tests (and their linked HR and RMST ratios).
That comparison concludes that RMST ratios not only better capture the treatment differences but
are also interpretable, since they do not rely on the proportional hazards assumption. The
comparison is done in a wide variety of scenarios, including non-proportional hazards. However,
we believe that under non-proportional hazards, the weighted log-rank test and its linked HR are
much more appropriate than both the HR and the RMST ratios.
We implement all these methods (i.e., the log-rank test, the weighted log-rank test with parameter
values (ρ = 0, γ = 1), the RMST-based tests, and their linked HRs and RMST ratios) in two scenarios
with delayed effects and different dropout rates. From these simulations we conclude that under
non-proportional hazards scenarios where a late separation of the survival curves is observed, the RMST-based
test has better performance than the log-rank test in terms of power when the truncation
time is reasonably close to the tail of the observed Kaplan-Meier curves. However, the weighted
log-rank test with parameters (ρ = 0, γ = 1) outperforms both the RMST-based tests and the log-rank
test. In terms of treatment effect quantification, the HR linked to the weighted log-rank test is the
measure that performs best under non-proportional hazards.
The estimation of the treatment effect is also a key component of the analysis of a clinical trial.
The RMST-based tests do not rely on any model assumptions and hence their interpretation is
straightforward. In contrast, under non-proportional hazards the HR varies with time, and its estimated
value cannot be interpreted as the average HR across times. The RMST can capture the entire
event-free distribution and hence is able to provide a clinically meaningful summary of the group
differences in a randomized study.
However, we believe that the HR linked to the weighted log-rank test also provides a good summary
of the group differences by giving the maximal treatment difference observed along the entire trial,
which can be easily interpreted under non-proportional hazards. Moreover, it does not require
specifying any truncation time, unlike the RMST ratio. From our point of view this is a clear advantage
with respect to the RMST ratio because, if we design a study with the RMST as the primary analysis
powered to detect a meaningful difference between the two RMSTs, the selection of the truncation
time cannot be based on the minimax event time or the minimax observed time when data are not
available. Instead, this truncation time should be a fixed timepoint. This time window has to be
large enough and expected to capture most of the survival curves for the RMST to be used as an
adequate global summary statistic. However, we believe that the maximal treatment difference is
only useful in scenarios where there is a late separation between Kaplan-Meier curves. It would not
make sense to provide this measurement, for example, in scenarios with crossing Kaplan-Meier curves.
Therefore, under non-proportional hazards with late separation, we agree with [4] in that the RMST
curves, as well as the related ratios, are easy to interpret, are clinically meaningful to characterize
the treatment effect over time, and have a clear advantage over the HR and the log-rank test.
However, we have shown that the weighted log-rank test with parameters (ρ = 0, γ = 1) outperforms
the RMST-based tests in terms of power, and that its linked HR provides a treatment difference
summary that can also be very useful in the presence of delayed effects.
Disclaimer
The views and opinions expressed in this article are those of the author and do not necessarily
reflect the official policy or position of Novartis Pharma A.G.
References
[1] Per Kragh Andersen and Richard D Gill. Cox’s regression model for counting processes: a
large sample study. The annals of statistics, pages 1100–1120, 1982.
[2] David R Cox. Regression models and life-tables. Journal of the Royal Statistical Society: Series
B (Methodological), 34(2):187–202, 1972.
[3] Thomas R Fleming and David P Harrington. A class of hypothesis tests for one and two
sample censored survival data. Communications in Statistics-Theory and Methods, 10(8):763–
794, 1981.
[4] Bo Huang and Pei-Fen Kuan. Comparison of the restricted mean survival time with the hazard
ratio in superiority trials with a time-to-event end point. Pharmaceutical statistics, 17(3):202–
213, 2018.
[5] José L Jiménez, Viktoriya Stalbovskaya, and Byron Jones. Properties of the weighted log-rank
test in the design of confirmatory studies with delayed effects. Pharmaceutical statistics, 2018.
[6] Hagop M Kantarjian, Daniel J DeAngelo, Matthias Stelljes, Giovanni Martinelli, Michaela
Liedtke, Wendy Stock, Nicola Gökbuget, Susan O’Brien, Kongming Wang, Tao Wang, et al.
Inotuzumab ozogamicin versus standard therapy for acute lymphoblastic leukemia. New Eng-
land Journal of Medicine, 375(8):740–753, 2016.
[7] John Lawrence. Strategies for changing the test statistic during a clinical trial. Journal of
biopharmaceutical statistics, 12(2):193–205, 2002.
[8] Ray S Lin and Larry F León. Estimation of treatment effects in weighted log-rank tests.
Contemporary clinical trials communications, 8:147–155, 2017.
[9] Patrick Royston and Mahesh KB Parmar. Restricted mean survival time: an alternative to
the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome.
BMC medical research methodology, 13(1):152, 2013.
[10] David Schoenfeld. The asymptotic properties of nonparametric tests for comparing survival
distributions. Biometrika, 68(1):316–319, 1981.
[11] David A Schoenfeld et al. Sample-size formula for the proportional-hazards regression model.
Biometrics, 39(2):499–503, 1983.
[12] Yang Wang, Haiyan Wu, and Keaven Anderson. nphsim: Simulation and power calculations
for time-to-event clinical trials. https://github.com/keaven/nphsim/.