PhUSE 2008
Paper ST03
A gentle introduction to survival analysis
Angelo Tinazzi, SENDO-Tech S.r.l., Milan, Italy
Martin Scott, Numerus Ltd, Tuebingen, Germany
Anna Compagnoni, SENDO-Tech S.r.l., Milan, Italy
ABSTRACT
Time-to-event analyses are commonly used in clinical trials. Survival analysis is the standard tool for analyzing
the time to the occurrence of an event of interest, which can be adverse, positive or neutral. Survival
data require special techniques because the event of interest does not necessarily occur for all patients before the
end of the study. Survival methodology has the advantage of using all available data to estimate the
survival probability, including data from patients who did not experience an event.
This paper is a tutorial addressed to statistical programmers who are required to produce derived datasets containing
time-to-event endpoint variables and summary tables containing survival data estimates (e.g. median survival). The
paper is composed of two main sections.
In the first part, the main elements of survival analysis theory will be introduced:
• the Kaplan-Meier method for survival probability estimation;
• the log-rank test;
• the hazard ratio calculation;
• the interpretation of a Cox regression model.
In the second part, the main programming aspects of survival analysis will be discussed:
• creating derived datasets containing time-to-event endpoints;
• working with longitudinal data (repeated measures, such as follow-up);
• the use of the main SAS® procedures for survival analysis (e.g. PROC LIFETEST and PROC PHREG);
• extracting the most relevant information (e.g. median survival) from a SAS output using the ODS for creating
ad-hoc summary tables.
INTRODUCTION
In many clinical studies, either experimental or observational, the time until the occurrence of an event is the main
outcome of interest; such an event can be either adverse, such as death or disease progression or recurrence of a
tumor, or positive, such as birth or disease remission, or neutral, such as cessation of breast feeding or the completion
of a task. In all these cases, it is conventional to talk about survival endpoints; the term survival may be misleading
because the techniques are applicable to any well-defined event although traditionally death was the event of interest.
Events in survival analysis are defined by a ‘transition’ from one discrete state to another at an instantaneous moment
in time.
The typical feature of this kind of data is that not all the subjects will have experienced the event at the end of the
follow-up period. This is called censoring, meaning that the observation period ended without observing the event of
interest, either because the subject has not yet experienced the event by the end of follow-up period, or because the
patient is lost to follow-up, or because the subject experiences a different event that makes further follow-up
impossible.
It is not possible in these cases to know if and when the subject would have experienced the event and we refer to this
as right censoring meaning that a subject’s follow-up stops before observing the event; while left censoring refers to
when the event has occurred before the subject has entered the study.
Classical statistical methods, such as T-test or Chi-square Test, are not appropriate to analyze survival data because
they are not conceived to take censored data into account. All subjects (including those with censored observations)
provide useful information for the trial, and should therefore not be excluded from the analysis. On the other hand,
considering the censoring time as a true survival time for these subjects would lead to a biased estimate of their
survival time and probability. For these reasons there is the need to use a set of specific statistical methods to handle
survival data.
As an example, consider the following two clinical study endpoints:
• the risk of dying in the Intensive Care Unit (ICU) within 30 days of admission;
• the survival rate of advanced breast cancer patients at 5 years from the diagnosis date.
In the first case you can describe the endpoint with a simple percentage (events/total), possibly compared between
groups using the chi-square test, for two main reasons:
• the required observation period is short (30 days), so there is practically no risk of losing patients to follow-up;
• at the end of the 30-day period, a patient is either dead or alive (discharged or not).
In the second case a survival analysis approach is more appropriate, for the following reasons:
• with a long required follow-up (5 years), there is a high risk of losing patients (lost to follow-up);
• this kind of study usually requires a long accrual period (sometimes 2 to 5 years), so at an interim analysis
not all patients will have had enough time to reach the 5-year survival endpoint alive;
• with this “chronic” disease, the time at which the event is observed (e.g. 1 year vs 3 years) makes a
substantial difference; simply counting how many patients died in the 0-5 year period, possibly after
removing patients lost to follow-up, is therefore not appropriate.
INCOMPLETE DATA
In clinical practice subjects are not all enrolled at the same time, so the follow-up period varies from one
subject to another. For this reason calendar-time data are converted to time-to-event data, which is a more
appropriate format for survival analysis.
The starting point (e.g. entry into the study or time of diagnosis) is set to time=0 for all enrolled subjects,
and all the possible outcomes are coded according to the event of interest.
Figure 1: Example of Survival Data.
R = Relapse; D = Death from ovarian cancer; Do = Death from other cause;
A = Alive; L = Loss to Follow-up; x = Death; = Censored
PART I: STATISTICAL METHODS
MAIN ASSUMPTIONS
To ensure a correct estimate of the survival probability in presence of censored data, three main assumptions are
necessary.
First, we assume that subjects who are censored have the same survival distribution as those who continue the
observation during the study: that is to say, the censoring process is independent of the primary endpoint. If the reason
for censoring is due to negative treatment effects or to the worsened subject condition, this censoring would lead to a
biased estimate of the survival probability.
The second assumption derives from the fact that subjects who first joined the study have a longer observation period
and thus have more chances to experience the event than subjects enrolled at the end of recruitment period. We
assume that survival probability is the same for subjects enrolled at the beginning and at the end of recruitment period.
Third, for some kinds of events of interest, such as tumor recurrence or progression, it is not possible to identify the
exact date when the event happens, because the only information available is that the event happened between two
subsequent examinations. In this case we need to assume that the event happens at the time it is detected, otherwise we
would obtain a biased estimate of the survival times.
CALCULATION OF SURVIVAL PROBABILITY: THE KAPLAN-MEIER METHOD
Survival data are described in terms of two probabilities: survival and hazard.
The first is the probability that an individual survives from the starting time until a specified future time t.
To estimate the proportion of subjects surviving at a given time point, and hence the survival probability at that time for
the population from which the sample is drawn, the Kaplan-Meier method [1][2][3][4][5][6][7], also called the
product-limit estimator, is commonly used because it can handle censored observations.
The simple idea underlying this method is that to survive k intervals from the start of the study, it is necessary to
survive each of the previous intervals, and then also the k-th one. This principle allows working with conditional
and cumulative probabilities.
The conditional survival probability for the generic time interval (on the condition that the subject survived to the
beginning of the interval, and hence through all the previous ones) is given by:
$$p_i = \frac{r_i - d_i}{r_i}$$
where $r_i$ is the number alive and at risk at the beginning of the i-th interval, and $d_i$ the number of failures during
the same interval. Observations censored at any given time influence the number at risk at the start of the subsequent
time interval.
Survival at any time point is calculated as the product of the conditional probabilities of surviving each previous time
interval:
$$S(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{r_i}\right)$$
Figure 2 is an example of survival probability calculation, derived from a SAS output referred to time to progression
data (time expressed in weeks).
Figure 2: Survival probabilities Calculation
At week 0, all enrolled subjects (17) are alive and at risk to experience the event of interest (progression). Survival
probability is thus set to 1 (see footnote 1).
At week 6, a censoring occurs (but number failed is still 0): it does not modify the survival and failure probabilities, but
the number of subjects at risk for the subsequent interval decreases by one (number left = 16).
At week 10, two events occur, thus all the statistics are modified: the number failed increases by two, while the number
left still at risk for the subsequent interval reduces to 14. Survival probability for the interval [time 0 to time 10] is then
equal to (16-2)/16=0.8750.
A new event occurs at week 16, together with two censored observations; the survival probability for the interval [week
10 to week 16] is equal to (14-1)/14=0.9286. To calculate the cumulative probability for a subject to survive from the
start of the study until week 16 it is now necessary to multiply the survival probabilities for each time interval.
Cumulative survival probability until time 16 is computed as 0.9286 x 0.8750 = 0.8125.
At week 17 a new event occurs: the survival probability for the time interval [week 16 to week 17] is computed as
(11-1)/11=0.9091.
Multiplying all the previous single probabilities, we obtain the probability for a subject enrolled at the
beginning of the study to survive without experiencing the event until week 17: 0.9091 x 0.9286 x 0.8750 = 0.7386
(computed from the unrounded fractions, 10/11 x 13/14 x 14/16 = 130/176).
The same process can be followed until week 44, after which no more events occur and the survival probability
no longer changes.
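The steps above can be reproduced by hand in a short DATA step. This is only a sketch for the first four time points described in the text (weeks 6 to 17); the dataset and variable names (km_hand, at_risk, surv) are illustrative and do not come from the study database.

```sas
/* Hand calculation of the product-limit estimate for weeks 6 to 17.
   One input record per distinct time: week, number of events, number censored. */
data km_hand;
    retain at_risk 17 surv 1;          /* 17 subjects at risk at week 0, S(0) = 1 */
    input week events censored;
    /* conditional survival at this time point is (r - d) / r;
       the cumulative estimate changes only when an event occurs */
    if events > 0 then surv = surv * (at_risk - events) / at_risk;
    /* events and censorings both leave the risk set for the next time point */
    at_risk = at_risk - events - censored;
    datalines;
6 0 1
10 2 0
16 1 2
17 1 0
;
run;

proc print data=km_hand noobs;
    var week events censored at_risk surv;
    format surv 6.4;
run;
```

After each record, at_risk holds the number left (16, 14, 11, 10) and surv the cumulative survival probability (1.0000, 0.8750, 0.8125, 0.7386), matching the hand calculation.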
SURVIVAL PROBABILITIES CALCULATION: THE LIFETABLE METHOD
In some situations it is not possible to know or to record the exact time when the event occurs; the only available
information is that the event occurred in a certain time interval. In this case the survival probability cannot be updated
every time an event happens, but once in a fixed time unit: the derived survival curve and probability are called life-
table or actuarial estimates.
With this method, the whole period of observation is divided into a series of time intervals, under the basic assumption
that censorings are uniformly distributed within each interval, so that the average number of subjects at risk during each
interval is given by:
$$n'_j = n_j - \frac{c_j}{2}$$
where $n_j$ is the number of subjects alive and at risk at the beginning of time interval j, $c_j$ is the number of
censored observations in the same interval, and $n'_j$ is the resulting average number at risk.
As previously seen with the Kaplan-Meier method, the cumulative probability of surviving until time t is given by the
product of the probabilities of surviving each previous time interval and also the last one including t:
$$S(t) = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n'_j}\right)$$
where $d_j$ is the number of deaths or events observed in time interval j.
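As a sketch of how life-table estimates can be requested in SAS, PROC LIFETEST accepts METHOD=LIFE together with the interval boundaries. The dataset and variable names follow those used in Part II of this paper; the 12-month interval width is an assumption chosen only for illustration.

```sas
/* Life-table (actuarial) estimates instead of Kaplan-Meier.
   lympho1, ttp, censor and group follow the naming used in Part II;
   the interval boundaries (in months) are illustrative. */
proc lifetest data=lympho1 method=life intervals=(0 to 60 by 12);
    time ttp*censor(0);      /* censor=0 marks censored observations */
    strata group;
run;
```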
THE GRAPHICAL REPRESENTATION
Survival probability can be plotted against survival time to obtain a survival curve: a non-increasing step function
with a step at each time an event is observed. Steps are used because the estimated survival curve remains at a
plateau between successive event times and drops instantaneously to a new level at each event time. Times when a
censoring happens are usually indicated with a small circle or a short vertical line. The plot will reach ‘0’ only if
the patient with the longest observed time has in fact died. Were such a patient still alive, the curve would have a
plateau commencing at the time of the last event and continuing until the censored observed time of this longest
‘surviving’ patient.
In certain circumstances a graph of 1-S(t), rather than S(t), is plotted against time to give the cumulative ‘event’ curve.
From the graphical representation of a survival curve it is possible to extract relevant summary statistics, mainly the
median survival time and the survival probability at a certain time point.
The median survival time is defined as the time point at which half of the subjects have experienced the event of interest,
and can be easily determined by drawing a line (the red line in Figure 3) from the value 0.5 on the vertical axis
until it reaches the survival curve, and then from the curve down to the horizontal time axis. The exact value of the median
survival time can be calculated with a proportion involving the length of the axis and the maximum follow-up time
observed. Similarly you can extract the median from the estimates (Figure 2) by searching for the first estimate at or below
0.50; if 0.50 (or below) is not reached, we say “the median survival has not yet been reached”. This
situation can occur in cohorts with a low event rate or in an early (interim) analysis. In the
same way you can extract other percentiles of interest, typically the 25th and the 75th, by searching for the point at which
the curve reaches the corresponding level (0.75 and 0.25 respectively).
Footnote 1: It can also be expressed as a percentage (see Figure 6).
The survival probability at a certain time point (e.g. 20 months, Figure 3) can be calculated in a similar way: as shown
by the blue line in Figure 3, draw a line from the chosen point on the horizontal time axis up to the
survival curve, and then across to the vertical survival probability axis. Once again, the exact probability can be deduced
using a proportion involving the length of the vertical axis and the maximum value of the survival probability.
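Rather than reading the value off the plot, the estimate at a fixed time point can be requested directly. The sketch below uses the TIMELIST= option of PROC LIFETEST; it assumes the dataset and variables introduced in Part II and that time is recorded in months.

```sas
/* Kaplan-Meier estimate at a chosen time point (here 20 months).
   lympho1, ttp, censor and group follow the naming used in Part II. */
proc lifetest data=lympho1 timelist=20;
    time ttp*censor(0);
    strata group;
run;
```

The median and the other quartiles are printed by default in the quartile summary of the same procedure, so no additional option is needed for them.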
It is difficult to judge precisely when the right-hand tail of the survival curve becomes unreliable. However, as a rule of
thumb, the curve can be particularly unreliable when the number of patients remaining at risk is less than 15 or, more
generally, less than 5%.
Another important statistic you may be required to provide is the median follow-up. The median follow-up is an
indicator of how ‘mature’ your data are, giving an estimate of the length of the observation period (e.g. how many
months ‘on average’ the patients have been followed since enrollment). To calculate the median follow-up
we can use the survival analysis approach, this time censoring the patients who experienced the event. The reason is
that you do not know how long you could have followed a patient had the event not occurred.
Figure 3: Survival Curve with Censored Observations: Median Survival (red line) Time and
Survival Probability at a Fixed Time Point (blue line).
COMPARING SURVIVAL CURVES
An interesting issue in the clinical practice is the comparison of the survival functions of two (or more) groups of
subjects.
Classical statistical methods such as T-test or Chi-square Test are not appropriate to compare the proportion of
survivors for the reasons described before.
Alternatively, one could compare the proportions surviving at one or more pre-specified time points, but this approach
ignores the total survival experience of the groups during the whole follow-up period. Moreover, the time points are
arbitrarily chosen.
The most common method used to compare survival curves is a statistical hypothesis test called the log-rank test [8],
which takes the whole follow-up period into account and does not require any assumption about the distribution of the
survival times, which are rarely normally distributed and are often skewed, typically comprising many early events
and only a few late ones.
The null hypothesis for the log-rank test is that there is no difference between the survival of the two or more populations
being compared (i.e. the probability of the event of interest occurring at any time point is the same for each
population).
[Figure 3 annotations: the median survival is about 63 months; at 60 months the survival probability is about 0.58.]
The comparison is based on the difference between the observed number of events in each group and the number of
events expected if there were no difference between the groups:
$$\chi^2_{\text{log-rank}} = \sum_g \frac{(O_g - E_g)^2}{E_g}$$
where $O_g$ is the total number of observed events in group g, and $E_g$ is the total number of expected events in
group g. O and E are calculated at each time when an event happens; if a survival time is censored, the subject is
considered to be at risk during the interval of censoring, but no longer in the subsequent intervals.
The test statistic is then compared with a $\chi^2$ distribution with degrees of freedom equal to the number of groups
minus one.
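To make the calculation of the expected events explicit, the standard construction can be written as follows. The symbols $r_{gt}$, $r_t$ and $d_t$ are introduced here only for illustration and are not part of the original notation: $r_{gt}$ is the number at risk in group g just before event time t, $r_t$ the total number at risk across all groups, and $d_t$ the total number of events at time t.

```latex
E_{gt} = r_{gt} \cdot \frac{d_t}{r_t},
\qquad
E_g = \sum_t E_{gt},
\qquad
O_g = \sum_t O_{gt}
```

Under the null hypothesis the $d_t$ events at each time are expected to be shared among the groups in proportion to their numbers at risk, which is exactly what $E_{gt}$ expresses.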
The log-rank test assigns equal weight to each event at whatever time it occurs.
If researchers believe that differences between certain parts of the survival curves being compared are of greater
interest than others, then different weights can be assigned to the deaths, depending on the time when they happen.
For example, if it were known that a new treatment is helpful to avoid early mortality, then it would be useful to compare
the first part of the survival curves, without paying too much attention to the subsequent time intervals.
A common way to achieve this aim is to construct a statistical test using as weights the number of patients at risk at each
time point: this test is a variant of the log-rank test known as the Gehan, generalized Wilcoxon or Breslow test,
and can be calculated as follows:
$$\chi^2_G = \sum_g \frac{\left[\sum_t R_t\,(O_{gt} - E_{gt})\right]^2}{\sum_t R_t^2\,E_{gt}}$$
where $R_t$ is the number of patients at risk at time point t, and $O_{gt}$ and $E_{gt}$ are the observed and expected
numbers of events in group g at time point t.
Note that these are tests of significance, and do not provide an estimate of the size of the differences in survival
probabilities between the groups.
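In SAS both tests are available from PROC LIFETEST. The following is a minimal sketch, assuming the dataset and variables used in Part II and a SAS release in which the STRATA statement supports the TEST= option; the output dataset name homtests is illustrative.

```sas
/* Log-rank and (generalized) Wilcoxon tests of homogeneity across groups.
   HomTests is the ODS table holding the test statistics and p-values. */
ods output HomTests=homtests;
proc lifetest data=lympho1 notable;      /* notable: suppress the estimates table */
    time ttp*censor(0);
    strata group / test=(logrank wilcoxon);
run;

proc print data=homtests noobs;
run;
```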
COX REGRESSION MODEL AND HAZARD RATIO
The methods described so far, Kaplan-Meier and life-table, analyze survival data using a univariate approach.
To approach survival analysis in a multivariate way, the Cox model is used [9][10]. The Cox model is a semi-parametric
regression model used to test the effect of a set of covariates X1, X2, …, Xn on the time-to-event variable; the model for
the hazard function is the following:
$$h(t) = \lambda(t) \cdot \exp\{\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n\}$$
where h(t) represents the hazard function and $\lambda(t)$ is an unspecified initial (baseline) hazard function.
The hazard is a function of time and represents the instantaneous probability that the event of interest will occur at a
specified time t given that it has not occurred prior to time t.
Hazard is defined as follows:
$$h(t) = \lim_{\Delta t \to 0} \frac{\text{observed events}(t) / N(t)}{\Delta t}$$
where N(t) is the number of subjects at risk at the beginning of the time interval.
The Cox model (that, as with the Kaplan-Meier method, also adjusts for censored data) can include one or more
numeric covariates (which can be also time-dependent) and does not make any assumption on the distribution of the
event times.
The main assumption of the Cox proportional hazards regression model is that the hazard ratio between the treatment
groups remains constant over time, even if, as it is reasonable to assume, the hazard rate does change over time.
D. R. Cox, after whom the model is named, developed a method to estimate the coefficients $\beta_i$ by maximizing a
partial likelihood, which is a modification of the maximum likelihood method.
Chi-square tests are performed to evaluate the null hypothesis that each parameter $\beta_i$ is equal to zero. A positive
parameter estimate indicates that the hazard increases with increasing values of the covariate, while a negative
estimate shows that the hazard decreases with increasing values of the covariate.
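The link between a coefficient and the hazard ratio follows directly from the model equation. As a short worked derivation, consider two subjects who differ only in the first covariate ($x_1 = 1$ versus $x_1 = 0$); the conditional notation $h(t \mid x_1)$ is introduced here for illustration only.

```latex
\frac{h(t \mid x_1 = 1)}{h(t \mid x_1 = 0)}
  = \frac{\lambda(t)\,\exp\{\beta_1 \cdot 1 + \beta_2 x_2 + \dots + \beta_n x_n\}}
         {\lambda(t)\,\exp\{\beta_1 \cdot 0 + \beta_2 x_2 + \dots + \beta_n x_n\}}
  = \exp\{\beta_1\}
```

The baseline hazard $\lambda(t)$ cancels, which is why the ratio does not depend on time (the proportional hazards assumption). For instance, a coefficient of about -0.274 corresponds to a hazard ratio of $\exp\{-0.274\} \approx 0.76$.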
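The magnitude of the effect of a specific covariate on the event time is often expressed as a hazard ratio [11]: this is an
estimate of the relative survival experience of two groups and can be calculated as
$$HR = \frac{O_e / E_e}{O_c / E_c}$$
where O and E are the observed and expected numbers of events, and the subscripts e and c denote the experimental
and control groups.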
If the hazard ratio indicates a beneficial effect of the treatment under study, this implies that the time to the observed
endpoint was modified by treatment.
The hazard ratio derived from the Cox regression model does not translate directly into information about the duration
of time until the event occurs, but it can be expressed as a relative risk reduction, i.e. the
difference between the risk in the control group (set to 1) and the risk in the treated group (the hazard ratio itself).
The relative risk reduction is thus defined as 1 – HR, and can be expressed either as a proportion or as a percentage.
PART II: STATISTICAL PROGRAMMING ASPECTS
ILLUSTRATIVE STUDY: OVARIAN CANCER STUDY
This data set refers to a multicenter, phase III, randomized clinical trial involving women with epithelial ovarian
carcinoma randomized to either receive a more (Lymphadenectomy) or less (No Lymphadenectomy or Surgery Only)
invasive surgery; the main end-points of the study included progression-free and overall survival [12].
427 eligible patients were enrolled: 211 were randomly assigned to the surgery only arm, while 216 to the
lymphadenectomy arm.
After a median follow-up of 68.4 months, tumor recurrence was observed in 292 patients (153 in the surgery alone arm,
and 139 in the lymphadenectomy arm).
Figure 4 shows the survival curve for these data.
Figure 4: Result of survival analysis for the ovarian cancer trial.
The numbers of patients still at risk at each time point for the treated and control groups are shown below the horizontal
axis; you can also decide to add the number of events that occurred in each interval (e.g. between year 1 and year 2).
The unadjusted hazard ratio for the risk of a first event in the lymphadenectomy group, as compared with the surgery
alone group, was 0.76 (95% CI 0.60 to 0.96), meaning that the risk of disease recurrence is lower in patients
treated with lymphadenectomy than in the control group.
The relative reduction in the risk of a first event for the treated group is equal to (1-0.76)*100=24%. The difference
between the survival curves of the two groups is statistically significant (P=0.022 using the log-rank test).
Median progression-free survival was 22.4 months for patients in the control arm and 27.4 months for patients in the
systematic lymphadenectomy arm, confirming the improvement in progression-free survival for the patients treated
with additional lymphadenectomy.
These results can be obtained using PROC LIFETEST. PROC LIFETEST computes non-parametric estimates of the
survival distribution and rank tests for association of the response variable with other variables. The survival estimates
are computed within defined strata levels and the rank tests are pooled over the strata and are therefore adjusted for
strata differences. Additionally, statistics testing the homogeneity of the strata are computed.
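proc lifetest data=lympho1 plot=(s);   /* plot=(s): survival curve */
   time ttp*censor(0);                 /* ttp = time to progression; censor=0 marks censored observations */
   strata group;                       /* groups to be compared */
run;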
The PLOT= option requests the survival curve for the data considered. The TIME statement specifies the variable
for time to progression (ttp) and the variable indicating censoring (censor); the value in round brackets (0) is the
one associated with censored observations. The STRATA statement specifies the variable identifying the groups to be
compared (group). By default the estimates are calculated using the Kaplan-Meier method.
The same code can be used to calculate the median follow-up by changing the censoring value from ‘0’ to ‘1’, if ‘1’
indicates the occurrence of the event of interest.
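A minimal sketch of this "reversed censoring" calculation is shown below. It assumes that censor=1 indicates the event of interest, as stated above, and saves the quartile summary in an illustratively named dataset (fup_quartiles); the median follow-up is the 50% row of that table.

```sas
/* Median follow-up: same TIME variable, but the censoring value is switched,
   so that patients with the event (censor=1) are now treated as censored. */
ods output Quartiles=fup_quartiles;
proc lifetest data=lympho1;
    time ttp*censor(1);
run;

proc print data=fup_quartiles noobs;
run;
```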
Cox proportional hazards analysis was performed to adjust the treatment comparison for baseline characteristics:
results are shown in Figure 5.
When histologic grade and residual disease were taken into account, the adjusted hazard ratio for occurrence of first
event was almost unchanged (0.75 [95% CI = 0.59 to 0.94]), confirming that the 25% reduction in the risk of a first
event for patients treated with additional lymphadenectomy depends mostly on the treatment received. These results
can be obtained using PROC PHREG.
Figure 5: Multivariable Cox proportional hazards analysis for PFS
PROC PHREG performs regression analysis of survival data based on the Cox proportional hazards model.
By adopting a partial likelihood function, Cox regression eliminates the unknown baseline hazard and accounts for
censored survival times. It also allows the use of time-dependent explanatory variables, i.e. variables whose values for
any given individual can change over time.
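proc phreg data=lympho2;
   /* response: ttp with censor=0 marking censored observations;
      covariates: treatment group, histologic grade, residual disease */
   model ttp*censor(0)=group grade residual;
run;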
The MODEL statement specifies the dependent (ttp, censor) and independent (group, grade, residual)
variables used to fit the regression model; the hazard ratios are automatically calculated.
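If the confidence limits of the hazard ratios are also needed, as reported in Figure 5, one option is to add RISKLIMITS to the MODEL statement and save the parameter estimates through ODS. This is a sketch rather than the code actually used for the study; the output dataset name cox_est is illustrative.

```sas
/* Cox model with confidence limits for the hazard ratios;
   the ParameterEstimates ODS table is saved for later reporting. */
ods output ParameterEstimates=cox_est;
proc phreg data=lympho2;
    model ttp*censor(0) = group grade residual / risklimits;
run;

proc print data=cox_est noobs;
run;
```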
EXTRACTING SUMMARY STATISTICS
Footnote 3: To find out which ODS table holds a given portion of the output of the SAS procedure you are
using, submit the statement ODS TRACE ON before the procedure and the statement ODS
TRACE OFF when you want to stop ODS tracing. The names of the ODS tables generated by the SAS procedure
are reported in the LOG file.
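For example, tracing the LIFETEST run used earlier in this paper could look as follows (a sketch using the same dataset and variable names):

```sas
/* Write the names of the ODS tables produced by the procedure to the log */
ods trace on;
proc lifetest data=lympho1;
    time ttp*censor(0);
    strata group;
run;
ods trace off;
```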
Survival analysis procedures such as PROC LIFETEST can generate large outputs (the
example in Figure 2 grows considerably if the study has thousands of patients) of different types (life tables,
statistical tests). In this case a summary output such as the one in Figure 6 makes life easier for anyone who needs
to interpret the results (the statistician, medical writer, clinician, or principal investigator). The summary table in
Figure 6 reports the summary statistics describing the 'Time to Progression' across the different study groups:
• number of events;
• number of censored observations;
• survival estimates at 6, 9 and 12 months (time points of clinical interest) with their 95% confidence intervals;
• the 25th percentile, the median and the 75th percentile.
This summary table can be generated by extracting the statistics obtained by PROC LIFETEST using the ODS
OUTPUT statement. The ODS OUTPUT allows you to convert the output of a SAS procedure into a SAS dataset. For
example to extract the Product Limit Estimates we can use the following statements:
ods output ProductLimitEstimates=survest;
proc lifetest data=lympho1;
time ttp*censor(0);
strata group;
run;
ods output close;
where ProductLimitEstimates is the ODS table name and survest is the dataset in which the product-limit estimates will
be saved. Table 1 lists the ODS table names (see footnote 3) that can be accessed from PROC LIFETEST. Of course you
then need to work on the extracted dataset and produce the summary table using SAS procedures such as TABULATE (we
have, for example, developed a macro called SURVEST that ‘automatically’ produces the output in Figure 6).
| | All Doses (N=13) | Dose Level 1 (N=3) | Dose Level 2 (N=4) | Dose Level 3 (N=6) |
|---|---|---|---|---|
| Number Progressed/Died/Started New Anti-Tumor Therapy | 10 (76.9%) | 3 (100.0%) | 4 (100.0%) | 3 (50.0%) |
| Number Censored | 3 (23.1%) | 0 (0.0%) | 0 (0.0%) | 3 (50.0%) |
| Kaplan-Meier Estimates of the Proportion of Patients Without PD at: | | | | |
| 6 Mos | 46.7 | 66.7 | 0.0 | 66.7 |
| 9 Mos | 24.8 | 0.0 | 0.0 | 66.7 |
| 12 Mos | 19.8 | 0.0 | 0.0 | 66.7 |
| 25% Percentile of Time to Progression (Months) and its (95% CI) | 2.1 (2.0,3.1) | 2.0 (2.0,7.6) | 1.5 (0.8,3.1) | 2.1 (1.8,12.0) |
| Median Time to Progression (Months) and its (95% CI) | 4.8 (2.2,8.3) | 6.6 (2.0,7.6) | 2.6 (0.8,4.8) | 12.0 (2.1,12.0) |
| 75% Percentile of Time to Progression (Months) and its (95% CI) | 8.7 (6.3,13.7) | 7.6 (2.0,7.6) | 4.0 (2.2,4.8) | 12.0 (-,-) |
Figure 6: Survival Analysis Summary Table
CREATING THE TIME-TO-EVENT ANALYSIS DATABASE
Creating the analysis database (ADB) for time-to-event outcomes may seem an easy task; in theory you just need to
create two variables: the event (or censoring) variable and the time variable. However, due to the complexity of
clinical trial data collection tools (e.g. the CRF), it might not be possible to derive these two variables from only two source
variables. In other words, it is uncommon to have a form where the event and the time are directly reported (e.g.
event is 0 and time is 30 months); instead you need to derive them by accessing different forms (SAS datasets) across
different time points (e.g. visits, follow-up) to check whether and when a subject/patient has experienced an event.
Moreover, you also need to define what is to be considered ‘time 0’, e.g. the randomization date or the
date of first diagnosis. Whilst the event is in most cases something that can be detected without doubt (e.g.
death or disease progression), the time for censored patients is not.
For example, ‘overall survival’ is defined as the time from the randomization date, or whatever is considered ‘time 0’,
until the date of death. If the patient is still alive, the patient is censored at the last date he or she was known to be
alive (e.g. follow-up date, last phone contact, etc.). Similarly, the ‘time-to-progression’ endpoint commonly used in
oncology is calculated from the randomization date until a tumor progression is observed. If a patient did not
experience tumor progression, the patient is censored either at the last contact date or at the last date when a formal
tumor assessment was performed, which may or may not be the same as the last date the patient was seen alive.
As this specific situation may occur with other study endpoints in other therapeutic areas, we suggest carefully defining
the algorithm to be used, clearly detailing situations like the one just described.
An example of an ‘ADB derivation’ flow-chart can be found in the appendix.
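The overall survival rule described above can be sketched in a DATA step as follows. All dataset and variable names (patients, randdt, deathdt, lastalivedt, os_months) are hypothetical, and the conversion from days to months (adding one day and dividing by 30.4375) is just one common convention; the actual rules should come from the statistical analysis plan.

```sas
/* Hypothetical derivation of overall survival (OS).
   Input: one record per patient with SAS date variables
          randdt (randomization), deathdt (death, missing if alive)
          and lastalivedt (last date known alive). */
data adb_os;
    set patients;
    if not missing(deathdt) then do;
        censor = 1;                 /* 1 = event (death) observed */
        enddt  = deathdt;
    end;
    else do;
        censor = 0;                 /* 0 = censored at last date known alive */
        enddt  = lastalivedt;
    end;
    /* time from 'time 0' to event or censoring, in months (assumed convention) */
    os_months = (enddt - randdt + 1) / 30.4375;
    format enddt date9.;
run;
```

The coding of censor matches the TIME statement used elsewhere in this paper, e.g. time os_months*censor(0).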
| ODS Table Name | Contents |
|---|---|
| CensoredSummary | Number of events and censored observations |
| HomStats | Rank statistics for testing strata homogeneity |
| HomTests | Tests for strata homogeneity |
| LifetableEstimates | Lifetable survival estimates |
| LogForStepSeq | Forward stepwise sequence for the log-rank statistics for association |
| LogHomCov | Covariance matrix for the log-rank statistics for strata homogeneity |
| LogTestCov | Covariance matrix for the log-rank statistics for association |
| LogUniChisq | Univariate chi-squares for the log-rank statistics for association |
| Means | Mean and standard error of survival time |
| ProductLimitEstimates | Product-limit survival estimates |
| Quartiles | Quartiles of the survival estimates |
| WilForStepSeq | Forward stepwise sequence for the Wilcoxon statistics for association |
| WilHomCov | Covariance matrix for the Wilcoxon statistics for strata homogeneity |
| WilTestCov | Covariance matrix for the Wilcoxon statistics for association |
| WilUniChisq | Univariate chi-squares for the Wilcoxon statistics for association |
Table 1: ODS Tables Produced by PROC LIFETEST
ODS STATISTICAL GRAPHICS IN SAS VERSION 9.2
ODS Graphics is a new feature in SAS Version 9.2 (it was experimental in SAS Version 9.1), available in SAS/STAT,
Base SAS, SAS/ETS and other SAS products, which allows users to create statistical graphics automatically. Both
PROC LIFETEST and PROC PHREG support ODS Graphics. For example, PROC LIFETEST provides eight types of
ODS graphs. Figure 7 shows the default “Estimated Survival Curve” (ODS Graph Name “Survival”) generated
using the following SAS code:
ods graphics on;
proc lifetest data=surv plot=(s);
time ttp*censor(0);
strata group;
run;
quit;
Prior to version 9.2 and ODS GRAPHICS, a similar output could be produced using the following code:
proc lifetest data=surv plot=(s) graphics;
time ttp*censor(0);
strata group;
symbol1 line=1 color=blue;
symbol2 line=2 color=red;
run;
Figure 7: Graphical Survival Plot with ODS Graphics
META-ANALYSIS OF TIME-TO-EVENT DATA
Meta-analyses (MA) that combine the results of related randomised controlled trials have become a common and
widely accepted tool in the evaluation of health care interventions. Regulatory agencies such as the FDA have started
to consider them as key supporting evidence. Systematic reviews use explicit, objective and prospectively defined
methods to collect, critically evaluate and synthesize studies, making them less biased and more reproducible than
traditional reviews. A systematic review may or may not include a meta-analysis: a statistical pooling of the
results from individual studies to obtain a single overall estimate of treatment effect. Well-validated
methods are now also available for the meta-analysis of time-to-event data [13] [14].
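To make the idea of "statistical pooling" concrete for time-to-event data, one common approach (shown here only as a sketch; it is the generic inverse-variance fixed-effect estimator and not necessarily the exact procedure recommended in [13] [14]) combines the log hazard ratios of the k individual trials, weighting each trial by the inverse of the variance of its log hazard ratio:

```latex
\ln \widehat{HR}_{\mathrm{pooled}}
  = \frac{\sum_{i=1}^{k} w_i \,\ln \widehat{HR}_i}{\sum_{i=1}^{k} w_i},
\qquad
w_i = \frac{1}{\mathrm{Var}\!\left(\ln \widehat{HR}_i\right)},
\qquad
\mathrm{Var}\!\left(\ln \widehat{HR}_{\mathrm{pooled}}\right)
  = \frac{1}{\sum_{i=1}^{k} w_i}
```

Larger trials, which estimate the hazard ratio more precisely, therefore contribute more to the overall estimate.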
ADDITIONAL TECHNICAL REFERENCES
Apart from the methodological references listed in the ‘References’ section, we suggest the following
articles presented at various SAS user conferences:
• Smith T, Smith B. Kaplan Meier and Cox Proportional Hazard Modelling: Hands on Survival Analysis. WUSS. 2004
• Yeh S. ODS and Statistical Graphics. Paper PR02. NESUG17. 2004
• Zhou JC. Enhancement of Survival Graphs. Paper 239-28. SUGI28
• Xu W, Wang WWB. %KMPLOT – An enhancement of SAS PROC LIFETEST. Paper SP06. PharmaSUG. 2005
• Smoak C. Survival (Kaplan-Meier) Curves made Easy. Paper TT07. PharmaSUG. 2006
• Gharibvand L, Fernandez G. Advanced Statistical and Graphics features of SAS PHREG. Paper 375. SAS Global Forum. 2008
CONCLUSIONS
Survival analysis is a method for analyzing time-to-event data when such data are subject to censoring,
whereby subjects finish a clinical trial without having experienced the event. Other statistical methods cannot
adequately take censoring into account since they do not adjust for the 'at risk' population decreasing as
subjects leave the trial. Procedures such as PROC LIFETEST and PROC PHREG are the tools that
the statistician/programmer has to work with within the SAS system. Statistical programmers need a fundamental
understanding of the methodology so that they can understand, and eventually check, the results for which they are
undoubtedly responsible.
REFERENCES
1. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc 1958; 53: 457-81
2. Bewick V, Cheek L, Ball J. Statistics review 12: survival analysis. Crit Care 2004 Oct; 8(5): 389-94. Epub 2004 Sep 6
3. Altman DG, Bland JM. Statistics Notes: Time to event (survival) data. BMJ 1998; 317: 468-469
4. Bland JM, Altman DG. Statistics Notes: Survival probabilities (the Kaplan-Meier method). BMJ 1998; 317: 1572-1580
5. Parmar M, Machin D. Survival Analysis – A Practical Approach. John Wiley & Sons, Chichester, England, 1995
6. Clark TG, Bradburn MJ, Love SB, Altman DG. Survival Analysis Part I: Basic concepts and first analyses. British Journal of Cancer 2003; 89: 232-238
7. Kleinbaum DG. Survival Analysis: A Self-Learning Text. Statistics in the Health Sciences. Springer, 1997
8. Bland JM, Altman DG. The logrank test. BMJ 2004; 328: 1073
9. Cox DR. Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B 1972; 34: 187-220
10. Cox DR, Oakes D. Analysis of Survival Data. Chapman & Hall, London, 1984
11. Spruance SL, Reid JE, Grace M, Samore M. Hazard Ratio in Clinical Trials. Antimicrobial Agents and Chemotherapy Aug. 2004; 2787-2792
12. Benedetti Panici P, et al. Systematic aortic and pelvic lymphadenectomy versus resection of bulky nodes only in optimally debulked advanced ovarian cancer: a randomized clinical trial. Journal of the National Cancer Institute, Vol. 97, No. 8
13. Parmar MKB, Torri V, Stewart L. Extracting summary statistics to perform meta-analyses of the published literature for survival endpoints. Stat Med 1998; 17: 2815-2834
14. Tinazzi A, Tierney J. A gentle introduction to meta-analysis. PhUSE 2007, Paper TU03
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Angelo Tinazzi
SENDO Tech S.r.l.
Medical Informatics and Biometry Unit
Via Visconti di Modrone 12
Milan / 20122
Work Phone: 0039 02 76420426 Fax: 0039 02 76017484
Email: tinazzia@sendo-org.it Web: http://www.sendofoundation.org
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute
Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
APPENDIX: EXAMPLE OF A FLOW-CHART OF A COMPLEX TIME TO EVENT ADB CREATION
The following flow-chart summarizes the programming steps to be followed to derive the event and time variables for the
time-to-progression (PD) endpoint.
Time=0: the patient is randomized
Treatment Administration: up to 6 repeated cycles of treatment
Patient in Progression (PD)?
    Yes: event at PD date
    No: Off-Treatment
Off-Treatment: More Follow-up?
    No: TTP=0, censored at last assessment
    Yes: Follow-Up
Follow-Up: Check Patient Status
    • PD: event at PD date
    • New anti-tumor therapy: event at start date
    • Death
        • tumor related: event at death date
        • other causes: censored at last tumor assessment
    • Alive: censored at last tumor assessment
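The decision rules of the flow-chart can be sketched as a single DATA step. This is only an illustration under stated assumptions: the input dataset (patients) and all variable names are hypothetical, the cause of death is assumed to be coded 'TUMOR' for tumor-related deaths, the earliest qualifying event is taken when more than one is recorded, and time is expressed in weeks from randomization. The event flag follows the coding used in the PROC LIFETEST examples (censor=0 for censored observations).

```sas
/* Sketch only: dataset and variable names are hypothetical.                */
/* randdt    = randomization date (time 0)                                  */
/* pddt      = date of documented progression (missing if none)             */
/* newtrtdt  = start date of a new anti-tumor therapy (missing if none)     */
/* deathdt   = date of death (missing if alive), deathcau = cause of death  */
/* lastassdt = date of the last tumor assessment                            */
data adb_ttp;
  set patients;
  format evtdt date9.;

  /* Only tumor-related deaths count as events */
  if deathcau = 'TUMOR' then tumdthdt = deathdt;
  else tumdthdt = .;

  /* Assumption: use the earliest qualifying event; MIN ignores missing dates */
  evtdt = min(pddt, newtrtdt, tumdthdt);

  if not missing(evtdt) then censor = 1;   /* event observed */
  else do;                                 /* alive, death from other causes, */
    censor = 0;                            /* or no further follow-up:        */
    evtdt  = lastassdt;                    /* censor at last tumor assessment */
  end;

  ttp = (evtdt - randdt) / 7;              /* time to progression in weeks */
  drop tumdthdt;
run;
```

The resulting ttp and censor variables can be passed directly to a statement such as time ttp*censor(0). Patients without any tumor assessment after randomization would get a missing ttp here and need a rule of their own in the derivation algorithm.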

A gentle introduction to survival analysis

  • 1. PhUSE 2008 1 Paper ST03 A gentle introduction to survival analysis Angelo Tinazzi, SENDO-Tech S.r.l., Milan, Italy Martin Scott, Numerus Ltd, Tuebingen, Germany Anna Compagnoni, SENDO-Tech S.r.l., Milan, Italy ABSTRACT Time-to event analyses are commonly used to analyze a clinical trial. Survival Analysis is a necessary tool to analyze the time to the occurrence of an event of interest, which can be either adverse, positive or neutral. Analysis of survival data requires special techniques because the event of interest does not necessarily occur for all patients before the end of the study. Survival methodology has the advantage of being able to use all available data to estimate the survival probability, including data derived from patients who did not experience an event. This paper is a tutorial addressed to statistical programmers who are required to produce derived datasets containing time-to-event endpoint variables and summary tables containing survival data estimates (e.g. median survival). The paper is composed of two main sections. In the first part, the main elements of survival analysis theory will be introduced: • the Kaplan-Meier method for survival probability estimation; • the log-rank test; • the hazard ratio calculation; • the interpolation of a Cox regression model. In the second part, the main programming aspects of survival analysis will be discussed: • creating derived datasets containing time-to-event endpoints; • working with longitudinal data (repeated measures, such as follow-up); • the use of the main SAS © procedure for survival analysis (e.g. proc LIFETEST and proc PHREG); • extracting the most relevant information (e.g. median survival) from a SAS output using the ODS for creating ad-hoc summary tables. 
INTRODUCTION In many clinical studies, either experimental or observational, the time until the occurrence of an event is the main outcome of interest; such an event can be either adverse, such as death or disease progression or recurrence of a tumor, or positive, such as birth or disease remission, or neutral, such as cessation of breast feeding or the completion of a task. In all these cases, it is conventional to talk about survival endpoints; the term survival may be misleading because the techniques are applicable to any well-defined event although traditionally death was the event of interest. Events in survival analysis are defined by a ‘transition’ from one discrete state to another at an instantaneous moment in time. The typical feature of this kind of data is that not all the subjects will have experienced the event at the end of the follow-up period. This is called censoring, meaning that the observation period ended without observing the event of interest, either because the subject has not yet experienced the event by the end of follow-up period, or because the patient is lost to follow-up, or because the subject experiences a different event that makes further follow-up impossible. It is not possible in these cases to know if and when the subject would have experienced the event and we refer to this as right censoring meaning that a subject’s follow-up stops before observing the event; while left censoring refers to when the event has occurred before the subject has entered the study. Classical statistical methods, such as T-test or Chi-square Test, are not appropriate to analyze survival data because they are not conceived to take censored data into account. All subjects (including those with censored observations) provide useful information for the trial, and should therefore not be excluded from the analysis. 
On the other hand, considering the censoring time as a true survival time for these subjects would lead to a biased estimate of their survival time and probability. For these reasons there is the need to use a set of specific statistical methods to handle survival data. To make an example, consider the following two clinical study endpoints: the risk of dying in Intensive Care Unit (ICU) within 30 days from the admission;
  • 2. PhUSE 2008 2 survival rate of advanced breast cancer patients at 5 years from the diagnosis date. In the first case you can describe the endpoint results by using a simple % rate (events/totals), and eventually using the chi-square test, for two main reasons: the limited length of required observation period (30 days): there is no risk for loosing patient; at the end of the 30 days period, a patient can be either dead or alive (discharged or not). In the second case it is more appropriate to use a survival analysis approach for the following reasons: with a required length of follow-up (5 years), there is an high risk to loose patient (lost to follow-up); usually this kind of study requires a longer accrual period (sometimes varying from 2 to 5 years), therefore if you perform an interim analysis not all the patients had enough time to reach (alive) the 5 years survival endpoint; the time when the event has been observed (e.g. 1 year vs 3 years) with this “chronic” disease, make a substantial difference, therefore simply checking how many patients died in the 0-5 years period and eventually removing lost to follow-up patients, it is not appropriate. INCOMPLETE DATA In clinical practice not all the subjects are enrolled at the same time and thus the follow-up period can vary from one subject to another. For these reasons it is suitable to convert calendar-time data to time-to-event data, which is a more appropriate format to perform survival analysis. The starting point is set to the same (e.g. entry into study or diagnosis time, will be the beginning point of the study defined as time=0) for all enrolled subjects, and all the possible outcomes are coded depending on the event of interest. Figure 1: Example of Survival Data. 
R = Relapse; D = Death from ovarian cancer; Do = Death from other cause; A = Alive; L = Loss to Follow-up; x = Death; = Censored PART I: STATISTICAL METHODS MAIN ASSUMPTIONS To ensure a correct estimate of the survival probability in presence of censored data, three main assumptions are necessary. First, we assume that subjects who are censored have the same survival distribution as those who continue the observation during the study: that is to say, the censoring process is independent of the primary endpoint. If the reason for censoring is due to negative treatment effects or to the worsened subject condition, this censoring would lead to a biased estimate of the survival probability. The second assumption derives from the fact that subjects who first joined the study have a longer observation period and thus have more chances to experience the event than subjects enrolled at the end of recruitment period. We assume that survival probability is the same for subjects enrolled at the beginning and at the end of recruitment period. For some kind of events of interest, like tumor recurrence or progression, it is not possible to identify the exact date
  • 3. PhUSE 2008 3 when the event happens, because the only information we would have is the event happened between two subsequent examinations. In this case we need to assume that the event happens at the time it is detected, otherwise we would obtain a biased estimate of the survival times. CALCULATION OF SURVIVAL PROBABILITY: THE KAPLAN-MEIER METHOD Survival data are described in terms of two probabilities: survival and hazard. The first one is the probability for an individual to survive from the starting time until a specified future time t. To estimate the proportion of subjects surviving at a given time point, and hence the survival probability to that time for the generic population from which the sample is extracted, the Kaplan-Meier method [1][2][3][4][5][6][7], also called product-limit estimator, is commonly used, which allows to deal with censored information. The simple idea underlying this method is that to survive to k intervals from the start of the study, it is necessary to survive to each one of the previous intervals, and then also to the k th one. This principle allows working with conditional and cumulative probabilities. The conditional survival probability for the generic time interval (on the condition that the subject was survivor at the beginning of the interval and then to all the previous) is given by: i ii i r dr p − = where r is the number alive and at risk at the beginning of the i th interval, and d the number of failures during the same interval. Observations censored at any given time influence the number at risk at the start of the subsequent time interval. Survival at any time point is calculated as product of the conditional probabilities of surviving each previous time interval. ∏≤ −= ti i i r d tS )1()( Figure 2 is an example of survival probability calculation, derived from a SAS output referred to time to progression data (time expressed in weeks).
  • 4. PhUSE 2008 4 Figure 2: Survival probabilities Calculation At week 0, all enrolled subjects (17) are alive and at risk to experience the event of interest (progression). Survival probability is thus set to 1 1 . At week 6, a censoring occurs (but number failed is still 0): it does not modify the survival and failure probabilities, but the number of subjects at risk for the subsequent interval decreases by one (number left = 16). At week 10, two events occur, thus all the statistics are modified: the number failed increases by two, while the number left still at risk for the subsequent interval reduces to 14. Survival probability for the interval [time 0 to time 10] is then equal to (16-2)/16=0.8750. A new event occurs at week 16, together with two censored observations; the survival probability for the interval [week 10 to week 16] is equal to (14-1)/14=0.9286. To calculate the cumulative probability for a subject to survive from the start of the study until the week 16 it is now necessary to cumulate the survival probabilities for each time interval. Cumulative survival probability until time 16 is computed as 0.9286 x 0.8750 = 0.8125. At week 17 a new event occurs: the survival probability for the time interval [week 16 to week 17] is computed as (11- 1)/11=0.9091. Cumulating all the previous single probabilities, we can easily obtain the probability for a subject enrolled at the beginning of the study to survive without experiencing the event until week 17: 0.9091 x 0.9286 x 0.8750 = 0.7387. The same process can be followed until week 44 when no more events are available and the survival probability can’t be modified anymore. SURVIVAL PROBABILITIES CALCULATION: THE LIFETABLE METHOD In some situations it is not possible to know or to record the exact time when the event occurs; the only available information is that the event occurred in a certain time interval. 
In this case the survival probability cannot be updated every time an event happens, but once in a fixed time unit: the derived survival curve and probability are called life- table or actuarial estimates. With this method, the whole period of observation is divided into a series of time intervals, where the basic assumption is that censorings are uniformly distributed in each interval, so that the average number of subjects at risk during each interval is given by: 2 ' j jj c nn −= Where n’j is the number of subjects alive and at risk at the beginning of the time interval j and cj is the number of censored observations in the same interval. As previously seen with the Kaplan-Meier method, the cumulative probability to survive until the time t is given by the product of the probabilities to survive to each previous time interval and also to the last one including t. ∏≤ −= tj j j i n d tS )1()( ' dj is the number of deaths or events observed in the time interval j. THE GRAPHICAL REPRESENTATION Survival probability can be plotted against survival time, obtaining a survival curve: the survival curve is a non- increasing function with a step corresponding at each time when an event is observed; the step is used since the estimated survival curve remains at a plateau between successive patient event time and it drops instantaneously at each time of event to a new level. Times when a censoring happens are usually indicated with a small circle or a short vertical line. The plot will never reach ‘0’ if the patient with longest observed time has in fact died. Were such a patient still alive then the curve would have a plateau commencing at the time of the last event and continuing until the censored observed time of this longest ‘surviving’ patient. In certain circumstances a graph of 1-S(t), rather than S(t), is plotted against time to give the cumulative ‘event’ curve. 
From the graphical representation of a survival curve it is possible to extract relevant summary statistics, mainly the median survival time and the survival probability at a certain time point. The median survival time is defined as the time point when half of the subjects have experienced the event of interest, and can be easily determined by drawing a line (the red line in Figure 3) starting from the value 0.5 on the vertical axis until you reach the survival curve, and then from the curve till the horizontal time axis. The exact value of the median 1 It can be also expressed as % (see figure 6)
  • 5. PhUSE 2008 5 survival time can be calculated with a proportion involving the length of the axis and the maximum follow-up time observed. Similarly you can extract the median from the estimates (figure 2) by searching for the first estimate below 0.50; in the case the 0.50 (or below) is not reached, we say “the median of survival has been not yet reached”. This latter situation, can happen when cohorts with low event rate or in the case of early (interim) analysis. Again in the same way you can extract other percentiles of interest, typically the 25% and the 75%, by searching the point in which the curve reach the point of interest (e.g. .25 and .75). The survival probability at a certain time point (e.g. 20 months, Figure 3) can be calculated in a similar way: as shown by the blue line in Figure 3, it is necessary to draw a line from the chosen point of the horizontal time axis to the survival curve, and then until the vertical survival probability axis. Once again, the exact probability can be deduced using a proportion involving the length of the vertical axis and the maximum value of the survival probability. It is difficult to judge precisely when the right-hand tail of the survival curve becomes unreliable. However, as rule of thumb, the curve can be particularly unreliable when the number of patients remaining at risk is less than 15 or more in general less than 5%. Another important statistic you may be required to provide, is the median of follow-up. The median of follow-up is an indicator of how ‘mature’ are your data by giving an estimate of the length of the observation period (e.g. how many months on ‘average’ the patients have been followed since they have been enrolled). To calculate the median of follow- up we can use survival analysis approach by censoring patients who experienced the event. The reason for that is because you don’t know how long you could follow a patient if this did not experience the event. 
Figure 3: Survival Curve with Censored Observations: Median Survival Time (red line) and Survival Probability at a Fixed Time Point (blue line). Annotations in the figure: the median survival is about 63 months; at 60 months the survival probability is about 0.58.

COMPARING SURVIVAL CURVES

An interesting issue in clinical practice is the comparison of the survival functions of two (or more) groups of subjects. Classical statistical methods such as the t-test or the chi-square test are not appropriate for comparing the proportion of survivors, for the reasons described before. Alternatively, one could compare the proportions surviving at one or more pre-specified time points, but this approach ignores the total survival experience of the groups during the whole follow-up period. Moreover, the time points are chosen arbitrarily.

The most common method used to compare survival curves is therefore a statistical hypothesis test called the log-rank test [8], which takes the whole follow-up period into account and does not require any assumption about the distribution of the survival times, which are rarely normally distributed and are often skewed because they typically comprise many early events and only a few late ones. The null hypothesis of the log-rank test is that there is no difference between the survival of the two or more populations being compared (i.e. the probability of the event of interest occurring at any time point is the same for each population). The comparison is based on the difference between the observed number of events in each group and the number of events expected if there were no difference between the groups:

$$\chi^2_{\text{log-rank}} = \sum_{g} \frac{(O_g - E_g)^2}{E_g}$$

where O_g is the total number of observed events in group g, and E_g is the total number of expected events in group g. O and E are calculated at each time when an event happens; if a survival time is censored, the subject is considered to be at risk during the interval of censoring, but no longer for the subsequent intervals. The test statistic is then compared with a χ² distribution with degrees of freedom equal to the number of groups minus one.

The log-rank test assigns equal weight to each event, at whatever time it occurs. If researchers believe that differences between certain parts of the survival curves being compared are of greater interest than others, then different weights can be assigned to the deaths, depending on the time when they happen. For example, if it were known that a new treatment helps to avoid early mortality, it would be useful to compare the first part of the survival curves, without paying too much attention to the subsequent time intervals. One way to achieve this is to construct a statistical test that uses as weights the number of patients at risk at each time point. This test is a variant of the log-rank test, known as the Gehan, generalized Wilcoxon or Breslow test, and can be calculated as follows:

$$\chi^2_{G} = \sum_{g,t} \frac{R_t^2 \, (O_{gt} - E_{gt})^2}{R_t \, E_{gt}}$$

where R_t is the number of patients at risk at time point t, and O_gt and E_gt are the observed and expected numbers of events in group g at time point t.

Note that these are tests of significance, and do not provide an estimate of the size of the differences in survival probabilities between the groups.

COX REGRESSION MODEL AND HAZARD RATIO

With the methods described so far, either the Kaplan-Meier or the life-table method, the survival data are analyzed using a univariate approach.
To approach survival analysis in a multivariate way, the Cox model is used [9][10]. The Cox model is a semi-parametric regression model used to test the effect of a set of covariates X1, X2, …, Xn on the time-to-event variable; the model for the hazard function is the following:

$$h(t) = \lambda(t) \cdot \exp\{\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n\}$$

where h(t) represents the hazard function and λ(t) is an unspecified initial (baseline) hazard function. The hazard is a function of time and represents the instantaneous rate at which the event of interest occurs at a specified time t, given that it has not occurred prior to time t. The hazard is defined as follows:

$$h(t) = \lim_{\Delta t \to 0} \frac{\text{observed events}(t) / N(t)}{\Delta t}$$

where N(t) is the number of subjects at risk at the beginning of the time interval.

The Cox model (which, like the Kaplan-Meier method, also accounts for censored data) can include one or more numeric covariates (which can also be time-dependent) and does not make any assumption about the distribution of the event times. The main assumption of the Cox proportional hazards regression model is that the hazard ratio between the treatment groups remains constant over time, even if, as is reasonable to assume, the hazard rate itself changes over time. D. R. Cox, after whom the model is named, developed a method to estimate the coefficients β_i based on maximizing a partial likelihood, which is a modification of the maximum likelihood method. Chi-square tests are performed to evaluate the null hypothesis that each parameter β_i is equal to zero. A positive parameter estimate indicates that the hazard increases with increasing values of the covariate, while a negative estimate shows that the hazard decreases with increasing values of the covariate.

The magnitude of the effect of a specific covariate on the event time is often expressed as a hazard ratio [11]: this is an estimate of the relative survival experience of two groups and can be calculated as

$$HR = \frac{O_e / E_e}{O_c / E_c}$$

where O and E are the observed and expected numbers of events in the experimental (e) and control (c) groups.

If the hazard ratio indicates a beneficial effect of the treatment under study, this implies that the time to the observed endpoint was modified by treatment. The hazard ratio does not translate directly into information about the duration of time until the event occurs, but it can be converted into a risk reduction, i.e. the difference between the risk in the control group (set to 1) and the risk in the treated group (the hazard ratio itself). This reduction, 1 – HR, is a relative rather than an absolute measure, and can be expressed either as a proportion or as a percentage.

PART II: STATISTICAL PROGRAMMING ASPECTS

ILLUSTRATIVE STUDY: OVARIAN CANCER STUDY

This data set refers to a multicenter, phase III, randomized clinical trial in which women with epithelial ovarian carcinoma were randomized to receive either more invasive surgery (Lymphadenectomy) or less invasive surgery (No Lymphadenectomy, or Surgery Only); the main end-points of the study included progression-free and overall survival [12]. 427 eligible patients were enrolled: 211 were randomly assigned to the surgery only arm and 216 to the lymphadenectomy arm. After a median follow-up of 68.4 months, tumor recurrence was observed in 292 patients (153 in the surgery alone arm and 139 in the lymphadenectomy arm). Figure 4 shows the survival curve for these data.

Figure 4: Result of survival analysis for the ovarian cancer trial. The number of patients still at risk at each time point for the treated and control groups is shown below the horizontal axis; you can also decide to add the number of events that occurred in each interval (e.g. between year 1 and year 2).
The unadjusted hazard ratio for the risk of a first event in the lymphadenectomy group, as compared with the surgery alone group, was 0.76 (95% CI 0.60 to 0.96), meaning that the risk of a recurrence of disease is lower in the patients treated with lymphadenectomy than in the control group. The relative reduction in the hazard for the treated group is equal to (1-0.76)*100=24%. The difference between the survival curves of the two groups is statistically significant (P=0.022 using the log-rank test). Median progression-free survival was 22.4 months for patients in the control arm and 27.4 months for patients in the systematic lymphadenectomy arm, confirming the improvement for the patients treated with additional lymphadenectomy.

These results can be obtained using PROC LIFETEST. PROC LIFETEST computes non-parametric estimates of the survival distribution and rank tests for the association of the response variable with other variables. The survival estimates are computed within defined strata levels, and the rank tests are pooled over the strata and are therefore adjusted for strata differences. Additionally, statistics testing the homogeneity of the strata are computed.

    proc lifetest data=lympho1 plot=(s);   /* plot=(s): survival curve     */
      time ttp*censor(0);                  /* 0 = censored observation     */
      strata group;                        /* groups to be compared        */
    run;

The PLOT=(s) option displays the survival curve for the data considered. The TIME statement names the variable for time to progression (ttp) and the variable indicating censoring (censor); the value in round brackets (0) is the one associated with censored observations. The STRATA statement names the variable identifying the groups to be compared (group). By default the estimates are calculated using the Kaplan-Meier method. The same code can be used to calculate the median follow-up by changing the value in brackets from '0' to '1', if '1' indicates the occurrence of the event of interest.

Cox proportional hazards analysis was performed to adjust the treatment comparison for baseline characteristics: results are shown in Figure 5. When histologic grade and residual disease were taken into account, the adjusted hazard ratio for occurrence of a first event was almost unchanged (0.75 [95% CI = 0.59 to 0.94]), confirming that the 25% reduction in the hazard for patients treated with additional lymphadenectomy depends mostly on the treatment received. These results can be obtained using PROC PHREG.

Figure 5: Multivariable Cox proportional hazards analysis for PFS

PROC PHREG performs regression analysis of survival data based on the Cox proportional hazards model. By adopting a partial likelihood function, Cox regression eliminates the unknown baseline hazard and accounts for censored survival times.
It also allows the use of time-dependent explanatory variables, i.e. variables whose values for any given individual can change over time.

    proc phreg data=lympho2;
      model ttp*censor(0)=group grade residual;   /* 0 = censored observation */
    run;

The MODEL statement specifies the dependent variables (ttp, censor) and the independent variables (group, grade, residual) used to fit the regression model; the hazard ratios are calculated automatically.

EXTRACTING SUMMARY STATISTICS

Footnote 3: If you want to know which ODS table gives access to a given portion of the output of the SAS procedure you are using, you can simply submit the statement ODS TRACE ON before the procedure and the statement ODS TRACE OFF when you want to stop ODS tracing. The names of the ODS tables generated by the SAS procedure will be reported in the LOG file.
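The tracing described in the note above can be sketched as follows; the dataset and variable names are those of the paper's PROC LIFETEST example, and the procedure call is only a placeholder for whichever procedure you want to inspect.

```sas
ods trace on;                 /* start writing ODS table names to the log */
proc lifetest data=lympho1;
  time ttp*censor(0);
  strata group;
run;
ods trace off;                /* stop tracing */
```

Each table produced by the procedure then appears in the log with its name, which is the name to use in an ODS OUTPUT statement.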
Survival analysis procedures, such as PROC LIFETEST, can generate a large amount of output (the example in Figure 2 grows considerably if the study has thousands of patients) of different types (life table, statistical tests). In this case a summary output such as the one in Figure 6 may make life easier for anyone who needs to interpret the results (the statistician, medical writer, clinician, or principal investigator). The summary table in Figure 6 reports the summary statistics describing the 'Time to Progression' across the different study groups: number of events; number of censored observations; survival estimates at 6, 9 and 12 months (time points of clinical interest) with their 95% confidence intervals; the 25th percentile, the median and the 75th percentile.

This summary table can be generated by extracting the statistics obtained by PROC LIFETEST using the ODS OUTPUT statement. The ODS OUTPUT statement allows you to convert the output of a SAS procedure into a SAS dataset. For example, to extract the product-limit estimates we can use the following statements:

    ods output ProductLimitEstimates=survest;   /* save the estimates in a dataset */
    proc lifetest data=lympho1;
      time ttp*censor(0);
      strata group;
    run;
    ods output close;

where ProductLimitEstimates is the ODS table name and survest is the dataset in which the product-limit estimates will be saved. Table 1 reports all the ODS table names (see footnote 3) that can be accessed from PROC LIFETEST. Of course you then need to work on the extracted dataset and produce the table above using a SAS procedure such as TABULATE (we have, for example, developed a macro called SURVEST that 'automatically' produces the output in Figure 6).
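The same mechanism can be used to collect the other ingredients of a summary such as Figure 6 in a single run. The following is a minimal sketch, not the SURVEST macro mentioned above: the output dataset names (quart, censsum, est_tp) are hypothetical, the TIMELIST= values assume that ttp is expressed in months, and the variable names of the resulting datasets should be checked (for example with PROC CONTENTS) before building the final table, since they may differ between SAS releases.

```sas
/* Sketch only: lympho1, ttp, censor and group follow the paper's example; */
/* quart, censsum and est_tp are hypothetical output dataset names.        */
ods output Quartiles=quart                /* 25th, 50th (median), 75th percentiles */
           CensoredSummary=censsum        /* numbers of events and censored obs    */
           ProductLimitEstimates=est_tp;  /* Kaplan-Meier estimates                */
proc lifetest data=lympho1 timelist=6 9 12;  /* report estimates at 6, 9, 12 (months assumed) */
  time ttp*censor(0);
  strata group;
run;

/* Inspect the extracted datasets before deriving the summary table */
proc contents data=quart; run;
proc print data=quart; run;
proc print data=censsum; run;
```

Keeping each statistic in its own dataset makes it straightforward to merge them by stratum before passing the result to a reporting procedure.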
| Statistic | All Doses (N=13) | Dose Level 1 (N=3) | Dose Level 2 (N=4) | Dose Level 3 (N=6) |
|---|---|---|---|---|
| Number Progressed/Died/Started New Anti-Tumor Therapy | 10 (76.9%) | 3 (100.0%) | 4 (100.0%) | 3 (50.0%) |
| Number Censored | 3 (23.1%) | 0 (0.0%) | 0 (0.0%) | 3 (50.0%) |
| Kaplan-Meier Estimate of the Proportion of Patients Without PD at 6 Months (%) | 46.7 | 66.7 | 0.0 | 66.7 |
| Kaplan-Meier Estimate of the Proportion of Patients Without PD at 9 Months (%) | 24.8 | 0.0 | 0.0 | 66.7 |
| Kaplan-Meier Estimate of the Proportion of Patients Without PD at 12 Months (%) | 19.8 | 0.0 | 0.0 | 66.7 |
| 25% Percentile of Time to Progression (Months) and its (95% CI) | 2.1 (2.0,3.1) | 2.0 (2.0,7.6) | 1.5 (0.8,3.1) | 2.1 (1.8,12.0) |
| Median Time to Progression (Months) and its (95% CI) | 4.8 (2.2,8.3) | 6.6 (2.0,7.6) | 2.6 (0.8,4.8) | 12.0 (2.1,12.0) |
| 75% Percentile of Time to Progression (Months) and its (95% CI) | 8.7 (6.3,13.7) | 7.6 (2.0,7.6) | 4.0 (2.2,4.8) | 12.0 (-,-) |

Figure 6: Survival Analysis Summary Table

CREATING THE TIME-TO-EVENT ANALYSIS DATABASE

Creating the analysis database (ADB) for time-to-event outcomes may seem an easy task; in theory you just need to create two variables: the event (or censoring) variable and the time variable. However, due to the complexity of clinical trial data collection tools (e.g. the CRF), it might not be possible to derive these two variables from only two source variables. In other words, it is uncommon to have a form where the event and the time are directly reported (e.g. event is 0 and time is 30 months); instead you need to derive them by accessing different forms (SAS datasets) across different time schedules (e.g. visits, follow-up), to check whether or not a subject/patient has experienced an event, and when. Moreover, you also need to define what is to be considered 'time 0', e.g. the randomization date or the date of first diagnosis.

Whilst the event is in most cases something that can be detected without doubt (e.g. death or disease progression), the time for censored patients is not. For example, 'overall survival' is defined as the time from the randomization date, or whatever is considered 'time 0', until the date of death.
If the patient is still alive, the patient is censored at the last date on which he or she was known to be alive (e.g. follow-up date, last phone contact, etc.). Similarly, the 'time-to-progression' endpoint, commonly used in oncology, is calculated from the randomization date until a tumor progression is observed. If a patient did not experience tumor progression, the patient is censored either at the last contact date or at the last date on which a formal tumor assessment was performed, which may or may not be the same as the last date the patient was seen alive. As this situation may occur with other study endpoints in other therapeutic areas, we suggest carefully defining the algorithm to be used, clearly detailing situations like the one just described. In the appendix you can find an example of an 'ADB derivation' flow-chart.

| ODS Table Name | Contents |
|---|---|
| CensoredSummary | Number of events and censored observations |
| HomStats | Rank statistics for testing strata homogeneity |
| HomTests | Tests for strata homogeneity |
| LifetableEstimates | Lifetable survival estimates |
| LogForStepSeq | Forward stepwise sequence for the log-rank statistics for association |
| LogHomCov | Covariance matrix for the log-rank statistics for strata homogeneity |
| LogTestCov | Covariance matrix for the log-rank statistics for association |
| LogUniChisq | Univariate chi-squares for the log-rank statistics for association |
| Means | Mean and standard error of survival time |
| ProductLimitEstimates | Product-limit survival estimates |
| Quartiles | Quartiles of the survival estimates |
| WilForStepSeq | Forward stepwise sequence for the Wilcoxon statistics for association |
| WilHomCov | Covariance matrix for the Wilcoxon statistics for strata homogeneity |
| WilTestCov | Covariance matrix for the Wilcoxon statistics for association |
| WilUniChisq | Univariate chi-squares for the Wilcoxon statistics for association |

Table 1: ODS Tables Produced by PROC LIFETEST

ODS STATISTICAL GRAPHICS IN SAS VERSION 9.2

ODS Graphics is a new feature in SAS Version 9.2 (it was experimental in SAS Version 9.1), available in SAS/STAT, Base SAS, SAS/ETS and other SAS products, which allows users to create statistical graphics automatically.
Both PROC LIFETEST and PROC PHREG support ODS Graphics. For example, PROC LIFETEST provides eight types of ODS graph. Figure 7 shows the default "Estimated Survival Curve" (ODS graph name "Survival") generated using the following SAS code:

    ods graphics on;                     /* enable ODS Graphics */
    proc lifetest data=surv plot=(s);
      time ttp*censor(0);
      strata group;
    run;
    quit;

Prior to version 9.2 and ODS Graphics, a similar output could be produced using the following code:

    proc lifetest data=surv plot=(s) graphics;
      time ttp*censor(0);
      strata group;
      symbol1 line=1 color=blue;         /* line style for the first group  */
      symbol2 line=2 color=red;          /* line style for the second group */
    run;
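Figure 4 displays the number of patients at risk below the time axis. With ODS Graphics a comparable display can be requested through a plot option, as in the following sketch. It assumes a SAS release in which the ATRISK= option of PLOTS=SURVIVAL is available, and the time points 0 to 60 by 12 are an arbitrary choice that presumes ttp is measured in months; the dataset and variable names are those of the example above.

```sas
ods graphics on;
/* ATRISK= lists the time points at which the numbers at risk are shown */
proc lifetest data=surv plots=survival(atrisk=0 to 60 by 12);
  time ttp*censor(0);
  strata group;
run;
ods graphics off;
```

Figure 7 refers to the default plot produced by the first call shown above, not to this variant.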
Figure 7: Graphical Survival Plot with ODS Graphics

META-ANALYSIS OF TIME-TO-EVENT DATA

Meta-analyses (MA) that combine the results of related randomised controlled trials have become a common and widely accepted tool in the evaluation of health care interventions. Regulatory agencies such as the FDA have started to consider them as key supporting evidence. Systematic reviews use explicit, objective and prospectively defined methods to collect, critically evaluate and synthesize studies, making them less biased and more reproducible than traditional reviews. A systematic review may, or may not, include a meta-analysis: a statistical pooling of the results from individual studies to obtain a single overall estimate of treatment effect. Well-validated methods are now also available for the meta-analysis of time-to-event data [13][14].

ADDITIONAL TECHNICAL REFERENCES

Apart from the methodological references listed in the 'References' section, we suggest having a look at the following articles presented at various SAS user conferences:
• Smith T. Smith B. Kaplan Meier and Cox Proportional Hazard Modelling: Hands on Survival Analysis. WUSS. 2004
• Yeh S. ODS and Statistical Graphics. Paper PR02. NESUG17. 2004
• Zhou JC. Enhancement of Survival Graphs. Paper 239-28. SUGI28
• Xu W. Wang WWB. %KMPLOT – An enhancement of SAS PROC LIFETEST. Paper SP06. PharmaSUG. 2005
• Smoak C. Survival (Kaplan-Meier) Curves made Easy. Paper TT07. PharmaSUG. 2006
• Gharibvand L. Fernandez G. Advanced Statistical and Graphics features of SAS PHREG. Paper 375. SAS Global Forum. 2008

CONCLUSIONS

Survival analysis is a method of analyzing time-to-event data when such data are exposed to a censoring process, whereby subjects finish a clinical trial without having experienced an event. Other statistical methods cannot adequately take the censoring process into account, since they do not adjust for the 'risk' population decreasing as subjects leave the trial.
Statistical procedures such as PROC LIFETEST and PROC PHREG are the tools with which the statistician/programmer works within the SAS system. Statistical programmers require a fundamental understanding of the methodology so that they are able to understand, and eventually check, the results for which they are undoubtedly responsible.

REFERENCES

1. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 1958; 53: 457-481
2. Bewick V, Cheek L, Ball J. Statistics review 12: survival analysis. Crit Care 2004 Oct; 8(5): 389-394. Epub 2004 Sep 6
3. Altman DG, Bland JM. Statistics Notes: Time to event (survival) data. BMJ 1998; 317: 468-469
4. Bland JM, Altman DG. Statistics Notes: Survival probabilities (the Kaplan-Meier method). BMJ 1998; 317: 1572-1580
5. Parmar M, Machin D. Survival Analysis – A Practical Approach. John Wiley & Sons, Chichester, England. 1995
6. Clark TG, Bradburn MJ, Love SB, Altman DG. Survival Analysis Part I: Basic concepts and first analyses. British Journal of Cancer 2003; 89: 232-238
7. Kleinbaum DG. Survival Analysis: A Self-Learning Text. Statistics in the Health Sciences. Springer. 1997
8. Bland JM, Altman DG. The logrank test. BMJ 2004; 328: 1073
9. Cox DR. Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B 1972; 34: 187-220
10. Cox DR, Oakes D. Analysis of Survival Data. Chapman & Hall, London. 1984
11. Spruance SL, Reid JE, Grace M, Samore M. Hazard Ratio in Clinical Trials. Antimicrobial Agents and Chemotherapy Aug. 2004; 2787-2792
12. Benedetti Panici P. et al. Systematic aortic and pelvic lymphadenectomy versus resection of bulky nodes only in optimally debulked advanced ovarian cancer: a randomized clinical trial. Journal of the National Cancer Institute, Vol. 97, No. 8
13. Parmar MKB, Torri V, Stewart L. Extracting summary statistics to perform meta-analyses of the published literature for survival endpoints. Stat Med 1998; 17: 2815-2834
14. Tinazzi A, Tierney J. A gentle introduction to meta-analysis. PhUSE 2007, Paper TU03

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:
Angelo Tinazzi
SENDO Tech S.r.l.
Medical Informatics and Biometry Unit
Via Visconti di Modrone 12
Milan / 20122
Work Phone: 0039 02 76420426
Fax: 0039 02 76017484
Email: [email protected]
Web: http://www.sendofoundation.org

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
APPENDIX: EXAMPLE OF A FLOW-CHART OF A COMPLEX TIME TO EVENT ADB CREATION

The following flow-chart summarizes the programming steps to be followed to derive the event and time variables for the time-to-progression (PD) endpoint.

• Time=0: the patient is randomized
• Treatment Administration: up to 6 repeated cycles of treatment
• Patient in Progression (PD)?
  • Yes: event at PD date
  • No: Off-Treatment
• Off-Treatment: more follow-up?
  • No: TTP=0, censored at last assessment
• Follow-Up: check patient status
  • PD: event at PD date
  • New anti-tumor therapy: event at start date
  • Death
    • tumor related: event at death date
    • other causes: censored at last tumor assessment
  • Alive: censored at last tumor assessment
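The decision rules of the flow-chart above can be sketched as a DATA step. This is only an illustration under stated assumptions: the input dataset (patients) and all of its variables are hypothetical, one record per patient is assumed, the earliest qualifying date is taken as the event date when more than one applies, and months are obtained by dividing days by 30.4375. None of these choices comes from the study itself.

```sas
/* Hypothetical input, one record per patient, SAS date variables:        */
/*   randdt  = randomization date (time 0)                                */
/*   pddt    = date of documented progression (missing if none)           */
/*   newtxdt = start date of new anti-tumor therapy (missing if none)     */
/*   dthdt   = date of death (missing if alive)                           */
/*   dthtum  = 1 if the death was tumor related                           */
/*   lastadt = date of last tumor assessment                              */
data adb_ttp;
  set patients;

  /* Earliest qualifying event date; MIN ignores missing arguments.       */
  /* Death counts as an event only when it is tumor related.              */
  if dthtum = 1 then evtdt = min(pddt, newtxdt, dthdt);
  else evtdt = min(pddt, newtxdt);

  if not missing(evtdt) then do;
    censor = 1;                              /* 1 = event                 */
    ttp = (evtdt - randdt) / 30.4375;        /* months, assumed scaling   */
  end;
  else do;
    censor = 0;                              /* 0 = censored              */
    ttp = (lastadt - randdt) / 30.4375;      /* last tumor assessment     */
  end;

  format evtdt date9.;
run;
```

The coding of censor matches the TIME statements used in the paper, where the value 0 identifies censored observations, so the resulting dataset can be passed directly to PROC LIFETEST or PROC PHREG.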