Survival Analysis Using SPSS
Dr. Nermin Osman
Assistant Lecturer of Biomedical Informatics and Medical Statistics
Medical Research Institute, Alexandria University
United Nations System Staff College Intern, UNITAR
nerminosman@unssc.org
What is survival analysis
− A family of methods for time-to-event data, also known as event history analysis
− Related to, but distinct from, time series analysis: the outcome is the time until an event occurs
When to use survival analysis
− The research interest is the time to an event, and the event is a discrete occurrence.
Examples of survival analysis
− Time until death
− Time until adoption of an innovation
Characteristics of survival analysis
− Events may occur at any time point.
− Factors that influence events are of two types: time-constant and time-dependent (for example, age).
Survival analysis focuses on the hazard function
− Hazard: the event of interest occurring, usually an adverse one.
− The hazard might be death, engine breakdown, etc.
− Hazard rate: the instantaneous rate at which the event occurs at a given point in time, among cases that have not yet experienced it. It can be plotted against time on the X axis, forming a graph of the hazard rate over time.
− Hazard function: the equation that describes this plotted line.
− Hazard ratio: the ratio of the hazard rates of two groups, often loosely called relative risk; reported as Exp(B) in SPSS.
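The definitions above can be written compactly. This is the standard textbook notation rather than anything taken from the slides: T (the event time), Δt, and the group hazards h_1 and h_0 are symbols introduced here for illustration.

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t},
\qquad
\mathrm{HR}(t) = \frac{h_1(t)}{h_0(t)}
```

Because h(t) is a rate per unit of time and not a probability, it can exceed 1.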
[Figure: the hazard function, with hazard rate on the vertical (y) axis and time (yrs/ms) on the horizontal (x) axis.]
Types of survival analysis
− Nonparametric: makes no assumption about the shape of the hazard function. The hazard function is estimated from empirical data, showing change over time; for example, Kaplan-Meier survival analysis.
− Semi-parametric: makes no assumption about the shape of the hazard function, but assumes how covariates affect it; for example, Cox regression.
− Parametric: specifies in advance the shape of the baseline hazard function and the effects of covariates on the hazard function.
− Parametric models are used when time itself is considered a meaningful independent variable.
− Parametric models are used for predictive modeling (as logistic regression is) and are fitted by maximum likelihood, which suits large datasets.
Terms
− Events: what terminates an episode (such as death or adoption of an innovation); the change that causes the subject to move from one state to another.
− Durations: the number of time units an individual spends in a given state.
− Dependent variable: the probability of an event.
− Survival function, S(t): the cumulative proportion of the sample not experiencing the event by time t. In other words, it is the probability that the event will not occur until time t.
− Censored cases: data are censored if the episode starts before the period of observation (left-censored) or ends after it (right-censored).
Censored cases: a distinctive feature of survival analysis.
− For some cases, the event simply does not occur before the end of the study.
− Some cases drop out of the study for reasons unrelated to the study.
− For some cases, we lose track of their status sometime before the end of the study.
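The three situations above all end up coded the same way in the data: a duration plus a status flag (1 = event observed, 0 = right-censored). A minimal sketch, assuming hypothetical subjects, dates, and a helper named `to_duration_and_status`, none of which come from the slides:

```python
from datetime import date

def to_duration_and_status(start, event_date, last_seen, study_end):
    """Return (duration_in_days, status) for one subject.

    status is 1 if the event was observed within the study period,
    0 if the subject is right-censored.
    """
    if event_date is not None and event_date <= study_end:
        return (event_date - start).days, 1
    # No event observed: censor at last contact or study end, whichever is earlier.
    censor_date = min(last_seen, study_end)
    return (censor_date - start).days, 0

study_end = date(2020, 12, 31)  # hypothetical end of observation

# Hypothetical subjects: (start, event date or None, date last seen)
subjects = {
    "event_observed": (date(2020, 1, 1), date(2020, 3, 1), date(2020, 3, 1)),
    "no_event_by_end": (date(2020, 1, 1), None, date(2020, 12, 31)),
    "lost_to_follow_up": (date(2020, 1, 1), None, date(2020, 6, 30)),
}

records = {
    name: to_duration_and_status(start, event, last_seen, study_end)
    for name, (start, event, last_seen) in subjects.items()
}
```

The resulting duration and status columns correspond to the time variable and status variable that the SPSS procedures ask for.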
Outline of topics
− Life tables
− Kaplan-Meier
− Cox regression
− Cox regression with a time-dependent covariate
Life tables
− Life Tables is a descriptive procedure for examining the distribution of time-to-event variables. We can also compare the distribution across levels of a factor variable.
− The basic idea of life tables is to subdivide the period of observation into smaller time intervals. The probability of the event is then estimated for each interval.
Variables
− Time variable (duration variable): must be a continuous
variable.
− Status variable: binary or dichotomous variable, represents
the event of interest.
− Factor variable: categorical variable.
Assumptions
− The probability of the event of interest should depend only on time. Cases that enter the study at different times should behave similarly.
− There are no systematic differences between censored and uncensored cases.
Example (from IBM SPSS 20.0): data file name: telco
− Examine distribution of customer time to churn by customer
category.
− Time variable: tenure (in months)
− Status variable: churn (binary: 1 = Churn, 0 = Not churn)
− Factor: custcat (four categories)
Go to Analyze > Survival > Life Tables
Run analysis
Click Options
1. Survival: displays the cumulative survival function on a linear scale.
2. Hazard: displays the cumulative hazard function on a linear scale.
SPSS Outputs: life table
− Interval Start Time. The beginning value for each interval. Each interval extends from its start time up to the start time of the next interval.
− Number Withdrawing during Interval. The number of censored cases in this interval. These are still active customers, but so far they have not been customers for longer than the time period indicated by this interval.
− Number Exposed to Risk. The number of cases entering the interval minus one half of the censored cases. This is intended to account for the effect of the censored cases.
− Number of Terminal Events. The number of cases that
experience the terminal event in this interval. These are
customers with churn = 1.
− Proportion Terminating. The ratio of terminal events to the
number exposed to risk (10/264.5=0.0378).
− Proportion Surviving. One minus the proportion terminating.
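The arithmetic for one life-table row can be sketched as follows. The 10 events and 264.5 exposed to risk are the figures quoted above; the 266 cases entering comes from the cumulative-survival example in this deck, and the 3 withdrawals are inferred here from 266 − 264.5 = 1.5, so treat that count as an assumption. The function name is illustrative only.

```python
def life_table_row(n_entering, n_withdrawing, n_events):
    """Compute exposed-to-risk, proportion terminating and proportion surviving."""
    # Censored (withdrawn) cases count as at risk for half the interval.
    exposed = n_entering - n_withdrawing / 2
    proportion_terminating = n_events / exposed
    proportion_surviving = 1 - proportion_terminating
    return exposed, proportion_terminating, proportion_surviving

# First interval: 266 entering, 3 withdrawing (inferred), 10 terminal events.
exposed, p_term, p_surv = life_table_row(266, 3, 10)
```

Here `exposed` is 264.5 and `p_term` rounds to 0.0378, matching the value quoted in the Proportion Terminating bullet.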
− Cumulative Proportion Surviving at End of Interval. The proportion of cases surviving from the start of the table to the end of the interval, obtained by multiplying the proportions surviving in this and all earlier intervals. Ignoring censored cases, the second row is approximately (266−10−17)/266 ≈ 0.898.
− Probability Density. An estimate of the probability of experiencing the terminal event during the interval, per unit of time (i.e. the likelihood, seen from the start of the table, that an item experiences the terminal event at that point in time).
− Hazard Rate. An estimate of the rate of experiencing the terminal event during the interval, conditional upon surviving to the start of the interval (i.e. the rate of death for an item of a given age).
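The difference between the density and the hazard rate is easier to see with numbers. The sketch below uses the usual actuarial estimates; whether SPSS uses exactly these formulas is an assumption, and the interval width of 12 months is hypothetical because the slides do not state it.

```python
def actuarial_density_and_hazard(cum_surv_at_start, n_exposed, n_events, width):
    """Actuarial estimates of probability density and hazard rate for one interval."""
    q = n_events / n_exposed  # proportion terminating
    p = 1 - q                 # proportion surviving
    # Density: unconditional probability of the event in the interval, per unit time.
    density = cum_surv_at_start * q / width
    # Hazard: event rate per unit time among cases still at risk,
    # evaluated at the interval midpoint.
    hazard = 2 * q / (width * (1 + p))
    return density, hazard

# First interval of the telco example: everyone is still surviving at the start (1.0),
# 264.5 exposed to risk, 10 events; a width of 12 months is assumed.
density, hazard = actuarial_density_and_hazard(1.0, 264.5, 10, 12)
```

In the first interval the two values are close because almost everyone is still at risk; in later intervals the hazard is computed among survivors only, so the two diverge.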
− The greatest number and proportion of terminal events occur
within the first year, which suggests that customers should be
monitored more closely during their first year to be sure of
their satisfaction with the company's service.
SPSS outputs: survival function
1. The horizontal axis shows the time to event. The vertical axis shows the probability of survival.
2. Any point on the survival curve shows the probability that a customer of a given service category will remain a customer past that time.
3. Total service and Basic service customers have the lowest survival curves, and E-service customers have lower curves than Plus service customers.
SPSS outputs
1. The Wilcoxon test is used to compare survival distributions among groups, with the test statistic based on differences in group mean scores.
2. Since the significance value of the test is less than 0.05, we conclude that the survival curves differ across the groups.
3. Pairwise comparisons show which pairs of groups differ significantly in their survival curves.
Kaplan-Meier procedure
The Kaplan-Meier procedure is a non-parametric method of estimating time-to-event models in the presence of censored cases.
It is a descriptive procedure for examining the distribution of time-to-event variables. We can also compare the distribution across levels of a factor variable or produce separate analyses by levels of a stratification variable.
Censored cases (right-censored cases) are those for which the event of interest has not yet happened.
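To make the estimator concrete, here is a minimal sketch of the product-limit (Kaplan-Meier) calculation in plain Python. It is not the SPSS implementation, and the five observations are made up for illustration; status 1 means the event occurred and 0 means the case was censored.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather every case (event or censored) recorded at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored cases leave the risk set after t.
        n_at_risk -= n_leaving
    return curve

# Hypothetical data: one case is censored at time 3 and one at time 8.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Censored cases never cause a drop in the curve; they only shrink the number at risk for later event times.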
Assumptions
− Probabilities for the event of interest should depend only on the time elapsed since the initial event, within each level of the factor or stratum; no covariate effects are modeled.
− Cases that enter the study at different times (for example, patients who begin treatment at different times) should behave similarly.
− Censored and uncensored cases behave the same. If, for example, many of the censored cases are patients with more serious conditions, your results may be biased.
Example (from IBM SPSS 20.0): data file: pain_medication
− A pharmaceutical company is developing an anti-inflammatory medication for treating chronic arthritic pain. The research interest is the time it takes for the drug to take effect and how it compares to an existing drug. Shorter times to effect are considered better.
Event: drug takes effect
Variables
− Time variable (duration variable): must be a continuous variable.
− Status variable: categorical variable representing the event of interest (whether or not the drug has taken effect).
− Factor variable (stratification variable): categorical variable representing a causal effect (for example, type of treatment).
− We have a factor variable (stratification factor): Treatment (0 = new drug, 1 = old drug); a status variable (event): status (0 = censored, 1 = take effect); and a time variable: time.
− We want to compare the effect of the two drugs. Null hypothesis: the survival function is the same across the levels of the factor variable (old and new drug).
Go to Analyze > Survival > Kaplan-Meier
Analyze data
Click Compare Factor
Log rank: Tests equality of survival functions by weighting all time points the
same.
Breslow: Tests equality of survival functions by weighting all time points by the
number of cases at risk at each time point.
Tarone-Ware: Tests equality of survival functions by weighting all time points by
the square root of the number of cases at risk at each time point.
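The three tests differ only in the weight given to each event time. A minimal two-group sketch, assuming the standard weighted log-rank statistic (1 degree of freedom) and made-up data; it is meant to show the weighting, not to reproduce SPSS output, and the function name is illustrative.

```python
import math

def weighted_logrank(times, events, groups, weight="logrank"):
    """Chi-square statistic (1 df) comparing two groups coded 0 and 1."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    u = 0.0    # weighted sum of observed minus expected events in group 1
    var = 0.0  # variance of u under the null hypothesis
    for t in event_times:
        n = sum(1 for x in times if x >= t)  # cases at risk
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e == 1 and g == 1)
        w = {"logrank": 1.0,                    # all time points weighted equally
             "breslow": float(n),               # weighted by number at risk
             "tarone-ware": math.sqrt(n)}[weight]
        u += w * (d1 - d * n1 / n)
        if n > 1:
            var += w ** 2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return u ** 2 / var

# Hypothetical data: group 1 has all of the early events, no censoring.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 0, 0, 0]

logrank = weighted_logrank(times, events, groups, "logrank")
breslow = weighted_logrank(times, events, groups, "breslow")
tarone_ware = weighted_logrank(times, events, groups, "tarone-ware")
```

Breslow puts more weight on early event times, when many cases are still at risk, and Tarone-Ware sits between the other two. The sketch does not guard against a zero variance, which occurs when one group is empty at every event time.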
Pooled over strata: a single test is computed for all factor levels, testing for equality of the survival function across all levels of the factor variable.
Pairwise over strata: a separate test is computed for each pair of factor levels, which is useful when a pooled test shows non-equality of survival functions.
For each stratum: a separate test is computed for each group formed by the stratification variable.
Pairwise for each stratum: a separate test is computed for each pair of factor levels, for each stratum of the stratification variable.
Click Options
SPSS outputs: survival table
The survival table is a descriptive table that details the time until the drug takes effect. The table is sectioned by each level of Treatment, and each observation occupies its own row in the table. The table is very large.
− Time. The time at which the event or censoring occurred.
− Status. Indicates whether the case experienced the terminal
event or was censored.
− Cumulative Proportion Surviving at the Time. The proportion
of cases surviving from the start of the table until this time.
When multiple cases experience the terminal event at the
same time, these estimates are printed once for that time
period and apply to all the cases whose drug took effect at
that time.
− N of Cumulative Events. The number of cases that have experienced the terminal event from the start of the table until this time.
− N of Remaining Cases. The number of cases that, at this time, have yet to experience the terminal event or be censored.
Means and medians for survival time
Overall comparison
This table provides overall tests of the equality of survival times across groups. Since the significance values of the tests are all greater than 0.05, there is no statistically significant difference in survival time between the two treatments.
Survival function
1. The survival curves give a visual representation of the life tables.
2. The horizontal axis shows the time to event. In this plot, drops in the survival curve occur whenever the medication takes effect in a patient.
3. The vertical axis shows the probability of survival (the probability of not yet having experienced the treatment effect).
Cox regression
The Cox Regression procedure is useful for modeling the time to a specified event, based upon the values of given covariates.
One or more covariates are used to predict a status (event).
The central statistical output is the hazard ratio.
The data contain censored and uncensored cases.
It is similar to logistic regression, but Cox regression assesses the relationship between survival time and covariates.
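For reference, the model behind the procedure can be written as below. This is the standard proportional hazards form and not a formula from the slides: h_0(t) (the baseline hazard), the covariates x_1, ..., x_p, and the coefficients B_1, ..., B_p are notation introduced here, with B chosen to match the Exp(B) column in SPSS output.

```latex
h(t \mid x_1, \dots, x_p) = h_0(t)\, \exp(B_1 x_1 + \dots + B_p x_p),
\qquad
\frac{h(t \mid x_k + 1)}{h(t \mid x_k)} = \exp(B_k)
```

The baseline hazard is left unspecified, which is what makes the model semi-parametric, and the hazard ratio for a one-unit increase in a covariate does not depend on t.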
Terms
− Status variable: the dependent variable in Cox regression; it should be binary.
− Time variable: measures the duration to the event defined by the status variable (continuous or discrete).
− Covariates: independent (predictor) variables. They can be categorical or continuous, and time-fixed or time-dependent.
− Interaction terms
− Categorical covariates: SPSS automatically converts them into a set of dummy variables, omitting one category (the reference).
Example (from SPSS): data file: telco
− Use Cox Regression to determine which attributes are associated with shorter "time to churn".
− Time variable: tenure (months with service)
− Status variable: churn (0 = No, 1 = Yes).
− Covariates: age, marital, address, employ, retire, gender,
reside, and custcat
Go to Analyze > Survival > Cox Regression
Run Cox regression
Click Categorical
Click Plots
Click Options
SPSS outputs
SPSS outputs: variables in the equation
− Exp(B) is the hazard ratio: the predicted multiplicative change in the hazard for a one-unit increase in the predictor.
− For binary covariates, the hazard ratio is the estimate of the ratio of the hazard rate in one group to the hazard rate in the other group.
− The value of Exp(B) for marital means that the churn hazard for an unmarried customer is 1.395 times that of a married customer.
− The value of Exp(B) for address means that the churn hazard is reduced by 100%−(100%×0.943)=5.7% for each year a customer has lived at the same address.
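The same arithmetic works for any Exp(B). A small sketch using the two values quoted above; the helper name and the five-year extension are illustrative additions, not part of the SPSS output.

```python
def percent_change_in_hazard(exp_b):
    """Percent change in the hazard for a one-unit increase in the predictor."""
    return (exp_b - 1) * 100

marital_change = percent_change_in_hazard(1.395)  # unmarried vs married
address_change = percent_change_in_hazard(0.943)  # per extra year at the same address

# Hazard ratios multiply across units: five extra years at the same address
# correspond to a ratio of 0.943 ** 5, not to 5 times the one-year reduction.
five_year_ratio = 0.943 ** 5
```

A positive result means a higher churn hazard and a negative result a lower one, so the two covariates come out at about +39.5% and −5.7%.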
This table displays the average value of each predictor variable, plus a pattern for each level of custcat. The four patterns correspond to each of the customer types, each with otherwise "average" predictor values.
SPSS outputs: survival function
− The basic survival curve is a visual display of the model-predicted time to churn for the "average" customer.
− The horizontal axis shows the time to event.
− The vertical axis shows the probability of survival.
SPSS outputs
1. The plots show the effect of customer category.
2. Total service and Basic service customers have lower survival curves because they are more likely to have shorter times to churn.
SPSS outputs
1. The horizontal axis shows the time to event.
2. The vertical axis shows the cumulative hazard, equal to the negative log of the survival probability.
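The relationship between the two plots can be checked directly. A minimal sketch with illustrative function names; the survival probability of 0.5 is an arbitrary example value.

```python
import math

def cumulative_hazard(survival_probability):
    """Cumulative hazard H(t) = -ln S(t)."""
    return -math.log(survival_probability)

def survival_from_hazard(cum_hazard):
    """Inverse relationship: S(t) = exp(-H(t))."""
    return math.exp(-cum_hazard)

h_half = cumulative_hazard(0.5)          # cumulative hazard when half have survived
s_back = survival_from_hazard(h_half)    # recovers the survival probability
```

A survival probability of 1 gives a cumulative hazard of 0, and the cumulative hazard grows without bound as survival approaches 0, so unlike a probability it can exceed 1.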
Time-dependent covariates
− The simplest form is the product of the time variable and the covariate.
− Some variables may have different values at different time periods but are not systematically related to time. In such cases, you need to define a segmented time-dependent covariate, which can be done using logical expressions.
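Both kinds of time-dependent covariate can be sketched in a few lines. This is plain Python illustrating the idea, not SPSS syntax; the function names and the readings 120 and 135 are hypothetical.

```python
def product_covariate(t, x):
    """Simple product of the time variable and a covariate (for example, time * age)."""
    return t * x

def segmented_covariate(t, values_by_period):
    """Value of a covariate measured separately in each time period.

    values_by_period is a list of (start, end, value) tuples. Each logical
    expression (start <= t < end) evaluates to 1 or 0, so only the value of
    the period containing t contributes to the sum.
    """
    return sum((start <= t < end) * value for start, end, value in values_by_period)

# Hypothetical measurements: 120 during period [0, 1), 135 during period [1, 2).
periods = [(0, 1, 120), (1, 2, 135)]
early = segmented_covariate(0.5, periods)
late = segmented_covariate(1.5, periods)
outside = segmented_covariate(2.5, periods)  # no period covers t, so the sum is 0
```

The segmented form relies on a true logical expression counting as 1 and a false one as 0, which is the same trick the logical expressions in the bullet above depend on.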
Cox regression with a time-dependent covariate
Example (from IBM SPSS 20.0): data file: recidivism
− A government law enforcement agency is concerned about recidivism rates in its area of jurisdiction. One of the measures of recidivism is the time until second arrest for offenders. The agency would like to model time to rearrest using Cox Regression, but is worried that the proportional hazards assumption is invalid across age categories.
Go to Analyze > Survival > Cox with time-dependent covariate
Run analysis: define the time-dependent covariate as the product of variable T_ and variable age
Click Model to return to the Cox regression window
Dependent variable: arrest2
Time variable: time
Covariates: age and the new product variable, T_cov
SPSS outputs
The time-dependent covariate has a significance value less than
0.05, which means it contributes to the model, but the value of the
coefficient is very small.
We found that the effect of age on recidivism is time-dependent,
and added a term to the model that helps to account for that
dependence.
Survival Analysis Using SPSS

  • 1. Survival Analysis Using SPSS Dr. Nermin Osman Assistant Lecturer of Biomedical Informatics and Medical Statistics Medical Research Institute, Alexandria University United Nations System Staff College Intern, UNITAR [email protected]
  • 2. What is survival analysis − Event history analysis − Time series analysis When use survival analysis − Research interest is about time-to-event and event is discrete occurrence. Examples of survival analysis − Duration to the hazard of death − Adoption of an innovation over time Characteristics of survival analysis − At any time point, events may occur − Factors influence events include two types: time-constant and time-dependent (factor: age). Survival analysis
  • 3. Survival analysis focuses on hazard function − Hazard: any worse event of interest occurring − Hazard might be death, engine breakdown, etc. − Hazard rate: is the instantaneous probability of the given event occurring at any point in time. It can be plotted against time on the X axis, forming a graph of the hazard rate over time. − Hazard function: the equation that describe this plotted line is the hazard function. − Hazard ratio: also called relative risk: Exp(B) in SPSS. Survival analysis
  • 5. Type of survival analysis − Nonparametric: no assumption about the shape of hazard function. Hazard function is estimated based on empirical data, showing change over time, for example, Kaplan-Meier survival analysis. − Semi-parametric: no assumption about the shape of hazard function, but make assumption about how covariates affect the hazard function, for example: Cox regression − Parametric: specify the shape of baseline hazard function and covariates effects on hazard function in advance. − Used when time is itself considered a meaningful independent variable. (parametric) − Used for predictive modeling (Logistic Regression)& Maximum likelihood method(large dataset) Survival analysis
  • 6. Terms − Events: what terminates an episode (such as death, adoption of an innovation), it is the change which causes the subject to transition from one state to another. − Durations: the number of time units an individual spends in a given state. − Dependent: probability of an event. − Survival function, s(t): is the cumulative frequency of the proportion of the sample Not experiencing the event by time t. In another word, it is the probability of event will NOT occur until time t. − Censored cases: data are censored if events start before (left- censored) or ended after (right-censored) the period of observation. Survival analysis
  • 7. Censored cases: unique characteristics of survival analysis. − For some cases, the event simply doesn’t occur before the end of study. − For some cases, they drop out from the study for reasons unrelated to the study. − For some cases, we lost track of their status sometime before the end of the study. Survival analysis
  • 9. Outline of topics − Life tables − Kaplan-Meier − Cox regression − Cox regression with a time-dependent covariate Survival analysis
  • 10. Life tables  Life Tables is a descriptive procedure for examining the distribution of time-to-event variables. We also can compare the distribution by levels of a factor variable.  The basic idea of life tables is to subdivide the period of observation into smaller time intervals. Then the probability from each of the intervals are estimated.
  • 11. Variables − Time variable (duration variable): must be a continuous variable. − Status variable: binary or dichotomous variable, represents the event of interest. − Factor variable: categorical variable. Assumption − The probability for the event of interest should depend only on time. Cases that enter the study at different times should behave similarly. − No systematic differences between censored and uncensored cases Life tables
  • 12. Example (from IBM SPSS 20.0): data file name: telco − Examine distribution of customer time to churn by customer category. − Time variable: tenure (in month) − Status variable: churn (binary: 1 = Churn, 0 = Not churn) − Factor: custcat (four categories) Go to Analyze > Survival > Life Tables Life tables
  • 14. Click Options Life tables 1. Survival: display the cumulative survival function on a linear scale 2. Hazard: display the cumulative hazard function on a linear scale.
  • 15. SPSS Outputs: life table Life tables
  • 16. SPSS outputs: life table − Interval Start Time. The beginning value for each interval. Each interval extends from its start time up to the start time of the next interval. − Number Withdrawing during Interval: the number of censored cases in this interval. These are still active customers, but so far they have not been customers longer than the time period indicated by this interval. − Number Exposed to Risk. The number of surviving cases minus one half the censored cases. This is intended to account for the effect of the censored cases. Life tables
  • 17. SPSS outputs: life table − Number of Terminal Events. The number of cases that experience the terminal event in this interval. These are customers with churn = 1. − Proportion Terminating. The ratio of terminal events to the number exposed to risk (10/264.5=0.0378). − Proportion Surviving. One minus the proportion terminating. Life tables
  • 18. SPSS Outputs: life table Life tables
  • 19. SPSS Outputs: life table − Cumulative Proportion Surviving at End of Interval. The proportion of cases surviving from the start of the table to the end of the interval (266-10-17)/266=0.8984 (second row). − Probability Density. An estimate of the probability of experiencing the terminal event during the interval. (i.e. likelihood that an item will experience the terminal event at a certain point in time based on its survival to an earlier time) − Hazard Rate. An estimate of experiencing the terminal event during the interval, conditional upon surviving to the start of the interval. (i.e. the rate of death for an item of a given time) Life tables
  • 20. SPSS Outputs: life table − The greatest number and proportion of terminal events occur within the first year, which suggests that customers should be monitored more closely during their first year to be sure of their satisfaction with the company's service. Life tables
  • 21. SPSS Outputs: survival function Life tables 1.The horizontal axis shows the time to event. The vertical axis shows the probability of survival. 2.Any point on the survival curve shows the probability that a customer of a given service category will remain a customer past that time. 3.Total service and Basic service customers have the lowest survival curves, and E- service customers have lower curves than Plus service customers.
  • 22. SPSS Outputs Life tables 1.Wilcoxon test is used to compare survival distribution among groups, with the test statistic based on differences in group mean scores. 2.Since the significance value of the test is less than 0.05, we conclude that the survival curves are different across the group. 3. Pairwise comparisons show which two groups are significantly different in survival curves.
  • 23. The Kaplan-Meier procedure is a non-parametric method of estimating time-to-event models in the presence of censored cases. It is a descriptive procedure for examining the distribution of time-to-event variables. We can also compare the distribution across levels of a factor variable, or produce separate analyses for each level of a stratification variable. Censored cases (right-censored cases) are those for which the event of interest has not yet happened. Kaplan-Meier procedure
  • 24. Assumptions − Probabilities for the event of interest should depend only on time after the initial event, within each level of the factor or stratification variable; covariate effects are not modeled. − Cases that enter the study at different times (for example, patients who begin treatment at different times) should behave similarly. − Censored and uncensored cases should behave the same. If, for example, many of the censored cases are patients with more serious conditions, your results may be biased. Kaplan-Meier procedure
  • 25. Example (from IBM SPSS 20.0) : data file: pain_medication − A pharmaceutical company is developing an anti-inflammatory medication for treating chronic arthritic pain. The research interest is the time it takes for the drug to take effect and how it compares to an existing drug. Shorter times to effect are considered better. Event: drug takes effect Kaplan-Meier procedure
  • 26. Variables − Time variable (duration variable): must be a continuous variable − Status variable: categorical variable, represents the event of interest (drug has effect or not). − Factor variable (stratification variable): categorical variable, represents a causal effect (type of treatment for example). Kaplan-Meier procedure
  • 27. Kaplan-Meier procedure − We have a Factor variable (stratification factor): Treatment (0 = new drug, 1 = old drug); a Status variable (event): status (0 = censored, 1 = takes effect); and a Time variable: time. − We want to compare the effect of the two drugs. Null hypothesis: the survival function is the same across levels of the factor variable (old and new drug).
  • 28. Go to Analyze > Survival > Kaplan-Meier Kaplan-Meier procedure
  • 30. Click Compare Factor Kaplan-Meier procedure − Log rank: Tests equality of survival functions by weighting all time points the same. − Breslow: Tests equality of survival functions by weighting all time points by the number of cases at risk at each time point. − Tarone-Ware: Tests equality of survival functions by weighting all time points by the square root of the number of cases at risk at each time point.
  • 31. Compare factor Kaplan-Meier procedure − Pooled over strata: a single test is computed for all factor levels, testing for equality of survival functions across all levels of the factor variable. − Pairwise over strata: a separate test is computed for each pair of factor levels, which is useful when a pooled test shows non-equality of survival functions. − For each stratum: a separate test is computed for each group formed by the stratification variable. − Pairwise for each stratum: a separate test is computed for each pair of factor levels, within each stratum of the stratification variable.
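The three tests differ only in the weight applied at each event time. The sketch below, a minimal two-group version written for illustration, assumes the standard weighted log-rank formulation (observed minus expected events, with a hypergeometric variance) and a chi-square reference distribution with 1 degree of freedom. It is not SPSS's implementation, and the function name and data are hypothetical.

```python
import math


def weighted_test(times, events, groups, weight="logrank"):
    """Two-group weighted log-rank family test.

    times: observed times; events: 1 = event, 0 = censored;
    groups: 0 or 1 per case; weight: 'logrank', 'breslow' or 'tarone-ware'.
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    weight_fn = {
        "logrank": lambda n: 1.0,                # all time points equal
        "breslow": lambda n: float(n),           # number at risk
        "tarone-ware": lambda n: math.sqrt(n),   # sqrt of number at risk
    }[weight]

    score = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        n = sum(1 for u in times if u >= t)                       # at risk
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 1)
        w = weight_fn(n)
        score += w * (d1 - d * n1 / n)           # observed minus expected
        if n > 1:
            variance += (w * w * d * (n1 / n) * (1 - n1 / n)
                         * (n - d) / (n - 1))
    if variance == 0.0:
        return 0.0, 1.0  # nothing to compare; treated as no evidence
    chi_square = score * score / variance
    p_value = math.erfc(math.sqrt(chi_square / 2.0))  # chi-square, 1 df
    return chi_square, p_value


# Hypothetical data: group 1 has events at times 1 and 3, group 0 at 2 and 4.
times, events, groups = [1, 3, 2, 4], [1, 1, 1, 1], [1, 1, 0, 0]
for name in ("logrank", "breslow", "tarone-ware"):
    print(name, weighted_test(times, events, groups, weight=name))
```

Because Breslow and Tarone-Ware weight by the number at risk, they emphasize early differences between the curves more than the log rank test does.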
  • 33. SPSS outputs: survival table Kaplan-Meier procedure The survival table is a descriptive table that details the time until the drug takes effect. The table is sectioned by each level of Treatment, and each observation occupies its own row in the table. The table is very large.
  • 34. Survival table − Time. The time at which the event or censoring occurred. − Status. Indicates whether the case experienced the terminal event or was censored. − Cumulative Proportion Surviving at the Time. The proportion of cases surviving from the start of the table until this time. When multiple cases experience the terminal event at the same time, these estimates are printed once for that time period and apply to all the cases whose drug took effect at that time. Kaplan-Meier procedure
  • 35. Survival table − N of Cumulative Events. The number of cases that have experienced the terminal event from the start of the table until this time. − N of Remaining Cases. The number of cases that, at this time, have yet to experience the terminal event or be censored. Kaplan-Meier procedure
  • 36. Means or Median for Survival time Kaplan-Meier procedure
  • 37. Overall comparison Kaplan-Meier procedure This table provides overall tests of the equality of survival times across groups. Since the significance values of the tests are all greater than 0.05, there is no statistically significant difference in survival time between the two treatments.
  • 38. Survival function Kaplan-Meier procedure 1. The survival curves give a visual representation of the life tables. 2. The horizontal axis shows the time to event. In this plot, drops in the survival curve occur whenever the medication takes effect in a patient. 3. The vertical axis shows the probability of survival (the probability of not yet experiencing the treatment effect).
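The survival table columns described above follow from the product-limit (Kaplan-Meier) estimator: at each event time, the cumulative proportion surviving is multiplied by one minus the fraction of at-risk cases that experienced the event. The following is a minimal sketch with hypothetical data and names, assuming cases censored at a given time are still at risk for events at that same time.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: observed times; events: 1 = terminal event, 0 = censored.
    Returns a list of (time, cumulative_proportion_surviving), one entry
    per distinct event time, so tied events share a single estimate.
    """
    at_risk = len(times)   # N of remaining cases before the first time
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        c = sum(1 for u, e in zip(times, events) if u == t and e == 0)
        if d > 0:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= d + c   # events and censored cases leave the risk set
    return curve


# Hypothetical data: four cases, the second one censored at time 2.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```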
  • 39. The Cox Regression procedure is useful for modeling the time to a specified event, based upon the values of given covariates. One or more covariates are used to predict a status (event). The central statistical output is the hazard ratio. Data may contain both censored and uncensored cases. It is similar to logistic regression, but Cox regression assesses the relationship between survival time and covariates. Cox regression
  • 40. Terms − Status variable: the dependent variable in Cox regression; it should be a binary variable. − Time variable: measures duration to the event defined by the status variable (continuous or discrete). − Covariates: independent/predictor variables. They can be categorical or continuous. They can also be time-fixed or time-dependent. − Interaction terms − Categorical covariates: SPSS automatically converts them into a set of dummy variables, omitting one category (the reference). Cox regression
  • 41. Example (from SPSS): data file: telco − Use Cox Regression to determine which attributes are associated with shorter "time to churn". − Time variable: tenure (months with service) − Status variable: churn (0 = No, 1 = Yes). − Covariates: age, marital, address, employ, retire, gender, reside, and custcat Go to Analyze > Survival > Cox Regression Cox regression
  • 48. SPSS Outputs: variables in the equation − Exp(B) can be interpreted as the predicted multiplicative change in the hazard for a unit increase in the predictor. − For binary covariates, the hazard ratio is the estimate of the ratio of the hazard rate in one group to the hazard rate in another group. − The value of Exp(B) for marital means that the churn hazard for an unmarried customer is 1.395 times that of a married customer. − The value of Exp(B) for address means that the churn hazard is reduced by 100%−(100%×0.943)=5.7% for each additional year a customer has lived at the same address. Cox regression
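The percentage interpretation above is simple arithmetic on Exp(B). The sketch below uses the two Exp(B) values quoted on the slide; the helper names are illustrative, and the multi-unit function relies on the Cox model's log-linear form, under which a change of k units multiplies the hazard by Exp(B) raised to the power k.

```python
def percent_change_in_hazard(exp_b):
    """Percent change in the hazard for a one-unit increase in the predictor."""
    return (exp_b - 1.0) * 100.0


def hazard_ratio_for_change(exp_b, units):
    """Hazard ratio for a change of `units` in the predictor: Exp(B) ** units."""
    return exp_b ** units


# Exp(B) values quoted on the slide for marital and address.
for name, exp_b in (("marital", 1.395), ("address", 0.943)):
    print(f"{name}: {percent_change_in_hazard(exp_b):+.1f}% per unit")

# Hazard ratio for five additional years at the same address.
print(f"address, 5 years: {hazard_ratio_for_change(0.943, 5):.3f}")
```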
  • 49. SPSS Outputs Cox regression This table displays the average value of each predictor variable, plus a pattern for each level of custcat. The four patterns correspond to each of the customer types, each with otherwise "average" predictor values.
  • 50. Cox regression − SPSS Outputs: survival function − The basic survival curve is a visual display of the model-predicted time to churn for the "average" customer. − The horizontal axis shows the time to event. − The vertical axis shows the probability of survival.
  • 51. SPSS Outputs Cox regression 1. The plots show the effect of customer category. 2. Total service and Basic service customers have lower survival curves because they are more likely to have shorter times to churn.
  • 52. SPSS Outputs Cox regression 1. The horizontal axis shows the time to event. 2. The vertical axis shows the cumulative hazard, equal to the negative log of the survival probability.
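The relationship between the two plots, H(t) = -ln S(t), can be sketched in a few lines. The function names and the sample survival probabilities are hypothetical.

```python
import math


def cumulative_hazard(survival_probability):
    """Cumulative hazard as the negative log of the survival probability."""
    if not 0.0 < survival_probability <= 1.0:
        raise ValueError("survival probability must be in (0, 1]")
    return -math.log(survival_probability)


def survival_from_cumulative_hazard(cum_hazard):
    """Inverse relationship: S(t) = exp(-H(t))."""
    return math.exp(-cum_hazard)


# Hypothetical survival probabilities read off a survival curve.
for s in (1.0, 0.9, 0.5):
    print(s, cumulative_hazard(s))
```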
  • 53. Time-dependent covariates − The simplest time-dependent covariate is the product of the time variable and the covariate. − Some variables may have different values at different time periods but aren't systematically related to time. In such cases, you need to define a segmented time-dependent covariate, which can be done using logical expressions. Cox regression with a time-dependent covariate
  • 54. Example (from IBM SPSS 20.0): data file: recidivism − A government law enforcement agency is concerned about recidivism rates in its area of jurisdiction. One of the measures of recidivism is the time until second arrest for offenders. The agency would like to model time to rearrest using Cox Regression, but is worried that the proportional hazards assumption is invalid across age categories. Go to Analyze > Survival > Cox with time-dependent covariate Cox regression with a time-dependent covariate
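Both kinds of time-dependent covariate can be sketched as plain functions of time. This is an illustration of the idea only, not SPSS syntax: the function names, the cutoff, and the sample values are hypothetical. The segmented version relies on each logical expression evaluating to 1 when true and 0 when false, so that exactly one term contributes at any time.

```python
def product_covariate(t, covariate):
    """Simple time-dependent covariate: time variable times the covariate."""
    return t * covariate


def segmented_covariate(t, early_value, late_value, cutoff):
    """Segmented covariate built from logical expressions.

    (t <= cutoff) and (t > cutoff) evaluate to True/False, which act as
    1/0 in arithmetic, so only one of the two measurements is selected.
    """
    return (t <= cutoff) * early_value + (t > cutoff) * late_value


# Hypothetical case: age 30 evaluated at time 2.
print(product_covariate(2, 30))

# Hypothetical measurements taken before and after time 6.
for t in (3, 9):
    print(t, segmented_covariate(t, early_value=120, late_value=135, cutoff=6))
```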
  • 55. Run analysis: the product of variable T_ and variable age Cox regression with a time-dependent covariate
  • 56. Click Model and back to Cox regression window Cox regression with a time-dependent covariate Dependent variable: arrest2 Time variable: time Covariates: age and new product variable: T_cov
  • 57. SPSS outputs Cox regression with a time-dependent covariate The time-dependent covariate has a significance value less than 0.05, which means it contributes to the model, but the value of the coefficient is very small. We found that the effect of age on recidivism is time-dependent, and added a term to the model that helps to account for that dependence.