A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling Results Using the Method on Traumatic Brain Injury

Ohio Center of Excellence in Knowledge-Enabled Computing
Vahid Taslimitehrani, Guozhu Dong
kno.e.sis center
Department of Computer Science and Engineering
Wright State University
Dayton, OH

Outline
• Motivation and background
• Preliminaries
– Contrast pattern mining
– Logistic regression
• CPXR(Log)
• TBI data
• Results of CPXR(Log) on TBI
• Conclusion
• References

Motivation and Background
• CPXR(Log): accurate and informative prognostic models
– Prognostic models are central to medicine. [Steyerberg, 2009]
– They support physicians' decisions on patient treatment plans, screening, and so on.
– They help in understanding disease behavior, including identifying new biomarkers.
– The number of PubMed articles with “prediction model” in the title was 7 times higher in 2012 than in 2000. [pubmed]

• CPXR(Log): a powerful new generic logistic regression method
– Logistic regression is one of the most popular approaches for building clinical prediction models. [Steyerberg, 2009]
– Logistic regression models are desirable because
  – they are representable;
  – they are probability-based;
  – they are flexible in terms of predictor variables (both categorical and numerical).

• Traumatic brain injury (TBI)
– One of the leading causes of death and disability worldwide.
– About 1.5 million deaths worldwide each year. [Perel, 2006]
– $76.5 billion in direct and indirect costs in the US in 2010. [www.cdc.gov]
– Physicians need early and accurate prognostic models, based only on admission-time data, to make time-critical clinical decisions.

Challenges in clinical modeling
• Accuracy of the clinical prediction models
• Ease of interpreting clinical prediction models
• Explaining medical decisions to the patient
• Identifying important risk factors
• Avoiding overfitting, so that clinical prediction models generalize better
• Early decision making
• Ability to capture
– Heterogeneous patient group behavior

CPXR works well by using several (pattern, local model) pairs
The data contain different subpopulations that need different prediction models. Using just one prediction function does not work well.
This is not an extreme case; it happens very often.

How is CPXR(Log) different from other classifiers?
• CPXR introduced the idea of
– using patterns to logically characterize different subpopulations of data,
– using local regression models to represent the predictor-response relationship of each subpopulation, and
– choosing a pattern only if its local model is very accurate. [Dong, 2014]
• CPXR(Log)
– can capture diversified/heterogeneous behavior.
– is more generalizable.
– is less prone to overfitting than other classifiers.
• CPXR(Log) is more accurate than other classifiers such as SVM and Random Forest.

Traditional classification vs CPXR
• Traditional classification: Training data → Classification engine → Classifier (model)
• CPXR: Training data → Classification engine → Baseline model → Large error data and Small error data → Build and select CPs & local models → (Pattern 1, Model 1), (Pattern 2, Model 2), …, (Pattern k, Model k)

CPXR(Log) – PXR concept
• Definition: Let $D = \{(X_i, Y_i) \mid 1 \le i \le n\}$ be training data for regression. Let $f$ be a regression model built on $D$, which we will call the baseline model on $D$. A pattern aided regression (PXR) model is a tuple $\mathit{PM} = ((P_1, f_1, w_1), \ldots, (P_k, f_k, w_k), f_d)$, where $\{P_1, \ldots, P_k\}$ is the pattern set of $\mathit{PM}$, the $f_i$ are local regression models of the $P_i$, and $f_d$ is the default regression model. We define the regression model of $\mathit{PM}$ as

$$f_{\mathit{PM}}(x) = \begin{cases} \dfrac{\sum_{P_i \in \pi_x} w_i f_i(x)}{\sum_{P_i \in \pi_x} w_i} & \text{if } \pi_x \neq \emptyset \\[2ex] f_d(x) & \text{otherwise} \end{cases}$$

for each instance $x$, where $\pi_x = \{P_i \mid 1 \le i \le k,\ x \text{ satisfies } P_i\}$.
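The PXR prediction rule can be sketched in a few lines of Python. This is only an illustration of the weighted average in the definition: the names `pxr_predict`, `triples`, and `default_model` are hypothetical, patterns and models are represented as plain callables, and the toy patterns, local models, and weights are made up. It also assumes the weights of the matching patterns sum to a positive value.

```python
from typing import Callable, Dict, List, Tuple

Instance = Dict[str, object]
Pattern = Callable[[Instance], bool]   # True when the instance satisfies the pattern
Model = Callable[[Instance], float]    # prediction of a (local or default) model


def pxr_predict(
    x: Instance,
    triples: List[Tuple[Pattern, Model, float]],
    default_model: Model,
) -> float:
    """Evaluate f_PM(x): weighted average over matching patterns, else f_d(x)."""
    matched = [(f_i, w_i) for (p_i, f_i, w_i) in triples if p_i(x)]
    if not matched:
        # pi_x is empty, so the default model is used.
        return default_model(x)
    total_weight = sum(w_i for _, w_i in matched)
    return sum(w_i * f_i(x) for f_i, w_i in matched) / total_weight


# Hypothetical toy model: two patterns with constant local models.
triples = [
    (lambda x: x["age"] >= 60, lambda x: 0.8, 1.0),
    (lambda x: bool(x["smoker"]), lambda x: 0.4, 3.0),
]
default_model = lambda x: 0.1

both = pxr_predict({"age": 70, "smoker": True}, triples, default_model)
none = pxr_predict({"age": 30, "smoker": False}, triples, default_model)
print(both, none)
```

An instance that matches both patterns gets the weighted average of the two local predictions; an instance that matches none falls back to the default model.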

Preliminaries: Contrast Patterns
• A toy example

TID   A1   A2   A3   A4   A5   Class
t1    b    d    e    g    i    C1
t2    b    c    e    g    i    C1
t3    a    c    e    g    j    C2
t4    a    c    e    h    j    C2
t5    b    d    f    g    i    C2

• $P_1 = \{A_2 = c,\ A_3 = e\}$, $\mathrm{mt}(P_1, D) = \{t_2, t_3, t_4\}$, $\mathrm{supp}(P_1, D) = \frac{3}{5} = 60\%$
• $\mathrm{suppRatio}_{C_1}^{C_2}(P_1) = \frac{2}{1} = 2$ ($P_1$ matches 2 instances of $C_2$ and 1 instance of $C_1$)
• Given a threshold like 2, $P_1$ is a contrast pattern.
• Details: We only consider one minimal generator pattern for each “equivalence class” of contrast patterns.

Quality measures
• CPXR(Log) needs to efficiently extract a desirable pattern set from a
huge search space of potential pattern sets.
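• Definition: The average residual reduction (arr) of a pattern $P$ w.r.t. a model $f$ and a dataset $D$ is

$$\mathrm{arr}(P) = \frac{\sum_{X \in \mathrm{mds}(P)} r_X(f) - \sum_{X \in \mathrm{mds}(P)} r_X(f_P)}{|\mathrm{mds}(P)|}$$

• Definition: The total residual reduction (trr) of a pattern set $\mathit{PS} = \{P_1, \ldots, P_k\}$ w.r.t. a model $f$ and a dataset $D$ is

$$\mathrm{trr}(\mathit{PS}) = \frac{\sum_{X \in \mathrm{mds}(\mathit{PS})} r_X(f) - \sum_{X \in \mathrm{mds}(\mathit{PS})} r_X(f_{\mathit{PM}})}{\sum_{X \in D} r_X(f)}$$

where $\mathit{PM} = ((P_1, f_{P_1}, w_1), \ldots, (P_k, f_{P_k}, w_k), f)$, $w_i = \mathrm{arr}(P_i)$, and $\mathrm{mds}(\mathit{PS}) = \bigcup_{P \in \mathit{PS}} \mathrm{mds}(P)$.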
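As a rough illustration of the two measures, the sketch below computes arr and trr from per-instance errors. It assumes that the errors are non-negative magnitudes and that mds(P) is given as a set of instance ids; the function names, the instance ids, and all error values are hypothetical.

```python
from typing import Dict, List, Set


def arr(base_err: Dict[str, float], local_err: Dict[str, float], mds: Set[str]) -> float:
    """Average residual reduction of one pattern over its matching data mds(P)."""
    reduction = sum(base_err[x] for x in mds) - sum(local_err[x] for x in mds)
    return reduction / len(mds)


def trr(base_err: Dict[str, float], pm_err: Dict[str, float], mds_list: List[Set[str]]) -> float:
    """Total residual reduction of a pattern set, relative to the baseline's total error."""
    covered = set().union(*mds_list)  # mds(PS): union of the patterns' matching data
    reduction = sum(base_err[x] for x in covered) - sum(pm_err[x] for x in covered)
    return reduction / sum(base_err.values())


# Hypothetical errors of the baseline model f on four instances.
base_err = {"t1": 0.8, "t2": 0.6, "t3": 0.1, "t4": 0.5}
# Hypothetical errors of a local model f_P on mds(P) = {t1, t2}.
local_err = {"t1": 0.2, "t2": 0.2}
# Hypothetical errors of the PXR model f_PM on the covered instances.
pm_err = {"t1": 0.2, "t2": 0.2, "t4": 0.3}

arr_p = arr(base_err, local_err, {"t1", "t2"})
trr_ps = trr(base_err, pm_err, [{"t1", "t2"}, {"t2", "t4"}])
print(round(arr_p, 3), round(trr_ps, 3))
```

Note that arr is normalized by the size of the pattern's matching data, while trr is normalized by the baseline's total error on all of D, so trr is directly comparable across pattern sets with different coverage.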

CPXR(Log) algorithm -- outline
• First step: split training dataset 𝐷 into two classes, 𝐿𝐸 and 𝑆𝐸.
• 𝐿𝐸: instances of 𝐷 where baseline model 𝑓 makes Large Error.
• 𝑆𝐸: instances of 𝐷 where baseline model 𝑓 makes Small Error.
• Second step: extract all contrast patterns on 𝐿𝐸 satisfying 𝑚𝑖𝑛𝑆𝑢𝑝.
• Third step: search for a small set of patterns that maximizes error reduction, and use that set to build a PXR model.
• Note
– Each pattern $P$ is associated with a local regression model $f_P$ built on $P$’s matching data.
– Using a pattern $P$ and its associated local regression model $f_P$ is a flexible way to represent one predictor-response relationship.
– Different $(P, f_P)$ pairs can represent highly different predictor-response relationships.

CPXR(Log) – details (1)
• Inputs:
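• Training data $D = \{(x_i, y_i) \mid 1 \le i \le n\}$
• Baseline model $f$
• $\rho$ to partition $D$ into $\mathit{LE}$ and $\mathit{SE}$
• $\mathit{minSup}$ threshold on contrast patterns
• Output:
• A PXR model
• Steps:
– Let $r_1, \ldots, r_n$ denote $f$’s errors on $x_1, \ldots, x_n$;
– Determine $\kappa$ to minimize $\left| \rho - \dfrac{\sum_{r_i > \kappa} r_i}{\sum_{i} r_i} \right|$;
– Let $\mathit{LE} = \{x_i \mid r_i > \kappa\}$, $\mathit{SE} = D - \mathit{LE}$;
– Discretize each numerical variable using entropy-based binning;
– Extract all contrast patterns for $\mathit{minSup}$ in the $\mathit{LE}$ class ($\mathit{CPS}$);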
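The choice of κ and the resulting LE/SE split might look like the following sketch. It assumes non-negative errors and only tries 0 and the observed error values as candidates for κ, which is an implementation choice rather than something the slides specify; `split_by_error`, the error values, and ρ = 0.5 are all hypothetical.

```python
from typing import List, Tuple


def split_by_error(errors: List[float], rho: float) -> Tuple[float, List[int], List[int]]:
    """Pick kappa so that LE's share of the total error is as close to rho as possible.

    Returns (kappa, indices of LE, indices of SE).
    """
    total = sum(errors)
    best_kappa, best_gap = 0.0, float("inf")
    # The share of error above kappa only changes at observed error values.
    for kappa in [0.0] + sorted(set(errors)):
        share = sum(r for r in errors if r > kappa) / total
        gap = abs(rho - share)
        if gap < best_gap:
            best_kappa, best_gap = kappa, gap
    le = [i for i, r in enumerate(errors) if r > best_kappa]
    se = [i for i, r in enumerate(errors) if r <= best_kappa]
    return best_kappa, le, se


# Hypothetical baseline errors r_1..r_4 and a hypothetical rho.
errors = [0.1, 0.2, 0.3, 0.4]
kappa, le, se = split_by_error(errors, rho=0.5)
print(kappa, le, se)
```

With these toy errors, the single largest-error instance ends up in LE because its share of the total error is closest to the requested ρ.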

CPXR(Log) – details (2)
– For each $P \in \mathit{CPS}$, build the local regression model $f_P$ for data in $\mathrm{mds}(P)$;
– Let $\mathit{PS} = \{P_0\}$, where $P_0$ is the pattern $P$ in $\mathit{CPS}$ with the highest $\mathrm{arr}$;

– Let $f_d$ be the regression model trained from $D - \bigcup_{P \in \mathit{PS}} \mathrm{mds}(P)$;
– Return $\mathit{PM}(\mathit{PS}, f_d)$;

TBI data
• The TBI dataset combines data from the International and US Tirilazad trials.
• 2159 instances. [Steyerberg, 2008]
• 15 numerical and categorical predictor variables.
• Missing values were handled using multiple imputation.
• The outcome variable is the Glasgow Outcome Scale: GOS 1 (dead), …, GOS 5 (good recovery).
• This study used two binary versions of GOS: “Mortality” vs survival (GOS 1 vs GOS 2-5) and “Unfavorable” vs favorable (GOS 1-3 vs GOS 4-5).

Category                   Predictor variables
Basic                      Cause of injury, age, GCS motor score, pupil reactivity
Computed tomography (CT)   Hypoxia, hypotension, Marshall CT, tSAH, eDH, compressed cistern, midline shift more than 5 mm
Lab                        Glucose, pH, sodium, Hb
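The two binary outcomes can be derived from the five-point GOS as in this small sketch. The function name `dichotomize_gos` and the dictionary keys are illustrative choices, not names from the study.

```python
from typing import Dict


def dichotomize_gos(gos: int) -> Dict[str, bool]:
    """Map a Glasgow Outcome Scale value (1..5) to the two binary outcomes."""
    if gos not in (1, 2, 3, 4, 5):
        raise ValueError(f"GOS must be between 1 and 5, got {gos}")
    return {
        "mortality": gos == 1,    # GOS 1 vs GOS 2-5
        "unfavorable": gos <= 3,  # GOS 1-3 vs GOS 4-5
    }


for gos in range(1, 6):
    print(gos, dichotomize_gos(gos))
```

Every death (GOS 1) also counts as an unfavorable outcome, so the “Unfavorable” class is a superset of the “Mortality” class.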

Results – Performance of SLogR and CPXR(Log) on Mortality models

               SLogR                                    CPXR(Log)
Model          Specificity  Sensitivity  F1    AUC      Specificity  Sensitivity  F1    AUC
Basic          0.95         0.18         0.27  0.77     0.96         0.18         0.28  0.8
Basic+CT       0.95         0.32         0.42  0.8      0.96         0.42         0.53  0.88
Basic+CT+Lab   0.94         0.36         0.46  0.8      0.97         0.46         0.58  0.92

CPXR(Log) reaches a higher AUC than standard logistic regression (SLogR) on all three variable sets.

Results – Performance of SLogR and CPXR(Log) on Unfavorable models

               SLogR                                    CPXR(Log)
Model          Specificity  Sensitivity  F1    AUC      Specificity  Sensitivity  F1    AUC
Basic          0.85         0.52         0.59  0.76     0.89         0.54         0.63  0.82
Basic+CT       0.85         0.6          0.66  0.8      0.87         0.65         0.7   0.87
Basic+CT+Lab   0.84         0.61         0.66  0.81     0.91         0.72         0.76  0.93

Results – Impact of adding more variables on AUC

AUC improvement when more variables are used by CPXR(Log) and SLogR:

                         Mortality             Unfavorable
Variable set change      CPXR(Log)  SLogR      CPXR(Log)  SLogR
Basic → Basic+CT         10%        7.7%       6%         5.2%
Basic → Basic+CT+Lab     15%        11.1%      13.4%      6.6%

AUC improvement of CPXR(Log) over SLogR:

Mortality                              Unfavorable
Basic   Basic+CT   Basic+CT+Lab        Basic   Basic+CT   Basic+CT+Lab
11.1%   12.8%      15%                 7.9%    8.8%       14.8%
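The slides do not say how these percentages were computed. The sketch below assumes they are relative AUC changes, which approximately reproduces the Unfavorable columns when applied to the AUC values in the Unfavorable results table; the helper `relative_gain` and the dictionary names are hypothetical.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# AUC values copied from the Unfavorable results table.
auc = {
    "SLogR": {"Basic": 0.76, "Basic+CT": 0.80, "Basic+CT+Lab": 0.81},
    "CPXR(Log)": {"Basic": 0.82, "Basic+CT": 0.87, "Basic+CT+Lab": 0.93},
}

# Gain from adding variables, per method (relative to the Basic model).
added_vars_gain = {
    method: {
        name: relative_gain(values["Basic"], values[name])
        for name in ("Basic+CT", "Basic+CT+Lab")
    }
    for method, values in auc.items()
}

# Gain of CPXR(Log) over SLogR on each variable set.
over_slogr_gain = {
    name: relative_gain(auc["SLogR"][name], auc["CPXR(Log)"][name])
    for name in auc["SLogR"]
}

for method, gains in added_vars_gain.items():
    for name, gain in gains.items():
        print(f"{method}: Basic -> {name}: {gain:.1f}%")
for name, gain in over_slogr_gain.items():
    print(f"CPXR(Log) over SLogR, {name}: {gain:.1f}%")
```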

Results – ROC curves of Basic models

Results – ROC curves of (Basic+CT) models

Results – ROC curves of (Basic+CT+Lab) models

Results – Performance comparison
Comparing CPXR(Log) performance with
– Logistic Regression
– SVM
– Random Forest

Example: patterns used by the CPXR(Log) Mortality model (Basic+CT+Lab)

No.   arr   Cov   Pattern
I     15%   20%   (CT classification = III)
II    12%   15%   (CT classification = V) AND (midline shift) AND (0.56 < glucose <= 10.4)
III   10%   40%   (No compressed cistern) AND (No midline shift) AND (7.22 < PH <= 7.45)
IV    18%   18%   (10.77 < glucose <= 21.98) AND (134 < sodium <= 144)
V     19%   20%   (No Hypotension) AND (134 < sodium < 144) AND (10.55 < HB <= 14.57) AND (with tSAH)
VI    19%   20%   (No tSAH) AND (134 < sodium <= 144) AND (10.77 < glucose <= 21.98) AND (No Hypotension) AND (No midline shift) AND (One reactive pupil)
VII   18%   40%   (No tSAH) AND (One reactive pupil)

Odds ratios
(CT classification = V) AND (midline shift) AND (0.56 < glucose <= 10.4)

Residual reduction and example patient
• Age = 15 years
• Cause of injury = motorbike accident
• GCS motor score = 5 (No eye response)
• No reactive pupil
• No hypoxia
• No hypotension
• CT scan classification = V (mass lesion)
• No tSAH
• With ePDH
• Has midline shift of more than 5 mm
• Glucose = 9.06 mmol/l
• PH = 7.37
• Sodium = 141 mmol/l
• Hb = 14.4 g/dl
• Outcome: the patient died.
Standard logistic regression predicted a 0.78 probability of survival for this patient.
The patient matches “pattern II”, and CPXR(Log) predicted a 0.38 probability of survival.
[Figure: Error distribution of TBI dataset on SLogR]
[Figure: Error distribution of TBI dataset on CPXR(Log)]

Results – Box plot of RMSE reduction in CPXR
How much CPXR reduces RMSE (root mean square error) on 50 datasets compared to
• Piecewise linear regression
• Support vector regression
• Bayesian additive regression tree
• Gradient boosting method

Results – Noise sensitivity and impact of the number of patterns
• The number of patterns is determined automatically by the method.
• How much does noise in the datasets affect the performance of CPXR and other methods?

Conclusion
• We presented CPXR(Log), an effective new method for logistic regression and clinical predictive modeling.
• We showed that CPXR(Log) is more accurate than standard logistic regression and some other classification algorithms.
• We also presented CPXR(Log) models, including their patterns, local models, and new odds ratios of predictor variables.

References
• G. Dong and V. Taslimitehrani: Pattern-Aided Regression Modeling and Prediction Model Analysis. Tech Report, CSE, Wright State Univ., 2014.
• E. Steyerberg: Clinical prediction models. Springer, 2009.
• P. Perel, P. Edwards, R. Wentz, and I. Roberts: Systematic review of prognostic models in traumatic brain injury. BMC Medical Informatics and Decision Making, 6(1): 1-10, 2006.
• G. Dong and J. Li: Efficient mining of emerging patterns: Discovering trends and differences. In Proc. KDD, 43-52, 1999.
• E.W. Steyerberg, et al.: Predicting outcome after traumatic brain injury: development and international validation of prognostic scores based on admission characteristics. PLoS Medicine, 5(8): e165, 2008.

Preliminaries: Logistic Regression
• Regression modeling: predicting a response variable (output) based on predictor variables (input).
• Logistic regression: the response variable is binary. For example,
– “having the disease” or “not”
– “mortal” or “not”
• Let $X = (x_1, x_2, \ldots, x_n)$ be a vector of predictor variables and $Y$ be the response variable.
• The goal of logistic regression is to learn a function $\mathit{lp}(X) = \beta_0 + \sum_{i=1}^{n} \beta_i x_i$ satisfying

$$\log \frac{P(Y = 1)}{1 - P(Y = 1)} = \mathit{lp}(X)$$

• Chi-square ($\chi^2$) is one of the goodness-of-fit measures for logistic regression.
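To make the relationship between the linear predictor and the predicted probability concrete, here is a minimal sketch. The coefficients and the input vector are made-up values, and the helper names are hypothetical; it also shows that exponentiating a coefficient gives the odds ratio for a one-unit increase of that predictor.

```python
import math
from typing import Sequence


def linear_predictor(beta0: float, betas: Sequence[float], x: Sequence[float]) -> float:
    """lp(X) = beta_0 + sum_i beta_i * x_i."""
    return beta0 + sum(b * v for b, v in zip(betas, x))


def probability(lp: float) -> float:
    """Invert the log-odds: P(Y = 1) = 1 / (1 + exp(-lp))."""
    return 1.0 / (1.0 + math.exp(-lp))


# Hypothetical coefficients and one hypothetical instance.
beta0, betas = -1.0, [0.5, 2.0]
x = [2.0, 0.0]

lp = linear_predictor(beta0, betas, x)
p = probability(lp)
log_odds = math.log(p / (1.0 - p))       # recovers lp(X)
odds_ratio_x1 = math.exp(betas[0])       # odds ratio for a one-unit increase in x_1

print(lp, p, log_odds, odds_ratio_x1)
```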

• An item is a single-variable condition of the form “$A = a$” or “$v_1 \le A < v_2$”.
• A pattern is a finite set of items.
• An instance $X$ from dataset $D$ is said to match a pattern $P$ if $X$ satisfies every item in $P$.
• Example: “$60 \le \mathit{Age} < 80$” AND “Diagnosed with high cholesterol = YES” is a pattern with two items. One instance (patient ID = 1) matches the above pattern.

Patient ID   Age   BMI   Sys Blood Pressure   Diagnosed with High Cholesterol   Diagnosed with Heart Failure (C)
1            75    22    120                  YES                               YES
2            67    27    131                  NO                                NO
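Matching can be sketched by treating each item as a predicate on an instance and a pattern as a list of items. The helper names `equals`, `in_range`, and `matches` are illustrative, and the dictionary keys are shortened column names; the two patients are the rows of the example table.

```python
from typing import Callable, Dict, List

Instance = Dict[str, object]
Item = Callable[[Instance], bool]


def equals(attr: str, value: object) -> Item:
    """Item of the form 'A = a'."""
    return lambda x: x[attr] == value


def in_range(attr: str, low: float, high: float) -> Item:
    """Item of the form 'v1 <= A < v2'."""
    return lambda x: low <= x[attr] < high


def matches(x: Instance, pattern: List[Item]) -> bool:
    """An instance matches a pattern if it satisfies every item."""
    return all(item(x) for item in pattern)


patients = [
    {"Patient ID": 1, "Age": 75, "BMI": 22, "Sys BP": 120, "High Cholesterol": "YES", "Heart Failure": "YES"},
    {"Patient ID": 2, "Age": 67, "BMI": 27, "Sys BP": 131, "High Cholesterol": "NO", "Heart Failure": "NO"},
]

pattern = [in_range("Age", 60, 80), equals("High Cholesterol", "YES")]
matching_ids = [p["Patient ID"] for p in patients if matches(p, pattern)]
print(matching_ids)
```

Patient 2 satisfies the age item but not the cholesterol item, so only patient 1 matches.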

• The matching data of pattern $P$ in dataset $D$, written $\mathrm{mt}(P, D)$, is the set of all instances of $D$ matching $P$.
• The support of pattern $P$ in $D$ is $\mathrm{supp}(P, D) = \dfrac{|\mathrm{mt}(P, D)|}{|D|}$.
• Given 2 classes $C_1$ and $C_2$, the support ratio of pattern $P$ from $C_1$ to $C_2$ is $\mathrm{suppRatio}_{C_1}^{C_2}(P) = \dfrac{\mathrm{supp}(P, C_2)}{\mathrm{supp}(P, C_1)}$.
• Given a threshold $\gamma$, a contrast pattern (emerging pattern) of class $C_2$ is a pattern $P$ satisfying $\mathrm{suppRatio}_{C_1}^{C_2}(P) \ge \gamma$. [Dong, 1999]
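These definitions translate directly into code. The following sketch uses a small made-up dataset with two classes of equal size; the attribute names, the values, and the threshold are hypothetical, and patterns are restricted to equality items for brevity. Returning infinity when the pattern has zero support in C1 is a convention assumed here, not something stated on the slide.

```python
import math
from typing import Dict, List

Instance = Dict[str, str]
Pattern = Dict[str, str]  # attribute -> required value (equality items only)


def matching_data(pattern: Pattern, data: List[Instance]) -> List[Instance]:
    """mt(P, D): all instances of D that satisfy every item of P."""
    return [x for x in data if all(x.get(a) == v for a, v in pattern.items())]


def supp(pattern: Pattern, data: List[Instance]) -> float:
    """supp(P, D) = |mt(P, D)| / |D|."""
    return len(matching_data(pattern, data)) / len(data)


def supp_ratio(pattern: Pattern, c1: List[Instance], c2: List[Instance]) -> float:
    """Support ratio of P from C1 to C2: supp(P, C2) / supp(P, C1)."""
    s1, s2 = supp(pattern, c1), supp(pattern, c2)
    if s1 == 0:
        # Assumed convention: infinite ratio if P occurs only in C2.
        return math.inf if s2 > 0 else 0.0
    return s2 / s1


# Hypothetical data: four instances per class.
c1 = [{"A": "x", "B": "u"}, {"A": "x", "B": "v"}, {"A": "y", "B": "v"}, {"A": "y", "B": "u"}]
c2 = [{"A": "x", "B": "u"}, {"A": "x", "B": "u"}, {"A": "x", "B": "u"}, {"A": "y", "B": "v"}]

pattern = {"A": "x", "B": "u"}
gamma = 2.0  # hypothetical threshold

ratio = supp_ratio(pattern, c1, c2)
is_contrast = ratio >= gamma
print(supp(pattern, c1), supp(pattern, c2), ratio, is_contrast)
```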
