Ohio Center of Excellence in Knowledge-Enabled Computing
A new CPXR Based Logistic Regression Method
and Clinical Prognostic Modeling Results Using
the Method on Traumatic Brain Injury
Vahid Taslimitehrani, Guozhu Dong
kno.e.sis center
Department of Computer Science and Engineering
Wright State University
Dayton, OH
Outline
• Motivation and background
• Preliminaries
– Contrast pattern mining
– Logistic regression
• CPXR(Log)
• TBI data
• Results of CPXR(Log) on TBI
• Conclusion
• References
Motivation and Background
• CPXR(Log): Accurate and informative prognostic models
➢ Prognostic models are central to medicine. [Steyerberg, 2009]
➢ They facilitate physicians' decision making on patient treatment plans, screening, etc.
➢ They help to understand disease behavior, including identifying new biomarkers.
➢ The number of articles listed in PubMed with “prediction model” in the title in 2012 was 7 times that in 2000. [pubmed]
Motivation and Background
• CPXR(Log): A powerful new generic logistic regression method
➢ Logistic regression is one of the most popular approaches for building clinical prediction models. [Steyerberg, 2009]
➢ Logistic regression models are desirable because
  ➢ They are representable.
  ➢ They are probability based.
  ➢ They are flexible in terms of predictor variables (categorical and numerical variables).
Motivation and Background
• Traumatic Brain Injury
➢ One of the leading causes of death and disability worldwide.
➢ About 1.5 million deaths annually worldwide. [Perel, 2006]
➢ $76.5 billion in direct and indirect costs in the US in 2010. [www.cdc.gov]
➢ Physicians need early and accurate prognostic models, based only on admission-time data, to make time-critical clinical decisions.
Challenges in clinical modeling
• Accuracy of clinical prediction models
• Ease of interpreting clinical prediction models
• To explain medical decisions to the patient
• To identify important risk factors
• Avoiding overfitting, to make clinical prediction models more generalizable
• Early decision making
• Ability to capture
– Heterogeneous patient group behavior
CPXR works well by using several (pattern, local model) pairs
These are different subpopulations that need different prediction models. Using just one prediction function does not work well!
Not an extreme case! It happens very often …
How is CPXR(Log) different from other classifiers?
• CPXR introduced the idea of
– using patterns to logically characterize different subpopulations of data,
– using local regression models to represent the predictor-response relationship of each subpopulation, and
– choosing a pattern only if its local model is very accurate. [Dong, 2014]
• CPXR(Log)
– can capture diversified/heterogeneous behavior.
– is more generalizable.
– overfits less than other classifiers.
• CPXR(Log) is more accurate than other classifiers such as SVM and Random Forest.
Traditional classification vs CPXR
[Figure: two pipelines. Traditional classification: Training data → Classification engine → Classifier (model). CPXR: Training data → Classification engine → Baseline model → large error data and small error data → build and select CPs & local models → (Pattern 1, Model 1), (Pattern 2, Model 2), …, (Pattern k, Model k).]
CPXR(Log) – PXR concept
• Definition: Let $D = \{(X_i, Y_i) \mid 1 \le i \le n\}$ be training data for regression. Let $f$ be a regression model built on $D$, which we will call the baseline model on $D$. A pattern aided regression (PXR) model is a tuple $PM = ((P_1, f_1, w_1), \dots, (P_k, f_k, w_k), f_d)$, where $\{P_1, \dots, P_k\}$ is the pattern set of $PM$, the $f_i$ are local regression models of the $P_i$, and $f_d$ is the default regression model. We define the regression model of $PM$ as
$$f_{PM}(x) = \begin{cases} \dfrac{\sum_{P_i \in \pi_x} w_i f_i(x)}{\sum_{P_i \in \pi_x} w_i} & \text{if } \pi_x \neq \emptyset \\ f_d(x) & \text{otherwise} \end{cases}$$
for each instance $x$, where $\pi_x = \{P_i \mid 1 \le i \le k,\ x \text{ satisfies } P_i\}$.
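The definition of the PXR regression function can be sketched in a few lines of Python. This is only an illustration under assumptions: patterns and models are represented as plain callables, the names `pxr_predict`, `triples` and `default_model` are hypothetical, the weights are assumed positive, and the constant local models are made up for the example.

```python
from typing import Callable, Dict, List, Tuple

Instance = Dict[str, float]
Pattern = Callable[[Instance], bool]   # True when the instance satisfies the pattern
Model = Callable[[Instance], float]    # prediction of a local or default model


def pxr_predict(x: Instance,
                triples: List[Tuple[Pattern, Model, float]],
                default_model: Model) -> float:
    """Evaluate f_PM(x) for a PXR model ((P_1, f_1, w_1), ..., (P_k, f_k, w_k), f_d)."""
    # pi_x: the (model, weight) pairs whose pattern is satisfied by x
    matched = [(f_i, w_i) for p_i, f_i, w_i in triples if p_i(x)]
    if not matched:
        return default_model(x)                      # no pattern matches: use f_d
    total_weight = sum(w_i for _, w_i in matched)    # assumed > 0
    return sum(w_i * f_i(x) for f_i, w_i in matched) / total_weight


# Hypothetical two-pattern model with constant local models (illustration only).
triples = [
    (lambda x: x["age"] >= 60, lambda x: 0.8, 2.0),
    (lambda x: x["glucose"] > 10, lambda x: 0.5, 1.0),
]
default_model = lambda x: 0.1

both = pxr_predict({"age": 70, "glucose": 12}, triples, default_model)  # (2*0.8 + 1*0.5) / 3
none = pxr_predict({"age": 30, "glucose": 5}, triples, default_model)   # falls back to f_d
```

An instance matched by several patterns gets the weight-averaged prediction of their local models; an instance matched by none gets the default model's prediction.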
Preliminaries: Contrast Patterns
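• A toy example
• $P_1 = \{A_2 = c,\ A_3 = e\}$, $mt(P_1, D) = \{t_2, t_3, t_4\}$, $supp(P_1, D) = \frac{3}{5} = 60\%$
• $suppRatio_{C_1}^{C_2}(P_1) = \frac{2}{1} = 2$ (counting matches: 2 in $C_2$, 1 in $C_1$)
• Given a threshold like 2, $P_1$ is a contrast pattern.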
• Details: We only consider one minimal generator pattern for each
“equivalency class” of contrast patterns.
TID   A1   A2   A3   A4   A5   Class
t1    b    d    e    g    i    C1
t2    b    c    e    g    i    C1
t3    a    c    e    g    j    C2
t4    a    c    e    h    j    C2
t5    b    d    f    g    i    C2
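The matching data and support in the toy example can be checked with a short Python sketch. The dictionary encoding of the table and the helper names `mt` and `supp` are illustrative choices, not code from the slides.

```python
from collections import Counter

# Toy dataset from the table above: TID -> (attribute values, class).
DATA = {
    "t1": ({"A1": "b", "A2": "d", "A3": "e", "A4": "g", "A5": "i"}, "C1"),
    "t2": ({"A1": "b", "A2": "c", "A3": "e", "A4": "g", "A5": "i"}, "C1"),
    "t3": ({"A1": "a", "A2": "c", "A3": "e", "A4": "g", "A5": "j"}, "C2"),
    "t4": ({"A1": "a", "A2": "c", "A3": "e", "A4": "h", "A5": "j"}, "C2"),
    "t5": ({"A1": "b", "A2": "d", "A3": "f", "A4": "g", "A5": "i"}, "C2"),
}
P1 = {"A2": "c", "A3": "e"}   # pattern: A2 = c AND A3 = e


def mt(pattern, data):
    """Matching data: TIDs of the rows that satisfy every item of the pattern."""
    return [tid for tid, (row, _) in data.items()
            if all(row[attr] == value for attr, value in pattern.items())]


def supp(pattern, data):
    """Support: fraction of rows in the dataset that match the pattern."""
    return len(mt(pattern, data)) / len(data)


matched = mt(P1, DATA)                                  # rows matching P1
support = supp(P1, DATA)                                # 3 of 5 rows
per_class = Counter(DATA[tid][1] for tid in matched)    # matches per class
```

Here `matched` contains t2, t3 and t4, and `per_class` shows one match in C1 and two in C2.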
Quality measures
• CPXR(Log) needs to efficiently extract a desirable pattern set from a
huge search space of potential pattern sets.
• Definition: The average residual reduction (arr) of a pattern $P$ w.r.t. a model $f$ and a dataset $D$ is
$$arr(P) = \frac{\sum_{X \in mds(P)} |r_X(f)| - \sum_{X \in mds(P)} |r_X(f_P)|}{|mds(P)|}$$
• Definition: The total residual reduction (trr) of a pattern set $PS = \{P_1, \dots, P_k\}$ w.r.t. a model $f$ and a dataset $D$ is
$$trr(PS) = \frac{\sum_{X \in mds(PS)} |r_X(f)| - \sum_{X \in mds(PS)} |r_X(f_{PM})|}{\sum_{X \in D} |r_X(f)|}$$
where $PM = ((P_1, f_{P_1}, w_1), \dots, (P_k, f_{P_k}, w_k), f)$, $w_i = arr(P_i)$, and $mds(PS) = \bigcup_{P \in PS} mds(P)$.
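The two quality measures can be sketched as follows. This is a minimal illustration under assumptions: residuals are stored per instance index and taken in absolute value, the function signatures are hypothetical, and the residual values are invented for the example.

```python
from typing import Dict, Set


def arr(mds: Set[int],
        res_base: Dict[int, float],
        res_local: Dict[int, float]) -> float:
    """Average residual reduction of one pattern over its matching data mds(P)."""
    base = sum(abs(res_base[i]) for i in mds)     # residuals of baseline f
    local = sum(abs(res_local[i]) for i in mds)   # residuals of local model f_P
    return (base - local) / len(mds)


def trr(mds_union: Set[int],
        res_base: Dict[int, float],
        res_pm: Dict[int, float]) -> float:
    """Total residual reduction of a pattern set, relative to f's total residual on D."""
    base = sum(abs(res_base[i]) for i in mds_union)
    pm = sum(abs(res_pm[i]) for i in mds_union)   # residuals of the PXR model f_PM
    return (base - pm) / sum(abs(r) for r in res_base.values())


# Invented residuals for a dataset of four instances (illustration only).
res_base = {0: 0.9, 1: 0.8, 2: 0.1, 3: 0.2}      # baseline model f on D
res_local = {0: 0.3, 1: 0.2}                     # local model on mds(P) = {0, 1}
res_pm = {0: 0.3, 1: 0.2, 2: 0.1, 3: 0.2}        # PXR model on D

arr_p = arr({0, 1}, res_base, res_local)         # (1.7 - 0.5) / 2
trr_ps = trr({0, 1}, res_base, res_pm)           # (1.7 - 0.5) / 2.0
```

Note the different denominators: arr averages over the pattern's matching data, while trr normalizes by the baseline model's total residual over all of D.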
CPXR(Log) algorithm -- outline
• First step: split the training dataset $D$ into two classes, $LE$ and $SE$.
• $LE$: instances of $D$ where the baseline model $f$ makes a Large Error.
• $SE$: instances of $D$ where the baseline model $f$ makes a Small Error.
• Second step: extract all contrast patterns of $LE$ satisfying $minSup$.
• Third step: search for a small set of patterns that maximizes error reduction, and use that set to build a PXR model.
• Note
➢ Each pattern $P$ is associated with a local regression model $f_P$ built on $P$'s matching data.
➢ Using a pattern $P$ and its associated local regression model $f_P$ is a flexible way to represent one predictor-response relationship.
➢ Different $(P, f_P)$ pairs represent highly different predictor-response relationships.
CPXR(Log) – details (1)
• Inputs:
• Training data $D = \{(x_i, y_i) \mid 1 \le i \le n\}$
• Baseline model $f$
• $\rho$ to partition $D$ into $LE$ and $SE$
• $minSup$ threshold on contrast patterns
• Output:
• A PXR model
➢ Let $r_1, \dots, r_n$ denote $f$'s errors on $x_1, \dots, x_n$;
➢ Determine $\kappa$ to minimize $\left| \rho - \dfrac{\sum_{r_i > \kappa} r_i}{\sum_i r_i} \right|$;
➢ Let $LE = \{x_i \mid r_i > \kappa\}$, $SE = D - LE$;
➢ Discretize each numerical variable using entropy-based binning;
➢ Extract all contrast patterns of the $LE$ class satisfying $minSup$ ($CPS$);
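The choice of $\kappa$ and the LE/SE split can be sketched as below. It is a minimal illustration, assuming non-negative errors with a positive total and assuming that candidate values of $\kappa$ are restricted to the observed errors; the function name `split_le_se` and the error values are hypothetical.

```python
from typing import List, Tuple


def split_le_se(errors: List[float], rho: float) -> Tuple[float, List[int], List[int]]:
    """Choose kappa so that the share of total error carried by instances with
    error > kappa is as close as possible to rho, then split the indices."""
    total = sum(errors)                        # assumed > 0
    best_kappa, best_gap = errors[0], float("inf")
    for kappa in sorted(set(errors)):          # assumption: try observed errors only
        share = sum(r for r in errors if r > kappa) / total
        gap = abs(rho - share)
        if gap < best_gap:
            best_kappa, best_gap = kappa, gap
    le = [i for i, r in enumerate(errors) if r > best_kappa]    # large-error instances
    se = [i for i, r in enumerate(errors) if r <= best_kappa]   # small-error instances
    return best_kappa, le, se


# Invented baseline errors; the two largest carry 2.4 / 3.0 = 80% of the total.
errors = [0.1, 0.2, 0.3, 0.9, 1.5]
kappa, le, se = split_le_se(errors, rho=0.8)
```

With these values the split puts the two largest-error instances in LE and the remaining three in SE.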
CPXR(Log) – details (2)
➢ For each $P \in CPS$, build the local regression model $f_P$ for data in $mds(P)$;
➢ Let $PS = \{P_0\}$, where $P_0$ is the pattern $P$ in $CPS$ with highest $arr$;
➢
➢ Let $f_d$ be the regression model trained from $D - \bigcup_{P \in PS} mds(P)$;
➢ Return $PM(PS, f_d)$;
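Two of these steps are easy to illustrate in Python: starting the pattern set from the pattern with the highest arr, and finding the instances left for training the default model. This sketch covers only those two steps and does not attempt the pattern-set search itself; the helper names, pattern labels and values are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def initial_pattern_set(arr_by_pattern: Dict[str, float]) -> List[str]:
    """PS = {P_0}: start from the candidate pattern with the highest arr."""
    return [max(arr_by_pattern, key=arr_by_pattern.get)]


def default_training_ids(all_ids: Iterable[int],
                         mds_by_pattern: Dict[str, Set[int]],
                         pattern_set: Iterable[str]) -> List[int]:
    """Instances of D not matched by any pattern in PS; f_d is trained on these."""
    covered: Set[int] = set()
    for p in pattern_set:
        covered |= mds_by_pattern[p]
    return sorted(set(all_ids) - covered)


# Invented arr values and matching data for three candidate patterns.
arr_by_pattern = {"P_a": 0.12, "P_b": 0.19, "P_c": 0.15}
mds_by_pattern = {"P_a": {0, 1}, "P_b": {2, 3, 4}, "P_c": {4, 5}}

ps = initial_pattern_set(arr_by_pattern)                      # pattern with highest arr
rest = default_training_ids(range(8), mds_by_pattern, ps)     # D minus matched instances
```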
TBI data
• The TBI dataset is a collection of several international and US Tirilazad trials.
• 2159 instances. [Steyerberg, 2008]
• 15 numerical and categorical predictor variables.
• Missing values were treated using multiple imputation.
• The outcome variable is the Glasgow Outcome Scale: GOS 1 (dead), …, GOS 5 (good recovery).
• This study used two dichotomized versions of GOS: “Mortality” vs survival (GOS 1 vs GOS 2-5) and “Unfavorable” vs favorable (GOS 1-3 vs GOS 4-5).
Category                    Predictor variables
Basic                       Cause of injury, age, GCS motor score, pupil reactivity
Computed tomography (CT)    Hypoxia, hypotension, Marshall CT, tSAH, EDH, compressed cistern, midline shift more than 5 mm
Lab                         Glucose, pH, sodium, Hb
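The two binary outcomes derived from GOS can be written down directly. A minimal sketch, with hypothetical function names:

```python
def mortality(gos: int) -> int:
    """1 for "Mortality" (GOS 1), 0 for survival (GOS 2-5)."""
    if gos not in (1, 2, 3, 4, 5):
        raise ValueError("GOS must be between 1 and 5")
    return 1 if gos == 1 else 0


def unfavorable(gos: int) -> int:
    """1 for "Unfavorable" (GOS 1-3), 0 for favorable (GOS 4-5)."""
    if gos not in (1, 2, 3, 4, 5):
        raise ValueError("GOS must be between 1 and 5")
    return 1 if gos <= 3 else 0


mortality_labels = [mortality(g) for g in range(1, 6)]
unfavorable_labels = [unfavorable(g) for g in range(1, 6)]
```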
Results – Performance of SLogR and CPXR(Log) on Mortality models
                SLogR                                    CPXR(Log)
Model           Specificity  Sensitivity  F1    AUC      Specificity  Sensitivity  F1    AUC
Basic           0.95         0.18         0.27  0.77     0.96         0.18         0.28  0.8
Basic+CT        0.95         0.32         0.42  0.8      0.96         0.42         0.53  0.88
Basic+CT+Lab    0.94         0.36         0.46  0.8      0.97         0.46         0.58  0.92
CPXR(Log) is more accurate than standard logistic regression (SLogR).
Results – Performance of SLogR and CPXR(Log) on Unfavorable models
                SLogR                                    CPXR(Log)
Model           Specificity  Sensitivity  F1    AUC      Specificity  Sensitivity  F1    AUC
Basic           0.85         0.52         0.59  0.76     0.89         0.54         0.63  0.82
Basic+CT        0.85         0.6          0.66  0.8      0.87         0.65         0.7   0.87
Basic+CT+Lab    0.84         0.61         0.66  0.81     0.91         0.72         0.76  0.93
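For reference, the specificity, sensitivity and F1 columns follow the usual confusion-matrix definitions, sketched here in Python. The function name and the counts in the example are invented for illustration and are not taken from the TBI experiments.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics from confusion-matrix counts (positive = event of interest)."""
    sensitivity = tp / (tp + fn)      # true positive rate (recall)
    specificity = tn / (tn + fp)      # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


# Invented counts: 100 positives (40 detected), 400 negatives (380 correctly rejected).
metrics = classification_metrics(tp=40, fp=20, tn=380, fn=60)
```

A model can combine high specificity with low sensitivity when positives are the minority class, which is the pattern visible in the Mortality table.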
Results – Impact of adding more variables on AUC
AUC improvement when more variables are used by CPXR(Log) and SLogR
                          Mortality             Unfavorable
Variable set change       CPXR(Log)  SLogR      CPXR(Log)  SLogR
Basic → Basic+CT          10%        7.7%       6%         5.2%
Basic → Basic+CT+Lab      15%        11.1%      13.4%      6.6%

AUC improvement of CPXR(Log) over SLogR
Mortality                                Unfavorable
Basic   Basic+CT   Basic+CT+Lab          Basic   Basic+CT   Basic+CT+Lab
11.1%   12.8%      15%                   7.9%    8.8%       14.8%
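The percentages appear to be relative AUC changes. That reading is an inference from the numbers rather than something the slides state, and the sketch below checks it only against the AUC values reported for the Unfavorable models; the function name is hypothetical.

```python
def relative_improvement(new_auc: float, old_auc: float) -> float:
    """Relative change from old_auc to new_auc, in percent."""
    return (new_auc - old_auc) / old_auc * 100


# AUC values from the Unfavorable results table.
over_slogr_basic = round(relative_improvement(0.82, 0.76), 1)   # CPXR(Log) vs SLogR, Basic
over_slogr_full = round(relative_improvement(0.93, 0.81), 1)    # CPXR(Log) vs SLogR, Basic+CT+Lab
cpxr_more_vars = round(relative_improvement(0.93, 0.82), 1)     # CPXR(Log), Basic to Basic+CT+Lab
```

Under this assumption the three values agree with the 7.9%, 14.8% and 13.4% entries for the Unfavorable models.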
Results – ROC curves of Basic models
Results - ROC curves of (Basic + CT) models
Results - ROC curves of (Basic+CT+Lab) models
Results – Performance comparison
Comparing CPXR(Log) performance with
- Logistic Regression
- SVM
- Random Forest
Example: patterns used by CPXR(Log) & Mortality (Basic+CT+Lab)
arr   Cov   Pattern
15%   20%   (CT classification = III)
12%   15%   (CT classification = V) AND (midline shift) AND (0.56 < glucose <= 10.4)
10%   40%   (No compressed cistern) AND (No midline shift) AND (7.22 < PH <= 7.45)
18%   18%   (10.77 < glucose <= 21.98) AND (134 < sodium <= 144)
19%   20%   (No Hypotension) AND (134 < sodium < 144) AND (10.55 < HB <= 14.57) AND (with tSAH)
19%   20%   (No tSAH) AND (134 < sodium <= 144) AND (10.77 < glucose <= 21.98) AND (No Hypotension) AND (No midline shift) AND (One reactive pupil)
18%   40%   (No tSAH) AND (One reactive pupil)
Odds ratios
(CT classification = V) AND (midline shift) AND (0.56 < glucose <= 10.4)
Residual reduction and example patient
• Age = 15 years old
• Cause of injury = motorbike accident
• GCS motor score = 5 (No eye response)
• No reactive pupil
• No hypoxia
• No hypotension
• CT scan classification = V (mass lesion)
• No tSAH
• With EDH
• Has midline shift more than 5 mm
• Glucose = 9.06 mmol/l
• PH = 7.37
• Sodium = 141 mmol/l
• Hb = 14.4 g/dl
• Patient is dead.
Standard logistic regression predicted a 0.78 probability of survival for this patient.
The patient is matched with “pattern II”, and CPXR(Log) predicted a 0.38 probability of survival.
[Figure: Error distribution of TBI dataset on SLogR]
[Figure: Error distribution of TBI dataset on CPXR(Log)]
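The example patient can be checked against patterns from the pattern table with a small Python sketch. The field names (`ct_class`, `midline_shift`, `glucose`, `sodium`) and the list-of-predicates encoding are assumptions made for illustration; only the values and conditions come from the slides.

```python
# Example patient, restricted to the variables used by the two patterns below.
patient = {
    "ct_class": "V",          # CT scan classification
    "midline_shift": True,    # midline shift more than 5 mm
    "glucose": 9.06,          # mmol/l
    "sodium": 141.0,          # mmol/l
}

# (CT classification = V) AND (midline shift) AND (0.56 < glucose <= 10.4)
pattern_ct_v = [
    lambda p: p["ct_class"] == "V",
    lambda p: p["midline_shift"],
    lambda p: 0.56 < p["glucose"] <= 10.4,
]

# (10.77 < glucose <= 21.98) AND (134 < sodium <= 144)
pattern_glucose_sodium = [
    lambda p: 10.77 < p["glucose"] <= 21.98,
    lambda p: 134 < p["sodium"] <= 144,
]


def matches(pattern, instance) -> bool:
    """An instance matches a pattern when it satisfies every item."""
    return all(item(instance) for item in pattern)


matches_ct_v = matches(pattern_ct_v, patient)
matches_glucose_sodium = matches(pattern_glucose_sodium, patient)
```

The patient satisfies all three items of the first pattern but fails the glucose item of the second, so only the first pattern's local model would contribute to the prediction.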
Results – Box plot of RMSE reduction in CPXR
How much CPXR can reduce RMSE (Root Mean Square Error) on 50 datasets, compared to
• Piecewise linear regression
• Support vector regression
• Bayesian additive regression tree
• Gradient boosting method
Results – Noise sensitivity and impact of the number of patterns
The number of patterns is determined automatically by the method.
How much do noisy datasets affect the performance of CPXR and other methods?
Conclusion
• We presented an effective new method, CPXR(Log), for logistic regression and for clinical predictive modeling.
• We showed that CPXR is more accurate than standard logistic regression and some other classification algorithms.
• We also presented CPXR(Log) models, including patterns, local models, and new odds ratios of predictor variables.
References
• Guozhu Dong & Vahid Taslimitehrani. Pattern-Aided Regression
Modeling and Prediction Model Analysis. Tech Report, CSE, Wright State
Univ. 2014.
• E. Steyerberg: Clinical prediction models. Springer, 2009.
• P. Perel, P. Edwards, R. Wentz, and I. Roberts: Systematic review of
prognostic models in traumatic brain injury. BMC medical informatics
and decision making, 6(1): 1-10, 2006.
• G. Dong, J. Li: Efficient mining of emerging patterns: Discovering trends
and differences. In Proc. KDD, 43-52, 1999.
• E.W. Steyerberg, et al: Predicting outcome after traumatic brain injury:
development and international validation of prognostic scores based on
admission characteristics. PLoS medicine, 5(8): e165, 2008.
Preliminaries: Logistic Regression
• Regression modeling: predicting response variable (output) based on
predictor variables (input).
• Logistic regression: the response variable is binary. For example,
• “having the disease” or “not”
• “mortal” or “not”
• Let $X = (x_1, x_2, \dots, x_n)$ be a vector of predictor variables
• and $Y$ be the response variable.
• The goal of logistic regression is learning a function like $lp(X) = \beta_0 + \sum_{i=1}^{n} \beta_i \times x_i$ satisfying
$$\log \frac{P(Y = 1)}{1 - P(Y = 1)} = lp(X)$$
Chi-square ($\chi^2$) is one of the goodness-of-fit measures for logistic regression.
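The relation between the linear predictor and the predicted probability can be illustrated with a short Python sketch. The coefficient and predictor values are invented for the example, and the function names are hypothetical.

```python
import math
from typing import Sequence


def linear_predictor(x: Sequence[float], beta0: float, betas: Sequence[float]) -> float:
    """lp(X) = beta_0 + sum_i beta_i * x_i."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


def probability(x: Sequence[float], beta0: float, betas: Sequence[float]) -> float:
    """P(Y = 1 | X), obtained by inverting the log-odds relation."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(x, beta0, betas)))


def log_odds(p: float) -> float:
    """log(p / (1 - p)), the left-hand side of the logistic regression equation."""
    return math.log(p / (1.0 - p))


# Invented coefficients and predictor values (illustration only).
beta0, betas, x = -1.5, [0.8, -0.4], [2.0, 1.0]
lp = linear_predictor(x, beta0, betas)     # -1.5 + 1.6 - 0.4
p = probability(x, beta0, betas)
```

Applying `log_odds` to `p` recovers `lp`, which is exactly the defining equation of the model.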
Preliminaries: Contrast Patterns
• An item is a single-variable condition of the form “A = a” or “v1 ≤ A < v2”.
• A pattern is a finite set of items.
• An instance X from dataset D is said to match a pattern P if X satisfies every item in P.
• Example:
“60 ≤ Age < 80” AND “Diagnosed with high cholesterol = YES”
is a pattern with two items.
One instance (patient ID = 1) matches the above pattern.

Patient ID   Age   BMI   Sys Blood Pressure   Diagnosed with high Cholesterol   Diagnosed with Heart Failure (C)
1            75    22    120                  YES                               YES
2            67    27    131                  NO                                NO
Preliminaries: Contrast Patterns
• The matching data of pattern P in dataset D, $mt(P, D)$, is the set of all instances of D matching pattern P.
• The support of pattern P in D is $supp(P, D) = \dfrac{|mt(P, D)|}{|D|}$
• Given two classes $C_1$ and $C_2$, the support ratio of pattern P from $C_1$ to $C_2$ is
$$suppRatio_{C_1}^{C_2}(P) = \frac{supp(P, C_2)}{supp(P, C_1)}$$
• Given a threshold $\gamma$, a contrast pattern (emerging pattern) of class $C_2$ is a pattern P satisfying $suppRatio_{C_1}^{C_2}(P) \ge \gamma$. [Dong, 1999]
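The support ratio and the contrast pattern test can be sketched as follows, with supports computed within each class as in the definitions. The rows are invented for illustration, the helper names are hypothetical, and returning infinity when the pattern never occurs in $C_1$ is a convention chosen here, not one stated in the slides.

```python
import math


def supp(pattern, rows) -> float:
    """Fraction of rows that satisfy every item of the pattern."""
    return sum(1 for r in rows if all(item(r) for item in pattern)) / len(rows)


def supp_ratio(pattern, c1_rows, c2_rows) -> float:
    """Support ratio from C1 to C2: supp(P, C2) / supp(P, C1)."""
    s1, s2 = supp(pattern, c1_rows), supp(pattern, c2_rows)
    if s1 == 0:
        return math.inf if s2 > 0 else 0.0   # convention assumed for this sketch
    return s2 / s1


def is_contrast_pattern(pattern, c1_rows, c2_rows, gamma: float) -> bool:
    return supp_ratio(pattern, c1_rows, c2_rows) >= gamma


# Pattern: 60 <= Age < 80 AND high cholesterol = YES
pattern = [
    lambda r: 60 <= r["age"] < 80,
    lambda r: r["high_chol"] == "YES",
]

# Invented rows: C1 = no heart failure, C2 = heart failure.
c1 = [{"age": 65, "high_chol": "YES"}, {"age": 45, "high_chol": "NO"},
      {"age": 70, "high_chol": "NO"}, {"age": 82, "high_chol": "YES"}]
c2 = [{"age": 75, "high_chol": "YES"}, {"age": 62, "high_chol": "YES"},
      {"age": 68, "high_chol": "YES"}, {"age": 55, "high_chol": "YES"}]

ratio = supp_ratio(pattern, c1, c2)                      # 0.75 / 0.25
contrast = is_contrast_pattern(pattern, c1, c2, gamma=2)
```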
Ameya Patekar
 
5. & 9. Packing material and Labelling_AP-60,XP-60.pdf
5. & 9. Packing material and Labelling_AP-60,XP-60.pdf
maricruzduranpaterni
 
Advanced_English_Pronunciation_in_Use.pdf
Advanced_English_Pronunciation_in_Use.pdf
leogoemmanguyenthao
 
REGRESSION DIAGNOSTIC I: MULTICOLLINEARITY
REGRESSION DIAGNOSTIC I: MULTICOLLINEARITY
Ameya Patekar
 
Section Three - Project colemanite production China
Section Three - Project colemanite production China
VavaniaM
 

A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling Results Using the Method on Traumatic Brain Injury

  • 1. Ohio Center of Excellence in Knowledge-Enabled Computing A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling Results Using the Method on Traumatic Brain Injury Vahid Taslimitehrani, Guozhu Dong kno.e.sis center Department of Computer Science and Engineering Wright State University Dayton, OH 1
  • 2. Outline
      – Motivation and background
      – Preliminaries: contrast pattern mining, logistic regression
      – CPXR(Log)
      – TBI data
      – Results of CPXR(Log) on TBI
      – Conclusion
      – References
  • 3. Motivation and Background. CPXR(Log): accurate and informative prognostic models
      – Prognostic models are central to medicine [Steyerberg, 2009].
      – They facilitate physicians' decision making on patient treatment plans, screening, etc.
      – They help in understanding disease behavior, including identifying new biomarkers.
      – The number of articles listed in PubMed with "prediction model" in the title in 2012 was 7 times that in 2000 [pubmed].
  • 4. Motivation and Background. CPXR(Log): a powerful new generic logistic regression method
      – Logistic regression is one of the most popular approaches for building clinical prediction models [Steyerberg, 2009].
      – Logistic regression models are desirable because:
          – They are representable.
          – They are probability based.
          – They are flexible in terms of predictor variables (both categorical and numerical variables are allowed).
  • 5. Motivation and Background. Traumatic Brain Injury
      – One of the leading causes of death and disability worldwide.
      – About 1.5 million deaths worldwide each year [Perel, 2006].
      – $76.5 billion in direct and indirect costs in the US in 2010 [www.cdc.gov].
      – Early and accurate prognostic models based on admission-time data alone are needed so that physicians can make time-critical clinical decisions.
  • 6. Challenges in clinical modeling
      – Accuracy of clinical prediction models
      – Ease of interpreting clinical prediction models
          – To explain medical decisions to the patient
          – To identify important risk factors
      – Avoiding overfitting, to make clinical prediction models more generalizable
      – Early decision making
      – Ability to capture heterogeneous patient group behavior
  • 7. CPXR works well by using several (pattern, local model) pairs
      – Different subpopulations need different prediction models.
      – Using just one prediction function does not work well.
      – This is not an extreme case; it happens very often.
  • 8. How is CPXR(Log) different from other classifiers?
      – CPXR introduced the idea of [Dong, 2014]:
          – using patterns to logically characterize different subpopulations of the data,
          – using local regression models to represent the predictor-response relationship of each subpopulation, and
          – choosing a pattern only if its local model is very accurate.
      – CPXR(Log):
          – can capture diversified/heterogeneous behavior,
          – is more generalizable,
          – overfits less than other classifiers.
      – CPXR(Log) is more accurate than other classifiers such as SVM and Random Forest.
  • 9. Traditional classification vs CPXR
      [Diagram. Traditional classification: training data → classification engine → classifier (model). CPXR: training data → classification engine → baseline model → large error data and small error data → build and select CPs & local models → (Pattern 1, Model 1), (Pattern 2, Model 2), ..., (Pattern k, Model k).]
  • 10. CPXR(Log): PXR concept
      Definition: Let $D = \{(X_i, Y_i) \mid 1 \le i \le n\}$ be training data for regression. Let $f$ be a regression model built on $D$, which we will call the baseline model on $D$. A pattern aided regression (PXR) model is a tuple $PM = ((P_1, f_1, w_1), \ldots, (P_k, f_k, w_k), f_d)$, where $\{P_1, \ldots, P_k\}$ is the pattern set of $PM$, the $f_i$ are the local regression models of the $P_i$, and $f_d$ is the default regression model. We define the regression model of $PM$ as
      $$f_{PM}(x) = \begin{cases} \dfrac{\sum_{P_i \in \pi_x} w_i f_i(x)}{\sum_{P_i \in \pi_x} w_i} & \text{if } \pi_x \neq \emptyset \\ f_d(x) & \text{otherwise} \end{cases}$$
      for each instance $x$, where $\pi_x = \{P_i \mid 1 \le i \le k,\ x \text{ satisfies } P_i\}$.
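The PXR prediction rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: patterns and models are represented as plain callables, and the names `pxr_predict`, `triples`, and `f_default` as well as the toy patterns and constant local models are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Instance = Dict[str, float]            # hypothetical representation: feature name -> value
Pattern = Callable[[Instance], bool]   # returns True when the instance satisfies the pattern
Model = Callable[[Instance], float]    # returns a prediction for the instance


def pxr_predict(x: Instance,
                triples: List[Tuple[Pattern, Model, float]],
                f_default: Model) -> float:
    """Weighted average of the local models whose patterns x satisfies.

    Falls back to the default model when x satisfies no pattern.
    """
    matched = [(f, w) for (p, f, w) in triples if p(x)]
    if not matched:
        return f_default(x)
    total_weight = sum(w for _, w in matched)
    return sum(w * f(x) for f, w in matched) / total_weight


# Toy (pattern, local model, weight) triples; all values are made up for illustration.
triples = [
    (lambda x: x["age"] > 60, lambda x: 0.8, 2.0),
    (lambda x: x["bmi"] > 30, lambda x: 0.5, 1.0),
]
f_default = lambda x: 0.1

both = pxr_predict({"age": 70, "bmi": 35}, triples, f_default)   # matches both patterns
one = pxr_predict({"age": 70, "bmi": 20}, triples, f_default)    # matches the first only
none = pxr_predict({"age": 30, "bmi": 20}, triples, f_default)   # matches no pattern
```

When an instance matches both toy patterns, the prediction is the weighted mean (2.0 × 0.8 + 1.0 × 0.5) / 3.0; when it matches none, the default model decides.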
  • 11. Preliminaries: Contrast Patterns. A toy example
      TID   A1   A2   A3   A4   A5   Class
      t1    b    d    e    g    i    C1
      t2    b    c    e    g    i    C1
      t3    a    c    e    g    j    C2
      t4    a    c    e    h    j    C2
      t5    b    d    f    g    i    C2
      – $P_1 = \{A_2 = c,\ A_3 = e\}$, $mt(P_1, D) = \{t_2, t_3, t_4\}$, $supp(P_1, D) = 3/5 = 60\%$
      – $suppRatio_{C_1 \to C_2}(P_1) = 2/1 = 2$ ($P_1$ matches 2 instances of $C_2$ and 1 instance of $C_1$).
      – Given a threshold such as 2, $P_1$ is a contrast pattern.
      – Details: we only consider one minimal generator pattern for each "equivalence class" of contrast patterns.
  • 12. Quality measures
      – CPXR(Log) needs to efficiently extract a desirable pattern set from a huge search space of potential pattern sets.
      – Definition: The average residual reduction (arr) of a pattern $P$ w.r.t. a model $f$ and a dataset $D$ is
        $$arr(P) = \frac{\sum_{X \in mds(P)} r_X(f) - \sum_{X \in mds(P)} r_X(f_P)}{|mds(P)|}$$
      – Definition: The total residual reduction (trr) of a pattern set $PS = \{P_1, \ldots, P_k\}$ w.r.t. a model $f$ and a dataset $D$ is
        $$trr(PS) = \frac{\sum_{X \in mds(PS)} r_X(f) - \sum_{X \in mds(PS)} r_X(f_{PM})}{\sum_{X \in D} r_X(f)}$$
        where $PM = ((P_1, f_{P_1}, w_1), \ldots, (P_k, f_{P_k}, w_k), f)$, $w_i = arr(P_i)$, and $mds(PS) = \bigcup_{P \in PS} mds(P)$.
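As a rough illustration of the two quality measures, the sketch below computes arr and trr from precomputed residuals. It assumes residuals are already available as non-negative numbers per instance index; the function names, the index-based representation of $mds$, and the numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def arr(mds: Sequence[int], r_base: List[float], r_local: Dict[int, float]) -> float:
    """Average residual reduction of a pattern.

    mds     -- indices of the instances matching the pattern
    r_base  -- r_base[i] is the residual of the baseline model f on instance i
    r_local -- r_local[i] is the residual of the local model f_P on instance i
    """
    reduction = sum(r_base[i] for i in mds) - sum(r_local[i] for i in mds)
    return reduction / len(mds)


def trr(mds_ps: Sequence[int], r_base: List[float], r_pm: Dict[int, float]) -> float:
    """Total residual reduction of a pattern set.

    mds_ps -- indices matched by at least one pattern of the set (union of the mds sets)
    r_pm   -- r_pm[i] is the residual of the PXR model f_PM on instance i
    """
    reduction = sum(r_base[i] for i in mds_ps) - sum(r_pm[i] for i in mds_ps)
    return reduction / sum(r_base)


# Made-up residuals for five instances.
r_base = [0.9, 0.8, 0.1, 0.2, 0.5]
arr_value = arr([0, 1], r_base, {0: 0.3, 1: 0.2})
trr_value = trr([0, 1, 4], r_base, {0: 0.3, 1: 0.2, 4: 0.4})
```

Note that arr is normalized by the number of matching instances, while trr is normalized by the baseline's total residual over the whole dataset, so trr reads as the fraction of the baseline error that the pattern set removes.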
  • 13. CPXR(Log) algorithm: outline
      – First step: split the training dataset $D$ into two classes, $LE$ and $SE$.
          – $LE$: instances of $D$ where the baseline model $f$ makes a Large Error.
          – $SE$: instances of $D$ where the baseline model $f$ makes a Small Error.
      – Second step: extract all contrast patterns of $LE$ satisfying $minSup$.
      – Third step: search for a small set of patterns that maximizes error reduction, and use that set to build a $PXR$ model.
      – Notes:
          – Each pattern $P$ is associated with a local regression model $f_P$ built on $P$'s matching data.
          – Using a pattern $P$ and its associated local regression model $f_P$ is a flexible way to represent one predictor-response relationship.
          – Different $(P, f_P)$ pairs represent highly different predictor-response relationships.
  • 14. CPXR(Log): details (1)
      – Inputs:
          – Training data $D = \{(x_i, y_i) \mid 1 \le i \le n\}$
          – Baseline model $f$
          – $\rho$, used to partition $D$ into $LE$ and $SE$
          – $minSup$ threshold on contrast patterns
      – Output:
          – A $PXR$ model
      – Steps:
          – Let $r_1, \ldots, r_n$ denote $f$'s errors on $x_1, \ldots, x_n$;
          – Determine $\kappa$ to minimize $\left| \rho - \dfrac{\sum_{r_i > \kappa} r_i}{\sum_i r_i} \right|$;
          – Let $LE = \{x_i \mid r_i > \kappa\}$, $SE = D - LE$;
          – Discretize each numerical variable using entropy based binning;
          – Extract all contrast patterns for $minSup$ in the $LE$ class ($CPS$);
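The choice of $\kappa$ and the resulting LE/SE split can be sketched as follows. This is only an illustration under assumptions: candidate values of $\kappa$ are restricted to the observed errors, the function name `split_le_se` is hypothetical, and the errors and the value of $\rho$ are made up.

```python
from typing import List, Tuple


def split_le_se(errors: List[float], rho: float) -> Tuple[float, List[int], List[int]]:
    """Pick kappa so that the share of total error carried by instances with
    error > kappa is as close as possible to rho, then split the indices.

    Assumption: kappa is searched only among the observed error values.
    """
    total = sum(errors)

    def gap(kappa: float) -> float:
        large_share = sum(r for r in errors if r > kappa) / total
        return abs(rho - large_share)

    kappa = min(sorted(set(errors)), key=gap)
    le = [i for i, r in enumerate(errors) if r > kappa]    # large-error instances
    se = [i for i, r in enumerate(errors) if r <= kappa]   # small-error instances
    return kappa, le, se


# Made-up baseline errors for six instances.
kappa, le, se = split_le_se([1.0, 2.0, 3.0, 4.0, 10.0, 20.0], rho=0.45)
```

In this toy run the single instance with error 20 carries half of the total error, which is the closest achievable share to $\rho = 0.45$, so it alone forms $LE$.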
  • 15. CPXR(Log): details (2)
      – For each $P \in CPS$, build the local regression model $f_P$ for the data in $mds(P)$;
      – Let $PS = \{P_0\}$, where $P_0$ is the pattern $P$ in $CPS$ with the highest $arr$;
      –
      – Let $f_d$ be the regression model trained from $D - \bigcup_{P \in PS} mds(P)$;
      – Return $PM(PS, f_d)$;
  • 16. TBI data
      – The TBI dataset is a collection of several international and US Tirilazad trials.
      – 2159 instances [Steyerberg, 2008].
      – 15 numerical and categorical predictor variables.
      – Missing values were handled using multiple imputation.
      – The outcome variable is the Glasgow Outcome Scale: GOS 1 (dead), ..., GOS 5 (good recovery).
      – This study used two discretized versions of GOS: "Mortality" vs survival (GOS 1 vs GOS 2-5) and "Unfavorable" vs favorable (GOS 1-3 vs GOS 4-5).
      Category                    Predictor variables
      Basic                       Cause of injury, age, GCS motor score, pupil reactivity
      Computed tomography (CT)    Hypoxia, hypotension, Marshall CT, tSAH, eDH, compressed cistern, midline shift more than 5 mm
      Lab                         Glucose, pH, sodium, Hb
  • 17. Results: performance of SLogR and CPXR(Log) on Mortality models
                      SLogR                                     CPXR(Log)
      Model           Specificity  Sensitivity  F1    AUC       Specificity  Sensitivity  F1    AUC
      Basic           0.95         0.18         0.27  0.77      0.96         0.18         0.28  0.8
      Basic+CT        0.95         0.32         0.42  0.8       0.96         0.42         0.53  0.88
      Basic+CT+Lab    0.94         0.36         0.46  0.8       0.97         0.46         0.58  0.92
      CPXR(Log) is more accurate than standard logistic regression (SLogR).
  • 18. Results: performance of SLogR and CPXR(Log) on Unfavorable models
                      SLogR                                     CPXR(Log)
      Model           Specificity  Sensitivity  F1    AUC       Specificity  Sensitivity  F1    AUC
      Basic           0.85         0.52         0.59  0.76      0.89         0.54         0.63  0.82
      Basic+CT        0.85         0.6          0.66  0.8       0.87         0.65         0.7   0.87
      Basic+CT+Lab    0.84         0.61         0.66  0.81      0.91         0.72         0.76  0.93
  • 19. Results: impact of adding more variables on AUC
      AUC improvement when more variables are used by CPXR(Log) and SLogR:
                                    Mortality               Unfavorable
      Variable set change           CPXR(Log)   SLogR       CPXR(Log)   SLogR
      Basic → Basic+CT              10%         7.7%        6%          5.2%
      Basic → Basic+CT+Lab          15%         11.1%       13.4%       6.6%
      AUC improvement of CPXR(Log) over SLogR:
                      Basic     Basic+CT   Basic+CT+Lab
      Mortality       11.1%     12.8%      15%
      Unfavorable     7.9%      8.8%       14.8%
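The percentages on this slide appear to be relative AUC changes, i.e. (new − old) / old. That reading is an assumption; it reproduces the Unfavorable figures from the AUC values reported on slide 18 up to rounding, as the sketch below shows. The Mortality columns are not recomputed here, and the function name `relative_improvement` is hypothetical.

```python
def relative_improvement(new_auc: float, old_auc: float) -> float:
    """Relative AUC change in percent: 100 * (new - old) / old."""
    return 100.0 * (new_auc - old_auc) / old_auc


# AUC values of the Unfavorable models, copied from slide 18.
auc = {
    "SLogR": {"Basic": 0.76, "Basic+CT": 0.80, "Basic+CT+Lab": 0.81},
    "CPXR(Log)": {"Basic": 0.82, "Basic+CT": 0.87, "Basic+CT+Lab": 0.93},
}

# CPXR(Log) over SLogR, per variable set.
over_slogr = {
    name: relative_improvement(auc["CPXR(Log)"][name], auc["SLogR"][name])
    for name in auc["SLogR"]
}

# Gain from adding variables, per method.
gain_ct = {m: relative_improvement(v["Basic+CT"], v["Basic"]) for m, v in auc.items()}
gain_ct_lab = {m: relative_improvement(v["Basic+CT+Lab"], v["Basic"]) for m, v in auc.items()}
```

For example, the Basic Unfavorable model gives 100 × (0.82 − 0.76) / 0.76, which is about 7.9, matching the slide.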
  • 20. Results: ROC curves of Basic models
  • 21. Results: ROC curves of (Basic+CT) models
  • 22. Results: ROC curves of (Basic+CT+Lab) models
  • 23. Results: performance comparison of CPXR(Log) with
      – Logistic Regression
      – SVM
      – Random Forest
  • 24. Example: patterns used by CPXR(Log) for Mortality (Basic+CT+Lab)
      Each pattern is listed with its arr and Cov values.
      – (CT classification = III): arr 15%, Cov 20%
      – (CT classification = V) AND (midline shift) AND (0.56 < glucose <= 10.4): arr 12%, Cov 15%
      – (No compressed cistern) AND (No midline shift) AND (7.22 < PH <= 7.45): arr 10%, Cov 40%
      – (10.77 < glucose <= 21.98) AND (134 < sodium <= 144): arr 18%, Cov 18%
      – (No Hypotension) AND (134 < sodium < 144) AND (10.55 < HB <= 14.57) AND (with tSAH): arr 19%, Cov 20%
      – (No tSAH) AND (134 < sodium <= 144) AND (10.77 < glucose <= 21.98) AND (No Hypotension) AND (No midline shift) AND (One reactive pupil): arr 19%, Cov 20%
      – (No tSAH) AND (One reactive pupil): arr 18%, Cov 40%
  • 25. Odds ratios for the pattern (CT classification = V) AND (midline shift) AND (0.56 < glucose <= 10.4)
  • 26. Residual reduction and example patient
      – Age = 15 years old
      – Cause of injury = motorbike accident
      – GCS motor score = 5 (No eye response)
      – No reactive pupil
      – No hypoxia
      – No hypotension
      – CT scan classification = V (mass lesion)
      – No tSAH
      – With eDH
      – Has midline shift of more than 5 mm
      – Glucose = 9.06 mmol/l
      – pH = 7.37
      – Sodium = 141 mmol/l
      – Hb = 14.4 g/dl
      – Outcome: the patient died.
      Standard logistic regression predicted a 0.78 probability of survival for this patient. The patient matches "pattern II", and CPXR(Log) predicted a 0.38 probability of survival.
      [Figures: error distribution of the TBI dataset on SLogR; error distribution of the TBI dataset on CPXR(Log).]
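To make the pattern match concrete, the sketch below checks the example patient against two of the patterns listed on slide 24. It assumes that "pattern II" refers to the second pattern in that list, which the slides do not state explicitly; the field names and the helper `matches` are hypothetical.

```python
from typing import Callable, Dict, List

Patient = Dict[str, object]
Item = Callable[[Patient], bool]   # one single-variable condition

# The example patient, restricted to the fields used by the patterns below.
patient: Patient = {
    "ct_class": "V",
    "midline_shift": True,
    "glucose": 9.06,   # mmol/l
    "sodium": 141.0,   # mmol/l
}

# (CT classification = III)
pattern_1: List[Item] = [lambda p: p["ct_class"] == "III"]

# (CT classification = V) AND (midline shift) AND (0.56 < glucose <= 10.4)
pattern_2: List[Item] = [
    lambda p: p["ct_class"] == "V",
    lambda p: p["midline_shift"] is True,
    lambda p: 0.56 < p["glucose"] <= 10.4,
]


def matches(p: Patient, pattern: List[Item]) -> bool:
    """An instance matches a pattern when it satisfies every item in it."""
    return all(item(p) for item in pattern)


matches_1 = matches(patient, pattern_1)
matches_2 = matches(patient, pattern_2)
```

Under these assumptions the patient satisfies all three items of the second pattern (CT classification V, midline shift, glucose 9.06 within the interval) and fails the first pattern.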
  • 27. Results: box plot of RMSE reduction by CPXR
      How much CPXR reduces RMSE (root mean square error) on 50 datasets compared to:
      – Piecewise linear regression
      – Support vector regression
      – Bayesian additive regression trees
      – Gradient boosting
  • 28. Results: noise sensitivity and impact of the number of patterns
      – The number of patterns is determined automatically by the method.
      – How much does noise in the data affect the performance of CPXR and other methods?
  • 29. Conclusion
      – We presented an effective new method, CPXR(Log), for logistic regression and for clinical predictive modeling.
      – We showed that CPXR(Log) is more accurate than standard logistic regression and some other classification algorithms.
      – We also presented CPXR(Log) models, including patterns, local models, and new odds ratios of predictor variables.
  • 30. References
      – G. Dong, V. Taslimitehrani: Pattern-Aided Regression Modeling and Prediction Model Analysis. Tech Report, CSE, Wright State Univ., 2014.
      – E. Steyerberg: Clinical prediction models. Springer, 2009.
      – P. Perel, P. Edwards, R. Wentz, I. Roberts: Systematic review of prognostic models in traumatic brain injury. BMC medical informatics and decision making, 6(1): 1-10, 2006.
      – G. Dong, J. Li: Efficient mining of emerging patterns: Discovering trends and differences. In Proc. KDD, 43-52, 1999.
      – E.W. Steyerberg, et al.: Predicting outcome after traumatic brain injury: development and international validation of prognostic scores based on admission characteristics. PLoS medicine, 5(8): e165, 2008.
  • 31. Preliminaries: Logistic Regression
      – Regression modeling: predicting a response variable (output) based on predictor variables (input).
      – Logistic regression: the response variable is binary. For example:
          – "having the disease" or "not"
          – "mortal" or "not"
      – Let $X = (x_1, x_2, \ldots, x_n)$ be a vector of predictor variables and $Y$ be the response variable.
      – The goal of logistic regression is to learn a function $lp(X) = \beta_0 + \sum_{i=1}^{n} \beta_i x_i$ satisfying
        $$\log \frac{P(Y = 1)}{1 - P(Y = 1)} = lp(X)$$
      – Chi-square ($\chi^2$) is one of the goodness-of-fit measures for logistic regression.
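The relationship between the linear predictor and the predicted probability can be illustrated with a short sketch. The coefficients and inputs below are made up, and the function names `lp` and `prob` are chosen for this example only; inverting the log-odds equation gives $P(Y = 1) = 1 / (1 + e^{-lp(X)})$.

```python
import math
from typing import Sequence


def lp(beta0: float, betas: Sequence[float], x: Sequence[float]) -> float:
    """Linear predictor: beta0 + sum_i beta_i * x_i."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


def prob(beta0: float, betas: Sequence[float], x: Sequence[float]) -> float:
    """P(Y = 1), obtained by inverting log(p / (1 - p)) = lp(X)."""
    return 1.0 / (1.0 + math.exp(-lp(beta0, betas, x)))


# Made-up coefficients for two predictors.
beta0, betas = -1.0, [0.5, 2.0]

p_even = prob(beta0, betas, [2.0, 0.0])   # lp = 0, so the odds are 1:1
p_high = prob(beta0, betas, [2.0, 1.0])   # lp = 2
log_odds = math.log(p_high / (1.0 - p_high))
```

Taking the log-odds of the predicted probability recovers the linear predictor, which is the defining property stated above.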
  • 32. Preliminaries: Contrast Patterns
      – An item is a single-variable condition of the form "$A = a$" or "$v_1 \le A < v_2$".
      – A pattern is a finite set of items.
      – An instance X from dataset D is said to match a pattern P if X satisfies every item in P.
      – Example: "$60 \le$ Age $< 80$" AND "Diagnosed with high cholesterol = YES" is a pattern with two items. One instance (patient ID = 1) matches this pattern.
      Patient ID   Age   BMI   Sys Blood Pressure   Diagnosed with high Cholesterol   Diagnosed with Heart Failure
      1            75    22    120                  YES                               YES
      2            67    27    131                  NO                                NO
  • 33. Preliminaries: Contrast Patterns
      – The matching data of pattern P in dataset D, written $mt(P, D)$, is the set of all instances in D matching pattern P.
      – The support of pattern P in D is $supp(P, D) = \dfrac{|mt(P, D)|}{|D|}$.
      – Given two classes $C_1$ and $C_2$, the support ratio of pattern P from $C_1$ to $C_2$ is $suppRatio_{C_1 \to C_2}(P) = \dfrac{supp(P, C_2)}{supp(P, C_1)}$.
      – Given a threshold $\gamma$, a contrast pattern (emerging pattern) of class $C_2$ is a pattern P satisfying $suppRatio_{C_1 \to C_2}(P) \ge \gamma$ [Dong, 1999].
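The definitions above translate directly into a small sketch. It assumes patterns made of equality items only, represented as attribute-to-value dictionaries; the data is made up, the function names are hypothetical, and returning infinity when the pattern never occurs in $C_1$ is a convention chosen here rather than something the slides specify.

```python
import math
from typing import Dict, List

Instance = Dict[str, str]
Pattern = Dict[str, str]   # equality items only: attribute -> required value


def matches(x: Instance, pattern: Pattern) -> bool:
    """x matches the pattern when it satisfies every item."""
    return all(x.get(attr) == value for attr, value in pattern.items())


def supp(pattern: Pattern, data: List[Instance]) -> float:
    """Fraction of instances in data that match the pattern."""
    return sum(matches(x, pattern) for x in data) / len(data)


def supp_ratio(pattern: Pattern, c1: List[Instance], c2: List[Instance]) -> float:
    """Support ratio from class C1 to class C2: supp(P, C2) / supp(P, C1)."""
    s1, s2 = supp(pattern, c1), supp(pattern, c2)
    if s1 == 0:
        # Assumed convention: infinite ratio if the pattern occurs only in C2.
        return math.inf if s2 > 0 else 0.0
    return s2 / s1


def is_contrast_pattern(pattern: Pattern, c1: List[Instance],
                        c2: List[Instance], gamma: float) -> bool:
    return supp_ratio(pattern, c1, c2) >= gamma


# Made-up data: the pattern matches 1 of 4 instances in C1 and 3 of 4 in C2.
pattern = {"A": "x"}
c1 = [{"A": "x"}, {"A": "y"}, {"A": "y"}, {"A": "y"}]
c2 = [{"A": "x"}, {"A": "x"}, {"A": "x"}, {"A": "y"}]

ratio = supp_ratio(pattern, c1, c2)
is_cp = is_contrast_pattern(pattern, c1, c2, gamma=2.0)
```

Here the supports are 0.25 in $C_1$ and 0.75 in $C_2$, so the ratio is 3 and the pattern qualifies as a contrast pattern of $C_2$ for $\gamma = 2$.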

Editor's Notes

  • #6: epilepsyu.com
  • #33: drbonnie360.com
  • #34: http://www.barco.com/en/News/Post/2013/7/1/Contrast-ratio-hardly-a-black-and-white-metric