Automated Parameter Optimization
for Defect Prediction Models
Chakkrit (Kla) Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, Kenichi Matsumoto
http://chakkrit.com kla@chakkrit.com @klainfo
Defect models are used to predict software
modules that are likely to be defective in the future
Figure: timeline spanning the pre-release period, the release date, and the post-release period. Defect prediction models label each module (Module A, Module B, Module C, Module D) as clean or defect-prone.
Defect models are trained using classification techniques
Decision tree algorithms
Regression algorithms
Clustering algorithms
Ensemble algorithms
Such classification techniques often
require parameter settings
Ensemble algorithms, for example: the number of trees in a random forest classifier
26 of the 30 most commonly used classification techniques require at least one parameter setting
Defect models may underperform if they are
trained using suboptimal parameter settings
The default settings of random forest, naïve Bayes, and support vector machines are suboptimal [Jiang et al., DEFECTS’08] [Tosun et al., ESEM’09] [Hall et al., TSE’12]
Different toolkits have different default
settings for the same classification technique
Figure: default setting of the number of trees in a random forest, shown on a scale of 10, 50, 100, and 500; the labelled toolkits include the randomForest package and the bigrf package.
The parameter space is too large
for manual inspection
There are at least 17,000 possible settings to explore when training k-NN classifiers [Kocaguneli et al., TSE’12]
How do automated parameter
optimization techniques fare when
applied to defect prediction?
Performance improvement
Performance stability
Caret: an off-the-shelf automated parameter optimization technique
(Step-1) Generate candidate settings, producing the settings
(Step-2) Evaluate candidate settings, producing the performance for each setting
(Step-3) Identify optimal setting, producing the optimal setting
(Step-1) Generate a set of candidate settings to evaluate
Example, #Trees for random forest: 10, 20, 30, 40, 50
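Step 1 can be sketched in a few lines of Python. This is only an illustration of expanding a parameter grid into candidate settings, not Caret's own implementation; the function name `generate_candidates` and the parameter name `n_trees` are hypothetical.

```python
from itertools import product

def generate_candidates(grid):
    """Expand {parameter: [values]} into a list of candidate settings."""
    names = list(grid)
    combos = product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]

# The #Trees example from the slide (parameter name is hypothetical)
candidates = generate_candidates({"n_trees": [10, 20, 30, 40, 50]})
for setting in candidates:
    print(setting)
```

With more than one parameter in the grid, the same function returns every combination of values, which is why the number of candidates grows quickly.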
(Step-2) Evaluate the performance of each candidate setting using bootstrap validation
Out-of-sample bootstrap validation with 100 repetitions: generate bootstrap samples from the defect dataset to form the training corpus, construct a defect model, and calculate its performance (Perf.) on the testing corpus.
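A minimal sketch of out-of-sample bootstrap validation, assuming the usual reading: each repetition trains on a sample drawn with replacement and tests on the rows that were not drawn. The helper names (`auc`, `out_of_sample_bootstrap`, `fit_toy`) and the toy data are hypothetical and stand in for a real classifier and defect dataset.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def out_of_sample_bootstrap(rows, fit, repetitions=100, seed=0):
    """rows: list of (features, label); fit(train_rows) returns a scoring function."""
    rng = random.Random(seed)
    n = len(rows)
    results = []
    for _ in range(repetitions):
        train_idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(train_idx)
        test = [rows[i] for i in range(n) if i not in chosen]  # rows never drawn
        model = fit([rows[i] for i in train_idx])
        value = auc([y for _, y in test], [model(x) for x, _ in test])
        if value is not None:
            results.append(value)
    return results

def fit_toy(train_rows):
    # Hypothetical stand-in for a classifier: score a module by its first metric
    return lambda features: features[0]

# Toy dataset: one metric per module, defective when the metric exceeds 50
toy = [((size,), int(size > 50)) for size in range(100)]
aucs = out_of_sample_bootstrap(toy, fit_toy)
print(len(aucs), min(aucs), max(aucs))
```

Each candidate setting would be passed through this loop, yielding one distribution of performance values per setting.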
(Step-3) The optimal setting is the one that achieved the top performance score
Example: #Trees = 10 (AUC=0.65), 20 (AUC=0.68), 30 (AUC=0.70), 40 (AUC=0.80), 50 (AUC=0.86); the optimal setting is #Trees = 50.
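Step 3 amounts to taking the setting with the highest score. The sketch below uses the AUC values shown on the slide; the function name `identify_optimal` is hypothetical.

```python
# AUC per candidate #Trees setting, as shown on the slide
candidate_auc = {10: 0.65, 20: 0.68, 30: 0.70, 40: 0.80, 50: 0.86}

def identify_optimal(performance_by_setting):
    """Return the setting whose performance score is highest."""
    return max(performance_by_setting, key=performance_by_setting.get)

best_trees = identify_optimal(candidate_auc)
print(best_trees)  # 50
```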
We study a collection of 18 datasets from 5 open corpora
A threat of bias exists if researchers fixate on studying the same datasets with the same metrics [Tantithamthavorn et al., TSE’16]
1-7K modules, 21-28% defective rate, 21-38 metrics [Shepperd et al., TSE’13]
1-10K modules, 11-44% defective rate, 15-32 metrics [Zimmermann et al., PROMISE’07] [D’Ambros et al., MSR’10] [Kim et al., ICSE’11]
600-800 modules, 36-48% defective rate, 20 metrics [Jureczko et al., PROMISE’10]
Compute the performance
improvement
For each defect dataset:
Construct and evaluate Caret-optimized models using the optimized setting, giving the Caret-optimized performance (AUC, 100x) of each technique
Construct and evaluate default models using the default setting, giving the default performance (AUC, 100x) of each technique
Average each set of AUC values and compare the two averages to obtain the performance improvement
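The comparison of the two averages can be sketched as follows. It assumes the improvement is the difference between the average optimized AUC and the average default AUC, which is consistent with the deck reporting improvements in AUC percentage points; the function name and the sample values are illustrative only.

```python
from statistics import mean

def performance_improvement(optimized_aucs, default_aucs):
    """Difference between the average optimized and average default AUC
    (assumed definition; the slide only shows the two averages being compared)."""
    return mean(optimized_aucs) - mean(default_aucs)

# Illustrative values only; a real run would hold 100 AUC values per list
optimized = [0.80, 0.82, 0.84]
default = [0.60, 0.62, 0.64]
improvement = performance_improvement(optimized, default)
print(round(improvement, 2))
```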
Parameter settings can substantially influence the performance of defect prediction models
Figure: boxplots of the AUC performance improvement (y-axis, 0.0 to 0.4) per classification technique, in panels labelled Large, Medium, Small, and Negligible; the x-axis begins with C5.0 and AdaBoost. Each boxplot presents the performance improvement for all the 18 studied datasets.
9 of the 26 studied classification techniques have a large performance improvement
C5.0 and AdaBoost have median improvements of 0.27 and 0.14 AUC, respectively
The improvements for C5.0 and AdaBoost span up to 0.40 AUC
Caret improves the AUC performance by up to 40 percentage points
Performance stability
Default settings may introduce
instability into defect prediction models
Unstable performance estimates may introduce bias into the conclusions of research [Jorgensen et al., TSE’07] [Menzies and Shepperd, EMSE’12]
Estimating the stability of defect prediction models
For each defect dataset, construct and evaluate Caret-optimized models (optimized setting) and default models (default setting), recording the AUC of each technique 100x
Compute the standard deviation (S.D.) of the Caret-optimized performance and of the default performance
Stability ratio = S.D. of optimized / S.D. of default
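The stability ratio is a direct quotient of two standard deviations. The sketch below uses the sample standard deviation, which is an assumption; when both lists have the same length, the population standard deviation gives the same ratio. The sample values are illustrative only.

```python
from statistics import stdev

def stability_ratio(optimized_aucs, default_aucs):
    """S.D. of the optimized AUC values divided by S.D. of the default AUC values."""
    return stdev(optimized_aucs) / stdev(default_aucs)

# Illustrative values only; a real run would hold 100 AUC values per list
optimized = [0.80, 0.81, 0.82]
default = [0.60, 0.64, 0.68]
ratio = stability_ratio(optimized, default)
print(round(ratio, 2))  # a ratio below 1 means the optimized model varies less
```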
Caret-optimized classifiers tend to be more stable than default classifiers
Figure: boxplots of the stability ratio (y-axis, 0.0 to 2.0) per classification technique, in panels labelled Large, Medium, and Small; the x-axis begins with C5.0 and AdaBoost. Each boxplot presents the stability ratio for all the 18 studied datasets.
Cliff’s delta = Large
A stability ratio lower than one indicates that Caret-optimized classifiers tend to be more stable than default classifiers
Stability ratio is lower than 1 for 35% of the studied classification techniques
Caret-optimized classifiers are at least as stable as default classifiers
Figure: stability ratio boxplots for the techniques in the panels labelled Medium, Small, and Negligible.
Stability ratio is about 1 for 65% of the studied classification techniques
Performance improvement: Caret improves the AUC performance by up to 40 percentage points.
Performance stability: Caret-optimized classifiers tend to be more stable than classifiers trained with default settings.
Suboptimal parameter settings may have an impact on prior defect prediction results!
Prior findings on top-performing classification techniques
17 of 22 classification techniques are indistinguishable [Lessmann et al., TSE’08]
Classification techniques have a large impact on the performance [Ghotra et al., ICSE’15]
However, these studies have not taken parameter optimization into account
Identifying statistically distinct ranks of classification techniques
For each dataset (Dataset 1 to Dataset 18), collect the AUC performance distribution (100x) of each technique (Technique 1, Technique 2, …, Technique 26)
Apply the Scott-Knott ESD test to the distributions of each dataset to obtain the ranking for that dataset
Pool of ranking for each dataset
Dataset  T1  T2  T3
1        2   1   3
2        1   2   3
3        1   1   2
Compute the proportion of datasets where a classifier appears in the top rank
Likelihood for each technique
T1    T2    T3
0.67  0.67  0
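The likelihood computation can be reproduced from the three-dataset ranking pool shown on the slides. This is a small sketch; the function name `top_rank_likelihood` and the dictionary layout are hypothetical.

```python
# Pool of rankings from the slide: dataset -> rank of each technique
rankings = {
    1: {"T1": 2, "T2": 1, "T3": 3},
    2: {"T1": 1, "T2": 2, "T3": 3},
    3: {"T1": 1, "T2": 1, "T3": 2},
}

def top_rank_likelihood(rows):
    """rows: list of {technique: rank}. Returns, per technique, the
    proportion of rows in which that technique holds rank 1."""
    techniques = rows[0].keys()
    return {t: sum(row[t] == 1 for row in rows) / len(rows) for t in techniques}

likelihood = top_rank_likelihood(list(rankings.values()))
print({t: round(v, 2) for t, v in likelihood.items()})
```

T1 and T2 each hold the top rank in two of the three datasets, giving 0.67, while T3 never does, giving 0.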
Bootstrap resampling to combat sample selection bias
Bootstrap sample of ranking, drawn with replacement from the pool of ranking for each dataset
Dataset  T1  T2  T3
1        2   1   3
2        1   2   3
2        1   2   3
Re-compute the likelihood for each sample
Likelihood for each technique in the bootstrap sample of ranking
T1    T2    T3
0.67  0.33  0
Repeat the bootstrap 100 times to estimate the confidence interval
Distribution of likelihood
T1    T2    T3
0.67  0.33  0
…     …     …
0.33  0     0
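The resampling loop can be sketched as below, using the three-dataset pool from the slides. The slides do not say how the confidence interval is derived from the 100 likelihood values, so the percentile interval here is an assumption, and all function names are hypothetical.

```python
import random

# Pool of rankings from the slide, one row per dataset
pool = [
    {"T1": 2, "T2": 1, "T3": 3},
    {"T1": 1, "T2": 2, "T3": 3},
    {"T1": 1, "T2": 1, "T3": 2},
]

def top_rank_likelihood(rows):
    """Proportion of rows in which each technique holds rank 1."""
    return {t: sum(row[t] == 1 for row in rows) / len(rows) for t in rows[0]}

def bootstrap_likelihoods(rows, repetitions=100, seed=0):
    """Resample datasets with replacement and re-compute the likelihood each time."""
    rng = random.Random(seed)
    return [
        top_rank_likelihood([rng.choice(rows) for _ in rows])
        for _ in range(repetitions)
    ]

def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple percentile interval (an assumed choice of interval)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]

draws = bootstrap_likelihoods(pool)
for technique in pool[0]:
    print(technique, percentile_interval([d[technique] for d in draws]))

# The bootstrap sample shown on the slide: datasets 1, 2, 2
slide_sample = top_rank_likelihood([pool[0], pool[1], pool[1]])
```

The slide's sample (datasets 1, 2, 2) gives likelihoods of 0.67, 0.33, and 0 for T1, T2, and T3, matching the values it reports.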
Caret optimization can substantially shift the top-ranked classification techniques
Figure: top-rank likelihood estimates (y-axis, 0.0 to 1.0) per classification technique, comparing the optimized classifier with the default classifier.
Caret increases the likelihood of appearing in the top rank by up to 83%
43
43
43
43
43
Ad

More Related Content

What's hot (20)

Number plate recogition
Number plate recogitionNumber plate recogition
Number plate recogition
hetvi naik
 
An IOT based solution for Road Accidents
An IOT based solution for Road AccidentsAn IOT based solution for Road Accidents
An IOT based solution for Road Accidents
ijtsrd
 
AUTOMATIC LICENSE PLATE RECOGNITION SYSTEM FOR INDIAN VEHICLE IDENTIFICATION ...
AUTOMATIC LICENSE PLATE RECOGNITION SYSTEM FOR INDIAN VEHICLE IDENTIFICATION ...AUTOMATIC LICENSE PLATE RECOGNITION SYSTEM FOR INDIAN VEHICLE IDENTIFICATION ...
AUTOMATIC LICENSE PLATE RECOGNITION SYSTEM FOR INDIAN VEHICLE IDENTIFICATION ...
Kuntal Bhowmick
 
CAPTCHA- Newly Attractive Presentation for Youth
CAPTCHA- Newly Attractive Presentation for YouthCAPTCHA- Newly Attractive Presentation for Youth
CAPTCHA- Newly Attractive Presentation for Youth
WebCrazyLabs
 
Signature recognition
Signature recognitionSignature recognition
Signature recognition
Vijju Lakkundi
 
Android Fingerprint Authentication
Android Fingerprint AuthenticationAndroid Fingerprint Authentication
Android Fingerprint Authentication
Gabriel Bernardo Pereira
 
License Plate Recognition System using Python and OpenCV
License Plate Recognition System using Python and OpenCVLicense Plate Recognition System using Python and OpenCV
License Plate Recognition System using Python and OpenCV
Vishal Polley
 
Amazn go
Amazn goAmazn go
Amazn go
Mahima Raj Gupta
 
Captcha
CaptchaCaptcha
Captcha
Sumit Garg
 
Driver detection system_final.ppt
Driver detection system_final.pptDriver detection system_final.ppt
Driver detection system_final.ppt
MaseeraAhmed1
 
License Plate Recognition System
License Plate Recognition System License Plate Recognition System
License Plate Recognition System
Hira Rizvi
 
Graphical Password Authentication
Graphical Password AuthenticationGraphical Password Authentication
Graphical Password Authentication
Abha nandan
 
Automatic number plate recognition using matlab
Automatic number plate recognition using matlabAutomatic number plate recognition using matlab
Automatic number plate recognition using matlab
ChetanSingh134
 
Captcha
CaptchaCaptcha
Captcha
Chitra Mudunuru
 
Accident detection
Accident detection Accident detection
Accident detection
Samana Rao
 
TRAFFIC MANAGEMENT THROUGH SATELLITE IMAGING -- Part 1
TRAFFIC MANAGEMENT THROUGH SATELLITE IMAGING -- Part 1TRAFFIC MANAGEMENT THROUGH SATELLITE IMAGING -- Part 1
TRAFFIC MANAGEMENT THROUGH SATELLITE IMAGING -- Part 1
NanubalaDhruvan
 
Vehicle registration plate recognition system
Vehicle registration plate recognition systemVehicle registration plate recognition system
Vehicle registration plate recognition system
shailendra92
 
License plate recognition
License plate recognitionLicense plate recognition
License plate recognition
slmnsvn
 
An Overview of Robotic Process Automation (RPA)
An Overview of Robotic Process Automation (RPA)An Overview of Robotic Process Automation (RPA)
An Overview of Robotic Process Automation (RPA)
ARJUN S MEDA
 
Speed detection-of-moving-vehicle-using-speed-cameras
Speed detection-of-moving-vehicle-using-speed-camerasSpeed detection-of-moving-vehicle-using-speed-cameras
Speed detection-of-moving-vehicle-using-speed-cameras
VIKAS SINGH BHADOURIA
 
Number plate recogition
Number plate recogitionNumber plate recogition
Number plate recogition
hetvi naik
 
An IOT based solution for Road Accidents
An IOT based solution for Road AccidentsAn IOT based solution for Road Accidents
An IOT based solution for Road Accidents
ijtsrd
 
AUTOMATIC LICENSE PLATE RECOGNITION SYSTEM FOR INDIAN VEHICLE IDENTIFICATION ...
AUTOMATIC LICENSE PLATE RECOGNITION SYSTEM FOR INDIAN VEHICLE IDENTIFICATION ...AUTOMATIC LICENSE PLATE RECOGNITION SYSTEM FOR INDIAN VEHICLE IDENTIFICATION ...
AUTOMATIC LICENSE PLATE RECOGNITION SYSTEM FOR INDIAN VEHICLE IDENTIFICATION ...
Kuntal Bhowmick
 
CAPTCHA- Newly Attractive Presentation for Youth
CAPTCHA- Newly Attractive Presentation for YouthCAPTCHA- Newly Attractive Presentation for Youth
CAPTCHA- Newly Attractive Presentation for Youth
WebCrazyLabs
 
License Plate Recognition System using Python and OpenCV
License Plate Recognition System using Python and OpenCVLicense Plate Recognition System using Python and OpenCV
License Plate Recognition System using Python and OpenCV
Vishal Polley
 
Driver detection system_final.ppt
Driver detection system_final.pptDriver detection system_final.ppt
Driver detection system_final.ppt
MaseeraAhmed1
 
License Plate Recognition System
License Plate Recognition System License Plate Recognition System
License Plate Recognition System
Hira Rizvi
 
Graphical Password Authentication
Graphical Password AuthenticationGraphical Password Authentication
Graphical Password Authentication
Abha nandan
 
Automatic number plate recognition using matlab
Automatic number plate recognition using matlabAutomatic number plate recognition using matlab
Automatic number plate recognition using matlab
ChetanSingh134
 
Accident detection
Accident detection Accident detection
Accident detection
Samana Rao
 
TRAFFIC MANAGEMENT THROUGH SATELLITE IMAGING -- Part 1
TRAFFIC MANAGEMENT THROUGH SATELLITE IMAGING -- Part 1TRAFFIC MANAGEMENT THROUGH SATELLITE IMAGING -- Part 1
TRAFFIC MANAGEMENT THROUGH SATELLITE IMAGING -- Part 1
NanubalaDhruvan
 
Vehicle registration plate recognition system
Vehicle registration plate recognition systemVehicle registration plate recognition system
Vehicle registration plate recognition system
shailendra92
 
License plate recognition
License plate recognitionLicense plate recognition
License plate recognition
slmnsvn
 
An Overview of Robotic Process Automation (RPA)
An Overview of Robotic Process Automation (RPA)An Overview of Robotic Process Automation (RPA)
An Overview of Robotic Process Automation (RPA)
ARJUN S MEDA
 
Speed detection-of-moving-vehicle-using-speed-cameras
Speed detection-of-moving-vehicle-using-speed-camerasSpeed detection-of-moving-vehicle-using-speed-cameras
Speed detection-of-moving-vehicle-using-speed-cameras
VIKAS SINGH BHADOURIA
 

Viewers also liked (20)

The Impact of Mislabelling on the Performance and Interpretation of Defect Pr...
The Impact of Mislabelling on the Performance and Interpretation of Defect Pr...The Impact of Mislabelling on the Performance and Interpretation of Defect Pr...
The Impact of Mislabelling on the Performance and Interpretation of Defect Pr...
Chakkrit (Kla) Tantithamthavorn
 
Ghotra icse
Ghotra icseGhotra icse
Ghotra icse
SAIL_QU
 
Optimization in Statistical Physics
Optimization in Statistical PhysicsOptimization in Statistical Physics
Optimization in Statistical Physics
Kwan-yuet Ho
 
Learning to Optimize
Learning to OptimizeLearning to Optimize
Learning to Optimize
Pramit Choudhary
 
Predicting Method Crashes with Bytecode Operations
Predicting Method Crashes with Bytecode OperationsPredicting Method Crashes with Bytecode Operations
Predicting Method Crashes with Bytecode Operations
Thomas Zimmermann
 
Investigating Code Review Practices in Defective Files
Investigating Code Review Practices in Defective FilesInvestigating Code Review Practices in Defective Files
Investigating Code Review Practices in Defective Files
The University of Adelaide
 
Ph.D. Thesis Defense: Studying Reviewer Selection and Involvement in Modern ...
Ph.D. Thesis Defense:  Studying Reviewer Selection and Involvement in Modern ...Ph.D. Thesis Defense:  Studying Reviewer Selection and Involvement in Modern ...
Ph.D. Thesis Defense: Studying Reviewer Selection and Involvement in Modern ...
The University of Adelaide
 
Improving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer RecommendationsImproving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer Recommendations
The University of Adelaide
 
Revisiting Code Ownership and Its Relationship with Software Quality in the S...
Revisiting Code Ownership and Its Relationship with Software Quality in the S...Revisiting Code Ownership and Its Relationship with Software Quality in the S...
Revisiting Code Ownership and Its Relationship with Software Quality in the S...
The University of Adelaide
 
Optimization in QBD
Optimization in QBDOptimization in QBD
Optimization in QBD
Suraj Choudhary
 
Who Should Review My Code?
Who Should Review My Code?  Who Should Review My Code?
Who Should Review My Code?
The University of Adelaide
 
An exploratory study of the state of practice of performance testing in Java-...
An exploratory study of the state of practice of performance testing in Java-...An exploratory study of the state of practice of performance testing in Java-...
An exploratory study of the state of practice of performance testing in Java-...
corpaulbezemer
 
Demand Planning Leadership Exchange: SAP APO DP Statistical Forecast Optimiza...
Demand Planning Leadership Exchange: SAP APO DP Statistical Forecast Optimiza...Demand Planning Leadership Exchange: SAP APO DP Statistical Forecast Optimiza...
Demand Planning Leadership Exchange: SAP APO DP Statistical Forecast Optimiza...
Plan4Demand
 
Process optimisation kkk
Process optimisation kkkProcess optimisation kkk
Process optimisation kkk
kumar143vyshu4
 
Analytics for smarter software development
Analytics for smarter software development Analytics for smarter software development
Analytics for smarter software development
Thomas Zimmermann
 
An Automated Approach for Recommending When to Stop Performance Tests
An Automated Approach for Recommending When to Stop Performance TestsAn Automated Approach for Recommending When to Stop Performance Tests
An Automated Approach for Recommending When to Stop Performance Tests
SAIL_QU
 
Contactor Generalizations
Contactor GeneralizationsContactor Generalizations
Contactor Generalizations
Michael O'Dell
 
Optimization Techniques In Pharmaceutical Formulation & Processing
Optimization Techniques In Pharmaceutical Formulation & ProcessingOptimization Techniques In Pharmaceutical Formulation & Processing
Optimization Techniques In Pharmaceutical Formulation & Processing
APCER Life Sciences
 
Ch. 2-optimization-techniques
Ch. 2-optimization-techniquesCh. 2-optimization-techniques
Ch. 2-optimization-techniques
anj134u
 
Sensors & Actuators
Sensors & Actuators Sensors & Actuators
Sensors & Actuators
Abdul Abbasi
 
The Impact of Mislabelling on the Performance and Interpretation of Defect Pr...
The Impact of Mislabelling on the Performance and Interpretation of Defect Pr...The Impact of Mislabelling on the Performance and Interpretation of Defect Pr...
The Impact of Mislabelling on the Performance and Interpretation of Defect Pr...
Chakkrit (Kla) Tantithamthavorn
 
Ghotra icse
Ghotra icseGhotra icse
Ghotra icse
SAIL_QU
 
Optimization in Statistical Physics
Optimization in Statistical PhysicsOptimization in Statistical Physics
Optimization in Statistical Physics
Kwan-yuet Ho
 
Predicting Method Crashes with Bytecode Operations
Predicting Method Crashes with Bytecode OperationsPredicting Method Crashes with Bytecode Operations
Predicting Method Crashes with Bytecode Operations
Thomas Zimmermann
 
Investigating Code Review Practices in Defective Files
Investigating Code Review Practices in Defective FilesInvestigating Code Review Practices in Defective Files
Investigating Code Review Practices in Defective Files
The University of Adelaide
 
Ph.D. Thesis Defense: Studying Reviewer Selection and Involvement in Modern ...
Ph.D. Thesis Defense:  Studying Reviewer Selection and Involvement in Modern ...Ph.D. Thesis Defense:  Studying Reviewer Selection and Involvement in Modern ...
Ph.D. Thesis Defense: Studying Reviewer Selection and Involvement in Modern ...
The University of Adelaide
 
Improving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer RecommendationsImproving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer Recommendations
The University of Adelaide
 
Revisiting Code Ownership and Its Relationship with Software Quality in the S...
Revisiting Code Ownership and Its Relationship with Software Quality in the S...Revisiting Code Ownership and Its Relationship with Software Quality in the S...
Revisiting Code Ownership and Its Relationship with Software Quality in the S...
The University of Adelaide
 
An exploratory study of the state of practice of performance testing in Java-...
An exploratory study of the state of practice of performance testing in Java-...An exploratory study of the state of practice of performance testing in Java-...
An exploratory study of the state of practice of performance testing in Java-...
corpaulbezemer
 
Demand Planning Leadership Exchange: SAP APO DP Statistical Forecast Optimiza...
Demand Planning Leadership Exchange: SAP APO DP Statistical Forecast Optimiza...Demand Planning Leadership Exchange: SAP APO DP Statistical Forecast Optimiza...
Demand Planning Leadership Exchange: SAP APO DP Statistical Forecast Optimiza...
Plan4Demand
 
Process optimisation kkk
Process optimisation kkkProcess optimisation kkk
Process optimisation kkk
kumar143vyshu4
 
Analytics for smarter software development
Analytics for smarter software development Analytics for smarter software development
Analytics for smarter software development
Thomas Zimmermann
 
An Automated Approach for Recommending When to Stop Performance Tests
An Automated Approach for Recommending When to Stop Performance TestsAn Automated Approach for Recommending When to Stop Performance Tests
An Automated Approach for Recommending When to Stop Performance Tests
SAIL_QU
 
Contactor Generalizations
Contactor GeneralizationsContactor Generalizations
Contactor Generalizations
Michael O'Dell
 
Optimization Techniques In Pharmaceutical Formulation & Processing
Optimization Techniques In Pharmaceutical Formulation & ProcessingOptimization Techniques In Pharmaceutical Formulation & Processing
Optimization Techniques In Pharmaceutical Formulation & Processing
APCER Life Sciences
 
Ch. 2-optimization-techniques
Ch. 2-optimization-techniquesCh. 2-optimization-techniques
Ch. 2-optimization-techniques
anj134u
 
Sensors & Actuators
Sensors & Actuators Sensors & Actuators
Sensors & Actuators
Abdul Abbasi
 
Ad

Automated parameter optimization should be included in future 
defect prediction studies

  • 1. Automated Parameter Optimization for Defect Prediction Models. Chakkrit (Kla) Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, Kenichi Matsumoto. http://chakkrit.com kla@chakkrit.com @klainfo
  • 2. Defect models are used to predict software modules that are likely to be defective in the future. [Figure: timeline of pre-release period, release date, and post-release period; defect prediction models classify Modules A to D as clean or defect-prone.]
  • 3. Defect models are trained using classification techniques: decision tree algorithms, regression algorithms, clustering algorithms, and ensemble algorithms.
  • 6. Such classification techniques often require parameter settings, e.g., the number of trees in a random forest classifier (an ensemble algorithm). 26 of the 30 most commonly used classification techniques require at least one parameter setting.
  • 7. Defect models may underperform if they are trained using suboptimal parameter settings. The default settings of random forest, naïve Bayes, and support vector machines are suboptimal [Jiang et al., DEFECTS’08] [Tosun et al., ESEM’09] [Hall et al., TSE’12].
  • 8. Different toolkits have different default settings for the same classification technique. [Figure: default setting of the number of trees in a random forest across toolkits, with values of 10, 50, 100, and 500; the packages named include randomForest and bigrf.]
  • 10. The parameter space is too large for manual inspection. There are at least 17,000 possible settings to explore when training k-NN classifiers [Kocaguneli et al., TSE’12]. How do automated parameter optimization techniques fare when applied to defect prediction?
  • 14. Caret: an off-the-shelf automated parameter optimization technique. (Step-1) Generate candidate settings, producing the settings. (Step-2) Evaluate candidate settings, producing the performance for each setting. (Step-3) Identify the optimal setting.
  • 15. (Step-1) Generate a set of candidate settings to evaluate. Example for the number of trees in a random forest: #Trees = 10, 20, 30, 40, 50.
  • 16. (Step-2) Evaluate the performance of each candidate setting using bootstrap validation. Out-of-sample bootstrap validation with 100 repetitions: generate bootstrap samples from the defect dataset to form a training corpus and a testing corpus, construct a defect model on the training corpus, and calculate its performance on the testing corpus.
  • 17. (Step-3) The optimal setting is the one that achieved the top performance score. In the example, #Trees = 10, 20, 30, 40, and 50 achieve AUC = 0.65, 0.68, 0.70, 0.80, and 0.86, respectively.
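The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not Caret itself (Caret is an R package): it uses a hand-rolled k-NN scorer on a synthetic one-metric dataset, with k standing in for #Trees, and every function name here (make_toy_dataset, knn_score, bootstrap_auc, optimize) is hypothetical. The out-of-sample bootstrap trains on rows drawn with replacement and tests on the rows that were not drawn.

```python
import random


def make_toy_dataset(n=60, seed=1):
    """Synthetic modules: (size metric, defective label). Illustration only."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        size = rng.uniform(0, 100)
        defective = 1 if rng.random() < size / 100 else 0
        data.append((size, defective))
    return data


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_score(train, x, k):
    """Fraction of defective modules among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def bootstrap_auc(data, k, repetitions, rng):
    """Step 2: mean out-of-sample bootstrap AUC for one candidate setting."""
    aucs = []
    for _ in range(repetitions):
        drawn = [rng.randrange(len(data)) for _ in data]
        drawn_set = set(drawn)
        train = [data[i] for i in drawn]
        test = [row for i, row in enumerate(data) if i not in drawn_set]
        if not test:
            continue  # every row was drawn; nothing left to test on
        scores = [knn_score(train, x, k) for x, _ in test]
        value = auc(scores, [y for _, y in test])
        if value is not None:
            aucs.append(value)
    # Fall back to chance level if no repetition produced a usable AUC.
    return sum(aucs) / len(aucs) if aucs else 0.5


def optimize(data, candidates, repetitions=100, seed=0):
    """Steps 1 to 3: evaluate each candidate and keep the top performer."""
    rng = random.Random(seed)
    performance = {k: bootstrap_auc(data, k, repetitions, rng) for k in candidates}
    best = max(performance, key=performance.get)
    return best, performance


if __name__ == "__main__":
    best, performance = optimize(make_toy_dataset(), [1, 3, 5, 7, 9])
    for k, value in performance.items():
        print(f"k={k}: mean AUC={value:.3f}")
    print("optimal setting:", best)
```

Swapping knn_score for a real classifier and the candidate list for a grid of #Trees values would give the same loop structure; only the model-fitting call changes.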
  • 19. We study a collection of 18 datasets from 5 open corpora. Corpus characteristics as listed on the slide: 1-7K modules, 21-28% defective rate, 21-38 metrics [Shepperd et al., TSE’13]; 1-10K modules, 11-44% defective rate, 15-32 metrics [Zimmermann et al., PROMISE’07] [D’Ambros et al., MSR’10] [Kim et al., ICSE’11]; 600-800 modules, 36-48% defective rate, 20 metrics [Jureczko et al., PROMISE’10]. A threat of bias exists if researchers fixate on studying the same datasets with the same metrics [Tantithamthavorn et al., TSE’16].
  • 23. Compute the performance improvement. For each defect dataset, construct and evaluate Caret-optimized models (optimized setting) and default models (default setting). Each yields 100 AUC values per technique (Caret-optimized performance and default performance); the two sets are averaged, and the performance improvement is derived from the two averages.
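As a minimal sketch of this computation, assuming the improvement is the difference between the two averages (the chart's AUC axis and the "percentage points" wording suggest this, but the slide does not spell out the formula). The function name and the AUC values are made up for illustration.

```python
from statistics import mean


def performance_improvement(optimized_aucs, default_aucs):
    """Average optimized AUC minus average default AUC (assumed definition)."""
    return mean(optimized_aucs) - mean(default_aucs)


if __name__ == "__main__":
    # Hypothetical AUC values; the study uses 100 bootstrap repetitions each.
    optimized = [0.80, 0.82, 0.78]
    default = [0.70, 0.72, 0.68]
    print(f"improvement: {performance_improvement(optimized, default):.2f} AUC")
```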
  • 24. Parameter settings can substantially influence the performance of defect prediction models. [Figure: boxplots of AUC performance improvement (0.0 to 0.4) per classification technique, in panels labelled Large, Medium, and Small; techniques shown include C5.0, AdaBoost, AVNNet, CART, PCANNet, NNet, FDA, MLPWeightDecay, MLP, LMT, GPLS, LogitBoost, KNN, xGBTree, GBM, NB, and RBF, among others.] Each boxplot presents the performance improvement across all 18 studied datasets.
  • 25. (Same figure.) 9 of the 26 studied classification techniques have a large performance improvement.
  • 26. (Same figure.) C5.0 and AdaBoost have median improvements of 0.27 and 0.14 AUC, respectively.
  • 27. (Same figure.) The improvements for C5.0 and AdaBoost span up to 0.40 AUC.
  • 28. Performance improvement: Caret improves the AUC performance by up to 40 percentage points. [Figure: boxplots of AUC performance improvement per classification technique.] Next topic: performance stability.
  • 30. Default settings may introduce instability into defect prediction models. Unstable performance estimates may introduce bias into the conclusions of research [Jorgensen et al., TSE’07] [Menzies and Shepperd, EMSE’12].
  • 34. Estimating the stability of defect prediction models. For each defect dataset, construct and evaluate Caret-optimized models (optimized setting) and default models (default setting). Each yields 100 AUC values per technique, and the standard deviation (S.D.) of each set is computed. Stability ratio = (S.D. of optimized) / (S.D. of default).
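The stability ratio can be sketched as follows. This is an illustration rather than the study's own code: the function name and AUC values are hypothetical, and it assumes the sample standard deviation, since the slide does not say whether the sample or population form is used.

```python
from statistics import stdev


def stability_ratio(optimized_aucs, default_aucs):
    """S.D. of optimized AUCs divided by S.D. of default AUCs."""
    default_sd = stdev(default_aucs)
    if default_sd == 0:
        raise ValueError("default AUCs have zero spread; ratio is undefined")
    return stdev(optimized_aucs) / default_sd


if __name__ == "__main__":
    # Hypothetical AUC values; the study uses 100 bootstrap repetitions each.
    optimized = [0.80, 0.81, 0.79]
    default = [0.60, 0.70, 0.80]
    ratio = stability_ratio(optimized, default)
    print(f"stability ratio: {ratio:.2f}")  # below 1: optimized varies less
```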
  • 35. Caret-optimized classifiers tend to be more stable than default classifiers. [Figure: boxplots of stability ratio (0.0 to 2.0) per classification technique, in panels labelled Large, Medium, and Small; annotation: Cliff’s delta = Large.] Each boxplot presents the stability ratio across all 18 studied datasets.
  • 36. (Same figure.) A stability ratio lower than one indicates that Caret-optimized classifiers tend to be more stable than default classifiers.
  • 37. (Same figure.) The stability ratio is lower than 1 for 35% of the studied classification techniques.
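The slides label effect sizes with Cliff’s delta (Large, Medium, Small, Negligible) without showing how it is computed. A minimal sketch of the standard definition follows; the magnitude thresholds (0.147, 0.33, 0.474) are the commonly cited ones and are an assumption here, since the slides do not state which thresholds or which pair of distributions were used. Function names are hypothetical.

```python
def cliffs_delta(xs, ys):
    """Share of pairs with x > y minus share of pairs with x < y."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def magnitude(delta):
    """Map |delta| to a label using commonly cited thresholds (assumed)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Hypothetical AUC samples for an optimized and a default model.
    optimized = [0.80, 0.82, 0.78, 0.81]
    default = [0.70, 0.72, 0.79, 0.68]
    delta = cliffs_delta(optimized, default)
    print(f"delta={delta:.2f} ({magnitude(delta)})")
```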
  • 38. Caret-optimized classifiers are at least as stable as default classifiers. [Figure: stability-ratio boxplots continued, in panels labelled Medium, Small, and Negligible; techniques shown include LMT, GPLS, LogitBoost, KNN, xGBTree, GBM, NB, RBF, SVMRadial, GAMBoost, RF, Ripper, PMR, PDA, MARS, SVMLinear, and J48.] Stability ratio is about 0 for 65% of the studied classification techniques.
  • 39. Performance improvement: Caret improves the AUC performance by up to 40 percentage points. Performance stability: Caret-optimized classifiers tend to be more stable than classifiers trained with default settings. [Figures: boxplots of AUC performance improvement and of stability ratio per classification technique.]
  • 40. Suboptimal parameter settings may have an impact on prior defect prediction results!
  • 41. Prior findings on top-performing classification techniques: 17 of 22 classification techniques are indistinguishable [Lessmann et al., TSE’08]. Classification techniques have a large impact on the performance [Ghotra et al., ICSE’15].
  • 42. Prior findings on top-performing classification techniques: however, these studies [Lessmann et al., TSE’08; Ghotra et al., ICSE’15] have not taken parameter optimization into account.
  • 43. Identifying statistically distinct ranks of classification techniques. For Dataset 1, each technique (Technique 1, Technique 2, …, Technique 26) is evaluated 100 times (100x), giving one AUC performance distribution per technique.
  • 44. Identifying statistically distinct ranks of classification techniques. The Scott-Knott ESD test is applied to the AUC performance distributions (100x each) of Technique 1 through Technique 26 on Dataset 1.
  • 45. Identifying statistically distinct ranks of classification techniques. Applying the Scott-Knott ESD test to the AUC performance distributions (100x each) of Technique 1 through Technique 26 on Dataset 1 produces the ranking for Dataset 1.
  • 46. Identifying statistically distinct ranks of classification techniques. The Scott-Knott ESD test is applied separately to each dataset, from Dataset 1 through Dataset 18. For each dataset, the AUC performance distributions (100x each) of Technique 1 through Technique 26 go in and a ranking for that dataset comes out.
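To make the idea of "statistically distinct ranks" concrete, here is a deliberately simplified stand-in. It is not the Scott-Knott ESD test: it only sorts techniques by mean AUC and starts a new rank whenever the Cohen's d between neighbouring techniques is non-negligible. The 0.2 threshold, the function names and the AUC values are all assumptions for illustration:

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    diff = statistics.mean(a) - statistics.mean(b)
    if pooled_var == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled_var)


def rank_techniques(auc_by_technique, negligible=0.2):
    """Simplified ranking (NOT Scott-Knott ESD): sort by mean AUC and open a
    new rank when the effect size against the previous technique is not
    negligible."""
    ordered = sorted(auc_by_technique,
                     key=lambda t: statistics.mean(auc_by_technique[t]),
                     reverse=True)
    ranks, rank = {}, 1
    for prev, cur in zip([None] + ordered, ordered):
        if prev is not None:
            d = cohens_d(auc_by_technique[prev], auc_by_technique[cur])
            if abs(d) >= negligible:
                rank += 1
        ranks[cur] = rank
    return ranks


# Hypothetical AUC distributions for three techniques on one dataset.
aucs = {
    "T1": [0.80, 0.82, 0.81, 0.79],
    "T2": [0.80, 0.81, 0.82, 0.79],
    "T3": [0.60, 0.62, 0.61, 0.59],
}
ranks = rank_techniques(aucs)
print(ranks)
```

Techniques that share a rank are treated as indistinguishable on that dataset, which is why several techniques can occupy the top rank at once.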
  • 47. Identifying statistically distinct ranks of classification techniques. The rankings that the Scott-Knott ESD test produces for Dataset 1 through Dataset 18 are collected into a pool of rankings for each dataset. Illustration with three techniques: Dataset 1: T1 = 2, T2 = 1, T3 = 3; Dataset 2: T1 = 1, T2 = 2, T3 = 3; Dataset 3: T1 = 1, T2 = 1, T3 = 2.
  • 48. Compute the proportion of datasets where a classifier appears in the top rank. Pool of rankings for each dataset: Dataset 1: T1 = 2, T2 = 1, T3 = 3; Dataset 2: T1 = 1, T2 = 2, T3 = 3; Dataset 3: T1 = 1, T2 = 1, T3 = 2. Computed likelihood for each technique: T1 = 0.67, T2 = 0.67, T3 = 0.
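The likelihood computation on this slide can be sketched directly from its example pool. The function and variable names are hypothetical; the rank values are the ones shown on the slide:

```python
def top_rank_likelihood(rankings):
    """Proportion of datasets in which each technique holds rank 1.

    rankings: list of dicts, one per dataset, mapping technique -> rank.
    """
    techniques = rankings[0].keys()
    n = len(rankings)
    return {t: sum(1 for r in rankings if r[t] == 1) / n for t in techniques}


# Pool of rankings from the slide (one dict per dataset).
pool = [
    {"T1": 2, "T2": 1, "T3": 3},  # Dataset 1
    {"T1": 1, "T2": 2, "T3": 3},  # Dataset 2
    {"T1": 1, "T2": 1, "T3": 2},  # Dataset 3
]

likelihood = top_rank_likelihood(pool)
print({t: round(v, 2) for t, v in likelihood.items()})
# → {'T1': 0.67, 'T2': 0.67, 'T3': 0.0}
```

T1 and T2 each appear in the top rank on two of the three datasets (2/3 ≈ 0.67), and T3 never does.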
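  • 50. Bootstrap resampling to combat sample selection bias. Bootstrap sampling draws a bootstrap sample of rankings from the pool of rankings for each dataset. Pool: Dataset 1: T1 = 2, T2 = 1, T3 = 3; Dataset 2: T1 = 1, T2 = 2, T3 = 3; Dataset 3: T1 = 1, T2 = 1, T3 = 2. Bootstrap sample: Dataset 1: T1 = 2, T2 = 1, T3 = 3; Dataset 2: T1 = 1, T2 = 2, T3 = 3; Dataset 2 (drawn twice): T1 = 1, T2 = 2, T3 = 3.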
  • 51. Re-compute the likelihood for each sample. Bootstrap sample of rankings: Dataset 1: T1 = 2, T2 = 1, T3 = 3; Dataset 2: T1 = 1, T2 = 2, T3 = 3; Dataset 2 (drawn twice): T1 = 1, T2 = 2, T3 = 3. Computed likelihood for each technique: T1 = 0.67, T2 = 0.33, T3 = 0.
  • 52. Repeat the bootstrap 100 times to estimate the confidence interval. Each repetition draws a new bootstrap sample of rankings from the pool of rankings for each dataset.
  • 53. Repeat the bootstrap 100 times to estimate the confidence interval. Computing the likelihood for each of the 100 bootstrap samples of rankings yields a distribution of likelihood for each technique, e.g. (T1, T2, T3) = (0.67, 0.33, 0), …, (0.33, 0, 0).
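The bootstrap procedure above can be sketched as follows. The slides do not say how the confidence interval is derived from the 100 likelihood values, so the percentile interval here is an assumption, as are the function names and the fixed seed:

```python
import random


def top_rank_likelihood(rankings):
    """Proportion of datasets in which each technique holds rank 1."""
    n = len(rankings)
    return {t: sum(1 for r in rankings if r[t] == 1) / n for t in rankings[0]}


def bootstrap_likelihoods(pool, repetitions=100, seed=0):
    """Resample the pool with replacement and recompute the likelihood each time."""
    rng = random.Random(seed)
    samples = []
    for _ in range(repetitions):
        resample = [rng.choice(pool) for _ in pool]  # same size as the pool
        samples.append(top_rank_likelihood(resample))
    return samples


def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple percentile interval (assumed; not specified in the slides)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


# Pool of rankings from the slides (one dict per dataset).
pool = [
    {"T1": 2, "T2": 1, "T3": 3},
    {"T1": 1, "T2": 2, "T3": 3},
    {"T1": 1, "T2": 1, "T3": 2},
]

samples = bootstrap_likelihoods(pool)
intervals = {t: percentile_interval([s[t] for s in samples]) for t in pool[0]}
print(intervals)
```

With only three datasets in the illustrative pool the intervals are very wide; the point of the sketch is the resample-and-recompute loop, not the numbers.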
  • 54. Caret optimization can substantially shift the top-ranked classification techniques. [Figure: top-rank likelihood estimate (0.0 to 1.0) for each classification technique, comparing the optimized classifier with the default classifier.]
  • 56. Caret optimization can substantially shift the top-ranked classification techniques. Caret increases the likelihood of appearing in the top rank by up to 83%.