Bench4BL: Reproducibility Study on the Performance of IR-Based Bug Localization

Jaekwon Lee (1), Dongsun Kim (1), Tegawendé F. Bissyandé (1), Woosung Jung (2), Yves Le Traon (1)

(1) SnT, University of Luxembourg, Luxembourg
(2) Seoul National University of Education, South Korea

Bug Localization
Where should we fix?

Bug Localization
[Figure: a model connects a bug report to a set of Java code files.]
Fault Localization vs. Bug Localization
[Figure: fault localization takes test cases and a function F(x) and highlights lines inside the function (here lines 07, 12, and 13); bug localization works from a bug report instead.]

Function:
01: MultiMap<Method, Long> duration = new MultiMap<Method, Long>();
02: // run all benchmarks in same order, recording duration
03: for (Method m : benchmarks) {
04:     System.err.println("# "+m.getName()+" benchmarking");
05:     List<Integer> reps = getReps(min_reps, m);
06:     for (int r : reps) {
07:         System.gc();
08:         long start = System.nanoTime();
09:         m.invoke(suite,r);
10:         long stop = System.nanoTime();
11:         duration.map(m, stop - start);
12:     }
13: }

Highlighted lines:
07: System.gc();
12: }
13: }

Information Retrieval based Bug Localization (IRBL)
[Figure: the inputs are the source code files and a bug report.]

Extracting Features
[Figure: features are extracted from both the source code and the bug report.]
Extracted from each: NL tokens, code elements, meta info.

Feature Vectors
[Figure: the features extracted from the source code and from the bug report (NL tokens, code elements, meta info) are turned into feature vectors.]

Comparing Similarity & Ranking
[Figure: the feature vector of the bug report is compared with the feature vectors of the source code files, and code files are recommended as a ranked list 1, 2, 3, ..., N.]
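The three steps on these slides (extract features, build feature vectors, compare similarity and rank) can be sketched in a few lines. This is only an illustration under stated assumptions: TF-IDF weighting and cosine similarity are used here as a generic stand-in, not as the scoring of any of the six techniques studied, and the names `tokenize`, `tfidf_vectors`, `cosine`, and `rank_files` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens, also splitting camelCase identifiers."""
    tokens = []
    for word in re.findall(r"[A-Za-z]+", text):
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)
        tokens.extend(p.lower() for p in parts)
    return tokens


def tfidf_vectors(docs):
    """Turn token lists into sparse TF-IDF vectors (dict: token -> weight)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(d).items()} for d in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(bug_report, files):
    """Rank file paths by decreasing similarity to the bug report.

    files maps a path to the text of that source file.
    """
    paths = list(files)
    docs = [tokenize(files[p]) for p in paths] + [tokenize(bug_report)]
    vectors = tfidf_vectors(docs)
    query = vectors[-1]
    scored = [(cosine(query, v), p) for v, p in zip(vectors, paths)]
    # Ties are broken by path so the ranking is deterministic.
    return [p for _, p in sorted(scored, key=lambda sp: (-sp[0], sp[1]))]
```

Calling `rank_files(report_text, {"path/A.java": "...", "path/B.java": "..."})` returns the paths ordered from most to least similar, which is the ranked list 1, 2, 3, ..., N shown in the figure.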

Are these results mature enough?
Performance is not yet mature.

Subject   BRTracer  BLUiR  AmaLgam  Locus
ZXing     0.445     0.380  0.410    0.502
SWT       0.467     0.560  0.578    0.640
AspectJ   0.264     0.263  0.271    0.320
PDE       0.367     0.349  0.322    0.422
JDT       0.232     0.277  0.282    0.359
(metric: MAP)
Are the subjects still usable?
Outdated subjects

Subjects: ZXing, AspectJ, and the Eclipse projects PDE, JDT, and SWT
#Reports (in extracted order): 98, 286, 20, 60, 98
Period (in extracted order): 2004 - 2016, 2004 - 2010, 2002 - 2006, 2010 - 2010


Evaluation Configuration?
Inconsistent evaluation settings across BugLocator, BLIA, Locus, AmaLgam, BRTracer, and BLUiR:
Version matching
Test file inclusion

Research Questions

Experiment data set
RQ1: To what extent do IRBL techniques perform on up-to-date subjects?

Experiment configuration
RQ2: What is the impact of version matching on the performance of IRBL techniques?
RQ3: To what extent are IRBL techniques sensitive to the inclusion of test code files?

Potential improvement
RQ4: What potential performance gain can be reached by leveraging duplicate bug reports?

IRBL Techniques we used

Technique (venue)        IRBL features       Sub modules
BugLocator (ICSE 2012)   Full text           Bug report fixing history
BLIA (APSEC 2015)        Identifiers         Bug report fixing history, stack trace analysis, revision history
Locus (ICSE 2016)        Identifiers         Revision history
AmaLgam (ICPC 2014)      Identifiers         Bug report fixing history, revision history
BRTracer (ICSME 2014)    Code segmentations  Bug report fixing history, stack trace analysis
BLUiR (ASE 2013)         Identifiers         Bug report fixing history

Subjects
Written in Java
Publicly available bug reports
20+ source code files in one of its versions

Subjects

              Projects  Major versions  Bug reports  Duplicate reports
New subjects  46        690             9,459        807
Old subjects  5         5               558          136

RQ1: The use of old vs. new subjects
[Figure: new subjects vs. old subjects, each with bug reports, Java source files, and a bug oracle.]
Configuration: single version matching, test files included

RQ2: The importance of version matching
Single version matching vs. multiple version matching
Configuration: new subjects, test files included

RQ3: The impact of test file inclusion
[Figure: test files included vs. test files excluded, each with bug reports, Java source files, and a bug oracle.]
Configuration: multiple version matching, new subjects

Commit log:
CAMEL-12558: Transacted and Policy should not have outputs
M main/java/org/apache/camel/model/PolicyDefinition.java
M main/java/org/apache/camel/model/TransactedDefinition.java
A test/java/org/apache/camel/catalog/CamelCatalog.java
A main/java/org/apache/camel/tools/apt/CoreEipAnnotationPrint.java

Added camel-web3j Spring-boot test
A test/java/org/apache/camel/itest/springboot/CamelWeb3jTest.java

Update GoogleBigQueryProducer.java
M main/java/org/apache/camel/component/GoogleBigQueryProducer.java

RQ4: Leveraging duplicate bug reports
[Figure: master reports vs. duplicate reports vs. merged reports, each evaluated against its own bug oracle (master reports, duplicate reports, merged reports).]
Configuration: all subjects, test files included in the bug oracle

Metrics

Mean Average Precision (MAP):
\[ \mathrm{MAP} = \frac{1}{M} \sum_{j=1}^{M} AP(j) \]

Mean Reciprocal Rank (MRR):
\[ \mathrm{MRR} = \frac{1}{M} \sum_{i=1}^{M} \frac{1}{f\text{-}rank_i} \]
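A small sketch of how the two metrics can be computed from ranked lists. The slide does not define its symbols, so this assumes the usual reading: M is the number of bug reports, AP(j) is the average precision of the ranking for report j (normalized by the number of buggy files of that report), and f-rank_i is the rank of the first buggy file for report i. The function names are hypothetical.

```python
def average_precision(ranked, buggy):
    """AP for one bug report: mean of precision@k over ranks k holding a buggy file.

    Assumption of this sketch: normalize by the total number of buggy files.
    """
    hits = 0
    total = 0.0
    for k, path in enumerate(ranked, start=1):
        if path in buggy:
            hits += 1
            total += hits / k
    return total / len(buggy) if buggy else 0.0


def reciprocal_rank(ranked, buggy):
    """1 / rank of the first buggy file, or 0.0 if none is ranked."""
    for k, path in enumerate(ranked, start=1):
        if path in buggy:
            return 1.0 / k
    return 0.0


def map_and_mrr(results):
    """results: non-empty list of (ranked_paths, set_of_buggy_paths), one per bug report."""
    m = len(results)
    mean_ap = sum(average_precision(r, b) for r, b in results) / m
    mean_rr = sum(reciprocal_rank(r, b) for r, b in results) / m
    return mean_ap, mean_rr
```

Both values lie between 0 and 1, and higher is better; MRR only looks at the first correct file, while MAP rewards ranking all buggy files near the top.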

Baseline Performance
[Figure: distribution of MAP values of all subjects for each technique (Locus, BLIA, AmaLgam, BLUiR, BRTracer, BugLocator), axis from 0.00 to 1.00. Values annotated on the plot, in extracted order: 0.359, 0.35, 0.359, 0.365, 0.363, 0.38.]
[Figure: distribution of MRR values of all subjects for each technique (Locus, BLIA, AmaLgam, BLUiR, BRTracer, BugLocator), axis from 0.00 to 1.00. Values annotated on the plot, in extracted order: 0.455, 0.501, 0.43, 0.516, 0.497, 0.506.]
Bug localization still has much room for improvement.

RQ1: Summary of MAP/MRR of IRBL techniques

             Old Subjects        New Subjects
Technique    MAP      MRR        MAP       MRR
BugLocator   0.2692   0.3985     ↗0.3052   ↗0.4223
BRTracer     0.2645   0.3664     ↗0.3330   ↗0.4690
BLUiR        0.3102   0.4556     0.2881    0.3869
AmaLgam      0.2950   0.4072     0.2906    0.3899
BLIA         0.2935   0.4242     ↗0.3014   0.4155
Locus        0.2641   0.3399     ↗0.3289   ↗0.4430

Not over-fitted to old subjects

RQ2: Summary of MAP/MRR of IRBL techniques
Configuration: new subjects

             Single Version      Multiple Version
Technique    MAP      MRR        MAP       MRR
BugLocator   0.3052   0.4223     ↗0.3713   ↗0.5075
BRTracer     0.3330   0.4690     ↗0.3992   ↗0.5526
BLUiR        0.2881   0.3869     ↗0.3623   ↗0.4802
AmaLgam      0.2906   0.3899     ↗0.3657   ↗0.4840
BLIA         0.3014   0.4155     ↗0.3777   ↗0.5124
Locus        0.3289   0.4430     ↗0.4217   ↗0.5514

The evaluation/execution of IRBL techniques should apply multiple version matching

RQ3: Summary of MAP/MRR of IRBL techniques
Configuration: new subjects

             Test files excluded   Test files included
Technique    MAP      MRR          MAP       MRR
BugLocator   0.3811   0.4647       0.3713    ↗0.5075
BRTracer     0.4141   0.5090       0.3992    ↗0.5526
BLUiR        0.3603   0.4385       ↗0.3623   ↗0.4802
AmaLgam      0.3633   0.4420       0.3657    ↗0.4840
BLIA         0.3902   0.4728       ↗0.3777   ↗0.5124
Locus        0.4146   0.5002       ↗0.4217   ↗0.5514

Including test files does not bring bias or noise

RQ4: Summary of MAP/MRR of IRBL techniques

             Master              Duplicate           Merged (Master+Duplicate)
Technique    MAP      MRR        MAP      MRR        MAP       MRR
BugLocator   0.3503   0.5051     0.3259   0.4667     0.3502    ↗0.5249
BRTracer     0.3852   0.5508     0.3776   0.5430     0.3787    ↗0.5692
BLUiR        0.3159   0.4540     0.2804   0.4192     ↗0.3325   ↗0.4728
AmaLgam      0.3202   0.4581     0.2829   0.4223     ↗0.3327   ↗0.4725
BLIA         0.3518   0.4915     0.3231   0.4537     ↗0.3577   ↗0.5041
Locus        0.2915   0.4707     0.2871   ↗0.4724    ↗0.3042   ↗0.5021

Duplicate reports complement master bug reports and guarantee a minimum level of performance

Dataset Available
https://github.com/exatoa/Bench4BL

Bug-Code Linking
[Figure: a bug report is linked to the code repository through the commit log.]

Bug Oracle

Bug Report 1:
main/java/org/apache/camel/model/PolicyDefinition.java
main/java/org/apache/camel/model/TransactedDefinition.java
test/java/org/apache/camel/catalog/CamelCatalog.java
main/java/org/apache/camel/tools/apt/CoreEipAnnotationPrint.java

Bug Report 2:
main/java/org/apache/camel/GoogleBigQueryProducer.java

Bug Report 3:
main/java/org/apache/camel/component/StringConcatenator.java

...
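One way to read the bug-code linking and bug oracle slides is that commits whose message mentions a bug report key contribute their changed Java files to that report's oracle. The sketch below illustrates this reading only; the regular expression for issue keys, the input shape, and the name `build_oracle` are assumptions, not the procedure used to build the dataset.

```python
import re
from collections import defaultdict

# Assumed key format: uppercase project name, hyphen, number (e.g. CAMEL-12558).
ISSUE_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


def build_oracle(commits):
    """Map each issue key found in a commit message to the Java files of that commit.

    commits: iterable of (message, [(status, path), ...]) where status is a
    change marker such as "M" (modified) or "A" (added).
    """
    oracle = defaultdict(set)
    for message, changes in commits:
        for key in ISSUE_KEY.findall(message):
            for _status, path in changes:
                if path.endswith(".java"):
                    oracle[key].add(path)
    return dict(oracle)


# The commit log entries shown on the RQ3 slide.
commits = [
    ("CAMEL-12558: Transacted and Policy should not have outputs", [
        ("M", "main/java/org/apache/camel/model/PolicyDefinition.java"),
        ("M", "main/java/org/apache/camel/model/TransactedDefinition.java"),
        ("A", "test/java/org/apache/camel/catalog/CamelCatalog.java"),
        ("A", "main/java/org/apache/camel/tools/apt/CoreEipAnnotationPrint.java"),
    ]),
    ("Added camel-web3j Spring-boot test", [
        ("A", "test/java/org/apache/camel/itest/springboot/CamelWeb3jTest.java"),
    ]),
    ("Update GoogleBigQueryProducer.java", [
        ("M", "main/java/org/apache/camel/component/GoogleBigQueryProducer.java"),
    ]),
]
oracle = build_oracle(commits)
```

With this input only the first commit carries a key, so `oracle` holds a single entry for CAMEL-12558 with its four files; the other two commits cannot be linked to a report by message alone.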

Version Matching Strategy
Single version matching (previous techniques)
Multiple version matching

Version Matching Approach
Selecting earliest version
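The slide gives only the phrase "selecting earliest version". One possible reading, assumed here, is that when several candidate versions exist for a bug report, the earliest one is chosen. The sketch shows how such a choice can be made numerically; the dotted version format and the names `version_key` and `select_earliest` are hypothetical.

```python
import re


def version_key(version):
    """Turn a dotted version such as '2.10.1' into (2, 10, 1).

    Comparing tuples of integers avoids the string-comparison trap
    where '2.10' would sort before '2.9'.
    """
    return tuple(int(n) for n in re.findall(r"\d+", version))


def select_earliest(versions):
    """Return the earliest version among the candidates (raises ValueError if empty)."""
    return min(versions, key=version_key)
```

For example, `select_earliest(["2.10.1", "2.9.0", "2.21"])` picks the candidate whose numeric components are smallest rather than the one that sorts first as text.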

Test File Inclusion
[Figure: code repository and commit log, with the six techniques BugLocator, BLIA, Locus, AmaLgam, BRTracer, and BLUiR.]
We remove files that include “test” or “Test” in a path or filename.
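The removal rule stated on this slide is a plain substring check, which can be sketched as follows. The function names are hypothetical, and the example paths come from the commit log shown earlier in the deck.

```python
def is_test_file(path):
    """True if the path or filename contains "test" or "Test" (the rule on the slide)."""
    return "test" in path or "Test" in path


def exclude_test_files(paths):
    """Keep only the paths that do not match the test-file rule."""
    return [p for p in paths if not is_test_file(p)]


paths = [
    "main/java/org/apache/camel/model/PolicyDefinition.java",
    "test/java/org/apache/camel/catalog/CamelCatalog.java",
    "test/java/org/apache/camel/itest/springboot/CamelWeb3jTest.java",
]
kept = exclude_test_files(paths)
```

Because the check is a substring match, it also drops any non-test file whose name happens to contain the letters, for instance a hypothetical `main/java/LatestVersion.java`.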

Duplicate Bug Reports

              Projects  Major versions  Bug reports  Duplicate reports
New subjects  46        690             9,459        807
Old subjects  5         5               558          136

Duplicate Bug Reports
MATH-760 MATH-1192 MATH-2022
MATH-760 MATH-1192
MATH-760 MATH-2022
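The identifiers on this slide suggest that a group of related reports is expanded into pairs that all share the first report. The sketch below illustrates that expansion and a simple way to form a merged report. Treating the first identifier as the master report and merging by text concatenation are both assumptions of this sketch; the slides do not state either.

```python
def master_duplicate_pairs(group):
    """Expand a group of report ids into (master, duplicate) pairs.

    Assumption: the first id in the group is the master report.
    """
    master, *duplicates = group
    return [(master, duplicate) for duplicate in duplicates]


def merge_reports(master_text, duplicate_text):
    """Build the text of a merged report by concatenation (an assumed merge rule)."""
    return master_text + "\n" + duplicate_text


pairs = master_duplicate_pairs(["MATH-760", "MATH-1192", "MATH-2022"])
```

Here `pairs` contains one entry per duplicate, matching the two pairs listed on the slide; each pair can then be evaluated as master only, duplicate only, or merged, as in the RQ4 table.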
