Mining Development Knowledge to Understand and Support Software Logging Practices

Heng Li
Supervisor: Dr. Ahmed E. Hassan
Software Analysis & Intelligence Lab (SAIL)
Queen’s University, Canada

Developers insert logging code that
produces log messages at runtime
Log()
Logging
code
Log
messages
Software
system
Log.info("Stopping server on " + port);
2016-07-23 17:56:16 INFO Stopping server on 8032
Log messages record valuable runtime information
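The relationship between the logging code and the resulting log message can be sketched as follows. This is a minimal illustration, not a real logging library: the class `LogMessageSketch` and its `render` method are hypothetical names, and the "timestamp LEVEL text" layout is simply copied from the sample message on the slide.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical stand-in for a logging library: renders one log message
// in the "timestamp LEVEL text" layout shown on the slide.
class LogMessageSketch {
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // The static text and the level come from the logging code; the
    // timestamp and the dynamic value (the port) are only known at runtime.
    static String render(LocalDateTime when, String level, String text) {
        return when.format(TIMESTAMP) + " " + level + " " + text;
    }

    public static void main(String[] args) {
        int port = 8032;
        LocalDateTime when = LocalDateTime.of(2016, 7, 23, 17, 56, 16);
        // Mirrors Log.info("Stopping server on " + port) from the slide.
        System.out.println(render(when, "INFO", "Stopping server on " + port));
    }
}
```

With the fixed timestamp used in `main`, the sketch reproduces the sample line from the slide; a real logger would take the timestamp from the system clock.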

Diagnose
failures
Logging is critical for software maintenance
Detect
anomalies
Log messages are widely used in software
maintenance efforts
Understand
runtime
behaviors
Fu et al., Contextual analysis of program logs for understanding system behaviors. MSR ’13
Yuan et al., Sherlog: Error diagnosis by connecting clues from run-time logs. ASPLOS ’10
Xu et al., Detecting large-scale system problems by mining console logs. SOSP ’09

Developers have difficulties deciding on
appropriate logging code
“A lot of log
noise”
“Slowing
down perf
by 20%”
“Missing an
error log”
Developers spend a significant amount of effort
maintaining their logging code
§ Logging practices in open source projects
[Yuan et al., 2012; Chen and Jiang, 2017]
§ Logging practices in industry
[Shang et al., 2014; Fu et al., 2014]
Prior
work

Development knowledge explains
the development of logging code
− LOG.info(msg);
+ LOG.warn(msg);
To help users
identify a problem
LOG.warn(msg);
What How Why
Change history | Source code | Issue reports

Thesis statement
Development knowledge can help us understand
current logging practices and develop useful tools
to support such logging practices
Change history | Source code | Issue reports
Development knowledge

Mining development knowledge to
understand and support logging practices
Developers’ logging concerns? [TSE under review]
Where to log? [EMSE 2018]
When to update log? [EMSE 2017]
How to log? (Error, Warn, Info) [EMSE 2017]

Developers communicate their logging
concerns in issue reports
Logging cost: performance overhead
Remove a logging statement

Developers communicate their logging
concerns in issue reports
Add a logging statement
Logging benefit:
exposing runtime problems

We study logging-related issue reports to
understand developers’ logging concerns
Logging
issue
reports
Logging
concerns
Automated
& manual
filtering
Qualitative
analysis

What are developers’ logging concerns?
Logging Benefits (research opportunities: leverage)
§ Assisting in debugging
§ Providing runtime perf
§ Exposing runtime problems
§ Bookkeeping
§ Showing execution progress
Logging Costs (research opportunities: minimize)
§ Excessive log information
§ Exposing unnecessary details
§ Misleading end users
§ Performance overhead
§ Exposing sensitive info
[Chart axis: frequency of each concern]

Developers’ logging concerns? [TSE under review]
10 categories of logging concerns (e.g., misleading users)

Some code topics are more likely to need
logging statements
Examples of JIRA issues that require developers to log
the topic of “connections”
[EMSE 2018]

Can code topics explain where to log?
Topic: “connection”
Logging statement
[EMSE 2018]
We extract the code topics and logging statements for
each code snippet (method level)

We use LDA to extract code topics
Logging statement
[EMSE 2018]
Tokenization
Topic model
(LDA)
queue, connection
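The tokenization step that feeds the topic model can be sketched as below. The slide only names the step, so the concrete rules here (splitting on non-letters and on camelCase boundaries, lowercasing, dropping single-letter tokens) are assumptions for illustration, and `CodeTokenizerSketch` is a hypothetical name. The LDA step itself is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical preprocessing sketch: turns a code snippet into the bag
// of words that a topic model such as LDA would consume.
class CodeTokenizerSketch {
    // Split on runs of non-letters and on lower-to-upper camelCase
    // boundaries (both rules are assumptions, not taken from the slides).
    static List<String> tokenize(String code) {
        List<String> tokens = new ArrayList<>();
        for (String part : code.split("[^A-Za-z]+|(?<=[a-z])(?=[A-Z])")) {
            if (part.length() > 1) { // drop empty strings and single letters
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [connection, queue, close, connection, server, port]
        System.out.println(tokenize("connectionQueue.closeConnection(serverPort);"));
    }
}
```

Words such as "queue" and "connection" then become candidate topic terms for the method that contains them.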

A small number of topics are much more
likely to be logged
Logging statement
The most log-intensive topics usually capture
communication between machines (e.g., “connection”) or
interactions between threads (e.g., “thread interruption”)
[EMSE 2018]

We combine both the structure and topic
info to explain where to log
Logging statement
Structure info: lines of
code, complexity, control
flow statements, etc.
LASSO model
[EMSE 2018]

Code topics bring additional explanatory
power (up to 13% AUC improvement)
[Figure: The performance (AUC) of our LASSO models, comparing "Structure info" with "Structure & topic info" against a random-guess baseline. The plotted AUC values range from 0.80 to 0.99.]
[EMSE 2018]
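For readers unfamiliar with the metric, AUC can be computed as the fraction of (positive, negative) pairs that a model ranks correctly, which is why a random guess scores about 0.5. The sketch below uses made-up scores and labels, and `AucSketch` is a hypothetical name; it is not the evaluation code behind the reported numbers.

```java
// Minimal AUC sketch using the pairwise (Mann-Whitney) definition.
class AucSketch {
    // AUC = probability that a randomly chosen positive instance gets a
    // higher score than a randomly chosen negative one (ties count 0.5).
    static double auc(double[] scores, boolean[] labels) {
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!labels[i]) continue;          // i iterates over positives
            for (int j = 0; j < scores.length; j++) {
                if (labels[j]) continue;       // j iterates over negatives
                pairs++;
                if (scores[i] > scores[j]) wins += 1.0;
                else if (scores[i] == scores[j]) wins += 0.5;
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        // Made-up model scores: "true" marks a method that is actually logged.
        double[] scores = {0.9, 0.8, 0.4, 0.3};
        boolean[] labels = {true, false, true, false};
        System.out.println(auc(scores, labels)); // 3 of 4 pairs ranked correctly
    }
}
```

The pairwise loop is quadratic and meant only to make the definition concrete; rank-based formulas give the same value more efficiently.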

Where to log? [EMSE 2018]
Logging varies across code topics

Developers have difficulty making
appropriate log changes
Developers usually forget to change logging code when
they change their code; in many cases, logging code is
written as “after-thoughts” after a failure happens and
logs are needed [Yuan et al., 2012]
Commit n Commit n+1
Code
changes
Log
changes
Version k
Debugging
difficulties
Code change history
Maintenance
efforts

Learning from the code change history to
provide log change suggestions
[EMSE 2017]
Code Code Log Code Log
?
Commit 1 Commit 2 Commit n…
Code changes
without log
changes
Code changes
with log
changes
Do we need to
change logs?
Code change history
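One way to label historical commits as "with log changes" or "without log changes" is to scan the added and removed lines of each diff for logging calls. The sketch below assumes loggers are named `log` or `logger` (in any case) and use the six level methods; the class name, the regular expression, and the sample diffs are all hypothetical and not taken from the study.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical labeling sketch for mining a code change history.
class LogChangeLabelSketch {
    // Assumed naming convention: a logger called log/logger (any case)
    // followed by one of the six level methods.
    private static final Pattern LOGGING_CALL = Pattern.compile(
            "\\b(?:log|logger)\\.(?:trace|debug|info|warn|error|fatal)\\s*\\(",
            Pattern.CASE_INSENSITIVE);

    // Added/removed lines start with + or -, excluding the file headers.
    static boolean isChangedLine(String diffLine) {
        boolean header = diffLine.startsWith("+++") || diffLine.startsWith("---");
        return !header && (diffLine.startsWith("+") || diffLine.startsWith("-"));
    }

    // True when at least one added or removed line contains a logging call.
    static boolean hasLogChange(List<String> diffLines) {
        for (String line : diffLines) {
            if (isChangedLine(line) && LOGGING_CALL.matcher(line).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> withLogChange = Arrays.asList("- LOG.info(msg);", "+ LOG.warn(msg);");
        List<String> withoutLogChange = Arrays.asList("+ int port = 8032;", "  LOG.info(msg);");
        System.out.println(hasLogChange(withLogChange));    // true
        System.out.println(hasLogChange(withoutLogChange)); // false (logging call is context only)
    }
}
```

Labels produced this way could serve as the target variable when training a classifier on past commits.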

LOG?
Providing automated suggestions for log
changes when developers change the code
Random Forest
Classifier
Log change
suggestions
Three dimensions
25 metrics
Change
metrics
Historical
metrics
Product
metrics
[EMSE 2017]
Code

Our models can effectively suggest whether
a log change is needed
[Figure: The performance (AUC) of our Random Forest models against a random-guess baseline. The plotted AUC values are 0.84, 0.91, 0.86, and 0.88.]
[EMSE 2017]

LOG?
The source code and code changes are
important for explaining log changes
Log change
suggestions
Three dimensions
25 metrics
Change
metrics
Historical
metrics
Product
metrics
[EMSE 2017]
Code
Explain

When to update log? [EMSE 2017]
The source code & code changes can explain log changes

Log levels are used to disable some verbose
log messages while enabling important ones
More verbose levels (lower levels)
Trace
Debug
Info
Warn
Error
Fatal
Less verbose levels (higher levels)
Log.error("message"): "error" is the log level
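The filtering behavior of log levels can be sketched with an ordered enum: a statement's message is emitted only when its level is at or above the configured threshold. The `Level` enum mirrors the six levels on the slide; the class name and the `isEnabled` helper are hypothetical and not the API of any particular logging library.

```java
// Hypothetical sketch of level-based filtering.
class LogLevelSketch {
    // Declared from most verbose (lowest) to least verbose (highest).
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR, FATAL }

    // A statement is enabled when its level is at or above the threshold.
    static boolean isEnabled(Level statementLevel, Level threshold) {
        return statementLevel.compareTo(threshold) >= 0;
    }

    public static void main(String[] args) {
        Level threshold = Level.WARN; // e.g., a production configuration
        for (Level level : Level.values()) {
            String state = isEnabled(level, threshold) ? "enabled" : "disabled";
            System.out.println(level + " -> " + state);
        }
    }
}
```

With the threshold at WARN, the sketch reports Trace, Debug, and Info as disabled and Warn, Error, and Fatal as enabled, which is why choosing too low a level hides a message and too high a level adds noise.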

Improper log levels can have many
negative impacts
“…tends to generate a lot
of log noise…”
“These warnings worry
users”
Developers spend much effort adjusting log levels
[Yuan et al., 2012]

Learning from the code change history to
provide log level suggestions
[EMSE 2017]
Commit 1 Commit 2 Commit n…
Code change history
Log.warn(msg) Log.info(msg) Log.?(msg)
Log.error(msg)
Which log level
to use?

Providing automated suggestions for log
levels when developers add logging code
Logging statement metrics
Containing block metrics
Containing file metrics
Code change metrics
Historical change metrics
Trace
Debug
Info
Warn
Error
Fatal
Ordinal
Regression
Model
[EMSE 2017]
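To illustrate how an ordinal regression model turns a single score into a probability for each ordered level, the sketch below uses the proportional-odds (cumulative logit) form, which is one common formulation and only an assumption about the model behind the slide. The cutpoints and scores are made up, and `OrdinalLevelSketch` is a hypothetical name.

```java
// Hypothetical proportional-odds sketch for the six ordered log levels.
class OrdinalLevelSketch {
    static final String[] LEVELS = {"trace", "debug", "info", "warn", "error", "fatal"};

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // P(level <= j) = sigmoid(cutpoints[j] - score); each level's
    // probability is the difference between consecutive cumulative values.
    static double[] levelProbabilities(double score, double[] cutpoints) {
        double[] probs = new double[cutpoints.length + 1];
        double previous = 0.0;
        for (int j = 0; j < cutpoints.length; j++) {
            double cumulative = sigmoid(cutpoints[j] - score);
            probs[j] = cumulative - previous;
            previous = cumulative;
        }
        probs[cutpoints.length] = 1.0 - previous;
        return probs;
    }

    // Index of the most probable level.
    static int mostLikely(double[] probs) {
        int best = 0;
        for (int i = 1; i < probs.length; i++) {
            if (probs[i] > probs[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] cutpoints = {-2.0, -1.0, 0.0, 1.0, 2.0}; // made-up, increasing
        // The score would be a weighted sum of metrics; values here are made up.
        System.out.println(LEVELS[mostLikely(levelProbabilities(3.0, cutpoints))]);  // fatal
        System.out.println(LEVELS[mostLikely(levelProbabilities(-3.0, cutpoints))]); // trace
    }
}
```

Because the levels share one score and differ only in their cutpoints, the model respects the ordering from Trace to Fatal instead of treating the levels as unrelated classes.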

Ordinal regression models can effectively
model log levels
[Figure: The performance (AUC) of our Ordinal Regression Models against a random-guess baseline. The plotted AUC values are 0.76, 0.78, 0.81, and 0.75.]
[EMSE 2017]

The content of a logging statement and the
containing block/file explain its log level
Logging statement metrics
Containing block metrics
Containing file metrics
Code change metrics
Historical change metrics
Trace
Debug
Info
Warn
Error
Fatal
[EMSE 2017]
Explain

How to log? [EMSE 2017]
The log content & containing blocks/files can explain log levels

Developers’ logging concerns? [TSE under review]
10 categories of logging concerns (e.g., misleading users)
Where to log? [EMSE 2018]
Logging varies across code topics
When to update log? [EMSE 2017]
The source code & code changes can explain log changes
How to log? (Error, Warn, Info) [EMSE 2017]
The log content & containing blocks/files can explain log levels

References
§ Fu, Q., Lou, J. G., Lin, Q., Ding, R., Zhang, D., and Xie, T. (2013). Contextual analysis of program logs for
understanding system behaviors. In Proceedings of the 10th Working Conference on Mining Software
Repositories, MSR ’13, pages 397–400.
§ Xu, W., Huang, L., Fox, A., Patterson, D., and Jordan, M. I. (2009). Detecting large-scale system problems by
mining console logs. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles,
SOSP ’09, pages 117–132.
§ Yuan, D., Mai, H., Xiong, W., Tan, L., Zhou, Y., and Pasupathy, S. (2010). Sherlog: Error diagnosis by connecting
clues from run-time logs. In Proceedings of the 15th International Conference on Architectural Support for
Programming Languages and Operating Systems, ASPLOS ’10, pages 143–154.
§ Yuan, D., Park, S., and Zhou, Y. (2012). Characterizing logging practices in open source software. In Proceedings
of the 34th International Conference on Software Engineering, ICSE ’12, pages 102–112.
§ Chen, B. and Jiang, Z. M. J. (2017). Characterizing logging practices in Java-based open source software projects
– a replication study in Apache Software Foundation. Empirical Software Engineering, 22(1):330–374.
§ Shang, W., Jiang, Z. M., Adams, B., Hassan, A. E., Godfrey, M. W., Nasser, M., and Flora, P. (2014). An
exploratory study of the evolution of communicated information about the execution of large software
systems. Journal of Software: Evolution and Process, 26(1):3–26.
§ Fukushima, T., Kamei, Y., McIntosh, S., Yamashita, K., and Ubayashi, N. (2014). An empirical study of just-in-time
defect prediction using cross-project models. In Proceedings of the 11th Working Conference on Mining
Software Repositories, MSR 2014, pages 172–181.

Literature review
§ Mining log messages
§ Mining logging code
§ Improving logging code

Mining log messages
Understanding runtime behaviors
[Fu et al., 2013; Hassan et al., 2008; Shang et al., 2013]
Detecting anomaly conditions
[Xu et al., 2008, 2009; Fu et al., 2009; Jiang et al., 2008]
Diagnosing system failures
[Yuan et al., 2010; Syer et al., 2013]
Prior work highlights the importance of improving
logging quality

Mining logging code
Logging practices in open source projects
[Yuan et al., 2012; Chen and Jiang, 2017]
Logging practices in industry
[Fu et al., 2014; Pecchia et al., 2015]
Evolution of logging code
[Shang et al., 2011; Kabinna et al., 2016]
Developers spend much effort maintaining their logging code
Software logging is a common practice

Improving logging code: proactive logging
Proactively adding logging info in the source code
[Yuan et al., 2011, 2012; Zhao et al., 2017]
Producing excessive log information
Developers’ expertise and concerns are not considered

Improving logging code: learning to log
Learning statistical models to suggest where to log
[Zhu et al., 2015; Lal and Sureka, 2016; Jia et al., 2018]
Ignoring logging patterns (e.g., log level, stack trace)
Focusing on one dimension of development knowledge (source code)
Providing logging suggestions as a post-development process

Logging stack traces can grow log files
very fast
Log.warn(msg): logging a log message
Log.warn(msg, e): logging a log message + full stack trace
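The size difference between the two calls can be made concrete by rendering both forms as text. The sketch below only mimics what a logger would write: `StackTraceSizeSketch` and its two helper methods are hypothetical, and the exception is made up for illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch comparing the output of Log.warn(msg) and Log.warn(msg, e).
class StackTraceSizeSketch {
    // Mimics Log.warn(msg): the message only.
    static String messageOnly(String msg) {
        return msg;
    }

    // Mimics Log.warn(msg, e): the message followed by the full stack trace.
    static String messageWithTrace(String msg, Throwable e) {
        StringWriter buffer = new StringWriter();
        e.printStackTrace(new PrintWriter(buffer, true));
        return msg + System.lineSeparator() + buffer;
    }

    public static void main(String[] args) {
        String msg = "Connection reset";
        Throwable e = new java.io.IOException("socket closed"); // made-up failure
        System.out.println(messageOnly(msg).length() + " characters without the stack trace");
        System.out.println(messageWithTrace(msg, e).length() + " characters with the stack trace");
    }
}
```

Every extra stack frame adds a line to the second form, so a frequently executed statement that logs the exception object can dominate the log file.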

Developers have difficulty deciding
whether to log stack traces
Missing stack trace
Improper logging
of stack trace

Learning from existing source code to
suggest whether to log a stack trace
Source
code
Source
code
Log(msg) Log(msg, e)
Source
code
Log(msg, ?)
Random Forest
Classifier
Log the
stack trace?
Six dimensions of
features
Log(msg, e)

Our models can effectively suggest whether
a stack trace is needed
[Figure: The performance (AUC) of our Random Forest models against a random-guess baseline. The plotted AUC values are 0.85, 0.94, 0.9, and 0.86.]
