Spell: Streaming Parsing of System Event Logs
Abstract—System event logs have been frequently used as a valuable resource in data-driven approaches to enhance system health and stability. A typical procedure in system log analytics is to first parse unstructured logs, and then apply data analysis to the resulting structured data. Previous work on parsing system event logs focused on offline, batch processing of raw log files. But increasingly, applications demand online monitoring and processing. We propose an online streaming method Spell, which utilizes a longest common subsequence based approach, to parse system event logs. We show how to dynamically extract log patterns from incoming logs and how to maintain a set of discovered message types in streaming fashion. Evaluation results on large real system logs demonstrate that even compared with the offline alternatives, Spell shows its superiority in terms of both efficiency and effectiveness.

I. INTRODUCTION

The increasing complexity of modern computer systems has become a significant limiting factor in deploying and managing them. Being able to be alerted to and mitigate a problem right away has become a fundamental requirement in many systems. As a result, automatically detecting anomalies as they happen, in an online fashion, is an appealing solution. Data-driven methods are heavily employed to understand complex system behaviors, for example, exploring machine data for automatic pattern discovery and anomaly detection [1]. System logs, as a universal data source that contains important information such as usage patterns, execution paths, and program running status, are valuable assets in assisting these data-driven system analytics, in order to gain insights that are useful to enhance system health, stability, and usability.

The effectiveness of system log mining has been validated by recent literature. Logs can be used to detect execution anomalies [2], [3], [4], monitor network failures [5], or even find software bugs [6]. Researchers have also used system logs to discover and diagnose performance problems [7]. Recently, untangling the interleaved event logs from concurrent systems has also become a hot topic of research [8].

To alleviate the pain of diving into massive unstructured log data, in most prior work, the first and foremost step is to automatically parse the unstructured system logs into structured data [2], [3], [4], [6]. There has been substantial study on how to achieve this, for example, using regular expressions [8], leveraging the source code [6], or parsing purely based on system log characteristics using data mining approaches such as clustering and iterative partitioning [2], [9], [10], [11]. Nevertheless, except for the approach that uses regular expressions, which requires domain-specific expert knowledge [8] and hence does not work for general purpose system log parsing, and the approach that leverages the source code [12], which is often unavailable, none of the previous methods could achieve online parsing in a streaming fashion. Some work claimed "online" processing, but with the requirement of doing some extensive offline processing first, and only then matching log entries with the data structures and patterns identified through the offline, batched process [13].

There is also an increasing demand to properly manage and store system logs [14]. A log management system typically has a log shipper installed on each node to forward log entries to a centralized server, which often contains a log parser, a log indexer, a storage engine and a user interface. In such systems the default log parser only parses simple schema information such as timestamp and hostname. The log entry itself is treated as an unstructured text value. An online structured approach that could parse the event logs into structured data would make the logs much easier to query, summarize and aggregate.

Log entries are produced by the "print" statements in a system program's source code. As such, we can view a log entry as a collection of ("message type", "parameter value") pairs. For example, a log printing statement printf("File %d finished.", id); contains a constant message type File finished and a variable parameter value which is the file id. Hence, the goal of a structured log parser is to identify the message type File * finished, where * stands for the placeholder for variables (parameter values).

Contributions. In this paper, we propose Spell, a structured Streaming Parser for Event Logs using an LCS (longest common subsequence) based approach. Spell parses unstructured log messages into structured message types and parameters in an online streaming fashion. The time complexity to process each log entry e is close to linear (in the size of e).

With the streaming, real-time message type and parameter extraction produced by Spell, not only does it provide a concise, intuitive summary for the end users, but the logs are also represented by clean structured data to be processed and analyzed further using advanced data analytics methods by downstream analysts. Using two state-of-the-art offline methods to automatically extract message types and parameters from raw log files as the competing baseline, our study shows that even compared with the offline methods, Spell still outperforms them in terms of both efficiency and effectiveness.

The rest of this paper is organized as follows. Section II provides the problem formulation and a literature survey. Section III presents our streaming Spell algorithm and a number of optimizations. Section IV evaluates our method using large real system logs. Finally, Section V concludes the paper and Section VI is our acknowledgment.

II. PRELIMINARY AND BACKGROUND

A. Problem formulation

System event logs are a universal resource that exists practically in any system. We use system event logs to denote the free-text audit trace generated by the system execution (typically in the /var/log folder). A log message or a log record/entry refers to one line in the log file, which is produced by a log printing statement in the source code of a user or kernel program running on or inside the system.

Our goal is to parse each log entry e into a collection of message types (and the corresponding parameter values). Here each message type in e has a one-to-one mapping with a log printing statement in the source code producing the raw log entry e. For example, a log printing statement:

printf("Temperature %s exceeds warning threshold\n", tmp);

may produce several log entries such as:

Temperature (41C) exceeds warning threshold

where the parameter value is 41C, and the message type is:

Temperature * exceeds warning threshold.

Formally, a structured log parser is defined as follows:

Definition 1 (Structured Log Parser) Given an ordered set of log entries (ordered by timestamps), log = {e1, e2, ..., eL}, that contains m distinct message types produced by m different log printing statements from p different programs, where the values of m and p (and the printing statements and the program source code) are unknown, a structured log parser is to parse log and produce all message types from those m statements.

A structured log parser is the first and foremost step for most automatic and smart log mining and data-driven log analytics solutions, and also a useful and critical step for managing logs in a log management system. Our objective is to design a streaming structured log parser such that it makes only one pass over the log and processes each log entry in an online, streaming fashion continuously. Without loss of generality, we assume that the size of each log entry is O(n) words.

B. Related work

Mining interesting patterns from raw system logs has been an active research field for over a decade. Two major efforts in this area include generating features from raw logs to apply various data analytics, e.g. [3], [4], [6], and building execution models from system logs followed by comparing them with future system executions, e.g. [2]. There are also efforts in identifying dependencies from concurrent logs [3], [4], [8].

To achieve effective data-driven log analytics, the first and foremost process is to turn unstructured logs into structured data. Xu et al. [6] used the schema from log printing statements in the original programs' source code to extract message types. In [8], the raw logs are parsed using pre-defined, domain-specific regular expressions. There are efforts to make this process more automatic and more accurate. Fu et al. [2] proposed a method to first cluster log entries using pairwise weighted edit distance, and then perform recursive splitting. IPLoM [9], [15] explored several heuristics to iteratively partition system logs, such as log size and the bipartite relationship between words in the same log message. LogTree [10] utilized the format information of raw logs and applied a tree structure to extract system events from raw logs. LogSig [11] generates system events from textual log messages by searching for the most representative message signatures. HELO [13] extracts constants and variables from message bodies, by first using an offline classification step and then performing online clustering based on the template set by the first step. HLAer [16] is a heterogeneous log analysis system which utilizes a hierarchical clustering approach with pairwise log similarity measures to assist log formatting. All previous structured log parsing methods focus on offline batched processing, or on matching new log entries with previously offline-extracted message types or regular expressions (e.g., from source code).

There are also commercial and open source software products for log management and analysis. Splunk is a leading log management system that offers a suite of solutions to find useful information from machine data. Elastic Stack offers a rich set of open-sourced tools that can gather logs from distributed nodes, and then index and store them for users to query/visualize. All these tools provide an interface to parse logs upon their arrival. However, their parsers are based on regular expressions defined by end users. The system itself can only parse a very simple and basic structured schema such as timestamp and hostname, while log messages are treated as unstructured text values.

III. Spell: STREAMING STRUCTURED LOG PARSER

We now present Spell, a streaming structured log parser for system event logs. Since a basic building block for Spell is a longest common subsequence (LCS) algorithm, Spell stands for Streaming structured Parser for Event Logs using LCS. In what follows, we first review the LCS problem.

A. The LCS problem

Suppose Σ is a universe of alphabets (e.g., a-z, 0-9). Given any sequence α = {a1, a2, ..., am}, such that ai ∈ Σ for 1 ≤ i ≤ m, a subsequence of α is defined as {ax1, ax2, ..., axk}, where ∀xi, xi ∈ Z+, and 1 ≤ x1 < x2 < ... < xk ≤ m. Let β = {b1, b2, ..., bn} be another sequence such that bj ∈ Σ for j ∈ [1, n]. A subsequence γ is called a common subsequence of α and β iff it is a subsequence of each. The longest common subsequence (LCS) problem for input sequences α and β is to find the longest such γ. For instance, sequence {1, 3, 5, 7, 9} and sequence {1, 5, 7, 10} yield an LCS of {1, 5, 7}.

We observe that an LCS-based method can be developed to efficiently and effectively extract message types from raw system logs. This is a seemingly natural idea, yet it has not been explored by existing literature. Our key observation is that, if we view the output of a log printing statement (which is a log entry) as a sequence, in most log printing statements, the constant that represents a message type often takes a majority part of the sequence, and the parameter values take only a small portion. If two log entries are produced by the same log printing statement stat, but only differ by having different parameter values, the LCS of the two sequences is very likely to be the constant in the code stat, implying a message type.
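To illustrate this observation, here is a minimal sketch (ours, not the paper's optimized implementation) of the textbook dynamic-programming LCS over token sequences, applied to two log entries produced by the same printf statement; the LCS recovers the constant part of the message type:

```python
def lcs(a, b):
    """Return one longest common subsequence of token lists a and b (O(mn) DP)."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] \
                else max(dp[i + 1][j], dp[i][j + 1])
    # Backtrack to recover the subsequence itself.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

e1 = "Temperature (41C) exceeds warning threshold".split()
e2 = "Temperature (43C) exceeds warning threshold".split()
print(lcs(e1, e2))  # ['Temperature', 'exceeds', 'warning', 'threshold']
```

Only the parameter tokens (41C)/(43C) drop out of the LCS, leaving the constant from the print statement.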
[Figure: an example of how Spell maintains LCSObjects in streaming fashion. The first log entry "Temperature (41C) exceeds warning threshold" creates an LCSObject with LCSseq "Temperature (41C) exceeds warning threshold" and lineIds {0}. When "Temperature (43C) exceeds warning threshold" arrives, the LCSseq is refined to "Temperature * exceeds warning threshold" and lineIds becomes {0, 1}. An entry from a different statement, e.g. "Command has completed successfully", has no sufficient overlap with any existing LCSObject and creates a new one.]
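The streaming maintenance sketched in the figure above can be written compactly. This is our simplified sketch, not the paper's full algorithm (which adds a prefix tree and other pre-filtering optimizations); the names `LCSObject`, `lcsseq`, and `line_ids` follow the figure, the LCS helper is the textbook DP, and we assume a match requires an LCS of at least τ = 0.5 times the new entry's length:

```python
def lcs(a, b):
    """Textbook DP for one longest common subsequence of token lists."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] \
                else max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

class LCSObject:
    def __init__(self, tokens, line_id):
        self.lcsseq = tokens        # current message type candidate
        self.line_ids = [line_id]   # log lines mapped to this type

def refine(seq, common):
    """Replace tokens of seq that fell out of the LCS with '*'."""
    out, k = [], 0
    for tok in seq:
        if k < len(common) and tok == common[k]:
            out.append(tok); k += 1
        else:
            out.append('*')
    return out

def process(objects, entry, line_id, tau=0.5):
    """Match a new log entry against existing LCSObjects, or create one."""
    tokens = entry.split()
    best, best_common = None, []
    for obj in objects:             # the basic (naive) loop over all types
        common = lcs(obj.lcsseq, tokens)
        if len(common) > len(best_common):
            best, best_common = obj, common
    if best is not None and len(best_common) >= tau * len(tokens):
        best.lcsseq = refine(best.lcsseq, best_common)
        best.line_ids.append(line_id)
    else:                           # no sufficiently similar type: new one
        objects.append(LCSObject(tokens, line_id))
```

Feeding the figure's entries through `process` yields one LCSObject whose `lcsseq` becomes `Temperature * exceeds warning threshold` with `line_ids` [0, 1], plus a separate LCSObject for `Command has completed successfully`.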
IV. EVALUATION

In this section, we evaluate the efficiency and effectiveness of Spell, by comparing it with two popular offline log parsing algorithms, on 2 real log datasets with different formats. All experiments were executed on a Linux machine with an 8-core Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz. We'll show that Spell not only is able to parse logs in an online streaming fashion, but also outperforms the competing offline methods in terms of both efficiency and effectiveness.

The two offline algorithms to be compared are IPLoM [9], [15] and a clustering-based log parser [2] which we refer to as CLP. The idea of IPLoM is to partition the entire log into multiple clusters, where each cluster represents a set of log entries printed by the same print statement. The partition is done using a simple 3-step heuristic: i) partition by each log record length; ii) partition each cluster by the token position having the least distinct tokens; iii) partition by the bipartite mapping between tokens in each cluster. It is so far the most lightweight automatic log parsing algorithm. CLP, on the other hand, is a frequently used algorithm in multiple log mining efforts [2], [3], [4]. It also partitions the log into clusters, but by first clustering using weighted edit distance, and then repeatedly partitioning until all clusters satisfy the heuristic that each position either has the same token, or is a parameter position.

TABLE I
PARAMETERS FOR ALL THREE ALGORITHMS

Spell:   message type threshold τ = 0.5
CLP:     edit distance weight ν = 10; cluster threshold ς = 5;
         private contents threshold % = 4
IPLoM:   file support threshold = 0.01; partition support threshold = 0;
         lower bound = 0.1; upper bound = 0.9;
         cluster goodness threshold = 0.34

Table I shows the default values of key parameters used for each algorithm. For parameters with recommended values that were clearly stated in the original papers, such as all parameters for IPLoM [9], we simply adopt those values. For others that were not clearly specified, we tested the corresponding method with different values until we got the best result (for the same log data) as in the original paper.

We use the supercomputer logs that were commonly used for evaluation by previous work [9], [13], [15], [16], shown in Table II (count is the total number of log entries).

TABLE II
LOG DATASETS

Log type               Count       Message type ground truth
Los Alamos HPC log1    433,490     available online2
BlueGene/L log1        4,747,963   available online3

A. Efficiency of Spell

Figure 3 shows the total runtime of different methods as the log size (the number of log records) grows. Note that we tested different alternatives of the Spell method:
• Spell (naive LCS): compute the LCS using DP between the new log entry and every existing message type.
• Spell: Spell with the pre-filtering step.

[Fig. 3. Efficiency comparison of different methods: runtime (seconds) vs. log size (×10^5) on the Los Alamos log (left, log scale) and the Blue Gene log (right, linear scale).]

Figure 3 left shows the results on the Los Alamos HPC Log. Note that runtime is measured on a logarithmic scale. To parse the entire log with 433,490 entries, Spell with naive LCS takes about 75 seconds while it's only 9 seconds with pre-filtering. IPLoM shows the best efficiency, whereas Spell (with pre-filtering) is only slightly slower (within seconds). The CLP method has the worst efficiency (2-4 orders of magnitude slower than IPLoM and Spell). We tested two variants of CLP: 1) CLP (auto threshold): it automatically sets the cluster threshold ς by k-means clustering. When the log size is bigger than 100,000, it's already too slow to run to completion. 2) CLP (fixed threshold): it uses a fixed threshold 5 calculated from a smaller log file, which significantly improves the runtime. However, it's still much slower than the other methods. In later experiments we only use CLP with fixed threshold if applicable.

Figure 3 right shows the results on the Blue Gene Log. The runtime in this figure is measured on a normal decimal scale. We didn't include CLP in this experiment: even CLP with fixed threshold is too slow to finish, as the Blue Gene log has nearly 5 million entries. Here the advantage of our pre-filtering step is clearly demonstrated. In particular, Spell with pre-filtering has outperformed IPLoM in terms of efficiency. With the prefix tree, when the log size grows much faster than the number of message types, most log entries will find a match in the prefix tree and return immediately. Then, for the majority of the rest, the message types can be found using the simple loop approach. Only for the small number of log records that are not matched in the pre-filtering step do we compare them with each existing message type. Noticeably, the runtime of Spell (naive LCS) increases exponentially. That's because when the log size grows bigger, more message types also show up, and each new log entry may need to be compared with a larger number of message types. This result clearly shows the importance of the pre-filtering step and how it has effectively mitigated the efficiency issues in the basic Spell method.

TABLE III
AMORTIZED COST OF EACH MESSAGE TYPE LOOKUP STEP IN Spell

Lookup step        Los Alamos HPC log   BlueGene/L log
prefix tree (ms)   0.006                0.011
simple loop (ms)   0.020                0.087
naive LCS (ms)     0.175                0.580
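The first two lookup steps of this pre-filtering cascade can be sketched as follows. This is our simplified illustration, not the paper's exact data structures; we assume message types are token sequences in which `*` matches any single token:

```python
class Node:
    """Prefix-tree node; a '*' child matches any single token."""
    def __init__(self):
        self.children = {}
        self.type_id = None        # set when a message type ends here

def insert(root, mtype, type_id):
    """Index a message type (list of tokens, '*' for parameters)."""
    node = root
    for tok in mtype:
        node = node.children.setdefault(tok, Node())
    node.type_id = type_id

def prefix_lookup(root, tokens):
    """Step 1: a single O(n) walk; prefer an exact child, else follow '*'."""
    node = root
    for tok in tokens:
        nxt = node.children.get(tok) or node.children.get('*')
        if nxt is None:
            return None
        node = nxt
    return node.type_id

def simple_loop(types, tokens):
    """Step 2: return the first type whose constant tokens appear,
    in order, as a subsequence of the new entry's tokens."""
    for type_id, mtype in enumerate(types):
        it = iter(tokens)          # 'tok in it' consumes the iterator,
        if all(tok == '*' or tok in it for tok in mtype):  # so order matters
            return type_id
    return None
```

With the message type `Temperature * exceeds warning threshold` indexed, a new entry `Temperature (43C) exceeds warning threshold` is resolved by `prefix_lookup` in one walk; only entries missed by both steps fall through to the expensive full LCS comparison.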
1 CFDR Data, https://www.usenix.org/cfdr-data
2 Los Alamos National Lab HPC Log message types, https://web.cs.dal.ca/~makanju/iplom/hpc-clusters.txt
3 BlueGene/L message types, https://web.cs.dal.ca/~makanju/iplom/bgl-clusters.txt

The amortized cost for each log entry to find its message type using the different lookup methods in the pre-filtering step is shown in Table III (in milliseconds). Recall that for each log entry, Spell first tries to find its message type in the prefix tree, then the simple loop, and finally uses naive LCS if it is not found in the previous two steps. Table IV shows the number (percentage) of log entries that are returned in each step, showing that over 91% could be processed in the prefix tree in O(n) time, and over 99.9% in total could be processed by the prefix tree and simple loop combined. The expensive naive LCS computation is only applied to less than 0.1% of log entries. Hence much overhead is reduced by the pre-filtering step. We'll show later that it provides almost identical results to the costly naive LCS method.

TABLE IV
NUMBER (PERCENTAGE) OF LOG ENTRIES RETURNED BY EACH STEP

Step          Los Alamos HPC log    BlueGene/L log
prefix tree   397,412 (91.68%)      4,457,719 (93.89%)
simple loop   35,691 (8.23%)        288,254 (6.07%)
naive LCS     387 (0.09%)           1,990 (0.042%)

B. Effectiveness of Spell

In this section we evaluate the effectiveness of Spell. After parsing, the log file is processed into multiple clusters, where each cluster represents one message type with the associated log records (as produced by the corresponding log parsing method). A parsed message type is considered correct if all and only the log records printed by that message type (as identified through the ground truth) are clustered together. We run each method, compare the results with the ground truth generated by matching each log entry with its true message type from Table II, and calculate the accuracy, which is the number of log entries that are parsed to correct message types over the total number of processed log records.

[Fig. 4. Effectiveness comparison of different methods: accuracy vs. log size (×10^5) for Spell, CLP (fixed threshold), and IPLoM on the Los Alamos log (left) and the Blue Gene log (right).]

Figure 4 shows the comparison on the supercomputer logs. With more log entries, the number of message types also increases, and they don't necessarily show up uniformly over time. Hence, the effectiveness of a method does not necessarily show a steady trend as the log size grows. We can see that in both charts, Spell achieves much better accuracy than the other methods. IPLoM's accuracy is acceptable in Figure 4 left for the Los Alamos log, and becomes terrible in Figure 4 right for the Blue Gene log.

Note that the pre-filtering step in Spell may miss an existing message type t for a new log entry e if LCS(t, s) ≠ t but |LCS(t, s)| > |LCS(t′, s)| when there is another existing message type t′ that satisfies t′ = LCS(t′, s), where s is the token sequence of e. To evaluate such potential degradation of effectiveness due to the pre-filtering step, we show a comparison in Table V. The result shows that Spell with pre-filtering achieves an accuracy nearly the same as that using only naive LCS. This means the pre-filtering step has almost no downgrade effect on the parsing results, though it greatly reduces the parsing overhead.

TABLE V
COMPARISON OF Spell WITH AND WITHOUT PRE-FILTER

                      Los Alamos HPC log           BlueGene/L log
With pre-filtering    True types found  Accuracy   True types found  Accuracy
False                 55                0.822786   165               0.811798
True                  55                0.822786   164               0.811791

V. CONCLUSIONS

We present a streaming structured log parser, Spell, for parsing large system event logs in a streaming fashion. Spell works well for online system log mining and monitoring. It is also a valuable addition to modern log management systems to provide end users a concise, real-time understanding of the system states. We propose pre-filtering to improve Spell's efficiency. Experiments over real system logs have clearly demonstrated that Spell outperforms the state-of-the-art offline methods in terms of both efficiency and effectiveness.

VI. ACKNOWLEDGMENT

Min Du and Feifei Li were supported in part by grants NSF CNS-1314945 and NSF IIS-1251019. We wish to thank all members of the TCloud project and the Flux group for helpful discussion and valuable feedback.

REFERENCES

[1] M. Du and F. Li, "ATOM: Automated tracking, orchestration and monitoring of resource usage in infrastructure as a service systems," in IEEE BigData, 2015.
[2] Q. Fu, J.-G. Lou, Y. Wang, and J. Li, "Execution anomaly detection in distributed systems through unstructured log analysis," in ICDM, 2009.
[3] J.-G. Lou, Q. Fu, S. Yang, Y. Xu, and J. Li, "Mining invariants from console logs for system problem detection," in USENIX ATC, 2010.
[4] J.-G. Lou, Q. Fu, S. Yang, J. Li, and B. Wu, "Mining program workflow from interleaved traces," in SIGKDD, 2010.
[5] K. Yamanishi and Y. Maruyama, "Dynamic syslog mining for network failure monitoring," in SIGKDD, 2005.
[6] W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan, "Detecting large-scale system problems by mining console logs," in SOSP, 2009.
[7] K. Nagaraj, C. Killian, and J. Neville, "Structured comparative analysis of systems logs to diagnose performance problems," in NSDI, 2012.
[8] I. Beschastnikh, Y. Brun, M. D. Ernst, and A. Krishnamurthy, "Inferring models of concurrent systems from logs of their behavior with CSight," in ICSE, 2014.
[9] A. A. Makanju, A. N. Zincir-Heywood, and E. E. Milios, "Clustering event logs using iterative partitioning," in SIGKDD, 2009.
[10] L. Tang and T. Li, "LogTree: A framework for generating system events from raw textual logs," in ICDM, 2010.
[11] L. Tang, T. Li, and C.-S. Perng, "LogSig: Generating system events from raw textual logs," in CIKM, 2011.
[12] W. Xu, L. Huang, A. Fox, D. Patterson, and M. Jordan, "Online system problem detection by mining patterns of console logs," in ICDM, 2009.
[13] A. Gainaru, F. Cappello, S. Trausan-Matu, and B. Kramer, "Event log mining tool for large scale HPC systems," in Euro-Par, 2011.
[14] Z. Cao, S. Chen, F. Li, M. Wang, and X. S. Wang, "LogKV: Exploiting key-value stores for event log processing," in CIDR, 2013.
[15] A. Makanju, A. N. Zincir-Heywood, and E. E. Milios, "A lightweight algorithm for message type extraction in system application logs," TKDE, 2012.
[16] X. Ning, G. Jiang, H. Chen, and K. Yoshihira, "HLAer: A system for heterogeneous log analysis," in SDM Workshop on Heterogeneous Learning, 2014.
[17] Y. Li, H. Li, T. Duan, S. Wang, Z. Wang, and Y. Cheng, "A real linear and parallel multiple longest common subsequences (MLCS) algorithm," in SIGKDD, 2016.
[18] Y. Li, Y. Wang, Z. Zhang, Y. Wang, D. Ma, and J. Huang, "A novel fast and memory efficient parallel MLCS algorithm for longer and large-scale sequences alignments," in ICDE, 2016.