Spell: Streaming Parsing of System Event Logs
Abstract—System event logs have been frequently used as a valuable resource in data-driven approaches to enhance system health and stability. A typical procedure in system log analytics is to first parse unstructured logs, and then apply data analysis to the resulting structured data. Previous work on parsing system event logs focused on offline, batch processing of raw log files. But increasingly, applications demand online monitoring and processing. We propose an online streaming method Spell, which utilizes a longest common subsequence based approach, to parse system event logs. We show how to dynamically extract log patterns from incoming logs and how to maintain a set of discovered message types in streaming fashion. Evaluation results on large real system logs demonstrate that even compared with the offline alternatives, Spell shows its superiority in terms of both efficiency and effectiveness.

I. INTRODUCTION

The increasing complexity of modern computer systems has become a significant limiting factor in deploying and managing them. Being able to be alerted to and mitigate a problem right away has become a fundamental requirement in many systems. As a result, automatically detecting anomalies as they happen, in an online fashion, is an appealing solution. Data-driven methods are heavily employed to understand complex system behaviors, for example, exploring machine data for automatic pattern discovery and anomaly detection [1]. System logs, as a universal data source that contains important information such as usage patterns, execution paths, and program running status, are valuable assets in assisting these data-driven system analytics, in order to gain insights that are useful to enhance system health, stability, and usability.

The effectiveness of system log mining has been validated by recent literature. Logs can be used to detect execution anomalies [2], [3], [4], monitor network failures [5], or even find software bugs [6]. Researchers have also used system logs to discover and diagnose performance problems [7]. Recently, untangling the interleaved event logs from concurrent systems has also become a hot topic of research [8].

To alleviate the pain of diving into massive unstructured log data, in most prior work, the first and foremost step is to automatically parse the unstructured system logs into structured data [2], [3], [4], [6]. There has been substantial study on how to achieve this, for example, using regular expressions [8], leveraging the source code [6], or parsing purely based on system log characteristics using data mining approaches such as clustering and iterative partitioning [2], [9], [10], [11]. Nevertheless, except for the approach that uses regular expressions, which requires domain-specific expert knowledge [8] and hence does not work for general purpose system log parsing, and the approach that leverages the source code [12], which is often unavailable, none of the previous methods could achieve online parsing in a streaming fashion. Some work claimed "online" processing, but with the requirement of doing some extensive offline processing first, and only then matching log entries with the data structures and patterns identified through the offline, batched process [13].

There is also an increasing demand to properly manage and store system logs [14]. A log management system typically has a log shipper installed on each node to forward log entries to a centralized server, which often contains a log parser, a log indexer, a storage engine and a user interface. In such systems the default log parser only parses simple schema information such as timestamp and hostname. The log entry itself is treated as an unstructured text value. An online structured approach that could parse the event logs into structured data would make the logs much easier to query, summarize and aggregate.

Log entries are produced by the "print" statements in a system program's source code. As such, we can view a log entry as a collection of ("message type", "parameter value") pairs. For example, a log printing statement printf("File %d finished.", id); contains a constant message type File finished and a variable parameter value which is the file id. Hence, the goal of a structured log parser is to identify the message type File * finished, where * stands for the placeholder for variables (parameter values).

Contributions. In this paper, we propose Spell, a structured Streaming Parser for Event Logs using an LCS (longest common subsequence) based approach. Spell parses unstructured log messages into structured message types and parameters in an online streaming fashion. The time complexity to process each log entry e is close to linear (in the size of e).

With the streaming, real-time message type and parameter extraction produced by Spell, not only does it provide a concise, intuitive summary for the end users, but the logs are also represented by clean structured data to be processed and analyzed further using advanced data analytics methods by downstream analysts. Using two state-of-the-art offline methods to automatically extract message types and parameters from raw log files as the competing baseline, our study shows that even compared with the offline methods, Spell still outperforms them in terms of both efficiency and effectiveness.

The rest of this paper is organized as follows. Section II provides the problem formulation and a literature survey. Section III presents our streaming Spell algorithm and a number of optimizations. Section IV evaluates our method using large real system logs. Finally, Section V concludes the paper and Section VI is our acknowledgment.

II. PRELIMINARY AND BACKGROUND

A. Problem formulation

System event logs are a universal resource that exists practically in any system. We use system event logs to denote the free-text audit trace generated by the system execution (typically in the /var/log folder). A log message or a log record/entry refers to one line in the log file, which is produced by a log printing statement in the source code of a user or kernel program running on or inside the system.

Our goal is to parse each log entry e into a collection of message types (and the corresponding parameter values). Here each message type in e has a one-to-one mapping with a log printing statement in the source code producing the raw log entry e. For example, a log printing statement:

printf("Temperature %s exceeds warning threshold\n", tmp);

may produce several log entries such as:

Temperature (41C) exceeds warning threshold

where the parameter value is 41C, and the message type is:

Temperature * exceeds warning threshold.

Formally, a structured log parser is defined as follows:

Definition 1 (Structured Log Parser) Given an ordered set of log entries (ordered by timestamps), log = {e1, e2, ..., eL}, that contains m distinct message types produced by m different log printing statements from p different programs, where the values of m and p (and the printing statements and the program source code) are unknown, a structured log parser is to parse log and produce all message types from those m statements.

A structured log parser is the first and foremost step for most automatic and smart log mining and data-driven log analytics solutions, and also a useful and critical step for managing logs in a log management system. Our objective is to design a streaming structured log parser such that it makes only one pass over the log and processes each log entry in an online, streaming fashion continuously. Without loss of generality, we assume that the size of each log entry is O(n) words.

B. Related work

Mining interesting patterns from raw system logs has been an active research field for over a decade. Two major efforts in this area include generating features from raw logs to apply various data analytics, e.g. [3], [4], [6], and building execution models from system logs followed by comparing them with future system executions, e.g. [2]. There are also efforts in identifying dependencies from concurrent logs [3], [4], [8].

To achieve effective data-driven log analytics, the first and foremost process is to turn unstructured logs into structured data. Xu et al. [6] used the schema from log printing statements in the original programs' source code to extract message types. In [8], the raw logs are parsed using pre-defined, domain-specific regular expressions. There are efforts to make this process more automatic and more accurate. Fu et al. [2] proposed a method to first cluster log entries using pairwise weighted edit distance, and then perform recursive splitting. IPLoM [9], [15] explored several heuristics to iteratively partition system logs, such as log size and the bipartite relationship between words in the same log message. LogTree [10] utilized the format information of raw logs and applied a tree structure to extract system events from raw logs. LogSig [11] generates system events from textual log messages by searching for the most representative message signatures. HELO [13] extracts constants and variables from message bodies, by first using an offline classification step and then performing online clustering based on the template set by the first step. HLAer [16] is a heterogeneous log analysis system which utilizes a hierarchical clustering approach with pairwise log similarity measures to assist log formatting. All previous structured log parsing methods focus on offline batched processing, or on matching new log entries with previously offline-extracted message types or regular expressions (e.g., from source code).

There are also commercial and open source software products for log management and analysis. Splunk is a leading log management system that offers a suite of solutions to find useful information from machine data. Elastic Stack offers a rich set of open-sourced tools that can gather logs from distributed nodes, and then index and store them for users to query/visualize. All these tools provide an interface to parse logs upon their arrival. However, their parsers are based on regular expressions defined by end users. The system itself can only parse a very simple and basic structured schema such as timestamp and hostname, while log messages are treated as unstructured text values.

III. Spell: STREAMING STRUCTURED LOG PARSER

We now present Spell, a streaming structured log parser for system event logs. Since a basic building block for Spell is a longest common subsequence (LCS) algorithm, Spell stands for Streaming structured Parser for Event Logs using LCS. In what follows, we first review the LCS problem.

A. The LCS problem

Suppose Σ is a universe of alphabets (e.g., a-z, 0-9). Given any sequence α = {a1, a2, ..., am}, such that ai ∈ Σ for 1 ≤ i ≤ m, a subsequence of α is defined as {ax1, ax2, ..., axk}, where ∀xi, xi ∈ Z+, and 1 ≤ x1 < x2 < ... < xk ≤ m. Let β = {b1, b2, ..., bn} be another sequence such that bj ∈ Σ for j ∈ [1, n]. A subsequence γ is called a common subsequence of α and β iff it is a subsequence of each. The longest common subsequence (LCS) problem for input sequences α and β is to find the longest such γ. For instance, sequence {1, 3, 5, 7, 9} and sequence {1, 5, 7, 10} yield an LCS of {1, 5, 7}.

We observe that an LCS-based method can be developed to efficiently and effectively extract message types from raw system logs. This is a seemingly natural idea, yet it has not been explored by existing literature. Our key observation is that, if we view the output of a log printing statement (which is a log entry) as a sequence, in most log printing statements, the constant that represents a message type often takes a majority part of the sequence, and the parameter values take only a small portion. If two log entries are produced by the same log printing statement stat, but only differ by having different parameter values, the LCS of the two sequences is very likely to be the constant in the code stat, implying a message type.
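To illustrate this observation, here is a minimal sketch (ours, not the paper's optimized implementation) of the textbook dynamic-programming LCS over token sequences, applied to two log entries produced by the same printf statement; the LCS recovers the constant part of the message type:

```python
def lcs(a, b):
    """Return one longest common subsequence of token lists a and b (O(mn) DP)."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] \
                else max(dp[i + 1][j], dp[i][j + 1])
    # Backtrack to recover the subsequence itself.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

e1 = "Temperature (41C) exceeds warning threshold".split()
e2 = "Temperature (43C) exceeds warning threshold".split()
print(lcs(e1, e2))  # ['Temperature', 'exceeds', 'warning', 'threshold']
```

Only the parameter tokens (41C)/(43C) drop out of the LCS, leaving the constant from the print statement.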
[Figure: an example of how Spell maintains LCSObjects in streaming fashion. The first log entry "Temperature (41C) exceeds warning threshold" creates an LCSObject with LCSseq "Temperature (41C) exceeds warning threshold" and lineIds {0}. When "Temperature (43C) exceeds warning threshold" arrives, the LCSseq is refined to "Temperature * exceeds warning threshold" and lineIds becomes {0, 1}. An entry from a different statement, e.g. "Command has completed successfully", has no sufficient overlap with any existing LCSObject and creates a new one.]
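The streaming maintenance sketched in the figure above can be written compactly. This is our simplified sketch, not the paper's full algorithm (which adds a prefix tree and other pre-filtering optimizations); the names `LCSObject`, `lcsseq`, and `line_ids` follow the figure, the LCS helper is the textbook DP, and we assume a match requires an LCS of at least τ = 0.5 times the new entry's length:

```python
def lcs(a, b):
    """Textbook DP for one longest common subsequence of token lists."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] \
                else max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

class LCSObject:
    def __init__(self, tokens, line_id):
        self.lcsseq = tokens        # current message type candidate
        self.line_ids = [line_id]   # log lines mapped to this type

def refine(seq, common):
    """Replace tokens of seq that fell out of the LCS with '*'."""
    out, k = [], 0
    for tok in seq:
        if k < len(common) and tok == common[k]:
            out.append(tok); k += 1
        else:
            out.append('*')
    return out

def process(objects, entry, line_id, tau=0.5):
    """Match a new log entry against existing LCSObjects, or create one."""
    tokens = entry.split()
    best, best_common = None, []
    for obj in objects:             # the basic (naive) loop over all types
        common = lcs(obj.lcsseq, tokens)
        if len(common) > len(best_common):
            best, best_common = obj, common
    if best is not None and len(best_common) >= tau * len(tokens):
        best.lcsseq = refine(best.lcsseq, best_common)
        best.line_ids.append(line_id)
    else:                           # no sufficiently similar type: new one
        objects.append(LCSObject(tokens, line_id))
```

Feeding the figure's entries through `process` yields one LCSObject whose `lcsseq` becomes `Temperature * exceeds warning threshold` with `line_ids` [0, 1], plus a separate LCSObject for `Command has completed successfully`.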
IV. EVALUATION

In this section, we evaluate the efficiency and effectiveness of Spell, by comparing it with two popular offline log parsing algorithms, on 2 real log datasets with different formats. All experiments were executed on a Linux machine with an 8-core Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz. We'll show that Spell not only is able to parse logs in an online streaming fashion, but also outperforms the competing offline methods in terms of both efficiency and effectiveness.

The two offline algorithms to be compared are IPLoM [9], [15] and a clustering-based log parser [2] which we refer to as CLP. The idea of IPLoM is to partition the entire log into multiple clusters, where each cluster represents a set of log entries printed by the same print statement. The partition is done using a simple 3-step heuristic: i) partition by each log record length; ii) partition each cluster by the token position having the least distinct tokens; iii) partition by the bipartite mapping between tokens in each cluster. It is so far the most lightweight automatic log parsing algorithm. CLP, on the other hand, is a frequently used algorithm in multiple log mining efforts [2], [3], [4]. It also partitions the log into clusters, but by first clustering using weighted edit distance, and then repeatedly partitioning until all clusters satisfy the heuristic that each position either has the same token, or is a parameter position.

TABLE I
PARAMETERS FOR ALL THREE ALGORITHMS

Spell:   message type threshold τ = 0.5
CLP:     edit distance weight ν = 10; cluster threshold ς = 5;
         private contents threshold % = 4
IPLoM:   file support threshold = 0.01; partition support threshold = 0;
         lower bound = 0.1; upper bound = 0.9;
         cluster goodness threshold = 0.34

Table I shows the default values of key parameters used for each algorithm. For parameters with recommended values that were clearly stated in the original papers, such as all parameters for IPLoM [9], we simply adopt those values. For others that were not clearly specified, we tested the corresponding method with different values until we got the best result (for the same log data) as in the original paper.

We use the supercomputer logs that were commonly used for evaluation by previous work [9], [13], [15], [16], shown in Table II (count is the total number of log entries).

TABLE II
LOG DATASETS

Log type               Count       Message type ground truth
Los Alamos HPC log1    433,490     available online2
BlueGene/L log1        4,747,963   available online3

A. Efficiency of Spell

Figure 3 shows the total runtime of different methods as the log size (the number of log records) grows. Note that we tested different alternatives of the Spell method:
• Spell (naive LCS): compute the LCS using DP between the new log entry and every existing message type.
• Spell: Spell with the pre-filtering step.

[Fig. 3. Efficiency comparison of different methods: runtime (seconds) vs. log size (×10^5) on the Los Alamos log (left, log scale) and the Blue Gene log (right, linear scale).]

Figure 3 left shows the results on the Los Alamos HPC Log. Note that runtime is measured on a logarithmic scale. To parse the entire log with 433,490 entries, Spell with naive LCS takes about 75 seconds while it's only 9 seconds with pre-filtering. IPLoM shows the best efficiency, whereas Spell (with pre-filtering) is only slightly slower (within seconds). The CLP method has the worst efficiency (2-4 orders of magnitude slower than IPLoM and Spell). We tested two variants of CLP: 1) CLP (auto threshold): it automatically sets the cluster threshold ς by k-means clustering. When the log size is bigger than 100,000, it's already too slow to run to completion. 2) CLP (fixed threshold): it uses a fixed threshold 5 calculated from a smaller log file, which significantly improves the runtime. However, it's still much slower than the other methods. In later experiments we only use CLP with fixed threshold if applicable.

Figure 3 right shows the results on the Blue Gene Log. The runtime in this figure is measured on a normal decimal scale. We didn't include CLP in this experiment: even CLP with fixed threshold is too slow to finish, as the Blue Gene log has nearly 5 million entries. Here the advantage of our pre-filtering step is clearly demonstrated. In particular, Spell with pre-filtering has outperformed IPLoM in terms of efficiency. With the prefix tree, when the log size grows much faster than the number of message types, most log entries will find a match in the prefix tree and return immediately. Then, for the majority of the rest, the message types can be found using the simple loop approach. Only for the small number of log records that are not matched in the pre-filtering step do we compare them with each existing message type. Noticeably, the runtime of Spell (naive LCS) increases exponentially. That's because when the log size grows bigger, more message types also show up, and each new log entry may need to be compared with a larger number of message types. This result clearly shows the importance of the pre-filtering step and how it has effectively mitigated the efficiency issues in the basic Spell method.

TABLE III
AMORTIZED COST OF EACH MESSAGE TYPE LOOKUP STEP IN Spell

Lookup step        Los Alamos HPC log   BlueGene/L log
prefix tree (ms)   0.006                0.011
simple loop (ms)   0.020                0.087
naive LCS (ms)     0.175                0.580
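The first two lookup steps of this pre-filtering cascade can be sketched as follows. This is our simplified illustration, not the paper's exact data structures; we assume message types are token sequences in which `*` matches any single token:

```python
class Node:
    """Prefix-tree node; a '*' child matches any single token."""
    def __init__(self):
        self.children = {}
        self.type_id = None        # set when a message type ends here

def insert(root, mtype, type_id):
    """Index a message type (list of tokens, '*' for parameters)."""
    node = root
    for tok in mtype:
        node = node.children.setdefault(tok, Node())
    node.type_id = type_id

def prefix_lookup(root, tokens):
    """Step 1: a single O(n) walk; prefer an exact child, else follow '*'."""
    node = root
    for tok in tokens:
        nxt = node.children.get(tok) or node.children.get('*')
        if nxt is None:
            return None
        node = nxt
    return node.type_id

def simple_loop(types, tokens):
    """Step 2: return the first type whose constant tokens appear,
    in order, as a subsequence of the new entry's tokens."""
    for type_id, mtype in enumerate(types):
        it = iter(tokens)          # 'tok in it' consumes the iterator,
        if all(tok == '*' or tok in it for tok in mtype):  # so order matters
            return type_id
    return None
```

With the message type `Temperature * exceeds warning threshold` indexed, a new entry `Temperature (43C) exceeds warning threshold` is resolved by `prefix_lookup` in one walk; only entries missed by both steps fall through to the expensive full LCS comparison.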
1 CFDR Data, https://www.usenix.org/cfdr-data
2 Los Alamos National Lab HPC Log message types, https://web.cs.dal.ca/~makanju/iplom/hpc-clusters.txt
3 BlueGene/L message types, https://web.cs.dal.ca/~makanju/iplom/bgl-clusters.txt

The amortized cost for each log entry to find its message type using the different lookup methods in the pre-filtering step is shown in Table III (in milliseconds). Recall that for each log entry, Spell first tries to find its message type in the prefix tree, then the simple loop, and finally uses naive LCS if it is not found in the previous two steps. Table IV shows the number (percentage) of log entries that are returned in each step, showing that over 91% could be processed in the prefix tree in O(n) time, and over 99.9% in total could be processed by the prefix tree and simple loop combined. The expensive naive LCS computation is only applied to less than 0.1% of log entries. Hence much overhead is reduced by the pre-filtering step. We'll show later that it provides almost identical results to the costly naive LCS method.

TABLE IV
NUMBER (PERCENTAGE) OF LOG ENTRIES RETURNED BY EACH STEP

Step          Los Alamos HPC log    BlueGene/L log
prefix tree   397,412 (91.68%)      4,457,719 (93.89%)
simple loop   35,691 (8.23%)        288,254 (6.07%)
naive LCS     387 (0.09%)           1,990 (0.042%)

B. Effectiveness of Spell

In this section we evaluate the effectiveness of Spell. After parsing, the log file is processed into multiple clusters, where each cluster represents one message type with the associated log records (as produced by the corresponding log parsing method). A parsed message type is considered correct if all and only the log records printed by that message type (as identified through the ground truth) are clustered together. We run each method, compare the results with the ground truth generated by matching each log entry with its true message type from Table II, and calculate the accuracy, which is the number of log entries that are parsed to correct message types over the total number of processed log records.

[Fig. 4. Effectiveness comparison of different methods: accuracy vs. log size (×10^5) for Spell, CLP (fixed threshold), and IPLoM on the Los Alamos log (left) and the Blue Gene log (right).]

Figure 4 shows the comparison on the supercomputer logs. With more log entries, the number of message types also increases, and they don't necessarily show up uniformly over time. Hence, the effectiveness of a method does not necessarily show a steady trend as the log size grows. We can see that in both charts, Spell achieves much better accuracy than the other methods. IPLoM's accuracy is acceptable in Figure 4 left for the Los Alamos log, and becomes terrible in Figure 4 right for the Blue Gene log.

Note that the pre-filtering step in Spell may miss an existing message type t for a new log entry e if LCS(t, s) ≠ t but |LCS(t, s)| > |LCS(t′, s)| when there is another existing message type t′ that satisfies t′ = LCS(t′, s), where s is the token sequence of e. To evaluate such potential degradation of effectiveness due to the pre-filtering step, we show a comparison in Table V. The result shows that Spell with pre-filtering achieves an accuracy nearly the same as that using only naive LCS. This means the pre-filtering step has almost no downgrade effect on the parsing results, though it greatly reduces the parsing overhead.

TABLE V
COMPARISON OF Spell WITH AND WITHOUT PRE-FILTER

                      Los Alamos HPC log           BlueGene/L log
With pre-filtering    True types found  Accuracy   True types found  Accuracy
False                 55                0.822786   165               0.811798
True                  55                0.822786   164               0.811791

V. CONCLUSIONS

We present a streaming structured log parser, Spell, for parsing large system event logs in a streaming fashion. Spell works well for online system log mining and monitoring. It is also a valuable addition to modern log management systems to provide end users a concise, real-time understanding of the system states. We propose pre-filtering to improve Spell's efficiency. Experiments over real system logs have clearly demonstrated that Spell outperforms the state-of-the-art offline methods in terms of both efficiency and effectiveness.

VI. ACKNOWLEDGMENT

Min Du and Feifei Li were supported in part by grants NSF CNS-1314945 and NSF IIS-1251019. We wish to thank all members of the TCloud project and the Flux group for helpful discussion and valuable feedback.

REFERENCES

[1] M. Du and F. Li, "ATOM: Automated tracking, orchestration and monitoring of resource usage in infrastructure as a service systems," in IEEE BigData, 2015.
[2] Q. Fu, J.-G. Lou, Y. Wang, and J. Li, "Execution anomaly detection in distributed systems through unstructured log analysis," in ICDM, 2009.
[3] J.-G. Lou, Q. Fu, S. Yang, Y. Xu, and J. Li, "Mining invariants from console logs for system problem detection," in USENIX ATC, 2010.
[4] J.-G. Lou, Q. Fu, S. Yang, J. Li, and B. Wu, "Mining program workflow from interleaved traces," in SIGKDD, 2010.
[5] K. Yamanishi and Y. Maruyama, "Dynamic syslog mining for network failure monitoring," in SIGKDD, 2005.
[6] W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan, "Detecting large-scale system problems by mining console logs," in SOSP, 2009.
[7] K. Nagaraj, C. Killian, and J. Neville, "Structured comparative analysis of systems logs to diagnose performance problems," in NSDI, 2012.
[8] I. Beschastnikh, Y. Brun, M. D. Ernst, and A. Krishnamurthy, "Inferring models of concurrent systems from logs of their behavior with CSight," in ICSE, 2014.
[9] A. A. Makanju, A. N. Zincir-Heywood, and E. E. Milios, "Clustering event logs using iterative partitioning," in SIGKDD, 2009.
[10] L. Tang and T. Li, "LogTree: A framework for generating system events from raw textual logs," in ICDM, 2010.
[11] L. Tang, T. Li, and C.-S. Perng, "LogSig: Generating system events from raw textual logs," in CIKM, 2011.
[12] W. Xu, L. Huang, A. Fox, D. Patterson, and M. Jordan, "Online system problem detection by mining patterns of console logs," in ICDM, 2009.
[13] A. Gainaru, F. Cappello, S. Trausan-Matu, and B. Kramer, "Event log mining tool for large scale HPC systems," in Euro-Par, 2011.
[14] Z. Cao, S. Chen, F. Li, M. Wang, and X. S. Wang, "LogKV: Exploiting key-value stores for event log processing," in CIDR, 2013.
[15] A. Makanju, A. N. Zincir-Heywood, and E. E. Milios, "A lightweight algorithm for message type extraction in system application logs," TKDE, 2012.
[16] X. Ning, G. Jiang, H. Chen, and K. Yoshihira, "HLAer: A system for heterogeneous log analysis," in SDM Workshop on Heterogeneous Learning, 2014.
[17] Y. Li, H. Li, T. Duan, S. Wang, Z. Wang, and Y. Cheng, "A real linear and parallel multiple longest common subsequences (MLCS) algorithm," in SIGKDD, 2016.
[18] Y. Li, Y. Wang, Z. Zhang, Y. Wang, D. Ma, and J. Huang, "A novel fast and memory efficient parallel MLCS algorithm for longer and large-scale sequences alignments," in ICDE, 2016.