Process Mining: Overview and Opportunities: ACM Reference Format
Over the last decade, process mining emerged as a new research field that focuses on the analysis of pro-
cesses using event data. Classical data mining techniques such as classification, clustering, regression, as-
sociation rule learning, and sequence/episode mining do not focus on business process models and are often
only used to analyze a specific step in the overall process. Process mining focuses on end-to-end processes
and is possible because of the growing availability of event data and new process discovery and conformance
checking techniques.
Process models are used for analysis (e.g., simulation and verification) and enactment by BPM/WFM sys-
tems. Previously, process models were typically made by hand without using event data. However, activities
executed by people, machines, and software leave trails in so-called event logs. Process mining techniques
use such logs to discover, analyze, and improve business processes.
Recently, the Task Force on Process Mining released the Process Mining Manifesto. This manifesto is
supported by 53 organizations and 77 process mining experts contributed to it. The active involvement of
end-users, tool vendors, consultants, analysts, and researchers illustrates the growing significance of process
mining as a bridge between data mining and business process modeling. The practical relevance of process
mining and the interesting scientific challenges make process mining one of the “hot” topics in Business
Process Management (BPM). This paper introduces process mining as a new research field and summarizes
the guiding principles and challenges described in the manifesto.
Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications—Data Min-
ing
General Terms: Management, Measurement, Performance
Additional Key Words and Phrases: Process mining, business intelligence, business process management,
data mining
ACM Reference Format:
Van der Aalst, W.M.P. 2011. Process Mining: Overview and Opportunities. ACM Trans. Manag. Inform. Syst.
99, 99, Article 99 (February 2012), 16 pages.
DOI = 10.1145/0000000.0000000 http://doi.acm.org/10.1145/0000000.0000000
1. INTRODUCTION
Process mining aims to discover, monitor and improve real processes by extracting
knowledge from event logs readily available in today’s information systems [Aalst
2011]. Over the last decade there has been a spectacular growth of event data and pro-
cess mining techniques have matured significantly. As a result, management trends
related to process improvement and compliance can now benefit from process mining.
The starting point for process mining is an event log. Each event in such a log refers to
an activity (i.e., a well-defined step in some process) and is related to a particular case
(i.e., a process instance). The events belonging to a case are ordered and can be seen
as one “run” of the process. Event logs may store additional information about events.
Author’s address: Department of Mathematics and Computer Science, Eindhoven University of Technology,
PO Box 513, 5600 MB, Eindhoven, The Netherlands. [email protected]
ACM Transactions on Management Information Systems, Vol. 99, No. 99, Article 99, Publication date: February 2012.
99:2 W. van der Aalst
In fact, whenever possible, process mining techniques use extra information such as
the resource (i.e., person or device) executing or initiating the activity, the timestamp
of the event, or data elements recorded with the event (e.g., the size of an order).
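As an illustration of the event log notion described above, the following minimal Python sketch represents events with a case, an activity, a timestamp, a resource, and optional data elements, and groups them into one ordered trace per case. The class name `Event`, the function `traces_by_case`, and all example values are hypothetical and serve illustration only; they are not taken from a particular tool or logging format.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Event:
    case_id: str                 # process instance the event belongs to
    activity: str                # well-defined step in the process
    timestamp: datetime          # when the event occurred
    resource: str = ""           # person or device executing the activity
    data: dict = field(default_factory=dict)  # extra data elements


def traces_by_case(events):
    """Group events per case; each case becomes one ordered 'run'."""
    cases = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        cases[event.case_id].append(event.activity)
    return dict(cases)


# Hypothetical example events (deliberately not sorted).
events = [
    Event("case-2", "register request", datetime(2012, 1, 30, 9, 5), "Sue"),
    Event("case-1", "register request", datetime(2012, 1, 30, 9, 0), "Pete"),
    Event("case-1", "decide", datetime(2012, 1, 30, 11, 0), "Sara", {"amount": 650}),
    Event("case-1", "examine casually", datetime(2012, 1, 30, 10, 0), "Mike"),
    Event("case-2", "decide", datetime(2012, 1, 30, 12, 0), "Sara"),
]

traces = traces_by_case(events)
```

Ordering by timestamp is only one possible way to order the events of a case; a real log may define the order differently.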
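[Figure 1, diagram not reproduced. Labels: the “world” (business processes, people, machines, components, organizations); a software system that supports/controls the world and records events (e.g., messages, transactions, etc.) in event logs; a (process) model that models and analyzes the world and specifies, configures, implements, and analyzes the software system; discovery, conformance, and enhancement connect event logs and (process) model.]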
Fig. 1. The three basic types of process mining: (a) discovery, (b) conformance, and (c) enhancement.
Event logs can be used to conduct three types of process mining as shown in Fig. 1
[Aalst 2011]. The first type of process mining is discovery. A discovery technique takes
an event log and produces a model without using any a-priori information. Process
discovery is the most prominent process mining technique. For many organizations it
is surprising to see that existing techniques are indeed able to discover real processes
merely based on example behaviors stored in event logs. The second type of process
mining is conformance. Here, an existing process model is compared with an event log
of the same process. Conformance checking can be used to check if reality, as recorded
in the log, conforms to the model and vice versa. The third type of process mining is
enhancement. Here, the idea is to extend or improve an existing process model thereby
using information about the actual process recorded in some event log. Whereas con-
formance checking measures the alignment between model and reality, this third type
of process mining aims at changing or extending the a-priori model. For instance, by
using timestamps in the event log one can extend the model to show bottlenecks, ser-
vice levels, and throughput times.
Unlike traditional Business Process Management (BPM) techniques that use hand-
made models [Weske 2007], process mining is based on facts. Based on observed be-
havior recorded in event logs, intelligent techniques are used to extract knowledge.
Therefore, we claim that process mining enables evidence-based BPM. Unlike existing
analysis approaches, process mining is process-centric (and not data-centric), truly in-
telligent (learning from historic data), and fact-based (based on event data rather than
opinions).
Process mining is related to data mining. Whereas classical data mining techniques
are mostly data-centric [Hand et al. 2001], process mining is process-centric. Main-
stream business process modeling techniques use notations such as the Business Pro-
cess Modeling Notation (BPMN), UML activity diagrams, Event-driven Process chains
(EPC), and various types of Petri nets [Aalst and Stahl 2011; Desel and Reisig 1998;
Weske 2007]. These notations can be used to model processes with concurrency,
choice, iteration, etc.
This paper not only introduces process mining as a new research field, but also
familiarizes the reader with the Process Mining Manifesto [TFPM 2011] released by the
Task Force on Process Mining in October 2011. The growing interest in log-based pro-
cess analysis motivated the establishment of a Task Force on Process Mining in 2009.
This manifesto aims to promote the topic of process mining. Moreover, by defining a set
of guiding principles and listing important challenges, this manifesto hopes to serve
as a guide for software developers, scientists, consultants, business managers, and end-
users. The goal is to increase the maturity of process mining as a new tool to improve
the (re)design, control, and support of operational business processes.
The remainder of this paper is organized as follows. Section 2 introduces the notion
of an event log, used as input for process mining. Section 3 shows how process mod-
els can be discovered from scratch using only raw event data. Section 4 discusses the
second type of process mining: conformance checking. Section 5 elaborates on the third
type of process mining: enhancement. The guiding principles and challenges listed in
the manifesto are summarized in Section 6. Section 7 discusses tool support and shows
some real-life examples. Section 8 concludes the paper.
3. DISCOVERY
This section introduces the notion of process discovery, i.e., automatically constructing
models based on observed events.
Event log of Fig. 2 (number of cases per trace):
   #  trace
 455  acdeh
 191  abdeg
 177  adceh
 144  abdeh
 111  acdeg
  82  adceg
  56  adbeh
  47  acdefdbeh
  38  adbeg
  33  acdefbdeh
  14  acdefbdeg
  11  acdefdbeg
   9  adcefcdeh
   8  adcefdbeh
   5  adcefbdeg
   3  acdefbdefdbeg
   2  adcefdbeg
   2  adcefbdefbdeg
   1  adcefdbefbdeh
   1  adbefbdefdbeg
   1  adcefdbefcdefdbeg
1391  (total)
Activity names: a = register request, b = examine thoroughly, c = examine casually,
d = check ticket, e = decide, f = reinitiate request, g = pay compensation, h = reject request.
[Petri net diagrams not reproduced. M1 has places start, p1, p2, p3, p4, p5, and end. M2 has
places start, p1, p2, p3, and end, and two transitions f1 and f2 both labeled reinitiate request.]
Fig. 2. One event log and two potential process models (M1 and M2) aiming to describe the observed
behavior.
The state of a Petri net, also referred to as marking, is defined by the distribution of
tokens over places. A transition is enabled if each of its input places contains a token.
For example, in M1, transition a is enabled in the initial marking of M1, because the
only input place of a contains a token (black dot).
An enabled transition may fire thereby consuming a token from each of its input
places and producing a token for each of its output places. Firing a in the initial mark-
ing corresponds to removing one token from start and producing two tokens (one for
p1 and one for p2). After firing a, three transitions are enabled: b, c, and d. There is
a non-deterministic choice between b and c. Firing b will disable c because the token
is removed from the shared input place (and vice versa). Transition d is concurrent
with b and c, i.e., it can fire without disabling another transition. Transition e becomes
enabled after d and b or c have occurred. Note that transition e in M1 is only enabled
if both input places (p3 and p4) contain a token. After executing e, three transitions
become enabled: f, g, and h. These transitions are competing for the same token thus
modeling a choice. When g or h is fired, the process ends with a token in place end. If
f is fired, the process returns to the state just after executing a.
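The firing rule described above can be sketched in a few lines of Python. The encoding of M1 below (a dictionary mapping each transition to its input and output places) is reconstructed from the textual description and from the place names in Fig. 2; the function names `enabled` and `fire` are hypothetical. It is a minimal sketch of the token game, not a full Petri net library.

```python
from collections import Counter

# M1 as {transition: (input places, output places)}; structure reconstructed
# from the description in the text (an assumption of this sketch).
M1 = {
    "a": (["start"], ["p1", "p2"]),
    "b": (["p1"], ["p3"]),
    "c": (["p1"], ["p3"]),
    "d": (["p2"], ["p4"]),
    "e": (["p3", "p4"], ["p5"]),
    "f": (["p5"], ["p1", "p2"]),
    "g": (["p5"], ["end"]),
    "h": (["p5"], ["end"]),
}


def enabled(net, marking):
    """Transitions whose input places all contain at least one token."""
    return sorted(t for t, (ins, _) in net.items()
                  if all(marking[p] > 0 for p in ins))


def fire(net, marking, transition):
    """Consume one token per input place, produce one per output place."""
    ins, outs = net[transition]
    if any(marking[p] < 1 for p in ins):
        raise ValueError(f"transition {transition!r} is not enabled")
    new_marking = Counter(marking)
    new_marking.subtract(ins)
    new_marking.update(outs)
    return +new_marking  # unary plus drops places with zero tokens


initial = Counter({"start": 1})
after_a = fire(M1, initial, "a")
after_ab = fire(M1, after_a, "b")
after_abde = fire(M1, fire(M1, after_ab, "d"), "e")
after_f = fire(M1, after_abde, "f")
```

In this sketch, `enabled(M1, after_a)` lists b, c, and d, while after firing b only d remains enabled, which mirrors the choice between b and c and the concurrency of d.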
It is easy to check that all traces in the event log can be reproduced by M1. This does
not hold for the second process model in Fig. 2. M2 is able to reproduce traces such
as acdeh (455 instances), abdeg (191 instances), and acdefbdeh (33 instances). Note that
M2 has two transitions corresponding to activity f. To refer to them they are named f1
and f2. M2 also allows for behavior very different from what can be observed in the log,
e.g., abeg and abdddddf1bddddeh are possible according to the model but do not appear in
the log. There are also traces in the log that cannot be replayed by M2, e.g., adceh (177
instances), adceg (82 instances), and adcefcdeh (9 instances) are not possible according
to M2.
The two process models in Fig. 2 are visualized in terms of Petri nets. In fact, both
models are so-called WF-nets [Aalst et al. 2011]. A WF-net is a Petri net with one
source place and one sink place such that all places and transitions are on a path from
source to sink. Both models in Fig. 2 have a source place named start and a sink place
end and all nodes are on a path from start to end.
In general, the notation used to visualize the result may be very different from the
representation used during the actual discovery process. All mainstream BPM nota-
tions (Petri nets, EPCs, BPMN, YAWL, UML activity diagrams, etc.) can be used to
show discovered processes such as M1 [Aalst 2011; Weske 2007].
3.3. Process Discovery Algorithms
Since the mid-nineties several groups have been working on techniques for automated
process discovery based on event logs [Aalst et al. 2004; Aalst et al. 2007; Agrawal
et al. 1998; Cook and Wolf 1998; Datta 1998; Dongen and Aalst 2004; 2005; Greco
et al. 2006; Weijters and Aalst 2003]. In [Aalst et al. 2003] an overview is given of the
early work in this domain. The idea to apply process mining in the context of work-
flow management systems was introduced in [Agrawal et al. 1998]. In parallel, Datta
[Datta 1998] looked at the discovery of business process models. Cook et al. investi-
gated similar issues in the context of software engineering processes [Cook and Wolf
1998]. Herbst [Herbst 2000] was one of the first to tackle more complicated processes,
e.g., processes containing duplicate tasks.
Most of the classical approaches have problems dealing with concurrency. The α-
algorithm [Aalst et al. 2004] is an example of a simple technique that takes concur-
rency as a starting point. The α-algorithm scans the event log for particular patterns.
For example, if activity a is followed by b but b is never followed by a, then it is assumed
that there is a causal dependency between a and b. To reflect this dependency, the corresponding
Petri net should have a place connecting a to b. We use the following notation: a > b
if and only if there is a trace σ = ⟨t_1, t_2, t_3, ..., t_n⟩ in the log and an i ∈ {1, ..., n−1} such
that t_i = a and t_{i+1} = b; a → b if and only if a > b and b ≯ a; a # b if and only if a ≯ b
and b ≯ a; and a ∥ b if and only if a > b and b > a. These four ordering relations are used
to create places connecting the different transitions in the Petri net. The α-algorithm
is simple and efficient, but has problems dealing with complicated routing constructs
and noise (like most of the other approaches described in literature).
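The ordering relations used by the α-algorithm can be computed directly from the traces. The sketch below derives the directly-follows relation (>) and classifies every pair of activities as causal (`->`), unrelated (`#`), or parallel (`||`). The function names are hypothetical, the extra symbol `<-` for a reversed causal dependency is a convenience of this sketch, and the example uses only the seven most frequent traces of Fig. 2. It covers only the first step of the α-algorithm, not the construction of places.

```python
def directly_follows(traces):
    """All pairs (a, b) such that a is directly followed by b in some trace."""
    return {(s[i], s[i + 1]) for s in traces for i in range(len(s) - 1)}


def footprint(traces):
    """Classify each ordered pair of activities using the relation a > b."""
    df = directly_follows(traces)
    activities = sorted({x for s in traces for x in s})
    relation = {}
    for a in activities:
        for b in activities:
            ab, ba = (a, b) in df, (b, a) in df
            if ab and not ba:
                relation[a, b] = "->"   # causal dependency from a to b
            elif ba and not ab:
                relation[a, b] = "<-"   # causal dependency from b to a
            elif ab and ba:
                relation[a, b] = "||"   # both orders observed
            else:
                relation[a, b] = "#"    # never directly follow each other
    return relation


# The seven most frequent traces of the event log in Fig. 2.
traces = ["acdeh", "abdeg", "adceh", "abdeh", "acdeg", "adceg", "adbeh"]
fp = footprint(traces)
```

For these traces the sketch reports a → b, b ∥ d, c ∥ d, and b # c, which matches the intuition that b and c are alternatives and that d runs concurrently with both.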
Region-based approaches are able to express more complex control-flow structures
without underfitting. State-based regions were introduced in 1989 [Ehrenfeucht and
Rozenberg 1989] and generalized in various ways [Cortadella et al. 1998]. In [Aalst
et al. 2010; Dongen et al. 2007; Sole and Carmona 2010] it is shown how these state-
based regions can be applied to process mining. In parallel, several authors applied
language-based regions to process mining [Bergenthum et al. 2007; Werf et al. 2010].
The basic idea of these approaches is to discover places. Note that the addition of places
limits the behavior of the Petri net. The idea is to add places that do not exclude any
of the behavior seen in the event log.
For practical applications of process discovery it is essential that noise and incom-
pleteness are handled well. Surprisingly, only few discovery algorithms focus on ad-
dressing these issues. Notable exceptions are heuristic mining [Weijters and Aalst
2003], fuzzy mining [Günther and Aalst 2007], and genetic process mining [Medeiros
et al. 2007].
ProM’s heuristic miner uses the algorithm described in [Weijters and Aalst 2003]
(see also Section 6.2 in [Aalst 2011]). The algorithm first builds a dependency graph
based on the frequencies of activities and the number of times one activity is followed
by another activity. Based on predefined thresholds, dependencies are added to the
dependency graph (or not). The dependency graph reveals the “backbone” of the
process model. This backbone is used to discover the detailed split and join behavior
of nodes. If an activity has multiple input arcs, then the heuristic miner analyzes the
log to see whether the join is an AND-join, an XOR-join or an OR-join. In case of an
OR-join, the detailed synchronization behavior is learned. If an activity has multiple
output arcs, then the “split behavior” is learned in a similar fashion.
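The construction of the dependency graph can be illustrated with a small Python sketch. It counts how often one activity is directly followed by another and keeps an arc only if a frequency threshold and a dependency threshold are met. The dependency measure used here, (|a>b| − |b>a|) / (|a>b| + |b>a| + 1), is the one commonly associated with heuristic mining; the function names, the threshold values, and the restriction to the four most frequent traces of Fig. 2 are assumptions made for illustration. The sketch does not cover the learning of split and join behavior.

```python
from collections import Counter


def follows_counts(log):
    """log maps each trace to its frequency; returns |a>b| for all pairs."""
    counts = Counter()
    for trace, frequency in log.items():
        for a, b in zip(trace, trace[1:]):
            counts[a, b] += frequency
    return counts


def dependency(counts, a, b):
    """Value close to 1 suggests that b depends on a."""
    ab, ba = counts[a, b], counts[b, a]
    return (ab - ba) / (ab + ba + 1)


def dependency_graph(log, min_count=10, min_dependency=0.5):
    """Keep only arcs that satisfy both (hypothetical) thresholds."""
    counts = follows_counts(log)
    return {
        (a, b): dependency(counts, a, b)
        for (a, b), n in counts.items()
        if n >= min_count and dependency(counts, a, b) >= min_dependency
    }


# Four most frequent traces of Fig. 2 with their frequencies.
log = {"acdeh": 455, "abdeg": 191, "adceh": 177, "abdeh": 144}
counts = follows_counts(log)
graph = dependency_graph(log)
```

With these thresholds the arc from a to c is kept, whereas no arc is added between c and d because both orders occur frequently, which hints at concurrency rather than a causal dependency.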
See Chapter 6 of [Aalst 2011] for a more elaborate introduction to the various process
discovery approaches described in literature.
4. CONFORMANCE
In recent years, powerful process mining techniques have been developed that can
automatically construct a suitable process model given an event log. Whereas process
discovery constructs a model without any a priori information (other than the event
log), conformance checking uses a model and an event log as input. The model may have
been made by hand or discovered through process discovery. For conformance checking,
the modeled behavior and the observed behavior (i.e., event log) are compared.
by x, then the footprints of event log and model disagree on the ordering relation of x
and y.
The second approach replays the event log on the model. A naive approach to-
wards conformance checking would be to simply count the fraction of cases that can
be “parsed completely” (i.e., the proportion of cases corresponding to firing sequences
leading from the initial state to the final state). This approach cannot distinguish be-
tween an “almost fitting” case and a case that is completely unrelated to the modeled
behavior. A better approach is to continue replaying the event log on the model even
when transitions are not enabled. Simply “borrow tokens”, force the transition to fire
anyway, and record the problem. In the end, the number of “borrowed tokens” and the
number of “tokens left behind” (not consumed) indicate the fitness level. See [Rozinat
and Aalst 2008] and Section 7.2 in [Aalst 2011].
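The replay idea with “borrowed” tokens can be sketched as follows. The sketch counts produced (p), consumed (c), missing (m, i.e., borrowed), and remaining (r) tokens and combines them into the fitness value 1/2 (1 − m/c) + 1/2 (1 − r/p), a formula commonly used for token-based replay. The encoding of M1 is reconstructed from the description of Fig. 2, and the function names are hypothetical; this is a simplified sketch that assumes every event maps to exactly one transition.

```python
from collections import Counter

# M1 as {transition: (input places, output places)}; reconstructed from the
# description of Fig. 2 (an assumption of this sketch).
M1 = {
    "a": (["start"], ["p1", "p2"]),
    "b": (["p1"], ["p3"]),
    "c": (["p1"], ["p3"]),
    "d": (["p2"], ["p4"]),
    "e": (["p3", "p4"], ["p5"]),
    "f": (["p5"], ["p1", "p2"]),
    "g": (["p5"], ["end"]),
    "h": (["p5"], ["end"]),
}


def replay(net, trace, source="start", sink="end"):
    """Replay a trace, borrowing tokens whenever a transition is not enabled."""
    marking = Counter({source: 1})
    produced, consumed, missing = 1, 0, 0   # the initial token counts as produced
    for transition in trace:
        ins, outs = net[transition]
        for place in ins:
            if marking[place] == 0:         # not enabled: borrow a token
                marking[place] += 1
                missing += 1
            marking[place] -= 1
            consumed += 1
        for place in outs:
            marking[place] += 1
            produced += 1
    if marking[sink] == 0:                  # the case should end in the sink
        marking[sink] += 1
        missing += 1
    marking[sink] -= 1
    consumed += 1
    remaining = sum(marking.values())       # tokens left behind
    return produced, consumed, missing, remaining


def fitness(produced, consumed, missing, remaining):
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


fitting = replay(M1, "acdeh")       # trace from the log
deviating = replay(M1, "aceh")      # hypothetical trace that skips d
```

The fitting trace yields a fitness of 1, whereas the hypothetical trace aceh borrows one token (for p4) and leaves one token behind (in p2), resulting in a fitness of 5/6 rather than 0.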
The third, and most advanced, approach is to compute an optimal alignment between
each trace in the log and the most similar behavior in the model. Consider for example
the following three alignments between the example log and model M2 :
γ1 = trace: a c d e h
     model: a c d e h

γ2 = trace: a d c e h
     model: a ≫ c e h

γ3 = trace: a d c e f  d b e h
     model: a ≫ c e f2 ≫ b e h

γ1 shows a perfect alignment: all moves of the trace in the event log (top part of alignment)
can be followed by moves of the model (bottom part of alignment). γ2 shows an
optimal alignment for trace adceh in the event log and model M2. The first move of the
trace in the event log can be followed by the model (event a). However, in the second
position of the alignment, we see a move of the trace in the event log which cannot
be mimicked by the model. This move in just the log is denoted as (d, ≫). γ3 shows
an optimal alignment for trace adcefdbeh in the event log and model M2. Here, we encounter
two situations where log and model cannot move together. Also note the move
(f, f2), i.e., event f in the log corresponds to the execution of transition f2. Alignments
γ2 and γ3 clearly show the reasons for non-conformance between model and log. Such
problems can easily be quantified as shown in [Aalst et al. 2012; Adriansyah et al.
2011].
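To make the notion of an optimal alignment concrete, the sketch below searches for a cheapest sequence of moves using Dijkstra's algorithm over pairs (position in trace, marking). The structure of M2 is reconstructed from Fig. 2 and from the traces discussed in the text, and the unit cost for every move in just the log or just the model is an assumption of this sketch; the symbol `>>` stands for “no move”. The search terminates only for bounded nets such as M2 and is not meant as an efficient implementation.

```python
import heapq
from collections import Counter

SKIP = ">>"

# M2 as {transition: (activity label, input places, output places)};
# reconstructed from Fig. 2 (an assumption of this sketch).
M2 = {
    "a": ("a", ["start"], ["p1"]),
    "b": ("b", ["p1"], ["p2"]),
    "c": ("c", ["p1"], ["p2"]),
    "d": ("d", ["p2"], ["p2"]),
    "e": ("e", ["p2"], ["p3"]),
    "f1": ("f", ["p2"], ["p1"]),
    "f2": ("f", ["p3"], ["p1"]),
    "g": ("g", ["p3"], ["end"]),
    "h": ("h", ["p3"], ["end"]),
}


def fire(marking, ins, outs):
    """Return the successor marking, or None if the transition is not enabled."""
    tokens = Counter(dict(marking))
    if any(tokens[p] < 1 for p in ins):
        return None
    tokens.subtract(ins)
    tokens.update(outs)
    return tuple(sorted((+tokens).items()))


def align(net, trace, initial=(("start", 1),), final=(("end", 1),)):
    """Cheapest alignment as (cost, list of (log move, model move))."""
    queue = [(0, 0, (0, initial), [])]   # cost, tie-breaker, state, moves
    visited, tick = set(), 0
    while queue:
        cost, _, (i, marking), moves = heapq.heappop(queue)
        if (i, marking) in visited:
            continue
        visited.add((i, marking))
        if i == len(trace) and marking == final:
            return cost, moves
        steps = []
        if i < len(trace):               # move in just the log
            steps.append((1, (i + 1, marking), (trace[i], SKIP)))
        for name, (label, ins, outs) in net.items():
            successor = fire(marking, ins, outs)
            if successor is None:
                continue
            steps.append((1, (i, successor), (SKIP, name)))  # move in just the model
            if i < len(trace) and trace[i] == label:          # synchronous move
                steps.append((0, (i + 1, successor), (label, name)))
        for step_cost, state, move in steps:
            tick += 1
            heapq.heappush(queue, (cost + step_cost, tick, state, moves + [move]))
    return None


cost1, moves1 = align(M2, "acdeh")
cost2, moves2 = align(M2, "adceh")
cost3, moves3 = align(M2, "adcefdbeh")
```

Under these assumptions the three traces have alignment costs 0, 1, and 2, in line with γ1, γ2, and γ3: the second trace needs one move in just the log for d, and the third trace needs two.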
Conformance can be viewed from two angles: (a) the model does not capture the
real behavior (“the model is wrong”) and (b) reality deviates from the desired model
(“the event log is wrong”). The first viewpoint is taken when the model is supposed to
be descriptive, i.e., capture or predict reality. The second viewpoint is taken when the
model is normative, i.e., used to influence or control reality.
5. ENHANCEMENT
It is also possible to extend or improve an existing process model using the event log. A
non-fitting process model can be corrected using the diagnostics provided by the align-
ment of model and log. Moreover, event logs may contain information about resources,
timestamps, and case data. For example, an event referring to activity “register re-
quest” and case “992564” may also have attributes describing the person that regis-
tered the request (e.g., “John”), the time of the event (e.g., “30-01-2012:14.55”), the age
of the customer (e.g., “45”), and the claimed amount (e.g., “650 euro”). After aligning
model and log it is possible to replay the event log on the model. While replaying one
can analyze these additional attributes.
For example, it is possible to analyze waiting times in-between activities. Simply
measure the time difference between causally related events and compute basic statis-
tics such as averages, variances, and confidence intervals. This way it is possible to
identify the main bottlenecks [Aalst 2011].
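A minimal sketch of such a waiting-time analysis is given below. It measures the time between consecutive events of the same case and computes the mean per pair of activities. Treating directly consecutive events as “causally related” is a simplification of this sketch (a replay on the model would be needed in general), and the function name and the example timestamps are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def waiting_times(cases):
    """cases maps a case id to its time-ordered (activity, timestamp) pairs.

    Returns the observed gaps in hours per pair of consecutive activities.
    """
    gaps = defaultdict(list)
    for events in cases.values():
        for (a, t1), (b, t2) in zip(events, events[1:]):
            gaps[a, b].append((t2 - t1).total_seconds() / 3600.0)
    return gaps


# Hypothetical example cases.
cases = {
    "1": [("register request", datetime(2012, 1, 30, 9, 0)),
          ("decide", datetime(2012, 1, 30, 11, 0)),
          ("pay compensation", datetime(2012, 1, 30, 12, 0))],
    "2": [("register request", datetime(2012, 1, 30, 9, 0)),
          ("decide", datetime(2012, 1, 30, 13, 0)),
          ("reject request", datetime(2012, 1, 30, 14, 30))],
}

gaps = waiting_times(cases)
average_wait = {pair: mean(values) for pair, values in gaps.items()}
```

Pairs of activities with a high average waiting time are candidates for bottlenecks; variances and confidence intervals can be derived from the same lists of gaps.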
Information about resources can be used to discover roles, i.e., groups of people fre-
quently executing related activities. Here, standard clustering techniques can be used.
It is also possible to construct social networks based on the flow of work and analyze re-
source performance (e.g., the relation between workload and service times). See [Song
and Aalst 2008] for an overview of various process mining techniques analyzing the
organizational perspective based on event logs.
Standard classification techniques can be used to analyze the decision points in the
process model [Rozinat and Aalst 2006]. For example, activity e (“decide”) has three
possible outcomes (“pay”, “reject”, and “redo”). Using the data known about the case
prior to the decision, we can construct a decision tree explaining the observed behavior.
Process mining is not restricted to offline analysis and can also be used for predic-
tions and recommendations at runtime. For example, the completion time of a partially
handled customer order can be predicted using a discovered process model with timing
information [Aalst et al. 2011].
6. PROCESS MINING MANIFESTO
The IEEE Task Force on Process Mining recently released a manifesto describing guid-
ing principles and challenges [TFPM 2011]. The manifesto aims to increase the visibil-
ity of process mining as a new tool to improve the (re)design, control, and support of
operational business processes. It is intended to guide software developers, scientists,
consultants, and end-users. Before summarizing the manifesto, we briefly introduce
the task force.
6.1. Task Force on Process Mining
The growing interest in log-based process analysis motivated the establishment of the
IEEE Task Force on Process Mining. The goal of this task force is to promote the re-
search, development, education, and understanding of process mining. The task force
was established in 2009 in the context of the Data Mining Technical Committee of the
Computational Intelligence Society of the IEEE. Members of the task force include rep-
resentatives of more than a dozen commercial software vendors (e.g., Pallas Athena,
Software AG, Futura Process Intelligence, HP, IBM, Fujitsu, Infosys, and Fluxicon),
ten consultancy firms (e.g., Gartner and Deloitte) and over twenty universities.
Concrete objectives of the task force are: to make end-users, developers, consultants,
managers, and researchers aware of the state-of-the-art in process mining, to promote
the use of process mining techniques and tools, to stimulate new process mining ap-
plications, to play a role in standardization efforts for logging event data, to organize
tutorials, special sessions, workshops, panels, and to publish articles, books, videos,
and special issues of journals. For example, in 2010 the task force standardized XES
(www.xes-standard.org), a standard logging format that is extensible and supported
by the OpenXES library (www.openxes.org) and by tools such as ProM, XESame, Nitro,
etc. See http://www.win.tue.nl/ieeetfpm/ for recent activities of the task force.
6.2. Guiding Principles
As with any new technology, there are obvious mistakes that can be made when ap-
plying process mining in real-life settings. Therefore, the six guiding principles listed
in Table I aim to prevent users/analysts from making such mistakes. As an example,
consider Guiding Principle GP4: “Events Should Be Related to Model Elements”. It
is a misconception that process mining is limited to control-flow discovery; other perspectives
such as the organizational perspective, the time perspective, and the data
perspective are equally important. However, the control-flow perspective (i.e., the or-
dering of activities) serves as the layer connecting the different perspectives. There-
fore, it is important to relate events in the log to activities in the model. Conformance
checking and model enhancement heavily rely on this relationship. Using Fig. 2, we
showed for example alignment γ3 which relates observed trace adcefdbeh to firing sequence
acef2beh in M2. After relating events to model elements, it is possible to “replay”
the event log on the model [Aalst 2011]. Replay may be used to reveal discrepancies
between an event log and a model, e.g., some events in the log are not possible accord-
ing to the model. Techniques for conformance checking can be used to quantify and
diagnose such discrepancies. Timestamps in the event log can be used to analyze the
temporal behavior during replay. Time differences between causally related activities
can be used to add average/expected waiting times to the model. These examples illus-
trate the importance of guiding principle GP4; the relation between events in the log
and elements in the model serves as a starting point for different types of analysis.
6.3. Challenges
Process mining is an important tool for modern organizations that need to manage
non-trivial operational processes. On the one hand, there is an incredible growth of
event data. On the other hand, processes and information need to be aligned perfectly
in order to meet requirements related to compliance, efficiency, and customer service.
Despite the applicability of process mining there are still important challenges that
need to be addressed; these illustrate that process mining is an emerging discipline.
Table II lists the eleven challenges described in the manifesto [TFPM 2011].
As an example consider Challenge C4: “Dealing with Concept Drift”. The term con-
cept drift refers to the situation in which the process is changing while being analyzed
[Bose et al. 2011]. For instance, in the beginning of the event log two activities may be
concurrent whereas later in the log these activities become sequential. Processes may
change due to periodic/seasonal changes (e.g., “in December there is more demand” or
“on Friday afternoon there are fewer employees available”) or due to changing conditions
(e.g., “the market is getting more competitive”). Such changes impact processes
and it is vital to detect and analyze them [Bose et al. 2011].
Table II. Some of the Most Important Process Mining Challenges Identified in the Manifesto
C1: Finding, Merging, and Cleaning Event Data
    When extracting event data suitable for process mining several challenges need to be
    addressed: data may be distributed over a variety of sources, event data may be incomplete,
    an event log may contain outliers, logs may contain events at different levels of granularity,
    etc.
C2: Dealing with Complex Event Logs Having Diverse Characteristics
    Event logs may have very different characteristics. Some event logs may be extremely large
    making them difficult to handle whereas other event logs are so small that not enough data
    is available to draw reliable conclusions.
C3: Creating Representative Benchmarks
    Good benchmarks consisting of example data sets and representative quality criteria are
    needed to compare and improve the various tools and algorithms.
C4: Dealing with Concept Drift
    The process may be changing while being analyzed. Understanding such concept drifts is of
    prime importance for the management of processes.
C5: Improving the Representational Bias Used for Process Discovery
    A more careful and refined selection of the representational bias is needed to ensure
    high-quality process mining results.
C6: Balancing Between Quality Criteria such as Fitness, Simplicity, Precision, and Generalization
    There are four competing quality dimensions: (a) fitness, (b) simplicity, (c) precision, and (d)
    generalization. The challenge is to find models that score well in all four dimensions.
C7: Cross-Organizational Mining
    There are various use cases where event logs of multiple organizations are available for
    analysis. Some organizations work together to handle process instances (e.g., supply chain
    partners) or organizations are executing essentially the same process while sharing
    experiences, knowledge, or a common infrastructure. However, traditional process mining
    techniques typically consider one event log in one organization.
C8: Providing Operational Support
    Process mining is not restricted to off-line analysis and can also be used for online
    operational support. Three operational support activities can be identified: detect, predict,
    and recommend.
C9: Combining Process Mining With Other Types of Analysis
    The challenge is to combine automated process mining techniques with other analysis
    approaches (optimization techniques, data mining, simulation, visual analytics, etc.) to extract
    more insights from event data.
C10: Improving Usability for Non-Experts
    The challenge is to hide the sophisticated process mining algorithms behind user-friendly
    interfaces that automatically set parameters and suggest suitable types of analysis.
C11: Improving Understandability for Non-Experts
    The user may have problems understanding the output or is tempted to infer incorrect
    conclusions. To avoid such problems, the results should be presented using a suitable
    representation and the trustworthiness of the results should always be clearly indicated.
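The concept drift example for Challenge C4 (two activities that are concurrent early in the log and sequential later) can be illustrated with a deliberately naive Python sketch. It splits the log into an early and a late part and compares the directly-follows pairs observed in each part. This is not the technique of [Bose et al. 2011]; the function names, the split point, and the example traces are hypothetical.

```python
def directly_follows(traces):
    """All pairs (a, b) such that a is directly followed by b in some trace."""
    return {(s[i], s[i + 1]) for s in traces for i in range(len(s) - 1)}


def drifted_pairs(traces, split=None):
    """Compare the directly-follows pairs before and after the split point."""
    split = len(traces) // 2 if split is None else split
    early = directly_follows(traces[:split])
    late = directly_follows(traces[split:])
    return {"disappeared": early - late, "appeared": late - early}


# Hypothetical log in order of arrival: b and c are concurrent in the first
# four cases and strictly sequential in the last four.
traces = ["abcd", "acbd", "abcd", "acbd", "abcd", "abcd", "abcd", "abcd"]
drift = drifted_pairs(traces)
```

In this example the pairs (a, c), (c, b), and (b, d) disappear in the second half of the log, indicating that c is no longer observed before b. A realistic approach would also have to locate the change point and separate drift from noise.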
etc. are all supported by ProM [Aalst 2011; Verbeek et al. 2010]. For example, dozens
of different process discovery algorithms are supported by ProM. The functionality of
ProM is unprecedented, i.e., there is no product offering a comparable set of process
mining algorithms. However, the tool requires process mining expertise and is not sup-
ported by a commercial organization. Hence, it has the advantages and disadvantages
common for open-source software.
Fortunately, there is also a growing number of commercially available software prod-
ucts offering process mining capabilities. Examples are: ARIS Process Performance
Manager (Software AG), Comprehend (Open Connect), Discovery Analyst (Stereo-
LOGIC), Flow (Fourspark), Futura Reflect (Futura Process Intelligence), Interstage
Automated Process Discovery (Fujitsu), Process Discovery Focus (Iontas/Verint), Pro-
cessAnalyzer (QPR), and Reflect|one (Pallas Athena).
All of the products mentioned support process discovery, i.e., constructing a process
model based on an event log. For example, Futura Reflect supports genetic process
mining as described in [Medeiros et al. 2007]. Some of the systems mentioned have
difficulties discovering concurrency, e.g., ARIS Process Performance Manager, Flow,
and Interstage Automated Process Discovery. All systems take the timestamps in the
event log into account to be able to provide performance-related information, i.e., flow
times and bottlenecks can be discovered.
None of the commercial software products provides comprehensive support for con-
formance checking, i.e., the focus is on process discovery and performance measure-
ment. However, ProM supports the different types of conformance checking described
in Section 4.3.
Some of these products embed process mining functionality in a larger system, e.g.,
Pallas Athena embeds process mining in their BPM suite BPM|one. Other products
aim at simplifying process mining using an intuitive user interface.
7.2. Discovering “Spaghetti Processes”
There is a continuum of processes ranging from highly structured processes (Lasagna
processes) to unstructured processes (Spaghetti processes). Figure 3 shows why un-
structured processes are often called “Spaghetti processes”. The model was obtained
using ProM’s heuristic miner [Weijters and Aalst 2003]. Hence, infrequent behavior
has been filtered out. Nevertheless, the model is too difficult to comprehend. Note that
this is not necessarily a problem of the discovery algorithm. Activities are only con-
nected if they frequently followed one another in the event log. Hence, the complexity
shown in Fig. 3 reflects reality and is not caused by the discovery algorithm.
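The idea that activities are only connected if they frequently follow one another can be sketched in a few lines of Python. The sketch below is not ProM's implementation. It counts directly-follows pairs in a toy log and applies a dependency measure of the kind commonly used in heuristic mining, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1). The traces, the function names, and both thresholds are assumptions made for illustration only.

```python
from collections import Counter

def directly_follows(traces):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

def dependency(counts, a, b):
    """Dependency measure in (-1, 1); values near 1 suggest that a leads to b."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)

def dependency_graph(traces, min_count=2, min_dependency=0.6):
    """Keep only arcs that are both frequent and strongly directed."""
    counts = directly_follows(traces)
    return {
        (a, b): (n, round(dependency(counts, a, b), 3))
        for (a, b), n in counts.items()
        if n >= min_count and dependency(counts, a, b) >= min_dependency
    }

# Hypothetical toy log: each trace is the activity sequence of one case.
toy_log = [
    ["register", "check", "decide", "archive"],
    ["register", "check", "decide", "archive"],
    ["register", "check", "decide", "archive"],
    ["register", "check", "reopen", "decide", "archive"],
]
graph = dependency_graph(toy_log)
```

In this toy log the rare detour via "reopen" occurs only once, so its arcs fall below the frequency threshold and disappear from the graph, while the three main arcs remain. When most activity pairs in a real log pass the thresholds, the resulting graph stays dense regardless of the algorithm used.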
Figure 3 is an extreme example used to illustrate the characteristics of a typical
Spaghetti process. Given the data set it is not surprising that the process is unstruc-
tured; the 2765 patients did not form a homogeneous group and included individuals
with very different medical problems. The process model in Fig. 3 can be simplified
dramatically by selecting a group of patients with similar problems or by selecting
only the most frequent activities. Nevertheless, its complexity exemplifies some of the
challenges mentioned in the manifesto (in particular C1, C2, C6, C10, and C11).
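The two simplification strategies mentioned above, selecting a homogeneous group of cases and keeping only the most frequent activities, can be illustrated with a minimal Python sketch. The case attribute "diagnosis", the activity names, and the helper functions are hypothetical and do not come from the hospital log.

```python
from collections import Counter

def select_cases(cases, predicate):
    """Keep only the traces of cases whose attributes satisfy the predicate."""
    return [case["trace"] for case in cases if predicate(case)]

def keep_frequent_activities(traces, top_k):
    """Project every trace onto the top_k most frequent activities."""
    freq = Counter(act for trace in traces for act in trace)
    keep = {act for act, _ in freq.most_common(top_k)}
    return [[act for act in trace if act in keep] for trace in traces]

# Hypothetical cases; attribute names and values are invented.
cases = [
    {"diagnosis": "cardiac", "trace": ["intake", "ecg", "x-ray", "discharge"]},
    {"diagnosis": "cardiac", "trace": ["intake", "ecg", "discharge"]},
    {"diagnosis": "trauma", "trace": ["intake", "x-ray", "surgery", "discharge"]},
]
cardiac = select_cases(cases, lambda case: case["diagnosis"] == "cardiac")
simplified = keep_frequent_activities(cardiac, top_k=3)
```

Running discovery on the filtered traces rather than on the full log would typically yield a much smaller model, at the price of hiding the behavior that was filtered out.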
7.3. Analyzing “Lasagna Processes”
Processes in municipalities are typically Lasagna processes. Figure 4 shows a so-called
“WOZ process” discovered for a Dutch municipality. We applied the heuristic miner
[Weijters and Aalst 2003] to an event log containing information about 745 objections
against the so-called WOZ (“Waardering Onroerende Zaken”, i.e., Valuation of Real Es-
tate) valuation. Dutch municipalities need to estimate the value of houses and apart-
ments. The WOZ value is used as a basis for determining the real-estate property tax.
The higher the WOZ value, the more tax the owner needs to pay. Therefore, Dutch
municipalities need to handle many objections (i.e., appeals) of citizens that assert that
the WOZ value is too high. Figure 4 shows the process of handling these objections
within a particular municipality. The diagram is not intended to be readable; it is only
included to show the contrast with Fig. 3.
Fig. 3. Spaghetti process describing the diagnosis and treatment of 2765 patients in a Dutch hospital. The
process model was constructed based on an event log containing 114,592 events. There are 619 different
activities (taking event types into account) executed by 266 different individuals (doctors, nurses, etc.).
[Figure 4 diagram omitted. Transition labels (each with start and/or complete events): Domain: heus1, OZ02 Voorbereiden, OZ04 Incompleet, OZ06 Stop vordering, OZ08 Beoordelen, OZ09 Wacht Beoord, OZ10 Horen, OZ12 Hertaxeren, OZ15 Zelf uitspraak, OZ16 Uitspraak, OZ18 Uitspr. wacht, OZ20 Administatie, OZ24 Start vordering.]
Fig. 4. WF-net discovered based on an event log of a Dutch municipality. The log contains events related to
745 objections against the so-called WOZ valuation. These 745 objections generated 9583 events. There are
13 activities. For 12 of these activities both start and complete events are recorded. Hence, the WF-net has
25 transitions.
The discovered WF-net has a good fitness: 628 of the 745 cases can be replayed
without encountering any problems. The fitness of the model and log at the event level
is 0.98876214. This value is based on the approach described in [Aalst 2011; Rozinat
and Aalst 2008]. The high value shows that almost all recorded events are explained
by the model. Hence, the WOZ process is clearly a Lasagna process. Nevertheless, it
was interesting for the municipality to see the deviations highlighted in the model.
Figure 5 shows a fragment of the diagnostics provided by ProM’s conformance
checker.
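The gap between the case-level and the event-level view of fitness can be made concrete with a small calculation. The sketch below assumes the token-based replay fitness usually associated with [Rozinat and Aalst 2008], i.e., 0.5 (1 - m/c) + 0.5 (1 - r/p), where p, c, m, and r are the numbers of produced, consumed, missing, and remaining tokens summed over all replayed cases. The token counts in the example are hypothetical; only 628 and 745 are taken from the text.

```python
def token_replay_fitness(produced, consumed, missing, remaining):
    """Token-based fitness: 1.0 means no missing and no remaining tokens."""
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)

# Case-level view, using the numbers reported in the text.
perfectly_fitting_cases = 628
total_cases = 745
case_level = perfectly_fitting_cases / total_cases  # roughly 0.843

# Hypothetical token counts, NOT taken from the WOZ log.
example = token_replay_fitness(produced=1000, consumed=1000, missing=10, remaining=12)
```

A case with a single deviating event counts as non-fitting at the case level, yet most of its events still replay correctly. This is one reason why the event-level value can be close to 1 even though only about 84% of the cases replay without any problem.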
The municipality’s log contains timestamps. Therefore, it is possible to replay the
event log while taking the timestamps into account. ProM can visualize the phases of
the process that take most time. For example, the place in-between “OZ16 Uitspraak
start” (start of announcement of final judgment) and “OZ16 Uitspraak complete” (end
of announcement of final judgment) was visited 436 times. The average time spent in
this place is 7.84 days. This indicates that activity “OZ16 Uitspraak” (final judgment)
takes about a week. It is also possible to simply select two activities and measure the
time that passes in-between these activities. On average 202.73 days pass in-between
the completion of activity “OZ02 Voorbereiden” (preparation) and the completion of
“OZ16 Uitspraak” (final judgment). Such examples illustrate that process mining,
unlike classical Business Intelligence (BI) tools, helps organizations to “look inside”
their processes. This is in stark contrast with contemporary BI tools that typically
focus on reporting and fancy-looking dashboards.
Fig. 5. Fragment of the WF-net annotated with diagnostics generated by ProM’s conformance checker.
The WF-net and event log fit well (fitness is 0.98876214). Nevertheless, several low-frequency deviations
are discovered. For example, activity “OZ12 Hertaxeren” (re-evaluation of WOZ value) is started 23 times
without being enabled according to the model.
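Measuring the time that passes between two activities, as done above for “OZ02 Voorbereiden” and “OZ16 Uitspraak”, requires little more than timestamps per case. The following Python sketch shows one possible way to compute such an average. It is not how ProM computes it; the log layout (a dictionary of event tuples per case) and all timestamps are invented for illustration.

```python
from datetime import datetime
from statistics import mean

def mean_days_between(log, first, second):
    """Average number of days from completion of `first` to completion of
    `second`, over all cases in which both activities were completed."""
    durations = []
    for events in log.values():
        # If an activity completes more than once, the last completion is used.
        done = {act: ts for act, lifecycle, ts in events if lifecycle == "complete"}
        if first in done and second in done:
            delta = done[second] - done[first]
            durations.append(delta.total_seconds() / 86400)
    return mean(durations) if durations else None

# Hypothetical log: case id -> list of (activity, lifecycle, timestamp).
log = {
    "case-1": [
        ("OZ02 Voorbereiden", "complete", datetime(2011, 1, 1)),
        ("OZ16 Uitspraak", "start", datetime(2011, 1, 8)),
        ("OZ16 Uitspraak", "complete", datetime(2011, 1, 11)),
    ],
    "case-2": [
        ("OZ02 Voorbereiden", "complete", datetime(2011, 3, 1)),
        ("OZ16 Uitspraak", "complete", datetime(2011, 3, 21)),
    ],
    "case-3": [
        ("OZ02 Voorbereiden", "complete", datetime(2011, 4, 1)),
    ],
}
avg_days = mean_days_between(log, "OZ02 Voorbereiden", "OZ16 Uitspraak")
```

Cases in which one of the two activities never completes (such as the third case here) are skipped, which is a design choice of this sketch; a real analysis would need to decide how to treat such cases.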
8. CONCLUSION
This paper introduced process mining as a new technology enabling evidence-based
process analysis. We introduced the three basic types of process mining (discovery, con-
formance, and enhancement) using a small example and used some larger examples to
illustrate the applicability in real-life settings. Nevertheless, there are still many open
scientific challenges and most end-user organizations are not yet aware of the poten-
tial of process mining. This triggered the development of the Process Mining Manifesto
by an international task force involving 77 process mining experts representing 53
organizations. This manifesto can be obtained from http://www.win.tue.nl/ieeetfpm/.
The reader interested in process mining is also referred to the recent book on process
mining [Aalst 2011]. Also visit www.processmining.org for sample logs, videos, slides,
articles, and software.
ACKNOWLEDGMENTS
The author would like to thank all who contributed to the Process Mining Manifesto: Arya Adriansyah,
Ana Karla Alves de Medeiros, Franco Arcieri, Thomas Baier, Tobias Blickle, Jagadeesh Chandra Bose, Pe-
ter van den Brand, Ronald Brandtjen, Joos Buijs, Andrea Burattin, Josep Carmona, Malu Castellanos,
Jan Claes, Jonathan Cook, Nicola Costantini, Francisco Curbera, Ernesto Damiani, Massimiliano de Leoni,
Pavlos Delias, Boudewijn van Dongen, Marlon Dumas, Schahram Dustdar, Dirk Fahland, Diogo R. Ferreira,
Walid Gaaloul, Frank van Geffen, Sukriti Goel, Christian Günther, Antonella Guzzo, Paul Harmon, Arthur
ter Hofstede, John Hoogland, Jon Espen Ingvaldsen, Koki Kato, Rudolf Kuhn, Akhil Kumar, Marcello La
Rosa, Fabrizio Maggi, Donato Malerba, Ronny Mans, Alberto Manuel, Martin McCreesh, Paola Mello, Jan
Mendling, Marco Montali, Hamid Motahari Nezhad, Michael zur Muehlen, Jorge Munoz-Gama, Luigi Pon-
tieri, Joel Ribeiro, Anne Rozinat, Hugo Seguel Pérez, Ricardo Seguel Pérez, Marcos Sepúlveda, Jim Sinur,
Pnina Soffer, Minseok Song, Alessandro Sperduti, Giovanni Stilo, Casper Stoel, Keith Swenson, Maurizio
Talamo, Wei Tan, Chris Turner, Jan Vanthienen, George Varvaressos, Eric Verbeek, Marc Verdonk, Roberto
Vigo, Jianmin Wang, Barbara Weber, Matthias Weidlich, Ton Weijters, Lijie Wen, Michael Westergaard, and
Moe Wynn.
REFERENCES
AALST, W. VAN DER 2011. Process Mining: Discovery, Conformance and Enhancement of Business Processes.
Springer-Verlag, Berlin.
AALST, W. VAN DER, ADRIANSYAH, A., AND DONGEN, B. VAN 2012. Replaying History on Process Models
for Conformance Checking and Performance Analysis. WIREs Data Mining and Knowledge Discovery.
AALST, W. VAN DER, DONGEN, B., HERBST, J., MARUSTER, L., SCHIMM, G., AND WEIJTERS, A. 2003.
Workflow Mining: A Survey of Issues and Approaches. Data and Knowledge Engineering 47, 2, 237–267.
AALST, W. VAN DER, HEE, K. VAN, HOFSTEDE, A., SIDOROVA, N., VERBEEK, H., VOORHOEVE, M., AND
WYNN, M. 2011. Soundness of Workflow Nets: Classification, Decidability, and Analysis. Formal Aspects
of Computing 23, 3, 333–363.
AALST, W. VAN DER, REIJERS, H., WEIJTERS, A., DONGEN, B. VAN, MEDEIROS, A., SONG, M., AND
VERBEEK, H. 2007. Business Process Mining: An Industrial Application. Information Systems 32, 5,
713–732.
AALST, W. VAN DER, RUBIN, V., VERBEEK, H., DONGEN, B. VAN, KINDLER, E., AND GÜNTHER, C. 2010.
Process Mining: A Two-Step Approach to Balance Between Underfitting and Overfitting. Software and
Systems Modeling 9, 1, 87–111.
AALST, W. VAN DER, SCHONENBERG, M., AND SONG, M. 2011. Time Prediction Based on Process Mining.
Information Systems 36, 2, 450–475.
AALST, W. VAN DER AND STAHL, C. 2011. Modeling Business Processes: A Petri Net Oriented Approach. MIT
Press, Cambridge, MA.
AALST, W. VAN DER, WEIJTERS, A., AND MARUSTER, L. 2004. Workflow Mining: Discovering Process
Models from Event Logs. IEEE Transactions on Knowledge and Data Engineering 16, 9, 1128–1142.
ADRIANSYAH, A., DONGEN, B. VAN, AND AALST, W. VAN DER 2011. Conformance Checking using
Cost-Based Fitness Analysis. In IEEE International Enterprise Computing Conference (EDOC 2011), C. Chi
and P. Johnson, Eds. IEEE Computer Society, 55–64.
AGRAWAL, R., GUNOPULOS, D., AND LEYMANN, F. 1998. Mining Process Models from Workflow Logs. In
Sixth International Conference on Extending Database Technology. Lecture Notes in Computer Science
Series, vol. 1377. Springer-Verlag, Berlin, 469–483.
BERGENTHUM, R., DESEL, J., LORENZ, R., AND MAUSER, S. 2007. Process Mining Based on Regions
of Languages. In International Conference on Business Process Management (BPM 2007), G. Alonso,
P. Dadam, and M. Rosemann, Eds. Lecture Notes in Computer Science Series, vol. 4714.
Springer-Verlag, Berlin, 375–383.
BOSE, R., AALST, W. VAN DER, ZLIOBAITE, I., AND PECHENIZKIY, M. 2011. Handling Concept Drift in
Process Mining. In International Conference on Advanced Information Systems Engineering (Caise 2011),
H. Mouratidis and C. Rolland, Eds. Lecture Notes in Computer Science Series, vol. 6741.
Springer-Verlag, Berlin, 391–405.
COOK, J. AND WOLF, A. 1998. Discovering Models of Software Processes from Event-Based Data. ACM
Transactions on Software Engineering and Methodology 7, 3, 215–249.
CORTADELLA, J., KISHINEVSKY, M., LAVAGNO, L., AND YAKOVLEV, A. 1998. Deriving Petri Nets from
Finite Transition Systems. IEEE Transactions on Computers 47, 8, 859–882.
DATTA, A. 1998. Automating the Discovery of As-Is Business Process Models: Probabilistic and Algorithmic
Approaches. Information Systems Research 9, 3, 275–301.
DESEL, J. AND REISIG, W. 1998. Place/Transition Nets. In Lectures on Petri Nets I: Basic Models, W. Reisig
and G. Rozenberg, Eds. Lecture Notes in Computer Science Series, vol. 1491. Springer-Verlag, Berlin,
122–173.
DONGEN, B. VAN AND AALST, W. VAN DER 2004. Multi-Phase Process Mining: Building Instance Graphs.
In International Conference on Conceptual Modeling (ER 2004), P. Atzeni, W. Chu, H. Lu, S. Zhou, and
T. Ling, Eds. Lecture Notes in Computer Science Series, vol. 3288. Springer-Verlag, Berlin, 362–376.
DONGEN, B. AND AALST, W. VAN DER 2005. Multi-Phase Mining: Aggregating Instances Graphs into EPCs
and Petri Nets. In Proceedings of the Second International Workshop on Applications of Petri Nets to
Coordination, Workflow and Business Process Management, D. Marinescu, Ed. Florida International
University, Miami, Florida, USA, 35–58.
DONGEN, B. VAN, BUSI, N., PINNA, G., AND AALST, W. VAN DER 2007. An Iterative Algorithm for Applying
the Theory of Regions in Process Mining. In Proceedings of the Workshop on Formal Approaches to
Business Processes and Web Services (FABPWS’07), W. Reisig, K. Hee, and K. Wolf, Eds. Publishing
House of University of Podlasie, Siedlce, Poland, 36–55.
EHRENFEUCHT, A. AND ROZENBERG, G. 1989. Partial (Set) 2-Structures - Part 1 and Part 2. Acta
Informatica 27, 4, 315–368.
GRECO, G., GUZZO, A., PONTIERI, L., AND SACCÀ, D. 2006. Discovering Expressive Process Models by
Clustering Log Traces. IEEE Transactions on Knowledge and Data Engineering 18, 8, 1010–1027.
GÜNTHER, C. AND AALST, W. VAN DER 2007. Fuzzy Mining: Adaptive Process Simplification Based on
Multi-perspective Metrics. In International Conference on Business Process Management (BPM 2007),
G. Alonso, P. Dadam, and M. Rosemann, Eds. Lecture Notes in Computer Science Series, vol. 4714.
Springer-Verlag, Berlin, 328–343.
HAND, D., MANNILA, H., AND SMYTH, P. 2001. Principles of Data Mining. MIT Press, Cambridge, MA.
HERBST, J. 2000. A Machine Learning Approach to Workflow Management. In Proceedings 11th European
Conference on Machine Learning. Lecture Notes in Computer Science Series, vol. 1810. Springer-Verlag,
Berlin, 183–194.
TFPM – IEEE TASK FORCE ON PROCESS MINING. 2011. Process Mining Manifesto. In BPM Workshops.
Lecture Notes in Business Information Processing Series, vol. 99. Springer-Verlag, Berlin.
MANYIKA, J., CHUI, M., BROWN, B., BUGHIN, J., DOBBS, R., ROXBURGH, C., AND BYERS, A. 2011. Big
Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute.
MEDEIROS, A., WEIJTERS, A., AND AALST, W. VAN DER 2007. Genetic Process Mining: An Experimental
Evaluation. Data Mining and Knowledge Discovery 14, 2, 245–304.
MUNOZ-GAMA, J. AND CARMONA, J. 2011. Enhancing Precision in Process Conformance: Stability,
Confidence and Severity. In IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2011),
N. Chawla, I. King, and A. Sperduti, Eds. IEEE, Paris, France.
ROZINAT, A. AND AALST, W. VAN DER 2006. Decision Mining in ProM. In International Conference on
Business Process Management (BPM 2006), S. Dustdar, J. Fiadeiro, and A. Sheth, Eds. Lecture Notes in
Computer Science Series, vol. 4102. Springer-Verlag, Berlin, 420–425.
ROZINAT, A. AND AALST, W. VAN DER 2008. Conformance Checking of Processes Based on Monitoring Real
Behavior. Information Systems 33, 1, 64–95.
SOLE, M. AND CARMONA, J. 2010. Process Mining from a Basis of Regions. In Applications and Theory of
Petri Nets 2010, J. Lilius and W. Penczek, Eds. Lecture Notes in Computer Science Series, vol. 6128.
Springer-Verlag, Berlin, 226–245.
SONG, M. AND AALST, W. VAN DER 2008. Towards Comprehensive Support for Organizational Mining.
Decision Support Systems 46, 1, 300–317.
VERBEEK, H., BUIJS, J., DONGEN, B. VAN, AND AALST, W. VAN DER 2010. ProM 6: The Process Mining
Toolkit. In Proc. of BPM Demonstration Track 2010, M. L. Rosa, Ed. CEUR Workshop Proceedings
Series, vol. 615. 34–39.
WEIJTERS, A. AND AALST, W. VAN DER 2003. Rediscovering Workflow Models from Event-Based Data using
Little Thumb. Integrated Computer-Aided Engineering 10, 2, 151–162.
WERF, J., DONGEN, B. VAN, HURKENS, C., AND SEREBRENIK, A. 2010. Process Discovery using Integer
Linear Programming. Fundamenta Informaticae 94, 387–412.
WESKE, M. 2007. Business Process Management: Concepts, Languages, Architectures. Springer-Verlag,
Berlin.