081 ICPMDemos2020 VDD AVisualDriftDetectionSystemforProcessMining POSTPRINT
VDD: A Visual Drift Detection System for Process Mining
Anton Yeshchenko, Jan Mendling: Vienna University of Economics and Business, Vienna, Austria
Claudio Di Ciccio: Sapienza University of Rome, Rome, Italy
Artem Polyvyanyy: The University of Melbourne, Melbourne, Australia
two drift points.

We also compute the values of two measures, called spread of constraints and erratic measure, to quantify the extent of the drifting behavior [14]. The spread of constraints (shown in Fig. 2(e)) intuitively indicates how variable and subject to change the event log is. The measure ranges from 0 to 1: the more the behavior changes over time, the higher the value. In the Sepsis log, the measured spread of constraints is 0.247, which indicates a relatively small rate of change in the behavior. The erratic measure (shown in Fig. 2(d)) shows how a chosen cluster (Fig. 2(i)) compares to the cluster with the maximum degree of change in the same log.

4) Drift type detection
In this step, we use a range of methods to analyze drift types (such as those shown in Fig. 1) and visualize them in the connected views. We use multivariate time series change point detection algorithms to detect sudden drifts. In particular, we resort to the Pruned Exact Linear Time (PELT) algorithm [20] to detect change points both in the whole multivariate time series and within the behavior clusters. Thereupon, we combine the stationarity analysis with the visual inspection of Drift Charts to highlight gradual and incremental drifts. With the aid of autocorrelation plots, we seek the behavior clusters exposing reoccurring drifts.

To show the results of this step, we resort to a mix of graphical and numerical representations: the aforementioned Drift Map together with Drift Charts, autocorrelation plots, and stationarity tests. In the chosen cluster 18, the system automatically identifies two sudden drifts, as shown in the Drift Chart (Fig. 2(b)). To check for incremental drifts, we inspect the results of the stationarity test (shown in Fig. 2(f)). For the chosen behavior cluster, the VDD system reports no incremental drift. Figure 2(c) depicts an autocorrelation plot that shows how the time series correlates with itself at a step (lag) defined on the y-axis. The blue area on this plot shows the significant region of the analysis. Cluster 18 reveals an autocorrelation at step 2, meaning that the drift shows signs of seasonality and can thus be classified as a reoccurring drift.

5) Understanding the drift behavior
To understand the effect of drifts on the process behavior, we visually represent the general behavior found in the log, extended with the specific behavior of a chosen behavior cluster. In particular, we take the measured DECLARE constraints of a behavior cluster and draw them on top of Directly-Follows graphs [12], such as the one in Fig. 2(g). A Directly-Follows graph connects via arcs the activities (nodes) with those other activities that directly followed them at least once in a trace. Arcs are weighted by the number of such sequences. Nodes are weighted by the frequency with which the related activities occur in the log. The Directly-Follows graph depicts the behavior that is common to the entire event log.
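The Directly-Follows graph described above, and the kind of DECLARE constraints drawn on top of it, can be illustrated with a small sketch. The following Python code is a minimal illustration under simplifying assumptions, not the VDD implementation (which relies on PM4Py and MINERful): it weights arcs and nodes over a toy log of invented traces, and checks PRECEDENCE and CHAINPRECEDENCE trace by trace. The helper names (`directly_follows_graph`, `precedence_holds`, `chain_precedence_holds`, `trace_support`) and the per-trace support measure are hypothetical; the measures computed by MINERful may differ.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Trace = Sequence[str]


def directly_follows_graph(log: List[Trace]) -> Tuple[Counter, Counter]:
    """Return (arc weights, node weights) of a Directly-Follows graph.

    Arc (a, b) is weighted by how often b directly follows a in a trace;
    node a is weighted by how often activity a occurs in the log.
    """
    arcs: Counter = Counter()
    nodes: Counter = Counter()
    for trace in log:
        nodes.update(trace)
        # Pair each event with the event that directly follows it.
        arcs.update(zip(trace, trace[1:]))
    return arcs, nodes


def precedence_holds(trace: Trace, a: str, b: str) -> bool:
    """PRECEDENCE(a, b): every occurrence of b has some a earlier in the trace."""
    seen_a = False
    for event in trace:
        if event == b and not seen_a:
            return False
        if event == a:
            seen_a = True
    return True


def chain_precedence_holds(trace: Trace, a: str, b: str) -> bool:
    """CHAINPRECEDENCE(a, b): every occurrence of b is immediately preceded by a."""
    return all(
        i > 0 and trace[i - 1] == a
        for i, event in enumerate(trace)
        if event == b
    )


def trace_support(log: List[Trace],
                  check: Callable[[Trace, str, str], bool],
                  a: str, b: str) -> float:
    """Hypothetical measure: fraction of traces in which the constraint holds."""
    return sum(check(trace, a, b) for trace in log) / len(log)


if __name__ == "__main__":
    # Toy log with invented activity labels (not taken from the Sepsis log).
    toy_log = [["A", "B", "C"], ["A", "C", "B"], ["B", "C"]]
    arcs, nodes = directly_follows_graph(toy_log)
    print(arcs[("B", "C")], nodes["A"])  # 2 2
    print(trace_support(toy_log, precedence_holds, "A", "B"))  # 0.6666666666666666
    print(trace_support(toy_log, chain_precedence_holds, "B", "C"))  # 0.6666666666666666
```

Because CHAINPRECEDENCE(a, b) implies PRECEDENCE(a, b) on every trace, the chaining variant can never reach a higher support than its PRECEDENCE counterpart in this sketch; in the toy log, CHAINPRECEDENCE(A, B) holds in one trace out of three, while PRECEDENCE(A, B) holds in two.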
We add arcs highlighted with different colors that represent additional, cluster-specific DECLARE constraints. Negative DECLARE constraints are colored in red, chaining constraints in green, and all other relationships in blue. For cluster 18, we see from Fig. 2(g) that activities Release C and Leucocytes occur in sequence, bound by the CHAINPRECEDENCE(Release C, Leucocytes) constraint. Furthermore, PRECEDENCE(Release C, IV Liquid) and PRECEDENCE(Release C, IV Antibiotics) suggest that, unlike in the general behavior, IV Liquid and IV Antibiotics require Release C to occur before them.

III. MATURITY, DOCUMENTATION AND SCREENCAST
We implemented the VDD system as a Python-based stand-alone program for command-line execution, and as a web application with back-end and front-end parts. The algorithms are implemented in Python 3, resorting to the scipy library for time-series clustering and to the ruptures library for change point identification. We use PM4Py² [21] for the Directly-Follows graph visualization, and the MINERful³ Java package for the discovery and measuring of DECLARE constraints [10]. The front-end of the tool is implemented with the React JavaScript library; the back-end is implemented with the flask Python library. We ran our experiments on a laptop equipped with an Intel Core i5 at 2.40 GHz × 2 and 8 GB of RAM. With this modest hardware, the tool processed the data and produced the analysis outcome in about 17 seconds on a real-size event log with 15 214 events from 16 activities over 1050 traces. This indicates that the VDD system has reached a fairly large degree of maturity, as it performs well in terms of scalability.

We have created a project website for the VDD system, from which it can be downloaded together with its sources at https://github.com/yesanton/Process-Drift-Visualization-With-Declare. It is free for academic and non-commercial use under the MIT license. On the project website, we provide documentation on its installation and first run. The web tool with a graphical interface is also available at https://yesanton.github.io/driftvis, to be used for testing without the need to install the software on a local machine. A screencast documenting its usage is available at https://youtu.be/mHOgVBZ4Imc. The GitHub project page contains a step-by-step tutorial on how to use the web-based tool, available at https://github.com/yesanton/Process-Drift-Visualization-With-Declare/blob/master/publications/icpm-2020-demo-tutorial.pdf.

In future work, we will focus on the prediction of drifts in running processes and on improving the interactivity of the visualization system. Furthermore, we will conduct user studies to assess the perceived quality of the tool.

Acknowledgements. This work is partially funded by the EU H2020 program under MSCA-RISE agreement 645751 (RISE BPM). Artem Polyvyanyy is partly supported by the Australian Research Council Discovery Project DP180102839. Claudio Di Ciccio is partly supported by the MIUR under grant “Dipartimenti di eccellenza 2018-2022” of the Department of Computer Science of Sapienza University of Rome. Anton Yeshchenko thanks Maryna Zadoianchuk and Oleksii Tkachenko for their assistance during the development of the web application.

REFERENCES
[1] W. M. P. van der Aalst, Process Mining - Data Science in Action. Springer, 2016.
[2] M. L. van Eck, X. Lu, S. J. J. Leemans, and W. M. P. van der Aalst, “PM²: A process mining project methodology,” in CAiSE. Springer, 2015, pp. 297–313.
[3] R. Moreno and R. E. Mayer, “Visual presentations in multimedia learning: Conditions that overload visual working memory,” in VISUAL, D. P. Huijsmans and A. W. M. Smeulders, Eds. Springer, 1999, pp. 793–800.
[4] A. Maaradji, M. Dumas, M. La Rosa, and A. Ostovar, “Detecting sudden and gradual drifts in business processes from execution traces,” IEEE TKDE, vol. 29, no. 10, pp. 2140–2154, 2017.
[5] C. Zheng, L. Wen, and J. Wang, “Detecting process concept drifts from event logs,” in OTM. Springer, 2017, pp. 524–542.
[6] A. Ostovar, S. J. J. Leemans, and M. La Rosa, “Robust drift characterization from event streams of business processes,” ACM Trans. Knowl. Discov. Data, vol. 14, no. 3, pp. 30:1–30:57, 2020. [Online]. Available: https://doi.org/10.1145/3375398
[7] J. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, and A. Bouchachia, “A survey on concept drift adaptation,” ACM Comput. Surv., vol. 46, no. 4, pp. 44:1–44:37, 2014.
[8] C. Ware, Information visualization: perception for design. Elsevier, 2012.
[9] W. M. P. van der Aalst and M. Pesic, “DecSerFlow: Towards a truly declarative service flow language,” in WS-FM, ser. Lecture Notes in Computer Science, vol. 4184. Springer, 2006, pp. 1–23.
[10] C. Di Ciccio and M. Mecella, “On the discovery of declarative control flows for artful processes,” ACM TMIS, vol. 5, no. 4, pp. 24:1–24:37, 2015.
[11] G. C. Reinsel, Elements of multivariate time series analysis. Springer, 1993.
[12] S. J. Leemans, D. Fahland, and W. M. van der Aalst, “Discovering block-structured process models from event logs - A constructive approach,” in PETRI NETS. Springer, 2013, pp. 311–329.
[13] F. Mannhardt and D. Blinde, “Analyzing the trajectories of patients with sepsis using process mining,” in BPMDS/EMMSAD. CEUR-WS.org, 2017, pp. 72–80.
[14] A. Yeshchenko, C. Di Ciccio, J. Mendling, and A. Polyvyanyy, “Comprehensive process drift detection with visual analytics,” in ER. Springer, 2019, in print.
[15] A. Yeshchenko, C. Di Ciccio, J. Mendling, and A. Polyvyanyy, “Comprehensive process drift analysis with the visual drift detection tool,” in ER Demos. CEUR-WS.org, 2019, pp. 108–112.
[16] “IEEE standard for extensible event stream (XES) for achieving interoperability in event logs and event streams,” pp. 1–50, Nov 2016. [Online]. Available: http://dx.doi.org/10.1109/IEEESTD.2016.7740858
[17] W. M. P. van der Aalst and M. Pesic, “DecSerFlow: Towards a truly declarative service flow language,” in WS-FM. Springer, 2006, pp. 1–23.
[18] J. Adamo, Data mining for association rules and sequential patterns - sequential and parallel algorithms, J. Adamo, Ed. Springer New York, 2001.
[19] S. Aghabozorgi, A. Seyed Shirkhorshidi, and T. Ying Wah, “Time-series clustering - a decade review,” IS, vol. 53, no. C, pp. 16–38, Oct. 2015.
[20] R. Killick, P. Fearnhead, and I. A. Eckley, “Optimal detection of changepoints with a linear computational cost,” Journal of the American Statistical Association, vol. 107, no. 500, pp. 1590–1598, 2012.
[21] A. Berti, S. J. van Zelst, and W. M. P. van der Aalst, “Process mining for python (pm4py): Bridging the gap between process- and data science,” CoRR, vol. abs/1905.06169, 2019.
2 http://pm4py.org, https://github.com/pm4py
3 https://github.com/cdc08x/MINERful