Machine Learning in IT Service Management

Dmitry Zuev, Alexey Kalistratov, Andrey Zuev
Abstract

IT Service Management (ITSM) is a variety of activities directed towards the maintenance of IT infrastructure. Hence, it is considered an important activity for any company, even one not related to IT. Incident resolution time is the key performance indicator for ITSM. To reduce resolution time, the authors propose an infrastructure incident prediction model based on machine learning technologies. Applying machine learning models in ITSM significantly improves customer experience and allows issues to be handled more efficiently, decreasing service desk agents' efforts and reducing service costs. This paper aims to propose a predictive method for estimating incident resolution time. The proposed model derives insights from incident data and predicts the estimated time of resolution, allowing detection of incident resolution delays. The authors analyzed prediction accuracy and derived results demonstrating that the model can assist with prediction for a large set of incidents, using a dataset based on real service desk incident data. Additionally, its practical use is applicable to the service improvement process.
© 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the 9th Annual International Conference on Biologically Inspired Cognitive Architectures.
Keywords: machine learning, information technology services management, gradient boosting decision trees
1. Introduction
IT Service Management (ITSM) is a variety of activities directed towards maintenance of IT infrastructure. Hence,
it is an important activity for any company, even for one not related to IT [1]. Incident management (IM) is an ITSM
process area. The first goal of the IM process is to restore a normal service operation as quickly as possible and to
minimize the impact on business operations, thus ensuring that the best possible levels of service quality and
availability are maintained. Normal service operation is defined here as service operation within service-level
agreement (SLA). An SLA is a document that describes the performance criteria a provider promises to meet while
delivering a service [2]. This agreement also sets out the remedial actions and any penalties that will take effect if
performance falls below the promised standard. It is an essential component of the legal contract between a service
consumer and the provider. Incident resolution time is the key performance indicator for IM. Ticket management systems store a lot of data: ticket creation time, SLA criteria, assigned specialist, ticket priority and its impact, etc. In the rest of this paper we will assume that the SLA is the time agreed for the ticket to be resolved. Under this assumption we have a better chance of predicting potential SLA violations.
In this paper we use an ITSM dataset from a fast food restaurant chain. The objective is to develop a predictive model that can help improve the quality of IT service.
The evaluation metric is the F1 score, the harmonic mean of precision and recall. Precision and recall are common metrics used in classification; the F1 score takes both into account.
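For reference, precision, recall and the F1 score are defined as:

\[
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
\]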
The paper is organized as follows. Section 2 introduces the features generated through analysis, Section 3 describes further feature analysis and compares sampling techniques. Section 4 introduces the model used to generate the final predictions. The paper concludes with directions for future work.
2. Feature engineering
Feature engineering is an essential part of building any intelligent system, and this process is both difficult and expensive. To obtain the training dataset, we first pre-process the incoming raw data: filter out tickets that are not yet closed, transform duration fields from strings to integers (they represent duration in minutes), convert date-time strings to date-time objects and encode categorical fields as integers.
To label a training instance, we use "1" to indicate that the ticket's SLA was violated (a positive instance) and "0" to indicate that the SLA was not violated (a negative instance).
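As a minimal illustration of this pre-processing and labelling step, a pandas sketch follows; the column names (status, duration_min, sla_min, created_at, category, specialist, priority) are hypothetical, since the actual field names of the ticket system are not given here.

import pandas as pd

def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    # Keep only closed tickets
    df = raw[raw["status"] == "closed"].copy()
    # Duration fields arrive as strings of minutes; cast them to integers
    df["duration_min"] = df["duration_min"].astype(int)
    df["sla_min"] = df["sla_min"].astype(int)
    # Convert the date-time string to a date-time object
    df["created_at"] = pd.to_datetime(df["created_at"])
    # Encode categorical fields as integer codes
    for col in ("category", "specialist", "priority"):
        df[col] = df[col].astype("category").cat.codes
    # Label: 1 if the resolution time violated the SLA (positive), else 0 (negative)
    df["label"] = (df["duration_min"] > df["sla_min"]).astype(int)
    return df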
The feature engineering covers the patterns of the ticket, the specialist and the ticket's category individually, as well as the interaction patterns between pairs of them [3]. The architecture of the features is shown below (a pandas sketch of such aggregate computations follows the list):
Ticket feature:
o Derived features
Specialist feature:
o Basic aggregated features
o Probability of violation
Category feature:
o Basic aggregated features
o Probability of violation
Category-Specialist feature:
o Basic aggregated features
o Probability of violation
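Continuing the sketch above, the basic aggregated features and the probability of violation for a specialist could be computed roughly as follows (the window of n = 30 days is an arbitrary placeholder):

# Restrict history to the last n days (placeholder window)
n = 30
history = df[df["created_at"] >= df["created_at"].max() - pd.Timedelta(days=n)]

# Basic aggregated features per specialist
spec_stats = history.groupby("specialist").agg(
    tickets_closed=("label", "size"),        # number of tickets closed in the window
    violations=("label", "sum"),             # number of SLA violations
    avg_duration=("duration_min", "mean"),   # average resolution time in minutes
)
# Empirical probability of SLA violation per specialist
spec_stats["p_violation"] = spec_stats["violations"] / spec_stats["tickets_closed"]

# Category and specialist-category features are built in the same way,
# grouping by "category" or by ["specialist", "category"] instead.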
To make the prediction precise, it is very important to describe a ticket thoroughly [4,5]. To do this well, we studied the cleaned data and found that the day of the week and the time of creation are very important for the final result. For example, most of the tickets created on weekends were resolved with SLA violations.
After additional analysis we listed the following key features:
Ticket creation day of week
Ticket creation day is day off
The basic aggregated features for a specialist are computed over the tickets handled in a recent window, where Dates is the set of the last n days and Tickets_{d,s} is the set of tickets in day d closed by specialist s. The probability of violation for a specialist is the share of these tickets whose resolution time violated the SLA, where isViolated is a Boolean function indicating that ticket t's resolution time violated the SLA.
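Written out explicitly, these two specialist features take the following form (the symbols N_s and P_s are our own notation, introduced here for illustration):

\[
N_s = \sum_{d \in Dates} \left| Tickets_{d,s} \right|, \qquad
P_s = \frac{\sum_{d \in Dates} \sum_{t \in Tickets_{d,s}} isViolated(t)}{\sum_{d \in Dates} \left| Tickets_{d,s} \right|}
\]

The category and specialist-category analogues are obtained by replacing the subscript s with c or (s, c).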
Besides the specialist's and ticket's features listed above, the raw average resolution duration and its ratio to the SLA in a selected set of categories can provide information about the pattern of the ticket.
The average resolution time of tickets in a category over the last n days, A_c, is computed from these tickets, where duration_t is the resolution duration of ticket t in minutes, Dates is the set of the last n days and Tickets_{d,c} is the set of tickets in day d in category c.
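Under this definition, the category average A_c used below can be written as:

\[
A_c = \frac{\sum_{d \in Dates} \sum_{t \in Tickets_{d,c}} duration_t}{\sum_{d \in Dates} \left| Tickets_{d,c} \right|}
\]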
The ratio between the category's average resolution time over the last n days and its SLA duration:

R_c = A_c / SLA_c    (5)
The specialist-category features are constructed similarly to the specialist ones. In the following section we first assume that the basic aggregated features have been extracted, and then describe the details of the features that are most specific and important to the prediction problem.
The same aggregates are computed for each specialist-category pair, where Dates is the set of the last n days, Tickets_{d,s,c} is the set of tickets in day d of category c closed by specialist s, and isViolated is a Boolean function indicating that the resolution duration of ticket t violated the SLA; the probability of violation for the pair is the share of these tickets that violated the SLA.
3. Feature analysis

Feature analysis is another key issue before applying learning algorithms. The central premise when using a feature selection technique is that the data contains many features that are either redundant or irrelevant, and can thus be removed without incurring much loss of information. Since we rely on decision tree based models, we start by studying the distribution of values among the features. The tools provided by Pandas can supply such distribution information.
First, we extract features for each day in the data, then use the last two days for testing and the remaining days for training. We further divide the whole data set into a positive and a negative subset. Then we calculate statistical quantities such as the min/max value, mean, variance and skewness. Second, we compare these values between the positive and negative subsets, followed by a comparison between the train and test sets within each of the positive and negative subsets. The former comparison provides information about how clear the separation between positive and negative instances is.
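A rough pandas sketch of this split and comparison, reusing the hypothetical df, created_at and label columns from the earlier sketches (the chosen feature columns are likewise illustrative):

# Hold out the last two calendar days as the test set
last_day = df["created_at"].dt.normalize().max()
test_mask = df["created_at"].dt.normalize() >= last_day - pd.Timedelta(days=1)
train, test = df[~test_mask], df[test_mask]

df["weekday"] = df["created_at"].dt.dayofweek   # ticket creation day of week (a key feature above)
feature_cols = ["weekday", "priority"]          # illustrative subset of features

def describe(split: pd.DataFrame) -> pd.DataFrame:
    # Per-class statistics: min, max, mean, variance, skewness
    return split.groupby("label")[feature_cols].agg(["min", "max", "mean", "var", "skew"])

train_stats, test_stats = describe(train), describe(test)
print(train_stats)                # positive (1) vs negative (0) rows
print(train_stats - test_stats)   # train/test differences within each class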
Building a feature importance chart helps us visualize all the factors influencing SLA failures and find anomalies. Besides simple statistical analysis of the features, we use random forests for feature selection. Random forests are among the most popular machine learning methods thanks to their relatively good accuracy, robustness and ease of use. They also provide two straightforward methods for feature selection: mean decrease impurity and mean decrease accuracy. Most of the features mentioned above, except "impact level", "VIP indicator", "Ticket type" and "Task type", are stable in this type of test.
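A minimal scikit-learn sketch of the two selection methods named above; X_train, y_train, X_test, y_test are assumed to be the feature matrices and label vectors built from the train and test splits, and feature_cols is the feature list from the previous sketch:

from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)

# Mean decrease impurity: the impurity-based importances built into the forest
mdi = dict(zip(feature_cols, rf.feature_importances_))

# Mean decrease accuracy: permutation importance measured on held-out data
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)
mda = dict(zip(feature_cols, perm.importances_mean))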
4. Model
During the development we explored many different learning models, such as the Naive Bayesian classifier, Logistic Regression and the Gradient Boosting Decision Trees model [6]. In our experiments, the Gradient Boosting Decision Trees model performed much better than the others, so only the GBDT model was selected as our final model. In the following section, we introduce the model we used and the tuning of its parameters. Gradient Boosting Decision Trees have been a hot topic in data competitions in recent years. GBDT is a tree-based boosting learning algorithm [7]. We use the classifier provided in the Sklearn library. The optimization goal of GBDT is to minimize the deviance, which is the same objective as in logistic regression. Unlike Random Forests, GBDT combines the individual tree estimators in a boosting manner: a GBDT model is built sequentially by fitting weak decision tree learners on reweighted data with respect to the deviance. It then sums up the prediction scores of the built trees, scaled by a given learning rate, to produce a strong learner. We then tuned the training parameters of the gradient boosting trees.
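A minimal scikit-learn sketch of such a GBDT classifier follows; the concrete parameter values are illustrative placeholders rather than the tuned values used for the reported results:

from sklearn.ensemble import GradientBoostingClassifier

gbdt = GradientBoostingClassifier(
    n_estimators=300,    # number of boosted trees (placeholder value)
    learning_rate=0.05,  # shrinkage applied to each tree's contribution (placeholder)
    max_depth=4,         # depth of each weak tree learner (placeholder)
    random_state=42,
)
# The default loss is the deviance (log loss), the same objective as logistic regression
gbdt.fit(X_train, y_train)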
The first issue is the unbalanced label ratio of the training data: positives account for only 9.86% of the records. To address this, we tried two sampling strategies. The first is random undersampling of the training part of the data set. The second is SMOTE oversampling to equal class sizes. The experiments show that the latter is better, and it was chosen as the final sampling method. The prediction result, evaluated by the F1 score, is 82.39%, with a precision of 82.32% and a recall of 82.47%.
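The two sampling strategies can be sketched with the imbalanced-learn library (an assumption; the paper does not name the package used), reusing gbdt and the splits from the sketches above:

from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import SMOTE
from sklearn.metrics import f1_score, precision_score, recall_score

# Strategy 1: random undersampling of the majority class in the training split
X_rus, y_rus = RandomUnderSampler(random_state=42).fit_resample(X_train, y_train)

# Strategy 2: SMOTE oversampling until both classes have equal size (the chosen strategy)
X_sm, y_sm = SMOTE(sampling_strategy=1.0, random_state=42).fit_resample(X_train, y_train)

gbdt.fit(X_sm, y_sm)
pred = gbdt.predict(X_test)
print(f1_score(y_test, pred), precision_score(y_test, pred), recall_score(y_test, pred))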
5. Conclusion
In this paper, we presented our prediction model for SLA violations in the ITSM of a fast food restaurant chain. The paper focuses on feature engineering and model engineering. In the feature engineering section, we proposed a set of basic aggregated features and some non-obvious derived features. The derived features dramatically improve the overall models and could be reused in other prediction cases. We also used an advanced oversampling technique to improve the models; it increases the diversity of the training samples, so the result is evaluated in a more comprehensive manner and the accuracy of the prediction improves.
References
[1] Hochstein, A., Zarnekow, R., & Brenner, W. (2005, March). ITIL as common practice reference model for IT service management: formal
assessment and implications for practice. In e-Technology, e-Commerce and e-Service, 2005. EEE'05. Proceedings. The 2005 IEEE
International Conference on (pp. 704-710). IEEE.
[2] Malega, P. (2014). Escalation management as the necessary form of incident management process. J Emerg Trends Comput Inf Sci, 5(6), 641-
646.
[3] Li, D., Zhao G., Wang, Z., Ma, W., & Liu, Y. (2015) "A Method of Purchase Prediction Based on User Behavior Log", IEEE International
Conference on Data Mining Workshop, pp. 1031-1039.
[4] Feature selection. Retrieved from: https://en.wikipedia.org/wiki/Feature_selection
[5] Seide, F., Li, G., Chen, X., & Yu, D. (2011, December). Feature engineering in context-dependent deep neural networks for conversational
speech transcription. In Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on (pp. 24-29). IEEE.
[6] Pal, M. (2005). Random forest classifier for remote sensing classification. International Journal of Remote Sensing, 26(1), 217-222.
[7] Ye, J., Chow, J. H., Chen, J., & Zheng, Z. (2009, November). Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th
ACM conference on Information and knowledge management (pp. 2061-2064). ACM.