
Machine Learning in IT Service Management


Procedia Computer Science 145 (2018) 675–679

Postproceedings of the 9th Annual International Conference on Biologically Inspired Cognitive
Architectures, BICA 2018 (Ninth Annual Meeting of the BICA Society)
Dmitry Zuev^a, Alexey Kalistratov^b,*, Andrey Zuev^b
a Beijing Institute of Technology, 5 South Zhongguancun Street, Beijing, 100081, China
b Bauman Moscow State Technical University, ul. Baumanskaya 2-ya, 5/1, Moscow, 105005, Russia

Abstract
IT Service Management (ITSM) is a set of activities directed towards the maintenance of IT infrastructure. Hence, it is
considered an important activity for any company, even one not related to IT. Incident resolution time is the key
performance indicator for ITSM. To reduce resolution time, the authors propose an infrastructure incident prediction
model based on machine learning technologies. Applying machine learning models in ITSM allows significant improvements
in customer experience and more efficient handling of issues, decreasing service desk agents' effort and reducing
service costs.
This paper aims to propose a predictive method for estimating incident resolution time. The proposed model derives
insights from incident data and predicts the estimated time of resolution, allowing delays in incident resolution to be
detected. The authors analyzed prediction accuracy and derived results demonstrating that the model can assist with
prediction for a large set of incidents, using a dataset based on real service desk incident data. Additionally, the
model is applicable in practice to the service improvement process.
© 2018 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the 9th Annual International Conference on Biologically
Inspired Cognitive Architectures.
Keywords: machine learning, information technology services management, gradient boosting decision trees

* Corresponding author. Tel.: +7-499-263-6391; fax: +7-499-267-4844.
E-mail address: [email protected]
1877-0509 © 2018 The Authors. Published by Elsevier B.V.
DOI: 10.1016/j.procs.2018.11.063

1. Introduction

IT Service Management (ITSM) is a set of activities directed towards the maintenance of IT infrastructure. Hence,
it is an important activity for any company, even one not related to IT [1]. Incident management (IM) is an ITSM
process area. The first goal of the IM process is to restore normal service operation as quickly as possible and to
minimize the impact on business operations, thus ensuring that the best possible levels of service quality and
availability are maintained. Normal service operation is defined here as service operation within the service-level
agreement (SLA). An SLA is a document that describes the performance criteria a provider promises to meet while
delivering a service [2]. This agreement also sets out the remedial actions and any penalties that will take effect if
performance falls below the promised standard. It is an essential component of the legal contract between a service
consumer and the provider. Incident resolution time is the key performance indicator for IM. Ticket management
systems store a lot of data: ticket creation time, SLA criteria, assigned specialist, ticket priority and impact, etc. In
the rest of this paper we assume that the SLA is the time agreed for a ticket to be resolved. This assumption makes it
easier to predict a potential SLA violation.
In this paper we use an ITSM dataset from a fast food restaurant chain. The objective is to develop a predictive model
that can help improve the quality of IT service.
The evaluation metric is the F1 score, the harmonic mean of precision and recall. Precision and recall are
common metrics in classification, and the F1 score takes both of them into account.
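As a minimal illustration of the metric, the sketch below computes precision, recall and their harmonic mean from confusion counts. The function names and the counts are chosen for this example only and do not come from the paper's dataset.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts, used only to exercise the functions.
p, r = precision_recall(tp=80, fp=20, fn=10)
print(round(f1_score(p, r), 4))
```

Because the harmonic mean is dominated by the smaller of its two arguments, a model cannot reach a high F1 score by trading away recall for precision or the reverse.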
The paper is organized as follows. Section 2 introduces the features generated through analysis. Section 3 describes
further feature analysis and selection. Section 4 introduces the model used to generate the final predictions and
compares sampling techniques. Section 5 concludes the paper.

2. Feature engineering

Feature engineering is an essential part of building any intelligent system, and the process is both difficult and
expensive. To obtain the training dataset, we first pre-process the incoming raw data: filter out tickets that are not
closed, transform duration fields from string format to integers (they represent durations in minutes), convert
date-time strings to date-time objects, and encode categorical fields as integers.
To label a training instance, we use "1" when the ticket's SLA was violated (a positive instance) and "0" when the
ticket's SLA was not violated (a negative instance).
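The pre-processing and labelling steps can be sketched in plain Python as follows. The field names, the date format and the sample tickets are hypothetical, and the labelling rule (resolution time longer than the SLA time) is an assumption based on the paper's statement that the SLA is the time agreed for a ticket to be resolved.

```python
from datetime import datetime

# Hypothetical raw export; field names and formats are assumptions.
raw_tickets = [
    {"status": "Closed", "created_at": "2018-03-03 09:15", "category": "POS",
     "resolve_minutes": "130", "sla_minutes": "120"},
    {"status": "Open", "created_at": "2018-03-03 10:00", "category": "Network",
     "resolve_minutes": "", "sla_minutes": "240"},
    {"status": "Closed", "created_at": "2018-03-05 14:40", "category": "Network",
     "resolve_minutes": "45", "sla_minutes": "240"},
]


def preprocess(tickets):
    """Apply the four pre-processing steps and attach the 0/1 label."""
    closed = [t for t in tickets if t["status"] == "Closed"]  # drop open tickets
    # Map each category string to a stable integer code.
    codes = {c: i for i, c in enumerate(sorted({t["category"] for t in closed}))}
    rows = []
    for t in closed:
        resolve = int(t["resolve_minutes"])  # duration string -> minutes
        sla = int(t["sla_minutes"])
        rows.append({
            "created_at": datetime.strptime(t["created_at"], "%Y-%m-%d %H:%M"),
            "category": codes[t["category"]],
            "resolve_minutes": resolve,
            "sla_minutes": sla,
            "violated": 1 if resolve > sla else 0,  # 1 = positive instance
        })
    return rows


rows = preprocess(raw_tickets)
```

Open tickets are dropped before the category codes are built, so the integer encoding only covers categories that actually appear in the training data.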
The features describe the patterns of the ticket, the specialist and the ticket's type individually, as well as the
interaction patterns between pairs of them [3]. The structure of the features is shown below:
• Ticket feature:
  o Derived features
• Specialist feature:
  o Basic aggregated features
  o Probability of violation
• Category feature:
  o Basic aggregated features
  o Probability of violation
• Category-Specialist feature:
  o Basic aggregated features
  o Probability of violation
To make the prediction precise, it is very important to describe a ticket thoroughly [4,5]. To do so, we studied the
cleaned data and found that the day of the week and the time of creation are very important for the final result. For
example, most of the tickets created at weekends were resolved with SLA violations.
After additional analysis we identified the following key features:
• Ticket creation day of week
• Ticket creation day is day off
• Ticket creation hour
• Ticket SLA deadline day of week
• Ticket SLA deadline day is day off
• Ticket SLA deadline hour
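The six derived features listed above could be computed as in the sketch below. It assumes that the SLA deadline is the creation time plus the SLA duration in wall-clock minutes and that days off are Saturdays, Sundays and an optional set of holidays; the paper does not specify either rule, and the function and key names are hypothetical.

```python
from datetime import datetime, timedelta


def time_features(moment: datetime, prefix: str, holidays=frozenset()) -> dict:
    """Day of week (0 = Monday), day-off flag and hour for one timestamp."""
    is_day_off = moment.weekday() >= 5 or moment.date() in holidays
    return {
        f"{prefix}_day_of_week": moment.weekday(),
        f"{prefix}_is_day_off": int(is_day_off),
        f"{prefix}_hour": moment.hour,
    }


def ticket_features(created_at: datetime, sla_minutes: int) -> dict:
    """Derive the creation-time and deadline-time features of one ticket."""
    deadline = created_at + timedelta(minutes=sla_minutes)  # assumed deadline rule
    features = time_features(created_at, "created")
    features.update(time_features(deadline, "deadline"))
    return features


# A ticket opened on Friday 2018-03-02 at 22:30 with a 4-hour SLA:
# the deadline falls on Saturday at 02:30.
f = ticket_features(datetime(2018, 3, 2, 22, 30), 240)
print(f)
```

The example ticket shows why the deadline features are kept separately from the creation features: a ticket created on a working day can still have its deadline on a day off.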
The features fall into two groups: the basic aggregated features and their ratios, interpreted as probabilities. The raw
count of events in a selected set of periods provides information about the work pattern of a particular
specialist. It is also a basic preparation stage for the more complex features used in our analysis. We
define the features as follows:

• Total number of tickets resolved by a specialist in the last n days:

$$W_s = \sum_{d \in \mathit{Dates}} \sum_{t \in \mathit{Tickets}_s} 1 \tag{1}$$

where Dates is the set of the last n days and $\mathit{Tickets}_s$ is the set of tickets closed by specialist s on day d.

• Total number of SLA-violating tickets resolved by a specialist in the last n days:

$$\mathit{VW}_s = \sum_{d \in \mathit{Dates}} \sum_{t \in \mathit{Tickets}_s} \mathit{isViolated}(t) \tag{2}$$

where isViolated is a Boolean function indicating that the resolution time of ticket t violated the SLA, Dates is the
set of the last n days and $\mathit{Tickets}_s$ is the set of tickets resolved by specialist s on day d.

• Hence, the probability of SLA violation in the last n days:

$$P_s = \frac{\mathit{VW}_s}{W_s} \tag{3}$$
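Equations (1) to (3) amount to two counters per specialist and their ratio. The sketch below is one way to compute them; the field names are hypothetical, and excluding the current day from the window (so that a ticket's own outcome cannot leak into its features) is an assumption rather than something the paper states.

```python
from collections import defaultdict
from datetime import date, timedelta


def specialist_stats(tickets, today: date, n: int) -> dict:
    """Return {specialist: (W_s, VW_s, P_s)} over the n days before `today`."""
    window_start = today - timedelta(days=n)
    resolved = defaultdict(int)  # W_s, Eq. (1)
    violated = defaultdict(int)  # VW_s, Eq. (2)
    for t in tickets:
        if window_start <= t["closed_on"] < today:
            resolved[t["specialist"]] += 1
            violated[t["specialist"]] += int(t["violated"])
    # P_s, Eq. (3); specialists with no tickets in the window are simply absent.
    return {s: (w, violated[s], violated[s] / w) for s, w in resolved.items()}


# Hypothetical closed tickets.
tickets = [
    {"specialist": "ann", "closed_on": date(2018, 3, 1), "violated": True},
    {"specialist": "ann", "closed_on": date(2018, 3, 2), "violated": False},
    {"specialist": "bob", "closed_on": date(2018, 3, 2), "violated": False},
    {"specialist": "ann", "closed_on": date(2018, 2, 1), "violated": True},  # outside window
]
stats = specialist_stats(tickets, today=date(2018, 3, 3), n=7)
```

Only specialists with at least one resolved ticket in the window appear in the result, which avoids a division by zero in Eq. (3).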

Besides the specialist and ticket features listed above, the raw average resolution duration and its ratio to the SLA
in a selected set of categories provide information about the pattern of the ticket.

• Average ticket resolution time of a category in the last n days:

$$A_c = \frac{\sum_{d \in \mathit{Dates}} \sum_{t \in \mathit{Tickets}_{d,c}} \mathit{duration}_t}{\sum_{d \in \mathit{Dates}} \sum_{t \in \mathit{Tickets}_{d,c}} 1} \tag{4}$$

where $\mathit{duration}_t$ is the resolution duration of ticket t in minutes, Dates is the set of the last n days and
$\mathit{Tickets}_{d,c}$ is the set of tickets on day d in category c.

• Ratio of the category's average ticket resolution time to its SLA duration in the last n days:

$$R_c = \frac{A_c}{\mathit{SLA}_c} \tag{5}$$

where $\mathit{SLA}_c$ is the SLA duration in minutes for a ticket in category c.
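A minimal sketch of Equations (4) and (5) follows. It assumes the ticket list has already been restricted to the last n days and that each category has a single SLA duration; the field names and sample values are hypothetical.

```python
from collections import defaultdict


def category_ratios(tickets, sla_minutes_by_category) -> dict:
    """Return {category: (A_c, R_c)} for tickets already limited to the last n days."""
    total = defaultdict(int)  # numerator of Eq. (4): summed durations
    count = defaultdict(int)  # denominator of Eq. (4): number of tickets
    for t in tickets:
        total[t["category"]] += t["resolve_minutes"]
        count[t["category"]] += 1
    result = {}
    for c, k in count.items():
        average = total[c] / k                          # A_c, Eq. (4)
        ratio = average / sla_minutes_by_category[c]    # R_c, Eq. (5)
        result[c] = (average, ratio)
    return result


# Hypothetical data: two categories with different SLA durations.
tickets = [
    {"category": "POS", "resolve_minutes": 100},
    {"category": "POS", "resolve_minutes": 200},
    {"category": "Network", "resolve_minutes": 60},
]
ratios = category_ratios(tickets, {"POS": 120, "Network": 240})
```

A ratio above 1 means that the average ticket in the category takes longer than its SLA allows, so in this example the hypothetical "POS" category (ratio 1.25) is the riskier one.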

The specialist-category features are constructed in the same way as the specialist features. Below we
assume that the basic aggregated features have already been extracted and describe the features that are most
specific and important to the prediction problem.

• Total number of tickets of a category resolved by a specialist in the last n days:

$$W_{s,c} = \sum_{d \in \mathit{Dates}} \sum_{t \in \mathit{Tickets}_{s,c}} 1 \tag{6}$$

where Dates is the set of the last n days and $\mathit{Tickets}_{s,c}$ is the set of tickets of category c closed by
specialist s on day d.

• Total number of SLA-violating tickets of a category resolved by a specialist in the last n days:

$$\mathit{VW}_{s,c} = \sum_{d \in \mathit{Dates}} \sum_{t \in \mathit{Tickets}_{s,c}} \mathit{isViolated}(t) \tag{7}$$

where isViolated is a Boolean function indicating that the resolution duration of ticket t violated the SLA, Dates is
the set of the last n days and $\mathit{Tickets}_{s,c}$ is the set of tickets of category c closed by specialist s on day d.

• Probability of SLA violation for a category in the last n days:

$$P_{s,c} = \frac{\mathit{VW}_{s,c}}{W_{s,c}} \tag{8}$$
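The pair features of Equations (6) to (8) differ from the specialist features only in the grouping key. The sketch below groups by a (specialist, category) tuple; the field names are hypothetical and the ticket list is assumed to be already restricted to the last n days.

```python
from collections import Counter


def pair_violation_probability(tickets) -> dict:
    """Return {(specialist, category): P_sc} for tickets in the last n days."""
    resolved = Counter()  # W_sc, Eq. (6)
    violated = Counter()  # VW_sc, Eq. (7)
    for t in tickets:
        key = (t["specialist"], t["category"])
        resolved[key] += 1
        violated[key] += int(t["violated"])
    return {key: violated[key] / w for key, w in resolved.items()}  # Eq. (8)


# Hypothetical tickets.
tickets = [
    {"specialist": "ann", "category": "POS", "violated": True},
    {"specialist": "ann", "category": "POS", "violated": True},
    {"specialist": "ann", "category": "POS", "violated": False},
    {"specialist": "ann", "category": "Network", "violated": False},
]
p = pair_violation_probability(tickets)
# Pairs with no history are absent; .get() lets the caller choose a default.
print(p.get(("bob", "POS")))
```

Pair counts are sparser than per-specialist counts, so many pairs will have no history in a short window; how such gaps are filled is a modelling decision the paper does not describe.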

3. Feature analysis and selection

Feature analysis is another key step before applying learning algorithms. The central premise of
feature selection is that the data contains many features that are either redundant or irrelevant and can
thus be removed without incurring much loss of information. With a decision tree model in mind, we start
by examining the distribution of values of each feature. The tools provided by Pandas supply this
distribution information.
First, we extract features for each day in the data, then use the last two days for testing and the remaining days for
training. We further divide the whole data set into a positive data set and a negative data set. Then we calculate
statistical quantities such as the min/max value, mean, variance and skewness. Second, we compare these values between
the positive and negative data sets, and then between the train and test data sets within each of the positive and
negative data sets. The former comparison shows how clear the separation between positives and negatives is.
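The split and the per-class statistics can be sketched with the standard library alone (the paper uses Pandas for this step). The row layout, the feature name and the sample values are hypothetical; skewness is computed here as the third standardized moment, which is one common definition.

```python
from statistics import mean, pvariance


def describe(values) -> dict:
    """Min, max, mean, population variance and skewness of one feature."""
    m = mean(values)
    var = pvariance(values, m)
    # Skewness as the third standardized moment; 0.0 for a constant feature.
    skew = (mean([(v - m) ** 3 for v in values]) / var ** 1.5) if var else 0.0
    return {"min": min(values), "max": max(values), "mean": m,
            "variance": var, "skewness": skew}


def split_by_day(rows, test_days=2):
    """Use the last `test_days` distinct days for test and the rest for train."""
    days = sorted({r["day"] for r in rows})
    test = set(days[-test_days:])
    train_rows = [r for r in rows if r["day"] not in test]
    test_rows = [r for r in rows if r["day"] in test]
    return train_rows, test_rows


def compare(rows, feature):
    """Describe `feature` separately for positive (1) and negative (0) rows."""
    return {label: describe([r[feature] for r in rows if r["violated"] == label])
            for label in (1, 0)}


# Hypothetical rows: day index, one feature and the label.
rows = [
    {"day": 1, "created_hour": 9, "violated": 0},
    {"day": 1, "created_hour": 23, "violated": 1},
    {"day": 2, "created_hour": 10, "violated": 0},
    {"day": 3, "created_hour": 22, "violated": 1},
    {"day": 3, "created_hour": 11, "violated": 0},
    {"day": 4, "created_hour": 12, "violated": 0},
]
train, test = split_by_day(rows)
summary = compare(train, "created_hour")
```

A large gap between the positive and negative means of a feature, relative to their variances, suggests that the feature separates the two classes well.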

Fig. 1. Factors importance chart.

Building the factor importance chart helps us visualize all the factors that influence SLA failure and find
anomalies. Besides simple statistical analysis of the features, we use random forests for feature selection. Random
forests are among the most popular machine learning methods thanks to their relatively good accuracy, robustness and
ease of use. They also provide two straightforward methods for feature selection: mean decrease impurity and mean
decrease accuracy. Most of the features mentioned above, except "impact level", "VIP indicator", "Ticket type" and
"Task Type", are stable in this type of test.
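Both importance measures are available in scikit-learn, as the sketch below shows on synthetic stand-in data. The data, the feature names and the forest settings are assumptions for illustration; permutation importance is used here as the usual way to compute mean decrease accuracy, and the paper does not say which implementation the authors used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 400
# Synthetic stand-in data: only the first column carries signal.
X = rng.normal(size=(n, 3))
y = (X[:, 0] > 0).astype(int)
names = ["informative", "noise_a", "noise_b"]  # hypothetical feature names

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X[:300], y[:300])

# Mean decrease impurity: accumulated over all splits during training.
mdi = dict(zip(names, forest.feature_importances_))

# Mean decrease accuracy: score drop when one column is shuffled on held-out data.
perm = permutation_importance(forest, X[300:], y[300:], n_repeats=10, random_state=0)
mda = dict(zip(names, perm.importances_mean))

print(mdi)
print(mda)
```

Mean decrease impurity is computed on the training data and comes for free with the fitted forest, while mean decrease accuracy needs extra predictions on held-out data, so the two rankings are worth comparing.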

4. Model

During development we explored several learning models, such as the Naive Bayes classifier, Logistic
Regression and the Gradient Boosting Decision Trees (GBDT) model [6]. In our experiments the GBDT model
performed much better than the others, so it alone was selected as our final model. Below we introduce the
model and its parameter tuning. GBDT has been a popular topic in data competitions in recent years.
GBDT is a tree-based boosting learning algorithm [7]. We use the classifier provided in the Sklearn
(scikit-learn) library. The optimization goal of GBDT is to minimize the deviance, the same loss as in logistic
regression. Unlike Random Forests, GBDT combines its tree estimators by boosting: the model is built sequentially,
with each weak decision tree learner fitted to reduce the deviance left by the trees built so far. The prediction
scores of the trees are then summed, scaled by a given learning rate, to produce a powerful learner. We use the
following parameters to train the gradient boosting trees:

• Learning rate: 0.5
• Maximum tree depth: 32
• Minimal leaf size: 1
• Minimal samples split: 0.1
• Number of estimators: 60
• Subsample: 1.0
• Number of features: √N
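The parameter list above maps onto scikit-learn's GradientBoostingClassifier roughly as sketched below. The mapping of each item to a keyword argument is an interpretation, the fixed random seed is added only for reproducibility, and the synthetic data merely stands in for the ticket features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(
    learning_rate=0.5,
    max_depth=32,
    min_samples_leaf=1,
    min_samples_split=0.1,  # a float is read as a fraction of the samples
    n_estimators=60,
    subsample=1.0,
    max_features="sqrt",    # sqrt(N) features considered at each split
    random_state=0,         # assumption: fixed only for reproducibility
)

# Synthetic stand-in for the engineered ticket features and SLA labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 9))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model.fit(X[:200], y[:200])
violation_probability = model.predict_proba(X[200:])[:, 1]
print(violation_probability[:5])
```

The loss is left at the library default, which for this classifier is the logistic deviance the text refers to. A split threshold given as a fraction keeps very deep trees from splitting tiny nodes, which partly offsets the large maximum depth.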

The first issue is the unbalanced label ratio of the training data: positives account for only 9.86% of the total
records. To address this, we tried two sampling strategies. The first is random undersampling of the training part of
the data set. The second is SMOTE oversampling to equal class sizes. The experiment showed that the latter is better,
so it was chosen as the final sampling method. The prediction result is evaluated by F1 score: the score is 82.39%,
with a precision of 82.32% and a recall of 82.47%.
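The two sampling strategies can be illustrated with a small standard-library sketch. The SMOTE function below is a simplified version of the technique (interpolation between a minority point and one of its k nearest minority neighbours) and is not the implementation used in the paper; in practice a library such as imbalanced-learn would normally be used. All names and the synthetic points are hypothetical.

```python
import random


def undersample(majority, minority, rng):
    """Randomly keep as many majority rows as there are minority rows."""
    return rng.sample(majority, len(minority)), minority


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points by interpolating towards near neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (squared Euclidean distance).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


rng = random.Random(0)
# Synthetic 2-D points with roughly the 9:1 imbalance described in the text.
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]

kept_majority, _ = undersample(majority, minority, rng)
new_points = smote(minority, len(majority) - len(minority), k=5, rng=rng)
oversampled_minority = minority + new_points
```

Undersampling discards most of the majority class, whereas SMOTE keeps every real record and adds synthetic positives that lie between existing ones. The sketch requires at least two minority points.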

5. Conclusion

In this paper we presented our prediction model for SLA violations in the ITSM of a fast food restaurant chain. The
paper focuses on feature engineering and model engineering. In the feature engineering section we proposed a set of
basic aggregated features and some non-obvious derived features. The derived features dramatically improve the overall
model and could be reused in other prediction cases. We also used an advanced oversampling technique to improve
the models. It improves the diversity of the models, and the result is evaluated in a more comprehensive manner, which
improves the accuracy of the prediction.

References

[1] Hochstein, A., Zarnekow, R., & Brenner, W. (2005, March). ITIL as common practice reference model for IT service management: formal
assessment and implications for practice. In e-Technology, e-Commerce and e-Service, 2005. EEE'05. Proceedings. The 2005 IEEE
International Conference on (pp. 704-710). IEEE.
[2] Malega, P. (2014). Escalation management as the necessary form of incident management process. J Emerg Trends Comput Inf Sci, 5(6), 641-
646.
[3] Li, D., Zhao, G., Wang, Z., Ma, W., & Liu, Y. (2015) "A Method of Purchase Prediction Based on User Behavior Log", IEEE International
Conference on Data Mining Workshop, pp. 1031-1039.
[4] Feature selection. Retrieved from: https://en.wikipedia.org/wiki/Feature_selection
[5] Seide, F., Li, G., Chen, X., & Yu, D. (2011, December). Feature engineering in context-dependent deep neural networks for conversational
speech transcription. In Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on (pp. 24-29). IEEE.
[6] Pal, M. (2005). Random forest classifier for remote sensing classification. International Journal of Remote Sensing, 26(1), 217-222.
[7] Ye, J., Chow, J. H., Chen, J., & Zheng, Z. (2009, November). Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th
ACM conference on Information and knowledge management (pp. 2061-2064). ACM.
