
2021 ACM/IEEE International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C) | 978-1-6654-2484-4/21/$31.00 ©2021 IEEE | DOI: 10.1109/MODELS-C53483.2021.00028

A Model-Driven Engineering Approach for Monitoring Machine Learning Models

Panagiotis Kourouklidis (British Telecom and University of York, Ipswich, United Kingdom, [email protected]), Dimitris Kolovos (University of York, York, United Kingdom, [email protected]), Joost Noppen (British Telecom, Ipswich, United Kingdom, [email protected]), Nicholas Matragkas (University of York, York, United Kingdom, [email protected])

Abstract—Once a machine learning (ML) model is produced and used for commercial purposes, it is desirable to continuously monitor it for any potential performance degradation. Domain experts in the area of ML commonly lack the expertise in the area of software engineering needed to implement a robust and scalable monitoring solution. This paper presents an approach based on model-driven engineering (MDE) principles for detecting and responding to events that can affect a ML model's performance. The proposed solution allows ML experts to schedule the execution of drift detection algorithms on a computing cluster and receive email notifications of the outcome without requiring extensive software engineering knowledge.

Index Terms—Machine Learning, Model-Driven Engineering, Dataset Shift, Concept Drift, Data Drift

I. INTRODUCTION

As usage of machine learning (ML) techniques for commercial purposes becomes increasingly commonplace, a lot of research has focused on developing algorithms for detecting changes in the environment in which a ML model operates and adapting to them. Applying such algorithms is vital to ensure that the performance of a deployed ML model stays adequate in a changing environment. Unfortunately, just developing a suitable algorithm is not enough in a commercial setting, as there is also considerable engineering effort required to architect a system that can meet the performance requirements of the business.

This paper presents a model-driven approach for ML monitoring. The extent to which such an approach can lower the technical barriers that ML experts face when designing ML monitoring systems is explored. A proposed solution is shown that allows ML experts to define the intended behaviour of certain ML monitoring systems without having to provide their technical implementation.

The rest of the paper is structured as follows: First, a brief overview of the ML monitoring domain is provided. Afterwards, an initial meta-model of the domain is presented. Then, the generation of a technical implementation according to a model-driven engineering (MDE) model of a ML monitoring system is shown. Next, a web application that enables domain experts to experiment with the proposed solution is shown. Finally, the paper concludes with a discussion about planned future work.

II. DOMAIN ANALYSIS

Techniques that can infer the value of a target variable Y, when given the value of an observed variable X, have practical applicability in numerous scenarios, such as spam detection, fraud detection and digital marketing. Developing such techniques has therefore been the focus of various scientific fields, such as statistics, pattern recognition and, more recently, ML. This paper focuses on a subset of ML techniques called supervised learning. The scenario in which supervised learning is applicable is the following: one would like to predict the value of a variable Y when given the value for a variable X, but there is no known mapping between them. Supervised learning algorithms seek to extract a mapping between X and Y from a set of labelled samples, known as the training set. Once the mapping is extracted, it can be used to infer the value of Y for unlabelled samples, known as the test set.

In the supervised learning literature [1], [2], X and Y are often treated as random variables that follow a joint probability distribution P(X,Y). Regardless of the algorithm used to obtain the mapping from X to Y, samples in the test set need to be drawn from the same joint probability distribution as the ones in the training set in order to ensure good predictive performance. This assumption might not be true for a number of reasons. The real world is a complex, non-stationary environment and thus differences in the joint probability distributions between training and test sets should be expected, especially over time. This phenomenon, in its numerous variations, is well studied in the literature under names such as concept drift/shift [3], [4], covariate/sampling shift [4]–[6], prior probability shift [4], [6] and the more general dataset shift [6], [7].

Methods that adapt to such non-stationary environments are also well studied. As an example, in [6] the author presents a methodology for adapting a trained model using the marginal probability distribution of X in the test set. In [4], the authors propose a number of techniques for counteracting drift depending on whether some of the labels for samples in the test set are available or not, and under the assumption that we know the causal model of the phenomenon we try to model. An alternative approach is followed by a subset of machine

learning called online learning [2]. These algorithms do not make any assumptions about the underlying distribution of observed and target variables and continuously adapt for every labelled sample received. Therefore, online learning methods can be used when one expects that the environment will change over time. The main disadvantage of this approach is that it needs a constant flow of labelled samples, which is not always possible.

A hybrid approach is also possible. In this approach, the model is trained using an initial labelled dataset but also adapts whenever a new batch of labelled samples becomes available. In [8], the authors suggest the use of an ensemble of models to achieve good performance even in the presence of drift. Whenever a more recent labelled dataset becomes available, an additional model is trained and added to the ensemble. Then, the algorithm checks whether the ensemble's performance has stayed consistent. Depending on the outcome, there is an adjustment to the weights of the models in the ensemble.

III. DOMAIN META-MODEL

As presented in the domain analysis, research scientists have developed a number of ML techniques that achieve good predictive performance even when deployed in a non-stationary environment. However, in order for these techniques to be applied in a commercial setting, they would have to be part of a system that meets the requirements of various stakeholders. For example, such systems might need to consider latency and the cost of computational and storage resources, as well as compatibility with pre-existing software and infrastructure. The issue is further complicated by the fact that the person who designs the methodology for learning in a non-stationary environment is not necessarily the person responsible for the technical implementation of the overall system. This introduces an additional layer of challenges arising from miscommunication between people.

This work investigates whether the application of MDE principles can reduce the overall effort required to implement and deploy ML-based systems that can handle non-stationary environments. A critically important part of the solution is the design of a domain-specific language (DSL) that can express as many ML monitoring methodologies as possible while being minimally bound to the technical implementation of the underlying system. The DSL aspires to provide a standardised communication layer between domain experts, who declaratively define the behaviour of the ML monitoring system, and software engineers, who are responsible for generating a concrete implementation that adheres to the specified behaviour.

This section presents an initial effort to design such a DSL. Specifically, the meta-model of a ML-based system that is designed to handle various types of non-stationarity is presented. In addition, the assumptions behind the various design decisions and some planned future improvements are discussed.

The assumptions about the environment in which a deployed ML model operates are as follows:
• A deployed ML model continuously receives unlabelled samples (inference requests) and is expected to return the predicted labels (inference responses) as soon as possible.
• After an unspecified amount of time, the actual labels will be made available for a portion of the unlabelled samples previously received.
• Periodically, a decision shall be made on whether to take corrective action based on the latest data received by the ML model. Both labelled and unlabelled data could be used to make this decision, depending on the chosen methodology.
• There are a number of different actions that can be taken. It is up to the domain expert to define which action should be taken as a response to different kinds of drift.

Taking the above assumptions into account, it is proposed that a meta-model that can describe ML monitoring methods should have the following essential components:
• A mechanism to specify which ML model one wants to monitor.
• A mechanism to specify which fields of the inference request and response are to be captured, as well as a way to name the captured fields so that other entities can reference them individually.
• A mechanism to define periodic drift detection executions.
• A mechanism to define what actions are to be taken when drift is detected.

Following the above general ideas, a preliminary version of a meta-model for ML monitoring is presented. Although the goal is to design a meta-model that is completely agnostic to any underlying technical implementation, some compromises have been made in this version in the interest of getting feedback from domain experts sooner. These compromises will be pointed out and a discussion about how they can be rectified will be presented in the Future Work section.

Figure 1 shows the classes of the meta-model and how they relate to each other. Here is a description of what each class represents:

A. Deployment

A Deployment is the top-level class of the meta-model that describes all the aspects of a ML monitoring method. It contains exactly one instance of the class Model and any number of instances of class DriftDetector.

B. Model

A Model represents the ML model that the domain expert wants to monitor, and it contains general information about it. Specifically, it contains the name of the ML model, the URL where a serialization of the ML model can be found and the name of the framework that was used to create it (e.g. TensorFlow, https://www.tensorflow.org). In the interest of simplifying demonstrations

of this work, the specified ML model is deployed to a web endpoint. This is the reason why the last two attributes are required. This is, strictly speaking, not necessary for monitoring purposes, since the deployment process might exist separately. The Model class also contains a number of IO instances representing the inputs of the ML model, also known as features, and an additional instance of IO representing the output. The assumption made is that the model accepts a fixed number of scalar values as input and returns a single scalar value as output. This can also be generalized further.

[Fig. 1. Domain Meta-Model. The class diagram shows that a Deployment contains exactly one Model and 0..* DriftDetectors. Model has the attributes name: String, url: String and type: MLframework, and references 1..* IO instances as features plus one IO instance as output. IO has name: String and type: ioType. DriftDetector has name: String, frequency: frequencyType, email: String and customContainer: String, with subclasses DataDriftDetector (type: dataDriftType, trainingSetUrl: String, featureName: String) and ConceptDriftDetector (type: conceptDriftType, expectedAccuracy: Float). Enumerations: MLframework (tensorflow, pytorch, ...), frequencyType (hourly, daily, ...), ioType (INT, FLOAT, ...), dataDriftType (kolmogorovSmirnov, custom), conceptDriftType (chiSquared, custom).]

C. IO

An IO instance represents an input or an output of the monitored ML model. It contains the name of the input/output and an enumerative attribute describing its data type. Depending on the underlying implementation, different data types could be supported or this information might even be unnecessary.

D. DriftDetector

A DriftDetector represents the periodic execution of an algorithm that indicates whether corrective action should be taken. The domain expert can define a DriftDetector's name and execution frequency. They can also define the algorithm that is to be executed. They can choose between a set of algorithms already included in the underlying platform or provide the source code for a custom one that is going to be integrated with the rest of the system. The specifics of this mechanism are further explained in the Generated Artifacts section.

In this version of the meta-model, a distinction has been made between two types of DriftDetectors. The first type requires a set of unlabelled samples that are received over time in addition to the set of labelled samples used to train the ML model. This type of DriftDetector is represented by the DataDriftDetector subclass. The second type requires a set of predicted labels produced by the ML model when provided with unlabelled samples for which the actual labels have been subsequently acquired. This type of DriftDetector is represented by the ConceptDriftDetector subclass. This distinction was made in order to simplify the underlying technical implementation of the system. Unifying the two types by designing one more general class that is agnostic to the type of algorithm that is executed is planned as future work. Lastly, a DriftDetector also contains an email address attribute. In this initial version, sending email alerts is the only kind of action that can be taken in response to detected drift. Therefore, an email address attribute was added to DriftDetector instead of creating a separate class.

IV. GENERATED ARTIFACTS

A MDE model that adheres to the meta-model presented in the previous section describes the behaviour of a ML monitoring system without specifying any technical implementation details. There are therefore multiple valid implementations, based on different technologies. One such implementation, based on Kubernetes (https://kubernetes.io), is presented below. Kubernetes was chosen because it is a vendor-neutral platform that can run on a wide variety of computing infrastructure, enabling experimentation. It is also widely used, which has resulted in the formation of a rich ecosystem of tools that extend its functionality. A number of these tools have been leveraged for this implementation.

Using the model-to-text (M2T) transformation language EGL [9], a M2T transformation was implemented. The output of this transformation is a Kubernetes manifest which contains the description of the resources that need to be provisioned and the containerized applications that need to be executed on a cluster. The resulting system implements the behaviour specified by a domain expert in the MDE model provided. For some components of the system, open-source, custom Kubernetes resources are used. For the rest, containerized applications were developed and indexed in a container registry so that they can be referenced in the manifest. The main advantage of this approach is that the complexity of the M2T transformation is reduced, as no procedural code is generated.

Figure 2 shows a diagram of the kind of system that is described by the generated Kubernetes manifests. Below are the descriptions of the components that make up the depicted system.

A. Inference Service

This component provides two functionalities. Firstly, it contains a model server which is responsible for responding to inference requests sent by various applications. Secondly, it contains a logger which produces events containing information about the model server's inbound requests and outbound responses. These events can be consumed by other web

services. The Inference Service component is implemented as an InferenceService custom resource, which is developed by the Kubeflow (https://kubeflow.org) open-source project.

[Fig. 2. Generated Artifacts. The diagram shows the generated system: an Inference Service (model server plus logger) loads the model artifact from Object Storage, answers inference requests and emits events; a Message Queue routes these events to the Database Writer, which also receives feedback/ground truth (from an Oracle Service) through a Proxy and persists features, predictions and ground truth in the Database; the Drift Detection Execution component reads the latest data from the Database and historical data from Object Storage, runs a container image pulled from the Container Image Registry according to its Configuration, and sends email alerts through a Mail Server.]

B. Message Queue

The Message Queue component is responsible for receiving events from the logger and routing them to the Database Writer service. This functionality is implemented using Broker and Trigger custom resources, which are developed by the Knative (https://knative.dev) open-source project.

C. Database Writer

This component is responsible for initializing a database, based on the information provided by the domain expert, and storing the captured monitoring data in it. The first source of data comes from consuming the events produced by the Inference Service's logger. These events contain the bodies of the requests/responses that are received/sent by the Inference Service's model server. Since the schema of the API employed by the model server is known, one can extract the values of the features and the prediction according to the domain expert's specification. The extracted values are subsequently stored in the previously initialized database. The second source of data comes in the form of feedback about the ML model's performance, namely the ground truth labels for the inference requests previously sent to the Inference Service. Every inference request/response gets a unique ID attached to it so that it can later be matched with its corresponding ground truth information. The Database Writer is implemented using a Service custom resource by the Knative open-source project. The container that is executed by the Knative Service is a containerized web service developed for this specific task.

D. Database

This is a standard MySQL database used to persist the data as described above. It is implemented using Deployment, Service and PersistentVolumeClaim resources from Kubernetes' base API and official MySQL container images.

E. Drift Detection Execution

This component is responsible for the periodic execution of the drift detection algorithm specified by the domain expert. It is implemented as a CronJob resource from Kubernetes' base API, which executes a container on a periodic basis. For this implementation, a containerized application was developed that retrieves test set data from the database and training set data from cloud storage and passes them to a function that implements the drift detection algorithm. The function returns a boolean that indicates whether drift was detected, along with a message. In the case of detected drift, the message is included in an email alert sent to the relevant email address. If the domain expert wants to use their own custom algorithm, they can specify the container image that implements it. In general, they do not need to implement the parts responsible for data retrieval and notification sending. They only need to provide their custom algorithm implemented as a function with specific inputs and outputs so that it can be integrated with the base drift detection application template. More details of this functionality are provided in the Supporting Platform section.

F. Configuration

The containerized applications that were developed as part of this implementation need to adapt their behaviour to match the domain expert's specification. This can be achieved by using configuration files. These are text files that are mounted to a container's file system. An application running inside the container can read them and adjust its behaviour accordingly. The configuration files are implemented as ConfigMaps from Kubernetes' base API.

G. Proxy

In order to serve inference requests and also receive feedback under the same host name, a proxy is used. For every incoming HTTP request, the proxy checks the request's path. Depending on the path, it either forwards the request to the Inference Service or the Database Writer. This functionality is implemented using a VirtualService custom resource developed by the Istio (https://istio.io) open-source project.

V. SUPPORTING PLATFORM

It is technically possible for a domain expert to generate the Kubernetes manifests mentioned above by executing the M2T transformation on their workstation. They could then use a Kubernetes CLI tool to deploy to a cluster they have access to.
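The function contract described in the Drift Detection Execution section (training set and test set values in, a boolean and a message out) can be illustrated with a short sketch. The paper does not spell out the exact signature used by the base application template, so the function names and the thresholding rule below are assumptions; the Kolmogorov-Smirnov statistic is used here because kolmogorovSmirnov is one of the built-in dataDriftType options.

```python
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic for scalar data:
    the maximum absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # advance past all values equal to x in both samples before
        # comparing the empirical CDFs, so ties are handled correctly
        while i < na and a[i] == x:
            i += 1
        while j < nb and b[j] == x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


def detect_drift(training_data, test_data):
    """Assumed contract for a custom detector: take the training-set and
    test-set values of one feature, return (drift_detected, message)."""
    d = ks_statistic(training_data, test_data)
    n, m = len(training_data), len(test_data)
    # approximate KS critical value at the 0.05 significance level
    threshold = 1.36 * math.sqrt((n + m) / (n * m))
    message = f"KS statistic {d:.3f}, threshold {threshold:.3f}"
    return d > threshold, message
```

A custom detector packaged this way would only replace the function body; data retrieval and email notification remain the responsibility of the generated system.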

While possible, this process is not very user-friendly. Since the goal is to minimize the technical barriers that domain experts face when trying to deploy ML monitoring systems, a web application that simplifies this process is provided. Below is an overview of the functionality offered by this web application.

The main functionality provided is the authoring and submission of MDE models. Users fill out a form whose fields correspond to the attributes of the various classes in the domain meta-model. When the user submits the form, the specified MDE model is serialized in Flexmi [10] format and sent to the application's back-end. In the back-end, the received MDE model is used as input for the model-to-text transformation described in the previous section. The resulting Kubernetes manifest is then used as input for Kubernetes' CLI tool, which sends the objects specified in the manifest to the cluster's API server. This last step concludes the deployment of the ML monitoring system described by a domain expert in the form of a MDE model.

In addition to the main functionality, the user also has access to a number of auxiliary features. These are specific to the computing infrastructure used for this implementation and not designed with a focus on reusability. Despite that, they are developed in order to facilitate discussion with ML domain experts and can therefore contribute to the empirical evaluation of this work. The auxiliary features provided are the following:
• Listing all of the monitored ML models that are currently deployed. For each deployment two URLs are shown: one that can be used to send inference requests and a second one that can be used to send feedback/ground truth data.
• A form is provided for the upload of serialized ML models to cloud storage. In addition, the URLs of previously uploaded ML models are shown.
• A form is provided for the upload of files that contain training datasets to cloud storage. In addition, the URLs of previously uploaded training datasets are shown.
• Lastly, users can provide their own implementations of drift detection algorithms in the form of a Python function. Additionally, if their code imports any third-party packages, they need to provide a Python requirements file. In the back-end, the user-provided code is combined with the drift detector's base template and a container image is built. The image is pushed to a registry and its URI is listed in the user interface so it can be included in MDE models. This feature increases the solution's flexibility without requiring the user to know how the ML monitoring system is implemented.

VI. FUTURE WORK

This work is intended to be a proof of concept that showcases the feasibility of MDE techniques applied in the ML monitoring area. Below are some of the improvements that need to be made for a more comprehensive solution:
• The scheduling of drift detection executions needs to cover more complex scenarios. For example, in addition to executing periodically, one might want to define additional constraints, such as a minimum amount of samples received between executions.
• There is a need for an abstraction that is expressive enough to describe any kind of drift detection algorithm. That would remove the need for making distinctions between drift types in the meta-model layer.
• The list of actions that can be taken in response to drift needs to be expanded. Also, it would be beneficial to provide a mechanism for describing complex scenarios in which a combination of actions is prescribed when certain conditions are met.
• The ability to incrementally update a deployed ML monitoring system would be beneficial. This would allow domain experts to modify parts of their system without redeploying it in its entirety.

Once all of the needed improvements are made, a comprehensive evaluation of the improved solution shall commence. The solution could be evaluated on two different bases. Firstly, one could investigate whether the solution is general enough to describe a substantial number of monitoring techniques found in the literature. Secondly, an empirical evaluation could be carried out to answer whether the solution reduces the effort required for the deployment of ML monitoring systems.

ACKNOWLEDGMENT

The work in this paper has been partially supported by the Lowcomote project, which received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 813884.

REFERENCES

[1] J. Friedman, T. Hastie, R. Tibshirani et al., The Elements of Statistical Learning. Springer Series in Statistics, New York, 2001, vol. 1, no. 10.
[2] M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning. MIT Press, 2018.
[3] J. C. Schlimmer and R. H. Granger, "Incremental learning from noisy data," Mach. Learn., vol. 1, no. 3, pp. 317–354, 1986.
[4] B. Schölkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij, "On causal and anticausal learning," in 29th International Conference on Machine Learning (ICML 2012). International Machine Learning Society, 2012, pp. 1255–1262.
[5] M. Salganicoff, "Tolerating concept and sampling shift in lazy learning using prediction error context switching," Artif. Intell. Rev., vol. 11, no. 1-5, pp. 133–155, 1997.
[6] A. Storkey, "When training and test sets are different: characterizing learning transfer," Dataset Shift in Machine Learning, vol. 30, pp. 3–28, 2009.
[7] M. Kull and P. Flach, "Patterns of dataset shift," in First International Workshop on Learning over Multiple Contexts (LMCE) at ECML-PKDD, 2014.
[8] R. Elwell and R. Polikar, "Incremental learning of concept drift in nonstationary environments," IEEE Trans. Neural Networks, vol. 22, no. 10, pp. 1517–1531, 2011.
[9] L. M. Rose, R. F. Paige, D. S. Kolovos, and F. A. Polack, "The Epsilon Generation Language," in European Conference on Model Driven Architecture-Foundations and Applications. Springer, 2008, pp. 1–16.
[10] D. S. Kolovos and R. F. Paige, "Towards a modular and flexible human-usable textual syntax for EMF models," in MODELS Workshops, 2018, pp. 223–232.
