
The Journal of Supercomputing

https://doi.org/10.1007/s11227-022-04967-6

A performance modeling framework for microservices‑based cloud infrastructures

Thiago Felipe da Silva Pinheiro1   · Paulo Pereira1 · Bruno Silva2 · Paulo Maciel1

Accepted: 17 November 2022


© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022

Abstract
Microservice architectures (MSAs) can increase the performance of distributed systems
and enable better resource allocation by sharing underlying resources among multiple
microservices (MSs). One of the main advantages of MSAs is the ability to leverage
the elasticity provided by an infrastructure so that only the most demanding services
are scaled, which can contribute to efficient allocation of processing resources. A major
problem in allocating resources to microservices is determining a set of auto-scaling
parameters that will result in all microservices meeting specific service level agree-
ments (SLAs). Since the space of feasible configurations can be vast, manually deter-
mining a combination of parameter values that will result in all SLAs being met is com-
plex and time consuming. In addition, the performance overhead caused by running
microservices concurrently and the overhead caused by the VM instantiation process
must also be evaluated. Another problem is that microservices can suffer performance
degradation due to resource contention, which depends on how microservices are dis-
tributed across servers. To address the aforementioned issues, this paper proposes the
modeling of these infrastructures and their auto-scaling mechanisms in a private cloud
using stochastic Petri nets (SPNs), the non-dominated sorting genetic algorithm II
(NSGA-II), one of the most popular evolutionary algorithms for multiobjective opti-
mization (MOO), and random forest regression (RFR), an ensemble-learning-based
method, to identify critical trade-offs between performance and resource consumption
considering all deployed MSs. The SPN-based model is capable of representing both
instantiation of elastic VMs and a pool of instantiated elastic VMs where only con-
tainers are started. The analytical framework enables service providers (SPs) to esti-
mate performance metrics considering configurations that satisfy all performance con-
straints, use of elastic VMs, discard rate, discard probability, throughput, response time,
and corresponding cumulative distribution functions (CDFs). These metrics are critical
because they make it possible to estimate the time required to process each request, the
number of requests processed in a time interval, the number of requests rejected, and
the utilization of resources. The framework was validated with 95% confidence interval
(CI) using a real-world testbed. Two case studies were used to investigate its feasibility by evaluating its application in a real scenario. We noticed a significant improvement in performance when using a pool of elastic VMs, where throughput improved by 21.5%
and the number of discarded requests decreased by 70%. The application of the frame-
work can help in finding optimized solutions that support both infrastructure planning
and online performance prediction, and enable trade-off analyses considering different
scenarios and constraints.

Keywords  Microservices · Performance modeling · NSGA-II · Genetic algorithm · Machine learning · Stochastic Petri nets

1 Introduction

MSAs consist of a set of small independent services that can be logically grouped to
provide the necessary functions that make up a large system [1–4]. Systems based
on MSAs orchestrate multiple granular, autonomous, and self-contained MSs. Each
MS represents a single concern and provides a communication interface used by
other MSs or external service consumers. This architecture enables faster deploy-
ment and release. In addition, only the most in-demand services are scaled, which
can contribute to efficient use and scaling of computing resources [1]. On the other
hand, it should be noted that MSs have all the complexities associated with a distrib-
uted system, such as errors in communicating with services, latency, and difficulties
in managing many services. With the advent of the Internet of Things (IoT) and edge
computing, many MSs need to be very close to consumers, and in some cases the
use of a public cloud makes it impossible to meet SLAs due to latency in data trans-
mission [5].
Several MSs can be deployed in a private cloud. Sharing the computing resources
of a private infrastructure across a set of MSs is a strategy that can enable better
resource utilization. However, deciding on the most appropriate deployment con-
figuration is not an easy task. Many scenarios need to be analyzed. SPs need to
evaluate the performance overhead of running MSs concurrently under different sce-
narios. Although virtualization technologies enable isolation of resources between
MSs, it is difficult to partition or isolate access to some resources such as memory
bandwidth, disk I/O, and network [6]. When multiple VMs/containers are scheduled
together on the same physical machine (PM), they may undergo a performance deg-
radation [7, 8]. Also, since the infrastructure is subject to processing load that fluc-
tuates over time, the service time is not constant. Another major challenge in provi-
sioning resources is to find a set of auto-scaling parameters that can meet the SLAs
for all MSs. Determining the thresholds and step sizes for inserting and removing
VMs and containers, taking into account all MSs involved, requires a deep under-
standing of the workload characterization of each MS. Key parameters related to
auto-scaling include the type and number of reserved VMs that are always running;
the type of elastic VMs that need to scale as demand increases; the number of rep-
licated containers running on each VM, which determine the number of concurrent
requests processed; the number of physical CPU (pCPU) cores of PMs that must be

13
A performance modeling framework for microservices‑based…

used to virtualize VMs; and thresholds and step sizes. A step size defines the num-
ber of elastic VMs that are created or destroyed when a threshold is reached. In addi-
tion, the performance overhead caused by the VM instantiation process must also
be evaluated. Considering the running microservices, it may be beneficial to use a
pool of already instantiated elastic VMs and only instantiate containers on the avail-
able VMs. The effects of these random variables on the evaluated measures must be
adequately predicted to properly guide the parameterization of each MS.
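For concreteness, the hedged Python sketch below groups the auto-scaling parameters listed above for a single MS; all field names and values are illustrative and are not taken from the paper.

from dataclasses import dataclass

@dataclass
class ScalingConfig:
    """Illustrative grouping of the auto-scaling parameters discussed above."""
    reserved_vms: int         # VMs that are always running for the MS
    elastic_vm_type: str      # type of VM instantiated when demand grows
    containers_per_vm: int    # replicated containers started in each VM
    pcpu_cores_per_vm: int    # physical CPU cores each VM takes from the shared pool
    scale_out_threshold: int  # queued requests that trigger VM/container insertion
    scale_in_threshold: int   # queued requests below which resources are removed
    step_size: int            # elastic VMs created or destroyed per scaling action

# Hypothetical configuration for one microservice.
ms_a = ScalingConfig(reserved_vms=1, elastic_vm_type="2-vCPU", containers_per_vm=4,
                     pcpu_cores_per_vm=2, scale_out_threshold=20,
                     scale_in_threshold=5, step_size=1)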
Stochastic modeling can provide a more accurate understanding of system behav-
ior and enable the identification of deployment configurations that optimize the use
of underlying processing resources while complying with SLA constraints. The
effort required to evaluate metrics using modeling is less than evaluating them in
production [9–13]. Considering that each MS has its performance constraints, satis-
fying all SLAs simultaneously when those services are deployed together is a multi-
objective optimization problem with two or more objective functions (OFs), where
a collection of efficient solutions can be found that differ with respect to all the met-
rics simultaneously. Since the space of solutions can be huge, manually finding a
configuration that satisfies the SLAs is a complex task. NSGA-II is one of the most
popular MOO algorithms with three special characteristics, namely, a fast non-dom-
inated sorting approach, a fast crowded distance estimation procedure, and a simple
crowded comparison operator [14]. A genetic algorithm is a metaheuristic inspired
by the natural selection process and used to solve practical problems by finding val-
ues for parameters to achieve the best results [15]. The load profile of two or more
MSs deployed on the same PM affects how well they run in parallel. Therefore,
modeling different load types is necessary to accurately predict performance. A per-
formance interference predictor can consider VM co-scheduling scenarios defined
by a VM placement (VMP)/consolidation technique [16, 17] and incorporate the
impact of stresses on multiple resources. A VM co-scheduling scenario represents
a set of VMs distributed across multiple PMs. RFR is an effective ensemble-learn-
ing-based method for predicting measures for a large number of scenarios through a
small set of training samples [18]. In this work, we combined stochastic modeling,
RFR, and NSGA-II to explore the solution spaces. Other works in the literature also
sought methods to find optimal resource allocation, as in [19–24], where the authors
propose stochastic and/or optimization models to represent elastic microservice-
based systems. However, none of these related works have explored the following
characteristics in conjunction: infrastructure planning, formal optimization, consoli-
dation technique, performance interference modeling, VMs, containers, and, finally,
stochastic modeling.
The contributions of the paper are summarized as follows:

• A modeling strategy based on SPNs [25, 26] to support performance prediction of both microservices-based cloud infrastructures (MBCIs) and individual
microservices by representing the sharing of underlying processing resources
and the current load that the cloud is under, and enabling the evaluation of dis-
tinct deployment scenarios and scaling settings. The SPN-based model repre-
sents MSs, VMs, containers, pCPU cores, and scaling settings. The use of other
physical resources such as memory can be modeled by a consolidation algorithm. The modeling strategy is capable of representing both instantiation of elastic VMs and a pool of instantiated elastic VMs where only containers are
launched. The model has the main purpose of solving elastic VMs/pCPU cores
utilization (U), discard rate (DR), discard probability (DP), throughput (TP),
mean response time (MRT), and the corresponding cumulative distribution func-
tions (CDFs).
• A methodology based on RFR and VMP/consolidation algorithms to build a
performance interference predictor that considers groups of VMs/containers co-
scheduled on the same set of PMs. The learned model is able to accurately pre-
dict the service times of a set of MSs in different scenarios. This predictor has
been integrated into the SPN model and is invoked during the solution of the
stochastic model.
• We adapt the NSGA-II algorithm to map a set of OFs to the SPN-based model to
find appropriate parameter values that satisfy a set of performance constraints.
• We applied the analytical framework in two case studies to demonstrate its feasi-
bility. It has been shown that the synergistic combination of stochastic modeling,
a machine learning (ML) model, and a multiobjective optimization algorithm
can accurately predict the performance of MSs deployed on private infrastruc-
tures. By combining the proposed models and the optimization algorithm, it was
possible to perform a trade-off analysis considering the performance of all MSs.
The results of this study can be used to support both infrastructure planning and
online performance prediction, and enable trade-off analyses considering differ-
ent deployment scenarios and constraints.

This paper is organized as follows: Sect.  2 summarizes the works that are more
closely related to our proposal. Section  3 presents the theoretical background to
support understanding of the solution presented in this work. Section 4 presents the
analytical framework and describes how to apply it. Section 5 discusses its valida-
tion and illustrates its application in two case studies. Finally, Sect. 6 draws conclu-
sions and highlights future research directions.

2 Related works

Many studies have addressed the issue of performance assessment in the context
of microservice-based systems [4, 19–21, 27–34]. Singh and Sateesh [31] and Vil-
lamizar et  al. [32] compared performance between monolithic systems and MSAs
through experiments and showed the benefits of migrating to MSAs. As observed
in [35], the most important groups of metrics for evaluating MSs are performance,
scalability, availability, and maintenance. According to [35], most of the metrics to
consider relate to performance and scalability. Performance, scalability, and infra-
structure costs are three of the main issues that SPs face when planning and deploy-
ing MSs.
A variety of strategies have been proposed to solve this problem for microser-
vices-based systems [36–42]. Most of the proposed work addresses reactive scaling
techniques that modify the system based on its current state. Other research focuses
on proactive techniques that modify the system based on analysis of past data to pre-
dict its future state. Most proposed approaches use threshold-based policies, queuing
theory, and/or machine learning techniques.
Formal methods have been applied in various computing domains to evalu-
ate system performance and to assist engineers in architectural planning. In recent
years, researchers’ interest in using probabilistic model checking in the cloud has
increased. The main reason is that quantitative analysis of the uncertainty associ-
ated with these systems is clearly needed. Modeling performance, scalability, and
infrastructure allows SPs to determine the distribution of processing capacity across
microservices.
Khazaei et  al. [20] proposed an analytical model based on continuous-time
Markov chains (CTMCs) to study the performance of provisioning on elastic micros-
ervice platforms. The model supports capacity planning for microservices and pro-
vides a systematic approach to measuring microservices elasticity by considering
multiple VMs on a PM and multiple containers on a VM. Results were obtained
for performance in terms of response time, discard probability, utilization, and cluster size. However, the solution proposed by the authors assumes that all VMs that can be deployed are identical, in contrast to our work, where different types of VMs can be allocated in the infrastructure for different microservices. In addition, their model does not support the use of a consolidation algorithm, nor does it address the overhead caused by running multiple microservices that share the same processing resources, as we do.
Bao et  al. [19] propose a performance model for microservice-based applica-
tions. Their analytical framework considers different types of VMs and containers
running on VMs, and allows modeling of CPU cores, memory, and disk usage. The
model assumes that only a single MS instance is running in a container, so it does
not address the definition of scaling rules. Results were obtained for performance in
terms of response time. However, other important metrics such as discard rate and
throughput were not considered. The work does not propose a consolidation tech-
nique, nor does it provide the ability to use the models along with a consolidation
algorithm. Moreover, the framework is focused on the deployment of MSs in public
clouds.
Gribaudo et al. [21] proposed a fluid stochastic Petri net (FSPN) to evaluate auto-
scaling policies for microservices architectures in public and private clouds. The
work addressed performance, scalability, cost, and energy consumption evaluation
to understand the impact of using different scaling strategies. The authors also pro-
pose an algorithm for consolidating microservices into VMs that is integrated with
the FSPN model. The number of running VMs is based on the workload variation.
However, unlike our work, the model proposed in [21] does not consider perfor-
mance interference caused by running different microservices together.
Kafhali et al. [24] propose a dynamic scaling model based on queuing theory and
CTMC for containerized cloud services. This model is used to capture the scalability
and dynamicity of cloud computing systems and estimate their QoS performance.
The aim is to improve the utilization of virtual computing resources and meet the
requirements of SLA in terms of CPU utilization, response time, discard rate, num-
ber of requests, and throughput. The model is capable of running many VMs on

13
T. F. da Silva Pinheiro et al.

one PM and represent multiple containers on a VM, reflecting real-world application


scenarios. The model represents only a pool of VMs, with each VM configured to
run one or more container instances. This is in contrast to our model, which can rep-
resent both a pool of VMs and instantiation of different types of VMs for different
MSs. In addition, the authors did not address the problem of resource contention and
the model is not able to use a consolidation technique as our work does.
Many papers have addressed the issue of resource consolidation in the context of
microservices [17, 43–46]. In contrast to these works, we do not propose a method
for resource consolidation, but rather a modeling framework that can be adapted to
be used in conjunction with some of these methods.
In sum, there have been a number of efforts to model microservices performance
based on resource allocation. However, very few efforts have been made to develop
a stochastic framework for evaluating the performance of such services deployed in
private clouds, to determine an appropriate scaling strategy considering the trade-
off analysis between the performance of all MSs, and to consider the application
of a consolidation technique that may affect the overall performance. Moreover, the
models proposed in related works do not provide adequate ways to model perfor-
mance interference caused by resource contention arising from MSs running in the
same shared hardware environment, which is a common occurrence. Our work is an
attempt to fill the gap in this area.

3 Background

This section discusses concepts about SPNs and RFR that are fundamental to under-
standing the solution presented in this work.

3.1 Stochastic petri nets

SPNs are a powerful modeling formalism long used by scholars and industry to rep-
resent asynchronous, distributed, concurrent, parallel, deterministic, and stochastic
processes [25, 47–49]. SPNs define a specification technique that allows the problem
being analyzed to be represented mathematically and graphically, having analytical
mechanisms that support verifications of the behavioral properties and correctness
of the modeled process. SPNs are directed graphs consisting of two types of nodes:
places and transitions. Arcs represent the process flow. Places are represented by cir-
cles, while transitions are represented as filled rectangles. Places correspond to the
states in which the process may be at a given time during its evolution. Transitions,
on the other hand, represent events that occur during process execution. Tokens are
represented as small black circles stored in places, which indicate a particular state
of the process. SPNs assign a stochastic delay to each timed transition. This makes
the model probabilistic and described by a stochastic process. SPNs can be isomor-
phic to CTMCs [50] and are therefore analytically solvable [25]. Using phase-type
approximation techniques, non-exponential distributions can be represented by a set
of poly-exponential distributions, such as Erlang [26, 50, 51].
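For illustration only (a standard moment-matching recipe, not taken from [26, 50, 51]): a measured delay with mean \mu and standard deviation \sigma, with coefficient of variation \sigma/\mu < 1, can be approximated by an Erlang subnet of k identical exponential phases with

k = \lceil (\mu / \sigma)^{2} \rceil, \qquad \lambda_{phase} = k / \mu,

which preserves the mean (k / \lambda_{phase} = \mu) and approximates the variance (k / \lambda_{phase}^{2} = \mu^{2}/k \approx \sigma^{2}).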


Fig. 1  Methodology overview (workflow comprising the Architecture Definition and Models Definition sub-processes and the tasks described in Sect. 4.1)

3.2 Random forest regression

RFR is a nonparametric regression method based on decision trees (DTs) that is commonly used to capture nonlinearity in data sets [18, 52]. RFR is an ensemble
learning-based method in which the predictions of numerous DTs are averaged
to obtain the final prediction. This method uses DTs as part of a majority vot-
ing process. It provides an assessment of the importance of variables and, unlike
other methods, works well with a large number of features and a small training
data set. The interpretability of the model and the predictive accuracy of RFR are
unmatched among popular ML methods. The use of ensemble strategies and ran-
dom sampling leads to more accurate predictions.
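To make this concrete, the minimal scikit-learn sketch below (ours; the feature layout and numbers are illustrative) fits an RFR that maps a VM co-scheduling vector to a predicted service time, which is the role RFR plays later in this framework (Sect. 4.4).

# A minimal sketch (not the authors' implementation) of an RFR that predicts
# the mean service time of one microservice from a VM co-scheduling vector.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Each row is a co-scheduling scenario: VMs allocated on a PM to microservices
# A, B and C; the target is the measured service time (s) of microservice A.
X_train = np.array([
    [1, 0, 0],
    [2, 0, 0],
    [2, 1, 0],
    [2, 1, 1],
    [3, 2, 1],
])
y_train = np.array([0.082, 0.085, 0.097, 0.110, 0.131])

rfr = RandomForestRegressor(n_estimators=200, random_state=42)
rfr.fit(X_train, y_train)

# Predicted service time of A for an unseen contention level.
print(rfr.predict(np.array([[3, 1, 1]])))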

4 Estimating performance in MBCIs

The next sections provide a detailed overview of the analytical framework. Sec-
tion 4.1 presents the methodology used to build and validate the proposed mod-
els. Section 4.2 presents the MBCI considered in this work. Section 4.3 describes
the stochastic models and how they can be applied. Section  4.4 describes the
methodology for building a ML-based performance model to predict service
times. Finally, Sect. 4.5 presents the optimization algorithm.

4.1 Methodology

This section presents the methodology used to create and validate the proposed
models. Figure 1 shows a flowchart summarizing this methodology. It is divided
into two main sub-processes: Architecture Definition and Models Definition. The
first sub-process defines the MBCI and generates all the data needed to support
the construction of the models, which is done in the subsequent sub-process. The
data generated in a particular task can be used as input to previous tasks, refining
the entire process. Below we describe each major task of the methodology:


Fig. 2  A microservice-based cloud infrastructure (MBCI): consumers, the front-end layer (API gateway, load balancer, cloud watcher, service discovery/service registry), the microservices layer, the resource pool of reserved and elastic VMs running containers, and the cloud infrastructure layer (cloud controller, infrastructure and storage managers, quad-core nodes with available and allocated pCPU cores)

• Architecture definition Define the components that make up the MBCI, such as
the API gateway, load balancer, service discovery, mechanism for inter-process
communication, and deployment strategy.
• Parameters definition Define the parameters to be used in the model, such as
arrival rates, service times, VM type, number of reserved VMs, number of con-
tainers per VM, scaling trigger timer, and scaling thresholds and step sizes.
• Metrics definition Define the performance metrics to be extracted from the
model, e.g., throughput, mean response time, and discard rate.
• Stochastic modeling Build the stochastic models that represent the MBCI based
on the data generated in the previous tasks.
• Interference modeling Create the ML-based model to predict the impact on ser-
vice times of different VM co-scheduled scenarios. The methodology to create
this model is described in Sect. 4.4.
• Validation scenario definition/monitoring Define the MSs to be deployed in the
MBCI, the workload they will be exposed to, and the workload generation tools
to be used in the validation process. Then monitor the MSs in production to take
measurements.
• Models validation Statistically compare the results obtained from the analytical
framework with each monitored MS. If the model is considered validated, we can
use the modeling strategy to evaluate different deployment settings. Otherwise,
we repeat some steps of the methodology until the framework is considered sta-
tistically validated.

4.2 System architecture

One of the biggest challenges that MSs bring is the management of processing
resources. An MBCI consists of several components divided into multiple layers that
play an essential role in maintaining the operation of the MSs. Figure 2 shows the
architecture considered in this paper. The front-end layer is responsible for service
registration, discovery, forwarding, and monitoring. The second layer represents the
various MSs deployed in the infrastructure. The third layer represents the pools of
virtual resources allocated to run the various MSs instances. The last layer repre-
sents the infrastructure layer. The following main components define the system
architecture:

• API gateway MS instances have dynamic network locations. A common solution to this problem is to use an API gateway, a server-side aggregation endpoint. A
service consumer makes requests through an API gateway using its DNS name
over HTTP or TCP and invokes a MS through its REST API or Thrift API1 with-
out knowing its exact location. This gateway centralizes the routing of requests
and can also perform other tasks, such as load balancing.
• Service discovery (SD) SD plays a role in determining the network location of
MS instances. The key component of an SD is the service registry (SR), a data-
base that contains these network locations. The API gateway periodically que-
ries the SD to obtain the list of available instances. The SD provides a registra-
tion API that can be used to register or deregister MS instances in the SR. Many
deployment platforms such as Kubernetes [53] and Docker [54] have built-in
mechanisms for registering and discovering services.
• Load balancer (LB) LB distributes requests to a set of registered service
instances running in VMs or containers according to a specific algorithm, e.g.,
round-robin. This component can have multiple request queues, each for a MS.
The LB queries the SD mechanism and forwards each request to an available
instance.
• Cloud watcher (CW) CW monitors the state of the MSs by checking the number
of occupied slots in the queues and collecting data on the arrival of requests and
their processing. This component is activated periodically and checks whether
the scaling thresholds have been reached. If so, it sends scaling requests to the
cloud controller to instantiate or destroy VMs and/or containers.
• Microservices Each MS has its own technology stack.
• Resource pool (RP) RP represents the set of MS instances responsible for pro-
cessing requests. When the RP is updated by adding or removing MS instances,
the SD updates the SR.
• Cloud infrastructure The cloud infrastructure uses the Infrastructure-as-a-Ser-
vice (IaaS) approach, and this layer consists of an infrastructure manager (IM), a
storage manager (SM), and N redundant nodes that provide processing resources
to support virtualization of MS instances. The cloud components are based on
frameworks such as OpenStack2 and CloudStack3 [55].

1  https://thrift.apache.org/.
2  https://www.openstack.org/.
3  https://cloudstack.apache.org/.


Table 1  Metrics

Metric  Description
CDF     Cumulative distribution function
DP      Discard probability
DR      Discard rate
MRT     Mean response time
TP      Throughput
U       pCPU cores utilization / elastic VMs utilization

4.3 Stochastic modeling strategy

SPs perform performance evaluations considering a range of scenarios in a model representing the actual system. Our modeling strategy helps SPs find the most
appropriate deployment configurations given the constraints of the physical infra-
structure and individual MSs. We assume that only one MS is running on each VM,
and the number of containers running on each VM can vary. Each container runs
one instance of the MS. The model presented in this section is generic enough to
represent both instantiation of elastic VMs and a pool of instantiated elastic VMs
where only containers are launched. The model can represent different types of VMs
when not representing a pool of VMs. This section presents the modeling strategy
that represents the instantiation of elastic VMs, and Sect. 4.3.1 presents the changes
in the model to represent a pool of elastic VMs. Our strategy enables the evaluation
of performance metrics as described in Table 1.
The stochastic model was designed and solved using the Mercury tool4. Mercury
is a powerful tool that supports a considerable number of formalisms and allows a
wide range of evaluations for each of them, so it can help scholars and industry to
make predictions in different areas [56]. Mercury has a sophisticated feature that
can call external functions when all conditions for activating an SPN transition are
met. Thus, the transition using this feature is activated only if the associated external
function returns true. These functions receive as input the name of the transition,
the parameters and the current state of the model, i.e., a vector with the number of
tokens in each place. For timed transitions, an external function can also be called to
determine the average delay of the transition. These functions can perform complex
computations given the current state of the model.
The SPN model works in conjunction with a VMP/consolidation technique. VMP
is critical to providing efficient resource management in MBCIs and involves effi-
ciently allocating a group of VMs onto a set of PMs in accordance with a set of
criteria, such as to improve resource utilization and/or energy efficiency of the infra-
structure [57]. There are several VMP/consolidation techniques based on heuristic,

4
  https://​www.​modcs.​org/.

13
A performance modeling framework for microservices‑based…

Fig. 3  SPN subnets for one microservice

Table 2  Structural components
Component Description

An Inter-arrival time
Cn Replicated containers running for request processing
CW Cloud watcher timer
Dn Microservice scaling decision
LVM n Mean time for VMs and replicated containers to start for each
scaling-out request
RVM n Destruction of instantiated VMs releasing allocated cores
SI n Scaling-in requests not yet processed
SO n VMs being instantiated
NA n No scaling action because any threshold has been reached
Pn Service time
PCORES Number of pCPU cores available for instantiation of scaled VMs
Rn Requests queued or being processed
SOA n Decision to scale out the microservice
SIA n Decision to scale in the microservice

metaheuristic, exact, and approximate methods to find an optimal solution to this


placement problem [58].
Our framework represents independent MSs rather than the composition of mul-
tiple MSs. We consider that a user request can only be executed on one container.
The framework supports performance evaluation of individual MSs and the entire
MBCI, taking into account all MSs sharing the same infrastructure, the workloads
expected from each of them, and the performance constraints imposed by SLAs and
infrastructure resources. Based on the individual performance of each MS, it is pos-
sible to use a microservice composition approach where performance can be evalu-
ated by aggregating multiple MSs. There are several techniques for modeling the
composition of microservices [59].


Table 3  Transitions' attributes

Transition  Type       Server semantic  Weight  Priority  Enabling function
An          Timed      Single server    –       1         Yes
CW          Timed      Single server    –       1         No
LVMn        Timed      Infinite server  –       1         No
NAn         Immediate  –                1       1         Yes
Pn          Timed      Infinite server  –       1         No
RVMn        Immediate  –                1       1         No
SIAn        Immediate  –                1       2         Yes
SOAn        Immediate  –                1       2         Yes

Table 4  Main variables used throughout the paper

Parameter  Description
BSn        Number of slots in the queue
𝛼n         Step size
𝛽n         Number of reserved VMs for the microservice n
𝛿          Number of pCPU cores reserved for elastic VMs
𝛿n         Maximum number of pCPU cores that microservice n can allocate
Θn         Time to launch elastic VMs and their replicated containers
𝜆n         Inter-arrival rate
𝜆dn        Discard rate
𝜆en        Effective arrival rate
𝜇n         Service time
Ξ          Time between scaling decisions (cloud watcher)
Πsit_n     Threshold factor for scaling in the microservice n
Πsot_n     Threshold factor for scaling out the microservice n
Φn         Number of replicated containers running in reserved VMs
Ψn         Number of pCPU cores allocated/deallocated for each scaling request
𝛾nc        Number of replicated containers running per VM for microservice n

Figure 3 shows the structure of the stochastic model, which represents a single
MS. This model represents a queue, VMs, and containers for processing requests,
scaling thresholds, step sizes, and pCPU cores for elastic VM virtualizations.
The n-index included in some component names indicates that these components
are associated with a particular MS. Components without the n-index are unique
components representing the underlying infrastructure, which we will explain
shortly. The blue arcs represent the flow of creation of VMs and containers. On
the other hand, the red arcs represent the destruction of VMs and containers.
Tables 2, 3, and 4 describe the structural components, the attributes of the transi-
tions, and the parameters of the model, respectively.


Fig. 4  SPN model representing two microservices

The model consists of two SPN subnets. The processing subnet represents both
the arrival of requests and the current processing power allocated to support request
processing. The scaling subnet represents both the scaling mechanism with the pre-
defined scaling thresholds and step sizes and the processing power currently avail-
able in the MBCI to instantiate elastic VMs.
The model can represent many MSs sharing the processing power in the same MBCI through refinement. Figure 4 illustrates this refinement using two MSs. The SPN pattern representing a new MS is composed of all n-index components.
PCORES and CW are unique components, regardless of how many MSs are repre-
sented. As we can see, in addition to the new components for the new MS, three new
arcs connect the unique components to the new SPN subnet. For additional MSs, the
refinement pattern is the same.
The model assumes that all VMs running a MS are homogeneous. This means
that they have the same amount of memory, the same number of associated pCPU
cores, and the same number of replicated containers. However, the model may rep-
resent more than one type of VM, each running a specific MS.
PCORES represents the number of pCPU cores reserved on PMs to virtualize
elastic VMs. We consider the sum of the cores reserved for elastic VM virtualization
by consolidating multiple PMs, which is defined by 𝛿 . The 𝛿n parameter, on the other
hand, defines how many cores each MS can use simultaneously. Since the stochastic
model can consolidate the processing power of a number of PMs on the PCORES
place, VMs assigned to a MS can run on different PMs, considering multiple VM
co-scheduling scenarios depending on the type and total number of instantiated
VMs. A co-scheduling scenario determines which PM each VM runs on.
The 𝛾nc parameter defines the processing capacity of the VM. On the other hand,
Φn represents the number of replicated containers running in reserved VMs. These
must always be active to avoid delays in processing incoming requests, even if the
MS is empty, which means that the cloud does not destroy these VMs during the
scaling process. The number of VMs allocated to an MS, composed of reserved VMs (𝛽n) and elastic VMs, multiplied by 𝛾nc determines the current microservice processing capacity (MPC). MPC is the maximum number of requests an MS can serve at a given
time. An engineer can increase the MPC by adding reserved VMs, changing scaling
settings, adding new PMs for virtualization, or using these strategies in combination.
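A one-line Python sketch of the MPC relation just described, with illustrative values:

# MPC = (reserved VMs + allocated elastic VMs) * replicated containers per VM.
def mpc(reserved_vms: int, elastic_vms: int, containers_per_vm: int) -> int:
    return (reserved_vms + elastic_vms) * containers_per_vm

# 2 reserved VMs + 3 elastic VMs, each running 4 containers -> 20 concurrent requests.
print(mpc(reserved_vms=2, elastic_vms=3, containers_per_vm=4))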
Transition An comprises the arrival of requests at the MS (1∕𝜆n). The firing of An represents the admission or rejection of requests; admitted requests can be forwarded to a container or queued if no containers are available. The model assumes independently and identically distributed inter-arrival times, represent-
ing a Poisson arrival process [60]. An has a single-server semantic (SSS), meaning
that token processing is sequential regardless of the degree of transition activation.
In addition, the model uses a service discipline known as first come, first served
(FCFS) [60]. The FCFS LB has a queue for each MS, and each queue starts with
a capacity equal to the buffer size variable BSn . This capacity is decreased as new
requests are received and forwarded to the queue. Rn represents queued requests and
requests being served by containers. On the other hand, Cn represents the current
MPC and each token in it represents a container instance. An considers BSn to deter-
mine whether the incoming request is accepted and defines a guard condition that
performs this check as follows:
#Rn − #Cn < BSn .

If the number of tokens in Rn is equal to or less than Cn , there are no requests in the
queue. The time required for the LB to forward a request to a container instance is
encapsulated in the response time. Thus, there is no transition representing this pro-
cess of forwarding requests. When An is fired, a token is created into the place Rn.
Each Cn receives the initial processing capacity defined by Φn . If processing
resources are limited, Φn may be zero, meaning that no reserved VM is allocated.
However, this can affect performance because the FCFS LB forwards the requests
for processing and there is no instantiated container to process them the moment
they arrive, increasing the number of occupied slots in the queue. The number of
tokens in Cn may change due to scaling operations, defining a new MPC for the MS.
Cn has an output arc that connects it to Pn.
The transition Pn represents request processing, and this transition has infinite
server semantics (ISS). In ISS, an enabled transition simultaneously processes any
set of input tokens that enable it [49, 50]. The enabling degree of Pn defines the
number of simultaneously processed requests. This means that a container instance
becomes unavailable when the enabling degree of Pn increases by one. This degree
is determined by which input place contains the smallest number of tokens. More
precisely, the number of tokens in Rn and Cn determines the highest enabling degree
of Pn at a given time. If the number of tokens in Rn is greater than or equal to Cn , the
enabling degree is equal to the number of tokens in Cn . Otherwise, it is equal to the
number of tokens in Rn . Consequently, the enabling degree of Pn also determines
the current number of requests in the queue. The delay assigned to Pn is the mean
service time ( 𝜇n ). The service time is the time it takes a container to process a single
request. This time depends on the configuration assigned to the containers, e.g., the
upper limit of CPU consumption that each instance can consume, the number of
containers running concurrently on the VM, and the VMP scheme used. This means
that 𝜇n can be affected at each scaling operation. We leverage ML capabilities for
timed transitions to predict 𝜇n given the current VM co-scheduling scenario. To this
end, each Pn transition is associated with a function Gn that calls a database con-
structed by a ML model and returns the weighted average service time for the given
MS given the current number of allocated VMs. This function is defined as follows:


𝜇n ⟵ Gn(∇),

where ∇ represents an n-dimensional vector ∇ = [ms1(vm), ms2(vm), … , msn(vm)], where each element msn(vm) represents the current number of VMs for the n-th MS. It is necessary to consider the service time for a given MSn in each PM in which it is allocated. Based on the 𝜇n in each PM, the weighted average 𝜇n considered by Pn is calculated as follows:

\mu_n \longleftarrow G_n(\nabla) = \frac{\sum_{i=1}^{m} w_i \cdot \mu_i}{\sum_{i=1}^{m} w_i},

where m is the number of PMs running the MSn , wi is the number of VMs running
in PMi for the MSn , and 𝜇i is the service time of MSn in PMi . The number of VMs
assigned to a MS is given by #Cn ∕𝛾nc , and the current allocation of VMs to PMs is
taken into account when generating the CTMC or during simulations, which may
cause the service times for each MS to vary given the current state of the model. An
output arc connects Pn to Cn , i.e., when Pn fires, the processing of a request is com-
plete and a container becomes available for processing a new request.
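A hedged Python sketch of the weighted-average lookup described above; predict_service_time stands in for the RFR-backed database and the allocation list for the VMP scheme of the current model state, so all names are illustrative rather than the framework's API.

from typing import Callable, Dict, List, Tuple

def g_n(
    allocation: List[Tuple[int, Dict[str, int]]],  # (VMs of MS n on PM_i, coVM scenario of PM_i)
    predict_service_time: Callable[[Dict[str, int]], float],
) -> float:
    """Weighted mean service time of microservice n across the PMs hosting it."""
    num, den = 0.0, 0
    for vms_on_pm, covm_scenario in allocation:
        if vms_on_pm == 0:
            continue
        mu_i = predict_service_time(covm_scenario)  # RFR prediction for this coVM scenario
        num += vms_on_pm * mu_i
        den += vms_on_pm
    return num / den

# Example: MS "A" has 2 VMs on PM1 (shared with one VM of "B") and 1 VM on PM2.
lookup = {"A2_B1": 0.097, "A1": 0.082}
mu = g_n(
    allocation=[(2, {"A": 2, "B": 1}), (1, {"A": 1})],
    predict_service_time=lambda s: lookup["A2_B1" if s.get("B") else "A1"],
)
print(round(mu, 4))  # (2*0.097 + 1*0.082) / 3 ≈ 0.092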
Our modeling strategy supports reactive and predictive (also called proactive)
methods for scaling microservices [61–64]. A reactive method can increase the num-
ber of VMs allocated to a MS whenever CPU utilization, queued requests, or other
resource utilizations reach a predefined level. Alternatively, predictive methods pro-
vide a solution to minimize SLA violations by proactively allocating or removing
VMs based on provisioning policies that are based on forecasted workload [63, 64].
Our work presents a model that employs a reactive scaling threshold-based auto-
scaler based on the number of occupied slots in queues. As observed in [65], scal-
ing decisions based on the number of requests in the queue are much more resilient
to microservice characteristics variations. However, our model is generic enough to
use other scaling strategies.
The current number of containers (i.e., MPC) and requests in the queue are fac-
tors that determine whether a MS needs to scale. A threshold for scaling-out (SOT)
defines the number of requests in a queue to instantiate resources when processing
power is available. The cloud instantiates VMs only if the number of available pCPU
cores is equal to or greater than the demand and, given the current VM co-schedul-
ing scenario, there is capacity on some PMs to accommodate the new VMs. In addi-
tion, there may be limits on the number of elastic VMs allocated to each MS, such
that a single MS cannot take up all of the MBCI’s processing power. On the other
hand, a threshold for scaling-in (SIT) defines the number of requests in a queue to
destroy elastic VMs and containers releasing pCPU cores and other resources. SOT
must always be greater than SIT. Otherwise, no elastic VM would be destroyed.
This work assumes that all PMs are homogeneous. However, our framework can
be adapted to represent heterogeneous hardware. It is worth noting that the instantia-
tion of VMs does not depend exclusively on the number of available pCPU cores, but
also on the types and current number of instantiated VMs and where these VMs are
distributed. For example, if only one VM with a memory-intensive MS is started and almost all of the host’s memory is allocated to that VM, no other VMs on that host can be allocated. Modeling this type of constraint is not done by the SPN model, but by the VMP algorithm.

Table 5  Enabling functions for the scaling defined with the Mercury language

Transition  Enabling function
SOAn        ((#Rn − #Cn) > ((#Cn + (#SOn × 𝛾nc)) × Πsot_n)) AND
            (((((#Cn ∕ 𝛾nc) + #SOn) − Φn) ∕ 𝛼n) × Ψn < 𝛿n)
SIAn        ((#Rn − #Cn) < (#Cn × Πsit_n)) AND (#SIn < (((#Cn ∕ 𝛾nc) − 𝛽n) ∕ 𝛼n)) AND
            (((#Cn − ((𝛼n × 𝛾nc) × (#SIn + 1)) = 0) AND (#Rn = 0)) OR
            ((#Cn − ((𝛼n × 𝛾nc) × (#SIn + 1)) > 0)))
Transition CW comprises the cloud watcher, which takes action to scale MSs. The
model does not take immediate scaling action when a threshold is reached to avoid
oscillation in launching/destroying VMs and containers. This makes sense because
reaching a threshold may be transient, i.e., a threshold reached at one check may no longer be reached at the next. However, the model can be refined to support purely reactive
scaling. The delay assigned to CW comprises the time interval between scaling deci-
sions ( Ξ ). When CW fires, it creates a token into all Dn places.
Dn contains three output arcs associated with a different immediate transition. The
first arc, which connects Dn to SIAn , denotes the decision to scale in the MS. The sec-
ond arc, connecting Dn to SOAn , represents a decision to scale out. Finally, the third arc
connecting Dn to NAn means that no scaling action is taken. There are no conflicts for
a token in Dn , as all transitions are mutually exclusive. SIAn and SOAn have priority 2,
and both have to enable functions that specify the required conditions for scaling the
MS (see Table 5). Others scale-up and scale-down rules can be defined based on the
variables in the model. Also, it is also possible to associate an external function with
each SOAn and SIAn transition to define sophisticated rules.
SOAn means that the CW sends a scaling-out request (SOR) to the cloud controller
to launch one or more VMs. An arc connecting PCORES to SOAn denotes the alloca-
tion of pCPU cores for the new elastic VMs. When this transition is fired, it consumes
from PCORES the number of tokens corresponding to Ψn , making these cores unavail-
able. The parameters 𝛼n and Ψn determine the maximum number of elastic VMs that
can be allocated to a MS. SOAn has an enabling function responsible for its activation
(see Table 5). The first main condition checks whether the MS has reached a SOT:

#Rn − #Cn > (#Cn + #SOn × 𝛾nc) × Πsot_n.

This condition checks whether the number of requests in the queue is greater than the current SOT. The parameter Πsot_n is the multiplicative factor multiplied by the current MPC, which gives the SOT. The second condition checks whether the MS has already allocated the maximum number of cores allowed, as follows:

(((#Cn ∕ 𝛾nc + #SOn − 𝛽n) ∕ 𝛼n) × Ψn) < 𝛿n.

Assuming there is no limit to the number of cores the MS can allocate, we consider
the total number of cores provisioned in the MBCI for elastic VMs (i.e., 𝛿n = 𝛿 ).

13
A performance modeling framework for microservices‑based…

Even if the enabling function evaluates to true, SOAn is activated only if the number
of cores required for scale-out is not used by other MSs. In addition, before acti-
vating SOAn , each SOAn transition calls an external function hn (∇) that queries the
scheme generated by the VMP algorithm and checks whether it is possible to allo-
cate new VMs for this MS given the current number of VMs and their distribution
among PMs. When SOAn is fired, it creates into the SOn place the number of VM
instances that the cloud needs to instantiate.
The tokens in SOn enable LVMn , which is responsible for launching VMs and
containers. LVMn corresponds to the instantiation time. This time includes the VM
instantiation time plus the container start-up time ( Θn ). This transition has ISS,
which means it starts VMs simultaneously. When this transition is fired, an elastic
VM and its containers are ready to handle requests and the MPC of the MS increases
by 𝛾nc containers.
SIAn represents the CW sending a request to the cloud controller to destroy VMs
from a MS. SIAn has an output arc to SIn . Each token in SIn represents a scaling-
in request (SIR). SIAn has an enabling function responsible for its activation (see
Table 5). The first main condition evaluates whether the MS has reached a SIT as
follows:

#Rn − #Cn < #Cn × Πsit_n.

This condition evaluates whether the number of requests in the queue is less than the
current SIT. The second condition prevents more VMs from being destroyed than
the total number of elastic VMs available to the MS, as follows:
#SIn < (#Cn ∕𝛾nc − 𝛽n )∕𝛼n .

The third condition consists of two main logical expressions, as follows:



((#Cn − ((𝛼n × 𝛾nc) × (#SIn + 1)) = 0) ∧ (#Rn = 0)) ∨
((#Cn − ((𝛼n × 𝛾nc) × (#SIn + 1)) > 0)).

These two expressions compute the MPC of the MS considering the destruction of
elastic VMs. First, we obtain the number of elastic VMs to be destroyed by multi-
plying 𝛼n by 𝛾nc . Next, we multiply the resulting value by the number of SIRs not yet
processed, counting the current request. Then we subtract from Cn the elastic VMs
to be destroyed, obtaining the MPC after the scaling-in process. Thus, the first main
expression checks whether the MS runs out of containers to handle the requests after
destroying the elastic VMs. If so, the VMs corresponding to the current SIR are
destroyed only if there are no requests in the MS. If false, the following expression
checks whether there are any VMs left to process those requests after the VMs are
destroyed. If all required conditions are met, the MS is scaled in. When SIAn is fired,
the token in Dn is consumed and a new token is created into SIn , starting the scaling-
in process.
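To summarize the two guards in executable form, the hedged Python sketch below re-expresses the scale-out and scale-in conditions over the marking counts (#Rn, #Cn, #SOn, #SIn); names mirror the paper's symbols, and this is an illustration of the logic rather than the Mercury enabling-function syntax of Table 5.

def can_scale_out(r_n, c_n, so_n, gamma_c, beta_n, alpha_n, psi_n, delta_n, pi_sot):
    queued = r_n - c_n                                  # requests waiting in the queue
    sot = (c_n + so_n * gamma_c) * pi_sot               # scaling-out threshold (SOT)
    cores_if_granted = ((c_n / gamma_c + so_n - beta_n) / alpha_n) * psi_n
    return queued > sot and cores_if_granted < delta_n

def can_scale_in(r_n, c_n, si_n, gamma_c, beta_n, alpha_n, pi_sit):
    queued = r_n - c_n
    below_sit = queued < c_n * pi_sit                   # scaling-in threshold (SIT)
    has_elastic_left = si_n < (c_n / gamma_c - beta_n) / alpha_n
    mpc_after = c_n - (alpha_n * gamma_c) * (si_n + 1)  # capacity after destroying VMs
    safe = (mpc_after == 0 and r_n == 0) or mpc_after > 0
    return below_sit and has_elastic_left and safe

# Example: 12 queued+in-service requests, 8 containers, no pending scaling requests.
print(can_scale_out(r_n=12, c_n=8, so_n=0, gamma_c=4, beta_n=1,
                    alpha_n=1, psi_n=2, delta_n=8, pi_sot=0.25))  # True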
The tokens in SIn enable RVMn . We assume that there is no delay in destroy-
ing VMs when the decision to scale is made. However, a timed transition could
also be used. When this transition is fired, it destroys the number of VMs corresponding to the step size and releases the associated pCPU cores and other resources for use by other MSs.

Fig. 5  Models for representing scaling scenarios
Numerically solving a model with a large number of MSs, VMs, and containers
can lead to an explosion of the state space and require a lot of time to solve each
one. Evaluating these models through simulation is an alternative to avoid this prob-
lem. Another solution is the divide-and-conquer approach, which consists in creat-
ing several groups of MSs and assigning a certain number of PMs to each of them.
The model shown in Fig. 5a represents the scaling mechanism with four MSs.
However, this modeling pattern can also be used to represent n MSs. There are
some similarities between this model and the model shown in Fig. 4. This model
is not used for performance evaluation, but to generate a CTMC representing all
possible scaling scenarios given the current values for all parameters 𝛼n , Ψn , and
𝛿 . All timed transitions have a delay time of 1 and have SSS. The SOAn and SIAn
transitions represent the allocation and deallocation of VMs for each MS, respec-
tively. Each msn place represents the number of elastic VMs currently allocated to
the n-th MS. This SPN model generates a CTMC state list and a state transition
matrix ℚ that represents all VM allocation and deallocation scenarios as follows:


\mathbb{Q} =
\begin{array}{c|ccccc}
               & \nabla^{e}_{1} & \nabla^{e}_{2} & \nabla^{e}_{3} & \cdots & \nabla^{e}_{n} \\ \hline
\nabla^{e}_{1} & p_{1,1} & p_{1,2} & p_{1,3} & \cdots & p_{1,n} \\
\nabla^{e}_{2} & p_{2,1} & p_{2,2} & p_{2,3} & \cdots & p_{2,n} \\
\nabla^{e}_{3} & p_{3,1} & p_{3,2} & p_{3,3} & \cdots & p_{3,n} \\
\vdots         & \vdots  & \vdots  & \vdots  & \ddots & \vdots  \\
\nabla^{e}_{n} & p_{n,1} & p_{n,2} & p_{n,3} & \cdots & p_{n,n}
\end{array}

Each CTMC state is represented by a vector ∇en = [corecpu, ms1(vme), ms2(vme), … , msn(vme)], where the element corecpu represents the current number of available cores and each element msn(vme) represents the current number of elastic VMs for the nth MS. The value that pn,n takes in the matrix ℚ can be 1 or 0, indicating whether or
not there is a transition between states. Using the matrix ℚ and the list of states, it is
possible to map whether VMs are allocated and deallocated at each state transition
and for which MS. Figure 5b represents the first five states of a hypothetical scaling
setting. This CTMC represents all possible state transitions, considering only the
initial state (gray circle). As we can see, there are 9 cores available for instantiating
elastic VMs, and each state transition changes the number of cores available and
VMs assigned to a MS. For each change of 𝛼n or 𝛿 , a new CTMC must be created.
The variations of these parameters must reflect the variations that can be performed
by the optimization algorithm. All generated ℚ matrices and the respective state lists
serve as input to a VMP algorithm, which generates the VMP scheme with all sce-
narios of allocating VMs into PMs considering all state transitions.
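A minimal sketch (ours, with invented numbers) of how a state list and ℚ-matrix of this kind can be traversed to recover, for every state transition, which MS allocates or deallocates elastic VMs:

# Each state is [available pCPU cores, VMs of MS1, VMs of MS2].
states = [
    [9, 0, 0],   # initial state: 9 free cores, no elastic VMs
    [7, 1, 0],   # MS1 holds one elastic VM (2 cores)
    [6, 0, 1],   # MS2 holds one elastic VM (3 cores)
    [5, 2, 0],
    [4, 1, 1],
]
# q[i][j] = 1 if a transition from state i to state j exists.
q = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
]

for i, row in enumerate(q):
    for j, reachable in enumerate(row):
        if not reachable:
            continue
        for ms, (before, after) in enumerate(zip(states[i][1:], states[j][1:]), start=1):
            if after > before:
                print(f"{i} -> {j}: MS{ms} allocates {after - before} VM(s)")
            elif after < before:
                print(f"{i} -> {j}: MS{ms} deallocates {before - after} VM(s)")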

4.3.1 The modeling framework when using a VM pool

When the MBCI adopts a VM pool, the meaning of some parameters changes. The
difference is that when the MBCI uses a VM pool, all VMs are already instantiated
and have the same processing capacity, so only the number of started/destroyed con-
tainers in each of them changes. In this way, only the start-up time of the containers
is considered, represented by the SOAn transitions. The PCORE place represents the
instantiated elastic VMs that are not allocated to any MS. pCPU cores for elastic
VMs ( 𝛿 ) becomes elastic VMs, and pCPU cores allocated/deallocated in each SOR
( Ψn ) becomes elastic VMs allocated in each SOR (i.e., equal to the step size). In this
context, the utilization metric (i.e., U 𝛿 and Un𝛿 ) represents the utilization of the elas-
tic VMs in the pool. However, the functional logic of the model remains the same.
Table 6 shows the parameters whose meaning has changed in this new context.

Table 6  New meanings for variables - VM pool

Parameter  Description
𝛿          Number of elastic VMs in the pool.
𝛿n         Maximum number of elastic VMs that microservice n can allocate.
Θn         Time to start up the replicated containers for the microservice n.
Ψn         Number of elastic VMs allocated/deallocated for each scaling request. In this context, Ψn = 𝛼n.

4.3.2 Performance metrics

Next, we present the key metrics derived from the model, which are essential to support microservices deployment in private infrastructures.
A SP must estimate the utilization of both pCPU cores for instantiating elastic VMs if no VM pool is used, or elastic VMs if a pool is used, for each deployment
configuration considered. By evaluating this metric, the SP can estimate whether the
processing resources are sufficient to handle the expected workload. If the utilization
is very high, this could indicate that the SP needs to allocate more PMs or that some
scaling settings are inappropriate because they may allocate more resources than the
MSs need. The total utilization of cores is estimated as shown in Eq. 1. Similarly,
the provider can also estimate the utilization per MS, as shown in Eq. 2.
U^{\delta} = \delta - \left( \sum_{i=1}^{n} P(m(PCORES) = i) \times i \right) \quad (1)

U_n^{\delta} = \left( \left( \left( \sum_{i=1}^{n} P(m(R_n) = i) \times i \right) / \gamma_n^{c} - \beta_n \right) + \left( \sum_{i=1}^{n} P(m(SO_n) = i) \times i \right) \right) / \alpha_n \times \Psi_n \quad (2)

DP is obtained by evaluating the probability that all slots in the buffer are occupied,
i.e., evaluating the probability that the queue size is equal to BSn , as shown in Eq. 3.
DR is obtained as shown in Eq. 4. The DR of the MBCI is the sum of the DRs of all
MSs. The effective arrival rate ( 𝜆en ) can be derived from DP and corresponds to the
rate of requests that are not discarded, as shown in Eq. 5.
DP_n = P\left( \left( \sum_{i=1}^{n} P(m(R_n) = i) \times i \right) - \left( \sum_{i=1}^{n} P(m(C_n) = i) \times i \right) = BS_n \right) \quad (3)

DR_n = \lambda_n \times DP_n \quad (4)

\lambda_n^{e} = \lambda_n \times (1 - DP_n) \quad (5)


MRT includes the total amount of time a request remains in a MS and is obtained as
shown in Eq. 6.
MRT_n = \left( \sum_{i=1}^{n} P(m(R_n) = i) \times i \right) / \lambda_n^{e} \quad (6)

TP can represent the number of requests processed per unit time in both the MBCI
and each MS. To compute TP, we considered Little’s law [60], which states that the
TP for a stable system is equal to the effective arrival rate ( 𝜆en ). The MBCI through-
put is the sum of the TP of all MSs.
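
To make Eqs. 1–6 concrete, the sketch below derives the metrics from steady-state token-count probabilities. It is only an illustration under the assumption that the probabilities P(m(place) = i) have already been extracted from a steady-state solution of the SPN; the dictionary layout and function names are ours, not part of the framework or of the Mercury tool.

```python
# Minimal sketch: deriving the performance metrics of Eqs. 1-6 from steady-state
# token-count probabilities of the SPN (hypothetical data layout).

def expected_tokens(dist):
    """E[#tokens] in a place, given a dict {token_count: probability}."""
    return sum(i * p for i, p in dist.items())

def total_core_utilization(delta, pcores_dist):
    """Eq. 1: total elastic cores minus the expected number of idle cores in PCORES."""
    return delta - expected_tokens(pcores_dist)

def microservice_metrics(lam_n, p_queue_full, r_dist):
    """DP, DR, TP and MRT for one microservice (Eqs. 3-6)."""
    dp_n = p_queue_full                      # Eq. 3: P(queue length == BS_n)
    dr_n = lam_n * dp_n                      # Eq. 4: discard rate
    lam_e = lam_n * (1.0 - dp_n)             # Eq. 5: effective arrival rate
    mrt_n = expected_tokens(r_dist) / lam_e  # Eq. 6: mean response time
    tp_n = lam_e                             # TP of a stable MS equals lambda_e
    return {"DP": dp_n, "DR": dr_n, "TP": tp_n, "MRT": mrt_n}

# Example with made-up numbers: 10 req/s arrival rate, 2% probability of a full buffer,
# and a small distribution for the number of requests in place R_n.
print(total_core_utilization(delta=9, pcores_dist={9: 0.2, 6: 0.5, 3: 0.3}))
print(microservice_metrics(lam_n=10.0, p_queue_full=0.02,
                           r_dist={0: 0.1, 5: 0.3, 10: 0.4, 20: 0.2}))
```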
SPs should know when their MSs and MBCIs are more likely to finish processing
an expected workload. CDFs, F_T(t), are time-dependent functions that indicate
such a time by the maximum probability of absorption. Non-negative random variables
can define their absorption probability distributions in terms of their probability
density functions f_T(t), as shown in Eqs. 7 and 8. CDFs allow for a more comprehensive
evaluation of trade-offs considering different deployment scenarios. One type of
interpretation can be drawn when analyzing CDFs, which may be applied to a myriad
of case studies:

• Probability of finishing the workload processing before time t: P(T < t) denotes
the probability of finishing the processing before a time instant t.
F_T(t) = \int_{0}^{t} f_T(\tau)\, d\tau \qquad (7)

f_T(t) = \frac{dF_T(t)}{dt} \qquad (8)

Fig. 6  Methodology for modeling performance interference

4.4 Performance interference modeling

Performance models for MBCIs must account for different scenarios of co-scheduled
VMs/containers that stress physical resources in different ways and impact perfor-
mance. A VM co-scheduling scenario represents a set of VMs/containers distributed
across multiple PMs. A collocated VM (coVM) scenario, on the other hand, represents
a set of VMs/containers running in a single PM. A VM co-scheduling scenario can
have up to n coVM scenarios associated with it, where n is the number of PMs for
virtualization. This allocation can be defined by a VMP/consolidation algorithm. Since
the number of VMs and the types of MSs deployed in the PMs vary due to scaling,
resource contention between coVMs affects service times, and it is necessary to esti-
mate this overhead for each MS. We propose a method for building a predictive per-
formance model that takes into account the level of stress that a coVM scenario places
on PMs, as shown in Fig. 6. In the following, we explain the steps to build the VMP
scheme, the predictive model, and the supporting databases for the SPN model.


• Step 1: Profiling individual microservices In this phase, we profile the resource
consumption and performance of each MS running in isolation in a PM. We
assume that all PMs supporting virtualization have the same processing capac-
ity. Engineers profile from 1 to n VMs, where n is the maximum number of VMs
that should be allocated in a PM for the MS. This can be determined by the VM
configuration, PM configuration, and 𝛿n . If the SP wants to evaluate performance
considering a different number of containers in the VM to decide what number
of containers to run in each VM, this is also done in this step. The goal is to
obtain a statistical sample for each number of VMs.
• Step 2: Generating the Q-matrix of the scaling engine At this stage, several ℚ
-matrices can be created, each representing all possible scaling scenarios given
the current 𝛼n ’s and 𝛽n’s. If any of the 𝛼n ’s and/or 𝛽n ’s of all MSs are changed
by the optimization algorithm, a ℚ-matrix must be created for each set of pos-
sible values given all 𝛼n ’s and 𝛽n’s. When the number of reserved VMs changes,
the value of 𝛿 also changes. And when the step size for a microservice changes,
the VM allocation pattern also changes. Therefore, for each change in 𝛼n and/or
βn, a VMP schema database should be created. The engineer can use the design of
experiments (DoE) technique to generate all possible combinations of αn and βn that
the optimization algorithm can consider [60].
• Step 3: Defining the VMP schema and all VM/container co-scheduling scenarios
A tool such as CloudSim toolkit5 can support the creation of VM combinations
between PMs according to a specific VMP algorithm such as First Fit [16]. When
not using a VM pool, a VMP algorithm considers the number of reserved VMs, the
resources consumed by each VM, the current distribution of VMs among PMs, and
the resources currently available on the PMs to decide which PMs to allocate the
new VMs to. The cloud scheduler should organize VMs between PMs according to
the generated VMP scheme, and live migration can support this process [57]. When
adopting a VM pool, the algorithm decides on which VM to start the containers
considering the VMs available. For both scenarios, this algorithm should obtain all
scaling scenarios represented by the ℚ matrix and its list of states, and according to
certain rules, generate all possible scenarios of co-scheduled VMs/containers and
store them in a database. Based on the generated data, it is possible to map the
currently allocated VMs, ∇, to PMs, ∇^pm = [∇_1^pm, ∇_2^pm, …, ∇_n^pm], as follows:

∇^pm ← m(∇).

Each element ∇_n^pm is another n-dimensional vector representing the number of
VMs that PM_n is running for each MS_n, i.e., ∇_n^pm = [ms_1^(vm), ms_2^(vm), …, ms_n^(vm)].
It is then possible to know where each VM is running and with which other VMs.
In addition, a function h_n(∇) called by transition SOA_n decides whether the allocation
of new VMs is possible given the current allocation of VMs based on the allocation
constraints defined by the VMP scheme (a minimal sketch of such a placement routine
is given at the end of this subsection). Figure 7 shows a VM allocation
scenario considering seven VMs. The service time for each MS in each of the PMs
may differ due to the interference caused by the other VMs.

5  https://cloudsimplus.org/.


Fig. 7  A co-scheduling scenario with 3 coVM scenarios (∇ = [2, 1, 3, 1] for microservices A–D, mapped by the VMP scheme database to ∇^pm = [[1,1,1,0], [0,0,1,1], [1,0,1,0]] across PM-1, PM-2, and PM-3, each a quad-core node)

• Step 4: Profiling for coVM scenarios Considering the VMP scheme created, we
profile the resource utilization and performance of different coVM scenarios in
a single PM. However, an empirical evaluation of all combinations of VMs in a
PM can be very time consuming if the number of scenarios is high. Therefore,
only a small number of scenarios is profiled. A method such as Monte Carlo
or Latin Hypercube Sampling (LHS) can be used to select the scenarios to be
empirically evaluated [26, 66].
• Step 5: Creating a performance predictor for coVM scenarios The collected data
is used to build a RFR-based model. For each coVM scenario X, the following
input features are used in the predictive model:

– An n-dimensional vector ∇_n^pm.
– An n-dimensional vector ∇^c = [ms_1^(c), ms_2^(c), …, ms_n^(c)], where each element
ms_n^(c) is the sum of the number of containers when running ms_n^(vm) VMs for the
n-th MS.
– An n-dimensional vector ∇^st = [ms_1^(st), ms_2^(st), …, ms_n^(st)], where each element
ms_n^(st) takes the value representing the service time of the n-th MS when ms_n^(vm)
VMs of that MS execute in isolation in a PM.
– An n-dimensional vector ∇^u = [r_1^(u), r_2^(u), …, r_n^(u)], where each element r_n^(u) represents
the sum of individual utilizations for the n-th resource under consideration
from all MSs in X, where r_1^(u) could be CPU, r_2^(u) could be memory, and
so on. More precisely, the summation for each resource r_n^(u) takes into account
the individual utilization of each MS when running ms_n^(vm) VMs for this MS in
isolation in a PM.

A prediction function f takes the above vectors as input and returns an
n-dimensional vector μ that predicts the service times for all MSs running in the
coVM scenario X, i.e., μ(X) = [μ_1^(ms), μ_2^(ms), …, μ_n^(ms)], as follows:

μ(X) ← f(∇^pm, ∇^c, ∇^st, ∇^u)
• Step 6: Building a performance database for coVM scenarios In this final step,
the predictive model is used to create a database containing the service time for
each MS, considering all possible combinations of VMs on a single PM that were
not empirically profiled to create the learned model.

After these steps, the SPN model can use the performance and VMP databases
to support a myriad of evaluations. It is important to note that the predictive model
is valuable when the number of scenarios is high. When the number of scenarios is
low, all scenarios can be empirically profiled.
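
As referenced in Step 3, the sketch below illustrates a First Fit placement routine in the spirit of [16]: given the number of VMs each MS requires in a scaling state and a per-VM resource demand, it returns, for each PM, how many VMs of each MS it hosts (roughly the vectors ∇_n^pm). The data layout, capacities, and function name are assumptions made for illustration and are independent of any particular toolkit such as CloudSim.

```python
# Minimal First Fit sketch (Step 3): map the VMs of a scaling state onto PMs.
# state[ms]     -> number of VMs the scaling state allocates to that MS
# vm_demand[ms] -> resource demand (e.g., pCPU cores) of one VM of that MS
# pm_capacity   -> list with the free capacity of each PM
from typing import Dict, List

def first_fit_placement(state: Dict[str, int],
                        vm_demand: Dict[str, float],
                        pm_capacity: List[float]) -> List[Dict[str, int]]:
    free = list(pm_capacity)                   # remaining capacity per PM
    placement = [dict() for _ in pm_capacity]  # VMs per MS hosted by each PM
    for ms, count in state.items():
        for _ in range(count):
            for pm, cap in enumerate(free):    # scan PMs sequentially
                if cap >= vm_demand[ms]:       # first PM with enough resources
                    free[pm] -= vm_demand[ms]
                    placement[pm][ms] = placement[pm].get(ms, 0) + 1
                    break
            else:
                raise ValueError(f"no PM can host another VM of {ms}")
    return placement

# Example: a state allocating [2, 1, 3, 1] VMs to A-D, one core per VM,
# three quad-core PMs (illustrative numbers only).
print(first_fit_placement({"A": 2, "B": 1, "C": 3, "D": 1},
                          {"A": 1, "B": 1, "C": 1, "D": 1},
                          [4, 4, 4]))
```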

4.5 Optimization algorithm

Since the stochastic model can represent multiple MSs, it is necessary to use a
MOO approach where there may not be a unique global solution, but a set of suit-
able solutions. The quality of the solutions found is defined by the OFs. The OFs
are computed from the decision variables, i.e., the unknowns determined by a par-
ticular solution. We adapted the NSGA-II algorithm to find values for each selected
parameter, using up to three OFs for each MS: TP, MRT, DP. The OFs of a MS may
conflict with the OFs of other MSs because the underlying infrastructure is shared
among the MSs. Maximizing the TP of a MS may reduce the TP of other MSs.
Therefore, the main goal of the NSGA-II algorithm in our framework is to search for
solutions that guarantee that all OFs satisfy the SLA constraints for all MSs.
The optimization algorithm is described in Algorithm 1 and receives as input the
stochastic model; a set of constant parameters 𝜌c , which are not changed by the algo-
rithm; a set of variable parameters 𝜌v with an interval of values that can be assigned
to each parameter; the expected workload Δ , which defines the arrival rate for each
MS; the population size N, which defines the number of individuals generated in
each iteration; and the performance constraints defined in the SLAs. Each MS has
its own set of parameters that define the number of reserved VMs ( 𝛽n ), which may
vary from 0 to n; the number of containers per VM ( 𝛾nc ), which can vary from 1 to n;
the step size ( 𝛼n ), which can vary from 1 to n, etc. The algorithm starts by exploring
the space of possible configurations considering the parameter ranges 𝜌v.
The algorithm randomly generates an initial parent population P consisting of N
solutions uniformly distributed within the parameter bounds, and evaluates each solu-
tion using the OFs returning the fitness value of the given solution, starting from line
1 and ending with line 5. Each solution is called a chromosome and its representa-
tion consists of a pair of vectors representing the values assigned to the parameters
and the mapping of these values to each parameter. This initial population undergoes
an evolutionary process based on a fitness value, where at each stage of this process
the best individuals are selected for the next evolutionary stage and the others are
discarded. This evolutionary stage includes a number of generations of individuals,
starting from line 6 and ending with line 20. In the inner loop, the algorithm gener-
ates the offspring U from a population P(t) using crossover and mutation operators.
In this phase, parents with a higher fitness value are selected with a higher probability


Fig. 8  Process for determining the fitness value for si

(line 8). In the crossover phase, two chromosomes from the set of parents P(t) are
combined to form a pair of children called offspring individuals (line 9). Next, muta-
tion introduces random changes and is applied at the gene level (line 10). After the
offspring population is generated, the algorithm evaluates the OFs for each offspring
individual and assigns it a fitness value, starting from line 13 and ending with line
15. The algorithm combines the parent and offspring populations and selects the best
ones to form a new population, starting from line 16 and ending with line 19. Among
a set of solutions {P(t) ∪ U} , the non-dominated set of solutions is the one that is not
dominated by any member. A solution s1 dominates the other solution s2 if s1 is not
worse than s2 for all objectives and s1 is strictly better than s2 in at least one objective.
This evolutionary process ends when a satisfactory level of fitness value or a prede-
fined number of generations has been reached. This process leads to a selection of
solutions that represent the best solutions considering the trade-off between all OFs.
This set of solutions is called the Pareto front of solutions.
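
The dominance relation used by NSGA-II can be stated compactly in code. The sketch below is only a reading aid for objectives that are all minimized (a maximized objective such as TP can be negated first); it is not the NSGA-II implementation used in the framework.

```python
# Minimal sketch of Pareto dominance for minimization objectives.
def dominates(s1, s2):
    """True if s1 is no worse than s2 in every objective and strictly better in at least one."""
    no_worse = all(a <= b for a, b in zip(s1, s2))
    strictly_better = any(a < b for a, b in zip(s1, s2))
    return no_worse and strictly_better

def pareto_front(solutions):
    """Keep the solutions that are not dominated by any other solution."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions if other is not s)]

# Example: objective tuples = (MRT, DP, -TP), all to be minimized.
print(pareto_front([(10.0, 0.05, -6.0), (12.0, 0.03, -6.5), (15.0, 0.10, -5.0)]))
```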

Algorithm 1 Optimization algorithm (NSGA-II)


Input: SPNModel, ρc , ρv , ∆, N, SLAs
Output: solutions meeting the performance constraints
1 t ← 0 //generation ID
2 Initialize a population P(t) with N individuals (ρv );
3 for each individual i of P(t) do
4 evaluateStochasticFitnessValue(SPNModel, si , ρc , ∆)
5 end
6 while termination criteria not fulfilled do
7 while children number is less than the population size do
8 Apply binary tournament selection to select two parents;
9 Apply crossover over the selected parents;
10 Apply mutation on two children;
11 Add the children into a new population U;
12 end
13 for each offspring individuals i do
14 evaluateStochasticFitnessValue(SPNModel, si , ρc , ∆)
15 end
16 non-dominated sorting of { P(t) ∪ U }
17 calculate crowding distance of { P(t) ∪ U }
18 P(t+1) ← generate new population with elitism from { P(t) ∪ U }
19 t←t+1
20 end
21 return the Pareto front of solutions;

Figure 8 shows the process for determining the stochastic fitness value for a given
solution si . This process is represented by calling the evaluateStochasticFitnessValue
function. The values of the decision variables from si are combined with the set 𝜌c .
The resulting set is mapped to the corresponding model parameters. The model is
then solved, and the fitness value of the given solution is determined. It is important
to note that the algorithm calls the Mercury API [56] to solve the model. Algorithm 2
describes the application of the analytical framework to find optimized solutions.
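
The fitness evaluation of Fig. 8 can be pictured as in the sketch below. It is purely illustrative: solve_spn_model is a placeholder for the actual call to the Mercury API [56], and the parameter and key names are assumptions, not the real interface.

```python
# Illustrative sketch of evaluateStochasticFitnessValue (Fig. 8).
# solve_spn_model stands in for the external SPN solver (e.g., Mercury) and is
# assumed to return the metrics of interest for the configured model.

def evaluate_stochastic_fitness(solve_spn_model, solution, constants, workload):
    # 1) Combine the constant parameters with the decision variables of this solution.
    params = {**constants, **solution}
    # 2) Map the resulting set onto the model parameters, including the arrival rates.
    params["arrival_rates"] = workload
    # 3) Solve the model and read the objective functions for every microservice,
    #    e.g., {"A": {"TP": ..., "MRT": ..., "DP": ...}, ...}.
    metrics = solve_spn_model(params)
    # 4) Build the fitness vector: maximize TP (negated so everything is minimized),
    #    minimize MRT and DP for each MS.
    fitness = []
    for ms, m in metrics.items():
        fitness.extend([-m["TP"], m["MRT"], m["DP"]])
    return fitness
```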


5 Model validation and case studies

This section describes the experiments performed to validate the analytical
framework and two case studies to illustrate its applicability.


5.1 Model validation

Two validation scenarios were considered, the MBCI without a pool of elastic
VMs and with a pool. In the first scenario, VMs and containers are instantiated or
destroyed each time a scaling threshold is reached. In the second scenario, nine elas-
tic VMs are already instantiated and only containers are started or stopped. A 95%
CI was used to verify that the framework accurately represents the real MSs.
We used JMeter6 to generate requests against the MBCI following a Poisson process.
We were careful to choose a workload that exceeds the limits of reserved VMs so
that elastic VMs are allocated and deallocated. Both the MBCI and the analytical
framework were given the same input parameters. Four MSs were considered, each
performing a specific type of processing:

1. A: computes the sum of prime numbers from one up to 100,000 for each request;
2. B: performs user authentication and then generates HTML output based on the result;
3. C: queries a MongoDB instance with 2,000 records to get a fixed number of
randomly selected records for each query, and returns the found results;
4. D: queries a local database of 2,000 faces to perform face recognition, using
OpenCV7 as an auxiliary library to process the faces.
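
For reference, the request handler of microservice A boils down to a CPU-bound computation such as the one sketched below; this is an illustrative reconstruction, not the code deployed in the testbed.

```python
# Illustrative sketch of microservice A's CPU-bound work:
# the sum of all prime numbers from 1 up to 100,000, computed per request.
def sum_of_primes(limit: int = 100_000) -> int:
    sieve = bytearray([1]) * (limit + 1)      # sieve of Eratosthenes
    sieve[0:2] = b"\x00\x00"                  # 0 and 1 are not prime
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i:limit + 1:i] = bytearray(len(range(i * i, limit + 1, i)))
    return sum(i for i, is_prime in enumerate(sieve) if is_prime)

if __name__ == "__main__":
    print(sum_of_primes())   # the value returned to the client for each request
```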

We developed a front-end system in Java that played the role of an API gateway, a
LB, and a CW. It forwarded requests from outside the MBCI to the target MS, moni-
tored the MSs, performed scaling requests, and communicated with the container
orchestrator via REST APIs. We used Docker as the container virtualization engine
and Docker Swarm to manage the container cluster. However, other orchestrators
could be used, such as Kubernetes [53]. A VM configured in master mode was used
to run both Docker Swarm and the front-end system. All other VMs were configured
for worker mode.
The infrastructure used to run the MSs consisted of five servers, each with an Intel
Core i7-3770 3.4 GHz quad-core CPU, 4 GB of DDR3 RAM, and a 500 GB SATA HD.
A switch with 16 Gigabit Ethernet ports and a maximum switching capacity of 32 Gbps
connected all servers. Four servers used KVM hypervisors and were managed by
the CloudStack private cloud environment installed on the fifth server. One server
was allocated to run four reserved VMs, one VM for each MS. On the other hand,
three pCPU cores were reserved for elastic VM virtualization on each of the other
nodes running the KVM hypervisor, resulting in a total of nine pCPU cores for elas-
tic VM virtualization (i.e., #PCORES = 9 tokens).
Table 7 shows the parameters for the validation scenarios. Considering that each
VM is pinned to one core and Ψn × 𝛼n = 1 for all MSs, up to 9 elastic VMs can be
used. An exponentially distributed time of Ξ = 10 s was considered for the CW. The
analytical framework requires two other parameters that must be collected through
experiments, the probability distributions related to service and instantiation times.

6  https://jmeter.apache.org/.
7  http://opencv.org/.


The first step was to profile each MS in a PM. Since a maximum of 3 VMs can
run in a PM for a MS, we separately profiled 1 to 3 VMs running in parallel on an
isolated PM. We configured JMeter to run 30 times for each coVM scenario. Each
run ended when the total number of requests processed by each VM was equal to
or greater than 300. For each VM, 9000 requests were considered in each scenario.
The resource consumption and service time for each MS in each number of evalu-
ated VMs were profiled. Afterward, we generated the Q-matrix of the scaling engine. The
resource consumption of the VMs, the number of reserved VMs, and the Q-matrix
serve as input for a VMP algorithm. As a VMP scheme, we considered the First Fit
algorithm, where the scheduler analyzes each PM sequentially and places the VM in
the first one with enough resources [16]. However, other algorithms could be used.
A total of 716 VM co-scheduling scenarios were found.
co-scheduling scenarios were mapped to PMs, resulting in a set of 35 distinct coVM
scenarios. The mapping performed by the VMP algorithm was adopted by the cloud
scheduler. For the scenario with a pool of VMs, we consider the same scheme cre-
ated without the pool. The difference is that with the pool, only the containers in
the respective VMs are instantiated. We evaluated a small set of randomly selected
coVM scenarios. We profiled all MSs that compose a given scenario in a PM at the
same time. We used JMeter to generate requests so that all containers in all VMs
were busy processing requests, and collected a statistical sample of service times
and resource consumption.
The next step was to build the learned model and then use it to create a per-
formance database. Our profiling phase resulted in a dataset of a few hundred
data points, of which 80% was used for training, 10% for testing, and the
remaining 10% for validation.
for predicting service time for each MS. We used the same accuracy measure as con-
sidered in other studies, namely the coefficient of determination ( R2 ) [67]. Table 9
shows the minimum and maximum average service time considering all coVM sce-
narios, and Fig. 9 shows the variation on service time for all MSs in all coVM sce-
narios. All service times are exponentially distributed.
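
A minimal sketch of this model-building step is shown below, assuming the profiled coVM scenarios have already been flattened into a feature matrix (the concatenation of ∇^pm, ∇^c, ∇^st, and ∇^u) and a target matrix with one service time per MS. It uses scikit-learn's RandomForestRegressor and the R² score; the array contents here are placeholders, not the actual profiling data.

```python
# Minimal sketch: training the RFR-based predictor and checking R^2 (as in Table 8).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# X: one row per profiled coVM scenario, columns = [nabla_pm | nabla_c | nabla_st | nabla_u]
# y: one column per MS with the measured mean service time in that scenario
X = np.random.rand(300, 16)   # placeholder features (4 MSs x 4 feature groups)
y = np.random.rand(300, 4)    # placeholder service times for MSs A-D

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

pred = model.predict(X_test)
for k, ms in enumerate("ABCD"):
    print(f"service_time_for_{ms}: R^2 = {r2_score(y_test[:, k], pred[:, k]):.3f}")

# The trained model is then queried for every coVM scenario that was not profiled,
# and the predicted service times are stored in the performance database (Step 6).
```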
The next step was to collect the probability distributions of VM and container
instantiation times to validate the model without a pool of VMs. The step size for
each MS was one, so we instantiated only one VM in each SOR. We generated one
hundred SORs for each MS, by calling the EC2 API provided by the private IaaS
cloud. After capturing the instantiation time of the current request, the VM was
destroyed and another SOR was executed. We applied the Kolmogorov–Smirnov
(KS) and Anderson–Darling [68] goodness-of-fit tests with 95% confidence to
assess the best fit, and both show that the instantiation times follow the Erlang distri-
bution (see Table 10).
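
As an illustration of this fitting step, the sketch below fits an Erlang distribution (a gamma distribution with an integer shape parameter) to a sample of instantiation times with SciPy and checks the fit with a KS test. The sample is synthetic; it only mimics the kind of data listed in Table 10.

```python
# Illustrative sketch: fitting an Erlang distribution to instantiation times
# and applying a Kolmogorov-Smirnov goodness-of-fit test (95% confidence).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.gamma(shape=34, scale=38_440 / 34, size=100)   # synthetic times (ms)

shape, loc, scale = stats.gamma.fit(sample, floc=0)  # fit gamma with location fixed at 0
k = max(1, round(shape))                              # Erlang = gamma with integer shape
rate = k / sample.mean()                              # phase rate of the Erlang distribution

statistic, p_value = stats.kstest(sample, "gamma", args=(k, 0, 1 / rate))
print(f"phases = {k}, rate = {rate:.5f} 1/ms, KS p-value = {p_value:.3f}")
# A p-value above 0.05 means the Erlang hypothesis is not rejected at 95% confidence.
```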
Using the same steps as above, we collected the probability distributions of the
start-up times of the containers. These probability distributions are considered
when validating the model with a pool of elastic VMs (see Table 11).
Model validation consisted of two rounds of validation in each scenario considering
exponentially distributed inter-arrival rates, as shown in Table 12. λ_mbci^r1 and λ_mbci^r2
represent the rates at which requests arrive at the MBCI. Each rate is divided among
the MSs taking into account the percentage given in the last four columns of the
table. We configured JMeter to run 30 times in each round. Each run ended when the
total number of requests processed by each MS was equal to or greater than 300. For
each MS, 9000 requests were considered.

Table 7  Deployment scenario for validation

Parameters                 A     B     C     D
Buffer size (BSn)          100   100   100   100
Containers per VM (γn^c)   5     30    20    30
Reserved VMs (βn)          1     1     1     1
δn (Max)                   6     6     6     6
Ψn                         1     1     1     1
Step size (αn)             1     1     1     1
SOT (Πn^sot)               2     2     2     2
SIT (Πn^sit)               1     1     1     1

Table 8  Performance of the learned model for service times

Feature              Test accuracy   Train accuracy   Validation accuracy
service_time_for_A   98.962          97.011           97.094
service_time_for_B   99.611          98.266           98.837
service_time_for_C   99.810          99.152           98.516
service_time_for_D   99.655          99.034           99.223
The models were solved using numerical analysis. A statistical method called
Bootstrap was used by deriving a 95% CI to validate the proposed models [69].
Bootstrap is a resampling method that generates samples within a sample obtained
empirically. Since only a limited number of samples are extracted on the real hardware,
the Bootstrap method generates fast results. First, an empirical sample
T_t = (T_t1, T_t2, …, T_tn) is collected. The sample T_t is used to generate the bootstrap
sample T_t* = (T_t1*, T_t2*, …, T_tB*), where B is 1000. Afterward, the B replications are
ordered, such that θ̂*_(1) is the smallest replication and θ̂*_(B) is the largest one. Considering
a confidence degree of 1 − α, in which α is the significance level, the CI is
denoted by [θ̂_(Bα/2), θ̂_(B[1−α/2])]. For a better understanding, assuming B = 1,000 and
a confidence degree of 95% or 0.95 (α = 0.05), the CI is represented by [θ̂_(25), θ̂_(975)].
To validate an SPN model, the metric extracted from it should be within the boot-
strap CI. Figure 10 shows the TP and the usage of elastic VMs considering the two
rounds of validation when no VM pool is used. On the other hand, Fig. 11 shows the
TP and the usage of elastic VMs when the VM pool is used. The experiments have
shown that the measurements obtained from the analytical framework are within the
CIs of the real MSs. This demonstrates that the framework can represent the real
MSs with 95% confidence and can be used to support a myriad of evaluations.
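
The percentile bootstrap described above can be reproduced with a few lines of NumPy, as in the sketch below; the measurement sample is synthetic and the function is only illustrative.

```python
# Minimal sketch of the percentile bootstrap CI used in the validation (B = 1000).
import numpy as np

def bootstrap_ci(sample, stat=np.mean, B=1000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    replications = [stat(rng.choice(sample, size=len(sample), replace=True))
                    for _ in range(B)]
    # Percentile CI: with B = 1000 and alpha = 0.05 this is roughly [theta_(25), theta_(975)].
    return np.percentile(replications, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Synthetic example: 30 throughput measurements of one validation round.
measurements = np.random.default_rng(7).normal(loc=2.6, scale=0.2, size=30)
print(bootstrap_ci(measurements))   # the model's metric should fall inside this interval
```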


Table 9  Mean service time—minimum and maximum

MS   Mean service time–low (ms)   Mean service time–high (ms)
A    8101                         12,976
B    4892                         5921
C    7531                         9285
D    14,011                       18,451

Fig. 9  Service times for all coVM scenarios (service time in seconds per coVM scenario, for each MS and number of VMs)

5.2 Case study 1

This case study applies the analytical framework to examine the performance over-
head of concurrent microservices execution, considering the two deployment sce-
narios evaluated in the validation process. We also evaluated the impact of CW on
performance. This study considered the inter-arrival rate shown in Table 13.
Performance was affected for all MSs when they ran concurrently in both scenar-
ios as expected, as shown in Table 14. Overall, the average usage of elastic VMs per
MS decreased as a result of resource sharing. When using a VM pool and compar-
ing it with the first scenario, we found a significant improvement in TP, MRT, and
DR for all MSs. The MRT decrease for A, B, C, and D when run together was 37%,
25%, 32%, and 22%, respectively, when a VM pool was used, compared to the first
scenario. Similarly, the decrease in DR for A, B, C, and D was 74%, 65%, 80%,
and 67%, respectively, compared to the first scenario. MS A had the largest perfor-
mance degradation when the VM pool was not adopted: its TP decreased by nearly
23%, the usage of elastic VMs decreased by more than 50%, and the DP increased
by 160%. The average number of containers running for A dropped from 27 to 13.
The DP of A has decreased considerably in the second scenario, from 32% without
the VM pool to approximately 8.5%. However, the decrease in the usage of elastic
VMs by A is still high (42.9%), but the decrease in TP was about 7.4%. The TP of
A is 34.8% higher when the pool is used than when it is not. For a 24-hour period,
the use of a pool corresponds to 61,171 more requests processed in A. A SP could
increase the number of elastic VMs that microservice A can use or allocate another
reserved VM to it. However, given the limited processing resources of the MBCI,
this would affect the performance of the other MSs. The overhead of running MSs
together was lower when only containers were instantiated.

Table 10  Instantiation times for VMs and their containers

MS   VM type (containers)   Probability distribution   IT¹ (ms)   Phases   Rates
A    t2.small² (5)          Erlang                     38,440     34       1.11120
B    t2.small² (30)         Erlang                     41,209     44       0.93387
C    t2.small² (20)         Erlang                     41,950     33       1.25010
D    t2.small² (30)         Erlang                     43,102     45       0.94679

¹ Instantiation time   ² With 1 GB of RAM

Table 11  Start-up time for containers

MS   Number of containers   Probability distribution   ST¹ (ms)   Phases   Rates
A    5                      Erlang                     4163       37       0.1113
B    30                     Erlang                     5439       33       0.1626
C    20                     Erlang                     6298       45       0.1387
D    30                     Erlang                     6945       53       0.1288

¹ Start-up time

Table 12  Inter-arrival rates for the two validation rounds

λ_mbci^r1   λ_mbci^r2   A     B     C     D
20 reqs/s   25 reqs/s   10%   40%   25%   25%
Let us evaluate the performance for both scenarios considering the average of the
metrics of all microservices. Table 15 shows the average performance overhead con-
sidering the mean values of all microservices. We found that when not adopting a
VM pool, the MRT per MS increased a little over 3 s, the average utilization of elas-
tic VMs per MS dropped by more than 46%, and the average probability of a MS dis-
carding a request increased by about 80%, from 14.2% to 25.7%. For a 24-hour
period, the overhead on TP is about 58,838 unprocessed requests per MS. On the
other hand, we found that when adopting a VM pool the impact on the MRT per MS
was about 700 ms, the average utilization of elastic VMs per MS decreased signifi-
cantly, but the impact on the TP was lower, an increase of about 21.5% compared to

the first scenario. The average probability of a MS rejecting a request was less
than 8%. For a 24-hour period, the overhead on the TP is about 24,280 unprocessed
requests per MS, a value that is 58.7% lower than in the scenario that does not use
a VM pool. Table 16 shows a comparison of the average performance per MS considering
all MSs running concurrently when using a VM pool and when not using
a VM pool. Overall, we noticed a significant improvement in average performance
when using the VM pool. The MRT per MS decreased by almost 8 s, elastic VM
usage increased by about 13%, TP improved by 21.5%, and the DR decreased by
70%. The probability of a MS rejecting a request decreased by 71.5%. Figure 12
shows the average performance of the MBCI with and without the VM pool. This
overall performance gain is due to the minimal overhead of starting containers.

Fig. 10  Validation without a pool of elastic VMs: (a) throughput (reqs/s) and (b) elastic VM usage per microservice, infrastructure vs. model, for the 20 reqs/s and 25 reqs/s rounds

Fig. 11  Validation with a pool of elastic VMs: (a) throughput (reqs/s) and (b) elastic VM usage per microservice, infrastructure vs. model, for the 20 reqs/s and 25 reqs/s rounds

Table 13  Inter-arrival rate for the case study

λ_mbci      A     B     C     D
30 reqs/s   10%   40%   25%   25%
Let us now evaluate the throughput and DR provided by the deployment configu-
ration with a pool of VMs by varying the arrival rate at the MBCI from 0 to 120
requests per second (see Fig. 13). We found that throughput stabilizes at around 40
requests per second and all requests above this value are discarded.
Using CDFs, companies can calculate the likelihood that processing of a set of
requests will complete within a given time, taking into account the MBCI or indi-
vidual MSs. The model presented in Fig.  4 should be evolved. There should be a
new place called PR (processed requests) and a new output arc for each P n transition
connecting that transition to this new place. By performing a transient analysis, we
evaluate the probability that the number of tokens in PR is equal to the number of
expected requests to be processed ( P{#PR = N} ), which in this case is 1000. To this
end, we calculated the CDFs for the two scenarios. The probabilities were calculated
from t = 0 s to t = 800 s (see Fig. 14). According to Fig. 14, there is a clear gap between
the probabilities for scenario #1 and those for scenario #2. The
probability of finishing execution for #2 reaches 100% only after 700 s, while this
happens at about 560 s for #1. In #1, the probability of finishing processing at 200 s
is 77%, while #2 provides a 47% probability. When we consider the probability of
completion at t = 100 s, the probability for #1 is 47.5%, while #2 provides a 3%
probability. We see that the curve of #1 is steeper, which is due to the lower over-
head of launching only containers. Using a VM pool thus provides the highest prob-
abilities for faster completion of request processing.
Let us now examine the impact of CW on performance. We consider the auto-
scaling periodicity as a varying parameter and keep the same auto-scaling configura-
tion, arrival rate, and scenarios as in this case study. We vary the CW time from 10
to 100 s, with a step size of 10 s. Figure 15 shows the TP of the MBCI at each CW
time for both scenarios. For the deployment configurations evaluated, a lower CW
time results in better throughput performance for both scenarios. When we compare
TP with CW = 10 s to TP with CW = 100 s in each scenario, we find a decrease
of 21.8% and 30% for the scenario with a pool and without a pool, respectively.


Table 14  Performance overhead for each microservice in the two scenarios

                     MBCI without a pool of VMs            MBCI with a pool of VMs
MS  Metric           I¹      A²      A–I      %            I¹      A²      A–I      %
A   Containers       27      13      −14      −51.85       28      16      −12      −42.86
    Elastic VMs      4.339   1.537   −2.802   −64.58       4.631   2.180   −2.451   −52.93
    TP (reqs/s)      2.633   2.037   −0.596   −22.64       2.963   2.745   −0.218   −7.36
    DR (reqs/s)      0.367   0.963   0.596    162.40       0.037   0.255   0.218    589.19
    DP (%)           12.24   32.10   19.86    162.25       1.24    8.48    7.24     583.87
    MRT (ms)         35,849  44,989  9,140    25.50        26,315  28,422  2,107    8.01
B   Containers       64      59      −5       −7.81        57      52      −5       −8.77
    Elastic VMs      1.12    0.959   −0.161   −14.38       0.911   0.731   −0.180   −19.76
    TP (reqs/s)      10.427  10.009  −0.418   −4.01        11.572  11.309  −0.263   −2.27
    DR (reqs/s)      1.573   1.991   0.418    26.57        0.428   0.691   0.263    61.45
    DP (%)           13.11   16.59   3.48     26.54        3.57    5.76    2.19     61.34
    MRT (ms)         10,314  10,835  521      5.05         7,818   8,090   272      3.48
C   Containers       58      47      −11      −18.97       58      49      −9       −15.52
    Elastic VMs      1.903   1.361   −0.542   −28.48       1.92    1.428   −0.492   −25.63
    TP (reqs/s)      6.468   5.811   −0.657   −10.16       7.348   7.163   −0.185   −2.52
    DR (reqs/s)      1.032   1.689   0.657    63.66        0.152   0.337   0.185    121.71
    DP (%)           13.76   22.52   8.76     63.66        2.02    4.50    2.48     122.77
    MRT (ms)         17,019  18,965  1,946    11.43        12,541  12,973  432      3.44
D   Containers       94      68      −26      −27.66       99      73      −26      −26.26
    Elastic VMs      2.15    1.265   −0.885   −41.16       2.301   1.446   −0.855   −37.16
    TP (reqs/s)      6.166   5.113   −1.053   −17.08       7.159   6.701   −0.458   −6.40
    DR (reqs/s)      1.334   2.387   1.053    78.94        0.341   0.799   0.458    134.31
    DP (%)           17.78   31.83   14.05    79.02        4.55    10.65   6.10     134.07
    MRT (ms)         24,901  26,206  1,305    5.24         20,318  20,456  138      0.68

¹ Isolated – the microservice running in isolation   ² All – the four microservices running concurrently

Table 15  Average performance overhead per microservice in the two scenarios

                 MBCI without a pool of VMs            MBCI with a pool of VMs
Metric           I¹      A²      A–I      %            I¹      A²      A–I      %
Containers       61      47      −14      −22.95       61      48      −13      −21.31
Elastic VMs      2.378   1.281   −1.098   −46.15       2.441   1.446   −0.995   −40.75
TP (reqs/s)      6.424   5.743   −0.681   −10.60       7.261   6.98    −0.281   −3.87
DR (reqs/s)      1.077   1.758   0.681    63.26        0.240   0.521   0.281    117.33
DP (%)           14.22   25.76   11.54    81.15        2.85    7.35    4.50     157.89
MRT (ms)         22,021  25,249  3,228    14.66        16,748  17,485  737      4.40

¹ Isolated   ² All


Table 16  Comparison of the average performance per MS in the two scenarios

Metric         NP¹      P²       P–NP     %
Elastic VMs    1.281    1.446    0.165    12.88
TP (reqs/s)    5.743    6.980    1.237    21.54
DR (reqs/s)    1.758    0.521    −1.237   −70.36
DP (%)         25.76    7.35     −18.41   −71.47
MRT (s)        25.249   17.485   −7.764   −30.75

¹ Without VM pool   ² With VM pool

Fig. 12  Average performance of the MBCI, without VM pool vs. with VM pool: elastic VMs 5.12 vs. 5.79, TP 22.97 vs. 27.92 reqs/s, DR 7.03 vs. 2.08 reqs/s

For both scenarios, this change in CW time corresponds to 527,571 and 596,643
unprocessed requests in a 24-hour period, respectively. Moreover, the decrease in
TP with increasing CW time tends to be more pronounced when the VM pool is not
used, unlike the other scenario. As we can see, the CW time has a stronger weight
on performance.
The overhead of instantiating VMs is one of the factors affecting performance. It
can be beneficial for the SP to maintain a VM pool and only instantiate containers
on an available VM when a threshold for scaling-out is reached. When the MBCI
adopts a VM pool, the allocation and deallocation of VMs happen quickly, avoiding
the growth in the number of requests in the queue, which degrades performance.
However, other factors need to be considered, such as the impact on power con-
sumption if a set of elastic VMs is always instantiated even when not in use. It is
important to note that this analysis is for a single deployment configuration. Each
configuration impacts both the performance of the MBCI as a whole and the perfor-
mance of individual MSs. Therefore, it is beneficial for the SP to carefully consider
this trade-off for each configuration.

5.3 Case study 2

The analytical framework was used to evaluate the performance of different param-
eter settings considering a pool of instantiated elastic VMs. In this study, we inves-
tigate the deployment settings for deploying the same MSs on the same MBCI used
for the validation process, with the MSs now subject to performance constraints as

shown in Table 17 for the workload presented in Table 13. The trigger time for the
CW (Ξ) is 10 s. We employed the optimization algorithm to explore a space of feasible
solutions and find solutions that satisfy the constraints imposed by an SLA
by determining the parameter settings to apply in production based on the range of
values that each parameter can take (see Table 18). It is important to point out that
the SP must consider the constraints and workloads for their MSs when using our
solution.

Fig. 13  Impact of λ_mbci on MBCI's TP and DR when using a pool of VMs

Fig. 14  CDFs for scenarios #1 and #2
Since NSGA-II is a MOO algorithm, it may find not only one solution that meets
the performance constraints but several. At the end of the search process, for each
solution found that satisfies the constraints, the algorithm shows the values to be
assigned to the MBCI parameters. After 10,000 solution evaluations, the algorithm
found two solutions that satisfy all constraints (see Table 19).


Fig. 15  Influence of the CW time on the MBCI throughput

Table 17  Performance constraints defined in an SLA

SLA   Metric                        A    B    C    D
#1    Minimum throughput (reqs/s)   2    10   7    7
      Maximum response time (s)     30   12   15   20
      Maximum discard (%)           5    5    5    5

Table 18  Parameter ranges for the optimization algorithm

Parameters                 A         B         C         D
Buffer size (BSn)          100–200   100–200   100–200   100–200
Containers per VM (γn^c)   5         30        20        30
Reserved VMs (βn)          1–2       1–2       1–2       1–2
Elastic VMs (δn) (Max)     5–9       5–9       5–9       5–9
Step size (αn)             1–2       1–2       1–2       1–2
SOT (Πn^sot)               1–4       1–4       1–4       1–4
SIT (Πn^sit)               1–3       1–3       1–3       1–3

Table 20 shows the measurements that result when the solutions found are applied
in production. Assuming that the MBCI throughput is the sum of the throughputs of
the MSs, the difference between the TP of solutions #1 and #2 considering a
24-hour period is 3974 more requests processed when #1 is applied, an
increase of about 0.16%. When we evaluate the DR of the two solutions, we can see that
the overall DR of the solution #1 is 3.93% lower than that of #2. It is therefore up to
the SP to decide which of the two solutions should actually be applied.


Table 19  Solutions found by the optimization algorithm

Solution   MS   BSn   βn   δn   αn   Πn^sot   Πn^sit
#1         A    180   2    2    1    3        2
           B    120   1    3    1    2        1
           C    100   2    1    1    2        1
           D    140   3    5    1    3        1
#2         A    160   1    3    2    3        2
           B    130   1    4    1    3        1
           C    110   1    2    2    3        2
           D    150   3    6    1    4        2

Table 20  Measurements obtained in production

Solution   Metric         A       B        C       D
#1         Elastic VMs    1.136   0.664    0.418   0
           TP (reqs/s)    2.940   11.546   7.224   7.192
           DR (reqs/s)    0.060   0.454    0.276   0.308
           DP (%)         2.04    3.93     3.82    4.28
           MRT (s)        29.03   10.52    14.01   18.29
#2         Elastic VMs    2.259   0.704    1.114   0.532
           TP (reqs/s)    2.898   11.543   7.146   7.269
           DR (reqs/s)    0.101   0.457    0.354   0.231
           DP (%)         3.49    3.96     4.97    3.18
           MRT (s)        29.98   10.63    14.77   17.63

6 Conclusion

This work proposes an analytical framework to predict the behavior of auto-scaling
mechanisms in a microservices-based cloud infrastructure when multiple microser-
vices are deployed together. Modeling these infrastructures and their auto-scaling
mechanisms in a private cloud using SPNs, RFR, and NSGA-II enables the identifi-
cation of critical trade-offs between performance and resource consumption consid-
ering all deployed microservices. This work also proposes a methodology to build a
ML-based performance model to predict service times considering different VM co-
scheduling scenarios generated by a consolidation technique. The proposed frame-
work has been validated with 95% confidence, and we have applied it in two case
studies to demonstrate its feasibility. Using the proposed framework enabled us to
observe that performance overhead was lower when using a pool of VMs. The MRT
per microservice decreased by almost 8 s, the utilization of elastic VMs increased by
about 13%, throughput improved by 21.5%, and discard decreased by 70%. We also
found that throughput decreased by 21.8% in the scenario with a pool of VMs and
by 30% without a pool, simply by changing the time between scaling decisions from
10 s to 100 s. The synergistic combination of stochastic modeling, a ML model, and


a MOO algorithm has been shown to accurately predict the performance of MSs
deployed on private infrastructure. The use of the framework can help service pro-
viders find optimized solutions that support both infrastructure planning and online
performance prediction, and enable trade-off analysis considering different scenarios
and constraints. In the future, we plan to adapt and evaluate the proposed approach
by considering more sophisticated consolidation techniques that represent only one
virtualization layer, where containers run directly on the physical host OS and mul-
tiple physical machines are used. In addition, the performance of other MOO tech-
niques can be assessed when the stochastic model is explored to find good solutions
for different scenarios and constraints.

Data availability  Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Declarations 
Conflict of interest  The authors have no conflicts of interest to declare that are relevant to the content of this article.

References
1. Fowler M, Lewis J (2014) Microservices. https://martinfowler.com/articles/microservices.html
2. Indrasiri K (2018) Microservices for the enterprise : designing, developing, and deploying. Apress,
New York
3. Newman S (2015) Building microservices: designing fine-grained systems. O’Reilly Media, Sebas-
topol, CA
4. Villamizar M, Garcés O, Castro H, Verano M, Salamanca L, Casallas R, Gil S (2015) Evaluating
the monolithic and the microservice architecture pattern to deploy web applications in the cloud, In:
2015 10th Computing Colombian Conference (10CCC), pp 583–590
5. Linthicum DS (2017) Connecting fog and cloud computing. IEEE Cloud Comput 4(2):18–20
6. Barve Y, Shekhar S, Chhokra A, Khare S, Bhattacharjee A, Sun H, Gokhale A, Kang z (2019)
Fecbench: A holistic interference-aware approach for application performance modeling, 06
7. Dhillon JS, Purini S, Kashyap S (2013) Virtual machine coscheduling: A game theoretic approach,
In: 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing, pp 227–234
8. Ismail BI, Jagadisan D, Khalid MF (2011) Determining overhead, variance & isolation metrics in
virtualization for iaas cloud, In: Lin SC, Yen E, (Eds) Data driven e-science, Springer, New York,
pp 315–330
9. da Silva Pinheiro TF, Silva FA, Fé I, Kosta S, Maciel P (2018) Performance prediction for sup-
porting mobile applications’ offloading. J Supercomput 74(8):4060–4103. https://​doi.​org/​10.​1007/​
s11227-​018-​2414-6
10. Pereira P, Araujo J, Torquato M, Dantas J, Melo C, Maciel P (2020) Stochastic performance model
for web server capacity planning in fog computing. J Supercomput 76(12):9533–9557. https://​doi.​
org/​10.​1007/​s11227-​020-​03218-w ([Online])
11. Pereira P, Araujo J, Melo C, Santos V, Maciel P (2021) Analytical models for availability evaluation
of edge and fog computing nodes. J Supercomput 77(9):9905–9933. https://​doi.​org/​10.​1007/​s11227-​
021-​03672-0 ([Online])
12. Pereira P, Melo C, Araujo J, Dantas J, Santos V, Maciel P (2021) Availability model for edge-fog-
cloud continuum: an evaluation of an end-to-end infrastructure of intelligent traffic management
service. J Supercomput 78(3):4421–4448. https://​doi.​org/​10.​1007/​s11227-​021-​04033-7 ([Online])
13. Clemente D, Pereira P, Dantas J, Maciel P (2022) Availability evaluation of system service hosted in
private cloud computing through hierarchical modeling process. J Supercomput 78(7):9985–10024.
https://​doi.​org/​10.​1007/​s11227-​021-​04217-1 ([Online])


14. Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algo-
rithm: Nsga-ii. IEEE Trans Evol Comput 6(2):182–197
15. Mitchell M (1998) An introduction to genetic algorithms. Complex Adaptive Systems. Bradford
Books, Cambridge, MA
16. Mills K, Filliben J, Dabrowski C (2011) Comparing vm-placement algorithms for on-demand
clouds, In: IEEE Third International Conference on Cloud Computing Technology and Science, pp
91–98
17. Hu Y, Laat C, Zhao Z (2019) Optimizing service placement for microservice architecture in clouds.
Appl Sci 9:4663
18. Breiman L (2001) Random forests. Mach Learn 45(1):5–32. https://​doi.​org/​10.​1023/a:​10109​33404​
324 ([Online])
19. Bao L, Wu C, Bu X, Ren N, Shen M (2019) Performance modeling and workflow scheduling of
microservice-based applications in clouds. IEEE Trans Parallel Distrib Syst 30(9):2114–2129
20. Khazaei H, Barna C, Beigi-Mohammadi N, Litoiu M (2016) Efficiency analysis of provisioning
microservices. In: IEEE International Conference on Cloud Computing Technology and Science
(CloudCom), pp 261–268
21. Gribaudo M, Iacono M, Manini D (2018) Performance evaluation of replication policies in micros-
ervice based architectures, In: Proceedings of the Ninth International Workshop on the Practical
Application of Stochastic Modelling (PASM), Electronic Notes in Theoretical Computer Science,
vol. 337, pp. 45–65. [Online]. Available: https://​www.​scien​cedir​ect.​com/​scien​ce/​artic​le/​pii/​S1571​
06611​83003​79
22. Li Q, Li B, Mercati P, Illikkal R, Tai C, Kishinevsky M, Kozyrakis C (2021) Rambo: Resource allo-
cation for microservices using Bayesian optimization. IEEE Comput Archit Lett 20(1):46–49
23. Merkouche S, Bouanaka C (2022) A proactive formal approach for microservice-based applications
auto-scaling, In: Belala F, Benchikha F, Boufaïda Z, Smaali S (eds) Proceedings of The 11th Semi-
nary of Computer Science Research at Feminine (RIF 2022) LIRE laboratory, constantine 2 Univer-
sity- Abdelhamid Mehri, Constantine, Algeria, March 10, 2022, ser. CEUR Workshop Proceedings,
vol. 3176. CEUR-WS.org, pp. 15–28. [Online]. Available: http://​ceur-​ws.​org/​Vol-​3176/​paper2.​pdf
24. El Kafhali S, El Mir I, Salah K, Hanini M (2020) Dynamic scalability model for containerized cloud
services. Arabian J Sci Eng 45:10693–10708
25. Molloy MK (1982) Performance analysis using stochastic petri nets. IEEE Trans Comput 31:913–917
26. Maciel PRM (2022) Performance, reliability, and availability evaluation of computational systems,
volume I: performance and background. Taylor & Francis, Chapman and Hall/CRC. [Online]. Available:
https://www.routledge.com/Performance-Reliability-and-Availability-Evaluation-of-Computational/Maciel/p/book/9781032295374
27. Salah T, Zemerly MJ, Yeun CY, Al-Qutayri M, Al-Hammadi Y (2017) Performance compari-
son between container-based and vm-based services, In: 2017 20th Conference on Innovations in
Clouds, Internet and Networks (ICIN), pp 185–190
28. Ueda T, Nakaike T, Ohara M (2016) Workload characterization for microservices. In: International
Symposium on Workload Characterization (IISWC), pp 1–10, IEEE
29. Joy AM (2015) Performance comparison between linux containers and virtual machines. In: Inter-
national Conference on Advances in Computer Engineering and Applications, pp. 342–346
30. Felter W, Ferreira A, Rajamony R, Rubio J (2015) An updated performance comparison of virtual
machines and linux containers. In: IEEE International Symposium on Performance Analysis of Sys-
tems and Software (ISPASS) pp 171–172
31. Singh V, Peddoju SK (2017) Container-based microservice architecture for cloud applications,
In: 2017 International Conference on Computing, Communication and Automation (ICCCA), pp
847–852
32. Villamizar M, Garcés O, Ochoa L, Castro H, Salamanca L, Verano M, Casallas R, Gil S, Valencia
C, Zambrano A, Lang M, (2016) Infrastructure cost comparison of running web applications in the
cloud using aws lambda and monolithic and microservice architectures, In: 2016 16th IEEE/ACM
International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp 179–182
33. Lin M, Xi J, Bai W, Wu J (2019) Ant colony algorithm for multi-objective optimization of con-
tainer-based microservice scheduling in cloud, IEEE Access, vol. 7, pp 83088–83100
34. Jindal A, Podolskiy V, Gerndt M (2019) Performance modeling for cloud microservice applications.
In: Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, ser.


ICPE ’19. Association for Computing Machinery, New York, NY, USA, pp 25-32. [Online]. Avail-
able: https://​doi.​org/​10.​1145/​32976​63.​33103​09
35. Auer F, Lenarduzzi V, Felderer M, Taibi D (2021) From monolithic systems to microservices: An
assessment framework, Inf Softw Technol, 137:106600. [Online]. Available: https://​www.​scien​cedir​
ect.​com/​scien​ce/​artic​le/​pii/​S0950​58492​10007​93
36. Bauer A, Lesch V, Versluis L, Ilyushkin A, Herbst N, Kounev S (2019) Chamulteon: coordinated
auto-scaling of micro-services, 07
37. Khazaei H, Ravichandiran R, Park B, Bannazadeh H, Tizghadam A, Leon-Garcia A (2017) Elascale:
autoscaling and monitoring as a service, In: Proceedings of the 27th Annual International Confer-
ence on Computer Science and Software Engineering, ser. CASCON ’17. USA: IBM Corporation,
pp 234–240
38. Nitto ED, Florio L, Tamburri DA (2020) Autonomic decentralized microservices: the Gru approach
and its evaluation. Springer International Publishing, Cham, pp 209–248. https://​doi.​org/​10.​1007/​
978-3-​030-​31646-4_9 ([Online])
39. Mao Y, Oak J, Pompili A, Beer D, Han T, Hu P (2017) Draps: Dynamic and resource-aware place-
ment scheme for docker containers in a heterogeneous cluster, In: 2017 IEEE 36th International
Performance Computing and Communications Conference (IPCCC), pp 1–8
40. Rossi F, Cardellini V, Presti FL (2020) Hierarchical scaling of microservices in kubernetes. In: IEEE
International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS), pp 28–37
41. Horovitz S, Arian Y (2018) Efficient cloud auto-scaling with sla objective using q-learning. In: 2018
IEEE 6th International Conference on Future Internet of Things and Cloud (FiCloud), pp 85–92
42. Rossi F, Nardelli M, Cardellini V (2019) Horizontal and vertical scaling of container-based applica-
tions using reinforcement learning. In: 2019 IEEE 12th International Conference on Cloud Comput-
ing (CLOUD), pp 329–338
43. Saboor A, Hassan MF, Akbar R, Shah SNM, Hassan F, Magsi SA, Siddiqui MA (2022) Containerized
microservices orchestration and provisioning in cloud computing: A conceptual framework and future
perspectives, Appl Sci, 12(12). [Online]. Available: https://​www.​mdpi.​com/​2076-​3417/​12/​12/​5793
44. Zhang R, Zhong A-M, Dong B, Tian F, Li R (2018) Container-vm-pm architecture: A novel archi-
tecture for docker container placement. In: Luo M, Zhang L-J (eds) Cloud computing-CLOUD
2018. Springer International Publishing, Cham, pp 128–140
45. Hussein MK, Mousa MH, Alqarni MA (2019) A placement architecture for a container as a service
(caas) in a cloud environment, J Cloud Comput, 8(1). [Online]. Available: https://​doi.​org/​10.​1186/​
s13677-​019-​0131-1
46. Khan AA, Zakarya M, Khan R, Rahman IU, Khan M, ur Rehman Khan A (2020) An energy, perfor-
mance efficient resource consolidation scheme for heterogeneous cloud datacenters, J Netw Comput
Appl, 150:102497. [Online]. Available: https://​www.​scien​cedir​ect.​com/​scien​ce/​artic​le/​pii/​S1084​
80451​93035​71
47. Murata T (1989) Petri nets: properties, analysis and applications. Proc IEEE 77(4):541–580
48. Marsan MA, Conte G, Balbo G (1987) A class of generalized stochastic petri nets for the perfor-
mance evaluation of multiprocessor systems. ACM Trans Comput Syst 2(2):93–122. https://​doi.​org/​
10.​1145/​190.​191 ([Online])
49. German R (2000) Performance analysis of communication systems: modeling with non-Markovian
stochastic petri nets. John Wiley & Sons, New York
50. Trivedi KS (2001) Probability and statistics with reliability, queuing, and computer science applica-
tions. John Wiley and Sons, New York
51. Maciel PRM (2022) Performance, reliability, and availability evaluation of computational systems,
Volume 2: Reliability, availability modeling, measuring, and data analysis. Taylor & Francis, Chap-
man and Hall/CRC. [Online]. Available: https://​www.​routl​edge.​com/​Perfo​rmance-​Relia​bility-​and-​
Avail​abili​ty-​Evalu​ation-​of-​Compu​tatio​nal/​Maciel/​p/​book/​97810​32306​407
52. Mitchell TM (1997) Machine learning. McGraw-Hill, New York
53. Bernstein D (2014) Containers and cloud: From lxc to docker to kubernetes. IEEE Cloud Comput
1(3):81–84
54. Merkel D (2014) Docker: Lightweight linux containers for consistent development and deployment,
Linux J, 2014(239)
55. Peng J, Zhang X, Lei Z, Zhang B, Zhang W, Li Q (2009) Comparison of several cloud comput-
ing platforms, In: 2009 Second International Symposium on Information Science and Engineering.
IEEE, pp 23–27


56. Pinheiro TFS, Oliveira D, Matos R, Silva B, Pereira P, Melo C, Oliveira F, Tavares E, Dantas J, Maciel
P (2021) The mercury environment: a modeling tool for performance and dependability evaluation, 06
57. Zakarya M, Gillam L (2017) An energy aware cost recovery approach for virtual machine migra-
tion. In: Bañares JÁ, Tserpes K, Altmann J (eds) Economics of grids, clouds, systems, and services.
Springer International Publishing, Cham, pp 175–190
58. Lopez-Pires F, Baran B (2015) Virtual machine placement literature review,. [Online]. Available:
arXiv:​1506.​01509
59. Valderas P, Torres V, Pelechano V (2020) A microservice composition approach based on the chore-
ography of bpmn fragments, Inf Softw Technol, 127:106370. [Online]. Available: https://​www.​scien​
cedir​ect.​com/​scien​ce/​artic​le/​pii/​S0950​58492​03013​97
60. Jain R (1990) The art of computer systems performance analysis: techniques for experimental
design, measurement, simulation, and modeling. John Wiley & Sons, New York
61. Galante G, d. Bona LCE (2012) A survey on cloud computing elasticity, In: Proceedings of the
2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing, ser. UCC ’12.
Washington, DC, USA: IEEE Computer Society, pp 263–270. [Online]. Available: https://​doi.​org/​
10.​1109/​UCC.​2012.​30
62. Iqbal W, Dailey M, Carrera D (2009) SLA-driven adaptive resource management for web applica-
tions on a heterogeneous compute cloud. Springer Verlag, pp 243–253. [Online]. Available: http://​
hdl.​handle.​net/​2117/​15867
63. Iqbal W, Erradi A, Abdullah M, Mahmood A (2022) Predictive auto-scaling of multi-tier applica-
tions using performance varying cloud resources. IEEE Trans Cloud Comput 10(1):595–607
64. Qu C, Calheiros RN, Buyya R (2018) Auto-scaling web applications in clouds: A taxonomy and
survey, ACM Comput Surv, 51(4). [Online]. Available: https://​doi.​org/​10.​1145/​31481​49
65. Gotin M, Lösch F, Heinrich R, Reussner R (2018) Investigating performance metrics for scaling
microservices in cloudiot-environments, In: Proceedings of the 2018 ACM/SPEC International
Conference on Performance Engineering, ser. ICPE ’18. Association for Computing Machinery,
New York, NY, USA, pp 157–167. [Online]. Available: https://​doi.​org/​10.​1145/​31844​07.​31844​30
66. Tang B (1993) Orthogonal array-based Latin hypercubes. J Am Stat Assoc 88(424):1392–1397
67. Mishra N, Lafferty JD, Hoffmann H (2017) Esp: A machine learning approach to predicting applica-
tion interference. In: IEEE International Conference on Autonomic Computing (ICAC), pp 125–134
68. Karson M (1968) Handbook of methods of applied statistics. Volume I: Techniques of computation,
descriptive methods, and statistical inference. Volume II: Planning of surveys and experiments. I. M.
Chakravarti, R. G. Laha, and J. Roy, New York, John Wiley, 1967. J Am Stat Assoc 63(323):1047–1049
69. Efron B, Tibshirani R (1993) An introduction to the bootstrap. Chapman and Hall, London

Publisher’s Note  Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and
applicable law.

Authors and Affiliations

Thiago Felipe da Silva Pinheiro1   · Paulo Pereira1 · Bruno Silva2 · Paulo Maciel1

* Thiago Felipe da Silva Pinheiro
  [email protected]; [email protected]

Bruno Silva
  [email protected]

1  Centro de Informática (CIn), Federal University of Pernambuco, Road Jorn. Aníbal Fernandes ‑ Cidade Universitária, Recife, PE 50740‑560, Brazil
2  Research, Microsoft, Redmond, Washington, USA
