2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid)

Benchmarking Serverless Workloads on Kubernetes

Hima Govind and Horacio González-Vélez
Cloud Competency Centre, National College of Ireland, Dublin, Ireland.
Email: [email protected] [email protected]

Abstract—As a disruptive paradigm in the cloud landscape, Serverless Computing is attracting attention because of its unique value propositions to reduce operating costs and outsource infrastructure management. Nevertheless, enterprise Function-as-a-Service (FaaS) platforms may pose significant risks such as vendor lock-in, lack of security control due to multi-tenancy, complicated pricing models, and legal and regulatory compliance, particularly in mobile computing scenarios. This work proposes a production-grade fault-tolerant serverless architecture based on a highly-available Kubernetes topology using an open-source framework, deployed on OpenStack instances, and benchmarked with a realistic scaled-down Azure workload traces dataset. By measuring success rate, throughput, latency, and auto-scalability, we have managed to assess not only resilience but also sustained performance under a logistic model for three distinct representative workloads. Our test executions show, with 95% confidence, that between 70 and 90 concurrent users can access the system while experiencing acceptable performance. Beyond the breaking point identified (i.e. 91 transactions per second), the Kubernetes cluster has to be scaled up or scaled out to meet the QoS and availability requirements.

Index Terms—Serverless, OpenFaas, High Availability, Workload modelling, Service Level Agreement, SLA, Mobile Computing, Azure, Containers

I. INTRODUCTION

Serverless computing is a recent cloud service model which allows stateless, short-duration, and event-driven functions to be executed on abstracted containers with granular scaling at cloud datacenters. Serverless computing is commonly divided into Backend-as-a-Service (BaaS) and Function-as-a-Service (FaaS).

While BaaS focuses on traditional server-based components such as databases, authentication, storage, and hosting, FaaS allows developers to write the business logic in the form of stateless functions, relieving programmers from operational aspects of the underlying infrastructure. This paper will therefore focus on the FaaS side.

Widely popular because of its pay-per-use billing strategy, FaaS enables a "serverless function", typically a code snippet, to be executed on demand on an operating-system container in response to triggered events. Open-source FaaS frameworks are vendor-agnostic and provide developers with the flexibility of developing applications in multiple programming languages, effectively decoupling the underlying cloud platform from the business logic.

The main motivations behind serverless adoption are cost savings, reduced management overhead, and faster time to market. Gartner, a leading global research firm, forecasts that approximately half of global enterprises will embrace serverless by 2025, up from 20% today [1]. These insights are echoed by Forrester [2] for the COVID-19 pandemic recovery period, i.e. by the end of 2021. Moreover, by using private/dedicated cloud infrastructures, different organisations can arguably fulfill distinct regulatory and compliance requirements and offer enhanced security control.

The key contributions of this research are the following:
• Develop a fault-tolerant and highly available serverless architecture, and verify the feasibility of open-source FaaS frameworks on a private cloud.
• Benchmark serverless functions by developing a realistic workload model exploiting the workload characterisation features of the Azure production dataset.
• Perform data modelling by regression analysis to understand the functional relationship between concurrency and response times of various mobile workload patterns.

To evaluate our approach, we have deployed a master-worker high-availability Kubernetes infrastructure on a private OpenStack cloud infrastructure. Designed for clusters, Kubernetes supports multiple "pods" across different systems (physical or virtual machines) to allow seamless horizontal scaling for dynamic workloads. Pods have been deployed as interconnected Docker containers integrated with OpenFaas to explicitly underpin cluster interaction and monitoring.

We have benchmarked our approach using an open dataset of Azure Cloud traces, and our findings suggest that the auto-scalability of OpenFaas is seamless on OpenStack with the proposed architecture. With the increase in concurrency, response times increased and the success rate decreased, which is consistent with the overall serverless trend. Our work has shown that exploiting workload characterisation and implementing a scaled-down load leads to an appropriate workload model. Based on these results, we have modelled the response time data set. Our results, i.e. the logistic regression data model, showed that the number of concurrent users that can safely access the application within SLAs, with a 95% confidence interval, is within the range of 70-90 for a given cluster size. The data model can also be used to assess the performance of the serverless framework for any value of concurrency without actually running the benchmarks. Hence, the workload model can be adopted for continuous performance testing of real-world serverless applications to avoid performance regressions.

The rest of the paper is organised as follows. Section II discusses the related work relevant to this paper. Section III summarises the design details of the proposed HA architecture. Section IV describes the implementation details, and Section V shows a performance evaluation and data modelling of the results. Concluding remarks and future research directions are detailed in Sections VI and VII respectively.

Reference                    | FaaS offering | Cloud deployment       | Setup
[3], [4], [5], [6], [7], [8] | Enterprise    | Public cloud           | N/A
[9]                          | Open-source   | On-prem & Public cloud | N/A
[10]                         | Open-source   | On-prem                | Single-master Kubernetes cluster setup on VMs created by Virtualbox software
[11]                         | Open-source   | On-prem                | Single-master Kubernetes cluster on bare-metal servers and Raspberry PI devices
This work                    | Open-source   | Private cloud          | Multi-master Kubernetes HA cluster on OpenStack

Table I. Summary of literature review and contribution.

II. RELATED WORK

Within the cloud computing community, the term serverless is sometimes construed as misleading, as the actual code deployment still happens on containers within physical servers in a given datacenter. In fact, a well-documented issue in serverless computing [12] is the Cold Start problem, which relates to the time to bring up a new container instance when there are no warm containers available for the request. Bardsley et al. [3] perform an in-depth performance assessment on AWS Lambda and prove that their proposed warming strategy optimises its performance. Their results argue that languages such as .Net and Java exhibit the longest cold starts, whereas Python and Node.js have the shortest.

McGrath et al. [4] propose a .Net-oriented framework implemented on Microsoft Azure for increasing the performance of FaaS platforms, where function expiration times and cold start duration are evaluated. Jackson and Clynch [5] propose a new testing framework to analyse the cost and performance of commercial FaaS platforms, and their results indicate that .Net and Python have the highest performance with low cold start times on Azure Functions and AWS Lambda.

Malawski et al. [6], [7] conducted a detailed performance evaluation and cost comparison of serverless frameworks, taking their heterogeneity into account. Of specific relevance to this work, their novel cloud benchmarking framework based on HyperFlow scientific workflows established that cost is independent of the size of the functions executed for IBM functions and Lambda. However, for Azure and GCF, the results were uneven, with the smallest function having the highest cost-efficiency. Kumar et al. [8] evaluated the production serverless computing environments across the commercial cloud platforms.

Operating-system virtualization, or containers, are lightweight process-isolation abstractions running on the shared kernel of a given host with the guest operating system's libraries and binaries [13]. They represent a popular option for the cloud data center, as they are nimbler than full hardware-abstraction virtual machines. That is to say, since containers share a kernel within a host, they do not expose different attack surfaces through a host OS process as virtual machines do. In this work, we have used Kubernetes for orchestration on OpenStack Nova instances. Originally developed by NASA and Rackspace Hosting in 2010, OpenStack (https://www.openstack.org/) is currently managed by The OpenStack Foundation.

On the one hand, Kubernetes performance has also been the subject of a few performance evaluations [14], [15]. On the other hand, from a mobile/IoT perspective, Pinto et al. [9] proposed a distributed architecture using Fog computing by combining the architectural designs of edge and serverless computing. They summarised that OpenFaas functionality suits the constraints of IoT perfectly. Mohanty et al. [10] evaluated the performance of four open-source frameworks on a single-node Kubernetes cluster setup on VMs created by Virtualbox software, and their results indicate that Kubeless has the best performance compared to the others. Finally, Palade et al. [11] proposed a multi-layered edge framework, evaluated four open-source serverless frameworks, and found that Kubeless outperforms the other frameworks in terms of response time and throughput.

Although significant research has been published on various topics in serverless computing, as depicted in Table I, few studies have evaluated the feasibility and performance of cloud functions and, more importantly, they have employed proprietary serverless frameworks solely focused on pricing optimization and memory utilisation of functions. That is to say, while there have been holistic approaches to serverless performance evaluation [16], serverless benchmarking still remains an open problem.

The most relevant work for ours is the evaluation of open-source FaaS frameworks on resource-constrained edge networks [11], edge IoT devices, and IoT gateways with ARM architecture-based Raspberry Pi devices. Prior works did not consider load variations in every test run; rather, they evaluated the FaaS framework with a uniform workload at fixed concurrency and incremented the concurrency value in subsequent test scenarios. Though valuable, these metrics do not completely represent production FaaS performance, as fluctuating workloads with predictable/unpredictable arrival rates, not uniform mobile workloads, are best suited for serverless.

More importantly, the existing works have considered device-constrained infrastructure rather than scalable VMs or containers on data center servers. Such virtualisation approaches lead to overheads, and bare-metal servers suffer scalability issues despite significantly lower latency.

Some AWS EC2 instance types are also subject to automatic throttling because of their CPU-credits system. Hence, the prior test beds do not arguably represent a production-like setup, and the performance results reported cannot serve as a real baseline.

In contrast, we have modelled the workload for FaaS performance evaluation using the characterisation of real FaaS traces. We also study the performance of the framework under different load conditions for various load patterns. Finally, we perform data modelling on a response-time data set using statistical regression techniques. The confidence level, statistical significance, and the functional relationship between response time and concurrency are established, which will aid in effective business decisions and predictions. Besides, there is no empirical literature identified on serverless with private clouds. This study addresses the gaps in the existing literature by proposing a fault-tolerant serverless architecture on a private cloud and benchmarking it with fully open-source technologies such as Kubernetes, OpenFaas, JMeter, OpenStack, Grafana, Kubespray, and Python.

III. PROPOSED ARCHITECTURE

In this work, we have selected a High Availability (HA) topology in order to provide high performance, stability, and fault tolerance. We have chosen a minimum of three master nodes for redundancy based on Raft [17], a consensus algorithm which extends the seminal Castro-Liskov work [18], internally implemented via etcd. For a distributed cluster with N master nodes, Raft keeps the system operational with up to ⌊(N−1)/2⌋ failed members, since it requires a quorum of ⌊N/2⌋+1 master nodes. For a cluster to be highly available, with at least one node up and running at all times, a minimum of three master/worker/etcd nodes is required.
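To make the quorum arithmetic concrete, the short Python sketch below (an illustrative aid, not an artefact of the deployment) computes the quorum size and the number of tolerable failures for a given number of master nodes:

```python
def raft_tolerance(n_masters):
    """Quorum size and tolerable failures for a Raft cluster of n_masters."""
    quorum = n_masters // 2 + 1        # floor(N/2) + 1: majority needed to commit
    tolerable = (n_masters - 1) // 2   # floor((N-1)/2): members that may fail
    return quorum, tolerable

for n in (1, 3, 5):
    quorum, tolerable = raft_tolerance(n)
    print(f"{n} master(s): quorum = {quorum}, tolerates {tolerable} failure(s)")
# 1 master(s): quorum = 1, tolerates 0 failure(s)
# 3 master(s): quorum = 2, tolerates 1 failure(s)
# 5 master(s): quorum = 3, tolerates 2 failure(s)
```

Three masters is thus the smallest configuration that survives the loss of one member, which motivates our choice.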
A brief description of the architectural components is presented below.

A master node is responsible for the maintenance of the cluster. For all administrative activities in the cluster, the master node serves as the entry point. A worker node is a physical machine, device, rack server, or a VM which is capable of running Linux containers to provide a run-time environment. Worker nodes maintain pods and are controlled by the master node. Etcd is a distributed key-value store that is used to manage the cluster state. Config details, including secrets, subnets, and configmaps, are also stored in etcd. Each external etcd node in the cluster communicates with the master node via kube-apiserver.

OpenFaaS is an open-source serverless framework for building cloud functions, available on Github under the terms and conditions of the MIT license. OpenFaaS supports multiple runtimes such as C#, Go, Java, Python, Ruby, NodeJS, PHP, Dockerfile for ARMHF, and Dockerfile. The main architectural components are as follows. The OpenFaas Gateway is used to deploy and invoke functions; users can interact with the OpenFaas gateway through a UI. Prometheus monitors the environment, tracks cloud-native metrics, and reports them to the API gateway; OpenFaas ships with the Prometheus service integration by default. NATS is used for queuing and the asynchronous execution of functions. When a function is deployed, it creates multiple Pods depending on the scaling parameters set by the user. Functions can be scaled to zero and back again in OpenFaas by using faas-idler or the REST API.

The Function WatchDog converts Dockerfiles into serverless functions. It serves as an entry point for the HTTP requests by interacting with the processes and the caller, and acts as an "init process" supporting health checks, concurrent requests, and timeouts. OpenFaas supports two watchdog modes, namely the of-watchdog (HTTP) mode and the classic mode. HTTP mode is suited for resource-intensive or streaming operations; the of-watchdog keeps functions alive between invocations.
IV. EXPERIMENTAL SETUP

Our serverless architecture is composed of ten m1.xlarge Nova virtual machines on the NCI OpenStack private cloud (https://cloud.ncirl.ie). Configured with 8 vCPUs, 160 GB of storage, and 16 GB of RAM, each virtual machine runs Ubuntu 18.04 LTS. All the worker nodes have Docker v19.03 installed, and Kubernetes v1.19 is used for container management and orchestration, created using an Ansible-based Kubespray provisioner. Calico v3.16 is applied for container/pod networking in the multi-master Kubernetes cluster. JMeter 5.3, used as the HTTP trigger to the functions, is installed on the cluster as a Docker container. Ansible v2.9.6 and Jinja2 v2.11.1 are installed as prerequisites for running the Kubespray tool through the pip installer v20.2.4. The CPU-intensive function is implemented in Python v3.6.9. The OpenFaas serverless framework deployed on the testbed has the following core components installed by default: OpenFaas Gateway v0.18.18, Prometheus v19.03, Queue Worker v0.11.2, Basic-auth plugin v0.20.3, NATS streaming server v0.19, and Alert manager v0.16. Faas-cli v0.12.14 and a Grafana v4.6.3 dashboard are integrated with OpenFaas explicitly for cluster interaction and monitoring respectively.
We have employed an open dataset of Azure workload traces for performance evaluation, freely available on Github as AzurePublicDataset-2019 (https://github.com/Azure/AzurePublicDataset/blob/master/AzureFunctionsDataset2019.md). It contains 14 time series spanning July 15th-29th, each for 24 hours, with July 20th, 21st, 27th, and 28th being weekends and the rest weekdays. There are three files available: i) Function Invocation Counts; ii) Function Execution Duration; and iii) Application Memory. The salient features of the workload characterisation are [19]:

1) On average, more than 50% of functions have an execution time of less than 1 second, and 96% of functions execute for less than 60 s.
2) 90% of the applications allocate less than 400 MB, and 50% of the applications consume a maximum of 170 MB.
3) 54% of the applications possess one function, and 95% of the applications have a maximum of 10 functions.
4) 81% of the applications have an average invocation rate of less than once per minute. Across the entire Azure platform, the total number of invocations followed weekly and diurnal patterns.
5) More than 64% of applications are invoked by an HTTP trigger, and 29% of the applications are triggered by timers.
6) It can also be observed from the graphs that the Inter-Arrival Time (IAT) distributions do not follow a Poisson model but have fairly predictable IATs, because the coefficient of variation (CV) is greater than one for a significantly larger portion of applications.

Figure 1. Proposed fault-tolerant serverless architecture. It uses a minimum of three master nodes for redundancy based on Raft, where master nodes serve as the entry point.

Figure 2. Test cases (.jmx files) generated by the JMeter Throughput Shaping Timer plugin for the Peak/Spiky workload.

Figure 3. Test cases (.jmx files) generated by the JMeter Throughput Shaping Timer plugin for the Flat workload.

Figure 4. Test cases (.jmx files) generated by the JMeter Throughput Shaping Timer plugin for the Growing workload.
The parameterisation of the simulation is as follows:

• Choice of function: A quicksort function, which sorts a random list of a thousand numbers in the range of [1, 10000], is used for simulating a CPU-intensive workload, as checked into the public Docker registry at hma2308/rand:latest (a sketch of such a handler appears earlier in this section).

• Throughput: To calculate the throughput or Transactions Per Second (TPS), we have considered the July 18th workload invocations per function (md.anon.d04), which contains the number of invocations of each function on a per-minute basis. The hash function mentioned on row#16082 has a maximum of 168943 transactions per minute. The peak load hour occurs precisely from column ASM to AUT for row#16082. Hence, Peak TPS = 168943/60 = 2815.7 for all the test scenarios in this paper. The workload simulated in this research is scaled down by a factor of 30, 40, 50, 10, 20, etc. to fit the test cluster configuration created on the NCI OpenStack private cloud (https://cloud.ncirl.ie).

• Trigger type: We have chosen HTTP triggers for invoking functions synchronously.

• Workload model: We have fixed the number of concurrent users or thread count to 1500 and the test execution duration to 25 minutes, with ramp-up and ramp-down times of 10 minutes each. The ramp-up and ramp-down times are defined as per the recommended best practices to determine predicted delays between the start of each concurrent user invoking the function. A Poisson distribution of the ramp-up time is purposely avoided and, therefore, a fixed-interval delay is chosen as per the workload characterisation results of the Azure workloads. The scaled-down TPS values are computed according to the explanation above for varying the concurrency. Time gaps between load variations are set to 1 minute for all the workload patterns. We have set the minimum function replicas to 1 in order to avoid cold starts, as their presence causes significant delays in performance evaluation. The average throughput value is auto-calculated by the JMeter Throughput Shaping Timer from the corresponding TPS values seeded for the test executions. The memory and CPU values in OpenFaas are set to defaults to avoid the throttling of CPU performance based on RAM size, and OpenFaas works just like a Kubernetes pod scheduler.

• Test harness: JMeter, the open-source distributed load testing tool, is used to generate the load for each test scenario considered. The Blazemeter Throughput Shaping Timer, an open-source JMeter plugin, is installed separately to simulate the workloads for a given invocations-per-second (TPS) value. The Ultimate Thread Group provided by the JMeter plugins library is chosen to create realistic load profiles with concurrent users, load duration, ramp-up times, etc. Finally, the test results or .jtl logs can be viewed with the 'Aggregate Report' feature of the JMeter tool. OpenFaas ships with default Prometheus integration, a dashboard for monitoring and logging the metrics of functions. Additionally, we have configured visualisation dashboards using Grafana, an open-source monitoring tool, to turn those metrics into a useful dashboard view.

• Test scenarios: In this research, we consider Spiky/bursty, Growing, and Flat workload patterns, which are the best representations of mobile cloud workloads, to benchmark the System Under Test (SUT).
In Test scenario-1 (TS1), we consider an unpredictable-type peak workload with varying spikes, row#16082 of the Azure dataset. In this scenario, for studying the behaviour of auto scaling and concurrency, we consider workloads scaled down by a factor of 40 (Concurrency series1), 30 (Concurrency series2), and 10 (Concurrency series3). The Concurrency series1 data calculation is as follows: the peak-hour data consists of five samples; to scale down the load by 30, we divide all five TPS values by 30. Scaled-down TPS values = 784/30 = 26; 2448/30 = 81; 2815.7/30 = 93.66, and so on. The TS1 dataset thus formed, 26, 81, 93, 76, 13, is seeded into the JMeter tool (this arithmetic is reproduced in the sketch after this list). Hence, the peak concurrency of the TS1 series is 93. Similar logic is extended to Concurrency series2, series3, and the growing and flat workload types.
In Test scenario-2, we consider a Flat workload with constant requests per second, row#8922 of the Azure dataset. Test scenario-3 is a constantly increasing workload, row#41113 of the Azure dataset. In both cases, the TPS value is calculated and scaled up to maintain the peak TPS value in order to create enough load on the system.
Figures 2, 3, and 4 show the .jmx files generated by JMeter for the experiments. For testing the auto-scalability scenarios, we deploy the function with minimum replicas set to 1 and maximum replicas set to 5, 10, 50, 70, and 100 using the OpenFaas labels.
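The scale-down arithmetic above can be reproduced with a few lines of Python. The first three peak-hour TPS samples are the values quoted in the text; the final two are back-calculated from the published scaled series (76 and 13) and should be treated as approximations:

```python
# Reproduction of the TS1 scale-down step. The first three peak-hour TPS
# samples are quoted in the text; the final two are back-calculated from
# the published scaled series (76 and 13), so treat them as approximate.
PEAK_INVOCATIONS_PER_MINUTE = 168943
peak_tps = PEAK_INVOCATIONS_PER_MINUTE / 60     # = 2815.7, the Peak TPS above
peak_hour_tps = [784, 2448, peak_tps, 2280, 390]

def scale_down(tps_series, factor):
    """Divide each TPS sample by the scale factor, truncating as in TS1."""
    return [int(tps / factor) for tps in tps_series]

print(scale_down(peak_hour_tps, 30))  # -> [26, 81, 93, 76, 13], seeded into JMeter
```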
a) Evaluation metrics: 1) Response time: the time taken by the HTTP request for resolution; 2) Throughput (TPS): the number of HTTP requests/transactions satisfied per second; in this case, the TPS values are derived from the Azure data set; 3) Success rate: the ratio of the number of successful transactions to the total number of transactions; 4) Auto-scalability: the ability of the FaaS framework to scale up or scale down the function deployment on demand; 5) Fault tolerance: the capability of the system to ensure zero downtime; and 6) High availability: the ability of the architecture to maintain replicas of nodes to act against a single-node failure.
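The first three metrics can be computed directly from JMeter's .jtl output. The sketch below assumes the default CSV column names written by JMeter (timeStamp, elapsed, success) and uses a placeholder file path:

```python
import csv

def jtl_metrics(path):
    """Average response time, throughput (TPS), and success rate from a
    JMeter .jtl results file saved in the default CSV format."""
    elapsed, stamps, successes = [], [], 0
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            elapsed.append(int(row["elapsed"]))       # response time in ms
            stamps.append(int(row["timeStamp"]))      # epoch milliseconds
            successes += row["success"] == "true"
    duration_s = max((max(stamps) - min(stamps)) / 1000.0, 1e-9)
    return {
        "avg_response_ms": sum(elapsed) / len(elapsed),
        "throughput_tps": len(elapsed) / duration_s,
        "success_rate_pct": 100.0 * successes / len(elapsed),
    }

print(jtl_metrics("results.jtl"))  # "results.jtl" is a placeholder path
```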
V. PERFORMANCE TEST RESULTS

We benchmark the SUT with the various load patterns and measure the average response time metric. As the throughput is a fixed value, the success rate and response time will vary across the different scenarios.

Figure 5. OpenFaas Grafana dashboard depicting 200-success and 502-failure for Test scenario-3

Figure 6. OpenFaas Grafana dashboard depicting execution duration for Test scenario-3

Figure 7. OpenFaas Grafana dashboard depicting Replica scaling for Test scenario-3

Figure 8. Impact of Auto scaling on Response times depicted for the Spiky workload.

Figure 9. Impact of Auto scaling on Success rate depicted for the Spiky workload.

Function Replicas | Average Response time (ms) | Throughput | Success Rate
— Concurrency series1 —
1   | 99   | 16.6 | 100%
5   | 99   | 16.5 | 100%
10  | 101  | 16.5 | 100%
50  | 101  | 16.5 | 100%
70  | 102  | 16.5 | 100%
100 | 102  | 16.5 | 100%
— Concurrency series2 —
1   | 5660 | 22.0 | 67.39%
5   | 104  | 21.9 | 100%
10  | 105  | 22.1 | 100%
50  | 110  | 21.9 | 100%
70  | 110  | 22.1 | 100%
100 | 113  | 22.0 | 100%
— Concurrency series3 —
1   | 6757 | 41.2 | 36.41%
5   | 5857 | 43.6 | 48.81%
10  | 5740 | 45.0 | 72.2%
50  | 5649 | 42.6 | 77.81%
70  | 5600 | 40.0 | 78.8%
100 | 5258 | 44.5 | 86.62%

Table II. Performance evaluation results of Spiky workloads for various scaled down concurrency and function replica (TPS) values.

Figure 10. Impact of Concurrency on Response times depicted for the Spiky workload.

Figure 11. Impact of Concurrency on Success rate depicted for the Spiky workload.

A. Peak/Spiky workload scenario

Functions are auto-scaled based on the requests-per-second or throughput value in Prometheus, since the OpenFaas gateway increments the function replicas as per the scaling factor enabled. The maximum number of replicas on OpenFaas is dependent on the maximum pods that can be scaled by the Kubernetes HPA scheduler, which is in turn dependent on the available memory of the cluster. From the replica count graphs in Grafana (see Figure 7), it is evident that OpenFaas demonstrates a seamless auto-scaling capability on OpenStack. Our FaaS Grafana dashboard also captures the response code, i.e. 200 for success and 502 for a bad gateway error, as shown in Figure 5. It is interesting to note that the average response time increased slightly with auto-scaling for Workload A. This could be because of performance degradation of the OpenFaas gateway when there are too many function replicas serving the users simultaneously. In the second and third scenarios, with higher concurrent requests per workload, the success rate gradually increased from 67.39% to 100% and from 36.41% to 86.62% respectively with the increase in replica count, as depicted in Figures 8 and 9.

1) Impact of Concurrency: Figures 10 and 11 present the results of the performance evaluation of three concurrent workloads following a spiky load pattern. As the concurrency of the workload increased, the response time increased dramatically and the success rate decreased, observed for replica count = 1 specifically in our experiments. These results are in line with the research findings of Palade et al. [11] and Mohanty et al. [10], who evaluated OpenFaas on edge devices, edge networks, and virtual machines using managed Kubernetes and VMs set up by Virtualbox software.

B. Flat and Growing workload scenarios

Table III shows the response times and success ratios of the Growing and Flat type workloads with varying function replicas and with the TPS, or concurrency, or peak invocations-per-second value equal to 93, which is scaled down by a factor of 30.

Statistical similarity tests have been run on the different iterations of the same test scenario to check the similarity of the result datasets. We have recorded five iterations of Test scenario-2 results with function replicas ranging from 5 to 100 in intervals of 5. A Friedman test, with a 95% confidence interval, indicates that our test iteration results are statistically similar, χ²(4) = 3.9, p = 0.45.

We conduct similarity tests between the various workload types with the same concurrent users to assess the statistical difference between the serverless performance of the various workload patterns (Spiky, Flat, and Growing types). A Kruskal-Wallis H test, with a 95% confidence interval, showed that there was a statistically significant difference in response time scores between the different workload patterns, χ²(2) = 5.9, p = 0.051, with a mean rank of 10.80 for Spiky, 9.0 for Flat, and 4.2 for Growing patterns for a given concurrency.
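Both tests are available in scipy; the sketch below shows the calls involved, with placeholder response-time samples standing in for the recorded iterations:

```python
from scipy import stats

# Placeholder response-time samples (ms) for five iterations of one test
# scenario; substitute the measurements recorded from the .jtl logs.
it1 = [104, 105, 110, 110, 113]
it2 = [103, 106, 109, 111, 112]
it3 = [105, 104, 111, 110, 114]
it4 = [104, 107, 110, 109, 113]
it5 = [106, 105, 108, 112, 112]
chi2, p = stats.friedmanchisquare(it1, it2, it3, it4, it5)
print(f"Friedman: chi2 = {chi2:.2f}, p = {p:.3f}")  # p > 0.05: iterations similar

# Placeholder per-pattern samples for the cross-workload comparison.
spiky = [5660, 5857, 5740]
flat = [7936, 105, 104]
growing = [3632, 101, 102]
h, p = stats.kruskal(spiky, flat, growing)
print(f"Kruskal-Wallis: H = {h:.2f}, p = {p:.3f}")
```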

FLAT WORKLOAD
Function Replicas | Average Response time (ms) | Throughput | Success Rate
1   | 7936 | 53.1 | 54.88%
5   | 105  | 59.9 | 100%
10  | 104  | 60.2 | 100%
50  | 106  | 59.9 | 100%
70  | 106  | 59.9 | 100%
100 | 107  | 60.0 | 100%

GROWING WORKLOAD
Function Replicas | Average Response time (ms) | Throughput | Success Rate
1   | 3632 | 30.5 | 75.53%
5   | 101  | 30.8 | 100%
10  | 102  | 30.7 | 100%
50  | 102  | 30.7 | 100%
70  | 102  | 30.8 | 100%
100 | 104  | 30.8 | 100%

Table III. Performance evaluation results for the Growing and Flat workloads for various function replicas.

Statistic          | Statistic value | df | Sig.
Kolmogorov-Smirnov | 0.537           | 21 | 0.000
Shapiro-Wilk       | 0.231           | 21 | 0.000

Table IV. Normality test results for various function replicas of the Spiky workload response time data set.

Figure 12. Response times of the Spiky, Growing, and Flat workloads plotted for 100 function replicas and various concurrency levels.

Figure 13. Response time models of all workloads (Spiky, Growing, and Flat) for various concurrency values and replica size = 100; a peak concurrency of 70-90 users per second (zoomed) can meet the performance SLA of 100 ms.

1) Result data distribution: Descriptive statistics are presented using means with standard deviation (SD) for response times, with a sample size of 21, for the various workloads with function replicas ranging from 1 to 100. The Kolmogorov-Smirnov and Shapiro-Wilk test results show a Sig. or p value of 0.0, which is < 0.05 (Table IV). Also, from the Q-Q plot we can conclude that the response time data set appears to be non-normally distributed for all the workload patterns, as it does not follow the diagonal line and appears to have a non-linear pattern.

2) Response time data model: The experiments conducted in subsection V-A have been repeated for other concurrency values, such as 70, 55, 93, 109, and 218, with the function replicas fixed at 100, to perform the regression analysis.

Using curvilinear (nonlinear) regression to model the relationship between concurrency (independent variable) and response times (dependent variable), we observe that the data model follows a logistic distribution. The significance level adopted was 5% for all hypothesis tests. The accuracy of the data model is tested using the R and R² values.

For all three workload types, the correlation coefficient R is 0.95, which places the correlation in the "strong" category (0.8 or stronger is a strong correlation). The R² values are 0.911, 0.922, and 0.90 for the spiky, growing, and flat workloads respectively. This means that 91.1% (or 92.2% or 90%, as applicable) of the total variation in response times can be explained by the changes in concurrency. The p value observed for all three workloads in the ANOVA results table is 0.00, which is < 0.05, indicating that the R² value is highly significant and the model is a good fit. Figure 13 shows the logistic relationship of the result data sets.
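The paper's exact fitting procedure is not listed, so the following scipy-based sketch is only an assumption consistent with the description: a three-parameter logistic curve fitted by nonlinear least squares, with placeholder sample points rather than the measured series:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, L, k, x0):
    """Three-parameter logistic curve: response time saturating at L."""
    return L / (1.0 + np.exp(-k * (x - x0)))

# Placeholder (concurrency, mean response time in ms) pairs; substitute the
# measured series of each workload pattern.
conc = np.array([55.0, 70.0, 93.0, 109.0, 218.0])
rt = np.array([90.0, 100.0, 160.0, 900.0, 9500.0])

(L, k, x0), _ = curve_fit(logistic, conc, rt, p0=[10000.0, 0.05, 120.0],
                          maxfev=10000)
pred = logistic(conc, L, k, x0)
r2 = 1.0 - np.sum((rt - pred) ** 2) / np.sum((rt - rt.mean()) ** 2)
print(f"L = {L:.0f} ms, k = {k:.3f}, x0 = {x0:.1f} users, R^2 = {r2:.3f}")
```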

VI. DISCUSSION

From the box plots in Figure 12, the result data comparison shows that the median of the Spiky workload lies at a higher level, thus explaining why the average response time of the Spiky workload is higher compared to the other patterns. This could be because of the sharp rise and fall in the TPS values, changing at intervals of one minute. The long upper and lower whiskers of the Spiky load box mean that its response times vary widely across concurrency values. Hence, concurrency has a significant effect on the response times for the Spiky load for function replicas greater than 1. On the other hand, the short box for the Flat workload shows that its response time values for distinct concurrency values exhibit little variation. It can also be deduced that there is an obvious difference in the response times of these workloads, which is also confirmed by the Kruskal-Wallis H test results.

It is interesting to note that beyond the breaking point, the Flat workload has the highest error rate and a response time of 9,500 ms, as observed from the data model presented in Figure 13.

The study of the various mean rank values returned by the Kruskal-Wallis H test on the workload patterns indicates that the Spiky workload has a significantly higher response time than the growing or steady-state workload of the same concurrency. Figure 13 plots the data modelling graph, where the horizontal line represents the performance goal or QoS, and the three curves represent the results from the fitted models. Observing where these curves cross the horizontal line shows the number of users that can safely access the system in each case while still meeting the stated performance goal. The combined plot can be read as follows:

    The test executions show, with 95% confidence, that between 70 and 90 concurrent users can access the system while experiencing acceptable performance. Beyond the breaking point identified (i.e. 91 transactions per second; see Figure 13), the Kubernetes cluster has to be scaled up or scaled out to meet the QoS and availability requirements.
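Reading off the crossing point can also be done numerically by inverting the fitted logistic model at the 100 ms SLA line. The parameters below are placeholders in the spirit of the fitting sketch in Section V, not the paper's coefficients; with these values the crossing happens to land near the reported 70-90 user range:

```python
import math

def users_within_sla(L, k, x0, sla_ms):
    """Invert response = L / (1 + exp(-k*(x - x0))) at response = sla_ms."""
    if not 0 < sla_ms < L:
        raise ValueError("the SLA must lie strictly between 0 and the asymptote L")
    return x0 - math.log(L / sla_ms - 1.0) / k

# Placeholder logistic parameters; substitute the values fitted per workload.
print(f"max users within the 100 ms SLA ~ "
      f"{users_within_sla(10000, 0.05, 180, 100):.0f}")   # -> ~88 users
```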
VII. CONCLUSIONS AND FUTURE WORK

As part of this research, we have studied the suitability of open-source serverless offerings for private clouds. Specifically, we have evaluated the OpenFaas framework, and the modelled workload produced realistic insights into the underlying system. Our study shows that the relation between concurrency and response time follows a logistic model for all categories of workloads.

This project has leveraged the Python of-watchdog template for benchmarking a CPU-intensive workload. Future research includes the evaluation of the classic watchdog and the different modes of the of-watchdog (forking and HTTP modes) for various runtimes, to determine which has the best performance for CPU-, memory-, network-, and IO-intensive workloads. Implementing the resultant templates for specific scenarios can improve the overall performance.

The limitations of the current research fall into two broad categories: the cluster size and the test duration. Large-scale tests with much longer time frames (in the order of days) and more test iterations would provide a more accurate measurement of the system response over time, as well as a more comprehensive understanding of any scalability constraints.

Another interesting avenue for future research is to study other types of serverless workload patterns, particularly those connected to divisible workloads [20] and, in general, structured parallelism [21], to determine the sequence of workload patterns causing the highest and least resource utilisation.

From an application perspective, we intend to consider mobile data networks, specifically using call data records, to design a layered strategy to cope with unexpected loads as part of mobile communications in emergency situations.

REFERENCES

[1] A. Chandrasekaran and C. Lowery, "A CIO's Guide to Serverless Computing," Gartner Research, Industry Report ID: G00465766, Apr. 2020. [Online]. Available: https://www.gartner.com/smarterwithgartner/the-cios-guide-to-serverless-computing/ (Last accessed: 15/Dec/2020).
[2] D. Bartoletti et al., "Predictions 2021: Cloud Computing," Forrester, Industry Report, Oct. 2020. [Online]. Available: https://www.forrester.com/fn/51A83KxURjmofUAEV7bCKR (Last accessed: 15/Dec/2020).
[3] D. Bardsley, L. Ryan, and J. Howard, "Serverless performance and optimization strategies," in 2018 IEEE SmartCloud. New York: IEEE, Sep. 2018, pp. 19-26.
[4] G. McGrath and P. R. Brenner, "Serverless computing: Design, implementation, and performance," in 2017 IEEE ICDCSW. Atlanta: IEEE, Jun. 2017, pp. 405-410.
[5] D. Jackson and G. Clynch, "An investigation of the impact of language runtime on the performance and cost of serverless functions," in 2018 IEEE/ACM UCC. Zurich: IEEE, Dec. 2018, pp. 154-160.
[6] K. Figiela et al., "Performance evaluation of heterogeneous cloud functions," Concurrency and Computation: Practice and Experience, vol. 30, no. 23, p. e4792, 2018.
[7] M. Pawlik, K. Figiela, and M. Malawski, "Performance evaluation of parallel cloud functions," in ICPP 2018. Oregon: ACM, Aug. 2018, pp. 1-2.
[8] H. Lee, K. Satyam, and G. Fox, "Evaluation of production serverless computing environments," in 2018 IEEE CLOUD. San Francisco: IEEE, Jul. 2018, pp. 442-450.
[9] D. Pinto, J. P. Dias, and H. S. Ferreira, "Dynamic allocation of serverless functions in IoT environments," in 2018 IEEE EUC. Bucharest: IEEE, Oct. 2018, pp. 1-8.
[10] S. K. Mohanty, G. Premsankar, and M. di Francesco, "An evaluation of open source serverless computing frameworks," in 2018 IEEE CloudCom. Nicosia: IEEE, Dec. 2018, pp. 115-120.
[11] A. Palade, A. Kazmi, and S. Clarke, "An evaluation of open source serverless computing frameworks support at the edge," in 2019 IEEE SERVICES. Milan: IEEE, Jul. 2019, pp. 206-211.
[12] I. Baldini et al., "Serverless computing: Current trends and open problems," in Research Advances in Cloud Computing, Dec. 2017, ch. 1, pp. 1-20.
[13] A. Randal, "The ideal versus the real: Revisiting the history of virtual machines and containers," ACM Computing Surveys, vol. 53, no. 1, pp. 5:1-31, Feb. 2020.
[14] V. Medel et al., "Characterising resource management performance in Kubernetes," Computers & Electrical Engineering, vol. 68, pp. 286-297, 2018.
[15] A. Pereira Ferreira and R. Sinnott, "A performance evaluation of containers running on managed Kubernetes services," in 2019 IEEE CloudCom. Sydney: IEEE, Dec. 2019, pp. 199-208.
[16] H. Martins, F. Araujo, and P. da Cunha, "Benchmarking serverless computing platforms," Journal of Grid Computing, vol. 18, pp. 691-709, 2020.
[17] D. Ongaro and J. K. Ousterhout, "In search of an understandable consensus algorithm," in USENIX ATC '14. Philadelphia: USENIX Association, Jun. 2014, pp. 305-319.
[18] M. Castro and B. Liskov, "Practical Byzantine fault tolerance," in OSDI '99. New Orleans: USENIX Association, Feb. 1999, pp. 173-186.
[19] M. Shahrad et al., "Serverless in the wild: Characterizing and optimizing the serverless workload at a large cloud provider," in USENIX ATC '20. USENIX Association, Jul. 2020, pp. 205-218.
[20] H. González-Vélez and M. Cole, "Adaptive statistical scheduling of divisible workloads in heterogeneous systems," Journal of Scheduling, vol. 13, no. 4, pp. 427-441, 2010.
[21] M. Danelutto et al., "Algorithmic skeletons and parallel design patterns in mainstream parallel programming," International Journal of Parallel Programming, pp. 1-22, 2020, In Press.
