Toffetti 2016
Highlights
• A definition of cloud-native applications and their desired characteristics.
• A distributed architecture for self-managing (micro) services.
• A report on our experiences and lessons learnt applying the proposed architecture to a legacy application brought to the cloud.
… facing business changes. The bottom line is increased productivity.

From the economic perspective, the pay-per-use model means that no upfront investment is needed for acquiring IT resources or for maintaining them, as companies pay only for allocated resources and subscribed services. Moreover, by handing off the responsibility of maintaining physical IT infrastructure, companies can avoid capital expenses (capex) in favor of usage-aligned operational expenses (opex) and can focus on development rather than operations support.

An extensive set of architectural patterns and best practices for cloud application development has been distilled; see for instance [2–4]. However, day-to-day cloud application development is still far from fully embracing these patterns. Most companies have just reached the point of adopting hardware virtualization (i.e., VMs). Innovation leaders have already moved on to successfully deploying newer, more productive patterns, like microservices, based on light-weight virtualization (i.e., containers).

On one hand, a pay-per-use model only brings cost savings with respect to a dedicated (statically sized) system solution if (1) an application has varying load over time and (2) the application provider is able to allocate the "right" amount of resources to it, avoiding both over-provisioning (paying for unneeded resources) and under-provisioning (resulting in QoS degradation). On the other hand, years of cloud development experience have taught practitioners that commodity server hardware and network switches break often. Failure domains help isolate problems, but one should "plan for failure", striving to produce resilient applications on unreliable infrastructure, without compromising their elastic scalability.

In this article we report on our experience in porting a legacy Web application to the cloud, adopting a novel design pattern for self-managing cloud-native applications. This enables vendor independence and reduced costs with respect to relying on IaaS/PaaS and third-party vendor services.

The main contributions of this article are: (1) a definition of cloud-native applications and their desired characteristics, (2) a distributed architecture for self-managing (micro) services, and (3) a report on our experiences and lessons learnt applying the proposed architecture to a legacy application brought to the cloud.

2. Cloud-native applications

Any application that runs on a cloud infrastructure is a "cloud application", but a "cloud-native application" (CNA from here on) is an application that has been specifically designed to run in a cloud environment.

2.1. CNA: definitions and requirements

We can derive the salient characteristics of CNA from the main aspects of the cloud computing paradigm. As defined in [5], there are five essential characteristics of cloud computing: on-demand self-service, broad network access, resource pooling, rapid elasticity and measured service. In actual practice the cloud infrastructure is the enabler of these essential characteristics. Due to the economy of scale, infrastructure installations are large and typically built of commodity hardware, so that failures are the norm rather than the exception [6]. Finally, cloud applications often rely on third-party services, as part of the application functionality, as support (e.g., monitoring), or both. Third-party services might also fail or offer insufficient quality of service.

Given the considerations above, we can define the main requirements of CNA as:

• Resilience: CNA have to anticipate failures and fluctuations in the quality of both the cloud resources and the third-party services needed to implement an application, in order to remain available during outages. Resource pooling in the cloud implies that unexpected fluctuations of the infrastructure performance (e.g., the noisy neighbor problem in multi-tenant systems) need to be expected and managed accordingly.
• Elasticity: CNA need to support adjusting their capacity by adding or removing resources, to provide the required QoS in the face of load variation while avoiding over- and under-provisioning. In other terms, cloud-native applications should take full advantage of the cloud being a measured service offering on-demand self-service and rapid elasticity.

It should be clear that resilience is the first goal to be attained to achieve a functioning and available application in the cloud, while scalability deals with load variation and operational cost reduction. Resilience in the cloud is typically addressed using redundant resources. Formulating the trade-off between redundancy and operational cost reduction is a business decision.

The principles identified in the "12 factor app" methodology [7] focus not only on several aspects that impact the resiliency and scalability of Web applications (e.g., dependencies, configuration in environment, backing services as attached resources, stateless processes, port-binding, concurrency via process model, disposability), but also on the more general development and operations process (e.g., one codebase, build-release-run, dev/prod parity, administrative processes). Many of the best practices in current cloud development stem from these principles.

2.2. Current state of cloud development practice

Cloud computing is novel and economically more viable with respect to traditional enterprise-grade systems also because it relies on self-managed software automation (restarting components) rather than more expensive hardware redundancy to provide resilience and availability on top of commodity hardware [8]. However, many applications deployed in the cloud today are simply legacy applications that have been placed in VMs without changes of architecture or assumptions on the underlying infrastructure. Failing to adjust cost, performance and complexity expectations, and assuming the same reliability of resources and services in a traditional data center as in a public cloud, can cost dearly, both in terms of technical failure and economic loss.

In order to achieve resilience and scalability, cloud applications have to be continuously monitored, analyzing their application-specific and infrastructural metrics to provide automated and responsive reactions to failures (health management functionality) and changing environmental conditions (auto-scaling functionality), minimizing human intervention.

The current state of the art in monitoring, health management, and scaling consists of one of the following options: (a) using services from the infrastructure provider (e.g., Amazon CloudWatch1 and Auto Scaling2 or Google Instance Group Manager3) with a default or a custom-provided policy, (b) leveraging a third-party service (e.g., Rightscale,4 New Relic5), or (c) building an ad-hoc solution using available components (e.g., Netflix Scryer,6 logstash7). Both …

1 https://aws.amazon.com/cloudwatch.
2 https://aws.amazon.com/autoscaling.
3 https://cloud.google.com/compute/docs/autoscaler.
4 http://www.rightscale.com.
5 https://newrelic.com.
6 http://techblog.netflix.com/2013/11/scryer-netflixs-predictive-auto-scaling.html.
7 https://www.elastic.co/products/logstash.
The main contribution of this article is a high-level distributed architecture that can be used to implement self-managing cloud-native applications.

The idea is that just as there are best practices to build reliable services on the cloud by leveraging distributed algorithms and components, so can management functionalities (e.g., health management, auto-scaling, adaptive service placement) be implemented as resilient distributed applications.

More in detail, the idea is to leverage modern distributed in-memory key–value store solutions (KV-store; e.g., Consul,8 Zookeeper,9 Etcd,10 Amazon Dynamo [9], Pahoehoe [10]) with strong or eventual consistency guarantees. They are used both to store the "state" of each management functionality and to facilitate the internal consensus algorithm for leader election and assignment of management functionalities to cluster nodes. In this way, management functionalities become stateless and, if any of the management nodes were to fail, the corresponding logic can be restarted on another one with the same state. More concretely, any management functionality (e.g., the autoscaling logic) can be deployed within an atomic service as a stateless application component to make the service self-managing in that aspect. If the autoscaling logic or the machine hosting it were to fail, the health management functionality would restart it, and the distributed key–value store would still hold its latest state.

With the same approach, hierarchies of configuration clusters can be used to delegate atomic service scaling to the components, and atomic service composition and lifecycle to service-elected leaders. What we propose integrates naturally with the common best practices of cloud orchestration and distributed configuration management that we will discuss in the following sections.

Self-managing microservice compositions. By generalization, and building on the concept of service composability, the same architecture can be employed to deploy self-managing service compositions or applications using the microservice architectural pattern [11].

A microservice-oriented application can be represented with a type graph of microservices that invoke each other, and an instance graph representing the multiple instances of microservices that are running to provide resilience and performance guarantees (e.g., as in Fig. 1).

In microservice architectures, several patterns are used to guarantee resilient, fail-fast behavior, for instance the circuit-breaker pattern [12] or client-side load balancing such as in the Netflix Ribbon library.11 The typical deployment has multiple instances of the same microservice running at the same time, possibly with underlying data synchronization mechanisms for stateful services. The rationale behind this choice is to be able to deploy microservice instances across data centers and infrastructure service providers, letting each microservice quickly adjust to failures by providing alternative endpoints for each service type.

Fig. 2. Hierarchical KV-store clusters for microservices management.

In Fig. 2, we provide an intuitive representation of how multiple KV-store clusters can be used to implement self-managing microservice applications across cloud providers. Each microservice is deployed with its own KV-store cluster for internal configuration management and discovery among components. Local management functionalities (e.g., component health management, scaling components) are delegated to nodes in the local cluster. Another KV-store cluster is used at "global" (application) level. This "composition cluster" is used both for endpoint discovery across microservices and for leader election to start monitoring, auto-scaling, and health management functionalities at service composition level. Other application-level decisions, like for instance microservice placement across clouds depending on latencies and costs, or traffic routing across microservices, can be implemented as management logic in the composition cluster. Combined with placement across failure domains, the proposed architecture enables distributed hierarchical self-management, akin to an organism (i.e., the composed service) that is able to recreate its cells to maintain its morphology while each cell (i.e., each microservice) is a living self-managing element.

3.1. Atomic service example

In this subsection we provide an example of how to apply the concept of self-managing services to a monolithic Web application …

8 https://www.consul.io.
9 https://zookeeper.apache.org/.
10 https://github.com/coreos/etcd.
11 http://techblog.netflix.com/2013/01/announcing-ribbon-tying-netflix-mid.html.
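To make the idea of stateless management functionalities concrete, the following minimal sketch (in Python, against the Etcd v2 HTTP API; the key path and the use of the requests library are our own assumptions, not part of the architecture) shows how an autoscaler can externalize its state to the KV-store so that a replacement instance resumes from the last persisted state after a failure:

    import json
    import requests

    ETCD = "http://127.0.0.1:2379/v2/keys"    # any reachable cluster member
    STATE_KEY = "/autoscaler/state"           # hypothetical key path

    def save_state(state):
        # Persist the management state as a JSON document in the KV-store.
        requests.put(ETCD + STATE_KEY, data={"value": json.dumps(state)})

    def load_state():
        # A restarted autoscaler resumes from the last persisted state.
        resp = requests.get(ETCD + STATE_KEY)
        if resp.status_code == 404:           # first start: no state yet
            return {}
        return json.loads(resp.json()["node"]["value"])

Because every decision is derived from this shared state rather than from local memory, the autoscaling logic can be killed and restarted on any node of the cluster, which is precisely the property the architecture relies on.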
12 https://aws.amazon.com/cloudformation/.
13 https://wiki.openstack.org/wiki/Heat.
14 http://docs.oasis-open.org/tosca/TOSCA/v1.0/os/TOSCA-v1.0-os.html.
15 http://raftconsensus.github.io/.
16 For example see http://blog.zhaw.ch/icclab/setup-a-kubernetes-cluster-on-openstack-with-heat, "dedicated Etcd host".
Fig. 4. An example snippet of the representation of a type graph (left) and instance graph (right) using Etcd directory structure.
… there. For instance, in our example in Fig. 4, a new CA instance adds a new directory with its UUID (uuid1) and saves a key with its endpoint to be used by the application server components. Edges in the instance graph are used to keep track of component connections, in order to enforce the cardinalities on connections as specified in the type graph. The auto-scaling manager (described in the following subsections) is responsible for deciding how many components per type are needed, while the health manager will make sure that exactly as many instances as indicated by the auto-scaling logic are running and that their interconnections match the type graph. Component information (e.g., endpoints) is published by each component in Etcd periodically, with a period of 5 s and a time to live (TTL) of 10 s. Whenever a component fails or is removed, its access information is automatically removed from Etcd, and the health manager and all dependent components can be notified of the change.

Once the orchestrator has deployed the initial set of required components for the service, it sets the status of the service on Etcd to "active". Once this happens, the component which was elected leader of the Etcd cluster will start the self-managing functionality with the auto-scaling and health management logic.

3.2.1. Monitoring

Before discussing the auto-scaling functionality, we will describe how Etcd can also be used to store a partial and aggregated subset of monitoring information, in order to allow auto-scaling decisions to be taken. The rationale behind storing monitoring information in Etcd is to allow resilience of the auto-scaling logic by making it stateless. Even if the VM or container where the auto-scaling logic has been running fails, a new component can be started to take over the auto-scaling logic and knowledge base from where it was left.

The common practice in cloud monitoring is to gather both low-level metrics from the virtual systems, such as CPU, I/O, and RAM usage, as well as higher-level and application-specific metrics, such as response times and throughputs [16]. Considering the latter metrics, full response time distributions are typically relevant in system performance evaluation, but for the sake of QoS management, high percentiles (e.g., 95th, 99th) over time windows of a few seconds are in general adequate to assess the system behavior. We assume that each relevant component runs internal monitoring logic that performs metrics aggregation and publishes aggregated metrics to Etcd. The actual directory structure and format in which to save key performance indicators (KPIs) is dependent on the auto-scaling logic to be used and is beyond the scope of this work. For instance, in our example the load balancer can use its own internal metrics in combination with the logstash aggregator17 to provide the average request rate, response time, and queue length in the last 5, 10, 30 s and 1, 5, 10 min. These metrics are typically enough for an auto-scaling logic to take decisions on the number of needed application servers.

3.2.2. Auto-scaling

The auto-scaling component uses a performance model to control horizontal scalability of the components. Its main function is to decide how many instances of each component need to be running to guarantee the desired QoS. Auto-scaling is started by the leader node. Its logic collects the monitoring information from Etcd and the current system configuration, and outputs the number of required components for each component type. This information is stored in the type graph for each node under the cardinality folder with the key "req" (required), as in Fig. 4.

3.2.3. Health management

The node that is assigned health management functionalities compares the instance graph with the desired state of the system (as specified by the auto-scaling logic) and takes care of (1) terminating and restarting unresponsive components, (2) instantiating new components, (3) destroying no-longer-needed components, and (4) configuring the connections among components in the instance graph so that cardinalities are enforced.

3.2.4. Full life-cycle

Fig. 5 depicts a simplified sequence diagram putting all the pieces together. The orchestrator sets up the initial deployment of the service components. They register to Etcd and watch relevant Etcd directories to perform configuration updates (reconfiguration parts for AS and CA components are omitted). Once all initial components are deployed, the orchestrator sets the service state to "active". Components generating monitoring information save it periodically in Etcd.

Each component runs a periodic check on the service state. If the service is active and a node detects that it is the Etcd cluster leader, it starts the auto-scale and health management processes. Alternatively, auto-scale and health management components can be started on other nodes depending on their utilization. A watch mechanism can be implemented from the cluster leader to indicate to a component that it should start a management functionality.

17 http://logstash.net/.
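To make the component registration described above (cf. Fig. 4) concrete, a minimal sketch of the periodic endpoint publication could look as follows (Python against the Etcd v2 HTTP API; the directory path and endpoint value are our own assumptions, the real layout is the one of Fig. 4):

    import time
    import uuid
    import requests

    ETCD = "http://127.0.0.1:2379/v2/keys"
    instance_id = str(uuid.uuid4())
    # Hypothetical instance-graph path for a CA component:
    key = "/instance_graph/ca/%s/endpoint" % instance_id

    while True:
        # Re-publish the endpoint with a 10 s TTL every 5 s; if the
        # component crashes, the key expires and watchers are notified.
        requests.put(ETCD + key,
                     data={"value": "http://10.0.0.5:11211", "ttl": 10})
        time.sleep(5)

Dependent components and the health manager can then long-poll the enclosing directory (wait=true&recursive=true in the v2 API) to react to both registrations and expirations.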
Fig. 5. Sequence diagram depicting a simplified service instantiation and deinstantiation. For simplicity we represent Etcd as a single process.
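A sketch of the periodic leader check of Section 3.2.4 is given below; it assumes the Etcd v2 statistics endpoint, which reports whether the local member currently holds the Raft leadership, and reduces the management processes and the status key to placeholders of our own making:

    import time
    import requests

    LOCAL_ETCD = "http://127.0.0.1:2379"

    def start_autoscaler():
        pass  # placeholder: launch the auto-scaling process

    def start_health_manager():
        pass  # placeholder: launch the health-management process

    def is_leader():
        # /v2/stats/self reports "StateLeader" on the elected member.
        stats = requests.get(LOCAL_ETCD + "/v2/stats/self").json()
        return stats.get("state") == "StateLeader"

    def service_active():
        # Hypothetical status key set by the orchestrator.
        resp = requests.get(LOCAL_ETCD + "/v2/keys/service/status")
        return resp.ok and resp.json()["node"]["value"] == "active"

    while True:
        if service_active() and is_leader():
            start_autoscaler()
            start_health_manager()
            break
        time.sleep(5)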
3.2.5. Self-healing properties

By placing components across different failure domains (e.g., availability zones in the same data center, or different data centers), the architecture described above is resilient to failure and is able to guarantee that failed components will be restarted within seconds. The fact that any remaining node can be elected leader, and that the desired application state and monitoring data are shared across an Etcd cluster, makes the health management and auto-scaling components stateless, and allows the atomic service to be correctly managed as long as the cluster is composed of the minimum required number of nodes for consensus, which is three.

… technologies supporting the design patterns for cloud-based applications, and on the other hand to successfully apply these patterns to a traditional business application which was not designed to run in the cloud. Rather than starting from scratch with an application designed from inception for the cloud, we wanted to show that decomposition into smaller components (even by component functionality rather than application feature) often allows achieving resilience and elasticity even in legacy applications.

For the evaluation of a suitable application, we decided to uphold the following criteria. The application should be: …

18 Service Prototyping Lab: http://blog.zhaw.ch/icclab/.
19 https://suitecrm.com/.
20 http://zurmo.org/.
… creates a new component instance (i.e., a "unit" in Fleet parlance) and submits it to Fleet. Otherwise, if a scale-in is requested, it instructs Fleet to destroy a specific unit.

Dynamite is itself designed according to CNA principles. If it crashes, it is restarted and re-initialized using the information stored in Etcd. This way, Dynamite can be run in a CoreOS cluster resiliently. Even if the entire node Dynamite is running on were to crash, Fleet would re-schedule the service to another machine and start Dynamite there, where it could still restore the state from Etcd. For more details, we refer the reader to the documentation of the Dynamite implementation27 as well as our work previously published in [17].

27 Dynamite scaling engine: https://github.com/icclab/dynamite/blob/master/readme.md.
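As an intuition of what such a stateless scaling engine does, the sketch below reads an aggregated metric from Etcd, applies thresholds of the kind we later use in Section 5.1, and delegates the actual unit management to Fleet through fleetctl. It is a simplification under our own assumptions (metric key paths and unit names are hypothetical); the real logic lives in the Dynamite implementation referenced above:

    import subprocess
    import requests

    ETCD = "http://127.0.0.1:2379/v2/keys"

    def metric(path):
        # Hypothetical aggregated metric written by the logstash component.
        return float(requests.get(ETCD + path).json()["node"]["value"])

    def scale_apache(n_running):
        rt95 = metric("/metrics/apache/rt_95th_15s")  # ms over a 15 s window
        cpu = metric("/metrics/apache/cpu_avg_30s")   # % over a 30 s window
        if rt95 > 1000:
            # Scale out: submit one more instance of the unit template.
            subprocess.check_call(
                ["fleetctl", "start", "[email protected]" % (n_running + 1)])
        elif cpu < 10 and n_running > 1:
            # Scale in: destroy the highest-numbered unit.
            subprocess.check_call(
                ["fleetctl", "destroy", "[email protected]" % n_running])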
5. Experimental results

In this section we report on our resilience and scalability experiments with our cloud-native Zurmo implementation. We provide the complete source code of the application on our github repository.28

All the experiments we discuss here have been executed on Amazon AWS (eu-central) using 12 t2.medium-sized virtual machines. We also ran the same experiments on our local OpenStack installation. The OpenStack results are in line with AWS and we avoid reporting them here because they do not provide any additional insight. Instead, in the spirit of enabling verification and repeatability, we decided to focus on the AWS experiments. They can be easily repeated and independently verified by deploying the CloudFormation template we provide in the "aws" directory of the implementation.

The experiments are aimed at demonstrating that the proposed self-managing architecture and our prototypical implementation correctly address the requirements we identified for cloud-native applications: elasticity and resilience. In other terms, we pose ourselves the following questions:

• Does the application scale (out and in) according to load variations?
• Is the application resilient to failures?

In order to demonstrate resilience we emulate IaaS failures by respectively killing containers and VMs. Scaling of the application is induced by a load generator whose intensity varies over time. The load generation tool we used is called Tsung.29 We created a Zurmo navigation scenario by capturing it through a browser extension, then generalized and randomized it. You can also find this in our repository, in the "zurmo_tsung" component. In our experiments the load was generated from our laptops running Tsung locally. We simulated a gradually increasing number of users (from 10 up to 100) with a random think time between requests of 5 s on average. This yields a request rate of 0.2 requests per second per user, and a theoretical maximum expected rate of 20 requests per second with 100 concurrent users. The load is mainly composed of around 200 read (HTTP GET) operations and roughly 30 write (HTTP POST) requests involving database writes. It is important to notice that, due to our choice of avoiding sticky HTTP sessions, any request saving data in the HTTP session object also results in database writes.

5.1. Scaling

In order to address the first question, we configured Dynamite to scale out the service, creating a new Apache container instance every time the 95th percentile of the application response time (RT) continuously exceeds 1000 ms in a 15 s window. The scale-in logic instead will shut down any Apache container whose CPU utilization has been lower than 10% for a period of at least 30 s. Given that we are managing containers, scaling in and out is a very quick operation, and we can afford to react upon short-term signals (e.g., RT over a few seconds).

Since we used Fleet and CoreOS for the experiments, and not directly an IaaS solution billing per container usage, we also needed to manage our own virtual machines. We used 10 VMs that are pre-started before initiating the load and that are not part of the scaling exercise. The assumption is that future container-native applications will be billed only per container usage in seconds, and developers will only scale applications through containers. The actual distribution of containers upon virtual machines is decided by the Fleet scheduler, and in general results in a uniform distribution across VMs.

Using our own internal monitoring system allows the application to scale on high-level performance metrics (e.g., 95th percentiles) that are computed at timely intervals by the logstash component and saved to Etcd to be accessed by Dynamite.

Fig. 9 shows one example run using the scaling engine to withstand a load of 10 concurrent users growing to 100. In the upper graph we plot the application response time in milliseconds (red continuous line, left axis) and the request rate in requests per second (green dashed line, right axis). The request rate grows from roughly 2 requests per second up to 20, while the response time is kept at bay by adaptively increasing the number of running Apache containers. The bottom part of the graph shows the number of running Apache containers at any point in time (red continuous line) as well as the number of simulated users. As soon as the generated traffic ends, the number of Apache containers is reduced.

This simple experiment shows the feasibility of an auto-scaling mechanism according to our self-managing cloud-native applications principles. For this example we only implemented a simple rule-based solution and we make no claims concerning its optimality with respect to minimizing operational costs. More advanced adaptive model-based solutions (for instance the one in [18]) could be easily integrated using the same framework.

5.2. Resilience to container failures

In order to emulate container failures, we extended the Multi-Cloud Simulation and Emulation Tool (MC-EMU).30 MC-EMU is an extensible open-source tool for the dynamic selection of multiple resource services according to their availability, price and capacity. We have extended MC-EMU with an additional unavailability model and hooks for enforcing container service unavailability. The container service hook connects to a Docker interface per VM to retrieve available container images and running instances. Following the model's determination of unavailability, the respective containers are forcibly stopped remotely. It is the task of the CNA framework to ensure that in such cases the desired number of instances per image is only shortly underbid and that replacement instances are launched quickly. Therefore, the overall application's availability should be close to 100% even if the container instances are emulated with 90% estimated availability.

Fig. 10 depicts the results of an example run in which we forced containers to fail with a 10% probability every minute. With respect to the previous example, one can clearly notice the oscillating number of Apache Web servers in the bottom of the figure, and the effect this has on the application response time. Figs. 11 and 12 show a glimpse of the monitoring metrics we were able to track and visualize through Kibana while running the experiment. We plot the average and percentile response times, response time per Apache container, request rate, HTTP response codes, number of running Apache containers, and the CPU, memory, and disk utilization for each.

5.3. Resilience to VM failures

We also emulated VM failures, although without the automation models of MC-EMU or similar tools like ChaosMonkey.31 Instead, we simply used the AWS console to manually kill one or more VMs at a given point in time to pinpoint critical moments.
Fig. 9. Response time, request rate, number of users and Apache servers running the system without externally induced failures.
Fig. 10. Response time, request rate, number of users and Apache servers running the system inducing probabilistic container failures.
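For reference, the kind of container-failure injection behind the run in Fig. 10 can be approximated in a few lines with the Docker SDK for Python; the image filter, probability, and interval below are our own assumptions, while the actual experiments are driven by MC-EMU's unavailability models:

    import random
    import time
    import docker  # Docker SDK for Python

    client = docker.from_env()

    while True:
        for container in client.containers.list():
            tags = container.image.tags
            # Hypothetical filter for the Apache Web-server tier.
            if tags and "apache" in tags[0] and random.random() < 0.10:
                container.stop()  # forcibly stop to emulate a failure
        time.sleep(60)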
The effects of killing entire VMs in our prototype implementation vary a lot depending on the role of the VM in the Etcd cluster as well as the type of containers it is running. As one could expect, killing VMs only hosting "stateless" (or almost stateless) containers (e.g., Apache, Memcached) has only small and transitory effects on the application quality of service. However, terminating a VM running stateful components (e.g., the database) has much more noticeable effects.

There are two types of VMs which we explicitly did not target for termination:

• the VM running logstash;
• the VMs acting as "members" of the Etcd cluster.

The reason for the former exclusion is trivial and easily amendable: we simply did not have time to implement logstash as a load-balanced service with multiple containers. Killing the logstash container results in a period of a few seconds without visible metrics in Kibana, which would have defeated the goals of our experiment. The solution to this shortcoming is straightforward engineering.

Concerning the Etcd cluster member VMs, the issue is that the discovery token mechanism used for Etcd cluster initialization works only for cluster bootstrap. In order to keep the consensus quorum small, the default cluster is composed of only three members, while other nodes join as "proxies" (they just read cluster state). Any VM termination of one of the 3 member nodes in AWS would restart the VM, which would try to use Etcd discovery again to rejoin the cluster, but this would fail. In other failure scenarios, the machine might even change its IP address, requiring manual deletion and addition of the new endpoint. This problem is fairly common for Etcd in AWS, so much so that we found an implementation of a containerized solution for it.32 However, we did not yet integrate it into our stack and will leave a comparison to future work.

32 http://engineering.monsanto.com/2015/06/12/etcd-clustering/.

In order to show in practice how different the effects of killing VMs can be, we report here a run in which we target VMs running different types of components. Fig. 13 depicts a run in which we killed 2 of the VMs running half of the 4 MySQL Galera cluster nodes roughly 3 min into the run (manually induced failures of two VMs each time are marked with blue vertical dashed lines). Together with the database containers, one can see that some Apache containers were also terminated. Moreover, having only two Galera nodes left, one of which was acting as a replication source for the Galera nodes newly (re)spawned by Fleet, means that the database response time became really high for a period, with a clearly visible effect on the Web application response time. Another two VMs at a time were killed respectively 6 and 9 min into the run,
Fig. 11. The real-time monitoring metrics while running the experiment depicted in Fig. 10.
but since no database components were hit, apart from the graph of the number of Apache instances, no major effects are perceived in the application response time.

5.4. Lessons learnt

Implementing our self-managing cloud-native application design and applying it to an existing legacy Web application have proven to be valuable exercises in assessing the feasibility of our approach through a proof-of-concept implementation and identifying its weaknesses.

As is mostly the case when realizing a theoretical design in practice, we were faced with several issues that hindered our progress. Some of them were a consequence of adopting fairly new technologies lacking mature and battle-tested implementations. Here we report in a bottom-up fashion the main problems we encountered with the technological stack we used for our implementation.

CoreOS. During about one year of research on CNA we used different releases of CoreOS stable. The peskiest issue we had with it took us quite some time to figure out. The symptoms were that the same containers deployed on the same OS would randomly refuse to start. This caused Fleet/systemd to give up trying to bring up units after too many failed restarts. Fleet was failing in bringing up replicas of components we needed to be redundant, which made it extremely hard to hope to achieve a reliable system in those conditions. These failures in starting containers happened sporadically and we could not reproduce them at will. This is not the behavior one expects with containers: one of their key points is to offer consistency between development and production environments.

It took us a while to understand that the random failures were due to a bug33 in the Docker version included in CoreOS 766.3.0. In very few cases, concurrently pulling multiple containers resulted in some container layers being only partially downloaded, but Docker would consider them complete and would refuse to pull again. The problem was aggravated by the fact that we used unit dependencies in Fleet, requiring some units to start together on the same machine. In this case a failing container would cause multiple units to be disabled by Fleet.

It is hence always worth repeating: tight coupling is bad, especially if it implies cascading failures while building a resilient system.

33 https://github.com/coreos/bugs/issues/471.
Fig. 12. The real-time monitoring metrics while running the experiment depicted in Fig. 10.
Fig. 13. Response time, request rate, number of users and Apache servers running the system inducing VM failures for stateful components.
Etcd. The biggest issue we had with Etcd was already mentioned in the previous section. We use Etcd as the foundation of the whole distributed solution, both as a distributed key–value store and for leader election. We expected that after a machine failure (when the machine gets rebooted or another machine takes its place) rejoining a cluster would be automatic; however, this is not the case in AWS. Artificially causing machine failures like we did to test the reliability of our implementation often caused the Etcd cluster to become unhealthy and unresponsive.

Another issue we experienced is that Etcd stops responding to requests (also read requests!) if the machine disk is full. In this case the cluster might again fail, and Fleet would stop behaving correctly across all VM instances.

Fleet. Fleet is admittedly not a tool to be used directly for container management. Our lesson learnt here is that managed approaches like Kubernetes should be preferred. Apart from this, we often had issues with failed units not being correctly removed in systemd on some nodes, and in general misalignment between the systemd state of some hosts and the units Fleet was aware of. Some command-line interface mistakes which can easily happen (e.g., trying to run a unit template without giving it an identifier) result in units failing to be removed in systemd and in Fleet hanging on the loop requesting their removal, preventing any other command from being executed.

Another unexpected behavior we managed to trigger while developing is due to the interplay of Fleet and Docker. Fleet is expected to restart failed containers automatically; however, Docker volumes are not removed by default (the rationale is that they might be remounted by some other containers). The net effect is that after a while machines with problematic containers run out of disk space, Etcd would stop working, the cluster would become unhealthy, and the whole application would be running on its own without Fleet. The CoreOS bug we mentioned above also caused this on long-running applications, effectively bringing down the service.

These are all minor issues due to the fact that most of the tools we use are in development themselves. However, any of these problems might become a blocker for developers using these tools for the first time.

Self-managing Zurmo. Finally, some considerations concerning our own design. The first thing to discuss is that we did not go all the way and implement Zurmo as a set of self-managing microservices, each with its own specific application-level functionality. The main reason is that we did not want to get into Zurmo's code base to split its functionality into different services. This would have meant investing a large amount of time to understand the code and the database (which has more than 150 tables). Instead, we preserved the monolithic structure of the application core written in PHP. What we did was replicate the components and put a load balancer in front of them (e.g., for Apache or the MySQL Galera cluster). So, in a way, we created a microservice for each type of component, with a layer of load balancers in front. This is not the "functional" microservice decomposition advocated by Lewis and Fowler [11]; however, we showed experimentally that for all the purposes of resilience and elasticity it still works. Where it would not work is in fostering and simplifying development by multiple development teams (each catering for one or more microservices as a product) in parallel. This for us means that the microservices idea is actually more a way to scale the development process itself rather than the running application.

We used Etcd for component discovery, for instance for the Galera nodes to find each other, the load balancers to find backends, and Apache to find the Galera cluster endpoint and Memcached instances. Breaking the application at least into microservices based on component types would in hindsight have been a cleaner option.

One of the negative effects of having automatic component reconfigurations upon changes of the component endpoints registered in Etcd is that circular dependencies would cause ripple effects propagating through most components. This for instance happened when we initially replaced a single MySQL instance with a set of MySQL Galera nodes that needed to self-discover. A much more elegant solution is to put one or more load balancers in front of every microservice and register them as the endpoint for a service. An even better solution is using the concept of services and an internal DNS to sort out service dependencies, as done in Kubernetes. This solution does not even require reconfigurations upon failures.

A very positive aspect of our implementation is that we have a self-managing solution that now works seamlessly in OpenStack, AWS, and Vagrant. The internal monitoring stack can be easily reused for other applications, and the decomposition in Docker containers allowed us to hit the ground running when starting our porting of the solution to Kubernetes, which is our ongoing work.

Another aspect to notice is that when we started our implementation work, container managers were in their infancy and we had to build a solution based on IaaS (managing VMs and their clustering) rather than directly using APIs to manage sets of containers. Already now, the availability of container managers has improved, and we expect the commercial market to grow fast in this segment. If one is being charged per VM in an IaaS model, then auto-scaling only containers does not mean reducing costs. In practice what can be done is using, for example in AWS, AWS AutoscalingGroups for VMs and custom metrics generated from within the application to trigger the instantiation and removal of a VM. The work is conceptually simple, but we did not implement it yet and are not aware of existing re-usable solutions.

Although our own experience using Fleet to implement the proposed solution was somewhat difficult, we can already report on the excellent results we are having by porting the entire architecture to Kubernetes. It is still work in progress, but the whole work basically amounts to converting Fleet unit files into replication controller and service descriptors for Kubernetes, with no need for component discovery since "services" are framework primitives. All in all, the availability of more mature container management solutions will only simplify the adoption of microservices architectures.

6. Related work

To the best of our knowledge, the work in [1] that we extended here was the first attempt to bring management functionalities within cloud-based applications, leveraging orchestration and the consensus algorithm offered by distributed service configuration and discovery tools to achieve stateless and resilient behavior of management functionalities according to cloud-native design patterns. The idea builds on the results of, and can benefit from, a number of research areas, namely cloud orchestration, distributed configuration management, health management, auto-scaling, and cloud development patterns.

We already discussed the main orchestration approaches in the literature, as this work reuses many of the ideas from [13]. With respect to the practical orchestration aspects of microservices management, a very recent implementation34 adopts a similar solution to what we proposed in our original position paper. We had some exchanges of views with the authors, but are not aware whether our work had zero or even minimal influence on
the Autopilot35 cloud-native design pattern recently promoted by Joyent. Either way, we consider the fact that other independent researchers came up with a very similar idea an encouraging sign for our work.

Several tools provide distributed configuration management and discovery (e.g., Etcd, ZooKeeper, Consul). From the research perspective, what is more relevant to this work is the possibility of counting on a reliable implementation of the consensus algorithm.

Much of the health management functionality described in the paper is inspired by Kubernetes [19], although to the best of our knowledge Kubernetes was originally "not intended to span multiple availability zones".36 Ubernetes37 is a project aiming to overcome this limit by federation.

A set of common principles concerning automated management of applications are making their way into container management and orchestration approaches (e.g., Kubernetes, Mesos,38 Fleet, Docker-compose39), with the identification, conceptualization, and instantiation of management control loops as primitives of the underlying management API. To give a concrete example, "replication controllers" in Kubernetes are a good representative of this: "A replication controller ensures the existence of the desired number of pods for a given role (e.g., 'front end'). The autoscaler, in turn, relies on this capability and simply adjusts the desired number of pods, without worrying about how those pods are created or deleted. The autoscaler implementation can focus on demand and usage predictions, and ignore the details of how to implement its decisions" [20]. Our proposed approach leverages basic management functionalities where present, but proposes a way to achieve them as a part of the application itself when deployed on a framework or infrastructure that does not support them. Moreover, we target not only the atomic service level, managing components (akin to what Kubernetes does for containers), but also the service composition level, managing multiple microservice instances. In [20], the authors also advocate control of multiple microservices through choreography rather than "centralized orchestration" to achieve emergent behavior. In our minds, once applications are deployed across different cloud vendors, orchestration (albeit with distributed state as we propose) is still the only way to achieve a coherent coordinated behavior of the distributed system.

Horizontal scaling and the more general problem of quality of service (QoS) of applications in the cloud have been addressed by a multitude of works. We reported extensively on the self-adaptive approaches in [21] and here give only a brief intuition of the most relevant ones. We can cite the contributions from Nguyen et al. [22] and Gandhi et al. [14], which use respectively a resource pressure model and a model of the non-linear relationship between server load and number of requests in the system, together with the maximum load sustainable by a single server, to allocate new VMs. A survey dealing in particular with the modeling techniques used to control QoS in cloud computing is available in [23]. With respect to the whole area of auto-scaling and elasticity in cloud computing, including the works referenced from the surveys cited above, this work does not directly address the problem of how to scale a cloud application to achieve a specific quality of service. Works in current and past elasticity/auto-scaling literature focus either on the models used or on the actual control logic applied to achieve some performance guarantees. In [1] we propose an approach that deploys the management (e.g., auto-scaling) functionalities within the managed application. This not only falls in the category of self-*/autonomic systems applied to auto-scaling surveyed in [21] (the application becomes self-managing), but, with respect to the state of the art, brings the additional (and cloud-specific) contribution of making the managing functionalities stateless and resilient according to cloud-native design principles. In this respect, the works listed above are related just in desired functionality, but not relevant to the actual contribution we claim, as any of the scaling mechanisms proposed in the literature can be used to perform the actual scaling decision.

Finally, considering cloud patterns and work on porting legacy applications to the cloud, the work of [24] is worth considering when addressing the thorny problem of re-engineering the database layer of existing applications to achieve scalable cloud deployment. In this respect, in our implementation work we just migrated a single MySQL node into a multi-master cluster whose scalability is still limited in the end.

7. Conclusion

In this experience report article, we have introduced an architecture that leverages the concepts of cloud orchestration and distributed configuration management with consensus algorithms to enable self-management of cloud-based applications. More in detail, we build on the distributed storage and leader election functionalities that are commonly available tools in current cloud application development practice to devise a resilient and scalable managing mechanism that provides health management and auto-scaling functionality for atomic and composed services alike. The key design choice enabling resilience is for both functionalities to be stateless, so that in case of failure they can be restarted on any node, collecting shared state information through the configuration management system.

Concerning future work, we plan to extend the idea to incorporate the choice of geolocation and multiple cloud providers in the management functionality. Another aspect we look forward to tackling is that of continuous deployment management, including adaptive load routing.

Acknowledgments

This work has been partially funded by an internal seed project at ICCLab40 and the MCN project under Grant No. [318109] of the EU 7th Framework Programme. It has also been supported by an AWS in Education Research Grant award, which helped us to run our experiments on a public cloud.

Finally, we would like to acknowledge the help and feedback from our colleagues Andy Edmonds, Florian Dudouet, Michael Erne, and Christof Marti in setting up the ideas and implementation. A big hand for Özgür Özsu, who ran most of the experiments and collected all the data during his internship at the lab.

34 https://www.joyent.com/blog/app-centric-micro-orchestration [retrieved on 2016.06.10].
35 http://autopilotpattern.io/.
36 https://github.com/GoogleCloudPlatform/kubernetes/blob/master/DESIGN.md [retrieved 03/03/2015].
37 https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/federation.md.
38 http://mesos.apache.org.
39 https://docs.docker.com/compose.
40 http://blog.zhaw.ch/icclab/.

References

[1] G. Toffetti, S. Brunner, M. Blöchlinger, F. Dudouet, A. Edmonds, An architecture for self-managing microservices, in: V.I. Munteanu, T. Fortis (Eds.), Proceedings of the 1st International Workshop on Automated Incident Management in Cloud, AIMC@EuroSys 2015, Bordeaux, France, April 21, 2015, ACM, ISBN: 978-1-4503-3476-1, 2015, pp. 19–24. http://dx.doi.org/10.1145/2747470.2747474.
[2] B. Wilder, Cloud Architecture Patterns, O'Reilly, 2012.