Ad hoc cloud computing
Abstract: This paper presents the first complete, integrated and end-to-end solution for ad hoc cloud computing environments. Ad hoc clouds harvest resources from existing sporadically available, non-exclusive (i.e. primarily used for some other purpose) and unreliable infrastructures. In this paper we discuss the problems ad hoc cloud computing solves and outline our architecture, which is based on BOINC.

Keywords: cloud computing; ad hoc; virtualization; volunteer computing; reliability

I. INTRODUCTION

This paper introduces and develops a prototype of an ad hoc cloud computing framework. Ad hoc clouds harvest resources from existing sporadically available, non-exclusive (i.e. primarily used for some other purpose) and unreliable infrastructures. Examples of such infrastructures range from individual users with a number of underutilized computers, through startup companies, to large-scale organizational infrastructures.

Providing a cloud service by harvesting resources from a set of unreliable hosts has elements in common with Grid and volunteer computing. Despite these similarities to volunteer and Grid computing, as well as to clouds (e.g. Amazon EC2 [1]) and clusters (e.g. HTCondor [2]), the ad hoc cloud computing paradigm has many key differences. The ad hoc cloud model:
• Volunteer resources: Operates over a set of non-exclusive and sporadically available hosts, which may be unpredictable in nature. This is in contrast to offering a service from a dedicated cloud, cluster or Grid infrastructure where each host’s resources are fully committed to the service.
• Lack of trust: Does not assume a level of trust exists between an end-user and the infrastructure provider; a relationship that currently exists between end-users, clouds, clusters and Grids.
• Ensures continuity: Maintains service availability in the presence of host membership churn or failure to ensure job continuity when running over a set of unreliable hosts.
• Low interference: Does not interfere with executing host processes, especially in cases where these important processes consume a varying amount of resources at any given time.
• Diverse workloads: Targets a set of more diverse applications such as memory, I/O and disk-intensive tasks as opposed to typical CPU-intensive applications commonly executed by volunteer computing systems.

Our research has developed solutions to each of the challenges above, and to our knowledge no other research in this field has presented such a complete, integrated and end-to-end solution for ad hoc cloud computing environments. In this paper we detail the research challenges and solutions of developing an ad hoc cloud computing prototype. We focus primarily on our cloud continuity solution, but also explore possible solutions for the remaining problems.

For simplicity and brevity in this paper, we assume that our implementation will be predominantly deployed on Local Area Networks to ensure reasonable security and performance guarantees; we aim to extend it to Wide Area Networks after evaluating and optimizing the architecture. Consequently, applications that require extremely high levels of security may not be suited to the ad hoc cloud, or perhaps any cloud implementation. Furthermore, interactive applications or those that write to external dependencies may not function as expected due to data inconsistencies when a host fails abruptly. Such applications may need further reliability mechanisms; however, approaches exist to solve such problems [3].

The rest of this paper is organized as follows: Section 2 provides an overview of the concepts and foundations of ad hoc cloud computing followed by an in-depth feature and implementation overview of our platform in Section 3. Section 4 briefly reports our initial evaluation while Section 5 outlines related research. Section 6 concludes with a summary and plans for future work.

II. CONCEPTS AND FOUNDATIONS

We now discuss the architecture, components and processes of the ad hoc cloud computing platform.

A. System Overview

An ad hoc cloud harvests resources from existing non-exclusive and sporadically available hosts used by host users (e.g. company employees) and exposes these resources to cloud jobs submitted by cloud users. Cloud jobs are submitted to the ad hoc server which then schedules jobs
[Figure 2: V-BOINC Overview. Components shown: V-BOINC Server, BOINC Server, and a Host Machine running the V-BOINC Client and a Virtual Machine. Labelled steps: 1. Request VM; 1.1 Get Disk Dependencies; 2. VM and Script; 3. Create/Attach Disk, Mount + Setup; 4. Start VM; 5. Get Job; 6. Executable/Data; 7. Job Result.]

is first instructed to request a virtual machine image (1) from the V-BOINC project. Concurrently, the V-BOINC Client probes the regular BOINC server to determine if any dependencies exist for the specified project (1.1); these can be downloaded and attached to the virtual machine. This gives the ad hoc cloud the ability to execute a set of diverse workloads that have dependencies (e.g. MPI, data sets, etc.); however, in the context of this paper, we omit these details.

The V-BOINC Server sends the virtual machine image and a script that configures it (e.g. sets CPU, memory and disk space limits) to the V-BOINC Client (2). The virtual machine is then configured (3) and started (4) to allow it to request (5) and receive (6) BOINC jobs and return job results (7). More information about V-BOINC can be found in [5].

III. FROM VOLUNTEER TO CLOUD COMPUTING

We now describe how the ad hoc cloud was developed and the major components that underpin the concept. The basis of this work involves transforming V-BOINC into a platform that not only takes into account the requirements of a cloud computing system but can also operate over an unreliable infrastructure.

As such, all subsequent modifications and features mentioned have been made either to the V-BOINC Server or V-BOINC Client, which are now named the ad hoc server and ad hoc client respectively. To ensure the ad hoc server does not become a single point of failure, the server can be replicated and load balanced in the same way regular BOINC servers currently are. We also assume that an ad hoc client and virtual machine both reside on each host within the ad hoc infrastructure, i.e. processes (1) and (2) of Figure 2 have been completed.

A. BOINC Job Submission

To utilize the resources available in the ad hoc cloud, a user must submit a job to the service; we currently assume a job is an application executable with an option to upload data to be analysed. In the case of V-BOINC, a virtual machine obtains an application by connecting to a specific BOINC project; BOINC project jobs are, however, statically created before the BOINC service begins. In contrast, we would like cloud users to submit any job at any moment in time, hence the ad hoc cloud must allow user-defined jobs to be submitted on-the-fly to BOINC while the service is running. This is not a trivial task, and other research has sought to enable job submission to BOINC (e.g. [6], [7], [8]); however, these methods either split single tasks into independent jobs to be executed or would generate too much overhead for this type of platform.

To enable on-the-fly and independent job submission to BOINC, we created a BOINC project named Job Service upon the ad hoc server to accept and distribute jobs. This is in addition to the V-BOINC project, which is modified and renamed to VM Service, that allows hosts within the ad hoc platform to obtain virtual machines. Figure 3 shows how both the job and virtual machine BOINC projects interact.

[Figure 3: The Ad hoc Server Architecture. Components shown: Ad hoc Server with a BOINC Interface, the Job Service (jobs/ folder, WC daemon, Scheduler), the VM Service (VC and AC daemons, Scheduler), a DB, and a VM Host.]

First, a cloud user uploads an application and optional data via a web interface. All files uploaded to the service are placed in a jobs/ folder within the Job Service project and are then processed by our work creator (WC) daemon; BOINC allows project developers to create and add daemons to the BOINC default daemon set to perform user-defined actions. The work creator daemon distinguishes application from data, creates XML descriptions of these files and calls BOINC API functions to ultimately create a BOINC workunit.

The Job Service then informs the VM Service that a cloud job exists and that the VM Service’s vm controller (VC) daemon can begin instantiating a virtual machine upon a volunteer host to execute this job. Currently, failed volunteer hosts are determined by the availability checker (AC) daemon, which deems a host failed if it does not poll within two minutes; a host is set to periodically poll the server every minute.

B. Reliability Scheduling

Upon being notified that a job awaits execution, the VM Service begins scheduling the job to the most reliable host with a virtual machine ready to be used. The scheduler does this based on the following characteristics for each host:
1) The total number of cloud jobs previously assigned,
2) The total number of cloud jobs previously completed,
3) The number of host failures, e.g. host termination, hardware or OS failures,
4) The number of guest failures, e.g. virtual machine configuration, instantiation, execution, and shutdown errors,
5) The current resource load.

The reliability factors (1)-(3) are monitored by the ad hoc server and are recorded in the Job Service database. The number of ad hoc host failures is monitored by the VM Service’s availability checker daemon, which sets an ad hoc host to terminated or failed after two minutes of inactivity. The reliability factors (4)-(5) are monitored by the ad hoc

detach from a project, a host user is free to do so. However, within an ad hoc cloud platform, the server has to instruct the ad hoc client to perform tasks, in turn transforming a client-controlled infrastructure into a server-controlled infrastructure; we have modified our V-BOINC platform to allow this. The ad hoc client receives and issues these commands via the ad hoc BOINC client component to other middleware components shown in Figure 4.

[Figure 4: labels shown include Ad hoc Client, Downloaded Virtual Machine, Virtual Machine, BOINC Core Client, Application, BOINC Task and Dependencies.]
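The availability rule of the availability checker daemon and the five per-host characteristics listed in Section III-B can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the prototype's code: `HostRecord`, `is_available`, `illustrative_score` and `pick_host` are hypothetical names, and the scoring function is an assumed placeholder because the host reliability formula itself is not reproduced in this paper. Only the one-minute poll interval and the two-minute failure timeout come from the text.

```python
from dataclasses import dataclass

POLL_INTERVAL_S = 60     # from the text: a host polls the server every minute
FAILURE_TIMEOUT_S = 120  # from the text: no poll within two minutes => failed


@dataclass
class HostRecord:
    """Per-host bookkeeping; all field names are hypothetical."""
    name: str
    last_poll: float         # time of the most recent poll, in seconds
    jobs_assigned: int = 0   # characteristic (1)
    jobs_completed: int = 0  # characteristic (2)
    host_failures: int = 0   # characteristic (3)
    guest_failures: int = 0  # characteristic (4)
    load: float = 0.0        # characteristic (5), as a fraction in [0, 1]
    vm_ready: bool = True    # a virtual machine is ready to be used


def is_available(host: HostRecord, now: float) -> bool:
    """A host is deemed failed once it has not polled for two minutes."""
    return (now - host.last_poll) <= FAILURE_TIMEOUT_S


def illustrative_score(host: HostRecord) -> float:
    """Assumed placeholder score, NOT the paper's reliability formula.

    Rewards a high completion ratio and penalises failures and load.
    """
    completion = (host.jobs_completed + 1) / (host.jobs_assigned + 1)
    failures = host.host_failures + host.guest_failures
    return completion * (1.0 - host.load) / (1 + failures)


def pick_host(hosts, now):
    """Return the best-scoring available host with a ready VM, or None."""
    candidates = [h for h in hosts if h.vm_ready and is_available(h, now)]
    return max(candidates, key=illustrative_score, default=None)


if __name__ == "__main__":
    now = 1030.0
    hosts = [
        HostRecord("a", last_poll=1000.0, jobs_assigned=10, jobs_completed=9, load=0.2),
        HostRecord("b", last_poll=1000.0, jobs_assigned=10, jobs_completed=5,
                   host_failures=2, guest_failures=1, load=0.1),
        HostRecord("c", last_poll=800.0, jobs_assigned=10, jobs_completed=10),
    ]
    chosen = pick_host(hosts, now)
    print(chosen.name if chosen else "no host available")
```

In this sketch host `c` has the best history but is excluded because it last polled more than two minutes ago, which mirrors how a failed host would drop out of scheduling regardless of its past reliability.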
In order to ensure the system state is up-to-date at any given time, each host periodically polls the ad hoc server to signify the host’s and virtual machine’s availability. The latter is tested via the Failure Detection component; an independent process to test virtual machine availability. In response to a host poll, the ad hoc server returns a list of all other available hosts in the BOINC XML message along with their IP addresses and reliability values.

D. Making the Unreliable Reliable

These reliability values (discussed in the previous section) are used by our P2P Snapshot component, shown in Figure 4, which periodically takes virtual machine snapshots and transfers them to other clients in parallel using pssh [10] to ensure cloud job continuity. After a successful transfer, the ad hoc server is informed of the receiving hosts, i.e. the location(s) where the snapshot is now stored. When a host or virtual machine fails, the server is then able to instruct one of the receiving hosts to restore the snapshot. Take Figure 5 as an example, where each host’s percentage failure probability is displayed.

[Figure 5: Snapshot Scheduling and Failure Probabilities (%). Hosts running virtual machines A to G, labelled with failure probabilities; values shown: 75%, 60%, 7%, 5%, 21%, 53%.]

We have seven nodes A to G. At time t=0 node A checkpoints the virtual machine and the snapshot is sent to nodes B, D and E at t=1. Virtual machine A terminates due to a host failure (t=2), the ad hoc server detects this (t=3) and instructs node D to restore virtual machine A’s snapshot (t=4). This process is repeated for each node, with periodic snapshots taken and sent to others. Snapshots are, however, not blindly pushed to a random subset of hosts.

The P2P Snapshot component schedules snapshots to be stored on the most reliable hosts. It does this by filtering the potential snapshot receivers based on whether they are in use, the sender’s cloudlet membership and the descending reliability of potential receivers, all of which are taken from the list of available hosts within the BOINC XML message sent from the server. The algorithm then selects the first n hosts that have less than a 5% chance of all n failing.

To maintain a reliable service, we want to ensure that a cloud job completes successfully 95% of the time. To satisfy this requirement, we must always have at least one current snapshot present on another host, as its presence is directly related to the future success of an application if it is interrupted in any way. As such, we require that the combined probability of failure of the group of hosts storing a particular virtual machine’s snapshot is less than or equal to 5%; assuming hosts fail independently, this can be calculated by multiplying the respective failure probabilities of each host. For example, Figure 5 shows that the probability of a cloud job never completing while running on virtual machine A is 0.03%.

This scheduling method does, however, mean that reliable destinations may end up storing many snapshots. The maximum host storage that can be used by the ad hoc cloud (e.g. the ad hoc client, snapshots, etc.) can nevertheless be specified by the ad hoc host user via regular BOINC. In the event this limit is reached, the ad hoc server does not send the details of that host to polling ad hoc clients, ensuring further snapshots are not sent to the host. We note there are many improvements that could be made to our P2P Snapshot Scheduler; however, we leave these for future work.

In the event a virtual machine or host running a cloud job is deemed unreachable, the server begins the process of restoring the virtual machine’s snapshot on another host. It chooses this host using the same host reliability formula used when selecting the best host for initially deploying a job onto a virtual machine. Finally, all hosts that store the restored snapshot are instructed to delete it.

IV. EVALUATION

The prototype of our ad hoc cloud computing concept was evaluated in terms of reliability and performance. Our reliability experiment tested our prototype running on 30 nodes of the EDIM1 cluster [11]. In order to accurately simulate an unreliable infrastructure, we obtained Nagios monitoring data over a period of 36 months from 650 hosts in The School of Informatics at The University of Edinburgh. We parsed this monitoring data, calculated the host activity for every hour and selected the hour where 30 hosts had the most activity.

We replayed these events on EDIM1 with our ad hoc cloud installed and measured the completion rates of a variety of workloads; our prototype achieved up to 93.3% reliability. This is a significant improvement over BOINC, which simply restarts a failed task unless application checkpointing is enabled, which in turn requires application modification and the checkpoint to be present in memory.

In order to evaluate performance, we compared the times of executing a cloud job on our ad hoc cloud and on an Amazon EC2 [1] instance with similar resources. We showed that our ad hoc cloud can offer similar performance for a variety of workloads, even in the event of one or multiple ad hoc guest failures, when taking into account the various overheads of
both models. More information about our evaluation of the ad hoc cloud can be found in [9].

V. RELATED WORK

The authors of [12] propose the concept of the ad hoc cloud within enterprise settings to harness unused resources to improve overall utilization, reduce net energy consumption and allow organizations to take advantage of operating their own in-house cloud. The focus of their paper is to outline the major implementation challenges and describe one approach to creating an ad hoc cloud computing infrastructure. The main challenges outlined relate to coping with the sporadically available hosts and how to minimize the impact on non-cloud processes to an acceptable level.

Chandra et al. propose a similar idea using Nebulas (synonymous with an ad hoc cloud) where volunteer resources are used to create a cloud platform [13]. They note that Nebulas are particularly useful for applications that do not require strong performance guarantees, and hence the authors focus on the performance and reliability of such platforms.

Sundarrajan et al. describe their early experience with a prototype of their Nebula cloud system [14]; this was tested using a data-intensive blog analysis use case on PlanetLab [15]. Their architecture consists of the same core components found in our system, namely a master server to coordinate all activities backed by a database and clients to execute cloud jobs. The authors do not use virtual machines to execute cloud jobs but instead use the NativeClient plugin for web browsers to execute code as web applications. Their method shows that, at times, little overhead can be seen when executing code in this way.

VI. CONCLUSIONS

We have outlined our ad hoc cloud computing platform, which deploys a cloud service upon an end-user’s existing infrastructure where member hosts are sporadically available and used for some other primary purpose. The ad hoc cloud concept is useful for those who wish to improve their infrastructure efficiency and utilization as well as reduce costs by improving their return on IT investments. Furthermore, those who are not able to or do not wish to migrate to the commercial or private cloud models can experiment and explore the potential of ad hoc clouds before adopting either of the commercial or private models.

In order to successfully develop an ad hoc cloud computing platform, a large number of technological and research fields must be drawn upon, such as virtualization, volunteer computing and scheduling. Despite this, we showed by outlining our architecture that the concept of ad hoc cloud computing is feasible and, based on our initial evaluation, can be reliable and offer comparable performance to Amazon EC2. We are confident that the reliability and performance of our ad hoc cloud development can increase in a range of other scenarios; however, these extrapolations are based on assumptions that need to be tested by deploying the ad hoc cloud on a live operational infrastructure with real workloads in the near future. A more detailed insight into our ad hoc cloud computing prototype, the challenges solved and performance can be found in [9].

REFERENCES

[1] “Amazon EC2,” http://aws.amazon.com/ec2/, accessed: February 2015.
[2] D. Thain et al., “Distributed Computing in Practice: The Condor Experience,” Concurrency - Practice and Experience, vol. 17, no. 2-4, pp. 323–356, 2005.
[3] B. Cully et al., “Remus: High Availability via Asynchronous Virtual Machine Replication,” in Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation, ser. NSDI’08. Berkeley, CA, USA: USENIX Association, 2008, pp. 161–174.
[4] D. P. Anderson, “BOINC: A System for Public-Resource Computing and Storage,” in 5th IEEE/ACM International Workshop on Grid Computing, 2004, pp. 4–10.
[5] G. McGilvary et al., “V-BOINC: The Virtualization of BOINC,” in CCGrid 2013, Delft, The Netherlands, May 2013.
[6] G. Ríos et al., “Legion: An Extensible Lightweight Framework for Easy BOINC Task Submission, Monitoring and Result Retrieval using Web Services,” in Proceedings of the Latin American Conference on High Performance Computing, 2011.
[7] E. Urbah et al., “EDGeS: Bridging EGEE to BOINC and XtremWeb,” Journal of Grid Computing, vol. 7, pp. 335–354, 2009.
[8] A. Kertész et al., “Multi-level brokering solution for interoperating service and desktop grids,” in Proceedings of the 2010 Conference on Parallel Processing, ser. Euro-Par 2010. Berlin, Heidelberg: Springer-Verlag, 2011, pp. 271–278.
[9] G. McGilvary, “Ad hoc cloud computing,” Ph.D. dissertation, The University of Edinburgh, 2014.
[10] “Pssh website,” February 2015. [Online]. Available: http://code.google.com/p/parallel-ssh/
[11] P. Martin et al., “EDIM1 Progress Report,” EPCC, Tech. Rep., 2011.
[12] G. Kirby et al., “An Approach to Ad-Hoc Cloud Computing,” University of St Andrews, Whitepaper, 2010.
[13] A. Chandra et al., “Nebulas: Using Distributed Voluntary Resources to Build Clouds,” in Proceedings of the 2009 Conference on Hot Topics in Cloud Computing, ser. HotCloud’09. Berkeley, CA, USA: USENIX Association, 2009.
[14] J. B. Weissman et al., “Early Experience with the Distributed Nebula Cloud,” in Proceedings of the Fourth International Workshop on Data-intensive Distributed Computing, ser. DIDC ’11. New York, NY, USA: ACM, 2011, pp. 17–26.
[15] “PlanetLab,” http://www.planet-lab.org/, accessed: February 2015.