
2013 IEEE 27th International Symposium on Parallel & Distributed Processing Workshops and PhD Forum

Towards Efficient Mapping, Scheduling, and Execution of HPC Applications on Platforms in Cloud

Abhishek Gupta (4th year Ph.D. student) and Laxmikant V. Kalé
University of Illinois at Urbana-Champaign
Urbana, IL 61801, USA
{gupta59, kale}@illinois.edu

Abstract—The advantages of the pay-as-you-go model, elasticity, and the flexibility and customization offered by virtualization make cloud computing an attractive option for meeting the needs of some High Performance Computing (HPC) users. However, there is a mismatch between cloud environments and HPC requirements. The poor interconnect and I/O performance in cloud, HPC-agnostic cloud schedulers, and the inherent heterogeneity and multi-tenancy in cloud are some bottlenecks for effective HPC in cloud. Our primary thesis is that cloud is suitable for some HPC applications, not all, and that for those applications, cloud can be more cost-effective than typical dedicated HPC platforms when using intelligent application-to-platform mapping, HPC-aware cloud schedulers, and a cloud-aware HPC execution and parallel runtime system. To address these challenges, and to exploit the opportunities offered by HPC clouds, we make the OpenStack Nova scheduler HPC-aware and the Charm++ parallel runtime system cloud-aware. We demonstrate that our techniques result in significant improvements in cost (up to 60%), performance (up to 45%), and throughput (up to 32%) for HPC in cloud, helping cloud users gain confidence in the capabilities of cloud for HPC, and cloud providers run a more profitable business.

Keywords—Cloud; High Performance Computing; Scheduling; Placement; Runtime system; Virtual machines

I. INTRODUCTION AND RELATED WORK

Cloud computing has recently emerged as a cost-effective alternative to dedicated infrastructure for HPC applications. Running an application in cloud avoids the long lead time, high capital expenditure, and large operational costs associated with a dedicated HPC infrastructure [1]. In addition, the ability to provision HPC resources on demand with high elasticity reduces the risks caused by under-provisioning and the underutilization of resources caused by over-provisioning. Finally, the built-in virtualization support in the cloud offers an alternative way to provide flexibility, customization, security, and resource control to the HPC community.

However, despite these benefits, there is a mismatch between the requirements of HPC and the characteristics of current cloud environments [1–4]. Most HPC applications consist of tightly coupled parallel processes which perform frequent communication and synchronization. Dominant challenges for HPC in cloud are shown in Figure 1 and include the following: the absence of low-latency, high-bandwidth interconnects in clouds, network and I/O virtualization overhead, hardware heterogeneity, cross-application interference arising from multi-tenancy, and HPC-agnostic cloud schedulers [1–4].

While the outcome of these studies paints a rather pessimistic view of HPC clouds, recently there have been efforts towards HPC-optimized clouds (such as Amazon Cluster Compute [5] and the DoE Magellan project [1,3,6]), HPC-aware cloud schedulers [7, 8], and topology-aware mapping of application virtual machines (VMs) to the physical topology [9]. These efforts point to a promising direction for overcoming some of the fundamental inhibitors. However, much work remains to be done, and today only embarrassingly parallel or small-scale HPC applications can be run efficiently in cloud [1–4].

In this thesis, outlined in Figure 1, we take a more holistic approach than past research. First, besides addressing the challenges of running HPC applications in cloud, we also explore the opportunities offered by cloud for HPC. Second, our research is aimed at improving HPC performance, resource utilization, and cost when running in cloud, and hence benefits both users and cloud providers. Finally, with the objective of providing a set of techniques to bridge the gap between HPC and clouds, we adopt a threefold complementary approach:

• Mapping applications to platforms in cloud intelligently: Through comprehensive performance evaluation and analysis, we identify which application and platform characteristics are crucial for the selection of a platform for a particular application. We conclude that a hybrid supercomputer-cloud approach can be more cost-effective than running all applications on a dedicated supercomputer or all in cloud [4, 10]. (§II)

• Making cloud schedulers and VM placement HPC-aware: We propose and demonstrate techniques for application-aware consolidation and placement of VMs on physical machines. Through topology-awareness, heterogeneity-awareness, cross-VM interference accounting, and careful co-location of application VMs with complementary execution profiles, we achieve significant improvements in performance and resource utilization [11, 12]. (§III)

• Making HPC execution and runtime cloud-aware: We address the challenges of heterogeneity and multi-tenancy in cloud through dynamic redistribution of parallel tasks (Charm++ [13, 14] objects or AMPI [14] threads) to VMs [15, 16]. We also explore the use of malleable jobs to benefit from the inherent elasticity of cloud. (§IV)

[Fig. 1: Thesis overview. Performance and cost evaluation of HPC in cloud exposes challenges/bottlenecks (poor network performance, heterogeneity, multi-tenancy, security, noise, commodity interconnect, virtualization overhead) and opportunities (pay-as-you-go/rent vs. own, elasticity, virtualization customization, VM consolidation, thin VMs/containers). These are addressed through three thrusts: MAPPING (mapping applications to platforms), SCHEDULING/PLACEMENT (application-aware cloud schedulers, i.e., HPC-aware clouds), and EXECUTION (cloud-aware HPC load balancer and malleable parallel jobs with runtime shrink/expand, i.e., cloud-aware HPC).]

[Fig. 2: Methodology. (a) Mapping concept: intelligent mapping of applications to platforms, ranging from a commodity cluster and a cloud with virtualization to a supercomputer. (b) Co-locating VMs from different applications on a physical node (application-aware scheduling): normalized performance of the first and second application of each combination (EP-ChaNGa, LU-IS, LU-ChaNGa), relative to dedicated execution. (c) Load balancer mitigating static and dynamic heterogeneity: objects are migrated from a VM overloaded by a background/interfering VM running on the same host to an under-loaded VM on another physical host.]

II. PERFORMANCE AND MAPPING OF HPC IN CLOUD

The primary research challenge that we address here is: rather than running all applications on a single platform (in-house or cloud), would it be more cost-effective to leverage multiple platforms (dedicated and in the cloud), and if so, how? To answer this question, we evaluated the performance and cost of running a set of HPC benchmarks (the NPB benchmarks [17]) and some real-world applications, such as NAMD [18], ChaNGa [19], and Sweep3D [20], on a range of platforms: a supercomputer, an HPC-optimized cluster, a private cloud, and a public cloud. These platforms have different interconnects, operating systems, and virtualization. Our results, presented in [4], show that cloud can be cost-effective compared to supercomputers at small scale or for applications which are less communication-intensive.

Based on this observation, we proposed a tool for mapping applications to platforms in cloud using application characteristics such as communication intensiveness and sensitivity to noise. Instead of considering cloud as a substitute for the supercomputer, we investigated the co-existence of supercomputer and cloud (see Figure 2a). We follow a two-step methodology:

1) Characterize an application using theoretical models, previous instrumentation, or simulation to generate an application signature that captures the application's communication profile, grain size, and problem size; and 2) use heuristics to select a suitable platform from a given set for the application, based on the application signature, platform characteristics, and user preferences. In [10], we provided a proof-of-concept of this approach and evaluated the associated benefits of a smart mapping tool. Through simulation using simple regular applications, we showed that in a concrete scenario with a supercomputer (Ranger [21]) and a Eucalyptus-based cloud as the two available platforms, our scheme reduces cost by 60% while limiting the performance penalty to 10-15% compared to a non-optimized configuration.

Characterizing an HPC application and predicting its performance is challenging and has been extensively researched. Run-time instrumentation, event tracing, and curve-fitting based performance-modeling approaches have been explored [22–24]. Our objective in this thesis is not to perform extensive application characterization, but to discover the most important dimensions for the purpose of mapping applications to platforms. Through our research, we have demonstrated that significant benefits can be achieved by using an intelligent tool and a combination of multiple platforms, compared to a single platform or naive mapping. We believe that our approach can be extended to complex applications, such as those with irregular communication patterns and multiple phases.
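As a concrete illustration of the second step, the sketch below scores each platform with a toy performance model derived from the application signature and picks the cheapest platform that stays within a user-specified slowdown tolerance. The signature fields, platform numbers, and penalty weights are illustrative assumptions for this summary, not the actual model or tool from [10]; in the real tool the signature comes from instrumentation or simulation rather than hand-written constants.

```python
# Hypothetical sketch of the two-step mapping: an application signature
# (step 1) feeds a heuristic platform selector (step 2).
from dataclasses import dataclass

@dataclass
class AppSignature:
    comm_fraction: float      # fraction of time spent communicating (0..1)
    grain_size_ms: float      # avg computation per task (unused by this toy model)
    noise_sensitivity: float  # susceptibility to OS/VM jitter (0..1)

@dataclass
class Platform:
    name: str
    latency_us: float         # point-to-point network latency
    jitter: float             # expected noise/interference level (0..1)
    cost_per_core_hour: float

def predicted_slowdown(app: AppSignature, p: Platform, base_latency_us=2.0):
    """Toy model: communication cost grows with latency relative to a fast
    interconnect, and noise-sensitive apps additionally pay for jitter."""
    comm_penalty = app.comm_fraction * max(0.0, p.latency_us / base_latency_us - 1.0)
    return 1.0 + comm_penalty + app.noise_sensitivity * p.jitter

def choose_platform(app, platforms, max_slowdown=1.15):
    """Pick the cheapest platform within the user's slowdown tolerance;
    fall back to the fastest platform if none qualifies."""
    ok = [p for p in platforms if predicted_slowdown(app, p) <= max_slowdown]
    if ok:
        return min(ok, key=lambda p: p.cost_per_core_hour)
    return min(platforms, key=lambda p: predicted_slowdown(app, p))

platforms = [
    Platform("supercomputer", latency_us=2.0, jitter=0.01, cost_per_core_hour=0.50),
    Platform("cloud", latency_us=120.0, jitter=0.10, cost_per_core_hour=0.10),
]
ep_like = AppSignature(comm_fraction=0.001, grain_size_ms=50.0, noise_sensitivity=0.1)
lu_like = AppSignature(comm_fraction=0.25, grain_size_ms=5.0, noise_sensitivity=0.6)
print(choose_platform(ep_like, platforms).name)  # -> cloud (cheap, little comm)
print(choose_platform(lu_like, platforms).name)  # -> supercomputer (comm-bound)
```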
III. HPC-AWARE CLOUD SCHEDULER

The second method we adopt to bridge the gap between HPC and cloud is to focus on cloud schedulers and explore opportunities to a) improve HPC performance in cloud and b) reduce HPC cost when running in cloud. Current strategies for placing VMs on physical machines are mostly HPC-agnostic; that is, they do not consider the intrinsic nature of HPC applications. An HPC application consists of n processes which communicate and synchronize frequently with each other during execution. However, in cloud, physical machines can be heterogeneous, and the achieved network performance between two physical nodes (also referred to as hosts) can vary significantly depending on the position of the nodes in the network topology. Hence, to obtain better performance, we modified the OpenStack [25] Nova scheduler to make it HPC-aware. OpenStack is a popular cloud management system. We evaluated the modified OpenStack Nova scheduler by setting up a cloud on the Open Cirrus test-bed [26] using KVM [27] as the hypervisor. In [11], we demonstrated performance improvements of up to 20% through topology- and hardware-awareness.
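The topology-aware part of this placement can be pictured as a greedy, rack-aware packing: put all VMs of one tightly coupled job on as few hosts and racks as possible, so its frequent communication crosses the fewest switches. The host/rack data model and greedy policy below are illustrative assumptions for this summary, not the actual OpenStack Nova scheduler changes from [11].

```python
# Hypothetical sketch of rack-aware placement for a tightly coupled HPC job.
from collections import defaultdict

def place_hpc_job(num_vms, hosts):
    """Return one host name per VM, filling the rack with the most free
    capacity first so the job spans the fewest racks (network switches)."""
    by_rack = defaultdict(list)
    for h in hosts:
        by_rack[h["rack"]].append(h)
    racks = sorted(by_rack.values(),
                   key=lambda hs: sum(h["free_cores"] for h in hs), reverse=True)
    placement, remaining = [], num_vms
    for rack in racks:
        # Within a rack, prefer the emptiest hosts to use as few as possible.
        for host in sorted(rack, key=lambda h: h["free_cores"], reverse=True):
            take = min(host["free_cores"], remaining)
            placement.extend([host["name"]] * take)
            remaining -= take
            if remaining == 0:
                return placement
    raise RuntimeError("not enough free cores for the job")

hosts = [{"name": "h1", "rack": "r1", "free_cores": 4},
         {"name": "h2", "rack": "r1", "free_cores": 2},
         {"name": "h3", "rack": "r2", "free_cores": 4}]
print(place_hpc_job(6, hosts))  # all 6 VMs stay inside rack r1
```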
We extended this HPC-aware scheduler to accomplish the second goal: making HPC execution more economical in cloud. To this end, we explored the opportunities and challenges of VM consolidation for HPC. Figure 2b illustrates this with a simple experiment, where we use two multi-core physical nodes (4-core, 8GB, 3 GHz each) of the Open Cirrus testbed at the HP Labs site. We use VMs with 1 vCPU and 2GB memory, and KVM as the hypervisor. The applications used in this experiment are the NPB [17] class B benchmarks (EP = Embarrassingly Parallel, LU = LU factorization, IS = Integer Sort) and ChaNGa [19] (cosmology). We first ran each application using all 4 cores of a node (dedicated mode), and then ran them in shared mode, where each node is shared by the two applications: 2 VMs of each application run on a node, 4 VMs total per application. Figure 2b shows the performance of both applications in shared execution, normalized with respect to dedicated execution, for different application combinations. Here, the x-axis label gives the application combination, and the first (similarly, second) bar corresponds to the first (second) application in the label. It is clear from Figure 2b that some application combinations achieve normalized performance close to one (EP-ChaNGa), while some co-locations result in a significant detrimental impact on the performance of one application (e.g., ChaNGa-IS: IS is communication-intensive, so it benefits from having all 4 of its VMs on the same node, as in dedicated mode, a benefit lost under sharing), whereas in the case of LU-ChaNGa, the interference actually results in a performance improvement. Investigation revealed that this is due to the large working set size of LU and the small working set size of ChaNGa, which means that the shared last-level cache is better utilized when the applications are run in shared mode [12].

We demonstrated that there are significant benefits to using a common pool of resources for applications with different characteristics (such as HPC vs. non-HPC, and varying communication, synchronization, and cache intensiveness), but cross-application interference is a major impediment to effective resource-based packing of HPC applications. To address this problem, we adopt the following approach: 1) characterize an application along two dimensions, how tightly coupled it is and its use of shared resources (such as cache) on a shared physical node; and 2) match applications whose execution profiles complement each other well, and place them on the same node to improve resource utilization. We implemented this approach on top of the existing OpenStack Nova scheduler and evaluated it in the same setup as above. Our results in [12] show that our techniques achieve 45% better performance while limiting jitter to 8% through cross-VM interference accounting. We also modified a popular cloud simulator, CloudSim [28], to make it HPC-aware. Simulation results using CloudSim showed that our application-aware consolidation technique can result in a 32% increase in throughput compared to default scheduling techniques.
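A minimal sketch of this matching step: score candidate pairs by predicted interference from their two-dimensional profiles and co-locate the most complementary ones. The profile values and penalty weights below are illustrative assumptions, not the measured characterizations from [12].

```python
# Hypothetical sketch of complementary-profile co-location scoring.
def colocation_penalty(a, b):
    """Two cache-hungry applications thrash the shared last-level cache;
    tightly coupled applications suffer most from any interference."""
    cache_contention = a["cache_use"] * b["cache_use"]
    coupling_cost = 0.5 * (a["tight_coupling"] + b["tight_coupling"])
    return cache_contention + coupling_cost

profiles = {
    "EP":     {"tight_coupling": 0.0, "cache_use": 0.1},
    "LU":     {"tight_coupling": 0.6, "cache_use": 0.9},
    "IS":     {"tight_coupling": 0.9, "cache_use": 0.4},
    "ChaNGa": {"tight_coupling": 0.5, "cache_use": 0.3},
}

def pick_partner(app_name):
    """Greedy matching: choose the co-resident application whose execution
    profile best complements the given one (lowest predicted interference)."""
    others = [n for n in profiles if n != app_name]
    return min(others, key=lambda n: colocation_penalty(profiles[app_name], profiles[n]))

print(pick_partner("LU"))  # -> EP: near-zero coupling and a tiny working set
```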

IV. CLOUD-AWARE HPC RUNTIME

The final approach that we follow is to adapt the HPC runtime to meet the needs of cloud environments. Our hypothesis is that the parallel runtime system should be able to adapt to the dynamic variations in the cloud execution environment, resulting in improved performance. In addition, by providing runtime support for dynamically expanding/shrinking parallel jobs, significant gains in resource utilization and cost savings can be achieved by leveraging cloud features such as variable pricing.

To validate our hypothesis, we investigate the adaptation of the Charm++ [13, 14] parallel runtime system to virtualized environments. HPC applications and runtimes are typically designed to run in a homogeneous, dedicated environment, whereas in cloud there is inherent static hardware heterogeneity, and multi-tenancy can result in dynamic heterogeneity (e.g., other applications' VMs entering and leaving a physical node). Heterogeneity, both static and dynamic, significantly degrades the performance of parallel applications, especially those which are iterative and bulk-synchronous. To minimize the impact of these factors on application performance, we designed and implemented a cloud-aware load balancer for HPC applications on top of the existing Charm++ load balancing framework. Our approach is based on decomposing the workload into medium-grained tasks called objects, which can be easily migrated by the runtime across processors (virtual cores in our case). The load balancing framework instruments the application execution and measures object and processor loads; idle times on VMs are also measured. It is assumed that there is very little variation in object loads across iterations, sometimes referred to as the principle of persistence. Hence, based on the measured statistics from previous iterations, we migrate load away from overloaded VMs to underloaded VMs. Figure 2c illustrates a situation with two physical hosts (nodes), where a VM from another application is running on the first host. Without load balancing, each host would be assigned an equal number of objects; since one of the HPC VMs has to time-share the CPU with the interfering VM, the load is imbalanced and the whole application slows down. Our load balancer detects this condition and migrates objects from overloaded to underloaded VMs based on an average-load calculation. Details of the algorithm can be found in [16]. We evaluated our techniques on a real cloud setup with up to 64 VMs. Our results in [15, 16] demonstrate performance benefits of up to 45% for scientific benchmarks and a real-world molecular dynamics application.

In the future, we plan to evaluate this load balancer at larger scale and to explore runtime support for malleable jobs. Adaptive MPI (AMPI) [14] can be used to bring these benefits of our dynamic runtime system to MPI [29] applications.
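The rebalancing step itself can be sketched as a greedy pass over the measured statistics: assuming the principle of persistence, per-object loads from the last iterations predict the next ones, so objects are moved off VMs whose effective load (including CPU stolen by interfering VMs) sits above the average. The data structures and the stopping conditions below are illustrative assumptions; the real algorithm lives inside the Charm++ load balancing framework and is detailed in [16].

```python
# Hypothetical sketch of measurement-based object migration across VMs.
def rebalance(vm_objects, vm_bg_load):
    """vm_objects: {vm: [(object_id, measured_load), ...]};
    vm_bg_load: {vm: CPU time lost to co-located interfering VMs}.
    Returns migrations as (object_id, src_vm, dst_vm) tuples."""
    total = {vm: sum(load for _, load in objs) + vm_bg_load[vm]
             for vm, objs in vm_objects.items()}
    avg = sum(total.values()) / len(total)
    migrations = []
    for src in sorted(total, key=total.get, reverse=True):
        # Move the smallest objects first to overshoot the average least.
        for obj, load in sorted(vm_objects[src], key=lambda x: x[1]):
            if total[src] <= avg:
                break  # src is no longer overloaded
            dst = min(total, key=total.get)  # most underloaded VM
            if total[dst] + load >= total[src]:
                break  # migrating would not reduce the imbalance
            migrations.append((obj, src, dst))
            total[src] -= load
            total[dst] += load
    return migrations

# VM "A" shares its host with an interfering VM costing ~0.8s per iteration,
# so part of its object load is shifted to the less loaded VM "B".
plan = rebalance(
    {"A": [("a1", 0.5), ("a2", 0.5), ("a3", 0.5), ("a4", 0.5)],
     "B": [("b1", 0.5), ("b2", 0.5), ("b3", 0.5), ("b4", 0.5)]},
    {"A": 0.8, "B": 0.0})
print(plan)  # [('a1', 'A', 'B')]
```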
V. CONCLUSIONS

Since clouds have traditionally been designed for business and web applications, with the goal of increasing the utilization of underutilized resources through consolidation and multi-tenancy, there is a mismatch between current cloud offerings and HPC requirements. This thesis aims to bridge that gap through effective mapping, VM placement and scheduling, and execution of HPC applications on a range of platforms in cloud. Using the complementary approach of making clouds HPC-aware and HPC cloud-aware, we have demonstrated that HPC performance-cost tradeoffs in cloud can be significantly improved.

REFERENCES

[1] "Magellan Final Report," U.S. Department of Energy (DOE), Tech. Rep., 2011, http://science.energy.gov/~/media/ascr/pdf/program-documents/docs/Magellan Final Report.pdf.
[2] A. Iosup et al., "Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 6, pp. 931–945, June 2011.
[3] K. R. Jackson, L. Ramakrishnan, K. Muriki, S. Canon, S. Cholia, J. Shalf, H. J. Wasserman, and N. J. Wright, "Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud," in CloudCom '10, 2010.
[4] A. Gupta and D. Milojicic, "Evaluation of HPC Applications on Cloud," in Open Cirrus Summit (Best Student Paper), Atlanta, GA, Oct. 2011, pp. 22–26. [Online]. Available: http://dx.doi.org/10.1109/OCS.2011.10
[5] "High Performance Computing (HPC) on AWS," http://aws.amazon.com/hpc-applications.
[6] "Magellan – Argonne's DoE Cloud Computing," http://magellan.alcf.anl.gov.
[7] "Nova Scheduling Adaptations," http://xlcloud.org/bin/download/Download/Presentations/Workshop 26072012 Scheduler.pdf.
[8] "HeterogeneousArchitectureScheduler," http://wiki.openstack.org/HeterogeneousArchitectureScheduler.
[9] P. Fan, Z. Chen, J. Wang, Z. Zheng, and M. R. Lyu, "Topology-Aware Deployment of Scientific Applications in Cloud Computing," in IEEE International Conference on Cloud Computing, 2012.
[10] A. Gupta et al., "Exploring the Performance and Mapping of HPC Applications to Platforms in the Cloud," in HPDC '12. New York, NY, USA: ACM, 2012, pp. 121–122.
[11] A. Gupta, D. Milojicic, and L. Kale, "Optimizing VM Placement for HPC in Cloud," in Workshop on Cloud Services, Federation and the 8th Open Cirrus Summit, San Jose, CA, 2012.
[12] A. Gupta et al., "HPC-Aware VM Placement in Infrastructure Clouds," in IEEE Intl. Conf. on Cloud Engineering (IC2E '13), to appear.
[13] L. Kale and S. Krishnan, "Charm++: A Portable Concurrent Object Oriented System Based on C++," in OOPSLA, September 1993.
[14] L. V. Kale and G. Zheng, "Charm++ and AMPI: Adaptive Runtime Strategies via Migratable Objects," in Advanced Computational Infrastructures for Parallel and Distributed Applications, M. Parashar, Ed. Wiley-Interscience, 2009, pp. 265–282.
[15] O. Sarood, A. Gupta, and L. V. Kale, "Cloud Friendly Load Balancing for HPC Applications: Preliminary Work," in Parallel Processing Workshops (ICPPW), 2012 41st Intl. Conf. on, Sept. 2012, pp. 200–205.
[16] A. Gupta et al., "Improving HPC Application Performance in Cloud through Dynamic Load Balancing," in CCGrid '13, to appear.
[17] "NAS Parallel Benchmarks," http://www.nas.nasa.gov/Resources/Software/npb.html.
[18] A. Bhatele, S. Kumar, C. Mei, J. C. Phillips, G. Zheng, and L. V. Kale, "Overcoming Scaling Challenges in Biomolecular Simulations across Multiple Platforms," in IPDPS 2008, April 2008, pp. 1–12.
[19] P. Jetley, F. Gioachin, C. Mendes, L. V. Kale, and T. R. Quinn, "Massively Parallel Cosmological Simulations with ChaNGa," in IPDPS 2008, pp. 1–12.
[20] "The ASCI Sweep3D code," http://wwwc3.lanl.gov/pal/software/sweep3d.
[21] "Ranger User Guide," http://services.tacc.utexas.edu/index.php/ranger-user-guide.
[22] C. da Lu and D. Reed, "Compact Application Signatures for Parallel and Distributed Scientific Codes," in Supercomputing, ACM/IEEE 2002.
[23] J. S. Vetter, N. Bhatia, E. M. Grobelny, P. C. Roth, and G. R. Joubert, "Capturing Petascale Application Characteristics with the Sequoia Toolkit," in Proceedings of Parallel Computing 2005, Malaga, 2005.
[24] D. H. Bailey and A. Snavely, "Performance Modeling: Understanding the Past and Predicting the Future," in Euro-Par 2005, p. 185.
[25] "OpenStack Open Source Cloud Computing Software," http://www.openstack.org/.
[26] A. I. Avetisyan et al., "Open Cirrus: A Global Cloud Computing Testbed," Computer, vol. 43, pp. 35–43, April 2010.
[27] "kvm – Kernel-based Virtual Machine," Red Hat, Inc., Tech. Rep., 2009.
[28] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, and R. Buyya, "CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms," Softw. Pract. Exper., vol. 41, no. 1, pp. 23–50, Jan. 2011.
[29] MPI Forum, "MPI: A Message Passing Interface Standard," 1994.

