


Exploring the Performance and Mapping of HPC Applications to Platforms in the Cloud

Abhishek Gupta, Laxmikant V. Kalé
University of Illinois at Urbana-Champaign, Urbana, IL, USA
(gupta59, kale)@illinois.edu

Filippo Gioachin, Verdi March, Chun Hui Suen, Bu-Sung Lee
HP Labs, Singapore
(gioachin, verdi.march, chun-hui.suen, francis.lee)@hp.com

Paolo Faraboschi, Richard Kaufmann, Dejan Milojicic
HP Labs, Palo Alto, CA, USA
(paolo.faraboschi, richard.kaufmann, dejan.milojicic)@hp.com

ABSTRACT
This paper presents a scheme to optimize the mapping of HPC applications to a set of hybrid dedicated and cloud resources. First, we characterize application performance on dedicated clusters and cloud to obtain application signatures. Then, we propose an algorithm to match these signatures to resources such that performance is maximized and cost is minimized. Finally, we show simulation results revealing that, in a concrete scenario, our proposed scheme reduces cost by 60% at only a 10-15% performance penalty vs. a non-optimized configuration. We also find that the execution overhead in the cloud can be reduced to a negligible level using thin hypervisors or OS-level containers.

Categories and Subject Descriptors
D.1.3 [Concurrent Programming]: Parallel Programming; K.6.4 [System Management]: Centralization/decentralization

Keywords
High Performance Computing, Clouds, Resource Scheduling

1. INTRODUCTION
A recent study reaffirmed that dedicated supercomputers are still more cost-effective than cloud for large-scale HPC applications [2]. This is largely due to the high overhead of virtualization on I/O latency, which hinders the adoption of cloud for large-scale HPC applications [2, 6]. However, our preliminary study indicated that cloud resources could be cost-effective for small- and medium-scale HPC applications [5]. As such, resource allocation should be aware of application and resource characteristics to maximize application performance while minimizing cost.

This paper describes a proposed scheme to intelligently map an HPC application to a set of hybrid resources consisting of a mix of dedicated and cloud resources. Section 2 begins with our in-depth performance characterization for HPC applications on various dedicated clusters and clouds. We discover that the cloud overhead can be reduced to a negligible level using thin hypervisors or OS-level containers. Then, we propose an algorithm that leverages the performance characteristics to map an application to resources. In Section 3 we present our simulation results showing that our scheme reduces the cost by 60% compared to a non-optimized configuration, while the performance penalty is kept at 10-15%. Finally, Section 4 discusses the lessons learned and potential implications of our study.

Copyright is held by the author/owner(s).
HPDC'12, June 18–22, 2012, Delft, The Netherlands.
ACM 978-1-4503-0805-2/12/06.

2. APPROACH
We benchmarked a variety of platforms spanning different architectures (see Table 1). Ranger and Taub are supercomputers, while Open Cirrus is a dedicated cluster with a slower interconnect. HPLS and Eucalyptus are typical cloud environments. We also compare lightweight virtualization using a dedicated network (thin VM) and Linux containers (LXC), benchmarking with NAMD [4], a highly scalable molecular dynamics application, on the ApoA1 input (92k atoms).

Table 1: Experimental Test-bed

  Platform     Ranger        Taub          Open Cirrus   Eucalyptus    HPLS
  Cores/node   16 @2.3GHz    12 @2.67GHz   4 @3.00GHz    2 @2.67GHz    12 @2.67GHz
  Network      Infiniband    Infiniband    10 GigE       1 GigE        1 GigE

Figure 1 shows the scaling behavior of our testbeds for (a) different platforms and (b) different virtualization techniques applied to a typical cloud node. Owing to the superior network performance on the supercomputers (Taub, Ranger), NAMD scales well over the test range, while we observe scaling problems on Open Cirrus and, even more, on cloud (Eucalyptus, HPLS) due to inferior network performance, which we verified by measuring the time spent in communication. Networking on cloud is further impacted by the I/O virtualization overhead, although a more in-depth study shows alternative techniques (Figure 1(b)) that can partially mitigate this overhead. The thin VM assigns a dedicated network interface to each VM via an IOMMU pass-through and achieves near-native ('bare') performance. We also show that the slowdown incurred by CPU virtualization is minimal compared to conventional network virtualization ('plain VM'). Interference from the OS and hypervisor causes additional slowdown on VMs. Figure 1(c) shows the distribution of execution slowdown from the ideal 1000µs execution step measured on a virtualized node.

Based on these findings and our previous work [5], we developed a mapper tool, shown in Figure 2.

[Figure 1 (plots omitted): (a, b) Execution time per step (s) vs. number of cores for NAMD on (a) different platforms (HP Cloud, Euca. Cloud, Open Cirrus, Ranger, Taub) and (b) different virtualization techniques on a cloud node (plain VM, thin VM, bare); (c) Noise benchmark on a VM: distribution of iteration time delay (µs).]
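The noise measurement behind Figure 1(c) amounts to executing a fixed amount of work repeatedly and recording how far each iteration drifts from the ideal 1000µs step. Below is a minimal sketch of such a microbenchmark, not the authors' actual harness; the calibration loop and function names are illustrative assumptions, and a real measurement would pin the process to a core and use a compute kernel rather than an empty Python loop.

    import time

    TARGET_STEP_US = 1000.0   # ideal execution step from the paper (1000 microseconds)

    def calibrate(target_us):
        # Double the loop count until the busy-work takes roughly target_us on an idle core.
        n = 1000
        while True:
            start = time.perf_counter()
            for _ in range(n):
                pass
            if (time.perf_counter() - start) * 1e6 >= target_us:
                return n
            n *= 2

    def measure_noise(iterations=10000):
        # Run the same fixed work repeatedly and record how much each iteration
        # exceeds the ideal step time; OS/hypervisor interference shows up as delays.
        n = calibrate(TARGET_STEP_US)
        delays_us = []
        for _ in range(iterations):
            start = time.perf_counter()
            for _ in range(n):
                pass
            elapsed_us = (time.perf_counter() - start) * 1e6
            delays_us.append(max(0.0, elapsed_us - TARGET_STEP_US))
        return delays_us

    if __name__ == "__main__":
        delays = sorted(measure_noise(1000))
        print("median delay: %.1f us, max delay: %.1f us"
              % (delays[len(delays) // 2], delays[-1]))

A histogram of the recorded delays corresponds to the kind of distribution plotted in Figure 1(c).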
[Figure 2 (diagram omitted): Mapper Approach. The mapper front end takes platform characteristics, an application instance (optionally with an application signature), and user preferences (cost, performance, QoS). If the user did not provide an application signature, one is obtained through simulation and modeling (network and noise simulation, learning iterations), which also refines relative performance and cost estimation when more accurate prediction is desired. A platform decision is then made considering user preferences and predicted performance, and the tool outputs a recommended platform for the application.]

[Figure 3 (plot omitted): Normalized performance and cost of intelligent mapping vs. execution on the supercomputer, for EP (class B), IS (class B), and Jacobi (1k, 2k, 4k) instances on 4, 16, and 64 cores, plus the average over the set.]
Starting from an HPC application, through characterization we extract a signature capturing the most important dimensions: number and size of messages, computational grain size (FLOPS), overlap percentage of computation and communication, presence of synchronization barriers, and load balancing. Subsequently, given a set of applications to execute and a set of target platforms, we define heuristics to map the applications to the platforms so as to optimize parallel efficiency. In doing so, we consider several target platforms spanning a variety of processor configurations, interconnection networks, and virtualization environments. Platform characteristics such as CPU frequency, interconnect latency and bandwidth, platform costs (using a pay-per-use charging-rate-based model), and user preferences are taken into account. The output of the tool is a set of platform recommendations that optimize practical scenarios such as best performance within a constrained budget, or cost minimization with performance guarantees.
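As a concrete illustration of the ideas above, the sketch below encodes an application signature and a platform description, predicts a per-step time with a simple compute-plus-non-overlapped-communication model, and greedily selects the cheapest platform whose predicted time stays within a user-supplied slowdown bound. The field names, the linear latency/bandwidth model, and the greedy selection are assumptions made for illustration; the actual tool relies on the simulation, modeling, and learning iterations of Figure 2 and the heuristics described in the technical report [3].

    from dataclasses import dataclass

    @dataclass
    class AppSignature:
        flops_per_step: float   # computational grain size per iteration
        msgs_per_step: float    # number of messages per iteration
        bytes_per_msg: float    # average message size
        overlap: float          # fraction of communication overlapped with computation

    @dataclass
    class Platform:
        name: str
        flops_per_core: float      # sustained FLOP/s per core
        net_latency_s: float       # per-message latency
        net_bandwidth_bps: float   # bytes/s per link
        cost_per_core_hour: float  # pay-per-use charging rate

    def predicted_step_time(app, plat, cores):
        # Compute time plus the non-overlapped part of communication time.
        t_comp = app.flops_per_step / (cores * plat.flops_per_core)
        t_comm = app.msgs_per_step * (plat.net_latency_s
                                      + app.bytes_per_msg / plat.net_bandwidth_bps)
        return t_comp + (1.0 - app.overlap) * t_comm

    def recommend(app, platforms, cores, max_slowdown=1.15):
        # Cheapest platform whose predicted step time is within max_slowdown of the best.
        times = {p.name: predicted_step_time(app, p, cores) for p in platforms}
        best = min(times.values())
        feasible = [p for p in platforms if times[p.name] <= max_slowdown * best]
        cost = lambda p: times[p.name] / 3600.0 * cores * p.cost_per_core_hour
        return min(feasible, key=cost)

In this toy model, the 10-15% performance penalty reported in Section 3 would correspond to max_slowdown = 1.15.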
3. RESULTS
We evaluated the results obtained by our mapper and studied the benefits of using it to map a set of applications to a supercomputer (Ranger) and a Eucalyptus cloud. Figure 3 shows the significant cost savings achieved while meeting performance guarantees using our intelligent mapper. The Embarrassingly Parallel (EP) and Integer Sort (IS) benchmarks are part of the NPB Class B benchmark suite [1], and Jacobi2D is a kernel that performs a 5-point stencil computation to average values in a 2-D grid, sketched below.
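For reference, here is a serial NumPy version of the 5-point stencil that Jacobi2D performs. The benchmark itself is a parallel implementation; whether the center point is included in the average and how boundaries are handled are assumptions here, so this only illustrates the per-iteration computation.

    import numpy as np

    def jacobi2d_step(grid):
        # One sweep of the 5-point stencil: each interior point becomes the
        # average of itself and its four neighbors.
        new = grid.copy()
        new[1:-1, 1:-1] = (grid[1:-1, 1:-1]    # center
                           + grid[:-2, 1:-1]   # north
                           + grid[2:,  1:-1]   # south
                           + grid[1:-1, :-2]   # west
                           + grid[1:-1, 2:]    # east
                           ) / 5.0
        return new

    # e.g. a 1k x 1k problem size, as in the experiments
    grid = np.random.rand(1024, 1024)
    for _ in range(100):
        grid = jacobi2d_step(grid)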
The application suffix in Figure 3 is the number of processors; for Jacobi, we consider multiple problem sizes, i.e., input matrix dimensions (e.g., 1k × 1k). For this application set, our scheme reduces the cost on average by 60% compared to a non-optimized configuration, while the performance penalty is kept at 10-15%. Further details can be found in the technical report on this work [3].
4. LESSONS LEARNED, CONCLUSIONS
We have shown that the adoption of intelligent mapping techniques is pivotal to the success of hybrid platform environments that combine supercomputers and typical hypervisor-based clouds. In some cases, a hybrid cloud-supercomputer platform environment can outperform its individual constituents. We learned that application characterization in the HPC-cloud space is a challenging problem, but the benefits are substantial. Finally, we demonstrated that lightweight virtualization is important to remove "friction" from HPC in the cloud.

We described the concept and initial implementation of a static tool to automate the mapping, using a combination of application characteristics, platform parameters, and user preferences. In the future, we plan to extend the mapping tool to also perform a dynamic adjustment of the static mapping through run-time monitoring.

5. REFERENCES
[1] NPB. http://www.nas.nasa.gov/Resources/Software/npb.html.
[2] Magellan Final Report. Technical report, U.S. Department of Energy (DOE), 2011.
[3] Exploring the Performance and Mapping of HPC Applications to Platforms in the Cloud. Technical report, HP Labs, 2012.
[4] A. Bhatele et al. Overcoming Scaling Challenges in Biomolecular Simulations across Multiple Platforms. In IPDPS, 2008.
[5] A. Gupta and D. Milojicic. Evaluation of HPC Applications on Cloud. In Open Cirrus Summit (Best Student Paper), 2011.
[6] E. Walker. Benchmarking Amazon EC2 for High-Performance Scientific Computing. LOGIN, 2008.
