Exploring_the_performance_and_mapping_of_HPC_appli
All content following this page was uploaded by Abhishek Gupta on 17 November 2014.
[Figure: plot panels with x-axes "Number of cores", "Number of cores", and "Iteration time delay (us)"; one axis labeled "Distribution"; legend entries: Euca. Cloud, VM-thin, Open Cirrus bare, Ranger, Taub]
[Bar chart: Normalized Time and Normalized Cost per application; x-axis labels (partly illegible) cover IS and EP class B runs and Jacobi runs at several problem sizes on 4, 16, and 64 processors, plus the average]
Figure 3: Normalized Performance and Cost (intelligent mapping vs execution on supercomputer)

[Diagram: Application Signature; Network, Noise Simulation / Learning iterations; Relative Performance, Cost Estimation; Predicted Perf; Platform Decision considering user pref.; Recommended Platform for this App; feedback path labeled "More accurate Prediction desired"]
Figure 2: Mapper Approach

signature capturing the most important dimensions: number and size of messages, computational grain size (FLOPS), overlap percentage of computation and communication, presence of synchronization barriers, and load balancing. Subsequently, given a set of applications to execute and a set of target platforms, we define heuristics that map the applications to the platforms so as to optimize parallel efficiency. In doing so, we consider several target platforms spanning a variety of processor configurations, interconnection networks, and virtualization environments. Platform characteristics (such as CPU frequency, interconnect latency, and bandwidth), platform costs (using a model based on pay-per-use charging rates), and user preferences are considered. The output of the tool is a set of platform recommendations that optimize practical scenarios such as best performance within a constrained budget, or cost minimization with performance guarantees.

3. RESULTS
We evaluated the results obtained by our mapper and studied the benefits of using it to map a set of applications to a supercomputer (Ranger) and a Eucalyptus cloud. Figure 3 shows the significant cost savings achieved while meeting performance guarantees using our intelligent mapper. The Embarrassingly parallel (EP) and Integer sort (IS) benchmarks are part of the NPB Class B benchmark suite [1], and Jacobi2D is a kernel that performs a 5-point stencil computation to average values in a 2-D grid. The application suffix is the number of processors; for Jacobi, we consider multiple problem sizes, that is, input matrix dimensions (e.g. size 1k × 1k). For this application set, our scheme reduces the cost on average by 60% compared to a non-optimized configuration, while the performance penalty is kept at 10-15%. Further details can be found in the technical report on this work [3].

4. LESSONS LEARNED, CONCLUSIONS
We have shown that the adoption of intelligent mapping techniques is pivotal to the success of hybrid platform environments that combine supercomputers and typical hypervisor-based clouds. In some cases, a hybrid cloud-supercomputer platform environment can outperform its individual constituents. We learned that application characterization in the HPC-Cloud space is a challenging problem, but the benefits are substantial. Finally, we demonstrated that lightweight virtualization is important to remove "friction" from HPC in the cloud.

We described the concept and initial implementation of a static tool to automate the mapping, using a combination of application characteristics, platform parameters, and user preferences. In the future, we plan to extend the mapping tool to also perform a dynamic adjustment of the static mapping through run-time monitoring.

5. REFERENCES
[1] NPB. http://www.nas.nasa.gov/Resources/Software/npb.html.
[2] Magellan Final Report. Technical report, U.S. Department of Energy (DOE), 2011.
[3] Exploring the Performance and Mapping of HPC Applications to Platforms in the Cloud. Technical report, HP Labs, 2012.
[4] A. Bhatele et al. Overcoming Scaling Challenges in Biomolecular Simulations across Multiple Platforms. In IPDPS 2008.
[5] A. Gupta and D. Milojicic. Evaluation of HPC Applications on Cloud. In Best Student Paper, Open Cirrus Summit, 2011.
[6] E. Walker. Benchmarking Amazon EC2 for high-performance scientific computing. LOGIN, 2008.
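
As an illustration of the platform decision step described in Section 2 (best performance within a constrained budget, or cost minimization with a performance guarantee), the following is a minimal Python sketch. It is not the tool's implementation: the names `Estimate`, `estimate`, `fastest_within_budget`, and `cheapest_within_slowdown` are hypothetical, the predicted execution times are assumed to be supplied by some earlier prediction step that is not reproduced here, and cost is assumed to be core-hours multiplied by a pay-per-use rate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Estimate:
    """Predicted outcome of running one application on one platform."""
    platform: str
    time_s: float  # predicted execution time in seconds (assumed input)
    cost: float    # predicted cost under the assumed pay-per-use model


def estimate(platform: str, time_s: float, cores: int,
             rate_per_core_hour: float) -> Estimate:
    # Assumption: cost = core-hours consumed * charging rate.
    cost = (time_s / 3600.0) * cores * rate_per_core_hour
    return Estimate(platform, time_s, cost)


def fastest_within_budget(estimates: List[Estimate],
                          budget: float) -> Optional[Estimate]:
    """Best performance within a constrained budget (None if nothing fits)."""
    feasible = [e for e in estimates if e.cost <= budget]
    return min(feasible, key=lambda e: e.time_s) if feasible else None


def cheapest_within_slowdown(estimates: List[Estimate],
                             max_slowdown: float) -> Estimate:
    """Minimum cost among platforms whose predicted time is at most
    max_slowdown times the fastest predicted time."""
    best_time = min(e.time_s for e in estimates)
    feasible = [e for e in estimates if e.time_s <= best_time * max_slowdown]
    return min(feasible, key=lambda e: e.cost)


if __name__ == "__main__":
    # Made-up numbers for illustration only; they are not measurements.
    candidates = [
        estimate("supercomputer", time_s=100.0, cores=64, rate_per_core_hour=0.5),
        estimate("cloud", time_s=112.0, cores=64, rate_per_core_hour=0.1),
    ]
    print(cheapest_within_slowdown(candidates, max_slowdown=1.15).platform)
    print(fastest_within_budget(candidates, budget=1.0).platform)
```

With these made-up inputs, allowing a 15% slowdown selects the cheaper platform, while a budget large enough for both selects the faster one. User preferences could be expressed by choosing which of the two functions to call and with what bound.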