Then_and_Now_Improving_Software_Portability_Productivity_and_100_Performance

Software

Uploaded by

Srihari Yallala

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

27 views10 pages

Then_and_Now_Improving_Software_Portability_Productivity_and_100_Performance

Software

Uploaded by

Srihari Yallala

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 10

THEME ARTICLE: TRANSFORMING SCIENCE THROUGH

SOFTWARE: IMPROVING WHILE DELIVERING 100X

Then and Now: Improving Software

Portability, Productivity, and 1003
Performance
Hartwig Anzt , University of Tennessee, Knoxville, TN, 37996, USA
Axel Huebl and Xiaoye S. Li , Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA

The U.S. Exascale Computing Project (ECP) has succeeded in preparing applications to
run efficiently on the first reported exascale supercomputers in the world. To achieve
this, it modernized the whole leadership software stack, from libraries to simulation
codes. In this article, we contrast selected leadership software before and after the ECP.
We discuss how sustainable research software development for leadership computing
can embrace the conversation with the hardware vendors, leadership computing
facilities, software community, and domain scientists who are the application
developers and integrators of software products. We elaborate on how software needs
to take portability as a central design principle and to benefit from interdependent
teams; we also demonstrate how moving to programming languages with high
momentum, like modern C11, can help improve the sustainability, interoperability, and
performance of research software. Finally, we showcase how cross-institutional efforts
can enable algorithm advances that are beyond incremental performance optimization.

H
igh-performance computing (HPC) enables Oak Ridge National Laboratory was the first machine
innovation for scientists and engineers, across benchmarked to compute the LINPACK benchmark at
exploration and discovery science, design and an execution rate of 1 exaflops—thereby fulfilling the
optimization, or validation of theories about the funda- ambitious goal of a 5 performance improvement over
mental laws of nature. Supercomputers enable largest Summit.b With the sunset of the ECP in December
scale data analysis as well as modeling and simulation 2023, it is time to look at the software side and com-
to study systems that would be impossible to study at pare the capability status before and after the ECP.
the same level of detail in the real world, e.g., due to For this purpose, we describe several software
the size, complexity, physical danger, or cost involved. projects that are rooted in mathematical libraries and
In 2016, the U.S. Exascale Computing Project (ECP)a the application space, and we investigate their perfor-
started on its mission to accelerate the delivery of a mance improvements and sustainability: the Extreme-
capable exascale computing ecosystem that delivers Scale Scientific Software Development Kit (xSDK) and
50 the application performance of the leading 20 its constituent libraries, such as Ginkgo, Software for
petafloating point operations per second (petaflops) Linear Algebra Targeting Exascale (SLATE), SuperLU,
systems. In 2022, the Frontier supercomputer at the and the laser–plasma modeling application WarpX.c
We will describe critical facets of how software devel-
a
https://www.exascaleproject.org opment methodologies and interdisciplinary teams
have been transformed, leading to improvements in
© 2024 The Authors. This work is licensed under a Creative the software itself, and why these advances are essen-
Commons Attribution 4.0 License. For more information, see tial for next-generation science.
https://creativecommons.org/licenses/by/4.0/
Digital Object Identifier 10.1109/MCSE.2024.3387302
b
Date of publication 10 April 2024; date of current version https://www.olcf.ornl.gov/summit/
c
5 July 2024. https://warpx.readthedocs.io

January-March 2024 Published by the IEEE Computer Society Computing in Science & Engineering 61
TRANSFORMING SCIENCE THROUGH SOFTWARE: IMPROVING WHILE DELIVERING 100X

SOFTWARE ENGINEERING: THEN Easy installation via the Spack package managere:
AND NOW The release process uses the Spack pull request
process for all xSDK-related changes that go into
Prior to the ECP, many HPC software stacks used for
the Spack package manager.
scientific research in the U.S. Department of Energy
Continued systematic testing: All xSDK pack-
(DOE) were developed and grew in response to needs
ages go through Spack build test cycles on vari-
of time-limited domain science projects. There was not
ous commonly used workstations. The testing
much coordination among different software teams
is also extended to multiple DOE Leadership
and products. For example, multiple libraries could not
Computing Facility machines. The Gitlab CI
even be built and linked into a single application, e.g.,
(pipeline) infrastructure is used to perform daily
due to name space issues. The naturally growing soft-
runs of multiple tests on different systems.f
ware stacks also often did not have a defined software
Performance autotuner GPTuneg: Each library
development cycle or quality standards to adhere to.
in xSDK has tunable parameters that may
Sometimes, even ad hoc solutions implemented to
greatly affect the code performance on the
serve certain requirements became an essential part
actual machine. GPTune uses Bayesian optimi-
of a major software ecosystem.
zation based on Gaussian process regression
The concept of making software sustainable, pro-
to find the best parameter configurations. It
ductive, and reliable through a defined software
supports advanced features, such as multitask
development process and a culture of collaborative
learning, transfer learning, multifidelity/objective
software engineering became popular (and required)
tuning, and parameter sensitivity analysis.
only during the ECP. Among the most successful and
impactful measures on the side of mathematical Since the inception of the ECP, the number of librar-
software is xSDK.d Its community efforts have imple- ies in the xSDK collection has grown to 26. Figure 1
mented a set of standards on software quality and illustrates the dependencies among some of the librar-
interoperability, deployed a federated continuous ies. As shown in this hierarchy, some libraries at the
integration (CI) infrastructure that allows for rigorous lower level provide commonly used building blocks
software testing on various hardware architectures, that are needed by the higher level math libraries
and taken the challenge of defining software packages and applications.
that contain compatible versions of a plethora of inde- As an example, the Ginkgo software stack devel-
pendent but interoperable software libraries. xSDK oped under the ECP currently employs 45 CI pipelines
pioneered a set of key elements that addresses the on CPU and GPU architectures from AMD, Intel, and
shortcomings from the past: Nvidia and has 91% unit test coverage. Likewise, the
WarpX application performs CI tests with code reviews
Community policies: There is a set of manda-
on the three major operating systems (Linux, macOS,
tory policies [including topics of configuring,
and Windows) and deploys to three major CPU (x86,
installing, testing, message passing interface
ARM, and PPC) and three major GPU (Nvidia, AMD,
(MPI) usage, portability, contact and version
and Intel) architectures. Reusing the mathematical
information, open source licensing, name spac-
software in xSDK, the AMReX libraryh became a central
ing, and repository access] that a software
dependency of WarpX for its data structures, commu-
package must satisfy to be considered xSDK
nication routines, portability, and third-party solvers.
compatible. Also presented are recommended
However, it is not the community agreeing on
policies (including public repository access,
standards, reuse, and the technical realization of rigor-
error handling, freeing system resources, and
ous software testing alone that enabled higher produc-
library dependencies), which are encouraged
tivity and collaboration across institutions. Equally
but not required.
important is the recognition of research software engi-
Interoperability (see Figure 1): This enables a
neering as a profession, establishing career paths and
collection of related and complementary soft-
understanding the culture around it.i It is an open
ware packages to be able to call each other so
research software engineering culture across projects
that they can be used simultaneously to solve a
complex problem. e
https://spack.io
f
https://gitlab.com/xsdk-project/spack-xsdk/-/pipelines
g
https://gptune.lbl.gov
h
https://amrex-codes.github.io/amrex/
d i
https://xsdk.info https://us-rse.org and https://society-rse.org