Introduction To HPC and Current Usage in HEP
Rui Wang
Argonne National Laboratory
Monday, 20 May, 2024
What is High Performance Computing (HPC)
2
Basics of HPC Architecture
● Compute nodes, interconnects, storage systems, and a software stack
optimized for high performance
3
Cluster management and job scheduling
● Allocate computational resources and manage job execution on HPC clusters
– Slurm (Simple Linux Utility for Resource Management)
i. Highly scalable cluster management and job scheduling system. Allocates resources based on user-defined policies and manages job execution on the cluster

A two-rank MPI job which utilizes 2 physical cores (and 4 hyperthreads) of a Perlmutter CPU node:
#!/bin/bash
#SBATCH --qos=shared
#SBATCH --constraint=cpu
#SBATCH --time=5
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=2
srun --cpu-bind=cores ./a.out

– PBS (Portable Batch System), e.g. a 30 min, 1 node interactive job on Aurora (sketch below)
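A minimal sketch of such an interactive job, assuming standard PBS Pro syntax; the queue and project names are placeholders to be replaced with your allocation's values:

# Request one node interactively for 30 minutes (queue and project are placeholders)
qsub -I -l select=1 -l walltime=00:30:00 -q debug -A MyProject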
4
Parallel Processing
● To solve a problem faster, the work is often split into pieces and executed in parallel
● Embarrassingly Parallel
– Many copies of the same task, each with different input parameters (see the job-array sketch below)
https://researchcomputing.princeton.edu/support/knowledge-base/parallel-code
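A minimal sketch of an embarrassingly parallel run expressed as a Slurm job array; the executable name and its --seed parameter are placeholders for illustration:

#!/bin/bash
#SBATCH --array=0-9        # ten independent copies of the same task
#SBATCH --time=10
#SBATCH --nodes=1
#SBATCH --ntasks=1
# Each array element gets a different input parameter through SLURM_ARRAY_TASK_ID.
./my_task --seed ${SLURM_ARRAY_TASK_ID}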
5
Parallel Processing
● Shared-Memory Parallelism (Multithreading)
– Tasks are run as threads on separate CPU-cores of the same computer
– Methods: OpenMP, POSIX Threads (pthreads), vector intrinsics (Intel MKL), C++ Parallel STL (Intel TBB), etc.
program hello_world_multithreaded
  use omp_lib
  implicit none
  !$omp parallel
  ! Each thread prints its own ID and the total thread count.
  print *, "Hello from thread", omp_get_thread_num(), "of", omp_get_num_threads()
  !$omp end parallel
end program hello_world_multithreaded
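A hedged sketch of compiling and running the example above with gfortran; the source file name and thread count are arbitrary choices:

gfortran -fopenmp hello_world.f90 -o hello_world   # build with OpenMP enabled
export OMP_NUM_THREADS=4                           # threads used by the parallel region
./hello_world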
https://researchcomputing.princeton.edu/support/knowledge-base/parallel-code
6
Parallel Processing
● Distributed-Memory Parallelism (Multiprocessing)
– Running tasks as multiple processes that do not share the same memory space
– Methods: MPI (Message Passing Interface); Spark/Hadoop, Dask, general multiprocessing, etc. for Machine Learning
#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);                 // initialize the MPI environment
  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // this process's rank within MPI_COMM_WORLD
  std::cout << "Hello from rank " << rank << std::endl;
  MPI_Finalize();                         // shut down MPI
  return 0;
}

https://researchcomputing.princeton.edu/support/knowledge-base/parallel-code
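A hedged sketch of building and launching the MPI example, assuming an MPI C++ compiler wrapper and a Slurm or mpiexec launcher are available; the file name is a placeholder:

mpicxx hello_mpi.cpp -o hello_mpi    # compile with the MPI C++ wrapper
srun -n 2 ./hello_mpi                # run 2 ranks under Slurm (or: mpiexec -n 2 ./hello_mpi)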
7
NERSC Perlmutter @ LBNL, 14th on the Top500
8
Vega @ IZUM, Slovenia
First petascale EuroHPC system in Europe
Peak performance: 10.1 petaflops
9
TACC Frontera @ UTexas, 33rd on the Top500
Mellanox InfiniBand HDR: Full HDR (200 Gb/s); HDR100 (100 Gb/s)
10
ALCF Aurora @ Argonne National Lab, 2nd on the Top500
10,624 nodes
11
OLCF Frontier @ Oak Ridge National Lab, 1st on the Top500
12
Computing Challenge in High Energy Physics
Figure: comparison shown at the same scale (Albrecht et al., 2019)
13
Computing Challenge in High Energy Physics
● Large-scale Monte Carlo (MC) Simulations
● Data Analysis from Particle Detectors
● Complex Computational Models (e.g., Lattice QCD)
CMS Offline and Computing Public Results & ATLAS Software and Computing HL-LHC Roadmap
14
HPC usage – CMS & ATLAS
● CPU @ Perlmutter: MC simulation
● GPU @ Perlmutter: R&D / ML (e.g. TrackML)
15
HPC usage – Cosmological N-Body simulation
Evolution of matter distribution over cosmic time for a sub-volume of a HACC simulation
Adiabatic hydro simulation on Sunspot (Aurora TDS)
FarPoint: https://arxiv.org/abs/2109.01956
https://www.exascaleproject.org/event/hacc/
16
Challenges and Future Directions
● HEP Computing Resource Challenges
– HL-LHC: 10X data, 10X complexity
– In 2030, LHC experiments will need
i. > O(100) PFlops in sustained compute performance
ii. O(10) Exabyte/year data throughput
● HPC Challenges
– Ability to run multiple HEP workflows at O(10) PFlops sustained on multiple heterogeneous exascale systems
– Match HEP I/O requirements to HPC file systems and networking infrastructure
– Address realtime to near-realtime applications and resilience challenges
Figures: Distributed Heterogeneous Computing with PanDA in ATLAS (T. Maeno, CHEP23 talk); Evolution of ASCR and HEP computing resources
https://indico.fnal.gov/event/61971/contributions/281095/attachments/173755/235361/HEP-CCE_intro_dec23_AHM%5B1%5D.pdf
17
HEP-CCE project
Diagram: HEP-CCE as a cross-cut between the Advanced Scientific Computing Research (ASCR) programs and facilities and the High Energy Physics (HEP) computational program and facilities
https://indico.fnal.gov/event/61971/contributions/281095/attachments/173755/235361/HEP-CCE_intro_dec23_AHM%5B1%5D.pdf
18
Tasks identified
● Ability for HEP workflows to exploit concurrency at the node level of HPC platforms
– Natural event-level parallelism is not enough
● Multiple compute platforms as a portability challenge
– Complex HEP code needs to run on grid/cloud/HPC systems in production
● I/O needs to scale to thousands of HPC nodes
– Not just for data but also for software libraries, databases, configurations (~100K files)
● Ability to run complex HEP workflows robustly on HPC systems
– Resilience challenge
● Mixed CPU/GPU pipelines will become more common as AI/ML methods are
increasingly incorporated
– HPC facilities offer unique large-scale AI/ML training opportunities
https://indico.fnal.gov/event/61971/contributions/281095/attachments/173755/235361/HEP-CCE_intro_dec23_AHM%5B1%5D.pdf
19
Portable Applications to Portable Workflows
● Address needs of large and small HEP workflows
● Hybrid CPU/GPU application support
● Current and promised future support of hardware
● Long-term sustainability, code stability, and performance
FastCaloSim: ATLAS parameterized LAr calorimeter simulation
Patatrack: CMS pixel detector reconstruction
Wirecell Toolkit: LArTPC signal simulation
p2r: CMS "propagate-to-R" track reconstruction
Figures: performance comparisons from 2019 and 2023 (arrows indicate "better")
Summary on PPS
● Identify I/O & storage bottlenecks of HEP applications
– (Proto)DUNE (will) write its raw data in HDF5 format
– TTree vs RNTuple (preliminary results)
21
Accelerating HEP Simulation
● Event generators for accelerated systems
● Celeritas: new MC particle transport code designed for high performance simulation of
complex HEP detectors on GPU-accelerated hardware
Figures: MadGraph (SYCL) portability and scalability (up to 500 nodes); Celeritas geometry modeling, CMS (CMSSW) integration, and scalability to large problems
23
Complex workflow
● Characterizes workflows across various dimensions
● Captures software and system use
● Identifies challenges for each experiment
24
Summary
● HPC plays a vital role in HEP research, enabling large-scale simulations, data analysis, and theoretical calculations.
● Current applications include Monte Carlo simulations, data analysis from particle detectors, and event generation
● HPC facilities provide excellent opportunities for utilizing large resources and acquiring advanced expertise
25
Q&A
26
Backup
27
HPC system statistics
28
US HPC facilities
Two distinct types of HPC facilities in the US
▪ DOE Leadership-class Facilities (LCFs)
– Argonne National Lab, Oak Ridge National Lab
– Focused on FLOPS, GPU heavy
– Prefer jobs that can fill the entire machine
– Very locked down network - absolutely no WAN file transfer from the compute nodes
▪ User-focused facilities
– TACC, NERSC and so on
– Usually a mix of GPU-focused and CPU-focused machines
– WAN data transfer is often allowed in jobs, but using data transfer nodes (DTNs) is preferred (see the sketch below)
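A hedged sketch of staging data through a facility data transfer node with rsync; the hostname, username, and paths are placeholders, and many facilities recommend Globus instead:

# Copy results off the facility file system via a DTN (placeholder host and paths)
rsync -av --progress /scratch/myproject/output/ user@dtn.example.gov:/data/myproject/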
29
HPC usage – Lattice QCD
Perlmutter benchmarking example
33