
Introduction to HPC and Current Usage in HEP


Rui Wang
Argonne National Laboratory
Monday, 20 May, 2024
What is High Performance Computing (HPC)?

● High Performance Computing (HPC) refers to technology that combines computing power to process data and perform complex calculations at high speeds
– Parallel processing techniques and advanced algorithms are used to solve complex computational problems efficiently and rapidly
– A cluster of powerful computing nodes, e.g. a supercomputer
● HPC systems are designed to handle large-scale simulations, data analysis tasks, and modeling challenges

2
Basics of HPC Architecture
● Compute nodes, interconnects, storage systems, and a software stack
optimized for high performance

(Figure: node architecture with AMD Instinct MI250X accelerators, OLCF Frontier)

3
Cluster management and job scheduling
● Allocate computational resources and manage job execution on HPC clusters
– Slurm (Simple Linux Utility for Resource Management)
i. Highly scalable cluster management and job scheduling system. Allocates resources based on user-defined policies and manages job execution on the cluster
– PBS (Portable Batch System)
i. Cluster and resource management software suite used for managing and scheduling jobs on high-performance computing (HPC) clusters

A two-rank MPI job which utilizes 2 physical cores (and 4 hyperthreads) of a Perlmutter CPU node:

#!/bin/bash
#SBATCH --qos=shared
#SBATCH --constraint=cpu
#SBATCH --time=5
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=2
srun --cpu-bind=cores ./a.out

A 30 min, 1 node interactive job on Aurora:

qsub -l select=1 -l walltime=30:00 -A [your_ProjectName] -q EarlyAppAccess -I

Recommended PBSPro options:

#!/bin/sh
#PBS -A [your_ProjectName]
#PBS -N
#PBS -l walltime=[requested_walltime_value]
#PBS -k doe
#PBS -l place=scatter
#PBS -q EarlyAppAccess

4
Parallel Processing

● To solve a problem faster, the work is often split into pieces and executed in parallel
● Embarrassingly Parallel
– Many copies of the same task, each with different input parameters (see the sketch below)

https://researchcomputing.princeton.edu/support/knowledge-base/parallel-code
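A minimal sketch of the embarrassingly parallel pattern, assuming the copies are launched as a Slurm job array so that each copy reads its index from the SLURM_ARRAY_TASK_ID environment variable; the program, parameter values, and submission script name are illustrative, not taken from the slides.

// Embarrassingly parallel sketch: every array task runs the same executable
// and selects its own input parameter from SLURM_ARRAY_TASK_ID, so the
// copies never need to communicate with each other.
#include <cstdlib>
#include <iostream>
#include <vector>

int main() {
    // Hypothetical parameter scan: one value per array task
    const std::vector<double> couplings = {0.1, 0.2, 0.5, 1.0};

    const char* id = std::getenv("SLURM_ARRAY_TASK_ID");
    const std::size_t task = id ? std::strtoul(id, nullptr, 10) : 0;
    if (task >= couplings.size()) {
        std::cerr << "Task index " << task << " has no parameter assigned\n";
        return 1;
    }

    // Each copy processes its own parameter independently
    std::cout << "Task " << task << " running with coupling "
              << couplings[task] << "\n";
    return 0;
}

Submitted with, e.g., sbatch --array=0-3 run_scan.sh, where run_scan.sh is a hypothetical wrapper script that simply launches this executable.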

5
Parallel Processing
● Shared-Memory Parallelism (Multithreading)
– Tasks are run as threads on separate CPU-cores of the same computer
– Methods: OpenMP, POSIX Threads (pthreads), vector intrinsics (Intel MKL), C++ Parallel STL (Intel TBB), etc.; a C++ OpenMP version is also sketched below

program hello_world_multithreaded
use omp_lib

!$omp parallel
write(*,*) "Hello from process ", omp_get_thread_num(), &
           " of ", omp_get_num_threads()
!$omp end parallel

end program

https://researchcomputing.princeton.edu/support/knowledge-base/parallel-code
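For comparison, a minimal C++ version of the same multithreaded hello world, assuming OpenMP support is enabled at compile time (e.g. with -fopenmp); this is an illustrative sketch, not code from the slides.

// OpenMP hello world in C++: each thread in the parallel region prints its
// own thread ID and the total number of threads.
#include <cstdio>
#include <omp.h>

int main() {
    #pragma omp parallel
    {
        // printf emits each line in a single call, so lines are not interleaved mid-line
        std::printf("Hello from thread %d of %d\n",
                    omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}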

6
Parallel Processing
● Distributed-Memory Parallelism (Multiprocessing)
– Running tasks as multiple processes that do not share the same memory space
– Methods: MPI (Message Passing Interface); Spark/Hadoop, Dask, general multiprocessing, etc. for Machine Learning

#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
    using namespace std;

    MPI_Init(&argc, &argv);

    int world_size, world_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    cout << "Process " << world_rank << " of " << world_size
         << " says hello from " << processor_name << endl;

    // uncomment next line to make CPU-cores work (infinitely)
    // while (true) {};

    MPI_Finalize();
    return 0;
}

https://researchcomputing.princeton.edu/support/knowledge-base/parallel-code

7
NERSC Perlmutter @ LBNL (14th on the Top500)

CPU (3072 nodes)
● 2x AMD EPYC 7763 (Milan) CPU
● 64 cores/CPU
● Peak FP64: 7.7 PFLOPS
● 1x HPE Slingshot 11
● 200G (25 GB/s) bandwidth

GPU (1536 40GB + 256 80GB nodes)
● 4x NVIDIA A100 (Ampere) GPU
● Peak FP64: 59.9 PFLOPS, 119.8 PFLOPS (tensor)
● 1x AMD EPYC 7763 (Milan) CPU
● Peak FP64: 3.9 PFLOPS
● 1x HPE Slingshot 11
● 200G (25 GB/s) bandwidth

Floating point operations per second (FLOPS):
teraFLOPS (TFLOPS) = 10¹²
petaFLOPS (PFLOPS) = 10¹⁵
exaFLOPS (EFLOPS) = 10¹⁸

8
Vega @ IZUM, Slovenia
The first EuroHPC petascale system in Europe
Peak performance: 10.1 petaflops

9
TACC Frontera @ UTexas (33rd on the Top500)

CPU (8,008 nodes)

Intel Xeon Platinum 8280 ("Cascade Lake")


● 28 cores/socket, 56 cores/node

● Clock rate: 2.7 GHz

● Peak performance: 4.8 TFLOPS

GPU (90 nodes)

4 NVIDIA Quadro RTX 5000/node


● 3072 CUDA Parallel Processing Cores/card

● 384 NVIDIA Tensor Cores/card

2 Intel Xeon E5-2620 v4 (“Broadwell”)/node

Mellanox InfiniBand HDR: Full HDR (200 Gb/s); HDR100 (100 Gb/s)

10
ALCF Aurora @ Argonne National Lab (2nd on the Top500)

10,624 nodes

May 13, 2024: 1.012 ExaFLOPS

● 8 HPE Slingshot-11 NICs
● DAOS – high-performance storage system for storing checkpoints and analysis files

11
OLCF Frontier @ Oak Ridge National Lab (1st on the Top500)

America’s first exascale system

12
Computing Challenge in High Energy Physics

● HEP experiments, such as the experiments at the Large Hadron Collider (LHC) at CERN, produce an enormous amount of data with each collision event

(Figure: data volumes compared on the same scale; Albrecht et al., 2019)

13
Computing Challenge in High Energy Physics
● Large-scale Monte Carlo (MC) Simulations
● Data Analysis from Particle Detectors
● Complex Computational Models (e.g., Lattice QCD)

CMS Offline and Computing Public Results & ATLAS Software and Computing HL-LHC Roadmap

14
HPC usage – CMS & ATLAS (Perlmutter, TACC, Vega)

● CPU @ Perlmutter: MC Simulation
● GPU @ Perlmutter: R&D / ML (e.g. TrackML)

15
HPC usage – Cosmological N-Body simulation
● Evolution of matter distribution over cosmic time for a sub-volume of a HACC simulation; FarPoint: https://arxiv.org/abs/2109.01956
● Adiabatic hydro simulation on Sunspot (Aurora TDS)
● Summit (GPU) @ OLCF; Aurora @ ALCF

https://www.exascaleproject.org/event/hacc/

16
Challenges and Future Directions
● HEP Computing Resource Challenges
– HL-LHC: 10X data, 10X complexity
– In 2030, LHC experiments will need
i. > O(100) PFlops in sustained compute performance
ii. O(10) Exabyte/year data throughput
● HPC Challenges
– Ability to run multiple HEP workflows at O(10) PFlops sustained on multiple heterogeneous exascale systems
– Match HEP I/O requirements to HPC file systems and networking infrastructure
– Address realtime to near-realtime applications and resilience challenges
● HEP Long-Term Involvement in HPC
– Chosen platform for the Cosmic Frontier (CMB-S4, DESI, LSST DESC, LZ, etc.)
– LHC experiments among top users of NERSC
– DUNE (neutrino experiment) has identified HPC resource utilization as a major component of its computing strategy

(Figures: Distributed Heterogeneous Computing with PanDA in ATLAS, T. Maeno CHEP23 talk; Evolution of ASCR and HEP computing resources)

https://indico.fnal.gov/event/61971/contributions/281095/attachments/173755/235361/HEP-CCE_intro_dec23_AHM%5B1%5D.pdf
17
HEP-CCE project
(Diagram: HEP-CCE at the intersection of Advanced Scientific Computing Research (ASCR) and High Energy Physics (HEP) research and facilities, the Computational HEP program, and ASCR program cross-cuts)

HEP-CCE is a joint effort across the participating laboratories to bring large-scale computational and data management resources to bear on pressing HEP science problems
● Focus on enabling cross-cutting solutions leveraging ASCR and HEP expertise and resources
● Integrate individual strengths and resources

https://indico.fnal.gov/event/61971/contributions/281095/attachments/173755/235361/HEP-CCE_intro_dec23_AHM%5B1%5D.pdf
18
Tasks identified
● Ability for HEP workflows to exploit concurrency at the node level of HPC platforms
– Natural event-level parallelism is not enough
● Multiple compute platforms as a portability challenge
– Complex HEP code needs to run on grid/cloud/HPC systems in production
● I/O needs to scale to thousands of HPC nodes
– Not just for data but also for software libraries, databases, configurations (~100K files)
● Ability to run complex HEP workflows robustly on HPC systems
– Resilience challenge
● Mixed CPU/GPU pipelines will become more common as AI/ML methods are
increasingly incorporated
– HPC facilities offer unique large-scale AI/ML training opportunities

https://indico.fnal.gov/event/61971/contributions/281095/attachments/173755/235361/HEP-CCE_intro_dec23_AHM%5B1%5D.pdf

19
Portable Applications to Portable Workflows
● Address needs of large and small HEP workflows
● Hybrid CPU/GPU application support
● Current and promised future support of hardware
● Long-term sustainability, code stability, and performance

FastCaloSim: ATLAS parameterized LAr calorimeter simulation
Patatrack: CMS pixel detector reconstruction
Wirecell Toolkit: LArTPC signal simulation
p2r: CMS "propagate-to-R" track reconstruction

(Figures: performance comparisons from 2019 and 2023)

Charles Leggett's summary on PPS


20
Optimizing Data Storage
● HDF5 as intermediate event storage for HPC
– (Proto)DUNE (will) write its raw data in HDF5 format
● Identify I/O & storage bottlenecks of HEP applications
● Data management, data reduction/compression
● Optimized data delivery
– Parallel HDF5
– ROOT RNTuple & object storage (DAOS) with Darshan
– I/O activities related to ROOT files (ATLAS derivation)

(Figure: preliminary TTree vs RNTuple comparison)

Summary on PPS
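To illustrate the parallel HDF5 idea, here is a minimal sketch in which each MPI rank writes its own slice of one shared dataset through the MPI-IO driver; it assumes an MPI-enabled HDF5 build, and the file name, dataset name, and event counts are hypothetical rather than taken from DUNE or ATLAS code.

// Parallel HDF5 sketch: all ranks open one file collectively and each rank
// writes its own hyperslab of a shared 1-D dataset using collective I/O.
#include <hdf5.h>
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const hsize_t events_per_rank = 1000;                 // illustrative size
    std::vector<double> energies(events_per_rank, 1.0 * rank);

    // File access property list: use the MPI-IO driver
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("events.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    // One shared dataset covering the events of all ranks
    hsize_t dims[1] = {events_per_rank * static_cast<hsize_t>(size)};
    hid_t filespace = H5Screate_simple(1, dims, nullptr);
    hid_t dset = H5Dcreate2(file, "energies", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    // Select this rank's hyperslab of the dataset
    hsize_t offset[1] = {events_per_rank * static_cast<hsize_t>(rank)};
    hsize_t count[1]  = {events_per_rank};
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, nullptr, count, nullptr);
    hid_t memspace = H5Screate_simple(1, count, nullptr);

    // Collective write: all ranks take part in one I/O operation
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, energies.data());

    // Release handles and shut down
    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}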
21
Accelerating HEP Simulation
● Event generators for accelerated systems
– MadGraph (SYCL): portability and scalability (up to 500 nodes)
– Sherpa (CUDA, Kokkos): portability and scalability
● Celeritas: new MC particle transport code designed for high-performance simulation of complex HEP detectors on GPU-accelerated hardware
– Geometry modeling; CMS (CMSSW)
● Optical photon simulation
– Optical photon shower (JUNO neutrino detector); S.C. Blyth, "Opticks: GPU Optical Photon Simulation for Particle Physics with NVIDIA OptiX", CHEP 2018

Stefan Hoeche & Taylor Childers's summary on Event Generators


22
Scaling up HEP AI/ML Applications

● Hyperparameter scan (e.g. particle-flow HPO)
● Inference as a Service (IaaS)
● Large problems

ML@NERSC 2022 survey; Paolo Calafiura's talk on HEP-CCE ML scaling

23
Complex workflow
● Characterizes workflows across various dimensions
● Captures software and system use
● Identifies challenges for each experiment

Kyle Chard's summary on CW

24
Summary
● HPC plays a vital role in HEP research, enabling large-scale simulations, data analysis,
and theoretical calculations.
● Current applications include Monte Carlo simulations, data analysis from particle detectors, and event generation
– Providing excellent opportunities for utilizing large resources and acquiring advanced expertise
● Addressing challenges and leveraging advancements in HPC for groundbreaking discoveries in High Energy Physics

(Diagram: this intro covers the overlap of HPC and HEP)

25
Q&A

26
Backup

27
HPC system statistics

28
US HPC facilities
Two distinct types of HPC facilities in the US
▪ DOE Leadership-class Facilities (LCFs)
– Argonne National Lab, Oak Ridge National Lab
– Focused on FLOPS, GPU heavy
– Prefer jobs that can fill the entire machine
– Very locked down network - absolutely no WAN file transfer from the compute nodes

▪ User-focused facilities
– TACC, NERSC and so on
– Usually a mix of GPU-focused and CPU-focused machines
– WAN data transfer often allowed in jobs, but using data transfer nodes (DTNs) is preferred

29
HPC usage – Lattice QCD
Perlmutter benchmarking example

● Generation: lattices are propagated until they sample an equilibrium distribution of lattice configurations. Due to long equilibration and decorrelation times, lattice generation typically uses checkpoint-restart methods to split the generation stages into multiple jobs (see the sketch after this list).
● Spectrum: the generation stages continue after the lattice distribution has equilibrated, and a subset of the lattices (that have been written to disk) are periodically sampled to be analyzed by spectrum stages
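A minimal sketch of the checkpoint-restart pattern described above, assuming a generic Monte Carlo update loop; the state layout, file name, and sweep counts are illustrative and not taken from any real lattice QCD code.

// Checkpoint-restart sketch: the long generation run is split across many
// batch jobs, each one resuming from the most recently saved configuration.
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

struct Lattice {
    std::vector<double> links;   // placeholder for the gauge configuration
    long sweep = 0;              // update sweeps performed so far
};

bool load_checkpoint(Lattice& lat, const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;       // first job in the chain: nothing to resume
    in.read(reinterpret_cast<char*>(&lat.sweep), sizeof lat.sweep);
    in.read(reinterpret_cast<char*>(lat.links.data()),
            lat.links.size() * sizeof(double));
    return bool(in);
}

void save_checkpoint(const Lattice& lat, const std::string& path) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(&lat.sweep), sizeof lat.sweep);
    out.write(reinterpret_cast<const char*>(lat.links.data()),
              lat.links.size() * sizeof(double));
}

void update(Lattice& lat) { ++lat.sweep; /* one Monte Carlo sweep (stub) */ }

int main() {
    const std::string ckpt = "lattice.ckpt";   // hypothetical checkpoint file
    const long sweeps_per_job = 100;           // work that fits one job's walltime

    Lattice lat;
    lat.links.assign(1000, 0.0);               // illustrative lattice size
    if (load_checkpoint(lat, ckpt))
        std::cout << "Resuming from sweep " << lat.sweep << "\n";

    for (long i = 0; i < sweeps_per_job; ++i) update(lat);

    save_checkpoint(lat, ckpt);                // the next job picks up from here
    std::cout << "Stopped at sweep " << lat.sweep << "\n";
    return 0;
}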
30
Event Reconstruction – Tracking
▪ Traditionally, reconstruction software is very experiment-dependent
– detector geometry, calibrations, etc.
▪ Recent hardware evolution dictates common solutions to tackle the HL-LHC challenges
▪ A Common Tracking Software (ACTS) project develops an experiment-independent set of track reconstruction tools
– Provides high-level modules that can be used for any tracking detector
– Highly performant, yet largely customizable implementations
– Users from LHC, FCC, sPHENIX, EIC, …
▪ Key features:
– Tracking geometry description
– Simple event data model
– Common algorithms for:
• Track propagation and fitting, seed finding, vertexing

C. Rougier - ATLAS ITk Track Reconstruction with a GNN-based Pipeline
31
32
Programming models for Accelerators
● Kokkos: abstraction layer which supports CPUs, GPUs, etc., as well as various HPC architectures
● CUDA: parallel computing platform and programming model developed by NVIDIA
● SYCL: higher-level programming model for various hardware accelerators
● HIP: C++ runtime API and kernel language for creating portable applications for AMD and NVIDIA GPUs
● alpaka: header-only abstraction library for accelerator development
● std::par: parallel execution policies for C++ standard library algorithms (C++17); see the sketch below
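A minimal sketch of the std::par approach, assuming a C++17 standard library with parallel algorithm support (e.g. GCC/Clang linked against TBB, or nvc++ -stdpar for GPU offload); the data and the transformation are illustrative.

// Parallel STL sketch: std::execution::par asks the library to run the
// element-wise transform and the reduction across multiple threads
// (or on a GPU with offloading compilers).
#include <algorithm>
#include <cmath>
#include <execution>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> energy(1000000);
    std::iota(energy.begin(), energy.end(), 0.0);   // fill with 0, 1, 2, ...

    // Apply an element-wise "calibration" in parallel
    std::transform(std::execution::par, energy.begin(), energy.end(),
                   energy.begin(),
                   [](double e) { return std::sqrt(e) * 1.02; });

    // Parallel reduction of the calibrated values
    const double total =
        std::reduce(std::execution::par, energy.begin(), energy.end(), 0.0);

    std::cout << "Sum of calibrated energies: " << total << "\n";
    return 0;
}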

33
