SlideShare a Scribd company logo
Burst retrieval of data
from multiple Cloud regions for
Multi-Messenger Astrophysics
with IceCube
Igor Sfiligoi
UCSD/SDSC
Jensen Huang keynote
yesterday
2
The Largest Cloud Simulation in History
50k NVIDIA GPUs in the Cloud
350 Petaflops for 2 hours
Distributed across US, Europe & Asia
On Saturday morning we bought all GPU capacity that was for sale in
Amazon Web Services, Microsoft Azure, and Google Cloud Platform worldwide
Jensen Huang keynote
yesterday
3
The Largest Cloud Simulation in History
50k NVIDIA GPUs in the Cloud
350 Petaflops for 2 hours
Distributed across US, Europe & Asia
On Saturday morning we bought all GPU capacity that was for sale in
Amazon Web Services, Microsoft Azure, and Google Cloud Platform worldwide
About 20TBytes
of data produced
in the process
The Science Case
IceCube
5
A cubic kilometer of ice at the
south pole is instrumented
with 5160 optical sensors.
Astrophysics:
• Discovery of astrophysical neutrinos
• First evidence of neutrino point source (TXS)
• Cosmic rays with surface detector
Particle Physics:
• Atmospheric neutrino oscillation
• Neutrino cross sections at TeV scale
• New physics searches at highest energies
Earth Science:
• Glaciology
• Earth tomography
A facility with very
diverse science goals
Restrict this talk to
high energy Astrophysics
High Energy Astrophysics
Science case for IceCube
6
Universe is opaque to light
at highest energies and
distances.
Only gravitational waves
and neutrinos can pinpoint
most violent events in
universe.
Fortunately, highest energy
neutrinos are of cosmic origin.
Effectively “background free” as long
as energy is measured correctly.
High energy neutrinos from
outside the solar system
7
First 28 very high energy neutrinos from outside the solar system
Red curve is the photon flux
spectrum measured with the
Fermi satellite.
Black points show the
corresponding high energy
neutrino flux spectrum
measured by IceCube.
This demonstrates both the opaqueness of the universe to high energy
photons, and the ability of IceCube to detect neutrinos above the maximum
energy we can see light due to this opaqueness.
Science 342 (2013). DOI:
10.1126/science.1242856
Understanding the Origin
8
We now know high energy events happen in the universe. What are they?
p + g D + p + 0 p + g g
p + g D + n + + n + +
Co
Aya Ishihara
The hypothesis:
The same cosmic events produce
neutrinos and photons
We detect the electrons or muons from neutrino that interact in the ice.
Neutrino interact very weakly => need a very large array of ice instrumented
to maximize chances that a cosmic neutrino interacts inside the detector.
Need pointing accuracy to point back to origin of neutrino.
Telescopes the world over then try to identify the source in the direction
IceCube is pointing to for the neutrino.
Multi-messenger Astrophysics
The ν detection challenge
9
Optical Pro
Aya Ishiha
Combining all the possible info
These features are included in
We re al a s be de eloping h
Nature never tell us a perfec
satisfactory agreem
Ice properties change with
depth and wavelength
Observed pointing resolution at high
energies is systematics limited.
Central value moves
for different ice models
Improved e and τ reconstruction
Þ increased neutrino flux
detection
Þ more observations
Photon propagation through
ice runs efficiently on single
precision GPU.
Detailed simulation campaigns
to improve pointing resolution
by improving ice model.
Improvement in reconstruction with
better ice model near the detectors
First evidence of an origin
10
First location of a source of very high energy neutrinos.
Neutrino produced high energy muon
near IceCube. Muon produced light as it
traverses IceCube volume. Light is
detected by array of phototubes of
IceCube.
IceCube alerted the astronomy community of the
observation of a single high energy neutrino on
September 22 2017.
A blazar designated by astronomers as TXS
0506+056 was subsequently identified as most likely
source in the direction IceCube was pointing. Multiple
telescopes saw light from TXS at the same time
IceCube saw the neutrino.
Science 361, 147-151
(2018). DOI:10.1126/science.aat2890
IceCube’s Future Plans
11
| IceCube Upgrade and Gen2 | Summer Blot | TeVPA 2018
The IceCube-Gen2 Facility
Preliminary timeline
MeV- to EeV-scale physics
Surface array
High Energy
Array
Radio array
PINGU
IC86
2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 … 2032
Today
Surface air shower
ConstructionR&D Design & Approval
IceCube Upgrade
IceCube Upgrade
Deployment
Near term:
add more phototubes to deep core to increase granularity of measurements.
Longer term:
• Extend instrumented
volume at smaller
granularity.
• Extend even smaller
granularity deep core
volume.
• Add surface array.
Improve detector for low & high energy neutrinos
Details on the Cloud Burst
The Idea
• Integrate all GPUs available for sale
worldwide into a single HTCondor pool.
- use 28 regions across AWS, Azure, and Google
Cloud for a burst of a couple hours, or so.
• IceCube submits their photon propagation
workflow to this HTCondor pool.
- we handle the input, the jobs on the GPUs, and
the output as a single globally distributed system.
13
Run a GPU burst relevant in scale
for future Exascale HPC systems.
A global HTCondor pool
• IceCube, like all OSG user communities, relies on
HTCondor for resource orchestration
- This demo used the standard tools
• Dedicated HW setup
- Avoid disruption of OSG production system
- Optimize HTCondor setup for the spiky nature of the demo
§ multiple schedds for IceCube to submit to
§ collecting resources in each cloud region, then collecting from all
regions into global pool
14
HTCondor Distributed CI
15
Collector
Collector Collector
Collector
Collector
Negotiator
Scheduler SchedulerScheduler
IceCube
VM
VM
VM
10 schedd’s
One global resource pool
Using native Cloud storage
• Input data pre-staged into native Cloud storage
- Each file in one-to-few Cloud regions
§ some replication to deal with limited predictability of resources per region
- Local to Compute for large regions for maximum throughput
- Reading from “close” region for smaller ones to minimize ops
• Output staged back to region-local Cloud storage
- To be transferred back asynchronously after the compute is done
• Deployed simple wrappers around Cloud native file
transfer tools
- IceCube jobs do not need to customize for different Clouds
- They just need to know where input data is available
(pretty standard OSG operation mode)
16
Using native Cloud storage
• Input data pre-staged into native Cloud storage
- Each file in one-to-few Cloud regions
§ some replication to deal with limited predictability of resources per region
- Local to Compute for large regions for maximum throughput
- Reading from “close” region for smaller ones to minimize ops
• Output staged back to region-local Cloud storage
- To be transferred back asynchronously after the compute is done
• Deployed simple wrappers around Cloud native file
transfer tools
- IceCube jobs do not need to customize for different Clouds
- They just need to know where input data is available
(pretty standard OSG operation mode)
17
Done at a
leisurely pace
Using native Cloud storage
• Input data pre-staged into native Cloud storage
- Each file in one-to-few Cloud regions
§ some replication to deal with limited predictability of resources per region
- Local to Compute for large regions for maximum throughput
- Reading from “close” region for smaller ones to minimize ops
• Output staged back to region-local Cloud storage
- To be transferred back asynchronously after the compute is done
• Deployed simple wrappers around Cloud native file
transfer tools
- IceCube jobs do not need to customize for different Clouds
- They just need to know where input data is available
(pretty standard OSG operation mode)
18
The focus
of this talk
Science with 50k GPUs
achieved as peak performance
19
Time in Minutes
Each color is a different
cloud region in US, EU, or Asia.
Total of 28 Regions in use.
Peaked at about 50k GPUs
~350 Petaflops of fp32
8 generations of NVIDIA GPUs used.
A Heterogenous Resource Pool
20
28 cloud Regions across 4 world regions
providing us with 8 GPU generations.
No one region or GPU type dominates!
Science Produced
21
Distributed High-Throughput
Computing (dHTC) paradigm
implemented via HTCondor provides
global resource aggregation.
Largest cloud region provided 10.8% of the total
dHTC paradigm can aggregate
on-prem anywhere
HPC at any scale
and multiple clouds
Data Produced
22
Size of the data created
was proportional
to the events processed
Largest cloud region provided 10.8% of the total
Just as distributed as
the compute has been
About 20 TB total
Getting the data out of the Clouds
Timeline
• IceCube is actually in no hurry in getting the
data out of the Clouds
- Sooner is of course better
- But not time critical
• But Cloud great for urgent computing
- And there getting the data promptly out
would be as important as getting
the compute done in the first place
24
LIGO example
• The LIGO is the other MMA experiment that
can be used to detect large Cosmic events
and point other Astronomy observations
• They are currently limited by compute on
how accurate their pointing is
- More compute would mean better pointing
- Must must be prompt
25
LIGO example
• The LIGO is the other MMA experiment that
can be used to detect large Cosmic events
and point other Astronomy observations
• They are currently limited by compute on
how accurate their pointing is
- More compute would mean better pointing
- Must must be prompt
26
20k GPUs for 30 mins with a 30min ramp-
up gets us into the regime where we can
reasonably run a multi-approximant/multi-
EOS analysis to dramatically improve
confidence in probability of an EM counter
part in ~1 hour, so that classifications are
as accurate as they're going to get before
an optical counterpart fades
James Clark, LIGO
Demonstrating a Burt Transfer
• We thus decided to move
~10 TB of the data
back from the Clouds
in a short burst
- 10 TB dictated by the available storage options
• Trying two options
- Directly to UW using many commodity nodes
- Stage to a Internet2 DTN
27
UW commodity setup
• We fully expected to be disk I/O bound
- Single spinning disk per node
• We managed to secure 30 nodes
for the purpose
28
UW commodity setup
• Managed to transfer about
9 TB in 90 minutes
29
UW commodity setup
• About 16 Gbps aggregate bandwidth
- But huge variations between Cloud regions
- 3.5Gbps from best, <0.5 Gbps from worst
30
Internet2 DTN
• Wanted to see how a single high-end node
with flash-based storage would fare
• We also had previous network
measurements that suggested that we may
be able to beat the 30-node US setup
- See my CHEP19 talk, if interested
http://chep2019.org
31
Network measurements
32
US East
US West 2
35 Gbps
36 Gbps
33 Gbps
36 Gbps
AWS
From Cloud storage
/dev/shm
Network measurements
33
US East
US West 2
36 Gbps
31 Gbps
27 Gbps
29 Gbps
Azure
From Cloud storage
/dev/shm
Network measurements
34
US East 1
36 Gbps
US West 1
35 Gbps
Google Cloud
From Cloud storage
/dev/shm
Internet2 DTN
• Took about 2 hours to transfer 2 TB
- We did not beat UW
35
Internet2 DTN
• Peaked at slightly less than 10 Gbps
- Likely limited by the storage
• Again, huge differences in performance
between Cloud regions
36
Summary
• Large scale cloud computing is feasible
- We almost matched Summit in FLOP32s
- And can be ramped up very fast
• Getting data between on-prem and Cloud
not a big deal either
- We exceeded 10 Gbps while going
to virtually all Cloud regions
- But needs adequate on-prem capabilities
37
Acknowledgements
• Internet2 was the main network provider for
this activity.
• This work was partially sponsored by
NSF grants OAC-1941481,
MPS-1148698, OAC-1841530 and
OAC-1826967.
38
Ad

More Related Content

What's hot (19)

Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobs
Igor Sfiligoi
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...
Igor Sfiligoi
 
Near Exascale Computing in the Cloud
Near Exascale Computing in the CloudNear Exascale Computing in the Cloud
Near Exascale Computing in the Cloud
Frank Wuerthwein
 
Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FO...
Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FO...Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FO...
Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FO...
Rob Emanuele
 
The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack Nordic
Tim Bell
 
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTechGeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
Rob Emanuele
 
inGeneoS: Intercontinental Genetic sequencing over trans-Pacific networks and...
inGeneoS: Intercontinental Genetic sequencing over trans-Pacific networks and...inGeneoS: Intercontinental Genetic sequencing over trans-Pacific networks and...
inGeneoS: Intercontinental Genetic sequencing over trans-Pacific networks and...
Andrew Howard
 
20170926 cern cloud v4
20170926 cern cloud v420170926 cern cloud v4
20170926 cern cloud v4
Tim Bell
 
Federated HPC Clouds applied to Radiation Therapy
Federated HPC Clouds applied to Radiation TherapyFederated HPC Clouds applied to Radiation Therapy
Federated HPC Clouds applied to Radiation Therapy
CESGA Centro de Supercomputación de Galicia
 
OpenStack @ CERN, by Tim Bell
OpenStack @ CERN, by Tim BellOpenStack @ CERN, by Tim Bell
OpenStack @ CERN, by Tim Bell
Amrita Prasad
 
OpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspectiveOpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspective
Tim Bell
 
20150924 rda federation_v1
20150924 rda federation_v120150924 rda federation_v1
20150924 rda federation_v1
Tim Bell
 
How a Particle Accelerator Monitors Scientific Experiments Using InfluxDB
How a Particle Accelerator Monitors Scientific Experiments Using InfluxDBHow a Particle Accelerator Monitors Scientific Experiments Using InfluxDB
How a Particle Accelerator Monitors Scientific Experiments Using InfluxDB
InfluxData
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
Tim Bell
 
20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN Barcelona20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN Barcelona
Tim Bell
 
Stabilising the jenga tower
Stabilising the jenga towerStabilising the jenga tower
Stabilising the jenga tower
Gordon Chung
 
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
NECST Lab @ Politecnico di Milano
 
CERN OpenStack Cloud Control Plane - From VMs to K8s
CERN OpenStack Cloud Control Plane - From VMs to K8sCERN OpenStack Cloud Control Plane - From VMs to K8s
CERN OpenStack Cloud Control Plane - From VMs to K8s
Belmiro Moreira
 
Toward a National Research Platform
Toward a National Research PlatformToward a National Research Platform
Toward a National Research Platform
Larry Smarr
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobs
Igor Sfiligoi
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...
Igor Sfiligoi
 
Near Exascale Computing in the Cloud
Near Exascale Computing in the CloudNear Exascale Computing in the Cloud
Near Exascale Computing in the Cloud
Frank Wuerthwein
 
Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FO...
Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FO...Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FO...
Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FO...
Rob Emanuele
 
The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack Nordic
Tim Bell
 
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTechGeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
Rob Emanuele
 
inGeneoS: Intercontinental Genetic sequencing over trans-Pacific networks and...
inGeneoS: Intercontinental Genetic sequencing over trans-Pacific networks and...inGeneoS: Intercontinental Genetic sequencing over trans-Pacific networks and...
inGeneoS: Intercontinental Genetic sequencing over trans-Pacific networks and...
Andrew Howard
 
20170926 cern cloud v4
20170926 cern cloud v420170926 cern cloud v4
20170926 cern cloud v4
Tim Bell
 
OpenStack @ CERN, by Tim Bell
OpenStack @ CERN, by Tim BellOpenStack @ CERN, by Tim Bell
OpenStack @ CERN, by Tim Bell
Amrita Prasad
 
OpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspectiveOpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspective
Tim Bell
 
20150924 rda federation_v1
20150924 rda federation_v120150924 rda federation_v1
20150924 rda federation_v1
Tim Bell
 
How a Particle Accelerator Monitors Scientific Experiments Using InfluxDB
How a Particle Accelerator Monitors Scientific Experiments Using InfluxDBHow a Particle Accelerator Monitors Scientific Experiments Using InfluxDB
How a Particle Accelerator Monitors Scientific Experiments Using InfluxDB
InfluxData
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
Tim Bell
 
20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN Barcelona20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN Barcelona
Tim Bell
 
Stabilising the jenga tower
Stabilising the jenga towerStabilising the jenga tower
Stabilising the jenga tower
Gordon Chung
 
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
NECST Lab @ Politecnico di Milano
 
CERN OpenStack Cloud Control Plane - From VMs to K8s
CERN OpenStack Cloud Control Plane - From VMs to K8sCERN OpenStack Cloud Control Plane - From VMs to K8s
CERN OpenStack Cloud Control Plane - From VMs to K8s
Belmiro Moreira
 
Toward a National Research Platform
Toward a National Research PlatformToward a National Research Platform
Toward a National Research Platform
Larry Smarr
 

Similar to Burst data retrieval after 50k GPU Cloud run (20)

Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
InfluxData
 
Toward a Global Interactive Earth Observing Cyberinfrastructure
Toward a Global Interactive Earth Observing CyberinfrastructureToward a Global Interactive Earth Observing Cyberinfrastructure
Toward a Global Interactive Earth Observing Cyberinfrastructure
Larry Smarr
 
Detecting solar farms with deep learning
Detecting solar farms with deep learningDetecting solar farms with deep learning
Detecting solar farms with deep learning
Jason Brown
 
Accelerating Astronomical Discoveries with Apache Spark
Accelerating Astronomical Discoveries with Apache SparkAccelerating Astronomical Discoveries with Apache Spark
Accelerating Astronomical Discoveries with Apache Spark
Databricks
 
The Emerging Cyberinfrastructure for Earth and Ocean Sciences
The Emerging Cyberinfrastructure for Earth and Ocean SciencesThe Emerging Cyberinfrastructure for Earth and Ocean Sciences
The Emerging Cyberinfrastructure for Earth and Ocean Sciences
Larry Smarr
 
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Databricks
 
NASA Advanced Computing Environment for Science & Engineering
NASA Advanced Computing Environment for Science & EngineeringNASA Advanced Computing Environment for Science & Engineering
NASA Advanced Computing Environment for Science & Engineering
inside-BigData.com
 
How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental science
inside-BigData.com
 
Science and Cyberinfrastructure in the Data-Dominated Era
Science and Cyberinfrastructure in the Data-Dominated EraScience and Cyberinfrastructure in the Data-Dominated Era
Science and Cyberinfrastructure in the Data-Dominated Era
Larry Smarr
 
The Pacific Research Platform
 Two Years In
The Pacific Research Platform
 Two Years InThe Pacific Research Platform
 Two Years In
The Pacific Research Platform
 Two Years In
Larry Smarr
 
CLIM Program: Remote Sensing Workshop, Optimization Methods in Remote Sensing...
CLIM Program: Remote Sensing Workshop, Optimization Methods in Remote Sensing...CLIM Program: Remote Sensing Workshop, Optimization Methods in Remote Sensing...
CLIM Program: Remote Sensing Workshop, Optimization Methods in Remote Sensing...
The Statistical and Applied Mathematical Sciences Institute
 
NASA's Movement Towards Cloud Computing
NASA's Movement Towards Cloud ComputingNASA's Movement Towards Cloud Computing
NASA's Movement Towards Cloud Computing
Software & Information Industry Association
 
The OpenStack Cloud at CERN
The OpenStack Cloud at CERNThe OpenStack Cloud at CERN
The OpenStack Cloud at CERN
Arne Wiebalck
 
LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a...
LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a...LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a...
LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a...
Larry Smarr
 
The Academic and R&D Sectors' Current and Future Broadband and Fiber Access N...
The Academic and R&D Sectors' Current and Future Broadband and Fiber Access N...The Academic and R&D Sectors' Current and Future Broadband and Fiber Access N...
The Academic and R&D Sectors' Current and Future Broadband and Fiber Access N...
Larry Smarr
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Databricks
 
Project StarGate An End-to-End 10Gbps HPC to User Cyberinfrastructure ANL * C...
Project StarGate An End-to-End 10Gbps HPC to User Cyberinfrastructure ANL * C...Project StarGate An End-to-End 10Gbps HPC to User Cyberinfrastructure ANL * C...
Project StarGate An End-to-End 10Gbps HPC to User Cyberinfrastructure ANL * C...
Larry Smarr
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
Robert Grossman
 
HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores 
inside-BigData.com
 
ESCAPE Kick-off meeting - KM3Net, Opening a new window on our universe (Feb 2...
ESCAPE Kick-off meeting - KM3Net, Opening a new window on our universe (Feb 2...ESCAPE Kick-off meeting - KM3Net, Opening a new window on our universe (Feb 2...
ESCAPE Kick-off meeting - KM3Net, Opening a new window on our universe (Feb 2...
ESCAPE EU
 
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
InfluxData
 
Toward a Global Interactive Earth Observing Cyberinfrastructure
Toward a Global Interactive Earth Observing CyberinfrastructureToward a Global Interactive Earth Observing Cyberinfrastructure
Toward a Global Interactive Earth Observing Cyberinfrastructure
Larry Smarr
 
Detecting solar farms with deep learning
Detecting solar farms with deep learningDetecting solar farms with deep learning
Detecting solar farms with deep learning
Jason Brown
 
Accelerating Astronomical Discoveries with Apache Spark
Accelerating Astronomical Discoveries with Apache SparkAccelerating Astronomical Discoveries with Apache Spark
Accelerating Astronomical Discoveries with Apache Spark
Databricks
 
The Emerging Cyberinfrastructure for Earth and Ocean Sciences
The Emerging Cyberinfrastructure for Earth and Ocean SciencesThe Emerging Cyberinfrastructure for Earth and Ocean Sciences
The Emerging Cyberinfrastructure for Earth and Ocean Sciences
Larry Smarr
 
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Databricks
 
NASA Advanced Computing Environment for Science & Engineering
NASA Advanced Computing Environment for Science & EngineeringNASA Advanced Computing Environment for Science & Engineering
NASA Advanced Computing Environment for Science & Engineering
inside-BigData.com
 
How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental science
inside-BigData.com
 
Science and Cyberinfrastructure in the Data-Dominated Era
Science and Cyberinfrastructure in the Data-Dominated EraScience and Cyberinfrastructure in the Data-Dominated Era
Science and Cyberinfrastructure in the Data-Dominated Era
Larry Smarr
 
The Pacific Research Platform
 Two Years In
The Pacific Research Platform
 Two Years InThe Pacific Research Platform
 Two Years In
The Pacific Research Platform
 Two Years In
Larry Smarr
 
The OpenStack Cloud at CERN
The OpenStack Cloud at CERNThe OpenStack Cloud at CERN
The OpenStack Cloud at CERN
Arne Wiebalck
 
LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a...
LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a...LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a...
LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a...
Larry Smarr
 
The Academic and R&D Sectors' Current and Future Broadband and Fiber Access N...
The Academic and R&D Sectors' Current and Future Broadband and Fiber Access N...The Academic and R&D Sectors' Current and Future Broadband and Fiber Access N...
The Academic and R&D Sectors' Current and Future Broadband and Fiber Access N...
Larry Smarr
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Databricks
 
Project StarGate An End-to-End 10Gbps HPC to User Cyberinfrastructure ANL * C...
Project StarGate An End-to-End 10Gbps HPC to User Cyberinfrastructure ANL * C...Project StarGate An End-to-End 10Gbps HPC to User Cyberinfrastructure ANL * C...
Project StarGate An End-to-End 10Gbps HPC to User Cyberinfrastructure ANL * C...
Larry Smarr
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
Robert Grossman
 
HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores 
inside-BigData.com
 
ESCAPE Kick-off meeting - KM3Net, Opening a new window on our universe (Feb 2...
ESCAPE Kick-off meeting - KM3Net, Opening a new window on our universe (Feb 2...ESCAPE Kick-off meeting - KM3Net, Opening a new window on our universe (Feb 2...
ESCAPE Kick-off meeting - KM3Net, Opening a new window on our universe (Feb 2...
ESCAPE EU
 
Ad

More from Igor Sfiligoi (20)

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYRO
Igor Sfiligoi
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
Igor Sfiligoi
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...
Igor Sfiligoi
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accounting
Igor Sfiligoi
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resources
Igor Sfiligoi
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rate
Igor Sfiligoi
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Igor Sfiligoi
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance compute
Igor Sfiligoi
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Igor Sfiligoi
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYRO
Igor Sfiligoi
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with Admiralty
Igor Sfiligoi
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACC
Igor Sfiligoi
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUs
Igor Sfiligoi
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public Clouds
Igor Sfiligoi
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud links
Igor Sfiligoi
 
Bursting into the public Cloud - Sharing my experience doing it at large scal...
Bursting into the public Cloud - Sharing my experience doing it at large scal...Bursting into the public Cloud - Sharing my experience doing it at large scal...
Bursting into the public Cloud - Sharing my experience doing it at large scal...
Igor Sfiligoi
 
Demonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the CloudsDemonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the Clouds
Igor Sfiligoi
 
Serving HTC Users in Kubernetes by Leveraging HTCondor
Serving HTC Users in Kubernetes by Leveraging HTCondorServing HTC Users in Kubernetes by Leveraging HTCondor
Serving HTC Users in Kubernetes by Leveraging HTCondor
Igor Sfiligoi
 
Characterizing network paths in and out of the Clouds
Characterizing network paths in and out of the CloudsCharacterizing network paths in and out of the Clouds
Characterizing network paths in and out of the Clouds
Igor Sfiligoi
 
GRP 19 - Nautilus, IceCube and LIGO
GRP 19 - Nautilus, IceCube and LIGOGRP 19 - Nautilus, IceCube and LIGO
GRP 19 - Nautilus, IceCube and LIGO
Igor Sfiligoi
 
Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYRO
Igor Sfiligoi
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
Igor Sfiligoi
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...
Igor Sfiligoi
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accounting
Igor Sfiligoi
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resources
Igor Sfiligoi
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rate
Igor Sfiligoi
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Igor Sfiligoi
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance compute
Igor Sfiligoi
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Igor Sfiligoi
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYRO
Igor Sfiligoi
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with Admiralty
Igor Sfiligoi
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACC
Igor Sfiligoi
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUs
Igor Sfiligoi
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public Clouds
Igor Sfiligoi
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud links
Igor Sfiligoi
 
Bursting into the public Cloud - Sharing my experience doing it at large scal...
Bursting into the public Cloud - Sharing my experience doing it at large scal...Bursting into the public Cloud - Sharing my experience doing it at large scal...
Bursting into the public Cloud - Sharing my experience doing it at large scal...
Igor Sfiligoi
 
Demonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the CloudsDemonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the Clouds
Igor Sfiligoi
 
Serving HTC Users in Kubernetes by Leveraging HTCondor
Serving HTC Users in Kubernetes by Leveraging HTCondorServing HTC Users in Kubernetes by Leveraging HTCondor
Serving HTC Users in Kubernetes by Leveraging HTCondor
Igor Sfiligoi
 
Characterizing network paths in and out of the Clouds
Characterizing network paths in and out of the CloudsCharacterizing network paths in and out of the Clouds
Characterizing network paths in and out of the Clouds
Igor Sfiligoi
 
GRP 19 - Nautilus, IceCube and LIGO
GRP 19 - Nautilus, IceCube and LIGOGRP 19 - Nautilus, IceCube and LIGO
GRP 19 - Nautilus, IceCube and LIGO
Igor Sfiligoi
 
Ad

Recently uploaded (20)

#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 

Burst data retrieval after 50k GPU Cloud run

  • 1. Burst retrieval of data from multiple Cloud regions for Multi-Messenger Astrophysics with IceCube Igor Sfiligoi UCSD/SDSC
  • 2. Jensen Huang keynote yesterday 2 The Largest Cloud Simulation in History 50k NVIDIA GPUs in the Cloud 350 Petaflops for 2 hours Distributed across US, Europe & Asia On Saturday morning we bought all GPU capacity that was for sale in Amazon Web Services, Microsoft Azure, and Google Cloud Platform worldwide
  • 3. Jensen Huang keynote yesterday 3 The Largest Cloud Simulation in History 50k NVIDIA GPUs in the Cloud 350 Petaflops for 2 hours Distributed across US, Europe & Asia On Saturday morning we bought all GPU capacity that was for sale in Amazon Web Services, Microsoft Azure, and Google Cloud Platform worldwide About 20TBytes of data produced in the process
  • 5. IceCube 5 A cubic kilometer of ice at the south pole is instrumented with 5160 optical sensors. Astrophysics: • Discovery of astrophysical neutrinos • First evidence of neutrino point source (TXS) • Cosmic rays with surface detector Particle Physics: • Atmospheric neutrino oscillation • Neutrino cross sections at TeV scale • New physics searches at highest energies Earth Science: • Glaciology • Earth tomography A facility with very diverse science goals Restrict this talk to high energy Astrophysics
  • 6. High Energy Astrophysics Science case for IceCube 6 Universe is opaque to light at highest energies and distances. Only gravitational waves and neutrinos can pinpoint most violent events in universe. Fortunately, highest energy neutrinos are of cosmic origin. Effectively “background free” as long as energy is measured correctly.
  • 7. High energy neutrinos from outside the solar system 7 First 28 very high energy neutrinos from outside the solar system Red curve is the photon flux spectrum measured with the Fermi satellite. Black points show the corresponding high energy neutrino flux spectrum measured by IceCube. This demonstrates both the opaqueness of the universe to high energy photons, and the ability of IceCube to detect neutrinos above the maximum energy we can see light due to this opaqueness. Science 342 (2013). DOI: 10.1126/science.1242856
  • 8. Understanding the Origin 8 We now know high energy events happen in the universe. What are they? p + g D + p + 0 p + g g p + g D + n + + n + + Co Aya Ishihara The hypothesis: The same cosmic events produce neutrinos and photons We detect the electrons or muons from neutrino that interact in the ice. Neutrino interact very weakly => need a very large array of ice instrumented to maximize chances that a cosmic neutrino interacts inside the detector. Need pointing accuracy to point back to origin of neutrino. Telescopes the world over then try to identify the source in the direction IceCube is pointing to for the neutrino. Multi-messenger Astrophysics
  • 9. The ν detection challenge 9 Optical Pro Aya Ishiha Combining all the possible info These features are included in We re al a s be de eloping h Nature never tell us a perfec satisfactory agreem Ice properties change with depth and wavelength Observed pointing resolution at high energies is systematics limited. Central value moves for different ice models Improved e and τ reconstruction Þ increased neutrino flux detection Þ more observations Photon propagation through ice runs efficiently on single precision GPU. Detailed simulation campaigns to improve pointing resolution by improving ice model. Improvement in reconstruction with better ice model near the detectors
  • 10. First evidence of an origin 10 First location of a source of very high energy neutrinos. Neutrino produced high energy muon near IceCube. Muon produced light as it traverses IceCube volume. Light is detected by array of phototubes of IceCube. IceCube alerted the astronomy community of the observation of a single high energy neutrino on September 22 2017. A blazar designated by astronomers as TXS 0506+056 was subsequently identified as most likely source in the direction IceCube was pointing. Multiple telescopes saw light from TXS at the same time IceCube saw the neutrino. Science 361, 147-151 (2018). DOI:10.1126/science.aat2890
  • 11. IceCube’s Future Plans 11 | IceCube Upgrade and Gen2 | Summer Blot | TeVPA 2018 The IceCube-Gen2 Facility Preliminary timeline MeV- to EeV-scale physics Surface array High Energy Array Radio array PINGU IC86 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 … 2032 Today Surface air shower ConstructionR&D Design & Approval IceCube Upgrade IceCube Upgrade Deployment Near term: add more phototubes to deep core to increase granularity of measurements. Longer term: • Extend instrumented volume at smaller granularity. • Extend even smaller granularity deep core volume. • Add surface array. Improve detector for low & high energy neutrinos
  • 12. Details on the Cloud Burst
  • 13. The Idea • Integrate all GPUs available for sale worldwide into a single HTCondor pool. - use 28 regions across AWS, Azure, and Google Cloud for a burst of a couple hours, or so. • IceCube submits their photon propagation workflow to this HTCondor pool. - we handle the input, the jobs on the GPUs, and the output as a single globally distributed system. 13 Run a GPU burst relevant in scale for future Exascale HPC systems.
  • 14. A global HTCondor pool • IceCube, like all OSG user communities, relies on HTCondor for resource orchestration - This demo used the standard tools • Dedicated HW setup - Avoid disruption of OSG production system - Optimize HTCondor setup for the spiky nature of the demo § multiple schedds for IceCube to submit to § collecting resources in each cloud region, then collecting from all regions into global pool 14
  • 15. HTCondor Distributed CI 15 Collector Collector Collector Collector Collector Negotiator Scheduler SchedulerScheduler IceCube VM VM VM 10 schedd’s One global resource pool
  • 16. Using native Cloud storage • Input data pre-staged into native Cloud storage - Each file in one-to-few Cloud regions § some replication to deal with limited predictability of resources per region - Local to Compute for large regions for maximum throughput - Reading from “close” region for smaller ones to minimize ops • Output staged back to region-local Cloud storage - To be transferred back asynchronously after the compute is done • Deployed simple wrappers around Cloud native file transfer tools - IceCube jobs do not need to customize for different Clouds - They just need to know where input data is available (pretty standard OSG operation mode) 16
  • 17. Using native Cloud storage • Input data pre-staged into native Cloud storage - Each file in one-to-few Cloud regions § some replication to deal with limited predictability of resources per region - Local to Compute for large regions for maximum throughput - Reading from “close” region for smaller ones to minimize ops • Output staged back to region-local Cloud storage - To be transferred back asynchronously after the compute is done • Deployed simple wrappers around Cloud native file transfer tools - IceCube jobs do not need to customize for different Clouds - They just need to know where input data is available (pretty standard OSG operation mode) 17 Done at a leisurely pace
  • 18. Using native Cloud storage • Input data pre-staged into native Cloud storage - Each file in one-to-few Cloud regions § some replication to deal with limited predictability of resources per region - Local to Compute for large regions for maximum throughput - Reading from “close” region for smaller ones to minimize ops • Output staged back to region-local Cloud storage - To be transferred back asynchronously after the compute is done • Deployed simple wrappers around Cloud native file transfer tools - IceCube jobs do not need to customize for different Clouds - They just need to know where input data is available (pretty standard OSG operation mode) 18 The focus of this talk
  • 19. Science with 50k GPUs achieved as peak performance 19 Time in Minutes Each color is a different cloud region in US, EU, or Asia. Total of 28 Regions in use. Peaked at about 50k GPUs ~350 Petaflops of fp32 8 generations of NVIDIA GPUs used.
  • 20. A Heterogenous Resource Pool 20 28 cloud Regions across 4 world regions providing us with 8 GPU generations. No one region or GPU type dominates!
  • 21. Science Produced 21 Distributed High-Throughput Computing (dHTC) paradigm implemented via HTCondor provides global resource aggregation. Largest cloud region provided 10.8% of the total dHTC paradigm can aggregate on-prem anywhere HPC at any scale and multiple clouds
  • 22. Data Produced 22 Size of the data created was proportional to the events processed Largest cloud region provided 10.8% of the total Just as distributed as the compute has been About 20 TB total
  • 23. Getting the data out of the Clouds
  • 24. Timeline • IceCube is actually in no hurry in getting the data out of the Clouds - Sooner is of course better - But not time critical • But Cloud great for urgent computing - And there getting the data promptly out would be as important as getting the compute done in the first place 24
  • 25. LIGO example • The LIGO is the other MMA experiment that can be used to detect large Cosmic events and point other Astronomy observations • They are currently limited by compute on how accurate their pointing is - More compute would mean better pointing - Must must be prompt 25
  • 26. LIGO example • The LIGO is the other MMA experiment that can be used to detect large Cosmic events and point other Astronomy observations • They are currently limited by compute on how accurate their pointing is - More compute would mean better pointing - Must must be prompt 26 20k GPUs for 30 mins with a 30min ramp- up gets us into the regime where we can reasonably run a multi-approximant/multi- EOS analysis to dramatically improve confidence in probability of an EM counter part in ~1 hour, so that classifications are as accurate as they're going to get before an optical counterpart fades James Clark, LIGO
  • 27. Demonstrating a Burt Transfer • We thus decided to move ~10 TB of the data back from the Clouds in a short burst - 10 TB dictated by the available storage options • Trying two options - Directly to UW using many commodity nodes - Stage to a Internet2 DTN 27
  • 28. UW commodity setup • We fully expected to be disk I/O bound - Single spinning disk per node • We managed to secure 30 nodes for the purpose 28
  • 29. UW commodity setup • Managed to transfer about 9 TB in 90 minutes 29
  • 30. UW commodity setup • About 16 Gbps aggregate bandwidth - But huge variations between Cloud regions - 3.5Gbps from best, <0.5 Gbps from worst 30
  • 31. Internet2 DTN • Wanted to see how a single high-end node with flash-based storage would fare • We also had previous network measurements that suggested that we may be able to beat the 30-node US setup - See my CHEP19 talk, if interested http://chep2019.org 31
  • 32. Network measurements 32 US East US West 2 35 Gbps 36 Gbps 33 Gbps 36 Gbps AWS From Cloud storage /dev/shm
  • 33. Network measurements 33 US East US West 2 36 Gbps 31 Gbps 27 Gbps 29 Gbps Azure From Cloud storage /dev/shm
  • 34. Network measurements 34 US East 1 36 Gbps US West 1 35 Gbps Google Cloud From Cloud storage /dev/shm
  • 35. Internet2 DTN • Took about 2 hours to transfer 2 TB - We did not beat UW 35
  • 36. Internet2 DTN • Peaked at slightly less than 10 Gbps - Likely limited by the storage • Again, huge differences in performance between Cloud regions 36
  • 37. Summary • Large scale cloud computing is feasible - We almost matched Summit in FLOP32s - And can be ramped up very fast • Getting data between on-prem and Cloud not a big deal either - We exceeded 10 Gbps while going to virtually all Cloud regions - But needs adequate on-prem capabilities 37
  • 38. Acknowledgements • Internet2 was the main network provider for this activity. • This work was partially sponsored by NSF grants OAC-1941481, MPS-1148698, OAC-1841530 and OAC-1826967. 38