WE CREATE KNOWLEDGE – TODAY FOR TOMORROW
OpenCAPI-based image analysis pipeline for 18 GB/s kHz-framerate X-ray camera at the SLS synchrotron
Filip Leonarski :: Beamline Data Scientist :: Macromolecular Crystallography
Page 1
• Introduction: Macromolecular crystallography at synchrotrons and X-ray detectors
• Technology: POWER + OpenCAPI
• Solution: Jungfraujoch
Plan
Page 2
1901 Nobel Prize: W. Röntgen, discovery of X-rays
1962 Nobel Prize: F. Crick, J. Watson and M. Wilkins, structure of DNA double helix solved with X-rays (Photo 51 by R. Gosling and R. Franklin)
2009 Nobel Prize: V. Ramakrishnan*, T. Steitz, A. Yonath*, structure of ribosome
(*) some of their structures were solved at PSI
Wikipedia:
X-ray crystallography is the experimental science determining the atomic and molecular structure of a crystal, in which the crystalline structure causes a beam of incident X-rays to diffract into many specific directions. By measuring the angles and intensities of these diffracted beams, a crystallographer can produce a three-dimensional picture of the density of electrons within the crystal.
X-ray macromolecular crystallography (MX)
Page 6
• Particle accelerators are the source of the brightest X-ray beams (multiple orders of magnitude brighter than conventional X-ray tubes); the X-rays are emitted when charged particles travel through a magnetic field
- The effect is a nuisance for high energy physics (undesirable energy loss),
- but it is a blessing for structural science => modern storage rings are built exclusively as light sources.
• Synchrotrons provide a continuous X-ray beam, while X-ray free electron lasers produce bright, femtosecond-long pulses
MX at synchrotron
Page 7
Paul Scherrer Institute
Page 8
SwissFEL
Swiss Light
Source
Swiss Alps
• 3 experimental stations at the synchrotron
• 1 experimental station at the SwissFEL
• Beamtime is shared between academic and
industrial users
- Industrial customers are mostly pharmaceutical
companies looking for drug binding to potential
drug targets
- Academic users are universities and scientific
institutes worldwide doing basic research in
structural biology
MX at Swiss Light Source and SwissFEL
Page 9
• A new storage ring is to be installed in 2024-2025
• Flux (photons/second) will increase by an order of magnitude
• Measurements can be done 10x faster
• This enables the fragment screening method, i.e. a single protein target is crystallized with hundreds or thousands of molecular fragments to find the best drug
- This is like molecular docking, but fully experimental
Major upgrade in 2024/2025 for SLS 2.0
Page 10
• PSI is a major detector developer
- Hybrid pixel detectors were developed for CERN high energy physics experiments
- The design could be used for X-ray cameras: first PILATUS in the 2000s
- The PSI start-up Dectris commercialized the PILATUS and EIGER detectors; most synchrotrons are equipped with their detectors
• Currently PSI is rolling out a new generation: JUNGFRAU
Page 11
New detector for SwissFEL and SLS 2.0
• A silicon sensor converts X-rays to electric charge
• An ASIC with dedicated electronics for each pixel is bump-bonded to the sensor
• Each pixel has three capacitors allowing different amplification
• They are dynamically switched during exposure to adjust for the incoming charge
Page 12
Adaptive gain detector to increase dynamic
range
Aim: measure reliably from 1 to 20,000,000 photons per second
Page 13
Adaptive gain detector to increase dynamic
range
Pixel output in JF (16 bits): 0001010111110011
Gain (first 2 bits): 00:G0 01:G1 11:G2
ADC value (remaining 14 bits)
Photon number:
$$N_{\mathrm{photon}} = \frac{\mathrm{ADC} - \mathrm{Pedestal}}{\mathrm{Gain} \times E_{\mathrm{photon}}}$$
Gain and pedestal factors are specific for pixel and gain setting
- Prior calibration
- Dedicated dark run
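The read-out conversion on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the two most significant bits carry the gain stage and the lower 14 bits the ADC value; the function names and all pedestal, gain and photon energy numbers are made-up placeholders, not PSI calibration data.

```python
# Mapping of the two gain bits to the gain stage, as listed on the slide.
GAIN_STAGE = {0b00: "G0", 0b01: "G1", 0b11: "G2"}


def decode_pixel(word):
    """Split a 16-bit word into (gain stage, 14-bit ADC value)."""
    gain_bits = (word >> 14) & 0b11   # top two bits (assumed layout)
    adc = word & 0x3FFF               # lower 14 bits
    return GAIN_STAGE[gain_bits], adc


def photon_count(adc, pedestal, gain, photon_energy):
    """Photon number = (ADC - pedestal) / (gain * photon energy)."""
    return (adc - pedestal) / (gain * photon_energy)


stage, adc = decode_pixel(0b0001010111110011)
print(stage, adc)
# Placeholder calibration values, chosen only to make the example run.
print(photon_count(adc, pedestal=5000.0, gain=41.0, photon_energy=12.4))
```

In a real system the pedestal and gain would be looked up per pixel and per gain stage, which is why the conversion needs fast access to large calibration tables.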
• Detector is modular
• 524,288 pixels per module
• 2.2 kHz * 524,288 pixels * 16 bit = 2.3 GB/s
- 2 x 10 Gbit/s links
• 4 Mpixel detector (2020)
- 16 x 10 Gbit/s
• 10 Mpixel (2022)
- 40 x 10 Gbit/s
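The arithmetic behind these numbers can be checked with a short script. It uses only the figures from the slide (pixels per module, 2.2 kHz, 16 bits per pixel, two 10 Gbit/s links per module); the function names are illustrative.

```python
PIXELS_PER_MODULE = 524_288
FRAME_RATE_HZ = 2_200
BYTES_PER_PIXEL = 2        # 16-bit read-out
LINKS_PER_MODULE = 2
LINK_GBIT_PER_S = 10


def module_rate_gb_per_s():
    """Raw data rate of one module in GB/s (10^9 bytes per second)."""
    return PIXELS_PER_MODULE * FRAME_RATE_HZ * BYTES_PER_PIXEL / 1e9


def detector_rate_gb_per_s(n_links):
    """Raw data rate of a detector that is read out over n_links links."""
    n_modules = n_links // LINKS_PER_MODULE
    return n_modules * module_rate_gb_per_s()


def link_capacity_gb_per_s(n_links):
    """Nominal capacity of the links in GB/s (8 bits per byte)."""
    return n_links * LINK_GBIT_PER_S / 8


for name, links in [("1 module", 2), ("4 Mpixel", 16), ("10 Mpixel", 40)]:
    rate = detector_rate_gb_per_s(links)
    capacity = link_capacity_gb_per_s(links)
    print(f"{name}: {rate:.2f} GB/s of {capacity:.1f} GB/s link capacity")
```

The pixel data alone fill more than 90% of the nominal link capacity, before any packet headers are counted.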
Page 14
Modular detector
4 Mpixel (2020)
10 Mpixel (2022)
Page 15
MX detector data rates double every 2 years
[Chart: logarithmic axis "Frame rate [GB/s]" from 0.1 to 100, plotted against year, 2006 to 2024]
Year  Detector             Size       Frame rate  Data rate
2007  PSI PILATUS          6 Mpixel   12.5 Hz     0.2 GB/s
2014  Dectris EIGER        16 Mpixel  133 Hz      3.4 GB/s
2019  Dectris EIGER 2 XE   16 Mpixel  400 Hz      13.5 GB/s
2020  PSI JUNGFRAU         4 Mpixel   2200 Hz     18.4 GB/s
2022  PSI JUNGFRAU         10 Mpixel  2200 Hz     46.1 GB/s
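The "double every 2 years" claim can be checked against the first and last rows of the list. The helper below is a small sketch that assumes simple exponential growth between two data points.

```python
import math


def doubling_time_years(year0, rate0, year1, rate1):
    """Years per doubling, assuming exponential growth between two points."""
    doublings = math.log2(rate1 / rate0)
    return (year1 - year0) / doublings


# PILATUS 6 Mpixel (2007, 0.2 GB/s) to JUNGFRAU 10 Mpixel (2022, 46.1 GB/s)
print(f"{doubling_time_years(2007, 0.2, 2022, 46.1):.2f} years per doubling")
```

The result is a little under two years, consistent with the slide title.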
• Detector is streaming frames over UDP
- Receiver using Linux Datagram Socket
• Conversion of pixel read-out
- CPU SIMD code
• Compression
- CPU compression
First approach: scale conventional architecture
Page 17
Aim: 20 GB/s
Reached: 5 GB/s
POWER / OpenCAPI / FPGA architecture
Page 18
• Real-time performance
- FPGA design is cycle-accurate, with fixed latency and throughput
• Large memory throughput
- FPGAs with HBM2 have 460 GB/s bandwidth to 8 GB of memory
• Ethernet on-board
- FPGAs are made to work with networks, often having dedicated “hard” cores for Ethernet
• Development for FPGAs is difficult and time-consuming
- Hardware description languages
- PCI Express
• Virtex UltraScale+ HBM (XCVU33P and XCVU35P)
- Available as low-profile half-length 75 W cards
FPGAs are perfect devices for data acquisition
Page 19
• C/C++ compiler that produces a hardware description language (Verilog or VHDL)
• All code is valid C++; it can be executed on a CPU and is generally functionally equivalent
• Dedicated pragmas guide FPGA synthesis
• It is generally understandable for software developers, but may contain constructs that look strange or suboptimal from a software point of view
High-level synthesis
Page 20
Bitshuffle for 16-bit numbers
• For VU33/35P:
- Size: 8 GB
- Bandwidth: up to 460 GB/s
- Latency: up to 120 cycles @ 200 MHz
• Complex architecture
- 32 x 256-bit AXI3 interfaces
- Operating either as 32 separate memories
- Or as a single memory with a crossbar (at the cost of up to 50% throughput)
• The 256-bit width is a problem, as data are 512-bit (PCIe Gen3 x16) or 1024-bit (OpenCAPI, PCIe Gen4 x16)
• Simulation is possible only with special tools (Cadence Xcelium), not with Xilinx tools
High-bandwidth memory
Page 21
• PCI Express is a CPU-centric bus, as it is designed to support peripherals
• This is a good model when the FPGA is a coprocessor to the CPU, which sends data and waits for a reply
=> but for data acquisition it is the FPGA that produces the data; the CPU has no prior knowledge of which packet will be processed at a given time
• DMA operates on physical addresses: virtual addresses need to be pinned by the kernel (so they are not swapped or moved)
=> need to maintain own driver
=> address translation cache possible on FPGA, but requires memory
PCI Express DMA
Page 22
Xilinx QDMA is a robust but highly complex solution for PCI Express, used to interface FPGAs with x86 AMD and Intel CPUs
• IBM POWER9 showed great numbers for I/O and memory throughput in the Summit and Sierra supercomputers
• IBM designed its own memory-coherent interface for accelerators (CAPI/OpenCAPI), which has advantages over PCIe
POWER architecture
Page 23
Source: Wikipedia
OpenCAPI
Page 24
[Photo labels: FPGA board, POWER9 CPU, OpenCAPI cable]
• Predecessor CAPI => proprietary IBM
• Communication over PCIe physical lines (but different protocol)
• OpenCAPI => consortium model
• Dedicated cabling (8 x 25 Gbit/s lanes)
• For POWER10 this will be the default memory interface (allowing any type of memory to be attached to the CPU, and memory to be “shared” over the network)
• The difference is similar to what the 80286/80386 virtual mode brought to software development
• In OpenCAPI one needs a single kernel operation
=> Attach accelerator to running process
• Then the accelerator has access to the virtual address space of the running process; it is the FPGA that initiates the communication
=> Address translation is handled by TLB and OS
=> FPGA sees memory in a fully cache-coherent way
• All security/reliability/efficiency mechanisms in the CPU and kernel are also present in OpenCAPI
Page 26
What difference does OpenCAPI bring?
Source: Wikipedia
• Main function for the action contains a pointer to the virtual address space
- On the device the pointer will be synthesized as a 1024-bit master memory-mapped AXI interface
- On the CPU this pointer just has to be set to zero (which is the first address of the virtual address space)
• Any cell in virtual memory is accessed simply as an offset from this pointer
• The only requirement is that memory is aligned to 128 bytes
- No special memory allocator; malloc or mmap is fine
- No pinning/registering
• The same memory buffer class is used for both simulation and working with the device
• For configuration, there is also a 4 MiB memory-mapped I/O space (like BAR in PCIe)
- On the device implemented as slave AXI-lite (32-bit)
How to develop with OpenCAPI?
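The 128-byte alignment requirement can be illustrated on the host side with a short Python sketch. It only shows the alignment arithmetic and checks that an anonymous mmap buffer satisfies it; it does not talk to any OpenCAPI device, and the helper names are illustrative.

```python
import ctypes
import mmap

ALIGNMENT = 128  # bytes, as stated on the slide


def is_aligned(address, alignment=ALIGNMENT):
    """True if the address is a multiple of the alignment."""
    return address % alignment == 0


def align_up(size, alignment=ALIGNMENT):
    """Round a size up to the next multiple of the alignment."""
    return (size + alignment - 1) // alignment * alignment


# Anonymous mappings start on a page boundary, so they are 128-byte aligned.
buffer_size = align_up(1_000_000)
buffer = mmap.mmap(-1, buffer_size)
address = ctypes.addressof(ctypes.c_char.from_buffer(buffer))
print(buffer_size, is_aligned(address))
```

Because the accelerator works on the process's virtual address space, an address like this one could be handed over as a plain offset, with no pinning or registration step.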
Page 27
• Open source “shell” maintained by IBM
• http://github.com/OpenCAPI/oc-accel
• Provides a ready-made tool to work with OpenCAPI (from transceiver setup to AXImm bridge)
• Provides preconfigured interfaces for I/O peripherals (HBM, 100G, NVMe)
• Provides a simulation environment
- One can simulate both SW and HW in a single simulation (neither the user FPGA design nor the software is modified from its “real” implementation)
OC-Accel
Page 28
Jungfraujoch – FPGA implementation
Page 29
Page 30
Jungfraujoch server
FPGA board with OpenCAPI interface
- Data acquisition
- Initial data analysis
- Pre-compression
(2.5 Mpixel/board for JF)
Up to 50 GB/s acquisition and data analysis in a single 2U IBM POWER9 server with 1-4 FPGA boards
[Diagram: FPGA pipeline stages: Ethernet UDP/IP, Dark current, Conversion, Frame summation, Strong pixel finder, Bitshuffle, Memory writer]
Page 31
Jungfraujoch FPGA streaming design
Modular design
• Stream of data handled by successive cores doing work in parallel
=> throughput and latency of each core are determined by the hardware design
• Extra stages can be added relatively simply, with the option to bypass cores
• All cores are C++ functions, connected with AXI-Stream FIFOs
• As buffering is expensive on an FPGA, the design is best suited for algorithms that have limited dependencies between frames
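As a software analogy of this streaming design, each core can be pictured as a Python generator that consumes one stream and produces another. This is only a sketch: generators stand in for the AXI-Stream FIFOs, they run one after another rather than in parallel, and the stage names and values are hypothetical.

```python
def source(frames):
    """Emit frames one by one, like packets arriving from the network."""
    for frame in frames:
        yield frame


def subtract_pedestal(stream, pedestal):
    """Stage 1: subtract a constant pedestal from every pixel."""
    for frame in stream:
        yield [value - pedestal for value in frame]


def threshold(stream, limit):
    """Stage 2: zero all pixels below the limit."""
    for frame in stream:
        yield [value if value >= limit else 0 for value in frame]


def run_pipeline(frames, pedestal=100, limit=5, bypass_threshold=False):
    """Chain the stages; a stage is bypassed by simply not connecting it."""
    stream = subtract_pedestal(source(frames), pedestal)
    if not bypass_threshold:
        stream = threshold(stream, limit)
    return list(stream)


print(run_pipeline([[100, 110], [103, 120]]))
print(run_pipeline([[100, 110], [103, 120]], bypass_threshold=True))
```

Each stage only ever sees the current frame, which mirrors the point that algorithms with few dependencies between frames fit this architecture best.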
Page 32
Jungfraujoch
Ethernet UDP/IP core
Processes Ethernet packets from the network, ignores unnecessary packets, and reads the frame header to get the frame number, module number, etc.
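A software sketch of this filtering and header parsing is shown below. The header layout (frame number, module number, packet number) and the payload size are hypothetical and chosen only for illustration; they are not the actual JUNGFRAU packet format.

```python
import struct

# Hypothetical header: 64-bit frame number, 16-bit module, 16-bit packet index.
HEADER = struct.Struct("<QHH")
PAYLOAD_BYTES = 8192  # hypothetical payload size per packet


def parse_packet(datagram):
    """Return (frame, module, packet, payload), or None for unwanted packets."""
    if len(datagram) != HEADER.size + PAYLOAD_BYTES:
        return None  # wrong size: not a detector packet, ignore it
    frame, module, packet = HEADER.unpack_from(datagram)
    return frame, module, packet, datagram[HEADER.size:]


good = HEADER.pack(42, 3, 7) + bytes(PAYLOAD_BYTES)
print(parse_packet(good)[:3])
print(parse_packet(b"unrelated traffic"))
```

On the FPGA the same decision is taken as the packet streams through, so unwanted traffic never reaches the later cores.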
Page 33
Jungfraujoch
Dark current core
This core calculates a moving average of detector frames. The calculated value is used as the dark current (pedestal) for subsequent frames.
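A per-pixel moving average can be sketched as follows. The slide does not say which averaging scheme the FPGA core uses, so the fixed-length window here is an assumption, and the class name is hypothetical.

```python
from collections import deque


class PedestalTracker:
    """Per-pixel moving average over the last `window` frames (assumed scheme)."""

    def __init__(self, n_pixels, window):
        self.window = window
        self.frames = deque()
        self.sums = [0] * n_pixels

    def update(self, frame):
        """Add a frame and drop the oldest one once the window is full."""
        self.frames.append(frame)
        for i, value in enumerate(frame):
            self.sums[i] += value
        if len(self.frames) > self.window:
            oldest = self.frames.popleft()
            for i, value in enumerate(oldest):
                self.sums[i] -= value

    def pedestal(self):
        """Current pedestal estimate for every pixel."""
        if not self.frames:
            raise ValueError("no frames collected yet")
        return [total / len(self.frames) for total in self.sums]


tracker = PedestalTracker(n_pixels=2, window=2)
for frame in ([10, 20], [12, 22], [14, 24]):
    tracker.update(frame)
print(tracker.pedestal())
```

Keeping running sums means each new frame costs one addition and one subtraction per pixel, regardless of the window length.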
Page 34
Jungfraujoch
Conversion core
This core translates the JUNGFRAU read-out into units of energy or photon counts. It benefits from the very fast HBM2 memory within the FPGA (460 GB/s). Data leaving this core can be used for processing by data analysis software.
Page 35
Jungfraujoch
Frame summation core (work in progress)
Because data leaving the gain correction core are on a linear scale, frames can be summed to reduce the downstream data rate when a lower frame rate than that of the detector is needed.
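Frame summation is straightforward once pixels are on a linear scale. The sketch below sums consecutive groups of frames pixel by pixel; dropping an incomplete trailing group is an assumption made for this example.

```python
def sum_frames(frames, factor):
    """Sum consecutive groups of `factor` frames, pixel by pixel."""
    summed = []
    # Only complete groups are summed; a trailing partial group is dropped.
    for start in range(0, len(frames) - factor + 1, factor):
        group = frames[start:start + factor]
        summed.append([sum(pixel) for pixel in zip(*group)])
    return summed


frames = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(sum_frames(frames, 2))
```

Summing by a factor of N reduces the number of frames, and therefore the downstream data rate, by the same factor, as long as the output pixel type is wide enough to hold the sums.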
Page 36
Jungfraujoch
Strong pixel finder core
This is the first step of a spot-finding algorithm (for example COLSPOT). It identifies pixels that stand out from their neighborhood by more than a given number of standard deviations.
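The criterion can be sketched in plain Python. This is not the FPGA implementation or the COLSPOT code: the square neighborhood, the exclusion of the tested pixel from its own statistics, and the parameter names are assumptions made for this example.

```python
import statistics


def strong_pixels(image, threshold_sigma, radius=1):
    """Return (row, col) of pixels exceeding mean + threshold_sigma * std
    of their neighborhood. Expects an image with at least two pixels."""
    rows, cols = len(image), len(image[0])
    found = []
    for r in range(rows):
        for c in range(cols):
            # Neighborhood: square window around (r, c), clipped at the edges,
            # without the pixel itself.
            neighbours = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            mean = statistics.mean(neighbours)
            sigma = statistics.pstdev(neighbours)
            if image[r][c] > mean + threshold_sigma * sigma:
                found.append((r, c))
    return found


image = [[1] * 5 for _ in range(5)]
image[2][2] = 100  # one bright spot on a flat background
print(strong_pixels(image, threshold_sigma=3.0))
```

Only the coordinates of strong pixels need to leave this stage, which is far less data than the full frame.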
Page 37
Jungfraujoch
Bitshuffle
FPGAs are bit-order agnostic. Therefore the bit reordering performed by this popular compression prefilter is practically free on an FPGA.
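The idea of the prefilter can be shown with a simplified bit transpose for 16-bit values. The sketch below illustrates the principle only; it does not reproduce the block structure or byte layout of the Bitshuffle library, and the function names are illustrative.

```python
def bitshuffle16(values):
    """Transpose bits: plane b collects bit b of every 16-bit value."""
    planes = [0] * 16
    for index, value in enumerate(values):
        for bit in range(16):
            if (value >> bit) & 1:
                planes[bit] |= 1 << index
    return planes


def bitunshuffle16(planes, count):
    """Inverse transform: rebuild `count` values from the 16 bit planes."""
    values = [0] * count
    for bit, plane in enumerate(planes):
        for index in range(count):
            if (plane >> index) & 1:
                values[index] |= 1 << bit
    return values


counts = [1, 0, 3, 2, 0, 1]   # small photon counts
planes = bitshuffle16(counts)
print(planes)
print(bitunshuffle16(planes, len(counts)) == counts)
```

For small photon counts all higher bit planes are zero, so the shuffled data contain long runs of zeros that a general-purpose compressor handles well. In hardware the transpose is just wiring, which is why it costs almost nothing on an FPGA.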
Page 38
Jungfraujoch
Host memory write
The address in the host memory buffer is calculated and the data are forwarded to host memory via OpenCAPI. Additional image statistics are saved as well.
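One possible address calculation is sketched below. The buffer layout (a ring of frames, with the modules of one frame stored contiguously, 16 bits per pixel) is an assumption made for illustration; the talk does not describe the actual layout.

```python
MODULE_BYTES = 524_288 * 2   # one module at 16 bits per pixel (assumed)
ALIGNMENT = 128              # OpenCAPI alignment requirement from the talk


def module_offset(frame, module, n_modules, frames_in_buffer):
    """Byte offset of one module image inside the host buffer."""
    slot = frame % frames_in_buffer   # ring buffer slot (assumed layout)
    return (slot * n_modules + module) * MODULE_BYTES


offset = module_offset(frame=1, module=2, n_modules=4, frames_in_buffer=10)
print(offset, offset % ALIGNMENT == 0)
```

Since the module size is a multiple of 128 bytes, every offset computed this way keeps the required alignment as long as the buffer itself starts on an aligned address.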
Page 39
Jungfraujoch implementation on VU33P FPGA
[Floorplan labels: Spot finding, HBM, Gain, Pedestal, Write data, OpenCAPI, 100G, UDP]
Jungfraujoch FPGA power usage is 18 W/board for the whole streaming functionality
Page 40
Xilinx Vivado Power Report
2 boards for 4 Mpixel JUNGFRAU and 4 boards for 10 Mpixel JUNGFRAU
• VU33P or VU35P with 8 GB of HBM2
• OpenCAPI link and PCIe Gen3 x16 (or two
PCIe Gen4 x8)
• Small flash (2 kb) to store MAC address,
board IR
• QSFP-DD optical socket (same as QSFP28,
but with 8 lanes for 2x100G) =>
compatible with QSFP28 transceivers
• Up to 75W
Alpha Data 9H3 board
Page 41
• Software tests – Catch2
- 8 min
- Among other software tests, includes 13 FPGA action tests (whole SLS code)
- Automated tests cover 95% of the lines of high-level synthesis code
- Covers most of the functional correctness, including address calculation
- Main limitation is debugging the parallel behavior of FIFOs (deadlocks, etc.)
• Hardware simulation – Cadence Xcelium
- 4 hours
- Collection of 8 frames from single module
- Checks if hardware description is correct,
can find problems with synchronization,
and other, very rare, issues
- Too slow to verify functionality
OpenCAPI programming - testing
Page 42
• Detector and data acquisition system were sent in November for an experiment at the Photon Factory, KEK
• More than 2,000 datasets collected for protein targets, a few real-life native-SAD structures solved
• Due to the pandemic, detector support and development (including deployment of a new FPGA design) were done fully remotely from Switzerland
Commissioning at KEK (Jan – May 2021)
Page 43
BL-1A Photon Factory
JUNGFRAU detector (up)
tested in helium chamber
for native-SAD
measurements with 3.75
keV X-rays
Page 44
Structure of Nucleocapsid Phosphoprotein from
SARS-CoV-2 solved in 1 second
• Crystal was previously measured with the conventional setup at our beamline, with the measurement taking longer than one minute
• With the JUNGFRAU detector and OpenCAPI readout, 2000 images collected in one second were enough to solve the structure of this protein
• Experimental team: Filip Leonarski, Sylvain
Engilberge, Vincent Olieric, Meitian Wang (MX
Group), Aldo Mozzanica (PSI Detector Group)
• SARS-CoV-2 protein was produced by Zinzula, L.,
Basquin, J., Bracher, A., Baumeister, W. (MPI,
Martinsried)
Possible gain from using FPGA based system
Page 45
Courtesy: B. Mesnet (IBM)
MX Group (PSI)
• Vincent Olieric
• Takashi Tomizaki
• Chia-Ying Huang
• Sylvain Engilberge
• Justyna Wojdyła
• Meitian Wang
Detector Group (PSI)
• Aldo Mozzanica
• Martin Brückner
• Carlos Lopez-Cuenca
• Bernd Schmitt
Science IT (PSI)
• Leonardo Sala
Controls (PSI)
• Andrej Babic
• Leonardo Hax-Damiani
SLS management (PSI)
• Oliver Bunk
Photon Factory, KEK
• Naohiro Matsugaki
• Yusuke Yamada
• Masahide Hikita
MAX IV
• Jie Nan
• Zdenek Matej
Uni Konstanz
• Kay Diederichs
LBL
• Aaron Brewster
DLS
• Graeme Winter
• DIALS Team
ESRF
• Jerome Kieffer
IBM Systems (France)
• Alexandre Castellane
• Bruno Mesnet
InnoBoost SA
• Lionel Clavien
Acknowledgements
Page 47
Ganesan Narayanasamy
 
Basics of Digital Design and Verilog
Basics of Digital Design and VerilogBasics of Digital Design and Verilog
Basics of Digital Design and Verilog
Ganesan Narayanasamy
 
180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISA180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISA
Ganesan Narayanasamy
 
Workload Transformation and Innovations in POWER Architecture
Workload Transformation and Innovations in POWER Architecture Workload Transformation and Innovations in POWER Architecture
Workload Transformation and Innovations in POWER Architecture
Ganesan Narayanasamy
 
Deep Learning Use Cases using OpenPOWER systems
Deep Learning Use Cases using OpenPOWER systemsDeep Learning Use Cases using OpenPOWER systems
Deep Learning Use Cases using OpenPOWER systems
Ganesan Narayanasamy
 
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systemsAI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
Ganesan Narayanasamy
 
AI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systemsAI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systems
Ganesan Narayanasamy
 
AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems
Ganesan Narayanasamy
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9
Ganesan Narayanasamy
 
OpenPOWER Foundation Introduction
OpenPOWER Foundation Introduction OpenPOWER Foundation Introduction
OpenPOWER Foundation Introduction
Ganesan Narayanasamy
 
Open Hardware and Future Computing
Open Hardware and Future ComputingOpen Hardware and Future Computing
Open Hardware and Future Computing
Ganesan Narayanasamy
 

OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray Camera at the Swiss Light Source synchrotron

  • 1. WIR SCHAFFEN WISSEN – HEUTE FÜR MORGEN. OpenCAPI-based image analysis pipeline for 18 GB/s kHz-framerate X-ray camera at the SLS synchrotron. Filip Leonarski :: Beamline Data Scientist :: Macromolecular Crystallography
  • 2. Plan
      • Introduction: Macromolecular crystallography at synchrotrons and X-ray detectors
      • Technology: POWER + OpenCAPI
      • Solution: Jungfraujoch
  • 3. X-ray: 1901 Nobel Prize, W. Röntgen, discovery of X-rays
  • 4. X-ray macromolecular crystallography (MX)
      • 1901 Nobel Prize: W. Röntgen, discovery of X-rays
      • 1962 Nobel Prize: F. Crick, J. Watson and M. Wilkins, structure of the DNA double helix solved with X-rays (Photo 51 by R. Gosling and R. Franklin)
  • 5. X-ray macromolecular crystallography (MX)
      • 1901 Nobel Prize: W. Röntgen, discovery of X-rays
      • 1962 Nobel Prize: F. Crick, J. Watson and M. Wilkins, structure of the DNA double helix solved with X-rays (Photo 51 by R. Gosling and R. Franklin)
      • 2009 Nobel Prize: V. Ramakrishnan*, T. Steitz, A. Yonath*, structure of the ribosome (* some of their structures were solved at PSI)
  • 6. X-ray macromolecular crystallography (MX). Wikipedia: X-ray crystallography is the experimental science determining the atomic and molecular structure of a crystal, in which the crystalline structure causes a beam of incident X-rays to diffract into many specific directions. By measuring the angles and intensities of these diffracted beams, a crystallographer can produce a three-dimensional picture of the density of electrons within the crystal.
  • 7. MX at synchrotron
      • Particle accelerators are the source of the brightest X-ray beams (multiple orders of magnitude brighter than conventional X-ray tubes), emitted when charged particles travel through a magnetic field
          - The effect is a nuisance for high-energy physics (undesirable energy loss),
          - but it is a blessing for structural science => modern storage rings are built exclusively as light sources.
      • Synchrotrons provide a continuous X-ray beam, while X-ray free-electron lasers produce femtosecond-long bright pulses
  • 8. Paul Scherrer Institute (image labels: SwissFEL, Swiss Light Source, Swiss Alps)
  • 9. MX at Swiss Light Source and SwissFEL
      • 3 experimental stations at the synchrotron
      • 1 experimental station at the SwissFEL
      • Beamtime is shared between academic and industrial users
          - Industrial customers are mostly pharmaceutical companies studying how drugs bind to potential drug targets
          - Academic users are universities and scientific institutes worldwide doing basic research in structural biology
  • 10. Major upgrade in 2024/2025 for SLS 2.0
      • New storage ring to be installed in 2024-2025
      • Flux (photons/second) will increase by an order of magnitude
      • Measurements can be done 10x faster
      • This enables the fragment screening method, i.e. a single protein target is crystallized with hundreds or thousands of molecular fragments to find the best drug
          - This is like molecular docking, but fully experimental
  • 11. New detector for SwissFEL and SLS 2.0
      • PSI is a major detector developer
          - Hybrid pixel detectors were developed for CERN high-energy physics experiments
          - The design could be used for X-ray cameras; the first PILATUS dates from the 2000s
          - PSI start-up Dectris commercialized the PILATUS and EIGER detectors; most synchrotrons are equipped with their detectors
      • Currently PSI is rolling out a new generation: JUNGFRAU
  • 12. Adaptive gain detector to increase dynamic range
      • Silicon sensor converts X-rays to electric charge
      • An ASIC is bump-bonded to the sensor, with dedicated electronics for each pixel
      • Each pixel has three capacitors allowing different amplification
      • They are dynamically switched during exposure to adjust for incoming charge
      • Aim: measure reliably from 1 to 20,000,000 photons per second
  • 13. Adaptive gain detector to increase dynamic range
      • Pixel output in JF: 0001010111110011 (16 bits)
      • Gain (two most significant bits): 00: G0, 01: G1, 11: G2
      • ADC value: the remaining 14 bits
      • Photon number: $N_{\text{photon}} = \dfrac{\text{ADC} - \text{pedestal}}{\text{gain} \times \text{photon energy}}$
      • Gain and pedestal factors are specific for pixel and gain setting; they are obtained from prior calibration and a dedicated dark run
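The read-out conversion on this slide can be sketched in a few lines of Python. This is an illustration only, not the Jungfraujoch code: the function names are hypothetical, the 2-bit gain / 14-bit ADC split of the 16-bit word is an assumption about the bit layout, and the pedestal, gain and photon energy values in the usage note are made up.

```python
# Sketch: decode one 16-bit JUNGFRAU pixel word and convert it to a photon count.
# Assumption: the two most significant bits encode the gain stage and the
# lower 14 bits hold the ADC value.
GAIN_NAMES = {0b00: "G0", 0b01: "G1", 0b11: "G2"}


def decode_pixel(word):
    """Split a 16-bit pixel word into (gain stage name, ADC value)."""
    gain_bits = (word >> 14) & 0b11
    if gain_bits not in GAIN_NAMES:
        raise ValueError("gain bits 10 are not a listed gain stage")
    return GAIN_NAMES[gain_bits], word & 0x3FFF


def photon_count(adc, pedestal, gain, photon_energy):
    """Photon number = (ADC - pedestal) / (gain * photon energy)."""
    return (adc - pedestal) / (gain * photon_energy)


# The example word from the slide decodes to gain stage G0.
stage, adc = decode_pixel(0b0001010111110011)
```

With hypothetical calibration values, `photon_count(adc, pedestal=1619.0, gain=40.0, photon_energy=10.0)` evaluates the formula for that pixel; in practice pedestal and gain would be looked up per pixel and per gain stage.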
  • 14. Modular detector
      • Detector is modular
      • 524,288 pixels per module
      • 2.2 kHz * 524,288 pixels * 16 bit = 2.3 GB/s per module
          - 2 x 10 Gbit/s links
      • 4 Mpixel detector (2020)
          - 16 x 10 Gbit/s
      • 10 Mpixel detector (2022)
          - 40 x 10 Gbit/s
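The arithmetic on this slide can be checked with a short Python sketch. The module counts (8 for the 4 Mpixel and 20 for the 10 Mpixel detector) are not stated on the slide; they are inferred here from the link counts at 2 links per module, and the function names are illustrative.

```python
import math

PIXELS_PER_MODULE = 524_288
FRAME_RATE_HZ = 2_200
BYTES_PER_PIXEL = 2      # 16-bit read-out
LINK_SPEED_GBIT = 10


def module_rate_gb_s():
    """Raw data rate of a single module in GB/s (10^9 bytes per second)."""
    return FRAME_RATE_HZ * PIXELS_PER_MODULE * BYTES_PER_PIXEL / 1e9


def detector_rate_gb_s(n_modules):
    """Raw data rate of a detector built from n_modules modules."""
    return n_modules * module_rate_gb_s()


def links_per_module():
    """Number of 10 Gbit/s links needed to carry one module's data."""
    return math.ceil(module_rate_gb_s() * 8 / LINK_SPEED_GBIT)


# Assumed module counts: 8 (4 Mpixel) and 20 (10 Mpixel).
for n_modules in (8, 20):
    print(n_modules, round(detector_rate_gb_s(n_modules), 2),
          n_modules * links_per_module())
```

One module produces about 18.5 Gbit/s, which is why two 10 Gbit/s links per module are needed.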
  • 15. MX detector data rates double every 2 years (plot: frame rate [GB/s], log scale from 0.1 to 100, vs. year, 2006 to 2024)
      • 2007: PSI PILATUS, 6 Mpixel, 12.5 Hz, 0.2 GB/s
      • 2014: Dectris EIGER, 16 Mpixel, 133 Hz, 3.4 GB/s
      • 2019: Dectris EIGER 2 XE, 16 Mpixel, 400 Hz, 13.5 GB/s
      • 2020: PSI JUNGFRAU, 4 Mpixel, 2200 Hz, 18.4 GB/s
      • 2022: PSI JUNGFRAU, 10 Mpixel, 2200 Hz, 46.1 GB/s
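As a rough consistency check of the "double every 2 years" claim, the doubling time can be estimated from the first and last data points of the plot. The sketch below uses only those two points, which is a simplification rather than a fit through all five detectors.

```python
import math


def doubling_time_years(year_a, rate_a, year_b, rate_b):
    """Years needed for the data rate to double, assuming exponential growth."""
    doublings = math.log2(rate_b / rate_a)
    return (year_b - year_a) / doublings


# 2007 PILATUS (0.2 GB/s) to 2022 JUNGFRAU 10 Mpixel (46.1 GB/s)
estimate = doubling_time_years(2007, 0.2, 2022, 46.1)
print(round(estimate, 2))
```

The estimate comes out slightly below two years, in line with the slide title.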
  • 16. First approach: scale conventional architecture
      • Detector is streaming frames over UDP
          - Receiver using Linux datagram socket
      • Conversion of pixel read-out
          - CPU SIMD code
      • Compression
          - CPU compression
  • 17. First approach: scale conventional architecture
      • Detector is streaming frames over UDP
          - Receiver using Linux datagram socket
      • Conversion of pixel read-out
          - CPU SIMD code
      • Compression
          - CPU compression
      • Aim: 20 GB/s; reached: 5 GB/s
  • 18. POWER / OpenCAPI / FPGA architecture
  • 19. FPGAs are perfect devices for data acquisition
      • Real-time performance
          - FPGA design is cycle-accurate, with fixed latency and throughput
      • Large memory throughput
          - FPGAs with HBM2 have 460 GB/s bandwidth to 8 GB of memory
      • Ethernet on-board
          - FPGAs are made to work with networks, often having dedicated "hard" cores for Ethernet
      • Development for FPGAs is difficult and time-consuming
          - Hardware description languages
          - PCI Express
      • Virtex UltraScale+ HBM (XCVU33P and XCVU35P)
          - Available as low-profile half-length 75 W cards
  • 20. High-level synthesis (example shown on slide: Bitshuffle for 16-bit numbers)
      • A C/C++ compiler produces a hardware description language (Verilog or VHDL)
      • All code is valid C++ code; it can be executed on a CPU and is generally functionally equivalent
      • Dedicated pragmas guide FPGA synthesis
      • It is generally understandable for software developers, but may contain constructs that look strange or suboptimal from a software point of view
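The slide's code example (Bitshuffle for 16-bit numbers) did not survive in this transcript. As a stand-in, here is a minimal software sketch of a bit transpose for 16-bit values, written in Python rather than HLS C++. The function names and the exact output layout (bit plane by bit plane, element i stored in bit i % 8 of byte i // 8) are assumptions chosen for illustration, not the author's implementation.

```python
def bitshuffle16(values):
    """Bit-transpose a block of 16-bit unsigned integers.

    The output holds 16 bit planes: plane k collects bit k of every element.
    """
    n = len(values)
    if n % 8 != 0:
        raise ValueError("block length must be a multiple of 8")
    plane_bytes = n // 8
    out = bytearray(16 * plane_bytes)
    for i, value in enumerate(values):
        for bit in range(16):
            if (value >> bit) & 1:
                out[bit * plane_bytes + i // 8] |= 1 << (i % 8)
    return bytes(out)


def bitunshuffle16(data, n):
    """Inverse of bitshuffle16 for a block of n elements."""
    plane_bytes = n // 8
    values = [0] * n
    for bit in range(16):
        for i in range(n):
            if (data[bit * plane_bytes + i // 8] >> (i % 8)) & 1:
                values[i] |= 1 << bit
    return values
```

After the transpose, the mostly-zero high bits of small pixel values end up in long runs of zero bytes, which a general-purpose compressor can then shrink effectively.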
  • 21. High-bandwidth memory
      • For VU33P/VU35P:
          - Size: 8 GB
          - Bandwidth: up to 460 GB/s
          - Latency: up to 120 cycles @ 200 MHz
      • Complex architecture
          - 32 x 256-bit AXI3 interfaces
          - Either operating as 32 separate memories
          - Or as a single memory with crossbar (at the cost of up to 50% throughput)
      • The 256-bit width is a problem, as data are 512-bit (PCIe Gen3 x16) or 1024-bit (OpenCAPI, PCIe Gen4 x16)
      • Simulation only with special tools (Cadence Xcelium), impossible with Xilinx tools
  • 22. PCI Express DMA
      • PCI Express is a CPU-centric bus, as it is designed to support peripherals
      • This is a good model when the FPGA is a coprocessor to the CPU, which sends data and waits for a reply => but for data acquisition, it is the FPGA that is producing the data, and the CPU has no prior knowledge of which packet will be processed at a given time
      • DMA operates on physical addresses: virtual addresses need to be pinned by the kernel (so they are not swapped or moved)
          => need to maintain own driver
          => address translation cache possible on FPGA, but requires memory
      • Xilinx QDMA is a robust but highly complex solution for PCI Express, used to interface FPGAs with x86 AMD and Intel CPUs
  • 23. POWER architecture (image source: Wikipedia)
      • IBM POWER9 showed great numbers for I/O and memory throughput in the Summit and Sierra supercomputers
      • IBM designed its own memory-coherent interface for accelerators (CAPI/OpenCAPI), which has advantages over PCIe
  • 25. OpenCAPI (diagram labels: FPGA board, POWER9 CPU, OpenCAPI cable)
      • Predecessor CAPI => proprietary IBM
      • Communication over PCIe physical lines (but different protocol)
      • OpenCAPI => consortium model
      • Dedicated cabling (8 x 25 Gbit/s lanes)
      • For POWER10 this will be the default memory interface (allowing any type of memory to be attached to the CPU, and memory to be "shared" over the network)
  • 26. What difference does OpenCAPI make? (image source: Wikipedia)
      • Similar to the difference that 80286/80386 virtual mode brought to software development
      • In OpenCAPI one needs a single kernel operation => attach accelerator to running process
      • Then the accelerator has access to the virtual address space of the running process; it is the FPGA that initiates the communication
          => Address translation is handled by TLB and OS
          => FPGA sees memory in a fully cache-coherent way
      • All security/reliability/efficiency mechanisms in CPU and kernel are also present in OpenCAPI
  • 27. How to develop with OpenCAPI?
      • Main function for the action contains a pointer to the virtual address space
          - On device the pointer will be synthesized as a 1024-bit master memory-mapped AXI interface
          - On CPU this pointer just has to be set to zero (which is the first address of the virtual address space)
      • Any cell in virtual memory is accessed simply as an offset from this pointer
      • The only requirement is that memory is aligned to 128 bytes
          - No special memory allocator; malloc or mmap is fine
          - No pinning/registering
      • The same memory buffer class for both simulation and working with the device
      • For configuration, there is also a 4 MiB memory-mapped I/O space (like BAR in PCIe)
          - On device implemented as slave AXI-Lite (32-bit)
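The 128-byte alignment requirement can be illustrated from the host side. The sketch below is Python rather than the C++ used with OpenCAPI, and the helper names are hypothetical; it only shows that an anonymous `mmap` region, being page-aligned, satisfies the alignment without any special allocator.

```python
import ctypes
import mmap

ALIGNMENT = 128  # bytes, as required for buffers shared with the accelerator


def align_up(size, alignment=ALIGNMENT):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def allocate_buffer(size):
    """Allocate an anonymous memory map and return (buffer, start address)."""
    buf = mmap.mmap(-1, align_up(size))
    address = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    return buf, address


buf, address = allocate_buffer(1000)
print(len(buf), address % ALIGNMENT == 0)
```

Because the accelerator works with virtual addresses of the attached process, an address obtained this way could in principle be handed over as is, with no pinning step.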
  • 28. OC-Accel
      • Open source "shell" maintained by IBM
      • http://github.com/OpenCAPI/oc-accel
      • Provides ready-made tools to work with OpenCAPI (from transceiver setup to AXImm bridge)
      • Provides preconfigured interfaces for I/O peripherals (HBM, 100G, NVMe)
      • Provides a simulation environment
          - One can simulate both SW and HW in a single simulation (both the user FPGA design and the software are unmodified from their "real" implementation)
  • 29. Jungfraujoch – FPGA implementation
  • 30. Jungfraujoch server
      • FPGA board with OpenCAPI interface
          - Data acquisition
          - Initial data analysis
          - Pre-compression (2.5 Mpixel/board for JF)
      • Up to 50 GB/s acquisition and data analysis in a single 2U IBM POWER9 server with 1-4 FPGA boards
      • Pipeline stages shown in the diagram: Ethernet UDP/IP, Dark current, Conversion, Strong pixel finder, Bitshuffle, Memory writer, and Frame summation
  • 31. Jungfraujoch FPGA streaming design: modular design
      • The stream of data is handled by successive cores working in parallel => throughput and latency of each core are determined by the hardware design
      • Extra stages can be added relatively simply, with the option to bypass cores
      • All cores are C++ functions, connected with AXI-Stream FIFOs
      • As buffering is expensive on FPGA, the design is best suited for algorithms that have limited dependencies between frames
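The idea of successive cores connected by streams can be mimicked in software with chained generators. This is only an analogy in Python, not the HLS design: the stage names, the scalar pedestal and the conversion factor are simplifications introduced for the example, and real cores run concurrently in hardware rather than lazily pulling data.

```python
def dark_subtract(frames, pedestal):
    """Stage: subtract a (scalar, illustrative) pedestal from every pixel."""
    for frame in frames:
        yield [pixel - pedestal for pixel in frame]


def to_photons(frames, adu_per_photon):
    """Stage: convert pedestal-corrected values to photon counts."""
    for frame in frames:
        yield [pixel / adu_per_photon for pixel in frame]


def bypass(frames):
    """Stage: pass frames through unchanged (a bypassed core)."""
    yield from frames


def run_pipeline(source, stages):
    """Connect stages one after another and collect the output frames."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    return list(stream)


result = run_pipeline(
    iter([[110, 120], [130, 100]]),
    [lambda s: dark_subtract(s, 100), bypass, lambda s: to_photons(s, 10)],
)
print(result)
```

Adding or bypassing a stage only changes the list passed to `run_pipeline`, which mirrors how extra cores can be inserted into the stream.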
  • 32. Jungfraujoch Ethernet UDP/IP core: processes Ethernet packets from the network, ignores unnecessary packets, and reads the frame header to get frame number, module number, etc.
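A software sketch of the header-reading step is given below. The header layout is entirely HYPOTHETICAL (frame number as uint64, module number and packet number as uint16, little-endian); the real JUNGFRAU packet format is not given in these slides, so the field order, sizes and function name are assumptions made only to show the filtering logic.

```python
import struct

# HYPOTHETICAL header: frame number (uint64), module (uint16), packet (uint16)
HEADER = struct.Struct("<QHH")


def parse_packet(packet, wanted_modules):
    """Return (frame, module, packet_number, payload), or None if ignored."""
    if len(packet) < HEADER.size:
        return None  # too short to contain a header
    frame, module, packet_number = HEADER.unpack_from(packet)
    if module not in wanted_modules:
        return None  # packet belongs to a module this receiver does not handle
    return frame, module, packet_number, packet[HEADER.size:]


example = HEADER.pack(42, 3, 7) + b"\x01\x02"
print(parse_packet(example, {3}))
```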
  • 33. Jungfraujoch Dark current core: this core is responsible for calculating a moving average of detector frames. The calculated value is used as dark current (pedestal) for subsequent frames.
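One way to picture the moving-average pedestal is a per-pixel average over the most recent frames. The sketch below is a plain Python illustration with a hypothetical class name; the slide does not say which form of moving average the FPGA core uses, so the fixed window here is an assumption.

```python
from collections import deque


class PedestalTracker:
    """Per-pixel moving average over the last `window` frames."""

    def __init__(self, n_pixels, window):
        self.window = window
        self.frames = deque()
        self.sums = [0] * n_pixels

    def update(self, frame):
        """Add a dark frame and drop the oldest one once the window is full."""
        self.frames.append(list(frame))
        for i, value in enumerate(frame):
            self.sums[i] += value
        if len(self.frames) > self.window:
            oldest = self.frames.popleft()
            for i, value in enumerate(oldest):
                self.sums[i] -= value

    def pedestal(self):
        """Current pedestal estimate for every pixel."""
        if not self.frames:
            raise ValueError("no frames collected yet")
        count = len(self.frames)
        return [total / count for total in self.sums]
```

For example, with a window of 2, feeding frames `[100, 200]`, `[102, 204]` and `[104, 208]` leaves a pedestal based on the last two frames only.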
  • 34. Jungfraujoch Conversion core: this core translates JUNGFRAU read-out into units of energy or photon counts. It benefits from very fast HBM2 memory within the FPGA (460 GB/s). Data leaving this core can be used for processing by data analysis software.
  • 35. Jungfraujoch Frame summation core (work in progress): as data leaving the gain correction core are on a linear scale, they can be summed to reduce the downstream data rate if a lower frame rate than that of the detector is needed.
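Frame summation itself is simple once the data are on a linear scale. A minimal Python sketch follows; the function name is hypothetical, and dropping an incomplete trailing group is a choice made for this example, not something stated on the slide.

```python
def sum_frames(frames, factor):
    """Yield one summed frame for every `factor` consecutive input frames.

    A trailing group with fewer than `factor` frames is dropped.
    """
    accumulator = None
    count = 0
    for frame in frames:
        if accumulator is None:
            accumulator = list(frame)
        else:
            accumulator = [a + b for a, b in zip(accumulator, frame)]
        count += 1
        if count == factor:
            yield accumulator
            accumulator = None
            count = 0


summed = list(sum_frames([[1, 2], [3, 4], [5, 6], [7, 8]], factor=2))
print(summed)
```

Summing by a factor of N reduces both the effective frame rate and the downstream data rate by the same factor.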
  • 36. Jungfraujoch Strong pixel finder core: this is the first step of a spot-finding algorithm (for example COLSPOT). It identifies pixels that are stronger than their neighborhood by a given number of standard deviations.
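The criterion can be sketched in plain Python as follows. This is not COLSPOT and not the FPGA core: the square neighborhood, its radius, the exclusion of the pixel itself and the function name are assumptions made for illustration, and the image is expected to be at least 2 x 2 pixels.

```python
import statistics


def strong_pixels(image, n_sigma, radius=1):
    """Return (row, col) of pixels exceeding mean + n_sigma * stddev of neighbors."""
    rows, cols = len(image), len(image[0])
    hits = []
    for r in range(rows):
        for c in range(cols):
            # Neighborhood: square window around the pixel, excluding the pixel
            neighbors = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            mean = statistics.fmean(neighbors)
            sigma = statistics.pstdev(neighbors)
            if image[r][c] > mean + n_sigma * sigma:
                hits.append((r, c))
    return hits


# Flat background of 10 counts with one bright pixel in the center
image = [[10] * 5 for _ in range(5)]
image[2][2] = 100
print(strong_pixels(image, n_sigma=3))
```

Only the list of strong pixel coordinates needs to be passed on, which is far less data than the full image.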
  • 37. Jungfraujoch Bitshuffle core: FPGAs are bit-order agnostic. Therefore, rearranging the bit order in this popular compression prefilter is practically free on an FPGA.
  • 38. Jungfraujoch Host memory write core: the address in the host memory buffer is calculated and data are forwarded to host memory via OpenCAPI. Additional image statistics are saved as well.
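The address calculation can be pictured as an offset into one large host buffer. The layout below (frames stored consecutively, modules consecutive within a frame, 16-bit pixels) is HYPOTHETICAL and chosen only to show the idea; the slides do not describe the actual buffer layout, and the function name is illustrative.

```python
PIXELS_PER_MODULE = 524_288
BYTES_PER_PIXEL = 2
MODULE_BYTES = PIXELS_PER_MODULE * BYTES_PER_PIXEL
ALIGNMENT = 128  # OpenCAPI alignment requirement in bytes


def module_offset(frame_number, module, n_modules):
    """Byte offset of one module's data within the host buffer.

    HYPOTHETICAL layout: frame-major, then module-major.
    """
    if not 0 <= module < n_modules:
        raise ValueError("module index out of range")
    return (frame_number * n_modules + module) * MODULE_BYTES


print(module_offset(1, 0, n_modules=8), module_offset(0, 3, n_modules=8))
```

Since one module occupies 1 MiB in this layout, every offset is automatically a multiple of 128 bytes.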
  • 39. Jungfraujoch implementation on VU33P FPGA (figure labels: Spot finding, HBM, Gain, Pedestal, Write data, OpenCAPI, 100G, UDP)
  • 40. Jungfraujoch FPGA power usage is 18 W/board for the whole streaming functionality (source: Xilinx Vivado Power Report). 2 boards are needed for the 4 Mpixel JUNGFRAU and 4 boards for the 10 Mpixel JUNGFRAU.
  • 41. Alpha Data 9H3 board
      • VU33P or VU35P with 8 GB of HBM2
      • OpenCAPI link and PCIe Gen3 x16 (or two PCIe Gen4 x8)
      • Small flash (2 kb) to store MAC address, board IR
      • QSFP-DD optical socket (same as QSFP28, but with 8 lanes for 2x100G) => compatible with QSFP28 transceivers
      • Up to 75 W
  • 42. OpenCAPI programming: testing
      • Software tests: Catch2, 8 min
          - Among other software tests, includes 13 FPGA action tests (whole SLS code)
          - Automated tests cover 95% of the lines of high-level synthesis code
          - Covers most of the functional correctness, including address calculation
          - Main limitation is debugging the parallel behavior of FIFOs (deadlocks, etc.)
      • Hardware simulation: Cadence Xcelium, 4 hours
          - Collection of 8 frames from a single module
          - Checks if the hardware description is correct; can find problems with synchronization and other, very rare, issues
          - Too slow to verify functionality
  • 43. Commissioning in KEK (Jan – May 2021)
      • Detector and data acquisition system were sent in November for an experiment at Photon Factory, KEK
      • More than 2,000 datasets collected for protein targets; a few real-life native-SAD structures solved
      • Due to the pandemic, detector support and development (including deployment of a new FPGA design) was done fully remotely from Switzerland
      • Photo: BL-1A, Photon Factory; JUNGFRAU detector (top) tested in helium chamber for native-SAD measurements with 3.75 keV X-rays
  • 44. Structure of Nucleocapsid Phosphoprotein from SARS-CoV-2 solved in 1 second
      • The crystal was previously measured with a conventional setup at our beamline, with the measurement taking longer than one minute
      • With the JUNGFRAU detector and OpenCAPI readout, 2000 images collected in one second were sufficient to solve the structure of this protein
      • Experimental team: Filip Leonarski, Sylvain Engilberge, Vincent Olieric, Meitian Wang (MX Group), Aldo Mozzanica (PSI Detector Group)
      • SARS-CoV-2 protein was produced by Zinzula, L., Basquin, J., Bracher, A., Baumeister, W. (MPI, Martinsried)
  • 45. Possible gain from using FPGA-based system (courtesy: B. Mesnet, IBM)
  • 46. Possible gain from using FPGA-based system (courtesy: B. Mesnet, IBM)
  • 47. Acknowledgements
      • MX Group (PSI): Vincent Olieric, Takashi Tomizaki, Chia-Ying Huang, Sylvain Engilberge, Justyna Wojdyła, Meitian Wang
      • Detector Group (PSI): Aldo Mozzanica, Martin Brückner, Carlos Lopez-Cuenca, Bernd Schmitt
      • Science IT (PSI): Leonardo Sala
      • Controls (PSI): Andrej Babic, Leonardo Hax-Damiani
      • SLS management (PSI): Oliver Bunk
      • Photon Factory, KEK: Naohiro Matsugaki, Yusuke Yamada, Masahide Hikita
      • MAX IV: Jie Nan, Zdenek Matej
      • Uni Konstanz: Kay Diederichs
      • LBL: Aaron Brewster
      • DLS: Graeme Winter, DIALS Team
      • ESRF: Jerome Kieffer
      • IBM Systems (France): Alexandre Castellane, Bruno Mesnet
      • InnoBoost SA: Lionel Clavien