HPC Infrastructure To Solve
The CFD Grand Challenge
Anand Haridass
Senior Technical Staff Member
India System Development Lab
IBM India
19th Aeronautical Society of India CFD Symposium
Bangalore Aug 10-11 2017
IBM Systems
Aerospace Design
Turbine Design
Computational Fluid Dynamics - Layman View
Physics – Fluid Dynamics
Mathematics – Partial Differential Equations
Computational Geometry
Computer Science / Numerical Analysis
Systems + Hardware
Huge Linear Solve
Huge Non-Linear Solve
Algorithms
Memory bound, e.g. data transportation costs; Direct Numerical Simulation
Compute bound, e.g. grid property updates
Network bound, e.g. large-cluster MPI implementation
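A simple roofline estimate can illustrate the difference between memory-bound and compute-bound kernels. This is a minimal sketch only: the 230 GB/s bandwidth is the per-socket figure quoted elsewhere in this deck, while the peak floating-point rate and the two kernel intensities are hypothetical placeholder values, not measurements.

```python
def attainable_gflops(intensity_flops_per_byte, peak_gflops, bandwidth_gb_s):
    """Roofline estimate: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, intensity_flops_per_byte * bandwidth_gb_s)


def classify(intensity_flops_per_byte, peak_gflops, bandwidth_gb_s):
    """Return 'memory bound' or 'compute bound' for a kernel of given intensity."""
    ridge = peak_gflops / bandwidth_gb_s  # intensity where the two limits meet
    return "memory bound" if intensity_flops_per_byte < ridge else "compute bound"


BANDWIDTH_GB_S = 230.0   # per-socket memory bandwidth quoted in this deck
PEAK_GFLOPS = 500.0      # hypothetical peak, chosen only for illustration

# Hypothetical kernels: a stencil sweep moves many bytes per flop,
# a dense local update reuses data that is already in cache.
for name, intensity in [("stencil sweep", 0.25), ("dense local update", 8.0)]:
    print(name,
          classify(intensity, PEAK_GFLOPS, BANDWIDTH_GB_S),
          attainable_gflops(intensity, PEAK_GFLOPS, BANDWIDTH_GB_S))
```

Under these assumed numbers, a kernel below roughly 2.2 flops per byte is limited by memory bandwidth rather than by the cores.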
Technology shrink alone no longer drives sufficient Cost/Performance improvements
Driving Innovation Beyond The Microprocessor
OpenPOWER Foundation
Platinum Members
The goal of the OpenPOWER Foundation is to
create an open ecosystem, using the POWER
Architecture to share expertise, investment,
and server-class intellectual property to serve
the evolving needs of customers.
Enabling National Computing Agenda
POWER Processor Roadmap
POWER8 (2014, 22 nm, 650 mm2): 12 cores, SMT8, 2X DPFP, PCIe Gen 3, Coprocessor (CAPI), Enhanced Prefetch
POWER8 with NVLink (2016, 22 nm, 659 mm2): NVLink 1.0, 2X CAPI
POWER9 (2017, 14 nm, 695 mm2): 24 cores, New µArchitecture, Direct-attach DDR4, Gen4 PCIe, CAPI 2.0, OpenCAPI 3.0, NVLink 2.0
POWER10 (2020+): >>24 cores, New µArchitecture, Enhanced Memory, OpenCAPI 4.0, Future NVLink
Memory bandwidth
230 GB/s or 115 GB/s sustained memory bandwidth in 8- or 4-port configuration
L4 cache – reducing access latency to
larger data sets
Strong Cores - Greater raw CPU throughput
More performance from each core
Superior Multi-threading & large caches
1, 2, 4 , 8 threads per core – improving
parallelism & throughput
Tighter accelerator integration
Only Processor with integrated NVLink
interface
Memory coherency (GPUs & otherwise)
I/O leadership
Established OpenCAPI
Open Industry Coherent Attach
Latency / Bandwidth Improvement
Removes Overhead from Attach Silicon
FPGA / Parallel Compute Optimized
Network/Memory/Storage Innovation
POWER9 – Premier Acceleration Platform
Extreme Processor / Accelerator Bandwidth and Reduced Latency
Coherent Memory and Virtual Addressing Capability for all Accelerators
OpenPOWER Community Enablement – Robust Accelerated Compute Options
State of the Art I/O and Acceleration Attachment Signaling
• PCIe Gen 4 x 48 lanes – 192 GB/s duplex bandwidth
• 25G Link x 48 lanes – 300 GB/s duplex bandwidth
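The two duplex figures above can be reproduced with simple lane arithmetic. The sketch below assumes 16 Gb/s per lane per direction for PCIe Gen 4 (its 16 GT/s signaling rate, ignoring encoding overhead) and 25 Gb/s per lane for the 25G Link; the helper name is illustrative.

```python
def duplex_gb_s(lanes, gbit_per_lane_per_direction):
    """Aggregate bandwidth in GB/s, counting both directions."""
    one_way_gb_s = lanes * gbit_per_lane_per_direction / 8  # bits -> bytes
    return 2 * one_way_gb_s


# PCIe Gen 4 signals at 16 GT/s per lane; encoding overhead is ignored here.
pcie_gen4 = duplex_gb_s(lanes=48, gbit_per_lane_per_direction=16)
# The 25G Link runs at 25 Gb/s per lane.
link_25g = duplex_gb_s(lanes=48, gbit_per_lane_per_direction=25)

print(f"PCIe Gen 4 x 48 lanes: {pcie_gen4:.0f} GB/s duplex")
print(f"25G Link x 48 lanes:   {link_25g:.0f} GB/s duplex")
```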
Robust Accelerated Compute Options with OPEN standards
• On-Chip Acceleration – Gzip x1, 842 Compression x2, AES/SHA x2
• CAPI 2.0 – 4x bandwidth of POWER8 using PCIe Gen 4
• NVLink 2.0 – Next generation of GPU/CPU interconnect
Up to 2x bandwidth of NVLink 1.0
Easier programming model for complex analytic & cognitive applications
• Coherency, virtual addressing, low overhead communication
• OpenCAPI 3.0 – High bandwidth, low latency and open interface using 25G Link
POWER9
PowerAccel
IBM POWER HPC Long-term Systems Roadmap
2016
GPU
Intensive
S822LC-HPC P8 “Witherspoon” “Deep Eddy” Deep Eddy F.O.
2017-2018 2020+
HPC/HPA/DL
GPU Lite
S821LC P8
HPC Non-accelerated Nodes (2-8 sockets in 2U)
S822LC-B.D. P8
Future
“Boston” 1U
“Boston”
“Aero”
Boston
Follow-on
Boston
Follow-on
Aero
Follow-on
Plans subject to change
Fully Invested in Enabling Heterogeneous Computing
“Minsky” S822LC for HPC
• 2.5X the CPU:GPU Interface Bandwidth
• Tight coupling: strong CPU: strong GPU performance
• Equalizing access to memory - for all kinds of programming
• Closer programming to the CPU paradigm
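One plausible reading of the 2.5X figure is the ratio of the 80 GB/s NVLink number on this slide to a PCIe 3.0 x16 attachment. The PCIe value below is an assumption (the nominal 16 GB/s per direction), not a number taken from the slide.

```python
NVLINK_GB_S = 80.0          # CPU:GPU NVLink figure shown on this slide
PCIE3_X16_GB_S = 2 * 16.0   # assumed: nominal 16 GB/s per direction, both directions

ratio = NVLINK_GB_S / PCIE3_X16_GB_S
print(f"NVLink / PCIe 3.0 x16 = {ratio:.1f}x")
```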
[Diagram: two POWER8 with NVLink (P8’) sockets, each with DDR4 memory at 115 GB/s; four Tesla P100 GPUs attached over NVLink at 80 GB/s; InfiniBand fabric]
Why it Matters:
Use Cases where NVLink will have the most Impact
Stream Data at Same
Rate as Computation
Genomics, Cryptography,
Video Processing, etc.
Burst Data at Startup
and Teardown
CFD/CAE, Machine Learning,
Deep Learning, etc.
Constant Data Transfers
between adjacent GPUs
Molecular Dynamics,
Amber, etc.
Mask Bus Transfers
from Host-Device
Accelerated Databases,
Analytics, etc.
• Profiling result based on running Kinetica “Filter by geographic area” queries on data set of 280 million simulated 1 simultaneous query stream each with 0 think time.
• Power System S822LC for HPC; 20 cores (2 x 10c chips) / 160 threads, POWER8 with NVLink; 2.86 GHz, 1024 GB memory, 2x 6Gb SSDs, 2-port 10 GbEth, 4x Tesla P100 GPU; Ubuntu 16.04.
• Competitive stack: 2x Xeon E5-2640 v4; 20 cores (2 x 10c chips) / 40 threads; Intel Xeon E5-2640 v4; 2.4 GHz; 512GB memory 2x 6Gb SSDs, 2-port 10 GbEth, 4x Tesla P100 GPU, Ubuntu 16.04.
2.7X faster query response time on IBM Power
Systems S822LC for HPC
87% of the total speedup (2.35x of 2.7x improvement)
is due to the NVLink Interface from CPU:GPU
• Performance dividend unique to POWER8 with
NVLink platforms
Query time, competing system (PCI-E x16 3.0): 100 ticks = 79 ticks data transfer (79%) + 21 ticks calculation* (21%)
Query time, S822LC for HPC (NVLink): 37 ticks = 24 ticks data transfer (65%) + 13 ticks calculation* (35%); 2.7x speedup
* Includes non-overlapping CPU, GPU, and idle times.
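The tick counts on this slide are enough to re-derive the headline numbers. The sketch below is one interpretation, not the author's stated method: it credits NVLink with the share of saved time that comes from the data-transfer portion, which comes out close to the 87% quoted above.

```python
# Tick counts read from the two query-time bars on this slide.
pcie = {"transfer": 79, "calc": 21}
nvlink = {"transfer": 24, "calc": 13}

total_pcie = sum(pcie.values())
total_nvlink = sum(nvlink.values())

speedup = total_pcie / total_nvlink
saved_total = total_pcie - total_nvlink
saved_transfer = pcie["transfer"] - nvlink["transfer"]
transfer_share = saved_transfer / saved_total  # fraction of saved time from transfers

print(f"overall speedup: {speedup:.2f}x")
print(f"share of saved time from data transfer: {transfer_share:.0%}")
print(f"speedup credited to NVLink: {transfer_share * speedup:.2f}x")
```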
“Minsky” S822LC for HPC
Reduce Data Transfer For CFD Codes
CPU:GPU NVLink improves data transfer 2.6X in Nekbone
Newly accelerated codes such as Nekbone
which require faster data movement
• Avoid latency that exceeds solver time
• Support your largest datasets with the GPU
accelerated versions of your code because
data-transfer time improves
• IBM Power System S822LC for HPC; 20 cores (2 x 10c chips) / 160 threads, POWER8 with NVLink; 2.86 GHz, 256 GB memory, 2 x 1TB SATA 7.2K rpm HDD, 2-port 10 GbEth, 2x Tesla P100 GPU;
Ubuntu 16.04.
• Competitive stack: 2x Xeon E5-2640 v4; 20 cores (2 x 10c chips) / 40 threads; Intel Xeon E5-2640 v4; 2.4 GHz; 256 GB memory, 2 x 1TB SATA 7.2K rpm HDD, 2-port 10 GbEth, 2x Tesla K80 GPUs,
Ubuntu 16.04.
OpenFOAM: Motorbike example
Provided with the OpenFOAM examples: incompressible/simpleFoam
Different problem sizes: 12M – 100M points
simpleFoam, double precision
OpenFOAM: Motorbike example
Multi-node runs with up to 500 cores / 25 nodes
Perfect scaling for 50M and 100M cells up to 500 cores
Smaller cases run into saturation
[Figure: MotorBike on S822LC. Runtime [s] (log scale, 10 to 100000) vs. cores (16 to 1024) for the 12.8M, 50M and 100M cases]
[Figure: Parallel efficiency (0.50 to 1.20) vs. cores (16 to 1024) for the 12.8M, 50M and 100M cases]
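Parallel efficiency in a strong-scaling study like this one is usually computed relative to a reference run. The sketch below shows the usual definition; the timings are hypothetical placeholders, not values read from the plots.

```python
def parallel_efficiency(ref_cores, ref_runtime_s, cores, runtime_s):
    """Strong-scaling efficiency relative to a reference run.

    1.0 means perfect scaling; values above 1.0 indicate superlinear scaling.
    """
    speedup = ref_runtime_s / runtime_s
    ideal_speedup = cores / ref_cores
    return speedup / ideal_speedup


# Hypothetical timings, NOT taken from the slides: (cores, runtime in seconds).
runs = [(20, 8000.0), (100, 1600.0), (500, 400.0)]
ref_cores, ref_runtime = runs[0]
for cores, runtime in runs:
    eff = parallel_efficiency(ref_cores, ref_runtime, cores, runtime)
    print(f"{cores:4d} cores: efficiency {eff:.2f}")
```

A case that "runs into saturation" shows up as efficiency falling well below 1.0 as cores are added, because each core has too little work relative to communication.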
Code Optimization: Reservoir Simulation – Exploration for Oil & Gas
New Record IBM + Nvidia + Stone Ridge Technology (Echelon)
April ’17 - 1 billion cells simulated in 92 minutes using 30 two-socket IBM OpenPOWER POWER8 “Minsky” servers (30 server nodes, 60 POWER8 chips [600 cores] + 120 Nvidia P100 GPUs)
Previous Record
ExxonMobil – Feb ’17: used the full Blue Waters facility at NCSA (716,800 cores & 22,400 server nodes) to simulate one billion cells; the run took 20 hours
https://www-03.ibm.com/press/us/en/pressrelease/52164.wss
https://venturebeat.com/2017/04/25/ibm-nvidia-and-stone-ridge-technology-set-record-for-supercomputing-in-oil-and-gas-exploration/
Billion-cell simulation shatters previous published results
using one-tenth the power and 1/100th of the space

Results achieved in 92 minutes with 60 IBM POWER8
processors & 120 Nvidia P100 GPU accelerators, smash
previous published record of 20 hours using thousands of
processors
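The figures quoted on this slide can be turned into simple ratios. This is plain arithmetic on the slide's own numbers; the core ratio counts CPU cores only and so understates the hardware in the GPU-accelerated run.

```python
previous_minutes = 20 * 60   # 20 hours for the previous record
new_minutes = 92             # 92 minutes on 30 Minsky servers

time_speedup = previous_minutes / new_minutes
node_ratio = 22_400 / 30     # server nodes: previous run vs. new run
core_ratio = 716_800 / 600   # CPU cores only; the 120 GPUs are not counted

print(f"time to solution: {time_speedup:.1f}x faster")
print(f"server nodes: {node_ratio:.0f}x fewer")
print(f"CPU cores: {core_ratio:.0f}x fewer")
```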
https://www.ibm.com/us-en/marketplace/deep-learning-platform
Deep Learning - PowerAI
Thinking Differently - Getting HPC to ‘Work Smart Not Hard’
• Beginning to see ML/DL techniques
being applied to CFD codes
• Typically HPC development is focused on
increased speed.
• The fastest calculation is the one which
you don’t run!
• Can we use machine learning to make
better decisions on which simulations
give the most value?
• Can we use machine learning to improve
resolution of information?
‘Cognitive’ workflow uses 1/3 of the calculations to achieve 4
orders of magnitude resolution increase
Cognitive steering of an ensemble of simulations
BACKUP
POWER8
Compute
6/8/10/12 cores, ST/SMT2/SMT4/SMT8
Enhanced, Auto balancing threads
8 dispatch/16 execution pipes/224 instructions in flight
Transactional Memory/ Crypto & CRC instructions
Cache
64KB L1 + 512KB L2 / core
96MB L3 + up to 128MB L4 / socket
System Interfaces
230 GB/s memory bandwidth / socket vs Intel 85GB/s
Up to 48x Integrated PCIe Gen 3 / socket
CAPI (over PCIe Gen 3)
Robust, Large SMP Interconnect
On chip Energy Mgmt, VRM / core
IBM Journal of Research and Development, Issue 1, Jan.-Feb. 2015 (available on IEEE Xplore)
POWER9 Processor
New Core Microarchitecture
• Stronger thread performance
• Efficient agile pipeline
• POWER ISA v3.0
Enhanced Cache Hierarchy
• 120MB NUCA L3 architecture
• 12 x 20-way associative regions
• Advanced replacement policies
• Fed by 7 TB/s on-chip bandwidth
Cloud + Virtualization Innovation
• Quality of service assists
• New interrupt architecture
• Workload optimized frequency
• Hardware enforced trusted execution
Leadership Hardware Acceleration
• Enhanced on-chip acceleration
• Nvidia NVLink 2.0: High bandwidth,
advanced new features (25G Link)
• CAPI 2.0: Coherent accelerator and
storage attach (PCIe G4)
• New CAPI: Improved latency and
bandwidth, open interface (25G Link)
State of the Art I/O Subsystem
• PCIe Gen4 – 48 lanes
High Bandwidth Signaling Technology
• 16 Gb/s interface – Local SMP
• 25 Gb/s interface – BlueLink
– Accelerator, remote SMP
14nm finFET Semiconductor Process
• Improved device performance and
reduced energy
• 17 layer metal stack and eDRAM
• 8.0 billion transistors
InfiniBand remains our strategic interconnect
IB is an open standard with a proven track record
over a decade of forward and backward
compatibility
IB features rich hardware offload
• “Intelligent” Mellanox IB NIC versus Intel Omni-Path NIC
More CPU cycles available for application
processing with less CPU frequency sensitivity,
processor/memory occupancy, jitter
• MPI collectives acceleration in the IB fabric,
reducing CPU overhead
• Additional intelligence in the switch
• IB features GPU direct, reducing CPU overhead
• IB features NVMe over Fabrics support, reducing
CPU overhead
• IB has hardware-controlled, end-to-end retry
IB supports multiple network topologies for cost
optimization
• Fat tree, 3D torus, Dragonfly+
Open Industry Coherent Attach
• Latency / Bandwidth Improvement
• Removes Overhead from Attach Silicon
• Eliminates “Von-Neumann Bottleneck”
• FPGA / Parallel Compute Optimized
• Network/Memory/Storage Innovation
http://opencapi.org
Consortium announced October 14, 2016
Open Forum to Manage the OpenCAPI
Specification & Ecosystem
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
AdityaSharma944496
 
Compiler Design Unit1 PPT Phases of Compiler.pptx
Compiler Design Unit1 PPT Phases of Compiler.pptxCompiler Design Unit1 PPT Phases of Compiler.pptx
Compiler Design Unit1 PPT Phases of Compiler.pptx
RushaliDeshmukh2
 
ELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdfELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdf
Shiju Jacob
 
theory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptxtheory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptx
sanchezvanessa7896
 
Degree_of_Automation.pdf for Instrumentation and industrial specialist
Degree_of_Automation.pdf for  Instrumentation  and industrial specialistDegree_of_Automation.pdf for  Instrumentation  and industrial specialist
Degree_of_Automation.pdf for Instrumentation and industrial specialist
shreyabhosale19
 
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptxExplainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
MahaveerVPandit
 
Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.
anuragmk56
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
Avnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights FlyerAvnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights Flyer
WillDavies22
 
15th International Conference on Computer Science, Engineering and Applicatio...
15th International Conference on Computer Science, Engineering and Applicatio...15th International Conference on Computer Science, Engineering and Applicatio...
15th International Conference on Computer Science, Engineering and Applicatio...
IJCSES Journal
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
some basics electrical and electronics knowledge
some basics electrical and electronics knowledgesome basics electrical and electronics knowledge
some basics electrical and electronics knowledge
nguyentrungdo88
 
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E..."Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
Infopitaara
 
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
charlesdick1345
 

HPC Infrastructure To Solve The CFD Grand Challenge

  • 1. HPC Infrastructure To Solve The CFD Grand Challenge Anand Haridass Senior Technical Staff Member India System Development Lab IBM India 19th Aeronautical Society of India CFD Symposium Bangalore Aug 10-11 2017
  • 2. IBM Systems. Aerospace Design; Turbine Design. Computational Fluid Dynamics - Layman View: Physics – Fluid Dynamics; Mathematics – Partial Diff. Eqn.; Computational Geometry; Computer Science / Numerical Analysis; Systems + Hardware. Huge Linear Solve; Huge Non-Linear Solve; Algorithms. Memory Bound, e.g. data transportation costs, Direct Numerical Simulation. Compute Bound, e.g. grid property updates. Networking Bound, e.g. large cluster MPI implementation.
  • 3. Technology shrink alone no longer drives sufficient Cost/Performance improvements. Driving Innovation Beyond The Microprocessor
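The memory-bound versus compute-bound distinction on this slide can be sketched with a roofline-style check. This is a minimal illustration, not something taken from the talk: the peak FLOP rate, the bandwidth figure and both example kernels are hypothetical values chosen only to show the comparison.

```python
# Roofline-style check: is a kernel limited by memory traffic or by arithmetic?
# All hardware numbers and kernels below are illustrative assumptions.

def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Floating-point operations performed per byte moved to or from memory."""
    return flops / bytes_moved

def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: the lower of the compute ceiling and the bandwidth ceiling."""
    return min(peak_gflops, intensity * bandwidth_gbs)

def classify(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> str:
    """Compare kernel intensity with the machine balance (flops per byte)."""
    machine_balance = peak_gflops / bandwidth_gbs
    return "memory-bound" if intensity < machine_balance else "compute-bound"

PEAK_GFLOPS = 500.0    # hypothetical peak double-precision rate of one socket
BANDWIDTH_GBS = 230.0  # hypothetical sustained memory bandwidth in GB/s

# Streaming vector update y[i] += a * x[i] in double precision:
# 2 flops per element, 24 bytes moved (read x, read y, write y).
streaming = arithmetic_intensity(2, 24)

# Hypothetical kernel that reuses data in cache: 64 flops per 8 bytes moved.
cache_friendly = arithmetic_intensity(64, 8)

for name, ai in [("streaming update", streaming), ("cache-friendly kernel", cache_friendly)]:
    bound = classify(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
    ceiling = attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
    print(f"{name}: {ai:.3f} flop/byte, {bound}, at most {ceiling:.1f} GFLOP/s")
```

Under these assumptions the streaming update is capped by bandwidth at a small fraction of peak, which is why faster memory and interconnects matter for such kernels.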
  • 4. OpenPOWER Foundation Platinum Members The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise, investment, and server-class intellectual property to serve the evolving needs of customers.
  • 6. POWER Processor Roadmap. POWER8 (2014, 22 nm, 650 mm²): 12 Cores, SMT8, 2X DPFP, PCIe Gen 3, Coprocessor (CAPI). POWER8 with NVLink (2016, 22 nm, 659 mm²): Enhanced Prefetch, NVLink 1.0, 2X CAPI. POWER9 (2017, 14 nm, 695 mm²): 24 Cores, New µArchitecture, Direct-attach DDR4, Gen4 PCIe, CAPI 2.0, OpenCAPI 3.0, NVLink 2.0. POWER10 (2020+): >>24 Cores, New µArchitecture, Enhanced Memory, OpenCAPI 4.0, Future NVLink. Memory bandwidth: 230 GB/s or 115 GB/s sustained memory bandwidth in 8/4 port configuration; L4 cache – reducing access latency to larger data sets. Strong Cores - Greater raw CPU throughput: more performance from each core. Superior multi-threading & large caches: 1, 2, 4, 8 threads per core – improving parallelism & throughput. Tighter accelerator integration: only processor with integrated NVLink interface; memory coherency (GPUs & otherwise). I/O leadership: established OpenCAPI. Open Industry Coherent Attach: Latency / Bandwidth Improvement; Removes Overhead from Attach Silicon; FPGA / Parallel Compute Optimized; Network/Memory/Storage Innovation
  • 7. POWER9 – Premier Acceleration Platform. Extreme Processor / Accelerator Bandwidth and Reduced Latency. Coherent Memory and Virtual Addressing Capability for all Accelerators. OpenPOWER Community Enablement – Robust Accelerated Compute Options. State of the Art I/O and Acceleration Attachment Signaling • PCIe Gen 4 x 48 lanes – 192 GB/s duplex bandwidth • 25G Link x 48 lanes – 300 GB/s duplex bandwidth. Robust Accelerated Compute Options with OPEN standards • On-Chip Acceleration – Gzip x1, 842 Compression x2, AES/SHA x2 • CAPI 2.0 – 4x bandwidth of POWER8 using PCIe Gen 4 • NVLink 2.0 – Next generation of GPU/CPU interconnect. Up to 2x bandwidth of NVLink 1.0. Easier programming model for complex analytic & cognitive applications • Coherency, virtual addressing, low overhead communication • OpenCAPI 3.0 – High bandwidth, low latency and open interface using 25G Link. POWER9 PowerAccel
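The two duplex bandwidth figures on this slide can be reproduced from the lane count and the nominal per-lane signaling rate. The sketch below assumes 16 Gb/s per lane for PCIe Gen 4 and 25 Gb/s per lane for the 25G Link, and it ignores encoding and protocol overhead, so the results are nominal upper bounds rather than delivered throughput. The function name is illustrative.

```python
# Nominal duplex bandwidth from lane count and per-lane signaling rate.
# Encoding and protocol overhead are deliberately ignored (assumption).

def duplex_bandwidth_gbytes(lanes: int, gbits_per_lane: float) -> float:
    """Return the nominal bandwidth in GB/s summed over both directions."""
    per_direction = lanes * gbits_per_lane / 8.0  # bits to bytes
    return 2.0 * per_direction                    # duplex: both directions

pcie_gen4 = duplex_bandwidth_gbytes(lanes=48, gbits_per_lane=16.0)
link_25g = duplex_bandwidth_gbytes(lanes=48, gbits_per_lane=25.0)

print(f"PCIe Gen 4 x48: {pcie_gen4:.0f} GB/s duplex")
print(f"25G Link x48:   {link_25g:.0f} GB/s duplex")
```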
  • 8. IBM Systems IBM POWER HPC Long-term Systems Roadmap: 2016 GPU Intensive S822LC-HPC P8 “Witherspoon” “Deep Eddy” Deep Eddy F.O. 2017-2018 2020+ HPC/HPA/DL GPU Lite S821LC P8 HPC Non-accelerated Nodes (2-8 sockets in 2U) S822LC-B.D. P8 Future “Boston” 1U “Boston” “Aero” Boston Follow-on Boston Follow-on Aero Follow-on. Plans subject to change. Fully Invested in Enabling Heterogeneous Computing
  • 9. IBM Systems “Minsky” S822LC for HPC • 2.5X the CPU:GPU Interface Bandwidth • Tight coupling: strong CPU: strong GPU performance • Equalizing access to memory - for all kinds of programming • Closer programming to the CPU paradigm. [Diagram labels: DDR4 115 GB/s (x2), P8’ (x2), NVLink 80 GB/s (x2), Tesla P100 (x4), InfiniBand Fabric] Why it Matters: Use Cases where NVLink will have the most Impact • Stream Data at Same Rate as Computation: Genomics, Cryptography, Video Processing, etc. • Burst Data at Startup and Teardown: CFD/CAE, Machine Learning, Deep Learning, etc. • Constant Data Transfers between adjacent GPUs: Molecular Dynamics, Amber, etc. • Mask Bus Transfers from Host-Device: Accelerated Databases, Analytics, etc.
  • 10. IBM Systems • Profiling result based on running Kinetica “Filter by geographic area” queries on data set of 280 million simulated 1 simultaneous query stream each with 0 think time. • Power System S822LC for HPC; 20 cores (2 x 10c chips) / 160 threads, POWER8 with NVLink; 2.86 GHz, 1024 GB memory, 2x 6Gb SSDs, 2-port 10 GbEth, 4x Tesla P100 GPU; Ubuntu 16.04. • Competitive stack: 2x Xeon E5-2640 v4; 20 cores (2 x 10c chips) / 40 threads; Intel Xeon E5-2640 v4; 2.4 GHz; 512 GB memory, 2x 6Gb SSDs, 2-port 10 GbEth, 4x Tesla P100 GPU, Ubuntu 16.04. 2.7X faster query response time on IBM Power Systems S822LC for HPC. 87% of the total speedup (2.35x of 2.7x improvement) is due to the NVLink Interface from CPU:GPU • Performance dividend unique to POWER8 with NVLink platforms. Query Time, Competing System (PCI-E x16 3.0): 100 ticks = 79 ticks Data Transfer (79%) + 21 ticks Calculation* (21%). Query Time, S822LC for HPC (NVLink): 37 ticks = 24 ticks Data Transfer (65%) + 13 ticks Calculation* (35%). 2.7x speedup. * Includes non-overlapping: CPU, GPU, and idle times. “Minsky” S822LC for HPC
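The tick counts on this slide are enough to re-derive the headline numbers. The sketch below recomputes the overall speedup and the time shares. It also shows one reading, an assumption rather than the author's stated method, under which the 87% figure is the fraction of saved ticks that comes from faster data transfer.

```python
# Re-deriving the query-time figures from the tick counts on the slide.

def shares(transfer_ticks: float, calc_ticks: float):
    """Return total ticks and the fractions spent in transfer and calculation."""
    total = transfer_ticks + calc_ticks
    return total, transfer_ticks / total, calc_ticks / total

pcie_total, pcie_transfer_share, pcie_calc_share = shares(79, 21)
nvlink_total, nvlink_transfer_share, nvlink_calc_share = shares(24, 13)

overall_speedup = pcie_total / nvlink_total

# Assumed interpretation of the 87% claim: share of the saved ticks
# that is attributable to the data-transfer phase.
saved_total = pcie_total - nvlink_total  # 63 ticks saved overall
saved_transfer = 79 - 24                 # 55 ticks saved in data transfer
transfer_share_of_savings = saved_transfer / saved_total

print(f"overall speedup: {overall_speedup:.2f}x")
print(f"NVLink system: {nvlink_transfer_share:.0%} transfer, {nvlink_calc_share:.0%} calculation")
print(f"share of savings from data transfer: {transfer_share_of_savings:.0%}")
```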
  • 11. Reduce Data Transfer For CFD Codes: CPU:GPU NVLink improves data transfer 2.6X in Nekbone. Newly accelerated codes such as Nekbone require faster data movement • Avoid latency that exceeds solver time • Support your largest datasets with the GPU accelerated versions of your code because data-transfer time improves • IBM Power System S822LC for HPC; 20 cores (2 x 10c chips) / 160 threads, POWER8 with NVLink; 2.86 GHz, 256 GB memory, 2 x 1TB SATA 7.2K rpm HDD, 2-port 10 GbEth, 2x Tesla P100 GPU; Ubuntu 16.04. • Competitive stack: 2x Xeon E5-2640 v4; 20 cores (2 x 10c chips) / 40 threads; Intel Xeon E5-2640 v4; 2.4 GHz; 256 GB memory, 2 x 1TB SATA 7.2K rpm HDD, 2-port 10 GbEth, 2x Tesla K80 GPUs, Ubuntu 16.04.
  • 12. OpenFOAM: Motorbike example. Provided with OpenFOAM examples: incompressible/simpleFoam. Different problem sizes: 12M – 100M points. simpleFoam, double precision
  • 13. OpenFOAM: Motorbike example. Multi-node runs with up to 500 cores / 25 nodes. Perfect scaling for 50M and 100M cells up to 500 cores. Smaller cases run into saturation. [Charts: “MotorBike on S822LC”, Runtime [s] vs. Cores; “Parallel Efficiency”, Efficiency vs. Cores; series 12.8M, 50M, 100M]
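The parallel-efficiency metric plotted on this slide is commonly computed as measured speedup divided by ideal speedup, both relative to the smallest run. The sketch below uses that standard strong-scaling definition. The timings are hypothetical placeholders, not values read from the charts, chosen so that one case scales perfectly and a smaller case saturates.

```python
# Strong-scaling parallel efficiency relative to a reference run.
# The timings below are hypothetical, NOT the measured OpenFOAM results.

def parallel_efficiency(t_ref: float, cores_ref: int, t: float, cores: int) -> float:
    """Measured speedup divided by the ideal speedup for the added cores."""
    speedup = t_ref / t
    ideal_speedup = cores / cores_ref
    return speedup / ideal_speedup

def efficiencies(runs: dict) -> dict:
    """Map core count to efficiency, using the smallest run as the reference."""
    ref_cores = min(runs)
    t_ref = runs[ref_cores]
    return {c: parallel_efficiency(t_ref, ref_cores, t, c) for c, t in sorted(runs.items())}

# cores -> runtime in seconds (20 cores per node, as on the S822LC)
large_case = {20: 10000.0, 100: 2000.0, 500: 400.0}  # hypothetical, ideal scaling
small_case = {20: 1000.0, 100: 250.0, 500: 100.0}    # hypothetical, saturating

for name, runs in [("large case", large_case), ("small case", small_case)]:
    print(name, {c: round(e, 2) for c, e in efficiencies(runs).items()})
```

An efficiency near 1.0 corresponds to the "perfect scaling" the slide reports for the 50M and 100M cases, while a falling value corresponds to the saturation seen for smaller meshes.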
  • 14. IBM Systems. Code Optimization: Reservoir Simulation – Exploration for Oil & Gas. New Record: IBM + Nvidia + Stone Ridge Technology (Echelon), April ’17 - 1 billion cells simulated in 92 minutes using 30 two-socket IBM OpenPOWER POWER8 Minsky servers (30 server nodes, 60 POWER8 chips [600 cores] + 120 Nvidia P100 GPUs). Previous Record: ExxonMobil, Feb ’17 - used the full Blue Waters facility at NCSA (716,800 cores & 22,400 server nodes) to simulate one billion cells; took 20 hours. https://www-03.ibm.com/press/us/en/pressrelease/52164.wss https://venturebeat.com/2017/04/25/ibm-nvidia-and-stone-ridge-technology-set-record-for-supercomputing-in-oil-and-gas-exploration/ Billion-cell simulation shatters previous published results using one-tenth the power and 1/100th of the space
 Results achieved in 92 minutes with 60 IBM POWER8 processors & 120 Nvidia P100 GPU accelerators smash previous published record of 20 hours using thousands of processors
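The scale of the difference between the two records can be worked out directly from the figures quoted on this slide. The short calculation below uses only those numbers. Note that the core ratio counts CPU cores alone and leaves out the 120 GPUs, so it is not a like-for-like hardware comparison.

```python
# Ratios between the previous record and the new record, from the slide figures.

new_minutes = 92
old_minutes = 20 * 60  # 20 hours

time_ratio = old_minutes / new_minutes  # how many times faster the new run is
node_ratio = 22_400 / 30                # server nodes: previous vs. new
cpu_core_ratio = 716_800 / 600          # CPU cores only; GPUs are not counted

print(f"time to solution: {time_ratio:.1f}x shorter")
print(f"server nodes:     {node_ratio:.0f}x fewer")
print(f"CPU cores:        {cpu_core_ratio:.0f}x fewer (GPUs excluded)")
```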
  • 15. IBM Systems. Deep Learning - PowerAI https://www.ibm.com/us-en/marketplace/deep-learning-platform
  • 16. Thinking Differently - Getting HPC to ‘Work Smart Not Hard’ • Beginning to see ML/DL techniques being applied to CFD codes • Typically HPC development is focused on increased speed. • The fastest calculation is the one which you don’t run! • Can we use machine learning to make better decisions on which simulations give the most value? • Can we use machine learning to improve resolution of information? ‘Cognitive’ workflow uses 1/3 of the calculations to achieve 4 orders of magnitude resolution increase. Cognitive steering of an ensemble of simulations
  • 18. POWER8. Compute: 6/8/10/12 cores, ST/SMT2/SMT4/SMT8; Enhanced, auto-balancing threads; 8 dispatch / 16 execution pipes / 224 instructions in flight; Transactional Memory / Crypto & CRC instructions. Cache: 64KB L1 + 512KB L2 / core; 96MB L3 + up to 128MB L4 / socket. System Interfaces: 230 GB/s memory bandwidth / socket vs Intel 85 GB/s; Up to 48x Integrated PCI gen 3 / socket; CAPI (over PCI gen 3); Robust, Large SMP Interconnect; On chip Energy Mgmt, VRM / core. IBM Journal of Research and Development, Issue 1, Jan.-Feb. 2015 (on IEEE Xplore)
  • 19. POWER9 Processor. New Core Microarchitecture • Stronger thread performance • Efficient agile pipeline • POWER ISA v3.0. Enhanced Cache Hierarchy • 120MB NUCA L3 architecture • 12 x 20-way associative regions • Advanced replacement policies • Fed by 7 TB/s on-chip bandwidth. Cloud + Virtualization Innovation • Quality of service assists • New interrupt architecture • Workload optimized frequency • Hardware enforced trusted execution. Leadership Hardware Acceleration • Enhanced on-chip acceleration • Nvidia NVLink 2.0: High bandwidth, advanced new features (25G Link) • CAPI 2.0: Coherent accelerator and storage attach (PCIe G4) • New CAPI: Improved latency and bandwidth, open interface (25G Link). State of the Art I/O Subsystem • PCIe Gen4 – 48 lanes. High Bandwidth Signaling Technology • 16 Gb/s interface – Local SMP • 25 Gb/s interface – BlueLink – Accelerator, remote SMP. 14nm finFET Semiconductor Process • Improved device performance and reduced energy • 17 layer metal stack and eDRAM • 8.0 billion transistors
  • 20. InfiniBand remains our strategic interconnect. IB is an open standard with a proven track record over a decade of forward and backward compatibility. IB features rich hardware offload • “Intelligent” Mellanox IB NIC versus Intel Omni-Path NIC: more CPU cycles available for application processing, with less CPU frequency sensitivity, processor/memory occupancy, jitter • MPI collectives acceleration in the IB fabric, reducing CPU overhead • Additional intelligence in the switch • IB features GPUDirect, reducing CPU overhead • IB features NVMe over Fabrics support, reducing CPU overhead • IB has hardware-controlled, end-to-end retry. IB supports multiple network topologies for cost optimization • Fat tree, 3D torus, Dragonfly+
  • 21. Open Industry Coherent Attach • Latency / Bandwidth Improvement • Removes Overhead from Attach Silicon • Eliminates “Von-Neumann Bottleneck” • FPGA / Parallel Compute Optimized • Network/Memory/Storage Innovation. http://opencapi.org Consortium announced October 14, 2016. Open Forum to Manage the OpenCAPI Specification & Ecosystem