© 2006 IBM Corporation
Intro to Cell Broadband Engine for HPC
H. Peter Hofstee
Cell/B.E. Chief Scientist
Cell Broadband Engine is a trademark of Sony Computer Entertainment Inc.
IBM Systems and Technology Group
Dealing with the Memory Wall in the Compute Node
Manage locality
– Cell/B.E. does this explicitly, but nearly every processor needs it once you tune for performance
Go very (thread) parallel in the node
– Many relatively slow threads so that memory appears closer
– IBM Blue Gene, Sun Niagara, CRAY XMT, …
Prefetch
– Generalization of long-vector (compute is the easy part)
– Cell/B.E. (code and data), old-style CRAY
All place a burden on programmers
– Automatic caching has its limits
– Auto-parallelization has its limits
– Automatic pre-fetching / Deep auto-vectorization has its limits
All have proven efficiency benefits
– Blue Gene/P (BGP) and Roadrunner have about the same GFlops/W
– Both have significantly improved efficiency on a variety of applications compared with clusters of conventional processors.
Memory Managing Processor vs. Traditional General Purpose Processor
Figure: IBM, AMD, and Intel processors compared with the Cell BE
Cell Broadband Engine™ Architecture (CBEA) Technology Competitive Roadmap, 2006-2010
Cell BE (1+8: one PPE plus eight SPEs), 90nm SOI
Cost Reduction Path
– Cell/B.E. (1+8), 65nm SOI
– Cell/B.E. (1+8), 45nm SOI
– Continued shrinks
Performance Enhancements/Scaling Path
– Enhanced Cell (1+8eDP SPE), 65nm SOI
– Next Gen (2PPE’+32SPE’), 45nm SOI, ~1 TF-SP (est.)
Cell eDP chip: IBM® PowerXCell™ 8i, to be used in Roadrunner
– 102.4 GF/s double precision
– Up to 16 GB DDR2 @ 21-25 GB/s
– PowerXCell is IBM’s name for this new enhanced double-precision (eDP) Cell processor variant
All future dates and specifications are estimations only; subject to change without notice.
Dashed outlines indicate concept designs.
Boeing 777 iRT Demo
Hybrid Configuration
– Ridgeback memory server (112GB memory)
– QS21 rendering accelerators (6 Tflops, 14 blades)
– 350M-triangle model
– 25GB working set
– 23,000x more complex than today’s game models
– On-demand transfers to blades
– NFS RDMA over IB
– Real-time 1080p ray-traced output
Compute Hierarchy
– Head node load balancing blades
– PPE load balancing SPEs
Transparent Memory Hierarchy
(x86 disk) –> (x86 memory) –> (Cell memory) –> (SPE local store) –> (SPE register file)
– Capacities shown: 128GB, 2GB, 256KB
– Bandwidths shown between levels: 120MB/sec, 2GB/sec, 25GB/sec, 50GB/sec
http://gametomorrow.com/blog/index.php/2007/11/09/cell-and-the-boeing-777-at-sc07/
A Roadrunner “Triblade” node integrates Cell and Opteron blades
QS22 is a future IBM Cell blade containing two new enhanced double-precision (eDP/PowerXCell™) Cell chips
Expansion blade connects two QS22 via four internal PCI-E x8 links to LS21 and provides the node’s ConnectX IB 4X DDR cluster attachment
LS21 is an IBM dual-socket Opteron blade
4-wide IBM BladeCenter packaging
Roadrunner Triblades are completely diskless and run from RAM disks, with NFS & Panasas only to the LS21
Node design points:
– One Cell chip per Opteron core
– ~400 GF/s double precision & ~800 GF/s single precision
– 16 GB Cell memory & 8 GB Opteron memory
Figure: Triblade block diagram. Two QS22 blades (two Cell eDP chips each, 4 GB per chip) and an LS21 (two AMD dual-core Opterons, 4 GB each) connect through an expansion blade with HT2100 I/O hubs, using dual PCI-E x8 flex cables and HT x16 links; the expansion blade provides IB 4x DDR to the cluster.
Roadrunner is a hybrid Cell-accelerated 1.4 PF system of modest size delivered in 2008
– 18 clusters (Connected Units), 3,456 nodes
– Connected Unit (CU) cluster: 180 compute nodes w/ Cells, 12 I/O nodes, 288-port IB 4X DDR switch
– Eight 2nd-stage 288-port IB 4X DDR switches, with 12 links per CU to each of the 8 switches
– PCI-e attached Cell blades
– 12,960 Cell eDP chips ⇒ 1.3 PF, 52 TB
– 6,912 dual-core Opterons ⇒ 50 TF, 28 TB
– 296 racks
– 3.9 MW
Roadrunner Entry Level System
12 Hybrid Node Cluster
Hybrid Compute Node
– 24 QS22s (future IBM Cell blades, each containing two new enhanced double-precision IBM® PowerXCell™ 8i processors)
– 12 LS21s (IBM dual-socket Opteron blades)
– Connected via four PCI-e x8 links
– Includes a ConnectX IB 4X DDR cluster attachment
– Compute node is diskless
IBM x3655 I/O and Management Servers
4-wide IBM BladeCenter packaging
24-port IB 4X DDR Switch & Fabric
RHEL & Fedora Linux
IBM SDK 3.0 for Multicore Acceleration
IBM xCAT Cluster Management
– System-wide Gigabit Ethernet network
Performance
               Host   Cell   Total
Peak (TF)      0.35   4.92   5.26
Memory (GB)    96     192    288
Ext IO (GB/s): 1.2
Cell and hybrid speedup results are promising.
– All comparisons are to a single Opteron core
– Parallel behavior is unaffected, as will be shown in the scaling results
– The Cell/hybrid SPaSM implementation does twice the work of the Opteron-only code
– Milagro Cell-only results are preliminary
– The first 3 columns are measured; the last column is projected
Application   Type       Cell Only (kernels)   Hybrid (Opteron+Cell)
                         CBE      eDP          CBE+IB     eDP+PCIe
SPaSM         full app   3x       4.5x         2.5x       >4x
VPIC          full app   9x       9x           6x         >7x
Milagro       full app   5x       6.5x         5x         >6x
Sweep3D       kernel     5x       9x           5x         >5x
Courtesy John Turner, LANL
These results were achieved with a relatively modest level of effort.
Code      Class      Language   Lines of code          FY07 FTEs
                                Orig.     Modified
VPIC      full app   C/C++      8.5k      10%          2
SPaSM     full app   C          34k       20%          2
Milagro   full app   C++        110k      30%          2 x 1
Sweep3D   kernel     C          3.5k      50%          2 x 1
– All staff started with little or no knowledge of Cell/hybrid programming
– "2 x 1" denotes separate efforts of roughly 1 FTE each
– Most efforts also added code
Courtesy John Turner, LANL
Where can we take Cell/B.E. next?
Build bridges to facilitate code porting and code portability
– E.g., compiler-managed instruction and data caches
– Target is competitive chip-level efficiency without Cell-specific software
– Still allows full Cell benefit with optimized libraries and tuning
– E.g., Multicore (and) Acceleration software development toolkit
– Allow a wider audience to write parallel codes for a node
– Porting across a wide variety of systems
Continue to enhance the Synergistic Processor Elements
– Continue to increase application reach
– Continue to measure ourselves on application performance/W and application performance/mm²
Integrate the equivalent of a Roadrunner node on a chip
– Leverage POWER7 technology
– Allows a 10 PFlop system of reasonable size
– Improved latency and bandwidth between the SPEs and the main core
– Improved cross-system latencies
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 

Intro to Cell Broadband Engine for HPC

  • 1. © 2006 IBM Corporation Intro to Cell Broadband Engine for HPC H. Peter Hofstee Cell/B.E. Chief Scientist Cell Broadband Engine is a trademark of Sony Computer Entertainment Inc. IBM Systems and Technology Group
  • 2. Dealing with the Memory Wall in the Compute Node
    Manage locality
      – Cell/B.E. does this explicitly, but it applies to nearly every processor once you tune
    Go very (thread) parallel in the node
      – Many relatively slow threads, so that memory appears closer
      – IBM BlueGene, Sun Niagara, CRAY XMT, …
    Prefetch
      – Generalization of long vectors (compute is the easy part)
      – Cell/B.E. (code and data), old-style CRAY
    All place a burden on programmers
      – Automatic caching has its limits
      – Auto-parallelization has its limits
      – Automatic prefetching / deep auto-vectorization has its limits
    All have proven efficiency benefits
      – BlueGene/P (BGP) and Roadrunner have about the same GFlops/W
      – Both have significantly improved application efficiency on a variety of applications over clusters of conventional processors
  • 3. Memory Managing Processor vs. Traditional General Purpose Processor (figure comparing IBM, AMD, Intel, and Cell BE processors)
  • 5. Cell Broadband Engine™ Architecture (CBEA) Technology Competitive Roadmap, 2006 to 2010
    Cost Reduction Path (continued shrinks)
      – Cell BE (1+8), 90nm SOI
      – Cell/B.E. (1+8), 65nm SOI
      – Cell/B.E. (1+8), 45nm SOI
    Performance Enhancements/Scaling Path
      – Enhanced Cell (1+8 eDP SPE), 65nm SOI
      – Next Gen (2 PPE’ + 32 SPE’), 45nm SOI, ~1 TF-SP (est.)
    Cell eDP chip, to be used in Roadrunner: IBM® PowerXCell™ 8i
      – 102.4 GF/s double precision
      – Up to 16 GB DDR2 @ 21-25 GB/s
      – PowerXCell is IBM’s name for this new enhanced double-precision (eDP) Cell processor variant
    All future dates and specifications are estimations only; subject to change without notice. Dashed outlines indicate concept designs.
  • 6. Boeing 777 iRT Demo
    Hybrid configuration
      – Ridgeback memory server (112 GB memory)
      – QS21 rendering accelerators (6 TFlops, 14 blades)
      – 350M-triangle model, 25 GB working set
      – 23,000x more complex than today’s game models
      – On-demand transfers to blades via NFS/RDMA over InfiniBand
      – Real-time 1080p ray-traced output
    Compute hierarchy
      – Head node load-balances across blades
      – PPE load-balances across SPEs
    Transparent memory hierarchy
      – (x86 disk) → (x86 memory, 128 GB) → (Cell memory, 2 GB) → (SPE local store, 256 KB) → (SPE register file)
      – Link bandwidths, in order: 120 MB/s, 2 GB/s, 25 GB/s, 50 GB/s
    http://gametomorrow.com/blog/index.php/2007/11/09/cell-and-the-boeing-777-at-sc07/
  • 7. A Roadrunner “Triblade” node integrates Cell and Opteron blades
    – QS22 is a future IBM Cell blade containing two new enhanced double-precision (eDP/PowerXCell™) Cell chips
    – The expansion blade connects the two QS22s to the LS21 via four internal PCI-E x8 links and provides the node’s ConnectX IB 4X DDR cluster attachment
    – LS21 is an IBM dual-socket Opteron blade
    – 4-wide IBM BladeCenter packaging
    – Roadrunner Triblades are completely diskless and run from RAM disks, with NFS and Panasas attached only to the LS21
    Node design points:
      – One Cell chip per Opteron core
      – ~400 GF/s double precision and ~800 GF/s single precision
      – 16 GB Cell memory and 8 GB Opteron memory
    (Node diagram: two QS22 blades, each with two Cell eDP chips and 4 GB per chip; an LS21 with two AMD dual-core Opterons and 4 GB per socket; an expansion blade with HT2100 I/O hubs, dual PCI-E x8 flex-cables, and the IB 4x DDR link to the cluster.)
  • 8. Roadrunner is a hybrid Cell-accelerated 1.4 PF system of modest size, delivered in 2008
    – 18 Connected Unit (CU) clusters, 3,456 nodes
    – Each CU: 180 compute nodes with Cells, 12 I/O nodes, and a 288-port IB 4X DDR switch
    – Eight second-stage 288-port IB 4X DDR switches; 12 links per CU to each of the 8 switches
    – 12,960 Cell eDP chips ⇒ 1.3 PF, 52 TB
    – 6,912 dual-core Opterons ⇒ 50 TF, 28 TB
    – PCI-e attached Cell blades
    – 296 racks, 3.9 MW
  • 9. Roadrunner Entry Level System: 12-Hybrid-Node Cluster
    Hybrid compute node
      – 24 QS22s: a future IBM Cell blade containing two new enhanced double-precision IBM® PowerXCell™ 8i processors
      – 12 LS21s: an IBM dual-socket Opteron blade
      – Connected via four PCI-e x8 links
      – Includes a ConnectX IB 4X DDR cluster attachment
      – Compute node is diskless
    IBM x3655 I/O and management servers
    4-wide IBM BladeCenter packaging
    24-port IB 4X DDR switch and fabric
    RHEL & Fedora Linux
    IBM SDK 3.0 for Multicore Acceleration
    IBM xCAT cluster management
      – System-wide GigE network
    Performance:
                       Host    Cell    Total
      Peak (TF)        0.35    4.92    5.26
      Memory (GB)      96      192     288
    External I/O: 1.2 GB/s
  • 10. Cell and hybrid speedup results are promising
    – All comparisons are to a single Opteron core
    – Parallel behavior is unaffected, as will be shown in the scaling results
    – The Cell/hybrid SPaSM implementation does twice the work of the Opteron-only code
    – Milagro Cell-only results are preliminary
    – The first 3 columns are measured; the last column is projected

                             Cell only (kernels)    Hybrid (Opteron+Cell)
      Application  Type      CBE      eDP           CBE+IB     eDP+PCIe
      SPaSM        full app  3x       4.5x          2.5x       >4x
      VPIC         full app  9x       9x            6x         >7x
      Milagro      full app  5x       6.5x          5x         >6x
      Sweep3D      kernel    5x       9x            5x         >5x
    Courtesy John Turner, LANL
  • 11. These results were achieved with a relatively modest level of effort
                                      Lines of code         FY07
      Code     Class     Language     Orig.     Modified    FTEs
      VPIC     full app  C/C++        8.5k      10%         2
      SPaSM    full app  C            34k       20%         2
      Milagro  full app  C++          110k      30%         2 x 1
      Sweep3D  kernel    C            3.5k      50%         2 x 1
    – All staff started with little or no knowledge of Cell/hybrid programming
    – “2 x 1” denotes separate efforts of roughly 1 FTE each
    – Most efforts also added code
    Courtesy John Turner, LANL
  • 12. Where can we take Cell/B.E. next?
    Build bridges to facilitate code porting and code portability
      – E.g. compiler-managed instruction and data caches
      – Target is competitive chip-level efficiency without Cell-specific software
      – Still allows full Cell benefit with optimized libraries and tuning
      – E.g. Multicore (and) Acceleration software development toolkit
      – Allow a wider audience to write parallel codes for a node
      – Porting across a wide variety of systems
    Continue to enhance the Synergistic Processor Elements
      – Continue to increase application reach
      – Continue to measure ourselves on application performance/W and application performance/mm²
    Integrate the equivalent of a Roadrunner node on a chip
      – Leverage Power 7 technology
      – Allows a 10 PFlop system of reasonable size
      – Improved SPE-to-main-core latency and bandwidth
      – Improved cross-system latencies
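Slide 2 lists prefetching, a generalization of long vectors, as one answer to the memory wall. On an SPE this is commonly done with double buffering: the transfer for block i+1 is started before computing on block i, so transfer time hides behind compute. The sketch below is only a timing model of that idea, not Cell SDK code; the block count and the per-block DMA and compute times are hypothetical parameters.

```python
def serial_time(n_blocks: int, t_dma: float, t_compute: float) -> float:
    """Fetch, then compute, with no overlap: every block pays both costs."""
    return n_blocks * (t_dma + t_compute)


def double_buffered_time(n_blocks: int, t_dma: float, t_compute: float) -> float:
    """Two local-store buffers: block i+1 is in flight while block i is computed.

    Only the first fetch and the last compute are exposed; each of the
    n_blocks - 1 overlapped steps costs whichever of DMA or compute is slower.
    """
    if n_blocks <= 0:
        return 0.0
    return t_dma + (n_blocks - 1) * max(t_dma, t_compute) + t_compute


if __name__ == "__main__":
    # Hypothetical per-block costs, in arbitrary time units.
    n, t_dma, t_compute = 100, 1.0, 1.0
    print("serial:         ", serial_time(n, t_dma, t_compute))
    print("double-buffered:", double_buffered_time(n, t_dma, t_compute))
```

When transfer and compute take similar time, double buffering nearly halves the total (200 vs. 101 units here); when one side dominates, the total approaches that side's cost alone.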
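The 102.4 GF/s double-precision figure on slide 5 can be reproduced from per-SPE rates. The sketch assumes values that are not on the slide: a 3.2 GHz clock, one 2-wide double-precision fused multiply-add per SPE per cycle (4 flops), and one 4-wide single-precision FMA per cycle (8 flops), with the PPE not counted. It also divides the quoted 21-25 GB/s DDR2 bandwidth by the peak to get a bytes-per-flop balance.

```python
# Assumed PowerXCell 8i parameters; only the 102.4 GF/s result appears on slide 5.
CLOCK_GHZ = 3.2          # assumed SPE clock rate
SPES_PER_CHIP = 8        # the "8" in "(1+8)"
DP_FLOPS_PER_CYCLE = 4   # assumed: one 2-wide double-precision FMA per cycle
SP_FLOPS_PER_CYCLE = 8   # assumed: one 4-wide single-precision FMA per cycle


def chip_peak_gflops(flops_per_cycle: int) -> float:
    """Peak SPE throughput of one chip in GF/s; the PPE is not counted."""
    return SPES_PER_CHIP * flops_per_cycle * CLOCK_GHZ


def bytes_per_flop(bandwidth_gbs: float, peak_gflops: float) -> float:
    """Machine balance: bytes of memory traffic available per peak flop."""
    return bandwidth_gbs / peak_gflops


if __name__ == "__main__":
    dp = chip_peak_gflops(DP_FLOPS_PER_CYCLE)
    sp = chip_peak_gflops(SP_FLOPS_PER_CYCLE)
    print(f"DP peak: {dp:.1f} GF/s, SP peak: {sp:.1f} GF/s")
    for bw in (21.0, 25.0):  # DDR2 bandwidth range quoted on slide 5
        print(f"{bw:.0f} GB/s -> {bytes_per_flop(bw, dp):.3f} bytes per DP flop")
```

At roughly a quarter of a byte per double-precision flop, a kernel running at peak can load only one 8-byte double from memory for every 30 or so flops, which is the locality argument of slide 2 in numbers.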
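Slide 6 quotes one bandwidth per link of the Boeing 777 demo's memory hierarchy. A rough estimate of how long the 25 GB working set, or a single 256 KB local-store fill, would take to stream across each link shows how steep the hierarchy is. The sketch assumes the four bandwidths map in order onto the four arrows, uses decimal units for GB and MB, and ignores latency and protocol overhead.

```python
# Link bandwidths from slide 6, in bytes per second (decimal units assumed).
LINKS = [
    ("x86 disk -> x86 memory", 120e6),
    ("x86 memory -> Cell memory", 2e9),
    ("Cell memory -> SPE local store", 25e9),
    ("SPE local store -> SPE register file", 50e9),
]


def transfer_seconds(n_bytes: float, bytes_per_second: float) -> float:
    """Idealized streaming time: size divided by bandwidth, no latency term."""
    return n_bytes / bytes_per_second


if __name__ == "__main__":
    working_set = 25e9          # the 25 GB working set quoted on the slide
    local_store = 256 * 1024    # one SPE local store, 256 KB
    for name, bw in LINKS:
        ws = transfer_seconds(working_set, bw)
        ls_us = transfer_seconds(local_store, bw) * 1e6
        print(f"{name:38s} 25 GB: {ws:8.2f} s   256 KB: {ls_us:9.2f} us")
```

The ideal working-set times range from about 208 s on the disk link to 0.5 s at register-file bandwidth, a span of more than 400x, consistent with the demo's use of on-demand transfers rather than staging the whole model.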
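The Triblade design points on slide 7 (~400 GF/s DP, ~800 GF/s SP, 16 GB Cell memory, one Cell per Opteron core) follow from the blade counts. This sketch takes the 102.4 GF/s per-chip figure from slide 5 and the 4 GB per-chip and per-socket values from the node diagram; the single-precision rate of twice the DP rate is an assumption, not a slide figure.

```python
CELL_DP_GFLOPS = 102.4   # PowerXCell 8i peak DP, from slide 5
CELL_SP_GFLOPS = 204.8   # assumed: twice the DP rate (4-wide vs. 2-wide SIMD FMA)
CELL_MEMORY_GB = 4       # per-chip memory, from the node diagram's 4 GB labels
OPTERON_MEMORY_GB = 4    # per-socket memory, from the node diagram
QS22_PER_NODE = 2
CELLS_PER_QS22 = 2
OPTERON_SOCKETS = 2      # LS21 is a dual-socket blade
CORES_PER_OPTERON = 2    # dual-core Opterons


def triblade_summary() -> dict:
    """Roll per-chip figures up to one Triblade node."""
    cells = QS22_PER_NODE * CELLS_PER_QS22
    return {
        "cells": cells,
        "opteron_cores": OPTERON_SOCKETS * CORES_PER_OPTERON,
        "dp_gflops": cells * CELL_DP_GFLOPS,
        "sp_gflops": cells * CELL_SP_GFLOPS,
        "cell_memory_gb": cells * CELL_MEMORY_GB,
        "opteron_memory_gb": OPTERON_SOCKETS * OPTERON_MEMORY_GB,
    }


if __name__ == "__main__":
    for key, value in triblade_summary().items():
        print(f"{key:18s} {value}")
```

Four chips give 409.6 GF/s DP and 819.2 GF/s SP, which the slide rounds to ~400 and ~800; four chips against four Opteron cores is the "one Cell chip per Opteron core" design point.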
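The system totals on slide 8 can be cross-checked from the Connected Unit layout. The sketch assumes that I/O nodes carry a dual-socket LS21 but no Cells (this reproduces both the 12,960 Cell and 6,912 Opteron counts), that each Opteron socket has 4 GB and each Cell chip 4 GB, and that TB means 1000 GB. It also checks the port budget of one CU's 288-port switch.

```python
CUS = 18
COMPUTE_NODES_PER_CU = 180
IO_NODES_PER_CU = 12
CELLS_PER_COMPUTE_NODE = 4      # two QS22s x two chips each
OPTERONS_PER_NODE = 2           # assumed: every node, including I/O nodes, has an LS21
CELL_DP_GFLOPS = 102.4          # from slide 5
CELL_MEMORY_GB = 4              # assumed per-chip memory
OPTERON_MEMORY_GB = 4           # assumed per-socket memory
SECOND_STAGE_SWITCHES = 8
LINKS_PER_CU_PER_SWITCH = 12


def roadrunner_totals() -> dict:
    """Scale per-node counts up to the full 18-CU system."""
    compute_nodes = CUS * COMPUTE_NODES_PER_CU
    nodes = CUS * (COMPUTE_NODES_PER_CU + IO_NODES_PER_CU)
    cells = compute_nodes * CELLS_PER_COMPUTE_NODE
    opterons = nodes * OPTERONS_PER_NODE
    return {
        "nodes": nodes,
        "cells": cells,
        "opterons": opterons,
        "cell_pflops": cells * CELL_DP_GFLOPS / 1e6,
        "cell_memory_tb": cells * CELL_MEMORY_GB / 1000,
        "opteron_memory_tb": opterons * OPTERON_MEMORY_GB / 1000,
    }


def cu_switch_ports_used() -> int:
    """Ports on one CU switch: all of its nodes plus uplinks to the second stage."""
    uplinks = SECOND_STAGE_SWITCHES * LINKS_PER_CU_PER_SWITCH
    return COMPUTE_NODES_PER_CU + IO_NODES_PER_CU + uplinks


if __name__ == "__main__":
    for key, value in roadrunner_totals().items():
        print(f"{key:18s} {value}")
    print("CU switch ports used:", cu_switch_ports_used())
```

Under these assumptions the CU switch is exactly full (180 + 12 + 8 x 12 = 288 ports), and each second-stage switch uses 18 x 12 = 216 of its 288 ports.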
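Slide 12 proposes compiler-managed instruction and data caches so that code can run without Cell-specific changes. The sketch below illustrates the general idea of a software-managed, direct-mapped, read-only data cache: a miss explicitly copies a whole line into a small local buffer, standing in for the DMA into SPE local store. The class name, line size, and slot count are hypothetical; this is not the Cell SDK's software-cache interface, and a real implementation would also handle writes.

```python
class SoftwareCache:
    """Hypothetical direct-mapped, read-only software cache over a 'main memory' list."""

    def __init__(self, main_memory, line_words=16, n_lines=8):
        self.mem = main_memory
        self.line_words = line_words
        self.n_lines = n_lines
        self.tags = [None] * n_lines                       # memory line held by each slot
        self.data = [[0] * line_words for _ in range(n_lines)]
        self.hits = 0
        self.misses = 0

    def load(self, addr):
        if not 0 <= addr < len(self.mem):
            raise IndexError(addr)
        line = addr // self.line_words
        slot = line % self.n_lines                         # direct-mapped placement
        if self.tags[slot] != line:
            self.misses += 1
            start = line * self.line_words
            chunk = self.mem[start:start + self.line_words]
            self.data[slot][:len(chunk)] = chunk           # explicit line fill (the "DMA")
            self.tags[slot] = line
        else:
            self.hits += 1
        return self.data[slot][addr % self.line_words]


if __name__ == "__main__":
    cache = SoftwareCache(list(range(1000)))
    total = sum(cache.load(i) for i in range(256))
    print("sum:", total, "hits:", cache.hits, "misses:", cache.misses)
```

A sequential sweep over 256 words with 16-word lines misses once per line (16 misses, 240 hits); two addresses whose lines map to the same slot evict each other, which is the usual cost of direct mapping.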