The energy-efficient, high performance 64bit
DOME µserver - an industry inclination point
Ronald P. Luijten – Data Motion Architect
lui@zurich.ibm.com
IBM Research - Zurich
12 June 2015
DISCLAIMER: This presentation is entirely Ronald’s view and not necessarily that of IBM.
COMPUTE is FREE – DATA is NOT
DOME:
• Public-private partnership (PPP): ASTRON, IBM, Dutch government
• EUR 20 million government funding over 5 years
• Started February 2012
Ronald P. Luijten – Analyst Meeting – 12 June 2015 3
© 2012 IBM Corporation
SKA (Square Kilometre Array) to measure the Big Bang
Picture source: NZZ, March 2014
[Figure: timeline of the universe. Events: Big Bang; inflation; protons created; start of nucleosynthesis through fusion; end of nucleosynthesis; modern universe. Time axis: 0, 10^-32 s, 10^-6 s, 0.01 s, 3 min, 380,000 years, 13.8 billion years.]
DOME µServer Motivation & Objectives
• Create the world's highest-density 64-bit µServer drawer
– Useful to evaluate both SKA radio-astronomy and IBM future business
– Platform for Business Analytics appliance pre-product research
– High energy efficiency / very low cost
– Commodity components, HW + SW standards based
– Leverage ‘free computing’ paradigm
– Enhance with ‘Value Add’: packaging, system integration, …
– Density and speed of light
• Most efficient cooling using IBM technology (ref: SuperMUC, June 2012 TOP500 machine)
• Must be true 64-bit to enable business applications
• Must run a server-class OS (SLES11 or RHEL6, or equivalent)
– Precluded ARM (64-bit silicon was not available)
– PPC64 has been available in an SoC from FSL since 2011
– (no $$$ to build a new SoC…)
• This is the DOME project capability demonstrator, not a product
Definition
µServer:
The integration of an entire server node motherboard* into a single microchip, except DRAM, NOR boot flash and power conversion logic.
* no graphics
This does NOT imply low performance!
[Figure labels: 305 mm x 245 mm; 139 mm x 55 mm]
T4240 Chip Overview
12 cores, fully dual-threaded
1.8 GHz ppc64 (e6500)
12 DP-FPU; 12 128b AltiVec
3 DDR3 channels at 1.866 GT/s
3x 0.5 MB L3 cache
4x 10GbE + 2x SATA
PCIe 3.0
HW packet acceleration
RegEx pattern match acceleration
Crypto acceleration
28nm TSMC bulk CMOS
239 mm², ~1.7B transistors
111 Mbit SRAM, 6M flip-flops
7 power states (2 with power gating)
DOME compute node board diagram
[Diagram: T4240 SoC with three 16 GB, 72-bit DRAM channels at 1866 MT/s each; PSoC with USB, JTAG, serial and I2C; 1 Gbit SPI flash; power converter (labels: 1V / 40A, 12V / 2.5A); I/O: 4 x 10 GbE, PCIe x8, 2 x SATA.]
The PSoC collapses 6 functions into a small chip to save area, power and cost:
1. On/off and power-up sequencing
2. Provide µServer boot configuration
3. JTAG debug access
4. Serial port access (Linux)
5. Temperature monitoring and protection, and current measurement
6. Management interface and control
DOME Compute node board form factor
[Photos: a standard 240-pin DDR3 memory DIMM board (133 mm x 30 mm) next to the DOME compute node boards (139 mm x 55 mm): generation 1 with the P5020/P5040 SoC and generation 2 with the T4240 SoC. Labels: FRONT; BACK; decoupling capacitors area; T4240 SoC (lid removed).]
Planned System: 2U rack unit
19” 2U Chassis w/ Combined Cooling & Power
128 compute node boards
1536 cores / 3072 Threads
6 TB DRAM
1.28Tbps Ethernet (@40Gbps)
Datacenter-in-a-box
• Expected 2U unit total power: ~6 kW
• Integrated mains power converter to 12V distribution: 12V / 500A
• Each compute node has its own 12V / 40W converter
• Common power converter boards for all other supplies
• High-radix 10GbE / 40GbE switch boards (under construction)
• Connects to mains, rack-level water, 32x 40Gbps Ethernet
• Hot-water cooled for efficiency and density
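The aggregate figures on this slide follow from the per-node numbers given earlier in the deck. A minimal sketch of that arithmetic, assuming three 16 GB DRAM channels per node (from the board diagram) and a binary TB; the variable names are illustrative only:

```python
# Per-node figures as stated on the T4240 and board-diagram slides.
NODES = 128
CORES_PER_NODE = 12
THREADS_PER_CORE = 2
DRAM_GB_PER_NODE = 3 * 16        # three 16 GB DRAM channels per node
UPLINKS = 32                     # 32x 40 Gbps Ethernet
UPLINK_GBPS = 40
BUS_VOLTS, BUS_AMPS = 12, 500    # 12 V distribution rating

cores = NODES * CORES_PER_NODE
threads = cores * THREADS_PER_CORE
dram_tb = NODES * DRAM_GB_PER_NODE / 1024    # assumption: 1 TB = 1024 GB
ethernet_tbps = UPLINKS * UPLINK_GBPS / 1000
bus_kw = BUS_VOLTS * BUS_AMPS / 1000         # matches the ~6 kW estimate

print(cores, threads, dram_tb, ethernet_tbps, bus_kw)
```

The 12 V / 500 A distribution rating works out to the same 6 kW as the expected total unit power.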
Node Cooling Design & Validation

Cooling variant              Inlet water [°C]   Junction temp Tj [°C]   Measured thermal res. [K/W]   Maximum cooling capacity [W]
OF R240 Cu, no heat pipe     45                 85                      1.11                          36
OF R240 Cu with heat pipe    45                 85                      0.85                          47
OF R240 Cu with heat pipe    45                 75                      0.85                          36

[Figure labels: electrical + thermal interface; water in; compute nodes; 3-layer laminated copper plate; FR4 carrier; SoC; power converter boards; storage boards. Photo: CeBIT demo, April 15.]
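The cooling capacities in the table are roughly what a simple steady-state model gives: the allowed temperature rise from inlet water to junction, divided by the measured thermal resistance. The sketch below assumes that model; the slide does not state how its capacity column was derived, and the function name is hypothetical.

```python
def max_cooling_watts(t_junction_c, t_inlet_c, r_th_k_per_w):
    """Heat (W) that can flow for a given temperature rise and thermal resistance."""
    return (t_junction_c - t_inlet_c) / r_th_k_per_w

# (variant, inlet water [C], junction temp [C], thermal resistance [K/W])
variants = [
    ("OF R240 Cu, no heat pipe", 45, 85, 1.11),
    ("OF R240 Cu with heat pipe", 45, 85, 0.85),
    ("OF R240 Cu with heat pipe", 45, 75, 0.85),
]

for name, t_in, t_j, r_th in variants:
    print(f"{name}: {max_cooling_watts(t_j, t_in, r_th):.1f} W")
```

Under this assumption the first two rows come out at about 36 W and 47 W, as on the slide. The third comes out at about 35 W against the 36 W stated, so the slide may use different rounding or a slightly different model.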
Performance Measurement Results
CPU2006 Benchmark

                          Freescale T4240                 Intel Xeon E3-1230L v3
CPU                       12 cores; 24 threads            4 cores; 8 threads
Process                   28nm bulk                       22nm FinFET
Test environment
  System                  T4240RDB-PB                     Supermicro X10SAE
  Core clock              1.666 GHz                       1.8 GHz; Turbo disabled
  DRAM                    1.866 GT/s, 6 GB, 3 channels    1.666 GT/s, 8 GB, 2 channels
  OS                      Fedora 20, Kernel 3.12.19       Fedora 19, Kernel 3.13.9
  Compiler                GCC 4.7.2                       GCC 4.8.2
  gcc options             -O3 -mcpu=powerpc64             -O3 -march=native -mtune=native
CINT-base, 1 thread       6.86                            20.7
CINT-base, all threads    109.34 (24 threads)             77.6 (8 threads)
CoreMark, all threads     188K (24 threads)               65K (8 threads)
40% more performance at 70% of the node-level energy consumption: 2x more operations per Watt
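The "40% more" and "2x" figures can be re-derived from the all-threads CINT-base scores and the stated 70% energy figure. A small sketch of that arithmetic, with the benchmark values copied from the slide and illustrative variable names:

```python
# All-threads CINT-base scores from the slide.
t4240_cint_all = 109.34
xeon_cint_all = 77.6
energy_fraction = 0.70   # T4240 node energy relative to the Xeon node (from the slide)

speedup = t4240_cint_all / xeon_cint_all
ops_per_watt_ratio = speedup / energy_fraction

# Other ratios visible in the same table.
single_thread_ratio = 20.7 / 6.86   # Xeon over T4240, one thread
coremark_ratio = 188 / 65           # T4240 over Xeon, all threads

print(f"throughput ratio:   {speedup:.2f}")
print(f"ops per Watt ratio: {ops_per_watt_ratio:.2f}")
print(f"single-thread Xeon advantage: {single_thread_ratio:.2f}")
print(f"CoreMark ratio:     {coremark_ratio:.2f}")
```

The same table shows the trade-off behind the result: the Xeon is about three times faster on a single thread, while the T4240 wins on aggregate throughput.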
Power Measurement Results
Power measurement on rev 1 board #5, on 7 + 8 April 2015; PSoC firmware 2-Mar-15
Current measurements at 12V input of power converters, T4240 temp < 65 °C

Voltage domain, current measured @ 12V input:

Condition                        1V8 I/O            DRAM               1V0 core           Total node
                                 mA      W          mA     W           A        W         W
PSoC only power                  3.4     0.0408     74     0.888       0.0008   0.0096    0.9384
T4240 power on, kept in reset    75      0.9        152    1.824       0.32     3.84      6.564
u-boot prompt (idle)             77.6    0.9312     350    4.2         1.48     17.76     22.8912
Linux prompt, idle system        77.6    0.9312     315    3.78        1.58     18.96     23.6712
BW_MEM512M, 24 thr               77.3    0.9276     450    5.4         1.65     19.8      26.1276
stream, 24 thread                77.3    0.9276     470    5.64        1.65     19.8      26.3676
BW_MEM512, 24 thr                77.7    0.9324     320    3.84        2.53     30.36     35.1324
idle at XFCE desktop             77.7    0.9324     320    3.84        1.6      19.2      23.9724
SpecInt PerlBench, 24 thr        77.8    0.9336     400    4.8         2.63     31.56     37.2936
SpecInt PerlBench, 12 thr        78      0.936      355    4.26        2.2      26.4      31.596
SpecInt gcc, 12 thr              78      0.936      416    4.992       1.7      20.4      26.328
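Each wattage in the table is the measured input current multiplied by the 12 V input voltage, and the node total is the sum over the three voltage domains. A minimal sketch that recomputes three of the rows; the function name is hypothetical and the currents are copied from the table (two columns in mA, one in A):

```python
INPUT_VOLTS = 12.0   # all currents are measured at the 12 V converter inputs

def node_power_watts(col1_ma, col2_ma, col3_a):
    """Total node power from the two mA columns and the A column of the table."""
    total_amps = col1_ma / 1000 + col2_ma / 1000 + col3_a
    return INPUT_VOLTS * total_amps

# condition -> (first domain [mA], second domain [mA], third domain [A])
rows = {
    "PSoC only power": (3.4, 74, 0.0008),
    "Linux prompt, idle system": (77.6, 315, 1.58),
    "SpecInt PerlBench, 24 thr": (77.8, 400, 2.63),
}

for condition, currents in rows.items():
    print(f"{condition}: {node_power_watts(*currents):.4f} W")
```

The recomputed totals agree with the "total node" column, from under 1 W with only the PSoC powered to about 37 W under the 24-thread PerlBench load.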
Remarks
New Big-Data metric: memory BW density
• Use the raw memory BW available at the SoC or CPU
• Divide by the volume of the entire enclosure, incl. HDD, PCI slots
• DOME 128-node 2U rack unit: 159 GB/s/liter (peak)
• P8 server S822L (dual socket): 13.9 GB/s/liter (peak)
• New era: perfect storm and Innovator's Dilemma
• µServer is all about SoC and packaging
• This is a serendipitous data point
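The memory BW density metric can be sketched numerically. The example below assumes 8 data bytes per transfer on each DDR3 channel (the 64 data bits of the 72-bit channel, treating the remaining bits as ECC) and derives the enclosure volume implied by the stated 159 GB/s/liter. The slides do not give that volume, so it is an inferred value, not a measured one.

```python
CHANNELS_PER_NODE = 3
TRANSFERS_PER_SECOND = 1.866e9   # 1866 MT/s per channel, from the board diagram
BYTES_PER_TRANSFER = 8           # assumption: 64 data bits per 72-bit channel
NODES = 128
DOME_DENSITY = 159.0             # GB/s per liter, stated on the slide
S822L_DENSITY = 13.9             # GB/s per liter, stated on the slide

# Peak raw memory bandwidth per node and per 2U unit, in GB/s.
node_bw_gbs = CHANNELS_PER_NODE * TRANSFERS_PER_SECOND * BYTES_PER_TRANSFER / 1e9
unit_bw_gbs = NODES * node_bw_gbs

# Enclosure volume implied by the stated density (inferred, not from the slides).
implied_volume_liters = unit_bw_gbs / DOME_DENSITY
density_advantage = DOME_DENSITY / S822L_DENSITY

print(f"per node: {node_bw_gbs:.1f} GB/s, per unit: {unit_bw_gbs:.0f} GB/s")
print(f"implied enclosure volume: {implied_volume_liters:.1f} liters")
print(f"density advantage over S822L: {density_advantage:.1f}x")
```

Under these assumptions the stated density corresponds to an enclosure of roughly 36 liters and an advantage of a little over 11x against the S822L figure.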
Status and Plans
Until YE 2015
2016: a new compute node
Beyond 2016
LIVE DEMO
We demonstrate a single node running:
• Fedora 20
• XFCE Desktop
• Stream
• CPMD
• And… live 1V domain current measurement
We show a revision-1 board T4240ZMS compute server:
• Larger than DOME form factor, same netlist
• All components on top side (saves bring-up time and expense)
• Air-cooled for single node operation
[Figure label: compute node mini baseboard]
DEMO SETUP
[Photo labels: T4240ZMS rev 1 board; single node baseboard; GbE; SATA (mSATA slot, SATA data connector); various power supplies; USB; uSD card; mSATA; PHY; power supplies.]
DEMO SETUP
[Diagram (R. Luijten, 8 Jan 2015). T4240ZMS node (revision 1 board, slower speeds, less memory): T4240 with 24 HW threads at 1625 MHz; three 8 GB DRAM channels at 1500 MT/s; PSoC with USB, JTAG, serial and I2C; SPI flash; power converter; DIMM connector; 1GbE through an 88E1111 PHY. Network labels: 192.168.1.240; 192.168.1.152 NFS server; 192.168.1.1 DHCP / TFTP server. Consoles: management console (current on 1V supply); serial console; VNC client into ZMS showing virtual desktop; browser with DB2 / WMD.]
Links
SKA: http://www.skatelescope.org
DOME: http://www.dome-exascale.nl
µServer: http://www.zurich.ibm.com/microserver
T4240 system: http://swissdutch.ch:6999
Wikipedia: https://en.wikipedia.org/wiki/Microserver
Twitter: https://twitter.com/ronaldgadget
Videos:
Impossible µServer: http://t.co/4vEkEVEazO
Innovator's Dilemma: http://youtu.be/imweQe8NgnI
DOME T4240 Fedora: http://youtu.be/D6da5DqcyQk
Acknowledgements
This work is the result of the efforts of many people:
• Peter v. Ackeren, FSL
• Ed Swarthout, FSL Austin
• Dac Pham, FSL Austin
• Yvonne Chan, IBM Toronto
• Andreas Doering, IBM ZRL
• Alessandro Curioni, IBM ZRL
• Stephan Paredes, IBM ZRL
• Matteo Cossale, IBM ZRL
• James Nigel, FSL
• Boris Bialek, IBM Toronto
• Marco de Vos, Astron NL
• Vipin Patel, IBM Fishkill
• And many more remain unnamed….
Companies: FSL Austin, Belgium & Germany; IBM worldwide; Transfer - NL
Literature
“Energy-Efficient Microserver Based on a 12-Core 1.8GHz 188K-CoreMark 28nm Bulk CMOS 64b SoC for Big-Data Applications with 159GB/s/L Memory Bandwidth System Density”, R. Luijten et al., ISSCC 2015, San Francisco, Feb 2015
“The DOME embedded 64 bit microserver demonstrator”, R. Luijten and A. Doering, ICICDT 2013, Pavia, Italy, May 2013
“Quantitative Analysis of the Berkeley Dwarfs' Parallelism and Data Movement Properties”, Victoria Caparros Cabezas, Phillip Stanley-Marbell, ACM CF 2011, May 2011
“Performance, Power, and Thermal Analysis of Low-Power Processors for Scale-Out Systems”, Phillip Stanley-Marbell, Victoria Caparros Cabezas, IEEE HPPAC 2011, May 2011
“Pinned to the Walls—Impact of Packaging and Application Properties on the Memory and Power Walls”, Phillip Stanley-Marbell, Victoria Caparros Cabezas, Ronald P. Luijten, IEEE ISLPED 2011, Aug 2011
Questions???
PS. I like lightweight things
µServer website: www.swissdutch.ch

IBM/ASTRON DOME 64-bit Hot Water Cooled Microserver
