Allan Cantle - 3/25/2021
a.cantle@nallasway.com
OMI - Open Memory Interface
The Missing Piece of a Disaggregated Modular, Flexible & Composable Computing World

What is Disaggregation?
An attempt to free up Stranded Resources for differing Workloads
Ethernet or PCIe with future support for CXL
Resources are allocated & managed with composer software, allocating appropriate resources to each virtual machine driven Workload
Disaggregating Local DDR Memory into Memory Pools has proven to be more challenging:
• Latency Critical
• RAS Issues
[Diagram: Rack Scale Disaggregation, with separate pools of A, C, M, S and IO, compared against a Classic Server with a Typical Converged Infrastructure. Label: Focus of this Webinar.]
Legend: A = Accelerator, C = Compute, M = Memory, S = Storage, IO = Input/Output
• Downsides of Rack Scale Disaggregation
  • Increased Power through more data movement
  • Increased stress on Network in order to enable software composability
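
To make the idea of stranded resources concrete, here is a minimal Python sketch. It assumes one workload per fixed-size converged server and counts what each server leaves idle. The `Resources` type, the server size and the three workload demands are all invented for illustration; none of them come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource bundle; units are arbitrary (cores, GB, TB)."""
    compute: int
    memory: int
    storage: int

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.compute + other.compute,
                         self.memory + other.memory,
                         self.storage + other.storage)

    def __sub__(self, other: "Resources") -> "Resources":
        return Resources(self.compute - other.compute,
                         self.memory - other.memory,
                         self.storage - other.storage)


def stranded_on_server(server: Resources, workload: Resources) -> Resources:
    """Resources left idle when one workload occupies one fixed-size server."""
    left = server - workload
    if min(left.compute, left.memory, left.storage) < 0:
        raise ValueError("workload does not fit on this server")
    return left


def total_stranded(server: Resources, workloads: list) -> Resources:
    """Sum the idle resources over all servers, one workload per server."""
    total = Resources(0, 0, 0)
    for workload in workloads:
        total = total + stranded_on_server(server, workload)
    return total


# Invented numbers, not taken from the slides.
SERVER = Resources(compute=32, memory=256, storage=8)
WORKLOADS = [
    Resources(compute=32, memory=16, storage=1),   # compute heavy
    Resources(compute=8, memory=128, storage=2),   # roughly balanced
    Resources(compute=2, memory=64, storage=8),    # storage heavy
]
STRANDED = total_stranded(SERVER, WORKLOADS)
print(STRANDED)
```

In a pooled design the same three workloads would draw only what they need from shared pools, so the idle total computed here is the capacity that disaggregation is trying to recover.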

What is Disaggregation? (continued)
[Diagram repeated: Rack Scale Disaggregation compared against a Classic Server with a Typical Converged Infrastructure, with the same legend. Labels: Focus of this Webinar; Memory Inception over OpenCAPI; Thymesis Flow.]

Why do Workloads differ in resource needs?
Differing Compute ratios - Ops : Bytes/sec : Bytes Capacity

Workload Type            Processor Centric    Balanced (Classic)    Data Centric
Compute Ratio            100 : 1 : 0.01       1 : 1 : 1             0.01 : 1 : 100
Architectural Example    [diagrams built from A, C, M, S and IO blocks; annotations: M = HBM, M = DRAM]
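
The three compute ratios above can be read as a simple classification rule. The sketch below normalises a workload's Ops : Bytes/sec : Bytes Capacity to a bandwidth of 1 and assigns it to one of the three workload types. The function names and the `skew` threshold of 10 are assumptions made for illustration; the slides give only the three example ratios, not a cut-off.

```python
def normalise(ops: float, bytes_per_sec: float, bytes_capacity: float):
    """Scale a ratio so that the Bytes/sec term equals 1."""
    if bytes_per_sec <= 0:
        raise ValueError("bytes_per_sec must be positive")
    return (ops / bytes_per_sec, 1.0, bytes_capacity / bytes_per_sec)


def classify(ops: float, bytes_per_sec: float, bytes_capacity: float,
             skew: float = 10.0) -> str:
    """Label a workload by its compute ratio.

    `skew` is a hypothetical threshold: a term must be at least `skew`
    times larger (or smaller) than bandwidth to count as dominant.
    """
    ops_ratio, _, cap_ratio = normalise(ops, bytes_per_sec, bytes_capacity)
    if ops_ratio >= skew and cap_ratio <= 1.0 / skew:
        return "Processor Centric"
    if ops_ratio <= 1.0 / skew and cap_ratio >= skew:
        return "Data Centric"
    # Anything that is not clearly skewed is treated as balanced here.
    return "Balanced (Classic)"


for ratio in [(100, 1, 0.01), (1, 1, 1), (0.01, 1, 100)]:
    print(ratio, "->", classify(*ratio))
```

Treating every unskewed workload as balanced is a simplification; a real sizing exercise would likely need finer-grained categories.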
If you Compose a Computer to Application A’s exact needs, you now have a “Domain Specific Architecture”.
Building Application A’s exact architecture requires Modularity, Flexibility & Composability EVERYWHERE.
[Diagram: Application A, made up of four algorithms with IO links between them]
Algo 1   100 : 1 : 0.01
Algo 2   1 : 1 : 1
Algo 3   0.01 : 1 : 100
Algo 4   100 : 1 : 0.01
Applications often have differing workload Algorithms running in parallel with differing IO bandwidth between them.
The Compute Ratio Hierarchically repeats itself in a Fractal like manner.

Disaggregation without the Downsides
Physically Modular, Flexible and Composable
[Diagram: Application A (Algo 1 100 : 1 : 0.01, Algo 2 1 : 1 : 1, Algo 3 0.01 : 1 : 100, Algo 4 100 : 1 : 0.01) with IO links between the algorithms]
• Lego Block construction of Compute Ratios at the system level
• Domain Specific Implementations could be quickly configured and tested
• Dense Modularity & Distributed Computing minimize data movement power
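
One way to picture the Lego Block idea is to treat each module as a block with its own Ops : Bytes/sec : Bytes Capacity contribution and to sum the blocks for a whole application. The sketch below is illustrative only: the block names are hypothetical, and the block values are the slide's three ratios scaled by 100 so that they stay integral.

```python
# Hypothetical building blocks as (ops, bytes_per_sec, bytes_capacity).
# Values are the slide's ratios multiplied by 100 to avoid fractions.
BLOCKS = {
    "processor_centric": (10000, 100, 1),   # 100 : 1 : 0.01
    "balanced": (100, 100, 100),            # 1 : 1 : 1
    "data_centric": (1, 100, 10000),        # 0.01 : 1 : 100
}


def compose(counts: dict) -> tuple:
    """Sum the contributions of the chosen blocks."""
    ops = bandwidth = capacity = 0
    for name, n in counts.items():
        block_ops, block_bw, block_cap = BLOCKS[name]
        ops += n * block_ops
        bandwidth += n * block_bw
        capacity += n * block_cap
    return ops, bandwidth, capacity


def as_ratio(totals: tuple) -> tuple:
    """Normalise totals so that the Bytes/sec term equals 1."""
    ops, bandwidth, capacity = totals
    return (ops / bandwidth, 1.0, capacity / bandwidth)


# Application A: Algo 1 and Algo 4 are processor centric,
# Algo 2 is balanced, Algo 3 is data centric.
APPLICATION_A = {"processor_centric": 2, "balanced": 1, "data_centric": 1}
TOTALS = compose(APPLICATION_A)
print(TOTALS, as_ratio(TOTALS))
```

The aggregate ratio for Application A looks nothing like any of its four algorithms, which is one reading of why the slides ask for composability at the level of each block instead of only at the level of the whole system.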

DDR: a Parallel Interface in a Serial World
Unsuitable for physically Composable Systems
• Network, Memory, Media modules & IO use Common Serial EDSFF Interconnect
[Diagram labels: OCP - NIC 3.0 (IO); SNIA - E1.S & E3.S (S); OMI DDIMM (M); Typically < 100W; DDR DIMM (M)]

OMI DDIMM Overview
OMI Offers far more than just Composability
• <10ns Latency Adder over a standard DDR4 RDIMM
• In production since mid 2019 from Samsung, Smart Modular & Micron
• Memory Technology Agnostic - e.g. Easy processor migration from DDR4 to DDR5
• Up to 8x more memory bandwidth per processor - (2x BW/DDIMM times 4x No. Channels)
• Ecosystem Enablement with FPGA Host and Buffer Bringup Platform
• Fully Open Sourced IP for Host Controller and OMI Memory Buffer (RTL in GitHub)
• Functional and Memory Traffic Generator IP for testing purposes
OMI DDIMM Formats: 16GB / 32GB - 1U High; 64GB - 2U High; 128GB - 3U High; 32GB 3200 DDR4 DIMM (reference)
[Figure labels: 8 Industry Standard DDR4 Channels; 8 OMI Channels (Equivalent Pin Comparison); Apollo - FPGA Host Controller Bringup Platform; Gemini - OMI DDIMM with FPGA Buffer]

The OMI Advantage
Memory Bandwidth AND Capacity at LOW Cost
[Figure: die photos, To Scale = 20pts : 1mm. Dies shown: IBM - POWER10 Die; AMD - EPYC Rome IO Die; NVidia - Ampere Die]
• 4x (2x2) DDR4 3200 DIMM Channels = 102GBytes/s
• 8x (8x1) OMI 25G DDIMM Channels = 400GB/s (label appears twice)
• 1x HBM2, 8x (8x1) Channels = 311GB/s
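
The DDR4 and OMI totals quoted above can be reproduced with simple arithmetic. The sketch below assumes a 64-bit (8-byte) data bus per DDR4 channel, which is standard. For OMI it assumes 8 lanes per channel at 25 Gbit/s per lane, with both directions counted. That OMI breakdown is an assumption chosen because it reproduces the 400GB/s figure; the slides do not spell it out. The function names are illustrative.

```python
def ddr4_channel_gbytes_per_sec(mega_transfers_per_sec: int,
                                bus_bytes: int = 8) -> float:
    """Peak bandwidth of one DDR4 channel in GB/s (64-bit data bus)."""
    return mega_transfers_per_sec * bus_bytes / 1000.0


def omi_channel_gbytes_per_sec(lane_gbits_per_sec: float = 25.0,
                               lanes: int = 8,
                               directions: int = 2) -> float:
    """Peak bandwidth of one OMI channel in GB/s.

    Assumption: 8 lanes per channel, and read plus write directions are
    both counted, which matches the 400GB/s total on the slide.
    """
    return lane_gbits_per_sec * lanes / 8.0 * directions


DDR4_TOTAL = 4 * ddr4_channel_gbytes_per_sec(3200)   # 4 channels of DDR4 3200
OMI_TOTAL = 8 * omi_channel_gbytes_per_sec()         # 8 channels of OMI 25G

print(f"DDR4: {DDR4_TOTAL:.1f} GB/s, OMI: {OMI_TOTAL:.1f} GB/s")
```

Under these assumptions each DDR4 3200 channel peaks at 25.6 GB/s and each OMI channel at 50 GB/s, giving the 102 GB/s and 400 GB/s totals shown on the slide.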

Fully Composable Compute Node Module
Leveraged from OCP’s OAM Module - nicknamed OAM-HPC
• FPGA Main Processor Example
• Supports many composable Memory / Storage / IO configurations
[Figure: OAM-HPC Module Top View with Xilinx VU37P or VU13P; OAM-HPC Module Bottom View with 22x 2C Connectors (EDSFF TA-1002 Interconnect) For Memory, Storage &/or IO; OAM-HPC Module Bottom View with many IO, Memory & Media Configurations]

IBM POWER 10 OAM-HPC Module
Leveraged from OCP’s OAM Module - nicknamed OAM-HPC
• Form Factor configurable for IBM POWER 10 Processor
• Already supports the composable OMI DDIMM
• Cabled IO for OpenCAPI, SMP, and/or PCIe
[Figure: OAM-HPC Module Top View with IBM POWER 10; OAM-HPC Module Bottom View with 22x 2C Connectors (EDSFF TA-1002 Interconnect) For Memory, Storage &/or IO; OAM-HPC Module Bottom View Populated with 16x OMI DDIMMs + Cabled IO]
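
A composable module of this kind implies a simple connector budget. The sketch below checks a proposed population against the 22 connectors named on the slide. It assumes each plugged module occupies exactly one 2C connector, which the slides do not state; the function name and the `omi_ddimm` key are hypothetical.

```python
TOTAL_CONNECTORS = 22  # from the slide: 22x 2C Connectors


def remaining_connectors(modules: dict, total: int = TOTAL_CONNECTORS) -> int:
    """Return the connectors left free after plugging in `modules`.

    Assumption: every module occupies exactly one connector.
    """
    if any(count < 0 for count in modules.values()):
        raise ValueError("module counts must not be negative")
    used = sum(modules.values())
    if used > total:
        raise ValueError(f"{used} modules do not fit in {total} connectors")
    return total - used


# The POWER 10 example: 16x OMI DDIMMs, with the rest free for cabled IO.
POWER10_EXAMPLE = {"omi_ddimm": 16}
FREE = remaining_connectors(POWER10_EXAMPLE)
print(f"{FREE} connectors left for storage or cabled IO")
```

Under that one-module-per-connector assumption, the 16 DDIMM population leaves 6 connectors for cabled IO or storage.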

Memory Pooling Example Configuration
Pluggable into OCP OAI Chassis

Thank you
Want to find out more about OMI or the OAM-HPC Module?
Please contact Allan Cantle at a.cantle@nallasway.com
