SuperPod reference architecture


System within one node
• GPU:NIC = 8:8
• IB NIC: 200 Gb/s each
• PCIe Gen4: 64 GB/s
• NVLink: 2 x 6 = 12 links per GPU, 400 GB/s
• 6 NVSwitches

Scalable Unit (SU)
• 1 SU = 20 nodes
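
The per-node figures above compose into simple aggregates. The sketch below is a back-of-the-envelope calculation using only the values on this slide; the constant names are illustrative and not taken from any NVIDIA tooling.

# Back-of-the-envelope aggregates from the per-node figures above
# (8 GPUs, 8 IB NICs at 200 Gb/s, 400 GB/s NVLink per GPU, 20 nodes per SU).
# Constant names are illustrative only.

GPUS_PER_NODE = 8
NICS_PER_NODE = 8               # GPU:NIC = 8:8, one HCA per GPU
IB_NIC_GBITS = 200              # HDR InfiniBand, Gb/s per NIC
PCIE_GEN4_GBYTES = 64           # GB/s per x16 link
NVLINK_GBYTES_PER_GPU = 400     # 2 x 6 = 12 links through 6 NVSwitches
NODES_PER_SU = 20               # Scalable Unit

node_ib_gbits = NICS_PER_NODE * IB_NIC_GBITS                 # 1600 Gb/s leaving the node
node_ib_gbytes = node_ib_gbits / 8                           # ~200 GB/s
node_nvlink_gbytes = GPUS_PER_NODE * NVLINK_GBYTES_PER_GPU   # 3200 GB/s inside the node

print(f"Per-node IB:     {node_ib_gbits} Gb/s (~{node_ib_gbytes:.0f} GB/s)")
print(f"Per-node NVLink: {node_nvlink_gbytes} GB/s")
print(f"GPUs per SU:     {NODES_PER_SU * GPUS_PER_NODE}")

The gap between intra-node NVLink and inter-node InfiniBand bandwidth is what motivates keeping the most communication-heavy parallelism inside a node, as the Megatron results later in this deck illustrate.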

A800 Computing Network Architecture with HDR IB (200 Gb/s)

[Figure: compute network topology for the 140-node and 80-node configurations]

For more info, please refer to the SuperPOD reference architecture whitepaper.
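
Both example deployments are whole multiples of the 20-node SU; a quick check, using only the figures on these slides:

# Both example deployments are whole multiples of the 20-node Scalable Unit.
NODES_PER_SU = 20
GPUS_PER_NODE = 8

for nodes in (140, 80):
    sus, remainder = divmod(nodes, NODES_PER_SU)
    assert remainder == 0, "expected a whole number of SUs"
    print(f"{nodes} nodes = {sus} SUs = {nodes * GPUS_PER_NODE} GPUs")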
MLPERF TRAINING 0.7
NVIDIA Selene with DGX A100 (40 GB)
Tested in 2020 Q3.
• Pure data parallelism.
• Up to 43% perf gain for MLPerf BERT training with 8 HCAs vs. 1 HCA, at 128-node scale.
• Up to 30% perf gain for MLPerf RN50 (ResNet-50) training with 8 HCAs vs. 1 HCA, at 230-node scale.
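
The gains above come from giving each GPU its own path to the fabric. With NCCL, the set of InfiniBand adapters to use is typically selected through the NCCL_IB_HCA environment variable; the sketch below is a minimal illustration of that, where the mlx5_* device names and the torchrun-style LOCAL_RANK variable are assumptions about the launch environment, not details taken from the MLPerf runs.

# Minimal sketch: expose all eight HCAs to NCCL before initializing the
# process group. The mlx5_* names are placeholders (check ibstat for the
# real adapter names); this is illustrative, not the exact configuration
# used for the MLPerf runs on Selene.
import os

import torch
import torch.distributed as dist

# Comma-separated list of IB adapters NCCL may use (one per GPU here).
os.environ.setdefault("NCCL_IB_HCA", ",".join(f"mlx5_{i}" for i in range(8)))

# Allow GPUDirect RDMA as long as GPU and NIC share at most a host bridge.
os.environ.setdefault("NCCL_NET_GDR_LEVEL", "PHB")

def init_distributed() -> None:
    # env:// rendezvous assumes RANK, WORLD_SIZE, MASTER_ADDR/PORT are set
    # by the launcher (e.g. torchrun), which also sets LOCAL_RANK.
    dist.init_process_group(backend="nccl", init_method="env://")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

if __name__ == "__main__":
    init_distributed()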

OPTIMIZED IMPLEMENTATION
7.5B Model, 32 Nodes
7.5B model trained with Megatron on 32 nodes.
• 32 nodes: TPS=4, PPS=1, DPS=64 (tensor, pipeline, and data parallel sizes).
• Forward-compute and backward-compute include the all-reduce within each tensor model parallel group.
• AllReduce time within each data parallel group is significant.
• 1 HCA and 2 HCAs: extremely poor, bounded by communication performance; some GPUs have to cross the SMP interconnect to reach an HCA, giving bad GPUDirect RDMA (GDR) performance.
• 8 HCAs vs. 4 HCAs: 6.6% improvement, mainly from faster AllReduce of gradients within the data parallel group.

[Chart: elapsed time per step (ms) for 1, 2, 4, and 8 HCAs, broken down into Forward-compute, Backward-compute, All reduce (Data P), and Optimizer; global_batch_size=512]
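
The DPS=64 above follows from the world size divided by the product of the tensor and pipeline parallel sizes (32 nodes x 8 GPUs = 256; 256 / (4 x 1) = 64). The sketch below works through that arithmetic and shows a gradient AllReduce over a data parallel group; it mirrors the idea behind Megatron's process groups but is a simplified illustration, not Megatron's actual implementation.

# Parallel-size arithmetic for this slide, plus the per-step gradient
# all-reduce that shows up as the "All reduce (Data P)" bucket in the chart.
# Simplified illustration; not Megatron's actual group-construction code.
import torch
import torch.distributed as dist

NODES = 32
GPUS_PER_NODE = 8
WORLD_SIZE = NODES * GPUS_PER_NODE         # 256 GPUs

TPS = 4                                    # tensor model parallel size
PPS = 1                                    # pipeline parallel size
DPS = WORLD_SIZE // (TPS * PPS)            # data parallel size -> 64
assert TPS * PPS * DPS == WORLD_SIZE

def allreduce_gradients(model: torch.nn.Module, dp_group: dist.ProcessGroup) -> None:
    """Average gradients across one data parallel group after backward()."""
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM, group=dp_group)
            param.grad /= DPS

Because TPS=4 fits inside an 8-GPU node, the tensor-parallel all-reduce can stay on NVLink, while this data-parallel all-reduce crosses the InfiniBand fabric, which is consistent with why the number of HCAs per node matters so much in the chart above.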
