17 HC2024 Tesla TTPoE v5
Ideal Fabric:
• Lowest latency
• Highest bandwidth
• Simple Software
OSI Layer | Example Protocols (TCP/IP) | TCP/IP Implementation | Example Protocols (Dojo) | Dojo Implementation
Layer 7, Application | HTTP, Telnet, FTP | | PyTorch, Dojotorch |
Layer 4, Transport | TCP, UDP | | TTP | Hardware
Layer 3, Network (optional in Dojo) | IPv4/IPv6 | | IPv4/IPv6 (Optional) | Hardware
Layer 2, Data Link | Ethernet Frames, MAC addresses, VLAN | Hardware | Ethernet Frames, MAC addresses, VLAN | Hardware
Layer 1, Physical | Data Encoding, Physical Specs | Hardware | Data Encoding, Physical Specs | Hardware
[Figure: two timing diagrams of traffic between TTP Device A and TTP Device B, with time on the vertical axis. Left: clean TTP transfer example. Right: NACK TTP transfer example, in which TTP_PAYLOAD, ID=3 is either lost or out of order.]
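The NACK example can be illustrated with a minimal receiver-side sketch. It assumes payload IDs increase by one per packet and that the receiver answers a gap with a NACK naming the ID it still expects. The class name `TtpReceiver`, the method `on_payload`, and the reply labels `"TTP_ACK"` and `"TTP_NACK"` are hypothetical names chosen for illustration and are not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class TtpReceiver:
    """Illustrative in-order receiver; not the actual TTP hardware logic."""
    expected_id: int = 1                      # next payload ID we will accept
    delivered: list = field(default_factory=list)

    def on_payload(self, packet_id, data):
        """Return a (reply, id) tuple for one received TTP_PAYLOAD."""
        if packet_id == self.expected_id:
            # In-order payload: deliver it and acknowledge.
            self.delivered.append(data)
            self.expected_id += 1
            return ("TTP_ACK", packet_id)
        if packet_id < self.expected_id:
            # Duplicate of something already delivered: acknowledge again.
            return ("TTP_ACK", packet_id)
        # Gap: an earlier payload was lost or arrived out of order.
        # Ask the sender to resume from the ID we are still waiting for.
        return ("TTP_NACK", self.expected_id)


rx = TtpReceiver()
replies = [rx.on_payload(i, f"data{i}") for i in (1, 2, 4)]  # ID=3 is missing
replies += [rx.on_payload(i, f"data{i}") for i in (3, 4)]    # sender replays
```

In this sketch the out-of-order payload with ID=4 is dropped and requested again, so the sender only needs to replay from the NACKed ID. Whether the real hardware buffers out-of-order payloads is not stated in the slides.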
[Figure: TTP connection state machine.
States: CLOSED, OPEN SENT, OPEN RECD, OPEN, CLOSE SENT, CLOSE RECD.
Transition labels: OPEN (TX), OPEN (RX), OPEN_ACK (RX), OPEN_ACK (TX), OPEN_NACK (RX), OPEN_NACK (TX), Timeout / Resend OPEN (TX), CLOSE (TX), CLOSE (RX), CLOSE_ACK (RX), CLOSE_ACK (TX), CLOSE_NACK (RX), !CLOSE (RX) / (TX), HW CONSTRAINED & !idle timer & !victim.]
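The open/close handshake named by the state-machine labels (CLOSED, OPEN SENT, OPEN RECD, OPEN, CLOSE SENT, CLOSE RECD) can be written down as a transition table. The sketch below is one plausible reading of those labels, not the definitive state machine: the source and destination of each arc are assumptions, and both the CLOSE_NACK (RX) arc and the "HW CONSTRAINED & !idle timer & !victim" guard are left out because their endpoints cannot be recovered from the text. The `State` enum, the `TRANSITIONS` table, and `step` are hypothetical names.

```python
from enum import Enum, auto


class State(Enum):
    CLOSED = auto()
    OPEN_SENT = auto()
    OPEN_RECD = auto()
    OPEN = auto()
    CLOSE_SENT = auto()
    CLOSE_RECD = auto()


# (current state, event) -> next state. Arc endpoints are assumed.
TRANSITIONS = {
    (State.CLOSED, "OPEN (TX)"): State.OPEN_SENT,
    (State.CLOSED, "OPEN (RX)"): State.OPEN_RECD,
    (State.OPEN_SENT, "OPEN_ACK (RX)"): State.OPEN,
    (State.OPEN_SENT, "OPEN_NACK (RX)"): State.CLOSED,
    (State.OPEN_SENT, "TIMEOUT"): State.OPEN_SENT,   # resend OPEN, keep waiting
    (State.OPEN_RECD, "OPEN_ACK (TX)"): State.OPEN,
    (State.OPEN_RECD, "OPEN_NACK (TX)"): State.CLOSED,
    (State.OPEN, "CLOSE (TX)"): State.CLOSE_SENT,
    (State.OPEN, "CLOSE (RX)"): State.CLOSE_RECD,
    (State.CLOSE_SENT, "CLOSE_ACK (RX)"): State.CLOSED,
    (State.CLOSE_RECD, "CLOSE_ACK (TX)"): State.CLOSED,
}


def step(state, event):
    """Advance one event; events with no listed arc leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


# Initiator side of a clean open followed by a clean close.
state = State.CLOSED
for event in ("OPEN (TX)", "OPEN_ACK (RX)", "CLOSE (TX)", "CLOSE_ACK (RX)"):
    state = step(state, event)
```

Keeping the arcs in a dictionary makes it straightforward to compare this reading against the actual diagram and correct any individual transition.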
The Transport Layer hardware is an IP block that sits between a NOC and a standard Ethernet MAC
• Translates and coalesces 64B/cycle NOC packets into TTP Ethernet packets of up to 1kB
• Speaks AXI-S or SOP/EOP formats
• Optionally activates standard MAC features: pause packets, counters, stats, LLDP
• IP block instantiated in both FPGA and silicon implementations
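The coalescing step can be sketched in software to show the arithmetic: at 64 bytes per NOC packet, up to 16 NOC packets fit in one 1 kB TTP payload. This is a minimal sketch under stated assumptions. It takes "1kB" to mean 1024 bytes and ignores headers, flush timers, and any other policy the hardware applies. The names `coalesce`, `NOC_BYTES_PER_CYCLE`, and `MAX_TTP_PAYLOAD` are hypothetical.

```python
NOC_BYTES_PER_CYCLE = 64   # from the slide: 64B/cycle NOC packets
MAX_TTP_PAYLOAD = 1024     # assumption: "1kB" interpreted as 1024 bytes


def coalesce(noc_packets, max_payload=MAX_TTP_PAYLOAD):
    """Group consecutive NOC packets into payloads of at most max_payload bytes."""
    frames = []
    current = bytearray()
    for pkt in noc_packets:
        if len(pkt) > NOC_BYTES_PER_CYCLE:
            raise ValueError("NOC packet exceeds 64 bytes")
        # Close the current payload if this packet would overflow it.
        if current and len(current) + len(pkt) > max_payload:
            frames.append(bytes(current))
            current = bytearray()
        current += pkt
    if current:
        frames.append(bytes(current))  # flush the final partial payload
    return frames


# 40 NOC packets of 64 bytes each carry 2560 bytes in total.
frames = coalesce([bytes(64)] * 40)
sizes = [len(f) for f in frames]
```

Here `sizes` comes out as two full 1024-byte payloads followed by one 512-byte payload.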
[Figure: block diagram of the TTP MAC. NOC RX and NOC TX connect to the TTP MAC, which connects over AXI-S to standard Ethernet IP (IEEE 802.3): Ethernet MAC, MII, PCS/FEC, 64/66B, PMA/PHY, then RX serdes[3:0] and TX serdes[3:0].]
Feature | Spec
Ethernet Speed | 100Gbps QSFP
PCI-e | Gen3 x16
 | 8x GB DDR4
[Figure labels: Clocks, Reset, Power, Debug, CSR/Perfmons.]
[Figure: system diagram. Labels: 440 MB SRAM; Variable Ingest (Forward Pass); 2 Tbps; 100 Gbps TTP; 1 Tbps TTP; PCIe Gen3; TTP Network; Remote NIC; Host NIC; Mojo; Host; DIP; To Other Partitions.]
• 4x ExaFLOP BF16/FP16 Cluster
• 40 PB Local Storage
• 40,960 Main Host Cores
• 61,440 Mojo Host Cores
• 320 Tbps TTP All-Reduce I/O (endpoint)
• 128 Tbps TTP Ingest I/O (endpoint)
• 208 Tbps TCP/IP (endpoint)
• Converged and non-Converged network experiments
[Figure: EVPN/VXLAN leaf-switch topology, LEAF-1 through LEAF-8, with leaf groups labeled "TTP + TCP/IP Converged", "TTP Only", and "TCP/IP Only". Link bandwidth labels, each tagged TTP or TCP/IP: 80 Tbps, 16 Tbps, 32 Tbps, 36 Tbps, 36 Tbps, 80 Tbps, 16 Tbps, 32 Tbps.]
https://ultraethernet.org/
https://ultraethernet.org/wp-content/uploads/sites/20/2023/10/23.07.12-UEC-1.0-Overview-FINAL-WITH-LOGO.pdf
Tesla has achieved Exa-scale with a lossy fabric, executing real training runs deployed in FSD
Tesla is joining the UEC and offering the TTPoE protocol publicly
Thanks to the