Ventana HotChips23 - Final
August 2023
Company confidential
Veyron V1 Target Markets RISC-V Performance Leader
Veyron V1: Server Class RISC-V IP + Chiplets
• Full architectural support to run virtualized workloads
• RAS protection of all caches / functional RAMs, with end-to-end data poisoning and background cache scrubbing
[Figure: system block diagram. Labels: DIMM Memory, DDR, Coherent Interconnect, L4 Cache, PCIe / CXL / Ethernet, System Level, 4-6x]
Veyron V1 Arch/Uarch Overview
RISC-V Architecture Support
Core Microarchitecture Highlights
• Decode, dispatch, issue, execute, and commit all operate in terms of “ops”
(fused and unfused)
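The idea of a pipeline that works on "ops" rather than raw instructions can be illustrated with a minimal sketch. The fusion rule used here (a LUI followed by an ADDI on the same register) is a hypothetical example chosen for illustration; the slides do not say which instruction pairs the V1 fuses. The names Instr, Op, can_fuse, and to_ops are likewise invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    opcode: str
    rd: str
    rs1: str = ""
    imm: int = 0


@dataclass(frozen=True)
class Op:
    """Unit handled by the back end: one instruction (unfused) or two (fused)."""
    instrs: tuple

    @property
    def fused(self) -> bool:
        return len(self.instrs) > 1


def can_fuse(a: Instr, b: Instr) -> bool:
    # Hypothetical rule, not taken from the slides: LUI followed by an ADDI
    # that both reads and overwrites the LUI destination register.
    return (a.opcode == "lui" and b.opcode == "addi"
            and b.rs1 == a.rd and b.rd == a.rd)


def to_ops(instrs):
    """Greedily pair adjacent fusible instructions; everything else is unfused."""
    ops, i = [], 0
    while i < len(instrs):
        if i + 1 < len(instrs) and can_fuse(instrs[i], instrs[i + 1]):
            ops.append(Op((instrs[i], instrs[i + 1])))
            i += 2
        else:
            ops.append(Op((instrs[i],)))
            i += 1
    return ops


program = [
    Instr("lui", "a0", imm=0x12345),
    Instr("addi", "a0", "a0", 0x678),
    Instr("add", "a1", "a0"),
]
ops = to_ops(program)
```

In this sketch the three instructions become two ops, so every later stage tracks two entries instead of three.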
V1 CPU Pipelines
Restart Pipe: RPS, RP1, RP2, RP3
Predict Pipe: PNI, PRS, PR1, PR2, PR3
Fetch Pipe: QFS, QX1, QX2, QT1, QT2, QD1, QD2
Decode Pipe: DPD, DXE, DRN, DDS
CST
Predict, Fetch, and Decode Units
Load/Store Unit
• Can execute any mix of up to four loads and/or stores per cycle
• Closely-coupled L1/L2 data cache hierarchy for low latency
• DL1
  • 64 KB virtual cache (VIVT)
  • Four-cycle load-to-use latency
  • Large single-level DTLB accessed on cache misses (on the way to DL2)
  • Hardware synonym handling: multiple read-only synonyms can be co-resident
  • Hardware coherent, based on inclusion with respect to DL2
  • Hardware TLB consistent with respect to TLB invalidates
• 512 KB DL2
  • Pipelined 64B-wide fills into DL1
• Hardware data prefetchers
  • Next line, sequential, strided, and multi-stride patterns
  • Prefetch next line from DL2 into DL1
  • Prefetch much farther ahead from L3/DRAM into DL2 as staging
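The two-level prefetch scheme in the bullets above (a near prefetch into DL1, a far prefetch staged into DL2) can be sketched as a toy stride detector. This is an illustration under stated assumptions, not the V1 design: the 64-byte line size is inferred from the 64B-wide fills, and the per-stream table, the two-in-a-row confidence rule, the far_distance look-ahead, and all names are hypothetical.

```python
from dataclasses import dataclass

LINE_BYTES = 64  # assumption: 64-byte cache lines, suggested by the 64B-wide fills


@dataclass
class StrideEntry:
    last_line: int           # last cache-line index seen for this stream
    stride: int = 0          # most recently observed stride, in lines
    confident: bool = False  # True once the same nonzero stride repeats


class StridePrefetcher:
    """Toy model: near prefetch into DL1, far prefetch staged into DL2."""

    def __init__(self, far_distance=8):
        self.far_distance = far_distance  # hypothetical look-ahead, in strides
        self.table = {}                   # stream id -> StrideEntry

    def access(self, stream_id, addr):
        """Return the cache-line indices to prefetch into each level."""
        line = addr // LINE_BYTES
        entry = self.table.get(stream_id)
        if entry is None:
            self.table[stream_id] = StrideEntry(last_line=line)
            # No history yet: fall back to a next-line prefetch into DL1.
            return {"dl1": [line + 1], "dl2": []}
        stride = line - entry.last_line
        entry.confident = stride != 0 and stride == entry.stride
        entry.stride = stride
        entry.last_line = line
        if not entry.confident:
            return {"dl1": [line + 1], "dl2": []}
        # Confirmed stride: fetch one stride ahead into DL1 and stage a line
        # much farther ahead into DL2.
        return {"dl1": [line + stride],
                "dl2": [line + stride * self.far_distance]}


pf = StridePrefetcher(far_distance=8)
first = pf.access(0, 0)     # line 0, no history
second = pf.access(0, 128)  # line 2, stride 2 seen once
third = pf.access(0, 256)   # line 4, stride 2 confirmed
```

Staging far-ahead lines in the larger DL2 lets the small DL1 hold only lines that are about to be used, which is the division of labor the bullets describe.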
Processor Cluster Highlights
Processor Cluster Highlights (cont.)
Veyron V1: World’s First Server Class RISC-V Processor
Disruptive ROI: Highest Single Socket Performance at Compelling Perf/Watt/$
Veyron V1 Reference Implementation PPA
• TSMC 5nm
• Standard TSMC 5nm metal stack
• Width scales linearly with tiled dual-core + L3 slices
• Highly portable design across processes and foundries
Thank You