Christopher Celio, Krste Asanovic, David Patterson
BOOM is a work-in-progress. Results shown in the talk are preliminary and subject to change!
Tuesday, June 30, 15
UC Berkeley

Other Berkeley RISC-V Processors
§ Sodor Collection
- RV32I - tiny, educational, not-synthesizable
§ Z-scale
- RV32IM - micro-controller
§ Rocket
- RV64G - in-order, single-issue application core
§ BOOM
- RV64G - out-of-order, superscalar application core

Why OoO?
§ Great for ...
- tolerating variable latencies
- finding ILP in code (instruction-level parallelism)
- complex method for fine-grain data prefetching
- plays nicely with poor compilers and lazily written code
Performance!

OoO widely used in industry
§ Intel Xeon/i-series (10-100W)
§ ARM Cortex mobile chips (1W)
§ Intel Atom
§ Sun/Oracle Niagara UltraSPARC
§ PlayStation

Academic OoO Research
§ general lack of effort in academia to build, evaluate OoO designs
§ most research uses software simulators
- cannot produce area, power numbers
- hard to trust, verify results
- McPAT is calibrated against 90nm Niagara, 65nm Niagara 2, 65nm Xeon, and 180nm Alpha 21364
- very slow
§ Other Academic OoO RTL efforts...
- Illinois Verilog Model, Princeton Sharing Architecture, NCSU FabScalar (Alpha, PISA)
- other ISAs can be very challenging to implement fully
- rely on SW simulators for performance numbers
- hopefully RISC-V can make everybody's lives easier!

Design-space exploration
§ RISC-V ISA
§ Chisel HCL (hardware construction language)
§ Rocket-chip SoC generator

The RISC-V ISA is easy to implement!
§ relaxed memory model
§ accrued FP exception flags
§ no integer side-effects (e.g., condition codes)
§ no cmov or predication
§ no implicit register specifiers
- JAL requires explicit rd
§ rs1, rs2, rs3, rd always in same space
- allows decode, rename to proceed in parallel

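To make the "always in same space" point concrete, here is a minimal plain-Scala sketch (not Chisel, and not BOOM's decoder) that pulls the register specifiers out of a 32-bit instruction word. The bit positions are those of the base RISC-V instruction formats; the object and function names are illustrative only. Because the positions never depend on the opcode, the specifiers can be read before the instruction is fully decoded.

```scala
object RiscvFields {
  // Extract bits [hi:lo] (inclusive) of a 32-bit instruction word.
  // Only used here for 5-bit fields, so the mask never overflows.
  def bits(inst: Int, hi: Int, lo: Int): Int =
    (inst >>> lo) & ((1 << (hi - lo + 1)) - 1)

  // Register specifiers sit at the same bit positions in every format
  // that uses them, independent of the opcode.
  def rd(inst: Int): Int  = bits(inst, 11, 7)
  def rs1(inst: Int): Int = bits(inst, 19, 15)
  def rs2(inst: Int): Int = bits(inst, 24, 20)
  def rs3(inst: Int): Int = bits(inst, 31, 27)
}
```

For example, `add x3, x1, x2` encodes as `0x002081B3`, and the three extractors return 3, 1 and 2 for `rd`, `rs1` and `rs2` without looking at the opcode bits.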
The RISC-V ISA
§ BOOM supports "M" (mul/div/rem)
- imul can be either pipelined or unpipelined
§ BOOM supports "A"
- AMOs+LR/SC
§ BOOM supports "FD"
- single, double-precision floating point
- IEEE 754-2008 compliant FPU
- SP, DP FMA with hw support for subnormals
§ RV64G

Rocket-Chip SoC Generator
§ open-source
§ taped out 10 times by Berkeley
§ runs at 1.6 GHz in IBM 45nm
§ makes for a great library of processor components!

Supports Privileged ISA ("S"), Virtual Memory
§ boots Linux!
§ just released Privileged ISA v1.7
§ instant to update
- Privileged ISA nearly entirely isolated to Control/Status Register (CSR) File, TLBs
- updated git submodule pointers
- changed "tohost" to "mtohost" in one line

Chisel
§ Hardware Construction Language embedded in Scala
§ not a high-level synthesis language
§ hardware module is a data structure in Scala
§ Full power of Scala for writing generators
- object-oriented programming
- factory objects, traits, overloading
- functional programming
- higher-order functions, anonymous functions, currying
§ generated C++ simulator is 1:1 copy of Verilog designs

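The Scala features listed above can be illustrated without any hardware library. The following is a plain-Scala sketch, not Chisel code: `FuncUnit`, `Alu`, `Mul` and `replicate` are hypothetical names invented for this example. It shows a trait, a factory object, an anonymous function, and a curried higher-order function being combined to build a parameterized list of unit descriptions.

```scala
object GeneratorStyle {
  // Trait: common interface for a (hypothetical) functional-unit description.
  trait FuncUnit { def name: String; def latency: Int }
  case class Alu() extends FuncUnit { val name = "alu"; val latency = 1 }
  case class Mul(latency: Int) extends FuncUnit { val name = "imul" }

  // Factory object: picks a concrete variant from parameters.
  object FuncUnit {
    def apply(kind: String, stages: Int = 1): FuncUnit = kind match {
      case "alu"  => Alu()
      case "imul" => Mul(stages)
      case other  => throw new IllegalArgumentException(s"unknown unit: $other")
    }
  }

  // Curried higher-order function: fix the count first, then pass a builder.
  def replicate[T](n: Int)(build: Int => T): Seq[T] = Seq.tabulate(n)(build)

  // Two ALUs built with an anonymous function, plus a 3-stage multiplier.
  val units: Seq[FuncUnit] =
    replicate(2)(_ => FuncUnit("alu")) :+ FuncUnit("imul", stages = 3)

  // Derived parameter: the longest latency among the configured units.
  val maxLatency: Int = units.map(_.latency).max
}
```

In a real Chisel generator the builder would return hardware modules rather than case classes, but the composition style is the same.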
Chisel Hardware Construction Language
§ object-oriented, functional programming
§ powerful for writing hw generators
§ 12 days (+1092 loc) to add SP,DP floating point
§ 9 days (+900 loc) to go from no VM to booting Linux

BOOM
[Figure: BOOM pipeline. In-order front-half: Fetch, Decode & Rename. Out-of-order back-half: Issue Window, Unified Physical Register File, Functional Unit.]

BOOM
[Figure: BOOM pipeline. Fetch, Decode & Rename (Rename Map Tables & Freelist), Issue Window, Unified Physical Register File (PRF), ALU, FPU, ROB, Commit.]
§ PRF
- explicit renaming
- holds speculative and committed data
- holds both x-regs, f-regs
§ Unified Issue Window
- holds all instructions
§ split ROB/issue window design

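Explicit renaming with a map table and a freelist, as named on the slide, can be sketched in a few lines of plain Scala. This is a software model under simplifying assumptions, not BOOM's rename stage: one instruction is renamed at a time, there is no branch-misprediction recovery, and the hardwired zero register is not treated specially. `Renamer` and `Renamed` are hypothetical names.

```scala
import scala.collection.mutable

object RenameSketch {
  // Result of renaming one instruction: physical sources, the newly
  // allocated physical destination, and the destination's previous mapping.
  case class Renamed(prs1: Int, prs2: Int, pdst: Int, stalePdst: Int)

  class Renamer(numLogical: Int, numPhysical: Int) {
    require(numPhysical > numLogical)
    // Initially logical register i maps to physical register i.
    private val mapTable = Array.tabulate(numLogical)(identity)
    // The remaining physical registers start out free.
    private val freeList = mutable.Queue[Int]()
    freeList ++= (numLogical until numPhysical)

    // Returns None when no physical register is free (the pipeline would stall).
    def rename(rs1: Int, rs2: Int, rd: Int): Option[Renamed] =
      if (freeList.isEmpty) None
      else {
        val prs1 = mapTable(rs1) // read sources before updating the destination
        val prs2 = mapTable(rs2)
        val stale = mapTable(rd)
        val pdst = freeList.dequeue()
        mapTable(rd) = pdst
        Some(Renamed(prs1, prs2, pdst, stale))
      }

    // At commit the stale mapping can no longer be read, so it is recycled.
    def commit(r: Renamed): Unit = freeList.enqueue(r.stalePdst)
  }
}
```

Because both speculative and committed values live in the same physical register file, commit only has to return the stale register to the freelist; no data is copied.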
Parameterized Superscalar
[Figure: dual-issue (5r,3w) pipeline with bypassing. Stages: Issue Select, Regfile Read, bypass network, execution units (ALU, ALU, FPU, imul), Regfile Writeback.]

    import scala.collection.mutable.ArrayBuffer

    val exe_units = ArrayBuffer[ExecutionUnit]()
    // First unit: ALU configured as the branch unit, with FPU and multiplier.
    exe_units += Module(new ALUExeUnit(is_branch_unit = true
                                      , has_fpu = true
                                      , has_mul = true
                                      ))
    // Second unit: ALU plus memory unit (FP memory support) and divider.
    exe_units += Module(new ALUMemExeUnit(fp_mem_support = true
                                         , has_div = true
                                         ))

[Figure: front-end and branch prediction. Labels: I$, Fetch Buffer, BHT, Target, BPD, ExeBrTarget.]

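One way to read the "(5r,3w)" label is that the register-file port count is derived from the list of execution units rather than fixed by hand. The plain-Scala sketch below illustrates that idea only. The per-unit port counts are assumptions chosen so that the totals match the slide's label (three read ports for a unit with an FMA-capable FPU, two write ports for a unit assumed to return both ALU and load results); they are not taken from BOOM's source, and `ExeUnitParams` is a hypothetical type.

```scala
object ExeUnitPorts {
  // Hypothetical description of one execution unit's register-file needs.
  case class ExeUnitParams(name: String, readPorts: Int, writePorts: Int)

  // ASSUMED port counts, picked to be consistent with the "(5r,3w)" label.
  val dualIssue: Seq[ExeUnitParams] = Seq(
    ExeUnitParams("ALU + FPU + imul", readPorts = 3, writePorts = 1),
    ExeUnitParams("ALU + mem + div", readPorts = 2, writePorts = 2)
  )

  // Total (read, write) ports the register file must provide.
  def regfilePorts(units: Seq[ExeUnitParams]): (Int, Int) =
    (units.map(_.readPorts).sum, units.map(_.writePorts).sum)
}
```

Adding or removing an entry in the list changes the derived totals automatically, which is the property a parameterized generator relies on.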
Load/Store Unit
§ load/store queue with store ordering
- loads execute fully out-of-order wrt stores, other loads
- store-data forwarded to loads as required
§ non-blocking data cache

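Store-data forwarding can be sketched as a search of the older stores for the youngest one that matches the load address. The plain-Scala model below is a simplification and not BOOM's load/store unit: it assumes every older store already has a known address, and that loads and stores access whole words at identical addresses (no partial overlap). `StoreEntry` and `forward` are hypothetical names.

```scala
object StoreForwarding {
  // A store waiting in the store queue; the queue is ordered oldest first.
  case class StoreEntry(addr: Long, data: Long)

  // Returns the data of the youngest older store to the same address, if any.
  // None means the load has to read the data cache instead.
  def forward(olderStores: Seq[StoreEntry], loadAddr: Long): Option[Long] =
    olderStores.reverseIterator.find(_.addr == loadAddr).map(_.data)
}
```

Searching from the youngest entry backwards matters: if two older stores wrote the same address, the load must observe the later one.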
Synthesizable
§ Runs on FPGA
- (Zynq ZedBoard and Zynq ZC706)
§ 2GHz (30 FO4) in TSMC 45nm
- speed of logic (SRAM is slower)
[Figure: BOOM floorplan, 1.7 mm² @ 45nm. Blocks: I$ (32k), D$ (32k), LLC Data, Exe, Regfile, Ren, Issue, bpd, Uncore.]

Feature Summary
Feature          BOOM
Synthesizable    √
FPGA             √
Parameterized    √
floating point   √
AMOs+LR/SC       √
caches           √
VM               √
Boots Linux      √
Multi-core       √
lines of code    9k + 11k

That's BOOM!
[Figure: quad-issue (9r,4w) pipeline with bypassing. Stages: Issue Select, Regfile Read, bypass network, execution units (ALU, ALU, FPU, imul, ALU, div, Agen, LSU, D$), Regfile Writeback.]

Comparison against ARM
Category      ARM Cortex-A9        RISC-V BOOM-2w
Performance   3.59 CoreMarks/MHz   3.91 CoreMarks/MHz (+9%)
note:
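The +9% figure follows directly from the two CoreMarks/MHz values in the table. As a quick check, a plain-Scala sketch (the object and function names are illustrative only):

```scala
object CoreMarkGain {
  // Relative gain of `ours` over `baseline`, in percent.
  def percentGain(baseline: Double, ours: Double): Double =
    (ours / baseline - 1.0) * 100.0

  // Values quoted on the slide, in CoreMarks/MHz: Cortex-A9 vs. BOOM-2w.
  val boomOverA9: Double = percentGain(3.59, 3.91) // roughly 8.9, shown as +9%
}
```

Since both numbers are normalized per MHz, the comparison says nothing about achievable clock frequency; it only compares work done per cycle on CoreMark.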
[Figure: bar chart grouping out-of-order processors and in-order processors, y-axis from 0 to 3.00; bars include MIPS 74k, ARM Cortex cores, and Rocket.]
preliminary results

Industry Comparisons
Processor                  Core Area        CoreMark/MHz   Freq (MHz)   IPC
Intel Xeon E5 2668 (Ivy)   ~12 mm² @ 22nm   5.60           3,300        1.96
48x
preliminary results

Ivy Bridge Tile Comparison
[Figure: floorplan comparison. Ivy Bridge-EP Tile (32kB/32kB + 256kB caches), ~12 mm² @ 22nm, next to BOOM-2w Chip (32kB/32kB + 256kB caches), 1.7 mm² @ 45nm. BOOM blocks: I$ (32k), D$ (32k), LLC Data (256k), ROB, Rename, Issue, Regfile, Exe, bpd, Uncore.]
preliminary results

Conclusion
§ BOOM supports full RV64G + privileged ISA (VM support)
§ Able to boot Linux and run CoreMark, SPECINT, and Dhrystone benchmarks
§ BOOM is 9,000 loc and 3 person-years of work
§ Future Work
- bring up more interesting applications
- add RoCC interface
- explore new µarch designs
- tape-out this fall
- open-source by winter workshop

Questions?

Funding Acknowledgements
§ Research partially funded by DARPA Award Number HR0011-12-2-0016, the Center for Future Architecture Research, a member of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA, and ASPIRE Lab industrial sponsors and affiliates Intel, Google, Huawei, Nokia, NVIDIA, Oracle, and Samsung.
§ Approved for public release; distribution is unlimited. The content of this presentation does not necessarily reflect the position or the policy of the US government and no official endorsement should be inferred.
§ Any opinions, findings, conclusions, or recommendations in this paper are solely those of the authors and do not necessarily reflect the position or the policy of the sponsors.