14 25-15 00-RISCVMemoryModelTutorial
Tutorial
Dan Lustig
May 7, 2018
ABOUT ME
OUTLINE
• Setting the Stage
• Litmus Tests
• RISC-V Weak Memory Ordering (“RVWMO”)
• Extensions: “Zam” and “Ztso”
• Documentation and Tools
• Conclude
WHAT IS A MEMORY CONSISTENCY MODEL?
WHY DO WE NEED A MEMORY MODEL?
The other parts of the contract are defined by the rest of the ISA specification
(including the ISA Formal Specification; see that TG’s tutorial later today)
A WIDE RANGE OF MEMORY MODELS
[Figure: spectrum of memory models plotted against performance (axis label "Low" at one end). Models shown, in order: Sequential Consistency, Total Store Ordering (TSO), RISC-V (RVWMO), IBM Power.]
Note: diagram obviously not to scale, just a rough picture :)
SOME CASES ARE EASY (RELATIVELY)…
Initial condition on both harts: s0 == address x; s1 == address y.
Initial conditions in memory: all locations initialized to 0
Hart 0:
    li t1, 1
    sw t1, 0(s0)
    fence w,w
    sw t1, 0(s1)
Hart 1:
loop:
    lw a0, 0(s1)
    beqz a0, loop
    fence r,r
    lw a1, 0(s0)
Final output: what are the possible final values of a0 and a1 on hart 1?
Only possible outcome is a0 == a1 == 1
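A minimal Python sketch of why a0 == a1 == 1 is the only outcome. It assumes that `fence w,w` keeps Hart 0's two stores in order and `fence r,r` keeps Hart 1's two loads in order, so an exhaustive search over in-order interleavings can stand in for the full RVWMO model. The state encoding and the name `explore` are illustrative and not from the slides.

```python
def explore():
    """Search every interleaving of the two harts, each running in program order."""
    start = (0, 0, 0, 0, None, None)  # (pc0, pc1, x, y, a0, a1)
    seen, stack, finals = {start}, [start], set()
    while stack:
        pc0, pc1, x, y, a0, a1 = stack.pop()
        if pc0 == 2 and pc1 == 2:          # both harts have finished
            finals.add((a0, a1))
            continue
        successors = []
        if pc0 == 0:                       # sw t1, 0(s0): x = 1
            successors.append((1, pc1, 1, y, a0, a1))
        elif pc0 == 1:                     # sw t1, 0(s1): y = 1 (after fence w,w)
            successors.append((2, pc1, x, 1, a0, a1))
        if pc1 == 0:                       # lw a0, 0(s1); beqz a0, loop
            successors.append((pc0, 1 if y else 0, x, y, y, a1))
        elif pc1 == 1:                     # lw a1, 0(s0) (after fence r,r)
            successors.append((pc0, 2, x, y, a0, x))
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return finals
```

Under these assumptions `explore()` returns the single outcome (1, 1): Hart 1 leaves the loop only after it sees y == 1, and by then the store to x is already in memory.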
SOME CASES ARE HARD…
• Should this outcome be permitted or forbidden? We’re
not even sure ourselves…
ARCHITECTURE VS. MICROARCHITECTURE
OPERATIONAL VS. AXIOMATIC
In modern practice, at ISA level, two common modeling approaches:
Axiomatic: define a set of criteria (“axioms”) to be satisfied
• Executions permitted unless they fail one or more axioms
Operational: define a golden abstract machine model
• Executions forbidden unless producible when executing this model
Axiomatic:
1. There is a total order on all memory operations. The order is non-deterministic.
2. That total order respects program order.
3. Loads return the value written by the latest store to the same address in the total order.
Operational:
1. Harts take turns executing instructions. The order is non-deterministic.
2. Each hart executes its own instructions in order.
3. Loads return the value written by the most recent preceding store to the same address.
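The operational rules can be run directly. A minimal sketch, assuming a two-hart "store buffering" test that is not from the slides: each hart stores 1 to one location and then loads the other. The names `interleavings`, `run_sc`, `HART0`, and `HART1` are illustrative.

```python
HART0 = [("store", "x", 1), ("load", "y", "r0")]
HART1 = [("store", "y", 1), ("load", "x", "r1")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each hart's program order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def run_sc(order):
    """Execute one interleaving against a single shared memory."""
    mem, regs = {"x": 0, "y": 0}, {}
    for op, loc, arg in order:
        if op == "store":
            mem[loc] = arg          # store: arg is the value written
        else:
            regs[arg] = mem[loc]    # load: arg is the destination register
    return regs["r0"], regs["r1"]

def sc_outcomes():
    return {run_sc(order) for order in interleavings(HART0, HART1)}
```

`sc_outcomes()` evaluates to {(0, 1), (1, 0), (1, 1)}. The outcome r0 == r1 == 0 never appears, because whichever store comes first in the total order is visible to the other hart's load.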
SEQUENTIAL CONSISTENCY [LAMPORT ‘79]
Each axiomatic/operational rule pair corresponds to one component of the model:
1. Total order on all memory operations / harts take turns executing: Global memory order
2. Total order respects program order / each hart executes its own instructions in order: Preserved Program Order (PPO)
3. Loads return the value written by the latest store to the same address: Load Value Axiom
GLOBAL MEMORY ORDER
TOTAL STORE ORDERING (SPARC, X86, RVTSO)
Axiomatic:
1. There is a total order on all memory operations. The order is non-deterministic.
2. That total order respects program order, except Store→Load ordering.
3. Loads return the value written by the latest store to the same address in program or memory order (whichever is later).
Operational:
1. Harts take turns executing steps. The order is non-deterministic.
2. Each hart executes its own instructions in order.
3. Stores execute in two steps: 1) enter store buffer, 2) drain to memory.
4. Loads first try to forward from the store buffer. If that fails, they return the value written by the most recent preceding store to the same address.
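A minimal sketch of the operational TSO machine, assuming one FIFO store buffer per hart and a "store buffering" test that is not from the slides (each hart stores 1 to one location, then loads the other). The state encoding and the names `tso_outcomes`, `HART0`, and `HART1` are illustrative.

```python
HART0 = [("store", "x", 1), ("load", "y", "r0")]
HART1 = [("store", "y", 1), ("load", "x", "r1")]

def tso_outcomes(harts, locations=("x", "y")):
    """Explore every execution; return the set of final register values."""
    init = (
        tuple(0 for _ in harts),               # program counter per hart
        tuple(() for _ in harts),              # FIFO store buffer per hart
        tuple((loc, 0) for loc in locations),  # shared memory
        (),                                    # (register, value) pairs so far
    )
    seen, stack, finals = {init}, [init], set()
    while stack:
        pcs, bufs, mem, regs = stack.pop()
        if all(pc == len(prog) for pc, prog in zip(pcs, harts)) and not any(bufs):
            finals.add(tuple(val for _, val in regs))  # sorted by register name
            continue
        successors = []
        for i, prog in enumerate(harts):
            if bufs[i]:  # step: drain the oldest buffered store to memory
                (loc, val), rest = bufs[i][0], bufs[i][1:]
                new_mem = tuple((l, val if l == loc else v) for l, v in mem)
                successors.append((pcs, bufs[:i] + (rest,) + bufs[i + 1:], new_mem, regs))
            if pcs[i] < len(prog):  # step: execute the next instruction in order
                op, loc, arg = prog[pcs[i]]
                new_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                if op == "store":  # stores enter the hart's own buffer first
                    new_buf = bufs[i] + ((loc, arg),)
                    successors.append((new_pcs, bufs[:i] + (new_buf,) + bufs[i + 1:], mem, regs))
                else:  # loads forward from the buffer, else read memory
                    forwarded = [v for l, v in bufs[i] if l == loc]
                    val = forwarded[-1] if forwarded else dict(mem)[loc]
                    successors.append((new_pcs, bufs, mem, tuple(sorted(regs + ((arg, val),)))))
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return finals
```

`tso_outcomes([HART0, HART1])` contains (0, 0) in addition to the three sequentially consistent outcomes: both loads can read memory while both stores are still sitting in their store buffers, which is the relaxed Store→Load ordering.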
ADDING A STORE BUFFER
If a load bypasses a store in the
(FIFO) store buffer, then the
load appears before the store in
global memory order
The load determines its return
value before the store becomes
globally visible
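[Figure: overview of the preserved program order (PPO) rules, each drawn as an edge from an earlier operation A to a later operation B. Edges shown:
• Overlap: A → Store B
• Overlap: Load A → Load B, except “rsw” and “fri;rfi”
• Overlap: AMO/SC A → Load B
• Fence between A and B, with pr/pw/sr/sw set appropriately
• A .aq → B; A → B .rl; A .rl → B .aq (RCsc)
• LR A → SC B
• Addr/ctrl/data dep. from A to B, except ctrl deps. where B is a load
• “(addr|data);rfi” or “addr;po”]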
PPO RULE 1
PPO RULE 3
A load B cannot determine its return value by forwarding from an Atomic Memory Operation or Store-Conditional operation that has not yet become globally visible
[Diagram: A (AMO/SC) → B (Load), overlapping addresses]
PPO RULE 4
fence [r][w][i][o], [r][w][i][o]
Orders operations in the predecessor set before
operations in the successor set
PR: previous reads. SR: subsequent reads
[Diagram: A → Fence → B]
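A minimal sketch of how the predecessor and successor sets select which pairs a fence orders. It assumes each operation is tagged with one letter (r = memory read, w = memory write, i = device input, o = device output); the function name `fence_orders` is illustrative.

```python
def fence_orders(pred, succ, a_kind, b_kind):
    """True if `fence pred,succ` placed between A and B in program order
    orders A before B. Kinds are single letters: r, w, i, or o."""
    return a_kind in pred and b_kind in succ

# fence w,w orders an earlier store before a later store ...
stores_ordered = fence_orders("w", "w", "w", "w")
# ... but says nothing about an earlier load and a later store.
load_store_ordered = fence_orders("w", "w", "r", "w")
# fence rw,rw orders every pair of ordinary memory accesses.
full_fence = all(fence_orders("rw", "rw", a, b) for a in "rw" for b in "rw")
```

This sketch covers only the set-membership check; it does not model how a fence interacts with the other PPO rules.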
PPO RULES 5-7
AMOs and LR/SC have optional acquire and release annotations for release consistency
• All operations following an acquire in program order also follow it in global memory order
• All operations preceding a release in program order also precede it in global memory order
[Diagram: A (.aq) → B; A → B (.rl); A (.rl) → B (.aq)]
PPO RULES 9-11
If B has a syntactic address, control, or data dependency on A, then A precedes B in global memory order
• Except control dependencies where B is a load
• Address dependency: the result of A is used to determine the address accessed by B
• Control dependency: the result of A feeds a branch that determines whether B is executed at all
• Data dependency: the result of A is used to determine the value written by store B
Note: ordering maintained regardless of actual values!
[Diagram: A → B via addr/ctrl/data dep., except ctrl deps. where B is a load]
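"Syntactic" means the dependency is tracked through register names, not through the values that flow. A minimal sketch of address-dependency tracking, assuming a hypothetical instruction encoding `(op, dst, srcs, addr_reg)` invented for this example; it ignores control and data dependencies and the special case of register x0.

```python
def address_dependencies(program):
    """Return (i, j) pairs where the address register of instruction j
    syntactically depends on the result of load i."""
    taint = {}    # register -> indices of loads whose results flow into it
    deps = set()
    for j, (op, dst, srcs, addr) in enumerate(program):
        if addr is not None:
            deps |= {(i, j) for i in taint.get(addr, set())}
        if op == "load":
            taint[dst] = {j}
        elif dst is not None:
            taint[dst] = set().union(*(taint.get(s, set()) for s in srcs))
    return deps

PROGRAM = [
    ("load", "a0", (), "s0"),            # A: lw  a0, 0(s0)
    ("alu", "t0", ("a0", "a0"), None),   #    xor t0, a0, a0  (always zero)
    ("alu", "t1", ("s1", "t0"), None),   #    add t1, s1, t0
    ("load", "a1", (), "t1"),            # B: lw  a1, 0(t1)
]
```

`address_dependencies(PROGRAM)` is {(0, 3)}: B still has an address dependency on A even though the xor always produces zero, so A precedes B in global memory order.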
PPO RULES 12-13
1. B follows M in program order, and M has an address dependency on A
2. B returns a value from an earlier store M in the same hart, and M has an address or data dependency on A
[Diagram: A → B via “addr;po” or “(addr|data);rfi”]
Most processors will maintain these naturally, yet most programmers won’t ever use them anyway
We made them explicit rules so that the operational and axiomatic models all agree
• And also for Linux, which has similar rules too
PPO RULE 8
[Diagram: A (LR) → B (SC)]
(Mostly redundant with rules 1 and 11, except in rare cases of mismatched addresses and no data dependency)
PPO RULE 2
Same-address load-load ordering is also maintained, with two exceptions:
1. Both return values come from the same store
• A form of architecturally-visible speculation
• Common in many implementations
2. B forwards from a store M between A and B in program order
• B can determine its value from the store buffer while A is still fetching an older value from memory
[Diagram: A (Load) → B (Load), overlapping addresses, except “rsw” and “fri;rfi”]
ATOMICITY OF AMO AND LR/SC
AMOs grab an old value in memory, perform an arithmetic operation (except for
swap), and write the new value to memory, all in one single atomic operation
• One node in the global memory order
LR grabs a reservation. SC performs a store if the reservation is still valid, and then
releases the reservation.
• A reservation can be killed for any reason. A reservation must be killed if there is
a store to the reserved address range from any other hart.
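A minimal sketch of the reservation rules, assuming a toy memory that tracks one reserved address per hart (real hardware reserves an address range and may also kill reservations spuriously). The class `ToyMemory` and its method names are illustrative.

```python
class ToyMemory:
    def __init__(self):
        self.data = {}
        self.reservation = {}   # hart id -> reserved address

    def lr(self, hart, addr):
        """Load-reserved: read the value and grab a reservation."""
        self.reservation[hart] = addr
        return self.data.get(addr, 0)

    def store(self, hart, addr, value):
        """A store kills every other hart's reservation on this address."""
        self.data[addr] = value
        for other, reserved in list(self.reservation.items()):
            if other != hart and reserved == addr:
                del self.reservation[other]

    def sc(self, hart, addr, value):
        """Store-conditional: store only if the reservation is still valid,
        then release it. Returns 0 on success and nonzero on failure."""
        valid = self.reservation.pop(hart, None) == addr
        if valid:
            self.store(hart, addr, value)
        return 0 if valid else 1

mem = ToyMemory()
old = mem.lr(0, "x")                 # hart 0 reserves x
mem.store(1, "x", 5)                 # hart 1 stores to x, killing the reservation
first_try = mem.sc(0, "x", old + 1)  # fails; memory keeps hart 1's value
old = mem.lr(0, "x")                 # hart 0 retries
second_try = mem.sc(0, "x", old + 1) # succeeds
```

The retry pattern mirrors how LR/SC loops are normally written: a failed SC means another hart intervened, so the hart reloads and tries again.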
PROGRESS AXIOM
…AND THAT’S IT!
MEMORY MODEL ISA EXTENSIONS
ONGOING/FUTURE WORK
• Mixed-size, partially-overlapping memory accesses
• Formalize instruction fetches and FENCE.I, TLB flushes and SFENCE.VMA, etc.
• Integration with other extensions (V, J, N, T, …)
• Integration with the ISA formalization task group’s effort
• Cache flush/writeback/etc. operations
• (The task group logistics for all this are still TBD)
DOCUMENTATION & TOOLS
• Appendix A: two dozen pages
explaining the details in plain English
• Appendix B: Two axiomatic models and
one operational model, with
associated tools (Alloy, herd, rmem)
MEMORY MODEL RATIFICATION TIMELINE
• Released for public review on 5/2/18
• Foundation requires at least 45 days for
public review. This will end no earlier
than 6/16/18.
• If you have comments or feedback:
• send to isa-dev
• send as a PR or issue on riscv-isa-manual GitHub repo
• send to me directly
TOTAL STORE ORDERING (SPARC, X86, RVTSO)
Axiomatic Operational
ppo := (program order) − W→R
acyclic(ppo ∪ rfe ∪ co ∪ fr ∪ fence)
acyclic(po_loc ∪ rf ∪ co ∪ fr)
[Alglave et al., TOPLAS ‘09] [Sewell et al., CACM ‘10]
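A minimal sketch of checking the first TSO axiom on one candidate execution. It assumes a "store buffering" execution that is not from the slides: hart 0 runs Wx then Ry, hart 1 runs Wy then Rx, and both loads return the initial value 0. The event names are illustrative, and initial writes are left out, so rf and co are empty and the fr edges are written out by hand.

```python
def acyclic(edges):
    """True if the directed graph given as a set of (src, dst) pairs has no cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return True
        if node in visiting:      # back edge: we found a cycle
            return False
        visiting.add(node)
        ok = all(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return ok

    return all(visit(node) for node in list(graph))

po = {("Wx", "Ry"), ("Wy", "Rx")}   # program order on each hart
fr = {("Ry", "Wy"), ("Rx", "Wx")}   # each load read a value older than the other hart's store
rfe, co, fence = set(), set(), set()

# TSO: ppo is program order minus W→R pairs, which removes both po edges here.
ppo_tso = {(a, b) for a, b in po if not (a.startswith("W") and b.startswith("R"))}
tso_allows = acyclic(ppo_tso | rfe | co | fr | fence)

# For comparison, keeping all of program order models sequential consistency.
sc_allows = acyclic(po | rfe | co | fr | fence)
```

Here `tso_allows` is True and `sc_allows` is False: with the W→R edges dropped there is no cycle, so TSO permits the execution, while full program order closes the cycle Wx → Ry → Wy → Rx → Wx.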
RVWMO
Axiomatic (App. B.2) Operational (App. B.3)
ppo := (13 rules, on next slide)
acyclic(ppo ∪ rfe ∪ co ∪ fr)
acyclic(po_loc ∪ rf ∪ co ∪ fr)
MULTI-COPY ATOMICITY
A load may only return a value from:
• An earlier store from the same hart (“hardware thread”)
• A store that is globally visible
WHO FEELS THE PAIN?
[Diagram: each software layer and how it maps onto the ISA memory model]
• C/C++ MM: Canonical Mapping
• Java MM: Canonical Mapping
• Linux MM: Canonical Mapping
• Synchronization Libraries: Hand Mapping
• …
• Misconception: end users will have to deal with the memory model
• Reality: end users rarely interact with the ISA memory model directly
• Burden falls instead on library/compiler writers and microarchitects
MEMORY MODEL TASK GROUP PROGRESS
• May 2017 Workshop: Formed the task group
(…debate…)
• Load Value Axiom: each byte of each load i returns the value
written to that byte by the store that is the latest in global memory
order among the following stores:
1. Stores that write that byte and that precede i in the global memory order
2. Stores that write that byte and that precede i in program order
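A minimal sketch of the Load Value Axiom, simplified to whole locations instead of individual bytes. The function `load_value`, its parameters, and the event names are illustrative assumptions, not part of the specification.

```python
def load_value(load_id, addr, gmo, po_before, stores, initial=0):
    """Value returned by a load under the Load Value Axiom.

    gmo:       list of event ids in global memory order
    po_before: ids of stores that precede the load in program order
    stores:    store id -> (address, value)
    """
    position = {event: n for n, event in enumerate(gmo)}
    candidates = [
        s for s, (s_addr, _) in stores.items()
        if s_addr == addr
        and (position[s] < position[load_id] or s in po_before)
    ]
    if not candidates:
        return initial
    latest = max(candidates, key=position.__getitem__)  # latest in global memory order
    return stores[latest][1]

# One hart runs S: "store 1 to x" and then L: "load x". The load is satisfied
# from the store buffer, so L comes before S in global memory order.
stores = {"S": ("x", 1)}
gmo = ["L", "S"]
with_clause_2 = load_value("L", "x", gmo, {"S"}, stores)
without_clause_2 = load_value("L", "x", gmo, set(), stores)
```

`with_clause_2` is 1 and `without_clause_2` is 0: clause 2 is what lets a load see its own hart's earlier store even when that store comes later in global memory order.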