Lect07 Instruction Flow
[Figure: pipeline overview. Instruction flow: branch predictor, FETCH, instruction buffer, DECODE. Register data flow: EXECUTE, reorder buffer (ROB), COMMIT. Memory data flow: store queue, D-cache.]
[Figure: pipeline stages and buffers: Instruction/Decode Buffer, Decode, Dispatch Buffer, Dispatch, Reservation Stations, Issue, Branch, Execute, Finish, Reorder/Completion Buffer, Complete, Store Buffer, Retire]
Mikko Lipasti-University of Wisconsin 16
Branch Prediction
[Figure: pipeline diagram (Execute, Store Buffer, Retire)]
Condition Resolution
[Figure: sources of branch condition resolution (CC reg., GP reg. value comp., Branch unit) along the pipeline: Fetch, Decode Buffer, Decode, Dispatch Buffer, Dispatch, Reservation Stations, Issue, Branch, Execute, Store Buffer, Retire]
Branch Instruction Speculation
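[Figure: the branch predictor (using a BTB) supplies a speculative target and a speculative condition; the FA-mux selects between PC(seq.) and the speculative target to form FA (fetch address), which goes to the I-cache. The Branch unit sends the BTB update (target addr. and history). Stages: Fetch, Decode Buffer, Decode, Dispatch Buffer, Dispatch, Reservation Stations, Issue, Branch, Execute, Finish, Completion Buffer]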
Branch/Jump Target Prediction
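[Figure: example BTB entry: branch address 0x0348, history 0101 (NTNT), target address 0x0612]
[Figure: FSM state diagram over the last two branch outcomes (states TT, NT, TN, NN) with T/N transitions and a T or N prediction per state]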
• Hardware table remembers last 2 branch outcomes
– History of past several branches encoded by FSM
– Current state used to generate prediction
• Results:
Workload      IBM1  IBM2  IBM3  IBM4  DEC  CDC
Accuracy (%)    93    97    91    83   98   91
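A minimal sketch of a table that remembers the last two outcomes of each branch and uses that state to predict. The table size, the PC indexing, the initial state, and the prediction rule (predict not-taken only after two not-taken outcomes) are assumptions for illustration; the FSM on the slide may assign predictions differently.

```python
class TwoOutcomeHistoryPredictor:
    """Per-branch table holding the last two outcomes (True = taken)."""

    def __init__(self, entries=16):
        self.entries = entries
        # Assumed initial state: both previous outcomes were taken.
        self.table = [(True, True)] * entries

    def _index(self, pc):
        # Assumed indexing: drop the byte offset, then wrap to the table size.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        older, newer = self.table[self._index(pc)]
        # Assumed rule: predict taken unless both recorded outcomes were not-taken.
        return older or newer

    def update(self, pc, taken):
        i = self._index(pc)
        _, newer = self.table[i]
        # Shift the newest outcome into the two-entry history.
        self.table[i] = (newer, taken)


def accuracy(predictor, pc, outcomes):
    """Fraction of outcomes predicted correctly, updating after each branch."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(outcomes)
```

For a loop branch that is taken nine times and then falls through, this rule mispredicts only the loop exit, because a single not-taken outcome does not flip the prediction.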
IBM Study [Nair, 1992]
• Branch processing on the IBM RS/6000
– Separate branch functional unit
– Overlap of branch instructions with other instructions
• Zero cycle branches
– Two causes for branch stalls
• Unresolved conditions
• Branches downstream too close to unresolved branches
• Investigated optimal FSM design for branch prediction
[Figure: pipeline with branch predictor and predictor update path; functional units BRN, SFX, SFX, CFX, FPU, LS. Stages: Decode, Dispatch Buffer, Dispatch, Reservation Stations, Issue, Branch, Execute, Finish, Completion Buffer]
BTAC and BHT Design (PPC 604)
[Figure: the FA-mux drives FA (via FAR) to the I-cache, the Branch History Table (BHT), and the Branch Target Address Cache (BTAC); mux inputs include FA + 4, the BTAC prediction, and the BHT prediction. BTAC update and BHT update come from later stages. Stages: Decode Buffer, Decode, Dispatch Buffer, Dispatch, Reservation Stations, Issue (BRN, SFX, SFX, CFX, FPU, LS), Branch, Execute, Finish, Completion Buffer]
BTAC:
- 64 entries
- fully associative
- hit => predict taken
BHT:
- 512 entries
- direct mapped
- 2-bit saturating counter history based prediction
- overrides BTAC prediction
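A minimal sketch of how the two structures could interact, using the sizes listed above (64-entry fully associative BTAC, 512-entry direct-mapped BHT with 2-bit saturating counters). The class names, FIFO replacement, index function, and initial counter value are assumptions, not details of the PPC 604.

```python
class Btac:
    """Fully associative target cache: a hit means 'predict taken'."""

    def __init__(self, entries=64):
        self.entries = entries
        self.targets = {}  # fetch address -> target; a dict models full associativity

    def lookup(self, fa):
        return self.targets.get(fa)

    def insert(self, fa, target):
        if fa not in self.targets and len(self.targets) >= self.entries:
            # Assumed FIFO replacement (dicts keep insertion order).
            self.targets.pop(next(iter(self.targets)))
        self.targets[fa] = target


class Bht:
    """Direct-mapped table of 2-bit saturating counters."""

    def __init__(self, entries=512):
        self.counters = [1] * entries  # assumed start: weakly not-taken

    def _index(self, fa):
        return (fa >> 2) % len(self.counters)

    def predict(self, fa):
        return self.counters[self._index(fa)] >= 2

    def update(self, fa, taken):
        i = self._index(fa)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def next_fetch_address(fa, btac):
    """Fetch-time choice: BTAC target on a hit, otherwise FA + 4."""
    target = btac.lookup(fa)
    return target if target is not None else fa + 4


def bht_overrides(fa, btac, bht):
    """True when the later BHT prediction disagrees with the BTAC's."""
    btac_taken = btac.lookup(fa) is not None
    return bht.predict(fa) != btac_taken
```

When `bht_overrides` returns true, the fetch address chosen from the BTAC would have to be redirected.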
Branch Speculation
[Figure: binary tree of NT/T outcomes for successive branches; the speculated branches along the predicted path are labeled TAG 1, TAG 2, and TAG 3]
• Fairly simple, 5-stage machine from 1994
• Many sources for PC redirect
• Lots of complexity
• Heuristic-based: Ball/Larus
– Thomas Ball and James R. Larus. Branch Prediction for Free. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 300-313, June 1993.
Static Branch Prediction
• Profile-based
1. Instrument program binary
2. Run with representative (?) input set
3. Recompile program
a. Annotate branches with hint bits, or
b. Restructure code to match predict not-taken
• Best performance: 75-80% accuracy
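The profile-based steps above can be sketched as follows. The trace format (pairs of branch address and outcome) and the function names are hypothetical; a real flow would gather counts from an instrumented binary and feed the hints back to the compiler.

```python
from collections import defaultdict


def collect_profile(trace):
    """Count outcomes per branch from an instrumented run.

    trace: iterable of (branch_pc, taken) pairs.
    Returns pc -> [not_taken_count, taken_count].
    """
    counts = defaultdict(lambda: [0, 0])
    for pc, taken in trace:
        counts[pc][int(taken)] += 1
    return counts


def hint_bits(counts):
    """One static hint per branch: True means 'predict taken'."""
    return {pc: taken >= not_taken for pc, (not_taken, taken) in counts.items()}


def static_accuracy(hints, trace):
    """Accuracy of the fixed hints on a trace (unseen branches: not-taken)."""
    trace = list(trace)
    hits = sum(hints.get(pc, False) == taken for pc, taken in trace)
    return hits / len(trace)
```

Accuracy measured on the profiling input itself is optimistic; the "(?)" in step 2 is the concern that a different input may bias the branches differently.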
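[Figure: the global BHR (0110) forms part of the index (e.g. 010110) into a pattern history table of saturating counters; the selected counter supplies the branch prediction and is incremented or decremented with the outcome]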
• BHR adds global branch history
– Provides more context
– Can differentiate multiple instances of the same static branch
– Can correlate behavior across multiple static branches
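A minimal sketch of a predictor that adds global history, assuming the PHT index is formed by concatenating a few branch address bits with the BHR. The bit widths and the initial counter value are illustrative assumptions.

```python
class GlobalHistoryPredictor:
    """PHT of 2-bit counters indexed by {PC bits, global BHR}."""

    def __init__(self, history_bits=4, pc_bits=2):
        self.history_bits = history_bits
        self.pc_bits = pc_bits
        self.bhr = 0  # global branch history register
        self.pht = [1] * (1 << (history_bits + pc_bits))  # weakly not-taken

    def _index(self, pc):
        pc_part = (pc >> 2) & ((1 << self.pc_bits) - 1)
        return (pc_part << self.history_bits) | self.bhr

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the outcome into the BHR, keeping only history_bits bits.
        self.bhr = ((self.bhr << 1) | int(taken)) & ((1 << self.history_bits) - 1)
```

A branch that strictly alternates taken/not-taken defeats a single 2-bit counter, but here the two history patterns select different counters, so after warm-up each instance is predicted correctly.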
Two-level Prediction: Local History
[Figure: bits of PC = 01011010010101 index the BHT (entries 000 to 111), which holds a per-branch local history (e.g. 110); that history forms part of the PHT index (e.g. 0101110), and the selected PHT counter supplies the branch prediction]
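A minimal sketch of two-level prediction with local history: a BHT of per-branch history registers, whose contents index a PHT of 2-bit counters. For brevity the PHT here is indexed by the history alone; table sizes and initial values are assumptions.

```python
class LocalHistoryPredictor:
    """First level: per-branch history (BHT). Second level: counters (PHT)."""

    def __init__(self, bht_entries=8, history_bits=3):
        self.history_bits = history_bits
        self.bht = [0] * bht_entries          # local history registers
        self.pht = [1] * (1 << history_bits)  # 2-bit counters, weakly not-taken

    def _slot(self, pc):
        return (pc >> 2) % len(self.bht)

    def predict(self, pc):
        history = self.bht[self._slot(pc)]
        return self.pht[history] >= 2

    def update(self, pc, taken):
        slot = self._slot(pc)
        history = self.bht[slot]
        if taken:
            self.pht[history] = min(3, self.pht[history] + 1)
        else:
            self.pht[history] = max(0, self.pht[history] - 1)
        # Shift the outcome into this branch's own history register.
        mask = (1 << self.history_bits) - 1
        self.bht[slot] = ((history << 1) | int(taken)) & mask
```

A loop branch with a short repeating pattern (for example taken, taken, not-taken) becomes fully predictable once each distinct history has trained its counter.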
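[Figure: branch address and history are hashed (XOR, f2, ...) into several PHT banks; a majority vote over the bank outputs gives the final prediction]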
• Multiple PHT banks indexed by different hash functions
– Conflicting branch pair unlikely to conflict in more than one PHT
• Majority vote determines prediction
• Used in Alpha EV8 (ultimately cancelled)
• P. Michaud, A. Seznec, and R. Uhlig. Trading Conflict and Capacity Aliasing in
Conditional Branch Predictors. ISCA-24, June 1997
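A minimal sketch of the multi-bank idea: three PHT banks, each indexed by a different hash of branch address and history, with a majority vote. The three hash functions are simple placeholders chosen for illustration, not the skewing functions from the paper, and every bank is updated on every branch.

```python
class SkewedPredictor:
    """Three PHT banks with different index hashes and a majority vote."""

    def __init__(self, bank_bits=6):
        self.bank_bits = bank_bits
        self.mask = (1 << bank_bits) - 1
        self.banks = [[1] * (1 << bank_bits) for _ in range(3)]

    def _indices(self, pc, bhr):
        a = (pc >> 2) & self.mask
        h = bhr & self.mask
        top = self.bank_bits - 1
        # Placeholder hashes (assumptions): plain XOR, then XOR with a
        # rotated history, then XOR with a rotated address.
        return (
            a ^ h,
            (a ^ (h << 1) ^ (h >> top)) & self.mask,
            ((a << 1) ^ (a >> top) ^ h) & self.mask,
        )

    def predict(self, pc, bhr):
        idx = self._indices(pc, bhr)
        votes = sum(bank[i] >= 2 for bank, i in zip(self.banks, idx))
        return votes >= 2

    def update(self, pc, bhr, taken):
        for bank, i in zip(self.banks, self._indices(pc, bhr)):
            bank[i] = min(3, bank[i] + 1) if taken else max(0, bank[i] - 1)
```

Two branches that collide in one bank usually map to different entries in the other two, so the vote still recovers the right direction for both.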
Agree Predictor
[Figure: branch address XOR global BHR indexes the PHT; the PHT output (1 or 0) is combined with the branch's biasing bit to form the prediction]
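A minimal sketch of the agree scheme: PHT counters record whether a branch agrees with its biasing bit rather than whether it is taken. Setting the biasing bit from the first observed outcome, the default bias, and the table size are assumptions for illustration.

```python
class AgreePredictor:
    """PHT counters predict agree/disagree with a per-branch biasing bit."""

    def __init__(self, index_bits=6):
        self.mask = (1 << index_bits) - 1
        self.bhr = 0
        self.pht = [2] * (1 << index_bits)  # start at weakly 'agree'
        self.bias = {}                      # branch address -> biasing bit

    def _index(self, pc):
        return ((pc >> 2) ^ self.bhr) & self.mask

    def predict(self, pc):
        bias = self.bias.get(pc, True)      # assumed default bias: taken
        agree = self.pht[self._index(pc)] >= 2
        return bias if agree else not bias

    def update(self, pc, taken):
        # Assumed policy: the first outcome of a branch fixes its biasing bit.
        bias = self.bias.setdefault(pc, taken)
        i = self._index(pc)
        if taken == bias:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        self.bhr = ((self.bhr << 1) | int(taken)) & self.mask
```

Two strongly biased branches with opposite directions both push a shared counter toward "agree", which turns what would be destructive interference into harmless sharing.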
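[Figure: T-cache and NT-cache, each entry holding a partial tag and a 2-bit counter (2bC); tag comparators (=) select the prediction]
• Based on bi-mode
[Timeline fragment: 1991: Two-level prediction; 1993: gshare, tournament]
[Figure labels: Reduction Function, Confidence Prediction]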
[Figure: control-flow graph in which paths ACD and BCD both reach block D with History = TT. Block C: if (y < 12) goto D; Block D: if (y % 2) goto E;]
Path ACD: Branch Address = X, Branch History = TT, Branch Outcome = Not Taken
Path BCD: Branch Address = X, Branch History = TT, Branch Outcome = Taken
• History length
– Short history—lower training cost
– Long history—captures macro-level behavior
– Variable history length predictors
• Really long history (long loops)
– Loop count predictors
– Fourier transform into frequency domain
• Kampe et al., “The FAB Predictor…”, HPCA 2002
• Limited capacity & interference
– Constructive vs. destructive
– Bi-mode, gskewed, agree, YAGS
– Sec. 9.3.2 provides good overview
[Figure labels: BHR (“feature”), Precomputed Response, X, threshold comparison (>t), “response”]
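[Figure: (a) BTB lookup with tag comparators (=?); the target prediction is either the not-taken address (current address plus size of instruction) or the stored branch target. (b) The same BTB extended with a return address stack, selected when the BTB answers "is this a return?"]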
• Speculative update causes headaches
– On each predicted branch, checkpoint head/tail
– Further, checkpoint stack contents since speculative pop/push
sequence is destructive
– Conditional call/return causes more headaches
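A minimal sketch of a return address stack with checkpointing, to show why saving only the pointer is not enough. The circular layout, the depth, and the choice to snapshot the full contents are assumptions; real designs typically save less state.

```python
class ReturnAddressStack:
    """Circular return address stack with checkpoint/restore."""

    def __init__(self, depth=8):
        self.depth = depth
        self.entries = [0] * depth
        self.top = 0  # index of the next free slot (wraps modulo depth)

    def push(self, return_address):
        self.entries[self.top % self.depth] = return_address
        self.top += 1

    def pop(self):
        self.top -= 1
        return self.entries[self.top % self.depth]

    def checkpoint(self):
        # Save the pointer and a copy of the contents: a speculative
        # pop followed by a push overwrites an entry that is still needed.
        return self.top, list(self.entries)

    def restore(self, snapshot):
        self.top, entries = snapshot
        self.entries = list(entries)
```

With pointer-only recovery, a wrong-path return followed by a wrong-path call would leave the wrong-path return address in the slot that the correct path pops next.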
Indirect Branches
• Tagged target cache
– Chang et al., Target Prediction for Indirect Jumps, ISCA 1997
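[Figure: collapsing buffer. Cache Bank 1 supplies A B C D and Cache Bank 2 supplies E F G H, with valid instruction bits; the interchange switch orders the blocks (A B C D E F G H) and the collapsing circuit compacts them to A B C E G, which go to the decode stage]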
• Fetch from two cache blocks, rotate, collapse past taken branches
• Thomas M. Conte, Kishore N. Menezes, Patrick M. Mills and Burzin A. Patel.
Optimization of Instruction Fetch Mechanisms for High Issue Rates.
International Symposium on Computer Architecture, June 1995.
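A minimal sketch of the fetch, interchange, and collapse steps, modeling instructions as list items. The function names and the flat valid-bit vector are assumptions for illustration.

```python
def interchange(bank1, bank2, bank1_first=True):
    """Order the two fetched cache blocks by program order."""
    return bank1 + bank2 if bank1_first else bank2 + bank1


def collapse(instructions, valid_bits):
    """Drop instructions skipped by taken branches and close the gaps."""
    return [ins for ins, valid in zip(instructions, valid_bits) if valid]


def fetch_group(bank1, bank2, valid_bits, bank1_first=True):
    """Instructions delivered to decode from one two-block fetch."""
    return collapse(interchange(bank1, bank2, bank1_first), valid_bits)
```

With blocks A B C D and E F G H and valid bits 1 1 1 0 1 0 1 0, the group sent to decode is A B C E G, matching the figure.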
High-Bandwidth Fetch: Trace Cache
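[Figure: (a) the instruction cache holds the instructions of a trace (A B, C, D, E F G, H I J) in separate lines; (b) the trace cache holds the trace A B C D E F G H I J contiguously]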
[Figure labels: Instruction Decode, bc, next line prediction, tagged cache lines A B C D, E F G H, I J K L]