Branch Prediction
Branch in pipeline
Inst.           1    2    3    4    5    6
LD R1, 0(R5)    IF   ID   EX   M    WB
• Stall pipeline
• Reduce branch stall by resolving branch in decode
stage
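The gain from resolving the branch earlier can be estimated with the usual stall-cycle model. The following is a minimal sketch, assuming an ideal CPI of 1; the branch frequency (20%) and the 2-cycle penalty for resolving in EX are illustrative assumptions, not figures from these slides.

```python
def effective_cpi(branch_fraction, stall_cycles, base_cpi=1.0):
    """CPI when every branch stalls the pipeline for stall_cycles cycles."""
    return base_cpi + branch_fraction * stall_cycles


# Hypothetical workload: 20% of the instructions are branches.
resolve_in_ex = effective_cpi(0.20, 2)  # assumed: 2 stall cycles if resolved in EX
resolve_in_id = effective_cpi(0.20, 1)  # 1 stall cycle if resolved in ID
print(resolve_in_ex, resolve_in_id)
```

Under these assumptions, moving branch resolution from EX to ID halves the stall contribution to the CPI.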
Problems with branch stall
• Delayed branch
• Instruction in branch delay slot is executed whether
or not the branch is taken
• Schedule (fill) branch delay slot with useful
instructions (except another branch)
• MIPS 5-stage pipeline with branch outcome in ID
stage – one branch delay slot
Delayed Branch
Scheduling options for Delayed Branch
• Static or dynamic
• Static – done by the compiler; the prediction is fixed
• Dynamic – done at runtime; the prediction changes with
program behaviour
Dynamic Branch prediction
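[Figure: 1-bit predictor state diagram. Two states, T (predict taken) and NT (predict not taken); a taken branch moves to T, a not-taken branch moves to NT.]
[Figure: Branch prediction Buffer (BPB), indexed by k bits of the branch address.]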
Branch prediction Buffer (BPB) – 1bit
• Misprediction if
– Incorrect prediction for that branch
– Same index referenced by two branches
• Drawback
– Misprediction during loop entry and exit
– Loop branches with few iterations are predicted less
accurately: accuracy is (N-2)/N for a loop of N iterations
• Remedy
– Two bit prediction
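The (N-2)/N figure can be checked with a small simulation. This is a sketch, assuming a single BPB entry (no aliasing) and a loop branch that is taken N-1 times and then falls through; the function name and the values of N and runs are hypothetical.

```python
def one_bit_mispredictions(outcomes, initial=False):
    """Count mispredictions of a 1-bit predictor.

    outcomes: sequence of booleans, True = branch taken.
    The single bit simply remembers the last outcome.
    """
    prediction = initial
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # 1-bit scheme: next prediction = last outcome
    return misses


N = 10                                  # iterations per loop execution
one_run = [True] * (N - 1) + [False]    # taken N-1 times, then loop exit
runs = 100                              # the loop is executed repeatedly
total = one_bit_mispredictions(one_run * runs)
accuracy = 1 - total / (N * runs)
print(total, accuracy)
```

Each execution of the loop costs two mispredictions, one on entry and one on exit, which is where (N-2)/N comes from.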
Branch prediction Buffer (BPB) – 2bit
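[Figure: 2-bit predictor state diagram. Four states: PT (11), PT (10), PNT (01), PNT (00), where PT = predict taken and PNT = predict not taken. Edges are labelled Taken / Not Taken; from a strong state (11 or 00) the prediction flips only after two consecutive mispredictions.]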
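The effect of the second bit on a loop branch can be sketched as follows. This assumes the common saturating-counter form of the 2-bit scheme (states 0 to 3, predict taken in 10 and 11); the exact transitions drawn on the slide may differ, and the function name and parameters are hypothetical.

```python
def two_bit_mispredictions(outcomes, state=0):
    """Count mispredictions of a 2-bit saturating counter.

    state is 0..3 (binary 00..11); states 2 and 3 predict taken.
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Saturating update: count up on taken, down on not taken.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


N = 10
one_run = [True] * (N - 1) + [False]
runs = 100
warm = two_bit_mispredictions(one_run * runs, state=3)  # counter already trained
cold = two_bit_mispredictions(one_run * runs, state=0)  # counter starts at 00
print(warm, cold)
```

Once trained, the counter mispredicts only the loop exit, so the per-run cost drops from two mispredictions to one.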
Branch prediction Buffer (BPB) – 2 bit
d    b1 prediction   b1 action   b2 prediction   b2 action
2    NT/NT                       NT/NT
0
2
0
d    b1 prediction   b1 action   New b1 prediction   b2 prediction   b2 action   New b2 prediction
2    NT              T           T                   NT              T           T
0    T               NT          NT                  T               NT          NT
2    NT              T           T                   NT              T           T
0    T               NT          NT                  T               NT          NT
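The rows of the table above can be reproduced by stepping a 1-bit predictor through the action column. This is a sketch; the initial prediction NT and the action sequence T, NT, T, NT are taken from the table, while the function name is hypothetical.

```python
def trace_one_bit(actions, prediction="NT"):
    """Return (prediction, action, new prediction, mispredicted) per step."""
    rows = []
    for action in actions:
        new_prediction = action  # 1-bit scheme: remember the last action
        rows.append((prediction, action, new_prediction, prediction != action))
        prediction = new_prediction
    return rows


d_values = [2, 0, 2, 0]
actions = ["T", "NT", "T", "NT"]  # same action column for b1 and b2
rows = trace_one_bit(actions)
for d, row in zip(d_values, rows):
    print(d, *row)
```

With d alternating between 2 and 0, every prediction is wrong, for b1 and for b2 alike.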
Correlating predictor (2,2)
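An (m,n) correlating predictor uses the outcomes of the last m branches (global history) to choose among 2^m n-bit counters for each branch entry. The sketch below is one possible model, not a description of any particular processor: the class name, the table size, and the two branch addresses are assumptions, and both branches are taken when d = 2 and not taken when d = 0, as in the 1-bit table above.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m bits of global history, n-bit saturating counters."""

    def __init__(self, m=2, n=2, index_bits=4):
        self.m = m
        self.index_mask = (1 << index_bits) - 1
        self.history = 0                      # last m outcomes, newest in bit 0
        self.max_count = (1 << n) - 1
        # table[entry][history] -> n-bit counter, all initialised to 0
        self.table = [[0] * (1 << m) for _ in range(1 << index_bits)]

    def _row(self, pc):
        return self.table[(pc >> 2) & self.index_mask]

    def predict(self, pc):
        # Upper half of the counter range predicts taken.
        return self._row(pc)[self.history] > self.max_count // 2

    def update(self, pc, taken):
        row = self._row(pc)
        c = row[self.history]
        row[self.history] = min(c + 1, self.max_count) if taken else max(c - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def count_misses(predictor, d_values, b1_pc=0x40, b2_pc=0x48):
    misses = 0
    for d in d_values:
        for pc in (b1_pc, b2_pc):        # b1 executes before b2
            taken = d != 0               # T for d = 2, NT for d = 0
            if predictor.predict(pc) != taken:
                misses += 1
            predictor.update(pc, taken)
    return misses


d_values = [2, 0] * 10
correlated = count_misses(CorrelatingPredictor(m=2, n=2), d_values)
plain_two_bit = count_misses(CorrelatingPredictor(m=0, n=2), d_values)
print(correlated, plain_two_bit)
```

In this model the (2,2) predictor mispredicts only while its counters warm up, whereas the history-free 2-bit counters (m = 0) keep mispredicting every other execution.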
Branch target buffer
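A branch target buffer can be thought of as a small cache, looked up with the fetch address, that returns the predicted next PC. The following is a minimal sketch, assuming a direct-mapped buffer that stores only taken branches and 4-byte instructions; the class, its size, and the addresses are hypothetical.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB that stores targets of taken branches only."""

    def __init__(self, entries=16):
        self.entries = entries
        self.tags = [None] * entries     # full branch PC used as the tag
        self.targets = [0] * entries

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def predict_next_pc(self, pc):
        i = self._index(pc)
        if self.tags[i] == pc:           # hit: predict taken, fetch from target
            return self.targets[i]
        return pc + 4                    # miss: fetch the sequential instruction

    def update(self, pc, taken, target):
        i = self._index(pc)
        if taken:
            self.tags[i], self.targets[i] = pc, target
        elif self.tags[i] == pc:         # branch fell through: drop the entry
            self.tags[i] = None


btb = BranchTargetBuffer()
first = btb.predict_next_pc(0x100)       # not seen yet: sequential fetch
btb.update(0x100, taken=True, target=0x80)
second = btb.predict_next_pc(0x100)      # now predicted to jump to 0x80
print(hex(first), hex(second))
```

Because the lookup happens in the fetch stage, a hit supplies the target before the instruction has even been decoded as a branch.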
• Modified ISA
• Support conditional execution of instructions (ARM)
• Condition false – instruction is not executed (acts as a nop)
Predication - pros and cons
+
• No branch hazard
• Straight line code – better prefetching
• More independent instructions – better code
scheduling
-
• Overuse of hardware
• Difficult for complex conditions (nested if)
• Difficult to add predicated instructions to an existing ISA
Control hazard - summary
• Static or dynamic
• Static
– Branch prediction
– Branch delay slot
• Dynamic
– BHB – single bit, two bit
– Improve prediction
» Correlated predictor - Pentium
» Hybrid local global predictor – Alpha
– Early target determination
» BTB
» Next address in I-cache
Control hazard - summary
• Dynamic
– Early target determination
» BTB (Pentium, Itanium)
» Next address in I-cache (Alpha)
» Return address stack
– Reduce misprediction penalty
» Fetch both instruction streams (IBM mainframe)
– Eliminate branches
» Predicated execution (Itanium, ARM)
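The return address stack listed under early target determination can be sketched as a small hardware stack: push the return address on a call, pop it to predict the target of a return. This is an illustrative model only; the depth, the 4-byte instruction size, and the overwrite-oldest policy are assumptions.

```python
class ReturnAddressStack:
    """Fixed-depth stack of predicted return addresses."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, call_pc):
        if len(self.stack) == self.depth:
            self.stack.pop(0)            # full: discard the oldest entry
        self.stack.append(call_pc + 4)   # address of the instruction after the call

    def predict_return(self):
        # Empty stack: no prediction available.
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack(depth=2)
ras.on_call(0x100)                       # outer call
ras.on_call(0x200)                       # nested call
inner = ras.predict_return()
outer = ras.predict_return()
print(hex(inner), hex(outer))
```

Returns are predicted in last-in, first-out order, which matches nested calls as long as the nesting depth does not exceed the stack depth.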