Branch Prediction

The document discusses techniques for reducing branch stalls in pipelines, including branch prediction. It describes how branch stalls can negatively impact performance and the need to resolve branches earlier. Static techniques like branch delay slots help but cannot fully solve the problem. Dynamic branch prediction allows the processor to speculatively execute past branches based on predicted outcomes. History-based and correlating predictors improve prediction by tracking branch behavior patterns and correlations between branches.

Uploaded by

sahith
Branch Prediction

Branch in pipeline
Inst.                 1    2    3    4    5    6
LD  R1, 0(R5)         IF   ID   EX   M    WB
BEQ R1, R2, L1             IF   ID   EX   M
SUB R3, R4, R5                  IF   ID   EX
XOR R6, R4, R5                       IF   ID
L1: ADD R7, R8, R9                        IF

• Taken branch: SUB and XOR are fetched from the wrong path
• They need to be flushed out of the pipeline
• SUB and XOR are control dependent on BEQ, which may result in a control hazard
Solution 1
Inst. 1 2 3 4 5 6
LD R1, 0(R5) IF ID EX M WB

BEQ R1, R2, L1 IF ID EX M

SUB R3, R4, R5 stall stall IF?

XOR R6, R4, R5 stall stall

L1: ADD R7, R8, R9 IF?

• Stall the pipeline until the branch is resolved
• Reduce the branch stall by resolving the branch in the decode stage
Problems with branch stall

• Stalling the pipeline cannot fully solve the problem
• The branch is identified only in the decode stage, and by then the next instruction has already been fetched
• A decision is required even before the branch is decoded
Reducing branch stalls using Static techniques

• Actions for a branch are static: fixed for each branch during the entire execution
• Freeze or flush the pipeline: hold or delete any instructions after the branch until the branch destination is known
– simple for both hardware and software
– branch penalty is fixed and cannot be reduced by software
Reducing branch stalls using Static technique 1
Reducing branch stalls using Static technique 2
• Treat every branch as not taken, allowing the hardware to continue as if the branch were not executed (static branch-not-taken prediction)
• Processor state should not be changed until the branch outcome is definitely known
Reducing branch stalls using Static technique 3

• Treat every branch as taken (static branch-taken prediction)
• Useful only if the branch target is known before the branch outcome
Reducing branch stalls using Static technique 4

• Delayed branch
• The instruction in the branch delay slot is executed whether or not the branch is taken
• Schedule (fill) the branch delay slot with a useful instruction (but not another branch)
• MIPS 5-stage pipeline with the branch outcome known in the ID stage: one branch delay slot
Delayed Branch
Scheduling options for Delayed Branch

• From before the branch
• From the branch target
• From the fall-through path

Solution 2 - Branch prediction

• Technique for reducing control stalls
• Predict the branch outcome to avoid the stall
• Execute under the assumption that the prediction is correct
• Misprediction: flush the wrong-path instructions in the pipe and fetch from the right path
• Continuing with sequential execution amounts to predicting branch not taken
Branch Performance

• Effective CPI = CPI + branch stalls per instruction (branch penalty)
• Branch penalty = branch frequency × misprediction rate × miss penalty
• Example: a program with 20% branch instructions, 60% of them taken, on a processor that predicts not taken and resolves branches in the 3rd stage (miss penalty of 2 cycles)
– 0.2 × 0.6 × 2 = 0.24 cycles per instruction
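The arithmetic above can be checked with a short sketch. The function name and its parameters are illustrative only; it assumes a branch-not-taken policy, so every taken branch counts as a misprediction, and a miss penalty of (resolve stage - 1) cycles.

```python
def branch_stall_cycles(branch_fraction, mispredict_rate, penalty_cycles):
    """Average stall cycles that branches add to each instruction."""
    return branch_fraction * mispredict_rate * penalty_cycles

# 20% branches, 60% of them taken (and so mispredicted under the
# assumed not-taken policy), resolved in stage 3 -> 2 wasted fetches.
stalls = branch_stall_cycles(0.2, 0.6, 3 - 1)
print(round(stalls, 2))  # 0.24
```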
Branch Prediction Requirement

• From the present PC content, predict the next PC content
• The predictor must correctly guess
– Is the branch taken?
– If taken, what is the target PC?
• CPI = CPI_ideal + branch frequency × (1 - predictor accuracy) × misprediction penalty
Need for better Predictor
• Assume 2 processor alternatives
A - resolves branches in stage 3 (misprediction penalty of 2 cycles)
B - resolves branches in stage 10 (misprediction penalty of 9 cycles)
• Consider a program with 20% branch instructions and an ideal CPI of 1
• Consider 2 branch predictors with accuracy 50% and 90%. What is the impact on CPI?

BP accuracy   CPI of A   CPI of B
50%           1.2        1.9
90%           1.04       1.18
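The table entries follow from the CPI formula once the branch frequency is included. A minimal sketch, assuming an ideal CPI of 1 and a misprediction penalty of (resolve stage - 1) cycles; the function name is hypothetical:

```python
def cpi(ideal_cpi, branch_fraction, accuracy, resolve_stage):
    """CPI with branch mispredictions; penalty assumed = resolve_stage - 1."""
    penalty = resolve_stage - 1
    return ideal_cpi + branch_fraction * (1 - accuracy) * penalty

for accuracy in (0.5, 0.9):
    a = cpi(1.0, 0.2, accuracy, 3)    # processor A
    b = cpi(1.0, 0.2, accuracy, 10)   # processor B
    print(f"{accuracy:.0%}  A={a:.2f}  B={b:.2f}")
```

The deeper pipeline (B) is hurt far more by a poor predictor, which is why accuracy matters more as branches resolve later.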
Branch prediction

• Static or dynamic
• Static: decided by the compiler and fixed
• Dynamic: decided at run time; the prediction changes with program behaviour
Dynamic Branch prediction

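• Simplest scheme: branch-prediction buffer or branch history table
• A single history bit is associated with the branch instruction
• State diagram: two states, T (predict taken) and NT (predict not taken). A taken branch moves to (or stays in) T; a not-taken branch moves to (or stays in) NT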
Branch prediction Buffer (BPB)

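• Indexed by the last few bits of the address of the branch instruction
• The buffer is read in the decode phase
• [Figure: the low-order k bits of the branch address index the BPB; the selected entry holds the prediction, NT in the example]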
Branch prediction Buffer (BPB) – 1bit

• Misprediction if
– the prediction stored for that branch is incorrect
– the same index is referenced by two branches
• Drawback
– Misprediction during loop entry and exit
– Loop branches with a small number of iterations are predicted less accurately: accuracy is (N-2)/N for N iterations
• Remedy
– Two bit prediction
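The (N-2)/N figure can be reproduced with a small simulation. This is a sketch of a single 1-bit entry, assuming the loop branch is taken N-1 times and then falls through, and that the entry is not shared with another branch:

```python
def one_bit_mispredictions(outcomes, initial=False):
    """Count mispredictions of one 1-bit entry; True means taken."""
    prediction = initial
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # 1 bit: remember only the last outcome
    return misses

N = 10
loop_run = [True] * (N - 1) + [False]    # taken N-1 times, then exit
outcomes = loop_run * 3                  # the loop is entered three times
print(one_bit_mispredictions(outcomes))  # 6: two per run of the loop
```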
Branch prediction Buffer (BPB) – 2bit

• Change the prediction only after two misses
• [State diagram: four states, PT (11), PT (10), PNT (01) and PNT (00), where PT predicts taken and PNT predicts not taken, with transitions labelled Taken and Not Taken]
Branch prediction Buffer (BPB) – 2 bit

• The prediction must miss twice before it is changed
• In a loop branch, at the last loop iteration, we do not need to change the prediction
• For each index in the table, the 2 bits encode the four states of a finite state machine
• Implemented as a two-bit counter (saturating arithmetic) with the MSB as the prediction
• Generalization: n-bit saturating counter for each entry in the prediction buffer
– Not much gain in performance
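A minimal sketch of the two-bit saturating counter, under the assumptions that taken increments and not taken decrements the counter, and that the entry starts warmed up in the strongly-taken state (11). On the same loop pattern it misses only at loop exit:

```python
def two_bit_mispredictions(outcomes, counter=3):
    """Counter ranges over 0..3; its MSB (counter >= 2) is the prediction."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Saturating update: clamp at 3 (binary 11) and 0 (binary 00).
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses

N = 10
outcomes = ([True] * (N - 1) + [False]) * 3   # three runs of an N-iteration loop
print(two_bit_mispredictions(outcomes))       # 3: one miss per loop exit
```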
History based predictor
• Repeating patterns such as N T N T N T or N T T N T T
• 100% predictable if the pattern is understood
• The prediction changes based on the history of the branch
• Each prediction entry has 2 fields: the history, and a prediction for each history value
• Example: 1 bit of history and one-bit predictions
• [Figure: BPB entry, indexed by k bits of the branch address, holding History, Prediction (NT) and Prediction (T) fields]
History based predictor
History   Prediction (NT)   Prediction (T)   Branch outcome
NT        NT                NT               T
T         T                 NT               NT
NT        T                 NT               T
T         T                 NT               NT
NT        T                 NT               T

• The predictor learns the pattern after a few iterations and then predicts correctly
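The table can be traced with a short sketch. It assumes the history field is simply the branch's last outcome and that the prediction selected by that history is overwritten with the actual outcome; both fields start as NT, as in the first row.

```python
def history_predictor_misses(outcomes):
    """1-bit history selects one of two 1-bit predictions; True = taken."""
    history = False                           # last outcome, starts as NT
    prediction = {False: False, True: False}  # Prediction (NT), Prediction (T)
    misses = 0
    for taken in outcomes:
        if prediction[history] != taken:
            misses += 1
        prediction[history] = taken   # train the entry that was just used
        history = taken
    return misses

# Branch outcome column of the table: T, NT, T, NT, T
print(history_predictor_misses([True, False, True, False, True]))  # 1
```

Only the first outcome is mispredicted; after that the alternating pattern is captured.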
Correlating predictor
• Branch prediction needs to be more accurate
• Accuracy can be increased by predicting a branch from its own behavior together with the behavior of other branches
• Different correlations can exist among branches
• Ex1: If the first branch is not taken, the second is also not taken
If (a==2)
--
If (a==2 and b==5)
• Ex2: If the first branch is taken, the second is not taken
If (a==2)
---
If (a==0)
Correlating predictor
If (aa==2) B1
aa=0;
If (bb==2) B2
bb=0;
If (aa!=bb) B3
{
• Each branch here skips its if-body when taken. If B1 and B2 are both not taken, aa and bb are both set to 0, so B3 will be taken
• Correlating predictor (two-level predictor): uses the behavior of other branches to make a prediction
• (m,n) predictor: uses the behavior of the last m branches to choose from 2^m branch predictors
• Each of the 2^m predictors is an n-bit predictor for a single branch
Correlating predictor
If (d==0) B1
d=1;
If (d==1) B2
Possible execution sequence

Initial value of d   d==0?   b1          Value of d before b2   d==1?   b2
0                    Yes     Not taken   1                      Yes     Not taken
1                    No      Taken       1                      Yes     Not taken
2                    No      Taken       2                      No      Taken

• b1 not taken => b2 will not be taken
• A correlating predictor can take advantage of this
One bit predictor
If (d==0) B1
d=1;
If (d==1) B2
Predictions are initialized to Not taken (NT) and d alternates between 2 and 0

d=   b1 prediction   b1 action   New b1 prediction   b2 prediction   b2 action   New b2 prediction
2    NT              T           T                   NT              T           T
0    T               NT          NT                  T               NT          NT
2    NT              T           T                   NT              T           T
0    T               NT          NT                  T               NT          NT

All branches are mispredicted



Correlating predictor (1,1)
• Each branch has two different branch prediction buffers
• Written X/Y: X is the predictor used when the previous branch in the application was not taken
• Y is the predictor used when the previous branch in the application was taken
• The contents of the two branch prediction buffers are determined by the branch to which they belong
• Which of the two branch prediction buffers is used depends on the outcome of the previous branch in the application
Correlating predictor (1,1)

d=   b1 prediction   b1 action   b2 prediction   b2 action
2    NT/NT                       NT/NT
0
2
0

• The branch prediction buffers for the branches b1 and b2 are assumed to hold the prediction 'Not taken' for both options (previous branch not taken/taken)
Correlating predictor (1,1)
d=   b1 prediction   b1 action   b2 prediction   b2 action
2    NT/NT           T           NT/NT           T
0    T/NT            NT          NT/T            NT
2    T/NT            T           NT/T            T
0    T/NT            NT          NT/T            NT

For comparison, the one-bit predictor on the same sequence:

d=   b1 prediction   b1 action   New b1 prediction   b2 prediction   b2 action   New b2 prediction
2    NT              T           T                   NT              T           T
0    T               NT          NT                  T               NT          NT
2    NT              T           T                   NT              T           T
0    T               NT          NT                  T               NT          NT
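Both tables can be reproduced with one small simulation. This is a sketch under stated assumptions: b1 is taken when d != 0, b2 is taken when d != 1, every entry starts as NT, and the branch before the first b1 is treated as not taken. The function and variable names are illustrative.

```python
def run(d_values, correlating):
    """Count mispredictions of b1 and b2 over a sequence of initial d values."""
    # table[branch][outcome of previous branch] -> 1-bit prediction (True = taken)
    table = {"b1": {False: False, True: False},
             "b2": {False: False, True: False}}
    last = False   # outcome of the previous branch, assumed NT at the start
    misses = 0
    for d in d_values:
        b1_taken = d != 0          # b1 skips "d = 1" when d != 0
        if not b1_taken:
            d = 1
        b2_taken = d != 1          # b2 is taken when d != 1
        for name, taken in (("b1", b1_taken), ("b2", b2_taken)):
            # The one-bit predictor ignores history and uses a single entry.
            key = last if correlating else False
            if table[name][key] != taken:
                misses += 1
            table[name][key] = taken
            last = taken
    return misses

seq = [2, 0, 2, 0]
print(run(seq, correlating=False))  # 8: every prediction is wrong
print(run(seq, correlating=True))   # 2: only the first pass mispredicts
```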
Correlating predictor (2,2)
Branch target buffer

• BHB or correlating predictors are used in the decode stage of the pipeline
• The branch penalty can be further reduced by predicting the branch target in the fetch stage
• A branch target buffer (BTB) is used to store the branch target address
• Based on the prediction, the PC is updated with either the next sequential PC or the branch target address from the BTB
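The PC selection described above might look like the following sketch. It models the BTB as a dictionary from branch PC to target and takes the direction predictor as a callable; the names, the 4-byte instruction width, and the addresses are all assumptions made for illustration.

```python
INSTRUCTION_BYTES = 4  # assumed fixed-width instructions

def next_pc(pc, btb, predict_taken):
    """Choose the fetch address for the next cycle.

    btb maps a branch's PC to its last known target; predict_taken is any
    direction predictor (a hypothetical callable returning True for taken).
    """
    target = btb.get(pc)              # a miss means: not known to be a branch
    if target is not None and predict_taken(pc):
        return target
    return pc + INSTRUCTION_BYTES     # fall through to the sequential PC

btb = {0x1004: 0x1010}                # a branch at 0x1004 that targets 0x1010
always_taken = lambda pc: True
print(hex(next_pc(0x1000, btb, always_taken)))  # 0x1004
print(hex(next_pc(0x1004, btb, always_taken)))  # 0x1010
```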
Predication

• The compiler converts a control dependency into a data dependency, eliminating the branch
• Each instruction has a predicate bit, set based on the predicate computation
• Only instructions with TRUE predicates are committed (the others are turned into NOPs)
Predication example
I1        ADDS  R3, R2, R1
I2        BCS   next
I3        MOV   R4, #0
I4        B     stop
I5 next:  MOV   R4, #1
I6 stop:  ----
          ----
• Add [R2] and [R1]
• Save the result together with the carry
• [R4][R3] holds the sum
Predication example

I1  ADDS   R3, R2, R1
I2  MOVCS  R4, #1
I3  MOVCC  R4, #0

• Modified ISA
• Supports conditional execution of instructions (ARM)
• If the condition is not satisfied, the instruction is not executed and acts as a NOP
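The equivalence of the two listings can be illustrated in software. This is only a model, assuming 32-bit registers; each Python line is annotated with the ARM instruction it stands for, and the function names are hypothetical.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit registers

def add_with_branch(r1, r2):
    total = r2 + r1
    r3, carry = total & MASK, total > MASK   # ADDS R3, R2, R1
    if carry:                                # BCS next
        r4 = 1                               # next: MOV R4, #1
    else:
        r4 = 0                               # MOV R4, #0
    return r4, r3

def add_predicated(r1, r2):
    total = r2 + r1
    r3, carry = total & MASK, total > MASK   # ADDS R3, R2, R1
    r4 = None
    # Both moves are issued; only the one whose condition holds writes R4.
    r4 = 1 if carry else r4                  # MOVCS R4, #1
    r4 = 0 if not carry else r4              # MOVCC R4, #0
    return r4, r3

print(add_predicated(0xFFFFFFFF, 1))   # (1, 0)
print(add_with_branch(2, 3))           # (0, 5)
```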
Predication- pros and cons
+
• No branch hazard
• Straight-line code: better prefetching
• More independent instructions: better code scheduling
-
• Overuse of hardware
• Difficult for complex conditions (nested if)
• Difficult to add predicated instructions to an existing ISA
Control hazard - summary
• Static or dynamic
• Static
– Branch prediction
– Branch delay slot
• Dynamic
– BHB: single bit, two bit
– Improve prediction
» Correlated predictor (Pentium)
» Hybrid local/global predictor (Alpha)
– Early target determination
» BTB (Pentium, Itanium)
» Next address in I-cache (Alpha)
» Return address stack
– Reduce misprediction penalty
» Fetch both instruction streams (IBM mainframe)
– Eliminate branches
» Predicated execution (Itanium, ARM)