L04 Pipelining
Lecture 4 - Pipelining
Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152
[Figure: datapath with PC, Inst. Memory, GPRs, Imm Select, ALU, Data Memory, branch logic (Bcomp?) and ALU Control, divided into five phases: fetch phase, decode & Reg-fetch phase, execute phase, memory phase, write-back phase]
Clock period can be reduced by dividing the execution of an
instruction into multiple cycles
t_C > max {t_IM, t_RF, t_ALU, t_DM, t_RW} (= t_DM, probably)
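As a rough illustration of this bound, the sketch below compares a clock that must cover every phase in series with a clock that covers only the slowest phase. The delay values (in picoseconds) are made-up assumptions, not figures from the lecture, and the serial sum is assumed to be the worst case for an unpipelined design.

```python
# Hypothetical phase delays in picoseconds (illustrative values only).
delays = {"t_IM": 200, "t_RF": 100, "t_ALU": 150, "t_DM": 250, "t_RW": 100}

def single_cycle_period(d):
    """Whole instruction in one cycle: the clock must cover every phase in series."""
    return sum(d.values())

def multi_cycle_bound(d):
    """One phase per cycle: the clock period is bounded below by the slowest phase."""
    return max(d.values())

print(single_cycle_period(delays))  # 800
print(multi_cycle_bound(delays))    # 250
```

With these assumed numbers the slowest phase is the data memory, which matches the "= t_DM, probably" remark.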
Stages: I-Fetch (IF); Decode, Reg. Fetch (ID); Execute (EX); Memory (MA); Write-Back (WB)

time          t0   t1   t2   t3   t4   t5   t6   t7   ...
instruction1  IF1  ID1  EX1  MA1  WB1
instruction2       IF2  ID2  EX2  MA2  WB2
instruction3            IF3  ID3  EX3  MA3  WB3
instruction4                 IF4  ID4  EX4  MA4  WB4
instruction5                      IF5  ID5  EX5  MA5  WB5
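The staggered diagram above can be generated mechanically. The following is a minimal sketch, assuming an ideal pipeline with no stalls; the function names `ideal_schedule` and `total_cycles` are hypothetical helpers introduced only for illustration.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]

def ideal_schedule(n_instructions):
    """Return {instruction number: {cycle: stage label}} for a stall-free pipeline."""
    schedule = {}
    for i in range(1, n_instructions + 1):
        start = i - 1  # instruction i is fetched in cycle t(i-1)
        schedule[i] = {start + s: f"{name}{i}" for s, name in enumerate(STAGES)}
    return schedule

def total_cycles(n_instructions):
    """Cycles until the last instruction leaves WB: pipeline depth + (n - 1)."""
    return len(STAGES) + n_instructions - 1

sched = ideal_schedule(5)
for i, row in sched.items():
    cells = [row.get(t, "").ljust(4) for t in range(total_cycles(5))]
    print(f"instruction{i}  " + " ".join(cells).rstrip())
```

Once the pipeline is full, one instruction completes per cycle, so n instructions take depth + n - 1 cycles in this idealized model.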
[Figure: datapath with an IR register after Inst. Memory (PC, Inst. Memory, IR, GPRs, Imm Select, ALU, Data Memory), divided into five stages: I-Fetch (IF); Decode, Reg. Fetch (ID); Execute (EX); Memory (MA); Write-Back (WB)]
time        t0  t1  t2  t3  t4  t5  t6  t7  ...
Resources
IF          I1  I2  I3  I4  I5
ID              I1  I2  I3  I4  I5
EX                  I1  I2  I3  I4  I5
MA                      I1  I2  I3  I4  I5
WB                          I1  I2  I3  I4  I5
January 31, 2012 CS152, Spring 2012 12
Pipelined Execution:
ALU Instructions
[Figure: pipelined datapath with an IR register in each stage after fetch and pipeline registers A, B, Y, R, MD1, MD2]
[Figure: pipelined datapath annotated with control signals ImmSel, Op2Sel, FuncSel, MemWrite, WBSel, WASel, RegWriteEn]
...
x1 ← x0 + 10
x4 ← x1 + 17     x1 is stale. Oops!
...
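To see why x1 is stale, here is a minimal sketch that models only register reads in ID and register writes in WB, with no interlock or bypass. The initial value 99 for x1 and the cycle bookkeeping are assumptions chosen for illustration.

```python
def run_without_interlock():
    """Model register reads in ID and writes in WB for the two-instruction sequence."""
    regs = {"x0": 0, "x1": 99, "x4": 0}  # x1 holds an arbitrary old value (assumed)
    pending_writes = {}                   # cycle -> (register, value) written in WB
    for cycle in range(6):
        if cycle in pending_writes:       # WB stage updates the register file
            reg, value = pending_writes[cycle]
            regs[reg] = value
        if cycle == 1:                    # I1 (fetched at t0) reads x0 in ID
            pending_writes[4] = ("x1", regs["x0"] + 10)
        if cycle == 2:                    # I2 (fetched at t1) reads x1 in ID, too early
            pending_writes[5] = ("x4", regs["x1"] + 17)
    return regs

result = run_without_interlock()
print(result["x1"])  # 10
print(result["x4"])  # 116, not the intended 27
```

I2 reads x1 at t2, but I1 does not write it until t4, so x4 is computed from the old value.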
Resolving Data Hazards (1)
Strategy 1: Interlock: wait for the result to be available by freezing earlier pipeline stages
[Figure: pipelined datapath with a stall condition that can insert a bubble into the execute-stage IR]
...
x1 ← x0 + 10
x4 ← x1 + 17
...
time   t0  t1  t2  t3  t4  t5  t6  t7  ....
Resource Usage
IF     I1  I2  I3  I3  I3  I3  I4  I5
ID         I1  I2  I2  I2  I2  I3  I4  I5
EX             I1  -   -   -   I2  I3  I4  I5
MA                 I1  -   -   -   I2  I3  I4
WB                     I1  -   -   -   I2  I3
- = pipeline bubble
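The number of bubbles in the table can be derived from stage offsets. This is a small sketch, assuming (as the table shows) that a consumer may only leave ID in a cycle after the producer's WB; `stall_cycles` and `distance` are hypothetical names.

```python
ID_OFFSET = 1  # an instruction is in ID one cycle after it is fetched
WB_OFFSET = 4  # and in WB four cycles after it is fetched

def stall_cycles(distance):
    """Bubbles needed when the consumer is `distance` instructions after its producer."""
    producer_wb = WB_OFFSET            # producer fetched at cycle 0
    earliest_id = distance + ID_OFFSET # consumer's ID cycle if nothing stalls
    required_id = producer_wb + 1      # first cycle in which the new value is readable
    return max(0, required_id - earliest_id)

for d in range(1, 5):
    print(d, stall_cycles(d))
```

For the adjacent pair in the example (distance 1) this gives three bubbles, matching I2 being held in ID from t2 through t5.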
[Figure: pipelined datapath with stall logic and bubble insertion]
[Figure: the same datapath with a Cdest control block]
                                      source(s)   destination
ALU     rd ← rs1 func10 rs2           rs1, rs2    rd
ALUI    rd ← rs1 op imm               rs1         rd
LW      rd ← M[rs1 + imm]             rs1         rd
SW      M[rs1 + imm] ← rs2            rs1, rs2    -
Bcond   rs1, rs2                      rs1, rs2    -
          true:  PC ← PC + imm
          false: PC ← PC + 4
J       PC ← PC + imm                 -           -
JAL     x1 ← PC, PC ← PC + imm        -           x1
JALR    rd ← PC, PC ← rs1 + imm       rs1         rd
Cstall
stall = ((rs1D = wsE).weE +
         (rs1D = wsM).weM +
         (rs1D = wsW).weW) . re1D +
        ((rs2D = wsE).weE +
         (rs2D = wsM).weM +
         (rs2D = wsW).weW) . re2D
This is not the full story!
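The Cstall expression translates almost directly into code. The sketch below mirrors the formula as written and nothing more; the `Stage` record and the argument names are hypothetical stand-ins for the ws/we signals of the E, M and W stages.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    ws: int = 0       # destination register specifier of the instruction in this stage
    we: bool = False  # True if that instruction writes a register

def stall(rs1_d, rs2_d, re1_d, re2_d, e, m, w):
    """Stall decode if a source it actually reads is still to be written by E, M or W."""
    def pending(rs):
        return any(s.we and s.ws == rs for s in (e, m, w))
    return bool((re1_d and pending(rs1_d)) or (re2_d and pending(rs2_d)))

# x4 <- x1 + 17 in decode while x1 <- x0 + 10 is in execute
print(stall(1, 0, True, False, Stage(1, True), Stage(), Stage()))
```

The re1D and re2D terms matter: an instruction that does not read rs2 (for example ALUI or LW) should not stall on a coincidental match in that field.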
Hazards due to Loads & Stores
[Figure: pipelined datapath with Stall Condition logic and bubble insertion]
...
M[x1+7] ← x2
x4 ← M[x3+5]
...
Is there any possible data hazard in this instruction sequence?
What if x1+7 = x3+5 ?
x1+7 = x3+5 → data hazard
Strategy 2: Bypass: route data to the earlier pipeline stage as soon as it is calculated
x4 ← x1 ...     x1 ← ...
[Figure: pipelined datapath (stages D, E, M, W) with an ASrc mux on the A input]
[Figure: the same datapath with both ASrc and BSrc muxes]
Is there still a need for the stall signal?
stall = (rs1D = wsE).(opcodeE = LWE).(wsE ≠ 0).re1D
      + (rs2D = wsE).(opcodeE = LWE).(wsE ≠ 0).re2D
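With the bypass muxes in place, the remaining stall condition is the load-use case. A minimal sketch of that expression follows; the function and argument names are hypothetical, and the opcode is represented as a plain string for illustration.

```python
def stall_with_bypass(rs1_d, rs2_d, re1_d, re2_d, ws_e, opcode_e):
    """Stall only when decode needs the result of a load that is still in execute."""
    load_in_e = opcode_e == "LW" and ws_e != 0
    uses_rs1 = re1_d and rs1_d == ws_e
    uses_rs2 = re2_d and rs2_d == ws_e
    return bool(load_in_e and (uses_rs1 or uses_rs2))

# x1 <- M[x2 + 0] in execute, x4 <- x1 + 17 in decode: one bubble is still needed
print(stall_with_bypass(1, 0, True, False, 1, "LW"))
# x1 <- x0 + 10 in execute: the ALU result can be bypassed, so no stall
print(stall_with_bypass(1, 0, True, False, 1, "ALUI"))
```

The load is the exception because its data is not available until the memory stage, one cycle too late for the dependent instruction's execute stage.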
Inst 1, Inst 2, Bubble, Inst 3:
    3 instructions finish in 4 cycles, CPI = 4/3 = 1.33
Inst 1, Bubble 1, Inst 2, Bubble 2, Inst 3:
    3 instructions finish in 5 cycles, CPI = 5/3 = 1.67
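The CPI figures above follow from counting each bubble as one extra cycle. A small sketch, with `cpi` as a hypothetical helper name:

```python
def cpi(instructions, bubbles):
    """Cycles per instruction when every instruction and every bubble takes one cycle."""
    return (instructions + bubbles) / instructions

print(round(cpi(3, 1), 2))  # 1.33
print(round(cpi(3, 2), 2))  # 1.67
```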
Resolving Data Hazards (3)
Strategy 3: Speculate on the dependence; if the guess is wrong, kill and restart
[Figure: pipelined datapath with ASrc and BSrc muxes, bubble insertion, and a Guess_zero input]
What do we need to calculate the next PC?
– For Jumps
» Opcode, PC and offset
– For Jump Register
» Opcode, Register value, and PC
– For Conditional Branches
» Opcode, Register (for condition), PC and offset
– For all other instructions
» Opcode and PC
• have to know it’s not one of above
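The list above can be read as the inputs of a next-PC function. The sketch below is an illustration under assumptions: opcode names are plain strings, targets follow the PC + imm and rs1 + imm forms used in the instruction table of these slides, and `next_pc` with its parameters is a hypothetical helper.

```python
def next_pc(opcode, pc, offset=0, reg_value=0, taken=False):
    """Compute the next PC from the information each instruction class needs."""
    if opcode in ("J", "JAL"):      # jumps: opcode, PC and offset
        return pc + offset
    if opcode == "JALR":            # jump register: opcode and register value
        return reg_value + offset
    if opcode == "Bcond":           # branches: opcode, condition, PC and offset
        return pc + offset if taken else pc + 4
    return pc + 4                   # everything else: opcode and PC

print(next_pc("ADD", 96))
print(next_pc("Bcond", 100, offset=40, taken=True))
```

Even the fall-through case needs the opcode, because the hardware has to know the instruction is not a jump or branch before it can trust PC + 4.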
time   t0  t1  t2  t3  t4  t5  t6  t7  ....
Resource Usage
IF     I1  -   I2  -   I3  -   I4
ID         I1  -   I2  -   I3  -   I4
EX             I1  -   I2  -   I3  -   I4
MA                 I1  -   I2  -   I3  -   I4
WB                     I1  -   I2  -   I3  -
- = pipeline bubble
[Figure: fetch and decode stages with a Jump? signal and an IRSrcD mux that can insert a bubble into IR]

Any interaction between stall and jump?

I1  096  ADD
I2  100  J 304
I3  104  ADD     kill
I4  304  ADD

IRSrcD = Case opcodeD
           J, JAL  → bubble
           ...     → IM
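The IRSrcD case statement is a two-way select on the opcode in decode. A minimal sketch, with string opcodes and the helper name `ir_src_d` as illustrative assumptions:

```python
def ir_src_d(opcode_d):
    """Select what is loaded into the decode-stage IR for the next cycle."""
    if opcode_d in ("J", "JAL"):
        return "bubble"  # kill the instruction fetched right after the jump
    return "IM"          # otherwise take the instruction from instruction memory

# While J 304 sits in decode, the ADD fetched from 104 is replaced by a bubble.
print(ir_src_d("J"))
print(ir_src_d("ADD"))
```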
Jump Pipeline Diagrams
time               t0   t1   t2   t3   t4   t5   t6   t7   ...
(I1) 096: ADD      IF1  ID1  EX1  MA1  WB1
(I2) 100: J 304         IF2  ID2  EX2  MA2  WB2
(I3) 104: ADD                IF3  -    -    -    -
(I4) 304: ADD                     IF4  ID4  EX4  MA4  WB4
time   t0  t1  t2  t3  t4  t5  t6  t7  ....
Resource Usage
IF     I1  I2  I3  I4  I5
ID         I1  I2  -   I4  I5
EX             I1  I2  -   I4  I5
MA                 I1  I2  -   I4  I5
WB                     I1  I2  -   I4  I5
- = pipeline bubble
[Figure: fetch, decode and execute stages with BEQ? and Taken? signals, an IRSrcD mux that can insert a bubble, and the ALU]

I1  096  ADD
I2  100  BEQ x1,x2 +200
I3  104  ADD
I4  304  ADD

Branch condition is not known until the execute stage.
What action should be taken in the decode stage?
Pipelining Conditional Branches
[Figure: branch pipeline with PCSrc (pc+4 / jabs / rind / br), stall, Bcond? and Taken? signals, and an IRSrcD mux that can insert a bubble]
[Figure: the same pipeline with Jump?, Bcond? and Taken? signals and both IRSrcD and IRSrcE muxes that can insert bubbles]
time                  t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) 096: ADD         IF1  ID1  EX1  MA1  WB1
(I2) 100: BEQZ +200        IF2  ID2  EX2  MA2  WB2
(I3) 104: ADD                   IF3  ID3  -    -    -
(I4) 108:                            IF4  -    -    -
(I5) 304: ADD                             IF5  ID5  EX5  MA5
time   t0  t1  t2  t3  t4  t5  t6  t7  ....
Resource Usage
IF     I1  I2  I3  I4  I5
ID         I1  I2  I3  -   I5
EX             I1  I2  -   -   I5
MA                 I1  I2  -   -   I5
WB                     I1  I2  -   -   I5
- = pipeline bubble
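The bubble pattern above can be reproduced with a small front-end model. This is a sketch under assumptions: it tracks only PC, the decode IR and the execute IR; jumps are resolved in decode and taken branches in execute; the branch is assumed taken; the opcode at address 108 is not given in the slides, so an ADD is used as a placeholder; all names are hypothetical.

```python
BUBBLE = ("-", None, False)  # (opcode, target, taken)

# address -> (opcode, target, taken); the entry at 108 is a placeholder assumption
program = {
    96:  ("ADD", None, False),
    100: ("BEQZ", 304, True),
    104: ("ADD", None, False),
    108: ("ADD", None, False),
    304: ("ADD", None, False),
}

def simulate(program, start_pc, cycles):
    """Return the address of the instruction in EX each cycle (None for a bubble)."""
    pc = start_pc
    ir_d = ir_e = (None, BUBBLE)  # (address, instruction) held in decode / execute
    ex_row = []
    for _ in range(cycles):
        fetched = (pc, program[pc]) if pc in program else (None, BUBBLE)
        ex_row.append(ir_e[0])
        opcode_e, target_e, taken_e = ir_e[1]
        opcode_d, target_d, _ = ir_d[1]
        if opcode_e == "BEQZ" and taken_e:
            # Taken branch resolved in execute: kill both younger instructions.
            pc, ir_e, ir_d = target_e, (None, BUBBLE), (None, BUBBLE)
        elif opcode_d in ("J", "JAL"):
            # Jump resolved in decode: kill only the instruction being fetched.
            pc, ir_e, ir_d = target_d, ir_d, (None, BUBBLE)
        else:
            pc, ir_e, ir_d = pc + 4, ir_d, fetched
    return ex_row

print(simulate(program, 96, 7))
```

In this model a taken branch costs two bubbles and a jump costs one, which is the difference between the jump and branch resource-usage tables.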
[Figure: datapath with a Bcomp unit generating the Taken? signal; other labels: PC, Inst. Memory, IR, GPRs, Imm Select, BSrc, ALU, MD1]