Modulo15 RiscV DDCArv Ch7
Pipelined RISC-V Processor
• Temporal parallelism
• Divide the single-cycle processor into 5 stages:
– Fetch
– Decode
– Execute
– Memory
– Writeback
• Add pipeline registers between stages
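The five-stage split above can be illustrated with a small scheduling sketch. It assumes an ideal pipeline with no stalls or flushes, in which instruction i enters Fetch in cycle i + 1; the function names `pipeline_schedule` and `total_cycles` are hypothetical and exist only for this illustration.

```python
# Ideal 5-stage pipeline: one instruction enters Fetch per cycle (assumption: no hazards).
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]


def pipeline_schedule(instructions):
    """Map each cycle number to the (instruction, stage) pairs active in that cycle."""
    schedule = {}
    for i, instr in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1  # instruction i is in stage s during cycle i + s + 1
            schedule.setdefault(cycle, []).append((instr, stage))
    return schedule


def total_cycles(n_instructions):
    """Cycles needed to drain n instructions through the ideal pipeline."""
    if n_instructions == 0:
        return 0
    # The first instruction needs all 5 stages; each later one finishes 1 cycle after.
    return len(STAGES) + n_instructions - 1


program = [
    "lw s2, 40(s0)",
    "add s3, s9, s10",
    "sub s4, t1, s8",
    "and s5, s11, t0",
    "sw s6, 20(t4)",
    "or s7, t2, t3",
]

for cycle, active in sorted(pipeline_schedule(program).items()):
    print(cycle, active)
```

Once the pipeline is full (cycle 5 in this example), all five stages are busy at once, which is where the temporal parallelism comes from.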
[Figure: pipelined timing diagram. Instructions 1, 2, and 3 each pass through Fetch Instruction, Decode / Read Reg, Execute ALU, Memory Read / Write, and Write Reg; each instruction starts one stage after the previous one.]
[Figure: abstract view of the pipeline over time (cycles). Each instruction uses the instruction memory (IM), register file read (RF), ALU, data memory (DM), and register file write (RF) in successive cycles:]
  lw  s2, 40(s0)
  add s3, s9, s10
  sub s4, t1, s8
  and s5, s11, t0
  sw  s6, 20(t4)
  or  s7, t2, t3
[Figure: single-cycle datapath with instruction memory, register file, Extend unit, ALU, and data memory; labeled signals include PCPlus4, PCTarget, ImmExt, ALUResult, WriteData, and Result.]
[Figure: pipelined datapath. Pipeline registers clocked by CLK separate the stages; labeled signals include RD2E, ImmExtD, ImmExtE, PCTargetE, WriteDataE, WriteDataM, ALUResultM, ReadDataW, PCPlus4M, PCPlus4W, and ResultW.]
Signals in the pipelined processor are appended with the first letter of their stage (e.g., PCD, PCE).
[Figure: corrected pipelined datapath. The destination register field (bits 11:7) is carried through the pipeline as RdD, RdE, RdM, and RdW, so the register file write address arrives in the same stage as ResultW.]
[Figure: pipelined processor with control. The control unit decodes op (bits 6:0), funct3 (bits 14:12), and funct7 bit 5 (instruction bit 30) in the Decode stage. Control signals such as BranchD, ALUControlD[2:0], ALUSrcD, and ImmSrcD[1:0] are produced there, and those needed later are pipelined with the data (BranchE, ALUControlE[2:0], ALUSrcE).]
Pipelined Processor Hazards
• A hazard occurs when an instruction depends on a result from an earlier instruction that hasn’t completed
• Types:
– Data hazard: register value not yet written back to the register file
– Control hazard: next instruction not decided yet (caused by a branch)
[Figure: data hazard. add writes s8, and the instructions that follow read s8; sub and or need it before add has written it back:]
  add s8, s4, s5
  sub s2, s8, s3
  or  s9, t6, s8
  and s7, s8, t2
[Figure: the same sequence with two nops inserted after add, so that sub reads s8 only after it has been written:]
  add s8, s4, s5
  nop
  nop
  sub s2, s8, s3
  or  s9, t6, s8
  and s7, s8, t2
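The nop insertion shown above can be sketched as a small compile-time pass. This is an illustration under stated assumptions, not the scheduler of any real compiler: it handles only three-operand R-type instructions written as `op rd, rs1, rs2`, it assumes no forwarding hardware, and it assumes a value written in Writeback can be read by Decode in the same cycle, so only producers 1 or 2 instructions ahead cause a hazard. The names `parse` and `insert_nops` are hypothetical.

```python
def parse(instr):
    """Split 'add s8, s4, s5' into (rd, [rs1, rs2]); R-type syntax only (assumption)."""
    _, operands = instr.split(None, 1)
    rd, rs1, rs2 = [reg.strip() for reg in operands.split(",")]
    return rd, [rs1, rs2]


def insert_nops(program, hazard_window=2):
    """Insert nops so no instruction reads a register written 1..hazard_window slots earlier."""
    out = []
    for instr in program:
        _, sources = parse(instr)
        needed = 0
        for distance in range(1, hazard_window + 1):
            if distance > len(out):
                break
            prev = out[-distance]
            if prev == "nop":
                continue
            rd, _ = parse(prev)
            # Ignore the hard-wired zero register; it never carries a dependence.
            if rd != "zero" and rd in sources:
                needed = max(needed, hazard_window + 1 - distance)
        out.extend(["nop"] * needed)
        out.append(instr)
    return out


program = [
    "add s8, s4, s5",
    "sub s2, s8, s3",
    "or s9, t6, s8",
    "and s7, s8, t2",
]

for line in insert_nops(program):
    print(line)
```

For this sequence the pass emits two nops between add and sub and none elsewhere, because or and and then sit far enough behind add.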
[Figure: pipeline diagrams of the same add, sub, or, and sequence (add s8, s4, s5; sub s2, s8, s3; or s9, t6, s8; and s7, s8, t2), with time marked in cycles 1 to 8.]
[Figure: pipelined datapath with forwarding. The source register numbers Rs1D/Rs1E (bits 19:15) and Rs2D/Rs2E (bits 24:20) are carried alongside RdD, RdE, RdM, and RdW. The Hazard Unit drives ForwardAE and ForwardBE, which control three-input multiplexers (selects 00, 01, 10) in front of the ALU.]
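The forwarding decision made by the Hazard Unit can be sketched as follows. This is a minimal model under assumptions that do not appear on the slide: the write-enable inputs `reg_write_m` and `reg_write_w` are assumed signals, and the select encoding (10 for the Memory-stage ALU result, 01 for the Writeback result, 00 for the register file value) is assumed from the multiplexer labels in the figure. Registers are given by number, and x0 is never forwarded.

```python
def forward_select(rs_e, rd_m, reg_write_m, rd_w, reg_write_w):
    """Return the 2-bit select for one ALU operand (ForwardAE or ForwardBE).

    rs_e: source register number of the instruction in Execute (Rs1E or Rs2E)
    rd_m, rd_w: destination register numbers in Memory and Writeback (RdM, RdW)
    reg_write_m, reg_write_w: assumed write-enable flags for those stages
    """
    if rs_e != 0 and reg_write_m and rs_e == rd_m:
        return 0b10  # newest value: forward from the Memory stage
    if rs_e != 0 and reg_write_w and rs_e == rd_w:
        return 0b01  # otherwise forward from the Writeback stage
    return 0b00  # no match: use the value read from the register file


# add s8, s4, s5 is in Memory (RdM = x24) while sub s2, s8, s3 is in Execute.
S8, S3 = 24, 19
forward_ae = forward_select(S8, rd_m=24, reg_write_m=True, rd_w=0, reg_write_w=False)
forward_be = forward_select(S3, rd_m=24, reg_write_m=True, rd_w=0, reg_write_w=False)
print(format(forward_ae, "02b"), format(forward_be, "02b"))
```

Checking the Memory stage before the Writeback stage matters when both hold the same destination register: the Memory-stage instruction is the more recent one, so its value must win.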
[Figure: load data hazard ("Trouble!"). lw does not have the value of s7 until the end of its Memory stage, but the next instruction needs s7 at the start of its Execute stage:]
  lw  s7, 40(s5)
  and s8, s7, t3
  or  t2, s6, s7
  sub s3, s7, s2
[Figure: the same lw, and, or, sub sequence with a one-cycle stall. and repeats its Decode stage and or repeats its Fetch stage, after which s7 is available to and.]
(Stall the Fetch and Decode stages, and flush the Execute stage.)
[Figure: pipelined datapath with stall support. The Hazard Unit receives ResultSrcE0 and the register numbers, and drives StallF and StallD (register enable inputs, EN) and FlushE (register clear input, CLR) in addition to ForwardAE and ForwardBE.]
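The stall condition for a load followed by a dependent instruction can be sketched as below. It assumes that ResultSrcE0 is 1 exactly when the instruction in Execute is a load, and that the Hazard Unit compares the Decode-stage source registers with the Execute-stage destination register; the function name `load_stall` is hypothetical.

```python
def load_stall(result_src_e0, rs1_d, rs2_d, rd_e):
    """Return the stall/flush outputs for a load-use hazard.

    result_src_e0: 1 when the Execute-stage instruction is a load (assumption)
    rs1_d, rs2_d: source register numbers of the instruction in Decode
    rd_e: destination register number of the instruction in Execute
    """
    lw_stall = bool(result_src_e0) and rd_e in (rs1_d, rs2_d)
    # Hold Fetch and Decode in place and clear Execute, inserting one bubble.
    return {"StallF": lw_stall, "StallD": lw_stall, "FlushE": lw_stall}


# lw s7, 40(s5) is in Execute (RdE = x23); and s8, s7, t3 is in Decode.
S7, T3 = 23, 28
print(load_stall(result_src_e0=1, rs1_d=S7, rs2_d=T3, rd_e=S7))
```

After the one-cycle bubble the load is in Writeback when the dependent instruction reaches Execute, so the loaded value can be forwarded in the usual way.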
Pipelined Processor Control Hazards
• beq:
– Branch outcome is not determined until the Execute stage of the pipeline
– Instructions after the branch are fetched before the branch is resolved
– These 2 instructions must be flushed if the branch is taken
[Figure: control hazard. beq is resolved in Execute, by which time sub and or have already been fetched; these instructions are flushed when the branch is taken:]
  20      beq s1, s2, L1
  24      sub s8, t1, s3
  28      or  s9, t6, s5
  2C      ...
  ...     ...
  58  L1: add s7, s3, s4
[Figure: pipelined datapath with control hazard support. The Hazard Unit gains a FlushD output alongside FlushE, StallD, StallF, ForwardAE, and ForwardBE, so that the Decode and Execute pipeline registers can both be cleared (CLR).]
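The flush outputs can be sketched in the same style as the other Hazard Unit signals. This is an illustration under assumptions not shown on the slide: a taken branch is modeled as BranchE AND Zero (jumps are ignored), and FlushE is also asserted for a load stall. The parameter `lw_stall` and the function name `hazard_flush` are hypothetical.

```python
def hazard_flush(branch_e, zero_e, lw_stall):
    """Return FlushD and FlushE for the instruction currently in Execute.

    branch_e: BranchE, 1 when the Execute-stage instruction is a branch
    zero_e: ALU Zero flag; for beq it is 1 when the two operands are equal
    lw_stall: 1 when a load-use stall is in progress (assumed separate input)
    """
    branch_taken = bool(branch_e) and bool(zero_e)
    return {
        "FlushD": branch_taken,  # discard the instruction that is in Fetch
        "FlushE": bool(lw_stall) or branch_taken,  # discard the one in Decode
    }


# beq s1, s2, L1 in Execute with s1 == s2: both younger instructions are flushed.
print(hazard_flush(branch_e=1, zero_e=1, lw_stall=0))
```

A branch that is not taken leaves both flush outputs at 0, so the two sequential instructions already in the pipeline simply continue.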
[Figure: pipelined processor datapath with the Hazard Unit and its signals (ForwardAE, ForwardBE, StallF, StallD, FlushD, FlushE).]
Pipelined Performance
Pipelined Processor Performance Example
• SPECINT2000 benchmark:
– 25% loads
– 10% stores
– 13% branches
– 52% R-type
• Suppose:
– 40% of loads used by next instruction
– 50% of branches mispredicted
• What is the average CPI? (Ideally it’s 1, but stalls and flushes raise it.)
– Load CPI = 1 when not stalling, 2 when stalling
  So, CPI_lw = 1(0.6) + 2(0.4) = 1.4
– Branch CPI = 1 when predicted correctly, 3 when mispredicted (the 2 instructions after the branch are flushed)
  So, CPI_beq = 1(0.5) + 3(0.5) = 2
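The average CPI follows by weighting each instruction class by its share of the mix. The sketch below assumes that stores and R-type instructions never stall, so their CPI is 1; that assumption is not stated on the slide. The dictionary names are hypothetical.

```python
# Instruction mix from the SPECINT2000 example above.
mix = {"load": 0.25, "store": 0.10, "branch": 0.13, "rtype": 0.52}

# Per-class CPI: loads and branches as derived above; stores and R-type assumed to be 1.
cpi = {
    "load": 1 * 0.6 + 2 * 0.4,    # 40% of loads stall for one cycle
    "store": 1,
    "branch": 1 * 0.5 + 3 * 0.5,  # 50% of branches flush two instructions
    "rtype": 1,
}

average_cpi = sum(mix[kind] * cpi[kind] for kind in mix)
print(round(average_cpi, 2))  # 0.35 + 0.10 + 0.26 + 0.52 = 1.23
```

Under these assumptions the pipeline averages about 1.23 cycles per instruction rather than the ideal 1.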
• The Decode and Writeback stages both use the register file in every cycle
• So each stage gets half of the cycle time (Tc/2) to do its work
• Stated differently, twice the delay of each stage must fit in one cycle (Tc)