CS104: Computer Organization: 30 March, 2020
[Figure: single-cycle MIPS datapath: PC, Instruction Memory, Register File, ALU (zero and ovf outputs), Data Memory, Sign Extend (16 to 32), and ALU control fed by Instr[5-0], with control signals RegWrite and RegDst. Exercise: find the active control and data-path connections.]
Load Word Instruction Data/Control Flow
[Figure: single-cycle MIPS datapath with the Control Unit driven by Instr[31-26], the control signals PCSrc, ALUOp, Branch, MemRead, MemtoReg, MemWrite, ALUSrc, RegWrite, and RegDst, the PC + 4 adder, and the branch-target adder fed by Shift left 2. Exercise: find the active control and data-path connections.]
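The load word flow can be traced in a few lines of Python. This is a minimal sketch, not taken from the slides: the function names `sign_extend16` and `execute_lw`, the dictionary-based register file and data memory, and the sample values are illustrative assumptions. The field positions follow the figure labels (Instr[25-21], Instr[20-16], Instr[15-0]).

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field (the Sign Extend 16 -> 32 block)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def execute_lw(instr, regs, data_mem):
    """Trace lw rt, imm(rs) through the datapath; returns the effective address."""
    rs = (instr >> 21) & 0x1F        # Instr[25-21] -> Read Addr 1
    rt = (instr >> 16) & 0x1F        # Instr[20-16] -> Write Addr (RegDst = 0)
    imm = sign_extend16(instr)       # Instr[15-0] -> Sign Extend
    address = regs[rs] + imm         # ALU adds; ALUSrc = 1 selects the immediate
    regs[rt] = data_mem[address]     # MemRead = 1, MemtoReg = 1, RegWrite = 1
    return address


# Hypothetical example: lw $1, 4($2) with $2 = 100 and the word 42 at address 104.
instr = (35 << 26) | (2 << 21) | (1 << 16) | 4   # opcode 35 is lw
regs = {i: 0 for i in range(32)}
regs[2] = 100
data_mem = {104: 42}
address = execute_lw(instr, regs, data_mem)
```

After the call, `address` holds the ALU result and `regs[1]` holds the word read from data memory.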
Branch Instruction Data/Control Flow
[Figure: single-cycle MIPS datapath with the Control Unit driven by Instr[31-26], the control signals PCSrc, ALUOp, Branch, MemRead, MemtoReg, MemWrite, ALUSrc, RegWrite, and RegDst, the PC + 4 adder, the branch-target adder fed by Shift left 2, and the ALU zero output.]
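The branch flow can be sketched the same way. This is an illustrative Python model under stated assumptions, not the slides' own code: `next_pc_beq` and the sample register values are hypothetical, and PCSrc is modeled as Branch AND zero, which is the usual arrangement for this datapath.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field (the Sign Extend 16 -> 32 block)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def next_pc_beq(pc, instr, regs):
    """Return the next PC for beq rs, rt, offset."""
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    offset = sign_extend16(instr)
    pc_plus_4 = pc + 4                     # first adder: PC + 4
    target = pc_plus_4 + (offset << 2)     # Shift left 2, then the second adder
    zero = (regs[rs] - regs[rt]) == 0      # ALU subtracts and raises zero
    branch = True                          # Branch is asserted for beq
    pc_src = branch and zero               # assumed: PCSrc = Branch AND zero
    return target if pc_src else pc_plus_4


# Hypothetical example: beq $1, $2, 3 fetched from address 0x100.
beq = (4 << 26) | (1 << 21) | (2 << 16) | 3     # opcode 4 is beq
regs = {i: 0 for i in range(32)}
regs[1] = regs[2] = 7
taken_pc = next_pc_beq(0x100, beq, regs)
regs[2] = 8
not_taken_pc = next_pc_beq(0x100, beq, regs)
```

The offset counts instructions relative to PC + 4, which is why it is shifted left by 2 before the add.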
Instruction Times (Critical Paths)
What is the clock cycle time, assuming negligible delays for
muxes, control unit, sign extend, PC access, shift left 2, wires,
and setup and hold times, but with these unit delays:
Instruction and Data Memory (200 ps)
ALU and adders (200 ps)
Register File access (reads or writes) (100 ps)
Instr.   I Mem   Reg Rd   ALU Op   D Mem   Reg Wr   Total
R-type
load
store
beq
jump
Instruction Critical Paths
Times in ps:
Instr.   I Mem   Reg Rd   ALU Op   D Mem   Reg Wr   Total
R-type   200     100      200      -       100      600
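The remaining rows follow from adding the delays of the units each instruction uses. The sketch below is one way to work that out in Python. Which units each instruction class uses is an assumption based on the usual single-cycle MIPS datapath; only the R-type total of 600 ps is confirmed by the table. The names `UNIT_DELAY_PS`, `UNITS_USED`, and `instruction_time` are illustrative.

```python
# Unit delays from the question, in picoseconds.
UNIT_DELAY_PS = {"I Mem": 200, "Reg Rd": 100, "ALU Op": 200, "D Mem": 200, "Reg Wr": 100}

# Assumed unit usage per instruction class (usual single-cycle MIPS datapath).
UNITS_USED = {
    "R-type": ["I Mem", "Reg Rd", "ALU Op", "Reg Wr"],
    "load":   ["I Mem", "Reg Rd", "ALU Op", "D Mem", "Reg Wr"],
    "store":  ["I Mem", "Reg Rd", "ALU Op", "D Mem"],
    "beq":    ["I Mem", "Reg Rd", "ALU Op"],
    "jump":   ["I Mem"],
}


def instruction_time(name):
    """Sum the delays along the instruction's critical path."""
    return sum(UNIT_DELAY_PS[unit] for unit in UNITS_USED[name])


totals = {name: instruction_time(name) for name in UNITS_USED}

# A single-cycle clock must accommodate the slowest instruction.
cycle_time_ps = max(totals.values())
```

Under these assumptions the load is the slowest instruction, so it sets the clock cycle time for every instruction.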
[Figure: clock timing diagram showing Cycle 1 and Cycle 2 of Clk, with lw, sw, and unused time (Waste) marked; followed by a fragment of the datapath driven by System Clock.]
MIPS Pipeline Control Path Modifications
All control signals can be determined during Decode
and held in the state registers between pipeline stages
[Figure: pipelined MIPS datapath with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB, a Control block feeding ID/EX, and the control signals PCSrc, Branch, RegWrite, MemtoReg, ALUSrc, ALUOp, and RegDst.]
Pipeline Control
IF Stage: read Instr Memory (always asserted) and write
PC (on System Clock)
ID Stage: no optional control signals to set
        EX Stage                        MEM Stage                  WB Stage
Instr.  RegDst  ALUOp1  ALUOp0  ALUSrc  Branch  MemRead  MemWrite  RegWrite  MemtoReg
R       1       1       0       0       0       0        0         1         0
lw      0       0       0       1       0       1        0         1         1
sw      X       0       0       1       0       0        1         0         X
beq     X       0       1       0       1       0        0         0         X
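The rows of control values above can be read as one value per control signal, carried forward through the pipeline registers until the stage that uses them. The Python sketch below assumes the usual textbook column order (RegDst, ALUOp1, ALUOp0, ALUSrc for EX; Branch, MemRead, MemWrite for MEM; RegWrite, MemtoReg for WB). The names `decode` and `pipeline_registers` are illustrative, not part of the lecture.

```python
# Assumed column order for the nine control values in each row.
SIGNALS = ["RegDst", "ALUOp1", "ALUOp0", "ALUSrc",
           "Branch", "MemRead", "MemWrite",
           "RegWrite", "MemtoReg"]
STAGE_OF = {"EX": SIGNALS[0:4], "MEM": SIGNALS[4:7], "WB": SIGNALS[7:9]}

# One character per signal; X marks a don't-care.
ROWS = {"R": "110000010", "lw": "000101011", "sw": "X0010010X", "beq": "X0101000X"}


def decode(instr):
    """Group an instruction's control values by the stage that consumes them."""
    bits = dict(zip(SIGNALS, ROWS[instr]))
    return {stage: {name: bits[name] for name in names}
            for stage, names in STAGE_OF.items()}


def pipeline_registers(instr):
    """Control values held in each pipeline register after Decode."""
    control = decode(instr)
    return {
        "ID/EX":  {s: control[s] for s in ("EX", "MEM", "WB")},
        "EX/MEM": {s: control[s] for s in ("MEM", "WB")},
        "MEM/WB": {s: control[s] for s in ("WB",)},
    }


lw_control = decode("lw")
sw_registers = pipeline_registers("sw")
```

Each pipeline register drops the group that was just consumed, so MEM/WB only needs RegWrite and MemtoReg.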
[Figure: pipeline diagram, instruction order (Inst 0 to Inst 4) against time; each instruction passes through IM, Reg, ALU, DM, Reg.]
Once the pipeline is full, one instruction is completed every cycle, so CPI = 1
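The diagram can be reproduced as a small schedule. The Python sketch below assumes an ideal five-stage pipeline in which each instruction starts one cycle after the previous one and never stalls; `schedule` and `completion_cycles` are illustrative names.

```python
STAGES = ["IM", "Reg", "ALU", "DM", "Reg"]


def schedule(num_instructions):
    """Map each cycle to the (instruction index, stage) pairs active in it."""
    table = {}
    for i in range(num_instructions):
        for s, stage in enumerate(STAGES):
            # Instruction i enters IM in cycle i + 1 and advances one stage per cycle.
            table.setdefault(i + s + 1, []).append((i, stage))
    return table


def completion_cycles(num_instructions):
    """Cycle in which each instruction finishes its last stage."""
    return [i + len(STAGES) for i in range(num_instructions)]


table = schedule(5)
done = completion_cycles(5)
```

For five instructions the pipeline is full in cycle 5, and from then on `done` advances by one cycle per instruction, which is the CPI = 1 behavior stated above.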
[Figure: pipeline diagram with a single memory (Mem) used for both instruction fetch and data access, for lw followed by Inst 1 to Inst 4; lw is reading data from memory while a later instruction is reading an instruction from memory.]
Fix with separate instr and data memories (I$ and D$)
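The conflict can be located by checking, for every load or store, which instruction is being fetched in its memory-access cycle. This is a minimal Python sketch assuming a five-stage pipeline that issues one instruction per cycle; `memory_conflicts` and its tuple layout are illustrative.

```python
def memory_conflicts(program, unified_memory=True):
    """Return (cycle, index of accessing instruction, index of fetched instruction).

    Instruction i is fetched in cycle i + 1 and reaches its memory stage
    three cycles later, in cycle i + 4.
    """
    conflicts = []
    for i, op in enumerate(program):
        if op not in ("lw", "sw"):
            continue                      # only loads and stores access data memory
        mem_cycle = i + 4
        fetch_index = mem_cycle - 1       # instruction fetched in that same cycle
        if unified_memory and fetch_index < len(program):
            conflicts.append((mem_cycle, i, fetch_index))
    return conflicts


program = ["lw", "Inst 1", "Inst 2", "Inst 3", "Inst 4"]
with_one_memory = memory_conflicts(program)
with_split_memories = memory_conflicts(program, unified_memory=False)
```

With one memory, the load's data access in cycle 4 collides with the fetch of the instruction at index 3; with separate instruction and data memories the list is empty.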
How About Register File Access?
[Figure: pipeline diagram, instruction order against time (clock cycles), with rows including Inst 1, Inst 2, and add $2,$1, each passing through IM, Reg, ALU, DM, Reg.]
Fix the register file access hazard by doing reads in the second half of the cycle and writes in the first half
[Figure: pipeline diagram for the sequence add $1, ; sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5.]
[Figure: pipeline diagram for the sequence lw $1,4($2); sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5.]
[Figure: pipeline diagram for the sequence beq; lw; Inst 3; Inst 4.]
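The register dependences in the lw sequence can be found mechanically. The Python sketch below assumes a five-stage pipeline without forwarding, in which the register file writes in the first half of a cycle and reads in the second half, so only consumers one or two instructions after the producer read a stale value. The tuple encoding of each instruction and the name `raw_hazards` are illustrative.

```python
# (text, destination register, source registers) for the lw sequence.
program = [
    ("lw $1,4($2)",  "$1", ["$2"]),
    ("sub $4,$1,$5", "$4", ["$1", "$5"]),
    ("and $6,$1,$7", "$6", ["$1", "$7"]),
    ("or $8,$1,$9",  "$8", ["$1", "$9"]),
    ("xor $4,$1,$5", "$4", ["$1", "$5"]),
]


def raw_hazards(program, window=2):
    """Return (producer, consumer, register, distance) for read-after-write hazards.

    window is the number of following instructions that read the register
    before the producer has written it back (assumed 2 here).
    """
    hazards = []
    for i, (producer, dest, _) in enumerate(program):
        for distance in range(1, window + 1):
            j = i + distance
            if j >= len(program):
                break
            consumer, _, sources = program[j]
            if dest in sources:
                hazards.append((producer, consumer, dest, distance))
    return hazards


hazards = raw_hazards(program)
```

Under these assumptions only sub and and are affected: or reads $1 in the same cycle that lw writes it, and xor reads it one cycle later.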
Other Pipeline Structures Are Possible
What about the (slow) multiply operation?
Make the clock twice as slow or …
let it take two cycles (since it doesn’t use the DM stage)
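[Figure: five-stage pipeline (IM, Reg, ALU, DM, Reg) with a MUL unit alongside the ALU.]
[Figure: XScale pipeline with stages labeled IM1, IM2, Reg, SHFT, ALU, DM1, DM2, Reg; activities shown under the stages: PC update, BTB access, start IM access, IM access, decode, reg 1 access, shift/rotate, reg 2 access, ALU op, start DM access, exception, DM write, reg write.]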
Summary
All modern-day processors use pipelining
Pipelining doesn’t help latency of single task, it helps
throughput of entire workload
Potential speedup: a CPI of 1 and a fast CC (clock cycle)
Pipeline rate limited by slowest pipeline stage
Unbalanced pipe stages make for inefficiencies
The time to “fill” pipeline and time to “drain” it can impact
speedup for deep pipelines and short code runs
Must detect and resolve hazards
Stalling negatively affects CPI (makes CPI greater than the ideal
of 1)
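The effect of fill time and stalls on speedup can be estimated with a short calculation. The Python sketch below assumes a 200 ps pipelined clock (the slowest unit delay given earlier) and an 800 ps single-cycle clock (the sum of all five unit delays, which is an assumption about the slowest instruction). The function names are illustrative.

```python
def pipelined_time_ps(n, stage_ps=200, stages=5, stall_cycles=0):
    """Time for n instructions: fill the pipeline, then one per cycle, plus stalls."""
    return (stages - 1 + n + stall_cycles) * stage_ps


def single_cycle_time_ps(n, cycle_ps=800):
    """Time for n instructions when each takes one long clock cycle."""
    return n * cycle_ps


def speedup(n, stall_cycles=0):
    """Speedup of the pipelined design over the single-cycle design."""
    return single_cycle_time_ps(n) / pipelined_time_ps(n, stall_cycles=stall_cycles)


def effective_cpi(n, stages=5, stall_cycles=0):
    """Cycles per instruction, including fill time and stalls."""
    return (stages - 1 + n + stall_cycles) / n


short_run = speedup(1)
long_run = speedup(1000)
stalled_run = speedup(1000, stall_cycles=500)
```

Under these assumptions a single instruction runs slower on the pipeline than on the single-cycle design, a long run approaches but never reaches a speedup of 4, and every stall cycle pushes the effective CPI further above 1.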