1.Pipelining & ILP
Pipelining
CS252/Culler
1/24/02 Lec 2.1
Review: Visualizing Pipelining
[Figure: four instructions in program order, each passing through the stages Ifetch, Reg, ALU, DMem, Reg.]
Limits to pipelining
Example: One Memory Port/Structural Hazard
[Figure: pipeline diagram for Load, Instr 1, Instr 2, Instr 3, Instr 4 through the stages Ifetch, Reg, ALU, DMem, Reg, with a structural hazard marked on the shared memory port.]
Resolving structural hazards
Detecting and Resolving Structural Hazard
[Figure: pipeline diagram for Load, Instr 1, Instr 2, Stall, Instr 3. The Stall row is a bubble in every stage, and Instr 3 enters Ifetch after it.]
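A minimal sketch of the stall in the diagram, assuming instruction fetch and data access share one memory port, only loads use the port in the DMem stage, and an older instruction always wins the port. The function name `schedule_single_port` is hypothetical.

```python
def schedule_single_port(is_load):
    """Return the Ifetch cycle of each instruction when IF and MEM share one port.

    is_load[i] is True when instruction i accesses data memory in its MEM stage.
    Assumption: once fetched, an instruction never stalls again.
    """
    fetch_cycle = []
    busy = set()               # cycles in which a load's MEM stage holds the port
    cycle = 1
    for load in is_load:
        while cycle in busy:   # port taken: the fetch waits (a bubble enters)
            cycle += 1
        fetch_cycle.append(cycle)
        if load:
            busy.add(cycle + 3)  # MEM is three stages after IF
        cycle += 1
    return fetch_cycle

# Load, Instr 1, Instr 2, Instr 3: Instr 3 is fetched one cycle late.
print(schedule_single_port([True, False, False, False]))  # [1, 2, 3, 5]
```

With no loads in flight the fetch cycles are simply 1, 2, 3, 4, so the stall is caused only by the shared port.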
Data Hazards
[Figure: pipeline diagram over time (clock cycles) with stages IF, ID/RF, EX, MEM, WB for the instruction sequence below.]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
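A rough sketch of which readers of r1 are in trouble in this sequence. It assumes no forwarding and no stalls, one instruction entering the pipeline per cycle, and a register file that writes in the first half of a cycle and reads in the second half. The function name `raw_hazards` is hypothetical.

```python
def raw_hazards(program):
    """List (producer, consumer) index pairs where the read in ID precedes the write in WB.

    program: list of (dest, src1, src2) register names in program order.
    Assumption: a write and a read of the same register in the same cycle is safe.
    """
    hazards = []
    for i, (dest, _, _) in enumerate(program):
        wb_cycle = i + 5                  # IF in cycle i+1, WB four cycles later
        for j in range(i + 1, len(program)):
            id_cycle = j + 2              # ID/RF is the second stage
            if dest in program[j][1:] and id_cycle < wb_cycle:
                hazards.append((i, j))
    return hazards

program = [
    ("r1", "r2", "r3"),    # add r1,r2,r3
    ("r4", "r1", "r3"),    # sub r4,r1,r3
    ("r6", "r1", "r7"),    # and r6,r1,r7
    ("r8", "r1", "r9"),    # or  r8,r1,r9
    ("r10", "r1", "r11"),  # xor r10,r1,r11
]
print(raw_hazards(program))  # [(0, 1), (0, 2)]
```

Under these assumptions only sub and and read r1 too early; or reads it in the same cycle that add writes it, and xor reads it afterwards.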
Three Generic Data Hazards
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
• Called an “output dependence” by compiler writers.
This also results from the reuse of name “r1”.
• Can’t happen in MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Writes are always in stage 5
• Will see WAR and WAW in later more complicated pipes
Forwarding to Avoid Data Hazard
[Figure: pipeline diagram with stages Ifetch, Reg, ALU, DMem, Reg for the instruction sequence below, showing the result of add forwarded to the later instructions.]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
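As an illustration of where each consumer of r1 could obtain the value, here is a small sketch. It assumes the classic five-stage pipeline with forwarding paths from the EX/MEM and MEM/WB pipeline registers and a register file that writes in the first half of a cycle and reads in the second half. The function name `forwarding_source` is hypothetical.

```python
def forwarding_source(distance):
    """Where a consumer gets an ALU result produced `distance` instructions earlier."""
    if distance < 1:
        raise ValueError("consumer must follow the producer")
    if distance == 1:
        return "EX/MEM"        # result taken straight from the ALU output latch
    if distance == 2:
        return "MEM/WB"        # result taken one pipeline register later
    if distance == 3:
        return "register file (same-cycle write then read)"
    return "register file"     # value is already written back

consumers = ["sub r4,r1,r3", "and r6,r1,r7", "or r8,r1,r9", "xor r10,r1,r11"]
for distance, text in enumerate(consumers, start=1):
    print(text, "<-", forwarding_source(distance))
```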
Data Hazard Even with Forwarding
[Figure: pipeline diagram with stages Ifetch, Reg, ALU, DMem, Reg for the instruction sequence below.]
lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9
Resolving this load hazard
Resolving the Load Data Hazard
[Figure: pipeline diagram for the sequence below with one bubble inserted after the lw, delaying sub, and, and or by one cycle each.]
lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9
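The bubble in this diagram can be sketched as a simple interlock. This assumes full forwarding, so only the instruction directly after a load can need a stall and a single bubble is enough. The function name `insert_load_bubbles` and the tuple layout are hypothetical.

```python
def insert_load_bubbles(program):
    """Return the program with a 'bubble' after each load whose result is used at once.

    program: list of (opcode, dest, sources) tuples in program order.
    """
    out = []
    for instr in program:
        prev = out[-1] if out else None
        # Stall only when the previous instruction is a load feeding this one.
        if prev and prev != "bubble" and prev[0] == "lw" and prev[1] in instr[2]:
            out.append("bubble")
        out.append(instr)
    return out

program = [
    ("lw", "r1", ("r2",)),        # lw  r1, 0(r2)
    ("sub", "r4", ("r1", "r6")),  # sub r4,r1,r6
    ("and", "r6", ("r1", "r7")),  # and r6,r1,r7
    ("or", "r8", ("r1", "r9")),   # or  r8,r1,r9
]
for slot in insert_load_bubbles(program):
    print(slot)
```

Only sub is delayed by the interlock itself; and and or simply follow one cycle later and receive r1 through forwarding or the register file.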
Control Hazard on Branches => Three Stage Stall
[Figure: pipeline diagram with stages Ifetch, Reg, ALU, DMem, Reg for the instruction sequence below.]
10: beq r1,r3,36
14: and r2,r3,r5
18: or r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11
Example: Branch Stall Impact
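A back-of-the-envelope sketch of the impact of a three-cycle branch stall on CPI. The base CPI of 1 and the 30% branch frequency are illustrative assumptions, not values taken from this text, and the function name `cpi_with_branch_stalls` is hypothetical.

```python
def cpi_with_branch_stalls(base_cpi, branch_fraction, stall_cycles):
    """Effective CPI when every branch adds `stall_cycles` stall cycles."""
    return base_cpi + branch_fraction * stall_cycles

# Assumed workload: ideal CPI of 1, 30% branches, 3 stall cycles per branch.
print(round(cpi_with_branch_stalls(1.0, 0.30, 3), 2))  # 1.9
```

Under these assumed numbers the pipeline runs at almost half its ideal rate, which is why the alternatives to stalling matter.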
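Pipelined MIPS Datapath
[Figure: datapath divided by the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. Labeled elements: SEQ PC, two adders, the constant 4, Zero? test, Reg File with RS1, RS2, and RD, Sign Extend, Imm, ALU, two memories (Address, Data), MUXes, WB Data.]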
OVERCOME Branch Hazard
Alternatives
#1: Stall until branch direction is clear
Extending Pipeline To Handle Multicycle Operations
Floating point numbers have two parts: exponent and significand.
We should deal with the exponent and the significand separately.
Example: add the two numbers
3.25 x 10^3
2.63 x 10^-1
Align the exponents, then add the significands:
3.25 x 10^3
0.000263 x 10^3 - shift the smaller operand right until the exponents match
3.250263 x 10^3
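The align-then-add step can be sketched in decimal to mirror the example. Real floating-point hardware works in binary and also normalizes and rounds the result, which this sketch leaves out. The function name `add_decimal_fp` is hypothetical.

```python
from decimal import Decimal

def add_decimal_fp(sig_a, exp_a, sig_b, exp_b):
    """Add sig_a * 10**exp_a and sig_b * 10**exp_b by aligning to the larger exponent.

    Returns (significand, exponent) without normalizing or rounding.
    """
    if exp_a < exp_b:                    # make operand a the one with the larger exponent
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    shift = exp_a - exp_b                # digits to shift the smaller operand right
    aligned_b = sig_b.scaleb(-shift)     # e.g. 2.63 becomes 0.000263 for a shift of 4
    return sig_a + aligned_b, exp_a

sig, exp = add_decimal_fp(Decimal("3.25"), 3, Decimal("2.63"), -1)
print(sig, "x 10^", exp)  # 3.250263 x 10^ 3
```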
Cont…
So a multi-step algorithm must be implemented to
perform a floating-point operation.
The functional unit must be redesigned to perform
all of these steps, and this type of functional unit
requires a longer pipeline cycle.
Latency in the functional unit:
the number of intervening cycles between an
instruction that produces a result and an
instruction that uses the result.
Initiation interval:
the number of cycles that must elapse between
issuing two operations of a given type.
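The two definitions can be turned into a small timing sketch. The numbers used (an adder with latency 3 and initiation interval 1, a divider with initiation interval 25) are illustrative assumptions, and both function names are hypothetical.

```python
def issue_cycles(n_ops, initiation_interval, start=0):
    """Earliest issue cycles for n_ops back-to-back operations on one functional unit."""
    return [start + k * initiation_interval for k in range(n_ops)]

def earliest_consumer_issue(producer_issue, latency):
    """Earliest issue cycle of an instruction that uses the producer's result.

    Latency counts the intervening cycles, so latency 0 means the very next cycle.
    """
    return producer_issue + latency + 1

print(issue_cycles(3, 1))             # [0, 1, 2]   fully pipelined unit
print(issue_cycles(3, 25))            # [0, 25, 50] unpipelined unit
print(earliest_consumer_issue(0, 3))  # 4
```

A fully pipelined unit has an initiation interval of 1 regardless of its latency; an unpipelined unit cannot accept a new operation until the previous one has left it.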
Latencies and initiation intervals
for FU
Pipeline support for FP
operations
FP example
Cont….
Assuming that the pipeline does all hazard
detection in ID, there are three checks that must
be performed before an instruction can issue:
Check for structural hazards: wait until the
required functional unit is available.
Check for a RAW data hazard: wait until the
source registers are not listed as pending
destinations in a pipeline register that will not
be available when this instruction needs the result.
Check for a WAW data hazard: stall in ID if an
instruction still in execution has the same
destination register as this instruction.
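A simplified sketch of issue-time checking in ID. It treats every pending destination as unavailable, which is stricter than the rule stated here, and it includes a WAW test on the assumption that write-after-write is the remaining check. The function name `can_issue` and the dictionary layout are hypothetical.

```python
def can_issue(instr, busy_units, pending_dests):
    """Decide whether `instr` may leave ID this cycle.

    instr: dict with keys 'unit', 'dest', 'sources'.
    busy_units: functional units that cannot accept a new operation.
    pending_dests: registers still to be written by instructions in flight.
    Returns (ok, reason).
    """
    if instr["unit"] in busy_units:
        return False, "structural"          # required functional unit not available
    if any(src in pending_dests for src in instr["sources"]):
        return False, "RAW"                 # a source is still being produced
    if instr["dest"] in pending_dests:
        return False, "WAW"                 # assumed third check: same destination in flight
    return True, "issue"

addd = {"unit": "fp_add", "dest": "F10", "sources": ("F0", "F8")}
print(can_issue(addd, busy_units={"fp_div"}, pending_dests={"F0"}))  # (False, 'RAW')
```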
Instruction Level Parallelism
I: add r1,r2,r3
J: sub r4,r1,r3
• InstrJ is data dependent on InstrI if InstrJ reads a
result that InstrI writes (here, r1),
or InstrJ is data dependent on InstrK which is
dependent on InstrI
• Caused by a “True Dependence” (compiler term)
• If true dependence caused a hazard in the pipeline,
called a Read After Write (RAW) hazard
Instruction Level Parallelism: Data Dependence and Hazards
Instruction Level Parallelism: Name Dependence #1: Anti-dependence
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
• Called an “anti-dependence” by compiler writers.
This results from reuse of the name “r1”
• If anti-dependence caused a hazard in the pipeline,
called a Write After Read (WAR) hazard
Instruction Level Parallelism: Name Dependence #2: Output dependence
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
• Called an “output dependence” by compiler
writers.
This also results from the reuse of name “r1”
• If output dependence caused a hazard in the
pipeline, called a Write After Write (WAW) hazard
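The three kinds of dependence can be told apart by comparing the registers two instructions read and write. A minimal sketch, with the function name `classify` and the tuple layout as hypothetical choices:

```python
def classify(first, second):
    """Return the dependences of `second` on the earlier instruction `first`.

    Each instruction is (dest, sources). Labels are the hazard names each
    dependence would produce in a pipeline.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    kinds = set()
    if dest1 in srcs2:
        kinds.add("RAW")   # true dependence: second reads what first writes
    if dest2 in srcs1:
        kinds.add("WAR")   # anti-dependence: second overwrites what first reads
    if dest1 == dest2:
        kinds.add("WAW")   # output dependence: both write the same register
    return kinds

# add r1,r2,r3 then sub r4,r1,r3
print(classify(("r1", ("r2", "r3")), ("r4", ("r1", "r3"))))  # {'RAW'}
# sub r4,r1,r3 then add r1,r2,r3
print(classify(("r4", ("r1", "r3")), ("r1", ("r2", "r3"))))  # {'WAR'}
# sub r1,r4,r3 then add r1,r2,r3
print(classify(("r1", ("r4", "r3")), ("r1", ("r2", "r3"))))  # {'WAW'}
```

Only the RAW case reflects a real flow of data; the other two come from reusing a register name, which is why they are grouped as name dependences.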
Instruction Level Parallelism: Control Dependencies
Out-Of-Order Execution
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F12,F8,F14
Enables out-of-order execution => out-of-order
completion
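A toy timing model of this three-instruction sequence. It assumes an instruction may start as soon as its sources are ready even if an earlier instruction is still waiting, ignores structural hazards, and uses made-up latencies (24 cycles for DIVD, 3 for ADDD and SUBD). The function name `completion_times` is hypothetical.

```python
def completion_times(program, latency):
    """Return the cycle in which each instruction finishes.

    program: list of (name, dest, sources), one instruction entering per cycle.
    latency: execution cycles per opcode (illustrative values only).
    """
    ready = {}   # register -> cycle in which its new value becomes available
    done = {}
    for cycle, (name, dest, sources) in enumerate(program):
        # Start when the instruction has arrived and all of its sources are ready.
        start = max([cycle] + [ready.get(src, 0) for src in sources])
        finish = start + latency[name]
        ready[dest] = finish
        done[name] = finish
    return done

program = [
    ("DIVD", "F0", ("F2", "F4")),
    ("ADDD", "F10", ("F0", "F8")),
    ("SUBD", "F12", ("F8", "F14")),
]
print(completion_times(program, {"DIVD": 24, "ADDD": 3, "SUBD": 3}))
# {'DIVD': 24, 'ADDD': 27, 'SUBD': 5}
```

ADDD must wait for F0 from DIVD, but SUBD depends on neither, so under these assumptions it executes and completes first.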