
1.Pipelining & ILP

The document discusses advanced computer architecture concepts, focusing on pipelining and its associated hazards, including structural, data, and control hazards. It explains various types of data hazards (RAW, WAR, WAW) and methods to resolve them, such as forwarding and stalling. Additionally, it covers instruction-level parallelism (ILP) and techniques to reduce stalls in pipelined architectures.

Uploaded by

Herlin L.T.

CS2354 - Advanced Computer Architecture

Pipelining

CS252/Culler
1/24/02 Lec 2.1
Review: Visualizing Pipelining

[Figure: pipeline diagram. Horizontal axis: time in clock cycles (Cycle 1 to Cycle 7); vertical axis: instruction order. Each of four instructions passes through Ifetch, Reg, ALU, DMem, Reg, starting one cycle after the instruction before it.]

Limits to pipelining

• Hazards: circumstances that would cause incorrect execution if the next instruction were launched
  – Structural hazards: attempting to use the same hardware to do two different things at the same time
  – Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
  – Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)

Example: One Memory Port/Structural Hazard

[Figure: pipeline diagram over Cycle 1 to Cycle 7 for Load, Instr 1, Instr 2, Instr 3, Instr 4, each passing through Ifetch, Reg, ALU, DMem, Reg. With a single memory port, the Load's DMem access and Instr 3's Ifetch fall in the same cycle (Cycle 4); this collision is marked "Structural Hazard".]

Resolving structural hazards

• Defn: attempt to use same hardware for two different things at the same time
• Solution 1: Wait
  – must detect the hazard
  – must have mechanism to stall
• Solution 2: Throw more hardware at the problem

Detecting and Resolving Structural Hazard

[Figure: the same pipeline diagram with a stall. Load, Instr 1, and Instr 2 proceed normally; a Stall row of Bubbles is inserted in place of the fourth instruction, and Instr 3 is fetched one cycle later (Cycle 5), so its Ifetch no longer collides with the Load's DMem access.]

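The stall in the structural-hazard example can be reproduced with a small model. This is a minimal sketch, assuming a 5-stage pipeline in which data memory is accessed three cycles after fetch and in which an instruction, once fetched, flows without further stalls; the function name `fetch_cycles_single_port` is made up for illustration.

```python
def fetch_cycles_single_port(is_load):
    """Return the fetch cycle of each instruction when Ifetch and DMem
    share ONE memory port (assumed 5-stage pipeline: IF, ID, EX, MEM, WB).

    is_load[i] is True when instruction i uses data memory in its MEM
    stage, i.e. three cycles after its own fetch.
    """
    fetch = []
    mem_busy = set()  # cycles in which the port is taken by a DMem access
    cycle = 1
    for load in is_load:
        # Detect the hazard and stall (insert bubbles) while the port is busy.
        while cycle in mem_busy:
            cycle += 1
        fetch.append(cycle)
        if load:
            mem_busy.add(cycle + 3)
        cycle += 1
    return fetch


# Load, Instr 1, Instr 2, Instr 3 from the slide
print(fetch_cycles_single_port([True, False, False, False]))
```

For the sequence on the slide, the fourth instruction is fetched in Cycle 5 instead of Cycle 4, which is the one-bubble stall shown in the diagram.
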
Data Hazards
[Figure: pipeline diagram over time (clock cycles) with stages IF, ID/RF, EX, MEM, WB for the sequence below. Every instruction after the add reads r1.]

    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11

Three Generic Data Hazards

• Read After Write (RAW): InstrJ tries to read operand before InstrI writes it
    I: add r1,r2,r3
    J: sub r4,r1,r3
• Caused by a “Data Dependence” (in compiler nomenclature). This hazard results from an actual need for communication.

Three Generic Data Hazards

• Write After Read (WAR): InstrJ writes operand before InstrI reads it
    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
• Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.
• Can’t happen in MIPS 5 stage pipeline because:
  – All instructions take 5 stages, and
  – Reads are always in stage 2, and
  – Writes are always in stage 5

Three Generic Data Hazards
• Write After Write (WAW): InstrJ writes operand before InstrI writes it
    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
• Called an “output dependence” by compiler writers. This also results from the reuse of name “r1”.
• Can’t happen in MIPS 5 stage pipeline because:
  – All instructions take 5 stages, and
  – Writes are always in stage 5
• Will see WAR and WAW in later, more complicated pipes

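The three definitions above reduce to simple set checks on destination and source registers. The sketch below assumes instructions written in the three-operand register form used on the slides (`op rd,rs,rt`); `parse` and `hazard_types` are illustrative names, and the result is the set of hazards a pair could cause, not the hazards a particular pipeline actually suffers.

```python
def parse(instr):
    """Split 'op rd,rs,rt' into (dest, [sources]). Register-register form only."""
    _, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return regs[0], regs[1:]


def hazard_types(instr_i, instr_j):
    """Potential hazards when instr_j follows instr_i in program order."""
    dest_i, srcs_i = parse(instr_i)
    dest_j, srcs_j = parse(instr_j)
    found = set()
    if dest_i in srcs_j:   # J reads what I writes
        found.add("RAW")
    if dest_j in srcs_i:   # J writes what I reads
        found.add("WAR")
    if dest_i == dest_j:   # J writes what I writes
        found.add("WAW")
    return found


print(hazard_types("add r1,r2,r3", "sub r4,r1,r3"))  # the RAW pair from the slides
```
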
Forwarding to Avoid Data Hazard

[Figure: pipeline diagram over time (clock cycles) for the sequence below, with the result of the add forwarded to the dependent instructions that follow it.]

    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11

Data Hazard Even with Forwarding

[Figure: pipeline diagram over time (clock cycles) for the sequence below. The loaded value is available only after the lw's DMem stage, which is too late for the ALU stage of the sub that immediately follows, even with forwarding.]

    lw  r1, 0(r2)
    sub r4,r1,r6
    and r6,r1,r7
    or  r8,r1,r9

Resolving this load hazard

• Adding hardware? ... not
• Detection?
• Compilation techniques?
• What is the cost of load delays?

Resolving the Load Data Hazard

[Figure: pipeline diagram over time (clock cycles) for the sequence below. A one-cycle Bubble is inserted after the lw, so the sub, the and, and the or are each delayed by one cycle.]

    lw  r1, 0(r2)
    sub r4,r1,r6
    and r6,r1,r7
    or  r8,r1,r9

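The detection step for this load hazard can be sketched as a check on adjacent instructions. The code below is an illustration under stated assumptions: full forwarding, so that only a load followed immediately by a consumer of its result needs a bubble; instructions are represented as hypothetical `(opcode, dest, sources)` tuples.

```python
def load_use_stalls(program):
    """Count one-cycle bubbles needed for load-use hazards.

    program is a list of (opcode, dest, sources) tuples. Assumes full
    forwarding, so the only stall is a load directly followed by an
    instruction that reads the loaded register.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "lw" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


slide_sequence = [
    ("lw",  "r1", ["r2"]),
    ("sub", "r4", ["r1", "r6"]),
    ("and", "r6", ["r1", "r7"]),
    ("or",  "r8", ["r1", "r9"]),
]
print(load_use_stalls(slide_sequence))
```

A compiler can remove the bubble by placing an independent instruction between the load and its first use, which is the "compilation techniques" option raised earlier.
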
Control Hazard on Branches => Three Stage Stall

[Figure: pipeline diagram for the sequence below. The three instructions after the beq are fetched before the branch outcome is known.]

    10: beq r1,r3,36
    14: and r2,r3,r5
    18: or  r6,r1,r7
    22: add r8,r1,r9
    36: xor r10,r1,r11

Example: Branch Stall Impact

• Two part solution:
  – Determine branch taken or not sooner, AND
  – Compute taken branch address earlier
• MIPS branch tests if register = 0 or ≠ 0
• MIPS Solution:
  – Move Zero test to ID/RF stage
  – Adder to calculate new PC in ID/RF stage
  – 1 clock cycle penalty for branch versus 3

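The effect of cutting the branch penalty from 3 cycles to 1 can be estimated with the usual stall formula, CPI = base CPI + branch fraction × penalty. The sketch below is illustrative only: the branch fraction of 25% is an assumed value, not a figure from the lecture, and all other stalls are ignored.

```python
def cpi_with_branch_penalty(branch_fraction, penalty_cycles, base_cpi=1.0):
    """Effective CPI when every branch costs penalty_cycles stall cycles."""
    return base_cpi + branch_fraction * penalty_cycles


BRANCH_FRACTION = 0.25  # assumption for illustration only

cpi_three = cpi_with_branch_penalty(BRANCH_FRACTION, 3)  # branch resolved late
cpi_one = cpi_with_branch_penalty(BRANCH_FRACTION, 1)    # zero test moved to ID/RF
print(cpi_three, cpi_one, cpi_three / cpi_one)
```

Under that assumption the 3-cycle penalty gives a CPI of 1.75 and the 1-cycle penalty gives 1.25, so moving the branch decision into ID/RF makes the pipeline about 1.4 times faster on such a workload.
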
Pipelined MIPS Datapath

[Figure: pipelined MIPS datapath with five stages: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Labeled components: Next PC and Next SEQ PC, two adders, Zero? test, instruction memory (Address), register file (RS1, RS2, RD), Sign Extend (Imm), ALU, data memory, multiplexers, WB Data.]

Overcoming Branch Hazards: Alternatives

#1: Stall until branch direction is clear
#2: Predict Branch Not Taken
#3: Predict Branch Taken

Extending Pipeline To Handle Multicycle operations

• Floating point numbers have two parts: the exponent and the significand
• The exponent and the significand must be handled separately
• Example: add 3.25 × 10^3 and 2.63 × 10^-1

      3.25     × 10^3
    + 0.000263 × 10^3    (shift the smaller operand right until the exponents match)
    = 3.250263 × 10^3

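The alignment step in the example above can be written out directly. This is a minimal sketch in decimal arithmetic using Python's `decimal` module; `fp_add` is an illustrative name, and the normalization and rounding steps a real floating-point adder performs afterwards are left out.

```python
from decimal import Decimal


def fp_add(sig_a, exp_a, sig_b, exp_b):
    """Add sig_a x 10^exp_a and sig_b x 10^exp_b.

    Returns (significand, exponent) using the larger exponent.
    No normalization or rounding is performed.
    """
    sig_a, sig_b = Decimal(sig_a), Decimal(sig_b)
    # Make operand a the one with the larger exponent.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    # Shift the smaller operand right until the exponents match.
    aligned_b = sig_b.scaleb(-(exp_a - exp_b))
    return sig_a + aligned_b, exp_a


print(fp_add("3.25", 3, "2.63", -1))
```

Because the exponents are compared, the significands shifted, and only then added, the operation naturally takes several steps, which is why it occupies more than one pipeline cycle.
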
Cont…

• An algorithm therefore has to be implemented to perform the operation.
• The functional unit must be redesigned to perform all of these operations, and a functional unit of this type requires more pipeline cycles.
• Latency of a functional unit: the number of intervening cycles between an instruction that produces a result and an instruction that uses the result.
• Initiation interval: the number of cycles that must elapse between issuing two operations of a given type.

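Both definitions translate into one-line calculations. The sketch below is an illustration, not part of the lecture: the function names are made up, and the latency and interval values in the usage lines are arbitrary example numbers.

```python
def stalls_before_use(latency, independent_between=0):
    """Stall cycles for a consumer that follows its producer.

    latency is the number of intervening cycles required between producer
    and consumer; independent_between is how many unrelated instructions
    already sit between them in program order.
    """
    return max(0, latency - independent_between)


def earliest_next_issue(previous_issue_cycle, initiation_interval):
    """Earliest cycle at which another operation of the same type may issue."""
    return previous_issue_cycle + initiation_interval


# Example values only (not taken from the slides)
print(stalls_before_use(latency=3))                         # consumer right behind producer
print(stalls_before_use(latency=3, independent_between=2))  # two fillers in between
print(earliest_next_issue(previous_issue_cycle=10, initiation_interval=25))
```

A fully pipelined unit has an initiation interval of 1, so a new operation can start every cycle even when its latency is long; a non-pipelined unit has an initiation interval close to its full execution time.
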
Latencies and initiation intervals for FU

Pipeline support for FP operations

FP example

Cont….
Assuming that the pipeline does all hazard detection in ID, there are three checks that must be performed before an instruction can issue:
• Check for structural hazards: wait until the required functional unit is available.
• Check for a RAW data hazard: wait until the source registers are not listed as pending destinations in a pipeline register that will not be available when this instruction needs the result.
• Check for a WAW data hazard: determine if any instruction in A1, ..., A4, D, M1, ..., M7 has the same register destination as this instruction.

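The three issue checks can be expressed as a single function. This is a simplified sketch with assumed data structures: an instruction is a dictionary with hypothetical keys `unit`, `dest`, and `srcs`; `busy_units` is the set of functional units that cannot accept a new operation; `pending_dests` is the set of destination registers of instructions still in flight. It treats every pending destination as unavailable, which is more conservative than the timing-aware RAW check described above.

```python
def issue_check(instr, busy_units, pending_dests):
    """Return the reason the instruction must stall in ID, or None if it may issue."""
    # 1. Structural hazard: the required functional unit is not available.
    if instr["unit"] in busy_units:
        return "structural"
    # 2. RAW hazard: a source register is still a pending destination.
    if any(src in pending_dests for src in instr["srcs"]):
        return "RAW"
    # 3. WAW hazard: an in-flight instruction has the same destination.
    if instr["dest"] in pending_dests:
        return "WAW"
    return None


addd = {"unit": "fp_add", "dest": "F10", "srcs": ["F0", "F8"]}
print(issue_check(addd, busy_units=set(), pending_dests={"F0"}))
```
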
Instruction Level Parallelism
Pipelining can overlap the execution of instructions when they are independent of one another. This potential overlap among instructions is called instruction-level parallelism (ILP), since the instructions can be evaluated in parallel.

Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously. Consider the following program:
    1. e = a + b
    2. f = c + d
    3. g = e * f

Instruction Level Parallelism

Operation 3 depends on the results of operations 1 and 2, so it cannot be calculated until both of them are completed. Operations 1 and 2 do not depend on any other operation, so they can be calculated simultaneously.

A goal of compiler and processor designers is to identify and take advantage of as much ILP as possible. Ordinary programs are typically written under a sequential execution model, where instructions execute one after the other and in the order specified by the programmer.

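The three-operation program can be turned into a number. The sketch below assumes that every operation takes one unit of time and that hardware is unlimited; under those assumptions ILP is the operation count divided by the length of the longest dependence chain. The function name `schedule_levels` and the dictionary encoding of the dependences are illustrative.

```python
def schedule_levels(deps):
    """Earliest time step for each operation.

    deps maps each operation to the list of operations whose results it
    needs. Assumes unit latency and no resource limits.
    """
    level = {}

    def visit(op):
        if op not in level:
            level[op] = 1 + max((visit(d) for d in deps[op]), default=0)
        return level[op]

    for op in deps:
        visit(op)
    return level


# 1. e = a + b    2. f = c + d    3. g = e * f
deps = {1: [], 2: [], 3: [1, 2]}
levels = schedule_levels(deps)
ilp = len(deps) / max(levels.values())
print(levels, ilp)
```

Operations 1 and 2 land in step 1 and operation 3 in step 2, so three operations finish in two steps, an ILP of 3/2.
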
Instruction Level Parallelism

The simplest and most common way to increase the amount of parallelism available among instructions is to exploit parallelism among iterations of a loop. This type of parallelism is often called loop-level parallelism.

Example 1
    for (i=1; i<=1000; i= i+1)
        x[i] = x[i] + y[i];

This is a parallel loop. Every iteration of the loop can overlap with any other iteration, although within each loop iteration there is little opportunity for overlap.

Instruction Level Parallelism

    for (i=1; i<=100; i= i+1){
        a[i] = a[i] + b[i];     //s1
        b[i+1] = c[i] + d[i];   //s2
    }

Is this loop parallel? If not, how can it be made parallel?

Statement s1 uses the value assigned in the previous iteration by statement s2, so there is a loop-carried dependency between s1 and s2. Despite this dependency, the loop can be made parallel because the dependency is not circular:
    - neither statement depends on itself;
    - while s1 depends on s2, s2 does not depend on s1.

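One common way to remove this loop-carried dependency is to peel the first s1 and the last s2 out of the loop and pair each s2 with the s1 of the following iteration. The sketch below shows that transformation in Python and checks it against the original loop; it is a standard textbook rewrite, offered as an illustration and not necessarily the form used in the lecture. The function names and the test data are made up.

```python
def original_loop(a, b, c, d, n):
    a, b = a[:], b[:]
    for i in range(1, n + 1):
        a[i] = a[i] + b[i]        # s1: reads b[i] written by s2 in iteration i-1
        b[i + 1] = c[i] + d[i]    # s2
    return a, b


def transformed_loop(a, b, c, d, n):
    a, b = a[:], b[:]
    a[1] = a[1] + b[1]                  # first s1, peeled out of the loop
    for i in range(1, n):
        b[i + 1] = c[i] + d[i]          # s2 of iteration i
        a[i + 1] = a[i + 1] + b[i + 1]  # s1 of iteration i+1, same iteration now
    b[n + 1] = c[n] + d[n]              # last s2, peeled out of the loop
    return a, b


n = 100
size = n + 2                            # indices 0..n+1 so that b[n+1] exists
a = [3 * x for x in range(size)]
b = [2 * x + 1 for x in range(size)]
c = [x * x for x in range(size)]
d = [7 - x for x in range(size)]
print(original_loop(a, b, c, d, n) == transformed_loop(a, b, c, d, n))
```

In the transformed loop the only dependence is inside a single iteration (the second statement reads the `b[i+1]` the first statement just wrote), so different iterations no longer depend on each other and can overlap.
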
Ideas To Reduce Stalls
Technique                                    Reduces
-------------------------------------------  ----------------------------------
Dynamic scheduling                           Data hazard stalls
Dynamic branch prediction                    Control stalls
Issuing multiple instructions per cycle      Ideal CPI
Speculation                                  Data and control stalls
Dynamic memory disambiguation                Data hazard stalls involving memory
Loop unrolling                               Control hazard stalls
Basic compiler pipeline scheduling           Data hazard stalls
Compiler dependence analysis                 Ideal CPI and data hazard stalls
Software pipelining and trace scheduling     Ideal CPI and data hazard stalls
Compiler speculation                         Ideal CPI, data and control stalls

Instruction Level Parallelism: Data Dependence and Hazards

• InstrJ is data dependent on InstrI: InstrJ tries to read operand before InstrI writes it
    I: add r1,r2,r3
    J: sub r4,r1,r3
• or InstrJ is data dependent on InstrK, which is dependent on InstrI
• Caused by a “True Dependence” (compiler term)
• If a true dependence causes a hazard in the pipeline, it is called a Read After Write (RAW) hazard

Instruction Level Parallelism: Data Dependence and Hazards

• Dependences are a property of programs
• Presence of a dependence indicates the potential for a hazard, but the actual hazard and the length of any stall are properties of the pipeline
• Importance of the data dependencies:
    1) indicates the possibility of a hazard
    2) determines the order in which results must be calculated
• Today: looking at HW schemes to avoid hazards

Instruction Level Parallelism: Name Dependence #1: Anti-dependence

• Name dependence: when 2 instructions use the same register or memory location, called a name, but there is no flow of data between the instructions associated with that name; there are 2 versions of name dependence
• InstrJ writes operand before InstrI reads it
    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
• Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”
• If an anti-dependence causes a hazard in the pipeline, it is called a Write After Read (WAR) hazard

Instruction Level Parallelism: Name Dependence #2: Output dependence

• InstrJ writes operand before InstrI writes it.
    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
• Called an “output dependence” by compiler writers. This also results from the reuse of name “r1”
• If an output dependence causes a hazard in the pipeline, it is called a Write After Write (WAW) hazard

Instruction Level Parallelism: Control Dependencies

• Every instruction is control dependent on some set of branches, and, in general, these control dependencies must be preserved to preserve program order
    if p1 {
        S1;
    };
    if p2 {
        S2;
    }
• S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1.

Out-Of-Order Execution

    DIVD F0,F2,F4
    ADDD F10,F0,F8
    SUBD F12,F8,F14

ADDD reads F0, which DIVD writes, so it must wait for the divide. SUBD uses neither result and can execute before ADDD.

Enables out-of-order execution => out-of-order completion

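The completion order for this sequence can be worked out with a small dataflow model. This is a rough sketch under several assumptions that are not from the lecture: one instruction issues per cycle in program order, each instruction starts as soon as it has issued and its source registers are ready, structural hazards and WAR/WAW hazards are ignored, and the latencies (10 cycles for DIVD, 2 for ADDD and SUBD) are arbitrary example values.

```python
def completion_order(program, latency):
    """Return opcodes in the order they complete.

    program is a list of (opcode, dest, sources); latency maps an opcode
    to its assumed execution time in cycles.
    """
    ready = {}  # register -> cycle at which its new value is available
    done = []
    for index, (op, dest, srcs) in enumerate(program):
        issue = index + 1  # in-order issue, one instruction per cycle
        start = max([issue] + [ready.get(s, 0) for s in srcs])
        finish = start + latency[op]
        ready[dest] = finish
        done.append((finish, op))
    return [op for _, op in sorted(done)]


program = [
    ("DIVD", "F0",  ["F2", "F4"]),
    ("ADDD", "F10", ["F0", "F8"]),
    ("SUBD", "F12", ["F8", "F14"]),
]
example_latency = {"DIVD": 10, "ADDD": 2, "SUBD": 2}  # assumed values
print(completion_order(program, example_latency))
```

With these numbers SUBD finishes first, then DIVD, then ADDD, so the instructions complete in a different order from the one in which they were issued.
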
