Pipeline Hazards
Presented by
Ajal.A.J
AP/ ECE
Pipelining
Break instructions into steps
Work on instructions like in an assembly line
Allows for more instructions to be executed
in less time
An n-stage pipeline is (in theory) n times faster than a non-pipelined processor
What is Pipelining?
Note:
Slight variations depending on processor
Without Pipelining
• Normally, you would perform the fetch, decode, execute, operate, and write steps of an instruction and then move on to the next instruction
[Timing diagram: Instr 1 occupies clock cycles 1-5, Instr 2 occupies clock cycles 6-10]
With Pipelining
• The processor works on several instructions at once, one in each stage.
[Timing diagram: Instr 1-5 enter the pipeline on successive clock cycles and overlap]
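A small sketch in Python (the instruction counts are arbitrary) of the cycle counts behind these two diagrams: without pipelining each instruction occupies the processor for all five stages before the next one starts, while with pipelining a new instruction enters every cycle.

STAGES = 5

def cycles_without_pipelining(num_instructions):
    # each instruction takes all 5 stage-cycles before the next begins
    return num_instructions * STAGES

def cycles_with_pipelining(num_instructions):
    # the first instruction fills the pipeline, then one finishes per cycle
    return STAGES + (num_instructions - 1)

for n in (2, 5, 100):
    print(n, cycles_without_pipelining(n), cycles_with_pipelining(n))
# 2 instructions: 10 cycles vs 6; the gap widens as more instructions run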
Pipeline (cont.)
The length of each pipeline stage is set by the longest step
Thus in RISC, all instructions were made the same length
Each stage takes 1 clock cycle
In theory, an instruction should finish every clock cycle
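A small illustration in Python (the per-stage latencies are made-up numbers, not from the slides) of why the longest step sets the pace: every stage must fit in one clock cycle, so the cycle time equals the latency of the slowest stage.

# hypothetical per-stage latencies in nanoseconds
stage_latency_ns = {"IF": 2.0, "ID": 1.0, "EX": 2.0, "MEM": 2.5, "WB": 1.0}

# the pipeline clock must accommodate the slowest stage
cycle_time_ns = max(stage_latency_ns.values())

# a non-pipelined design would need one long cycle covering every step
single_cycle_ns = sum(stage_latency_ns.values())

print("pipelined cycle time:", cycle_time_ns, "ns")        # 2.5 ns
print("non-pipelined cycle time:", single_cycle_ns, "ns")  # 8.5 ns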
Stages of Execution in Pipelined
MIPS
5 stage instruction pipeline
1) I-fetch: Fetch Instruction, Increment PC
2) Decode: Decode Instruction, Read Registers
3) Execute:
Mem-reference: Calculate Address
R-format: Perform ALU Operation
4) Memory:
Load: Read Data from Data Memory
Store: Write Data to Data Memory
5) Write Back: Write Data to Register
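A toy Python model of these five stages (the instruction names are just labels I chose) that advances every instruction one stage per clock; printing it reproduces the staggered pattern shown in the next slide.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def show_pipeline(instructions):
    # instruction i enters the pipeline at cycle i, one stage per cycle
    total_cycles = len(STAGES) + len(instructions) - 1
    for cycle in range(total_cycles):
        row = []
        for i, instr in enumerate(instructions):
            stage_index = cycle - i
            if 0 <= stage_index < len(STAGES):
                row.append(f"{instr}:{STAGES[stage_index]}")
        print(f"cycle {cycle + 1}: " + "  ".join(row))

show_pipeline(["lw", "add", "sub", "and"])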
Pipelined Execution
Representation
IFtch Dcd   Exec  Mem   WB
      IFtch Dcd   Exec  Mem   WB
            IFtch Dcd   Exec  Mem   WB
                  IFtch Dcd   Exec  Mem   WB
                        IFtch Dcd   Exec  Mem   WB
Program Flow
Dynamic pipeline: Uses
buffers to hold
instruction bits in case a
dependent instruction stalls
Pipelining Lessons
Pipelining doesn't help the latency (execution time) of a single task; it helps the throughput of the entire workload
Multiple tasks operate simultaneously using different resources
Potential speedup = number of pipe stages
Time to "fill" the pipeline and time to "drain" it reduces speedup
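A sketch of the speedup arithmetic behind these lessons (the stage and instruction counts are assumed values): the ideal speedup equals the number of stages, but the cycles spent filling and draining the pipeline keep the measured speedup below that until the workload is long.

def pipeline_speedup(num_stages, num_instructions):
    # non-pipelined: every instruction takes num_stages cycles
    unpipelined_cycles = num_stages * num_instructions
    # pipelined: fill the pipeline once, then finish one instruction per cycle
    pipelined_cycles = num_stages + (num_instructions - 1)
    return unpipelined_cycles / pipelined_cycles

for k in (5, 50, 5000):
    print(k, round(pipeline_speedup(5, k), 2))  # approaches 5 as k grows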
Structural Hazards. They arise from resource conflicts when the hardware cannot support all
possible combinations of instructions in simultaneous overlapped execution.
Data Hazards. They arise when an instruction depends on the result of a previous instruction
in a way that is exposed by the overlapping of instructions in the pipeline.
Control Hazards. They arise from the pipelining of branches and other instructions
that change the PC.
What Makes Pipelining Hard?
Exceptional events that interrupt the pipeline, for example:
Power failure
Arithmetic overflow
I/O device request
OS call
Page fault
Pipeline Hazards
Structural hazard
Resource conflicts when the hardware cannot support
all possible combinations of instructions simultaneously
Data hazard
An instruction depends on the results of a previous
instruction
Branch hazard
Instructions that change the PC
Structural hazard
Some pipelined processors share a single memory for both instructions and data
Single Memory is a Structural
Hazard
[Pipeline diagram: a Load followed by Instr 1-4, drawn in instruction order against clock cycles; in one cycle the Load's memory access and a later instruction's fetch both need the single memory]
Can’t read same memory twice in same clock cycle
Structural hazard
Memory is accessed in both the FI (fetch instruction) and FO (fetch operand) stages

S1: Fetch Instruction (FI)
S2: Decode Instruction (DI)
S3: Fetch Operand (FO)
S4: Execute Instruction (EI)
S5: Write Operand (WO)
Time       1  2  3  4  5  6  7  8  9
S1 (FI)    1  2  3  4  5  6  7  8  9
S2 (DI)       1  2  3  4  5  6  7  8
S3 (FO)          1  2  3  4  5  6  7
S4 (EI)             1  2  3  4  5  6
S5 (WO)                1  2  3  4  5
(table entries are instruction numbers)
Structural hazard
To solve this hazard, we “stall” the pipeline until the
resource is freed
A stall is commonly called a pipeline bubble, since it
floats through the pipeline taking space but carrying no
useful work
Structural Hazards limit
performance
Example: if there are 1.3 memory accesses per instruction (30% of instructions execute loads and stores)
and only one memory access is possible per cycle, then
Average CPI ≥ 1.3
Otherwise the memory resource would be more than 100% utilized
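The arithmetic behind that bound, as a short sketch (the 30% load/store mix is the slide's example figure): with a single memory port every instruction fetch and every data access needs its own cycle on the port, so the port alone forces the average CPI to at least 1.3.

instruction_fetches_per_instr = 1.0
data_accesses_per_instr = 0.3      # 30% of instructions are loads or stores
memory_ports = 1                   # only one memory access per cycle

memory_accesses_per_instr = instruction_fetches_per_instr + data_accesses_per_instr

# each access occupies the single port for a cycle, so CPI cannot drop below this
min_cpi = memory_accesses_per_instr / memory_ports
print("average CPI >=", min_cpi)   # 1.3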
Structural Hazard Solution: Add
more Hardware
Structural hazard
[Space-time diagram of the FI, DI, FO, EI, WO stages over successive cycles]
Data hazard
Example:
ADD R1 ← R2 + R3
SUB R4 ← R1 - R5
AND R6 ← R1 AND R7
OR R8 ← R1 OR R9
XOR R10 ← R1 XOR R11
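A small sketch that scans this sequence for read-after-write dependences (the (destination, sources) encoding of each instruction is my own convention): it reports every later instruction that reads a register written by an earlier one, and here every instruction after the ADD depends on R1.

# (opcode, destination, sources) for the example above
program = [
    ("ADD", "R1",  ("R2", "R3")),
    ("SUB", "R4",  ("R1", "R5")),
    ("AND", "R6",  ("R1", "R7")),
    ("OR",  "R8",  ("R1", "R9")),
    ("XOR", "R10", ("R1", "R11")),
]

for i, (op_i, dest, _) in enumerate(program):
    for j in range(i + 1, len(program)):
        op_j, _, sources = program[j]
        if dest in sources:
            print(f"{op_j} (instr {j + 1}) reads {dest} written by {op_i} (instr {i + 1})")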
Data hazard
FO: fetch the data value    WO: store the executed value
[Space-time diagram: a later instruction reaches its FO stage before the earlier instruction's WO stage has stored the result it needs]
Data hazard
The delayed-load approach inserts no-operation instructions to avoid the data conflict
ADD R1 ← R2 + R3
No-op
No-op
SUB R4 ← R1 - R5
AND R6 ← R1 AND R7
OR R8 ← R1 OR R9
XOR R10 ← R1 XOR R11
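A sketch of that delayed-load idea (the spacing of two no-ops comes from the slide; the exact number depends on the pipeline): pad the instruction stream with no-ops until a producer's result is far enough ahead of its first consumer.

HAZARD_DISTANCE = 3   # a result is safe to read 3 slots after it is produced

def insert_noops(program):
    """program: list of (dest, sources) tuples in program order."""
    scheduled = []
    for dest, sources in program:
        # look backwards for the most recent write to one of our sources
        for back, earlier in enumerate(reversed(scheduled), start=1):
            if earlier != "NOP" and earlier[0] in sources and back < HAZARD_DISTANCE:
                scheduled.extend(["NOP"] * (HAZARD_DISTANCE - back))
                break
        scheduled.append((dest, sources))
    return scheduled

print(insert_noops([("R1", ("R2", "R3")), ("R4", ("R1", "R5"))]))
# [('R1', ('R2', 'R3')), 'NOP', 'NOP', ('R4', ('R1', 'R5'))]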
Data hazard
It can be further solved by a simple hardware technique called
forwarding (also called bypassing or short-circuiting)
The key insight in forwarding is that SUB does not really need the result until ADD actually produces it; if the result can be moved from the pipeline register where ADD stores it to where SUB needs it, the stall can be avoided
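A sketch of the forwarding check in Python rather than hardware (the EX/MEM and ID/EX pipeline-register names follow the usual textbook convention; the slide does not spell them out): if the instruction one stage ahead is about to write the register the ALU wants to read, the ALU takes that value straight from the pipeline register instead of the register file.

def forward_a(ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd, id_ex_rs):
    """Choose where the ALU's first operand comes from."""
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return "EX/MEM"    # forward the result computed in the previous cycle
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return "MEM/WB"    # forward the value that is about to be written back
    return "REGFILE"       # no hazard: read the register file normally

# ADD R1,R2,R3 followed immediately by SUB R4,R1,R5: bypass from EX/MEM
print(forward_a(True, 1, False, 0, 1))   # EX/MEM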
Read After Write (RAW)
A read after write (RAW) data hazard refers to a
situation where an instruction refers to a result that
has not yet been calculated or retrieved.
This can occur because even though an instruction
is executed after a previous instruction, the
previous instruction has not been completely
processed through the pipeline.
example:
i1. R2 <- R1 + R3
i2. R4 <- R2 + R3
Write After Read (WAR)
A write after read (WAR) data hazard represents a problem with concurrent or out-of-order execution: a later instruction writes a register before an earlier instruction has read it.
For example:
i1. R4 <- R1 + R5
i2. R5 <- R1 + R2
(i2 must not write R5 before i1 has read it)
Write After Write (WAW)
A write after write (WAW) data hazard may occur in a concurrent execution environment: two instructions write the same register, and the writes must complete in program order.
Example:
i1. R2 <- R4 + R7
i2. R2 <- R1 + R3
(i2's result must be the final value left in R2)
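A small classifier covering the three cases above (the (dest, sources) encoding is the same invented convention used earlier): it takes a pair of instructions with i2 later in program order and names every dependence between them.

def classify_hazards(i1, i2):
    """i1, i2: (dest, sources) tuples, with i2 after i1 in program order."""
    d1, s1 = i1
    d2, s2 = i2
    hazards = []
    if d1 in s2:
        hazards.append("RAW")   # i2 reads what i1 writes
    if d2 in s1:
        hazards.append("WAR")   # i2 writes what i1 still has to read
    if d1 == d2:
        hazards.append("WAW")   # both write the same register
    return hazards

print(classify_hazards(("R2", ("R1", "R3")), ("R4", ("R2", "R3"))))  # ['RAW']
print(classify_hazards(("R4", ("R1", "R5")), ("R5", ("R1", "R2"))))  # ['WAR']
print(classify_hazards(("R2", ("R4", "R7")), ("R2", ("R1", "R3"))))  # ['WAW']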
Branch Untaken
(Freeze approach)
The simplest method of dealing with branches is to
redo the fetch following a branch
Branch Taken
(Predicted-untaken)
[Space-time diagram (FI, DI, FO, EI, WO): the instructions fetched after the taken branch are discarded and fetching restarts at the branch target]
Branch Taken
(Predicted-taken)
An alternative scheme is to treat every branch as
taken
[Layout: the branch instruction, then its delay slot, then the branch target if the branch is taken]
Delayed Branch
Optimal delayed branch: fill the delay slot with an independent instruction from before the branch
If the optimal is not available: fill the slot from the branch target or the fall-through path, or with a no-op
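A minimal sketch of delay-slot filling under the "from before the branch" strategy (the (dest, sources) encoding and the very naive dependence test are my own simplifications): if the branch does not read the register written by the instruction just before it, that instruction is moved into the delay slot; otherwise a no-op fills the slot.

def fill_delay_slot(before, branch_sources):
    """before: (dest, sources) instructions preceding the branch;
    branch_sources: registers the branch itself reads."""
    if before and before[-1][0] not in branch_sources:
        # the branch is independent of its predecessor: move it into the slot
        return before[:-1], before[-1]
    return before, "NOP"          # nothing safe to move: waste the slot

# ADD R2 <- R3 + R4 just before a branch that tests R1: the ADD fills the slot
rest, slot = fill_delay_slot([("R2", ("R3", "R4"))], ("R1",))
print(rest, slot)   # [] ('R2', ('R3', 'R4'))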