
Pipelining

Presented by
Ajal.A.J
AP/ ECE
Pipelining
 Break instructions into steps
 Work on instructions as in an assembly line
 Allows more instructions to be executed in less time
 An n-stage pipeline is, in theory, n times faster than a non-pipelined processor
What is Pipelining?
 Like an automobile assembly line, but for instructions
 Each step does a little of the job of processing the instruction
 Ideally each step operates in parallel
 Simple model: Instruction Fetch (F), Instruction Decode (D), Instruction Execute (E)
[Diagram: F1 D1 E1, F2 D2 E2, F3 D3 E3 overlapped, each instruction starting one step after the previous one]
Pipeline
 It is a technique of decomposing a sequential process into suboperations, with each suboperation completed in a dedicated segment.
 A pipeline is commonly known as an assembly-line operation.
 It is similar to the assembly line of car manufacturing.
 The first station in an assembly line sets up the chassis, the next station installs the engine, and another group of workers fits the body.
Pipeline Stages

We can divide the execution of an instruction into the following 5 "classic" stages:

IF: Instruction Fetch
ID: Instruction Decode, register fetch
EX: Execution
MEM: Memory Access
WB: Register Write Back
RISC Pipeline Stages
 Fetch instruction
 Decode instruction
 Execute instruction
 Access operand
 Write result

 Note: slight variations exist depending on the processor
Without Pipelining
• Normally, you would perform the fetch, decode, execute, operand access, and write steps of an instruction and then move on to the next instruction.
[Timing diagram: clock cycles 1-10; Instr 1 completes all of its steps before Instr 2 begins.]
With Pipelining
• The processor is able to work on several stages simultaneously.
• While the processor is decoding one instruction, it may also fetch another instruction at the same time.
[Timing diagram: clock cycles 1-9; Instr 1 through Instr 5 each start one cycle after the previous one, so their stages overlap.]
Pipeline (cont.)
 The clock cycle length depends on the longest (slowest) step
 Thus in RISC, all instructions were made to be the same length
 Each stage takes 1 clock cycle
 In theory, an instruction should finish each clock cycle
Stages of Execution in Pipelined
MIPS
5 stage instruction pipeline
1) I-fetch: Fetch Instruction, Increment PC
2) Decode: Decode Instruction, Read Registers
3) Execute:
Mem-reference: Calculate Address
R-format: Perform ALU Operation
4) Memory:
Load: Read Data from Data Memory
Store: Write Data to Data Memory
5) Write Back: Write Data to Register
Pipelined Execution Representation
[Diagram: five instructions, each passing through IFtch, Dcd, Exec, Mem, WB; each instruction's row is shifted one clock cycle to the right of the previous one, so the stages overlap.]
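The staggered rows in such a diagram can be generated mechanically. Below is a minimal Python sketch (illustrative only, not part of the original slides) that prints which stage each instruction occupies in every clock cycle, assuming one cycle per stage and no stalls:

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_chart(num_instructions, stages=STAGES):
    # Instruction i (0-based) enters stage s in cycle i + s: one new instruction per cycle.
    total_cycles = num_instructions + len(stages) - 1
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            s = cycle - i
            row.append(stages[s] if 0 <= s < len(stages) else ".")
        print("Instr %d: %s" % (i + 1, " ".join("%4s" % c for c in row)))

pipeline_chart(5)   # prints the five overlapping rows of the diagram above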
Program Flow
 To simplify the pipeline, every instruction takes the same number of steps, called stages
 One clock cycle per stage
Pipeline Problem
 Problem: An instruction may need to wait for the result of another instruction
Pipeline Solution:
 Solution: The compiler may recognize which instructions are dependent on or independent of the current instruction, and rearrange them to run an independent one first (a hypothetical example follows)
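A hypothetical illustration (not from the original slides), in the same register-transfer style as the later data-hazard examples: the SUB depends on R1 produced by the ADD, while the XOR is independent, so the compiler can move the XOR between them to give the ADD time to finish.

Original order:
ADD R1 ← R2 + R3
SUB R4 ← R1 - R5      (must wait for R1)
XOR R10 ← R6 XOR R7   (independent)

Reordered by the compiler:
ADD R1 ← R2 + R3
XOR R10 ← R6 XOR R7
SUB R4 ← R1 - R5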
How to make pipelines faster
 Superpipelining
 Divide the stages of the pipeline into more stages
 Ex: Split the "fetch instruction" stage into two stages
 Superscalar pipelining
 Run multiple pipelines in parallel
Dynamic pipeline
 Dynamic pipeline: uses buffers to hold instruction bits in case a dependent instruction stalls
Pipelining Lessons
 Pipelining doesn't help the latency (execution time) of a single task; it helps the throughput of the entire workload
 Multiple tasks operate simultaneously using different resources
 Potential speedup = number of pipe stages
 Time to "fill" the pipeline and time to "drain" it reduce the speedup
 Pipeline rate is limited by the slowest pipeline stage
 Unbalanced lengths of pipe stages also reduce the speedup
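The fill/drain penalty can be quantified with a short calculation. A sketch assuming an ideal pipeline (one-cycle stages, no hazards), so k instructions need n + (k − 1) cycles instead of n·k:

def ideal_speedup(n_stages, k_instructions):
    unpipelined_cycles = n_stages * k_instructions       # each instruction runs start to finish alone
    pipelined_cycles = n_stages + (k_instructions - 1)   # fill the pipe, then finish one per cycle
    return unpipelined_cycles / pipelined_cycles

print(ideal_speedup(5, 5))      # ~2.78: fill/drain dominates short runs
print(ideal_speedup(5, 1000))   # ~4.98: approaches the 5x (number of stages) limit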
Performance limitations
 Imbalance among pipe stages
 limits the cycle time to the slowest stage
 Pipelining overhead
 Pipeline register delay
 Clock skew
 Clock cycle > clock skew + latch overhead
 Hazards
Pipeline Hazards
There are situations, called hazards, that prevent the next instruction in the instruction stream from executing during its designated clock cycle.
There are three classes of hazards:
Structural hazards
Data hazards
Branch hazards

Structural hazards: arise from resource conflicts when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution.
Data hazards: arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline.
Control hazards: arise from the pipelining of branches and other instructions that change the PC.
What Makes Pipelining Hard?
Exceptional situations such as:
Power failure
Arithmetic overflow
I/O device request
OS call
Page fault
Pipeline Hazards
Structural hazard
Resource conflicts when the hardware cannot support all possible combinations of instructions simultaneously
Data hazard
An instruction depends on the results of a previous
instruction
Branch hazard
Instructions that change the PC
Structural hazard
Some pipelined processors share a single memory pipeline for data and instructions
Single Memory is a Structural Hazard
[Pipeline diagram, time in clock cycles: a Load followed by Instr 1-4 in program order, each passing through Mem, Reg, ALU, Mem, Reg; in one cycle the Load's data access and a later instruction's fetch both need the single memory.]
Can't read the same memory twice in the same clock cycle
Structural hazard
Memory access is needed both in FI (to fetch the instruction) and in FO (to fetch the operand)
Stages: S1 Fetch Instruction (FI), S2 Decode Instruction (DI), S3 Fetch Operand (FO), S4 Execute Instruction (EI), S5 Write Operand (WO)
[Timing diagram: instructions 1-9 flow through S1-S5, each instruction one clock cycle behind the previous one.]
Structural hazard
To solve this hazard, we "stall" the pipeline until the resource is freed
A stall is commonly called a pipeline bubble, since it floats through the pipeline taking space but carrying no useful work
Structural Hazards limit performance
 Example: if there are 1.3 memory accesses per instruction (30% of instructions execute loads and stores) and only one memory access is possible per cycle, then
 Average CPI ≥ 1.3
 Otherwise the memory (datapath resource) would be more than 100% utilized
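A quick back-of-the-envelope check of this claim (a sketch, assuming one instruction fetch per instruction plus 0.3 data accesses on average, and a single memory port serving one access per cycle):

accesses_per_instr = 1.0 + 0.3    # every instruction fetch + 30% of instructions load/store
accesses_per_cycle = 1.0          # single shared memory port
min_cpi = accesses_per_instr / accesses_per_cycle
print(min_cpi)                    # 1.3 -> average CPI cannot fall below 1.3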
Structural Hazard Solution: Add More Hardware
Structural hazard
[Pipeline diagram, stages FI, DI, FO, EI, WO: the conflict is removed by duplicating the contended resource.]
Data hazard
Example:
ADD R1 ← R2 + R3
SUB R4 ← R1 - R5
AND R6 ← R1 AND R7
OR R8 ← R1 OR R9
XOR R10 ← R1 XOR R11
Data hazard
FO: fetch the data value; WO: store the executed value
[Timing diagram, stages FI, DI, FO, EI, WO: the dependent instruction reaches FO before the previous instruction has completed WO.]
Data hazard
The delayed load approach inserts no-operation instructions to avoid the data conflict:

ADD R1 ← R2 + R3
No-op
No-op
SUB R4 ← R1 - R5
AND R6 ← R1 AND R7
OR R8 ← R1 OR R9
XOR R10 ← R1 XOR R11
Data hazard
 It can be further solved by a simple hardware technique called forwarding (also called bypassing or short-circuiting)
 The insight behind forwarding is that the result is not really needed by SUB until after ADD has actually produced it
 If the forwarding hardware detects that the previous ALU operation has written the register corresponding to a source for the current ALU operation, the control logic selects the result from the ALU instead of the value read from the register file
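A minimal sketch of the detection rule just described, assuming hypothetical pipeline-register fields (destination and source register numbers); it is not the control logic of any particular processor:

def forward_source(ex_mem_rd, mem_wb_rd, id_ex_rs, ex_mem_writes, mem_wb_writes):
    # Prefer the most recent producer still in the pipeline.
    if ex_mem_writes and ex_mem_rd == id_ex_rs:
        return "EX/MEM"          # take the ALU result of the immediately previous instruction
    if mem_wb_writes and mem_wb_rd == id_ex_rs:
        return "MEM/WB"          # take the value that is about to be written back
    return "REGISTER FILE"       # no hazard: read the register file normally

# ADD writes R1 (now in EX/MEM) while SUB reads R1 (now entering EX):
print(forward_source(ex_mem_rd=1, mem_wb_rd=0, id_ex_rs=1,
                     ex_mem_writes=True, mem_wb_writes=False))   # -> "EX/MEM"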
Data Hazard Classification
Three types of data hazards:

 RAW: Read After Write
 WAW: Write After Write
 WAR: Write After Read

• RAR: Read After Read
 – Is this a hazard? (No: two reads of the same register do not conflict.)
Read After Write (RAW)
A read after write (RAW) data hazard refers to a situation where an instruction refers to a result that has not yet been calculated or retrieved.
This can occur because, even though an instruction is executed after a previous instruction, the previous instruction has not been completely processed through the pipeline.
Example:

i1. R2 <- R1 + R3
i2. R4 <- R2 + R3   (i2 reads R2 before i1 has written it back)
Write After Read (WAR)
A write after read (WAR) data hazard represents a problem with concurrent execution.

For example:
i1. R4 <- R1 + R5
i2. R5 <- R1 + R2   (i2 must not write R5 before i1 has read it)
Write After Write (WAW)
A write after write (WAW) data hazard may occur in a concurrent execution environment.

Example:

i1. R2 <- R4 + R7
i2. R2 <- R1 + R3

We must delay the WB (Write Back) of i2 until i1 has completed execution.
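The three cases can be told apart mechanically. A small sketch (the register names and instruction encoding are illustrative), where i1 is the earlier instruction and i2 the later one:

def classify_hazards(i1_dest, i1_srcs, i2_dest, i2_srcs):
    hazards = []
    if i1_dest in i2_srcs:
        hazards.append("RAW")   # i2 reads what i1 writes (true dependence)
    if i2_dest in i1_srcs:
        hazards.append("WAR")   # i2 writes a register i1 still needs to read
    if i1_dest == i2_dest:
        hazards.append("WAW")   # both write the same register
    return hazards or ["none (RAR is not a hazard)"]

# i1: R2 <- R1 + R3, i2: R4 <- R2 + R3  -> RAW on R2
print(classify_hazards("R2", {"R1", "R3"}, "R4", {"R2", "R3"}))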


Branch hazards
Branch hazards can cause a greater performance loss for pipelines than data hazards.

When a branch instruction is executed, it may or may not change the PC.

If a branch changes the PC to its target address, it is a taken branch; otherwise, it is untaken.
Branch hazards
There are FOUR schemes to handle branch hazards
Freeze scheme
Predict-untaken scheme
Predict-taken scheme
Delayed branch
5-Stage Pipelining
[Timing diagram, stages FI (Fetch Instruction), DI (Decode Instruction), FO (Fetch Operand), EI (Execute Instruction), WO (Write Operand): instructions 1-9 enter S1-S5 one clock cycle apart.]
Branch Untaken (Freeze approach)
The simplest method of dealing with branches is to redo the fetch of the instruction following the branch.
[Pipeline diagram, stages FI, DI, FO, EI, WO: the branch is untaken, so the refetched instruction is the same one.]
Branch Taken (Freeze approach)
The simplest method of dealing with branches is to redo the fetch following the branch.
[Pipeline diagram, stages FI, DI, FO, EI, WO: the branch is taken, so fetching restarts from the branch target once it is known.]
Branch Taken (Freeze approach)
The simplest scheme to handle branches is to freeze the pipeline, holding or deleting any instructions after the branch until the branch destination is known.

The attractiveness of this solution lies primarily in its simplicity, for both hardware and software.
Branch Hazards (Predicted-untaken)
A higher-performance, and only slightly more complex, scheme is to treat every branch as not taken.

It is implemented by continuing to fetch instructions as if the branch were a normal instruction.

The pipeline looks the same if the branch is not taken.
If the branch is taken, we need to redo the fetch.
Branch Untaken (Predicted-untaken)
[Pipeline diagram, stages FI, DI, FO, EI, WO: the prediction is correct, so no cycles are lost.]
Branch Taken (Predicted-untaken)
[Pipeline diagram, stages FI, DI, FO, EI, WO: the prediction is wrong, so the instructions fetched after the branch are discarded and fetching restarts at the target.]
Branch Taken (Predicted-taken)
An alternative scheme is to treat every branch as taken.

As soon as the branch is decoded and the target address is computed, we assume the branch to be taken and begin fetching and executing at the target.
Branch Untaken (Predicted-taken)
[Pipeline diagram, stages FI, DI, FO, EI, WO: the branch turns out to be untaken, so the instructions fetched from the target must be discarded.]
Branch Taken (Predicted-taken)
[Pipeline diagram, stages FI, DI, FO, EI, WO: the branch is taken as predicted; fetching continues from the target once its address is known.]
Delayed Branch
A fourth scheme, used in some processors, is called delayed branch.
It is done at compile time; the compiler modifies the code.

The general format is:
branch instruction
delay slot (executed whether or not the branch is taken)
branch target (if taken)
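A hypothetical example (not from the slides), in the same register-transfer style used earlier: the ADD does not affect the branch condition, so the compiler can move it into the delay slot, where it executes whether or not the branch is taken.

Before scheduling:
ADD R1 ← R2 + R3
BRANCH to L1 if R4 = 0
(delay slot: NO-OP)

After scheduling:
BRANCH to L1 if R4 = 0
ADD R1 ← R2 + R3      (now fills the delay slot)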
Delayed Branch
(a) Optimal: fill the delay slot with an independent instruction from before the branch.
If the optimal choice is not available:

(b) Act like predict-taken: fill the slot from the branch target (in a compiler-directed way)

(c) Act like predict-untaken: fill the slot from the fall-through path (in a compiler-directed way)
Delayed Branch
Delayed branch is limited by:
(1) the restrictions on the instructions that can be scheduled into the delay slot (for example, another branch cannot be scheduled there)
(2) our ability to predict at compile time whether a branch is likely to be taken or not
Branch Prediction
A pipeline with branch prediction uses some
additional logic to guess the outcome of a conditional
branch instruction before it is executed
Branch Prediction
Various techniques can be used to predict whether a branch will be taken or not:

 Prediction never taken
 Prediction always taken
 Prediction by opcode
 Branch history table

The first three approaches are static: they do not depend on the execution history up to the time of the conditional branch instruction. The last approach is dynamic: it depends on the execution history.
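Of the four, only the branch history table is dynamic. A minimal sketch of one common organization (a 2-bit saturating counter per entry, indexed by the low-order bits of the branch address); the details are illustrative, not taken from the slides:

class BranchHistoryTable:
    def __init__(self, entries=1024):
        self.counters = [1] * entries   # 0,1 = predict not taken; 2,3 = predict taken

    def _index(self, pc):
        return pc % len(self.counters)  # index by the low-order bits of the branch address

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        # Saturating counter: move toward 3 when taken, toward 0 when not taken.
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

bht = BranchHistoryTable()
for outcome in [True, True, False, True]:   # observed history of one branch
    print(bht.predict(0x400), outcome)       # prediction vs. actual outcome
    bht.update(0x400, outcome)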
Important Pipeline Characteristics
 Latency
 Time required for an instruction to propagate through the pipeline
 Based on the number of stages * cycle time
 Dominant if there are lots of exceptions / hazards, i.e. we constantly have to refill the pipeline
 Throughput
 The rate at which instructions can start and finish
 Dominant if there are few exceptions and hazards, i.e. the pipeline stays mostly full
 Note: we need increased memory bandwidth over the non-pipelined processor
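Putting illustrative numbers on the two measures (the values below are assumptions for the sketch, not from the slides):

stages = 5
cycle_time_ns = 1.0
latency_ns = stages * cycle_time_ns        # time for one instruction to traverse the pipeline
throughput_per_ns = 1.0 / cycle_time_ns    # completions per ns once the pipeline is full
print(latency_ns, throughput_per_ns)       # 5.0 ns latency, 1.0 instruction per ns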
Exceptions
 An exception is when the normal execution order of instructions is changed. This has many names:
 Interrupt
 Fault
 Exception
 Examples:
 I/O device request
 Invoking an OS service
 Page fault
 Malfunction
 Undefined instruction
 Overflow / arithmetic anomaly
 Etc.
Eliminating hazards - Pipeline bubbling
 Bubbling the pipeline, also known as a pipeline break or a pipeline stall, is a method for preventing data, structural, and branch hazards from occurring.
 As instructions are fetched, control logic determines whether a hazard could/will occur. If so, the control logic inserts NOPs into the pipeline. Thus, before the next instruction (which would cause the hazard) is executed, the previous one will have had sufficient time to complete and prevent the hazard.
No. of NOPs = stages in pipeline
 If the number of NOPs is equal to the number of stages in the pipeline, the processor has been cleared of all instructions and can proceed free from hazards. All forms of stalling introduce a delay before the processor can resume execution.
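A minimal sketch of this NOP-insertion idea (the instruction representation and the result-delay constant are assumptions for illustration, not the slides' hardware):

RESULT_DELAY = 2  # cycles before a result is readable without forwarding (illustrative; matches the two no-ops shown earlier)

def insert_bubbles(program):
    # program: list of (dest_register, source_registers) tuples in issue order.
    scheduled, in_flight = [], {}          # in_flight: destination register -> cycles remaining
    for dest, srcs in program:
        # Stall (issue NOPs) while a needed source is still being produced.
        while any(in_flight.get(r, 0) > 0 for r in srcs):
            scheduled.append("NOP")
            in_flight = {r: c - 1 for r, c in in_flight.items() if c - 1 > 0}
        scheduled.append((dest, srcs))
        in_flight = {r: c - 1 for r, c in in_flight.items() if c - 1 > 0}
        in_flight[dest] = RESULT_DELAY
    return scheduled

# ADD R1 <- R2 + R3 followed by SUB R4 <- R1 - R5 gets two bubbles, like the delayed-load slide.
print(insert_bubbles([("R1", {"R2", "R3"}), ("R4", {"R1", "R5"})]))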
