Pipeline Hazards
Falguni Sinhababu
Government College of Engineering and Leather Technology
Pipelining the Data Path
PIPELINING INSTRUCTION EXECUTION
▪ Instruction execution is typically divided into 5
stages:
▪ Instruction Fetch (IF)
▪ Instruction Decode (ID)
▪ ALU Operation (EX)
▪ Memory Access (MEM)
▪ Write back result to register (WB)
▪ These five stages can be executed in an overlapped
fashion in a pipelined architecture.
▪ Overlapping instruction execution results in a significant
speedup.
BASIC REQUIREMENT
▪ Basic requirements for pipelining the data path:
▪ We should be able to start a new instruction in every
clock cycle.
▪ Each of the five steps mentioned before (IF, ID, EX,
MEM and WB) becomes a pipeline stage.
▪ Each stage must finish its execution within one clock
cycle.
▪ Since the execution of several instructions is
overlapped, we must ensure that there is no
conflict during the execution.
SPEEDUP
▪ If time to execute each stage = T
▪ Total time to execute nonpipelined 5 stages (IF + ID + EX + MEM + WB) = 5T
▪ So total time to execute n instructions = 5Tn
[Figure: the five stages IF, ID, EX, MEM and WB in sequence, separated by latches]
PIPELINE EXECUTION
                 Clock Cycles
Instruction   1    2    3    4    5    6    7    8
i             IF   ID   EX   MEM  WB
i+1                IF   ID   EX   MEM  WB
i+2                     IF   ID   EX   MEM  WB
i+3                          IF   ID   EX   MEM  WB
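The diagram above shows 4 instructions completing in 8 clock cycles. A minimal sketch of that count, assuming a 5-stage pipeline with equal stage delays, no stalls and negligible latch delay (the function names are illustrative and do not come from the slides):

```python
def nonpipelined_cycles(n, stages=5):
    """Each instruction passes through all stages before the next one starts."""
    return n * stages


def pipelined_cycles(n, stages=5):
    """The first instruction needs `stages` cycles; each later one finishes 1 cycle after."""
    return stages + (n - 1) if n > 0 else 0


def ideal_speedup(n, stages=5):
    """Ratio of non-pipelined to pipelined cycle counts for n instructions."""
    return nonpipelined_cycles(n, stages) / pipelined_cycles(n, stages)


print(pipelined_cycles(4))   # 8, matching the diagram
print(ideal_speedup(4))      # 2.5
print(ideal_speedup(1000))   # just under 5
```

For a large number of instructions the ratio approaches the number of stages, which is why the ideal speedup of a 5-stage pipeline is quoted as 5.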
ADVANTAGES OF PIPELINE
▪ In the non-pipelined version, the execution time of an
instruction is equal to the combined delay of the 5
stages (say, 5T).
▪ In the pipelined version, once the pipeline is full, one
instruction completes every T time units.
▪ This assumes all stage delays are equal (T) and neglects latch delay.
▪ However, due to various conflicts between instructions
(called hazards), we cannot achieve the ideal
performance.
▪ Several techniques have been proposed to improve the
performance.
SOME OBSERVATIONS
▪ To support overlapped execution, peak memory
bandwidth must be increased 5 times over that required
for the non-pipelined version.
▪ An instruction fetch must occur every clock cycle.
▪ Also there can be two memory accesses per clock cycle
(one for instruction and one for data).
a. Separate instruction and data caches are typically
used to support this.
[Figure: CPU connected to a separate I-Cache and D-Cache]
SOME OBSERVATIONS
b. The register bank is accessed in both the ID and WB stages.
▪ In the ID stage, we prefetch both register operands (source 1 and
source 2), so ID requires two register reads, and WB requires one register write.
▪ We thus have the requirement of 2 reads and 1 write in every clock cycle.
▪ Two register reads can be supported by having two read ports.
▪ Simultaneous reads and writes may result in clashes (e.g. same register used).
▪ Solution: perform the write operation during the first half of the clock cycle, and the reads
during the second half of the clock cycle.
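The write-then-read ordering in the solution above can be sketched as a toy model. The `RegisterFile` class and its `clock_cycle` method are hypothetical names used only for illustration; real hardware does this with clock-phase timing, not sequential code.

```python
class RegisterFile:
    """Toy register bank with one write port and two read ports."""

    def __init__(self, size=32):
        self.regs = [0] * size

    def clock_cycle(self, write=None, reads=()):
        """Model one clock cycle.

        write: (register_number, value) from the WB stage, or None.
        reads: register numbers requested by the ID stage.
        """
        # First half of the cycle: the WB stage writes its result.
        if write is not None:
            reg, value = write
            self.regs[reg] = value
        # Second half of the cycle: the ID stage reads its operands,
        # so a read of the register just written sees the new value.
        return tuple(self.regs[r] for r in reads)


rf = RegisterFile()
values = rf.clock_cycle(write=(4, 99), reads=(4, 5))
print(values)  # (99, 0)
```

Because the write lands before the reads, an instruction in ID that reads the same register an older instruction is writing in WB gets the updated value without a stall.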
SOME OBSERVATIONS
c. Since a new instruction is fetched every
clock cycle, it is required to increment the
PC on every clock.
▪PC updating has to be done during IF stage
itself, as otherwise the next instruction cannot be
fetched.
▪In the nonpipelined version discussed earlier,
this was done during the MEM stage.
BASIC PERFORMANCE ISSUES IN A PIPELINE
[Figure: the five stages IF, ID, EX, MEM and WB in sequence, separated by latches]
EXAMPLE 1
▪ Consider the 5-stage MIPS32 pipeline, with the
following features:
▪ Pipeline clock rate of 1 GHz (i.e. 1 ns clock cycle time).
▪ For a non-pipelined implementation, ALU operations and
branches take 4 cycles, while memory operations take 5
cycles.
▪ Relative frequencies of ALU operations, branches and
memory operations are 50%, 15% and 35% respectively.
▪ In the pipelined implementation, due to clock skew and setup
time, the clock cycle time increases by 0.25 ns.
▪ Calculate the estimated speedup of the pipelined
implementation.
SOLUTION
a) For the non-pipelined processor:
▪ Average instruction execution time = clock cycle
time x average CPI
▪ = 1 ns x (0.50 x 4 + 0.15 x 4 + 0.35 x 5) = 4.35 ns
b) For the pipelined processor:
▪ Clock cycle time = 1 + 0.25 = 1.25 ns
▪ In the steady state (with a large number of
instructions), one instruction completes every
clock cycle, so the average instruction execution time is 1.25 ns.
▪ Speedup = 4.35 / 1.25 = 3.48
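The arithmetic of this solution can be checked with a few lines of Python. The variable names are illustrative; the numbers are those given in Example 1.

```python
clock_ns = 1.0  # non-pipelined clock cycle time (1 GHz)

# (relative frequency, cycles in the non-pipelined implementation)
mix = {
    "alu": (0.50, 4),
    "branch": (0.15, 4),
    "memory": (0.35, 5),
}

# Average CPI is the frequency-weighted sum of the cycle counts.
avg_cpi = sum(freq * cycles for freq, cycles in mix.values())
nonpipelined_ns = clock_ns * avg_cpi

# Pipelined: one instruction per (slower) clock in the steady state.
pipelined_ns = clock_ns + 0.25

speedup = nonpipelined_ns / pipelined_ns
print(round(avg_cpi, 2), round(speedup, 2))  # 4.35 3.48
```

The speedup falls well short of 5 because the non-pipelined instructions average fewer than 5 cycles and the pipelined clock is 25% slower.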
Pipeline Hazards
INTRODUCTION
▪ Pipeline Hazards:
▪ Ideally, an instruction pipeline should complete the
execution of an instruction in every clock cycle.
▪ Hazards are situations that prevent a pipeline from
operating at its maximum possible rate of one instruction per clock cycle.
▪ They prevent some instructions from executing during their
designated clock cycles.
▪ Three types of hazards:
▪ Structural Hazard: Arises due to resource conflicts.
▪ Data Hazard: Arises due to data dependencies between
instructions.
▪ Control Hazard: Arises due to branches and other instructions
that change the PC.
WHAT HAPPENS WHEN HAZARDS APPEAR?
▪ We can use special hardware and control circuits to avoid
the conflicts that arise due to hazards.
▪ As an alternative, we can insert stall cycles in the
pipeline.
▪ When an instruction is stalled, all instructions that follow also get
stalled.
▪ The number of stall cycles depends on the criticality of the hazard.
▪ Instructions before the stalled instruction can continue, but no new
instructions are fetched during the stall.
▪ In general, hazards result in performance degradation.
CALCULATION OF PIPELINE DEGRADATION
▪ If we restrict ourselves to stall cycles only, we can write:
▪ CPIpipe = Ideal CPI + (Pipeline stall cycles per instruction)
▪ We can thus write:
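One common way to complete this step is the textbook speedup expression below. It assumes an ideal CPI of 1, the same clock cycle time for the pipelined and non-pipelined versions, and a non-pipelined CPI equal to the pipeline depth; these assumptions are not stated in the slides.

```latex
\[
\text{Speedup}
  = \frac{\text{CPI}_{\text{unpipelined}}}{\text{CPI}_{\text{pipe}}}
  = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}}
\]
```

For example, a pipeline of depth 5 that averages 0.25 stall cycles per instruction would give a speedup of 5 / 1.25 = 4 under these assumptions.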
(A) STRUCTURAL HAZARDS
▪ They arise due to resource conflicts when the hardware
cannot support overlapped execution under all possible
scenarios.
▪ Examples:
▪ Single memory (cache) to store instructions and data: while an
instruction is being fetched, some other instruction is trying to read
or write data.
▪ An instruction is trying to read data from the register bank (in the ID
stage), while some other instruction is trying to write into a register
(in the WB stage).
▪ Some functional unit (like floating-point ADD or MULTIPLY) is not
fully pipelined.
▪ A sequence of instructions that try to use the same functional unit will result in
stalls.
ILLUSTRATION
▪ Structural hazard in a single-port memory system, which stores
both instructions and data.
Instruction        1    2    3    4    5    6    7    8
LW  R1, 10(R2)     IF   ID   EX   MEM  WB
ADD R3, R4, R5          IF   ID   EX   MEM  WB
SUB R10, R2, R9              IF   ID   EX   MEM  WB
AND R5, R7, R7                    IF   ID   EX   MEM  WB
ADD R2, R1, R5                         IF   ID   EX   MEM
▪ In clock cycle 4, the data access of LW (MEM) and the fetch of AND (IF) both need the single memory port.
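A minimal sketch of how such a conflict can be located, assuming no stalls, one instruction issued per cycle, and that only loads and stores use the memory in their MEM stage. The function name and the opcode set are illustrative, not part of the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
DATA_MEMORY_OPS = {"LW", "SW"}  # assumed: only these access data memory in MEM


def memory_conflicts(opcodes):
    """Return (cycle, fetching_op, data_op) for each single-port memory clash.

    Instruction i (0-based) is in IF during cycle i + 1, so it reaches
    MEM in cycle i + 1 + 3.
    """
    conflicts = []
    for i, op in enumerate(opcodes):
        if op not in DATA_MEMORY_OPS:
            continue
        mem_cycle = (i + 1) + STAGES.index("MEM")
        fetch_index = mem_cycle - 1  # instruction whose IF falls in that cycle
        if fetch_index < len(opcodes):
            conflicts.append((mem_cycle, opcodes[fetch_index], op))
    return conflicts


program = ["LW", "ADD", "SUB", "AND", "ADD"]
print(memory_conflicts(program))  # [(4, 'AND', 'LW')]
```

For the sequence in the illustration, the only clash is in cycle 4, between the data access of LW and the fetch of AND.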
WHY DO STRUCTURAL HAZARDS APPEAR IN REAL MACHINES?
▪ Why can’t we remove structural hazards by
incorporating new hardware?
a) To reduce the cost of implementation.
b) Pipelining all the functional units may be too costly.
c) If structural hazards are not frequent, it may not be
worth the effort and cost to avoid them.
▪ The memory access structural hazard is quite frequent.
▪ It is avoided by using separate instruction and data caches in the first
level.
DATA HAZARDS
▪ Data Hazards occur due to data dependencies between
instructions that are in various stages of execution in the
pipeline.
Instruction        1    2    3    4    5    6
ADD R2, R5, R8     IF   ID   EX   MEM  WB
SUB R9, R2, R6          IF   ID   EX   MEM  WB

Instruction        1    2    3    4      5      6    7    8
ADD R2, R5, R8     IF   ID   EX   MEM    WB
SUB R9, R2, R6          IF   ID   STALL  STALL  ID   EX   MEM
Instruction        1    2    3    4    5    6    7    8    9
ADD R4, R5, R6     IF   ID   EX   MEM  WB
SUB R3, R4, R8          IF   ID   EX   MEM  WB
ADD R4, R2, R4               IF   ID   EX   MEM  WB
AND R9, R4, R10                   IF   ID   EX   MEM  WB
OR  R11, R4, R5                        IF   ID   EX   MEM  WB
▪ The first instruction computes R4, which is required by all 4 subsequent instructions.
▪ The result of the first instruction is written in stage 5 (WB), but it is read in the ID stage of
each of the 3 instructions that follow it.
▪ The hazard for the AND instruction is resolved by splitting the register access: writing is done in the
first half of the clock cycle and reading is done in the second half of the clock cycle.
▪ The last instruction (OR) is not affected by the data dependency (no data hazard here).
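The timing argument for the five-instruction sequence above can be sketched as follows. This is a simplified model, assuming no forwarding, one instruction issued per cycle, register reads in ID and writes in WB. It follows the bullets in tracking only the value produced by the first ADD, and it ignores that the third instruction as listed also writes R4. The function name and status labels are illustrative.

```python
def classify_readers(program, producer=0):
    """Classify later instructions that read the producer's destination register.

    program: list of (text, dest, sources) tuples in program order.
    Instruction i (0-based) is in IF in cycle i + 1, ID in cycle i + 2
    and WB in cycle i + 5.
    """
    _, dest, _ = program[producer]
    wb_cycle = producer + 5
    result = {}
    for j in range(producer + 1, len(program)):
        text, _, sources = program[j]
        if dest not in sources:
            continue
        id_cycle = j + 2
        if id_cycle < wb_cycle:
            result[text] = "hazard"        # reads before the value is written
        elif id_cycle == wb_cycle:
            result[text] = "split-cycle"   # write in first half, read in second half
        else:
            result[text] = "no hazard"     # reads after the write
    return result


program = [
    ("ADD R4, R5, R6", "R4", ("R5", "R6")),
    ("SUB R3, R4, R8", "R3", ("R4", "R8")),
    ("ADD R4, R2, R4", "R4", ("R2", "R4")),
    ("AND R9, R4, R10", "R9", ("R4", "R10")),
    ("OR R11, R4, R5", "R11", ("R4", "R5")),
]
for text, status in classify_readers(program).items():
    print(f"{text:18s} {status}")
```

Under these assumptions the SUB and the second ADD read R4 too early, the AND is covered by the split register access, and the OR reads after the write.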
[Figure: load interlock example (both loads interlocked; two load interlocks are eliminated)]
[Figure: branch is taken (1 cycle penalty)]
BRANCH DELAY SLOTS
▪ For MIPS32, the number of branch delay slots is n = 1.
▪ Just as with load delay slots, the task of the compiler is to try to fill the
branch delay slots with meaningful instructions.
▪ Instructions in the branch delay slot are always executed irrespective of
the outcome of the branch.
PIPELINE HAZARDS 24-Apr-24 52
EXAMPLES
▪ LW causes a data page fault (cycle 4), and ADD causes an instruction page
fault (cycle 2).
[Figure: pipeline stages IF, ID, EX, MEM and WB separated by latches, with a Status Register]
▪ Solution: the hardware posts each interrupt in a status vector, which is
carried with each instruction as it moves through the pipeline.
▪ When an instruction reaches WB, its interrupt status vector is checked and
any posted interrupt is handled.
▪ This guarantees that all interrupts are handled in precise order.
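The status-vector scheme can be sketched for the LW/ADD example above. This is a simplified model, assuming instructions reach WB in program order; the `Instruction` class and the helper names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Instruction:
    name: str
    # Status vector carried along the pipeline: (stage, cause) entries.
    status: list = field(default_factory=list)


def post(instr, stage, cause):
    """Record an interrupt in the instruction's status vector without handling it."""
    instr.status.append((stage, cause))


def first_exception_at_wb(instructions):
    """Check status vectors in WB (program) order; return the first posted interrupt."""
    for instr in instructions:
        if instr.status:
            return instr.name, instr.status[0]
    return None


lw = Instruction("LW")
add = Instruction("ADD")

# Posting follows the order of occurrence in time:
post(add, "IF", "instruction page fault")  # raised in cycle 2
post(lw, "MEM", "data page fault")         # raised in cycle 4

# Handling follows program order, because LW reaches WB before ADD.
handled = first_exception_at_wb([lw, add])
print(handled)  # ('LW', ('MEM', 'data page fault'))
```

Although the fault of ADD occurs earlier in time, the fault of LW is handled first, which is the precise ordering the slide describes.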