Instruction Pipeline - Study Notes

Uploaded by

ch mounika
Copyright
© All Rights Reserved

Instruction Pipeline

COMPUTER ORGANIZATION

Copyright © 2014-2021 Testbook Edu Solutions Pvt. Ltd. All rights reserved
Download Testbook

Instruction Pipeline

Pipelining
Mechanism for overlapping the execution of many input sets by dividing one computation stage into several (say k) computation sub-stages.

 Cost of implementation increases slightly.

 Speedup increases considerably.

Working of Pipeline

For any given item, S1 must happen before S2 and S2 must happen before S3 (sequential execution within each item).

         T/3      T/3      T/3      T/3      T/3
S1    Item 1   Item 2   Item 3
S2             Item 1   Item 2   Item 3
S3                      Item 1   Item 2   Item 3

 Note: When Item 1 is in stage S2, stage S1 is empty, so Item 2 can use S1 at that time; execution of Item 2 in S1 and Item 1 in S2 then proceeds in parallel.

 In the processor pipeline we need a latch between successive stages to hold the intermediate results
temporarily.

COMPUTER ORGANIZATION | Instruction Pipeline PAGE 2



Pipelined Processors
a. Degree of Overlap:
 Serial: Next operation starts only after the previous operation gets completed.

 Overlapped: Some overlap between consecutive stages.

 Pipelined: Complete overlap between successive stages.

b. Depth of Pipeline:
 Performance of the pipeline depends on the number of stages and how they are utilized without conflict.

 Shallow pipeline: fewer stages; each stage is more complex.

 Deep pipeline: larger number of stages; each stage is simpler.


c. Scheduling alternatives:
 Static Pipeline:

i. Same sequence of pipeline stages is executed for all data/instructions.

ii. If one instruction stalls, all subsequent ones also get delayed.

 Dynamic Pipeline:

i. Can be reconfigured to perform variable functions at different times.

ii. Supports feed-forward and feedback connections between stages.

Speedup and Efficiency


τ : Clock period of the pipeline

ti : Time delay of the circuit in stage Si

dL : Delay of a latch

Maximum stage delay, τm = max{ti}

τ = τm + dL

Pipeline frequency, f = 1/τ

Speedup for a k-stage pipeline with n inputs:

S(k) = n · k · τ / ((k + n − 1) · τ) = n · k / (k + n − 1)
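As a quick sketch, these definitions translate directly into code. The stage delays, latch delay, and input count below are made-up numbers for illustration, not values from the notes:

```python
def pipeline_metrics(stage_delays, latch_delay, n_inputs):
    """Return (clock_period, frequency, speedup) for a k-stage pipeline."""
    k = len(stage_delays)
    tau_m = max(stage_delays)          # maximum stage delay, tau_m = max{t_i}
    tau = tau_m + latch_delay          # clock period, tau = tau_m + d_L
    freq = 1.0 / tau                   # pipeline frequency, f = 1/tau
    t_serial = n_inputs * k * tau      # non-pipelined: n inputs, k stages each
    t_pipe = (k + n_inputs - 1) * tau  # pipelined: k cycles to fill, then 1/cycle
    return tau, freq, t_serial / t_pipe

tau, f, s = pipeline_metrics([10, 8, 10, 7], latch_delay=1, n_inputs=100)
print(tau, round(s, 2))   # → 11 3.88
```

For large n the speedup approaches k, the number of stages, which is the usual motivation for pipelining.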


Latency
 The number of time units between the initiations of two inputs to a pipeline is called the latency between them.

 When two or more inputs attempt to use the same pipeline stage at the same time, a collision occurs.

 Latencies whose use causes collisions are called forbidden latencies.
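These definitions can be checked with a small sketch. The reservation table below is a hypothetical example, not one from these notes: table[s] lists the cycles in which stage s is used by one input, and two inputs initiated p cycles apart collide exactly when some row contains two marks p columns apart:

```python
def forbidden_latencies(table):
    """table[s] is the set of cycle numbers in which stage s is used."""
    forbidden = set()
    for cycles in table:
        for a in cycles:
            for b in cycles:
                if a != b:
                    # two initiations |a - b| cycles apart reuse this stage
                    forbidden.add(abs(a - b))
    return forbidden

# Hypothetical stage usage for one input (rows = stages, entries = cycles used)
table = [{0, 4}, {1, 3}, {2}]
print(sorted(forbidden_latencies(table)))   # → [2, 4]
```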

Pipelining MIPS32 Data Path


Assumptions:

 Each of the 5 steps (IF, ID, EX, MEM and WB) becomes a pipeline stage.

 Each stage must finish its execution within one clock cycle.

 Since many instructions will be overlapped, we must ensure that there is no conflict.

 These assumptions can be met easily.

Let each stage take T time units and let Δ be the latch delay.

Non-pipelined: time to execute n instructions = 5 · T · n

Pipelined: time to execute n instructions = 5(T + Δ) + (n − 1)(T + Δ)

= (4 + n)(T + Δ)

≈ (4 + n) T, if T ≫ Δ

Speedup = 5 T n / ((4 + n) T) ≈ 5 if n is very large.
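A quick numeric check of these expressions, with assumed values T = 10 and Δ = 0.5 time units (any values with T ≫ Δ behave the same way):

```python
def execution_times(n, T=10.0, delta=0.5, k=5):
    """Non-pipelined vs pipelined time for n instructions (assumed T, delta)."""
    t_nonpipe = k * T * n                  # 5 * T * n
    t_pipe = (k - 1 + n) * (T + delta)     # (4 + n)(T + delta)
    return t_nonpipe, t_pipe

for n in (1, 10, 10000):
    a, b = execution_times(n)
    print(n, round(a / b, 3))
# The speedup a/b rises toward k*T/(T + delta), about 4.76 here, as n grows;
# it would approach k = 5 exactly if the latch delay were zero.
```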


Conflict Stages
 IF and MEM: Both these stages access memory, so they should not do so in the same cycle.
SOLUTION: Use separate instruction and data caches (i-cache and d-cache).

 ID and WB: Both these stages access the register bank, so they should not use it in the same clock cycle.
SOLUTION: Allow both read and write access to the registers in the same clock cycle.

 A simultaneous read and write of the same register may still result in a clash.

SOLUTION: Write in the 1st half of the cycle and read in the 2nd half of the cycle.

Points to Remember
1. In a pipelined processor we have to fetch an instruction every clock cycle, so we need to increment the program counter in the fetch stage itself; otherwise, the next instruction will not be fetched in time.

2. In a non-pipelined processor there is no need to fetch an instruction every clock cycle, so we can increment the program counter in the MEM stage.

Basic Performance Issue in Pipeline


 Latch registers are inserted between pipeline stages, which increases the overall execution time of the first (single) instruction.


Pipeline Hazards
 An instruction pipeline should ideally complete the execution of one instruction every clock cycle.

 Hazards are situations which prevent this from happening (for some instructions).

Hazards
1. Structural Hazards (Resource conflicts)

2. Data Hazards (Data Dependencies)

3. Control Hazards (branches and other instructions that change the program counter)

Solution for Hazards


Using special hardware and control circuits.

Inserting stall cycles in the pipeline:

 When one instruction is stalled, all others that follow that instruction will also get stalled.

 No new instruction can be fetched during the duration of stall.

 Hazards result in performance degradation.

Structural Hazards
 Due to resource conflicts.

 When hardware cannot support overlapped execution.

 Example: a single memory/cache used to store both instructions and data.


Eliminating Structural Hazards

 Structural hazards are sometimes allowed, to reduce the cost of implementation.

 Pipelining or replicating all the functional units may be too costly.

 This is acceptable if structural hazards are infrequent, even though they still happen occasionally.

 For the memory conflict, make use of separate I- and D-caches.


Data Hazards
Data hazards occur due to data dependencies between instructions.

I1: ADD R2, R5, R8
I2: SUB R2, R2, R6

Basic solution: insert stall cycles → 3 clock cycles will be wasted.

To reduce the number of wasted clock cycles:


a) Data forwarding/bypassing: As soon as the data is computed, it is forwarded using some additional hardware (multiplexers), without waiting for the data to be written back.

b) Concurrent Register Access: By splitting a clock cycle into two halves.

First half: Register read

Second half: Register write.

Bypassing
 The result computed by the previous instruction is stored in some register within the data path.

 Take the value directly from that register and forward it to the instruction that requires it.
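A minimal sketch of the forwarding condition (the tuple encoding of instructions here is invented for illustration): a result must be bypassed when the producer's destination register is a source operand of the consumer.

```python
def needs_forwarding(producer, consumer):
    """Each instruction is (opcode, dest, src1, src2)."""
    _, dest, *_ = producer
    _, _, s1, s2 = consumer
    # Bypass is needed when the consumer reads the producer's destination
    return dest in (s1, s2)

i1 = ("ADD", "R2", "R5", "R8")
i2 = ("SUB", "R2", "R2", "R6")
print(needs_forwarding(i1, i2))   # → True: R2 is produced by i1 and read by i2
```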

Register Read/Write
 Used to reduce the number of values that must be forwarded.


 We can avoid the WB/ID conflict that occurs when both fall in the same cycle by using the Register Read/Write scheme.

 In the first half of the cycle: register write (in WB).

 In the second half of the cycle: register read (in ID).

Data Hazard while Accessing Memory


Memory references are always in order, and so data hazards between memory references never occur.

A cache miss can result in pipeline stalls.

A load instruction followed by the use of the loaded data causes a hazard:


How to solve this problem?


 Cannot be eliminated using forwarding

 Pipeline interlock: hardware that detects the hazard and stalls the pipeline until the hazard is cleared.

 One stall cycle is needed.
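A sketch of the interlock's detection condition (the instruction tuples use a made-up encoding, not the notes' hardware): one stall is needed whenever a LW is immediately followed by an instruction that reads the loaded register.

```python
def load_use_stalls(program):
    """Count load-use interlock stalls; each instruction is (op, dest, *srcs)."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dest, *_ = prev
        if op == "LW" and dest in cur[2:]:   # loaded value used right away
            stalls += 1
    return stalls

prog = [("LW", "R1", "a"), ("LW", "R2", "b"), ("SUB", "R8", "R1", "R2")]
print(load_use_stalls(prog))   # → 1 (SUB uses R2 loaded by the previous LW)
```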

Instruction Issue
 For a typical ALU instruction, decoding is done in the ID stage, before the EX stage.

 Moving from the ID stage to the EX stage means we are starting to execute the operation; this is when we issue the instruction.

 All possible data hazards can be checked in the ID stage itself.

If a data hazard exists, the instruction is stalled before it is issued.

Instruction Scheduling or Pipe Scheduling


The compiler tries to avoid generating code with an interlock:

MIPS 32 Code

LW R1, a

LW R2, b

SUB R8, R1, R2 ← Interlock

SW R8, x

LW R1, c

LW R2, d

ADD R9, R1, R2 ← Interlock

SW R9, y


Compiler-scheduled MIPS 32 code:

LW R1, a

LW R2, b

LW R3, c

SUB R8, R1, R2 ← Both interlocks eliminated

LW R4, d

SW R8, x

ADD R9, R3, R4

SW R9, y

 Pipeline scheduling can increase the number of registers required, but it results in a performance improvement.

 A load instruction requires that the next instruction not use the value currently being loaded; this is called a delayed load.

 If the compiler cannot move some instruction to fill up the delay slot, it can insert a NOP (no operation) instruction.

Types of Data Hazards


a) Read After Write (RAW):
Consider two instructions i1 and i2, with i1 occurring before i2 in the program.

 i2 tries to read a source before i1 writes to it

 Situation where an instruction refers to a result that has not yet been calculated.

Example:

i1: R2 ← R5 + R3

i2: R4 ← R2 + R3

b) Write After Read: (WAR)


 i2 tries to write a destination before it is read by i1.


 Problem with concurrent execution.

Example:

i1: R4 ← R1 + R5

i2: R5 ← R1 + R2

c) Write After Write (WAW):

 i2 tries to write an operand before it is written by i1.

Example:

i1: R2 ← R4 + R7

i2: R2 ← R1 + R3
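The three cases can be summarized in a small classifier (the register-list encoding here is assumed for illustration, with i1 earlier than i2 in program order):

```python
def hazard_types(i1, i2):
    """Each instruction: (dest, [sources]); i1 precedes i2 in program order."""
    d1, src1 = i1
    d2, src2 = i2
    kinds = set()
    if d1 in src2:
        kinds.add("RAW")   # i2 reads what i1 writes
    if d2 in src1:
        kinds.add("WAR")   # i2 writes what i1 reads
    if d1 == d2:
        kinds.add("WAW")   # both write the same register
    return kinds

# The RAW example above: i1: R2 <- R5 + R3, i2: R4 <- R2 + R3
print(hazard_types(("R2", ["R5", "R3"]), ("R4", ["R2", "R3"])))   # → {'RAW'}
```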

Control Hazard
 Arise because of changes in the flow of control, i.e., branch instructions.

 They can cause a greater performance loss than data hazards.

 If the branch is taken, the PC is normally not updated until the end of MEM.

 The next instruction can be fetched only after that (3 stall cycles).

 If the fetched instruction turns out to be a branch, the fetch has to be redone.
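The cost of these stalls can be estimated with a simple model; the branch statistics below are assumed values for illustration, not figures from the notes:

```python
def effective_cpi(base_cpi, branch_fraction, taken_fraction, penalty):
    """Average CPI when taken branches each cost `penalty` stall cycles."""
    return base_cpi + branch_fraction * taken_fraction * penalty

# Assumed workload: 20% branches, 60% of them taken, 3-cycle penalty
print(round(effective_cpi(1.0, 0.20, 0.60, 3), 2))   # → 1.36
```

Even modest branch frequencies noticeably inflate the CPI, which is why the note's next section looks at reducing the branch stall penalty.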


To Reduce Branch Stall Penalty


 In MIPS 32, the branches require testing a register for zero, or comparing the values of two registers.

 By feeding these registers to dedicated comparison logic early, we can complete the branch test and the computation of the effective address by the end of the ID stage.

Delayed Branch Technique


→ If a branch instruction has a penalty of n stall cycles, the n instruction slots following the branch are called its delay slots.

→ The task of the compiler is to try to fill up these delay slots with useful instructions.

→ Instructions in branch delay slots are always executed irrespective of whether the branch is taken or not.
