
Pipeline 1


Pipelining: Basic Concepts

CSE Sem-4
CPU Configuration and Instruction Execution Operations

[Figure: CPU block diagram showing Main Memory, MAR, MDR, CPU Bus, PC, IR, ACC, ALU, CU and a common clock]

1. The address of the next instruction is transferred from the PC to the MAR, and the instruction is located in main memory (MM).
2. The instruction is copied from memory to the MDR.
3. The instruction is transferred to the IR to be decoded.
4. The control unit sends signals to the appropriate devices (ALU, ACC, memory) to execute the instruction.
5. The result is stored.
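The five steps above can be sketched as a small register-transfer loop. This is only an illustration: the opcodes (LOAD, ADD, STORE, HALT), the tuple encoding of instructions and the memory layout are hypothetical and do not come from the slides.

```python
# Minimal sketch of the instruction cycle; the instruction set is hypothetical.

def run(memory):
    """Execute instructions stored in `memory` until HALT; return the ACC value."""
    pc, acc = 0, 0
    while True:
        mar = pc                 # step 1: address of next instruction, PC -> MAR
        mdr = memory[mar]        # step 2: instruction copied from memory to MDR
        ir = mdr                 # step 3: instruction moved to IR for decoding
        pc += 1                  # PC now points at the following instruction
        opcode, operand = ir
        # step 4: the control unit selects the operation to perform
        if opcode == "LOAD":
            acc = memory[operand]
        elif opcode == "ADD":
            acc = acc + memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc    # step 5: the result is stored
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")

# Addresses 0-3 hold the program, addresses 4-6 hold data.
memory = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 7, 35, 0]
result = run(memory)
print(result, memory[6])
```

Each instruction here passes through all steps before the next one starts, which is the non-pipelined behaviour that the following slides improve on.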
Why Pipelining?
CPU performance can be improved by:
• Improving the hardware by introducing faster circuits.
• Arranging the hardware so that more than one operation can be performed at the same time.

Pipelining or Pipeline Processing:
• Pipelining is a technique in which the execution of multiple instructions is overlapped.
• It arranges the hardware elements of the CPU so that overall performance is increased.
• It allows instructions to be stored and executed in an orderly process.
• A pipeline has two ends, the input end and the output end. Between these ends there are multiple stages (segments); the output of one stage is connected to the input of the next stage, and each stage performs a specific operation.
• Stages are combinations of sequential and combinational circuits that perform arithmetic and logic operations on the data stream flowing through the pipe.
Pipeline vs non-pipeline architecture:
F = Instruction Fetch, D = Decode, O = Operand Fetch, E = Execute, S = Storing result

p1   F  D  O  E  S  -  -  -  -  -  -  -  -  -  -
p2   -  -  -  -  -  F  D  O  E  S  -  -  -  -  -
p3   -  -  -  -  -  -  -  -  -  -  F  D  O  E  S
Clk  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15

Fig. 1: Non-pipelined architecture

S1   F1 F2 F3 -  -  -  -
S2   -  D1 D2 D3 -  -  -
S3   -  -  O1 O2 O3 -  -
S4   -  -  -  E1 E2 E3 -
S5   -  -  -  -  S1 S2 S3
Clk  1  2  3  4  5  6  7

Fig. 2: Pipelined architecture (space-time diagram; rows are stages, columns are clock pulses)

If there are K stages in the pipelined architecture and n instructions, the first instruction takes K clock pulses to complete, and after that one instruction is completed at every clock pulse.

So the number of clock pulses required to execute n instructions on a K-stage pipelined architecture is:
K + (n - 1)

Speed-up of the pipelined architecture:
S = CP in non-pipeline / CP in pipeline
where CP is the number of clock pulses. In the figures above (n = 3, K = 5) the non-pipelined architecture needs 15 clock pulses and the pipelined one needs 7.
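The clock-pulse counts above can be checked with a short sketch that also rebuilds the space-time diagram. The function names are illustrative, and the code assumes every stage takes exactly one clock pulse with no stalls.

```python
def non_pipelined_cycles(n, k):
    """Clock pulses when each of n instructions runs all k stages alone."""
    return n * k

def pipelined_cycles(n, k):
    """Clock pulses for n instructions on a k-stage pipeline: k + (n - 1)."""
    return k + (n - 1)

def space_time_rows(n, k):
    """rows[s][c] is the instruction number in stage s at clock c, or 0 if idle."""
    total = pipelined_cycles(n, k)
    rows = []
    for stage in range(k):
        row = []
        for clk in range(total):
            # Instruction i (1-based) occupies stage s (0-based) at clock (i - 1) + s.
            instr = clk - stage + 1
            row.append(instr if 1 <= instr <= n else 0)
        rows.append(row)
    return rows

n, k = 3, 5  # the case drawn in Fig. 1 and Fig. 2
rows = space_time_rows(n, k)
for stage, row in enumerate(rows, start=1):
    print(f"S{stage}", " ".join(str(i) if i else "-" for i in row))

speedup = non_pipelined_cycles(n, k) / pipelined_cycles(n, k)
print("speed-up:", speedup)
```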
Difference Between Pipeline and Non-pipeline Architecture
Pipeline Architecture | Non-pipeline Architecture
Pipelining is an implementation technique where multiple instructions are overlapped in execution. | All the actions (fetching, decoding, execution of instructions and storing results into memory) are grouped into a single step.
Many instructions are executed at the same time. | Only one instruction is executed at a time.
Execution time is comparatively less and execution is done in fewer cycles. | Execution takes comparatively more time or more cycles.
It has a high throughput (number of instructions executed per unit time). | It has a low throughput.
The pipeline is filled by the CPU scheduler from a pool of work which is waiting to occur. | In a non-pipelined system, the CPU scheduler chooses the instruction from the pool of waiting instructions when an execution unit signals that it is free.
CPU scheduling is the process of determining which process will own the CPU for execution while another process is on hold.
Principle of Pipeline
• The problem is divided into a series of subtasks that have to be completed one after another.
• Each subtask can be executed by dedicated hardware that operates concurrently with the other pipeline stages.
• All pipeline stages work in sequence, each receiving its input from the previous stage and transferring its output to the next stage.
• There is a constant stream of tasks into the pipe, and execution is overlapped at the subtask level.
• Each stage gets a new input at the beginning of the clock cycle, has a single clock cycle available to perform its operation, and delivers its result to the next stage by the start of the subsequent clock cycle.
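A minimal simulation of this principle is sketched below: every occupied stage works on its own input during a clock cycle, and the outputs shift one stage forward before the next cycle. The three stage functions are hypothetical placeholders, and `None` is reserved to mean an empty stage.

```python
def simulate(tasks, stages):
    """Push tasks through the stage functions; return (results, clock cycles used)."""
    k = len(stages)
    inputs = [None] * k            # inputs[s]: value latched at the input of stage s
    pending = list(tasks)
    results, clocks = [], 0
    while pending or any(v is not None for v in inputs):
        if pending:                # a new task enters the first stage
            inputs[0] = pending.pop(0)
        # Within one clock cycle, all occupied stages operate concurrently.
        outputs = [stages[s](inputs[s]) if inputs[s] is not None else None
                   for s in range(k)]
        clocks += 1
        if outputs[-1] is not None:
            results.append(outputs[-1])   # the last stage delivers a finished task
        # By the next clock cycle each output is the input of the following stage.
        inputs = [None] + outputs[:-1]
    return results, clocks

# Hypothetical three-stage pipeline computing (x + 1) * 2 - 3.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
results, clocks = simulate([1, 2, 3], stages)
print(results, clocks)
```

With three tasks and three stages the loop runs for 3 + (3 - 1) = 5 clock cycles, which matches the K + (n - 1) count.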
Advantages of Pipeline

• It can reduce the number of cycles needed to execute multiple instructions.
• It can increase the number of instructions that are processed together and lower the delay between completed instructions, which increases the throughput of the system.
• The more pipeline stages a processor has, the more instructions it can process at once. Today's microprocessor manufacturers use pipelines of 2 to 40 stages.
• If pipelining is used, the ALU can be made to run faster, but the design becomes more complex.
• In theory, pipelining increases performance over an un-pipelined core by a factor equal to the number of stages.
• Pipelined CPUs generally work at a higher clock frequency than the RAM clock frequency, increasing the computer's overall performance.
Disadvantages of Pipeline
• The design of a pipelined processor is complex and costly compared to a non-pipelined processor.
• A non-pipelined processor has a well-defined instruction throughput. The performance of a pipelined processor is much harder to predict and may vary widely between programs.
• Branch instructions cause delays: the processor cannot know in advance where to read the next instruction from and must wait for the branch instruction to finish, so the pipeline has to be cleared and is left empty.
• Problems occur when instructions that depend on each other are executed concurrently. Any instruction in the pipeline might depend on the output of a previous instruction, which forces the pipeline control logic to wait and insert wasted clock cycles into the pipeline until the dependency is resolved.
Types of Pipeline

Pipeline
  Linear pipeline
    Synchronous pipeline
      Uniform delay pipeline
      Non-uniform delay pipeline
    Asynchronous pipeline
  Non-linear pipeline
Synchronous Pipeline Model
• Each stage consists of an input buffer (latch) followed by a combinational circuit.
• The latches are fast registers that hold the intermediate results between the stages.
• The output of the combinational circuit is applied to the input latch of the next segment.
• Every latch is synchronized with the same clock pulse. When the clock input is high, all latches transfer their data to the next stage simultaneously.

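Input -> L -> S1 -> L -> S2 -> L -> ... -> L -> Sn -> Output
(L = latch, S1 ... Sn = stages; all latches are driven by the common clock pulse)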

Difference between latch and register: A latch loses the information (data) when it is passed on to the next stage. A register retains the information until it is cleared.
Types of Synchronous Pipeline
• Uniform delay pipeline: In this type of pipeline, all stages take the same time to complete an operation.
In a uniform delay pipeline,
Cycle Time (Tp) = Stage Delay + Latch Delay

• Non-uniform delay pipeline: In this type of pipeline, different stages take different times to complete an operation.
In this type of pipeline,
Cycle Time (Tp) = Maximum(Stage Delay) + Latch Delay

For example, if there are 4 stages with delays 1 ns, 2 ns, 3 ns and 4 ns, then
Tp = Maximum(1 ns, 2 ns, 3 ns, 4 ns) + Latch Delay
= 4 ns + Latch Delay
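Both cycle-time rules fit in one small function, because a uniform delay pipeline is simply the case where all stage delays are equal. This is a sketch; the 0.5 ns latch delay is an assumed value chosen for illustration, not a figure from the slides.

```python
def cycle_time(stage_delays, latch_delay=0.0):
    """Tp = max(stage delay) + latch delay, all in the same time unit."""
    return max(stage_delays) + latch_delay

# Non-uniform example from the text: stage delays of 1, 2, 3 and 4 ns.
tp_non_uniform = cycle_time([1, 2, 3, 4], latch_delay=0.5)

# Uniform case: every stage takes 3 ns, so the maximum is just the stage delay.
tp_uniform = cycle_time([3, 3, 3, 3], latch_delay=0.5)

print(tp_non_uniform, tp_uniform)
```

The slowest stage sets the clock for the whole pipeline, so the faster stages sit idle for part of every cycle.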
Performance Metrics for Pipeline
• Clock period: The time required to complete a single stage (stage delay + latch delay). The time delay of every interface latch is the same, but the time delays of the combinational circuits in different stages differ, so the maximum of all stage delays is taken as the common stage duration.
Clock period for a pipeline architecture with k stages:
$\tau = \max_{1 \le i \le k} \tau_i + \tau_l$
where $\tau_i$ is the time delay of stage i and $\tau_l$ is the time delay of a latch.

• Speed-up: The speed-up of a k-stage pipelined processor over an equivalent non-pipelined processor is
$S_k = \dfrac{T_1}{T_k} = \dfrac{n k}{k + (n - 1)}$
where n is the number of instructions and k is the number of stages in the pipeline.
Performance Metrics for Pipeline
• Efficiency: The efficiency of a linear pipeline is measured by the percentage of busy time-space spans over the total time-space span.
Total time-space span = number of clock pulses for pipeline × number of stages = $[k + (n - 1)] \cdot k$
Busy time-space span = number of instructions × number of stages = $n \cdot k$
$\eta = \dfrac{n k}{[k + (n - 1)]\, k} = \dfrac{n}{k + (n - 1)}$
= number of instructions / number of clock pulses for pipeline

The larger the number of tasks flowing through the pipeline, the higher the efficiency.

• Throughput: The number of instructions that can be completed by a pipeline per unit time.
W = number of instructions / [number of clock pulses for pipeline × clock period]
$W = \dfrac{n}{[k + (n - 1)]\, \tau} = \dfrac{\eta}{\tau}$
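The three metrics can be evaluated exactly with rational arithmetic, as in the sketch below. The function names and the sample values (n = 100, k = 4, τ = 10 ns) are chosen here for illustration.

```python
from fractions import Fraction

def speedup(n, k):
    """S_k = n*k / (k + (n - 1))."""
    return Fraction(n * k, k + (n - 1))

def efficiency(n, k):
    """eta = n / (k + (n - 1)), which equals S_k / k."""
    return Fraction(n, k + (n - 1))

def throughput(n, k, tau):
    """W = n / ((k + (n - 1)) * tau) = eta / tau, in instructions per time unit."""
    return efficiency(n, k) / tau

n, k, tau = 100, 4, 10   # tau in ns
print("speed-up  :", float(speedup(n, k)))
print("efficiency:", float(efficiency(n, k)))
print("throughput:", float(throughput(n, k, tau)), "instructions per ns")

# As n grows, the speed-up approaches k and the efficiency approaches 1.
print("speed-up for n = 10**6:", float(speedup(10**6, k)))
```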
Numerical on Synchronous Pipeline
Question 1: Consider a 4-segment pipeline with stage delays (2 ns, 8 ns, 3 ns, 10 ns). Find the
time taken to execute 100 tasks in the above pipeline. [Consider that there is no latch delay]

Answer: CPU time for pipeline = (k + n - 1) × Tp [k = stages, n = tasks, Tp = clock cycle time]
As the above pipeline is a non-uniform pipeline, Tp = max(2, 8, 3, 10) ns = 10 ns
CPU time for pipeline = (4 + 100 - 1) × 10 ns = 1030 ns

Question 2: A 4-stage pipeline with stage delays 150 ns, 120 ns, 160 ns, 140 ns. Latches have a
delay of 5 ns each. Find the time taken to execute 1000 tasks in the pipeline.

Answer: CPU time for pipeline = (k + n - 1) × Tp [k = stages, n = tasks, Tp = clock cycle time]
For a non-uniform pipeline, Tp = max(150, 120, 160, 140) ns + latch delay = 160 ns + 5 ns = 165 ns
CPU time for pipeline = (4 + 1000 - 1) × 165 ns
= 1003 × 165 ns = 165,495 ns ≈ 165.5 µs
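Both answers can be reproduced with a few lines of code. This is a sketch with an illustrative function name; it assumes a stall-free pipeline, as the questions do.

```python
def pipeline_time(stage_delays, n_tasks, latch_delay=0):
    """Total time = (k + n - 1) * Tp, with Tp = max(stage delay) + latch delay."""
    k = len(stage_delays)
    tp = max(stage_delays) + latch_delay
    return (k + n_tasks - 1) * tp

# Question 1: four stages, 100 tasks, no latch delay (times in ns).
q1 = pipeline_time([2, 8, 3, 10], 100)

# Question 2: four stages, 1000 tasks, 5 ns latch delay (times in ns).
q2 = pipeline_time([150, 120, 160, 140], 1000, latch_delay=5)

print(q1, "ns")
print(q2, "ns =", q2 / 1000, "microseconds")
```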
Asynchronous Pipeline Model

• There are no latches in this model.

• The data flow between adjacent stages is controlled by a handshaking protocol.

• When stage Si is ready to transmit intermediate data, it sends a Ready signal to the next stage Si+1.

• Stage Si+1 sends an acknowledgement (ACK) signal when it is ready to accept the incoming data.

• Stage Si then sends the data to Si+1.

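Input -> S1 -> S2 -> ... -> Sk -> Output
(data flows forward between adjacent stages; a Ready signal travels forward and an ACK signal travels backward across each connection)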
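The Ready/ACK exchange described above can be modelled in software as follows. This is a sequential sketch with hypothetical class and function names: it models the order of the signals, not real circuit timing or true concurrency, and the two stage functions are placeholders.

```python
class Stage:
    def __init__(self, name, work):
        self.name = name
        self.work = work     # operation this stage applies to incoming data
        self.slot = None     # processed data held by the stage; None means free

    def can_accept(self):
        return self.slot is None

def transfer(sender, receiver, log):
    """One Ready/ACK/data exchange between adjacent stages; True if data moved."""
    if sender.slot is None:
        return False                                   # nothing to transmit
    log.append(f"{sender.name} -> {receiver.name}: Ready")
    if not receiver.can_accept():
        return False                                   # no ACK yet, sender waits
    log.append(f"{receiver.name} -> {sender.name}: ACK")
    receiver.slot = receiver.work(sender.slot)         # data is sent and processed
    sender.slot = None
    log.append(f"{sender.name} -> {receiver.name}: data")
    return True

def run(stages, inputs):
    log, outputs, pending = [], [], list(inputs)
    while pending or any(s.slot is not None for s in stages):
        last = stages[-1]
        if last.slot is not None:                      # final stage emits its result
            outputs.append(last.slot)
            last.slot = None
        for i in range(len(stages) - 2, -1, -1):       # sweep from back to front
            transfer(stages[i], stages[i + 1], log)
        if pending and stages[0].can_accept():
            stages[0].slot = stages[0].work(pending.pop(0))
    return outputs, log

stages = [Stage("S1", lambda x: x + 1), Stage("S2", lambda x: x * 2)]
outputs, log = run(stages, [1, 2])
print(outputs)
print("\n".join(log))
```

Because this driver drains the last stage first and sweeps from back to front, a receiver is always free when asked; the branch that withholds the ACK only matters if stages are driven in a different order or take varying amounts of time.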


Non-linear Pipeline Model
• A pipeline that contains feed-forward and feedback connections in addition to the streamline connections is called a non-linear pipeline.

• Because of the feed-forward and feedback connections, it can be reconfigured to perform different functions at different times. For this reason it is also called a dynamic pipeline. In this pipeline the output does not have to come from the last stage only; an output can be generated at any stage, depending on the function.

• Non-linear pipelines are used for recursion problems.


[Figure: non-linear pipeline with stages S1, S2 and S3 joined by streamline paths from Input, with additional feed-forward and feedback paths between stages and two outputs, Output X and Output Y]
Difference Between Linear and Non-linear Pipeline
Linear pipeline | Non-linear pipeline
Linear pipelines are static pipelines because they are used to perform fixed functions. | Non-linear pipelines are dynamic pipelines because they can be reconfigured to perform variable functions at different times.
A linear pipeline allows only streamline connections. | A non-linear pipeline allows feed-forward and feedback connections in addition to the streamline connections.
It is relatively easy to partition a given function into a sequence of linearly ordered sub-functions. | Function partitioning is relatively difficult because the pipeline stages are interconnected with loops in addition to streamline connections.
The output of the pipeline is produced from the last stage. | The output of the pipeline is not necessarily produced from the last stage.
