Pipeline 1
CSE Sem-4
CPU Configuration and Instruction Execution Operations
[Figure: CPU configuration showing Main Memory, MAR, MDR, the CPU Bus, and a Common Clock]
1. Address of the next instruction is transferred from PC to MAR. The instruction is located in MM.
2. Instruction is copied from memory to MDR.
3. Instruction is transferred to IR to decode.
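The three fetch steps above can be sketched as a tiny register-transfer model. This is only an illustration: the dictionary used as main memory, the sample instruction strings, and the function name `fetch` are assumptions, not part of the slides.

```python
def fetch(memory, pc):
    """Model the three fetch steps; register names follow the slide."""
    mar = pc              # step 1: address of the next instruction, PC -> MAR
    mdr = memory[mar]     # step 2: instruction copied from memory into MDR
    ir = mdr              # step 3: instruction moved to IR for decoding
    return {"MAR": mar, "MDR": mdr, "IR": ir}


# Hypothetical main memory holding two instructions.
memory = {0: "LOAD R1, 100", 1: "ADD R1, R2"}
registers = fetch(memory, pc=0)
print(registers["IR"])
```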
Why Pipelining?
CPU performance can be improved by:
• Improving the hardware by introducing faster circuits.
• Arranging the hardware so that more than one operation can be performed at the same time.
Pipelining | Non-pipelining
Execution time is comparatively less, and execution is done in fewer cycles. | Execution takes comparatively more time or a larger number of cycles.
It has a high throughput (number of instructions executed per unit time). | It has a low throughput.
The pipeline is filled by the CPU scheduler from a pool of work which is waiting to occur. | In a non-pipelining system, the CPU scheduler chooses the instruction from the pool of waiting instructions when an execution unit signals that it is free.
CPU scheduling is the process of determining which process will own the CPU for execution while another process is
on hold.
Principle of Pipeline
• The problem is divided into a series of
tasks that have to be completed one
after another.
• Each subtask can be executed by
dedicated hardware that operates concurrently with
other pipeline stages.
• All pipeline stages work sequentially,
receiving their input from the previous
stage and transferring their output to the
next stage.
• There is a constant stream of tasks into the pipe, and execution is overlapped at the
subtask level.
• Each stage gets a new input at the beginning of the clock cycle, has a single clock
cycle available to perform the needed operations, and delivers its result
to the next stage by the start of the subsequent clock cycle.
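The overlapped execution described above can be visualised with a small space-time table. The sketch below assumes an ideal pipeline in which every stage takes exactly one clock cycle and a new task enters on every cycle; the function name `space_time` and the task labels `T1`, `T2`, ... are illustrative.

```python
def space_time(n_tasks, k_stages):
    """Return one row per stage showing which task occupies it in each cycle."""
    cycles = k_stages + n_tasks - 1
    rows = []
    for stage in range(k_stages):
        row = []
        for cycle in range(cycles):
            task = cycle - stage  # task i reaches stage s in cycle i + s
            row.append(f"T{task + 1}" if 0 <= task < n_tasks else "--")
        rows.append(row)
    return rows


for number, row in enumerate(space_time(n_tasks=4, k_stages=3), start=1):
    print(f"S{number}: " + " ".join(row))
```

Each column is one clock cycle. Once the pipe is full, every stage is busy with a different task in the same cycle.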
Advantages of Pipeline
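Types of pipeline: Synchronous Pipeline and Asynchronous Pipeline
[Figure: synchronous pipeline. Input, L, S1, L, S2, L, ..., L, Sn, Output, with a common Clock Pulse applied to the latches (L)]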
Difference between latch and register: A latch loses the information (data) when it is passed on
to the next stage. A register retains the information until it is cleared.
Types of Synchronous Pipeline
Uniform delay pipeline: In this type of pipeline, all the stages take the same time to
complete an operation.
In uniform delay pipeline,
Cycle Time (Tp) = Stage Delay + Latch Delay
Non-uniform delay pipeline: In this type of pipeline, different stages take different
amounts of time to complete an operation.
In this type of pipeline,
Cycle Time (Tp) = Maximum(Stage Delay) + Latch Delay
For example, if there are 4 stages with delays of 1 ns, 2 ns, 3 ns, and 4 ns, then
Tp = Maximum(1 ns, 2 ns, 3 ns, 4 ns) + Latch Delay
= 4 ns + Latch Delay
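The cycle-time rule can be written as a one-line helper. This is a minimal sketch: the function name `cycle_time` is illustrative, and the 0.5 ns latch delay in the usage line is an assumed value, since the example above leaves the latch delay unspecified.

```python
def cycle_time(stage_delays_ns, latch_delay_ns=0.0):
    """Cycle time Tp = slowest stage delay + latch delay (all in ns)."""
    return max(stage_delays_ns) + latch_delay_ns


# Stage delays from the example above; the 0.5 ns latch delay is assumed.
print(cycle_time([1, 2, 3, 4], latch_delay_ns=0.5))
```

A uniform delay pipeline is the special case where every entry in the list is the same, so the maximum is simply the common stage delay.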
Performance Metrics for Pipeline
Clock Period: It is the time required to complete a single stage (latch delay + stage delay). The time
delay of each interface latch is similar, but the time delays of the combinational circuits in
different stages differ. So the maximum of all stage delays is taken as the common stage duration.
Total clock period for pipeline architecture:
$\tau = \max_{1 \le i \le k} \{\tau_i\} + \tau_l$
where $\tau_i$ is the time delay for stage i, for i = 1 to k
$\tau_l$ is the time delay for a latch
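Speed-up: The speed-up of a k-stage pipelined processor over an equivalent non-pipelined processor is:
Sk = T1 / Tk = nk / [k + (n-1)], where n is the number of instructions,
k is the number of stages each instruction passes through,
T1 = nk clock periods without pipelining, and Tk = k + (n-1) clock periods with pipelining.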
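To see how the speed-up behaves as the number of instructions grows, here is a short sketch comparing a k-stage pipeline against a non-pipelined unit that needs k cycles per instruction. The function name `speedup` and the sample values of n are illustrative choices.

```python
def speedup(n, k):
    """Speed-up of a k-stage pipeline over a non-pipelined unit for n instructions."""
    cycles_non_pipelined = n * k      # every instruction takes all k cycles in turn
    cycles_pipelined = k + (n - 1)    # k cycles to fill, then one result per cycle
    return cycles_non_pipelined / cycles_pipelined


for n in (1, 10, 100, 10_000):
    print(n, round(speedup(n, k=4), 3))
```

For a single instruction there is no gain, and as n becomes large the speed-up approaches k, the number of stages.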
Efficiency: Efficiency of a linear pipeline is measured by the percentage of busy time-space spans over
the total time-space span.
Total time-space span = number of clock pulses for pipeline * total stages
= [k+(n-1)] * k
Busy time-space span = number of instructions * number of stages
= n*k
Efficiency η = [n*k] / ([k+(n-1)] * k) = n / [k+(n-1)]
= number of instructions / number of clock pulses for pipeline
• The larger the number of tasks flowing through the pipeline, the higher the efficiency.
Throughput: The number of instructions that can be completed by a pipeline per unit time.
W = number of instructions / [number of clock pulses for pipeline * clock period]
= n / ([k+(n-1)] * τ)
= η / τ
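The efficiency and throughput formulas can be checked numerically with a short sketch. The values n = 100, k = 4, and a 10 ns clock period are hypothetical, chosen only for illustration, and the function names are not from the slides.

```python
def efficiency(n, k):
    """Fraction of busy time-space span: n / (k + (n - 1))."""
    return n / (k + (n - 1))


def throughput(n, k, tau_ns):
    """Instructions completed per nanosecond: n / ((k + (n - 1)) * tau)."""
    return n / ((k + (n - 1)) * tau_ns)


n, k, tau_ns = 100, 4, 10.0  # hypothetical values
eta = efficiency(n, k)
w = throughput(n, k, tau_ns)
print(f"efficiency = {eta:.4f}, throughput = {w:.5f} instructions/ns")
```

Dividing the efficiency by the clock period gives the same number as the throughput function, which is the identity W = η/τ stated above.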
Numerical on Synchronous Pipeline
Question 1: Consider a 4-segment pipeline with stage delays (2 ns, 8 ns, 3 ns, 10 ns). Find the
time taken to execute 100 tasks in the above pipeline. [Consider that there is no latch delay]
Answer: CPU time for Pipeline= (k + n – 1) Tp [ k = stages, n = tasks, Tp = Clock cycle time]
As the above pipeline is a non-uniform pipeline, Tp = max(2, 8, 3, 10) = 10 ns
CPU time for Pipeline = (4 + 100 – 1) 10 ns = 1030 ns
Question 2: Consider a 4-stage pipeline with stage delays of 150 ns, 120 ns, 160 ns, and 140 ns. Latches have a
delay of 5 ns each. Find the time taken to execute 1000 tasks in the pipeline.
Answer: CPU time for Pipeline = (k + n – 1) Tp [ k = stages, n = tasks, Tp = Clock cycle time]
For a non-uniform pipeline, Tp = max(150, 120, 160, 140) + latch delay = 160 ns + 5 ns = 165 ns
CPU time for Pipeline = (4 + 1000 – 1) × 165 ns
= 1003 × 165 ns = 165,495 ns ≈ 165.5 µs
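Both answers can be reproduced with a few lines of code. This is a minimal sketch; the function name `pipeline_time_ns` is illustrative, and it assumes the cycle time is the slowest stage delay plus the latch delay, as in the non-uniform delay formula.

```python
def pipeline_time_ns(stage_delays_ns, n_tasks, latch_delay_ns=0):
    """Total time (k + n - 1) * Tp, with Tp = max stage delay + latch delay."""
    k = len(stage_delays_ns)
    tp = max(stage_delays_ns) + latch_delay_ns
    return (k + n_tasks - 1) * tp


q1 = pipeline_time_ns([2, 8, 3, 10], n_tasks=100)
q2 = pipeline_time_ns([150, 120, 160, 140], n_tasks=1000, latch_delay_ns=5)
print(q1, q2)  # 1030 165495
```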
Asynchronous Pipeline Model
When stage Si is ready to transmit intermediate data, it sends a ready signal to the next stage Si+1.
Stage Si+1 sends an acknowledgement signal indicating that it is ready to accept the incoming data.
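The ready/acknowledge exchange can be approximated in software. The sketch below is a sequential simulation under simplifying assumptions, not a hardware model: each stage holds at most one item, a stage counts as "ready" when it holds a result, and the next stage "acknowledges" only when it is empty. The function name `run_handshake` and the two sample stage functions are hypothetical.

```python
def run_handshake(items, stages):
    """Push items through stages; data moves only when the receiver is free."""
    pending = list(items)           # inputs waiting to enter the first stage
    held = [None] * len(stages)     # result currently held by each stage
    outputs = []
    last = len(stages) - 1
    while pending or any(h is not None for h in held):
        # Visit stages from last to first so a freed stage can accept new data.
        for i in reversed(range(len(stages))):
            if held[i] is None:
                continue            # nothing to send, so no ready signal
            if i == last:
                outputs.append(held[i])                # final result leaves the pipe
                held[i] = None
            elif held[i + 1] is None:                  # receiver is free: acknowledge
                held[i + 1] = stages[i + 1](held[i])   # transfer and process
                held[i] = None
        if pending and held[0] is None:
            held[0] = stages[0](pending.pop(0))
    return outputs


# Hypothetical two-stage pipeline: add one, then double.
print(run_handshake([1, 2, 3], [lambda x: x + 1, lambda x: x * 2]))
```

No global clock appears in the loop: a transfer happens only when the sender has data and the receiver is free, which is the essence of the handshake.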
Because of its feed-forward and feedback connections, a non-linear pipeline can be reconfigured to perform
different functions at different times. For this reason, it is also called a Dynamic Pipeline. In this pipeline,
the output does not have to come from the last stage only. Output can be generated at any stage for
different functions.
[Figure: non-linear pipeline with stages S1, S2, S3 between Input and Output Y, connected by streamline paths, a feed-forward path, and a feedback path]
Difference Between Linear and Non-linear Pipeline
Linear pipeline | Non-linear pipeline
Linear pipelines are static pipelines because they are used to perform fixed functions. | Non-linear pipelines are dynamic pipelines because they can be reconfigured to perform variable functions at different times.
The output of the pipeline is produced from the last stage. | The output of the pipeline is not necessarily produced from the last stage.