BCS-29 Advanced Computer Architecture: Linear & Nonlinear Pipelines, Instruction Pipelines & Arithmetic Operations

Summary:
1. Pipelining is a technique used in computer architecture where multiple instructions are overlapped in execution to increase throughput. It works like an assembly line: tasks are divided into subtasks performed concurrently by specialized hardware stages.
2. Linear pipelines have a fixed sequence of stages connected in a linear fashion, while nonlinear pipelines allow feedback and reconfiguration between stages.
3. Performance metrics such as speedup, efficiency, and throughput are used to analyze pipelined processors and how well they utilize the hardware stages. Stalls and dependencies between instructions must be addressed.


BCS-29

Advanced Computer Architecture


Pipelined Processing
Linear & Nonlinear pipelines
Instruction Pipelines & Arithmetic Operations
Principles of Pipelining
• A pipeline may be compared directly with an assembly line in a manufacturing plant:
  • The input task or process is subdivided into a sequence of subtasks;
  • Each subtask is executed by a specialized hardware stage;
  • Many such hardware stages operate concurrently;
  • When successive tasks are streamed into the pipeline, they are executed in an overlapped fashion at the subtask level.
• The decomposition of a task into the correct sequence of subtasks is crucial to the performance of the pipeline.
• The slowest subtask is the bottleneck of the pipeline.



Linear Pipeline
• A linear pipeline processor is a cascade of processing stages that are linearly connected to perform a fixed function over a stream of data flowing from one end to the other.
• Linear pipelines are static pipelines because they are used to perform fixed functions.
• On the basis of how data flow along the pipeline is controlled, linear pipelines are modelled in two categories:
  • Synchronous pipeline
    • Clocked latches between stage Si and stage Si+1
    • Equal delays in all stages
  • Asynchronous pipeline (handshaking)



Asynchronous Pipeline

[Figure: an asynchronous pipeline of stages S1, S2, …, Sk; data enter at the Input and leave at the Output, and each pair of adjacent stages exchanges Ready/Ack handshake signals.]

• Data flow between adjacent stages in an asynchronous pipeline is controlled by a handshaking protocol.
• Asynchronous pipelines are useful in designing communication channels in message-passing multicomputers.
• Asynchronous pipelines may have a variable throughput rate, since a different amount of delay may be experienced in different stages.



Synchronous Pipeline

[Figure: a synchronous pipeline with latches L0, L1, L2, …, Lk-1, Lk surrounding stages S1, …, Sk, all driven by a common clock; τ denotes the clock period, τM the maximum stage delay, and d the latch delay.]

• Clocked latches are used to interface between stages. On the arrival of a clock pulse, all latches transfer data to the next stage simultaneously.
• The pipeline stages are combinational logic circuits. It is desired to have approximately equal delays in all stages.
• These delays determine the clock period and thus the speed of the pipeline.
Reservation Table
• The utilization pattern of successive stages in a pipeline is specified by a Reservation Table.

Reservation table of a four-stage linear pipeline (one task):

        T1  T2  T3  T4
   S1   X
   S2       X
   S3           X
   S4               X

Reservation table of 5 tasks on the four-stage linear pipeline:

        T1  T2  T3  T4  T5  T6  T7  T8
   S1   X   X   X   X   X
   S2       X   X   X   X   X
   S3           X   X   X   X   X
   S4               X   X   X   X   X



Clock Period and Frequency
Let τi be the time delay of the logic circuitry in stage Si and d the time delay of each interface latch. Then:
• The clock period τ of a linear pipeline is
      τ = max{τi} + d = τM + d
• The frequency is the reciprocal of the clock period:
      f = 1 / τ = 1 / (τM + d)
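As a quick numerical illustration, the Python sketch below evaluates these two formulas; the stage and latch delays are assumed example values, not figures from the course.

# Minimal sketch: clock period and frequency of a linear pipeline.
# The stage delays and the latch delay are hypothetical example values (ns).

def clock_period(stage_delays, latch_delay):
    """tau = max{tau_i} + d: the slowest stage plus the latch delay."""
    return max(stage_delays) + latch_delay

stage_delays = [10, 12, 9, 11]   # tau_1 .. tau_4 in nanoseconds (assumed)
d = 1                            # latch delay in nanoseconds (assumed)

tau = clock_period(stage_delays, d)    # 12 + 1 = 13 ns
f = 1.0 / tau                          # in GHz, since tau is in ns
print(f"clock period = {tau} ns, frequency = {f:.3f} GHz")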



Speedup, Efficiency & Throughput
• Speedup
  • Ideally, a linear pipeline of k stages can process n tasks in k + (n − 1) clock cycles, so the total time required is
        Tk = [k + (n − 1)] τ
  • The time required for non-pipelined processing is
        T1 = n k τ
  • Speedup factor:
        Sk = T1 / Tk = n k / [k + (n − 1)]
  • The maximum speedup is Sk → k as n → ∞.
  • This maximum speedup is very difficult to achieve in practice because of the dependence structure of the program.



Speedup, Efficiency & Throughput
• Pipeline efficiency:
      η = Sk / k = n / [k + (n − 1)]
  So efficiency η → 1 as n → ∞, and the lower bound of η is 1/k when n = 1.
• Throughput, i.e., the number of tasks performed per unit time:
      Hk = n / {[k + (n − 1)] τ} = n f / [k + (n − 1)]
  The maximum throughput is f, reached when efficiency → 1 as n → ∞.
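The following Python sketch evaluates these metrics for an assumed 4-stage pipeline, showing the speedup approaching k as n grows; k, n, and τ are illustrative values only.

# Minimal sketch of the performance metrics above (assumed example values).

def pipeline_metrics(k, n, tau=1.0):
    T1 = n * k * tau                  # non-pipelined time
    Tk = (k + (n - 1)) * tau          # pipelined time
    speedup = T1 / Tk                 # S_k = n k / (k + n - 1)
    efficiency = speedup / k          # eta = S_k / k
    throughput = n / Tk               # H_k = n / ((k + n - 1) tau)
    return speedup, efficiency, throughput

for n in (1, 4, 64, 4096):            # speedup approaches k = 4 as n grows
    s, e, h = pipeline_metrics(k=4, n=n)
    print(f"n={n:5d}  speedup={s:5.2f}  efficiency={e:5.2f}  throughput={h:5.3f}")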



Linear Instruction Pipelines
Assume the following instruction execution phases:
• Fetch (F)
• Decode (D)
• Operand Fetch (O)
• Execute (E)
• Write Results (W)

Overlapped execution of three instructions:

Cycle   1    2    3    4    5    6    7
F       I1   I2   I3
D            I1   I2   I3
O                 I1   I2   I3
E                      I1   I2   I3
W                           I1   I2   I3
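The schedule above can be generated mechanically. The sketch below (stage names follow the slide; the instruction count of three is just an example) prints the same space-time diagram.

# Minimal sketch: space-time diagram of an ideal 5-stage instruction pipeline
# with no stalls. Instruction Ii enters stage number s (0-based) in cycle i + s.

STAGES = ["F", "D", "O", "E", "W"]

def space_time(n_instr):
    """Return {stage: {cycle: instruction}} for n_instr overlapped instructions."""
    table = {stage: {} for stage in STAGES}
    for i in range(1, n_instr + 1):
        for s, stage in enumerate(STAGES):
            table[stage][i + s] = f"I{i}"
    return table

diagram = space_time(3)
for stage in STAGES:
    row = [diagram[stage].get(c, "  ") for c in range(1, 8)]
    print(stage, " ".join(f"{cell:>2}" for cell in row))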
Dependencies
• Data Dependency
(Operand is not ready yet)
• Instruction Dependency
(Branching)
Will that Cause a Problem?

Data Dependency
I1 -- Add R1, R2, R3
I2 -- Sub R4, R1, R5      (I2 needs R1, which I1 has not yet written)

Cycle   1    2    3    4    5    6
F       I1   I2
D            I1   I2
O                 I1   I2
E                      I1   I2
W                           I1   I2

I2 reaches Operand Fetch (cycle 4) before I1 writes R1 (cycle 5).

Solutions:
• Stall
• Forwarding
• Write and read in one cycle
• …
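A hazard of this kind can be detected by comparing register operands. The tiny sketch below (the dictionary encoding of instructions and the helper name are assumed for illustration) flags the read-after-write dependence between I1 and I2.

# Minimal sketch: detecting the read-after-write (RAW) hazard above.
# The instruction encoding is assumed for illustration.

def raw_hazard(producer, consumer):
    """True if the consumer reads a register that the producer writes."""
    return producer["dest"] in consumer["sources"]

i1 = {"op": "Add", "dest": "R1", "sources": ["R2", "R3"]}
i2 = {"op": "Sub", "dest": "R4", "sources": ["R1", "R5"]}

print(raw_hazard(i1, i2))   # True: I2 needs R1 before I1 has written it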
Instruction Dependency
I1 -- Branch
I2 -- …                   (which instruction to fetch next depends on the branch outcome)

Cycle   1    2    3    4    5    6
F       I1   I2
D            I1   I2
O                 I1   I2
E                      I1   I2
W                           I1   I2

Solutions:
• Stall
• Predict branch taken
• Predict branch not taken
• …
Non-Linear Pipelines
• Non-linear pipelines are dynamic pipelines because they can be reconfigured to perform variable functions at different times.
• A non-linear pipeline allows feed-forward and feedback connections in addition to the streamline connections.

[Figure: a three-stage nonlinear pipeline S1, S2, S3 with feed-forward and feedback connections; it can produce two different outputs, X and Y.]



Difference Between Linear and Non-Linear Pipelines

Linear Pipeline vs. Non-Linear Pipeline:
• Linear pipelines are static pipelines because they are used to perform fixed functions. Non-linear pipelines are dynamic pipelines because they can be reconfigured to perform variable functions at different times.
• A linear pipeline allows only streamline connections. A non-linear pipeline allows feed-forward and feedback connections in addition to the streamline connections.
• It is relatively easy to partition a given function into a sequence of linearly ordered subfunctions. In a non-linear pipeline, function partitioning is relatively difficult because the stages are interconnected with loops in addition to streamline connections.
• The output of a linear pipeline is produced from the last stage. The output of a non-linear pipeline is not necessarily produced from the last stage.
• The reservation table of a linear pipeline is trivial in the sense that data flow in a linear streamline. The reservation table of a non-linear pipeline is non-trivial in the sense that there is no single linear streamline for data flow.
• Static pipelining is specified by a single reservation table. Dynamic pipelining is specified by more than one reservation table.
• All initiations to a static pipeline use the same reservation table. A dynamic pipeline may allow different initiations to follow a mix of reservation tables.



Reservation Table

[Figure: the three-stage nonlinear pipeline again, with its two outputs X and Y.]

• There are two reservation tables, corresponding to a function X and a function Y, respectively. Each function evaluation is specified by one reservation table.
• A dynamic pipeline may be specified by more than one reservation table.
• Each reservation table displays the time-space flow of data through the pipeline for one function evaluation.
• Different functions follow different paths through the pipeline.
• The number of columns in a reservation table is called the evaluation time of the given function.



Reservation Table
Reservation table for function X:

        1   2   3   4   5   6   7   8
   S1   X                   X       X
   S2       X       X
   S3           X       X       X

Reservation table for function Y:

        1   2   3   4   5   6
   S1   Y               Y
   S2           Y
   S3       Y       Y       Y
Reservation Table
• The check marks in each row of the reservation table
correspond to the time instants (cycles) that a particular stage
will be used.
• There may be multiple check marks in a row, which means
repeated usage of the same stage in different cycles.
• Contiguous check marks in a row simply imply the extended
usage of a stage over more than one cycle.
• Multiple check marks in a column mean that multiple stages
need to be used in parallel during a particular clock cycle.



Latency Analysis
• Latency
• The number of time units [clock cycles] between two initiations of a
pipeline is the Latency between them.
• A latency of K means that two initiations are separated by K clock
cycles.
• Collision
• Any attempt by two or more initiations to use the same pipeline stage
at the same time will cause a collision.
• A collision implies resource conflicts between two initiations in the
pipeline. Therefore, all collisions must be avoided in scheduling a
sequence of pipeline initiations.
Forbidden Latency: Latencies that cause collisions.
Permissible Latency: Latencies that will not cause collisions.
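A small Python sketch can make this rule concrete. It assumes the mark positions shown in the reconstructed reservation table for function X above (S1 at cycles 1, 6, 8; S2 at 2, 4; S3 at 3, 5, 7); the helper name is chosen for illustration.

# Minimal sketch: deriving the forbidden latencies of a pipeline from its
# reservation table. Marks are listed as the cycle numbers used by each stage.

RESERVATION_X = {
    "S1": [1, 6, 8],
    "S2": [2, 4],
    "S3": [3, 5, 7],
}

def forbidden_latencies(table):
    """Distances between any two marks in the same row cause collisions."""
    forbidden = set()
    for marks in table.values():
        for i, a in enumerate(marks):
            for b in marks[i + 1:]:
                forbidden.add(abs(b - a))
    return sorted(forbidden)

print(forbidden_latencies(RESERVATION_X))   # [2, 4, 5, 7]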



Forbidden Latencies
• X after X with latency 2 (cells where X1 and X2 collide hold both marks):

        1     2     3     4      5      6     7      8
   S1   X1          X2                  X1           X1 X2
   S2         X1          X1 X2               X2
   S3               X1           X1 X2        X1 X2

• X after X with latency 5 (collision in S1 at cycle 6):

        1     2     3     4     5     6      7     8
   S1   X1                            X1 X2        X1
   S2         X1          X1                 X2
   S3               X1          X1           X1    X2

Forbidden Latencies
• X after X with latency 4 (collision in S3 at cycle 7):

        1     2     3     4     5     6     7      8
   S1   X1                      X2    X1           X1
   S2         X1          X1          X2           X2
   S3               X1          X1          X1 X2

• X after X with latency 7 (collision in S1 at cycle 8):

        1     2     3     4     5     6     7     8
   S1   X1                            X1          X1 X2
   S2         X1          X1
   S3               X1          X1          X1

Permissible Latencies
• X after X with latency 1 (no stage is used by both initiations in the same cycle):

        1     2     3     4     5     6     7     8
   S1   X1    X2                      X1    X2    X1
   S2         X1    X2    X1    X2
   S3               X1    X2    X1    X2    X1    X2

• X after X with latency 3:

        1     2     3     4     5     6     7     8
   S1   X1                X2          X1          X1
   S2         X1          X1    X2          X2
   S3               X1          X1    X2    X1    X2



Nonlinear Pipeline Design
• Latency Sequence
  • A sequence of permissible latencies between successive task initiations.
  • Example: latency sequence 1, 8.
• Latency Cycle
  • A latency cycle is a latency sequence that repeats the same subsequence (cycle) indefinitely.
  • Example: latency cycle (1, 8) → 1, 8, 1, 8, 1, 8, …
• Average Latency
  • The average latency of a latency cycle is obtained by dividing the sum of all latencies by the number of latencies along the cycle.
  • For the cycle (1, 8): (1 + 8)/2 = 4.5.
• Constant Cycle → a latency cycle that contains only one latency value, e.g., (3).
Objective → obtain the shortest average latency between initiations without causing collisions.



Latency Cycle (1, 8)

Cycle  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
S1     X1  X2              X1  X2  X1  X2  X3  X4              X3  X4  X3  X4  X5  X6
S2         X1  X2  X1  X2                  X3  X4  X3  X4                      X5  X6
S3             X1  X2  X1  X2  X1  X2          X3  X4  X3  X4  X3  X4              X5

Average latency = (1 + 8)/2 = 4.5



Scheduling events
• Collision-Free Scheduling:
• When scheduling events in a nonlinear pipeline, the main objective is
to obtain the shortest average latency between initiations without
causing collisions.
• Collision vector
  • By examining the reservation table, one can distinguish the set of permissible latencies from the set of forbidden latencies.
  • The combined set of permissible and forbidden latencies can be conveniently displayed by a collision vector:

      C = (Cm, Cm-1, …, C2, C1),  m ≤ n − 1

  • n = number of columns in the reservation table (m is the largest forbidden latency)
  • Ci = 1 if latency i causes a collision, Ci = 0 otherwise
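Continuing the sketch above, the following (assumed) helper packs a set of forbidden latencies into a collision-vector string; the forbidden set {2, 4, 5, 7} is the one derived for function X.

# Minimal sketch: collision vector C = (Cm ... C1), most significant bit first.

def collision_vector(forbidden):
    m = max(forbidden)                              # largest forbidden latency
    bits = ["1" if i in forbidden else "0" for i in range(m, 0, -1)]
    return "".join(bits)

print(collision_vector({2, 4, 5, 7}))   # 1011010, as in the example below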



Collision Vector
• Forbidden latencies: 2, 4, 5, 7
• Collision vector = 1011010
• The next state is obtained by bitwise ORing the shifted state register with the initial collision vector:

      C.V. = 1011010 (first state)
      0101101   C.V. shifted right by 1 bit (latency 1)
      1011010   initial C.V.
      -------   OR
      1111111   next state
State Transition
• State transitions are generated using an m-bit right shift register, where m is the maximum forbidden latency.
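A minimal sketch of this shift-and-OR rule, using the collision vector 1011010 from the example; the bit convention (bit i−1 represents latency i) and the helper names are chosen for illustration.

# Minimal sketch: shift-and-OR state transitions for the C.V. 1011010.
# A state bit of 1 at position i-1 means an initiation i cycles later collides.

INITIAL_CV = 0b1011010          # forbidden latencies 2, 4, 5, 7
M = 7                           # width = maximum forbidden latency

def next_state(state, latency):
    """Shift the state right by the chosen latency, then OR in the initial C.V."""
    return (state >> latency) | INITIAL_CV

def permissible(state):
    """Latencies 1..M whose bit is 0 in the current state."""
    return [i for i in range(1, M + 1) if not (state >> (i - 1)) & 1]

s = INITIAL_CV
print(permissible(s))                        # [1, 3, 6]
print(format(next_state(s, 1), "07b"))       # 1111111, as on the slide
print(format(next_state(s, 3), "07b"))       # 1011011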



Latency Analysis
X after X

[Figure: the collision vector held in a shift register with an OR gate; an initiation of X is granted only when the examined bit is 0. The resulting state diagram for the initial C.V. 1011010 has three states:
    1011010 --1--> 1111111
    1011010 --3--> 1011011        1011010 --6--> 1011011
    1011011 --3--> 1011011        1011011 --6--> 1011011
    any state --8+--> 1011010
Minimum-latency edges are marked with asterisks (1*, 3*).]

Cycles: (1, 8), (1, 8, 6, 8), (1, 8, 3, 8), (3), (6), (3, 8), (3, 6, 3), and many more legitimate cycles may be traced.
Latency Analysis
• Latency Sequence:
• A sequence of permissible latencies between successive initiations
• Latency Cycle:
• A latency sequence that repeats the same subsequence (cycle)
indefinitely
• Simple cycles:
• A simple cycle is a latency cycle in which each state appears only
once.
(3), (6), (8), (1, 8), (3, 8), and (6,8)
• Greedy Cycles:
  • Simple cycles whose edges are all made with minimum latencies from their respective starting states.
  • Greedy cycles must first be simple, and their average latencies must be lower than those of other simple cycles.
  • Here the greedy cycles are (1, 8) and (3); one of them yields the MAL (minimum average latency).
Minimum Average Latency (MAL)
• The minimum-latency edges in the state diagram are marked with asterisks.
• At least one of the greedy cycles will lead to the MAL.
• Bounds on the MAL:
  • The MAL is lower-bounded by the maximum number of check marks in any row of the reservation table.
  • The MAL is lower than or equal to the average latency of any greedy cycle in the state diagram.
  • The average latency of any greedy cycle is upper-bounded by the number of 1's in the initial collision vector plus 1. This is also an upper bound on the MAL.
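To make the greedy-cycle idea concrete, here is a small sketch (assumed helpers, not from the slides) that follows minimum-latency edges of the C.V. 1011010 example until a state repeats. Trying both reachable starting states reproduces the greedy cycles (1, 8) and (3), so the MAL here is 3.

# Minimal sketch: follow minimum-latency edges until the path repeats.
# Any latency greater than M is represented by M + 1 (the "8+" edge).

INITIAL_CV = 0b1011010
M = 7                                        # maximum forbidden latency

def permissible(state):
    return [i for i in range(1, M + 1) if not (state >> (i - 1)) & 1] + [M + 1]

def next_state(state, latency):
    return (state >> latency) | INITIAL_CV

def greedy_cycle(start):
    """Return the latencies along the cycle reached by greedy choices."""
    seen, path, state = {}, [], start
    while state not in seen:
        seen[state] = len(path)
        lat = min(permissible(state))        # greedy choice
        path.append(lat)
        state = next_state(state, lat)
    return path[seen[state]:]                # latencies along the repeated part

for start in (INITIAL_CV, next_state(INITIAL_CV, 3)):
    cyc = greedy_cycle(start)
    print(cyc, sum(cyc) / len(cyc))          # [1, 8] 4.5 and [3] 3.0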



Optimization of MAL
• To optimize the MAL, one needs to reach the lower bound by modifying the reservation table.
• The approach is to reduce the maximum number of check marks in any row.
• The modified reservation table must preserve the original function being evaluated.
• Non-compute delay stages can be used to increase pipeline performance, yielding a shorter MAL.
Delay Insertion:
• The purpose of delay insertion is to modify the reservation table, yielding a new collision vector.
• This leads to a modified state diagram, which may produce greedy cycles meeting the lower bound on the MAL.



Delay Insertion

[Figure: a three-stage pipeline S1, S2, S3 with feedback connections and an output, together with its reservation table (5 columns; each of S1, S2, S3 is used in two cycles) and the corresponding state diagram: the single state 1011 has a self-loop for latency 3* and returns to itself for latencies 5+.]

Forbidden latencies: 1, 2, 4
C.V. → 1011
MAL = 3
Delay Insertion

[Figure: the same pipeline with two non-compute delay stages D1 and D2 inserted, and the modified reservation table (7 columns; S1, S2, S3 each used in two cycles, D1 and D2 each used in one cycle).]

Forbidden latencies: 2, 6
C.V. → 100010


Delay Insertion

[Figure: state diagram for the modified C.V. 100010. Principal transitions:
    100010 --1*--> 110011     100010 --3--> 100110     100010 --5--> 100011
    110011 --3*--> 100110     110011 --4--> 100011
    100011 --3*--> 100110     100110 --1*--> 110011    100110 --5--> 100011
    Latencies of 7+ (and latency 4 from 100010, 100011, or 100110) lead back to 100010.
Minimum-latency edges are marked 1* and 3*.]

Greedy cycle (1, 3), resulting in a reduced MAL = (1 + 3)/2 = 2.
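As a quick check of this claim, the short sketch below (an assumed helper in the same style as the earlier ones) walks the modified collision vector greedily and reproduces the (1, 3) cycle.

# Minimal sketch: greedy walk on the modified collision vector 100010
# (forbidden latencies 2 and 6); the observed cycle (1, 3) has average 2.

CV, M = 0b100010, 6

def greedy_step(state):
    latency = min(i for i in range(1, M + 2)          # M + 1 stands for "7+"
                  if i > M or not (state >> (i - 1)) & 1)
    return latency, (state >> latency) | CV

state, latencies = CV, []
for _ in range(6):                                    # a few greedy initiations
    lat, state = greedy_step(state)
    latencies.append(lat)
print(latencies)                                      # [1, 3, 1, 3, 1, 3]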

