
Chapter One: Introduction To Pipelined Processors

The document describes the floating point execution unit of the IBM Model 91 processor. It contains data registers, arithmetic units, reservation stations, and a common data bus. Instructions are held in reservation stations with source and destination tags to track dependencies. The unit uses register tagging to enable forwarding of results to avoid stalls. Hazard detection ensures dependent instructions are executed in order by comparing their register domains and ranges. Job sequencing aims to schedule instructions without collisions in the reservation stations through latency analysis.

Uploaded by

Joona John

Chapter One

Introduction to Pipelined Processors
Principle of Designing Pipeline Processors
(Design Problems of Pipeline Processors)
Register Tagging
Example: IBM Model 91 Floating Point Execution Unit
Example : IBM Model 91-FPU
• The floating point execution unit consists of :
– Data registers
– Transfer paths
– Floating Point Adder Unit
– Multiply-Divide Unit
– Reservation stations
– Common Data Bus
• There are 3 reservation stations for the adder, named A1, A2 and A3, and 2 for the multiplier, named M1 and M2.
• Each station has source and sink registers together with their tag and control fields
• The stations hold operands for the next execution.
• 3 store data buffers (SDBs) and 4 floating point registers (FLRs) are tagged
• Busy bits in the FLRs indicate the dependence of instructions in subsequent execution
• The Common Data Bus (CDB) is used to transfer operands
• There are 11 units that supply information to the CDB: 6 FLBs, 3 adders and 2 multiply/divide units
• Tags for these units are:
Unit   Tag     Unit   Tag
FLB1   0001    ADD1   1010
FLB2   0010    ADD2   1011
FLB3   0011    ADD3   1100
FLB4   0100    M1     1000
FLB5   0101    M2     1001
FLB6   0110
• Internal forwarding can be achieved with the tagging scheme on the CDB.
• Example:
• Let F refer to an FLR and FLBi stand for the ith FLB, and let their contents be (F) and (FLBi)
• Consider the instruction sequence
ADD F,FLB1    F ← (F) + (FLB1)
MPY F,FLB2    F ← (F) × (FLB2)
• During addition:
– The busy bit of F is set to 1
– The contents of F and FLB1 are sent to adder A1
– The tag of F is set to 1010 (the tag of the adder)
[Figure: block diagram of the floating point execution unit: storage bus, instruction unit, floating point buffers (FLB), floating point operand stack (FLOS), decoder, store data buffers (SDB), adder and multiplier reservation stations with fields Tag / Sink / Tag / Source / CTRL, and the common data bus. F: Busy Bit = 1, Tag = 1010. Adder reservation station entry: 1010 / F / 0001 / FLB1 / CTRL.]


• Meanwhile, the decode of MPY reveals that F is busy, so:
– The sink tag of M1 is set to 1010 (the tag of the adder), which is the current tag of F
– F changes its tag to 1000 (the tag of the multiplier)
– The content of FLB2 is sent to M1

[Figure: the same block diagram before the addition completes. F: Busy Bit = 1, Tag = 1000. Reservation station entry for the multiply: 1010 / F / 0010 / FLB2 / CTRL.]


[Figure: the same block diagram after the addition completes. F: Busy Bit = 1, Tag = 1000. Reservation station entry for the multiply: 1000 / F / 0010 / FLB2 / CTRL.]


• When the addition is done, the CDB finds that the result should be sent to M1, whose sink tag is 1010, rather than to F
• The multiplication is done when both operands are available
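The tagging steps above can be traced with a small simulation. This is a minimal sketch, not the Model 91 logic itself: the class names `Register` and `Station`, the helper functions `issue` and `broadcast`, and the operand values 2.0, 3.0 and 4.0 are all assumptions made for illustration. Only the tags 1010 (adder) and 1000 (multiplier) come from the tag table.

```python
ADDER_TAG = "1010"   # tag of ADD1 (reservation station A1)
MULT_TAG = "1000"    # tag of M1


class Register:
    """Hypothetical floating point register with a busy bit and a tag."""
    def __init__(self, value):
        self.value = value
        self.busy = False
        self.tag = None


class Station:
    """Hypothetical reservation station: the sink holds a value or a tag."""
    def __init__(self, tag, op):
        self.tag = tag
        self.op = op
        self.sink_value = None
        self.sink_tag = None
        self.source_value = None

    def execute(self):
        return self.op(self.sink_value, self.source_value)


def issue(station, reg, source_value):
    """Issue 'reg <- reg op source' to a station."""
    if reg.busy:
        station.sink_tag = reg.tag       # wait for the unit that will produce reg
    else:
        station.sink_value = reg.value
    station.source_value = source_value
    reg.busy = True
    reg.tag = station.tag                # reg now waits for this station


def broadcast(tag, value, stations, reg):
    """Put (tag, value) on the CDB; whoever holds a matching tag takes it."""
    for s in stations:
        if s.sink_tag == tag:
            s.sink_value, s.sink_tag = value, None
    if reg.busy and reg.tag == tag:
        reg.value, reg.busy, reg.tag = value, False, None


F = Register(2.0)                # assumed initial content of F
flb1, flb2 = 3.0, 4.0            # assumed contents of FLB1 and FLB2
a1 = Station(ADDER_TAG, lambda x, y: x + y)
m1 = Station(MULT_TAG, lambda x, y: x * y)

issue(a1, F, flb1)               # ADD F,FLB1: F gets tag 1010
issue(m1, F, flb2)               # MPY F,FLB2: M1 sink tag = 1010, F gets tag 1000
tag_after_issue = F.tag

broadcast(a1.tag, a1.execute(), [a1, m1], F)   # sum is forwarded to M1, not F
value_after_add = F.value
broadcast(m1.tag, m1.execute(), [a1, m1], F)   # product is written to F
print(F.value)                   # (2.0 + 3.0) * 4.0 = 20.0
```

In this sketch the intermediate sum never reaches F: the adder result is captured by M1 because the tags match, and F is written once, by the multiplier.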
Hazard Detection and Resolution
• Hazards are caused by resource usage
conflicts among various instructions
• They are triggered by inter-instruction
dependencies
Terminologies:
• Resource Objects: set of working registers,
memory locations and special flags
• Data Objects: contents of resource objects
• Each instruction can be considered as a mapping from a set of data objects to a set of data objects.
• Domain D(I): set of resource objects whose data objects may affect the execution of instruction I (e.g. source registers)
• Range R(I): set of resource objects whose data objects may be modified by the execution of instruction I (e.g. destination register)
• An instruction reads from its domain and writes to its range
• Consider the execution of instructions I and J, where J appears immediately after I.
• There are 3 types of data-dependent hazards:
1. RAW (Read After Write)
2. WAW (Write After Write)
3. WAR (Write After Read)
RAW (Read After Write)
• The necessary condition for this hazard is
R(I) ∩ D(J) ≠ ∅
• Example:
I1 : LOAD r1,a
I2 : ADD r2,r1
• I2 cannot be correctly executed until r1 is
loaded
• Thus I2 is RAW dependent on I1
WAW (Write After Write)
• The necessary condition is
R(I) ∩ R(J) ≠ ∅
• Example:
I1 : MUL r1,r2
I2 : ADD r1,r4
• Here I1 and I2 write to the same destination (r1) and hence they are said to be WAW dependent.
WAR (Write After Read)
• The necessary condition is
D(I) ∩ R(J) ≠ ∅
• Example:
I1 : MUL r1,r2
I2 : ADD r2,r3
• Here I2 has r2 as its destination while I1 uses it as a source, and hence they are WAR dependent
• Hazards can be detected in the fetch stage by comparing the domain and range of the incoming instruction with those of the instructions already in the pipe.
• Once a hazard is detected, there are two methods of resolving it:
1. Generate a warning signal to prevent the hazard from taking place
2. Allow the incoming instruction through the pipe and distribute detection to all pipeline stages.
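The domain and range comparison can be sketched with set intersections. The function name `hazards` and the way each instruction is encoded as a pair of sets are assumptions for illustration. The operand convention is also assumed: `OP dst,src` means dst ← dst OP src, as in the ADD F,FLB1 example, so the destination also belongs to the domain of ADD and MUL.

```python
def hazards(domain_i, range_i, domain_j, range_j):
    """Return the hazard types whose necessary condition holds when J follows I.

    Each argument is a set of resource objects (here: register or memory names).
    A non-empty intersection is only a necessary condition for the hazard.
    """
    found = set()
    if range_i & domain_j:      # R(I) and D(J) overlap
        found.add("RAW")
    if range_i & range_j:       # R(I) and R(J) overlap
        found.add("WAW")
    if domain_i & range_j:      # D(I) and R(J) overlap
        found.add("WAR")
    return found


# I1: LOAD r1,a   reads a, writes r1
# I2: ADD r2,r1   reads r2 and r1, writes r2
print(hazards({"a"}, {"r1"}, {"r1", "r2"}, {"r2"}))          # {'RAW'}

# I1: MUL r1,r2   reads r1 and r2, writes r1
# I2: ADD r2,r3   reads r2 and r3, writes r2
print(hazards({"r1", "r2"}, {"r1"}, {"r2", "r3"}, {"r2"}))   # {'WAR'}
```

A real detector would run this comparison between the incoming instruction and every instruction still in the pipe, not just one predecessor.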
Job Sequencing and Collision Prevention
• Consider the reservation table given below, with an initiation made at t = 1
      1   2   3   4   5   6
Sa    A                   A
Sb        A       A
Sc            A       A
• Consider the next initiation made at t = 2
      1   2   3   4   5   6   7   8
Sa    A1  A2              A1  A2
Sb        A1  A2  A1  A2
Sc            A1  A2  A1  A2

• The second initiation easily fits in the reservation table
• Now consider the case when the first initiation is made at t = 1 and the second at t = 3.
      1   2   3   4     5     6   7   8
Sa    A1      A2              A1      A2
Sb        A1      A1A2        A2
Sc            A1        A1A2      A2

• Here markings A1 and A2 fall in the same stage at the same time unit (Sb at t = 4 and Sc at t = 5). This is called a collision and it must be avoided
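The overlay of two initiations can be checked mechanically. The sketch below assumes a reservation table with marks at columns 1 and 6 for Sa, 2 and 4 for Sb, and 3 and 5 for Sc; this layout is an assumption, chosen to match the collisions shown in the tables, and the names `TABLE` and `collides` are hypothetical.

```python
# Assumed reservation table: the 1-based columns marked for each stage.
TABLE = {"Sa": {1, 6}, "Sb": {2, 4}, "Sc": {3, 5}}


def collides(table, latency):
    """True if an initiation made `latency` cycles after another one
    needs some stage in the same cycle as the earlier initiation."""
    return any(
        (t + latency) in marks      # later job's mark lands on an earlier mark
        for marks in table.values()
        for t in marks
    )


print(collides(TABLE, 1))   # False: initiations at t = 1 and t = 2 fit
print(collides(TABLE, 2))   # True: initiations at t = 1 and t = 3 collide
```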
Terminologies
• Latency: Time difference between two
initiations in units of clock period
• Forbidden Latency: Latencies resulting in
collision
• Forbidden Latency Set: Set of all forbidden
latencies
General Method of Finding Latency
Considering all initiations:
      1     2     3     4     5     6     7     8     9     10    11
Sa    A1    A2    A3    A4    A5    A6A1  A2    A3    A4    A5    A6
Sb          A1    A2    A1A3  A2A4  A3A5  A4A6  A5    A6
Sc                A1    A2    A1A3  A2A4  A3A5  A4A6  A5    A6

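• The initiations made at t = 3 (A3) and t = 6 (A6) collide with the first initiation made at t = 1 (A1), so the forbidden latencies are 2 and 5


Shortcut Method of Finding Latency

• Take the distance between every pair of marks in the same row of the reservation table:
Forbidden Latency Set = {5} ∪ {2} ∪ {2}
= { 2, 5 }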
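The shortcut amounts to collecting the distances between marks in the same row. A minimal sketch follows, assuming marks at columns 1 and 6 for Sa, 2 and 4 for Sb, and 3 and 5 for Sc. That layout and the names `TABLE` and `forbidden_latencies` are assumptions; only the distances between marks matter for the result. Under this layout a latency of 2 is forbidden, which agrees with the collision seen when initiations are made at t = 1 and t = 3.

```python
from itertools import combinations

# Assumed reservation table: the 1-based columns marked for each stage.
TABLE = {"Sa": {1, 6}, "Sb": {2, 4}, "Sc": {3, 5}}


def forbidden_latencies(table):
    """Distances between every pair of marks that share a row."""
    return {
        abs(a - b)
        for marks in table.values()
        for a, b in combinations(marks, 2)
    }


print(sorted(forbidden_latencies(TABLE)))   # [2, 5]
```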
• Latency Sequence: sequence of latencies between successive initiations
• For a reservation table (RT), the number of valid initiation sequences and latency sequences is infinite
• Latency Cycle:
• Among the infinitely many possible latency sequences, the periodic ones are significant.
E.g. { 1, 3, 3, 1, 3, 3, … }
• The subsequence that repeats itself is called the latency cycle.
E.g. { 1, 3, 3 }
• Period of cycle: the sum of the latencies in a latency cycle (1 + 3 + 3 = 7)
• Average Latency (AL): the average taken over its latency cycle (AL = 7/3 = 2.33)
• To design a pipeline, we need a control strategy that maximizes the throughput (no. of results per unit time)
• Maximizing throughput is equivalent to minimizing AL
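The arithmetic for the cycle { 1, 3, 3 } can be written out as a short sketch. The function names `period`, `average_latency` and `throughput` are hypothetical, and throughput is taken here as one result per average latency, in results per clock period, which is an assumption about the unit of time.

```python
from fractions import Fraction


def period(cycle):
    """Sum of the latencies in one latency cycle."""
    return sum(cycle)


def average_latency(cycle):
    """Period divided by the number of initiations in the cycle."""
    return Fraction(sum(cycle), len(cycle))


def throughput(cycle):
    """Initiations per clock period: the reciprocal of the average latency."""
    return 1 / average_latency(cycle)


cycle = (1, 3, 3)
print(period(cycle))                            # 7
print(round(float(average_latency(cycle)), 2))  # 2.33
print(throughput(cycle))                        # 3/7
```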
• A latency sequence that is aperiodic in nature is impractical to implement
• Thus the design problem is to arrive at a latency cycle having minimal average latency.
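One way to explore this design problem is a brute-force search over short latency cycles. This is only a sketch under stated assumptions, not the standard design procedure: the forbidden latency set {2, 5} is assumed (initiations 2 or 5 clock periods apart collide), the search is limited to cycles of at most 3 latencies with each latency at most 6, and the names `is_valid_cycle` and `best_cycle` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def is_valid_cycle(cycle, forbidden):
    """True if repeating `cycle` forever never places two initiations
    a forbidden number of clock periods apart."""
    limit = max(forbidden)
    n = len(cycle)
    for start in range(n):
        gap, k = 0, start
        while gap <= limit:             # larger gaps cannot be forbidden
            gap += cycle[k % n]         # distance to a later initiation
            if gap in forbidden:
                return False
            k += 1
    return True


def best_cycle(forbidden, max_len=3, max_latency=6):
    """Search short cycles and return (average latency, cycle) with minimal AL."""
    best = None
    for n in range(1, max_len + 1):
        for cycle in product(range(1, max_latency + 1), repeat=n):
            if is_valid_cycle(cycle, forbidden):
                avg = Fraction(sum(cycle), n)
                if best is None or avg < best[0]:
                    best = (avg, cycle)
    return best


avg, cycle = best_cycle({2, 5})
print(cycle, avg)   # (1, 3, 3) 7/3
```

Within these search limits the result is the cycle { 1, 3, 3 } with average latency 7/3, the same cycle used as the example earlier.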
