Data Hazards

Uploaded by

uma_sai

Overcoming Data Hazards

UNIT-III
Dynamic Scheduling Using Tomasulo’s Approach

• Tomasulo designed the algorithm for the IBM 360/91 floating point unit
– Built before cache memories came into use
– The unit tracks when operands for instructions are available to minimize
RAW hazards
– Uses register renaming to minimize WAW and WAR hazards
• Key concept
– Track instruction dependences to allow execution as soon as operands
are available, and rename registers to avoid WAR and WAW hazards
• Goal
– Achieve high floating point performance from the instruction set without
relying on compiler

08/01/24 COMPUTER ARCHITECTURE 2


Tomasulo’s Approach - Background

• IBM 360/91 had only 4 double precision floating point registers
• IBM 360/91 had long memory accesses and long floating
point delays
• IBM 360/91 had register-memory instructions
• Tomasulo’s algorithm focuses on the floating point unit and
the load-store unit



Tomasulo’s Approach - Background

• RAW hazards – avoided by executing an instruction only
when its operands are available
• WAR and WAW hazards – eliminated by register renaming
– All destination registers are renamed, including those with a pending
read or write for an earlier instruction

DIV.D F0,F2,F4
ADD.D F6,F0,F8
S.D   F6,0(R1)
SUB.D F8,F10,F14
MUL.D F6,F10,F8

– ADD.D and SUB.D have an antidependence: F8 must be read by ADD.D
before SUB.D writes it, or there is a WAR hazard
– ADD.D must finish producing F6 before S.D stores it (true dependence)
– S.D must read F6 before the write-back of MUL.D, or there is a WAR hazard
– ADD.D and MUL.D both write F6: there is a WAW hazard if ADD.D
finishes later than MUL.D



Tomasulo’s Approach - Background

• Assume 2 temporary registers S & T
• S allows MUL.D to finish before ADD.D – removes the WAW hazard on F6
• T allows SUB.D to finish before ADD.D – removes the WAR hazard on F8
• Any subsequent uses of F8 must be replaced by T

Original              Renamed
DIV.D F0,F2,F4        DIV.D F0,F2,F4
ADD.D F6,F0,F8        ADD.D S,F0,F8
S.D   F6,0(R1)        S.D   S,0(R1)
SUB.D F8,F10,F14      SUB.D T,F10,F14
MUL.D F6,F10,F8       MUL.D F6,F10,T
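
The effect of renaming on this sequence can be checked mechanically. The sketch below is only an illustration: the `Instr` tuple, the `hazards` helper and the simple pairwise comparison are assumptions made for this example, not part of Tomasulo's hardware. It lists every RAW, WAR and WAW pair in the original code and in the version that uses S and T.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    text: str                 # assembly text, used only for display
    dest: Optional[str]       # register written (None for a store)
    srcs: Tuple[str, ...]     # registers read

def hazards(program):
    """List (kind, earlier, later, register) for every dependent pair."""
    found = []
    for i, early in enumerate(program):
        for late in program[i + 1:]:
            if early.dest is not None and early.dest in late.srcs:
                found.append(("RAW", early.text, late.text, early.dest))
            if late.dest is not None and late.dest in early.srcs:
                found.append(("WAR", early.text, late.text, late.dest))
            if late.dest is not None and late.dest == early.dest:
                found.append(("WAW", early.text, late.text, late.dest))
    return found

def kinds(program):
    """Sorted list of hazard kinds, e.g. ['RAW', 'RAW', 'WAR']."""
    return sorted(kind for kind, _, _, _ in hazards(program))

ORIGINAL = [
    Instr("DIV.D F0,F2,F4",   "F0", ("F2", "F4")),
    Instr("ADD.D F6,F0,F8",   "F6", ("F0", "F8")),
    Instr("S.D F6,0(R1)",     None, ("F6", "R1")),
    Instr("SUB.D F8,F10,F14", "F8", ("F10", "F14")),
    Instr("MUL.D F6,F10,F8",  "F6", ("F10", "F8")),
]

RENAMED = [
    Instr("DIV.D F0,F2,F4",   "F0", ("F2", "F4")),
    Instr("ADD.D S,F0,F8",    "S",  ("F0", "F8")),
    Instr("S.D S,0(R1)",      None, ("S", "R1")),
    Instr("SUB.D T,F10,F14",  "T",  ("F10", "F14")),
    Instr("MUL.D F6,F10,T",   "F6", ("F10", "T")),
]

for label, program in (("original", ORIGINAL), ("renamed", RENAMED)):
    print(label, kinds(program))
```

With renaming, only the three true (RAW) dependences remain; the name dependences on F6 and F8 disappear.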



Tomasulo’s Approach - Background

• Register renaming is provided by reservation stations
– Buffer operands of instructions waiting to issue
– Fetches and buffers an operand as soon as it is available,
eliminating need to get it from a register
– Pending instructions designate the reservation station that will
provide their input. As instructions are issued, the register
specifiers for pending operands are renamed to the names of the
reservation station
– When successive writes to a register overlap in execution, only the
last one is used to update the register
– There can be more reservation stations than real registers
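
The rule that only the last of several overlapping writes updates the register can be sketched with a per-register tag. This is a minimal illustration: the dictionaries `regs` and `qi`, the helper names and the numeric values are assumptions for the example, with tag 0 meaning "no pending writer".

```python
regs = {"F6": 0.0}   # register contents
qi = {"F6": 0}       # tag of the station that will write each register (0 = none)

def issue_writer(dest, station):
    """A newly issued instruction claims the register; any older tag is overwritten."""
    qi[dest] = station

def write_back(station, value):
    """Broadcast a result: only registers still tagged with this station take it."""
    for reg, tag in qi.items():
        if tag == station:
            regs[reg] = value
            qi[reg] = 0

issue_writer("F6", 1)   # ADD.D F6,... placed in station 1
issue_writer("F6", 2)   # MUL.D F6,... placed in station 2 (later in program order)
write_back(2, 20.0)     # MUL.D finishes first and updates F6
write_back(1, 10.0)     # ADD.D finishes later; F6 no longer waits on station 1
print(regs["F6"])
```

Because the tag for F6 was overwritten at issue time, the late result from station 1 is never written to the register, which is how the WAW hazard is avoided.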



Tomasulo’s Approach – Use of reservation stations
rather than a centralized register file
• Hazard detection and execution control are distributed
– Information held in the reservation stations at each functional
unit determines when an instruction can begin execution at that
unit
• Results are passed directly to functional units from the
reservation stations where they are buffered, rather than going
through the registers
– A common results bus (also called the common data bus – CDB)
allows all units waiting for an operand to be loaded at once
– In pipelines with multiple execution units that issue multiple
instructions per clock, more than one results bus will be needed



The basic structure of a MIPS floating point unit
using Tomasulo’s algorithm

• Execution control tables not shown
• Each station holds an instruction that has been issued and is
awaiting execution at a functional unit, and either the operand
values or the name of a reservation station that will provide
the values
• Load and store buffers behave similarly to reservation stations
• Reservation stations have tag fields employed by pipeline control



Instruction Execution in this Pipeline

• (1) Issue
– Get the instruction from the head of the instruction queue,
which is maintained in FIFO order.
– If there is a matching reservation station that is empty, issue
the instruction to the station with the operand values, if they
are currently in registers.
– If the operands are not in the registers, keep track of the
functional units that will produce the operands.
– If there is not an empty reservation station, then there is a
structural hazard and the instruction stalls until a station or
buffer is freed.
– Registers are renamed in this step, so WAR and WAW hazards
are eliminated



Instruction Execution in this Pipeline

• (2) Execute
– If not all operands are available, monitor the common data bus
while waiting for the missing operands to be computed. When an
operand becomes available, it is placed into the corresponding
reservation station.
– When all operands are available, the operation can be executed at
the corresponding functional unit.
– By delaying execution until all operands are available, RAW
hazards are eliminated
– Several instructions could become ready in the same clock
cycle for the same functional unit – the unit will have to choose
– For floating point unit reservation stations, the choice can be
arbitrary (we are producing register results here)



Instruction Execution in this Pipeline

• (2) Execute - continued
– Loads and stores (choosing when multiple instructions are
ready) – two steps
• Compute effective address when the base register is available
– Effective address is then placed in the load or store buffer
• Load/Store
– Loads in load buffer execute as soon as memory unit is
available
– Stores in the store buffer wait for the value that is to be
stored before being sent to the memory unit
– Loads and stores are maintained in program order through
the effective address calculation



Instruction Execution in this Pipeline

• (2) Execute - continued
– Preservation of exception behavior
• No instruction is allowed to initiate execution until all branches
that precede the instruction in program order have completed
• The processor must know that the branch prediction was correct
• An exception can be recorded, but it is not actually raised until
the appropriate time



Instruction Execution in this Pipeline

• (3) Write Result
– When the result of the instruction is available, write it on the
Common Data Bus and from there into the destination
registers and into any reservation stations (including store
buffers) waiting for this result.
– Stores write data to memory during this step.



Hazard Detection and Elimination

• Data structures (hardware) used to detect and eliminate hazards
are attached to
– Reservation stations
– Register file
– Load – Store buffers
• These are tags associated with an extended set of virtual
registers used in renaming
– For this example, the tag is a 4 bit quantity that denotes one of
the 5 reservation stations or one of the six load buffers, the
equivalent of 11 registers that can be designated as result registers
– The tag field describes which reservation station contains the
instruction that will produce a result needed as a source operand
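
The 4 bit tag width follows from a short count. The sketch below assumes, as the slides state elsewhere, that the value 0 is reserved to mean "operand already available"; the variable names are chosen for this example only.

```python
RESERVATION_STATIONS = 5
LOAD_BUFFERS = 6

producers = RESERVATION_STATIONS + LOAD_BUFFERS   # units that can produce a result
codes_needed = producers + 1                       # plus the reserved value 0
tag_bits = (codes_needed - 1).bit_length()         # bits needed to encode 0..producers

print(producers, tag_bits)
```

Eleven producers plus the reserved code need twelve distinct values, which fit in 4 bits.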



Hazard Detection and Elimination

• Once an instruction has been issued and is waiting for a
source operand, it refers to the operand by the number of the
reservation station to which the instruction that will write the
register has been assigned
• Unused tag values, such as 0, indicate that the operand is
already available in the registers



Reservation Stations
• In the Tomasulo scheme, the tags refer to the buffer or unit that will
produce the result. Register names are discarded when an instruction
issues to a reservation station
• Each reservation station has seven fields
– Op – The operation to perform on source operands S1 and S2
– Qj, Qk – The reservation stations that will produce the corresponding source
operand ( a value of 0 indicates that the operand is already available in Vj or
Vk or is unnecessary)
– Vj, Vk – The value of the source operands. Only one of the V field or the Q
field is valid for each operand. For loads, the Vk field is used to hold the
offset field
– A – Used to hold information for the memory address calculation for a load or
store – immediate field initially stored here, then EA
– Busy – Indicates that this reservation station and its accompanying functional
unit are occupied
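
The seven fields map naturally onto a small record type. The following is a sketch only: the `name` field, the Python types and the `ready` helper are assumptions added for illustration, and the operand value 1.5 is made up.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationStation:
    name: str                    # label such as "Add1" (not one of the seven fields)
    busy: bool = False           # Busy
    op: Optional[str] = None     # Op
    vj: Optional[float] = None   # Vj: value of the first source operand
    vk: Optional[float] = None   # Vk: value of the second source operand
    qj: int = 0                  # Qj: producing station, 0 = value is in vj or unneeded
    qk: int = 0                  # Qk: producing station, 0 = value is in vk or unneeded
    a: Optional[int] = None      # A: immediate first, effective address later

    def ready(self) -> bool:
        """An occupied station can start executing once neither operand is pending."""
        return self.busy and self.qj == 0 and self.qk == 0

# SUB.D waiting in Add1: second operand present, first still coming from tag 2.
add1 = ReservationStation("Add1", busy=True, op="SUB.D", vk=1.5, qj=2)
print(add1.ready())   # not ready while qj is non-zero

add1.vj, add1.qj = 3.0, 0     # the pending operand arrives
print(add1.ready())
```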



Register file & Load-Store Buffers

• The register file has one additional field, Qi
– Qi – The number of the reservation station that contains the
operation whose result should be stored into this register. If
the value is blank (or 0), no currently active instruction is
computing a result destined for this register, meaning that the
value is simply the register contents
• The load and store buffers each have a field, A
– A – holds the effective address once the first step
of execution has been completed.



Ex. Show information tables for only first load
completion
• Refer to page 99, Fig 2.10 – the instruction status shows that all
instructions have issued, both loads have begun execution, and only the
first load has finished
• Load1, Load2, Add1, Add2, Mult1, Mult2 are the tags of the
reservation stations – with Load1 complete, its station (a load
buffer in this case) is no longer busy
• Load1 has completed and provided the result for register F6, which is
loaded with the value at 34(R2). Once the effective address was computed
and the load completed, the loaded value was stored in the Vk field of
every later instruction that uses F6 (F6 is the second operand in both,
hence Vk rather than Vj)
• Load2 has not completed, but its effective address has been computed
and its buffer is busy. Note that SUB.D will need register F2, which is
provided by this load



Ex. Show information tables for only first load
completion
• Add1 is the reservation station name for the SUB.D instruction (note
the SUB in the Op field). The first load has completed, so the value
for the second operand (F6) was captured from the bus when the
load-store unit fetched it and is held in Vk. The first operand is F2,
which will be available when the second load completes, so Qj names
the station that will supply the result (Load2).
• The rest is left to the student



Tomasulo’s Algorithm Details
Loads-Stores
• Refer to Figure 2.10 Page 99
• Loads and stores go through a functional unit for EA
computation before going to the load or store buffers.
• Loads take a second step to access memory and then go to
Write Result to send the result to the register file and/or waiting
reservation stations
• Stores complete their execution in Write Result, which
writes the result to memory. (Note that both loads and stores
do their writes in Write Result)



Tomasulo’s Algorithm Details

• rd is the destination; rs and rt are the sources
• imm is the sign extended immediate field and r is the reservation
station or buffer the instruction is assigned to.
• RS is the reservation station data structure.
• The value returned by an FP unit or by the load store unit is called
result
• RegisterStat is the register status data structure
• Regs[] is the register file



Tomasulo’s Algorithm Details

• Issue for FP operation, using station r (which we waited for)

if (RegisterStat[rs].Qi ≠ 0)            ; if some active inst is computing a result for rs
  {RS[r].Qj ← RegisterStat[rs].Qi}      ; then place in station r’s Qj field the number of the
                                        ; reservation station that will provide the result for rs
else
  {RS[r].Vj ← Regs[rs]; RS[r].Qj ← 0}   ; else place the value of the register specified in the
                                        ; rs field into the Vj field of the reservation station
                                        ; and set Qj = 0 to indicate that the value is available
                                        ; Do the same for rt, using Vk and Qk



Tomasulo’s Algorithm Details

• Issue for FP operation, using station r – continued

RS[r].Busy ← yes                        ; set the reservation station as busy
RegisterStat[rd].Qi ← r                 ; set the status tag of the register in the rd field
                                        ; to point to this reservation station, indicating
                                        ; that we are producing a result for rd
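
The issue bookkeeping above can be sketched in Python. This is only an illustration under stated assumptions: stations and tables are plain dictionaries, tags are integer station numbers with 0 meaning "no producer", `RegisterStat` stores the Qi tag directly, and the register values are made up. The example issues ADD.D F6,F0,F8 to station 1 while F0 is still being computed by station 3.

```python
def new_station():
    """An empty reservation station with the fields used at issue time."""
    return {"Busy": False, "Op": None, "Vj": None, "Vk": None, "Qj": 0, "Qk": 0}

def issue(RS, RegisterStat, Regs, r, op, rd, rs, rt):
    """Issue an FP operation to free station r, following the steps above."""
    for src, v, q in ((rs, "Vj", "Qj"), (rt, "Vk", "Qk")):
        tag = RegisterStat[src]
        if tag != 0:
            RS[r][q] = tag            # operand pending: remember its producer
        else:
            RS[r][v] = Regs[src]      # operand available: copy the value
            RS[r][q] = 0
    RS[r]["Op"] = op
    RS[r]["Busy"] = True
    RegisterStat[rd] = r              # rd will be produced by station r

Regs = {"F0": 0.0, "F6": 0.0, "F8": 4.0}
RegisterStat = {"F0": 3, "F6": 0, "F8": 0}     # F0 is awaited from station 3
RS = {1: new_station(), 3: new_station()}

issue(RS, RegisterStat, Regs, r=1, op="ADD.D", rd="F6", rs="F0", rt="F8")
print(RS[1], RegisterStat)
```

Reading the source tags before updating `RegisterStat[rd]` matters when an instruction uses the same register as source and destination.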



Tomasulo’s Algorithm Details

• Execute for FP operation, using station r

– Wait until RS[r].Qj = 0 and RS[r].Qk = 0   ; wait for both operands to be available
– Compute the result from the operands in Vj and Vk



Tomasulo’s Algorithm Details

• Write Result for FP operation (or a load register operation)

– Wait for execution complete at reservation station r & the CDB
available
∀x (if (RegisterStat[x].Qi = r)         ; for all registers waiting on a result from this station
      {Regs[x] ← result                 ; place the result in the register
       RegisterStat[x].Qi ← 0})         ; remove the waiting-for tag
∀x (if (RS[x].Qj = r)                   ; for all reservation stations waiting on a first
                                        ; source operand from r
      {RS[x].Vj ← result                ; store the result in the Vj field
       RS[x].Qj ← 0})                   ; remove the waiting-for tag



Tomasulo’s Algorithm Details

• Write Result for FP operation (or a load register operation) - continued

∀x (if (RS[x].Qk = r)                   ; for all reservation stations waiting on a second
                                        ; source operand from r
      {RS[x].Vk ← result                ; store the result in the Vk field
       RS[x].Qk ← 0})                   ; remove the waiting-for tag

RS[r].Busy ← no                         ; free reservation station r
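
The Write Result broadcast can be sketched the same way. The dictionary layout, integer tags (0 = no producer) and all numeric values below are assumptions for illustration. In the example, station 3 (a DIV.D producing F0) finishes while station 1 (an ADD.D) is waiting on it for its first operand.

```python
def write_result(RS, RegisterStat, Regs, r, result):
    """Broadcast the result of station r on the CDB, then free the station."""
    for reg, tag in RegisterStat.items():
        if tag == r:                      # register waiting on this station
            Regs[reg] = result
            RegisterStat[reg] = 0
    for station in RS.values():
        if station["Qj"] == r:            # first operand was pending on r
            station["Vj"], station["Qj"] = result, 0
        if station["Qk"] == r:            # second operand was pending on r
            station["Vk"], station["Qk"] = result, 0
    RS[r]["Busy"] = False

Regs = {"F0": 0.0, "F6": 0.0}
RegisterStat = {"F0": 3, "F6": 1}
RS = {
    1: {"Busy": True, "Op": "ADD.D", "Vj": None, "Vk": 4.0, "Qj": 3, "Qk": 0},
    3: {"Busy": True, "Op": "DIV.D", "Vj": 5.0, "Vk": 2.0, "Qj": 0, "Qk": 0},
}

write_result(RS, RegisterStat, Regs, r=3, result=2.5)
print(Regs, RS[1])
```

After the broadcast, station 1 holds both operand values and can begin execution, while F6 still carries tag 1 because its producer has not finished.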



Tomasulo’s Algorithm Details

• The Load Store Operations are left for the student



Tomasulo’s Algorithm Details
Loop Example
• An example

Loop: L.D    F0,0(R1)
      MUL.D  F4,F0,F2
      S.D    F4,0(R1)
      DADDUI R1,R1,-8
      BNE    R1,R2,Loop



Tomasulo’s Algorithm Details
Loop Example
• If the prediction is that branches are taken, using
reservation stations will allow multiple iterations of this
loop to proceed at once
• In effect, the loop is unrolled dynamically
• Notes
– A load and a store can safely be done out of program order,
provided they access different addresses
– The processor can check program order and the effective
addresses to decide when a memory access may proceed
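
The address check in the notes can be sketched as two small predicates. This is a simplified illustration, not the 360/91 logic: the function names and the example addresses are made up, and it assumes the effective addresses of all earlier pending memory operations are already known, which holds when address calculation is done in program order.

```python
def load_may_proceed(load_addr, earlier_store_addrs):
    """A load may bypass earlier pending stores only if none writes its address."""
    return load_addr not in earlier_store_addrs

def store_may_proceed(store_addr, earlier_load_addrs, earlier_store_addrs):
    """A store must wait for earlier pending loads and stores to the same address."""
    return (store_addr not in earlier_load_addrs
            and store_addr not in earlier_store_addrs)

# Two overlapped loop iterations: the store of iteration 1 writes address 80,
# the load of iteration 2 reads address 72 (R1 was decremented by 8).
print(load_may_proceed(72, [80]))   # different addresses: the load can go first
print(load_may_proceed(80, [80]))   # same address: the load must wait
```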



Tomasulo’s Algorithm Summary

• This scheme can lead to very high performance, provided
branch prediction is done effectively
• Tomasulo’s scheme is hardware expensive
– Each reservation station must have
• Associative buffer
• Complex control logic
– Performance is limited by the single CDB
• If another is added, each reservation station must interact with
all CDBs and the logic gets more complex
• Two techniques combined
– Renaming of registers
– Buffering of source operands from the register file



Tomasulo’s Algorithm Summary

• This scheme is a technique for overcoming data hazards
• Implements forwarding
• Uses out of order execution
