Dynamic Scheduling
Functional Units
[Figure: scoreboard datapath with Registers, FP Mult, FP Divide, FP Add and Integer units, Memory, and the SCOREBOARD control]
Functional unit latencies:
• Integer: 1 clock cycle
• Add: 2 clock cycles
• Multiply: 10 clock cycles
• Divide: 40 clock cycles
Scoreboard Implications
• Out-of-order completion => WAR, WAW hazards?
• Solutions for WAR:
  • Stall write back until registers have been read
  • Read registers only during the Read Operands stage
• Solution for WAW:
  • Detect the hazard and stall issue of the new instruction until the other instruction completes
• Need to have multiple instructions in the execution phase => multiple execution units or pipelined execution units
• Scoreboard keeps track of dependencies between instructions that have already issued.
Four Stages of Scoreboard Control
• Issue (ID1): decode instructions and check for structural hazards
  • Instructions are issued in program order (for hazard checking)
  • Don’t issue if there is a structural hazard
  • Don’t issue if the instruction is output dependent on a previously issued but uncompleted instruction (no WAW hazards)
• Read operands (ID2): wait until there are no data hazards, then read operands
  • All real dependencies (RAW hazards) are resolved in this stage: wait for earlier instructions to write back their data
  • No data forwarding
• Execution (EX): operate on operands
  • The functional unit begins execution upon receiving its operands. When the result is ready, it notifies the scoreboard that execution is complete
• Write result (WB): finish execution
  • Stall until there are no WAR hazards with previous instructions
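The hazard checks in the four stages above can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names (fi, fj, fk, qj, qk, rj, rk) follow the functional unit status table used in the scoreboard example, while the function names and the small scenario are hypothetical and not part of the slides. Clearing entries when a unit finishes is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class FU:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source register 1
    fk: Optional[str] = None   # source register 2
    qj: Optional[str] = None   # FU that will produce fj (None if none pending)
    qk: Optional[str] = None   # FU that will produce fk
    rj: bool = False           # fj is ready and has not been read yet
    rk: bool = False           # fk is ready and has not been read yet

def can_issue(name: str, dest: str, fus: Dict[str, FU], result: Dict[str, str]) -> bool:
    # Issue: the FU must be free (no structural hazard) and no active
    # instruction may already be going to write dest (no WAW hazard).
    return not fus[name].busy and result.get(dest) is None

def issue(name, op, dest, s1, s2, fus, result):
    fu = fus[name]
    fu.busy, fu.op, fu.fi, fu.fj, fu.fk = True, op, dest, s1, s2
    fu.qj, fu.qk = result.get(s1), result.get(s2)
    fu.rj, fu.rk = fu.qj is None, fu.qk is None
    result[dest] = name        # register result status: this FU will write dest

def can_read_operands(name: str, fus: Dict[str, FU]) -> bool:
    # Read operands: both sources must be available (RAW hazards resolved).
    return fus[name].rj and fus[name].rk

def can_write_result(name: str, fus: Dict[str, FU]) -> bool:
    # Write result: stall while another active FU still has to read the old
    # value of our destination register (WAR hazard).
    me = fus[name]
    for other, f in fus.items():
        if other == name or not f.busy:
            continue
        if (f.fj == me.fi and f.rj) or (f.fk == me.fi and f.rk):
            return False
    return True

# Hypothetical scenario: DIVD waits for F0 from Mult1 and has not read F6 yet,
# so a later ADDD that writes F6 must hold its result.
fus = {n: FU() for n in ("Mult1", "Mult2", "Add", "Divide")}
result: Dict[str, str] = {}
issue("Mult1", "MULTD", "F0", "F2", "F4", fus, result)
issue("Divide", "DIVD", "F10", "F0", "F6", fus, result)
issue("Add", "ADDD", "F6", "F8", "F2", fus, result)
```

In this state `can_write_result("Add", fus)` is false until the divide unit has read F6, and a second instruction writing F0 cannot issue while Mult1 is still active.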
Dynamic Scheduling Using A Scoreboard
Scoreboard Example
Instruction status
Instruction  dest  j    k    Issue  Read operands  Execution complete  Write result
LD           F6    34+  R2
LD           F2    45+  R3
MULTD        F0    F2   F4
SUBD         F8    F6   F2
DIVD         F10   F0   F6
ADDD         F6    F8   F2
Functional unit status
Time  Name     Busy  Op  Fi (dest)  Fj (S1)  Fk (S2)  Qj (FU for j)  Qk (FU for k)  Rj (Fj?)  Rk (Fk?)
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
       FU
Review: Scoreboard
• Limitations of 6600 scoreboard
• No forwarding
• Limited to instructions in basic block (small window)
• Number and types of functional units determine how often structural hazards stall issue
• Stall on WAR hazards
• Stall on WAW hazards
DIV.D F0, F2, F4
ADD.D F6, F0, F8
S.D F6, 0(R1)
SUB.D F8, F10, F14
MUL.D F6, F10, F8
• Antidependence (WAR): ADD.D reads F8, which SUB.D later writes
• Output dependence (WAW): ADD.D and MUL.D both write F6
• Both are name dependences
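As a rough check of the dependences in the code sequence above, the following sketch compares the read and write sets of every instruction pair. It is a naive pairwise test: the register sets are written out by hand, the function name is hypothetical, and writes by intervening instructions are ignored.

```python
from itertools import combinations

# (text, registers written, registers read); S.D writes memory, not a register
program = [
    ("DIV.D F0, F2, F4",   {"F0"}, {"F2", "F4"}),
    ("ADD.D F6, F0, F8",   {"F6"}, {"F0", "F8"}),
    ("S.D F6, 0(R1)",      set(),  {"F6", "R1"}),
    ("SUB.D F8, F10, F14", {"F8"}, {"F10", "F14"}),
    ("MUL.D F6, F10, F8",  {"F6"}, {"F10", "F8"}),
]

def dependences(prog):
    """Return (kind, earlier index, later index, register) for each pair."""
    found = []
    for (i, (_, w1, r1)), (j, (_, w2, r2)) in combinations(enumerate(prog), 2):
        for reg in sorted(w1 & r2):
            found.append(("RAW", i, j, reg))   # true data dependence
        for reg in sorted(r1 & w2):
            found.append(("WAR", i, j, reg))   # antidependence
        for reg in sorted(w1 & w2):
            found.append(("WAW", i, j, reg))   # output dependence
    return found

for kind, i, j, reg in dependences(program):
    print(f"{kind} on {reg}: {program[i][0]}  ->  {program[j][0]}")
```

Besides the two name dependences annotated on the slide, the check also reports the true (RAW) dependences on F0, F6 and F8, and a second antidependence on F6 between S.D and MUL.D.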
Another Dynamic Algorithm: Tomasulo Algorithm
• Developed for the IBM 360/91, about 3 years after the CDC 6600
• Goal: high performance without special compilers
• RAW hazards are avoided by executing an instruction only when its operands are available.
• Results may be written out of order
• Uses register renaming to minimize WAW and WAR hazards.
• Uses temporary registers to remove name dependences.
• Register renaming is provided by the reservation stations.
Major components of Tomasulo Structure
1. Instruction queue: the fetch unit places instructions in the instruction queue, from which they are issued in FIFO order (maintaining program order).
2. Reservation station (RST): buffers the instruction and its operands. Each operand is in one of two states:
  • Available (value already computed): the RST buffers the value. Once all operands are available, the instruction is dispatched to the functional unit for execution.
  • Pending: the RST records the name of the RST or load/store buffer that will produce the value and tracks the CDB. When the value appears, the RST buffers it, and the instruction is dispatched to the functional unit for execution.
3. Common Data Bus (CDB): passes each result to the components that need it:
  • Other RSTs (waiting for the result)
  • Load and store buffers (waiting)
  • Registers
4. FP registers: hold the operands
5. Load and store buffers
6. Multiple FP functional units
7. Address unit: effective address (E.A.) calculation for load and store instructions.
Reservation Station
• Each functional unit (FU) has one or more reservation stations.
• An instruction is issued if there is an empty reservation station for it.
  • By contrast, the scoreboard issues an instruction only when the FU is free.
• Operands are read from the register file if they are available.
• The reservation station holds
  • Instructions that have been issued and are awaiting execution at a functional unit.
  • The operands for each instruction (if they have already been computed), or the source of the operands otherwise.
  • The information needed to control the instruction once it has begun execution.
• Renaming to a larger set of registers + buffering of source operands
  • Prevents the registers from becoming a bottleneck
  • WAR hazards are avoided because an operand is already stored in the reservation station, even when a write to the same register is performed out of order.
  • WAW hazards are avoided because pointers to reservation stations, instead of register names, are used as tags on the CDB.
Three Stages of Tomasulo Algorithm
1. Issue: get an instruction from the instruction queue
• Stall if there is a structural hazard, i.e. no free reservation station (RS).
• If an RS is free, the issue logic sends the instruction to the RS and reads its operands into the RS if they are ready.
• (Register renaming => solves WAR, WAW.)
2. Execution (EX): operate on operands
• When both operands are ready, execute.
• If they are not ready, watch the CDB for the result (solves RAW).
3. Write result (WB): finish execution
• Write on the Common Data Bus to all awaiting units.
• Mark the reservation station available.
• Write the result into the destination register only if the register status still names this RS (solves WAW).
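A minimal sketch of the three stages above, showing how renaming at issue removes WAR and WAW hazards. The dictionaries, function names and register values are assumptions made for illustration; timing and functional units are not modelled.

```python
regs = {"F0": 8.0, "F2": 2.0, "F6": 6.0, "F8": 10.0}  # made-up values
reg_status = {}   # register -> tag of the RS that will write it
stations = {}     # RS tag -> entry

def issue(tag, op, dest, src_j, src_k):
    """Issue: rename each source to a value (if ready) or to a producer tag."""
    entry = {"op": op, "vj": None, "vk": None, "qj": None, "qk": None}
    for v, q, src in (("vj", "qj", src_j), ("vk", "qk", src_k)):
        if src in reg_status:
            entry[q] = reg_status[src]   # operand pending: remember producer
        else:
            entry[v] = regs[src]         # operand ready: copy the value now
    stations[tag] = entry
    reg_status[dest] = tag               # later readers of dest wait on this tag

def can_execute(tag):
    """Execute: allowed once neither operand is waiting on a tag."""
    e = stations[tag]
    return e["qj"] is None and e["qk"] is None

def write_result(tag, value):
    """Write result: broadcast (tag, value) on the CDB and free the RS."""
    del stations[tag]
    for e in stations.values():
        if e["qj"] == tag:
            e["vj"], e["qj"] = value, None
        if e["qk"] == tag:
            e["vk"], e["qk"] = value, None
    # The register file accepts the value only if it still expects this tag.
    for reg in [r for r, t in reg_status.items() if t == tag]:
        regs[reg] = value
        del reg_status[reg]

issue("Mult2", "DIVD", "F10", "F0", "F6")   # copies the old F6 (6.0) at issue
issue("Add1", "ADDD", "F6", "F8", "F2")     # first writer of F6
issue("Add2", "SUBD", "F6", "F8", "F2")     # second writer of F6
write_result("Add2", 8.0)                   # younger write finishes first
write_result("Add1", 12.0)                  # older write is ignored by the register file
```

After these calls F6 holds the value of the younger instruction (WAW handled), and the divide still has its own copy of the old F6 (WAR handled).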
Reservation Station Components
Op: operation to perform in the unit (e.g., + or –)
Vj, Vk: values of the source operands.
Qj, Qk: names of the RSs that will produce the source operands.
A: holds information for the memory address calculation of a load or store.
Busy: indicates that the reservation station or FU is busy
Register file status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register, meaning that the value is already available.
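The fields listed above map naturally onto a small record type. The sketch below is one possible encoding, not a definitive one: the class name, the method names and the numeric values are assumptions. It models a MULTD F0, F2, F4 that was issued while a load into F2 is still pending.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # value of source operand j, once known
    vk: Optional[float] = None   # value of source operand k, once known
    qj: Optional[str] = None     # RS or load buffer that will produce operand j
    qk: Optional[str] = None     # RS or load buffer that will produce operand k
    a: Optional[int] = None      # address information for loads and stores

    def ready(self) -> bool:
        # An instruction may execute once no operand is waiting on a producer.
        return self.busy and self.qj is None and self.qk is None

    def capture(self, tag: str, value: float) -> None:
        # Called when (tag, value) is broadcast on the CDB.
        if self.qj == tag:
            self.vj, self.qj = value, None
        if self.qk == tag:
            self.vk, self.qk = value, None

# F4 was read at issue (2.5 is a made-up value); F2 is still owed by Load2.
mult1 = ReservationStation("Mult1", busy=True, op="MULTD", vk=2.5, qj="Load2")
register_status = {"F0": "Mult1", "F2": "Load2"}   # blank entries are omitted
```

Calling `mult1.capture("Load2", 4.0)` fills Vj, clears Qj and makes the station ready to execute.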
Tomasulo Example Cycle 0
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 Load1 No
LD F2 45+ R3 Load2 No
MULTD F0 F2 F4 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk A
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
0 FU
Tomasulo Example Cycle 1
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 Load1 Yes 34+R2
LD F2 45+ R3 Load2 No
MULTD F0 F2 F4 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk A
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
1      FU                       Load1
Tomasulo Example Cycle 2
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2- Load1 Yes 34+R2
LD F2 45+ R3 2 Load2 Yes 45+R3
MULTD F0 F2 F4 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Note: assume a load takes 2 cycles.
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
2      FU         Load2         Load1
Tomasulo Example Cycle 3
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 Load1 Yes 34+R2
LD F2 45+ R3 2 3- Load2 Yes 45+R3
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
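0 Add2 No
Add3 No
0 Mult1 Yes Mult R(F4) Load2
0 Mult2 No
Note: MULTD reads the value of F4 from the register file at issue (Vk = R(F4)) and waits on Load2 for F2 (Qj = Load2).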
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
3      FU  Mult1  Load2         Load1
Tomasulo Example Cycle 4
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 Load2 Yes 45+R3
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 Yes Sub M(34+R2) Load2
0 Add2 No
Add3 No
0 Mult1 Yes Mult R(F4) Load2
0 Mult2 No
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
4      FU  Mult1  Load2         M(34+R2)      Add1
Tomasulo Example Cycle 5
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4
DIVD F10 F0 F6 5
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
2 Add1 Yes Sub M(34+R2) M(45+R3)
0 Add2 No
Add3 No
10 Mult1 Yes Mult M(45+R3) R(F4)
0 Mult2 Yes Div M(34+R2) Mult1
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
5      FU  Mult1  M(45+R3)      M(34+R2)      Add1   Mult2
Tomasulo Example Cycle 6
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 --
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
1 Add1 Yes Sub M(34+R2) M(45+R3)
0 Add2 Yes Add M(45+R3) Add1
Add3 No
9 Mult1 Yes Mult M(45+R3) R(F4)
0 Mult2 Yes Div M(34+R2) Mult1
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
6      FU  Mult1  M(45+R3)      Add2          Add1   Mult2
Tomasulo Example Cycle 7
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 Yes Sub M(A1) M(A2)
0 Add2 Yes Add M(A2) Add1
Add3 No
8 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
7      FU  Mult1  M(A2)         Add2          Add1   Mult2
Tomasulo Example Cycle 8
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
2 Add2 Yes Add M1-M2 M(A2)
Add3 No
7 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
8      FU  Mult1  M(A2)         Add2          M1-M2  Mult2
Tomasulo Example Cycle 9
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 --
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
1 Add2 Yes Add M1-M2 M(A2)
Add3 No
6 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
9      FU  Mult1  M(A2)         Add2          M1-M2  Mult2
Tomasulo Example Cycle 10
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 -- 10
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 Yes Add M1-M2 M(A2)
Add3 No
5 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
10     FU  Mult1  M(A2)         Add2          M1-M2  Mult2
Tomasulo Example Cycle 11
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
4 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
11     FU  Mult1  M(A2)         M1-M2+M(A2)   M1-M2  Mult2
Tomasulo Example Cycle 12
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
3 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
12     FU  Mult1  M(A2)         M1-M2+M(A2)   M1-M2  Mult2
Tomasulo Example Cycle 15
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- 15 Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
0 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
15     FU  Mult1  M(A2)         M1-M2+M(A2)   M1-M2  Mult2
Tomasulo Example Cycle 16
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- 15 16 Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
Mult1 No
40 Mult2 Yes Div M*F4 M(A1)
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
16     FU  M*F4   M(A2)         M1-M2+M(A2)   M1-M2  Mult2
Tomasulo Example Cycle 56
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- 15 16 Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5 17 -- 56
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
Mult1 No
0 Mult2 Yes Div M*F4 M(A1)
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
56     FU  M*F4   M(A2)         M1-M2+M(A2)   M1-M2  Mult2
Tomasulo Example Cycle 57
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- 15 16 Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5 17 -- 56 57
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
Mult1 No
0 Mult2 No
Register result status
Clock      F0     F2        F4  F6            F8     F10     F12 ... F30
57     FU  M*F4   M(A2)         M1-M2+M(A2)   M1-M2  result
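The issue, execution-complete and write-result cycles in the example can be reproduced with a very small timing model. The sketch below rests on simplifying assumptions that are not stated on the slides: one instruction issues per cycle, a reservation station and functional unit are always free, execution starts the cycle after the last operand is broadcast, the CDB never has two writers in the same cycle, and integer registers are always ready. The latencies are the ones used in the example (load 2, add/subtract 2, multiply 10, divide 40).

```python
LATENCY = {"LD": 2, "MULTD": 10, "SUBD": 2, "DIVD": 40, "ADDD": 2}

# (operation, destination, FP source registers)
PROGRAM = [
    ("LD",    "F6",  []),            # LD F6, 34+R2
    ("LD",    "F2",  []),            # LD F2, 45+R3
    ("MULTD", "F0",  ["F2", "F4"]),
    ("SUBD",  "F8",  ["F6", "F2"]),
    ("DIVD",  "F10", ["F0", "F6"]),
    ("ADDD",  "F6",  ["F8", "F2"]),
]

def schedule(program, latency):
    """Return (op, dest, issue, execution complete, write result) per instruction."""
    write_time = {}   # register -> cycle in which its latest producer writes the CDB
    rows = []
    for n, (op, dest, srcs) in enumerate(program, start=1):
        issue = n                                   # in-order, one per cycle
        operands = max([write_time.get(s, 0) for s in srcs], default=0)
        start = max(issue, operands) + 1            # cycle after issue / last broadcast
        done = start + latency[op] - 1
        write = done + 1
        write_time[dest] = write                    # renaming: later readers see this producer
        rows.append((op, dest, issue, done, write))
    return rows

for row in schedule(PROGRAM, LATENCY):
    print(row)
```

Because `write_time` is updated in program order, DIVD picks up F6 from the first load rather than from the later ADDD, which is the effect of renaming in the example. The model yields write-result cycles 4, 5, 16, 8, 57 and 11, matching the final table.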
Tomasulo Algorithm Vs Scoreboard
Tomasulo Algorithm                                      Scoreboard
Control and buffers distributed with FUs                Centralized in scoreboard
FU buffers, called reservation stations (RS),           Operands in registers
  hold pending operands
Registers in instructions replaced by values or         No register renaming
  pointers to RS (register renaming)
Avoids WAW hazards                                      Stalls issue
Avoids WAR hazards                                      Stalls completion
No issue on structural hazards                          No issue on structural hazards
Results go from RS to FUs over the CDB, which           Write/read registers
  broadcasts results to all FUs (not through registers)
Tomasulo Drawback
• Many associative stores (CDB) at high speed.
• Performance limited by the common data bus
• Each CDB must go to multiple FUs
• Number of FUs that can complete per cycle is limited to one with a single CDB.
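To see what the last drawback means in practice, the sketch below assigns write-result cycles when several functional units finish close together and only one result can use the CDB per cycle. The priority rule (earliest completion first, ties broken by name) and the function name are assumptions; the slides do not specify an arbitration policy.

```python
def arbitrate(done_cycles):
    """done_cycles maps an RS tag to the cycle its execution completes.

    Returns the cycle in which each tag gets the single CDB.
    """
    writes = {}
    used = set()
    for tag, done in sorted(done_cycles.items(), key=lambda kv: (kv[1], kv[0])):
        cycle = done + 1            # earliest possible write-result cycle
        while cycle in used:        # CDB already taken: wait one more cycle
            cycle += 1
        used.add(cycle)
        writes[tag] = cycle
    return writes

print(arbitrate({"Add1": 7, "Mult1": 7, "Add2": 8}))
```

With these hypothetical completion times, Add1 writes in cycle 8, Mult1 is pushed to cycle 9, and Add2 to cycle 10, so every instruction waiting on Mult1 or Add2 is delayed as well.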
Example: Scoreboard tables before MUL.D writes results
Instruction status
Instruction       Issue  Read operands  Execution complete  Write result
L.D F6,34(R2)     X      X              X                   X
L.D F2,45(R3)     X      X              X                   X
MUL.D F0,F2,F4    X      X              X
SUB.D F8,F6,F2    X      X              X                   X
DIV.D F10,F0,F6   X
ADD.D F6,F8,F2    X      X              X
Step-1 (Instruction status)
Draw the table and put a tick (✓) mark.
L.D F2,44(R3)
MUL.D F0,F2,F4
SUB.D F8,F2,F6
DIV.D F10,F0,F6
ADD.D F6,F8,F2
Step-2(Reservation Station Status)
Name of RST  Busy  Op  Vj  Vk  Qj  Qk  A
Load1 No
Add3 No
Step-3(Register Result Status)