
Dynamic Scheduling

Dynamic scheduling allows instructions to execute out of order by rearranging the execution in hardware to reduce pipeline stalls. A scoreboard is a data structure that keeps track of register dependencies between instructions to enable out-of-order execution while avoiding hazards like write-after-read and write-after-write. The scoreboard controls instruction issue, operand reading, execution, and result writing in 4 stages to ensure dependencies are respected and hazards are avoided.

Uploaded by

kasyap sai
Copyright © All Rights Reserved

DYNAMIC SCHEDULING

Why Dynamic Scheduling?


• All the static (compiler) techniques discussed so far use in-order
instruction issue.
• That means that if an instruction is stalled in the pipeline, no
later instructions can proceed.
• With in-order issue, if two instructions have a hazard between
them, the pipeline will stall, even if there are later instructions
that are independent and would not stall.
• The compiler attempts to schedule the instructions to avoid such stalls; this is called static
scheduling.
• Several early processors used another approach, called dynamic
scheduling, whereby the hardware rearranges the instruction
execution to reduce the stalls.
Example: DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F12,F8,F14
ADDD depends on DIVD through F0 and must wait for the long divide. SUBD depends on neither, yet with in-order issue it cannot proceed until ADDD does.
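A toy timing model makes this stall concrete. It is a sketch under simplifying assumptions, not a model of any real pipeline: unlimited functional units, at most one new instruction may begin per cycle in arrival order, and the divide and add latencies (40 and 2 cycles) used later in this document. The function name `earliest_starts` is hypothetical.

```python
# Latencies in cycles (the FP divide and add latencies used later in this document).
LATENCY = {"DIVD": 40, "ADDD": 2, "SUBD": 2}


def earliest_starts(program, in_order):
    """Return the cycle in which each instruction can begin executing.

    program: list of (op, dest, src1, src2) tuples.
    Instruction i cannot begin before cycle i and must wait until its source
    registers have been produced. With in_order=True it also cannot begin
    before the instruction ahead of it, which models an in-order pipeline.
    """
    ready = {}                      # register -> cycle its value becomes available
    starts = []
    for i, (op, dest, *srcs) in enumerate(program):
        start = max([i] + [ready.get(r, 0) for r in srcs])
        if in_order and starts:
            start = max(start, starts[-1])
        starts.append(start)
        ready[dest] = start + LATENCY[op]
    return starts


prog = [("DIVD", "F0", "F2", "F4"),
        ("ADDD", "F10", "F0", "F8"),
        ("SUBD", "F12", "F8", "F14")]
print(earliest_starts(prog, in_order=True))   # [0, 40, 40]: SUBD waits behind ADDD
print(earliest_starts(prog, in_order=False))  # [0, 40, 2]: SUBD starts right away
```

Only SUBD moves: ADDD still has to wait for F0, so dynamic scheduling pays off only when independent work is available behind a stalled instruction.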
Advantages of Dynamic Scheduling
• Handles cases where dependences are unknown at compile time (e.g., dependences through memory)
• Simplifies the compiler

• Out-of-order execution => out-of-order completion.


• When instructions execute out of order, WAR and WAW hazards
can arise.
• Example: DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F8,F8,F14
If SUBD writes F8 before ADDD has read it, ADDD reads the wrong value (a WAR hazard).
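The three hazard types can be checked mechanically for any pair of instructions. A minimal sketch, assuming a hypothetical (op, dest, src1, src2) tuple encoding for register-register FP instructions:

```python
def hazards(earlier, later):
    """Classify the register hazards between two instructions.

    Each instruction is an (op, dest, src1, src2) tuple; `earlier` comes first
    in program order.
    """
    _, d1, *s1 = earlier
    _, d2, *s2 = later
    found = set()
    if d1 in s2:
        found.add("RAW")   # later reads what earlier writes (true dependence)
    if d2 in s1:
        found.add("WAR")   # later writes what earlier reads (antidependence)
    if d1 == d2:
        found.add("WAW")   # both write the same register (output dependence)
    return found


divd = ("DIVD", "F0", "F2", "F4")
addd = ("ADDD", "F10", "F0", "F8")
subd = ("SUBD", "F8", "F8", "F14")
print(hazards(divd, addd))  # {'RAW'}
print(hazards(addd, subd))  # {'WAR'}: SUBD must not write F8 before ADDD reads it
print(hazards(divd, subd))  # set()
```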
Instruction Parallelism by HW

• Dynamically scheduled pipeline: all instructions pass
through the issue stage in order (in-order issue)
• Enables out-of-order execution and allows out-of-order
completion
• We distinguish when an instruction begins execution
and when it completes execution; in between, the
instruction is in execution
Dynamic Scheduling by Scoreboard:
• To implement out-of-order execution, the ID stage must be split into
two stages:
1. Issue: decode instructions, check for structural hazards
2. Read operands: wait until no data hazards, then read operands
• The scoreboard was first used in the CDC 6600 in 1963
• The scoreboard is a data structure that tracks the registers used by
the instructions in flight.
• The scoreboard decides
When to issue an instruction
When to execute it
When to write registers (avoid WAR hazard)
• CDC 6600: in-order issue, out-of-order execution (when there are no
conflicts and the hardware is available), out-of-order commit (or
completion)
• No forwarding
Scoreboard Architecture (CDC 6600)
[Figure: registers, functional units (two FP Mult units, FP Divide, FP Add, Integer), the scoreboard, and memory]
Execution latencies:
• Integer: 1 clock cycle
• Add: 2 clock cycles
• Mult: 10 clock cycles
• Div: 40 clock cycles
Scoreboard Implications
• Out-of-order completion => WAR, WAW hazards?
• Solutions for WAR:
• Stall write back until registers have been read
• Read registers only during Read Operands stage
• Solution for WAW:
• Detect hazard and stall issue of new instruction until other
instruction completes
• Need to have multiple instructions in execution phase
=> multiple execution units or pipelined execution
units
• Scoreboard keeps track of dependencies between
instructions that have already issued.
Four Stages of Scoreboard Control
• Issue: decode instructions & check for structural
hazards (ID1)
• Instructions are issued in program order (for hazard checking)
• Don't issue if there is a structural hazard
• Don't issue if the instruction is output dependent on a
previously issued but uncompleted instruction (no WAW
hazards)
• Read operands: wait until no data hazards, then
read operands (ID2)
• All real dependencies (RAW hazards) are resolved in this
stage: wait for earlier instructions to write back their data.
• No data forwarding
• Execution: operate on operands (EX)
• The functional unit begins execution upon receiving its operands. When the
result is ready, it notifies the scoreboard that execution is complete.
• Write result: finish execution (WB)
• Stall until there are no WAR hazards with previous instructions:

Example: DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F8,F8,F14
The CDC 6600 scoreboard would stall SUBD in its write result stage until ADDD reads
its operands
Dynamic Scheduling Using A Scoreboard

Three Parts of the Scoreboard


1. Instruction status: indicates which of the 4 steps the instruction is in
2. Functional unit status: indicates the state of the functional unit (FU).
9 fields for each functional unit:
Busy: indicates whether the unit is busy or not
Op: operation to perform in the unit (e.g., + or -)
Fi: destination register
Fj, Fk: source-register numbers
Qj, Qk: functional units producing source registers Fj, Fk
Rj, Rk: flags indicating when Fj, Fk are ready and not yet read;
set to No after the operands are read
3. Register result status: indicates which functional unit will write each
register, if one exists; blank when no pending instruction will write
that register
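The three tables map naturally onto a small data structure. The sketch below follows the usual textbook bookkeeping for the issue, read-operands and write-result checks (execution itself is omitted). The class and function names are hypothetical, and registers are plain strings.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    """One row of the functional unit status table."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # FU producing fj / fk (None: value is in the register file)
    qk: Optional[str] = None
    rj: bool = False           # fj / fk ready and not yet read
    rk: bool = False


def can_issue(fu: str, dest: str, fus: Dict[str, FUStatus], result: Dict[str, str]) -> bool:
    # Stall on a structural hazard (FU busy) or a WAW hazard (dest already pending).
    return not fus[fu].busy and dest not in result


def issue(fu, op, dest, src1, src2, fus, result):
    qj, qk = result.get(src1), result.get(src2)
    fus[fu] = FUStatus(True, op, dest, src1, src2, qj, qk, qj is None, qk is None)
    result[dest] = fu                      # register result status


def can_read_operands(fu, fus):
    return fus[fu].rj and fus[fu].rk       # no RAW hazard left


def read_operands(fu, fus):
    fus[fu].rj = fus[fu].rk = False        # operands consumed


def can_write_result(fu, fus):
    # WAR check: no other FU still has fi as an operand that is ready but unread.
    d = fus[fu].fi
    return all((f.fj != d or not f.rj) and (f.fk != d or not f.rk)
               for name, f in fus.items() if name != fu)


def write_result(fu, fus, result):
    for f in fus.values():                 # wake up FUs waiting for this result
        if f.qj == fu:
            f.qj, f.rj = None, True
        if f.qk == fu:
            f.qk, f.rk = None, True
    if result.get(fus[fu].fi) == fu:
        del result[fus[fu].fi]
    fus[fu] = FUStatus()                   # the FU is free again


fus = {n: FUStatus() for n in ["Integer", "Mult1", "Mult2", "Add", "Divide"]}
result: Dict[str, str] = {}
issue("Mult1", "Mult", "F0", "F2", "F4", fus, result)
issue("Divide", "Div", "F10", "F0", "F6", fus, result)  # F0 pending: qj = "Mult1"
issue("Add", "Add", "F6", "F8", "F2", fus, result)
print(can_read_operands("Divide", fus))  # False: waiting for Mult1 (RAW)
print(can_write_result("Add", fus))      # False: DIVD has not read F6 yet (WAR)
read_operands("Mult1", fus)
write_result("Mult1", fus, result)       # DIVD's rj becomes True
read_operands("Divide", fus)
print(can_write_result("Add", fus))      # True
```

The usage lines replay, in miniature, the DIVD/ADDD situation from the example that follows: ADDD may not write F6 until DIVD has read it.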

Scoreboard Example

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     -      -              -              -
LD    F2, 45+R3     -      -              -              -
MULTD F0, F2, F4    -      -              -              -
SUBD  F8, F6, F2    -      -              -              -
DIVD  F10, F0, F6   -      -              -              -
ADDD  F6, F8, F2    -      -              -              -

Functional unit status (Fi = destination; Fj, Fk = source registers S1, S2; Qj, Qk = FUs producing Fj, Fk; Rj, Rk = Fj, Fk ready?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status: all registers (F0 ... F30) blank
Final state (clock 62)

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      9              19             20
SUBD  F8, F6, F2    7      9              11             12
DIVD  F10, F0, F6   8      21             61             62
ADDD  F6, F8, F2    13     14             16             22

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
0     Divide   No

Register result status (clock 62): all registers blank
Scoreboard Example Cycle 1
Issue LD #1. Each entry in the instruction status table shows the cycle in which that step occurred.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      -              -              -
LD    F2, 45+R3     -      -              -              -
MULTD F0, F2, F4    -      -              -              -
SUBD  F8, F6, F2    -      -              -              -
DIVD  F10, F0, F6   -      -              -              -
ADDD  F6, F8, F2    -      -              -              -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -    R2   -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status (clock 1): F6 = Integer; all other registers blank
Scoreboard Example Cycle 2
LD #2 can't issue since the integer unit is busy. MULT can't issue because we require in-order issue.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              -              -
LD    F2, 45+R3     -      -              -              -
MULTD F0, F2, F4    -      -              -              -
SUBD  F8, F6, F2    -      -              -              -
DIVD  F10, F0, F6   -      -              -              -
ADDD  F6, F8, F2    -      -              -              -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -    R2   -        -        -    No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status (clock 2): F6 = Integer; all other registers blank
Scoreboard Example Cycle 3

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              -
LD    F2, 45+R3     -      -              -              -
MULTD F0, F2, F4    -      -              -              -
SUBD  F8, F6, F2    -      -              -              -
DIVD  F10, F0, F6   -      -              -              -
ADDD  F6, F8, F2    -      -              -              -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -    R2   -        -        -    No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status (clock 3): F6 = Integer; all other registers blank
Scoreboard Example Cycle 4

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     -      -              -              -
MULTD F0, F2, F4    -      -              -              -
SUBD  F8, F6, F2    -      -              -              -
DIVD  F10, F0, F6   -      -              -              -
ADDD  F6, F8, F2    -      -              -              -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -    R2   -        -        -    No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status (clock 4): F6 = Integer; all other registers blank
Scoreboard Example Cycle 5
Issue LD #2 since the integer unit is now free.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      -              -              -
MULTD F0, F2, F4    -      -              -              -
SUBD  F8, F6, F2    -      -              -              -
DIVD  F10, F0, F6   -      -              -              -
ADDD  F6, F8, F2    -      -              -              -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -    R3   -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status (clock 5): F2 = Integer; all other registers blank
Scoreboard Example Cycle 6
Issue MULT.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              -              -
MULTD F0, F2, F4    6      -              -              -
SUBD  F8, F6, F2    -      -              -              -
DIVD  F10, F0, F6   -      -              -              -
ADDD  F6, F8, F2    -      -              -              -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -    R3   -        -        -    No
-     Mult1    Yes   Mult  F0   F2   F4   Integer  -        No   Yes
-     Mult2    No
-     Add      No
-     Divide   No

Register result status (clock 6): F0 = Mult1, F2 = Integer; all other registers blank
Scoreboard Example Cycle 7
MULT can't read its operands (F2) because LD #2 hasn't finished.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              -
MULTD F0, F2, F4    6      -              -              -
SUBD  F8, F6, F2    7      -              -              -
DIVD  F10, F0, F6   -      -              -              -
ADDD  F6, F8, F2    -      -              -              -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -    R3   -        -        -    No
-     Mult1    Yes   Mult  F0   F2   F4   Integer  -        No   Yes
-     Mult2    No
-     Add      Yes   Sub   F8   F6   F2   -        Integer  Yes  No
-     Divide   No

Register result status (clock 7): F0 = Mult1, F2 = Integer, F8 = Add; all other registers blank
Scoreboard Example Cycle 8a
DIVD issues. MULT and SUBD are both waiting for F2.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              -
MULTD F0, F2, F4    6      -              -              -
SUBD  F8, F6, F2    7      -              -              -
DIVD  F10, F0, F6   8      -              -              -
ADDD  F6, F8, F2    -      -              -              -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -    R3   -        -        -    No
-     Mult1    Yes   Mult  F0   F2   F4   Integer  -        No   Yes
-     Mult2    No
-     Add      Yes   Sub   F8   F6   F2   -        Integer  Yes  No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status (clock 8): F0 = Mult1, F2 = Integer, F8 = Add, F10 = Divide; all other registers blank
Scoreboard Example Cycle 8b
LD #2 writes F2.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      -              -              -
SUBD  F8, F6, F2    7      -              -              -
DIVD  F10, F0, F6   8      -              -              -
ADDD  F6, F8, F2    -      -              -              -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
-     Mult2    No
-     Add      Yes   Sub   F8   F6   F2   -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status (clock 8): F0 = Mult1, F8 = Add, F10 = Divide; all other registers blank
Scoreboard Example Cycle 9
Now MULT and SUBD can both read F2. ADDD can't issue because the add unit is busy.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      9              -              -
SUBD  F8, F6, F2    7      9              -              -
DIVD  F10, F0, F6   8      -              -              -
ADDD  F6, F8, F2    -      -              -              -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
10    Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
2     Add      Yes   Sub   F8   F6   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status (clock 9): F0 = Mult1, F8 = Add, F10 = Divide; all other registers blank
Scoreboard Example Cycle 11
The add unit takes 2 cycles, so SUBD completes execution.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      9              -              -
SUBD  F8, F6, F2    7      9              11             -
DIVD  F10, F0, F6   8      -              -              -
ADDD  F6, F8, F2    -      -              -              -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
8     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
0     Add      Yes   Sub   F8   F6   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status (clock 11): F0 = Mult1, F8 = Add, F10 = Divide; all other registers blank
Scoreboard Example Cycle 12
SUBD finishes. DIVD is waiting for F0.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      9              -              -
SUBD  F8, F6, F2    7      9              11             12
DIVD  F10, F0, F6   8      -              -              -
ADDD  F6, F8, F2    -      -              -              -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
7     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
-     Add      No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status (clock 12): F0 = Mult1, F10 = Divide; all other registers blank
Scoreboard Example Cycle 13
ADDD issues.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      9              -              -
SUBD  F8, F6, F2    7      9              11             12
DIVD  F10, F0, F6   8      -              -              -
ADDD  F6, F8, F2    13     -              -              -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
6     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
-     Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status (clock 13): F0 = Mult1, F6 = Add, F10 = Divide; all other registers blank
Scoreboard Example Cycle 14

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      9              -              -
SUBD  F8, F6, F2    7      9              11             12
DIVD  F10, F0, F6   8      -              -              -
ADDD  F6, F8, F2    13     14             -              -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
5     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
2     Add      Yes   Add   F6   F8   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status (clock 14): F0 = Mult1, F6 = Add, F10 = Divide; all other registers blank
Scoreboard Example Cycle 15

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      9              -              -
SUBD  F8, F6, F2    7      9              11             12
DIVD  F10, F0, F6   8      -              -              -
ADDD  F6, F8, F2    13     14             -              -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
4     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
1     Add      Yes   Add   F6   F8   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status (clock 15): F0 = Mult1, F6 = Add, F10 = Divide; all other registers blank
Scoreboard Example Cycle 16

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      9              -              -
SUBD  F8, F6, F2    7      9              11             12
DIVD  F10, F0, F6   8      -              -              -
ADDD  F6, F8, F2    13     14             16             -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
3     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
0     Add      Yes   Add   F6   F8   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status (clock 16): F0 = Mult1, F6 = Add, F10 = Divide; all other registers blank
Scoreboard Example Cycle 17
ADDD can't write because of DIVD: a WAR hazard on F6, since DIVD has not yet read F6.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      9              -              -
SUBD  F8, F6, F2    7      9              11             12
DIVD  F10, F0, F6   8      -              -              -
ADDD  F6, F8, F2    13     14             16             -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
2     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
-     Add      Yes   Add   F6   F8   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status (clock 17): F0 = Mult1, F6 = Add, F10 = Divide; all other registers blank
Scoreboard Example Cycle 18
Nothing happens!

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      9              -              -
SUBD  F8, F6, F2    7      9              11             12
DIVD  F10, F0, F6   8      -              -              -
ADDD  F6, F8, F2    13     14             16             -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
1     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
-     Add      Yes   Add   F6   F8   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status (clock 18): F0 = Mult1, F6 = Add, F10 = Divide; all other registers blank
Scoreboard Example Cycle 19
MULT completes execution.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      9              19             -
SUBD  F8, F6, F2    7      9              11             12
DIVD  F10, F0, F6   8      -              -              -
ADDD  F6, F8, F2    13     14             16             -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
0     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
-     Add      Yes   Add   F6   F8   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status (clock 19): F0 = Mult1, F6 = Add, F10 = Divide; all other registers blank
Scoreboard Example Cycle 20
MULT writes.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      9              19             20
SUBD  F8, F6, F2    7      9              11             12
DIVD  F10, F0, F6   8      -              -              -
ADDD  F6, F8, F2    13     14             16             -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      Yes   Add   F6   F8   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   -        -        Yes  Yes

Register result status (clock 20): F6 = Add, F10 = Divide; all other registers blank
Scoreboard Example Cycle 21
DIVD reads its operands.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      9              19             20
SUBD  F8, F6, F2    7      9              11             12
DIVD  F10, F0, F6   8      21             -              -
ADDD  F6, F8, F2    13     14             16             -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      Yes   Add   F6   F8   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   -        -        No   No

Register result status (clock 21): F6 = Add, F10 = Divide; all other registers blank
Scoreboard Example Cycle 22
Now ADDD can write since the WAR hazard is removed.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      9              19             20
SUBD  F8, F6, F2    7      9              11             12
DIVD  F10, F0, F6   8      21             -              -
ADDD  F6, F8, F2    13     14             16             22

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
40    Divide   Yes   Div   F10  F0   F6   -        -        No   No

Register result status (clock 22): F10 = Divide; all other registers blank
Scoreboard Example Cycle 61
DIVD completes execution.

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      9              19             20
SUBD  F8, F6, F2    7      9              11             12
DIVD  F10, F0, F6   8      21             61             -
ADDD  F6, F8, F2    13     14             16             22

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
0     Divide   Yes   Div   F10  F0   F6   -        -        No   No

Register result status (clock 61): F10 = Divide; all other registers blank
Scoreboard Example Cycle 62
Done!

Instruction status
Instruction         Issue  Read operands  Exec complete  Write result
LD    F6, 34+R2     1      2              3              4
LD    F2, 45+R3     5      6              7              8
MULTD F0, F2, F4    6      9              19             20
SUBD  F8, F6, F2    7      9              11             12
DIVD  F10, F0, F6   8      21             61             62
ADDD  F6, F8, F2    13     14             16             22

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
0     Divide   No

Register result status (clock 62): all registers blank
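The cycle-by-cycle walkthrough can be reproduced by a small simulator. This is a simplified sketch, not the CDC 6600 logic: it assumes the rules the walkthrough follows (a state change made in one cycle becomes visible only in the next cycle, a functional unit is freed by write result, loads read no FP registers, and the integer registers R2 and R3 are never busy), plus the latencies listed earlier. The names `Instr`, `simulate`, `UNITS` and `LATENCY` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

LATENCY = {"Integer": 1, "Add": 2, "Mult": 10, "Divide": 40}
UNITS = {"Integer": ["Integer"], "Add": ["Add"], "Mult": ["Mult1", "Mult2"], "Divide": ["Divide"]}


@dataclass
class Instr:
    text: str
    kind: str                              # functional-unit type
    dest: str
    srcs: Tuple[str, ...]                  # FP sources tracked by the scoreboard
    fu: Optional[str] = None
    producers: Dict[str, Optional[int]] = field(default_factory=dict)
    issue: Optional[int] = None
    read: Optional[int] = None
    done: Optional[int] = None             # execution complete
    write: Optional[int] = None


def simulate(prog: List[Instr]) -> None:
    busy: Dict[str, int] = {}              # FU name -> instruction using it
    result: Dict[str, int] = {}            # register result status: reg -> pending writer
    cycle = 0
    while any(i.write is None for i in prog):
        cycle += 1
        actions = []                       # decided from the state at the start of the cycle
        # Issue: in program order, one per cycle; stall on structural or WAW hazards.
        nxt = next((k for k, i in enumerate(prog) if i.issue is None), None)
        if nxt is not None:
            free = [u for u in UNITS[prog[nxt].kind] if u not in busy]
            if free and prog[nxt].dest not in result:
                actions.append(("issue", nxt, free[0]))
        for k, ins in enumerate(prog):
            if ins.issue is None or ins.write is not None:
                continue
            if ins.read is None:
                # Read operands: every producer must already have written (RAW).
                if all(p is None or prog[p].write is not None for p in ins.producers.values()):
                    actions.append(("read", k, None))
            elif ins.done < cycle:
                # Write result: stall while an unread operand still needs the old value (WAR).
                war = any(o.issue is not None and o.read is None and
                          any(s == ins.dest and o.producers[s] != k for s in o.srcs)
                          for j, o in enumerate(prog) if j != k)
                if not war:
                    actions.append(("write", k, None))
        for act, k, unit in actions:       # state changes become visible next cycle
            ins = prog[k]
            if act == "issue":
                ins.issue, ins.fu = cycle, unit
                busy[unit] = k
                ins.producers = {s: result.get(s) for s in ins.srcs}
                result[ins.dest] = k
            elif act == "read":
                ins.read, ins.done = cycle, cycle + LATENCY[ins.kind]
            else:
                ins.write = cycle
                del busy[ins.fu]
                del result[ins.dest]


prog = [Instr("LD    F6, 34+R2", "Integer", "F6", ()),
        Instr("LD    F2, 45+R3", "Integer", "F2", ()),
        Instr("MULTD F0, F2, F4", "Mult", "F0", ("F2", "F4")),
        Instr("SUBD  F8, F6, F2", "Add", "F8", ("F6", "F2")),
        Instr("DIVD  F10, F0, F6", "Divide", "F10", ("F0", "F6")),
        Instr("ADDD  F6, F8, F2", "Add", "F6", ("F8", "F2"))]
simulate(prog)
for i in prog:
    print(f"{i.text:18} {i.issue:3} {i.read:3} {i.done:3} {i.write:3}")
```

Under these assumptions the printed issue, read-operands, execution-complete and write-result cycles match the table above, from 1 2 3 4 for the first load to 13 14 16 22 for ADDD; the WAR check is what holds ADDD's write until cycle 22.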
Review: Scoreboard
• Limitations of 6600 scoreboard
• No forwarding
• Limited to instructions in basic block (small window)
• Limited number of functional units (structural hazards)
• Stall on WAR hazards
• Stall on WAW hazards
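DIV.D F0, F2, F4
ADD.D F6, F0, F8
S.D F6, 0(R1)
SUB.D F8, F10, F14
MUL.D F6, F10, F8
WAR (antidependence): ADD.D reads F8 before SUB.D writes it; S.D reads F6 before MUL.D writes it.
WAW (output dependence): ADD.D and MUL.D both write F6.
Both are name dependences.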
Another Dynamic Algorithm: Tomasulo Algorithm
• Developed for the IBM 360/91, about 3 years after the CDC 6600
• Goal: high performance without special compilers
• RAW hazards are avoided by executing an instruction only when its
operands are available.
• Out-of-order writes (completion)
• Uses register renaming to avoid WAW and WAR hazards.
• Uses temporary registers to remove name dependences.
• Register renaming is provided by the reservation stations.
DIV.D F0, F2, F4
ADD.D F6, F0, F8
S.D F6, 0(R1)
SUB.D F8, F10, F14
MUL.D F6, F10, F8
(WAW hazard on F6 between ADD.D and MUL.D; WAR hazards on F8 between ADD.D and SUB.D, and on F6 between S.D and MUL.D)

Register renaming (removes the WAW and WAR hazards), using temporary registers S and T:

DIV.D F0, F2, F4
ADD.D S, F0, F8
S.D S, 0(R1)
SUB.D T, F10, F14
MUL.D F6, F10, T
(Subsequent uses of F8 must be replaced by T.)
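Renaming can be done mechanically by giving every destination a fresh name and rewriting later sources to the newest name. A minimal sketch, assuming a hypothetical (op, dest, srcs) encoding and fresh names P0, P1, ...; this illustrates the idea only, since in Tomasulo's algorithm the reservation-station tags play the role of the new names:

```python
from itertools import count


def rename(program):
    """Give every destination a fresh name (P0, P1, ...) and rewrite sources.

    Each instruction is (op, dest, srcs); dest is None for stores.
    Returns the renamed program and the final architectural -> new-name map.
    """
    fresh = (f"P{n}" for n in count())
    latest = {}                                    # architectural reg -> newest name
    out = []
    for op, dest, srcs in program:
        new_srcs = [latest.get(r, r) for r in srcs]  # read the mapping before renaming dest
        new_dest = None
        if dest is not None:
            new_dest = latest[dest] = next(fresh)
        out.append((op, new_dest, new_srcs))
    return out, latest


prog = [("DIV.D", "F0", ["F2", "F4"]),
        ("ADD.D", "F6", ["F0", "F8"]),
        ("S.D", None, ["F6", "R1"]),              # S.D F6, 0(R1) has no destination
        ("SUB.D", "F8", ["F10", "F14"]),
        ("MUL.D", "F6", ["F10", "F8"])]
renamed, mapping = rename(prog)
for ins in renamed:
    print(ins)
print(mapping)   # {'F0': 'P0', 'F6': 'P3', 'F8': 'P2'}
```

Here P1 and P2 play the roles of S and T. Unlike the hand-renamed version, the sketch also renames the destinations of DIV.D and MUL.D, and the final mapping tells later readers of F6 and F8 to use P3 and P2.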
[Figure: FP unit and load-store unit using Tomasulo's algorithm]
Major Components of the Tomasulo Structure
1. Instruction queue: the fetch unit places instructions in the instruction
queue, from which they are issued in FIFO (program) order.
2. Reservation stations (RS): buffer the instruction and its operands. Each operand is either
• a value (already computed/available): once both operands are available, the
instruction is dispatched to the functional unit for execution; or
• pending (the name of the RS or load/store buffer that will produce it): the RS
monitors the CDB, buffers the value when it appears, and then dispatches the
instruction to the functional unit for execution.
3. Common Data Bus (CDB): passes each result to all the components
waiting for it:
• other reservation stations
• load & store buffers
• registers
4. FP registers: hold the operands
5. Load & store buffers
6. Multiple FP functional units
7. Address unit: effective address (E.A.) calculation for load and store
instructions.
Reservation Station
• Each functional unit (FU) has one or more reservation stations.
• An instruction is issued if there is an empty reservation station.
• By contrast, the scoreboard issues an instruction only when the FU is free.
• Operands are read from the register file if they are available.
• The reservation station holds
• Instructions that have been issued and are awaiting execution at a functional unit.
• The operands for that instruction if they have already been computed, or otherwise the
source of the operands
• The information needed to control the instruction once it has begun execution
• Renaming to a larger set of registers + buffering of source operands
• Prevents the registers from becoming a bottleneck
• WAR hazards are avoided because an operand is already stored in the reservation station
even when a write to the same register is performed out of order.
• WAW hazards are avoided because pointers to reservation stations, rather than register
numbers, are used as tags on the CDB.
Three Stages of Tomasulo Algorithm
1. Issue: get the instruction from the instruction queue
• Stall if there is a structural hazard, i.e. no free reservation station (RS).
• If an RS is free, the issue logic sends the instruction to the RS and reads the operands
into the RS if they are ready
• (Register renaming => solves WAR, WAW).
2. Execution: operate on operands (EX)
• When both operands are ready, execute;
if not ready, watch the CDB for the result => solves RAW
3. Write result: finish execution (WB)
• Write on the Common Data Bus to all awaiting units;
• mark the reservation station available.
• Write the result into the destination register only if its register status still names this RS => solves
WAW.
Reservation Station Components
Op—Operation to perform in the unit (e.g., + or –)
Vj, Vk— Value of the source operand.
Qj, Qk— Name of the RS that would provide the source operands.
A- used to hold information for the memory address calculation for
the load and store.
Busy—Indicates reservation station or FU is busy
Register File Status: indicates which functional unit will write each
register, if one exists. Blank when no pending instruction will
write that register, meaning that the value is already available.
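The fields above, together with the issue and write-result steps, can be sketched as follows. This is a simplified illustration, not the IBM 360/91 hardware: the A field and the load/store buffers are omitted, the register file is a dictionary, the register values are made up, and the names `ReservationStation`, `issue` and `broadcast` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ReservationStation:
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # source values; valid only when the matching Q is None
    vk: Optional[float] = None
    qj: Optional[str] = None     # RS (or load buffer) that will produce the source
    qk: Optional[str] = None


def issue(name, op, dest, src1, src2, stations, regs, reg_status):
    rs = stations[name]
    rs.busy, rs.op = True, op
    # Take each operand from the register file, or remember who will produce it.
    rs.qj = reg_status.get(src1)
    rs.vj = regs[src1] if rs.qj is None else None
    rs.qk = reg_status.get(src2)
    rs.vk = regs[src2] if rs.qk is None else None
    reg_status[dest] = name      # renaming: dest now refers to this RS


def broadcast(tag, value, stations, regs, reg_status):
    """Write result: put (tag, value) on the CDB; every waiter snoops it."""
    for rs in stations.values():
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
    for reg, q in list(reg_status.items()):
        if q == tag:             # only a register still waiting on this tag is updated
            regs[reg] = value
            del reg_status[reg]
    if tag in stations:
        stations[tag] = ReservationStation()   # mark the RS available


stations = {n: ReservationStation() for n in ["Add1", "Add2", "Add3", "Mult1", "Mult2"]}
regs: Dict[str, float] = {"F2": 0.0, "F4": 2.0, "F6": 8.0, "F8": 0.0}  # made-up values
reg_status: Dict[str, str] = {"F2": "Load2"}      # F2 will come from load buffer Load2
issue("Add1", "Sub", "F8", "F6", "F2", stations, regs, reg_status)
print(stations["Add1"].vj, stations["Add1"].qk)   # 8.0 Load2
broadcast("Load2", 3.0, stations, regs, reg_status)
print(stations["Add1"].vk, regs["F2"], reg_status)  # 3.0 3.0 {'F8': 'Add1'}
```

The usage mirrors SUBD issuing in the example below: F6 is read as a value, F2 is recorded as a tag, and the CDB broadcast from Load2 fills in both the station and the register.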
Tomasulo Example Cycle 0

Instruction status (Execution shows the cycles in which execution started and completed; "n-" means execution began in cycle n and is still in progress)
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     -      -          -
LD    F2, 45+R3     -      -          -
MULTD F0, F2, F4    -      -          -
SUBD  F8, F6, F2    -      -          -
DIVD  F10, F0, F6   -      -          -
ADDD  F6, F8, F2    -      -          -

Load buffers: Load1 No, Load2 No, Load3 No

Reservation stations (Vj, Vk = source values S1, S2; Qj, Qk = RS producing them)
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
0     Add1   No
0     Add2   No
-     Add3   No
0     Mult1  No
0     Mult2  No

Register result status (clock 0): all registers blank
Tomasulo Example Cycle 1

Instruction status
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     1      -          -
LD    F2, 45+R3     -      -          -
MULTD F0, F2, F4    -      -          -
SUBD  F8, F6, F2    -      -          -
DIVD  F10, F0, F6   -      -          -
ADDD  F6, F8, F2    -      -          -

Load buffers: Load1 Yes (34+R2), Load2 No, Load3 No

Reservation stations
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
0     Add1   No
0     Add2   No
-     Add3   No
0     Mult1  No
0     Mult2  No

Register result status (clock 1): F6 = Load1; all other registers blank
Tomasulo Example Cycle 2
Assume a load takes 2 cycles.

Instruction status
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     1      2-         -
LD    F2, 45+R3     2      -          -
MULTD F0, F2, F4    -      -          -
SUBD  F8, F6, F2    -      -          -
DIVD  F10, F0, F6   -      -          -
ADDD  F6, F8, F2    -      -          -

Load buffers: Load1 Yes (34+R2), Load2 Yes (45+R3), Load3 No

Reservation stations
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
0     Add1   No
0     Add2   No
-     Add3   No
0     Mult1  No
0     Mult2  No

Register result status (clock 2): F2 = Load2, F6 = Load1; all other registers blank
Tomasulo Example Cycle 3
MULTD reads the value of F4 from the register file, R(F4), and waits for Load2 to produce F2.

Instruction status
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     1      2-3        -
LD    F2, 45+R3     2      3-         -
MULTD F0, F2, F4    3      -          -
SUBD  F8, F6, F2    -      -          -
DIVD  F10, F0, F6   -      -          -
ADDD  F6, F8, F2    -      -          -

Load buffers: Load1 Yes (34+R2), Load2 Yes (45+R3), Load3 No

Reservation stations
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
0     Add1   No
0     Add2   No
-     Add3   No
0     Mult1  Yes   Mult  -           R(F4)       Load2  -
0     Mult2  No

Register result status (clock 3): F0 = Mult1, F2 = Load2, F6 = Load1; all other registers blank
Tomasulo Example Cycle 4

Instruction status
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     1      2-3        4
LD    F2, 45+R3     2      3-4        -
MULTD F0, F2, F4    3      -          -
SUBD  F8, F6, F2    4      -          -
DIVD  F10, F0, F6   -      -          -
ADDD  F6, F8, F2    -      -          -

Load buffers: Load1 No, Load2 Yes (45+R3), Load3 No

Reservation stations
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
0     Add1   Yes   Sub   M(34+R2)    -           -      Load2
0     Add2   No
-     Add3   No
0     Mult1  Yes   Mult  -           R(F4)       Load2  -
0     Mult2  No

Register result status (clock 4): F0 = Mult1, F2 = Load2, F6 = M(34+R2), F8 = Add1; all other registers blank
Tomasulo Example Cycle 5

Instruction status
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     1      2-3        4
LD    F2, 45+R3     2      3-4        5
MULTD F0, F2, F4    3      -          -
SUBD  F8, F6, F2    4      -          -
DIVD  F10, F0, F6   5      -          -
ADDD  F6, F8, F2    -      -          -

Load buffers: Load1 No, Load2 No, Load3 No

Reservation stations
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
2     Add1   Yes   Sub   M(34+R2)    M(45+R3)    -      -
0     Add2   No
-     Add3   No
10    Mult1  Yes   Mult  M(45+R3)    R(F4)       -      -
0     Mult2  Yes   Div   -           M(34+R2)    Mult1  -

Register result status (clock 5): F0 = Mult1, F2 = M(45+R3), F6 = M(34+R2), F8 = Add1, F10 = Mult2; all other registers blank
Tomasulo Example Cycle 6

Instruction status
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     1      2-3        4
LD    F2, 45+R3     2      3-4        5
MULTD F0, F2, F4    3      6-         -
SUBD  F8, F6, F2    4      6-         -
DIVD  F10, F0, F6   5      -          -
ADDD  F6, F8, F2    6      -          -

Load buffers: Load1 No, Load2 No, Load3 No

Reservation stations
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
1     Add1   Yes   Sub   M(34+R2)    M(45+R3)    -      -
0     Add2   Yes   Add   -           M(45+R3)    Add1   -
-     Add3   No
9     Mult1  Yes   Mult  M(45+R3)    R(F4)       -      -
0     Mult2  Yes   Div   -           M(34+R2)    Mult1  -

Register result status (clock 6): F0 = Mult1, F2 = M(45+R3), F6 = Add2, F8 = Add1, F10 = Mult2; all other registers blank
Tomasulo Example Cycle 7
(From here on, A1 = 34+R2 and A2 = 45+R3.)

Instruction status
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     1      2-3        4
LD    F2, 45+R3     2      3-4        5
MULTD F0, F2, F4    3      6-         -
SUBD  F8, F6, F2    4      6-7        -
DIVD  F10, F0, F6   5      -          -
ADDD  F6, F8, F2    6      -          -

Load buffers: Load1 No, Load2 No, Load3 No

Reservation stations
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
0     Add1   Yes   Sub   M(A1)       M(A2)       -      -
0     Add2   Yes   Add   -           M(A2)       Add1   -
-     Add3   No
8     Mult1  Yes   Mult  M(A2)       R(F4)       -      -
0     Mult2  Yes   Div   -           M(A1)       Mult1  -

Register result status (clock 7): F0 = Mult1, F2 = M(A2), F6 = Add2, F8 = Add1, F10 = Mult2; all other registers blank
Tomasulo Example Cycle 8
(M1-M2 denotes M(A1) - M(A2), the result of SUBD.)

Instruction status
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     1      2-3        4
LD    F2, 45+R3     2      3-4        5
MULTD F0, F2, F4    3      6-         -
SUBD  F8, F6, F2    4      6-7        8
DIVD  F10, F0, F6   5      -          -
ADDD  F6, F8, F2    6      -          -

Load buffers: Load1 No, Load2 No, Load3 No

Reservation stations
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
0     Add1   No
2     Add2   Yes   Add   M1-M2       M(A2)       -      -
-     Add3   No
7     Mult1  Yes   Mult  M(A2)       R(F4)       -      -
0     Mult2  Yes   Div   -           M(A1)       Mult1  -

Register result status (clock 8): F0 = Mult1, F2 = M(A2), F6 = Add2, F8 = M1-M2, F10 = Mult2; all other registers blank
Tomasulo Example Cycle 9

Instruction status
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     1      2-3        4
LD    F2, 45+R3     2      3-4        5
MULTD F0, F2, F4    3      6-         -
SUBD  F8, F6, F2    4      6-7        8
DIVD  F10, F0, F6   5      -          -
ADDD  F6, F8, F2    6      9-         -

Load buffers: Load1 No, Load2 No, Load3 No

Reservation stations
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
0     Add1   No
1     Add2   Yes   Add   M1-M2       M(A2)       -      -
-     Add3   No
6     Mult1  Yes   Mult  M(A2)       R(F4)       -      -
0     Mult2  Yes   Div   -           M(A1)       Mult1  -

Register result status (clock 9): F0 = Mult1, F2 = M(A2), F6 = Add2, F8 = M1-M2, F10 = Mult2; all other registers blank
Tomasulo Example Cycle 10

Instruction status
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     1      2-3        4
LD    F2, 45+R3     2      3-4        5
MULTD F0, F2, F4    3      6-         -
SUBD  F8, F6, F2    4      6-7        8
DIVD  F10, F0, F6   5      -          -
ADDD  F6, F8, F2    6      9-10       -

Load buffers: Load1 No, Load2 No, Load3 No

Reservation stations
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
0     Add1   No
0     Add2   Yes   Add   M1-M2       M(A2)       -      -
-     Add3   No
5     Mult1  Yes   Mult  M(A2)       R(F4)       -      -
0     Mult2  Yes   Div   -           M(A1)       Mult1  -

Register result status (clock 10): F0 = Mult1, F2 = M(A2), F6 = Add2, F8 = M1-M2, F10 = Mult2; all other registers blank
Tomasulo Example Cycle 11

Instruction status
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     1      2-3        4
LD    F2, 45+R3     2      3-4        5
MULTD F0, F2, F4    3      6-         -
SUBD  F8, F6, F2    4      6-7        8
DIVD  F10, F0, F6   5      -          -
ADDD  F6, F8, F2    6      9-10       11

Load buffers: Load1 No, Load2 No, Load3 No

Reservation stations
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
0     Add1   No
-     Add2   No
-     Add3   No
4     Mult1  Yes   Mult  M(A2)       R(F4)       -      -
0     Mult2  Yes   Div   -           M(A1)       Mult1  -

Register result status (clock 11): F0 = Mult1, F2 = M(A2), F6 = M1-M2+M(A2), F8 = M1-M2, F10 = Mult2; all other registers blank
Tomasulo Example Cycle 12

Instruction status
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     1      2-3        4
LD    F2, 45+R3     2      3-4        5
MULTD F0, F2, F4    3      6-         -
SUBD  F8, F6, F2    4      6-7        8
DIVD  F10, F0, F6   5      -          -
ADDD  F6, F8, F2    6      9-10       11

Load buffers: Load1 No, Load2 No, Load3 No

Reservation stations
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
0     Add1   No
-     Add2   No
-     Add3   No
3     Mult1  Yes   Mult  M(A2)       R(F4)       -      -
0     Mult2  Yes   Div   -           M(A1)       Mult1  -

Register result status (clock 12): F0 = Mult1, F2 = M(A2), F6 = M1-M2+M(A2), F8 = M1-M2, F10 = Mult2; all other registers blank
Tomasulo Example Cycle 15

Instruction status
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     1      2-3        4
LD    F2, 45+R3     2      3-4        5
MULTD F0, F2, F4    3      6-15       -
SUBD  F8, F6, F2    4      6-7        8
DIVD  F10, F0, F6   5      -          -
ADDD  F6, F8, F2    6      9-10       11

Load buffers: Load1 No, Load2 No, Load3 No

Reservation stations
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
0     Add1   No
-     Add2   No
-     Add3   No
0     Mult1  Yes   Mult  M(A2)       R(F4)       -      -
0     Mult2  Yes   Div   -           M(A1)       Mult1  -

Register result status (clock 15): F0 = Mult1, F2 = M(A2), F6 = M1-M2+M(A2), F8 = M1-M2, F10 = Mult2; all other registers blank
Tomasulo Example Cycle 16

Instruction status
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     1      2-3        4
LD    F2, 45+R3     2      3-4        5
MULTD F0, F2, F4    3      6-15       16
SUBD  F8, F6, F2    4      6-7        8
DIVD  F10, F0, F6   5      -          -
ADDD  F6, F8, F2    6      9-10       11

Load buffers: Load1 No, Load2 No, Load3 No

Reservation stations
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
0     Add1   No
-     Add2   No
-     Add3   No
-     Mult1  No
40    Mult2  Yes   Div   M*F4        M(A1)       -      -

Register result status (clock 16): F0 = M*F4, F2 = M(A2), F6 = M1-M2+M(A2), F8 = M1-M2, F10 = Mult2; all other registers blank
Tomasulo Example Cycle 56

Instruction status
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     1      2-3        4
LD    F2, 45+R3     2      3-4        5
MULTD F0, F2, F4    3      6-15       16
SUBD  F8, F6, F2    4      6-7        8
DIVD  F10, F0, F6   5      17-56      -
ADDD  F6, F8, F2    6      9-10       11

Load buffers: Load1 No, Load2 No, Load3 No

Reservation stations
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
0     Add1   No
-     Add2   No
-     Add3   No
-     Mult1  No
0     Mult2  Yes   Div   M*F4        M(A1)       -      -

Register result status (clock 56): F0 = M*F4, F2 = M(A2), F6 = M1-M2+M(A2), F8 = M1-M2, F10 = Mult2; all other registers blank
Tomasulo Example Cycle 57

Instruction status
Instruction         Issue  Execution  Write result
LD    F6, 34+R2     1      2-3        4
LD    F2, 45+R3     2      3-4        5
MULTD F0, F2, F4    3      6-15       16
SUBD  F8, F6, F2    4      6-7        8
DIVD  F10, F0, F6   5      17-56      57
ADDD  F6, F8, F2    6      9-10       11

Load buffers: Load1 No, Load2 No, Load3 No

Reservation stations
Time  Name   Busy  Op    Vj          Vk          Qj     Qk
0     Add1   No
-     Add2   No
-     Add3   No
-     Mult1  No
0     Mult2  No

Register result status (clock 57): F0 = M*F4, F2 = M(A2), F6 = M1-M2+M(A2), F8 = M1-M2, F10 = result; all other registers blank
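The same timing can be reproduced by a compact Tomasulo simulator. This is a sketch under assumptions read off the example, not the IBM 360/91 design: latencies of 2 (load), 2 (add/subtract), 10 (multiply) and 40 (divide) cycles; a single CDB on which the oldest finished instruction writes first; a value broadcast in cycle c can be captured by an instruction issuing in cycle c, but execution that needs it starts in cycle c+1; address calculation and memory access are folded into the load's execution, and there are no stores. The names `Instr`, `tomasulo`, `LATENCY` and `STATIONS` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

LATENCY = {"Load": 2, "Add": 2, "Mult": 10, "Div": 40}
# Multiply and divide share the Mult reservation stations, as in the example.
STATIONS = {"Load": ["Load1", "Load2", "Load3"], "Add": ["Add1", "Add2", "Add3"],
            "Mult": ["Mult1", "Mult2"], "Div": ["Mult1", "Mult2"]}


@dataclass
class Instr:
    text: str
    kind: str
    dest: str
    srcs: List[str]                                  # FP sources (loads have none)
    rs: Optional[str] = None
    tags: Optional[Dict[str, Optional[str]]] = None  # Qj/Qk: source -> producing RS
    issue: Optional[int] = None
    start: Optional[int] = None
    done: Optional[int] = None
    write: Optional[int] = None


def tomasulo(prog: List[Instr]) -> None:
    busy = set()                                     # stations / load buffers in use
    reg_status: Dict[str, str] = {}                  # register -> RS that will write it
    cycle = 0
    while any(i.write is None for i in prog):
        cycle += 1
        # Execute: all operands must have arrived on the CDB in an earlier cycle.
        for i in prog:
            if i.issue is not None and i.start is None and all(t is None for t in i.tags.values()):
                i.start, i.done = cycle, cycle + LATENCY[i.kind] - 1
        # Write result: one CDB, so at most one (the oldest) result per cycle.
        ready = [i for i in prog if i.done is not None and i.write is None and i.done < cycle]
        if ready:
            w = ready[0]
            w.write = cycle
            for i in prog:                           # waiting stations capture the value
                for s, t in (i.tags or {}).items():
                    if t == w.rs:
                        i.tags[s] = None
            if reg_status.get(w.dest) == w.rs:       # a later writer of dest takes priority
                del reg_status[w.dest]
            busy.discard(w.rs)
        # Issue: in order, one per cycle, if a station of the right type is free.
        nxt = next((i for i in prog if i.issue is None), None)
        if nxt is not None:
            free = [n for n in STATIONS[nxt.kind] if n not in busy]
            if free:
                nxt.rs, nxt.issue = free[0], cycle
                busy.add(nxt.rs)
                nxt.tags = {s: reg_status.get(s) for s in nxt.srcs}   # rename via tags
                reg_status[nxt.dest] = nxt.rs


prog = [Instr("LD    F6, 34+R2", "Load", "F6", []),
        Instr("LD    F2, 45+R3", "Load", "F2", []),
        Instr("MULTD F0, F2, F4", "Mult", "F0", ["F2", "F4"]),
        Instr("SUBD  F8, F6, F2", "Add", "F8", ["F6", "F2"]),
        Instr("DIVD  F10, F0, F6", "Div", "F10", ["F0", "F6"]),
        Instr("ADDD  F6, F8, F2", "Add", "F6", ["F8", "F2"])]
tomasulo(prog)
for i in prog:
    print(f"{i.text:18} {i.rs:6} issue {i.issue:2}  exec {i.start}-{i.done}  write {i.write}")
```

With these assumptions the sketch assigns the same stations (Load1, Load2, Mult1, Add1, Mult2, Add2) and produces the issue, execution and write-result cycles of the cycle-57 table. Note that ADDD writes F6 in cycle 11 even though DIVD, issued earlier, also reads F6: DIVD captured the value at issue, so no WAR stall is needed.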
Tomasulo Algorithm Vs Scoreboard
Tomasulo Algorithm | Scoreboard
Control and buffers distributed with the FUs | Centralized in the scoreboard
FU buffers, called reservation stations (RS), hold pending operands | Operands held in registers
Registers in instructions replaced by values or pointers to RSs (register renaming) | No register renaming
Avoids WAW hazards | Stalls issue
Avoids WAR hazards | Stalls completion
No issue on structural hazards | No issue on structural hazards
Results go to FUs from RSs, not through registers, over the CDB, which broadcasts results to all FUs | Results written to and read from registers
Tomasulo Drawbacks
• Many associative stores (CDB) at high speed.
• Performance limited by the common data bus
• Each CDB must go to multiple FUs
• With a single CDB, the number of FUs that can complete per cycle is limited to one.
Example: Scoreboard tables just before MUL.D writes its result

Instruction status (X = step completed)
Instruction         Issue  Read operands  Exec complete  Write result
L.D   F6, 34(R2)    X      X              X              X
L.D   F2, 45(R3)    X      X              X              X
MUL.D F0, F2, F4    X      X              X              -
SUB.D F8, F6, F2    X      X              X              X
DIV.D F10, F0, F6   X      -              -              -
ADD.D F6, F8, F2    X      X              X              -

Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
Mult2    No
Add      Yes   Add   F6   F8   F2   -        -        No   No
Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status: F0 = Mult1, F6 = Add, F10 = Divide; all other registers blank
Tomasulo Exercise (Scenario is given)
Consider the following instructions:
L.D F6,32(R2)
L.D F2,44(R3)
MUL.D F0,F2,F4
SUB.D F8,F2,F6
DIV.D F10,F0,F6
ADD.D F6,F8,F2
Assume 2 load/store buffers, 3 add reservation stations, and 2 multiply reservation stations.

The scenario is:
(i) The 1st load has finished its write result stage.
(ii) The 2nd load has completed its execution stage.
(iii) The remaining four instructions have finished their issue stage.
Show the status of each table.
Step-1 (Instruction status)
Draw the table and put a tick (✓) mark for each completed stage.

Instruction         Issue  Execution complete  Write result
L.D   F6,32(R2)     ✓      ✓                   ✓
L.D   F2,44(R3)     ✓      ✓                   -
MUL.D F0,F2,F4      ✓      -                   -
SUB.D F8,F2,F6      ✓      -                   -
DIV.D F10,F0,F6     ✓      -                   -
ADD.D F6,F8,F2      ✓      -                   -
Step-2 (Reservation station status)
Name    Busy  Op     Vj              Vk              Qj     Qk     A
Load1   No
Load2   Yes   Load   -               -               -      -      44+Reg[R3]
Add1    Yes   Sub    -               Mem[32+Reg[R2]] Load2  -      -
Add2    Yes   Add    -               -               Add1   Load2  -
Add3    No
Mult1   Yes   Mult   -               Reg[F4]         Load2  -      -
Mult2   Yes   Div    -               Mem[32+Reg[R2]] Mult1  -      -
Step-3 (Register result status)
F0 = Mult1, F2 = Load2, F6 = Add2, F8 = Add1, F10 = Mult2; all other registers blank
