Super Scalar 2
Agenda
• Interrupts
• Out-of-Order Processors
Interrupts:
altering the normal flow of control
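[Figure: a program's instructions I(i-1), I(i), I(i+1) beside an interrupt handler's instructions HI1, HI2, ..., HIn; the interrupt diverts control from the program to the handler.]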
Interrupt Handler
• Saves EPC before re-enabling interrupts to allow nested
interrupts
– need an instruction to move EPC into GPRs
– need a way to mask further interrupts at least until EPC can be saved
• Needs to read a status register that indicates the cause
of the interrupt
• Uses a special indirect jump instruction RFE (return-from-exception)
to resume user code. RFE:
– enables interrupts
– restores the processor to the user mode
– restores hardware status and control state
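The handler protocol above can be sketched as a toy model. This is only an illustration under stated assumptions: the `Cpu` class, its field names, the register name `k0`, and the cause code are all hypothetical and do not correspond to a specific ISA.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    """Toy processor state; every name here is an illustrative assumption."""
    pc: int = 0
    epc: int = 0                    # exception PC, written by hardware
    cause: int = 0                  # status register: why the interrupt happened
    interrupts_enabled: bool = True
    user_mode: bool = True
    gprs: dict = field(default_factory=dict)

    def take_interrupt(self, cause: int, handler_pc: int) -> None:
        # Hardware side: save the interrupted PC, record the cause,
        # mask further interrupts, and enter kernel mode.
        self.epc = self.pc
        self.cause = cause
        self.interrupts_enabled = False
        self.user_mode = False
        self.pc = handler_pc

    def move_epc_to_gpr(self, reg: str) -> None:
        # Handler side: copy EPC into a GPR so that a nested interrupt
        # cannot overwrite the only copy of the return address.
        self.gprs[reg] = self.epc

    def rfe(self, reg: str) -> None:
        # Return-from-exception: an indirect jump that also re-enables
        # interrupts and restores user mode.
        self.pc = self.gprs[reg]
        self.interrupts_enabled = True
        self.user_mode = True


cpu = Cpu(pc=0x400)
cpu.take_interrupt(cause=3, handler_pc=0x80)
cpu.move_epc_to_gpr("k0")           # EPC is now safe in a GPR
cpu.interrupts_enabled = True       # nested interrupts may be allowed from here
cpu.rfe("k0")                       # resume the interrupted program
```

The key ordering is that interrupts stay masked between `take_interrupt` and `move_epc_to_gpr`, which is the window the slide refers to.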
Synchronous Interrupts
• A synchronous interrupt (exception) is caused by a
particular instruction
Exception Handling 5-Stage Pipeline
[Figure: 5-stage pipeline (PC, Inst. Mem, D, Decode, E, +, M, Data Mem, W). Exception flags (Exc D, Exc E, Exc M) and PCs (PC D, PC E, PC M) are carried down the pipeline with each instruction. Asynchronous interrupts enter at the M stage, where the Cause and EPC registers are written, the handler PC is selected, and the Kill F Stage, Kill D Stage, Kill E Stage, and Kill Writeback signals are generated.]
• Hold exception flags in pipeline until commit point (M
stage)
Speculating on Exceptions
• Prediction mechanism
– Exceptions are rare, so simply predicting no exceptions is very
accurate!
• Check prediction mechanism
– Exceptions detected at end of instruction execution pipeline, special
hardware for various exception types
• Recovery mechanism
– Only write architectural state at commit point, so can throw away
partially executed instructions after exception
– Launch exception handler after flushing pipeline
                 time
                 t0   t1   t2   t3   t4   t5   t6   t7   ....
Resource   IF    I1   I2   I3   I4   I5
Usage      ID         I1   I2   I3   nop  I5
           EX              I1   I2   nop  nop  I5
           MA                   I1   nop  nop  nop  I5
           WB                        nop  nop  nop  nop  I5
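The recovery mechanism can be sketched as a small occupancy simulation of the five stages. This is a minimal model under assumptions: the function name `simulate` is hypothetical, the exception is detected when the faulting instruction is in MA, and instructions fetched after the flush simply continue the numbering (so the first handler instruction is labeled I5).

```python
def simulate(faulting="I1", cycles=9):
    """Return {stage: [occupant at t0, t1, ...]} for a 5-stage pipeline."""
    stages = ["IF", "ID", "EX", "MA", "WB"]
    pipe = [None] * 5                       # pipe[k] is the occupant of stages[k]
    history = {s: [] for s in stages}
    next_id = 1
    for _ in range(cycles):
        # Advance every instruction by one stage and fetch a new one.
        pipe = [f"I{next_id}"] + pipe[:4]
        next_id += 1
        for stage, occupant in zip(stages, pipe):
            history[stage].append(occupant)
        if pipe[3] == faulting:
            # The exception reaches the commit point (MA). Kill F, D, E and
            # the faulting instruction's own writeback. Nothing younger has
            # written architectural state, so turning them into nops suffices.
            pipe = ["nop"] * 4 + [pipe[4]]
            # The next fetch comes from the handler PC (labeled I5 here).
    return history


history = simulate()
for stage in ["IF", "ID", "EX", "MA", "WB"]:
    print(stage, history[stage])
```

Under these assumptions the ID, EX, MA, and WB rows match the nop pattern of the diagram; the IF row keeps fetching handler instructions beyond I5.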
Out-Of-Order (OOO) Introduction
Name   Frontend  Issue  Writeback  Commit  Structures
I4     IO        IO     IO         IO      Fixed Length Pipelines, Scoreboard
I2O2   IO        IO     OOO        OOO     Scoreboard
I2OI   IO        IO     OOO        IO      Scoreboard, Reorder Buffer, and Store Buffer
IO3    IO        OOO    OOO        OOO     Scoreboard and Issue Queue
IO2I   IO        OOO    OOO        IO      Scoreboard, Issue Queue, Reorder Buffer, and Store Buffer
OOO Motivating Code Sequence
0 MUL R1, R2, R3
1 ADDIU R11,R10,1
2 MUL R5, R1, R4
[Figure: dependency graph over the instruction numbers; only nodes 0, 1, 2, and 4 are recoverable.]
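The dependencies in this sequence can be found mechanically. The sketch below is illustrative only: `raw_dependencies` and the tuple encoding of instructions are hypothetical choices, and only read-after-write dependencies through registers are considered.

```python
def raw_dependencies(instrs):
    """Return (producer, consumer) index pairs with a read-after-write dependency.

    instrs: list of (dest, [sources]) tuples in program order.
    """
    last_writer = {}                 # register name -> index of its latest writer
    deps = []
    for i, (dest, srcs) in enumerate(instrs):
        for src in srcs:
            if src in last_writer:
                deps.append((last_writer[src], i))
        last_writer[dest] = i
    return deps


seq = [
    ("R1",  ["R2", "R3"]),   # 0 MUL   R1, R2, R3
    ("R11", ["R10"]),        # 1 ADDIU R11, R10, 1
    ("R5",  ["R1", "R4"]),   # 2 MUL   R5, R1, R4
]
print(raw_dependencies(seq))  # [(0, 2)]
```

Instruction 1 depends on neither multiply, which is why an out-of-order machine can let it proceed while instruction 2 waits for R1.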
I4: In-Order Front-End, Issue,
Writeback, Commit
F D X M W
[Figure: I4 pipeline with two functional-unit pipes: F, D, then X0 X1 in parallel with M0 M1, then W.]
I4: In-Order Front-End, Issue,
Writeback, Commit (4-stage MUL)
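[Figure: F, D, then three parallel fixed-length pipes (X0 X1 X2 X3; M0 M1 X2 X3; Y0 Y1 Y2 Y3), then W.]
[Figure: the same pipeline with an Issue stage: F, D, I, then the functional-unit pipes (including M0 M1 X2 X3 and Y0 Y1 Y2 Y3), then W. Access table: ARF R, W; SB R/W, W.]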
Basic Scoreboard
       P   F   Data Avail.
               4 3 2 1 0
R1
R2
R3
…
R31
P: Pending, write to destination in flight
F: Which functional unit is writing register
Data Avail.: Where the write data is in the functional unit pipeline
[Figure: pipeline F, D, I, then the functional-unit pipes (M0 M1 and Y0 Y1 Y2 Y3 are recoverable), then W. Access table: ARF R, W; SB R, R/W, W.]
I2O2 Scoreboard
• Similar to I4, but we can now use it to track
structural hazards on Writeback port
• Set bit in Data Avail. according to length of
pipeline
• The architecture conservatively avoids WAW hazards by
stalling in Decode, so the current scoreboard is
sufficient. A more complicated scoreboard would be
needed to process WAW hazards.
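The scoreboard fields and the writeback-port check described above can be sketched as follows. This is a minimal model, not the exact hardware: the class and method names are hypothetical, bit k of Data Avail. is assumed to mean "result is k cycles away from writeback", and bypassing is not modeled.

```python
class Scoreboard:
    """Per-register P (pending), F (functional unit), and Data Avail. fields."""

    def __init__(self, num_regs=32):
        self.pending = [False] * num_regs
        self.fu = [None] * num_regs
        self.data_avail = [0] * num_regs     # small shift register per register

    def can_issue(self, srcs, dest, latency):
        if any(self.pending[r] for r in srcs):
            return False                     # RAW hazard (no bypassing modeled)
        if self.pending[dest]:
            return False                     # WAW hazard: conservatively stall
        # Structural hazard: another instruction already owns the writeback
        # port `latency` cycles from now.
        return not any((d >> latency) & 1 for d in self.data_avail)

    def issue(self, dest, fu, latency):
        self.pending[dest] = True
        self.fu[dest] = fu
        self.data_avail[dest] = 1 << latency  # set bit according to pipe length

    def tick(self):
        for r, d in enumerate(self.data_avail):
            if d == 1:                        # bit 0: the value was written this cycle
                self.pending[r] = False
                self.fu[r] = None
            self.data_avail[r] = d >> 1


sb = Scoreboard()
sb.issue(dest=1, fu="Y", latency=4)           # 0 MUL R1, R2, R3
raw_stall = not sb.can_issue([1, 4], 5, 4)    # 2 MUL R5, R1, R4 must wait for R1
```

Because every in-flight write occupies exactly one bit position, a single OR across the Data Avail. columns is enough to detect a conflict on a single writeback port.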
0 MUL R1, R2, R3    F  D  I  Y0 Y1 Y2 Y3 W
1 ADDIU R11,R10,1      F  D  I  X0 W
2 MUL R5, R1, R4          F  D  I  I  I  Y0 Y1 Y2 Y3 W
3 MUL R7, R5, R6             F  D  D  D  I  I  I  I  Y0 Y1 Y2 Y3 W
4 ADDIU R12,R11,1               F  F  F  D  D  D  D  I  X0 W
5 ADDIU R13,R12,1                        F  F  F  F  D  I  X0 W
6 ADDIU R14,R12,2                                    F  D  I  I  X0 W
I2OI: In-order Frontend/Issue, Out-of-
order Writeback, In-order Commit
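[Figure: I2OI pipeline: F, D, I, then the X0; L0 L1; S0; and Y0 Y1 Y2 Y3 pipes, then W and C, with the SB, PRF, ARF, ROB, and FSB structures. Access table: ARF W; SB R/W, W; PRF R, W; ROB R/W, W, R/W; FSB W, R/W.]
PRF = Physical Register File (Future File), ROB = Reorder Buffer, FSB = Finished Store Buffer (1 entry)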
Reorder Buffer (ROB)
State  S  ST  V  Preg
--                      Next instruction allocates here in D
P      1                Tail of ROB
F      1                Speculative because branch is in flight
P      1
P
F                       Instruction wrote ROB out of order
P
P                       Head of ROB
--
--
Commit stage is waiting for Head of ROB to be finished
State: {Free, Pending, Finished}
S: Speculative
ST: Store bit
V: Physical Register File Specifier Valid
Preg: Physical Register File Specifier
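The head/tail behavior of the ROB can be sketched as a circular buffer. This is an illustrative model under assumptions: the class and method names are hypothetical, and only the State field and a destination register are tracked (the S, ST, V, and Preg fields are omitted).

```python
class ReorderBuffer:
    """Circular ROB: allocate at the tail, finish out of order, commit at the head."""
    FREE, PENDING, FINISHED = "--", "P", "F"

    def __init__(self, size=4):
        self.state = [self.FREE] * size
        self.dest = [None] * size
        self.head = 0            # oldest in-flight instruction
        self.tail = 0            # next entry to allocate
        self.count = 0

    def allocate(self, dest):
        """Called in D. Returns the entry index, or None if the ROB is full."""
        if self.count == len(self.state):
            return None                          # Decode must stall
        idx = self.tail
        self.state[idx] = self.PENDING
        self.dest[idx] = dest
        self.tail = (self.tail + 1) % len(self.state)
        self.count += 1
        return idx

    def finish(self, idx):
        """Called in W; writebacks may arrive in any order."""
        self.state[idx] = self.FINISHED

    def commit(self):
        """Retire the head if it is finished; return (index, dest) or None."""
        if self.count == 0 or self.state[self.head] != self.FINISHED:
            return None                          # commit waits for the head
        idx = self.head
        self.state[idx] = self.FREE
        self.head = (self.head + 1) % len(self.state)
        self.count -= 1
        return idx, self.dest[idx]


rob = ReorderBuffer()
mul = rob.allocate("R1")       # 0 MUL R1, R2, R3
add = rob.allocate("R11")      # 1 ADDIU R11, R10, 1
rob.finish(add)                # the ADDIU writes back first
blocked = rob.commit()         # None: the head (MUL) is still pending
rob.finish(mul)
first = rob.commit()           # (0, 'R1')
second = rob.commit()          # (1, 'R11')
```

The usage lines mirror the case in the figure where an instruction wrote the ROB out of order while the commit stage waited on the head.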
Finished Store Buffer (FSB)
V Op Addr Data
--
0 MUL R1, R2, R3    F  D  I  Y0 Y1 Y2 Y3 W  C
1 ADDIU R11,R10,1      F  D  I  X0 W  r        C
2 MUL R5, R1, R4          F  D  I  I  I  Y0 Y1 Y2 Y3 W  C
3 MUL R7, R5, R6             F  D  D  D  I  I  I  I  Y0 Y1 Y2 Y3 W  C
4 ADDIU R12,R11,1               F  F  F  D  D  D  D  I  X0 W  r        C
5 ADDIU R13,R12,1                        F  F  F  F  D  I  X0 W  r        C
6 ADDIU R14,R12,2                                    F  D  I  I  X0 W  r     C
Cyc  D  I  ROB (entries 0 1 2 3)
0
1    0
2    1  0  R1
3    2  1  R11
4          R5
5
6    3  2  R11
7          R7
8          R1
9
10   4  3
11   5  4  R12
12   6  5  R13 R5
13         R14
14      6  R12
15         R13
16         R7
17         R14
18
19
Legend:
State of ROB at beginning of cycle
Empty = free entry in ROB
Pending entry in ROB
Circle = Finished (Cycle after W)
Last cycle before entry is freed from ROB (Cycle in C stage)
Entry becomes free and is freed on next cycle
What if First Instruction Causes an
Exception?
0 MUL R1, R2, R3    F  D  I  Y0 Y1 Y2 Y3 W  /
1 ADDIU R11,R10,1      F  D  I  X0 W  r  -- /
2 MUL R5, R1, R4          F  D  I  I  I  Y0 /
3 MUL R7, R5, R6             F  D  D  D  I  /
4 ADDIU R12,R11,1               F  F  F  D  /
                                               F  D  I  . . .
What About Branches?
Option 1: Squash instructions earlier. Has more complexity. ROB needs many ports.
0 BEQZ R1, target   F  D  I  X0 W  C
1 ADDIU R11,R10,1      F  D  I  -
2 ADDIU R5, R1, R4        F  D  -
3 ADDIU R7, R5, R6           F  -
T ADDIU R12,R11,1               F  D  I  . . .
Option 2: Squash instructions in ROB when Branch commits.
0 BEQZ R1, target   F  D  I  X0 W  C
1 ADDIU R11,R10,1      F  D  I  X0 /
2 ADDIU R5, R1, R4        F  D  I  /
3 ADDIU R7, R5, R6           F  D  /
T ADDIU R12,R11,1               F  D  I  . . .
Option 3: Wait for speculative instructions to reach the Commit stage and squash in Commit stage.
0 BEQZ R1, target   F  D  I  X0 W  C
1 ADDIU R11,R10,1      F  D  I  X0 W  /
2 ADDIU R5, R1, R4        F  D  I  X0 W  /
3 ADDIU R7, R5, R6           F  D  I  X0 W  /
T ADDIU R12,R11,1               F  D  I  X0 W  C
• Three possible designs with decreasing
complexity based on when to squash speculative
instructions and de-allocate ROB entry:
1. As soon as branch resolves
2. When branch commits
3. When speculative instructions reach commit
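As one concrete instance, design 2 (squash when the branch commits) can be sketched on a flat list. This is a simplified illustration under assumptions: `commit_branch` and the dict-based entries are hypothetical, the ROB is a Python list ordered oldest first, and only one branch is in flight, so every entry with S set belongs to that branch.

```python
def commit_branch(rob, mispredicted):
    """Commit the branch at the ROB head and resolve its speculative entries.

    rob: list of {"name": str, "S": bool} dicts, oldest first; rob[0] is the branch.
    Returns the names of the squashed instructions.
    """
    rob.pop(0)                                   # the branch itself retires
    if mispredicted:
        squashed = [e["name"] for e in rob if e["S"]]
        rob[:] = [e for e in rob if not e["S"]]  # de-allocate wrong-path entries
    else:
        squashed = []
        for e in rob:
            e["S"] = False                       # entries are no longer speculative
    return squashed


rob = [
    {"name": "BEQZ R1, target", "S": False},
    {"name": "ADDIU R11,R10,1", "S": True},
    {"name": "ADDIU R5, R1, R4", "S": True},
    {"name": "ADDIU R7, R5, R6", "S": True},
]
squashed = commit_branch(rob, mispredicted=True)
```

Design 1 would run the same squash as soon as the branch resolves, and design 3 would instead drop each speculative entry individually when it reaches the commit stage.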
Avoiding Stalling Commit on Store
Miss
[Figure: W, C, and R stages with the PRF, ARF, ROB, FSB, and CSB structures.]
CSB = Committed Store Buffer
0 OpA   F  D  I  X0 W  C
1 SW       F  D  I  S0 W  C  C  C  C
2 OpB         F  D  I  X0 W  W  W  W  C
3 OpC            F  D  I  X  X  X  X  W  C
4 OpD               F  D  I  I  I  I  X  W  C
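[Figure: pipeline with an Issue Queue: F, D, IQ, I, then the functional-unit pipes (M0 M1 and Y0 Y1 Y2 Y3 are recoverable), then W. Access table: ARF R, W; SB R, R/W, W; IQ W, R/W, W.]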
Issue Queue (IQ)
Op Imm S V Dest V P Src0 V P Src1
Op: Opcode
Imm: Immediate
S: Speculative Bit
V: Valid (Instruction has corresponding Src/Dest)
P: Pending (Waiting on operands to be produced)
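The issue queue fields map naturally onto a small record with wakeup and select operations. This is a sketch under assumptions: the names `IQEntry`, `IssueQueue`, `wakeup`, and `select` are hypothetical, the V bits are modeled by `None` for an absent operand, and selection simply picks the oldest ready entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IQEntry:
    op: str
    dest: Optional[str]           # None plays the role of V = 0
    src0: Optional[str] = None
    src1: Optional[str] = None
    p0: bool = False              # P bit: Src0 is still waiting on its producer
    p1: bool = False              # P bit: Src1 is still waiting on its producer
    imm: Optional[int] = None
    s: bool = False               # speculative bit

    def ready(self) -> bool:
        return not (self.p0 or self.p1)


class IssueQueue:
    def __init__(self):
        self.entries = []         # kept in program order, oldest first

    def insert(self, entry: IQEntry) -> None:
        self.entries.append(entry)

    def wakeup(self, reg: str) -> None:
        # A producer has made `reg` available: clear matching pending bits.
        for e in self.entries:
            if e.src0 == reg:
                e.p0 = False
            if e.src1 == reg:
                e.p1 = False

    def select(self) -> Optional[IQEntry]:
        # Issue the oldest ready instruction, which may not be the oldest overall.
        for i, e in enumerate(self.entries):
            if e.ready():
                return self.entries.pop(i)
        return None


iq = IssueQueue()
iq.insert(IQEntry("MUL", "R5", "R1", "R4", p0=True))   # waits on R1
iq.insert(IQEntry("ADDIU", "R12", "R11", imm=1))       # operands ready
first = iq.select()       # the ADDIU issues ahead of the older MUL
stalled = iq.select()     # None: the MUL is still pending
iq.wakeup("R1")
second = iq.select()      # now the MUL issues
```

Selecting the oldest ready entry is one common policy; the slides do not specify which policy is used.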
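[Figure: issue queue organizations. Centralized: F, D, a single IQ, then I feeding the functional-unit pipes (M0, Y0). Distributed: F, D, then a separate IQ and I stage in front of each functional-unit pipe (M0, Y0).]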
Advanced Scoreboard
       P   Data Avail.
           4 3 2 1 0
R1
R2
R3
…
R31
P: Pending, write to destination in flight
Data Avail.: Where the write data is in the pipeline and which functional unit
                    0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
0 MUL R1, R2, R3    F  D  I  Y0 Y1 Y2 Y3 W
1 ADDIU R11,R10,1      F  D  I  X0 W
2 MUL R5, R1, R4          F  D  i     I  Y0 Y1 Y2 Y3 W
3 MUL R7, R5, R6             F  D  i              I  Y0 Y1 Y2 Y3 W
4 ADDIU R12,R11,1               F  D  i  I  X0 W
5 ADDIU R13,R12,1                  F  D  i  I  X0 W
6 ADDIU R14,R12,2                     F  D  i        I  X0 W
Cyc  D  I  IQ (entries 0 1 2)
0
1    0
2    1  0  R1/R2/R3
3    2  1  R11/R10
4    3     R5/R1/R4
5    4     R7/R5/R6
6    5  2  R12/R11
7    6  4  R13/R12
8       5  R14/R12
9
10      3
11      6  R14/R12
12
13
14
Legend:
IQ entries are shown as Dest/Src0/Src1; circle denotes value present in ARF
Value bypassed so no circle, present bit
Value set present by Instruction 1 in cycle 5, W Stage
Assume All Instructions in Issue Queue
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 MUL R1, R2, R3 F D i I Y0 Y1 Y2 Y3 W
1 ADDIU R11,R10,1 F D i I X0 W
2 MUL R5, R1, R4 F D i I Y0 Y1 Y2 Y3 W
3 MUL R7, R5, R6 F D i I Y0 Y1 Y2 Y3 W
4 ADDIU R12,R11,1 F D i I X0 W
5 ADDIU R13,R12,1 F D i I X0 W
6 ADDIU R14,R12,2 F D i I X0 W
IO2I: In-order Frontend, Out-of-order
Issue/Writeback, In-order Commit
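[Figure: IO2I pipeline: F, D, IQ, I, then the X0; L0 L1; S0; and Y0 Y1 Y2 Y3 pipes, then W and C, with the SB, PRF, ARF, ROB, and FSB structures. Access table: ARF W; SB R/W, W; PRF R, W; ROB R/W, W, R/W; FSB W, R/W; IQ W, R/W.]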
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 MUL R1, R2, R3 F D I Y0 Y1 Y2 Y3 W C
1 ADDIU R11,R10,1 F D I X0 W r C
2 MUL R5, R1, R4 F D i I Y0 Y1 Y2 Y3 W C
3 MUL R7, R5, R6 F D i I Y0 Y1 Y2 Y3 W C
4 ADDIU R12,R11,1 F D i I X0 W r C
5 ADDIU R13,R12,1 F D i I X0 W r C
6 ADDIU R14,R12,2 F D i I X0 W r C
Out-of-order 2-Wide Superscalar
with 1 ALU
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 MUL R1, R2, R3 F D I Y0 Y1 Y2 Y3 W C
1 ADDIU R11,R10,1 F D I X0 W r C
2 MUL R5, R1, R4 F D i I Y0 Y1 Y2 Y3 W C
3 MUL R7, R5, R6 F D i I Y0 Y1 Y2 Y3 W C
4 ADDIU R12,R11,1 F D I X0 W r C
5 ADDIU R13,R12,1 F D i I X0 W r C
6 ADDIU R14,R12,2 F D i I X0 W r C
Acknowledgements
• These slides contain material developed and copyrighted by:
– Arvind (MIT)
– Krste Asanovic (MIT/UCB)
– Joel Emer (Intel/MIT)
– James Hoe (CMU)
– John Kubiatowicz (UCB)
– David Patterson (UCB)
– Christopher Batten (Cornell)
Copyright © 2013 David Wentzlaff