Lecture-10-pre
In-order commit
and multiple issue
CPEN 411: Computer Architecture
• Speculative execution
• Reorder buffer
• Multiple issue
Learning objectives
• Describe the scenarios where out-of-order instruction
commit is not desirable for CPUs
• Mispredicted branch
Clock Number        1  2  3  4  5  6  7  8
TAKEN branch        F  D  X  M  W
branch instr. +1       F  D  X  M  W
branch instr. +2          F  D  X  M  W
branch target                F  D  X  M  W
Why would we flush the pipeline(s)?
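The timing in the diagram above can be reproduced with a short sketch. It assumes one instruction is fetched per cycle and that the branch outcome is known in X, so the target is fetched in cycle 4 (consistent with the diagram). The function names `schedule` and `wrong_path` are illustrative, not from the slides.

```python
STAGES = ["F", "D", "X", "M", "W"]

def schedule(names, stages=STAGES):
    """Map each instruction to {cycle: stage}, with one new fetch per cycle."""
    return {
        name: {first + offset: stage for offset, stage in enumerate(stages)}
        for first, name in enumerate(names, start=1)
    }

def wrong_path(names, branch, resolve_stage="X", stages=STAGES):
    """Instructions fetched after `branch`, up to and including the cycle in
    which the branch occupies `resolve_stage` (the assumed resolution point)."""
    sched = schedule(names, stages)
    fetch = {name: min(cycles) for name, cycles in sched.items()}
    resolve_cycle = fetch[branch] + stages.index(resolve_stage)
    return [n for n in names if fetch[branch] < fetch[n] <= resolve_cycle]

program = ["TAKEN branch", "branch instr. +1", "branch instr. +2", "branch target"]
for name, cycles in schedule(program).items():
    row = " ".join(f"{cycles.get(c, ''):>2}" for c in range(1, 9))
    print(f"{name:<18}{row}")

# The two instructions fetched behind the branch are the ones to flush.
print(wrong_path(program, "TAKEN branch"))
```

Under these assumptions the two sequential instructions behind the taken branch are on the wrong path and must be squashed before they change architectural state.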
• Exceptions
– Trap (debugger breakpoint)
– OS context switch (timer)
– Page fault (unmapped virt. addr.)
Exception handling support
[Figure: pipeline with exception handler address, “nop”/“squash” injection signals, and cancelled instructions]
Which of these is not a precise exception?
A: division by 0
B: segmentation fault
C: context switch
D: file does not exist ✓
E: all are precise exceptions
So far
• Precise exceptions
The Reorder Buffer (ROB)
[Figure: Inst. Queue, Reorder Buffer, and Regs, with the steps Issue, Complete, and Commit]
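A minimal sketch of how a reorder buffer can give precise exceptions: results wait in the buffer and reach the register file only from the oldest end, in program order. The class, field names, and the two example instructions are hypothetical and chosen for illustration; real designs differ in many details.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Entry:
    dest: str                 # architectural destination register, "-" if none
    pc: int
    value: object = None
    done: bool = False
    exception: bool = False

class ReorderBuffer:
    def __init__(self):
        self.entries = deque()            # left = oldest, right = newest

    def allocate(self, dest, pc):
        entry = Entry(dest, pc)
        self.entries.append(entry)
        return entry

    def commit(self, regs):
        """Retire finished entries from the oldest end, in program order.
        Returns the PC of a faulting instruction, or None."""
        while self.entries and self.entries[0].done:
            head = self.entries.popleft()
            if head.exception:
                self.entries.clear()      # squash everything younger
                return head.pc
            if head.dest != "-":
                regs[head.dest] = head.value
        return None

regs = {"R3": 0, "R4": 5}
rob = ReorderBuffer()
div = rob.allocate("R3", 0x00)    # hypothetical older instruction
add = rob.allocate("R4", 0x04)    # hypothetical younger instruction
add.value, add.done = 15, True    # the younger one finishes first...
first = rob.commit(regs)          # ...but cannot commit past the unfinished head
div.done = div.exception = True   # the older one then raises, e.g. division by 0
fault_pc = rob.commit(regs)
print(first, fault_pc, regs)
```

In this run the younger instruction's result (15) never reaches R4: the register file still reflects exactly the instructions before the faulting one, which is what makes the exception precise.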
Tomasulo With Reorder Buffer
Cycle 3: DIV issued, allocates ROB and Reservation Station entries

Reorder Buffer
Entry  Dest  Value  Program Counter            Done?
ROB1   R3           PC=0x00 (DIV R3,R1,R2)     N      (oldest)

[Figure: Instr. Queue, Reorder Buffer (ROB1 to ROB7, newest at ROB7), Reservation Stations, adder, multipliers, branch]

In the reservation station for DIVD, what does “1” represent?
A: R1
B: ROB1 ✓
Tomasulo With Reorder Buffer
Cycle 4: BEQZ issued, allocates ROB and Reservation Station entries

Reorder Buffer
Entry  Dest  Value  Program Counter            Done?
ROB2   -     -      PC=0x04 (BEQZ R3,Loop)     N
ROB1   R3           PC=0x00 (DIV R3,R1,R2)     N      (oldest)

RAT+Registers
R4  Regs[R4]
R3  ROB1
R2  Regs[R2]
R1  Regs[R1]

Reservation Stations
Dest  Instruction
1     DIVD Regs[R1],Regs[R2]
      BEQZ ROB1,Loop
Tomasulo With Reorder Buffer
Cycle 5: DMUL issued, allocates ROB and Reservation Station entries

Reorder Buffer
Entry  Dest  Value  Program Counter            Done?
ROB3   R4           PC=0x20 (DMUL R4,R4,R2)    N
ROB2   -     -      PC=0x04 (BEQZ R3,Loop)     N
ROB1   R3           PC=0x00 (DIV R3,R1,R2)     N      (oldest)
Tomasulo With Reorder Buffer
Cycle 6: DADD issued, allocates ROB and Reservation Station entries, updates Register alias table

Reorder Buffer
Entry  Dest  Value  Program Counter            Done?
ROB4   R4           PC=0x24 (DADD R4,R4,R1)    N
ROB3   R4           PC=0x20 (DMUL R4,R4,R2)    N
ROB2   -     -      PC=0x04 (BEQZ R3,Loop)     N
ROB1   R3           PC=0x00 (DIV R3,R1,R2)     N      (oldest)

RAT+Registers
R4  ROB4
R3  ROB1
R2  Regs[R2]
R1  Regs[R1]

Reservation Stations
Dest  Instruction
4     DADD ROB3,Regs[R1]
1     DIVD Regs[R1],Regs[R2]
      BEQZ ROB1,Loop
3     DMUL Regs[R4],Regs[R2]
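The renaming done at issue in cycles 3 to 6 can be sketched as follows. This is an illustrative model only: the tuple layout and the name `issue_all` are assumptions, and ROB tags are simply handed out in issue order starting from ROB1, as in the slides.

```python
def issue_all(program):
    """Issue instructions in program order; return (rat, stations)."""
    rat = {}        # architectural register -> ROB tag of its latest pending writer
    stations = []   # (dest tag number or None, opcode, operand sources)
    for tag, (op, dest, srcs, target) in enumerate(program, start=1):
        # Sources are looked up BEFORE the destination is renamed, so
        # "DADD R4,R4,R1" reads the older producer of R4 (ROB3).
        operands = [rat.get(r, f"Regs[{r}]") for r in srcs]
        if target is not None:
            operands.append(target)
        stations.append((tag if dest else None, op, operands))
        if dest:
            rat[dest] = f"ROB{tag}"
    return rat, stations

program = [
    ("DIVD", "R3", ["R1", "R2"], None),    # cycle 3
    ("BEQZ", None, ["R3"], "Loop"),        # cycle 4
    ("DMUL", "R4", ["R4", "R2"], None),    # cycle 5
    ("DADD", "R4", ["R4", "R1"], None),    # cycle 6
]
rat, stations = issue_all(program)
for dest, op, operands in stations:
    print(dest if dest is not None else " ", op, ",".join(operands))
print(rat)
```

The sketch ends with R3 mapped to ROB1 and R4 mapped to ROB4, and DADD waiting on ROB3 rather than on Regs[R4], matching the cycle 6 tables.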
Tomasulo With Reorder Buffer
Cycle 16: result broadcast (tag, value): ROB3, X

Reorder Buffer
Entry  Dest  Value  Program Counter            Done?
ROB4   R4           PC=0x24 (DADD R4,R4,R1)    N
ROB3   R4    X      PC=0x20 (DMUL R4,R4,R2)    Y
ROB2   -     -      PC=0x04 (BEQZ R3,Loop)     N
Tomasulo With Reorder Buffer
Cycle 17: result broadcast (tag, value): ROB4, Y

Reorder Buffer
Entry  Dest  Value  Program Counter            Done?
ROB4   R4    Y      PC=0x24 (DADD R4,R4,R1)    Y
ROB3   R4    X      PC=0x20 (DMUL R4,R4,R2)    Y
ROB2   -     -      PC=0x04 (BEQZ R3,Loop)     N
Tomasulo With Reorder Buffer
Cycle 104: result broadcast (tag, value): ROB1, “42”

Reorder Buffer
Entry  Dest  Value  Program Counter            Done?
ROB4   R4    Y      PC=0x24 (DADD R4,R4,R1)    Y
ROB3   R4    X      PC=0x20 (DMUL R4,R4,R2)    Y
ROB2   -     -      PC=0x04 (BEQZ R3,Loop)     N
Tomasulo With Reorder Buffer
Cycle 105: Register R3 updated (ROB1 released).

Reorder Buffer
Entry  Dest  Value  Program Counter            Done?
ROB4   R4    Y      PC=0x24 (DADD R4,R4,R1)    Y
ROB3   R4    X      PC=0x20 (DMUL R4,R4,R2)    Y
ROB2   -     -      PC=0x04 (BEQZ R3,Loop)     N
ROB1   R3    “42”   PC=0x00 (DIV R3,R1,R2)     Y

Since the branch was mispredicted, should we flush it and execute it again too?
A: Yes
B: No ✓
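The walkthrough above (results arriving in cycles 16, 17, and 104, commit in cycle 105) can be replayed with a small sketch. Two things here are assumptions rather than content from the slides: the branch is handled when it reaches the head of the ROB, and the cycle in which it resolves is left unspecified. The dictionary layout and helper names are illustrative.

```python
def commit_ready(rob, regs):
    """Commit from the oldest entry while entries are done; return a log."""
    log = []
    while rob and rob[0]["done"]:
        head = rob.pop(0)
        if head.get("mispredicted"):
            # The branch itself retires; only the younger entries are flushed.
            squashed = [e["tag"] for e in rob]
            rob.clear()
            log.append(("flush", head["tag"], squashed))
            break
        if head["dest"]:
            regs[head["dest"]] = head["value"]
        log.append(("commit", head["tag"]))
    return log

rob = [
    {"tag": "ROB1", "dest": "R3", "value": None, "done": False},   # DIV
    {"tag": "ROB2", "dest": None, "value": None, "done": False},   # BEQZ
    {"tag": "ROB3", "dest": "R4", "value": None, "done": False},   # DMUL
    {"tag": "ROB4", "dest": "R4", "value": None, "done": False},   # DADD
]
by_tag = {e["tag"]: e for e in rob}
regs = {}

def complete(tag, value=None, **flags):
    by_tag[tag].update(value=value, done=True, **flags)

complete("ROB3", "X")                    # cycle 16
complete("ROB4", "Y")                    # cycle 17
log_17 = commit_ready(rob, regs)         # nothing commits: ROB1 is not done
complete("ROB1", 42)                     # cycle 104
log_105 = commit_ready(rob, regs)        # cycle 105: R3 updated, ROB1 released
complete("ROB2", mispredicted=True)      # hypothetical later cycle
log_flush = commit_ready(rob, regs)
print(log_17, log_105, log_flush, regs)
```

DMUL and DADD were finished long before DIV, yet neither ever writes R4: they sit behind the branch and are discarded when it turns out to be mispredicted. The branch is not re-executed, in line with answer B above.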
Some ROB challenges
• In-order superscalar: used today in some low-power processors
• issue up to N instructions per cycle
• at issue:
– check each instruction for hazards with co-issued instructions that are earlier in program order
– also against all earlier instructions still in execution
A: RAW
B: WAW
C: WAR
D: A and B
E: B and C ✓
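The issue-time check in the bullets above could be sketched as below. Instructions are modelled as hypothetical `(dest, sources)` pairs, and which hazard kinds actually block issue is left as the `blocking` parameter; the RAW-only default is an assumption for illustration, not something the slides specify.

```python
def hazards(earlier, later):
    """Classify register hazards of `later` against `earlier` (program order).
    Each instruction is a (dest, sources) pair."""
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    found = set()
    if e_dest is not None and e_dest in l_srcs:
        found.add("RAW")      # later reads what earlier writes
    if e_dest is not None and e_dest == l_dest:
        found.add("WAW")      # both write the same register
    if l_dest is not None and l_dest in e_srcs:
        found.add("WAR")      # later overwrites what earlier reads
    return found

def issue_group(window, in_flight, n, blocking=frozenset({"RAW"})):
    """Pick up to n instructions from the front of `window`, in order,
    stopping at the first one with a blocking hazard against either an
    earlier co-issued instruction or one still in execution."""
    group = []
    for instr in window[:n]:
        older = in_flight + group
        if any(hazards(o, instr) & blocking for o in older):
            break
        group.append(instr)
    return group

dmul = ("R4", ("R4", "R2"))     # DMUL R4,R4,R2
dadd = ("R4", ("R4", "R1"))     # DADD R4,R4,R1
print(sorted(hazards(dmul, dadd)))
print(issue_group([dmul, dadd], in_flight=[], n=2))
```

With these two instructions all three hazard kinds are present, and only DMUL issues in the first cycle because DADD needs its result. The number of pairwise comparisons grows with both N and the number of instructions in flight.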
2-Issue In-order Superscalar
Clock Number          1    2    3    4    5    6    7    8
integer instruction   IF   ID   EX   MEM  WB
FP instruction        IF   ID   EX   EX   EX   WB
integer instruction        IF   ID   EX   MEM  WB
FP instruction             IF   ID   EX   EX   EX   WB
integer instruction             IF   ID   EX   MEM  WB
FP instruction                  IF   ID   EX   EX   EX   WB

Assuming branch resolved in execute, what is the branch penalty measured in number of instructions squashed/flushed on a branch misprediction?
A: 1 instruction
B: 2-3 instructions
C: 4-5 instructions ✓
D: 6-7 instructions
E: Not sure
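The penalty can be counted with a short sketch. It assumes that a full group of wrong-path instructions is fetched every cycle until the branch reaches the stage where it is resolved, with no stalls; the function name and the `slot` parameter (position of the branch within its fetch group) are illustrative.

```python
def squashed_on_mispredict(width, resolve_stage, slot,
                           stages=("IF", "ID", "EX", "MEM", "WB")):
    """Wrong-path instructions fetched before a mispredicted branch
    redirects fetch.

    width:         instructions fetched and issued per cycle
    resolve_stage: stage in which the branch outcome is known
    slot:          position of the branch in its fetch group (0 = first)
    """
    groups_behind = stages.index(resolve_stage)   # later fetch groups in flight
    same_group = width - 1 - slot                 # younger instructions fetched with the branch
    return same_group + width * groups_behind

for slot in (0, 1):
    print(slot, squashed_on_mispredict(width=2, resolve_stage="EX", slot=slot))
```

For the 2-issue pipeline this gives 5 when the branch is the first instruction of its pair and 4 when it is the second. With `width=1` the same formula gives 2, the two squashed instructions of the scalar pipeline.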
[Figure: Tomasulo hardware with Load Buffers (Load1 to Load6), Store Buffers (To Mem), Reservation Stations (Add1 to Add3, Mult1 and Mult2), FP adders, and FP multipliers; annotated “2x / cycle”]
Assumptions:
(1) 2-wide issue
(2) Infinite number of reservation stations
(3) Perfect branch prediction (let’s do three iterations of loop)
(4) Issue instruction from target of taken branch 1 cycle after branch (due to fetch restrictions)
(5) One Integer Unit (handles load/store effective address calculation)
(6) Separate FP Function Unit for each type of FP operation
(7) Issue and Write Results take one cycle
(8) Latencies: One cycle for integer ALU; two cycles for loads; three cycles for FP adds
(9) Two CDBs
(10) Load/Store effective address calculation decoupled from memory access
(11) NO Reorder Buffer
Example: 2-issue w/ Tomasulo
Resource usage:
Changed Assumptions:
…
(5) Address Adder & Integer Unit
…
Modified Example, cont’d...
Memory cycle: #8
• Precise exceptions
• Speculative execution
• Superscalar issue