08 Speculation

Speculation allows out-of-order execution by ignoring control dependencies and executing instructions before branches are resolved. This is enabled through branch prediction and speculative execution. Hardware mechanisms like the reorder buffer and register renaming allow recovering from mispredictions while preserving data and exception behavior. The reorder buffer buffers instruction results and commits them to registers and memory in program order.
Dynamic ILP

Speculation

Outline

• Speculation
– Re-order buffers
• Limits to ILP

Speculation

Branch Prediction – Out of Order Execution

Control Dependence Ignored
• If CPU stalls on branches, how much would
CPI increase?

• Control dependence need not be preserved in the whole execution
– willing to execute instructions that should not have been
executed, thereby violating the control dependences, if we
can do so without affecting correctness of the program
• Two properties critical to program
correctness are:
– data flow
– exception behavior
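The stall question on this slide can be worked through numerically. The following is a minimal sketch, assuming a base CPI of 1.0, a branch frequency of 20%, and a 3-cycle stall per branch. All three numbers are illustrative assumptions, not figures from the slides.

```python
def cpi_with_branch_stalls(base_cpi, branch_fraction, stall_cycles):
    """Effective CPI when every branch stalls the pipeline for stall_cycles."""
    return base_cpi + branch_fraction * stall_cycles

# Assumed, illustrative parameters (not taken from the slides).
base = 1.0
cpi = cpi_with_branch_stalls(base_cpi=base, branch_fraction=0.2, stall_cycles=3)
slowdown = cpi / base
print(f"CPI = {cpi:.2f}, slowdown = {slowdown:.2f}x")
```

Under these assumptions the CPI grows from 1.0 to 1.6, which is why stalling on every branch is usually considered too expensive.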

Branch Prediction and Speculative Execution
• Speculation is to run instructions on prediction – predictions
could be wrong.
• Branch prediction: cannot be avoided, could be very accurate
• Misprediction is a less frequent event – but can we ignore it?
• Example:
    for (i=0; i<1000; i++)
        C[i] = A[i]+B[i];
• Branch prediction: predict the execution as accurately as
possible (frequent cases)
• Speculative execution recovery: if prediction is wrong, roll the
execution back
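To see why the loop branch in the example is a "frequent case" that predicts well, here is a minimal sketch of a 2-bit saturating-counter predictor. The slides do not name a specific predictor, so the 2-bit scheme, the initial counter state, and the function name are assumptions chosen for illustration. The loop-closing branch is modeled as taken 999 times and not taken once at exit.

```python
def simulate_two_bit_predictor(outcomes, counter=0):
    """Count mispredictions of a 2-bit saturating counter.

    counter is 0..3; values 2 and 3 predict taken, 0 and 1 predict not taken.
    """
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return mispredictions

# Loop-closing branch of the 1000-iteration loop: taken 999 times, then falls through.
loop_branch = [True] * 999 + [False]
misses = simulate_two_bit_predictor(loop_branch)
accuracy = 1 - misses / len(loop_branch)
print(misses, f"{accuracy:.3f}")
```

Starting from the strongly-not-taken state, the sketch mispredicts twice while warming up and once at loop exit. Those rare mispredictions are the cases that still need a recovery mechanism.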

Exception Behavior
• Preserving exception behavior -- exceptions must be
raised exactly as in sequential execution
– Same sequence as sequential
– No “extra” exceptions
• Example:
DADDU R2,R3,R4
BEQZ R2,L1
LW R1,0(R2)
L1:
– Problem with moving LW before BEQZ?
• Again, a dynamic execution must look like a sequential
execution whenever it is stopped
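One way to explore the question about moving LW above BEQZ is to model the three instructions directly. This is a minimal sketch, assuming memory is a dictionary and that any address missing from it raises a fault. The `MemoryFault` class and both function names are hypothetical.

```python
class MemoryFault(Exception):
    """Raised when a load touches an unmapped address (assumed fault model)."""

def load_word(memory, address):
    if address not in memory:
        raise MemoryFault(f"invalid address {address}")
    return memory[address]

def sequential(r1, r3, r4, memory):
    r2 = r3 + r4                    # DADDU R2,R3,R4
    if r2 != 0:                     # BEQZ R2,L1 skips the load when R2 == 0
        r1 = load_word(memory, r2)  # LW R1,0(R2)
    return r1                       # L1:

def load_hoisted(r1, r3, r4, memory):
    r2 = r3 + r4                    # DADDU R2,R3,R4
    r1 = load_word(memory, r2)      # LW moved above BEQZ: runs on both paths
    return r1                       # L1:

memory = {8: 42}                    # assumed: only address 8 is mapped
same_when_not_taken = sequential(7, 3, 5, memory) == load_hoisted(7, 3, 5, memory)
kept_r1 = sequential(7, 0, 0, memory)   # branch taken, R1 keeps its old value
try:
    load_hoisted(7, 0, 0, memory)
    extra_exception = False
except MemoryFault:
    extra_exception = True
print(same_when_not_taken, kept_r1, extra_exception)
```

When the branch is not taken, both orders agree. When R2 is zero, the hoisted load faults on an address the sequential program never touches, which is exactly the "extra" exception the slide rules out.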

Exceptions in Order

• Solutions:
– Early detection of FP exceptions
– The use of software mechanisms to restore a precise
exception state before resuming execution
– Delaying instruction completion until we know an
exception is impossible

Precise Interrupts
• An interrupt is precise if the saved process
state corresponds with a sequential model of
program execution where one instruction
completes before the next begins.
• Tomasulo had:
In-order issue, out-of-order execution, and
out-of-order completion

• Need to “fix” the out-of-order completion
aspect so that we can find precise breakpoint
in instruction stream.

Short Seminar – Precise Exceptions

1. 01277582(Implementation of precise exception in a 5-stage
pipeline embedded processor - CNF03).pdf
2. 01354393(A 0.18-spl mu-m CMOS implementation of an area
efficient precise exception handling unit for
processing-in-memory systems - CNF04).pdf
3. 00004607(Implementing precise interrupts in pipelined
processors - JNL88).pdf

HW Support for More ILP
• Speculation: allow an instruction to issue that is
dependent on branch predicted to be taken without
any consequences (including exceptions) if branch is
not actually taken (“HW undo”);
• Combine branch prediction with dynamic scheduling
to execute before branches resolved
• Separate speculative bypassing of results from real
bypassing of results
– When instruction no longer speculative,
write boosted results (instruction commit)
or discard boosted results
– execute out-of-order but commit in-order
to prevent irrevocable action (update state or exception)
until instruction commits

HW support for More ILP

• Need HW buffer for results of uncommitted instructions:
reorder buffer
– 4 fields: instr, destination, value, ready
– Reorder buffer can be operand source =>
more registers like RS
– Use reorder buffer number instead of
reservation station when execution
completes
– Supplies operands between execution
complete & commit
– Once the instruction commits,
result is put into register
– Instructions commit in order
– As a result, it's easy to undo speculated
instructions on mispredicted branches
or on exceptions
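The four-field reorder buffer and its in-order commit can be sketched in a few lines. This is a simplified model, not the hardware described in any particular processor: tags are monotonically increasing integers instead of circular buffer indices, and the class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class RobEntry:
    instr: str                   # the 4 fields listed on the slide
    dest: Optional[str]
    value: Optional[int] = None
    ready: bool = False

class ReorderBuffer:
    def __init__(self, size):
        self.size = size
        self.entries = deque()   # (tag, entry) pairs, oldest at the left
        self.next_tag = 0

    def issue(self, instr, dest):
        """Allocate an entry in program order; return its tag, or None if full."""
        if len(self.entries) >= self.size:
            return None
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append((tag, RobEntry(instr, dest)))
        return tag

    def write_result(self, tag, value):
        """Record a finished result; entries may complete out of order."""
        for t, entry in self.entries:
            if t == tag:
                entry.value, entry.ready = value, True

    def commit(self, regfile):
        """Retire ready entries from the head only, updating registers in order."""
        committed = []
        while self.entries and self.entries[0][1].ready:
            _, entry = self.entries.popleft()
            if entry.dest is not None:
                regfile[entry.dest] = entry.value
            committed.append(entry.instr)
        return committed

    def flush_after(self, tag):
        """Undo speculation: drop every entry younger than tag."""
        while self.entries and self.entries[-1][0] > tag:
            self.entries.pop()

regs = {"R1": 0, "R2": 0}
rob = ReorderBuffer(size=4)
t_ld = rob.issue("LD R1,0(R3)", "R1")
t_add = rob.issue("DADDIU R2,R2,#1", "R2")
rob.write_result(t_add, 7)       # the younger instruction finishes first
first = rob.commit(regs)         # nothing retires: the head is not ready
rob.write_result(t_ld, 5)
second = rob.commit(regs)        # both retire, in program order
t_br = rob.issue("BNE R2,R3,Loop", None)
t_spec = rob.issue("DADDIU R1,R1,#4", "R1")
rob.write_result(t_spec, 99)     # speculative result stays in the buffer
rob.flush_after(t_br)            # misprediction: discard it before it commits
pending = len(rob.entries)
```

Because registers are only written in `commit`, the flushed result never becomes architecturally visible, which is what makes the undo cheap.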

Reorder Buffer Implementation

Result Shift Register
• A "Result Shift Register" is used to control
the result bus
• N is the length of the longest functional
unit pipeline
• An instruction that takes i clock
periods reserves stage i
• If the stage already contains valid
control information, then issue is held
until the next clock period
• Issuing instruction places control
information in the result shift register:
– the functional unit that will be supplying the
result
– the destination register
– This control information is also marked
"valid"
• Each clock period, the control
information is shifted down one stage
toward stage one.
• When it reaches stage one, it is used
during the next clock period to control
the result bus
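The reservation and shifting rules above can be sketched as a small simulation. This is a simplified model under stated assumptions: a `None` slot stands for "not valid", and the class name, method names, and the FMUL/FADD latencies are hypothetical.

```python
class ResultShiftRegister:
    def __init__(self, n_stages):
        # Index 0 is unused so that indices match the stage numbers 1..N.
        self.n_stages = n_stages
        self.stages = [None] * (n_stages + 1)

    def try_issue(self, latency, functional_unit, dest_reg):
        """Reserve stage `latency`; return False if issue must be held."""
        if not 1 <= latency <= self.n_stages:
            raise ValueError("latency must be between 1 and N")
        if self.stages[latency] is not None:
            return False          # stage already holds valid control information
        self.stages[latency] = (functional_unit, dest_reg)
        return True

    def clock(self):
        """Advance one clock period; return what controls the result bus."""
        on_bus = self.stages[1]
        # Shift every stage down by one toward stage one.
        self.stages[1:] = self.stages[2:] + [None]
        return on_bus

rsr = ResultShiftRegister(n_stages=4)
issued_mul = rsr.try_issue(3, "FMUL", "F2")   # 3-cycle op reserves stage 3
bus1 = rsr.clock()                            # FMUL moves to stage 2
held = rsr.try_issue(2, "FADD", "F4")         # stage 2 is taken: issue is held
bus2 = rsr.clock()                            # FMUL moves to stage 1
issued_add = rsr.try_issue(2, "FADD", "F4")   # retry succeeds one period later
bus3 = rsr.clock()                            # FMUL drives the result bus
bus4 = rsr.clock()                            # FADD drives the result bus
```

Holding the second issue for one clock period is what guarantees that two results never compete for the result bus in the same period.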
The Hardware: Reorder Buffer
• If instructions write results in program order,
reg/memory always get the correct values
• Reorder buffer (ROB) – reorders out-of-order
instructions to program order at the time of
writing reg/memory (commit)
• If some instruction goes wrong, handle it at the
time of commit – just flush the instructions after it
• Instructions cannot write reg/memory
immediately after execution, so the ROB also
buffers the results
– No such place in the original Tomasulo design
[Figure: pipeline block diagram with IM, Fetch Unit, Decode, Rename, Regfile, Reorder Buffer, S-buf, L-buf, RS, RS, DM, FU1, FU2]

Four Steps of Speculative Tomasulo Algorithm
1. Issue—get instruction from FP Op Queue
If reservation station and reorder buffer slot free, issue instr & send
operands & reorder buffer no. for destination (this stage sometimes
called “dispatch”)
2. Execution—operate on operands (EX)
When both operands ready then execute; if not ready, watch CDB for
result; when both in reservation station, execute; checks RAW
(sometimes called “issue”)
3. Write result—finish execution (WB)
Write on Common Data Bus to all awaiting FUs
& reorder buffer; mark reservation station available.
4. Commit—update register with reorder result
When instr. at head of reorder buffer & result present, update register
with result (or store to memory) and remove instr from reorder buffer.
Mispredicted branch flushes reorder buffer (sometimes called
“graduation”)

Reorder Buffer Details
• Holds Instruction type: branch, store, ALU
register operation
• Holds branch valid and exception bits
– Flush pipeline when any bit is set
• Holds dest, result and PC
– Write results to dest at the time of commit
– Which PC to hold?
• A ready bit indicates if the instruction has
completed execution and the value is ready
• Supplies operands between execution
complete and commit
• ROB replaces the Store Buffer also
[Figure: Reorder Buffer entry fields: Program Counter, Branch or L/W?, Exceptions?, Dest reg, Ready?, Result]
Speculative Execution Recovery
• Flush the pipeline on misprediction
– MIPS 5-stage pipeline used flushing on taken
branches
• Where is the flush signal from?
• When to flush?
[Figure: pipeline block diagram with IM, Fetch Unit, Decode, Rename, Regfile, Reorder Buffer, S-buf, L-buf, RS, RS, DM, FU1, FU2]

Changes to Other Components
• Use ROB index as tag
– Why not RS index any more?
– Why is ROB index a valid choice?
• Renaming table maps architectural registers
to ROB index if the register is renamed
• Reservation stations now use ROB index for
tracking dependence and for wakeup
• Again tag (now ROB index) and data are
broadcast on CDB at writeback
• Inst may receive values from reg/mem, data
broadcasting, or ROB
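The three operand sources listed above (register file, ROB, or a later broadcast) can be sketched as a lookup keyed by ROB index. This is a minimal sketch with dictionaries standing in for hardware tables. The function names and the data layout are assumptions made for illustration.

```python
def read_operand(reg, rename_table, rob, regfile):
    """Return ("value", v) if the operand is available, else ("tag", rob_index)."""
    tag = rename_table.get(reg)
    if tag is None:
        return ("value", regfile[reg])       # not renamed: read the register file
    entry = rob[tag]
    if entry["ready"]:
        return ("value", entry["value"])     # finished but not yet committed
    return ("tag", tag)                      # wait for this ROB index on the CDB

def broadcast(cdb_tag, cdb_value, station):
    """Wake up operands in a reservation station that wait on cdb_tag."""
    for name, (kind, payload) in list(station["operands"].items()):
        if kind == "tag" and payload == cdb_tag:
            station["operands"][name] = ("value", cdb_value)

def is_ready(station):
    return all(kind == "value" for kind, _ in station["operands"].values())

regfile = {"R1": 100, "R2": 0, "R3": 9}
rob = {
    0: {"dest": "R2", "value": None, "ready": False},   # still executing
    1: {"dest": "R3", "value": 41, "ready": True},      # done, awaiting commit
}
rename_table = {"R2": 0, "R3": 1}                       # R1 is not renamed

from_regfile = read_operand("R1", rename_table, rob, regfile)
from_rob = read_operand("R3", rename_table, rob, regfile)
waiting = read_operand("R2", rename_table, rob, regfile)

station = {"operands": {"src1": waiting, "src2": from_regfile}}
ready_before = is_ready(station)
broadcast(0, 55, station)                               # writeback of ROB entry 0
ready_after = is_ready(station)
```

The ROB index works as a tag here because each in-flight instruction owns exactly one ROB entry from issue until commit.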

Code Example
Loop: LD R2, 0(R1)
DADDIU R2, R2, #1
SD R2, 0(R1)
DADDIU R1, R1, #4
BNE R2, R3, Loop
How would this code be executed?
Inst    Issue   Exec    Memory read   Write results   Commit
LD      1       2       3             4               5
…       …       …       …             …               …
…       …       …       …             …               …

Summary
• Reservation stations: implicit register renaming to larger
set of registers + buffering source operands
– Prevents registers from becoming a bottleneck
– Avoids WAR, WAW hazards of Scoreboard
• Not limited to basic blocks when compared to static
scheduling (integer unit gets ahead, beyond branches)
• Today, helps cache misses as well
– Don’t stall for L1 Data cache miss
– Can support memory-level parallelism
• Lasting Contributions
– Dynamic scheduling
– Register renaming
– Load/store disambiguation
• 360/91 descendants are Pentium III; PowerPC 604; MIPS
R10000; HP-PA 8000; Alpha 21264

Dynamic Scheduling: The Only Choice?
• Most high-performance processors today are dynamically
scheduled superscalar processors
– With deeper and n-way issue pipeline
• Other alternatives to exploit instruction-level parallelism
– Statically scheduled superscalar
– VLIW
• Mixed effort: EPIC – Explicit Parallel Instruction Computing
– Example: Intel Itanium processors

• Why is dynamic scheduling so popular today?
– Technology trends: increasing transistor budget, deeper pipeline, wide
issue
