Chapter 16
Instruction-Level Parallelism and Superscalar Processors
William Stallings, Computer Organization and Architecture, 9th Edition
+ Objectives
After studying this chapter, you should be able to:
Explain the difference between superscalar and superpipelined approaches.
Define instruction-level parallelism.
Discuss dependencies and resource conflicts as limitations to instruction-level parallelism.
Present an overview of the design issues involved in instruction-level parallelism.
Compare and contrast techniques for improving pipeline performance in RISC machines and superscalar machines.
+ Contents
16.1 Overview
16.2 Design Issues
16.1 Overview

Superscalar: the term was first coined in 1987. It refers to a machine that is designed to improve the performance of the execution of scalar instructions.
The essence of the approach is the ability to execute instructions independently and concurrently in different pipelines.
The concept can be further exploited by allowing instructions to be executed in an order different from the program order.
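To make the idea concrete, here is a minimal sketch in Python (an illustration, not from the textbook) of in-order dual issue: two consecutive instructions start in the same cycle only if they are independent, i.e., neither reads or overwrites a register the other uses. The three-operand instruction format and the two-pipeline issue width are assumptions chosen for illustration.

# A minimal sketch (assumed three-operand instructions, two pipelines) of
# in-order dual issue: a pair starts together only when the two are independent.

from typing import List, Tuple

Instr = Tuple[str, str, str]  # (destination register, source 1, source 2)

def independent(a: Instr, b: Instr) -> bool:
    """True if b neither reads a's result (RAW), rewrites a's destination (WAW),
    nor overwrites one of a's sources (WAR)."""
    a_dest, a_srcs = a[0], set(a[1:])
    b_dest, b_srcs = b[0], set(b[1:])
    return not (a_dest in b_srcs or a_dest == b_dest or b_dest in a_srcs)

def dual_issue(program: List[Instr]) -> List[List[Instr]]:
    """Greedy in-order pairing: issue two instructions per cycle when possible."""
    packets, i = [], 0
    while i < len(program):
        if i + 1 < len(program) and independent(program[i], program[i + 1]):
            packets.append([program[i], program[i + 1]])
            i += 2
        else:
            packets.append([program[i]])
            i += 1
    return packets

prog = [("r1", "r2", "r3"),   # r1 = r2 op r3
        ("r4", "r5", "r6"),   # independent of the first -> same cycle
        ("r7", "r1", "r4")]   # needs r1 and r4 -> issued alone, next cycle
for cycle, packet in enumerate(dual_issue(prog)):
    print(f"cycle {cycle}: {packet}")

Running it shows the first two (independent) instructions issued together and the dependent third instruction issued alone in the following cycle.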
+ Comparison of Superscalar and Superpipeline Approaches
+ Constraints

Instruction-level parallelism refers to the degree to which the instructions of a program can be executed in parallel.
A combination of compiler-based optimization and hardware techniques can be used to maximize instruction-level parallelism.

Limitations (situations in which parallel execution cannot be used):
True data dependency: the input of the next instruction is the output of the previous one.
Procedural dependency: the previous instruction is a branch, so the code at the branch target may affect the inputs of the instructions that follow; they cannot be executed until the branch is resolved.
Resource conflict: two instructions need the same resource (bus, registers, ...) at the same time.
Output dependency (write-after-write): two instructions write values to the same output.
Anti-dependency (write-after-read): a later instruction writes a register that an earlier instruction still needs to read.
+ Constraints - Examples

Data dependency:
1. A = 3
2. B = A
3. C = B
The order of these instructions cannot be changed and they cannot be parallelized, because each instruction reads the result of the previous one.

Output dependency:
1. B = 3
2. A = B + 1
3. B = 7
Instructions 1 and 3 cannot be parallelized because they are write-after-write (WAW): both write to B (an output dependency).

Resource conflict:
i2 must wait for resources that are still being accessed by i1.
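The hazard classification above can be stated mechanically. The sketch below (an illustration, not from the slides) models each instruction by the register it writes and the set of registers it reads, then labels every dependency in the two examples; this representation is an assumption made for illustration only.

# A sketch (assumed representation) that labels the dependencies in the two
# examples above: each instruction is the register it writes plus the set it reads.

from typing import List, Set, Tuple

Instr = Tuple[str, Set[str]]  # (register written, registers read)

def classify(earlier: Instr, later: Instr) -> List[str]:
    """Dependency types that force 'later' to stay behind 'earlier'."""
    deps = []
    if earlier[0] in later[1]:
        deps.append("true data dependency (read-after-write)")
    if earlier[0] == later[0]:
        deps.append("output dependency (write-after-write)")
    if later[0] in earlier[1]:
        deps.append("anti-dependency (write-after-read)")
    return deps

data_example = [("A", set()), ("B", {"A"}), ("C", {"B"})]    # A = 3; B = A; C = B
output_example = [("B", set()), ("A", {"B"}), ("B", set())]  # B = 3; A = B + 1; B = 7

for name, prog in (("data", data_example), ("output", output_example)):
    for i in range(len(prog)):
        for j in range(i + 1, len(prog)):
            for dep in classify(prog[i], prog[j]):
                print(f"{name} example, i{i + 1} -> i{j + 1}: {dep}")

It reports the read-after-write chain in the first example, and in the second example both the write-after-write conflict between instructions 1 and 3 and the write-after-read conflict between instructions 2 and 3.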
+ Design Issues

Instruction-Level Parallelism and Machine Parallelism

Machine parallelism: the ability of the processor to take advantage of instruction-level parallelism; governed by the number of parallel pipelines.
+ Instruction Issue Policy
Instruction issue refers to the process of initiating instruction execution in the processor's functional units.

An instruction buffer (instruction window) is used to store instructions that are ready for execution. After the processor has finished decoding an instruction, it is placed in this buffer. As long as the buffer is not full, the processor can continue to fetch and decode new instructions.

Any instruction in the buffer may be issued out of order if:
(1) the functional unit it needs is available, and
(2) no conflicts or dependencies block the instruction.

Another buffer (the reorder buffer) can be used as temporary storage for results completed out of order; these results are then committed to the register file in program order.
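As a rough model of this policy (a sketch under simplifying assumptions, not the textbook's own design), the Python code below keeps decoded instructions in a window, issues them out of order whenever the required functional unit is free and their source values are ready, and commits results in program order through a reorder-buffer pointer. The unit names, single-cycle execution latency, and the decision to ignore WAW/WAR hazards (they are removed by register renaming, discussed next) are all assumptions.

# A sketch of out-of-order issue from an instruction window with in-order
# commit through a reorder buffer. Unit names, 1-cycle execution, and the
# absence of WAW/WAR checks (assumed handled by renaming) are assumptions.

from dataclasses import dataclass
from typing import List, Set

@dataclass
class Instr:
    name: str
    unit: str        # functional unit required ("alu" or "load")
    dest: str        # register produced
    srcs: Set[str]   # registers consumed

def simulate(program: List[Instr], units=("alu", "load")) -> None:
    window = list(range(len(program)))   # decoded but not yet issued (instruction window)
    done: Set[int] = set()               # instructions whose results are ready
    ready_regs: Set[str] = set()         # register values produced so far
    committed = 0                        # reorder-buffer commit pointer (program order)
    cycle = 0
    while committed < len(program):
        cycle += 1
        free = set(units)
        issued = []
        # issue out of order: any windowed instruction whose unit is free
        # and whose source values are already available
        for i in list(window):
            ins = program[i]
            if ins.unit in free and ins.srcs <= ready_regs:
                free.discard(ins.unit)
                window.remove(i)
                issued.append(i)
                print(f"cycle {cycle}: issue  {ins.name}")
        for i in issued:                 # 1-cycle execution: results visible at end of cycle
            done.add(i)
            ready_regs.add(program[i].dest)
        # commit strictly in program order
        while committed < len(program) and committed in done:
            print(f"cycle {cycle}: commit {program[committed].name}")
            committed += 1

simulate([
    Instr("i1", "load", "r1", set()),
    Instr("i2", "alu",  "r2", {"r1"}),   # must wait for i1's result
    Instr("i3", "alu",  "r3", set()),    # independent: issues ahead of i2
    Instr("i4", "load", "r4", set()),    # waits only for the load unit
])

In the sample run, i3 issues ahead of the stalled i2, but all four instructions still commit in program order.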
+ Register Renaming

Output dependencies and anti-dependencies occur because register contents may not reflect the correct ordering from the program.
Compiler techniques attempt to maximize the use of registers, maximizing the number of storage conflicts if parallel execution is applied.
Register renaming is a technique of duplicating resources (more registers are added). Registers are allocated dynamically by the processor hardware, and they are associated with the values needed by instructions at various points in time. Thus, the same original register reference in several different instructions may refer to different actual registers.

+ Register Renaming - Example
R3: a logical register
R3a: a hardware register allocated dynamically
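A small Python sketch of the mechanism (an illustration following the slide's R3 -> R3a naming convention; the three-instruction sequence reuses the earlier B = 3; A = B + 1; B = 7 example and is otherwise an assumption): every write to a logical register allocates a fresh physical register, and every read uses the most recent mapping, so output and anti-dependencies disappear.

# A sketch of hardware register renaming: each write to a logical register
# gets a fresh physical register (B -> Ba, Bb, ...; cf. R3 -> R3a on the slide),
# and each read uses the most recent mapping. The instruction sequence reuses
# the earlier output-dependency example and is otherwise an assumption.

from string import ascii_lowercase
from typing import Dict, List, Tuple

Instr = Tuple[str, List[str]]  # (logical destination, logical sources)

def rename(program: List[Instr]) -> List[Tuple[str, List[str]]]:
    mapping: Dict[str, str] = {}   # logical register -> current physical register
    writes: Dict[str, int] = {}    # how many times each logical register was written
    renamed = []
    for dest, srcs in program:
        phys_srcs = [mapping.get(s, s) for s in srcs]   # reads use the latest mapping
        suffix = ascii_lowercase[writes.get(dest, 0)]   # a, b, c, ...
        phys_dest = dest + suffix                       # allocate a fresh register
        writes[dest] = writes.get(dest, 0) + 1
        mapping[dest] = phys_dest
        renamed.append((phys_dest, phys_srcs))
    return renamed

program = [("B", []), ("A", ["B"]), ("B", [])]   # B = 3; A = B + 1; B = 7
for (d, s), (rd, rs) in zip(program, rename(program)):
    print(f"write {d}, read {s}   ->   write {rd}, read {rs}")

After renaming, the first and third instructions write different physical registers (Ba and Bb), so the output dependency no longer prevents parallel execution; only the true data dependency through Ba remains.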