Lecture 5: Superscalar Processors
Definition and motivation
Superpipeline
Dependency issues
Parallel instruction execution
Superscalar Architecture
A superscalar architecture is designed to improve the performance of scalar instruction execution.
A scalar is a variable that can hold only one atomic value at a time, e.g., an integer or a real.
A scalar architecture processes one data item at a time (the computers we have discussed up to now).
Examples of non-scalar variables:
Arrays
Matrices
Records
Superscalar Architecture (Contd)
In a superscalar architecture (SSA), several
scalar instructions can be initiated simultaneously
and executed independently.
Pipelining also allows several instructions to be executed at the same time, but they must be in different pipeline stages at any given moment.
SSA includes all features of pipelining but,
in addition, there can be several instructions
executing simultaneously in the same pipeline
stage.
SSA therefore introduces a new level of parallelism, called instruction-level parallelism.
Motivation
Most operations are on scalar quantities (about 80%).
Speeding up these operations leads to a large overall performance improvement.
How to implement the idea?
An SSA processor fetches multiple instructions at a time and attempts to find nearby instructions that are independent of each other and can therefore be executed in parallel.
Based on the dependency analysis, the processor may issue
and execute instructions in an order that differs from that
of the original machine code.
The processor may eliminate some unnecessary
dependencies by the use of additional registers and
renaming of register references.
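As a concrete illustration, here is a minimal sketch of such a dependency check over a fetched group of instructions. The Instr records, register names, and greedy grouping are illustrative assumptions, not the issue logic of any particular processor:

from dataclasses import dataclass

@dataclass
class Instr:
    op: str        # mnemonic, e.g. "MUL"
    dest: str      # destination register
    srcs: tuple    # source registers

def depends_on(later: Instr, earlier: Instr) -> str | None:
    """Classify the dependency of `later` on `earlier`, if any."""
    if earlier.dest in later.srcs:
        return "RAW"   # true data dependency: must wait for the result
    if earlier.dest == later.dest:
        return "WAW"   # output dependency: writes to the same register
    if later.dest in earlier.srcs:
        return "WAR"   # anti-dependency: must not overwrite an operand
    return None

def independent_group(window: list[Instr]) -> list[Instr]:
    """Greedily pick instructions with no mutual dependencies, preserving
    program order: these could be issued in the same cycle."""
    group: list[Instr] = []
    for instr in window:
        if all(depends_on(instr, g) is None for g in group):
            group.append(instr)
    return group

window = [
    Instr("MUL", "R4", ("R3", "R1")),
    Instr("ADD", "R2", ("R4", "R5")),   # RAW on R4: cannot join the group
    Instr("SUB", "R6", ("R7", "R8")),   # independent: can issue in parallel
]
print([i.op for i in independent_group(window)])  # ['MUL', 'SUB']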
Superpipelining
Superpipelining is based on dividing the stages of a pipeline
into several sub-stages, and thus increasing the number of
instructions which are handled by the pipeline at the same
time.
For example, by dividing each stage into two sub-stages,
a pipeline can perform at twice the speed in the ideal
situation.
Many pipeline stages may perform tasks that require less than
half a clock cycle.
No duplication of hardware is needed for these stages.
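The ideal-case factor can be made precise with a simple throughput argument; the following sketch assumes perfectly balanced sub-stages and ignores latch overhead:

\[
  \text{sub-stage cycle time} = \frac{T}{n},
  \qquad
  \text{ideal speedup} = \frac{T}{T/n} = n
\]
\[
  n = 2 \ \Rightarrow\ \text{twice the instruction throughput, as claimed above.}
\]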
Superpipelining (Contd)
For a given architecture and the corresponding instruction set, there is an optimal number of pipeline stages/sub-stages.
Increasing the number of stages/sub-stages over this limit
reduces the overall performance.
Overhead of data buffering between the stages.
Not all stages can be divided into (equal-length) sub-stages.
The hazards will be more difficult to resolve.
The clock skew problem.
More complex hardware.
Superscalar vs. Superpipeline
Base machine: 4-stage pipeline
Instruction fetch
Operation decode
Operation execution
Result write back
Superpipeline of degree 2
A sub-stage often takes half a clock cycle to finish.
Superscalar of degree 2
Two instructions are executed concurrently in each pipeline stage.
Duplication of hardware is required by definition.
Superpipelined Superscalar Design
Basic Superscalar Concepts
SSA allows several instructions to be issued and
completed per clock cycle.
It consists of a number of pipelines that are
working in parallel.
Depending on the number and kind of parallel
units available, a certain number of instructions
can be executed in parallel.
In the following example two floating point
and two integer operations can be issued and
executed simultaneously.
Each unit is also pipelined and can execute several
operations in different pipeline stages.
An SSA Example
Lecture 5: Superscalar Processors
Definition and motivation
Superpipeline
Dependency issues
Parallel instruction execution
Parallel Execution Limitation
The situations that prevent instructions from being executed in parallel by an SSA are very similar to those that prevent efficient execution on a pipelined architecture (pipeline hazards):
Resource conflicts.
Control (procedural) dependency.
Data dependencies.
Their consequences for SSA are more severe than those for simple pipelines, because the potential parallelism in SSA is greater and, thus, more performance is lost.
Instruction-level parallelism = the degree to which, on average, the instructions of a program can be executed in parallel.
Resource Conflicts
Several instructions compete for the same
hardware resource at the same time.
e.g., two arithmetic instructions need the same floating-point unit for execution.
similar to structural hazards in a pipeline.
They can be partly solved by introducing several hardware units for the same function.
e.g., have two floating-point units.
the hardware units can also be pipelined to support
several operations at the same time.
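A minimal sketch of this idea, with invented unit names and a one-cycle issue occupancy, might look as follows:

BUSY_UNTIL = {}  # unit name -> cycle at which it becomes free

UNITS = {
    "fadd": ["FP0", "FP1"],      # two floating-point units
    "fmul": ["FP0", "FP1"],
    "add":  ["INT0", "INT1"],    # two integer units
}

def try_issue(op: str, cycle: int) -> str | None:
    """Return a free unit for `op` at `cycle`, or None (structural stall)."""
    for unit in UNITS[op]:
        if BUSY_UNTIL.get(unit, 0) <= cycle:
            # A pipelined unit accepts a new operation every cycle, so it
            # occupies only one issue slot even if the operation itself
            # takes several cycles to complete.
            BUSY_UNTIL[unit] = cycle + 1
            return unit
    return None

# Two FP instructions in the same cycle: each grabs one of the two units.
print(try_issue("fmul", cycle=0))  # FP0
print(try_issue("fadd", cycle=0))  # FP1
print(try_issue("fadd", cycle=0))  # None: must stall one cycle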
Procedural Dependency
The presence of branches creates major problems in achieving optimal parallelism.
cannot execute instructions after a branch in parallel with instructions before the branch.
similar to control hazards in a pipeline.
If instructions are of variable length, they cannot
be fetched and issued in parallel, since an
instruction has to be decoded in order to identify
the following one.
therefore, superscalar techniques are more efficiently
applicable to RISCs, with fixed instruction length and
format.
Data Conflicts
Caused by data dependencies between
instructions in the program.
similar to data hazards in a pipeline.
To address the problem and increase the degree of parallel execution, SSA gives great freedom in the order in which instructions are issued and executed.
Therefore, data dependencies have to be
considered and dealt with much more carefully.
Window of Execution
Due to data dependencies, only some of the instructions are candidates for parallel execution.
In order to find instructions to be issued in parallel, the
processor has to select from a sufficiently large instruction
sequence.
There are usually a lot of data dependencies in a short
instruction sequence.
The window of execution is defined as the set of instructions considered for execution at a certain moment.
The number of instructions in the window should be as
large as possible. However, this is limited by:
Capacity to fetch instructions at a high rate.
The problem of branches.
The cost of hardware needed to analyze data dependencies.
Window of Execution Example
Window of Execution (Contd)
The window of execution can be extended beyond basic-block boundaries by branch prediction.
Speculative execution.
With speculative execution, instructions of the
predicted path are entered into the window of
execution.
Instructions from the predicted path are executed
tentatively.
If the prediction turns out to be correct, the state changes produced by these instructions become permanent and visible (the instructions commit);
Otherwise, all their effects are removed.
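A toy sketch of this commit/squash behaviour, using an invented result buffer rather than any real processor mechanism:

speculative_results = []   # (register, value) pairs, not yet visible
registers = {"R1": 10, "R2": 0}

def execute_speculatively(dest: str, value: int) -> None:
    """Execute an instruction from the predicted path; buffer its result."""
    speculative_results.append((dest, value))

def resolve_branch(prediction_correct: bool) -> None:
    """Commit buffered results if the prediction was right, else squash."""
    global speculative_results
    if prediction_correct:
        for dest, value in speculative_results:
            registers[dest] = value      # state change becomes visible
    speculative_results = []             # squash: effects simply disappear

execute_speculatively("R2", registers["R1"] + 5)
resolve_branch(prediction_correct=True)
print(registers["R2"])  # 15: committed; after a misprediction it stays 0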
Data Dependencies
All instructions in the window of execution may
begin execution, subject to data dependence and
resource constraints.
True Data Dependency
True data dependencies exist when the output of one
instruction is required as an input to a subsequent
instruction:
MUL R4,R3,R1 (R4 := R3 * R1)
. . .
ADD R2,R4,R5 (R2 := R4 + R5)
can fetch and decode the second instruction in parallel with the first.
can NOT execute the second instruction until the first has finished.
They are intrinsic features of the user's program and cannot be eliminated by compiler or hardware techniques.
They have to be detected and handled by hardware.
The addition above cannot be executed before the result of the
multiplication is available.
The simplest solution is to stall the adder until the multiplier
has finished.
To avoid leaving the adder idle, the hardware can find other instructions that the adder can execute in the meantime.
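A back-of-the-envelope sketch of the MUL/ADD pair above; the three-cycle multiplier latency is an assumption for illustration:

MUL_ISSUE, MUL_LATENCY = 0, 3    # MUL R4,R3,R1 issues at cycle 0, takes 3 cycles
r4_ready = MUL_ISSUE + MUL_LATENCY

# ADD R2,R4,R5 can be fetched and decoded early, but its execution must
# wait until R4 is available; no hardware trick can remove this wait.
add_earliest_execute = r4_ready
print(add_earliest_execute)  # 3: cycles 1-2 can be filled with other work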
True Data Dependency Example
Output Dependency
An output dependency exists if two instructions are
writing into the same location.
If the second instruction writes before the first one, an error
occurs:
MUL R4,R3,R1 (R4 := R3 * R1)
. . .
ADD R4,R2,R5 (R4 := R2 + R5)
Anti-dependency
An anti-dependency exists if an instruction uses a
location as an operand while a following one is writing into
that location.
If the first one is still using the location when the second one
writes into it, an error occurs:
MUL R4,R3,R1 (R4 := R3 * R1)
. . .
ADD R3,R2,R5 (R3 := R2 + R5)
Output and Anti-Dependencies
Output dependencies and anti-dependencies are not
intrinsic features of the executed program.
They are not real data dependencies but storage conflicts.
They are due to the competition of several instructions for the
same register.
They are only a consequence of the manner in which the programmer or the compiler uses registers (or memory locations).
In the previous examples the conflicts are produced only
because:
The output dependency: R4 is used by both instructions to
store the result (due to, for example, optimization of register
usage);
The anti-dependency: R3 is used by the second instruction to
store the result.
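Register renaming (see the Summary) removes exactly these storage conflicts; here is a minimal sketch with an invented physical-register free list:

rename_map = {f"R{i}": f"P{i}" for i in range(8)}   # architectural -> physical
free_list = [f"P{i}" for i in range(8, 16)]         # spare physical registers

def rename(dest: str, srcs: list[str]) -> tuple[str, list[str]]:
    """Rename one instruction: sources read the current mapping, the
    destination gets a fresh physical register."""
    phys_srcs = [rename_map[s] for s in srcs]       # read before remapping
    rename_map[dest] = free_list.pop(0)
    return rename_map[dest], phys_srcs

# The anti-dependency example: ADD writes R3 while MUL still reads it.
print(rename("R4", ["R3", "R1"]))   # MUL R4,R3,R1 -> ('P8', ['P3', 'P1'])
print(rename("R3", ["R2", "R5"]))   # ADD R3,R2,R5 -> ('P9', ['P2', 'P5'])
# MUL still reads P3 while ADD writes P9: the WAR conflict has disappeared.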
Output and Anti-Dependencies (Contd)
Effect of Dependencies
Lecture 5: Superscalar Processors
Definition and motivation
Superpipeline
Dependency issues
Parallel instruction execution
Instruction vs Machine Parallelism
Instruction-level parallelism (ILP) - the average
number of instructions in a program that a processor might
be able to execute at the same time.
Mostly determined by the number of true (data) dependencies
and procedural (control) dependencies in relation to the
number of other instructions.
Machine parallelism of a processor - the ability of the
processor to take advantage of the ILP of the program.
Determined by the number of instructions that can be fetched
and executed at the same time, i.e., the capacity of the
hardware.
To achieve high performance, we need both ILP and
machine parallelism.
The ideal situation is when the ILP of the program matches the machine parallelism.
Division and Decoupling
To increase ILP, we should divide instruction execution into smaller tasks and decouple them. In particular, there are three important activities:
SSA Instruction Execution Policies
Instructions can be executed in an order
different from the strictly sequential one, with
the requirement that the results must be the
same.
In-Order Issue with In-Order Completion
IOI with IOC Example
I1 needs two execute cycles (floating-point)
I2 and I3 have no special constraints
I4 needs the same function unit as I3
I5 needs data value produced by I4
I6 needs the same function unit as I5
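The schedule that these constraints force can be replayed with a small simulation. The unit names, latencies, and two-per-cycle issue width below are illustrative assumptions, not a description of a specific processor:

# (name, functional unit, execute cycles, index of producer it waits for)
PROGRAM = [
    ("I1", "FP", 2, None),   # floating point: two execute cycles
    ("I2", "U1", 1, None),
    ("I3", "U2", 1, None),
    ("I4", "U2", 1, None),   # same functional unit as I3
    ("I5", "U3", 1, 3),      # needs the data value produced by I4
    ("I6", "U3", 1, None),   # same functional unit as I5
]

finish = [None] * len(PROGRAM)   # cycle in which execution finishes
unit_free = {}                   # unit -> first cycle it is free again
cycle, i = 0, 0
while i < len(PROGRAM):
    issued = 0
    while i < len(PROGRAM) and issued < 2:   # issue width 2, strictly in order
        name, unit, latency, dep = PROGRAM[i]
        if dep is not None and (finish[dep] is None or finish[dep] > cycle):
            break                            # true dependency: stall
        if unit_free.get(unit, 0) > cycle:
            break                            # resource conflict: stall
        finish[i] = cycle + latency
        unit_free[unit] = cycle + latency
        issued += 1
        i += 1
    cycle += 1

# In-order completion additionally forces write-backs into program order
# (write-port limits are ignored in this sketch):
writeback, prev = [], 0
for f in finish:
    prev = max(prev, f + 1)
    writeback.append(prev)

for (name, *_), f, w in zip(PROGRAM, finish, writeback):
    print(f"{name}: executes until cycle {f}, writes back in cycle {w}")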
IOI with IOC Discussion
The processor detects and handles (by stalling)
true data dependencies and resource conflicts.
The basic idea of SSA is not to rely on compiler-based techniques (a compatibility consideration).
SSA allows the hardware alone to detect instructions that can be executed in parallel and to execute them accordingly.
IOI with IOC is not very efficient, but it simplifies
the hardware.
In-Order Issue w. Out-of-Order Completion
Out-of-Order Issue w. Out-of-Order Completion
OOI with OOC Example
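For contrast with the in-order example, a minimal sketch of out-of-order issue: instructions still enter the window in program order, but any instruction whose operands are ready and whose unit is free may issue, and results complete out of order. The same illustrative assumptions as before apply:

PROGRAM = [
    ("I1", "FP", 2, None),
    ("I2", "U1", 1, None),
    ("I3", "U2", 1, None),
    ("I4", "U2", 1, None),   # conflicts with I3 for U2
    ("I5", "U3", 1, 3),      # waits for I4's result
    ("I6", "U3", 1, None),   # conflicts with I5 for U3, but can issue first
]

finish = [None] * len(PROGRAM)
unit_free = {}
cycle = 0
while any(f is None for f in finish):
    issued = 0
    for i, (name, unit, latency, dep) in enumerate(PROGRAM):
        if finish[i] is not None or issued == 2:
            continue                         # already issued / width limit
        if dep is not None and (finish[dep] is None or finish[dep] > cycle):
            continue                         # operands not ready: skip, don't stall
        if unit_free.get(unit, 0) > cycle:
            continue                         # unit busy: skip, don't stall
        finish[i] = cycle + latency
        unit_free[unit] = cycle + latency
        issued += 1
    cycle += 1

for (name, *_), f in zip(PROGRAM, finish):
    print(name, "finishes in cycle", f)      # I6 finishes before I4 and I5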
Speedup w/o Procedural Dependencies
Summary
The following techniques are the main features of superscalar processors:
Several pipelined units which are working in parallel;
Out-of-order issue and out-of-order completion;
Register renaming.
All of the above techniques aim to enhance performance.
Experiments have shown:
Adding extra functional units alone is not very efficient;
Out-of-order issue is extremely important, since it allows the processor to look ahead for independent instructions;
Register renaming can improve performance by more than 30%; in this case performance is limited only by true dependencies.
It is important to provide enough fetching/decoding capacity so that the window of execution is sufficiently large.