CH16 ParallelismSuperScalar 22 Slides

This chapter discusses instruction-level parallelism and superscalar processors, focusing on their design issues, differences from superpipelined approaches, and limitations such as dependencies and resource conflicts. It covers key concepts like instruction issue policies, register renaming, and branch prediction, as well as techniques to enhance performance in superscalar architectures. The chapter concludes with an overview of the essential elements of superscalar processor organization.


+
Chapter 16
Instruction-Level Parallelism and Superscalar Processors
William Stallings, Computer Organization and Architecture, 9th Edition
+ Objectives
Parallel execution → High performance
After studying this chapter, you should be able to:
• Explain the difference between superscalar and superpipelined approaches.
• Define instruction-level parallelism.
• Discuss dependencies and resource conflicts as limitations to instruction-level parallelism.
• Present an overview of the design issues involved in instruction-level parallelism.
• Compare and contrast techniques for improving pipeline performance in RISC machines and superscalar machines.
+
Contents
• 16.1 Overview
• 16.2 Design Issues
16.1 Overview
Superscalar: refers to a machine that is designed to improve the performance of the execution of scalar instructions.
• Term first coined in 1987
• In most applications the bulk of the operations are on scalar quantities
• Represents the next step in the evolution of high-performance general-purpose processors
• Essence of the approach is the ability to execute instructions independently and concurrently in different pipelines
• Concept can be further exploited by allowing instructions to be executed in an order different from the program order
• Compare: some results are shown on the following slides
+
Comparison of Superscalar and Superpipeline Approaches
+
Constraints
• Instruction-level parallelism
  • Refers to the degree to which the instructions of a program can be executed in parallel
  • A combination of compiler-based optimization and hardware techniques can be used to maximize instruction-level parallelism
• Limitations (situations in which parallel execution cannot be used):
  • True data dependency: the input of the next instruction is the output of the previous instruction
  • Procedural dependency: the previous instruction is a branch, so the code at the branch target can affect the input of the next instruction
  • Resource conflicts: two instructions access the same resource (bus, registers, ...)
  • Output dependency (write-after-write): two instructions write values to the same output location
  • Antidependency (write-after-read): a later instruction writes a location that an earlier instruction still has to read
+
Constraints - Examples
Example 1 (true data dependency):
1. A = 3
2. B = A
3. C = B
The order of these instructions cannot be changed → they cannot be parallelized.
MOV EAX, eff   ; copy variable eff to the register EAX
MOV EBX, EAX   ; copy EAX to EBX → data dependency
Example 2 (output dependency):
1. B = 3
2. A = B + 1
3. B = 7
Instructions 1 and 3 cannot be parallelized because both write B: write-after-write (WAW) → output dependency.
Example 3 (antidependency):
1. B = 3
2. A = B + 1
3. B = 7
Instruction 2 reads B before instruction 3 writes it, so instruction 3 is write-after-read (WAR) with respect to instruction 2 → antidependency; their order cannot be changed, so they cannot be parallelized.
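The same classification can be expressed mechanically. The following C sketch (illustrative only, not taken from the slides; the instr_t type, the hazard_between helper, and the register numbering are assumptions made for the example) checks a pair of instructions from the B = 3 / A = B + 1 / B = 7 sequence above and reports which kind of hazard constrains their ordering.

#include <stdio.h>

/* One decoded instruction: a destination register and two sources.
   A register number of -1 means "unused".                           */
typedef struct {
    int dest;
    int src1;
    int src2;
} instr_t;

/* Classify the hazard that constrains 'later' with respect to 'earlier'. */
const char *hazard_between(instr_t earlier, instr_t later)
{
    /* RAW (true data dependency): later reads what earlier writes. */
    if (earlier.dest != -1 &&
        (later.src1 == earlier.dest || later.src2 == earlier.dest))
        return "RAW (true data dependency)";

    /* WAW (output dependency): both write the same register. */
    if (earlier.dest != -1 && later.dest == earlier.dest)
        return "WAW (output dependency)";

    /* WAR (antidependency): later writes what earlier still reads. */
    if (later.dest != -1 &&
        (earlier.src1 == later.dest || earlier.src2 == later.dest))
        return "WAR (antidependency)";

    return "none";
}

int main(void)
{
    instr_t i1 = { /*dest*/ 1, /*src*/ -1, -1 };  /* B = 3      (writes B, kept in r1) */
    instr_t i2 = { /*dest*/ 2, /*src*/  1, -1 };  /* A = B + 1  (reads r1)             */
    instr_t i3 = { /*dest*/ 1, /*src*/ -1, -1 };  /* B = 7      (writes r1 again)      */

    printf("i1 -> i2: %s\n", hazard_between(i1, i2));  /* RAW */
    printf("i1 -> i3: %s\n", hazard_between(i1, i3));  /* WAW */
    printf("i2 -> i3: %s\n", hazard_between(i2, i3));  /* WAR */
    return 0;
}

Real issue logic performs the equivalent comparison for every pair of in-flight instructions each cycle.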
+
Effect of Dependencies
• No dependency: i1 and i2 are executed concurrently
• Data dependency: the input of i2 depends on i1 → i2 waits
• Procedural dependency: i2 must wait due to a branch
• Resource conflict: i2 waits for resources which are being accessed by i1
+
Design Issues
Instruction-Level Parallelism and Machine Parallelism
• Instruction-level parallelism
  • Instructions in a sequence are independent
  • Execution can be overlapped
  • Governed by data and procedural dependencies
• Machine parallelism
  • Ability to take advantage of instruction-level parallelism
  • Governed by the number of parallel pipelines
+ Instruction Issue Policy
• Instruction issue
  • Refers to the process of initiating instruction execution in the processor's functional units
• Instruction issue policy
  • Refers to the protocol used to issue instructions
  • Instruction issue occurs when an instruction moves from the decode stage of the pipeline to the first execute stage of the pipeline
• Three types of orderings are important:
  • The order in which instructions are fetched
  • The order in which instructions are executed
  • The order in which instructions update the contents of registers and memory locations
• Superscalar instruction issue policies can be grouped into the following categories:
  • In-order issue with in-order completion
  • In-order issue with out-of-order completion
  • Out-of-order issue with out-of-order completion
+
Superscalar Instruction Issue and Completion Policies
Organization for Out-of-Order Issue with Out-of-Order Completion
An instruction buffer (the instruction window) is used to store instructions that are ready for execution. After the processor has finished decoding an instruction, it is placed in the window. As long as this buffer is not full, the processor can continue to fetch and decode new instructions.
Any instruction in the window may be issued out of order provided that
(1) the particular functional unit it needs is available, and
(2) no conflicts or dependencies block the instruction.
Another buffer (the reorder buffer) can be used as temporary storage for results completed out of order; the results are then committed to the register file in program order.
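A minimal single-cycle sketch of this issue rule is shown below (it is not the organization diagram the slide refers to; the window contents, the blocked() helper, and the two-functional-unit configuration are invented for illustration). Each candidate in the window is issued only if its functional unit is free and no earlier, uncompleted instruction creates a RAW, WAW, or WAR conflict with it.

#include <stdio.h>
#include <stdbool.h>

typedef enum { FU_ALU, FU_LOADSTORE } fu_t;

typedef struct {
    const char *name;
    fu_t unit;            /* functional unit required            */
    int dest, src1, src2; /* register numbers, -1 means "unused" */
    bool done;            /* has this instruction completed?     */
} instr_t;

/* True if window entry i is blocked by any earlier, not-yet-completed
   instruction (RAW, WAW, or WAR conflict).                             */
static bool blocked(instr_t *in, int i)
{
    for (int j = 0; j < i; j++) {
        if (in[j].done) continue;
        int d = in[j].dest;
        if (d != -1 && (in[i].src1 == d || in[i].src2 == d)) return true; /* RAW */
        if (d != -1 && in[i].dest == d)                      return true; /* WAW */
        if (in[i].dest != -1 &&
            (in[j].src1 == in[i].dest || in[j].src2 == in[i].dest))
            return true;                                                  /* WAR */
    }
    return false;
}

int main(void)
{
    /* A 4-entry instruction window (contents chosen only for illustration). */
    instr_t window[] = {
        { "A1: r3 = r3 op r5", FU_ALU,       3,  3,  5, false },
        { "A2: r4 = r3 + 1",   FU_ALU,       4,  3, -1, false },
        { "A3: r3 = r5 + 1",   FU_ALU,       3,  5, -1, false },
        { "A4: r7 = load(r6)", FU_LOADSTORE, 7,  6, -1, false },
    };
    int n = sizeof window / sizeof window[0];
    bool unit_busy[2] = { false, false };   /* one ALU, one load/store unit */

    /* One issue cycle: pick any ready instruction, regardless of program order. */
    for (int i = 0; i < n; i++) {
        if (window[i].done) continue;
        if (unit_busy[window[i].unit]) continue;   /* (1) unit must be free      */
        if (blocked(window, i)) continue;          /* (2) no blocking dependency */
        unit_busy[window[i].unit] = true;
        printf("issue %s\n", window[i].name);
    }
    return 0;
}

In this sketch A4 is issued ahead of A2 and A3, which is exactly the out-of-order behavior the window makes possible; a reorder buffer would then put the completed results back into program order.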
Register Renaming
• Output dependencies and antidependencies occur because register contents may not reflect the correct ordering from the program
• They may result in a pipeline stall
• Compiler register-allocation techniques attempt to maximize the use of registers → maximizing the number of storage conflicts if parallel execution is applied
• Register renaming is a technique of duplicating resources (more registers are added). Registers are allocated dynamically by the processor hardware, and they are associated with the values needed by instructions at various points in time. Thus, the same original register reference in several different instructions may refer to different actual hardware registers.
Register Renaming - Example
R3: a logical register
R3a: a hardware register allocated dynamically
When a new allocation is made for a particular logical register, subsequent instruction references to that logical register as a source operand are made to refer to the most recently allocated hardware register (recent in terms of the program sequence of instructions). In this example, the creation of register R3c in instruction I3 avoids the WAR dependency on the second instruction and the WAW dependency on the first instruction, and it does not interfere with the correct value being accessed by I4. The result is that I3 can be issued immediately; without renaming, I3 cannot be issued until the first instruction is complete and the second instruction is issued.
Machine Parallelism
Three hardware techniques that can be used in a superscalar processor to enhance performance:
(1) Duplication of resources
(2) Out-of-order issue
(3) Register renaming
Figure 16.6 (next slide) shows the mean speedup of the superscalar machine over the scalar machine (without procedural dependencies) for four organizations:
• base: the processor organization does not duplicate any of the functional units, but it can issue instructions out of order
• +ld/st: duplicates the load/store functional unit that accesses a data cache
• +alu: duplicates the ALU
• +both: duplicates both the load/store unit and the ALU
Speedups of Various Machine Organizations Without Procedural Dependencies
+ Branch Prediction
• Any high-performance pipelined machine must address the issue of dealing with branches
• The Intel 80486 addressed the problem by fetching both the next sequential instruction after a branch and speculatively fetching the branch target instruction
• RISC machines:
  • The delayed branch strategy was explored
  • The processor always executes the single instruction that immediately follows the branch
  • This keeps the pipeline full while the processor fetches a new instruction stream
• Superscalar machines:
  • The delayed branch strategy has less appeal (it is not required): multiple instructions would need to execute in the delay slot, and instruction dependencies become the major concern
  • Have returned to pre-RISC techniques of branch prediction
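The slide does not name a particular prediction scheme; as one common dynamic technique, the C sketch below implements a table of 2-bit saturating counters indexed by the branch address (the table size, the index function, and the sample loop branch are assumptions made for the example).

#include <stdio.h>
#include <stdint.h>

#define TABLE_BITS 10
#define TABLE_SIZE (1u << TABLE_BITS)

/* One 2-bit saturating counter per table entry:
   0,1 = predict not taken; 2,3 = predict taken.  */
static uint8_t counters[TABLE_SIZE];   /* zero-initialized: strongly not taken */

static unsigned index_of(uint32_t pc) { return (pc >> 2) & (TABLE_SIZE - 1); }

static int predict(uint32_t pc) { return counters[index_of(pc)] >= 2; }

static void update(uint32_t pc, int taken)
{
    uint8_t *c = &counters[index_of(pc)];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
}

int main(void)
{
    /* A branch at a hypothetical address 0x1000 that is taken 9 times,
       then not taken once - the pattern of a simple loop.               */
    uint32_t pc = 0x1000;
    int correct = 0, total = 0;

    for (int iter = 0; iter < 10; iter++) {
        int outcome = (iter < 9);             /* taken for 9 iterations */
        if (predict(pc) == outcome) correct++;
        update(pc, outcome);
        total++;
    }
    printf("correct predictions: %d / %d\n", correct, total);
    return 0;
}

For the loop-like branch in the sketch, the counter saturates at "strongly taken" after two iterations, so it mispredicts only during warm-up and on the final not-taken iteration.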
Superscalar Execution
+
Superscalar Implementation
• Key elements:
  • Instruction fetch strategies that simultaneously fetch multiple instructions
  • Logic for determining true dependencies involving register values, and mechanisms for communicating these values to where they are needed during execution
  • Mechanisms for initiating, or issuing, multiple instructions in parallel
  • Resources for parallel execution of multiple instructions, including multiple pipelined functional units and memory hierarchies capable of simultaneously servicing multiple memory references
  • Mechanisms for committing the process state in the correct order
+
Exercises
16.1 What is the essential characteristic of the superscalar approach to processor design?
16.2 What is the difference between the superscalar and superpipelined approaches?
16.3 What is instruction-level parallelism?
16.4 Briefly define the following terms: true data dependency, procedural dependency, resource conflicts, output dependency, antidependency.
16.5 What is the distinction between instruction-level parallelism and machine parallelism?
16.6 List and briefly define three types of superscalar instruction issue policies.
16.7 What is the purpose of an instruction window?
16.8 What is register renaming and what is its purpose?
16.9 What are the key elements of a superscalar processor organization?
+ Summary
Chapter 16: Instruction-Level Parallelism and Superscalar Processors
• Superscalar versus superpipelined
• Design issues
  • Instruction-level parallelism
  • Machine parallelism
  • Instruction issue policy
  • Register renaming
  • Branch prediction
  • Superscalar execution
  • Superscalar implementation
