Chapter 2, Part 3

The document discusses multiple issue scheduling in processors, including static and dynamic methods, as well as speculation for executing instructions simultaneously. It covers hardware multithreading techniques, including fine-grained and simultaneous multithreading (SMT), and explores Flynn's Taxonomy of parallel computing architectures such as SIMD and MIMD. Additionally, it addresses shared and distributed memory systems, interconnection networks, and the importance of bisection width in communication performance.


Multiple Issue (2)

• Static multiple issue – functional units are scheduled at compile time.

• Dynamic multiple issue – functional units are scheduled at run time.



Speculation (1)
• In order to make use of multiple issue, the system must find instructions that can be executed simultaneously.

• In speculation, the compiler or the processor makes a guess about an instruction, and then executes the instruction on the basis of the guess.



Speculation (2)

z = x + y;
if (z > 0)      /* speculation: z will be positive */
    w = x;
else
    w = y;

If the system speculates incorrectly, it must go back and recalculate w = y.



Hardware multithreading (1)
• There aren't always good opportunities for simultaneous execution of different threads.

• Hardware multithreading provides a means for systems to continue doing useful work when the task currently being executed has stalled.



Hardware multithreading (2)
• Fine-grained – the processor switches between threads after each instruction, skipping threads that are stalled.

  ◦ Pros: potential to avoid wasted machine time due to stalls.
  ◦ Cons: a thread that's ready to execute a long sequence of instructions may have to wait to execute every instruction.



Hardware multithreading (3)
• Simultaneous multithreading (SMT) – a variation on fine-grained multithreading.

• Allows multiple threads to make use of the multiple functional units.




PARALLEL HARDWARE



Flynn’s Taxonomy
• SISD – single instruction stream, single data stream (the classic von Neumann machine).
• SIMD – single instruction stream, multiple data streams.
• MISD – multiple instruction streams, single data stream (not covered).
• MIMD – multiple instruction streams, multiple data streams.



SIMD
• Parallelism achieved by dividing data among the processors.

• Applies the same instruction to multiple data items.

• Called data parallelism.



SIMD example

[Figure: a single control unit drives n ALUs (ALU1, ALU2, …, ALUn); the n data items x[1], x[2], …, x[n] are distributed one per ALU.]

for (i = 0; i < n; i++)
    x[i] += y[i];



SIMD
• What if we don't have as many ALUs as data items?
• Divide the work and process iteratively (a small C sketch follows the table below).
• Ex. m = 4 ALUs and n = 15 data items.

Round   ALU1    ALU2    ALU3    ALU4
  1     X[0]    X[1]    X[2]    X[3]
  2     X[4]    X[5]    X[6]    X[7]
  3     X[8]    X[9]    X[10]   X[11]
  4     X[12]   X[13]   X[14]
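A minimal C sketch of this round-by-round grouping, written as an ordinary sequential loop just to make the grouping explicit (the macros M and N stand for the m ALUs and n data items from the table; on a real SIMD machine each inner group would execute as a single instruction on the m ALUs):

#include <stdio.h>

#define M 4     /* number of ALUs in the example        */
#define N 15    /* number of data items in the example  */

int main(void) {
    double x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = i; y[i] = 1.0; }

    /* Process the items in rounds of M; round r handles
       x[r*M] .. x[r*M + M - 1] (the last round is partial). */
    for (int start = 0; start < N; start += M)
        for (int i = start; i < start + M && i < N; i++)
            x[i] += y[i];          /* same operation on every item */

    printf("x[%d] = %g\n", N - 1, x[N - 1]);   /* prints x[14] = 15 */
    return 0;
}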



Graphics Processing Units (GPU)
• Real-time graphics application programming interfaces (APIs) use points, lines, and triangles to internally represent the surface of an object.



GPUs
• A graphics processing pipeline converts the internal representation into an array of pixels that can be sent to a computer screen.

• Several stages of this pipeline (called shader functions) are programmable.
  ◦ Typically just a few lines of C code (a rough sketch follows).
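Not actual shader-language code, but a rough C sketch of the idea (the pixel type and the brighten function are invented for illustration): a small function is written once and is conceptually applied to every element of the stream, so the hardware is free to run many copies at once.

typedef struct { float r, g, b; } pixel;

/* A "shader-like" function: just a few lines of C applied per element. */
static pixel brighten(pixel p) {
    p.r *= 1.2f;  p.g *= 1.2f;  p.b *= 1.2f;
    return p;
}

/* Conceptually, the pipeline maps the function over the whole stream;
   each call is independent, so the calls can run in parallel.         */
void apply_shader(pixel *frame, int n) {
    for (int i = 0; i < n; i++)
        frame[i] = brighten(frame[i]);
}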



GPUs
• Shader functions are also implicitly parallel, since they can be applied to multiple elements in the graphics stream.

• GPUs can often optimize performance by using SIMD parallelism.
  ◦ The current generation of GPUs uses SIMD parallelism, although they are not pure SIMD systems.



MIMD
• Supports multiple simultaneous instruction streams operating on multiple data streams.

• Typically consists of a collection of fully independent processing units or cores, each of which has its own control unit and its own ALU.



Shared Memory System (1)
• A collection of autonomous processors is connected to a memory system via an interconnection network.
• Each processor can access each memory location.
• The processors usually communicate implicitly by accessing shared data structures (a minimal sketch follows).
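A minimal Pthreads sketch (an illustrative assumption, not from the slides) of such implicit communication: two threads update the same shared variable, and the only "communication" between them is the value left behind in shared memory. Compile with -pthread.

#include <pthread.h>
#include <stdio.h>

int shared_sum = 0;                               /* shared data structure */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    int my_part = *(int *)arg;
    pthread_mutex_lock(&lock);                    /* communicate implicitly */
    shared_sum += my_part;                        /* through shared memory  */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    int a = 3, b = 4;
    pthread_create(&t0, NULL, worker, &a);
    pthread_create(&t1, NULL, worker, &b);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    printf("shared_sum = %d\n", shared_sum);      /* prints 7 */
    return 0;
}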



Shared Memory System (2)
• Most widely available shared memory systems use one or more multicore processors.



Shared Memory System

Figure 2.3



Distributed Memory System
• Clusters (most popular)
  ◦ A collection of commodity systems.
  ◦ Connected by a commodity interconnection network.

• Nodes of a cluster are individual computation units joined by a communication network (a minimal message-passing sketch follows).
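Since the nodes have no common address space, they communicate explicitly by sending messages over the network. A minimal MPI sketch (MPI is one common way to program a cluster; this example is an illustration, not from the slides) in which the process with rank 1 sends a value to the process with rank 0. It can typically be compiled with mpicc and run with mpiexec -n 2.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {                       /* node 1 sends a message */
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0) {                /* node 0 receives it     */
        MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}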



Distributed Memory System

Figure 2.4



Interconnection networks
• Affects performance of both distributed and shared memory systems.

• Two categories:
  ◦ Shared memory interconnects
  ◦ Distributed memory interconnects



Shared memory interconnects
• Bus interconnect
  ◦ A collection of parallel communication wires together with some hardware that controls access to the bus.
  ◦ The communication wires are shared by the devices that are connected to the bus.
  ◦ As the number of devices connected to the bus increases, contention for use of the bus increases, and performance decreases.



Shared memory interconnects
• Switched interconnect
  ◦ Uses switches to control the routing of data among the connected devices.

  ◦ Crossbar –
    - Allows simultaneous communication among different devices.
    - Faster than buses.
    - But the cost of the switches and links is relatively high.



Distributed memory interconnects
• Two groups
  ◦ Direct interconnect
    - Each switch is directly connected to a processor-memory pair, and the switches are connected to each other.
  ◦ Indirect interconnect
    - Switches may not be directly connected to a processor.



Bisection width
• A measure of "number of simultaneous communications" or "connectivity".

• How many simultaneous communications can take place "across the divide" between the halves? (A worked example follows.)
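For example, in a ring of nodes (Figure 2.9), cutting the network into two halves severs only two links, so at most two communications can cross the divide at the same time: the bisection width of a ring is 2.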



Two bisections of a ring

Figure 2.9



Fully connected network
• Each switch is directly connected to every other switch (see the worked example below).
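Worked example: split the p switches into two halves of p/2 each. Every switch in one half has a direct link to every switch in the other half, so (p/2) × (p/2) = p²/4 links cross the divide; the bisection width of a fully connected network is therefore p²/4 (for example, 16 when p = 8).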

Figure 2.11

