
Slides taken from

Parallel Computing Platforms

Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar

To accompany the text "Introduction to Parallel Computing", Addison Wesley, 2003.

Limitations of
Memory System Performance
•  Memory system, and not processor speed, is often the
bottleneck for many applications.
•  Memory system performance is largely captured by two
parameters, latency and bandwidth.
•  Latency is the time from the issue of a memory request
to the time the data is available at the processor.
•  Bandwidth is the rate at which data can be pumped to
the processor by the memory system.
Memory Latency: An Example

•  Consider a processor operating at 1 GHz (1 ns clock) connected to a DRAM with a latency of 100 ns (no caches). Assume that the processor has two multiply-add units and is capable of executing four instructions in each cycle of 1 ns. The following observations follow:
–  The peak processor rating is 4 GFLOPS.
–  Since the memory latency is equal to 100 cycles and the block size is one word, every time a memory request is made, the processor must wait 100 cycles before it can process the data.

Memory Latency: An Example

•  On the above architecture, consider the problem of computing a dot-product of two vectors.
–  A dot-product computation performs one multiply-add on a single pair of vector elements, i.e., each floating point operation requires one data fetch.
–  It follows that the peak speed of this computation is limited to one floating point operation every 100 ns, or a speed of 10 MFLOPS, a very small fraction of the peak processor rating!
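
A minimal C sketch of the dot-product loop from this example (the code itself is not on the slides); the comments restate the latency arithmetic under the stated assumptions of a 1 GHz processor, 100 ns DRAM latency, and no cache.

/* Dot product on the hypothetical 1 GHz / 100 ns-latency machine above. */
double dot(const double *a, const double *b, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        /* Two loads (roughly 200 ns of memory latency) feed one
         * multiply-add (two floating point operations), so the loop
         * sustains about one flop per 100 ns: ~10 MFLOPS against the
         * 4 GFLOPS peak rating. */
        s += a[i] * b[i];
    }
    return s;
}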
Improving Effective Memory
Latency Using Caches
•  Caches are small and fast memory elements between
the processor and DRAM.
•  This memory acts as a low-latency high-bandwidth
storage.
•  If a piece of data is repeatedly used, the effective latency
of this memory system can be reduced by the cache.
•  The fraction of data references satisfied by the cache is
called the cache hit ratio of the computation on the
system.
•  Cache hit ratio achieved by a code on a memory system
often determines its performance.
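
The hit ratio's effect on latency can be made concrete with a simple weighted-average model; the sketch below is an illustrative assumption (the slides only define the hit ratio), with the cache and DRAM latencies as parameters.

/* Effective access latency for a two-level hierarchy:
 * hit_ratio of references are served by the cache, the rest by DRAM. */
double effective_latency_ns(double hit_ratio, double cache_ns, double dram_ns)
{
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * dram_ns;
}

/* Example: hit_ratio = 0.9, cache_ns = 1, dram_ns = 100
 * gives 0.9 * 1 + 0.1 * 100 = 10.9 ns effective latency. */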

Explicitly Parallel Platforms


Dichotomy of Parallel Computing
Platforms
•  An explicitly parallel program must specify concurrency
and interaction between concurrent subtasks.
•  The former is sometimes also referred to as the control
structure and the latter as the communication model.

Control Structure of Parallel Programs

•  Parallelism can be expressed at various levels of granularity - from instruction level to processes.
•  Between these extremes exist a range of models, along with corresponding architectural support.
Control Structure of Parallel Programs

•  Processing units in parallel computers either operate under the centralized control of a single control unit or work independently.
•  If there is a single control unit that dispatches the same instruction to various processors (that work on different data), the model is referred to as single instruction stream, multiple data stream (SIMD).
•  If each processor has its own control unit, each processor can execute different instructions on different data items. This model is called multiple instruction stream, multiple data stream (MIMD).
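
To make the SIMD/MIMD contrast concrete, here is a hedged C/OpenMP sketch (not from the slides): the first function has a single instruction stream applied to many data elements, while the second gives each thread its own instruction stream.

#include <omp.h>

/* SIMD style: one control flow; the vector lanes all execute the same
 * add on different data elements. */
void simd_style(float *a, const float *b, int n)
{
    #pragma omp simd
    for (int i = 0; i < n; i++)
        a[i] = a[i] + b[i];
}

/* MIMD style: each thread has its own program counter and may run
 * entirely different code on different data. */
void mimd_style(float *a, float *b, int n)
{
    #pragma omp parallel num_threads(2)
    {
        if (omp_get_thread_num() == 0) {
            for (int i = 0; i < n; i++) a[i] *= 2.0f;   /* instruction stream 1 */
        } else {
            for (int i = 0; i < n; i++) b[i] += 1.0f;   /* instruction stream 2 */
        }
    }
}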

SIMD and MIMD Processors

[Figure: A typical SIMD architecture (a) and a typical MIMD architecture (b). PE: Processing Element. In (a) a single global control unit drives all PEs through an interconnection network; in (b) each PE is paired with its own control unit.]


SIMD Processors
•  Some of the earliest parallel computers such as the
Illiac IV, MPP, DAP, CM-2, and MasPar MP-1 belonged
to this class of machines.
•  Variants of this concept have found use in co-processing
units such as the MMX units in Intel processors and DSP
chips such as the Sharc.
•  SIMD relies on the regular structure of computations
(such as those in image processing).
•  It is often necessary to selectively turn off operations on
certain data items. For this reason, most SIMD
programming paradigms allow for an "activity mask",
which determines if a processor should participate in a
computation or not.
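
The activity-mask idea can be illustrated with a small hypothetical sketch (not from the slides): for a data-dependent branch, every processing element evaluates both sides, and the per-element mask selects which result is kept.

/* Scalar emulation of masked SIMD execution of
 *   if (x[i] > 0) x[i] = 2*x[i]; else x[i] = -x[i];   */
void masked_update(float *x, int n)
{
    for (int i = 0; i < n; i++) {
        int active = (x[i] > 0.0f);           /* per-element activity mask */
        float then_val = x[i] * 2.0f;         /* all PEs compute the "then" side */
        float else_val = -x[i];               /* ...and the "else" side */
        x[i] = active ? then_val : else_val;  /* the mask selects the result */
    }
}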

Communication Model
of Parallel Platforms
•  There are two primary forms of data exchange between
parallel tasks - accessing a shared data space and
exchanging messages.
•  Platforms that provide a shared data space are called
shared-address-space machines or multiprocessors.
•  Platforms that support messaging are also called
message passing platforms or multicomputers.
Shared-Address-Space Platforms

•  Part (or all) of the memory is accessible to all processors.
•  Processors interact by modifying data objects stored in this shared-address-space.
•  If the time taken by a processor to access any memory word in the system (global or local) is identical, the platform is classified as a uniform memory access (UMA) machine; otherwise, it is a non-uniform memory access (NUMA) machine.

NUMA and UMA Shared-Address-Space Platforms

[Figure: Typical shared-address-space architectures: (a) uniform-memory-access shared-address-space computer; (b) uniform-memory-access shared-address-space computer with caches and memories; (c) non-uniform-memory-access shared-address-space computer with local memory only. Labels in the original figure: processor (P), memory (M), cache (C), interconnection network.]
NUMA and UMA
Shared-Address-Space Platforms
•  The distinction between NUMA and UMA platforms is important from
the point of view of algorithm design. NUMA machines require
locality from underlying algorithms for performance.
•  Programming these platforms is easier since reads and writes are
implicitly visible to other processors.
•  However, read-write accesses to shared data must be coordinated (this
will be discussed in greater detail when we talk about threads
programming).
•  Caches in such machines require coordinated access to multiple
copies. This leads to the cache coherence problem.
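
A minimal pthreads sketch (not from the slides) of the coordination point above: a concurrent read-modify-write of shared data loses updates unless it is protected, here with a mutex.

#include <pthread.h>

long counter = 0;                              /* shared read-write data */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg)
{
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);             /* coordinate the update */
        counter++;                             /* read-modify-write of shared data */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}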

Shared-Address-Space
vs.
Shared Memory Machines

•  It is important to note the difference between the terms shared address space and shared memory.
•  We refer to the former as a programming abstraction and to the latter as a physical machine attribute.
•  It is possible to provide a shared address space using a physically distributed memory.
Message-Passing Platforms

•  These platforms comprise a set of processors, each with its own (exclusive) memory.
•  Instances of such a view come naturally from clustered workstations and non-shared-address-space multicomputers.
•  These platforms are programmed using (variants of) send and receive primitives.
•  Libraries such as MPI provide such primitives.
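
For concreteness, a minimal MPI sketch of the send/receive style of programming (the slides only name the primitives); it assumes the program is launched with at least two ranks, e.g. mpirun -np 2.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* send to rank 1 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                           /* receive from rank 0 */
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}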

Message Passing
vs.
Shared Address Space Platforms

•  Message passing requires little hardware support, other than a network.
•  Shared address space platforms can easily emulate message passing. The reverse is more difficult to do (in an efficient manner).
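
One way to see the emulation claim: a rough sketch (an illustrative assumption, not the text's construction) of blocking send/receive built on top of a shared address space, using a single shared mailbox and a C11 atomic flag.

#include <stdatomic.h>

typedef struct {
    atomic_int full;   /* 0 = empty, 1 = message present */
    int payload;       /* the "message" lives in shared memory */
} mailbox_t;

void send_msg(mailbox_t *m, int value)
{
    while (atomic_load(&m->full)) { }   /* wait until the mailbox is free */
    m->payload = value;
    atomic_store(&m->full, 1);          /* publish the message */
}

int recv_msg(mailbox_t *m)
{
    while (!atomic_load(&m->full)) { }  /* wait for a message to arrive */
    int value = m->payload;
    atomic_store(&m->full, 0);          /* mark the mailbox empty again */
    return value;
}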
Interconnection Networks
for Parallel Computers
•  Interconnection networks carry data between processors
and to memory.
•  Interconnects are made of switches and links (wires,
fiber).
•  Interconnects are classified as static or dynamic.
•  Static networks consist of point-to-point communication
links among processing nodes and are also referred to
as direct networks.
•  Dynamic networks are built using switches and
communication links. Dynamic networks are also
referred to as indirect networks.

Static and Dynamic Interconnection Networks

[Figure: Classification of interconnection networks: (a) a static network; and (b) a dynamic (indirect) network. Labels in the original figure: processing node, network interface/switch, switching element.]
Design of parallel algorithms

Core content of this course:

•  Take memory hierarchy into account (data locality)
•  Distribute data over memories
•  Distribute work over processors
•  Introduce & analyse communication & synchronization

A first hands-on experience: do the exercise!
