L2 Parallel Computing Models

The document discusses parallel computation models including PRAM and distributed models. It covers topics like parallel algorithms, complexity measures, and issues with shared memory models. The slides provide details on different parallel and distributed frameworks as well as interconnection networks and processing activation methods.

Uploaded by

Karthik Laxmikanth


Parallel Computation Models

Lecture 2

Slide 1

Parallel Computation Models

• PRAM (parallel RAM)
• Fixed Interconnection Network
– bus, ring, mesh, hypercube, shuffle-exchange
• Boolean Circuits
• Combinatorial Circuits
• BSP (Bulk Synchronous Parallel)
• LogP (L: upper bound for the latency, o: overhead, g: gap, P: number of processors)

Slide 2
PARALLEL AND DISTRIBUTED COMPUTATION

• MANY INTERCONNECTED PROCESSORS WORKING CONCURRENTLY

[Figure: processors P1, P2, P3, P4, P5, ..., Pn attached to an INTERCONNECTION NETWORK]

• CONNECTION MACHINE
• INTERNET: connects all the computers of the world

Slide 3

TYPES OF MULTIPROCESSING FRAMEWORKS: PARALLEL AND DISTRIBUTED

TECHNICAL ASPECTS
• PARALLEL COMPUTERS (USUALLY) WORK IN TIGHT SYNCHRONY, SHARE MEMORY TO A LARGE EXTENT AND HAVE A VERY FAST AND RELIABLE COMMUNICATION MECHANISM BETWEEN THEM.
• DISTRIBUTED COMPUTERS ARE MORE INDEPENDENT; COMMUNICATION IS LESS FREQUENT AND LESS SYNCHRONOUS, AND COOPERATION IS LIMITED.

PURPOSES
• PARALLEL COMPUTERS COOPERATE TO SOLVE (POSSIBLY) DIFFICULT PROBLEMS MORE EFFICIENTLY.
• DISTRIBUTED COMPUTERS HAVE INDIVIDUAL GOALS AND PRIVATE ACTIVITIES. SOMETIMES COMMUNICATION WITH OTHER COMPUTERS IS NEEDED (E.G. DISTRIBUTED DATABASE OPERATIONS).

PARALLEL COMPUTERS: COOPERATION IN A POSITIVE SENSE

DISTRIBUTED COMPUTERS: COOPERATION IN A NEGATIVE SENSE, ONLY WHEN IT IS NECESSARY

Slide 4
FOR PARALLEL SYSTEMS
WE ARE INTERESTED IN SOLVING ANY PROBLEM IN PARALLEL

FOR DISTRIBUTED SYSTEMS
WE ARE INTERESTED IN SOLVING IN PARALLEL PARTICULAR PROBLEMS ONLY. TYPICAL EXAMPLES ARE:

• COMMUNICATION SERVICES
– ROUTING
– BROADCASTING

• MAINTENANCE OF CONTROL STRUCTURE
– SPANNING TREE CONSTRUCTION
– TOPOLOGY UPDATE
– LEADER ELECTION

• RESOURCE CONTROL ACTIVITIES
– LOAD BALANCING
– MANAGING GLOBAL DIRECTORIES

Slide 5

PARALLEL ALGORITHMS

• WHICH MODEL OF COMPUTATION IS THE BEST TO USE?
• HOW MUCH TIME DO WE EXPECT TO SAVE USING A PARALLEL ALGORITHM?
• HOW DO WE CONSTRUCT EFFICIENT ALGORITHMS?

MANY CONCEPTS OF COMPLEXITY THEORY MUST BE REVISITED

• IS PARALLELISM A SOLUTION FOR HARD PROBLEMS?
• ARE THERE PROBLEMS NOT ADMITTING AN EFFICIENT PARALLEL SOLUTION, THAT IS, INHERENTLY SEQUENTIAL PROBLEMS?

Slide 6
We need a model of computation

• NETWORK (VLSI) MODEL

• The processors are connected by a network of bounded degree.

• No shared memory is available.

• Several interconnection topologies.

• Synchronous way of operating.

MESH CONNECTED ARRAY

degree = 4 (N) diameter = 2N

Slide 7

HYPERCUBE

[Figure: 4-dimensional hypercube whose 16 nodes are labelled with the binary addresses 0000 to 1111]

N = 2^4 PROCESSORS
degree = 4 (log2 N)
diameter = 4

Slide 8
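
The degree and diameter quoted for the hypercube can be checked with a short sketch. It assumes the usual addressing convention, in which two nodes are linked exactly when their binary labels differ in one bit; the function names `neighbours` and `distance` are illustrative and do not come from the slides.

```python
def neighbours(node: int, dim: int) -> list[int]:
    """Neighbours of a node in a dim-dimensional hypercube: flip one address bit."""
    return [node ^ (1 << k) for k in range(dim)]


def distance(a: int, b: int) -> int:
    """Shortest-path length between two nodes = number of differing address bits."""
    return bin(a ^ b).count("1")


dim = 4
n = 1 << dim                                   # N = 2^4 = 16 processors
degree = len(neighbours(0, dim))               # links per node = log2 N
diameter = max(distance(a, b) for a in range(n) for b in range(n))

print(n, degree, diameter)
print([format(v, "04b") for v in neighbours(0b0110, dim)])
```

Under this convention the neighbours of node 0110 are 0111, 0100, 0010 and 1110, and both the degree and the diameter equal log2 N = 4.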
Other important topologies

• binary trees
• mesh of trees
• cube connected cycles

In the network model a PARALLEL MACHINE is a very complex ensemble of small interconnected units, performing elementary operations.
- Each processor has its own memory.
- Processors work synchronously.

LIMITS OF THE MODEL

• different topologies require different algorithms to solve the same problem
• it is difficult to describe and analyse algorithms (the migration of data has to be described)

A shared-memory model is more suitable from an algorithmic point of view

Slide 9

Model Equivalence
• given two models M1 and M2, and a problem Π of size n
• if M1 and M2 are equivalent then solving Π requires:
– $T(n)$ time and $P(n)$ processors on M1
– $T(n)^{O(1)}$ time and $P(n)^{O(1)}$ processors on M2

Slide 10

PRAM
• Parallel Random Access Machine
• Shared-memory multiprocessor
• unlimited number of processors, each
– has unlimited local memory
– knows its ID
– able to access the shared memory
• unlimited shared memory

Slide 11

PRAM MODEL

[Figure: processors P1, P2, ..., Pi, ..., Pn connected to a Common Memory with cells 1, 2, 3, ..., m]

PRAM: n RAM processors connected to a common memory of m cells

ASSUMPTION: at each time unit each Pi can read a memory cell, make an internal computation and write another memory cell.

CONSEQUENCE: any pair of processors Pi, Pj can communicate in constant time!

Pi writes the message in cell x at time t

Pj reads the message in cell x at time t+1

Slide 12

PRAM
• Inputs/Outputs are placed in the shared memory (designated address)
• Memory cell stores an arbitrarily large integer
• Each instruction takes unit time
• Instructions are synchronized across the processors

Slide 13

PRAM Instruction Set

• accumulator architecture
– memory cell R0 accumulates results
• multiply/divide instructions take only constant operands
– prevents generating exponentially large numbers in polynomial time

Slide 14

PRAM Complexity Measures

• for each individual processor
– time: number of instructions executed
– space: number of memory cells accessed

• PRAM machine
– time: time taken by the longest running processor
– hardware: maximum number of active processors

Slide 15

Two Technical Issues for PRAM

• How processors are activated

• How shared memory is accessed

Slide 16

Processor Activation
• P0 places the number of processors (p) in the
designated shared-memory cell
– each active Pi, where i < p, starts executing
– O(1) time to activate
– all processors halt when P0 halts

• Active processors explicitly activate additional processors via FORK instructions
– tree-like activation
– O(log p) time to activate
Slide 17
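
The O(log p) bound for tree-like activation can be illustrated with a small count. The sketch assumes that every active processor executes one FORK per time step and that each FORK wakes exactly one new processor, so the active set doubles each round; the function name `fork_activation_rounds` is hypothetical.

```python
import math


def fork_activation_rounds(p: int) -> int:
    """Rounds until p processors are active, assuming one FORK per active processor per round."""
    active = 1          # only P0 is running at the start
    rounds = 0
    while active < p:
        active = min(p, 2 * active)   # every active processor forks one more
        rounds += 1
    return rounds


for p in (1, 2, 5, 16, 1000):
    # Compare the simulated round count with ceil(log2 p).
    print(p, fork_activation_rounds(p), math.ceil(math.log2(p)))
```

Under these assumptions the number of rounds matches ceil(log2 p), which is the tree depth behind the O(log p) activation time.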
THE PRAM IS A THEORETICAL (INFEASIBLE) MODEL
• The interconnection network between processors and memory would require a very large amount of area.
• Message routing on the interconnection network would require time proportional to the network size (i.e. the assumption of a constant access time to the memory is not realistic).

WHY IS THE PRAM A REFERENCE MODEL?

• Algorithm designers can forget the communication problems and focus their attention on the parallel computation only.
• There exist algorithms simulating any PRAM algorithm on bounded-degree networks.

E.g. a PRAM algorithm requiring time $T(n)$ can be simulated on a mesh of trees in time $T(n) \log^2 n / \log\log n$, that is, each step can be simulated with a slow-down of $\log^2 n / \log\log n$.

• Instead of designing ad hoc algorithms for bounded-degree networks, design more general algorithms for the PRAM model and simulate them on a feasible network.

Slide 18
• For the PRAM model there exists a well-developed body of techniques and methods to handle different classes of computational problems.
• The discussion on parallel models of computation is still HOT

The current trend:

COARSE-GRAINED MODELS
• The degree of parallelism allowed is independent of the number of processors.
• The computation is divided into supersteps, each of which includes
– local computation
– communication phase
– synchronization phase

The study is still at the beginning!

Slide 19
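
The three phases of a superstep can be made concrete with a toy example. The sketch below is a sequential simulation, not a real parallel runtime: the processors are list indices, the barrier is the point where all queued messages are delivered, and the function name `bsp_sum` and the message format are assumptions made for illustration.

```python
def bsp_sum(chunks):
    """Sum the numbers held by p simulated processors using two supersteps."""
    p = len(chunks)
    inbox = [[] for _ in range(p)]   # messages become visible only after the barrier

    # Superstep 1, local computation: every processor sums its own chunk.
    partial = [sum(chunk) for chunk in chunks]
    # Superstep 1, communication: every processor sends its partial sum to processor 0.
    outgoing = [(0, value) for value in partial]
    # Superstep 1, synchronization: at the barrier all messages are delivered together.
    for dest, value in outgoing:
        inbox[dest].append(value)

    # Superstep 2, local computation: processor 0 combines what it received.
    return sum(inbox[0])


print(bsp_sum([[1, 2], [3], [4, 5, 6]]))
```

The point of the structure is that no processor reads a message before the barrier, so the cost of a superstep can be analysed as local work plus communication plus one synchronization.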

Metrics
A measure of relative performance between a multiprocessor system and a single processor system is the speed-up S(p), defined as follows:

$S(p) = \dfrac{\text{Execution time using a single processor system}}{\text{Execution time using a multiprocessor with } p \text{ processors}} = \dfrac{T_1}{T_p}$

$\text{Efficiency} = E_p = \dfrac{S_p}{p}$

$\text{Cost} = C_p = p \times T_p$

Slide 20

Metrics
• Parallel algorithm is cost-optimal: parallel cost = sequential time
$C_p = T_1$
$E_p = 100\%$

• Critical when down-scaling: parallel implementation may become slower than sequential
$T_1 = n^3$
$T_p = n^{2.5}$ when $p = n^2$
$C_p = n^{4.5}$

Slide 21
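
The down-scaling example can be worked through numerically using speed-up = T1 / Tp, efficiency = speed-up / p and cost = p × Tp. The sketch assumes that running the algorithm on q < p processors means each real processor simulates p / q virtual ones, so the running time grows by that factor; this simulation rule, the choice n = 100 and all function names are illustrative assumptions.

```python
import math


def speedup(t1, tp):
    return t1 / tp


def efficiency(t1, tp, p):
    return speedup(t1, tp) / p


def cost(tp, p):
    return p * tp


n = 100                        # a perfect square, so sqrt(n) is an exact integer
t1 = n**3                      # sequential time T1 = n^3
p = n**2                       # number of processors p = n^2
tp = n**2 * math.isqrt(n)      # parallel time Tp = n^2.5 = n^2 * sqrt(n)

print("speed-up  :", speedup(t1, tp))          # grows like n^0.5
print("efficiency:", efficiency(t1, tp, p))    # far below 100%
print("cost      :", cost(tp, p))              # n^4.5, much larger than T1 = n^3

# Down-scaling to q = n processors: each one simulates p / q virtual processors.
q = n
t_scaled = tp * (p // q)
print("scaled time > T1:", t_scaled > t1)
```

Because the cost n^4.5 exceeds the sequential time n^3, the algorithm is not cost-optimal, and with only n processors the simulated parallel version takes n^3.5 steps, slower than the sequential algorithm.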
Amdahl’s Law
• f = fraction of the problem that’s inherently sequential
(1 – f) = fraction that’s parallel
• Parallel time $T_p$ (normalised so that $T_1 = 1$):
$T_p = f + \dfrac{1 - f}{p}$
• Speedup with p processors:
$S_p = \dfrac{1}{f + \dfrac{1 - f}{p}}$

Slide 22
Amdahl’s Law
• Upper bound on speedup (p = ∞): the term $(1 - f)/p$ converges to 0, so
$S_\infty = \dfrac{1}{f}$
• Example: f = 2%
$S_\infty = 1 / 0.02 = 50$

Slide 23
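
A few lines of code show how quickly the speedup saturates. The sketch evaluates Amdahl's formula S_p = 1 / (f + (1 − f) / p) for the slide's example f = 2%; the function name `amdahl_speedup` and the chosen processor counts are illustrative.

```python
def amdahl_speedup(f: float, p: float) -> float:
    """Speedup predicted by Amdahl's law for sequential fraction f on p processors."""
    return 1.0 / (f + (1.0 - f) / p)


f = 0.02
for p in (1, 10, 100, 1000, 10**6):
    print(p, round(amdahl_speedup(f, p), 2))

# Limit for p -> infinity: the parallel term vanishes and only 1 / f remains.
print("upper bound:", 1 / f)
```

Even with a million processors the speedup stays just below the bound 1 / f, so the sequential fraction dominates long before the processor count does.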

PRAM
• Too many interconnections give problems with synchronization
• However it is the best conceptual model for designing efficient parallel algorithms
– due to simplicity and possibility of simulating efficiently PRAM algorithms on more realistic parallel architectures
Slide 24

Shared-Memory Access
Concurrent (C) means that many processors can do the operation simultaneously on the same memory location
Exclusive (E) means not concurrent

• EREW (Exclusive Read Exclusive Write)
• CREW (Concurrent Read Exclusive Write)
– Many processors can read the same location simultaneously, but only one can attempt to write to a given location
• ERCW (Exclusive Read Concurrent Write)
• CRCW (Concurrent Read Concurrent Write)
– Many processors can write/read at/from the same memory location

Slide 25
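
The four access modes can be told apart mechanically by looking at the memory accesses issued in a single PRAM step. The sketch below returns the weakest variant that permits a given step. It is a simplification: it ignores a read and a write that hit the same cell in the same step, and the function name `weakest_model` and the `(op, address)` encoding are assumptions made for illustration.

```python
from collections import Counter


def weakest_model(accesses):
    """accesses: list of (op, address) pairs issued in one step, op is 'R' or 'W'."""
    reads = Counter(addr for op, addr in accesses if op == "R")
    writes = Counter(addr for op, addr in accesses if op == "W")
    concurrent_read = any(count > 1 for count in reads.values())
    concurrent_write = any(count > 1 for count in writes.values())
    # Build the name letter by letter: E/C for reads, then E/C for writes.
    return ("C" if concurrent_read else "E") + "R" + ("C" if concurrent_write else "E") + "W"


print(weakest_model([("R", 0), ("W", 1)]))              # distinct cells only
print(weakest_model([("R", 0), ("R", 0), ("W", 1)]))    # two readers of cell 0
print(weakest_model([("W", 3), ("W", 3)]))              # two writers of cell 3
```

A step that contains both a shared read and a shared write is classified as CRCW, the most permissive of the four.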

Example CRCW-PRAM
• Initially
– table A contains values 0 and 1
– output contains value 0
• The program computes the “Boolean OR” of
A[1], A[2], A[3], A[4], A[5]

Slide 26
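
The program itself is not reproduced in the text, so the following is one possible reconstruction rather than the slide's own code. It assumes one processor per table entry and the "common" CRCW convention, in which simultaneous writes to a cell are allowed when every writer stores the same value; the function name `crcw_or` is hypothetical, and the parallel step is simulated sequentially.

```python
def crcw_or(a):
    """Simulate one CRCW step computing the Boolean OR of the entries of a."""
    output = 0                                   # output initially holds 0
    # All processors act in the same step: processor i writes 1 iff A[i] == 1.
    writes = [1 for value in a if value == 1]
    # Common-CRCW assumption: concurrent writers must agree on the value written.
    assert all(w == 1 for w in writes)
    if writes:
        output = writes[0]
    return output


print(crcw_or([0, 1, 0, 0, 1]))
print(crcw_or([0, 0, 0, 0, 0]))
```

With one processor per entry the OR is obtained in a single parallel step, which is what makes concurrent writes attractive for this problem.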

Example CREW-PRAM
• Assume initially table A contains [0,0,0,0,0,1] and we have the parallel program

Slide 27

Pascal triangle
PRAM CREW

Slide 28
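
Only the title of this slide survives, so the sketch below is an assumption about what it illustrates: building Pascal's triangle row by row on a CREW PRAM. Processor j computes entry j of the new row from two entries of the previous row. Each previous entry is read by two processors (concurrent read), while every processor writes only its own cell (exclusive write). The function name `pascal_rows` is hypothetical and the parallel step is simulated with a list comprehension.

```python
def pascal_rows(n_rows):
    """Return the first n_rows rows of Pascal's triangle, one simulated parallel step per row."""
    rows = [[1]]
    for r in range(1, n_rows):
        prev = rows[-1]                      # row r-1 has r entries
        # One parallel step: processor j reads prev[j-1] and prev[j], writes new[j].
        new = [
            (prev[j - 1] if j > 0 else 0) + (prev[j] if j < r else 0)
            for j in range(r + 1)
        ]
        rows.append(new)
    return rows


for row in pascal_rows(5):
    print(row)
```

Under these assumptions n rows need n − 1 parallel steps, and no step ever has two processors writing the same cell.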
