L2 Parallel Computing Models

The document discusses parallel computation models, including the PRAM and distributed models. It covers parallel algorithms, complexity measures, and issues with shared-memory models. The slides detail different parallel and distributed frameworks as well as interconnection networks and processor-activation methods.

Parallel Computation Models
Lecture 2

Slide 1

Parallel Computation Models

• PRAM (Parallel RAM)
• Fixed Interconnection Network
– bus, ring, mesh, hypercube, shuffle-exchange
• Boolean Circuits
• Combinatorial Circuits
• BSP (Bulk Synchronous Parallel)
• LogP (L: upper bound on latency, o: overhead, g: gap, P: number of processors)

Slide 2
PARALLEL AND DISTRIBUTED COMPUTATION

• MANY INTERCONNECTED PROCESSORS WORKING CONCURRENTLY

[Figure: processors P1, P2, ..., Pn connected through an INTERCONNECTION NETWORK]

• CONNECTION MACHINE

• INTERNET: CONNECTS ALL THE COMPUTERS OF THE WORLD

Slide 3


TYPES OF MULTIPROCESSING FRAMEWORKS

• PARALLEL
• DISTRIBUTED

TECHNICAL ASPECTS

• PARALLEL COMPUTERS (USUALLY) WORK IN TIGHT SYNCHRONY, SHARE MEMORY TO A LARGE EXTENT AND HAVE A VERY FAST AND RELIABLE COMMUNICATION MECHANISM BETWEEN THEM.

• DISTRIBUTED COMPUTERS ARE MORE INDEPENDENT, COMMUNICATION IS LESS FREQUENT AND LESS SYNCHRONOUS, AND THE COOPERATION IS LIMITED.

PURPOSES

• PARALLEL COMPUTERS COOPERATE TO SOLVE DIFFICULT PROBLEMS MORE EFFICIENTLY (POSSIBLY)

• DISTRIBUTED COMPUTERS HAVE INDIVIDUAL GOALS AND PRIVATE ACTIVITIES. SOMETIMES COMMUNICATION WITH OTHERS IS NEEDED (E.G. DISTRIBUTED DATABASE OPERATIONS).

PARALLEL COMPUTERS: COOPERATION IN A POSITIVE SENSE

DISTRIBUTED COMPUTERS: COOPERATION IN A NEGATIVE SENSE, ONLY WHEN IT IS NECESSARY

Slide 4
FOR PARALLEL SYSTEMS
WE ARE INTERESTED IN SOLVING ANY PROBLEM IN PARALLEL

FOR DISTRIBUTED SYSTEMS
WE ARE INTERESTED IN SOLVING PARTICULAR PROBLEMS ONLY, IN PARALLEL. TYPICAL EXAMPLES ARE:

• COMMUNICATION SERVICES
ROUTING
BROADCASTING

• MAINTENANCE OF CONTROL STRUCTURES
SPANNING TREE CONSTRUCTION
TOPOLOGY UPDATE
LEADER ELECTION

• RESOURCE CONTROL ACTIVITIES
LOAD BALANCING
MANAGING GLOBAL DIRECTORIES

Slide 5

PARALLEL ALGORITHMS

• WHICH MODEL OF COMPUTATION IS BETTER TO USE?

• HOW MUCH TIME DO WE EXPECT TO SAVE USING A PARALLEL ALGORITHM?

• HOW DO WE CONSTRUCT EFFICIENT ALGORITHMS?

MANY CONCEPTS OF COMPLEXITY THEORY MUST BE REVISITED

• IS PARALLELISM A SOLUTION FOR HARD PROBLEMS?

• ARE THERE PROBLEMS NOT ADMITTING AN EFFICIENT PARALLEL SOLUTION, THAT IS, INHERENTLY SEQUENTIAL PROBLEMS?

Slide 6
We need a model of computation

• NETWORK (VLSI) MODEL
• The processors are connected by a network of bounded degree.
• No shared memory is available.
• Several interconnection topologies.
• Synchronous way of operating.

MESH CONNECTED ARRAY (√N × √N array of N processors)

degree = 4
diameter = 2√N

Slide 7

HYPERCUBE

[Figure: 4-dimensional hypercube, nodes labelled 0000 through 1111]

N = 2^4 = 16 PROCESSORS
degree = log2 N = 4
diameter = log2 N = 4

Slide 8
Other important topologies

• binary trees
• mesh of trees
• cube-connected cycles

In the network model a PARALLEL MACHINE is a very complex ensemble of small interconnected units, performing elementary operations.
- Each processor has its own memory.
- Processors work synchronously.

LIMITS OF THE MODEL

• different topologies require different algorithms to solve the same problem
• it is difficult to describe and analyse algorithms (the migration of data has to be described)

A shared-memory model is more suitable from an algorithmic point of view

Slide 9

Model Equivalence
• given two models M1 and M2, and a problem Π of size n

• if M1 and M2 are equivalent, then solving Π requires:
– T(n) time and P(n) processors on M1
– T(n)^O(1) time and P(n)^O(1) processors on M2

Slide 10

PRAM
• Parallel Random Access Machine
• Shared-memory multiprocessor
• unlimited number of processors, each
– has unlimited local memory
– knows its ID
– able to access the shared memory
• unlimited shared memory
Slide 11

MODEL

[Figure: PRAM — processors P1, P2, ..., Pn connected to a common memory of m cells]

PRAM: n RAM processors connected to a common memory of m cells.

ASSUMPTION: at each time unit each Pi can read a memory cell, make an internal computation and write another memory cell.

CONSEQUENCE: any pair of processors Pi, Pj can communicate in constant time!

Pi writes the message in cell x at time t
Pj reads the message in cell x at time t+1

Slide 12
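The constant-time communication above can be made concrete with a small simulation. This is a minimal sketch (my own construction, not from the slides): shared memory is a Python list, each "processor" is a function run once per synchronous time unit, and all reads in a step see the memory state from the start of that step.

```python
# Hypothetical sketch of one synchronous PRAM step in Python.
# Each processor program takes a snapshot of shared memory and
# returns either a (cell, value) write or None.

def pram_step(shared, programs):
    """Run one synchronous PRAM step: every processor reads, computes,
    and writes based on the same snapshot of shared memory."""
    snapshot = list(shared)                          # all reads see time t
    writes = [prog(snapshot) for prog in programs]   # compute in "parallel"
    for w in writes:                                 # writes land at time t+1
        if w is not None:
            cell, value = w
            shared[cell] = value
    return shared

# P1 writes a message into cell x = 3 at time t;
# P2 reads cell 3 at time t+1.
shared = [0] * 8
p1 = lambda mem: (3, 42)       # write 42 into cell 3
p2 = lambda mem: None          # idle this step
pram_step(shared, [p1, p2])
received = shared[3]           # P2's read on the next step
print(received)                # 42
```

The snapshot is what makes the step synchronous: no processor observes another processor's write from the same time unit.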

PRAM
• Inputs/Outputs are placed in the shared memory (designated address)
• Memory cell stores an arbitrarily large integer
• Each instruction takes unit time
• Instructions are synchronized across the processors

Slide 13

PRAM Instruction Set

• accumulator architecture
– memory cell R0 accumulates results

• multiply/divide instructions take only constant operands
– prevents generating exponentially large numbers in polynomial time

Slide 14

PRAM Complexity Measures


• for each individual processor
– time: number of instructions executed
– space: number of memory cells accessed

• PRAM machine
– time: time taken by the longest running processor
– hardware: maximum number of active processors

Slide 15

Two Technical Issues for PRAM

• How processors are activated


• How shared memory is accessed

Slide 16

Processor Activation
• P0 places the number of processors (p) in the
designated shared-memory cell
– each active Pi, where i < p, starts executing
– O(1) time to activate
– all processors halt when P0 halts

• Active processors explicitly activate additional processors via FORK instructions
– tree-like activation
– O(log p) time to activate
Slide 17
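The O(log p) bound for FORK-based activation follows from doubling: if every active processor activates one new processor per round, the number of active processors doubles each round. A small sketch (hypothetical helper, not from the slides) counts the rounds:

```python
# Sketch of tree-like activation via FORK: starting from P0 alone,
# each round every active processor FORKs one new processor, so the
# active count doubles and p processors are active after ceil(log2 p)
# rounds.

import math

def activation_rounds(p):
    """Number of FORK rounds needed to activate p processors."""
    active = 1           # only P0 is active initially
    rounds = 0
    while active < p:
        active *= 2      # every active processor FORKs one more
        rounds += 1
    return rounds

print(activation_rounds(8))     # 3
print(activation_rounds(1000))  # 10, i.e. ceil(log2 1000)
```

Compare with the first scheme on the slide, where P0 writes p into a designated cell and all Pi with i < p start at once in O(1) time.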
THE PRAM IS A THEORETICAL (UNFEASIBLE) MODEL
• The interconnection network between processors and memory would require a very large amount of area.

• The message-routing on the interconnection network would require time proportional to the network size (i.e. the assumption of constant access time to the memory is not realistic).

WHY IS THE PRAM A REFERENCE MODEL?

• Algorithm designers can forget the communication problems and focus their attention on the parallel computation only.

• There exist algorithms simulating any PRAM algorithm on bounded-degree networks.

E.g. a PRAM algorithm requiring time T(n) can be simulated on a mesh of trees in time T(n)·log²n / log log n, that is, each step can be simulated with a slow-down of log²n / log log n.

• Instead of designing ad hoc algorithms for bounded-degree networks, design more general algorithms for the PRAM model and simulate them on a feasible network.

Slide 18
• For the PRAM model there exists a well-developed body of techniques and methods to handle different classes of computational problems.
• The discussion on parallel models of computation is still HOT

The actual trend:

COARSE-GRAINED MODELS

The degree of parallelism allowed is independent of the number of processors.

• The computation is divided into supersteps, each one including
• a local computation phase
• a communication phase
• a synchronization phase

The study is still at the beginning!
Slide 19

Metrics
A measure of relative performance between a multiprocessor system and a single-processor system is the speed-up S(p), defined as follows:

S(p) = (Execution time using a single processor) / (Execution time using a multiprocessor with p processors)

S(p) = T1 / Tp

Efficiency: E(p) = S(p) / p

Cost = p × Tp

Slide 20
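The three metrics above are one-liners; a minimal sketch (function and variable names are mine) makes their relationship explicit:

```python
# Sketch of the metrics on this slide: speedup S(p) = T1/Tp,
# efficiency E(p) = S(p)/p, cost C(p) = p * Tp.

def speedup(t1, tp):
    """S(p): sequential time over parallel time."""
    return t1 / tp

def efficiency(t1, tp, p):
    """E(p): speedup per processor (1.0 means perfect scaling)."""
    return speedup(t1, tp) / p

def cost(tp, p):
    """C(p): total processor-time consumed by the parallel run."""
    return p * tp

# e.g. T1 = 100 s sequentially, Tp = 20 s on p = 8 processors:
print(speedup(100, 20))        # 5.0
print(efficiency(100, 20, 8))  # 0.625
print(cost(20, 8))             # 160
```

Note that cost 160 exceeds T1 = 100 here: the run is faster but not cost-optimal, which is exactly the distinction the next slide draws.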

Metrics
• A parallel algorithm is cost-optimal when: parallel cost = sequential time
Cp = T1
Ep = 100%

• Critical when down-scaling: a parallel implementation may become slower than the sequential one
T1 = n^3
Tp = n^2.5 when p = n^2
Cp = p × Tp = n^4.5

Slide 21
Amdahl’s Law
• f = fraction of the problem that’s inherently sequential
(1 – f) = fraction that’s parallel
• Parallel time with p processors (normalizing T1 = 1):
Tp = f + (1 − f)/p
• Speedup with p processors:
Sp = 1 / (f + (1 − f)/p)
Slide 22
Amdahl’s Law
• Upper bound on speedup (p = ∞): the term (1 − f)/p converges to 0, so
S∞ = 1 / f
• Example: f = 2%
S = 1 / 0.02 = 50

Slide 23
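The two formulas above can be checked numerically. A minimal sketch (my naming) reproduces the f = 2% example and shows how far below the limit a realistic processor count stays:

```python
# Sketch of Amdahl's law with the slide's normalization T1 = 1:
# Tp = f + (1 - f)/p  and  Sp = 1 / (f + (1 - f)/p).

def amdahl_speedup(f, p):
    """Speedup on p processors with inherently sequential fraction f."""
    return 1.0 / (f + (1.0 - f) / p)

def amdahl_limit(f):
    """Upper bound on speedup as p -> infinity: 1/f."""
    return 1.0 / f

print(amdahl_limit(0.02))         # 50.0, the slide's f = 2% example
print(amdahl_speedup(0.02, 100))  # ~33.6: still well below the limit at p = 100
```

Even with 100 processors a 2% sequential fraction caps the speedup at about 33.6, two thirds of the asymptotic bound of 50.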

PRAM
• Too many interconnections give problems with synchronization
• However it is the best conceptual model for designing efficient parallel algorithms
– due to its simplicity and the possibility of efficiently simulating PRAM algorithms on more realistic parallel architectures
Slide 24

Shared-Memory Access
Concurrent (C): many processors can perform the operation on the same memory location simultaneously
Exclusive (E): not concurrent

• EREW (Exclusive Read Exclusive Write)
• CREW (Concurrent Read Exclusive Write)
– Many processors can read the same location simultaneously, but only one can attempt to write to a given location
• ERCW (Exclusive Read Concurrent Write)
• CRCW (Concurrent Read Concurrent Write)
– Many processors can read from/write to the same memory location

Slide 25

Example CRCW-PRAM
• Initially
– table A contains values 0 and 1
– output contains value 0
• The program computes the “Boolean OR” of
A[1], A[2], A[3], A[4], A[5]

Slide 26
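The CRCW program itself did not survive extraction, so the following is a hypothetical reconstruction of the standard O(1) trick, sketched in Python: every processor Pi with A[i] = 1 concurrently writes 1 into the output cell. The concurrent writes are legal on a COMMON-rule CRCW PRAM because all writers write the same value.

```python
# Hypothetical reconstruction (not the slide's code) of Boolean OR on a
# COMMON CRCW PRAM: one processor per input cell, all writing the output
# in the same time step. The Python loop stands in for the parallel step.

def crcw_boolean_or(A):
    output = 0                 # designated shared output cell, initially 0
    for i in range(len(A)):    # conceptually: all Pi act simultaneously
        if A[i] == 1:
            output = 1         # concurrent COMMON write of the value 1
    return output

print(crcw_boolean_or([0, 1, 0, 1, 1]))  # 1
print(crcw_boolean_or([0, 0, 0, 0, 0]))  # 0
```

Because every writer writes the same value, the result is well-defined in a single time unit: OR of n bits in O(1) with n processors.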

Example CREW-PRAM
• Assume initially table A contains [0,0,0,0,0,1] and we have the parallel program
Slide 27
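The CREW program is likewise missing from the extracted text; a standard CREW solution (my reconstruction, not the slide's code) computes the OR by a balanced binary reduction. Writes stay exclusive because each round a processor writes only its own cell, so the cost of avoiding concurrent writes is O(log n) rounds instead of the CRCW model's O(1).

```python
# Hypothetical CREW-PRAM Boolean OR by balanced binary reduction:
# in each round, cell i absorbs cell i+step; distinct processors write
# distinct cells, so all writes are exclusive. O(log n) rounds.

def crew_boolean_or(A):
    B = list(A)                            # working copy of the shared table
    n = len(B)
    step = 1
    while step < n:                        # one synchronous round per doubling
        for i in range(0, n, 2 * step):    # conceptually parallel processors
            if i + step < n:
                B[i] = B[i] | B[i + step]  # exclusive write into B[i]
        step *= 2
    return B[0]

print(crew_boolean_or([0, 0, 0, 0, 0, 1]))  # 1
```
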

Pascal triangle
PRAM CREW

Slide 28
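The Pascal-triangle slide gives only the title, so here is a hedged sketch of how the computation is usually organized on a CREW PRAM: row k is computed from row k−1 with one processor per entry; entry j reads cells j−1 and j of the previous row (concurrent reads are fine, since neighbouring processors read the same cell) and writes its own cell exclusively.

```python
# Sketch (my reconstruction) of one CREW-PRAM step of Pascal's triangle:
# each processor j reads prev[j-1] and prev[j] -- concurrent reads of
# shared cells -- and writes its own output cell, so writes are exclusive.

def pascal_row_crew(prev):
    """Compute row k of Pascal's triangle from row k-1 in one step."""
    return [
        (prev[j - 1] if j > 0 else 0) + (prev[j] if j < len(prev) else 0)
        for j in range(len(prev) + 1)      # one processor per new entry
    ]

row = [1]
for _ in range(4):                         # four parallel rounds
    row = pascal_row_crew(row)
print(row)                                 # [1, 4, 6, 4, 1]
```

Each row takes O(1) parallel time with k+1 processors, so the first n rows take O(n) rounds in total.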
