L2 Parallel Computing Models
Models Lecture 2
Slide 1
[Figure: processors P1, P2, P3, P4, P5, ..., Pn connected through an INTERCONNECTION NETWORK]
• CONNECTION MACHINE
the world
Slide 3
TECHNICAL ASPECTS
• PARALLEL COMPUTERS (USUALLY) WORK IN TIGHT SYNCHRONY, SHARE MEMORY TO A LARGE
EXTENT AND HAVE A VERY FAST AND RELIABLE COMMUNICATION MECHANISM BETWEEN
THEM.
PURPOSES
• PARALLEL COMPUTERS COOPERATE TO SOLVE MORE EFFICIENTLY (POSSIBLY)
DIFFICULT PROBLEMS
DISTRIBUTED COMPUTERS: COOPERATION IN A NEGATIVE SENSE, ONLY WHEN IT IS NECESSARY
Slide 4
FOR PARALLEL SYSTEMS
WE ARE INTERESTED IN SOLVING ANY PROBLEM IN PARALLEL
• COMMUNICATION SERVICES
ROUTING
BROADCASTING
PARALLEL ALGORITHMS
Slide 7
HYPERCUBE
[Figure: 4-dimensional hypercube whose nodes are labeled with the 4-bit strings 0000 through 1111]
N = $2^4$ PROCESSORS
diameter = 4
degree = 4 ($\log_2 N$)
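The figures quoted for the hypercube can be checked with a short sketch. It assumes the usual labeling, in which two nodes are adjacent exactly when their binary labels differ in one bit; the function names `neighbors` and `distance` are illustrative and do not come from the slides.

```python
def neighbors(node: int, d: int) -> list[int]:
    """Nodes adjacent to `node` in a d-dimensional hypercube (flip one bit)."""
    return [node ^ (1 << k) for k in range(d)]


def distance(a: int, b: int) -> int:
    """Shortest-path length, equal to the Hamming distance of the labels."""
    return bin(a ^ b).count("1")


d = 4
N = 2 ** d                                   # number of processors
degree = len(neighbors(0, d))                # links per node
diameter = max(distance(a, b) for a in range(N) for b in range(N))

print(N, degree, diameter)
print([format(v, "04b") for v in neighbors(0b0110, d)])
```

Both the degree and the diameter equal d, the base-2 logarithm of N, which is why the hypercube is regarded as a good compromise between wiring cost and distance.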
Slide 8
Other important topologies
• binary trees
• mesh of trees
• cube connected cycles
Slide 9
Model Equivalence
• given two models M1 and M2, and a problem Π
of size n
Slide 10
PRAM
• Parallel Random Access Machine
• Shared-memory multiprocessor
• unlimited number of processors, each
– has unlimited local memory
– knows its ID
– able to access the shared memory
• unlimited shared memory
Slide 11
MODEL 1: PRAM
[Figure: processors P1, P2, ..., Pi, ..., Pn connected to a Common Memory with cells 1, 2, 3, ..., m]
PRAM: n RAM processors connected to a common memory of m cells
ASSUMPTION: at each time unit each Pi can read a memory cell, make an internal
computation and write another memory cell.
PRAM
• Inputs/Outputs are placed in the shared memory (designated address)
• Memory cell stores an arbitrarily large integer
• Each instruction takes unit time
• Instructions are synchronized across the processors
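The synchronous read, compute, write cycle can be imitated sequentially. The following is only a minimal sketch: `pram_step` and the per-processor function `square` are hypothetical names, and the rule that all reads see the memory as it was before any write of the same step is an assumption made to keep the step well defined.

```python
def pram_step(memory, programs):
    """Simulate one synchronous PRAM time unit.

    programs[i](i, snapshot) returns (address, value) to write, or None.
    """
    snapshot = tuple(memory)            # every processor reads the old state
    writes = []
    for pid, program in enumerate(programs):
        result = program(pid, snapshot)  # read + internal computation
        if result is not None:
            writes.append(result)
    for address, value in writes:        # all writes land at the end of the step
        memory[address] = value
    return memory


def square(pid, snapshot):
    """Processor pid reads cell pid and writes its square to cell 4 + pid."""
    return 4 + pid, snapshot[pid] ** 2


memory = [1, 2, 3, 4, 0, 0, 0, 0]        # inputs in cells 0..3, outputs in 4..7
pram_step(memory, [square] * 4)
print(memory)
```

Four processors finish in one time unit here, whereas a single RAM would need four.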
Slide 13
Slide 14
• PRAM machine
– time: time taken by the longest running processor
– hardware: maximum number of active processors
Slide 15
Slide 16
Processor Activation
• P0 places the number of processors (p) in the
designated shared-memory cell
– each active Pi, where i < p, starts executing
– O(1) time to activate
– all processors halt when P0 halts
• Algorithm designers can set communication problems aside and focus on the
parallel computation only.
• Instead of designing ad hoc algorithms for bounded-degree networks, design more
general algorithms for the PRAM model and simulate them on a feasible network.
Slide 18
• For the PRAM model there exists a well developed body of
techniques and methods to handle different classes of computational
problems.
• The discussion on parallel model of computation is still
COARSE-GRAINED MODELS
• The degree of parallelism allowed is independent of the number of processors.
• local computation
• communication phase
• synchronization phase
Metrics
A measure of relative performance between a multiprocessor
system and a single processor system is the speed-up S( p),
defined as follows:
$S(p) = \frac{\text{Execution time using a single processor system}}{\text{Execution time using a multiprocessor with } p \text{ processors}}$
$S(p) = \frac{T_1}{T_p}$
Efficiency $= \frac{S(p)}{p}$
Cost $= p \times T_p$
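The three metrics translate directly into code. This is a small sketch; the timings in the usage lines are made-up numbers chosen for illustration, not measurements.

```python
def speedup(t1: float, tp: float) -> float:
    """S(p) = T1 / Tp."""
    return t1 / tp


def efficiency(t1: float, tp: float, p: int) -> float:
    """Speedup divided by the number of processors."""
    return speedup(t1, tp) / p


def cost(tp: float, p: int) -> float:
    """Processor-time product p * Tp."""
    return p * tp


# Hypothetical timings: 100 s on one processor, 25 s on 8 processors.
t1, tp, p = 100.0, 25.0, 8
print(speedup(t1, tp), efficiency(t1, tp, p), cost(tp, p))
```

With these numbers the speedup is 4 on 8 processors, so only half of the machine's capacity is doing useful work, and the cost (200) is twice the sequential time.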
Slide 20
• Critical when down-scaling: parallel implementation may become slower than sequential
$T_1 = n^3$
$T_p = n^{2.5}$ when $p = n^2$
$C_p = n^{4.5}$
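To see why down-scaling is critical in this example, suppose the algorithm is run on q < n^2 processors, each simulating n^2 / q of the original ones. The sketch below assumes ideal work sharing with no overhead, so the running time is roughly the cost n^4.5 divided by q; `simulated_time` is a hypothetical helper written for this note.

```python
def simulated_time(n: int, q: int) -> float:
    """Estimated time on q processors, assuming ideal work sharing."""
    p = n ** 2                # processors the algorithm was designed for
    t_p = n ** 2.5            # parallel time with p processors
    return p * t_p / q        # cost spread over q processors


n = 100
t_1 = n ** 3                                   # sequential time
break_even = simulated_time(n, 1) / t_1        # q at which both times match

for q in (n ** 2, 1000, 100):
    verdict = "faster" if simulated_time(n, q) < t_1 else "not faster"
    print(q, simulated_time(n, q), verdict)
```

Under this assumption the break-even point is q = n^1.5 processors: with fewer, the parallel implementation is slower than the sequential one, because its cost n^4.5 exceeds T1 = n^3.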
Slide 21
Amdahl’s Law
• f = fraction of the problem that’s inherently sequential
(1 – f) = fraction that’s parallel
• Parallel time $T_p$ (taking $T_1 = 1$): $T_p = f + \frac{1 - f}{p}$
• Speedup with p processors: $S_p = \frac{1}{f + \frac{1 - f}{p}}$
Slide 22
Amdahl’s Law
• Upper bound on speedup (p = ∞): in $S_p = \frac{1}{f + \frac{1 - f}{p}}$ the term $\frac{1 - f}{p}$ converges to 0, so
$S_\infty = \frac{1}{f}$
• Example: f = 2%
$S_\infty = 1 / 0.02 = 50$
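Amdahl's formula, speedup = 1 / (f + (1 - f) / p), is easy to tabulate. The sketch below uses the slide's value f = 0.02; the processor counts are arbitrary choices for illustration.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Speedup with p processors when a fraction f is inherently sequential."""
    return 1.0 / (f + (1.0 - f) / p)


def amdahl_limit(f: float) -> float:
    """Upper bound on the speedup as p grows without limit."""
    return 1.0 / f


f = 0.02
for p in (1, 10, 100, 1000, 10_000):
    print(p, round(amdahl_speedup(f, p), 2))
print("limit:", amdahl_limit(f))
```

The speedup approaches the bound 1 / f = 50 from below, however many processors are added.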
Slide 23
PRAM
• Too many interconnections give problems with synchronization
• However, it is the best conceptual model for designing efficient
parallel algorithms
– due to its simplicity and the possibility of efficiently simulating PRAM
algorithms on more realistic parallel architectures
Slide 24
Shared-Memory Access
Concurrent (C) means that many processors can perform the operation simultaneously on
the same memory cell
Exclusive (E) means not concurrent: at most one processor at a time
Slide 25
Example CRCW-PRAM
• Initially
– table A contains values 0 and 1
– output contains value 0
• The program computes the “Boolean OR” of
A[1], A[2], A[3], A[4], A[5]
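The program itself is not reproduced in this text. A common way to compute the OR in one step on a CRCW PRAM is for processor i to write 1 to the output whenever A[i] = 1. The sketch below simulates that idea sequentially and assumes the "common" write rule, where concurrent writes are allowed as long as all writers store the same value; it is not necessarily the program shown on the slide.

```python
def crcw_or(A):
    """Boolean OR of A in one simulated CRCW step (common-write rule assumed)."""
    output = 0                                   # output initially contains 0
    # Processor i reads A[i] and requests a write of 1 if the value is 1.
    writes = [1 for a in A if a == 1]
    # Common CRCW: simultaneous writes to one cell must all carry the same value.
    assert len(set(writes)) <= 1
    if writes:
        output = writes[0]
    return output


print(crcw_or([0, 1, 0, 1, 0]))
print(crcw_or([0, 0, 0, 0, 0]))
```

With one processor per table entry the OR takes constant time, which is not possible when writes must be exclusive.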
Slide 26
Example CREW-PRAM
• Assume initially table A contains [0,0,0,0,0,1] and we
have the parallel program
Slide 27
Pascal triangle
PRAM CREW
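The slide's program is not included in this text. One plausible CREW formulation, sketched here as an assumption, computes each row of the triangle in a single parallel step: processor j writes only cell j of the new row (exclusive write), while each cell of the previous row is read by two neighbouring processors (concurrent read). The list comprehension stands in for the parallel loop.

```python
def pascal_rows(n_rows: int) -> list[list[int]]:
    """First n_rows rows of Pascal's triangle, one simulated parallel step per row."""
    rows = [[1]]
    for _ in range(n_rows - 1):
        prev = rows[-1]
        padded = [0] + prev + [0]
        # "Parallel for": processor j reads padded[j] and padded[j + 1],
        # then writes their sum to cell j of the new row.
        new_row = [padded[j] + padded[j + 1] for j in range(len(prev) + 1)]
        rows.append(new_row)
    return rows


for row in pascal_rows(5):
    print(row)
```

Under this scheme n rows need n - 1 time units with at most n processors active.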
Slide 28