Basic Communication Operations
Preliminaries
• A big problem is divided into smaller tasks (logical units)
• A process is an entity that executes tasks
• Mapping is performed to allocate tasks to processes
• Several processes execute at the same time and perform inter-process communication (interaction)
• Interaction is performed to share data, work, and synchronization information
• There are various patterns for communication
Assumptions for the Operations
• Interconnects support cut-through routing
• The communication time between any pair of nodes in the network is the same, regardless of the number of intermediate nodes
• Links are bidirectional: two directly connected nodes can simultaneously send each other messages of m words without any congestion
• Single-port communication model:
• A node can send on only one of its links at a time
• A node can receive on only one of its links at a time
• However, a node can receive one message while sending another at the same time, on the same or a different link
Patterns
1. One-to-All Broadcast / All-to-One Reduction
2. All-to-All Broadcast / All-to-All Reduction
3. All-Reduce (All-to-One Reduction + One-to-All Broadcast)
4. Scatter (One-to-All Personalized Communication) / Gather
Topologies
1. Ring / Linear Array (one-dimensional)
2. Mesh (two-dimensional)
3. Hypercube (d-dimensional; the examples here use a three-dimensional, 8-node cube)
One-to-All Broadcast and All-to-One Reduction
One-to-All Broadcast
• A single process sends identical data to all other processes.
• Initially, only one process holds the m-word data.
• After the broadcast, every process has its own copy of the m-word data.
All-to-One Reduction
• Dual of one-to-all broadcast
• The m-word data from all processes are combined through an associative operator
• The result is accumulated at a single destination process in one buffer of size m
One-to-All Broadcast and All-to-One Reduction
• Application: used in many parallel algorithms, including matrix-vector multiplication, shortest paths, and Gaussian elimination.
• Naïve approach: the source sequentially sends the message to each of the other p-1 processes.
• Disadvantages:
• The source becomes a bottleneck
• The communication network is underutilized, because only the connection between a single pair of nodes is used at a time
• Solution: recursive doubling
Recursive doubling (Linear Array or Ring)
Recursive Doubling Broadcast
• The source process sends the message to one other process
• In the next communication phase, both processes can simultaneously propagate the message further
• The message “HI” from source node P0 is passed to all other nodes of an 8-node ring in the following three steps (see the sketch after this list):
1. P0 to P4 (distance 4)
2. P0 to P2 and P4 to P6, in parallel (distance 2)
3. P0 to P1, P2 to P3, P4 to P5, P6 to P7, in parallel (distance 1)
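Below is a minimal Python sketch (not from the slides) that simulates these three steps; the 8-node ring, source node 0, and the function name are illustrative assumptions.

```python
# Minimal simulation of recursive-doubling one-to-all broadcast on an
# 8-node ring. Node 0 starts with the message; in each step every node
# that already holds it forwards a copy over half the remaining distance.
def recursive_doubling_broadcast(p=8, source=0, msg="HI"):
    data = {source: msg}            # node -> message it currently holds
    d = p // 2                      # first hop covers half the ring
    step = 1
    while d >= 1:
        for s in list(data):
            dest = (s + d) % p      # forward the message d nodes away
            data[dest] = data[s]
        print(f"step {step}: nodes with message -> {sorted(data)}")
        d //= 2
        step += 1
    return data

recursive_doubling_broadcast()
# step 1: nodes with message -> [0, 4]
# step 2: nodes with message -> [0, 2, 4, 6]
# step 3: nodes with message -> [0, 1, 2, 3, 4, 5, 6, 7]
```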
Recursive doubling (Linear Array or Ring)
Recursive Doubling Reduction
Example: sum of the numbers held by all nodes (a sketch follows below)
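A minimal sketch of this dual operation, assuming an 8-node ring, a sum operator, and node 0 as the destination; the function name and example values are illustrative.

```python
# Minimal simulation of all-to-one reduction (sum) on an 8-node ring by
# reversing the broadcast steps: nodes send their partial sums toward the
# destination, with the distance doubling each step.
def recursive_halving_reduction(values, dest=0):
    p = len(values)
    partial = dict(enumerate(values))        # node -> its current partial sum
    d = 1                                    # start with nearest neighbours
    while d < p:
        for s in list(partial):
            if (s - dest) % p % (2 * d) == d:        # node s sends in this step
                partial[(s - d) % p] += partial.pop(s)
        d *= 2
    return partial[dest]

print(recursive_halving_reduction([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```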
Mesh
• Each row and each column of a square mesh of p nodes can be regarded as a linear array of √p nodes
• Communication algorithms on the mesh are therefore simple extensions of their linear-array counterparts
Broadcast and Reduction
• Two-step breakdown:
i. The operation is performed along one dimension, treating each row as a linear array
ii. Then all the columns are treated similarly
One to all Broadcast on a 16 node mesh
3 7 11 15

2 6 10 14

1 5 9 13

0 4 8 12
“HI”
Step 1 (0th row recursive doubling)
3 7 11 15

2 6 10 14

1 5 9 13

0 4 8 12
“HI” “HI”
Step 2 (0th row recursive doubling)
3 7 11 15

2 6 10 14

1 5 9 13

0 4 8 12
“HI” “HI” “HI” “HI”
Step 3 (All Column recursive doubling)
3 7 11 15

2 6 10 14
“HI” “HI” “HI” “HI”

1 5 9 13

0 4 8 12
“HI” “HI” “HI” “HI”
Step 4 (All Column recursive doubling)
3 7 11 15
“HI” “HI” “HI” “HI”

2 6 10 14
“HI” “HI” “HI” “HI”

1 5 9 13
“HI” “HI” “HI” “HI”

0 4 8 12
“HI” “HI” “HI” “HI”
Reduction
3 7 11 15
“HI” “HI” “HI” “HI”

2 6 10 14
“HI” “HI” “HI” “HI”

1 5 9 13
“HI” “HI” “HI” “HI”

0 4 8 12
“HI” “HI” “HI” “HI”
3 7 11 15

2 6 10 14
“HI” “HI” “HI” “HI”

1 5 9 13

0 4 8 12
“HI” “HI” “HI” “HI”
3 7 11 15

2 6 10 14

1 5 9 13

0 4 8 12
“HI” “HI” “HI” “HI”
3 7 11 15

2 6 10 14

1 5 9 13

0 4 8 12
“HI” “HI”
Mesh (Broadcast and Reduction)
Hypercube
Broadcast
• The source node first sends the data to one node along the highest dimension
• Communication then proceeds successively along lower dimensions in subsequent steps
• The algorithm is the same as the one used for the linear array
• However, on the hypercube, changing the order of the dimensions does not congest the network
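A minimal sketch of this broadcast on a d-dimensional hypercube using bitwise node labels; it assumes node 0 is the source and d = 3, matching the 8-node examples.

```python
# Minimal simulation of one-to-all broadcast on a d-dimensional hypercube.
# Node labels are d-bit integers; in each step, nodes that already hold the
# message send it to the neighbour whose label differs in the current bit,
# starting from the highest dimension.
def hypercube_broadcast(d=3, msg="HI"):
    holders = {0: msg}                        # source is assumed to be node 0
    for bit in reversed(range(d)):            # highest dimension first
        for node in list(holders):
            partner = node ^ (1 << bit)       # neighbour across this dimension
            holders[partner] = holders[node]
        print(f"after dim {bit}: {sorted(holders)}")
    return holders

hypercube_broadcast()
# after dim 2: [0, 4]
# after dim 1: [0, 2, 4, 6]
# after dim 0: [0, 1, 2, 3, 4, 5, 6, 7]
```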
Hypercube (Broadcast)
Matrix-Vector Multiplication (An Application)
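The slide only names the application, so the following is a hypothetical illustration assuming a row-wise 1-D partitioning of the matrix: the vector is delivered to every process by a one-to-all broadcast, each process multiplies its block of rows, and the partial results are gathered.

```python
# Hypothetical illustration (row-wise 1-D partitioning is an assumption):
# one-to-all broadcast distributes the vector x, each process multiplies its
# local block of rows, and a gather collects the result vector y.
import numpy as np

def simulated_matvec(A, x, p=4):
    n = A.shape[0]
    rows_per_proc = n // p                          # assumes p divides n
    x_local = [x.copy() for _ in range(p)]          # one-to-all broadcast of x
    y_blocks = []
    for rank in range(p):                           # each process works on its rows
        block = A[rank * rows_per_proc:(rank + 1) * rows_per_proc]
        y_blocks.append(block @ x_local[rank])
    return np.concatenate(y_blocks)                 # gather the partial results

A = np.arange(16, dtype=float).reshape(4, 4)
x = np.ones(4)
print(np.allclose(simulated_matvec(A, x), A @ x))   # True
```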
All-to-All Broadcast and All-to-All Reduction
• All-to-All Broadcast
• A generalization of one-to-all broadcast.
• Every process broadcasts an m-word message.
• The message broadcast by each process can be different from those of the other processes.
• All-to-All Reduction
• Dual of all-to-all broadcast
• Each node is the destination of one all-to-one reduction, out of p reductions in total.
All-to-All Broadcast and All-to-All Reduction
Linear Ring Broadcast (All to All)
Linear Ring Reduction (All to All)
• Draw an all-to-all broadcast on a p-node linear ring
• Reverse the direction of each step without changing the messages
• After each communication step, combine messages having the same broadcast destination with the associative operator (a sketch of the broadcast being reversed follows below)
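A minimal sketch of the all-to-all broadcast on a ring that this reduction reverses; the 4-node ring size and function name are illustrative.

```python
# Minimal simulation of all-to-all broadcast on a p-node ring: in each of the
# p-1 steps, every node forwards the message it received in the previous step
# to its right neighbour while keeping a copy for itself.
def ring_all_to_all_broadcast(p=4):
    collected = [{i} for i in range(p)]      # messages each node has so far
    in_transit = list(range(p))              # message each node forwards next
    for _ in range(p - 1):
        new_transit = [None] * p
        for node in range(p):
            right = (node + 1) % p
            new_transit[right] = in_transit[node]   # send to right neighbour
            collected[right].add(in_transit[node])  # neighbour keeps a copy
        in_transit = new_transit
    return collected

print(ring_all_to_all_broadcast())
# every node ends up with {0, 1, 2, 3}; the all-to-all reduction runs these
# steps in reverse, combining values instead of collecting them
```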
Task
• Draw an All-to-All Broadcast on a 4-node linear ring
• Reverse the directions and combine the results using ‘SUM’
All-to-All Broadcast on 2D Mesh
• Based on the linear array algorithm, treating the rows and columns of the mesh as linear arrays
• Communication takes place in two phases:
• Row-wise all-to-all broadcast
• Column-wise all-to-all broadcast
All-to-All Broadcast on HyperCube
• The hypercube algorithm for all-to-all broadcast extends
the mesh algorithm to log p dimensions.
• Procedure: Requires log p steps.
• Communication: Occurs along a different dimension (x, y,
z) of the p-node hypercube in each step.
• Step Process: Pairs of nodes exchange data, doubling the
message size for the next step by concatenating received
messages with current data.
• Figure: the figure illustrates these steps for an eight-node hypercube with bidirectional communication channels.
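A minimal sketch of this exchange for d = 3, where each node's data is modelled as a list that doubles in length every step; the function name is illustrative.

```python
# Minimal simulation of all-to-all broadcast on a d-dimensional hypercube: in
# each of the log p steps, partner nodes across one dimension exchange
# everything they currently hold, doubling their message size.
def hypercube_all_to_all_broadcast(d=3):
    p = 1 << d
    held = [[i] for i in range(p)]            # node i starts with message i
    for bit in range(d):                      # one dimension per step
        next_held = [list(h) for h in held]
        for node in range(p):
            partner = node ^ (1 << bit)
            next_held[node] += held[partner]  # concatenate the partner's data
        held = next_held
    return held

print(sorted(hypercube_all_to_all_broadcast()[5]))  # [0, 1, 2, 3, 4, 5, 6, 7]
```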
All-Reduce
• All-reduce = all-to-one reduction + one-to-all broadcast
• It can be implemented as an all-to-one reduction followed by a one-to-all broadcast
• It can also follow the all-to-all broadcast pattern, with messages combined by the associative operator instead of concatenated; every node then holds the reduced result, with less traffic congestion (see the sketch below)
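A minimal sketch of all-reduce (sum) on a 3-dimensional hypercube using the recursive-doubling pattern just described, combining values instead of concatenating them; the function name and input values are illustrative.

```python
# Minimal sketch of all-reduce (sum) on a d-dimensional hypercube: partners
# across each dimension exchange and combine their values with the associative
# operator, so the message size stays m throughout.
def hypercube_all_reduce(values):
    p = len(values)                    # assumed to be a power of two
    d = p.bit_length() - 1
    vals = list(values)
    for bit in range(d):
        new_vals = list(vals)
        for node in range(p):
            partner = node ^ (1 << bit)
            new_vals[node] = vals[node] + vals[partner]   # combine, don't concatenate
        vals = new_vals
    return vals                        # every node now holds the full sum

print(hypercube_all_reduce([1, 2, 3, 4, 5, 6, 7, 8]))  # [36, 36, ..., 36]
```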
Example
• All-to-All Broadcast
• All-to-One Reduction
• One-to-All Broadcast
Prefix-Sums
• Prefix sums are also known as scan operations
• Given p numbers n_0, n_1, ..., n_{p-1} (one on each node), the problem is to compute the sums
  $s_k = \sum_{i=0}^{k} n_i$
• Here $s_k$ is the prefix sum computed at the k-th node after the operation.
• Example:
• Original sequence: <3, 1, 4, 0, 2>
• Sequence of prefix sums: <3, 4, 8, 8, 10>
Rules
• Round bracket ( ): the message is sent to the other node in the next step
• Square bracket [ ]: the message is kept at that node
• The lower-indexed node keeps its square-bracket message unchanged
• The higher-indexed node adds the message it received from the lower-indexed node to its square-bracket message
(A sketch of the corresponding hypercube prefix-sum procedure follows below.)
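A minimal sketch of the hypercube prefix-sum (scan) procedure following these bracket rules, keeping a "square bracket" result and a "round bracket" message per node. It assumes a power-of-two node count, so the slide's 5-element example is extended to eight illustrative values.

```python
# Minimal simulation of prefix sums (scan) on a d-dimensional hypercube: every
# node keeps a running result (the "square bracket" value) and a message
# buffer (the "round bracket" value) exchanged with its partner in each
# dimension; only messages coming from lower-numbered nodes are folded into
# the result, while the message buffer always accumulates.
def hypercube_prefix_sum(values):
    p = len(values)                    # assumed to be a power of two
    d = p.bit_length() - 1
    result = list(values)              # value kept at each node
    msg = list(values)                 # value forwarded to the partner
    for bit in range(d):
        incoming = [msg[node ^ (1 << bit)] for node in range(p)]
        for node in range(p):
            if (node ^ (1 << bit)) < node:        # partner has a lower label
                result[node] += incoming[node]    # fold into the prefix sum
            msg[node] += incoming[node]           # message always accumulates
    return result

print(hypercube_prefix_sum([3, 1, 4, 0, 2, 5, 1, 6]))
# [3, 4, 8, 8, 10, 15, 16, 22]
```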
Scatter and Gather
• Scatter (one-to-all personalized communication): the source sends a distinct message to every process
• Gather (concatenation) is different from all-to-one reduction, as it does not combine the results with an associative operator
The scatter operation on an eight-node hypercube
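A minimal sketch of scatter on a 3-dimensional hypercube: in each step a holder passes on the half of its messages destined for the other half of its subcube; node labels and message names are illustrative.

```python
# Minimal simulation of scatter on a d-dimensional hypercube: node 0 starts
# with one personalized message per node; in each step a holder sends its
# partner the messages destined for the partner's half of the subcube.
def hypercube_scatter(d=3):
    p = 1 << d
    held = {0: {dest: f"msg{dest}" for dest in range(p)}}   # source holds all p messages
    for bit in reversed(range(d)):                 # highest dimension first
        for node in list(held):
            partner = node ^ (1 << bit)
            to_send = {dst: m for dst, m in held[node].items()
                       if ((dst >> bit) & 1) != ((node >> bit) & 1)}
            held[partner] = to_send                # partner's half of the cube
            for dst in to_send:
                del held[node][dst]
    return held                                    # each node keeps exactly its own message

print(hypercube_scatter())
# {0: {0: 'msg0'}, 4: {4: 'msg4'}, 2: {2: 'msg2'}, ..., 7: {7: 'msg7'}}
```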
All-to-All personalized Communication
• Each node sends a distinct message of size m to every
other node.
• Also known as total exchange
Example (Transpose Matrix)

All-to-all personalized communication in transposing a 4 x 4 matrix using four processes.
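A sketch of why this transpose is a total exchange, assuming a row-block layout (an assumption, since the figure is not reproduced here): the block that process i holds in column j must end up on process j.

```python
# Sketch of a distributed matrix transpose as an all-to-all personalized
# exchange: with a row-wise layout, process i owns row block i, and the piece
# it holds in column block j must travel to process j.
import numpy as np

def transpose_by_total_exchange(A, p=4):
    n = A.shape[0]
    b = n // p                                    # block size per process
    # message from process i to process j: the transposed (i, j) block
    msgs = {(i, j): A[i*b:(i+1)*b, j*b:(j+1)*b].T
            for i in range(p) for j in range(p)}
    # after the exchange, process j assembles its row block of A^T
    out_blocks = [[msgs[(i, j)] for i in range(p)] for j in range(p)]
    return np.block(out_blocks)

A = np.arange(16).reshape(4, 4)
print(np.array_equal(transpose_by_total_exchange(A), A.T))  # True
```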
All-to-All personalized [Ring]
Cont.
• All-to-all personalized communication on a six-node
ring.
• The label of each message is of the form {x, y},
where x is the label of the node that originally owned
the message, and y is the label of the node that is the
final destination of the message.
• The label ({x1, y1}, {x2, y2}, ..., {xn, yn}) indicates a
message that is formed by concatenating n individual
messages.
All-to-All personalized [Mesh]
• Two steps:
1. All-to-all personalized communication (row-wise)
2. All-to-all personalized communication (column-wise)
All-to-All Personalized [Hypercube]
• 0th process (node 0)
• 1st step (x-axis), 0 <-> 1: sends (0,1), (0,3), (0,5), (0,7)
• 2nd step (y-axis), 0 <-> 2: sends (0,2), (0,6), (1,2), (1,6)
• 3rd step (z-axis), 0 <-> 4: sends (0,4), (1,4), (2,4), (3,4)
• 2nd process (node 2)
• 1st step (x-axis), 2 <-> 3: sends (2,3), (2,7), (2,5), (2,1)
• 2nd step (y-axis), 2 <-> 0: sends (2,0), (2,4), (3,0), (3,4)
• 3rd step (z-axis), 2 <-> 6: sends (2,6), (3,6), (0,6), (1,6)
(A sketch reproducing these steps follows below.)
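A minimal sketch that reproduces the (source, destination) pairs listed above: in step k, partners across dimension k exchange exactly those messages whose destination differs from the holder in bit k.

```python
# Minimal simulation of all-to-all personalized communication on a
# d-dimensional hypercube; each message is a (source, destination) pair.
def hypercube_total_exchange(d=3):
    p = 1 << d
    held = {n: [(n, dst) for dst in range(p)] for n in range(p)}
    for bit in range(d):                              # x, y, z dimensions in turn
        snapshot = {n: list(m) for n, m in held.items()}
        for node in range(p):
            partner = node ^ (1 << bit)
            keep, send = [], []
            for src, dst in snapshot[node]:
                if ((dst >> bit) & 1) != ((node >> bit) & 1):
                    send.append((src, dst))           # destination lies in partner's half
                else:
                    keep.append((src, dst))
            recv = [m for m in snapshot[partner]
                    if ((m[1] >> bit) & 1) == ((node >> bit) & 1)]
            held[node] = keep + recv
            if node == 0:
                print(f"step {bit + 1}: node 0 sends {send}")
    return held

hypercube_total_exchange()
# step 1: node 0 sends [(0, 1), (0, 3), (0, 5), (0, 7)]
# step 2: node 0 sends [(0, 2), (0, 6), (1, 2), (1, 6)]
# step 3: node 0 sends [(0, 4), (1, 4), (2, 4), (3, 4)]
```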
Circular Shift
• A circular q-shift is the operation in which node i sends a data packet to node (i + q) mod p in a p-node ensemble (0 < q < p).
Circular Shift [Linear/Ring]
• Use min(q, p - q) neighbour-to-neighbour steps, shifting in whichever direction around the ring gives the shorter path.
Circular Shift [Mesh]
• A circular shift on the mesh topology is done in the following steps:
1. Circular shift over the rows by (q mod √p)
2. Compensatory column shift
3. Circular shift over the columns by ⌊q / √p⌋
The communication steps in a circular 5-shift on a 4 x 4 mesh.
Circular Shift [Hypercube]
• For a q-shift, e.g. a 5-shift:
• First write q in binary: 5 = (101)_2
• Expand the set bits as powers of 2: 2^2 + 2^0
• i.e. 5 = 4 + 1
• A 5-shift = a 4-shift followed by a 1-shift (a sketch of this decomposition follows below)
The mapping of an eight-node linear array onto a three-dimensional hypercube to perform a circular 5-shift as a combination of a 4-shift and a 1-shift.
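A minimal sketch of this decomposition, modelling only the shift arithmetic (the routing of each power-of-two sub-shift over the hypercube links is not simulated); the function name and packet labels are illustrative.

```python
# Minimal sketch of decomposing a circular q-shift into power-of-two shifts,
# as in the 5-shift = 4-shift + 1-shift example above.
def circular_shift_by_powers_of_two(data, q):
    p = len(data)
    shifted = list(data)
    for k in range(p.bit_length()):
        if (q >> k) & 1:                        # bit k of q is set -> do a 2^k shift
            step = 1 << k
            shifted = [shifted[(i - step) % p] for i in range(p)]
    return shifted

data = [f"pkt{i}" for i in range(8)]
print(circular_shift_by_powers_of_two(data, 5))
# node (i + 5) mod 8 ends up with pkt_i, e.g. index 5 holds 'pkt0'
```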
