Lecture 11
Operations
Preliminaries
• A big problem is divided into smaller tasks (logical units)
• A process is an entity that executes tasks
• Mapping is performed to allocate tasks to processes
• Several processes execute at the same time and perform Inter-Process Communication (interaction)
• Interaction is performed to share data, work, and synchronization information
• There are various patterns for communication
Assumptions for the Operations
• Interconnections support cut-through routing
• Communication time between any pair of nodes in the network is the same (regardless of the number of intermediate nodes)
• Links are bi-directional
• Directly connected nodes can simultaneously send and receive messages of m words without any congestion
• Single-port communication model
• A node can send on only one of its links at a time
• A node can receive on only one of its links at a time
• However, a node can receive a message while simultaneously sending another message on the same or a different link
Patterns
1. One-to-All Broadcast / All-to-One Reduction
2. All-to-All Broadcast / All-to-All Reduction
3. All-Reduce (All-to-One Reduction + One-to-All Broadcast)
4. Scatter (Personalized One-to-All Broadcast) / Gather
Topologies
1. Ring/Linear Array (One Dimensional)
2. Mesh (Two Dimensional)
3. Hypercube (Three Dimensional)
One-to-All Broadcast and All-to-One
Reduction
One-to-All Broadcast
• A single process sends identical data to all other processes.
• Initially, only the source process has the m-word data.
• After the broadcast operation, every process has its own copy of the m-word data.
All-to-One Reduction
• Dual of one-to-all broadcast
• The m-word data from all processes are combined through an associative operator
• The result is accumulated at a single destination process into one buffer of size m
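As a rough illustration of these two definitions, the following plain-Python sketch (no real message passing; the process count, message size, and the use of + as the associative operator are assumptions chosen for the example) shows the before/after state of the buffers.

```python
# Sketch: what the two operations produce ("processes" are just list entries here).
p, m = 4, 3                                  # assumed: 4 processes, m = 3 words

# One-to-all broadcast: only the source (process 0) holds the data beforehand;
# afterwards every process owns its own m-word copy.
source_data = [10, 20, 30]
after_broadcast = [list(source_data) for _ in range(p)]
print(after_broadcast)

# All-to-one reduction: every process holds m words; they are combined element-wise
# with an associative operator (+ here) into a single m-word buffer at process 0.
data = [[i + 1] * m for i in range(p)]       # assumed inputs
reduced_at_0 = [sum(words) for words in zip(*data)]
print(reduced_at_0)                          # (1+2+3+4) per word -> [10, 10, 10]
```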
One-to-All Broadcast and All-to-One Reduction
• Application: Used in many parallel algorithms including matrix-vector
multiplication, shortest path, Gaussian Elimination.
• How it works (naive approach): sequentially send p-1 messages from the source to the other p-1 processes
• Disadvantages:
• Source becomes bottleneck
• The communication network is underutilized because only the connection
between a single pair of nodes is used at a time
• Solution: Recursive Doubling
Recursive doubling (Linear Array or Ring)
Recursive Doubling Broadcast
• The source process sends the message to another process
• In the next communication phase both processes can simultaneously propagate the message
• Message “HI” from the source node P0 is passed to all other nodes in the ring in the following three steps (sketched in code after the list):
1. P0 to P4 (Distance:4)
2. P0 to P2, P4 to P6, in parallel (Distance:2)
3. P0 to P1, P2 to P3, P4 to P5, P6 to P7, in parallel (Distance:1)
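The schedule above can be generated mechanically; the sketch below (plain Python, assuming p = 8 and source P0 as in the example) prints who sends to whom in each step.

```python
# Sketch: recursive-doubling one-to-all broadcast schedule for p = 8, source P0.
# At each step, every node that already has the message sends it "distance" nodes away.
p, source = 8, 0                         # assumptions taken from the slide's example
have = {source}
distance = p // 2
step = 1
while distance >= 1:
    sends = [(s, (s + distance) % p) for s in sorted(have)]
    for s, d in sends:
        have.add(d)
    print(f"Step {step} (distance {distance}): " +
          ", ".join(f"P{s}->P{d}" for s, d in sends))
    distance //= 2
    step += 1
# Step 1 (distance 4): P0->P4
# Step 2 (distance 2): P0->P2, P4->P6
# Step 3 (distance 1): P0->P1, P2->P3, P4->P5, P6->P7
```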
Recursive doubling (Linear Array or Ring)
Recursive Doubling Reduction
Example: sum of all numbers (the broadcast steps are reversed, combining partial results at each step)
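A minimal sketch of the dual operation, assuming p = 8 nodes that each hold one number and + as the operator: the broadcast steps are replayed in reverse, accumulating partial sums toward P0.

```python
# Sketch: all-to-one reduction (sum) by reversing the recursive-doubling broadcast.
# Each node starts with one number; after log2(p) steps P0 holds the total.
p = 8
value = {i: i + 1 for i in range(p)}      # assumed inputs: P_i holds i + 1
distance = 1
while distance < p:
    # Nodes whose id is an odd multiple of `distance` send to the node `distance` below.
    for src in range(distance, p, 2 * distance):
        dst = src - distance
        value[dst] += value[src]          # combine with the associative operator (+)
        del value[src]
    distance *= 2
print(value)                              # {0: 36} -- the sum 1 + 2 + ... + 8 at P0
```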
Mesh
• We can regard each row and column of a square mesh of p nodes as a linear array of √p nodes
• Communication algorithms on the mesh are simple extensions of their linear-array counterparts
[Figure: one-to-all broadcast of “HI” from node 0 on a 4 × 4 mesh, nodes 0-15 numbered so that node id = 4 × column + row.
Step 1 (0th row recursive doubling): “HI” reaches node 8.
Step 2 (0th row recursive doubling): “HI” reaches nodes 4 and 12; the whole 0th row (0, 4, 8, 12) now has it.
Step 3 (all-column recursive doubling): each 0th-row node sends two rows up; rows 0 and 2 now have “HI”.
Step 4 (all-column recursive doubling): all 16 nodes have “HI”.
Reduction: the same steps are shown in reverse, combining messages until node 0 alone holds the result.]
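The following sketch reproduces the four steps of the figure, assuming the node numbering shown there (node id = 4 × column + row) and node 0 as the source.

```python
# Sketch: one-to-all broadcast on a 4 x 4 mesh, reproducing the figure's four steps.
# Node ids follow the slide (id = 4*column + row, node 0 in the lower-left corner).
P = 4                                      # mesh is P x P with P = 4 (from the figure)
have = {0}                                 # "HI" starts at the source, node 0

def doubling_sends(owners, dist, along_row):
    """Senders forward the message `dist` positions along the 0th row or up a column."""
    sends = []
    for n in owners:
        row, col = n % P, n // P
        if along_row and col + dist < P:
            sends.append((n, n + dist * P))   # move right along the 0th row
        if not along_row and row + dist < P:
            sends.append((n, n + dist))       # move up along each column
    return sends

step = 0
for along_row, dist in [(True, 2), (True, 1), (False, 2), (False, 1)]:
    step += 1
    for src, dst in doubling_sends(have, dist, along_row):
        have.add(dst)
    print(f"Step {step}: nodes holding 'HI' ->", sorted(have))
# Step 1 -> {0, 8}; Step 2 -> {0, 4, 8, 12};
# Step 3 -> rows 0 and 2; Step 4 -> all 16 nodes.
```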
Mesh (Broadcast and Reduction)
Hypercube
Broadcast
• The source node first sends the data to one node along the highest dimension
• Communication successively proceeds along lower dimensions in subsequent steps
• The algorithm is the same as that used for the linear array
• However, in the hypercube, changing the order of the dimensions does not congest the network
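A short sketch of the resulting schedule, assuming an 8-node (3-dimensional) hypercube with source node 0, flipping the highest address bit first.

```python
# Sketch: one-to-all broadcast on a d-dimensional hypercube (p = 2**d nodes).
# The source flips the highest dimension first, then successively lower dimensions.
d, source = 3, 0                           # assumed: 8-node hypercube, source node 0
have = {source}
for step, dim in enumerate(reversed(range(d)), start=1):
    sends = [(n, n ^ (1 << dim)) for n in sorted(have)]
    for src, dst in sends:
        have.add(dst)
    print(f"Step {step} (dimension {dim}): " +
          ", ".join(f"{src}->{dst}" for src, dst in sends))
# Step 1 (dimension 2): 0->4
# Step 2 (dimension 1): 0->2, 4->6
# Step 3 (dimension 0): 0->1, 2->3, 4->5, 6->7
```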
Hypercube (Broadcast)
Matrix-Vector Multiplication (An
Application)
All-to-All Broadcast and All-to-All
Reduction
• All-to-All Broadcast
• A generalization of one-to-all broadcast.
• Every process broadcasts an m-word message.
• The broadcast message of each process can be different from those of the others
• All-to-All Reduction
• Dual of all-to-all broadcast
• Each node is the destination of one of the p all-to-one reductions performed in total.
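To illustrate the end result of both operations (independent of any topology), here is a plain-Python sketch; the process count, message size, and the example inputs are assumptions.

```python
# Sketch: result of all-to-all broadcast and all-to-all reduction (no real routing).
p, m = 4, 2

# All-to-all broadcast: node i starts with its own m-word message and ends with all p.
msg = {i: [i * 10 + k for k in range(m)] for i in range(p)}   # assumed inputs
after_bcast = {i: [msg[j] for j in range(p)] for i in range(p)}
print(after_bcast[0])     # node 0 now holds the messages of all four nodes

# All-to-all reduction: node i starts with p buffers (one destined for each node j);
# afterwards node j holds the element-wise sum of the p buffers destined for it.
contrib = {i: {j: [i + j] * m for j in range(p)} for i in range(p)}
after_reduce = {j: [sum(contrib[i][j][k] for i in range(p)) for k in range(m)]
                for j in range(p)}
print(after_reduce)       # e.g. node 0 gets (0+1+2+3) per word -> [6, 6]
```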
Linear Ring Broadcast (All to All)
Linear Ring Reduction (All to All)
• Draw an all-to-all broadcast on a p-node linear ring
• Reverse the direction of each step without changing the messages
• After each communication step, combine messages having the same destination with the associative operator.
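A sketch of the broadcast phase on an assumed 4-node ring: in each of the p-1 steps, every node forwards the message it received in the previous step to its right neighbour.

```python
# Sketch: all-to-all broadcast on a p-node ring (p - 1 steps).
# In every step each node sends the most recently received message to its right
# neighbour, so all p messages circulate around the ring once.
p = 4                                           # assumed ring size
received = {i: [f"m{i}"] for i in range(p)}     # node i starts with its own message m_i
outgoing = {i: f"m{i}" for i in range(p)}       # message each node forwards next

for step in range(1, p):
    incoming = {}
    for i in range(p):
        right = (i + 1) % p
        incoming[right] = outgoing[i]           # single-port: one send, one receive
    for i in range(p):
        received[i].append(incoming[i])
        outgoing[i] = incoming[i]               # forward what was just received
    print(f"After step {step}:", received)
# After p - 1 = 3 steps every node holds all four messages m0 .. m3.
```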
Task
• Draw an All-to-All Broadcast on a 4-node linear ring
• Reverse the directions and combine the results using ‘SUM’
All-to-All Broadcast on 2D Mesh
• Based on the linear-array algorithm, treating rows and columns of the mesh as linear arrays
• Communication takes place in two phases:
• Row-wise all-to-all broadcast
• Column-wise all-to-all broadcast
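The sketch below shows only the outcome of each phase (not the individual ring steps inside rows and columns), assuming a 4 × 4 mesh with row-major node numbering.

```python
# Sketch: all-to-all broadcast on a sqrt(p) x sqrt(p) mesh in two phases.
import math

p = 16                                     # assumed: 4 x 4 mesh, nodes numbered row-major
side = int(math.isqrt(p))

# Phase 1: row-wise all-to-all broadcast -- each node ends up with the messages
# (here just the node ids) of every node in its row.
after_rows = {i: {(i // side) * side + c for c in range(side)} for i in range(p)}

# Phase 2: column-wise all-to-all broadcast -- each node gathers the phase-1
# bundles of every node in its column.
after_cols = {i: set().union(*(after_rows[(i % side) + r * side] for r in range(side)))
              for i in range(p)}

assert all(after_cols[i] == set(range(p)) for i in range(p))
print(sorted(after_cols[5]))               # every node ends with all 16 messages
```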
All-to-All Broadcast on HyperCube
• The hypercube algorithm for all-to-all broadcast extends
the mesh algorithm to log p dimensions.
• Procedure: Requires log p steps.
• Communication: Occurs along a different dimension (x, y,
z) of the p-node hypercube in each step.
• Step Process: Pairs of nodes exchange data, doubling the
message size for the next step by concatenating received
messages with current data.
• Figure: illustrates these steps for an eight-node hypercube with bidirectional communication channels.
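A compact sketch of this doubling behaviour, assuming an 8-node hypercube where node i initially holds only its own message.

```python
# Sketch: all-to-all broadcast on a d-dimensional hypercube (log p steps).
# In step k, partners that differ in bit k exchange their accumulated data,
# so the amount of data each node holds doubles every step.
d = 3
p = 2 ** d
acc = {i: [i] for i in range(p)}          # node i starts with its own message i

for dim in range(d):                      # one dimension (x, y, z) per step
    nxt = {}
    for i in range(p):
        partner = i ^ (1 << dim)
        nxt[i] = acc[i] + acc[partner]    # exchange and concatenate
    acc = nxt
    print(f"After step {dim + 1}: node 0 holds {sorted(acc[0])}")

# After log2(8) = 3 steps every node holds all eight messages.
assert all(sorted(acc[i]) == list(range(p)) for i in range(p))
```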
All-Reduce
• All-Reduce: all-to-one reduction + one-to-all broadcast
• Can be performed as an all-to-one reduction followed by a one-to-all broadcast
• Alternatively, it can use the communication pattern of all-to-all broadcast, combining messages instead of concatenating them, which yields the same reduced result at every node with less traffic
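A sketch of the second variant, assuming an 8-node hypercube, messages of m = 2 words, and + as the operator: the all-to-all broadcast pattern is followed, but partners' buffers are combined rather than concatenated, so they stay m words long.

```python
# Sketch: all-reduce (sum) using the all-to-all broadcast pattern on a hypercube,
# combining messages instead of concatenating them, so they stay m words long.
d, m = 3, 2
p = 2 ** d
buf = {i: [i + 1] * m for i in range(p)}   # assumed inputs: node i holds [i+1]*m

for dim in range(d):
    nxt = {}
    for i in range(p):
        partner = i ^ (1 << dim)
        # Combine with the associative operator (+) instead of concatenating.
        nxt[i] = [a + b for a, b in zip(buf[i], buf[partner])]
    buf = nxt

print(buf[0])   # every node ends with [1+2+...+8, ...] = [36, 36]
assert all(buf[i] == [36] * m for i in range(p))
```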
Example
• Process 2
• Step 1 (x-axis): 2 <-> 3
• (2,3), (2,7), (2,5), (2,1)