PDC - CO1 - Basic Operations & Cost Analysis

The document outlines basic communication operations and cost analysis in parallel and distributed computing, emphasizing the importance of efficient communication patterns among processes. Key operations discussed include one-to-all broadcast, all-to-one reduction, scatter, gather, and all-to-all communication, along with their implementations on various architectures. Additionally, it covers cost analysis methods for estimating the performance of parallel execution, focusing on optimizing resource allocation and calculating parallel costs.


CO - 1

COURSE NAME : PARALLEL & DISTRIBUTED COMPUTING


COURSE CODE : 22CS4106

TOPICS : BASIC COMMUNICATION OPERATIONS AND COST ANALYSIS.
BASIC COMMUNICATION
OPERATIONS
BASIC COMMUNICATION
OPERATIONS
• Many interactions in practical parallel programs occur in
well-defined patterns involving more than two processes.
• Often either all processes participate together in a single global interaction operation, or subsets of processes participate in interactions local to each subset.
• These common, basic patterns of inter-process communication are frequently used as building blocks in a variety of parallel programs.

BASIC COMMUNICATION OPERATIONS
• Proper implementation of these basic communication operations on various
parallel architectures is a key to the efficient execution of the parallel algorithms
that use them.
• The following basic communication operations are commonly used on various
parallel architectures:
• One-to-all broadcast and all-to-one reduction
• All-to-all broadcast and reduction
• All-reduce operations
• Prefix-sum operations
• Scatter and gather
• All-to-all personalized communication

ONE-TO-ALL BROADCAST AND ALL-TO-ONE REDUCTION

• One processor has a piece of data (of size m) it needs to send to everyone.
• The dual of one-to-all broadcast is all-to-one reduction.
• In all-to-one reduction, each processor has m units of data. These data items
must be combined piece-wise (using some associative operator, such as
addition or min), and the result made available at a target processor.

ONE-TO-ALL BROADCAST AND ALL-TO-ONE REDUCTION ON RINGS

• The simplest way is to send p-1 separate messages from the source to the other p-1 processors; this is not very efficient.
• Use recursive doubling: the source sends the message to a selected processor. We now have two independent problems defined over halves of the machine (a code sketch follows below).
• Reduction can be performed in an identical fashion by inverting the process.
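A minimal sketch of the recursive-doubling broadcast in C with MPI, assuming the number of processes p is a power of two and rank 0 is the source; this is an illustrative point-to-point implementation, not the library routine MPI_Bcast.

#include <mpi.h>

void recursive_doubling_bcast(void *buf, int count, MPI_Datatype type, MPI_Comm comm)
{
    int rank, p;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);

    int have_data = (rank == 0);            /* only the source starts with the data */
    for (int dist = p / 2; dist >= 1; dist /= 2) {
        if (have_data) {
            /* forward the message halfway across the remaining sub-ring */
            MPI_Send(buf, count, type, (rank + dist) % p, 0, comm);
        } else if (rank % dist == 0) {
            /* this rank receives now and acts as a new source in later steps */
            MPI_Recv(buf, count, type, (rank - dist + p) % p, 0, comm,
                     MPI_STATUS_IGNORE);
            have_data = 1;
        }
    }
}

Each of the log p steps doubles the number of processes holding the message, which is the behaviour illustrated in the ring figure on the next slide.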

ONE-TO-ALL BROADCAST

One-to-all broadcast on an eight-node ring.


Node 0 is the source of the broadcast. Each
message transfer step is shown by a
numbered, dotted arrow from the source of
the message to its destination. The number
on an arrow indicates the time step during
which the message is transferred.

ALL-TO-ONE REDUCTION

Reduction on an eight-node
ring with node 0 as the
destination of the reduction.

ALL-TO-ALL BROADCAST ON A MESH

All-to-all broadcast on a 3 x 3
mesh. The groups of nodes
communicating with each other in
each phase are enclosed by dotted
boundaries. By the end of the
second phase, all nodes get
(0, 1, 2, 3, 4, 5, 6, 7, 8) (that is, a
message from each node).

ALL-TO-ALL REDUCTION

• Similar communication pattern to all-to-all broadcast, except in the reverse order.
• On receiving a message, a node must combine it with the local copy of the message that has the same destination as the received message before forwarding the combined message to the next neighbor.

ALL-TO-ALL BROADCAST AND
REDUCTION ON A RING

• Simplest approach: perform p one-to-all broadcasts. This is not the most efficient way, though.
• Each node first sends to one of its neighbors the data it needs to broadcast.
• In subsequent steps, it forwards the data received from one of its neighbors to its other neighbor.
• The algorithm terminates in p-1 steps (a code sketch follows the figure caption below).

All-to-all broadcast on an eight-node ring.
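A hedged sketch of this ring algorithm in C with MPI, assuming each process contributes a single int and that all is an int array of length p; the function and variable names are illustrative, not from the slides.

void ring_all_to_all_bcast(int my_msg, int *all, MPI_Comm comm)
{
    int rank, p;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);

    int left  = (rank - 1 + p) % p;
    int right = (rank + 1) % p;

    all[rank] = my_msg;      /* start with the local message */
    int send_idx = rank;     /* index of the message to forward next */

    for (int step = 0; step < p - 1; step++) {
        int recv_idx = (send_idx - 1 + p) % p;   /* message originated at that rank */
        /* send the most recently obtained message right, receive a new one from the left */
        MPI_Sendrecv(&all[send_idx], 1, MPI_INT, right, 0,
                     &all[recv_idx], 1, MPI_INT, left,  0,
                     comm, MPI_STATUS_IGNORE);
        send_idx = recv_idx;
    }
}

After p-1 steps every process holds all p messages, matching the termination bound stated above.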

BROADCAST AND REDUCTION ON A
MESH
• We can view each row and column of a
square mesh of p nodes as a linear array of
√p nodes.
• Broadcast and reduction operations can be performed in two steps - the first step does the operation along a row and the second step along each column concurrently (see the sketch below).
• This process generalizes to higher
dimensions as well.
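A hedged sketch of the two-phase idea using MPI sub-communicators. It assumes p is a perfect square, the source is global rank 0 (row 0, column 0), MPI has already been initialized, and buf/count name the broadcast buffer and its length; these names are illustrative.

#include <math.h>

int rank, p;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &p);

int side = (int)(sqrt((double)p) + 0.5);   /* mesh is side x side */
int row = rank / side, col = rank % side;

MPI_Comm row_comm, col_comm;
MPI_Comm_split(MPI_COMM_WORLD, row, col, &row_comm);  /* processes in the same row */
MPI_Comm_split(MPI_COMM_WORLD, col, row, &col_comm);  /* processes in the same column */

/* phase 1: broadcast along the source's row (local rank 0 is column 0) */
if (row == 0)
    MPI_Bcast(buf, count, MPI_INT, 0, row_comm);
/* phase 2: every column broadcasts concurrently from its row-0 node */
MPI_Bcast(buf, count, MPI_INT, 0, col_comm);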

BROADCAST AND REDUCTION ON A
BALANCED BINARY TREE

• Consider a binary tree in which processors are (logically) at the leaves and internal nodes are routing nodes.
• Assume that the source processor is the root of this tree. In the first step, the source sends the data to the right child (assuming the source is also the left child). The problem has now been decomposed into two subproblems, each covering half the nodes.

One-to-all broadcast on an eight-node tree.
POINT-TO-POINT COMMUNICATION

• Send/Receive:
• A process sends a message to another process and the receiving
process acknowledges it.
• Used for direct communication between two processes.
• Examples: MPI (Message Passing Interface) MPI_Send and
MPI_Recv.
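A minimal sketch of this pattern in C with MPI; the tag value 0 and the integer payload are arbitrary illustrative choices, and at least two processes are assumed to be launched.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 42;
    if (rank == 0) {
        /* rank 0 sends one integer to rank 1 with tag 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* rank 1 blocks until the matching message arrives */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}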

BROADCAST

• A single process sends the same message to all other processes in the system.
• Common in tasks where a root process distributes data to others.
• Example: MPI MPI_Bcast.
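A short hedged fragment; it assumes MPI has been initialized and rank obtained via MPI_Comm_rank, as in the point-to-point example above.

int data[4] = {0};
if (rank == 0) {                       /* only the root fills the buffer */
    data[0] = 1; data[1] = 2; data[2] = 3; data[3] = 4;
}
/* every process calls MPI_Bcast; root 0 sends, all other ranks receive */
MPI_Bcast(data, 4, MPI_INT, 0, MPI_COMM_WORLD);
/* after the call, data[] holds {1, 2, 3, 4} on every rank */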

SCATTER & GATHER
• Scatter
• A process divides data into chunks and sends each chunk to
different processes.
• Used when a large dataset needs to be distributed across
processes.
• Gather
• A process collects data from all other processes into a single
process.
• Example: MPI MPI_Gather.
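A hedged fragment combining both calls. It assumes p processes, an initialized MPI environment, and that sendbuf and recvbuf are int arrays of length p allocated on rank 0; those names are illustrative, not from the slides.

int chunk;
/* root 0 hands one int from sendbuf to every process (including itself) */
MPI_Scatter(sendbuf, 1, MPI_INT, &chunk, 1, MPI_INT, 0, MPI_COMM_WORLD);

chunk *= 2;   /* each process works on its own chunk */

/* root 0 collects the processed chunks back into recvbuf, in rank order */
MPI_Gather(&chunk, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);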
ALL-TO-ALL COMMUNICATION

• All processes send data to every other process.


• Used in dense communication patterns.
• Example: MPI MPI_Alltoall.
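A hedged fragment; it assumes sendbuf and recvbuf are int arrays of length p on every process, with element j of sendbuf destined for rank j. The buffer names are illustrative.

/* each process sends sendbuf[j] to rank j and receives recvbuf[i] from rank i */
MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);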

REDUCE

• Data from multiple processes is combined using a specific operation (e.g., sum, max).
• Results are sent to a single process.
• Example: MPI MPI_Reduce.
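A hedged fragment; it assumes an initialized MPI environment and a known rank, as in the earlier examples.

int local = rank + 1;   /* each process contributes one value */
int total = 0;
/* element-wise sum across all processes, delivered only to rank 0 */
MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
/* on rank 0, total now equals p*(p+1)/2; other ranks leave total unchanged */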

COST ANALYSIS IN COMMUNICATION

• Cost analysis in parallel and distributed computing is the process of estimating the cost of parallel execution in distributed systems.
• It's different from serial cost analysis because it takes into account the cost of synchronized tasks that are running in parallel.
• Here are some aspects of cost analysis in parallel and distributed computing:

COST ANALYSIS IN COMMUNICATION
• Parallel cost analysis

• This static cost analysis method uses three phases to estimate the cost
of parallel execution:
• Block-level analysis: Estimates the serial costs of blocks between
synchronization points
• Distributed flow graph (DFG) construction: Captures the
parallelism, waiting, and idle times in the distributed system
• Parallel cost calculation: The parallel cost is the path with the highest
cost in the DFG

COST ANALYSIS IN COMMUNICATION

• Optimizing cost
• Optimizing cost can improve system performance and user experience by allocating resources efficiently. This involves analyzing performance metrics like latency and throughput to identify areas where resources can be minimized.
• Cost calculation
• The cost of parallel computing is the product of the number of processing elements used and the parallel runtime. This reflects the total time spent by all processing elements solving the problem.
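As a hypothetical worked example: if p = 8 processing elements finish in a parallel runtime of Tp = 5 seconds, the parallel cost is 8 × 5 = 40 processor-seconds; comparing this product with the best serial runtime Ts indicates whether the parallel formulation is cost-optimal.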

WORKING PROCESS OF COST ANALYSIS
• Parallel cost analysis works in three phases:
• (1) it performs a block-level analysis to estimate the serial costs of the blocks
between synchronization points in the program;
• (2) it then constructs a distributed flow graph (DFG) to capture the parallelism,
the waiting, and idle times at the locations of the distributed system;
• (3) the parallel cost can finally be obtained as the path of maximal cost in the DFG. The correctness of this parallel cost analysis has been proven, and prototype implementations have been used to evaluate its accuracy and feasibility experimentally.

THANK YOU

Team – Parallel & Distributed Computing
