SAVITRIBAI PHULE PUNE UNIVERSITY

Faculty Orientation Program


On
High Performance Computing
(2019 Course)
Unit III : Parallel Communication
Rushali Patil
Assistant Professor
Army Institute of Technology
Course Objective and Outcome
Course Objective
To illustrate the various techniques used to parallelize algorithms
Course Outcome
Illustrate data communication operations on various parallel architectures
Reference Book
“Introduction to Parallel Computing” by Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar
Syllabus
Basic Communication:
One-to-All Broadcast
All-to-One Reduction
All-to-All Broadcast and Reduction
All-Reduce and Prefix-Sum Operations
All-to-All Personalized Communication
Improving the speed of some communication operations
Circular Shift
Principles of Message Passing Programming
Blocking and Non-Blocking MPI
Collective Communication using MPI:
Barrier
Broadcast
Reduce
Scatter
Gather


Basic Communication Operations:
Introduction
Many interactions in practical parallel programs occur in well-defined patterns involving groups of processors
Efficient implementations of these operations can improve performance, reduce development effort and cost, and improve software quality
Efficient implementations must leverage the underlying architecture; for this reason, we refer to specific architectures here
The time required to communicate a message of size m over an uncongested network is ts + tw·m (startup time plus per-word transfer time). This time is used as the basis for the analyses that follow
One-to-All Broadcast and
All-to-One Reduction

One-to-All Broadcast: a single process has a piece of data (of size m) that it needs to send to all other processes
All-to-One Reduction: each participating process has data of size m; these data must be combined through an associative operator and accumulated at a single target process
Applications:
matrix-vector multiplication
Gaussian elimination
shortest paths
vector inner product
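
In MPI these two patterns map directly onto MPI_Bcast and MPI_Reduce. A minimal sketch, assuming a single integer payload and rank 0 as the source/target (values are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value = 0, sum = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* one-to-all broadcast: rank 0 sends 'value' to every process */
    if (rank == 0) value = 42;
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* all-to-one reduction: every process contributes its rank and
       the sum is accumulated at rank 0 */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("value = %d, sum of ranks = %d\n", value, sum);

    MPI_Finalize();
    return 0;
}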
One-to-All Broadcast and
All-to-One Reduction

(Figure: one-to-all broadcast sends message M from a single source to processes 0 … p-1; all-to-one reduction is the reverse, accumulating a combined message at a single process.)


One-to-All Broadcast and
All-to-One Reduction on Rings
The simplest way is to send p-1 messages from the source to the other p-1 processes
This is inefficient: the source process becomes a bottleneck
Recursive doubling:
The source process first sends the message to one other process
Now both of these processes can send the message to two other processes, and so on
The message can be broadcast in log p steps
Reduction can be performed in an identical fashion by inverting the process (see the sketch below)
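
A minimal sketch of this recursive-doubling pattern written with MPI point-to-point calls, assuming rank 0 is the source and a one-integer message (in practice one would simply call MPI_Bcast):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, p, step, data = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (rank == 0) data = 99;            /* the source holds the message */

    /* in step k, every process with rank < 2^k that already has the data
       forwards it to the process 2^k positions away, doubling the number
       of processes that hold the message */
    for (step = 1; step < p; step *= 2) {
        if (rank < step && rank + step < p)
            MPI_Send(&data, 1, MPI_INT, rank + step, 0, MPI_COMM_WORLD);
        else if (rank >= step && rank < 2 * step)
            MPI_Recv(&data, 1, MPI_INT, rank - step, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    printf("process %d has %d\n", rank, data);
    MPI_Finalize();
    return 0;
}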
One-to-All Broadcast on Rings
Node 0 is the source of the broadcast. Each message
transfer step is shown by a numbered, dotted arrow from
the source of the message to its destination
(Figure: eight-node ring, nodes 0-7; the dotted arrows labeled 1, 2, 3 show the three message-transfer steps starting at node 0.)
All-to-One Reduction on Rings
All-to-one reduction on an eight-node ring with node 0 as the destination of the reduction
(Figure: the arrows labeled 1, 2, 3 show the three reduction steps on nodes 0-7, accumulating the result at node 0.)
Broadcast and Reduction: Example
Consider the problem of multiplying a matrix with a vector:
The n × n matrix is assigned to an n × n (virtual) processor grid
The vector is assumed to be on the first row of processors
The first step of the product requires a one-to-all broadcast of the
vector element along the corresponding column of processors
Each processor computes the local product of the vector element and its local matrix entry
In the final step, the results of these products are accumulated to
the first row using n concurrent all-to-one reduction operations
along the columns (using the sum operation)
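
A sketch of this scheme using column communicators, assuming p = n × n processes in a row-major grid, one matrix element per process, and illustrative local values (names such as a and x are placeholders):

#include <mpi.h>
#include <stdio.h>
#include <math.h>

int main(int argc, char *argv[])
{
    int rank, p, n, row, col;
    double a, x = 0.0, prod, result;
    MPI_Comm col_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    n = (int)(sqrt((double)p) + 0.5);     /* assume p is a perfect square */
    row = rank / n;
    col = rank % n;

    a = rank + 1.0;                       /* local matrix entry (dummy data) */
    if (row == 0) x = col + 1.0;          /* vector element held by the first row */

    /* one communicator per column, with processes ordered by row */
    MPI_Comm_split(MPI_COMM_WORLD, col, row, &col_comm);

    /* one-to-all broadcast of the vector element down each column */
    MPI_Bcast(&x, 1, MPI_DOUBLE, 0, col_comm);

    prod = a * x;                         /* local product */

    /* n concurrent all-to-one reductions back to the first row */
    MPI_Reduce(&prod, &result, 1, MPI_DOUBLE, MPI_SUM, 0, col_comm);

    if (row == 0) printf("column %d accumulated %f\n", col, result);

    MPI_Comm_free(&col_comm);
    MPI_Finalize();
    return 0;
}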



Broadcast and Reduction: Matrix-Vector
Multiplication Example
(Figure: One-to-all broadcast and all-to-one reduction in the multiplication of a 4 × 4 matrix with a 4 × 1 vector, shown on a 4 × 4 grid of processors P0-P15.)
Broadcast and Reduction on a Mesh
We can view each row and column of a square mesh of
p nodes as a linear array of √p nodes
Broadcast and reduction operations can be performed in two steps:
the first step does the operation along a row, and
the second step does the same operation along each column concurrently
This process generalizes to higher dimensions as well



Broadcast and Reduction on a Mesh:
Example
One-to-all broadcast on a 16-node mesh

(Figure: nodes 0-15; steps 1 and 2 spread the message along the first row, steps 3 and 4 spread it along all columns concurrently.)
Broadcast and Reduction on a Hypercube
A hypercube with 2^d nodes can be regarded as a d-dimensional mesh with two nodes in each dimension
The mesh algorithm can be generalized to a hypercube, and the operation is carried out in d (= log p) steps
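
A sketch of the hypercube broadcast with node 0 as the source, assuming p = 2^d processes and a one-integer message; in step i the partner is found by flipping bit i of the node label:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, p, d, i, mask, partner, data = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    for (d = 0; (1 << d) < p; d++) ;      /* d = log2(p), assuming p is a power of two */
    if (rank == 0) data = 7;              /* node 0 is the source */

    mask = p - 1;                         /* all d bits set */
    for (i = d - 1; i >= 0; i--) {
        mask ^= (1 << i);                 /* clear bit i of the mask */
        if ((rank & mask) == 0) {         /* only nodes whose lower i bits are 0 take part */
            partner = rank ^ (1 << i);
            if ((rank & (1 << i)) == 0)
                MPI_Send(&data, 1, MPI_INT, partner, 0, MPI_COMM_WORLD);
            else
                MPI_Recv(&data, 1, MPI_INT, partner, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }
    }

    printf("node %d received %d\n", rank, data);
    MPI_Finalize();
    return 0;
}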



Broadcast and Reduction on a Hypercube:
Example
(Figure: one-to-all broadcast on a three-dimensional hypercube; the message spreads from node 0 in 3 steps. The binary representations of node labels, (000) through (111), are shown in parentheses.)
Broadcast and Reduction on a Balanced
Binary Tree
Consider a binary tree in which processors are
(logically) at the leaves and internal nodes are routing
nodes
Assume that the source processor is the root of this tree.
In the first step, the source sends the data to the right
child (assuming the source is also the left child). The
problem has now been decomposed into two problems
with half the number of processors.



Broadcast and Reduction on a Balanced
Binary Tree
(Figure: the message travels from the root through the routing nodes in steps 1, 2, 3 to reach leaf processors 0-7.)
One-to-all broadcast on an eight-node tree


Cost Analysis
Assume that p processes participate in the operation and the data to be broadcast or reduced contains m words
The broadcast or reduction procedure involves log p point-to-point simple message transfers, each at a time cost of ts + tw·m
Therefore, the total time taken by the procedure is
T = (ts + tw·m) log p



All-to-All Broadcast and Reduction
Generalization of one-to-all broadcast in which each
process is the source as well as destination
A process sends the same m-word message to every
other process, but different processes may broadcast
different messages
All-to-all broadcast is used in matrix operations
All-to-all Reduction: It is the dual of all-to-all
broadcast



All-to-All Broadcast and Reduction
(Figure: in all-to-all broadcast, each process i starts with its own message Mi and finishes with all messages M0 … Mp-1; all-to-all reduction is the reverse operation.)



All-to-All Broadcast on Rings
1st communication step

(Figure: in step 1 each node i of the eight-node ring sends its own message (i) to its neighbor.)



All-to-All Broadcast on Rings
2nd communication step

(Figure: in step 2 each node forwards the message it received in step 1; after this step every node holds three of the eight messages.)



All-to-All Broadcast on Rings
7th communication step

(Figure: in step 7, the last of the p-1 = 7 steps, each node receives its final missing message; every node then holds all eight messages (0, 1, …, 7).)



All-to-All Broadcast on Mesh
(Figure: all-to-all broadcast on a 3 × 3 mesh, nodes 0-8. (a) Initial data distribution: node i holds message (i). (b) Data distribution after the rowwise broadcast phase: every node in a row holds all the messages of its row, e.g. (0,1,2), (3,4,5), (6,7,8).)



Cost Analysis
On a ring, the time is given by:
T = (ts + tw·m)(p-1)
On a mesh, the time is given by:
T = 2 ts (√p - 1) + tw·m (p-1)
On a hypercube, we have:
T = ts log p + tw·m (p-1)
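
A small helper that evaluates these cost models; ts and tw stand for the startup and per-word transfer times, and the numeric values below are purely illustrative:

#include <stdio.h>
#include <math.h>

/* all-to-all broadcast cost models from the analysis above */
double ring_cost(double ts, double tw, double m, double p)      { return (ts + tw * m) * (p - 1); }
double mesh_cost(double ts, double tw, double m, double p)      { return 2 * ts * (sqrt(p) - 1) + tw * m * (p - 1); }
double hypercube_cost(double ts, double tw, double m, double p) { return ts * log2(p) + tw * m * (p - 1); }

int main(void)
{
    double ts = 100.0, tw = 1.0, m = 64.0, p = 64.0;   /* illustrative units */
    printf("ring: %.0f  mesh: %.0f  hypercube: %.0f\n",
           ring_cost(ts, tw, m, p), mesh_cost(ts, tw, m, p), hypercube_cost(ts, tw, m, p));
    return 0;
}

Note that the tw·m(p-1) term is the same on all three topologies; the richer topologies only reduce the startup (ts) term.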



All-Reduce
In all-reduce, each node starts with a buffer of size m and the
final results of the operation are identical buffers of size m on
each node that are formed by combining the original p buffers
using an associative operator.
It is semantically identical to an all-to-one reduction followed by a one-to-all broadcast, but that formulation is not the most efficient. A better implementation uses the pattern of all-to-all broadcast instead; the only difference is that the message size does not grow along the way. The time for this operation is (ts + tw·m) log p
Different from all-to-all reduction, in which p simultaneous all-
to-one reductions take place, each with a different destination
for the result.
The Prefix-Sum Operation
Given p numbers n0, n1, …, np-1 (one on each node), the problem is to compute the sums sk = n0 + n1 + … + nk (that is, sk = Σ i=0..k ni) for all k between 0 and p-1
Initially, nk resides on the node labeled k, and at the end of the procedure the same node holds sk
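
MPI provides this operation directly as MPI_Scan; a minimal sketch with one integer per process:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, nk, sk;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    nk = rank + 1;                        /* the number initially held by node k */

    /* inclusive prefix sum: node k receives n0 + n1 + ... + nk */
    MPI_Scan(&nk, &sk, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("node %d: prefix sum = %d\n", rank, sk);
    MPI_Finalize();
    return 0;
}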


Scatter and Gather
In the scatter operation, a single node sends a unique message of
size m to every other node (also called a one-to-all personalized
communication).
In the gather operation, a single node collects a unique message
from each node.
While the scatter operation is fundamentally different from
broadcast, the algorithmic structure is similar, except for
differences in message sizes (messages get smaller in scatter and
stay constant in broadcast).
The gather operation is exactly the inverse of the scatter operation



Scatter and Gather
(Figure: in scatter, node 0 starts with messages M0, M1, …, Mp-1 and delivers Mi to node i; gather is the reverse, collecting one message from each node at node 0.)



All-to-All Personalized Communication
Each node has a distinct message of size m for every other node
This is unlike all-to-all broadcast, in which each node sends the same message to all other nodes
All-to-all personalized communication is also known as total exchange
Applications:
Fast Fourier transform
Matrix transpose
Sample sort
Improving Speed of Some Communication
Operations
1. Splitting and Routing Messages in Parts
   One-to-All Broadcast: scatter followed by all-to-all broadcast
   All-to-One Reduction: all-to-all reduction followed by gather
   All-Reduce: all-to-one reduction followed by one-to-all broadcast
2. All-Port Communication
MPI: Message Passing Interface
MPI: the Message Passing Interface
MPI defines a standard library for message-passing that
can be used to develop portable message-passing
programs using either C or Fortran.
The MPI standard defines both the syntax as well as the
semantics of a core set of library routines.
Vendor implementations of MPI are available on almost all commercial parallel computers.
It is possible to write fully-functional message-passing programs using only six routines.
Starting and Terminating the MPI Library
MPI_Init is called prior to any calls to other MPI routines. Its purpose is to initialize the MPI environment.
MPI_Finalize is called at the end of the computation, and it performs various clean-up tasks to terminate the MPI environment.
The prototypes of these two functions are:
int MPI_Init(int *argc, char ***argv)
int MPI_Finalize()
All MPI routines, data types, and constants are prefixed by "MPI_". The return code for successful completion is MPI_SUCCESS.
Skeleton of MPI Program
#include <mpi.h>
int main( int argc, char** argv )
{
    MPI_Init( &argc, &argv );

    /* main part of the program:
       use MPI function calls depending on your data
       partitioning and the parallelization architecture */

    MPI_Finalize();
    return 0;
}



A minimal MPI program
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    printf("Hello, world!\n");
    MPI_Finalize();
    return 0;
}



Communicator
A communicator defines a communication domain - a set of
processes that are allowed to communicate with each other.
Information about communication domains is stored in
variables of type MPI_Comm.
Communicators are used as arguments to all message transfer MPI routines.
A process can belong to many different (possibly overlapping) communication domains.
MPI defines a default communicator called MPI_COMM_WORLD which includes all the processes.
Querying Information
The MPI_Comm_size and MPI_Comm_rank functions are used to determine the number of processes and the label of the calling process, respectively
The calling sequences of these routines are as follows:
int MPI_Comm_size(MPI_Comm comm, int *size)
int MPI_Comm_rank(MPI_Comm comm, int *rank)
The rank of a process is an integer that ranges from zero up to the size of the communicator minus one

Sample Program
#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
int rank, size;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
printf("I am %d of %d\n", rank, size);
MPI_Finalize();
return 0;
}



Sending and Receiving Messages (Blocking)
The basic functions for sending and receiving messages in MPI are MPI_Send and MPI_Recv, respectively
The calling sequences of these routines are as follows:
int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
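
A minimal blocking send/receive sketch, assuming at least two processes; rank 0 sends one integer to rank 1:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 123;
        /* blocking send: returns once the send buffer can be reused */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* blocking receive: returns once the message has been received */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d from rank %d\n", value, status.MPI_SOURCE);
    }

    MPI_Finalize();
    return 0;
}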



Basic MPI Datatypes
MPI datatype            C datatype
MPI_CHAR                signed char
MPI_UNSIGNED_CHAR       unsigned char
MPI_SHORT               signed short
MPI_UNSIGNED_SHORT      unsigned short
MPI_INT                 signed int
MPI_UNSIGNED            unsigned int
MPI_LONG                signed long
MPI_UNSIGNED_LONG       unsigned long
MPI_FLOAT               float
MPI_DOUBLE              double
MPI_LONG_DOUBLE         long double



MPI_Status
MPI Structure:
typedef struct MPI_Status {
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
} MPI_Status;
int MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count)
It takes as arguments:
the status returned by MPI_Recv and
the type of the received data in datatype, and
returns the number of entries that were actually received in the count variable
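
A short sketch of using the status and MPI_Get_count after a receive, assuming two or more processes; rank 0 sends only 3 of a possible 10 integers:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, count, buf[10];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        buf[0] = 1; buf[1] = 2; buf[2] = 3;
        MPI_Send(buf, 3, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* receive up to 10 ints, then ask how many actually arrived */
        MPI_Recv(buf, 10, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_INT, &count);
        printf("received %d ints from rank %d with tag %d\n",
               count, status.MPI_SOURCE, status.MPI_TAG);
    }

    MPI_Finalize();
    return 0;
}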
Non-Blocking Communication
Nonblocking communications are useful for overlapping
communication with computation
int MPI_Isend(const void *buf, int count, MPI_Datatype
datatype, int dest, int tag, MPI_Comm comm, MPI_Request
*request)
int MPI_Irecv(void *buf, int count, MPI_Datatype datatype,
int source, int tag, MPI_Comm comm, MPI_Request *
request)
To check the completion of non-blocking send and
receive operations, MPI provides MPI_Test and MPI_Wait
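
A minimal non-blocking sketch between rank 0 and rank 1; the comment marks where independent computation could overlap with the transfer:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value = 0;
    MPI_Request request;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 77;
        MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
    } else if (rank == 1) {
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
    }

    /* ... computation that does not touch 'value' can be done here ... */

    if (rank == 0 || rank == 1) {
        MPI_Wait(&request, MPI_STATUS_IGNORE);   /* complete the pending operation */
        if (rank == 1) printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}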



MPI Collective Communication Operations
Barrier
Broadcast
Reduction
Prefix
Gather
Scatter



MPI_Barrier
int MPI_Barrier( MPI_Comm comm );
Return values:
MPI_SUCCESS      No error; MPI routine completed successfully
MPI_ERR_COMM     Invalid communicator



MPI_Barrier Example
#include "mpi.h" 
#include <stdio.h> 
int main(int argc, char *argv[]) 

    int rank, nprocs;

    MPI_Init(&argc,&argv); 
    MPI_Comm_size(MPI_COMM_WORLD,&nprocs); 
    MPI_Comm_rank(MPI_COMM_WORLD,&rank); 
    MPI_Brrier(MPI_COMM_WORLD);
    printf("Hello, world.  I am %d of %d\n", rank, nprocs);
fflush(stdout); 
    MPI_Finalize(); 
    return 0; 



MPI_Bcast
int MPI_Bcast( void *buffer, int count,
MPI_Datatype datatype, int root, MPI_Comm
comm );
Parameters
buffer[in/out] starting address of buffer (choice)
count[in] number of entries in buffer (integer)
datatype[in] data type of buffer (handle)
root[in] rank of broadcast root (integer)
comm[in] communicator (handle)
MPI_Bcast Example
void my_bcast(void* data, int count, MPI_Datatype datatype, int root,
              MPI_Comm communicator)
{
    int world_rank, world_size, i;
    MPI_Comm_rank(communicator, &world_rank);
    MPI_Comm_size(communicator, &world_size);

    if (world_rank == root) {
        /* If we are the root process, send our data to everyone */
        for (i = 0; i < world_size; i++) {
            if (i != world_rank)
                MPI_Send(data, count, datatype, i, 0, communicator);
        }
    } else {
        /* If we are a receiver process, receive the data from the root */
        MPI_Recv(data, count, datatype, root, 0, communicator, MPI_STATUS_IGNORE);
    }
}
MPI_Reduce
int MPI_Reduce( void *sendbuf, void *recvbuf, int
count, MPI_Datatype datatype, MPI_Op op, int
root, MPI_Comm comm );
Parameters
sendbuf[in] address of send buffer (choice)
recvbuf[out] address of receive buffer (choice, significant
only at root)
count[in] number of elements in send buffer (integer)
datatype[in] data type of elements of send buffer
(handle)
op[in] reduce operation (handle)
root[in] rank of root process (integer)
comm[in] communicator (handle)
MPI Reduction Operations
MPI_MAX - Returns the maximum element.
MPI_MIN - Returns the minimum element.
MPI_SUM - Sums the elements.
MPI_PROD - Multiplies all elements.
MPI_LAND - Performs a logical and across the elements.
MPI_LOR - Performs a logical or across the elements.
MPI_BAND - Performs a bitwise and across the bits of the
elements.
MPI_BOR - Performs a bitwise or across the bits of the
elements.
MPI_MAXLOC - Returns the maximum value and the rank of
the process that owns it.
MPI_MINLOC - Returns the minimum value and the rank of
the process that owns it.
MPI_Reduce Example
float *rand_nums = NULL;
rand_nums = create_rand_nums(num_elements_per_proc);

// Sum the numbers locally
float local_sum = 0;
int i;
for (i = 0; i < num_elements_per_proc; i++) {
    local_sum += rand_nums[i];
}

// Print the local sum on each process
printf("Local sum for process %d - %f, avg = %f\n", world_rank, local_sum,
       local_sum / num_elements_per_proc);

// Reduce all of the local sums into the global sum at rank 0
float global_sum;
MPI_Reduce(&local_sum, &global_sum, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);

// Print the result
if (world_rank == 0) {
    printf("Total sum = %f, avg = %f\n", global_sum,
           global_sum / (world_size * num_elements_per_proc));
}
MPI_Allreduce
int MPI_Allreduce( void *sendbuf, void *recvbuf,
int count, MPI_Datatype datatype, MPI_Op op,
MPI_Comm comm );
Parameters
sendbuf[in] starting address of send buffer (choice)
recvbuf[out] starting address of receive buffer (choice)
count[in] number of elements in send buffer (integer)
datatype[in] data type of elements of send buffer
(handle)
op[in] operation (handle)
comm[in] communicator (handle)
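A minimal usage sketch: every process contributes its rank and every process receives the global sum:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, global_sum;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* combine the ranks with MPI_SUM; the result is available on all processes */
    MPI_Allreduce(&rank, &global_sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("process %d sees global sum %d\n", rank, global_sum);
    MPI_Finalize();
    return 0;
}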
MPI_Gather
int MPI_Gather( void *sendbuf, int sendcnt, MPI_Datatype
sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype,
int root, MPI_Comm comm );
Parameters
sendbuf[in] starting address of send buffer (choice)
sendcount[in] number of elements in send buffer (integer)
sendtype[in] data type of send buffer elements (handle)
recvbuf[out] address of receive buffer (choice, significant only
at root)
recvcount[in] number of elements for any single receive (integer,
significant only at root)
recvtype[in] data type of recv buffer elements (significant only at
root) (handle)
root[in] rank of receiving process (integer)
comm[in] communicator (handle)
MPI_Gather Example
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char **argv) {
    int myrank, size, i;
    int *recvbuffer = NULL;
    int sendbuffer[2];
    int recvbufflen = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    /* Initialize send buffer */
    for (i = 0; i < 2; i++) sendbuffer[i] = i * 2;
    /* Only process 0 allocates memory for recvbuffer */
    if (myrank == 0) {
        recvbufflen = 2 * size;
        recvbuffer = (int*)malloc(recvbufflen * sizeof(int));
    }
    MPI_Gather(sendbuffer, 2, MPI_INT, recvbuffer, 2, MPI_INT, 0, MPI_COMM_WORLD);
    if (myrank == 0) {
        for (i = 0; i < recvbufflen; i++) {
            printf("recvbuffer[%d]=%d\n", i, recvbuffer[i]);
        }
    }
    MPI_Finalize();
    return 0;
}
MPI_Allgather
int MPI_Allgather( void *sendbuf, int sendcount,
MPI_Datatype sendtype, void *recvbuf, int recvcount,
MPI_Datatype recvtype, MPI_Comm comm );
Parameters
sendbuf[in] starting address of send buffer (choice)
sendcount[in] number of elements in send buffer (integer)
sendtype[in] data type of send buffer elements (handle)
recvbuf[out] address of receive buffer (choice)
recvcount[in] number of elements received from any
process (integer)
recvtype[in] data type of receive buffer elements (handle)
comm[in] communicator (handle)
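A minimal usage sketch: each process contributes its rank and all processes receive the full array of ranks:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size, i, *all_ranks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    all_ranks = (int*)malloc(size * sizeof(int));

    /* every process sends its rank and receives the ranks of all processes */
    MPI_Allgather(&rank, 1, MPI_INT, all_ranks, 1, MPI_INT, MPI_COMM_WORLD);

    if (rank == 0)
        for (i = 0; i < size; i++)
            printf("all_ranks[%d] = %d\n", i, all_ranks[i]);

    free(all_ranks);
    MPI_Finalize();
    return 0;
}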
MPI_Scatter
int MPI_Scatter( void *sendbuf, int sendcnt, MPI_Datatype
sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype,
int root, MPI_Comm comm );
Parameters
sendbuf[in] address of send buffer (choice, significant only at root)
sendcount[in] number of elements sent to each process (integer, significant only at root)
sendtype[in] data type of send buffer elements (significant only at root) (handle)
recvbuf[out] address of receive buffer (choice)
recvcount[in] number of elements in receive buffer (integer)
recvtype[in] data type of receive buffer elements (handle)
root[in] rank of sending process (integer)
comm[in] communicator (handle)
MPI_Scatter Example
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#define SIZE 4
int main (int argc, char *argv[]) {
int numtasks, rank, sendcount, recvcount, source;
float sendbuf[SIZE][SIZE] =
{ {1.0, 2.0, 3.0, 4.0}, {5.0, 6.0, 7.0, 8.0}, {9.0, 10.0, 11.0, 12.0}, {13.0, 14.0, 15.0, 16.0} };
float recvbuf[SIZE];
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
if (numtasks == SIZE)
{
source = 1;
sendcount = SIZE;
recvcount = SIZE;
MPI_Scatter(sendbuf,sendcount,MPI_FLOAT,recvbuf,recvcount, MPI_FLOAT,source,MPI_COMM_WORLD);
printf("rank= %d Results: %f %f %f %f\n",rank,recvbuf[0], recvbuf[1],recvbuf[2],recvbuf[3]);
}
Else
printf("Must specify %d processors. Terminating.\n",SIZE);
MPI_Finalize();
}
MPI_Alltoall
int MPI_Alltoall( void *sendbuf, int sendcount,
MPI_Datatype sendtype, void *recvbuf, int recvcount,
MPI_Datatype recvtype, MPI_Comm comm );
Parameters
sendbuf[in] starting address of send buffer (choice)
sendcount[in] number of elements to send to each process
(integer)
sendtype[in] data type of send buffer elements (handle)
recvbuf[out] address of receive buffer (choice)
recvcount[in] number of elements received from any process
(integer)
recvtype[in] data type of receive buffer elements (handle)
comm[in] communicator (handle)
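A minimal sketch of total exchange: process i prepares a distinct value for every process j, and after the call recvbuf[j] holds what process j prepared for this process:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size, j, *sendbuf, *recvbuf;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sendbuf = (int*)malloc(size * sizeof(int));
    recvbuf = (int*)malloc(size * sizeof(int));

    /* a distinct message for every destination: element j goes to process j */
    for (j = 0; j < size; j++)
        sendbuf[j] = rank * 10 + j;

    MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    for (j = 0; j < size; j++)
        printf("process %d got %d from process %d\n", rank, recvbuf[j], j);

    free(sendbuf); free(recvbuf);
    MPI_Finalize();
    return 0;
}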
All-to-All Using MPI (Examples)
One-dimensional Matrix-Vector Multiplication
Single-Source Shortest Path
Sample Sort



Thank You

