Matrix-Vector Multiplication
Dong Dai ([email protected])
Key Topic
• In this case:
• Each task needs a set of rows (N/P of them) and the entire column vector b.
• After each task finishes its inner-product computations, task i holds N/P
elements of the result vector c.
• The vector then needs to be replicated: an all-gather step
communicates each task’s elements of c to all other tasks.
• The algorithm then terminates or is ready for the next iteration.
Implementation: using MPI_Allgatherv
• An all-gather communication
concatenates blocks of a vector
distributed among a group of
processes and copies the resulting
whole vector to all the processes
• Use the MPI function MPI_Allgatherv
• If the same number of items is
gathered from each process, the
simpler function MPI_Allgather is
more appropriate
• But in the general case we cannot
ensure that all processes handle the
same number of rows, so
MPI_Allgatherv is the safe choice
(a sketch of this step follows)
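A minimal sketch of this step in C, assuming N rows are distributed in near-equal blocks across P processes; the names local_A, local_c, counts, and displs are illustrative, not from the slides:

#include <mpi.h>
#include <stdlib.h>

/* Sketch: rowwise block-striped matrix-vector product c = A * b.
   Each process owns local_n rows of A and the full vector b;
   after the local inner products, MPI_Allgatherv replicates c. */
void matvec_rowwise(const double *local_A, const double *b,
                    double *c, int n, MPI_Comm comm)
{
    int p, rank;
    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &rank);

    /* Rows per process: the first (n % p) processes get one extra row. */
    int *counts = malloc(p * sizeof(int));
    int *displs = malloc(p * sizeof(int));
    for (int i = 0, off = 0; i < p; i++) {
        counts[i] = n / p + (i < n % p ? 1 : 0);
        displs[i] = off;
        off += counts[i];
    }
    int local_n = counts[rank];

    /* Local inner products: one element of c per owned row. */
    double *local_c = malloc(local_n * sizeof(double));
    for (int i = 0; i < local_n; i++) {
        local_c[i] = 0.0;
        for (int j = 0; j < n; j++)
            local_c[i] += local_A[i * n + j] * b[j];
    }

    /* Replicate the result vector on every process. */
    MPI_Allgatherv(local_c, local_n, MPI_DOUBLE,
                   c, counts, displs, MPI_DOUBLE, comm);

    free(local_c); free(counts); free(displs);
}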
A2. Vertical (Columnwise) Partition + Divided Vector
• We now associate a primitive task with
each column of matrix A; vectors b and
c are divided among the primitive tasks
• In this case:
• Each task i multiplies its column(s) of A by its element b_i
• This creates a full-length vector of partial results
• At the end of the computation, task i needs
only its own block c_i of the result vector
• We need an all-to-all communication
• Partial result block j on task i must be transferred
to task j (see the sketch below)
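A minimal sketch of this exchange in C, assuming n is divisible by p so that MPI_Alltoall (equal block sizes) suffices; with uneven blocks, MPI_Alltoallv would be needed instead. The names partial, recv_blocks, and local_c are illustrative:

#include <mpi.h>
#include <stdlib.h>

/* Sketch: columnwise decomposition, exchange of partial results.
   Each process holds `partial`, a full-length (n) vector of partial
   sums from multiplying its columns of A by its piece of b.
   Block j of `partial` belongs to task j, so an all-to-all exchange
   delivers the blocks, and each task sums what it receives. */
void exchange_and_sum(const double *partial, double *local_c,
                      int n, MPI_Comm comm)
{
    int p;
    MPI_Comm_size(comm, &p);
    int blk = n / p;                    /* assumes n % p == 0 */

    /* Each process sends block j of its partial vector to task j
       and receives p blocks, one from every task. */
    double *recv_blocks = malloc(n * sizeof(double));
    MPI_Alltoall(partial, blk, MPI_DOUBLE,
                 recv_blocks, blk, MPI_DOUBLE, comm);

    /* Sum the p received blocks to obtain this task's block of c. */
    for (int i = 0; i < blk; i++) {
        local_c[i] = 0.0;
        for (int k = 0; k < p; k++)
            local_c[i] += recv_blocks[k * blk + i];
    }
    free(recv_blocks);
}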
Implementation: MPI_Scatterv
• Each first-row process broadcasts its block of b to the other processes in the
same column of the virtual process grid
• You can implement this by writing the communication code with point-
to-point APIs, or, more efficiently, with collective APIs
• The problem: each such group communication involves a different
subset of processes
• Solution: create new communicators, and assign processes to these
communicators
Implementation: MPI_Comm_split
• The collective function
MPI_Comm_split
partitions the processes in
an existing communicator
into one or more
subgroups and constructs a
communicator for each of
these new subgroups
• What is needed in our case:
• A per-row communicator for
conducting the sum-reduction
• A per-column communicator
for broadcasting subvector b_j
• A call similar to the sketch below
does the job
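A minimal sketch in C, assuming the processes are viewed as an r × c virtual grid laid out in row-major order, with row index rank / c and column index rank % c; grid_cols and the other names are illustrative:

#include <mpi.h>

/* Sketch: build per-row and per-column communicators for an
   r x c virtual process grid laid out in row-major order. */
void build_grid_comms(int grid_cols, MPI_Comm *row_comm,
                      MPI_Comm *col_comm)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int my_row = rank / grid_cols;      /* grid coordinates */
    int my_col = rank % grid_cols;

    /* Processes with the same color land in the same new
       communicator; the key orders the ranks within it. */
    MPI_Comm_split(MPI_COMM_WORLD, my_row, my_col, row_comm);
    MPI_Comm_split(MPI_COMM_WORLD, my_col, my_row, col_comm);
}

With these communicators, the first-row process of each column (rank 0 in its col_comm) can MPI_Bcast its block of b down the column, and each row can combine partial results with an MPI_Reduce or MPI_Allreduce over row_comm.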
Question
• Implement Case B using pseudocode