
MPI Example

Matrix-Vector Multiplication
Dong Dai ([email protected])
Key Topics

• Understand what a real-world MPI application looks like
• Learn some MPI APIs
• Learn how to create and manage communicators
• Understand how data partitioning affects performance
What is Matrix-Vector Multiplication?
• It is simply a series of inner-product (dot-product) computations
• The sequential algorithm:

Input:  a[0…m-1, 0…n-1], an m×n matrix
        b[0…n-1], an n×1 vector
Output: c[0…m-1], an m×1 vector
Algorithm:
for i <- 0 to m-1
    c[i] <- 0
    for j <- 0 to n-1
        c[i] <- c[i] + a[i,j] × b[j]
    end for
end for
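For concreteness, here is a minimal sequential C version of that loop nest (the row-major array layout and the function name are illustrative assumptions, not taken from the lecture):

/* Sequential matrix-vector multiplication: c = A * b.
 * A is stored row-major as a flat array of m*n doubles. */
void matvec(const double *a, const double *b, double *c, int m, int n)
{
    for (int i = 0; i < m; i++) {
        c[i] = 0.0;
        for (int j = 0; j < n; j++)
            c[i] += a[i * n + j] * b[j];
    }
}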
Matrix-Vector Multiplication
• It is simple but important
• Often embedded in algorithms solving a wide variety of problems:
• Recommendation systems
• Conjugate gradient method
• Neural networks
• So its performance is critical for many applications
• In this lecture, we discuss how to parallelize it to solve really big matrix-vector multiplication problems
Data Partition Options
• There are three straightforward ways to decompose an m×n matrix A:
• Rowwise block striping (Horizontal Data Partitioning)
• Columnwise block striping (Vertical Data Partitioning)
• Checkerboard block decomposition (Block Partitioning)
Data Partition Options
• There are two natural ways to distribute the vectors b and c:
• The vector elements may be replicated, meaning all the vector elements are copied on all of the tasks
• Why is this acceptable?
• The vector elements may be divided among some or all of the tasks
• For instance, in the vertical partition, each process only needs a portion of b to calculate its partial results
A1. Horizontal Partition + Vector Replicated
• We first try to associate a primitive task with each row of the matrix A; vectors b and c are replicated among the primitive tasks

• In this case:
• Each task needs a set of rows (N/P of them) and the whole column vector b
• After each task finishes its inner-product computations, task i holds N/P elements of vector c
• The vector then has to be replicated again: an all-gather step communicates each task's elements of c to all other tasks
• The algorithm terminates or is ready for the next iteration
Implementation: using MPI_Allgatherv

• An all-gather communication concatenates blocks of a vector distributed among a group of processes and copies the resulting whole vector to all the processes
• Use the MPI function MPI_Allgatherv
• If the same number of items is gathered from each process, the simpler function MPI_Allgather is more appropriate
• But we cannot ensure, in the general case, that all processes handle the same number of rows
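A minimal sketch of this rowwise version in C (the block-size arithmetic, buffer layout, and function name are illustrative assumptions, not taken from the lecture):

#include <mpi.h>
#include <stdlib.h>

/* Rowwise block-striped matvec: each rank owns a contiguous block of rows
 * of A plus a full copy of b, computes its block of c, then an all-gather
 * replicates the whole c on every rank. Block sizes may differ when
 * m % p != 0, which is why MPI_Allgatherv is used instead of MPI_Allgather. */
void rowwise_matvec(const double *a_local, const double *b, double *c,
                    int m, int n, MPI_Comm comm)
{
    int rank, p;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);

    /* recvcounts[i] = number of rows (= elements of c) owned by rank i */
    int *recvcounts = malloc(p * sizeof(int));
    int *displs     = malloc(p * sizeof(int));
    for (int i = 0; i < p; i++) {
        recvcounts[i] = m / p + (i < m % p ? 1 : 0);
        displs[i]     = (i == 0) ? 0 : displs[i - 1] + recvcounts[i - 1];
    }

    /* Local inner products for the rows this rank owns. */
    int my_rows = recvcounts[rank];
    double *c_local = malloc(my_rows * sizeof(double));
    for (int i = 0; i < my_rows; i++) {
        c_local[i] = 0.0;
        for (int j = 0; j < n; j++)
            c_local[i] += a_local[i * n + j] * b[j];
    }

    /* Concatenate every rank's block of c and replicate the full vector. */
    MPI_Allgatherv(c_local, my_rows, MPI_DOUBLE,
                   c, recvcounts, displs, MPI_DOUBLE, comm);

    free(c_local); free(recvcounts); free(displs);
}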
A2. Vertical Partition + Vector Divided
• We then try to associate a primitive task with columns of the matrix A; vectors b and c are divided among the primitive tasks

• In this case:
• Each task i multiplies its columns of A by its block b_i
• This creates a vector of partial results
• At the end of the computation, task i needs only its part c_i of the result vector
• We need an all-to-all communication
• Each partial result block j on task i must be transferred to task j
Implementation: MPI_Scatterv

• The all-to-all communication moves the appropriate partial results to the tasks that will add them up
• The MPI function MPI_Scatterv enables a single root process to distribute a contiguous group of elements to all of the processes in a communicator, including itself
• If the same number of data items is distributed to every process, the simpler function MPI_Scatter is appropriate
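A minimal sketch of MPI_Scatterv as described above, with the root handing out unevenly sized blocks of b (block sizes and names are illustrative assumptions; the all-to-all exchange of partial results itself would typically use MPI_Alltoallv, which is not shown here):

#include <mpi.h>
#include <stdlib.h>

/* Root distributes unevenly sized, contiguous blocks of vector b:
 * rank i receives sendcounts[i] elements starting at offset displs[i]. */
void distribute_b(const double *b_full, double *b_local,
                  int n, int root, MPI_Comm comm)
{
    int rank, p;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);

    int *sendcounts = malloc(p * sizeof(int));
    int *displs     = malloc(p * sizeof(int));
    for (int i = 0; i < p; i++) {
        sendcounts[i] = n / p + (i < n % p ? 1 : 0);
        displs[i]     = (i == 0) ? 0 : displs[i - 1] + sendcounts[i - 1];
    }

    /* Only the root's send buffer is significant; every rank (including
     * the root) receives its own block into b_local. */
    MPI_Scatterv(b_full, sendcounts, displs, MPI_DOUBLE,
                 b_local, sendcounts[rank], MPI_DOUBLE, root, comm);

    free(sendcounts); free(displs);
}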
A3. Grid-based Partition + Vector Divided
• In the last case, we associate a primitive task with a small block (sub-grid) of elements of the matrix and a portion of the vector
• Key steps:
• Redistribute vector b so that each task has the correct portion of b
• Each task performs a matrix-vector multiplication with its portions of A and b
• Tasks in each row of the task grid perform a sum-reduction on their portions of c
• After this, c is redistributed to the first column of the task grid for the next iteration
Redistribute vector b
• After calculating c, we need to redistribute its values as b for the next iteration

• This can be done by a point-to-point communication + a broadcast communication (see the sketch below)
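One possible shape of this step, assuming a square q×q process grid with rank = row * q + col, blocks of b initially held by the first column, and a per-column communicator created as shown on the following slides (these layout details and names are my assumptions, not stated in the slides):

#include <mpi.h>

/* Redistribute the block of b on a square q x q process grid with
 * rank = row * q + col, assuming the blocks initially sit in the first
 * column of the grid:
 *   Step 1: process (i,0) sends its block to process (0,i)   [point-to-point]
 *   Step 2: each first-row process broadcasts its block down its column
 *           via a per-column communicator (created with MPI_Comm_split,
 *           see the following slides; key = row, so the first-row process
 *           is rank 0 in col_comm). */
void redistribute_b(double *b_block, int block_len, int row, int col, int q,
                    MPI_Comm grid_comm, MPI_Comm col_comm)
{
    if (col == 0 && row != 0)            /* (row,0) -> (0,row) */
        MPI_Send(b_block, block_len, MPI_DOUBLE,
                 row /* rank of (0,row) */, 0, grid_comm);
    if (row == 0 && col != 0)            /* (0,col) <- (col,0) */
        MPI_Recv(b_block, block_len, MPI_DOUBLE,
                 col * q /* rank of (col,0) */, 0, grid_comm,
                 MPI_STATUS_IGNORE);

    /* Broadcast within each column; the first-row process is the root. */
    MPI_Bcast(b_block, block_len, MPI_DOUBLE, 0, col_comm);
}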
Creating Communicators
• In our grid-based matrix-vector multiplication, two collective communication operations are involved:
• Each row of processes in the grid performs an independent sum-reduction, yielding vector c in the first column of processes
• Each first-row process broadcasts its block of b to the other processes in the same column of the virtual process grid

• You could implement this by writing the communication code with point-to-point APIs, but it is more efficient to use collective APIs
• The problem is that each time we are doing a group communication over a different subset of the processes
• Solution: create new communicators, and assign processes to these communicators
Implementation: MPI_Comm_split
• The collective function MPI_Comm_split partitions the processes in an existing communicator into one or more subgroups and constructs a communicator for each of these new subgroups
• What is needed in our case:
• Create a per-row communicator for conducting the sum-reduction
• Create a per-column communicator for broadcasting subvector b_j
• We can use calls similar to the example below
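A sketch of this communicator setup, again assuming a square q×q grid laid out row-major (the grid layout, variable names, and the commented reduce/broadcast calls are illustrative assumptions):

#include <math.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    /* Assume p is a perfect square and ranks are laid out row-major,
     * so rank = row * q + col on a q x q virtual grid. */
    int q   = (int)(sqrt((double)p) + 0.5);
    int row = rank / q;
    int col = rank % q;

    MPI_Comm row_comm, col_comm;

    /* Per-row communicator: the same 'row' color groups a row together;
     * 'col' as the key makes the first-column process rank 0, so it can
     * be the root that receives the sum-reduced block of c. */
    MPI_Comm_split(MPI_COMM_WORLD, row, col, &row_comm);

    /* Per-column communicator: same 'col' color; 'row' as the key makes
     * the first-row process rank 0, the root of the broadcast of b. */
    MPI_Comm_split(MPI_COMM_WORLD, col, row, &col_comm);

    /* Typical use in the grid-based algorithm (buffers omitted here):
     *   MPI_Reduce(c_partial, c_block, len, MPI_DOUBLE, MPI_SUM, 0, row_comm);
     *   MPI_Bcast(b_block, len, MPI_DOUBLE, 0, col_comm);
     */

    MPI_Comm_free(&row_comm);
    MPI_Comm_free(&col_comm);
    MPI_Finalize();
    return 0;
}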
Question
• Implement Case b using pseudocode
