0% found this document useful (0 votes)

3 views

slides.41

deal.ii

Uploaded by

rohithvijaykumar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

3 views

slides.41

deal.ii

Uploaded by

rohithvijaykumar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 32

MATH 676

–
Finite element methods in
scientific computing

Wolfgang Bangerth, Texas A&M University

http://www.dealii.org/ Wolfgang Bangerth

Lecture 41:

Parallelization on a cluster of
distributed memory machines

Part 1: Introduction to MPI

http://www.dealii.org/ Wolfgang Bangerth

Shared memory

In the previous lecture:

●
There was a single address space
●
All parallel threads of execution have access to all data

Advantage:
●
Makes parallelization simpler
Disadvantages:
●
Problem size limited by
– number of cores on your machine
– amount of memory on your machine
– memory bandwidth
●
Need synchronisation via locks
●
Makes it too easy to avoid hard decisions

http://www.dealii.org/ Wolfgang Bangerth

Shared memory

Example:
●
Only one Triangulation, DoFHandler, matrix, rhs vector
●
Multiple threads work in parallel to
– assemble linear system
– perform matrix-vector products
– estimate the error per cell
– generate graphical output for each cell
●
All threads access the same global objects

For examples, see several of the step-xx programs and the

“Parallel computing with multiple processors accessing
shared memory” documentation module
http://www.dealii.org/ Wolfgang Bangerth
Shared vs. distributed memory

This lecture:
●
Multiple machines with their own address spaces
●
No direct access to remote data
●
Data has to be transported explicitly between machines

Advantage:
●
(Almost) unlimited number of cores and memory
●
Often scales better in practice
Disadvantages:
●
Much more complicated programming model
●
Requires entirely different way of thinking

●
Practical difficulties debugging, profiling, ...
http://www.dealii.org/ Wolfgang Bangerth
Distributed memory

Example:
●
One Triangulation, DoFHandler, matrix, rhs vector object
per processor
●
Union of these objects represent global object
●
Multiple programs work in parallel to
– assemble their part of the linear system
– perform their part of the matrix-vector products
– estimate the error on their cells
– generate graphical output for each of their cells
●
Each program only accesses their part of global objects
See step-40/32/42 and the “Parallel computing with multiple
processors using distributed memory” module
http://www.dealii.org/ Wolfgang Bangerth
Distributed memory

There are many ways to do distributed memory

computing:
●
Message passing interface (MPI)

●
Remote procedure calls (RPC)

●
Partitioned global address space (PGAS) languages:
– Unified Parallel C (UPC – an extension to C)
– Coarray Fortran (part of Fortran 2008)
– Chapel, X10, Titanium

http://www.dealii.org/ Wolfgang Bangerth

Message Passing Interface (MPI)

MPI's model is simple:

●
The “universe” consists of “processes”
●
Typically:
– One single-threaded process per core
– One multi-threaded process per machine

●
Processes can send “messages” to other processes…
●
…but nothing happens if the other side is not listening

Mental model: Sending letters through the mail system

http://www.dealii.org/ Wolfgang Bangerth

Message Passing Interface (MPI)

MPI's model implies:

●
You can't “just access” data of another process

●
Instead, option 1:
– you need to send a request message
– other side has to pick up message
– other side has to know what to do
– other side has to send a message with the data
– you have to pick up message

●
Option 2:
– depending on phase of program, I know when someone
else needs my data → send it
– I will know who sent me data → go get it
http://www.dealii.org/ Wolfgang Bangerth
Message Passing Interface (MPI)

MPI's model implies:

●
You can't “just access” data of another process
●
Instead...

This is bothersome to program. However:

●
It exposes to the programmer what is happening
●
Processes can do other things between sending a
message and waiting for the next
●
Has been shown to scale to >1M processes

http://www.dealii.org/ Wolfgang Bangerth

Message Passing Interface (MPI)

MPI implementations:
●
MPI is defined as a set of
– functions
– data types
– constants
with bindings to C and Fortran
●
Is not a language on its own
●
Can be compiled by a standard C/Fortran compiler
●
Is typically compiled using a specific compiler wrapper:
mpicc -c myprog.c -o myprog.o
mpiCC -c myprog.cc -o myprog.o
mpif90 -c myprog.f90 -o myprog.o

●
Bindings to many other languages exist
http://www.dealii.org/ Wolfgang Bangerth
Message Passing Interface (MPI)

MPI's bottom layer:

●
Send messages from one processor to others
●
See if there is a message from any/one particular process
●
Receive the message

Example (send on process 2 to process 13):

double d = foo();
MPI_Send (/*data=*/&d, /*count=*/1, /*type=*/MPI_DOUBLE,
/*dest=*/13, /*tag=*/42,
/*universe=*/MPI_COMM_WORLD);

http://www.dealii.org/ Wolfgang Bangerth

Message Passing Interface (MPI)

MPI's bottom layer:

●
Send messages from one processor to others
●
See if there is a message from any/one particular process
●
Receive the message

Example (query for data from process 13):

MPI_Status status;
int message_available;
MPI_Iprobe (/*source=*/13, /*tag=*/42,
/*yesno=*/message_available,
/*universe=*/MPI_COMM_WORLD,
/*status=*/&status);

Note: One can also specify “anywhere”/”any tag”.

http://www.dealii.org/ Wolfgang Bangerth
Message Passing Interface (MPI)

MPI's bottom layer:

●
Send messages from one processor to others
●
See if there is a message from any/one particular process
●
Receive the message

Example (receive on process 13):

double d;
MPI_Status status;
MPI_Recv (/*data=*/&d, /*count=*/1, /*type=*/MPI_DOUBLE,
/*source=*/2, /*tag=*/42,
/*universe=*/MPI_COMM_WORLD,
/*status=*/&status);

Note: One can also specify “anywhere”/”any tag”.

http://www.dealii.org/ Wolfgang Bangerth
Message Passing Interface (MPI)

MPI's bottom layer:

●
Send messages from one processor to others
●
See if there is a message from any/one particular process
●
Receive the message

Notes:
●
MPI_Send blocks the program: function only returns
when the data is out the door
●
MPI_Recv blocks the program: function only returns when
– a message has come in
– the data is in the final location
●
There are also non-blocking start/end versions
(MPI_Isend, MPI_Irecv, MPI_Wait)

http://www.dealii.org/ Wolfgang Bangerth

Message Passing Interface (MPI)

MPI's higher layers: Collective operations

●
Internally implemented by sending messages
●
Available operations:
– Barrier
– Broadcast (one item from one to all)
– Scatter (many items from one to all),
– Gather (from all to one), AllGather (all to all)
– Reduce (e.g. sum from all), AllReduce

Note: Collective operations lead to deadlocks if some

processes do not participate!

http://www.dealii.org/ Wolfgang Bangerth

Message Passing Interface (MPI)

Example: Barrier use for timing (pseudocode)

… do something …
MPI_Barrier (MPI_COMM_WORLD);

std::time_point start = std::now(); // get current time

foo(); // may contain MPI calls
std::time_point end_local = std::now(); // get current time

MPI_Barrier (MPI_COMM_WORLD);
std::time_point end_global = std::now(); // get current time

std::duration local_time = end_local – start;

std::duration global_time = end_global – start;

Note: Different processes will compute different values.

http://www.dealii.org/ Wolfgang Bangerth
Message Passing Interface (MPI)

Example: Reduction

parallel::distributed::Triangulation<dim> triangulation;
… create triangulation …

unsigned int my_cells = triangulation.n_locally_owned_cells();

unsigned int global_cells;

MPI_Reduce (&my_cells, &global_cells, MPI_UNSIGNED, 1,

/*operation=*/MPI_SUM,
/*root=*/0,
MPI_COMM_WORLD);

Note 1: Only the root (processor) gets the result.

Note 2: Implemented by (i) everyone sending the root a
message, or (ii) hierarchical reduction on a tree
http://www.dealii.org/ Wolfgang Bangerth
Message Passing Interface (MPI)

Example: AllReduce

parallel::distributed::Triangulation<dim> triangulation;
… create triangulation …

unsigned int my_cells = triangulation.n_locally_owned_cells();

unsigned int global_cells;

MPI_Allreduce (&my_cells, &global_cells, MPI_UNSIGNED, 1,

/*operation=*/MPI_SUM,
MPI_COMM_WORLD);

Note 1: All processors now get the result.

Note 2: Can be implemented by MPI_Reduce +
MPI_Broadcast
http://www.dealii.org/ Wolfgang Bangerth
Message Passing Interface (MPI)

MPI's higher layers: Communicators

●
MPI_COMM_WORLD denotes the “universe” of all MPI
processes
●
Corresponds to a “mail service” (a communicator)
●
Addresses are the “ranks” of each process in a
communicator

●
One can form subsets of a communicator
●
Forms the basis for collective operations among a subset
of processes
●
Useful if subsets of processors do different tasks

http://www.dealii.org/ Wolfgang Bangerth

Message Passing Interface (MPI)

MPI's higher layers: I/O

●
Fact: There is a bottleneck if 1,000 machines write to the
file system at the same time

●
MPI provides ways to make this more efficient

http://www.dealii.org/ Wolfgang Bangerth

Message Passing Interface (MPI)

Also in MPI:
●
“One-sided communication”: directly writing into and
reading from another process's memory space
●
Topologies: mapping network characteristics to MPI
●
Starting additional MPI processes

More information on MPI:

http://www.mpi-forum.org/

http://www.dealii.org/ Wolfgang Bangerth

An MPI example: MatVec

Situation:
●
Multiply a large NxN matrix by a vector of size N
●
Matrix is assumed to be dense

●
Every one of P processors stores N/P rows of the matrix
●
Every processor stores N/P elements of each vector

●
For simplicity: N is a multiple of P

http://www.dealii.org/ Wolfgang Bangerth

An MPI example: MatVec

struct ParallelVector {
unsigned int size;
unsigned int my_elements_begin;
unsigned int my_elements_end;
double *elements;

ParallelVector (unsigned int sz,MPI_Comm comm) {

size = sz;
int comm_size, my_rank;
MPI_Comm_size (comm, &comm_size);
MPI_Comm_rank (comm, &my_rank);
my_elements_begin = size/comm_size*my_rank;
my_elements_end = size/comm_size*(my_rank+1);
elements = new double[my_elements_end-my_elements_begin];
}
};

http://www.dealii.org/ Wolfgang Bangerth

An MPI example: MatVec

struct ParallelSquareMatrix {
unsigned int size;
unsigned int my_rows_begin;
unsigned int my_rows_end;
double *elements;

ParallelSquareMatrix (unsigned int sz,MPI_Comm comm) {

size = sz;
int comm_size, my_rank;
MPI_Comm_size (comm, &comm_size);
MPI_Comm_rank (comm, &my_rank);
my_rows_begin = size/comm_size*my_rank;
my_rows_end = size/comm_size*(my_rank+1);
elements = new double[(my_rows_end-my_rows_begin)*size];
}
};

http://www.dealii.org/ Wolfgang Bangerth

An MPI example: MatVec

What does processor P need:

●
Graphical representation of what P owns:

A x y

●
To compute the locally owned elements of y, processor P
needs all elements of x
http://www.dealii.org/ Wolfgang Bangerth
An MPI example: MatVec

void vmult (A, x, y) {

int comm_size=..., my_rank=...;
for (row_block=0; row_block<comm_size; ++row_block)
if (row_block == my_rank) {
for (col_block=0; col_block<comm_size; ++col_block)
if (col_block == my_rank) {
for (i=A.my_rows_begin; i<A.my_rows_end; ++i)
for (j=A.size/comm_size*col_block; ...)
y.elements[i-y.my_rows_begin] = A[...i,j...] * x[...j...];
} else {
double *tmp = new double[A.size/comm_size];
MPI_Recv (tmp, …, row_block, …);
for (i=A.my_rows_begin; i<A.my_rows_end; ++i)
for (j=A.size/comm_size*col_block; ...)
y.elements[i-y.my_rows_begin] = A[...i,j...] * tmp[...j...];
delete tmp;
}
} else {
MPI_Send (x.elements, …, row_block, …);
}
}

http://www.dealii.org/ Wolfgang Bangerth

An MPI example: MatVec

Analysis of this algorithm

●
We only send data right when we need it:
– receiving processor has to wait
– has nothing to do in the meantime
A better algorithm would:
– send out its data to all other processors
– receive messages as needed (maybe already here)
●
As a general rule:
– send data as soon as possible
– receive it as late as possible
– try to interleave computations between sends/receives

●
We repeatedly allocate/deallocate memory – should set
up buffer only once
http://www.dealii.org/ Wolfgang Bangerth
An MPI example: MatVec

void vmult (A, x, y) {

int comm_size=..., my_rank=...;
for (row_block=0; row_block<comm_size; ++row_block)
if (row_block != my_rank)
MPI_Send (x.elements, …, row_block, …);

col_block = my_rank;
for (i=A.my_rows_begin; i<A.my_rows_end; ++i)
for (j=A.size/comm_size*col_block; ...)
y.elements[i-y.my_rows_begin] = A[...i,j...] * x[...j...];

double *tmp = new double[A.size/comm_size];

for (col_block=0; col_block<comm_size; ++col_block)
if (col_block != my_rank) {
MPI_Recv (tmp, …, row_block, …);
for (i=A.my_rows_begin; i<A.my_rows_end; ++i)
for (j=A.size/comm_size*col_block; ...)
y.elements[i-y.my_rows_begin] = A[...i,j...] * tmp[...j...];
}
delete tmp;
}

http://www.dealii.org/ Wolfgang Bangerth

Message Passing Interface (MPI)

Notes on using MPI:

●
Usually, algorithms need data that resides elsewhere
●
Communication needed

●
Distributed computing lives in the conflict zone between
– trying to keep as much data available locally to avoid
communication
– not creating a memory/CPU bottleneck

●
MPI makes the flow of information explicit
●
Forces programmer to design data structures/algorithms
for communication

●
Typical programs have relatively few MPI calls
http://www.dealii.org/ Wolfgang Bangerth
Message Passing Interface (MPI)

Alternatives to MPI:
●
boost::mpi is nice, but doesn't buy much in practice

●
Partitioned Global Address Space (PGAS) languages like
Co-Array Fortran, UPC, Chapel, X10, …:

Pros:
– offer nicer syntax
– communication is part of the language
Cons:
– typically no concept of “communicators”
– communication is implicit
– encourages poor data structure/algorithm design

http://www.dealii.org/ Wolfgang Bangerth

MATH 676
–
Finite element methods in
scientific computing

Wolfgang Bangerth, Texas A&M University

http://www.dealii.org/ Wolfgang Bangerth

CC
88% (16)
CC
46 pages
Learn Multithreading with Modern C++
From Everand
Learn Multithreading with Modern C++
James Raynard
No ratings yet
Distributed-Memory Parallel Programming With MPI: Supervised By: Dr. Shaima Hagras
No ratings yet
Distributed-Memory Parallel Programming With MPI: Supervised By: Dr. Shaima Hagras
20 pages
Message Passing Interface: Parallel Processing Course University of Tehran
No ratings yet
Message Passing Interface: Parallel Processing Course University of Tehran
49 pages
Mpi
No ratings yet
Mpi
67 pages
MiniTool Partition Wizard Crack 12 Key Download Free 2025
No ratings yet
MiniTool Partition Wizard Crack 12 Key Download Free 2025
29 pages
Message Passing Interface (MPI)
No ratings yet
Message Passing Interface (MPI)
4 pages
BIg data anslysi
No ratings yet
BIg data anslysi
57 pages
ECE 1747H: Parallel Programming: Message Passing (MPI)
No ratings yet
ECE 1747H: Parallel Programming: Message Passing (MPI)
67 pages
Introduction MPI - Chap2 - Slide 3
No ratings yet
Introduction MPI - Chap2 - Slide 3
16 pages
An Introduction To MPI: Parallel Programming With The Message Passing Interface
No ratings yet
An Introduction To MPI: Parallel Programming With The Message Passing Interface
48 pages
Mpi Basic Operations
No ratings yet
Mpi Basic Operations
6 pages
Unit Iv Distributed Memory Programming With Mpi
No ratings yet
Unit Iv Distributed Memory Programming With Mpi
19 pages
MPI Pacheco Ch3
No ratings yet
MPI Pacheco Ch3
124 pages
Message Passing and MPI: John Mellor-Crummey
No ratings yet
Message Passing and MPI: John Mellor-Crummey
78 pages
Distributed Memory Programming With: Peter Pacheco
No ratings yet
Distributed Memory Programming With: Peter Pacheco
125 pages
l2
No ratings yet
l2
24 pages
Writing Message Passing Parallel Programs With MPI: Course Notes
No ratings yet
Writing Message Passing Parallel Programs With MPI: Course Notes
80 pages
CSC4005 Tutorial3
No ratings yet
CSC4005 Tutorial3
40 pages
PDC Lecture 16 MPI - Net-New
No ratings yet
PDC Lecture 16 MPI - Net-New
59 pages
Ms. V. Uma Maheswari, Assistant Lecturer, Department of Information Technology, National Institute of Technology, Surathkal
No ratings yet
Ms. V. Uma Maheswari, Assistant Lecturer, Department of Information Technology, National Institute of Technology, Surathkal
91 pages
CS-3006 7 MPI Advanced Topics
No ratings yet
CS-3006 7 MPI Advanced Topics
36 pages
Distributed Memory Programming With MPI: Peter Pacheco
No ratings yet
Distributed Memory Programming With MPI: Peter Pacheco
121 pages
1 MPI Communications: CS424. Parallel Computing Lab#4
No ratings yet
1 MPI Communications: CS424. Parallel Computing Lab#4
30 pages
Asg 03_MPI
No ratings yet
Asg 03_MPI
8 pages
Chapter 4 - Message-Passing Programming, MPI
No ratings yet
Chapter 4 - Message-Passing Programming, MPI
79 pages
MPI (1)
No ratings yet
MPI (1)
57 pages
02 Message Passing Interface Tutorial
No ratings yet
02 Message Passing Interface Tutorial
34 pages
MPI Part2 Updated
No ratings yet
MPI Part2 Updated
20 pages
Message Passing Interface (MPI) Programming
No ratings yet
Message Passing Interface (MPI) Programming
11 pages
Lec5 MPI
No ratings yet
Lec5 MPI
28 pages
Mpi p2
No ratings yet
Mpi p2
51 pages
Introduction to MPI Basics
No ratings yet
Introduction to MPI Basics
8 pages
Message Passing Interface (MPI) : Author: Blaise Barney, Lawrence Livermore National Laboratory
No ratings yet
Message Passing Interface (MPI) : Author: Blaise Barney, Lawrence Livermore National Laboratory
41 pages
MPI: Portable Parallel Programming For Scientific Computing: William Gropp Rusty Lusk Debbie Swider Rajeev Thakur
No ratings yet
MPI: Portable Parallel Programming For Scientific Computing: William Gropp Rusty Lusk Debbie Swider Rajeev Thakur
12 pages
Unit IV
No ratings yet
Unit IV
12 pages
MPI2
No ratings yet
MPI2
3 pages
Message Passing Interface (MPI) Programming
No ratings yet
Message Passing Interface (MPI) Programming
11 pages
Mpi Programming 2
No ratings yet
Mpi Programming 2
57 pages
Cs-3006 6 Mpi Basics 2
No ratings yet
Cs-3006 6 Mpi Basics 2
52 pages
Message Passing Architecture
No ratings yet
Message Passing Architecture
32 pages
in3200-chap09
No ratings yet
in3200-chap09
56 pages
2-MPI
No ratings yet
2-MPI
13 pages
10.collectives I
No ratings yet
10.collectives I
31 pages
Lecture 15 MPI Summarization
No ratings yet
Lecture 15 MPI Summarization
26 pages
Lecture 13-Derived Datatypes in MPI
No ratings yet
Lecture 13-Derived Datatypes in MPI
33 pages
Unit - 3 - My
No ratings yet
Unit - 3 - My
84 pages
PDC Week 11 Synchronization
No ratings yet
PDC Week 11 Synchronization
6 pages
Cluster Lab session 03
No ratings yet
Cluster Lab session 03
9 pages
Lecture 12-MPI Collective Communication
No ratings yet
Lecture 12-MPI Collective Communication
53 pages
VSS-MPI-2
No ratings yet
VSS-MPI-2
23 pages
08_1_MPI_Comm_Data_Distributions
No ratings yet
08_1_MPI_Comm_Data_Distributions
60 pages
Lecture 11 Distributed Memory Programming
No ratings yet
Lecture 11 Distributed Memory Programming
28 pages
Computing LLNL Gov
No ratings yet
Computing LLNL Gov
42 pages
Module 5
No ratings yet
Module 5
9 pages
Module_203_20-_20MPI_20for_20Cluster_20Computing_20Lec
No ratings yet
Module_203_20-_20MPI_20for_20Cluster_20Computing_20Lec
30 pages
Sathish Vadhiyar: Advanced Features of The Message-Passing Interface
No ratings yet
Sathish Vadhiyar: Advanced Features of The Message-Passing Interface
15 pages
[Scientific and Engineering Computation] William Gropp, Ewing L. Lusk, Anthony Skjellum, Rajeev Thakur - Using MPI and Using MPI-2 (1999, The MIT Press)
No ratings yet
[Scientific and Engineering Computation] William Gropp, Ewing L. Lusk, Anthony Skjellum, Rajeev Thakur - Using MPI and Using MPI-2 (1999, The MIT Press)
385 pages
Python for Beginners: An Introduction to Learn Python Programming with Tutorials and Hands-On Examples
From Everand
Python for Beginners: An Introduction to Learn Python Programming with Tutorials and Hands-On Examples
Nathan Metzler
4/5 (2)
The 1 Page Python Book
From Everand
The 1 Page Python Book
Barani Kumar
2/5 (1)
Computer Science: Learn about Algorithms, Cybersecurity, Databases, Operating Systems, and Web Design
From Everand
Computer Science: Learn about Algorithms, Cybersecurity, Databases, Operating Systems, and Web Design
Jonathan Rigdon
No ratings yet
slides.39
No ratings yet
slides.39
21 pages
slides.41.25
No ratings yet
slides.41.25
15 pages
slides.17
No ratings yet
slides.17
5 pages
Lecture Series6 EOQInvModels
No ratings yet
Lecture Series6 EOQInvModels
31 pages
slides.16
No ratings yet
slides.16
14 pages
Lecture Series6 - EOQInvModels
No ratings yet
Lecture Series6 - EOQInvModels
54 pages
Mms Lab
No ratings yet
Mms Lab
2 pages
20 25 29
No ratings yet
20 25 29
70 pages
RGD AccessAbility Handbook 2015 ForWebFINAL-s PDF
No ratings yet
RGD AccessAbility Handbook 2015 ForWebFINAL-s PDF
32 pages
Learn Python 3 - Hello World Cheatsheet - Codecademy
No ratings yet
Learn Python 3 - Hello World Cheatsheet - Codecademy
5 pages
Get Online With CoD Wii 2022 No Homebrew Homebrew and Dolphin
No ratings yet
Get Online With CoD Wii 2022 No Homebrew Homebrew and Dolphin
9 pages
Anomaly Detection in Social Networks Twitter Bot
No ratings yet
Anomaly Detection in Social Networks Twitter Bot
11 pages
LaTeX Book
No ratings yet
LaTeX Book
1 page
15043C WiFi Aquatimer Control Unit User Manual Melnor, .
No ratings yet
15043C WiFi Aquatimer Control Unit User Manual Melnor, .
14 pages
Next Generation ABAP Runtime Analysis (SAT) - Introduction - SAP Blogs
No ratings yet
Next Generation ABAP Runtime Analysis (SAT) - Introduction - SAP Blogs
21 pages
Sigma Control2
No ratings yet
Sigma Control2
5 pages
Idemia National Digital Id Program Kingdom Morocco Case Study 202206
No ratings yet
Idemia National Digital Id Program Kingdom Morocco Case Study 202206
8 pages
SuperPro 610P
No ratings yet
SuperPro 610P
2 pages
Intel Fpga Product Catalog 22 2
No ratings yet
Intel Fpga Product Catalog 22 2
85 pages
Security: Strategies For Securing Distributed Systems
No ratings yet
Security: Strategies For Securing Distributed Systems
64 pages
Laptop Brands
No ratings yet
Laptop Brands
2 pages
Rdbms Query
No ratings yet
Rdbms Query
33 pages
8086 Program To Add Two Decimal Numbers
100% (1)
8086 Program To Add Two Decimal Numbers
1 page
Les 1.3
No ratings yet
Les 1.3
4 pages
Master of Science in Information Systems - Maryland Smith
No ratings yet
Master of Science in Information Systems - Maryland Smith
14 pages
Omega Manual French
No ratings yet
Omega Manual French
23 pages
Chapter 2
No ratings yet
Chapter 2
11 pages
Learn C++ Programming Language
50% (2)
Learn C++ Programming Language
54 pages
Material Bom
No ratings yet
Material Bom
382 pages
Object Detection
No ratings yet
Object Detection
13 pages
CorpusSearch Guide
No ratings yet
CorpusSearch Guide
104 pages
Algorithm Notes Additional Materials
No ratings yet
Algorithm Notes Additional Materials
17 pages
Deepak Cs Project
No ratings yet
Deepak Cs Project
25 pages
Imran Noor Network Engr..
No ratings yet
Imran Noor Network Engr..
3 pages
VLSI Physical Design - STA Interview Question Part 5
No ratings yet
VLSI Physical Design - STA Interview Question Part 5
2 pages
CTOOD
No ratings yet
CTOOD
16 pages

slides.41

Uploaded by

slides.41

Uploaded by

MATH 676

Wolfgang Bangerth, Texas A&M University

http://www.dealii.org/ Wolfgang Bangerth

Part 1: Introduction to MPI

http://www.dealii.org/ Wolfgang Bangerth

In the previous lecture:

http://www.dealii.org/ Wolfgang Bangerth

For examples, see several of the step-xx programs and the

There are many ways to do distributed memory

http://www.dealii.org/ Wolfgang Bangerth

MPI's model is simple:

Mental model: Sending letters through the mail system

http://www.dealii.org/ Wolfgang Bangerth

MPI's model implies:

MPI's model implies:

This is bothersome to program. However:

http://www.dealii.org/ Wolfgang Bangerth

MPI's bottom layer:

Example (send on process 2 to process 13):

http://www.dealii.org/ Wolfgang Bangerth

MPI's bottom layer:

Example (query for data from process 13):

Note: One can also specify “anywhere”/”any tag”.

MPI's bottom layer:

Example (receive on process 13):

Note: One can also specify “anywhere”/”any tag”.

MPI's bottom layer:

http://www.dealii.org/ Wolfgang Bangerth

MPI's higher layers: Collective operations

Note: Collective operations lead to deadlocks if some

http://www.dealii.org/ Wolfgang Bangerth

Example: Barrier use for timing (pseudocode)

std::time_point start = std::now(); // get current time

std::duration local_time = end_local – start;

Note: Different processes will compute different values.

unsigned int my_cells = triangulation.n_locally_owned_cells();

MPI_Reduce (&my_cells, &global_cells, MPI_UNSIGNED, 1,

Note 1: Only the root (processor) gets the result.

unsigned int my_cells = triangulation.n_locally_owned_cells();

MPI_Allreduce (&my_cells, &global_cells, MPI_UNSIGNED, 1,

Note 1: All processors now get the result.

MPI's higher layers: Communicators

http://www.dealii.org/ Wolfgang Bangerth

MPI's higher layers: I/O

http://www.dealii.org/ Wolfgang Bangerth

More information on MPI:

http://www.dealii.org/ Wolfgang Bangerth

http://www.dealii.org/ Wolfgang Bangerth

ParallelVector (unsigned int sz,MPI_Comm comm) {

http://www.dealii.org/ Wolfgang Bangerth

ParallelSquareMatrix (unsigned int sz,MPI_Comm comm) {

http://www.dealii.org/ Wolfgang Bangerth

What does processor P need:

void vmult (A, x, y) {

http://www.dealii.org/ Wolfgang Bangerth

Analysis of this algorithm

void vmult (A, x, y) {

double *tmp = new double[A.size/comm_size];

http://www.dealii.org/ Wolfgang Bangerth

Notes on using MPI:

http://www.dealii.org/ Wolfgang Bangerth

Wolfgang Bangerth, Texas A&M University

http://www.dealii.org/ Wolfgang Bangerth

You might also like