MPI Lecture
Message Passing Interface (MPI)
Irish Centre for High-End Computing
(ICHEC)
www.ichec.ie
Acknowledgments
This course is based on the MPI course developed by Rolf Rabenseifner at the High-Performance Computing Center Stuttgart (HLRS), University of Stuttgart, in collaboration with the EPCC Training and Education Centre, Edinburgh Parallel Computing Centre, University of Edinburgh.
DAY 1
1.-3. MPI overview; process model and language bindings (MPI_Init(), MPI_Comm_rank()); messages and point-to-point communication

DAY 2
4. Non-blocking communication
5. Collective communication
Coffee/Tea break
12:30 Lunch
DAY 3
6. Virtual topologies
7. Derived datatypes
8. Case study
Coffee/Tea break
http://www.epcc.ed.ac.uk/computing/training/document_archive/mpicourse/mpi-course.pdf
(Standard MPI-2)
~course00/MPI-I/examples
Note: The examples of a chapter are only readable after the end of the practical of that
chapter.
[Figure: a processor may run many processes; each process executes a program on its own data in its own memory]
[Figure: distributed-memory parallel processors, each running a program on its own data, connected by a communication network]
[Figure: message-passing model - processes with myrank = 0 ... (size-1), each running the program on its own data and communicating over the network]
What is SPMD?
Single Program, Multiple Data
Same (sub-)program runs on each processor
MPI also allows MPMD, i.e., Multiple Program, Multiple Data
  but some vendors may be restricted to SPMD
  MPMD can be emulated with SPMD
Emulation of MPMD
C:
   int main(int argc, char **argv){
     if (myrank < .... /* process should run the ocean model */){
       ocean( /* arguments */ );
     }else{
       weather( /* arguments */ );
     }
   }

Fortran:
   PROGRAM
     IF (myrank < ... ) THEN !! process should run the ocean model
       CALL ocean ( some arguments )
     ELSE
       CALL weather ( some arguments )
     ENDIF
   END
Message passing
Messages are packets of data moving between sub-programs
Necessary information for the message passing system:
  sending process and receiving process, i.e., the ranks
  source location, source data type, source data size
  destination location, destination data type, destination buffer size
Access
A sub-program needs to be connected to a
message passing system
A message passing system is similar to:
phone line
mail box
fax machine
etc.
MPI:
program must be linked with an MPI library
program must be started with the MPI startup tool
Point-to-Point
Communication
Simplest form of message passing.
One process sends a message to
another.
Different types of point-to-point
communication:
synchronous send
buffered = asynchronous send
Synchronous Sends
The sender gets confirmation that the message has been received.
Analogous to the beep or okay-sheet of a fax.
Buffered = Asynchronous
Sends
The sender only knows when the message has left (not whether it has been received).
Blocking Operations
Some sends/receives may block
until another process acts:
synchronous send operation blocks
until receive is issued;
receive operation blocks until
message is sent.
Non-Blocking
Operations
Non-blocking operations return immediately; the communication proceeds in the background and its completion is checked later (e.g., with MPI_Wait).
Collective
Communications
Collective communication routines are
higher level routines.
Several processes are involved at a
time.
May allow optimized internal implementations, e.g., tree-based algorithms.
Broadcast
A one-to-many
communication.
Reduction Operations
Combine data from several processes
to produce a single result.
[Figure: values 200, 15, 10, 300, 30 held by different processes are combined into a single sum]
Barriers
Synchronize processes.
[Figure: processes wait at the barrier until all have arrived ("all here?")]
MPI Forum
The MPI standard is defined by the MPI Forum (MPI-1 Forum, later the MPI-2 Forum).
MPI also offers:
  A great deal of functionality.
  Support for heterogeneous parallel architectures.
With MPI-2:
  Important additional functionality.
  No changes to MPI-1.
Course outline:
1. MPI Overview
2. Process model and language bindings (MPI_Init(), MPI_Comm_rank())
3. Messages and point-to-point communication
4. Non-blocking communication
5. Collective communication
6. Virtual topologies
7. Derived datatypes
8. Case study
Header files
C
#include <mpi.h>
Fortran
include 'mpif.h'
Fortran:
  CALL MPI_XXXXXX( parameter, ..., IERROR )
  Never forget IERROR!
MPI routines are defined in the standard in a language-independent form, with bindings in several programming languages (C, Fortran, C++ [in MPI-2]).
Output arguments in C: the definition in the standard lists a pointer parameter; the usage in your code passes the address of a variable (e.g., &rank).
Initializing MPI
C:       int MPI_Init( int *argc, char ***argv)
         #include <mpi.h>
         int main(int argc, char **argv)
         {
           MPI_Init(&argc, &argv);
           ....

Fortran: MPI_INIT( IERROR )
         INTEGER IERROR
         program xxxxx
         implicit none
         include 'mpif.h'
         integer ierror
         call MPI_Init(ierror)
         ....

Must be the first MPI routine that is called.
Communicator MPI_COMM_WORLD
All processes of an MPI program are members of
the default communicator MPI_COMM_WORLD.
MPI_COMM_WORLD is a predefined handle in
mpi.h and mpif.h.
Each process has its own rank in a communicator:
starting with 0
ending with (size-1)
[Figure: the communicator MPI_COMM_WORLD containing all processes, with ranks 0, 1, 2, 3, 4, ...]
Handles
Handles identify MPI objects.
For the programmer, handles are
predefined constants in mpi.h or mpif.h
example: MPI_COMM_WORLD
predefined values exist only after MPI_Init
was called
or values returned by some MPI routines, to be stored in variables that are declared as
in Fortran: INTEGER
in C: special MPI typedefs
Handles refer to internal MPI data structures
Rank
The rank identifies different processes within a
communicator
The rank is the basis for any work and data distribution.
C: int MPI_Comm_rank( MPI_Comm comm, int *rank)
Fortran:  MPI_COMM_RANK( comm, rank, ierror)
          INTEGER comm, rank, ierror
[Figure: processes with myrank = 0, 1, 2, ..., (size-1)]
Size
How many processes are contained
within a communicator?
C:        int MPI_Comm_size( MPI_Comm comm, int *size)
Fortran:  MPI_COMM_SIZE( COMM, SIZE, IERROR)
          INTEGER COMM, SIZE, IERROR
Exiting MPI
C: int MPI_Finalize()
Fortran: MPI_FINALIZE( ierror )
INTEGER ierror
Must be called last by all processes.
After MPI_Finalize, no other MPI routines may be called.
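Putting the routines of this chapter together, a minimal sketch of a complete SPMD program (an assumed example, not one of the course examples) could look like this:

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int my_rank, size;

      MPI_Init(&argc, &argv);                   /* must be called first  */
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);  /* rank within the comm. */
      MPI_Comm_size(MPI_COMM_WORLD, &size);     /* number of processes   */

      printf("Hello from rank %d of %d\n", my_rank, size);

      MPI_Finalize();                           /* must be called last   */
      return 0;
  }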
1. MPI Overview
2. Process model and language bindings (MPI_Init(), MPI_Comm_rank())
3. Messages and point-to-point communication
4. Non-blocking communication
5. Collective communication
6. Virtual topologies
7. Derived datatypes
8. Case study
Messages
A message contains a number of elements of
some particular datatype.
MPI datatypes:
Basic datatypes
Derived datatypes
[Figure: a message as a sequence of elements, e.g., 654, 96574, -12, 7676]
MPI Datatype          C datatype
MPI_CHAR              signed char
MPI_SHORT             signed short int
MPI_INT               signed int
MPI_LONG              signed long int
MPI_UNSIGNED_CHAR     unsigned char
MPI_UNSIGNED_SHORT    unsigned short int
MPI_UNSIGNED          unsigned int
MPI_UNSIGNED_LONG     unsigned long int
MPI_FLOAT             float
MPI_DOUBLE            double
MPI_LONG_DOUBLE       long double
MPI_BYTE
MPI_PACKED
MPI Datatype            Fortran datatype
MPI_INTEGER             INTEGER
MPI_REAL                REAL
MPI_DOUBLE_PRECISION    DOUBLE PRECISION
MPI_COMPLEX             COMPLEX
MPI_LOGICAL             LOGICAL
MPI_CHARACTER           CHARACTER(1)
MPI_BYTE
MPI_PACKED

Example:  INTEGER arr(5)
          count=5, datatype=MPI_INTEGER
[Figure: a message of 5 integers, e.g., 2345, 654, 96574, -12, 7676]
Point-to-Point Communication
Communication between two processes.
Source process sends message to destination process.
Communication takes place within a communicator,
e.g., MPI_COMM_WORLD.
Processes are identified by their ranks in the
communicator.
[Figure: within a communicator, the source process sends a message to the destination process]
Sending a Message
C:  int MPI_Send(void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
Fortran:  MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)
          <type> BUF(*)
          INTEGER COUNT, DATATYPE, DEST, TAG, COMM, IERROR
buf is the starting point of the message with count elements,
each described with datatype.
dest is the rank of the destination process within the
communicator comm.
tag is an additional nonnegative integer piggybacked onto the message.
The tag can be used by the program to distinguish different types of messages.
Receiving a Message
C:  int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
                 int source, int tag, MPI_Comm comm,
                 MPI_Status *status)
Fortran:  MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG,
                   COMM, STATUS, IERROR)
          <type> BUF(*)
          INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
          INTEGER STATUS(MPI_STATUS_SIZE), IERROR
buf/count/datatype describe the receive buffer.
Receiving the message sent by process with rank source in
comm.
Envelope information is returned in status.
Only messages with matching tag are received.
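A minimal sketch (an assumed example, not from the course) combining the two calls: rank 0 sends one integer to rank 1.

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int my_rank, value;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

      if (my_rank == 0) {
          value = 42;
          MPI_Send(&value, 1, MPI_INT, 1, 17, MPI_COMM_WORLD);
      } else if (my_rank == 1) {
          MPI_Recv(&value, 1, MPI_INT, 0, 17, MPI_COMM_WORLD, &status);
          printf("rank 1 received %d\n", value);
      }

      MPI_Finalize();
      return 0;
  }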
Wildcards
Receiver can wildcard.
To receive from any source:  source = MPI_ANY_SOURCE
To receive with any tag:     tag = MPI_ANY_TAG
The actual source and tag are returned in the status argument.
Communication
Envelope
Envelope information is
returned from MPI_RECV in
status.
C:       status.MPI_SOURCE
         status.MPI_TAG
         count via MPI_Get_count()
Fortran: status(MPI_SOURCE)
         status(MPI_TAG)
         count via MPI_GET_COUNT()
[Figure: message envelope - the destination rank ("To:"), and the message body item-1 ... item-n with count elements]
Communication Modes
Send communication modes:
  synchronous send               MPI_SSEND
  buffered (asynchronous) send   MPI_BSEND
  standard send                  MPI_SEND
  ready send                     MPI_RSEND
Communication Modes
Sender mode         Definition                                     Notes
Synchronous send    Completes only when the matching receive       risk of deadlock, serialization and
MPI_SSEND           has started                                    idle waiting; high latency / best bandwidth
Buffered send       Always completes (unless an error occurs),
MPI_BSEND           irrespective of receiver
Standard send       Either synchronous or buffered,
MPI_SEND            depending on the implementation
Ready send          May be started only if the matching            highly dangerous!
MPI_RSEND           receive is already posted
Receive             Completes when a message has arrived
MPI_RECV
Message Order
Preservation
Rule for messages on the same connection, i.e., same communicator, same source, same destination:
  messages do not overtake each other.
[Figure: messages between the same pair of processes arrive in the order they were sent]
[Figure: ping-pong between P0 and P1 over time - P0 sends "ping" to P1, P1 answers with "pong"]
[Figure: rank 0 sends with dest=1, tag=17 and receives with source=1, tag=23; rank 1 receives with source=0 and sends back with dest=0]

if (my_rank==0)    /* i.e., emulated multiple program */
    MPI_Send( ... dest=1 ...)
    MPI_Recv( ... source=1 ...)
else
    MPI_Recv( ... source=0 ...)
    MPI_Send( ... dest=0 ...)
fi
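A minimal C sketch of this ping-pong pattern (assumed example; the tags 17 and 23 follow the figure, everything else is illustrative):

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int my_rank, msg = 0;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

      if (my_rank == 0) {
          MPI_Send(&msg, 1, MPI_INT, 1, 17, MPI_COMM_WORLD);          /* ping */
          MPI_Recv(&msg, 1, MPI_INT, 1, 23, MPI_COMM_WORLD, &status); /* pong */
          printf("rank 0 got the pong back\n");
      } else if (my_rank == 1) {
          MPI_Recv(&msg, 1, MPI_INT, 0, 17, MPI_COMM_WORLD, &status);
          MPI_Send(&msg, 1, MPI_INT, 0, 23, MPI_COMM_WORLD);
      }

      MPI_Finalize();
      return 0;
  }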
Chap.4 Non-blocking Communication

1. MPI Overview
2. Process model and language bindings (MPI_Init(), MPI_Comm_rank())
3. Messages and point-to-point communication
4. Non-blocking communication
5. Collective communication
6. Virtual topologies
7. Derived datatypes
8. Case study
Deadlock
Code in each MPI process:
  MPI_Ssend( ..., right_rank, ... )
  MPI_Recv( ..., left_rank, ... )
Every process waits in MPI_Ssend until its right neighbor posts the receive: deadlock.
[Figure: processes arranged in a ring, each sending to its right neighbor]
Non-Blocking
Communications
Initiate the communication, do some work in the meantime (latency hiding), then wait for completion.
Non-Blocking Examples
Non-blocking send
MPI_Isend(...)
doing some other work
MPI_Wait(...)
Non-blocking receive
MPI_Irecv(...)
doing some other work
MPI_Wait(...)
Non-Blocking Send
Initiate non-blocking send
in the ring example: Initiate non-blocking send to the right
neighbor
Do some work:
in the ring example: Receiving the message from left
neighbor
[Figure: ring of processes - each posts a non-blocking send to the right and then receives from the left]
Non-Blocking Receive
Initiate non-blocking receive
  in the ring example: initiate a non-blocking receive from the left neighbor
Do some work:
  in the ring example: sending the message to the right neighbor
Then wait for the non-blocking receive to complete.
[Figure: ring of processes]
Request Handles
Request handles
are used for non-blocking communication
must be stored in local variables
C: MPI_Request
Fortran: INTEGER
the value
is generated by a non-blocking
communication routine
is used (and freed) in the MPI_WAIT routine
Non-blocking Synchronous
Send
C:
  MPI_Issend(buf, count, datatype, dest, tag, comm, OUT &request_handle);
  MPI_Wait(INOUT &request_handle, &status);
Fortran:
  CALL MPI_ISSEND(buf, count, datatype, dest, tag, comm, OUT request_handle, ierror)
  CALL MPI_WAIT(INOUT request_handle, status, ierror)
buf must not be used between Issend and Wait (in all programming languages)
  [MPI 1.1, page 40, lines 44-45]
Non-blocking Receive
C:
  MPI_Irecv(buf, count, datatype, source, tag, comm, OUT &request_handle);
  MPI_Wait(INOUT &request_handle, &status);
Fortran:
  CALL MPI_IRECV(buf, count, datatype, source, tag, comm, OUT request_handle, ierror)
  CALL MPI_WAIT(INOUT request_handle, status, ierror)
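A minimal sketch (assumed example) that overlaps a non-blocking receive from the left neighbor with a non-blocking synchronous send to the right neighbor, as in the ring exercise:

  #include <mpi.h>

  void ring_step(int my_rank, int size, int snd_buf, int *rcv_buf)
  {
      int right = (my_rank + 1) % size;
      int left  = (my_rank - 1 + size) % size;
      MPI_Request rq_recv, rq_send;
      MPI_Status  status;

      MPI_Irecv(rcv_buf, 1, MPI_INT, left, 0, MPI_COMM_WORLD, &rq_recv);
      MPI_Issend(&snd_buf, 1, MPI_INT, right, 0, MPI_COMM_WORLD, &rq_send);

      /* ... other work could be done here (latency hiding) ... */

      MPI_Wait(&rq_send, &status);   /* snd_buf may be reused afterwards  */
      MPI_Wait(&rq_recv, &status);   /* rcv_buf now contains the message  */
  }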
Blocking and non-blocking:
Send and receive can each be blocking or non-blocking.
A blocking send can be used with a non-blocking receive, and vice-versa.
Non-blocking sends can use any mode:
  standard      MPI_ISEND
  synchronous   MPI_ISSEND
  buffered      MPI_IBSEND
  ready         MPI_IRSEND
Completion
C:
  int MPI_Wait(MPI_Request *request, MPI_Status *status)
  int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status)
Fortran:
  CALL MPI_WAIT( request_handle, status, ierror)
  CALL MPI_TEST( request_handle, flag, status, ierror)
One must WAIT, or loop with TEST until the request is completed, i.e., flag == 1 or .TRUE.
Multiple Non-Blocking
Communications
You have several request handles:
Wait or test for completion of one message:   MPI_Waitany / MPI_Testany
Wait or test for completion of all messages:  MPI_Waitall / MPI_Testall
Wait or test for completion of as many messages as possible:  MPI_Waitsome / MPI_Testsome
[Figure: ring exercise, initialization - each process stores its my_rank in snd_buf and sets sum = 0]
Use the non-blocking synchronous send to avoid deadlocks and to verify the correctness of the program, because a blocking synchronous send would cause a deadlock.
Initialization: each process puts its rank into snd_buf; sum = 0.
Each iteration: pass snd_buf around the ring and add the received value to sum.
Fortran:  dest   = mod(my_rank+1, size)
          source = mod(my_rank-1+size, size)
C:        dest   = (my_rank+1) % size;
          source = (my_rank-1+size) % size;
Single Program !!!
[Figure: per-iteration flow of snd_buf, rcv_buf and sum around the ring]
Advanced Exercises
Irecv instead of Issend
Substitute the Issend-Recv-Wait scheme with the Irecv-Ssend-Wait scheme in your ring program.
Or
Substitute the Issend-Recv-Wait scheme with the Irecv-Issend-Waitall scheme in your ring program.
Chap.5 Collective
Communication
1. MPI Overview
2. Process model and language bindings (MPI_Init(), MPI_Comm_rank())
3. Messages and point-to-point communication
4. Non-blocking communication
5. Collective communication, e.g., broadcast
6. Virtual topologies
7. Derived datatypes
8. Case study
Collective
Communication
Characteristics of Collective
Communication
Optimised Communication routines
involving a group of processes
Collective action over a communicator,
i.e. all processes must call the collective
routine.
Synchronization may or may not occur.
All collective operations are blocking.
No tags.
Receive buffers must have exactly the
same size as send buffers.
Barrier Synchronization
C: int MPI_Barrier(MPI_Comm comm)
Fortran:
MPI_BARRIER(COMM, IERROR)
INTEGER COMM, IERROR
MPI_Barrier is normally never needed: all synchronization is done automatically by the data communication (a process cannot continue before it has received the data it needs).
Broadcast
C:  int MPI_Bcast(void *buf, int count, MPI_Datatype datatype,
                  int root, MPI_Comm comm)
Fortran:  MPI_BCAST(BUF, COUNT, DATATYPE, ROOT, COMM, IERROR)
          <type> BUF(*)
          INTEGER COUNT, DATATYPE, ROOT
          INTEGER COMM, IERROR
[Figure: before the bcast only the root (e.g., root=1) holds the value "red"; after the bcast every process holds "red"]
root is the rank of the sending process (i.e., the root process);
it must be given identically by all processes.
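A minimal sketch (assumed example, run with at least 2 processes): the root reads a value and broadcasts it to all processes in MPI_COMM_WORLD.

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int my_rank, value = 0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

      if (my_rank == 1)        /* root = 1, as in the slide's figure */
          value = 12345;

      MPI_Bcast(&value, 1, MPI_INT, 1, MPI_COMM_WORLD);
      printf("rank %d now has value %d\n", my_rank, value);

      MPI_Finalize();
      return 0;
  }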
Scatter
[Figure: before the scatter the root (e.g., root=1) holds A B C D E; after the scatter process 0 holds A, process 1 holds B, ..., process 4 holds E]
C:  int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                    void *recvbuf, int recvcount, MPI_Datatype recvtype,
                    int root, MPI_Comm comm)
Fortran:  MPI_SCATTER(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF,
                      RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR)
          <type> SENDBUF(*), RECVBUF(*)
          INTEGER SENDCOUNT, SENDTYPE, RECVCOUNT, RECVTYPE
          INTEGER ROOT, COMM, IERROR
Gather
[Figure: before the gather each process holds one item (A, B, C, D, E); after the gather the root (e.g., root=1) holds A B C D E]
C:  int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                   void *recvbuf, int recvcount, MPI_Datatype recvtype,
                   int root, MPI_Comm comm)
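A minimal sketch (assumed example) combining the two calls: root 0 scatters one integer to each process, each process doubles its value, and the results are gathered back.

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int my_rank, size, item, i;
      int sendbuf[64], recvbuf[64];          /* assumes size <= 64 */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      if (my_rank == 0)
          for (i = 0; i < size; i++) sendbuf[i] = i;   /* data to distribute */

      MPI_Scatter(sendbuf, 1, MPI_INT, &item, 1, MPI_INT, 0, MPI_COMM_WORLD);
      item *= 2;                                       /* local work */
      MPI_Gather(&item, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);

      if (my_rank == 0)
          for (i = 0; i < size; i++) printf("recvbuf[%d] = %d\n", i, recvbuf[i]);

      MPI_Finalize();
      return 0;
  }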
Global Reduction
Operations
To perform a global reduce operation across all members of a
group.
d0 o d1 o d2 o ... o d(s-2) o d(s-1)
  di = data in the process with rank i (a single variable, or a vector)
  o  = associative operation
Example:
global sum or product
global maximum or minimum
global user-defined operation
Example of Global
Reduction
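A minimal sketch (assumed example) of such a global reduction: every process contributes its rank and the root (rank 0) receives the global sum via MPI_Reduce.

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int my_rank, sum;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

      MPI_Reduce(&my_rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

      if (my_rank == 0)
          printf("sum of all ranks = %d\n", sum);

      MPI_Finalize();
      return 0;
  }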
Predefined reduction operation handles:
MPI_MAX       Maximum
MPI_MIN       Minimum
MPI_SUM       Sum
MPI_PROD      Product
MPI_LAND      Logical AND
MPI_BAND      Bitwise AND
MPI_LOR       Logical OR
MPI_BOR       Bitwise OR
MPI_LXOR      Logical exclusive OR
MPI_BXOR      Bitwise exclusive OR
MPI_MAXLOC    Maximum and location of the maximum
MPI_MINLOC    Minimum and location of the minimum
MPI_REDUCE
[Figure: before MPI_REDUCE each process holds its inbuf, e.g., (A B C), (D E F), (G H I), (J K L), (M N O); after the call only the root (root=1) holds the result, whose first element is AoDoGoJoM]
User-Defined Reduction
Operations
Operator handles can be predefined (see table above) or user-defined.
A user-defined operation must be associative.
The user-defined function must perform the operation vector_A o vector_B, storing the result in vector_B (the inout argument).
The syntax of the user-defined function is defined in the MPI-1 standard.
Example
  typedef struct {
      double real, imag;
  } Complex;

  /* the user-defined function: element-wise complex product,
     result stored in inout */
  void myProd( Complex *in, Complex *inout, int *len, MPI_Datatype *dptr )
  {
      int i;
      Complex c;
      for (i=0; i< *len; ++i) {
          c.real = inout->real*in->real - inout->imag*in->imag;
          c.imag = inout->real*in->imag + inout->imag*in->real;
          *inout = c;
          in++; inout++;
      }
  }

  /* creating the datatype and operation handles */
  MPI_Datatype ctype;
  MPI_Op myOp;
  MPI_Type_contiguous( 2, MPI_DOUBLE, &ctype );
  MPI_Type_commit( &ctype );
  MPI_Op_create( myProd, 1 /* commute = true */, &myOp );
Variants of Reduction
Operations
MPI_ALLREDUCE
  no root; the result is returned in all processes
MPI_REDUCE_SCATTER
  the result vector of the reduction operation is scattered across the processes into their result buffers
MPI_SCAN
  prefix reduction; the result at the process with rank i is the reduction of the inbuf values from rank 0 to rank i
MPI_ALLREDUCE
[Figure: before MPI_ALLREDUCE each process holds its inbuf (A B C), (D E F), (G H I), (J K L), (M N O); after the call every process holds the result, whose first element is AoDoGoJoM]
MPI_SCAN
[Figure: before MPI_SCAN each process holds its inbuf; after the call the first result element is A on rank 0, AoD on rank 1, AoDoG on rank 2, AoDoGoJ on rank 3, AoDoGoJoM on rank 4 - done in parallel]
Exercise Global
reduction
Rewrite the pass-around-the-ring program so that it uses an MPI global reduction to carry out its global sum.
Starting points:
  ~course00/MPI-I/examples/fortran/ring.f
  ~course00/MPI-I/examples/c/ring.c
[Figure: prefix sums on ranks 0-4: sum=0, sum=1, sum=3, sum=6, sum=10]
Chap.6 Virtual Topologies

1. MPI Overview
2. Process model and language bindings (MPI_Init(), MPI_Comm_rank())
3. Messages and point-to-point communication
4. Non-blocking communication
5. Collective communication
6. Virtual topologies
7. Derived datatypes
8. Case study
Motivations
[Figure: the processes 0 ... 5 of a communicator split into an Odd_group and an Even_group]
MPI_Comm_group(MPI_COMM_WORLD, &Old_group)
MPI_Group_incl(Old_group, 3, Odd_ranks, &Odd_group)
MPI_Group_incl(Old_group, 3, Even_ranks, &Even_group)
Alternatively:
  color = modulo(myrank, 2)
  MPI_Comm_split(MPI_COMM_WORLD, color, key, &newcomm)
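A minimal sketch (assumed example) of the MPI_Comm_split alternative: split MPI_COMM_WORLD into an "even" and an "odd" communicator and query the new rank in each.

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int world_rank, color, new_rank;
      MPI_Comm newcomm;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

      color = world_rank % 2;                /* 0 = even group, 1 = odd group */
      MPI_Comm_split(MPI_COMM_WORLD, color, world_rank /* key */, &newcomm);
      MPI_Comm_rank(newcomm, &new_rank);

      printf("world rank %d has rank %d in the %s communicator\n",
             world_rank, new_rank, color ? "odd" : "even");

      MPI_Comm_free(&newcomm);
      MPI_Finalize();
      return 0;
  }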
Group Management
Group Accessors
MPI_Group_size()
MPI_Group_rank()
Group Constructors
MPI_COMM_GROUP()
MPI_GROUP_INCL()
MPI_GROUP_EXCL()
Group Destructors
MPI_GROUP_FREE(group)
Communicator Management
Communicator Accessors
MPI_COMM_SIZE()
MPI_COMM_RANK()
Communicator Constructors
MPI_COMM_CREATE()
MPI_COMM_SPLIT()
Communicator Destructors
MPI_COMM_FREE(comm)
Virtual topology
For more complex mappings, MPI routines are available.
Global array A(1:3000, 1:4000, 1:500) = 6*10^9 words
  distributed on 3 x 4 x 5 = 60 processors
  process coordinates 0..2, 0..3, 0..4
  example: on the process with coordinates ic0=2, ic1=0, ic2=3 (rank=43)
    the decomposition is, e.g., A(2001:3000, 1:1000, 301:400) = 0.1*10^9 words
Process coordinates: handled with virtual Cartesian topologies
Array decomposition: handled by the application program directly
Graphical representation
[Figure: distribution of the 60 processes over the 3 x 4 x 5 grid and the corresponding distribution of the global array]
Coordinate (2, 0, 3) represents process number 43; it is assigned the cube A(2001:3000, 1:1000, 301:400).
Virtual Topologies
Convenient process naming.
Simplifies writing of code.
Can allow MPI to optimize
communications.
Topology Types
Cartesian Topologies
each process is connected to its neighbor in a virtual
grid,
boundaries can be cyclic, or not,
processes are identified by Cartesian coordinates,
of course,
communication between any two processes is still
allowed.
Graph Topologies
general graphs,
not covered here.
Fortran:  MPI_CART_CREATE(COMM_OLD, NDIMS, DIMS, PERIODS,
                          REORDER, COMM_CART, IERROR)
          INTEGER COMM_OLD, NDIMS, DIMS(*)
          LOGICAL PERIODS(*), REORDER
          INTEGER COMM_CART, IERROR
Example:
  comm_old = MPI_COMM_WORLD
  ndims    = 2
  dims     = ( 4, 3 )
  periods  = ( 1/.true., 0/.false. )
  reorder  = see next slide
[Figure: the resulting 4 x 3 grid; ranks 0..11 with Cartesian coordinates (0,0) ... (3,2)]
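A minimal C sketch (assumed example, needs at least 12 processes): create the 4 x 3 cylinder from the slides, cyclic in the first dimension only, and query the own coordinates.

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int dims[2]    = {4, 3};
      int periods[2] = {1, 0};      /* cyclic in dim 0, not in dim 1 */
      int reorder    = 1;
      int my_rank, coords[2];
      MPI_Comm comm_cart;

      MPI_Init(&argc, &argv);
      MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, reorder, &comm_cart);

      if (comm_cart != MPI_COMM_NULL) {   /* surplus processes get MPI_COMM_NULL */
          MPI_Comm_rank(comm_cart, &my_rank);
          MPI_Cart_coords(comm_cart, my_rank, 2, coords);
          printf("rank %d has coordinates (%d,%d)\n", my_rank, coords[0], coords[1]);
          MPI_Comm_free(&comm_cart);
      }

      MPI_Finalize();
      return 0;
  }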
Example: A 2-dimensional Cylinder
Ranks and Cartesian process coordinates in comm_cart:
[Figure: the 4 x 3 cylinder - 0 (0,0), 1 (0,1), 2 (0,2), 3 (1,0), 4 (1,1), 5 (1,2), 6 (2,0), 7 (2,1), 8 (2,2), 9 (3,0), 10 (3,1), 11 (3,2); cyclic in the first dimension]
Cartesian Mapping Functions
Mapping ranks to process grid coordinates:
C:  int MPI_Cart_coords(MPI_Comm comm_cart, int rank, int maxdims,
                        int *coords)
Fortran:  MPI_CART_COORDS(COMM_CART, RANK, MAXDIMS, COORDS, IERROR)
          INTEGER COMM_CART, RANK, MAXDIMS, COORDS(*), IERROR
[Figure: e.g., rank 7 maps to coordinates (2,1)]
Cartesian Mapping Functions
Mapping process grid coordinates to ranks:
C:  int MPI_Cart_rank(MPI_Comm comm_cart, int *coords, int *rank)
Fortran:  MPI_CART_RANK(COMM_CART, COORDS, RANK, IERROR)
          INTEGER COMM_CART, COORDS(*)
          INTEGER RANK, IERROR
[Figure: e.g., coordinates (2,1) map to rank 7]
Own coordinates
[Figure: the 4 x 3 grid of ranks and coordinates; MPI_Cart_rank maps coordinates to ranks, MPI_Cart_coords maps ranks to coordinates]
Each process can obtain its own coordinates by passing its own rank to MPI_Cart_coords.
Cartesian Mapping Functions
Computing the ranks of the neighboring processes:
C:  int MPI_Cart_shift(MPI_Comm comm_cart, int direction, int disp,
                       int *rank_source, int *rank_dest)
Fortran:  MPI_CART_SHIFT(COMM_CART, DIRECTION, DISP,
                         RANK_SOURCE, RANK_DEST, IERROR)
          INTEGER COMM_CART, DIRECTION, DISP
          INTEGER RANK_SOURCE, RANK_DEST, IERROR
MPI_Cart_shift Example
[Figure: on the 4 x 3 grid, e.g., a shift with direction=0 and disp=1 seen from process 7 (2,1) gives rank_source = 4 (1,1) and rank_dest = 10 (3,1)]
Cartesian Partitioning
Cut a grid up into slices.
A new communicator is produced for each slice.
Each slice can then perform its own collective
communications.
C:  int MPI_Cart_sub(MPI_Comm comm_cart, int *remain_dims,
                     MPI_Comm *comm_slice)
Fortran:  MPI_CART_SUB( COMM_CART, REMAIN_DIMS, COMM_SLICE, IERROR)
          INTEGER COMM_CART
          LOGICAL REMAIN_DIMS(*)
          INTEGER COMM_SLICE, IERROR
[Figure: the 4 x 3 grid of ranks and coordinates being cut into slices]
MPI_Cart_sub Example
[Figure: with remain_dims = (true, false) the 4 x 3 grid is cut into 3 column slices; e.g., processes 0 (0,0), 3 (1,0), 6 (2,0), 9 (3,0) form one slice and get new ranks 0 (0), 1 (1), 2 (2), 3 (3) in comm_slice]
Chap.7 Derived Datatypes

1. MPI Overview
2. Process model and language bindings (MPI_Init(), MPI_Comm_rank())
3. Messages and point-to-point communication
4. Non-blocking communication
5. Collective communication
6. Virtual topologies
7. Derived datatypes
8. Case study
MPI Datatypes
Description of the memory layout
of the buffer
for sending
for receiving
Basic types
Derived types
Vectors, structs, others
Built from existing datatypes
struct buff_layout {
    int    i_val[3];
    double d_val[5];
} buffer;

array_of_types[0]         = MPI_INT;
array_of_blocklengths[0]  = 3;
array_of_displacements[0] = 0;
array_of_types[1]         = MPI_DOUBLE;
array_of_blocklengths[1]  = 5;
array_of_displacements[1] = ...;   /* byte offset of d_val within the struct */
MPI_Type_struct(2, array_of_blocklengths, array_of_displacements,
                array_of_types, &buff_datatype);
MPI_Type_commit(&buff_datatype);

MPI_Send(&buffer, 1, buff_datatype, ...)
  &buffer = the start address of the data
[Figure: memory layout chosen by the compiler - 3 int followed by 5 double]
Derived Datatypes - Type Maps
  basic datatype 0      displacement of datatype 0
  basic datatype 1      displacement of datatype 1
  ...                   ...
  basic datatype n-1    displacement of datatype n-1
Example:
[Figure: a buffer containing a char, two ints, and a double at byte offsets 0, 4, 8, 16]
  basic datatype   MPI_CHAR   MPI_INT   MPI_INT   MPI_DOUBLE
  displacement     0          4         8         16
Contiguous Data
The simplest derived datatype
Consists of a number of contiguous items of the same datatype
[Figure: newtype = count contiguous copies of oldtype]
C:  int MPI_Type_contiguous(int count, MPI_Datatype oldtype,
                            MPI_Datatype *newtype)
Vector Datatype
[Figure: newtype built from oldtype - 2 blocks of 3 elements, with a stride of 5 elements between the block starts]
  count       = 2 blocks
  blocklength = 3 elements per block
  stride      = 5 (element stride between blocks)
C:  int MPI_Type_vector(int count, int blocklength, int stride,
                        MPI_Datatype oldtype, MPI_Datatype *newtype)
Fortran:  MPI_TYPE_VECTOR(COUNT, BLOCKLENGTH, STRIDE,
                          OLDTYPE, NEWTYPE, IERROR)
          INTEGER COUNT, BLOCKLENGTH, STRIDE
          INTEGER OLDTYPE, NEWTYPE, IERROR
MPI_TYPE_VECTOR: An Example
Sending the first row of an N x M matrix (here 4 x 5)
[Figure: memory layout of the matrix in C (row-major) and in Fortran (column-major)]
Fortran (the row is strided in memory):
  MPI_Type_vector(5, 1, 4, MPI_INT, MPI_ROW)
  MPI_Type_commit(MPI_ROW)
  MPI_Send(buf, ..., MPI_ROW, ...)
  MPI_Recv(buf, ..., MPI_ROW, ...)
Sending the first column of the matrix
C (the column is strided in memory):
  MPI_Type_vector(4, 1, 5, MPI_INT, MPI_COL)
Fortran (the column is contiguous in memory):
  MPI_Type_vector(1, 4, 1, MPI_INT, MPI_COL)
MPI_Type_commit(MPI_COL)
MPI_Send(buf, ..., MPI_COL, ...)
MPI_Recv(buf, ..., MPI_COL, ...)
Sending a sub-matrix (Fortran):
  MPI_Type_vector(3, 2, 4, MPI_INT, MPI_SUBMAT)
  MPI_Type_commit(MPI_SUBMAT)
  MPI_Send(buf, ..., MPI_SUBMAT, ...)
  MPI_Recv(buf, ..., MPI_SUBMAT, ...)
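A complete minimal sketch of the column case in C (assumed example): send column 2 of a 4 x 5 integer matrix from rank 0 to rank 1 with MPI_Type_vector.

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int a[4][5], my_rank, i, j;
      MPI_Datatype col_type;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

      /* 4 blocks of 1 int, stride 5 ints: one column of the row-major matrix */
      MPI_Type_vector(4, 1, 5, MPI_INT, &col_type);
      MPI_Type_commit(&col_type);

      if (my_rank == 0) {
          for (i = 0; i < 4; i++)
              for (j = 0; j < 5; j++) a[i][j] = 10*i + j;
          MPI_Send(&a[0][2], 1, col_type, 1, 0, MPI_COMM_WORLD);  /* column 2 */
      } else if (my_rank == 1) {
          for (i = 0; i < 4; i++)
              for (j = 0; j < 5; j++) a[i][j] = -1;
          MPI_Recv(&a[0][2], 1, col_type, 0, 0, MPI_COMM_WORLD, &status);
          for (i = 0; i < 4; i++) printf("a[%d][2] = %d\n", i, a[i][2]);
      }

      MPI_Type_free(&col_type);
      MPI_Finalize();
      return 0;
  }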
Struct Datatype
[Figure: newtype = block 0 (3 x MPI_INT, starting at addr_0) followed by block 1 (5 x MPI_DOUBLE, starting at addr_1)]
C:  int MPI_Type_struct(int count, int *array_of_blocklengths,
                        MPI_Aint *array_of_displacements,
                        MPI_Datatype *array_of_types,
                        MPI_Datatype *newtype)
  count                  = 2
  array_of_blocklengths  = ( 3, 5 )
  array_of_displacements = ( 0, addr_1 - addr_0 )
  array_of_types         = ( MPI_INT, MPI_DOUBLE )
array_of_displacements[i] := address(block_i) - address(block_0)
MPI-1
C:
int MPI_Address(void* location,
MPI_Aint *address)
Fortran: MPI_ADDRESS(LOCATION,
ADDRESS, IERROR)
<type>
LOCATION(*)
INTEGER ADDRESS, IERROR
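A minimal sketch (assumed example, MPI-1 style as on these slides; newer MPI versions use MPI_Get_address and MPI_Type_create_struct) that builds the earlier buff_layout datatype and computes the displacement with MPI_Address:

  #include <mpi.h>

  struct buff_layout {
      int    i_val[3];
      double d_val[5];
  } buffer;

  MPI_Datatype make_buff_datatype(void)
  {
      int          array_of_blocklengths[2] = {3, 5};
      MPI_Datatype array_of_types[2] = {MPI_INT, MPI_DOUBLE};
      MPI_Aint     array_of_displacements[2], addr0, addr1;
      MPI_Datatype buff_datatype;

      MPI_Address(&buffer.i_val[0], &addr0);
      MPI_Address(&buffer.d_val[0], &addr1);
      array_of_displacements[0] = 0;
      array_of_displacements[1] = addr1 - addr0;   /* accounts for any padding */

      MPI_Type_struct(2, array_of_blocklengths, array_of_displacements,
                      array_of_types, &buff_datatype);
      MPI_Type_commit(&buff_datatype);
      return buff_datatype;     /* use with MPI_Send(&buffer, 1, ...) */
  }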
Committing a Datatype
Before a datatype handle is used in message passing communication,
it needs to be committed with MPI_TYPE_COMMIT.
This needs to be done only once (per datatype).
C: int MPI_Type_commit(MPI_Datatype *datatype);
Fortran: MPI_TYPE_COMMIT(DATATYPE, IERROR)
INTEGER DATATYPE, IERROR
IN-OUT argument
Size and extent of the vector example (count=2, blocklength=3, stride=5):
  size   := 6 * size(oldtype)
  extent := 8 * extent(oldtype)
[Figure: better visualization of newtype - two blocks of three elements with a two-element gap in between]
C:  int MPI_Type_size(MPI_Datatype datatype, int *size)
    int MPI_Type_extent(MPI_Datatype datatype, MPI_Aint *extent)
Fortran:  MPI_TYPE_SIZE(DATATYPE, SIZE, IERROR)
          INTEGER DATATYPE, SIZE, IERROR
          MPI_TYPE_EXTENT(DATATYPE, EXTENT, IERROR)
          INTEGER DATATYPE, EXTENT, IERROR