Introduction to MPI in C
Preeti Malakar
[email protected]
Parallel Programming Models
• Shared memory
– OpenMP, Pthreads, …
• Distributed memory
– MPI, UPC, …
• Hybrid
– MPI + OpenMP
Hardware and Network Model
[Figure: compute nodes, each with cores and memory, connected by a network]
Message Passing Interface (MPI)
• Standard for message passing in a distributed
memory environment
• Efforts began in 1991 by Jack Dongarra, Tony
Hey, and David W. Walker
• MPI Forum
– Version 1.0: 1994
– Version 2.0: 1997
– Version 3.0: 2012
MPI Implementations
• MPICH (ANL)
• MVAPICH (OSU)
• Intel MPI
• OpenMPI
MPI Processes
[Figure: MPI processes mapped onto cores, each process with its own memory]
MPI Internals
Process Manager
• Start and stop processes in a scalable way
• Set up communication channels for parallel processes
• Provide system-specific information to processes
Job Scheduler
• Schedule MPI jobs on a cluster/supercomputer
• Allocate the required number of nodes
– Two jobs generally do not run on the same core
• Enforce other policies (queuing etc.)
Communication Channels
Communication types
• Blocking
• Non-blocking
Getting Started
Function names:
MPI_*
Initialization and Finalization
MPI_Init
• gather information about the parallel job
• set up internal library state
• prepare for communication
MPI_Finalize
• cleanup
MPI_COMM_WORLD
[Figure: the default communicator, containing all processes of the job]
Communication Scope
Communicator (communication handle)
• Defines the scope
• Specifies communication context
Process
• Belongs to a group
• Identified by a rank within a group
Identification
• MPI_Comm_size – total number of processes in communicator
• MPI_Comm_rank – rank in the communicator
MPI_COMM_WORLD
[Figure: each process in MPI_COMM_WORLD is identified by a rank/id (0, 2, 3, 4, …)]
Getting Started
• MPI_Comm_size returns the total number of processes
• MPI_Comm_rank returns the rank of the calling process
MPI Message
• Data and header/envelope
• Typically, MPI communications send/receive messages
Message Envelope
• Source: origin of the message
• Destination: receiver of the message
• Communicator
• Tag (0 to MPI_TAG_UB)
MPI Communication Types
• Point-to-point
• Collective
Point-to-point Communication
Blocking send:
int MPI_Send (const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
Example 1
char message[20];
MPI_Status status;
MPI_Comm_rank (MPI_COMM_WORLD, &myrank);
// Sender process
if (myrank == 0) /* code for process 0 */
{
    strcpy (message, "Hello, there");
    // message tag 99
    MPI_Send (message, strlen(message)+1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
}
// Receiver process
else if (myrank == 1) /* code for process 1 */
{
    // tag must match the sender's tag
    MPI_Recv (message, 20, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &status);
    printf ("received: %s\n", message);
}
MPI_Status
• Source rank
• Message tag
• Message length
– MPI_Get_count
MPI_ANY_*
• MPI_ANY_SOURCE
– Receiver may specify wildcard value for source
• MPI_ANY_TAG
– Receiver may specify wildcard value for tag
Example 2
MPI_Comm_rank (MPI_COMM_WORLD, &myrank);
// Sender process
if (myrank == 0) /* code for process 0 */
{
    strcpy (message, "Hello, there");
    MPI_Send (message, strlen(message)+1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
}
// Receiver process
else if (myrank == 1) /* code for process 1 */
{
    // accepts a matching message from any sender
    MPI_Recv (message, 20, MPI_CHAR, MPI_ANY_SOURCE, 99, MPI_COMM_WORLD, &status);
    printf ("received: %s\n", message);
}
MPI_Send (Blocking)
• Standard communication mode
• Completion behavior is implementation-dependent (the message may be buffered, or the call may wait for the receiver)
Buffering
Other Send Modes
• MPI_Bsend (buffered)
– May complete before the matching receive is posted
• MPI_Ssend (synchronous)
– Completes only if a matching receive is posted
• MPI_Rsend (ready)
– May be started only if the matching receive is already posted
Non-blocking Point-to-Point
• MPI_Isend (buf, count, datatype, dest, tag, comm, request)
• MPI_Irecv (buf, count, datatype, source, tag, comm, request)
• MPI_Wait (request, status)
[Figure: ranks 0 and 1 each call MPI_Isend and then MPI_Recv — safe, no deadlock]
Computation Communication Overlap
[Figure: timeline for ranks 0 and 1 — computation proceeds while the non-blocking transfer is in flight; MPI_Wait / MPI_Recv complete it]
Collective Communications
• Must be called by all processes that are part of the
communicator
Types
• Synchronization (MPI_Barrier)
• Global communication (MPI_Bcast, MPI_Gather, …)
• Global reduction (MPI_Reduce, …)
Barrier
• Synchronization across all group members
• Collective call
• Blocks until all processes have entered the call
• MPI_Barrier (comm)
Broadcast
• MPI_Bcast (buffer, count, datatype, root, comm)
Example 3
color = rank + 2;
int oldcolor = color;
MPI_Bcast (&color, 1, MPI_INT, 0, MPI_COMM_WORLD);
// after the call, color holds root 0's value (2) on every rank
Gather
• Gathers values from all processes to a root process
• int MPI_Gather (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
• Arguments recv* are not relevant on non-root processes
[Figure: A0, A1, A2 from three processes are collected in rank order into the root's receive buffer]
Example 4
color = rank + 2;
int colors[size];
MPI_Gather (&color, 1, MPI_INT, colors, 1, MPI_INT, 0, MPI_COMM_WORLD);
if (rank == 0)
    for (int i = 0; i < size; i++)
        printf ("color from %d = %d\n", i, colors[i]);
Scatter
• Scatters values from a root process to all processes
• int MPI_Scatter (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
• Arguments send* are not relevant on non-root processes
• Output parameter: recvbuf
[Figure: the root's A0, A1, A2 are distributed, one block per process]
Allgather
• Gathers values from all processes; every process receives the full result
• int MPI_Allgather (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm)
[Figure: A0, A1, A2 end up in the receive buffer of every process, not just the root]
Reduce
• MPI_Reduce (inbuf, outbuf, count, datatype, op, root, comm)
[Figure: inputs 0 1 2 3 4 5 6, 2 1 2 3 2 5 2, and 0 1 1 0 1 1 0 combine element-wise with MAX to give 2 1 2 3 4 5 6 at the root]
Allreduce
• MPI_Allreduce (inbuf, outbuf, count, datatype, op, comm)
• op: MIN, MAX, SUM, PROD, …
• Combines elements in inbuf across all processes
• Combined value stored in outbuf of every process
[Figure: inputs 0 1 2 3 4 5 6, 2 1 2 3 2 5 2, and 0 1 1 0 1 1 0 combine element-wise with MAX; every process receives 2 1 2 3 4 5 6]
Sub-communicator
- Logical subset
- Different contexts
41
MPI_Comm_split
MPI_Comm_split (MPI_Comm oldcomm, int color, int key, MPI_Comm *newcomm)
• Collective call
• Logically divides oldcomm based on color
– Processes with the same color form a group
– Some processes may not be part of newcomm (MPI_UNDEFINED)
• Rank assignment within each group is based on key
Logical subsets of processes
[Figure: the processes of MPI_COMM_WORLD partitioned into sub-communicators]
How do you assign one color to odd processes and another color to even processes?
color = rank % 2
MPI Programming
Hands-on
How to run an MPI program on a cluster?
Hostfile (one hostname:slots entry per line):
host1:2
host2:2
host3:2
…
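A sketch of the workflow, assuming an MPICH-style installation (hello.c and the host names are illustrative; Open MPI uses --hostfile instead of -f):

```shell
# Create a hostfile: one "hostname:slots" entry per line
cat > hostfile <<'EOF'
host1:2
host2:2
host3:2
EOF
# With MPI installed, compile with the wrapper compiler and launch
# 6 processes (2 per host) across the listed machines:
#   mpicc hello.c -o hello
#   mpiexec -n 6 -f hostfile ./hello
```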
How to run an MPI program on a managed cluster/supercomputer?
• Jobs are submitted through the job scheduler's queue
Practice Examples
Your Code
• Each sub-directory (a1, a2, a3) has
– sub.sh [Required number of cores mentioned]
– Makefile
– .c
– Edit the .c [Look for “WRITE YOUR CODE HERE”]
Assignments
1. Even processes send their data to odd processes
Reference Material
• Marc Snir, Steve W. Otto, Steven Huss-Lederman, David W. Walker, and Jack Dongarra, MPI - The Complete Reference, Second Edition, Volume 1: The MPI Core, MIT Press.
• William Gropp, Ewing Lusk, and Anthony Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, 3rd ed., MIT Press, 2014.