The document discusses parallel computing paradigms like distributed memory and GPU computing. It introduces MPI (Message Passing Interface) as the standard for exchanging data between processors. MPI uses calls to subroutines to control data exchange between CPUs. The document provides examples of MPI routines like Broadcast and Reduce. It also discusses how to structure Fortran code for MPI and provides an example of using MPI to compute an integral in parallel.


Introduction to High Performance Scientific Computing

Autumn, 2016

Lecture 15

Prasun Ray
Imperial College London
28 November 2016
Parallel computing paradigms

Distributed memory
•  Each (4-core) chip has its own memory

•  The chips are connected by network ‘cables’

•  MPI coordinates communication between two or more CPUs

Parallel computing paradigms

Related approaches:
•  Hybrid programming: mix of shared-memory (OpenMP) and
distributed-memory (MPI) programming

•  GPUs: shared-memory programming (CUDA or OpenCL)

•  Coprocessors and co-array programming

MPI intro
•  MPI: Message Passing Interface

•  Standard for exchanging data between processors

•  Supports Fortran, C, C++

•  Can also be used with Python

OpenMP schematic
Program starts with a single master thread.

Then, launch a parallel region with multiple threads. Each thread has
access to all variables introduced previously.

Can end the parallel region if/when desired and launch parallel regions
again in the future as needed.

[Schematic: Start program → master thread → FORK → Parallel region (4 threads) → JOIN → Serial region (1 thread)]
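As a reminder of what this fork–join pattern looks like in code, here is a minimal sketch (an illustration, not one of the course codes), assuming OpenMP support is enabled at compile time, e.g. gfortran -fopenmp:

program omp_sketch
use omp_lib
implicit none
!serial region: only the master thread exists here
print *, 'serial region: master thread'
!$omp parallel
!parallel region: a team of threads is created (FORK)
print *, 'hello from thread ', omp_get_thread_num(), &
         ' of ', omp_get_num_threads()
!$omp end parallel
!the threads join back into the master thread (JOIN)
print *, 'serial region again'
end program omp_sketch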

MPI schematic
Program starts with all processes running.

MPI controls communication between processes.

[Schematic: Start program → Parallel region (4 processes)]

MPI intro
•  Basic idea: calls to MPI subroutines control data exchange
between processors

•  Example:

call MPI_BCAST(n, 1, MPI_INTEGER,0,MPI_COMM_WORLD,ierr)

This will send the integer n (count 1, type MPI_INTEGER) from processor 0 to
all of the other processors in MPI_COMM_WORLD.

MPI broadcast

Before broadcast:          After broadcast:
P0: data                   P0: data
P1:                        P1: data
P2:                        P2: data
P3:                        P3: data

MPI intro
•  Basic idea: calls to MPI subroutines control data exchange
between processors

•  Example:

call MPI_BCAST(n, 1, MPI_INTEGER,0,MPI_COMM_WORLD,ierr)

This will send the integer n (count 1, type MPI_INTEGER) from processor 0 to
all of the other processors in MPI_COMM_WORLD.

Generally, need to specify:


•  source and/or destination of message
•  size of data contained in message
•  type of data contained in message (integer, double precision, …)
•  the data itself (or its location)
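Putting these pieces together, here is a minimal, self-contained sketch (illustrative, not one of the course codes) in which process 0 sets a value and broadcasts it to every process:

program bcast_sketch
use mpi
implicit none
integer :: n, myid, numprocs, ierr

call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

!only the root (process 0) knows n initially, e.g. read from input
if (myid == 0) n = 1000

!send n (count 1, type MPI_INTEGER) from process 0 to all other processes
call MPI_BCAST(n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
print *, 'process ', myid, ' of ', numprocs, ' has n = ', n

call MPI_FINALIZE(ierr)
end program bcast_sketch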

Fortran code structure
! Basic Fortran 90 code structure
!
!1. Header
program template
!
!2. Variable declarations (e.g. integers, real numbers,...)
!
!3. basic code: input, loops, if-statements, subroutine calls
print *, 'template code'
!
!
!4. End program
end program template
!
! To compile this code:
! $ gfortran -o f90template.exe f90template.f90
! To run the resulting executable: $ ./f90template.exe

MPI intro
! Basic MPI + Fortran 90 code structure (see mpif90template.f90)
!
!1. Header
program template
use mpi
!
!2a. Variable declarations (e.g. integers, real numbers,...)
integer :: myid, numprocs, ierr
!
!2b. Initialize MPI
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
!
!3. basic code: input, loops, if-statements, subroutine calls
print *, 'this is proc # ',myid, 'of ', numprocs
!
!
!4. End program
call MPI_FINALIZE(ierr)
end program template
!
! To compile this code:
! $ mpif90 -o mpitemplate.exe mpif90template.f90
! To run the resulting executable with 4 processes: $ mpiexec -n 4 mpitemplate.exe
MPI intro
•  Compile + run:

$ mpif90 -o mpif90template.exe mpif90template.f90

$ mpiexec -n 4 mpif90template.exe
this is proc # 0 of 4
this is proc # 3 of 4
this is proc # 1 of 4
this is proc # 2 of 4

Note: The number of processes specified with mpiexec can be larger than the
number of cores on your machine, but the extra processes then have to share
cores, so there is no further speedup.

MPI+Fortran example: computing an integral

•  Estimate an integral using the midpoint rule.
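For reference, the composite midpoint rule for an integral over [a, b] split into N intervals (a standard statement of the rule, matching the use of dx*(i-0.5) in the code later) is:

\int_a^b f(x)\,dx \;\approx\; \sum_{i=1}^{N} f\!\left(a + \left(i - \tfrac{1}{2}\right)\Delta x\right)\Delta x,
\qquad \Delta x = \frac{b-a}{N}.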

MPI+Fortran quadrature
Two most important tasks:

1.  Decide how many intervals per processor

2.  Each processor will compute its own partial sum, sum_proc,
how do we compute sum(sum_proc)?

MPI+Fortran quadrature
Two most important tasks:

1.  Decide how many intervals per processor

2.  Each processor will compute its own partial sum, sum_proc,
how do we compute sum(sum_proc)?

•  N = number of intervals

•  numprocs = number of processors

•  Need to compute Nper_proc: intervals per processor

MPI+Fortran quadrature
•  N = number of intervals

•  numprocs = number of processors

•  Need to compute Nper_proc: intervals per processor

§  Basic idea: if N = 8 * numprocs, then Nper_proc = 8

§  But with integer division, if N < numprocs, then N/numprocs = 0, so round up:

Nper_proc = (N + numprocs - 1)/numprocs
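For example, with N = 1000 and numprocs = 2 (the run shown later), integer division gives Nper_proc = (1000 + 2 - 1)/2 = 500; with N = 3 and numprocs = 4, Nper_proc = (3 + 4 - 1)/4 = 1, so rounding up ensures every interval is assigned even when N is not a multiple of numprocs.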

MPI+Fortran quadrature
Two most important tasks:

1.  Decide how many intervals per processor

2.  Each processor will compute its own partial sum, sum_proc,
how do we compute sum(sum_proc)?

Use MPI_REDUCE

MPI reduce

Before reduction:          After reduction (e.g. sum):
P0: data1                  P0: result
P1: data2                  P1:
P2: data3                  P2:
P3: data4                  P3:

MPI+Fortran quadrature
Two most important tasks:

1.  Decide how many intervals per processor

2.  Each processor will compute its own partial sum, sum_proc,
how do we compute sum(sum_proc)?

•  Use MPI_REDUCE

•  Reduction options: MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD

MPI+Fortran quadrature
Two most important tasks:

1.  Decide how many intervals per processor

2.  Each processor will compute its own partial sum, sum_proc,
how do we compute sum(sum_proc)?

•  Use MPI_REDUCE

•  Reduction options: MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD

•  For quadrature, we need MPI_SUM

MPI+Fortran quadrature

For quadrature, we need MPI_SUM:

call MPI_REDUCE(data, result, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_WORLD, ierr)

This will:

1.  Collect the double precision variable data (size 1) from each processor.

2.  Compute the sum (because we have chosen MPI_SUM) and store the value in
result on processor 0.

Note: Only processor 0 will have the final sum. With MPI_ALLREDUCE, the result
will be on every processor.
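For comparison, the corresponding MPI_ALLREDUCE call (a sketch using the same illustrative variable names) has no root argument, since every process receives the result:

!every process ends up with the reduced value in result
call MPI_ALLREDUCE(data, result, 1, MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, ierr)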

MPI+Fortran quadrature
midpoint_p.f90: distribute data

!set number of intervals per processor
Nper_proc = (N + numprocs - 1)/numprocs
!
!starting and ending points for processor
istart = myid * Nper_proc + 1
iend = (myid+1) * Nper_proc
if (iend>N) iend = N
!

MPI+Fortran quadrature
midpoint_p.f90: 1. distribute data, 2. compute sum_proc

!set number of intervals per processor
Nper_proc = (N + numprocs - 1)/numprocs
!
!starting and ending points for processor
istart = myid * Nper_proc + 1
iend = (myid+1) * Nper_proc
if (iend>N) iend = N
!
!loop over intervals computing each interval's contribution to integral
do i1 = istart,iend
    xm = dx*(i1-0.5) !midpoint of interval i1
    call integrand(xm,f)
    sum_i = dx*f
    sum_proc = sum_proc + sum_i !add contribution from interval to total integral
end do

MPI+Fortran quadrature
midpoint_p.f90: 1. distribute data, 2. compute sum_proc, 3. reduction

!set number of intervals per processor
Nper_proc = (N + numprocs - 1)/numprocs
!
!starting and ending points for processor
istart = myid * Nper_proc + 1
iend = (myid+1) * Nper_proc
if (iend>N) iend = N
!
!loop over intervals computing each interval's contribution to integral
do i1 = istart,iend
    xm = dx*(i1-0.5) !midpoint of interval i1
    call integrand(xm,f)
    sum_i = dx*f
    sum_proc = sum_proc + sum_i !add contribution from interval to total integral
end do
!
!collect double precision variable, sum, with size 1 on process 0 using the MPI_SUM option
call MPI_REDUCE(sum_proc,sum,1,MPI_DOUBLE_PRECISION,MPI_SUM, &
                0,MPI_COMM_WORLD,ierr)
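The integrand subroutine itself is not shown on the slide. The output below (sum ≈ π with N = 1000 intervals) is consistent with the classic test integral ∫₀¹ 4/(1+x²) dx = π, so a plausible sketch of integrand is the following (an assumption for illustration, not the actual midpoint_p.f90 source):

!assumed integrand, consistent with the printed result (not the original source)
subroutine integrand(x,f)
implicit none
double precision, intent(in) :: x
double precision, intent(out) :: f
f = 4.d0/(1.d0 + x*x) !integrates to pi on [0,1]
end subroutine integrand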
MPI+Fortran quadrature
Compile and run:

$ mpif90 -o midpoint_p.exe midpoint_p.f90

$ mpiexec -n 2 midpoint_p.exe
number of intervals = 1000
number of procs = 2
Nper_proc= 500
The partial sum on proc # 0 is: 1.8545905426699112
The partial sum on proc # 1 is: 1.2870021942532193
N= 1000
sum= 3.1415927369231307
error= 8.3333337563828991E-008
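As a sanity check on the printed error (again assuming the integrand sketched above), the composite midpoint rule has leading error (Δx²/24)[f'(b) − f'(a)]; with f(x) = 4/(1+x²) on [0,1] and Δx = 10⁻³ this gives

|\text{error}| \approx \frac{\Delta x^{2}}{24}\,|f'(1) - f'(0)| = \frac{\Delta x^{2}}{24}\cdot 2 = \frac{10^{-6}}{12} \approx 8.3\times 10^{-8},

which matches the value reported above.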

Other collective operations
•  Scatter and gather

MPI scatter

Before scatter:            After scatter:
P0: [f1,f2,f3,f4]          P0: f1
P1:                        P1: f2
P2:                        P2: f3
P3:                        P3: f4

MPI gather

Before gather:             After gather:
P0: f1                     P0: [f1,f2,f3,f4]
P1: f2                     P1:
P2: f3                     P2:
P3: f4                     P3:
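A sketch of the corresponding calls (illustrative names: fvec is a length-4 double precision array on the root, f a scalar on each of the 4 processes):

!scatter: process 0 sends one element of fvec to the variable f on each process
call MPI_SCATTER(fvec, 1, MPI_DOUBLE_PRECISION, f, 1, MPI_DOUBLE_PRECISION, &
                 0, MPI_COMM_WORLD, ierr)
!gather: the variable f from each process is collected into fvec on process 0
call MPI_GATHER(f, 1, MPI_DOUBLE_PRECISION, fvec, 1, MPI_DOUBLE_PRECISION, &
                0, MPI_COMM_WORLD, ierr)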

Other collective operations

•  Scatter and gather

•  Gather all particles on one processor

•  Compute the interaction forces for the particles on that processor:

\frac{d^2 x_i}{dt^2} = \sum_{j=1}^{N} f(|x_i - x_j|), \qquad i = 1, 2, \ldots, N

•  Avoid for big problems (why?)
MPI collective data movement

[Figure from Using MPI]
