OpenMP
An API (Application Programming Interface) for shared-memory, explicit, thread-based parallelism.
Goals of OpenMP:
Standardization
Ease of Use
Portability (across different platforms)
OpenMP Thread
A thread that is managed by the OpenMP runtime system.
Team
A set of one or more threads participating in the execution of a parallel region.
Task
A specific instance of executable code and its data environment that the OpenMP implementation can schedule for
execution by threads.
The following base languages are given in [OpenMP-5.2, 1.7]: C90, C99, C11, C18, C++98, C++11, C++14, C++17, C++20,
Fortran 77, Fortran 90, Fortran 95, Fortran 2003, Fortran 2008, and a subset of Fortran 2018.
Base Program
A program written in the base language.
OpenMP Program
A program that consists of a base program that is annotated with OpenMP directives or that calls OpenMP API
runtime library routines.
The name of the command-line argument that enables OpenMP is not mandated by the specification and differs from
one compiler to another.
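For example (flags taken from common compilers; the file name prog.c is illustrative):
$ gcc -fopenmp prog.c        (GCC, Clang)
$ icx -qopenmp prog.c        (Intel oneAPI)
$ nvc -mp prog.c             (NVIDIA HPC SDK)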
C
#include <omp.h>
F08
include "omp_lib.h"
or the Fortran 90 module
use omp_lib
omp_get_num_threads()
Returns the number of threads in the team executing the parallel region from which this routine is called.
omp_set_num_threads()
Sets the number of threads that will be used in subsequent parallel regions; takes the desired number as its argument.
omp_get_thread_num()
Returns the thread number of the thread within a team calling this routine.
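A minimal C sketch combining these three routines; note that omp_get_num_threads() returns 1 when called outside a parallel region:

#include <stdio.h>
#include <omp.h>

int main(void) {
    omp_set_num_threads(4);   /* request 4 threads for subsequent parallel regions */
    printf("outside: team of %d\n", omp_get_num_threads());   /* prints 1 */
    #pragma omp parallel
    printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    return 0;
}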
csh/tcsh
$ setenv ENV_VAR {NUM}
sh/bash
$ export ENV_VAR={NUM}
OMP_NUM_THREADS
Sets the number of threads to use in the OpenMP program.
Modifications to environment variables after the OpenMP program has started are ignored by the OpenMP
implementation. ICVs can, however, be changed through directive clauses and OpenMP runtime routines.
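A short sketch of the resulting precedence: a num_threads clause overrides omp_set_num_threads(), which in turn overrides OMP_NUM_THREADS:

omp_set_num_threads(4);               /* overrides the OMP_NUM_THREADS setting */
#pragma omp parallel num_threads(2)   /* clause overrides the routine: team of 2 */
printf("team of %d\n", omp_get_num_threads());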
Structured Block
An executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an
OpenMP construct.
sentinel = !$OMP
The usual line length, white space and continuation rules apply
A thread that reaches a parallel directive creates a team of threads and becomes the master of the team
Creates a team of threads to execute the parallel region
The code is duplicated and all threads in the team will execute the code contained in the structured block
Inside the region, threads are identified by consecutive numbers starting at zero
There is an implied barrier at the end of a parallel region; only the master thread continues past this point
Optional clauses (explained later) can be used to modify the behaviour and data environment of the parallel
region
C
#include <stdio.h>
#include <omp.h>
int main(void) {
    printf("Hello from your main thread.\n");
    #pragma omp parallel
    printf("Hello from thread %d of %d.\n",
           omp_get_thread_num(), omp_get_num_threads());
    return 0;
}
F08
program hello
    use omp_lib
    print *, "Hello from your main thread."
    !$omp parallel
    print *, "Hello from thread ", omp_get_thread_num(), " of ", &
             omp_get_num_threads(), "."
    !$omp end parallel
end program
The serial base program, for comparison:
C
#include <stdio.h>
#include <omp.h>
int main(void) {
    printf("master thread: hello world.\n");
    return 0;
}
Private Variable
With respect to a given set of task regions that bind to the same parallel region, a variable for which the name
provides access to a different block of storage for each task region.
Shared Variable
With respect to a given set of task regions that bind to the same parallel region, a variable for which the name
provides access to the same block of storage for each task region.
FIRSTPRIVATE Clause behaves like the private clause, but the listed variables are automatically initialized according to
the value of their original objects prior to entry into the parallel or work-sharing construct. C/C++: firstprivate(list), F:
FIRSTPRIVATE(list).
LASTPRIVATE Clause behaves like the private clause, but the value from the sequentially last loop iteration or section is
copied back to the original variable object. C/C++: lastprivate(list), F: LASTPRIVATE(list).
SHARED Clause declares variables in its list to be shared among all threads in the team. A shared variable exists in
only one memory location and all threads can read or write to that address. C/C++: shared(list), F: SHARED(list).
DEFAULT Clause specifies a default scope for all variables in the lexical extent of any parallel region.
C/C++: default (shared | none), F: DEFAULT (PRIVATE | FIRSTPRIVATE | SHARED | NONE).
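A hedged C sketch combining these clauses; the variables x, y, n and the loop body are illustrative, not from the original:

int x = 1, y = 0, n = 100;
#pragma omp parallel for default(none) shared(n) firstprivate(x) lastprivate(y)
for (int i = 0; i < n; i++)
    y = i + x;   /* every thread starts with its private x initialized to 1 */
/* after the loop, y holds the value from the last iteration: (n-1) + 1 */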
Data Race
A data race occurs when
multiple threads write to the same memory unit without synchronization or
at least one thread writes to and at least one thread reads from the same memory unit without
synchronization.
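An illustrative race, assuming an array a of length n: all threads update sum without synchronization, so the result is non-deterministic. A reduction(+:sum) clause, or a critical or atomic construct, would fix it.

int sum = 0;
#pragma omp parallel for
for (int i = 0; i < n; i++)
    sum += a[i];   /* data race: unsynchronized read-modify-write of sum */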
!$omp barrier
Threads are only allowed to continue execution of code after the barrier once all threads in the current team
have reached the barrier.
A barrier region must be executed by all threads in the current team or none.
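A sketch of a typical two-phase use, with hypothetical functions prepare() and consume():

#pragma omp parallel
{
    prepare(omp_get_thread_num());   /* phase 1, runs concurrently */
    #pragma omp barrier              /* no thread enters phase 2 before all have finished phase 1 */
    consume(omp_get_thread_num());   /* phase 2 may safely read phase-1 results */
}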
Execution of critical regions with the same name is restricted to one thread at a time.
name is a compile time constant.
In C, names live in their own name space.
In Fortran, names of critical regions can collide with other identifiers.
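A C sketch using a named critical region to protect a shared maximum; a and n are assumed to exist:

int max_val = INT_MIN;   /* requires <limits.h> */
#pragma omp parallel for
for (int i = 0; i < n; i++) {
    #pragma omp critical (update_max)   /* one thread at a time */
    if (a[i] > max_val)
        max_val = a[i];
}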
An ordered directive can only appear in the dynamic extent of the for or parallel for (C/C++) directives and,
equivalently, the DO or PARALLEL DO (Fortran) directives.
A loop that contains an ordered directive must itself carry an ordered clause.
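A sketch with a hypothetical function heavy_work(); the calls run in parallel, but the output appears in loop order:

#pragma omp parallel for ordered
for (int i = 0; i < n; i++) {
    int v = heavy_work(i);      /* executed concurrently */
    #pragma omp ordered
    printf("%d: %d\n", i, v);   /* executed in iteration order */
}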
copyprivate(list): list contains variables that are private in the enclosing parallel region.
At the end of the single construct, the values of all list items on the thread that executed the single block are copied
to all other threads.
E.g. serial initialization
copyprivate cannot be combined with nowait.
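A sketch of serial initialization with single and copyprivate; read_config() is hypothetical:

int seed;
#pragma omp parallel private(seed)
{
    #pragma omp single copyprivate(seed)
    seed = read_config();   /* executed by exactly one thread */
    /* now every thread's private seed holds the same value */
}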
Declares the iterations of a loop to be suitable for concurrent execution on multiple threads.
By default, the loop directive applies to the outermost loop of a set of nested loops
collapse(n) extends the scope of the loop directive to the n outermost loops
All associated loops must be perfectly nested, i.e. there is no code between the headers and bodies of the
individual loops
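A sketch of collapse(2) over a perfectly nested loop pair; a, n and m are assumed:

#pragma omp parallel for collapse(2)   /* the combined iteration space of size n*m is divided */
for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)        /* no statements between the two loop headers */
        a[i][j] = i + j;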
Determines how the iteration space is divided into chunks and how these chunks are distributed among threads.
static Divide iteration space into chunks of chunk_size iterations and distribute them in a round-robin
fashion among threads. If chunk_size is not specified, chunk size is chosen such that each thread gets
at most one chunk.
dynamic Divide into chunks of size chunk_size (defaults to 1). When a thread is done processing a chunk it
acquires a new one.
guided Like dynamic but chunk size is adjusted, starting with large sizes for the first chunks and decreasing to
chunk_size (default 1).
auto Let the compiler and runtime decide.
runtime Schedule is chosen based on ICV run-sched-var.
If no schedule clause is present, the default schedule is implementation defined.
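A sketch contrasting two schedules on a loop with irregular per-iteration cost; irregular_work() and b are hypothetical:

#pragma omp parallel for schedule(static)      /* one contiguous chunk per thread */
for (int i = 0; i < n; i++) b[i] = irregular_work(i);

#pragma omp parallel for schedule(dynamic, 4)  /* chunks of 4, handed out on demand */
for (int i = 0; i < n; i++) b[i] = irregular_work(i);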
Compile and execute the file openmp_hello_world.{c|f90}. It contains a ‘hello world’ from all
available OpenMP threads.
How many OpenMP threads are used? Implement different ways to set the number of OpenMP threads. Which
setting overrides which others? Make a hierarchical list.
Compile and execute the file openmp_ws_manual.{c|f90}. It contains a small loop inside a parallel
region.
Compile and execute the file openmp_simple_sum.{c|f90}. It contains a serial version of a simple sum
from 0 to large_number.
What result do you get? How long does it take? How does the runtime scale with the number of OpenMP
threads and the value of large_number (degrees of freedom)?
What result do you get? How long does it take? How does the runtime scale with the number of OpenMP
threads and the matrix size (degrees of freedom)?
Exercise 4 – Race Conditions
What result do you get? How long does it take? How does the runtime scale with the number of OpenMP
threads and the matrix size (degrees of freedom)?
Experiment!
Play with various variants, e.g. array size, number of threads, chunk size.