
SHARED MEMORY PROGRAMMING WITH OPENMP

Chapter 5 - Pacheco
Roadmap
• Writing programs that use OpenMP.
• Scope of variables.
• Reduction.
• Using OpenMP to parallelize many serial for loops with only small
changes to the source code.
• Task parallelism.
• Explicit thread synchronization.
• Standard problems in shared-memory programming.

2
Motivation
• Multicore CPUs are everywhere
• Servers with over 100 cores today
• Even smartphone CPUs have 8 cores

• Multithreading is the natural programming model


• All processors share the same memory
• Threads in a process see same address space
• Many shared-memory algorithms developed

3
OpenMP
• OpenMP is an API for shared-memory parallel programming.
• MP = multiprocessing
• Designed for systems in which each thread or process can potentially
have access to all available memory.
• The system is viewed as a collection of cores or CPUs, all of which have
access to main memory.

4
(Note: OpenMP stands for multiprocessing; it is not MPI, the Message Passing Interface.)
OpenMP
• Shared Memory with thread-based parallelism
• Not a new language
• Compiler directives, library calls, and environment variables extend
the base language
• Not automatic parallelization
• the user explicitly specifies parallel execution
• the compiler does not ignore user directives, even if they are wrong

5
Execution Model
• OpenMP program starts single threaded
• To create additional threads, user starts a parallel region
• additional threads are launched to create a team
• original (master) thread is part of the team
• threads “go away” at the end of the parallel region: usually sleep or spin
• Repeat parallel regions as necessary
• Fork-join model

6
Communicating Among Threads
• Shared Memory Model
• threads read and write shared variables
• no need for explicit message passing
• use synchronization (e.g., locks) to protect against race conditions

7
Creating Parallel Regions
Syntax:
#pragma omp parallel [clause[[,] clause]… ]
{structured block}
#include <omp.h>

int main(){
    // serial
    #pragma omp parallel
    {
        // code to be executed by each thread
    }
    // serial
}

8
A complete example, compiled and run from the command line:

#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    printf("Eid Mubarak\n");   /* executed by every thread in the team */
    return 0;
}

Compile:
$ gcc eid.c -fopenmp

Execute:
$ ./a.out
Pragmas
• Special preprocessor instructions.
• Typically added to a system to allow behaviors that aren’t part of the
basic C specification.
• Compilers that don’t support the pragmas ignore them.
• Use the parallel directive to begin a parallel section in your
program.
• # pragma omp parallel
Most basic parallel directive.
• The number of threads that run the following structured block of code is determined by
the run-time system.

9
Example

[omp_hello.c: the parallel part calls a Hello function, in which omp_get_thread_num() returns the calling thread's rank and omp_get_num_threads() returns the total number of threads in the team; each thread prints a greeting.]

10
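A reconstruction of omp_hello.c, consistent with the compile command and the output shown on the next slide; reading the thread count from the command line is an assumption:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

void Hello(void);   /* thread function */

int main(int argc, char* argv[]) {
    /* number of threads requested on the command line (assumed) */
    int thread_count = strtol(argv[1], NULL, 10);

    #pragma omp parallel num_threads(thread_count)
    Hello();

    return 0;
}

void Hello(void) {
    int my_rank = omp_get_thread_num();         /* this thread's rank        */
    int thread_count = omp_get_num_threads();   /* total threads in the team */

    printf("Hello from thread %d of %d\n", my_rank, thread_count);
}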
Example
Compile:
$ gcc -g -Wall -fopenmp -o omp_hello omp_hello.c
Run with 4 threads:
$ ./omp_hello 4

Each of the 4 threads prints one line, but the scheduler determines the interleaving, so different runs can produce different orderings. Possible outcomes:

Hello from thread 0 of 4
Hello from thread 1 of 4
Hello from thread 2 of 4
Hello from thread 3 of 4
(in any order, varying from run to run)

11
Forking/Joining Threads

A process forking and joining two threads.

Fork and Join: the master thread creates a team of worker threads as needed; the number of worker threads can differ from one parallel region to the next.

[Diagram: the master thread FORKs a team of worker threads at the start of each parallel region and JOINs them at its end.]

12
Some Terminology
• In OpenMP parlance, the collection of threads executing the parallel
block — the original thread and the new threads — is called a team
• The original thread is called the master, and the additional threads
are called slaves.

13
Clauses
• A clause is text that modifies a directive.
• The num_threads clause can be added to a parallel directive.
• It allows the programmer to specify the number of threads that
should execute the following block.
# pragma omp parallel num_threads ( thread_count )

14
Of Note…
• There may be system-defined limitations on the number of threads
that a program can start.
• The OpenMP standard doesn’t guarantee that this will actually start
thread_count threads.
• Most current systems can start hundreds or even thousands of
threads.
• Unless we’re trying to start a lot of threads, we will almost always get
the desired number of threads.

15
OpenMP Compiler Support
• What if the compiler doesn’t support OpenMP?
• It will just ignore the parallel directive.
• The #include <omp.h> and the calls to omp_get_thread_num and
omp_get_num_threads will cause errors.
• To handle these problems, we can check whether the preprocessor
macro _OPENMP is defined.
• If it is defined, we can include omp.h and call the OpenMP functions.
• If OpenMP isn’t available, we assume that the Hello function will be single-
threaded.
16
OpenMP Compiler Support
In case the compiler doesn’t support OpenMP:

Check for the library:

#ifdef _OPENMP
#include <omp.h>
#endif

Check for the functions:

#ifdef _OPENMP
int my_rank = omp_get_thread_num();
int thread_count = omp_get_num_threads();
#else
int my_rank = 0;
int thread_count = 1;
#endif

17
Example. The Trapezoidal Rule

Serial Algorithm

18
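The serial algorithm referred to on this slide is the composite trapezoidal rule. A minimal runnable sketch, assuming an example integrand f (the function and variable names are illustrative):

#include <stdio.h>

/* example integrand (assumed for illustration) */
double f(double x) { return x * x; }

/* Serial trapezoidal rule: approximate the integral of f over [a, b]
   using n trapezoids of width h. */
double serial_trap(double a, double b, int n) {
    double h = (b - a) / n;
    double approx = (f(a) + f(b)) / 2.0;
    for (int i = 1; i <= n - 1; i++) {
        double x_i = a + i * h;
        approx += f(x_i);
    }
    return h * approx;
}

int main(void) {
    printf("%f\n", serial_trap(0.0, 1.0, 1024));   /* approximately 1/3 */
    return 0;
}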
A First OpenMP Version
1) We identified two types of tasks:
a) computation of the areas of individual
trapezoids, and
b) adding the areas of trapezoids.
2) There is no communication among the
tasks in the first collection, but each task
in the first collection communicates with
task 1b.
3) We assumed that there would be many
more trapezoids than cores.
• So, we aggregated tasks by assigning a
contiguous block of trapezoids to each thread
(and a single thread to each core).

19
Race Condition

- Unpredictable results when two (or more) threads attempt to simultaneously execute:
global_result += my_result ;
- The value computed by thread 0 (my_result = 1) is overwritten by thread 1.
- The fix is mutual exclusion:

# pragma omp critical
global_result += my_result ;

Only one thread at a time can execute the structured block that follows the critical directive.
20
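A minimal runnable sketch of the fix (the thread count and the value 1.0 are assumed for illustration): each thread adds its partial result to the shared total, and the critical directive makes the read-modify-write of global_result happen one thread at a time.

#include <stdio.h>
#include <omp.h>

int main(void) {
    double global_result = 0.0;

    #pragma omp parallel num_threads(4)
    {
        double my_result = 1.0;   /* each thread's partial result (private) */

        /* Without the critical directive, the read-modify-write below could
           interleave between threads and lose updates. */
        #pragma omp critical
        global_result += my_result;
    }

    printf("global_result = %f\n", global_result);   /* always 4.0 */
    return 0;
}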
[Annotated trapezoid program: on the parallel pragma, global_result is shared (every thread can access it), while each thread's my_result is private (each thread has its own version); the update of the result through the pointer is protected by mutual exclusion.]

21
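A sketch of the corresponding thread function, in the spirit of Pacheco's Trap (parameter names are assumptions; it assumes omp.h is included and the integrand f from the serial version): each thread handles a contiguous block of local_n trapezoids, accumulates into its private my_result, and adds that to the shared result through the pointer inside a critical section.

void Trap(double a, double b, int n, double* global_result_p) {
    double h, x, my_result;
    double local_a, local_b;
    int i, local_n;
    int my_rank = omp_get_thread_num();
    int thread_count = omp_get_num_threads();

    h = (b - a) / n;                       /* width of each trapezoid           */
    local_n = n / thread_count;            /* assumes thread_count divides n    */
    local_a = a + my_rank * local_n * h;   /* this thread's subinterval         */
    local_b = local_a + local_n * h;
    my_result = (f(local_a) + f(local_b)) / 2.0;
    for (i = 1; i <= local_n - 1; i++) {
        x = local_a + i * h;
        my_result += f(x);
    }
    my_result = my_result * h;

    #pragma omp critical
    *global_result_p += my_result;         /* one thread at a time */
}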
Scope Of Variables
• In serial programming, the scope of a variable consists of those parts
of a program in which the variable can be used.

• In OpenMP, the scope of a variable refers to the set of threads that


can access the variable in a parallel block.

22
Scope in OpenMP
• A variable that can be accessed by all the threads in the team has
shared scope.

• A variable that can only be accessed by a single thread has private


scope.

• The default scope for variables declared before a parallel block is


shared.

23
OpenMP Scoping: Rules
• Variables declared outside of a parallel region have a shared scope.
• When a variable can be seen/read/written by all threads in a team, it is said
to have shared scope.
• Variables included in a reduction clause are shared.
• A variable that can be seen by only one thread is said to have private
scope.
• Each thread has a copy of the private variable.
• Loop variables in an omp for are private.
• Local variables in the parallel region are private.

24
OpenMP Scoping: Rules

Code within the parallel region is executed


in parallel on all processors/threads.

25
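A small runnable sketch of these rules (the variable names x and y are assumptions): x is declared before the parallel block, so it has shared scope; y is declared inside the block, so each thread gets its own private copy.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int x = 5;                          /* declared before the block: shared */

    #pragma omp parallel num_threads(4)
    {
        int y = omp_get_thread_num();   /* declared inside the block: private */
        printf("thread %d sees shared x = %d and private y = %d\n",
               omp_get_thread_num(), x, y);
    }

    return 0;
}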
Recall The Trapezoid Program!

• If the call to Local_trap is placed inside the critical section, it can only be executed by one thread at a time
• We force the threads to execute sequentially
• To avoid this problem, declare a private variable inside the parallel block and move
the critical section to after the function call (see the sketch below)

26
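A sketch of that fix (Local_trap's signature is assumed to match the earlier trapezoid code): my_result is declared inside the parallel block, so it is private to each thread, and only the final accumulation into global_result is serialized.

#pragma omp parallel num_threads(thread_count)
{
    double my_result = 0.0;             /* declared inside the block: private */
    my_result += Local_trap(a, b, n);   /* runs fully in parallel             */

    #pragma omp critical
    global_result += my_result;         /* only this update is serialized     */
}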
Reduction Operators
• OpenMP provides a cleaner alternative that also avoids
serializing execution of Local_trap: the reduction clause.
• A reduction is a computation that repeatedly applies the same
reduction operator to a sequence of operands in order to get a
single result.
• A reduction operator is a binary operation (such as addition or
multiplication).
• All intermediate results of the operation should be stored in the
same variable: the reduction variable.
27
Reduction Operators

+, *, -, &, |, ^, &&, ||

 A reduction clause can be added to a parallel directive.

 With reduction(+: global_result) on the parallel directive, the statement
global_result += Local_trap(a, b, n);
is identical to the earlier version that accumulates into a private my_result and adds it to
global_result inside a critical section (see the sketch below).
28
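A sketch of the reduction form (names follow the earlier trapezoid code): OpenMP gives each thread its own private copy of global_result, initialized appropriately for +, and combines the copies into the shared variable when the threads join.

global_result = 0.0;
#pragma omp parallel num_threads(thread_count) reduction(+: global_result)
global_result += Local_trap(a, b, n);

The effect is the same as the private my_result plus critical-section version sketched above, but the runtime manages the private copies and the final combination.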
ANY QUESTIONS?
