
Shared Memory: OpenMP

Environment and Synchronization


OpenMP API Overview
API is a set of compiler directives inserted in the
source program (in addition to some library functions).
Ideally, compiler directives do not affect sequential
code.
Pragmas in C/C++.
(special) comments in Fortran code.
API Semantics
Master thread executes sequential code.
Master and slaves execute parallel code.
Note: very similar to fork-join semantics of Pthreads
create/join primitives.
OpenMP Directives
Parallelization directives:
parallel region
parallel for
Data environment directives:
shared, private, threadprivate, reduction, etc.
Synchronization directives:
barrier, critical
General Rules about Directives
They always apply to the next statement, which must
be a structured block.
Examples
#pragma omp …
statement
#pragma omp …
{ statement1; statement2; statement3; }
OpenMP Parallel Region
#pragma omp parallel
A number of threads are spawned at entry.
Each thread executes the same code.
Each thread waits at the end.
Very similar to a number of create/join's with the same function in Pthreads.
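A minimal runnable sketch, assuming a C compiler with OpenMP support (the printed message is illustrative): every thread in the team executes the block, and all of them join at the implicit barrier at its end.

#include <stdio.h>
#include <omp.h>

int main(void)
{
  /* A team of threads is spawned here; each thread runs the block. */
  #pragma omp parallel
  {
    printf("Hello from thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
  } /* implicit barrier: all threads wait here, then only the master continues */
  return 0;
}

Compile with an OpenMP-enabled compiler, e.g. gcc -fopenmp.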
Getting Threads to do Different Things
Through explicit thread identification (as in Pthreads).
Through work-sharing directives.
Thread Identification
int omp_get_thread_num()
Gets the thread id.
int omp_get_num_threads()
Gets the total number of threads.
Example
#pragma omp parallel
{
if( !omp_get_thread_num() )
master();
else
slave();
}
Work Sharing Directives
Always occur within a parallel region directive.
Two principal ones are
parallel for
parallel sections
OpenMP Parallel For
#pragma omp parallel
#pragma omp for
for( … ) { … }
Each thread executes a subset of the iterations.
All threads wait at the end of the parallel for.
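A concrete sketch (the vector-add function and its argument names are illustrative): the iteration space is divided among the threads of the team.

#include <omp.h>

void vector_add(double *c, const double *a, const double *b, int n)
{
  /* Iterations are split among the threads; each index is executed exactly once. */
  #pragma omp parallel for
  for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
  /* Implicit barrier: every iteration has finished before the function returns. */
}

The combined parallel for directive used here is equivalent to the separate parallel and for directives shown above.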
Multiple Work Sharing Directives
May occur within a single parallel region
#pragma omp parallel
{
#pragma omp for
for( … ) { … }
#pragma omp for
for( … ) { … }
}
All threads wait at the end of the first for.
The NoWait Qualifier
#pragma omp parallel
{
#pragma omp for nowait
for( … ) { … }
#pragma omp for
for( … ) { … }
}
Threads proceed to second for w/o waiting.
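A sketch with two independent loops (the function and array names are assumptions): the first loop writes only a and the second only b, so the nowait clause safely removes the barrier between them.

#include <omp.h>

void scale_and_shift(double *a, double *b, int n)
{
  #pragma omp parallel
  {
    #pragma omp for nowait    /* threads skip the barrier at the end of this loop ... */
    for (int i = 0; i < n; i++)
      a[i] = 2.0 * a[i];

    #pragma omp for           /* ... because this loop touches only b */
    for (int i = 0; i < n; i++)
      b[i] = b[i] + 1.0;
  }
}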
Sections
A parallel loop is an example of independent work units that are numbered.
If you have a pre-determined number of independent work units, the sections construct is more appropriate.
A sections construct can contain any number of section constructs, each of which should be independent.
They can be executed by any available thread in the current team.
Parallel Sections Directive
#pragma omp parallel
{
#pragma omp sections
{
{…}
#pragma omp section   /* this is a delimiter */
{…}
#pragma omp section
{…}
}
}
Example:
y = f(x) + g(x)
double y1,y2;
#pragma omp sections
{
#pragma omp section
y1 = f(x);
#pragma omp section
y2 = g(x);
}
y = y1+y2;
Single directive
It limits the execution of a block to a single thread
If the computation needs to be done only once
Helpful for initializing shared variables
#pragma omp parallel
{
#pragma omp single
printf("Inside section single!\n");
//Try to get thread numbers using omp_get_thread_num
// parallel code
}
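A runnable sketch of the "initialize a shared variable once" use case (the variable name is illustrative): exactly one thread performs the initialization, and the implicit barrier at the end of single guarantees the others see the value before using it.

#include <stdio.h>
#include <omp.h>

int main(void)
{
  int table_size = 0;             /* shared variable */

  #pragma omp parallel
  {
    #pragma omp single            /* exactly one thread executes this block */
    {
      table_size = 1024;
      printf("Initialized by thread %d\n", omp_get_thread_num());
    } /* implicit barrier: the other threads wait here */

    printf("Thread %d sees table_size = %d\n",
           omp_get_thread_num(), table_size);
  }
  return 0;
}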
Exercise 1:
Matrix multiplication using the sections primitive; observe the time taken.
Exercise 2:
Matrix multiplication using serial programming; observe the time taken.
Data Environment Directives (2 of 2)
Private
Threadprivate
Reduction
Private Variables
#pragma omp parallel for private( list )
Makes a private copy for each thread for each variable in the list.
This and all further examples are with parallel for, but the same applies to other region and work-sharing directives.
Private Variables: Example (1 of 2)
for( i=0; i<n; i++ ) {
tmp = a[i];
a[i] = b[i];
b[i] = tmp;
}
Swaps the values in a and b.
Loop-carried dependence on tmp.
Easily fixed by privatizing tmp.
Private Variables: Example (2 of 2)
#pragma omp parallel for private( tmp )
for( i=0; i<n; i++ ) {
tmp = a[i];
a[i] = b[i];
b[i] = tmp;
}
Removes dependence on tmp.
Would be more difficult to do in Pthreads.
Threadprivate
Private variables are private on a parallel region basis.
Threadprivate variables are global variables that are
private throughout the execution of the program.
Threadprivate
#pragma omp threadprivate( list )
Example: #pragma omp threadprivate( x)
Requires program change in Pthreads.
Requires an array of size p.
Access as x[pthread_self()].
Costly if accessed frequently.
Not cheap in OpenMP either.
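A minimal sketch (the counter variable is illustrative): the threadprivate directive gives the global variable one persistent copy per thread, so each thread keeps its own value with no x[pthread_self()]-style indexing.

#include <stdio.h>
#include <omp.h>

int counter = 0;                      /* global variable */
#pragma omp threadprivate(counter)    /* one copy per thread */

int main(void)
{
  #pragma omp parallel
  counter++;                          /* each thread increments its own copy */

  #pragma omp parallel
  {
    /* Copies persist between regions as long as the thread count is unchanged. */
    counter++;
    printf("Thread %d: counter = %d\n", omp_get_thread_num(), counter);
  }
  return 0;
}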
Reduction Variables
#pragma omp parallel for reduction( op:list )
op is one of +, *, -, &, ^, |, &&, or ||
The variables in list must be used with this operator in
the loop.
The variables are automatically initialized to sensible
values.
Reduction Variables: Example
#pragma omp parallel for reduction( +:sum )
for( i=0; i<n; i++ )
sum += a[i];

Sum is automatically initialized to zero.
Barrier: Example
#include <stdio.h>
#include <omp.h>

int main(void)
{
  int x;
  x = 2;
  #pragma omp parallel num_threads(2) shared(x)
  {
    if (omp_get_thread_num() == 0)
    {
      x = 5;
    }
    else
    { /* Print 1: the following read of x has a race */
      printf("1: Thread# %d: x = %d\n", omp_get_thread_num(), x);
    }
    #pragma omp barrier
    if (omp_get_thread_num() == 0)
    { /* Print 2 */
      printf("2: Thread# %d: x = %d\n", omp_get_thread_num(), x);
    }
    else
    { /* Print 3 */
      printf("3: Thread# %d: x = %d\n", omp_get_thread_num(), x);
    }
  }
  return 0;
}
Synchronization Primitives
Critical
#pragma omp critical (name)
Implements critical sections by name.
Similar to Pthreads mutex locks (name ~ lock).
Barrier
#pragma omp barrier
Implements global barrier.
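A sketch of the named critical section above (the name maxlock and the running-maximum computation are illustrative): at most one thread at a time executes the guarded update, much like holding a Pthreads mutex associated with maxlock.

#include <omp.h>

double global_max = 0.0;

void update_max(const double *a, int n)
{
  #pragma omp parallel for
  for (int i = 0; i < n; i++) {
    #pragma omp critical (maxlock)    /* only one thread inside at a time */
    {
      if (a[i] > global_max)
        global_max = a[i];
    }
  }
}

Entering the critical section on every iteration is slow; with a recent OpenMP a reduction(max:global_max) clause would usually be preferred, and the critical form is shown here only to illustrate the directive.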
Reduction
sum = 0;
#pragma omp parallel for reduction(+:sum)
for( i=0; i<n; i++ )
sum += a[i];
Dependence on sum is removed.
Exercise
Use OpenMP to implement a producer-consumer program
in which some of the threads are producers and others are
consumers. The producers read text from a collection of
files, one per producer. They insert lines of text into a
single shared queue. The consumers take the lines of text
and tokenize them. Tokens are “words”
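One possible starting point, not a full solution (the queue layout, size, and function names are assumptions): a bounded array used as the shared queue, with every insertion and removal protected by a named critical section.

#include <omp.h>

#define QSIZE 1024

char *queue[QSIZE];        /* shared queue of lines of text */
int head = 0, tail = 0;    /* shared indices; tail - head = items in the queue */

/* Producers call this after reading a line from their file. */
void enqueue_line(char *line)
{
  #pragma omp critical (queue_lock)
  {
    queue[tail % QSIZE] = line;   /* no full-queue check; a real solution needs one */
    tail++;
  }
}

/* Consumers call this and tokenize the returned line; NULL means the queue was empty. */
char *dequeue_line(void)
{
  char *line = NULL;
  #pragma omp critical (queue_lock)
  {
    if (head < tail) {
      line = queue[head % QSIZE];
      head++;
    }
  }
  return line;
}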
A search engine can be implemented using a farm of servers; each contains a subset of the data that can be searched.
Assume that this server farm has a single front end that interacts with clients who submit queries. Implement the above server farm using the master-worker pattern.
