Shared Memory: OpenMP
Environment and Synchronization
OpenMP API Overview
The API is a set of compiler directives inserted in the source program, in addition to some library functions. Ideally, the compiler directives do not affect the sequential code: they are pragmas in C/C++ and (special) comments in Fortran code.

API Semantics
The master thread executes the sequential code. The master and the slaves execute the parallel code. Note: this is very similar to the fork-join semantics of the Pthreads create/join primitives.

OpenMP Directives
Parallelization directives: parallel region, parallel for.
Data environment directives: shared, private, threadprivate, reduction, etc.
Synchronization directives: barrier, critical.

General Rules about Directives
A directive always applies to the next statement, which must be a structured block. Examples:

  #pragma omp ...
  statement

  #pragma omp ...
  {
    statement1;
    statement2;
    statement3;
  }

OpenMP Parallel Region

  #pragma omp parallel

A number of threads are spawned at entry. Each thread executes the same code, and each thread waits at the end. This is very similar to a number of create/join's with the same function in Pthreads.

Getting Threads to do Different Things
Either through explicit thread identification (as in Pthreads), or through work-sharing directives.
Thread Identification
int omp_get_thread_num() gets the thread id of the calling thread.
int omp_get_num_threads() gets the total number of threads in the team.

Example

  #pragma omp parallel
  {
    if( !omp_get_thread_num() )
      master();
    else
      slave();
  }
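A minimal, runnable variant of the master/slave example above; the printed messages are only for illustration. With GCC or Clang it can be compiled with the -fopenmp flag.

  #include <stdio.h>
  #include <omp.h>

  int main(void)
  {
      #pragma omp parallel
      {
          int id = omp_get_thread_num();        /* this thread's id */
          int nthreads = omp_get_num_threads(); /* size of the team */

          if (id == 0)
              printf("master: team has %d threads\n", nthreads);
          else
              printf("slave %d of %d\n", id, nthreads);
      }
      return 0;
  }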
Work Sharing Directives
Work sharing directives always occur within a parallel region directive. The two principal ones are parallel for and parallel sections.

OpenMP Parallel For

  #pragma omp parallel
  #pragma omp for
  for( ... )
  {
    ...
  }

Each thread executes a subset of the iterations. All threads wait at the end of the parallel for.

Multiple Work Sharing Directives
May occur within a single parallel region:

  #pragma omp parallel
  {
    #pragma omp for
    for( ; ; ) { ... }

    #pragma omp for
    for( ; ; ) { ... }
  }

All threads wait at the end of the first for.

The NoWait Qualifier

  #pragma omp parallel
  {
    #pragma omp for nowait
    for( ; ; ) { ... }

    #pragma omp for
    for( ; ; ) { ... }
  }

Threads proceed to the second for without waiting.
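A minimal, runnable sketch of this pattern, assuming the two loops touch different arrays so that the nowait is safe; the array size and values are made up for illustration.

  #include <stdio.h>
  #include <omp.h>

  #define N 1000

  int main(void)
  {
      double a[N], b[N];

      #pragma omp parallel
      {
          #pragma omp for nowait           /* no barrier after this loop */
          for (int i = 0; i < N; i++)
              a[i] = 2.0 * i;

          #pragma omp for                  /* independent of the first loop */
          for (int i = 0; i < N; i++)
              b[i] = (double)i * i;
      }                                    /* implicit barrier at the end of the region */

      printf("a[%d] = %f, b[%d] = %f\n", N - 1, a[N - 1], N - 1, b[N - 1]);
      return 0;
  }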
Sections
A parallel loop is an example of independent work units that are numbered. If you have a pre-determined number of independent work units, the sections construct is more appropriate. A sections construct can contain any number of section constructs, each of which should be independent; they can be executed by any available thread in the current team.

Parallel Sections Directive

  #pragma omp parallel
  {
    #pragma omp sections
    {
      { ... }
      #pragma omp section    /* this is a delimiter */
      { ... }
      #pragma omp section
      { ... }
      ...
    }
  }

Example: y = f(x) + g(x)

  double y1, y2;
  #pragma omp sections
  {
    #pragma omp section
    y1 = f(x);
    #pragma omp section
    y2 = g(x);
  }
  y = y1 + y2;
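A self-contained version of this example; f and g here are placeholder functions invented for the illustration.

  #include <stdio.h>
  #include <math.h>
  #include <omp.h>

  /* Placeholders standing in for two expensive, independent computations. */
  double f(double x) { return sin(x); }
  double g(double x) { return cos(x); }

  int main(void)
  {
      double x = 1.0, y1, y2, y;

      #pragma omp parallel sections
      {
          #pragma omp section
          y1 = f(x);                 /* one thread evaluates f(x) */
          #pragma omp section
          y2 = g(x);                 /* another thread may evaluate g(x) concurrently */
      }

      y = y1 + y2;
      printf("y = %f\n", y);
      return 0;
  }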
Single directive
It limits the execution of a block to a single thread, for computations that need to be done only once. Helpful for initializing shared variables.

  #pragma omp parallel
  {
    #pragma omp single
    printf("Inside section single!\n");
    // Try to get thread numbers using omp_get_thread_num
    // parallel code
  }
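A runnable sketch that follows the suggestion in the comment above: the thread number is printed inside the single block (executed once) and in the surrounding parallel code (executed by every thread).

  #include <stdio.h>
  #include <omp.h>

  int main(void)
  {
      #pragma omp parallel
      {
          #pragma omp single
          printf("single block run by thread %d\n", omp_get_thread_num());
          /* implicit barrier at the end of single (no nowait given) */

          printf("parallel code run by thread %d\n", omp_get_thread_num());
      }
      return 0;
  }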
Exercise 1:
Matrix multiplication using the sections primitive; observe the time taken.
Matrix multiplication using serial programming; observe the time taken.

Exercise 2:

Data Environment Directives (2 of 2)
Private, Threadprivate, Reduction.

Private Variables

  #pragma omp parallel for private( list )

Makes a private copy for each thread of each variable in the list. This and all further examples use parallel for, but the same applies to the other region and work-sharing directives.

Private Variables: Example (1 of 2)

  for( i=0; i<n; i++ ) {
    tmp = a[i];
    a[i] = b[i];
    b[i] = tmp;
  }

Swaps the values in a and b. There is a loop-carried dependence on tmp, easily fixed by privatizing tmp.

Private Variables: Example (2 of 2)

  #pragma omp parallel for private( tmp )
  for( i=0; i<n; i++ ) {
    tmp = a[i];
    a[i] = b[i];
    b[i] = tmp;
  }

Removes the dependence on tmp. This would be more difficult to do in Pthreads.

Threadprivate
Private variables are private on a parallel region basis. Threadprivate variables are global variables that are private throughout the execution of the program.

  #pragma omp threadprivate( list )

Example: #pragma omp threadprivate( x )
Achieving the same in Pthreads requires a program change: an array of size p, accessed as x[pthread_self()]. Costly if accessed frequently. Not cheap in OpenMP either.
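A minimal sketch of threadprivate, assuming dynamic thread adjustment is turned off and the team size stays the same, so each thread's copy persists from one parallel region to the next; the variable name and values are made up.

  #include <stdio.h>
  #include <omp.h>

  int counter = 0;                      /* one private copy per thread */
  #pragma omp threadprivate(counter)

  int main(void)
  {
      omp_set_dynamic(0);               /* keep the team size stable between regions */

      #pragma omp parallel
      counter = omp_get_thread_num();   /* each thread initializes its own copy */

      #pragma omp parallel
      {
          counter += 10;                /* still this thread's own copy */
          printf("thread %d: counter = %d\n", omp_get_thread_num(), counter);
      }
      return 0;
  }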
Reduction Variables

  #pragma omp parallel for reduction( op:list )

op is one of +, *, -, &, ^, |, &&, or ||. The variables in list must be used with this operator in the loop. The variables are automatically initialized to sensible values.

Reduction Variables: Example

  #pragma omp parallel for reduction( +:sum )
  for( i=0; i<n; i++ )
    sum += a[i];

sum is automatically initialized to zero.
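A complete, runnable version of the reduction example; the array contents are made up so the expected result is known.

  #include <stdio.h>
  #include <omp.h>

  #define N 1000

  int main(void)
  {
      double a[N], sum = 0.0;
      for (int i = 0; i < N; i++)
          a[i] = 1.0;                        /* expected sum is N */

      #pragma omp parallel for reduction(+:sum)
      for (int i = 0; i < N; i++)
          sum += a[i];                       /* each thread sums a private copy; copies are combined at the end */

      printf("sum = %.1f (expected %d)\n", sum, N);
      return 0;
  }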
Example: shared variable and barrier

  #include <stdio.h>
  #include <omp.h>

  int main(void)
  {
      int x;
      x = 2;
      #pragma omp parallel num_threads(2) shared(x)
      {
          if (omp_get_thread_num() == 0) {
              x = 5;
          } else {
              /* Print 1: the following read of x has a race */
              printf("1: Thread# %d: x = %d\n", omp_get_thread_num(), x);
          }

          #pragma omp barrier

          if (omp_get_thread_num() == 0) {
              /* Print 2 */
              printf("2: Thread# %d: x = %d\n", omp_get_thread_num(), x);
          } else {
              /* Print 3 */
              printf("3: Thread# %d: x = %d\n", omp_get_thread_num(), x);
          }
      }
      return 0;
  }

Before the barrier, the read of x in Print 1 races with the write x = 5; after the barrier, both threads see x == 5.

Synchronization Primitives

Critical

  #pragma omp critical ( name )

Implements critical sections by name. Similar to Pthreads mutex locks (name ~ lock).

Barrier

  #pragma omp barrier

Implements a global barrier.

Reduction

  #pragma omp parallel for reduction(+:sum)
  for( i=0; i<n; i++ )
    sum += a[i];

The dependence on sum is removed.
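A runnable sketch of a named critical section, assuming a made-up input array: each thread finds a local maximum over its share of the iterations and then updates the shared maximum inside the critical section.

  #include <stdio.h>
  #include <omp.h>

  #define N 1000

  int main(void)
  {
      int a[N];
      for (int i = 0; i < N; i++)
          a[i] = (i * 37) % 1000;           /* sample data */

      int global_max = a[0];
      #pragma omp parallel
      {
          int local_max = a[0];

          #pragma omp for nowait
          for (int i = 0; i < N; i++)
              if (a[i] > local_max)
                  local_max = a[i];

          /* Only one thread at a time updates the shared maximum. */
          #pragma omp critical ( max_update )
          if (local_max > global_max)
              global_max = local_max;
      }
      printf("max = %d\n", global_max);
      return 0;
  }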
Exercise
Use OpenMP to implement a producer-consumer program in which some of the threads are producers and others are consumers. The producers read text from a collection of files, one per producer, and insert lines of text into a single shared queue. The consumers take the lines of text and tokenize them; tokens are "words". A sketch of this pattern is given below.

A search engine can be implemented using a farm of servers, each of which contains a subset of the data that can be searched. Assume that this server farm has a single front-end that interacts with clients who submit queries. Implement the above server farm using the master-worker pattern.
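As a starting point for the producer-consumer exercise, here is a minimal sketch, not a full solution: producers generate lines instead of reading files, the queue is a fixed-size array protected by a named critical section, and consumers count whitespace-separated words. All sizes, names, and strings are invented for the illustration.

  #include <stdio.h>
  #include <stdlib.h>
  #include <omp.h>

  #define QUEUE_CAP 1024
  #define LINES_PER_PRODUCER 8

  static char *queue[QUEUE_CAP];
  static int head = 0, tail = 0;      /* shared queue state */
  static int producers_done = 0;      /* number of producers that have finished */

  int main(void)
  {
      int num_producers = 2, num_consumers = 2;

      #pragma omp parallel num_threads(num_producers + num_consumers)
      {
          int id = omp_get_thread_num();

          if (id < num_producers) {
              /* Producer: in the real exercise, read lines from one file per producer. */
              for (int i = 0; i < LINES_PER_PRODUCER; i++) {
                  char *line = malloc(64);
                  snprintf(line, 64, "producer %d line %d some words here", id, i);
                  #pragma omp critical ( queue_lock )
                  queue[tail++] = line;
              }
              #pragma omp critical ( queue_lock )
              producers_done++;
          } else {
              /* Consumer: pop lines and count whitespace-separated tokens. */
              int words = 0, running = 1;
              while (running) {
                  char *line = NULL;
                  #pragma omp critical ( queue_lock )
                  {
                      if (head < tail)
                          line = queue[head++];
                      else if (producers_done == num_producers)
                          running = 0;      /* queue drained and no producers left */
                  }
                  if (line) {
                      int in_word = 0;
                      for (char *p = line; *p; p++) {
                          if (*p != ' ' && !in_word) { words++; in_word = 1; }
                          else if (*p == ' ') in_word = 0;
                      }
                      free(line);
                  }
              }
              printf("consumer %d counted %d words\n", id, words);
          }
      }
      return 0;
  }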