OpenMP 3

Index

• OpenMP
• Program Structure
• Directives: master
• Clauses
• Reduction
• lastprivate
• References
Course Outline

Course Plan: Theory

Part A: Parallel Computer Architectures
Week 1,2,3: Introduction to Parallel Computer Architecture: parallel computing, parallel architecture; bit-level, instruction-level, data-level and task-level parallelism. Instruction-level parallelism: pipelining (data and control instructions), scalar and superscalar processors, vector processors. Parallel computers and computation.
Week 4,5: Memory Models: UMA, NUMA and COMA. Flynn's classification, cache coherence.
Week 6,7: Amdahl's Law. Performance evaluation. Designing parallel algorithms: divide and conquer, load balancing, pipelining.
Week 8-11: Parallel programming techniques such as task parallelism using TBB, TL2, Cilk++, etc., and software transactional memory techniques.
Course Outline
Part B: OpenMP/MPI/CUDA
Week 1,2,3: Shared Memory Programming Techniques: Introduction to OpenMP. Directives: parallel, for, sections, task, master, single, critical, barrier, taskwait, atomic. Clauses: private, shared, firstprivate, lastprivate, reduction, nowait, ordered, schedule, collapse, num_threads, if().
Week 4,5: Distributed Memory Programming Techniques: MPI: blocking, non-blocking.
Week 6,7: CUDA: OpenCL, execution models, GPU memory, GPU libraries.
Week 10,11: Introduction to accelerator programming using CUDA/OpenCL and Xeon Phi. Concepts of heterogeneous programming techniques.
Practical:
Implementation of parallel programs using OpenMP/MPI/CUDA.
Assignment: Performance evaluation of parallel algorithms (in groups of 2 or 3 members).
1. OpenMP
FORK – JOIN Parallelism
• An OpenMP program begins as a single process: the master thread. The master thread executes sequentially until the first parallel region construct is encountered.
• When a parallel region is encountered, the master thread:
– Creates a group of threads (FORK).
– Becomes the master of this group of threads and is assigned thread id 0 within the group.
• The statements in the program that are enclosed by the parallel region construct are then executed in parallel among these threads.
• JOIN: when the threads complete executing the statements in the parallel region construct, they synchronize and terminate, leaving only the master thread (see the sketch below).
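
A minimal illustrative sketch of this fork-join behaviour (the printf messages are my own):

#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("Master thread executes sequentially\n");      /* before the FORK */
    #pragma omp parallel                                  /* FORK: a team of threads is created */
    {
        int id = omp_get_thread_num();                    /* the master thread has id 0 */
        printf("Hello from thread %d of %d\n", id, omp_get_num_threads());
    }                                                     /* JOIN: implicit barrier, team terminates */
    printf("Only the master thread remains\n");
    return 0;
}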
2. OpenMP Programming: Directives
#pragma omp parallel [clause[,] clause ...] new-line
    structured-block
Clause: if(scalar-expression)
        num_threads(integer-expression)
        default(shared|none)
        private(list)
        firstprivate(list)
        shared(list)
        copyin(list)
        reduction(operator:list)
2. OpenMP Programming: Clauses
#pragma omp parallel [clause[,] clause ...] new-line
    structured-block

• if(scalar-expression): if true, execute the region in parallel.
• num_threads(integer-expression): set the number of threads.
• default(shared): causes all variables referenced in the construct which have implicitly determined sharing attributes to be shared.
• default(none): requires that each variable which is referenced in the construct, and that does not have a predetermined sharing attribute, must have its sharing attribute explicitly determined by being listed in a data-sharing attribute clause.
• shared(list): one or more list items must be shared among all the threads in a team.
• private(list): declares one or more list items to be private to a thread.
• firstprivate(list): declares one or more list items to be private to a thread and initializes each of them with the value that the corresponding original item has when the construct is encountered (see the sketch below).
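
A short sketch illustrating these clauses (the variable names n, a and b are illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int n = 8, a = 1, b = 2;
    #pragma omp parallel num_threads(4) default(none) shared(n) private(a) firstprivate(b)
    {
        int id = omp_get_thread_num();  /* declared inside the construct: private automatically */
        a = id;                         /* private copy: value undefined on entry, assign first */
        b = b + id;                     /* firstprivate copy: starts at 2 in every thread */
        printf("thread %d: a=%d b=%d n=%d\n", id, a, b, n);
    }
    printf("after region: a=%d b=%d\n", a, b);  /* still 1 and 2: private copies were discarded */
    return 0;
}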
2. OpenMP Programming: Directives - Shared

x is shared and is assigned the value 15 inside the parallel region.
Each thread assigns the value 15 and then updates it with x = x + 1; but because the threads race on x, one thread's update is not reliably reflected in the others (see the sketch below).
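
A minimal sketch of this case (illustrative; the original slide code is not reproduced here):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int x = 0;
    #pragma omp parallel shared(x)
    {
        x = 15;       /* every thread writes 15 to the shared x */
        x = x + 1;    /* unsynchronized read-modify-write: a data race */
        printf("thread %d sees x=%d\n", omp_get_thread_num(), x);
    }
    printf("after region: x=%d\n", x);  /* unpredictable: increments can be lost */
    return 0;
}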
2. OpenMP Programming: Directives - Shared

x is shared; no fresh assignment to x is made inside the parallel region (x is initialized before it).
An update made by one thread is reflected in all other threads.
Synchronization is the job of the programmer (see the sketch below).
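
A minimal sketch of this case, assuming the programmer supplies the synchronization with an atomic directive:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int x = 15;                    /* assigned once, before the parallel region */
    #pragma omp parallel shared(x)
    {
        #pragma omp atomic         /* programmer-supplied synchronization */
        x = x + 1;                 /* each thread's update is reflected in the others */
    }
    printf("after region: x=%d\n", x);  /* 15 + number of threads */
    return 0;
}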
2. OpenMP Programming: Directives - master

x is shared. The assignment statement is executed only by the master thread.
2. OpenMP Programming: Directives - master
The master construct specifies a structured block that is executed by the master thread of the team.
• A master region binds to the innermost enclosing parallel region.
• Only the master thread executes the structured block.
• There is no implied barrier on entry to or exit from the master construct, so the other threads do not wait; they simply skip the block (see the sketch below).

#pragma omp master new-line
{
    structured-block
}
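
A minimal sketch of the master construct; the explicit barrier is my addition, needed so the other threads see the master's assignment:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int x = 0;                     /* shared */
    #pragma omp parallel shared(x)
    {
        #pragma omp master         /* no implied barrier on entry or exit */
        x = 15;                    /* executed only by thread 0 */

        #pragma omp barrier        /* required if the others must see x == 15 */
        printf("thread %d sees x=%d\n", omp_get_thread_num(), x);
    }
    return 0;
}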
Consider a program that computes the sum of the elements of an array.
• The parallel region assigns iterations to each thread.
• A partial sum is calculated in each thread.
• Each partial sum is stored in an array.
• The master thread computes the final sum (see the sketch below).
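
A sketch of such a program, assuming a fixed upper bound on the team size (N and MAX_THREADS are illustrative choices):

#include <stdio.h>
#include <omp.h>

#define N 1000
#define MAX_THREADS 64

int main(void) {
    int a[N];
    long partial[MAX_THREADS] = {0}, sum = 0;
    for (int i = 0; i < N; i++) a[i] = 1;

    #pragma omp parallel shared(a, partial, sum)
    {
        int id = omp_get_thread_num();
        int nt = omp_get_num_threads();
        for (int j = id; j < N; j += nt)   /* iterations assigned to each thread */
            partial[id] += a[j];           /* partial sum per thread */
        #pragma omp barrier                /* all partial sums must be ready */
        #pragma omp master                 /* only the master computes the final sum */
        for (int t = 0; t < nt; t++)
            sum += partial[t];
    }
    printf("sum = %ld\n", sum);            /* prints 1000 */
    return 0;
}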
2. OpenMP Programming: Clauses
reduction(operator: list)
• The reduction clause specifies an operator and one or more list items.
• For each list item, a private copy is created on each thread and is initialized appropriately for the operator.
• After the end of the region, the original list item is updated with the values of the private copies using the specified operator.
• The initialization value depends on the data type of the reduction variable.
2. OpenMP Programming: Clauses
reduction(operator: list)
• The initialization value depends on the data type of the reduction variable: for example, a + reduction variable is initialized to 0 and a * reduction variable to 1.
Consider the same program, now computing the sum of the elements of the array with the reduction clause (see the sketch below).
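
A sketch of the same computation using the reduction clause:

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void) {
    int a[N];
    long sum = 0;                          /* + initializes each private copy to 0 */
    for (int i = 0; i < N; i++) a[i] = 1;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];                       /* each thread accumulates its private copy */
    /* at the end of the region the private copies are combined into sum with + */
    printf("sum = %ld\n", sum);            /* prints 1000 */
    return 0;
}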
2. OpenMP Programming: Clauses
reduction(operator: list)
• Used for some form of recurrence calculation.
• The type of a list item that appears in a reduction clause must be valid for the reduction operator.
• Aggregate types (including arrays), pointer types and reference types may not appear in a reduction clause.
• A variable that appears in a reduction clause must not be const-qualified.
• The operator specified in a reduction clause cannot be overloaded with respect to the variables that appear in that clause (see the sketch below).
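
A short sketch contrasting a valid reduction with the disallowed cases (the invalid forms appear only in comments):

#include <stdio.h>

int main(void) {
    int v[4] = {3, 1, 4, 1};
    int s = 0;

    #pragma omp parallel for reduction(+:s)   /* int is a valid type for + */
    for (int i = 0; i < 4; i++)
        s += v[i];

    /* Not allowed by the rules above:
       reduction(+:v)   -- v is an array (aggregate type)
       reduction(+:p)   -- if p were a pointer
       reduction(+:c)   -- if c were const-qualified */
    printf("s = %d\n", s);                    /* prints 9 */
    return 0;
}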
2. OpenMP Programming: Clauses
#pragma omp for [clause[,] clause ...] new-line
    for-loops
Clause: private(list)
        firstprivate(list)
        lastprivate(list)
        reduction(operator:list)
        schedule(kind[,chunk_size])
        collapse(n)
        ordered
        nowait
A sketch using schedule and nowait follows.
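
A minimal sketch of the for construct with a schedule clause (the chunk size 2 is an illustrative choice):

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        /* iterations are split into chunks of 2 and dealt round-robin to the threads */
        #pragma omp for schedule(static, 2) nowait
        for (int i = 0; i < 8; i++)
            printf("thread %d runs iteration %d\n", omp_get_thread_num(), i);
        /* nowait removes the barrier at the end of the for, not the join of the region */
    }
    return 0;
}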
2. OpenMP Programming: Clauses
lastprivate(list)
• Declares one or more list items to be private to a thread.
• After the end of the region, the value of the private copy from the sequentially last iteration of the associated loop (or the lexically last section) is assigned to the original list item (see the sketch below).
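
A minimal sketch of lastprivate:

#include <stdio.h>

int main(void) {
    int last = -1;
    #pragma omp parallel for lastprivate(last)
    for (int i = 0; i < 10; i++)
        last = i;                  /* each thread writes its own private copy */
    /* the copy from the sequentially last iteration (i == 9) is copied out */
    printf("last = %d\n", last);   /* prints 9 */
    return 0;
}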
Index
• OpenMP
• Program Structure
• Directives: master
• Clauses
• Reduction
• lastprivate
• References
Reference
Text Books and/or Reference Books:
1. Professional CUDA C Programming – John Cheng, Max Grossman, Ty McKercher, 2014
2. B. Wilkinson, M. Allen, "Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers", Pearson Education, 1999
3. I. Foster, "Designing and Building Parallel Programs", 2003
4. Parallel Programming in C using OpenMP and MPI – Michael J. Quinn, 2004
5. Introduction to Parallel Programming – Peter S. Pacheco, Morgan Kaufmann Publishers, 2011
6. Advanced Computer Architectures: A Design Approach – Dezso Sima, Terence Fountain, Peter Kacsuk, 2002
7. Parallel Computer Architecture: A Hardware/Software Approach – David E. Culler, Jaswinder Pal Singh, Anoop Gupta, 2011
8. Introduction to Parallel Computing – Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar, Pearson, 2011
Reference
Acknowledgements
1. Introduction to OpenMP, https://www3.nd.edu/~zxu2/acms60212-40212/Lec-12-OpenMP.pdf
2. Introduction to Parallel Programming for Shared Memory Machines, https://www.youtube.com/watch?v=LL3TAHpxOig
3. OpenMP Application Program Interface, Version 2.5, May 2005
4. OpenMP Application Program Interface, Version 5.0, November 2018
