
OpenMP Lab

Antonio Gómez-Iglesias
[email protected]
Texas Advanced Computing Center
Introduction

What you will learn
• How to compile code (C and Fortran) with OpenMP
• How to parallelize code with OpenMP
  – Use the correct header declarations
  – Parallelize simple loops
• How to effectively hide OpenMP statements

What you will do
• Modify the example code (READ the CODE COMMENTS)
• Compile and execute the examples
• Compare the run-times of the serial codes and the OpenMP parallel codes with different scheduling methods
Accessing Lab Files

• Log on to Stampede using your account.
• Untar the file lab_OpenMP.tar (in ~train00).
• The new directory (lab_openmp) contains sub-directories for exercises 1-3.
• cd into the appropriate subdirectory for an exercise.

ssh [email protected]
tar -xvf ~train00/lab_OpenMP.tar
cd lab_openmp
Running on Compute Nodes Interactively

• You will be compiling your code on the login node.
• You will be running on the compute nodes.
• In one of the sessions, run idev:
  – idev -A TRAINING-HPC -t 1:00:00
  – idev -A TG-TRA140011 -t 1:00:00
• This will give you access to a compute node.
Compiling

• All OpenMP statements are activated by the OpenMP flag:
  – Intel compiler: icc/ifort -openmp -fpp source.<c,f90>
• Compilation with the OpenMP flag (-openmp):
  – Activates OpenMP comment directives:
      Fortran: !$OMP ...
      C: #pragma omp ...
  – Enables the macro named _OPENMP:
      #ifdef _OPENMP evaluates to true
      (Fortran users: compile with -fpp)
  – Enables "hidden" statements (Fortran only!):
      !$ ...
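
A minimal C sketch (not part of the lab files) of what the _OPENMP macro makes possible; the file name test_openmp.c and the messages are illustrative:

#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(void) {
    /* _OPENMP is defined only when the OpenMP flag is used, so the
       same source builds both serially and in parallel */
#ifdef _OPENMP
    printf("OpenMP enabled, up to %d threads\n", omp_get_max_threads());
#else
    printf("serial build\n");
#endif
    return 0;
}

Compile once with icc -openmp test_openmp.c and once with plain icc to see both branches.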
Exercises – Lab 1

• Exercise 1: Kernel check
  f_kernel.f90/c_kernel.c
  Kernel of the calculation (see exercise 2)
  Parallelize one loop
• Exercise 2: Calculation of π
  f_pi.f90/c_pi.c
  Parallelize one loop with a reduction
• Exercise 3: daxpy (y = a*x + y)
  f_daxpy.f90/c_daxpy.c
  Parallelize one loop
Exercise 1: π Integration Kernel Check

• cd exercise_kernel
• Codes: f_kernel.f90/c_kernel.c
• Number of intervals is varied (Trial loop)

Code structure:
  Kernel
    Trial Loop: itrial
      Calculation of n and deltax
      Loop over i
      make sure area > 0.0

1. Parallelize the loop over i (see the sketch below):
   Use omp parallel do/for
   Set appropriate variables to private
2. Compile with:
   ifort -openmp f_kernel.f90
   icc -openmp c_kernel.c
3. Run with 1, 2, 4, 8, 12, 16 threads:
   e.g. export OMP_NUM_THREADS=4
   ./a.out
   Try also: export KMP_AFFINITY=compact
4. Compare the timings:
   Timings decrease with more threads.
   If you execute with more threads than cores, the timings will NOT decrease. Why?
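
The lab's kernel source is not reproduced here; the following is a hedged C sketch of step 1, assuming a midpoint-rule integrand of 4/(1+x*x) stored per interval. The names n, deltax, and area follow the code structure above; everything else is illustrative:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int n = 10000000, i;
    double deltax = 1.0 / n;
    double *f = malloc(n * sizeof *f);
    double x, area = 0.0;

    /* x is assigned in every iteration, so it must be private;
       n, deltax, and f can stay shared (f is indexed by i) */
    #pragma omp parallel for private(x)
    for (i = 0; i < n; i++) {
        x = (i + 0.5) * deltax;
        f[i] = 4.0 / (1.0 + x * x);
    }

    for (i = 0; i < n; i++)    /* serial sum: this exercise only checks the kernel */
        area += f[i] * deltax;

    printf("area = %f (make sure area > 0.0)\n", area);
    free(f);
    return 0;
}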
Exercise 2: π Integration

• cd exercise_pi
• Codes: f_pi.f90/c_pi.c
• Number of intervals is varied (Trial loop)

Code structure:
  π calculation
    Trial Loop: itrial
      Calculation of n and deltax
      Loop over i

1. Parallelize the loop over i:
   Use omp parallel do/for
   with the default(none) clause
2. Complete the OpenMP statements (see the sketch below):
   – Initialization
   – omp_get_max_threads
   – omp_get_thread_num
3. Compile with:
   make f_pi
   or
   make c_pi
4. Run with 1, 2, 4, 8, 12 threads:
   e.g. export OMP_NUM_THREADS=4
   ./c_pi or ./f_pi
5. Compare timings:
   Timings decrease with more threads.
   What is the scale-up at 12 threads?
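
Again a hedged C sketch, not the lab's c_pi.c: it shows the parallel for with default(none) and a reduction on the π accumulator (named in the Lab 1 overview), plus the omp_get_max_threads call listed above. Loop bounds and variable names are assumptions:

#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(void) {
    int n = 10000000, i;
    double deltax = 1.0 / n, x, pi = 0.0;

#ifdef _OPENMP
    printf("running with up to %d threads\n", omp_get_max_threads());
#endif

    /* default(none): every variable must be scoped explicitly;
       reduction(+:pi) gives each thread a private partial sum
       and combines the sums when the loop ends */
    #pragma omp parallel for default(none) shared(n, deltax) private(x) reduction(+:pi)
    for (i = 0; i < n; i++) {
        x = (i + 0.5) * deltax;
        pi += 4.0 / (1.0 + x * x) * deltax;
    }

    printf("pi = %.12f\n", pi);
    return 0;
}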
Exercise 3: daxpy

• cd exercise_daxpy
• Codes: f_daxpy.f90/c_daxpy.c
• Number of intervals is varied (Trial loop)

Code structure:
  daxpy
    Trial Loop: itrial
      Loop over i

1. Parallelize the loop over i:
   Use omp parallel do/for
   with the default(none) clause
2. Complete the OpenMP statements:
   – Initialization
   – omp_get_max_threads
3. Compile with:
   make f_daxpy
   or
   make c_daxpy
4. Run with 1 and 12 threads
5. Compare timings:
   Why is performance only doubled?
   Hint: Parallel performance can be limited by memory bandwidth. What is happening for every daxpy operation? Is there cache reuse? (See the sketch below.)
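
A hedged C sketch of the daxpy loop (array sizes and values are illustrative, not the lab's c_daxpy.c); the comment points at the bandwidth limit behind the question above:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int n = 50000000, i;
    double a = 2.0;
    double *x = malloc(n * sizeof *x);
    double *y = malloc(n * sizeof *y);

    for (i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* every iteration loads x[i] and y[i] and stores y[i], and no
       element is ever touched again: there is no cache reuse, so
       the loop is bound by memory bandwidth, not by core count */
    #pragma omp parallel for default(none) shared(n, a, x, y)
    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);
    free(x);
    free(y);
    return 0;
}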
Exercises – Lab 2

• Exercise 4: Update from neighboring cells (2 arrays)
  f_neighbor.f90/c_neighbor.c
  Create a parallel region
  Use a single construct to initialize
  Use a critical construct to update
  Use dynamic or guided scheduling
• Exercise 5: Update from neighboring cells (same array)
  f_red_black.f90/c_red_black.c
  Parallelize 3 individual loops, use a reduction
  Create a parallel region
  Combine loops 1 and 2
  Use a single construct to initialize
Exercise 4: Neighbor Update; Part 1

• cd exercise_neighbor
• Codes: f_neighbor.f90/c_neighbor.c
• Compile with: make f_neighbor
                make c_neighbor

Code structure:
  neighbor update
    Parallel Region
      Initialization: j_update
      Parallelize loop i
      Loop i
        Loop j
          increment j_update
        Loop k
          b is calculated from a

• Parallelize the loop over i
• Use omp parallel do/for with the default(none) clause
• Use a single construct for initialization
• Would a master construct work, too?
• Use critical for the increment of j_update
• Try different schedules: static, dynamic, guided
  (a sketch of these constructs follows below)
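
The lab's neighbor code is not reproduced here; the following is a condensed C sketch of the constructs requested above: a parallel region, a single for initialization, a critical around the shared counter, and a dynamic schedule. The array shapes, the stencil, and the bounds are assumptions:

#include <stdio.h>

#define N 500

double a[N][N], b[N][N];

int main(void) {
    int j_update = 0;

    #pragma omp parallel default(none) shared(a, b, j_update)
    {
        int i, j;   /* declared inside the region, hence private */

        /* one thread initializes the shared counter; the implicit
           barrier of single holds the team until it is done */
        #pragma omp single
        j_update = 0;

        /* dynamic (or guided) helps when iteration costs differ */
        #pragma omp for schedule(dynamic)
        for (i = 1; i < N - 1; i++) {
            for (j = 1; j < N - 1; j++) {
                /* b is calculated from neighboring cells of a */
                b[i][j] = 0.25 * (a[i-1][j] + a[i+1][j]
                                + a[i][j-1] + a[i][j+1]);

                /* an unprotected ++ on a shared counter is a data
                   race; critical serializes the increment */
                #pragma omp critical
                j_update++;
            }
        }
    }
    printf("j_update = %d\n", j_update);
    return 0;
}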
Exercise 4: Neighbor Update; Part 2

• Compile with: make f_neighbor
                make c_neighbor

Code structure:
  neighbor update
    Parallel Region
      Initialization: j_update
      Parallelize loop i
      Loop i
        Loop j
          single or master
            increment j_update
          end single or end master
        Loop k
          b is calculated from a

• Change the single to a master construct (see the contrast sketch below)
• Run with 1 and 12 threads
• How does the number of j_update change?
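
A minimal C contrast of single versus master (the counters and messages are illustrative, not the lab code). The semantic difference, which is one reason the j_update count can change, is in the comments:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        /* single: executed once, by the FIRST thread to arrive,
           with an implicit barrier for the whole team at the end */
        #pragma omp single
        printf("single ran on thread %d\n", omp_get_thread_num());

        /* master: executed ONLY by thread 0; there is no implied
           barrier, so other threads skip it without waiting */
        #pragma omp master
        printf("master ran on thread %d\n", omp_get_thread_num());
    }
    return 0;
}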
Exercise 5: Red-Black Update; Part 1

• cd exercise_redblack
• Codes: f_red_black.f90/c_red_black.c
• Make a copy and create f_red_black_v1.f90/c_red_black_v1.c
• Compile with: make f_red_black_v1
                make c_red_black_v1

Code structure:
  red-black update
    Iteration Loop: niter
      Loop: Update even elements
      Loop: Update odd elements
      Initialize error
      Loop-summation: error

Part 1
• Parallelize each loop separately
• Use omp parallel do/for for the update loops
• Use a reduction for the error calculation
• Use the default(none) clause
• Try static scheduling
  (a sketch of the three loops follows below)
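
A condensed C sketch of the three-loop pattern; the lab's arrays, bounds, and error metric are not reproduced, so the neighbor averaging and the absolute-difference error below are assumptions:

#include <stdio.h>
#include <math.h>

#define N 1000000

double a[N];

int main(void) {
    double error = 0.0;
    int i, iter;

    for (i = 0; i < N; i++) a[i] = (double)i / N;

    for (iter = 0; iter < 10; iter++) {
        /* loop 1: update one color from the left neighbor */
        #pragma omp parallel for default(none) shared(a) schedule(static)
        for (i = 2; i < N; i += 2)
            a[i] = 0.5 * (a[i] + a[i-1]);

        /* loop 2: update the other color from the right neighbor */
        #pragma omp parallel for default(none) shared(a) schedule(static)
        for (i = 1; i < N - 1; i += 2)
            a[i] = 0.5 * (a[i] + a[i+1]);

        /* loop 3: reduction gives each thread a private partial
           sum for error and combines them after the loop */
        error = 0.0;
        #pragma omp parallel for default(none) shared(a) reduction(+:error)
        for (i = 1; i < N; i++)
            error += fabs(a[i] - a[i-1]);
    }
    printf("error = %e\n", error);
    return 0;
}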
Exercise 5: Red-Black Update; Part 2

• cd exercise_redblack
• Start from version 1
• Codes: f_red_black.f90/c_red_black.c
• Make a copy and create f_red_black_v2.f90/c_red_black_v2.c
• Compile with: make f_red_black_v2
                make c_red_black_v2

Code structure:
  red-black update
    Iteration Loop: niter
      Loop: Update even and odd elements
      Initialize error
      Loop-summation: error

Part 2
• Can the loops be combined?
• Why can the update loops be combined?
• Why can the error loop not be combined with the update loops?
• Try static scheduling
• Task: Combine the update loops
Solution 5: Red-Black Update; Part 2

Separate update loops:

!*** Update even elements
do i=2, n, 2
   a(i) = 0.5 * (a(i) + a(i-1))
enddo
!*** Update odd elements
do i=1, n-1, 2
   a(i) = 0.5 * (a(i) + a(i+1))
enddo

Combined update loop:

!*** Update even and odd
!*** in one loop
do i=2, n, 2
   a(i)   = 0.5 * (a(i) + a(i-1))
   a(i-1) = 0.5 * (a(i-1) + a(i))
enddo
Exercise 5: Red-Black Update; Part 3

• cd exercise_redblack
• Start from version 2
• Codes: f_red_black.f90/c_red_black.c
• Make a copy and create f_red_black_v3.f90/c_red_black_v3.c
• Compile with: make f_red_black_v3
                make c_red_black_v3

Code structure:
  red-black update
    Iteration Loop: niter
      parallel region
        Loop: Update even and odd elements
        single
          Initialize error
        end single
        Loop-summation: error
      end parallel region

Part 3
• Make one parallel region around both loops: update and error
• The initialization of error has to be done by one thread
• Use a single construct
• Would a master construct work? (see the sketch below)
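
A hedged C sketch of Part 3, carrying over the assumed names and error metric from the Part 1 sketch: one parallel region contains the combined update loop, a single for the error initialization, and the reduction loop:

#include <stdio.h>
#include <math.h>

#define N 1000000
#define NITER 10

double a[N];

int main(void) {
    double error = 0.0;
    int i, iter;

    for (i = 0; i < N; i++) a[i] = (double)i / N;

    for (iter = 0; iter < NITER; iter++) {
        /* one team for the whole iteration: both work-sharing
           loops and the initialization run inside it */
        #pragma omp parallel default(none) shared(a, error)
        {
            int j;   /* declared inside the region, hence private */

            /* combined even/odd update (see the solution above) */
            #pragma omp for schedule(static)
            for (j = 2; j < N; j += 2) {
                a(j); /* see next lines */
                a[j]   = 0.5 * (a[j]   + a[j-1]);
                a[j-1] = 0.5 * (a[j-1] + a[j]);
            }

            /* one thread resets error; the implicit barrier of
               single holds the team until that happens. master
               has no implied barrier, so in general nothing would
               keep the summation from racing ahead of the reset */
            #pragma omp single
            error = 0.0;

            #pragma omp for reduction(+:error)
            for (j = 1; j < N; j++)
                error += fabs(a[j] - a[j-1]);
        }
    }
    printf("error after %d iterations = %e\n", NITER, error);
    return 0;
}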
Exercise 6: Orphaned work-sharing

• cd exercise_print
• Codes: f_print.f90/c_print.c
• Make a copy and create f_print_v1.f90/c_print_v1.c
• Compile with: make f_print
                make c_print

Code structure:
  parallel region
    print 1
    parallel Loop
      print 2
    call printer_sub
    master
      print 5

  subroutine printer_sub
    parallel Loop
      print 3
    Loop
      print 4

• Inspect the code
• Run with 1, 2, ... threads
• Explain the output
• How often are the 5 print statements executed? Why?
  (a sketch of the structure follows below)
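
A hedged C sketch mirroring the structure above (the loop bounds are assumptions); the omp for inside printer_sub is the orphaned work-sharing construct:

#include <stdio.h>

/* the omp for below has no lexically enclosing parallel region:
   it is "orphaned" and binds to whatever team is active when the
   routine is called */
void printer_sub(void) {
    int i;
    #pragma omp for
    for (i = 0; i < 4; i++)
        printf("print 3, i=%d\n", i);   /* iterations shared by the team */

    for (i = 0; i < 2; i++)             /* plain loop: EVERY thread runs all iterations */
        printf("print 4, i=%d\n", i);
}

int main(void) {
    int i;
    #pragma omp parallel
    {
        printf("print 1\n");            /* once per thread */

        #pragma omp for
        for (i = 0; i < 4; i++)
            printf("print 2, i=%d\n", i);   /* iterations split among threads */

        printer_sub();                  /* the orphaned omp for binds to this team */

        #pragma omp master
        printf("print 5\n");            /* master thread only */
    }
    return 0;
}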