
High Performance Computing

(HPC)
Lecture 3

By: Dr. Maha Dessokey


Programming with Shared Memory
(OpenMP)
Parallel Computer Memory Architectures

 Shared Memory
All processors access all memory as a single global address space.
Data sharing is fast.
Main disadvantage: lack of scalability between memory and CPUs, since adding more CPUs increases traffic on the shared memory path.
Multithreading vs. Multiprocessing

 Threads: share the same process memory space and global variables between routines.
 Processes: "heavyweight", each a completely separate program with its own variables, stack, and memory allocation.
Programming with Shared Memory

The most popular shared-memory multithreading APIs are:

 POSIX Threads (Pthreads)
 OpenMP
Agenda

 Introduction to OpenMP
 Creating Threads
 Synchronization
 Parallel Loops
What is OpenMP?

 OpenMP: An API for Writing Multithreaded Applications


 “Standard” API for defining multi-threaded shared-memory
programs
 Set of compiler directives and library routines for parallel
application programmers
 Greatly simplifies writing multi-threaded (MT) programs in Fortran, C, and C++
OpenMP Solution Stack
A Programmer’s View of OpenMP

 OpenMP will:
 Allow a programmer to separate a program into serial regions and parallel
regions, rather than concurrently-executing threads.
 Hide stack management
 Provide synchronization constructs
 OpenMP will not:
 Parallelize automatically
 Guarantee speedup
 Provide freedom from data races
 race condition: when the program’s outcome changes as
the threads are scheduled differently
An instance of a program

 Threads interact through reads/writes


to a shared address space.
 OS scheduler decides when to run
which threads … interleaved for
fairness.
 Synchronization is used to ensure that every legal ordering of operations produces correct results.
OpenMP core syntax

 Most of the constructs in OpenMP are compiler directives.


Example: #pragma omp parallel num_threads(4), where omp is an OpenMP keyword.
 Function prototypes and types are in the header file:
#include <omp.h>
 OpenMP constructs apply to a “structured block”.
 Structured block: a block of one or more statements with one point of entry at
the top and one point of exit at the bottom.
 It’s OK to have an exit() within the structured block.
 A non-structured block lacks clear control flow and can lead to "spaghetti code,"
where the logic is tangled and difficult to follow. This often includes the use of
GOTO statements or deeply nested control structures.
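For illustration, here is a minimal sketch (not from the slides) of a directive applied to a structured block:

#include <stdlib.h>
#include <omp.h>

void structured_block_demo(void)
{
    /* The pragma applies to the structured block below: one point of entry
       at the top and one point of exit at the bottom. */
    #pragma omp parallel num_threads(4)
    {
        int id = omp_get_thread_num();
        if (id < 0)
            exit(1);    /* calling exit() inside the block is allowed */
        /* jumping out of the block with goto, break, or return is not allowed,
           since that would make the block non-structured */
    }
}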
A multi-threaded “Hello world” program

 Write a multithreaded program where each thread prints “hello world”.
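One possible version, as a minimal sketch (the slide's own code is not reproduced in this text):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel                      /* fork a team of threads */
    {
        int id = omp_get_thread_num();        /* each thread gets its own ID */
        printf("hello world from thread %d\n", id);
    }                                         /* threads join back here */
    return 0;
}

With GCC, for example, this can be built with gcc -fopenmp hello.c.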
How do threads interact?

 OpenMP is a multi-threading, shared address model.


 Threads communicate by sharing variables.
 Unintended sharing of data causes race conditions:
 Race Condition: when the program’s outcome changes as the threads are
scheduled differently.
 To control race conditions, use synchronization to protect data conflicts.
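For instance, a minimal sketch (not taken from the slides) of protecting a shared update with the critical construct introduced later in this lecture:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int count = 0;                  /* shared by all threads */

    #pragma omp parallel
    {
        /* An unprotected "count = count + 1;" here would be a race: the final
           value would depend on how the threads are scheduled. */
        #pragma omp critical        /* synchronization protects the shared update */
        count = count + 1;
    }

    printf("count = %d\n", count);
    return 0;
}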
OpenMP Programming Model

 Fork-Join Model:
 Master thread spawns a team
of threads as needed.
 Parallelism added
incrementally until
performance goals are met:
i.e. the sequential program
evolves into a parallel
program.
Thread Creation: Parallel Regions

 You create threads in OpenMP with the parallel construct.
 For example, to create a 4-thread parallel region:

Each thread calls pooh(ID, A) for ID = 0 to 3.
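A sketch of what such a region might look like; pooh is the routine named above and is only declared here, since its body is not shown:

#include <omp.h>

void pooh(int ID, double *A);          /* declaration only; body not shown on the slide */

void example(void)
{
    double A[1000];                    /* shared data passed to every thread */
    omp_set_num_threads(4);            /* request a team of 4 threads */
    #pragma omp parallel
    {
        int ID = omp_get_thread_num(); /* ID = 0, 1, 2, 3 */
        pooh(ID, A);                   /* each thread calls pooh with its own ID */
    }                                  /* threads wait here until all have finished */
}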


Example- Numerical Integration

 Mathematically, we know that:

    ∫₀¹ 4/(1+x²) dx = π

 We can approximate the integral as a sum of rectangles:

    Σᵢ F(xᵢ) ∆x ≈ π

 where each rectangle has width ∆x and height F(xᵢ) at the middle of interval i.
Serial PI Program

static long num_steps = 100000;
double step;

int main ()
{
    int i; double x, pi, sum = 0.0;
    step = 1.0/(double) num_steps;
    for (i=0; i<num_steps; i++)
    {
        x = (i+0.5)*step;
        sum = sum + 4.0/(1.0+x*x);
    }
    pi = step * sum;
}
A simple Parallel pi program

 To create a parallel version of the pi program, pay close attention to shared versus private variables.
 We will need the runtime library routines:

    int omp_get_num_threads();   // number of threads in the team
    int omp_get_thread_num();    // thread ID (rank)
    double omp_get_wtime();      // time in seconds since a fixed point in the past
A simple Parallel pi program

#include <omp.h>
static long num_steps = 100000;
double step;
#define NUM_THREADS 2

void main ()
{
    int i, nthreads; double pi, sum[NUM_THREADS];
    step = 1.0/(double) num_steps;
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int i, id, nthrds; double x;
        id = omp_get_thread_num();
        nthrds = omp_get_num_threads();
        if (id == 0) nthreads = nthrds;
        for (i=id, sum[id]=0.0; i<num_steps; i=i+nthrds) {
            x = (i+0.5)*step;
            sum[id] += 4.0/(1.0+x*x);
        }
    } // End of parallel region
    for (i=0, pi=0.0; i<nthreads; i++)
        pi += sum[i] * step;
}
How to calculate the runtime?

#include <stdio.h>
#include <omp.h>
static long num_steps = 100000;
double step;
#define NUM_THREADS 2

void main ()
{
    int i, nthreads; double pi, sum[NUM_THREADS];
    double runtime;
    runtime = omp_get_wtime();
    step = 1.0/(double) num_steps;
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int i, id, nthrds; double x;
        id = omp_get_thread_num();
        nthrds = omp_get_num_threads();
        if (id == 0) nthreads = nthrds;
        for (i=id, sum[id]=0.0; i<num_steps; i=i+nthrds) {
            x = (i+0.5)*step;
            sum[id] += 4.0/(1.0+x*x);
        }
    } // End of parallel region
    for (i=0, pi=0.0; i<nthreads; i++)
        pi += sum[i] * step;
    runtime = omp_get_wtime() - runtime;
    printf("In %lf seconds, pi = %lf\n", runtime, pi);
}
Algorithm strategy

 The SPMD (Single Program Multiple Data) design pattern
 Run the same program on P processing elements, where P can be arbitrarily large.
 Use the rank (an ID ranging from 0 to P-1) to select between a set of tasks and to manage any shared data structures.
 This pattern is very general and has been used to support most (if
not all) the algorithm strategy patterns.
 MPI programs almost always use this pattern
 it is probably the most commonly used pattern in the history of
parallel programming.
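A minimal OpenMP sketch of the pattern; process_item() and N are hypothetical placeholders:

#include <omp.h>

#define N 1000
void process_item(int i);                  /* hypothetical per-item work, not defined here */

void spmd_example(void)
{
    #pragma omp parallel
    {
        int rank = omp_get_thread_num();   /* ID ranging from 0 to P-1 */
        int P    = omp_get_num_threads();
        /* Every thread runs the same code; the rank selects its share of the
           work (a cyclic distribution, as in the pi program above). */
        for (int i = rank; i < N; i += P)
            process_item(i);
    }
}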
Synchronization

 Synchronization: bringing one or more threads to a well


defined and known point in their execution.
 Synchronization is used to impose order constraints and to protect access to shared data.
 The two most common forms of synchronization are:
 Barrier: each thread waits at the barrier until all threads arrive.
 Mutual exclusion: define a block of code that only one thread at a time can execute.
Synchronization: Barrier

Barrier: Each thread waits until all threads arrive.

#pragma omp parallel
{
    int id = omp_get_thread_num();
    A[id] = big_calc1(id);
    #pragma omp barrier          // B[] will not be calculated until all threads complete the A[] calculations
    B[id] = big_calc2(id, A);
}
Synchronization: Mutual exclusion

Mutual exclusion: Only one thread at a time can enter the critical region.

float res;
#pragma omp parallel
{
    float B; int i, id, nthrds;
    id = omp_get_thread_num();
    nthrds = omp_get_num_threads();
    for (i=id; i<niters; i+=nthrds) {
        B = big_job(i);
        #pragma omp critical     // threads wait their turn: only one at a time calls consume()
        res += consume(B);
    }
}
Synchronization: Atomic

Atomic: provides mutual exclusion, but it only applies to the update of a memory location (the update of X in the following example).

#pragma omp parallel
{
    double tmp, B;
    B = DOIT();
    tmp = big_ugly(B);
    #pragma omp atomic           // atomic only protects the read/update of X
    X += big_ugly(B);
}
SPMD vs. worksharing

 A parallel construct by itself creates an SPMD or “Single Program


Multiple Data” program … i.e., each thread redundantly executes
the same code.
 How do you split up pathways through the code between threads
within a team?
 This is called worksharing:
 Loop construct
 Sections/section constructs
 Single construct
 Task construct
(only the loop construct is covered in this lecture; the other worksharing constructs are out of our scope)
The loop worksharing Constructs

The loop worksharing construct splits up loop iterations among the threads in a team.

#pragma omp parallel
{
    #pragma omp for
    for (I=0; I<N; I++)
    {
        NEAT_STUFF(I);       // the loop variable I is made "private" to each thread by default
    }
}
The loop worksharing Constructs

Sequential code:

    for (i=0; i<N; i++) { a[i] = a[i] + b[i]; }

OpenMP parallel region (block distribution of the loop iterations):

    #pragma omp parallel
    {
        int id, i, Nthrds, Step, istart, iend;
        id = omp_get_thread_num();
        Nthrds = omp_get_num_threads();
        Step = N / Nthrds;
        istart = id * Step;
        iend = (id+1) * Step;
        if (id == Nthrds-1) iend = N;   // last thread takes the remainder
        for (i=istart; i<iend; i++)
        {
            a[i] = a[i] + b[i];
        }
    }
The loop worksharing Constructs

Sequential code:

    for (i=0; i<N; i++) { a[i] = a[i] + b[i]; }

OpenMP parallel region and a worksharing for construct:

    #pragma omp parallel
    #pragma omp for
    for (i=0; i<N; i++) { a[i] = a[i] + b[i]; }
Combined parallel/worksharing construct

 OpenMP shortcut: put the “parallel” and the worksharing directives on the same line, as in the sketch below.
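A minimal sketch of the shortcut, equivalent to the separate parallel and for directives on the previous slide (i, N, a, and b are assumed to be declared as before):

    #pragma omp parallel for
    for (i=0; i<N; i++) { a[i] = a[i] + b[i]; }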
loop worksharing constructs:
The schedule clause

 The schedule clause affects how loop iterations are mapped onto threads
 schedule(static [,chunk])
 Deal-out blocks of iterations of size “chunk” to each thread.
 schedule(dynamic[,chunk])
 Each thread grabs “chunk” iterations off a queue until all iterations have been handled.
 schedule(guided[,chunk])
 Threads dynamically grab blocks of iterations. The size of the block starts large and shrinks down
to size “chunk” as the calculation proceeds.
 schedule(runtime)
 Schedule and chunk size taken from the OMP_SCHEDULE environment variable (or the runtime
library).
 schedule(auto) – Schedule is left up to the runtime to choose (does not have to be any of
the above)
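A brief sketch of the clause in use; the loop bounds and work() are placeholders:

    /* iterations divided into chunks of 8 and assigned to the threads in round-robin order */
    #pragma omp parallel for schedule(static, 8)
    for (i = 0; i < N; i++)
        a[i] = work(i);

    /* each thread grabs the next chunk of 8 iterations off a queue as it becomes free */
    #pragma omp parallel for schedule(dynamic, 8)
    for (i = 0; i < N; i++)
        a[i] = work(i);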
Assignment 1: Parallel Matrix Addition Using OpenMP

 You will implement a parallel program to perform matrix addition using OpenMP.
 This exercise will help you understand how to utilize parallel computing to enhance performance.
 The assignment is due in two weeks.
