High Performance Computing (HPC) - Lec3
Shared Memory
All processors access all memory as a single global address space.
Data sharing between tasks is fast.
Scalability between memory and CPUs is limited: adding CPUs increases traffic on the shared memory-CPU path.
Multithreading vs. Multiprocessing
Introduction to OpenMP
Creating Threads
Synchronization
Parallel Loops
What is OpenMP?
OpenMP will:
Allow a programmer to separate a program into serial and parallel regions, rather than manage concurrently executing threads explicitly.
Hide stack management
Provide synchronization constructs
OpenMP will not:
Parallelize automatically
Guarantee speedup
Provide freedom from data races
Race condition: the program's outcome changes depending on how the threads are scheduled.
A process is an instance of a program.
Fork-Join Model:
The master thread spawns a team of threads as needed.
Parallelism is added incrementally until performance goals are met, i.e. the sequential program evolves into a parallel program.
Thread Creation: Parallel Regions
Mutual exclusion: Only one thread at a time can enter a critical region
float res;
#pragma omp parallel
{
    float B;
    int i, id, nthrds;
    id = omp_get_thread_num();
    nthrds = omp_get_num_threads();
    for (i = id; i < niters; i += nthrds) {
        B = big_job(i);
        #pragma omp critical
        res += consume(B);  /* threads wait their turn: only one at a time calls consume() */
    }
}
Synchronization: Atomic
SPMD vs. worksharing
The loop worksharing constructs
The schedule clause affects how loop iterations are mapped onto threads:
schedule(static [,chunk]) – Deal out blocks of iterations of size "chunk" to each thread.
schedule(dynamic [,chunk]) – Each thread grabs "chunk" iterations off a queue until all iterations have been handled.
schedule(guided [,chunk]) – Threads dynamically grab blocks of iterations. The block size starts large and shrinks down to "chunk" as the calculation proceeds.
schedule(runtime) – Schedule and chunk size are taken from the OMP_SCHEDULE environment variable (or the runtime library).
schedule(auto) – The schedule is left up to the runtime to choose (it need not be any of the above).
Assignment 1 – Parallel Matrix Addition Using OpenMP
This exercise will help you understand how to use parallel computing to enhance performance.
Due: two weeks from today.