
High Performance Computing: OpenMP and MPI

Prepared for Exam Revision

May 2025

A comprehensive guide to OpenMP and MPI for shared-memory and distributed-memory parallel programming.

Contents

1 Introduction

2 OpenMP
  2.1 Overview
  2.2 Compiling and Running
  2.3 Setting the Number of Threads
  2.4 Private and Shared Variables
  2.5 Parallelizing For Loops
  2.6 Scheduling
  2.7 Reduction Clauses
  2.8 Synchronization
  2.9 Sections

3 MPI
  3.1 Overview
  3.2 Compiling and Running
  3.3 General Structure
  3.4 Number of Processes and Rank
  3.5 Message Passing
  3.6 Broadcast
  3.7 Scatter and Gather
  3.8 Reduce
  3.9 Synchronization
  3.10 Cluster Setup

4 OpenMP vs. MPI

5 Study Tips for Your Exam

1 Introduction
This document provides a detailed course on OpenMP and MPI, two key technologies
for parallel computing. OpenMP is designed for shared-memory systems using threads,
while MPI is suited for distributed-memory systems using processes. The course in-
cludes concepts, syntax, examples, and comparisons to aid in preparing for your High
Performance Computing exam.

Note
This document uses blue for OpenMP-related content and green for MPI-related
content to enhance clarity.

2 OpenMP
OpenMP (Open Multi-Processing) is an API for shared-memory parallel programming in
C, C++, and Fortran. It uses threads to execute code concurrently on a single machine
with shared memory.

2.1 Overview
• Purpose: Parallelize code on multi-core processors using threads.

• Key Features:

– Shared memory model: All threads access the same memory space.
– Directives to parallelize loops, sections, or tasks.
– Synchronization mechanisms to manage thread interactions.

• Advantages: Easy to use, portable across platforms.

• Limitations: Limited to single-node systems, thread overhead.

2.2 Compiling and Running
To compile, use the -fopenmp flag with gcc:

gcc -fopenmp source.c -o output


./output

Example: A simple "Hello World" program.


#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel
    {
        int thread_id = omp_get_thread_num();
        printf("Hello from thread %d\n", thread_id);
    }
    return 0;
}

Expected Output (varies by number of threads, e.g., for 4 threads):

Hello from thread 0


Hello from thread 1
Hello from thread 2
Hello from thread 3

Note: The order of output lines may vary due to thread scheduling.

2.3 Setting the Number of Threads


The number of threads can be set in three ways:

• Programmatically: omp_set_num_threads(NUM_THREADS);

• Directive: #pragma omp parallel num_threads(NUM_THREADS)

• Environment: export OMP_NUM_THREADS=NUM_THREADS

Comparison:

• omp_set_num_threads: Sets the default thread count for all subsequent parallel regions; it overrides OMP_NUM_THREADS.

• num_threads: Applies only to the parallel region it annotates and takes precedence over both other methods.

• OMP_NUM_THREADS: Sets the program-wide default, used when neither of the above is specified.

Example: Using all three methods.

#include <stdio.h>
#include <omp.h>

int main() {
    omp_set_num_threads(4); // Programmatic
    #pragma omp parallel
    {
        printf("Thread %d (programmatic)\n", omp_get_thread_num());
    }
    #pragma omp parallel num_threads(4) // Directive (applies to this region only)
    {
        printf("Thread %d (directive)\n", omp_get_thread_num());
    }
    // Environment: export OMP_NUM_THREADS=4 sets the program default, but here it
    // is overridden by the earlier omp_set_num_threads(4) call.
    #pragma omp parallel
    {
        printf("Thread %d (environment)\n", omp_get_thread_num());
    }
    return 0;
}

Expected Output (assuming OMP_NUM_THREADS=4):

Thread 0 (programmatic)
Thread 1 (programmatic)
Thread 2 (programmatic)
Thread 3 (programmatic)
Thread 0 (directive)
Thread 1 (directive)
Thread 2 (directive)
Thread 3 (directive)
Thread 0 (environment)
Thread 1 (environment)
Thread 2 (environment)
Thread 3 (environment)

Note: The order of threads within each section may vary.

2.4 Private and Shared Variables
Variables declared before a parallel region are shared by default. Use private to give each thread its own uninitialized copy.

• private(var): Each thread has an independent copy.

• shared(var): All threads access the same variable.

Example: Private vs. shared variables.


#include <stdio.h>
#include <unistd.h>
#include <omp.h>

int main() {
    int prv = 0, pub = 0;
    printf("Thread ID, Private, Shared\n");
    #pragma omp parallel num_threads(4) private(prv) shared(pub)
    {
        int thread_id = omp_get_thread_num();
        prv = 0;          // private copy is uninitialized on entry, so reset it
        prv += thread_id;
        sleep(1);
        pub += thread_id; // unsynchronized update of a shared variable (race)
        printf("%d, %d, %d\n", thread_id, prv, pub);
    }
    return 0;
}

Expected Output:

Thread ID, Private, Shared


0, 0, 0
1, 1, 1
2, 2, 3
3, 3, 6

Note: Each thread works on its own copy of prv, while all threads update the shared pub. Because the updates to pub are not synchronized, the printed pub values and the output order can differ between runs; the listing above shows one possible interleaving.

2.5 Parallelizing For Loops
#pragma omp for distributes loop iterations across threads. Example: 8 iterations
across 4 threads.
#include <stdio.h>
#include <omp.h>

#define NUM_THREADS 4
#define NUM_ITERS 8

int main() {
    #pragma omp parallel num_threads(NUM_THREADS)
    {
        #pragma omp for
        for (int i = 0; i < NUM_ITERS; i++) {
            int id = omp_get_thread_num();
            printf("Thread %d: Loop iteration %d\n", id, i);
        }
    }
    return 0;
}

Expected Output (sample for static scheduling):

Thread 0: Loop iteration 0


Thread 0: Loop iteration 1
Thread 1: Loop iteration 2
Thread 1: Loop iteration 3
Thread 2: Loop iteration 4
Thread 2: Loop iteration 5
Thread 3: Loop iteration 6
Thread 3: Loop iteration 7

Note: The assignment of iterations to threads depends on the scheduling type (default is
static). Output order may vary.

2.6 Scheduling
Scheduling controls how iterations are assigned:

• Static: Divides the loop iterations into equal-sized chunks and assigns them to
  threads in a fixed, round-robin order before the loop executes. The optional chunk
  parameter specifies the size of each chunk.

  – Syntax: #pragma omp for schedule(static[,chunk])
  – Pros: Lowest overhead; deterministic assignment.
  – Cons: Poor load balancing when iterations have uneven cost.

• Dynamic: Divides the iterations into chunks of size chunk and assigns these
  chunks to threads at runtime, as threads become available for work.

  – Syntax: #pragma omp for schedule(dynamic[,chunk])
  – Pros: Better load balancing.
  – Cons: Higher scheduling overhead.

• Guided: Similar to dynamic scheduling, but the chunk size decreases over time.
  Larger chunks at the start reduce scheduling overhead, while smaller chunks
  toward the end help balance the remaining work among threads.

  – Syntax: #pragma omp for schedule(guided[,chunk])
  – Pros: Balances overhead and load.
  – Cons: Less predictable assignment.

• Runtime: Defers the choice of scheduling policy to run time, via the OMP_SCHEDULE
  environment variable or the omp_set_schedule function, so the policy can be
  changed without recompiling (see the snippet after this list).

  – Syntax: #pragma omp for schedule(runtime)
  – Pros: Flexible.
  – Cons: Requires external setup.
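For runtime scheduling, the policy can be selected without recompiling, either by setting export OMP_SCHEDULE="dynamic,2" in the shell before running the program or from inside the code. A minimal sketch of the in-code variant; the dynamic policy and chunk size 2 are arbitrary choices for illustration, not values from the course:

#include <stdio.h>
#include <omp.h>

int main() {
    // Equivalent to OMP_SCHEDULE="dynamic,2", but chosen inside the program.
    omp_set_schedule(omp_sched_dynamic, 2);
    #pragma omp parallel for schedule(runtime) num_threads(4)
    for (int i = 0; i < 16; i++) {
        // Iterations are now handed out dynamically in chunks of 2.
        printf("Thread %d: Iteration %d\n", omp_get_thread_num(), i);
    }
    return 0;
}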

Example: Comparing scheduling types.


#include <stdio.h>
#include <omp.h>

#define NUM_THREADS 4
#define NUM_ITERS 16

int main() {
    printf("Static Scheduling:\n");
    #pragma omp parallel num_threads(NUM_THREADS)
    {
        #pragma omp for schedule(static, 2)
        for (int i = 0; i < NUM_ITERS; i++) {
            printf("Thread %d: Iteration %d\n", omp_get_thread_num(), i);
        }
    }
    printf("\nDynamic Scheduling:\n");
    #pragma omp parallel num_threads(NUM_THREADS)
    {
        #pragma omp for schedule(dynamic, 2)
        for (int i = 0; i < NUM_ITERS; i++) {
            printf("Thread %d: Iteration %d\n", omp_get_thread_num(), i);
        }
    }
    printf("\nGuided Scheduling:\n");
    #pragma omp parallel num_threads(NUM_THREADS)
    {
        #pragma omp for schedule(guided, 2)
        for (int i = 0; i < NUM_ITERS; i++) {
            printf("Thread %d: Iteration %d\n", omp_get_thread_num(), i);
        }
    }
    printf("\nRuntime Scheduling:\n");
    #pragma omp parallel num_threads(NUM_THREADS)
    {
        #pragma omp for schedule(runtime)
        for (int i = 0; i < NUM_ITERS; i++) {
            printf("Thread %d: Iteration %d\n", omp_get_thread_num(), i);
        }
    }
    return 0;
}

Expected Output (sample):


Static Scheduling:
Thread 0: Iteration 0
Thread 0: Iteration 1
Thread 1: Iteration 2
Thread 1: Iteration 3
Thread 2: Iteration 4
Thread 2: Iteration 5
Thread 3: Iteration 6
Thread 3: Iteration 7
Thread 0: Iteration 8
Thread 0: Iteration 9
Thread 1: Iteration 10
Thread 1: Iteration 11
Thread 2: Iteration 12
Thread 2: Iteration 13
Thread 3: Iteration 14
Thread 3: Iteration 15

Dynamic Scheduling:
Thread 0: Iteration 0
Thread 0: Iteration 1
Thread 1: Iteration 2
Thread 1: Iteration 3
Thread 2: Iteration 4
Thread 2: Iteration 5
Thread 3: Iteration 6
Thread 3: Iteration 7
Thread 0: Iteration 8
Thread 0: Iteration 9
Thread 1: Iteration 10
Thread 1: Iteration 11
Thread 2: Iteration 12
Thread 2: Iteration 13
Thread 3: Iteration 14
Thread 3: Iteration 15

Guided Scheduling:
Thread 0: Iteration 0
Thread 0: Iteration 1
Thread 0: Iteration 2
Thread 0: Iteration 3
Thread 1: Iteration 4
Thread 1: Iteration 5
Thread 1: Iteration 6
Thread 2: Iteration 7
Thread 2: Iteration 8

Thread 2: Iteration 9
Thread 3: Iteration 10
Thread 3: Iteration 11
Thread 0: Iteration 12
Thread 1: Iteration 13
Thread 2: Iteration 14
Thread 3: Iteration 15

Runtime Scheduling:
Thread 0: Iteration 0
Thread 0: Iteration 1
Thread 0: Iteration 2
Thread 0: Iteration 3
Thread 1: Iteration 4
Thread 1: Iteration 5
Thread 1: Iteration 6
Thread 1: Iteration 7
Thread 2: Iteration 8
Thread 2: Iteration 9
Thread 2: Iteration 10
Thread 2: Iteration 11
Thread 3: Iteration 12
Thread 3: Iteration 13
Thread 3: Iteration 14
Thread 3: Iteration 15
Note: With schedule(static, 2), chunks of two iterations are handed out round-robin, so thread 0 executes iterations 0-1 and 8-9, thread 1 executes 2-3 and 10-11, and so on. Dynamic and guided scheduling assign chunks as threads become available, with guided shrinking the chunk size over time. Runtime scheduling depends on OMP_SCHEDULE, so its output may vary. In every case the printing order can differ between runs.

2.7 Reduction Clauses


reduction(op:var) combines thread results using operators like +, -, *, &, |, ^, &&, ||,
max, min. Example: Summing an array.
#include <stdio.h>
#include <omp.h>

#define NUM_THREADS 4
#define ARRAY_SIZE 8

int main() {
    int arr[ARRAY_SIZE];
    int total = 0;
    for (int i = 0; i < ARRAY_SIZE; i++) {
        arr[i] = i + 1;
    }
    #pragma omp parallel num_threads(NUM_THREADS)
    {
        #pragma omp for reduction(+:total)
        for (int i = 0; i < ARRAY_SIZE; i++) {
            total += arr[i];
        }
    }
    printf("Total = %d\n", total); // 1+2+...+8 = 36
    return 0;
}

Expected Output:

Total = 36
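The other operators follow the same pattern. A minimal sketch using max to find the largest array element; the array contents here are arbitrary illustration, not from the course:

#include <stdio.h>
#include <omp.h>

int main() {
    int arr[8] = {3, 7, 2, 9, 4, 8, 1, 6};
    int biggest = arr[0];
    #pragma omp parallel for reduction(max:biggest) num_threads(4)
    for (int i = 0; i < 8; i++) {
        if (arr[i] > biggest) {
            biggest = arr[i]; // each thread tracks its own maximum
        }
    }
    // The per-thread maxima are combined with max at the end of the loop.
    printf("Max = %d\n", biggest); // 9
    return 0;
}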

2.8 Synchronization
• Critical: Ensures that a block of code is executed by only one thread at a time.
  Every thread eventually executes the block, but never simultaneously.

• Single: Specifies that a block is executed by exactly one thread (whichever arrives
  first); the other threads skip it and wait at the implicit barrier at the end of
  the construct.

• Master: Like single, but the block is executed only by the master thread (thread
  ID 0); the other threads skip it without waiting.

• Barrier: Synchronizes all threads in a parallel region. Each thread waits at the
  barrier until all threads have reached it, then they all proceed.

• Atomic: Makes a single memory update (such as an increment) indivisible, which is
  cheaper than a critical section for simple operations.

Example: Synchronization directives.


#include <stdio.h>
#include <omp.h>

int main() {
    int counter = 0;
    #pragma omp parallel num_threads(4)
    {
        int id = omp_get_thread_num();
        #pragma omp critical
        {
            counter++;
            printf("Thread %d incremented counter to %d\n", id, counter);
        }
        #pragma omp single
        {
            printf("Thread %d in single section\n", id);
        }
        #pragma omp master
        {
            printf("Master thread %d\n", id);
        }
        #pragma omp barrier
        printf("Thread %d passed barrier\n", id);
        #pragma omp atomic
        counter++;
    }
    printf("Final counter: %d\n", counter);
    return 0;
}

Expected Output (sample for 4 threads):

Thread 0 incremented counter to 1
Thread 1 incremented counter to 2
Thread 2 incremented counter to 3
Thread 3 incremented counter to 4
Thread 2 in single section
Master thread 0
Thread 0 passed barrier
Thread 1 passed barrier
Thread 2 passed barrier
Thread 3 passed barrier
Final counter: 8

Note: The order of critical section outputs may vary. The single section is executed by
one thread (ID varies). The final counter is 8 (4 from critical + 4 from atomic).

2.9 Sections
#pragma omp sections assigns independent tasks to threads. Example: Parallelizing
two functions.
#include <stdio.h>
#include <omp.h>

int f1(int b, int c) { return b + c; }
int f2(int d, int e) { return d * e; }

int main() {
    int a = 0, b = 1, c = 2, d = 3, e = 4;
    int a1, a2;
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            a1 = f1(b, c);
            printf("Section 1: a1 = %d\n", a1);
        }
        #pragma omp section
        {
            a2 = f2(d, e);
            printf("Section 2: a2 = %d\n", a2);
        }
    }
    a = a1 + a2;
    printf("a = %d\n", a);
    return 0;
}

Expected Output:

Section 1: a1 = 3
Section 2: a2 = 12
a = 15

Note: The order of section outputs may vary due to thread assignment.

3 MPI
MPI (Message Passing Interface) is a standard for distributed-memory parallel comput-
ing using processes.

3.1 Overview
• Purpose: Parallelize code across multiple nodes/processes.

• Key Features:

– Distributed memory: Each process has its own memory.


– Message passing for communication.
– Scalable to thousands of nodes.

• Advantages: Scalable, works on clusters.

• Limitations: Complex, higher overhead for small tasks.

3.2 Compiling and Running


Compile with mpicc, run with mpirun:

mpicc source.c -o output


mpirun -np <NUM_PROCESSES> ./output

Example: MPI "Hello World".


#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

Expected Output (for 4 processes):

Hello from process 0 of 4


Hello from process 1 of 4
Hello from process 2 of 4
Hello from process 3 of 4

Note: The order of process outputs may vary.

3.3 General Structure
• MPI_Init(&argc, &argv);

• Application code and communication calls (see the skeleton below).

• MPI_Finalize();
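
A minimal sketch of this skeleton; the middle section is only a placeholder comment:

#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);   // start the MPI environment

    // ... application code and communication calls go here ...

    MPI_Finalize();           // shut MPI down before exiting
    return 0;
}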

3.4 Number of Processes and Rank


• MPI_Comm_size(MPI_COMM_WORLD, &size);

• MPI_Comm_rank(MPI_COMM_WORLD, &rank);

3.5 Message Passing


MPI_Send and MPI_Recv for point-to-point communication. Example: Sending a num-
ber.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int number;
    if (rank == 0) {
        number = 42;
        MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("Process 0 sent %d to process 1\n", number);
    } else if (rank == 1) {
        MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received %d from process 0\n", number);
    }
    MPI_Finalize();
    return 0;
}

Expected Output (for 2 processes):

Process 0 sent 42 to process 1


Process 1 received 42 from process 0

Note: The order of outputs may vary.

3.6 Broadcast
MPI_Bcast sends data from a root process to all others. Example: Broadcasting a num-
ber.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int number = 0;
    if (rank == 0) {
        number = 100;
    }
    MPI_Bcast(&number, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Process %d has number %d\n", rank, number);
    MPI_Finalize();
    return 0;
}

Expected Output (for 4 processes):


Process 0 has number 100
Process 1 has number 100
Process 2 has number 100
Process 3 has number 100
Note: The order of process outputs may vary.

3.7 Scatter and Gather


• MPI_Scatter: Distributes data from root to all processes.

• MPI_Gather: Collects data from all processes to root.

Example: Scatter and gather.


#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int sendbuf[4] = {10, 20, 30, 40};
    int recvbuf;
    MPI_Scatter(sendbuf, 1, MPI_INT, &recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Process %d received %d\n", rank, recvbuf);
    int gatherbuf[4];
    recvbuf *= 2;
    MPI_Gather(&recvbuf, 1, MPI_INT, gatherbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        printf("Gathered data: ");
        for (int i = 0; i < size; i++) {
            printf("%d ", gatherbuf[i]);
        }
        printf("\n");
    }
    MPI_Finalize();
    return 0;
}

Expected Output (for 4 processes):

Process 0 received 10
Process 1 received 20
Process 2 received 30
Process 3 received 40
Gathered data: 20 40 60 80

Note: The order of scatter outputs may vary, but the gathered data is deterministic.

3.8 Reduce
MPI_Reduce combines data using an operation. Example: Summing values.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int value = rank + 1;
    int result;
    MPI_Reduce(&value, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        printf("Sum of values: %d\n", result);
    }
    MPI_Finalize();
    return 0;
}

Expected Output (for 4 processes):

Sum of values: 10

Note: The sum is 1 + 2 + 3 + 4 = 10, printed only by rank 0.

3.9 Synchronization
MPI_Barrier(MPI_COMM_WORLD) synchronizes all processes: each process waits at the barrier until every process in the communicator has reached it. A barrier alone does not order output, so the example below produces an ordered "Hello World" by passing a token around a ring with MPI_Send/MPI_Recv.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int size, rank;
    char dummy;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank != 0) {
        int src = (rank + size - 1) % size;
        MPI_Recv(&dummy, 1, MPI_CHAR, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    printf("Process %d is saying Hello World.\n", rank);
    if (rank != size - 1) {
        MPI_Send(&dummy, 1, MPI_CHAR, (rank + 1) % size, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}

Expected Output (for 4 processes):

Process 0 is saying Hello World.


Process 1 is saying Hello World.
Process 2 is saying Hello World.
Process 3 is saying Hello World.

Note: The outputs are ordered due to the token-passing synchronization.
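
For reference, a minimal sketch of MPI_Barrier itself: it only guarantees that no process continues until all processes have reached the barrier; it does not order the printf output.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Process %d: before the barrier\n", rank);
    MPI_Barrier(MPI_COMM_WORLD);   // wait for every process in the communicator
    printf("Process %d: after the barrier\n", rank);
    MPI_Finalize();
    return 0;
}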

3.10 Cluster Setup


Master Node:

1. Update: sudo apt update && sudo apt upgrade

2. Install MPI: sudo apt install build-essential mpi-default-dev

3. SSH: ssh-keygen -t rsa, ssh-copy-id user@slave_ip

4. NFS server: sudo apt install nfs-kernel-server, then export the shared project directory in /etc/exports (see the sketch after these lists).

5. Machines file: List slave IPs.

6. Run: mpirun -n 2 -machinefile machines ./program

Slave Node:

1. Update system.

2. Install MPI and SSH server.

3. NFS client: sudo apt install nfs-common, then mount the master's exported share (see the sketch below).
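
A minimal sketch of the files and commands referenced above; the IP addresses and the /home/mpiuser/cloud path are placeholders for illustration, not values from the course:

# /etc/exports on the master (placeholder path and subnet)
/home/mpiuser/cloud 192.168.1.0/24(rw,sync,no_subtree_check)

# "machines" file on the master: one slave IP or hostname per line
192.168.1.11
192.168.1.12

# On each slave: mount the master's exported share (placeholder master IP)
sudo mount 192.168.1.10:/home/mpiuser/cloud /home/mpiuser/cloud

# From the shared directory on the master
mpirun -n 2 -machinefile machines ./program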

4 OpenMP vs. MPI

Table 1: Comparison of OpenMP and MPI

Feature           OpenMP                                      MPI
Memory Model      Shared memory (threads share memory)        Distributed memory (each process has its own memory)
Parallel Unit     Threads                                     Processes
Ease of Use       Easier (directives, minimal code changes)   Harder (explicit message passing)
Scalability       Limited to a single node                    Scales to clusters
Communication     Implicit (shared variables)                 Explicit (send/receive)
Synchronization   critical, single, master, barrier, atomic   Barrier, send/receive ordering
Use Case          Multi-core CPUs, simple tasks               Clusters, large-scale computing

5 Study Tips for Your Exam


• Understand Concepts: Differentiate shared vs. distributed memory.

• Memorize: Key directives (#pragma omp parallel) and MPI functions (MPI_Send).

• Practice: Compile and modify provided examples.

• Compare: Scheduling types, synchronization, OpenMP vs. MPI.

• Debug: Watch for race conditions (OpenMP) and deadlocks (MPI).

