PDC LAB Experiment 2
Pre-Requisites:
Proficiency in a language supporting OpenMP (e.g., C/C++), understanding of OpenMP
directives, familiarity with array and matrix manipulation, and a foundational grasp of parallel
programming concepts including thread management, synchronization, and parallelization
strategies.
Pre-Lab:
1. What is the primary objective of utilizing OpenMP in parallel summation techniques?
The primary objective is to improve computational efficiency and performance by
distributing the workload of summation across multiple threads (see the sketch after this list).
This involves:
1. Dividing the Workload Across Threads
2. Leveraging Multicore Processors
3. Simplifying Parallel Programming
4. Reducing Computational Bottlenecks: a 'computational bottleneck' is a point in an
algorithm where the computational demand is significantly high, slowing down the
overall process.
5. Maintaining Scalability
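A minimal sketch of this idea, assuming a small illustrative array (not part of the original lab):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int arr[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int sum = 0;
    // Each thread sums a share of the iterations; reduction(+:sum)
    // merges the per-thread partial sums without a race condition.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 8; i++)
        sum += arr[i];
    printf("Sum = %d\n", sum); // Prints: Sum = 36
    return 0;
}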
2. Briefly discuss one advantage of using OpenMP for parallel summation techniques
compared to sequential approaches.
The main advantage is reduced execution time: the summation workload is divided among
multiple threads, so on a multicore processor the parallel version can finish substantially
faster than a sequential loop over the same data. A timing sketch follows.
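A hedged sketch of measuring that advantage, assuming a large throwaway array (the size n is
illustrative; the actual speedup depends on core count and memory bandwidth):

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void) {
    const long n = 50000000;
    double *a = malloc(n * sizeof *a);
    for (long i = 0; i < n; i++) a[i] = 1.0;

    double t0 = omp_get_wtime();
    double seq = 0.0;
    for (long i = 0; i < n; i++) seq += a[i]; // Sequential baseline
    double t1 = omp_get_wtime();

    double par = 0.0;
    #pragma omp parallel for reduction(+:par) // Parallel version
    for (long i = 0; i < n; i++) par += a[i];
    double t2 = omp_get_wtime();

    printf("sequential: %.3fs  parallel: %.3fs\n", t1 - t0, t2 - t1);
    free(a);
    return 0;
}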
3. In the context of matrix-vector multiplication, define the term "data parallelism" and explain
how OpenMP leverages this concept for parallel computation.
Data parallelism refers to the simultaneous execution of the same operation on different pieces of data.
In the context of matrix-vector multiplication, this means distributing the computation of individual
rows of the matrix (each contributing to a single element of the result vector) across multiple threads or
processing units.
How OpenMP Leverages Data Parallelism:
Dividing the Rows of the Matrix Among Threads
Parallelizing the Loop: using the #pragma omp parallel for directive (see the sketch after this list)
Managing Workload Distribution
Reducing Overhead
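A minimal sketch of this pattern (the matrix dimensions and values are illustrative, not from the lab):

#include <stdio.h>
#include <omp.h>
#define ROWS 4
#define COLS 4

int main(void) {
    double A[ROWS][COLS] = {{1, 2, 3, 4}, {5, 6, 7, 8},
                            {9, 10, 11, 12}, {13, 14, 15, 16}};
    double x[COLS] = {1, 1, 1, 1};
    double y[ROWS];
    // Data parallelism: every row performs the same dot-product
    // operation on different data, so rows are divided among threads.
    #pragma omp parallel for
    for (int i = 0; i < ROWS; i++) {
        y[i] = 0.0;
        for (int j = 0; j < COLS; j++)
            y[i] += A[i][j] * x[j];
    }
    for (int i = 0; i < ROWS; i++)
        printf("y[%d] = %.1f\n", i, y[i]);
    return 0;
}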
4. Explain the role of load balancing in the context of parallel matrix-vector multiplication
using OpenMP, and why it is crucial for optimizing performance.
Load balancing refers to the even distribution of computational tasks among the available threads or
processing units in a parallel system.
Why Load Balancing is Crucial for Optimizing Performance
Minimizing Idle Time
Reducing Execution Time
Improving Scalability
Mitigating (i.e., reducing) Overhead; a scheduling sketch follows this list.
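One way to influence load balancing in OpenMP is the schedule clause on a parallel loop. A
hedged sketch (the row count, chunk size, and artificial uneven workload are illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
    const int n = 1000;
    double work[1000];
    // schedule(static) would give each thread one fixed block of rows;
    // schedule(dynamic, 16) hands out 16-row chunks on demand, which
    // balances the load better when rows take unequal time.
    #pragma omp parallel for schedule(dynamic, 16)
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int j = 0; j <= i; j++) // Uneven cost: later rows do more work
            s += j * 0.5;
        work[i] = s;
    }
    printf("work[%d] = %.1f\n", n - 1, work[n - 1]);
    return 0;
}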
5. Name one key OpenMP directive used in the parallelization of computations, and provide a
brief description of its purpose.
One key OpenMP directive is #pragma omp parallel. It creates a team of threads, each of which
executes the enclosed structured block; work-sharing constructs such as for are then layered
on top of it to divide loop iterations among those threads. A short sketch follows.
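A minimal sketch of the directive in use (the number of threads is whatever the runtime provides):

#include <stdio.h>
#include <omp.h>

int main(void) {
    // Every thread in the team created by the directive runs this block once.
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}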
In-Lab:
1. Implement Parallel Summation using OMP - The Array Element Sum Problem, Tree
structure global sum - Parallel-tree-sum.c
o Program:
Aim: Implement Parallel Summation using OMP - The Array Element Sum Problem
Source code:
#include <stdio.h>
#include <omp.h>
int main() {
    int n = 10; // Number of elements in the array
    int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; // Array elements
    int sum = 0; // Shared variable to store the result
    // Parallelize the loop; reduction(+:sum) gives each thread a
    // private copy of sum and combines the copies at the end.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += arr[i];
    printf("Sum = %d\n", sum);
    return 0;
}
Output:
Sum = 55
Program Overview
1. Array Initialization: The array arr has 10 elements: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.
2. Parallelization: The #pragma omp parallel for directive is used to parallelize the for
loop, and the reduction(+:sum) clause ensures that the sum is correctly computed in parallel
without race conditions.
3. Goal: Compute the sum of the elements in the array, i.e., 1 + 2 + 3 + ... + 10 = 55.
Tracing:
------------------------------------------------------------------------------------
Aim: Tree structure global sum
Source code:
#include <stdio.h>
#include <omp.h>
int main() {
    int n = 16; // Number of elements in the array
    int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
    int sum = 0; // Variable to store the global sum
    int num_threads = 4;
    int local_sums[4] = {0}; // Array to hold partial sums for each thread
    #pragma omp parallel num_threads(num_threads)
    {
        int tid = omp_get_thread_num(), chunk = n / num_threads;
        for (int i = tid * chunk; i < (tid + 1) * chunk; i++)
            local_sums[tid] += arr[i]; // Each thread sums its own chunk
    }
    // Tree-structured reduction: combine adjacent partial sums pairwise
    // (0+1 and 2+3 in step 1, then 0+2 in step 2).
    for (int stride = 1; stride < num_threads; stride *= 2)
        for (int t = 0; t + stride < num_threads; t += 2 * stride)
            local_sums[t] += local_sums[t + stride];
    sum = local_sums[0];
    printf("Global sum = %d\n", sum);
    return 0;
}
Output:
Global sum = 136
Explanation
1. Initialization:
a. The array has 16 elements.
b. We set the number of threads to 4, so each thread handles 4 elements.
2. Thread Work:
a. Each thread computes the partial sum of its assigned chunk of the array:
i. Thread 0: arr[0..3]
ii. Thread 1: arr[4..7]
iii. Thread 2: arr[8..11]
iv. Thread 3: arr[12..15]
3. Tree-Structured Reduction:
a. After all threads compute their local sums, the results are combined in a tree-like
fashion:
i. Step 1: Combine sums from adjacent threads (0+1 and 2+3).
ii. Step 2: Combine the results of the previous step (thread 0's total with thread 2's).
b. This reduction process minimizes contention and is efficient for large arrays.
Tracing
-------------------------------------------------------------------------------------
Aim: Parallel-Tree-Sum.c
Source code:
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#define SIZE 16
int main() {
    int array[SIZE];
    int i, sum = 0;
    for (i = 0; i < SIZE; i++) array[i] = i + 1; // Fill with 1..16
    #pragma omp parallel for reduction(+:sum) // Parallel reduction over the array
    for (i = 0; i < SIZE; i++) sum += array[i];
    printf("Sum = %d\n", sum);
    return 0;
}
Output
For SIZE = 16, the array elements are [1, 2, 3, ..., 16]. The output would look like this:
Sum = 136
A prefix-sum variant of the same tree idea (reusing the headers and SIZE = 16 defined above):
int main() {
    int arr[SIZE];
    int prefix_sum[SIZE]; // Array to hold the prefix sum
    int temp[SIZE];
    int i;
    for (i = 0; i < SIZE; i++) { arr[i] = i + 1; prefix_sum[i] = arr[i]; }
    // Tree-structured scan: in each step every element adds the value
    // 'step' positions behind it; the stride doubles until it spans the array.
    for (int step = 1; step < SIZE; step *= 2) {
        #pragma omp parallel for
        for (i = 0; i < SIZE; i++)
            temp[i] = (i >= step) ? prefix_sum[i - step] + prefix_sum[i] : prefix_sum[i];
        #pragma omp parallel for
        for (i = 0; i < SIZE; i++) prefix_sum[i] = temp[i];
    }
    for (i = 0; i < SIZE; i++) printf("%d ", prefix_sum[i]);
    printf("\n");
    return 0;
}
Final Output
1 3 6 10 15 21 28 36 45 55 66 78 91 105 120 136
Explanation of Output
Each prefix_sum[i] holds the sum of arr[0..i], so the values grow cumulatively; the last
element, 136, equals the global sum of all 16 array values.