
Experiment 2

Title: Parallel Computation with OpenMP: Summation Techniques and Matrix-Vector Multiplication
Aim/Objective:
Illustrate parallel computation techniques using OpenMP for two fundamental algorithms—
summation and matrix-vector multiplication—aiming to enhance performance through
concurrent processing.
Description:
Implement OpenMP-based programs for parallel summation and matrix-vector multiplication,
exploring various OpenMP directives and strategies to optimize computation speed and
scalability.

Pre-Requisites:
Proficiency in a language supporting OpenMP (e.g., C/C++), understanding of OpenMP
directives, familiarity with array and matrix manipulation, and a foundational grasp of parallel
programming concepts including thread management, synchronization, and parallelization
strategies.
Pre-Lab:
1. What is the primary objective of utilizing OpenMP in parallel summation techniques?
The primary objective is to improve computational efficiency and performance by distributing the workload of the summation across multiple threads. In practice this means:
1. Dividing the workload across threads
2. Leveraging multicore processors
3. Simplifying parallel programming
4. Reducing computational bottlenecks: a "computational bottleneck" is a point in an algorithm where the computational demand is high enough to slow down the overall process
5. Maintaining scalability

2. Briefly discuss one advantage of using OpenMP for parallel summation techniques
compared to sequential approaches.
Reduced execution time: splitting the array across threads lets several cores compute partial sums simultaneously, so wall-clock time drops roughly in proportion to the number of cores (up to memory-bandwidth limits).
3. In the context of matrix-vector multiplication, define the term "data parallelism" and explain
how OpenMP leverages this concept for parallel computation.
Data parallelism refers to the simultaneous execution of the same operation on different pieces of data. In matrix-vector multiplication, this means distributing the computation of individual rows of the matrix (each row contributing a single element of the result vector) across multiple threads or processing units.
How OpenMP leverages data parallelism (see the sketch after this list):
Dividing the rows of the matrix among threads
Parallelizing the loop using #pragma omp parallel for
Managing workload distribution
Reducing overhead
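The In-Lab programs below cover summation only, so the following is a minimal sketch of row-wise data parallelism in matrix-vector multiplication; the matrix A, vector x, and their sizes are illustrative assumptions, not part of the original experiment.

#include <stdio.h>
#include <omp.h>

#define ROWS 4
#define COLS 4

int main() {
    // Illustrative data: A is ROWS x COLS, x has COLS entries
    int A[ROWS][COLS] = {{ 1,  2,  3,  4},
                         { 5,  6,  7,  8},
                         { 9, 10, 11, 12},
                         {13, 14, 15, 16}};
    int x[COLS] = {1, 1, 1, 1};
    int y[ROWS]; // Result vector

    // Data parallelism: each thread computes whole rows of y,
    // so no two threads ever write the same output element
    #pragma omp parallel for
    for (int i = 0; i < ROWS; i++) {
        y[i] = 0;
        for (int j = 0; j < COLS; j++) {
            y[i] += A[i][j] * x[j];
        }
    }

    for (int i = 0; i < ROWS; i++) {
        printf("y[%d] = %d\n", i, y[i]);
    }
    return 0;
}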

4. Explain the role of load balancing in the context of parallel matrix-vector multiplication
using OpenMP, and why it is crucial for optimizing performance.
Load balancing refers to the even distribution of computational tasks among the available threads or processing units in a parallel system; if one thread receives far more work than the others, the entire computation waits on that straggler.
Why load balancing is crucial for optimizing performance (see the scheduling sketch below):
Minimizing idle time
Reducing execution time
Improving scalability
Mitigating (reducing) overhead
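As a sketch only (reusing the A, x, y, ROWS, and COLS names assumed in the matrix-vector example above, with an illustrative chunk size), OpenMP's schedule clause is one way to tune load balance. schedule(static) splits iterations into equal blocks up front, while schedule(dynamic) lets threads grab chunks as they finish, which helps when rows have uneven cost (e.g., a sparse matrix):

// Same row-wise loop, but threads claim 2 rows at a time as they
// become free instead of receiving one fixed block up front
#pragma omp parallel for schedule(dynamic, 2)
for (int i = 0; i < ROWS; i++) {
    y[i] = 0;
    for (int j = 0; j < COLS; j++) {
        y[i] += A[i][j] * x[j];
    }
}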

5. Name one key OpenMP directive used in the parallelization of computations, and provide a
brief description of its purpose.
One key OpenMP directive is #pragma omp parallel. It creates a team of threads, each of which executes the enclosed structured block; work-sharing directives such as for are then used inside it to divide work among those threads.
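A minimal sketch of the directive in isolation (how many threads appear depends on the machine and the OMP_NUM_THREADS environment variable):

#include <stdio.h>
#include <omp.h>

int main() {
    // Every thread in the team executes this block once
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}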

In-Lab:
1. Implement Parallel Summation using OMP - The Array Element Sum Problem, Tree
structure global sum - Parallel-tree-sum.c
o Program:
Aim: Implement Parallel Summation using OMP - The Array Element Sum Problem

Source code:
#include <stdio.h>
#include <omp.h>

int main() {
    int n = 10;                                  // Number of elements in the array
    int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; // Array elements
    int sum = 0;                                 // Shared variable to store the result

    // Parallel region with reduction to compute the sum
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += arr[i]; // Add each element to the shared sum
    }

    printf("The sum of the array elements is: %d\n", sum);

    return 0;
}

Output:
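The sum of the array elements is: 55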

Program Overview
1. Array initialization: the array arr has 10 elements: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.
2. Parallelization: the #pragma omp parallel for directive parallelizes the for loop, and the reduction(+:sum) clause ensures the sum is computed correctly in parallel without race conditions (each thread accumulates into a private copy that is combined at the end).
3. Goal: compute the sum of the elements in the array, i.e., 1 + 2 + 3 + ... + 10 = 55.
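To compile and run the programs in this experiment (assuming gcc with OpenMP support; the file name is illustrative):

gcc -fopenmp parallel_sum.c -o parallel_sum
./parallel_sum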
Tracing:
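Assuming 4 threads and a static split of the 10 iterations (e.g., indices 0-2, 3-5, 6-7, 8-9), the private partial sums are 6, 15, 15, and 19, and the reduction combines them: 6 + 15 + 15 + 19 = 55. The exact split depends on the implementation, but the reduction guarantees the same final sum.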
------------------------------------------------------------------------------------
Aim: Tree structure global sum
Source code:
#include <stdio.h>
#include <omp.h>

int main() {
    int n = 16; // Number of elements in the array
    int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
    int sum = 0; // Variable to store the global sum

    omp_set_num_threads(4); // Set the number of threads to 4

    int num_threads;
    int local_sums[4] = {0}; // Array to hold partial sums for each thread

    // Parallel region with thread-local partial sums
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();              // Thread ID
        int chunk_size = n / omp_get_num_threads();  // Elements per thread
        int start = tid * chunk_size;                // Start index for this thread
        int end = start + chunk_size;                // End index for this thread

        // Compute the partial sum for this thread
        for (int i = start; i < end; i++) {
            local_sums[tid] += arr[i];
        }

        // Synchronize threads before the final reduction
        #pragma omp barrier

        // Combine partial sums in a tree-like fashion
        #pragma omp single
        {
            num_threads = omp_get_num_threads();
            for (int step = 1; step < num_threads; step *= 2) {
                for (int i = 0; i + step < num_threads; i += 2 * step) {
                    local_sums[i] += local_sums[i + step];
                }
            }
        }
    }

    // The final sum is stored in local_sums[0]
    sum = local_sums[0];
    printf("The sum of the array elements is: %d\n", sum);

    return 0;
}

Output:
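The sum of the array elements is: 136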

Explanation
1. Initialization:
   a. The array has 16 elements.
   b. The number of threads is set to 4, so each thread handles 4 elements.
2. Thread Work:
   a. Each thread computes the partial sum of its assigned chunk of the array:
      i. Thread 0: arr[0..3]
      ii. Thread 1: arr[4..7]
      iii. Thread 2: arr[8..11]
      iv. Thread 3: arr[12..15]
3. Tree-Structured Reduction:
   a. After all threads compute their local sums, the results are combined in a tree-like fashion:
      i. Step 1: combine sums from adjacent threads (0+1, 2+3).
      ii. Step 2: combine the results from the previous step (0+2).
   b. This reduction process minimizes contention and is efficient for large arrays.

Tracing
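Partial sums: thread 0 → 1+2+3+4 = 10, thread 1 → 5+6+7+8 = 26, thread 2 → 9+10+11+12 = 42, thread 3 → 13+14+15+16 = 58. Step 1: local_sums[0] = 10 + 26 = 36 and local_sums[2] = 42 + 58 = 100. Step 2: local_sums[0] = 36 + 100 = 136.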
-------------------------------------------------------------------------------------
Aim: Parallel-Tree-Sum.c
Source code:
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define SIZE 16 // Size of the array (power of 2 for simplicity)

int main() {
    int array[SIZE];
    int i;

    // Initialize the array with values 1 to SIZE
    for (i = 0; i < SIZE; i++) {
        array[i] = i + 1;
    }

    printf("Array elements: ");
    for (i = 0; i < SIZE; i++) {
        printf("%d ", array[i]);
    }
    printf("\n");

    int sum = 0; // Final global sum

    // Perform the tree-based parallel sum. The step counter is kept
    // private to each thread (a shared counter updated by every thread
    // would race); since all threads step through the same sequence
    // 1, 2, 4, ..., they all reach each omp for construct together.
    #pragma omp parallel
    {
        for (int step = 1; step < SIZE; step *= 2) {
            #pragma omp for
            for (int j = 0; j < SIZE; j += 2 * step) {
                array[j] += array[j + step]; // Combine one pair at this level
            }
            // Implicit barrier at the end of the omp for: every pair at
            // this level is done before any thread starts the next level
        }
    }

    // The total sum is stored in the first element
    sum = array[0];

    printf("Total Sum: %d\n", sum);

    return 0;
}

Output
For SIZE = 16, the array elements are [1, 2, 3, ..., 16]. The output would look like this:
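Array elements: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Total Sum: 136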

Explanation of the Code
The program uses a binary tree structure to progressively reduce pairs of elements in place: at each level, element i absorbs element i + step, so after log2(SIZE) levels the sum of the entire array is left in array[0]. Keeping step private to each thread avoids a data race on the level counter, and the implicit barrier at the end of each omp for keeps all threads on the same level of the tree.
Tracing
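Level by level (showing only the surviving elements at even strides): step = 1 → {3, 7, 11, 15, 19, 23, 27, 31}; step = 2 → {10, 26, 42, 58}; step = 4 → {36, 100}; step = 8 → array[0] = 136.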
2. Implement a program to demonstrate Parallel prefix sum using OpenMP - OMP Parallel
prefix sumfinal.c
The prefix sum (or scan) is a common operation in parallel computing, where each element in an array
is replaced by the sum of all the previous elements in the array, including the element itself.
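For example, the inclusive prefix sum of [1, 2, 3, 4] is [1, 3, 6, 10]: each output element is the sum of the input values up to and including that position.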
• Program:
#include <stdio.h>
#include <omp.h>

#define SIZE 16 // Size of the array (power of 2 for simplicity)

int main() {
    int arr[SIZE];
    int prefix_sum[SIZE]; // Array to hold the prefix sum
    int temp[SIZE];       // Scratch buffer for the double-buffered scan
    int i;

    // Initialize the array with values 1 to SIZE
    for (i = 0; i < SIZE; i++) {
        arr[i] = i + 1;
    }

    // Print the original array
    printf("Original array: ");
    for (i = 0; i < SIZE; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");

    // Prefix sum computation using a parallel (Hillis-Steele) scan.
    // Updating prefix_sum in place would race: element i reads
    // prefix_sum[i - step] while another thread may be writing it in
    // the same step. Each step therefore writes into a scratch buffer
    // and copies back before the next step.
    #pragma omp parallel
    {
        int step;

        // Copy the input array to the prefix sum array
        #pragma omp for
        for (i = 0; i < SIZE; i++) {
            prefix_sum[i] = arr[i];
        }

        // Perform the parallel prefix sum in a tree-like fashion
        for (step = 1; step < SIZE; step *= 2) {
            #pragma omp for
            for (i = step; i < SIZE; i++) {
                temp[i] = prefix_sum[i] + prefix_sum[i - step];
            }
            #pragma omp for
            for (i = step; i < SIZE; i++) {
                prefix_sum[i] = temp[i]; // Copy back before the next step
            }
            // Implicit barriers after each omp for keep the steps in order
        }
    }

    // Print the prefix sum array
    printf("Prefix sum: ");
    for (i = 0; i < SIZE; i++) {
        printf("%d ", prefix_sum[i]);
    }
    printf("\n");

    return 0;
}
Final Output
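Original array: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Prefix sum: 1 3 6 10 15 21 28 36 45 55 66 78 91 105 120 136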

Explanation of Output
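Each position i of the prefix sum holds arr[0] + ... + arr[i], so for this input the values are the triangular numbers k(k+1)/2 (taking k as the 1-based position); the final element, 136, equals the total sum of the array.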
