
PAR – 1st In-Term Exam – Course 2018/19-Q2

April 3rd, 2019

Problem 1 (2.0 points) Given the following parallel OpenMP code, in which vector S is transformed into
vector D in such a way that the positions in S of all elements with the same value of S[i]%256 are stored
in consecutive positions of D. At the end of the main program, vector D will contain the positions of all
elements whose value %256 equals 0, followed by the positions of all elements whose value %256 equals
1, ... up to those whose value %256 equals 255.

#define N 1024*1024*1024
unsigned int S[N], D[N], C[256];

void find_groups(unsigned int *S, unsigned int *C) {
unsigned int i, value, TMP[256];

#pragma omp parallel
#pragma omp single
{
#pragma omp taskloop grainsize(4)
for (i=0; i<256; i++) TMP[i]=0;

#pragma omp taskloop private(value) num_tasks(1024)
for (i=0; i<N; i++) {
value = S[i]%256;
#pragma omp critical
TMP[value]++;
}
}

C[0] = 0;
for (int i=1; i<256; i++) C[i] = C[i-1] + TMP[i-1];
}

void transform_vector(unsigned int *S, unsigned int *D, unsigned int *C) {
unsigned int i, value;

#pragma omp parallel
#pragma omp single
#pragma omp taskloop private(value) num_tasks(1024)
for (i=0; i<N; i++) {
value = S[i]%256;
#pragma omp critical
{
D[C[value]] = i;
C[value]++;
}
}
}

void main() {
find_groups(S, C);
transform_vector(S, D, C);
}
To do the transformation, function find_groups builds a vector TMP such that element TMP[value]
indicates the number of elements in S for which S[i]%256 == value; based on this vector TMP, function
find_groups builds and returns vector C such that element C[value] indicates the initial position in D
where to store the information for all elements whose S[i]%256 == value.
We ask you:

1. (0.5 points) Rewrite the code changing the synchronisation construct that is used in function
find_groups in order to reduce the overhead that is incurred in the parallel update of vector TMP.
Solution: To protect the update of vector TMP in function find_groups an atomic operation suffices,
which allows more concurrency and reduces the overhead of the data sharing.

...
#pragma omp taskloop private(value) num_tasks(1024)
for (i=0; i<N; i++) {
value = S[i]%256;
#pragma omp atomic
TMP[value]++;
}
...

2. (1.0 point) Rewrite the code changing the synchronisation construct that is used in function
transform_vector in order to maximise the possible parallelism in the update of the elements
of D and C.
Solution: To protect the updates of the two vectors in function transform_vector we use a vector of
OpenMP locks, one per group, so that the update of each region of vector D and its associated counter
in C can proceed independently of the others. The locks (declared in <omp.h>) have to be initialised
and destroyed at some point.

void transform_vector(unsigned int *S, unsigned int *D, unsigned int *C) {
unsigned int i, value;
omp_lock_t lock_vector[256];

#pragma omp parallel
#pragma omp single
{
#pragma omp taskloop grainsize(4)
for(i=0; i<256; i++) omp_init_lock(&lock_vector[i]);

#pragma omp taskloop private(value) num_tasks(1024)
for (i=0; i<N; i++) {
value = S[i]%256;
omp_set_lock(&lock_vector[value]);
D[C[value]] = i;
C[value]++;
omp_unset_lock(&lock_vector[value]);
}

#pragma omp taskloop grainsize(4)
for(i=0; i<256; i++) omp_destroy_lock(&lock_vector[i]);
}
}

3. (0.5 points) Assuming the following new version for function transform_vector:

void transform_vector(unsigned int *S, unsigned int *D, unsigned int *C) {
unsigned int i, value;
#pragma omp parallel
#pragma omp single
#pragma omp taskloop private(i) grainsize(4)
for (value=0; value<256; value++)
for (i=0; i<N; i++)
if (S[i]%256 == value) {
D[C[value]] = i;
C[value]++;
}
}

Insert the necessary synchronisation constructs that guarantee the correct update of the elements of
vectors D and C.
Solution: Since each task is assigned different values of value, no two tasks update the same elements
of D and C, so there is no need to synchronise the accesses.

Problem 2 (2.0 points) Given the following task dependence graphs for three different parallelization strate-
gies of a sequential code:
[Figure: task dependence graphs. Strategy A: a single chain 0→1→2→…→8. Strategy B: three independent
chains 0→1→2, 3→4→5 and 6→7→8. Strategy C: task 0 precedes tasks 1–5, which in turn precede tasks 6–8.]

Answer the following questions:

1. (1.0 point) Compute the Parallelism and Pmin metrics for each one of the three dependence graphs
assuming that the cost of executing each task is tc time units.
Solution: For all strategies T1 = 9 × tc . Then for each strategy we have:
(a) Strategy A: T∞ = 9 × tc so Parallelism= T1 ÷ T∞ = 1; the minimum number of processors to
achieve this parallelism is Pmin = 1.
(b) Strategy B: T∞ = 3 × tc , Parallelism= 3 and Pmin = 3.
(c) Strategy C: T∞ = 3 × tc , Parallelism= 3 and Pmin = 4.
2. (1.0 point) Assuming a multiprocessor with P = 3 processors and the following mapping of tasks to
processors for each strategy:
• Strategy A and B: P 0 ← {0, 3, 6}; P 1 ← {1, 4, 7}; P 2 ← {2, 5, 8}.
• Strategy C: P 0 ← {0, 1, 2}, P 1 ← {3, 4, 6}; P 2 ← {5, 7, 8}.
Obtain the general expression for the speed-up SP=3 for each strategy and associated mapping, assuming
that there is an overhead related to task synchronisation of tsynch time units, i.e. the overhead
that a task has to pay to signal ALL its successor tasks that it has finished.
Solution: For each parallel strategy we have:

(a) Strategy A: T3 = 9 × tc + 8 × tsynch; therefore S3 = (9 × tc) ÷ (9 × tc + 8 × tsynch).
(b) Strategy B: T3 = 3 × tc + 2 × tsynch; therefore S3 = (9 × tc) ÷ (3 × tc + 2 × tsynch).
(c) This figure shows the timeline with the parallel execution of the tasks and the necessary
synchronisation overheads between them:
[Figure: Strategy C timeline. P0 runs tasks 0, 1, 2; P1 runs tasks 3, 4, 6; P2 runs tasks 5, 7, 8.]
Therefore, for Strategy C: T3 = 4 × tc + 3 × tsynch; and therefore S3 = (9 × tc) ÷ (4 × tc + 3 × tsynch).

Problem 3 (3.0 points) Given the following sequential code:

#define N 1024*1024*1024
int S[N][N], D[N][N];

// Operation using a, b, c, and d. It doesn't access/modify any other shared memory
int PROCESS(int a, int b, int c, int d);

void A2B_process(int A[N][N], int B[N][N]) {
for(unsigned int i=1; i<N-1; i++)
for(unsigned int j=1; j<N-1; j++)
A[i][j] += PROCESS(B[i-1][j],B[i][j-1],B[i-1][j-1],B[i+1][j+1]);
}

void A2A_process(int A[N][N]) {
for(unsigned int i=1; i<N-1; i++)
for(unsigned int j=1; j<N-1; j++)
A[i][j] += PROCESS(A[i-1][j],A[i][j-1],A[i-1][j-1],A[i+1][j+1]);
}

void main() {
...
A2B_process(S,D);
...
A2A_process(D);
...
}

We ask you:

1. (1.0 point) Assume that PROCESS is an operation whose cost depends on the input data it receives,
that is, its execution can significantly vary depending on the value of the input arguments. Write an
OpenMP parallelisation for A2B_process following an Iterative Task Decomposition strategy that
tries to maximise the load balance we can achieve. Explain the reasons for the directives in your parallel code.
Solution:
There are no dependences. Two possible solutions are proposed:

(a) First solution: use omp parallel for with implicit tasks, adding collapse(2) and
schedule(dynamic) (which defaults to (dynamic,1)) so that each thread dynamically grabs a
single PROCESS computation at a time. With this fine-grain scheduling we try to maximise
the load balance of the work among threads.
(b) Second solution: use explicit tasks so that each task is one invocation of the PROCESS
computation. This way any idle thread executes one task at a time, again trying to maximise
the load balance among threads.
#define N 1024*1024*1024
unsigned int S[N][N], D[N][N];

// Operation using a, b, c, and d. It doesn't operate with any other shared memory
unsigned int PROCESS(unsigned int a, unsigned int b, unsigned int c, unsigned int d);

// First Solution

void A2B_process(unsigned int A[N][N], unsigned int B[N][N])
{
unsigned int i,j;

#pragma omp parallel for collapse(2) schedule(dynamic)
for(i=1; i<N-1; i++)
for(j=1; j<N-1; j++)
A[i][j]+= PROCESS(B[i-1][j],B[i][j-1],B[i-1][j-1],B[i+1][j+1]);
}

// Second Alternative Solution:

void A2B_process(unsigned int A[N][N], unsigned int B[N][N])
{
unsigned int i,j;

#pragma omp parallel
#pragma omp single
for(i=1; i<N-1; i++)
for(j=1; j<N-1; j++)
#pragma omp task firstprivate(i,j)
A[i][j]+= PROCESS(B[i-1][j],B[i][j-1],B[i-1][j-1],B[i+1][j+1]);
}

2. (1.0 point) Assume that PROCESS is a time consuming operation (coarse grain) whose cost is always the
same. Write an OpenMP parallelisation for A2A_process using Implicit Tasks following an Iterative
Task Decomposition strategy.
Solution:
There are dependences. We use a doacross loop and specify the dependences between iterations. As load
balance is not a problem in this exercise, we keep the default static schedule for the iterations of loop i.

#define N 1024*1024*1024
unsigned int S[N][N], D[N][N];

void A2A_process(unsigned int A[N][N])
{
unsigned int i,j;
#pragma omp parallel for ordered(2) private(j)
for(i=1; i<N-1; i++)
for(j=1; j<N-1; j++)
{
#pragma omp ordered depend(sink:i-1,j) depend(sink: i,j-1)
A[i][j]+= PROCESS(A[i-1][j],A[i][j-1],A[i-1][j-1],A[i+1][j+1]);
#pragma omp ordered depend(source)
}

}
3. (1.0 point) Assume that PROCESS is a time consuming operation (coarse grain) whose cost is always the
same. Write an OpenMP parallelisation for A2A_process using Explicit Tasks following an Iterative
Task Decomposition strategy.
Solution:
There are dependences. We use explicit tasks with in and out data dependences. Although only the left
(A[i][j-1]) and up (A[i-1][j]) true data dependences would need to be declared, we declare all of them.
Load balance is not a problem here since tasks are dynamically scheduled, and task creation overhead is
not a problem either since PROCESS (one task's computation) is a very time consuming operation.

#define N 1024*1024*1024
unsigned int S[N][N], D[N][N];

void A2A_process(unsigned int A[N][N])
{
#pragma omp parallel
#pragma omp single
{
unsigned int i,j;
for(i=1; i<N-1; i++)
for(j=1; j<N-1; j++)
#pragma omp task depend(in:A[i-1][j],A[i][j-1],A[i-1][j-1],A[i+1][j+1]) \
depend(out:A[i][j])
A[i][j]+= PROCESS(A[i-1][j],A[i][j-1],A[i-1][j-1],A[i+1][j+1]);

}
}

Problem 4 (3.0 points) Given the following C code:

#define N 1000000
#define MINSIZE 4
#define MAXGRAINSIZE MAXROW

typedef struct {
int size; // size is always smaller than or equal to MAXROW
float *data;
} tRow;

// Function partition is already implemented:
// 1) it finds the index such that total number of
// elements is well-balanced between both partitions
// (index belongs to right partition), and
// 2) it returns the total number of elements in each partition
void partition (tRow *rows, int nrows, int *index, int *nelem_left, int *nelem_right);

void process_rows (tRow *rows, int nrows) {
for (int r=0; r<nrows; r++)
for (int i=0; i<rows[r].size; i++)
foo (&rows[r].data[i]); // only modifies the parameter
}

void process_rows_rec (tRow *rows, int nrows) {
int index, nelem_left, nelem_right;

if (nrows < MINSIZE)
process_rows (rows, nrows);
else {
partition (rows, nrows, &index, &nelem_left, &nelem_right);
process_rows_rec (rows, index);
process_rows_rec (&rows[index], nrows-index);
}
return;
}

void main () {
tRow rows[N];
// initialization of rows
// each row can have different size
...
process_rows_rec (rows, N);
}

We ask you:

1. (1.0 point) Write a parallel version in OpenMP implementing a Recursive Task Decomposition following
a Tree strategy.
Solution:

#define N 1000000
#define MINSIZE 4
#define MAXGRAINSIZE MAXROW

typedef struct {
int size; // size is always smaller than or equal to MAXROW
float *data;
} tRow;

// finds the index such that total number of elements
// is well-balanced between both partitions and
// returns also the number of elements on each partition
// (index belongs to right partition)
void partition (tRow *rows, int nrows, int *index, int *nelem_left, int *nelem_right);

void process_rows (tRow *rows, int nrows) {
for (int r=0; (r<nrows); r++)
for (int i=0; (i<rows[r].size); i++)
foo (&rows[r].data[i]); // only modifies the parameter
}

void process_rows_rec (tRow *rows, int nrows) {
int index, nelem_left, nelem_right;

if (nrows < MINSIZE)
process_rows (rows, nrows);
else {
partition (rows, nrows, &index, &nelem_left, &nelem_right);
#pragma omp task
process_rows_rec (rows, index);
#pragma omp task
process_rows_rec (&rows[index], nrows-index);
}
}

void main () {
tRow rows[N];
// initialization of rows
// each row can have different size
...
#pragma omp parallel
#pragma omp single
process_rows_rec (rows, N);
}

2. (2.0 points) Modify the previous parallel code to control task generation, not allowing task creation
when granularity (i.e. total number of elements to be processed) is smaller than MAXGRAINSIZE. You
should not use the OpenMP mergeable clause.
Solution:
In order to end up with final tasks whose granularity is less than or equal to MAXGRAINSIZE, the cutoff
has to be controlled with the total number of elements of the partition to be processed, not with its
number of rows.

#define N 1000000
#define MINSIZE 4
#define MAXGRAINSIZE MAXROW

typedef struct {
int size; // size value is always less than or equal to MAXROW
float *data;
} tRow;

// finds the index such that total number of elements
// is well-balanced between both partitions and
// returns also the number of elements on each sub-partition
// (index belongs to right partition)
void partition (tRow *rows, int nrows, int *index, int *nelem_left, int *nelem_right);

void process_rows (tRow *rows, int nrows) {
for (int r=0; (r<nrows); r++)
for (int i=0; (i<rows[r].size); i++)
foo (&rows[r].data[i]); // only modifies the parameter
}

void process_rows_rec (tRow *rows, int nrows) {
int index, nelem_left, nelem_right;

if (nrows < MINSIZE)
process_rows (rows, nrows);
else {
partition (rows, nrows, &index, &nelem_left, &nelem_right);
if (!omp_in_final()) {
#pragma omp task final (nelem_left <= MAXGRAINSIZE)
process_rows_rec (rows, index);
#pragma omp task final (nelem_right <= MAXGRAINSIZE)
process_rows_rec (&rows[index], nrows-index);
}
else {
process_rows_rec (rows, index);
process_rows_rec (&rows[index], nrows-index);
}
}
return;
}
void main () {
tRow rows[N];
// initialization of rows
// each row can have different size
...
#pragma omp parallel
#pragma omp single
process_rows_rec (rows, N);
}
