Lab Report 6
Faculty Member: Dr. Usman Zabit / Jafar Hussain
Dated: 23rd October 2019
with CUDA
1.2 Deliverables
You are required to submit the following at the beginning of the next lab:
• Code
• Observations and experiences
Lab Tasks
Task B: GPUs & their Properties
Compile and run prop.cu, then observe the number of GPUs in your system and their
specifications.
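The contents of prop.cu are not reproduced in this report. As a rough sketch of what such a program might do, the following queries the CUDA runtime for the device count and prints a few fields of cudaDeviceProp for each device. The choice of fields is an assumption; the lab's own file may print a different set.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Number of GPUs: %d\n", count);

    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;  // skip devices whose properties cannot be read
        }
        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute capability:    %d.%d\n", prop.major, prop.minor);
        printf("  Global memory (bytes): %zu\n", prop.totalGlobalMem);
        printf("  Multiprocessors:       %d\n", prop.multiProcessorCount);
        printf("  Max threads per block: %d\n", prop.maxThreadsPerBlock);
        printf("  Max block dimensions:  %d x %d x %d\n",
               prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
        printf("  Max grid dimensions:   %d x %d x %d\n",
               prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    }
    return 0;
}
```

The fields maxThreadsPerBlock, maxThreadsDim and maxGridSize are the ones that matter for the launch configurations used in the later tasks.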
Output:
Task C: Parallelizing a Vector Computation
Output:
Task C-II: Thread-level Parallelism
Change blockIdx.x to threadIdx.x in line 9 of the code in snippet 3.3.2, and replace
<<<N,1>>> with <<<1,N>>> in line 38. Compile and execute the code.
Output:
Given the maximum thread dimensions reported in the GPU properties, do you get the
result you expected? If not, what could explain it?
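The maximum number of threads per block is 512, whereas N is 10000. We are not shown any
error or warning. This does not mean that the block works through the threads 512 at a time:
a block is never split into serial batches. A launch that requests more threads per block than
the device allows is rejected with an invalid-configuration error, and the kernel does not run
at all. The failure stays silent unless the launch status is checked explicitly, for example
with cudaGetLastError() immediately after the launch.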
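The behaviour of an oversized block can be checked directly. The sketch below launches a thread-indexed kernel once with N = 10000 threads in a single block and once with exactly maxThreadsPerBlock threads, and prints the status that cudaGetLastError() reports in each case. The kernel name add, its body and the helper launchOneBlock are assumptions made for illustration; they are not taken from snippet 3.3.2.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 10000

// Thread-indexed vector add in the spirit of Task C-II (name and body assumed).
__global__ void add(const int *a, const int *b, int *c)
{
    int tid = threadIdx.x;
    if (tid < N) {
        c[tid] = a[tid] + b[tid];
    }
}

// Hypothetical helper: launch one block of `threads` threads and
// return the status of the launch itself.
cudaError_t launchOneBlock(int threads, const int *a, const int *b, int *c)
{
    add<<<1, threads>>>(a, b, c);
    return cudaGetLastError();  // non-success if the configuration was rejected
}

int main(void)
{
    int *a = NULL, *b = NULL, *c = NULL;
    cudaDeviceProp prop;

    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("No usable CUDA device\n");
        return 1;
    }
    cudaMalloc((void **)&a, N * sizeof(int));
    cudaMalloc((void **)&b, N * sizeof(int));
    cudaMalloc((void **)&c, N * sizeof(int));
    cudaMemset(a, 0, N * sizeof(int));
    cudaMemset(b, 0, N * sizeof(int));

    // One block of N threads: exceeds the per-block limit, so the launch is rejected.
    cudaError_t err = launchOneBlock(N, a, b, c);
    printf("<<<1,%d>>> (limit %d): %s\n",
           N, prop.maxThreadsPerBlock, cudaGetErrorString(err));

    // One block at exactly the limit: the launch is accepted.
    err = launchOneBlock(prop.maxThreadsPerBlock, a, b, c);
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();  // wait for the kernel and collect run-time errors
    }
    printf("<<<1,%d>>>: %s\n", prop.maxThreadsPerBlock, cudaGetErrorString(err));

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return 0;
}
```

Without the cudaGetLastError() call the first launch looks as if it succeeded, because a kernel launch has no return value of its own.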
Recommend the maximum thread and block dimensions for optimum parallel processing
in GPUs.
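One common way to cover all N elements, offered here as a sketch and not necessarily the answer the lab intends, is to keep each block within the per-block limit and launch enough blocks to reach N. The block size of 256 is an assumed choice (a multiple of the 32-thread warp size that stays below the 512 limit observed above); the helper blocksFor and the input values are hypothetical.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 10000

// Hypothetical helper: blocks needed so that blocks * threadsPerBlock >= n
// (ceiling division).
int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// One element per thread; the global index combines the block and thread indices.
__global__ void add(const int *a, const int *b, int *c, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {  // the last block may contain spare threads
        c[tid] = a[tid] + b[tid];
    }
}

int main(void)
{
    static int a[N], b[N], c[N];
    int *dA = NULL, *dB = NULL, *dC = NULL;
    const int threads = 256;                   // assumed block size
    const int blocks = blocksFor(N, threads);  // 40 blocks for N = 10000

    for (int i = 0; i < N; i++) {              // illustrative input values
        a[i] = i;
        b[i] = 2 * i;
    }

    cudaMalloc((void **)&dA, sizeof(a));
    cudaMalloc((void **)&dB, sizeof(b));
    cudaMalloc((void **)&dC, sizeof(c));
    cudaMemcpy(dA, a, sizeof(a), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b, sizeof(b), cudaMemcpyHostToDevice);

    add<<<blocks, threads>>>(dA, dB, dC, N);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("Launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(c, dC, sizeof(c), cudaMemcpyDeviceToHost);

    int mismatches = 0;
    for (int i = 0; i < N; i++) {
        if (c[i] != a[i] + b[i]) {
            mismatches++;
        }
    }
    printf("%d blocks x %d threads, %d mismatches\n", blocks, threads, mismatches);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return mismatches != 0;
}
```

The upper bounds for both dimensions should be read from the device properties (maxThreadsPerBlock, maxThreadsDim and maxGridSize) instead of being hard-coded, since they differ between GPU generations.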
Output:
MATLAB code:
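A = zeros(64,64);
B = zeros(64,64);
% MATLAB indices start at 1, so (i-1) and (j-1) are the zero-based C indices.
for i = 1:64
    for j = 1:64
        A(i,j) = (i-1)+(j-1);
        B(i,j) = (i-1)-(j-1);
    end
end
C = A*B;
x = diag(C);   % diagonal of the product
x(1:32)        % first 32 diagonal entries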
MATLAB Output:
The MATLAB output matches the output of the GPU code, which verifies the GPU result.
Task D-II:
Evolve the code snippet in section 3.4.1 for rectangular matrix multiplication.
Our Code:
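#include <stdio.h>
#define R1 16
#define C1 25
#define R2 25
#define C2 16
__global__ void matMul(const int *a, const int *b, int *c)  // assumed kernel: one thread per output element
{
    int row = threadIdx.y, col = threadIdx.x, sum = 0;
    for (int k = 0; k < C1; k++) sum += a[row*C1 + k] * b[k*C2 + col];
    c[row*C2 + col] = sum;
}
int main()
{
    int matA[R1*C1] = { 0 };
    int matB[R2*C2] = { 0 };
    int matProd[R1*C2] = { 0 };
    int *dA, *dB, *dC;                        // device buffers (names assumed)
    for (int i = 0; i < R1; i++) {
        for (int j = 0; j < C1; j++) {
            matA[i*C1 + j] = i+j;
        }
    }
    for (int i = 0; i < R2; i++)
        for (int j = 0; j < C2; j++)
            matB[i*C2 + j] = i-j;
    if (C1 != R2) {                           // assumed check: inner dimensions must agree
        printf("Error\n");
        return -1;
    }
    cudaMalloc((void **)&dA, sizeof(matA)); cudaMalloc((void **)&dB, sizeof(matB)); cudaMalloc((void **)&dC, sizeof(matProd));
    cudaMemcpy(dA, matA, sizeof(matA), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, matB, sizeof(matB), cudaMemcpyHostToDevice);
    matMul<<<1, dim3(C2, R1)>>>(dA, dB, dC);  // 16 x 16 = 256 threads, within the 512 limit
    cudaMemcpy(matProd, dC, sizeof(matProd), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 8; i++) printf("%d\n", matProd[i*C2 + i]);  // first 8 diagonal entries (assumed output)
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}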
Output:
MATLAB code:
A = zeros(16,25);
B = zeros(25,16);
for i = 1:16
    for j = 1:25
        A(i,j) = i-1+j-1;
    end
end
for i = 1:25
    for j = 1:16
        B(i,j) = i-1-j+1;
    end
end
C = A*B;
x = diag(C);
x(1:8)
MATLAB Output:
Conclusion:
MATLAB gives the same result for the rectangular matrix multiplication as our CUDA C code
running on the GPU. Hence, our code is verified.
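The comparison against MATLAB can also be done inside the program itself. The sketch below recomputes every element of the 16 x 16 product on the CPU and counts how many GPU results differ, which checks the whole matrix and not only part of the diagonal. The kernel matMul and the helpers fillInputs and refElement are assumptions made for illustration; the input formulas are the ones used in the report.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define R1 16
#define C1 25
#define R2 25
#define C2 16

// Same inputs as the report: A(i,j) = i + j and B(i,j) = i - j, zero-based.
void fillInputs(int *a, int *b)
{
    for (int i = 0; i < R1; i++)
        for (int j = 0; j < C1; j++)
            a[i * C1 + j] = i + j;
    for (int i = 0; i < R2; i++)
        for (int j = 0; j < C2; j++)
            b[i * C2 + j] = i - j;
}

// CPU reference for element (row, col) of the R1 x C2 product.
int refElement(const int *a, const int *b, int row, int col)
{
    int sum = 0;
    for (int k = 0; k < C1; k++)
        sum += a[row * C1 + k] * b[k * C2 + col];
    return sum;
}

// Assumed GPU kernel: one thread per output element.
__global__ void matMul(const int *a, const int *b, int *c)
{
    int row = threadIdx.y, col = threadIdx.x;
    if (row < R1 && col < C2) {
        int sum = 0;
        for (int k = 0; k < C1; k++)
            sum += a[row * C1 + k] * b[k * C2 + col];
        c[row * C2 + col] = sum;
    }
}

int main(void)
{
    int a[R1 * C1], b[R2 * C2], c[R1 * C2];
    int *dA = NULL, *dB = NULL, *dC = NULL;

    fillInputs(a, b);
    cudaMalloc((void **)&dA, sizeof(a));
    cudaMalloc((void **)&dB, sizeof(b));
    cudaMalloc((void **)&dC, sizeof(c));
    cudaMemcpy(dA, a, sizeof(a), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b, sizeof(b), cudaMemcpyHostToDevice);

    matMul<<<1, dim3(C2, R1)>>>(dA, dB, dC);  // 16 x 16 = 256 threads in one block
    if (cudaMemcpy(c, dC, sizeof(c), cudaMemcpyDeviceToHost) != cudaSuccess) {
        printf("GPU run failed\n");
        return 1;
    }

    int mismatches = 0;
    for (int row = 0; row < R1; row++)
        for (int col = 0; col < C2; col++)
            if (c[row * C2 + col] != refElement(a, b, row, col))
                mismatches++;
    printf("%d of %d elements differ\n", mismatches, R1 * C2);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return mismatches != 0;
}
```

A non-zero exit status signals a mismatch, so the check can be scripted without inspecting the printed values by eye.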