Lab Report 6

The document summarizes work done on GPU programming labs. It includes tasks on exploring GPU properties, parallelizing vector computations on GPU using CUDA, and performing matrix multiplication on GPU. The tasks are verified by comparing outputs with MATLAB calculations. Rectangular matrix multiplication is implemented on GPU by evolving the code for square matrices. Verification by MATLAB shows the GPU code generates correct results.


Department of Electrical Engineering

Faculty Member: Dr. Usman Zabit / Jafar Hussain Dated: 23rd October 2019

Semester: Fall 2019 (7th) Section: Group-1

EE-423 – Embedded System Design

Lab 6: Introduction to GPU Programming

with Cuda

PLO4 PLO5 PLO8 PLO9

Name                     Reg. No   Viva / Quiz /   Analysis of data   Modern Tool   Ethics and   Individual and
                                   Performance     in Lab Report      Usage         Safety       Team Work
                                   5 Marks         5 Marks            5 Marks       5 Marks      5 Marks

Muhammad Talha           198257
Rama Ali                 182621
Muhammad Fahad Baig      174885


1.1 Learning Objectives
By the end of this lab you will be able to:
1. Exploit the GPU architecture to parallelize previously sequential routines.
2. Appreciate the relative strengths of the CPU and GPU for various genres of computation.
3. Express the SIMD nature of computations through the CUDA extensions to the C programming
language.

1.2 Deliverables
You are required to submit
• Code
• Observations and experiences
at the beginning of the next lab.
Lab Tasks
Task B: GPUs & their Properties
Compile and run prop.cu and observe the number of GPUs in your system and their
specifications.

Output:
Task C: Parallelizing a Vector Computation

Task C-I: Block-level Parallelism


Compile and run vec_cpu.cu and vec_gpu.cu; observe the results.

Output:
Task C-II: Thread-level Parallelism
Change blockIdx.x to threadIdx.x in line 9 of the code in snippet 3.3.2. Also replace <<<N,1>>>
with <<<1,N>>> in line 38. Compile and execute the code.

Output:

Observing the maximum thread dimensions allowed for GPU in properties, are you
prompted by the expected result? If not, what reason could have made it possible?
The maximum number of threads per block is 512, whereas N is 10000. Yet no error or warning is
raised. A possible reason is that once the first 512 operations have executed in parallel, the block
automatically processes the next 512, and so on serially until all N elements are complete.

Recommend the maximum thread and block dimensions for optimum parallel processing
in GPUs.

Maximum thread dimensions: (512, 512, 64)

Maximum grid dimensions: (65535, 65535, 1)

Task C-III: Threads & Blocks Combined

Call the kernel with the following snippet now:

#define threadsPerBlock 64
compute<<<(N + threadsPerBlock - 1) / threadsPerBlock, threadsPerBlock>>>(dev_a, dev_b, dev_c);

and change the indexing technique to

// indexing with block ID and thread ID combined
int i = blockIdx.x * blockDim.x + threadIdx.x;

Output:

Task D: Matrix Multiplication on GPU


Task D-I
Compile the code in multSq.cu and observe the output. Verify using MATLAB or any other
available tool.
Output:

MATLAB code:

A = zeros(64,64);
B = zeros(64,64);
for i = 1:64
for j = 1:64
A(i,j) = i-1+j-1;
B(i,j) = i-1-j+1;
end
end
C = A*B;
x = diag(C);
x(1:32)
MATLAB Output:

The output is the same as for the GPU code, hence verified.

Task D-II:
Evolve the code snippet in section 3.4.1 for rectangular matrix multiplication.

Our Code:

#include <stdio.h>

#define R1 16
#define C1 25
#define R2 25
#define C2 16

__global__ void matMult(int *matProd, int *matA, int *matB)
{
    int row = blockIdx.x;
    int col = threadIdx.x;
    int tmpSum = 0;

    if (row < R1 && col < C2)
    {
        for (int i = 0; i < C1; i++)
            tmpSum += matA[row*C1 + i] * matB[i*C2 + col];
        matProd[row*C2 + col] = tmpSum;
    }
}

int main()
{
    // initialize, allocate and define host memory
    int matA[R1*C1] = { 0 };
    int matB[R2*C2] = { 0 };
    int matProd[R1*C2] = { 0 };

    for (int i = 0; i < R1; ++i)
        for (int j = 0; j < C1; ++j)
            matA[i*C1 + j] = i + j;

    for (int i = 0; i < R2; ++i)
        for (int j = 0; j < C2; ++j)
            matB[i*C2 + j] = i - j;

    // initialize and allocate device memory
    int *dev_matProd, *dev_matA, *dev_matB;

    cudaMalloc((void **)&dev_matA, R1*C1*sizeof(int));
    cudaMalloc((void **)&dev_matB, R2*C2*sizeof(int));
    cudaMalloc((void **)&dev_matProd, R1*C2*sizeof(int));

    // copy data to device memory
    cudaMemcpy(dev_matA, matA, R1*C1*sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_matB, matB, R2*C2*sizeof(int), cudaMemcpyHostToDevice);

    // one block per output row, one thread per output column
    matMult<<<R1, C2>>>(dev_matProd, dev_matA, dev_matB);

    // check for successful kernel execution
    if (cudaDeviceSynchronize() != cudaSuccess)
    {
        printf("Error\n");
        return -1;
    }

    // copy results from device to host memory
    cudaMemcpy(matProd, dev_matProd, R1*C2*sizeof(int), cudaMemcpyDeviceToHost);

    for (int i = 0; i < R1/2; ++i)  // inspecting the first few diagonals
        printf(" > Diagonal %d of product is %d.\n", i, matProd[i*C2 + i]);

    // free device memory
    cudaFree(dev_matA);
    cudaFree(dev_matB);
    cudaFree(dev_matProd);

    return 0;
}

Output:

MATLAB Code (for verifying result):

A = zeros(16,25);
B = zeros(25,16);
for i = 1:16
for j = 1:25
A(i,j) = i-1+j-1;
end
end
for i = 1:25
for j = 1:16
B(i,j) = i-1-j+1;
end
end
C = A*B;
x = diag(C);
x(1:8)

MATLAB Output:

Conclusion:

We get the same result for rectangular matrix multiplication in MATLAB as for our CUDA C
code on the GPU. Hence, our code is verified.
