Google Colab Solution Activity
Activity Solution
DR. RACHAD ATAT
In-Class Activity
Modify the vecAdd program so that each thread adds two adjacent elements from the input
vectors A and B and stores the result in the corresponding two adjacent elements of the output
vector C.
What you need to do: Update the Kernel: Modify the vecAddKernel so that each thread
processes two adjacent elements instead of just one.
2
#include <stdio.h>
#include <cuda.h>

// Kernel function to perform vector addition.
// ACTIVITY TEMPLATE: modify this kernel so that each thread processes
// TWO adjacent elements of A and B instead of one.
__global__
void vecAddKernel(float* A, float* B, float* C, int n) {
    int i = threadIdx.x + blockDim.x * blockIdx.x;

    // Compute the index for the first element
    // Compute the index for the second element
    // Ensure both indices are within bounds before performing the operation
}

// Host function to set up and call vecAddKernel.
// A, B: host input vectors; C: host output vector; n: element count.
void vecAdd(float* A, float* B, float* C, int n) {
    float *A_d, *B_d, *C_d;
    int size = n * sizeof(float); // bytes per vector

    // Allocate memory on the device (GPU)
    cudaMalloc((void **) &A_d, size);
    cudaMalloc((void **) &B_d, size);
    cudaMalloc((void **) &C_d, size);

    // Copy vectors A and B from host (CPU) to device (GPU)
    cudaMemcpy(A_d, A, size, cudaMemcpyHostToDevice);
    cudaMemcpy(B_d, B, size, cudaMemcpyHostToDevice);

    // NOTE(review): the slide's kernel launch goes here; since each thread
    // handles two elements, ceil(n/2) threads are needed -- with 256 threads
    // per block that is (n + 511)/512 blocks.
    vecAddKernel<<<(n + 511) / 512, 256>>>(A_d, B_d, C_d, n);

    // Copy the result vector C from device (GPU) to host (CPU)
    cudaMemcpy(C, C_d, size, cudaMemcpyDeviceToHost);

    // Free device memory (not shown on the slide -- without this the
    // allocations leak on every call)
    cudaFree(A_d);
    cudaFree(B_d);
    cudaFree(C_d);
}
// Driver: builds two input vectors, runs the GPU addition, spot-checks output.
int main() {
    int n = 1000; // Size of the vectors
    float A[n], B[n], C[n]; // Host vectors (VLAs, ~12 KB of stack total)

    // Initialize the inputs -- the original slide passed uninitialized
    // arrays to vecAdd, so the result was garbage.
    for (int i = 0; i < n; i++) {
        A[i] = (float)i;
        B[i] = 2.0f * (float)i;
    }

    // Call the vecAdd function to perform the vector addition on the GPU
    vecAdd(A, B, C, n);

    // Spot-check: C[i] should equal 3*i
    printf("C[0] = %f, C[%d] = %f\n", C[0], n - 1, C[n - 1]);
    return 0;
}
4
#include <stdio.h>
#include <cuda.h>

// SOLUTION kernel: each thread processes TWO adjacent elements of the
// input vectors A and B and writes the two corresponding elements of C.
// Expects a 1D grid of 1D blocks with at least ceil(n/2) total threads.
__global__
void vecAddKernel(float* A, float* B, float* C, int n) {
    int i = threadIdx.x + blockDim.x * blockIdx.x;

    // Each thread now processes two adjacent elements
    int idx1 = 2 * i;     // index of the first element
    int idx2 = 2 * i + 1; // index of the second (adjacent) element

    // Ensure both indices are within bounds before performing the operation
    // (when n is odd, the last thread's idx2 falls outside the vector)
    if (idx1 < n) {
        C[idx1] = A[idx1] + B[idx1];
    }
    if (idx2 < n) {
        C[idx2] = A[idx2] + B[idx2];
    }
}

// Host function to set up and call vecAddKernel.
// A, B: host input vectors; C: host output vector; n: element count.
void vecAdd(float* A, float* B, float* C, int n) {
    float *A_d, *B_d, *C_d;
    int size = n * sizeof(float); // bytes per vector

    // Allocate memory on the device (GPU)
    cudaMalloc((void **) &A_d, size);
    cudaMalloc((void **) &B_d, size);
    cudaMalloc((void **) &C_d, size);

    // Copy vectors A and B from host (CPU) to device (GPU)
    cudaMemcpy(A_d, A, size, cudaMemcpyHostToDevice);
    cudaMemcpy(B_d, B, size, cudaMemcpyHostToDevice);

    // Launch the kernel.
    // Since each thread handles two elements, we need ceil(n/2) threads;
    // with 256 threads per block that is (n + 511)/512 blocks.
    vecAddKernel<<<(n + 511) / 512, 256>>>(A_d, B_d, C_d, n);

    // Copy the result vector C from device (GPU) to host (CPU)
    // (cudaMemcpy synchronizes, so the kernel has finished before we read C)
    cudaMemcpy(C, C_d, size, cudaMemcpyDeviceToHost);

    // Free device memory (missing from the slide -- without this the
    // allocations leak on every call)
    cudaFree(A_d);
    cudaFree(B_d);
    cudaFree(C_d);
}
SOLUTION
5