HPC GPU Programming with CUDA

An Overview of CUDA for High Performance Computing

By Kato Mivule
Computer Science Department
Bowie State University
COSC887 Fall 2013

Bowie State University Department of Computer Science
HPC GPU Programming with CUDA

Agenda
• CUDA Introduction.
• CUDA Process flow.
• CUDA Hello world program.
• CUDA – Compiling and running a program.
• CUDA Basic structure.
• CUDA – Example program on vector addition.
• CUDA – The conclusion.
• CUDA – References and sources.


CUDA – Introduction

• CUDA – Compute Unified Device Architecture.
• Developed by NVIDIA.
• A parallel computing platform and programming model.
• Implemented on NVIDIA graphics processing units (GPUs).


CUDA – Introduction
• Grants direct access to the virtual instruction set and memory of GPUs.
• Allows general-purpose processing on GPUs (GPGPU), beyond graphics.
• Allows for increased computing performance using GPUs.

Plymouth Cuda – Image Source: betterparts.org


CUDA – Process flow in three steps
1. Copy input data from CPU memory to GPU memory.
2. Load GPU program and execute.
3. Copy results from GPU memory to CPU memory.

Image Source: http://en.wikipedia.org/wiki/CUDA
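The three steps above can be sketched as a short program. This is a minimal illustration, not code from the original slides: the kernel doubleElements, the helper doubleOnDevice, and the block size of 256 are hypothetical names and values chosen for the example.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles every element in place.
__global__ void doubleElements(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2;
}

// Runs the three steps on h_data.
// Returns 0 if the allocation and the final copy succeed, 1 otherwise.
int doubleOnDevice(int *h_data, int n)
{
    int *d_data = NULL;
    size_t bytes = n * sizeof(int);

    if (cudaMalloc(&d_data, bytes) != cudaSuccess)
        return 1;

    // Step 1: copy input data from CPU memory to GPU memory.
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    // Step 2: load the GPU program and execute it.
    int threads = 256;                          // assumed block size
    int blocks = (n + threads - 1) / threads;   // round up to cover n
    doubleElements<<<blocks, threads>>>(d_data, n);

    // Step 3: copy results from GPU memory back to CPU memory.
    cudaError_t err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}

int main(void)
{
    int values[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    if (doubleOnDevice(values, 8) != 0) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    for (int i = 0; i < 8; i++)
        printf("%d ", values[i]);
    printf("\n");
    return 0;
}
```

The device pointer d_data is only valid on the GPU; the host touches the data only through cudaMemcpy.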


CUDA – Hello world program
#include <stdio.h>

// __global__ denotes device (GPU) code:
// the function runs on the device (GPU)
// and gets called from host code.
__global__ void mykernel(void) {
}

// Host (CPU) code: runs on the host.
int main(void) {
    printf("Hello, world!\n");

    // <<< >>> denotes a call from host code to device code.
    mykernel<<<1,1>>>();

    return 0;
}

CUDA – Compiling and Running a Program on GWU’s Cray
1. Log into the Cray: ssh cray
2. Change to the ‘work’ directory: cd work
3. Create your program with the file extension .cu: vim hello1.cu
4. Load the CUDA module: module load cudatoolkit
5. Compile using NVCC: nvcc hello1.cu -o hello1
6. Execute the program: ./hello1


CUDA – Basic structure
• The kernel – this is the GPU program.
• The kernel is executed on a grid.
• The grid – is a group of thread blocks.
• The thread block – is a group of threads.
  • Executed on a single multiprocessor.
  • Threads in a block can communicate and synchronize.
• Threads are grouped into blocks and blocks into a grid.

Image Source: CUDA Overview Tutorial, Cliff Woolley, NVIDIA
http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2011-04-14/02-cuda-overview.pdf

CUDA – Basic structure
Declaring functions and variables
• __global__ Denotes a kernel function called from the host and executed on the device.
• __device__ Denotes a device function called and executed on the device.
• __host__ Denotes a host function called and executed on the host.
• __constant__ Denotes a constant device variable available to all threads.
• __shared__ Denotes a shared device variable available to all threads in a block.
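A minimal sketch that uses all five qualifiers in one program. The names scale, square, scaledSquares, runScaledSquares and the block size BLOCK are hypothetical and chosen only for illustration; cudaMemcpyToSymbol is the runtime call that writes a __constant__ variable from the host.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define BLOCK 64   // hypothetical block size chosen for this sketch

// __constant__: read-only device variable visible to all threads.
__constant__ float scale;

// __device__: called from device code and executed on the device.
__device__ float square(float x)
{
    return x * x;
}

// __global__: kernel, launched from the host and executed on the device.
__global__ void scaledSquares(const float *in, float *out, int n)
{
    // __shared__: one copy per thread block, visible to that block's threads.
    __shared__ float tile[BLOCK];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Each thread reads back only its own slot, so no barrier is needed.
        tile[threadIdx.x] = in[i];
        out[i] = scale * square(tile[threadIdx.x]);
    }
}

// __host__: ordinary CPU function (also the default without a qualifier).
__host__ int runScaledSquares(const float *h_in, float *h_out, int n, float s)
{
    float *d_in = NULL, *d_out = NULL;
    size_t bytes = n * sizeof(float);

    if (cudaMalloc(&d_in, bytes) != cudaSuccess)
        return 1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return 1;
    }

    cudaMemcpyToSymbol(scale, &s, sizeof(float));   // set the constant
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    int blocks = (n + BLOCK - 1) / BLOCK;
    scaledSquares<<<blocks, BLOCK>>>(d_in, d_out, n);

    cudaError_t err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}

int main(void)
{
    float in[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float out[4] = {0};
    if (runScaledSquares(in, out, 4, 0.5f) != 0)
        return 1;
    for (int i = 0; i < 4; i++)
        printf("%f\n", out[i]);
    return 0;
}
```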


CUDA – Basic structure
Some of the supported data types (base names of the built-in vector types, e.g. int2, float4)
• char and uchar
• short and ushort
• int and uint
• long and ulong
• longlong and ulonglong
• float and double
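As an illustration of these types, the sketch below adds two float4 values on the device. It assumes the built-in float4 vector type and its make_float4 constructor from the CUDA headers; the kernel addFloat4 and helper addOnDevice are hypothetical names.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel: component-wise sum of two float4 arrays.
__global__ void addFloat4(const float4 *a, const float4 *b, float4 *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = make_float4(a[i].x + b[i].x, a[i].y + b[i].y,
                           a[i].z + b[i].z, a[i].w + b[i].w);
}

// Adds a single pair of float4 values on the GPU and returns the sum.
// On a CUDA error the returned value is all zeros.
float4 addOnDevice(float4 a, float4 b)
{
    float4 result = make_float4(0.0f, 0.0f, 0.0f, 0.0f);
    float4 *d_a = NULL, *d_b = NULL, *d_c = NULL;

    cudaMalloc(&d_a, sizeof(float4));
    cudaMalloc(&d_b, sizeof(float4));
    cudaMalloc(&d_c, sizeof(float4));

    cudaMemcpy(d_a, &a, sizeof(float4), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, sizeof(float4), cudaMemcpyHostToDevice);

    addFloat4<<<1, 1>>>(d_a, d_b, d_c, 1);

    float4 tmp;
    if (cudaMemcpy(&tmp, d_c, sizeof(float4), cudaMemcpyDeviceToHost) == cudaSuccess)
        result = tmp;

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return result;
}

int main(void)
{
    float4 r = addOnDevice(make_float4(1.0f, 2.0f, 3.0f, 4.0f),
                           make_float4(10.0f, 20.0f, 30.0f, 40.0f));
    printf("%f %f %f %f\n", r.x, r.y, r.z, r.w);
    return 0;
}
```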


CUDA – Basic structure
• Accessing components – the kernel launch specifies the number of blocks and threads; built-in variables expose them inside the kernel.
• dim3 gridDim – denotes the dimensions of the grid in blocks.
  • Example: dim3 DimGrid(8,4) – 32 thread blocks
• dim3 blockDim – denotes the dimensions of a block in threads.
  • Example: dim3 DimBlock(2, 2, 2) – 8 threads per block
• uint3 blockIdx – denotes a block index within the grid.
• uint3 threadIdx – denotes a thread index within the block.
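The two example shapes above combine to 32 blocks of 8 threads, or 256 threads in total. The sketch below launches exactly that configuration and has every thread compute a unique flat index from the built-in variables. The kernel recordIds, the helper countRecordedIds, and the particular flattening order are choices made for this illustration.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread writes its own flat global index.
__global__ void recordIds(int *out)
{
    // Flatten the 2D block index within the grid.
    int blockId = blockIdx.y * gridDim.x + blockIdx.x;
    // Flatten the 3D thread index within the block.
    int threadId = (threadIdx.z * blockDim.y + threadIdx.y) * blockDim.x
                   + threadIdx.x;
    int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;

    int globalId = blockId * threadsPerBlock + threadId;
    out[globalId] = globalId;
}

// Launches 8x4 blocks of 2x2x2 threads and returns how many slots
// ended up holding their own index (256 when the launch succeeds).
int countRecordedIds(void)
{
    dim3 DimGrid(8, 4);        // 32 thread blocks
    dim3 DimBlock(2, 2, 2);    // 8 threads per block
    const int total = 8 * 4 * 2 * 2 * 2;

    int h_out[total];
    for (int i = 0; i < total; i++)
        h_out[i] = -1;         // sentinel: slot not written

    int *d_out = NULL;
    if (cudaMalloc(&d_out, total * sizeof(int)) != cudaSuccess)
        return 0;

    recordIds<<<DimGrid, DimBlock>>>(d_out);
    cudaMemcpy(h_out, d_out, total * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    int matches = 0;
    for (int i = 0; i < total; i++)
        if (h_out[i] == i)
            matches++;
    return matches;
}

int main(void)
{
    printf("threads that recorded their index: %d\n", countRecordedIds());
    return 0;
}
```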


CUDA – Basic structure
Thread management
• __threadfence_block() – memory fence: the calling thread's earlier writes become visible to all threads in its block before its later writes.
• __threadfence() – the same ordering guarantee, for all threads on the device.
• __threadfence_system() – the same ordering guarantee, for all threads on the device and for the host.
• __syncthreads() – barrier: wait until all threads in the block have reached this point.
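A small sketch of why the __syncthreads() barrier matters: each thread stores one value in shared memory and then reads a slot written by a different thread, which is only safe once every thread in the block has finished writing. The kernel reverseInBlock, the helper reverseOnDevice, and the fixed size N are assumptions made for this example.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 64   // hypothetical array size; one block of N threads

// Hypothetical kernel: reverses an N-element array within a single block.
__global__ void reverseInBlock(int *data)
{
    __shared__ int tmp[N];
    int t = threadIdx.x;

    tmp[t] = data[t];

    // Barrier: no thread continues until every thread has written tmp.
    __syncthreads();

    // Safe now: this slot was written by another thread before the barrier.
    data[t] = tmp[N - 1 - t];
}

// Reverses exactly N ints on the GPU; returns 0 on success, 1 on error.
int reverseOnDevice(int *h_data)
{
    int *d_data = NULL;
    size_t bytes = N * sizeof(int);

    if (cudaMalloc(&d_data, bytes) != cudaSuccess)
        return 1;

    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    reverseInBlock<<<1, N>>>(d_data);
    cudaError_t err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}

int main(void)
{
    int values[N];
    for (int i = 0; i < N; i++)
        values[i] = i;

    if (reverseOnDevice(values) != 0)
        return 1;

    printf("first: %d, last: %d\n", values[0], values[N - 1]);
    return 0;
}
```

Without the barrier, a thread could read tmp[N - 1 - t] before the thread that owns that slot has written it.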


CUDA – Basic structure
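Memory management
• cudaMalloc( ) – allocates memory on the device (GPU).
• cudaFree( ) – frees memory allocated with cudaMalloc( ).
• cudaMemcpy( ) – copies data between host and device memory.
  • With cudaMemcpyHostToDevice, copies input from host (CPU) memory to device (GPU) memory.
  • With cudaMemcpyDeviceToHost, copies device (GPU) results back to host (CPU) memory.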
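Each of these calls returns a cudaError_t status. A minimal sketch, assuming a hypothetical helper roundTrip, that sends a buffer to the device and back while checking those return values:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Copies n floats to the device and back again.
// Returns 0 on success, 1 if any CUDA call reports an error.
int roundTrip(const float *h_in, float *h_out, int n)
{
    float *d_buf = NULL;
    size_t bytes = n * sizeof(float);

    // Allocate device memory.
    cudaError_t err = cudaMalloc(&d_buf, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Host to device, then device back to host.
    err = cudaMemcpy(d_buf, h_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_buf, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess)
        fprintf(stderr, "cudaMemcpy: %s\n", cudaGetErrorString(err));

    // Release device memory on every path.
    cudaFree(d_buf);
    return err == cudaSuccess ? 0 : 1;
}

int main(void)
{
    float in[3] = {1.5f, 2.5f, 3.5f};
    float out[3] = {0.0f, 0.0f, 0.0f};

    if (roundTrip(in, out, 3) != 0)
        return 1;

    printf("%f %f %f\n", out[0], out[1], out[2]);
    return 0;
}
```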


CUDA – Basic structure
Atomic functions – read-modify-write operations executed without interference from other threads
• atomicAdd ( )
• atomicSub ( )
• atomicExch( )
• atomicMin ( )
• atomicMax ( )
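A minimal sketch of atomicAdd( ): many threads increment one shared counter, and the atomic update keeps concurrent increments from being lost. The kernel countEven and the helper countEvenOnDevice are hypothetical names used only for this example.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel: counts the even values in data.
__global__ void countEven(const int *data, int n, int *count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] % 2 == 0)
        atomicAdd(count, 1);   // safe concurrent increment
}

// Returns the number of even values in h_data, or -1 on a CUDA error.
int countEvenOnDevice(const int *h_data, int n)
{
    int *d_data = NULL, *d_count = NULL;
    int h_count = -1;
    size_t bytes = n * sizeof(int);

    if (cudaMalloc(&d_data, bytes) != cudaSuccess)
        return -1;
    if (cudaMalloc(&d_count, sizeof(int)) != cudaSuccess) {
        cudaFree(d_data);
        return -1;
    }

    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(int));   // counter starts at zero

    int threads = 128;                     // assumed block size
    int blocks = (n + threads - 1) / threads;
    countEven<<<blocks, threads>>>(d_data, n, d_count);

    int result;
    if (cudaMemcpy(&result, d_count, sizeof(int),
                   cudaMemcpyDeviceToHost) == cudaSuccess)
        h_count = result;

    cudaFree(d_data);
    cudaFree(d_count);
    return h_count;
}

int main(void)
{
    int values[6] = {1, 2, 3, 4, 5, 6};
    printf("even values: %d\n", countEvenOnDevice(values, 6));
    return 0;
}
```

A plain (*count)++ in the kernel would be a data race, because several threads could read the same old value before any of them writes the new one.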


CUDA – Example code for vector addition
//=============================================================
// Vector addition
// Oak Ridge National Laboratory example
// https://www.olcf.ornl.gov/tutorials/cuda-vector-addition/
//=============================================================
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

// CUDA kernel. Each thread takes care of one element of c.
// Runs on the device (GPU) and gets called by the host (CPU).
__global__ void vecAdd(double *a, double *b, double *c, int n)
{
    // Get our global thread ID
    int id = blockIdx.x*blockDim.x+threadIdx.x;
    // Make sure we do not go out of bounds
    if (id < n)
        c[id] = a[id] + b[id];
}

int main( int argc, char* argv[] )
{
    // Size of vectors
    int n = 100000;
    // Host input vectors
    double *h_a;
    double *h_b;
    // Host output vector
    double *h_c;
    // Device input vectors
    double *d_a;
    double *d_b;
    // Device output vector
    double *d_c;
    // Size, in bytes, of each vector
    size_t bytes = n*sizeof(double);

    // Allocate memory for each vector on host
    h_a = (double*)malloc(bytes);
    h_b = (double*)malloc(bytes);
    h_c = (double*)malloc(bytes);
    // Allocate memory for each vector on GPU
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    int i;
    // Initialize vectors on host
    for( i = 0; i < n; i++ ) {
        h_a[i] = sin(i)*sin(i);
        h_b[i] = cos(i)*cos(i);
    }

    // Copy host vectors to device
    cudaMemcpy( d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy( d_b, h_b, bytes, cudaMemcpyHostToDevice);
    int blockSize, gridSize;
    // Number of threads in each thread block
    blockSize = 1024;
    // Number of thread blocks in grid (rounded up so all n elements are covered)
    gridSize = (int)ceil((float)n/blockSize);
    // Execute the kernel
    vecAdd<<<gridSize, blockSize>>>(d_a, d_b, d_c, n);
    // Copy array back to host
    cudaMemcpy( h_c, d_c, bytes, cudaMemcpyDeviceToHost );

    // Sum up vector c and print the result divided by n; this should equal 1 within error,
    // because each element is sin(i)^2 + cos(i)^2
    double sum = 0;
    for(i=0; i<n; i++)
        sum += h_c[i];
    printf("final result: %f\n", sum/n);
    // Release device memory
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    // Release host memory
    free(h_a);
    free(h_b);
    free(h_c);
    return 0;
}


CUDA – Example code for vector addition
Sometimes seemingly correct CUDA code will output wrong results.
• Check the machine for errors – access to the device (GPU) might not be granted.
• The computation might only produce correct results on the host (CPU).
//============================
// ERROR CHECKING
//============================
#define cudaCheckErrors(msg) \
    do { \
        cudaError_t __err = cudaGetLastError(); \
        if (__err != cudaSuccess) { \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
                msg, cudaGetErrorString(__err), \
                __FILE__, __LINE__); \
            fprintf(stderr, "*** FAILED - ABORTING\n"); \
            exit(1); \
        } \
    } while (0)
// Place after the memory allocation section
cudaCheckErrors("cudaMalloc fail");
// Place after the memory copy section
cudaCheckErrors("cudaMemcpy fail");
// Place after the kernel launch and the copy back to the host
cudaCheckErrors("cudaMemcpy or cuda kernel fail");
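The sketch below shows one way the macro might be placed in a complete program. The kernel fill and the helper fillOnDevice are hypothetical; calling cudaDeviceSynchronize( ) before the last kernel check is an addition of this example, so that errors raised while the kernel runs are reported at that point.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define cudaCheckErrors(msg) \
    do { \
        cudaError_t __err = cudaGetLastError(); \
        if (__err != cudaSuccess) { \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
                msg, cudaGetErrorString(__err), \
                __FILE__, __LINE__); \
            fprintf(stderr, "*** FAILED - ABORTING\n"); \
            exit(1); \
        } \
    } while (0)

// Hypothetical kernel: sets every element to value.
__global__ void fill(int *data, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = value;
}

// Fills h_data on the GPU, aborting with a message on any CUDA error.
int fillOnDevice(int *h_data, int n, int value)
{
    int *d_data = NULL;
    size_t bytes = n * sizeof(int);

    cudaMalloc(&d_data, bytes);
    cudaCheckErrors("cudaMalloc fail");

    int threads = 256;                          // assumed block size
    int blocks = (n + threads - 1) / threads;
    fill<<<blocks, threads>>>(d_data, n, value);
    cudaCheckErrors("kernel launch fail");      // bad launch configuration

    cudaDeviceSynchronize();
    cudaCheckErrors("kernel execution fail");   // errors during the run

    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaCheckErrors("cudaMemcpy fail");

    cudaFree(d_data);
    return 0;
}

int main(void)
{
    int values[5] = {0, 0, 0, 0, 0};
    fillOnDevice(values, 5, 7);
    printf("%d %d\n", values[0], values[4]);
    return 0;
}
```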

Conclusion
• CUDA gives outstanding access to GPU computational power.
• CUDA is easy to learn.
• CUDA programs can be written in C.
• However, translating code from host to device and device to host is a challenge.


References and Sources
[1] CUDA Programming Blog Tutorial
http://cuda-programming.blogspot.com/2013/03/cuda-complete-complete-reference-on-cuda.html
[2] Dr. Kenrick Mock CUDA Tutorial
http://www.math.uaa.alaska.edu/~afkjm/cs448/handouts/cuda-firstprograms.pdf
[3] Parallel Programming Lecture Notes, Spring 2008, Johns Hopkins University
http://hssl.cs.jhu.edu/wiki/lib/exe/fetch.php?media=randal:teach:cs420:cudatools.pdf
[4] CUDA Super Computing Blog Tutorials
http://supercomputingblog.com/cuda-tutorials/
[5] Introduction to CUDA C Tutorial, Jason Sanders
http://www.nvidia.com/content/GTC-2010/pdfs/2131_GTC2010.pdf
[6] CUDA Overview Tutorial, Cliff Woolley, NVIDIA
http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2011-04-14/02-cuda-overview.pdf
[7] Oak Ridge National Laboratory CUDA Vector Addition Example
https://www.olcf.ornl.gov/tutorials/cuda-vector-addition/
[8] CUDA – Wikipedia
http://en.wikipedia.org/wiki/CUDA

Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 

Kato Mivule: An Overview of CUDA for High Performance Computing

  • 1. HPC GPU Programming with CUDA An Overview of CUDA for High Performance Computing By Kato Mivule Computer Science Department Bowie State University COSC887 Fall 2013 Bowie State University Department of Computer Science
  • 2. Agenda
      • CUDA – Introduction.
      • CUDA – Process flow.
      • CUDA – Hello world program.
      • CUDA – Compiling and running a program.
      • CUDA – Basic structure.
      • CUDA – Example program on vector addition.
      • CUDA – The conclusion.
      • CUDA – References and sources.
  • 3. CUDA – Introduction
      • CUDA – Compute Unified Device Architecture.
      • Developed by NVIDIA.
      • A parallel computing platform and programming model.
      • Implemented by NVIDIA graphics processing units (GPUs).
  • 4. CUDA – Introduction
      • Grants direct access to the virtual instruction set and memory of GPUs.
      • Allows for general-purpose processing on GPUs (GPGPU) beyond graphics.
      • Allows for increased computing performance using GPUs.
      Plymouth Cuda – Image Source: betterparts.org
  • 5. CUDA – Process flow in three steps
      1. Copy input data from CPU memory to GPU memory.
      2. Load GPU program and execute.
      3. Copy results from GPU memory to CPU memory.
      Image Source: http://en.wikipedia.org/wiki/CUDA
  • 6. CUDA – Hello world program
      #include <stdio.h>

      // __global__ denotes device (GPU) code: the function runs on the
      // device (GPU) and gets called from host code.
      __global__ void mykernel(void) {
      }

      // Host (CPU) code: runs on the host.
      int main(void) {
          printf("Hello, world!\n");
          mykernel<<<1,1>>>();   // <<< >>> denotes a call from host to device code
          return 0;
      }
  • 7. CUDA – Compiling and running a program on GWU's Cray
      1. Log into the Cray: ssh cray
      2. Change to the 'work' directory: cd work
      3. Create your program with the file extension .cu: vim hello1.cu
      4. Load the CUDA module: module load cudatoolkit
      5. Compile using NVCC: nvcc hello1.cu -o hello1
      6. Execute the program: ./hello1
  • 8. CUDA – Basic structure
      • The kernel is the GPU program.
      • The kernel is executed on a grid.
      • The grid is a group of thread blocks.
      • The thread block is a group of threads. A block is executed on a single multiprocessor, and its threads can communicate and synchronize with each other.
      • Threads are grouped into blocks, and blocks into a grid.
      Image Source: CUDA Overview Tutorial, Cliff Woolley, NVIDIA http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2011-04-14/02-cuda-overview.pdf
  • 9. CUDA – Basic structure: declaring functions and variables
      • __global__ – denotes a kernel function, called from the host and executed on the device.
      • __device__ – denotes a device function, called and executed on the device.
      • __host__ – denotes a host function, called and executed on the host.
      • __constant__ – denotes a constant device variable available to all threads.
      • __shared__ – denotes a shared device variable available to all threads in a block.
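The qualifiers on this slide can be seen together in one small program. The following is a minimal sketch, not code from the slides: the names scale, scaled, scaleAll and scaleOnGpu are hypothetical, and error checking is left out to keep it short.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical constant: one scale factor, readable by every thread.
__constant__ float scale;

// __device__: called from device code and executed on the device.
__device__ float scaled(float x) {
    return scale * x;
}

// __global__: called from the host and executed on the device.
__global__ void scaleAll(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = scaled(data[i]);
}

// __host__ (the default for unmarked functions): runs on the CPU.
__host__ void scaleOnGpu(float *h_data, int n, float factor) {
    float *d_data;
    size_t bytes = n * sizeof(float);

    // Write the factor into the __constant__ variable on the device.
    cudaMemcpyToSymbol(scale, &factor, sizeof(float));

    cudaMalloc(&d_data, bytes);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    scaleAll<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
}

int main(void) {
    float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    scaleOnGpu(v, 4, 2.0f);
    for (int i = 0; i < 4; i++) printf("%f\n", v[i]);
    return 0;
}
```

A __shared__ variable would be declared inside the kernel body; it is visible only to the threads of one block.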
  • 10. CUDA – Basic structure: some of the supported data types
      These are the base names of CUDA's built-in vector types (for example int2, uint4, float4).
      • char and uchar
      • short and ushort
      • int and uint
      • long and ulong
      • longlong and ulonglong
      • float and double
  • 11. CUDA – Basic structure: accessing components
      The kernel launch specifies the number of blocks and threads; inside the kernel, built-in variables expose them.
      • dim3 gridDim – denotes the dimensions of the grid in blocks.
          Example: dim3 DimGrid(8, 4) – 32 thread blocks
      • dim3 blockDim – denotes the dimensions of a block in threads.
          Example: dim3 DimBlock(2, 2, 2) – 8 threads per block
      • uint3 blockIdx – denotes a block index within the grid.
      • uint3 threadIdx – denotes a thread index within the block.
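As an illustration of how these built-in variables combine, here is a sketch that launches the DimGrid(8, 4) and DimBlock(2, 2, 2) configuration from the slide and has every thread compute a unique linear ID. The kernel name writeIds and the helper totalThreads are hypothetical, and error checking is omitted.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Each thread derives a unique linear ID from the built-in variables.
__global__ void writeIds(int *out) {
    int blockId = blockIdx.x + blockIdx.y * gridDim.x
                + blockIdx.z * gridDim.x * gridDim.y;
    int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    int threadId = threadIdx.x + threadIdx.y * blockDim.x
                 + threadIdx.z * blockDim.x * blockDim.y;
    int globalId = blockId * threadsPerBlock + threadId;
    out[globalId] = globalId;
}

// Hypothetical helper: total number of threads for a grid/block pair.
int totalThreads(dim3 grid, dim3 block) {
    return grid.x * grid.y * grid.z * block.x * block.y * block.z;
}

int main(void) {
    dim3 DimGrid(8, 4);      // 32 thread blocks
    dim3 DimBlock(2, 2, 2);  // 8 threads per block
    int n = totalThreads(DimGrid, DimBlock);

    int *h_ids = (int *)malloc(n * sizeof(int));
    int *d_ids;
    cudaMalloc(&d_ids, n * sizeof(int));

    writeIds<<<DimGrid, DimBlock>>>(d_ids);
    cudaMemcpy(h_ids, d_ids, n * sizeof(int), cudaMemcpyDeviceToHost);

    // Every slot should hold its own index if the IDs are unique.
    int ok = 1;
    for (int i = 0; i < n; i++)
        if (h_ids[i] != i) ok = 0;
    printf("%d threads, ids %s\n", n, ok ? "unique" : "wrong");

    cudaFree(d_ids);
    free(h_ids);
    return 0;
}
```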
  • 12. CUDA – Basic structure: thread management
      • __threadfence_block() – memory fence: writes made by the calling thread before the fence become visible to all threads in its block before any writes it makes after the fence.
      • __threadfence() – the same ordering guarantee, for all threads on the device.
      • __threadfence_system() – the same ordering guarantee, for all threads on the device and for the host.
      • __syncthreads() – wait until all threads in the block have reached this point.
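A common use of __syncthreads() is a block-level reduction in shared memory, where each step must finish in all threads before the next one reads the buffer. The sketch below is an illustration under stated assumptions, not code from the slides: blockSum, sumOnGpu and the BLOCK size of 256 are hypothetical choices, the per-block partial sums are added on the host, and error checking is omitted.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define BLOCK 256  // threads per block; this sketch needs a power of two

// Each block sums its slice of the input into out[blockIdx.x].
__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float buf[BLOCK];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    buf[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // all loads into buf must finish before anyone reads it

    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) buf[tid] += buf[tid + s];
        __syncthreads();  // finish this step in every thread before the next
    }
    if (tid == 0) out[blockIdx.x] = buf[0];
}

// Hypothetical host wrapper: returns the sum of h_in[0..n-1].
float sumOnGpu(const float *h_in, int n) {
    int blocks = (n + BLOCK - 1) / BLOCK;
    float *d_in, *d_out;
    float *h_out = (float *)malloc(blocks * sizeof(float));

    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, BLOCK>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);

    float total = 0.0f;
    for (int b = 0; b < blocks; b++) total += h_out[b];

    cudaFree(d_in);
    cudaFree(d_out);
    free(h_out);
    return total;
}

int main(void) {
    float ones[1000];
    for (int i = 0; i < 1000; i++) ones[i] = 1.0f;
    printf("sum = %f\n", sumOnGpu(ones, 1000));
    return 0;
}
```

Note that __syncthreads() is placed outside the `if (tid < s)` branch so that every thread in the block reaches it.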
  • 13. CUDA – Basic structure: memory management
      • cudaMalloc() – allocates device memory.
      • cudaFree() – frees allocated device memory.
      • cudaMemcpy() – copies memory between host and device. The last argument gives the direction: cudaMemcpyDeviceToHost copies device (GPU) results back to host (CPU) memory.
  • 14. CUDA – Basic structure: atomic functions
      Atomic functions are executed without interference from other threads.
      • atomicAdd()
      • atomicSub()
      • atomicExch()
      • atomicMin()
      • atomicMax()
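To show why atomics matter, here is a small sketch in which many threads increment one shared counter. Without atomicAdd() the concurrent read-modify-write updates could overwrite each other. The names countEven and countEvenOnGpu are hypothetical, and error checking is omitted.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Every thread that sees an even value bumps the same counter.
__global__ void countEven(const int *data, int n, int *count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] % 2 == 0)
        atomicAdd(count, 1);  // safe even when many threads hit it at once
}

// Hypothetical host wrapper: returns how many of h_data[0..n-1] are even.
int countEvenOnGpu(const int *h_data, int n) {
    int *d_data, *d_count;
    int h_count = 0;

    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(int));  // counter starts at zero

    countEven<<<(n + 255) / 256, 256>>>(d_data, n, d_count);
    cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    cudaFree(d_count);
    return h_count;
}

int main(void) {
    int values[100];
    for (int i = 0; i < 100; i++) values[i] = i;
    printf("even values: %d\n", countEvenOnGpu(values, 100));
    return 0;
}
```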
  • 16. CUDA – Example code for vector addition
      //=============================================================
      // Vector addition
      // Oakridge National Lab Example
      // https://www.olcf.ornl.gov/tutorials/cuda-vector-addition/
      //=============================================================
      #include <stdio.h>
      #include <stdlib.h>
      #include <math.h>

      // CUDA kernel. Each thread takes care of one element of c
      // To run on device (GPU) and get called by host (CPU)
      __global__ void vecAdd(double *a, double *b, double *c, int n)
      {
          // Get our global thread ID
          int id = blockIdx.x*blockDim.x+threadIdx.x;

          // Make sure we do not go out of bounds
          if (id < n)
              c[id] = a[id] + b[id];
      }
  • 17. CUDA – Example code for vector addition
      int main( int argc, char* argv[] )
      {
          // Size of vectors
          int n = 100000;

          // Host input vectors
          double *h_a;
          double *h_b;
          // Host output vector
          double *h_c;

          // Device input vectors
          double *d_a;
          double *d_b;
          // Device output vector
          double *d_c;

          // Size, in bytes, of each vector
          size_t bytes = n*sizeof(double);
  • 18. CUDA – Example code for vector addition
          // Allocate memory for each vector on host
          h_a = (double*)malloc(bytes);
          h_b = (double*)malloc(bytes);
          h_c = (double*)malloc(bytes);

          // Allocate memory for each vector on GPU
          cudaMalloc(&d_a, bytes);
          cudaMalloc(&d_b, bytes);
          cudaMalloc(&d_c, bytes);

          int i;
          // Initialize vectors on host
          for( i = 0; i < n; i++ ) {
              h_a[i] = sin(i)*sin(i);
              h_b[i] = cos(i)*cos(i);
          }
  • 19. CUDA – Example code for vector addition
          // Copy host vectors to device
          cudaMemcpy( d_a, h_a, bytes, cudaMemcpyHostToDevice);
          cudaMemcpy( d_b, h_b, bytes, cudaMemcpyHostToDevice);

          int blockSize, gridSize;

          // Number of threads in each thread block
          blockSize = 1024;

          // Number of thread blocks in grid
          gridSize = (int)ceil((float)n/blockSize);

          // Execute the kernel
          vecAdd<<<gridSize, blockSize>>>(d_a, d_b, d_c, n);

          // Copy array back to host
          cudaMemcpy( h_c, d_c, bytes, cudaMemcpyDeviceToHost );
  • 20. CUDA – Example code for vector addition
          // Sum up vector c and print result divided by n, this should equal 1 within error
          double sum = 0;
          for(i=0; i<n; i++)
              sum += h_c[i];
          printf("final result: %f\n", sum/n);

          // Release device memory
          cudaFree(d_a);
          cudaFree(d_b);
          cudaFree(d_c);

          // Release host memory
          free(h_a);
          free(h_b);
          free(h_c);

          return 0;
      }
  • 21. CUDA – Example code for vector addition: error checking
      Sometimes seemingly correct CUDA code will output wrong results.
      • Check the machine for errors: access to the device (GPU) might not have been granted.
      • The computation might produce correct results only on the host (CPU).

      //============================
      // ERROR CHECKING
      //============================
      #define cudaCheckErrors(msg) \
          do { \
              cudaError_t __err = cudaGetLastError(); \
              if (__err != cudaSuccess) { \
                  fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
                          msg, cudaGetErrorString(__err), \
                          __FILE__, __LINE__); \
                  fprintf(stderr, "*** FAILED - ABORTING\n"); \
                  exit(1); \
              } \
          } while (0)

      // Place in memory allocation section
      cudaCheckErrors("cudamalloc fail");

      // Place in memory copy section
      cudaCheckErrors("cuda memcpy fail");
      cudaCheckErrors("cudamemcpy or cuda kernel fail");
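A kernel launch returns no status of its own, which is why a check with cudaGetLastError() right after the launch is useful. The sketch below provokes a launch failure on purpose. It assumes that 2048 threads per block exceeds the device limit (1024 on current NVIDIA GPUs); emptyKernel and launchStatus are hypothetical names.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void emptyKernel(void) {}

// Hypothetical helper: launch with the given block size and report the status.
cudaError_t launchStatus(int threadsPerBlock) {
    emptyKernel<<<1, threadsPerBlock>>>();

    // Errors detected at launch time (for example a bad configuration).
    cudaError_t err = cudaGetLastError();

    // Errors raised while the kernel was running show up after a sync.
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    return err;
}

int main(void) {
    int sizes[2] = {256, 2048};  // 2048 is assumed to exceed the device limit
    for (int i = 0; i < 2; i++) {
        cudaError_t err = launchStatus(sizes[i]);
        printf("%d threads per block: %s\n",
               sizes[i], cudaGetErrorString(err));
    }
    return 0;
}
```

Without the check, the oversized launch would fail silently and later copies would return whatever was already in device memory.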
  • 22. Conclusion
      • CUDA's access to GPU computational power is outstanding.
      • CUDA is easy to learn.
      • CUDA programs can be written in C.
      • However, translating code from host to device and from device to host is a challenge.
  • 23. References and Sources
      [1] CUDA Programming Blog Tutorial, http://cuda-programming.blogspot.com/2013/03/cuda-complete-complete-reference-on-cuda.html
      [2] Dr. Kenrick Mock, CUDA Tutorial, http://www.math.uaa.alaska.edu/~afkjm/cs448/handouts/cuda-firstprograms.pdf
      [3] Parallel Programming Lecture Notes, Spring 2008, Johns Hopkins University, http://hssl.cs.jhu.edu/wiki/lib/exe/fetch.php?media=randal:teach:cs420:cudatools.pdf
      [4] CUDA Super Computing Blog Tutorials, http://supercomputingblog.com/cuda-tutorials/
      [5] Introduction to CUDA C Tutorial, Jason Sanders, http://www.nvidia.com/content/GTC-2010/pdfs/2131_GTC2010.pdf
      [6] CUDA Overview Tutorial, Cliff Woolley, NVIDIA, http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2011-04-14/02-cuda-overview.pdf
      [7] Oakridge National Lab CUDA Vector Addition Example, https://www.olcf.ornl.gov/tutorials/cuda-vector-addition/
      [8] CUDA – Wikipedia, http://en.wikipedia.org/wiki/CUDA