GPU CUDA Part 2

This document provides an outline for an introduction to CUDA course. It begins with an introduction to GPUs and their evolution towards general purpose computing. It then discusses key differences between CPUs and GPUs, how latency is hidden on GPUs, and how this enabled the dawn of general purpose GPU (GPGPU) programming. The document outlines CUDA as NVIDIA's programming model for GPGPU, its compilation process, execution model with threads arranged in blocks and grids, and memory model. It provides examples of applications that utilize GPU acceleration like machine learning, scientific computing, and medical imaging.


IT301: INTRODUCTION TO CUDA
By Ms. Thanmayee
Ad hoc Faculty, Department of IT, NITK, Surathkal
OUTLINE
● Introduction to GPU
● Evolution of GPU microarchitectures
● General Purpose GPU
● Introduction to CUDA
● CUDA Execution Model
● CUDA Memory Model
● Steps in GPU Execution
● Hello World Program
● CUDA Device Variables
● CUDA Programming examples
CPU vs GPU
● Need to understand how CPUs and GPUs differ:
− Simple calculations versus complex calculations
− Basic graphics versus 3D rendering and animations
− Few high-capacity cores versus many low-capacity cores
− Latency intolerance versus latency tolerance
− Task parallelism versus data parallelism
− 10s of threads versus 10,000s of threads
Latency Hiding in GPU
● GPUs tolerate long memory latencies by keeping thousands of threads in flight: when one group of threads stalls on a memory access, the hardware switches to another group that is ready to run, so the compute units stay busy.
General Purpose GPU: GPGPU
The dawn of GPGPU
General-purpose computing on GPUs was far from easy back then:
− Even for those who knew graphics programming languages such as OpenGL!
− Developers had to map scientific calculations onto problems that could be represented by triangles and polygons.
Applications
● Machine learning – self-driving cars, the Watson AI supercomputer.
● Scientific applications such as genome sequencing and molecular simulations.
● Medical image processing.
● Image tagging in Facebook.
● Numerical weather prediction.
● Oil exploration.
● Movie making.
● Atmospheric simulation.
● Sequencing the novel coronavirus and the genomes of people afflicted with COVID-19.
CUDA – Compute Unified Device Architecture
● In 2003, a team of researchers led by Ian Buck unveiled Brook, the first widely adopted programming model to extend C with data-parallel constructs.
● Brook exposed the GPU as a general-purpose processor in a high-level language.
− Most importantly, Brook programs were:
● easier to write than hand-tuned GPU code, and
● seven times faster than similar existing code.
CUDA – Compute Unified Device Architecture
● NVIDIA invited Ian Buck to join the company.
− He started evolving a solution to run C seamlessly on the GPU.
− Putting the software and hardware together, NVIDIA unveiled CUDA in 2006.
− CUDA was launched in 2007.
− It was the world's first solution for general-purpose computing on GPUs.
− CUDA:
■ is a parallel computing architecture and programming model;
■ includes a C/C++ compiler, and also supports OpenCL and DirectCompute.
General Structure of the GPU Program in CUDA
● Host Program – executed by the CPU.
− This is serial code.
− It sets up the parameters for GPU (kernel) execution.
● Kernel Program – executed in parallel by the SIMD cores (Streaming Processors) in the GPU.
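The host/kernel split above can be sketched as a minimal CUDA program. This is an illustrative sketch, not code from the slides: the kernel name `hello_kernel` and the 1-block, 4-thread launch configuration are our own choices.

```cuda
#include <cstdio>

// Kernel program: runs on the GPU, one instance per thread.
__global__ void hello_kernel() {
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    // Host program: serial code on the CPU that sets the launch
    // parameters (here, 1 block of 4 threads) and starts the kernel.
    hello_kernel<<<1, 4>>>();

    // Wait for the GPU to finish before the host program exits.
    cudaDeviceSynchronize();
    return 0;
}
```

The `<<<blocks, threads>>>` launch syntax is how the host hands the execution configuration to the kernel; everything before and after it is ordinary serial C++.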
Compiling CUDA Program:
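Compilation is typically driven by NVIDIA's `nvcc` compiler driver, which separates host and device code, compiles the device code for the GPU, and passes the host code to a regular C/C++ compiler. A sketch of the usual workflow (the file name `hello.cu` is an assumption for illustration):

```shell
# Compile a CUDA source file; nvcc splits it into host code
# (built with the system C++ compiler) and device code (built for the GPU).
nvcc hello.cu -o hello

# Run the resulting executable on a machine with a CUDA-capable GPU.
./hello
```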
CUDA Execution Model
● Threads:
○ perform the computations; they run on the Scalar Processors (Streaming Processors) in the GPU.
○ Thousands are needed to get full efficiency.
● Blocks:
○ A group of threads. The number of threads per block can vary from 1 to 1024.
○ Blocks are allotted to Streaming Multiprocessors (SMs) in the GPU.
○ Multiple blocks can reside in one SM.
● Grid:
○ A group of blocks.
○ Holds the complete computation task; the grid represents the kernel.
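The thread/block/grid hierarchy above is what a kernel launch configures. The vector-addition sketch below (the kernel name `vec_add`, the block size of 256, and the use of unified memory are our own illustrative choices, not from the slides) shows how each thread derives its global index from its block and thread coordinates:

```cuda
#include <cstdio>

// Kernel: each thread computes one element of c = a + b.
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    // Global index, built from the block index, block size,
    // and thread index within the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // guard: the grid may overshoot n
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    // Unified memory keeps the host-side setup short.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Grid: enough 256-thread blocks to cover all n elements.
    int threads_per_block = 256;
    int blocks = (n + threads_per_block - 1) / threads_per_block;
    vec_add<<<blocks, threads_per_block>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Note the rounding-up division when computing the block count: the last block may contain threads with indices past `n`, which is why the kernel checks `i < n` before writing.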
Blocks in SMs
THANK YOU
