GPU CUDA Part 2

This document provides an outline for an introduction to CUDA course. It begins with an introduction to GPUs and their evolution towards general purpose computing. It then discusses key differences between CPUs and GPUs, how latency is hidden on GPUs, and how this enabled the dawn of general purpose GPU (GPGPU) programming. The document outlines CUDA as NVIDIA's programming model for GPGPU, its compilation process, execution model with threads arranged in blocks and grids, and memory model. It provides examples of applications that utilize GPU acceleration like machine learning, scientific computing, and medical imaging.


IT301: INTRODUCTION TO CUDA
By Ms. Thanmayee
Ad hoc Faculty, Department of IT, NITK, Surathkal
OUTLINE
● Introduction to GPU
● Evolution of GPU microarchitectures
● General Purpose GPU
● Introduction to CUDA
● CUDA Execution Model
● CUDA Memory Model
● Steps in GPU Execution
● Hello World Program
● CUDA Device Variables
● CUDA Programming examples
CPU vs GPU
● Need to understand how CPUs and GPUs differ:
− Simple calculations versus complex calculations
− Basic graphics versus 3D rendering and animations
− Few high-capacity cores versus many low-capacity cores
− Latency intolerance versus latency tolerance
− Task parallelism versus data parallelism
− 10s of threads versus 10,000s of threads
Latency Hiding in GPU
● GPUs tolerate long memory latencies by keeping thousands of threads in flight: when one group of threads stalls on a memory access, the hardware switches to another group that is ready to run, so the compute units stay busy.
General Purpose GPU: GPGPU
The dawn of GPGPU
General-purpose computing on GPUs was far from easy back then:
− Even for those who knew graphics programming languages such as OpenGL!
− Developers had to map scientific calculations onto problems that could be represented by triangles and polygons.
Applications
● Machine learning – self-driving cars, the Watson AI supercomputer.
● Scientific applications such as genome sequencing and molecular simulations.
● Medical image processing.
● Image tagging in Facebook.
● Numerical weather prediction.
● Oil exploration.
● Movie making.
● Atmospheric simulation.
● Sequencing the novel coronavirus and the genomes of people afflicted with COVID-19.
CUDA – Compute Unified Device Architecture
● In 2003, a team of researchers led by Ian Buck unveiled Brook, the first widely adopted programming model to extend C with data-parallel constructs.
● Brook exposed the GPU as a general-purpose processor in a high-level language.
− Most importantly, Brook programs were:
● easier to write than hand-tuned GPU code, and
● seven times faster than similar existing code.
CUDA – Compute Unified Device Architecture
● NVIDIA invited Ian Buck to join the company.
− He started evolving a solution to run C seamlessly on the GPU.
− Putting the software and hardware together, NVIDIA unveiled CUDA in 2006.
− CUDA was launched in 2007.
− It was the world's first solution for general-purpose computing on GPUs.
− CUDA:
■ is a parallel computing architecture and programming model;
■ includes a C/C++ compiler, and also supports OpenCL and DirectCompute.
General Structure of the GPU Program in CUDA
● Host Program – executed by the CPU.
− This is serial code.
− It sets up the parameters for GPU (kernel) execution.
● Kernel Program – executed in parallel by the SIMD cores (Streaming Processors) in the GPU.
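The host/kernel split above can be sketched as a minimal CUDA program. This is an illustrative sketch, not code from the slides: the kernel name `hello_kernel` and the 1-block, 4-thread launch configuration are our own choices.

```cuda
#include <cstdio>

// Kernel program: runs on the GPU, one instance per thread.
__global__ void hello_kernel() {
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    // Host program: serial code on the CPU that sets the launch
    // parameters (here, 1 block of 4 threads) and starts the kernel.
    hello_kernel<<<1, 4>>>();

    // Wait for the GPU to finish before the host program exits.
    cudaDeviceSynchronize();
    return 0;
}
```

The `<<<blocks, threads>>>` launch syntax is how the host hands the execution configuration to the kernel; everything before and after it is ordinary serial C++.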
Compiling CUDA Program:
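Compilation is typically driven by NVIDIA's `nvcc` compiler driver, which separates host and device code, compiles the device code for the GPU, and passes the host code to a regular C/C++ compiler. A sketch of the usual workflow (the file name `hello.cu` is an assumption for illustration):

```shell
# Compile a CUDA source file; nvcc splits it into host code
# (built with the system C++ compiler) and device code (built for the GPU).
nvcc hello.cu -o hello

# Run the resulting executable on a machine with a CUDA-capable GPU.
./hello
```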
CUDA Execution Model
● Threads:
○ perform the computations; they run on the Scalar Processors (Streaming Processors) in the GPU.
○ Thousands are needed to get full efficiency.
● Blocks:
○ A group of threads. The number of threads per block can vary from 1 to 1024.
○ Blocks are allotted to Streaming Multiprocessors (SMs) in the GPU.
○ Multiple blocks can reside in one SM.
● Grid:
○ A group of blocks.
○ Holds the complete computation task; the grid represents the kernel.
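The thread/block/grid hierarchy above is what a kernel launch configures. The vector-addition sketch below (the kernel name `vec_add`, the block size of 256, and the use of unified memory are our own illustrative choices, not from the slides) shows how each thread derives its global index from its block and thread coordinates:

```cuda
#include <cstdio>

// Kernel: each thread computes one element of c = a + b.
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    // Global index, built from the block index, block size,
    // and thread index within the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // guard: the grid may overshoot n
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    // Unified memory keeps the host-side setup short.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Grid: enough 256-thread blocks to cover all n elements.
    int threads_per_block = 256;
    int blocks = (n + threads_per_block - 1) / threads_per_block;
    vec_add<<<blocks, threads_per_block>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Note the rounding-up division when computing the block count: the last block may contain threads with indices past `n`, which is why the kernel checks `i < n` before writing.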
Blocks in SMs
THANK YOU
