cs179 2017 Lec01

This document provides an introduction and overview of CS 179: GPU Programming at Caltech. The course will cover GPU computing and parallelization using C++ and CUDA. It lists the TAs, instructors, class times, course requirements including homework, projects, and grading. It provides information about the primary and secondary machines for assignments as well as tips for using your own machine. It also gives a brief history of GPUs and an introduction to GPU concepts like kernels and indexing.

CS 179: GPU Programming

Lecture 1: Introduction

Images: http://en.wikipedia.org
http://www.pcper.com
http://northdallasradiationoncology.com/
GPU Gems (Nvidia)
Administration
Covered topics:
• (GP)GPU computing/parallelization
• C++ CUDA (parallel computing platform)
TAs:
• [email protected]
• Parker Won ([email protected])
• Nailen Matchstick ([email protected])
• Jordan Bonilla ([email protected])
Website:
• http://courses.cms.caltech.edu/cs179/
Overseeing Instructor:
• Al Barr ([email protected])
Class time:
• ANB 105, MWF 3:00 PM
• Recitations on Fridays
Course Requirements

Fill out this survey: https://goo.gl/forms/4GDYz4PtpcBr0qe03

Homework:
• 6 weekly assignments
• Each worth 10% of grade
Final project:
• 4-week project
• 40% of grade total
Pass/Fail (P/F) students must receive at least 60% on every
assignment AND on the final project
Homework

Due on Wednesdays before class (3PM)

Collaboration policy:
• Discuss ideas and strategies freely, but all code must be
your own
Office Hours: Located in ANB 104
• Times: TBA (will be announced before the first set is out)
Extensions
• Ask a TA for one if you have a valid reason
Projects

Topic of your choice

• We will also provide many options
Teams of up to 2 people
• 2-person teams will be held to higher
expectations
Requirements
• Project Proposal
• Progress report(s) and Final Presentation
• More info later…
Machines

Primary machine (multi-GPU):

• Currently being set up. You will gain access shortly after
emailing [email protected]
Secondary machines
• mx.cms.caltech.edu
• minuteman.cms.caltech.edu
• Use your CMS login
• NOTE: Not all assignments work on these machines
Change your password
• Use the passwd command
Machines

Alternative: Use your own machine:

• Must have an NVIDIA CUDA-capable GPU
• Virtual machines won’t work
• Exception: Machines with I/O MMU virtualization and
certain GPUs
• Special requirements for:
• Hybrid/Optimus systems
• Mac OS X
The setup guide on the website is outdated. Do not
follow the 2016 instructions
The CPU

The “Central Processing Unit”

Traditionally, applications use the CPU for primary
calculations
• General-purpose capabilities
• Established technology
• Usually equipped with 8 or fewer powerful cores
• Well suited to concurrent processes, but not to large-scale
parallel computations

Wikimedia commons: Intel_CPU_Pentium_4_640_Prescott_bottom.jpg

The GPU

The "Graphics Processing Unit"

Relatively new technology designed for parallelizable problems
• Initially created specifically for graphics
• Became more capable of general computations
GPUs – The Motivation

Raytracing:
for all pixels (i,j):
    calculate ray point and direction in 3D space
    if ray intersects object:
        calculate lighting at closest object
    store color of (i,j)
Superquadric Cylinders, exponent 0.1, yellow glass balls, Barr, 1981
EXAMPLE

Add two arrays

• A[ ] + B[ ] -> C[ ]

On the CPU:

float *C = malloc(N * sizeof(float));

for (int i = 0; i < N; i++)
    C[i] = A[i] + B[i];
return C;

This operates sequentially… can we do better?

A simple problem…

• On the CPU (multi-threaded, pseudocode):

(allocate memory for C)

Create # of threads equal to number of cores on processor
(around 2, 4, perhaps 8)
(Assign portions of A, B, C to each thread...)

...

In each thread,
    For (i from the start to the end of this thread's region)
        C[i] <- A[i] + B[i]
        //lots of waiting involved for memory reads, writes, ...
Wait for threads to synchronize...

This is somewhat faster: roughly 2-8x (slightly more with other tricks)
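As a hedged sketch of the multi-threaded pseudocode, here is one way to write it with C++11 `std::thread` host code (valid in a `.cu` file and in plain C++). The function name `cpuAddThreaded` and the fallback of 4 threads when the core count is unknown are illustrative assumptions, not code from the course.

```cuda
// Host-only C++11: compiles with nvcc or any C++11 compiler.
#include <algorithm>
#include <thread>
#include <vector>

// Adds A and B into C using one std::thread per hardware thread.
// Each worker handles one contiguous chunk [begin, end) of the arrays.
void cpuAddThreaded(const float *A, const float *B, float *C, int N) {
    unsigned hw = std::thread::hardware_concurrency();
    int numThreads = hw > 0 ? (int)hw : 4;          // assumption: fall back to 4 if unknown
    int chunk = (N + numThreads - 1) / numThreads;  // ceiling division

    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; t++) {
        int begin = t * chunk;
        int end = std::min(N, begin + chunk);
        if (begin >= end) break;  // fewer elements than threads: stop early
        workers.emplace_back([=] {
            for (int i = begin; i < end; i++)
                C[i] = A[i] + B[i];
        });
    }
    // "Wait for threads to synchronize": join every worker before returning.
    for (std::thread &w : workers) w.join();
}
```

With g++ on Linux you may need the `-pthread` flag. The speedup is capped by the number of cores, which is exactly the scaling limit the following questions point at.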

A simple problem…

• How many threads? How does performance scale?

• Context switching:
• High penalty on the CPU
• Not an issue on the GPU (the hardware switches between resident threads at almost no cost)
A simple problem…

• On the GPU:

(allocate memory for A, B, C on GPU)

Create the “kernel” – each thread will perform one (or a few)
additions
Specify the following kernel operation:

    For all i's (indices) assigned to this thread:
        C[i] <- A[i] + B[i]

Start ~20000 (!) threads

Wait for threads to synchronize...
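The GPU pseudocode, in which each thread handles "one (or a few)" indices, is commonly written in CUDA as a grid-stride loop. The sketch below is one possible version, not the course's reference code; the names `addGridStride` and `gpuAddFewPerThread`, the deliberately small launch of 4 blocks of 64 threads, and the use of unified (managed) memory to keep the host side short are all assumptions.

```cuda
#include <cuda_runtime.h>

// Each thread starts at its global index and then jumps ahead by the total
// number of threads in the grid, so a small grid still covers all N elements:
// every thread performs "one (or a few)" additions.
__global__ void addGridStride(const float *A, const float *B, float *C, int N) {
    int stride = blockDim.x * gridDim.x;  // total threads launched
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N; i += stride)
        C[i] = A[i] + B[i];
}

// Hypothetical helper: copies host arrays into managed memory, launches a
// deliberately small grid, waits for the GPU, and copies the result back.
// Returns false if any CUDA call fails.
bool gpuAddFewPerThread(const float *hostA, const float *hostB, float *hostC, int N) {
    float *A = nullptr, *B = nullptr, *C = nullptr;
    size_t bytes = (size_t)N * sizeof(float);
    bool ok = cudaMallocManaged(&A, bytes) == cudaSuccess &&
              cudaMallocManaged(&B, bytes) == cudaSuccess &&
              cudaMallocManaged(&C, bytes) == cudaSuccess;
    if (ok) {
        for (int i = 0; i < N; i++) { A[i] = hostA[i]; B[i] = hostB[i]; }
        addGridStride<<<4, 64>>>(A, B, C, N);  // 256 threads in total
        ok = cudaGetLastError() == cudaSuccess &&
             cudaDeviceSynchronize() == cudaSuccess;  // "wait for threads"
    }
    if (ok)
        for (int i = 0; i < N; i++) hostC[i] = C[i];
    cudaFree(A); cudaFree(B); cudaFree(C);  // cudaFree(nullptr) is a no-op
    return ok;
}
```

Calling `gpuAddFewPerThread(a, b, c, 1000)` runs 256 threads, so each thread performs three or four additions. In practice you would usually launch enough threads for roughly one element each, as the slide's "~20000 threads" suggests.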
GPU: Strengths Revealed

• Emphasis on parallelism means we have lots of cores

• This allows us to run many threads simultaneously with
almost no context-switching cost
GPU Computing: Step by Step

• Set up inputs on the host (CPU-accessible memory)
• Allocate memory for outputs on the host
• Allocate memory for inputs on the GPU
• Allocate memory for outputs on the GPU
• Copy inputs from host to GPU
• Start GPU kernel
• Copy output from GPU to host

NOTE: Copying can be asynchronous
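The steps above can be sketched in CUDA C++ as follows. This is a minimal illustration under assumptions, not the course's code: `addKernel` and `gpuAddExplicit` are hypothetical names, 256 threads per block is an arbitrary common choice, and error handling is reduced to a `bool`. It also frees the GPU buffers at the end, which the list leaves implicit.

```cuda
#include <cuda_runtime.h>

// One thread per element; the bounds check handles N not divisible by the block size.
__global__ void addKernel(const float *A, const float *B, float *C, int N) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) C[i] = A[i] + B[i];
}

// hostA/hostB (inputs) and hostC (output) are assumed to already exist in host memory.
bool gpuAddExplicit(const float *hostA, const float *hostB, float *hostC, int N) {
    if (N <= 0) return true;  // nothing to add
    size_t bytes = (size_t)N * sizeof(float);
    float *devA = nullptr, *devB = nullptr, *devC = nullptr;

    // Allocate memory for inputs and outputs on the GPU.
    bool ok = cudaMalloc(&devA, bytes) == cudaSuccess &&
              cudaMalloc(&devB, bytes) == cudaSuccess &&
              cudaMalloc(&devC, bytes) == cudaSuccess;

    // Copy inputs from host to GPU.
    ok = ok && cudaMemcpy(devA, hostA, bytes, cudaMemcpyHostToDevice) == cudaSuccess
            && cudaMemcpy(devB, hostB, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    // Start the GPU kernel with enough blocks to cover all N elements.
    if (ok) {
        int threads = 256;
        int blocks = (N + threads - 1) / threads;
        addKernel<<<blocks, threads>>>(devA, devB, devC, N);
        ok = cudaGetLastError() == cudaSuccess;
    }

    // Copy output from GPU to host; this blocking cudaMemcpy waits for the kernel.
    ok = ok && cudaMemcpy(hostC, devC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    // Release GPU memory (cudaFree(nullptr) is a no-op).
    cudaFree(devA); cudaFree(devB); cudaFree(devC);
    return ok;
}
```

The asynchronous copying mentioned in the note is usually done with `cudaMemcpyAsync` on a stream, together with pinned host memory from `cudaMallocHost`; the blocking version above keeps the ordering of steps easy to follow.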

The Kernel

• Our “parallel” function

• Given to each thread
• Simple implementation:
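The slide's own kernel code did not survive extraction. As a stand-in, here is a minimal sketch of what a simple vector-addition kernel typically looks like; `cudaAddVectorsKernel`, `callAddVectors`, and the choice of 512 threads per block are hypothetical. `__global__` marks a function that runs on the GPU and is launched from the host; every thread runs the same body on a different index.

```cuda
#include <cuda_runtime.h>

// Kernel: each thread performs exactly one addition, chosen by its index.
__global__ void cudaAddVectorsKernel(const float *a, const float *b, float *c, int n) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    // The last block may contain more threads than remaining elements.
    if (index < n)
        c[index] = a[index] + b[index];
}

// Hypothetical host-side launcher: a, b, c must be device-accessible
// (e.g. from cudaMalloc or cudaMallocManaged). Blocks until the kernel finishes.
cudaError_t callAddVectors(const float *a, const float *b, float *c, int n) {
    if (n <= 0) return cudaSuccess;
    const int threadsPerBlock = 512;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    cudaAddVectorsKernel<<<blocks, threadsPerBlock>>>(a, b, c, n);
    cudaError_t err = cudaGetLastError();
    return err != cudaSuccess ? err : cudaDeviceSynchronize();
}
```

The bounds check is the one detail that is easy to forget: without it, threads in the final, partially filled block would write past the end of `c`.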
Indexing

Can get a block ID and thread ID within the block:

Unique thread ID!
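The code images for this slide are missing. The standard CUDA formula for a unique 1D thread ID is `blockIdx.x * blockDim.x + threadIdx.x`. The sketch below (hypothetical names `recordIndices` and `captureIndices`) has every thread store its block ID and its thread ID within the block at its global index, so the mapping can be inspected on the host.

```cuda
#include <cuda_runtime.h>

// Each thread writes its block ID and in-block thread ID into the slot
// selected by its unique global ID.
__global__ void recordIndices(int *blockIds, int *threadIds, int n) {
    int globalId = blockIdx.x * blockDim.x + threadIdx.x;  // unique across the grid
    if (globalId < n) {
        blockIds[globalId] = blockIdx.x;
        threadIds[globalId] = threadIdx.x;
    }
}

// Hypothetical host helper: launches enough blocks of threadsPerBlock threads
// to cover n slots, then copies both arrays back into host memory.
bool captureIndices(int *hostBlockIds, int *hostThreadIds, int n, int threadsPerBlock) {
    if (n <= 0 || threadsPerBlock <= 0) return false;
    size_t bytes = (size_t)n * sizeof(int);
    int *dBlock = nullptr, *dThread = nullptr;
    bool ok = cudaMalloc(&dBlock, bytes) == cudaSuccess &&
              cudaMalloc(&dThread, bytes) == cudaSuccess;
    if (ok) {
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        recordIndices<<<blocks, threadsPerBlock>>>(dBlock, dThread, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(hostBlockIds, dBlock, bytes, cudaMemcpyDeviceToHost) == cudaSuccess &&
             cudaMemcpy(hostThreadIds, dThread, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dBlock); cudaFree(dThread);
    return ok;
}
```

For example, with 128 threads per block, global index 300 is handled by block 2, thread 44, since 2 * 128 + 44 = 300.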
Calling the Kernel
Calling the Kernel (2)
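The launch code on these slides was also lost. In CUDA a kernel is launched with the `kernel<<<numBlocks, threadsPerBlock>>>(args...)` syntax. The sketch below uses a hypothetical `fillKernel` and helper names; it shows the usual ceiling division for the block count and how launch and execution errors are checked, since the launch statement itself returns nothing.

```cuda
#include <cuda_runtime.h>

// Number of blocks needed so that blocks * threadsPerBlock >= n (ceiling division).
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Demonstration kernel: sets every element of out to value.
__global__ void fillKernel(float *out, float value, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Launches fillKernel on a device-accessible buffer. Launches are asynchronous:
// cudaGetLastError() reports bad launch configurations, and
// cudaDeviceSynchronize() waits for completion and reports execution errors.
cudaError_t launchFill(float *deviceOut, float value, int n, int threadsPerBlock) {
    if (n <= 0) return cudaSuccess;
    fillKernel<<<blocksFor(n, threadsPerBlock), threadsPerBlock>>>(deviceOut, value, n);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}
```

Threads per block is typically a multiple of 32 (the warp size) and may not exceed 1024 on current NVIDIA GPUs; asking for more makes the launch fail with a configuration error.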
Questions?
GPUs – Brief History

• Initially based on graphics-focused fixed-function pipelines
• Pre-set functions, limited options

http://gamedevelopment.tutsplus.com/articles/the-end-of-fixed-function-rendering-pipelines-and-how-to-move-on--cms-21469
Source: Super Mario 64, by Nintendo
GPUs – Brief History

• Shaders
• Could implement one’s own functions!
• GLSL (C-like language)
• Could “sneak in” general-purpose programming!

http://minecraftsix.com/glsl-shaders-mod/
GPUs – Brief History

• “General-purpose computing on GPUs” (GPGPU)

• Hardware has improved to the point where a GPU is
essentially a mini-supercomputer
• CUDA (Compute Unified Device Architecture)
• General-purpose parallel computing platform for NVIDIA GPUs
• OpenCL (Open Computing Language)
• General heterogeneous computing framework
• Both are accessible as extensions to various languages
• If you’re into Python, check out Theano
