
cs179 2017 Lec01

This document provides an introduction and overview of CS 179: GPU Programming at Caltech. The course will cover GPU computing and parallelization using C++ and CUDA. It lists the TAs, instructors, class times, course requirements including homework, projects, and grading. It provides information about the primary and secondary machines for assignments as well as tips for using your own machine. It also gives a brief history of GPUs and an introduction to GPU concepts like kernels and indexing.


CS 179: GPU Programming

Lecture 1: Introduction

Images: http://en.wikipedia.org
http://www.pcper.com
http://northdallasradiationoncology.com/
GPU Gems (Nvidia)
Administration
Covered topics:
• (GP)GPU computing/parallelization
• C++ CUDA (parallel computing platform)
TAs:
[email protected]
• Parker Won ([email protected])
• Nailen Matchstick ([email protected])
• Jordan Bonilla ([email protected])
Website:
• http://courses.cms.caltech.edu/cs179/
Overseeing Instructor:
• Al Barr ([email protected])
Class time:
• ANB 105, MWF 3:00 PM
• Recitations on Fridays
Course Requirements

Fill out this survey: https://goo.gl/forms/4GDYz4PtpcBr0qe03


Homework:
• 6 weekly assignments
• Each worth 10% of grade
Final project:
• 4-week project
• 40% of grade total
Pass/fail (P/F) students must receive at least 60% on every assignment AND on the final project
Homework

Due on Wednesdays before class (3PM)


Collaboration policy:
• Discuss ideas and strategies freely, but all code must be
your own
Office Hours: Located in ANB 104
• Times: TBA (will be announced before first set is out)
Extensions
• Ask a TA for one if you have a valid reason
Projects

Topic of your choice


• We will also provide many options
Teams of up to 2 people
• 2-person teams will be held to higher
expectations
Requirements
• Project Proposal
• Progress report(s) and Final Presentation
• More info later…
Machines

Primary machine (multi-GPU):


• Currently being set up. You will gain access shortly after emailing [email protected]
Secondary machines
• mx.cms.caltech.edu
• minuteman.cms.caltech.edu
• Use your CMS login
• NOTE: Not all assignments work on these machines
Change your password
• Use passwd command
Machines

Alternative: Use your own machine:


• Must have an NVIDIA CUDA-capable GPU
• Virtual machines won’t work
• Exception: Machines with I/O MMU virtualization and
certain GPUs
• Special requirements for:
• Hybrid/Optimus systems
• Mac/OS X
The setup guide on the website is outdated. Do not follow the 2016 instructions
The CPU

The “Central Processing Unit”


Traditionally, applications use the CPU for primary calculations
• General-purpose capabilities
• Established technology
• Usually equipped with 8 or fewer powerful cores
• Well suited to concurrent processes, but not to large-scale parallel computations

Wikimedia commons: Intel_CPU_Pentium_4_640_Prescott_bottom.jpg


The GPU

The "Graphics Processing Unit"


Relatively new technology designed for parallelizable problems
• Initially created specifically for graphics
• Became more capable of general computations
GPUs – The Motivation

Raytracing:
for all pixels (i,j):
    calculate ray point and direction in 3D space
    if ray intersects object:
        calculate lighting at closest object
        store color of (i,j)

Image: Superquadric Cylinders, exponent 0.1, yellow glass balls, Barr, 1981
EXAMPLE

Add two arrays


• A[ ] + B[ ] -> C[ ]

On the CPU:

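#include <stdlib.h>
// add_arrays is an illustrative wrapper name; A and B each hold N floats.
float *add_arrays(const float *A, const float *B, int N) {
    float *C = (float *)malloc(N * sizeof(float));  // caller must free()
    for (int i = 0; i < N; i++)
        C[i] = A[i] + B[i];
    return C;
}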

This operates sequentially… can we do better?


A simple problem…

• On the CPU (multi-threaded, pseudocode):

(allocate memory for C)
Create # of threads equal to number of cores on processor
    (around 2, 4, perhaps 8)
(Indicate portions of A, B, C to each thread...)
...
In each thread:
    for (i in the region assigned to this thread)
        C[i] <- A[i] + B[i]
        // lots of waiting involved for memory reads, writes, ...
Wait for threads to synchronize...

This is slightly faster – 2-8x (slightly more with other tricks)
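The multi-threaded pseudocode above can be sketched in standard C++ with std::thread. This is only a minimal illustration, not the course's reference code: the function name add_arrays_threaded and the even slicing of the arrays are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: adds a and b element-wise using up to num_threads CPU threads.
std::vector<float> add_arrays_threaded(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       unsigned num_threads) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<float> c(n);                  // allocate memory for C
    num_threads = std::max(1u, num_threads);  // guard against a thread count of 0
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceiling division

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        // Each thread writes a disjoint slice of c, so no locking is needed.
        workers.emplace_back([&a, &b, &c, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                c[i] = a[i] + b[i];
        });
    }
    for (std::thread& w : workers)
        w.join();                             // wait for threads to synchronize
    return c;
}
```

A typical call would pass std::thread::hardware_concurrency() as the thread count. That function may return 0 when the core count is unknown, which is why the sketch clamps the count to at least 1.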


A simple problem…

• How many threads? How does performance scale?

• Context switching:
• High penalty on the CPU
• Not an issue on the GPU (thread state stays resident in hardware, so switching between threads is nearly free)
A simple problem…

• On the GPU:

(allocate memory for A, B, C on GPU)
Create the “kernel” – each thread will perform one (or a few) additions
Specify the following kernel operation:
    For all i's (indices) assigned to this thread:
        C[i] <- A[i] + B[i]
Start ~20000 (!) threads
Wait for threads to synchronize...
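To make "for all i's assigned to this thread" concrete, here is a host-only C++ sketch that emulates the per-thread work sequentially. It is not CUDA code and runs no real GPU threads. The names add_for_thread and add_with_logical_threads are hypothetical, and the strided assignment of indices to threads is one common choice assumed here, not something the slides specify.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work done by ONE logical thread: it handles indices
// thread_id, thread_id + num_threads, thread_id + 2 * num_threads, ...
void add_for_thread(std::size_t thread_id, std::size_t num_threads,
                    const std::vector<float>& a, const std::vector<float>& b,
                    std::vector<float>& c) {
    for (std::size_t i = thread_id; i < c.size(); i += num_threads)
        c[i] = a[i] + b[i];
}

// Sequential stand-in for "start ~20000 threads": on a GPU these calls
// would run concurrently; here they simply run one after another.
std::vector<float> add_with_logical_threads(const std::vector<float>& a,
                                            const std::vector<float>& b,
                                            std::size_t num_threads) {
    assert(a.size() == b.size() && num_threads > 0);
    std::vector<float> c(a.size());
    for (std::size_t t = 0; t < num_threads; ++t)
        add_for_thread(t, num_threads, a, b, c);
    return c;
}
```

With more logical threads than elements, the extra threads find no index in range and do nothing. With fewer, each thread performs a few additions, matching the "one (or a few)" wording above.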
GPU: Strengths Revealed

• Emphasis on parallelism means we have lots of cores


• This allows us to run many threads simultaneously with
no context switches
GPU Computing: Step by Step

• Set up inputs on the host (CPU-accessible memory)


• Allocate memory for outputs on the host
• Allocate memory for inputs on the GPU
• Allocate memory for outputs on the GPU
• Copy inputs from host to GPU
• Start GPU kernel
• Copy output from GPU to host

NOTE: Copying can be asynchronous
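The order of these steps can be walked through in a host-only C++ sketch in which ordinary vectors stand in for GPU memory. Everything here is an assumption for illustration: run_pipeline and fake_kernel are hypothetical names, and no GPU is involved. In real CUDA code the "device" allocations would typically be made with cudaMalloc and the copies with cudaMemcpy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the GPU kernel; in this sketch it just runs on the CPU.
void fake_kernel(const float* a, const float* b, float* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// host_a and host_b are the inputs already set up on the host (step 1).
std::vector<float> run_pipeline(const std::vector<float>& host_a,
                                const std::vector<float>& host_b) {
    assert(host_a.size() == host_b.size());
    const std::size_t n = host_a.size();

    std::vector<float> host_c(n);            // 2. allocate outputs on the host
    std::vector<float> dev_a(n), dev_b(n);   // 3. allocate inputs on the "GPU"
    std::vector<float> dev_c(n);             // 4. allocate outputs on the "GPU"

    std::copy(host_a.begin(), host_a.end(), dev_a.begin());  // 5. host -> "GPU"
    std::copy(host_b.begin(), host_b.end(), dev_b.begin());

    fake_kernel(dev_a.data(), dev_b.data(), dev_c.data(), n);  // 6. start kernel

    std::copy(dev_c.begin(), dev_c.end(), host_c.begin());   // 7. "GPU" -> host
    return host_c;
}
```

The point of keeping separate host and "device" buffers is that the kernel only ever touches device-side data, so every input must be copied over before the launch and every result copied back afterwards.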


The Kernel

• Our “parallel” function


• Given to each thread
• Simple implementation:
Indexing

Can get a block ID and thread ID within the block:


Unique thread ID!
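One common way to combine a block ID and a thread ID into a unique index is block_id * block_size + thread_id, which in 1-D CUDA code is usually written blockIdx.x * blockDim.x + threadIdx.x. The host-only C++ sketch below emulates that arithmetic with plain loops. The function names are hypothetical and nothing here runs on a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the usual 1-D formula: blockIdx.x * blockDim.x + threadIdx.x.
std::size_t global_index(std::size_t block_id, std::size_t block_dim,
                         std::size_t thread_id) {
    return block_id * block_dim + thread_id;
}

// Number of blocks needed to cover n elements (ceiling division).
std::size_t blocks_needed(std::size_t n, std::size_t block_dim) {
    return (n + block_dim - 1) / block_dim;
}

// Emulates launching blocks_needed(n, block_dim) blocks of block_dim threads,
// with each logical thread adding at most one element.
std::vector<float> add_by_blocks(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 std::size_t block_dim) {
    assert(a.size() == b.size() && block_dim > 0);
    const std::size_t n = a.size();
    std::vector<float> c(n);
    const std::size_t num_blocks = blocks_needed(n, block_dim);
    for (std::size_t block = 0; block < num_blocks; ++block) {
        for (std::size_t thread = 0; thread < block_dim; ++thread) {
            const std::size_t i = global_index(block, block_dim, thread);
            if (i < n)  // the last block may have threads past the end of the data
                c[i] = a[i] + b[i];
        }
    }
    return c;
}
```

The bounds check matters whenever the array length is not a multiple of the block size: the last block is launched in full, so some of its threads have no element to process.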
Calling the Kernel
Calling the Kernel (2)
Questions?
GPUs – Brief History

• Initially based on graphics-focused fixed-function pipelines
• Pre-set functions, limited options

http://gamedevelopment.tutsplus.com/articles/the-end-of-fixed-function-rendering-pipelines-and-how-to-move-on--cms-21469
Source: Super Mario 64, by Nintendo
GPUs – Brief History

• Shaders
• Could implement one’s own functions!
• GLSL (C-like language)
• Could “sneak in” general-purpose programming!

http://minecraftsix.com/glsl-shaders-mod/
GPUs – Brief History

• “General-purpose computing on GPUs” (GPGPU)


• Hardware has become good enough that a GPU is basically a mini-supercomputer
• CUDA (Compute Unified Device Architecture)
• General-purpose parallel computing platform for NVIDIA GPUs
• OpenCL (Open Computing Language)
• General heterogeneous computing framework
• Both are accessible as extensions to various languages
• If you’re into Python, check out Theano
