COE4590_16_GPU2

The document discusses the architecture of Graphics Processing Units (GPUs), focusing on the organization of threads, blocks, and grids for parallel processing. It explains how threads are lightweight and grouped into blocks, which are further organized into grids, allowing for efficient data processing. The document also illustrates the execution of threads in a multi-threaded operation using an example of counting occurrences of a number in an array.

Distributed and Parallel Systems

(Lecture 16)
Graphics Processing Unit (GPU)
(Lecture II)

By
Abdus Samad
Threads, Blocks, and Grids
 A thread is associated with each data element.
 A GPU supports thousands of lightweight threads, whereas a CPU runs only a few heavyweight ones.
 Threads are organized into blocks.
 A thread block contains many threads; each thread processes its own element(s).
 Blocks are organized into a grid. Blocks are executed independently and in any order.
 Thread management is handled by the GPU hardware, not by applications or the OS.
How Does a Thread Work?
 Example: how many times does 6 appear in an array of 16 elements?
 In single-threaded operation, the 16 numbers are scanned sequentially and each match increments a counter.
 In multi-threaded operation, suppose there are 4 threads.
 Each thread examines 4 elements, with all threads running on one GPU.
 There is one grid consisting of one block.
How Does a Thread Work?
3 6 7 5 3 5 6 2 9 1 2 7 0 9 3 6

 Instead of a sequential (block) data distribution, a cyclic data distribution is used here:
 Thread 0 examines array elements 0, 4, 8, 12
 Thread 1 examines array elements 1, 5, 9, 13
 Thread 2 examines array elements 2, 6, 10, 14
 Thread 3 examines array elements 3, 7, 11, 15
Threads, Blocks, and Grids
 Thread: consists of 32 elements.
 Block: consists of 16 threads, covering 512 elements.
 Grid: consists of 16 blocks, covering 8192 elements, i.e., all elements of the problem.
 Thread Block:
 Analogous to a strip-mined vector loop with a vector length of 32.
 Breaks the vector down into manageable sets of vector elements.
 32 elements/thread x 16 SIMD threads/block = 512 elements/block
 A SIMD instruction executes 32 elements at a time
 Grid size = 8192 vector elements / 512 elements/block = 16 blocks
Threads, Blocks, and Grids
 Threads are grouped into thread blocks.
 Blocks are grouped into a single grid.
 The grid is executed on the GPU as a kernel.
 A kernel is executed as a grid of thread blocks.
 Threads from the same block cooperate and share a memory space.
Threads, Blocks, and Grids
 A thread block is a batch of threads that can cooperate with each other by:
 synchronizing their execution for hazard-free shared memory accesses;
 efficiently sharing data through a low-latency shared memory.
 Threads from different blocks cannot cooperate or share data.
Block and Thread IDs
 Threads and blocks have IDs:
 Block IDs are either 1D or 2D: Block (x, y)
 Thread IDs may be 1D, 2D, or 3D: Thread (x, y, z)
 This simplifies memory addressing when processing multidimensional data.
 32 elements per thread
 16 threads/block
 512 elements/block
 The grid covers the entire code
 Grid size = 8192 elements
 This requires 8192 / 512 (elements/block) = 16 blocks
 Each block is assigned to a multithreaded processor
Sample GPU SIMT Code (Reductions)
 Example: transform a serial summation into a parallel (tree) reduction.
 The first two operations are done in parallel on different GPU threads.
 The last operation is done serially.
Thanks
