Data Parallelism, Task Parallelism, CPU, GPU (1)
Analysis, Terminology
Group Members
Maasma Zari (2022-CS-504)
Zameer ul Hassan (2022-CS-540)
Haris Khan (2022-CS-556)
What is a GPU?
• Originally built for rendering graphics.
• Now used in AI, scientific simulations, video editing, etc.
• Can handle thousands of simple parallel tasks at the same time.
Architecture of GPU
// Element-wise vector addition: c[i] = a[i] + b[i].
// One thread per element; launched as <<<ceil(n/256), 256>>> below, so the
// bounds check guards the tail threads when n is not a multiple of blockDim.x.
__global__ void addKernelGPU(int *c, const int *a, const int *b, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Sequential CPU reference implementation, used both for timing comparison
// and to verify the GPU result.
void addCPU(int *c, const int *a, const int *b, int n) {
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

int main() {
    const int N = 1 << 20;            // ~1 million elements
    size_t size = N * sizeof(int);    // bytes per array

    int *a = new int[N];
    int *b = new int[N];
    int *c_cpu = new int[N];
    int *c_gpu = new int[N];

    for (int i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = i * 2;
    }

    int *dev_a = nullptr;
    int *dev_b = nullptr;
    int *dev_c = nullptr;
    // Device allocations were missing from the original listing; without them
    // the cudaMemcpy calls and the kernel launch below operate on null
    // pointers and every CUDA call fails.
    cudaMalloc(&dev_a, size);
    cudaMalloc(&dev_b, size);
    cudaMalloc(&dev_c, size);

    cudaMemcpy(dev_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, size, cudaMemcpyHostToDevice);

    // CPU execution
    auto start_cpu = std::chrono::high_resolution_clock::now();
    addCPU(c_cpu, a, b, N);
    auto end_cpu = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> cpu_duration = end_cpu - start_cpu;
    std::cout << "CPU Time: " << cpu_duration.count() << " seconds"
              << std::endl;

    // GPU execution. (N + 255) / 256 is ceil-division so every element gets
    // a thread even though N need not divide evenly by the block size.
    auto start_gpu = std::chrono::high_resolution_clock::now();
    addKernelGPU<<<(N + 255) / 256, 256>>>(dev_c, dev_a, dev_b, N);
    // Kernel launches are asynchronous and report config errors separately:
    // fetch the launch status, then block until the kernel has finished so
    // the timer measures execution rather than just the launch call.
    cudaError_t launch_status = cudaGetLastError();
    cudaDeviceSynchronize();
    cudaMemcpy(c_gpu, dev_c, size, cudaMemcpyDeviceToHost);
    auto end_gpu = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> gpu_duration = end_gpu - start_gpu;
    std::cout << "GPU Time: " << gpu_duration.count() << " seconds"
              << std::endl;
    if (launch_status != cudaSuccess) {
        std::cout << "Kernel launch failed: "
                  << cudaGetErrorString(launch_status) << std::endl;
    }

    // Verify the GPU produced the same result as the CPU reference.
    bool match = true;
    for (int i = 0; i < N; ++i) {
        if (c_cpu[i] != c_gpu[i]) {
            match = false;
            break;
        }
    }
    std::cout << (match ? "Results match" : "Results DO NOT match")
              << std::endl;

    // Cleanup
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    delete[] a;
    delete[] b;
    delete[] c_cpu;
    delete[] c_gpu;

    return 0;
}
Output: