Data Parallelism, Task Parallelism, CPU, GPU

The document compares CPUs and GPUs, highlighting their architectures and performance capabilities. CPUs are optimized for sequential processing with fewer, more powerful cores, while GPUs excel at parallel processing with many simpler cores. Performance metrics such as peak computational performance and memory bandwidth are discussed, along with example code comparing CPU and GPU execution times.


CPU vs GPU, Performance Analysis, Terminology

Group Members:
Maasma Zari (2022-CS-504)
Zameer ul Hassan (2022-CS-540)
Haris Khan (2022-CS-556)
What is a CPU?

•The CPU (Central Processing Unit) is known as the brain of the computer.
•Optimized for sequential (serial) processing.
•Excellent at handling complex instructions and a few tasks at high speed.

Example Tasks:
•Running operating systems
•Managing input/output operations
•Handling background processes
Architecture of CPU

•Few cores (typically 4 to 32).
•Each core is very powerful and complex.
•Large cache memory.
•Focus on minimizing latency (responding quickly).

Diagram: (Simple core architecture illustration)
What is a GPU?

•The GPU (Graphics Processing Unit) is designed for massively parallel tasks.
•Originally built for rendering graphics.
•Now used in AI, scientific simulations, video editing, etc.
•Can handle thousands of simple tasks at the same time.
Architecture of GPU

•Hundreds to thousands of simple cores.
•Smaller cache memory per core.
•High throughput (focus on executing many operations at once).
•Ideal for data parallelism.

Diagram: (Grid of many small cores)
Code:
#include <iostream>
#include <cuda_runtime.h>
#include <chrono>

__global__ void addKernelGPU(int *c, const int *a, const int *b, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

void addCPU(int *c, const int *a, const int *b, int n) {
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

int main() {
    const int N = 1 << 20; // ~1 million elements
    size_t size = N * sizeof(int);

    int *a = new int[N];
    int *b = new int[N];
    int *c_cpu = new int[N];
    int *c_gpu = new int[N];

    for (int i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = i * 2;
    }

    int *dev_a = nullptr;
    int *dev_b = nullptr;
    int *dev_c = nullptr;
    cudaMalloc((void**)&dev_a, size);
    cudaMalloc((void**)&dev_b, size);
    cudaMalloc((void**)&dev_c, size);

    cudaMemcpy(dev_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, size, cudaMemcpyHostToDevice);

    // CPU Execution
    auto start_cpu = std::chrono::high_resolution_clock::now();
    addCPU(c_cpu, a, b, N);
    auto end_cpu = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> cpu_duration = end_cpu - start_cpu;
    std::cout << "CPU Time: " << cpu_duration.count() << " seconds" << std::endl;

    // GPU Execution (the device-to-host cudaMemcpy synchronizes,
    // so the measured time includes the kernel's completion)
    auto start_gpu = std::chrono::high_resolution_clock::now();
    addKernelGPU<<<(N + 255) / 256, 256>>>(dev_c, dev_a, dev_b, N);
    cudaMemcpy(c_gpu, dev_c, size, cudaMemcpyDeviceToHost);
    auto end_gpu = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> gpu_duration = end_gpu - start_gpu;
    std::cout << "GPU Time: " << gpu_duration.count() << " seconds" << std::endl;

    // Cleanup
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    delete[] a;
    delete[] b;
    delete[] c_cpu;
    delete[] c_gpu;

    return 0;
}
Output:

CPU Time: 0.06 seconds
GPU Time: 0.002 seconds
Metric: Description

Peak Computational Performance (GFLOPS): Measures the number of floating-point operations per second (in billions) that a system can achieve.
Memory Bandwidth (GB/sec): Measures how fast data can be moved between memory and the processor.
Efficiency Ratio: How effectively the processor reads/writes data from/to memory.
Term: Meaning

Host: CPU and its memory
Device: GPU and its memory
