---
Key Drivers:
Speed and Efficiency: Sequential computers are reaching physical limits (clock speed, heat dissipation).
Real-time Processing Needs: Applications such as autonomous driving, medical diagnosis, and stock market simulation must deliver results within strict time limits.
---
Parallel Systems:
Shared Memory Systems: All processors share a global memory (e.g., multicore desktops).
Distributed Memory Systems: Each processor has its own memory (e.g., computer clusters).
Emerging Trends:
Multi-core CPUs
Types of Parallelism:
Data Parallelism: Same operation executed simultaneously on multiple data elements (e.g., SIMD processing).
---
RAM Model: A single processor with direct access to a random-access memory; the standard sequential model of computation.
PRAM Model: Parallel Random Access Machine, an idealized shared-memory machine with multiple processors operating synchronously.
Types: EREW, CREW, CRCW (distinguished by whether concurrent reads and writes to the same memory location are allowed).
---
Flynn's Classification (by instruction and data streams):
Diagram:
+--------------------+
| Instruction Stream |
| Data Stream |
+--------------------+
---
Problem Decomposition:
Represent the computation as a task dependency graph:
Nodes = tasks
Edges = dependencies
Granularity: the amount of computation per task relative to communication overhead (fine-grained vs. coarse-grained).
---
✅ UNIT I Completed!
---
What is OpenMP?
OpenMP is an API of compiler directives, runtime library routines, and environment variables for shared-memory parallel programming in C, C++, and Fortran.
Programs start with a single thread (the master thread), which forks additional threads when a parallel region is entered.
Key Features:
---
7.2 The OpenMP Execution Model
Fork-Join Model: Execution begins with one master thread; at each parallel region the master forks a team of threads, and the threads join back into the master at the end of the region.
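A minimal sketch of the fork-join behaviour (assuming a compiler with OpenMP support, e.g. gcc -fopenmp):

#include <omp.h>
#include <stdio.h>

int main() {
    printf("Before the parallel region: one thread\n");   // sequential part (master only)

    #pragma omp parallel                                   // fork: a team of threads runs this block
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }                                                      // join: threads synchronize, master continues

    printf("After the parallel region: one thread again\n");
    return 0;
}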
---
Work Sharing Constructs:

| Directive | Purpose |
|:---|:---|
| #pragma omp for | Parallelize loops |
| #pragma omp sections | Parallelize different code blocks |
| #pragma omp single | Code executed by only one thread |
| #pragma omp master | Code executed only by the master thread |
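A small illustrative use of the sections construct (the two blocks of work are placeholders):

#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("Section A handled by thread %d\n", omp_get_thread_num());

        #pragma omp section
        printf("Section B handled by thread %d\n", omp_get_thread_num());
    }
    return 0;
}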
---
Data Sharing Clauses:

| Clause | Meaning |
|:---|:---|
| shared(variable) | Variable shared by all threads |
| private(variable) | Each thread has its own (uninitialized) copy |
| firstprivate(variable) | Private, with the initial value taken from the master |
| lastprivate(variable) | Private; the value from the last iteration is copied back after the loop |
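A minimal sketch showing private and firstprivate (variable names are illustrative):

#include <omp.h>
#include <stdio.h>

int main() {
    int base = 10;      // each thread gets a copy initialized to 10 (firstprivate)
    int scratch = 0;    // each thread gets its own uninitialized copy (private)

    #pragma omp parallel private(scratch) firstprivate(base)
    {
        scratch = base + omp_get_thread_num();
        printf("Thread %d: scratch = %d\n", omp_get_thread_num(), scratch);
    }
    return 0;
}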
---
Scheduling Types:

| Type | Description |
|:---|:---|
| Static | Loop iterations divided evenly among threads beforehand |
| Dynamic | Iterations assigned to threads at run time as they finish |
| Guided | Dynamic, but the chunk size decreases exponentially |
Example:
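A sketch of the schedule clause (N and the per-iteration work are placeholders):

#pragma omp parallel for schedule(dynamic, 4)
for (int i = 0; i < N; i++) {
    process(i);   // hypothetical work whose cost varies per iteration
}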
---
Reductions: Each thread accumulates into its own private copy of the variable; the copies are combined with the given operator when the loop ends.

int sum = 0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < N; i++) {
    sum += array[i];
}
---
Nested Parallelism:
Parallel regions inside other parallel regions.
Controlled via:
omp_set_nested(1);   // or the OMP_NESTED environment variable
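A minimal sketch of a nested region (thread counts are illustrative):

#include <omp.h>
#include <stdio.h>

int main() {
    omp_set_nested(1);                        // enable nested parallelism
    #pragma omp parallel num_threads(2)
    {
        int outer = omp_get_thread_num();
        #pragma omp parallel num_threads(2)   // inner region forks its own team
        {
            printf("outer thread %d, inner thread %d\n", outer, omp_get_thread_num());
        }
    }
    return 0;
}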
---
Key Topics:
---
C Example - Parallel Sum using Reduction
#include <omp.h>
#include <stdio.h>

int main() {
    int sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 100; i++)
        sum += i;
    printf("Sum = %d\n", sum);   // 0 + 1 + ... + 99 = 4950
    return 0;
}
C++ Example - Parallel Vector Addition
#include <omp.h>
#include <iostream>
using namespace std;

int main() {
    int a[5] = {1,2,3,4,5}, b[5] = {5,4,3,2,1}, c[5];
    #pragma omp parallel for
    for (int i = 0; i < 5; i++)
        c[i] = a[i] + b[i];
    for (int i = 0; i < 5; i++)
        cout << c[i] << " ";   // prints: 6 6 6 6 6
    return 0;
}
---
✅ UNIT II Completed!
---
1. Completely Connected Network
Diagram:
P0 --- P1
| \   / |
|  \ /  |
|  / \  |
| /   \ |
P2 --- P3
---
2. Linear Array
Diagram:
P0 — P1 — P2 — P3
---
3. Meshes
P0 — P1
| |
P2 — P3
---
Broadcast Example
One process (the root) sends the same data to every other process in the communicator.
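A minimal MPI_Bcast sketch (the broadcast value 42 is illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 0;
    if (rank == 0)
        value = 42;                                      // only the root holds the value initially

    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);    // root (rank 0) broadcasts to all

    printf("Rank %d now has value %d\n", rank, value);
    MPI_Finalize();
    return 0;
}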
---
Barrier Synchronization
Every process waits at the barrier until all processes in the communicator have reached it:
MPI_Barrier(MPI_COMM_WORLD);
---
6.5 Reduction Operations
Combine values from all processes and return the result to one process.
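A minimal MPI_Reduce sketch (each rank contributes rank + 1, and rank 0 receives the sum):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local = rank + 1;   // each process contributes one value
    int total = 0;

    // Sum the contributions from all processes; the result arrives on rank 0.
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum over %d processes = %d\n", size, total);

    MPI_Finalize();
    return 0;
}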
---
Topics include:
Derived Datatypes
Communicator Groups
Virtual Topologies
---
C Example - Hello World with Process Ranks
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    printf("Hello from rank %d\n", world_rank);
    MPI_Finalize();
    return 0;
}
C++ Example - Simple Send/Receive
#include <mpi.h>
#include <iostream>
using namespace std;

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        int data = 100;
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int data;
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cout << "Received data: " << data << endl;
    }
    MPI_Finalize();
    return 0;
}
---
Parallel BFS (Breadth-First Search)
Parallelization Strategy:
At each level, processors examine the neighbors of the current frontier in parallel and collectively build the next level.
Algorithm Flow:
Challenges:
---
Pseudocode:
#pragma omp parallel for
for (int i = 0; i < frontier_size; i++) {
    for (each neighbor v of frontier[i]) {
        if (!visited[v]) {            // in practice this test-and-set must be atomic
            visited[v] = true;
            add v to next_frontier;   // concurrent appends need synchronization
        }
    }
}
---
Distributed Approach:
Combine the new frontier across all processors after each level.
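A hypothetical sketch of combining the per-rank frontiers with MPI_Allgatherv (local_count and local_next here are stand-ins for the vertices each rank actually discovered during the level):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int local_count = rank + 1;                   // pretend this rank found rank+1 new vertices
    int local_next[64];
    for (int i = 0; i < local_count; i++) local_next[i] = 100 * rank + i;

    int* counts = malloc(nprocs * sizeof(int));
    int* displs = malloc(nprocs * sizeof(int));

    // 1. Every rank learns how many new vertices each rank found.
    MPI_Allgather(&local_count, 1, MPI_INT, counts, 1, MPI_INT, MPI_COMM_WORLD);

    // 2. Compute displacements and the total size of the next frontier.
    int total = 0;
    for (int p = 0; p < nprocs; p++) { displs[p] = total; total += counts[p]; }

    // 3. Gather every rank's new vertices into one global next frontier.
    int* next_frontier = malloc(total * sizeof(int));
    MPI_Allgatherv(local_next, local_count, MPI_INT,
                   next_frontier, counts, displs, MPI_INT, MPI_COMM_WORLD);

    if (rank == 0) printf("Next frontier holds %d vertices\n", total);
    // An empty combined frontier (total == 0) signals that the traversal is done.

    free(counts); free(displs); free(next_frontier);
    MPI_Finalize();
    return 0;
}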
Optimization:
---
Parallel Matrix Multiplication
Simple 2D Decomposition:
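A minimal OpenMP sketch of the idea: the (i, j) entries of C = A x B are distributed across threads (N, the matrices, and the collapse(2) distribution are illustrative; a full 2D decomposition would assign each thread a rectangular tile of C):

#include <omp.h>
#include <stdio.h>
#define N 4

int main() {
    double A[N][N], B[N][N], C[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = i + j;
            B[i][j] = (i == j) ? 1.0 : 0.0;   // identity matrix, so C should equal A
        }

    // Distribute the (i, j) index pairs of the output matrix across threads.
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }

    printf("C[1][2] = %.1f (expected 3.0)\n", C[1][2]);
    return 0;
}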
---
Paper Title:
"A Naïve Breadth First Search Approach Incorporating Parallel Processing Technique for Optimal Network Traversal"
Key Ideas:
---
✅ UNIT IV Completed!
---