
📚 UNIT I — Introduction to Parallel Computing

(Detailed Notes — According to your Guidelines)

---

Chapter 1: 1.1, 1.2 — Introduction

1.1 Motivating Parallelism

Why Parallel Computing?

Speed and Efficiency: Sequential computers are reaching physical limits (clock speed, heat dissipation).

Large-scale Problems: Weather prediction, scientific simulations, and AI applications demand massive computation.

Real-time Processing Needs: Autonomous driving, medical diagnosis, stock market simulations.

Key Drivers:

Growth of Big Data.

Need for High Performance Computing (HPC).

Energy efficiency (doing more computations per watt).

---

1.2 Scope of Parallel Computing

Parallel Systems:

Shared Memory Systems: All processors share a global memory (e.g., multicore desktops).

Distributed Memory Systems: Each processor has its own memory (e.g., computer clusters).
Emerging Trends:

Multi-core CPUs

GPUs with thousands of cores (massively parallel)

Hybrid Systems (CPU + GPU collaboration)

Types of Parallelism:

Fine-grained: High communication frequency.

Coarse-grained: Low communication needs, tasks independent.

---

Chapter 2: 2.1, 2.2, 2.3, 2.4.1 — Fundamentals

---

2.1 Models of Computation

RAM Model:

The simplest model: a single sequential processor with unit-cost access to an unbounded (idealized) memory.

PRAM Model:

Parallel Random Access Machine.

Processors operate in parallel and share memory.

Types:

EREW: Exclusive Read Exclusive Write

CREW: Concurrent Read Exclusive Write


CRCW: Concurrent Read Concurrent Write

Message Passing Model:

Processors have their private memory and communicate via messages.

MPI (Message Passing Interface) is based on this.

Data Parallel Model:

Same operation executed simultaneously on multiple data elements (e.g., SIMD processing).

---

2.2 Flynn's Classification

Flynn classifies architectures by the number of instruction streams and data streams:

| Class | Meaning | Typical Example |
|:---|:---|:---|
| SISD | Single Instruction, Single Data | Conventional uniprocessor |
| SIMD | Single Instruction, Multiple Data | Vector processors, GPUs |
| MISD | Multiple Instruction, Single Data | Rare; fault-tolerant pipelines |
| MIMD | Multiple Instruction, Multiple Data | Multicore CPUs, clusters |

---

2.3 Parallel Algorithm Design

Problem Decomposition:

Task Decomposition: Break into independent tasks.

Data Decomposition: Partition input data.

Task Dependency Graph (DAG):

Nodes = tasks
Edges = dependencies

Granularity:

Fine-grained: smaller tasks, high communication.

Coarse-grained: larger tasks, low communication (see the sketch below).
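
A minimal sequential C sketch of coarse-grained data decomposition (the array size, number of workers, and partial-sum task are illustrative, not from the notes): the input is split into a few large contiguous chunks, one per worker, so communication would only be needed to combine the partial results.

#include <stdio.h>

#define N 16
#define P 4   /* number of workers (illustrative) */

int main(void) {
    int data[N];
    for (int i = 0; i < N; i++) data[i] = i;

    int partial[P];
    for (int w = 0; w < P; w++) {
        int lo = w * (N / P), hi = lo + (N / P);   /* worker w's contiguous chunk */
        partial[w] = 0;
        for (int i = lo; i < hi; i++)
            partial[w] += data[i];                 /* in a parallel version, each chunk
                                                      runs on a different thread/process */
    }

    int total = 0;
    for (int w = 0; w < P; w++) total += partial[w];   /* combine partial results */
    printf("Total = %d\n", total);
    return 0;
}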

---

2.4.1 Sources of Overheads

Interprocess communication: time spent exchanging data between tasks.

Idling: processors waiting because of load imbalance, synchronization, or serial sections.

Excess computation: extra work a parallel algorithm performs that the best serial algorithm would not.

---

Mind Map for Unit I

Introduction to Parallel Computing


├── Motivation
├── Scope
├── Models of Computation
│ ├── RAM, PRAM
│ ├── Message Passing
│ └── Data Parallel
├── Flynn's Classification
└── Parallel Algorithm Design
├── Decomposition
└── Overheads

---

✅ UNIT I Completed!
---

📚 UNIT II — OpenMP Programming for Shared Memory Systems


(Detailed Notes — According to your Guidelines)

---

Chapter 7: 7.1, 7.2, 7.10 — Basics of OpenMP

---

7.1 Parallel Programming with OpenMP

What is OpenMP?

API for shared-memory parallelism.

Provides compiler directives, library routines, and environment variables.

Programs start with a single thread (master), then fork additional threads.

Key Features:

Easy to parallelize sequential programs.

Fine control over threads.

Supports C, C++, and Fortran.

Basic Syntax (C/C++):

#pragma omp parallel
{
    printf("Hello from thread %d\n", omp_get_thread_num());
}
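
To build such a program, OpenMP must be enabled at compile time, e.g. with GCC:

gcc -fopenmp hello.c -o hello

(The flag differs for other compilers; hello.c is an illustrative file name.)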

---
7.2 The OpenMP Execution Model

Fork-Join Model:

Master thread forks a team of threads at a parallel region.

After completing, threads synchronize and join back.

Diagram:

Start (Master Thread)
        |
Fork    |-----> Thread 1
        |-----> Thread 2
        |-----> Thread 3
        |
Join (all threads finished)
        |
End

---

Work Sharing Constructs:

| Directive | Purpose |
|:---|:---|
| #pragma omp for | Parallelize loops |
| #pragma omp sections | Parallelize different code blocks |
| #pragma omp single | Code executed by only one thread |
| #pragma omp master | Code executed only by master thread |
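
A minimal sketch combining these constructs (thread counts and messages are illustrative):

#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel
    {
        /* Different blocks run on different threads. */
        #pragma omp sections
        {
            #pragma omp section
            printf("Section A on thread %d\n", omp_get_thread_num());
            #pragma omp section
            printf("Section B on thread %d\n", omp_get_thread_num());
        }

        /* Executed by exactly one (unspecified) thread; implicit barrier afterwards. */
        #pragma omp single
        printf("Single block on thread %d\n", omp_get_thread_num());

        /* Executed only by the master thread; no implied barrier. */
        #pragma omp master
        printf("Master block on thread %d\n", omp_get_thread_num());
    }
    return 0;
}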

---

Data Sharing Clauses:

| Clause | Meaning |
|:---|:---|
| shared(variable) | Variable shared by all threads |
| private(variable) | Each thread has its own copy |
| firstprivate(variable) | Private with initial value from master |
| lastprivate(variable) | Value copied back to master after loop |
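
A small illustrative sketch of these clauses (variable names are assumptions): seed is copied into each thread with firstprivate, while updates to the shared total are synchronized with an atomic directive.

#include <omp.h>
#include <stdio.h>

int main(void) {
    int shared_total = 0;   /* shared by all threads */
    int seed = 10;          /* copied into each thread via firstprivate */

    #pragma omp parallel for shared(shared_total) firstprivate(seed)
    for (int i = 0; i < 4; i++) {
        int local = seed + i;      /* declared inside the loop, so private to each thread */
        #pragma omp atomic
        shared_total += local;     /* updates to the shared variable need synchronization */
    }

    printf("shared_total = %d\n", shared_total);
    return 0;
}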

---

Example - Parallel For Loop:

#pragma omp parallel for
for (int i = 0; i < 10; i++) {
    printf("%d ", i);
}
---

7.10 Performance Issues in OpenMP

Thread creation and fork/join overhead for small parallel regions.

Synchronization costs: barriers, critical sections, and locks serialize work.

Load imbalance between threads (mitigated by dynamic or guided scheduling).

False sharing: threads updating different variables that sit on the same cache line.

---

Chapter 17 — Advanced OpenMP Concepts (Quinn)

---

Scheduling Types:

| Type | Description |
|:---|:---|
| Static | Divide loop iterations evenly beforehand |
| Dynamic | Assign iterations to threads dynamically |
| Guided | Dynamic, but chunk size reduces exponentially |

Example:

#pragma omp parallel for schedule(dynamic, 4)
for (int i = 0; i < N; i++) {
    do_work(i);
}

---

Reductions:

Perform collective operations like sum, max, etc., over threads.

int sum = 0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < N; i++) {
    sum += array[i];
}

---

Nested Parallelism:
Parallel regions inside other parallel regions.

Controlled via:

omp_set_nested(1);
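
A minimal nested-parallelism sketch (thread counts are illustrative; newer OpenMP versions prefer omp_set_max_active_levels over the older omp_set_nested routine):

#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_set_nested(1);   /* enable nested parallel regions */

    #pragma omp parallel num_threads(2)
    {
        int outer = omp_get_thread_num();
        #pragma omp parallel num_threads(2)   /* inner region forks its own team */
        printf("outer thread %d, inner thread %d\n", outer, omp_get_thread_num());
    }
    return 0;
}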

---

Yogesh Sabharwal’s Lectures (17, 27–31)

Key Topics:

Introduction to OpenMP environment variables.

OpenMP critical sections (#pragma omp critical).

Atomic operations for low-overhead updates (contrasted with critical sections in the sketch below).

Practical examples of matrix multiplication and BFS using OpenMP.
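
A small sketch contrasting the two forms of mutual exclusion mentioned above (the loop and variables are illustrative): atomic protects a single scalar update cheaply, while critical serializes an arbitrary block of code.

#include <omp.h>
#include <stdio.h>

int main(void) {
    int counter = 0;
    double expensive = 0.0;

    #pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
        /* atomic: low-overhead protection for a single simple update */
        #pragma omp atomic
        counter++;

        /* critical: general mutual exclusion for a larger block of code */
        #pragma omp critical
        {
            expensive += i * 0.5;
        }
    }

    printf("counter = %d, expensive = %f\n", counter, expensive);
    return 0;
}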

---

🛠 Sample Programs: OpenMP (C & C++)


C Example - Parallel Sum:

#include <omp.h>
#include <stdio.h>

int main() {
    int sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 100; i++)
        sum += i;
    printf("Sum = %d\n", sum);
    return 0;
}

C++ Example - Parallel Array Addition:

#include <omp.h>
#include <iostream>
using namespace std;

int main() {
    int a[5] = {1,2,3,4,5}, b[5] = {5,4,3,2,1}, c[5];
    #pragma omp parallel for
    for (int i = 0; i < 5; i++)
        c[i] = a[i] + b[i];
    for (int i = 0; i < 5; i++)
        cout << c[i] << " ";
    return 0;
}

---

✅ UNIT II Completed!
---


📚 UNIT III — MPI Programming for Distributed Memory Systems


(Detailed Notes — According to your Guidelines)

---

Chapter 2: 2.4.3 — Network Topologies (Grama et al.)


---

2.4.3 Types of Interconnection Networks

1. Completely Connected Network

Every processor connected to every other processor.

Fastest communication (only one hop).

Costly: requires p(p-1)/2 links for p processors.

Diagram:

P0 --- P1
| \   / |
|   X   |
| /   \ |
P2 --- P3

---

2. Linear Array

Processors arranged in a line.

Each processor connected to neighbors.

Simple and low cost.

Communication cost proportional to distance.

Diagram:

P0 — P1 — P2 — P3

---

3. Meshes

Processors arranged in 2D grid.


Each processor connects to 4 neighbors (except edges).

Diagram (2x2 Mesh):

P0 — P1
| |
P2 — P3

---

Chapter 6: MPI Communication Operations (Grama et al.)

---

6.1–6.3 Collective Communication

---

Broadcast Example

MPI_Bcast(buffer, count, datatype, root, communicator);

Broadcast Diagram:

Root Process -> All Other Processes
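
A minimal complete broadcast example (the value 42 and variable names are illustrative): after the call, every rank holds the root's value.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 0;
    if (rank == 0) value = 42;   /* only the root has the data initially */

    /* Every process passes the same root (0); all copies of value now match. */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d received value %d\n", rank, value);
    MPI_Finalize();
    return 0;
}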

---

Barrier Synchronization

MPI_Barrier(MPI_COMM_WORLD);

All processes wait until every process reaches the barrier.

---
6.5 Reduction Operations

Combine values from all processes and return the result to one process.

MPI_Reduce(&local_sum, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

---

6.6 MPI Communication Primitives

Focus on 6.6.1 to 6.6.9 only!

Topics include:

Point-to-Point Communication (MPI_Send, MPI_Recv)

Collective Communication (already covered)

Derived Datatypes

Communicator Groups

Virtual Topologies

---

🛠 Sample Programs: MPI (C & C++)


C Example - Hello World

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    printf("Hello from rank %d\n", world_rank);
    MPI_Finalize();
    return 0;
}
C++ Example - Simple Send/Receive

#include <mpi.h>
#include <iostream>
using namespace std;

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        int data = 100;
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int data;
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cout << "Received data: " << data << endl;
    }
    MPI_Finalize();
    return 0;
}

---

✅ UNIT III Completed!


---


📚 UNIT IV — Applications of Parallel Computing


(Detailed Notes — According to your Guidelines)

---

Chapter 6 (6.3.5, 6.6.9) and Chapter 8 (up to 8.2.1) — Grama et al.

Complete Reading — Revdikar et al.

Lecture 34 — Yogesh Sabharwal MOOC

---

6.3.5 Parallel Breadth First Search (BFS)

Problem: Explore all nodes of a graph level-by-level starting from a root.

Parallelization Strategy:

Each processor handles multiple nodes at a given level.

At each step, processors find neighbors and collectively build the next level.

Algorithm Flow:

1. Initialize root node as visited.

2. For each level:
   a. Each processor checks assigned nodes.
   b. Discover new neighbors.
   c. Add new neighbors to frontier for next level.

Challenges:

Synchronization between processors at each level.

Handling duplicate neighbor discovery.

---

Pseudocode:

#pragma omp parallel for
for (int i = 0; i < frontier_size; i++) {
    for (all neighbors of frontier[i]) {
        if (not visited) {
            mark visited;
            add to next_frontier;
        }
    }
}
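
A more concrete, hedged version of the pseudocode above (a sketch, not the textbook's code): it assumes a CSR graph with arrays row_ptr and col_idx, a shared visited array, and pre-allocated frontier arrays — all of these names are assumptions. A critical section guards the visited check to avoid duplicate discovery, and an atomic capture reserves a slot in the next frontier.

#include <omp.h>

void bfs_level(const int *row_ptr, const int *col_idx,
               const int *frontier, int frontier_size,
               int *next_frontier, int *next_size,
               int *visited)
{
    #pragma omp parallel for
    for (int i = 0; i < frontier_size; i++) {
        int v = frontier[i];
        for (int e = row_ptr[v]; e < row_ptr[v + 1]; e++) {
            int w = col_idx[e];
            if (!visited[w]) {
                /* Claim w inside a critical section to avoid duplicate
                   discovery; an atomic compare-and-swap would be cheaper. */
                int claimed = 0;
                #pragma omp critical
                {
                    if (!visited[w]) { visited[w] = 1; claimed = 1; }
                }
                if (claimed) {
                    int pos;
                    /* Atomically reserve a slot in next_frontier. */
                    #pragma omp atomic capture
                    pos = (*next_size)++;
                    next_frontier[pos] = w;
                }
            }
        }
    }
}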

---

6.6.9 Distributed BFS in MPI

Distributed Approach:

Each processor maintains its own part of the graph.

Communicate discovered nodes using MPI communication.

Combine the new frontier across all processors after each level.

Optimization:

Reduce communication by local expansion first.

Use efficient collective operations like MPI_Allgather.
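
A sketch of the frontier-combination step using MPI_Allgather and MPI_Allgatherv (function and variable names are assumptions, not the textbook's code): each process first shares how many vertices it discovered, then all newly discovered vertices are gathered on every process.

#include <mpi.h>
#include <stdlib.h>

int *exchange_frontier(int *local_frontier, int local_size,
                       int *global_size_out, MPI_Comm comm)
{
    int nprocs;
    MPI_Comm_size(comm, &nprocs);

    /* 1. Every process learns how many vertices each process discovered. */
    int *counts = malloc(nprocs * sizeof(int));
    MPI_Allgather(&local_size, 1, MPI_INT, counts, 1, MPI_INT, comm);

    /* 2. Prefix sums give displacements and the combined frontier size. */
    int *displs = malloc(nprocs * sizeof(int));
    displs[0] = 0;
    for (int p = 1; p < nprocs; p++)
        displs[p] = displs[p - 1] + counts[p - 1];
    int total = displs[nprocs - 1] + counts[nprocs - 1];

    /* 3. Gather every process's new vertices on all processes. */
    int *global_frontier = malloc((total > 0 ? total : 1) * sizeof(int));
    MPI_Allgatherv(local_frontier, local_size, MPI_INT,
                   global_frontier, counts, displs, MPI_INT, comm);

    free(counts);
    free(displs);
    *global_size_out = total;
    return global_frontier;   /* caller frees; duplicates across processes still need filtering */
}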

---

8.2 (up to 8.2.1) Other Applications

Parallel Dijkstra’s Algorithm

Find the shortest path from a source to all vertices.

Parallelize the selection of the minimum distance node.

Parallelize relaxation step for neighbors.


Key Parallel Parts:

Finding the global minimum using a reduction (see the sketch below).

Relaxing edges in parallel.
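
A sketch of the global-minimum step (names are illustrative, not the textbook's code): each process proposes its closest unsettled vertex as a (distance, vertex) pair, and MPI_Allreduce with MPI_MINLOC selects the overall minimum on every rank.

#include <mpi.h>

/* Pair layout expected by MPI_2INT + MPI_MINLOC: value first, index second. */
typedef struct { int dist; int vertex; } MinPair;

int select_global_min(int local_best_dist, int local_best_vertex, MPI_Comm comm)
{
    MinPair local = { local_best_dist, local_best_vertex };
    MinPair global;

    /* Minimize on .dist across all processes; the winner's .vertex is carried along. */
    MPI_Allreduce(&local, &global, 1, MPI_2INT, MPI_MINLOC, comm);

    return global.vertex;   /* every process now agrees on the next vertex to settle */
}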

---

Parallel Matrix Multiplication

Decompose matrices into blocks.

Each processor computes a block independently.

Simple 2D Decomposition:

for (i = my_row_start; i < my_row_end; i++)
    for (j = my_col_start; j < my_col_end; j++)
        for (k = 0; k < N; k++)
            C[i][j] += A[i][k] * B[k][j];

Use OpenMP to parallelize the loops on shared memory, or MPI to distribute the blocks across processes (a minimal OpenMP version follows).
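
A minimal OpenMP version of this loop nest (the matrix size and initial values are illustrative):

#include <omp.h>
#include <stdio.h>

#define N 256

static double A[N][N], B[N][N], C[N][N];

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0; B[i][j] = 2.0; C[i][j] = 0.0;
        }

    /* Each thread computes a set of output rows; the iterations are independent. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];

    printf("C[0][0] = %f\n", C[0][0]);   /* expect 1.0 * 2.0 * N = 512.0 */
    return 0;
}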

---

Parallel Histogram Sorting

Divide data range into bins.

Each processor counts items falling into each bin (locally).

Bins are merged across processors (using MPI reduce/gather).
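
A sketch of the two steps (the bin count, data range handling, and names are assumptions): counting is purely local, and a single MPI_Reduce merges the per-bin counts onto the root.

#include <mpi.h>
#include <stdio.h>

#define NBINS 10

void parallel_histogram(const double *local_data, int local_n,
                        double min_val, double max_val, MPI_Comm comm)
{
    int local_counts[NBINS] = {0};
    int global_counts[NBINS];
    double width = (max_val - min_val) / NBINS;

    /* Step 1: purely local counting, no communication. */
    for (int i = 0; i < local_n; i++) {
        int b = (int)((local_data[i] - min_val) / width);
        if (b < 0) b = 0;
        if (b >= NBINS) b = NBINS - 1;   /* clamp the maximum value into the last bin */
        local_counts[b]++;
    }

    /* Step 2: element-wise sum of the per-bin counts onto rank 0. */
    MPI_Reduce(local_counts, global_counts, NBINS, MPI_INT, MPI_SUM, 0, comm);

    int rank;
    MPI_Comm_rank(comm, &rank);
    if (rank == 0)
        for (int b = 0; b < NBINS; b++)
            printf("bin %d: %d\n", b, global_counts[b]);
}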

---

Revdikar et al. (Research Paper Summary)


Paper Title:
"A Naïve Breadth First Search Approach Incorporating Parallel Processing Technique for
Optimal Network Traversal"

Key Ideas:

Proposes a simple yet effective way to parallelize BFS.

Focuses on minimizing inter-processor communication.

Practical evaluation shows speedups using both OpenMP and MPI.

---

Lecture 34 (Yogesh Sabharwal MOOC)

Practical demonstration of distributed BFS.

Emphasizes importance of load balancing and efficient communication.

Illustrates application of collective MPI functions.

---

🛠 Practical List — Final


---

🧠 Mind Map for Unit IV:


Applications of Parallel Computing
├── BFS (Parallel, Distributed)
├── Dijkstra’s Algorithm (Parallel Shortest Paths)
├── Matrix Multiplication (Block Decomposition)
├── Histogram Sorting (Data Binning)
└── BFS Research Study (Revdikar et al.)
---

✅ UNIT IV Completed!
---

📢 All Units I–IV — Now Fully Done!


