HPC Unit 456
UNIT 4
Idling:
Idling refers to a situation where processors or threads remain inactive because they
have no work to perform during parallel execution.
This can happen due to load imbalance, resulting in wasted resources and decreased
efficiency.
Excess computation:
Excess computation overhead arises from redundant or unnecessary calculations in a
parallel program.
It can occur due to improper task decomposition, redundant calculations, or inefficient
algorithms.
It leads to additional computational resources being used and longer execution times.
Performance metrics:
• Execution time:
The total time taken to execute a parallel program.
• Total overhead:
The additional time or resources consumed by a parallel program beyond the
actual computation.
• Speedup:
The performance improvement achieved by parallelizing a program compared to
its sequential version.
• Efficiency:
A measure of how effectively computational resources are utilized to solve a
problem (see the formulas after this list).
• Scalability:
The ability of a parallel system to maintain or improve its performance as the
number of processors/threads increases.
• Communication decrease:
When scaling down from n to p processors, the communication cost decreases by a
factor of n/p.
• Computation increase:
Computation at each processing element increases by a factor of n/p when scaling
down, because the total workload is divided among a reduced number of
processors.
Total cost: Θ(n log p)
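For reference, these metrics are related by the standard formulas (using the usual convention that TS is the best sequential runtime and TP the parallel runtime on p processors):

Speedup: S = TS / TP
Efficiency: E = S / p = TS / (p × TP)
Total overhead: To = p × TP − TS
Cost: p × TP (a formulation is cost-optimal when p × TP = Θ(TS))

Applied to the scaling-down bullets above (assuming the usual adding-n-numbers example): TP = Θ((n/p) log p), so cost = p × TP = Θ(n log p); since TS = Θ(n), this naive scaled-down formulation is not cost-optimal.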
https://youtu.be/gOv7t5yYvmo
4. Describe minimum execution time and minimum cost-optimal execution time
=>
Minimum Execution Time (TPmin):
To find the minimum execution time, differentiate the expression for TP (the parallel
runtime) with respect to p (the number of processors) and set the derivative to zero:
d(TP)/dp = 0
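A standard worked example (the adding-n-numbers case with TP = n/p + 2 log p is assumed here for illustration; it is not given in the notes):

d(TP)/dp = −n/p² + 2/p = 0  =>  p = n/2
TPmin = n / (n/2) + 2 log(n/2) = 2 + 2 log n − 2 = 2 log n = Θ(log n)

Minimum Cost-Optimal Execution Time (TPcost_opt):
The smallest TP attainable while the formulation remains cost-optimal, i.e., while p × TP = Θ(TS). For the same example, choosing p = Θ(n / log n) keeps the cost at Θ(n) and gives TPcost_opt = Θ(log n).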
Or
https://youtu.be/Vk-mJcj7y_I
Advantages:
Efficient
High performance
Faster processing
Limitations:
Requires significant memory
High computational complexity for large matrices
Not suitable for sparse matrices with mostly zero elements
Applications:
Scientific simulations, data analytics, and machine learning
Image/signal processing, data compression, and recommendation systems
UNIT 5
Load Balancing:
Load balancing involves distributing the computational workload evenly across the
available processing elements.
Load balancing techniques aim to minimize the impact of workload imbalance and
varying computational complexity.
Data Locality:
Data locality refers to the proximity of data to the processing element that operates on it.
Efficient exploitation of data locality can minimize data transfer and communication
overhead, leading to improved performance.
Techniques like data partitioning and data replication are used to enhance data locality
in parallel sorting.
Scalability:
Scalability refers to the ability of a parallel sorting algorithm to handle larger problem
sizes and utilize increasing computational resources effectively.
This requires the algorithm to be designed appropriately.
Combining Phase:
The sorted sub-arrays need to be combined into one sorted array.
Use a parallel merge algorithm, or communication/synchronization between processors
(see the sketch below).
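A minimal CUDA C sketch of this sort-then-combine pattern (the sizes, names, and the deliberately simple one-thread-per-block insertion sort are illustrative assumptions, not an efficient production sort):

#include <cstdio>
#include <algorithm>

#define N 1024          // total elements (illustrative)
#define CHUNKS 8        // number of sub-arrays sorted in parallel
#define CHUNK (N / CHUNKS)

// Sorting phase: each block sorts its own chunk; one thread per block
// keeps the sketch simple.
__global__ void sortChunk(int *a) {
    int base = blockIdx.x * CHUNK;
    for (int i = base + 1; i < base + CHUNK; ++i) {   // insertion sort
        int key = a[i], j = i - 1;
        while (j >= base && a[j] > key) { a[j + 1] = a[j]; --j; }
        a[j + 1] = key;
    }
}

int main() {
    int h[N], *d;
    for (int i = 0; i < N; ++i) h[i] = (i * 7919) % 1000;  // arbitrary data
    cudaMalloc(&d, N * sizeof(int));
    cudaMemcpy(d, h, N * sizeof(int), cudaMemcpyHostToDevice);
    sortChunk<<<CHUNKS, 1>>>(d);            // chunks sorted in parallel
    cudaMemcpy(h, d, N * sizeof(int), cudaMemcpyDeviceToHost);
    // Combining phase: merge sorted runs of doubling width on the host.
    for (int w = CHUNK; w < N; w *= 2)
        for (int lo = 0; lo + w < N; lo += 2 * w)
            std::inplace_merge(h + lo, h + lo + w, h + std::min(lo + 2 * w, N));
    printf("sorted? %d\n", std::is_sorted(h, h + N));
    cudaFree(d);
    return 0;
}

The kernel realizes the parallel sorting phase; the host loop plays the role of the combining phase described above.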
Advantages:
Improved performance
Scalability
Efficient resource utilization
Challenges:
Load balancing
Data partitioning
Communication and synchronization overhead
https://youtu.be/UO5cQ5G9DFI
a) Source partitioned:
The source vertices are partitioned among the processors (p = n, one source per
processor), and each processor runs the sequential single-source algorithm for its
assigned source.
Parallel Execution Time (TP): Θ(n²)
Advantages:
Clearly defined partitioning
Each processor can compute shortest paths independently
Suitable for distributed-memory systems
Limitations:
Overhead of exchanging information between processors
Load imbalance
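A minimal CUDA C sketch of the source-partitioned idea (an illustration, not from the notes: one thread plays the role of one processor, running an independent sequential Dijkstra from its own source over a shared adjacency matrix, so no communication is needed during the computation):

#include <cstdio>

#define V 8            // number of vertices (illustrative)
#define INF 1000000000

// Thread t computes single-source shortest paths from source t.
// adj is a V*V adjacency matrix (INF = no edge); dist is the V*V output.
__global__ void apspDijkstra(const int *adj, int *dist) {
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s >= V) return;
    bool done[V];
    int *d = dist + s * V;                  // this thread's private output row
    for (int v = 0; v < V; ++v) { d[v] = adj[s * V + v]; done[v] = false; }
    d[s] = 0;
    for (int iter = 0; iter < V - 1; ++iter) {
        int u = -1, best = INF;
        for (int v = 0; v < V; ++v)         // pick closest unfinished vertex
            if (!done[v] && d[v] < best) { best = d[v]; u = v; }
        if (u < 0) break;
        done[u] = true;
        for (int v = 0; v < V; ++v)         // relax edges out of u
            if (!done[v] && best + adj[u * V + v] < d[v])
                d[v] = best + adj[u * V + v];
    }
}

int main() {
    int h_adj[V * V], h_dist[V * V];
    for (int i = 0; i < V * V; ++i) h_adj[i] = INF;
    for (int v = 0; v < V; ++v) {           // a small ring graph (illustrative)
        h_adj[v * V + (v + 1) % V] = 1;
        h_adj[((v + 1) % V) * V + v] = 1;
    }
    int *d_adj, *d_dist;
    cudaMalloc(&d_adj, sizeof(h_adj));
    cudaMalloc(&d_dist, sizeof(h_dist));
    cudaMemcpy(d_adj, h_adj, sizeof(h_adj), cudaMemcpyHostToDevice);
    apspDijkstra<<<1, V>>>(d_adj, d_dist);  // p = n: one thread per source
    cudaMemcpy(h_dist, d_dist, sizeof(h_dist), cudaMemcpyDeviceToHost);
    printf("dist(0,%d) = %d\n", V / 2, h_dist[V / 2]);  // expect V/2 on a ring
    cudaFree(d_adj); cudaFree(d_dist);
    return 0;
}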
b) Source parallel:
Used when p > n.
Here, each single-source problem is itself parallelized: groups of processors work
concurrently to explore different parts of the graph.
Advantages:
Parallelism is achieved by distributing the workload among multiple processors.
Improved performance.
Suitable for large graphs
Limitations:
Communication and synchronization overhead
Load imbalance
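For reference, the standard analysis of this formulation (stated here as an assumption, since the notes omit it): each of the n single-source problems is assigned a group of p/n processors, and the parallel Dijkstra run per source gives TP = Θ(n³/p) + Θ(n log p) (computation plus communication).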
Dijkstra Algorithm
https://youtu.be/84y-fHI008M
Random polling:
An idle processor requests work from a randomly selected processor. This approach
spreads work requests evenly and helps prevent any particular processor from being
overloaded.
Shared Memory:
Processes access a shared memory region.
Communication through reading and writing to shared memory.
Simple implementation, high performance on shared memory systems.
Requires synchronization.
Publish-Subscribe:
Publishers send messages to a central broker, subscribers receive relevant messages.
Loosely coupled communication.
Simplifies system design, dynamic scalability.
Used in event-driven systems and publish-subscribe frameworks.
https://youtu.be/Sazh4Y-WlDk
https://youtu.be/embRDiiH-ts
UNIT 6
Components:
• Host (CPU):
It is responsible for managing the overall execution of the program and
coordinating its tasks
• Device (GPU):
It performs parallel computations
• Kernels:
Kernels are parallel functions that are executed on the GPU
Written in the CUDA C/C++ language
• Thread Hierarchy:
Threads are lightweight, independent units of execution that run on the GPU.
Threads are organized hierarchically: individual threads are grouped into thread
blocks, and thread blocks are grouped into a grid
• Grid:
A grid is a collection of thread blocks that execute independently on the GPU.
• Memory Hierarchy:
The CUDA architecture includes various memory types accessible by threads:
registers (private to each thread), shared memory (shared within a thread block),
and global memory (accessible by all threads)
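A minimal sketch tying these components together (a standard vector-add example; names like vecAdd and the sizes are illustrative): the host manages memory and launches a kernel over a grid of thread blocks on the device.

#include <cstdio>
#include <cstdlib>

// Kernel: runs on the device; each thread handles one element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *ha = (float *)malloc(bytes), *hb = (float *)malloc(bytes),
          *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = (float)i; hb[i] = (float)(2 * i); }
    float *da, *db, *dc;                                  // device (global) memory
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);    // host -> device
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);
    int threads = 256;                                    // threads per block
    int blocks = (n + threads - 1) / threads;             // blocks in the grid
    vecAdd<<<blocks, threads>>>(da, db, dc, n);           // kernel launch on GPU
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);    // device -> host
    printf("c[5] = %.0f (expected 15)\n", hc[5]);
    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}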
CUDA Applications
Medical Imaging
Computational Finance
Oil and Gas Exploration
Data science and analytics
Deep learning and machine learning
Benefits:
Massive Parallelism
Accelerated Performance
Heterogeneous Computing
Programming Flexibility
Wide Adoption and Support
Limitations:
Hardware Dependency
Learning Curve
Memory Limitations
Limited Software Support
Development Complexity
https://youtu.be/Ongct-wmYxo
• Thread Hierarchy:
It organizes threads into blocks, and blocks into a grid
• Memory Management:
It offers explicit memory management for global memory (accessible by all
threads) and shared memory (accessible within a block)
Benefits of CUDA C:
High Performance
Flexibility
Broad GPU Support
Limitations of CUDA C:
Dependency on NVIDIA GPUs
Steeper learning curve
Limited Portability
Inter-Thread Communication:
Mechanisms such as shared memory facilitate data sharing and coordination between
threads within a block.
__syncthreads():
A barrier synchronization primitive in CUDA: every thread in a block waits at the barrier
until all threads of that block have reached it. It is typically used when threads cooperate
through shared memory, e.g., between writing shared data and reading it back (see the
sketch below).
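A minimal sketch of both ideas together (the kernel name blockSum and the sizes are illustrative assumptions): threads of a block stage data in shared memory, synchronize with __syncthreads(), and cooperatively reduce it to one partial sum per block.

#include <cstdio>

#define THREADS 256   // threads per block (illustrative)

__global__ void blockSum(const int *in, int *out, int n) {
    __shared__ int buf[THREADS];                // visible to all threads of this block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    buf[threadIdx.x] = (i < n) ? in[i] : 0;     // stage one element in shared memory
    __syncthreads();                            // wait until every write is done
    // Tree reduction: halve the number of active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) buf[threadIdx.x] += buf[threadIdx.x + s];
        __syncthreads();                        // finish this step before the next
    }
    if (threadIdx.x == 0) out[blockIdx.x] = buf[0];  // one partial sum per block
}

int main() {
    const int n = 1024, blocks = n / THREADS;
    int h[n], hout[blocks], *d, *dout;
    for (int i = 0; i < n; ++i) h[i] = 1;
    cudaMalloc(&d, n * sizeof(int)); cudaMalloc(&dout, blocks * sizeof(int));
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);
    blockSum<<<blocks, THREADS>>>(d, dout, n);
    cudaMemcpy(hout, dout, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    printf("block 0 sum = %d (expected %d)\n", hout[0], THREADS);
    cudaFree(d); cudaFree(dout);
    return 0;
}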
Components:
• HDFS:
Distributed file system for storing and accessing large data files.
• MapReduce:
Programming model for parallel processing and analysis of large datasets.
• YARN:
Resource management framework for scheduling jobs and allocating resources in
a Hadoop cluster.
• Hadoop Ecosystem:
Collection of tools and frameworks that extend Hadoop's capabilities.
Features:
• Speed:
Fast in-memory processing.
• Distributed computing:
It distributes data and computation across machines for efficient parallel
processing.
• Scalability:
Handles large data and scales from single machines to clusters.
• Integration:
Works well with popular big data tools.
• Ease of use:
User-friendly API supporting multiple languages.
Features:
• Stream Processing:
It enables real-time data processing and analysis
• Batch Processing:
It efficiently executes complex batch jobs.
• Fault Tolerance:
It allows recovery from failures without data loss.
• Scalability:
It scales horizontally to handle large data volumes, automatically parallelizing
computations
• Integration:
It seamlessly integrates with other big data ecosystems, like Kafka, Hadoop, etc.
• Dynamic Updates:
It supports dynamic updates to running jobs.
Features:
• Platform and Device Independence:
It allows developers to write code that runs on different hardware platforms and
devices.
• Heterogeneous Computing:
Utilize multiple devices simultaneously for efficient processing.
• Parallel Execution:
A task is divided into multiple smaller tasks, which are executed concurrently
• Portability:
Write code once and run it on various hardware platforms.