

Introduction to Parallel Computing

George Karypis

Principles of Parallel Algorithm Design
Outline
• Overview of some Serial Algorithms
• Parallel Algorithm vs. Parallel Formulation
• Elements of a Parallel Algorithm/Formulation
• Common Decomposition Methods
  - the concurrency extractor!
• Common Mapping Methods
  - the parallel overhead reducer!
Some Serial Algorithms
Working Examples
• Dense Matrix-Matrix & Matrix-Vector Multiplication
• Sparse Matrix-Vector Multiplication
• Gaussian Elimination
• Floyd's All-Pairs Shortest Path
• Quicksort
• Minimum/Maximum Finding
• Heuristic Search: the 15-puzzle problem
Dense Matrix-Vector Multiplication
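The slide's figure does not survive in this text; as a reference point, a minimal serial sketch of the computation (all names illustrative, not from the slides):

    /* y = A*x for an n-by-n dense matrix A in row-major order. */
    void matvec(int n, const double *A, const double *x, double *y) {
        for (int i = 0; i < n; i++) {      /* one dot product per entry of y */
            double sum = 0.0;
            for (int j = 0; j < n; j++)
                sum += A[i * n + j] * x[j];
            y[i] = sum;
        }
    }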
Dense Matrix-Matrix Multiplication
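Likewise, a minimal serial sketch of the matrix-matrix product, the classic O(n^3) triple loop (names illustrative):

    /* C = A*B for n-by-n row-major matrices. */
    void matmul(int n, const double *A, const double *B, double *C) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++)
                    sum += A[i * n + k] * B[k * n + j];
                C[i * n + j] = sum;
            }
    }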
Sparse Matrix-Vector Multiplication
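A sketch of the sparse case, assuming the common CSR (compressed sparse row) storage; the slide does not specify a format:

    /* y = A*x with A in CSR form: rowptr has n+1 entries; colind/val
       hold the column index and value of each nonzero. */
    void spmv_csr(int n, const int *rowptr, const int *colind,
                  const double *val, const double *x, double *y) {
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
                sum += val[k] * x[colind[k]];
            y[i] = sum;
        }
    }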
Gaussian Elimination
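A sketch of the elimination phase (no pivoting, for brevity; production codes pivot for numerical stability):

    /* Reduce the n-by-n matrix A to upper triangular form in place. */
    void gaussian_eliminate(int n, double *A) {
        for (int k = 0; k < n; k++)                /* k-th elimination step */
            for (int i = k + 1; i < n; i++) {      /* update rows below k   */
                double m = A[i * n + k] / A[k * n + k];
                for (int j = k; j < n; j++)
                    A[i * n + j] -= m * A[k * n + j];
            }
    }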
Floyd’s All-Pairs Shortest Path
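A sketch of Floyd's algorithm on a matrix of path lengths:

    /* d[i*n+j] starts as the weight of edge (i,j) (a large value if absent)
       and ends as the shortest-path length from i to j. */
    void floyd(int n, double *d) {
        for (int k = 0; k < n; k++)      /* allow k as an intermediate vertex */
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (d[i * n + k] + d[k * n + j] < d[i * n + j])
                        d[i * n + j] = d[i * n + k] + d[k * n + j];
    }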
Quicksort
Minimum Finding
15-Puzzle Problem
Parallel Algorithm vs. Parallel Formulation
• Parallel Formulation
  - refers to a parallelization of a serial algorithm
• Parallel Algorithm
  - may represent an entirely different algorithm than the one used serially

• We primarily focus on "Parallel Formulations"
  - Our goal today is primarily to discuss how to develop such parallel formulations.
  - Of course, there will always be examples of "parallel algorithms" that were not derived from serial algorithms.
Elements of a Parallel Algorithm/Formulation
• Pieces of work that can be done concurrently
  - tasks
• Mapping of the tasks onto multiple processors
  - processes vs. processors
• Distribution of input/output & intermediate data across the different processors
• Management of access to shared data
  - either input or intermediate
• Synchronization of the processors at various points of the parallel execution

Holy Grail:
Maximize concurrency and reduce the overheads due to parallelization!
Maximize potential speedup!
Finding Concurrent Pieces of Work
• Decomposition:
  - the process of dividing the computation into smaller pieces of work, i.e., tasks
• Tasks are programmer-defined and are considered to be indivisible.
Example: Dense Matrix-Vector Multiplication
• Tasks can be of different sizes.
  - the granularity of a task (see the sketch below)
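One natural decomposition, sketched under the assumption that each task computes a contiguous block of rows of y (the block bounds are illustrative parameters):

    /* A single task in a row-block decomposition of y = A*x. Tasks are
       independent: each writes a disjoint slice of y. */
    void matvec_task(int n, int row_lo, int row_hi,
                     const double *A, const double *x, double *y) {
        for (int i = row_lo; i < row_hi; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++)
                sum += A[i * n + j] * x[j];
            y[i] = sum;
        }
    }

Making row_hi - row_lo large yields a few coarse-grained tasks (low overhead, little concurrency); making it small yields many fine-grained tasks, with the reverse trade-off.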
Example: Query Processing

Query:
Example: Query Processing
• Finding concurrent tasks…
Task-Dependency Graph
• In most cases, there are dependencies between the different tasks
  - certain task(s) can only start once some other task(s) have finished
    • e.g., producer-consumer relationships
• These dependencies are represented using a DAG called a task-dependency graph.
Task-Dependency Graph (cont)
• Key concepts derived from the task-dependency graph:
  - Degree of Concurrency
    • the number of tasks that can be executed concurrently
    • we usually care about the average degree of concurrency
  - Critical Path
    • the longest vertex-weighted path in the graph
    • the weights represent task size
• Task granularity affects both of the above characteristics.
Task-Interaction Graph
• Captures the pattern of interaction between tasks
  - this graph usually contains the task-dependency graph as a subgraph
    • i.e., there may be interactions between tasks even if there are no dependencies between them; these interactions usually occur due to accesses to shared data
Task Dependency/Interaction Graphs
• These graphs are important in developing effective mappings of the tasks onto the different processors
  - maximize concurrency and minimize overheads

• More on this later…


Common Decomposition Methods
• Data Decomposition
• Recursive Decomposition
• Exploratory Decomposition
• Speculative Decomposition
• Hybrid Decomposition
(all are task decomposition methods)
Recursive Decomposition
• Suitable for problems that can be solved using the divide-and-conquer paradigm
• Each of the subproblems generated by the divide step becomes a task
Example: Quicksort
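A serial sketch (simple last-element pivot, purely illustrative): each recursive call operates on a disjoint subarray, so the two calls generated by the divide step are the concurrent tasks.

    void quicksort(double *a, int lo, int hi) {
        if (lo >= hi) return;
        double pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++)        /* partition around the pivot */
            if (a[j] < pivot) {
                double t = a[i]; a[i] = a[j]; a[j] = t; i++;
            }
        double t = a[i]; a[i] = a[hi]; a[hi] = t;
        quicksort(a, lo, i - 1);             /* these two subproblems are */
        quicksort(a, i + 1, hi);             /* independent tasks         */
    }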
Example: Finding the Minimum
• Note that we can obtain divide-and-conquer algorithms for problems that are traditionally solved using non-divide-and-conquer approaches.
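A sketch of such a divide-and-conquer formulation of min-finding, in contrast to the usual left-to-right scan; the two halves are independent tasks:

    double find_min(const double *a, int lo, int hi) {
        if (lo == hi) return a[lo];
        int mid = (lo + hi) / 2;
        double left  = find_min(a, lo, mid);      /* independent subproblem */
        double right = find_min(a, mid + 1, hi);  /* independent subproblem */
        return left < right ? left : right;
    }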
Recursive Decomposition
• How good are the decompositions that it produces?
  - average concurrency?
  - critical path?
• How do the quicksort and min-finding decompositions measure up?
Data Decomposition
• Used to derive concurrency for problems that operate on large amounts of data
• The idea is to derive the tasks by focusing on the multiplicity of data
• Data decomposition is often performed in two steps:
  - Step 1: Partition the data
  - Step 2: Induce a computational partitioning from the data partitioning
• Which data should we partition?
  - Input/Output/Intermediate?
    • Well… all of the above, leading to different data decomposition methods
• How do we induce a computational partitioning?
  - the owner-computes rule
Example: Matrix-Matrix Multiplication
• Partitioning the output data
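A sketch of the owner-computes rule applied to an output partitioning: assuming C is divided into bs-by-bs blocks (parameters illustrative), the task owning block (bi, bj) performs all computation that writes it.

    /* One task per output block of C = A*B (n divisible by bs, for brevity). */
    void matmul_block_task(int n, int bs, int bi, int bj,
                           const double *A, const double *B, double *C) {
        for (int i = bi * bs; i < (bi + 1) * bs; i++)
            for (int j = bj * bs; j < (bj + 1) * bs; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++)  /* full k range: no partial results shared */
                    sum += A[i * n + k] * B[k * n + j];
                C[i * n + j] = sum;
            }
    }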
Example: Matrix-Matrix Multiplication
• Partitioning the intermediate data
Data Decomposition
• It is the most widely used decomposition technique
  - after all, parallel processing is often applied to problems that have a lot of data
  - splitting the work based on this data is the natural way to extract a high degree of concurrency
• It is used by itself or in conjunction with other decomposition methods
  - hybrid decomposition
Exploratory Decomposition
• Used to decompose computations that correspond to a search of a space of solutions
Example: 15-puzzle Problem
Exploratory Decomposition
• It is not as general-purpose
• It can result in speedup anomalies
  - engineered slow-down or superlinear speedup
Speculative Decomposition
• Used to extract concurrency in problems in which the next step is one of many possible actions that can only be determined when the current task finishes
• This decomposition assumes a certain outcome of the currently executed task and executes some of the next steps
  - just like speculative execution at the microprocessor level
Example: Discrete Event Simulation
Speculative Execution
• If predictions are wrong…
  - work is wasted
  - work may need to be undone
    • state-restoring overhead
    • memory/computations

• However, it may be the only way to extract concurrency!
Mapping the Tasks
• Why do we care about task mapping?
  - Can I just randomly assign them to the available processors?
• Proper mapping is critical, as it needs to minimize the parallel processing overheads
  - If Tp is the parallel runtime on p processors and Ts is the serial runtime, then the total overhead To is p*Tp - Ts
    • the work done by the parallel system beyond that required by the serial system
  - Overhead sources:
    • load imbalance
    • inter-process communication
      - coordination/synchronization/data-sharing
    (the two can be at odds with each other; remember the holy grail…)
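A quick worked instance of the definition (numbers invented for illustration): with Ts = 100 and Tp = 30 on p = 4 processors, To = 4*30 - 100 = 20 units of extra work, and the speedup is Ts/Tp = 100/30 ≈ 3.3 rather than the ideal 4.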
Why Can Mapping Be Complicated?
• Proper mapping needs to take into account the task-dependency and task-interaction graphs
  - Task-dependency graph:
    • Are the tasks available a priori?
      - static vs. dynamic task generation
    • How about their computational requirements?
      - Are they uniform or non-uniform?
      - Do we know them a priori?
    • How much data is associated with each task?
  - Task-interaction graph:
    • How about the interaction patterns between the tasks?
      - Are they static or dynamic?
      - Do we know them a priori?
      - Are they data-instance dependent?
      - Are they regular or irregular?
      - Are they read-only or read-write?
• Depending on the above characteristics, mapping techniques of different complexity and cost are required
Example: Simple & Complex Task Interaction
Mapping Techniques for Load Balancing
• Be aware…
  - The assignment of tasks whose aggregate computational requirements are the same does not automatically ensure load balance.

Each processor is assigned three tasks, but (a) is better than (b)!
Load Balancing Techniques
• Static
  - The tasks are distributed among the processors prior to the execution
  - Applicable for tasks that are
    • generated statically
    • of known and/or uniform computational requirements
• Dynamic
  - The tasks are distributed among the processors during the execution of the algorithm
    • i.e., tasks & data are migrated
  - Applicable for tasks that are
    • generated dynamically
    • of unknown computational requirements
Static Mapping: Array Distribution
• Suitable for algorithms that
  - use data decomposition, and
  - whose underlying input/output/intermediate data are in the form of arrays
• Block Distribution
• Cyclic Distribution
• Block-Cyclic Distribution
  (each in 1D/2D/3D variants)
• Randomized Block Distributions
(ownership maps for the first three are sketched after this list)
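Minimal ownership maps for a 1D array of n elements on p processors, sketching the first three distributions (names illustrative):

    /* Block: contiguous chunks of ceil(n/p) elements per processor. */
    int block_owner(int i, int n, int p)  { return i / ((n + p - 1) / p); }

    /* Cyclic: element i goes to processor i mod p. */
    int cyclic_owner(int i, int p)        { return i % p; }

    /* Block-cyclic: deal out blocks of b elements round-robin. */
    int block_cyclic_owner(int i, int b, int p) { return (i / b) % p; }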
Examples: Block Distributions
Examples: Block Distributions
Example: Block-Cyclic Distributions
• Gaussian Elimination
  - The active portion of the array shrinks as the computations progress
Random Block Distributions
• Sometimes the computations are performed only at certain portions of an array
  - e.g., sparse matrix-matrix multiplication
Random Block Distributions
• Better load balance can be achieved via a random block distribution
Graph Partitioning
• A mapping can be achieved by directly partitioning the task-interaction graph.
  - e.g., finite element mesh-based computations
  - directly partitioning this graph
Example: Sparse Matrix-Vector
• Another instance of graph partitioning
Dynamic Load Balancing Schemes
• There is a huge body of research
• Centralized Schemes
  - A certain processor is responsible for giving out work
    • master-slave paradigm
  - Issue:
    • task granularity
• Distributed Schemes
  - Work can be transferred between any pair of processors.
  - Issues:
    • How do the processors get paired?
    • Who initiates the work transfer? (push vs. pull)
    • How much work is transferred?

Mapping to Minimize Interaction Overheads
• Maximize data locality
• Minimize volume of data exchange
• Minimize frequency of interactions
• Minimize contention and hot spots
• Overlap computation with interactions
• Selective data and computation replication

Achieving the above is usually an interplay of decomposition and mapping and is usually done iteratively.