Module 2: Performance Analysis of Multiprocessor Architectures (Students' Version)
Term 2024-2025
Computational Models
1. Equal Duration Model
✓ It is assumed that a given task can be divided into n equal subtasks, each of which can be executed by one processor.
✓ If t_s is the execution time of the whole task using a single processor, then the time taken by each processor to execute its subtask is t_m = t_s / n.
✓ Definition: The speedup factor S(n) of a parallel system is the ratio of the time taken by a single processor to solve a given problem instance to the time taken by a parallel system consisting of n processors to solve the same problem instance.
✓ Under the equal duration model, the speedup factor is therefore
    S(n) = t_s / t_m = t_s / (t_s / n) = n
✓ Example: consider adding two vectors a and b element-wise and then updating both vectors using the average of the resulting vector c:
    for i ← 1 to n
        c(i) ← a(i) + b(i)          ; done in parallel: each processor performs one addition
    for j ← 1 to n
        sum ← sum + c(j)            ; cannot be done in parallel: the elements of c must be gathered and accumulated on a single processor
    average ← sum / n
    for k ← 1 to n
        a(k) ← a(k) − average
        b(k) ← b(k) − average       ; done in parallel: each processor updates its own elements of a and b
✓ This illustrative example shows that a realistic computational model should assume the existence of (serial) parts of the given task that cannot be divided among processors; this is exactly what the next model assumes.
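A minimal sequential Python sketch of the example above (the data values are illustrative); the comments mark which phases could be assigned to the n processors and which could not:

    def update_vectors(a, b):
        n = len(a)
        # Phase 1 - parallelizable: each processor could compute one element of c.
        c = [a[i] + b[i] for i in range(n)]
        # Phase 2 - not parallelizable in this form: the elements of c must be
        # gathered and accumulated into a single variable on one processor.
        total = 0.0
        for j in range(n):
            total += c[j]
        average = total / n
        # Phase 3 - parallelizable: each processor could update its own a(k) and b(k).
        a = [a[k] - average for k in range(n)]
        b = [b[k] - average for k in range(n)]
        return a, b

    # Example usage with illustrative data: a + b = [5, 5, 5, 5], so average = 5.
    print(update_vectors([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]))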
Computational Models
2. Parallel Computation with Serial Sections Model
✓ It is assumed that a fraction f of the given task (computation) cannot be divided into concurrent subtasks.
✓ The remaining fraction (1 − f) is assumed to be divisible into concurrent subtasks.
✓ Performing similar derivations to those done in the case of the equal duration model will result in the following.
✓ The time required to execute the task on n processors is
    t_m = f t_s + (1 − f) t_s / n
✓ The speedup factor is therefore given by
    S(n) = t_s / t_m = t_s / (f t_s + (1 − f) t_s / n) = n / (1 + (n − 1) f)
✓ Result: The potential speedup due to the use of n processors is determined primarily by the
fraction of code that cannot be divided.
✓ If the task is completely serial, i.e. f = 1, then no speedup can be achieved regardless of the
number of processors used.
✓ This principle is known as Amdahl's law.
✓ According to this law, the maximum speedup factor is given by
    lim_{n→∞} S(n) = 1 / f
✓ According to Amdahl’s law the improvement in performance (speedup) of a parallel
algorithm over a sequential one is limited not by the number of processors employed but
rather by the fraction of the algorithm that cannot be parallelized.
✓ For some time, and according to Amdahl's law, researchers were led to believe that a substantial increase in the speedup factor would not be possible by using parallel architectures.
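A small Python sketch of the serial-sections (Amdahl) speedup formula and its limit; the serial fractions f used below are illustrative values, not taken from the slides:

    def amdahl_speedup(n, f):
        # Serial-sections model: S(n) = n / (1 + (n - 1) * f)
        return n / (1 + (n - 1) * f)

    for f in (0.05, 0.10, 0.25):
        print(f"f = {f:.2f}: S(16) = {amdahl_speedup(16, f):.2f}, "
              f"S(1024) = {amdahl_speedup(1024, f):.2f}, limit 1/f = {1 / f:.1f}")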
Computational Models
✓ As stated earlier, the communication overhead should be included in the processing time.
✓ Considering the time t_c incurred due to this communication overhead, the speedup factor is given by
    S(n) = t_s / (f t_s + (1 − f) t_s / n + t_c) = n / (f (n − 1) + 1 + n t_c / t_s)
✓ The maximum speedup factor under such conditions is given by
    lim_{n→∞} S(n) = lim_{n→∞} n / (f (n − 1) + 1 + n t_c / t_s) = 1 / (f + t_c / t_s)
✓ The above equation indicates that the maximum speed-up factor is determined not by the
number of parallel processors employed but by the fraction of the computation that is not
parallelized and the communication overhead.
✓ Recall that the efficiency ξ is defined as the ratio between the speedup factor and the number of processors, n. The efficiency can be computed as follows:
    ξ (no communication overhead)   = S(n) / n = 1 / (1 + (n − 1) f)
    ξ (with communication overhead) = 1 / (f (n − 1) + 1 + n t_c / t_s)
✓ As the number of processors increases, it may become difficult to use those processors efficiently.
✓ In order to maintain a certain level of processor efficiency, there should exist a relationship between the fraction of serial computation, f, and the number of processors employed.
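The following Python sketch evaluates the speedup and efficiency formulas above; the serial fraction f and the ratio t_c/t_s are illustrative values chosen only to show how efficiency drops as n grows:

    def speedup(n, f, tc_over_ts):
        # S(n) = n / (f*(n - 1) + 1 + n * tc/ts)
        return n / (f * (n - 1) + 1 + n * tc_over_ts)

    f, ratio = 0.05, 0.02          # assumed: 5% serial fraction, tc = 2% of ts
    for n in (4, 16, 64, 256):
        s = speedup(n, f, ratio)
        print(f"n = {n:3d}: S(n) = {s:6.2f}, efficiency = {s / n:.2f}")
    print("asymptotic limit 1/(f + tc/ts) =", round(1 / (f + ratio), 2))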
Interconnection Networks Performance Issues
o Definition:
Channel Bisection Width of a network (B): is the minimum number of wires that, when cut, divide the network into
equal halves with respect to the number of nodes.
o Definition: The wire bisection is the number of wires crossing this cut of the network.
o Example: the bisection width of a 4-cube is B = 8.
The table below provides some numerical values of the above topological characteristics for sample static networks.
Network Configuration    Bisection Width (B)    Node Degree (d)    Diameter (D)
8-ary 1-cube             2                      2                  4
4-cube                   8                      4                  4
3 × 3 × 2 mesh           9                      3                  5
8-ary 2-cube             16                     4                  8
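As a cross-check of the table, the standard closed-form expressions for a ring (k-ary 1-cube), a binary hypercube, and a 2-D torus (k-ary 2-cube) can be evaluated directly; the 3 × 3 × 2 mesh row is left out of this sketch:

    def ring(k):            # k-ary 1-cube with k nodes
        return {"B": 2, "d": 2, "D": k // 2}

    def hypercube(n):       # binary n-cube with 2**n nodes
        return {"B": 2 ** (n - 1), "d": n, "D": n}

    def torus_2d(k):        # k-ary 2-cube (k x k torus)
        return {"B": 2 * k, "d": 4, "D": 2 * (k // 2)}

    print("8-ary 1-cube:", ring(8))        # {'B': 2, 'd': 2, 'D': 4}
    print("4-cube      :", hypercube(4))   # {'B': 8, 'd': 4, 'D': 4}
    print("8-ary 2-cube:", torus_2d(8))    # {'B': 16, 'd': 4, 'D': 8}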
Interconnection Networks Performance Issues
✓ Bandwidth of a crossbar
o Define the bandwidth for the crossbar as the average number of requests that can be accepted by a crossbar
in a given cycle.
o As processors make requests for memory modules in a crossbar, contention can take place when two or
more processors request access to the same memory module.
o Example: the case of a crossbar consisting of three processors p1, p2, and p3 and three memory modules M1, M2, and M3.
o As processors make requests for accessing memory modules, the following cases may take place.
1. All three processors request access to the same memory module: in this case, only one request can be accepted. Since there are three memory modules, there are three ways (3 accepted requests in total) in which such a case can arise.
2. The three processors request access to exactly two different memory modules: in this case two requests can be granted. There are 18 ways (find why), giving 36 accepted requests in total, in which such a case can arise.
3. All three processors request access to three different memory modules: in this case all three requests can be granted. There are six ways (find why), giving 18 accepted requests in total, in which such a case can arise.
o Out of the 27 possible request combinations (each of the three processors requesting one of the three modules), a total of 57 requests can be accepted (causing no memory contention).
o We therefore say that the bandwidth of such a crossbar is BW = 57/27 = 2.11, assuming that all processors make requests for memory-module access in every cycle.
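The 57/27 figure can be verified by brute force. The sketch below enumerates every possible request pattern (each processor requesting one module per cycle) and averages the number of requests that can be granted, i.e., the number of distinct modules requested:

    from itertools import product

    def crossbar_bandwidth(num_processors, num_modules):
        accepted = 0
        patterns = 0
        for pattern in product(range(num_modules), repeat=num_processors):
            accepted += len(set(pattern))   # one request per distinct module is granted
            patterns += 1
        return accepted / patterns

    print(crossbar_bandwidth(3, 3))   # 57 / 27 = 2.111...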
Scalability Of Parallel Architectures
• Definition: A parallel architecture is said to be scalable if it can be expanded
(reduced) to a larger (smaller) system with a linear increase (decrease) in its
performance (cost).
• This general definition indicates the desirability of providing an equal chance of scaling a system up for improved performance and of scaling it down for greater cost-effectiveness and/or affordability.
• Scalability is used as a measure of the system's ability to provide increased performance, e.g., speed, as its size is increased.
• Scalability is a reflection of the system’s ability to efficiently utilize the increased
processing resources.
• Scalability of a system can be manifested in a number of forms. These forms
include speed, efficiency, size, applications, generation, and heterogeneity.
Scalability Of Parallel Architectures
✓ Speed
o A scalable system is capable of increasing its speed in proportion to the increase in the number of processors.
o Example:
▪ Consider the case of adding m numbers on a 4-cube (n = 16 processors) parallel system.
▪ Assume for simplicity that m is a multiple of n, e.g., 32, 64, ….
▪ Assume also that originally each processor has m/n numbers stored in its local memory.
▪ The addition can then proceed as follows.
▪ First: each processor adds its own m/n numbers sequentially in m/n steps. This addition is performed simultaneously in all processors.
▪ Second: each pair of neighboring processors communicates the result of one of them to the other, where the communicated result is added to the local result.
▪ The second step is repeated log2(n) times, until the final result of the addition process is stored in one of the processors.
▪ Assuming that each computation and each communication takes one time unit, the time needed to perform the addition of these m numbers is
    T_p = m/n + 2 log2(n)
▪ Recall that the time required to perform the same operation on a single processor is T_s = m.
▪ Therefore, the speedup is given by
    S = m / (m/n + 2 log2(n))
Scalability Of Parallel Architectures
The possible speedup for different values of m and n:
m        n = 2    n = 4    n = 8    n = 16   n = 32
64       1.88     3.2      4.57     5.33     5.33
128      1.94     3.55     5.82     8.00     9.14
256      1.97     3.76     6.74     10.67    14.23
512      1.98     3.88     7.31     12.8     19.70
1024     1.99     3.94     7.64     14.23    24.38
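The table can be reproduced directly from S = m / (m/n + 2 log2(n)), as in the following sketch (one time unit per addition and per communication, as assumed above):

    from math import log2

    def speedup(m, n):
        return m / (m / n + 2 * log2(n))

    ns = (2, 4, 8, 16, 32)
    print("m      " + "".join(f"n={n:<6}" for n in ns))
    for m in (64, 128, 256, 512, 1024):
        print(f"{m:<7}" + "".join(f"{speedup(m, n):<8.2f}" for n in ns))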
✓ Efficiency
o Consider, for example, the above problem of adding m numbers on an n-cube. The efficiency of such a system is defined as the ratio between the actual speedup, S, and the ideal speedup, n. Therefore,
    ξ = S / n = m / (m + 2 n log2(n))
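A small check of the efficiency formula (the (m, n) pairs below are illustrative); the values agree with the speedup table, e.g., for m = 256 and n = 16 the efficiency is 10.67/16 ≈ 0.67:

    from math import log2

    def efficiency(m, n):
        # Efficiency of adding m numbers on an n-cube: m / (m + 2*n*log2(n))
        return m / (m + 2 * n * log2(n))

    for m, n in ((256, 16), (1024, 16), (1024, 32)):
        print(f"m = {m:4d}, n = {n:2d}: efficiency = {efficiency(m, n):.2f}")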