Module 2: Performance Analysis of Multiprocessor Architectures (Students' Version)
Term 2024-2025
Computational Models
1. Equal Duration Model
✓ It is assumed that a given task can be divided into n equal subtasks, each of which can be executed by one processor.
✓ If t_s is the execution time of the whole task using a single processor, then the time taken by each processor to execute its subtask is t_m = t_s / n.
✓ Definition: The speedup factor S(n) of a parallel system is the ratio of the time taken by a single processor to solve a given problem instance to the time taken by a parallel system consisting of n processors to solve the same problem instance.
✓ Under the equal duration model, the speedup factor is therefore
    S(n) = t_s / t_m = t_s / (t_s / n) = n
✓ Example: consider adding two vectors a and b element-wise and then updating both vectors using the average of the resulting vector c:
    for i ← 1 to n
        c(i) ← a(i) + b(i)          ; done in parallel: each processor performs one addition
    for j ← 1 to n
        sum ← sum + c(j)            ; cannot be done in parallel: the elements of c must be gathered and accumulated on a single processor
    average ← sum / n
    for k ← 1 to n
        a(k) ← a(k) − average
        b(k) ← b(k) − average       ; done in parallel: each processor updates its own elements of a and b
✓ This illustrative example shows that a realistic computational model should assume the existence of (serial) parts of the given task that cannot be divided among processors; this is exactly what the next model assumes.
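A minimal sequential Python sketch of the example above (the data values are illustrative); the comments mark which phases could be assigned to the n processors and which could not:

    def update_vectors(a, b):
        n = len(a)
        # Phase 1 - parallelizable: each processor could compute one element of c.
        c = [a[i] + b[i] for i in range(n)]
        # Phase 2 - not parallelizable in this form: the elements of c must be
        # gathered and accumulated into a single variable on one processor.
        total = 0.0
        for j in range(n):
            total += c[j]
        average = total / n
        # Phase 3 - parallelizable: each processor could update its own a(k) and b(k).
        a = [a[k] - average for k in range(n)]
        b = [b[k] - average for k in range(n)]
        return a, b

    # Example usage with illustrative data: a + b = [5, 5, 5, 5], so average = 5.
    print(update_vectors([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]))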
Computational Models
2. Parallel Computation with Serial Sections Model
✓ It is assumed that a fraction f of the given task (computation) cannot be divided into concurrent subtasks.
✓ The remaining fraction (1 − f) is assumed to be divisible into concurrent subtasks.
✓ Performing similar derivations to those done in the case of the equal duration model will result in the following.
✓ The time required to execute the task on n processors is
    t_m = f t_s + (1 − f) t_s / n
✓ The speedup factor is therefore given by
    S(n) = t_s / t_m = t_s / (f t_s + (1 − f) t_s / n) = n / (1 + (n − 1) f)
✓ Result: The potential speedup due to the use of n processors is determined primarily by the
fraction of code that cannot be divided.
✓ If the task is completely serial, i.e. f = 1, then no speedup can be achieved regardless of the
number of processors used.
✓ This principle is known as Amdahl's law.
✓ According to this law, the maximum speedup factor is given by
    lim_{n→∞} S(n) = 1 / f
✓ According to Amdahl’s law the improvement in performance (speedup) of a parallel
algorithm over a sequential one is limited not by the number of processors employed but
rather by the fraction of the algorithm that cannot be parallelized.
✓ For some time, and according to Amdahl's law, researchers were led to believe that a substantial increase in the speedup factor would not be possible by using parallel architectures.
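A small Python sketch of the serial-sections (Amdahl) speedup formula and its limit; the serial fractions f used below are illustrative values, not taken from the slides:

    def amdahl_speedup(n, f):
        # Serial-sections model: S(n) = n / (1 + (n - 1) * f)
        return n / (1 + (n - 1) * f)

    for f in (0.05, 0.10, 0.25):
        print(f"f = {f:.2f}: S(16) = {amdahl_speedup(16, f):.2f}, "
              f"S(1024) = {amdahl_speedup(1024, f):.2f}, limit 1/f = {1 / f:.1f}")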
Computational Models
✓ As stated earlier, the communication overhead should be included in the processing time.
✓ Considering the time t_c incurred due to this communication overhead, the speedup factor is given by
    S(n) = t_s / (f t_s + (1 − f) t_s / n + t_c) = n / (f (n − 1) + 1 + n t_c / t_s)
✓ The maximum speedup factor under such conditions is given by
    lim_{n→∞} S(n) = lim_{n→∞} n / (f (n − 1) + 1 + n t_c / t_s) = 1 / (f + t_c / t_s)
✓ The above equation indicates that the maximum speed-up factor is determined not by the
number of parallel processors employed but by the fraction of the computation that is not
parallelized and the communication overhead.
✓ Recall that the efficiency ξ is defined as the ratio between the speedup factor and the number of processors, n. The efficiency can be computed as follows:
    ξ (no communication overhead)   = S(n) / n = 1 / (1 + (n − 1) f)
    ξ (with communication overhead) = 1 / (f (n − 1) + 1 + n t_c / t_s)
✓ As the number of processors increases, it may become difficult to use those processors efficiently.
✓ In order to maintain a certain level of processor efficiency, there should exist a relationship between the fraction of serial computation, f, and the number of processors employed.
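The following Python sketch evaluates the speedup and efficiency formulas above; the serial fraction f and the ratio t_c/t_s are illustrative values chosen only to show how efficiency drops as n grows:

    def speedup(n, f, tc_over_ts):
        # S(n) = n / (f*(n - 1) + 1 + n * tc/ts)
        return n / (f * (n - 1) + 1 + n * tc_over_ts)

    f, ratio = 0.05, 0.02          # assumed: 5% serial fraction, tc = 2% of ts
    for n in (4, 16, 64, 256):
        s = speedup(n, f, ratio)
        print(f"n = {n:3d}: S(n) = {s:6.2f}, efficiency = {s / n:.2f}")
    print("asymptotic limit 1/(f + tc/ts) =", round(1 / (f + ratio), 2))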
Interconnection Networks Performance Issues
o Definition:
Channel Bisection Width of a network (B): is the minimum number of wires that, when cut, divide the network into
equal halves with respect to the number of nodes.
o Definition: The wire bisection is the number of wires crossing this cut of the network.
o Example: the bisection width of a 4-cube is B = 8.
The table below provides some numerical values of the above topological characteristics for sample static networks.
Network Configuration    Bisection Width (B)    Node Degree (d)    Diameter (D)
8-ary 1-cube             2                      2                  4
4-cube                   8                      4                  4
3 × 3 × 2 mesh           9                      3                  5
8-ary 2-cube             16                     4                  8
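As a cross-check of the table, the standard closed-form expressions for a ring (k-ary 1-cube), a binary hypercube, and a 2-D torus (k-ary 2-cube) can be evaluated directly; the 3 × 3 × 2 mesh row is left out of this sketch:

    def ring(k):            # k-ary 1-cube with k nodes
        return {"B": 2, "d": 2, "D": k // 2}

    def hypercube(n):       # binary n-cube with 2**n nodes
        return {"B": 2 ** (n - 1), "d": n, "D": n}

    def torus_2d(k):        # k-ary 2-cube (k x k torus)
        return {"B": 2 * k, "d": 4, "D": 2 * (k // 2)}

    print("8-ary 1-cube:", ring(8))        # {'B': 2, 'd': 2, 'D': 4}
    print("4-cube      :", hypercube(4))   # {'B': 8, 'd': 4, 'D': 4}
    print("8-ary 2-cube:", torus_2d(8))    # {'B': 16, 'd': 4, 'D': 8}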
Interconnection Networks Performance Issues
✓ Bandwidth of a crossbar
o Define the bandwidth for the crossbar as the average number of requests that can be accepted by a crossbar
in a given cycle.
o As processors make requests for memory modules in a crossbar, contention can take place when two or
more processors request access to the same memory module.
o Example: the case of a crossbar consisting of three processors p1, p2, and p3 and three memory modules M1, M2, and M3.
o As processors make requests for accessing memory modules, the following cases may take place.
1. All three processors request access to the same memory module: in this case, only one request can be accepted. Since there are three memory modules, there are three ways (3 accepted requests in total) in which such a case can arise.
2. The three processors request access to exactly two different memory modules: in this case two requests can be granted. There are 18 ways (find why), giving 36 accepted requests in total, in which such a case can arise.
3. All three processors request access to three different memory modules: in this case all three requests can be granted. There are six ways (find why), giving 18 accepted requests in total, in which such a case can arise.
o Out of the 27 possible request combinations (each of the three processors requesting one of the three modules), a total of 57 requests can be accepted (causing no memory contention).
o We therefore say that the bandwidth of such a crossbar is BW = 57/27 = 2.11, assuming that all processors make requests for memory-module access in every cycle.
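The 57/27 figure can be verified by brute force. The sketch below enumerates every possible request pattern (each processor requesting one module per cycle) and averages the number of requests that can be granted, i.e., the number of distinct modules requested:

    from itertools import product

    def crossbar_bandwidth(num_processors, num_modules):
        accepted = 0
        patterns = 0
        for pattern in product(range(num_modules), repeat=num_processors):
            accepted += len(set(pattern))   # one request per distinct module is granted
            patterns += 1
        return accepted / patterns

    print(crossbar_bandwidth(3, 3))   # 57 / 27 = 2.111...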
Scalability Of Parallel Architectures
• Definition: A parallel architecture is said to be scalable if it can be expanded
(reduced) to a larger (smaller) system with a linear increase (decrease) in its
performance (cost).
• This general definition indicates the desirability of providing an equal chance of scaling a system up for improved performance and of scaling it down for greater cost-effectiveness and/or affordability.
• Scalability is used as a measure of the system's ability to provide increased performance, e.g., speed, as its size is increased.
• Scalability is a reflection of the system’s ability to efficiently utilize the increased
processing resources.
• Scalability of a system can be manifested in a number of forms. These forms
include speed, efficiency, size, applications, generation, and heterogeneity.
Scalability Of Parallel Architectures
✓ Speed
o A scalable system is capable of increasing its speed in proportion to the increase in the number of processors.
o Example:
▪ Consider the case of adding m numbers on a 4-cube (n = 16 processors) parallel system.
▪ Assume for simplicity that m is a multiple of n, e.g., 32, 64, ….
▪ Assume also that originally each processor has m/n numbers stored in its local memory.
▪ The addition can then proceed as follows.
▪ First: each processor adds its own m/n numbers sequentially in m/n steps. This addition is performed simultaneously in all processors.
▪ Second: each pair of neighboring processors communicates the result of one of them to the other, where the communicated result is added to the local result.
▪ The second step is repeated log2(n) times, until the final result of the addition process is stored in one of the processors.
▪ Assuming that each computation and each communication takes one time unit, the time needed to perform the addition of these m numbers is
    T_p = m/n + 2 log2(n)
▪ Recall that the time required to perform the same operation on a single processor is T_s = m.
▪ Therefore, the speedup is given by
    S = m / (m/n + 2 log2(n))
Scalability Of Parallel Architectures
The possible speedup for different values of m and n:
m        n = 2    n = 4    n = 8    n = 16   n = 32
64       1.88     3.2      4.57     5.33     5.33
128      1.94     3.55     5.82     8.00     9.14
256      1.97     3.76     6.74     10.67    14.23
512      1.98     3.88     7.31     12.8     19.70
1024     1.99     3.94     7.64     14.23    24.38
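The table can be reproduced directly from S = m / (m/n + 2 log2(n)), as in the following sketch (one time unit per addition and per communication, as assumed above):

    from math import log2

    def speedup(m, n):
        return m / (m / n + 2 * log2(n))

    ns = (2, 4, 8, 16, 32)
    print("m      " + "".join(f"n={n:<6}" for n in ns))
    for m in (64, 128, 256, 512, 1024):
        print(f"{m:<7}" + "".join(f"{speedup(m, n):<8.2f}" for n in ns))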
✓ Efficiency
o Consider, for example, the above problem of adding m numbers on an n-cube. The efficiency of such a system is defined as the ratio between the actual speedup, S, and the ideal speedup, n. Therefore,
    ξ = S / n = m / (m + 2 n log2(n))
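A small check of the efficiency formula (the (m, n) pairs below are illustrative); the values agree with the speedup table, e.g., for m = 256 and n = 16 the efficiency is 10.67/16 ≈ 0.67:

    from math import log2

    def efficiency(m, n):
        # Efficiency of adding m numbers on an n-cube: m / (m + 2*n*log2(n))
        return m / (m + 2 * n * log2(n))

    for m, n in ((256, 16), (1024, 16), (1024, 32)):
        print(f"m = {m:4d}, n = {n:2d}: efficiency = {efficiency(m, n):.2f}")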