Principles of Scalable Performance
• Performance measures
• Speedup laws
• Scalability principles
• Scaling up vs. scaling down
Performance metrics and measures
• Parallelism profiles
Parallelism profile in Programs
Degree of parallelism
• The parallelism profile is a plot of the degree of parallelism (DOP) as a function of time
• The profile depends on:
• Algorithm structure
• Program optimization
• Resource utilization
• Run-time conditions
Degree of parallelism
• n: number of homogeneous processors in the system
Average parallelism
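• Continuous form, over the interval $[t_1, t_2]$:
$$A = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} \mathrm{DOP}(t)\,dt$$
• Discrete form, where $t_i$ is the total time during which DOP = $i$ and $m$ is the maximum DOP:
$$A = \frac{\sum_{i=1}^{m} i \cdot t_i}{\sum_{i=1}^{m} t_i}$$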
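The discrete form of the average parallelism, A = (sum of i * t_i) / (sum of t_i), can be illustrated with a small sketch. The profile values below are made up for illustration and are not taken from the slides; the function name `average_parallelism` is likewise hypothetical.

```python
def average_parallelism(profile):
    """Average parallelism A for a discrete parallelism profile.

    profile: iterable of (dop, duration) pairs, where `duration` is the
    total time t_i during which the degree of parallelism equals `dop`.
    """
    profile = list(profile)
    total_time = sum(t for _, t in profile)
    if total_time <= 0:
        raise ValueError("profile must cover a positive amount of time")
    # A = sum(i * t_i) / sum(t_i)
    return sum(dop * t for dop, t in profile) / total_time


# Hypothetical profile: DOP 1 for 2 s, DOP 2 for 3 s, DOP 4 for 5 s.
example_profile = [(1, 2.0), (2, 3.0), (4, 5.0)]
A = average_parallelism(example_profile)
print(f"average parallelism = {A:.2f}")
```

For this made-up profile the weighted sum is 1*2 + 2*3 + 4*5 = 28 over 10 seconds, so A = 2.8.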
Example: parallelism profile and average parallelism
Available Parallelism
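• Execution time on a single processor:
$$T(1) = \sum_{i=1}^{m} t_i(1) = \sum_{i=1}^{m} \frac{W_i}{\Delta}$$
• Execution time with unlimited processors (response time):
$$T(\infty) = \sum_{i=1}^{m} t_i(\infty) = \sum_{i=1}^{m} \frac{W_i}{i\,\Delta}$$
• Asymptotic speedup:
$$S_\infty = \frac{T(1)}{T(\infty)} = \frac{\sum_{i=1}^{m} W_i}{\sum_{i=1}^{m} W_i / i}$$
• $W_i$ is the work executed with DOP = $i$; $\Delta$ is the computing capacity of a single processor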
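As a rough numerical sketch of the asymptotic speedup, the snippet below evaluates T(1) = sum(W_i / delta), T(inf) = sum(W_i / (i * delta)), and S_inf = T(1) / T(inf). The workload values and the name `asymptotic_speedup` are assumptions for illustration; `delta` stands for the computing capacity of one processor and defaults to 1.

```python
def asymptotic_speedup(work, delta=1.0):
    """Return (T1, Tinf, S_inf) for a workload vector.

    work[i - 1] holds W_i, the amount of work executed with DOP = i.
    delta is the computing capacity of a single processor (assumed 1.0).
    """
    # One processor executes every piece of work serially.
    t1 = sum(w / delta for w in work)
    # With unlimited processors, work W_i runs on i processors at once.
    t_inf = sum(w / (i * delta) for i, w in enumerate(work, start=1))
    if t_inf <= 0:
        raise ValueError("workload must contain some work")
    return t1, t_inf, t1 / t_inf


# Hypothetical workload: W_1 = 2, W_2 = 6, W_3 = 0, W_4 = 20.
t1, t_inf, s_inf = asymptotic_speedup([2.0, 6.0, 0.0, 20.0])
print(f"T(1) = {t1}, T(inf) = {t_inf}, S_inf = {s_inf:.2f}")
```

With these made-up numbers T(1) = 28 and T(inf) = 2 + 3 + 0 + 5 = 10, giving S_inf = 2.8. When W_i = i * delta * t_i, this ratio coincides with the average parallelism of the corresponding profile.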
Performance measures
Arithmetic mean performance
• Arithmetic mean execution rate (assumes equal weighting):
$$R_a = \sum_{i=1}^{m} \frac{R_i}{m}$$
• Weighted arithmetic mean execution rate:
$$R_a^* = \sum_{i=1}^{m} f_i R_i$$
• Proportional to the sum of the inverses of the execution times
Geometric mean performance
• Geometric mean execution rate:
$$R_g = \prod_{i=1}^{m} R_i^{1/m}$$
• Weighted geometric mean execution rate:
$$R_g^* = \prod_{i=1}^{m} R_i^{f_i}$$
• Does not summarize the real performance, since it has no inverse relation with the total time
Harmonic mean performance
• Arithmetic mean execution time per instruction:
$$T_a = \frac{1}{m} \sum_{i=1}^{m} T_i = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{R_i}$$
• Harmonic mean execution rate:
$$R_h = \frac{1}{T_a} = \frac{m}{\sum_{i=1}^{m} (1 / R_i)}$$
• Weighted harmonic mean execution rate:
$$R_h^* = \frac{1}{\sum_{i=1}^{m} (f_i / R_i)}$$
• Corresponds to the total number of operations divided by the total time (closest to the real performance)
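A small sketch can make the difference between the three means concrete. The two execution rates and the instruction count below are invented for illustration, and the function names are hypothetical. The comparison against the observed rate assumes every program executes the same number of instructions, which is the equal-weighting case.

```python
import math


def arithmetic_mean_rate(rates):
    # R_a = (1/m) * sum(R_i)
    return sum(rates) / len(rates)


def geometric_mean_rate(rates):
    # R_g = (prod(R_i)) ** (1/m)
    return math.prod(rates) ** (1.0 / len(rates))


def harmonic_mean_rate(rates):
    # R_h = m / sum(1 / R_i)
    return len(rates) / sum(1.0 / r for r in rates)


# Hypothetical rates in MIPS for two programs of equal length.
rates = [10.0, 40.0]
instructions = 40.0  # millions of instructions per program (assumed)

# Observed rate: total instructions divided by total execution time.
times = [instructions / r for r in rates]
observed = len(rates) * instructions / sum(times)

print("arithmetic:", arithmetic_mean_rate(rates))
print("geometric: ", geometric_mean_rate(rates))
print("harmonic:  ", harmonic_mean_rate(rates))
print("observed:  ", observed)
```

Here the two programs take 4 s and 1 s, so the observed rate is 80 / 5 = 16 MIPS. Only the harmonic mean reproduces that figure; the arithmetic mean (25) and the geometric mean (20) both overstate it.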
Harmonic Mean Speedup
$$S = \frac{1}{\sum_{i=1}^{n} f_i / R_i}$$
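The harmonic mean speedup, S = 1 / sum(f_i / R_i), can be evaluated directly once a weight distribution is chosen. The sketch below assumes R_i = i (mode i runs i times faster than sequential execution) unless other rates are supplied; the uniform weights and the name `harmonic_mean_speedup` are illustrative choices, not values from the slides.

```python
def harmonic_mean_speedup(weights, rates=None):
    """S = 1 / sum(f_i / R_i) over n execution modes.

    weights: f_1..f_n, the fraction of the workload run in each mode.
    rates:   R_1..R_n; defaults to R_i = i (an assumption of this sketch).
    """
    n = len(weights)
    if rates is None:
        rates = range(1, n + 1)
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return 1.0 / sum(f / r for f, r in zip(weights, rates))


# Hypothetical uniform distribution over n = 4 modes.
s_uniform = harmonic_mean_speedup([0.25, 0.25, 0.25, 0.25])
print(f"speedup with uniform weights = {s_uniform:.2f}")
```

For the uniform case the denominator is 0.25 * (1 + 1/2 + 1/3 + 1/4) = 25/48, so the speedup is 1.92, well below the 4 processors available. Shifting weight toward the sequential mode lowers it further.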
Harmonic Mean Speedup Performance
Amdahl’s Law
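• Assume $R_i = i$ and $w = (\alpha, 0, 0, \ldots, 1-\alpha)$
• The system is either sequential, with probability $\alpha$, or fully parallel, with probability $1-\alpha$
$$S_n = \frac{n}{1 + (n-1)\alpha}$$
• Implies $S_n \to 1/\alpha$ as $n \to \infty$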
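To see how quickly the bound takes hold, the following sketch evaluates Amdahl's law, S_n = n / (1 + (n - 1) * alpha), for a few machine sizes. The sequential fraction alpha = 0.1 is an arbitrary illustrative value, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(n, alpha):
    """Speedup on n processors when a fraction alpha is sequential."""
    if n < 1 or not 0.0 <= alpha <= 1.0:
        raise ValueError("need n >= 1 and 0 <= alpha <= 1")
    return n / (1.0 + (n - 1) * alpha)


alpha = 0.1  # assumed sequential fraction
for n in (1, 16, 256, 1024):
    print(f"n = {n:5d}  S_n = {amdahl_speedup(n, alpha):.3f}")
print(f"upper bound 1/alpha = {1.0 / alpha:.1f}")
```

With alpha = 0.1, sixteen processors give a speedup of 16 / 2.5 = 6.4, and no number of processors can exceed 1/alpha = 10.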
Speedup Performance
System Efficiency
Redundancy and Utilization
$$U(n) = R(n)\,E(n) = \frac{O(n)}{n\,T(n)}$$
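The identity for U(n) can be checked by substitution. The step below is a sketch that assumes the usual definitions of redundancy and efficiency, R(n) = O(n)/O(1) and E(n) = T(1)/(n T(n)), together with unit-time operations on a uniprocessor so that O(1) = T(1). These definitions are assumptions here, since they are not spelled out in the surviving slide text.

```latex
\begin{align*}
U(n) &= R(n)\,E(n) \\
     &= \frac{O(n)}{O(1)} \cdot \frac{T(1)}{n\,T(n)} \\
     &= \frac{O(n)}{n\,T(n)} \qquad \text{(using } O(1) = T(1)\text{)}
\end{align*}
```

Read this way, U(n) is the fraction of the n T(n) available processor-time slots that are actually filled with operations.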
Quality of Parallelism
$$Q(n) = \frac{S(n)\,E(n)}{R(n)} = \frac{T^3(1)}{n\,T^2(n)\,O(n)}$$
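The metrics can be computed together from four measured quantities. This is a minimal sketch under stated assumptions: S(n) = T(1)/T(n), E(n) = S(n)/n, R(n) = O(n)/O(1), and O(1) = T(1) (unit-time operations on one processor). The input numbers and the function name `parallel_metrics` are hypothetical.

```python
def parallel_metrics(n, t1, tn, on):
    """Speedup, efficiency, redundancy, utilization, and quality.

    n:  number of processors
    t1: T(1), execution time on one processor
    tn: T(n), execution time on n processors
    on: O(n), total operations performed by the n-processor run
    Assumes O(1) = T(1), i.e. one operation per unit time on one processor.
    """
    o1 = t1
    speedup = t1 / tn
    efficiency = speedup / n
    redundancy = on / o1
    utilization = redundancy * efficiency       # equals O(n) / (n * T(n))
    quality = speedup * efficiency / redundancy  # equals T(1)^3 / (n T(n)^2 O(n))
    return {
        "S": speedup,
        "E": efficiency,
        "R": redundancy,
        "U": utilization,
        "Q": quality,
    }


# Hypothetical measurements: 4 processors, T(1) = 100, T(4) = 40, O(4) = 120.
metrics = parallel_metrics(n=4, t1=100.0, tn=40.0, on=120.0)
for name, value in metrics.items():
    print(f"{name}(n) = {value:.4f}")
```

For these made-up inputs S = 2.5, E = 0.625, R = 1.2, and U = 0.75. The quality Q is about 1.302, which matches T(1)^3 / (n T(n)^2 O(n)) = 10^6 / 768000.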
Example of Performance
Standard Performance Measures
• Dhrystone results
• CPU-intensive benchmark
• Consists of 100 high-level language instructions and data types
• Balanced with respect to statement type, data type, and locality of reference, with no operating system calls and no use of library functions or subroutines
• Measures the integer performance of modern processors
• Whetstone results
• Fortran-based synthetic benchmark
• Measures floating-point performance
• Includes both integer and floating-point operations involving array indexing, subroutine calls, parameter passing, and conditional branching
• Performance depends on the compilers used