Analysis
CPS343
Spring 2016
CPS343 (Parallel and HPC) Performance Metrics, Prediction, and Measurement Spring 2016 1 / 32
Outline
Acknowledgements
Speedup
Speedup is defined as
\[
\text{Speedup on $N$ processors} = \frac{\text{sequential execution time}}{\text{execution time on $N$ processors}} = \frac{t_{\mathrm{seq}}}{t_{\mathrm{par}}}
\]
Efficiency
Efficiency is defined as
\[
\text{Efficiency} = \frac{\text{speedup}}{N} = \frac{t_{\mathrm{seq}}}{N \cdot t_{\mathrm{par}}},
\qquad 0 \le \text{Efficiency} \le 1.
\]
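These two definitions can be computed directly. A minimal sketch (the function names and timings below are illustrative, not from the course materials):

```python
def speedup(t_seq, t_par):
    """Speedup: sequential execution time / parallel execution time."""
    return t_seq / t_par

def efficiency(t_seq, t_par, n_procs):
    """Efficiency = speedup / N; lies between 0 and 1 for N >= 1."""
    return speedup(t_seq, t_par) / n_procs

# A hypothetical program taking 120 s serially and 20 s on 8 processors:
# speedup(120, 20) -> 6.0, efficiency(120, 20, 8) -> 0.75
```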
Speedup
\[
\psi(n, N) = \frac{\sigma(n) + \phi(n)}{\sigma(n) + \phi(n)/N + \kappa(n, N)}
\]
where $\sigma(n)$ is the time spent in the inherently sequential part of the computation, $\phi(n)$ the time in the parallelizable part, and $\kappa(n, N)$ the parallel overhead (communication and synchronization).
Speedup
Since $\kappa(n, N) \ge 0$, dropping the overhead term gives an upper bound:
\[
\psi(n, N) \le \frac{\sigma(n) + \phi(n)}{\sigma(n) + \phi(n)/N}
\]
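A small sketch (with hypothetical values for σ, φ, and κ) showing that the overhead term can only lower the speedup, which is why dropping it yields an upper bound:

```python
def psi(sigma, phi, n_procs, kappa=0.0):
    """Speedup model (sigma + phi) / (sigma + phi/N + kappa), where
    sigma is sequential time, phi parallelizable time, and kappa
    the parallel overhead."""
    return (sigma + phi) / (sigma + phi / n_procs + kappa)

# Hypothetical times: sigma = 10 s, phi = 90 s, N = 9 processors.
bound = psi(10, 90, 9)            # no overhead: 100 / 20 = 5.0
actual = psi(10, 90, 9, kappa=5)  # with 5 s overhead: 100 / 25 = 4.0
assert actual <= bound
```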
Amdahl’s Law
Simplifying, we have
\[
\psi(N) = \frac{t_{\mathrm{seq}}}{t_{\mathrm{par}}} \le \frac{1}{(1 - \alpha) + \alpha/N}
\]
where $\alpha$ is the fraction of the computation that can be parallelized.
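Amdahl's Law is easy to tabulate. A minimal sketch, with α = 0.95 chosen as an illustrative value:

```python
def amdahl_speedup(alpha, n_procs):
    """Amdahl's Law upper bound on speedup when fraction alpha of
    the computation is parallelizable."""
    return 1.0 / ((1.0 - alpha) + alpha / n_procs)

# With 95% of the work parallelizable, the speedup is capped at
# 1 / (1 - 0.95) = 20 no matter how many processors are used.
for n in (1, 10, 100, 1000):
    print(n, amdahl_speedup(0.95, n))
```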
Amdahl’s Law speedup prediction
Amdahl’s Law efficiency prediction
Amdahl’s Law example
Suppose a serial program reads n data items from a file, performs some computation, and then writes n data items back out to another file. The I/O time is measured and found to be 4500 + n µsec. If the computation portion takes n²/200 µsec, what is the maximum speedup we can expect when n = 10,000 and N processors are used?
We assume that the I/O must be done serially but that the computation
can be parallelized. Computing α we find
\[
\alpha = \frac{500000}{4500 + 10000 + 500000} = \frac{5000}{5145} \approx 0.97182
\]
so, by Amdahl's Law,
\[
\psi \le \frac{1}{\left(1 - \dfrac{5000}{5145}\right) + \dfrac{5000}{5145\,N}} = \frac{5145}{145 + 5000/N}.
\]
As $N \to \infty$ this bound approaches $5145/145 \approx 35.5$, so the speedup can never exceed about 35.5 regardless of the number of processors.
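The arithmetic in this example can be checked directly:

```python
n = 10_000
t_io = 4500 + n                    # 14,500 usec, assumed serial
t_comp = n**2 / 200                # 500,000 usec, parallelizable
alpha = t_comp / (t_io + t_comp)   # 5000/5145, about 0.97182

def psi_bound(n_procs):
    """Amdahl's Law bound for this example."""
    return 1.0 / ((1.0 - alpha) + alpha / n_procs)

# The bound approaches 5145/145 (about 35.5) as N grows, so no number
# of processors can push the speedup past roughly 35.5.
print(psi_bound(10))   # 5145/645, about 7.98
```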
The Gustafson-Barsis Law
Let $T$ be the execution time on $N$ processors and $\alpha$ the fraction of that time spent in the parallelizable part of the program. Run serially, the same work would take
\[
t_{\mathrm{seq}} = (1 - \alpha)T + \alpha \cdot T \cdot N
\]
so
\[
\psi \le \frac{t_{\mathrm{seq}}}{t_{\mathrm{par}}} = \frac{(1 - \alpha)T + \alpha T N}{T} = (1 - \alpha) + \alpha N.
\]
This is the Gustafson-Barsis Law (1988).
The speedup estimate it produces is sometimes called scaled speedup.
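Scaled speedup is a one-liner. The comparison below with Amdahl's prediction uses illustrative values; note that the two laws define α over different baselines (serial vs. parallel execution time), so this is only an indicative contrast:

```python
def gustafson_speedup(alpha, n_procs):
    """Gustafson-Barsis scaled speedup: (1 - alpha) + alpha * N."""
    return (1.0 - alpha) + alpha * n_procs

def amdahl_speedup(alpha, n_procs):
    """Amdahl's Law bound, for comparison."""
    return 1.0 / ((1.0 - alpha) + alpha / n_procs)

# alpha = 0.95, N = 64: Gustafson-Barsis predicts 0.05 + 0.95*64 = 60.85,
# while Amdahl caps the speedup below 20.
```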
Gustafson-Barsis Law speedup prediction
This is much more encouraging than what Amdahl’s Law showed us.
Gustafson-Barsis Law efficiency prediction
Gustafson-Barsis Law example
The laws compared
The Wikipedia page for Gustafson's Law offers a metaphor contrasting the two laws.
Speedup revisited
Parallel Execution Time
\[
t_p = t_{\mathrm{comp}} + t_{\mathrm{comm}}
\]
Speedup is then
\[
\psi = \frac{t_s}{t_p} = \frac{t_s}{t_{\mathrm{comp}} + t_{\mathrm{comm}}}
\]
The computation/communication ratio is $t_{\mathrm{comp}} / t_{\mathrm{comm}}$.
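A toy calculation (hypothetical timings) showing how the computation/communication ratio governs the achieved speedup:

```python
def parallel_speedup(t_serial, t_comp, t_comm):
    """Speedup when parallel time is computation plus communication."""
    return t_serial / (t_comp + t_comm)

# Hypothetical: 100 s serial job, 10 s of parallel computation.
# Comp/comm ratio of 10 (1 s of communication) vs. 1 (10 s):
fast_net = parallel_speedup(100, 10, 1)    # about 9.1
slow_net = parallel_speedup(100, 10, 10)   # 5.0
assert fast_net > slow_net
```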
Message transfer time
Typically the time for communication can be broken down into two parts: the time $t_{\mathrm{startup}}$ necessary for building the message and initiating the transfer, and the time $t_{\mathrm{data}}$ required per data item in the message. To a first approximation, the time to transfer a message of $n$ data items is
\[
t_{\mathrm{comm}} = t_{\mathrm{startup}} + n \cdot t_{\mathrm{data}}
\]
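The linear model suggests how the two parameters can be estimated in practice: time messages of two different sizes and solve for the slope and intercept. A sketch with made-up timings (the function name and numbers are illustrative):

```python
def fit_message_model(n1, t1, n2, t2):
    """Given measured transfer times t1, t2 (usec) for message sizes
    n1, n2 (items), recover the slope t_data and intercept t_startup
    of the model t = t_startup + n * t_data."""
    t_data = (t2 - t1) / (n2 - n1)
    t_startup = t1 - n1 * t_data
    return t_startup, t_data

# Made-up measurements: 50 us for 1,000 items, 500 us for 101,000 items.
t_startup, t_data = fit_message_model(1_000, 50.0, 101_000, 500.0)
# slope = 450/100000 = 0.0045 us/item, intercept = 50 - 4.5 = 45.5 us
```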
Workstation cluster timing data (Fall 2010)
Workstation cluster timing data (Spring 2016)
Minor Prophets cluster latency and bandwidth (2016)
Canaan cluster latency and bandwidth (2016)
LittleFe cluster latency and bandwidth (2013)
Cluster latency and bandwidth comparison
(panels: Minor Prophets, Canaan)
Cluster latency and bandwidth comparison