Cs2100 14 Understanding Performance
Understanding Performance
Performance Metrics
Purchasing perspective
  given a collection of machines, which has the
    best performance?
    least cost?
    best cost/performance?
Design perspective
  faced with design options, which has the
    best performance improvement?
    least cost?
    best cost/performance?
2011 Sem 1
Both require
basis for comparison
metric for evaluation
Technology cost
R&D
Manufacturing cost
Raw material
Cost of factory
Marketing cost
Moore's Law
The number of transistors that can be
inexpensively placed on an integrated circuit
is increasing exponentially, doubling
approximately every two years.
- Gordon Moore (1965), co-founder of Intel
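The doubling claim can be turned into a rough projection. The sketch below is only an illustration: the function name, the starting count, and the fixed two-year doubling period are assumptions made for the example, not figures from the slides.

```python
def projected_transistors(initial_count: float,
                          years_elapsed: float,
                          doubling_period_years: float = 2.0) -> float:
    """Project a transistor count assuming one doubling per period."""
    doublings = years_elapsed / doubling_period_years
    return initial_count * 2 ** doublings


# Hypothetical starting point: 1 million transistors, projected 10 years ahead.
# Ten years at one doubling every two years is five doublings (a factor of 32).
print(projected_transistors(1_000_000, 10))
```

Real devices only follow this curve approximately, so the result should be read as a trend, not a prediction.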
Moore's Law
From: Facing the Hot Chips Challenge Again, Bill Holt, Intel, presented at Hot Chips 17, 2005.
Intel Penryn
45 nm technology
Quad-core will have 820 million transistors
Dual-core will be 107 mm²
$$ \text{performance}_X = \frac{1}{\text{execution\_time}_X} $$
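A small sketch of this definition, assuming execution times are measured in seconds. The ratio of two performances is a common way to say "X is n times faster than Y"; the function names and the timings are hypothetical.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return 1.0 / execution_time_s


def relative_performance(time_x_s: float, time_y_s: float) -> float:
    """Return performance_X / performance_Y, which equals time_Y / time_X."""
    return performance(time_x_s) / performance(time_y_s)


# Hypothetical timings: machine X takes 10 s, machine Y takes 15 s.
n = relative_performance(10.0, 15.0)
print(f"X is about {n:.2f} times faster than Y")
```

With these made-up timings the ratio is 15 / 10, so X comes out about 1.5 times faster than Y.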
Performance Factors
We want to distinguish elapsed time from the time spent on our task.
CPU execution time (CPU time): the time the CPU spends working on a task.
It does not include time spent waiting for I/O or running other programs.
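$$ \text{CPU execution time for a program} = \text{\# CPU clock cycles for a program} \times \text{clock cycle time} $$
or
$$ \text{CPU execution time for a program} = \frac{\text{\# CPU clock cycles for a program}}{\text{clock rate}} $$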
Clock rate (MHz, GHz) is the inverse of clock cycle time (clock period)
CC = 1 / CR
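As a minimal sketch of CC = 1 / CR, the helpers below convert a clock rate in Hz to a clock cycle time in seconds and use it to turn a cycle count into CPU time. The function names and the 2 GHz and 4 billion cycle figures are assumptions chosen for illustration.

```python
def clock_cycle_time_s(clock_rate_hz: float) -> float:
    """CC = 1 / CR, with CR in Hz and CC in seconds."""
    return 1.0 / clock_rate_hz


def cpu_time_from_cycles(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU time = number of clock cycles x clock cycle time."""
    return clock_cycles * clock_cycle_time_s(clock_rate_hz)


# Hypothetical 2 GHz processor running a program that needs 4 billion cycles.
cycle_ns = clock_cycle_time_s(2e9) * 1e9
seconds = cpu_time_from_cycles(4e9, 2e9)
print(f"cycle time is about {cycle_ns:.2f} ns, CPU time is about {seconds:.2f} s")
```

Keeping the rate in Hz and the time in seconds avoids unit mistakes when mixing MHz, GHz, and nanoseconds.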
A   1
B   2
C   3
Effective CPI
$$ \text{Effective CPI} = \sum_{i=1}^{n} (CPI_i \times IC_i) $$
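A minimal sketch of the weighted sum, assuming IC_i is the fraction of executed instructions that belong to class i. The code starts from raw per-class counts and divides total cycles by total instructions, which is equivalent. The class names, CPI values, and counts are hypothetical.

```python
def effective_cpi(cpi_by_class: dict, count_by_class: dict) -> float:
    """Average cycles per instruction over a mix of instruction classes."""
    total_instructions = sum(count_by_class.values())
    if total_instructions == 0:
        raise ValueError("instruction mix is empty")
    # Total cycles = sum over classes of (CPI_i x number of class-i instructions).
    total_cycles = sum(cpi_by_class[name] * count
                       for name, count in count_by_class.items())
    return total_cycles / total_instructions


# Hypothetical classes A, B, C with CPI 1, 2, 3 and counts 2, 1, 2.
cpi = {"A": 1, "B": 2, "C": 3}
counts = {"A": 2, "B": 1, "C": 2}
print(effective_cpi(cpi, counts))  # (2*1 + 1*2 + 2*3) / 5 = 2.0
```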
$$ \text{CPU time} = \text{Instruction\_count} \times CPI \times \text{clock\_cycle} $$
or
$$ \text{CPU time} = \frac{\text{Instruction\_count} \times CPI}{\text{clock\_rate}} $$
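The equation can be checked with a few lines of code. This is a sketch under assumed inputs: the function name and the sample instruction count, CPI, and clock rate are hypothetical.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = Instruction_count x CPI / clock_rate."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 1 billion instructions, CPI of 2.2, 2 GHz clock.
t = cpu_time_s(1e9, 2.2, 2e9)
print(f"CPU time is about {t:.2f} s")
```

Halving the CPI or doubling the clock rate each halve the result, which is why the three factors are usually discussed together.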
Determinants of CPU Performance
CPU time = Instruction_count x CPI x clock_cycle
                          Instruction_count   CPI   clock_cycle
Algorithm
Programming language
Compiler
ISA
Processor organization
Technology
A Simple Example
Op       Freq   CPIi   Freq x CPIi
ALU      50%    1
Load     20%    5
Store    10%    3
Branch   20%    2
How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
How does this compare with using branch prediction to shave a cycle off the branch time?
Op       Freq   CPIi   Freq x CPIi   Load CPI = 2   Branch CPI = 1   ALU CPI = 0.5
ALU      50%    1      .5            .5             .5               .25
Load     20%    5      1.0           .4             1.0              1.0
Store    10%    3      .3            .3             .3               .3
Branch   20%    2      .4            .4             .2               .4
Sum                    2.2           1.6            2.0              1.95
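The column sums above can be reproduced with a short script. It is a sketch that assumes the instruction count and clock cycle stay the same in every scenario, so the speedup is simply the ratio of effective CPIs. The per-class CPI values (1, 5, 3, 2) are inferred from Freq x CPIi divided by Freq, and the helper names are hypothetical.

```python
# Instruction mix: op -> (frequency, CPI), taken from the base case.
BASE_MIX = {
    "ALU": (0.5, 1),
    "Load": (0.2, 5),
    "Store": (0.1, 3),
    "Branch": (0.2, 2),
}


def mix_cpi(mix: dict) -> float:
    """Effective CPI = sum of Freq x CPI over all operation classes."""
    return sum(freq * cpi for freq, cpi in mix.values())


def with_cpi(mix: dict, op: str, new_cpi: float) -> dict:
    """Return a copy of the mix with the CPI of one operation replaced."""
    changed = dict(mix)
    freq, _old_cpi = changed[op]
    changed[op] = (freq, new_cpi)
    return changed


base = mix_cpi(BASE_MIX)
scenarios = {
    "better data cache (Load CPI = 2)": with_cpi(BASE_MIX, "Load", 2),
    "branch prediction (Branch CPI = 1)": with_cpi(BASE_MIX, "Branch", 1),
    "faster ALU (ALU CPI = 0.5)": with_cpi(BASE_MIX, "ALU", 0.5),
}
for name, mix in scenarios.items():
    cpi = mix_cpi(mix)
    print(f"{name}: CPI = {cpi:.2f}, speedup = {base / cpi:.3f}")
```

Under these assumptions the data cache change gives a speedup of roughly 2.2 / 1.6 = 1.375, branch prediction roughly 2.2 / 2.0 = 1.1, and the ALU change roughly 2.2 / 1.95, or about 1.13.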
Benchmarking
$$ AM = \frac{1}{n} \sum_{i=1}^{n} Time_i $$
where Time_i is the execution time for the i-th program of a total of n programs in the workload.
A smaller mean indicates a smaller average execution time and thus improved performance.
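A minimal sketch of the arithmetic mean over a workload, assuming all times are in the same unit. The function name and the sample timings are hypothetical.

```python
def arithmetic_mean(times_s: list) -> float:
    """AM = (1/n) x sum of Time_i over the n programs in the workload."""
    if not times_s:
        raise ValueError("workload must contain at least one program")
    return sum(times_s) / len(times_s)


# Hypothetical execution times (seconds) for a three-program workload.
print(arithmetic_mean([2.0, 4.0, 6.0]))  # 4.0
```

Because the mean adds raw times, a single long-running program can dominate it.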
Integer benchmarks                          FP benchmarks
gzip       compression                      wupwise    Quantum chromodynamics
vpr                                         swim
gcc        GNU C compiler                   mgrid
mcf        Combinatorial optimization       applu      Parabolic/elliptic PDE
crafty     Chess program                    mesa       3D graphics library
parser                                      galgel
eon        Computer visualization           art
perlbmk    perl application                 equake
gap                                         facerec
vortex                                      ammp       Computational chemistry
bzip2      compression                      lucas      Primality testing
twolf                                       fma3d
                                            sixtrack
                                            apsi       Pollutant distribution
             perl application
bzip2        compression
gcc          GNU C compiler
mcf          Combinatorial optimization
go           Go program
hmmer        Gene sequencing
sjeng        AI pattern recognition
libquantum   Quantum computing
h264ref      Video compression
omnetpp
astar        Computer game
xalancbmk    XML processor
gamess
milc         Quantum Chromodynamics
zeusmp       Magnetohydrodynamics
gromacs      Molecular Dynamics
cactusADM
leslie3D
namd
dealII
soplex
povray       Computer Visualization
calculix     Structural Mechanics
gemFDTD      Computational Electromagnetics
tonto        Quantum Crystallography
lbm
wrf          Weather Forecasting
sphinx3      Speech Recognition
SPEC_rate
After the benchmarks are run on the system under test (SUT), a ratio for each of them is
calculated using the run time on the SUT and a SPEC-determined reference time.
From these ratios, the following metrics are calculated:
CINT2006 (for integer compute intensive performance comparisons):
* SPECint2006: The geometric mean of twelve normalized ratios - one for each
integer benchmark - when the benchmarks are compiled with peak tuning.
* SPECint_base2006: The geometric mean of twelve normalized ratios when the
benchmarks are compiled with base tuning.
* SPECint_rate2006: The geometric mean of twelve normalized throughput ratios
when the benchmarks are compiled with peak tuning.
* SPECint_rate_base2006: The geometric mean of twelve normalized throughput
ratios when the benchmarks are compiled with base tuning.
Similar metrics are used for the FP benchmarks.
In all cases, a higher score means "better performance" on the given workload.
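The ratio and geometric mean described above can be sketched as follows. The sketch assumes each ratio is the reference time divided by the run time on the SUT, which is consistent with a higher score meaning better performance. The function names and all timings are hypothetical and are not real SPEC results.

```python
import math


def spec_ratio(reference_time_s: float, sut_time_s: float) -> float:
    """Normalized ratio: reference time divided by run time on the SUT."""
    return reference_time_s / sut_time_s


def geometric_mean(ratios: list) -> float:
    """n-th root of the product of n ratios."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical reference and measured times (seconds) for three benchmarks.
reference_times = [100.0, 200.0, 400.0]
sut_times = [50.0, 50.0, 50.0]
ratios = [spec_ratio(ref, run) for ref, run in zip(reference_times, sut_times)]
print(f"ratios = {ratios}, score is about {geometric_mean(ratios):.2f}")
```

Here the ratios are 2, 4, and 8, so the geometric mean is the cube root of 64, about 4. Unlike the arithmetic mean of raw times, the geometric mean of ratios gives the same ranking whichever machine supplies the reference times.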
Notes
Performance improvement
Intel's chip
Desktop PC
Motherboard
Design-time metrics:
  Can it be implemented, in how long, at what cost?
  Can it be programmed? Ease of compilation?
Static Metrics:
  How many bytes does the program occupy in memory?
Dynamic Metrics:
  How many instructions are executed? How many bytes does the processor fetch to execute the program? (Inst. Count)
  How many clocks are required per instruction? (CPI)
  How "lean" a clock is practical? (Cycle Time)
Multicore processors
QX6850: 4 cores
E6850: 2 cores
END