Quantitative Principles
1
Recommended Book:
2
Performance
What do we mean when we say one computer
is faster than the other?
Response Time
Throughput
3
Performance
When comparing design alternatives, we relate the
performance of two different computers (say X and Y)
4
Performance
"X is n times faster than Y" means
$$\frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = n$$
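The ratio above can be sketched in a few lines of Python. The function name `times_faster` and the timings (10 s and 15 s) are illustrative assumptions, not measurements from the slides.

```python
def times_faster(extime_x: float, extime_y: float) -> float:
    """Return n such that "X is n times faster than Y".

    n = ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y)
    """
    if extime_x <= 0 or extime_y <= 0:
        raise ValueError("execution times must be positive")
    return extime_y / extime_x


# Hypothetical timings for the same program on both machines.
n = times_faster(extime_x=10.0, extime_y=15.0)
print(f"X is {n:.2f} times faster than Y")
```

Because performance is the reciprocal of execution time, the same number comes out whether we divide execution times (Y over X) or performances (X over Y).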
5
Performance
"The throughput of X is 1.3 times higher than Y" means that the
number of tasks completed per unit time on X is 1.3 times the
number completed on Y
6
Computer Performance:
Response Time (latency)
7
Computer Performance:
Throughput
8
Computer Performance:
If we upgrade a machine with a new processor what do we
increase?
9
Execution Time
CPU time
Doesn't count I/O or time spent running other programs
Can be broken up into system time, and user time
11
Quantitative Principles
Guidelines and principles that are useful in the
design and analysis of computers
12
Quantitative Principles
Digital Design
13
Quantitative Principles
Principle of Locality
Programs tend to reuse data and instructions they have used
recently
As a rule of thumb, a program spends 90% of its execution time
in only 10% of its code.
Temporal Locality
Recently accessed items are likely to be accessed in the near
future
Spatial Locality
Items whose addresses are near to one another tend to be
referenced close together in time
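To make the two kinds of locality concrete, here is a minimal sketch of a direct-mapped cache simulator. The cache geometry (64 blocks of 8 addresses) and the three access patterns are assumptions chosen for illustration. They do not come from the slides and do not model any real processor.

```python
def hit_rate(addresses, num_blocks=64, block_size=8):
    """Simulate a direct-mapped cache and return the fraction of hits."""
    if not addresses:
        raise ValueError("need at least one address")
    tags = [None] * num_blocks          # block number currently held in each slot
    hits = 0
    for addr in addresses:
        block = addr // block_size      # neighbouring addresses share a block
        index = block % num_blocks      # slot this block maps to
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block         # miss: fetch the block into the slot
    return hits / len(addresses)


sequential = list(range(4096))               # spatial locality: walk an array
small_loop = [i % 64 for i in range(4096)]   # temporal locality: reuse 64 addresses
strided = [i * 8 for i in range(4096)]       # no locality: new block on every access

print(hit_rate(sequential))  # 0.875: one miss per 8-address block
print(hit_rate(small_loop))  # close to 1.0: only the first pass misses
print(hit_rate(strided))     # 0.0: no block is ever reused
```

The sequential walk benefits only from spatial locality, the small loop mostly from temporal locality, and the strided walk from neither.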
14
Quantitative Principles
Focus on the Common Case
Make the common case fast and the other parts correct
15
Amdahl's Law
• Amdahl's Law states that the performance
improvement to be gained from using some
faster mode of execution is limited by the
fraction of time the faster mode can be used.
16
Speedup
• Speedup tells us how much faster a task will run
using the computer with the enhancement as opposed
to the original computer.
• Amdahl's Law gives us a quick way to find speedup
from some enhancement.
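– Fraction_enhanced: the fraction of the original execution time
that can take advantage of the enhancement
– Speedup_enhanced: how much faster the enhanced portion runs
when the enhancement is used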
19
Amdahl’s Law
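$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$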
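A minimal Python sketch of the formula follows. The function name `amdahl_speedup` and the sample inputs (40% of the time enhanced, 10X faster) are illustrative assumptions.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of the execution time is accelerated."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    unenhanced = 1.0 - fraction_enhanced              # part that runs at the old speed
    enhanced = fraction_enhanced / speedup_enhanced   # part that runs faster
    return 1.0 / (unenhanced + enhanced)


# Hypothetical enhancement: 40% of the time can be made 10X faster.
print(round(amdahl_speedup(0.4, 10.0), 4))  # 1.5625
```

Even a 10X enhancement yields only about 1.56X overall here, because 60% of the time is untouched.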
22
Amdahl’s Law
• The maximum speedup, which can be obtained by
using a faster mode of execution, is limited by the
fraction of the time the faster mode can be used.
Examples include:
– Improving your Web-surfing speed
– Parallelization of computer programs
– Performance of symmetric-multiprocessor systems
23
Amdahl’s Law
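$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]$$
$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$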
24
Amdahl’s Law
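• Floating-point instructions are improved to run 2X faster, but
only 10% of the execution time is spent in FP instructions
(taking the 10% instruction share as the time fraction)
$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.1) + \dfrac{0.1}{2}} = \frac{1}{0.95} \approx 1.053$$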
25
Amdahl’s Law
• A program spends 90% of its time computing and the other 10%
on disk accesses. If we use a CPU that is 100X faster, how
much speedup do we gain?
$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.9) + \dfrac{0.9}{100}} = \frac{1}{0.109} \approx 9.17$$
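The disk example also shows the ceiling that Amdahl's Law imposes. The sketch below sweeps the CPU speedup for the same 90% / 10% split. The helper name `overall_speedup` and the particular sweep values are illustrative choices.

```python
def overall_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Amdahl's Law: 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


COMPUTE_FRACTION = 0.9  # from the example: 90% computing, 10% disk accesses

# A faster and faster CPU only shrinks the 90% part; the disk time stays.
for cpu_speedup in (10, 100, 1000, 1_000_000):
    print(cpu_speedup, round(overall_speedup(COMPUTE_FRACTION, cpu_speedup), 3))

# Upper bound as the CPU speedup grows without limit: 1 / (1 - F).
bound = 1.0 / (1.0 - COMPUTE_FRACTION)
print("upper bound:", round(bound, 3))
```

However fast the CPU becomes, the overall speedup stays below 10, because the 10% spent on disk accesses is not accelerated.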
27
Metrics of Performance
28
Performance
Performance is determined by execution time
Do any of the other variables equal performance?
# of cycles to execute program?
# of instructions in program?
# of cycles per second?
average # of cycles per instruction?
average # of instructions per second?
29
CPU Performance Equation
CPU time = CPU clock cycles for a program x Clock
Cycle Time
32
CPU Performance Equation
CPU time depends equally on three characteristics: clock
cycle time, CPI, and instruction count.
A 10% improvement in any one of them leads to a 10%
improvement in CPU time.
• Unfortunately, it is difficult to change one parameter
in complete isolation from others because the basic
technologies involved in changing each
characteristics are interdependent:
Clock Cycle Time : Hardware technology and Organization
CPI : Organization and Instruction set architecture
Instruction count : Instruction set architecture and Compiler
technology
33
CPU Performance Equation
CPU time = CPU clock cycles for a program x clock cycle time
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
34
Cycles Per Instruction
Average Cycles per Instruction
CPI = (CPU clock cycles for a program) /
Instruction Count
CPU Time = Instruction Count x Cycles per
Instruction x clock cycle time
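As a quick check of the equation, here is a small Python sketch. The workload numbers (2 billion instructions, CPI of 1.5, 1 ns clock cycle) are hypothetical values picked to keep the arithmetic easy.

```python
def cpu_time(instruction_count: float, cpi: float, clock_cycle_time_s: float) -> float:
    """CPU time = Instruction Count x Cycles per Instruction x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical program: 2e9 instructions, CPI 1.5, 1 GHz clock (1 ns cycle).
base = cpu_time(2e9, 1.5, 1e-9)        # about 3 seconds
better_cpi = cpu_time(2e9, 1.35, 1e-9)  # CPI reduced by 10%

print(f"base: {base:.2f} s, with 10% lower CPI: {better_cpi:.2f} s")
print(f"ratio: {better_cpi / base:.2f}")  # 0.90
```

Since the three factors are multiplied, cutting any one of them by 10% cuts CPU time by 10%, provided the other two stay fixed.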
35
CPU Performance Equation
CPU time = Instruction Count x Cycles per Instruction
x clock cycle time
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
36
Understanding Program Performance
Performance of a program depends on
The algorithm,
The language,
The compiler,
The architecture, and
The actual hardware
38
H/W or S/W component: Algorithm
Affects what? IC, possibly CPI
How? The algorithm determines the number of source program
instructions executed and hence the number of processor
instructions executed
39
H/W or S/W component: Programming language
Affects what? IC, CPI
How? The programming language affects the IC, since the
statements in the language are translated to processor
instructions, which determine the IC
41
H/W or S/W component: Instruction set architecture
Affects what? IC, CPI, clock rate
How? The instruction set architecture affects all three aspects
of CPU performance, since it affects the instructions needed
for a function, the cost in cycles of each instruction, and the
overall clock rate of the processor
42
CPU Performance Equation
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
               Instructions/Program   Cycles/Instruction   Seconds/Cycle
Compiler                X                     X
Inst. Set               X                     X                  X
Organization                                  X                  X
Technology                                                       X
43
Cycles Per Instruction
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i$$
44
Cycles Per Instruction
$$\text{CPI} = \frac{\sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i}{\text{Instruction Count}}$$
45
Cycles Per Instruction
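$$\text{CPI} = \frac{\sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i}{\text{Instruction Count}}$$
$$\text{CPI} = \sum_{i=1}^{n} \frac{\text{IC}_i}{\text{Instruction Count}} \times \text{CPI}_i$$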
46
Example: Calculating CPI
Base Machine (Reg / Reg), Typical Mix
Op       Freq   Cycles   CPI(i)   (% Time)
ALU      50%    1        0.5      (33%)
Load     20%    2        0.4      (27%)
Store    10%    2        0.2      (13%)
Branch   20%    2        0.4      (27%)
Total                    1.5
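The table can be reproduced with a short script. Each CPI(i) entry is frequency times cycles, and each time share is CPI(i) divided by the total CPI. The dictionary layout and variable names are illustrative. The frequencies and cycle counts are the ones in the table.

```python
# op -> (frequency of the op in the instruction mix, cycles per instruction)
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}

# Contribution of each class to the average CPI: Freq x Cycles.
cpi_contrib = {op: freq * cycles for op, (freq, cycles) in mix.items()}
total_cpi = sum(cpi_contrib.values())

# Share of execution time spent in each class.
time_share = {op: contrib / total_cpi for op, contrib in cpi_contrib.items()}

for op in mix:
    print(f"{op:<7} CPI(i)={cpi_contrib[op]:.1f}  time={time_share[op]:.0%}")
print(f"Total CPI = {total_cpi:.1f}")
```

Note that ALU operations are half of all instructions but only a third of the execution time, because they take a single cycle each.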
47
Example
Suppose we have the following measurements:
Frequency of FP operation = 25 %
Average CPI of FP operations = 4.0
Average CPI of other instructions = 1.33
Frequency of FPSQR = 2%
CPI of FPSQR = 20
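The slide lists only the measurements, so the sketch below does no more than compute the average CPI they imply. It assumes that the 75% of instructions that are not FP operations all fall under "other instructions". The variable names are illustrative.

```python
# Measurements from the slide.
fp_freq = 0.25      # frequency of FP operations
fp_cpi = 4.0        # average CPI of FP operations
other_cpi = 1.33    # average CPI of other instructions

# Assumption: every non-FP instruction counts as "other".
other_freq = 1.0 - fp_freq

# Average CPI = sum over classes of (frequency x CPI of the class).
cpi_original = fp_freq * fp_cpi + other_freq * other_cpi
print(f"Average CPI = {cpi_original:.4f}")  # 1.9975, roughly 2.0
```

The FPSQR figures (2% frequency, CPI of 20) are already included in the FP average, so they are not added separately here.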
48
More on CPI
Is CPI a good metric to evaluate CPU
performance?
49
Amdahl’s Law for Multiprocessor
Let T be the total execution time of a program on a
uniprocessor workstation
Suppose the program has been parallelized or partitioned for
parallel execution on a cluster of many processing nodes
Assume that a fraction α of the code must be executed
sequentially; this is called the sequential bottleneck
The remaining 1 – α of the code can be compiled for parallel
execution by n processors
Total execution time of the program = αT + (1 – α)T/n
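Dividing T by the total execution time above gives the speedup on n processors, 1 / (α + (1 – α)/n). A minimal Python sketch follows. The function names and the sample values (T = 100 s, α = 0.25) are assumptions for illustration.

```python
def parallel_time(total_time: float, alpha: float, n: int) -> float:
    """Execution time on n processors: alpha*T + (1 - alpha)*T/n."""
    if not 0.0 <= alpha <= 1.0 or n < 1:
        raise ValueError("need 0 <= alpha <= 1 and n >= 1")
    return alpha * total_time + (1.0 - alpha) * total_time / n


def parallel_speedup(alpha: float, n: int) -> float:
    """Speedup over the uniprocessor: T / parallel_time = 1 / (alpha + (1 - alpha)/n)."""
    return 1.0 / (alpha + (1.0 - alpha) / n)


# Hypothetical program: T = 100 s, 25% of it strictly sequential.
T, ALPHA = 100.0, 0.25
for n in (1, 4, 16, 256):
    print(n, parallel_time(T, ALPHA, n), round(parallel_speedup(ALPHA, n), 3))
```

As n grows the speedup approaches 1/α (here 4), so the sequential bottleneck caps the benefit of adding processors.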
51
Power and Energy
Problem: Get power in, get power out
• Three primary concerns from the designer's point of
view
What is the maximum power a processor requires?
Modern processors provide voltage indexing methods that
allow the processor to slow down and regulate voltage within a
wider margin
What is the sustained power consumption?
Widely called the Thermal Design Power (TDP)
Used as target for power supply and cooling system
Lower than peak power, higher than average power
consumption
Energy and energy efficiency
53
Trends in Power and Energy
Dynamic Energy and Power
Dynamic energy
Energy of a full pulse (0 -> 1 -> 0 or 1 -> 0 -> 1):
Capacitive load x Voltage^2
Energy of a single transition (0 -> 1 or 1 -> 0):
½ x Capacitive load x Voltage^2
Dynamic power
½ x Capacitive load x Voltage^2 x Frequency switched
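Because voltage enters squared, lowering it pays off more than lowering frequency alone. The sketch below illustrates this with made-up numbers. The capacitance, voltage, and frequency values and the 15% reduction are assumptions, not data from the slides.

```python
def dynamic_energy(capacitive_load: float, voltage: float) -> float:
    """Energy of a single transition: 1/2 x Capacitive load x Voltage^2."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Dynamic power: 1/2 x Capacitive load x Voltage^2 x Frequency switched."""
    return dynamic_energy(capacitive_load, voltage) * frequency


# Hypothetical operating point: 1 nF effective load, 1.0 V, 2 GHz.
C, V, F = 1e-9, 1.0, 2e9

# Scale both voltage and frequency down by 15%.
energy_ratio = dynamic_energy(C, 0.85 * V) / dynamic_energy(C, V)
power_ratio = dynamic_power(C, 0.85 * V, 0.85 * F) / dynamic_power(C, V, F)

print(f"energy ratio: {energy_ratio:.2f}")  # 0.72 = 0.85^2
print(f"power ratio:  {power_ratio:.2f}")   # 0.61 = 0.85^3
```

Energy per transition does not depend on frequency, so slowing the clock alone lowers power but not the energy needed to finish a fixed task.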
55
Copyright © 2012, Elsevier Inc. All rights reserved.
Trends in Power and Energy
Static Power
Static power consumption
Current_static x Voltage
Scales with number of transistors
To reduce: power gating
56