Chapter 1 Lecture 2 & 3 - Computer Performance
• Lecture Objectives : -
– Understand the ways in which computer
performance is measured.
– Become familiar with factors that contribute to
improvements in CPU and disk performance.
• Goal: to understand how architectural choices affect cost and
performance.
Computer Performance Metrics
Response Time
Delay between the start and end of a task
Throughput
Number of tasks completed per unit time
Power/Energy
Energy per task, power
Metrics for different applications
Desktop computing
Metrics: performance (latency), cost
Server computing
Examples: web servers, transaction servers, file servers
Metrics: performance (throughput), reliability
Embedded computing
Examples: printer, cell phone, video game console
Metrics: performance (real-time), cost, power consumption
Measures of system performance
• Measures of system performance depend upon one’s
point of view.
Airline example
Amdahl’s Law
Amdahl’s Law [contd…]
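$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$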
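As a minimal sketch, Amdahl’s Law can be written as a small Python function. The name `amdahl_speedup` is hypothetical, and the enhanced fraction is assumed to be measured as a share of the original execution time.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup predicted by Amdahl's Law (hypothetical helper name).

    fraction_enhanced: share of the ORIGINAL execution time that can use the
                       enhancement, between 0 and 1 (assumed time-based share).
    speedup_enhanced:  how many times faster the enhanced part runs.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    # New execution time expressed as a fraction of the old execution time
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


# Speeding up half of the time by 2x: 1 / (0.5 + 0.25) = 1.333...
print(amdahl_speedup(0.5, 2.0))
```

Note that the unenhanced term (1 - Fraction_enhanced) never shrinks, so the overall speedup can never exceed 1 / (1 - Fraction_enhanced), however large Speedup_enhanced becomes.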
improve performance.
We can compare these two alternatives by
comparing the speedups.
Example (2)
• You have a system that contains a special processor for doing
floating-point operations. You have determined that 50% of
your computations can use the floating-point processor. The
speedup of the floating-point processor is 15.
• a) Overall speedup achieved by using the floating-point
processor.
b) Overall speedup achieved if you modify the compiler so
that 75% of the computations can use the floating-point
processor.
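A minimal Python sketch of parts a) and b), assuming that "50% (or 75%) of your computations" means that share of the original execution time can run on the floating-point processor. The helper `amdahl_speedup` is a hypothetical name that simply restates Amdahl’s Law.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Amdahl's Law: 1 / ((1 - F) + F / S). Hypothetical helper name."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


FP_SPEEDUP = 15.0  # speedup of the floating-point processor, from the example

# Assumption: "50% / 75% of your computations" = that share of the original execution time
speedup_a = amdahl_speedup(0.50, FP_SPEEDUP)  # a) current compiler
speedup_b = amdahl_speedup(0.75, FP_SPEEDUP)  # b) modified compiler

print(f"a) overall speedup = {speedup_a:.3f}")  # 1 / (0.5 + 0.5/15)  = 1.875
print(f"b) overall speedup = {speedup_b:.3f}")  # 1 / (0.25 + 0.75/15) = 3.333
```

Even though the floating-point processor is 15 times faster, raising the usable fraction from 50% to 75% matters far more than the unit’s own speed, because the remaining non-FP time dominates.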
Example (3):-
b) In order to improve the speedup, consider two
options:
• Option 1: Modifying the compiler so that 70% of the computations
can use the floating-point processor. Cost of this option is $50K.
Therefore, Option 1 is better because it has a smaller Cost/Speedup ratio.
Example (4):-
A program runs in 100 seconds on a machine, with multiply operations responsible for 80
seconds of this time.
How much do I have to improve the speed of multiplication if I want my program to run
five times faster?
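A short Python sketch of the reasoning, with `required_mult_speedup` as a hypothetical helper. The 20 seconds spent outside multiplication are untouched by the improvement, so a program five times faster (20 seconds in total) would leave zero time for the multiplications, which no finite speedup achieves. The four-times-faster case is included only as a usage example of the same helper.

```python
from typing import Optional

TOTAL_TIME = 100.0  # seconds, original run time
MULT_TIME = 80.0    # seconds spent in multiply operations


def required_mult_speedup(total: float, mult: float, target: float) -> Optional[float]:
    """Speedup n of multiplication so that (total - mult) + mult / n == target.

    Hypothetical helper. Returns None when no finite n works, i.e. when the
    time NOT spent multiplying already meets or exceeds the target time.
    """
    other = total - mult     # time the multiplier improvement cannot touch
    budget = target - other  # time left over for the multiplications
    if budget <= 0:
        return None
    return mult / budget


# Five times faster means 100 / 5 = 20 s, but the other 20 s remain unchanged.
n5 = required_mult_speedup(TOTAL_TIME, MULT_TIME, TOTAL_TIME / 5)
print("5x faster:", "impossible" if n5 is None else f"multiply {n5:.1f}x faster")

# For comparison, four times faster (25 s) leaves a 5 s budget: 80 / 5 = 16.
n4 = required_mult_speedup(TOTAL_TIME, MULT_TIME, TOTAL_TIME / 4)
print("4x faster:", "impossible" if n4 is None else f"multiply {n4:.1f}x faster")
```

This is Amdahl’s Law at its limit: with 80% of the time enhanced, the overall speedup is bounded by 1 / (1 - 0.8) = 5, and reaching that bound exactly would require infinitely fast multiplication.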
CPU performance equation
• Nearly all computers are constructed using a clock running at
a constant rate; its discrete time events are called clock ticks,
clock periods, cycles, or clock cycles.
– Computer designers refer to the time of a clock period by its
duration (e.g., 1 ns) or by its rate (e.g., 1 GHz).
• CPU time can be expressed two ways:
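A minimal Python sketch of the two standard forms, CPU time = CPU clock cycles × clock cycle time and CPU time = CPU clock cycles / clock rate, showing that they agree. The cycle count of 2 billion is a hypothetical value; the 1 GHz rate corresponds to the 1 ns clock period mentioned above, and the function names are hypothetical.

```python
def cpu_time_from_cycle_time(clock_cycles: float, cycle_time_s: float) -> float:
    """CPU time = CPU clock cycles x clock cycle time (seconds per cycle)."""
    return clock_cycles * cycle_time_s


def cpu_time_from_clock_rate(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU time = CPU clock cycles / clock rate (cycles per second)."""
    return clock_cycles / clock_rate_hz


CLOCK_CYCLES = 2e9                  # hypothetical cycle count for some program
CLOCK_RATE_HZ = 1e9                 # 1 GHz
CYCLE_TIME_S = 1.0 / CLOCK_RATE_HZ  # 1 ns

t1 = cpu_time_from_cycle_time(CLOCK_CYCLES, CYCLE_TIME_S)
t2 = cpu_time_from_clock_rate(CLOCK_CYCLES, CLOCK_RATE_HZ)
print(f"{t1:.3f} s vs {t2:.3f} s")  # both forms give 2 seconds
```

The two forms are interchangeable because clock cycle time is simply the reciprocal of clock rate.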
CPU performance equation [contd…]
Example - 2
Op       Freq   Cycles   CPI contribution (Freq x Cycles)
ALU      50%    1        0.5
Load     20%    5        1.0
Store    10%    3        0.3
Branch   20%    2        0.4
Total CPI                2.2
How much faster would the machine be if a better data cache reduced the
average load time to 2 cycles?
• Load: 20% x 2 cycles = 0.4
• Total CPI drops from 2.2 to 1.6
• Relative performance is 2.2 / 1.6 = 1.375 ≈ 1.38
How does this compare with reducing branch instructions to 1 cycle?
• Branch: 20% x 1 cycle = 0.2
• Total CPI drops from 2.2 to 2.0
• Relative performance is 2.2 / 2.0 = 1.1
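The CPI arithmetic for this example can be checked with a short Python sketch. It assumes, as the example implies, that the instruction count and clock rate are unchanged, so relative performance is simply the ratio of the old CPI to the new CPI; `average_cpi` and `with_cycles` are hypothetical helper names.

```python
# Instruction mix from Example - 2: class -> (frequency, clock cycles)
BASE_MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}


def average_cpi(mix: dict) -> float:
    """Weighted CPI: sum of frequency x cycles over all instruction classes."""
    return sum(freq * cycles for freq, cycles in mix.values())


def with_cycles(mix: dict, op: str, new_cycles: int) -> dict:
    """Return a copy of the mix in which one instruction class takes new_cycles."""
    updated = dict(mix)
    freq, _old_cycles = updated[op]
    updated[op] = (freq, new_cycles)
    return updated


base_cpi = average_cpi(BASE_MIX)
cache_cpi = average_cpi(with_cycles(BASE_MIX, "Load", 2))     # better data cache
branch_cpi = average_cpi(with_cycles(BASE_MIX, "Branch", 1))  # 1-cycle branches

# Same instruction count and clock rate, so relative performance = old CPI / new CPI
print(f"base CPI           = {base_cpi:.2f}")
print(f"better data cache  : CPI = {cache_cpi:.2f}, speedup = {base_cpi / cache_cpi:.3f}")
print(f"1-cycle branches   : CPI = {branch_cpi:.2f}, speedup = {base_cpi / branch_cpi:.3f}")
```

The cache improvement wins because loads contribute the largest share of the original CPI (1.0 of 2.2), so cutting their cycles removes more total cycles.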
First, observe that only the CPI changes; the clock rate and instruction count
remain identical. We start by finding the original CPI with neither enhancement:
We can compute the CPI for the enhanced FPSQR by subtracting the cycles saved
We can compute the CPI for the enhancement of all FP instructions the same way or by
Since the CPI of the overall FP enhancement is slightly lower, its performance will be
Example - 4
• For the purpose of solving a given application problem, you
benchmark a program on two computer systems.
– On system A, the object code executed 80 million Arithmetic Logic Unit
operations (ALU ops), 40 million load instructions, and 25 million branch
instructions.
– On system B, the object code executed 50 million ALU ops, 50 million
loads, and 40 million branch instructions.
– In both systems, each ALU op takes 1 clock cycle, each load takes 3
clock cycles, and each branch takes 5 clock cycles.
• Find the CPI for each system.
• Assuming that the clock on system B is 10% faster than the clock on system
A, which system is faster for the given application problem and by how much
percent?
a) Compute the relative frequency of occurrence of each type of
instruction executed in both systems.
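A minimal Python sketch of part a) together with the CPI of each system, using the instruction counts (in millions) and cycle counts given in the example. The helper names `relative_frequencies` and `cpi` are hypothetical.

```python
# Cycles per instruction class, identical on both systems (Example - 4)
CYCLES = {"ALU": 1, "Load": 3, "Branch": 5}

# Executed instruction counts, in millions
COUNTS = {
    "A": {"ALU": 80, "Load": 40, "Branch": 25},
    "B": {"ALU": 50, "Load": 50, "Branch": 40},
}


def relative_frequencies(counts: dict) -> dict:
    """Share of all executed instructions that falls in each class."""
    total = sum(counts.values())
    return {op: n / total for op, n in counts.items()}


def cpi(counts: dict) -> float:
    """Total clock cycles divided by total instruction count."""
    total_cycles = sum(n * CYCLES[op] for op, n in counts.items())
    return total_cycles / sum(counts.values())


for system, counts in COUNTS.items():
    shares = ", ".join(f"{op} {f:.1%}" for op, f in relative_frequencies(counts).items())
    print(f"System {system}: {shares}; CPI = {cpi(counts):.3f}")
```

System A executes 145 million instructions in 325 million cycles (CPI ≈ 2.241), while system B executes 140 million instructions in 400 million cycles (CPI ≈ 2.857), because a larger share of B’s instructions are 5-cycle branches.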
• c) Assuming that the clock on system B is 10% faster than the clock
on system A, which system is faster for the given application
problem and by how much percent?
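A Python sketch of part c). It assumes that "10% faster" means system B’s clock rate is 1.1 times system A’s, so B’s clock cycle time is A’s divided by 1.1. If the phrase is instead read as a 10% shorter cycle time, the margin changes slightly (about 10.8% instead of about 11.9%), but system A still comes out faster.

```python
# Total clock cycles in millions: sum of (instruction count x cycles) from Example - 4
cycles_a = 80 * 1 + 40 * 3 + 25 * 5   # 325 million
cycles_b = 50 * 1 + 50 * 3 + 40 * 5   # 400 million

# Assumption: "B's clock is 10% faster" means clock rate B = 1.1 x clock rate A.
# Times are expressed in units of A's clock cycle time, so no absolute rate is needed.
cycle_time_a = 1.0
cycle_time_b = cycle_time_a / 1.1

time_a = cycles_a * cycle_time_a
time_b = cycles_b * cycle_time_b

# "X is n% faster than Y" means time_Y / time_X = 1 + n/100
if time_a < time_b:
    faster, ratio = "A", time_b / time_a
else:
    faster, ratio = "B", time_a / time_b

print(f"System {faster} is faster by {ratio - 1:.1%}")
```

Under this reading the sketch reports system A as faster by about 11.9%: B’s faster clock does not make up for the 75 million extra cycles it needs.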
Example - 5
• Suppose we are considering two alternatives for our conditional
branch instructions, as follows:
– CPU A: A condition code is set by a compare instruction and followed by
a branch that tests the condition code.
– CPU B: A compare is included in the branch.
• On both CPUs, the conditional branch instruction takes 2 cycles,
and all other instructions take 1 clock cycle. On CPU A, 20% of
all instructions executed are conditional branches; since every
branch needs a compare, another 20% of the instructions are
compares. Because CPU A does not have the compare included
in the branch, assume that its clock rate is 1.25 times higher
than that of CPU B (equivalently, CPU B’s clock cycle time is 1.25
times longer). Which CPU is faster? What if CPU A’s clock rate
were only 1.1 times higher?
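• Since we are ignoring all systems issues, we can use the CPU
performance formula:
$\text{CPU time} = IC \times \text{CPI} \times \text{Clock cycle time}$
• $\text{CPI}_A = 0.20 \times 2 + 0.80 \times 1 = 1.2$, since 20% are branches taking 2 clock cycles and the rest of the
instructions take 1 cycle each. The performance of CPU A is then
$\text{CPU time}_A = IC_A \times 1.2 \times \text{Clock cycle time}_A$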
• $\text{Clock cycle time}_B = 1.25 \times \text{Clock cycle time}_A$, since A has a clock
rate that is 1.25 times higher. Compares are not executed in CPU B,
so 20%/80%, or 25%, of the instructions are now branches taking 2
clock cycles, and the remaining 75% of the instructions take 1
cycle. Hence,
$\text{CPI}_B = 0.25 \times 2 + 0.75 \times 1 = 1.25$
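• Because CPU B doesn’t execute compares, $IC_B = 0.8 \times IC_A$. Hence, the performance of CPU B is
$\text{CPU time}_B = IC_B \times \text{CPI}_B \times \text{Clock cycle time}_B = 0.8 \times IC_A \times 1.25 \times (1.25 \times \text{Clock cycle time}_A) = 1.25 \times IC_A \times \text{Clock cycle time}_A$
Under these assumptions CPU A, with its shorter clock cycle, is faster (1.2 versus 1.25, in units of $IC_A \times \text{Clock cycle time}_A$).
• If CPU A’s clock rate is only 1.1 times higher, then $\text{Clock cycle time}_B = 1.1 \times \text{Clock cycle time}_A$ and
$\text{CPU time}_B = 0.8 \times IC_A \times 1.25 \times (1.1 \times \text{Clock cycle time}_A) = 1.1 \times IC_A \times \text{Clock cycle time}_A$
With this improvement CPU B, which executes fewer instructions, is now faster.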
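The arithmetic for both clock ratios can be reproduced with a short Python sketch. Times are normalised so that $IC_A$ and CPU A’s clock cycle time are both 1; `cpu_time` is a hypothetical helper that just applies the CPU performance formula.

```python
def cpu_time(ic: float, cpi: float, cycle_time: float) -> float:
    """CPU performance formula: IC x CPI x clock cycle time (hypothetical helper)."""
    return ic * cpi * cycle_time


# Normalise everything to CPU A: IC_A = 1 and CPU A's clock cycle time = 1.
IC_A = 1.0
CYCLE_TIME_A = 1.0

# CPU A: 20% branches at 2 cycles, the other 80% (including compares) at 1 cycle
cpi_a = 0.20 * 2 + 0.80 * 1
time_a = cpu_time(IC_A, cpi_a, CYCLE_TIME_A)

# CPU B: no separate compares, so 20% fewer instructions; 20%/80% = 25% are branches
ic_b = 0.8 * IC_A
cpi_b = 0.25 * 2 + 0.75 * 1

winners = {}
for clock_ratio in (1.25, 1.10):  # how many times higher CPU A's clock rate is
    time_b = cpu_time(ic_b, cpi_b, clock_ratio * CYCLE_TIME_A)
    winners[clock_ratio] = "A" if time_a < time_b else "B"
    print(f"A's clock {clock_ratio:.2f}x higher: "
          f"time_A = {time_a:.2f}, time_B = {time_b:.2f} -> CPU {winners[clock_ratio]} is faster")
```

The break-even point is where $0.8 \times 1.25 \times r = 1.2$, i.e. a clock ratio of $r = 1.2$: above it CPU A wins, below it CPU B’s smaller instruction count wins.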
Example - 6
• Suppose you have a load/store computer with the following
instruction mix:
• b) We observe that 35% of the ALU ops are paired with a load,
and we propose to replace these ALU ops and their loads with a
new instruction. The new instruction takes 1 clock cycle. With the
new instruction added, branches take 5 clock cycles. Compute the
CPI for the new version.
• c) If the clock of the old version is 20% faster than the
new version, which version has the faster CPU execution
time, and by how much percent?