Chapter 2
Performance Issues
Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Designing for Performance
• The cost of computer systems continues to drop dramatically, while the performance and capacity
of those systems continue to rise equally dramatically
• Today’s laptops have the computing power of an IBM mainframe from 10 or 15 years ago
• Processors are so inexpensive that we now have microprocessors we throw away
• Desktop applications that require the great power of today’s microprocessor-based systems
include:
– Image processing
– Three-dimensional rendering
– Speech recognition
– Videoconferencing
– Multimedia authoring
– Voice and video annotation of files
– Simulation modeling
• Businesses are relying on increasingly powerful servers to handle transaction and database
processing and to support massive client/server networks that have replaced the huge mainframe
computer centers of yesteryear
• Cloud service providers use massive high-performance banks of servers to satisfy high-volume,
high-transaction-rate applications for a broad spectrum of clients
Microprocessor Speed
Techniques built into contemporary processors include:
Pipelining
• Processor moves data or instructions into a
conceptual pipe with all stages of the pipe processing
simultaneously
Branch prediction
• Processor looks ahead in the instruction code fetched
from memory and predicts which branches, or groups
of instructions, are likely to be processed next
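As a rough illustration of why pipelining raises throughput, the sketch below counts clock cycles for an idealized pipeline. The assumptions are not from the slides: every stage takes exactly one cycle, there are no stalls or mispredicted branches, and the function names and workload numbers are hypothetical.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Ideal pipeline: the first instruction fills the pipe,
    then one instruction completes on every following cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    n, k = 100, 5  # hypothetical workload: 100 instructions, 5 stages
    print("unpipelined:", unpipelined_cycles(n, k))
    print("pipelined:  ", pipelined_cycles(n, k))
```

Real processors fall short of this ideal because of hazards and branches, which is one reason branch prediction matters.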
Performance Balance
• Adjust the organization and architecture to compensate for the mismatch among the
capabilities of the various components
• Architectural examples include:
– Increase the number of bits that are retrieved at one time by making DRAMs “wider”
rather than “deeper” and by using wide bus data paths
– Reduce the frequency of memory access by incorporating increasingly complex and
efficient cache structures between the processor and main memory
Figure 2.1 [Bar chart of Data Rate (bps) on a log scale from 10^1 to 10^11 for: Ethernet modem (max speed), Graphics display, Wi-Fi modem (max speed), Hard disk, Optical disc, Laser printer, Scanner, Mouse, Keyboard]
Improvements in Chip Organization and Architecture
Problems with Clock Speed and Logic Density
• Power
– Power density increases with density of logic and clock speed
– Dissipating the heat generated on dense, fast chips becomes increasingly difficult
• RC delay
– The speed at which electrons can flow between transistors is limited by the
resistance and capacitance of the metal wires connecting them
– Delay increases as the RC product increases
– As components on the chip decrease in size, the wire interconnects
become thinner, increasing resistance
– Also, the wires are closer together, increasing capacitance
• Memory latency and throughput
– Memory access speed (latency) and transfer speed (throughput) lag
processor speeds
Figure 2.2 [Plot, 1970 to 2010, log-scale vertical axis from 0.1 to 10^7, with four series: Transistors (Thousands), Frequency (MHz), Power (W), Cores]
Many Integrated Core (MIC)
Graphics Processing Unit (GPU)
MIC
• Brings a leap in performance, as well as challenges in developing software to
exploit such a large number of cores
• The multicore and MIC strategy involves a homogeneous collection of general
purpose processors on a single chip
GPU
• Core designed to perform parallel operations on graphics data
• Traditionally found on a plug-in graphics card, it is used to encode and render
2D and 3D graphics as well as process video
• Used as vector processors for a variety of applications that require repetitive
computations
Amdahl’s Law
• Proposed by Gene Amdahl
• Deals with the potential speedup of a program using
multiple processors compared to a single processor
• Illustrates the problems facing industry in the
development of multi-core machines
– Software must be adapted to a highly parallel execution
environment to exploit the power of parallel processing
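The speedup that Amdahl's Law predicts can be sketched in a few lines. This assumes the usual statement of the law, where f is the fraction of execution time that can be parallelized and N is the number of processors, so that speedup = 1 / ((1 - f) + f/N). The function names are hypothetical, and the f values are the ones plotted in Figure 2.4.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup = T / [(1 - f)T + fT/N] = 1 / ((1 - f) + f / N)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


def amdahl_limit(f: float) -> float:
    """Upper bound on speedup as N grows without limit: 1 / (1 - f)."""
    return float("inf") if f == 1.0 else 1.0 / (1.0 - f)


if __name__ == "__main__":
    for f in (0.5, 0.75, 0.90, 0.95):
        row = [round(amdahl_speedup(f, n), 2) for n in (1, 10, 100, 1000)]
        print(f"f = {f}: speedups {row}, limit {amdahl_limit(f):.1f}")
```

Even with f = 0.95 the speedup cannot exceed 20 however many processors are added, which is why the serial portion of the software dominates on highly parallel machines.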
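Figure 2.3 [Diagram: execution time on a single processor compared with N processors]
Single processor: $T = (1 - f)T + fT$
N processors: $(1 - f)T + \frac{fT}{N} = \left[1 - f\left(1 - \frac{1}{N}\right)\right]T$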
Figure 2.4 [Plot: Speedup versus Number of Processors, with curves for f = 0.95, f = 0.90, f = 0.75, and f = 0.5]
Little’s Law
• Fundamental and simple relation with broad applications
• Can be applied to almost any system that is statistically in
steady state, and in which there is no leakage
• Queuing system
– If server is idle an item is served immediately, otherwise an arriving
item joins a queue
– There can be a single queue for a single server or for multiple servers,
or multiple queues with one being for each of multiple servers
• Average number of items in a queuing system equals the
average rate at which items arrive multiplied by the time that
an item spends in the system
– Relationship requires very few assumptions
– Because of its simplicity and generality it is extremely useful
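The relation above is commonly written L = λW, where L is the average number of items in the system, λ the average arrival rate, and W the average time an item spends in the system. A minimal sketch follows, with hypothetical function names and made-up numbers for a server handling requests.

```python
def little_items(arrival_rate: float, time_in_system: float) -> float:
    """Little's Law: L = lambda * W (average number of items in the system)."""
    return arrival_rate * time_in_system


def little_time(items: float, arrival_rate: float) -> float:
    """Rearranged form: W = L / lambda (average time an item spends in the system)."""
    if arrival_rate <= 0:
        raise ValueError("arrival_rate must be positive")
    return items / arrival_rate


if __name__ == "__main__":
    # Hypothetical steady-state server: 200 requests/s, each in the system 0.05 s.
    print("average requests in system:", little_items(200.0, 0.05))
    # Hypothetical queue: 12 items on average, 4 arrivals per second.
    print("average seconds in system:", little_time(12.0, 4.0))
```

Any two of the three quantities determine the third, provided the system is in steady state and loses no items.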
Figure 2.5
Table 2.1 Performance Factors and System Attributes
Ic p m k
Compiler technology X X X
Processor implementation X X
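The factors in Table 2.1 are usually combined into one execution-time expression, T = Ic × [p + (m × k)] × τ. The sketch below assumes the conventional meanings of the symbols, which the slide itself does not spell out: Ic is the instruction count, p the processor cycles needed to decode and execute an instruction, m the number of memory references per instruction, k the ratio of memory cycle time to processor cycle time, and τ the processor cycle time (τ does not appear in the table header above). All numeric values are hypothetical.

```python
def execution_time(ic: int, p: float, m: float, k: float, tau: float) -> float:
    """T = Ic * (p + m * k) * tau, in seconds."""
    return ic * (p + m * k) * tau


if __name__ == "__main__":
    # Hypothetical program: 2 million instructions on a 1 GHz processor (tau = 1 ns).
    t = execution_time(ic=2_000_000, p=2.0, m=0.5, k=4.0, tau=1e-9)
    print(f"estimated execution time: {t * 1e3:.1f} ms")
```

Changing any one argument shows how a system attribute in the table feeds into total execution time through the factors it influences.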
Calculating the Mean
The use of benchmarks to compare systems involves calculating the mean value of
a set of data points related to execution time
The three common formulas used for calculating a mean are:
• Arithmetic
• Geometric
• Harmonic
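The three means can be written out directly from their standard definitions. This is a minimal sketch with hypothetical function names; the sample data is the set of values 1 through 11, the same values as the uniform distribution in Figure 2.6 (c).

```python
import math


def arithmetic_mean(xs):
    """AM = (x1 + x2 + ... + xn) / n."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """GM = (x1 * x2 * ... * xn) ** (1/n), computed with logs for stability."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    """HM = n / (1/x1 + 1/x2 + ... + 1/xn)."""
    return len(xs) / sum(1.0 / x for x in xs)


if __name__ == "__main__":
    data = list(range(1, 12))  # 1, 2, ..., 11
    print(f"AM = {arithmetic_mean(data):.2f}")
    print(f"GM = {geometric_mean(data):.2f}")
    print(f"HM = {harmonic_mean(data):.2f}")
```

For positive values that are not all equal, AM is greater than GM, which in turn is greater than HM; for constant data the three coincide.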
Figure 2.6 [Chart showing the positions of MD, AM, GM, and HM on a scale from 0 to 11 for each of the data sets (a) through (g)]
MD = median, AM = arithmetic mean, GM = geometric mean, HM = harmonic mean
(a) Constant (11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11)
(b) Clustered around a central value (3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 11)
(c) Uniform distribution (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
(d) Large-number bias (1, 4, 4, 7, 7, 9, 9, 10, 10, 11, 11)
(e) Small-number bias (1, 1, 2, 2, 3, 3, 5, 5, 8, 8, 11)
(f) Upper outlier (11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
(g) Lower outlier (1, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11)
The AM used for a time-based variable, such as program execution time, has the
important property that it is directly proportional to the total time
If the total time doubles, the mean value doubles
Table 2.2
A Comparison of Arithmetic and Harmonic Means for Rates
                                            Computer A  Computer B  Computer C  Computer A  Computer B  Computer C
                                            time        time        time        rate        rate        rate
                                            (secs)      (secs)      (secs)      (MFLOPS)    (MFLOPS)    (MFLOPS)
Program 1 (10^8 FP ops)                     2.0         1.0         0.75        50          100         133.33
Program 2 (10^8 FP ops)                     0.75        2.0         4.0         133.33      50          25
Total execution time                        2.75        3.0         4.75        –           –           –
Arithmetic mean of times                    1.38        1.5         2.38        –           –           –
Inverse of total execution time (1/sec)     0.36        0.33        0.21        –           –           –
Arithmetic mean of rates                    –           –           –           91.67       75.00       79.17
Harmonic mean of rates                      –           –           –           72.72       66.67       42.11
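The figures in Table 2.2 can be recomputed from the execution times alone, which also shows why the harmonic mean is the better summary for rates. The sketch assumes each program performs 10^8 floating-point operations, as the table states; the function and variable names are hypothetical.

```python
FP_OPS = 1e8  # floating-point operations per program, from Table 2.2

# Execution times in seconds for the two programs on each computer.
TIMES = {"A": [2.0, 0.75], "B": [1.0, 2.0], "C": [0.75, 4.0]}


def mflops(seconds: float) -> float:
    """Rate in millions of floating-point operations per second."""
    return FP_OPS / seconds / 1e6


def summarize(times):
    rates = [mflops(t) for t in times]
    return {
        "total_time": sum(times),
        "am_rate": sum(rates) / len(rates),
        "hm_rate": len(rates) / sum(1.0 / r for r in rates),
    }


if __name__ == "__main__":
    for name, times in TIMES.items():
        s = summarize(times)
        print(f"{name}: total {s['total_time']:.2f} s, "
              f"AM rate {s['am_rate']:.2f}, HM rate {s['hm_rate']:.2f}")
```

Ranking the computers by harmonic mean of rates gives the same order as ranking them by total execution time (A, B, C), whereas the arithmetic mean of rates places C ahead of B even though C takes longest overall.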
Table 2.3
A Comparison of Arithmetic and Geometric Means for Normalized Results
(a) Results normalized to Computer A
Table 2.4
Another Comparison of Arithmetic and Geometric Means for Normalized Results
(a) Results normalized to Computer A
Benchmark Principles
Standard Performance Evaluation Corporation (SPEC)
• Benchmark suite
– A collection of programs, defined in a high-level language
– Together attempt to provide a representative test of a computer in a
particular application or system programming area
• SPEC
– An industry consortium
– Defines and maintains the best known collection of benchmark suites
aimed at evaluating computer systems
– Performance measurements are widely used for comparison and
research purposes
SPEC CPU2017
• Best known SPEC benchmark suite
• Industry standard suite for processor intensive applications
• Appropriate for measuring performance for applications that
spend most of their time doing computation rather than I/O
• Consists of 20 integer benchmarks and 23 floating-point
benchmarks written in C, C++, and Fortran
• For all of the integer benchmarks and most of the floating-
point benchmarks, there are both rate and speed benchmark
programs
• The suite contains over 11 million lines of code
Rate Speed Language Kloc Application Area
Kloc = line count (including comments/whitespace) for source files used in a build/1000 (Table can be found on page 61 in the textbook.)
SPEC CPU 2017 Integer Benchmarks for HP Integrity Superdome X
(a) Rate Result (768 copies)
Benchmark          Base                Peak
                   Seconds   Rate      Seconds   Rate
541.leela_r        892       1410      896       1420
548.exchange2_r    833       2420      770       2610
(b) Speed Result (384 threads)
Benchmark          Base                Peak
                   Seconds   Ratio     Seconds   Ratio
602.gcc_s          546       7.29      535       7.45
605.mcf_s          866       5.45      700       6.75
620.omnetpp_s      276       5.90      247       6.61
623.xalancbmk_s    188       7.52      179       7.91
625.x264_s         283       6.23      271       6.51
631.deepsjeng_s    407       3.52      343       4.18
641.leela_s        469       3.63      439       3.88
648.exchange2_s    329       8.93      299       9.82
[Flowchart, final steps]
Run program three times
→ Select median value
→ Ratio(prog) = T_ref(prog)/T_SUT(prog)
→ End
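The per-program steps above (run three times, select the median, divide the reference time by the measured time) can be sketched as follows. The function names and the timing values are hypothetical. Combining the per-program ratios with a geometric mean reflects how SPEC reports an overall metric, but treat that step here as an illustrative assumption rather than a reproduction of the official tooling.

```python
import math
import statistics


def spec_ratio(t_ref: float, run_times) -> float:
    """Ratio(prog) = Tref(prog) / TSUT(prog), with TSUT taken as the median run time."""
    t_sut = statistics.median(run_times)
    return t_ref / t_sut


def overall_metric(ratios) -> float:
    """Geometric mean of the per-program ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical program: reference time 100 s, three measured runs in seconds.
    print(f"ratio = {spec_ratio(100.0, [12.0, 10.0, 11.0]):.2f}")

    # The eight base values from the speed results above.
    base_speed = [7.29, 5.45, 5.90, 7.52, 6.23, 3.52, 3.63, 8.93]
    print(f"geometric mean of base speed ratios = {overall_metric(base_speed):.2f}")
```

Taking the median of three runs limits the effect of a single unusually slow or fast run on the reported ratio.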
Copyright
This work is protected by United States copyright laws and is provided solely
for the use of instructors in teaching their courses and assessing student
learning. Dissemination or sale of any part of this work (including on the
World Wide Web) will destroy the integrity of the work and is not permitted.
The work and materials from it should never be made available to
students except by instructors using the accompanying text in their
classes. All recipients of this work are expected to abide by these
restrictions and to honor the intended pedagogical purposes and the needs of
other instructors who rely on these materials.