Chapter 1 PPT 2007 V 2
Introduction
Outline
Background
Recap : Computer Architecture
Performance
SPEC Benchmarks
Measuring and summarizing performance
Amdahl’s Law
Background
Computer technology has made incredible progress in the roughly
60 years since the first general-purpose electronic computer was
created
Today, less than $500 will purchase a personal computer that has
More performance
More main memory and
More disk storage than a computer bought in 1985 for 1 million dollars
Background…
The late 1970s saw the emergence of the microprocessor
The ability of the microprocessor to ride the improvements in integrated circuit
technology led to a higher rate of improvement
Roughly 35% growth per year in performance
Recap : Computer Architecture
Computer Architecture --- Old View --- Instruction Set Architecture (ISA)
Refers to the actual programmer-visible instruction set
The ISA serves as the boundary between the software and hardware
software
instruction set architecture
hardware
The term organization (also called microarchitecture) includes the high-level aspects
of a computer’s design, such as
the memory system
the memory interconnect
the design of the internal processor or CPU
(central processing unit—where arithmetic, logic, branching, and data transfer are implemented)
For example, two processors with the same instruction set architecture but very
different organizations are
the AMD Opteron 64 and the Intel Pentium 4. Both processors implement the x86 instruction
set, but they have very different pipeline and cache organizations
the AMD Opteron and the Intel Core i7. Both processors implement the x86 instruction set, but
they have very different pipeline and cache organizations
Recap : Computer Architecture …
Hardware refers to the specifics of a computer, including
the detailed logic design
the packaging technology of the computer
Recap : Computer Architecture …
Computer Architecture --- Modern View
Covers all three aspects of computer design
Instruction set architecture
Organization
Hardware
Performance
Assessing the performance of computers can be quite challenging
The scale and intricacy of modern software systems, together with
the wide range of performance improvement techniques employed by hardware
designers, have made performance assessment much more difficult
Performance…
If you were running a program on two different desktop computers
you'd say that the faster one is the desktop computer that gets the job done first
As an individual computer user, you are interested in reducing response time
the time between the start and completion of a task—also referred to as execution time
If you were running a datacenter that had several servers running jobs
submitted by many users,
you'd say that the faster computer was the one that completed the most jobs during a
day
Datacenter managers are often interested in increasing throughput or bandwidth—
the total amount of work done in a given time
Throughput (bandwidth)
Another measure of performance: the number of tasks completed per unit time
Performance…
Whether we are interested in throughput or response time,
the key measurement is time. The difference is whether we
measure one task (response time) or many tasks (throughput)
Performance…
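With performance defined as the reciprocal of execution time
(Performance_X = 1 / Execution time_X), for two computers X and Y, if the
performance of X is greater than the performance of Y, we have:
Performance_X > Performance_Y
1 / Execution time_X > 1 / Execution time_Y
Execution time_Y > Execution time_X
That is, the execution time on Y is longer than on X if X is faster than Y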
Performance…
In comparing design alternatives, we often want to
relate the performance of two different computers
quantitatively
We will use the phrase "X is n times faster than Y" (or
equivalently "X is n times as fast as Y") to mean
Performance_X / Performance_Y = Execution time_Y / Execution time_X = n
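As a minimal sketch of the "n times as fast" comparison, the ratio can be computed directly from two measured execution times. The function name and the timings below are hypothetical and chosen only for illustration.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that computer X is n times as fast as computer Y.

    n = Execution time_Y / Execution time_X
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Hypothetical timings: X runs the program in 10 s, Y needs 15 s.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"X is {n:.2f} times as fast as Y")
```

A value of n below 1 means X is the slower machine; swapping the arguments gives the reciprocal.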
Performance…
Although as computer users we care about time, when we examine the details of a
computer it's convenient to think about performance in other metrics. In
particular, computer designers may want to think about a computer by using
a measure that relates to how fast the hardware can perform basic functions
Almost all computers are constructed using a clock that determines when events
take place in the hardware. These discrete time intervals are called clock cycles
(or ticks, clock ticks, clock periods, clocks, cycles)
Designers refer to the length of a clock period both as the time for a complete
clock cycle (e.g., 250 picoseconds, or 250 ps) and as the clock rate (e.g., 4 gigahertz,
or 4 GHz), which is the inverse of the clock period
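The inverse relationship between clock period and clock rate can be checked with a short sketch. The helper names are assumptions introduced here; the 250 ps and 4 GHz figures are the ones quoted above.

```python
def clock_rate_hz(clock_period_s: float) -> float:
    """Clock rate is the inverse of the clock period."""
    return 1.0 / clock_period_s


def clock_period_s(rate_hz: float) -> float:
    """Clock period is the inverse of the clock rate."""
    return 1.0 / rate_hz


period = 250e-12                      # 250 picoseconds
rate = clock_rate_hz(period)          # about 4e9 Hz
print(f"{period * 1e12:.0f} ps period -> {rate / 1e9:.1f} GHz")
print(f"4 GHz -> {clock_period_s(4e9) * 1e12:.0f} ps period")
```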
In the next section, we will formalize the relationship between the clock cycles
of the hardware designer and the seconds of the computer user
Performance…
Users and designers often examine performance using different metrics
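A simple formula relates the most basic metrics (clock cycles and clock
cycle time) to CPU time:
CPU execution time for a program = CPU clock cycles for a program x Clock cycle time
Alternatively, because clock rate and clock cycle time are inverses,
CPU execution time for a program = CPU clock cycles for a program / Clock rate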
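The two equivalent forms of the CPU time relation (cycles times cycle time, or cycles divided by clock rate) can be sketched as follows. The function names and the cycle count are hypothetical values used only to show that both forms agree.

```python
def cpu_time_from_cycle_time(clock_cycles: float, cycle_time_s: float) -> float:
    """CPU time = CPU clock cycles x clock cycle time."""
    return clock_cycles * cycle_time_s


def cpu_time_from_clock_rate(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU time = CPU clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


# Hypothetical program: 20 billion cycles on a 4 GHz (250 ps) clock.
cycles = 20e9
t1 = cpu_time_from_cycle_time(cycles, 250e-12)
t2 = cpu_time_from_clock_rate(cycles, 4e9)
print(f"{t1:.2f} s using cycle time, {t2:.2f} s using clock rate")
```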
Performance…
The performance equations above did not include any reference to the
number of instructions needed for the program. However, the execution
time must depend on the number of instructions in a program
One way to think about execution time is that it equals the number of
instructions executed multiplied by the average time per instruction
Therefore, the number of clock cycles required for a program can be written as
CPU clock cycles = Instructions for a program x Average clock cycles per instruction
Performance…
The Classic CPU Performance Equation
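Written in terms of instruction count (the number of instructions executed by the program), CPI
(average clock cycles per instruction), and clock cycle time:
CPU time = Instruction count x CPI x Clock cycle time
or, since the clock rate is the inverse of the clock cycle time,
CPU time = Instruction count x CPI / Clock rate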
These formulas are particularly useful because they separate the three key factors that affect
performance. We can use these formulas to compare two different implementations or to
evaluate a design alternative if we know its impact on these three parameters
CPI provides one way of comparing two different implementations of the same instruction set
architecture, since the number of instructions executed for a program will, of course, be the
same
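A small sketch of how CPU time follows from instruction count, CPI, and clock cycle time when comparing two implementations of the same instruction set. The CPI and cycle time values are illustrative assumptions, not measurements of any real processor.

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Same ISA and same program, so both machines execute the same
# number of instructions (hypothetical count).
instructions = 1e9

# Hypothetical implementations: A has a faster clock but a higher CPI.
time_a = cpu_time(instructions, cpi=2.0, cycle_time_s=250e-12)
time_b = cpu_time(instructions, cpi=1.2, cycle_time_s=500e-12)

print(f"A: {time_a:.2f} s, B: {time_b:.2f} s")
print(f"A is {time_b / time_a:.2f} times as fast as B")
```

Because the instruction count is identical, it cancels in the ratio: only CPI x clock cycle time decides which implementation is faster, and here A comes out 1.2 times as fast as B.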
Performance…
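We can see how these factors are combined to yield execution
time measured in seconds per program:
Time = Seconds / Program = (Instructions / Program) x (Clock cycles / Instruction) x (Seconds / Clock cycle)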
Assignment:
Find out how these components affect the factors
in the CPU performance equation
SPEC Benchmarks
A computer user who runs the same programs day in and day out would be the
perfect candidate to evaluate a new computer
The set of programs run would form a workload. To evaluate two computer
systems, a user would simply compare the execution time of the workload on the
two computers
Most users, however, are not in this situation. Instead, they must rely on other
methods that measure the performance of a candidate computer, hoping that the
methods will reflect how well the computer will perform with the user's workload
SPEC Benchmarks
Definition:
Workload
A set of programs run on a computer that is either the
actual collection of applications run by a user or
constructed from real programs to approximate such a
mix. A typical workload specifies both the programs
and the relative frequencies
Benchmark
A program selected for use in comparing computer
performance
SPEC Benchmarks…
One of the most successful attempts to create standardized benchmark
application suites has been the SPEC (Standard Performance Evaluation
Corporation)
Had its roots in the late 1980s efforts to deliver better benchmarks for
workstations
An effort funded and supported by a number of computer vendors to create
standard sets of benchmarks for modern computer systems
Just as the computer industry has evolved over time, so has the need
for different benchmark suites, and there are now SPEC benchmarks
to cover different application classes
All the SPEC benchmark suites and their reported results are found
at www.spec.org
SPEC Benchmarks…
Desktop Benchmarks
Desktop benchmarks divide into two broad classes:
Processor-intensive benchmarks and
Graphics-intensive benchmarks, although many graphics benchmarks include intensive processor activity
SPEC originally created a benchmark set focusing on processor performance (initially called SPEC89),
which has evolved into its fifth generation: SPEC CPU2006, which follows SPEC2000, SPEC95, SPEC92, and
SPEC89
SPEC CPU2006 consists of a set of 12 integer benchmarks (CINT2006) and 17 floating-point benchmarks
(CFP2006)
SPEC benchmarks are real programs modified to be portable and to minimize the effect of I/O on
performance.
The integer benchmarks vary from part of a C compiler to a chess program to a quantum computer simulation.
The floating point benchmarks include structured grid codes for finite element modeling, particle method codes
for molecular dynamics, and sparse linear algebra codes for fluid dynamics.
The SPEC CPU suite is useful for processor benchmarking for both desktop systems and single-processor servers
Although SPEC CPU2006 is aimed at processor performance, SPEC also has benchmarks for graphics
and Java
Measuring, Reporting and Summarizing
The guiding principle of reporting performance measurements
should be reproducibility :
List everything another experimenter would need to duplicate the
results
An alternative would be to add a weighting factor to each benchmark and use the
weighted arithmetic mean as the single number to summarize performance
The problem would be then how to pick weights; since SPEC is a consortium of competing
companies, each company might have their own favorite set of weights, which would make it
hard to reach consensus
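To see why the choice of weights is contentious, here is a minimal sketch of a weighted arithmetic mean of execution times. All timings and weights are hypothetical; they are picked so that two different weightings rank the same two computers in opposite order.

```python
def weighted_mean(times, weights):
    """Weighted arithmetic mean of execution times (weights must sum to 1)."""
    if len(times) != len(weights):
        raise ValueError("need one weight per benchmark")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(t * w for t, w in zip(times, weights))


# Hypothetical execution times (seconds) on two benchmarks.
times_a = [2.0, 10.0]
times_b = [4.0, 6.0]

for weights in ([0.5, 0.5], [0.9, 0.1]):
    mean_a = weighted_mean(times_a, weights)
    mean_b = weighted_mean(times_b, weights)
    faster = "A" if mean_a < mean_b else "B"
    print(f"weights={weights}: A={mean_a:.1f} s, B={mean_b:.1f} s, faster={faster}")
```

With equal weights B has the lower mean time; weighting the first benchmark at 0.9 makes A look better, which is exactly the disagreement described above.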
A third approach: rather than pick weights, we could normalize execution times to
a reference computer by dividing the time on the reference computer by the time on
the computer being rated, yielding a ratio proportional to performance
Measuring, Reporting and Summarizing…
Summarizing Performance Results…
SPEC uses the third approach, calling the ratio the SPECRatio. It has a
particularly useful property that it matches the way we compare
computer performance—namely, comparing performance ratios
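For example, suppose that the SPECRatio of computer A on a benchmark was 1.25
times higher than that of computer B; then you would know
1.25 = SPECRatio_A / SPECRatio_B
= (Execution time_reference / Execution time_A) / (Execution time_reference / Execution time_B)
= Execution time_B / Execution time_A = Performance_A / Performance_B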
Notice that the execution times on the reference computer drop out
and the choice of the reference computer is irrelevant when the
comparisons are made as a ratio, which is the approach we consistently
use
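The claim that the reference computer drops out can be checked numerically. In this sketch the function name and all execution times are hypothetical; the ratio of two SPECRatios stays the same whichever reference time is used.

```python
def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """SPECRatio = time on the reference computer / time on the rated computer."""
    return reference_time_s / measured_time_s


# Hypothetical execution times of one benchmark on computers A and B.
time_a = 80.0
time_b = 100.0

# Try two different (hypothetical) reference machines.
for reference_time in (1000.0, 5000.0):
    ratio_a = spec_ratio(reference_time, time_a)
    ratio_b = spec_ratio(reference_time, time_b)
    print(f"reference={reference_time:.0f} s: "
          f"SPECRatio_A/SPECRatio_B = {ratio_a / ratio_b:.2f}")
```

Both iterations report 1.25, which equals Execution time_B / Execution time_A.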
Measuring, Reporting and Summarizing…
Summarizing Performance Results…
Because a SPECRatio is a ratio rather than an absolute execution time, the
mean must be computed using the geometric mean. (Since SPECRatios
have no units, comparing SPECRatios arithmetically is meaningless.) The
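formula is
Geometric mean = (sample_1 x sample_2 x ... x sample_n)^(1/n)
where, in the case of SPEC, sample_i is the SPECRatio for program i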
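A minimal sketch of the geometric mean, computed through logarithms to avoid overflow on long lists. The SPECRatio values are hypothetical. The example also checks a convenient property: the ratio of two geometric means equals the geometric mean of the per-benchmark ratios.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of n positive values, computed via logs."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("need a non-empty list of positive values")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical SPECRatios for two computers on three benchmarks.
ratios_a = [10.0, 20.0, 40.0]
ratios_b = [8.0, 25.0, 20.0]

gm_a = geometric_mean(ratios_a)
gm_b = geometric_mean(ratios_b)
pairwise = [a / b for a, b in zip(ratios_a, ratios_b)]

print(f"geometric mean A = {gm_a:.2f}, B = {gm_b:.2f}")
print(f"ratio of means = {gm_a / gm_b:.4f}")
print(f"mean of ratios = {geometric_mean(pairwise):.4f}")
```

The last two printed values agree, so the comparison between A and B does not depend on which reference computer produced the SPECRatios.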
Amdahl’s Law
Amdahl’s Law enables us to calculate the performance gain that can be obtained by
improving some portion of a computer
It states that the performance improvement to be gained from using
some faster mode of execution is limited by the fraction of the time
the faster mode can be used
Amdahl’s Law defines the speedup that can be gained by using a particular
feature
What is speedup?
Speedup tells us how much faster a task will run using the computer with
the enhancement as opposed to the original computer
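Speedup is the ratio
Speedup = Performance for entire task using the enhancement when possible / Performance for entire task without using the enhancement
Alternatively,
Speedup = Execution time for entire task without using the enhancement / Execution time for entire task using the enhancement when possible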
Performance…
Amdahl’s Law
Amdahl’s Law gives us a quick way to find the speedup from some
enhancement, which depends on two factors:
The fraction of the computation time in the original computer that can be converted to
take advantage of the enhancement:
For example, if 20 seconds of the execution time of a program that takes 60 seconds in
total can use an enhancement, the fraction is 20/60
This value, which we will call Fraction_enhanced, is always less than or equal to 1
The improvement gained by the enhanced execution mode; that is, how much faster
the task would run if the enhanced mode were used for the entire program:
This value is the time of the original mode over the time of the enhanced mode
If the enhanced mode takes, say, 2 seconds for a portion of the program, while it is 5
seconds in the original mode, the improvement is 5/2. We will call this value, which is
always greater than 1, Speedup_enhanced
Performance…
Amdahl’s Law
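The execution time using the original computer with the enhanced mode
will be the time spent using the unenhanced portion of the computer plus
the time spent using the enhancement:
Execution time_new = Execution time_old x ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
The overall speedup is the ratio of the execution times:
Speedup_overall = Execution time_old / Execution time_new = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)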
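Amdahl’s Law can be sketched in a few lines. Combining the two numbers used earlier as separate illustrations (a fraction of 20/60 and an enhanced-mode speedup of 5/2) into one scenario is an assumption made here for the sake of a worked example; the function name is also hypothetical.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S).

    F is the fraction of the original execution time that can use the
    enhancement; S is how much faster that fraction runs when enhanced.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


old_time = 60.0                 # seconds, original execution time
fraction = 20.0 / 60.0          # 20 of the 60 seconds can be enhanced
speedup_enh = 5.0 / 2.0         # enhanced mode is 2.5x faster

overall = amdahl_speedup(fraction, speedup_enh)
new_time = old_time / overall
print(f"overall speedup = {overall:.2f}, new time = {new_time:.1f} s")

# Upper bound: even an infinitely fast enhancement cannot beat 1 / (1 - F).
print(f"limit as Speedup_enhanced grows = {1.0 / (1.0 - fraction):.2f}")
```

Although the enhanced portion runs 2.5 times faster, the whole program speeds up by only 1.25 (60 s down to 48 s), and no enhancement of that portion could push the overall speedup past 1.5.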
Summary