
Chapter 1 PPT 2007 V 2

Uploaded by

zelalem2022
Copyright © All Rights Reserved
Chapter 1

Introduction

Outline
Background
Recap: Computer Architecture
Performance
SPEC Benchmarks
Measuring and summarizing performance
Amdahl’s Law

Background
• Computer technology has made incredible progress in the roughly 60 years since the first general-purpose electronic computer was created
• Today, less than $500 will purchase a personal computer that has more performance, more main memory, and more disk storage than a computer bought in 1985 for $1 million
• This rapid improvement has come both
  - from advances in the technology used to build computers and
  - from innovation in computer design

Background…
• The late 1970s saw the emergence of the microprocessor
• The ability of the microprocessor to ride the improvements in integrated circuit technology led to a higher rate of improvement: roughly 35% growth per year in performance
• A combination of architectural and organizational enhancements led to 16 years of sustained growth in performance at an annual rate of over 50%, a rate that is unprecedented in the computer industry
• These innovations led to a renaissance in computer design, which emphasized both architectural innovation and efficient use of technology improvements
• This rate of growth compounded so that, by 2002, high-performance microprocessors were about seven times faster than they would have been if they had relied solely on technology, including improved circuit design
Background…
• Since 2002, however, this 16-year renaissance has ended. Processor performance improvement has dropped to about 20% per year due to the triple hurdles of
  - maximum power dissipation of air-cooled chips,
  - little instruction-level parallelism left to exploit efficiently, and
  - almost unchanged memory latency
• In 2004 Intel canceled its high-performance uniprocessor projects and joined IBM and Sun in declaring that the road to higher performance would be via multiple processors per chip rather than via faster uniprocessors

Recap : Computer Architecture
• Computer architecture, old view: the Instruction Set Architecture (ISA)
  - Refers to the actual programmer-visible instruction set
  - The ISA serves as the boundary between the software and hardware
[Figure: layered diagram with software on top, the instruction set architecture in the middle, and hardware at the bottom]
• The instruction set architecture is the interface between software and hardware
• It provides the mechanism by which the software tells the hardware what should be done
• Quick review of the seven dimensions of an ISA:
  - Class of ISA
  - Memory addressing
  - Addressing modes
  - Types and sizes of operands
  - Operations
  - Control flow instructions
  - Encoding an ISA
Recap : Computer Architecture…
• The implementation of a computer has two components:
  - Organization
  - Hardware
• The term organization (also called microarchitecture) includes the high-level aspects of a computer’s design, such as
  - the memory system
  - the memory interconnect
  - the design of the internal processor or CPU (central processing unit, where arithmetic, logic, branching, and data transfer are implemented)
• For example, two pairs of processors with the same instruction set architecture but very different organizations are
  - the AMD Opteron 64 and the Intel Pentium 4: both implement the x86 instruction set, but they have very different pipeline and cache organizations
  - the AMD Opteron and the Intel Core i7: both implement the x86 instruction set, but they have very different pipeline and cache organizations
Recap : Computer Architecture …
• Hardware refers to the specifics of a computer, including
  - the detailed logic design
  - the packaging technology of the computer
• Often a line of computers contains computers with identical instruction set architectures and nearly identical organizations that differ in the detailed hardware implementation. For example,
  - the Pentium 4 and the Mobile Pentium 4 are nearly identical but offer different clock rates and different memory systems, making the Mobile Pentium 4 more effective for low-end computers
  - the Intel Core i7 and the Intel Xeon 7560 are nearly identical but offer different clock rates and different memory systems, making the Xeon 7560 more effective for server computers

Recap : Computer Architecture …
• Computer architecture, modern view: covers all three aspects of computer design
  - Instruction set architecture
  - Organization
  - Hardware

Performance
• Assessing the performance of computers can be quite challenging: the scale and intricacy of modern software systems, together with the wide range of performance improvement techniques employed by hardware designers, have made performance assessment much more difficult
• When trying to choose among different computers, performance is an important attribute
• Accurately measuring and comparing different computers is critical to purchasers and therefore to designers
• Hence, understanding how best to measure performance, and the limitations of performance measurements, is important in selecting a computer
Performance…
Defining Performance
When we say one computer has better performance than another, what do we mean?
Performance…
• If you were running a program on two different desktop computers, you'd say that the faster one is the desktop computer that gets the job done first
  - As an individual computer user, you are interested in reducing response time: the time between the start and completion of a task, also referred to as execution time
• If you were running a datacenter that had several servers running jobs submitted by many users, you'd say that the faster computer was the one that completed the most jobs during a day
  - Datacenter managers are often interested in increasing throughput or bandwidth: the total amount of work done in a given time
• Hence, in most cases, we need different performance metrics, as well as different sets of applications, to benchmark embedded and desktop computers, which are more focused on response time, versus servers, which are more focused on throughput
Performance…
Definitions:
• Response time (execution time)
  - The total time required for the computer to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, CPU execution time, and so on
• Throughput (bandwidth)
  - Another measure of performance: the number of tasks completed per unit time

Performance…
• Whether we are interested in throughput or response time, the key measurement is time. The difference is whether we measure one task (response time) or many tasks (throughput)
• In discussing the performance of computers, we will be primarily concerned with response time
• To maximize performance, we want to minimize response time, or execution time, for some task. Thus, we can relate performance and execution time for a computer X:

$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$

Performance…
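For two computers X and Y, if the performance of X is greater than the performance of Y, we have:

$$\text{Performance}_X > \text{Performance}_Y$$
$$\frac{1}{\text{Execution time}_X} > \frac{1}{\text{Execution time}_Y}$$
$$\text{Execution time}_Y > \text{Execution time}_X$$

That is, if X is faster than Y, the execution time on Y is longer than that on X.
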
Performance…
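In comparing design alternatives, we often want to relate the performance of two different computers quantitatively.

We will use the phrase "X is n times faster than Y" (or, equivalently, "X is n times as fast as Y") to mean

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
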
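As a minimal Python sketch, the definitions above can be applied directly: performance is the reciprocal of execution time, so the ratio of performances equals the inverse ratio of execution times. The function names and the 10 s / 15 s timings are hypothetical and chosen only for illustration.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s: float, time_y_s: float) -> float:
    """Return n in 'X is n times faster than Y', i.e. Performance_X / Performance_Y."""
    return performance(time_x_s) / performance(time_y_s)


# Hypothetical measurements: X runs the program in 10 s, Y runs it in 15 s.
n = times_faster(10.0, 15.0)
print(f"X is {n:.2f} times faster than Y")  # X is 1.50 times faster than Y
```

The same result follows from dividing Y's execution time by X's (15 / 10), which is the middle term of the equation.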
Performance…
• Although as computer users we care about time, when we examine the details of a computer it's convenient to think about performance in other metrics. In particular, computer designers may want to think about a computer by using a measure that relates to how fast the hardware can perform basic functions
• Almost all computers are constructed using a clock that determines when events take place in the hardware. These discrete time intervals are called clock cycles (or ticks, clock ticks, clock periods, clocks, cycles)
• Designers refer to the length of a clock period both as the time for a complete clock cycle (e.g., 250 picoseconds, or 250 ps) and as the clock rate (e.g., 4 gigahertz, or 4 GHz), which is the inverse of the clock period
• In the next section, we will formalize the relationship between the clock cycles of the hardware designer and the seconds of the computer user

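The inverse relationship between clock period and clock rate can be checked numerically. This is a small Python sketch. The helper names and unit constants are illustrative assumptions, and the 250 ps / 4 GHz pair is the one quoted above.

```python
PICOSECOND = 1e-12  # seconds
GIGAHERTZ = 1e9     # cycles per second


def rate_from_period(period_s: float) -> float:
    """Clock rate (Hz) is the inverse of the clock period (s)."""
    return 1.0 / period_s


def period_from_rate(rate_hz: float) -> float:
    """Clock period (s) is the inverse of the clock rate (Hz)."""
    return 1.0 / rate_hz


print(f"{rate_from_period(250 * PICOSECOND) / GIGAHERTZ:.2f} GHz")  # 4.00 GHz
print(f"{period_from_rate(4 * GIGAHERTZ) / PICOSECOND:.0f} ps")     # 250 ps
```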
Performance…
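• Users and designers often examine performance using different metrics
• If we could relate these different metrics, we could determine the effect of a design change on the performance as experienced by the user
• A simple formula relates the most basic metrics (clock cycles and clock cycle time) to CPU time:

$$\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$

• Alternatively, because clock rate and clock cycle time are inverses,

$$\text{CPU execution time for a program} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$
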
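A short Python sketch of the two equivalent forms of the CPU time equation. The program needing 20 billion clock cycles and the 2 GHz clock (0.5 ns cycle time) are hypothetical values, not measurements.

```python
def cpu_time_from_cycle_time(cpu_clock_cycles: float, clock_cycle_time_s: float) -> float:
    """CPU time = CPU clock cycles x clock cycle time."""
    return cpu_clock_cycles * clock_cycle_time_s


def cpu_time_from_clock_rate(cpu_clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU time = CPU clock cycles / clock rate."""
    return cpu_clock_cycles / clock_rate_hz


cycles = 20e9                                   # hypothetical cycle count
t1 = cpu_time_from_cycle_time(cycles, 0.5e-9)   # 0.5 ns clock cycle
t2 = cpu_time_from_clock_rate(cycles, 2e9)      # 2 GHz clock rate
print(f"{t1:.1f} s, {t2:.1f} s")  # 10.0 s, 10.0 s
```

Both forms use the same cycle count. Doubling the clock rate therefore halves CPU time only if the design change leaves the number of clock cycles unchanged.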
Performance…
• The performance equations above did not include any reference to the number of instructions needed for the program. However, the execution time must depend on the number of instructions in a program
• One way to think about execution time is that it equals the number of instructions executed multiplied by the average time per instruction
• Therefore, the number of clock cycles required for a program can be written as

$$\text{CPU clock cycles} = \text{Instructions for a program} \times \text{Average clock cycles per instruction}$$

• Since different instructions may take different amounts of time depending on what they do, the average clock cycles per instruction (CPI) is an average over all the instructions executed in the program

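Because CPI is an average, it can be computed as a weighted sum over instruction classes. This Python sketch assumes a hypothetical program with three instruction classes (A, B, C). The per-class CPIs and instruction counts are invented for illustration.

```python
def cpu_clock_cycles(mix: dict) -> float:
    """CPU clock cycles = sum over classes of (instruction count x CPI of that class)."""
    return sum(count * cpi for count, cpi in mix.values())


def average_cpi(mix: dict) -> float:
    """Average CPI = total clock cycles / total instructions executed."""
    instructions = sum(count for count, _ in mix.values())
    return cpu_clock_cycles(mix) / instructions


# Hypothetical instruction mix: class -> (instructions executed, CPI for that class).
mix = {"A": (2, 1.0), "B": (1, 2.0), "C": (2, 3.0)}
print(cpu_clock_cycles(mix), average_cpi(mix))  # 10.0 2.0
```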
Performance…
• The Classic CPU Performance Equation
  - Written in terms of instruction count (the number of instructions executed by the program), CPI, and clock cycle time:

$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$

or, since clock rate is the inverse of clock cycle time,

$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$

• These formulas are particularly useful because they separate the three key factors that affect performance. We can use them to compare two different implementations or to evaluate a design alternative if we know its impact on these three parameters
• CPI provides one way of comparing two different implementations of the same instruction set architecture, since the number of instructions executed for a program will, of course, be the same
• Clock cycles per instruction (CPI)
  - Average number of clock cycles per instruction for a program or program fragment

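The classic CPU performance equation can be used to compare two implementations of the same instruction set, where the instruction count is shared. In this Python sketch the instruction count, CPIs, and cycle times are hypothetical. Implementation A has the faster clock but the higher CPI.

```python
def cpu_time(instruction_count: float, cpi: float, clock_cycle_time_s: float) -> float:
    """Classic CPU performance equation: CPU time = IC x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


ic = 1e9  # hypothetical; identical for both implementations of the same ISA
time_a = cpu_time(ic, cpi=2.0, clock_cycle_time_s=250e-12)  # faster clock, higher CPI
time_b = cpu_time(ic, cpi=1.2, clock_cycle_time_s=500e-12)  # slower clock, lower CPI
print(f"A: {time_a:.2f} s, B: {time_b:.2f} s, A is {time_b / time_a:.2f} times faster")
# A: 0.50 s, B: 0.60 s, A is 1.20 times faster
```

Neither the clock rate nor the CPI alone decides the outcome. Only the product of all three factors does.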
Performance…
We can see how these factors are combined to yield execution time measured in seconds per program:

$$\text{Time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

Always bear in mind that the only complete and reliable measure of computer performance is time.

When comparing two computers, you must look at all three components, which combine to form execution time. If some of the factors are identical (for example, the clock rate), performance can be determined by comparing all the non-identical factors.
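When the clock rate is identical, it cancels out, and comparing clock cycles (instruction count × CPI) is enough. This Python sketch assumes a hypothetical machine with three instruction classes and two hypothetical code sequences for the same task. The sequence with more instructions turns out to be faster because its CPI is lower.

```python
# Hypothetical per-class CPIs for one machine (not taken from any real processor).
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def clock_cycles(counts: dict) -> int:
    """Total clock cycles = sum over classes of (instruction count x CPI)."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


seq1 = {"A": 2, "B": 1, "C": 2}  # 5 instructions in total
seq2 = {"A": 4, "B": 1, "C": 1}  # 6 instructions in total

cycles1, cycles2 = clock_cycles(seq1), clock_cycles(seq2)
cpi1 = cycles1 / sum(seq1.values())
cpi2 = cycles2 / sum(seq2.values())

# Same clock rate for both sequences, so time_1 / time_2 == cycles1 / cycles2.
print(cycles1, cycles2)            # 10 9
print(cpi1, cpi2)                  # 2.0 1.5
print(f"{cycles1 / cycles2:.2f}")  # 1.11
```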
Performance…
Understanding program performance:
The performance of a program depends on the algorithm, the language, the compiler, the architecture, and the actual hardware.

Assignment:
• Find out how each of these components affects the factors in the CPU performance equation

SPEC Benchmarks
• A computer user who runs the same programs day in and day out would be the perfect candidate to evaluate a new computer
• The set of programs run would form a workload. To evaluate two computer systems, a user would simply compare the execution time of the workload on the two computers
• Most users, however, are not in this situation. Instead, they must rely on other methods that measure the performance of a candidate computer, hoping that the methods will reflect how well the computer will perform with the user's workload
• The usual alternative is to evaluate the computer using a set of benchmarks, programs specifically chosen to measure performance. The benchmarks form a workload that the user hopes will predict the performance of the actual workload

SPEC Benchmarks
Definitions:
• Workload
  - A set of programs run on a computer that is either the actual collection of applications run by a user or constructed from real programs to approximate such a mix. A typical workload specifies both the programs and the relative frequencies
• Benchmark
  - A program selected for use in comparing computer performance
SPEC Benchmarks…
• One of the most successful attempts to create standardized benchmark application suites has been the SPEC (Standard Performance Evaluation Corporation)
  - It has its roots in late-1980s efforts to deliver better benchmarks for workstations
  - It is an effort funded and supported by a number of computer vendors to create standard sets of benchmarks for modern computer systems
• Just as the computer industry has evolved over time, so has the need for different benchmark suites, and there are now SPEC benchmarks to cover different application classes
• All the SPEC benchmark suites and their reported results are found at www.spec.org
SPEC Benchmarks…
• Desktop Benchmarks
  - Desktop benchmarks divide into two broad classes: processor-intensive benchmarks and graphics-intensive benchmarks, although many graphics benchmarks include intensive processor activity
• SPEC originally created a benchmark set focusing on processor performance (initially called SPEC89), which has evolved into its fifth generation: SPEC CPU2006, which follows SPEC2000, SPEC95, SPEC92, and SPEC89
  - SPEC CPU2006 consists of a set of 12 integer benchmarks (CINT2006) and 17 floating-point benchmarks (CFP2006)
• SPEC benchmarks are real programs modified to be portable and to minimize the effect of I/O on performance
  - The integer benchmarks vary from part of a C compiler to a chess program to a quantum computer simulation
  - The floating-point benchmarks include structured grid codes for finite element modeling, particle method codes for molecular dynamics, and sparse linear algebra codes for fluid dynamics
  - The SPEC CPU suite is useful for processor benchmarking for both desktop systems and single-processor servers
• Although SPEC CPU2006 is aimed at processor performance, SPEC also has benchmarks for graphics and Java

Measuring, Reporting and Summarizing
• The guiding principle of reporting performance measurements should be reproducibility: list everything another experimenter would need to duplicate the results
• A SPEC benchmark report requires an extensive description of the computer and the compiler flags, as well as the publication of both the baseline and optimized results
• In addition to hardware, software, and baseline tuning parameter descriptions, a SPEC report contains the actual performance times, shown both in tabular form and as a graph
Measuring, Reporting and Summarizing…
Summarizing Performance Results
• Once we have chosen to measure performance with a benchmark suite, we would like to be able to summarize the performance results of the suite in a single number
• A straightforward approach to computing a summary result would be to compare the arithmetic means of the execution times of the programs in the suite
  - However, some SPEC programs take four times longer than others, so those programs would be much more important if the arithmetic mean were the single number used to summarize performance
• An alternative would be to add a weighting factor to each benchmark and use the weighted arithmetic mean as the single number to summarize performance
  - The problem would then be how to pick weights: since SPEC is a consortium of competing companies, each company might have its own favorite set of weights, which would make it hard to reach consensus
• A third approach: rather than pick weights, we could normalize execution times to a reference computer by dividing the time on the reference computer by the time on the computer being rated, yielding a ratio proportional to performance
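The following Python sketch shows why the arithmetic mean of raw times over-weights long programs while normalization does not. It uses two hypothetical programs whose times differ by a factor of four (the numbers are invented, not SPEC results), and the baseline machine doubles as the reference computer.

```python
def arithmetic_mean(values):
    values = list(values)
    return sum(values) / len(values)


def spec_ratios(reference, measured):
    """Normalize each program: reference time / measured time (higher is faster)."""
    return {p: reference[p] / measured[p] for p in reference}


# Hypothetical execution times in seconds; one program takes 4x longer than the other.
baseline = {"short_prog": 100.0, "long_prog": 400.0}
short_2x = {"short_prog": 50.0, "long_prog": 400.0}   # short program made 2x faster
long_2x = {"short_prog": 100.0, "long_prog": 200.0}   # long program made 2x faster

# The arithmetic mean rewards speeding up the long program far more.
print(arithmetic_mean(baseline.values()))  # 250.0
print(arithmetic_mean(short_2x.values()))  # 225.0
print(arithmetic_mean(long_2x.values()))   # 150.0

# Normalized to the baseline, the two improvements look symmetric.
print(spec_ratios(baseline, short_2x))  # {'short_prog': 2.0, 'long_prog': 1.0}
print(spec_ratios(baseline, long_2x))   # {'short_prog': 1.0, 'long_prog': 2.0}
```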
Measuring, Reporting and Summarizing…
Summarizing Performance Results…
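• SPEC uses the third approach, calling the ratio the SPECRatio. It has a particularly useful property: it matches the way we compare computer performance, namely by comparing performance ratios
  - For example, suppose that the SPECRatio of computer A on a benchmark was 1.25 times higher than that of computer B; then you would know

$$1.25 = \frac{\text{SPECRatio}_A}{\text{SPECRatio}_B} = \frac{\text{Execution time}_{\text{reference}} / \text{Execution time}_A}{\text{Execution time}_{\text{reference}} / \text{Execution time}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{\text{Performance}_A}{\text{Performance}_B}$$

• Notice that the execution times on the reference computer drop out, so the choice of the reference computer is irrelevant when the comparisons are made as a ratio, which is the approach we consistently use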
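This Python snippet is a quick numerical check that the reference time drops out. The timings of A and B and the two reference times are hypothetical, chosen so that A is 1.25 times faster than B as in the example above.

```python
def spec_ratio(reference_s: float, measured_s: float) -> float:
    """SPECRatio = execution time on the reference / measured execution time."""
    return reference_s / measured_s


time_a, time_b = 8.0, 10.0  # hypothetical times (s) for one benchmark on A and B

for reference_s in (100.0, 3000.0):  # two arbitrary reference computers
    ratio = spec_ratio(reference_s, time_a) / spec_ratio(reference_s, time_b)
    print(f"reference = {reference_s:.0f} s: SPECRatio_A / SPECRatio_B = {ratio:.2f}")
# Both lines report 1.25, which equals time_b / time_a.
```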
Measuring, Reporting and Summarizing…
Summarizing Performance Results…
• Because a SPECRatio is a ratio rather than an absolute execution time, the mean must be computed using the geometric mean. (Since SPECRatios have no units, comparing SPECRatios arithmetically is meaningless.) The formula is

$$\text{Geometric mean} = \sqrt[n]{\prod_{i=1}^{n} \text{sample}_i}$$

where $\text{sample}_i$ is the SPECRatio for program $i$
• Using the geometric mean ensures two important properties:
  - The geometric mean of the ratios is the same as the ratio of the geometric means
  - The ratio of the geometric means is equal to the geometric mean of the performance ratios, which implies that the choice of the reference computer is irrelevant
• Hence, the motivations to use the geometric mean are substantial, especially when we use performance ratios to make comparisons
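The Python sketch below computes the geometric mean through logarithms, a common way to avoid overflow on long products (Python 3.8+ also provides `statistics.geometric_mean`). It checks the reference-independence property using hypothetical per-benchmark times for two computers and two arbitrary reference machines.

```python
import math


def geometric_mean(samples):
    """n-th root of the product of the samples, computed via logarithms."""
    samples = list(samples)
    return math.exp(sum(math.log(s) for s in samples) / len(samples))


# Hypothetical per-benchmark execution times (seconds) on computers A and B.
times_a = [10.0, 40.0, 5.0]
times_b = [20.0, 30.0, 10.0]

# Two arbitrary reference machines; the ratio of geometric means should not change.
for reference in ([100.0, 200.0, 50.0], [1.0, 1.0, 1.0]):
    spec_a = [r / t for r, t in zip(reference, times_a)]
    spec_b = [r / t for r, t in zip(reference, times_b)]
    print(f"{geometric_mean(spec_a) / geometric_mean(spec_b):.4f}")  # 1.4422 both times

# Geometric mean of the per-benchmark performance ratios (time_B / time_A).
perf_ratios = [tb / ta for ta, tb in zip(times_a, times_b)]
print(f"{geometric_mean(perf_ratios):.4f}")  # 1.4422
```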
Amdahl’s Law
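• Amdahl’s Law enables us to calculate the performance gain that can be obtained by improving some portion of a computer
  - It states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used
• Amdahl’s Law defines the speedup that can be gained by using a particular feature
• What is speedup?
  - Speedup tells us how much faster a task will run using the computer with the enhancement as opposed to the original computer
  - Speedup is the ratio

$$\text{Speedup} = \frac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}}$$

or, alternatively,

$$\text{Speedup} = \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}}$$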
Performance…
Amdahl’s Law
• Amdahl’s Law gives us a quick way to find the speedup from some enhancement, which depends on two factors:
  - The fraction of the computation time in the original computer that can be converted to take advantage of the enhancement. For example, if 20 seconds of the execution time of a program that takes 60 seconds in total can use an enhancement, the fraction is 20/60. This value, which we will call $\text{Fraction}_{\text{enhanced}}$, is always less than or equal to 1
  - The improvement gained by the enhanced execution mode, that is, how much faster the task would run if the enhanced mode were used for the entire program. This value is the time of the original mode over the time of the enhanced mode. If the enhanced mode takes, say, 2 seconds for a portion of the program, while it takes 5 seconds in the original mode, the improvement is 5/2. We will call this value, which is always greater than 1, $\text{Speedup}_{\text{enhanced}}$

Performance…
Amdahl’s Law
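• The execution time using the original computer with the enhanced mode will be the time spent using the unenhanced portion of the computer plus the time spent using the enhancement:

$$\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left( (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right)$$

• The overall speedup is the ratio of the execution times:

$$\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
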
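This is a minimal Python sketch of Amdahl’s Law. For illustration it assumes, hypothetically, that the two numerical examples given for the two factors describe the same enhancement: 20 of 60 seconds can use it ($\text{Fraction}_{\text{enhanced}} = 20/60$), and the enhanced mode is 5/2 times faster ($\text{Speedup}_{\text{enhanced}} = 5/2$).

```python
def new_execution_time(old_time_s: float, fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Execution time_new = time_old x ((1 - F) + F / S)."""
    return old_time_s * ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Speedup_overall = 1 / ((1 - F) + F / S)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical: combine the two illustrative numbers as if they described one enhancement.
f, s = 20 / 60, 5 / 2
print(f"{new_execution_time(60.0, f, s):.1f} s")  # 48.0 s
print(f"{amdahl_speedup(f, s):.2f}")              # 1.25
# An infinitely fast enhanced mode is still bounded by 1 / (1 - F).
print(f"{amdahl_speedup(f, float('inf')):.2f}")   # 1.50
```

The last line shows the limit the law describes. However fast the enhanced mode becomes, the 40 seconds that cannot use it keep the overall speedup at or below 1 / (1 - 20/60) = 1.5.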