1 Fundamentals of Computer Design

1.1 Introduction
Computer technology has made incredible progress in the past half century. In
1945, there were no stored-program computers. Today, a few thousand dollars
will purchase a personal computer that has more performance, more main memo-
ry, and more disk storage than a computer bought in 1965 for $1 million. This
rapid rate of improvement has come both from advances in the technology used
to build computers and from innovation in computer design. While technological
improvements have been fairly steady, progress arising from better computer
architectures has been much less consistent. During the first 25 years of elec-
tronic computers, both forces made a major contribution; but beginning in about
1970, computer designers became largely dependent upon integrated circuit tech-
nology. During the 1970s, performance continued to improve at about 25% to
30% per year for the mainframes and minicomputers that dominated the industry.
The late 1970s saw the emergence of the microprocessor. The ability of the
microprocessor to ride the improvements in integrated circuit technology more
closely than the less integrated mainframes and minicomputers led to a higher
rate of improvement—roughly 35% growth per year in performance.
[Figure 1.1 chart: SPECint rating (0 to 350) versus year, mid-1980s through 1995, for the SUN4, MIPS R2000, MIPS R3000, IBM Power1, HP 9000, IBM Power2, and DEC Alpha processors, with a 1.35x-per-year trend line.]
FIGURE 1.1 Growth in microprocessor performance since the mid 1980s has been substantially higher than in ear-
lier years. This chart plots the performance as measured by the SPECint benchmarks. Prior to the mid 1980s, micropro-
cessor performance growth was largely technology driven and averaged about 35% per year. The increase in growth since
then is attributable to more advanced architectural ideas. By 1995 this growth leads to more than a factor of five difference
in performance. Performance for floating-point-oriented calculations has increased even faster.
The effect of this dramatic growth rate has been twofold. First, it has signifi-
cantly enhanced the capability available to computer users. As a simple example,
consider the highest-performance workstation announced in 1993, an IBM
Power-2 machine. Compared with a CRAY Y-MP supercomputer introduced in
1988 (probably the fastest machine in the world at that point), the workstation of-
fers comparable performance on many floating-point programs (the performance
for the SPEC floating-point benchmarks is similar) and better performance on in-
teger programs for a price that is less than one-tenth of the supercomputer!
Second, this dramatic rate of improvement has led to the dominance of micro-
processor-based computers across the entire range of the computer design. Work-
stations and PCs have emerged as major products in the computer industry.
Minicomputers, which were traditionally made from off-the-shelf logic or from
gate arrays, have been replaced by servers made using microprocessors. Main-
frames are slowly being replaced with multiprocessors consisting of small num-
bers of off-the-shelf microprocessors. Even high-end supercomputers are being
built with collections of microprocessors.
Freedom from compatibility with old designs and the use of microprocessor
technology led to a renaissance in computer design, which emphasized both ar-
chitectural innovation and efficient use of technology improvements. This renais-
sance is responsible for the higher performance growth shown in Figure 1.1—a
rate that is unprecedented in the computer industry. This rate of growth has com-
pounded so that by 1995, the difference between the highest-performance micro-
processors and what would have been obtained by relying solely on technology is
more than a factor of five. This text is about the architectural ideas and accom-
panying compiler improvements that have made this incredible growth rate possi-
ble. At the center of this dramatic revolution has been the development of a
quantitative approach to computer design and analysis that uses empirical obser-
vations of programs, experimentation, and simulation as its tools. It is this style
and approach to computer design that is reflected in this text.
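As a rough, back-of-envelope check on that factor-of-five claim, the implied annual growth rate can be estimated in a few lines of Python. This is a sketch under the assumptions stated in the comments, not a figure taken from the text.

    # Assumption: the gap opened over roughly a decade (mid-1980s to 1995)
    # relative to a technology-only trend of about 35% per year.
    baseline_growth = 1.35
    years = 10
    gap = 5.0
    implied_growth = baseline_growth * gap ** (1 / years)
    print(round(implied_growth, 2))   # about 1.59x per year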
Sustaining the recent improvements in cost and performance will require con-
tinuing innovations in computer design, and the authors believe such innovations
will be founded on this quantitative approach to computer design. Hence, this
book has been written not only to document this design style, but also to stimu-
late you to contribute to this progress.
packaging, power, and cooling. Optimizing the design requires familiarity with a
very wide range of technologies, from compilers and operating systems to logic
design and packaging.
In the past, the term computer architecture often referred only to instruction
set design. Other aspects of computer design were called implementation, often
insinuating that implementation is uninteresting or less challenging. The authors
believe this view is not only incorrect, but is even responsible for mistakes in the
design of new instruction sets. The architect’s or designer’s job is much more
than instruction set design, and the technical hurdles in the other aspects of the
project are certainly as challenging as those encountered in doing instruction set
design. This is particularly true at the present when the differences among in-
struction sets are small (see Appendix C).
In this book the term instruction set architecture refers to the actual programmer-
visible instruction set. The instruction set architecture serves as the boundary be-
tween the software and hardware, and that topic is the focus of Chapter 2. The im-
plementation of a machine has two components: organization and hardware. The
term organization includes the high-level aspects of a computer’s design, such as
the memory system, the bus structure, and the internal CPU (central processing
unit—where arithmetic, logic, branching, and data transfer are implemented)
design. For example, two machines with the same instruction set architecture but
different organizations are the SPARCstation-2 and SPARCstation-20. Hardware
is used to refer to the specifics of a machine. This would include the detailed
logic design and the packaging technology of the machine. Often a line of ma-
chines contains machines with identical instruction set architectures and nearly
identical organizations, but they differ in the detailed hardware implementation.
For example, two versions of the Silicon Graphics Indy differ in clock rate and in
detailed cache structure. In this book the word architecture is intended to cover
all three aspects of computer design—instruction set architecture, organization,
and hardware.
Computer architects must design a computer to meet functional requirements
as well as price and performance goals. Often, they also have to determine what
the functional requirements are, and this can be a major task. The requirements
may be specific features, inspired by the market. Application software often
drives the choice of certain functional requirements by determining how the ma-
chine will be used. If a large body of software exists for a certain instruction set
architecture, the architect may decide that a new machine should implement an
existing instruction set. The presence of a large market for a particular class of
applications might encourage the designers to incorporate requirements that
would make the machine competitive in that market. Figure 1.2 summarizes
some requirements that need to be considered in designing a new machine. Many
of these requirements and features will be examined in depth in later chapters.
Once a set of functional requirements has been established, the architect must
try to optimize the design. Which design choices are optimal depends, of course,
on the choice of metrics. The most common metrics involve cost and perfor-
mance. Given some application domain, the architect can try to quantify the per-
formance of the machine by a set of programs that are chosen to represent that
application domain. Other measurable requirements may be important in some
markets; reliability and fault tolerance are often crucial in transaction processing
environments. Throughout this text we will focus on optimizing machine cost/
performance.
In choosing between two designs, one factor that an architect must consider is
design complexity. Complex designs take longer to complete, prolonging time to
market. This means a design that takes longer will need to have higher perfor-
mance to be competitive. The architect must be constantly aware of the impact of
his design choices on the design time for both hardware and software.
In addition to performance, cost is the other key parameter in optimizing cost/
performance. In addition to cost, designers must be aware of important trends in
both the implementation technology and the use of computers. Such trends not
only impact future cost, but also determine the longevity of an architecture. The
next two sections discuss technology and cost trends.
thresholds that can enable an implementation technique that was previously im-
possible. For example, when MOS technology reached the point where it could
put between 25,000 and 50,000 transistors on a single chip in the early 1980s, it
became possible to build a 32-bit microprocessor on a single chip. By eliminating
chip crossings within the processor, a dramatic increase in cost/performance was
possible. This design was simply infeasible until the technology reached a certain
point. Such technology thresholds are not rare and have a significant impact on a
wide variety of design decisions.
in Figure 1.3, where the cost of a new DRAM chip is depicted over its lifetime.
Between the start of a project and the shipping of a product, say two years, the
cost of a new DRAM drops by a factor of between five and 10 in constant dollars.
Since not all component costs change at the same rate, designs based on project-
ed costs result in different cost/performance trade-offs than those using current
costs. The caption of Figure 1.3 discusses some of the long-term trends in DRAM
cost.
[Figure 1.3 chart: dollars per DRAM chip (0 to 80, in 1977 dollars) versus year for the 16 KB, 64 KB, 256 KB, 1 MB, 4 MB, and 16 MB generations, with the final chip cost of each generation marked.]
FIGURE 1.3 Prices of four generations of DRAMs over time in 1977 dollars, showing the learning curve at work. A
1977 dollar is worth about $2.44 in 1995; most of this inflation occurred in the period of 1977–82, during which the value
changed to $1.61. The cost of a megabyte of memory has dropped incredibly during this period, from over $5000 in 1977 to
just over $6 in 1995 (in 1977 dollars)! Each generation drops in constant dollar price by a factor of 8 to 10 over its lifetime.
The increasing cost of fabrication equipment for each new generation has led to slow but steady increases in both the start-
ing price of a technology and the eventual, lowest price. Periods when demand exceeded supply, such as 1987–88 and
1992–93, have led to temporary higher pricing, which shows up as a slowing in the rate of price decrease.
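As a small worked example of the constant-dollar figures in the caption, the sketch below converts a nominal 1995 price into 1977 dollars (Python; the 1995 price used is hypothetical, chosen only to land on the caption's just-over-$6-per-megabyte figure).

    # Convert a nominal 1995 price into the 1977 dollars used in Figure 1.3,
    # using the deflator of 2.44 quoted in the caption.
    def to_1977_dollars(price_in_1995_dollars):
        return price_in_1995_dollars / 2.44

    print(round(to_1977_dollars(14.64), 2))   # 6.0, i.e., just over $6 in 1977 dollars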
In this section, we focus on the cost of dies, summarizing the key issues in testing
and packaging at the end. A longer discussion of the testing costs and packaging
costs appears in the Exercises.
FIGURE 1.4 Photograph of an 8-inch wafer containing Intel Pentium microprocessors. The die size is 480.7 mm²
and the total number of dies is 63. (Courtesy Intel.)
FIGURE 1.5 Photograph of an 8-inch wafer containing PowerPC 601 microprocessors. The die size is 122 mm². The
number of dies on the wafer is 200 after subtracting the test dies (the odd-looking dies that are scattered around). (Courtesy
IBM.)
To learn how to predict the number of good chips per wafer requires first
learning how many dies fit on a wafer and then learning how to predict the per-
centage of those that will work. From there it is simple to predict cost:
Cost of die = Cost of wafer / (Dies per wafer × Die yield)
The most interesting feature of this first term of the chip cost equation is its sensi-
tivity to die size, shown below.
The number of dies per wafer is basically the area of the wafer divided by the
area of the die. It can be more accurately estimated by
Dies per wafer = π × (Wafer diameter/2)^2 / Die area − π × Wafer diameter / √(2 × Die area)
The first term is the ratio of wafer area (πr2) to die area. The second compensates
for the “square peg in a round hole” problem—rectangular dies near the periphery
of round wafers. Dividing the circumference (πd) by the diagonal of a square die is
approximately the number of dies along the edge. For example, a wafer 20 cm (≈ 8
inch) in diameter produces 3.14 × 100 – ( 3.14 × 20 ⁄ 1.41 ) = 269 1-cm dies.
EXAMPLE Find the number of dies per 20-cm wafer for a die that is 1.5 cm on a side.
Dies per wafer = π × (20/2)^2 / 2.25 − π × 20 / √(2 × 2.25) = 314/2.25 − 62.8/2.12 = 110
■
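A minimal Python sketch of this calculation (the function and variable names are ours, not from the text) reproduces both counts.

    import math

    def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
        # First term: wafer area divided by die area; second term corrects for
        # the rectangular dies lost around the circular edge of the wafer.
        wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
        edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
        return int(wafer_area / die_area_cm2 - edge_loss)

    print(dies_per_wafer(20, 1.0))    # 269 dies that are 1 cm on a side
    print(dies_per_wafer(20, 2.25))   # 110 dies that are 1.5 cm on a side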
But this only gives the maximum number of dies per wafer. The critical question is, What fraction or percentage of the dies on a wafer are good, that is, what is the die yield? A simple empirical model of integrated circuit yield, which assumes that defects are randomly distributed over the wafer and that yield is inversely proportional to the complexity of the fabrication process, leads to the following:

Die yield = Wafer yield × (1 + Defects per unit area × Die area / α)^(–α)
where wafer yield accounts for wafers that are completely bad and so need not be
tested. For simplicity, we’ll just assume the wafer yield is 100%. Defects per unit
area is a measure of the random and manufacturing defects that occur. In 1995,
these values typically range between 0.6 and 1.2 per square centimeter, depend-
ing on the maturity of the process (recall the learning curve, mentioned earlier).
Lastly, α is a parameter that corresponds roughly to the number of masking lev-
els, a measure of manufacturing complexity, critical to die yield. For today’s mul-
tilevel metal CMOS processes, a good estimate is α = 3.0.
EXAMPLE Find the die yield for dies that are 1 cm on a side and 1.5 cm on a side,
assuming a defect density of 0.8 per cm².
ANSWER The total die areas are 1 cm² and 2.25 cm². For the smaller die the yield is

Die yield = (1 + 0.8 × 1 / 3)^(–3) = 0.49

For the larger die, it is

Die yield = (1 + 0.8 × 2.25 / 3)^(–3) = 0.24
■
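The same yield model is easy to evaluate directly; a short Python sketch (again with names of our choosing) reproduces the two figures above.

    def die_yield(defects_per_cm2, die_area_cm2, alpha=3.0, wafer_yield=1.0):
        # Empirical model from the text: defects assumed randomly distributed,
        # wafer yield taken as 100% for simplicity.
        return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

    print(round(die_yield(0.8, 1.0), 2))    # 0.49
    print(round(die_yield(0.8, 2.25), 2))   # 0.24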
The bottom line is the number of good dies per wafer, which comes from mul-
tiplying dies per wafer by die yield. The examples above predict 132 good 1-cm²
dies from the 20-cm wafer and 26 good 2.25-cm² dies. Most high-end micro-
processors fall between these two sizes, with some being as large as 2.75 cm² in
1995. Low-end processors are sometimes as small as 0.8 cm², while processors
used for embedded control (in printers, automobiles, etc.) are often just 0.5 cm².
(Figure 1.22 on page 63 in the Exercises shows the die size and technology for sev-
eral current microprocessors.) Occasionally dies become pad limited: the amount
of die area is determined by the perimeter rather than the logic in the interior. This
may lead to a higher yield, since defects in empty silicon are less serious!
Processing a 20-cm-diameter wafer in a leading-edge technology with 3–4
metal layers costs between $3000 and $4000 in 1995. Assuming a processed wa-
fer cost of $3500, the cost of the 1-cm² die is around $27, while the cost per die
of the 2.25-cm² die is about $140, or slightly over 5 times the cost for a die that is
2.25 times larger.
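Putting the pieces together, and reusing the two helper functions sketched earlier, gives cost-per-die estimates in the same range as the text; small differences come from rounding.

    # Assumed processed-wafer cost of $3500, as in the discussion above.
    wafer_cost = 3500.0
    for side_cm in (1.0, 1.5):
        area = side_cm ** 2
        good_dies = dies_per_wafer(20, area) * die_yield(0.8, area)
        print(side_cm, round(good_dies), round(wafer_cost / good_dies))
        # about 132 good 1-cm² dies at roughly $26-27 each, and about
        # 27 good 2.25-cm² dies at roughly $130 each (the text quotes about $140)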
What should a computer designer remember about chip costs? The manufac-
turing process dictates the wafer cost, wafer yield, α, and defects per unit area, so
the sole control of the designer is die area. Since α is typically 3 for the advanced
processes in use today, die costs are proportional to the fourth (or higher) power
of the die area:
Cost of die = f(Die area^4)
The computer designer affects die size, and hence cost, both by what functions
are included on or excluded from the die and by the number of I/O pins.
Before we have a part that is ready for use in a computer, the part must be
tested (to separate the good dies from the bad), packaged, and tested again after
packaging. These steps all add costs. These processes and their contribution to
cost are discussed and evaluated in Exercise 1.8.
before it becomes price, and the computer designer should understand how a de-
sign decision will affect the potential selling price. For example, changing cost
by $1000 may change price by $3000 to $4000. Without understanding the rela-
tionship of cost to price the computer designer may not understand the impact on
price of adding, deleting, or replacing components. The relationship between
price and volume can increase the impact of changes in cost, especially at the low
end of the market. Typically, fewer computers are sold as the price increases. Fur-
thermore, as volume decreases, costs rise, leading to further increases in price.
Thus, small changes in cost can have a larger than obvious impact. The relation-
ship between cost and price is a complex one with entire books written on the
subject. The purpose of this section is to give you a simple introduction to what
factors determine price and typical ranges for these factors.
The categories that make up price can be shown either as a tax on cost or as a
percentage of the price. We will look at the information both ways. These differ-
ences between price and cost also depend on where in the computer marketplace
a company is selling. To show these differences, Figures 1.7 and 1.8 on page 16
show how the difference between cost of materials and list price is decomposed,
with the price increasing from left to right as we add each type of overhead.
Direct costs refer to the costs directly related to making a product. These in-
clude labor costs, purchasing components, scrap (the leftover from yield), and
warranty, which covers the costs of systems that fail at the customer’s site during
the warranty period. Direct cost typically adds 20% to 40% to component cost.
Service or maintenance costs are not included because the customer typically
pays those costs, although a warranty allowance may be included here or in gross
margin, discussed next.
The next addition is called the gross margin, the company’s overhead that can-
not be billed directly to one product. This can be thought of as indirect cost. It in-
cludes the company’s research and development (R&D), marketing, sales,
manufacturing equipment maintenance, building rental, cost of financing, pretax
profits, and taxes. When the component costs are added to the direct cost and
gross margin, we reach the average selling price—ASP in the language of
MBAs—the money that comes directly to the company for each product sold.
The gross margin is typically 20% to 55% of the average selling price, depending
on the uniqueness of the product. Manufacturers of low-end PCs generally have
lower gross margins for several reasons. First, their R&D expenses are lower.
Second, their cost of sales is lower, since they use indirect distribution (by mail,
phone order, or retail store) rather than salespeople. Third, because their products
are less unique, competition is more intense, thus forcing lower prices and often
lower profits, which in turn lead to a lower gross margin.
List price and average selling price are not the same. One reason for this is that
companies offer volume discounts, lowering the average selling price. Also, if the
product is to be sold in retail stores, as personal computers are, stores want to
keep 40% to 50% of the list price for themselves. Thus, depending on the distri-
bution system, the average selling price is typically 50% to 75% of the list price.
[Figure 1.7 chart: bars building up from component cost through gross margin to average selling price and list price; legible labels include a 33.3% average discount, a 33.3% gross margin, and 50%.]
FIGURE 1.7 The components of price for a mid-range product in a workstation com-
pany. Each increase is shown along the bottom as a tax on the prior price. The percentages
of the new price for all elements are shown on the left of each column.
[Figure 1.8 chart: the corresponding breakdown for a desktop personal computer; legible labels include a 45% average discount, a 14% gross margin, and 25%.]
FIGURE 1.8 The components of price for a desktop product in a personal computer
company. A larger average discount is used because of indirect selling, and a lower gross
margin is required.
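The sketch below turns this markup chain into code (Python). The specific percentages are illustrative mid-range values drawn from the ranges quoted above, not figures for any real product.

    def list_price_from_cost(component_cost,
                             direct_cost_markup=0.30,   # direct costs add 20%-40% to component cost
                             gross_margin_share=0.34,   # gross margin is 20%-55% of the ASP
                             asp_share_of_list=0.67):   # ASP is typically 50%-75% of list price
        direct_cost = component_cost * direct_cost_markup
        # Gross margin is defined as a share of the average selling price itself.
        asp = (component_cost + direct_cost) / (1 - gross_margin_share)
        return asp / asp_share_of_list

    # With these assumptions, $1000 of components becomes a list price near $2900,
    # consistent with the earlier observation that a $1000 change in cost can move
    # the price by roughly $3000.
    print(round(list_price_from_cost(1000)))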
Execution time Y / Execution time X = n
The phrase “the throughput of X is 1.3 times higher than Y” signifies here that
the number of tasks completed per unit time on machine X is 1.3 times the num-
ber completed on Y.
Because performance and execution time are reciprocals, increasing perfor-
mance decreases execution time. To help avoid confusion between the terms
increasing and decreasing, we usually say “improve performance” or “improve
execution time” when we mean increase performance and decrease execution
time.
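A two-line Python sketch (with illustrative numbers of our own) makes the "n times faster" relation concrete.

    def times_faster(exec_time_x, exec_time_y):
        # "X is n times faster than Y" when this ratio is n; equivalently
        # n = performance of X / performance of Y, since performance = 1 / execution time.
        return exec_time_y / exec_time_x

    print(times_faster(exec_time_x=10.0, exec_time_y=15.0))   # 1.5, so X is 1.5 times faster than Y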
Whether we are interested in throughput or response time, the key measure-
ment is time: The computer that performs the same amount of work in the least
time is the fastest. The difference is whether we measure one task (response time)
or many tasks (throughput). Unfortunately, time is not always the metric quoted
in comparing the performance of computers. A number of popular measures have
been adopted in the quest for an easily understood, universal measure of computer
performance, with the result that a few innocent terms have been shanghaied
from their well-defined environment and forced into a service for which they
were never intended. The authors' position is that the only consistent and reliable
measure of performance is the execution time of real programs, and that all proposed
alternatives to time as the metric or to real programs as the items measured
have eventually led to misleading claims or even mistakes in computer design.
Measuring Performance
Even execution time can be defined in different ways depending on what we
count. The most straightforward definition of time is called wall-clock time, re-
sponse time, or elapsed time, which is the latency to complete a task, including
disk accesses, memory accesses, input/output activities, operating system over-
head—everything. With multiprogramming the CPU works on another program
while waiting for I/O and may not necessarily minimize the elapsed time of one
program. Hence we need a term to take this activity into account. CPU time rec-
ognizes this distinction and means the time the CPU is computing, not including
the time waiting for I/O or running other programs. (Clearly the response time
seen by the user is the elapsed time of the program, not the CPU time.) CPU time
can be further divided into the CPU time spent in the program, called user CPU
time, and the CPU time spent in the operating system performing tasks requested
by the program, called system CPU time.
These distinctions are reflected in the UNIX time command, which returns
four measurements when applied to an executing program:
90.7u 12.9s 2:39 65%
User CPU time is 90.7 seconds, system CPU time is 12.9 seconds, elapsed time is
2 minutes and 39 seconds (159 seconds), and the percentage of elapsed time that
is CPU time is (90.7 + 12.9)/159 or 65%. More than a third of the elapsed time in
this example was spent waiting for I/O or running other programs or both. Many
measurements ignore system CPU time because of the inaccuracy of operating
systems’ self-measurement (the above inaccurate measurement came from UNIX)
and the inequity of including system CPU time when comparing performance be-
tween machines with differing system codes. On the other hand, system code on
some machines is user code on others, and no program runs without some operat-
ing system running on the hardware, so a case can be made for using the sum of
user CPU time and system CPU time.
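As a quick check of the arithmetic in the time example above, the following Python sketch recomputes the CPU fraction from the sample output's values.

    user_cpu, system_cpu = 90.7, 12.9   # the "u" and "s" fields, in seconds
    elapsed = 2 * 60 + 39               # the "2:39" field, i.e., 159 seconds
    cpu_fraction = (user_cpu + system_cpu) / elapsed
    print(f"{cpu_fraction:.0%}")        # 65%, matching the last field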
In the present discussion, a distinction is maintained between performance
based on elapsed time and that based on CPU time. The term system performance
is used to refer to elapsed time on an unloaded system, while CPU performance
refers to user CPU time on an unloaded system. We will concentrate on CPU per-
formance in this chapter.
This program is the result of extensive research to determine the instruction mix
of a typical Fortran program. The results of this program on different machines
should give a good indication of which machine performs better under a typical
load of Fortran programs. The statements are purposely arranged to defeat opti-
mizations by the compiler.
H. J. Curnow and B. A. Wichmann [1976], Comments in the Whetstone Benchmark
A computer user who runs the same programs day in and day out would be the
perfect candidate to evaluate a new computer. To evaluate a new system the user
would simply compare the execution time of her workload—the mixture of pro-
grams and operating system commands that users run on a machine. Few are in
this happy situation, however. Most must rely on other methods to evaluate ma-
chines and often other evaluators, hoping that these methods will predict per-
formance for their usage of the new machine. There are four levels of programs
used in such circumstances, listed below in decreasing order of accuracy of pre-
diction.
1. Real programs—While the buyer may not know what fraction of time is spent
on these programs, she knows that some users will run them to solve real prob-
lems. Examples are compilers for C, text-processing software like TeX, and CAD
tools like Spice. Real programs have input, output, and options that a user can se-
lect when running the program.
2. Kernels—Several attempts have been made to extract small, key pieces from
real programs and use them to evaluate performance. Livermore Loops and Lin-
pack are the best known examples. Unlike real programs, no user would run kernel
programs, for they exist solely to evaluate performance. Kernels are best used to
isolate performance of individual features of a machine to explain the reasons for
differences in performance of real programs.
3. Toy benchmarks—Toy benchmarks are typically between 10 and 100 lines of
code and produce a result the user already knows before running the toy program.
Programs like Sieve of Eratosthenes, Puzzle, and Quicksort are popular because
they are small, easy to type, and run on almost any computer. The best use of such
programs is beginning programming assignments.
4. Synthetic benchmarks—Similar in philosophy to kernels, synthetic bench-
marks try to match the average frequency of operations and operands of a large set
of programs. Whetstone and Dhrystone are the most popular synthetic benchmarks.
A description of these benchmarks and some of their flaws appears in section 1.8
on page 44. No user runs synthetic benchmarks, because they don’t compute any-
thing a user could want. Synthetic benchmarks are, in fact, even further removed
from reality because kernel code is extracted from real programs, while synthetic
code is created artificially to match an average execution profile. Synthetic bench-
marks are not even pieces of real programs, while kernels might be.
Benchmark Suites
Recently, it has become popular to put together collections of benchmarks to try
to measure the performance of processors with a variety of applications. Of
course, such suites are only as good as the constituent individual benchmarks.
Nonetheless, a key advantage of such suites is that the weakness of any one
benchmark is lessened by the presence of the other benchmarks. This is especial-
ly true if the methods used for summarizing the performance of the benchmark
suite reflect the time to run the entire suite, as opposed to rewarding performance
increases on programs that may be defeated by targeted optimizations. In the re-
mainder of this section, we discuss the strengths and weaknesses of different
methods for summarizing performance.
Benchmark suites are made of collections of programs, some of which may be
kernels, but many of which are typically real programs. Figure 1.9 describes the
programs in the popular SPEC92 benchmark suite used to characterize perfor-
mance in the workstation and server markets. The programs in SPEC92 vary from
collections of kernels (nasa7) to small program fragments (tomcatv, ora, alvinn,
swm256) to applications of varying size (spice2g6, gcc, compress). We will see
data on many of these programs throughout this text. In the next subsection, we
show how a SPEC92 report describes the machine, compiler, and OS configura-
tion, while in section 1.8 we describe some of the pitfalls that have occurred in
attempting to develop the benchmark suite and to prevent the benchmark circum-
vention that makes the results not useful for comparing performance among
machines.
Hardware
  Model number: Powerstation 590
  CPU: 66.67 MHz POWER2
  FPU: Integrated
  Number of CPUs: 1
  Primary cache: 32KB I + 256KB D, off chip
  Secondary cache: None
  Other cache: None
  Memory: 128 MB
  Disk subsystem: 2 x 2.0 GB
  Other hardware: None
Software
  O/S and version: AIX version 3.2.5
  Compilers and version: C SET++ for AIX C/C++ version 2.1; XL FORTRAN/6000 version 3.1
  Other software: See below
  File system type: AIX/JFS
  System state: Single user
SPECbase_fp92 tuning parameters/notes/summary of changes:
  FORTRAN flags: -O3 -qarch=pwrx -qhsflt -qnofold -bnso -bI:/lib/syscalls.exp
  C flags: -O3 -qarch=pwrx -Q -qtune=pwrx -qhssngl -bnso -bI:/lib/syscalls.exp
FIGURE 1.10 The machine, software, and baseline tuning parameters for the SPECfp92 report on an IBM RS/6000
Powerstation 590. SPECfp92 means that this is the report for the floating-point (FP) benchmarks in the 1992 release (the
earlier release was renamed SPEC89). The top part of the table describes the hardware and software. The bottom describes
the compiler and options used for the baseline measurements, which must use one compiler and one set of flags for all the
benchmarks in the same language. The tuning parameters and flags for the tuned SPEC92 performance are given in Figure
1.18 on page 49. Data from SPEC [1994].
We would like to think that if we could just agree on the programs, the experi-
mental environments, and the definition of faster, then misunderstandings would
be avoided, leaving the networks free for scholarly discourse. Unfortunately,
that’s not the reality. Once we agree on the basics, battles are then fought over
what is the fair way to summarize relative performance of a collection of pro-
grams. For example, two articles on summarizing performance in the same jour-
nal took opposing points of view. Figure 1.11, taken from one of the articles, is an
example of the confusion that can arise.