
Computer Organization and Architecture

Designing for Performance


11th Edition

Chapter 2
Performance Issues

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Designing for Performance
• The cost of computer systems continues to drop dramatically, while the performance and capacity
of those systems continue to rise equally dramatically
• Today’s laptops have the computing power of an IBM mainframe from 10 or 15 years ago
• Processors are so inexpensive that we now have microprocessors we throw away
• Desktop applications that require the great power of today’s microprocessor-based systems
include:
– Image processing
– Three-dimensional rendering
– Speech recognition
– Videoconferencing
– Multimedia authoring
– Voice and video annotation of files
– Simulation modeling
• Businesses are relying on increasingly powerful servers to handle transaction and database
processing and to support massive client/server networks that have replaced the huge mainframe
computer centers of yesteryear
• Cloud service providers use massive high-performance banks of servers to satisfy high-volume,
high-transaction-rate applications for a broad spectrum of clients

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Microprocessor Speed
Techniques built into contemporary processors include:
Pipelining
• Processor moves data or instructions into a
conceptual pipe with all stages of the pipe processing
simultaneously

Branch prediction
• Processor looks ahead in the instruction code fetched
from memory and predicts which branches, or groups
of instructions, are likely to be processed next

Superscalar execution
• The ability to issue more than one instruction in every
processor clock cycle. (In effect, multiple parallel
pipelines are used.)

Data flow analysis
• Processor analyzes which instructions are dependent
on each other’s results, or data, to create an
optimized schedule of instructions

Speculative execution
• Using branch prediction and data flow analysis, some
processors speculatively execute instructions ahead
of their actual appearance in the program execution,
holding the results in temporary locations, keeping
execution engines as busy as possible

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Performance Balance
• Adjust the organization and architecture to compensate
for the mismatch among the capabilities of the various
components
• Architectural examples include:
– Increase the number of bits that are retrieved at one time
by making DRAMs “wider” rather than “deeper” and by using
wide bus data paths
– Reduce the frequency of memory access by incorporating
increasingly complex and efficient cache structures between
the processor and main memory
– Change the DRAM interface to make it more efficient by
including a cache or other buffering scheme on the DRAM chip
– Increase the interconnect bandwidth between processors and
memory by using higher speed buses and a hierarchy of buses
to buffer and structure data flow
Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Figure 2.1 Typical I/O Device Data Rates (bar chart of data
rate in bps, on a log scale from 10^1 to 10^11, for Ethernet
modem (max speed), graphics display, Wi-Fi modem (max speed),
hard disk, optical disc, laser printer, scanner, mouse, and
keyboard)

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Improvements in Chip Organization and
Architecture

• Increase hardware speed of processor


– Fundamentally due to shrinking logic gate size
▪ More gates, packed more tightly, increasing clock rate
▪ Propagation time for signals reduced

• Increase size and speed of caches


– Dedicating part of processor chip
▪ Cache access times drop significantly

• Change processor organization and architecture


– Increase effective speed of instruction execution
– Parallelism

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Problems with Clock Speed and Logic
Density
• Power
– Power density increases with density of logic and clock speed
– Dissipating heat
• RC delay
– Speed at which electrons flow limited by resistance and capacitance
of metal wires connecting them
– Delay increases as the RC product increases
– As components on the chip decrease in size, the wire interconnects
become thinner, increasing resistance
– Also, the wires are closer together, increasing capacitance
• Memory latency and throughput
– Memory access speed (latency) and transfer speed (throughput) lag
processor speeds
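To make the RC point concrete, here is a minimal sketch (not from the slides) that models a wire as a resistor in series with a parallel-plate capacitor to its neighbor; the copper resistivity, dielectric constant, and wire dimensions are illustrative assumptions.

```python
# Hedged sketch: first-order scaling of interconnect RC delay with wire geometry.
# Material constants and dimensions below are illustrative assumptions.
rho = 1.7e-8           # resistivity of copper, ohm*m
eps = 3.9 * 8.85e-12   # permittivity of an SiO2-like dielectric, F/m

def rc_delay(length, width, height, spacing):
    """RC product of a wire modeled as a series resistor plus a
    parallel-plate capacitor to its neighboring wire."""
    R = rho * length / (width * height)      # thinner cross-section -> larger R
    C = eps * (length * height) / spacing    # closer neighbor -> larger C
    return R * C

base = rc_delay(1e-3, 100e-9, 200e-9, 100e-9)
scaled = rc_delay(1e-3, 50e-9, 100e-9, 50e-9)   # shrink all cross-section dims by 2x
print(scaled / base)   # RC grows (4x here, driven by the higher resistance)
```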

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Figure 2.2 Processor Trends (log-scale plot, 1970–2010, of
transistors (thousands), frequency (MHz), power (W), and
number of cores)


Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Multicore
• The use of multiple processors on the same chip provides
the potential to increase performance without increasing
the clock rate
• Strategy is to use two simpler processors on the chip
rather than one more complex processor
• With two processors larger caches are justified
• As caches became larger it made performance sense to
create two and then three levels of cache on a chip
Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Many Integrated Core (MIC)
• Leap in performance as well as the challenges in developing
software to exploit such a large number of cores
• The multicore and MIC strategy involves a homogeneous
collection of general purpose processors on a single chip

Graphics Processing Unit (GPU)
• Core designed to perform parallel operations on graphics
data
• Traditionally found on a plug-in graphics card, it is used
to encode and render 2D and 3D graphics as well as process
video
• Used as vector processors for a variety of applications
that require repetitive computations

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Amdahl’s Law

• Gene Amdahl
• Deals with the potential speedup of a program using
multiple processors compared to a single processor
• Illustrates the problems facing industry in the
development of multi-core machines
– Software must be adapted to a highly parallel execution
environment to exploit the power of parallel processing

• Can be generalized to evaluate any design or technical
improvement in a computer system

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Figure 2.3 Illustration of Amdahl’s Law: a program takes time
T on a single processor; a fraction (1 – f) of the work is
inherently serial and a fraction f is parallelizable. On N
processors the parallel portion takes fT/N, so

    Speedup = 1 / ((1 – f) + f/N)
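A minimal sketch of the speedup formula above; the parallel fraction f = 0.95 and the processor counts are chosen only for illustration.

```python
# Amdahl's Law: speedup of a program whose fraction f is parallelizable,
# run on N processors. Matches the formula in Figure 2.3.
def amdahl_speedup(f, n):
    return 1.0 / ((1.0 - f) + f / n)

for n in (1, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
# Speedup is capped at 1/(1 - f) = 20 no matter how many processors are added.
```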

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Figure 2.4 Amdahl’s Law for Multiprocessors (speedup versus
number of processors for f = 0.5, 0.75, 0.90, and 0.95)

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Little’s Law
• Fundamental and simple relation with broad applications
• Can be applied to almost any system that is statistically in
steady state, and in which there is no leakage
• Queuing system
– If server is idle an item is served immediately, otherwise an arriving
item joins a queue
– There can be a single queue for a single server or for multiple servers,
or multiple queues with one being for each of multiple servers
• Average number of items in a queuing system equals the
average rate at which items arrive multiplied by the average
time that an item spends in the system (L = λW; see the
sketch below)
– Relationship requires very few assumptions
– Because of its simplicity and generality it is extremely useful
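A minimal sketch of L = λW, assuming a single-server queue with Poisson arrivals and exponential service times; the arrival and service rates are illustrative.

```python
import random

# Hedged sketch: simulate a single-server FIFO queue long enough to approach
# steady state, estimate the average time in system W, and apply L = lambda*W.
random.seed(1)
arrival_rate = 3.0   # lambda: average arrivals per unit time (illustrative)
service_rate = 5.0   # mu: average completions per unit time (must exceed lambda)

t = 0.0              # arrival time of the current item
server_free_at = 0.0
waits = []           # time each item spends in the system (queue + service)
for _ in range(200_000):
    t += random.expovariate(arrival_rate)   # next arrival
    start = max(t, server_free_at)          # wait if the server is still busy
    server_free_at = start + random.expovariate(service_rate)
    waits.append(server_free_at - t)

W = sum(waits) / len(waits)   # estimated average time in system
L = arrival_rate * W          # Little's Law: average number of items in system
print(f"W = {W:.3f}, L = lambda*W = {L:.3f}")
# Analytic check for this M/M/1 case: W = 1/(mu - lambda) = 0.5, so L = 1.5.
```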

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Figure 2.5

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Table 2.1 Performance Factors and System Attributes

                                Ic    p    m    k    τ
Instruction set architecture    X     X
Compiler technology             X     X    X
Processor implementation              X              X
Cache and memory hierarchy                      X    X
Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Calculating the Mean

• The use of benchmarks to compare systems involves
calculating the mean value of a set of data points related
to execution time
• The three common formulas used for calculating a mean are:
– Arithmetic
– Geometric
– Harmonic
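A minimal sketch of the three formulas; the helper names are mine, and the sample data is set (c) from Figure 2.6.

```python
from math import prod

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    return prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    return len(xs) / sum(1 / x for x in xs)

data = list(range(1, 12))   # data set (c) from Figure 2.6: 1, 2, ..., 11
print(arithmetic_mean(data), geometric_mean(data), harmonic_mean(data))
# AM >= GM >= HM always holds; the gap widens as the spread of the data grows.
```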
Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Figure 2.6 Comparison of Means on Various Data Sets (each set
has a maximum data point value of 11). Each panel plots the
median (MD), arithmetic mean (AM), geometric mean (GM), and
harmonic mean (HM) on a 0–11 scale for one data set:

(a) Constant (11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11)
(b) Clustered around a central value (3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 11)
(c) Uniform distribution (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
(d) Large-number bias (1, 4, 4, 7, 7, 9, 9, 10, 10, 11, 11)
(e) Small-number bias (1, 1, 2, 2, 3, 3, 5, 5, 8, 8, 11)
(f) Upper outlier (11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
(g) Lower outlier (1, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11)
Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Arithmetic Mean
 An Arithmetic Mean (AM) is an appropriate measure
if the sum of all the measurements is a meaningful
and interesting value

 The AM is a good candidate for comparing the execution


time performance of several systems

For example, suppose we were interested in using a system


for large-scale simulation studies and wanted to evaluate several
alternative products. On each system we could run the simulation
multiple times with different input values for each run, and then take
the average execution time across all runs. The use of
multiple runs with different inputs should ensure that the results are
not heavily biased by some unusual feature of a given input set. The
AM of all the runs is a good measure of the system’s performance on
simulations, and a good number to use for system comparison.

 The AM used for a time-based variable, such as program execution time, has the
important property that it is directly proportional to the total time
 If the total time doubles, the mean value doubles

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Table 2.2 A Comparison of Arithmetic and Harmonic Means for Rates

                            Computer A   Computer B   Computer C   Computer A     Computer B     Computer C
                            time (secs)  time (secs)  time (secs)  rate (MFLOPS)  rate (MFLOPS)  rate (MFLOPS)
Program 1 (10^8 FP ops)        2.0          1.0          0.75         50             100            133.33
Program 2 (10^8 FP ops)        0.75         2.0          4.0          133.33         50             25
Total execution time           2.75         3.0          4.75         –              –              –
Arithmetic mean of times       1.38         1.5          2.38         –              –              –
Inverse of total execution
  time (1/sec)                 0.36         0.33         0.21         –              –              –
Arithmetic mean of rates       –            –            –            91.67          75.00          79.17
Harmonic mean of rates         –            –            –            72.72          66.67          42.11
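A minimal sketch that recomputes the rate rows of Table 2.2 from the execution times, showing why the harmonic mean of MFLOPS rates (unlike the arithmetic mean) ranks the machines the same way total execution time does.

```python
# Each program executes 10^8 floating-point operations (Table 2.2).
ops = 100e6
times = {                 # execution time in seconds for Programs 1 and 2
    "A": (2.0, 0.75),
    "B": (1.0, 2.0),
    "C": (0.75, 4.0),
}
for name, (t1, t2) in times.items():
    rates = [ops / t1 / 1e6, ops / t2 / 1e6]        # MFLOPS for each program
    am = sum(rates) / len(rates)
    hm = len(rates) / sum(1 / r for r in rates)
    print(name, round(am, 2), round(hm, 2), t1 + t2)
# The HM column ranks A > B > C, matching total execution time (2.75 < 3.0 < 4.75);
# the AM column misleadingly ranks C above B.
```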

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Table 2.3 A Comparison of Arithmetic and Geometric Means for Normalized Results

(a) Results normalized to Computer A

                                     Computer A time  Computer B time  Computer C time
Program 1                            2.0 (1.0)        1.0 (0.5)        0.75 (0.38)
Program 2                            0.75 (1.0)       2.0 (2.67)       4.0 (5.33)
Total execution time                 2.75             3.0              4.75
Arithmetic mean of normalized times  1.00             1.58             2.85
Geometric mean of normalized times   1.00             1.15             1.41

(b) Results normalized to Computer B

                                     Computer A time  Computer B time  Computer C time
Program 1                            2.0 (2.0)        1.0 (1.0)        0.75 (0.75)
Program 2                            0.75 (0.38)      2.0 (1.0)        4.0 (2.0)
Total execution time                 2.75             3.0              4.75
Arithmetic mean of normalized times  1.19             1.00             1.38
Geometric mean of normalized times   0.87             1.00             1.22
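A minimal sketch using the Table 2.3 times, showing that the geometric mean of normalized times preserves the same relative ranking whichever computer is chosen as the reference, while the arithmetic mean does not; the helper function is mine.

```python
from math import prod

times = {"A": [2.0, 0.75], "B": [1.0, 2.0], "C": [0.75, 4.0]}  # Table 2.3 seconds

def means_normalized_to(ref):
    out = {}
    for name, ts in times.items():
        norm = [t / r for t, r in zip(ts, times[ref])]   # normalize each program
        am = sum(norm) / len(norm)
        gm = prod(norm) ** (1 / len(norm))
        out[name] = (round(am, 2), round(gm, 2))
    return out

print(means_normalized_to("A"))   # GM: A=1.00, B=1.15, C=1.41
print(means_normalized_to("B"))   # GM: A=0.87, B=1.00, C=1.22 (same B/A and C/A ratios)
```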

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Table 2.4 Another Comparison of Arithmetic and Geometric Means for Normalized Results

(a) Results normalized to Computer A

                                     Computer A time  Computer B time  Computer C time
Program 1                            2.0 (1.0)        1.0 (0.5)        0.20 (0.1)
Program 2                            0.4 (1.0)        2.0 (5.0)        4.0 (10.0)
Total execution time                 2.4              3.00             4.2
Arithmetic mean of normalized times  1.00             2.75             5.05
Geometric mean of normalized times   1.00             1.58             1.00

(b) Results normalized to Computer B

                                     Computer A time  Computer B time  Computer C time
Program 1                            2.0 (2.0)        1.0 (1.0)        0.20 (0.2)
Program 2                            0.4 (0.2)        2.0 (1.0)        4.0 (2.0)
Total execution time                 2.4              3.0              4.2
Arithmetic mean of normalized times  1.10             1.00             1.10
Geometric mean of normalized times   0.63             1.00             0.63

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Benchmark Principles

• Desirable characteristics of a benchmark


program:

1. It is written in a high-level language, making it portable


across different machines
2. It is representative of a particular kind of programming
domain or paradigm, such as systems programming,
numerical programming, or commercial programming
3. It can be measured easily
4. It has wide distribution

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
System Performance Evaluation
Corporation (SPEC)

• Benchmark suite
– A collection of programs, defined in a high-level language
– Together attempt to provide a representative test of a computer in a
particular application or system programming area

• SPEC
– An industry consortium
– Defines and maintains the best known collection of benchmark suites
aimed at evaluating computer systems
– Performance measurements are widely used for comparison and
research purposes

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
SPEC CPU2017
• Best known SPEC benchmark suite
• Industry standard suite for processor intensive applications
• Appropriate for measuring performance for applications that
spend most of their time doing computation rather than I/O
• Consists of 20 integer benchmarks and 23 floating-point
benchmarks written in C, C++, and Fortran
• For all of the integer benchmarks and most of the floating-
point benchmarks, there are both rate and speed benchmark
programs
• The suite contains over 11 million lines of code

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Table 2.5 (A) SPEC CPU2017 Integer Benchmarks

Rate             Speed             Language  Kloc  Application Area
500.perlbench_r  600.perlbench_s   C         363   Perl interpreter
502.gcc_r        602.gcc_s         C         1304  GNU C compiler
505.mcf_r        605.mcf_s         C         3     Route planning
520.omnetpp_r    620.omnetpp_s     C++       134   Discrete event simulation – computer network
523.xalancbmk_r  623.xalancbmk_s   C++       520   XML to HTML conversion via XSLT
525.x264_r       625.x264_s        C         96    Video compression
531.deepsjeng_r  631.deepsjeng_s   C++       10    AI: alpha-beta tree search (chess)
541.leela_r      641.leela_s       C++       21    AI: Monte Carlo tree search (Go)
548.exchange2_r  648.exchange2_s   Fortran   1     AI: recursive solution generator (Sudoku)
557.xz_r         657.xz_s          C         33    General data compression

Kloc = line count (including comments/whitespace) for source files used in a build, divided by 1000
(Table can be found on page 61 in the textbook.)
Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Table 2.5 (B) SPEC CPU2017 Floating-Point Benchmarks

Rate             Speed             Language         Kloc  Application Area
503.bwaves_r     603.bwaves_s      Fortran          1     Explosion modeling
507.cactuBSSN_r  607.cactuBSSN_s   C++, C, Fortran  257   Physics; relativity
508.namd_r                         C++, C           8     Molecular dynamics
510.parest_r                       C++              427   Biomedical imaging; optical tomography with finite elements
511.povray_r                       C++              170   Ray tracing
519.lbm_r        619.lbm_s         C                1     Fluid dynamics
521.wrf_r        621.wrf_s         Fortran, C       991   Weather forecasting
526.blender_r                      C++              1577  3D rendering and animation
527.cam4_r       627.cam4_s        Fortran, C       407   Atmosphere modeling
                 628.pop2_s        Fortran, C       338   Wide-scale ocean modeling (climate level)
538.imagick_r    638.imagick_s     C                259   Image manipulation
544.nab_r        644.nab_s         C                24    Molecular dynamics
549.fotonik3d_r  649.fotonik3d_s   Fortran          14    Computational electromagnetics
554.roms_r       654.roms_s        Fortran          210   Regional ocean modeling

Kloc = line count (including comments/whitespace) for source files used in a build, divided by 1000
(Table can be found on page 61 in the textbook.)
Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Table 2.6 SPEC CPU2017 Integer Benchmarks for HP Integrity Superdome X

(a) Rate Result (768 copies)

                  Base                 Peak
Benchmark         Seconds   Rate       Seconds   Rate
500.perlbench_r   1141      1070       933       1310
502.gcc_r         1303      835        1276      852
505.mcf_r         1433      866        1378      901
520.omnetpp_r     1664      606        1634      617
523.xalancbmk_r   722       1120       713       1140
525.x264_r        655       2053       661       2030
531.deepsjeng_r   604       1460       597       1470
541.leela_r       892       1410       896       1420
548.exchange2_r   833       2420       770       2610
557.xz_r          870       953        863       961

© 2018 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
(Table can be found on page 64 in the textbook.)
Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Table 2.6 SPEC CPU2017 Integer Benchmarks for HP Integrity Superdome X

(b) Speed Result (384 threads)

                  Base                 Peak
Benchmark         Seconds   Ratio      Seconds   Ratio
600.perlbench_s   358       4.96       295       6.01
602.gcc_s         546       7.29       535       7.45
605.mcf_s         866       5.45       700       6.75
620.omnetpp_s     276       5.90       247       6.61
623.xalancbmk_s   188       7.52       179       7.91
625.x264_s        283       6.23       271       6.51
631.deepsjeng_s   407       3.52       343       4.18
641.leela_s       469       3.63       439       3.88
648.exchange2_s   329       8.93       299       9.82
657.xz_s          2164      2.86       2119      2.92

(Table can be found on page 64 in the textbook.)


Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Terms Used in SPEC Documentation
• Benchmark
– A program written in a high-level language that can be
compiled and executed on any computer that implements the
compiler
• System under test
– This is the system to be evaluated
• Reference machine
– This is a system used by SPEC to establish a baseline
performance for all benchmarks
▪ Each benchmark is run and measured on this machine to
establish a reference time for that benchmark
• Base metric
– These are required for all reported results and have
strict guidelines for compilation
• Peak metric
– This enables users to attempt to optimize system
performance by optimizing the compiler output
• Speed metric
– This is simply a measurement of the time it takes to
execute a compiled benchmark
▪ Used for comparing the ability of a computer to complete
single tasks
• Rate metric
– This is a measurement of how many tasks a computer can
accomplish in a certain amount of time
▪ This is called a throughput, capacity, or rate measure
▪ Allows the system under test to execute simultaneous tasks
to take advantage of multiple processors
Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Figure 2.7 SPEC Evaluation Flowchart

For each benchmark program:
1. Run the program three times
2. Select the median value of the three execution times
3. Compute Ratio(prog) = Tref(prog) / TSUT(prog)
When no programs remain, compute the geometric mean of all
ratios as the overall metric.
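A minimal sketch of this flow, using the 600.perlbench_s, 602.gcc_s, and 605.mcf_s base times from Tables 2.6(b) and 2.7; the three identical per-run times are an assumption, since only the selected (median) time is reported.

```python
from math import prod

# SPEC-style metric: for each benchmark, take the median of three runs on the
# system under test (SUT), form ratio = T_ref / T_SUT, then take the geometric
# mean of all ratios. Reference and SUT times below come from Tables 2.7 and 2.6(b).
runs = {
    "600.perlbench_s": {"ref": 1774, "sut_runs": [358, 358, 358]},  # per-run times assumed identical
    "602.gcc_s":       {"ref": 3981, "sut_runs": [546, 546, 546]},
    "605.mcf_s":       {"ref": 4721, "sut_runs": [866, 866, 866]},
}

ratios = []
for name, d in runs.items():
    t_sut = sorted(d["sut_runs"])[1]          # median of the three runs
    ratio = d["ref"] / t_sut
    ratios.append(ratio)
    print(f"{name}: ratio = {ratio:.2f}")     # 4.96, 7.29, 5.45 as in Table 2.6(b)

metric = prod(ratios) ** (1 / len(ratios))    # geometric mean over the suite
print(f"overall metric (these 3 benchmarks only) = {metric:.2f}")
```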


Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Table 2.7 SPECspeed 2017_int_base Benchmark Results for Reference Machine (1 thread)

Benchmark         Seconds   Energy (kJ)   Average Power (W)   Maximum Power (W)
600.perlbench_s   1774      1920          1080                1090
602.gcc_s         3981      4330          1090                1110
605.mcf_s         4721      5150          1090                1120
620.omnetpp_s     1630      1770          1090                1090
623.xalancbmk_s   1417      1540          1090                1090
625.x264_s        1764      1920          1090                1100
631.deepsjeng_s   1432      1560          1090                1130
641.leela_s       1706      1850          1090                1090
648.exchange2_s   2939      3200          1080                1090
657.xz_s          6182      6730          1090                1140

(Table can be found on page 66 in the textbook.)
Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Chapter 2 Summary: Performance Issues

• Designing for performance
– Microprocessor speed
– Performance balance
– Improvements in chip organization and architecture
• Multicore
• MICs
• GPGPUs
• Amdahl’s Law
• Little’s Law
• Basic measures of computer performance
– Clock speed
– Instruction execution rate
• Calculating the mean
– Arithmetic mean
– Harmonic mean
– Geometric mean
• Benchmark principles
• SPEC benchmarks

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Copyright

This work is protected by United States copyright laws and is provided solely
for the use of instructors in teaching their courses and assessing student
learning. Dissemination or sale of any part of this work (including on the
World Wide Web) will destroy the integrity of the work and is not permitted.
The work and materials from it should never be made available to students
except by instructors using the accompanying text in their classes. All
recipients of this work are expected to abide by these restrictions and to
honor the intended pedagogical purposes and the needs of other instructors
who rely on these materials.

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
