
LEC 2

The document discusses performance issues in computer systems, emphasizing the dramatic increase in computing power and the need for efficient design to handle demanding applications. It outlines various strategies to improve performance, including increasing hardware speed, optimizing cache, and utilizing multiprocessors, while also introducing Amdahl's Law to explain the limitations of performance gains from optimizing single components. Additionally, it highlights the importance of benchmarks for evaluating system performance and the role of SPEC in providing standardized testing for computer systems.

Uploaded by mohamedhuawei010
Copyright © All Rights Reserved

+

Performance Issues
+
Designing for Performance
• The cost of computer systems continues to drop dramatically, while the performance and capacity of those systems continue to rise equally dramatically
• Today’s laptops have the computing power of an IBM mainframe from 10 or 15 years ago
• Processors are so inexpensive that we now have microprocessors we throw away
• Desktop applications that require the great power of today’s microprocessor-based systems include:
  – Image processing
  – Three-dimensional rendering
  – Speech recognition
  – Videoconferencing
  – Multimedia authoring
  – Voice and video annotation of files
  – Simulation modeling
• Businesses are relying on increasingly powerful servers to handle transaction and database processing and to support massive client/server networks that have replaced the huge mainframe computer centers of yesteryear
• Cloud service providers use massive high-performance banks of servers to satisfy high-volume, high-transaction-rate applications for a broad spectrum of clients
+

• Adjust the organization and architecture to compensate for the mismatch among the capabilities of the various components
• Architectural examples include:
  – Increase the number of bits that are retrieved at one time by making DRAMs “wider” rather than “deeper” and by using wide bus data paths
  – Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip
  – Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory
  – Increase the interconnect bandwidth between processors and memory by using higher speed buses and a hierarchy of buses to buffer and structure data flow
+

• Increase hardware speed of processor
  – Fundamentally due to shrinking logic gate size
  – More gates, packed more tightly, increasing clock rate
  – Propagation time for signals reduced
• Increase size and speed of caches
  – Dedicating part of the processor chip to cache
  – Cache access times drop significantly
• Change processor organization and architecture
  – Increase effective speed of instruction execution
  – Parallelism
+

• To improve performance, you can:
  – Decrease the CPI (clock cycles per instruction) by using new hardware
  – Decrease the clock cycle time (i.e., increase the clock rate) by reducing propagation delays or by using pipelining
  – Decrease the number of required instructions by improving the ISA or the compiler

2/19/2025
+
Relative Performance
+
Measuring Execution Time
+
Execution Time and CPU Clocking
+
CPU Time
+
EXAMPLE

• Our favorite program runs in 10 seconds on computer A, which has a 2 GHz clock. We are trying to help a computer designer build a computer, B, which will run this program in 6 seconds. The designer has determined that a substantial increase in the clock rate is possible, but this increase will affect the rest of the CPU design, causing computer B to require 1.2 times as many clock cycles as computer A for this program. What clock rate should we tell the designer to target?
+
Answer

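Clock cycles on A = 10 s × 2 GHz = 20 × 10^9 cycles
Clock cycles on B = 1.2 × 20 × 10^9 = 24 × 10^9 cycles
Clock rate of B = 24 × 10^9 cycles / 6 s = 4 GHz
To run the program in 6 seconds, B must have twice the clock rate of A (4 GHz versus 2 GHz).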
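The arithmetic in this answer can be captured in a short Python function. This is a minimal sketch that assumes CPU time = clock cycles ÷ clock rate. The name `required_clock_rate` and its parameters are hypothetical and were introduced only for illustration.

```python
def required_clock_rate(time_a_s, rate_a_hz, cycle_ratio, target_time_s):
    """Clock rate (Hz) that computer B needs to finish the program in target_time_s.

    Assumes CPU time = clock cycles / clock rate, and that B needs
    cycle_ratio times as many clock cycles as A for the same program.
    """
    cycles_a = time_a_s * rate_a_hz        # clock cycles A spends on the program
    cycles_b = cycle_ratio * cycles_a      # B's redesign needs more cycles
    return cycles_b / target_time_s        # clock rate = cycles / time

rate_b = required_clock_rate(10, 2e9, 1.2, 6)
print(f"{rate_b / 1e9:.1f} GHz")  # 4.0 GHz
```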
+
Instruction Count and CPI
Clock Cycles per Instruction (CPI)
• Instructions take different numbers of cycles to execute
– Multiplication takes more time than addition
– Floating point operations take longer than integer ones
– Accessing memory takes more time than accessing registers
• CPI is the average number of clock cycles per instruction

[Figure: instructions I1 through I7 take 14 clock cycles in total, so CPI = 14/7 = 2]

• Important point
Changing the cycle time often changes the number of cycles required
for various instructions (more later)
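Because CPI is an average, it can be computed over an instruction mix as total clock cycles divided by instruction count. Below is a minimal Python sketch. The instruction classes, counts, and per-class cycle costs are hypothetical values, chosen so that 7 instructions take 14 cycles and match the CPI = 14/7 = 2 illustration.

```python
def average_cpi(mix):
    """Average CPI = total clock cycles / total instruction count.

    mix maps an instruction class to (instruction_count, cycles_per_instruction).
    """
    total_cycles = sum(count * cycles for count, cycles in mix.values())
    total_instructions = sum(count for count, _ in mix.values())
    return total_cycles / total_instructions

# Hypothetical mix: 7 instructions needing 14 cycles in total.
example_mix = {
    "integer add": (4, 1),   # 4 cycles
    "multiply": (1, 4),      # 4 cycles
    "memory load": (2, 3),   # 6 cycles
}

print(average_cpi(example_mix))  # 2.0
```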

+
CPI Example
+
CPI in More Detail
Performance Equation
• To execute, a given program will require …
– Some number of machine instructions
– Some number of clock cycles
– Some number of seconds
• We can relate CPU clock cycles to instruction count
CPU cycles = Instruction Count × CPI

• Performance Equation (related to instruction count):
Time = Instruction Count × CPI × cycle time

[Figure: who influences each factor. Instruction Count: program, compiler, architecture (ISA). CPI: compiler (scheduling), organization (uArch), microarchitects. Cycle time: physical design, circuit designers.]
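The performance equation translates directly into code. Below is a minimal Python sketch. `cpu_time` is a hypothetical helper, and the program size, CPI, and clock rate in the usage lines are made-up numbers, not measurements.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = Instruction Count x CPI x cycle time.

    Cycle time is 1 / clock rate, so dividing by the clock rate is equivalent.
    """
    return instruction_count * cpi / clock_rate_hz

# Hypothetical program: 1 billion instructions, CPI of 2, 2 GHz clock.
print(cpu_time(1e9, 2.0, 2e9))  # 1.0
# Lowering CPI to 1.5 (same instructions, same clock) cuts the time proportionally.
print(cpu_time(1e9, 1.5, 2e9))  # 0.75
```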
Understanding Performance Equation
Time = Instruction Count × CPI × cycle time

• How to improve (i.e., decrease) CPU time:
  – Clock rate: hardware technology and organization
  – CPI: organization, ISA, and compiler technology
  – Instruction count: ISA and compiler technology
Many potential performance improvement techniques primarily improve one component, with a small or predictable impact on the other two.
Factor        | I-Count | CPI | Cycle time
--------------|---------|-----|-----------
Program       |    X    |     |
Compiler      |    X    |  X  |
ISA           |    X    |  X  |
Organization  |         |  X  |     X
Technology    |         |     |     X
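The table shows that the compiler affects both instruction count and CPI. A change that raises one of them can therefore still lower the overall time. Below is a minimal Python sketch. The three instruction classes, their CPIs, and the two code sequences are hypothetical numbers used only to illustrate this trade-off.

```python
# Hypothetical CPI for each instruction class on one machine.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

def cycles_and_cpi(counts):
    """Return (instruction count, total cycles, average CPI) for a per-class count dict."""
    ic = sum(counts.values())
    cycles = sum(n * CPI_BY_CLASS[cls] for cls, n in counts.items())
    return ic, cycles, cycles / ic

# Two hypothetical compiler outputs for the same source program.
sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("sequence 1", sequence_1), ("sequence 2", sequence_2)):
    ic, cycles, cpi = cycles_and_cpi(seq)
    print(f"{name}: IC={ic}, cycles={cycles}, CPI={cpi:.2f}")
# sequence 1: IC=5, cycles=10, CPI=2.00
# sequence 2: IC=6, cycles=9, CPI=1.50
```

Sequence 2 executes more instructions but needs fewer cycles. At the same clock rate, it is therefore the faster of the two.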
+
CPI Example
+
Performance Summary
+
Clock Rate and Power Trends
+
Reducing Power
• Suppose a new CPU has
  – 85% of the capacitive load of the old CPU
  – 15% voltage and 15% frequency reduction
• The power wall
  – We can’t reduce voltage further
  – We can’t remove more heat
• How else can we improve performance?
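The opening question on this slide can be answered with the usual dynamic-power relation: power ∝ capacitive load × voltage² × frequency. This is a standard CMOS approximation that the slide does not state explicitly. Scaling all three terms by 0.85 gives 0.85 × 0.85² × 0.85 = 0.85⁴ ≈ 0.52. Below is a minimal Python check. `relative_dynamic_power` is a hypothetical helper name.

```python
def relative_dynamic_power(load_ratio, voltage_ratio, frequency_ratio):
    """P_new / P_old under the approximation P ~ capacitive load x voltage^2 x frequency."""
    return load_ratio * voltage_ratio ** 2 * frequency_ratio

# 85% of the load, with voltage and frequency each reduced by 15% (scaled to 0.85)
ratio = relative_dynamic_power(0.85, 0.85, 0.85)
print(f"new/old power = {ratio:.2f}")  # new/old power = 0.52
```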


+
Multiprocessors
• Multicore microprocessors
  – More than one processor per chip
• Requires explicitly parallel programming
  – Compare with instruction-level parallelism
    – Hardware executes multiple instructions at once
    – Hidden from the programmer
  – Hard to do
    – Programming for performance
    – Load balancing
    – Optimizing communication and synchronization
+
Amdahl's Law
• In computer architecture, Amdahl's law (or Amdahl's argument) is a formula that shows how much faster a task can be completed when more resources are added to the system.
• The law can be stated as:
  "The overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used."
• Amdahl's law is often used in parallel computing to predict the theoretical speedup when using multiple processors.
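In formula form, using the common textbook statement (the symbols are introduced here for illustration): let a fraction f of the original execution time benefit from an improvement of factor N, such as running on N processors. The remaining 1 − f is unchanged, so:

```latex
\text{Speedup} = \frac{T_{\text{old}}}{T_{\text{new}}}
  = \frac{1}{(1 - f) + \dfrac{f}{N}},
\qquad
\lim_{N \to \infty} \text{Speedup} = \frac{1}{1 - f}
```

For example, with f = 0.95 the speedup can never exceed 1 / 0.05 = 20, no matter how many processors are added.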
+
Amdahl's Law
[Figure 2.4 Amdah­l’s Law for Multiprocessors: speedup versus number of processors, plotted for f = 0.95, 0.90, 0.75, and 0.5]
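The curves in the figure can be reproduced numerically from Amdahl's law, speedup = 1 / ((1 − f) + f / N). Below is a minimal Python sketch. The sample processor counts are arbitrary choices, not values read off the original figure.

```python
def amdahl_speedup(f, n):
    """Amdahl's law: speedup when a fraction f of the execution time is spread over n processors."""
    return 1.0 / ((1.0 - f) + f / n)

fractions = (0.95, 0.90, 0.75, 0.5)            # the f values labelled in the figure
processor_counts = (1, 2, 4, 8, 16, 64, 1024)  # arbitrary sample points

print("N     " + "  ".join(f"f={f:<4}" for f in fractions))
for n in processor_counts:
    row = "  ".join(f"{amdahl_speedup(f, n):6.2f}" for f in fractions)
    print(f"{n:<6}" + row)
```

Even with 1,024 processors, the f = 0.95 column stays below its ceiling of 1/(1 − f) = 20.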


+
Example on Amdahl's Law

• Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?
• Solution:
  Suppose we improve multiplication by a factor s:
  25 sec (4 times faster) = 80 sec / s + 20 sec
  s = 80 / (25 – 20) = 80 / 5 = 16
  Improve the speed of multiplication by s = 16 times
• How about making the program 5 times faster?
  20 sec (5 times faster) = 80 sec / s + 20 sec
  80 sec / s would have to be 0 sec, which no finite s can achieve: it is impossible to make the program 5 times faster by speeding up multiplication alone!
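The same rearrangement (solve T_new = T_part / s + T_rest for s) can be written as a small Python function. This is a minimal sketch. `required_part_speedup` is a hypothetical name, and returning `math.inf` is one assumed way to signal that no finite speedup of the part reaches the target.

```python
import math

def required_part_speedup(total_time, part_time, target_speedup):
    """Factor s by which part_time must be sped up for the whole program to run target_speedup times faster.

    Solves target_time = part_time / s + (total_time - part_time) for s.
    Returns math.inf when no finite s reaches the target.
    """
    target_time = total_time / target_speedup
    unaffected_time = total_time - part_time      # time the improvement cannot touch
    time_left_for_part = target_time - unaffected_time
    if time_left_for_part <= 0:
        return math.inf                            # unaffected time alone meets or exceeds the target
    return part_time / time_left_for_part

print(required_part_speedup(100, 80, 4))  # 16.0
print(required_part_speedup(100, 80, 5))  # inf
```

The second call mirrors the slide's conclusion. The 20 seconds spent outside multiplication already equal the 20-second target for a 5× speedup, so no finite s is enough.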

+
Benchmark Principles

• Desirable characteristics of a benchmark program:
  1. It is written in a high-level language, making it portable across different machines
  2. It is representative of a particular kind of programming domain or paradigm, such as systems programming, numerical programming, or commercial programming
  3. It can be measured easily
  4. It has wide distribution
+
Standard Performance Evaluation Corporation (SPEC)
• Benchmark suite
  – A collection of programs, defined in a high-level language
  – Together, the programs attempt to provide a representative test of a computer in a particular application or system programming area
• SPEC
  – An industry consortium
  – Defines and maintains the best known collection of benchmark suites aimed at evaluating computer systems
  – Its performance measurements are widely used for comparison and research purposes
+
SPEC CPU2006
• Best known SPEC benchmark suite
• Industry standard suite for processor-intensive applications
• Appropriate for measuring performance for applications that spend most of their time doing computation rather than I/O
• Consists of 17 floating-point programs written in C, C++, and Fortran and 12 integer programs written in C and C++
• Suite contains over 3 million lines of code
• Fifth generation of processor-intensive suites from SPEC
+
Pitfall: MIPS as a Performance Metric
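The pitfall named in this slide title can be shown with a small calculation. MIPS (millions of instructions per second) equals instruction count ÷ (execution time × 10⁶). It therefore ignores how many instructions each machine needs for the same program. Below is a minimal Python sketch. Machines A and B and all their numbers are hypothetical, picked to show that the machine with the higher MIPS rating can still be slower.

```python
def execution_time(ic, cpi, clock_rate_hz):
    """Seconds = instruction count x CPI / clock rate."""
    return ic * cpi / clock_rate_hz

def mips(ic, exec_time_s):
    """Millions of instructions per second = IC / (time x 10^6)."""
    return ic / (exec_time_s * 1e6)

# Hypothetical machines running the same program (numbers chosen for illustration).
machines = {
    "A": {"ic": 10e9, "cpi": 1.0, "clock": 4e9},  # many cheap instructions
    "B": {"ic": 4e9, "cpi": 2.0, "clock": 4e9},   # fewer, costlier instructions
}
for name, m in machines.items():
    t = execution_time(m["ic"], m["cpi"], m["clock"])
    print(f"{name}: time={t:.2f} s, MIPS={mips(m['ic'], t):.0f}")
# A: time=2.50 s, MIPS=4000
# B: time=2.00 s, MIPS=2000
```

Machine A has twice the MIPS rating, yet B finishes the program sooner. Execution time, not MIPS, is the reliable measure.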
+
Concluding Remarks
