
Computer Architecture

Dr. Ashok Kumar Turuk

1
 Recommended Book:

Computer Architecture: A Quantitative Approach (Fifth Edition)
by John L. Hennessy and David A. Patterson
2
Performance
 What do we mean when we say one computer
is faster than another?

 Response Time
 The time between the start and completion of an event
(also known as execution time)

 Throughput
 The total amount of work done in a given time

3
Performance
 We relate the performance of two different
computers (say X and Y) when comparing design
alternatives

 The phrase “X is faster than Y” means that
the execution time or response time of a given task is
lower on X than on Y

4
Performance
"X is n times faster than Y" means

ExTime(Y) Performance(X)
--------- = --------------- =n
ExTime(X) Performance(Y)

5
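As a quick numeric check of this relation, here is a minimal Python sketch (the function name and the numbers in the example are illustrative, not from the slides):

def n_times_faster(extime_x, extime_y):
    """Return n such that "X is n times faster than Y".

    n = ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y),
    taking performance as 1 / execution time.
    """
    return extime_y / extime_x

# Example: X finishes a task in 10 s, Y in 15 s.
print(n_times_faster(10.0, 15.0))  # 1.5, i.e. X is 1.5 times faster than Y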
Performance
 The “throughput of X is 1.3 times higher than Y”
 The number of tasks completed per unit time on X is 1.3 times
the number completed on Y

6
Computer Performance:
 Response Time (latency)

 How long does it take for my job to run?
 How long does it take to execute a job?
 How long must I wait for the database query?

7
Computer Performance:
 Throughput

 How many jobs can the machine run at once?
 What is the average execution rate?
 How much work is getting done?

8
Computer Performance:
 If we upgrade a machine with a new processor, what do we
increase?

 If we add a new machine to the lab, what do we increase?

9
Execution Time

 Execution time can be defined in different ways,
depending on what we count
 Elapsed Time (wall-clock time, response time)
 Latency to complete a task
 Counts everything (disk and memory accesses, I/O, etc.)
 A useful number, but often not good for comparison
purposes

10
Execution Time

 CPU time
 Doesn't count I/O or time spent running other programs
 Can be broken up into system time and user time

 Our focus: User CPU time
 Time spent executing the lines of code that are "in" our
program

11
Quantitative Principles
 Guidelines and principles that are useful in the
design and analysis of computers

 Take advantage of parallelism: an important method to
improve performance
 System level
 To improve the throughput of a typical server, use
multiple processors and multiple disks
 Individual processor
 Parallelism among instructions is critical to achieving high
performance
 Pipelining

12
Quantitative Principles

 Digital Design
 A set-associative cache uses multiple banks of memory that
are searched in parallel to find the desired data item

 Carry-lookahead adder

13
Quantitative Principles
 Principle of Locality
 Programs tend to reuse data and instructions they have used
recently
 A program spends 90% of its execution time in only 10% of its
code
 Temporal Locality
 Recently accessed items are likely to be accessed in the near
future
 Spatial Locality
 Items whose addresses are near one another tend to be
referenced close together in time (see the sketch below)

14
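One way to see spatial locality from software is traversal order: walking a matrix row by row touches elements in the order they were laid out, while walking column by column jumps between rows. A small, hedged Python experiment (the size is illustrative, and the measured gap varies by machine and runtime):

import time

N = 2000
matrix = [[1.0] * N for _ in range(N)]  # each row is a contiguous list

def sum_row_major(m):
    # Inner loop walks along one row: consecutive accesses are neighbours.
    return sum(m[i][j] for i in range(N) for j in range(N))

def sum_col_major(m):
    # Inner loop jumps from row to row: consecutive accesses are far apart.
    return sum(m[i][j] for j in range(N) for i in range(N))

for f in (sum_row_major, sum_col_major):
    t0 = time.perf_counter()
    f(matrix)
    print(f.__name__, time.perf_counter() - t0)
# Row-major traversal is typically faster, reflecting spatial locality.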
Quantitative Principles
 Focus on the Common Case
 Make the common case fast and the other parts correct

 In making a design trade-off, favor the frequent case over
the infrequent case

 In a CPU where overflow is rare, optimize the normal add
path (typically more than 90% of operations) rather than
giving design priority to the rare overflow path

15
Amdahl's Law
• Amdahl's Law states that the performance
improvement to be gained from using some
faster mode of execution is limited by the
fraction of time the faster mode can be used.

• Amdahl's Law defines the speedup that can be
gained by using a particular feature

16
Amdahl's Law

[Figure: Amdahl's Law equations; see slide 22]

17

18
Speedup
• Speedup tells us how much faster a task will run
using the computer with the enhancement, as opposed
to the original computer.
• Amdahl's Law gives us a quick way to find the speedup
from some enhancement.

• Speedup depends on two factors
– Fraction_enhanced
– Speedup_enhanced
19
Speedup
Fraction_enhanced
 The fraction of the computation time in the original computer that
can be converted to take advantage of the enhancement.
Fraction_enhanced is always less than 1.

Speedup_enhanced
 The improvement gained by the enhanced execution mode; that is,
how much faster the task would run if the enhanced mode were
used for the entire program. Speedup_enhanced is always greater
than 1. It is the ratio of the time in the original mode to the time
in the enhanced mode.

20
Speedup

• The execution time using the original computer with
the enhanced mode will be the time spent using the
unenhanced portion of the computer plus the time
spent using the enhancement

21
Amdahl’s Law

ExTime_new = ExTime_old x [(1 - Fraction_enhanced)
                           + Fraction_enhanced / Speedup_enhanced]

Speedup_overall = ExTime_old / ExTime_new
                = 1 / [(1 - Fraction_enhanced)
                       + Fraction_enhanced / Speedup_enhanced]

22
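The formula on slide 22 translates directly into code; a minimal Python sketch (the function and parameter names are my own, chosen to mirror the slide's notation):

def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup from Amdahl's Law.

    fraction_enhanced: fraction of the original execution time that
        can use the enhancement (less than 1).
    speedup_enhanced: how much faster the enhanced mode is (greater than 1).
    """
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)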
Amdahl’s Law
• The maximum speedup that can be obtained by
using a faster mode of execution is limited by the
fraction of the time the faster mode can be used.
Examples include:
– Improving your Web-surfing speed
– Parallelization of computer programs
– Performance of symmetric-multiprocessor systems

23
Amdahl’s Law

• Floating point instructions improved to run
2X; but only 10% of actual instructions are FP

ExTime_new =

Speedup_overall =

24
Amdahl’s Law
• Floating point instructions improved to run
2X; but only 10% of actual instructions are FP

ExTime_new = ExTime_old x (0.9 + 0.1/2) = 0.95 x ExTime_old

Speedup_overall = 1 / 0.95 = 1.053

• What’s the overall performance gain if we
could improve the non-FP instructions by 2x?
(See the sketch below.)

25
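Using the amdahl_speedup sketch from slide 22, both cases on this slide can be checked; the second call answers the closing question and shows why the common case matters more:

print(amdahl_speedup(0.10, 2.0))  # 1 / (0.9 + 0.05)  = 1.053 (FP improved)
print(amdahl_speedup(0.90, 2.0))  # 1 / (0.1 + 0.45) ~= 1.818 (non-FP improved)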
Amdahl’s Law
• A program spends 90% of its time on computation and the
other 10% on disk accesses. If we use a 100X faster CPU,
how much speedup do we gain?

26
Amdahl’s Law
• A program spends 90% of its time on computation and the
other 10% on disk accesses. If we use a 100X faster CPU,
how much speedup do we gain?

ExTime_new = ExTime_old x (0.1 + 0.9/100) = 0.109 x ExTime_old

Speedup_overall = 1 / 0.109 = 9.17

27
Metrics of Performance

Level of the stack            Metric

Application                   Answers per month
                              Operations per second
Programming Language
Compiler                      (millions of) instructions per second: MIPS
ISA                           (millions of) FP operations per second: MFLOP/s
Datapath
Control                       Megabytes per second
Function Units
Transistors, Wires, Pins      Cycles per second (clock rate)

28
Performance
 Performance is determined by execution time
 Do any of the other variables equal performance?
 # of cycles to execute program?
 # of instructions in program?
 # of cycles per second?
 average # of cycles per instruction?
 average # of instructions per second?

 Common pitfall: thinking one of the variables is
indicative of performance when it really isn’t.

29
CPU Performance Equation

[Figure: CPU performance equation; see slide 32]

30

31
CPU Performance Equation
 CPU time = CPU clock cycles for a program x Clock
Cycle Time

 CPU time = Instruction Count x Cycles Per Instruction
x Clock Cycle Time

CPU time = Seconds/Program
         = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)

32
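The three factors multiply out directly; a small Python sketch of the equation (the numbers in the example are illustrative only):

def cpu_time(instruction_count, cpi, clock_rate_hz):
    # CPU time [s] = (Instructions/Program) x (Cycles/Instruction)
    #                x (Seconds/Cycle), with Seconds/Cycle = 1 / clock rate.
    return instruction_count * cpi / clock_rate_hz

# Example: 2 billion instructions at CPI 1.5 on a 2 GHz clock.
print(cpu_time(2e9, 1.5, 2e9))  # 1.5 seconds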
CPU Performance Equation
 CPU time is equally dependent on these three
characteristics:
 A 10% improvement in any one of them leads to a 10%
improvement in CPU time.
• Unfortunately, it is difficult to change one parameter
in complete isolation from the others, because the basic
technologies involved in changing each
characteristic are interdependent:
 Clock Cycle Time: Hardware technology and organization
 CPI: Organization and instruction set architecture
 Instruction Count: Instruction set architecture and compiler
technology
33
CPU Performance Equation

 CPU time = CPU clock cycles for a program x clock cycle time

 CPU time = CPU clock cycles for a program / clock rate

CPU time = Seconds/Program
         = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)

34
Cycles Per Instruction
 Average Cycles per Instruction
 CPI = (CPU clock cycles for a program) /
Instruction Count
 CPU Time = Instruction Count x Cycles per
Instruction x clock cycle time

CPU time = IC x CPI / clock rate

35
CPU Performance Equation
 CPU time = Instruction Count x Cycles per Instruction
x clock cycle time

CPU time = Seconds/Program
         = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)

 CPU performance is dependent on three characteristics: clock
cycle time (rate), clock cycles per instruction, and instruction count

36
CPU Performance Equation

Components of performance            Units of measure
CPU execution time for a program     Seconds for the program
Instruction count                    Instructions executed for the program
Clock cycles per instruction (CPI)   Average number of clock cycles per instruction
Clock cycle time                     Seconds per clock cycle

37
Understanding Program Performance
 Performance of a program depends on

 The algorithm,
 The language,
 The compiler,
 The architecture, and
 The actual hardware

38
H/W or S/W component: Algorithm
Affects what? IC, possibly CPI
How?
1. The algorithm determines the number of source program
   statements executed and hence the number of processor
   instructions executed.

2. The algorithm may also affect the CPI by favoring slower or
   faster instructions; for example, if the algorithm uses more FP
   operations, it will tend to have a higher CPI.

39
H/W or S/W component: Programming Language
Affects what? IC, CPI
How?
1. The programming language affects the IC, since the
   statements in the language are translated to processor
   instructions, which determines the IC.

2. The language may also affect the CPI because of its features;
   for example, a language with heavy support for data abstraction
   (e.g., Java) will require indirect calls, which use higher-CPI
   instructions.

40
H/W or S/W component: Compiler
Affects what? IC, CPI
How?
1. The efficiency of the compiler affects both the instruction
   count and the average cycles per instruction, since the
   compiler determines the translation of the source language
   statements into computer instructions.

2. The compiler's role can be very complex and can affect the
   CPI in complex ways.

41
H/W or S/W component: Instruction set architecture
Affects what? IC, CPI, clock rate
How?
1. The instruction set architecture affects all three aspects of
   CPU performance, since it affects the instructions needed for
   a function, the cost in cycles of each instruction, and the
   overall clock rate of the processor.

42
CPU Performance Equation
CPU time = Seconds/Program
         = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)

              Inst Count   CPI    Clock Rate
Program           X
Compiler          X          X
Inst. Set.        X          X         X
Organization                 X         X
Technology                             X
43
Cycles Per Instruction

 CPI = (CPU clock cycles for a program) / Instruction Count

CPU clock cycles = Σ (i = 1 to n) CPI_i x IC_i

where IC_i represents the number of times instruction i is executed
in a program, and CPI_i the average number of clock cycles per
instruction for instruction i.

44
Cycles Per Instruction
CPU time = Clock Cycle Time x Σ (i = 1 to n) CPI_i x IC_i

CPI = [Σ (i = 1 to n) CPI_i x IC_i] / Instruction Count

45
Cycles Per Instruction

CPI = [Σ (i = 1 to n) CPI_i x IC_i] / Instruction Count

    = Σ (i = 1 to n) (IC_i / Instruction Count) x CPI_i

46
Example: Calculating CPI
Base Machine (Reg / Reg)

Op       Freq    Cycles   CPI(i)   (% Time)
ALU      50%     1        0.5      (33%)
Load     20%     2        0.4      (27%)
Store    10%     2        0.2      (13%)
Branch   20%     2        0.4      (27%)
                 Total    1.5

(Typical instruction mix)

47
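The table is exactly the weighted sum from slide 46; a short Python check of its numbers:

# (frequency, cycles) for each instruction class, from the table above.
mix = {"ALU": (0.50, 1), "Load": (0.20, 2), "Store": (0.10, 2), "Branch": (0.20, 2)}

cpi = sum(freq * cycles for freq, cycles in mix.values())
print(cpi)  # 0.5 + 0.4 + 0.2 + 0.4 = 1.5

# Share of execution time per class: (freq x cycles) / CPI.
for op, (freq, cycles) in mix.items():
    print(op, round(100 * freq * cycles / cpi))  # ALU 33, Load 27, Store 13, Branch 27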
Example
 Suppose we have the following measurements:
 Frequency of FP operations = 25%
 Average CPI of FP operations = 4.0
 Average CPI of other instructions = 1.33
 Frequency of FPSQR = 2%
 CPI of FPSQR = 20

 Assume that the two design alternatives are to
decrease the CPI of FPSQR to 2 or to decrease
the average CPI of all FP operations to 2.5. Compare
these two design alternatives using the processor
performance equation. (A worked sketch follows below.)

48
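A worked sketch of this comparison in Python, following the textbook's treatment (the 2% FPSQR instructions are counted inside the 25% FP mix, and IC and clock cycle time stay fixed):

cpi_original = 0.25 * 4.0 + 0.75 * 1.33   # = 1.9975, about 2.0

# Alternative 1: CPI of FPSQR drops from 20 to 2.
cpi_new_fpsqr = cpi_original - 0.02 * (20 - 2)   # = 1.6375

# Alternative 2: average CPI of all FP operations drops from 4.0 to 2.5.
cpi_new_fp = cpi_original - 0.25 * (4.0 - 2.5)   # = 1.6225

# With IC and clock cycle time unchanged, speedup is the ratio of CPIs.
print(cpi_original / cpi_new_fpsqr)  # ~1.22
print(cpi_original / cpi_new_fp)     # ~1.23 -> improving all FP wins slightly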
More on CPI
 Is CPI a good metric to evaluate CPU
performance?

 Yes: when the clock rate is fixed, CPI is a good metric for
comparing pipelined machines, since higher instruction-level
parallelism results in a smaller CPI

 No: a small CPI does not by itself mean a faster processor,
since the clock rate may vary and the compiler may generate
program codes of different lengths

49
More on CPI

 Can CPI be less than 1?

 Yes. Multiple-issue processors, such as superscalar processors,
allow multiple instructions to issue in a single clock cycle

50
Amdahl’s Law for Multiprocessor
 Let T be the total execution time of a program on a
uniprocessor workstation
 Suppose the program has been parallelized or partitioned for
parallel execution on a cluster of many processing nodes
 Assume that a fraction α of the code must be executed
sequentially, called the sequential bottleneck
 The remaining 1 – α of the code can be compiled for parallel
execution by n processors
 Total execution time of the program = αT + (1 – α)T/n

51
Amdahl’s Law for Multiprocessor

 Speedup (S) = T / [αT + (1 – α)T/n] = 1 / [α + (1 – α)/n]

 The maximum speedup of n is achieved only if the
sequential bottleneck α is reduced to zero, i.e. the code
is fully parallelizable with α = 0.

52
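A short Python sketch of this formula shows how quickly the sequential bottleneck dominates (the 5% value below is illustrative):

def multiproc_speedup(alpha, n):
    # S = 1 / (alpha + (1 - alpha) / n); alpha is the sequential fraction.
    return 1.0 / (alpha + (1.0 - alpha) / n)

for n in (4, 16, 64, 1024):
    print(n, round(multiproc_speedup(0.05, n), 1))  # 3.5, 9.1, 15.4, 19.6
# As n grows, speedup approaches 1/alpha = 20, not n.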
Power and Energy
 Problem: Get power in, get power out
• Three primary concerns from the designer's point of
view:
 What is the maximum power a processor requires?
 Modern processors provide voltage indexing methods that
allow the processor to slow down and regulate voltage within a
wider margin
 What is the sustained power consumption?
 Widely called the Thermal Design Power (TDP)
 Used as a target for the power supply and cooling system
 Lower than peak power, higher than average power
consumption
 Energy and energy efficiency
53
Trends in Power and Energy
Dynamic Energy and Power
 Dynamic energy
 Capacitive load x Voltage² for a full switching pulse
(0 -> 1 -> 0 or 1 -> 0 -> 1)
 ½ x Capacitive load x Voltage² for a single transistor switch
(0 -> 1 or 1 -> 0)

 Dynamic power
 ½ x Capacitive load x Voltage² x Frequency switched

 Reducing the clock rate reduces power, not energy

54
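These proportionalities are easy to explore numerically; a hedged Python sketch (scale factors are relative to a baseline design, so the capacitive load constant cancels; the 15% figure echoes a textbook example):

def relative_dynamic_power(c_scale, v_scale, f_scale):
    # P_dynamic ~ 1/2 x C x V^2 x f, expressed relative to a baseline.
    return c_scale * v_scale**2 * f_scale

# Lowering voltage and frequency by 15% each cuts dynamic power
# to about 0.85^3 = 61% of the original.
print(relative_dynamic_power(1.0, 0.85, 0.85))  # ~0.614

# Lowering only the clock rate reduces power but not task energy:
# energy per switching event ~ C x V^2, independent of frequency.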


Trends in Power and Energy
Reducing Power
 Techniques for reducing power:
 Do nothing well
 Dynamic Voltage-Frequency Scaling
 Low power state for DRAM, disks
 Overclocking, turning off cores

55
Trends in Power and Energy
Static Power
 Static power consumption
 Current_static x Voltage
 Scales with number of transistors
 To reduce: power gating

56