Principles of Scalable Performance

The document discusses principles of scalable performance including performance metrics, speedup laws, scaling principles, parallelism profiles, degree of parallelism, average parallelism, available parallelism, asymptotic speedup, performance measures, redundancy, utilization, quality of parallelism, and standard performance benchmarks.

Uploaded by Senthil Ganesh R

Principles of Scalable Performance

• Performance measures
• Speedup laws
• Scalability principles
• Scaling up vs. scaling down

Performance metrics and measures

• Parallelism profiles

• Asymptotic speedup factor

• System efficiency, utilization and quality

• Standard performance measures

Parallelism Profiles in Programs

• The degree of parallelism reflects the extent to which software parallelism matches hardware parallelism
Degree of parallelism

• Execution of a program on a parallel computer may use different numbers of processors at different time periods during the execution cycle
• The number of processors used to execute the program during each period is the degree of parallelism (DOP)
• The DOP is a discrete time function, taking only non-negative integer values
Degree of parallelism
• The parallelism profile is a plot of the DOP as a function of time

• It ideally assumes unlimited resources

• Software tools are available to trace the parallelism profile
Factors affecting parallelism profiles

• Algorithm structure

• Program optimization

• Resource utilization

• Run-time conditions

Degree of parallelism

• The DOP assumes an unbounded number of available processors and other necessary resources

• The maximum DOP may not be achievable on a real computer with limited resources

• When the DOP exceeds the maximum number of available processors, the parallel branches are executed in chunks sequentially
Degree of parallelism
• Parallelism still exists within each chunk, limited by the machine size

• It is also limited by memory and other non-processor resources
Average parallelism: variables

• n – number of homogeneous processors

• m – maximum parallelism in a profile

• Δ – computing capacity of a single processor (execution rate only, no overhead)

• DOP = i – i processors are busy during an observation period
Average parallelism

• The total amount of work performed is proportional to the area under the profile curve:

W = \Delta \int_{t_1}^{t_2} \mathrm{DOP}(t)\, dt

W = \Delta \sum_{i=1}^{m} i \cdot t_i

• t_i – total amount of time that DOP = i

• t_2 - t_1 – total elapsed time
Average parallelism

A = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} \mathrm{DOP}(t)\, dt

A = \left( \sum_{i=1}^{m} i \cdot t_i \right) \Big/ \left( \sum_{i=1}^{m} t_i \right)
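The work and average-parallelism formulas above can be sketched numerically; the profile values below are hypothetical, not from the slides:

```python
# Sketch: total work W and average parallelism A computed from a
# discrete parallelism profile. Profile values are hypothetical.

# t_i = total time during which DOP = i (arbitrary time units)
profile = {1: 2.0, 2: 3.0, 4: 1.0, 8: 0.5}

delta = 1.0  # computing capacity of a single processor (work per time unit)

# W = delta * sum_i (i * t_i) -- area under the profile curve
W = delta * sum(i * t for i, t in profile.items())

# A = sum_i (i * t_i) / sum_i (t_i)
A = sum(i * t for i, t in profile.items()) / sum(profile.values())
```

Because the profile is a step function, the integral collapses to the weighted sum over the discrete DOP values.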
Example: parallelism profile and average parallelism
(figure)
Available Parallelism

• Potential parallelism in application programs

• Engineering and scientific codes exhibit a high DOP due to data parallelism

• Ordinary computations exhibit little parallelism when analysis does not cross basic block boundaries

• Basic block – a block of instructions with single entry and single exit points

• Compiler optimization and algorithm redesign can increase the available parallelism
Asymptotic speedup

T(1) = \sum_{i=1}^{m} t_i(1) = \sum_{i=1}^{m} W_i / \Delta

T(\infty) = \sum_{i=1}^{m} t_i(\infty) = \sum_{i=1}^{m} W_i / (i \Delta)

S_\infty = T(1)/T(\infty) = \left( \sum_{i=1}^{m} W_i \right) \Big/ \left( \sum_{i=1}^{m} W_i / i \right) = A in the ideal case

(T is the response time)
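A minimal sketch of the ratio above, assuming Δ = 1 and hypothetical per-DOP work amounts W_i:

```python
# Sketch: asymptotic speedup with unlimited processors, delta = 1.
# W_i values (work done while DOP = i) are hypothetical.
W = {1: 10.0, 2: 20.0, 4: 40.0}

T1 = sum(W.values())                     # T(1): one processor does all the work
Tinf = sum(w / i for i, w in W.items())  # T(inf): DOP-i work runs i-wide
S_inf = T1 / Tinf                        # equals the average parallelism A ideally
```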
Performance measures

• Consider n processors executing m programs in various modes with different performance levels
• We want to define the mean performance of these multimode computers:
• Arithmetic mean performance
• Geometric mean performance
• Harmonic mean performance
Arithmetic mean performance

R_a = \sum_{i=1}^{m} R_i / m    (arithmetic mean execution rate; assumes equal weighting)

R_a^* = \sum_{i=1}^{m} f_i R_i    (weighted arithmetic mean execution rate)

– proportional to the sum of the inverses of execution times
Geometric mean performance

R_g = \prod_{i=1}^{m} R_i^{1/m}    (geometric mean execution rate)

R_g^* = \prod_{i=1}^{m} R_i^{f_i}    (weighted geometric mean execution rate)

– does not summarize the real performance since it does not have the inverse relation with the total time
Harmonic mean performance

Mean execution time per instruction for program i:

T_i = 1 / R_i

T_a = \frac{1}{m} \sum_{i=1}^{m} T_i = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{R_i}    (arithmetic mean execution time per instruction)
Harmonic mean performance

R_h = 1 / T_a = m \Big/ \sum_{i=1}^{m} (1 / R_i)    (harmonic mean execution rate)

R_h^* = 1 \Big/ \sum_{i=1}^{m} (f_i / R_i)    (weighted harmonic mean execution rate)

– corresponds to the total number of operations divided by the total time (closest to the real performance)
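A minimal numeric comparison of the three means (hypothetical rates, equal weights f_i = 1/m); the harmonic mean is the one that equals total work divided by total time:

```python
from math import prod

# Hypothetical execution rates R_i (e.g. in Mflops) for m programs
R = [4.0, 8.0, 16.0]
m = len(R)

Ra = sum(R) / m                    # arithmetic mean rate
Rg = prod(R) ** (1.0 / m)          # geometric mean rate
Rh = m / sum(1.0 / r for r in R)   # harmonic mean rate

# With one unit of work per program, total time is sum(1/R_i), so the
# harmonic mean equals total work / total time (the observed rate).
total_work = m
total_time = sum(1.0 / r for r in R)
observed_rate = total_work / total_time
```

For distinct rates, Rh < Rg < Ra always holds; the arithmetic mean overstates the real performance.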
Harmonic Mean Speedup

• Ties the various modes of a program to the number of processors used

• The program is in execution mode i if i processors are used

S = T_1 / T^* = 1 \Big/ \sum_{i=1}^{n} (f_i / R_i)

• Sequential execution time T_1 = 1/R_1 = 1
Harmonic Mean Speedup Performance
(figure)
Amdahl’s Law

• Assume R_i = i, w = (\alpha, 0, 0, \ldots, 1 - \alpha)
• The system is either sequential, with probability \alpha, or fully parallel, with probability 1 - \alpha

S_n = \frac{n}{1 + (n - 1)\alpha}

• Implies S_n \to 1/\alpha as n \to \infty
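The law is easy to evaluate directly; a small sketch with an illustrative sequential fraction α = 0.1:

```python
# Amdahl's law: S_n = n / (1 + (n - 1) * alpha); alpha here is illustrative.
def amdahl_speedup(n, alpha):
    return n / (1 + (n - 1) * alpha)

alpha = 0.1
speedups = {n: amdahl_speedup(n, alpha) for n in (4, 16, 1024)}
# As n grows without bound, the speedup approaches the ceiling 1/alpha = 10,
# no matter how many processors are added.
```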
Speedup Performance
(figure)
System Efficiency

• O(n) is the total number of unit operations performed on an n-processor system

• T(n) is the execution time in unit time steps

• In general, T(n) < O(n) and T(1) = O(1)

S(n) = T(1) / T(n)

E(n) = S(n) / n = T(1) / (n T(n))
Redundancy and Utilization

• Redundancy signifies the extent of matching between software and hardware parallelism

R(n) = O(n) / O(1)

• Utilization indicates the percentage of resources kept busy during execution

U(n) = R(n) E(n) = O(n) / (n T(n))
Quality of Parallelism

• Directly proportional to the speedup and efficiency, and inversely related to the redundancy

• Upper-bounded by the speedup S(n)

Q(n) = \frac{S(n) E(n)}{R(n)} = \frac{T^3(1)}{n T^2(n) O(n)}
Example of Performance

• Given O(1) = T(1) = n^3, O(n) = n^3 + n^2 \log n, and T(n) = 4n^3/(n+3):

S(n) = (n+3)/4
E(n) = (n+3)/(4n)
R(n) = (n + \log n)/n
U(n) = (n+3)(n + \log n)/(4n^2)
Q(n) = (n+3)^2 / (16(n + \log n))
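These closed forms can be checked numerically at a single machine size; the sketch below assumes log means log base 2 (the slides do not specify a base, and the simplified forms hold for any consistent choice):

```python
from math import log

# Check the example's closed forms at one (arbitrary) machine size.
n = 8.0
O1 = T1 = n ** 3                   # O(1) = T(1) = n^3
On = n ** 3 + n ** 2 * log(n, 2)   # O(n); base-2 log assumed
Tn = 4 * n ** 3 / (n + 3)          # T(n)

S = T1 / Tn     # speedup
E = S / n       # efficiency
Rd = On / O1    # redundancy
U = Rd * E      # utilization
Q = S * E / Rd  # quality
```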
Standard Performance Measures

• MIPS and Mflops describe the instruction execution rate and the floating-point capability of a parallel computer

• MIPS = f \times I_c / (C \times 10^6), where f is the clock rate, I_c the instruction count, and C the total cycle count

• The MIPS rating depends on the instruction set and varies between programs

• Mflops depends on the machine hardware design and on program behavior
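A quick sanity check of the MIPS formula with illustrative numbers (these are not from the slides):

```python
# MIPS = f * Ic / (C * 10^6): f = clock rate (Hz), Ic = instruction
# count, C = total clock cycles. All values below are illustrative.
f = 100e6         # 100 MHz clock
Ic = 50_000_000   # instructions executed
C = 200_000_000   # cycles consumed, i.e. CPI = C / Ic = 4

mips = f * Ic / (C * 1e6)   # equivalently f / (CPI * 10^6) = 25 MIPS
```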
Standard Performance Measures

• Dhrystone results
• A CPU-intensive benchmark
• Consists of 100 high-level language statements and diverse data types
• Balanced with respect to statement type, data type, and locality of reference, with no operating system calls and no use of library functions or subroutines
• A measure of the integer performance of modern processors
Standard Performance Measures

• Whetstone results
• A Fortran-based synthetic benchmark
• A measure of floating-point performance
• Includes both integer and floating-point operations involving array indexing, subroutine calls, parameter passing, and conditional branching
Standard Performance Measures
• Performance depends on the compilers used

• Dhrystone is intended to test the CPU

• Compiler techniques such as procedure in-lining affect Dhrystone performance

• This sensitivity to compilers is a drawback of these benchmarks
Standard Performance Measures
• TPS and KLIPS ratings
• On-line transaction processing applications demand rapid, interactive processing of a large number of relatively simple transactions
• They are supported by very large databases
• Automated teller machines and airline reservation systems are examples
• Measured as transaction performance
Standard Performance Measures

• The throughput of computers for on-line transaction processing is measured in transactions per second (TPS)

• A transaction may involve database search, query answering, and database update operations

• In AI applications, the measure is KLIPS (kilo logic inferences per second)

• It indicates the reasoning power of an AI machine
Standard Performance Measures

• Japan’s fifth-generation computer system targeted a performance of 400 KLIPS

• 400 KLIPS ≈ 40 MIPS, since each logic inference typically requires on the order of 100 instructions

• Logic inference demands symbolic manipulation