
CSCI 8150

Advanced Computer Architecture


Hwang, Chapter 3
Principles of Scalable Performance
3.1 Performance Metrics and Measures
Degree of Parallelism
The number of processors used at any instant to
execute a program is called the degree of
parallelism (DOP); this can vary over time.
The maximum DOP assumes an unbounded number of
processors is available; this is not achievable in real
machines, so program segments whose DOP exceeds the
number of physical processors must be executed
sequentially, as a series of smaller parallel segments.
Other resources (memory, I/O) may impose additional
limiting conditions.
A plot of DOP vs. time is called a parallelism profile.

Example Parallelism Profile

[Figure: DOP plotted against time over the observation interval (t1, t2), with the average parallelism shown as a horizontal line through the profile.]
Average Parallelism - 1
Assume the following:
n homogeneous processors
maximum parallelism in a profile is m
Ideally, n >> m
Δ, the computing capacity of a single processor, is
something like MIPS or Mflops, without regard for memory latency, etc.
i is the number of processors busy in an observation
period (i.e. DOP = i)
W is the total work (instructions or computations)
performed by a program
A is the average parallelism in the program
Average Parallelism - 2
W = \Delta \int_{t_1}^{t_2} DOP(t)\,dt

W = \Delta \sum_{i=1}^{m} i \, t_i

where t_i = total time that DOP = i, and \sum_{i=1}^{m} t_i = t_2 - t_1.
Average Parallelism - 3
A = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} DOP(t)\,dt

Equivalently, in terms of the discrete profile:

A = \left( \sum_{i=1}^{m} i \, t_i \right) \Big/ \left( \sum_{i=1}^{m} t_i \right)
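To make the discrete form concrete, here is a minimal Python sketch of the average-parallelism calculation; the (i, t_i) profile values are hypothetical example numbers.

# Minimal sketch: average parallelism from a discretized parallelism profile.
# The (DOP, duration) pairs are hypothetical example values.
profile = [(1, 4.0), (2, 3.0), (4, 2.0), (8, 1.0)]  # (i, t_i)

work = sum(i * t for i, t in profile)     # proportional to W (Delta cancels)
elapsed = sum(t for _, t in profile)      # t2 - t1
A = work / elapsed                        # average parallelism

print(f"A = {A:.3f}")  # (1*4 + 2*3 + 4*2 + 8*1) / 10 = 2.6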
Available Parallelism
Various studies have shown that the potential
parallelism in scientific and engineering
calculations can be very high (e.g. hundreds or
thousands of instructions per clock cycle).
But in real machines, the actual parallelism is much
smaller (e.g. 10 or 20).
Basic Blocks
A basic block is a sequence or block of instructions
with one entry and one exit.
Basic blocks are frequently used as the focus of
optimizers in compilers (since it is easier to manage
the registers used within the block).
Limiting optimization to basic blocks limits the
instruction level parallelism that can be obtained
(to about 2 to 5 in typical code).
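For instance, in the hypothetical Python function below, each straight-line run of statements between branch points is a basic block; an optimizer restricted to basic blocks can only exploit the parallelism inside each marked region.

# Hypothetical snippet illustrating basic blocks; boundaries are marked
# in the comments. Control enters each block only at the top and leaves
# only at the bottom.
def saxpy_step(a, x, y):
    t = a * x       # block 1: straight-line code with
    t = t + y       # one entry and one exit
    if t < 0.0:     # the conditional branch ends block 1
        t = 0.0     # block 2 (taken path)
    return t        # block 3 (join point)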
Asymptotic Speedup - 1

W_i = \Delta \, i \, t_i    (work done when DOP = i)

W = \sum_{i=1}^{m} W_i    (relates the sum of the W_i terms to W)

t_i(k) = W_i / (k \Delta)    (execution time of W_i with k processors)

t_i(1) = W_i / \Delta    (execution time of W_i with 1 processor)

t_i(\infty) = W_i / (i \Delta)    (for 1 ≤ i ≤ m)
Asymptotic Speedup - 2

T(1) = \sum_{i=1}^{m} t_i(1) = \sum_{i=1}^{m} W_i / \Delta    (response time with 1 processor)

T(\infty) = \sum_{i=1}^{m} t_i(\infty) = \sum_{i=1}^{m} W_i / (i \Delta)    (response time with ∞ processors)

S_\infty = T(1) / T(\infty) = \left( \sum_{i=1}^{m} W_i \right) \Big/ \left( \sum_{i=1}^{m} W_i / i \right) = A    (in the ideal case)

S_\infty is the asymptotic speedup.
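The following minimal Python sketch evaluates the asymptotic speedup for a hypothetical work distribution W_i; Δ cancels in the ratio, so it is omitted.

# Minimal sketch: asymptotic speedup from a hypothetical work distribution.
# W[i-1] is the work W_i performed while DOP = i (arbitrary work units).
W = [10.0, 0.0, 6.0, 24.0]  # W_1 .. W_4, so m = 4

T1 = sum(W)                                            # T(1)
Tinf = sum(Wi / i for i, Wi in enumerate(W, start=1))  # T(infinity)
S = T1 / Tinf                                          # equals A in the ideal case

print(f"S_inf = {S:.3f}")  # 40 / (10 + 0 + 2 + 6) = 2.222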
Mean Performance Calculation
We seek to obtain a measure that characterizes
the mean, or average, performance of a set of
benchmark programs with potentially many
different execution modes (e.g. scalar, vector,
sequential, parallel).
We may also wish to associate weights with these
programs to emphasize these different modes and
yield a more meaningful performance measure.
Arithmetic Mean
The arithmetic mean is familiar (sum of the terms
divided by the number of terms).
Our measures will use execution rates expressed in
MIPS or Mflops.
The arithmetic mean of a set of execution rates is
proportional to the sum of the inverses of the
execution times; it is not inversely proportional to
the sum of the execution times.
Thus arithmetic mean fails to represent real times
consumed by the benchmarks when executed.
Geometric Mean
A geometric mean of n terms is the nth root of the
product of the n terms.
Like the arithmetic mean, the geometric mean of a
set of execution rates does not have an inverse
relationship with the total execution time of the
programs.
(Geometric mean has been advocated for use with
normalized performance numbers for comparison
with a reference machine.)
Harmonic Mean
Instead of using the arithmetic or geometric mean, we
use the harmonic mean execution rate, which is just
the inverse of the arithmetic mean of the execution
times (thus guaranteeing the inverse relation not
exhibited by the other means):

R_h = m \Big/ \left( \sum_{i=1}^{m} 1/R_i \right)
Weighted Harmonic Mean
If we associate weights f_i with the benchmarks
(with \sum_{i=1}^{m} f_i = 1), then we can compute the
weighted harmonic mean:

R_h^* = 1 \Big/ \left( \sum_{i=1}^{m} f_i / R_i \right)
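A small numeric sketch (with hypothetical execution rates) shows why only the harmonic mean agrees with the rate actually observed over the whole run:

# Two hypothetical benchmarks, each executing 1 unit of work, with
# measured rates of 100 and 10 MIPS.
rates = [100.0, 10.0]

arithmetic = sum(rates) / len(rates)                 # 55.0
geometric = (rates[0] * rates[1]) ** 0.5             # ~31.6
harmonic = len(rates) / sum(1.0 / r for r in rates)  # ~18.2

total_time = sum(1.0 / r for r in rates)             # 0.01 + 0.1 = 0.11
true_rate = len(rates) / total_time                  # total work / total time

print(arithmetic, geometric, harmonic, true_rate)
# Only the harmonic mean equals the true aggregate rate (~18.2 MIPS).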
Weighted Harmonic Mean Speedup
T_1 = 1/R_1 = 1 is the sequential execution time on a single
processor with rate R_1 = 1.
T_i = 1/R_i = 1/i is the execution time using i processors
with a combined execution rate of R_i = i.
Now suppose a program has n execution modes with
associated weights f_1, …, f_n. The weighted harmonic mean
speedup is defined as:

S = T_1 / T^* = 1 \Big/ \left( \sum_{i=1}^{n} f_i / R_i \right)

where T^* = 1/R_h^* is the weighted arithmetic mean execution time.
Amdahl's Law

Assume R_i = i, and that the weights are (α, 0, …, 0, 1−α).
Basically this means the system is used either sequentially (with
probability α) or with all n processors (with probability 1−α).
This yields the speedup equation known as Amdahl's law:

S_n = n / (1 + (n − 1) α)

The implication is that the best speedup possible is 1/α,
regardless of n, the number of processors.
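A minimal Python sketch of Amdahl's law, assuming a hypothetical sequential fraction α = 0.1:

# Minimal sketch: Amdahl's law with a hypothetical sequential fraction.
def amdahl_speedup(n, alpha):
    return n / (1 + (n - 1) * alpha)  # S_n = n / (1 + (n-1) * alpha)

alpha = 0.1  # 10% of the work is sequential (hypothetical)
for n in (1, 4, 16, 64, 1024):
    print(n, round(amdahl_speedup(n, alpha), 2))
# Output approaches 1/alpha = 10 no matter how large n grows.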
System Efficiency 1
Assume the following definitions:
O (n) = total number of unit operations performed by an n-
processor system in completing a program P.
T (n) = execution time required to execute the program P on an n-
processor system.
O (n) can be considered similar to the total number of
instructions executed by the n processors, perhaps scaled
by a constant factor.
If we define O (1) = T (1), then it is logical to expect that
T (n) < O (n) when n > 1 if the program P is able to make
any use at all of the extra processor(s).
System Efficiency 2
Clearly, the speedup factor (how much faster the program
runs with n processors) can now be expressed as
S (n) = T (1) / T (n)
Recall that we expect T (n) < T (1), so S (n) > 1.
System efficiency is defined as
E (n) = S (n) / n = T (1) / ( n T (n) )
It indicates the actual degree of speedup achieved in a
system as compared with the maximum possible speedup.
Thus 1/n ≤ E (n) ≤ 1. The value is 1/n when only one
processor is used (regardless of n), and the value is 1 when
all processors are fully utilized.
Redundancy
The redundancy in a parallel computation is defined as
R (n) = O (n) / O (1)
What values can R (n) obtain?
R (n) = 1 when O (n) = O (1), or when the number of operations
performed is independent of the number of processors, n. This is
the ideal case.
R (n) = n when each processor performs the same number of
operations as a single processor would when used alone; this implies
that n completely redundant computations are performed!
The R (n) figure indicates to what extent the software
parallelism is carried over to the hardware implementation
without having extra operations performed.
System Utilization
System utilization is defined as
U (n) = R (n) E (n) = O (n) / ( n T (n) )
It indicates the degree to which the system
resources were kept busy during execution of the
program. Since 1 ≤ R (n) ≤ n and 1/n ≤ E (n) ≤ 1,
the best possible value for U (n) is 1, and the
worst is 1/n.
1/n ≤ E (n) ≤ U (n) ≤ 1
1 ≤ R (n) ≤ 1/E (n) ≤ n
Quality of Parallelism
The quality of a parallel computation is defined as
Q (n) = S (n) E (n) / R (n) = T³(1) / ( n T²(n) O (n) )
This measure is directly related to speedup (S) and
efficiency (E), and inversely related to redundancy
(R).
The quality measure is bounded by the speedup
(that is, Q (n) ≤ S (n) ).
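The Python sketch below ties speedup, efficiency, redundancy, utilization, and quality together; the measurements for T(1), T(n), and O(n) are hypothetical.

# Minimal sketch: the performance metrics above, computed from
# hypothetical measurements.
n = 4        # number of processors
T1 = 100.0   # T(1); we take O(1) = T(1) by definition
Tn = 40.0    # T(n), hypothetical
On = 120.0   # O(n), hypothetical

O1 = T1
S = T1 / Tn      # speedup     = 2.5
E = S / n        # efficiency  = 0.625
R = On / O1      # redundancy  = 1.2
U = R * E        # utilization = 0.75
Q = S * E / R    # quality     ~ 1.302 (= T1**3 / (n * Tn**2 * On))

assert Q <= S    # the quality measure is bounded by the speedup
print(S, E, R, U, Q)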
Standard Industry Performance Measures
MIPS and Mflops, while easily understood, are poor
measures of system performance, since their interpretation
depends on machine clock cycles and instruction sets. For
example, which of these machines is faster?
a 10 MIPS CISC computer
a 20 MIPS RISC computer
It is impossible to tell without knowing more details about
the instruction sets of the machines. Even the question
"which machine is faster?" is suspect, since we really need
to ask "faster at doing what?"
Doing What?
To answer the "doing what?" question, several standard
programs are frequently used.
The Dhrystone benchmark uses no floating point instructions,
system calls, or library functions. It uses exclusively integer data
items. Each execution of the entire set of high-level language
statements is a Dhrystone, and a machine is rated as having a
performance of some number of Dhrystones per second (sometimes
reported as KDhrystones/sec).
The Whetstone benchmark uses a more complex program involving
floating point and integer data, arrays, subroutines with
parameters, conditional branching, and library functions. It does
not, however, contain any obviously vectorizable code.
The performance of a machine on these benchmarks
depends in large measure on the compiler used to generate
the machine language. [Some companies have, in the
past, actually tweaked their compilers to specifically deal
with the benchmark programs!]
What's VAX Got To Do With It?
For many years, the Digital Equipment VAX-11/780
computer has been commonly agreed to be a 1-MIPS
machine (whatever that means).
Since the VAX-11/780 also has a rating of about
1.7 KDhrystones, this gives a method whereby a
relative MIPS rating for any other machine can be
derived: just run the Dhrystone benchmark on the
other machine, divide by 1.7K, and you then obtain
the relative MIPS rating for that machine
(sometimes also called VUPs, or VAX units of
performance).
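As a quick sketch of that calculation (the Dhrystone rating of the machine under test is hypothetical):

# Relative MIPS (VUPs) from a Dhrystone rating, per the rule above.
vax_dhrystones = 1700.0      # VAX-11/780: ~1.7 KDhrystones == 1 MIPS
machine_dhrystones = 8500.0  # hypothetical measurement on the other machine

vups = machine_dhrystones / vax_dhrystones
print(f"{vups:.1f} VUPs (relative MIPS)")  # 5.0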
Other Measures
Transactions per second (TPS) is a measure that is
appropriate for online systems like those used to support
ATMs, reservation systems, and point of sale terminals.
The measure may include communication overhead,
database search and update, and logging operations. The
benchmark is also useful for rating relational database
performance.
KLIPS is the measure of the number of logical inferences
per second that can be performed by a system, presumably
to relate how well that system will perform at certain AI
applications. Since one inference requires about 100
instructions (in the benchmark), a rating of 400 KLIPS is
roughly equivalent to 40 MIPS.