L7 Performance

Uploaded by

Hitin Chandra Reddy P

EC340 COA

Performance
• Speed up of execution
– Response time
• How long it takes to do a task
• Throughput
– Total work done per unit time
• Instruction set, Hardware
• Software – OS, Compilers

• CPU time = user CPU time + system CPU time

Page 22 COA August 2024

Understanding performance
• Algorithm
– Determines number of operations executed
• Programming language, compiler,
architecture
– Determine number of machine instructions
executed per operation
• Processor and memory system
– Determine how fast instructions are executed
• I/O system (including OS)
– Determines how fast I/O operations are executed

Dept of E&C, NITK Surathkal 1


Understanding performance
• How programs are translated into the machine
language
– And how the hardware executes them
• The hardware/software interface
• What determines performance of a program
• How hardware designers improve performance
– parallel processing
• How to improve energy efficiency


Seven great ideas

• Use abstraction to simplify design
• Make the common case fast
• Performance via parallelism
• Performance via pipelining
• Performance via prediction
• Hierarchy of memories
• Dependability via redundancy


CPU performance
• Performance = 1/Execution Time
• CPU time = CPU clocks x clock cycle time (tc)
• CPU clocks = Instruction count x clocks/instruction
• CPU time = IC x CPI/clock frequency (fc)
– CPI – average clocks per instruction
• Determined by CPU hardware
• If different instructions have different CPI
– Average CPI affected by instruction mix
• compare two different implementations of the same ISA
– IC – Instruction count
• Determined by program, ISA and compiler
– ISA – Instruction set architecture

Example
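Computer   tc       CPI   CPU time
A          250 ps   2     IC × 2 × 250 ps = 500 ps × IC
B          500 ps   1.2   IC × 1.2 × 500 ps = 600 ps × IC

Relative performance:

$$\frac{\text{Perf}_A}{\text{Perf}_B} = \frac{\text{CPU time}_B}{\text{CPU time}_A} = \frac{600}{500} = 1.2$$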
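
The comparison of computers A and B can be checked numerically. The following is a minimal sketch, assuming both machines run the same program; the function name `cpu_time` and the instruction count value are illustrative choices, not part of the slides (the instruction count cancels in the ratio).

```python
def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPU time = IC x CPI x tc (in seconds when tc is in seconds)."""
    return instruction_count * cpi * clock_cycle_time


# Hypothetical instruction count; any positive value gives the same ratio.
IC = 1_000_000

time_a = cpu_time(IC, 2.0, 250e-12)   # computer A: CPI 2, tc = 250 ps
time_b = cpu_time(IC, 1.2, 500e-12)   # computer B: CPI 1.2, tc = 500 ps

# Performance is 1 / execution time, so Perf_A / Perf_B = time_b / time_a.
relative_perf = time_b / time_a
print(f"Perf A / Perf B = {relative_perf:.2f}")
```

A ratio above 1 means A is the faster machine for this program.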

If different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$


Power

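$$\text{Power} \propto C \times V^2 \times f_c$$
• Dynamic Power
• Leakage

[Figure not reproduced. Annotations: power ×30, supply voltage from 5 V down to about 1 V, clock frequency ×1000. Courtesy- H&P, Computer Organisation, 6e]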


Reducing power
• Suppose a new CPU has
– 85% of capacitive load of old CPU
– 15% voltage and 15% frequency reduction

$$\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$$
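
The same ratio can be reproduced in a few lines. This is a sketch under the assumption that only dynamic power matters (leakage is ignored); the function name `relative_dynamic_power` is made up for illustration.

```python
def relative_dynamic_power(cap_scale, volt_scale, freq_scale):
    """Return P_new / P_old for dynamic power proportional to C * V^2 * f.

    Each argument is the new value divided by the old value.
    """
    return cap_scale * volt_scale ** 2 * freq_scale


# 85% capacitive load, 15% voltage reduction, 15% frequency reduction.
ratio = relative_dynamic_power(0.85, 0.85, 0.85)
print(f"P_new / P_old = {ratio:.2f}")
```

Because voltage enters squared, a voltage reduction saves more power than an equal frequency reduction.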

◼ The power wall

◼ We can’t reduce voltage further

◼ We can’t remove more heat

◼ How else can we improve performance?


Processor performance

Constrained by power, instruction-level parallelism, memory latency

[Figure not reproduced. Courtesy- H&P, Computer Organisation, 6e]

Multicore processors
• Requires explicitly parallel programming
• Compare with instruction-level parallelism
– Hardware executes multiple instructions at once
– Hidden from the programmer
• Programming for performance
– Scheduling
– Load balancing
– Optimizing communication and synchronization


SPEC CPU Benchmark

• Programs used to measure performance
– Supposedly typical of actual workload
• Standard Performance Evaluation Corporation (SPEC)
– Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2017
– Elapsed time to execute a selection of programs
– Negligible I/O, so focuses on CPU performance
– Normalize relative to reference machine
– Integer (10) and floating-point (13)
– Summarize as geometric mean of performance ratios

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
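
As an illustration of the geometric-mean summary, here is a minimal sketch. The ratios are hypothetical values chosen for easy arithmetic, not SPEC results, and `geometric_mean` is an illustrative helper name.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of n execution time ratios."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical ratios (reference time / measured time) for two programs.
spec_ratios = [2.0, 8.0]
print(geometric_mean(spec_ratios))
```

One reason to prefer the geometric mean for normalized ratios is that the ranking of two machines does not depend on which reference machine is used for normalization.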


SPECspeed 2017 Integer benchmarks on a 1.8 GHz Intel Xeon E5-2650L

[Figure not reproduced. Courtesy- H&P, Computer Organisation, 6e]


SPEC power benchmark

• Power consumption of server at different
workload levels
– Performance: ssj_ops/sec
– Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \left(\sum_{i=0}^{10} \text{ssj\_ops}_i\right) \Bigg/ \left(\sum_{i=0}^{10} \text{power}_i\right)$$
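
The overall metric sums over the eleven workload levels (i = 0 to 10) before dividing. The sketch below uses made-up measurements purely to show the arithmetic; the numbers and the helper name `overall_ssj_ops_per_watt` are assumptions, not data from the benchmark.

```python
def overall_ssj_ops_per_watt(ssj_ops, power_watts):
    """Sum of ssj_ops over all load levels divided by sum of power."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(power_watts)


# Hypothetical readings at 11 load levels, from 100% load down to idle.
levels = range(10, -1, -1)
ssj_ops = [1000 * level for level in levels]        # 10000, 9000, ..., 0
power_watts = [100 + 10 * level for level in levels]  # 200, 190, ..., 100

result = overall_ssj_ops_per_watt(ssj_ops, power_watts)
print(f"{result:.2f} ssj_ops per Watt")
```

Note that the idle level contributes power but no operations, so high idle power lowers the overall figure.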


SPECpower_ssj2008 for Xeon E5-2650L

[Table not reproduced. Courtesy- H&P, Computer Organisation, 6e]

Amdahl’s Law
• Pitfall: improving an aspect of a computer and expecting a
proportional improvement in overall performance
• Corollary: make the common case fast

$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$

◼ Example: multiply accounts for 80s/100s

◼ How much improvement in multiply performance to
get 5× overall?

$$20 = \frac{80}{n} + 20$$

◼ Can’t be done! No finite n satisfies this, because 80/n is always greater than 0.
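
The multiply example can be explored numerically. This is a minimal sketch of the formula above with illustrative function names; it shows the overall speedup approaching, but never reaching, 5× as the multiply improvement factor grows.

```python
def improved_time(t_affected, t_unaffected, improvement_factor):
    """Amdahl's Law: only the affected part of the time shrinks."""
    return t_affected / improvement_factor + t_unaffected


def overall_speedup(t_affected, t_unaffected, improvement_factor):
    """Old total time divided by new total time."""
    t_old = t_affected + t_unaffected
    return t_old / improved_time(t_affected, t_unaffected, improvement_factor)


# Multiply takes 80 s of a 100 s run; the other 20 s is unaffected.
for n in (2, 10, 100, 1_000_000):
    print(n, round(overall_speedup(80.0, 20.0, n), 3))
```

The upper bound is total time divided by unaffected time, here 100/20 = 5, which is why the target cannot be met.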

Example
• Consider three different processors P1, P2, and P3 executing the
same instruction set. P1 has a 3 GHz clock rate and a CPI of
1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0
GHz clock rate and has a CPI of 2.2.
– Which processor has the highest performance expressed in instructions per
second?

Processor   Instns/sec
P1          3×10^9 / 1.5 = 2×10^9
P2          2.5×10^9 / 1.0 = 2.5×10^9
P3          4×10^9 / 2.2 ≈ 1.82×10^9

P2 has the highest performance in instructions per second.


Example
• Consider two different implementations of the same ISA. The
instructions can be divided into four classes according to their CPI
(class A, B, C, and D). P1 with a clock rate of 2.5 GHz and CPIs of 1, 2,
3, and 3, and P2 with a clock rate of 3 GHz and CPIs of 2, 2, 2, and 2.
Given a program with a dynamic instruction count of 1.0E6 instructions
divided into classes as follows: 10% class A, 20% class B, 50% class C,
and 20% class D, which implementation is faster? What is the global
CPI for each implementation?

– Time = No. instr. × CPI / clock rate

Processor   Total Time       CPI
P1          10.4×10^-4 s     2.6
P2          6.67×10^-4 s     2

P2 is faster.
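
The values for P1 and P2 follow from the weighted average CPI formula. A minimal sketch that recomputes them, with illustrative names (`average_cpi`, `mix`) that are not from the slides:

```python
def average_cpi(class_cpis, class_fractions):
    """Weighted average CPI: sum of CPI_i times the fraction of class i."""
    return sum(cpi * frac for cpi, frac in zip(class_cpis, class_fractions))


mix = [0.10, 0.20, 0.50, 0.20]   # fractions of classes A, B, C, D
instruction_count = 1.0e6

p1_cpi = average_cpi([1, 2, 3, 3], mix)
p2_cpi = average_cpi([2, 2, 2, 2], mix)

# Time = instruction count x CPI / clock rate
p1_time = instruction_count * p1_cpi / 2.5e9
p2_time = instruction_count * p2_cpi / 3.0e9

print(f"P1: CPI = {p1_cpi:.1f}, time = {p1_time:.2e} s")
print(f"P2: CPI = {p2_cpi:.1f}, time = {p2_time:.2e} s")
```

P1 has the lower CPI only for class A, which is 10% of the mix, so its advantage there does not offset the slower clock and the higher CPI on classes C and D.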


Exercise
• A processor has CPIs of 1, 12, and 5, respectively, for arithmetic,
load/store, and branch instructions. Assume that:
– On a single processor a program requires the execution of 2.56E9
arithmetic instructions, 1.28E9 load/store instructions, and 256
million branch instructions.
– Each processor has a 2 GHz clock frequency.
– As the program is parallelized to run over multiple cores, the
number of arithmetic and load/store instructions per processor is
divided by 0.7 x p (where p is the number of processors) but the
number of branch instructions per processor remains the same.
• Find the total execution time for this program on 1, 2, 4, and 8
processors, and show the relative speedup of the 2, 4, and 8
processor result relative to the single processor result.
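
One way to approach this exercise is sketched below. It is not an official solution, and it assumes that the single-processor case uses the instruction counts as given (no 0.7 × p scaling), with the scaling applied only for p > 1. All names are illustrative.

```python
CLOCK_RATE_HZ = 2.0e9
CPI = {"arith": 1, "load_store": 12, "branch": 5}
COUNTS = {"arith": 2.56e9, "load_store": 1.28e9, "branch": 256e6}


def execution_time(p):
    """Per-processor execution time in seconds on p processors."""
    # Assumption: counts are unscaled for p == 1, divided by 0.7 * p otherwise.
    scale = 1.0 if p == 1 else 0.7 * p
    cycles = (
        COUNTS["arith"] / scale * CPI["arith"]
        + COUNTS["load_store"] / scale * CPI["load_store"]
        + COUNTS["branch"] * CPI["branch"]   # branches are not divided
    )
    return cycles / CLOCK_RATE_HZ


base = execution_time(1)
for p in (1, 2, 4, 8):
    t = execution_time(p)
    print(f"p = {p}: time = {t:.2f} s, speedup = {base / t:.2f}")
```

The branch instructions act as the unaffected part in Amdahl's Law, so the speedup grows more slowly than the processor count.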


Exercise
• A computer spends 30 percent of its time accessing memory, 20
percent performing multiplications, and 50 percent executing
other instructions. As a computer architect, you have to choose
between improving either the memory, multiplication hardware,
or execution of non-multiplication instructions. There is only
space on the chip for one improvement, and each of the
improvements will improve its associated part of the
computation by a factor of 2.
– Without performing any calculations, which improvement would
you expect to give the largest performance increase, and why?
– What speedup would making each of the three changes give?
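
For the second part, each speedup follows from Amdahl's Law with the original execution time normalized to 1. The following is a worked sketch rather than an official solution:

$$S_{\text{memory}} = \frac{1}{0.7 + 0.3/2} \approx 1.18, \qquad S_{\text{multiply}} = \frac{1}{0.8 + 0.2/2} \approx 1.11, \qquad S_{\text{other}} = \frac{1}{0.5 + 0.5/2} \approx 1.33$$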


MIPS as performance benchmark

• MIPS: Millions of Instructions Per Second
– Doesn’t account for
• Differences in ISAs between computers
• Differences in complexity between instructions

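$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$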

◼ CPI varies between programs on a given CPU
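
To see why a single MIPS figure can mislead, the sketch below evaluates the formula for one CPU running two programs with different average CPI. The 2 GHz clock and the CPI values are hypothetical, and `mips` is an illustrative helper name.

```python
def mips(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


# Hypothetical: the same 2 GHz CPU, two programs with different average CPI.
CLOCK_RATE_HZ = 2.0e9
print(mips(CLOCK_RATE_HZ, 1.0))   # program with a low-CPI instruction mix
print(mips(CLOCK_RATE_HZ, 2.5))   # program with a high-CPI instruction mix
```

The hardware is identical in both calls, yet the MIPS rating differs by a factor of 2.5, so the number describes the CPU and program together rather than the CPU alone.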


Summary
• Cost/performance is improving
– Due to underlying technology development
• Hierarchical layers of abstraction
– In both hardware and software
• Instruction set architecture
– The hardware/software interface
• Execution time: the best performance measure
• Power is a limiting factor
– Use parallelism to improve performance
