Unit 1: Computer Evolution Performance Measurement and Arithmetic

The document provides an overview of a course syllabus covering computer evolution, performance measurement, and arithmetic. The syllabus outlines topics such as the history of computers, Von Neumann and Harvard architectures, benchmarks for evaluating performance, and algorithms for multiplication and division along with their hardware implementations. It also includes a teaching plan specifying the topics and contents to be covered in each of the 8 lectures.

Uploaded by Prachi Pandey

UNIT 1

Computer Evolution Performance Measurement and Arithmetic


Syllabus
(No. of lectures allotted: 8 Hrs)

 A Brief History of Computers
 Von Neumann Architecture
 Harvard Architecture
 Computer Performance Measurement:
 Benchmarks (SPEC) for Evaluation
 Metrics such as CPU Time, Throughput, etc.
 Aspects & Factors affecting Computer Performance
 Comparing Computer Performances
 Marketing Metrics – MIPS & MFLOPS, Speedup & Amdahl’s Law
 Booth’s algorithm for signed multiplication and its Hardware Implementation
 Division: Restoring and Non-Restoring algorithms and their Hardware Implementation

11/13/2019
Teaching Plan
(columns: Lect. No. | Unit | Topics & Contents | Planned Date | Conduction Date)

1. I – Syllabus overview, A Brief History of Computers
2. I – Von Neumann Architecture, Harvard Architecture
3. I – Computer Performance Measurement: Benchmarks (SPEC) for Evaluation; Metrics such as CPU Time, Throughput, etc.
4. I – Aspects & Factors affecting Computer Performance, Comparing Computer Performances
5. I – Marketing Metrics – MIPS & MFLOPS, Speedup & Amdahl’s Law
6. I – Booth’s algorithm for multiplication and its Hardware Implementation
7. I – Bit-pair Recoding method
8. I – Division: Restoring and Non-Restoring algorithms and their Hardware Implementation
Architecture & Organization 1
 Architecture is those attributes visible to the
programmer
 Instruction set, number of bits used for data
representation, I/O mechanisms, addressing techniques.
 e.g. Is there a multiply instruction?

 Organization is how features are implemented


 Control signals, interfaces, memory technology.
 e.g. Is there a hardware multiply unit or is it done by
repeated addition?
Architecture & Organization 2
 All Intel x86 family share the same basic
architecture
 The IBM System/370 family share the same basic
architecture

 This gives code compatibility


 At least backwards
 Organization differs between different versions
Structure & Function
 Structure is the way in which components relate to
each other
 Function is the operation of individual components
as part of the structure
Function
 All computer functions are:
 Data processing
 Data storage

 Data movement

 Control
Functional view

 Functional view of a computer
[Figure: the four functional components – Data Storage Facility, Data Movement Apparatus, Control Mechanism, Data Processing Facility]
Operations (1)

 Data movement
 e.g. keyboard to screen
Operations (2)

 Storage
 e.g. Internet download to disk
Operation (3)

 Processing from/to storage


 e.g. updating bank statement
Operation (4)

 Processing from storage to I/O


 e.g. printing a bank statement
Structure - Top Level

[Figure: the Computer – Central Processing Unit, Main Memory, Systems Interconnection, and Input/Output – connected to Peripherals and Communication lines]
Structure - The CPU

[Figure: CPU structure – Registers, Arithmetic and Logic Unit, Internal CPU Interconnection, and Control Unit, linked to Computer I/O and Memory via the System Bus]
Structure - The Control Unit

[Figure: control unit structure – Sequencing Logic, Control Unit Registers and Decoders, and Control Memory, connected to the CPU's ALU, registers, and internal bus]
Computer system

[Figure: basic computer system – Input/Output Equipment, Main Memory, Arithmetic and Logic Unit, and Program Control Unit]
Functions of Computer

 Data processing
 Data storage
 Data movement
 Control
Evolution of Computer
Zeroth Generation : Mechanical Era
(1600-1940)

 Wilhelm Schickard and Blaise Pascal (1623-1642): built a mechanical counter with carry
 Von Leibniz (1646-1716)
 Charles Babbage (1823-34): built the Difference Engine
 Babbage’s Difference Engine (1822-36)
 Babbage’s Analytical Engine
 George Boole (1847)
 Herman Hollerith (1880) – Punched Cards
 Konrad Zuse (1938): Automatic Calculating Machine based on an electromagnetic mechanism
First Generation : Vacuum Tubes
(1946-1957)

 1943-44: John Mauchly (professor) and J. Presper Eckert (graduate student) built ENIAC (Electronic Numerical Integrator and Computer) at U. Penn.
 1944: Howard Aiken used “separate data and program memories” in the MARK I computer – Harvard Architecture.
 1945-52: John von Neumann proposed a “stored program computer”, EDVAC (Electronic Discrete Variable Computer) – Von Neumann Architecture – using the same memory for program and data.
ENIAC – background:

 Electronic Numerical Integrator And Computer
 Eckert and Mauchly
 University of Pennsylvania
 Trajectory tables for weapons
 Started 1943
 Finished 1946
 Too late for war effort
 Used until 1955
ENIAC - details

 Decimal (not binary)
 20 accumulators of 10 digits
 Programmed manually, by plugging and unplugging switches
 Results were punched on cards or printed on a typewriter
 18,000 vacuum tubes
 30 tons
 15,000 square feet
 140 kW power consumption
 5,000 additions per second
First Generation (contd..)

 Stored Program Concept
 EDVAC (Electronic Discrete Variable Computer)
 The Von Neumann Machine (IAS Computer)
First Generation (Von Neumann Machine contd..)

 Memory: 1000 x 40-bit words, storing binary numbers
 Each number is represented by a sign bit and a 39-bit value
 Alternatively, a word may contain two 20-bit instructions, where 8 bits specify the op-code and the remaining 12 bits give the address of a word in memory
Ref: Computer Organization and Architecture by William Stallings
Von Neumann/Turing

 Stored Program concept
 Main memory storing programs and data
 ALU operating on binary data
 Control unit interpreting instructions from memory and executing them
 Input and output equipment operated by the control unit
 Princeton Institute for Advanced Studies (IAS)
 Completed 1952
Structure of Von Neumann machine

[Figure: structure of the IAS machine]
Set of registers (storage in CPU):

 Memory Buffer Register (MBR):
contains a word to be stored in memory, or is used to receive a word from memory
 Memory Address Register (MAR):
specifies the address in memory of the word to be written from or read into the MBR
 Instruction Register (IR):
contains the 8-bit op-code of the instruction being executed
 Instruction Buffer Register (IBR):
employed to temporarily hold the right-hand instruction from a word in memory
Contd..

 Program Counter (PC):
contains the address of the next instruction pair to be fetched from memory
 Accumulator (AC) & Multiplier Quotient (MQ):
employed to temporarily hold operands and results of ALU operations.
The result of multiplying two 40-bit numbers is an 80-bit number; the most significant 40 bits are stored in the AC and the least significant 40 bits in the MQ.
Structure of IAS – detail

[Figure: expanded structure of the IAS computer]
Harvard Architecture

 Computers with separate program and data memories, implemented in ROMs and RAMs respectively, are said to use the Harvard architecture.
 Harvard architecture has separate buses for data and program memory.
 Both memories can be accessed simultaneously.
 This reduces the chances of resource conflicts related to memory accesses.
 This gives improved bandwidth over the traditional Von Neumann architecture.
 Suitable for RISC-based microcontrollers.
Commercial Computers

 1947 - Eckert-Mauchly Computer Corporation
 UNIVAC I (Universal Automatic Computer)
 US Bureau of Census 1950 calculations
 Became part of Sperry-Rand Corporation
 Late 1950s - UNIVAC II
 Faster
 More memory
 IBM and Sperry entered the computer business in the 1950s
 1953 - IBM 701, for scientific calculations
 1955 - IBM 702, for business applications
Features: First-Generation Computers

 Late 1940s and 1950s
 Stored-program computers
 Programmed in assembly language
 Used magnetic devices and earlier forms of memories
 Examples: IAS, ENIAC, EDVAC, UNIVAC, Mark I, IBM 701
Transistors

 Replaced vacuum tubes
 Smaller
 Cheaper
 Less heat dissipation
 Solid state device
 Made from silicon (sand)
 Invented 1947 at Bell Labs
 Multiplexer, data channel
 Concept of system software
Second Generation (1955-1965) : Transistors

 Transistors replaced vacuum tubes
 Magnetic core memories
 Floating-point arithmetic
 High-level languages used: ALGOL, COBOL and FORTRAN
 System software: compilers, subroutine libraries, batch processing
 Examples: IBM 7000, IBM 7094, DEC (1957) PDP-1
Microelectronics

 Literally - “small electronics”
 A computer is made up of gates, memory cells and interconnections
 These can be manufactured on a semiconductor
 e.g. silicon wafer
Third Generation (1965-1971) : Integrated Circuit

 Integrated circuit (IC) technology
 Semiconductor memories
 Memory hierarchy, virtual memories and caches
 Time-sharing
 Parallel processing and pipelining
 Microprogramming
 Examples: IBM 360 and 370, CYBER, ILLIAC IV, DEC PDP-8 and VAX, Amdahl 470
All Generations of Computer

 Vacuum tube - 1946-1957
 Transistor - 1958-1964
 Small scale integration - 1965 on
 Up to 100 devices on a chip
 Medium scale integration - to 1971
 100-3,000 devices on a chip
 Large scale integration - 1971-1977
 3,000-100,000 devices on a chip
 Very large scale integration - 1978 to date
 100,000-100,000,000 devices on a chip
 Ultra large scale integration
 Over 100,000,000 devices on a chip
Later Generation

 Intel
 1971 - 4004
 First microprocessor
 All CPU components on a single chip
 4 bit
 Followed in 1972 by the 8008
 8 bit
 Both designed for specific applications
 1974 - 8080
 Intel’s first general purpose microprocessor
Pentium Evolution (1)

 8080
 first general purpose microprocessor
 8 bit data path
 Used in first personal computer – Altair
 8086
 much more powerful
 16 bit
 instruction cache, prefetches a few instructions
 8088 (8 bit external bus) used in first IBM PC
 80286
 16 MByte memory addressable
 up from 1 MByte
 80386
 32 bit
 Support for multitasking
Pentium Evolution (2)

 80486
 sophisticated powerful cache and instruction pipelining
 built-in maths co-processor
 Pentium
 Superscalar
 Multiple instructions executed in parallel
 Pentium Pro
 Increased superscalar organization
 Aggressive register renaming
 branch prediction
 data flow analysis
 speculative execution
Computer Performance Measurement
Binary Arithmetic
Scalar Data Types:

 Objective:
 To study how arithmetic operations are performed on binary numbers.
 Information is stored in a computer in the form of binary digits, called bits.
 Only 0 & 1 are available to represent everything
 Positive numbers are stored in binary
e.g. (41)10 = (00101001)2
 No minus sign
1. Sign-Magnitude representation

 Left-most bit (MSB) indicates the sign:
 0 means positive
 1 means negative
e.g.:
 +18 = 00010010
 -18 = 10010010
 Problems
 Need to consider both sign and magnitude in arithmetic
 Two representations of zero (+0 and -0)
+0 = 0000 0000
-0 = 1000 0000
2. 1’s Complement representation

 Negative values are obtained by complementing each bit of the corresponding positive number.
 +3 = 00000011
 +2 = 00000010
 +1 = 00000001
 +0 = 00000000
 -1 = 11111110
 -2 = 11111101
 -3 = 11111100
3. 2’s Complement representation

 Negative values are obtained by adding 1 to the 1’s complement representation.
 +3 = 00000011
 +2 = 00000010
 +1 = 00000001
 +0 = 00000000
 -1 = 11111111
 -2 = 11111110
 -3 = 11111101
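The 2’s complement table above can be reproduced with a small Python sketch (the function names are illustrative, not from the slides):

```python
def to_twos_complement(value, bits=8):
    """Encode a signed integer as an n-bit two's complement pattern."""
    assert -(1 << (bits - 1)) <= value < (1 << (bits - 1)), "value out of range"
    return value & ((1 << bits) - 1)   # masking keeps the low n bits

def from_twos_complement(pattern, bits=8):
    """Decode an n-bit two's complement pattern back to a signed integer."""
    if pattern & (1 << (bits - 1)):    # sign bit set -> negative number
        return pattern - (1 << bits)
    return pattern

print(format(to_twos_complement(-3), '08b'))   # 11111101, matching the table
print(from_twos_complement(0b11111110))        # -2
```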
Multiplication

 Complex
 Work out a partial product for each digit
 Take care with place value (column)
 Add the partial products
Multiplication Example

      1011      Multiplicand (11)10
    x 1101      Multiplier (13)10
    ------
      1011      Partial products
     0000       Note: if the multiplier bit is 1, copy the
    1011        multiplicand (shifted by place value),
   1011         otherwise use zero
  10001111      Product (143)10

 Note: a double-length register is needed to store the result
Unsigned Binary Multiplication

[Figure: hardware arrangement for unsigned binary multiplication]
Flowchart for Unsigned Binary Multiplication
(initially carry (C) and accumulator (A) = 0)
no. of cycles = no. of bits in multiplier = n

[Figure: flowchart]
Execution of Example

[Figure: step-by-step register contents]
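The shift-and-add procedure in the flowchart can be sketched in Python; register names C, A, Q and M follow the slides, while the function name is a hypothetical helper:

```python
def unsigned_multiply(multiplicand, multiplier, bits=4):
    """Shift-and-add multiplication using C (carry), A (accumulator), Q, M."""
    mask = (1 << bits) - 1
    c, a, q, m = 0, 0, multiplier & mask, multiplicand & mask
    for _ in range(bits):               # n cycles, n = bits in multiplier
        if q & 1:                       # Q0 = 1: A <- A + M, carry into C
            total = a + m
            c, a = total >> bits, total & mask
        # shift C, A, Q right one bit as a single (2n+1)-bit unit
        combined = (c << (2 * bits)) | (a << bits) | q
        combined >>= 1
        c, a, q = 0, (combined >> bits) & mask, combined & mask
    return (a << bits) | q              # double-length product held in A,Q

print(unsigned_multiply(0b1011, 0b1101))   # 143, i.e. 11 x 13
```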
Multiplying Negative Numbers

 Straightforward multiplication does not work with negative numbers
 Solution 1
 Convert to positive if required
 Multiply as above
 If the signs were different, negate the answer
 Solution 2
 Booth’s algorithm
 The most common algorithm for multiplying 2’s complement numbers
Example: 7 x 3 = 21

      0111      Multiplicand (7)10
    x 0011      Multiplier (3)10
    ------
      0111      Partial products
     0111       Note: if the multiplier bit is 1, copy the
    0000        multiplicand (shifted by place value),
   0000         otherwise use zero
  00010101      Product (21)10
Operations in Booth’s algorithm

 (initially register A = 0)
no. of cycles = no. of bits in multiplier = n
 A – M = A + (2’s complement of M)
e.g.: M = 0111, (-M) = 1001
 Logical right shift:
e.g. A = 1001; after shifting, A becomes 0100
 Arithmetic right shift (the sign bit is replicated):
e.g. A = 1001; after shifting, A becomes 1100
Booth’s Algorithm

[Figure: flowchart of Booth’s algorithm]

Example of Booth’s Algorithm

[Figure: worked example of Booth’s algorithm]
Multiply -7 and 3, i.e. multiply the following pair of 2’s complement nos. using Booth’s algorithm: 11001 x 00011

 M = (-7) = 11001, so (-M) = (+7) = 00111
 Q = 3 = 00011
 Result: -7 x 3 = 1111101011, which is the 2’s complement of 21, where 21 = 0000010101
Example: -7 x 3 = -21

A      Q      Q-1   M
00000  00011  0     11001   initial values
00111  00011  0     11001   A = A - M  }
00011  10001  1     11001   shift      } First cycle
00001  11000  1     11001   shift      } Second cycle
11010  11000  1     11001   A = A + M  }
11101  01100  0     11001   shift      } Third cycle
11110  10110  0     11001   shift      } Fourth cycle
11111  01011  0     11001   shift      } Fifth cycle

(As the sign bit is 1, the result is negative; take the 2’s complement to read its magnitude.)
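The register-level trace above can be reproduced with a Python sketch of Booth’s algorithm (the function name is illustrative):

```python
def booth_multiply(multiplicand, multiplier, bits=5):
    """Booth's algorithm for signed (two's complement) multiplication.
    Registers: A (accumulator), Q (multiplier), Q_1 (bit right of Q0)."""
    mask = (1 << bits) - 1
    m = multiplicand & mask
    a, q, q_1 = 0, multiplier & mask, 0
    for _ in range(bits):                        # n cycles, n = bits
        if (q & 1, q_1) == (1, 0):               # pair 10: A <- A - M
            a = (a - m) & mask
        elif (q & 1, q_1) == (0, 1):             # pair 01: A <- A + M
            a = (a + m) & mask
        q_1 = q & 1                              # arithmetic right shift A,Q,Q_1
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (a & (1 << (bits - 1)))   # replicate the sign bit of A
    product = (a << bits) | q                    # 2n-bit two's complement result
    return product - (1 << 2 * bits) if product >> (2 * bits - 1) else product

print(booth_multiply(-7, 3))   # -21
print(booth_multiply(7, 3))    # 21
```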
Hardware implementation of Booth’s algorithm:

 M (B) : multiplicand
 Q : multiplier
 A = 0 and Q-1 = 0
 It consists of an n-bit adder and shift/add/subtract control logic.
 The n-bit adder performs addition on 2 inputs: one is the A register, the other is either M or M(bar).
 In case of addition:
 Add(bar)/Sub = 0, therefore C0 = 0 and the multiplicand is directly applied as the second input to the n-bit adder.
 In case of subtraction:
 Add(bar)/Sub = 1, therefore C0 = 1 and the multiplicand is complemented and then applied to the n-bit adder. As a result, the 2’s complement of the multiplicand (M) is added to the A register.
Faster Multiplication – Booth Recoding of the multiplier

Normal multiplication scheme: 0101101 x 0011110

            0101101      (45)10
          x 0011110      (30)10
          ---------
            0000000
           0101101
          0101101
         0101101
        0101101
       0000000
      0000000
   --------------
   00010101000110      (1350)10
Faster Multiplication – Booth Recoding of the multiplier

 The multiplier (30)10 = 0011110 requires adding 4 shifted versions of the multiplicand.
 The Booth recoding technique is derived directly from Booth’s algorithm.
 It halves the maximum no. of summands.
Faster Multiplication – Booth Recoding of the multiplier

Booth multiplier recoding table:

Bit i   Bit i-1   Version of multiplicand selected by bit i
0       0           0 x M
0       1          +1 x M
1       0          -1 x M
1       1           0 x M
Faster Multiplication – Booth Recoding of the multiplier

   0  0  1  1  1  1  0 (0)     <- implied zero to the right of the LSB before recoding
   0 +1  0  0  0 -1  0

Booth recoding of the multiplier 30 = 0011110; here only two summands are required.
Faster Multiplication – Booth Recoding of the multiplier

   0  0  1  0  1  1  0  0  1  1  1  0  1  0  1  1  0  0 (0)     <- implied zero to the right of the LSB before recoding
   0 +1 -1 +1  0 -1  0 +1  0  0 -1 +1 -1 +1  0 -1  0  0
Faster Multiplication – Booth Recoding of the multiplier

Booth-recoded multipliers:

Worst-case multiplier:
   0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0
  +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1

Ordinary multiplier:
   1  1  0  0  0  1  0  1  1  0  1  1  1  1  0  0  0
   0 -1  0  0 +1 -1 +1  0 -1 +1  0  0  0 -1  0  0

Good multiplier:
   0  0  0  0  1  1  1  1  1  0  0  0  0  1  1  1  0
   0  0  0 +1  0  0  0  0 -1  0  0  0 +1  0  0 -1
Faster Multiplication – Booth Recoding of the multiplier

Booth multiplication scheme: 0101101 x 0011110

            0101101
     0 +1 0 0 0 -1 0

     00000000000000
     1111111010011x      <- 2’s complement of the multiplicand, sign-extended
     000000000000xx
     00000000000xxx
     0000000000xxxx
     000101101xxxxx
     00000000xxxxxx
     --------------
     00010101000110
Booth’s Bit-pair recoding

 This technique halves the maximum no. of summands.
 Table: Multiplicand selection decisions

Multiplier bit pair    Multiplier bit    Recoded multiplier    Multiplicand
(i+1)  (i)             (i-1)             bits                  selected
0      0               0                  0  0                  0 x M
0      0               1                  0 +1                 +1 x M
0      1               0                 +1 -1                 +1 x M
0      1               1                 +1  0                 +2 x M
1      0               0                 -1  0                 -2 x M
1      0               1                 -1 +1                 -1 x M
1      1               0                  0 -1                 -1 x M
1      1               1                  0  0                  0 x M
Example of bit-pair recoding derived from Booth recoding:

 Multiplier: 111010 = (-6)

   1  1  1  0  1  0 (0)     <- implied ‘0’ to the right of the LSB; the sign bit is extended
   0  0 -1 +1 -1  0          (Booth recoding)
      0    -1    -2          (bit-pair recoding)
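The two recoding steps can be sketched in Python: first derive the Booth digits (b_{i-1} - b_i), then combine adjacent pairs into base-4 digits. The function name is a hypothetical helper:

```python
def bit_pair_recode(multiplier, bits=6):
    """Recode a two's complement multiplier into base-4 digits in {-2,-1,0,+1,+2}."""
    m = multiplier & ((1 << bits) - 1)
    booth = []
    prev = 0                          # implied 0 to the right of the LSB
    for i in range(bits):
        bit = (m >> i) & 1
        booth.append(prev - bit)      # Booth digit i = b_{i-1} - b_i
        prev = bit
    # pair adjacent Booth digits: digit = 2 * booth[i+1] + booth[i]
    return [booth[i + 1] * 2 + booth[i] for i in range(0, bits, 2)][::-1]

print(bit_pair_recode(0b111010))   # [0, -1, -2], i.e. -6 as base-4 digits
```

Reading the result as base-4 digits, 0·16 + (-1)·4 + (-2)·1 = -6, matching the slide.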
Bit-pair recoding
(+13) x (-6) = (-78) = 1110110010

(+13): 01101
(-6) : 11010, bit-pair recoded: 0 -1 -2

   1111100110      <- -2 x M: 2’s complement of M, shifted left one place
   11110011xx      <- -1 x M: 2’s complement of M
   000000xxxx      <-  0 x M
   ----------
   1110110010
Normal multiplication:

      01101      (+13)
  x   11010      (-6)
  -----------
  0000000000      (bit 0 = 0)
  000001101x      (+13, shifted left 1)
  00000000xx      (bit 2 = 0)
  0001101xxx      (+13, shifted left 3)
  110011xxxx      (sign bit has weight -16: -13, shifted left 4, sign-extended)
  -----------
  1110110010
Division

 More complex than multiplication
 Negative numbers are really bad!
 Based on long division
Division of Unsigned Binary Integers

                 00001101      Quotient
                ---------
  Divisor 1011 ) 10010011      Dividend
                 1011
                 ------
                  001110       Partial
                    1011       remainders
                  ------
                  001111
                    1011
                  ------
                     100       Remainder
Hardware implementation of binary division

 Register M: n-bit positive divisor
 Register Q: n-bit positive dividend
 Register A is set to zero
 Initial carry C is set to zero
 After the division is complete, the n-bit quotient is in register Q, and the remainder is in register A
Flowchart for Unsigned Binary Division - Restoring method

[Figure: flowchart]
Unsigned Binary Division - Restoring method:

 M = divisor
 Q = dividend
 n = no. of cycles = size of dividend (Q)
 Size of M should be taken as (n+1) bits; the additional bit handles the borrow.
 Size of A = size of M
 e.g.:
 Q = 8 = 1000, i.e. n = 4
 M = 3 = 0011
 Take size of M = n+1 = 5 bits
 So, M = 00011, (-M) = 11101
Example: 8/3 = 2 (Remainder = 2)

Initialize       A = 00000   Q = 1000   M = 00011

Cycle 1:
Step 1, L-shift  A = 00001   Q = 000?
Step 2, Add -M       11101
                 A = 11110
Step 3, Set Q0               Q = 0000
Restore, +M          00011
                 A = 00001

Cycle 2:
Step 1, L-shift  A = 00010   Q = 000?
Step 2, Add -M       11101
                 A = 11111
Step 3, Set Q0               Q = 0000
Restore, +M          00011
                 A = 00010
Example: 8/3 = 2 (Remainder = 2) (Continued)

                 A = 00010   Q = 0000   M = 00011

Cycle 3:
Step 1, L-shift  A = 00100   Q = 000?
Step 2, Add -M       11101
                 A = 00001
Step 3, Set Q0               Q = 0001

Cycle 4:
Step 1, L-shift  A = 00010   Q = 001?
Step 2, Add -M       11101
                 A = 11111
Step 3, Set Q0               Q = 0010
Restore, +M          00011
                 A = 00010

Note the “Restore A” operations in cycles 1, 2 and 4. This method is known as RESTORING DIVISION.
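The restoring method above can be sketched in Python; the A, Q and M register names follow the slides, and the function name is illustrative:

```python
def restoring_divide(dividend, divisor, n=4):
    """Restoring division on unsigned integers; A:Q is shifted left each cycle."""
    a, q, m = 0, dividend, divisor
    for _ in range(n):                 # n cycles, n = bits in dividend
        # shift A,Q left one bit; the MSB of Q moves into A
        a = (a << 1) | (q >> (n - 1))
        q = (q << 1) & ((1 << n) - 1)
        a -= m                         # trial subtraction: A <- A - M
        if a < 0:
            q &= ~1                    # Q0 = 0
            a += m                     # restore: A <- A + M
        else:
            q |= 1                     # Q0 = 1
    return q, a                        # quotient in Q, remainder in A

print(restoring_divide(8, 3))   # (2, 2)
```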
Non-Restoring Division

 The restoring division algorithm can be improved by avoiding the restore step after an unsuccessful subtraction.
 A subtraction is said to be unsuccessful if the result is negative (i.e. the result of A - M whenever M > A).
Non-Restoring Division

 Avoids the addition in the restore operation – does exactly one addition or subtraction per cycle.
 Non-restoring division algorithm:
 Step 1: Repeat n times
 Shift A, Q left one bit
 If the sign of A is 0, subtract: A ← A - M
else (sign of A is 1), add: A ← A + M
 If the sign of A is 0, set Q0 = 1
else (sign of A is 1), set Q0 = 0
 Step 2: (after the n Step 1 iterations)
If the sign of A is 1, add: A ← A + M
Non-Restoring Division: 8/3 = 2 (Rem = 2)

Initialize       A = 00000   Q = 1000   M = 00011

Cycle 1:
Step 1, L-shift  A = 00001   Q = 000?
Subtract M (+ -M)    11101
                 A = 11110   Q = 0000

Cycle 2:
Step 1, L-shift  A = 11100   Q = 000?
Add M                00011
                 A = 11111   Q = 0000

Cycle 3:
Step 1, L-shift  A = 11110   Q = 000?
Add M                00011
                 A = 00001   Q = 0001

Cycle 4:
Step 1, L-shift  A = 00010   Q = 001?
Subtract M (+ -M)    11101
                 A = 11111   Q = 0010

Step 2: Add A ← A + M = 11111 + 00011 = 00010 (final remainder)
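The non-restoring trace can likewise be sketched in Python. A is made (n+1) bits wide so its top bit serves as the sign bit, matching the 5-bit registers in the example (function name illustrative):

```python
def nonrestoring_divide(dividend, divisor, n=4):
    """Non-restoring division: exactly one add or subtract per cycle."""
    width = n + 1                    # A is n+1 bits; MSB acts as the sign bit
    sign = 1 << (width - 1)
    mask = (1 << width) - 1
    a, q, m = 0, dividend, divisor
    for _ in range(n):
        a = ((a << 1) | (q >> (n - 1))) & mask      # shift A,Q left one bit
        q = (q << 1) & ((1 << n) - 1)
        if a & sign:
            a = (a + m) & mask       # sign of A is 1: A <- A + M
        else:
            a = (a - m) & mask       # sign of A is 0: A <- A - M
        if not (a & sign):
            q |= 1                   # Q0 = 1 iff A is non-negative
    if a & sign:                     # Step 2: final corrective addition
        a = (a + m) & mask
    return q, a                      # quotient in Q, remainder in A

print(nonrestoring_divide(8, 3))   # (2, 2)
```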
Computer Performance Measurement and Evaluation

 Many dimensions to computer performance
 CPU execution time
 by instruction or sequence
 floating point
 integer
 branch performance
 Cache bandwidth
 Main memory bandwidth
 I/O performance
 bandwidth
 seeks
 pixels or polygons per second
 Relative importance depends on applications
SPEC (System Performance Evaluation Cooperative) CPU Benchmark:

 Benchmark:
A program selected for use in comparing computer performance.
 The benchmarks form a workload that the user hopes will predict the performance of the actual workload.
 Workload:
Programs run on a computer that are either the actual collection of applications run by a user, or are constructed from real programs to approximate such a mix.
 SPEC is an effort funded and supported by a number of computer vendors to create standard sets of benchmarks for modern computer systems.
 In 1989, SPEC originally created a benchmark set focusing on processor performance (now called SPEC89), which has evolved through five generations.
 The latest is SPEC CPU2006, which consists of a set of 12 integer benchmarks (CINT2006) and 17 floating-point benchmarks (CFP2006).
 The integer benchmarks vary from part of a C compiler to a chess program to a quantum computer simulation. The floating-point benchmarks include structured grid codes for finite element modeling, particle method codes for molecular dynamics, and sparse linear algebra codes for fluid dynamics.
Measuring and Reporting Performance

 What do we mean by saying one computer is faster than another?
 its program runs in less time
 Response time or execution time
 the time until users see the output
 Elapsed time
 the latency to complete a task, including disk accesses, memory accesses, I/O activities and operating system overhead
 Throughput (bandwidth)
 the total amount of work done in a given time
 Response time and throughput are often in opposition
Performance

“Increasing” vs. “decreasing”:

 We use the terms “improve performance” and “improve execution time” when we mean increase performance and decrease execution time.

 improve performance = increase performance
 improve execution time = decrease execution time
Metrics of Performance

Software contributors:
 Application: answers per month, operations per second
 Programming Language / Compiler
 ISA: (millions of) instructions per second – MIPS; (millions of) floating-point operations per second – MFLOP/s

Hardware contributors:
 Datapath / Control: megabytes per second
 Function Units: cycles per second (clock rate)
 Transistors, Wires, Pins
Does Anybody Really Know What Time it is?

UNIX time Command

 User CPU Time (time spent in the program): 90.7 sec
 System CPU Time (time spent in the OS): 12.9 sec
 Elapsed Time (response time): 159 sec
 (90.7 + 12.9)/159 x 100 = 65% – the percentage of elapsed time that is CPU time; 35% of the time is spent in I/O or running other programs
Time

CPU time
 time the CPU is computing
 not including time waiting for I/O or running other programs
User CPU time
 CPU time spent in the program
System CPU time
 CPU time spent in the operating system performing tasks requested by the program
CPU time = User CPU time + System CPU time
Performance

System performance
 elapsed time on an unloaded system
CPU performance
 user CPU time on an unloaded system
Computer Performance Measures: Program Execution Time (1/2)

 For a specific program compiled to run on a specific machine (CPU) “A”, the following parameters are provided:
 The total instruction count of the program
 The average number of cycles per instruction (average CPI)
 The clock cycle of machine “A”
Computer Performance Measures: Program Execution Time (2/2)

 How can one measure the performance of this machine running this program?
 Intuitively, the machine is said to be faster, or to have better performance, running this program if the total execution time is shorter.
 Thus the inverse of the total measured program execution time is a possible performance measure or metric:

PerformanceA = 1 / Execution TimeA

How do we compare the performance of different machines? What factors affect performance? How can performance be improved?
Comparing Computer Performance Using Execution Time

 To compare the performance of two machines (or CPUs) “A” and “B” running a given specific program:

PerformanceA = 1 / Execution TimeA
PerformanceB = 1 / Execution TimeB

 Machine A is n times faster than machine B (or slower, if n < 1), where:

Speedup = n = PerformanceA / PerformanceB = Execution TimeB / Execution TimeA
Example

For a given program:
 Execution time on machine A: ExecutionA = 1 second
 Execution time on machine B: ExecutionB = 10 seconds

Speedup = PerformanceA / PerformanceB = Execution TimeB / Execution TimeA = 10/1 = 10

The performance of machine A is 10 times the performance of machine B when running this program, or: machine A is said to be 10 times faster than machine B when running this program.

 The two CPUs may target different ISAs, provided the program is written in a high-level language (HLL)
CPU Execution Time: The CPU Equation

 A program comprises a number of executed instructions, I
 Measured in: instructions/program
 The average instruction takes a number of cycles per instruction (CPI) to complete
 Measured in: cycles/instruction (CPI)
 The CPU has a fixed clock cycle time C = 1/clock rate
 Measured in: seconds/cycle
 CPU execution time is the product of the above three parameters:

CPU time = Seconds/Program = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)

T = I x CPI x C

where T = execution time per program in seconds, I = number of instructions executed, CPI = average CPI for the program, and C = CPU clock cycle time.
CPU Execution Time: Example

 A program is running on a specific machine with the following parameters:
 Total executed instruction count: 10,000,000 instructions
 Average CPI for the program: 2.5 cycles/instruction
 CPU clock rate: 200 MHz (clock cycle = 5x10^-9 seconds)
 What is the execution time for this program?

CPU time = Instruction count x CPI x Clock cycle
         = 10,000,000 x 2.5 x (1 / clock rate)
         = 10,000,000 x 2.5 x 5x10^-9
         = 0.125 seconds
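The CPU equation T = I x CPI x C can be checked against the example with a one-line Python helper (the function name is illustrative):

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = I x CPI x C, where the clock cycle C = 1 / clock rate."""
    return instruction_count * cpi * (1.0 / clock_rate_hz)

# 10,000,000 instructions, CPI 2.5, 200 MHz clock -> 0.125 s, as on the slide
print(cpu_time(10_000_000, 2.5, 200e6))   # 0.125
```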
Aspects of CPU Execution Time

CPU Time = Instruction count x CPI x Clock cycle     (T = I x CPI x C)

 Instruction Count I (executed) depends on: the program used, the compiler, the ISA
 CPI (average CPI) depends on: the program used, the compiler, the ISA, the CPU organization
 Clock Cycle C depends on: the CPU organization, the technology (VLSI)
The basic components of performance and how each is measured:

Component of performance             Units of measure
CPU execution time for a program     Seconds for the program
Instruction count                    Instructions executed for the program
Clock cycles per instruction (CPI)   Average number of clock cycles per instruction
Clock cycle time                     Seconds per clock cycle
Factors Affecting CPU Performance

CPU time = Seconds/Program = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)

                                     Instruction Count I   CPI   Clock Cycle C
Program                              X                     X
Compiler                             X                     X
Instruction Set Architecture (ISA)   X                     X
Organization (CPU Design)                                  X     X
Technology (VLSI)                                                X
Hardware or software component, what it affects, and how:

 Algorithm – affects instruction count, possibly CPI: The algorithm determines the number of source program instructions executed and hence the number of processor instructions executed. The algorithm may also affect the CPI by favoring slower or faster instructions; for example, if the algorithm uses more floating-point operations, it will tend to have a higher CPI.
 Programming language – affects instruction count, CPI: The programming language certainly affects the instruction count, since statements in the language are translated to processor instructions, which determine the instruction count. The language may also affect the CPI because of its features; for example, a language with heavy support for data abstraction (e.g. Java) will require indirect calls, which use higher-CPI instructions.
 Compiler – affects instruction count, CPI: The efficiency of the compiler affects both the instruction count and the average cycles per instruction, since the compiler determines the translation of source language statements into computer instructions. The compiler’s role can be very complex and affect the CPI in complex ways.
 Instruction set architecture – affects instruction count, clock rate, CPI: The instruction set architecture affects all three aspects of CPU performance, since it affects the instructions needed for a function, the cost in cycles of each instruction, and the overall clock rate of the processor.
Performance Comparison: Example
103

 From the previous example: a program is running on a specific machine with the following parameters:
   Total executed instruction count, I: 10,000,000 instructions
   Average CPI for the program: 2.5 cycles/instruction
   CPU clock rate: 200 MHz (clock cycle = 5 ns)
 Using the same program with these changes:
   A new compiler is used: new instruction count = 9,500,000; new CPI = 3.0
   Faster CPU implementation: new clock rate = 300 MHz (clock cycle = 3.33 ns)
 What is the speedup with the changes?

Speedup = Old Execution Time / New Execution Time
        = (I_old x CPI_old x Clock cycle_old) / (I_new x CPI_new x Clock cycle_new)
        = (10,000,000 x 2.5 x 5x10^-9) / (9,500,000 x 3 x 3.33x10^-9)
        = 0.125 / 0.095 = 1.32

i.e., 32% faster after the changes.
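The arithmetic above can be checked with a short script (a sketch; the function name is mine, the values come from the slide):

```python
# CPU time = Instruction count x CPI / Clock rate (equivalently I x CPI x cycle time).
def cpu_time(instr_count, cpi, clock_rate_hz):
    return instr_count * cpi / clock_rate_hz

old = cpu_time(10_000_000, 2.5, 200e6)  # 0.125 s
new = cpu_time(9_500_000, 3.0, 300e6)   # 0.095 s
speedup = old / new                     # ~1.32, i.e., about 32% faster
print(old, new, round(speedup, 2))
```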
Performance Terminology
“X is n% faster than Y” means:

  ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y) = 1 + n/100

  n = 100 x (Performance(X) - Performance(Y)) / Performance(Y)

  n = 100 x (ExTime(Y) - ExTime(X)) / ExTime(X)
Example
Example: Y takes 15 seconds to complete a task; X takes 10 seconds. What % faster is X?

  n = 100 x (ExTime(Y) - ExTime(X)) / ExTime(X)
  n = 100 x (15 - 10) / 10
  n = 50%
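The "n% faster" relation can be wrapped in a tiny helper (a sketch; the function name is mine):

```python
def percent_faster(extime_y, extime_x):
    """n = 100 x (ExTime(Y) - ExTime(X)) / ExTime(X)."""
    return 100 * (extime_y - extime_x) / extime_x

print(percent_faster(15, 10))  # 50.0: X is 50% faster than Y
```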
Speedup
Speedup due to enhancement E:

  Speedup(E) = ExTime w/o E / ExTime w/ E = Performance w/ E / Performance w/o E

Suppose that enhancement E accelerates a fraction Fraction_enhanced of the task by a factor Speedup_enhanced, and the remainder of the task is unaffected. Then what is:
  ExTime(E) = ?
  Speedup(E) = ?
Amdahl’s Law
States that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.

  Speedup = Performance for entire task using the enhancement
            / Performance for entire task without the enhancement

or

  Speedup = Execution time for entire task without the enhancement
            / Execution time for entire task using the enhancement
Amdahl’s Law
  ExTime_new = ExTime_old x ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)

  Speedup_overall = ExTime_old / ExTime_new
                  = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
Example of Amdahl’s Law
 Floating-point instructions improved to run 2x; but only 10% of actual instructions are FP.

  ExTime_new = ExTime_old x (0.9 + 0.1/2) = 0.95 x ExTime_old

  Speedup_overall = 1 / 0.95 = 1.053
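Amdahl's Law drops straight into code; a minimal sketch (function name is mine) applied to the floating-point example above:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    # Speedup_overall = 1 / ((1 - F) + F/S)
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

print(round(amdahl_speedup(0.1, 2), 3))  # 1.053
```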
Performance Enhancement Calculations:
Amdahl's Law
 The performance enhancement possible due to a given design improvement is limited by the amount that the improved feature is used.

 Amdahl's Law: performance improvement or speedup due to enhancement E:

  Speedup(E) = Execution Time without E / Execution Time with E
             = Performance with E / Performance without E

 Suppose that enhancement E accelerates a fraction F of the execution time by a factor S, and the remainder of the time is unaffected. Then:

  Execution Time with E = ((1 - F) + F/S) x Execution Time without E

 Hence the speedup is:

  Speedup(E) = Execution Time without E / (((1 - F) + F/S) x Execution Time without E)
             = 1 / ((1 - F) + F/S)
Pictorial Depiction of Amdahl’s Law
Enhancement E accelerates fraction F of the original execution time by a factor of S.

Before (execution time without enhancement E, normalized to 1):
  | Unaffected fraction: (1 - F) | Affected fraction: F |
    (1 - F) + F = 1

After (execution time with enhancement E):
  | Unaffected fraction: (1 - F) | F/S |
    (unchanged)

  Speedup(E) = Execution Time without E / Execution Time with E = 1 / ((1 - F) + F/S)
Performance Enhancement Example (1/2)
 A program runs in 100 seconds on a machine, with multiply operations responsible for 80 seconds of this time. By how much must the speed of multiplication be improved to make the program four times faster?

  Desired speedup = 4 = 100 / Execution Time with enhancement
  => Execution Time with enhancement = 25 seconds

  25 seconds = (100 - 80) seconds + 80 seconds / n
  25 seconds = 20 seconds + 80 seconds / n
  5 = 80 / n
  n = 80/5 = 16

Hence multiplication must be 16 times faster to get an overall speedup of 4.
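Solving for the required speedup of the enhanced part can be sketched as follows (the function name and the None convention are mine):

```python
def required_speedup(total_time, affected_time, desired_overall):
    """Return the factor n by which the affected part must speed up,
    or None if the target is unreachable even as n -> infinity."""
    target = total_time / desired_overall
    unaffected = total_time - affected_time
    if target <= unaffected:
        return None  # even an infinitely fast enhanced part cannot reach the target
    return affected_time / (target - unaffected)

print(required_speedup(100, 80, 4))  # 16.0
print(required_speedup(100, 80, 5))  # None: unachievable
```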
Performance Enhancement Example (2/2)
 For the previous program running in 100 seconds, with multiply operations responsible for 80 seconds of this time: by how much must the speed of multiplication be improved to make the program five times faster?

  Desired speedup = 5 = 100 / Execution Time with enhancement
  => Execution Time with enhancement = 20 seconds

  20 seconds = (100 - 80) seconds + 80 seconds / n
  20 seconds = 20 seconds + 80 seconds / n
  0 = 80 seconds / n

No amount of multiplication speed improvement can achieve this.
Performance Enhancement Example: Web Serving
 Suppose that we want to enhance the processor used for Web
serving. The new processor is 10 times faster on computation in the
Web serving application than the original processor. Assuming that
the original processor is busy with computation 40% of the time and
is waiting for I/O 60% of the time, what is the overall speedup
gained by incorporating the enhancement? (Amdahl’s Law
problem)
Answer:
  Fraction_enhanced = 0.4, Speedup_enhanced = 10
  Speedup_overall = 1 / (0.6 + 0.4/10) = 1/0.64 ≈ 1.56
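A quick numeric check of the web-serving answer (a minimal sketch; only values from the problem):

```python
f, s = 0.4, 10                   # fraction enhanced, speedup of that fraction
overall = 1 / ((1 - f) + f / s)  # Amdahl's Law
print(round(overall, 2))  # 1.56
```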
Marketing Metrics: MIPS Rating (1/3)
 For a specific program running on a specific CPU, the MIPS rating is a measure of how many millions of instructions are executed per second:

  MIPS Rating = Instruction count / (Execution Time x 10^6)
              = Instruction count / (CPU clocks x Cycle time x 10^6)
              = (Instruction count x Clock rate) / (Instruction count x CPI x 10^6)
              = Clock rate / (CPI x 10^6)

 Major problem with the MIPS rating: as shown above, it does not account for the count of instructions executed (I).
 A higher MIPS rating in many cases may not mean higher performance or better execution time, e.g., due to compiler design variations.
Marketing Metrics: MIPS Rating (2/3)
 In addition, the MIPS rating:

  Does not account for the instruction set architecture (ISA) used; thus it cannot be used to compare computers/CPUs with different instruction sets.

  Is easy to abuse: the program used to obtain the MIPS rating is often omitted. Often the peak MIPS rating is quoted for a given CPU, obtained using a program comprised entirely of instructions with the lowest CPI for that CPU design, which does not represent real programs.
Marketing Metrics: MIPS Rating (3/3)
 Under what conditions can the MIPS rating be used to compare the performance of different CPUs?

 The MIPS rating is only valid for comparing the performance of different CPUs provided that the following conditions are satisfied:

  1. The same program is used (this actually applies to all performance metrics).
  2. The same ISA is used.
  3. The same compiler is used.

 (Thus the resulting programs run on the CPUs to obtain the MIPS rating are identical at the machine-code level, including the same instruction count.)
Compiler Variations, MIPS, Performance: An
Example (1/2)
 For a machine with three instruction classes:

  Instruction class    CPI
  A                     1
  B                     2
  C                     3

 For a given program, two compilers produced the following instruction counts (in millions) for each instruction class:

  Code from:    A    B    C
  Compiler 1    5    1    1
  Compiler 2   10    1    1

 The machine is assumed to run at a clock rate of 100 MHz.
Compiler Variations, MIPS, Performance: An
Example (2/2)
  MIPS = Clock rate / (CPI x 10^6) = 100 MHz / (CPI x 10^6)
  CPI = CPU execution cycles / Instruction count
  CPU clock cycles = Σ (CPI_i x C_i), for i = 1..n
  CPU time = Instruction count x CPI / Clock rate

 For compiler 1:
  CPI_1 = (5 x 1 + 1 x 2 + 1 x 3) / (5 + 1 + 1) = 10/7 = 1.43
  MIPS Rating_1 = (100 x 10^6) / (1.43 x 10^6) = 70.0
  CPU time_1 = ((5 + 1 + 1) x 10^6 x 1.43) / (100 x 10^6) = 0.10 seconds

 For compiler 2:
  CPI_2 = (10 x 1 + 1 x 2 + 1 x 3) / (10 + 1 + 1) = 15/12 = 1.25
  MIPS Rating_2 = (100 x 10^6) / (1.25 x 10^6) = 80.0
  CPU time_2 = ((10 + 1 + 1) x 10^6 x 1.25) / (100 x 10^6) = 0.15 seconds
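The whole two-compiler comparison can be scripted (a sketch; the dictionary layout and names are mine):

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}
CLOCK_HZ = 100e6  # 100 MHz

def stats(counts_millions):
    """CPI, MIPS rating, and CPU time for per-class instruction counts (in millions)."""
    instr = sum(counts_millions.values()) * 1e6
    cycles = sum(CPI_BY_CLASS[k] * n * 1e6 for k, n in counts_millions.items())
    cpi = cycles / instr
    mips = CLOCK_HZ / (cpi * 1e6)
    cpu_time = instr * cpi / CLOCK_HZ
    return cpi, mips, cpu_time

cpi1, mips1, t1 = stats({"A": 5, "B": 1, "C": 1})    # CPI 1.43, 70 MIPS, 0.10 s
cpi2, mips2, t2 = stats({"A": 10, "B": 1, "C": 1})   # CPI 1.25, 80 MIPS, 0.15 s
print(mips1, t1, mips2, t2)
```

This reproduces the inversion: compiler 2 wins on MIPS yet loses on CPU time.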
Marketing Metrics: MFLOPS (1/2)
 A floating-point operation is an addition, subtraction, multiplication, or division applied to numbers in single- or double-precision floating-point representation.

 MFLOPS, for a specific program running on a specific computer, is a measure of millions of floating-point operations (megaflops) per second:

  MFLOPS = Number of floating-point operations / (Execution time x 10^6)

 The MFLOPS rating is a better comparison measure between different machines than the MIPS rating, and is applicable even if the ISAs are different.
Marketing Metrics: MFLOPS(2/2)
 Program-dependent: different programs have different percentages of floating-point operations; e.g., compilers have no floating-point operations and yield a MFLOPS rating of zero.

 Dependent on the type of floating-point operations present in the program.

 Peak MFLOPS rating for a CPU: obtained using a program comprised entirely of the simplest floating-point instructions (with the lowest CPI) for the given CPU design, which does not represent real floating-point programs.
Other ways to measure performance
 (1) Use MIPS (millions of instructions per second):

  MIPS = Instruction Count / (Exec. Time x 10^6) = Clock Rate / (CPI x 10^6)

 MIPS is a rate of operations per unit time.

 Performance can be specified as the inverse of execution time, so faster machines have a higher MIPS rating.

 So, bigger MIPS = faster machine. Right?
Wrong!!!
 There are 3 significant problems with using MIPS:

  Problem 1: MIPS is instruction-set dependent (and different computer brands usually have different instruction sets).

  Problem 2: MIPS varies between programs on the same computer.

  Problem 3: MIPS can vary inversely to performance!

 Let's look at an example of why MIPS doesn't work...
A MIPS Example (1)
 Consider the following computer, with instruction counts (in millions) for each instruction class:

  Code from:    A    B    C
  Compiler 1    5    1    1
  Compiler 2   10    1    1

 The machine runs at 100 MHz. Instruction A requires 1 clock cycle, instruction B requires 2 clock cycles, and instruction C requires 3 clock cycles.

 Note — important formula:

  CPI = CPU Clock Cycles / Instruction Count = ( Σ CPI_i x C_i ) / Instruction Count, for i = 1..n
  CPI_1 = [(5x1) + (1x2) + (1x3)] x 10^6 cycles / ((5 + 1 + 1) x 10^6 instructions) = 10/7 = 1.43

  MIPS_1 = 100 MHz / 1.43 = 69.9

  CPI_2 = [(10x1) + (1x2) + (1x3)] x 10^6 cycles / ((10 + 1 + 1) x 10^6 instructions) = 15/12 = 1.25

  MIPS_2 = 100 MHz / 1.25 = 80.0

 So compiler 2 has a higher MIPS rating and should be faster?

 (Note: 1 million = 10 lakh = 10^6; 1 billion = 100 crore = 10^9.)
 Now let's compare CPU time. Note — important formula:

  CPU Time = Instruction Count x CPI / Clock Rate

  CPU Time_1 = (7 x 10^6 x 1.43) / (100 x 10^6) = 0.10 seconds
  CPU Time_2 = (12 x 10^6 x 1.25) / (100 x 10^6) = 0.15 seconds

 Therefore the code from compiler 1 is faster despite its lower MIPS rating!
Required Reading
 Patterson: Chapter 1, page 26 onwards
 William Stallings: Chapter 2, page 68 onwards
 Zaky: Chapter 1, page 13 onwards
Subjective Question Bank
1. Explain Von Neumann architecture with the help of a neat diagram.
2. Explain Harvard architecture with the help of a neat diagram.
3. Compare Von Neumann architecture with Harvard architecture.
4. What are the attributes of architecture to programmer?
5. What features are considered in computer organization?
6. Draw evolution roadmap of computer.
7. List functions of following registers of CPU: Memory buffer register, memory address register, instruction
register, instruction buffer register
8. Draw and explain the hardware implementation of Booth’s multiplication algorithm.
9. Draw flowchart of Booth’s algorithm for signed multiplication.
10. Perform multiplication operation on the given nos. using Booth’s algorithm :
a. Multiplicand= 1001, Multiplier = 1101
b. Multiplicand = 11011,Multiplier = 00111
11. How does the bit-pair recoding technique achieve faster multiplication? Bit-pair recode the multipliers:
(110110101111001)2 and (0101101010010101)2
12. Draw and explain the hardware implementation of the binary division algorithm.
13. Compare restoring and non- restoring division algorithms.
14. Draw the flowchart of the restoring unsigned division algorithm and perform restoring division for the
following: a. Dividend = 1010, Divisor = 11
b. Dividend = 1100, Divisor = 0011
15. Draw the flowchart of the non-restoring unsigned division algorithm and
divide the following unsigned numbers and justify your answer:
a. Dividend= (15)10, Divisor = (2)10
b. Dividend = 1101, Divisor = 0100
16. List out and explain computer performance measures considered in design.
17. List out levels of programs/benchmarks used to evaluate computer performance.
18. What are the types of benchmarks for performance evaluation of a computer?
19. List out SPEC marks.
20. Explain different metrics of computer performance.
21. What is the relation between CPU time, user CPU time and system CPU time? What is the difference
between CPU and system performance?
22. What are the aspects of CPU execution time?
23. List out and explain factors affecting computer performance.
24. What is the relation between CPI, CPU clock cycles and instruction
count?
25. State Amdahl’s law.
26. What is meant by performance improvement due to enhancement?
27. What is the relation between enhancement E, execution time without enhancement and execution
time with enhancement?
28. What is the relation between average speed, total distance and total time?
29. Define MIPS rating and explain problems associated with it.
30. When is the MIPS rating used to compare performance of different CPUs?
31. What is the relation between CPU time, instruction count, CPI and clock rate?
32. What is the relation between CPI, CPU execution cycles and instruction count?
33. Define MFLOPS related to a computer.
34. List out problems with MIPS.
35. Do the following changes in a computer system increase throughput, decrease response time, or both?
1. Replacing the processor in a computer with a faster version.
2. Adding additional processors to a system that uses multiple processors for separate tasks, like
searching the World Wide Web.

36. Suppose that we want to enhance the processor used for Web serving. The new processor is
10 times faster on computation in the Web serving application than the original processor.
Assuming that the original processor is busy with computation 40% of the time and is waiting
for I/O 60% of the time, what is the overall speedup gained by incorporating the
enhancement? (Amdahl’s Law problem)
37. A program runs in 10 seconds on computer A, which has a 2 GHz clock. We are trying to help a
computer designer build a computer B, which will run this program in 6 seconds. The designer has
determined that a substantial increase in the clock rate is possible, but this increase will affect the
rest of the CPU design, causing computer B to require 1.2 times as many clock cycles as computer A
for this program. What clock rate should we tell the designer to target?
38. Suppose we have two implementations of the same instruction set architecture. Computer A has a
clock cycle time of 250 ps and a CPI of 2.0 for some program, and computer B has a clock cycle time
of 500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program, and by
how much?
39. Consider performance measurements for a program: as shown in the table
a. Which computer has the higher MIPS rating?
b. Which computer is faster?
Measurement          Computer A    Computer B
Instruction count    10 billion    8 billion
Clock rate           4 GHz         4 GHz
CPI                  1.0           1.1