Unit 1: Computer Evolution Performance Measurement and Arithmetic
11/13/2019
Teaching Plan
Lect. No. | Unit | Topics & Contents | Planned Date | Conduction Date
1 | I | Syllabus overview, A Brief History of Computers | |
2 | I | Von Neumann Architecture, Harvard Architecture | |
3 | I | Computer Performance Measurement: Benchmarks (SPEC) for Evaluation; Metrics such as CPU Time, Throughput, etc. | |
4 | I | Aspects & Factors Affecting Computer Performance, Comparing Computer Performances | |
5 | I | Marketing Metrics – MIPS & MFLOPS, Speedup & Amdahl’s Law | |
6 | I | Booth’s Algorithm for Multiplication and its Hardware Implementation | |
7 | I | Bit-Pair Recoding Method | |
8 | I | Division: Restoring and Non-Restoring Algorithms and their Hardware Implementation | |
Architecture & Organization 1
Architecture is those attributes visible to the
programmer
Instruction set, number of bits used for data
representation, I/O mechanisms, addressing techniques.
e.g. Is there a multiply instruction?
Data movement
Control
Functional view
[Figure: functional view of a computer (data movement apparatus, control mechanism, data processing facility)]
Operations (1)
Data movement
e.g. keyboard to screen
[Figure: functional view (data storage facility, data movement apparatus, control mechanism, data processing facility)]
Operations (2)
Storage
e.g. Internet download to disk
[Figure: functional view (data storage facility, data movement apparatus, control mechanism, data processing facility)]
Operation (3)
[Figure: functional view (data movement apparatus, control mechanism, data processing facility)]
Operation (4)
[Figure: functional view (data movement apparatus, control mechanism, data processing facility)]
Structure - Top Level
[Figure: top-level structure. Peripherals and communication lines connect to the computer, which contains the central processing unit, main memory, input/output, and systems interconnection]
Structure - The CPU
[Figure: structure of the CPU. Registers, arithmetic and logic unit, control unit, and internal CPU interconnection; the CPU connects to I/O and memory over the system bus]
Structure - The Control Unit
[Figure: structure of the control unit. Sequencing logic, control unit registers and decoders, and control memory; the control unit sits inside the CPU with the ALU and registers on the internal bus]
Architecture & Organization
Computer system
[Figure: computer system with input/output equipment and main memory]
Functions of Computer
Data processing
Data storage
Data movement
Control
Evolution of Computer
Zeroth Generation : Mechanical Era
(1600-1940)
First Generation : Vacuum Tubes
(1946-1957)
ENIAC – background:
ENIAC - details
First Generation (contd..)
First Generation ( Von-Neumann
Machine contd..)
Ref: Computer Organization and Architecture by William Stallings
Von Neumann/Turing
Completed 1952
Structure of Von Neumann machine
Set of registers (storage in CPU):
Program Counter:
contains the address of the next instruction pair to be
fetched from memory
Accumulator & Multiplier Quotient:
employed to temporarily hold operands and results of
ALU operations.
The result of multiplying two 40-bit numbers is an 80-bit number;
the most significant 40 bits are stored in the AC and the
least significant 40 bits in the MQ.
Structure of IAS – detail
Harvard Architecture
Commercial Computers
Features: First-Generation Computers
Transistors
Second Generation (1955-1965): Transistors
1955 to 1964
Transistor replaced vacuum tubes
Magnetic core memories
Floating-point arithmetic
High-level languages used: ALGOL, COBOL and
FORTRAN
System software: compilers, subroutine libraries,
batch processing
Examples:
IBM 7000 series, IBM 7094, DEC (founded 1957) PDP-1
Microelectronics
Third Generation (1965-1971): Integrated Circuit
Beyond 1965
Integrated circuit (IC) technology
Semiconductor memories
Memory hierarchy, virtual memories and caches
Time-sharing
Parallel processing and pipelining
Microprogramming
Examples:
IBM 360 and 370, CYBER, ILLIAC IV, DEC PDP8 and
VAX, Amdahl 470
All Generations of Computer
Later Generation
Intel
1971 - 4004
First microprocessor
All CPU components on a single chip
4 bit
1974 - 8080
Intel’s first general purpose microprocessor
Pentium Evolution (1)
8080
first general purpose microprocessor
8 bit data path
Used in first personal computer – Altair
8086
much more powerful
16 bit
instruction cache (queue) that prefetches a few instructions
8088 (8 bit external bus) used in first IBM PC
80286
16 Mbyte memory addressable
up from 1 Mbyte
80386
32 bit
Support for multitasking
Pentium Evolution (2)
80486
sophisticated powerful cache and instruction pipelining
built in maths co-processor
Pentium
Superscalar
Multiple instructions executed in parallel
Pentium Pro
Increased superscalar organization
Aggressive register renaming
branch prediction
data flow analysis
speculative execution
Computer Performance Measurement
Binary Arithmetic
Scalar Data Types:
Objective:
To study how the arithmetic operations are
performed on binary numbers.
Information is stored in a computer in the form of
binary digits, called bits.
Only 0 and 1 are available to represent everything
Positive numbers are stored in binary
e.g. (41)10 = (00101001)2
No minus sign
1. Sign-Magnitude representation
2. 1’s Complement representation
3. 2’s Complement representation
+3 = 00000011
+2 = 00000010
+1 = 00000001
+0 = 00000000
-1 = 11111111
-2 = 11111110
-3 = 11111101
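The 8-bit patterns above can be reproduced with a short Python sketch. The function names and the default width of 8 bits are assumptions made for illustration; they are not part of the original slides.

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Return the two's complement bit pattern of `value` using `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking maps a negative value onto its two's complement pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    raw = int(pattern, 2)
    # A set sign bit means the pattern stands for raw - 2**bits.
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw


for n in (3, 2, 1, 0, -1, -2, -3):
    print(f"{n:+d} = {to_twos_complement(n)}")
```

Note that zero has a single representation (00000000) in this scheme, unlike sign-magnitude and 1's complement.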
Multiplication
Complex
Work out partial product for each digit
Take care with place value (column)
Add partial products
Multiplication Example
Unsigned Binary Multiplication
Flowchart for Unsigned Binary Multiplication
(initially carry (C) and accumulator (A) = 0 )
no. of cycles = no. of bits in multiplier = n
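The shift-and-add loop described by the flowchart can be sketched in Python as follows. This is a minimal illustration, not the slide's own code: the function name is hypothetical, and plain integers stand in for the n-bit registers C, A, Q and M.

```python
def unsigned_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Shift-and-add multiplication of two n-bit unsigned integers."""
    mask = (1 << n) - 1
    m, q = multiplicand & mask, multiplier & mask
    a, c = 0, 0  # accumulator A and carry C start at 0
    for _ in range(n):  # one cycle per multiplier bit
        if q & 1:  # Q0 = 1: add the multiplicand, C,A <- A + M
            total = a + m
            c, a = total >> n, total & mask
        # Shift C, A, Q right by one bit as a single combined register.
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q  # 2n-bit product held in A,Q


print(unsigned_multiply(0b1011, 0b1101, 4))  # 11 x 13
```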
Execution of Example
Multiplying Negative Numbers
Solution 2
Booth’s algorithm
Most common algorithm for multiplying 2’s
complement numbers
Example: 7 x 3 = 21
Operations in Booth’s algorithm
(initially register (A) = 0 )
no. of cycles = no. of bits in multiplier = n
A – M = A + (2’s complement of M)
eg: M = 0111, (-M) = 1001
Logical right shifting (a 0 enters at the left):
eg: A = 1 0 0 1
After shifting, A becomes 0 1 0 0
Arithmetic right shifting (the sign bit is replicated):
eg: A = 1 0 0 1
After shifting, A becomes 1 1 0 0
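The operations listed above (add or subtract M depending on the bit pair Q0, Q-1, then an arithmetic right shift of A, Q, Q-1) can be combined into a small Python sketch. The function name and register modelling are assumptions for illustration, and the sketch ignores the corner case where the multiplicand is the most negative n-bit value.

```python
def booth_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Booth's algorithm for two n-bit two's complement operands."""
    mask = (1 << n) - 1
    m = multiplicand & mask
    a, q, q_minus1 = 0, multiplier & mask, 0  # register A starts at 0
    for _ in range(n):  # one cycle per multiplier bit
        pair = (q & 1, q_minus1)
        if pair == (1, 0):
            a = (a - m) & mask  # A <- A - M (adds the 2's complement of M)
        elif pair == (0, 1):
            a = (a + m) & mask  # A <- A + M
        # Arithmetic right shift of A, Q, Q-1: the sign bit of A is kept.
        q_minus1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & (1 << (n - 1)))
    product = (a << n) | q  # 2n-bit result held in A,Q
    if product & (1 << (2 * n - 1)):  # reinterpret as a signed value
        product -= 1 << (2 * n)
    return product


print(booth_multiply(7, 3, 4))
print(booth_multiply(-7, 3, 5))
```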
Booth’s Algorithm
Example of Booth’s Algorithm
Multiply -7 and 3, i.e. multiply the following pair of
2’s complement numbers using Booth’s algorithm:
11001 x 00011
Faster Multiplication –
Booth Recoding of multiplier
                0 1 0 1 1 0 1
              x 0 0 1 1 1 1 0   (30)10
                -------------
                0 0 0 0 0 0 0
              0 1 0 1 1 0 1
            0 1 0 1 1 0 1
          0 1 0 1 1 0 1
        0 1 0 1 1 0 1
      0 0 0 0 0 0 0
+   0 0 0 0 0 0 0
  ---------------------------
  0 0 0 1 0 1 0 1 0 0 0 1 1 0
Faster Multiplication –
Booth Recoding of multiplier
Multiplier          Version of multiplicand
Bit i   Bit i-1     selected by bit i
  0       0            0 x M
  0       1           +1 x M
  1       0           -1 x M
  1       1            0 x M
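The recoding table reduces to one rule: the recoded digit at position i equals bit i-1 minus bit i, with an implied 0 to the right of the LSB. A minimal Python sketch of that rule follows; the function name and the MSB-first string input are assumptions for illustration.

```python
def booth_recode(bits: str) -> list[int]:
    """Booth-recode a two's complement multiplier given MSB first.

    Returns one digit from {-1, 0, +1} per bit, also MSB first.
    """
    padded = bits + "0"  # implied zero to the right of the LSB
    # Digit for bit i = (bit to its right) - (bit i), as in the table.
    return [int(padded[i + 1]) - int(padded[i]) for i in range(len(bits))]


print(booth_recode("0011110"))
```

Long runs of 1s collapse into a single +1 and a single -1, which is why fewer partial products need to be added.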
Multiplier ((0) is the implied zero to the right of the LSB before recoding):
 0  0  1  1  1  1  0  (0)
Recoded multiplier:
 0 +1  0  0  0 -1  0
Multiplier ((0) is the implied zero to the right of the LSB before recoding):
 0  0  1  0  1  1  0  0  1  1  1  0  1  0  1  1  0  0  (0)
Recoded multiplier:
 0 +1 -1 +1  0 -1  0 +1  0  0 -1 +1 -1 +1  0 -1  0  0
Worst-case multiplier ((0) is the implied zero to the right of the LSB):
 0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1  (0)
+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1
Ordinary multiplier:
 1  1  0  0  0  1  0  1  1  0  1  1  1  1  0  0  (0)
 0 -1  0  0 +1 -1 +1  0 -1 +1  0  0  0 -1  0  0
Good multiplier:
 0  0  0  0  1  1  1  1  1  0  0  0  0  1  1  1  (0)
 0  0  0 +1  0  0  0  0 -1  0  0  0 +1  0  0 -1
Booth multiplication scheme: 0101101 * 0011110
   0  1  0  1  1  0  1
x  0 +1  0  0  0 -1  0
Booth’s Bit-pair recoding
Bit pair recoding
(+13) * (-6) = (-78) = 1 1 1 0 1 1 0 0 1 0
(+13): 01101
(-6) : 11010
            0 1 1 0 1
x           0  -1  -2
---------------------------------
  1 1 1 1 1 0 0 1 1 0   (-2 x M: 2’s complement, left shift)
+ 1 1 1 1 0 0 1 1 x x   (-1 x M: 2’s complement)
+ 0 0 0 0 0 0 x x x x   ( 0 x M)
---------------------------------
  1 1 1 0 1 1 0 0 1 0
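The recoded digits 0, -1, -2 used in this example can be derived mechanically: each pair of multiplier bits, together with the bit to its right, selects a multiple of M between -2 and +2. The Python sketch below is an illustration under assumed names (`bit_pair_recode`, `multiply_with_bit_pairs`); sign-extending odd-length multipliers by one bit is also an assumption of this sketch.

```python
def bit_pair_recode(bits: str) -> list[int]:
    """Bit-pair (radix-4) recode a two's complement multiplier, MSB first."""
    if len(bits) % 2:
        bits = bits[0] + bits  # sign-extend to an even number of bits
    padded = bits + "0"  # implied zero to the right of the LSB
    digits = []
    for i in range(0, len(bits), 2):
        b_hi, b_lo, b_right = (int(ch) for ch in padded[i:i + 3])
        digits.append(-2 * b_hi + b_lo + b_right)  # value in -2..+2
    return digits


def multiply_with_bit_pairs(multiplicand: int, multiplier_bits: str) -> int:
    """Sum the selected multiples of the multiplicand, weighted by powers of 4."""
    digits = bit_pair_recode(multiplier_bits)
    return sum(d * multiplicand * 4 ** k for k, d in enumerate(reversed(digits)))


print(bit_pair_recode("11010"))
print(multiply_with_bit_pairs(13, "11010"))
```

Only half as many partial products are summed as there are multiplier bits, which is the source of the speedup.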
Normal multiplication:
0 1 1 0 1 (+13)
X 1 1 0 1 0 (-6)
---------------------------------
0 0 0 0
+ 111111 0 1 x
+ 000000 0 x x
111101 x x x
11101x x x x
---------------------------------
1110110 0 1 0
Division
Division of Unsigned Binary Integers
00001101 Quotient
Hardware implementation of binary division
Register M: n-bit positive divisor
Register Q: n-bit positive dividend
Register A is set to zero
Initial carry C is set to zero
After the division is complete, the n-bit quotient is in register Q
and the remainder is in register A
Flowchart for Unsigned Binary Division-Restoring method
Unsigned Binary Division-Restoring method:
M = divisor
Q = dividend
n= no. of cycles = size of dividend (Q)
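A minimal Python sketch of the restoring method follows, assuming the usual cycle (shift A,Q left, subtract M, and restore A when the result is negative). The function name is hypothetical, and Python integers stand in for the A, Q and M registers.

```python
def restoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Restoring division of an n-bit unsigned dividend by an unsigned divisor.

    Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):  # n cycles, one per dividend bit
        # Shift A,Q left one bit: the MSB of Q moves into A.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        a -= m  # trial subtraction
        if a < 0:
            a += m  # restore A; Q0 stays 0
        else:
            q |= 1  # subtraction succeeded; set Q0 = 1
    return q, a


print(restoring_divide(8, 3, 4))
```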
Example: 8/3 = 2 (Remainder=2)
Initialize A=00000 Q= 1000 M=00011
A= 11111
Step 3, Set Q0 Q= 0000
Restore + M 00011
A= 00010
Example: 8/3 = 2 (Remainder=2)
(Continued)
Non-Restoring Division
Avoids the addition in the restore operation: does
exactly one addition or subtraction per cycle.
Non-restoring division algorithm:
Step 1: Repeat n times
Shift A, Q left one bit
if sign of A is 0,
Subtract, A ← A - M
else (sign of A is 1),
Add, A ← A + M
if sign of A is 0,
Set Q0 = 1
else (sign of A is 1),
Set Q0 = 0
Step 2: (after n Step 1 iterations)
if sign of A is 1,
add A ← A + M
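The two steps above translate almost line for line into Python. This is a sketch under stated assumptions: the function name is hypothetical, and A is modelled as an unbounded signed Python integer rather than a fixed-width register, so the sign test is simply `a >= 0`.

```python
def non_restoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Non-restoring division of an n-bit unsigned dividend.

    Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):  # Step 1: repeat n times
        # Shift A,Q left one bit: the MSB of Q moves into A.
        a = 2 * a + ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        if a >= 0:
            a -= m  # sign of A is 0: subtract
        else:
            a += m  # sign of A is 1: add
        if a >= 0:
            q |= 1  # set Q0 = 1, otherwise Q0 stays 0
    if a < 0:  # Step 2: final correction of the remainder
        a += m
    return q, a


print(non_restoring_divide(8, 3, 4))
```

Each cycle performs exactly one add or subtract, compared with up to two in the restoring method.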
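Non-Restoring Division: 8/3 = 2 (Rem=2)
Initialize       A= 00000  Q= 1000  M= 00011
Cycle 1
Step 1, L-shift  A= 00001  Q= 000?
Subtract - M        11101
                 A= 11110  Q= 0000
Cycle 2
Step 1, L-shift  A= 11100  Q= 000?
Add M               00011
                 A= 11111  Q= 0000
Cycle 3
Step 1, L-shift  A= 11110  Q= 000?
Add M               00011
                 A= 00001  Q= 0001
Cycle 4
Step 1, L-shift  A= 00010  Q= 001?
Subtract - M        11101
                 A= 11111  Q= 0010
Step 2, Add M       00011
                 A= 00010  Q= 0010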
Cache bandwidth
Main memory bandwidth
I/O performance
bandwidth
seeks
pixels or polygons per second
Relative importance depends on applications
SPEC (System Performance Evaluation Cooperative) CPU Benchmark:
Benchmark :
A program selected for use in comparing computer
performance.
The benchmarks form a workload that the user hopes
will predict the performance of the actual workload.
Workload:
programs run on a computer that is either the actual
collection of applications run by a user or
constructed from real programs to approximate such
a mix
SPEC is an effort funded and supported by a number of
computer vendors to create standard sets of
benchmarks for modern computer systems.
In 1989, SPEC originally created a benchmark set
focusing on processor performance (now called
SPEC89), which has evolved through five generations.
The fifth generation, SPEC CPU2006, consists of a set of
12 integer benchmarks (CINT2006) and 17 floating-point
benchmarks (CFP2006).
The integer benchmarks vary from part of a C compiler
to a chess program to a quantum computer simulation.
The floating-point benchmarks include structured grid
codes for finite element modeling, particle method codes
for molecular dynamics, and sparse linear algebra
codes for fluid dynamics.
Measuring and Reporting Performance
Performance
Metrics of Performance
Does Anybody Really Know What Time it is?
UNIX Time Command
Time
CPU time
time the CPU spends computing for the program
not including the time spent waiting for I/O or running other
programs
User CPU time
CPU time spent in the program itself
System CPU time
CPU time spent in the operating system performing tasks
requested by the program
CPU time = User CPU time + System CPU time
Performance
System Performance
elapsed time on unloaded system
CPU performance
user CPU time on an unloaded system
Computer Performance Measures: Program Execution Time (1/2)
Computer Performance Measures: Program Execution Time (2/2)
Comparing Computer Performance Using Execution Time
Example
T = I x CPI x C
where
T = execution time per program, in seconds
I = number of instructions executed
CPI = average CPI for the program
C = CPU clock cycle time
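As a quick numerical check of T = I x CPI x C, the sketch below evaluates the formula for 12 million instructions at an average CPI of 1.25 on a 100 MHz clock (C = 10 ns), the same figures used for compiler 2 later in this unit. The function name is an assumption for illustration.

```python
def cpu_time(instruction_count: float, cpi: float, clock_cycle_s: float) -> float:
    """CPU execution time in seconds: T = I x CPI x C."""
    return instruction_count * cpi * clock_cycle_s


# 12 million instructions, average CPI 1.25, 100 MHz clock (C = 1 / 100e6 s).
t = cpu_time(12e6, 1.25, 1 / 100e6)
print(f"T = {t:.2f} s")  # prints "T = 0.15 s"
```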
CPU Execution Time: Example
Aspects of CPU Execution Time
CPU Time = Instruction count x CPI x Clock cycle
T = I x CPI x C
Instruction count I (executed) depends on:
  Program used
  Compiler
  ISA
CPI (average CPI) depends on:
  Program used
  Compiler
  ISA
  CPU organization
Clock cycle C depends on:
  CPU organization
  Technology (VLSI)
The basic components of performance and how each is measured:
Components of performance            | Units of measure
CPU execution time for a program     | Seconds for the program
Instruction count                    | Instructions executed for the program
Clock cycles per instruction (CPI)   | Average number of clock cycles per instruction
Clock cycle time                     | Seconds per clock cycle
Factors Affecting CPU Performance
CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)
                                     | Instruction Count I | CPI | Clock Cycle C
Program                              |          X          |  X  |
Compiler                             |          X          |  X  |
Instruction Set Architecture (ISA)   |          X          |  X  |
Organization (CPU Design)            |                     |  X  |       X
Technology (VLSI)                    |                     |     |       X
Algorithm
Affects: Instruction count, possibly CPI
How: The algorithm determines the number of source program instructions executed and hence the number of processor instructions executed. The algorithm may also affect the CPI, by favoring slower or faster instructions. For example, if the algorithm uses more floating-point operations, it will tend to have a higher CPI.
Programming language
Affects: Instruction count, CPI
How: The programming language certainly affects the instruction count, since statements in the language are translated to processor instructions, which determine instruction count. The language may also affect the CPI because of its features; for example, a language with heavy support for data abstraction (e.g., Java) will require indirect calls, which will use higher CPI instructions.
Hardware or software components: what each affects, and how
Compiler
Affects: Instruction count, CPI
How: The efficiency of the compiler affects both the instruction count and average cycles per instruction, since the compiler determines the translation of the source language instructions into computer instructions. The compiler’s role can be very complex and affect the CPI in complex ways.
Instruction set architecture
Affects: Instruction count, clock rate, CPI
How: The instruction set architecture affects all three aspects of CPU performance, since it affects the instructions needed for a function, the cost in cycles of each instruction, and the overall clock rate of the processor.
Performance Comparison: Example
Performance Terminology
"X is n% faster than Y" means:
n = 100 x (Performance(X) - Performance(Y)) / Performance(Y)
n = 100 x (ExTime(Y) - ExTime(X)) / ExTime(X)
Example
n = 100(15 - 10) / 10
n = 50%
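The same calculation in Python, as a minimal sketch. The function name is hypothetical, and the execution times 10 and 15 are read from the example as ExTime(X) and ExTime(Y).

```python
def percent_faster(ex_time_x: float, ex_time_y: float) -> float:
    """Return n such that X is n% faster than Y, from execution times."""
    return 100 * (ex_time_y - ex_time_x) / ex_time_x


print(percent_faster(10, 15))  # prints 50.0
```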
Speedup
Amdahl’s Law
Speedup_overall = ExTime_old / ExTime_new
                = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
Example of Amdahl’s Law
Speedup_overall = 1 / 0.95 = 1.053
Performance Enhancement Calculations:
Amdahl's Law
The performance enhancement possible due to a given design improvement is limited by the
amount that the improved feature is used
Amdahl’s Law:
Performance improvement or speedup due to enhancement E:
Pictorial Depiction of Amdahl’s Law
Enhancement E accelerates fraction F of original execution time by a factor of S
Before:
Execution Time without enhancement E: (Before enhancement is applied)
• shown normalized to 1 = (1-F) + F =1
Desired speedup = 4 = 100 / (Execution Time with enhancement)
Performance Enhancement Example (2/2)
Performance Enhancement Example (2/3)
Suppose that we want to enhance the processor used for Web
serving. The new processor is 10 times faster on computation in the
Web serving application than the original processor. Assuming that
the original processor is busy with computation 40% of the time and
is waiting for I/O 60% of the time, what is the overall speedup
gained by incorporating the enhancement? (Amdahl’s Law
problem)
Answer:
Fraction enhanced = 0.4,
Speedup enhanced = 10,
Speedup overall = 1/(0.6+ (0.4/10))
= 1/0.64 ≈ 1.56
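The answer can be checked with a short Python sketch of Amdahl's Law. The function and parameter names are assumptions for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of execution time is accelerated."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Web-serving example: 40% of the time is computation, made 10 times faster.
print(f"{amdahl_speedup(0.4, 10):.2f}")  # prints 1.56
```

Even an unboundedly large Speedup_enhanced cannot push the result past 1 / (1 - 0.4), roughly 1.67, because the 60% spent waiting for I/O is untouched.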
Marketing Metrics: MIPS Rating (1/3)
For a specific program running on a specific CPU, the MIPS rating is a measure
of how many millions of instructions are executed per second:
MIPS Rating = Instruction count / (Execution Time x 10^6)
            = Instruction count / (CPU clocks x Cycle time x 10^6)
            = (Instruction count x Clock rate) / (Instruction count x CPI x 10^6)
            = Clock rate / (CPI x 10^6)
Major problem with MIPS rating: as shown above, the MIPS rating does not
account for the count of instructions executed (I).
A higher MIPS rating in many cases may not mean higher performance or
better execution time, e.g. due to compiler design variations.
Marketing Metrics: MIPS Rating (2/3)
Marketing Metrics: MIPS Rating (3/3)
The MIPS rating is only valid to compare the performance of different CPUs
provided that the following conditions are satisfied:
(Thus the resulting programs used to run on the CPUs and obtain the MIPS
rating are identical at the machine code level including the same instruction
count)
Compiler Variations, MIPS, Performance: An Example (1/2)
Compiler Variations, MIPS, Performance: An Example (2/2)
MIPS = Clock rate / (CPI x 10^6) = (100 x 10^6) / (CPI x 10^6)
CPI = CPU execution cycles / Instruction count
CPU clock cycles = sum over i = 1..n of (CPI_i x C_i)
For compiler 2:
CPI2 = (10 x 1 + 1 x 2 + 1 x 3) / (10 + 1 + 1) = 15 / 12 = 1.25
MIPS Rating2 = (100 x 10^6) / (1.25 x 10^6) = 80.0
CPU time2 = ((10 + 1 + 1) x 10^6 x 1.25) / (100 x 10^6) = 0.15 seconds
Marketing Metrics: MFLOPS (1/2)
Marketing Metrics: MFLOPS (2/2)
Other ways to measure performance
Problem 2:
MIPS varies between programs on the same computer
Problem 3:
MIPS can vary inversely to performance!
Let’s look at an example of why MIPS doesn’t work…
11/13/2019
A MIPS Example (1)
CPU Time1 = (7 x 10^6 x 1.43) / (100 x 10^6) = 0.10 seconds
CPU Time2 = (12 x 10^6 x 1.25) / (100 x 10^6) = 0.15 seconds
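The sketch below puts the two CPU times next to the corresponding MIPS ratings, using the instruction counts, CPIs and 100 MHz clock from this example. The function names and the "compiler 1" / "compiler 2" labels on the data are assumptions for illustration.

```python
CLOCK_RATE_HZ = 100e6  # 100 MHz, as in the example


def mips_rating(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = Clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = Instruction count x CPI / Clock rate."""
    return instruction_count * cpi / clock_rate_hz


# (instruction count, average CPI) for each compiled version of the program.
runs = {"compiler 1": (7e6, 1.43), "compiler 2": (12e6, 1.25)}
for name, (count, cpi) in runs.items():
    mips = mips_rating(CLOCK_RATE_HZ, cpi)
    seconds = cpu_time(count, cpi, CLOCK_RATE_HZ)
    print(f"{name}: MIPS = {mips:.1f}, CPU time = {seconds:.2f} s")
```

The second version earns the higher MIPS rating yet takes longer to run, which is the inverse relationship described under Problem 3.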
Required Reading
Patterson : Chapter-1, page 26 onwards
William Stallings : Chapter-2, page 68 onwards
Zaky : Chapter-1, page 13 onwards
Subjective Question Bank
1. Explain Von Neumann architecture with the help of a neat diagram.
2. Explain Harvard architecture with the help of a neat diagram.
3. Compare Von Neumann architecture with Harvard architecture.
4. Which attributes of architecture are visible to the programmer?
5. What features are considered in computer organization?
6. Draw evolution roadmap of computer.
7. List functions of following registers of CPU: Memory buffer register, memory address register, instruction
register, instruction buffer register
8. Draw and explain the hardware implementation of Booth’s multiplication algorithm.
9. Draw flowchart of Booth’s algorithm for signed multiplication.
10. Perform multiplication operation on the given nos. using Booth’s algorithm :
a. Multiplicand= 1001, Multiplier = 1101
b. Multiplicand = 11011,Multiplier = 00111
11. How does the bit-pair recoding technique achieve faster multiplication? Bit-pair recode the multipliers:
(110110101111001)2 and (0101101010010101)2
12. Draw and explain the hardware implementation of the division algorithm.
13. Compare restoring and non-restoring division algorithms.
14. Draw the flowchart of the restoring algorithm for unsigned division and perform restoring division for the
following: a. Dividend = 1010, Divisor = 11
b. Dividend = 1100, Divisor = 0011
15. Draw the flowchart of the non-restoring algorithm for unsigned division and
divide the following unsigned numbers and justify your answer:
a. Dividend= (15)10, Divisor = (2)10
b. Dividend = 1101, Divisor = 0100
16. List out and explain computer performance measures considered in design.
17. List out levels of programs/benchmarks to evaluate computer performance .
18. What are the types of benchmarks for performance evaluation of
computer?
19. List out SPEC marks.
20. Explain different metrics of computer performance .
21. What is the relation between CPU time, user CPU time and system CPU
time? What is the difference between CPU and system performance?
22. What are the aspects of CPU execution time?
23. List out and explain factors affecting computer performance.
24. What is the relation between CPI, CPU clock cycles and instruction
count?
25. State Amdahl’s law.
26. What is meant by performance improvement due to enhancement?
27. What is the relation between enhancement E, execution time without
enhancement and execution time with enhancement?
28. What is the relation between average speed, total distance and total time?
29. Define MIPS rating and explain problems associated with it.
30. When can the MIPS rating be used to compare the performance of different CPUs?
31. What is the relation between CPU time, instruction count , CPI and clock
rate?
32. What is the relation between CPI, CPU execution cycles and instruction
count?
33. Define MFLOPS as related to computers.
34. List out problems with MIPS.
35. Do the following changes in the computer system increase throughput,
decrease response time, or both?
1. Replacing the processor in a computer with a faster version.
2. Adding additional processors to a system that uses multiple processors
for separate tasks like searching the World Wide Web.
36. Suppose that we want to enhance the processor used for Web serving. The new processor is
10 times faster on computation in the Web serving application than the original processor.
Assuming that the original processor is busy with computation 40% of the time and is waiting
for I/O 60% of the time, what is the overall speedup gained by incorporating the
enhancement? (Amdahl’s Law problem)
37. A program runs in 10 seconds on computer A, which has a 2 GHz clock. We are trying to
help a computer designer build a computer B, which will run this program in 6 seconds. The
designer has determined that a substantial increase in the clock rate is possible, but this
increase will affect the rest of the CPU design, causing computer B to require 1.2 times as
many clock cycles as computer A for this program. What clock rate should we tell the
designer to target?
38. Suppose we have two implementations of the same instruction set architecture. Computer A
has a clock cycle time of 250 ps and a CPI of 2.0 for some program, and computer B has a
clock cycle time of 500ps and a CPI of 1.2 for the same program. Which computer is faster
for this program and by how much?
39. Consider the performance measurements for a program shown in the table:
a. Which computer has the higher MIPS rating?
b. Which computer is faster?
Measurement       | Computer A | Computer B
Instruction count | 10 billion | 8 billion
Clock rate        | 4 GHz      | 4 GHz
CPI               | 1.0        | 1.1