
Slot02 03 CH02 ComputerEvolutionAndPerformace 59 Slides

Chapter 2 of William Stallings' 'Computer Organization and Architecture' discusses the evolution of computer technology, from early vacuum tube computers to modern microprocessors. It covers key performance issues, the development of multicore systems, and various architectures such as GPGPUs. The chapter also emphasizes the importance of assessing computer performance and the historical significance of major computing milestones.

Uploaded by

lkc12052006

Chapter 2
Computer Evolution and Performance
William Stallings: Computer Organization and Architecture, 9th Edition
+ 2

Objectives

Why should we study this chapter?

• How did computers develop? (generations)
• Which applications require very powerful computers?
• What are multicore, MICs (many integrated cores), and GPGPUs (general-purpose graphics processing units)?
• How is computer performance assessed?
+ 3

Objectives
After studying this chapter, you should be able
to:
 Present an overview of the evolution of
computer technology from early digital
computers to the latest microprocessors.
 Understand the key performance issues that
relate to computer design.
 Explain the reasons for the move to multicore
organization, and understand the trade-off
between cache and processor resources on a
single chip.
+ 4

Contents

 2.1- A Brief History of Computers


 2.2- Designing for Performance
 2.3- Multicore, MICs, and GPGPUs
 2.6- Performance Assessment
+ 5

2.1- History of Computers

A generation is marked by an event or an essential invention.
IC: Integrated Circuit

First Generation: Vacuum Tubes

• Basic technology: Vacuum tubes
• Building block: Composition and operation of a vacuum tube
(https://en.wikipedia.org/wiki/Vacuum_tube)
• Typical computers:
  - ENIAC (Electronic Numerical Integrator And Computer)
  - EDVAC (Electronic Discrete Variable Automatic Computer) and John von Neumann
  - IAS computer (Princeton Institute for Advanced Studies)
  - Commercial computers: UNIVAC (Universal Automatic Computer)
  - IBM computers (International Business Machines)
First Generation: ENIAC Computer (self-study)

• Electronic Numerical Integrator And Computer
• Designed and constructed at the University of Pennsylvania
  - Started in 1943 and completed in 1946, by John Mauchly and John Eckert
• World’s first general purpose electronic digital computer
  - The Army’s Ballistics Research Laboratory (BRL) needed a way to supply trajectory tables for new weapons accurately and within a reasonable time frame
  - Was not finished in time to be used in the war effort
• Its first task was to perform a series of calculations that were used to help determine the feasibility of the hydrogen bomb
• Continued to operate under BRL management until 1955, when it was disassembled
ENIAC: Characteristics

• Weighed 30 tons
• Occupied 1500 square feet of floor space
• Contained more than 18,000 vacuum tubes
• Power consumption of 140 kW
• Capable of 5000 additions per second
• Decimal rather than binary machine
• Memory consisted of 20 accumulators, each capable of holding a 10-digit number
• Major drawback was the need for manual programming by setting switches and plugging/unplugging cables
+ 9

John von Neumann

• EDVAC (Electronic Discrete Variable Automatic Computer)
  - First publication of the idea was in 1945
• Stored program concept
  - Attributed to ENIAC designers, most notably the mathematician John von Neumann
  - Program represented in a form suitable for storing in memory alongside the data (program = data + instructions)
• IAS computer
  - Princeton Institute for Advanced Studies
  - Prototype of all subsequent general-purpose computers
  - Completed in 1952

Structure of von Neumann Machine

CA: Central Arithmetical part (arithmetic-logic unit)
CC: Central Control part (program control unit)
+ 11

IAS Memory Formats

• The memory of the IAS consists of 1000 storage locations (called words) of 40 bits each
• Both data and instructions are stored there
• Numbers are represented in binary form and each instruction is a binary code

Figure: data word format and instruction word format
One word contains 2 instructions
Structure of IAS Computer

AC: Accumulator
MQ: Multiplier Quotient
MBR: Memory Buffer Register
IBR: Instruction Buffer Register
PC: Program Counter
IR: Instruction Register
MAR: Memory Address Register
+ 13

Table 2.1 The IAS Instruction Set

Run IAS Machine Code

Hexadecimal code: 010FA210FB
IAS instruction length: 20 bits (8-bit opcode followed by a 12-bit address)

Left instruction: 010FA
Opcode: 01(h) = 0000 0001
Address: 0FA
Load the data in memory word 0FA into AC
AC = [0FA]

Right instruction: 210FB
Opcode: 21(h) = 0010 0001
Address: 0FB
Store AC to memory word 0FB
[0FB] = AC

Net effect: [0FB] = [0FA]
State shown in the figure: AC = 7, word 0FA = 7, word 0FB = 7
A part of exercise 2.7
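
The decoding steps above can be sketched in Python. This is a minimal illustration, assuming the IAS layout of a 40-bit word holding two 20-bit instructions, each made of an 8-bit opcode followed by a 12-bit address. The function name `decode_ias_word` is hypothetical and chosen only for this example.

```python
def decode_ias_word(word: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Split a 40-bit IAS word into (opcode, address) pairs: left, then right."""
    left = (word >> 20) & 0xFFFFF   # upper 20 bits: left instruction
    right = word & 0xFFFFF          # lower 20 bits: right instruction

    def split(instruction: int) -> tuple[int, int]:
        opcode = (instruction >> 12) & 0xFF   # top 8 bits of the instruction
        address = instruction & 0xFFF         # low 12 bits of the instruction
        return opcode, address

    return split(left), split(right)


left, right = decode_ias_word(0x010FA210FB)
print(f"left:  opcode={left[0]:02X} address={left[1]:03X}")
print(f"right: opcode={right[0]:02X} address={right[1]:03X}")
```

Running it on the word from the slide gives opcode 01 with address 0FA for the left instruction and opcode 21 with address 0FB for the right one.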
+ 15

Commercial Computers: UNIVAC (self-study)

• 1947: Eckert and Mauchly formed the Eckert-Mauchly Computer Corporation to manufacture computers commercially
• UNIVAC I (Universal Automatic Computer)
  - First successful commercial computer
  - Was intended for both scientific and commercial applications
  - Commissioned by the US Bureau of Census for 1950 calculations
• The Eckert-Mauchly Computer Corporation became part of the UNIVAC division of the Sperry-Rand Corporation
• UNIVAC II: delivered in the late 1950s
  - Had greater memory capacity and higher performance
  - Backward compatible
IBM (self-study)

• Was the major manufacturer of punched-card processing equipment
• Delivered its first electronic stored-program computer (701) in 1953
  - Intended primarily for scientific applications
• Introduced the 702 product in 1955
  - Hardware features made it suitable for business applications
• Series of 700/7000 computers established IBM as the overwhelmingly dominant computer manufacturer
+ 17

Second Generation: Transistors

• Transistor = transfer + resistor (a device that can pass or resist electric current)
• Building block: Composition and operation of a transistor

More details: https://en.wikipedia.org/wiki/Transistor

• Its operation is similar to that of a vacuum tube
• Smaller, cheaper
• Dissipates less heat than a vacuum tube
• Is a solid state device made from silicon
• Was invented at Bell Labs in 1947
• It was not until the late 1950s that fully transistorized computers were commercially available
• Typical computers: IBM 700/7000 series
+ 18

Second Generation Computers

• Introduced:
  - More complex arithmetic and logic units and control units
  - The use of high-level programming languages
  - Provision of system software which provided the ability to:
    - load programs
    - move data to peripherals and libraries
    - perform common computations
• Appearance of the Digital Equipment Corporation (DEC) in 1957
• PDP-1 (Programmed Data Processor) was DEC’s first computer
• This began the minicomputer phenomenon that would become so prominent (leading) in the third generation
Table 2.3: Example Members of the IBM 700/7000 Series

20

IBM 7094 Configuration (self-study)

Multiplexer: centrally manages a group of devices.
Mag: magnetic
Drum: magnetic drum for storing data
21

Third Generation: Integrated Circuits (IC)

• 1958: the invention of the integrated circuit
  - All components of a circuit are miniaturized, so that all of them can be packed onto a single chip
• Discrete component
  - Single, self-contained transistor
  - Manufactured separately, packaged in their own containers, and soldered or wired together onto Masonite-like circuit boards
  - Manufacturing process was expensive and cumbersome (complex)
• The two most important members of the third generation were the IBM System/360 and the DEC PDP-8
+ 22

Microelectronics: Integrated Circuits

• A computer consists of gates, memory cells, and interconnections among these elements
• The gates and memory cells are constructed of simple digital electronic components
• Data storage – provided by memory cells
• Data processing – provided by gates
• Data movement – the paths among components are used to move data from memory to memory and from memory through gates to memory
• Control – the paths among components can carry control signals
• Exploits the fact that such components as transistors, resistors, and conductors can be fabricated from a semiconductor such as silicon
• Many transistors can be produced at the same time on a single wafer (thin piece) of silicon
• Transistors can be connected with a process of metallization (covering with metal) to form circuits

More details: https://en.wikipedia.org/wiki/Silicon
+ 24

Wafer, Chip, and Gate Relationship

Wafer: a thin piece of silicon (< 1 mm)
Chip Growth

Figure 2.8 Growth in Transistor Count on Integrated Circuits
(Axes: number of transistors versus year; m: million, bn: billion)
Moore’s Law

• 1965, Gordon Moore (co-founder of Intel)
• Observed that the number of transistors that could be put on a single chip was doubling every year
• The pace slowed to a doubling every 18 months in the 1970s but has sustained that rate ever since

Consequences of Moore’s law:
• The cost of computer logic and memory circuitry has fallen at a dramatic rate
• The electrical path length is shortened, increasing operating speed
• Computer becomes smaller and is more convenient to use in a variety of environments
• Reduction in power and cooling requirements
• Fewer interchip connections
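
The doubling rule stated above can be turned into a small projection. The sketch below is only an illustration of the arithmetic: the function name is hypothetical, and the starting count of 2300 transistors (roughly that of the Intel 4004) is an assumed starting point, not a value from the slides.

```python
def projected_transistors(start_count: float, years: float,
                          doubling_period_years: float = 1.5) -> float:
    """Project a transistor count that doubles every `doubling_period_years`."""
    return start_count * 2 ** (years / doubling_period_years)


start = 2300  # assumed starting count, roughly the Intel 4004
for years in (0, 3, 6, 15):
    count = projected_transistors(start, years)
    print(f"after {years:2d} years: about {count:,.0f} transistors")
```

With an 18-month doubling period, three years correspond to two doublings, so the count grows by a factor of four.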
+ 27

Table 2.4: Characteristics of the System/360 Family

28

Table 2.5: Evolution of the PDP-8 (self-study)

PDP: Programmed Data Processor
Produced by Digital Equipment Corporation (DEC)
+ 29

DEC - PDP-8 Bus Structure


DEC: Digital Equipment Corporation
PDP: Programmed Data Processor

Omni (Latin) = for all


Later Generations

• LSI: Large Scale Integration
• VLSI: Very Large Scale Integration
• ULSI: Ultra Large Scale Integration
• Semiconductor Memory
• Microprocessors
Semiconductor Memory

• In 1970 Fairchild produced the first relatively capacious semiconductor memory
  - Chip was about the size of a single core
  - Could hold 256 bits of memory
  - Non-destructive
  - Much faster than core
• In 1974 the price per bit of semiconductor memory dropped below the price per bit of core memory
  - There has been a continuing and rapid decline in memory cost accompanied by a corresponding increase in physical memory density
  - Developments in memory and processor technologies changed the nature of computers in less than a decade
• Since 1970 semiconductor memory has been through 13 generations
  - Each generation has provided four times the storage density of the previous generation, accompanied by declining cost per bit and declining access time
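
The "four times per generation" rule compounds quickly. The short sketch below lists the per-chip capacity for 13 generations; the starting value of 1 Kbit for the first generation is an assumption made for illustration and is not stated on the slide.

```python
START_BITS = 1024   # assumed capacity of the first generation: 1 Kbit per chip
GENERATIONS = 13    # number of generations quoted above

# Each generation holds four times as many bits as the previous one.
densities = [START_BITS * 4 ** g for g in range(GENERATIONS)]

for generation, bits in enumerate(densities, start=1):
    print(f"generation {generation:2d}: {bits:>14,d} bits")
```

Under that assumption, 12 quadruplings take a 1 Kbit chip to 16 Gbit.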
+ 32

Microprocessors

• The density of elements on processor chips continued to rise
  - More and more elements were placed on each chip so that fewer and fewer chips were needed to construct a single computer processor
• 1971: Intel developed the 4004
  - First chip to contain all of the components of a CPU on a single chip
  - Birth of the microprocessor
• 1972: Intel developed the 8008
  - First 8-bit microprocessor
• 1974: Intel developed the 8080
  - First general purpose microprocessor
  - Faster, with a richer instruction set and a larger addressing capability
Evolution of Intel Microprocessors
+ 35

2.2- Designing for Performance

Desktop applications that require the great
power of today’s microprocessor-based systems
include

• Image processing

• Speech recognition

• Videoconferencing

• Multimedia authoring

• Voice and video annotation of files

• Simulation modeling
Microprocessor Speed

Techniques built into contemporary (current) processors include:

Technique: Description
• Pipelining: Processor moves data or instructions into a conceptual pipe with all stages of the pipe processing simultaneously
• Branch prediction: Processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next
• Data flow analysis: Processor analyzes which instructions are dependent on each other’s results, or data, to create an optimized schedule of instructions
• Speculative execution: Using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations, keeping execution engines as busy as possible
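
To see why pipelining helps, the cycle counts with and without overlap can be compared. The sketch below assumes an idealized pipeline: every stage takes one clock cycle and there are no stalls or branch penalties. The function names and the 5-stage, 100-instruction figures are illustrative assumptions, not values from the slides.

```python
def sequential_cycles(stages: int, instructions: int) -> int:
    """Cycles needed when each instruction finishes before the next one starts."""
    return stages * instructions


def pipelined_cycles(stages: int, instructions: int) -> int:
    """Cycles needed in an ideal pipeline: fill the pipe once, then one result per cycle."""
    return stages + (instructions - 1)


stages, instructions = 5, 100  # hypothetical pipeline depth and program length
seq = sequential_cycles(stages, instructions)
pipe = pipelined_cycles(stages, instructions)
print(f"sequential: {seq} cycles, pipelined: {pipe} cycles, speedup: {seq / pipe:.2f}x")
```

For long instruction streams the speedup of this idealized model approaches the number of stages; real processors fall short of it because of dependencies and mispredicted branches.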
+ 37

Performance Balance

• Adjust the organization and architecture to compensate for the mismatch among the capabilities of the various components
• Architectural examples include:
  - Increase the number of bits that are retrieved at one time by making DRAMs “wider” rather than “deeper” and by using wide bus data paths
  - Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory
  - Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip
  - Increase the interconnect bandwidth between processors and memory by using higher speed buses and a hierarchy of buses to buffer and structure data flow
Typical I/O Device Data Rates 38
+ 39

Improvements in Chip Organization and Architecture

• Increase hardware speed of processor
  - Fundamentally due to shrinking logic gate size
  - More gates, packed more tightly, increasing clock rate
  - Propagation time for signals reduced
• Increase size and speed of caches
  - Dedicating part of processor chip
  - Cache access times drop significantly
• Change processor organization and architecture
  - Increase effective speed of instruction execution
  - Parallelism
+ 40

Problems with Clock Speed and Logic Density

• Power
  - Power density increases with density of logic and clock speed
  - Dissipating heat becomes a problem
• RC (Resistance and Capacitance) delay
  - Speed at which electrons flow between transistors is limited by the resistance and capacitance of the metal wires connecting them
  - Delay increases as RC product increases
  - Wire interconnects thinner, increasing resistance
  - Wires closer together, increasing capacitance
• Memory latency
  - Memory speeds lag behind processor speeds
+ Processor Trends
41
+ 42

2.3- Multicore, MICs, and GPGPUs

• Multicore CPU: a CPU with several cores running concurrently
• MIC: Many Integrated Core
• GPGPU: General-Purpose Graphics Processing Unit

Multicore

• The use of multiple processors on the same chip provides the potential to increase performance without increasing the clock rate
• Strategy is to use two simpler processors on the chip rather than one more complex processor
• With two processors larger caches are justified
• As caches became larger it made performance sense to create two and then three levels of cache on a chip
+ 44

Many Integrated Core (MIC) and Graphics Processing Unit (GPU)

MIC
• Leap (fast growth) in performance as well as the challenges in developing software to exploit such a large number of cores
• The multicore and MIC strategy involves a homogeneous (same kind) collection of general purpose processors on a single chip

GPU
• Core designed to perform parallel operations on graphics data
• Traditionally found on a plug-in graphics card, it is used to encode and render 2D and 3D graphics as well as process video
• Used as vector processors for a variety of applications that require repetitive computations
Self-study

2.4- The Evolution of the Intel x86 Architecture
2.5- Embedded Systems and the ARM

Some definitions:
CISC: Complex Instruction Set Computer. The CPU is equipped with a large set of instructions.
RISC: Reduced Instruction Set Computer. The CPU is equipped with basic instructions only, on the idea that a higher-level operation can be built from several basic instructions.
ARM: Advanced RISC Machine
+ 46

2.6- Performance Assessment

Factors affecting computer performance:
• Clock speed and instructions per second
• Instruction execution rate
Methods: Benchmarks
Some laws (self-study):
• Amdahl’s Law
• Little’s Law
+ 47

System Clock
- Digital devices need pulses to operate. Pulses are created by a clock generator (hardware built around a crystal oscillator).
- The rate of pulses is known as the clock rate, or clock speed.
- The time between pulses is the cycle time.
- One increment, or pulse, of the clock is referred to as a clock cycle, or a clock tick.
- Unit: cycles per second, Hertz (Hz)
- Operations performed by a processor, such as fetching an instruction, decoding the instruction, performing an arithmetic operation, and so on, are governed by a system clock.
- Other things being equal, a higher clock rate gives higher performance.
+ 48

Instruction Execution Rate

- Unit: MIPS (millions of instructions per second)
- Unit: MFLOPS (floating-point performance is expressed as millions of floating-point operations per second)
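
The clock rate and the instruction execution rate are linked through the average number of cycles per instruction (CPI). The sketch below uses the standard relations: cycle time = 1 / clock rate, execution time = instruction count × CPI × cycle time, and MIPS rate = clock rate / (CPI × 10^6). The function names, the 400 MHz clock, the 2 million instructions, and the instruction mix are hypothetical values chosen for illustration.

```python
def average_cpi(mix: list[tuple[float, float]]) -> float:
    """Weighted average CPI from (cycles_per_instruction, fraction_of_instructions) pairs."""
    return sum(cpi * fraction for cpi, fraction in mix)


def execution_time_s(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """Execution time T = Ic * CPI * cycle_time, with cycle_time = 1 / clock_hz."""
    return instruction_count * cpi / clock_hz


def mips_rate(clock_hz: float, cpi: float) -> float:
    """Instruction execution rate in millions of instructions per second."""
    return clock_hz / (cpi * 1e6)


# Hypothetical instruction mix: (CPI, fraction of executed instructions).
mix = [(1, 0.60), (2, 0.18), (4, 0.12), (8, 0.10)]
clock_hz = 400e6            # assumed 400 MHz processor
instruction_count = 2e6     # assumed program length

cpi = average_cpi(mix)
print(f"average CPI    : {cpi:.2f}")
print(f"execution time : {execution_time_s(instruction_count, cpi, clock_hz) * 1e3:.2f} ms")
print(f"MIPS rate      : {mips_rate(clock_hz, cpi):.1f}")
```

The example shows that clock rate alone does not fix performance: the same 400 MHz clock gives very different MIPS figures for different instruction mixes.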
+ 49

Benchmark

- A test used to measure hardware or software performance.
- Benchmarks for hardware use programs that test the
capabilities of the equipment
- Benchmarks for software determine the efficiency,
accuracy, or speed of a program in performing a
particular task, such as recalculating data in a
spreadsheet.
- The same data is used with each program tested, so the
resulting scores can be compared to see which programs
perform well and in what areas.
Benchmarks …

For example, consider this high-level language statement:

A = B + C /* assume all quantities in main memory */

With a traditional instruction set architecture, referred to as a complex instruction set computer (CISC), this instruction can be compiled into one processor instruction:

add mem(B), mem(C), mem(A)

On a typical RISC machine, the compilation would look something like this:

load mem(B), reg(1);
load mem(C), reg(2);
add reg(1), reg(2), reg(3);
store reg(3), mem(A)

Note: the two code sequences may need the same amount of time when they execute on their respective machines.
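
The consequence for MIPS ratings can be made concrete. The sketch below assumes, purely for illustration, that both machines complete the statement A = B + C in the same 1 microsecond; the function name and the timing are hypothetical.

```python
def mips_rating(instructions: int, elapsed_us: float) -> float:
    """Instructions per microsecond, which is the same number as MIPS."""
    return instructions / elapsed_us


elapsed_us = 1.0  # assumed time for A = B + C on both machines

cisc = mips_rating(1, elapsed_us)  # one 'add mem, mem, mem' instruction
risc = mips_rating(4, elapsed_us)  # load, load, add, store

print(f"CISC rating: {cisc:.0f} MIPS")
print(f"RISC rating: {risc:.0f} MIPS")
print("Same high-level work done in the same time on both machines.")
```

Under that assumption the RISC machine is rated four times higher in MIPS while doing the same amount of useful work, which is why instruction rate alone is a poor basis for comparing different architectures.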
+ 51

Benchmark
- The design of fair benchmarks is something of an art,
because various combinations of hardware and software
can exhibit widely variable performance under different
conditions. Often, after a benchmark has become a
standard, developers try to optimize a product to run that
benchmark faster than similar products run it in order to
enhance sales (MS Computer Dictionary)
 Beginning in the late 1980s and early 1990s, industry
and academic interest shifted to measuring the
performance of systems using a set of benchmark
programs
+ 52

Desirable Benchmark Characteristics

1. It is written in a high-level language, making it portable across different machines.
2. It is representative of a particular kind of
programming style, such as system
programming, numerical programming, or
commercial programming.
3. It can be measured easily.
4. It has wide distribution.
+ 53

Standard Performance Evaluation Corporation (SPEC)

• Benchmark suite
  - A collection of programs, defined in a high-level language
  - Attempts to provide a representative test of a computer in a particular application or system programming area
• SPEC
  - An industry consortium
  - Defines and maintains the best known collection of benchmark suites
  - Performance measurements are widely used for comparison and research purposes
SPEC CPU2006

• Best known SPEC benchmark suite
• Industry standard suite for processor intensive applications
• Appropriate for measuring performance for applications that spend most of their time doing computation rather than I/O
• Consists of 17 floating point programs written in C, C++, and Fortran and 12 integer programs written in C and C++
• Suite contains over 3 million lines of code
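
A suite of many programs needs a rule for combining per-program results into one figure. SPEC speed metrics do this by dividing a reference run time by the measured run time for each program and then taking the geometric mean of those ratios. The sketch below illustrates that calculation only; the function names and all timings are made up for the example and are not SPEC data.

```python
import math


def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Ratio of the reference machine's run time to the measured run time."""
    return reference_time_s / measured_time_s


def geometric_mean(ratios: list[float]) -> float:
    """Geometric mean, computed through logarithms to avoid overflow on long lists."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical (reference_time, measured_time) pairs in seconds for three programs.
timings = [(9000.0, 600.0), (8000.0, 1000.0), (12000.0, 400.0)]
ratios = [spec_ratio(ref, measured) for ref, measured in timings]

print("ratios:", [round(r, 2) for r in ratios])
print(f"overall metric (geometric mean): {geometric_mean(ratios):.2f}")
```

The geometric mean is used so that no single program with a very large ratio dominates the overall score.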

Amdahl’s Law (self-study)

• Gene Amdahl [AMDA67]
• Deals with the potential speedup of a program using multiple processors compared to a single processor
• Illustrates the problems facing industry in the development of multi-core machines
  - Software must be adapted to a highly parallel execution environment to exploit the power of parallel processing
• Can be generalized to evaluate and design technical improvement in a computer system
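
Amdahl’s Law is usually written as Speedup = 1 / ((1 - f) + f / N), where f is the fraction of execution time that can be parallelized and N is the number of processors. The sketch below evaluates that formula; the function name and the sample value f = 0.9 are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup over one processor when `parallel_fraction` of the work scales perfectly."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


f = 0.9  # assumed: 90% of the execution time is parallelizable
for n in (1, 2, 4, 8, 16, 1024):
    print(f"N = {n:4d}: speedup = {amdahl_speedup(f, n):.2f}")
```

Even with a very large number of processors the speedup stays below 1 / (1 - f), which is 10 for f = 0.9. This is the reason the serial part of a program limits what multicore hardware can deliver.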
+ 57

Little’s Law (self-study)

The general setup is that we have a steady state system to which items arrive at an average rate of λ items per unit time. The items stay in the system an average of W units of time. Finally, there is an average of L units in the system at any one time. Little’s Law relates these three variables as L = λW.

• Fundamental and simple relation with broad applications
• Can be applied to almost any system that is statistically in steady state, and in which there is no leakage
• Queuing system
  - If the server is idle an item is served immediately, otherwise an arriving item joins a queue
  - There can be a single queue for a single server or for multiple servers, or multiple queues with one being for each of multiple servers
• Average number of items in a queuing system equals the average rate at which items arrive multiplied by the time that an item spends in the system
• Relationship requires very few assumptions
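
The relation L = λW can be checked with a few numbers. The sketch below is a minimal illustration; the function names and the sample rates are hypothetical.

```python
def items_in_system(arrival_rate: float, mean_time_in_system: float) -> float:
    """Little's Law: L = lambda * W."""
    return arrival_rate * mean_time_in_system


def mean_time_in_system(items: float, arrival_rate: float) -> float:
    """Little's Law rearranged: W = L / lambda."""
    return items / arrival_rate


# Hypothetical server: 10 requests arrive per second, each spends 0.5 s in the system.
print("average requests in the system:", items_in_system(10.0, 0.5))

# Hypothetical queue: 20 items present on average, 4 arrive per second.
print("average time in the system (s):", mean_time_in_system(20.0, 4.0))
```

Any two of the three quantities determine the third, which is what makes the law handy for quick estimates.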
Questions (Use your notebook)

Building blocks: Composition and operation of vacuum tube/transistor

2.1 What is a stored program computer?

2.2 What are the four main components of any general-purpose computer?

2.3 At the integrated circuit level, what are the three principal
constituents of a computer system?

2.4 Explain Moore’s law.

2.5 List and explain the key characteristics of a computer family.

2.6 What is the key distinguishing feature of a microprocessor?

2.7- Refer to the table 2.1


Summary

Chapter 2: Computer Evolution and Performance

• First generation computers
  - Vacuum tubes
• Second generation computers
  - Transistors
• Third generation computers
  - Integrated circuits
• Performance designs
  - Microprocessor speed
  - Performance balance
  - Chip organization and architecture
• Multi-core
• MICs
• GPGPUs
• Performance assessment
  - Clock speed and instructions per second
  - Benchmarks
  - Amdahl’s Law
  - Little’s Law
