02_Computer Evolution and Performance

Chapter 2 of 'Computer Organization and Architecture' discusses the evolution of computers, starting with the ENIAC and the introduction of the stored program concept by von Neumann. It outlines the transition from vacuum tubes to transistors, detailing their advantages and the development of microelectronics. The chapter also covers various generations of computers, advancements in memory technology, and the implications of Moore's Law on computing performance.


William Stallings

Computer Organization
and Architecture
7th Edition

Chapter 2
Computer Evolution and
Performance
ENIAC - background
• Electronic Numerical Integrator And
Computer
• Eckert and Mauchly
• University of Pennsylvania
• Trajectory tables for weapons
• Started 1943
• Finished 1946
—Too late for war effort
• Used until 1955
ENIAC - details
• Decimal (not binary)
• 20 accumulators of 10 digits
• Programmed manually by switches
• 18,000 vacuum tubes
• 30 tons
• 15,000 square feet
• 140 kW power consumption
• 5,000 additions per second
The Turing Award is known as the "Nobel Prize of computing"
von Neumann/Turing
• Stored Program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from
memory and executing
• Input and output equipment operated by
control unit
• Institute for Advanced Study (IAS),
Princeton, NJ, USA
• IAS computer completed 1952
Structure of von Neumann machine
IAS - details
• 1000 x 40 bit words
—Binary numbers; one word holds two instructions
—2 x 20 bit instructions
• Set of registers (storage in CPU)
—Memory Buffer Register
—Memory Address Register
—Instruction Register
—Instruction Buffer Register
—Program Counter
—Accumulator
—Multiplier Quotient
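The two-instructions-per-word layout can be made concrete: in the IAS format each 20-bit instruction is an 8-bit opcode plus a 12-bit address. A small Python sketch (the opcode values below are made up for illustration):

```python
def unpack_ias_word(word):
    """Split a 40-bit IAS word into two 20-bit instructions,
    each an 8-bit opcode plus a 12-bit address."""
    left = (word >> 20) & 0xFFFFF    # left instruction: high-order 20 bits
    right = word & 0xFFFFF           # right instruction: low-order 20 bits

    def decode(instr):
        opcode = (instr >> 12) & 0xFF   # top 8 of the 20 bits
        address = instr & 0xFFF         # bottom 12 bits
        return opcode, address

    return decode(left), decode(right)

# Pack opcode 0x01 with address 0x123, then opcode 0x05 with address 0x456:
word = (0x01 << 32) | (0x123 << 20) | (0x05 << 12) | 0x456
print(unpack_ias_word(word))   # ((1, 291), (5, 1110))
```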
Structure of IAS –
detail

• Memory Buffer Register (MBR)
• Memory Address Register (MAR)
• Instruction Register (IR)
• Instruction Buffer Register (IBR)
• Program Counter (PC)
• Accumulator (AC)
• Multiplier Quotient (MQ)
• MBR: the register in a computer's processor, or central
processing unit, CPU, that stores the data being transferred to and
from the immediate access store. It acts as a buffer allowing the
processor and memory units to act independently without being
affected by minor differences in operation.
• MAR: register that either stores the memory address from which
data will be fetched to the CPU or the address to which data will
be sent and stored.
• IR: the part of a CPU's control unit that stores the instruction
currently being executed or decoded.
• IBR (Instruction Buffer Register): temporarily holds the right-hand
instruction of a fetched word until it is needed
• PC: commonly called the instruction pointer (IP), is a processor
register that indicates where a computer is in its program
sequence.  holds the memory address of (“points to”) the next
instruction that would be executed.
• AC: a register in which intermediate arithmetic and logic results
are stored. Without a register like an accumulator, it would be
necessary to write the result of each calculation (addition,
multiplication, shift, etc.) to main memory.
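To see how these registers cooperate, here is a toy Python sketch of the fetch step; the class name and memory contents are invented for illustration:

```python
class MiniCPU:
    """Toy model of the instruction fetch step; register names follow
    the text above, memory is modeled as a list of words."""
    def __init__(self, memory):
        self.memory = memory   # main memory
        self.PC = 0            # address of the next instruction
        self.MAR = 0           # address sent to memory
        self.MBR = 0           # data arriving from / going to memory
        self.IR = 0            # instruction currently being decoded

    def fetch(self):
        self.MAR = self.PC                 # PC names the word to read
        self.MBR = self.memory[self.MAR]   # memory read lands in the buffer
        self.IR = self.MBR                 # instruction moves on for decoding
        self.PC += 1                       # point at the next instruction
        return self.IR

cpu = MiniCPU(memory=[0xA1, 0xB2, 0xC3])
print(cpu.fetch(), cpu.PC)   # 161 1
```

The MBR's buffering role is visible here: the memory read targets the MBR, and the processor then moves the value on at its own pace.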
Videos
1. CPU Update - Buses, Registers, and RAM
http://www.youtube.com/watch?v=TcAUHp9jjf8
2. CPU Registers
http://www.youtube.com/watch?v=RbZDezRyFQc
Commercial Computers
• 1947 - Eckert-Mauchly Computer
Corporation
• UNIVAC I (Universal Automatic Computer)
• US Bureau of the Census, 1950
calculations
• Became part of Sperry-Rand Corporation
• Late 1950s - UNIVAC II
—Faster
—More memory
UNIVAC
IBM
• Punched-card processing equipment
• 1953 - the 701
—IBM’s first stored program computer
—Scientific calculations
• 1955 - the 702
—Business applications
• Led to 700/7000 series
Vacuum tube
• In electronics, a vacuum tube, electron tube (in North America), thermionic
valve, or just valve (elsewhere, especially in Britain) is a device used to amplify,
switch, otherwise modify, or create an electrical signal by controlling the
movement of electrons in a low-pressure space. Some special function vacuum
tubes are filled with low-pressure gas: these are so-called soft valves (or tubes), as
distinct from the hard vacuum type which have the internal gas pressure reduced as
far as possible. Almost all depend on the thermal emission of electrons, hence
thermionic.
• For most purposes, the vacuum tube has been replaced by solid-state devices such
as transistors and solid-state diodes. Solid-state devices last much longer, are
smaller, more efficient, more reliable, and cheaper than equivalent vacuum tube
devices. However, tubes are still used in specialized applications: for engineering
reasons, as in high-power radio frequency transmitters; or for their aesthetic appeal,
as in audio amplification. Cathode ray tubes are still used as display devices in
television sets, video monitors, and oscilloscopes, although they are being replaced
by LCDs and other flat-panel displays. A specialized form of the electron tube, the
magnetron, is the source of microwave energy in microwave ovens and some radar
systems
Vacuum tube (真空管): Wikipedia
• An electronic component that controls the direction of electron flow in a circuit and amplifies signals.
• Because of its high cost, poor durability, large size, and low efficiency, the vacuum tube was eventually replaced by the transistor.
• Vacuum tubes can still be found, however, in audio equipment, microwave ovens, and the high-frequency transmitters of satellites. Some fighter aircraft also use vacuum-tube electronics to protect against the electromagnetic pulse of a nuclear blast. The cathode ray tubes in television sets and computer CRT monitors, and the X-ray tubes in X-ray machines, are special kinds of vacuum tube.
• Transistor sound tends to be thin and hard; vacuum-tube sound is fuller and livelier.
Vacuum tube
Transistor

• From Wikipedia, the free encyclopedia

• In electronics, a transistor is a semiconductor device
commonly used to amplify or switch electronic signals. A
transistor is made of a solid piece of a semiconductor
material, with at least three terminals for connection to an
external circuit. A voltage or current applied to one pair of
the transistor's terminals changes the current flowing
through another pair of terminals. Because the controlled
(output) power can be much larger than the controlling
(input) power, the transistor provides amplification of a
signal. The transistor is the fundamental building block of
modern electronic devices, and is used in radio, telephone,
computer and other electronic systems. Some transistors
are packaged individually but most are found in integrated
circuits
Transistor (電晶體)

• From Wikipedia, the free encyclopedia

• A solid-state semiconductor device used for amplification, switching, voltage regulation, signal modulation, and many other functions.
• As a variable switch, a transistor controls the output current based on the input voltage, so it can act as a current switch, much like a mechanical switch (e.g. a relay).
• In analog circuits, transistors are used in amplifiers (audio amplifiers, RF amplifiers) and voltage-regulator circuits; in computer power supplies they are mainly used in switching regulators.
• Transistors are also used in digital circuits, mainly as electronic switches. Digital circuits include logic gates, random access memory (RAM), and microprocessors.
Transistor
Transistors
• Replaced vacuum tubes
• Smaller
• Cheaper
• Less heat dissipation
• Solid State device
• Made from Silicon (Sand)
• Invented 1947 at Bell Labs
• William Shockley et al.
Transistor Based Computers
• Second generation machines
• NCR & RCA produced small transistor
machines
• IBM 7000
• DEC - 1957
—Produced PDP-1
Videos

1. How It's Made - vacuum tubes
http://www.youtube.com/watch?v=8n4WVRKkmww
2. How Transistors Work
http://www.youtube.com/watch?v=ZaBLiciesOU
Microelectronics
• Literally - “small electronics”
• A computer is made up of gates, memory
cells and interconnections
• These can be manufactured on a
semiconductor
• e.g. silicon wafer
Wafer (electronics) (晶圓)

• A wafer is a thin slice of semiconductor material,
such as a silicon crystal, used in the fabrication
of integrated circuits and other microdevices. The
wafer serves as the substrate for microelectronic
devices built in and over the wafer and
undergoes many microfabrication process steps
such as doping or ion implantation, etching,
deposition of various materials, and
photolithographic patterning.
• Several types of solar cells are made from such
wafers. A solar wafer is a circular solar cell
made from the entire wafer (rather than cutting
into smaller rectangular solar cells).
Relationship Among Wafer, Chip, and Gate

Silicon Wafer Processing Animation
http://www.youtube.com/watch?v=LWfCqpJzJYM
Generations of Computer
• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
—Up to 100 devices on a chip
• Medium scale integration - to 1971
—100-3,000 devices on a chip
• Large scale integration - 1971-1977
—3,000 - 100,000 devices on a chip
• Very large scale integration - 1978 -1991
—100,000 - 100,000,000 devices on a chip
• Ultra large scale integration – 1991 -
—Over 100,000,000 devices on a chip
Moore’s Law
• Increased density of components on chip
• Gordon Moore – co-founder of Intel
• Number of transistors on a chip will double every
year
• Since 1970’s development has slowed a little
— Number of transistors doubles every 18 months
• Cost of a chip has remained almost unchanged
• Higher packing density means shorter electrical
paths, giving higher performance
• Smaller size gives increased flexibility
• Reduced power and cooling requirements
• Fewer interconnections increase reliability
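The doubling rule above is easy to express numerically. A small sketch, with illustrative numbers only:

```python
def projected_transistors(start_count, years, doubling_months=18):
    """Project a transistor count under an assumed fixed doubling period
    (18 months, the revised rate quoted above). Purely illustrative."""
    doublings = years * 12 / doubling_months
    return start_count * 2 ** doublings

# Two doublings in three years quadruples the count:
print(projected_transistors(2300, 3))   # 9200.0
```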
Growth in CPU Transistor Count
IBM 360 series

• 1964: replaced (and not compatible with) the
7000 series
• First planned “family” of computers
—Similar or identical instruction sets
—Similar or identical O/S
—Increasing speed
—Increasing number of I/O ports
(i.e. more terminals)
—Increased memory size
—Increased cost
• Multiplexed switch structure
DEC PDP-8
• 1964
• First minicomputer (after miniskirt!)
• Did not need air conditioned room
• Small enough to sit on a lab bench
• $16,000
—$100k+ for IBM 360
• Embedded applications & OEM
• BUS STRUCTURE
DEC - PDP-8 Bus Structure

PCI Express bus card slots (from top to bottom: x4, x16, x1 and x16),
compared to a traditional 32-bit PCI bus card slot (bottom).
Drum memory (磁鼓存儲器)
• Drum memory is a magnetic data storage device and
was an early form of computer memory widely used in
the 1950s and into the 1960s, invented by Gustav
Tauschek in 1932 in Austria. For many machines, a
drum formed the main working memory of the
machine, with data and programs being loaded on to
or off the drum using media such as paper tape or
punch cards. Drums were so commonly used for the
main working memory that these computers were
often referred to as drum machines. Drums were
later replaced as the main working memory by
memory such as core memory and a variety of other
systems which were faster as they had no moving
parts, and which lasted until semiconductor memory
entered the scene.
• A drum is a large metal cylinder that is coated on the
outside surface with a ferromagnetic recording
material. It is, simply put, a hard disk platter in the
form of a drum rather than a flat disk. A row of read-
write heads runs along the long axis of the drum, one
for each track.
Magnetic core memory
• Magnetic core memory, or ferrite-core
memory, is an early form of random access
computer memory. It uses small magnetic
ceramic rings, the cores, through which wires are
threaded to store information via the polarity of
the magnetic field they contain. Such memory is
often just called core memory, or, informally,
core.
• Although computer memory long ago moved to
silicon chips, memory is still occasionally called
"core". This is most obvious in the naming of the
core dump, which refers to the contents of
memory recorded at the time of a program error.
Magnetic core memory (磁芯記憶體)
• Magnetic core memory is an early form of computer memory built from magnetic material. Its principle: a magnetic ring (core) is magnetized in one direction or the other to represent a 1 or 0 bit, and a long string of 1s and 0s represents the stored information.
• Core memory is a random access memory and can serve as a computer's main memory.
• Core memory is non-volatile: even after a crash or power failure, as long as no erroneous write signal occurs, it retains its contents.
Semiconductor Memory
• 1970
• Fairchild
• Size of a single core
—i.e. 1 bit of magnetic core storage
• Holds 256 bits
• Non-destructive read
• Much faster than core
• Capacity approximately doubles each year
Fairchild
• Present day Fairchild Semiconductor
International, Inc. is a spin-off
company resulting from reconstitution of
assets in National Semiconductor. It
inherits the Fairchild name of the original
Fairchild Camera and Instrument, which
had been the cornerstone of the
semiconductor industry since 1957. The
original Fairchild had been acquired by
Schlumberger which then sold it to
National Semiconductor.
flip-flop (正反器)
• In digital circuits, a flip-flop is a term referring
to an electronic circuit (a bistable multivibrator)
that has two stable states and thereby is capable
of serving as one bit of memory. Today, the term
flip-flop usually refers to clocked or edge-
triggered devices (i.e., devices that are a
conceptual combination of a transparent-high
latch with a transparent-low latch).
• A flip-flop is usually controlled by one or two
control signals and/or a gate or clock signal. The
output often includes the complement as well as
the normal output. As flip-flops are implemented
electronically, they require power and ground
connections.
flip-flop (正反器)
• A flip-flop (FF; called 触发器 in mainland China and 正反器 in Taiwan), formally a bistable multivibrator, is a sequential logic element with memory used in digital circuits; it records the binary digits "1" and "0".
• Flip-flops are the basic logic units from which sequential logic circuits and all kinds of complex digital systems are built.
Set-Reset flip-flops (SR flip-flops).

• The fundamental latch is the simple SR flip-flop,
where S and R stand for set and reset
respectively. It can be constructed from a pair of
cross-coupled NOR logic gates. The stored bit is
present on the output marked Q.
• Normally, in storage mode, the S and R inputs
are both low, and feedback maintains the Q and
Q̄ outputs in a constant state, with Q̄ the
complement of Q. If S (Set) is pulsed high while
R is held low, then the Q output is forced high,
and stays high even after S returns low;
similarly, if R (Reset) is pulsed high while S is
held low, then the Q output is forced low, and
stays low even after R returns low.
Set-Reset flip-flops (SR flip-flops).
• RS flip-flop
• The basic RS flip-flop, also called the SR latch, is the simplest kind of flip-flop and the basic building block of all other flip-flop types. Cross-coupling the inputs and outputs of two NAND or two NOR gates produces a basic RS flip-flop.
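The cross-coupled NOR behaviour described above can be simulated directly. A minimal gate-level Python sketch (no timing modeled; the function name is made up):

```python
def settle_sr_latch(S, R, Q, Qbar):
    """Iterate two cross-coupled NOR gates until the outputs stop
    changing, returning the settled (Q, Q-bar) pair."""
    while True:
        new_Q = int(not (R or Qbar))     # NOR gate driven by R and Q-bar
        new_Qbar = int(not (S or Q))     # NOR gate driven by S and Q
        if (new_Q, new_Qbar) == (Q, Qbar):
            return Q, Qbar
        Q, Qbar = new_Q, new_Qbar

Q, Qbar = settle_sr_latch(S=1, R=0, Q=0, Qbar=1)     # pulse S high: set
Q, Qbar = settle_sr_latch(S=0, R=0, Q=Q, Qbar=Qbar)  # S back low: Q holds
print(Q, Qbar)   # 1 0
```

The second call shows the storage behaviour from the text: with S and R both low, feedback keeps Q high even after S returns low.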
Shift register
• In digital circuits, a shift register is a group of
flip flops set up in a linear fashion which have
their inputs and outputs connected together in
such a way that the data is shifted down the line
when the circuit is activated.

Shift registers can have both serial and parallel inputs and
outputs, including serial-in, parallel-out (SIPO)
and parallel-in, serial-out (PISO) types. There
are also types that have both serial and parallel
input and types with serial and parallel output.
There are also bi-directional shift registers
which allow you to vary the direction of the shift
register. The serial input and outputs of a
register can also be connected together to create
a circular shift register. One could also create
multi-dimensional shift registers, which can
perform more complex computation.
Shift register (移位暫存器)
• In digital circuits, a shift register is a device based on flip-flops working under a common clock pulse. Data enters the device serially or in parallel, and on each clock pulse it shifts one bit to the left or right before being presented at the output.
SIPO shift register
Destructive readout
• These are the simplest kind of shift register. The data string is
presented at 'Data In', and is shifted right one stage each time
'Data Advance' is brought high. At each advance, the bit on the
far left (i.e. 'Data In') is shifted into the first flip-flop's output.
The bit on the far right (i.e. 'Data Out') is shifted out and lost.
• The data are stored after each flip-flop on the 'Q' output, so there are
four storage 'slots' available in this arrangement, hence it is a 4-Bit
Register. To give an idea of the shifting pattern, imagine that the register
holds 0000 (so all storage slots are empty). As 'Data In' presents
1,1,0,1,0,0,0,0 (in that order, with a pulse at 'Data Advance' each time;
this is called clocking or strobing), the register steps through the states
0000, 1000, 1100, 0110, 1011, 0101, 0010, 0001, 0000. The left-hand bit
corresponds to the left-most flip-flop's output pin, and so on.
• So the serial output of the entire register is 11010000. As you can
see, if we were to continue to input data, we would get
exactly what was put in, but offset by four 'Data Advance' cycles.
This arrangement is the hardware equivalent of a queue. Also, at
any time, the whole register can be set to zero by bringing the
reset (R) pins high.
• This arrangement performs destructive readout: each datum is
lost once it has been shifted out of the right-most bit.
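The 4-bit destructive-readout register described above can be modeled in a few lines of Python (a list-based sketch; the class and method names are made up):

```python
class ShiftRegister4:
    """List-based model of a 4-bit serial-in shift register with
    destructive readout."""
    def __init__(self):
        self.slots = [0, 0, 0, 0]   # left-most slot is the input end

    def advance(self, data_in):
        data_out = self.slots.pop()      # right-most bit shifted out and lost
        self.slots.insert(0, data_in)    # new bit enters at the left
        return data_out

reg = ShiftRegister4()
for bit in [1, 1, 0, 1]:
    reg.advance(bit)
print(reg.slots)   # [1, 0, 1, 1]
```

Clocking in four more zeros shifts 1, 1, 0, 1 out of 'Data Out', reproducing the input four cycles late, exactly the queue behaviour the text describes.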
• The animation at the link below shows the write/shift
sequence, including the internal state of the shift register:
http://en.wikipedia.org/wiki/Shift_register#Non-destructive_readout
Non-destructive readout
• Non-destructive readout can be achieved using
the configuration shown (in image link provided)
below. Another input line is added - the
Read/Write Control. When this is high (i.e. write)
then the shift register behaves as normal,
advancing the input data one place for every
clock cycle, and data can be lost from the end of
the register. However, when the R/W control is
set low (i.e. read), any data shifted out of the
register at the right becomes the next input at
the left, and is kept in the system. Therefore, as
long as the R/W control is set low, no data can
be lost from the system.
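The Read/Write control can be added to the same kind of list-based model: in read mode the output bit recirculates to the input, so nothing is lost. A sketch (names made up):

```python
class RecirculatingShiftRegister:
    """4-bit shift register with a Read/Write control. In write mode the
    right-most bit is lost; in read mode it recirculates to the input."""
    def __init__(self):
        self.slots = [0, 0, 0, 0]

    def clock(self, data_in=0, write=True):
        data_out = self.slots.pop()
        self.slots.insert(0, data_in if write else data_out)  # recirculate on read
        return data_out

reg = RecirculatingShiftRegister()
for bit in [1, 1, 0, 1]:
    reg.clock(bit, write=True)                     # load the register
first = [reg.clock(write=False) for _ in range(4)]
second = [reg.clock(write=False) for _ in range(4)]
print(first, second)   # [1, 1, 0, 1] [1, 1, 0, 1]
```

Reading the four bits twice returns the same values both times: with R/W held low, the data stay in the system.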
Intel
• 1971 - 4004
—First microprocessor
—All CPU components on a single chip
—4 bit
• Followed in 1972 by 8008
—8 bit
—Both designed for specific applications
• 1974 - 8080
—Intel’s first general purpose microprocessor
Speeding it up
• Pipelining (see following page)
• On board cache
• On board L1 & L2 cache
• Branch prediction
• Data flow analysis
• Speculative execution
—Instructions are scheduled when ready, independently of the original
program order
—Using branch prediction and data flow analysis, some processors
speculatively execute instructions ahead of their actual appearance
in the program execution, holding the results in temporary locations
Performance Balance
• Processor speed increased
• Memory capacity increased
• Memory speed lags behind processor
speed
Logic and Memory Performance Gap
Solutions
• Increase number of bits retrieved at one
time
—Make DRAM “wider” rather than “deeper”
(more bits)
• Change DRAM interface
—Cache
• Reduce frequency of memory access
—More complex cache and cache on chip
• Increase interconnection bandwidth
—High speed buses
—Hierarchy of buses
I/O Devices
• Peripherals with intensive I/O demands
• Large data throughput demands
• Processors can handle this
• Problem moving data
• Solutions:
—Caching
—Buffering
—Higher-speed interconnection buses
—More elaborate bus structures
—Multiple-processor configurations
Typical I/O Device Data Rates
Key is Balance
• Processor components
• Main memory
• I/O devices
• Interconnection structures
Improvements in Chip Organization and
Architecture
• Increase hardware speed of processor
—Fundamentally due to shrinking logic gate size
– More gates, packed more tightly, increasing clock
rate
– Propagation time for signals reduced
• Increase size and speed of caches
—Dedicating part of processor chip
– Cache access times drop significantly
• Change processor organization and
architecture
—Increase effective speed of execution
—Parallelism
Problems with Clock Speed and Logic
Density
• Power
— Power density increases with density of logic and clock
speed
— Dissipating heat
(Capacitance is the ability of a body to hold an electrical charge.)
• RC delay (resistor-capacitor (RC) circuit, also called RC filter or RC network)
— Speed at which electrons flow is limited by the resistance and
capacitance of the metal wires connecting them
— Delay increases as the RC product increases
— Wire interconnects become thinner, increasing resistance
— Wires are closer together, increasing capacitance
• Memory latency
— Memory speeds lag processor speeds
• Solution:
— More emphasis on organizational and architectural
approaches
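The RC-delay point above is just the time constant of the wire. A tiny sketch with illustrative (not process-accurate) numbers:

```python
def rc_time_constant(resistance_ohms, capacitance_farads):
    """Wire delay scales with the RC product (the time constant, seconds)."""
    return resistance_ohms * capacitance_farads

# Thinner wires double R; closer wires double C; the delay quadruples.
base = rc_time_constant(100, 1e-13)     # 100 ohms, 0.1 pF: roughly 10 ps
scaled = rc_time_constant(200, 2e-13)   # shrunk interconnect: roughly 40 ps
print(base, scaled)
```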
Intel Microprocessor Performance
Increased Cache Capacity
• Typically two or three levels of cache
between processor and main memory
• Chip density increased
—More cache memory on chip
– Faster cache access
• Pentium chip devoted about 10% of chip
area to cache
• Pentium 4 devotes about 50%
More Complex Execution Logic
• Enable parallel execution of instructions
• Pipeline works like assembly line
—Different stages of execution of different
instructions at same time along pipeline
• Superscalar allows multiple pipelines
within single processor
—Instructions that do not depend on one
another can be executed in parallel

Pentium 4 instruction pipeline scheduling


Instruction pipeline

• The classic RISC pipeline is broken into five stages,
with a set of flip-flops between each stage:

— 1.Instruction fetch
— 2.Instruction decode and register fetch
— 3.Execute
— 4.Memory access
— 5.Register write back
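The five stages above overlap in time. A small Python sketch of the cycle-by-cycle occupancy of an ideal (stall-free) pipeline:

```python
def pipeline_timeline(n_instructions, stages=("IF", "ID", "EX", "MEM", "WB")):
    """Cycle-by-cycle occupancy of an ideal pipeline: instruction i is in
    stage s during cycle i + s (no stalls or hazards modeled)."""
    depth = len(stages)
    timeline = []
    for cycle in range(n_instructions + depth - 1):
        row = {stages[s]: cycle - s          # which instruction is in stage s
               for s in range(depth)
               if 0 <= cycle - s < n_instructions}
        timeline.append(row)
    return timeline

# Three instructions finish in 3 + 5 - 1 = 7 cycles instead of 15 sequential ones:
for cycle, row in enumerate(pipeline_timeline(3)):
    print(cycle, row)
```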
Instruction pipeline (指令管線化)

• Instruction pipelining is a technique designed to speed up the rate at which instructions pass through computers and other digital electronic devices (the number of instructions executed per unit time).
• http://zh.wikipedia.org/wiki/%E6%8C%87%E4%BB%A4%E7%AE%A1%E7%B7%9A%E5%8C%96
superscalar

• A superscalar CPU architecture implements a form of parallelism
called instruction-level parallelism within a single processor. It
thereby allows faster CPU throughput than would otherwise be
possible at the same clock rate. A superscalar processor executes
more than one instruction during a clock cycle by simultaneously
dispatching multiple instructions to redundant functional units on
the processor. Each functional unit is not a separate CPU core but
an execution resource within a single CPU such as an arithmetic
logic unit, a bit shifter, or a multiplier.
• While a superscalar CPU is typically also pipelined, they are two
different performance enhancement techniques. It is theoretically
possible to have a non-pipelined superscalar CPU or a pipelined
non-superscalar CPU.
• The superscalar technique is traditionally associated with several
identifying characteristics. Note these are applied within a given
CPU core.
• Instructions are issued from a sequential instruction stream
• CPU hardware dynamically checks for data dependencies between
instructions at run time (versus software checking at compile
time)
• Accepts multiple instructions per clock cycle
From scalar to superscalar

• The simplest processors are scalar processors. Each instruction executed by a
scalar processor typically manipulates one or two data items at a time. By
contrast, each instruction executed by a vector processor operates
simultaneously on many data items. An analogy is the difference between
scalar and vector arithmetic. A superscalar processor is sort of a mixture of the
two. Each instruction processes one data item, but there are multiple redundant
functional units within each CPU thus multiple instructions can be processing
separate data items concurrently.
• Superscalar CPU design emphasizes improving the instruction dispatcher
accuracy, and allowing it to keep the multiple functional units in use at all
times. This has become increasingly important when the number of units
increased. While early superscalar CPUs would have two ALUs and a single
FPU, a modern design such as the PowerPC 970 includes four ALUs, two
FPUs, and two SIMD units. If the dispatcher is ineffective at keeping all of
these units fed with instructions, the performance of the system will suffer.
• In a superscalar CPU the dispatcher reads instructions from memory and
decides which ones can be run in parallel, dispatching them to redundant
functional units contained inside a single CPU. Therefore a superscalar
processor can be envisioned having multiple parallel pipelines, each of which
is processing instructions simultaneously from a single instruction thread.
Single instruction, multiple data (SIMD)
• Like vector addition and matrix operations
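The scalar-versus-SIMD distinction can be illustrated in Python; below, one conceptual "instruction" operates on whole vectors at once:

```python
def simd_add(a, b):
    """One conceptual SIMD 'instruction': a single operation applied to
    many data items at once, modeled as element-wise addition."""
    return [x + y for x, y in zip(a, b)]

# A scalar processor needs one add per element; a SIMD unit does all four:
print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))   # [11, 22, 33, 44]
```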
Diminishing Returns
• Internal organization of processors
complex
—Can get a great deal of parallelism
—Further significant increases likely to be
relatively modest
• Benefits from cache are reaching limit
• Increasing clock rate runs into power
dissipation problem
—Some fundamental physical limits are being
reached
New Approach – Multiple Cores
• Multiple processors on single chip
— Large shared cache
• Within a processor, increase in performance
proportional to square root of increase in
complexity
• If software can use multiple processors, doubling
number of processors almost doubles
performance
• So, use two simpler processors on the chip
rather than one more complex processor
• With two processors, larger caches are justified
— Power consumption of memory logic less than
processing logic
• Example: IBM POWER4
— Two cores based on PowerPC
POWER4 Chip Organization
Pentium Evolution (1)
• 8080
— first general purpose microprocessor
— 8 bit data path
— Used in first personal computer – Altair
• 8086
— much more powerful
— 16 bit
— instruction cache, prefetch few instructions
— 8088 (8 bit external bus) used in first IBM PC
• 80286
— 16 Mbyte memory addressable
— up from 1Mb
• 80386
— 32 bit
— Support for multitasking
Pentium Evolution (2)
• 80486
—sophisticated powerful cache and instruction
pipelining
—built in maths co-processor
• Pentium
—Superscalar
—Multiple instructions executed in parallel
• Pentium Pro
—Increased superscalar organization
—Aggressive register renaming
—branch prediction
—data flow analysis
—speculative execution
register renaming
• register renaming refers to a technique used to avoid
unnecessary serialization of program operations imposed
by the reuse of registers by those operations.
• Programs are composed of instructions which operate on
values. The instructions must name these values in order
to distinguish them from one another. A typical instruction
might say, add X and Y and put the result in Z. In this
instruction, X, Y, and Z are the names of storage locations.
• In order to have a compact instruction encoding, most
processor instruction sets have a small set of special
locations which can be directly named. For example, the
x86 instruction set architecture has 8 integer registers,
x86-64 has 16, many RISCs have 32, and IA-64 has 128.
In smaller processors, the names of these locations
correspond directly to elements of a register file.
Out-of-order execution & register renaming
• Consider this piece of code running on an
out-of-order CPU:
• Instructions 4, 5, and 6 are independent of
instructions 1, 2, and 3, but the processor
cannot finish 4 until 3 is done, because 3
would then write the wrong value.
• We can eliminate this restriction by
changing the names of some of the
registers:
• Now instructions 4, 5, and 6 can be
executed in parallel with instructions 1, 2,
and 3, so that the program can be
executed faster.
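The code listing the slide refers to is not reproduced above; the standard illustration (as in the Wikipedia article this text follows) has instructions 1-3 and 4-6 both reusing register R1. The Python sketch below renames every write to a fresh physical register; the (op, dest, srcs) instruction encoding is invented for this example:

```python
def rename_registers(program, num_arch_regs=8):
    """Give every write to an architectural register a fresh physical
    register, so later reuses of a register name no longer conflict
    with earlier reads."""
    mapping = list(range(num_arch_regs))   # architectural -> physical
    next_free = num_arch_regs
    renamed = []
    for op, dest, srcs in program:
        phys_srcs = [mapping[s] for s in srcs]   # read via current mapping
        phys_dest = None
        if dest is not None:
            mapping[dest] = next_free            # fresh name for each write
            phys_dest = next_free
            next_free += 1
        renamed.append((op, phys_dest, phys_srcs))
    return renamed

# Instructions 1-3 and 4-6 both use R1; after renaming, the two halves
# touch disjoint physical registers and can execute in parallel:
program = [
    ("load",  1, []),      # 1: R1 = M[1024]
    ("add",   1, [1]),     # 2: R1 = R1 + 2
    ("store", None, [1]),  # 3: M[1032] = R1
    ("load",  1, []),      # 4: R1 = M[2048]
    ("add",   1, [1]),     # 5: R1 = R1 + 4
    ("store", None, [1]),  # 6: M[2056] = R1
]
print(rename_registers(program))
```

After renaming, instructions 1-3 use one pair of physical registers and 4-6 use another, removing the false dependence the text describes.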
Pentium Evolution (3)
• Pentium II
— MMX technology
— graphics, video & audio processing
• Pentium III
— Additional floating point instructions for 3D graphics
• Pentium 4
— Note Arabic rather than Roman numerals
— Further floating point and multimedia enhancements
• Itanium
— 64 bit
— see chapter 15
• Itanium 2
— Hardware enhancements to increase speed
• See Intel web pages for detailed information on
processors
MMX
• Short for Multimedia Extensions, a set of
57 multimedia instructions built into Intel
microprocessors and other x86-
compatible microprocessors. MMX-enabled
microprocessors can handle many
common multimedia operations, such as
digital signal processing (DSP), that are
normally handled by a separate sound or
video card. However, only software
especially written to call MMX instructions
-- so-called MMX-enabled software -- can
take advantage of the MMX instruction
set.
PowerPC
• 1975: 801 minicomputer project (IBM) originates RISC
• Berkeley RISC I processor
• 1986, IBM commercial RISC workstation product, RT PC.
— Not commercial success
— Many rivals with comparable or better performance
• 1990, IBM RISC System/6000
— RISC-like superscalar machine
— POWER architecture
• IBM alliance with Motorola (68000 microprocessors), and
Apple, (used 68000 in Macintosh)
• Result is PowerPC architecture
— Derived from the POWER architecture
— Superscalar RISC
— Apple Macintosh
— Embedded chip applications
PowerPC Family (1)
• 601:
— Quickly to market. 32-bit machine
• 603:
— Low-end desktop and portable
— 32-bit
— Comparable performance with 601
— Lower cost and more efficient implementation
• 604:
— Desktop and low-end servers
— 32-bit machine
— Much more advanced superscalar design
— Greater performance
• 620:
— High-end servers
— 64-bit architecture
PowerPC Family (2)
• 740/750:
—Also known as G3
—Two levels of cache on chip
• G4:
—Increases parallelism and internal speed
• G5:
—Improvements in parallelism and internal
speed
—64-bit organization
Internet Resources
• http://www.intel.com/
—Search for the Intel Museum
• http://www.ibm.com
• http://www.dec.com
• Charles Babbage Institute
• PowerPC
• Intel Developer Home
