Lecture 7 - The CPU (Part 2)
Components of a computer & the CPU
Part 2 – some more CPU
Foundations of Computing: Technical Strand
João Filipe Ferreira
Clocks [1]
• This “frequency” is measured in Hz (hertz) and is commonly seen
as MHz (megahertz) or GHz (gigahertz).
• For example:
• My work PC is 3.4 GHz
• This means that there are 3,400,000,000 pulses per second
being sent simultaneously to all components in the system [1]
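The arithmetic above can be sketched in a few lines. This is an illustrative helper, not anything from the lecture; the function names are invented.

```python
# Minimal sketch: converting a clock frequency in GHz to pulses per
# second, and to the clock period (time between pulses).

def pulses_per_second(freq_ghz: float) -> int:
    """Convert a frequency in GHz to clock pulses per second (Hz)."""
    return round(freq_ghz * 1_000_000_000)

def clock_period_ns(freq_ghz: float) -> float:
    """Clock period in nanoseconds: 1 / frequency (1 / GHz = ns)."""
    return 1.0 / freq_ghz

print(pulses_per_second(3.4))   # 3400000000 pulses per second
print(clock_period_ns(3.4))     # ~0.294 ns between pulses
```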
Registers
• Registers are small, fast memory locations on board the processor
• They are typically implemented using flip-flops [1]
• They are designed to be small scratch pads for programs
• Their management is under the direct control of the ISA
The Microarchitecture Level
• The microarchitecture view defines what structure a CPU has and
therefore what instructions a CPU is capable of executing
• Level of complexity varies greatly:
• Accumulator-based CPU (single operand)
• Stack-based CPU (single operand, stack control)
• Register-based CPU (two+ operands, register management)
• Other systems: VLIW (Very Long Instruction Word), SIMD (Single
Instruction Multiple Data), MIMD (Multiple Instruction Multiple
Data), … the list is quite extensive
Data Path
• The ALU is just one element of the CPU data path; we can largely divide
elements into two types:
• Combinatorial elements
• Output follows input instantly (in an ideal world!)
• Combination of arithmetic / logic operations
• Examples:
• ALU
• Sign extender
• Number format translator
• Adder
• Multiplexer (MUX)
• State elements
• Sequential circuits (memory-based components that synchronise using the clock)
• Outputs / inputs change only on the clock edge
• Examples:
• Registers
• Memory State Register
• Program Counter

[Figure: combinatorial logic block with an n-bit input bus and an m-bit output bus]
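The two element types can be sketched in Python. This is an illustrative model only (the class and function names are invented): a combinational adder whose output is a pure function of its inputs, and a clocked register whose visible output changes only on a clock edge.

```python
# Sketch of the two data-path element types: combinatorial vs. state.

class Register:
    """State element: holds a value; output updates only on a clock edge."""
    def __init__(self, value=0):
        self.q = value        # current (visible) output
        self.d = value        # pending input
    def set_input(self, d):
        self.d = d            # input changes do not affect the output yet
    def clock_edge(self):
        self.q = self.d       # on the clock edge, latch the pending input

def adder(a, b):
    """Combinatorial element: output follows input immediately."""
    return a + b

pc = Register(0)
pc.set_input(adder(pc.q, 4))  # combinatorial logic computes PC + 4 ...
print(pc.q)                   # 0 -- output unchanged before the edge
pc.clock_edge()
print(pc.q)                   # 4 -- new value appears after the clock edge
```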
Fetch-Decode-Execute
A modern CPU can be described as implementing a
fetch–decode–execute cycle for instructions. These steps
dictate how to interpret and execute instructions:
1. Fetch the next instruction from memory into the Instruction
Register
2. Change the Program Counter to point to the next instruction
(why 2nd in the order? Principle of locality – to discuss later)
3. Determine the type of instruction just fetched
4. If the instruction uses a word in memory, determine where it is and
fetch the word, if needed, into a CPU register
5. Execute the instruction
6. Go to step 1 to begin executing the following instruction

https://www.youtube.com/watch?v=jFDMZpkUWCw
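The steps above can be sketched as a toy interpreter. The tiny accumulator-style instruction set (LOAD/ADD/HALT) and the memory layout here are invented purely for illustration.

```python
# Minimal sketch of the fetch-decode-execute cycle on a toy machine.

memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("HALT", None),
          10: 5, 11: 7}       # instructions at 0-2, data at 10-11

pc, acc = 0, 0
while True:
    ir = memory[pc]           # 1. fetch next instruction into the IR
    pc += 1                   # 2. advance the Program Counter
    op, addr = ir             # 3. decode: determine the instruction type
    if op == "HALT":
        break
    word = memory[addr]       # 4. fetch the memory word the instruction uses
    if op == "LOAD":          # 5. execute
        acc = word
    elif op == "ADD":
        acc += word
                              # 6. loop back to step 1

print(acc)                    # 12
```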
Video 2: Making a Processor
https://www.youtube.com/watch?v=-KTKg0Y1snQ
Sidebar: How CPUs are made -
Video Links
• Intel Factory Tour -
http://www.youtube.com/watch?v=SeGqCl3YAaQ
• http://www.youtube.com/watch?v=Cg-mvrG-K-E
• Only loads and stores should reference memory
• Provide plenty of registers
• In the past we had:
• RISC (reduced instruction set computers)
• CISC (complex instruction set computers)
• Most ISAs are now a mixture of these two design principles
• Have a look at the Intel x86 instruction set:
https://en.wikipedia.org/wiki/X86_instruction_listings
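The load/store principle above can be illustrated with a sketch: where a CISC-style instruction might operate on memory directly, a RISC machine expresses the same work as separate load, add, and store instructions. The register names and mnemonics here are invented.

```python
# Sketch of the load/store design principle: only loads and stores
# reference memory; arithmetic happens between registers.

memory = {100: 7}
regs = {"r1": 3, "r2": 0}

# CISC style (conceptually): ADD [100], r1  -- one instruction touching memory.
# RISC style: the same work as three register-centred instructions:
regs["r2"] = memory[100]              # LOAD  r2, [100]
regs["r2"] = regs["r2"] + regs["r1"]  # ADD   r2, r2, r1
memory[100] = regs["r2"]              # STORE [100], r2

print(memory[100])                    # 10
```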
Computation takes time
• A single-stage computer is limited in the speed it can operate at
due to the depth of gates. We need to speed up its operation!
• Consider this circuit:
• The longest path (called the critical path) is 7 NAND gates long
• If each NAND gate takes 5 ns to operate, this circuit takes 35 ns to give
us an output
• This means at best this circuit can operate at 1/35 ns ≈ 28.6 MHz!
• This seems really slow: think of a laptop running at 2.9 GHz (roughly
100x faster!)
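The critical-path arithmetic above can be written out as a small helper (an illustrative function, not part of the lecture material): the longest gate chain sets the minimum time to produce an output, and hence the maximum clock rate.

```python
# Sketch: maximum clock rate allowed by a circuit's critical path.

def max_clock_mhz(path_depth_gates: int, gate_delay_ns: float) -> float:
    """Maximum clock frequency in MHz for a given critical path."""
    critical_path_ns = path_depth_gates * gate_delay_ns
    return 1000.0 / critical_path_ns   # 1 / ns = GHz, so x1000 gives MHz

# 7 NAND gates x 5 ns each = 35 ns critical path:
print(max_clock_mhz(7, 5.0))           # ~28.57 MHz
```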
Video 3: Moore’s Law
https://www.youtube.com/watch?v=aWLBmapcJRU
[Figure: “Moore’s Law” CPU performance vs. DRAM performance, 1980–2000,
on a log scale (1–1000). The processor-memory performance gap grows
~50% per year.]
Improving Performance
• In order to address the circuit depth issue and the memory
gap we must devise solutions to improve performance:
• Pre-fetching
• Locality optimisation (further reading topic)
• Pipelining
• Superscalar designs
• Multi-processor designs
• Out-of-order execution (further reading topic)
• Multithreading (further reading topic, but we will look at this from
an Operating Systems perspective later on this term)
Caches
• Caches are small, fast (i.e. faster than main memory) blocks of
memory located closer to the CPU than main memory
• They hold blocks of data that the cache predicts are currently in
use, have just been used, or will likely be used by the processor,
plus nearby memory locations (principle of locality)
• Level 1 and 2 caches are typically on the CPU chip
• Level 3 cache can be off chip but, with newer designs, can be
found on chip
• A combination of circuit design and being physically closer to the
processor makes these caches faster than main memory
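A cache lookup can be sketched as follows. This models a tiny direct-mapped cache; the sizes, names, and miss handling are invented for illustration and real caches move whole blocks, not single words.

```python
# Sketch of a direct-mapped cache: recently used words are served from
# small fast storage; only misses go to (slow) main memory.

NUM_LINES = 4                 # a tiny cache: 4 lines, 1 word per line

cache_tags = [None] * NUM_LINES
cache_data = [None] * NUM_LINES
hits = misses = 0

def read(addr, main_memory):
    """Return main_memory[addr], filling the cache on a miss."""
    global hits, misses
    line = addr % NUM_LINES   # index bits pick the cache line
    tag = addr // NUM_LINES   # remaining bits identify the block
    if cache_tags[line] == tag:
        hits += 1             # hit: served from the fast cache
    else:
        misses += 1           # miss: fetch from main memory and keep a copy
        cache_tags[line] = tag
        cache_data[line] = main_memory[addr]
    return cache_data[line]

mem = {a: a * 10 for a in range(16)}
for a in [0, 1, 0, 1, 0]:     # locality: repeated nearby accesses hit
    read(a, mem)
print(hits, misses)           # 3 2
```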
Pre-fetch buffers
• As with caches, pre-fetch buffers are designed to manage the
expected data flow into the processor.
[Figure: instruction processing divided into pipeline stages – stage 1, stage 2, stage 3, … [1]]
Pipelining
• A five-stage pipeline [1]
• The state of each stage as a function of time
• Nine clock cycles are illustrated
• Read the explanation in the book to understand this…
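The timing shown in the figure can be checked with a little arithmetic. With S stages and N instructions, a pipeline finishes in S + N − 1 cycles rather than S × N, because a new instruction enters every cycle. The stage names below follow a common five-stage split and are illustrative.

```python
# Sketch of five-stage pipeline timing.

STAGES = ["Fetch", "Decode", "Fetch operands", "Execute", "Write back"]

def pipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Cycles to finish N instructions when one enters per cycle."""
    return n_stages + n_instructions - 1

def unpipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Cycles if each instruction runs all stages before the next starts."""
    return n_stages * n_instructions

print(pipelined_cycles(5))    # 9 -- matches the nine cycles in the figure
print(unpipelined_cycles(5))  # 25 without pipelining
```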
Superscalar Architectures
• Superscalar architectures act by having multiple physical
hardware units
• The CPU has multiple copies of each pipeline stage
• Each pipeline is independent of the other pipelines
• The CPU executes multiple instructions at once
• Effectively a precursor to multithreading (each running
execution thread takes turns at being executed by the CPU) [1]
Superscalar Architectures
• Having completely separate pipelines is a very expensive way
to add efficiency to the system; however, it is simple
• Alternatively, we can add multiple functional units, as the
decode and fetch / store stages are likely to be simpler
• This adds complexity, but is more hardware-efficient than the
raw pipeline multiplication version [1]
Multi-processor Systems
• Multiple processors are an obvious way to make a system more
effective without redesigning the processor
• A multicore system can be thought of as being a natural
evolution from superscalar and multithreading systems:
• Multiple cores are capable of running multiple physical threads at
once (like multithreading systems).
• Multiple cores have multiple functional units / pipelines, like
superscalar systems.
• Example: Intel i7/i5/i3 or AMD [1]
Many Core Processors
• Multi-core systems have found their natural evolution in array
and stream processor systems
• Graphics cards (GPUs) are typical examples of this kind of
processor
[1]
https://www.youtube.com/watch?v=VcoVYfDVEww
We’ll cover this in greater depth in a couple of weeks…
Summary
In these lectures we have seen that:
• Processors come in many different varieties.
• The design of the processor influences the way in which it can be utilised and the facilities it will
offer.
• A simple CPU has a very limited operational speed.
• Simple improvements such as doubling hardware are expensive.
• More complex improvements such as pipelining allow for greater efficiency at greater complexity.
Directed Study
1. Read the chapter on the microarchitecture level - 6th Edition, Structured Computer Organisation.
2. Read http://www.eecs.berkeley.edu/~knight/cs267/papers/cache_memories.pdf
Investigate:
3. RISC and CISC architectures.
4. Moore’s Law (visit:
http://www.intel.co.uk/content/www/uk/en/history/museum-gordon-moore-law.html )
5. Locality optimisation.
6. Out-of-order execution.
7. Multithreading.
8. The ARM 3 CPU.
9. View the video links.
References
[1] “Structured Computer Organisation”, Andrew Tanenbaum, 2008
Real CPUs: MIPS CPU
• If we consider a simple generic processor architecture such as
MIPS we have a simple CPU architecture diagram.
• Other ISAs will have different implementations as we will see
on the next slide (for the ARM & Pentium), each is tailored to
the specific CPU design.
Real CPUs: The Pentium 3
Real CPUs: The UltraSPARC III Cu Pipeline
A simplified representation of the UltraSPARC III Cu pipeline.
Real CPUs: The Microarchitecture of the 8051 CPU
The microarchitecture of the 8051.