Instruction Set Architectures
Chapter 5
Chapter 5 Objectives
• Understand the factors involved in instruction
set architecture design.
• Gain familiarity with memory addressing
modes.
• Understand the concepts of instruction-level
pipelining and its effect on execution
performance.
5.1 Introduction
• This chapter builds upon the ideas in Chapter 4.
• We present a detailed look at different
instruction formats, operand types, and memory
access methods.
• We will see the interrelation between machine
organization and instruction formats.
• This leads to a deeper understanding of
computer architecture in general.
5.2 Instruction Formats
Instruction sets are differentiated by the following:
• Number of bits per instruction.
• Stack-based or register-based.
• Number of explicit operands per instruction.
• Operand location.
• Types of operations.
• Type and size of operands.
5.2 Instruction Formats
Instruction set architectures are measured
according to:
• Main memory space occupied by a program.
• Instruction complexity.
• Instruction length (in bits).
• Total number of instructions in the instruction
set.
5.2 Instruction Formats
In designing an instruction set, consideration is
given to:
• Instruction length.
– Whether short, long, or variable.
• Number of operands.
• Number of addressable registers.
• Memory organization.
– Whether byte- or word-addressable.
• Addressing modes.
– Choose any or all: direct, indirect or indexed.
5.2 Instruction Formats
• Byte ordering, or endianness, is another major
architectural consideration.
• If we have a two-byte integer, the integer may be
stored so that the least significant byte is followed
by the most significant byte or vice versa.
– In little endian machines, the least significant byte is
followed by the most significant byte.
– Big endian machines store the most significant byte
first (at the lower address).
5.2 Instruction Formats
• As an example, suppose we have the
hexadecimal number 12345678.
• The big endian and little endian arrangements of
the bytes are shown below.
– Big endian: 12 34 56 78 (most significant byte at the lowest address).
– Little endian: 78 56 34 12 (least significant byte at the lowest address).
Note: This is the internal storage format, usually invisible to the user
5.2 Instruction Formats
• Big endian:
– Is more natural.
– The sign of the number can be determined by looking
at the byte at address offset 0.
– Strings and integers are stored in the same order.
• Little endian:
– Makes it easier to place values on non-word
boundaries, e.g. odd or even addresses
– Conversion from a 32-bit integer to a 16-bit integer
does not require any address arithmetic: the low-order
bytes sit at the same starting address.
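A minimal C sketch (not part of the original slides) that inspects how the example value 0x12345678 is laid out in memory on the machine running it:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t value = 0x12345678;              /* the example value from the slides */
        unsigned char *bytes = (unsigned char *)&value;

        /* Print the bytes in increasing address order.
           Big endian stores    12 34 56 78,
           little endian stores 78 56 34 12.       */
        for (int i = 0; i < 4; i++)
            printf("offset %d: %02X\n", i, bytes[i]);

        if (bytes[0] == 0x12)
            printf("This machine is big endian.\n");
        else
            printf("This machine is little endian.\n");
        return 0;
    }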
Standard…What Standard?
• Intel (80x86), VAX are little-endian
• IBM 370, Motorola 680x0 (Mac), and most RISC systems
are big-endian
• Makes it problematic to translate data back and forth
between, say, a Mac and a PC
• Internet is big-endian
– Why? Useful control bits in the Most Significant Byte can be
processed as the data streams in to avoid processing the rest of
the data
– Makes writing Internet programs on a PC more awkward!
– Must convert back and forth
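Because the Internet uses big-endian (network) byte order, portable programs convert explicitly. A hedged sketch using the standard POSIX conversion functions htonl and ntohl; the value is just the earlier example number:

    #include <stdio.h>
    #include <stdint.h>
    #include <arpa/inet.h>   /* htonl, ntohl */

    int main(void) {
        uint32_t host_value = 0x12345678;
        uint32_t net_value  = htonl(host_value);  /* host order -> network (big-endian) order */
        uint32_t back       = ntohl(net_value);   /* network order -> host order */

        /* On a little-endian PC the conversion is a real byte swap;
           on a big-endian machine both calls are effectively no-ops. */
        printf("host: %08X  network: %08X  back: %08X\n", host_value, net_value, back);
        return 0;
    }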
What is an instruction set?
• The complete collection of instructions that are
understood by a CPU
– The programmer-visible interface to the hardware (the
instructions plus the registers and memory they operate
on) is referred to as the Instruction Set
Architecture (ISA)
• The instruction set is ultimately represented in
binary machine code also referred to as object
code
– Usually represented by assembly code to the human
programmer
Elements of an Instruction
• Operation code (Op code)
– Do this
• Source Operand reference(s)
– To this
• Result Operand reference(s)
– Put the answer here
• Next Instruction Reference
– When you are done, do this instruction next
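A purely illustrative C sketch (the field names and widths are assumptions, not any real ISA) of how these four elements might be grouped:

    #include <stdint.h>

    /* Illustrative only: real ISAs pack these elements into fixed bit fields,
       and the next-instruction reference is usually implicit (PC + instruction length). */
    typedef struct {
        uint8_t  opcode;        /* operation code: "do this"               */
        uint16_t src_operand;   /* source operand reference: "to this"     */
        uint16_t dst_operand;   /* result operand reference: "put it here" */
        uint32_t next_address;  /* next instruction reference              */
    } Instruction;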
Where are the operands?
• Main memory
• CPU register
• I/O device
• In instruction itself
Load 800 (immediate: the operand is the value 800 itself)
Load 800 (direct: the operand is the contents of memory address 800)
Load 800 (indirect: memory address 800 holds the address of the operand)
Load R1[800] (indexed: the operand's address is formed from 800 and the
contents of register R1)
Addressing Example
• These are the values loaded into the accumulator
for each addressing mode.
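A hedged C sketch of the idea behind this example: the same address field, 800, leaves a different operand in the accumulator under each addressing mode. The memory contents and register value below are illustrative assumptions, not the slide's actual figures.

    #include <stdio.h>

    int main(void) {
        int memory[2048] = {0};
        int R1  = 200;             /* index register (assumed value)          */
        int acc;                   /* accumulator                             */

        memory[800]  = 900;        /* assume M[800]  = 900                    */
        memory[900]  = 1000;       /* assume M[900]  = 1000                   */
        memory[1000] = 500;        /* assume M[1000] = 500 (800 + R1 = 1000)  */

        acc = 800;                 /* immediate: operand is the literal 800   */
        printf("immediate: %d\n", acc);

        acc = memory[800];         /* direct:    operand is M[800]            */
        printf("direct:    %d\n", acc);

        acc = memory[memory[800]]; /* indirect:  operand is M[M[800]]         */
        printf("indirect:  %d\n", acc);

        acc = memory[800 + R1];    /* indexed:   operand is M[800 + R1]       */
        printf("indexed:   %d\n", acc);
        return 0;
    }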
Instruction-Level Pipelining
• We completed 9 instructions in the time it would take
to sequentially complete two instructions!
• Assumption for simplicity: stages are of equal duration.
Instruction-Level Pipelining
• The theoretical speedup offered by a pipeline can be
determined as follows:
Let tp be the time per stage. Each of the n instructions
represents a task, T, in the pipeline.
The first task (instruction) requires k tp time to complete in
a k-stage pipeline. The remaining (n - 1) tasks emerge from
the pipeline one per cycle. So the total time to complete the
remaining tasks is (n - 1)tp.
Thus, to complete n tasks using a k-stage pipeline requires:
(k tp) + (n - 1)tp = (k + n - 1)tp.
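A small sketch that evaluates this formula and the implied speedup over unpipelined execution (where each of the n instructions takes the full k tp); the stage count, task count, and stage time are illustrative assumptions:

    #include <stdio.h>

    int main(void) {
        int    k  = 5;      /* pipeline stages (assumed)            */
        int    n  = 9;      /* instructions, i.e. tasks (assumed)   */
        double tp = 1.0;    /* time per stage, in cycles (assumed)  */

        double pipelined   = (k + n - 1) * tp;    /* (k tp) + (n - 1)tp          */
        double unpipelined = (double)n * k * tp;  /* every task runs all k stages */

        printf("pipelined:   %.1f cycles\n", pipelined);
        printf("unpipelined: %.1f cycles\n", unpipelined);
        printf("speedup:     %.2f\n", unpipelined / pipelined);
        return 0;
    }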
Instruction-Level Pipelining
• Branch not taken: continue with the next instruction
as usual.
Branch in a Pipeline – Flushed Pipeline
• Branch taken (goto Instr 15): the instructions already
fetched behind the branch must be flushed from the
pipeline.
Dealing with Branches
• Multiple Streams
• Prefetch Branch Target
• Loop buffer
• Branch prediction
• Delayed branching
Multiple Streams
• Have two pipelines
• Prefetch each branch into a separate pipeline
• Use the appropriate pipeline once the branch outcome
is known
Branch Prediction
• A two-bit state machine per branch, with states 00, 01,
10, and 11 (start state 00).
• Only wrong once for branches that execute in an unusual
direction once (e.g. the last iteration of a loop).
Branch Prediction
• State not stored in memory, but in a
special high-speed history table
Branch Instruction Address | Target Address | State
FF0103                     | FF1104         | 11
…
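A hedged C sketch of the two-bit saturating-counter scheme behind the state diagram; the encoding (00 = strongly not taken through 11 = strongly taken) and the history-table size are assumptions for illustration. The branch address is the one from the table above.

    #include <stdio.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* One 2-bit counter per branch: 00 strongly not taken, 01 weakly not taken,
       10 weakly taken, 11 strongly taken. Indexed by low bits of the branch address. */
    #define TABLE_SIZE 1024
    static uint8_t history[TABLE_SIZE];           /* zero-initialized: start state 00 */

    static bool predict_taken(uint32_t addr) {
        return history[addr % TABLE_SIZE] >= 2;   /* states 10 and 11 predict taken */
    }

    static void update(uint32_t addr, bool taken) {
        uint8_t *s = &history[addr % TABLE_SIZE];
        if (taken  && *s < 3) (*s)++;             /* drift toward strongly taken     */
        if (!taken && *s > 0) (*s)--;             /* drift toward strongly not taken */
    }

    int main(void) {
        uint32_t branch_addr = 0xFF0103;          /* instruction address from the table */
        for (int pass = 1; pass <= 2; pass++) {
            int wrong = 0;
            /* A loop branch taken 9 times, then not taken on loop exit.
               Once warmed up, only the exit is mispredicted: wrong once per pass. */
            for (int i = 0; i < 10; i++) {
                bool taken = (i < 9);
                if (predict_taken(branch_addr) != taken) wrong++;
                update(branch_addr, taken);
            }
            printf("pass %d: %d mispredictions out of 10\n", pass, wrong);
        }
        return 0;
    }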
Dealing with Branches – RISC
Approach
• Delayed Branch – used with RISC machines
– Requires some clever rearrangement of instructions
– Burden on programmers but can increase performance
– A form of branch prediction: the compiler predicts
based on context
Delay Slot Effectiveness
• On benchmarks
– Delay slot allowed branch hazards to be hidden 70% of the
time
– About 20% of delay slots filled with NOPs
– Delay slots we can't easily fill: when the target is another branch
• Philosophically, are delay slots good?
– They no longer hide the pipeline implementation from the
programmer (although a compiler can still hide it)
– They do allow for compiler optimizations that other schemes don't
– Not very effective with modern machines that have deep
pipelines; it is too difficult to fill multiple delay slots
Other Pipelining Overhead
• Each stage of the pipeline has overhead in moving data
from buffer to buffer as it passes from one stage to the
next. This can lengthen the total time it takes to execute
a single instruction!
• The amount of control logic required to handle memory
and register dependencies and to optimize the use of the
pipeline increases enormously with the number of stages.
This can lead to a case where the logic between stages
is more complex than the actual stages being controlled.
• Need balance, careful design to optimize pipelining
Pipelining on the 486/Pentium
• 486 has a 5-stage pipeline
– Fetch
• Instructions can have variable length, which can make this
stage get out of sync with the other stages. This stage
actually fetches about 5 instructions with a 16-byte load
– Decode1
• Decode opcode, addressing modes – can be determined
from the first 3 bytes
– Decode2
• Expand opcode into control signals and more complex
addressing modes
– Execute
– Write Back
• Store value back to memory or to register file
486 Pipelining Examples
MOV R1, M    Fetch  D1  D2  Ex  WB
MOV R1, R2          Fetch  D1  D2  Ex  WB
MOV M, R3                  Fetch  D1  D2  Ex  WB
(each instruction enters the pipeline one cycle after the one before it)