+ N. Navet – Computing Infrastructure 1 / Lecture 4

How processors execute programs

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.


+ Outline 2

◼ Executable programs are created by translation

◼ The Von Neumann computer architecture

◼ Components of a processor

◼ Instruction Set Architecture (ISA)

◼ Instruction format

◼ Instruction cycle

◼ Simulating the execution of a small program

◼ Addressing modes

◼ Interrupts in the instruction cycle

◼ Nested interrupts

◼ RISC vs CISC CPUs

◼ Designing fast CPUs with a focus on instruction pipelines


+ Programs are created by successive translations (performed by other programs) 3


[Figure: translation chain. Preprocessing → Compilation → Assembly → Object files (binary) → Linking → Executable file (binary)]

• Preprocessing: source-to-source translation (e.g., replace #define names by their definitions, expand macros)
• Compilation: source translated into assembly language (still a text file)
• Assembly: translation into machine language instructions (binary file)
• Libraries: binary code that does not run as a program of its own; functions from libraries are called by programs. Shared libraries (as opposed to static ones) are loaded only once in memory and shared by several processes
• Linking: merge all necessary binary files (here printf() comes from the C library) and create a program ready to be loaded into memory and executed
What makes a program run fast ?
+Translations 4

sll: shift the value in register $2 by 2 bits to the left and write the result in register $5

Machine only knows binary and its own specific set of instructions!
Assembly language is a human-readable version of the equivalent machine
language - where each machine language instruction is assigned a code known as a
mnemonic, such as add
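
To make the link between a mnemonic and its binary form concrete, here is a minimal sketch that assembles the sll instruction above by hand. It assumes that the slide uses MIPS32 (the $2 / $5 register names suggest it, but the text does not say so) and the usual MIPS R-type field layout; the helper name encode_sll is hypothetical.

```python
def encode_sll(rd: int, rt: int, shamt: int) -> int:
    """Pack 'sll rd, rt, shamt' into a 32-bit word (assumed MIPS32 R-type layout).

    Fields, from most to least significant bit:
    opcode(6)=0 | rs(5)=0 | rt(5) | rd(5) | shamt(5) | funct(6)=0
    """
    for field in (rd, rt, shamt):
        if not 0 <= field < 32:
            raise ValueError("register numbers and shift amounts use 5 bits")
    opcode, rs, funct = 0, 0, 0
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


word = encode_sll(rd=5, rt=2, shamt=2)  # sll $5, $2, 2
print(f"{word:032b}")                    # the bit pattern the machine sees
print(f"{word:#010x}")                   # the same word in hexadecimal
```

The assembler performs exactly this kind of field packing for every instruction; the mnemonic only exists in the text file.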
+ Historically, software engineering has always moved towards higher levels of abstraction 5

People used to code in machine language (binary), then assembly code (text), then C, etc. See https://www.youtube.com/watch?v=8pTEmbeENF4 for an excellent historical overview on this
C++ → Java → Python and Model-Driven Engineering → low-code or no-code platform

Higher level of abstractions



+ Von Neumann Architecture (1/3) 6

◼ Contemporary computer designs are largely based on concepts developed by John von Neumann (and colleagues) at Princeton (at the end of WW2)

◼ Main novelty? They proposed the concepts for a general-purpose computer able to execute arbitrary tasks, whereas before “computing devices” were specialized (hardwired) for specific fixed tasks. First commercial general-purpose computers in the 1950s.

◼ This design is referred to as the von Neumann architecture and is based on three key concepts:
1. Data and instructions are stored in a single read-write memory
2. The content of this memory is addressable (load/store) by location, without regard to the type of data contained there
3. Execution of a program occurs in a sequential fashion (unless the flow of execution is modified) from one instruction to the next

John von Neumann was a consultant on the ENIAC project, the first general purpose computer (operating
in base 10!) developed from 1943-45 by US military to create firing tables with ballistic equations.



+Von Neumann Architecture (2/3) 7

[Figure labels: “Instruction register + Program counter” (control unit), “Store data + instructions” (main memory)]

ALU: Arithmetic Logic Unit, performs simple operations like additions, multiplications and bitwise operations (shift, and, or, …) on integers

Control Unit contains:
- The Instruction Register (IR) holds the instruction currently being executed or decoded
- Program Counter (PC) also called Instruction Pointer (IP): contains the address where to “fetch” the next instruction to be executed
+Von Neumann Architecture (3/3) 8

Execution of A+B
- Operands are in main memory
- Control Unit copies operands into ALU input registers and instructs ALU to execute ADD
- Once available, result is copied into memory

2 types of operations:
✓ Register ↔ memory: e.g., 1) load memory words into registers or 2) store back results into memory
✓ Register ↔ register: e.g., ADD like here
+Notes on previous slide 9

◼ The decoding of the instruction is done by the Control Unit (CU), which “orders” the ALU to execute the instruction.

◼ In this lecture, we assume that the ALU uses (input/output) standard CPU registers plus a special register to store intermediate results called the accumulator (ACC) like early CPUs did

◼ An important function of the ALU is to set up bits or flags which give information to the CU about the result of an operation (e.g. overflow). The flags are grouped together in the “status word” (or “FLAGS register”) of the CPU.

Modern superscalar CPUs (= more than 1 instruction can be executed during a clock cycle) have multiple execution units (e.g., ALUs) operating in parallel and specialized for different functions
+ 10

About registers

◼ Small memory areas located within the CPU → fastest access time (ns range)

◼ Each CPU core has its own set of registers

◼ E.g.: 16 general-purpose registers per core in the Intel x86-64 architecture

◼ 2 types of registers:
◼ General-purpose registers: rapid access to frequently-used data such as local variables and calculation results
◼ Special-purpose registers: program counter, instruction register, FLAGS register, etc. (others on next slides)


Main special-purpose registers

✓ CPU exchanges data with memory through 2 internal registers: MAR for the addresses and MBR for the content

Memory address register (MAR)
• Specifies the address in memory for the next read or write

Memory buffer register (MBR)
• Contains the data to be written into memory or the data read from memory
• Also called Memory Data Register (MDR)

✓ Program Counter (PC): register that holds the address of the instruction to be fetched next.

✓ Instruction Register (IR): the fetched instruction is loaded into the IR register

✓ Program Status Word (PSW), also called FLAGS register (e.g. Intel): contains status bits that reflect the current state of the CPU: overflow at last operation, runs in kernel mode, interrupts enabled or masked, etc.

Accumulator (AC)
• Register associated with an ALU
• Used to store intermediate results without transferring them to memory

Example use of the AC:
a = (b - c) + 123
    mov AC,[b]
    sub AC,[c]
    add AC,123
    mov [a],AC
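
The accumulator example a = (b - c) + 123 can be replayed with a few lines of code. This is only a sketch: the mnemonics LOAD, SUB, ADDI and STORE are hypothetical stand-ins for the four assembly lines, memory is modelled as a dictionary keyed by variable name, and the values of b and c are illustrative.

```python
def run_accumulator(program, memory):
    """Execute (mnemonic, operand) pairs on a machine with a single accumulator."""
    ac = 0  # the accumulator holds every intermediate result
    for mnemonic, operand in program:
        if mnemonic == "LOAD":      # mov AC,[x]
            ac = memory[operand]
        elif mnemonic == "SUB":     # sub AC,[x]
            ac -= memory[operand]
        elif mnemonic == "ADDI":    # add AC,constant
            ac += operand
        elif mnemonic == "STORE":   # mov [x],AC
            memory[operand] = ac
        else:
            raise ValueError(f"unknown mnemonic {mnemonic!r}")
    return ac


memory = {"a": 0, "b": 50, "c": 8}  # illustrative values
program = [("LOAD", "b"), ("SUB", "c"), ("ADDI", 123), ("STORE", "a")]
run_accumulator(program, memory)
print(memory["a"])  # (50 - 8) + 123
```

Only the last instruction writes to memory: the two intermediate results stay in the AC, which is the point of having such a register.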
A special register: the FLAGS register 12

Some bits of FLAGS are set by the ALU and reflect the status of the most
recent operation:

N—Set when the result was Negative.

Z — Set when the result was Zero.

V— Set when the result caused an oVerflow.

C — Set when the result caused a Carry out of the leftmost bit.

P — Set when the result had even Parity.

….

The exact name and the size of this register vary depending on the CPU:
▪ Intel: FLAGS (16 bits), EFLAGS (32 bits), RFLAGS (64 bits)
▪ Known as Program Status Word (PSW) on older IBM CPUs
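
As an illustration of how these bits are derived, the sketch below adds two 8-bit values and computes N, Z, V, C and P from the result. The 8-bit width and the function name add8_with_flags are assumptions made for the example; real CPUs compute the flags in hardware and on wider operands.

```python
def add8_with_flags(a: int, b: int):
    """Add two 8-bit values; return (8-bit result, dict of status bits)."""
    a, b = a & 0xFF, b & 0xFF
    full = a + b                 # may need 9 bits
    result = full & 0xFF         # what fits in the 8-bit destination
    flags = {
        "N": bool(result & 0x80),                        # sign bit of the result
        "Z": result == 0,
        # signed overflow: operands have the same sign, result has the other one
        "V": bool(~(a ^ b) & (a ^ result) & 0x80),
        "C": full > 0xFF,                                # carry out of the leftmost bit
        "P": bin(result).count("1") % 2 == 0,            # even number of 1 bits
    }
    return result, flags


print(add8_with_flags(0x7F, 0x01))  # 127 + 1 overflows the signed range
print(add8_with_flags(0xFF, 0x01))  # 255 + 1 wraps around to zero
```

A conditional branch placed after such an addition only has to test one of these bits, which is why the control unit reads them from the FLAGS register.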



Putting it all together 13

I/O AR and I/O BR are the equivalent of MAR and MBR to communicate with I/O devices (to control them or write/read data)

Warning: this figure explains the von Neumann architecture but is not realistic wrt modern CPUs with caches and possibly many memory references outstanding
Sequential Flow of Control and Branches – value of PC 14

The flow of control is the order in which the instructions of a program are executed (also called “the sequence/flow of execution”)

[Figure: program counter value as a function of time, here no branches in the code.]

Von Neumann: “Execution occurs in a sequential fashion (unless explicitly modified) from one instruction to the next.”

Does the for() loop lead to “jumps” in the PC values? e.g.,
for (i=0; i<10;i++) {
    /* do D */
}
Sequential Flow of Control and Branches – value of PC 15

Control flow is altered by control instructions, such as jump to another address

[Figure: program counter value as a function of time. (a) Without branches. (b) With branches (jump instruction)]

Where are two control instructions located in the example of the previous slide ?
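
One way to explore the question is to record the PC while a loop runs. The sketch below uses a hypothetical six-instruction translation of a for loop (the opcode names SET, BGE, NOP, INC, JMP and HALT are invented for the example, and the loop bound is reduced to 3 to keep the trace short).

```python
def trace_pc(program, limit):
    """Run a toy program and return the successive program counter values."""
    pc, i, trace = 0, 0, []
    while True:
        trace.append(pc)
        op, *args = program[pc]
        pc += 1                  # default: next instruction in memory
        if op == "SET":
            i = args[0]
        elif op == "INC":
            i += 1
        elif op == "BGE":        # conditional branch: taken if i >= limit
            if i >= limit:
                pc = args[0]
        elif op == "JMP":        # unconditional branch
            pc = args[0]
        elif op == "HALT":
            return trace
        # "NOP" stands for the loop body D and changes nothing


program = [
    ("SET", 0),   # 0: i = 0
    ("BGE", 5),   # 1: if i >= limit, leave the loop
    ("NOP",),     # 2: do D
    ("INC",),     # 3: i++
    ("JMP", 1),   # 4: back to the test
    ("HALT",),    # 5
]
trace = trace_pc(program, limit=3)
jumps = [(a, b) for a, b in zip(trace, trace[1:]) if b != a + 1]
print(trace)
print(jumps)
```

In this translation the PC leaves the sequential order at two places only: the backward jump at the end of the body and the conditional branch that exits the loop.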
+ 16

Instruction Set Architecture (ISA)


also called “machine language” or
“instruction set” for short
▪ Instruction set is the target language of
compilers, i.e., the instructions the
processor of the machine can execute.
▪ Each instruction is represented by a
unique sequence of bits: the opcode
▪ Opcodes have a symbolic representation
for developers, the mnemonic, e.g. ADD
The Instruction Set Architecture Level 17

More than 4000 pages to describe the instruction set of the Intel Core i7!

The set of instructions that can be executed depends on the execution mode: kernel mode and user mode

The ISA level is the interface between the compilers and the hardware.
The 4 types of instructions 18

• Processor-memory: data transferred from processor to memory (STORE) or from memory to processor, i.e. to some registers in the CPU (LOAD)
• Processor-I/O: data transferred to or from a peripheral device by transferring between the processor and an I/O module
• Data processing: arithmetic or logic operations on data (ex: MUL)
• Control: instructions that specify that the sequence of execution is altered (ex: JUMP)
+ Control instruction: examples 20

◼ When the next instruction to be performed is not the one that immediately follows the current instruction in memory.

◼ Unconditional branch: jump to a certain address

◼ Conditional branch: whether the last instruction’s result is zero or not can be read from the FLAGS register

Could you imagine a code snippet that could be translated into the
assembly code located between address 202 and 211?
Common Instruction Set operations 21

The number of different instructions varies from machine to machine. However, the same general types of operations are found on all machines.

[Table: data-transfer instructions and data-processing instructions]


Common Instruction Set operations (continued) 22

[Table: control-flow instructions and data-transfer (to/from) I/O instructions]


+ Data Transfer instructions: example of IBM EAS/390 23


+ Data Transfer instructions in more detail 24

◼ If both source and destination are registers, then this is a simple CPU internal operation

◼ If one or both operands are in memory, then the CPU (i.e., the CU) must:
1. Calculate the memory address based on the addressing mode → see later in the lecture
2. If the address refers to virtual memory, translate from virtual to real-memory address → outside the scope of the course
3. Determine whether the addressed data is in cache → see later in the lecture
4. If not, issue a command to the memory module


+ 25

Processor instruction format



26

Data Types on the Core i7


Each CPU natively supports a limited set of data types, which are the data types of the operands of the CPU instructions

The Core i7 numeric data types. Supported types are marked with ×. Types marked with ‘‘64-bit’’ are only supported in 64-bit mode (there is a 32-bit execution mode for compatibility with 32-bit programs)

Example: BCD encoding, each base 10 digit is 4 bits
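
The BCD rule (one 4-bit group per decimal digit) can be sketched as follows. The packed layout, with the least significant digit in the lowest 4 bits, and the name to_packed_bcd are assumptions of this example.

```python
def to_packed_bcd(n: int) -> int:
    """Encode a non-negative integer with one 4-bit group per decimal digit."""
    if n < 0:
        raise ValueError("this sketch handles non-negative values only")
    bcd, shift = 0, 0
    while True:
        n, digit = divmod(n, 10)   # peel off the least significant decimal digit
        bcd |= digit << shift      # store it in the next free 4-bit group
        shift += 4
        if n == 0:
            return bcd


print(f"{to_packed_bcd(1994):016b}")  # groups 0001 1001 1001 0100, i.e. digits 1 9 9 4
```

Read in hexadecimal, a BCD value shows the decimal digits directly (1994 is stored as 0x1994), which is why the encoding is convenient for displays and financial arithmetic but wastes 6 of the 16 codes of each group.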
Data Types on the OMAP4430 ARM CPU 27

The OMAP4430 ARM CPU (Texas Instruments) numeric data types. Supported types are marked with ×.

Used a few years ago in smartphones (LG) and tablets (Samsung) (see https://en.wikipedia.org/wiki/OMAP)


Data Types on the ATmega168 AVR CPU 28

The ATmega168 (Atmel) numeric data types. Supported types are marked with ×.

Note that there is no floating point support.

Used for low-cost nodes in embedded systems


+ Instruction format 29

How many instructions max. for this CPU ?

◼ Opcode = operation code = instruction code (it is a binary number)


◼ On some machines, all instructions have the same length but that is
usually not the case
◼ The number of memory addresses or registers that can be referred to by an operand depends on the number of bits for the operands
◼ For operands that are memory addresses, the total memory that can be addressed depends on whether consecutive addresses are separated by 1 byte, 4 bytes, 8 bytes, etc. (typically it is 1 byte today, i.e. byte addressing, even though data is transferred in words of up to 8 bytes).
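
The two questions (how many instructions, how much memory) reduce to powers of two. The sketch below works them out for a hypothetical 16-bit format with a 4-bit opcode and a 12-bit address field; these field widths are an assumption, since the format figure is not reproduced in the text.

```python
def format_capacity(opcode_bits: int, address_bits: int, bytes_per_address: int = 1):
    """Return (max number of opcodes, addressable locations, addressable bytes)."""
    opcodes = 2 ** opcode_bits
    locations = 2 ** address_bits
    return opcodes, locations, locations * bytes_per_address


# Hypothetical format: 4-bit opcode + 12-bit address, one 2-byte word per address
print(format_capacity(4, 12, bytes_per_address=2))
```

Adding one bit to the opcode doubles the number of possible instructions but, for a fixed instruction length, halves the number of locations an operand can reach.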
+ Instruction format 30

◼ The symbolic representation of machine instructions in assembly language, using mnemonics, eases programming and the understanding of generated assembly code
◼ Usually, for a given instruction length (here
16 bits), more than 1 instruction format is used
(with “expanding opcodes”, more on that later)
◼ Many processors support instructions having different lengths (e.g.,
32 and 64 bits).



+ Instruction format: different # of operands 31

Four common instruction formats:
(a) Zero-address instruction. Eg: NOP, RET
(b) One-address instruction. Eg: JUMP (branch)
(c) Two-address instruction. Eg: ADD with result in one operand
(d) Three-address instruction. Eg: ADD with result in third operand


+ Instruction format: different # of operands 32

Four common instruction formats:
(a) Zero-address instruction. Eg: NOP, RET
(b) One-address instruction. Eg: JUMP (branch)
(c) Two-address instruction. Eg: ADD with result stored in first operand
(d) Three-address instruction. Eg: ADD with result stored in first operand


+ Number of addresses per instruction 33

Same program coded with one-, two-, and three-address instructions

2 source operands, 1 destination operand


+ Number of addresses per instruction 34

Same program executed with one-, two-, and three-address instructions

AC: accumulator (register) to store intermediate arithmetic results, used as an implicit operand

SUB, ADD, … destination operand is implicit

What do we observe wrt code size ?
+Number of addresses per instruction 36

Same program executed with one-, two-, and three-address instructions

✓ Fewer operands per instruction results in instructions that are simpler, requiring a less complex processor. It also results in shorter instructions (fewer operands).

✓ On the other hand, programs contain more instructions, which in general results in larger programs and often longer execution times → CPU design choice


Expanding Opcodes (1) 37

◼ Expanding opcodes (= the length of the opcode is variable) provide flexibility in the ISA design and make it possible to implement instructions with different numbers of operands efficiently (i.e., all bits are used)

◼ Principle: total instruction size remains fixed, but opcode size and number of operands vary:
◼ When the opcode is short, like below, a lot of bits are left to hold operands
◼ If an instruction has no operands (such as Halt) or few, all/most of the bits can be used for the opcode.

An instruction with a 4-bit opcode and three 4-bit address fields.
The instruction format above is suited for a machine with 16 registers (i.e., 4-bit register address) which are used for all arithmetic operations


Expanding Opcodes (2) 38

[Figure: 16-bit instruction formats with 4-, 8-, 12- and 16-bit opcodes. Keep 1111 for 8-bit opcodes; keep 1111 1111 and 1111 1110 for 12-bit opcodes; keep 1111 1111 1111 for 16-bit opcodes.]

This expanding opcode allows 15 three-operand instructions, 14 two-operand instructions, 31 one-operand instructions, and 16 zero-operand instructions. The fields marked xxxx, yyyy, and zzzz are 4-bit address fields.
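
The counts 15 / 14 / 31 / 16 can be checked by writing the decoder that this scheme implies and enumerating all 16-bit patterns. This is a sketch under the reservation rules stated above; the function name decode and the returned tuple layout are choices made for the example.

```python
def decode(instr: int):
    """Split a 16-bit instruction into (opcode length in bits, opcode, operand fields)."""
    n0, n1, n2, n3 = [(instr >> s) & 0xF for s in (12, 8, 4, 0)]  # most significant first
    if n0 != 0xF:                    # 0000 .. 1110: 4-bit opcode, three operands
        return 4, n0, [n1, n2, n3]
    if n1 < 0xE:                     # 1111 0000 .. 1111 1101: 8-bit opcode, two operands
        return 8, instr >> 8, [n2, n3]
    if n1 == 0xE or n2 != 0xF:       # 1111 1110 xxxx and 1111 1111 0000 .. 1110: 12-bit opcode
        return 12, instr >> 4, [n3]
    return 16, instr, []             # 1111 1111 1111 xxxx: 16-bit opcode, no operand


counts = {4: set(), 8: set(), 12: set(), 16: set()}
for instr in range(2 ** 16):
    bits, opcode, _ = decode(instr)
    counts[bits].add(opcode)
print({bits: len(ops) for bits, ops in counts.items()})
```

The decoder has to look at the leading bits first: a prefix of all ones means "the opcode continues in the next field", which is what makes every 16-bit pattern decodable in exactly one way.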



Expanding Opcodes (2) 39

How many 8-bit opcodes and 12-bit opcodes if 1111 1110 is not reserved for 12-bit opcodes?



+ 40

Data “alignment” in memory

▪ In modern computers, memory is byte oriented: each byte has its own address. However, the HW usually cannot fetch / write individual bytes, only whole words.
▪ These one-byte cells are generally grouped together into words 4 bytes long (32 bits) or 8 bytes long (64 bits), with instructions to read/write these words.
▪ For the sake of efficiency, most systems require the words that can be addressed (for reading or writing) in memory to be aligned: the address of the data in memory is a multiple of the word size.
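
The alignment rule is a simple divisibility test, and it also explains why a misaligned access costs more. The helper names below are hypothetical, and addresses are plain byte addresses.

```python
def is_aligned(address: int, word_size: int) -> bool:
    """True if the address is a multiple of the word size (in bytes)."""
    return address % word_size == 0


def enclosing_word(address: int, word_size: int) -> int:
    """Start address of the aligned word that contains the given byte."""
    return address - (address % word_size)


def words_touched(address: int, length: int, word_size: int) -> int:
    """Number of aligned words covered by `length` bytes starting at `address`."""
    first = enclosing_word(address, word_size)
    last = enclosing_word(address + length - 1, word_size)
    return (last - first) // word_size + 1


print(is_aligned(8, 8), words_touched(8, 8, 8))    # aligned 8-byte word: one transfer
print(is_aligned(12, 8), words_touched(12, 8, 8))  # same data at address 12: two transfers
print(words_touched(13, 1, 8))                     # a single byte still needs a whole word
```

An 8-byte value stored at address 12 straddles the words starting at 8 and 16, so the hardware would need two memory transfers and some shifting to assemble it.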
41

Memory Models

An 8-byte word in memory. (a) Aligned. (b) Not aligned. By far, most CPUs require that words in memory are aligned (= start at addresses that are multiple of 8, 32, or 64 bits depending on the word size & HW)

e.g.: DDR3/4/5 controllers support only aligned 64-bit accesses, a program can read/write a single byte but it will require the transfer of 64 bits anyway (64 bits is the width of the data bus)
+ 43

Instruction execution

The processing required for a single instruction is called the instruction cycle: it consists of 3 main steps: fetch – decode – execute


Instruction execution: coarse-grained view (i.e., 2 steps) 44

Here, and in the following, the execute stage includes the decode stage

Program execution halts only if the machine is turned off, an unrecoverable error occurs (e.g. HW fault), or a program instruction that halts the CPU is encountered (e.g., HALT halts the CPU until the next HW interrupt occurs)


+Instruction execution in more details 45

[Figure: fetch stage = steps 1, 2; execute stage = steps 3, 4, 5, 6]

1. Fetch the next instruction (pointed to by the program counter) from memory into the instruction register.
2. Change the program counter to point to the following instruction.
3. Determine the type of instruction just fetched (decode stage).
4. If the instruction has operands, determine the addresses at which they are in memory (see later in the slide set).
5. Fetch the operands, if any, and copy them into CPU registers.
6. Execute the instruction.
7. Go to step 1 to begin executing the next instruction.
+ Instruction execution: fine-grained view 46

CIR: current instruction register = instruction register

CU = Control Unit, see slide 7: directs the operations of the processor and other resources such as buses by providing control signals

This figure is simplified in the sense that it does not consider that instructions can be in cache

Steps 4, 5 and 6 of previous slide in “execute stage”


Characteristics of a hypothetical machine 47

How many different instructions max. ? How many words can be addressed in memory ?

The word size of the machine is 16 bits

Characteristics of the machine:
1. The CPU has a single data register called the accumulator (AC), as well as the PC and IR.
2. Instructions have one operand
3. Both instructions and data are 16 bits long.
4. Here we consider only the 3 instructions whose opcodes are on the left
Simulating the execution of 3 instructions 49

[Figure: fetch and execute stages, in numbered steps, for Instruction #1 (at address 300), Instruction #2 and Instruction #3. All numbers in hexadecimal.]

Nb: MAR and MBR ignored here for simplicity


Simulating the execution of a program 54

[Figure: fetch and execute stages for Instruction #1, Instruction #2 and Instruction #3. All numbers in hexadecimal.]

The program fragment shown adds the contents of the memory word at address 940 to the contents of the memory word at address 941 and stores the result in the latter location
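
The three instruction cycles can be replayed in software. The sketch below makes several assumptions that are not recoverable from the text: a 4-bit opcode followed by a 12-bit address, the opcode values 1 (load AC from memory), 2 (store AC to memory) and 5 (add memory word to AC), and the initial data 0003 and 0002 at addresses 940 and 941. As on the slide, MAR and MBR are ignored.

```python
LOAD, STORE, ADD = 0x1, 0x2, 0x5   # assumed opcode values, not given in the text


def step(memory, pc, ac):
    """One instruction cycle: fetch, then decode and execute."""
    ir = memory[pc]                            # fetch the instruction into IR
    pc += 1                                    # PC now points to the next instruction
    opcode, address = ir >> 12, ir & 0xFFF     # decode: 4-bit opcode, 12-bit address
    if opcode == LOAD:
        ac = memory[address]
    elif opcode == ADD:
        ac = (ac + memory[address]) & 0xFFFF   # keep the result on 16 bits
    elif opcode == STORE:
        memory[address] = ac
    else:
        raise ValueError(f"unsupported opcode {opcode:#x}")
    return pc, ac, ir


memory = {
    0x300: 0x1940,   # load AC from address 940
    0x301: 0x5941,   # add the contents of address 941 to AC
    0x302: 0x2941,   # store AC at address 941
    0x940: 0x0003,   # illustrative data
    0x941: 0x0002,
}
pc, ac = 0x300, 0
for _ in range(3):
    pc, ac, ir = step(memory, pc, ac)
    print(f"IR={ir:04X} PC={pc:03X} AC={ac:04X}")
print(f"memory[941]={memory[0x941]:04X}")
```

Each printed line corresponds to one fetch/execute pair of the figure: the IR shows what was fetched, the PC already points to the next instruction, and the AC carries the intermediate result from one instruction to the next.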



+ Addressing modes 55

Most instructions have operands, but there are different ways to indicate where they are in memory / in which registers they are
- Question is: how should the bits of an address field (of an instruction) be interpreted to find the operands?
- Objectives of addressing modes include 1) reference as many memory locations as possible with the smallest possible address fields 2) allow for efficient compilation and execution

Nb:
- not all machines support all addressing modes
- there are other addressing modes not covered here
- up to now we assumed operands are memory addresses or registers, we will learn they can be values as well


57

Immediate Addressing Mode

An immediate instruction for loading 4 into register 1.
Simplest form of addressing: the instruction contains the value of the operand itself rather than its address
✓ This mode is for constants and initial values of variables, efficient as no need to fetch the data from memory
✓ Cannot be used for the destination operand, just for source operand
✓ But the value is restricted to the size of the address field (which, typically, is small)


58
Register Addressing Mode

ADD R1 R2

Registers contain the operands

Most efficient and common addressing mode:

✓ Instructions are shorter (e.g., 4 bits to specify a register)

✓ No memory access is needed (speeds up the execution)

Optimization done by compiler or programmer: temporarily store frequently accessed variables in registers


Direct Addressing Mode 59

ADD R1 ADDR
MOV ADDR R1
Direct Addressing: the address field contains the location of the operand in memory
✓ the instruction will always access exactly the same memory location → only for global variables of a program
✓ Address space limited by operand size!

EAX is a general-purpose
register on Intel platforms



Register Indirect Addressing 60

The address of the operand is not “hardcoded” in the instruction, as with direct addressing: a register used by the instruction contains the address of the operand

Constants are indicated by the # sign

BLT ?
An assembly program that computes the sum of the elements of an
array of 1024 integers of 4 bytes. Here constants are prefixed by #
and (R2) means the address contained in register 2. Note that ADD is
used with 2 different addressing modes here.
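
Since the assembly listing itself is not reproduced here, the sketch below is a plausible reconstruction of such a loop, written so that each line mirrors one assembly instruction. The instruction sequence in the comments, the base address 0x1000 and the array contents are all assumptions of the example.

```python
def sum_array(memory, base, count=1024, size=4):
    """Sum `count` integers of `size` bytes stored from address `base`."""
    r1 = 0                       # MOV R1,#0       accumulate the sum in R1
    r2 = base                    # MOV R2,#A       R2 = address of the first element
    r3 = base + count * size     # MOV R3,#A+4096  first address beyond the array
    while True:
        r1 += memory[r2]         # ADD R1,(R2)     register indirect: operand is at the address in R2
        r2 += size               # ADD R2,#4       immediate: move to the next element
        if not r2 < r3:          # CMP R2,R3 then BLT: loop again while R2 < R3
            return r1


BASE = 0x1000                                      # hypothetical address of the array
memory = {BASE + 4 * k: k for k in range(1024)}    # the array holds 0, 1, ..., 1023
print(sum_array(memory, BASE))
```

The same ADD instruction is reused on every iteration although it reads a different memory word each time: only the content of R2 changes, which direct addressing could not express.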
Memory Indirect Addressing 61

Same as with register indirect addressing except that the address of the operand is stored in a memory area and not a register
Memory Indirect Addressing is well suited for the
implementation of pointers / references (the address of the
pointer does not change, but where it is pointing at does)
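
The five modes seen so far differ only in how the address field is interpreted. A minimal sketch, with registers and memory modelled as dictionaries and hypothetical mode names:

```python
def fetch_operand(mode, field, registers, memory):
    """Return the operand value designated by an instruction's address field."""
    if mode == "immediate":           # the field is the value itself
        return field
    if mode == "register":            # the field is a register number
        return registers[field]
    if mode == "direct":              # the field is a memory address
        return memory[field]
    if mode == "register_indirect":   # the register holds the memory address
        return memory[registers[field]]
    if mode == "memory_indirect":     # a memory word holds the memory address
        return memory[memory[field]]
    raise ValueError(f"unknown addressing mode {mode!r}")


registers = {1: 7, 2: 100}
memory = {100: 42, 200: 100}          # the word at 200 acts as a pointer to 100
for mode, field in [("immediate", 4), ("register", 1), ("direct", 100),
                    ("register_indirect", 2), ("memory_indirect", 200)]:
    print(mode, fetch_operand(mode, field, registers, memory))
```

Counting the dictionary lookups on `memory` gives the number of memory accesses per mode: none for immediate and register, one for direct and register indirect, two for memory indirect.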



Example: (some) addressing modes of the Intel Pentium 62

The leaves of this tree are the 5 addressing modes seen in the lecture

Memory indirect


+ 63

Interrupts:
other modules (I/O, memory)
and software may interrupt the
normal processing of the processor



64

Classes of interrupts
Please refer to slide #24 of lecture #3, which is more detailed
Program: Generated by some condition that occurs as a result of an instruction execution, such as arithmetic overflow, division by zero, attempt to execute an illegal machine instruction, or reference outside a user's allowed memory space.
Timer: Generated by a timer within the processor. This allows the operating system to perform certain functions on a regular basis.
I/O: Generated by an I/O controller, to signal normal completion of an operation, request service from the processor, or to signal a variety of error conditions.
Hardware failure: Generated by a failure such as power failure or memory parity error.


Transfer of control via interrupt 65

An interrupt is just an interruption of the normal flow of execution


[Figure: User Program and Interrupt Handler. The interrupt occurs after instruction i of the user program, which later resumes at instruction i+1.]

The program does not have to do anything special - the processor and the operating system are responsible for suspending the user program and then resuming it at the same point.
Instruction cycle with interrupts 66

A program, provided it has sufficient rights to do that, can disable interrupts during some portions of its execution

If an interrupt has occurred, 1) the context of the program is saved 2) the PC is set to the address of the ISR

Interrupts involve some overhead but they allow I/Os to execute in parallel with program execution and thus usually they save time wrt synchronous I/O - see slide #26 of lecture #3
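
The extra interrupt check in the instruction cycle can be sketched as follows. It is a deliberately simplified model: instructions are just labels, the saved context is reduced to the PC, the ISR is assumed to run to completion, and a request raised while interrupts are disabled is simply dropped (real hardware would usually keep it pending).

```python
def run(program, isr, pending_at, interrupts_enabled=True):
    """Fetch/execute loop with an interrupt check after each instruction.

    program, isr: lists of instruction labels.
    pending_at: indexes of user-program instructions during which an
    interrupt request is raised.
    """
    log, pc = [], 0
    while pc < len(program):
        log.append(program[pc])      # fetch + execute the current instruction
        raised = pc in pending_at
        pc += 1
        if interrupts_enabled and raised:
            saved_pc = pc            # 1) save the context (here only the PC)
            log.extend(isr)          # 2) control goes to the ISR, modelled as running to completion
            pc = saved_pc            # restore the context and resume the program
    return log


log = run(["i1", "i2", "i3"], ["isr1", "isr2"], pending_at={1})
print(log)
print(run(["i1", "i2", "i3"], ["isr1", "isr2"], pending_at={1}, interrupts_enabled=False))
```

The interrupt is only taken between two instructions, never in the middle of one, so the user program resumes exactly at the instruction that follows the interrupted one.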
Nested interrupts 67

Solution A: Sequential interrupt processing. Interrupts are disabled while an interrupt is being processed: if interrupt Y occurs during X, even if Y is urgent it will have to wait until X finishes.

Solution B: Nested interrupt processing. Interrupts have priorities, the most urgent can preempt the less urgent and get executed first.

[Figure 3.13 Transfer of Control with Multiple Interrupts: user program, interrupt handler X and interrupt handler Y. (a) Sequential interrupt processing. (b) Nested interrupt processing.]
+ 69

RISC vs CISC processors



+ Instruction set design: Reduced Instruction Set Computers (RISC) versus Complex Instruction Set Computers (CISC) 70

CISC (complex instructions in HW):
    MULT mem_addr_1, mem_addr_2    # result written back at mem_addr_1

RISC:
    LOAD Reg1, mem_addr_1
    LOAD Reg2, mem_addr_2
    PROD Reg1, Reg2
    STORE mem_addr_1, Reg1

https://cs.stanford.edu/people/eroberts/courses/soco/projects/risc/risccisc/
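To make the comparison concrete, the two instruction sequences can be run on a toy machine. This is a minimal sketch: the dictionary-based memory and registers and the functions `run_cisc` and `run_risc` are assumptions made for illustration, and real instruction timing is not modelled.

```python
def run_cisc(memory):
    """MULT mem_addr_1, mem_addr_2: one instruction operating directly on memory."""
    memory["mem_addr_1"] = memory["mem_addr_1"] * memory["mem_addr_2"]
    return 1  # number of instructions executed


def run_risc(memory):
    """Same computation with simple load/store and register-only instructions."""
    registers = {}
    program = [
        ("LOAD", "Reg1", "mem_addr_1"),
        ("LOAD", "Reg2", "mem_addr_2"),
        ("PROD", "Reg1", "Reg2"),
        ("STORE", "mem_addr_1", "Reg1"),
    ]
    for opcode, first, second in program:
        if opcode == "LOAD":        # register <- memory
            registers[first] = memory[second]
        elif opcode == "PROD":      # register <- register * register
            registers[first] = registers[first] * registers[second]
        elif opcode == "STORE":     # memory <- register
            memory[first] = registers[second]
    return len(program)


cisc_memory = {"mem_addr_1": 6, "mem_addr_2": 7}
risc_memory = {"mem_addr_1": 6, "mem_addr_2": 7}
cisc_count = run_cisc(cisc_memory)
risc_count = run_risc(risc_memory)
```

Both versions leave the product at `mem_addr_1`; the RISC program needs four instructions where the CISC program needs one, which is why RISC programs tend to be larger.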

- Intel and AMD x86 CPUs are among the last few representatives of CISC CPUs
- ARM (Advanced RISC Machines) processors are RISC processors, with some complex instructions
- The trend is to use SoCs (systems on chip), such as the Apple M1 which is based on ARM, and contains specialized computation units



+ Instruction set design: many issues… 71

RISC: Reduced Instruction Set Computer (simple and fast instructions, fewer transistors (thus energy efficient) but more memory needed because programs are larger)
VS
CISC: Complex Instruction Set Computer (much larger set of instructions with multiple addressing modes)

The Intel x86 family is CISC because it was originally meant for systems with little memory

Today’s processors cannot be classified as pure RISC or CISC anymore (even RISC processors have multimedia instructions). Beyond RISC and CISC, what becomes more important is the efficiency of parallelism, e.g. instruction level parallelism (superscalar processor)
+ Characteristics of some historical CISC & RISC processors 72

                                       Complex Instruction Set (CISC) Computer      Reduced Instruction Set (RISC) Computer
Characteristic                         IBM 370/168    VAX 11/780    Intel 80486     SPARC      MIPS R4000
Year developed                         1973           1978          1989            1987       1991
Number of instructions                 208            303           235             69         94
Instruction size (bytes)               2–6            2–57          1–11            4          4
Addressing modes                       4              22            11              1          1
Number of general-purpose registers    16             16            8               40-520     32
Control memory size (kbits)            420            480           246             -          -
Cache size (kB)                        64             64            8               32         128
+ Designing fast CPUs 74



+ Techniques to make CPUs faster 75

◼ Higher frequency but there is a limit to that!
◼ Parallel computing (e.g., multi-ALU, multi-core platforms) and specialized processors (Graphics Processing Unit - GPU).
Making each CPU core faster e.g. with the following mechanisms:
◼ Pipelining: a processor can work on several instructions at the same time
◼ Branch prediction: by predicting (based on history) the next group of instructions that will be executed (if or else block?), the processor can prefetch them and save latencies due to memory transfer.
◼ Data flow analysis: the CPU analyzes which instructions depend on each other’s results, or data, to create an optimized schedule of instructions (e.g. faster because intermediate results remain in registers or instructions remain in cache). The original instruction order is not preserved.
◼ Speculative execution: Some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations.
◼ …
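As an illustration of history-based branch prediction, here is a sketch of a 2-bit saturating counter, one classic scheme. It is an assumption that this particular scheme fits the lecture's intent; the function `count_mispredictions` and the counter encoding (0-1 predict "not taken", 2-3 predict "taken") are illustrative.

```python
def count_mispredictions(outcomes, counter=2):
    """Predict each branch outcome with a 2-bit saturating counter.

    `outcomes` is a sequence of booleans (True = branch taken).
    Counter values 0 and 1 predict "not taken", 2 and 3 predict "taken".
    """
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Update the history: move towards the observed outcome, saturating at 0 and 3
        if taken:
            counter = min(counter + 1, 3)
        else:
            counter = max(counter - 1, 0)
    return mispredictions


# Branch at the end of a 10-iteration loop, executed 3 times:
# taken 9 times (loop again), then not taken once (loop exit).
loop_branch = ([True] * 9 + [False]) * 3
loop_mispredictions = count_mispredictions(loop_branch)
```

For this loop pattern only the loop exits are mispredicted, so most of the time the processor fetches the right instructions in advance.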



+ Processors with instruction pipelining 76

◼ CPUs have abandoned the simple model of fetching, decoding and executing one instruction at a time
◼ Pipelining: a CPU might have separate fetch, decode, and execute units, so that while it is executing instruction n, it could also be decoding instruction n + 1 and fetching instruction n + 2 (see (a)).

[Figure 1-7 (a) A three-stage pipeline: the fetch unit holds Instr. N+2, the decode unit Instr. N+1 and the execute unit Instr. N.]

Nb:
▪ The instruction cycle in modern processors has more than 3 stages, and pipelines are thus longer and can overlap more instructions
▪ The throughput is limited by the slowest stage!
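The benefit of pipelining and the effect of the slowest stage can be estimated with a simple timing model. This is a sketch under idealized assumptions (no stalls, no branches, every stage advances in lockstep at the pace of the slowest stage); the function names and time units are illustrative.

```python
def sequential_time(n_instructions, stage_times):
    """No pipelining: each instruction goes through all stages before the next starts."""
    return n_instructions * sum(stage_times)


def pipelined_time(n_instructions, stage_times):
    """Ideal pipeline: one instruction completes per cycle once the pipeline is full.

    The cycle time is set by the slowest stage; the first instruction needs
    len(stage_times) cycles, each following instruction one more cycle.
    """
    cycle_time = max(stage_times)
    n_cycles = len(stage_times) + n_instructions - 1
    return n_cycles * cycle_time


balanced = [1, 1, 1]      # fetch, decode, execute (arbitrary time units)
unbalanced = [1, 2, 1]    # the decode stage is twice as slow
```

With 100 instructions, the balanced pipeline takes 102 time units instead of 300, while the unbalanced one takes 204 instead of 400: the slow decode stage dictates the pace of the whole pipeline.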
+ Superscalar processors 77

◼ CPUs have abandoned the simple model of fetching, decoding and executing one instruction at a time
◼ Superscalar CPU: multiple instructions can be executed in parallel by several execution units (e.g., ALU, FPU): as soon as an execution unit becomes available, it looks in the holding buffer to see if there is an instruction it can execute

[Figure 1-7 (b) A superscalar CPU.]
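The holding-buffer mechanism can be sketched as follows. This is a simplified illustration under strong assumptions: every instruction takes one cycle, there is one ALU and one FPU, and data dependencies between instructions are ignored. The function `superscalar_schedule` and the instruction names are hypothetical.

```python
def superscalar_schedule(instructions, units=("ALU", "FPU")):
    """Return, cycle by cycle, the instructions issued to the execution units.

    `instructions` is a list of (name, unit_type) pairs in program order.
    Each cycle, every unit takes the oldest buffered instruction it can execute.
    """
    holding_buffer = list(instructions)
    schedule = []
    while holding_buffer:
        issued = []
        for unit in units:
            match = next((i for i in holding_buffer if i[1] == unit), None)
            if match is not None:
                holding_buffer.remove(match)
                issued.append(match[0])
        if not issued:
            raise ValueError("no execution unit can run the remaining instructions")
        schedule.append(issued)
    return schedule


program = [("add", "ALU"), ("fmul", "FPU"), ("sub", "ALU"),
           ("fadd", "FPU"), ("or", "ALU")]
cycles = superscalar_schedule(program)
```

Here the five instructions complete in three cycles instead of five, because the ALU and the FPU work in parallel whenever the buffer holds an instruction for each of them.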
