Chapter 1 Edit
Chapter 1
COMPUTER SYSTEM
PERFORMANCE
1
02/03/2019
Computer organization
Encompasses all physical aspects of computer systems.
E.g., circuit design, control signals, memory types.
How does a computer work?
Computer architecture
Logical aspects of system implementation as seen by
the programmer.
E.g., instruction sets, instruction formats, data types,
addressing modes.
How do I design a computer?
Computer Components
An Example System
This system has 64MB of (fast)
synchronous dynamic RAM
(SDRAM) . . .
Google search for SDRAM?
An Example System
Hard disk capacity determines
the amount of data and size of
programs you can store.
Ports allow movement of data
between a system and its external
devices.
System buses can be augmented by
dedicated I/O buses. PCI, Peripheral
Component Interconnect, is one such bus.
The number of times per second that the image on
the monitor is repainted is its refresh rate. The dot
pitch of a monitor tells us how clear the image is.
without modification.
Level 2: Machine Level
Also known as the Instruction Set Architecture (ISA) Level.
Consists of instructions that are particular to the
architecture of the machine.
Programs written in machine language need no compilers,
interpreters, or assemblers.
An I/O system
bottleneck.
This is a general
depiction of a von
Neumann system:
These computers
employ a fetch-
decode-execute cycle
to run programs as
follows . . .
Pipelining
Branch prediction
Data flow analysis
Speculative execution
Pipelining
For every clock cycle, one small step is carried out, and
the stages are overlapped.
Branch Prediction
Performance
Response time
How long it takes to do a task
Throughput
Total work done per unit time (tasks/transactions/…
per hour)
How are response time and throughput affected
by
Replacing the processor with a faster version?
Adding more processors?
We’ll focus on response time for now…
Relative Performance
CPU Clocking
CPU Time
Performance improved by
Reducing number of clock cycles
Increasing clock rate
Hardware designer must often trade off clock rate
against cycle count
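The trade-off between clock rate and cycle count can be sketched numerically. This is a minimal sketch, assuming the standard relation CPU time = clock cycles / clock rate with clock cycles = instruction count x CPI; the function name and the sample numbers are hypothetical and are not taken from the slides.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = (instruction count * CPI) / clock rate."""
    clock_cycles = instruction_count * cpi
    return clock_cycles / clock_rate_hz

# Hypothetical program: 10 billion instructions, CPI of 2.0, 2 GHz clock.
baseline = cpu_time(10e9, 2.0, 2e9)

# Hypothetical redesign: clock doubled to 4 GHz, but CPI rises to 2.4.
faster_clock = cpu_time(10e9, 2.4, 4e9)

print(f"baseline: {baseline:.1f} s, faster clock: {faster_clock:.1f} s")
```

Doubling the clock rate does not halve the CPU time here, because the cycle count grew at the same time.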
CPI Example
Performance Summary
Performance depends on
Algorithm: affects IC, possibly CPI
Programming language: affects IC, CPI
Compiler: affects IC, CPI
Instruction set architecture: affects IC, CPI, Tc
Amdahl’s Law
The 0 generation
The 1st generation
The 2nd generation
The 3rd generation
Evolution of Intel processors
Generation Zero
(1860 - 1929).
The Microprocessors
In 1971, Intel developed the 4004, the first chip to contain all
of the components of a CPU on a single chip. The 4004 can
add two 4-bit numbers and can multiply only by repeated
addition.
In 1972, Intel developed the 8008. This was the first 8-bit
1970s
1980s
1990s
Recent processors
RISC Machines
The underlying philosophy of RISC machines is that a system
is better able to manage program execution when the program
consists of only a few different instructions that are the same
length and require the same number of clock cycles to decode
and execute.
RISC systems access memory only with explicit load and
store instructions.
In CISC systems, many different kinds of instructions access
memory, making instruction length variable and fetch-decode-
execute time unpredictable.
The total clock cycles for the CISC version might be:
(2 movs × 1 cycle) + (1 mul × 30 cycles) = 32 cycles
While the clock cycles for the RISC version are:
(3 movs × 1 cycle) + (5 adds × 1 cycle) + (5 loops × 1 cycle) = 13 cycles
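The two cycle counts above can be checked with a short calculation. This is only a sketch: the helper name `total_cycles` is hypothetical, and the per-instruction cycle costs are the ones assumed in the slide.

```python
def total_cycles(instruction_mix):
    """Sum count * cycles over (count, cycles_per_instruction) pairs."""
    return sum(count * cycles for count, cycles in instruction_mix)

# CISC version: 2 movs at 1 cycle each, 1 mul at 30 cycles.
cisc = total_cycles([(2, 1), (1, 30)])

# RISC version: 3 movs, 5 adds, and 5 loop instructions at 1 cycle each.
risc = total_cycles([(3, 1), (5, 1), (5, 1)])

print(cisc, risc)
```

The RISC version executes more instructions (13 versus 3) but needs fewer cycles because every instruction takes one cycle.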
This is how
registers can be
overlapped in a
RISC system.
The current
window pointer
(CWP) points to
the active register
window.
Flynn’s Taxonomy
Chapter 2
COMPUTER
FUNCTION
Contents
Computer Components
What is a program
A sequence of steps
For each step, an
arithmetic or logical
operation is done
For each operation, a
different set of control
signals is needed
Also need temp storage
(memory) and way to get
input and output
Software
A sequence of codes or instructions
Part of the hardware interprets each instruction and generates
control signals
Provide a new sequence of codes for each new program instead
of rewiring the hardware
Major components:
CPU
Instruction interpreter
Module of general-purpose arithmetic and logic functions
I/O Components
Input module : Contains basic components for accepting data and
instructions and converting them into an internal form of signals usable
by the system
Output module : Means of reporting results
Memory
Memory address register (MAR)
• Specifies the address in memory for the next read or write
Memory buffer register (MBR)
• Contains the data to be written into memory or receives the data read from memory
Computer system
At the most basic level, a computer is a device consisting of
four parts:
A processor to interpret and execute programs
A memory to store both data and programs
A mechanism for transferring data to and from the outside
world.
Bus (interconnection among parts)
Instruction Cycle
Fetch Cycle
Execute Cycle
Explain
Interrupt
Classes of Interrupts
Software Interrupts
Interrupt processing
Interrupt Cycle
Program timing
Long IO wait
Multiple Interrupts
Sequential interrupt
Processing
Nested interrupt
Processing
Exception Table
[Figure: the exception table has one entry per exception number (0, 1, 2, ..., n-1); entry k points to the code for exception handler k.]
I/O Function
I/O Module
IO systems
IO peripherals
Questions to investigate:
How does the CPU communicate with I/O
devices?
How do I/O devices communicate with the CPU?
How to transmit data efficiently, without errors?
How to connect the I/O devices to the CPU?
IO addressing
According to address separation there are two
possibilities:
Separate I/O and Memory address space
Shared I/O and Memory address space
Alternative implementation
The CPU has a shared bus for the I/O and the
memory
A selector signal determines the target of the
communication
More cost effective (fewer wires)
Example: x86
Memory mapped IO
I/O Space
It is important to notice that these I/O addresses are
NOT memory-mapped addresses on the 80x86
machines.
Special instructions (IN/OUT)
Memory
Interconnection Structures
Computer modules
Computer is a network of basic modules.
There must be paths for connecting the modules.
The collection of paths connecting the various modules is called
the interconnection structure. The design of this structure will
depend on the exchanges that must be made among modules.
Computer Modules
Memory module:
A memory module will
consist of N words of equal
length.
Each word is assigned a
unique numerical address (0,
1, …, N - 1)
A word of data can be read
from or written into the
memory.
I/O module:
Function similar to memory.
An I/O module may send interrupt signals to the processor.
Processor
The processor reads in instructions and data, writes out data
after processing, and uses control signals to control the overall
operation of the system.
The processor also receives interrupt signals.
Types of transfers
Bus interconnection
System Bus
Bus Structure
Multiple-Bus Hierarchies
High-performance architecture
Data Bus
Address Bus
Control Bus
Used to control the access and the use of the data and
address lines
Because the data and address lines are shared by all
components there must be a means of controlling their
use
Control signals transmit both command and timing
information among system modules
Timing signals indicate the validity of data and address
information
Command signals specify operations to be performed
Bus Types
Dedicated (functional)
Separate data & address lines
Multiplexed (Time multiplexing)
Shared lines
Address valid or data valid control line
Advantage - fewer lines
Disadvantages
More complex control
Performance – cannot have address and data simultaneously on
bus
Dedicated (physical)
Bus connects subset of modules
Example: all I/O devices on a slow bus
Provides high throughput, but cost and complexity increase
Bus Arbitration
Timing
Synchronous diagram
Asynchronous Timing
Asynchronous diagram
Bus Width
Example of Bus
Address:
If I/O, a value between 0000H and FFFFH is issued.
If memory, it depends on the architecture:
20 bits (8086/8088)
24 bits (80286/80386SX)
25 bits (80386SL/SLC/EX)
32 bits (80386DX/80486/Pentium)
36 bits (Pentium Pro/II/III)
Data:
8 bits (8088)
16 bits (8086/80286/80386SX/SL/SLC/EX)
32 bits (80386DX/80486)
64 bits (Pentium/Pro/II/III)
Control:
Most systems have at least 4 control bus connections
(active low).
MRDC (Memory ReaD Control), MWTC (Memory WriTe Control),
IORC (I/O Read Control), IOWC (I/O Write Control)
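The address widths listed above determine how much memory each processor can address. This is a minimal sketch, assuming byte-addressable memory so that n address bits reach 2^n bytes; the function name is hypothetical.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given bus width."""
    return 2 ** address_bits

MIB = 2 ** 20
widths = [
    ("8086/8088", 20),
    ("80286/80386SX", 24),
    ("80386DX/80486/Pentium", 32),
    ("Pentium Pro/II/III", 36),
]
for cpu, bits in widths:
    print(f"{cpu}: {bits} bits -> {addressable_bytes(bits) // MIB} MiB")

# I/O addresses range from 0000H to FFFFH, i.e. a 16-bit I/O address.
print("I/O ports:", addressable_bytes(16))
```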
Chapter 3
COMPUTER MEMORY
Part 1
Contents
Hardware review
Overview of computer systems
CPU executes instructions; memory stores data
To execute an instruction, the CPU must:
fetch an instruction;
fetch the data used by the instruction; and, finally,
execute the instruction on the data…
which may result in writing data back to memory
[Figure: the CPU and memory are connected by a bus, which also connects disks, network, USB, etc.]
Introduction
Capacity
Capacity is expressed in terms of words or bytes.
Word size: the natural unit of organisation. Common word lengths are 8, 16, and 32 bits.
Number of words (or bytes)
Unit of Transfer
Internal: For internal memory, the unit of transfer is equal to
the number of data lines into and out of the memory module
External: For external memory, data is transferred in blocks,
which are larger than a word.
Addressable unit: the smallest location which can be uniquely
addressed (a word internally, a cluster on magnetic disks)
Access Method
Sequential access. Example: tape
Direct access: individual blocks or records have a
unique address based on location. Access is
accomplished by jumping (direct access) to the general
vicinity, plus a sequential search to reach the final
location. Example: disk
Random access. Example: RAM
Associative access. Example: cache
Performance
Access time
Memory Cycle time
Transfer Rate:
Physical Types
Semiconductor : RAM
Magnetic : Disk & Tape
Optical : CD & DVD
Others
Memory Hierarchy
[Figure: memory hierarchy. Surviving labels: on-chip L1 cache (SRAM), 1 ns; off-chip L2 cache (SRAM), 5-10 ns; 1-2 min at a lower level. Higher levels are smaller, faster, and costlier per byte.]
Examples
Memory hierarchy levels, from smaller, faster, and costlier (per byte) storage devices at the top to larger, slower, and cheaper (per byte) storage devices at the bottom:
L0: Registers. CPU registers hold words retrieved from the L1 cache.
L1: L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache.
L2: L2 cache (SRAM). L2 cache holds cache lines retrieved from the L3 cache (or from main memory when there is no L3 cache).
L3: L3 cache (SRAM). L3 cache holds cache lines retrieved from main memory.
L4: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L5: Local secondary storage (local disks).
L2 unified cache: 256 KiB, 8-way, access: 11 cycles
Main memory
Processor-Memory Gap
DRAM
7%/year
(2X/10yrs)
Example cache
[Figure: a smaller, faster, more expensive cache memory caches a subset of the blocks of main memory. Surviving block labels: 8, 4, 9, 14, 10, 3.]
Cache Memories
[Figure: the bus interface, the I/O bridge, and main memory.]
Cache size: C = S x E x B data bytes
S = 2^s sets
E = lines (blocks) per set
B = 2^b bytes per cache block (the data)
Each line: valid bit (v), tag, and data bytes 0, 1, 2, ..., B-1
Address of word: t bits (tag), s bits (set index), b bits (block offset)
Overview cache
• Cache: smaller, faster, more expensive memory. Caches a subset of the blocks (a.k.a. lines).
• Memory: larger and slower, partitioned into blocks (0 to 15 in the figure).
Hit: block b is in the cache (in the figure the cache holds blocks 7, 9, 14, and 3).
Miss: block b is not in the cache; it is fetched from memory and stored in the cache (block 12 in the figure).
• Placement policy: determines where b goes
• Replacement policy: determines which block gets evicted (victim)
Operations of cache
Addressing
Size
Mapping Function
Replacement Algorithm
Write Policy
Block Size
Number of Caches
Cache Addressing
Cache size
Mapping function
Cache of 64 KBytes
Cache block of 4 bytes, i.e. the cache is 16K (2^14)
lines of 4 bytes
16 MBytes main memory
24-bit address (2^24 = 16M)
Thus, for mapping purposes, we can consider
main memory to consist of 4M blocks of 4
bytes each.
Cache organization
Block Number
Direct Mapping
24-bit address: Tag 8 bits | Line 14 bits | Word 2 bits
2-bit word identifier (4-byte block)
22-bit block identifier
8-bit tag (= 22 - 14)
14-bit slot or line
No two blocks in the same line have the same Tag field
Check contents of cache by finding line and checking Tag
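The 8/14/2 split of the 24-bit address can be sketched with shifts and masks. This is an illustrative sketch only; the function name and the sample address are assumptions, not part of the slides.

```python
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2  # field widths from the slide

def split_address(addr):
    """Split a 24-bit address into (tag, line, word) for direct mapping."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word

# Hypothetical sample address.
tag, line, word = split_address(0x16339C)
print(f"tag={tag:02X} line={line:04X} word={word}")
```

A lookup uses `line` to select the cache line and then compares the stored tag with `tag`; only on a match is `word` used to pick the byte within the block.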
Cache line 1: main memory blocks 1, m+1, 2m+1, …, 2^s-m+1
…
Cache line m-1: main memory blocks m-1, 2m-1, 3m-1, …, 2^s-1
Direct
cache
example
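[Figure: direct-mapped cache (one line per set) with S = 2^s sets; each line has a valid bit (v), a tag, and an 8-byte block (bytes 0 to 7). For the address of an int, the set index bits (here 0…01) find the set, the t tag bits are compared with the line's tag, and the block offset (here 100) selects the data within the block.]
[Figure: direct-mapped cache simulation with sets 0 to 3, each holding a valid bit, tag, and block; surviving block labels: M[0-1], M[8-9], M[6-7].]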
Direct-Mapped Cache-example
Memory block addresses: 00 00 to 11 11 (16 blocks)
Cache:
Index  Tag
00     00
01     11
10     01
11     01
Hash function: (block address) mod (# of blocks in cache)
Each memory address maps to exactly one index in the cache
Fast (and simpler) to find an address
Direct-Mapped Cache Problem
8 = 00 10 00 and 24 = 01 10 00, read as tag (t), index (s), offset (k)
What happens if we access the following addresses?
8, 24, 8, 24, 8, …?
Conflict in cache (misses!)
Rest of cache goes unused
Solution?
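The conflict between addresses 8 and 24 can be reproduced with a small simulation. This is a minimal sketch, assuming the 4-line cache with 4-byte blocks implied by the 2-bit index and 2-bit offset above; the function name is hypothetical.

```python
def simulate_direct_mapped(addresses, num_lines=4, block_size=4):
    """Return 'hit' or 'miss' for each access to a direct-mapped cache."""
    tags = [None] * num_lines          # tag stored in each line (None = empty)
    results = []
    for addr in addresses:
        block = addr // block_size     # block address
        index = block % num_lines      # hash: block address mod number of lines
        tag = block // num_lines
        if tags[index] == tag:
            results.append("hit")
        else:
            results.append("miss")
            tags[index] = tag          # the new block evicts the old one
    return results

print(simulate_direct_mapped([8, 24, 8, 24, 8]))
```

Both addresses map to index 10 with different tags, so each access evicts the other block and every access misses, while the other three lines stay empty.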
Note that
Associative Mapping
Example
Tag: 22 bits | Word: 2 bits
Tag: 9 bits | Set: 13 bits | Word: 2 bits
[Figure: 2-way set associative cache; each set holds two lines, each with a valid bit (v), a tag, and an 8-byte block (bytes 0 to 7); the block offset selects the data within the matching line.]
No match:
One line in set is selected for eviction and replacement
Replacement policies: random, least recently used (LRU), …
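Example of 2 way
M=16 byte addresses, B=2 bytes/block,
S=2 sets, E=2 blocks/set
Address bits: t=2 (tag), s=1 (set index), b=1 (block offset)
[Figure: 2-way set associative cache simulation; set 0 holds tag 00 with M[0-1] and tag 10 with M[8-9]; set 1 holds tag 01 with M[6-7].]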
Replacement algorithm
LRU
8 page faults (* = hit, . = no change)
Memory page  B  E  E  R  B  A  R  E  B  E  A  R
1            B  .  .  .  *  .  .  E  .  *  .  .
2            .  E  *  .  .  A  .  .  B  .  .  R
3            .  .  .  R  .  .  *  .  .  .  A  .
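The LRU trace for the reference string B E E R B A R E B E A R can be checked with a short simulation. This is a minimal sketch, assuming 3 frames as in the slides; the function name is hypothetical.

```python
def lru_faults(references, num_frames):
    """Count page faults under least-recently-used replacement."""
    frames = []                  # resident pages, least recently used first
    faults = 0
    for page in references:
        if page in frames:
            frames.remove(page)  # hit: page will be re-appended as most recent
        else:
            faults += 1
            if len(frames) == num_frames:
                frames.pop(0)    # evict the least recently used page
        frames.append(page)
    return faults

print(lru_faults("BEERBAREBEAR", 3))
```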
LFU
7 page faults (a number is the page's use count after a hit, . = no change)
Memory page  B  E  E  R  B  A  R  E  B  E  A  R
1            B  .  .  .  2  .  .  .  3  .  .  .
2            .  E  2  .  .  .  .  3  .  4  .  .
3            .  .  .  R  .  A  R  .  .  .  A  R
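The LFU trace can be checked the same way. This is a minimal sketch, assuming 3 frames, a use count that starts at 1 when a page is loaded and is discarded when the page is evicted, and ties broken in favour of evicting the page loaded earliest; the tie-breaking rule and the function name are assumptions, since the slides do not state them.

```python
def lfu_faults(references, num_frames):
    """Count page faults under least-frequently-used replacement."""
    counts = {}      # resident page -> use count since it was loaded
    loaded_at = {}   # resident page -> time it was loaded (tie-breaker)
    faults = 0
    for time, page in enumerate(references):
        if page in counts:
            counts[page] += 1
            continue
        faults += 1
        if len(counts) == num_frames:
            # Evict the page with the smallest count; oldest load breaks ties.
            victim = min(counts, key=lambda p: (counts[p], loaded_at[p]))
            del counts[victim]
            del loaded_at[victim]
        counts[page] = 1
        loaded_at[page] = time
    return faults

print(lfu_faults("BEERBAREBEAR", 3))
```

On this reference string B and E build up high counts early, so the third frame alternates between R and A.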
Write Through
All write operations are made to main memory as well as to cache,
so main memory is always valid
Other CPUs monitor traffic to main memory to update their
caches when needed
This generates substantial memory traffic and may create a
bottleneck
Anytime a word in cache is changed, it is also changed in main
memory
Both copies always agree
Generates lots of memory writes to main memory
Multiple CPUs can monitor main memory traffic to keep local (to
CPU) cache up to date
Lots of traffic
Slows down writes
Remember bogus write through caches!
Write back
When an update occurs, an UPDATE bit associated
with that slot is set, so when the block is replaced it is
written back first
During a write, only change the contents of the cache
Update main memory only when the cache line is to be
replaced
Causes “cache coherency” problems -- different values
for the contents of an address are in the cache and the
main memory
Complex circuitry to avoid this problem
Accesses by I/O modules must occur through the cache
Chapter 3
COMPUTER MEMORY
Part 2
(Virtual memory)
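[Figure: physical addressing. The CPU sends a physical address (PA), here 4, directly to main memory (addresses 0 to M-1), which returns the data word.]
[Figure: virtual addressing. The CPU sends a virtual address (VA), here 4100, to the MMU on the CPU chip, which translates it to a physical address (PA), here 4, for main memory (addresses 0 to M-1); memory returns the data word.]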
Address Spaces
Time to load block from disk > 1ms (> 1 million clock cycles)
Consequences
Large page (block) size: typically 4 KB
Fully associative
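[Figure: memory-resident page table (DRAM) with entries PTE 0 to PTE 7; each entry holds a valid bit and a physical page number or disk address. Valid entries point to pages cached in physical memory (DRAM), which holds VP 1, VP 2, VP 7, and VP 4 in PP 0 to PP 3. Other entries are null or point to pages in virtual memory (disk): VP 1, VP 2, VP 3, VP 4, VP 6, VP 7.]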
Page Hit
Page Fault
[Figure: page fault. The virtual address references a virtual page whose page table entry (PTE) is not valid, so the page is on disk rather than in physical memory (DRAM).]
[Figure: handling the page fault. A victim page (VP 4 in PP 3) is evicted and replaced by the faulting page (VP 3), and the page table is updated.]
Allocating Pages
Concept real
and virtual
memory
Paging of Memory
Paging
Paging
protection level bit: kernel page or user page (more bits are
Sharing Pages
A page table will generally require several pages to be stored. One
solution is to organize page tables into a multilevel hierarchy
When 2 levels are used (ex: 386, Pentium), the page number is split into
two numbers p1 and p2
p1 indexes the outer page table (directory) in main memory, whose entries
point to a page containing page table entries, which is itself indexed by p2.
Page tables, other than the directory, are swapped in and out as needed
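The two-level lookup can be sketched as follows. This is a minimal sketch, assuming the 386-style split of a 32-bit address into a 10-bit p1, a 10-bit p2, and a 12-bit offset (4 KB pages); the dictionaries stand in for the directory and page tables, and all names and sample values are hypothetical.

```python
P1_BITS, P2_BITS, OFFSET_BITS = 10, 10, 12  # assumed 386-style split

def split_virtual_address(va):
    """Split a 32-bit virtual address into (p1, p2, offset)."""
    offset = va & ((1 << OFFSET_BITS) - 1)
    p2 = (va >> OFFSET_BITS) & ((1 << P2_BITS) - 1)
    p1 = (va >> (OFFSET_BITS + P2_BITS)) & ((1 << P1_BITS) - 1)
    return p1, p2, offset

def translate(va, directory):
    """directory maps p1 -> page table; a page table maps p2 -> frame number."""
    p1, p2, offset = split_virtual_address(va)
    page_table = directory.get(p1)
    if page_table is None:
        raise LookupError("page table not present (must be swapped in)")
    frame = page_table.get(p2)
    if frame is None:
        raise LookupError("page fault")
    return (frame << OFFSET_BITS) | offset

# Hypothetical mapping: directory entry 1, page table entry 2 -> frame 5.
directory = {1: {2: 0x5}}
print(hex(translate(0x00402034, directory)))
```

Only the directory has to stay resident; a missing inner page table is reported separately from a missing page, mirroring the point that inner tables can be swapped out.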
Segmentation
Segmentation
Typically, each process has its own segment table
Similarly to paging, each segment table entry contains a
present bit and a modified bit
If the segment is in main memory, the entry contains the
starting address and the length of that segment
Other control bits may be present if protection and
sharing is managed at the segment level
Logical to physical address translation is similar to
paging except that the offset is added to the starting
address (instead of being appended)
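Segment translation differs from paging in that the offset is added to a base and checked against a length. This is a minimal sketch of that lookup; the table layout, field names, and sample values are hypothetical.

```python
def translate_segment(segment_table, segment, offset):
    """Translate (segment, offset) to a physical address using base + offset."""
    entry = segment_table[segment]
    if not entry["present"]:
        raise LookupError("segment not in main memory")
    if offset >= entry["length"]:
        raise ValueError("offset beyond end of segment")
    return entry["base"] + offset   # added, not appended as in paging

# Hypothetical per-process segment table.
segment_table = {
    0: {"present": True, "base": 0x4000, "length": 0x0800},
    1: {"present": False, "base": 0, "length": 0},
}
print(hex(translate_segment(segment_table, 0, 0x01F4)))
```

The length check is what lets a segment grow or shrink: only the table entry has to change.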
Segmentation: comments
In each segment table entry we have both the starting address and
length of the segment
the segment can thus dynamically grow or shrink as needed
Supervisor/User bit
gives the starting address of the page table for that segment
a page number: used to index that page table to obtain the
The Segment Base is the physical address of the page table of that
segment
Present and modified bits are present only in page table entry
Protection and sharing info most naturally resides in segment table
entry
Ex: a read-only/read-write bit, a kernel/user bit...
Introduction
If the valid bit is zero in the page table entry for the logical
address, this means that the page is not in memory and must be
fetched from disk.
This is a page fault.
If necessary, a page is evicted from memory and is replaced
by the page retrieved from disk, and the valid bit is set to 1.
If the valid bit is 1, the virtual page number is replaced by the
physical frame number.
The data is then accessed by adding the offset to the physical
frame number.
A virtual address has 13 bits (8K = 2^13) with 3 bits for the page field
and 10 for the offset, because the page size is 1024.
A physical memory address requires 12 bits, the first two bits for the
page frame and the trailing 10 bits for the offset.
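The translation for this 13-bit example can be sketched as follows. This is a minimal sketch, assuming a page table of (valid bit, frame number) pairs; the table contents and the sample address are hypothetical.

```python
PAGE_BITS, OFFSET_BITS = 3, 10   # 13-bit virtual address, 1024-byte pages

def translate_small(va, page_table):
    """Translate a 13-bit virtual address to a 12-bit physical address."""
    page = va >> OFFSET_BITS
    offset = va & ((1 << OFFSET_BITS) - 1)
    valid, frame = page_table[page]
    if not valid:
        raise LookupError("page fault: page must be fetched from disk")
    # The frame number replaces the page number; the offset is unchanged.
    return (frame << OFFSET_BITS) | offset

# Hypothetical page table: 8 virtual pages, 4 physical frames.
page_table = [(1, 2), (0, 0), (0, 0), (1, 0), (1, 1), (0, 0), (0, 0), (1, 3)]

va = 0b100_0101010011            # page 4, offset 339
print(translate_small(va, page_table))
```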
Virtual memory
Real-World Example
Chapter 3
COMPUTER MEMORY
Part 3
Internal memory
Byte Ordering
Big-Endian: most significant byte at the lowest address (surviving figure labels: 01 23 45 67; a1 b2 c3 d4)
Byte Ordering Examples
Decimal: 12345
Binary: 0011 0000 0011 1001
Hex: 3 0 3 9
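The two byte orders for the value 12345 (hex 3039) can be shown with Python's built-in integer conversions. This is an illustrative sketch; the choice of a 2-byte width is an assumption.

```python
value = 12345                            # 0x3039

big = value.to_bytes(2, "big")           # most significant byte first
little = value.to_bytes(2, "little")     # least significant byte first

print(big.hex(), little.hex())

# Reading the bytes back with the wrong order gives a different number.
print(int.from_bytes(little, "big"))
```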
[Figure: memory read transaction. The CPU (register file, ALU, bus interface) places address A on the bus through the I/O bridge; main memory reads A and returns word x, which the CPU copies into %rax.]
[Figure: memory write transaction. The CPU places address A on the bus, followed by the value y from %rax; main memory stores y at address A.]
Physical types
Semiconductor
RAM
Magnetic
Disk & Tape
Optical
CD & DVD
Others
Bubble
Hologram
Nonvolatile Memories
DRAM and SRAM are volatile memories
Lose information if powered off.
Nonvolatile memories retain value even if powered off
Read-only memory (ROM): programmed during production
Programmable ROM (PROM): can be programmed once
Erasable PROM (EPROM): can be bulk erased (UV, X-Ray)
Electrically erasable PROM (EEPROM): electronic erase capability
Flash memory: EEPROMs with partial (block-level) erase capability
Wears out after about 100,000 erasings
Uses for Nonvolatile Memories
Firmware programs stored in a ROM (BIOS, controllers for disks, network
cards, graphics accelerators, security subsystems,…)
Solid state disks (replace rotating disks in thumb drives, smart phones, mp3
players, tablets, laptops,…)
Disk caches
DRAM
Static RAM
SRAM v DRAM
Both volatile
Power needed to preserve data
Dynamic cell
Simpler to build, smaller
More dense
Less expensive
Needs refresh
Larger memory units
Static
Faster
Cache
Chip Organization
DRAM bank
Structure:
DRAM cells in a 2D grid
Each cell in a row shares
the same word line
Each cell in a column
shares the same bit line
Reading:
The row decoder selects (activates) a row
The sense amplifiers detect and store the bits of the row
The column multiplexer selects the desired column from the row
Two-phase operations:
To reduce the width of the address bus
Address bus: row address → wait → address bus: column
address→ data bus: the desired data
16 X 1 as 4 X 4 Array
Two decoders
Row
Column
Address just broken
up
Not visible from
outside
d x w DRAM:
d x w total bits organized as d supercells of size w bits
Memory Modules
Enhanced DRAMs
Basic DRAM cell has not changed since its invention in
1966
Commercialized by Intel in 1970
DRAMs with better interface logic and faster I/O:
Synchronous DRAM (SDRAM)
Uses a conventional clock signal instead of asynchronous control
Allows reuse of the row addresses (e.g., RAS, CAS, CAS, CAS)
Double data-rate synchronous DRAM (DDR SDRAM)
DDR1 : twice as fast
DDR2 : four times as fast
DDR3 : eight times as fast
Enhanced DRAMs
DRAM chips
Organisation in detail
Packaging
Memory Interleaving
Goal: Try to take advantage of bandwidth of multiple
DRAMs in memory system
Memory address A is converted into (b,w) pair, where
b = bank index
w = word index within bank
Logically a wide memory
Accesses to B banks staged over time to share internal resources
such as memory bus
Interleaving can be on
Low-order bits of address (cyclic)
b = A mod B, w = A div B
High-order bits of address (block)
Combination of the two (block-cyclic)
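The two basic address-to-bank mappings can be sketched as follows. This is a minimal sketch; the function names and the sample sizes (4 banks, 8 words per bank) are hypothetical.

```python
def low_order_interleave(address, num_banks):
    """Cyclic: b = A mod B, w = A div B. Consecutive addresses hit different banks."""
    return address % num_banks, address // num_banks

def high_order_interleave(address, words_per_bank):
    """Block: the high-order bits pick the bank, so each bank holds a contiguous range."""
    return address // words_per_bank, address % words_per_bank

for a in range(4):
    print(a, low_order_interleave(a, 4), high_order_interleave(a, 8))
```

With low-order interleaving a sequential access stream is spread over all banks, which is what lets their accesses overlap in time.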
Mixed Interleaving
Chapter 3
COMPUTER MEMORY
Part 4
Storage devices
First HDD:
1956, IBM (RAMAC 305)
Features:
Weight: 1 ton
50 double sided disks, 24" each
Two read/write heads
100 tracks/disk
Access time: 1s
Capacity: 5 million 7-bit characters
Microdrive
2006: 1", 8 GB capacity
Disk geometry
Disk capacity
40,000 tracks/surface
2 surfaces/platter
5 platters/disk
Disk operation
Data organization
Data units
We can only read and write blocks (and not individual
bytes)
Sector system
Fixed data units – sectors (typically 512 bytes)
Advantage: easier to handle, the free space is not fragmented
Issue: the operating system has to map the files of various
sizes to the fixed sized sectors
Components of a sector:
Identifying a sector
Example
Given:
rotational rate = 7,200 RPM
average seek time = 9 ms
avg # sectors/track = 600
Derived:
T_avg rotation = 1/2 x (60 secs/7200 RPM) x 1000 ms/sec ≈ 4 ms
T_avg transfer = 60/7200 RPM x 1/600 sects/track x 1000 ms/sec ≈ 0.014 ms
T_access = 9 ms + 4 ms + 0.014 ms ≈ 13 ms
Important points:
access time dominated by seek time and rotational latency
first bit in a sector is the most expensive, the rest are free
SRAM access time is about 4 ns/doubleword, DRAM about 60
ns
disk is about 40,000 times slower than SRAM
2,500 times slower than DRAM
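The access-time calculation above can be reproduced with a few lines of code. This is a minimal sketch using the given figures (7,200 RPM, 9 ms average seek, 600 sectors per track); the function name is hypothetical.

```python
def disk_access_time_ms(rpm, avg_seek_ms, sectors_per_track):
    """Average time to access one sector: seek + rotational latency + transfer."""
    ms_per_rotation = 60_000 / rpm
    avg_rotation = ms_per_rotation / 2                   # half a turn on average
    avg_transfer = ms_per_rotation / sectors_per_track   # one sector passes the head
    return avg_seek_ms + avg_rotation + avg_transfer

total = disk_access_time_ms(rpm=7200, avg_seek_ms=9, sectors_per_track=600)
print(f"{total:.2f} ms")
```

The transfer term is roughly three orders of magnitude smaller than the other two, which is why seek time and rotational latency dominate.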
IO Bus
Advantages
no moving parts → faster, less power, more rugged
Disadvantages
have the potential to wear out
mitigated by “wear-leveling logic” in flash translation layer
e.g. Intel X25 guarantees 1 petabyte (10^15 bytes) of random
writes before they wear out
in 2010, about 100 times more expensive per byte
in 2017, about 6 times more expensive per byte
Applications
smart phones, laptops
Apple “Fusion” drives
RAID
RAID 0
RAID 1
Mirrored Disks, provides 100% redundancy, and good
performance.
Data is striped across disks
2 copies of each stripe on separate disks
Read from either
Write to both
Recovery is simple
Swap faulty disk & re-mirror
No down time
Expensive
Two matched sets of disks contain the same data.
The disadvantage of RAID 1 is cost.
RAID 2
RAID 3
Similar to RAID 2
Only one redundant disk, no matter how large the array
Simple parity bit for each set of corresponding bits
Data on failed drive can be reconstructed from surviving
data and parity info
Very high transfer rates
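Parity-based reconstruction can be sketched with XOR. This is a minimal sketch, assuming the parity is the XOR of the corresponding bits on every data disk (computed here a byte at a time); the function names and sample data are hypothetical.

```python
from functools import reduce
from operator import xor

def parity_block(blocks):
    """XOR the corresponding bytes of equally sized blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

def reconstruct(surviving_blocks, parity):
    """Rebuild the one missing block from the survivors and the parity."""
    return parity_block(list(surviving_blocks) + [parity])

# Hypothetical stripe across three data disks.
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = parity_block(data)

# Disk 1 fails; its block is recovered from disks 0 and 2 plus the parity.
print(reconstruct([data[0], data[2]], parity) == data[1])
```

XOR-ing the survivors with the parity cancels every block except the missing one, so a single redundant disk is enough regardless of the array size.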
RAID 4
Each disk operates independently
Good for high I/O request rate
Large stripes
Bit by bit parity calculated across stripes on each disk
Parity stored on parity disk
RAID Level 4 is like adding parity disks to RAID 0.
Data is written in blocks across the data disks, and a parity block is
written to the redundant drive.
RAID 4 would be feasible if all record blocks were the same
size, such as audio/video data.
Poor performance, no commercial implementation of RAID
RAID 5
Like RAID 4
RAID 6
Optical Disks
CD-ROM Format
Chapter 4
THE CENTRAL
PROCESSING UNIT
Language levels
High-Level Languages
Characteristics
Portable : to varying
degrees
Complex : one statement
can do much work
Expressive
Human readable
Machine languages
Characteristics
Not portable : specific to
hardware
Simple : each instruction
does a simple task
Not expressive : each
instruction performs little
work
Not human readable :
requires lots of effort,
requires tool support
Assembly languages
Characteristics
Not portable : each
assembly language
instruction maps to one
machine language
instruction
Simple : each
instruction does a
simple task
Not expressive
Human readable
Pros
X86-64 is popular
CourseLab computers are x86-64 computers
Program natively on CourseLab instead of using an emulator
Cons
X86-64 assembly language is big
Each instruction is simple, but…
There are many instructions
Instructions differ widely
Includes
CPU : CU, ALU,
Registers
Memory : RAM
RAM
Intel Microprocessors
64-bit Processors
Intel64
64-bit linear address space
Intel: Pentium Extreme, Xeon, Celeron D, Pentium D,
Core 2, and Core i7
IA-32e Mode
Compatibility mode for legacy 16- and 32-bit
applications
64-bit Mode uses 64-bit addresses and operands
General-Purpose Registers
Used primarily for arithmetic and data movement
mov eax, 10 ; move constant 10 into register eax
Specialized uses of Registers
EAX – Accumulator register
Automatically used by multiplication and division instructions
ECX – Counter register
Automatically used by LOOP instructions
ESP – Stack Pointer register
Used by PUSH and POP instructions, points to top of stack
ESI and EDI – Source Index and Destination Index register
Used by string instructions
EBP – Base Pointer register
Used to reference parameters and local variables on the stack
EFLAGS Register
Status Flags
Status of arithmetic and logical operations
Control and System flags
Control the CPU operation
Programs can set and clear individual bits in the EFLAGS
register
Status Flags
Carry Flag
Set when unsigned arithmetic result is out of range
Overflow Flag
Set when signed arithmetic result is out of range
Sign Flag
Copy of sign bit, set when result is negative
Zero Flag
Set when result is zero
Auxiliary Carry Flag
Set when there is a carry from bit 3 to bit 4
Parity Flag
Set when parity is even
Least-significant byte in result contains even number of 1s
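The flag definitions above can be checked with a small simulation. This is a minimal sketch, assuming an 8-bit addition; the function name `add8_flags` is hypothetical and only models the six status flags listed, not the real EFLAGS register.

```python
def add8_flags(a, b):
    """Add two 8-bit values and derive the status flags from the result."""
    raw = a + b
    result = raw & 0xFF  # keep only the low 8 bits
    flags = {
        "CF": int(raw > 0xFF),                                 # unsigned result out of range
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed result out of range
        "SF": (result >> 7) & 1,                               # copy of the sign bit
        "ZF": int(result == 0),                                # result is zero
        "AF": int(((a & 0xF) + (b & 0xF)) > 0xF),              # carry from bit 3 to bit 4
        "PF": int(bin(result).count("1") % 2 == 0),            # even number of 1s
    }
    return result, flags


result, flags = add8_flags(0x7F, 0x01)
print(hex(result), flags)
```

Adding 7Fh + 01h gives 80h: the unsigned result fits (CF = 0), but the signed result does not (OF = 1).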
X86 – 64 Registers
RSP Register
RIP Register
Special-purpose register…
RIP (Instruction Pointer) register
Stores the location of the next instruction
Address (in TEXT section) of machine-language instructions to be
executed next
Value changed:
Automatically to implement sequential control flow
By jump instructions to implement selection, repetition
Memory Segmentation
Memory segmentation is necessary because 20-bit memory
addresses cannot fit in the 16-bit CPU registers
Since the registers of the original x86 (8086) are 16 bits wide, a memory segment is made of
$2^{16}$ consecutive bytes (i.e. 64 KB)
Each segment has a number identifier that is also a 16-bit number
(i.e. segments are numbered from 0 to FFFFh)
A memory location within a memory segment is referenced by
specifying its offset from the start of the segment. Hence the first
byte in a segment has an offset of 0 while the last one has an offset
of FFFFh
To reference a memory location, its logical address has to be
specified. The logical address is written as:
Segment number:offset
For example, A43F:3487h means offset 3487h within segment
A43Fh.
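A logical address can be turned into a 20-bit physical address. The sketch below assumes the 8086 real-mode rule (segment number shifted left by 4 bits, plus the offset), which the slide does not state explicitly; the function name `physical_address` is hypothetical.

```python
def physical_address(segment, offset):
    """Real-mode translation: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


addr = physical_address(0xA43F, 0x3487)
print(f"A43F:3487h -> {addr:05X}h")

# Different segment:offset pairs can name the same byte (segments overlap).
print(physical_address(0xA787, 0x0007) == addr)
```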
Program Segments
Segment Overlap
Upper 13 bits of
segment selector are
used to index the
descriptor table
TI = Table Indicator
Select the descriptor table
0 = Global Descriptor Table
1 = Local Descriptor Table
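The selector layout can be sketched as a bit-field decode. The upper 13 bits and the TI bit come from the slide; treating the lowest 2 bits as the requested privilege level (RPL) is an assumption added here from the x86 protected-mode format, and `decode_selector` is a hypothetical helper.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into (index, table, rpl)."""
    index = selector >> 3          # upper 13 bits index the descriptor table
    ti = (selector >> 2) & 1       # table indicator
    rpl = selector & 0b11          # assumed: low 2 bits are the privilege level
    table = "LDT" if ti else "GDT"
    return index, table, rpl


print(decode_selector(0x0023))
print(decode_selector(0x000F))
```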
Base Address
32-bit number that defines the starting location of the segment
32-bit Base Address + 32-bit Offset = 32-bit Linear Address
Segment Limit
20-bit number that specifies the size of the segment
The size is specified either in bytes or multiple of 4 KB pages
Using 4 KB pages, segment size can range from 4 KB to 4 GB
Access Rights
Whether the segment contains code or data
Whether the data can be read-only or read & written
Privilege level of the segment to protect its access
Paging
Paging – cont’d
The operating
system uses
An assembly statement
3 essentials: opcode, operands
(dest, src)
E.g.: add a, b, c => c = a + b
AT&T Syntax
Example :
Chapter 5
Data representation
Computer Arithmetic
Contents
Number system
Digital Number System
Decimal, Binary, and Hexadecimal
Base Conversion
Binary Encoding
IEC Prefixes
Base Conversion
Use the formula:
$N_S = C_n S^n + C_{n-1} S^{n-1} + C_{n-2} S^{n-2} + \dots + C_0 S^0 + C_{-1} S^{-1} + \dots$
Or
$N_S = \sum_i C_i S^i$
In which:
$0 \le C_i \le S - 1$
i is the position of the i-th digit; i = 0 is the first digit in
front of the decimal point
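From B – D
Example : $1001_2$
$1001_2 = 1 \times 2^3 + 0 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 9$
From O – D
Example : $162.43_8$
$162.43_8 = 1 \times 8^2 + 6 \times 8^1 + 2 \times 8^0 + 4 \times 8^{-1} + 3 \times 8^{-2}$
From H – D
Example : $1E4A.6B_{16}$
$1 \times 16^3 + E \times 16^2 + 4 \times 16^1 + A \times 16^0 + 6 \times 16^{-1} + B \times 16^{-2}$
$= 1 \times 16^3 + 14 \times 16^2 + 4 \times 16^1 + 10 \times 16^0 + 6 \times 16^{-1} + 11 \times 16^{-2}$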
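The positional formula can be evaluated mechanically. This is a minimal sketch, assuming digits 0-9 and A-F and a single optional point; `to_decimal` is a hypothetical name, and `Fraction` is used so the fractional digits stay exact.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"


def to_decimal(text, base):
    """Sum digit * base**position over the digits on both sides of the point."""
    whole, _, frac = text.upper().partition(".")
    value = Fraction(0)
    for position, digit in enumerate(reversed(whole)):    # positions 0, 1, 2, ...
        value += DIGITS.index(digit) * Fraction(base) ** position
    for position, digit in enumerate(frac, start=1):      # positions -1, -2, ...
        value += DIGITS.index(digit) * Fraction(base) ** -position
    return value


for text, base in [("1001", 2), ("162.43", 8), ("1E4A.6B", 16)]:
    print(text, "base", base, "=", float(to_decimal(text, base)))
```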
Examples :
Examples
Division Quotient Remainder
29/2 14 1
14/2 7 0
7/2 3 1
3/2 1 1
1/2 0 1
Reading the remainders from bottom to top: $29_{10} = 11101_2$
$13_{10} = 1101_2$
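The repeated-division table corresponds to a short loop. A minimal sketch for non-negative integers; the name `to_binary` is hypothetical.

```python
def to_binary(n):
    """Divide by 2 until the quotient is 0, then read the remainders backwards."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)       # quotient and remainder of one table row
        remainders.append(str(r))
    return "".join(reversed(remainders))


print(to_binary(29), to_binary(13))
```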
Convert 0b100110110101101
How many digits?
Pad:
Substitute:
Example: 3E8 H – B
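The pad-and-substitute steps can be sketched in code. This is an illustrative helper pair (`bin_to_hex` and `hex_to_bin` are hypothetical names): pad the bit string to a multiple of 4, then replace each 4-bit group with one hex digit, or the reverse.

```python
HEX = "0123456789ABCDEF"


def bin_to_hex(bits):
    """Pad on the left to a multiple of 4 bits, then substitute one hex digit per group."""
    width = -(-len(bits) // 4) * 4                 # round the length up to a multiple of 4
    padded = bits.zfill(width)
    groups = [padded[i:i + 4] for i in range(0, width, 4)]
    return "".join(HEX[int(group, 2)] for group in groups)


def hex_to_bin(digits):
    """Substitute each hex digit with its 4-bit pattern."""
    return " ".join(format(int(d, 16), "04b") for d in digits)


print(bin_to_hex("100110110101101"))
print(hex_to_bin("3E8"))
```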
Base Comparison
Why does all of this matter?
Humans think about numbers in base 10, but computers “think” about numbers in base 2
Binary encoding is what allows computers to do all of the amazing things that they do!
You should have this table memorized by the end of the class
Might as well start now!
Base 10 Base 2 Base 16
0 0000 0
1 0001 1
2 0010 2
3 0011 3
4 0100 4
5 0101 5
6 0110 6
7 0111 7
8 1000 8
9 1001 9
10 1010 A
11 1011 B
12 1100 C
13 1101 D
14 1110 E
15 1111 F
Electronic implementation
Easy to store with bi-stable elements
Reliably transmitted on noisy and inaccurate wires
[Figure: voltage waveform for the bit sequence 0 1 0, with levels marked at 0.0V, 0.5V, 2.8V and 3.3V]
Other bases possible, but not yet viable:
DNA data storage (base 4: A, C, G, T) is a hot topic
Quantum computing
Binary Encoding
So What’s It Mean?
Numerical Encoding
Binary Encoding
Multiply :
0x0=0
0x1=0
1x0=0
1x1=1
Examples:
Signed magnitude
Comments
One’s complement
Two’s complement
11111100
Rules:
MSB is the sign bit : 0 – Positive, 1 – Negative
The other bits represent the value of a positive number, or the one’s
complement of a negative number
With n bits, values range from $-(2^{n-1} - 1)$ to $2^{n-1} - 1$
Example : using 6 bits
17 : 010001 26 : 011010
-17 : 101110 -26 : 100101
Rules:
MSB is the sign bit : 0 – Positive, 1 – Negative
The other bits represent the value of a positive number, or the two’s
complement of a negative number
With n bits, values range from $-2^{n-1}$ to $2^{n-1} - 1$
Examples:
17 : 010001 26 : 011010
-17 : 101111 -26 : 100110
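The 6-bit examples for both representations can be reproduced with bit masks. A minimal sketch; the function names are hypothetical and no range checking is done.

```python
def ones_complement(value, bits):
    """Positive values are plain binary; negative values invert every bit of |value|."""
    mask = (1 << bits) - 1
    if value >= 0:
        return format(value, f"0{bits}b")
    return format(~(-value) & mask, f"0{bits}b")


def twos_complement(value, bits):
    """Two's complement is the value reduced modulo 2**bits."""
    mask = (1 << bits) - 1
    return format(value & mask, f"0{bits}b")


for n in (17, -17, 26, -26):
    print(n, ones_complement(n, 6), twos_complement(n, 6))
```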
Sign Extension
Examples
Example
1 (carry out of decade S0)
28 0010 1000
+ 19 0001 1001
0100 0001 (raw binary sum)
+ 0110 (correction of S0)
47 0100 0111
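The +6 correction can be sketched for one packed-BCD byte (two decimal digits). This is an illustrative model, not a hardware description; `bcd_add` is a hypothetical name and inputs are assumed to be valid BCD.

```python
def bcd_add(a, b):
    """Add two packed-BCD bytes; return (carry_out, packed BCD result)."""
    low = (a & 0x0F) + (b & 0x0F)
    if low > 9:
        low += 6                   # skip the six unused codes 1010..1111
    carry = low >> 4               # carry into the next decade
    low &= 0x0F

    high = (a >> 4) + (b >> 4) + carry
    if high > 9:
        high += 6
    carry_out = high >> 4
    high &= 0x0F
    return carry_out, (high << 4) | low


carry, result = bcd_add(0x28, 0x19)
print(carry, format(result, "02X"))
```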
$6.02_{10} \times 10^{23}$
[Figure labels: mantissa (6.02), exponent (23), decimal point, radix (base) 10]
$1.01_2 \times 2^{-1}$
[Figure labels: mantissa (1.01), exponent (-1)]
31 30 23 22 0
S E M
1 bit 8 bits 23 bits
31 30 23 22 0
S E M
1 bit 8 bits 23 bits
$(-1)^S \times (1.M) \times 2^{(E - \text{bias})}$
Note the implicit 1 in front of the M bit vector
Example: 0b 0011 1111 1100 0000 0000 0000 0000 0000
is read as $1.1_2 = 1.5_{10}$, not $0.1_2 = 0.5_{10}$
Gives us an extra bit of precision
Mantissa “limits”
Low values near M = 0b0…0 are close to $2^{Exp}$
High values near M = 0b1…1 are close to $2^{Exp+1}$
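The field layout and formula can be checked against the example bit pattern. This sketch assumes single precision with a bias of 127 and handles only normalized values (E not 0 and not 255); `decode_float32` is a hypothetical name, and `struct` is used only to cross-check the result.

```python
import struct


def decode_float32(bits):
    """Split a 32-bit pattern into S, E, M and apply (-1)^S * 1.M * 2^(E - 127)."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    value = (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)
    return sign, exponent, mantissa, value


bits = 0b0011_1111_1100_0000_0000_0000_0000_0000
print(decode_float32(bits))

# Cross-check against the machine's own interpretation of the same 4 bytes.
print(struct.unpack(">f", struct.pack(">I", bits))[0])
```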
Examples
3. Mantissa: 0101
5. Sign bit is 0
Examples
3. Mantissa: 0011
5. Sign bit is 1
Examples
3. Mantissa: 01001000010101000000000
5. Sign bit is 1
Examples
3. Mantissa: 10100000000000000000000
5. Sign bit is 0
Examples
Examples
Chapter 6
ASSEMBLY
LANGUAGE
Intel Pentium 4
C Language
Intel Core i7
Program
A
GCC x86-64 AMD Ryzen
CPU
PC
Memory
Registers
Instructions
What instructions are available? What do they do?
How are they encoded?
Registers
How many registers are there?
How wide are they?
Memory
How do you specify a memory location?
Mainstream ISAs
[Figure: CPU (registers, PC, condition codes) connected to memory (code, data, stack) by addresses, data and instructions]
Programmer-visible state
PC: the Program Counter (rip in x86-64)
Address of next instruction
Named registers
Together in “register file”
Heavily used program data
Condition codes
Store status information about most recent
arithmetic operation
Used for conditional branching
Memory
Byte-addressable array
Code and user data
Includes the Stack (for
supporting procedures)
general purpose:
eax ax ah al accumulate
ecx cx ch cl counter
edx dx dh dl data
ebx bx bh bl base
What is an Assembler?
Our platform
Character Set
Letters a..z A..Z
Digits 0..9
Special characters ? _ @ $ . ~
NASM (unlike most assemblers) is case-sensitive
with respect to labels and variables
It is not case-sensitive with respect to keywords,
mnemonics, register names, directives, etc.
Literals
Integers
Numeric digits (including A..F) with no decimal point
may include radix specifier at end:
b or y binary
d decimal
h hexadecimal
q octal
Examples
200 decimal (default)
200d decimal
200h hex
200q octal
10110111b binary
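The suffix rule can be illustrated with a small parser. This is a simplified sketch covering only the suffix forms listed above, not NASM's full literal grammar; `parse_literal` is a hypothetical name.

```python
RADIX = {"b": 2, "y": 2, "d": 10, "h": 16, "q": 8}


def parse_literal(text):
    """Return the integer value of a literal with an optional radix suffix."""
    text = text.lower()
    suffix = text[-1]
    if suffix in RADIX:
        return int(text[:-1], RADIX[suffix])   # strip the suffix, convert in its radix
    return int(text, 10)                       # no suffix: decimal by default


for literal in ["200", "200d", "200h", "200q", "10110111b"]:
    print(literal, "=", parse_literal(literal))
```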
NASM Syntax
In order to refer to the contents of a memory location, use square
brackets.
In order to refer to the address of a variable, leave them out, e.g.,
mov eax, bar ;Refers to the address of bar
mov eax, [bar] ;Refers to the contents of bar
No need for the OFFSET directive.
NASM does not support the hybrid syntaxes such as:
mov eax,table[ebx] ;ERROR
mov eax,[table+ebx] ;O.K
mov eax,[es:edi] ;O.K
NASM does NOT remember variable types:
data dw 0 ;Data type defined as word (dw = 2 bytes).
mov [data], 2 ;Doesn’t work.
mov word [data], 2 ;O.K
Statements
Syntax:
[label[:]] [mnemonic] [operands] [;comment]
[ ] indicates optionality
Note that all parts are optional, so blank lines are legal
[label] can also be [name]
Variable names are used in data definitions
Labels are used to identify locations in code
Statements are free form; they need not be formed
into columns
Statement must be on a single line, max 128 chars
Example:
L100: add eax, edx ; add subtotal to total
Labels often appear on a separate line for code
clarity:
L100:
add eax, edx ; add subtotal to total
Type of statements
1. Directives
limit EQU 100 ; defines a symbol limit
%define limit 100 ; like C #define
2. Data Definitions
msg db 'Welcome to Assembler!'
db 0Dh, 0Ah
count dd 0
mydat dd 1,2,3,4,5
resd 100 ; reserves 400 bytes
3. Instructions
mov eax, ebx
add ecx, 10
Directives
Defines a symbol
Including files
%include "some_file"
If you know the C preprocessor, these are the
same ideas as
#define SIZE 100 or #include "stdio.h"
Data formats
Example : L8 db 0, 1, 2, 3
Examples
mov al , [L2] ;move a byte at L2 to al
mov eax, L2 ;move the address of L2 to eax
mov [L1], ah ;move ah to the byte pointed to by L1
mov eax, dword 5
add [L2], eax ;double word at L2 containing [L2]+eax
mov [L2], 1 ;does not work, why?
mov dword [L2], 1 ;works, why?
NASM directives
Examples using $
Example
Uninitialized Data
Program structure
SECTION .data ;data section
msg: db "Hello World",10 ;the string to print 10=newline
len: equ $-msg ;len is value, not an addr.
SECTION .text ;code section
global main ;for linker
main: ;standard gcc entry point
mov edx, len ;arg3, len of str. to print
mov ecx, msg ;arg2, pointer to string
mov ebx, 1 ;arg1, write to screen
mov eax, 4 ;write sysout command to int 80 hex
int 0x80 ;interrupt 80 hex, call kernel
mov ebx, 0 ;exit code, 0=normal
mov eax, 1 ;exit command to kernel
int 0x80 ;interrupt 80 hex, call kernel
Program layout
Consists of 3 parts:
Text
Data
Bss
; include directives
segment .data
; DX directives
segment .bss
; RESX directives
segment .text
global asm_main
asm_main:
; instructions
segment .text
global asm_main
asm_main:
enter 0,0 ;setup
pusha ;save all registers
;put your code here
popa ;restore all registers
mov eax, 0 ;return value
leave
ret
; include directives
segment .data
; DX directives
segment .bss
; RESX directives
segment .text
global asm_main
asm_main:
enter 0,0
pusha
; Your program here
popa
mov eax, 0
leave
ret
Example
segment .data
integer1 dd 15 ; first int
integer2 dd 6 ; second int
segment .bss
result resd 1 ; result
segment .text
global asm_main
asm_main:
enter 0,0
pusha
I/O?
This is all well and good, but it’s not very interesting if we can’t
“see” anything
We would like to:
Be able to provide input to the program
Be able to get output from the program
Also, debugging will be difficult, so it would be nice if we could
tell the program to print out all register values, or to print out the
content of some zones of memory
Doing all this requires quite a bit of assembly code and
requires techniques that we will not see for a while
The author of our textbook provides a nice I/O package that
we can just use, without understanding how it works for now
I/O
Examples
Modified example
First program
;
; file: first.asm
; First assembly program. This program asks for two integers as
; input and prints out their sum.
;
; To create executable:
;
; Using Linux and gcc:
; nasm -f elf first.asm
; gcc -o first first.o driver.c asm_io.o
%include "asm_io.inc" ;
; initialized data is put in the .data segment
segment .data
;
; These labels refer to strings used for output
prompt1 db "Enter a number: ", 0 ; don’t forget null
prompt2 db "Enter another number: ", 0
outmsg1 db "You entered ", 0
outmsg2 db " and ", 0
outmsg3 db ", the sum of these is ", 0
; uninitialized data is put in the .bss segment
;
segment .bss
;
; These labels refer to double words used to store the inputs;
;
input1 resd 1
input2 resd 1
; code is put in the .text segment
segment .text
global asm_main
asm_main:
enter 0,0 ; setup routine
pusha
mov eax, prompt1 ; print out prompt
call print_string
call read_int ; read integer
mov [input1], eax ; store into input1
mov eax, prompt2 ; print out prompt
call print_string
call read_int ; read integer
mov [input2], eax ; store into input2
mov eax, [input1] ; eax = dword at input1
add eax, [input2] ; eax += dword at input2
mov ebx, eax ; ebx = eax
dump_regs 1 ; dump out register values
dump_mem 2, outmsg1, 1 ; dump out memory
; next print out result message as series of steps
mov eax, outmsg1
call print_string ; print out first message
mov eax, [input1]
call print_int ; print out input1
mov eax, outmsg2
call print_string ; print out second message
mov eax, [input2]
C driver
#include "cdecl.h"
int PRE_CDECL asm_main( void ) POST_CDECL;
int main() {
int ret_status;
ret_status = asm_main();
return ret_status;
}
All segments and registers are initialized by the C system
I/O is done through the C standard library
Initialized data in .data
Uninitialized data in .bss
Code in .text
Stack later
Compiling
Linking
Assembling/Linking Process
Assembling/Linking Process
Assembling/Linking Process
The macro dump_regs prints out the bytes stored in all the
registers (in hex), as well as the bits in the FLAGS register
(only if they are set to 1)
dump_regs 13
‘13’ above is an arbitrary integer, that can be used to distinguish outputs
from multiple calls to dump_regs
The macro dump_mem prints out the bytes stored in memory
(in hex). It takes three arguments:
An arbitrary integer for output identification purposes
The address at which memory should be displayed
The number minus one of 16-byte segments that should be displayed
for instance
dump_mem 29, integer1, 3
prints out “29”, and then (3+1)*16 bytes
Example
Chapter 7
INSTRUCTION SET
OVERVIEW
Contents
Operand Types
Data Transfer Instructions
Addition and Subtraction
Addressing Modes
Jump and Loop Instructions
Copying a String
Summing an Array of Integers
Immediate
Constant integer (8, 16, or 32 bits)
Constant value is stored within the instruction
Register
Name of a register is specified
Register number is encoded within the instruction
Memory
Reference to a location in memory
Memory address is encoded within the instruction, or
Register holds the address of a memory location
MOV Examples
.DATA
count BYTE 100
bVal BYTE 20
wVal WORD 2
dVal DWORD 5
.CODE
Zero Extension
MOVZX Instruction
Fills (extends) the upper part of the destination with zeros
Used to copy a small source into a larger destination
Destination must be a register
movzx r32, r/m8
movzx r32, r/m16
movzx r16, r/m8
0 10001111 Source
Sign Extension
MOVSX Instruction
Fills (extends) the upper part of the destination register with a
copy of the source operand's sign bit
Used to copy a small source into a larger destination
movsx r32, r/m8
movsx r32, r/m16
movsx r16, r/m8
10001111 Source
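The difference between the two extensions shows up on the source byte 10001111b. A minimal sketch that models registers as plain integers; `movzx` and `movsx` here are hypothetical Python helpers named after the instructions.

```python
def movzx(value, src_bits, dst_bits):
    """Zero extension: keep the source bits, upper destination bits become 0."""
    return value & ((1 << src_bits) - 1) & ((1 << dst_bits) - 1)


def movsx(value, src_bits, dst_bits):
    """Sign extension: upper destination bits become copies of the source sign bit."""
    value &= (1 << src_bits) - 1
    if value >> (src_bits - 1):        # sign bit set: interpret as negative
        value -= 1 << src_bits
    return value & ((1 << dst_bits) - 1)


source = 0b10001111
print(format(movzx(source, 8, 32), "08X"))
print(format(movsx(source, 8, 32), "08X"))
```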
XCHG Instruction
XCHG exchanges the values of two operands
xchg reg, reg
xchg reg, mem
xchg mem, reg
Rules
• Operands must be of the same size
• At least one operand must be a register
• No immediate operands are permitted
Examples
.DATA
var1 DWORD 10000000h
var2 DWORD 20000000h
.CODE
xchg ah, al ; exchange 8-bit regs
xchg ax, bx ; exchange 16-bit regs
xchg eax, ebx ; exchange 32-bit regs
xchg var1, ebx ; exchange mem, reg
xchg var1, var2 ; error: two memory operands
Alternate Format
Direct-Offset Operands
Examples
Examples
Write a program that adds the following three words:
.DATA
array WORD 890Fh,1276h,0AF5Bh
Solution: Accumulate the sum in the AX register
mov ax, array
add ax,[array+2]
add ax,[array+4] ; what if sum cannot fit in AX?
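The question in the last comment can be answered by modelling AX as a 16-bit value. A minimal sketch; `sum_words` is a hypothetical name, and the returned carry is the Carry flag left by the final ADD.

```python
def sum_words(words):
    """Accumulate 16-bit words in a simulated AX; return (ax, carry of last add)."""
    ax, carry = 0, 0
    for word in words:
        total = ax + word
        carry = total >> 16        # 1 when the sum does not fit in 16 bits
        ax = total & 0xFFFF        # AX keeps only the low 16 bits
    return ax, carry


array = [0x890F, 0x1276, 0xAF5B]
ax, carry = sum_words(array)
print(f"true sum = {sum(array):X}h, AX = {ax:04X}h, CF = {carry}")
```

The true sum needs 17 bits, so AX holds only the low 16 bits and the Carry flag records the lost bit.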
Flags Affected
Addition: A + B
The Carry flag is the carry out of the most significant bit
The Overflow flag is only set when . . .
Two positive operands are added and their sum is negative
Two negative operands are added and their sum is positive
Overflow cannot occur when adding operands of opposite signs
Subtraction: A – B
For Subtraction, the carry flag becomes the borrow flag
Carry flag is set when A has a smaller unsigned value than B
The Overflow flag is only set when . . .
A and B have different signs and sign of result ≠ sign of A
Overflow cannot occur when subtracting operands of the same sign
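The subtraction rules can be sketched for 8-bit operands. `sub8_flags` is a hypothetical helper that returns only the result, the Carry (borrow) flag and the Overflow flag.

```python
def sub8_flags(a, b):
    """Compute a - b on 8 bits; return (result, CF, OF)."""
    result = (a - b) & 0xFF
    cf = int(a < b)                                    # borrow: A is smaller (unsigned) than B
    of = int(((a ^ b) & (a ^ result) & 0x80) != 0)     # signs of A, B differ and result sign != sign of A
    return result, cf, of


print(sub8_flags(0x01, 0x02))   # 1 - 2: borrow, no signed overflow
print(sub8_flags(0x80, 0x01))   # -128 - 1: signed overflow, no borrow
```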
Hardware Viewpoint
1 0 0 1 0 0 0 1 0 1 1 0 1 1 1 0
INC destination
destination = destination + 1
More compact (uses less space) than: ADD destination, 1
DEC destination
destination = destination – 1
More compact (uses less space) than: SUB destination, 1
NEG destination
destination = 2's complement of destination
Destination can be 8-, 16-, or 32-bit operand
In memory or a register
NO immediate operand
Affected Flags
Extended Arithmetic
Addressing Modes
Register Addressing
Most efficient way of specifying an operand: no memory
access
Shorter Instructions: fewer bits are needed to specify register
Compilers use registers to optimize code
Immediate Addressing
Used to specify a constant
Immediate constant is part of the instruction
Efficient: no separate operand fetch is needed
Examples
mov eax, ebx ; register-to-register move
add eax, 5 ; 5 is an immediate constant
EBX contains the address of the operand, not the operand itself
Indexed Addressing
Combines a displacement (name±constant) with an
index register
Assembler converts displacement into a constant offset
Constant offset is added to register to form an effective address
Index Scaling
Useful to index array elements of size 2, 4, and 8 bytes
Syntax: [disp + index * scale] or disp [index * scale]
.DATA
arrayB BYTE 10h,20h,30h,40h
arrayW WORD 100h,200h,300h,400h
arrayD DWORD 10000h,20000h,30000h,40000h
.CODE
mov esi, 2
mov al, arrayB[esi] ; AL = 30h
mov ax, arrayW[esi*2] ; AX = 300h
mov eax, arrayD[esi*4] ; EAX = 30000h
Based Addressing
Syntax: [Base + disp.]
Effective Address = Base register + Constant Offset
Useful to access fields of a structure or an object
Base Register points to the base address of the structure
Constant Offset relative offset within the structure
mystruct is a structure consisting of 3 fields: a word, a double word, and a byte
.DATA
mystruct WORD 12
DWORD 1985
BYTE 'M'
.CODE
mov ebx, OFFSET mystruct
mov eax, [ebx+2] ; EAX = 1985
mov al, [ebx+6] ; AL = 'M'
Based-Indexed Addressing
Based-Indexed Examples
.data
matrix DWORD 0, 1, 2, 3, 4 ; 4 rows, 5 cols
DWORD 10,11,12,13,14
DWORD 20,21,22,23,24
DWORD 30,31,32,33,34
ROWSIZE EQU SIZEOF matrix ; 20 bytes per row
.code
mov ebx, 2*ROWSIZE ; row index = 2
mov esi, 3 ; col index = 3
mov eax, matrix[ebx+esi*4] ; EAX = matrix[2][3]
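The address arithmetic behind the based-indexed access can be traced step by step. This sketch assumes ROWSIZE is 20 bytes (5 doublewords per row) and models memory as a flat Python list; `effective_address` is a hypothetical helper.

```python
def effective_address(disp=0, base=0, index=0, scale=1):
    """Effective address = displacement + base + index * scale."""
    return disp + base + index * scale


# The matrix laid out row by row, one list element per 4-byte doubleword.
matrix = [0, 1, 2, 3, 4,
          10, 11, 12, 13, 14,
          20, 21, 22, 23, 24,
          30, 31, 32, 33, 34]
ROWSIZE = 5 * 4                      # assumed: bytes per row

ebx = 2 * ROWSIZE                    # row index = 2
esi = 3                              # col index = 3
offset = effective_address(base=ebx, index=esi, scale=4)
print(offset, matrix[offset // 4])   # byte offset from matrix, then the element read
```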
LEA Instruction
LEA Examples
.data
array WORD 1000 DUP(?)
.code ; Equivalent to . . .
lea eax, array ; mov eax, OFFSET array
No Scale Factor
Default Segments
JMP Instruction
JMP is an unconditional jump to a destination instruction
Syntax: JMP destination
JMP causes the modification of the EIP register
EIP ← destination address
A label is used to identify the destination address
Example: top:
. . .
jmp top
JMP provides an easy way to create a loop
Loop will continue endlessly unless we find a way to terminate it
LOOP Instruction
The LOOP instruction creates a counting loop
Syntax: LOOP destination
Logic: ECX ← ECX – 1
if ECX != 0, jump to destination label
ECX register is used as a counter to count the iterations
Example: calculate the sum of integers from 1 to 100
mov eax, 0 ; sum = eax
mov ecx, 100 ; count = ecx
L1:
add eax, ecx ; accumulate sum in eax
loop L1 ; decrement ecx until 0
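The LOOP logic can be mimicked in a few lines. This is an illustrative model only: `run_loop` is a hypothetical helper, and `body` stands for the instructions between the label and the LOOP instruction.

```python
def run_loop(eax, ecx, body):
    """Execute body, then decrement ECX and repeat while ECX != 0 (like LOOP)."""
    while True:
        eax = body(eax, ecx)
        ecx = (ecx - 1) & 0xFFFFFFFF   # ECX is a 32-bit register
        if ecx == 0:
            break
    return eax, ecx


# Sum of integers from 1 to 100: the body is "add eax, ecx".
print(run_loop(0, 100, lambda eax, ecx: eax + ecx))
```

Because the decrement happens before the test, starting with ECX = 0 would wrap around to FFFFFFFFh and repeat the body 2^32 times rather than zero times.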
Your turn . . .
What will be the final value of EAX?
mov eax,6
mov ecx,4
L1:
inc eax
loop L1
Solution: 10
Nested Loop
Copying a String
.DATA
source BYTE "This is the source string",0
target BYTE SIZEOF source DUP(0)
.CODE
main PROC
mov esi,0 ; index register
mov ecx, SIZEOF source ; loop counter (good use of SIZEOF)
L1:
mov al,source[esi] ; get char from source
mov target[esi],al ; store it in the target
inc esi ; increment index
loop L1 ; loop for entire string
exit
main ENDP
END main
ESI is used to index source & target strings
.DATA
intarray WORD 100h,200h,300h,400h,500h,600h
.CODE
main PROC
mov esi, OFFSET intarray ; address of intarray
mov ecx, LENGTHOF intarray ; loop counter
mov ax, 0 ; zero the accumulator
L1:
add ax, [esi] ; accumulate sum in ax
add esi, 2 ; point to next integer
loop L1 ; repeat until ecx = 0
exit
main ENDP
END main
esi is used as a pointer: it contains the address of an array element
.DATA
intarray DWORD 10000h,20000h,30000h,40000h,50000h,60000h
.CODE
main PROC
mov esi, 0 ; index of intarray
mov ecx, LENGTHOF intarray ; loop counter
mov eax, 0 ; zero the accumulator
L1:
add eax, intarray[esi*4] ; accumulate sum in eax
inc esi ; increment index
loop L1 ; repeat until ecx = 0
exit
main ENDP
END main
esi is used as a scaled index
PC-Relative Addressing
Assembler:
Calculates the difference (in bytes), called PC-relative offset, between
the offset of the target label and the offset of the following instruction
Processor:
Adds the PC-relative offset to EIP when executing LOOP instruction
Summary
Data Transfer
MOV, MOVSX, MOVZX, and XCHG instructions
Arithmetic
ADD, SUB, INC, DEC, NEG, ADC, SBB, STC, and CLC
Carry, Overflow, Sign, Zero, Auxiliary and Parity flags
Addressing Modes
Register, immediate, direct, indirect, indexed, based-indexed
Load Effective Address (LEA) instruction
32-bit and 16-bit addressing
JMP and LOOP Instructions
Traversing and summing arrays, copying strings
PC-relative addressing
Stack and Procedures
Computer Organization
&
Assembly Language Programming
Dr Adnan Gutub
aagutub ‘at’ uqu.edu.sa
[Adapted from slides of Dr. Kip Irvine: Assembly Language for Intel-Based Computers]
Most Slides contents have been arranged by Dr Muhamed Mudawar & Dr Aiman El-Maleh from Computer Engineering Dept. at KFUPM
Presentation Outline
Runtime Stack
Stack Operations
Defining and Using Procedures
Program Design Using Procedures
Stack and Procedures Computer Organization & Assembly Language Programming slide 2/46
What is a Stack?
Stack is a Last-In-First-Out (LIFO) data structure
Analogous to a stack of plates in a cafeteria
Plate on Top of Stack is directly accessible
Two basic stack operations
Push: inserts a new element on top of the stack
Pop: deletes top element from the stack
View the stack as a linear array of elements
Insertion and deletion is restricted to one end of array
Stack has a maximum capacity
When stack is full, no element can be pushed
When stack is empty, no element can be popped
Runtime Stack
Runtime stack: array of consecutive memory locations
Managed by the processor using two registers
Stack Segment register SS
Not modified in protected mode, SS points to segment descriptor
Stack Pointer register ESP
For 16-bit real-address mode programs, SP register is used
Runtime Stack Allocation
.STACK directive specifies a runtime stack
Operating system allocates memory for the stack
Runtime stack is initially empty
The stack size can change dynamically at runtime
Stack pointer ESP
ESP is initialized by the operating system
Typical initial value of ESP = 0012FFC4h
The stack grows downwards
The memory below ESP is free
ESP is decremented to allocate stack memory
[Figure: stack memory drawn from high address (ESP = 0012FFC4) down to low address, all entries unused (?)]
Presentation Outline
Runtime Stack
Stack Operations
Defining and Using Procedures
Program Design Using Procedures
Stack Instructions
Two basic stack instructions:
push source
pop destination
Source can be a word (16 bits) or doubleword (32 bits)
General-purpose register
Segment register: CS, DS, SS, ES, FS, GS
Memory operand, memory-to-stack transfer is allowed
Immediate value
Destination can be also a word or doubleword
General-purpose register
Segment register, except that pop CS is NOT allowed
Memory, stack-to-memory transfer is allowed
Push Instruction
Push source32 (r/m32 or imm32)
ESP is first decremented by 4
ESP = ESP – 4 (stack grows by 4 bytes)
32-bit source is then copied onto the stack at the new ESP
[ESP] = source32
Push source16 (r/m16)
ESP is first decremented by 2
ESP = ESP – 2 (stack grows by 2 bytes)
16-bit source is then copied on top of stack at the new ESP
[ESP] = source16
Operating system puts a limit on the stack capacity
Push can cause a Stack Overflow (stack cannot grow)
Examples on the Push Instruction
Suppose we execute:
PUSH EAX ; EAX = 125C80FFh
PUSH EBX ; EBX = 2Eh
PUSH ECX ; ECX = 9B61Dh
The stack grows downwards
The area below ESP is free
[Figure: stack contents BEFORE and AFTER the three pushes, drawn down to address 0012FFB4]
Pop Instruction
Pop dest32 (r/m32)
32-bit doubleword at ESP is first copied into dest32
dest32 = [ESP]
ESP is then incremented by 4
ESP = ESP + 4 (stack shrinks by 4 bytes)
Pop dest16 (r/m16)
16-bit word at ESP is first copied into dest16
dest16 = [ESP]
ESP is then incremented by 2
ESP = ESP + 2 (stack shrinks by 2 bytes)
Popping from an empty stack causes a stack underflow
Examples on the Pop Instruction
Suppose we execute:
POP SI ; SI = B61Dh
POP DI ; DI = 0009h
The stack shrinks upwards
The area at & above ESP is allocated
[Figure: stack contents BEFORE and AFTER the two pops, drawn from address 0012FFC4 down to 0012FFB4]
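The push and pop examples can be replayed on a tiny byte-addressed model of the stack. This is a sketch under stated assumptions: ESP starts at the typical value 0012FFC4h, values are stored little-endian as on x86, and the class name `RuntimeStack` is hypothetical.

```python
class RuntimeStack:
    """Byte-addressed stack that grows towards lower addresses."""

    def __init__(self, esp=0x0012FFC4):
        self.esp = esp
        self.memory = {}

    def push(self, value, size=4):
        self.esp -= size                                   # decrement ESP first
        for i in range(size):                              # then store, low byte first
            self.memory[self.esp + i] = (value >> (8 * i)) & 0xFF

    def pop(self, size=4):
        value = sum(self.memory[self.esp + i] << (8 * i) for i in range(size))
        self.esp += size                                   # increment ESP after reading
        return value


stack = RuntimeStack()
stack.push(0x125C80FF)   # PUSH EAX
stack.push(0x2E)         # PUSH EBX
stack.push(0x9B61D)      # PUSH ECX
esp_after_push = stack.esp
si = stack.pop(2)        # POP SI
di = stack.pop(2)        # POP DI
print(f"{esp_after_push:08X} {si:04X} {di:04X} {stack.esp:08X}")
```

The two 16-bit pops together consume the doubleword pushed from ECX: SI receives its low word and DI its high word.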
Temporary Storage of Registers
Stack is often used to free a set of registers
push EBX ; save EBX
push ECX ; save ECX
. . .
; EBX and ECX can now be modified
. . .
pop ECX ; restore ECX first, then
pop EBX ; restore EBX
. . . ; outer loop
pop ecx ; restore outer loop count
loop L1 ; repeat the outer loop
Push/Pop All Registers
pushad
Pushes all the 32-bit general-purpose registers
EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI in this order
Initial ESP value (before pushad) is pushed
ESP = ESP – 32
pusha
Same as pushad but pushes all 16-bit registers AX through DI
ESP = ESP – 16
popad
Pops into registers EDI through EAX in reverse order of pushad
ESP is not read from stack. It is computed as: ESP = ESP + 32
popa
Same as popad but pops into 16-bit registers. ESP = ESP + 16
popfd
Pop the 32-bit EFLAGS
Next . . .
Runtime Stack
Stack Operations
Defining and Using Procedures
Program Design Using Procedures
Procedures
A procedure is a logically self-contained unit of code
Called sometimes a function, subprogram, or subroutine
Receives a list of parameters, also called arguments
Performs computation and returns results
Plays an important role in modular program development
Example of a procedure (called function) in C language
int sumof ( int x,int y,int z ) { /* result type int; formal parameter list */
int temp;
temp = x + y + z;
return temp; /* return function result */
}
The above function sumof can be called as follows:
sum = sumof( num1,num2,num3 ); /* actual parameter list */
Defining a Procedure in Assembly
Assembler provides two directives to define procedures
PROC to define name of procedure and mark its beginning
ENDP to mark end of procedure
A typical procedure definition is
procedure_name PROC
. . .
; procedure body
. . .
procedure_name ENDP
Documenting Procedures
Suggested Documentation for Each Procedure:
Does: Describe the task accomplished by the procedure
Preconditions
Must be satisfied before the procedure is called
Example of a Procedure Definition
The sumof procedure receives three integer parameters
Assumed to be in EAX, EBX, and ECX
Computes and returns result in register EAX
;------------------------------------------------
; Sumof: Calculates the sum of three integers
; Receives: EAX, EBX, ECX, the three integers
; Returns: EAX = sum
; Requires: nothing
;------------------------------------------------
sumof PROC
add EAX, EBX ; EAX = EAX + second number
add EAX, ECX ; EAX = EAX + third number
ret ; return to caller
sumof ENDP
How a Procedure Call / Return Works
How does a procedure know where to return?
There can be multiple calls to same procedure in a program
Procedure has to return differently for different calls
It knows by saving the return address (RA) on the stack
This is the address of next instruction after call
The call instruction does the following
Pushes the return address on the stack
Jumps into the first instruction inside procedure
ESP = ESP – 4; [ESP] = RA; EIP = procedure address
The ret (return) instruction does the following
Pops return address from stack
Jumps to return address: EIP = [ESP]; ESP = ESP + 4
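The call and ret steps can be traced with a dictionary standing in for the processor state. The helper names `call` and `ret` and the starting ESP value are assumptions for illustration; the procedure address 00401081h and return address 00401036h are taken from the sumof listing in these slides.

```python
def call(state, target, return_address):
    """ESP = ESP - 4; [ESP] = RA; EIP = procedure address."""
    state["esp"] -= 4
    state["stack"][state["esp"]] = return_address
    state["eip"] = target


def ret(state):
    """EIP = [ESP]; ESP = ESP + 4."""
    state["eip"] = state["stack"][state["esp"]]
    state["esp"] += 4


state = {"esp": 0x0012FFC4, "eip": 0, "stack": {}}   # assumed initial ESP
call(state, target=0x00401081, return_address=0x00401036)
inside = (state["eip"], state["esp"])
ret(state)
print(f"{inside[0]:08X} {inside[1]:08X} -> {state['eip']:08X} {state['esp']:08X}")
```

If the procedure pushed a value and returned without popping it, `ret` would load that value into EIP instead of the return address.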
00401081 03 C3 add EAX, EBX
00401083 03 C1 add EAX, ECX
00401085 C3 ret
sumof ENDP
END main
[Figure: runtime stack with the allocated area on top, ESP pointing at RA = 00401036, and the free area below]
Don’t Mess Up the Stack !
Just before returning from a procedure
Make sure the stack pointer ESP is pointing at return address
Example of a messed-up procedure
Pushes EAX on the stack before returning
Stack pointer ESP is NOT pointing at return address!
main PROC
call messedup
. . .
exit
main ENDP
messedup PROC
push EAX
ret ; EAX value is NOT the return address!
messedup ENDP
[Figure: stack from high address: used area, return address, EAX value (ESP points here), free area. Where to return?]
Sub1 PROC
.
.
call Sub2
ret
Sub1 ENDP
Sub2 PROC
.
.
call Sub3
ret
Sub2 ENDP
Sub3 PROC
.
.
ret
Sub3 ENDP
[Figure: runtime stack holding, from high to low addresses, the return address of call Sub1, the return address of call Sub2, and the return address of call Sub3 (ESP points here)]
Parameter Passing
Parameter passing in assembly language is different
More complicated than that used in a high-level language
In assembly language
Place all required parameters in an accessible storage area
Then call the procedure
Two types of storage areas used
Registers: general-purpose registers are used (register method)
Memory: stack is used (stack method)
Two common mechanisms of parameter passing
Pass-by-value: parameter value is passed
Pass-by-reference: address of parameter is passed
Preserving Registers
Need to preserve the registers across a procedure call
Stack can be used to preserve register values
Which registers should be saved?
Those registers that are modified by the called procedure
But still used by the calling procedure
We can save all registers using pusha if we need most of them
However, better to save only needed registers when they are few
USES Operator
The USES operator simplifies the writing of a procedure
Registers are frequently modified by procedures
Just list the registers that should be preserved after USES
Assembler will generate the push and pop instructions
Source code:
ArraySum PROC USES esi ecx
mov eax,0
L1: add eax, [esi]
add esi, 4
loop L1
ret
ArraySum ENDP
Code generated by the assembler:
ArraySum PROC
push esi
push ecx
mov eax,0
L1: add eax, [esi]
add esi, 4
loop L1
pop ecx
pop esi
ret
ArraySum ENDP
Next . . .
Runtime Stack
Stack Operations
Defining and Using Procedures
Program Design Using Procedures
Program Design using Procedures
Program Design involves the Following:
Break large tasks into smaller ones
Use a hierarchical structure based on procedure calls
Test individual procedures separately
Structure Chart
[Figure: structure chart of the Summation Program (main)]
The diagram is called a structure chart
Describes program structure, division into procedures, and call sequence
Link library procedures are shown in grey
Integer Summation Program – 1 of 4
INCLUDE Irvine32.inc
ArraySize EQU 5
.DATA
prompt1 BYTE "Enter a signed integer: ",0
prompt2 BYTE "The sum of the integers is: ",0
array DWORD ArraySize DUP(?)
.CODE
main PROC
call Clrscr ; clear the screen
mov esi, OFFSET array
mov ecx, ArraySize
call PromptForIntegers ; store input integers in array
call ArraySum ; calculate the sum of array
call DisplaySum ; display the sum
exit
main ENDP
Integer Summation Program – 3 of 4
;-----------------------------------------------------
; ArraySum: Calculates the sum of an array of integers
; Receives: ESI = pointer to the array,
; ECX = array size
; Returns: EAX = sum of the array elements
;-----------------------------------------------------
ArraySum PROC USES esi ecx
mov eax,0 ; set the sum to zero
L1:
add eax, [esi] ; add each integer to sum
add esi, 4 ; point to next integer
loop L1 ; repeat for array size
ret ; sum is in EAX
ArraySum ENDP
Sample Output
Parameter Passing Through Stack
Higher Address
Freeing Passed Parameters From Stack
Use RET N instruction to free parameters from stack
Local Variables
Local variables are dynamic data whose values must be
preserved over the lifetime of the procedure, but not
beyond its termination.
At the termination of the procedure, the current
environment disappears and the previous environment
must be restored.
Space for local variables can be reserved by subtracting
the required number of bytes from ESP.
Offsets from ESP are used to address local variables.
Local Variables
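C code:
void Test(int i){
int k;
k = i+9;
……
}
Assembly code:
Test PROC
push EBP
mov EBP, ESP
sub ESP, 4 ; reserve 4 bytes for k at [EBP-4]
push EAX
mov DWORD PTR [EBP-4], 9 ; k = 9
mov EAX, [EBP + 8] ; EAX = i (parameter on the stack)
add [EBP-4], EAX ; k = i + 9
……
pop EAX
mov ESP, EBP ; free the local variable
pop EBP
ret 4 ; return and free the 4-byte parameter
Test ENDP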
Summary
Procedure – Named block of executable code
CALL: call a procedure, push return address on top of stack
RET: pop the return address and return from procedure
Preserve registers across procedure calls
Runtime stack – LIFO structure – Grows downwards
Holds return addresses, saved registers, etc.
PUSH – insert value on top of stack, decrement ESP
POP – remove top value of stack, increment ESP
Conditional Processing
Computer Organization
&
Assembly Language Programming
Dr Adnan Gutub
aagutub ‘at’ uqu.edu.sa
[Adapted from slides of Dr. Kip Irvine: Assembly Language for Intel-Based Computers]
Most Slides contents have been arranged by Dr Muhamed Mudawar & Dr Aiman El-Maleh from Computer Engineering Dept. at KFUPM
Presentation Outline
Conditional Processing Computer Organization & Assembly Language Programming slide 2/55
AND Instruction
Bitwise AND between each pair of matching bits
AND destination, source
Following operand combinations are allowed
AND reg, reg
AND reg, mem
AND reg, imm
AND mem, reg
AND mem, imm
Operands can be 8, 16, or 32 bits and they must be of the same size
AND instruction is often used to clear selected bits
00111011 AND 00001111 = 00001011 (upper four bits cleared, lower four bits unchanged)
OR Instruction
Bitwise OR operation between each pair of matching bits
OR destination, source
Following operand combinations are allowed
OR reg, reg
OR reg, mem
OR reg, imm
OR mem, reg
OR mem, imm
Operands can be 8, 16, or 32 bits and they must be of the same size
OR instruction is often used to set selected bits
00111011 OR 11110000 = 11111011 (upper four bits set, lower four bits unchanged)
Converting Binary Digits to ASCII
OR instruction can convert a binary digit to ASCII
0 =00000000 1 =00000001
'0' = 0 0 1 1 0 0 0 0 '1' = 0 0 1 1 0 0 0 1
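The bit patterns above show that a digit and its ASCII character differ only in bits 4 and 5, so ORing with 30h performs the conversion. A minimal sketch, assuming a helper named `digit_to_ascii` (hypothetical) that models `or al, 30h`:

```python
def digit_to_ascii(digit):
    """Model of `or al, 30h` applied to a single decimal digit (0..9)."""
    if not 0 <= digit <= 9:
        raise ValueError("expected a single decimal digit")
    return digit | 0x30   # set bits 4 and 5: 0000dddd -> 0011dddd


ascii_zero = chr(digit_to_ascii(0))   # '0'
ascii_one = chr(digit_to_ascii(1))    # '1'
```

The same OR works for every digit 0 to 9 because the low four bits are left unchanged.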
XOR Instruction
Bitwise XOR between each pair of matching bits
XOR destination, source
Following operand combinations are allowed
XOR reg, reg
XOR reg, mem
XOR reg, imm
XOR mem, reg
XOR mem, imm
Operands can be 8, 16, or 32 bits and they must be of the same size
XOR instruction is often used to invert selected bits
00111011 XOR 11110000 = 11001011 (upper four bits inverted, lower four bits unchanged)
Affected Status Flags
Sample Output
Encrypting a String
KEY = 239 ; Can be any byte value
BUFMAX = 128
.data
buffer BYTE BUFMAX+1 DUP(0)
bufSize DWORD BUFMAX
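The slide only shows the data declarations. As a hedged illustration of the idea, XORing every byte with the key encrypts the buffer, and XORing again with the same key restores it. The function name `xor_bytes` is an assumption; `KEY` matches the value declared above.

```python
KEY = 239  # same value as the slide's KEY; any byte value works


def xor_bytes(data, key=KEY):
    """XOR every byte of `data` with `key` (symmetric: apply twice to undo)."""
    return bytes(b ^ key for b in data)


plain = b"Hello"
cipher = xor_bytes(plain)        # encrypted bytes
restored = xor_bytes(cipher)     # XOR with the same key again
```

Because (b XOR k) XOR k = b, one routine serves for both encryption and decryption.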
TEST Instruction
Bitwise AND operation between each pair of bits
TEST destination, source
The flags are affected similar to the AND Instruction
However, TEST does NOT modify the destination operand
TEST instruction can check several bits at once
Example: Test whether bit 0 or bit 3 is set in AL
Solution: test al, 00001001b ; test bits 0 & 3
We only need to check the zero flag
; If ZF = 1 => both bits 0 and 3 are clear
; If ZF = 0 => bit 0 or bit 3 (or both) is set
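The TEST logic can be sketched as follows. The helper name `zf_after_test` is hypothetical; it returns only the Zero flag that `test al, mask` would produce and, like TEST, leaves the operand untouched.

```python
def zf_after_test(al, mask):
    """ZF after `test al, mask`: 1 when (al AND mask) is zero, else 0."""
    return 1 if (al & mask & 0xFF) == 0 else 0


MASK_BITS_0_AND_3 = 0b00001001

zf_clear_bits = zf_after_test(0b00100100, MASK_BITS_0_AND_3)  # both bits clear
zf_bit3_set = zf_after_test(0b00001000, MASK_BITS_0_AND_3)    # bit 3 is set
```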
NOT Instruction
Inverts all the bits in a destination operand
NOT destination
Result is called the 1's complement
Destination can be a register or memory
CMP Instruction
CMP (Compare) instruction performs a subtraction
Syntax: CMP destination, source
Computes: destination – source
Destination operand is NOT modified
All six flags: OF, CF, SF, ZF, AF, and PF are affected
CMP uses the same operand combinations as SUB
Operands can be 8, 16, or 32 bits and must be of the same size
Examples: assume EAX = 5, EBX = 10, and ECX = 5
cmp eax, ebx ; OF=0, CF=1, SF=1, ZF=0
cmp eax, ecx ; OF=0, CF=0, SF=0, ZF=1
Unsigned Comparison
CMP can perform unsigned and signed comparisons
The destination and source operands can be unsigned or signed
Signed Comparison
For signed comparison, we examine SF, OF, and ZF
Signed Comparison                     Flags
signed destination < signed source    SF ≠ OF
signed destination > signed source    SF = OF, ZF = 0
destination = source                  ZF = 1
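To make the flag rules concrete, the sketch below computes the flags that a 32-bit `cmp dest, src` would set and then applies the signed-comparison table. The names `cmp_flags` and `signed_relation` are assumptions introduced for illustration; AF and PF are omitted.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(dest, src):
    """Flags after `cmp dest, src` on 32-bit operands (dest - src, discarded)."""
    dest &= MASK32
    src &= MASK32
    result = (dest - src) & MASK32
    zf = int(result == 0)
    sf = result >> 31                      # sign bit of the result
    cf = int(dest < src)                   # unsigned borrow
    # Signed overflow: operands have different signs and the result's
    # sign differs from the destination's sign.
    of = ((dest ^ src) & (dest ^ result)) >> 31
    return {"OF": of, "CF": cf, "SF": sf, "ZF": zf}


def signed_relation(flags):
    """Apply the signed-comparison table to a set of flags."""
    if flags["ZF"] == 1:
        return "equal"
    return "less" if flags["SF"] != flags["OF"] else "greater"


flags_5_vs_10 = cmp_flags(5, 10)   # the slide's first CMP example
flags_5_vs_5 = cmp_flags(5, 5)     # the slide's second CMP example
```

The last case in the test (largest positive value compared with -1) is the one where SF alone would give the wrong answer, which is why OF is part of the rule.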
Conditional Structures
No high-level control structures in assembly language
Comparisons and conditional jumps are used to …
Implement conditional structures such as IF statements
Implement conditional loops
Jumps Based on Specific Flags
Conditional Jump Instruction has the following syntax:
Jcond destination ; cond is the jump condition
Destination: a destination label
Prior to 386: jump must be within –128 to +127 bytes from current location
IA-32: 32-bit offset permits jump anywhere in memory
Examples of Jump on Zero
Task: Check whether integer value in EAX is even
Solution: TEST whether the least significant bit is 0
If zero, then EAX is even, otherwise it is odd
Jumps Based on Signed Comparisons
Computing the Max and Min
Compute the Max of unsigned EAX and EBX
Solution:
mov Max, eax ; assume Max = eax
cmp Max, ebx
jae done
mov Max, ebx ; Max = ebx
done:
BT Instruction
BT = Bit Test Instruction
Syntax:
BT r/m16, r16
BT r/m32, r32
BT r/m16, imm8
BT r/m32, imm8
bt AX, 9 ; CF = bit 9
jc L1 ; jump if Carry to L1
LOOPZ and LOOPE
Syntax:
LOOPE destination
LOOPZ destination
Logic:
ECX = ECX – 1
if ECX > 0 and ZF=1, jump to destination
LOOPZ Example
The following code finds the first negative value in an array
.data
array SWORD 17,10,30,40,4,-5,8
.code
mov esi, OFFSET array - 2 ; start before first
mov ecx, LENGTHOF array ; loop counter
L1:
add esi, 2 ; point to next element
test WORD PTR [esi], 8000h ; test sign bit
loopz L1 ; ZF = 1 if value >= 0
jnz found ; found negative value
notfound:
. . . ; ESI points to last array element
found:
. . . ; ESI points to first negative value
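The control flow of the LOOPZ scan can be traced with a short model. This is a sketch, not a translation: `first_negative` is a hypothetical name, a list index stands in for ESI, and an empty list is rejected up front because the assembly version assumes at least one element.

```python
def first_negative(values):
    """Index of the first negative 16-bit value, or None if there is none."""
    if not values:
        return None                      # the assembly assumes ECX >= 1
    ecx = len(values)                    # mov ecx, LENGTHOF array
    index = -1                           # "start before first"
    while True:
        index += 1                       # add esi, 2
        zf = int((values[index] & 0x8000) == 0)  # test WORD PTR [esi], 8000h
        ecx -= 1                         # loopz: ECX = ECX - 1
        if not (ecx != 0 and zf == 1):   # loopz: continue while ECX != 0 and ZF = 1
            break
    return index if zf == 0 else None    # jnz found


found_at = first_negative([17, 10, 30, 40, 4, -5, 8])
```

The final check on ZF mirrors `jnz found`: the loop can end either because a negative value was seen or because the counter ran out.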
Your Turn . . .
Locate the first zero value in an array
If none is found, let ESI be initialized to 0
.data
array SWORD -3,7,20,-50,10,0,40,4
.code
mov esi, OFFSET array - 2 ; start before first
mov ecx, LENGTHOF array ; loop counter
L1:
add esi, 2 ; point to next element
cmp WORD PTR [esi], 0 ; check for zero
loopne L1 ; continue if not zero
JE Found
XOR ESI, ESI
Found:
Block-Structured IF Statements
IF statement in high-level languages (such as C or Java)
Boolean expression (evaluates to true or false)
List of statements performed when the expression is true
Optional list of statements performed when expression is false
Task: Translate IF statements into assembly language
Example:
if( var1 == var2 )
X = 1;
else
X = 2;
Assembly translation:
mov eax,var1
cmp eax,var2
jne elsepart
mov X,1
jmp next
elsepart:
mov X,2
next:
Your Turn . . .
Translate the IF statement to assembly language
All values are unsigned
if( ebx <= ecx )
{
eax = 5;
edx = 6;
}
Solution:
cmp ebx,ecx
ja next
mov eax,5
mov edx,6
next:
Your Turn . . .
Implement the following IF in assembly language
All variables are 32-bit signed integers
if (var1 <= var2) {
var3 = 10;
}
else {
var3 = 6;
var4 = 7;
}
Solution:
mov eax,var1
cmp eax,var2
jle ifpart
mov var3,6
mov var4,7
jmp next
ifpart:
mov var3,10
next:
Compound Expression with AND
HLLs use short-circuit evaluation for logical AND
If first expression is false, second expression is skipped
if ((al > bl) && (bl > cl)) {X = 1;}
Your Turn . . .
Implement the following IF in assembly language
All values are unsigned
IsDigit PROC
cmp al,'0' ; AL < '0' ?
jb L1 ; yes? ZF=0, return
cmp al,'9' ; AL > '9' ?
ja L1 ; yes? ZF=0, return
test al, 0 ; ZF = 1
L1: ret
IsDigit ENDP
Compound Expression with OR
HLLs use short-circuit evaluation for logical OR
If first expression is true, second expression is skipped
if ((al > bl) || (bl > cl)) {X = 1;}
WHILE Loops
A WHILE loop can be viewed as
IF statement followed by
The body of the loop, followed by
Unconditional jump to the top of the loop
Your Turn . . .
Implement the following loop, assuming unsigned integers
Indirect Jump
Direct Jump: Jump to a Labeled Destination
Destination address is a constant
Address is encoded in the jump instruction
Address is an offset relative to EIP (Instruction Pointer)
Indirect jump
Destination address is a variable or register
Address is stored in memory/register
Address is absolute
Switch Statement
Consider the following switch statement:
switch (ch) {
case '0': exit();
case '1': count++; break;
case '2': count--; break;
case '3': count += 5; break;
case '4': count -= 5; break;
default : count = 0;
}
Jump Table and Indirect Jump
Jump Table is an array of double words
Contains the case labels of the switch statement
Can be defined inside the same procedure of switch statement
jumptable DWORD case0,
case1,
case2,
case3,
case4
Assembler converts labels to addresses
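The jump-table idea can be illustrated with a list of functions standing in for the table of code addresses. This is only an analogy under stated assumptions: `JUMP_TABLE`, `run_switch` and the `caseN` functions are hypothetical names, and raising `SystemExit` stands in for `exit()`.

```python
def case0(count):
    raise SystemExit(0)        # models exit()

def case1(count):
    return count + 1           # count++

def case2(count):
    return count - 1           # count--

def case3(count):
    return count + 5           # count += 5

def case4(count):
    return count - 5           # count -= 5

# The "jump table": entry i holds the code for case '0' + i.
JUMP_TABLE = [case0, case1, case2, case3, case4]


def run_switch(ch, count):
    """Dispatch on ch through the table; out-of-range values take the default."""
    index = ord(ch) - ord('0')             # convert the character to a table index
    if 0 <= index < len(JUMP_TABLE):
        return JUMP_TABLE[index](count)    # indirect jump through the table
    return 0                               # default: count = 0


after_case3 = run_switch('3', 10)
```

The range check before the table lookup matters in assembly as well: an index outside the table would jump to an arbitrary address.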
Bubble Sort
Consider sorting an array of 5 elements: 5 1 3 2 4
First Pass (4 comparisons) 5 1 3 2 4
Compare 5 with 1 and swap: 1 5 3 2 4 (swap)
Compare 5 with 3 and swap: 1 3 5 2 4 (swap)
Compare 5 with 2 and swap: 1 3 2 5 4 (swap)
Compare 5 with 4 and swap: 1 3 2 4 5 (swap)
Second Pass (3 comparisons) largest
Compare 1 with 3 (No swap): 1 3 2 4 5 (no swap)
Compare 3 with 2 and swap: 1 2 3 4 5 (swap)
Compare 3 with 4 (No swap): 1 2 3 4 5 (no swap)
Third Pass (2 comparisons)
Compare 1 with 2 (No swap): 1 2 3 4 5 (no swap)
Compare 2 with 3 (No swap): 1 2 3 4 5 (no swap)
No swapping during 3rd pass: array is now sorted
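The trace above can be reproduced with a short sketch. The function name `bubble_sort` and its return value (the number of passes) are assumptions made for illustration; the early exit when a pass performs no swap follows the trace.

```python
def bubble_sort(a):
    """Sort list `a` in place in ascending order; return the number of passes."""
    comparisons = len(a) - 1
    passes = 0
    while comparisons > 0:
        swapped = False
        for i in range(comparisons):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]   # swap adjacent elements
                swapped = True
        passes += 1
        if not swapped:
            break              # no swap in this pass: array is sorted
        comparisons -= 1       # the largest remaining element is in place
    return passes


data = [5, 1, 3, 2, 4]
passes_used = bubble_sort(data)
```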
Bubble Sort Procedure – Slide 1 of 2
;---------------------------------------------------
; bubbleSort: Sorts a DWORD array in ascending order
; Uses the bubble sort algorithm
; Receives: ESI = Array Address
; ECX = Array Length
; Returns: Array is sorted in place
;---------------------------------------------------
bubbleSort PROC USES eax ecx edx
outerloop:
dec ECX ; ECX = comparisons
jz sortdone ; if ECX == 0 then we are done
mov EDX, 1 ; EDX = sorted = 1 (true)
push ECX ; save ECX = comparisons
push ESI ; save ESI = array address
Summary
Bitwise instructions (AND, OR, XOR, NOT, TEST)
Manipulate individual bits in operands
CMP: compares operands using implied subtraction
Sets condition flags for later conditional jumps and loops
Conditional Jumps & Loops
Flag values: JZ, JNZ, JC, JNC, JO, JNO, JS, JNS, JP, JNP
Equality: JE(JZ), JNE (JNZ), JCXZ, JECXZ
Signed: JG (JNLE), JGE (JNL), JL (JNGE), JLE (JNG)
Unsigned: JA (JNBE), JAE (JNB), JB (JNAE), JBE (JNA)
LOOPZ (LOOPE), LOOPNZ (LOOPNE)
Indirect Jump and Jump Table
Integer Arithmetic
Computer Organization
&
Assembly Language Programming
Dr Adnan Gutub
aagutub ‘at’ uqu.edu.sa
[Adapted from slides of Dr. Kip Irvine: Assembly Language for Intel-Based Computers]
Most Slides contents have been arranged by Dr Muhamed Mudawar & Dr Aiman El-Maleh from Computer Engineering Dept. at KFUPM
Outline
Shift and Rotate Instructions
Shift and Rotate Applications
Multiplication and Division Instructions
Translating Arithmetic Expressions
Decimal String to Number Conversions
SHL Instruction
SHL is the Shift Left instruction
Performs a logical left shift on the destination operand
Fills the lowest bit with zero
The last bit shifted out from the left becomes the Carry Flag
Fast Multiplication
Shifting left 1 bit multiplies a number by 2
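A small model of SHL on an 8-bit operand shows both effects: the result doubles per shifted bit (as long as nothing is lost) and the last bit shifted out lands in the Carry Flag. The name `shl8` and the restriction to counts 1 to 8 are assumptions of this sketch.

```python
def shl8(value, count):
    """Model `shl` on an 8-bit operand; returns (result, CF)."""
    if not 1 <= count <= 8:
        raise ValueError("this sketch handles counts 1..8 only")
    wide = (value & 0xFF) << count
    result = wide & 0xFF          # low bits are filled with zeros
    cf = (wide >> 8) & 1          # last bit shifted out of bit 7
    return result, cf


doubled = shl8(0b00000101, 1)     # 5 * 2 = 10, nothing shifted out
with_carry = shl8(0b10001111, 1)  # the top bit moves into CF
```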
SHR Instruction
SHR is the Shift Right instruction
Performs a logical right shift on the destination operand
The highest bit position is filled with a zero
The last bit shifted out from the right becomes the Carry Flag
SHR uses the same instruction format as SHL
Arithmetic Shift
Fills the newly created bit position with a copy of the sign bit
Applies only to Shift Arithmetic Right (SAR)
SAL and SAR Instructions
SAL: Shift Arithmetic Left is identical to SHL
SAR: Shift Arithmetic Right
Performs a right arithmetic shift on the destination operand
Your Turn . . .
Indicate the value of AL and CF after each shift
Effect of Shift Instructions on Flags
The CF is the last bit shifted out
The OF is defined for single bit shift only
It is 1 if the sign bit changes
The ZF, SF and PF are affected according to the result
The AF is undefined after a shift
ROL Instruction
ROL is the Rotate Left instruction
Rotates each bit to the left, according to the count operand
Highest bit is copied into the Carry Flag and into the Lowest Bit
mov al,11110000b
rol al,1 ; AL = 11100001b, CF = 1
mov dl,3Fh ; DL = 00111111b
rol dl,4 ; DL = 11110011b = F3h, CF = 1
ROR Instruction
ROR is the Rotate Right instruction
Rotates each bit to the right, according to the count operand
Lowest bit is copied into the Carry flag and into the highest bit
mov al,11110000b
ror al,1 ; AL = 01111000b, CF = 0
mov dl,3Fh ; DL = 00111111b
ror dl,4 ; DL = F3h, CF = 1
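The ROL and ROR examples can be checked with a small model of 8-bit rotation. The names `rol8` and `ror8` are hypothetical, and the sketch only accepts counts 1 to 7 to avoid the special cases of real hardware.

```python
def rol8(value, count):
    """Rotate an 8-bit value left; returns (result, CF)."""
    if not 1 <= count <= 7:
        raise ValueError("this sketch handles counts 1..7 only")
    value &= 0xFF
    result = ((value << count) | (value >> (8 - count))) & 0xFF
    return result, result & 1            # CF = bit rotated into bit 0


def ror8(value, count):
    """Rotate an 8-bit value right; returns (result, CF)."""
    if not 1 <= count <= 7:
        raise ValueError("this sketch handles counts 1..7 only")
    value &= 0xFF
    result = ((value >> count) | (value << (8 - count))) & 0xFF
    return result, result >> 7           # CF = bit rotated into bit 7


rol_example = rol8(0b11110000, 1)
ror_example = ror8(0b11110000, 1)
```

No bits are lost in a rotation, which is why rotating 3Fh by four positions in either direction gives the same F3h.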
RCL Instruction
RCL is the Rotate Carry Left instruction
Rotates each bit to the left, according to the count operand
Copies the Carry flag to the least significant bit
Copies the most significant bit to the Carry flag
As if the carry flag is part of the destination operand
RCR Instruction
RCR is the Rotate Carry Right instruction
Rotates each bit to the right, according to the count operand
Copies the Carry flag to the most significant bit
Copies the least significant bit to the Carry flag
As if the carry flag is part of the destination operand
SHLD Instruction
SHLD is the Shift Left Double instruction
Syntax: SHLD destination, source, count
Shifts a destination operand a given count of bits to the left
SHLD Example
Shift variable var1 4 bits to the left
Replace the lowest 4 bits of var1 with the high 4 bits of AX
.data
var1 WORD 9BA6h
.code
mov ax, 0AC36h
shld var1, ax, 4
        var1   AX
Before: 9BA6   AC36
After:  BA6A   AC36
SHRD Instruction
SHRD is the Shift Right Double instruction
Syntax: SHRD destination, source, count
Shifts a destination operand a given count of bits to the right
SHRD Example
Shift AX 4 bits to the right
Replace the highest 4 bits of AX with the low 4 bits of DX
mov ax,234Bh
mov dx,7654h
shrd ax,dx,4
        DX     AX
Before: 7654   234B
After:  7654   4234
Your Turn . . .
Indicate the values (in hex) of each destination operand
mov ax,7C36h
mov dx,9FA6h
shld dx,ax,4 ; DX = FA67h
shrd ax,dx,8 ; AX = 677Ch
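The double-shift results can be verified with a model that joins the two 16-bit operands into one 32-bit value before shifting. The names `shld16` and `shrd16` are assumptions; only the destination result is modeled (the source operand and flags are not).

```python
def shld16(dest, src, count):
    """Model `shld dest, src, count` for 16-bit operands (count 1..15)."""
    combined = ((dest & 0xFFFF) << 16) | (src & 0xFFFF)   # dest:src
    return ((combined << count) >> 16) & 0xFFFF           # high bits of src fill in


def shrd16(dest, src, count):
    """Model `shrd dest, src, count` for 16-bit operands (count 1..15)."""
    combined = ((src & 0xFFFF) << 16) | (dest & 0xFFFF)   # src:dest
    return (combined >> count) & 0xFFFF                   # low bits of src fill in


dx = shld16(0x9FA6, 0x7C36, 4)    # shld dx,ax,4
ax = shrd16(0x7C36, dx, 8)        # shrd ax,dx,8 uses the updated DX
```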
Next . . .
Shift and Rotate Instructions
Shift and Rotate Applications
Multiplication and Division Instructions
Translating Arithmetic Expressions
Decimal String to Number Conversions
Shifting Bits within an Array
Sometimes, we need to shift all bits within an array
Example: moving a bitmapped image from one screen to another
.data
ArraySize EQU 100
array BYTE ArraySize DUP(9Bh)
.code
mov ecx, ArraySize
mov esi, 0
clc ; clear carry flag
L1:
rcr array[esi], 1 ; propagate the carry flag
inc esi ; does not modify carry
loop L1 ; does not modify carry
Index:        [0] [1] [2] ... [99]
array before: 9B  9B  9B  ... 9B
array after:  4D  CD  CD  ... CD
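The carry propagation in the loop above can be followed in a short model. The function name `shift_array_right` is hypothetical; each iteration plays the role of one `rcr array[esi], 1`, with the carry from one byte becoming the top bit of the next.

```python
def shift_array_right(data):
    """Shift a byte array right by one bit; returns (new_bytes, final_carry)."""
    carry = 0                                   # clc
    out = bytearray(len(data))
    for i, byte in enumerate(data):
        out[i] = (carry << 7) | (byte >> 1)     # rcr: old carry enters bit 7
        carry = byte & 1                        # bit 0 leaves into the carry
    return bytes(out), carry


shifted, final_carry = shift_array_right(bytes([0x9B] * 100))
```

This is also why the loop must use INC and LOOP: an instruction that changed the Carry Flag between iterations would break the chain.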
Integer Arithmetic COE 205 – KFUPM slide 21
Binary Multiplication
You know that SHL performs multiplication efficiently
When the multiplier is a power of 2
Your Turn . . .
Multiply EAX by 26, using shifting and addition instructions
Hint: 26 = 2 + 8 + 16
mov ebx, eax ; EBX = number
shl eax, 1 ; EAX = number * 2
shl ebx, 3 ; EBX = number * 8
add eax, ebx ; EAX = number * 10
shl ebx, 1 ; EBX = number * 16
add eax, ebx ; EAX = number * 26
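The same decomposition works for any constant multiplier: add one shifted copy of the number for every 1 bit in the multiplier. A minimal sketch, assuming a hypothetical helper `shift_add_multiply` with a 32-bit result like EAX:

```python
MASK32 = 0xFFFFFFFF


def shift_add_multiply(number, multiplier):
    """Multiply using only shifts and additions; result truncated to 32 bits."""
    result = 0
    shift = 0
    while multiplier:
        if multiplier & 1:                 # this power of two is in the multiplier
            result += number << shift      # add number * 2**shift
        multiplier >>= 1
        shift += 1
    return result & MASK32


times_26 = shift_add_multiply(7, 26)       # 26 = 2 + 8 + 16
```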
Convert Number to Hex String
Task: Convert EAX to a Hexadecimal String pointed by ESI
Receives: EAX = Number, ESI= Address of hex string
Returns: String pointed by ESI is filled with hex characters '0' to 'F'
ConvToHexStr PROC USES ebx ecx esi
mov ecx, 8 ; 8 iterations, why?
L1: rol eax, 4 ; rotate upper 4 bits
mov ebx, eax
and ebx, 0Fh ; keep only lower 4 bits
mov bl, HexChar[ebx] ; convert to a hex char
mov [esi], bl ; store hex char in string
inc esi
loop L1 ; loop 8 times
mov BYTE PTR [esi], 0 ; append a null byte
ret
HexChar BYTE "0123456789ABCDEF"
ConvToHexStr ENDP
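The rotate-and-mask loop can be mirrored step by step. The names `to_hex_string` and `HEX_CHARS` are assumptions of this sketch. Eight iterations are needed because a 32-bit value holds eight 4-bit hex digits.

```python
HEX_CHARS = "0123456789ABCDEF"   # same lookup table as HexChar
MASK32 = 0xFFFFFFFF


def to_hex_string(eax):
    """Convert a 32-bit value to an 8-character hex string, high digit first."""
    eax &= MASK32
    chars = []
    for _ in range(8):                              # 32 bits / 4 bits per digit
        eax = ((eax << 4) | (eax >> 28)) & MASK32   # rol eax, 4
        chars.append(HEX_CHARS[eax & 0x0F])         # and ebx, 0Fh + table lookup
    return "".join(chars)


hex_text = to_hex_string(0x1234ABCD)
```

Rotating left first brings the most significant digit into the low four bits, so the characters come out in reading order.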
DX = DH:DL, with DH = 00100110 and DL = 01101010
Field:        Year   Month   Day
Bit numbers:  9-15   5-8     0-4
In this example:
Day = 10
Month = 3
Year = 1980 + 19
Date = March 10, 1999
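Each field can be isolated with a shift followed by a mask. The sketch below assumes a hypothetical helper `unpack_date` and uses the bit layout given above (day in bits 0-4, month in bits 5-8, year offset from 1980 in bits 9-15).

```python
def unpack_date(dx):
    """Split a 16-bit packed date into (year, month, day)."""
    day = dx & 0x1F                      # bits 0-4
    month = (dx >> 5) & 0x0F             # bits 5-8
    year = 1980 + ((dx >> 9) & 0x7F)     # bits 9-15, offset from 1980
    return year, month, day


example_date = unpack_date(0b0010011001101010)   # DH:DL from the example
```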
Next . . .
Shift and Rotate Instructions
Shift and Rotate Applications
Multiplication and Division Instructions
Translating Arithmetic Expressions
Decimal String to Number Conversions
MUL Instruction
The MUL instruction is used for unsigned multiplication
Multiplies 8-, 16-, or 32-bit operand by AL, AX, or EAX
The instruction formats are:
MUL r/m8 ; AX = AL * r/m8
MUL r/m16 ; DX:AX = AX * r/m16
MUL r/m32 ; EDX:EAX = EAX * r/m32
MUL Examples
Example 1: Multiply 16-bit var1 (2000h) * var2 (100h)
.data
var1 WORD 2000h
var2 WORD 100h
.code
mov ax,var1
mul var2 ; DX:AX = 00200000h, CF = OF = 1
The Carry and Overflow flags are set if upper half of the product is non-zero
Example 2: Multiply EAX (12345h) * EBX (1000h)
mov eax,12345h
mov ebx,1000h
mul ebx ; EDX:EAX = 0000000012345000h, CF=OF=0
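The way the product is split across two registers, and when CF and OF are set, can be sketched for the 16-bit form. The name `mul16` is hypothetical; it returns DX, AX and the common value of CF and OF.

```python
def mul16(ax, operand):
    """Model `mul r/m16`: returns (DX, AX, CF) where CF = OF."""
    product = (ax & 0xFFFF) * (operand & 0xFFFF)   # full 32-bit product
    dx = product >> 16                             # upper half
    ax_out = product & 0xFFFF                      # lower half
    carry = int(dx != 0)                           # set when upper half is non-zero
    return dx, ax_out, carry


example_1 = mul16(0x2000, 0x100)
```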
Your Turn . . .
What will be the hexadecimal values of DX, AX, and the
Carry flag after the following instructions execute?
mov ax, 1234h
mov bx, 100h Solution
IMUL Instruction
The IMUL instruction is used for signed multiplication
Preserves the sign of the product by sign-extending it
One-Operand formats, as in MUL
IMUL r/m8 ; AX = AL * r/m8
IMUL r/m16 ; DX:AX = AX * r/m16
IMUL r/m32 ; EDX:EAX = EAX * r/m32
Two-Operand formats:
IMUL r16, r16/m16/imm8/imm16
IMUL r32, r32/m32/imm8/imm32
Three-Operand formats:
IMUL r16, r16/m16, imm8/imm16
IMUL r32, r32/m32, imm8/imm32
The Carry and Overflow flags are set if the upper half of the product is not a sign extension of the lower half
IMUL Examples
Multiply AL = 48 by BL = 4
mov al,48
mov bl,4
imul bl ; AX = 00C0h, CF = OF = 1
mov ax,8760h
mov bx,100h
imul bx ; DX = FF87h, AX = 6000h, OF = CF = 1
Two and Three Operand Formats
.data
wval SWORD -4
dval SDWORD 4
.code
mov ax, -16
mov bx, 2
imul bx, ax ; BX = BX * AX = -32
imul bx, 2 ; BX = BX * 2 = -64
imul bx, wval ; BX = BX * wval = 256
imul bx, 5000 ; OF = CF = 1
mov edx,-16
imul edx,dval ; EDX = EDX * dval = -64
imul bx, wval,-16 ; BX = wval * -16 = 64
imul ebx,dval,-16 ; EBX = dval * -16 = -64
imul eax,ebx,2000000000 ; OF = CF = 1
DIV Instruction
The DIV instruction is used for unsigned division
A single operand (divisor) is supplied
Divisor is an 8-bit, 16-bit, or 32-bit register or memory
Dividend is implicit and is either AX, DX:AX, or EDX:EAX
DIV Examples
Divide AX = 8003h by CX = 100h
mov dx,0 ; clear dividend, high
mov ax,8003h ; dividend, low
mov cx,100h ; divisor
div cx ; AX = 0080h, DX = 3 (Remainder)
mov dx,0087h
mov ax,6023h
mov bx,100h
div bx ; Solution: DX = 0023h, AX = 8760h
Divide Overflow
Divide Overflow occurs when …
Quotient cannot fit into the destination operand, or when
Dividing by Zero
Divide Overflow causes a CPU interrupt
The current program halts and an error dialog box is produced
Example of a Divide Overflow
mov dx,0087h
mov ax,6002h
mov bx,10h
div bx
Divide overflow: Quotient = 87600h cannot fit in AX
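The two overflow conditions can be expressed as checks in a model of the 16-bit DIV. The name `div16` is hypothetical, and Python exceptions stand in for the CPU interrupt.

```python
def div16(dx, ax, divisor):
    """Model `div r/m16`: divide DX:AX by divisor; returns (AX, DX)."""
    divisor &= 0xFFFF
    if divisor == 0:
        raise ZeroDivisionError("divide by zero")
    dividend = ((dx & 0xFFFF) << 16) | (ax & 0xFFFF)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFF:
        raise OverflowError("quotient does not fit in AX")
    return quotient, remainder           # AX = quotient, DX = remainder


ok_result = div16(0x0087, 0x6023, 0x100)
try:
    div16(0x0087, 0x6002, 0x10)          # quotient would be 87600h
    overflowed = False
except OverflowError:
    overflowed = True
```

A common way to avoid the overflow in practice is to use a wider division (for example a 32-bit divisor with EDX:EAX as the dividend).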
Signed Integer Division
Signed integers must be sign-extended before division
Fill high byte, word, or double-word with a copy of the sign bit
Example:
mov ax, 0FE9Bh ; AX = -357
cwd ; DX:AX = FFFFFE9Bh
IDIV Instruction
IDIV performs signed integer division
Same syntax and operands as DIV instruction
IDIV r/m8
IDIV r/m16
IDIV r/m32
IDIV Examples
Example: Divide DX:AX (-48) by BX (-5)
mov ax,-48
cwd ; sign-extend AX into DX
mov bx,-5
idiv bx ; AX = 9, DX = -3
mov eax,48
cdq ; sign-extend EAX into EDX
mov ebx,-5
idiv ebx ; EAX = -9, EDX = 3
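IDIV truncates the quotient toward zero, and the remainder takes the sign of the dividend. Python's `//` operator floors instead, so the sketch below (with the hypothetical name `idiv`) builds the truncated quotient explicitly. Overflow checking is left out.

```python
def idiv(dividend, divisor):
    """Signed division as IDIV does it: returns (quotient, remainder)."""
    quotient = abs(dividend) // abs(divisor)        # magnitude, truncated
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient                        # signs differ: negative result
    remainder = dividend - quotient * divisor       # same sign as the dividend
    return quotient, remainder


first_example = idiv(-48, -5)
second_example = idiv(48, -5)
```

For comparison, `48 // -5` in Python is -10 (with remainder -2), which is not what the processor produces.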
Next . . .
Shift and Rotate Instructions
Shift and Rotate Applications
Multiplication and Division Instructions
Translating Arithmetic Expressions
Decimal String to Number Conversions
Translating Arithmetic Expressions
Some good reasons to translate arithmetic expressions
Learn how compilers do it
Test your understanding of MUL, IMUL, DIV, and IDIV
Check for Carry and Overflow flags
Signed Arithmetic Expressions
Example: var4 = (-var1 * var2) + var3
mov eax, var1
neg eax
imul var2 ; signed multiplication
jo tooBig ; check for overflow
add eax, var3
jo tooBig ; check for overflow
mov var4, eax ; save result
Your Turn . . .
Translate: var5 = (var1 * -var2)/(var3 – var4)
Assume signed 32-bit integers
Next . . .
Shift and Rotate Instructions
Shift and Rotate Applications
Multiplication and Division Instructions
Translating Arithmetic Expressions
Decimal String to Number Conversions
Convert Decimal String – cont'd
; Assumes: String should contain only decimal chars
; String should not be empty
; Procedure does not detect invalid input
; Procedure does not skip leading spaces
Convert to Decimal String – cont'd
ConvToDecStr PROC
pushad ; save all since most are used
mov ecx, 0 ; Used to count decimal digits
mov ebx, 10 ; divisor = 10
L1: mov edx, 0 ; dividend = EDX:EAX
div ebx ; EDX = remainder = 0 to 9
add dl, '0' ; convert DL to '0' to '9'
push dx ; save decimal character
inc ecx ; and count it
cmp eax, 0
jnz L1 ; loop back if EAX != 0
L2: pop dx ; pop in reverse order
mov [esi], dl ; store decimal char in string
inc esi
loop L2
mov BYTE PTR [esi], 0 ; Terminate with a NULL char
popad ; restore all registers
ret ; return
ConvToDecStr ENDP
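The procedure produces digits in reverse order (least significant first) and uses the stack to reverse them. The sketch below follows the same two loops; the name `to_decimal_string` is an assumption, and a Python list plays the role of the runtime stack.

```python
def to_decimal_string(eax):
    """Convert an unsigned 32-bit value to its decimal string."""
    eax &= 0xFFFFFFFF
    stack = []
    while True:
        eax, remainder = divmod(eax, 10)             # div ebx: EDX = remainder
        stack.append(chr(remainder + ord('0')))      # add dl, '0' ; push dx
        if eax == 0:                                 # cmp eax, 0 ; jnz L1
            break
    digits = []
    while stack:
        digits.append(stack.pop())                   # pop dx: reverse order
    return "".join(digits)


decimal_text = to_decimal_string(1999)
```

Testing the quotient after the division, not before, guarantees that the value 0 still produces the single character '0'.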
Summary
Shift and rotate instructions
Provide finer control over bits than high-level languages
Can shift and rotate more than one bit left or right
SHL, SHR, SAR, SHLD, SHRD, ROL, ROR, RCL, RCR
Shifting left by n bits is a multiplication by 2^n
Shifting right by n bits is an integer division by 2^n (use SAR to preserve sign)
Chapter 8
MACRO
FUNCTION
Macro
Multi-Line Macros
Multi-line macros can include a varying number of
lines (including one). The multi-line macros are
more useful and the following sections will focus
primarily on multi-line macros.
Macro Definition: a macro must be defined before it is used
Syntax :
%macro <name> <number of arguments>
; [body of macro]
%endmacro
The arguments can be referenced within the macro by
%<number>, with %1 being the first argument, and
%2 the second argument, and so forth.
Using a Macro
Macro Example
; Example Program to demonstrate a simple macro
;****************************************
; Define the macro
; called with three arguments:
; aver <lst>, <len>, <ave>
%macro aver 3
mov eax, 0
mov ecx, dword [%2] ; length
mov r12, 0
lea rbx, [%1]
%%sumLoop:
add eax, dword [rbx+r12*4] ; get list[n]
inc r12
loop %%sumLoop
cdq
idiv dword [%2]
mov dword [%3], eax
%endmacro
;***************************************
; Data declarations
section .data
; -----
; Define constants
EXIT_SUCCESS equ 0 ; success code
SYS_exit equ 60 ; code for terminate
; Define Data.
section .data
list1 dd 4, 5, 2, -3, 1
len1 dd 5
ave1 dd 0
list2 dd 2, 6, 3, -2, 1, 8, 19
len2 dd 7
ave2 dd 0
;***************************************
section .text
global _start
_start:
; Use the macro in the program
aver list1, len1, ave1 ; 1st, data set 1
aver list2, len2, ave2
last:
mov rax, SYS_exit ; exit
mov rdi, EXIT_SUCCESS ; success
syscall
Functions
Functions and procedures (i.e., void functions)
help break up a program into smaller parts,
making it easier to code, debug, and maintain.
Function calls involve two main actions:
Linkage: Since the function can be called from
multiple different places in the code, the function must
be able to return to the correct place from which it was
originally called.
Argument Transmission : The function must be able to
access parameters to operate on or to return
results (i.e., access call-by-reference parameters).
Function Declaration
Linkage
The linkage is about getting to and returning from
a function call correctly. There are two instructions
that handle the linkage, call <funcName> and ret
instructions.
The call transfers control to the named function,
and ret returns control back to the calling routine.
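The call instruction works as follows:
Pushes the 64-bit RIP register (the address of the instruction after the call) onto the stack
Jumps to the function label
The ret instruction works as follows:
Pops the return address from the stack into RIP
Execution continues at that address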
Argument Transmission
Argument transmission refers to sending information
(variables, etc.) to a function and obtaining a result as
appropriate for the specific function.
Transmitting values to a function is referred to as call-by-value.
Transmitting addresses to a function is referred to as call-by-reference.
There are various ways to pass arguments to and/or from a
function
Placing values in registers
Easiest, but has limitations (i.e., the number of registers).
Used for first six integer arguments.
Used for system calls.
Parameter Passing
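As noted, a combination of registers and the stack is used
to pass parameters to and/or from a function. The first six
integer arguments are passed in registers as follows:
rdi, rsi, rdx, rcx, r8, r9 (first through sixth argument, per the System V AMD64 calling convention)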
Register Usage
Some registers are expected to be preserved across a
function call. That means that if a value is placed in a
preserved register (or saved register) and the function must
use that register, the original value must be preserved by
placing it on the stack. The register can then be altered as
needed and must be restored to its original value before
returning to the calling routine.
Call Frame
Red Zone
Caller
There are 4 arguments, and all arguments are passed in
registers in accordance with the standard calling
convention. The assembly language code in the calling
routine for the call to the stats1 function would be as
follows:
    ; stats1(arr, len, sum, ave);
    mov     rcx, ave            ; 4th arg, addr of ave
    mov     rdx, sum            ; 3rd arg, addr of sum
    mov     esi, dword [len]    ; 2nd arg, value of len
    mov     rdi, arr            ; 1st arg, addr of arr
    call    stats1
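For comparison, a C version of the same call might look like the sketch below. The callee's body does not appear in the text above, so this assumes that stats1 computes the sum and the integer average of the array; the parameter types are also an assumption. Note how the call-by-reference arguments (arr, sum, ave) are addresses, while len is passed by value, matching the mov instructions in the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed C counterpart of stats1 (hypothetical body):
   arr, sum, and ave are passed by reference; len by value. */
void stats1(const int arr[], int len, int *sum, int *ave)
{
    assert(arr != NULL && sum != NULL && ave != NULL && len > 0);
    int total = 0;
    for (int i = 0; i < len; i++)
        total += arr[i];
    *sum = total;           /* result returned through the address */
    *ave = total / len;     /* integer average, truncated */
}
```

A caller would declare int sum, ave; and invoke stats1(arr, len, &sum, &ave); the compiler then places the four arguments in rdi, rsi, rdx, and rcx.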
Callee
Caller
There are 8 arguments and only the first six can be passed
in registers. The last two arguments are passed on the stack.
The assembly language code in the calling routine for the
call to the stats function would be as follows:
Callee
Chapter 9
SYSTEM SERVICES
Introduction
System Calls
Newline Character
In the context of output, a newline means move the
cursor to the start of the next line.
In many languages, including C, it is written as
“\n” as part of a string.
Nothing is displayed for the newline, but the cursor is
moved to the start of the next line.
In Unix/Linux systems, the linefeed, abbreviated LF with
an ASCII value of 10 (or 0x0A), is used as the newline
character.
In Windows systems, the newline is a carriage
return, abbreviated CR with an ASCII value of 13 (or
0x0D), followed by the LF.
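The two conventions differ by a single CR byte per line, which a short C sketch makes concrete. The function name crlf_to_lf is hypothetical; it converts Windows-style CR LF pairs into the single LF used on Unix/Linux, assuming an ASCII-compatible character set.

```c
#include <assert.h>
#include <stddef.h>

enum { LF = 10, CR = 13 };   /* ASCII values from the text above */

/* Convert each CR LF pair to a single LF, in place.
   Returns the new length of the string. */
size_t crlf_to_lf(char *s)
{
    assert(s != NULL);
    size_t out = 0;
    for (size_t in = 0; s[in] != '\0'; in++) {
        if (s[in] == CR && s[in + 1] == LF)
            continue;            /* drop the CR, keep the LF */
        s[out++] = s[in];
    }
    s[out] = '\0';
    return out;
}
```

A lone CR that is not followed by LF is left untouched, since it is not a Windows newline.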
Console Output
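The system service to output characters to the console is
the system write (SYS_write).
As in a high-level language, characters are written to
standard output (STDOUT), which is the console by default.
STDOUT is the default file descriptor for the console.
The arguments for the write system service are as follows:
rax: call code (SYS_write, 1)
rdi: output location, i.e., the file descriptor (STDOUT, 1)
rsi: address of the characters to output
rdx: number of characters to output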
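In C, the same service is reached through the POSIX write() wrapper, whose three parameters correspond to the file descriptor, the address of the characters, and the count. The sketch below is a minimal illustration; print_string is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>   /* write, STDOUT_FILENO (POSIX) */

/* Write a NUL-terminated string to the console using the write
   system service. Returns the number of bytes written, or -1
   on error. */
ssize_t print_string(const char *msg)
{
    assert(msg != NULL);
    return write(STDOUT_FILENO, msg, strlen(msg));
}
```

For example, print_string("Hello World\n") passes a count of 12: eleven visible characters plus the LF, which moves the cursor but displays nothing.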
Console Input
The system service to read characters from the
console is the system read (SYS_read).
As in a high-level language, console characters
are read from standard input (STDIN).
We will need to declare an appropriate amount of
space to store the characters being read.
If we request 10 characters and the user types
more than 10, the additional characters are not
returned by that read; they remain in the input
buffer until they are read or discarded.
If the user types fewer than 10 characters, for example
5 characters, all five characters will be read plus the
newline (LF), for a total of six characters.
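One common way to cope with input that is longer than the buffer is to read one character at a time and discard anything that does not fit. The C sketch below illustrates that approach under stated assumptions: read_line is a hypothetical helper, it takes the file descriptor as a parameter (0 for STDIN), and it does not retry interrupted reads.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>   /* read (POSIX) */

/* Read one line from fd into buf, one character at a time.
   Stores at most max - 1 characters plus a terminating NUL.
   Extra characters before the LF are read and discarded, and
   the LF itself is not stored. Returns the count stored. */
size_t read_line(int fd, char *buf, size_t max)
{
    assert(buf != NULL && max > 0);
    size_t count = 0;
    char ch;
    while (read(fd, &ch, 1) == 1 && ch != '\n') {
        if (count < max - 1)
            buf[count++] = ch;   /* room left: keep it */
        /* otherwise the buffer is full: drop the character */
    }
    buf[count] = '\0';
    return count;
}
```

Consuming the rest of the line keeps leftover characters from showing up in the next read.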
Example
After the system call, the rax register will contain the
return value.
If the file open operation fails, rax will contain a negative
value (i.e., < 0).
If the file open operation succeeds, rax contains the file
descriptor.
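The same success/failure check applies when the service is called from C. One difference worth noting: the raw system call returns a negative value in rax, while the POSIX open() wrapper returns -1 and records the reason in errno. The sketch below is a minimal illustration, and open_for_read is a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>    /* open, O_RDONLY (POSIX) */
#include <stddef.h>
#include <unistd.h>   /* close */

/* Open a file for reading. Returns a file descriptor (>= 0) on
   success or a negative value on failure, mirroring the check
   on rax described above. */
int open_for_read(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;    /* the open operation failed */
    return fd;        /* valid file descriptor */
}
```

The returned descriptor is what later read and close operations use to identify the file.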
File Open/Create
A file open/create operation will create a file.
If the file does not exist, a new file will be created.
If the file already exists, it will be erased and a new file created.
Since the file is being created, the access mode must include the
file permissions that will be set when the file is created.
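In C, the create behavior described above can be sketched with open() and the O_CREAT and O_TRUNC flags. The helper name create_file and the choice of owner read/write permissions are assumptions made for this illustration.

```c
#include <assert.h>
#include <fcntl.h>      /* open, O_WRONLY, O_CREAT, O_TRUNC (POSIX) */
#include <stddef.h>
#include <sys/stat.h>   /* S_IRUSR, S_IWUSR, fstat */
#include <unistd.h>     /* write, close, unlink */

/* Create a file for writing. If the file already exists, its
   contents are erased. The permissions (owner read/write here)
   are applied only when a new file is created. Returns a file
   descriptor, or a negative value on failure. */
int create_file(const char *path)
{
    assert(path != NULL);
    return open(path, O_WRONLY | O_CREAT | O_TRUNC,
                S_IRUSR | S_IWUSR);
}
```

Calling create_file twice on the same path leaves an empty file the second time, which is why this operation should not be used to open a file whose contents must be kept.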
File Read
A file must be opened with the appropriate file access flags
before it can be read.
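The arguments for the file read system service are as follows:
rax: call code (SYS_read, 0)
rdi: file descriptor of the open file
rsi: address of where to place the characters read
rdx: number of characters to read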
File Write
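The arguments for the file write system service are as
follows:
rax: call code (SYS_write, 1)
rdi: file descriptor of the open file
rsi: address of the characters to write
rdx: number of characters to write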
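The file read and file write services take the same three arguments as their console counterparts, with the file descriptor returned by the open operation in place of STDIN or STDOUT. The C sketch below writes a buffer to a file and reads it back; write_file and read_file are hypothetical helper names, and each performs a single read or write without retrying partial transfers.

```c
#include <assert.h>
#include <fcntl.h>      /* open and O_* flags (POSIX) */
#include <stddef.h>
#include <sys/stat.h>   /* S_IRUSR, S_IWUSR */
#include <unistd.h>     /* read, write, close, unlink */

/* Write len bytes from data to a newly created file at path.
   Returns 0 on success, -1 on failure. */
int write_file(const char *path, const char *data, size_t len)
{
    assert(path != NULL && data != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC,
                  S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, data, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}

/* Read up to max bytes from the file at path into buf.
   Returns the number of bytes read, or -1 on failure. */
ssize_t read_file(const char *path, char *buf, size_t max)
{
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_RDONLY);   /* open before reading */
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, max);
    close(fd);
    return n;
}
```

As with console input, the read returns the number of characters actually read, which may be smaller than the number requested.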