
Instructions: Language of the Computer

The document discusses computer instruction sets and operations. It begins by explaining that different computers have different instruction sets but with many common aspects. Early computers had very simple instruction sets for simplified implementation, while modern computers still aim for simplicity. The MIPS instruction set is then used as an example throughout. The document goes on to describe common arithmetic, logical, and memory operations that instructions can perform on computer hardware. It also explains how instructions and data are represented in binary and stored in memory for computers to execute as programs. Finally, some logical operations like shift and bitwise AND/OR are discussed.

Uploaded by

Adip Chy
Copyright
© Attribution Non-Commercial (BY-NC)

Chapter 2

Instructions: Language of the Computer

2.1 Introduction

Instruction Set

The repertoire of instructions of a computer
Different computers have different instruction sets
  But with many aspects in common
Early computers had very simple instruction sets
  Simplified implementation
Many modern computers also have simple instruction sets


Chapter 2 Instructions: Language of the Computer 2

The MIPS Instruction Set


Used as the example throughout the book
Stanford MIPS commercialized by MIPS Technologies (www.mips.com)
Large share of embedded core market
  Applications in consumer electronics, network/storage equipment, cameras, printers, …
Typical of many modern ISAs
  See MIPS Reference Data tear-out card, and Appendixes B and E

2.2 Operations of the Computer Hardware

Arithmetic Operations

Add and subtract, three operands
  Two sources and one destination
    add a, b, c    # a gets b + c
All arithmetic operations have this form
Design Principle 1: Simplicity favours regularity
  Regularity makes implementation simpler
  Simplicity enables higher performance at lower cost

Arithmetic Example

C code:
    f = (g + h) - (i + j);

Compiled MIPS code:
    add t0, g, h     # temp t0 = g + h
    add t1, i, j     # temp t1 = i + j
    sub f, t0, t1    # f = t0 - t1

2.3 Operands of the Computer Hardware

Register Operands

Arithmetic instructions use register operands
MIPS has a 32 × 32-bit register file
  Use for frequently accessed data
  Numbered 0 to 31
  32-bit data called a "word"
Assembler names
  $t0, $t1, …, $t9 for temporary values
  $s0, $s1, …, $s7 for saved variables
Design Principle 2: Smaller is faster
  c.f. main memory: millions of locations

Register Operand Example

C code:
    f = (g + h) - (i + j);
  f, …, j in $s0, …, $s4

Compiled MIPS code:
    add $t0, $s1, $s2
    add $t1, $s3, $s4
    sub $s0, $t0, $t1

Memory Operands

Main memory used for composite data
  Arrays, structures, dynamic data
To apply arithmetic operations
  Load values from memory into registers
  Store result from register to memory
Memory is byte addressed
  Each address identifies an 8-bit byte
Words are aligned in memory
  Address must be a multiple of 4
MIPS is Big Endian
  Most-significant byte at least address of a word
  c.f. Little Endian: least-significant byte at least address
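The byte-order and alignment rules can be sketched in C. This is only an illustration: the function names are hypothetical and the byte values are arbitrary test data, not taken from the slides.

```c
#include <stdint.h>

/* Assemble a 32-bit word from 4 bytes stored big-endian:
   the most-significant byte sits at the lowest address (as on MIPS). */
uint32_t load_word_big_endian(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* The same 4 bytes read little-endian, for comparison:
   the least-significant byte sits at the lowest address. */
uint32_t load_word_little_endian(const uint8_t *p) {
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* A word address is aligned when it is a multiple of 4. */
int is_word_aligned(uint32_t address) {
    return (address & 3u) == 0;
}
```

The same four bytes in memory give different word values under the two conventions, which is why byte order matters when data moves between machines.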


Memory Operand Example 1

C code:
    g = h + A[8];
  g in $s1, h in $s2, base address of A in $s3

Compiled MIPS code:
  Index 8 requires offset of 32
    4 bytes per word
    lw  $t0, 32($s3)     # load word (offset 32, base register $s3)
    add $s1, $s2, $t0

Memory Operand Example 2

C code:
    A[12] = h + A[8];
  h in $s2, base address of A in $s3

Compiled MIPS code:
  Index 8 requires offset of 32
    lw  $t0, 32($s3)     # load word
    add $t0, $s2, $t0
    sw  $t0, 48($s3)     # store word

Registers vs. Memory

Registers are faster to access than memory
Operating on memory data requires loads and stores
  More instructions to be executed
Compiler must use registers for variables as much as possible
  Only spill to memory for less frequently used variables
  Register optimization is important!

Immediate Operands

Constant data specified in an instruction
    addi $s3, $s3, 4
No subtract immediate instruction
  Just use a negative constant
    addi $s2, $s1, -1
Design Principle 3: Make the common case fast
  Small constants are common
  Immediate operand avoids a load instruction

The Constant Zero

MIPS register 0 ($zero) is the constant 0
  Cannot be overwritten
Useful for common operations
  E.g., move between registers
    add $t2, $s1, $zero

2.4 Signed and Unsigned Numbers

Unsigned Binary Integers

Given an n-bit number
  $x = x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$
Range: 0 to $+2^n - 1$
Example
  $0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 1011_2$
  $= 0 + \cdots + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$
  $= 0 + \cdots + 8 + 0 + 2 + 1 = 11_{10}$
Using 32 bits
  0 to +4,294,967,295
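The positional formula above can be evaluated mechanically. The following is a small sketch in C; the function name `unsigned_value` is hypothetical, and it assumes the bit string is written most-significant bit first with at most 32 binary digits.

```c
#include <stdint.h>

/* Evaluate x = sum of x_i * 2^i for a bit string written MSB first.
   Characters other than '0' and '1' (such as spaces) are skipped. */
uint32_t unsigned_value(const char *bits) {
    uint32_t x = 0;
    for (; *bits != '\0'; bits++) {
        if (*bits == '0' || *bits == '1')
            x = x * 2u + (uint32_t)(*bits - '0');
    }
    return x;
}
```

Doubling the running value before adding each new bit is equivalent to weighting bit i by 2^i.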

2s-Complement Signed Integers

Given an n-bit number
  $x = -x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$
Range: $-2^{n-1}$ to $+2^{n-1} - 1$
Example
  $1111\ 1111\ 1111\ 1111\ 1111\ 1111\ 1111\ 1100_2$
  $= -1 \times 2^{31} + 1 \times 2^{30} + \cdots + 1 \times 2^2 + 0 \times 2^1 + 0 \times 2^0$
  $= -2{,}147{,}483{,}648 + 2{,}147{,}483{,}644 = -4_{10}$
Using 32 bits
  -2,147,483,648 to +2,147,483,647
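As a cross-check of the weighting above, here is a sketch in C that sums the bit weights directly, giving bit 31 the weight -2^31. The function name is hypothetical, and a 64-bit accumulator is used so that no intermediate sum overflows.

```c
#include <stdint.h>

/* Value of a 32-bit pattern under the 2s-complement weighting:
   bits 30..0 contribute +2^i, bit 31 contributes -2^31. */
int64_t twos_complement_value(uint32_t pattern) {
    int64_t value = 0;
    for (int i = 0; i < 31; i++) {
        if ((pattern >> i) & 1u)
            value += (int64_t)1 << i;
    }
    if (pattern >> 31)
        value -= (int64_t)1 << 31;
    return value;
}
```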

2s-Complement Signed Integers

Bit 31 is sign bit
  1 for negative numbers
  0 for non-negative numbers
$-(-2^{n-1})$ can't be represented
Non-negative numbers have the same unsigned and 2s-complement representation
Some specific numbers
   0: 0000 0000 … 0000
  -1: 1111 1111 … 1111
  Most-negative: 1000 0000 … 0000
  Most-positive: 0111 1111 … 1111

Signed Negation

Complement and add 1
  Complement means 1 → 0, 0 → 1
  $x + \bar{x} = 1111\ldots111_2 = -1$
  $\bar{x} + 1 = -x$
Example: negate +2
  +2 = 0000 0000 … 0010_2
  -2 = 1111 1111 … 1101_2 + 1
     = 1111 1111 … 1110_2
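The complement-and-add-1 rule is a single expression in C. In this sketch the function name is hypothetical, and unsigned arithmetic is chosen deliberately so that wraparound is well defined by the language.

```c
#include <stdint.h>

/* Negate a 32-bit 2s-complement pattern: complement every bit, then add 1. */
uint32_t negate_bits(uint32_t x) {
    return ~x + 1u;
}
```

Applying it to the most-negative pattern returns the same pattern, which matches the earlier remark that the negation of the most-negative number cannot be represented.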

Sign Extension

Representing a number using more bits
  Preserve the numeric value
In MIPS instruction set
  addi: extend immediate value
  lb, lh: extend loaded byte/halfword
  beq, bne: extend the displacement
Replicate the sign bit to the left
  c.f. unsigned values: extend with 0s
Examples: 8-bit to 16-bit
  +2: 0000 0010 => 0000 0000 0000 0010
  -2: 1111 1110 => 1111 1111 1111 1110
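The 8-bit to 16-bit examples can be reproduced with a short C sketch. Both function names are hypothetical; the second function shows the zero-extension used for unsigned values.

```c
#include <stdint.h>

/* Sign-extend an 8-bit pattern to 16 bits by replicating bit 7 to the left. */
uint16_t sign_extend_8_to_16(uint8_t b) {
    return (b & 0x80u) ? (uint16_t)(0xFF00u | b) : (uint16_t)b;
}

/* Zero-extend: the upper 8 bits are filled with 0s (unsigned values). */
uint16_t zero_extend_8_to_16(uint8_t b) {
    return (uint16_t)b;
}
```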


2.5 Representing Instructions in the Computer

Representing Instructions

Instructions are encoded in binary
  Called machine code
MIPS instructions
  Encoded as 32-bit instruction words
  Small number of formats encoding operation code (opcode), register numbers, …
  Regularity!
Register numbers
  $t0 – $t7 are regs 8 – 15
  $t8 – $t9 are regs 24 – 25
  $s0 – $s7 are regs 16 – 23

MIPS R-format Instructions

op      rs      rt      rd      shamt   funct
6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

Instruction fields
  op: operation code (opcode)
  rs: first source register number
  rt: second source register number
  rd: destination register number
  shamt: shift amount (00000 for now)
  funct: function code (extends opcode)

R-format Example

op      rs      rt      rd      shamt   funct
6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

add $t0, $s1, $s2

special  $s1     $s2     $t0     0       add
0        17      18      8       0       32
000000   10001   10010   01000   00000   100000

0000 0010 0011 0010 0100 0000 0010 0000_2 = 02324020_16
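Packing the six R-format fields into one word is a matter of shifts and ORs. The sketch below uses the field widths from the slide; the function name `encode_r_format` is hypothetical.

```c
#include <stdint.h>

/* Pack R-format fields: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6).
   Each field is masked to its width before being shifted into place. */
uint32_t encode_r_format(uint32_t op, uint32_t rs, uint32_t rt,
                         uint32_t rd, uint32_t shamt, uint32_t funct) {
    return ((op    & 0x3Fu) << 26) |
           ((rs    & 0x1Fu) << 21) |
           ((rt    & 0x1Fu) << 16) |
           ((rd    & 0x1Fu) << 11) |
           ((shamt & 0x1Fu) << 6)  |
            (funct & 0x3Fu);
}
```

Calling it with the field values from the example (0, 17, 18, 8, 0, 32) gives the same word as the slide, 0x02324020.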

Hexadecimal

Base 16
  Compact representation of bit strings
  4 bits per hex digit

0  0000    4  0100    8  1000    c  1100
1  0001    5  0101    9  1001    d  1101
2  0010    6  0110    a  1010    e  1110
3  0011    7  0111    b  1011    f  1111

Example: eca8 6420
  1110 1100 1010 1000 0110 0100 0010 0000

MIPS I-format Instructions

op      rs      rt      constant or address
6 bits  5 bits  5 bits  16 bits

Immediate arithmetic and load/store instructions
  rt: destination or source register number
  Constant: $-2^{15}$ to $+2^{15} - 1$
  Address: offset added to base address in rs
Design Principle 4: Good design demands good compromises
  Different formats complicate decoding, but allow 32-bit instructions uniformly
  Keep formats as similar as possible
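An I-format word can be packed the same way as an R-format word, with a 16-bit constant in the low half. This is a sketch only; the function name is hypothetical. The opcodes 35 (lw) and 8 (addi) and the register numbers are the ones that appear elsewhere in these slides.

```c
#include <stdint.h>

/* Pack I-format fields: op(6) rs(5) rt(5) constant-or-address(16).
   The signed 16-bit constant is stored as its 2s-complement bit pattern. */
uint32_t encode_i_format(uint32_t op, uint32_t rs, uint32_t rt, int16_t imm) {
    return ((op & 0x3Fu) << 26) |
           ((rs & 0x1Fu) << 21) |
           ((rt & 0x1Fu) << 16) |
           (uint32_t)(uint16_t)imm;
}
```

For instance, lw $t0, 32($s3) has op = 35, rs = 19, rt = 8 and constant 32, and addi $s2, $s1, -1 has op = 8, rs = 17, rt = 18 and constant -1.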

Stored Program Computers

The BIG Picture

Instructions represented in binary, just like data
Instructions and data stored in memory
Programs can operate on programs
  e.g., compilers, linkers, …
Binary compatibility allows compiled programs to work on different computers
  Standardized ISAs

2.6 Logical Operations

Logical Operations

Instructions for bitwise manipulation

Operation     C     Java    MIPS
Shift left    <<    <<      sll
Shift right   >>    >>>     srl
Bitwise AND   &     &       and, andi
Bitwise OR    |     |       or, ori
Bitwise NOT   ~     ~       nor

Useful for extracting and inserting groups of bits in a word

Shift Operations

op      rs      rt      rd      shamt   funct
6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

shamt: how many positions to shift
Shift left logical
  Shift left and fill with 0 bits
  sll by i bits multiplies by $2^i$
Shift right logical
  Shift right and fill with 0 bits
  srl by i bits divides by $2^i$ (unsigned only)
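The link between logical shifts and multiplying or dividing by powers of two can be checked in C, where shifts on unsigned values fill with 0 bits. The function names are hypothetical and the shift amount is assumed to be in the range 0 to 31, matching the 5-bit shamt field.

```c
#include <stdint.h>

/* Shift left logical: vacated low bits are filled with 0s. */
uint32_t shift_left_logical(uint32_t x, unsigned shamt) {
    return x << (shamt & 31u);
}

/* Shift right logical: vacated high bits are filled with 0s. */
uint32_t shift_right_logical(uint32_t x, unsigned shamt) {
    return x >> (shamt & 31u);
}
```

Shifting the pattern for -4 right by one position gives a large positive number rather than -2, which illustrates the "unsigned only" caveat for srl.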


AND Operations

Useful to mask bits in a word
  Select some bits, clear others to 0
    and $t0, $t1, $t2

$t2  0000 0000 0000 0000 0000 1101 1100 0000
$t1  0000 0000 0000 0000 0011 1100 0000 0000
$t0  0000 0000 0000 0000 0000 1100 0000 0000

OR Operations

Useful to include bits in a word
  Set some bits to 1, leave others unchanged
    or $t0, $t1, $t2

$t2  0000 0000 0000 0000 0000 1101 1100 0000
$t1  0000 0000 0000 0000 0011 1100 0000 0000
$t0  0000 0000 0000 0000 0011 1101 1100 0000

NOT Operations

Useful to invert bits in a word
  Change 0 to 1, and 1 to 0
MIPS has NOR 3-operand instruction
  a NOR b == NOT ( a OR b )
    nor $t0, $t1, $zero     # Register 0: always read as zero

$t1  0000 0000 0000 0000 0011 1100 0000 0000
$t0  1111 1111 1111 1111 1100 0011 1111 1111
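The AND, OR and NOR slides use the register values 0x00000DC0 and 0x00003C00 (the bit patterns shown, written in hex). A brief C sketch with hypothetical function names reproduces the three results.

```c
#include <stdint.h>

/* Bitwise AND: keeps only the bits set in both operands (masking). */
uint32_t mips_and(uint32_t a, uint32_t b) { return a & b; }

/* Bitwise OR: sets every bit that is set in either operand. */
uint32_t mips_or(uint32_t a, uint32_t b) { return a | b; }

/* NOR: NOT (a OR b). With b = 0 this is a plain bitwise NOT of a. */
uint32_t mips_nor(uint32_t a, uint32_t b) { return ~(a | b); }
```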

2.7 Instructions for Making Decisions

Conditional Operations

Branch to a labeled instruction if a condition is true
  Otherwise, continue sequentially
beq rs, rt, L1
  if (rs == rt) branch to instruction labeled L1;
bne rs, rt, L1
  if (rs != rt) branch to instruction labeled L1;
j L1
  unconditional jump to instruction labeled L1

Compiling If Statements

C code:
    if (i == j) f = g + h;
    else f = g - h;
  f, g, … in $s0, $s1, …

Compiled MIPS code:
          bne $s3, $s4, Else
          add $s0, $s1, $s2
          j   Exit
    Else: sub $s0, $s1, $s2
    Exit:
  Assembler calculates addresses

Compiling Loop Statements

C code:
    while (save[i] == k) i += 1;
  i in $s3, k in $s5, address of save in $s6

Compiled MIPS code:
    Loop: sll  $t1, $s3, 2
          add  $t1, $t1, $s6
          lw   $t0, 0($t1)
          bne  $t0, $s5, Exit
          addi $s3, $s3, 1
          j    Loop
    Exit:

Basic Blocks

A basic block is a sequence of instructions with
  No embedded branches (except at end)
  No branch targets (except at beginning)
A compiler identifies basic blocks for optimization
An advanced processor can accelerate execution of basic blocks

More Conditional Operations

Set result to 1 if a condition is true
  Otherwise, set to 0
slt rd, rs, rt
  if (rs < rt) rd = 1; else rd = 0;
slti rt, rs, constant
  if (rs < constant) rt = 1; else rt = 0;
Use in combination with beq, bne
    slt $t0, $s1, $s2     # if ($s1 < $s2)
    bne $t0, $zero, L     #   branch to L

Branch Instruction Design

Why not blt, bge, etc?
Hardware for <, ≥, … slower than =, ≠
  Combining with branch involves more work per instruction, requiring a slower clock
  All instructions penalized!
beq and bne are the common case
This is a good design compromise

Signed vs. Unsigned

Signed comparison: slt, slti
Unsigned comparison: sltu, sltiu
Example
  $s0 = 1111 1111 1111 1111 1111 1111 1111 1111
  $s1 = 0000 0000 0000 0000 0000 0000 0000 0001
    slt  $t0, $s0, $s1    # signed
      -1 < +1, so $t0 = 1
    sltu $t0, $s0, $s1    # unsigned
      +4,294,967,295 > +1, so $t0 = 0
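The difference between the two comparisons can be sketched in C on raw 32-bit patterns. The function names mirror the instruction mnemonics but are otherwise hypothetical. The signed version flips the sign bit of both operands, which maps 2s-complement order onto unsigned order; this is one portable way to do it, not necessarily how hardware implements slt.

```c
#include <stdint.h>

/* Unsigned set-on-less-than: compare the patterns as unsigned integers. */
uint32_t sltu(uint32_t rs, uint32_t rt) {
    return (rs < rt) ? 1u : 0u;
}

/* Signed set-on-less-than: flipping bit 31 of both operands turns the
   2s-complement ordering into the unsigned ordering. */
uint32_t slt(uint32_t rs, uint32_t rt) {
    return ((rs ^ 0x80000000u) < (rt ^ 0x80000000u)) ? 1u : 0u;
}
```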

2.8 Supporting Procedures in Computer Hardware

Procedure Calling

Steps required
  1. Place parameters in registers
  2. Transfer control to procedure
  3. Acquire storage for procedure
  4. Perform procedure's operations
  5. Place result in register for caller
  6. Return to place of call

Register Usage

$a0 – $a3: arguments (regs 4 – 7)
$v0, $v1: result values (regs 2 and 3)
$t0 – $t9: temporaries
  Can be overwritten by callee
$s0 – $s7: saved
  Must be saved/restored by callee
$gp: global pointer for static data (reg 28)
$sp: stack pointer (reg 29)
$fp: frame pointer (reg 30)
$ra: return address (reg 31)

Procedure Call Instructions

Procedure call: jump and link
    jal ProcedureLabel
  Address of following instruction put in $ra
  Jumps to target address
Procedure return: jump register
    jr $ra
  Copies $ra to program counter
  Can also be used for computed jumps
    e.g., for case/switch statements

Leaf Procedure Example

C code:
    int leaf_example (int g, int h, int i, int j)
    {
      int f;
      f = (g + h) - (i + j);
      return f;
    }
  Arguments g, …, j in $a0, …, $a3
  f in $s0 (hence, need to save $s0 on stack)
  Result in $v0

Leaf Procedure Example

MIPS code:
    leaf_example:
      addi $sp, $sp, -4       # Save $s0 on stack
      sw   $s0, 0($sp)
      add  $t0, $a0, $a1      # Procedure body
      add  $t1, $a2, $a3
      sub  $s0, $t0, $t1
      add  $v0, $s0, $zero    # Result
      lw   $s0, 0($sp)        # Restore $s0
      addi $sp, $sp, 4
      jr   $ra                # Return

Non-Leaf Procedures

Procedures that call other procedures
For nested call, caller needs to save on the stack:
  Its return address
  Any arguments and temporaries needed after the call
Restore from the stack after the call

Non-Leaf Procedure Example

C code:
    int fact (int n)
    {
      if (n < 1) return 1;
      else return n * fact(n - 1);
    }
  Argument n in $a0
  Result in $v0

Non-Leaf Procedure Example

MIPS code:
    fact:
        addi $sp, $sp, -8     # adjust stack for 2 items
        sw   $ra, 4($sp)      # save return address
        sw   $a0, 0($sp)      # save argument
        slti $t0, $a0, 1      # test for n < 1
        beq  $t0, $zero, L1
        addi $v0, $zero, 1    # if so, result is 1
        addi $sp, $sp, 8      #   pop 2 items from stack
        jr   $ra              #   and return
    L1: addi $a0, $a0, -1     # else decrement n
        jal  fact             # recursive call
        lw   $a0, 0($sp)      # restore original n
        lw   $ra, 4($sp)      #   and return address
        addi $sp, $sp, 8      # pop 2 items from stack
        mul  $v0, $a0, $v0    # multiply to get result
        jr   $ra              # and return

Local Data on the Stack

Local data allocated by callee
  e.g., C automatic variables
Procedure frame (activation record)
  Used by some compilers to manage stack storage

Memory Layout

Text: program code
Static data: global variables
  e.g., static variables in C, constant arrays and strings
  $gp initialized to address allowing ±offsets into this segment
Dynamic data: heap
  E.g., malloc in C, new in Java
Stack: automatic storage

2.9 Communicating with People

Character Data

Byte-encoded character sets
  ASCII: 128 characters
    95 graphic, 33 control
  Latin-1: 256 characters
    ASCII, +96 more graphic characters
Unicode: 32-bit character set
  Used in Java, C++ wide characters, …
  Most of the world's alphabets, plus symbols
  UTF-8, UTF-16: variable-length encodings

Byte/Halfword Operations

Could use bitwise operations
MIPS byte/halfword load/store
  String processing is a common case

    lb rt, offset(rs)      lh rt, offset(rs)
      Sign extend to 32 bits in rt
    lbu rt, offset(rs)     lhu rt, offset(rs)
      Zero extend to 32 bits in rt
    sb rt, offset(rs)      sh rt, offset(rs)
      Store just rightmost byte/halfword

String Copy Example

C code (naïve):
  Null-terminated string
    void strcpy (char x[], char y[])
    {
      int i;
      i = 0;
      while ((x[i] = y[i]) != '\0')
        i += 1;
    }
  Addresses of x, y in $a0, $a1
  i in $s0

String Copy Example

MIPS code:
    strcpy:
        addi $sp, $sp, -4       # adjust stack for 1 item
        sw   $s0, 0($sp)        # save $s0
        add  $s0, $zero, $zero  # i = 0
    L1: add  $t1, $s0, $a1      # addr of y[i] in $t1
        lbu  $t2, 0($t1)        # $t2 = y[i]
        add  $t3, $s0, $a0      # addr of x[i] in $t3
        sb   $t2, 0($t3)        # x[i] = y[i]
        beq  $t2, $zero, L2     # exit loop if y[i] == 0
        addi $s0, $s0, 1        # i = i + 1
        j    L1                 # next iteration of loop
    L2: lw   $s0, 0($sp)        # restore saved $s0
        addi $sp, $sp, 4        # pop 1 item from stack
        jr   $ra                # and return

2.10 MIPS Addressing for 32-Bit Immediates and Addresses

32-bit Constants

Most constants are small
  16-bit immediate is sufficient
For the occasional 32-bit constant
    lui rt, constant
  Copies 16-bit constant to left 16 bits of rt
  Clears right 16 bits of rt to 0

    lui $s0, 61           0000 0000 0011 1101 0000 0000 0000 0000
    ori $s0, $s0, 2304    0000 0000 0011 1101 0000 1001 0000 0000
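The two-instruction sequence for building a 32-bit constant can be modelled in C. The function names follow the mnemonics but are hypothetical helpers; the constants 61 and 2304 are the ones from the slide.

```c
#include <stdint.h>

/* lui: place a 16-bit constant in the upper half, clear the lower half. */
uint32_t lui(uint32_t constant16) {
    return (constant16 & 0xFFFFu) << 16;
}

/* ori: OR a zero-extended 16-bit constant into the lower half. */
uint32_t ori(uint32_t rs, uint32_t constant16) {
    return rs | (constant16 & 0xFFFFu);
}
```

Combining them as ori(lui(61), 2304) yields 0x003D0900, which is 4,000,000 in decimal (61 × 65,536 + 2,304).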

Branch Addressing

Branch instructions specify
  Opcode, two registers, target address
Most branch targets are near branch
  Forward or backward

op      rs      rt      constant or address
6 bits  5 bits  5 bits  16 bits

PC-relative addressing
  Target address = PC + offset × 4
  PC already incremented by 4 by this time

Jump Addressing

Jump (j and jal) targets could be anywhere in text segment
  Encode full address in instruction

op      address
6 bits  26 bits

(Pseudo)Direct jump addressing
  Target address = $PC_{31 \ldots 28}$ : (address × 4)

Target Addressing Example

Loop code from earlier example
  Assume Loop at location 80000

Instruction                    Address   Encoded fields
Loop: sll  $t1, $s3, 2         80000     0    0    19   9    2    0
      add  $t1, $t1, $s6       80004     0    9    22   9    0    32
      lw   $t0, 0($t1)         80008     35   9    8    0
      bne  $t0, $s5, Exit      80012     5    8    21   2
      addi $s3, $s3, 1         80016     8    19   19   1
      j    Loop                80020     2    20000
Exit: …                        80024
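The branch and jump fields in the example can be checked against the two addressing rules with a C sketch. The function names are hypothetical, and each takes the address of the branch or jump instruction itself, adding 4 to model the already-incremented PC.

```c
#include <stdint.h>

/* PC-relative: target = (address of branch + 4) + offset * 4.
   The 16-bit offset is signed, so branches can go backward. */
uint32_t branch_target(uint32_t branch_address, int16_t offset) {
    return branch_address + 4u + (uint32_t)((int32_t)offset * 4);
}

/* Pseudo-direct: upper 4 bits of the incremented PC, concatenated
   with the 26-bit address field shifted left by 2 (times 4). */
uint32_t jump_target(uint32_t jump_address, uint32_t address26) {
    uint32_t pc = jump_address + 4u;
    return (pc & 0xF0000000u) | ((address26 & 0x03FFFFFFu) << 2);
}
```

With the values above, the bne at 80012 with offset 2 lands on Exit at 80024, and the j at 80020 with address field 20000 lands on Loop at 80000.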

Branching Far Away

If branch target is too far to encode with 16-bit offset, assembler rewrites the code
Example
        beq $s0, $s1, L1
          ↓
        bne $s0, $s1, L2
        j   L1
    L2: …

Addressing Mode Summary

2.11 Parallelism and Instructions: Synchronization

Synchronization

Two processors sharing an area of memory
  P1 writes, then P2 reads
  Data race if P1 and P2 don't synchronize
    Result depends on order of accesses
Hardware support required
  Atomic read/write memory operation
  No other access to the location allowed between the read and write
Could be a single instruction
  E.g., atomic swap of register ↔ memory
  Or an atomic pair of instructions

Synchronization in MIPS

Load linked: ll rt, offset(rs)
Store conditional: sc rt, offset(rs)
  Succeeds if location not changed since the ll
    Returns 1 in rt
  Fails if location is changed
    Returns 0 in rt
Example: atomic swap (to test/set lock variable)
    try: add $t0, $zero, $s4   ;copy exchange value
         ll  $t1, 0($s1)       ;load linked
         sc  $t0, 0($s1)       ;store conditional
         beq $t0, $zero, try   ;branch store fails
         add $s4, $zero, $t1   ;put load value in $s4
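At the C level, the same atomic swap can be expressed with the standard C11 `<stdatomic.h>` interface. This is a sketch of the idea, not a translation of the ll/sc loop: `atomic_exchange` is the standard library operation, how a compiler lowers it on any given machine is not claimed here, and the helper names `atomic_swap`, `acquire` and `release` are hypothetical.

```c
#include <stdatomic.h>

/* Atomically store new_value into *lock and return the previous value. */
int atomic_swap(atomic_int *lock, int new_value) {
    return atomic_exchange(lock, new_value);
}

/* Test-and-set style acquire: keep swapping in 1 until the value
   swapped out was 0, meaning the lock was free and is now ours. */
void acquire(atomic_int *lock) {
    while (atomic_swap(lock, 1) != 0) {
        /* lock was already held; retry */
    }
}

/* Release the lock by storing 0. */
void release(atomic_int *lock) {
    atomic_store(lock, 0);
}
```

The value returned by the swap plays the role of the loaded value that ends up in $s4 in the MIPS example.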

2.12 Translating and Starting a Program

Translation and Startup

Many compilers produce object modules directly
Static linking

Assembler Pseudoinstructions

Most assembler instructions represent machine instructions one-to-one
Pseudoinstructions: figments of the assembler's imagination
    move $t0, $t1      →  add $t0, $zero, $t1
    blt  $t0, $t1, L   →  slt $at, $t0, $t1
                          bne $at, $zero, L
  $at (register 1): assembler temporary

Producing an Object Module

Assembler (or compiler) translates program into machine instructions
Provides information for building a complete program from the pieces
  Header: describes contents of object module
  Text segment: translated instructions
  Static data segment: data allocated for the life of the program
  Relocation info: for contents that depend on absolute location of loaded program
  Symbol table: global definitions and external refs
  Debug info: for associating with source code

Linking Object Modules

Produces an executable image
  1. Merges segments
  2. Resolve labels (determine their addresses)
  3. Patch location-dependent and external refs
Could leave location dependencies for fixing by a relocating loader
  But with virtual memory, no need to do this
  Program can be loaded into absolute location in virtual memory space

Loading a Program

Load from image file on disk into memory
  1. Read header to determine segment sizes
  2. Create virtual address space
  3. Copy text and initialized data into memory
       Or set page table entries so they can be faulted in
  4. Set up arguments on stack
  5. Initialize registers (including $sp, $fp, $gp)
  6. Jump to startup routine
       Copies arguments to $a0, … and calls main
       When main returns, do exit syscall

Dynamic Linking

Only link/load library procedure when it is called
  Requires procedure code to be relocatable
  Avoids image bloat caused by static linking of all (transitively) referenced libraries
  Automatically picks up new library versions

Lazy Linkage

Indirection table
Stub: Loads routine ID, Jump to linker/loader
Linker/loader code
Dynamically mapped code

Starting Java Applications

Simple portable instruction set for the JVM
Interprets bytecodes
Compiles bytecodes of "hot" methods into native code for host machine

2.13 A C Sort Example to Put It All Together

C Sort Example

Illustrates use of assembly instructions for a C bubble sort function
Swap procedure (leaf)
    void swap(int v[], int k)
    {
      int temp;
      temp = v[k];
      v[k] = v[k+1];
      v[k+1] = temp;
    }
  v in $a0, k in $a1, temp in $t0

The Procedure Swap

    swap: sll $t1, $a1, 2     # $t1 = k * 4
          add $t1, $a0, $t1   # $t1 = v+(k*4)
                              #   (address of v[k])
          lw  $t0, 0($t1)     # $t0 (temp) = v[k]
          lw  $t2, 4($t1)     # $t2 = v[k+1]
          sw  $t2, 0($t1)     # v[k] = $t2 (v[k+1])
          sw  $t0, 4($t1)     # v[k+1] = $t0 (temp)
          jr  $ra             # return to calling routine

The Sort Procedure in C

Non-leaf (calls swap)
    void sort (int v[], int n)
    {
      int i, j;
      for (i = 0; i < n; i += 1) {
        for (j = i - 1;
             j >= 0 && v[j] > v[j + 1];
             j -= 1) {
          swap(v, j);
        }
      }
    }
  v in $a0, n in $a1, i in $s0, j in $s1

The Procedure Body

             # Move params
             move $s2, $a0           # save $a0 into $s2
             move $s3, $a1           # save $a1 into $s3
             # Outer loop
             move $s0, $zero         # i = 0
    for1tst: slt  $t0, $s0, $s3      # $t0 = 0 if $s0 ≥ $s3 (i ≥ n)
             beq  $t0, $zero, exit1  # go to exit1 if $s0 ≥ $s3 (i ≥ n)
             # Inner loop
             addi $s1, $s0, -1       # j = i - 1
    for2tst: slti $t0, $s1, 0        # $t0 = 1 if $s1 < 0 (j < 0)
             bne  $t0, $zero, exit2  # go to exit2 if $s1 < 0 (j < 0)
             sll  $t1, $s1, 2        # $t1 = j * 4
             add  $t2, $s2, $t1      # $t2 = v + (j * 4)
             lw   $t3, 0($t2)        # $t3 = v[j]
             lw   $t4, 4($t2)        # $t4 = v[j + 1]
             slt  $t0, $t4, $t3      # $t0 = 0 if $t4 ≥ $t3
             beq  $t0, $zero, exit2  # go to exit2 if $t4 ≥ $t3
             # Pass params & call
             move $a0, $s2           # 1st param of swap is v (old $a0)
             move $a1, $s1           # 2nd param of swap is j
             jal  swap               # call swap procedure
             # Inner loop
             addi $s1, $s1, -1       # j -= 1
             j    for2tst            # jump to test of inner loop
             # Outer loop
    exit2:   addi $s0, $s0, 1        # i += 1
             j    for1tst            # jump to test of outer loop

The Full Procedure

    sort:    addi $sp, $sp, -20     # make room on stack for 5 registers
             sw   $ra, 16($sp)      # save $ra on stack
             sw   $s3, 12($sp)      # save $s3 on stack
             sw   $s2, 8($sp)       # save $s2 on stack
             sw   $s1, 4($sp)       # save $s1 on stack
             sw   $s0, 0($sp)       # save $s0 on stack
             …                      # procedure body
    exit1:   lw   $s0, 0($sp)       # restore $s0 from stack
             lw   $s1, 4($sp)       # restore $s1 from stack
             lw   $s2, 8($sp)       # restore $s2 from stack
             lw   $s3, 12($sp)      # restore $s3 from stack
             lw   $ra, 16($sp)      # restore $ra from stack
             addi $sp, $sp, 20      # restore stack pointer
             jr   $ra               # return to calling routine

Effect of Compiler Optimization


Compiled with gcc for Pentium 4 under Linux

[Figure: four bar charts over optimization levels none, O1, O2, O3. Panels: Relative Performance, Instruction count, Clock Cycles, CPI]

Effect of Language and Algorithm


[Figure: three bar charts over C/none, C/O1, C/O2, C/O3, Java/int, Java/JIT. Panels: Bubblesort Relative Performance, Quicksort Relative Performance, Quicksort vs. Bubblesort Speedup]

Lessons Learnt

Instruction count and CPI are not good performance indicators in isolation
Compiler optimizations are sensitive to the algorithm
Java/JIT compiled code is significantly faster than JVM interpreted
  Comparable to optimized C in some cases
Nothing can fix a dumb algorithm!

2.14 Arrays versus Pointers

Arrays vs. Pointers

Array indexing involves
  Multiplying index by element size
  Adding to array base address
Pointers correspond directly to memory addresses
  Can avoid indexing complexity

Example: Clearing an Array

    void clear1(int array[], int size) {
      int i;
      for (i = 0; i < size; i += 1)
        array[i] = 0;
    }

           move $t0, $zero         # i = 0
    loop1: sll  $t1, $t0, 2        # $t1 = i * 4
           add  $t2, $a0, $t1      # $t2 = &array[i]
           sw   $zero, 0($t2)      # array[i] = 0
           addi $t0, $t0, 1        # i = i + 1
           slt  $t3, $t0, $a1      # $t3 = (i < size)
           bne  $t3, $zero, loop1  # if (…) goto loop1

    void clear2(int *array, int size) {
      int *p;
      for (p = &array[0]; p < &array[size]; p = p + 1)
        *p = 0;
    }

           move $t0, $a0           # p = &array[0]
           sll  $t1, $a1, 2        # $t1 = size * 4
           add  $t2, $a0, $t1      # $t2 = &array[size]
    loop2: sw   $zero, 0($t0)      # Memory[p] = 0
           addi $t0, $t0, 4        # p = p + 4
           slt  $t3, $t0, $t2      # $t3 = (p < &array[size])
           bne  $t3, $zero, loop2  # if (…) goto loop2

Comparison of Array vs. Ptr

Multiply "strength reduced" to shift
Array version requires shift to be inside loop
  Part of index calculation for incremented i
  c.f. incrementing pointer
Compiler can achieve same effect as manual use of pointers
  Induction variable elimination
  Better to make program clearer and safer

2.16 Real Stuff: ARM Instructions

ARM & MIPS Similarities

ARM: the most popular embedded core
Similar basic set of instructions to MIPS

                         ARM              MIPS
Date announced           1985             1985
Instruction size         32 bits          32 bits
Address space            32-bit flat      32-bit flat
Data alignment           Aligned          Aligned
Data addressing modes    9                3
Registers                15 × 32-bit      31 × 32-bit
Input/output             Memory mapped    Memory mapped

Compare and Branch in ARM

Uses condition codes for result of an arithmetic/logical instruction
  Negative, zero, carry, overflow
  Compare instructions to set condition codes without keeping the result
Each instruction can be conditional
  Top 4 bits of instruction word: condition value
  Can avoid branches over single instructions

Instruction Encoding

2.17 Real Stuff: x86 Instructions

The Intel x86 ISA

Evolution with backward compatibility
  8080 (1974): 8-bit microprocessor
    Accumulator, plus 3 index-register pairs
  8086 (1978): 16-bit extension to 8080
    Complex instruction set (CISC)
  8087 (1980): floating-point coprocessor
    Adds FP instructions and register stack
  80286 (1982): 24-bit addresses, MMU
    Segmented memory mapping and protection
  80386 (1985): 32-bit extension (now IA-32)
    Additional addressing modes and operations
    Paged memory mapping as well as segments

The Intel x86 ISA

Further evolution…
  i486 (1989): pipelined, on-chip caches and FPU
    Compatible competitors: AMD, Cyrix, …
  Pentium (1993): superscalar, 64-bit datapath
    Later versions added MMX (Multi-Media eXtension) instructions
    The infamous FDIV bug
  Pentium Pro (1995), Pentium II (1997)
    New microarchitecture (see Colwell, The Pentium Chronicles)
  Pentium III (1999)
    Added SSE (Streaming SIMD Extensions) and associated registers
  Pentium 4 (2001)
    New microarchitecture
    Added SSE2 instructions

The Intel x86 ISA

And further

AMD64 (2003): extended architecture to 64 bits

EM64T: Extended Memory 64 Technology (2004)
  - AMD64 adopted by Intel (with refinements)
  - Added SSE3 instructions

Intel Core (2006)
  - Added SSE4 instructions, virtual machine support

AMD64 (announced 2007): SSE5 instructions
  - Intel declined to follow, instead...

Advanced Vector Extension (announced 2008)
  - Longer SSE registers, more instructions

If Intel didn't extend with compatibility, its competitors would!
  - Technical elegance ≠ market success



Basic x86 Registers


Basic x86 Addressing Modes

Two operands per instruction


Source/dest operand    Second source operand
Register               Register
Register               Immediate
Register               Memory
Memory                 Register
Memory                 Immediate

Memory addressing modes


  - Address in register
  - Address = Rbase + displacement
  - Address = Rbase + 2^scale × Rindex (scale = 0, 1, 2, or 3)
  - Address = Rbase + 2^scale × Rindex + displacement
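
The most general of these modes can be written out as a small C function. This is only a sketch: effective_address and its parameter names are made up for illustration, and the arithmetic wraps modulo 2^32 as an assumption about 32-bit address calculation. The simpler modes fall out by passing zero for the unused parts.

```c
#include <stdint.h>

/* Address = Rbase + 2^scale * Rindex + displacement.
   scale is a 2-bit field (0..3), so the index is multiplied by 1, 2, 4, or 8. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           unsigned scale, int32_t displacement) {
    return base + (index << (scale & 3u)) + (uint32_t)displacement;
}
```

For example, with base 0x1000, index 3, scale 2, and displacement 8, the index is multiplied by 4, giving 0x1000 + 12 + 8 = 0x1014.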

x86 Instruction Encoding

Variable length encoding

  - Postfix bytes specify addressing mode
  - Prefix bytes modify operation
      - Operand length, repetition, locking, ...
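
To make the role of prefix bytes concrete, here is a hedged C sketch that scans the start of an instruction for three well-known prefixes: 0x66 (operand-size override), 0xF2/0xF3 (repeat), and 0xF0 (lock). The Prefixes struct and scan_prefixes function are hypothetical names, and other prefixes (segment overrides, address-size override) are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool operand_size; /* saw 0x66: operand length changed */
    bool repeat;       /* saw 0xF2 or 0xF3: repetition */
    bool lock;         /* saw 0xF0: locking */
    size_t length;     /* number of prefix bytes consumed */
} Prefixes;

/* Consume leading prefix bytes; stop at the first byte that is not
   one of the prefixes this sketch recognizes. */
Prefixes scan_prefixes(const uint8_t *code, size_t n) {
    Prefixes p = {false, false, false, 0};
    while (p.length < n) {
        uint8_t b = code[p.length];
        if (b == 0x66)
            p.operand_size = true;
        else if (b == 0xF2 || b == 0xF3)
            p.repeat = true;
        else if (b == 0xF0)
            p.lock = true;
        else
            break;
        p.length += 1;
    }
    return p;
}
```

Because the number of prefix bytes varies, the position of the opcode itself is only known after this scan, which is one reason variable-length encodings are harder to decode than fixed 32-bit instructions.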


Implementing IA-32

Complex instruction set makes implementation difficult

  - Hardware translates instructions to simpler microoperations
      - Simple instructions: 1-to-1
      - Complex instructions: 1-to-many
  - Microengine similar to RISC
  - Market share makes this economically viable

Comparable performance to RISC
  - Compilers avoid complex instructions

2.18 Fallacies and Pitfalls

Fallacies

Powerful instruction ⇒ higher performance
  - Fewer instructions required
  - But complex instructions are hard to implement
      - May slow down all instructions, including simple ones
  - Compilers are good at making fast code from simple instructions

Use assembly code for high performance
  - But modern compilers are better at dealing with modern processors
  - More lines of code ⇒ more errors and less productivity


Fallacies

Backward compatibility ⇒ instruction set doesn't change

But they do accrete more instructions

x86 instruction set


Pitfalls

Sequential words are not at sequential addresses

Increment by 4, not by 1!

Keeping a pointer to an automatic variable after procedure returns
  - e.g., passing pointer back via an argument
  - Pointer becomes invalid when stack popped
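
Both pitfalls can be sketched in C. The function names below are illustrative only. The first function measures the byte distance between two consecutive 32-bit words; the second shows one safe alternative to returning a pointer to an automatic variable, namely having the caller own the storage. The unsafe pattern appears only inside a comment, since running it would be undefined behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Pitfall 1: consecutive 32-bit words sit 4 byte addresses apart. */
ptrdiff_t word_stride_in_bytes(void) {
    int32_t words[2] = {0, 0};
    return (const char *)&words[1] - (const char *)&words[0];
}

/* Pitfall 2, safe pattern: the caller provides the buffer, so the
   pointer stays valid after this procedure returns. */
void fill_squares(int32_t *out, size_t n) {
    for (size_t i = 0; i < n; i += 1)
        out[i] = (int32_t)(i * i);
}

/* The unsafe pattern, shown only as a comment:
 *
 *   void broken(int32_t **result) {
 *       int32_t local[4];   // automatic variable: lives in this stack frame
 *       *result = local;    // pointer handed back via an argument
 *   }                       // frame popped here; *result now dangles
 */
```

In assembly terms, stepping to the next word means adding 4 to the address register, and any address that points into a popped stack frame may be overwritten by the next procedure call.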


2.19 Concluding Remarks

Concluding Remarks

Design principles
  1. Simplicity favors regularity
  2. Smaller is faster
  3. Make the common case fast
  4. Good design demands good compromises

Layers of software/hardware
  - Compiler, assembler, hardware

MIPS: typical of RISC ISAs
  - c.f. x86


Concluding Remarks

Measure MIPS instruction executions in benchmark programs
  - Consider making the common case fast
  - Consider compromises

Instruction class    MIPS examples                        SPEC2006 Int    SPEC2006 FP
Arithmetic           add, sub, addi                       16%             48%
Data transfer        lw, sw, lb, lbu, lh, lhu, sb, lui    35%             36%
Logical              and, or, nor, andi, ori, sll, srl    12%             4%
Cond. Branch         beq, bne, slt, slti, sltiu           34%             8%
Jump                 j, jr, jal                           2%              0%
