Instructions: Language of the Computer
Instruction Set
The repertoire of instructions of a computer
Different computers have different instruction sets
Simplified implementation
The MIPS instruction set:
  Used as the example throughout the book
  Stanford MIPS commercialized by MIPS Technologies (www.mips.com)
  Large share of embedded core market
  Applications in consumer electronics, network/storage equipment, cameras, printers, ...
  See MIPS Reference Data tear-out card, and Appendixes B and E
Arithmetic Operations
add a, b, c    # a gets b + c
All arithmetic operations have this form
Design Principle 1: Simplicity favours regularity
  Regularity makes implementation simpler
  Simplicity enables higher performance at lower cost
Chapter 2 Instructions: Language of the Computer 4
Arithmetic Example
C code:
f = (g + h) - (i + j);
Register Operands
Arithmetic instructions use register operands
MIPS has a 32 × 32-bit register file
  Use for frequently accessed data
  Numbered 0 to 31
  32-bit data called a "word"
Assembler names
  $t0, $t1, ..., $t9 for temporary values
  $s0, $s1, ..., $s7 for saved variables
c.f. main memory: millions of locations
C code:
f = (g + h) - (i + j);
f, ..., j in $s0, ..., $s4
Memory Operands
Main memory holds composite data: arrays, structures, dynamic data
To apply arithmetic operations:
  Load values from memory into registers
  Store result from register to memory
Memory is byte addressed: each address identifies an 8-bit byte
Words are aligned in memory: address must be a multiple of 4
MIPS is Big Endian: most-significant byte at least address of a word
  c.f. Little Endian: least-significant byte at least address
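As a rough illustration of byte addressing, word alignment, and byte order, the sketch below assembles a 32-bit word from four consecutive bytes in both orders. The function names `load_word_be` and `load_word_le` and the `mem` byte array are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read the 32-bit word at byte address addr,
   Big Endian order (most-significant byte at the least address). */
uint32_t load_word_be(const uint8_t *mem, size_t addr)
{
    assert(addr % 4 == 0);  /* words are aligned: address is a multiple of 4 */
    return ((uint32_t)mem[addr]     << 24) |
           ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] <<  8) |
            (uint32_t)mem[addr + 3];
}

/* Same bytes read in Little Endian order
   (least-significant byte at the least address). */
uint32_t load_word_le(const uint8_t *mem, size_t addr)
{
    assert(addr % 4 == 0);
    return  (uint32_t)mem[addr]            |
           ((uint32_t)mem[addr + 1] <<  8) |
           ((uint32_t)mem[addr + 2] << 16) |
           ((uint32_t)mem[addr + 3] << 24);
}
```

The same four bytes give different word values depending on which end is treated as most significant, which is why byte order matters when data moves between machines.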
C code:
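g = h + A[8];
g in $s1, h in $s2, base address of A in $s3
The compiled code loads A[8] with a load word (lw) that uses $s3 as the base register; index 8 requires an offset of 32, since each word is 4 bytes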
C code:
A[12] = h + A[8];
h in $s2, base address of A in $s3
Registers are faster to access than memory
Operating on memory data requires loads and stores
Only spill to memory for less frequently used variables
Register optimization is important!
Immediate Operands
MIPS register 0 ($zero) is the constant 0
  Cannot be overwritten
  E.g., move between registers: add $t2, $s1, $zero
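Unsigned binary value of an n-bit number:
\[ x = x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0 \]
Example:
\[ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 1011_2 = 0 + \cdots + 1\times2^3 + 0\times2^2 + 1\times2^1 + 1\times2^0 = 0 + \cdots + 8 + 0 + 2 + 1 = 11_{10} \]
Using 32 bits
  0 to +4,294,967,295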
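2s-complement signed value of an n-bit number:
\[ x = -x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0 \]
Example:
\[ 1111\ 1111\ 1111\ 1111\ 1111\ 1111\ 1111\ 1100_2 = -1\times2^{31} + 1\times2^{30} + \cdots + 1\times2^2 + 0\times2^1 + 0\times2^0 = -2{,}147{,}483{,}648 + 2{,}147{,}483{,}644 = -4_{10} \]
Using 32 bits
  -2,147,483,648 to +2,147,483,647
-(-2^(n-1)) can't be represented
Non-negative numbers have the same unsigned and 2s-complement representation
Some specific numbers
  0: 0000 0000 ... 0000
  -1: 1111 1111 ... 1111
  Most-negative: 1000 0000 ... 0000
  Most-positive: 0111 1111 ... 1111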
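The positional formulas for unsigned and 2s-complement values can be checked mechanically. The sketch below evaluates a 32-bit pattern term by term: bits 0 to 30 contribute positive weights, and bit 31 contributes either +2^31 (unsigned) or -2^31 (signed). The function names are hypothetical.

```c
#include <stdint.h>

/* Sum of x_i * 2^i for i = 0..31: the unsigned interpretation. */
int64_t unsigned_value(uint32_t bits)
{
    int64_t value = 0;
    for (int i = 0; i < 32; i++)
        value += (int64_t)((bits >> i) & 1u) << i;
    return value;
}

/* Same sum, except bit 31 has weight -2^31: the 2s-complement interpretation. */
int64_t twos_complement_value(uint32_t bits)
{
    int64_t value = 0;
    for (int i = 0; i < 31; i++)
        value += (int64_t)((bits >> i) & 1u) << i;
    value -= (int64_t)((bits >> 31) & 1u) << 31;
    return value;
}
```

For any pattern with bit 31 clear the two functions agree, which matches the statement that non-negative numbers have the same representation in both schemes.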
Signed Negation
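Complement and add 1
  Complement means 1 → 0, 0 → 1
\[ x + \bar{x} = 1111\ldots111_2 = -1 \]
\[ \bar{x} + 1 = -x \]
Example: negate +2
  +2 = 0000 0000 ... 0010
  -2 = 1111 1111 ... 1101 + 1 = 1111 1111 ... 1110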
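The complement-and-add-1 rule can be tried directly on 32-bit patterns. This is a minimal sketch; `negate` is a hypothetical name, and unsigned arithmetic is used so that the wrap-around is well defined in C.

```c
#include <stdint.h>

/* 2s-complement negation: invert every bit, then add 1. */
uint32_t negate(uint32_t x)
{
    return ~x + 1u;
}
```

Negating +2 (0000 ... 0010) yields 1111 ... 1110, and negating that pattern again returns +2.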
Sign Extension
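Representing a number using more bits
  Preserve the numeric value
In the MIPS instruction set
  addi: extend immediate value
  lb, lh: extend loaded byte/halfword
  beq, bne: extend the displacement
Replicate the sign bit to the left
  c.f. unsigned values: extend with 0s
Examples: 8-bit to 16-bit
  +2: 0000 0010 => 0000 0000 0000 0010
  -2: 1111 1110 => 1111 1111 1111 1110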
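A small sketch of sign extension versus zero extension, widening 8-bit and 16-bit values to 32 bits. The function names are hypothetical; the bit manipulation is spelled out rather than relying on C's implicit integer conversions.

```c
#include <stdint.h>

/* Sign extension: copy the sign bit (bit 7) into bits 8..31. */
uint32_t sign_extend8(uint8_t b)
{
    uint32_t x = b;
    if (x & 0x80u)
        x |= 0xFFFFFF00u;
    return x;
}

/* Sign extension of a 16-bit immediate: copy bit 15 into bits 16..31. */
uint32_t sign_extend16(uint16_t imm)
{
    uint32_t x = imm;
    if (x & 0x8000u)
        x |= 0xFFFF0000u;
    return x;
}

/* Zero extension: upper bits are filled with 0s. */
uint32_t zero_extend16(uint16_t imm)
{
    return (uint32_t)imm;
}
```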
Representing Instructions
Instructions are encoded in binary
  Called machine code
MIPS instructions
  Encoded as 32-bit instruction words
  Small number of formats encoding operation code (opcode), register numbers, ...
  Regularity!
Register numbers
  $t0 to $t7 are regs 8 to 15
  $t8 to $t9 are regs 24 to 25
  $s0 to $s7 are regs 16 to 23

R-format fields:
  op      rs      rt      rd      shamt   funct
  6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

Instruction fields
  op: operation code (opcode)
  rs: first source register number
  rt: second source register number
  rd: destination register number
  shamt: shift amount (00000 for now)
  funct: function code (extends opcode)
R-format Example
add $t0, $s1, $s2

  op      rs      rt      rd      shamt   funct
  6 bits  5 bits  5 bits  5 bits  5 bits  6 bits
  0       17      18      8       0       32
  000000  10001   10010   01000   00000   100000

00000010001100100100000000100000 (base 2) = 02324020 (base 16)
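To make the field packing concrete, the sketch below assembles an R-format word from its six fields using the 6/5/5/5/5/6-bit layout. `encode_r` is a hypothetical helper, not part of any real assembler.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the six R-format fields into one 32-bit instruction word.
   Field widths: op 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits. */
uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt,
                  uint32_t rd, uint32_t shamt, uint32_t funct)
{
    assert(op < 64 && rs < 32 && rt < 32 && rd < 32 && shamt < 32 && funct < 64);
    return (op << 26) | (rs << 21) | (rt << 16) |
           (rd << 11) | (shamt << 6) | funct;
}
```

With op = 0, rs = 17, rt = 18, rd = 8, shamt = 0, funct = 32, the result is the word 0x02324020, the same bit pattern as in the example.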
Hexadecimal
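Base 16
  Compact representation of bit strings
  4 bits per hex digit: 0 = 0000, 1 = 0001, 2 = 0010, 3 = 0011, ..., f = 1111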
I-format fields:
  op      rs      rt      constant or address
  6 bits  5 bits  5 bits  16 bits

rt: destination or source register number
Constant: -2^15 to +2^15 - 1
Address: offset added to base address in rs
Different formats complicate decoding, but allow 32-bit instructions uniformly
Keep formats as similar as possible
Instructions represented in binary, just like data
Instructions and data stored in memory
Programs can operate on programs
Standardized ISAs
Logical Operations
Shift Operations
  op      rs      rt      rd      shamt   funct
  6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

Shift left logical (sll)
  Shift left and fill with 0 bits
  sll by i bits multiplies by 2^i
Shift right logical (srl)
  Shift right and fill with 0 bits
  srl by i bits divides by 2^i (unsigned only)
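The link between logical shifts and multiplying or dividing by powers of two can be checked with a short sketch. The names `sll` and `srl` mirror the MIPS mnemonics but are ordinary hypothetical C functions; the shift amount is masked to 5 bits to match the width of the shamt field.

```c
#include <stdint.h>

/* Shift left logical: vacated low bits are filled with 0. */
uint32_t sll(uint32_t x, unsigned shamt)
{
    return x << (shamt & 31u);
}

/* Shift right logical: vacated high bits are filled with 0. */
uint32_t srl(uint32_t x, unsigned shamt)
{
    return x >> (shamt & 31u);
}
```

Shifting 9 left by 4 gives 144 (9 × 2^4), and shifting 144 right by 4 gives 9 again. The division reading only holds for unsigned values, because the sign bit is not preserved.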
AND Operations
OR Operations
NOT Operations
0000 0000 0000 0000 0011 1100 0000 0000    (operand)
1111 1111 1111 1111 1100 0011 1111 1111    (bitwise NOT of the operand)
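The bitwise operations named above map directly onto C operators. MIPS provides a NOR instruction, and NOT can be obtained as NOR with zero; the sketch below assumes that reading. The function names are hypothetical.

```c
#include <stdint.h>

uint32_t mips_and(uint32_t a, uint32_t b) { return a & b; }    /* mask bits    */
uint32_t mips_or(uint32_t a, uint32_t b)  { return a | b; }    /* include bits */
uint32_t mips_nor(uint32_t a, uint32_t b) { return ~(a | b); } /* NOT (a OR b) */

/* NOT expressed as NOR with zero: a NOR 0 == NOT a. */
uint32_t mips_not(uint32_t a)
{
    return mips_nor(a, 0u);
}
```

Applying `mips_not` to 0x00003C00 gives 0xFFFFC3FF, the same pair of bit patterns shown above.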
Conditional Operations
Branch to a labeled instruction if a condition is true
  Otherwise, continue sequentially
beq rs, rt, L1
  if (rs == rt) branch to instruction labeled L1;
bne rs, rt, L1
  if (rs != rt) branch to instruction labeled L1;
j L1
  unconditional jump to instruction labeled L1
Compiling If Statements
C code:
if (i==j) f = g+h;
else f = g-h;
f, g, ... in $s0, $s1, ...
Compiled MIPS code:
      bne $s3, $s4, Else
      add $s0, $s1, $s2
      j   Exit
Else: sub $s0, $s1, $s2
Exit: ...
Assembler calculates addresses
C code:
while (save[i] == k) i += 1;
i in $s3, k in $s5, address of save in $s6
Compiled MIPS code:
Loop: sll  $t1, $s3, 2
      add  $t1, $t1, $s6
      lw   $t0, 0($t1)
      bne  $t0, $s5, Exit
      addi $s3, $s3, 1
      j    Loop
Exit: ...
Basic Blocks
A compiler identifies basic blocks for optimization
An advanced processor can accelerate execution of basic blocks
Set result to 1 if a condition is true
  Otherwise, set to 0
slt rd, rs, rt
  if (rs < rt) rd = 1; else rd = 0;
slti rt, rs, constant
  if (rs < constant) rt = 1; else rt = 0;
Use in combination with beq, bne
  slt $t0, $s1, $s2    # if ($s1 < $s2)
  bne $t0, $zero, L    #   branch to L
Why not blt, bge, etc?
Hardware for <, >=, ... slower than =, !=
  Combining with branch involves more work per instruction, requiring a slower clock
  All instructions penalized!
beq and bne are the common case
This is a good design compromise
$s0 = 1111 1111 1111 1111 1111 1111 1111 1111
$s1 = 0000 0000 0000 0000 0000 0000 0000 0001
slt $t0, $s0, $s1     # signed
  -1 < +1, so $t0 = 1
sltu $t0, $s0, $s1    # unsigned
  +4,294,967,295 > +1, so $t0 = 0
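The same two bit patterns compare differently depending on whether they are read as signed or unsigned. The sketch below models both set-on-less-than comparisons; the function names follow the MIPS mnemonics but are hypothetical C helpers, and the cast to `int32_t` assumes a 2s-complement machine (true of all mainstream C implementations).

```c
#include <stdint.h>

/* Signed set-on-less-than: operands are read as 2s-complement values. */
uint32_t slt(uint32_t rs, uint32_t rt)
{
    return ((int32_t)rs < (int32_t)rt) ? 1u : 0u;
}

/* Unsigned set-on-less-than: operands are read as plain magnitudes. */
uint32_t sltu(uint32_t rs, uint32_t rt)
{
    return (rs < rt) ? 1u : 0u;
}
```

With rs = 0xFFFFFFFF and rt = 1, the signed comparison sees -1 < +1 and yields 1, while the unsigned comparison sees 4,294,967,295 > 1 and yields 0.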
Procedure Calling
Steps required
1. Place parameters in registers
2. Transfer control to procedure
3. Acquire storage for procedure
4. Perform procedure's operations
5. Place result in register for caller
6. Return to place of call
Register Usage
$a0 to $a3: arguments (regs 4 to 7)
$v0, $v1: result values (regs 2 and 3)
$t0 to $t9: temporaries
$gp: global pointer for static data (reg 28)
$sp: stack pointer (reg 29)
$fp: frame pointer (reg 30)
$ra: return address (reg 31)
C code:
int leaf_example(int g, int h, int i, int j)
{
  int f;
  f = (g + h) - (i + j);
  return f;
}
Arguments g, ..., j in $a0, ..., $a3
f in $s0 (hence, need to save $s0 on stack)
Result in $v0
MIPS code:
leaf_example:
  addi $sp, $sp, -4      # save $s0 on stack
  sw   $s0, 0($sp)
  add  $t0, $a0, $a1     # procedure body
  add  $t1, $a2, $a3
  sub  $s0, $t0, $t1
  add  $v0, $s0, $zero   # result
  lw   $s0, 0($sp)       # restore $s0
  addi $sp, $sp, 4
  jr   $ra               # return
Non-Leaf Procedures
Procedures that call other procedures
For nested call, caller needs to save on the stack:
  Its return address
  Any arguments and temporaries needed after the call
C code:
int fact(int n)
{
  if (n < 1) return 1;
  else return n * fact(n - 1);
}
Argument n in $a0
Result in $v0
MIPS code:
fact:
    addi $sp, $sp, -8     # adjust stack for 2 items
    sw   $ra, 4($sp)      # save return address
    sw   $a0, 0($sp)      # save argument
    slti $t0, $a0, 1      # test for n < 1
    beq  $t0, $zero, L1
    addi $v0, $zero, 1    # if so, result is 1
    addi $sp, $sp, 8      #   pop 2 items from stack
    jr   $ra              #   and return
L1: addi $a0, $a0, -1     # else decrement n
    jal  fact             # recursive call
    lw   $a0, 0($sp)      # restore original n
    lw   $ra, 4($sp)      #   and return address
    addi $sp, $sp, 8      # pop 2 items from stack
    mul  $v0, $a0, $v0    # multiply to get result
    jr   $ra              # and return
Memory Layout
Static data: global variables
  e.g., static variables in C, constant arrays and strings
  $gp initialized to address allowing offsets into this segment
Dynamic data: heap
  E.g., malloc in C, new in Java
Character Data
Unicode: 32-bit character set
  Used in Java, C++ wide characters, ...
  Most of the world's alphabets, plus symbols
  UTF-8, UTF-16: variable-length encodings
Byte/Halfword Operations
lb rt, offset(rs)     lh rt, offset(rs)
  Sign extend to 32 bits in rt
lbu rt, offset(rs)    lhu rt, offset(rs)
  Zero extend to 32 bits in rt
sb rt, offset(rs)     sh rt, offset(rs)
  Store just rightmost byte/halfword
C code (naive):
Null-terminated string
void strcpy(char x[], char y[])
{
  int i;
  i = 0;
  while ((x[i] = y[i]) != '\0')
    i += 1;
}
Addresses of x, y in $a0, $a1
i in $s0
MIPS code:
strcpy:
    addi $sp, $sp, -4      # adjust stack for 1 item
    sw   $s0, 0($sp)       # save $s0
    add  $s0, $zero, $zero # i = 0
L1: add  $t1, $s0, $a1     # addr of y[i] in $t1
    lbu  $t2, 0($t1)       # $t2 = y[i]
    add  $t3, $s0, $a0     # addr of x[i] in $t3
    sb   $t2, 0($t3)       # x[i] = y[i]
    beq  $t2, $zero, L2    # exit loop if y[i] == 0
    addi $s0, $s0, 1       # i = i + 1
    j    L1                # next iteration of loop
L2: lw   $s0, 0($sp)       # restore saved $s0
    addi $sp, $sp, 4       # pop 1 item from stack
    jr   $ra               # and return
2.10 MIPS Addressing for 32-Bit Immediates and Addresses
32-bit Constants
lui $s0, 61           # $s0 = 0000 0000 0011 1101 0000 0000 0000 0000
ori $s0, $s0, 2304    # $s0 = 0000 0000 0011 1101 0000 1001 0000 0000
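Building a 32-bit constant from two 16-bit immediates can be modelled in a few lines: the first step places 16 bits in the upper half and clears the lower half, and the second ORs in the lower 16 bits. The helper names below are hypothetical; they correspond to the MIPS load-upper-immediate and OR-immediate steps.

```c
#include <stdint.h>

/* Load upper immediate: imm goes to bits 31..16, bits 15..0 become 0. */
uint32_t load_upper(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/* OR immediate: the 16-bit immediate is zero-extended, then ORed in. */
uint32_t or_immediate(uint32_t rs, uint16_t imm)
{
    return rs | (uint32_t)imm;
}

/* Compose a full 32-bit constant from its two halves. */
uint32_t load_constant(uint16_t upper, uint16_t lower)
{
    return or_immediate(load_upper(upper), lower);
}
```

With an upper half of 61 and a lower half of 2304, the result is 0x003D0900, which is 4,000,000 in decimal.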
Branch Addressing
  op      rs      rt      constant or address
  6 bits  5 bits  5 bits  16 bits
Jump Addressing
  op      address
  6 bits  26 bits
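The two instruction layouts imply two ways of forming a target address. The sketch below assumes the standard MIPS rules: a branch target is PC-relative (the address of the following instruction plus the sign-extended 16-bit word offset times 4), and a jump target keeps the top 4 bits of the following instruction's address and fills the rest with the 26-bit field times 4. The function names are hypothetical.

```c
#include <stdint.h>

/* PC-relative branch target: (PC + 4) + offset * 4.
   The offset counts words and is sign-extended from 16 bits. */
uint32_t branch_target(uint32_t pc, int16_t offset)
{
    uint32_t byte_offset = (uint32_t)(int32_t)offset << 2;
    return pc + 4u + byte_offset;   /* unsigned wrap-around is well defined */
}

/* Pseudo-direct jump target: top 4 bits of (PC + 4),
   then the 26-bit address field shifted left by 2. */
uint32_t jump_target(uint32_t pc, uint32_t address26)
{
    return ((pc + 4u) & 0xF0000000u) | ((address26 & 0x03FFFFFFu) << 2);
}
```

A 16-bit word offset reaches roughly ±2^15 instructions around the branch, while a jump can reach anywhere within the current 256 MB region.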
If branch target is too far to encode with 16-bit offset, assembler rewrites the code
Example
      beq $s0, $s1, L1
  becomes
      bne $s0, $s1, L2
      j   L1
  L2: ...
Synchronization
Atomic read/write memory operation
  No other access to the location allowed between the read and write
  E.g., atomic swap of register and memory contents
  Or an atomic pair of instructions
Synchronization in MIPS
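Load linked: ll rt, offset(rs)
Store conditional: sc rt, offset(rs)
  Succeeds if location not changed since the ll: returns 1 in rt
  Fails if location is changed: returns 0 in rt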
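The read-then-conditionally-write pattern has a close analogue in C11 atomics. The sketch below is not MIPS ll/sc itself; it uses `atomic_compare_exchange_weak` from `<stdatomic.h>` as a stand-in, retrying until the store succeeds, much as a store-conditional loop retries when it returns 0. `atomic_swap` is a hypothetical name.

```c
#include <stdatomic.h>

/* Atomically replace *location with new_value and return the old value.
   The loop retries whenever another access changed the location between
   the read and the attempted write. */
int atomic_swap(atomic_int *location, int new_value)
{
    int old_value = atomic_load(location);
    while (!atomic_compare_exchange_weak(location, &old_value, new_value)) {
        /* On failure old_value is refreshed with the current contents. */
    }
    return old_value;
}
```

A simple lock can be built on top: swapping in 1 and getting back 0 means the lock was free and is now held; getting back 1 means another party holds it.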
Static linking
Assembler Pseudoinstructions
Most assembler instructions represent machine instructions one-to-one
Pseudoinstructions: figments of the assembler's imagination
  move $t0, $t1     →  add $t0, $zero, $t1
  blt $t0, $t1, L   →  slt $at, $t0, $t1
                       bne $at, $zero, L
Assembler (or compiler) translates program into machine instructions
Provides information for building a complete program from the pieces
  Header: describes contents of object module
  Text segment: translated instructions
  Static data segment: data allocated for the life of the program
  Relocation info: for contents that depend on absolute location of loaded program
  Symbol table: global definitions and external refs
  Debug info: for associating with source code
But with virtual memory, no need to do this
Program can be loaded into absolute location in virtual memory space
Loading a Program
4. Set up arguments on stack
5. Initialize registers (including $sp, $fp, $gp)
6. Jump to startup routine
   Copies arguments to $a0, ... and calls main
   When main returns, do exit syscall
Dynamic Linking
Requires procedure code to be relocatable
Avoids image bloat caused by static linking of all (transitively) referenced libraries
Automatically picks up new library versions
Lazy Linkage
Indirection table
Stub: loads routine ID, jumps to linker/loader
Linker/loader code
Compiles bytecodes of hot methods into native code for host machine
Interprets bytecodes
C Sort Example
Illustrates use of assembly instructions for a C bubble sort function
Swap procedure (leaf)
void swap(int v[], int k)
{
  int temp;
  temp = v[k];
  v[k] = v[k+1];
  v[k+1] = temp;
}
v in $a0, k in $a1, temp in $t0
Inner loop
[Figure: bar charts of relative performance, instruction count, clock cycles, and CPI for optimization levels O1, O2, and O3, and for C/O1, C/O2, C/O3, Java/int, and Java/JIT]
Lessons Learnt
Instruction count and CPI are not good performance indicators in isolation
Compiler optimizations are sensitive to the algorithm
Java/JIT compiled code is significantly faster than JVM interpreted
Multiply "strength reduced" to shift
Array version requires shift to be inside loop
ARM: the most popular embedded core
Similar basic set of instructions to MIPS

                          ARM              MIPS
  Date announced          1985             1985
  Instruction size        32 bits          32 bits
  Address space           32-bit flat      32-bit flat
  Data alignment          Aligned          Aligned
  Data addressing modes   9                3
  Registers               15 × 32-bit      31 × 32-bit
  Input/output            Memory mapped    Memory mapped
Condition codes: negative, zero, carry, overflow
Compare instructions to set condition codes without keeping the result
Top 4 bits of instruction word: condition value
Can avoid branches over single instructions
Instruction Encoding
Accumulator, plus 3 index-register pairs
Complex instruction set (CISC)
Adds FP instructions and register stack
Segmented memory mapping and protection
Additional addressing modes and operations
Paged memory mapping as well as segments
Further evolution
Compatible competitors: AMD, Cyrix, ...
Later versions added MMX (Multi-Media eXtension) instructions
The infamous FDIV bug
New microarchitecture (see Colwell, The Pentium Chronicles)
Added SSE (Streaming SIMD Extensions) and associated registers
New microarchitecture
Added SSE2 instructions
Pentium 4 (2001)
And further
AMD64 (2003): extended architecture to 64 bits
EM64T, Extended Memory 64 Technology (2004)
  AMD64 adopted by Intel (with refinements)
  Added SSE3 instructions
Added SSE4 instructions, virtual machine support
Intel declined to follow, instead ...
Longer SSE registers, more instructions
Address in register
\[ \text{Address} = R_{base} + \text{displacement} \]
\[ \text{Address} = R_{base} + 2^{scale} \times R_{index} \quad (scale = 0, 1, 2, \text{or } 3) \]
\[ \text{Address} = R_{base} + 2^{scale} \times R_{index} + \text{displacement} \]
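The most general of these addressing modes can be written as a single expression, with the simpler modes falling out when the index or displacement is zero. This is a minimal sketch of the address arithmetic only; `effective_address` and its parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address = base + 2^scale * index + displacement,
   with scale restricted to 0, 1, 2, or 3 (element sizes 1, 2, 4, 8). */
uint32_t effective_address(uint32_t base, uint32_t index,
                           unsigned scale, int32_t displacement)
{
    assert(scale <= 3);
    return base + (index << scale) + (uint32_t)displacement;
}
```

For instance, element 3 of an array of 4-byte items that starts 8 bytes past a base of 0x1000 sits at 0x1000 + 4 × 3 + 8 = 0x1014.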
Implementing IA-32
Microengine similar to RISC
Market share makes this economically viable
Compilers avoid complex instructions
Fallacies
Compilers are good at making fast code from simple instructions
But modern compilers are better at dealing with modern processors
More lines of code => more errors and less productivity
Fallacies
Pitfalls
Increment by 4, not by 1!
e.g., passing pointer back via an argument
Pointer becomes invalid when stack popped
Concluding Remarks
Design principles
1. Simplicity favors regularity
2. Smaller is faster
3. Make the common case fast
4. Good design demands good compromises
Layers of software/hardware