
ch2-2

The document discusses the hardware/software interface in C, detailing variable storage classes, memory layout, and character data encoding. It covers LEGv8 instruction formats, synchronization mechanisms, and the process of linking and loading programs, including the use of dynamically linked libraries. Additionally, it provides examples of C code and corresponding LEGv8 assembly code for operations like string copying and sorting, emphasizing the importance of algorithm efficiency and compiler optimizations.

Uploaded by

macbay prince

Hardware/Software Interface

A C variable is generally a location in storage, and its interpretation depends both on its type and storage class.

C has two storage classes: automatic and static.

C variables declared outside all procedures are considered static, as are any variables declared using the keyword static. The rest are automatic.

To simplify access to static data, some LEGv8 compilers reserve a register, called the global pointer, or GP.

For example, X27 could be reserved for GP.
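The difference between the two storage classes can be seen in a few lines of C. This is a minimal sketch; the names `call_count` and `count_calls` are hypothetical and not from the slides.

```c
#include <assert.h>

/* Static storage: declared outside all procedures, lives for the whole run. */
static long long call_count = 0;

/* Hypothetical helper: returns how many times it has been called so far. */
long long count_calls(void)
{
    static long long calls_seen = 0; /* static: keeps its value between calls */
    long long automatic = 0;         /* automatic: re-created on every call   */

    calls_seen += 1;
    automatic += 1;
    call_count += 1;

    assert(automatic == 1);          /* never survives from an earlier call */
    return calls_seen;
}
```

Calling `count_calls()` twice returns 1 and then 2, because `calls_seen` lives in static data, while `automatic` starts from 0 each time because it lives on the stack (or in a register).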


Memory Layout

Text: program code

Static data: global variables
e.g., static variables in C, constant arrays and strings

Dynamic data: heap
e.g., malloc in C, new in Java

Stack: automatic storage

FIGURE 2.14 The LEGv8 memory allocation for program and data
Communicating with People
Character Data
• Byte-encoded character sets
– ASCII: 128 characters

95 graphic, 33 control
– Latin-1: 256 characters

ASCII, +96 more graphic characters
• Unicode: 32-bit character set
– Used in Java, C++ wide characters, …
– Most of the world’s alphabets, plus symbols
– UTF-8, UTF-16: variable-length encodings
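To make "variable-length encoding" concrete, here is a minimal sketch of a UTF-8 encoder in C. The function name `utf8_encode` is an assumption for illustration; the byte layout follows the standard UTF-8 scheme (1 to 4 bytes per code point).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 for an invalid code point. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                      /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

ASCII characters keep their one-byte encoding, which is why ASCII text is already valid UTF-8; the euro sign U+20AC, for example, takes three bytes (E2 82 AC).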
Byte/Halfword Operations
• LEGv8 byte/halfword load/store
– Load byte:
    LDURB Rt, [Rn, offset]
    Zero-extends the byte to fill Rt (ARMv8's LDURSB is the sign-extending form)
– Store byte:
    STURB Rt, [Rn, offset]
    Stores just the rightmost byte
– Load halfword:
    LDURH Rt, [Rn, offset]
    Zero-extends the halfword to fill Rt (ARMv8's LDURSH is the sign-extending form)
– Store halfword:
    STURH Rt, [Rn, offset]
    Stores just the rightmost halfword
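A byte load has to fill the rest of the register somehow, and a byte store has to discard the upper bits. The following C sketch models both choices of extension and the truncating store; the helper names are hypothetical, and the mapping to specific instructions is only indicated in comments.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extending byte load: the upper 56 bits of the result are 0. */
uint64_t load_byte_unsigned(const void *addr)
{
    assert(addr != 0);
    return *(const uint8_t *)addr;
}

/* Sign-extending byte load: the upper bits copy bit 7 of the byte. */
int64_t load_byte_signed(const void *addr)
{
    assert(addr != 0);
    return *(const int8_t *)addr;
}

/* Byte store: only the rightmost 8 bits of value reach memory. */
void store_byte(void *addr, uint64_t value)
{
    assert(addr != 0);
    *(uint8_t *)addr = (uint8_t)value;
}
```

Storing 0x1234ABF0 with `store_byte` leaves 0xF0 in memory; reading it back gives 240 with the zero-extending load and -16 with the sign-extending one.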
String Copy Example
• C code:
– Null-terminated string
void strcpy (char x[], char y[])
{
    size_t i;
    i = 0;
    while ((x[i] = y[i]) != '\0')
        i += 1;
}
String Copy Example
• LEGv8 code:
strcpy:
    SUBI  SP,SP,#8        // push X19
    STUR  X19,[SP,#0]
    ADD   X19,XZR,XZR     // i = 0
L1: ADD   X10,X19,X1      // X10 = addr of y[i]
    LDURB X11,[X10,#0]    // X11 = y[i]
    ADD   X12,X19,X0      // X12 = addr of x[i]
    STURB X11,[X12,#0]    // x[i] = y[i]
    CBZ   X11,L2          // if y[i] == 0 then exit
    ADDI  X19,X19,#1      // i = i + 1
    B     L1              // next iteration of loop
L2: LDUR  X19,[SP,#0]     // restore saved X19
    ADDI  SP,SP,#8        // pop 1 item from stack
    BR    LR              // and return
§2.10 LEGv8 Addressing for 32-Bit Immediates and Addresses
32-bit Constants
• Most constants are small
– 12-bit immediate is sufficient
• For the occasional 32-bit constant
– MOVZ: move wide with zeros
– MOVK: move wide with keep
– Use with flexible second operand (shift)

MOVZ X9,255,LSL 16
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 0000 0000 0000 0000

MOVK X9,255,LSL 0
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 0000 0000 1111 1111
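The effect of the two instructions can be modelled in C. This is a sketch only: `movz` and `movk` are hypothetical helper functions that mimic the register result, with the shift restricted to 0, 16, 32, or 48.

```c
#include <assert.h>
#include <stdint.h>

/* Model of MOVZ: place a 16-bit immediate at the given shift, zero the rest. */
uint64_t movz(uint16_t imm, unsigned shift)
{
    assert(shift == 0 || shift == 16 || shift == 32 || shift == 48);
    return (uint64_t)imm << shift;
}

/* Model of MOVK: replace one 16-bit field of reg, keep the other 48 bits. */
uint64_t movk(uint64_t reg, uint16_t imm, unsigned shift)
{
    assert(shift == 0 || shift == 16 || shift == 32 || shift == 48);
    uint64_t mask = (uint64_t)0xFFFF << shift;
    return (reg & ~mask) | ((uint64_t)imm << shift);
}
```

With these helpers, `movz(255, 16)` gives 0x00FF0000, and `movk` of that value with 255 at shift 0 gives 0x00FF00FF, which matches the MOVZ/MOVK pair in the slide.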
Branch Addressing
• B-type
– B 10000 // go to location 10000ten

    opcode    address
    5         10000ten
    6 bits    26 bits

• CB-type
– CBNZ X19, Exit // go to Exit if X19 != 0

    opcode    address    Rt
    181       Exit       19
    8 bits    19 bits    5 bits

• Both addresses are PC-relative
– Address = PC + offset (from instruction)
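The PC-relative calculation can be sketched in C. Two details here are assumptions not stated on the slide: the address field is treated as a signed two's-complement number, and it counts 4-byte instructions (words), as in the ARMv8 encoding, so it is multiplied by 4 before being added to the PC. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low `bits` bits of field as a two's-complement number. */
int64_t sign_extend(uint32_t field, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    uint64_t sign = 1ULL << (bits - 1);
    uint64_t value = field & ((sign << 1) - 1);
    return (int64_t)(value ^ sign) - (int64_t)sign;
}

/* Branch target = PC + (signed field * 4); bits is 26 for B, 19 for CBZ/CBNZ. */
uint64_t branch_target(uint64_t pc, uint32_t field, unsigned bits)
{
    return pc + (uint64_t)(sign_extend(field, bits) * 4);
}
```

A 26-bit field therefore reaches about ±2^27 bytes around the branch, and a 19-bit field about ±2^20 bytes, which is why conditional branches have the shorter range.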
LEGv8 Addressing Summary
FIGURE 2.19 Illustration of four LEGv8 addressing modes.
LEGv8 Encoding Summary

FIGURE 2.21 LEGv8 instruction formats.

Parallelism and Instructions: Synchronization
• Two processors sharing an area of memory
– P1 writes, then P2 reads
– Data race if P1 and P2 don’t synchronize

Result depends on order of accesses
• Hardware support required
– Atomic read/write memory operation
– No other access to the location allowed between the
read and write
• Could be a single instruction
– E.g., atomic swap of register ↔ memory
– Or an atomic pair of instructions
Synchronization in LEGv8
• Load exclusive register: LDXR
• Store exclusive register: STXR
• To use:
– Execute LDXR then STXR with same address
– If there is an intervening change to the address, store fails (communicated with additional output register)
– Only use register instructions in between
Synchronization in LEGv8
• Example 1: atomic swap (to test/set lock variable)
again: LDXR X10,[X20,#0]
       STXR X23,X9,[X20]    // X9 = status
       CBNZ X9,again
       ADD  X23,XZR,X10     // X23 = loaded value

• Example 2: lock
       ADDI X11,XZR,#1      // copy locked value
again: LDXR X10,[X20,#0]    // read lock
       CBNZ X10,again       // check if it is 0 yet
       STXR X11,X9,[X20]    // attempt to store
       CBNZ X9,again        // branch if fails
– Unlock:
       STUR XZR,[X20,#0]    // free lock
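At the C level the same lock is usually written with C11 atomics rather than hand-written exclusive loads and stores. The sketch below uses `atomic_exchange` from `<stdatomic.h>`; the type and function names (`toy_lock`, `toy_try_lock`, and so on) are hypothetical. How the exchange is implemented is up to the compiler; on ARMv8 it may become an exclusive load/store retry loop or a single atomic instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Lock variable: 0 = free, 1 = held. */
typedef struct { atomic_int state; } toy_lock;

/* Atomic swap: write 1 and look at the old value in one indivisible step.
   Returns true if the lock was free and is now ours. */
bool toy_try_lock(toy_lock *l)
{
    assert(l != 0);
    return atomic_exchange(&l->state, 1) == 0;
}

/* Spin until the swap returns 0 (the lock was free). */
void toy_lock_acquire(toy_lock *l)
{
    while (!toy_try_lock(l)) {
        /* busy-wait; another thread holds the lock */
    }
}

/* Unlock: an ordinary atomic store of 0 frees the lock. */
void toy_unlock(toy_lock *l)
{
    assert(l != 0);
    atomic_store(&l->state, 0);
}
```

A second `toy_try_lock` on a held lock returns false without blocking, which mirrors the retry branch in the assembly version.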
Translation and Startup

Many compilers produce object modules directly

Static linking
Producing an Object Module
• Assembler (or compiler) translates program
into machine instructions
• Provides information for building a
complete program from the pieces
– Header: describes contents of object module
– Text segment: translated instructions
– Static data segment: data allocated for the life of
the program
– Relocation info: for contents that depend on
absolute location of loaded program
– Symbol table: global definitions and external refs
– Debug info: for associating with source code
Linking Object Modules
• Produces an executable image
1. Merges segments
2. Resolves labels (determines their addresses)
3. Patches location-dependent and external refs
• Could leave location dependencies for
fixing by a relocating loader
– But with virtual memory, no need to do this
– Program can be loaded into absolute location
in virtual memory space
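The three linking steps can be illustrated with a toy model in C. Everything here is a deliberately simplified assumption: a "module" holds a few words of text, a symbol table of definitions, and relocation entries naming a word that must be patched with a symbol's final address. Real object formats are far richer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MODULES 8

typedef struct { const char *name; size_t offset; } symbol;   /* definition   */
typedef struct { size_t site; const char *name; } relocation; /* word to patch */

typedef struct {
    long text[4];         size_t text_len;
    symbol defs[2];       size_t ndefs;
    relocation relocs[2]; size_t nrelocs;
} module;

/* Step 2: resolve a label to its address in the merged image (-1 if undefined). */
static long find_symbol(const module *mods, const size_t *base,
                        size_t nmods, const char *name)
{
    for (size_t m = 0; m < nmods; m++)
        for (size_t d = 0; d < mods[m].ndefs; d++)
            if (strcmp(mods[m].defs[d].name, name) == 0)
                return (long)(base[m] + mods[m].defs[d].offset);
    return -1;
}

/* Returns the image length in words, or -1 on overflow or undefined symbol. */
long link_modules(const module *mods, size_t nmods, long *image, size_t cap)
{
    size_t base[MAX_MODULES];
    size_t total = 0;
    assert(nmods <= MAX_MODULES);

    /* Step 1: merge the text segments one after another. */
    for (size_t m = 0; m < nmods; m++) {
        if (total + mods[m].text_len > cap)
            return -1;
        base[m] = total;
        memcpy(image + total, mods[m].text, mods[m].text_len * sizeof(long));
        total += mods[m].text_len;
    }

    /* Steps 2 and 3: resolve each reference and patch the word that uses it. */
    for (size_t m = 0; m < nmods; m++)
        for (size_t r = 0; r < mods[m].nrelocs; r++) {
            long addr = find_symbol(mods, base, nmods, mods[m].relocs[r].name);
            if (addr < 0)
                return -1;
            image[base[m] + mods[m].relocs[r].site] = addr;
        }
    return (long)total;
}
```

If the first module has three words and refers to `helper`, defined at offset 1 of the second module, the patched word ends up holding 4 (base 3 plus offset 1).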
Loading a Program
• Load from image file on disk into memory
1. Read header to determine segment sizes
2. Create virtual address space
3. Copy text and initialized data into memory
   – Or set page table entries so they can be faulted in
4. Set up arguments on stack
5. Initialize registers (including SP, FP)
6. Jump to startup routine
   – Copies arguments to X0, … and calls main
   – When main returns, do exit syscall
Dynamically Linked Libraries (DLLs)
• Only link/load library procedure when it
is called
– Requires procedure code to be relocatable
– Avoids image bloat caused by static linking of
all (transitively) referenced libraries
– Automatically picks up new library versions
Lazy Linkage
• Indirection table
• Stub: loads routine ID, jumps to linker/loader
• Linker/loader code
• Dynamically mapped code
FIGURE 2.23 Dynamically linked library via lazy procedure linkage.
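The idea of lazy procedure linkage can be imitated in plain C with a function pointer standing in for one indirection-table entry. This is a sketch under stated assumptions: `real_square` stands in for library code that a real dynamic linker would map in, and all names are hypothetical.

```c
#include <assert.h>

typedef long long (*routine)(long long);

/* Stands in for the dynamically mapped library routine. */
static long long real_square(long long x) { return x * x; }

static int resolve_count = 0;            /* how often the "linker" ran */

static long long square_stub(long long x);

/* Indirection table entry: initially points at the stub. */
static routine square_entry = square_stub;

/* Stub: runs only on the first call, patches the table, then forwards. */
static long long square_stub(long long x)
{
    resolve_count += 1;                  /* "linker/loader" work happens here */
    square_entry = real_square;          /* later calls skip the stub */
    return square_entry(x);
}

/* Callers always go through the table, never directly to the routine. */
long long call_square(long long x)
{
    assert(square_entry != 0);
    return square_entry(x);
}
```

The first call pays for the resolution; every later call costs only one extra indirect jump, which is the trade-off the figure illustrates.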
Starting Java Applications
• Simple portable instruction set for the JVM
• Interprets bytecodes
• Compiles bytecodes of “hot” methods into native code for host machine

§2.13 A C Sort Example to Put It All Together
C Sort Example

Illustrates use of assembly instructions for a C bubble sort function

Swap procedure (leaf)
void swap(long long int v[], long long int k)
{
    long long int temp;
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
}
– v in X0, k in X1, temp in X9
The Procedure Swap
swap: LSL  X10,X1,#3      // X10 = k * 8
      ADD  X10,X0,X10     // X10 = address of v[k]
      LDUR X9,[X10,#0]    // X9 = v[k]
      LDUR X11,[X10,#8]   // X11 = v[k+1]
      STUR X11,[X10,#0]   // v[k] = X11 (v[k+1])
      STUR X9,[X10,#8]    // v[k+1] = X9 (v[k])
      BR   LR             // return to calling routine
The Sort Procedure in C
• Non-leaf (calls swap)
void sort (long long int v[], size_t n)
{
    size_t i, j;
    for (i = 0; i < n; i += 1) {
        for (j = i - 1; j >= 0 && v[j] > v[j + 1]; j -= 1) {
            swap(v,j);
        }
    }
}
– v in X0, n in X1, i in X19, j in X20
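Because the inner loop starts at i - 1 and tests j >= 0, the loop indices need a signed type when the code is actually compiled: with an unsigned type the test j >= 0 is always true and i - 1 wraps around when i is 0. The runnable sketch below therefore uses signed `long long` indices, which also matches the signed comparisons in the assembly. The function names are changed to `swap_adjacent` and `sort_signed` to mark this as an adaptation, not the slide's listing.

```c
#include <assert.h>
#include <stddef.h>

/* Exchange v[k] and v[k+1]. */
static void swap_adjacent(long long v[], long long k)
{
    long long temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

/* Sort v[0..n-1] in ascending order using adjacent swaps. */
void sort_signed(long long v[], size_t n)
{
    assert(v != NULL || n == 0);
    for (long long i = 0; i < (long long)n; i += 1) {
        /* Walk element i down until its left neighbour is not larger. */
        for (long long j = i - 1; j >= 0 && v[j] > v[j + 1]; j -= 1) {
            swap_adjacent(v, j);
        }
    }
}
```

The algorithm still does a quadratic number of comparisons in the worst case, so it is suitable as a teaching example rather than for large inputs.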
The Outer Loop
• Skeleton of outer loop:
– for (i = 0; i < n; i += 1) {

         MOV  X19,XZR       // i = 0
for1tst: CMP  X19,X1        // compare X19 to X1 (i to n)
         B.GE exit1         // go to exit1 if X19 ≥ X1 (i ≥ n)

         (body of outer for-loop)

         ADDI X19,X19,#1    // i += 1
         B    for1tst       // branch to test of outer loop
exit1:
The Inner Loop
• Skeleton of inner loop:
– for (j = i − 1; j >= 0 && v[j] > v[j + 1]; j −= 1) {

         SUBI X20,X19,#1    // j = i – 1
for2tst: CMP  X20,XZR       // compare X20 to 0 (j to 0)
         B.LT exit2         // go to exit2 if X20 < 0 (j < 0)
         LSL  X10,X20,#3    // reg X10 = j * 8
         ADD  X11,X0,X10    // reg X11 = v + (j * 8)
         LDUR X12,[X11,#0]  // reg X12 = v[j]
         LDUR X13,[X11,#8]  // reg X13 = v[j + 1]
         CMP  X12,X13       // compare X12 to X13
         B.LE exit2         // go to exit2 if X12 ≤ X13
         MOV  X0,X21        // first swap parameter is v
         MOV  X1,X20        // second swap parameter is j
         BL   swap          // call swap
         SUBI X20,X20,#1    // j –= 1
         B    for2tst       // branch to test of inner loop
exit2:
Preserving Registers

Preserve saved registers:
       SUBI SP,SP,#40      // make room on stack for 5 regs
       STUR LR,[SP,#32]    // save LR on stack
       STUR X22,[SP,#24]   // save X22 on stack
       STUR X21,[SP,#16]   // save X21 on stack
       STUR X20,[SP,#8]    // save X20 on stack
       STUR X19,[SP,#0]    // save X19 on stack
       MOV  X21,X0         // copy parameter X0 into X21
       MOV  X22,X1         // copy parameter X1 into X22

Restore saved registers:
exit1: LDUR X19,[SP,#0]    // restore X19 from stack
       LDUR X20,[SP,#8]    // restore X20 from stack
       LDUR X21,[SP,#16]   // restore X21 from stack
       LDUR X22,[SP,#24]   // restore X22 from stack
       LDUR LR,[SP,#32]    // restore LR (X30) from stack
       ADDI SP,SP,#40      // restore stack pointer

• Procedure return
       BR   LR             // return to calling routine
Lessons Learnt
• Instruction count and CPI are not
good performance indicators in
isolation
• Compiler optimizations are sensitive to
the algorithm
• Java/JIT compiled code is significantly
faster than JVM interpreted
– Comparable to optimized C in some cases
• Nothing can fix a dumb algorithm!
§2.14 Arrays versus Pointers
Arrays vs. Pointers

• Array indexing involves

Multiplying index by element size

Adding to array base address
• Pointers correspond directly to
memory addresses

Can avoid indexing complexity
Example: Clearing an Array
clear1(int array[], int size) {
    int i;
    for (i = 0; i < size; i += 1)
        array[i] = 0;
}

clear2(int *array, int size) {
    int *p;
    for (p = &array[0]; p < &array[size]; p = p + 1)
        *p = 0;
}

Array version (clear1):
       MOV  X9,XZR         // i = 0
loop1: LSL  X10,X9,#3      // X10 = i * 8 (doubleword elements)
       ADD  X11,X0,X10     // X11 = address of array[i]
       STUR XZR,[X11,#0]   // array[i] = 0
       ADDI X9,X9,#1       // i = i + 1
       CMP  X9,X1          // compare i to size
       B.LT loop1          // if (i < size) go to loop1

Pointer version (clear2):
       MOV  X9,X0          // p = address of array[0]
       LSL  X10,X1,#3      // X10 = size * 8
       ADD  X11,X0,X10     // X11 = address of array[size]
loop2: STUR XZR,[X9,#0]    // Memory[p] = 0
       ADDI X9,X9,#8       // p = p + 8
       CMP  X9,X11         // compare p to &array[size]
       B.LT loop2          // if (p < &array[size]) go to loop2
FIGURE 2.31
Comparison of Array vs. Ptr
• Multiply “strength reduced” to shift
• Array version requires shift to be inside loop

Part of index calculation for incremented i

c.f. incrementing pointer
• Compiler can achieve same effect as
manual use of pointers

Induction variable elimination

Better to make program clearer and safer
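Both styles can be written as ordinary, compilable C. The sketch below uses `long long` elements so that each element is 8 bytes, matching the scaling by 8 in the assembly; that element type, the `size_t` length, and the function names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Index version: conceptually computes base + i * 8 on every iteration. */
void clear_indexed(long long array[], size_t size)
{
    assert(array != NULL || size == 0);
    for (size_t i = 0; i < size; i += 1)
        array[i] = 0;
}

/* Pointer version: the address itself is the loop variable, stepped by 8 bytes. */
void clear_pointer(long long *array, size_t size)
{
    assert(array != NULL || size == 0);
    if (size == 0)
        return;
    for (long long *p = array; p < array + size; p += 1)
        *p = 0;
}
```

An optimizing compiler will typically turn the index version into the same pointer-stepping loop, so the clearer indexed form usually costs nothing at run time.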
