Load and Store Instructions

Here is the code segment to add together elements x to x+(n-1) of an array using post-indexed addressing (assuming r0 already points to element x, r2 = n and r4 = 0):
Loop: LDR r3, [r0], #4 ; Load current element and increment pointer
ADD r4, r4, r3 ; Add element to running total
SUBS r2, r2, #1 ; Decrement count and set the flags
BNE Loop ; Repeat until count reaches 0

Embedded System Design Center

ARM7TDMI Microprocessor

Load and store instruction

Sai Kumar Devulapalli


Objectives

Detailed understanding of ARM Load and Store instructions


Understanding the instruction encoding formats
Understand the use of addressing mode in accessing the
memory locations
Understanding the general format with which data processing
instructions can be conditionally executed and used for
different sizes of data
Use of immediate offset in addressing memory locations
Understand the operation of swap instruction
Understand the use of coprocessor instructions

Load / Store Instructions
The ARM is a Load / Store Architecture:
Does not support memory to memory data processing
operations.
Must move data values into registers before using them.
This might sound inefficient, but in practice it isn't:
Load data values from memory into registers.
Process data in registers using a number of data processing
instructions which are not slowed down by memory access.
Store results from registers out to memory.
The ARM has three sets of instructions which interact with
main memory. These are:
Single register data transfer (LDR / STR).
Block data transfer (LDM/STM).
Single Data Swap (SWP).

Single register data transfer
The basic load and store instructions are:
Load and Store Word or Byte
LDR / STR / LDRB / STRB
ARM Architecture Version 4 also adds support for
halfwords and signed data.
Load and Store Halfword
LDRH / STRH
Load Signed Byte or Halfword - load value and sign
extend it to 32 bits.
LDRSB / LDRSH
All of these instructions can be conditionally executed by
inserting the appropriate condition code after STR / LDR.
e.g. LDREQB
Syntax:
<LDR|STR>{<cond>}{<size>} Rd, <address>
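The difference between the unsigned and signed loads can be sketched with a small Python model. This is only an illustration of the sign extension described above, not ARM code; the function names are hypothetical.

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a 32-bit result."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):      # top bit of the narrow value is set
        value -= 1 << bits             # interpret it as negative
    return value & 0xFFFFFFFF          # express as a 32-bit register value


def ldrb(byte):
    """Model of LDRB: zero-extend a byte to 32 bits."""
    return byte & 0xFF


def ldrsb(byte):
    """Model of LDRSB: sign-extend a byte to 32 bits."""
    return sign_extend(byte, 8)


def ldrsh(half):
    """Model of LDRSH: sign-extend a halfword to 32 bits."""
    return sign_extend(half, 16)
```

For example, loading the byte 0x80 gives 0x00000080 with the LDRB model and 0xFFFFFF80 with the LDRSB model.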
Load and Store Word or Byte:Base Register
The memory location to be accessed is held in a base register
STR r0, [r1] ; Store contents of r0 to location pointed to
; by contents of r1.
LDR r2, [r1] ; Load r2 with contents of memory location
; pointed to by contents of r1.

[Figure: r0 (source register for STR) holds 0x5 and r1 (base register) holds 0x200; STR writes 0x5 to memory location 0x200, and LDR loads 0x5 from that location into r2 (destination register for LDR).]

Instruction Format

Bits    Field
31-28   Cond
27-26   0 1
25      # (I) : Immediate / Register offset
24      P : Pre / Post Index
23      U : Up / Down
22      B : Unsigned Byte / Word
21      W : Write-back (auto-index)
20      L : Load / Store
19-16   Rn : Base Register
15-12   Rd : Source / Destination Register
11-0    offset

Offset

If # (I) = 0, bits 11-0 hold a 12-bit immediate.

If # (I) = 1, bits 11-0 hold a shifted register:
Bits    Field
11-7    #shift
6-5     Sh
4       0
3-0     Rm
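The bit fields above can be checked by assembling a few instructions by hand. The following Python sketch packs the immediate-offset form only (# = 0); the function name and parameters are assumptions made for illustration.

```python
def encode_ldr_str(load, rd, rn, offset=0, pre=True, writeback=False,
                   byte=False, cond=0xE):
    """Pack an LDR/STR{B} with a 12-bit immediate offset into a 32-bit word.

    cond=0xE is the 'always' condition. For post-indexed forms (pre=False)
    leave writeback=False: the base is always updated, and setting W there
    selects the user-mode (T) variant instead.
    """
    if not -4095 <= offset <= 4095:
        raise ValueError("immediate offset must fit in 12 bits")
    word = (cond << 28) | (0b01 << 26)   # bits 27-26 = 01, bit 25 (#) = 0
    word |= int(pre) << 24               # P
    word |= int(offset >= 0) << 23       # U: add or subtract the offset
    word |= int(byte) << 22              # B
    word |= int(writeback) << 21         # W
    word |= int(load) << 20              # L
    word |= (rn << 16) | (rd << 12) | abs(offset)
    return word
```

With this model, `encode_ldr_str(True, 2, 1)` gives 0xE5912000, the encoding of LDR r2, [r1].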

Load and Store Word or Byte:
Offsets from the Base Register
As well as accessing the actual location contained in the base register,
these instructions can access a location offset from the base register
pointer.
This offset can be
An unsigned 12-bit immediate value (i.e. 0 - 4095 bytes).
A register, optionally shifted by an immediate value
This can be either added or subtracted from the base register:
Prefix the offset value or register with + (default) or -.
This offset can be applied:
before the transfer is made: Pre-indexed addressing
optionally auto-incrementing the base register, by postfixing the
instruction with an !.
after the transfer is made: Post-indexed addressing
causing the base register to be auto-incremented.

Load and Store Word or Byte:Pre-indexed Addressing
Example: STR r0, [r1,#12]
[Figure: r0 (source register for STR) holds 0x5 and r1 (base register) holds 0x200; the offset 12 is added to the base to give address 0x20c, where 0x5 is stored. r1 is left unchanged.]

To store to location 0x1f4 instead use: STR r0, [r1,#-12]


To auto-increment base pointer to 0x20c use: STR r0, [r1, #12]!
If r2 contains 3, access 0x20c by multiplying this by 4:
STR r0, [r1, r2, LSL #2]
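The address arithmetic in these pre-indexed examples can be sketched as a small Python model. The function name is hypothetical, and the model only tracks addresses, not the data transferred.

```python
def pre_indexed(base, offset, writeback=False):
    """Return (access_address, new_base) for [Rn, offset] or [Rn, offset]!."""
    address = (base + offset) & 0xFFFFFFFF    # offset applied before the access
    new_base = address if writeback else base  # '!' copies the address back
    return address, new_base


r1 = 0x200
plain = pre_indexed(r1, 12)                    # STR r0, [r1, #12]
negative = pre_indexed(r1, -12)                # STR r0, [r1, #-12]
updated = pre_indexed(r1, 12, writeback=True)  # STR r0, [r1, #12]!
scaled = pre_indexed(r1, 3 << 2)               # STR r0, [r1, r2, LSL #2], r2 = 3
```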

Load and Store Word or Byte:
Post-indexed Addressing
Example: STR r0, [r1], #12
[Figure: r0 (source register for STR) holds 0x5 and r1 (original base register) holds 0x200; 0x5 is stored at 0x200, then the offset 12 is added so that the updated base register r1 holds 0x20c.]

To auto-increment the base register to location 0x1f4 instead use:


STR r0, [r1], #-12
If r2 contains 3, auto-increment base register to 0x20c by multiplying this
by 4:
STR r0, [r1], r2, LSL #2
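For comparison with pre-indexing, a minimal Python model of the post-indexed address arithmetic is shown here. The function name is hypothetical; the point is that the access uses the original base and the offset is applied afterwards.

```python
def post_indexed(base, offset):
    """Return (access_address, new_base) for [Rn], offset."""
    address = base                          # access uses the unmodified base
    new_base = (base + offset) & 0xFFFFFFFF  # base is always updated afterwards
    return address, new_base


r1 = 0x200
forward = post_indexed(r1, 12)       # STR r0, [r1], #12
backward = post_indexed(r1, -12)     # STR r0, [r1], #-12
scaled = post_indexed(r1, 3 << 2)    # STR r0, [r1], r2, LSL #2 with r2 = 3
```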

Load and Stores with User Mode Privilege

When using post-indexed addressing, there is a further form of Load/Store Word/Byte (with translation):
<LDR|STR>{<cond>}{B}T Rd, <post_indexed_address>

When used in a privileged mode, this does the load/store with user mode privilege.
Normally used by an exception handler that is emulating a memory access instruction that would normally execute in user mode.

Example Usage of Addressing Modes

Imagine an array, the first element of which is pointed to by the contents of r0.
If we want to access a particular element, then we can use pre-indexed addressing:
r1 is element we want.
LDR r2, [r0, r1, LSL #2]
If we want to step through every element of the array, for instance to produce sum of elements in the array, then we can use post-indexed addressing within a loop:
r1 is address of current element (initially equal to r0).
LDR r2, [r1], #4
Use a further register to store the address of final element, so that the loop can be correctly terminated.
[Figure: array in memory with r0 pointing to the start of the array; elements 0, 1, 2 and 3 lie at offsets 0, 4, 8 and 12.]

Offsets for Halfword and Signed Halfword / Byte
Access

The Load and Store Halfword and Load Signed Byte or
Halfword instructions can make use of pre- and post-indexed
addressing in much the same way as the basic load and store
instructions.
However the actual offset formats are more constrained:
The immediate value is limited to 8 bits (rather than 12
bits) giving an offset of 0-255 bytes.
The register form cannot have a shift applied to it.

Instruction Format

Bits    Field
31-28   Cond
27-25   0 0 0
24      P : Pre / Post Index
23      U : Up / Down
22      # (I) : Immediate / Register offset
21      W : Write-back (auto-index)
20      L : Load / Store
19-16   Rn : Base Register
15-12   Rd : Source / Destination Register
11-8    offsetH
7       1
6       S
5       H
4       1
3-0     offsetL

S H   Data type
1 0   Signed byte
0 1   Unsigned half-word
1 1   Signed half-word

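Offset

Note that the sense of the # (I) bit (bit 22) is the opposite of the word/byte format: here a set bit selects the immediate form.

If # (I) = 1 (immediate offset):
Bits    Field
11-8    imm[7:4]
3-0     imm[3:0]

If # (I) = 0 (register offset):
Bits    Field
11-8    0 0 0 0
3-0     Rm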
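The split 8-bit immediate can be illustrated by packing a halfword or signed transfer by hand. This Python sketch covers the immediate form only and assumes the ARMv4 encoding in which bit 22 is set for an immediate offset; the function name and parameters are hypothetical.

```python
def encode_halfword(load, rd, rn, offset=0, signed=False, half=True,
                    pre=True, writeback=False, cond=0xE):
    """Pack LDRH/STRH/LDRSB/LDRSH with an 8-bit immediate offset."""
    if not -255 <= offset <= 255:
        raise ValueError("immediate offset must fit in 8 bits")
    if not (signed or half):
        raise ValueError("S = 0, H = 0 is not a halfword or signed transfer")
    imm = abs(offset)
    word = cond << 28                       # bits 27-25 stay 000
    word |= int(pre) << 24                  # P
    word |= int(offset >= 0) << 23          # U
    word |= 1 << 22                         # immediate form
    word |= int(writeback) << 21            # W
    word |= int(load) << 20                 # L
    word |= (rn << 16) | (rd << 12)
    word |= (imm >> 4) << 8                 # offsetH = imm[7:4]
    word |= (1 << 7) | (int(signed) << 6) | (int(half) << 5) | (1 << 4)
    word |= imm & 0xF                       # offsetL = imm[3:0]
    return word
```

An offset of 0x24, for instance, is stored as offsetH = 2 and offsetL = 4.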

Quiz
Write a segment of code that adds together elements x to
x+(n-1) of an array, where the element x=0 is the first
element of the array.
Each element of the array is word sized (i.e. 32 bits).
The segment should use post-indexed addressing.
At the start of your segment, you should assume that:
r0 points to the start of the array.
r1 = x
r2 = n
[Figure: array of elements starting at element 0 (pointed to by r0), with the n elements x, x+1, ..., x+(n-1) bracketed.]

Quiz - Sample Solution

ADD r0, r0, r1, LSL #2 ; Set r0 to address of element x
ADD r2, r0, r2, LSL #2 ; Set r2 to address of element x+n (one past the last)
MOV r1, #0 ; Initialise running sum
loop
LDR r3, [r0], #4 ; Access element and move to next
ADD r1, r1, r3 ; Add contents to running sum
CMP r0, r2 ; Have we reached element x+n?
BLO loop ; If not - repeat for next element
; (unsigned compare, as these are addresses)
; on exit sum contained in r1
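The sample solution can be traced with a short Python model. This is a sketch under the assumption that memory is represented as a dictionary from byte addresses to 32-bit words; the function name and the test array are made up for illustration.

```python
def sum_elements(memory, r0, x, n):
    """Trace the quiz solution: sum elements x .. x+(n-1) of a word array."""
    r0 = r0 + (x << 2)                # ADD r0, r0, r1, LSL #2
    r2 = r0 + (n << 2)                # ADD r2, r0, r2, LSL #2
    r1 = 0                            # MOV r1, #0
    while True:
        r3 = memory[r0]               # LDR r3, [r0], #4 (load, then...
        r0 += 4                       # ...post-increment the pointer)
        r1 = (r1 + r3) & 0xFFFFFFFF   # ADD r1, r1, r3
        if not r0 < r2:               # CMP r0, r2 / branch back while lower
            break
    return r1


# Hypothetical array of five words starting at address 0x1000.
array = {0x1000 + 4 * i: v for i, v in enumerate([10, 20, 30, 40, 50])}
total = sum_elements(array, 0x1000, x=1, n=3)   # 20 + 30 + 40
```

As in the assembly, the loop body runs at least once, so the model assumes n is at least 1.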

Block Data Transfer (1)

The Load and Store Multiple instructions (LDM / STM) allow between 1
and 16 registers to be transferred to or from memory.
The transferred registers can be either:
Any subset of the current bank of registers (default).
Any subset of the user mode bank of registers when in a privileged mode
(postfix instruction with a ^).
Bits    Field
31-28   Cond : Condition field
27-25   1 0 0
24      P : Pre/Post indexing bit (0 = Post; add offset after transfer, 1 = Pre; add offset before transfer)
23      U : Up/Down bit (0 = Down; subtract offset from base, 1 = Up; add offset to base)
22      S : PSR and force user bit (0 = don't load PSR or force user mode, 1 = load PSR or force user mode)
21      W : Write-back bit (0 = no write-back, 1 = write address into base)
20      L : Load/Store bit (0 = Store to memory, 1 = Load from memory)
19-16   Rn : Base register
15-0    Register list

Each bit of the register list corresponds to a particular register. For example:
Bit 0 set causes r0 to be transferred.
Bit 0 unset causes r0 not to be transferred.
At least one register must be transferred as the list cannot be empty.
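The register list is simply a 16-bit mask, which the following Python sketch makes explicit by packing a block transfer by hand. The function name and parameters are assumptions for illustration.

```python
def encode_ldm_stm(load, rn, registers, pre=False, up=True,
                   writeback=False, s_bit=False, cond=0xE):
    """Pack an LDM/STM instruction into a 32-bit word."""
    registers = list(registers)
    if not registers:
        raise ValueError("the register list cannot be empty")
    reg_list = 0
    for r in registers:
        reg_list |= 1 << r               # bit r set => register r transferred
    word = (cond << 28) | (0b100 << 25)  # bits 27-25 = 100
    word |= int(pre) << 24               # P
    word |= int(up) << 23                # U
    word |= int(s_bit) << 22             # S
    word |= int(writeback) << 21         # W
    word |= int(load) << 20              # L
    word |= (rn << 16) | reg_list
    return word
```

For instance, a load of r0-r11 from r12 with increment-after and write-back comes out as 0xE8BC0FFF, with the low 12 bits of the register list set.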

Block Data Transfer (2)

Base register used to determine where memory access should occur.
4 different addressing modes allow increment and decrement inclusive
or exclusive of the base register location.
Base register can be optionally updated following the transfer (by
appending it with an !).
Lowest register number is always transferred to/from lowest memory
location accessed.
These instructions are very efficient for
Saving and restoring context
For this it is useful to view memory as a stack.
Moving large blocks of data around memory
For this it is useful to directly represent the functionality of the
instructions.

Direct functionality of Block Data Transfer

When LDM / STM are not being used to implement stacks,
it is clearer to specify exactly what the instruction does:
i.e. specify whether to increment / decrement the base
pointer, before or after the memory access.
In order to do this, LDM / STM support a further syntax
in addition to the stack one:
STMIA / LDMIA : Increment After
STMIB / LDMIB : Increment Before
STMDA / LDMDA : Decrement After
STMDB / LDMDB : Decrement Before
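The four modes differ only in which addresses are touched and where the base ends up. A minimal Python sketch, assuming word-sized transfers and that the lowest register goes to the lowest address, is given here; the function name is hypothetical.

```python
def ldm_stm_addresses(mode, base, count):
    """Return (addresses in ascending order, written-back base value)."""
    size = 4 * count
    if mode == "IA":                 # Increment After
        start, final = base, base + size
    elif mode == "IB":               # Increment Before
        start, final = base + 4, base + size
    elif mode == "DA":               # Decrement After
        start, final = base - size + 4, base - size
    elif mode == "DB":               # Decrement Before
        start, final = base - size, base - size
    else:
        raise ValueError("mode must be IA, IB, DA or DB")
    # The lowest-numbered register uses the first (lowest) address.
    return [start + 4 * i for i in range(count)], final
```

The written-back value only reaches the base register when the instruction is written with !.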

Example: Block Copy

Copy a block of memory, which is an exact
multiple of 12 words long, from the location
pointed to by r12 to the location pointed to
by r13. r14 points to the end of the block to be copied.
; r12 points to the start of the source data
; r14 points to the end of the source data
; r13 points to the start of the dest. data
loop LDMIA r12!, {r0-r11} ; load 48 bytes
STMIA r13!, {r0-r11} ; and store them
CMP r12, r14 ; check for the end
BNE loop ; and loop until done
[Figure: memory diagram showing the r12, r13 and r14 pointers, with memory addresses increasing.]

This loop transfers 48 bytes in 31 cycles
Over 50 Mbytes/sec at 33 MHz
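The quoted rate follows directly from the figures on this slide, as this small Python check shows. The function name is hypothetical; the inputs (48 bytes, 31 cycles, 33 MHz) are the ones stated above.

```python
def copy_throughput(bytes_per_iteration, cycles_per_iteration, clock_hz):
    """Bytes copied per second for a loop with a fixed cost per iteration."""
    return bytes_per_iteration * clock_hz / cycles_per_iteration


rate = copy_throughput(48, 31, 33_000_000)   # bytes per second
rate_mbytes = rate / 1_000_000               # roughly 51.1 Mbytes/sec
```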

Swap and Swap Byte Instructions
Atomic operation of a memory read followed by a memory write which
moves byte or word quantities between registers and memory.
Syntax:
SWP{<cond>}{B} Rd, Rm, [Rn]

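[Figure: (1) the word or byte at the address held in Rn is read from memory into a temporary; (2) the contents of Rm are written to that memory location; (3) the temporary is copied into Rd.]

Thus to implement an actual swap of contents make Rd = Rm.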


The compiler cannot produce this instruction.
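The read-then-write order of SWP can be sketched in Python. This models only the data movement for the word form, not the atomicity, and the function name and memory dictionary are assumptions for illustration.

```python
def swp(memory, rn_address, rm_value):
    """Model of SWP Rd, Rm, [Rn]: returns the value that ends up in Rd."""
    temp = memory[rn_address]        # 1: read the old memory contents
    memory[rn_address] = rm_value    # 2: write Rm to the same location
    return temp                      # 3: old contents go to Rd


memory = {0x200: 0x11}
r0 = 0x22
r0 = swp(memory, 0x200, r0)          # SWP r0, r0, [r1]: Rd = Rm, a true swap
```

After the call, the register holds the old memory value and memory holds the old register value.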

Example: FIR filter

C:
for (i=0, f=0; i<N; i++)
f = f + c[i]*x[i];
Assembler
; loop initiation code
MOV r0,#0 ; use r0 for i
MOV r8,#0 ; use separate index for arrays
ADR r2,N ; get address for N
LDR r1,[r2] ; get value of N
MOV r2,#0 ; use r2 for f

FIR filter, cont'd

ADR r3,c ; load r3 with base of c


ADR r5,x ; load r5 with base of x
; loop body
loop LDR r4,[r3,r8] ; get c[i]
LDR r6,[r5,r8] ; get x[i]
MUL r4,r6,r4 ; compute c[i]*x[i] (Rd and Rm must differ on ARM7TDMI)
ADD r2,r2,r4 ; add into running sum
ADD r8,r8,#4 ; add 1 word offset to array index
ADD r0,r0,#1 ; add 1 to i
CMP r0,r1 ; exit?
BLT loop ; if i < N, continue

Example: Program01

AREA CODE ; Declare the following is CODE area.


ENTRY ; the program entry point here.
LDR R1,N ; R1 = [[PC] + offset to N]
LDR R2,POINTER ; R2 = [[PC] + offset to POINTER]
MOV R0,#0 ; SET R0 = 0
LOOP LDR R3,[R2],#4 ; R3 = [R2];R2 = [R2] + 4
ADD R0,R0,R3 ; R0 = R0 + R3
SUBS R1,R1,#1 ; R1 = R1 - 1; Set the bits in CPSR
BGT LOOP ; PC = [PC + offset to LOOP] if R1>0
STR R0,SUM ; [PC+offset to SUM] = [R0]
AREA DATA
SUM DCD 0
N DCD 5
POINTER DCD NUM1
NUM1 DCD 3,-17,27,-22,322
END
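The result that Program01 stores in SUM can be checked with a short Python trace of its loop. This is a sketch that uses list indices in place of addresses and plain Python integers in place of 32-bit registers; the function name is hypothetical.

```python
def program01(nums):
    """Trace Program01: add up the N words starting at NUM1."""
    r1 = len(nums)        # LDR R1,N
    r2 = 0                # LDR R2,POINTER (modelled as an index into nums)
    r0 = 0                # MOV R0,#0
    while True:
        r3 = nums[r2]     # LDR R3,[R2],#4
        r2 += 1           # post-increment moves to the next word
        r0 += r3          # ADD R0,R0,R3
        r1 -= 1           # SUBS R1,R1,#1
        if not r1 > 0:    # BGT LOOP
            break
    return r0             # STR R0,SUM


total = program01([3, -17, 27, -22, 322])
```

With the data declared at NUM1, the trace gives a sum of 313.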
Coprocessors
The ARM architecture supports 16 coprocessors
Each coprocessor instruction set occupies part of the ARM
instruction set.
There are three types of coprocessor instruction
Coprocessor data processing
Coprocessor (to/from ARM) register transfers
Coprocessor memory transfers (load and store to/from
memory)
Assembler macros can be used to transform custom
coprocessor mnemonics into the generic mnemonics
understood by the processor.
A coprocessor may be implemented
in hardware
in software (via the undefined instruction exception)
in both (common cases in hardware, the rest in software)
Coprocessor Data Processing
This instruction initiates a coprocessor operation
The operation is performed only on internal coprocessor
state
For example, a Floating point multiply, which multiplies
the contents of two registers and stores the result in a third
register
Syntax:
CDP{<cond>} <cp_num>,<opc_1>,CRd,CRn,CRm{,<opc_2>}

Bits    Field
31-28   Cond : Condition Code Specifier
27-24   1 1 1 0
23-20   opc_1 : Opcode
19-16   CRn : Source Register
15-12   CRd : Destination Register
11-8    cp_num
7-5     opc_2 : Opcode
4       0
3-0     CRm : Source Register

Coprocessor Register Transfers
These two instructions move data between ARM registers
and coprocessor registers
MRC : Move to Register from Coprocessor
MCR : Move to Coprocessor from Register
An operation may also be performed on the data as it is
transferred
For example a Floating Point Convert to Integer instruction
can be implemented as a register transfer to ARM that also
converts the data from floating point format to integer
format.
Syntax:
<MRC|MCR>{<cond>} <cp_num>,<opc_1>,Rd,CRn,CRm,<opc_2>

Bits    Field
31-28   Cond : Condition Code Specifier
27-24   1 1 1 0
23-21   opc_1 : Opcode
20      L : Transfer To/From Coprocessor
19-16   CRn : Coprocessor Source/Dest Register
15-12   Rd : ARM Source/Dest Register
11-8    cp_num
7-5     opc_2 : Opcode
4       1
3-0     CRm : Coprocessor Source/Dest Register
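The register transfer format can be illustrated by packing the fields by hand. In this Python sketch the function name and parameters are assumptions; the L bit is set for MRC (transfer to the ARM register) and clear for MCR.

```python
def encode_mrc_mcr(to_arm, cp_num, opc1, rd, crn, crm, opc2=0, cond=0xE):
    """Pack an MRC (to_arm=True) or MCR (to_arm=False) instruction."""
    word = (cond << 28) | (0b1110 << 24)         # bits 27-24 = 1110
    word |= (opc1 << 21) | (int(to_arm) << 20)   # opc_1 and the L bit
    word |= (crn << 16) | (rd << 12) | (cp_num << 8)
    word |= (opc2 << 5) | (1 << 4) | crm         # bit 4 = 1 marks a transfer
    return word


mrc_word = encode_mrc_mcr(True, 15, 0, 0, 1, 0)    # MRC p15,0,r0,c1,c0,0
mcr_word = encode_mrc_mcr(False, 15, 0, 0, 1, 0)   # MCR p15,0,r0,c1,c0,0
```

The two words differ only in bit 20. Bit 4 is what distinguishes these transfers from CDP, which has 0 in that position.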

Coprocessor Memory Transfers (1)

Load from memory to coprocessor registers
Store to memory from coprocessor registers.

Bits    Field
31-28   Cond : Condition Code Specifier
27-25   1 1 0
24      P : Pre/Post Increment
23      U : Add/Subtract Offset
22      N : Transfer Length
21      W : Base Register Writeback
20      L : Load/Store
19-16   Rn : Base Register
15-12   CRd : Source/Dest Register
11-8    cp_num
7-0     Offset : Address Offset

Coprocessor Memory Transfers (2)
Syntax of these is similar to word transfers between ARM and
memory:
<LDC|STC>{<cond>}{<L>} <cp_num>,CRd,<address>
PC relative offset generated if possible, else causes an error.
<LDC|STC>{<cond>}{<L>} <cp_num>,CRd,<[Rn,offset]{!}>
Pre-indexed form, with optional writeback of the base register
<LDC|STC>{<cond>}{<L>} <cp_num>,CRd,<[Rn],offset>
Post-indexed form
where
<L> when present causes a long transfer to be performed (N=1) else
causes a short transfer to be performed (N=0).
Effect of this is coprocessor dependent.

Summary

Load/Store instructions move data between memory
and internal registers
Single register transfer: LDR, STR, LDRB, STRB, LDRH,
STRH, LDRSB, LDRSH
Multi register transfer: LDM, STM
Swap Instruction Byte/Word: SWP, SWPB
Coprocessor Instruction : CDP, MRC, MCR, LDC, STC.

Thank You, Any Questions ?
