Final Instruction Set
Dr. Nisha K C R
Professor-Dept. of ECE
New Horizon College of Engineering
Dr.K C R Nisha_Professor-ECE
Thumb-2 Technology
• Thumb-2 is a superset of the Thumb instruction set, adding new 32-bit instructions for more complex operations.
• Thumb-2 code is a combination of both 16-bit and 32-bit instructions.
• The Thumb-2 technology extended the Thumb
Data Types in Cortex M4
Commonly Used Directives
Instruction Set - Cortex M4
LIST OF SYMBOLS
Moving data within the processor
• Move data from one register to another
• Move data between a register and a special register (e.g., CONTROL, PRIMASK, FAULTMASK, BASEPRI)
• Move an immediate constant into a register
Immediate Addressing
Memory Access Instructions
Optional Modifier to mention the data
Indexed Addressing Mode
Data in Memory
Illustration
Example
LDR R3, [R0, R2, LSL #2] ; Read memory [R0+(R2 << 2)] into R3
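The address arithmetic in this example can be modeled in C. This is only a sketch of the calculation, not processor code; the function name effective_address is hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the address used by LDR Rt, [Rn, Rm, LSL #shift]. */
static uint32_t effective_address(uint32_t rn, uint32_t rm, unsigned shift)
{
    assert(shift <= 3);           /* this Thumb-2 form allows LSL #0 to #3 */
    return rn + (rm << shift);    /* base plus scaled index */
}
```

With LSL #2 the index is multiplied by 4, the size of a word, so R2 behaves like an array index into a table of words that starts at R0.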
Post Indexing
LDR R0, [R1], #offset ; Read memory[R1] into R0, then R1 is updated to R1+offset
STR R0, [R1], #12 ; Store R0 to memory[R1], then R1 is updated to R1+12
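The order of events in post-indexing (access first, update the base register afterwards) can be sketched in C. The function name and the word-array memory model are assumptions made for this illustration; memory is treated as an array of words starting at address 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of LDR Rt, [Rn], #offset (post-indexed). */
static uint32_t ldr_post_index(const uint32_t *mem, uint32_t *rn, int32_t offset)
{
    assert(*rn % 4 == 0);             /* keep the model word-aligned */
    uint32_t value = mem[*rn / 4];    /* the access uses the old Rn */
    *rn += (uint32_t)offset;          /* then Rn is written back as Rn + offset */
    return value;
}
```

Calling it repeatedly with an offset of 4 walks through consecutive words, which is why post-indexing is convenient for stepping through arrays.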
Multiple load and Multiple store Instructions
• Suppose you wanted to load a subset of the registers, for example registers r0 to r3, from memory, where the data starts at address 0xBEEF0000 (held in register r9) and continues upward in memory. The instruction would simply be
LDMIA r9, {r0-r3}
• This has the same effect as four separate LDR instructions:
LDR r0, [r9]
LDR r1, [r9, #4]
LDR r2, [r9, #8]
LDR r3, [r9, #12]
• At the end of the load sequence, register r9 has not been changed and still holds the value 0xBEEF0000. If you wanted to load data into registers r0 through r3 and r12, you could simply add r12 to the end of the list, i.e.,
LDMIA r9, {r0-r3, r12}
• The lowest-numbered register is always loaded from the lowest address in memory, and the highest-numbered register from the highest address, regardless of the order in which the list is written. For example, you could say
LDMIA r9, {r5, r3, r0-r2, r14}
and register r0 will be loaded first, followed by registers r1, r2, r3, r5, and r14.
For example, if register r10 contained 0x4000,
LDMIA r10, {r0, r1, r4}
would begin by loading register r0 with data from address 0x4000. The access address is incremented by one word after each load: the second register, r1, is loaded with data from 0x4004, and register r4 is loaded with data from 0x4008. Note that the base register itself is not updated after the instruction completes. The other three suffixes indicate whether the address is changed before or after the load or store, as well as whether it is incremented or decremented, as shown in the figure.
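The register ordering and address sequence of LDMIA can be sketched in C. This is a simplified model under stated assumptions: the function name ldmia, the 16-bit register mask, and the word-array memory starting at address 0 are all hypothetical, and base write-back is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of LDMIA Rn, {reglist} without base write-back.
   Bit i of reglist set means register ri is in the list. */
static void ldmia(const uint32_t *mem, uint32_t base, uint16_t reglist,
                  uint32_t regs[16])
{
    assert(base % 4 == 0);
    uint32_t addr = base;
    for (int i = 0; i < 16; i++) {       /* lowest-numbered register first */
        if (reglist & (1u << i)) {
            regs[i] = mem[addr / 4];     /* lowest address first */
            addr += 4;                   /* "increment after" each load */
        }
    }
}
```

Because the list is a bit mask, the order in which registers are written in the source has no effect on which address each one receives.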
Stack
• PUSH and POP make it very easy to conceptually deal with stacks
(since the instruction implicitly contains the addressing mode)
• Suppose a stack starts at address 0x20000200, grows downward in memory (a full descending stack), and has two words pushed onto it with the following code:
        AREA    Example3, CODE, READONLY
        ENTRY
SRAM_BASE EQU   0x20000200
        LDR     sp, =SRAM_BASE   ; initialise the stack pointer
        LDR     r3, =0xBABEFACE
        LDR     r4, =0xDEADBEEF
        PUSH    {r3}             ; sp = 0x200001FC
        PUSH    {r4}             ; sp = 0x200001F8
        POP     {r4}             ; sp = 0x200001FC
        POP     {r3}             ; sp = 0x20000200
stop    B       stop             ; stop program
        END
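The behaviour of a full descending stack can be sketched in C. The Stack type, its size, and the function names are hypothetical, and an array index stands in for the stack pointer; the sketch only shows that PUSH decrements before storing and POP reads before incrementing.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 8u   /* hypothetical size of the modeled stack region */

typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;             /* index of the last pushed word (full stack) */
} Stack;

static void stack_init(Stack *s) { s->sp = STACK_WORDS; }  /* empty: SP at the top */

static void push(Stack *s, uint32_t value)
{
    assert(s->sp > 0);       /* overflow check, for the model only */
    s->sp -= 1;              /* descending: decrement first ... */
    s->mem[s->sp] = value;   /* ... then store, so SP points at valid data */
}

static uint32_t pop(Stack *s)
{
    assert(s->sp < STACK_WORDS);
    uint32_t value = s->mem[s->sp];   /* read the top word ... */
    s->sp += 1;                       /* ... then increment */
    return value;
}
```

Pushing 0xBABEFACE and then 0xDEADBEEF, as in the listing, means the two pops return the values in the reverse order.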
Push and Pop Instruction
Example
Recall - Memory Access Instructions
Arithmetic Operation
In this 8-bit illustration, the carry bit is set on addition when the result wraps from 255 to 0, and the carry bit is cleared on subtraction when the result wraps from 0 to 255.
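The carry rule can be checked with a small C sketch that uses the same 8-bit range as the illustration. The function names are hypothetical; on the processor itself the registers are 32 bits wide, but the rule is the same.

```c
#include <stdbool.h>
#include <stdint.h>

/* Carry after an 8-bit addition: set when the sum wraps past 255 to 0. */
static bool carry_after_add8(uint8_t a, uint8_t b)
{
    return ((unsigned)a + (unsigned)b) > 255u;
}

/* Carry after an 8-bit subtraction a - b: ARM uses carry = NOT borrow,
   so it is cleared only when the result wraps below 0. */
static bool carry_after_sub8(uint8_t a, uint8_t b)
{
    return a >= b;
}
```

For instance, 200 + 100 sets the carry, while 5 - 10 clears it.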
Arithmetic Data Operation
Generalized format
Unary Operation
Produces a result from a single input operand.
Examples: negate, complement, increment, decrement
Logical Operations
Generalized format
Shift and Rotate Operations
Data Conversion Operations
Instructions are available for handling signed and unsigned extensions of data; for
example, to convert an 8-bit value to 32-bit, or from 16-bit to 32-bit. The signed and
unsigned instructions are available in both 16-bit and 32-bit forms
• For SXTB/SXTH, the data are sign extended using bit[7]/bit[15] of Rn.
• With UXTB and UXTH, the value is zero extended to 32 bits.
• Illustration: if R0 is 0x55AA8765 (0101 0101 1010 1010 1000 0111 0110 0101):
SXTB R1, R0 ; R1 = 0x00000065 ; 0000 0000 0000 0000 0000 0000 0110 0101
SXTH R1, R0 ; R1 = 0xFFFF8765 ; 1111 1111 1111 1111 1000 0111 0110 0101
UXTB R1, R0 ; R1 = 0x00000065 ; 0000 0000 0000 0000 0000 0000 0110 0101
UXTH R1, R0 ; R1 = 0x00008765 ; 0000 0000 0000 0000 1000 0111 0110 0101
• These instructions are useful for converting between different data types.
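The four extension operations can be modeled in C as follows. This is a sketch for checking the illustration values; the function names simply mirror the mnemonics and are not a library API.

```c
#include <stdint.h>

/* Sign extension uses the XOR-and-subtract idiom so the result does not
   depend on implementation-defined signed conversions. */
static uint32_t sxtb(uint32_t rn) { return ((rn & 0xFFu) ^ 0x80u) - 0x80u; }
static uint32_t sxth(uint32_t rn) { return ((rn & 0xFFFFu) ^ 0x8000u) - 0x8000u; }

/* Zero extension just masks off the upper bits. */
static uint32_t uxtb(uint32_t rn) { return rn & 0xFFu; }
static uint32_t uxth(uint32_t rn) { return rn & 0xFFFFu; }
```

SXTB and UXTB give the same result for 0x55AA8765 because bit[7] of the low byte 0x65 is 0; they differ only when that bit is 1.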
The 32-bit form of these instructions can access high registers, and can optionally rotate the input data before the extension operation.
Reverse Operation
• These instructions are usually used for converting data between little endian and big endian.
• The 16-bit form of these instructions can only access low registers (R0 to R7).
• REV reverses the byte order in a data word, and REV16 reverses the byte order inside each half-word.
For example, if R0 is 0x12345678, executing the following:
REV R1, R0 ; R1 will be 0x78563412
REV16 R2, R0 ; R2 will be 0x34127856
• REVSH reverses the byte order of the lower half-word and then sign extends the result.
For example, if R0 is 0x33448899, running:
REVSH R1, R0 ; R1 will become 0xFFFF9988
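The three byte-reversal operations can be modeled in C to check the values above. The function names are illustrative only; rev16 models the byte reversal inside each half-word.

```c
#include <stdint.h>

/* Reverse all four bytes of a word. */
static uint32_t rev(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Swap the two bytes inside each half-word. */
static uint32_t rev16(uint32_t x)
{
    return ((x >> 8) & 0x00FF00FFu) | ((x << 8) & 0xFF00FF00u);
}

/* Swap the bytes of the lower half-word, then sign extend from bit 15. */
static uint32_t revsh(uint32_t x)
{
    uint32_t h = ((x >> 8) & 0xFFu) | ((x << 8) & 0xFF00u);
    return (h ^ 0x8000u) - 0x8000u;
}
```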
Bit field processing Instruction
To make the Cortex-M3 and Cortex-M4 processors an excellent architecture for control applications, these processors support a number of bit-field processing operations.
• BFC (Bit Field Clear) clears 1 to 32 adjacent bits in any position of a register.
The syntax of the instruction is:
BFC <Rd>, <#lsb>, <#width>
For example:
LDR R0,=0x1234FFFF ; 0001 0010 0011 0100 1111 1111 1111 1111
BFC R0, #4, #8
This will give R0 = 0x1234F00F.
• BFI (Bit Field Insert) copies 1 to 32 bits (#width) from one register to any location (#lsb) in another register. The syntax is:
BFI <Rd>, <Rn>, <#lsb>, <#width>
For example:
LDR R0,=0x12345678
LDR R1,=0x3355AACC
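The effect of BFC and BFI on a register value can be sketched in C. The helper names are hypothetical. The slide's BFI example stops after the two loads, so the lsb and width used in the note below (#8 and #16) are assumed values chosen only to exercise the model, not taken from the source.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with 'width' ones starting at bit 'lsb'. */
static uint32_t field_mask(unsigned lsb, unsigned width)
{
    assert(width >= 1 && lsb + width <= 32);
    uint32_t ones = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ones << lsb;
}

/* Model of BFC Rd, #lsb, #width: clear the field in Rd. */
static uint32_t bfc(uint32_t rd, unsigned lsb, unsigned width)
{
    return rd & ~field_mask(lsb, width);
}

/* Model of BFI Rd, Rn, #lsb, #width: copy the low 'width' bits of Rn
   into Rd starting at bit 'lsb'; the other bits of Rd are unchanged. */
static uint32_t bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width)
{
    uint32_t mask = field_mask(lsb, width);
    return (rd & ~mask) | ((rn << lsb) & mask);
}
```

With the assumed parameters, inserting the low 16 bits of 0x12345678 into 0x3355AACC at bit 8 gives 0x335678CC.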
Compare and Test
The compare and test instructions are used to update the flags in the APSR,
which may then be used by a conditional branch or conditional execution.
Program flow control
Several types of program control instructions are available:
• Branch Instructions
• Function Call Instructions
• Conditional branch
• Combined compare and conditional branch
• Conditional execution (IF-THEN instruction)
• Table branch
CONDITIONAL EXECUTION- IF Then Instructions
Different combinations of “T” and “E” sequences are possible:
• Just one conditional execution instruction: IT
• Two conditional execution instructions: ITT, ITE
• Three conditional execution instructions: ITTT, ITTE, ITET, ITEE
• Four conditional execution instructions: ITTTT, ITTTE, ITTET, ITTEE, ITETT, ITETE, ITEET, ITEEE
Example
To find Factorial
Euclid’s algorithm for computing the GCD of two positive integers (a, b) can be written as
while (a != b) {
    if (a > b) a = a - b;
    else b = b - a;
}
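Wrapped in a function, the loop above can be compiled and checked directly. This is a minimal sketch; the function name gcd_subtract and the precondition check are additions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Subtraction-based Euclid's algorithm for two positive integers. */
static uint32_t gcd_subtract(uint32_t a, uint32_t b)
{
    assert(a > 0 && b > 0);   /* the loop would never end if either input were 0 */
    while (a != b) {
        if (a > b) a = a - b;
        else       b = b - a;
    }
    return a;                 /* a == b here, and both equal the GCD */
}
```

Each pass maps naturally onto a compare followed by one conditional subtraction, which is why this form is a common exercise for conditional execution.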
Table branch instructions
The Cortex-M3 and Cortex-M4 support two table branch instructions to implement branch tables: TBB (Table Branch Byte) and TBH (Table Branch Half-word).
• TBB is used when all the entries in the branch table are organized as a byte array (offset from base address is less than 2 x 2^8 = 512 bytes).
• TBH is used when all the entries are organized as a half-word array (offset from base address is less than 2 x 2^16 = 128K bytes).
The TBB instruction has the syntax:
TBB [Rn, Rm]
where Rn stores the base address of the branch table and Rm is the branch table index.
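The factor of 2 in the offset limits comes from how the table entry is used: the entry is doubled and added to the program counter. A C sketch of that calculation follows; the function names are hypothetical, and pc stands for the PC value seen by the instruction (the address of the TBB/TBH instruction plus 4).

```c
#include <stdint.h>

/* Branch target for TBB [Rn, Rm]: table holds byte entries. */
static uint32_t tbb_target(uint32_t pc, const uint8_t *table, uint32_t index)
{
    return pc + 2u * table[index];
}

/* Branch target for TBH: table holds half-word entries. */
static uint32_t tbh_target(uint32_t pc, const uint16_t *table, uint32_t index)
{
    return pc + 2u * table[index];
}
```

The largest byte entry, 255, gives an offset of 510 bytes, which is below the 512-byte limit; the largest half-word entry, 65535, gives 131070 bytes, below 128K.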
Saturation operations
• Saturation is commonly used in signal processing. For example, after certain operations
such as amplification, the amplitude of a signal can exceed the maximum
allowed output range.
• If the value is adjusted by simply cutting off the most significant bits, the resulting signal waveform could be completely distorted, as shown in the figure.
• The saturation operation reduces the distortion by forcing the value to the maximum
allowed value.
• The distortion still exists, but if the value does not exceed the maximum range by too
much it is less noticeable.
• The Cortex-M3 processor supports two instructions that provide saturation
adjustment of signed and unsigned data.
• SSAT (for signed data) and USAT (for unsigned data).
For example, if a 32-bit signed value is to be saturated into a 16-bit signed value,
the following instruction can be used:
SSAT R1, #16, R0
For example, you can convert a 32-bit signed value to a 16-bit unsigned value
using:
• Algorithms for handling speech data, adaptive control algorithms, and routines for filtering are often sensitive to quantization effects when implemented on a microprocessor or microcontroller.
• Saturated math is one approach to limiting such effects, especially when dealing with signed data.
• For example, consider the digital waveform in Figure 7.7, possibly the output of an adaptive predictor, where the values are represented by 16-bit signed integers; the largest positive value in a register would be 0x00007FFF and the most negative value would be 0xFFFF8000.
• If this signal were scaled in some way, it’s quite possible that the largest value would overflow, effectively flipping the MSB of a value so that a positive number suddenly becomes negative, and the waveform might appear as in Figure 7.8.
• Using saturated math instructions, the signal would get clipped, and the waveform might appear as in Figure 7.9, not correcting the values but at least keeping them within limits.
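The clipping behaviour described above can be sketched in C. The functions below are a simplified model of signed and unsigned saturation to an n-bit range; the names ssat and usat mirror the mnemonics but the signatures are assumptions for illustration, and status flags are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a signed value to the signed n-bit range, n = 1..32. */
static int32_t ssat(int32_t value, unsigned n)
{
    assert(n >= 1 && n <= 32);
    int64_t max = ((int64_t)1 << (n - 1)) - 1;
    int64_t min = -((int64_t)1 << (n - 1));
    if (value > max) return (int32_t)max;   /* clip at the positive limit */
    if (value < min) return (int32_t)min;   /* clip at the negative limit */
    return value;
}

/* Saturate a signed value to the unsigned range 0 .. 2^n - 1, n = 0..31. */
static uint32_t usat(int32_t value, unsigned n)
{
    assert(n <= 31);
    int64_t max = ((int64_t)1 << n) - 1;
    if (value > max) return (uint32_t)max;
    if (value < 0)   return 0u;             /* negative inputs clip to zero */
    return (uint32_t)value;
}
```

A value such as 0x00020000 saturated to 16 signed bits becomes 0x7FFF, whereas simply keeping the low 16 bits would give 0, a large error.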
Multiply and MAC Instructions
SIMD Instructions
or double precision:
double pi = 3.1415926535897932384626433832795;
Floating point data allows the processor to handle a much wider data range (compared to integers or fixed point data) as well as very small values.
Floating point represents both very small and very large numbers using a small number of bits in computer memory.
Ex.
0.00000000005 = 0.5 x 10^-10 -> small number
50000000000 = 5 x 10^10 -> large number
Floating Point
Special registers- CONTROL Registers
Floating Point Registers
• The Cortex-M4 processor has an optional floating point unit.
• In the architecture, the FPU is viewed as a co-processor.
• To be consistent with other ARM architectures, the floating point unit is defined as Co-Processor #10 and #11 in the CPACR programmer’s model.
Floating Point Registers-Overview
• Floating point register bank
– The floating point register bank contains thirty-two 32-bit registers, which can be organized as sixteen 64-bit double-word registers
– S0 to S15 are caller saved registers
– S16 to S31 are callee saved registers.
Floating Point Registers
• S0 to S31/D0 to D15
• Each of the 32-bit registers S0 to S31 (“S” for single precision) can be accessed using floating point instructions, or accessed as a pair, using the symbols D0 to D15 (“D” for double-word/double-precision).
• For example, S1 and S0 are paired together to become D0, and S3 and S2 are paired together to become D1.
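The pairing can be pictured with a short C sketch that builds the 64-bit view of a D register from two S registers. The function name and the array model of the register bank are hypothetical; the sketch assumes the even-numbered S register forms the lower 32 bits of the pair.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit view of D<n>, built from S<2n> (low word) and S<2n+1> (high word). */
static uint64_t d_register_bits(const uint32_t s[32], unsigned n)
{
    assert(n < 16);
    return ((uint64_t)s[2 * n + 1] << 32) | s[2 * n];
}
```

For example, with S2 = 0x00000000 and S3 = 0x3FF00000, the D1 view is 0x3FF0000000000000.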
Floating Point Registers
Note: Before using any floating point instructions, the floating point unit must be enabled by programming the CPACR register.
Moving data within the processor
For the Cortex-M4 processor with the floating point unit, one can also:
• Move data between a register in the core register bank and a register in the floating point unit register bank
• Move data between registers in the floating point register bank
• Move data between a floating point system register (such as the FPSCR, the Floating Point Status and Control Register) and a core register
• Move immediate data into a floating point register
Memory Access Instructions-FPU
Floating Point Instructions
Addition of Floating point numbers
Ex.1
Ex.2
LDR r0, =0xE000ED88 ; address of CPACR
LDR r1, [r0] ; read-modify-write
ORR r1, r1, #(0xF << 20) ; Enable CP10, CP11
STR r1, [r0]
VMOV.F s0, #0x3FC00000 ; single-precision 1.5
VMOV.F s1, s0
VADD.F s2, s1, s0 ; 1.5+1.5=??
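The bit patterns in this example can be checked on a host machine with a short C sketch. It assumes the host uses IEEE 754 single precision for float, as the Cortex-M4 FPU does; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a single-precision value. */
static float bits_to_float(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reinterpret a single-precision value as its 32-bit pattern. */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under that assumption, 0x3FC00000 decodes to 1.5, and adding it to itself gives 3.0, whose bit pattern is 0x40400000.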
Floating point numbers
Ex.3
;Function: X^3 + 2X - 8
Data exchange from one memory location to another
Pointer   Address    Data        Register
R0+R5     20000000   12345678    R3
R1+R5     20000050   1511 2522   R4
GCD- Lab manual
        AREA    MYCODE, CODE, READONLY
        ENTRY
        EXPORT  START
START
        LDR     R0, =4          ; a
        LDR     R1, =2          ; b
LOOP    CMP     R0, R1
        BEQ     STOP            ; a == b: R0 holds the GCD
        BLT     LESS            ; a < b (signed compare)
        SUBS    R0, R0, R1      ; a > b: a = a - b
        B       LOOP
LESS    SUB     R1, R1, R0      ; a < b: b = b - a
        B       LOOP
STOP    B       STOP
        END
To find Factorial
Reference
1. Joseph Yiu, The Definitive Guide to ARM Cortex-M3 and Cortex-M4 Processors, 3rd Edition, Elsevier, 2014.
2. William Hohl et al., ARM Assembly Language: Fundamentals and Techniques, CRC Press.