Module 5 ARM
ARM is one of the most widely licensed, and therefore most widespread, processor cores in the world
– Used in PDAs, cell phones, multimedia players, handheld game consoles, digital TVs and cameras
– Used especially in portable devices because of its low power consumption and reasonable performance
ARM HISTORY
IP-INTELLECTUAL PROPERTY
PROCESSOR DESIGN, NOT A CHIP
ARM PARTNERSHIP MODEL
RISC DESIGN PHILOSOPHY
Instructions: a reduced number of instruction classes, each of which can execute in a single cycle. The compiler synthesizes complex operations by combining these simple instructions.
Pipelines: instructions can be decoded in one pipeline stage, so there is no need to execute microcode as in a CISC processor.
Registers: a large set of general-purpose registers. Any register can contain either an address or data.
Load-store architecture: the processor operates only on data held in registers. Separate load and store instructions transfer data between registers and external memory. In CISC, by contrast, instructions can act on memory directly.
CISC VS RISC
ORTHOGONAL INSTRUCTIONS
An instruction set is orthogonal if each choice in the building of an instruction is independent of the other choices.
If add uses a 3-address format with register addresses, so should subtract, and in neither case should there be any peculiar restrictions on the registers which may be used.
An orthogonal instruction set is easier for the assembly language programmer to learn and easier for the compiler writer to target.
The hardware implementation will usually be more efficient as well.
CISC
A single instruction can perform a complex sequence of operations over many clock cycles.
The processors were controlled by microcode ROMs (Read Only Memories) that were faster than main memory.
Microcode absorbed an unreasonable proportion of the area of a single chip, leaving little room for other performance-enhancing features.
RISC
The RISC concept was a major influence on the design of the ARM processor; indeed, RISC was the ARM's middle name (ARM originally stood for Acorn RISC Machine).
The most important instructions to optimise are those concerned with data movement, either between the processor registers and memory or from register to register.
The second most frequent are the control flow instructions, such as branches and procedure calls.
ADVANTAGES AND DISADVANTAGES OF RISC
Advantages
1. A smaller die size.
2. A shorter development time.
3. A higher performance.
Drawbacks
1. RISCs generally have poor code density compared with CISCs.
2. RISCs don't execute x86 code, although PC emulation software is available for many RISC platforms.
ARM DESIGN PHILOSOPHY
Requirements
1. Limited battery power: portable devices must keep power consumption low.
2. High code density: memory is limited by cost and physical size restrictions.
3. Reduced die area: a smaller core leaves more space for peripherals.
4. Debug technology: software engineers can view what is happening while the processor is executing code.
5. Not a pure RISC architecture, because of the constraints of its primary application, the embedded system.
DEVIATION FROM PURE RISC DEFINITIONS
EMBEDDED SYSTEM HARDWARE
ARM CLASSIFICATION
ARM1: 26-bit addressing; the first ARM processor
ARM2: 32-bit multiplier and 32-bit coprocessor support
ARM3: on-chip cache
ARM6: separate CPSR and SPSR (current/saved program status registers), abort and undefined instruction modes, MMU support
ARM7 family: introduced in 1995, with a 3-stage pipeline. The most popular is the ARM7TDMI-S (Thumb instruction set, Debug support, enhanced Multiplier, In-Circuit Emulator; the -S suffix marks the synthesizable version)
ARM9 family: introduced in 1997, with a five-stage pipeline and a Harvard architecture having separate instruction and data buses
ARM CLASSIFICATION
ARM10 family: introduced in 1999, with a six-stage pipeline and a vector floating point unit
ARM11 family: introduced in 2003, with an 8-stage pipeline and SIMD (single instruction, multiple data) extensions for media processing
EMBEDDED PROCESSORS
ARM7 FAMILY
ABSTRACTION IN HARDWARE DESIGN
1. Transistors;
2. Logic gates, memory cells, special circuits;
3. Single-bit adders, multiplexers, decoders, flip-flops;
4. Word-wide adders, multiplexers, decoders, registers, buses;
5. ALUs (Arithmetic-Logic Units), barrel shifters, register banks, memory blocks;
6. Processor, cache and memory management organizations;
7. Processors, peripheral cells, cache memories, memory management units;
8. Integrated system chips;
9. Printed circuit boards;
10. Mobile phones, PCs, engine controllers.
MU0 SIMPLE PROCESSOR
Program counter (PC): a register that holds the address of the next instruction;
Accumulator (ACC): a register that holds a data value while it is worked upon by arithmetic and logic operations;
Arithmetic-logic unit (ALU): performs a number of operations on binary operands;
Instruction register (IR): holds the current instruction while it is executed;
Instruction decode and control logic: employs the above components to achieve the desired results from each instruction.
DATA PATH DESIGN
DATA PATH OPERATIONS
ARCHITECTURAL INHERITANCE
Features Used
Load-store architecture;
ARCHITECTURAL INHERITANCE
Features Rejected
Register windows
Delayed branches
ARM PROGRAMMER MODEL
There are 18 active registers: 16 data registers and 2 program status registers. The data registers are visible to the programmer as r0 to r15.
Register r13 is used as the stack pointer (sp) and stores the head of the stack in the current processor mode.
Register r14 is called the link register (lr) and is where the core puts the return address whenever it calls a subroutine.
Register r15 is the program counter (pc) and contains the address of the next instruction to be fetched by the processor.
ARM PROGRAMMING MODEL
PROCESSOR MODES
There are seven processor modes in total: six privileged modes (abort, fast interrupt request, interrupt request, supervisor, system, and undefined) and one non-privileged mode (user).
A privileged mode allows full read-write access to the CPSR. A non-privileged mode allows only read access to the control field in the CPSR but still allows read-write access to the condition flags.
PROCESSOR MODES
Abort mode: entered when there is a failed attempt to access memory.
System mode: a special version of user mode that allows full read-write access to the CPSR.
STATUS REGISTER
The CPSR is divided into four fields, each 8 bits wide: flags, status, extension, and control.
In current designs the extension and status fields are reserved for future use.
The control field contains the processor mode, state, and interrupt mask bits.
The J bit, which can be found in the flags field, is only available on Jazelle-enabled processors, which can execute 8-bit (Java bytecode) instructions.
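The field layout can be illustrated by picking a CPSR value apart with shifts and masks. This is only a sketch: the function name `decode_cpsr` is hypothetical, and the bit positions (N, Z, C, V in bits 31 to 28, J in bit 24, I, F, T in bits 7 to 5, mode in bits 4 to 0) and mode encodings are the classic 32-bit ARM layout, assumed here rather than taken from the slide.

```python
# Mode field encodings (CPSR bits 4..0) for the seven processor modes.
MODES = {
    0b10000: "user",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "supervisor",
    0b10111: "abort",
    0b11011: "undefined",
    0b11111: "system",
}

def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its flag and control bits."""
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "J": (cpsr >> 24) & 1,  # Jazelle state (flags field)
        "I": (cpsr >> 7) & 1,   # 1 = IRQ disabled
        "F": (cpsr >> 6) & 1,   # 1 = FIQ disabled
        "T": (cpsr >> 5) & 1,   # 1 = Thumb state
        "mode": MODES.get(cpsr & 0x1F, "unknown"),
    }
```

For example, `decode_cpsr(0x600000D3)` reports Z and C set, both interrupts masked, ARM state and supervisor mode.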
STATUS REGISTER
N: Negative; the last ALU operation which changed the flags produced a negative result
(the top bit of the 32-bit result was a one).
Z: Zero; the last ALU operation which changed the flags produced a zero result (every bit
of the 32-bit result was zero).
C: Carry; the last ALU operation which changed the flags generated a carry-out, either as
a result of an arithmetic operation in the ALU or from the shifter.
V: Overflow; the last arithmetic ALU operation which changed the flags generated an overflow into the sign bit.
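How the four flags follow from one operation can be sketched with a 32-bit add. The helper `adds` below is a hypothetical model written for illustration, not ARM reference code; it assumes the flag-setting form of an addition.

```python
MASK32 = 0xFFFFFFFF

def adds(a, b):
    """Model a 32-bit flag-setting add; returns (result, flags)."""
    a &= MASK32
    b &= MASK32
    full = a + b                  # unbounded sum, used to detect carry-out
    result = full & MASK32
    flags = {
        "N": result >> 31,        # top bit of the 32-bit result
        "Z": int(result == 0),    # every bit of the result is zero
        "C": int(full > MASK32),  # carry out of bit 31
        # Overflow: both operands differ in sign from the result.
        "V": (((a ^ result) & (b ^ result)) >> 31) & 1,
    }
    return result, flags
```

Adding 1 to 0x7FFFFFFF sets N and V (signed overflow), while adding 1 to 0xFFFFFFFF sets Z and C (unsigned wrap-around).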
ARM I/O SYSTEM
The ARM handles I/O (input/output) peripherals (such as disk controllers, network interfaces, and so on) as memory-mapped devices with interrupt support.
Peripherals request attention through the normal interrupt (IRQ) or the fast interrupt (FIQ) input. Both interrupt inputs are level-sensitive and maskable.
Some systems may include direct memory access (DMA) hardware external to the processor to handle high-bandwidth I/O traffic.
ARM CROSS DEVELOPMENT TOOL
5 STAGES OF PIPELINE
1. Fetch: the instruction is fetched from memory and placed in the instruction pipeline.
2. Decode: the instruction is decoded and register operands are read from the register file. There are three operand read ports in the register file, so most ARM instructions can source all their operands in one cycle.
3. Execute: an operand is shifted and the ALU result is generated. If the instruction is a load or store, the memory address is computed in the ALU.
4. Buffer/data: data memory is accessed if required. Otherwise the ALU result is simply buffered for one clock cycle to give the same pipeline flow for all instructions.
5. Write-back: the results generated by the instruction are written back to the register file, including any data loaded from memory.
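The overlap of the five stages can be tabulated with a short sketch. It assumes an ideal pipeline with no stalls, in which instruction i enters stage s in cycle i + s; the function names are hypothetical.

```python
STAGES = ["Fetch", "Decode", "Execute", "Buffer/data", "Write-back"]

def pipeline_schedule(num_instructions):
    """Return {cycle: [(instruction index, stage name), ...]} for an ideal pipeline."""
    schedule = {}
    for i in range(num_instructions):
        for s, stage in enumerate(STAGES):
            # Instruction i occupies stage s in cycle i + s (cycles counted from 0).
            schedule.setdefault(i + s, []).append((i, stage))
    return schedule

def total_cycles(num_instructions):
    """Cycles needed to complete all instructions when nothing stalls."""
    if num_instructions == 0:
        return 0
    return len(STAGES) + num_instructions - 1
```

Four instructions therefore complete in 8 cycles rather than the 20 an unpipelined machine with the same stage time would need.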
STRUCTURAL HAZARD
Suppose a machine shares a single memory between data and instructions. As a result, when an instruction contains a data-memory reference (load), it will conflict with the instruction fetch for a later instruction (instr 3).
STRUCTURAL HAZARD
To resolve this, we stall the pipeline for one clock cycle when a data-memory access occurs. The effect of the stall is to occupy the resources for that instruction slot.
Another solution is to use separate instruction and data memories (Harvard architecture).
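The stall can be sketched by tracking when the single memory is busy with a data access. This is a simplified model under stated assumptions: the data access happens in the fourth stage (three cycles after the fetch), a delayed fetch never delays earlier instructions, and the name `fetch_cycles` is hypothetical.

```python
DATA_STAGE = 3  # stage index of Buffer/data when Fetch is stage 0

def fetch_cycles(program, shared_memory=True):
    """Return the cycle in which each instruction is fetched.

    program is a list of booleans: True if the instruction accesses data memory.
    """
    busy = set()      # cycles in which the memory is serving a data access
    cycles = []
    next_cycle = 0
    for accesses_memory in program:
        cycle = next_cycle
        if shared_memory:
            while cycle in busy:   # stall the fetch until the memory is free
                cycle += 1
        cycles.append(cycle)
        if accesses_memory:
            busy.add(cycle + DATA_STAGE)
        next_cycle = cycle + 1
    return cycles
```

With a load followed by three other instructions, the shared-memory machine fetches instr 3 one cycle late, while the Harvard-style machine (`shared_memory=False`) does not stall.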
DATA FORWARDING
Because instruction execution is spread across three pipeline stages, the only way to resolve data dependencies without stalling the pipeline is to introduce forwarding paths.
Forwarding paths allow results to be passed between stages as soon as they are available.
DATA HAZARDS
Data hazards occur when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing instructions on the unpipelined machine.
FORWARDING
The data hazards introduced by this sequence of instructions can be solved with a simple hardware technique called forwarding.
WITHOUT FORWARD
5 STAGE PIPELINE
ADDRESSING MODES
1. Immediate addressing: the desired value is presented as a binary value in the instruction.
2. Absolute addressing: the instruction contains the full binary address of the desired value in memory.
3. Indirect addressing: the instruction contains the binary address of a memory location that contains the binary address of the desired value.
4. Register addressing: the desired value is in a register, and the instruction contains the register number.
5. Register indirect addressing: the instruction contains the number of a register which contains the address of the value in memory.
6. Base plus offset addressing: the instruction specifies a register (the base) and a binary offset to be added to the base to form the memory address.
ADDRESSING MODES
7. Base plus index addressing: the instruction specifies a base register and another register (the index) which is added to the base to form the memory address.
8. Base plus scaled index addressing: as above, but the index is multiplied by a constant (usually the size of the data item, and usually a power of two) before being added to the base.
9. Stack addressing: an implicit or specified register (the stack pointer) points to an area of memory (the stack) where data items are written (pushed) or read (popped) on a last-in-first-out basis.
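Modes 5 to 8 differ only in how the memory address is formed, which a small sketch makes explicit. The function `effective_address` and the register dictionary are hypothetical names used for illustration.

```python
def effective_address(regs, base, offset=0, index=None, scale=1):
    """Form the memory address for the register-based addressing modes.

    Register indirect:      base only
    Base plus offset:       base + offset
    Base plus index:        base + index register
    Base plus scaled index: base + index register * scale
    """
    address = regs[base] + offset
    if index is not None:
        address += regs[index] * scale
    return address & 0xFFFFFFFF   # addresses wrap at 32 bits
```

With `regs = {"r1": 0x1000, "r2": 3}`, the four modes give 0x1000, 0x1004 (offset 4), 0x1003 (index r2) and 0x100C (index r2 scaled by 4).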
ARITHMETIC INSTRUCTION
Arithmetic instructions: 'ADD' is simple addition and 'ADC' is add with carry.
BIT-WISE LOGICAL OPERATIONS
Bit-wise logical operations: 'BIC' stands for 'bit clear', where every '1' in the second operand clears the corresponding bit in the first.
REGISTER MOVEMENT OPERATIONS
Register movement operations: the 'MVN' mnemonic stands for 'move negated'; it places the bit-wise inverse of the operand in the destination register.
COMPARISON
Comparison operations: these do not produce a result but only set the condition flags, for example 'compare' (CMP).
IMMEDIATE OPERANDS
ADD r3,r3,#1 ; r3 := r3 + 1
The # indicates an immediate value.
SHIFT OPERATOR
LSL: logical shift left by 0 to 31 places; fill the vacated bits at the least significant end of the word with zeros.
LSR: logical shift right by 0 to 31 places; fill the vacated bits at the most significant end of the word with zeros.
ASL: arithmetic shift left; this is a synonym for LSL.
ASR: arithmetic shift right by 0 to 31 places; sign extension used.
ROR: rotate right by 0 to 31 places; the bits which fall off the least significant end of the word are used, in order, to fill the vacated bits at the most significant end of the word.
RRX: rotate right extended by 1 place; the vacated bit (bit 31) is filled with the old value of the C flag and the operand is shifted one place to the right.
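The shift operators can be modelled on 32-bit values as follows. This is an illustrative sketch with hypothetical function names; `rrx` also returns the bit shifted out, which becomes the new carry only when the instruction sets the flags.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: zeros enter at the least significant end."""
    return (x << n) & MASK32

def lsr(x, n):
    """Logical shift right: zeros enter at the most significant end."""
    return (x & MASK32) >> n

def asr(x, n):
    """Arithmetic shift right: the sign bit is copied into vacated bits."""
    x &= MASK32
    if x & 0x80000000:
        return ((x >> n) | (MASK32 << (32 - n))) & MASK32
    return x >> n

def ror(x, n):
    """Rotate right: bits leaving bit 0 re-enter at bit 31."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rrx(x, carry):
    """Rotate right by one place through C; returns (result, bit shifted out)."""
    x &= MASK32
    return ((carry & 1) << 31) | (x >> 1), x & 1
```

For instance, `asr(0x80000000, 4)` gives 0xF8000000 while `lsr(0x80000000, 4)` gives 0x08000000, which shows the effect of sign extension.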
ROR AND RRX
CONDITION CODES
MULTIPLY
MUL r4,r3,r2 ; r4 := (r3 x r2)[31:0]
Immediate second operands are not supported.
The result register must not be the same as the first source register.
If the 'S' bit is set, the V flag is preserved (as for a logical instruction) and the C flag is rendered meaningless.
Multiplying two 32-bit integers gives a 64-bit result, the least significant 32 bits of which are placed in the result register; the rest are ignored.
MULTIPLY ACCUMULATE
MLA r4,r3,r2,r1 ; r4 := (r3 x r2) + r1
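The truncation to 32 bits in MUL and MLA can be shown with a minimal sketch (the function names `mul` and `mla` are hypothetical models, not an ARM API):

```python
MASK32 = 0xFFFFFFFF

def mul(rm, rs):
    """MUL: keep only the least significant 32 bits of the product."""
    return (rm * rs) & MASK32

def mla(rm, rs, rn):
    """MLA: multiply, add the accumulate value, then truncate to 32 bits."""
    return (rm * rs + rn) & MASK32
```

`mul(0x10000, 0x10000)` returns 0 because the only set bit of the 64-bit product lies in the ignored upper half.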
DATA TRANSFER INSTRUCTION
Multiple register load and store instructions: used to copy blocks of data around memory.
REGISTER INDIRECT ADDRESSING
PSEUDO INSTRUCTION (ADR)
A pseudo instruction looks like a normal instruction in the assembly source code but does not correspond directly to a particular ARM instruction. Here we have introduced labels (COPY, TABLE1 and TABLE2), which are simply names given to particular points in the assembly code.
SINGLE REGISTER LOAD AND STORE INSTRUCTIONS
BASE PLUS OFFSET ADDRESSING
;r1:=r1+4
;r1:=r1+4
MULTIPLE ADDRESS DATA TRANSFER
STACK ADDRESSING
The stack pointer holds the address of the current top of the stack, either by pointing to the last valid data item pushed onto the stack (full stack), or by pointing to the vacant slot where the next item will be placed (empty stack).
TYPES OF STACK
ARM supports all four forms of stack.
Full ascending: the stack grows up through increasing memory addresses and the base register points to the highest address containing a valid item.
Empty ascending: the stack grows up through increasing memory addresses and the base register points to the first empty location above the stack.
Full descending: the stack grows down through decreasing memory addresses and the base register points to the lowest address containing a valid item.
Empty descending: the stack grows down through decreasing memory addresses and the base register points to the first empty location below the stack.
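The four stack forms differ only in the direction of growth and in whether the pointer moves before or after the access. The class below is a hypothetical word-based model for illustration (4 bytes per item, memory held in a dictionary).

```python
class Stack:
    """Model of the four ARM stack types (full/empty, ascending/descending)."""

    def __init__(self, sp, full=True, descending=True):
        self.sp = sp
        self.full = full
        self.step = -4 if descending else 4
        self.memory = {}

    def push(self, value):
        if self.full:                 # full: move the pointer, then write
            self.sp += self.step
            self.memory[self.sp] = value
        else:                         # empty: write, then move the pointer
            self.memory[self.sp] = value
            self.sp += self.step

    def pop(self):
        if self.full:                 # full: read, then move the pointer back
            value = self.memory[self.sp]
            self.sp -= self.step
        else:                         # empty: move the pointer back, then read
            self.sp -= self.step
            value = self.memory[self.sp]
        return value
```

`Stack(0x1000)` behaves as a full descending stack: after two pushes the pointer is 0xFF8 and addresses the last item pushed.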
BLOCK COPY ADDRESSING
There are two views of these addressing modes, and the two are interchangeable.
BLOCK COPY ADDRESSING
BLOCK COPY ADDRESSING
Copying 8 words from the location addressed by r0 to the location addressed by r1. If registers r2 to r9 hold useful information, we can save them on the stack first.
STMFD r13!,{r2-r9} ; save regs to stack
LDMIA r0!,{r2-r9} ; load 8 words from [r0], r0 := r0 + 32
STMIA r1,{r2-r9} ; store the 8 words at [r1]
LDMFD r13!,{r2-r9} ; restore from stack
These load and store multiple register instructions can be up to four times faster than the equivalent sequence of single-register instructions.
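The LDMIA/STMIA pair in the block copy can be modelled as below. This is a sketch under assumptions: memory and registers are plain dictionaries, the register list is given in ascending order, and the function names simply mirror the mnemonics.

```python
def ldmia(memory, regs, base, reg_list, writeback=False):
    """Load consecutive words into reg_list, lowest register from lowest address."""
    address = regs[base]
    for r in reg_list:
        regs[r] = memory[address]
        address += 4               # increment after each transfer
    if writeback:                  # the '!' form updates the base register
        regs[base] = address

def stmia(memory, regs, base, reg_list, writeback=False):
    """Store reg_list to consecutive words starting at the base address."""
    address = regs[base]
    for r in reg_list:
        memory[address] = regs[r]
        address += 4
    if writeback:
        regs[base] = address
```

Calling `ldmia` with write-back on r0 and then `stmia` on r1 with the list r2 to r9 copies 8 words and leaves r0 advanced by 32 bytes.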
CONTROL FLOW INSTRUCTION
Branch instructions
B LABEL
….
LABEL ….
The jump can be backward or forward.
Conditional branches
MOV r0,#0 ; initialize counter
LOOP ….
ADD r0,r0,#1 ; increment loop counter
CMP r0,#10 ; compare with limit
BNE LOOP ; repeat if not equal
BRANCH CONDITION
B / BAL    unconditional (always)
BEQ        equal
BNE        not equal
BPL        result positive or zero
BMI        result minus (negative)
BCC / BLO  carry clear / unsigned lower
BCS / BHS  carry set / unsigned higher or same
BVC        signed operation, no overflow
BVS        signed operation, overflow set
BRANCH CONDITION
BGT        signed comparison gave greater than
BGE        signed comparison gave greater or equal
BLT        signed comparison gave less than
BLE        signed comparison gave less than or equal
BHI        unsigned comparison gave higher
BLS        unsigned comparison gave lower or same
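Each branch condition is a test on the N, Z, C and V flags. The sketch below states those tests explicitly; the flag combinations are the standard ARM definitions (not listed on the slides), and `condition_passed` is a hypothetical name.

```python
def condition_passed(cond, n, z, c, v):
    """Return True if condition mnemonic cond holds for flags N, Z, C, V."""
    table = {
        "AL": True,
        "EQ": z == 1,
        "NE": z == 0,
        "PL": n == 0,
        "MI": n == 1,
        "CC": c == 0, "LO": c == 0,
        "CS": c == 1, "HS": c == 1,
        "VC": v == 0,
        "VS": v == 1,
        "GT": z == 0 and n == v,
        "GE": n == v,
        "LT": n != v,
        "LE": z == 1 or n != v,
        "HI": c == 1 and z == 0,
        "LS": c == 0 or z == 1,
    }
    return table[cond]
```

After comparing 3 with 5 the flags are N=1, Z=0, C=0, V=0, so LT, LO and NE pass while GE and HI fail.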
BRANCH CONDITION
Conditional execution: applicable to all ARM instructions. It is invoked by adding a 2-letter condition after the opcode.
CMP r0,#5
BEQ BYPASS ; if (r0 != 5) {
ADD r1,r1,r0 ; r1 := r1 + r0 - r2
SUB r1,r1,r2 ; }
BYPASS ….
May be replaced by
CMP r0,#5
ADDNE r1,r1,r0
SUBNE r1,r1,r2
; if ((a == b) && (c == d)) e++; with a to e held in r0 to r4
CMP r0,r1
CMPEQ r2,r3
ADDEQ r4,r4,#1
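The CMP / CMPEQ / ADDEQ sequence works because the second comparison only runs, and only overwrites the Z flag, when the first one succeeded. A minimal model (hypothetical function name, tracking the Z flag only):

```python
def and_condition(a, b, c, d, e):
    """Model CMP r0,r1 / CMPEQ r2,r3 / ADDEQ r4,r4,#1 with a..e in r0..r4."""
    z = int(a == b)          # CMP r0,r1 sets Z when a == b
    if z:                    # CMPEQ executes only if Z is set
        z = int(c == d)      # and then replaces Z with the second comparison
    if z:                    # ADDEQ executes only if Z is still set
        e += 1
    return e
```

`and_condition(1, 1, 2, 2, 0)` returns 1, while a mismatch in either pair leaves e unchanged.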
BRANCH CONDITION
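Branch and link
BL SUBR ; branch to SUBR, return address saved in r14
….
SUBR …. ; subroutine entry point
MOV pc,r14 ; return
Since the return address is stored in the link register, a subroutine must not call a nested subroutine without first saving r14; otherwise r14 will be overwritten and it will not be possible to return to the original caller. In such cases it is better to push the return address onto the stack.
BL SUB1
…
SUB1 STMFD r13!, {r0-r2,r14} ; save work registers and link register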
BRANCH CONDITION
Subroutine return instruction
SUB2 …..
MOV pc,r14 ; copy r14 into r15 to return
If the return address has been pushed onto the stack:
SUB1 STMFD r13!,{r0-r2,r14} ; save work registers and link register
BL SUB2
…
LDMFD r13!,{r0-r2,pc} ; single restore-and-return instruction
BRANCH CONDITION
Supervisor calls: these operate at a privileged level and are invoked with the software interrupt (SWI) instruction.
ARM coprocessors may support other data types, including floating-point values.
THE 'LITTLE-ENDIAN' AND 'BIG-ENDIAN' TERMINOLOGY
THE 'LITTLE-ENDIAN' AND 'BIG-ENDIAN' TERMINOLOGY
PRIVILEGED MODES
SPSR
ARM has privileged operating modes which are used to handle exceptions and supervisor calls.
The SPSR register is used to save the state of the CPSR (Current Program Status Register) when the privileged mode is entered, so that the user state can be fully restored when the user process is resumed.
Some ARM processors do not support all of the above operating modes, and some also support '26-bit' modes for backwards compatibility with older ARMs.
EXCEPTIONS
Exceptions are usually used to handle unexpected events which arise during the execution of a program.
1. Exceptions generated as the direct effect of executing an instruction. Software interrupts, undefined instructions (including coprocessor instructions where the requested coprocessor is absent) and prefetch aborts (instructions that are invalid due to a memory fault occurring during fetch) come under this heading.
2. Exceptions generated as a side-effect of an instruction. Data aborts (a memory fault during a load or store data access) are in this class.
3. Exceptions generated externally, unrelated to the instruction flow. Reset, IRQ and FIQ fall into this category.
EXCEPTION ENTRY
ARM completes the current instruction as best it can (except for RESET).
Exceptions caused by a side-effect or an external event usurp the next instruction in the current sequence; direct-effect exceptions are handled in sequence as they arise.
SEQUENCE OF ACTION
• It changes to the operating mode corresponding to the particular exception.
• It saves the address of the instruction following the exception entry instruction in r14 of the new mode.
• It saves the old value of the CPSR in the SPSR of the new mode.
• It disables IRQs by setting bit 7 of the CPSR and, if the exception is a fast interrupt, disables further fast interrupts by setting bit 6 of the CPSR.
• It forces the PC to begin executing at the relevant vector address.
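The entry sequence can be sketched as updates to a small CPU state dictionary. This is an illustrative model with hypothetical names: the mode is tracked as a name rather than in CPSR bits 4 to 0, banked registers are dictionaries keyed by mode, and the vector addresses are the standard low-vector values (an assumption, since the vector table itself is not reproduced on the slides).

```python
VECTORS = {  # exception: (new mode, vector address)
    "undefined": ("undefined", 0x04),
    "swi": ("supervisor", 0x08),
    "prefetch_abort": ("abort", 0x0C),
    "data_abort": ("abort", 0x10),
    "irq": ("irq", 0x18),
    "fiq": ("fiq", 0x1C),
}
I_BIT = 1 << 7   # CPSR bit 7: IRQ disable
F_BIT = 1 << 6   # CPSR bit 6: FIQ disable

def enter_exception(cpu, exception, return_address):
    """Apply the entry steps to cpu, a dict with 'mode', 'cpsr', 'pc', 'r14', 'spsr'."""
    mode, vector = VECTORS[exception]
    old_cpsr = cpu["cpsr"]
    cpu["mode"] = mode                   # change to the exception's mode
    cpu["r14"][mode] = return_address    # save return address in banked r14
    cpu["spsr"][mode] = old_cpsr         # save old CPSR in banked SPSR
    cpu["cpsr"] |= I_BIT                 # disable IRQs
    if exception == "fiq":
        cpu["cpsr"] |= F_BIT             # disable FIQs on a fast interrupt
    cpu["pc"] = vector                   # force the PC to the vector address
    return cpu
```

Starting from user mode with CPSR 0x10, an IRQ leaves the CPSR at 0x90 and the PC at 0x18, with the old CPSR preserved in the IRQ-mode SPSR.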
VECTORED ADDRESS
The two banked registers in each of the privileged modes are used to hold the return address and a stack pointer. FIQ mode has some additional banked registers (r8 to r12) as well.
EXCEPTION RETURN
EXCEPTION RETURN
Restoring the CPSR and restoring the PC cannot be carried out as two independent steps:
If the CPSR is restored first, the banked r14 holding the return address is no longer accessible;
If the PC is restored first, the exception handler loses control of the instruction stream and cannot cause the restoration of the CPSR to take place.
METHOD 1
When the return address has been kept in the banked r14:
To return from a SWI or undefined instruction trap use: MOVS pc, r14
To return from an IRQ, FIQ or prefetch abort use: SUBS pc, r14, #4
To return from a data abort to retry the data access use: SUBS pc, r14, #8
The 'S' modifier after the opcode signifies the special form of the instruction when the destination register is the PC: the CPSR is restored from the SPSR at the same time.
IRQ and FIQ must return one instruction early in order to execute the instruction that was 'usurped' for the exception entry.
Prefetch abort must return one instruction early to execute the instruction that had caused a memory fault when first requested.
Data abort must return two instructions early to retry the data transfer instruction, which was the instruction before the one usurped for exception entry.
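The return adjustments of method 1 amount to a small lookup: zero, one or two instructions (0, 4 or 8 bytes) are subtracted from the banked r14. The sketch below uses hypothetical names and returns the PC and CPSR as a pair to reflect that both are restored together.

```python
RETURN_OFFSET = {  # bytes subtracted from the banked r14 on return
    "swi": 0,
    "undefined": 0,
    "irq": 4,
    "fiq": 4,
    "prefetch_abort": 4,
    "data_abort": 8,
}

def exception_return(exception, r14, spsr):
    """Return the (pc, cpsr) pair restored by MOVS pc, r14 or SUBS pc, r14, #n."""
    return (r14 - RETURN_OFFSET[exception]) & 0xFFFFFFFF, spsr
```

For an IRQ with r14 = 0x8008 the handler resumes at 0x8004; for a data abort with r14 = 0x8010 it resumes at 0x8008 to retry the transfer.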
METHOD 2
If the handler has copied the return address out onto a stack (mostly to allow re-entrant or recursive calls):
The ^ after the register list (which must include the PC) indicates a special form of the instruction in which the CPSR is restored at the same time as the PC is loaded from memory.
EXCEPTION PRIORITIES
CONDITION CODES
Avoid the 'never' (NV) condition: ARM Limited have indicated that they may use this encoding for other purposes in future.
Alternative mnemonics indicate that there is more than one way to interpret the condition field.
Every condition has an opposite that is also available (except 'always' and 'never').
BRANCH AND BRANCH LINK
Binary encoding
Assembler format
B{L}{<cond>} <target address>
L selects the branch and link variant.
<cond> is any one of the condition mnemonic extensions. If omitted, AL is assumed.
<target address> is normally a label in the assembler code. The assembler generates the offset.
The 24-bit offset in the instruction is sign-extended, shifted left two places and added to the PC, which holds the address of the branch instruction plus 8 bytes.
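The offset arithmetic can be checked with a short sketch. The function names are hypothetical; `encode_offset` shows the inverse calculation an assembler would perform for a label.

```python
def branch_target(instruction_address, offset24):
    """Target of B/BL at instruction_address with the given 24-bit offset field."""
    offset = offset24 & 0xFFFFFF
    if offset & 0x800000:          # sign-extend the 24-bit field
        offset -= 1 << 24
    # Shift left two places (word offset) and add to PC = address + 8.
    return (instruction_address + 8 + (offset << 2)) & 0xFFFFFFFF

def encode_offset(instruction_address, target):
    """Offset field that reaches target from a branch at instruction_address."""
    return ((target - (instruction_address + 8)) >> 2) & 0xFFFFFF
```

A branch to itself at address 0x8000 needs an offset of -2 words, so the field holds 0xFFFFFE.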
BX/BLX
Available only on ARM processors that support Thumb; used to switch the processor to executing Thumb instructions, or back to ARM instructions.
Two formats
FORMAT 1
The branch target is in register Rm. Bit [0] of Rm is copied to the T bit in the CPSR and the remaining bits are copied to the PC.
FORMAT 2
The 24-bit offset in the instruction is sign-extended, shifted left two places and added to the PC, which holds the address of the branch instruction plus 8 bytes. The H bit is added into bit 1 of the resulting address, allowing an odd half-word address to be selected as the target. The L indicates that the link register is set.
Assembler formats
1: B{L}X{<cond>} Rm
2: BLX <target address>
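Both formats can be sketched in a few lines. The function names are hypothetical; the second function assumes, as is standard for this format, that the processor always switches to Thumb state.

```python
def bx(rm):
    """Format 1: return (new pc, T bit) for a branch to the address in Rm."""
    return rm & 0xFFFFFFFE, rm & 1   # bit 0 selects Thumb, the rest is the PC

def blx_target(instruction_address, offset24, h):
    """Format 2: target address of BLX <target address>."""
    offset = offset24 & 0xFFFFFF
    if offset & 0x800000:            # sign-extend the 24-bit field
        offset -= 1 << 24
    # Word offset from PC = address + 8, plus H placed in bit 1.
    return (instruction_address + 8 + (offset << 2) + ((h & 1) << 1)) & 0xFFFFFFFF
```

`bx(0x8001)` yields PC 0x8000 with T set, and `blx_target(0x8000, 1, 1)` yields 0x800E, a half-word aligned Thumb address.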