MPMC Unit-3_Part-1

This document provides an overview of ARM architecture and its programming model, detailing components such as the Arithmetic Logic Unit, Booth multiplier, barrel shifter, control unit, and various registers. It explains the ARM design philosophy emphasizing simplicity, efficiency, and the use of a load/store architecture, as well as the role of the instruction pipeline and handling of interrupts. Additionally, it describes the function and structure of the Current Program Status Register and the significance of the instruction set and addressing modes in ARM processors.

UNIT – III: ARM ARCHITECTURE & PROGRAMMING MODEL

History, Architecture, ARM design philosophy, Registers, Program status register, Instruction
pipeline, Interrupts and vector table, ARM processor families, Instruction set: Data processing
instructions, Addressing modes, Branch, Load-Store instructions, PSR instructions, and
Conditional instructions.
ARM Architecture

• Load/store architecture
• A large array of uniform registers
• Fixed-length 32-bit instructions
• 3-address instructions
The ARM Architecture consists of
 Arithmetic Logic Unit
 Booth multiplier
 Barrel shifter
 Control unit
 Register file
The ARM processor also has other components, such as the program status register, which
contains the processor flags (Z, S, V and C; the sign flag S is the flag that ARM documentation
calls N). The mode bits also reside in the program status register, along with the interrupt and
fast interrupt disable bits. Some special registers are used as well: the instruction register,
the memory data read and write registers, and the memory address register.

Priority encoder:
The priority encoder is used by the load-multiple and store-multiple instructions to select
which register in the register file is to be loaded or stored next.
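As a rough illustration of what the priority encoder does, the Python sketch below walks a 16-bit register list (one bit per register, as in a load/store-multiple instruction) and yields the lowest-numbered pending register each time. The function name registers_in_order is hypothetical, and the code models only the selection order, not the hardware.

```python
def registers_in_order(register_list: int):
    """Yield the register numbers whose bits are set, lowest number first.

    `register_list` is assumed to be a 16-bit mask: bit n set means Rn
    takes part in the transfer.
    """
    remaining = register_list & 0xFFFF
    while remaining:
        lowest = remaining & -remaining      # isolate the lowest set bit
        yield lowest.bit_length() - 1        # convert the bit to a register number
        remaining &= remaining - 1           # clear that bit and continue


if __name__ == "__main__":
    # Mask with R1, R2, R4 and R15 selected.
    print(list(registers_in_order(0b1000_0000_0001_0110)))
```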

Multiplexers:
Several multiplexers are used to control the operation of the processor buses.
Because of the limited project time, these components were implemented as behavioral
models. Each component is described by an entity. Every entity has its own
architecture, which can be optimized for particular requirements depending on its application. This
makes the design easier to construct and maintain.

Arithmetic Logic Unit (ALU):


The ALU has two 32-bit inputs. The first comes from the register file, whereas the other
comes from the shifter. The status register flags are modified by the ALU outputs. The V-bit output
goes to the V flag and the carry out (Cout) goes to the C flag. The most significant bit of the
result represents the S flag, and the bits of the ALU output are NORed together to produce the Z flag.
The ALU has a 4-bit function bus that permits up to 16 opcodes to be implemented.
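The flag generation described above can be sketched for a single operation, 32-bit addition. This is a minimal Python model, not the design's actual implementation; the name alu_add and the dictionary layout are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def alu_add(a: int, b: int):
    """Add two 32-bit values and return (result, flags)."""
    a &= MASK32
    b &= MASK32
    total = a + b
    result = total & MASK32
    flags = {
        "S": result >> 31,                 # most significant bit of the result
        "Z": int(result == 0),             # NOR of all result bits
        "C": total >> 32,                  # carry out of bit 31
        # Overflow: operands share a sign and the result's sign differs.
        "V": ((~(a ^ b) & (a ^ result)) >> 31) & 1,
    }
    return result, flags


if __name__ == "__main__":
    print(alu_add(0x7FFFFFFF, 1))   # largest positive value plus one
    print(alu_add(0xFFFFFFFF, 1))   # wraps around to zero
```

Adding 1 to 0x7FFFFFFF sets S and V (signed overflow) without a carry, while adding 1 to 0xFFFFFFFF sets Z and C without signed overflow.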

Booth Multiplier:


The multiplier has three 32-bit inputs, which come from the register file. The
multiplier output is only the 32 least significant bits of the product. The entity
representation of the multiplier is shown in the above block diagram. The multiplication
starts whenever the start input goes active. The fin output goes high when the multiplication finishes.

Booth Algorithm:
Booth's algorithm is a well-known multiplication algorithm for 2’s complement numbers.
It treats positive and negative numbers uniformly. Moreover, runs of 0’s or 1’s within the
multiplier are skipped over without any addition or subtraction being performed, which
makes faster multiplication possible.
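The idea can be sketched in Python using radix-2 Booth recoding: the multiplier is scanned bit by bit together with the bit to its right, a subtraction is done at the start of a run of 1s, an addition at the end of the run, and nothing inside a run. The function name booth_multiply and the default width of 32 bits are assumptions for illustration; the hardware multiplier in these notes would keep only the low 32 bits of the result.

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Multiply two signed integers that fit in `bits` bits (radix-2 Booth)."""
    mask = (1 << bits) - 1
    q = multiplier & mask          # two's complement bit pattern of the multiplier
    product = 0
    prev_bit = 0                   # implicit bit to the right of bit 0
    for i in range(bits):
        bit = (q >> i) & 1
        if bit == 1 and prev_bit == 0:
            product -= multiplicand << i   # a run of 1s starts here: subtract
        elif bit == 0 and prev_bit == 1:
            product += multiplicand << i   # a run of 1s ended: add
        # pairs 00 and 11 lie inside a run, so no arithmetic is needed
        prev_bit = bit
    return product


if __name__ == "__main__":
    print(booth_multiply(7, -3))
    print(booth_multiply(-7, 3) & 0xFFFFFFFF)   # low 32 bits only
```

For a multiplier such as -3 (binary ...11111101) only three add/subtract steps are performed; the long run of leading 1s is skipped.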

Barrel Shifter:
The barrel shifter has a 32-bit input to be shifted. This input comes from the
register file, or it may be immediate data. The shifter has further control inputs that come
from the instruction register. The Shift field in the instruction controls the operation of the
barrel shifter. This field indicates the type of shift to be performed (logical left or right,
arithmetic right or rotate right). The amount by which the register should be shifted is
contained in an immediate field in the instruction, or it may be the lower 6 bits of a
register in the register file.
The shift_val input bus is 6 bits wide, permitting shifts of up to 32 bits. The shift type input
selects the required shift: 00, 01, 10 and 11 correspond to shift left, shift right, arithmetic shift
right and rotate right, respectively. The barrel shifter is built mainly from multiplexers.
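The four shift types can be modelled in a few lines of Python. This is a behavioral sketch only, using the 00/01/10/11 encoding given above; the function name barrel_shift is hypothetical, and carry-out generation is deliberately left out.

```python
MASK32 = 0xFFFFFFFF


def barrel_shift(value: int, shift_type: int, amount: int) -> int:
    """Shift a 32-bit value; shift_type uses the 2-bit encoding from the text."""
    if not 0 <= amount <= 32:
        raise ValueError("shift amount must be between 0 and 32")
    value &= MASK32
    if shift_type == 0b00:                       # logical shift left
        return (value << amount) & MASK32
    if shift_type == 0b01:                       # logical shift right
        return value >> amount
    if shift_type == 0b10:                       # arithmetic shift right
        if value & 0x80000000:
            value -= 1 << 32                     # reinterpret as a signed number
        return (value >> amount) & MASK32        # sign bit is replicated
    if shift_type == 0b11:                       # rotate right
        amount %= 32
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError("shift_type must be a 2-bit value")


if __name__ == "__main__":
    print(hex(barrel_shift(0x80000000, 0b10, 4)))   # arithmetic shift keeps the sign
    print(hex(barrel_shift(0x00000001, 0b11, 1)))   # bit 0 rotates into bit 31
```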

Control Unit:
For any microprocessor, the control unit is the heart of the design and is responsible for
system operation, so the control unit design is the most important part of the whole
design. The control unit is sometimes a purely combinational circuit. Here, the control
unit is implemented as a simple state machine. The processor timing is also included
in the control unit. Signals from the control unit are connected to every component in
the processor to supervise its operation.
ARM design philosophy:
ARM, previously Advanced RISC Machine, originally Acorn RISC Machine, is a family of
Reduced Instruction Set Computing (RISC) architectures for computer processors. The ARM
processor core is a key component of many successful 32-bit embedded systems.
The RISC design philosophy
The design philosophy aimed at delivering the following.
 simple but powerful instructions
 single cycle execution at a high clock speed
 intelligence in software rather than hardware
 greater flexibility through reduced instruction complexity
The ARM core uses RISC architecture.
The RISC philosophy is implemented with four major design rules:
1. Instructions – RISC processors have a reduced number of instruction classes. These classes
provide simple operations that can each execute in a single cycle. The compiler or programmer
synthesizes complicated operations (for example, a divide operation) by combining several simple
instructions. Each instruction is a fixed length to allow the pipeline to fetch future instructions
before decoding the current instruction. In contrast, in CISC processors the instructions are
often of variable size and take many cycles to execute.
2. Pipelines – The processing of instructions is broken down into smaller units that can be
executed in parallel by pipelines. Ideally the pipeline advances by one step on each cycle for
maximum throughput. There is no need for an instruction to be executed by a mini program
called microcode as on CISC processors.
3. Registers – RISC machines have a large general-purpose register set. Any register can contain
either data or an address. In contrast, CISC processors have dedicated registers for specific
purposes.
4. Load-store architecture – The processor operates on data held in registers. Separate load and
store instructions transfer data between the register bank and external memory. In contrast,
with a CISC design the data processing operations can act on memory directly.
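To make the first rule concrete, the Python sketch below synthesizes an unsigned 32-bit divide from nothing but shifts, compares, subtracts and ORs, the kind of simple operations a RISC instruction set provides. It uses the classic restoring (shift-and-subtract) method; the function name udiv32 is hypothetical, and a real compiler runtime routine would be written in assembly and more heavily optimized.

```python
def udiv32(dividend: int, divisor: int):
    """Return (quotient, remainder) for unsigned 32-bit operands."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    dividend &= 0xFFFFFFFF
    divisor &= 0xFFFFFFFF
    quotient = 0
    remainder = 0
    for i in range(31, -1, -1):
        # Bring down the next dividend bit (shift plus OR).
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:          # compare
            remainder -= divisor          # subtract
            quotient |= 1                 # record a 1 in the quotient
    return quotient, remainder


if __name__ == "__main__":
    print(udiv32(100, 7))
```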
The ARM Design Philosophy
There are a number of physical features that have driven the ARM processor design.
 Small size, to reduce power consumption and extend battery operation
 High code density
 Price-sensitive systems that use slow, low-cost memory devices
 Reduced die area taken up by the embedded processor
 Hardware debug technology
 The ARM core is not a pure RISC architecture

Registers:
ARM processors provide general-purpose and special-purpose registers. Some additional
registers are available in privileged execution modes. In all ARM processors, the following
registers are available and accessible in any processor mode:
 13 general-purpose registers R0-R12.
 One Stack Pointer (SP).
 One Link Register (LR).
 One Program Counter (PC).
 One Application Program Status Register (APSR).
The number of registers depends on the ARM version. According to the ARM Reference Manual,
there are 30 general-purpose 32-bit registers, with the exception of ARMv6-M and ARMv7-M
based processors. The first 16 registers are accessible in user-level mode; the additional
registers are available in privileged software execution (with the exception of ARMv6-M and
ARMv7-M). In this tutorial series we will work with the registers that are accessible in any
privilege mode: r0-r15. These 16 registers can be split into two groups: general-purpose and
special-purpose registers.

R0-R12: can be used during common operations to store temporary values, pointers (memory
locations), etc. R0, for example, can be referred to as the accumulator during arithmetic
operations, or used for storing the result of a previously called function. R7 becomes useful while
working with syscalls, as it stores the syscall number, and R11 helps us to keep track of
boundaries on the stack by serving as the frame pointer (will be covered later). Moreover, the
function calling convention on ARM specifies that the first four arguments of a function are
stored in the registers r0-r3.

R13: SP (Stack Pointer). The Stack Pointer points to the top of the stack. The stack is an area of
memory used for function-specific storage, which is reclaimed when the function returns. The
stack pointer is therefore used for allocating space on the stack, by subtracting the value (in
bytes) we want to allocate from the stack pointer. In other words, if we want to allocate a 32-bit
value, we subtract 4 from the stack pointer.
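The allocation rule can be sketched with a toy model of a descending stack in Python. The class name Stack, the starting address 0x20001000 and the use of a dictionary as memory are all assumptions made for illustration; only the "subtract to allocate, add to release" behaviour comes from the text.

```python
class Stack:
    """Toy model of a descending stack of 32-bit words."""

    def __init__(self, top: int = 0x20001000):   # hypothetical initial SP value
        self.sp = top
        self.memory = {}

    def push(self, value: int) -> None:
        self.sp -= 4                              # allocate 4 bytes for a 32-bit value
        self.memory[self.sp] = value & 0xFFFFFFFF

    def pop(self) -> int:
        value = self.memory.pop(self.sp)
        self.sp += 4                              # release the space again
        return value


if __name__ == "__main__":
    stack = Stack()
    stack.push(42)
    print(hex(stack.sp))     # four bytes below the starting address
    print(stack.pop())
```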

R14: LR (Link Register). When a function call is made, the Link Register is updated with the
memory address of the instruction that follows the call. Doing
this allows the program to return to the “parent” function that initiated the “child” function call
after the “child” function is finished.
R15: PC (Program Counter). The Program Counter is automatically incremented by the size of
the instruction executed. This size is always 4 bytes in ARM state and 2 bytes in Thumb state.
When a branch instruction is being executed, the PC holds the destination address. During
execution, PC stores the address of the current instruction plus 8 (two ARM instructions) in
ARM state, and the current instruction plus 4 (two Thumb instructions) in Thumb(v1) state.
This is different from x86, where the PC always points to the next instruction to be executed.
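The PC and LR offsets described above reduce to simple arithmetic, sketched here in Python. Both function names are hypothetical, and the link-register helper covers only a call made from ARM state, where every instruction is 4 bytes long.

```python
def pc_read_value(instruction_address: int, thumb: bool = False) -> int:
    """Value seen when the executing instruction reads the PC."""
    # Two instructions ahead: +8 in ARM state, +4 in Thumb state.
    return instruction_address + (4 if thumb else 8)


def link_register_after_call(call_address: int) -> int:
    """Return address stored in LR by a call made in ARM state."""
    return call_address + 4      # address of the instruction after the call


if __name__ == "__main__":
    print(hex(pc_read_value(0x8000)))               # ARM state
    print(hex(pc_read_value(0x8000, thumb=True)))   # Thumb state
    print(hex(link_register_after_call(0x8000)))
```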

Current Program Status Register:
The Current Program Status Register (CPSR) holds the same program status flags as the APSR,
and some additional information.
The CPSR holds:
The APSR flags.
The processor mode.
The interrupt disable flags.
The instruction set state (ARM, Thumb, ThumbEE, or Jazelle®).
The endianness state (on ARMv4T and later).
The execution state bits for the IT block (on ARMv6T2 and later).

The Current Program Status Register is a 32-bit register used in the ARM architecture to
record various pieces of information regarding the state of the program being executed by the
processor and the state of the processor. This information is recorded by setting or clearing
specific bits in the register. The top four bits (bits 31, 30, 29, and 28) are the condition code (cc)
bits and are of most interest to us. Condition code bits are sometimes referred to as "flags". The
lowest 8 bits (bit 7 through to bit 0) store information about the processor's own state. The
remaining bits (i.e. bit 27 to bit 8) are unused in early ARM processors; later architecture
versions assign some of them, such as the Q, J, GE, IT, E and A bits.
The N bit is the "negative flag" and indicates that a value is negative.
The Z bit is the "zero flag" and is set when an appropriate instruction produces a zero result.
The C bit is the "carry flag" but it can also be used to indicate "borrows" (from subtraction
operations) and "extends" (from shift instructions).
The V bit is the "overflow flag" which is set if an instruction produces a result that overflows
and hence may go beyond the range of numbers that can be represented in 2's complement
signed format.
For completeness, the other state bits are:
The I and F bits, which determine whether interrupts (such as requests for input/output) are
enabled or disabled.
The T bit, which indicates whether the processor is in "Thumb" state, where the processor can
execute a subset of the assembly language as 16-bit compact instructions. As Thumb code
packs more instructions into the same amount of memory, it is an effective solution for
applications where physical memory is at a premium.
The M4 to M0 bits are the mode bits. Application programs normally run in user mode (where
the mode bits are 10000). Whenever an interrupt or similar event occurs, the processor
switches into one of the alternative modes allowing the software handler greater privileges with
regard to memory manipulation.
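The bit layout just described can be turned into a small decoder. The Python sketch below extracts the condition flags from bits 31 to 28, the I, F and T bits from bits 7 to 5 and the mode from bits 4 to 0. The function name decode_cpsr is hypothetical, and the mode names other than User (10000) are not given in the text above; they are taken from the classic ARM mode encodings and included here as an assumption.

```python
# Classic mode encodings for M[4:0]; only User (10000) appears in the text.
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(cpsr: int) -> dict:
    """Split a 32-bit CPSR value into its flag, state and mode fields."""
    mode_bits = cpsr & 0x1F
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "I": (cpsr >> 7) & 1,     # 1 means IRQ is disabled
        "F": (cpsr >> 6) & 1,     # 1 means FIQ is disabled
        "T": (cpsr >> 5) & 1,     # 1 means Thumb state
        "mode": MODES.get(mode_bits, "unknown"),
    }


if __name__ == "__main__":
    print(decode_cpsr(0x600000D3))
    print(decode_cpsr(0x00000010))
```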

Program status register


Program Status Registers:
At any given moment, you have access to 16 registers (R0-R15) and the Current Program
Status Register (CPSR). In User mode, a restricted form of the CPSR called the Application
Program Status Register (APSR) is accessed instead.
The Current Program Status Register (CPSR) is used to store:
• The APSR flags.
• The current processor mode.
• Interrupt disable flags.
• The current processor state, that is, ARM, Thumb, ThumbEE, or Jazelle.
• The endianness.
• Execution state bits for the IT block.
The Program Status Registers (PSRs) form an additional set of banked registers. Each exception
mode has its own Saved Program Status Register (SPSR) where a copy of the pre-exception
CPSR is stored automatically when an exception occurs. The SPSRs are not accessible from User
mode. Application programmers must use the APSR to access the parts of the CPSR that can
be changed in unprivileged mode. The APSR must be used only to access the N, Z, C, V, Q, and
GE[3:0] bits. These bits are not normally accessed directly, but instead set by condition code
setting instructions and tested by instructions that are executed conditionally. For example,
the CMP R0, R1 instruction compares the values of R0 and R1 and sets the zero flag (Z) if R0
and R1 are equal.
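The effect of CMP on the flags can be sketched in Python by computing R0 - R1 and discarding the result. The function name cmp_flags is hypothetical. Note that on ARM the C flag after a subtraction means "no borrow", so it is set when R0 is greater than or equal to R1 as unsigned numbers.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(r0: int, r1: int) -> dict:
    """Flags produced by comparing r0 with r1 (r0 - r1, result discarded)."""
    r0 &= MASK32
    r1 &= MASK32
    result = (r0 - r1) & MASK32
    return {
        "N": result >> 31,
        "Z": int(result == 0),               # set when the operands are equal
        "C": int(r0 >= r1),                  # set when no borrow was needed
        # Overflow: operands differ in sign and the result's sign differs from r0.
        "V": (((r0 ^ r1) & (r0 ^ result)) >> 31) & 1,
    }


if __name__ == "__main__":
    print(cmp_flags(5, 5))    # equal operands set Z
    print(cmp_flags(3, 5))    # smaller first operand sets N and clears C
```

A conditional instruction then only needs to test these bits; for example, an "equal" condition checks Z, and an unsigned "higher or same" condition checks C.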

The individual bits represent the following:
• N – Negative result from ALU.
• Z – Zero result from ALU.
• C – ALU operation Carry out.
• V – ALU operation overflowed.
• Q – cumulative saturation (also described as sticky).
• J – indicates whether the core is in Jazelle state.
• GE[3:0] – used by some SIMD instructions.
• IT[7:2] – If-Then conditional execution of Thumb-2 instruction groups.
• E – controls load/store endianness.
• A – disables asynchronous aborts.
• I – disables IRQ.
• F – disables FIQ.
• T – indicates whether the core is in Thumb state.
• M[4:0] – specifies the processor mode (FIQ, IRQ, and so on), as described in Table.

Instruction pipeline
The process of fetching the next instruction while the current instruction is being executed is
called “pipelining”. The processor supports pipelining to increase the speed of program
execution by increasing throughput: several operations take place simultaneously, rather than
serially. The pipeline has three stages, fetch, decode and execute, as shown in the
figure.

The three stages used in the pipeline are:
(i) Fetch : In this stage the ARM processor fetches the instruction from the memory.
(ii) Decode : In this stage the processor identifies the instruction that is to be executed.
(iii) Execute : In this stage the processor processes the instruction and writes the result back
to the destination register.
If these three stages of execution are overlapped, a higher speed of execution is achieved.
Such a pipeline exists in the ARM7 processor family. Once the pipeline is filled, one
instruction completes execution in every cycle. The figure below shows three-stage pipelined
instruction execution.
 In the first cycle, the processor fetches instruction 1 from memory.
 In the second cycle, the processor fetches instruction 2 from memory and decodes
instruction 1.
 In the third cycle, the processor fetches instruction 3 from memory, decodes instruction 2
and executes instruction 1.
 In the fourth cycle, the processor fetches instruction 4, decodes instruction 3 and executes
instruction 2.
 Each instruction thus takes three cycles to pass through the pipeline, but once the pipeline
is full it delivers a throughput of one instruction per cycle.
In the case of a multi-cycle instruction, as shown in the figure below, instruction 2 (an STR
store instruction) requires 4 clock cycles and hence the pipeline stalls for one clock pulse. The
first instruction completes execution in the third clock pulse, while the second instruction,
instead of completing execution in the fourth clock pulse, completes in the fifth clock pulse.
Thereafter an instruction completes execution in every clock pulse, as seen in the figure.
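The timing described above can be reproduced with a very simplified model of the three-stage pipeline. In the Python sketch below, each instruction is given the number of cycles it spends in the execute stage (1 for a normal instruction, 2 for the STR in the example), and the function returns the cycle in which each instruction completes. The function name completion_cycles is hypothetical, and the model assumes the only source of stalls is a busy execute stage.

```python
def completion_cycles(execute_cycles):
    """Return the cycle in which each instruction finishes executing.

    execute_cycles[i] is the number of cycles instruction i+1 spends in
    the execute stage of a three-stage (fetch, decode, execute) pipeline.
    """
    done = []
    previous_end = 0
    for index, cycles in enumerate(execute_cycles):
        # Fetched in cycle index+1, decoded in index+2, so execute can
        # begin in cycle index+3 at the earliest.
        earliest_start = index + 3
        start = max(earliest_start, previous_end + 1)   # wait if execute is busy
        previous_end = start + cycles - 1
        done.append(previous_end)
    return done


if __name__ == "__main__":
    print(completion_cycles([1, 1, 1, 1]))   # no stalls
    print(completion_cycles([1, 2, 1, 1]))   # instruction 2 is the STR
```

With the STR as instruction 2, the completion cycles become 3, 5, 6 and 7: one cycle is lost to the stall, after which one instruction completes per cycle again.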

The amount of work done at each stage can be reduced by increasing the number of stages in
the pipeline. The processor can then be operated at a higher frequency, which improves
performance. However, as more cycles are required to fill the pipeline, the system
latency also increases. The data dependency between stages can also increase as the
number of pipeline stages increases, so instructions need to be scheduled when writing code to
reduce data dependencies.

Interrupts and vector table
When an exception or interrupt occurs, the processor sets the pc to a specific memory address.
The address is within a special address range called the vector table. The entries in the vector
table are instructions that branch to specific routines designed to handle a particular exception
or interrupt. The memory map address 0x00000000 is reserved for the vector table, a set of 32-
bit words. On some processors the vector table can be optionally located at a higher address in
memory (starting at the offset 0xffff0000). Operating systems such as Linux and Microsoft’s
embedded products can take advantage of this feature. When an exception or interrupt occurs,
the processor suspends normal execution and starts loading instructions from the exception
vector table. Each vector table entry contains a form of branch instruction pointing to the start
of a specific routine:
Reset vector is the location of the first instruction executed by the processor when power is
applied. This instruction branches to the initialization code.
Undefined instruction vector is used when the processor cannot decode an instruction.
Software interrupt vector is called when you execute a SWI instruction. The SWI instruction
is frequently used as the mechanism to invoke an operating system routine.
Prefetch abort vector occurs when the processor attempts to fetch an instruction from an
address without the correct access permissions. The actual abort occurs in the decode stage.
Data abort vector is similar to a pre-fetch abort but is raised when an instruction attempts to
access data memory without the correct access permissions.
Interrupt request vector is used by external hardware to interrupt the normal execution flow
of the processor. It can only be raised if IRQs are not masked in the cpsr.
Interrupt vector Table:
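As a rough sketch of how the table is laid out, the Python snippet below lists the vector offsets used by the classic ARM architecture and computes the address of an entry for either the low base (0x00000000) or the optional high base (0xffff0000) mentioned above. The offsets, the reserved slot and the fast interrupt (FIQ) entry are not given in the text of these notes; they are included here from the standard ARM exception model, and the names VECTOR_OFFSETS and vector_address are hypothetical.

```python
# Offset of each vector from the table base; each entry is one 32-bit word.
VECTOR_OFFSETS = {
    "reset": 0x00,
    "undefined_instruction": 0x04,
    "software_interrupt": 0x08,
    "prefetch_abort": 0x0C,
    "data_abort": 0x10,
    "reserved": 0x14,
    "irq": 0x18,
    "fiq": 0x1C,
}


def vector_address(exception: str, high_vectors: bool = False) -> int:
    """Address the processor sets the pc to when `exception` occurs."""
    base = 0xFFFF0000 if high_vectors else 0x00000000
    return base + VECTOR_OFFSETS[exception]


if __name__ == "__main__":
    for name in VECTOR_OFFSETS:
        print(f"{name:22s} 0x{vector_address(name):08x}")
```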
