ARM Module - 1
COURSE CO-ORDINATOR
Dr. Dileep Reddy Bolla
ASST. PROFESSOR
DEPT OF ECE
SVCE
COURSE CONTENTS /
SYLLABUS
Module-1
ARM 32-bit Microcontroller: Thumb-2 technology
The microcontroller market is vast, with more than 20 billion devices per year estimated
to be shipped in 2010. A bewildering array of vendors, devices, and architectures is
competing in this market.
The demand for higher-performance microcontrollers is driven by the industry’s changing needs; for example, microcontrollers are required to handle
more work without increasing a product’s frequency or power.
In addition, the processing needed to support communication channels and advanced peripherals is growing.
WHAT IS THE ARM CORTEX-M3
PROCESSOR?
convergence of functionalities.
The Cortex-M3 processor provides good performance at a low gate count and comes with many features previously available only in high-end
processors. The Cortex-M3 addresses the requirements for the 32-bit
embedded processor market in the following ways:
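Greater performance efficiency: allowing more work to be done without
increasing the frequency or power requirements
Low power consumption: enabling longer battery life, which is especially critical in
portable products including wireless networking applications
Enhanced determinism: guaranteeing that critical tasks and interrupts are
serviced as quickly as possible and in a known number of cycles
Improved code density: ensuring that code fits in even the smallest memory
footprints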
Ease of use: providing easier programmability and debugging for the growing
number of 8-bit and 16-bit users migrating to 32 bits.
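Lower-cost solutions: reducing 32-bit system costs close to those of legacy 8-bit
and 16-bit devices and enabling low-end, 32-bit microcontrollers to be priced at
less than US$1 for the first time
Wide choice of development tools: from low-cost or free compilers to full-featured
development suites from many development tool vendors.
Microcontrollers based on the Cortex-M3 processor already compete head-on with
devices based on a wide variety of other architectures.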
Designers are increasingly looking at reducing the system cost, as opposed to the
traditional device cost. As such, organizations are implementing device aggregation,
whereby a single, more powerful device can potentially replace three or four
traditional 8-bit devices.
Other cost savings can be achieved by improving the amount of code reuse across all systems.
Because Cortex-M3 processor-based microcontrollers can be easily programmed using the C
language and are based on a well-established architecture, application code can be ported and
reused easily, reducing development time and testing costs.
It is worthwhile highlighting that the Cortex-M3 processor is not the first ARM processor to
be used to create generic microcontrollers.
The venerable ARM7 processor has been very successful in this market, with partners such as
NXP (Philips), Texas Instruments, Atmel, OKI, and many other vendors delivering robust 32-
bit Microcontroller Units (MCUs).
The ARM7 is the most widely used 32-bit Embedded Processor in history, with over 1 billion
processors produced each year in a huge variety of electronic products, from mobile phones
to cars.
The Cortex-M3 processor builds on the success of the ARM7 processor to deliver
devices that are significantly easier to program and debug and yet deliver a higher
processing capability.
Additionally, the Cortex-M3 processor introduces a number of features and
technologies that meet the specific requirements of the microcontroller applications,
such as non-maskable interrupts for critical tasks, highly deterministic nested
vector interrupts, atomic bit manipulation, and an optional Memory Protection
Unit (MPU).
These factors make the Cortex-M3 processor attractive to existing ARM processor
users as well as many new users considering use of 32-bit MCUs in their products.
BACKGROUND OF ARM AND ARM
ARCHITECTURE
To help you understand the variations of ARM processors and architecture versions, let’s look at
a little bit of ARM history.
ARM was formed in 1990 as Advanced RISC Machines Ltd., a joint venture of Apple
Computer, Acorn Computer Group, and VLSI Technology.
In 1991, ARM introduced the ARM6 processor family, and VLSI became the initial
licensee. Subsequently, additional companies, including Texas Instruments, NEC, Sharp, and
ST Microelectronics, licensed the ARM processor designs, extending the applications of
ARM processors into mobile phones, computer hard disks, personal digital assistants
(PDAs), home entertainment systems, and many other consumer products.
ARM does not manufacture processors or sell the chips directly. Instead, ARM licenses the
processor designs to business partners, including a majority of the world’s leading
semiconductor companies.
Based on the ARM low-cost and power-efficient processor designs, these partners create their
processors, microcontrollers, and system-on-chip solutions. This business model is commonly
called intellectual property (IP) licensing.
In addition to processor designs, ARM also licenses systems-level IP and various software IPs.
To support these products, ARM has developed a strong base of development tools, hardware,
and software products to enable partners to develop their own products.
ARCHITECTURE VERSIONS
Over the years, ARM has continued to develop new processors and system blocks. These include
the popular ARM7TDMI processor and, more recently, the ARM1176JZ(F)-S processor, which
is used in high-end applications such as smart phones.
The evolution of features and enhancements to the processors over time has led to successive
versions of the ARM architecture. Note that architecture version numbers are independent
from processor names. For example, the ARM7TDMI processor is based on the ARMv4T
architecture (the T is for Thumb® instruction mode support).
The ARMv5E architecture was introduced with the ARM9E processor families and added
Digital Signal Processing (DSP) instructions for multimedia applications.
With the arrival of the ARM11 processor family, the architecture was extended to the
ARMv6. New features in this architecture included memory system features and Single
Instruction–Multiple Data (SIMD) instructions. Processors based on the ARMv6 architecture
include the ARM1136J(F)-S, the ARM1156T2(F)-S, and the ARM1176JZ(F)-S.
Following the introduction of the ARM11 family, it was decided that many of the new
technologies, such as the optimized Thumb-2 instruction set, were just as applicable to the
lower cost markets of microcontroller and automotive components.
It was also decided that although the architecture needed to be consistent from the lowest
MCU to the highest performance application processor, there was a need to deliver processor
architectures that best fit applications, enabling very deterministic and low gate count
processors for cost-sensitive markets and feature-rich and high-performance ones for
high-end applications.
Over the past several years, ARM extended its product portfolio by diversifying its CPU development,
which resulted in the architecture version 7 or v7. In this version, the architecture design is divided
into three profiles:
The A profile is designed for high-performance open application platforms.
The R profile is designed for high-end embedded systems in which real-time performance is needed.
The M profile is designed for deeply embedded microcontroller-type systems.
A Profile (ARMv7-A): Application processors designed to handle complex applications such
as high-end embedded operating systems (OSs) (e.g., Symbian, Linux, and Windows
Embedded). These processors require the highest processing power, virtual memory system
support with memory management units (MMUs), and, optionally, enhanced Java support and a
secure program execution environment. Example products include high-end mobile phones and
electronic wallets for financial transactions.
R Profile (ARMv7-R): Real-time, high-performance processors targeted primarily at the higher end
of the real-time market: those applications, such as high-end braking systems and hard drive
controllers, in which high processing power and high reliability are essential and for which low
latency is important.
M Profile (ARMv7-M): Processors targeting low-cost applications in which processing efficiency is
important and cost, power consumption, low interrupt latency, and ease of use are critical, as well
as industrial control applications, including real-time control systems.
The Cortex processor families are the first products developed on architecture v7, and the Cortex-M3
processor is based on one profile of the v7 architecture, called ARMv7-M, an architecture
specification for microcontroller products.
Other Cortex family processors include the Cortex-A8 (application processor), which is based on the
ARMv7-A profile, and the Cortex-R4 (real-time processor), which is based on the ARMv7-R profile.
The ARMv7-M architecture specification can be obtained from ARM
through a simple registration process. The ARMv7-M architecture contains the following key
areas:
Programmer’s model
Instruction set
Memory model
Debug architecture
Processor-specific information, such as interface details and timing, is documented in the
Cortex-M3 Technical Reference Manual (TRM), which is available on the ARM web site. The
Cortex-M3 TRM also covers a number of implementation details not
covered by the architecture specifications.
PROCESSOR NAMING
Traditionally, ARM used a numbering scheme to name processors. In the early days (the
1990s), suffixes were also used to indicate features on the processors. For example, with
the ARM7TDMI processor, the T indicates Thumb instruction support, D indicates JTAG
debugging, M indicates fast multiplier, and I indicates an embedded ICE module.
Subsequently, it was decided that these features should become standard features of future
ARM processors; therefore, these suffixes are no longer added to the new processor family
names. Instead, variations on memory interface, cache, and tightly coupled memory (TCM)
have created a new scheme for processor naming.
For example, processors with MPUs are
given the suffix “46” (e.g., ARM946E-S). In addition, other suffixes are added to indicate
synthesizable (S) and Jazelle (J) technology.
With version 7 of the architecture, ARM has migrated away from these complex numbering
schemes that needed to be decoded, moving to consistent naming for families
of processors, with Cortex as its initial brand. In addition to illustrating the compatibility across
processors, this system removes confusion between architectural version and processor
family number; for example, the ARM7TDMI is not a v7 processor but was based on the v4T
architecture.
INSTRUCTION SET DEVELOPMENT
Enhancement and extension of instruction sets used by the ARM processors has been one of the key
driving forces of the architecture’s evolution (see Figure 1.3). Historically (since ARM7TDMI), two
different instruction sets are supported on the ARM processor: the ARM instructions that are 32 bits
and Thumb instructions that are 16 bits. During program execution, the processor can be dynamically
switched between the ARM state and the Thumb state to use either one of the instruction sets.
The Thumb instruction set provides only a subset of the ARM instructions, but it can provide higher
code density. It is useful for products with tight memory requirements.
As the architecture version has been updated, extra instructions have been added to both ARM
instructions and Thumb instructions.
In 2003, ARM announced the Thumb-2 instruction set, which is a new superset of Thumb instructions
that contains both 16-bit and 32-bit instructions. The details of the instruction set are provided in the
ARM Architecture Reference Manual, which
has been updated for the ARMv5 architecture, the ARMv6 architecture, and the ARMv7 architecture.
THE THUMB-2 TECHNOLOGY AND
INSTRUCTION SET ARCHITECTURE
The Thumb-2 technology extended the Thumb Instruction Set Architecture (ISA) into a highly efficient
and powerful instruction set that delivers significant benefits in terms of ease of use, code size, and
performance (see Figure 1.4).
The extended instruction set in Thumb-2 is a superset of the previous 16-bit Thumb instruction set,
with additional 16-bit instructions alongside 32-bit instructions. It allows more complex operations to be
carried out in the Thumb state, which reduces the need for
switching between ARM state and Thumb state.
Focused on small memory system devices such as microcontrollers and reducing the size of the
processor, the Cortex-M3 supports only the Thumb-2 (and traditional Thumb) instruction set.
Instead of using ARM instructions for some operations, as in traditional ARM processors, it uses the
Thumb-2 instruction set for all operations. As a result, the Cortex-M3 processor is not backward
compatible with traditional ARM processors. That is, you cannot run a binary image for ARM7
processors on the Cortex-M3 processor.
Nevertheless, the Cortex-M3 processor can execute almost all the 16-bit Thumb instructions,
including all 16-bit Thumb instructions supported on ARM7 family processors, making application
porting easy.
With support for both 16-bit and 32-bit instructions in the Thumb-2 instruction set, there is no
need to switch the processor between Thumb state (16-bit instructions) and ARM state (32-bit
instructions).
For example, in ARM7 or ARM9 family processors, you might need to switch to ARM state if you
want to carry out complex calculations or a large number of conditional operations and good
performance is needed,
whereas in the Cortex-M3 processor, you can mix 32-bit instructions with 16-bit instructions
without switching state, getting high code density and high performance with no extra complexity.
The Thumb-2 instruction set is a very important feature of the ARMv7 architecture. Compared with
the instructions supported on ARM7 family processors (ARMv4T architecture), the Cortex-M3
processor instruction set has a large number of new features.
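Because 16-bit and 32-bit encodings share a single instruction stream, the length of each instruction has to be recoverable from its first halfword. The following is a minimal sketch in C, assuming the Thumb-2 encoding rule that a first halfword whose top five bits are 0b11101, 0b11110, or 0b11111 begins a 32-bit instruction and any other value is a complete 16-bit instruction. The function names are illustrative and do not come from any ARM library.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes (2 or 4) of the Thumb-2 instruction whose first
   halfword is hw. Illustrative helper, not an ARM-provided API. */
static unsigned thumb2_insn_size(uint16_t hw)
{
    unsigned top5 = (unsigned)hw >> 11;   /* bits [15:11] */
    return (top5 >= 0x1Du) ? 4u : 2u;     /* 0b11101, 0b11110, 0b11111 */
}

/* Counts the instructions in a stream of n halfwords, stepping over
   the second halfword of every 32-bit instruction. */
static size_t thumb2_count_insns(const uint16_t *code, size_t n)
{
    size_t count = 0;
    size_t i = 0;
    while (i < n) {
        i += thumb2_insn_size(code[i]) / 2u;
        count++;
    }
    return count;
}
```

No state switch appears anywhere in the walk: a 16-bit NOP (0xBF00), a 32-bit BL (0xF000 0xF800), and a 16-bit BX LR (0x4770) can simply follow one another.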
CORTEX-M3 PROCESSOR
APPLICATIONS
With its high performance and high code density and small silicon footprint, the Cortex-M3
processor is ideal for a wide variety of applications:
Low-cost microcontrollers: The Cortex-M3 processor is well suited to low-cost
microcontrollers, which are commonly used in consumer products, from toys to electrical
appliances. It is a highly competitive market due to the many well-known 8-bit and 16-bit
microcontroller products on the market. Its lower power, high performance, and ease-of-use
advantages enable embedded developers to migrate to 32-bit systems and develop products with
the ARM architecture.
Automotive: Another ideal application for the Cortex-M3 processor is in the automotive industry.
The Cortex-M3 processor has very high-performance efficiency and low interrupt latency,
allowing it to be used in real-time systems. The Cortex-M3 processor supports up to 240
external vectored interrupts, with a built-in interrupt controller with nested interrupt supports
and an optional MPU, making it ideal for highly integrated and cost-sensitive automotive
applications.
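Data communications: The processor’s low power and high efficiency, coupled with instructions
in Thumb-2 for bit-field manipulation, make the Cortex-M3 ideal for many communications
applications, such as Bluetooth and ZigBee.
Industrial control: In industrial control applications, simplicity, fast response, and reliability are key
factors. Again, the Cortex-M3 processor’s interrupt feature, low interrupt latency, and enhanced
fault-handling features make it a strong candidate in this area.
Consumer products: In many consumer products, a high-performance microprocessor (or several of them)
is used. The Cortex-M3 processor, being a small processor, is highly efficient and low in power and
supports an MPU enabling complex software to execute while providing robust memory protection.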
ARCHITECTURE OF ARM CORTEX M3
The Cortex™-M3 is a 32-bit microprocessor. It has a 32-bit data path, a 32-bit register bank, and
32-bit memory interfaces.
The processor has a Harvard architecture, which means that it has a separate instruction bus and
data bus. This allows instructions and data accesses to take place at the same time, and as a
result of this, the performance of the processor increases because data accesses do not affect
the instruction pipeline. This feature results in multiple bus interfaces on Cortex-M3, each with
optimized usage and the ability to be used simultaneously. However, the instruction and data buses
share the same memory space (a unified memory system). In other words, you cannot get 8 GB of
memory space just because you have separate bus interfaces.
For complex applications that require more memory system features, the Cortex-M3 processor has
an optional Memory Protection Unit (MPU), and it is possible to use an external cache if it’s
required. Both little endian and big endian memory systems are supported.
The Cortex-M3 processor includes a number of fixed internal debugging components. These
components provide debugging operation support and features, such as breakpoints and watchpoints.
REGISTERS
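The Cortex-M3 processor has registers R0 through R15. R13 (the stack pointer) is banked, with only
one copy of R13 visible at a time.
R0 through R12 are 32-bit general-purpose registers for data
operations. Some 16-bit Thumb® instructions can only access a subset of these registers (low
registers, R0–R7).
The Cortex-M3 contains two stack pointers (R13). They are banked so
that only one is visible at a time.
The two stack pointers are as follows:
Main Stack Pointer (MSP): the default stack pointer, used by the operating system
(OS) kernel and exception handlers
Process Stack Pointer (PSP): used by user application code
The lowest 2 bits of the stack pointers are always 0, which means they are always word aligned.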
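R14 is the link register (LR). When a subroutine is called, the return address is stored in the link
register.
R15 is the program counter (PC). It holds the current program address, and it
can be written to control the program flow.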
Special Registers
The Cortex-M3 processor also has a number of special registers. They are as follows:
Program Status registers (PSRs)
Interrupt Mask registers (PRIMASK, FAULTMASK, and BASEPRI)
Control register (CONTROL)
These registers have special functions and can be accessed only by special instructions. They
cannot be used for normal data processing.
OPERATION MODES
The Cortex-M3 processor has two modes and two privilege levels.
The operation modes (thread mode and handler mode) determine whether the processor is running a
normal program or running an exception handler like an interrupt handler or system exception
handler (see Figure 2.4).
The privilege levels (privileged level and user level) provide a mechanism for safeguarding memory
accesses to critical regions as well as providing a basic security model.
When the processor is running a main program (thread mode), it can be either in a privileged
state or a user state, but exception handlers can only be in a privileged state.
When the processor exits reset, it is in thread mode, with privileged access rights. In the
privileged state, a program has access to all memory ranges (except when prohibited by MPU
settings) and can use all supported instructions.
Software in the privileged access level can switch the program into the user access level using
the control register. When an exception takes place, the processor will always switch back to
the privileged state and return to the previous state when exiting the exception handler.
A user program cannot change back to the privileged state by writing to the control register
(see Figure 2.5). It has to go through an exception handler that programs the control register
to switch the processor back into the privileged access level when returning to thread mode.
The separation of privilege and user levels improves system reliability by preventing system
configuration registers from being accessed or changed by some untrusted programs. If an MPU is
available, it can be used in conjunction with privilege levels to protect critical memory locations, such
as programs and data for OSs.
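The mode and privilege rules above can be summarized as a small state model. This is a sketch only, assuming that bit 0 of the control register selects the user level in thread mode and that a write to the control register from the user level has no effect. The type and function names are hypothetical, and nested exceptions are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the processor state, for illustration only. */
typedef struct {
    bool     handler_mode;  /* true while an exception handler is running */
    uint32_t control;       /* bit 0 set: thread mode runs at user level   */
} cpu_state;

/* After reset: thread mode with privileged access rights. */
static cpu_state cpu_reset(void)
{
    cpu_state c = { false, 0u };
    return c;
}

/* Handlers are always privileged; thread mode depends on control bit 0. */
static bool is_privileged(const cpu_state *c)
{
    return c->handler_mode || (c->control & 1u) == 0u;
}

/* A user program cannot change the control register. */
static void write_control(cpu_state *c, uint32_t value)
{
    if (is_privileged(c))
        c->control = value;
}

static void exception_entry(cpu_state *c)  { c->handler_mode = true; }
static void exception_return(cpu_state *c) { c->handler_mode = false; }
```

The only route from the user level back to the privileged level in this model is exception_entry(), write_control(&c, 0), exception_return(), which mirrors the sequence described in the text.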
The Cortex-M3 processor includes an interrupt controller called the Nested Vectored Interrupt
Controller (NVIC). It is closely coupled to the processor core and provides a number of features as
follows:
Nested interrupt support
Vectored interrupt support
Dynamic priority changes support
Reduction of interrupt latency
Interrupt masking
Nested interrupt support: All the external
interrupts and most of the system exceptions can be programmed to different priority levels.
When an interrupt occurs, the NVIC compares the priority of this interrupt to the current
running priority level. If the priority of the new interrupt is higher than the current level, the
interrupt handler of the new interrupt will override the current running task.
Vectored interrupt support: When an
interrupt is accepted, the starting address of the interrupt service routine (ISR) is located from
a vector table in memory. There is no need to use software to determine and branch to the
starting address of the ISR. Thus, it takes less time to process the interrupt request.
Dynamic priority changes support: Priority levels of interrupts can be changed by software during
run time. Interrupts that are being serviced are blocked from further activation until the ISR is
completed, so their priority can be changed without risk of accidental reentry.
Reduction of interrupt latency: The Cortex-M3 processor also includes a number of advanced
features to lower the interrupt latency. These include automatic saving and restoring of some
register contents, reducing delay in switching from one ISR to another, and handling of late-arriving
interrupts.
Interrupt masking: Interrupts and system exceptions can be masked based on
their priority level or masked completely using the interrupt masking registers BASEPRI,
PRIMASK, and FAULTMASK. They can be used to ensure that time-critical tasks can be finished
on time without being interrupted.
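The priority comparison and the three masking registers can be combined into one decision: does a newly arrived exception preempt what is currently running? The sketch below is a simplified model, not the NVIC register interface. It assumes that a lower number means a higher priority, that NMI and hard fault have the fixed priorities -2 and -1, that PRIMASK blocks everything except NMI and hard fault, that FAULTMASK blocks everything except NMI, and that a nonzero BASEPRI blocks every priority value equal to or greater than it. Priority grouping is ignored, and all names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

#define PRIO_NMI        (-2)
#define PRIO_HARDFAULT  (-1)
#define NO_ACTIVE       INT_MAX   /* thread mode: no handler is running */

/* Hypothetical snapshot of the state that decides preemption. */
typedef struct {
    int  active_priority;  /* priority of the running handler, or NO_ACTIVE */
    bool primask;
    bool faultmask;
    int  basepri;          /* 0 disables BASEPRI masking */
} nvic_state;

/* Effective priority of the code that is currently executing. */
static int execution_priority(const nvic_state *s)
{
    int p = s->active_priority;
    if (s->basepri != 0 && s->basepri < p)
        p = s->basepri;
    if (s->primask && p > 0)
        p = 0;
    if (s->faultmask && p > PRIO_HARDFAULT)
        p = PRIO_HARDFAULT;
    return p;
}

/* True if an exception of new_priority would preempt the running code. */
static bool nvic_preempts(const nvic_state *s, int new_priority)
{
    return new_priority < execution_priority(s);
}
```

An interrupt of equal priority never preempts in this model, which is why a handler cannot be reentered by its own interrupt source.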
The Cortex-M3 has a predefined memory map. This allows the built-in peripherals, such as the
interrupt controller and the debug components, to be accessed by simple memory access
instructions. Thus, most system features are accessible in C program code.
The predefined memory map also allows the Cortex-M3 processor to be highly optimized for speed
and ease of integration in system-on-a-chip (SoC) designs.
Overall, the 4 GB memory space can be divided into ranges as shown in Figure. The Cortex-M3 design
has an internal bus infrastructure optimized for this memory usage. In addition, the design allows
these regions to be used differently. For example, data memory can still be put into the CODE
region, and program code can be executed from an external Random Access Memory (RAM)
region.
The system-level memory region contains the interrupt controller and the debug components.
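Because the map is fixed, the region an address belongs to can be found with a few comparisons. The sketch below assumes the region boundaries of the standard Cortex-M3 memory map as published by ARM (the figure itself is not reproduced in these notes); the enum and function names are illustrative.

```c
#include <stdint.h>

typedef enum {
    REGION_CODE,             /* 0x00000000 - 0x1FFFFFFF */
    REGION_SRAM,             /* 0x20000000 - 0x3FFFFFFF */
    REGION_PERIPHERAL,       /* 0x40000000 - 0x5FFFFFFF */
    REGION_EXTERNAL_RAM,     /* 0x60000000 - 0x9FFFFFFF */
    REGION_EXTERNAL_DEVICE,  /* 0xA0000000 - 0xDFFFFFFF */
    REGION_SYSTEM            /* 0xE0000000 - 0xFFFFFFFF */
} mem_region;

/* Classifies a 32-bit address into one of the predefined regions. */
static mem_region region_of(uint32_t addr)
{
    if (addr < 0x20000000u) return REGION_CODE;
    if (addr < 0x40000000u) return REGION_SRAM;
    if (addr < 0x60000000u) return REGION_PERIPHERAL;
    if (addr < 0xA0000000u) return REGION_EXTERNAL_RAM;
    if (addr < 0xE0000000u) return REGION_EXTERNAL_DEVICE;
    return REGION_SYSTEM;
}
```

For example, an interrupt controller register at 0xE000E100 falls in the system-level region, so ordinary load and store instructions reach it like any other memory location.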
There are several bus interfaces on the Cortex-M3 processor.
They allow the Cortex-M3 to carry out instruction fetches and data accesses at the same time. The main
bus interfaces are as follows:
The code memory region access is carried out on the code memory buses, which physically consist of
two buses, one called I-Code and the other called D-Code. These are optimized for instruction fetches
for best instruction execution speed.
The system bus is used to access memory and peripherals. This provides access to the Static
Random Access Memory (SRAM), peripherals, external RAM, external devices, and part of the
system level memory regions.
The private peripheral bus provides access to a part of the system-level memory dedicated to
private peripherals, such as debugging components.
The Cortex-M3 has an optional MPU. This unit allows access rules to be set up for privileged
access and user program access. When an access rule is violated, a fault exception is
generated, and the fault exception handler will be able to analyze the problem and correct it,
if possible.
The MPU can be used in various ways. In common scenarios, the OS can set up the MPU to
protect data used by the OS kernel and other privileged processes from
untrusted user programs.
The MPU can also be used to make memory regions read-only, to prevent accidental erasing of
data or to isolate memory regions between different tasks in a multitasking system. Overall, it
can help make embedded systems more robust and reliable.
The MPU feature is optional and is determined during the implementation stage of the
microcontroller or SoC design.
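The idea of separate access rules for privileged and user code can be sketched as a lookup over a region table. This is a simplified software model and not the MPU register layout: the structure, the three permission levels, and the rule that an address outside every region is usable by privileged code only are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ACCESS_NONE, ACCESS_READ_ONLY, ACCESS_READ_WRITE } access_t;

/* Hypothetical region descriptor, not the hardware register format. */
typedef struct {
    uint32_t base;
    uint32_t size;        /* in bytes */
    access_t privileged;  /* rule for privileged accesses */
    access_t user;        /* rule for user-level accesses */
} mpu_region;

static bool permits(access_t rule, bool is_write)
{
    return is_write ? (rule == ACCESS_READ_WRITE) : (rule != ACCESS_NONE);
}

/* Returns true if the access is allowed and false if it would raise a
   fault exception. When regions overlap, the higher-numbered one wins. */
static bool mpu_check(const mpu_region *regions, size_t count,
                      uint32_t addr, bool privileged, bool is_write)
{
    for (size_t i = count; i-- > 0; ) {
        const mpu_region *r = &regions[i];
        if (addr >= r->base && addr - r->base < r->size)
            return permits(privileged ? r->privileged : r->user, is_write);
    }
    return privileged;  /* assumption: unmapped addresses are privileged-only */
}
```

A typical table would mark program memory read-only for everyone, kernel data read-write for privileged code only, and task data read-write for both levels.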
The Cortex-M3 supports the Thumb-2 instruction set. This is one of the most important
features of the Cortex-M3 processor because it allows 32-bit instructions and 16-bit
instructions to be used together for high code density and high efficiency. It is flexible and
powerful yet easy to use.
In previous ARM processors, the central processing unit (CPU) had two operation states: a 32-
bit ARM state and a 16-bit Thumb state. In the ARM state, the instructions are 32 bits and
can execute all supported instructions with very high performance. In the Thumb state, the
instructions are 16 bits, so there is a much higher instruction code density, but the Thumb
state does not have all the functionality of ARM instructions and may require more instructions
to complete certain types of operations.
To get the best of both worlds, many applications have mixed ARM and Thumb codes. However,
the mixed-code arrangement does not always work best.
There is overhead (in terms of both execution time and instruction space, see Figure) to switch
between the states, and ARM and Thumb codes might need to be compiled separately in
different files. This increases the complexity of software development and reduces maximum
efficiency of the CPU core.
With the introduction of the Thumb-2 instruction set, it is now possible to handle all processing
requirements in one operation state. There is no need to switch between the two. In fact, the
Cortex-M3 does not support the ARM code. Even interrupts are now handled with the Thumb
state. (Previously, the ARM core entered interrupt handlers in the ARM state.) Since there is
no need to switch between states, the Cortex-M3 processor has a number of advantages over
traditional ARM processors, such as:
No state switching overhead, saving both execution time and instruction space
No need to separate ARM code and Thumb code source files, making software development and
maintenance easier.
It’s easier to get the best efficiency and performance, in turn making it easier to write
software, because there is no need to worry about switching code between ARM and Thumb to
try to get the best density/performance
The Cortex-M3 processor has a number of interesting and powerful instructions. Here are a
few examples:
MSR and MRS: move to special register from
general-purpose register and move special register to general-purpose register; for access to
the special registers
Since the Cortex-M3 processor supports the Thumb-2 instruction set only, existing program
code for ARM needs to be ported to the new architecture. Most C applications simply need to
be recompiled using new compilers that support the Cortex-M3. Some assembler codes need
modification and porting to use the new architecture and the new unified assembler framework.
The Cortex-M3 processor implements a new exception model, introduced in the ARMv7-M
architecture. This exception model differs from the traditional ARM exception model, enabling
very efficient exception handling.
It has a number of system exceptions plus a number of external Interrupt Requests (IRQs)
(external interrupt inputs). There is no fast interrupt (FIQ) (fast interrupt in ARM7/ARM9/
ARM10/ARM11) in the Cortex-M3; however, interrupt priority handling and nested interrupt
support are now included in the interrupt architecture. Therefore, it is easy to set up a system
that supports nested interrupts (a higher-priority interrupt can override or preempt a lower-
priority interrupt handler) and that behaves just like the FIQ in traditional ARM processors.
The interrupt features in the Cortex-M3 are implemented in the NVIC. Aside from supporting
external interrupts, the Cortex-M3 also supports a number of internal exception sources, such
as system fault handling. As a result, the Cortex-M3 has a number of predefined exception
types, as shown in Table
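Since every exception type has a number, the location of its vector follows from simple arithmetic. The sketch below assumes the usual ARMv7-M layout: each vector is one 32-bit word, the vector for exception number n sits at table base + 4 × n, and external interrupt m has exception number 16 + m. The function names are illustrative.

```c
#include <stdint.h>

#define FIRST_EXTERNAL_IRQ 16u  /* exception number of external interrupt 0 */

/* Address of the vector table entry for a given exception number. */
static uint32_t vector_address(uint32_t table_base, uint32_t exception_number)
{
    return table_base + 4u * exception_number;
}

/* Address of the vector table entry for external interrupt number irq. */
static uint32_t irq_vector_address(uint32_t table_base, uint32_t irq)
{
    return vector_address(table_base, FIRST_EXTERNAL_IRQ + irq);
}
```

With the table at address 0, the reset vector (exception 1) is at 0x00000004 and the vector for external interrupt 0 is at 0x00000040. The processor reads the handler address from that word directly, so no software dispatch is needed.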
The Cortex-M3 processor is designed with various features to allow designers to develop
low-power, energy-efficient products.
First, it supports sleep mode and deep sleep mode, which can work with various
system-design methodologies to reduce power consumption during idle periods.
Second, its low gate count and design techniques reduce circuit activities in the processor to
allow active power to be reduced.
In addition, since Cortex-M3 has high code density, it has lowered the program size
requirement. At the same time, it allows processing tasks to be completed in a short time, so
that the processor can return to sleep modes as soon as possible to cut down energy use.
As a result, the energy efficiency of Cortex-M3 is better than many 8-bit or 16-bit
microcontrollers.
Starting from Cortex-M3 revision 2, a new feature called Wakeup Interrupt Controller (WIC) is
available.
This feature allows the whole processor core to be powered down, while processor states are
retained and the processor can be returned to active state almost immediately when an
interrupt takes place. This makes the Cortex-M3 even more suitable for many ultra-low power
applications that previously could only be implemented with 8-bit or 16-bit microcontrollers.
DEBUGGING SUPPORT
The Cortex-M3 processor includes a number of debugging features, such as program execution
controls, including halting and stepping, instruction breakpoints, data watchpoints, register and
memory accesses, profiling, and traces. The debugging hardware of the Cortex-M3 processor is
based on the CoreSight™ architecture. Unlike traditional ARM processors, the CPU core itself
does not have a Joint Test Action Group (JTAG) interface. Instead, a debug interface module is
decoupled from the core, and a bus interface called the Debug Access Port (DAP) is provided at
the core level. Through this bus interface, external debuggers can access control registers to
debug hardware as well as system memory, even when the processor is running.
The control of this bus interface is carried out by a Debug Port (DP) device. The DPs currently
available are the Serial-Wire JTAG Debug Port (SWJ-DP) (supports the traditional JTAG
protocol as well as the Serial-Wire protocol) or the SW-DP (supports the Serial-Wire protocol
only). A JTAG-DP module from the ARM CoreSight product family can also be used. Chip
manufacturers can choose to attach one of these DP modules to provide the debug interface.
Chip manufacturers can also include an Embedded Trace Macrocell (ETM) to allow instruction
trace. Trace information is output via the Trace Port Interface Unit (TPIU), and the debug host
(usually a Personal Computer [PC]) can then collect the executed instruction information via
external trace-capturing hardware. Within the Cortex-M3 processor, a number of events can be
used to trigger debug actions. Debug events can be breakpoints, watchpoints, fault conditions,
or external debugging request input signals. When a debug event takes place, the Cortex-M3
processor can either enter halt mode or execute the debug monitor exception handler. The data
watchpoint function is provided by a Data Watchpoint and Trace (DWT) unit in the Cortex-M3
processor. This can be used to stop the processor (or trigger the debug monitor exception
routine) or to generate data trace information. When data trace is used, the traced data can be
output via the TPIU. (In the CoreSight architecture, multiple trace devices can share one single
trace port.) In addition to these basic debugging features, the Cortex-M3 processor also
provides a Flash Patch and Breakpoint (FPB) unit that can provide a simple breakpoint function or
remap an instruction access from Flash to a different location in SRAM.
An Instrumentation Trace Macrocell (ITM) provides a new way for developers to output data to a
debugger. By writing data to register memory in the ITM, a debugger can collect the data via a
trace interface and display or process them. This method is easy to use and faster than JTAG
output. All these debugging components are controlled via the DAP interface bus on the Cortex-M3
or by a program running on the processor core, and all trace information is accessible from the
TPIU.
REGISTERS
we’ve seen, the Cortex™-M3 processor has registers R0 through R15 and a number of special
registers. R0 through R12 are general purpose, but some of the 16-bit Thumb® instructions can
only access R0 through R7 (low registers), whereas 32-bit Thumb-2 instructions can access
all these registers. Special registers have predefined functions and can only be accessed by
special register access instructions.
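• General-purpose registers R0 through R7: These are also called low registers. They can be
accessed by all 16-bit Thumb instructions and all 32-bit Thumb-2 instructions. They are all
32 bits; the reset value is unpredictable.
• General-purpose registers R8 through R12: These are also called high registers. They are
accessible by all Thumb-2 instructions but not by all 16-bit Thumb instructions. These
registers are all 32 bits; the reset value is unpredictable.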
Stack Pointer R13
R13 is the stack pointer (SP). In the Cortex-M3 processor, there are two SPs. This duality allows
two separate stack memories to be set up. When using the register name R13, you can only access
the current SP; the other one is inaccessible unless you use special instructions to move to
special register from general-purpose register (MSR) and move special register to general-
purpose register (MRS). The two SPs are as follows:
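• Main Stack Pointer (MSP): This is the default SP after power-up. It is used by the operating
system (OS) kernel, exception handlers, and all application code that requires privileged
access.
• Process Stack Pointer (PSP): This is used by base-level application code (when not running
an exception handler).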
In the Cortex-M3, the instructions for accessing stack memory are PUSH and POP. The Cortex-M3
uses a full-descending stack arrangement, so the SP decrements when new data is stored in the
stack. PUSH and POP are usually used to save register contents to
stack memory at the start of a subroutine and then restore the registers from stack at the
end of the subroutine. You can PUSH or POP multiple registers in one instruction:
subroutine_1
PUSH {R0-R7, R12, R14} ; Save registers
... ; Do your processing
POP {R0-R7, R12, R14} ; Restore registers
BX R14 ; Return to calling function
Instead of using R13, you can use SP (for stack pointer) in your program code. It means the
same thing. Inside program code, both the MSP and the PSP can be called R13/SP. However, you can
access a particular one using special register access instructions (MRS/MSR).
The MSP, also called SP_main in ARM documentation, is the default SP after power-up; it is used
by kernel code and exception handlers. The PSP, or SP_process in ARM documentation, is typically
used by thread processes in systems running an embedded OS.
Because register PUSH and POP operations are always word aligned (their addresses must be
0x0, 0x4, 0x8, ...), the SP/R13 bit 0 and bit 1 are hardwired to 0 and always read as zero
(RAZ).
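The word-alignment rule above can be illustrated with a small host-side C model. This is only a sketch: the function name `sp_after_write` is made up for illustration, and the code runs on a PC rather than on the Cortex-M3 itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the value the SP holds after software writes `value`.
   Bits [1:0] are hardwired to zero, so they are simply masked off. */
uint32_t sp_after_write(uint32_t value)
{
    uint32_t sp = value & ~UINT32_C(3);
    assert((sp & 3u) == 0);   /* result is always word aligned */
    return sp;
}
```

For instance, under this model a write of 0x20001003 would leave 0x20001000 in the SP.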
Link Register R14
R14 is the link register (LR). Inside an assembly program, you can write it as either R14 or LR.
LR is used to store the return program counter (PC) when a subroutine or function is called,
for example, when you’re using the branch and link (BL) instruction:
main ; Main program
...
BL function1 ; Call function1 using Branch with Link instruction.
; PC = function1 and
; LR = the next instruction in main
...
function1
... ; Program code for function 1
BX LR ; Return
Despite the fact that bit 0 of the PC is always 0 (because instructions are word aligned or half
word aligned), the LR bit 0 is readable and writable. This is because in the Thumb instruction set,
bit 0 is often used to indicate ARM/Thumb states. To allow the Thumb-2 program for the Cortex-
M3 to work with other ARM processors that support the Thumb-2 technology, this least
significant bit (LSB) is writable and readable.
Program Counter R15
R15 is the program counter (PC). Because of the pipelined nature of the Cortex-M3 processor,
when you read this register, you will find that the value is different from the location of the
executing instruction, normally by 4. For example:
0x1000 : MOV R0, PC ; R0 = 0x1004
In other instructions like literal load (reading of a memory location related to current PC value),
the effective value of PC might not be instruction address plus 4 due to alignment in
address calculation. But the PC value is still at least 2 bytes ahead of the instruction
address during execution.
Writing to the PC will cause a branch (but the LR is not updated). Because an instruction
address must be half word aligned, the LSB (bit 0) of the PC read value is always 0.
However, in branching, either by writing to PC or using branch instructions, the LSB of the
target address should be set to 1 because it is used to indicate the Thumb state
operations. If it is 0, it can imply trying to switch to the ARM state and will result in a
fault exception in the Cortex-M3.
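The PC rules above can be summarised in a short host-side C sketch. The function names are hypothetical and the model covers only the simple cases described here (a plain PC read such as MOV R0, PC, and a branch target held in a register); it is not a full description of the processor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value seen when the instruction at instr_addr reads the PC (simple case). */
uint32_t pc_read_value(uint32_t instr_addr)
{
    assert((instr_addr & 1u) == 0);   /* instructions are halfword aligned */
    return instr_addr + 4u;
}

/* A branch target written to the PC should have bit 0 set (Thumb state).
   A target with bit 0 clear would lead to a fault on the Cortex-M3. */
bool branch_target_is_thumb(uint32_t target)
{
    return (target & 1u) != 0;
}

/* Address the core actually fetches from: bit 0 is a state flag only. */
uint32_t branch_fetch_address(uint32_t target)
{
    return target & ~UINT32_C(1);
}
```

With these helpers, `pc_read_value(0x1000)` gives 0x1004, matching the MOV R0, PC example.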
SPECIAL REGISTERS
• The special registers in the Cortex-M3 processor include the following:
• Program Status registers (PSRs)
• Interrupt Mask registers (PRIMASK, FAULTMASK, and BASEPRI)
• Control register (CONTROL)
• Special registers can only be accessed via MSR and MRS instructions; they do not have memory
addresses:
• MRS <reg>, <special_reg>; Read special register
• MSR <special_reg>, <reg>; Write to special register
• The PSRs are subdivided into three status registers:
• Application Program Status register (APSR)
• Interrupt Program Status register (IPSR)
• Execution Program Status register (EPSR)
• The three PSRs can be accessed together or separately using the special register access
instructions MSR and MRS. The PSRs can be read using the MRS instruction. You can also change
the APSR using the MSR instruction, but EPSR and IPSR are read-only.
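• In ARM assembler, when accessing xPSR (all three PSRs as one), the symbol PSR is used:
• MRS R0, PSR ; Read the combined program status word
• MSR PSR, R0 ; Write the combined program status word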
• The descriptions for the bit fields in PSR are shown in Table 3.1.
• If you compare this with the Current Program Status register (CPSR) in ARM7, you might find
that some bit fields that were used in ARM7 are gone. The Mode (M) bit field is gone because the
Cortex-M3 does not have the operation mode as defined in ARM7. Thumb-bit (T) is moved to
bit 24. Interrupt status (I and F) bits are replaced by the new interrupt mask registers
(PRIMASK, FAULTMASK, and BASEPRI), which are separated from PSR.
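As a rough illustration of how the combined xPSR word splits into its fields, the host-side C sketch below decodes a 32-bit value. Only the Thumb bit position (bit 24) is stated in the text above; the other positions (N, Z, C, V, Q in bits 31 to 27 and the exception number in bits 8 to 0) are taken from the ARMv7-M architecture and should be checked against Table 3.1. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct xpsr_fields {
    bool n, z, c, v, q;    /* APSR flags, bits 31..27 (assumed from ARMv7-M) */
    bool t;                /* EPSR Thumb bit, bit 24 */
    uint32_t exception;    /* IPSR exception number, bits 8..0 (assumed) */
};

/* Split a combined xPSR value into the fields modelled above. */
struct xpsr_fields xpsr_decode(uint32_t xpsr)
{
    struct xpsr_fields f;
    f.n = ((xpsr >> 31) & 1u) != 0;
    f.z = ((xpsr >> 30) & 1u) != 0;
    f.c = ((xpsr >> 29) & 1u) != 0;
    f.v = ((xpsr >> 28) & 1u) != 0;
    f.q = ((xpsr >> 27) & 1u) != 0;
    f.t = ((xpsr >> 24) & 1u) != 0;
    f.exception = xpsr & 0x1FFu;
    return f;
}
```

The sketch ignores the ICI/IT bits, which also live in the EPSR.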
• The PRIMASK and BASEPRI registers are useful for temporarily disabling interrupts in timing-
critical tasks. An OS could use FAULTMASK to temporarily disable fault handling when a task
has crashed. In this scenario, a number of different faults might be taking place when a task
crashes. Once the core starts cleaning up, it might not want to be interrupted by other
faults caused by the crashed process. Therefore, the FAULTMASK gives the OS kernel time
to deal with fault conditions.
• To access the PRIMASK, FAULTMASK, and BASEPRI registers, a number of functions are
available in the device driver libraries provided by the microcontroller vendors.
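To make the effect of the three mask registers more concrete, here is a simplified host-side C model. It assumes the usual ARMv7-M semantics, which are not spelled out in the text above: PRIMASK blocks every exception with configurable priority, FAULTMASK additionally blocks hard fault, and a nonzero BASEPRI blocks exceptions whose priority value is equal to or larger than it. All names are hypothetical, and the priority of any currently running handler is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRIORITY_NMI        (-2)   /* fixed priorities of the system exceptions */
#define PRIORITY_HARDFAULT  (-1)

struct mask_regs {
    bool    primask;    /* 1 = only NMI and hard fault can be taken */
    bool    faultmask;  /* 1 = only NMI can be taken */
    uint8_t basepri;    /* 0 = disabled; otherwise blocks priority values >= basepri */
};

/* Smallest priority value that is blocked; lower values can still be taken. */
int masking_threshold(const struct mask_regs *m)
{
    int threshold = 256;                       /* nothing masked */
    if (m->basepri != 0) threshold = m->basepri;
    if (m->primask)      threshold = 0;
    if (m->faultmask)    threshold = -1;
    return threshold;
}

/* True if an exception of the given priority can be taken under these masks. */
bool exception_is_allowed(const struct mask_regs *m, int priority)
{
    assert(priority >= PRIORITY_NMI && priority <= 255);
    return priority < masking_threshold(m);
}
```

On real hardware the registers themselves would be accessed with MSR/MRS or the vendor library functions mentioned above; this model only shows the resulting masking decision.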
• The Control Register: The control register is used to define the privilege level and the SP
selection. This register has 2 bits, as shown in Table 3.3.
• CONTROL[1]: In the Cortex-M3, the CONTROL[1] bit is always 0 in handler mode. However, in
the thread or base level, it can be either 0 or 1.
• This bit is writable only when the core is in thread mode and privileged. In the user state or
handler mode, writing to this bit is not allowed. Aside from writing to this register, another
way to change this bit is to change bit 2 of the LR when in exception return.
• CONTROL[0]: The CONTROL[0] bit is writable only in a privileged state. Once it enters the user
state, the only way to switch back to privileged is to trigger an interrupt and change this in the
exception handler.
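• To access the control register in assembly, the MRS and MSR instructions are used:
• MRS R0, CONTROL ; Read CONTROL register into R0
• MSR CONTROL, R0 ; Write R0 into CONTROL register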
• When the processor is running in thread mode, it can be in either the privileged or user level, but
handlers can only be in the privileged level. When the processor exits reset, it is in thread
mode, with privileged access rights.
• In the user access level (thread mode), access to the system control space (SCS)—a part of the
memory region for configuration registers and debugging components—is blocked. Furthermore,
instructions that access special registers (such as MSR, except when accessing APSR) cannot be
used. If a program running at the user access level tries to access SCS or special registers,
a fault exception will occur.
• Software in a privileged access level can switch the program into the user access level using
the control register. When an exception takes place, the processor will always switch to a
privileged state and return to the previous state when exiting the exception handler.
OPERATION MODE
• A user program cannot change back to the privileged state directly by writing to the control
register. It has to go through an exception handler that programs the control register to switch
the processor back into privileged access level when returning to thread mode.
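The privilege rules above can be walked through with a small host-side C model. It is only a sketch under simplifying assumptions: it tracks just the processor mode and CONTROL[0], it does not model nested exceptions, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct core_state {
    bool handler_mode;   /* true while an exception handler is running */
    bool control_bit0;   /* CONTROL[0]: false = privileged thread, true = user thread */
};

/* Handlers are always privileged; thread mode follows CONTROL[0]. */
bool is_privileged(const struct core_state *s)
{
    return s->handler_mode || !s->control_bit0;
}

/* Models a write to CONTROL[0]; returns false if the write is not permitted. */
bool write_control_bit0(struct core_state *s, bool value)
{
    if (!is_privileged(s))
        return false;            /* user-level code cannot change the bit */
    s->control_bit0 = value;
    return true;
}

void enter_handler(struct core_state *s)
{
    assert(!s->handler_mode);    /* nesting is not modelled */
    s->handler_mode = true;
}

void leave_handler(struct core_state *s)
{
    assert(s->handler_mode);
    s->handler_mode = false;
}
```

In this model a thread that has set CONTROL[0] to 1 can only get back to the privileged level by way of `enter_handler`, a write of 0 inside the handler, and `leave_handler`, which mirrors the sequence described above.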
• The support of privileged and user access levels provides a more secure and robust
architecture. For example, when a user program goes wrong, it will not be able to corrupt
control registers in the Nested Vectored Interrupt Controller (NVIC). In addition, if the
Memory Protection Unit (MPU) is present, it is possible to block user programs from
accessing memory regions used by privileged processes.
• In simple applications, there is no need to separate the privileged and user access levels.
In these cases, there is no need to use the user access level and no need to program the
control register.
• You can separate the user application stack from the kernel stack memory to avoid the
possibility of crashing a system through stack operation errors in user programs. With this
arrangement, the user program (running in thread mode) uses the PSP, and the exception
handlers use the MSP. The switching of SPs is automatic upon entering or leaving the
exception handlers.
• The mode and access level of the processor are defined by the control register. When control
register bit 0 is 0, only the processor mode changes when an exception takes place; the
access level stays privileged.
• When control register bit 0 is 1 (thread running user application), both processor mode and
access level change when an exception takes place (see Figure 3.10).
• Control register bit 0 is programmable only in the privileged level (see Figure 2.5). For a
user-level program to switch to privileged state, it has to raise an interrupt (for example,
supervisor call [SVC]) and write to CONTROL[0] within the handler.
EXCEPTIONS AND INTERRUPTS
• The Cortex-M3 supports a number of exceptions, including a fixed number of system exceptions
and a number of interrupts, commonly called IRQs. The number of interrupt inputs on a
Cortex-M3 microcontroller depends on the individual design. Interrupts generated by
peripherals, except the System Tick Timer, are also connected to the interrupt input signals.
The typical number of interrupt inputs is 16 or 32. However, you might find some
microcontroller designs with more (or fewer) interrupt inputs.
• Besides the interrupt inputs, there is also a nonmaskable interrupt (NMI) input signal. The
actual use of NMI depends on the design of the microcontroller or system-on-chip (SoC) product
you use. In most cases, the NMI could be connected to a watchdog timer or a voltage-monitoring
block that warns the processor when the voltage drops below a certain level. The NMI exception
can be activated any time, even right after the core exits reset.
• A number of the system exceptions are fault-handling exceptions that can be triggered by
various error conditions. The NVIC also provides a number of fault status registers so that
error handlers can determine the cause of the exceptions.
VECTOR TABLES
• When an exception event takes place on the Cortex-M3 and is accepted by the processor core,
the corresponding exception handler is executed. To determine the starting address of the
exception handler, a vector table mechanism is used. The vector table is an array of word data
inside the system memory, each representing the starting address of one exception type. The
vector table is relocatable, and the relocation is controlled by a relocation register in the NVIC
(see Table 3.5). After reset, this relocation control register is reset to 0; therefore, the
vector table is located at address 0x0 after reset. For example, because reset is exception
type 1, the address of the reset vector is 1 × 4 (each word is 4 bytes), which equals
0x00000004, and the NMI vector (type 2) is located at 2 × 4 = 0x00000008. The address
0x00000000 is used to store the starting value for the MSP. The LSB of each exception
vector indicates whether the exception is to be executed in the Thumb state. Because the
Cortex-M3 can support only Thumb instructions, the LSB of all the exception vectors should be
set to 1.
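The address arithmetic above is simple enough to capture in a few lines of C. This is a host-side sketch with hypothetical function names; `table_base` stands for the value held in the relocation register (0 after reset).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address of the vector for a given exception type: base + 4 * type. */
uint32_t vector_address(uint32_t table_base, uint32_t exception_number)
{
    assert(exception_number >= 1u);   /* entry 0 holds the initial MSP, not a vector */
    return table_base + 4u * exception_number;
}

/* A vector is usable on the Cortex-M3 only if its LSB is 1 (Thumb state). */
bool vector_value_is_thumb(uint32_t vector_value)
{
    return (vector_value & 1u) != 0;
}
```

With a base of 0, `vector_address(0, 1)` is 0x00000004 for reset and `vector_address(0, 2)` is 0x00000008 for NMI, as in the example above.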
STACK MEMORY OPERATIONS
• In the Cortex-M3, besides normal software-controlled stack PUSH and POP, the stack PUSH
and POP operations are also carried out automatically when entering or exiting an
exception/interrupt handler.
• In general, stack operations are memory write or read operations, with the address specified
by an SP. Data in registers is saved into stack memory by a PUSH operation and can be
restored to registers later by a POP operation. The SP is adjusted automatically in PUSH and
POP so that multiple data PUSH will not cause old stacked data to be erased.
• The function of the stack is to store register contents in memory so that they can be restored
later, after a processing task is completed. For normal uses, for each store (PUSH), there
must be a corresponding read (POP), and the address of the POP operation should match that
of the PUSH operation (see Figure 3.11). When PUSH/POP instructions are used, the SP is
decremented/incremented automatically.
• When program control returns to the main program, the R0–R2 contents are the same as
before. Notice the order of PUSH and POP: The POP order must be the reverse of PUSH.
These operations can be simplified, thanks to PUSH and POP instructions allowing multiple load
and store. In this case, the ordering of a register POP is automatically reversed by the
processor. You can also combine RETURN with a POP operation. This is done by pushing the LR
to the stack and popping it back to PC at the end of the subroutine.
• The Cortex-M3 uses a full-descending stack operation model. The SP points to the last data
pushed to the stack memory, and the SP decrements before a new PUSH operation.
• For POP operations, the data is read from the memory location pointed to by the SP, and
then the SP is incremented. The contents in the memory location are unchanged but will be
overwritten when the next PUSH operation takes place.
• Because each PUSH/POP operation transfers 4 bytes of data (each register contains 1 word, or
4 bytes), the SP decrements/increments by 4 at a time or a multiple of 4 if more than 1
register is pushed or popped. In the Cortex-M3, R13 is defined as the SP. When an interrupt
takes place, a number of registers will be pushed automatically, and R13 will be used as the SP
for this stacking process. Similarly, the pushed registers will be restored/popped automatically
when exiting an interrupt handler, and the SP will also be adjusted.
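The full-descending behaviour described above can be tried out with a small host-side C model. It is a sketch only: the 16-word stack size, the base address used in the usage note, and all names are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16u

struct stack_model {
    uint32_t mem[STACK_WORDS];  /* word i lives at byte address base + 4*i */
    uint32_t base;              /* lowest address of the stack region */
    uint32_t sp;                /* points to the last pushed word (full stack) */
};

void stack_init(struct stack_model *s, uint32_t base)
{
    for (uint32_t i = 0; i < STACK_WORDS; i++)
        s->mem[i] = 0;
    s->base = base;
    s->sp = base + 4u * STACK_WORDS;   /* empty: SP just above the region */
}

/* PUSH: decrement the SP first, then store (descending, full). */
void stack_push(struct stack_model *s, uint32_t value)
{
    assert(s->sp >= s->base + 4u);     /* no room left would be an overflow */
    s->sp -= 4u;
    s->mem[(s->sp - s->base) / 4u] = value;
}

/* POP: read from the SP, then increment it; memory is left unchanged. */
uint32_t stack_pop(struct stack_model *s)
{
    assert(s->sp < s->base + 4u * STACK_WORDS);   /* stack must not be empty */
    uint32_t value = s->mem[(s->sp - s->base) / 4u];
    s->sp += 4u;
    return value;
}
```

Pushing three values onto a stack initialised at a base of 0x20000000 moves the SP down by 12 bytes, and popping returns them in reverse order while restoring the SP.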
The Two-Stack Model in the Cortex-M3
• As mentioned before, the Cortex-M3 has two SPs: the MSP and the PSP. The SP register to
be used is controlled by the control register bit 1.
• When CONTROL[1] is 0, the MSP is used for both thread mode and handler mode (see Figure
3.16). In this arrangement, the main program and the exception handlers share the same stack
memory region. This is the default setting after power-up.
• When the CONTROL[1] is 1, the PSP is used in thread mode (see Figure 3.17). In this
arrangement, the main program and the exception handler can have separate stack memory
regions. This can prevent a stack error in a user application from damaging the stack used by
the OS (assuming that the user application runs only in thread mode and the OS kernel
executes in handler mode). Note that in this situation, the automatic stacking and unstacking
mechanism will use PSP, whereas stack operations inside the handler will use MSP.
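The interaction between CONTROL[1] and automatic stacking can be sketched as a host-side C model. Several details are assumptions rather than statements from the text above: the stacked frame is taken to be eight words (R0 to R3, R12, LR, PC, and xPSR, as defined by ARMv7-M), no alignment padding or nested exceptions are modelled, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRAME_BYTES 32u   /* 8 words: R0-R3, R12, LR, PC, xPSR (assumed) */

struct stack_core {
    uint32_t msp;
    uint32_t psp;
    bool spsel;           /* CONTROL[1]: true = thread mode uses the PSP */
    bool handler_mode;
    bool return_to_psp;   /* remembered across the handler, like bit 2 of LR */
};

/* The SP currently visible as R13. Handler mode always uses the MSP. */
uint32_t *current_sp(struct stack_core *c)
{
    return (!c->handler_mode && c->spsel) ? &c->psp : &c->msp;
}

/* Exception entry from thread mode: stack the frame on the active SP. */
void stack_exception_entry(struct stack_core *c)
{
    assert(!c->handler_mode);         /* nesting is not modelled */
    c->return_to_psp = c->spsel;
    *current_sp(c) -= FRAME_BYTES;
    c->handler_mode = true;
    c->spsel = false;                 /* CONTROL[1] is 0 in handler mode */
}

/* Exception return: go back to thread mode and unstack from the same SP. */
void stack_exception_return(struct stack_core *c)
{
    assert(c->handler_mode);
    c->handler_mode = false;
    c->spsel = c->return_to_psp;
    *current_sp(c) += FRAME_BYTES;
}
```

When CONTROL[1] is 1, the model moves only the PSP during entry and return, while PUSH and POP inside the handler would operate on the MSP.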
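• It is possible to perform read/write operations directly to the MSP and PSP, without any
confusion about which R13 you are referring to. Provided that you are in privileged level, you
can access the MSP and PSP values, for example through the CMSIS core functions __get_MSP(),
__set_MSP(), __get_PSP(), and __set_PSP().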
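• In general, it is not recommended to change the currently selected SP value in a C function,
as the stack memory could be used for storing local variables. To access the SPs in assembly,
you can use the MRS and MSR instructions:
• MRS R0, MSP ; Read Main Stack Pointer to R0
• MSR MSP, R0 ; Write R0 to Main Stack Pointer
• MRS R0, PSP ; Read Process Stack Pointer to R0
• MSR PSP, R0 ; Write R0 to Process Stack Pointer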
• By reading the PSP value using an MRS instruction, the OS can read data stacked by the user
application (such as register contents before SVC). In addition, the OS can change the PSP
pointer value—for example, during context switching in multitasking systems.
RESET SEQUENCE
• After the processor exits reset, it will read two words from memory (see Figure 3.18):
• Address 0x00000000: Starting value of R13 (the SP)
• Address 0x00000004: Reset vector (the starting address of program execution; LSB
should be set to 1 to indicate Thumb state)
• This differs from traditional ARM processor behavior. Previous ARM processors executed
program code starting from address 0x0. Furthermore, the vector table in previous ARM
devices consisted of instructions (you had to put a branch instruction there so that your
exception handler could be put in another location).
• In the Cortex-M3, the initial value for the MSP is put at the beginning of the memory map,
followed by the vector table, which contains vector address values. (The vector table can be
relocated to another location later, during program execution.) In addition, the contents of
the vector table are address values, not branch instructions. The first vector in the vector
table (exception type 1) is the reset vector, which is the second piece of data fetched by the
processor after reset.
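The two-word reset fetch can be sketched in host-side C as follows. The memory image is represented as an array of words starting at address 0x00000000; the struct and function names are hypothetical, and the image contents in the usage note are made-up example values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct reset_state {
    uint32_t msp;    /* initial main stack pointer (word at 0x00000000) */
    uint32_t pc;     /* address of the first instruction to execute */
    bool     thumb;  /* LSB of the reset vector; should be 1 on the Cortex-M3 */
};

/* Model of what the core reads after reset. image[0] is the word at
   address 0x00000000 and image[1] is the word at address 0x00000004. */
struct reset_state reset_fetch(const uint32_t *image)
{
    assert(image != NULL);
    struct reset_state r;
    r.msp   = image[0] & ~UINT32_C(3);   /* SP bits [1:0] always read as zero */
    r.thumb = (image[1] & 1u) != 0;
    r.pc    = image[1] & ~UINT32_C(1);   /* bit 0 is a state flag, not an address bit */
    return r;
}
```

For an image whose first two words are 0x20002000 and 0x00000101, the model yields an initial MSP of 0x20002000 and execution starting at 0x00000100 in Thumb state.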