21CS43 - Module 1
[Figure: RISC versus CISC comparison; surviving label: "Greater Complexity"]
Q. What are the salient features of the ARM instruction set that make it suitable for embedded applications?
Answer:
The following features make the ARM instruction set suitable for embedded applications:
• Variable cycle execution for certain instructions—Not every ARM instruction executes in a
single cycle. For example, load-store-multiple instructions vary in the number of execution cycles
depending upon the number of registers being transferred.
• Inline barrel shifter leading to more complex instructions—The inline barrel shifter is a
hardware component that preprocesses one of the input registers before it is used by an instruction.
This expands the capability of many instructions to improve core performance and code density.
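The effect of the inline barrel shifter can be sketched in a few lines of Python. This is only an illustrative model, not ARM's hardware: the function names `barrel_shift` and `add_with_shift` are made up for this example, and shift amounts are assumed to lie in the range 0 to 31. It mimics an instruction such as ADD r0, r1, r2, LSL #2, where the second operand is shifted before the addition.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def barrel_shift(value, kind, amount):
    """Shift a 32-bit value the way the barrel shifter preprocesses an operand.

    Assumes 0 <= amount <= 31 (an assumption of this sketch).
    """
    value &= MASK32
    if amount == 0:
        return value
    if kind == "LSL":  # logical shift left
        return (value << amount) & MASK32
    if kind == "LSR":  # logical shift right
        return value >> amount
    if kind == "ASR":  # arithmetic shift right keeps the sign bit
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK32
    if kind == "ROR":  # rotate right
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError("unknown shift kind: " + kind)


def add_with_shift(rn, rm, kind, amount):
    """Model of ADD Rd, Rn, Rm, <shift> #amount: one instruction, shift plus add."""
    return (rn + barrel_shift(rm, kind, amount)) & MASK32


# r1 = 10, r2 = 3: ADD r0, r1, r2, LSL #2 computes 10 + (3 * 4)
print(add_with_shift(10, 3, "LSL", 2))
```

Because the shift and the addition happen in one instruction, a multiply by a power of two plus an add needs no separate shift instruction, which is where the code density gain comes from.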
Or
With a neat diagram explain the different hardware components of an embedded device based on ARM
core.
Answer: The figure below shows a typical embedded device based on an ARM core. Each box represents
a feature or function.
[Figure: Typical ARM-based embedded device. Blocks shown: ARM processor, memory controller, ROM,
flash ROM, SRAM, DRAM, interrupt controller, AHB arbiter, AHB-external bridge (to the external bus),
AHB-APB bridge, Ethernet, real-time clock, counter/timers, console, serial UARTs.]
• An ARM processor based embedded system can be separated into the following four main
hardware components:
o The ARM processor: The ARM processor controls the embedded device. Different
versions of the ARM processor are available to suit the desired operating characteristics.
o Controllers: Controllers coordinate important blocks of the system. Two commonly
found controllers are the memory controller and the interrupt controller.
o Peripherals: The peripherals provide all the input-output capability external to the chip
and are responsible for the uniqueness of the embedded device.
o Bus: A bus is used to communicate between different parts of the device.
• AMBA Bus Protocol
o The Advanced Microcontroller Bus Architecture (AMBA) was introduced in 1996 and
has been widely adopted as the on-chip bus architecture used for ARM processors.
o The first AMBA buses introduced were the ARM System Bus (ASB) and the ARM
Peripheral Bus (APB).
o Later ARM introduced another bus design, called the ARM High Performance Bus
(AHB).
o AHB provides higher data throughput than ASB because it is based on a centralized
multiplexed bus scheme rather than the ASB bidirectional bus design.
• MEMORY
o An embedded system has to have some form of memory to store and execute code.
o Figure below shows the memory trade-offs: the fastest memory cache is physically located
nearer the ARM processor core and the slowest secondary memory is set further away.
Generally the closer memory is to the processor core, the more it costs and the smaller its capacity.
• A peripheral can simply be bolted onto the on-chip bus without having to redesign an
interface for different processor architecture.
• This plug-and-play interface for hardware developers improves availability and time to
market.
• AHB provides higher data throughput than ASB because it is based on a centralized
multiplexed bus scheme rather than the ASB bidirectional bus design.
• This change allows the AHB bus to run at higher clock speeds and to be the first ARM
bus to support widths of 64 and 128 bits.
• ARM has introduced two variations on the AHB bus: Multi-layer AHB and AHB-Lite.
• In contrast to the original AHB, which allows a single bus master to be active on the bus
at any time, the Multi-layer AHB bus allows multiple active bus masters.
• AHB-Lite is a subset of the AHB bus and it is limited to a single bus master. This bus
was developed for designs that do not require the full features of the standard AHB bus.
Answer:
• An embedded system requires software to drive it. Figure below shows typical software
components required to control an embedded device.
• Each software component in the stack uses a higher level of abstraction to separate the code
from the hardware device.
• An ARM core can be viewed as functional units connected by data buses, as shown in Figure 1, where the arrows
represent the flow of data, the lines represent the buses, and the boxes represent either an operation
unit or a storage area.
• The instruction decoder translates instructions before they are executed.
REGISTERS
Q5. Explain briefly the active registers available in user mode.
OR
With a neat diagram explain the different general purpose registers of ARM processors.
Answer: The figure below shows the active registers available in user mode. All the registers shown
are 32 bits in size.
[Figure: Registers available in user mode, listing r0 through r15]
• There are up to 18 active registers: 16 data registers and 2 processor status registers. The data
registers are visible to the programmer as r0 to r15.
• The ARM processor has three registers assigned to a particular task: r13, r14 and r15.
• Register r13: Register r13 is traditionally used as the stack pointer (sp) and stores the head of the
stack in the current processor mode.
[Figure: cpsr layout]
Bit:    31 30 29 28 | 7  6  5 | 4 ... 0
Field:  N  Z  C  V  | I  F  T | Mode
(T indicates Thumb state)
• The cpsr is divided into four fields, each 8 bits wide: flags, status, extension and control.
• In current designs the extension and status fields are reserved for future use.
• The control field contains the processor mode, state and interrupt mask bits.
• The flag field contains the condition flags.
• The following table gives the bit patterns that represent each of the processor modes in the cpsr.
Mode Mode[4:0]
Abort 10111
Fast interrupt request 10001
Interrupt request 10010
Supervisor 10011
System 11111
Undefined 11011
User 10000
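The table above can be turned into a small lookup. The following Python sketch is only illustrative: `decode_mode` is a name chosen for this example, and the sample cpsr values are made up. It masks off the low five bits of a cpsr value and maps them to a mode name.

```python
# Mode[4:0] bit patterns, copied from the table above
MODE_BITS = {
    0b10111: "Abort",
    0b10001: "Fast interrupt request",
    0b10010: "Interrupt request",
    0b10011: "Supervisor",
    0b11111: "System",
    0b11011: "Undefined",
    0b10000: "User",
}


def decode_mode(cpsr):
    """Return the processor mode encoded in bits 4 to 0 of a cpsr value."""
    return MODE_BITS.get(cpsr & 0x1F, "unrecognized")


# 0xD3 has low five bits 10011, which the table lists as Supervisor
print(decode_mode(0x000000D3))
```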
• When bit 5 of the cpsr (the T bit) is 1, the processor is in Thumb state. When T=0, the processor is in ARM
state.
• For example, if a SUBS (subtract) instruction produces a register value of zero, then the Z flag in the
cpsr is set.
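The behaviour of the T bit and the condition flags can be sketched as follows. This is a simplified Python model rather than a description of the hardware: the helper names `is_thumb` and `subs_flags` are invented for this example. The flag rules used are the standard ones for a 32-bit subtraction: N copies bit 31 of the result, Z is set for a zero result, C is set when no borrow occurs, and V is set on signed overflow.

```python
MASK32 = 0xFFFFFFFF


def is_thumb(cpsr):
    """True when bit 5 (T) of the cpsr is set, i.e. the core is in Thumb state."""
    return bool(cpsr & (1 << 5))


def subs_flags(a, b):
    """Model SUBS: return (result, flags) for the 32-bit subtraction a - b."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    flags = {
        "N": result >> 31,                              # result is negative
        "Z": int(result == 0),                          # result is zero
        "C": int(a >= b),                               # no borrow occurred
        "V": (((a ^ b) & (a ^ result)) >> 31) & 1,      # signed overflow
    }
    return result, flags


result, flags = subs_flags(5, 5)
print(result, flags)  # a zero result sets the Z flag
```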
Processor Mode
Answer:
PIPELINE
Q9. With neat diagram explain the various blocks in a 3 stage pipeline of ARM processor
organization.
OR
Explain ARM pipeline with 3,5,6 stages.
Answer:
• A pipeline is the mechanism used to speed up execution by fetching the next instruction while other
instructions are being decoded and executed.
• Figure 1 shows the ARM7 three-stage pipeline.
[Figure 1: ARM7 three-stage pipeline (Fetch, Decode, Execute)]
• Fetch loads an instruction from memory.
• Decode identifies the instruction to be executed.
• Execute processes the instruction and writes the result back to a register.
• Figure 2 illustrates the pipeline using a simple example. It shows a sequence of three instructions
being fetched, decoded and executed by the processor.
• Each instruction takes a single cycle to complete after the pipeline is filled.
o In the first cycle, the core fetches the ADD instruction from memory.
o In the second cycle, the core fetches the SUB instruction and decodes the ADD
instruction.
o In the third cycle, the core fetches the CMP instruction from memory, decodes the SUB
instruction and executes the ADD instruction. This procedure is called filling the pipeline.
[Figure 2: Pipelined instruction sequence (ADD, SUB, CMP) plotted against time, cycles 1 to 3]
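The filling of the three-stage pipeline can be traced with a short simulation. This is a minimal Python sketch under simplifying assumptions: every stage takes exactly one cycle and there are no stalls or branches. The function name `pipeline_trace` is hypothetical.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_trace(instructions):
    """Return one dict per cycle showing which instruction is in each stage.

    A stage holds None while the pipeline is filling or draining.
    """
    trace = []
    total_cycles = len(instructions) + len(STAGES) - 1
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # instruction that entered `depth` cycles ago
            in_range = 0 <= index < len(instructions)
            row[stage] = instructions[index] if in_range else None
        trace.append(row)
    return trace


for number, row in enumerate(pipeline_trace(["ADD", "SUB", "CMP"]), start=1):
    print("cycle", number, row)
```

In the third cycle of this trace ADD is in execute, SUB in decode and CMP in fetch, matching the description above. From that point on, one instruction completes per cycle.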
• The pipeline design for each ARM family differs. For example, the ARM9 core increases the
pipeline length to five stages as shown in the figure below.
• The ARM10 increases the pipeline length still further by adding a sixth stage as shown in the
figure below.
• As the pipeline length increases the amount of work done at each stage is reduced, which allows
the processor to attain a higher operating frequency. This in turn increases the performance.
• Pipeline Executing Characteristics
a. The ARM pipeline has not processed an instruction until it has passed completely through the
execute stage. For example, an ARM7 pipeline (with three stages) has executed an instruction
only when the fourth instruction is fetched. The figure below shows an instruction sequence on an
ARM7 pipeline.
• Reset vector is the location of the first instruction executed by the processor when power is
applied. This instruction branches to the initialization code.
• Undefined instruction vector is used when the processor cannot decode the instruction.
• Software interrupt vector is called when SWI instruction is executed. The SWI is frequently
used as the mechanism to invoke an operating system routine.
• Prefetch abort vector is used when the processor attempts to fetch an instruction from an address
without the correct access permissions.
• Data abort vector is similar to the prefetch abort vector but is used when an instruction attempts to
access data memory without the correct access permissions.
• Interrupt request vector is used by external hardware to interrupt the normal execution flow of
the processor.
• Fast interrupt request vector is similar to the interrupt request but is reserved for hardware
requiring faster response times.
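The vector table can be represented as a simple lookup. The addresses in this Python sketch do not appear in the text above; they are the standard offsets used by classic ARM cores, with the table starting at address 0x00000000 and each entry four bytes apart. The function name `vector_address` and its `base` parameter are illustrative only; `base` is included because some cores can optionally place the table at the high address 0xFFFF0000.

```python
# (exception name, offset from the start of the vector table)
VECTOR_TABLE = [
    ("Reset", 0x00),
    ("Undefined instruction", 0x04),
    ("Software interrupt", 0x08),
    ("Prefetch abort", 0x0C),
    ("Data abort", 0x10),
    ("Reserved", 0x14),
    ("Interrupt request", 0x18),
    ("Fast interrupt request", 0x1C),
]


def vector_address(name, base=0x00000000):
    """Return the address the core branches to for the named exception."""
    for entry_name, offset in VECTOR_TABLE:
        if entry_name == name:
            return base + offset
    raise KeyError(name)


for entry_name, _ in VECTOR_TABLE:
    print(f"{entry_name:24s} 0x{vector_address(entry_name):08X}")
```

Each entry holds a single instruction, typically a branch to the real handler, which is why the reset entry branches to the initialization code.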
Core Extensions
Q11. Discuss the following with neat diagrams
a. Von Neumann architecture with cache
b. Harvard architecture with TCM
OR
Discuss all 3 core extensions.
Answer:
Three core extensions wrap around the ARM processor: cache and tightly coupled memory, memory
management and the coprocessor interface.
1. Cache and tightly coupled memory: The cache is a block of fast memory placed between main
memory and the core. With a cache the processor core can run for the majority of the time without
having to wait for data from slow external memory.
o ARM has two forms of cache. The first is found attached to the Von Neumann-style cores. It
combines both data and instructions into a single unified cache, as shown in figure 1
below.
Subject: 21CS43 Faculty: EMMANUEL R Page 18
Figure 1: A simplified Von Neumann architecture with cache.
o The second form, attached to the Harvard-style cores, has separate caches for data and
instructions, as shown in figure 2.
o A cache provides an overall increase in performance but will not give predictable
execution.
o But for real-time systems it is paramount that code execution is deterministic.
o This is achieved using a form of memory called tightly coupled memory (TCM).
o TCM is fast SRAM located close to the core and guarantees the clock cycles required to
fetch instructions or data.
o By combining both technologies, ARM processors can have both improved performance
and predictable real-time response. Figure 3 shows an example of a core with
a combination of caches and TCMs.
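The difference between cache and TCM timing can be illustrated with a toy model. Everything numeric in this Python sketch is an assumption made for illustration: the cycle counts `HIT_CYCLES` and `MISS_CYCLES`, the cache geometry, and the class and function names are not taken from any ARM core. The cache here is a simple direct-mapped design, chosen only because it is the easiest to model.

```python
HIT_CYCLES = 1    # assumed cost of a cache hit or a TCM access
MISS_CYCLES = 10  # assumed cost of going to slow external memory


class DirectMappedCache:
    """Toy direct-mapped cache that reports the cycle cost of each access."""

    def __init__(self, num_lines=4, line_bytes=16):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines  # tag currently held by each line

    def access(self, address):
        line_number = address // self.line_bytes
        index = line_number % self.num_lines
        tag = line_number // self.num_lines
        if self.tags[index] == tag:
            return HIT_CYCLES
        self.tags[index] = tag  # a miss evicts whatever the line held
        return MISS_CYCLES


def tcm_access(address):
    """TCM cost does not depend on the access history."""
    return HIT_CYCLES


cache = DirectMappedCache()
addresses = [0x00, 0x04, 0x40, 0x00]
print([cache.access(a) for a in addresses])  # cost varies with history
print([tcm_access(a) for a in addresses])    # cost is constant
```

The second access to 0x00 is slow again because 0x40 maps to the same cache line and evicted it. This dependence on history is what makes cached execution hard to predict, whereas the TCM cost stays fixed.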
3. Coprocessors:
• A coprocessor extends the processing features of a core by extending the instruction set or by
providing configuration registers.
• More than one coprocessor can be added to the ARM core via the coprocessor interface.
• The coprocessor can be accessed through a group of dedicated ARM instructions that provide a
load-store type interface.
• The coprocessor can also extend the instruction set by providing specialized instructions that
are added to the standard ARM instruction set, for example to process vector floating-point (VFP) operations.
• These new instructions are processed in the decode stage of the ARM pipeline. If the decode
stage sees a coprocessor instruction, then it offers it to the relevant coprocessor.
• But, if the coprocessor is not present or doesn’t recognize the instruction, then the ARM takes an
undefined instruction exception.
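The hand-off described above can be sketched as a small dispatch routine. This Python model is hypothetical throughout: the class names, the `decode_stage` function, the coprocessor number and the opcode strings are placeholders for illustration and do not correspond to real ARM encodings.

```python
class UndefinedInstruction(Exception):
    """Stands in for the undefined instruction exception."""


class Coprocessor:
    def __init__(self, number, supported):
        self.number = number
        self.supported = set(supported)  # opcodes this coprocessor recognizes

    def accepts(self, opcode):
        return opcode in self.supported

    def execute(self, opcode):
        return f"cp{self.number} executed {opcode}"


def decode_stage(cp_number, opcode, coprocessors):
    """Offer a coprocessor instruction to the relevant coprocessor.

    Raises UndefinedInstruction if the coprocessor is absent or
    does not recognize the instruction.
    """
    coprocessor = coprocessors.get(cp_number)
    if coprocessor is None or not coprocessor.accepts(opcode):
        raise UndefinedInstruction(f"cp{cp_number}: {opcode}")
    return coprocessor.execute(opcode)


# Hypothetical system with a single coprocessor attached
attached = {10: Coprocessor(10, ["vector_add", "vector_mul"])}
print(decode_stage(10, "vector_add", attached))
```

An operating system can make use of the exception path: the undefined instruction handler may emulate a missing coprocessor in software.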