Embedded Systems

Embedded systems are specialized computing systems designed to perform dedicated functions within larger systems, characterized by low cost, low power consumption, and real-time response capabilities. They can be classified into soft and hard real-time systems based on their timing constraints, with applications spanning consumer electronics, automobiles, and communication devices. The architecture of embedded systems often utilizes microcontrollers and microprocessors, which differ in terms of design complexity, memory hierarchy, and task execution capabilities.

EMBEDDED SYSTEMS

By
Dr. H Manas Singh
Department of ECE
IIIT Manipur
INTRODUCTION TO EMBEDDED
SYSTEM
What are embedded systems?
• Computers embedded within other systems

• Processors are often very simple and inexpensive


• It typically executes a single program related to the application for which
it was built, and it can take inputs from the environment.
Common features:
• Single functioned
• Tight constraints on cost, energy, form factor etc.
• Low cost, low power, small size, relatively fast
• Real-time
• Ideally it responds to inputs from the environment and computes in real time, without delay.
Typical design constraints
• Low cost
• Low energy consumption
• Limited memory
• Real-time response
Define an embedded system
• Embedded systems are computing
systems with tightly coupled hardware
and software integration, that are
designed to perform a dedicated
function.
or
• An embedded system is a combination of computer hardware and software, and sometimes mechanical components as well.
Fig. A generic embedded system architecture.
Real-time systems
• Systems that need to respond to a service request within a certain amount of time.
• Timing constraint: Typically associated with a real-time computing
constraint.
• A timing constraint is hard if the consequence of a missed deadline is fatal
• A timing constraint is soft if the consequence of a missed deadline is
undesirable but tolerable.
• A system in which all tasks have soft timing constraints is a soft real-
time system.
• A system is a hard real-time system if its key tasks have hard timing
constraints.
Example timing constraints (for each, consider the consequence of a missed deadline and whether the system is a soft or hard real-time system):
• Digital camera: when the shutter speed is set to 0.5 s, the shutter open time should be (0.5 ± 0.125) s, 99.9% of the time.
• Antilock braking system: the system should apply/release braking pressure 15 times per second; a wheel that locks up should stop spinning in less than 1 s.
• Robot-soccer player: once it has caught the ball, the robot needs to kick the ball within 2 s, with the probability of breaking this deadline being less than 10%.
• Cardiac pacemaker: the pacemaker waits for a ventricular beat after the detection of an atrial beat; the lower bound of the waiting time is 0.1 s and the upper bound is 0.2 s.
A soft real-time system is a real-time system that must meet its deadlines but with a degree of flexibility.
A hard real-time system is a real-time system that must meet its deadlines with a near-zero degree of
flexibility.
A real-time system is called a real-time embedded system if it is designed to be
embedded within some larger system.

Fig. System classification


Applications of embedded systems:
• Consumer segment: refrigerator, microwave, camera, security
systems, dishwasher etc.
• Automobiles: airbags, anti-lock braking system, door lock etc.
• Communication: Mobile phones, modem, network switches etc.
•…
Design challenges:
• Primary design goal: implementation that realizes desired functionality
• Speed, Cost, & Quality : Mutually conflicting in most cases
• Non recurring engineering (NRE) cost: One-time initial cost of designing a
system
• Unit cost: Cost of manufacturing each copy of the system, without NRE
cost
• Size
• Performance
• Power
• Time to prototype
• Time to market
Brain power behind any embedded systems

Basic operation of a computing system


• CPU carries out all the computations
• Input/output provides the interface
to the outside world
Types of instruction set architecture (ISA) of
the CPU
1. Complex instruction set computer (CISC)
• Used in desktops, laptops and servers
• Intel classes of processors
• Complex instruction set with a lot of flexibility and features
2. Reduced instruction set computer (RISC)
• Used in microcontrollers
• Easier to implement
Memory:
1. RAM
• RAM is used as data memory, to store data.
2. ROM
• ROM is used as program memory in microcontrollers.
*Flash memory: non-volatile memory where the program is stored; it can be reprogrammed, and the program is not lost even when the power is switched off.
Classification of CPU architecture
• Von Neumann and Harvard classes of architectures
• The basic difference is that the Von Neumann architecture has a single memory where both instructions and data are stored, whereas the Harvard architecture uses separate instruction and data memories.
• Example: conventional computers are based on the Von Neumann architecture, while most microcontrollers are based on Harvard architectures.
Fig. Von Neumann architecture Fig. Harvard architecture
Microprocessor
• A microprocessor is a multipurpose, programmable, clock driven,
register based semiconductor device which consists of electronic logic
circuits that reads binary instructions from storage device called
memory and processes binary input data according to instructions
and provides results as output.
• Microprocessor: silicon chip which includes ALU, register circuits &
control circuits.
• Microcontroller: silicon chip which includes microprocessor, memory
& I/O in a single package.
Figure. Microprocessor
▪ Reprogrammable systems
• Built around a general-purpose microprocessor used for computing and data processing; such systems include mass storage devices.
▪ Embedded systems
• Application specific and part of a fixed product; not intended to be reprogrammed by the end user. Ex: microcontroller-based products.
▪ Programmable device:
• The microprocessor can perform different sets of operations on the
data it receives depending on the sequence of instructions supplied in
the given program.
• By changing the program, the microprocessor manipulates the data in
different ways.
• A set of instructions written for the microprocessor to perform a task
is called program, and group of programs is called software.
• Instructions: Each microprocessor is designed to execute a specific
group of operations. This group of operations is called an instruction
set. This instruction set defines what the microprocessor can and
cannot do.
• The Microprocessor takes the data from input devices and these are
devices that bring data into the system from the outside world (Ex:
keyboard, a mouse and switches etc).
• Microprocessor only understands binary numbers.
• Fixed set of instructions in the form of binary patterns: the machine language.
• Mnemonics for these patterns form the assembly language for a given computer.
• Language hierarchy: machine language, assembly language, high-level languages (C, C++ and Java).
• A binary digit is called a bit (which comes from binary digit).
• Microprocessor recognizes and processes a group of bits together.
This group of bits is called a word.
• Microprocessors come in many different levels of sophistication, and
they are usually classified by their word size.
• 8 bit microcontroller - low cost applications (on board memory and
I/O devices).
• 16 bit microcontroller - sophisticated applications (off chip I/O
devices, large word lengths and memory).
• 32 bit microcontroller - RISC microprocessors (high performance
computing applications).
• A group of 8 bits is referred to as a byte (or, relative to a 16-bit word, a half-word).
• A group of 4 bits is called a nibble.
• A group of 32 bits is given the name long word.
• Today, all processors manipulate at least 32 bits at a time, and there exist microprocessors that can process 64, 80, or 128 bits.
• Moreover, today's microprocessors are designed to understand and execute many binary instructions.
ALU
• Every microprocessor has arithmetic operations such as add and subtract
as part of its instruction set.
• Most of the microprocessors will have operations such as multiply, divide
and also the newer ones will have complex operations such as square root.
• In addition, microprocessors have logic operations such as AND, OR, XOR,
shift left, shift right, etc.
• Today, all processors manipulate at least 32 bits at a time and there exists
microprocessors that can process 64, 80, 128 bits.
• Again, the number and types of operations define the microprocessor’s
instruction set and depends on the specific microprocessor (application
specific).
MEMORY
• Memory is the location where information is kept while not in current use.
• Memory is a collection of storage devices (cell or D-FF). Usually, each
storage device holds one bit.
• When a program is entered into a computer, it is stored in memory. Then
as the microprocessor starts to execute the instructions, it brings the
instructions from memory one at a time.
• Memory is also used to hold the data.
• Again, the number and types of operations define the microprocessor’s
instruction set and depends on the specific microprocessor (application
specific).
OUTPUT DEVICES
• The results must be presented in a human readable form on an
output device.
• This can be the monitor, a paper from the printer, a simple LED or
many other forms.
• Example: The clock initiates the adding operation. Similarly, the bit
pattern of an instruction set initiates a sequence of clock signals
which activates the appropriate logic circuits in the ALU to execute
the given task.
MICROPROCESSOR

Figure: Microprocessor
Figure: Microprocessor
MACHINE LANGUAGE
• Each microprocessor has its own binary words, meanings, and language.
• The word (or word length) is defined as the number of bits the microprocessor recognizes and processes at a time.
• To communicate with a computer, one must give instructions in its binary language (machine language).
• The number of bits that form the word of a microprocessor is fixed for that particular processor.
• For example, an 8-bit microprocessor has at most 2^8 = 256 different bit patterns.
• Each of these patterns forms an instruction for the microprocessor.
• The complete set of patterns makes up the microprocessor's machine language.
Microcontroller
• A microcontroller (µC) is a small computer on a single integrated
circuit consisting of a central processing unit (CPU) combined with
peripheral devices such as memories, I/O devices, and timers as well
as ADC’s.
• An 8-bit µC is designed for low-cost applications and includes on-board
memory and I/O devices.
• A 16-bit µC is often used for more sophisticated applications that may require
either longer word lengths or off-chip I/O and memory
• A 32-bit RISC µP offers very high performance for computation-intensive
applications.
How to distinguish between microcontroller
and general-purpose processors?
• Microcontrollers are generally associated with the embedded applications.
• Microprocessors are associated with the desktop computers.
• Microcontrollers will have simpler memory hierarchy i.e. the RAM and
ROM may exist on the same chip and generally the cache memory will be
absent.
• The power consumption and temperature rise of µC is restricted because
of the constraints on the physical dimensions.
• 8-bit and 16-bit microcontrollers are very popular with a simpler design as
compared to large bit-length (32-bit, 64-bit) complex general purpose
processors.
Figure. Performance vs cost regions
Figure: Microprocessor-based system
Figure: Microcontroller-based system
Microprocessor Microcontroller
Microprocessors are multitasking in nature. Single task oriented. For example, a washing
Can perform multiple tasks at a time. For machine is designed for washing clothes only.
example, on computer we can play music while
writing text in text editor.
RAM, ROM, I/O Ports, and Timers can be RAM, ROM, I/O Ports, and Timers cannot be
added externally and can vary in numbers. added externally. These components are to be
embedded together on a chip and are fixed in
numbers.
Designers can decide the number of memory Fixed number for memory or I/O makes a
or I/O ports needed. microcontroller ideal for a limited but specific
task.
External support of external memory and I/O Microcontrollers are lightweight and cheaper
ports makes a microprocessor-based system than a microprocessor.
heavier and costlier.
External devices require more space and their A microcontroller-based system consumes less
power consumption is higher. power and takes less space.
CISC (8051) and RISC (ARM) architectures
• Microcontrollers (µCs) with the Harvard architecture are commonly called "RISC µCs".
• µCs with the von Neumann architecture are commonly called "CISC µCs".
Von Neumann Architecture

Figure: Von Neumann architecture


CISC Architecture (Von Neumann Architecture)
• Also known as Princeton architecture.
• In a CISC processor, a single instruction can execute several low-level operations (such as a load from memory, an arithmetic operation, and a memory store), or is capable of multi-step operations or addressing modes within a single instruction.
• A single memory stores both data and instructions; as a result, an instruction fetch and a data operation cannot occur at the same time because they share a common bus.
• This limits the operating bandwidth.
• Uses a unified cache memory: instructions and data may be stored in the same cache.
Harvard Architecture

Figure: Harvard Architecture


RISC Architecture (Harvard Architecture)
• RISC is a design philosophy aimed at delivering simple but powerful instructions that execute within a single cycle at a high clock speed.
• RISC (Harvard) architectures employ entirely separate memory systems to store instructions and data.
• Two separate memory spaces are used for program instructions and data: separate pathways with separate address spaces.
• The CPU can both read an instruction and perform a data memory access at the same time, even without a cache.
• RISC processors aim for a CPI (clock cycles per instruction) of one cycle. This is due to the optimization of each instruction on the CPU and to pipelining.
• The RISC design philosophy generally incorporates a larger number of registers to reduce the amount of interaction with memory.
Von Neumann Architecture vs. Harvard Architecture
• Von Neumann: a single memory is shared by both code and data. Harvard: separate memories for code and data.
• Von Neumann: the processor fetches code in one clock cycle and data in another, so it requires two sets of clock cycles. Harvard: a single set of clock cycles is sufficient, as separate buses are used to access code and data.
• Von Neumann: pipelining is not possible. Harvard: pipelining is possible.
• Von Neumann: simple in design. Harvard: complex in design.
CISC vs. RISC
• CISC: larger set of instructions; easy to program. RISC: smaller set of instructions; more difficult to program.
• CISC: simpler design of the compiler, considering the larger set of instructions. RISC: more complex design of the compiler.
• CISC: many addressing modes, causing complex instruction formats. RISC: few addressing modes, fixed instruction format.
• CISC: pipelining is not possible. RISC: pipelining of instructions is possible, considering the single clock cycle.
• CISC: higher number of clock cycles per instruction. RISC: lower number of clock cycles per instruction (typically one).
• CISC: emphasis is on hardware. RISC: emphasis is on software.
• CISC: the control unit implements the large instruction set using a microprogram unit. RISC: each instruction is executed directly by hardware.
• CISC: slower execution, as instructions are read from memory and decoded by the decoder unit. RISC: faster execution, as each instruction is executed by hardware.
Embedded Systems - 8051 Microcontroller
• The 8085 and 8086 microprocessors were also invented by Intel.
• In 1981, Intel introduced an 8-bit microcontroller called the 8051.
• It is a basic microcontroller built around a Harvard memory architecture (with a CISC-style instruction set) and was developed primarily for use in embedded systems technology.
• Intel later redesigned the 8051 using CMOS technology; the result, the 80C51, is also an 8-bit microcontroller.
Comparison between 8051 Family Members
Features of 8051 Microcontroller
• 4K bytes of on-chip program memory (ROM); up to 64K bytes of program memory can be addressed
• 128 bytes of on-chip data memory (RAM)
• Four register banks
• 128 user-defined software flags (bit-addressable locations)
• 8-bit bidirectional data bus
• 16-bit unidirectional address bus
• 32 general-purpose registers, each of 8 bits
• 16-bit timers (usually 2, but there may be more or fewer)
• Three internal and two external interrupt sources
• Four 8-bit ports (short models have two 8-bit ports)
• 16-bit program counter and data pointer
• 8051 derivatives may also have a number of special features such as UARTs, ADCs, op-amps, etc.
Architecture of 8051 Microcontroller:
• CPU: CPU is the brain of any processing device of the microcontroller. It
monitors and controls all operations that are performed on the
Microcontroller units.
• Interrupts: An interrupt gives us a mechanism to put the ongoing operations on hold, execute a subroutine (the interrupt service routine), and then resume the original operations. There are five interrupt sources in the 8051 microcontroller.
• Memory: A microcontroller requires a program, which is a collection of instructions, stored in the ROM (program memory); it also requires memory (RAM) to store data or operands temporarily.
✓ The 8051 microcontroller has 4K bytes of code (program) memory in on-chip ROM and 128 bytes of data memory (RAM).
Architecture of 8051 Microcontroller:
• BUS: A bus is a collection of wires which works as a communication channel or medium for the transfer of data. These buses consist of 8, 16 or more wires.
✓Address Bus: The 8051 has a 16-bit address bus. It is used to address memory locations, i.e., to transfer an address from the CPU to memory. The 8051 has five addressing modes, which are:
✓ Immediate addressing modes.
✓ Register indirect addressing mode.
✓ Bank address (or) Register addressing mode.
✓ Indexed or memory indirect Addressing mode
✓ Direct Addressing mode.
Architecture of 8051 Microcontroller:
✓Data Bus: Microcontroller 8051 has 8 bits of the data bus, which is used to
carry data of particular applications.
• Oscillator: Microcontroller 8051 has an on-chip oscillator which works
as a clock source for Central Processing Unit of the microcontroller.
Typical frequency of 8051 MC is 11.0592MHz.
• Input/Output Port: To connect the microcontroller to other machines, devices or peripherals, we require I/O interfacing ports. The 8051 has four input/output ports. When external memory is used, two of these ports carry the address and data: 8 bits of data and the lower 8 bits of the address are multiplexed on one port, while another port carries the upper address byte.
Architecture of 8051 Microcontroller:
• Timers/Counters: The 8051 microcontroller has two 16-bit timers/counters. Each is accessed as a pair of 8-bit registers (for example, TH0/TL0). The timers are used to measure time intervals, for example to determine the pulse width of pulses.
Applications of 8051 Microcontroller:
• Light sensing and controlling devices
• Fire detection and safety devices
• Temperature sensing and controlling devices
• Automobile applications
• Defense applications
• Industrial instrumentation devices
• Process control devices
• Voltmeter applications
• Current meter objects
• Measuring and revolving objects
• Hand-held metering systems
8051 Microcontroller Applications in
Embedded Systems
• Arduino Managed High Sensitive LDR based Power Saver for Street Light Control System
• The Temperature Humidity Monitoring System of Soil Based on Wireless Sensor Networks
• Arduino RFID based Electronic Passport System for Easy Governance using Arduino
• Arduino based RFID Sensed Device Access
• Arduino based DC Motor Speed Control
• Arduino Based Line Following Robot
• Zigbee based Automatic Meter Reading System
• Parking Availability Indication System
• Voice Controlled Home Appliances
• Remote Control Home Appliances
• PC Mouse operated Electrical Load Control Using VB Application
• Solar Highway Lighting System with Auto Turn Off in Daytime
Basic PIN setup
• PIN 9: PIN 9 is the reset pin which is used to reset
the microcontroller’s internal registers and ports
upon starting up.
• PINS 18 & 19: The 8051 has a built-in oscillator
amplifier hence we need to only connect a crystal at
these pins to provide clock pulses to the circuit.
• PIN 40 and 20: Pins 40 and 20 are VCC and ground
respectively.
• PINS 29, 30 & 31:
• Pin 29 (PSEN): when an external program ROM is used, this pin goes to logic 0 to indicate that the microcontroller is reading code from the external memory.
• Pin 30 (ALE): this pin is used for ALE, that is, Address Latch Enable.
• Pin 31 (EA): if both internal and external program memories are used, applying logic 1 to this pin instructs the microcontroller to read code first from the internal memory and afterwards from the external memory.
Hardware Connection of Pins
• Vcc − Pin 40 provides supply to the Chip and it is +5 V.
• Gnd − Pin 20 provides ground for the Reference.
• XTAL1, XTAL2 (Pin no 18 & Pin no 19) − The 8051 has an on-chip oscillator but requires an external clock source to run it. A quartz crystal is connected between the XTAL1 & XTAL2 pins of the chip.
• RST (Pin No. 9) − It is an input pin and an active-high pin.
• EA or External Access (Pin No. 31) − It is an input pin and is active low; when it is tied low, all program fetches are directed to external memory.
• PSEN or Program Store Enable (Pin No 29) − This is also an active-low pin, i.e., it is driven low during external program-memory fetches.
• ALE or (Address Latch Enable) − This is an Output Pin and is active high. It is
especially used for 8031 IC to connect it to the external memory.
Embedded Systems - Terms
▪ Program Counter:
• The Program Counter is a 16- or 32-bit register which contains the address of
the next instruction to be executed.
• The PC automatically increments to the next sequential memory location
every time an instruction is fetched.
• Activating a power-on reset will cause all values in the register to be lost.
▪ Reset Vector:
• The significance of the reset vector is that it points the processor to the
memory address which contains the firmware's first instruction.
• Upon reset, the processor loads the Program Counter (PC) with the reset
vector value from a predefined memory location.
Embedded Systems - Terms
• Stack Pointer:
• Stack is implemented in RAM and a CPU register is used to access it called SP
(Stack Pointer) register.
• Stack Pointer register is an 8-bit register and can address memory addresses
of range 00h to FFh.
• When the content of a CPU register is stored in a stack, it is called a PUSH
operation. When the content of a stack is stored in a CPU register, it is called a
POP operation.
Embedded Systems - Terms
• Infinite Loop:
• An infinite loop or an endless loop can be identified as a sequence of
instructions in a computer program that executes endlessly in a loop, because
of the following reasons:
• loop with no terminating condition
• loop with a terminating condition that can never be met
• loop with a terminating condition that causes the loop to start over
• Embedded systems, unlike a PC, never "exit" an application. They idle
through an Infinite Loop waiting for an event to take place in the form
of an interrupt, or a pre-scheduled task.
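A minimal sketch of such an idle (super) loop in C; read_sensor() and update_output() are hypothetical placeholders for whatever polling or event handling the application performs.

    #include <stdint.h>

    /* Hypothetical hardware-access helpers, for illustration only. */
    extern uint8_t read_sensor(void);
    extern void    update_output(uint8_t value);

    int main(void)
    {
        /* one-time hardware initialisation would go here */

        for (;;) {                            /* the infinite (super) loop */
            uint8_t sample = read_sensor();   /* poll an input             */
            update_output(sample);            /* act on it                 */
            /* the loop never exits; the firmware idles here forever */
        }
        /* never reached */
    }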
Embedded Systems - Terms
• Interrupts:
• Interrupts are mostly hardware mechanisms that instruct the program that an
event has occurred.
• Handled by a corresponding Interrupt Service Routine (ISR).
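A minimal sketch of this pattern: the ISR only records the event in a volatile flag and the main loop does the work. The syntax assumes the Keil C51 compiler ("interrupt 0" attaches the routine to external interrupt 0); the flag name and the event-handling placeholder are illustrative only.

    #include <reg51.h>

    static volatile unsigned char event_flag = 0;   /* set by the ISR */

    /* ISR for external interrupt 0 (vector number 0 in Keil C51 syntax). */
    void ext0_isr(void) interrupt 0
    {
        event_flag = 1;              /* just record that the event happened */
    }

    void main(void)
    {
        EX0 = 1;                     /* enable external interrupt 0 */
        EA  = 1;                     /* global interrupt enable     */

        while (1) {                  /* infinite loop waits for events */
            if (event_flag) {
                event_flag = 0;
                /* handle the event outside the ISR */
            }
        }
    }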
Embedded Systems - Assembly Language
• Assembly languages were developed to provide mnemonics or
symbols for the machine level code instructions.
• Assembler is a program which converts assembly language into
machine code.
• A compiler translates a high-level language into machine code.
• A program language instruction consists of the following four fields: [label:] mnemonic [operands] [;comment]
Embedded Systems - Assembly Language

• The label field allows the program to refer to a line of code by name.
The label fields cannot exceed a certain number of characters.
• The mnemonics and operands fields together perform the real work
of the program and accomplish the tasks.
Embedded Systems - Assembly Language
Assembling and Running an 8051 Program:
1. First, we use an editor to type in the program. The editor must be able to produce an ASCII file. The source file is saved with the ".asm" extension, which the assembler expects in the next step.
2. The ".asm" source file contains the program code created in Step 1. It is fed to an 8051 assembler, which produces an ".obj" file (object file) and a ".lst" file (list file).
3. Assemblers require a third step called linking. The linker program takes one or more object files and produces an absolute object file with the extension ".abs".
4. Next, the ".abs" file is fed to a program called "OH" (object-to-hex converter), which creates a file with the extension ".hex" that is ready to burn into the ROM.
Flow: editor, source file (.asm), assembler, .obj and .lst files, linker, .abs file, OH program, .hex file.
Why Use HEX Files in Embedded Systems?
▪ Error Detection:
• The checksum ensures data integrity during transmission or loading.
▪ Human Readability:
• Unlike binary files, HEX files can be inspected and debugged manually.
▪ Ease of Parsing:
• Programming tools and bootloaders can easily parse the structured format.
▪ Target Compatibility:
• Memory addresses in HEX files ensure the firmware is loaded into the correct
locations in the target device.
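As a sketch of the error-detection point above: each Intel HEX record ends with a checksum byte, the two's complement of the low byte of the sum of the record's other bytes. The helper below (the name hex_checksum is illustrative) computes that value once the ASCII record has been parsed into raw bytes.

    #include <stdint.h>
    #include <stddef.h>

    /* Intel HEX checksum: two's complement of the low byte of the sum of all
       record bytes (byte count, address bytes, record type and data bytes).
       A record is valid when the stored checksum equals this value, i.e. when
       the sum of every byte including the checksum is 0x00 modulo 256. */
    static uint8_t hex_checksum(const uint8_t *record_bytes, size_t n)
    {
        uint8_t sum = 0;
        for (size_t i = 0; i < n; i++)
            sum = (uint8_t)(sum + record_bytes[i]);
        return (uint8_t)(0x100u - sum);   /* two's complement of the low byte */
    }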
Data Type
• The 8051 microcontroller contains a single data type of 8-bits, and
each register is also of 8-bits size.
• The programmer has to break down data larger than 8-bits (00 to
FFH, or to 255 in decimal) so that it can be processed by the CPU.
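A small illustration in C of breaking down data larger than 8 bits (the helper name split16 is illustrative): a 16-bit value is handed to the 8-bit CPU as separate high and low bytes.

    #include <stdint.h>

    /* The 8051 works on 8-bit quantities, so a 16-bit value must be
       split into a low byte and a high byte before it is processed. */
    static void split16(uint16_t value, uint8_t *high, uint8_t *low)
    {
        *low  = (uint8_t)(value & 0xFFu);   /* e.g. 0x12F7 -> 0xF7 */
        *high = (uint8_t)(value >> 8);      /* e.g. 0x12F7 -> 0x12 */
    }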
▪ DB (Define Byte):
• The DB directive is the most widely used data directive in the assembler.
• It is used to define the 8-bit data.
• It can also be used to define data in decimal, binary, hex, or ASCII formats.
• For decimal, the "D" suffix after the number is optional, but the "B" (binary) and "H" (hexadecimal) suffixes are required.
Assembler Directives
• Some of the directives of 8051 are as follows:
1. ORG (origin):
• The origin directive is used to indicate the beginning address of the code or data that follows it.
• It takes numbers in hexadecimal or decimal format.
2. EQU (equate)
• It is used to define a constant without occupying a memory location.
• EQU associates a constant value with a data label so that, wherever the label appears in the program, its constant value is substituted for the label.
3. END directive:
• It indicates the end of the source (asm) file.
• The END directive is the last line of the program; anything after the END directive is
ignored by the assembler.
Embedded Systems - Registers
• Registers are used in the CPU to temporarily store information, which could be data to be processed or an address pointing to data that is to be fetched.
• In the 8051, there is one data type: 8 bits.
• The most widely used registers of the 8051 are A (accumulator), B, R0-R7, DPTR (data pointer), and PC (program counter).
Storage Registers in 8051
• Accumulator
• Data Pointer (DPTR)
• B register
• Program Counter (PC)
• R register (R0 – R7)
• Stack Pointer (SP)
Storage Registers in 8051
▪ Accumulator:
• The accumulator, register A, is used for all arithmetic and logic
operations.
• If the accumulator is not present, then every result of each
calculation (addition, multiplication, shift, etc.) is to be stored into the
main memory.
▪ The "R" Registers
• The "R" registers are a set of eight registers, namely, R0, R1 to R7.
• These registers function as auxiliary or temporary storage registers in
many operations.
Storage Registers in 8051
▪ The "B" Register:
• The "B" register is very similar to the Accumulator in the sense that it
may hold an 8bit(1-byte) value.
• The "B" register is used only by two 8051 instructions: MUL AB and
DIV AB.
▪ The Data Pointer:
• The Data Pointer (DPTR) is the 8051’s only user-accessible 16-bit (2-
byte) register.
• It is used by the 8051 to access external memory using the address
indicated by DPTR.
Storage Registers in 8051
▪ The Program Counter:
• The Program Counter (PC) is a 2-byte address which tells the 8051
where the next instruction to execute can be found in the memory.
▪ The Stack Pointer (SP):
• The Stack Pointer, like all registers except DPTR and PC, may hold an
8-bit (1-byte) value.
• The Stack Pointer tells the location from where the next value is to be
removed from the stack.
Embedded Systems - Addressing Modes
▪ An addressing mode refers to how you specify the location of an operand for a given instruction. The 8051 supports five different addressing modes, which are as follows −
• Immediate addressing mode
• Direct addressing mode
• Register direct addressing mode
• Register indirect addressing mode
• Indexed addressing mode
Embedded Systems - Addressing Modes
▪ Immediate Addressing Mode:
Here, the source operand is a constant that is included in the instruction itself; immediate data is written with a "#" prefix (for example, MOV A, #25H).
Embedded Systems - Addressing Modes
▪ Direct Addressing Mode:
Here, the address of the data (source data) is given as an operand.
Embedded Systems - Addressing Modes
▪ Register Direct Addressing Mode:
• In this addressing mode, we use the register name directly (as the source operand).
Example: MOV A, R4 (opcode ECH, 1 byte, 1 cycle).
Embedded Systems - Addressing Modes
▪ Register Indirect Addressing Mode:
• In this addressing mode, the address of the data is held in a register, and that register is given as the operand.
• Only R0 and R1 are allowed to form a register indirect addressing instruction.
Embedded Systems - Addressing Modes
▪ Indexed Addressing Mode:
Here, the effective address is formed by adding the accumulator to a base register (DPTR or PC), as in MOVC A, @A+DPTR; this mode is used for accessing look-up tables held in program memory.
Embedded Systems - Timer/Counter
• A timer is a specialized type of clock which is used to measure time
intervals.
• It is a device that counts down from a specified time interval and used
to generate a time delay, for example, an hourglass is a timer.

• A counter is a device that stores the number of times a particular event or process has occurred, with respect to a clock signal.
• In electronics, counters can be implemented quite easily using register-type circuits such as flip-flops.
Embedded Systems – Interrupt
• An interrupt is a signal to the processor emitted by hardware or
software indicating an event that needs immediate attention.
• Whenever an interrupt occurs, the controller completes the execution
of the current instruction and starts the execution of an Interrupt
Service Routine (ISR) or Interrupt Handler.
• ISR tells the processor or controller what to do when the interrupt
occurs.
• The interrupts can be either hardware interrupts or software
interrupts.
Embedded Systems – Interrupt
• Hardware Interrupt: A hardware interrupt is an electronic alerting
signal sent to the processor from an external device, like a disk
controller or an external peripheral.
• Software Interrupt: A software interrupt is caused either by an
exceptional condition or a special instruction in the instruction set
which causes an interrupt when it is executed by the processor.
• Software interrupt instructions work similarly to subroutine calls.
Steps to Execute an Interrupt
▪ When an interrupt gets active, the microcontroller goes through the
following steps −
• The microcontroller closes the currently executing instruction and saves the address
of the next instruction (PC) on the stack.
• It also saves the current status of all the interrupts internally (i.e., not on the stack).
• It jumps to the memory location of the interrupt vector table that holds the address
of the interrupts service routine.
• The microcontroller gets the address of the ISR from the interrupt vector table and jumps to it. It starts to execute the interrupt service routine until it reaches the last instruction of the subroutine, which is RETI (return from interrupt).
• Upon executing the RETI instruction, the microcontroller returns to the location where it was interrupted. First, it gets the program counter (PC) address from the stack by popping the top bytes of the stack into the PC. Then it starts to execute from that address.
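For reference, the jump described above lands at a fixed program-memory address for each interrupt source; the constants below list the standard 8051 vector locations (the enum name is illustrative).

    /* Standard 8051 interrupt vector addresses (program-memory locations
       the CPU jumps to when the corresponding interrupt is accepted). */
    enum isr_vector_8051 {
        VEC_RESET  = 0x0000,  /* reset                        */
        VEC_EXT0   = 0x0003,  /* external interrupt 0 (INT0)  */
        VEC_TIMER0 = 0x000B,  /* timer 0 overflow             */
        VEC_EXT1   = 0x0013,  /* external interrupt 1 (INT1)  */
        VEC_TIMER1 = 0x001B,  /* timer 1 overflow             */
        VEC_SERIAL = 0x0023   /* serial port (RI/TI)          */
    };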
History of ARM Series of Microcontroller
• Architectural idea developed in 1983 by Acorn Computers
• To replace the 8-bit 6502 microprocessors in BBC computers
• The first commercial RISC implementation
• The company was founded in 1990
• ARM: Advanced RISC Machine
• Initially owned by Acorn, Apple and VLSI
Why ARM?
• One of the widely used processor cores.
• Some applications:
• ARM7: iPod
• ARM9: BenQ, Sony Ericsson
• ARM11: Apple iPhone, Nokia N93
• About 90% of 32-bit embedded RISC processors were ARM-based (as of 2010)
• Mainly used in battery operated
devices:
• Due to low power consumption and
reasonably good performance
About ARM Processors
• A simple RISC-based architecture with a powerful design
• Design philosophy:
• Small processor for lower power consumption
• High code density for limited memory and physical size restrictions
• Can interface with slow and low-cost memory systems
• Reduced die size for processor to accommodate more peripherals
Popular ARM architectures
• ARM7
• 3 pipeline stages (fetch/decode/execute)
• High code density/lower power consumption
• Most widely used for low-end systems
• ARM9
• Compatible with ARM7
• 5 stages (fetch/decode/execute/memory/write)
• Separate instruction and data cache
• ARM10
• 6 stages (fetch/issue/decode/execute/memory/write)
ARM Family comparison
ARM is based on the RISC Architecture
• Major design feature:
• Instructions: reduced set/single cycle/fixed length
• Pipeline: decode in one stage/no need for microcode
• Registers: large number of general-purpose registers (GPRs)
• Load/Store Architecture: data processing instructions work on registers only;
load/store instructions to transfer data from/to memory
ARM Feature:
• ARM architecture is different from pure RISC:
• Variable cycle execution for certain instructions
• Multiple-register load/store for higher code density
• In-line barrel shifter results in more instructions
• Improves performance and code density
• Thumb 16-bit instruction set
• Results in improvement in code density by about 30%
• Conditional execution
• Reduces branch and improves performance
• Enhanced instructions
• Some DSP instructions are present
ARM architecture
• ARM processors require fewer transistors; as a result:
• The cost will be decreased.
• Power consumption will be reduced which leads to less heat dissipation.
• These characteristics are desirable for light, portable (PDA’s), battery-powered
devices, including smartphones, laptops and tablet computers, and other
embedded systems.
• Not only portable devices, for supercomputers, which consume large
amounts of electricity, ARM could also be a power-efficient solution.
• The ARM processor core is a key component of many successful 32-bit
embedded systems.
ARM
• The ARM instruction set differs from the pure RISC definition in several ways that make the ARM instruction set suitable for embedded applications.
• Variable cycle execution for certain instructions - Not every ARM instruction executes in a single cycle.
• Inline barrel shifter leading to more complex instructions - It is a hardware component that preprocesses one of the input registers before it is used by an instruction.
• Thumb 16-bit instruction set - It permits the ARM core to execute either 16-bit (Thumb) or 32-bit (ARM) instructions.
• Conditional execution - An instruction is only executed when a specific condition has been satisfied.
• Enhanced instructions - The enhanced digital signal processor (DSP) instructions were added to the standard ARM instruction set to support fast 16 × 16-bit multiply operations and saturation arithmetic.
Figure: An example of an ARM based embedded device, a microcontroller
Embedded System Hardware
Embedded Device:
• ARM processor controls the embedded device. An ARM processor
comprises a core plus the surrounding components that interface it
with a bus.
• Controllers coordinate important functional blocks of the system.
• The peripherals provide all the input - output capability external to
the chip and responsible for the uniqueness of the embedded device.
• A bus is used to communicate between different parts of the device.
ARM Bus technology
• Embedded devices use an on-chip bus that is internal to the chip and that allows different peripheral devices to be interconnected with an ARM core.
• The ARM core is a bus master, and peripherals tend to be bus slaves, i.e., logical devices capable only of responding to a transfer request from a bus master device.
• A bus has two architecture levels:
• The first is a physical level that covers the electrical characteristics and bus width (16, 32 or 64 bits).
• The second level deals with protocol - the logical rules that govern the communication between the processor and a peripheral.
Advanced Microcontroller Bus Architecture
(AMBA) Bus Protocol
• The first AMBA buses introduced were the ARM System Bus (ASB) and the
ARM Peripheral Bus (APB).
• ARM high performance bus (AHB) provides higher data throughput than
ASB because it is based on a centralized multiplexed bus scheme rather
than the ASB bidirectional bus design.
• ARM introduced two variations on the AHB bus: Multi-layer AHB and AHB-
lite.
• AHB allows a single bus master to be active on the bus at any time, and the
Multi-layer AHB allows multiple active bus masters.
• AHB-lite is a subset of the AHB bus and it is limited to single bus master.
• AHB and Multi-layer AHB support the same protocol for master and slave
but have different interconnects.
Memory
• The fastest memory cache is physically
located nearer the ARM processor core
and the slowest secondary memory is set
further away.
• Generally, the closer memory is to the
processor core, the more it costs and the
smaller its capacity.
• The cache is placed between main
memory and the core. It is used to speed
up data transfer between the processor
and main memory.
Memory
• Read-only memory (ROM) is the least flexible of all memory types
because it contains an image that is permanently set at production
time and cannot be reprogrammed.
• ROMs are used in high-volume devices that require no updates or
corrections. Many devices also use a ROM to hold boot code.
• Flash ROM can be written to as well as read, but it is slow to write. Its
main use is for holding the device firmware or storing long term data
that needs to be preserved after power is off.
Memory
• Dynamic random access memory (DRAM) is the most commonly used
RAM for devices.
• It has the lowest cost per megabyte compared with other types of
RAM.
• DRAM is dynamic—it needs to have its storage cells refreshed and
given a new electronic charge every few milliseconds
Memory
• Static random access memory (SRAM) is faster than the more
traditional DRAM, but requires more silicon area.
• SRAM is static—the RAM does not require refreshing.
• The access time for SRAM is considerably shorter than the equivalent
DRAM because SRAM does not require a pause between data
accesses. Because of its higher cost, it is used mostly for smaller high-
speed tasks, such as fast memory and caches.
Memory
• Synchronous dynamic random access memory (SDRAM) is one of
many subcategories of DRAM. It can run at much higher clock speeds
than conventional memory.
• SDRAM synchronizes itself with the processor bus because it is
clocked. Internally the data is fetched from memory cells, pipelined,
and finally brought out on the bus in a burst.
• The old-style DRAM is asynchronous, so does not burst as efficiently
as SDRAM.
Peripherals
• A peripheral device performs input and output functions for the chip by connecting to other devices or sensors that are off-chip.
• Each peripheral device usually performs a single function and may reside on-chip.
• Peripherals range from a simple serial communication device to a more complex 802.11 wireless device.
• All ARM peripherals are memory mapped (see the sketch after this list).
• Controllers are specialized peripherals that implement higher levels of functionality within an embedded system, for example:
• Memory Controllers
• Interrupt controllers
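A minimal sketch of memory-mapped peripheral access, as mentioned in the list above. The UART base address, register offsets, and status bit are purely hypothetical; real values come from the specific chip's memory map.

    #include <stdint.h>

    /* Hypothetical peripheral addresses: real values come from the datasheet. */
    #define UART0_BASE   0x40001000u                  /* assumed base address */
    #define UART0_DATA   (*(volatile uint32_t *)(UART0_BASE + 0x00))
    #define UART0_STATUS (*(volatile uint32_t *)(UART0_BASE + 0x04))
    #define TX_READY     (1u << 0)                    /* assumed status bit   */

    static void uart_putc(char c)
    {
        while ((UART0_STATUS & TX_READY) == 0)   /* wait until transmitter is free */
            ;
        UART0_DATA = (uint32_t)c;                /* write to the mapped register   */
    }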
Memory Controller
• Memory controllers connect different types of memory to the
processor bus.
• On power-up a memory controller is configured in hardware to allow
certain memory devices to be active. These memory devices allow
the initialization code to be executed.
Interrupt Controller
• When a peripheral or device requires attention, it raises an interrupt to the processor.
• An interrupt controller provides a programmable governing policy that allows software to determine which peripheral or device can interrupt the processor at any specific time by setting the appropriate bits in the interrupt controller registers.
• There are two types of interrupt controller available for the ARM processor: the standard interrupt controller and the vector interrupt controller (VIC).
• The standard interrupt controller sends an interrupt signal to the processor core when an external device requests servicing.
• The interrupt handler determines which device requires servicing by reading a device bitmap register in the interrupt controller.
• The VIC is more powerful than the standard interrupt controller because it prioritizes interrupts and simplifies the determination of which device caused the interrupt.
INTRODUCTION TO ARM
Embedded System Software
• Four typical software components required to
control an embedded device
• The initialization code is the first code executed
on the board and is specific to a particular target
or group of targets.
• It sets up the minimum parts of the board
before handing control over to the
operating system.
• The operating system provides an infrastructure
to control applications and manage hardware
system resources.
• The device drivers provide a consistent software Fig. Software abstraction layers executing on hardware
interface to the peripherals on the hardware
device.
• The software components can run from ROM or
RAM. ROM code that is fixed on the device (for
example, the initialization code) is called
firmware.
Initialization (Boot) Code
• Initialization code (or boot code) takes the processor from the reset
state to a state where the operating system can run.
• It usually configures the memory controller and processor caches and
initializes some devices.
• The initialization code handles a number of administrative tasks prior
to handing control over to an operating system image.
• Different tasks into three phases:
• Initial hardware configuration
• Diagnostics
• Booting.
Initialization (Boot) Code - Initial hardware
configuration
• Initial hardware configuration involves setting up the target platform
so it can boot an image.
• The target platform itself comes up in a standard configuration; this configuration normally requires modification to satisfy the requirements of the booted image.
• For example, the memory system normally requires reorganization of the memory map.
Initialization (Boot) Code - Diagnostics
• Diagnostics are often embedded in the initialization code.
• Diagnostic code tests the system by exercising the hardware target to
check if the target is in working order.
• It also tracks down standard system-related issues.
• The primary purpose of diagnostic code is fault identification and
isolation.
Initialization (Boot) Code - Booting
• Booting involves loading an image and handing control over to that
image.
• Booting an image is the final phase, but first you must load the image.
• Loading an image involves anything from copying an entire program
including code and data into RAM, to just copying a data area
containing volatile variables into RAM.
• Once booted, the system hands over control by modifying the
program counter to point into the start of the image.
Fig. Memory remapping
It is common for ARM-based embedded systems to provide for memory remapping because it allows the system to
start the initialization code from ROM at power-up.
Operating System
• The initialization process prepares the hardware for an operating
system to take control.
• An operating system organizes the system resources:
• Peripherals
• Memory
• Processing time.
• ARM processors support over 50 operating systems.
• We can divide operating systems into two main categories:
• real-time operating systems (RTOSs)
• platform operating systems.
Operating System
• RTOSs provide guaranteed response times to events.
• Systems running an RTOS generally do not have secondary storage.
• Platform operating systems require a memory management unit to
manage large, nonreal-time applications and tend to have secondary
storage.
Applications
• ARM processors are found in numerous market segments, including networking,
automotive, mobile and consumer devices, mass storage, and imaging.
• In contrast, ARM processors are not found in applications that require leading-edge high performance, because these applications tend to be low volume and high cost.
ARM Processor Fundamentals

the arrows represent the flow of


data, the lines represent the
buses, and the boxes represent
either an operation unit or a
storage area.

Fig. ARM core dataflow model.


ARM Processor Fundamentals
• The ARM processor, like all RISC processors, uses a load-store
architecture.
• It has two instruction types for transferring data in and out of the
processor:
• Load instructions copy data from memory to registers in the core
• Store instructions copy data from registers to memory.
• Data processing is carried out solely in registers.
• ARM instructions typically have two source registers, Rn and Rm, and
a single result or destination register, Rd.
ARM Processor Fundamentals - Registers
• General-purpose registers hold either data or an address.
• They are identified with the letter r prefixed to the register
number.
• All the registers shown are 32 bits in size.
• There are up to 18 active registers: 16 data registers and 2
processor status registers.
• The data registers are visible to the programmer as r0 to r15.

Fig. Registers available in user mode


ARM Processor Fundamentals - Registers
• The ARM processor has three registers assigned to a particular
task or special function:
• r13, r14, and r15.
• Register r13 is traditionally used as the stack pointer (sp) and
stores the head of the stack in the current processor mode.
• Register r14 is called the link register (lr) and is where the
core puts the return address whenever it calls a subroutine.
• Register r15 is the program counter (pc) and contains the
address of the next instruction to be fetched by the
processor.
Fig. Registers available in user mode
ARM Processor Fundamentals - Registers
• It is dangerous to use r13 as a general register when the processor is
running any form of operating system because operating systems
often assume that r13 always points to a valid stack frame.
• In ARM state the registers r0 to r13 are orthogonal—any instruction
that you can apply to r0 you can equally well apply to any of the other
registers.
• In addition to the 16 data registers, there are two program status
registers: cpsr and spsr (the current and saved program status
registers)
Current Program Status Register
• The ARM core uses the cpsr to monitor and control internal
operations.
• The cpsr is a dedicated 32-bit register and resides in the register file.
• The cpsr is divided into four fields, each 8 bits wide: flags, status,
extension, and control.

Fig. A generic program status register (psr)


Processor Modes
• The processor mode determines which registers are active and the
access rights to the cpsr register itself.
• Two processor modes:
• A privileged mode allows full read-write access to the cpsr.
• A nonprivileged mode only allows read access to the control field in the cpsr
but still allows read-write access to the condition flags.
• There are seven processor modes in total: six privileged modes
(abort, fast interrupt request, interrupt request, supervisor, system,
and undefined) and one nonprivileged mode (user).
Processor Modes
• The processor enters abort mode when there is a failed attempt to access
memory.
• Fast interrupt request and interrupt request modes correspond to the two
interrupt levels available on the ARM processor.
• Supervisor mode is the mode that the processor is in after reset and is
generally the mode that an operating system kernel operates in.
• System mode is a special version of user mode that allows full read-write
access to the cpsr.
• Undefined mode is used when the processor encounters an instruction
that is undefined or not supported by the implementation.
• User mode is used for programs and applications.
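For reference, the processor mode is selected by the bottom five bits of the cpsr control field; the values below are the standard ARMv4/ARMv5 mode encodings, written as a C enum for readability.

    /* Mode bit encodings found in cpsr[4:0] (standard ARMv4/ARMv5 values). */
    enum arm_mode {
        MODE_USR = 0x10,   /* user (nonprivileged)    */
        MODE_FIQ = 0x11,   /* fast interrupt request  */
        MODE_IRQ = 0x12,   /* interrupt request       */
        MODE_SVC = 0x13,   /* supervisor              */
        MODE_ABT = 0x17,   /* abort                   */
        MODE_UND = 0x1B,   /* undefined               */
        MODE_SYS = 0x1F    /* system                  */
    };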
Banked Registers
The figure below shows all 37 registers in the register file.
Of those, 20 registers are hidden from a program at different times. These registers are called banked registers and are identified by the shading in the diagram.
Banked registers of a particular mode are denoted by an underline character post-fixed to the mode mnemonic, i.e., _mode.

Fig. Complete ARM register set.


Banked Registers
• Every processor mode except user mode can change mode by writing
directly to the mode bits of the cpsr.
• All processor modes except system mode have a set of associated
banked registers that are a subset of the main 16 registers.
• A banked register maps one-to-one onto a user mode register.
Banked Registers
• The following exceptions and interrupts cause a mode change: reset,
interrupt request, fast interrupt request, software interrupt, data
abort, prefetch abort, and undefined instruction.
• Exceptions and interrupts suspend the normal execution of sequential
instructions and jump to a specific location.
Banked Registers
• A new register appearing in interrupt request mode:
the saved program status register (spsr), which stores
the previous mode’s cpsr.
• The cpsr being copied into spsr_irq.
• To return to user mode, a special return instruction is used that instructs the core to restore the original cpsr from the spsr_irq and bank in the user registers r13 and r14.
• Note that the spsr can only be modified and read in a
privileged mode. There is no spsr available in user
mode.

Fig. Changing mode on an exception.


Banked Registers
Banked Registers
• Important feature to note is that the cpsr is not copied into the spsr
when a mode change is forced due to a program writing directly to
the cpsr.
• The saving of the cpsr only occurs when an exception or interrupt is
raised.
State and Instruction Sets
• There are three instruction sets:
• ARM
• Thumb
• Jazelle
• The ARM instruction set is only active when the processor is in ARM
state.
• The Thumb instruction set is only active when the processor is in
Thumb state.
• The Jazelle J and Thumb T bits in the cpsr reflect the state of the
processor.
State and Instruction Sets
• When both J and T bits are 0, the processor is in ARM state and executes
ARM instructions.
• When the T bit is 1, then the processor is in Thumb state.
• The ARM designers introduced a third instruction set called Jazelle.
• Jazelle executes 8-bit instructions and is a hybrid mix of software and
hardware designed to speed up the execution of Java bytecodes.
• To execute Java bytecodes, the Jazelle technology is required, plus a
specially modified version of the Java virtual machine.
• The Jazelle instruction set is a closed instruction set and is not openly
available
ARM and Thumb instruction set features
Jazelle instruction set features
Interrupt Masks
• Interrupt masks are used to stop specific interrupt requests from
interrupting the processor.
• Two interrupt request levels available on the ARM processor core
• Interrupt request (IRQ)
• fast interrupt request (FIQ).
• The cpsr has two interrupt mask bits, 7 and 6 (or I and F ), which
control the masking of IRQ and FIQ, respectively.
• The I bit masks IRQ when set to binary 1, and similarly the F bit masks
FIQ when set to binary 1.
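A minimal sketch of masking IRQ by setting bit 7 (the I bit) of the cpsr, assuming a GCC-style ARM toolchain and that the code is already running in a privileged mode (the cpsr control field cannot be written from user mode).

    #include <stdint.h>

    #define CPSR_I_BIT (1u << 7)   /* masks IRQ when set */
    #define CPSR_F_BIT (1u << 6)   /* masks FIQ when set */

    static void disable_irq(void)
    {
        uint32_t cpsr;
        __asm__ volatile ("mrs %0, cpsr" : "=r" (cpsr));    /* read cpsr              */
        cpsr |= CPSR_I_BIT;                                 /* set the I (IRQ) mask   */
        __asm__ volatile ("msr cpsr_c, %0" : : "r" (cpsr)); /* write the control field */
    }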
Condition Flags
In this notation, an upper-case letter means the corresponding flag or bit is set and a lower-case letter means it is clear; the suffix gives the current processor mode.
Fig. Example: cpsr = nzCvqjiFt_SVC (the C flag and the F mask bit are set; the processor is in ARM state, supervisor mode).

Conditional Execution
Conditional execution controls whether or not the core will execute an instruction.
Pipeline
• A pipeline is the mechanism a RISC processor uses to execute
instructions.
• Using a pipeline speeds up execution by fetching the next instruction
while other instructions are being decoded and executed.

Fig. ARM7 Three-stage pipeline

▪ Fetch loads an instruction from memory.


▪ Decode identifies the instruction to be executed.
▪ Execute processes the instruction and writes the result back to a
register.
Pipeline

This procedure is called filling the pipeline.


Fig. Pipelined instruction sequence

Fig. ARM9 five-stage pipeline.

Fig. ARM10 six-stage pipeline.


Pipeline
• The pipeline allows the core to execute an instruction every cycle.
• As the pipeline length increases, the amount of work done at each
stage is reduced, which allows the processor to attain a higher
operating frequency.
• The system latency also increases because it takes more cycles to fill
the pipeline before the core can execute an instruction.
Pipeline Executing Characteristics
• Although the ARM9 and ARM10 pipelines are different, they still use the same pipeline executing characteristics as an ARM7.
• Code written for the ARM7 will execute on an ARM9 or ARM10.

Fig. ARM instruction sequence


Exceptions, Interrupts, and the Vector Table
• When an exception or interrupt occurs, the processor sets the pc to a
specific memory address.
• The address is within a special address range called the vector table.
• When an exception or interrupt occurs, the processor suspends
normal execution and starts loading instructions from the exception
vector table.
Exceptions, Interrupts, and the Vector Table
• Reset vector is the location of the first instruction executed by the processor when power is
applied. This instruction branches to the initialization code.
• Undefined instruction vector is used when the processor cannot decode an instruction.
• Software interrupt vector is called when you execute a SWI instruction. The SWI instruction is
frequently used as the mechanism to invoke an operating system routine.
• Prefetch abort vector occurs when the processor attempts to fetch an instruction from an
address without the correct access permissions. The actual abort occurs in the decode stage.
• Data abort vector is similar to a prefetch abort but is raised when an instruction attempts to
access data memory without the correct access permissions.
• Interrupt request vector is used by external hardware to interrupt the normal execution flow of the processor. It can only be raised if IRQs are not masked in the cpsr.
• Fast interrupt request vector is similar to the interrupt request vector but is reserved for hardware requiring a faster response time. It can only be raised if FIQs are not masked in the cpsr.
Exceptions, Interrupts, and the Vector Table
The vector table

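For reference, the vector table occupies one 32-bit word per exception or interrupt source, starting at address 0x00000000 (the conventional low-vector placement); the constants below list the standard ARM vector addresses as a C enum.

    /* Standard (low) ARM exception vector addresses, one 32-bit entry each. */
    enum arm_vector {
        ARM_VEC_RESET          = 0x00000000,
        ARM_VEC_UNDEFINED      = 0x00000004,
        ARM_VEC_SWI            = 0x00000008,   /* software interrupt */
        ARM_VEC_PREFETCH_ABORT = 0x0000000C,
        ARM_VEC_DATA_ABORT     = 0x00000010,
        ARM_VEC_RESERVED       = 0x00000014,
        ARM_VEC_IRQ            = 0x00000018,
        ARM_VEC_FIQ            = 0x0000001C
    };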
Core Extensions
• Placed next to the ARM Core.
• Improve performance, manage resources, and provide extra functionality and
are designed to provide flexibility in handling particular applications.
• Three hardware extensions ARM wraps around the core:
• cache and tightly coupled memory
• memory management
• the coprocessor interface.
• Cache and Tightly Coupled Memory
• The cache is a block of fast memory placed between main memory and the
core.
• Most ARM-based embedded systems use a single-level cache internal to the
processor.
Core Extensions - Cache and Tightly Coupled
Memory

• ARM has two forms of cache.


1. Attached to the Von Neumann–style cores.
• It combines both data and instructions into a single unified cache.
• Glue logic connects the memory system to the AMBA bus logic and control.
Core Extensions - Cache and Tightly Coupled
Memory
2. Attached to the Harvard-style cores, has separate caches for data
and instruction.
• A cache provides an overall increase in performance but at the
expense of predictable execution.
• For real-time systems it is paramount that code execution is
deterministic— the time taken for loading and storing instructions
or data must be predictable.
• Tightly coupled memory (TCM)
Core Extensions - Cache and Tightly Coupled
Memory
• Tightly coupled memory (TCM)
• TCM is fast SRAM located close to
the core and guarantees the clock
cycles required to fetch instructions
or data—critical for real-time
algorithms requiring deterministic
behavior.
• TCMs appear as memory in the
address map and can be accessed
as fast memory.
Fig. A simplified Harvard architecture with caches and TCMs.
Memory Management
• Embedded systems often use multiple memory devices.
• To protect the system from applications trying to make inappropriate
accesses to hardware, memory management hardware is used.
• ARM cores have three different types of memory management
hardware:
• no extensions providing no protection
• a memory protection unit (MPU) providing limited protection
• a memory management unit (MMU) providing full protection
Memory Management
• Nonprotected memory is fixed and provides very little flexibility. It is
normally used for small, simple embedded systems that require no
protection from rogue applications.
• MPUs employ a simple system that uses a limited number of memory
regions. These regions are controlled with a set of special coprocessor
registers, and each region is defined with specific access permissions. This
type of memory management is used for systems that require memory
protection but don’t have a complex memory map.
• MMUs are the most comprehensive memory management hardware
available on the ARM. The MMU uses a set of translation tables to provide
fine-grained control over memory. MMUs are designed for more
sophisticated platform operating systems that support multitasking.
Coprocessors
• Coprocessors can be attached to the ARM processor.
• A coprocessor extends the processing features of a core by extending
the instruction set or by providing configuration registers.
• The coprocessor can be accessed through a group of dedicated ARM
instructions that provide a load-store type interface.
• For example, coprocessor 15: The ARM processor uses coprocessor 15
registers to control the cache, TCMs, and memory management.
• The coprocessor can also extend the instruction set by providing a
specialized group of new instructions.
Nomenclature

x—family
y—memory management/protection unit
z—cache
T—Thumb 16-bit decoder
D—JTAG debug
M—fast multiplier
I—EmbeddedICE macrocell
E—enhanced instructions
J—Jazelle
F—vector floating-point unit
S—synthesizable version
Description of the cpsr
ARM family attribute comparison
ARM processor variants.
Summary
• The ARM processor can be abstracted into eight components—ALU,
barrel shifter, MAC, register file, instruction decoder, address register,
incrementer, and sign extend.
• ARM has three instruction sets—ARM, Thumb, and Jazelle.
• The register file contains 37 registers, but only 17 or 18 registers are
accessible at any point in time; the rest are banked according to
processor mode.
• The current processor mode is stored in the cpsr.
• It holds the current status of the processor core as well interrupt masks,
condition flags, and state. The state determines which instruction set is being
executed.
Summary
• An ARM processor comprises a core plus the surrounding components that
interface it with a bus. The core extensions include the following:
• Caches are used to improve the overall system performance.
• TCMs are used to improve deterministic real-time response.
• Memory management is used to organize memory and protect system resources.
• Coprocessors are used to extend the instruction set and functionality. Coprocessor
15 controls the cache, TCMs, and memory management.
• An ARM processor is an implementation of a specific instruction set
architecture (ISA).
• Processors are grouped into implementation families (ARM7, ARM9,
ARM10, and ARM11) with similar characteristics.
INTRODUCTION TO THE ARM
INSTRUCTION SET
The ARM Instruction Set
• ARM instructions process data held in registers and only
access memory with load and store instructions.
The ARM Instruction Set
• ARM instructions commonly take two or three operands.

• The function and syntax of the ARM instructions by instruction class—


data processing instructions, branch instructions, load-store
instructions, software interrupt instruction, and program status
register instructions.
Data Processing Instructions
• The data processing instructions manipulate data within registers.
• They are
• move instructions
• arithmetic instructions
• logical instructions
• comparison instructions
• multiply instructions.
• If you use the S suffix on a data processing instruction, then it updates the flags in
the cpsr.
• Move and logical operations update the carry flag C, negative flag N, and zero
flag Z.
• The carry flag is set from the result of the barrel shift as the last bit shifted out.
• The N flag is set to bit 31 of the result.
• The Z flag is set if the result is zero.
Data Processing Instructions
▪ Move Instructions
Data Processing Instructions
▪ Barrel Shifter
• There are data processing instructions that do not
use the barrel shift, for example, the MUL
(multiply), CLZ (count leading zeros), and QADD
(signed saturated 32-bit add) instructions.
• Pre-processing or shift occurs within the cycle
time of the instruction.
Data Processing Instructions
Barrel shifter operations
Data Processing Instructions

Fig. Logical shift left by one


Data Processing Instructions
Barrel shift operation syntax for data processing instructions
Data Processing Instructions

*The C flag is updated in the cpsr because the S suffix is present in the instruction mnemonic.
Arithmetic Instructions
• The arithmetic instructions implement addition and subtraction of 32-
bit signed and unsigned values.

N is the result of the shifter operation.


Arithmetic Instructions
simple subtract instruction
SUBS instruction is useful for
decrementing loop counters

reverse subtract instruction (RSB)


The cpsr is updated, with the Z and C flags being set.
Using the Barrel Shifter with Arithmetic
Instructions

Register r1 is first shifted one location to the left to give the value of twice r1.
Logical Instructions
• Logical instructions perform bitwise logical operations on the two
source registers.
Logical Instructions
logical OR operation between registers r1 and r2

BIC, which carries out a logical bit clear

This instruction is particularly useful when clearing


status bits and is frequently used to change interrupt
masks in the cpsr.
Comparison Instructions
• The comparison instructions are used to compare or test a register with a
32-bit value.
• They update the cpsr flag bits according to the result, but do not affect
other registers.
• After the bits have been set, the information can then be used to change
program flow by using conditional execution.
• No need to apply the S suffix for comparison instructions to update the
flags.

N is the result of the shifter operation.


Comparison Instructions
CMP comparison instruction

The CMP is effectively a subtract instruction with the result discarded


After execution the Z flag is set to 1 (shown as an uppercase Z). This change
indicates equality.

• Similarly the TST instruction is a logical AND operation, and TEQ is a logical
exclusive OR operation.
• It is important to understand that comparison instructions only modify the
condition flags of the cpsr and do not affect the registers being compared.
Multiply Instructions
• The multiply instructions multiply the contents of a pair of registers and,
depending upon the instruction, accumulate the results in with another register.

The long multiplies accumulate onto


a pair of registers representing a 64-
bit value.
Multiply Instructions
• The number of cycles taken to execute a multiply instruction depends
on the processor implementation.

a simple multiply instruction that multiplies registers r1 and r2


Multiply Instructions
• The long multiply instructions (SMLAL, SMULL, UMLAL, and UMULL)
produce a 64-bit result.
• The result is too large to fit a single 32-bit register so the result is
placed in two registers labeled RdLo and RdHi.
• RdLo holds the lower 32 bits of the 64-bit result, and RdHi holds the
higher 32 bits of the 64-bit result.
Multiply Instructions
The instruction multiplies registers r2 and r3 and places the result into register r0 and r1. Register
r0 contains the lower 32 bits, and register r1 contains the higher 32 bits of the 64-bit result.
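As a quick illustration in C, the snippet below mirrors what a long multiply produces: the full 64-bit product is formed and then split into a low word (RdLo) and a high word (RdHi). The operand values and variable names are only for illustration, not taken from the example above.

#include <stdint.h>
#include <stdio.h>

/* Sketch of the result produced by an unsigned long multiply such as
 * UMULL r0, r1, r2, r3: the 64-bit product of r2 and r3 is split into a
 * low word (RdLo) and a high word (RdHi). Names mirror the registers. */
int main(void) {
    uint32_t r2 = 0x00020000u, r3 = 0x00040000u;     /* example operands */
    uint64_t product = (uint64_t)r2 * (uint64_t)r3;

    uint32_t r0 = (uint32_t)(product & 0xFFFFFFFFu); /* RdLo: lower 32 bits */
    uint32_t r1 = (uint32_t)(product >> 32);         /* RdHi: upper 32 bits */

    printf("RdLo = 0x%08X, RdHi = 0x%08X\n", r0, r1);
    return 0;
}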
Branch Instructions
• A branch instruction changes the flow of execution or is used to call a
routine.
• This type of instruction allows programs to have subroutines, if-then-
else structures, and loops.
• The change of execution flow forces the program counter (pc) to
point to a new address.
• The ARMv5E instruction set includes four different branch
instructions.
Branch Instructions

• The address label is stored in the instruction as a signed pc-relative offset and must be
within approximately 32 MB of the branch instruction.
Branch Instructions
• The forward branch skips three instructions.
• The backward branch creates an infinite loop.

Branches are used to change execution flow.

✓ Most assemblers hide the details of a branch instruction encoding by using labels.
✓ The branch labels are placed at the beginning of the line and are used to mark an address
that can be used later by the assembler to calculate the branch offset.
Branch Instructions
• The branch with link, or BL, instruction is similar to the B instruction
but overwrites the link register lr with a return address.
• It performs a subroutine call.

✓ The branch exchange (BX) and branch exchange with link (BLX) are the third type of
branch instruction.
✓ The BX instruction uses an absolute address stored in register Rm. It is primarily used to
branch to and from Thumb code
Load-Store Instructions
• Load-store instructions transfer data between memory and processor
registers.
• There are three types of load-store instructions:
• single-register transfer
• multiple-register transfer
• swap
Load-Store Instructions
▪ Single-Register Transfer
• These instructions are used for moving a single data item in and out
of a register.
• The datatypes supported are signed and unsigned words (32-bit),
halfwords (16-bit), and bytes.
Load-Store Instructions

Loads a word from the address stored in register r1 and Stores the contents of register r0 to the address contained
places it into register r0. in register r1.
The offset from register r1 is zero
Register r1 is called the base address register.
Single-Register Load-Store Addressing Modes
Index methods

• Preindex with writeback calculates an address from a base register plus address
offset and then updates that address base register with the new address.
• The preindex offset is the same as the preindex with writeback but does not
update the address base register.
• Postindex only updates the address base register after the address is used.
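The three index methods map loosely onto familiar C pointer idioms. The sketch below is an analogy only (LDR offsets are in bytes; here one array element plays that role), not a literal translation of the instructions.

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t array[4] = {10, 20, 30, 40};
    uint32_t *base, value;

    /* Preindex (no writeback), like LDR r0, [r1, #4]:
     * address = base + offset, base register left unchanged. */
    base = array;
    value = *(base + 1);    /* reads array[1]; base still points to array[0] */
    printf("preindex:           value=%u\n", value);

    /* Preindex with writeback, like LDR r0, [r1, #4]!:
     * base is updated first, then the new address is used. */
    base = array;
    base += 1;
    value = *base;          /* reads array[1]; base now points to array[1] */
    printf("preindex+writeback: value=%u\n", value);

    /* Postindex, like LDR r0, [r1], #4:
     * the original address is used, then base is updated. */
    base = array;
    value = *base;          /* reads array[0] */
    base += 1;              /* base now points to array[1] */
    printf("postindex:          value=%u\n", value);

    return 0;
}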
Single-Register Load-Store Addressing Modes
• The preindex mode is useful for accessing an element in a data
structure.
• The postindex and preindex with writeback modes are useful for
traversing an array.
Single-Register Load-Store Addressing Modes
Single-Register Load-Store Addressing Modes
Single-register load-store addressing, word or unsigned byte

• A signed offset or register is denoted by “+/−”, identifying that it is either a positive or negative offset from the
base address register Rn.
• The base address register is a pointer to a byte in memory, and the offset specifies a number of bytes.
• Immediate means the address is calculated using the base address register and a 12-bit offset encoded in the
instruction.
• Register means the address is calculated using the base address register and a specific register’s contents.
• Scaled means the address is calculated using the base address register and a barrel shift operation.
Single-Register Load-Store Addressing Modes
Examples of LDR instructions using different addressing modes.
Single-Register Load-Store Addressing Modes
Single-register load-store addressing, halfword, signed halfword, signed byte, and doubleword.
Single-Register Load-Store Addressing Modes
Variations of STRH instructions.
Multiple-Register Transfer
• Load-store multiple instructions can transfer multiple registers
between memory and the processor in a single instruction.
• The transfer occurs from a base address register Rn pointing into
memory.
• Multiple-register transfer instructions are more efficient than single-
register transfers for moving blocks of data around memory and for
saving and restoring context and stacks.
• Load-store multiple instructions can increase interrupt latency.
Multiple-Register Transfer

Here N is the number of registers in the list of registers.

• Any subset of the current bank of registers can be transferred to memory or fetched from memory.
• The base register Rn determines the source or destination address for a load-store multiple
instruction.
• This register can be optionally updated following the transfer.
• This occurs when register Rn is followed by the ! character, similar to the single-register load-
store using preindex with writeback.
Multiple-Register Transfer
Addressing mode for load-store multiple instructions.
Swap Instruction
• The swap instruction is a special case of a load-store instruction.
• It swaps the contents of memory with the contents of a register.
• This instruction is an atomic operation—it reads and writes a
location in the same bus operation, preventing any other instruction
from reading or writing to that location until it completes.
• Swap cannot be interrupted by any other instruction or any other bus
access.
• The system “holds the bus” until the transaction is complete.
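In C terms, the behaviour the swap instruction provides resembles an atomic exchange. The sketch below uses C11 atomics to show the idea of reading the old value and writing a new one as one indivisible step; it is an analogy, not the ARM encoding, and the lock variable is purely illustrative.

#include <stdatomic.h>
#include <stdio.h>

/* Atomic exchange: the old memory value is read and a new value written in
 * one indivisible operation, which is what makes it usable for a lock. */
static atomic_uint lock = 0;   /* 0 = free, 1 = taken */

int try_take_lock(void) {
    /* Old value 0 means we acquired the lock; 1 means someone else holds it. */
    return atomic_exchange(&lock, 1u) == 0u;
}

int main(void) {
    printf("first attempt:  %s\n", try_take_lock() ? "acquired" : "busy");
    printf("second attempt: %s\n", try_take_lock() ? "acquired" : "busy");
    atomic_store(&lock, 0u);   /* release the lock */
    return 0;
}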
Swap Instruction
Swap Instruction
Software Interrupt Instruction
• A software interrupt instruction (SWI) causes a software interrupt
exception, which provides a mechanism for applications to call
operating system routines.
• When the processor executes an SWI instruction, it sets the program
counter pc to the offset 0x8 in the vector table.
• The instruction also forces the processor mode to SVC
Software Interrupt Instruction
Typically the SWI instruction is executed in user mode.

• Code called the SWI handler is required to process the SWI call. The handler obtains the SWI number
using the address of the executed instruction, which is calculated from the link register lr.
• The SWI number is determined by

Here the SWI instruction is the actual 32-bit SWI instruction executed by the processor.
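A handler written in C might recover the SWI number roughly as sketched below, assuming the caller was in ARM state so the executed SWI instruction sits at lr − 4; the function name is illustrative, not part of any particular kernel.

#include <stdint.h>

/* Illustrative sketch: given the link register value saved on entry to the
 * SWI handler, the executed SWI instruction sits at (lr - 4), and the SWI
 * number occupies the bottom 24 bits of that 32-bit opcode. */
uint32_t swi_number_from_lr(uint32_t lr)
{
    uint32_t instruction = *(const uint32_t *)(uintptr_t)(lr - 4u);
    return instruction & 0x00FFFFFFu;   /* SWI number field */
}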
Software Interrupt Instruction
Program Status Register Instructions
• The ARM instruction set provides two instructions to directly control
a program status register (psr).
• The MRS instruction transfers the contents of either the cpsr or spsr
into a register; in the reverse direction, the MSR instruction transfers
the contents of a register into the cpsr or spsr.
• These instructions are used to read and write the cpsr and spsr.

fields: This can be any combination of control (c), extension (x), status (s), and
flags (f ). These fields relate to particular byte regions in a psr.
Program Status Register Instructions

Fig. psr byte fields

The c field controls the interrupt masks, Thumb state, and processor mode.
Program Status Register Instructions

SVC mode

In user mode you can read all cpsr bits, but you can only update the condition flag
field f.
Bus-Based Computer Systems
• The bus is the mechanism by which the CPU communicates with
memory and I/O devices.
• A bus is a collection of wires with a defined protocol by which the CPU,
memory, and devices communicate.
• Bus Protocols: Controls communication between entities.
• Determines who gets to use the bus at any particular time.
• Governs length, style of communication.
CPU Bus
• Device 1 raises its output to signal an enquiry, which tells device 2
that it should get ready to listen for data.
• When device 2 is ready to receive, it raises its output to signal an
acknowledgment. At this point, devices 1 and 2 can transmit or
receive.
• Once the data transfer is complete, device 2 lowers its output,
signaling that it has received the data.
• After seeing that ack has been released, device 1 lowers its output.
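The exchange above is a classic four-cycle handshake. The toy C sketch below models the two wires as shared flags so the ordering is explicit; the two functions stand for the two devices and would run concurrently in real hardware, so this is a model of the protocol rather than something to execute on one CPU.

#include <stdbool.h>

/* Toy model of the four-cycle handshake: enq and ack stand in for the two
 * wires; in real hardware these are bus signals, not program variables. */
static volatile bool enq = false, ack = false;

void device1_send(void) {
    enq = true;          /* 1. raise enquiry: "get ready to listen"        */
    while (!ack) { }     /* 2. wait for device 2 to acknowledge            */
    /* ... data is transferred here ...                                    */
    while (ack) { }      /* 3. wait for device 2 to release ack            */
    enq = false;         /* 4. release enquiry, handshake complete         */
}

void device2_receive(void) {
    while (!enq) { }     /* wait for an enquiry                            */
    ack = true;          /* signal readiness to receive                    */
    /* ... data is received here ...                                       */
    ack = false;         /* data taken: release the acknowledgment         */
}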
CPU Bus
• The fundamental bus operations are reading and writing.
• The major components are:
• Clock provides synchronization to the bus components,
• R/𝑾 is true when the bus is reading and false when the bus is writing,
• Address is an n-bit bundle of signals that transmits the address for an access,
• Data is an n-bit bundle of signals that can carry data to or from the CPU, and
• Data ready signals when the values on the data bundle are valid.
CPU Bus

Figure: A typical microprocessor bus


CPU Bus
• The R/W and address are unidirectional signals, since only the CPU (which
reads or writes a device or memory) can determine the address and
direction of the transfer.
• The behavior of a bus is most often specified as a timing diagram.
CPU Bus
• A timing diagram shows how the signals on a bus vary over time.
• A signal can go between a known 0/1 state and a stable/changing
state.
• The bus’s timing requirements are independent of the exact address
on the bus.
• A changing signal does not have a stable value.
• Changing signals should not be used for computation.
• To be sure that signals go to their proper values at the proper times,
timing diagrams sometimes show timing constraints.
Figure: Timing diagram notation
Figure: Timing diagram for the example bus
Figure: Timing diagram for the example bus
Memory read operation with no wait states
• The CPU puts the address and command (memory read) on the bus in the middle of
clock cycle T1.
• Memory looks at the bus at the start of T2 and sees the address and
command.
• The CPU looks at the bus at the start of T3 and sees that the data is
available because the ready signal is asserted.
• The CPU then removes the address, IO/memory, and read/write signals.
Figure: Timing diagram for the example bus
Memory read operation with one wait state
• The CPU puts the address and command (memory read) on the bus in the middle of
clock cycle T1.
• Memory looks at the bus at the start of T2 and sees the address and
command.
• The CPU looks at the bus at the start of T4 and sees that the data is
available because the ready signal is asserted (one wait state was inserted
while the memory fetched the data).
• The CPU then removes the address, IO/memory, and read/write signals.
BUS STRUCTURE
▪ Bus Structure:
• Bus structure describes how a CPU communicates with memory and devices.
• The bus is the mechanism by which the CPU communicates with memory and
I/O devices.
• Just as a bus is a transport mechanism in everyday life, a bus in the context
of an embedded system or computer defines the corresponding transport
mechanism between the CPU, memory, and I/O devices.
A Generic bus structure will have address, data and control.
BUS STRUCTURE
• Control lines implement the transaction protocol.
• The signals that flow along the control lines are instrumental in
implementing the transport protocol.
• Request and acknowledgement signals are the two basic and generic types of
signals found on a bus.
• Data lines carry information between the source and the destination.

*Buses can be serial or parallel, synchronous or asynchronous.


*The Universal Serial Bus (USB) and IEEE 1394 are examples of serial buses
while the ISA and PCI buses are examples of popular parallel buses
BUS STRUCTURE
• A typical embedded system is composed of several components such
as the Central Processing Unit (CPU), memory chips, and Input /
Output (I/O) devices. A bus is a common pathway or a set of wires
that interconnect these various subsystems.
• A bus can also be defined as a channel over which information flows
between units or devices.
• Buses often include wires to carry signals for addresses, data,
control, status, clock, power and ground.
BUS STRUCTURE
• There are normally three types of bus in any processor
system.
• Address bus: This determines the location in memory
that the processor will read data from or write data to.
• The physical location of the data in memory is carried by
the address bus.
• Data bus: This contains the contents that have been read
from the memory location or are to be written into the
memory location.
• The width of the data bus reflects the maximum amount
of data that can be processed and delivered at one time
• Control bus: this manages the information flow between
components indicating whether the operation is a read or
a write and ensuring that the operation happens at the
right time.
TIME MULTIPLEXING:
• Bus lines can be separated into two generic types:
• Dedicated
• Multiplexed
• A dedicated bus line is permanently assigned either to one function or to a
physical subset of computer components.
• An example of functional dedication is the use of separate dedicated
address and data lines, which is common on many buses.
• In a multiplexed bus, address and data information may be transmitted
over the same set of lines using an Address Valid control line.
• At the beginning of a data transfer, the address is placed on the bus and
the Address Valid line is activated. The address is then removed from the
bus, and the same bus connections are used for the subsequent read or
write data transfer.
TIME MULTIPLEXING:
• The advantage of time multiplexing is the use of fewer lines, which
saves space and, usually, cost.
• The disadvantage of time multiplexing is that more complex circuitry
is needed within each module. Also, there is a potential reduction in
performance because certain events that share the same lines cannot
take place in parallel.
Time-multiplexed data transfer
BUS
BUS - Synchronous Bus
• Synchronous buses rely on a common clock signal that synchronizes
the entire system. This clock signal is generated by a master device
(typically the CPU or the bus controller), and all components on the
bus use this signal to determine when data can be transferred.
• Each data transfer occurs at the rising or falling edge of the clock
signal.
• The data transfer itself is synchronized to the clock signal, ensuring
that there are no timing mismatches or errors.
• Typically, synchronous buses operate at higher speeds because the
clock signal ensures that devices work together in harmony, enabling
faster data transfers.
BUS - Synchronous Bus
▪ Characteristics of Synchronous Bus Timing
• Clock Dependency: All components rely on the same clock signal,
which dictates the timing of data transfers.
• Fixed Data Transfer Speed: The system speed is determined by the
frequency of the clock. Higher clock frequencies allow faster data
transfer.
• Data Transfer Predictability: Synchronous buses offer more
predictability in terms of data transfer timing.
• Efficiency: Due to their synchronization, synchronous buses are more
efficient
BUS - Synchronous Bus
▪ Advantages of Synchronous Bus Timing
• High Data Transfer Rates
• Predictable Performance
• Reduced Overhead
▪ Disadvantages of Synchronous Bus Timing
• Clock Speed Limitation
• Scalability Issues
• Complex Design
BUS - Asynchronous Bus
• Asynchronous buses do not rely on a common clock signal.
• Each device on the bus operates independently, and data transfers
are controlled by the handshaking process between the sender and
receiver.
• Data transfer is controlled by control signals rather than a shared
clock.
• These control signals include signals like REQ, ACK, and RDY.
• The key difference here is that the sender and receiver work
independently of each other, with the data transfer occurring only
when both parties are ready.
BUS - Asynchronous Bus
▪ Characteristics of Asynchronous Bus Timing
• Clock Independence: Devices operate without a common clock signal,
allowing them to work at their own pace.
• Handshaking Signals: Data transfers are coordinated through handshaking
signals that indicate when data is ready to be transmitted or received.
• Variable Data Transfer Speeds: Since there is no clock dictating the speed,
asynchronous buses can operate at different speeds depending on the
capabilities of the devices involved.
• No Global Synchronization: Each device is independent of the others, and
there is no global synchronization across the system.
BUS - Asynchronous Bus
▪ Advantages of Asynchronous Bus Timing
• Flexibility
• Scalability
• Lower Power Consumption
▪ Disadvantages of Asynchronous Bus Timing
• Slower Data Transfer
• Complexity in Design
• Higher Overhead
Key Differences Between Synchronous and
Asynchronous Bus Timing
1. Clock Signal
2. Data Transfer Speed
3. Flexibility
4. Complexity
Applications of Synchronous and
Asynchronous Bus Timing
▪ Synchronous Bus Timing:
• High-speed CPUs and memory systems
• Graphics cards and video memory
• Embedded systems
▪ Asynchronous Bus Timing:
• Peripheral devices
• Older communication systems
• Low-power applications
Basic Protocol Concepts:
• The basic element of a protocol is the bus transaction.
• A bus transaction includes two parts: a request and an action.
• Typically, a request consists of a command and an address; an action is
the transfer of data.
• The master is the device that starts the bus transaction by issuing the
command and address.
• The slave is the device that responds to the address by sending data to the
master if the master asks for data, or by receiving data if the master wants
to send data.
BUS Protocols:
• Bus protocols are standardized ways for the various components or devices
in an embedded system to communicate with one another.
• Bus protocols provide the guidelines and norms for addressing,
synchronization, and data transfer, ensuring seamless
interoperability.
▪ Importance of Communication Protocols in Ensuring Seamless
Operation of Embedded Systems
• Data Integrity and Accuracy
• Interoperability
• Efficiency
Types of Communication Protocols
1. UART
2. SPI
3. I2C
4. USB
UART protocol
• UART uses asynchronous serial communication, which means that
no clock signal is used to synchronize data between the transmitting
and receiving devices.
• UART data is transmitted over two wires connecting the transmitter (TX)
and receiver (RX) pins of the two devices.
• UART adds start and stop bits to frame the data so the receiver can detect
incoming data. Both devices must be configured with approximately the
same baud rate, in bits per second (bps), to send and receive data packets.
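Many UART peripherals derive the baud rate from a clock divisor, commonly peripheral clock / (16 × baud) when 16× oversampling is used. The sketch below shows that calculation; the clock value and the idea of a single divisor register are assumptions for illustration, not taken from any particular device.

#include <stdint.h>
#include <stdio.h>

#define PERIPH_CLOCK_HZ 48000000u   /* assumed peripheral clock */

/* Rounded divisor for a UART that samples at 16x the baud rate. */
static uint32_t uart_divisor(uint32_t baud)
{
    return (PERIPH_CLOCK_HZ + (8u * baud)) / (16u * baud);
}

int main(void)
{
    uint32_t div = uart_divisor(115200u);
    /* The value would be written to the (hypothetical) baud-rate divisor register. */
    printf("divisor for 115200 bps: %u\n", div);
    return 0;
}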
UART protocol
▪ UART can be configured in three distinct ways:
• Simplex: One-way data Communication.
• Half-Duplex: Data transmission in both directions but not at the same time.
• Full-Duplex: Simultaneous data transmission in both directions at the same time.
▪ UART Pros
• Only uses two wires to connect
• Popular and commonly used protocol
• Asynchronous meaning no clock signal is needed
• Simple structure means data packets can be easily changed
• Multiple ways for configuration
▪ UART Cons
• With a single transmitter and receiver pair, UART doesn’t support multi-device systems
• Baud rate configuration needs to be similar
SPI Protocols
• SPI refers to Serial Peripheral Interface and is a famous
communication protocol within the embedded world.
• Its essential function as an interface Bus is to send/receive data
across Microcontrollers (Master) and Peripherals such as sensors and
SD cards (Slaves) to assist device communication.
• SPI follows a Serial Communication Protocol with a Full-Duplex
configuration.
• This holds an advantage over stop/start bit communication, as
devices can function without interruption.
SPI Protocols
▪ The overall operation of SPI involves four signals.
• The Serial Clock (SCLK) assists a synchronous interface, allowing faster data
transfer.
• The Master node commands Master In Slave Out (MISO) or Master Out Slave
In (MOSI) communication.
• The Slave Select lines (SS) indicate activity; when a line goes low,
communication is present between the Master and a Slave Node.
• This enables higher data transmission speeds (typically 16 MHz–32 MHz) than other
protocols.
SPI Protocols
▪ SPI Pros
• Flexibility for bits transferred
• Can support multi-master-slave systems
• High speed – faster than asynchronous methods
• Continuous transmission of data bits means less interruption
▪ SPI Cons
• Not as scalable as other multi-slave systems
• More wires are required for communication
• One master controls all slave communication
I2C Protocols
• I2C stands for Inter-Integrated Circuit and was first created by Philips.
• I2C communication is popular due to its Multi-Master-Multi-Slave
structure, otherwise known as an I2C Bus.
• The I2C communication protocol follows a Half-Duplex configuration,
which means data can be transferred bit by bit via two-way communication
at a single time.
• I2C follows a Serial Communication Protocol enabling two-wire interface
communication between masters and slaves.
• Multiple master and slave device communication works via synchronous
communication, where a clock signal controlled by the master is
distributed amongst the slave nodes across the two-wire interface: the Serial
Clock Line (SCL) and Serial Data Line (SDA).
I2C Protocols
▪ I2C Pros
• Cheaper cost to integrate than other communication protocols
• Flexible- the multi-master-slave design makes I2C much more functional
• Adaptable to integrated circuit types
▪ I2C Cons
• Lower transmission speeds
• It is considered complex, especially as the number of devices used increases.
USB Protocols
• USB refers to Universal Serial Bus and follows an Inter-System
Protocol, which communicates between two devices.
• USB follows an Asynchronous Serial Protocol where no clock signal
is needed, making it a low-cost device.
▪ USB Pros
• Low cost, low power and smaller in size
• Can support high-capacity of data
• Plug and play means easy implementation
USB Protocols
▪ USB Cons
• Limited capability
• Limited messages can be communicated between the host and peripheral
Direct Memory Access (DMA)
• In DMA, the CPU is bypassed when transferring data from a peripheral to
the memory.
• A direct memory access (DMA) is an operation in which data is copied
(transported) from one resource to another resource in a computer system
without the involvement of the CPU.
• The task of a DMA-controller (DMAC) is to execute the copy operation of
data from one resource location to another. The copy of data can be
performed from:
• I/O-device to memory
• memory to I/O-device
• memory to memory
• I/O-device to I/O-device
Direct Memory Access (DMA)
• The DMA requires the CPU to provide two additional bus signals:
• The bus request is an input to the CPU through which DMA controllers ask for
ownership of the bus.
• The bus grant signals that the bus has been granted to the DMA controller.
• A device that can initiate its own bus transfer is known as a bus
master.
• The DMA controller uses these two signals (bus request and bus
grant) to gain control of the bus using a classic four-cycle handshake.
• The bus request is asserted by the DMA controller when it wants to
control the bus, and the bus grant is asserted by the CPU when the
bus is ready.
Direct Memory Access (DMA)

Figure: A bus with a DMA controller


Direct Memory Access (DMA)
• The CPU will finish all pending bus transactions before granting
control of the bus to the DMA controller.
• When it does grant control, it stops driving the other bus signals:
R/W, address, and so on.
• Once the DMA controller is bus master, it can perform reads and
writes using the same bus protocol as with any CPU-driven bus
transaction.
• The CPU controls the DMA operation through registers in the DMA
controller.
Direct Memory Access (DMA)
• A typical DMA controller includes the following three registers:
• A starting address register specifies where the transfer is to begin.
• A length register specifies the number of words to be transferred.
• A status register allows the DMA controller to be operated by the CPU.
• After the DMA operation is complete, the DMA controller interrupts the CPU
to tell it that the transfer is done.
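A minimal C sketch of how a CPU might program such a controller is shown below; the register addresses, bit positions, and polling loop are hypothetical and serve only to illustrate the start-address/length/status sequence (a real controller would typically also need a destination or direction setting, and would signal completion with an interrupt).

#include <stdint.h>

/* Hypothetical memory-mapped DMA controller registers; addresses and bit
 * layout are illustrative only, not taken from a real device. */
#define DMA_START_ADDR  (*(volatile uint32_t *)0x40001000u)  /* starting address  */
#define DMA_LENGTH      (*(volatile uint32_t *)0x40001004u)  /* words to transfer */
#define DMA_STATUS      (*(volatile uint32_t *)0x40001008u)  /* control/status    */
#define DMA_STATUS_GO   (1u << 0)                            /* start transfer    */
#define DMA_STATUS_BUSY (1u << 1)                            /* transfer active   */

void dma_start_transfer(uint32_t start, uint32_t nwords)
{
    DMA_START_ADDR = start;            /* where the transfer begins   */
    DMA_LENGTH     = nwords;           /* how many words to move      */
    DMA_STATUS     = DMA_STATUS_GO;    /* kick off the transfer       */

    /* In practice the CPU would continue with other work and take an
     * interrupt on completion; polling is shown only to keep the sketch short. */
    while (DMA_STATUS & DMA_STATUS_BUSY) { }
}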
Direct Memory Access (DMA)

Figure: UML sequence diagram of system activity around a DMA transfer


A multiple bus system
• High-speed devices may be connected to a high-performance bus,
while lower-speed devices are connected to a different bus.
• A small block of logic known as a bridge allows the buses to connect
to each other.
• There are advantages to use multiple buses and bridges:
• Higher-speed buses may provide wider data connections.
• A high-speed bus usually requires more expensive circuits and connectors.
• The bridge may allow the buses to operate independently, thereby providing
some parallelism in I/O operations.
A multiple bus system

Figure: A multiple bus system


Bridge
• The bridge serves as a protocol translator between the two buses as
well.
• The bridge is a slave on the fast bus and the master of the slow bus.
• The bridge takes commands from the fast bus on which it is a slave
and issues those commands on the slow bus.
• It also returns the results from the slow bus to the fast bus.
Bridge

Figure: UML state diagram of bus bridge operation


AMBA Bus
• The Advanced microcontroller bus architecture (AMBA) bus supports
CPUs, memories, and peripherals integrated in a system-on-silicon.
• The AMBA specification includes two buses:
• The AMBA high-performance bus (AHB) is optimized for high-speed transfers
and is directly connected to the CPU.
• It supports several high-performance features: pipelining, burst transfers, split
transactions, and multiple bus masters.
• A bridge can be used to connect the AHB to an AMBA peripherals bus (APB).
Bridge

Figure: Elements of the ARM AMBA bus system


Embedded Operating
Systems
INTRODUCTION TO REAL-TIME
OPERATING SYSTEMS
• Some core functional similarities between a typical RTOS and GPOS include:
• some level of multitasking,
• software and hardware resource management,
• provision of underlying OS services to applications, and
• abstracting the hardware from the software application.
• Some key functional differences that set RTOSes apart from GPOSes include:
• better reliability in embedded application contexts,
• the ability to scale up or down to meet application needs,
• faster performance,
• reduced memory requirements,
• scheduling policies tailored for real-time embedded systems,
• support for diskless embedded systems by allowing executables to boot and run from ROM
or RAM, and
• better portability to different hardware platforms.
INTRODUCTION TO REAL-TIME
OPERATING SYSTEMS
• GPOSes target general-purpose computing and run predominantly on
systems such as personal computers, workstations, and mainframes.
• In some cases, GPOSes run on embedded devices that have ample
memory and very soft real-time requirements.
• RTOSes, on the other hand, are reliable, compact, and scalable, and
they perform well in real-time embedded systems.
• In addition, RTOSes can be easily tailored to use only those
components required for a particular application.
Defining an RTOS
• A real-time operating system (RTOS) is a program that schedules
execution in a timely manner, manages system resources, and
provides a consistent foundation for developing application code.
• RTOS can be quite diverse, ranging from a simple application for a
digital stopwatch to a much more complex application for aircraft
navigation.
• Good RTOSes, therefore, are scalable in order to meet different sets
of requirements for different applications.
Defining an RTOS
• An RTOS comprises only a kernel, which is the core supervisory
software that provides minimal logic, scheduling, and resource-
management algorithms.
• An RTOS can be a combination of various modules, including the
kernel, a file system, networking protocol stacks, and other
components required for a particular application.
Kernel
• A kernel is the core component of an operating system that acts as a bridge
between software and hardware. It manages system resources, including
CPU, memory, and device communication, ensuring that applications can
function efficiently.
• Functions of a Kernel:
• Process Management: Schedules and controls processes, including multitasking and
process synchronization.
• Memory Management: Allocates and deallocates RAM for processes and ensures
memory protection.
• Device Management: Communicates with hardware via drivers and manages
input/output (I/O) operations.
• File System Management: Handles file storage, retrieval, and organization.
• Interrupt Handling: Manages system interrupts and exceptions to ensure smooth
execution.
• Security & Access Control: Enforces user permissions and system security policies.
Fig. High-level view of an RTOS, its kernel, and
other components found in embedded
systems.
Board Support Package (BSP)
• A Board Support Package (BSP) is a set of software components that enable an
RTOS (Real-Time Operating System) to run on a specific hardware platform. It acts
as an interface between the RTOS and the underlying hardware, ensuring that the
operating system can properly interact with the processor, memory, and
peripherals.
• Key Functions of a BSP:
• Hardware Abstraction: Provides low-level drivers for hardware components like timers, I/O
ports, and communication interfaces.
• Bootloader Support: Includes initialization code to start up the system and load the OS.
• Memory Management: Configures RAM, ROM, and external memory mappings.
• Device Drivers: Offers support for peripherals such as UART, SPI, I2C, Ethernet, and USB.
• RTOS Integration: Ensures seamless communication between the RTOS kernel and the
hardware.
• Interrupt Handling: Manages interrupt service routines (ISRs) for real-time processing.
Defining an RTOS
• RTOS kernels contain the following components:
• Scheduler—is contained within each kernel and follows a set of algorithms
that determines which task executes when. Some common examples of
scheduling algorithms include round-robin and preemptive scheduling.
• Objects—are special kernel constructs that help developers create
applications for real-time embedded systems. Common kernel objects include
tasks, semaphores, and message queues.
• Services—are operations that the kernel performs on an object or, generally
operations such as timing, interrupt handling, and resource management.
Fig. Common components in an RTOS kernel that
including objects, the scheduler, and some services.
The Scheduler
• The scheduler is at the heart of every kernel.
• A scheduler provides the algorithms needed to determine which task
executes when.
• schedulable entities
• Multitasking
• context switching
• dispatcher, and
• scheduling algorithms
The Scheduler - Schedulable Entities
• A schedulable entity is a kernel object that can compete for execution
time on a system, based on a predefined scheduling algorithm.
• Tasks and processes are all examples of schedulable entities found in
most kernels.
• A task is an independent thread of execution that contains a sequence of
independently schedulable instructions.
• Processes are similar to tasks in that they can independently compete for
CPU execution time.
• Processes differ from tasks in that they provide better memory protection
features, at the expense of performance and memory overhead.
The Scheduler - Multitasking
• Multitasking is the ability of the operating
system to handle multiple activities within set
deadlines.
• A real-time kernel might have multiple tasks
that it has to schedule to run.
• The kernel multitasks in such a way that many
threads of execution appear to be running
concurrently; however, the kernel is actually
interleaving executions sequentially, based
on a preset scheduling algorithm.
• The scheduler must ensure that the
appropriate task runs at the right time.
Fig. Multitasking using a context
switch.
The Context Switch
• Each task has its own context, which is the state of the CPU registers
required each time it is scheduled to run.
• A context switch occurs when the scheduler switches from one task
to another.
• Every time a new task is created, the kernel also creates and
maintains an associated task control block (TCB).
• TCBs are system data structures that the kernel uses to maintain task-
specific information.
• When the task is not running, its context is frozen within the TCB, to
be restored the next time the task runs.
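A sketch of what a TCB might contain is shown below; the exact fields and their names vary between kernels, so this struct is illustrative only.

#include <stdint.h>

/* Illustrative task control block: most kernels keep the saved register
 * context, a priority, a state, and links used by the scheduler's queues. */
typedef enum { TASK_READY, TASK_RUNNING, TASK_BLOCKED } task_state_t;

typedef struct tcb {
    uint32_t     registers[16];  /* saved r0-r15 at the last context switch */
    uint32_t     cpsr;           /* saved program status register           */
    uint32_t    *stack_pointer;  /* top of this task's private stack        */
    uint8_t      priority;       /* 0 = highest, 255 = lowest               */
    task_state_t state;          /* ready, running, or blocked              */
    struct tcb  *next;           /* link used by the scheduler's queues     */
} tcb_t;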
The Context Switch
• When the kernel’s scheduler determines that
it needs to stop running task 1 and start
running task 2, it takes the following steps:
1. The kernel saves task 1’s context
information in its TCB.
2. It loads task 2’s context information from
its TCB, which becomes the current thread
of execution.
3. The context of task 1 is frozen while task 2
executes, but if the scheduler needs to run
task 1 again, task 1 continues from where it
left off just before the context switch
Fig. Multitasking using a context
switch.
The Context Switch
• The time it takes for the scheduler to switch from one task to another
is the context switch time.
• Every time an application makes a system call, the scheduler has an
opportunity to determine if it needs to switch contexts.
• When the scheduler determines a context switch is necessary, it relies
on an associated module, called the dispatcher, to make that switch
happen.
The Context Switch
Let's say we have an RTOS running three tasks:
1.Task A (Low Priority) → Reads sensor data.
2.Task B (Medium Priority) → Processes data.
3.Task C (High Priority) → Handles real-time communication.
Scenario:
• Initially, Task A is running.
• An interrupt (ISR) occurs from a network module, requiring Task C to
execute.
• The RTOS performs context switching to move from Task A to Task C.
The Context Switch
• Context Switching Steps:
1. Task A is running.
2. An Interrupt Occurs (from Network Module).
3. Switch to Task C.
4. Task C Finishes and Returns Control.
The Dispatcher
• The dispatcher is the part of the scheduler that performs context switching and
changes the flow of execution.
• At any time an RTOS is running, the flow of execution, also known as flow of
control, is passing through one of three areas: through an application task,
through an ISR, or through the kernel.
• When a task or ISR makes a system call, the flow of control passes to the kernel to
execute one of the system routines provided by the kernel.
• When it is time to leave the kernel, the dispatcher is responsible for passing
control to one of the tasks in the user’s application.
• It is the scheduling algorithms of the scheduler that determines which task
executes next.
• It is the dispatcher that does the actual work of context switching and passing
execution control.
The Dispatcher
• On the other hand, if an ISR makes system calls, the dispatcher is
bypassed until the ISR fully completes its execution.
• This process is true even if some resources have been freed that
would normally trigger a context switch between tasks.
• These context switches do not take place because the ISR must
complete without being interrupted by tasks.
• After the ISR completes execution, the kernel exits through the
dispatcher so that it can then dispatch the correct task.
Scheduling Algorithms
• The scheduler determines which task runs by following a scheduling
algorithm (also known as scheduling policy).
• The two common scheduling algorithms:
1. preemptive priority-based scheduling
2. round-robin scheduling
Scheduling Algorithms - Preemptive Priority-
Based Scheduling
• Most real-time kernels use preemptive priority-based scheduling by
default.

Fig. Preemptive priority-based scheduling.


Scheduling Algorithms - Preemptive Priority-
Based Scheduling
• Preemptive Priority-Based Scheduling is a task scheduling technique
used in Real-Time Operating Systems (RTOS) where the highest-
priority ready task is always executed, preempting (interrupting)
lower-priority tasks if necessary.
• Real-time kernels generally support 256 priority levels, in which 0 is
the highest and 255 the lowest.
• If a task with a priority higher than the current task becomes ready
to run, the kernel immediately saves the current task’s context in its
TCB and switches to the higher-priority task.
• A task’s priority can be changed dynamically using kernel-provided
calls.
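The selection rule can be sketched in C as below: among all ready tasks, the one with the numerically lowest priority value runs next. This shows only the selection step, not the preemption or context-switch mechanics, and the task set is made up for illustration.

#include <stdint.h>
#include <stdio.h>

typedef struct { const char *name; uint8_t priority; int ready; } task_t;

/* Return the ready task with the highest priority (lowest numeric value),
 * or NULL if nothing is ready. */
static task_t *pick_next(task_t *tasks, int n)
{
    task_t *best = NULL;
    for (int i = 0; i < n; i++) {
        if (tasks[i].ready && (best == NULL || tasks[i].priority < best->priority))
            best = &tasks[i];
    }
    return best;
}

int main(void)
{
    task_t tasks[] = { {"logger", 200, 1}, {"control", 10, 1}, {"ui", 120, 0} };
    task_t *next = pick_next(tasks, 3);
    printf("next task: %s\n", next ? next->name : "idle");   /* prints "control" */
    return 0;
}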
Round Robin Scheduling-
In Round Robin Scheduling,
• CPU is assigned to the process on the basis of FCFS for a fixed amount
of time.
• This fixed amount of time is called the time quantum or time slice.
• After the time quantum expires, the running process is preempted
and sent to the ready queue.
• Then, the processor is assigned to the next arrived process.
• It is always preemptive in nature.
Round Robin Scheduling-
Consider the set of 5 processes whose arrival time and burst time are given below:

Process   Arrival time   Burst time
P1        0              5
P2        1              3
P3        2              1
P4        3              2
P5        4              3

If the CPU scheduling policy is Round Robin with time quantum = 2 units,
calculate the average waiting time and average turnaround time.
Round Robin Scheduling-

• Turn Around time = Exit time – Arrival time


• Waiting time = Turn Around time – Burst time
Average Turn Around time = (13 + 11 + 3 + 6 + 10) / 5 = 43 / 5 = 8.6 units
Average waiting time = (8 + 8 + 2 + 4 + 7) / 5 = 29 / 5 = 5.8 units
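The arithmetic can be checked with a short C program. The completion (exit) times 13, 12, 5, 9, and 14 for P1 to P5 are read off the round-robin schedule and are consistent with the turnaround and waiting times above; they are stated here as assumptions of the sketch.

#include <stdio.h>

int main(void)
{
    int arrival[] = {0, 1, 2, 3, 4};
    int burst[]   = {5, 3, 1, 2, 3};
    int exit_t[]  = {13, 12, 5, 9, 14};   /* completion times from the schedule */
    int n = 5, tat_sum = 0, wt_sum = 0;

    for (int i = 0; i < n; i++) {
        int tat = exit_t[i] - arrival[i];  /* turnaround = exit - arrival    */
        int wt  = tat - burst[i];          /* waiting = turnaround - burst   */
        tat_sum += tat;
        wt_sum  += wt;
        printf("P%d: TAT=%2d  WT=%2d\n", i + 1, tat, wt);
    }
    printf("average TAT = %.1f, average WT = %.1f\n",
           (double)tat_sum / n, (double)wt_sum / n);   /* 8.6 and 5.8 */
    return 0;
}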
Round Robin Scheduling-
Advantages-

• It gives the best performance in terms of average response time.


• It is best suited for time sharing system, client server architecture and
interactive system.
Disadvantages-

• It leads to starvation for processes with larger burst time as they have to
repeat the cycle many times.
• Its performance heavily depends on time quantum.
• Priorities can not be set for the processes.
Scheduling Algorithms - Round-Robin
Scheduling
• Round-robin scheduling provides each task an equal share of the CPU
execution time.
• Pure round-robin scheduling cannot satisfy real-time system
requirements because in real-time systems, tasks perform work of
varying degrees of importance.
• A priority-based scheduling can be augmented with round-robin
scheduling which uses time slicing to achieve equal allocation of the
CPU for tasks of the same priority.
Scheduling Algorithms - Round-Robin
Scheduling

Fig. Round-robin and preemptive scheduling.


Objects
• Kernel objects are special constructs that are the building blocks for
application development for real-time embedded systems.
• The most common RTOS kernel objects are:
• Tasks—are concurrent and independent threads of execution that can
compete for CPU execution time.
• Semaphores—are token-like objects that can be incremented or decremented
by tasks for synchronization or mutual exclusion.
• Message Queues—are buffer-like data structures that can be used for
synchronization, mutual exclusion, and data exchange by passing messages
between tasks.
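As a rough illustration of the semaphore object described above, the toy counter below captures the take/give idea; a real kernel performs these operations atomically and blocks the calling task instead of returning failure, so this is only a sketch.

#include <stdint.h>

/* Toy counting semaphore for illustration only. */
typedef struct { volatile int32_t count; } semaphore_t;

void sem_init(semaphore_t *s, int32_t initial) { s->count = initial; }

int sem_take(semaphore_t *s)       /* returns 1 on success, 0 if unavailable */
{
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

void sem_give(semaphore_t *s)      { s->count++; }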
Services
• Along with objects, most kernels provide services that help
developers create applications for real-time embedded systems.
• These services comprise sets of API calls that can be used to perform
operations on kernel objects or can be used in general to facilitate
timer management, interrupt handling, device I/O, and memory
management.
Key Characteristics of an RTOS
• An application’s requirements define the requirements of its
underlying RTOS.
• The more common attributes are:
• Reliability
• Predictability
• Performance
• Compactness
• Scalability
