
UNIT – III

Embedded C code: Overview of C compiler and optimization, Basic C data types, Local variable types, C looping structures, Register allocation, Function calls, Mixing C and Assembly Programming, Instruction Scheduling.



Overview of C compiler and optimization:
• This section gives an idea of the problems the C compiler faces when optimizing code written by the user.
• Understanding these problems helps the programmer write source code that compiles more efficiently, in terms of both increased speed and reduced code size.
• Optimizing code takes time and reduces source code readability. Usually it is only worth optimizing functions that are frequently executed and important for performance.
• A profiling tool, found in most ARM simulators, helps to find the frequently executed functions.



• C compilers have to translate your C function literally into assembler
so that it works for all possible inputs.
• In practice, many of the input combinations are not possible or won’t
occur.
Example code: The memclr function clears N bytes of memory at
address data.
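A minimal sketch of such a memclr routine, assuming the signature void memclr(char *data, int N):

void memclr(char *data, int N)
{
    for (; N > 0; N--)
    {
        *data = 0;      /* clear one byte per iteration */
        data++;
    }
}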



Problems that the compiler faces:
• It does not know whether N can be 0 on input. Therefore the compiler must test for this case explicitly before the first iteration of the loop.
• It does not know whether the data array pointer is four-byte aligned. If it is four-byte aligned, the compiler can clear four bytes at a time using an int store rather than a char store.
• Nor does it know whether N is a multiple of four. If it is, the compiler can unroll the loop four times or store four bytes at a time using an int store.
• The compiler must be conservative and assume all possible values for N and all possible alignments for data.
• To write efficient C code, the programmer must be aware of the areas where the C compiler has to be conservative, the limits of the processor architecture the compiler is mapping to, and the limits of the specific C compiler.



Basic C Data Types:
This section looks at how ARM compilers handle the basic C data types, and at which of these types are more efficient to use for local variables than others.

• Loads that act on 8- or 16-bit values extend the value to 32 bits
before writing to an ARM register.
• Unsigned values are zero-extended, and signed values sign-extended.
This means that the cast of a loaded value to an int type does not cost
extra instructions.
• Similarly, a store of an 8- or 16-bit value selects the lowest 8 or 16 bits
of the register.
• The cast of an int to a smaller type does not cost extra instructions on a store.
• Prior to ARMv4, ARM processors were not good at handling signed
8-bit or any 16-bit values.
• Therefore ARM C compilers define char to be an unsigned 8-bit
value, rather than a signed 8-bit value as is typical in many other
compilers.



Compilers armcc and gcc use the above datatype mappings for an
ARM target.
The exceptional case for type char is worth noting as it can cause
problems when you are porting code from another processor
architecture.
A common example is using a char type variable i as a loop counter,
with loop continuation condition i ≥ 0. As i is unsigned for the ARM
compilers, the loop will never terminate. Fortunately armcc
produces a warning in this situation: unsigned comparison with 0.
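A hypothetical sketch of this pitfall (the function name and body are invented for illustration): because char is unsigned with the ARM compilers, the condition i >= 0 is always true and the loop never terminates.

void clear_buffer(char *buf, int n)
{
    char i;

    for (i = (char)(n - 1); i >= 0; i--)   /* always true: armcc warns here */
    {
        buf[i] = 0;
    }
}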
Compilers also provide an override switch to make char signed.
For example, the command line option -fsigned-char will make char
signed on gcc.
The command line option -zc will have the same effect with armcc.
Local Variable Types:
ARMv4-based processors can efficiently load and store 8-, 16-, and 32-
bit data. However, most ARM data processing operations are 32-bit
only.
So, use a 32-bit datatype, int or long, for local variables wherever
possible.
Avoid using char and short as local variable types, even if you are
manipulating an 8- or 16-bit value.
The one exception is when you want wrap-around to occur. If you
require modulo arithmetic of the form 255 + 1 = 0, then use the char
type.



The following code checksums a data packet containing 64 words. It
shows why you should avoid using char for local variables.
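A sketch of the two versions being compared, consistent with the description; checksum_v2 is the name used later in this unit, and checksum_v1 is an assumed name for the char-counter version.

int checksum_v1(int *data)
{
    char i;             /* loop counter declared as char */
    int sum = 0;

    for (i = 0; i < 64; i++)
    {
        sum += data[i];
    }
    return sum;
}

int checksum_v2(int *data)
{
    unsigned int i;     /* loop counter declared as a 32-bit type */
    int sum = 0;

    for (i = 0; i < 64; i++)
    {
        sum += data[i];
    }
    return sum;
}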

• It looks as though declaring i as a char is efficient, because a char appears to use less register space or less space on the stack than an int.
• On the ARM, both these assumptions are wrong. All ARM registers are 32-bit and all stack entries are at least 32-bit.
• Furthermore, to implement i++ exactly, the compiler must account for the case when i = 255. Any attempt to increment 255 should produce the answer 0.
In the first compiler output, i is declared as a char; in the second, i is declared as an unsigned int. (BCC: branch if carry clear.)



• In the first case, the compiler inserts an extra AND instruction to reduce i to the range 0 to 255 before the comparison with 64.
• This instruction is not needed in the second case.
• The ARM data processing operations always operate on 32-bit quantities. You should therefore:
• Use a 32-bit data type (e.g. int) for local variables.
• Avoid char and short for local variables, even if you are manipulating a char or short value. The exception is when you require wrap-around or modulo arithmetic (e.g. 255 + 1 → 0).
• The compiler emits an AND r1, r1, #0xff instruction even though it should know that i never exceeds 64.
• If we change i from char to unsigned int, the AND disappears: it is no longer necessary to account for wrap-around.
• Remember that this is not just a saving of one instruction or cycle. The loop runs 64 times, so it saves 64 instructions: one for each iteration.
• This is an inner loop, and optimizations to inner loops are highly beneficial.
• Now suppose the data packet contains 16-bit values and we hold the sum in a short.
The expression sum + data[i] is an integer, so it can only be assigned to a short using an (implicit or explicit) narrowing cast. The compiler must insert extra instructions to implement the narrowing cast:
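A sketch of this 16-bit version, consistent with the description (the name checksum_v3 matches the reference further below):

short checksum_v3(short *data)
{
    unsigned int i;
    short sum = 0;

    for (i = 0; i < 64; i++)
    {
        sum = (short)(sum + data[i]);   /* narrowing cast back to short */
    }
    return sum;
}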

The loop is now three instructions longer than the loop in checksum_v2 earlier.



There are two reasons for the extra instructions:
• The LDRH instruction does not allow for a shifted address offset as
the LDR instruction did in checksum_v2. Therefore the first ADD in
the loop calculates the address of item i in the array. The LDRH loads
from an address with no offset.
• The cast reducing sum + data[i] to a short requires two MOV instructions. The compiler shifts left by 16 and then right by 16 to implement a 16-bit sign extend. The shift right is a sign-extending shift, so it replicates the sign bit to fill the upper 16 bits.
• We can avoid the second problem by using an int type variable to
hold the partial sum. We only reduce the sum to a short type at the
function exit.



• The first problem can be solved by accessing the array by incrementing the pointer data rather than using an index as in data[i]. This is efficient regardless of array type size or element size. All ARM load and store instructions have a postincrement addressing mode.
• The next version uses int type local variables to avoid unnecessary casts, and increments the pointer data instead of using the index offset data[i].
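A sketch of this improved version (checksum_v4 is an assumed name, continuing the earlier numbering):

short checksum_v4(short *data)
{
    unsigned int i;
    int sum = 0;                 /* int accumulator: no casts inside the loop */

    for (i = 0; i < 64; i++)
    {
        sum += *(data++);        /* load via the pointer, then postincrement */
    }
    return (short)sum;           /* narrow to short only at function exit */
}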



• The *(data++) operation translates to a single ARM instruction that
loads the data and increments the data pointer.
• You could write sum += *data; data++; or even *data++ instead.
• The compiler produces the following output. Three instructions have
been removed from the inside loop, saving three cycles per loop
compared to checksum_v3.



• Function Argument Types
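As an illustration, here is a sketch of the kind of function discussed below, where both the arguments and the return value are short (the name add_v1 and the body are assumed for illustration):

short add_v1(short a, short b)
{
    /* the arguments and result are short, but they are passed
       and returned in 32-bit registers */
    return a + (b >> 1);
}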

• In this example the input and output values are both short.
• Yet the values will be passed in (and out) in 32-bit-wide registers.
• Should the compiler assume the values are already in the range of a short?
• Or should the compiler force the values into this range by sign-extending the low 16 bits to fill the 32-bit register?
• The compiler must make a compatible decision for both the caller and the callee.



armcc: assumes the input values are already in the correct range for the type.
gcc: makes no assumptions about the range of argument values, so it sign-extends the values on entry.
Signed versus Unsigned Types:
This section compares the efficiency of signed int and unsigned int. If your code uses addition, subtraction, and multiplication, there is no performance difference between signed and unsigned operations. However, there is a difference when it comes to division. Consider the following short example that averages two integers:
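A sketch of such an averaging routine (the name average_v1 is assumed):

int average_v1(int a, int b)
{
    return (a + b) / 2;      /* signed divide by two */
}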



The compiler adds one to the sum before shifting right if the sum is negative. In other words, it replaces x/2 by the statement:
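Expressed in C, the transformation looks roughly like this, wrapped here in a hypothetical helper function (on ARM compilers, >> on a signed int is an arithmetic shift):

int signed_div_by_2(int x)
{
    return (x < 0) ? ((x + 1) >> 1) : (x >> 1);   /* rounds towards zero */
}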



• As x is signed, in C on an ARM target a divide by two is not simply a right shift if x is negative. For example, −3 ≫ 1 = −2 but −3/2 = −1. Division rounds towards zero, but arithmetic right shift rounds towards −∞.

C looping structures:
This section looks at the most efficient ways to code for and while loops on the ARM. We start with loops with a fixed number of iterations and then move on to loops with a variable number of iterations.
Loops with a Fixed Number of Iterations:
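A sketch of a fixed-count checksum loop of the kind discussed below (checksum_v5 is an assumed name, continuing the earlier numbering):

int checksum_v5(int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 0; i < 64; i++)     /* incrementing loop counter */
    {
        sum += *(data++);
    }
    return sum;
}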



The compiler-generated code takes three instructions per iteration to implement the for loop:
• An ADD to increment i
• A compare to check whether i is less than 64
• A conditional branch to continue the loop if i < 64
On the ARM, a loop should only use two instructions:
• A subtract to decrement the loop counter, which also sets the condition code flags on the result
• A conditional branch instruction



• The example below shows the improvement if we switch to a decrementing loop rather than an incrementing loop.
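A sketch of the decrementing version (checksum_v6 is an assumed name, continuing the numbering):

int checksum_v6(int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 64; i != 0; i--)    /* count down to zero */
    {
        sum += *(data++);
    }
    return sum;
}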

The SUBS and BNE instructions implement the loop; the decrementing version takes four instructions per iteration, whereas the incrementing version takes five instructions per iteration.



• For an unsigned loop counter i we can use either of the loop continuation conditions i != 0 or i > 0. As i can't be negative, they are the same condition.
• For a signed loop counter, it is tempting to use the condition i > 0 to continue the loop. The compiler may implement this either as a single flag-setting SUBS followed by a conditional branch, or as a SUB followed by a separate compare with zero and a conditional branch.
• When i = −0x80000000, these two code sequences generate different answers. For the first sequence, the SUBS instruction compares i with 1 and then decrements i. Since −0x80000000 < 1, the loop terminates.



• For the second sequence, we decrement i and then compare with 0. Modulo arithmetic means that i now has the value +0x7fffffff, which is greater than zero. Thus the loop continues for many iterations.
• Therefore you should use the continuation condition i != 0 for signed or unsigned loop counters. It saves one instruction over the condition i > 0 for signed i.
Loops Using a Variable Number of Iterations:
Now suppose we want the checksum routine to handle packets of arbitrary size. We pass in a variable N giving the number of words in the data packet. We count down until N = 0 and so do not require an extra loop counter i.
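A sketch of this variable-length version (the name checksum_v7 matches the reference further below):

int checksum_v7(int *data, unsigned int N)
{
    int sum = 0;

    for (; N != 0; N--)          /* N itself acts as the loop counter */
    {
        sum += *(data++);
    }
    return sum;
}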



Notice that the compiler checks that N is nonzero on entry to the function. Often this check is unnecessary, since you know the array won't be empty. In this case a do-while loop gives better performance and code density than a for loop.
This example shows how to use a do-while loop to remove the test for N
being zero that occurs in a for loop.
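A sketch of the do-while version (checksum_v8 is an assumed name); it assumes the caller guarantees N > 0:

int checksum_v8(int *data, unsigned int N)
{
    int sum = 0;

    do
    {
        sum += *(data++);
    } while (--N != 0);          /* no initial test for N == 0 */
    return sum;
}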

Compare this with the output for checksum_v7, which uses a for loop: the do-while version saves two cycles by removing the initial test for N being zero.

Loop Unrolling:
Each loop iteration costs two instructions in addition to the body of the loop: a subtract to decrement the loop count and a conditional branch. This is called the loop overhead.
On ARM7 or ARM9 processors the subtract takes one cycle and the
branch three cycles, giving an overhead of four cycles per loop.
This loop overhead can be avoided by unrolling a loop—repeating the
loop body several times, and reducing the number of loop iterations by
the same proportion.

• The following code unrolls our packet checksum loop by four times.
We assume that the number of words in the packet N is a multiple of
four.
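A sketch of the unrolled loop, consistent with the description (the name checksum_v9 matches the reference below; N is assumed to be a nonzero multiple of four):

int checksum_v9(int *data, unsigned int N)
{
    int sum = 0;

    do
    {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        N -= 4;                  /* one subtract and one branch per four words */
    } while (N != 0);
    return sum;
}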

This reduces the loop overhead from 4N cycles to (4N)/4 = N cycles.



There are two questions to consider when unrolling a loop:
■ How many times should I unroll the loop?
■ What if the number of loop iterations is not a multiple of the unroll amount? For example, what if N is not a multiple of four in checksum_v9?
For the first question, only unroll loops that are important for the overall performance of the application. Otherwise unrolling will increase the code size with little performance benefit. Unrolling may even reduce performance by evicting more important code from the cache. Suppose the loop is important, accounting for, say, 30% of the entire application, and suppose you unroll it until the loop body is around 128 instructions. The loop overhead is then at most 4 cycles out of 128, roughly 3%. Recalling that the loop is 30% of the entire application, overall the loop overhead is only about 1%. Unrolling the code further gains little extra performance but has a significant impact on the cache contents. It is usually not worth unrolling further when the gain is less than 1%.
For the second question, try to arrange it so that array sizes are multiples of your unroll amount. If this isn't possible, then you must add extra code to take care of the leftover cases. This increases the code size a little but keeps the performance high.
This example handles the checksum of any size of data packet using a
loop that has been unrolled four times.
The second for loop handles the remaining cases when N is not a
multiple of four. Note that both N/4 and N&3 can be zero, so we can’t
use do-while loops.
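A sketch of such a routine (checksum_v10 is an assumed name, continuing the numbering):

int checksum_v10(int *data, unsigned int N)
{
    unsigned int i;
    int sum = 0;

    for (i = N / 4; i != 0; i--)     /* unrolled by four; N/4 may be zero */
    {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
    }
    for (i = N & 3; i != 0; i--)     /* remaining 0 to 3 words */
    {
        sum += *(data++);
    }
    return sum;
}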

Register Allocation:
The compiler attempts to allocate a processor register to each local
variable used in a C function.
It will try to use the same register for different local variables if the uses of the variables do not overlap.
When there are more local variables than available registers, the
compiler stores the excess variables on the processor stack.
These variables are called spilled or swapped out variables since they
are written out to memory.
Spilled variables are slow to access compared to variables allocated to
registers.



To implement a function efficiently, you need to
■ Minimize the number of spilled variables
■ Ensure that the most important and frequently accessed variables
are stored in registers
The number of processor registers available to the ARM C compilers for allocating variables is given in the accompanying table, which shows the standard register names and usage when following the ARM-Thumb Procedure Call Standard (ATPCS) used in code generated by C compilers.

Efficient Register Allocation
■ Try to limit the number of local variables in the internal loop of
functions to 12. The compiler should be able to allocate these to ARM
registers.
■ Guide the compiler as to which variables are important by ensuring
these variables are used within the innermost loop.

Function Calls:
The ARM Procedure Call Standard (APCS) defines how to pass function
arguments and return values in ARM registers.
The first four integer arguments are passed in the first four ARM
registers: r0, r1, r2, and r3.
Subsequent integer arguments are placed on the full descending stack, ascending in memory, as shown in the figure.
Integer return values are passed back in r0.
Two-word arguments such as long long or double are passed in a pair
of consecutive argument registers and returned in r0, r1.
The compiler may pass structures in registers or by reference according
to command line compiler options.

• The first point to note about the procedure call standard is the four-
register rule.
• Functions with four or fewer arguments are far more efficient to call
than functions with five or more arguments.
• For functions with four or fewer arguments, the compiler can pass all
the arguments in registers.
• For functions with more arguments, both the caller and callee must
access the stack for some arguments.

Example: the benefits of using a structure pointer.
This is a typical routine that inserts N bytes from array data into a queue. The queue is implemented as a cyclic buffer with start address Q_start (inclusive) and end address Q_end (exclusive).
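A sketch of the five-argument version, consistent with the description (queue_bytes_v1 is the name used in the discussion below):

char *queue_bytes_v1(
    char *Q_start,     /* queue buffer start address */
    char *Q_end,       /* queue buffer end address */
    char *Q_ptr,       /* current insertion point */
    char *data,        /* bytes to insert */
    unsigned int N)    /* number of bytes to insert (assumed nonzero) */
{
    do
    {
        *(Q_ptr++) = *(data++);
        if (Q_ptr == Q_end)
        {
            Q_ptr = Q_start;     /* wrap around the cyclic buffer */
        }
    } while (--N);
    return Q_ptr;                /* return the new insertion point */
}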

• The following code creates a Queue structure and passes this to the
function to reduce the number of function arguments.
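A sketch of the structure-based version (queue_bytes_v2), consistent with the description:

typedef struct {
    char *Q_start;     /* queue buffer start address */
    char *Q_end;       /* queue buffer end address */
    char *Q_ptr;       /* current insertion point */
} Queue;

void queue_bytes_v2(Queue *queue, char *data, unsigned int N)
{
    char *Q_ptr = queue->Q_ptr;
    char *Q_end = queue->Q_end;

    do
    {
        *(Q_ptr++) = *(data++);
        if (Q_ptr == Q_end)
        {
            Q_ptr = queue->Q_start;
        }
    } while (--N);
    queue->Q_ptr = Q_ptr;        /* write back the updated insertion point */
}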



queue_bytes_v2 is one instruction longer than queue_bytes_v1, but it is in fact more efficient
overall. The second version has only three function arguments rather than five. Each call to the
function requires only three register setups. There is a net saving of two instructions in function
call overhead. There are likely further savings in the callee function, as it only needs to assign a
single register to the Queue structure pointer, rather than three registers in the nonstructured
case.



• The function uint_to_hex converts a 32-bit unsigned integer into an array of eight
hexadecimal digits.
• It uses a helper function nybble_to_hex, which converts a digit d in the range 0 to
15 to a hexadecimal digit.
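A sketch of the two functions, consistent with the description:

unsigned int nybble_to_hex(unsigned int d)
{
    /* convert 0 <= d <= 15 to the corresponding hexadecimal character */
    if (d <= 9)
    {
        return d + '0';
    }
    return d - 10 + 'A';
}

void uint_to_hex(char *out, unsigned int in)
{
    unsigned int i;

    for (i = 8; i != 0; i--)
    {
        in = (in << 4) | (in >> 28);            /* rotate in left by 4 bits */
        *(out++) = (char)nybble_to_hex(in & 15);
    }
}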



• When we compile this, we see that uint_to_hex doesn't call nybble_to_hex at all!
• In the compiled code, the compiler has inlined the nybble_to_hex code into uint_to_hex. This is more efficient than generating a function call.



• The compiler will only inline small functions. You can ask the compiler to
inline a function using the __inline keyword, although this keyword is only
a hint and the compiler may ignore it.
• Inlining large functions can lead to big increases in code size without much
performance improvement.

Calling Functions Efficiently:


• Try to restrict functions to four arguments. This will make them more
efficient to call. Use structures to group related arguments and pass
structure pointers instead of multiple arguments.
• Define small functions in the same source file and before the functions that
call them. The compiler can then optimize the function call or inline the
small function.
• Critical functions can be inlined using the __inline keyword.



Mixing C and Assembly
• Mixing C and assembly is quite common, especially in deeply embedded applications where programmers work close to the hardware level.
• There are two ways to add assembly to your high-level source code: the inline assembler and the embedded assembler.
• A related technique is inlining, where the __inline keyword is placed in the C or C++ code to mark a function that, when possible, should be expanded directly at the point of call rather than being called as a subroutine.
• This potentially avoids some of the overhead associated with branching and returning.
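As a sketch of the inline assembler, here is a hypothetical example using armcc-style __asm block syntax (gcc uses a different asm("...") notation); it enables IRQ interrupts by clearing the I bit of the cpsr:

__inline void enable_IRQ(void)
{
    int tmp;

    __asm
    {
        MRS  tmp, CPSR           /* read the current program status register */
        BIC  tmp, tmp, #0x80     /* clear the I bit */
        MSR  CPSR_c, tmp         /* write back the control field */
    }
}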



Instruction scheduling:
• Reordering the instructions in a code sequence to avoid processor
stalls.
• Since ARM implementations are pipelined, the timing of an
instruction can be affected by neighboring instructions
• It takes additional effort to optimize assembly routines so don’t
bother to optimize noncritical ones.



Cycle timing of common instructions:
The time taken to execute an instruction depends on the implementation pipeline.
• Instructions that are conditional on the value of the ARM condition codes in the cpsr take one cycle if the condition is not met.
• If the condition is met, then the following rules apply:
• ALU operations such as addition, subtraction, shift by an immediate value, and logical operations take one cycle.
• An ALU operation using a register-specified shift takes two cycles.
• If the instruction writes to the pc, then add two cycles.
• Load instructions that load N 32-bit words of memory, such as LDR and LDM, take N cycles to issue, but the result of the last word loaded is not available on the following cycle. The updated load address is available on the next cycle.
• If the instruction loads the pc, then add two cycles.
• Load instructions that load 16-bit or 8-bit data, such as LDRB, LDRSB, LDRH, and LDRSH, take one cycle to issue. The load result is not available on the following two cycles. The updated load address is available on the next cycle.



• Branch instructions take three cycles.
• Store instructions that store N values take N cycles.
• An STM or LDM of a single value is exceptional, taking two cycles.
• Multiply instructions take a varying number of cycles depending on the value of the second operand in the product:
  MUL, MLA: 2 cycles
  xMULL, xMLAL: 3 cycles
To schedule code efficiently on the ARM, we need to understand the ARM pipeline and dependencies. The ARM9TDMI processor performs five operations in parallel:
■ Fetch: fetch from memory the instruction at address pc. The instruction is loaded into the core and then proceeds down the core pipeline.
■ Decode: decode the instruction that was fetched in the previous cycle. The processor also reads the input operands from the register bank if they are not available via one of the forwarding paths.



■ ALU: execute the instruction that was decoded in the previous cycle. Note this instruction was originally fetched from address pc − 8 (ARM state) or pc − 4 (Thumb state). This involves calculating the answer for a data processing operation, or the address for a load, store, or branch operation.
■ LS1: load or store the data specified by a load or store instruction. If the instruction is not a load or store, then this stage has no effect.
■ LS2: extract and zero- or sign-extend the data loaded by a byte or halfword load instruction. If the instruction is not a load of an 8-bit byte or 16-bit halfword item, then this stage has no effect.



• After an instruction has completed the five stages of the pipeline, the core writes the result to the register file.
Note: pc points to the address of the instruction being fetched. The ALU is executing the instruction that was originally fetched from address pc − 8, in parallel with fetching the instruction at address pc.
• If an instruction requires the result of a previous instruction that is not available, then the processor stalls. This is called a pipeline hazard or pipeline interlock.
Example: no interlock in the pipeline.
ADD r0, r0, r1
ADD r0, r0, r2
This instruction pair takes two cycles. The ALU calculates r0 + r1 in one cycle. Therefore this result is available for the ALU to calculate r0 + r2 in the second cycle.



• This example shows a one-cycle interlock caused by a load instruction.
LDR r1, [r2, #4]
ADD r0, r0, r1
This instruction pair takes three cycles.
The ALU calculates the address r2 + 4 in the first cycle while decoding
the ADD instruction in parallel.
However, the ADD cannot proceed on the second cycle because the
load instruction has not yet loaded the value of r1. Therefore the
pipeline stalls for one cycle while the load instruction completes the
LS1 stage.
Now that r1 is ready, the processor executes the ADD in the ALU on the
third cycle.



The figure illustrates how this interlock affects the pipeline.
• The processor stalls the ADD instruction for one cycle in the ALU stage of the pipeline while the load instruction completes the LS1 stage (the stalled ADD is shown in italics in the figure).
• Since the LDR instruction proceeds down the pipeline while the ADD instruction is stalled, a gap opens up between them.
• This gap is called a pipeline bubble, marked with a dash.



• This example shows a one-cycle interlock caused by delayed load use.
LDRB r1, [r2, #1]
ADD r0, r0, r2
EOR r0, r0, r1

This instruction triplet takes four cycles.


Although the ADD proceeds on the cycle following the load byte, the EOR
instruction cannot start on the third cycle.
The r1 value is not ready until the load instruction completes the LS2 stage of
the pipeline.
The processor stalls the EOR instruction for one cycle. Note that the ADD
instruction does not affect the timing at all.
The ADD doesn’t cause any stalls since the ADD does not use r1, the result of
the load.



This example shows why a branch instruction takes three cycles.
The processor must flush the pipeline when jumping to a new address.
MOV r1, #1
B case1
AND r0, r0, r1
EOR r2, r2, r3
...
case1 SUB r0, r0, r1

The three executed instructions take a total of five cycles. The MOV instruction executes on
the first cycle. On the second cycle, the branch instruction calculates the destination address.
This causes the core to flush the pipeline and refill it using this new pc value. The refill takes
two cycles.
