Arm® Compiler User Guide
Version 6.16
Document number: 100748_0616_01_en
Copyright © 2016–2021 Arm Limited or its affiliates. All rights reserved.
Release Information
Document History
Your access to the information in this document is conditional upon your acceptance that you will not use or permit others to use
the information for the purposes of determining whether implementations infringe any third party patents.
THIS DOCUMENT IS PROVIDED “AS IS”. ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES,
EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE
WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, Arm makes no representation with respect to, and has
undertaken no analysis to identify or understand the scope and content of, third party patents, copyrights, trade secrets, or other
rights.
TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR ANY DAMAGES,
INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR
CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING
OUT OF ANY USE OF THIS DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or disclosure of
this document complies fully with any relevant export laws and regulations to assure that this document or any portion thereof is
not exported, directly or indirectly, in violation of such export laws. Use of the word “partner” in reference to Arm’s customers is
not intended to create or refer to any partnership relationship with any other company. Arm may make changes to this document at
any time and without notice.
If any of the provisions contained in these terms conflict with any of the provisions of any click through or signed written
agreement covering this document with Arm, then the click through or signed written agreement prevails over and supersedes the
conflicting provisions of these terms. This document may be translated into other languages for convenience, and you agree that if
there is any conflict between the English version of this document and any translation, the terms of the English version of the
Agreement shall prevail.
The Arm corporate logo and words marked with ® or ™ are registered trademarks or trademarks of Arm Limited (or its
subsidiaries) in the US and/or elsewhere. All rights reserved. Other brands and names mentioned in this document may be the
trademarks of their respective owners. Please follow Arm’s trademark usage guidelines at http://www.arm.com/company/policies/trademarks.
Copyright © 2016–2021 Arm Limited (or its affiliates). All rights reserved.
(LES-PRE-20349)
Confidentiality Status
This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license restrictions in
accordance with the terms of the agreement entered into by Arm and the party that Arm delivered this document to.
We believe that this document contains no offensive terms. If you find offensive terms in this document, please contact
[email protected].
Preface
About this book
Chapter 9 Overlays
9.1 Overlay support in Arm® Compiler
9.2 Automatic overlay support
9.3 Manual overlay support
Glossary
The Arm® Glossary is a list of terms used in Arm documentation, together with definitions for those
terms. The Arm Glossary does not contain terms that are industry standard unless the Arm meaning
differs from the generally accepted meaning.
See the Arm® Glossary for more information.
Typographic conventions
italic
Introduces special terminology, denotes cross-references, and citations.
bold
Highlights interface elements, such as menu names. Denotes signal names. Also used for terms
in descriptive lists, where appropriate.
monospace
Denotes text that you can enter at the keyboard, such as commands, file and program names,
and source code.
monospace
Denotes a permitted abbreviation for a command or option. You can enter the underlined text
instead of the full command or option name.
monospace italic
Denotes arguments to monospace text where the argument is to be replaced by a specific value.
monospace bold
Denotes language keywords when used outside example code.
<and>
Encloses replaceable terms for assembler syntax where they appear in code or code fragments.
For example:
MRC p15, 0, <Rd>, <CRn>, <CRm>, <Opcode_2>
SMALL CAPITALS
Used in body text for a few terms that have specific technical meanings, that are defined in the
Arm® Glossary. For example, IMPLEMENTATION DEFINED, IMPLEMENTATION SPECIFIC, UNKNOWN, and
UNPREDICTABLE.
Feedback
Feedback on content
If you have comments on content then send an e-mail to [email protected]. Give:
• The title Arm Compiler User Guide.
• The number 100748_0616_01_en.
• If applicable, the page number(s) to which your comments refer.
• A concise explanation of your comments.
Arm also welcomes general suggestions for additions and improvements.
Note
Arm tests the PDF only in Adobe Acrobat and Acrobat Reader, and cannot guarantee the quality of the
represented document when used with any other PDF reader.
Other information
• Arm® Developer.
• Arm® Documentation.
• Technical Support.
• Arm® Glossary.
This chapter introduces Arm Compiler 6 and helps you to start working with Arm Compiler 6 quickly.
You can use Arm Compiler 6 from Arm Development Studio, Arm DS-5 Development Studio, Arm
Keil® MDK, or as a standalone product.
100748_0616_01_en Copyright © 2016–2021 Arm Limited or its affiliates. All rights 1-15
reserved.
Non-Confidential
1 Getting Started
1.1 Introduction to Arm® Compiler 6
armlink
The linker combines the contents of one or more object files with selected parts of one or more
object libraries to produce an executable program.
armar
The archiver enables sets of ELF object files to be collected together and maintained in archives
or libraries. If you do not change the files often, these collections reduce compilation time as
you do not have to recompile from source every time you use them. You can pass such a library
or archive to the linker in place of several ELF files. You can also use the archive for
distribution to a third-party application developer as you can share the archive without giving
away the source code.
fromelf
The image conversion utility can convert Arm ELF images to binary formats. It can also
generate textual information about the input image, such as its disassembly, code size, and data
size.
Arm C++ libraries
The Arm C++ libraries are based on the LLVM libc++ project:
• The libc++abi library is a runtime library providing implementations of low-level language
features.
• The libc++ library provides an implementation of the ISO C++ library standard. It depends
on the functions that are provided by libc++abi.
Note
Arm does not guarantee the compatibility of C++ compilation units compiled with different
major or minor versions of Arm Compiler and linked into a single image. Therefore, Arm
recommends that you always build your C++ code from source with a single version of the
toolchain.
You can mix C++ with C code or C libraries.
Arm C libraries
The Arm C libraries provide:
• An implementation of the library features as defined in the C standards.
• Nonstandard extensions common to many C libraries.
• POSIX extended functionality.
• Functions standardized by POSIX.
Note
Comments inside source files and header files that are provided by Arm might not be accurate and must
not be treated as documentation about the product.
Application development
A typical application development flow might involve the following:
• Developing C/C++ source code for the main application (armclang).
• Developing assembly source code for near-hardware components, such as interrupt service routines
(armclang, or armasm for legacy assembly code).
• Linking all objects together to generate an image (armlink).
• Converting an image to a flash format, such as plain binary, Intel Hex, or Motorola-S (fromelf).
The following figure shows how the compilation tools are used for the development of a typical
application.
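[Figure: application development flow. C/C++ source (.c) for A32 and T32 is compiled to object files (.o) that contain code, data, and debug information.]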
Note
Be aware of the following:
• Generated code might be different between two Arm Compiler releases.
• For a feature release, there might be significant code generation differences.
Related concepts
1.6 Compiling a Hello World example
Related references
3.2 Common Arm® Compiler toolchain options
Related information
-S (armclang)
1.2 About the Arm® Compiler toolchain assemblers
Note
The command-line option descriptions and related information in the Arm® Compiler Reference Guide
describe all the features that Arm Compiler supports. Any features that are not documented are not
supported, and you use them at your own risk. You are responsible for making sure that any generated
code that uses community features is operating correctly.
Related concepts
5.1 Assembling armasm and GNU syntax assembly code
Related references
Chapter 6 Using Assembly and Intrinsics in C or C++ Code
Related information
Arm Compiler Reference Guide
1.3 Installing Arm® Compiler
System Requirements
Arm Compiler 6 is available for the following operating systems:
• Windows 64-bit.
• Windows 32-bit.
• Linux 64-bit.
For more information on system requirements see the Arm® Compiler release note.
1.4 Accessing Arm® Compiler from Arm® Development Studio
1.5 Accessing Arm® Compiler from the Arm® Keil® µVision® IDE
MDK is a microprocessor development suite that provides the µVision® IDE, and Arm Compiler as a
built-in toolchain.
For more information, see Manage Arm® Compiler Versions in the µVision® User's Guide.
Related references
1.3 Installing Arm® Compiler
1.6 Compiling a Hello World example
A simple example
The source code that is used in the examples is a single C source file, hello.c, to display a greeting
message:
#include <stdio.h>

int main() {
    printf("Hello World\n");
    return 0;
}
Compiling and linking hello.c in a single armclang step creates an executable file with the default
name a.out. You can use the -o option to specify a different name for the executable file.
When you compile for an AArch64 state target and specify only --target, the compiler defaults to
generating code that runs on any Armv8‑A target. You can also use -mcpu to target a specific
processor.
Compiling for an AArch32 target
To create an executable for an AArch32 target in a single step:
armclang --target=arm-arm-none-eabi -mcpu=cortex-a53 hello.c
There is no default target for AArch32 state. You must specify either -march to target an
architecture or -mcpu to target a processor. This example uses -mcpu to target the Cortex‑A53
processor. The compiler generates code that is optimized specifically for the Cortex‑A53, but
might not run on other processors.
Use -mcpu=list or -march=list to see all available processor or architecture options.
...
main
0x000081a0: e92d4800 .H-. PUSH {r11,lr}
0x000081a4: e1a0b00d .... MOV r11,sp
0x000081a8: e24dd010 ..M. SUB sp,sp,#0x10
0x000081ac: e3a00000 .... MOV r0,#0
0x000081b0: e50b0004 .... STR r0,[r11,#-4]
0x000081b4: e30a19cc .... MOV r1,#0xa9cc
...
• Examine the size of code and data in the executable:
fromelf --text -z a.out
See fromelf Command-line Options for the options from the fromelf tool.
This command compiles the two source files file1.c and file2.c into an executable file for an
AArch64 state target. The -o option specifies that the filename of the generated executable file is
image.axf.
However, more complex projects might have a large number of source files. It is inefficient to recompile
every source file on every build, because many of the source files are unlikely to have changed. To avoid
recompiling unchanged source files, you can compile and link as separate steps. You can then use a build
system (such as make) to compile only those source files that have changed, and then link the object
code together. The armclang -c option tells the compiler to compile to object code and stop before
calling the linker:
armclang -c --target=aarch64-arm-none-eabi file1.c
armclang -c --target=aarch64-arm-none-eabi file2.c
armlink file1.o file2.o -o image.axf
Related information
--target (armclang)
-march (armclang)
-mcpu (armclang)
Summary of armclang command-line options
1.7 Using the integrated assembler
Note
The integrated assembler sets a minimum alignment of 4 bytes for a .text section. However, if you
define your own sections with the integrated assembler, then you must include the .balign directive to
set the correct alignment. For a section containing T32 instructions, set the alignment to 2 bytes. For a
section containing A32 instructions, set the alignment to 4 bytes.
    .section StringCopy, "ax"
    .balign 4
    .global mystrcopy
    .type mystrcopy, "function"
mystrcopy:
    ldrb r2, [r1], #1
    strb r2, [r0], #1
    cmp r2, #0
    bne mystrcopy
    bx lr
The .section directive creates a new section in the object file named StringCopy. The characters in the
string following the section name are the flags for this section. The a flag marks this section as
allocatable. The x flag marks this section as executable.
The .balign directive aligns the subsequent code to a 4-byte boundary. The alignment is required for
compliance with the Arm® Application Procedure Call Standard (AAPCS).
The .global directive marks the symbol mystrcopy as a global symbol. This enables the symbol to be
referenced by external files.
The .type directive sets the type of the symbol mystrcopy to function. This helps the linker use the
proper linkage when the symbol is branched to from A32 or T32 code.
...
** Section #3 'StringCopy' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]
Size : 14 bytes (alignment 4)
Address: 0x00000000
$t.0
mystrcopy
0x00000000: f8112b01 ...+ LDRB r2,[r1],#1
0x00000004: f8002b01 ...+ STRB r2,[r0],#1
0x00000008: 2a00 .* CMP r2,#0
0x0000000a: d1f9 .. BNE mystrcopy ; 0x0
0x0000000c: 4770 pG BX lr
...
The example shows the disassembly for the section StringCopy as created in the source file.
Note
The code is marked as T32 by default because Armv8‑M Mainline does not support A32 code. For
processors that support A32 and T32 code, you can explicitly mark the code as A32 or T32 by adding the
GNU assembly .arm or .thumb directive, respectively, at the start of the source file.
int main(void) {
    mystrcopy(dest, source);
    return 0;
}
An extern function declaration has been added for the mystrcopy function. The return type and function
parameters must be checked manually.
If you want to call the assembly function from a C++ source file, you must disable C++ name mangling
by using extern "C" instead of extern. For the above example, use:
extern "C" void mystrcopy(char *dest, const char *source);
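One common way to keep a single declaration usable from both C and C++ callers is to guard the linkage block with the __cplusplus macro. The following is a minimal sketch of that pattern as a hypothetical header fragment. It also includes mystrcopy_ref, an illustrative plain C stand-in that is not part of the original example and that mirrors the byte-copy loop of the assembly routine, so that the expected behavior can be checked on a host machine:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header fragment: one declaration for C and C++ callers. */
#ifdef __cplusplus
extern "C" {
#endif

/* Assembly routine from the example; the prototype must be checked manually. */
void mystrcopy(char *dest, const char *source);

/* Illustrative C stand-in (an assumption, not part of the original example). */
void mystrcopy_ref(char *dest, const char *source);

#ifdef __cplusplus
}
#endif

/* Copies bytes up to and including the terminating NUL, like the
   LDRB/STRB/CMP/BNE loop in the assembly version. */
void mystrcopy_ref(char *dest, const char *source)
{
    char c;

    assert(dest != NULL && source != NULL);
    do {
        c = *source++;   /* load one byte and post-increment, as LDRB does */
        *dest++ = c;     /* store it and post-increment, as STRB does */
    } while (c != '\0'); /* stop once the terminator has been copied */
}
```

The stand-in uses a different name from the assembly symbol so that both can coexist in one image. As with the assembly version, the caller is responsible for making sure that dest is large enough for the string and its terminator.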
Related concepts
3.1 Mandatory armclang options
Related information
Summary of armclang command-line options
Sections
1.8 Running bare-metal images
1.9 Architectures supported by Arm® Compiler
arm-arm-none-eabi
Generates A32 and T32 instructions for AArch32 state. Must be used in conjunction with
-march (to target an architecture) or -mcpu (to target a processor).
To generate generic code that runs on any processor with a particular architecture, use the -march option.
Use the -march=list option to see all supported architectures.
To optimize your code for a particular processor, use the -mcpu option. Use the -mcpu=list option to see
all supported processors.
Note
The --target, -march, and -mcpu options are armclang options. For all of the other tools, such as
armasm and armlink, use the --cpu option to specify target processors and architectures.
Related information
--target (armclang)
-march (armclang)
-mcpu (armclang)
--cpu (armlink)
Arm Glossary
1.10 Using Arm® Compiler securely in a shared environment
Chapter 2
Getting Started with the SVE features in Arm®
Compiler
Describes how to generate an executable binary that makes use of the instructions provided by the SVE
architectural extension to the Armv8‑A architecture.
It contains the following sections:
• 2.1 Introducing SVE.
• 2.2 Assembling SVE code.
• 2.3 Disassembling SVE object files.
• 2.4 Running a binary in an AEMv8-A Base Fixed Virtual Platform (FVP).
2.1 Introducing SVE
Note
The Arm Compiler toolchain only supports bare-metal applications. For SVE compilation for Linux, use
Arm Compiler for Linux, which is part of Arm Allinea Studio.
Note
Arm Compiler does not support auto-vectorization for SVE. If you require auto-vectorization for SVE,
then you must use Arm Compiler for Linux. For more information, see Arm Allinea Studio.
Related information
Arm Compiler 6 documentation
2.2 Assembling SVE code
To assemble this source file into a binary object file, use armclang with an SVE-enabled target:
armclang -c --target=aarch64-arm-none-eabi -march=armv8-a+sve example1.s -o example1.o
-march=armv8-a+sve
Specifies that the compiler targets the Armv8‑A architecture profile with the SVE target feature
enabled.
The default for AArch64 is -march=armv8-a, that is the Armv8‑A architecture profile without
the SVE extension. You must explicitly specify +sve to assemble SVE instructions.
Armv8‑A and later architectures support the SVE extension. For example, -march=armv8.1-a+sve.
example1.s
Input assembly language file.
-o example1.o
Output ELF object file.
Related tasks
2.3 Disassembling SVE object files
Related information
Arm Compiler Reference Guide
armclang -c option
armclang -o option
2.3 Disassembling SVE object files
Procedure
1. Create the C file daxpy.c containing the following code:
#ifdef __ARM_FEATURE_SVE
#include <arm_sve.h>
#endif /* __ARM_FEATURE_SVE */
Results:
The disassembly is as follows:
...
** Section #3 '.text.daxpy_1_1' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]
Size : 76 bytes (alignment 4)
Address: 0x00000000
$x.0
daxpy_1_1
0x00000000: 04e0e3e9 .... CNTD x9
0x00000004: aa1f03e8 .... MOV x8,xzr
0x00000008: 25e017e0 ...% WHILELT p0.D,xzr,x0
0x0000000c: 05282000 . (. MOV z0.D,d0
0x00000010: 25d8e3e1 ...% PTRUE p1.D
0x00000014: 04e7e3ea .... CNTD x10,ALL,MUL #8
0x00000018: aa0903eb .... MOV x11,x9
0x0000001c: 8b08002c ,... ADD x12,x1,x8
0x00000020: 8b08004d M... ADD x13,x2,x8
0x00000024: a5e0a181 .... LD1D {z1.D},p0/Z,[x12]
0x00000028: a5e0a1a2 .... LD1D {z2.D},p0/Z,[x13]
0x0000002c: 8b0a0108 .... ADD x8,x8,x10
0x00000030: 65e00022 "..e FMLA z2.D,p0/M,z1.D,z0.D
0x00000034: e5e0e1a2 .... ST1D {z2.D},p0,[x13]
0x00000038: 25e01560 `..% WHILELT p0.D,x11,x0
0x0000003c: 2550c400 ..P% PTEST p1,p0.B
0x00000040: 8b09016b k... ADD x11,x11,x9
0x00000044: 54fffec1 ...T B.NE {pc}-0x28 ; 0x1c
0x00000048: d65f03c0 .._. RET
...
Related concepts
2.2 Assembling SVE code
2.4 Running a binary in an AEMv8-A Base Fixed Virtual Platform (FVP)
Where:
$FVP_BASE
Specifies the directory that contains the FVP executable and the ScalableVectorExtension.so
plugin.
$VECLEN
Defines the SVE vector width, in units of 64-bit (8-byte) blocks. The maximum value is 32,
which corresponds to the architectural maximum SVE vector width of 2048 bits (256 bytes).
The SVE architecture only supports vector lengths in 128-bit (16-byte) increments, so all values
of $VECLEN must be even. For example, a value of 8 signifies a 512-bit vector width.
--quiet
Specifies that the FVP emits reduced output. For example, if --quiet is omitted, the messages
"Simulation is started" and "Simulation is terminating" are output to signify the start and end
of program execution.
--stat
Specifies that the FVP writes a short summary of program execution to standard output
following termination (even if --quiet is specified).
This output is of the form:
Total instructions executed: 10344
User time: 0.01 sec
Kernel time: 0.00 sec
CPU time: 0.01 sec
Elapsed clock: 0.00 sec
$CMDLINE
Specifies the command line to pass to your program. This command line is typically of the form
"./binary_name arg1 arg2".
$BINARY
Specifies the path to the compiled binary that the FVP is to load and execute.
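The $VECLEN arithmetic described above can be checked with a few lines of C. The following is a minimal sketch, not part of the FVP or the toolchain: the function names are hypothetical, and treating 2 (one 128-bit increment) as the smallest valid value is an assumption that follows from the 128-bit granularity.

```c
#include <assert.h>
#include <stdbool.h>

/* $VECLEN counts 64-bit blocks. Valid values are even (128-bit steps),
   from 2 (128 bits) up to 32 (2048 bits). The lower bound is an assumption. */
bool veclen_is_valid(unsigned veclen)
{
    return veclen >= 2u && veclen <= 32u && (veclen % 2u) == 0u;
}

/* Converts a valid $VECLEN value to the vector width in bits. */
unsigned veclen_to_bits(unsigned veclen)
{
    assert(veclen_is_valid(veclen));
    return veclen * 64u;
}

/* Converts a valid $VECLEN value to the vector width in bytes. */
unsigned veclen_to_bytes(unsigned veclen)
{
    return veclen_to_bits(veclen) / 8u;
}
```

For example, veclen_to_bits(8) gives 512 and veclen_to_bytes(32) gives 256, which matches the values quoted in the description of $VECLEN.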
A sample application
The following sample application uses the svld1, svst1, svcntd, svmla_x, and svwhilelt_b64
intrinsics:
// daxpy_acle.c
#include <stdio.h>
#ifdef __ARM_FEATURE_SVE
#include <arm_sve.h>
#endif /* __ARM_FEATURE_SVE */
*dx = 1.5;
*dy = 1.5;
daxpy_1_1(10, da, dx, dy);
return 0;
}
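Because the listing above is only a fragment, the following is a hedged sketch of how a DAXPY loop can be written with the intrinsics named in this section. The function name daxpy_sketch, its parameter names, and the scalar fallback for builds without SVE are assumptions made for illustration. This is not the original daxpy_1_1 source.

```c
#include <assert.h>
#include <stdint.h>
#ifdef __ARM_FEATURE_SVE
#include <arm_sve.h>
#endif /* __ARM_FEATURE_SVE */

/* Computes y[i] = a * x[i] + y[i] for i in [0, n). */
void daxpy_sketch(int64_t n, double a, const double *x, double *y)
{
    assert(n >= 0);
#ifdef __ARM_FEATURE_SVE
    /* Vector-length agnostic loop: svcntd() is the number of 64-bit
       lanes, and the predicate masks off lanes at or beyond n. */
    for (int64_t i = 0; i < n; i += (int64_t)svcntd()) {
        svbool_t pg = svwhilelt_b64(i, n);
        svfloat64_t vx = svld1(pg, &x[i]);
        svfloat64_t vy = svld1(pg, &y[i]);
        /* svmla_x computes vy + vx * a on the active lanes. */
        svst1(pg, &y[i], svmla_x(pg, vy, vx, a));
    }
#else
    /* Scalar fallback so that the sketch also builds without SVE. */
    for (int64_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
#endif /* __ARM_FEATURE_SVE */
}
```

The loop does not depend on a particular vector width, so the same binary can be run with different veclen settings. The SVE path is only compiled when the target enables the extension, for example with +sve in the -march value.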
VECLEN=$1
CMDLINE=$2
$FVP_BASE/FVP_Base_AEMv8A-AEMv8A \
--plugin $FVP_BASE/ScalableVectorExtension.so \
-C SVE.ScalableVectorExtension.veclen=$VECLEN \
--quiet \
--stat \
-C cluster0.NUM_CORES=1 \
-C bp.secure_memory=0 \
-C bp.refcounter.non_arch_start_at_default=1 \
-C cluster0.cpu0.semihosting-use_stderr=1 \
-C bp.vis.disable_visualisation=1 \
-C cluster0.cpu0.semihosting-cmd_line="$CMDLINE" \
-a cluster0.cpu0=$CMDLINE
This script loads and executes the compiled binary with the FVP.
Related information
Arm Compiler Reference Guide
armclang -o option
armclang -Xlinker option
armclang -O option
armclang -march option
armclang --target option
Chapter 3
Using Common Compiler Options
There are many options that you can use to control how Arm Compiler generates code for your
application. This section lists the mandatory and commonly used optional command-line arguments,
such as those that control target selection, optimization, and debug view.
3.1 Mandatory armclang options
Specifying a target
To specify a target, use the --target option. The following targets are available:
• To generate A64 instructions for AArch64 state, specify --target=aarch64-arm-none-eabi.
Note
For AArch64, the default architecture is Armv8‑A.
• To generate A32 and T32 instructions for AArch32 state, specify --target=arm-arm-none-eabi. To
specify generation of either A32 or T32 instructions, use -marm or -mthumb respectively.
Note
AArch32 has no defaults. You must always specify an architecture or processor.
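To confirm which instruction set a given combination of --target, -marm, and -mthumb selects, source code can test the compiler's predefined macros. The following is a minimal sketch that assumes the widely used __aarch64__, __thumb__, and __arm__ macros. The function names are hypothetical and are not part of any Arm library.

```c
#include <assert.h>
#include <string.h>

/* Returns a label for the instruction set that this translation unit is
   being compiled for, based on compiler predefined macros. */
const char *target_instruction_set(void)
{
#if defined(__aarch64__)
    return "A64";            /* --target=aarch64-arm-none-eabi */
#elif defined(__thumb__)
    return "T32";            /* AArch32 state, Thumb code generation */
#elif defined(__arm__)
    return "A32";            /* AArch32 state, Arm code generation */
#else
    return "non-Arm host";   /* for example, a native build on a desktop */
#endif
}

/* Small helper to compare the label against an expected name. */
int target_is(const char *name)
{
    assert(name != NULL);
    return strcmp(target_instruction_set(), name) == 0;
}
```

The __thumb__ test comes before the __arm__ test because compilers typically define both macros when generating T32 code for an AArch32 target.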
Specifying an architecture
To generate code for a specific architecture, use the -march option. The supported architectures vary
according to the selected target.
To see a list of all the supported architectures for the selected target, use -march=list.
Specifying a processor
To generate code for a specific processor, use the -mcpu option. The supported processors vary according
to the selected target.
To see a list of all the supported processors for the selected target, use -mcpu=list.
You can also enable or disable optional architecture features by using the +[no]feature notation:
use +feature to explicitly enable a feature, or +nofeature to explicitly disable it.
For a list of the architecture features that your processor supports, see the processor product
documentation. See the Arm Compiler Reference Guide for a list of architecture features that Arm
Compiler supports.
Note
Avoid specifying both the architecture (-march) and the processor (-mcpu) because specifying both has
the potential to cause a conflict. The compiler infers the correct architecture from the processor.
• If you want to run code on one particular processor, specify the processor using -mcpu. Performance
is optimized, but code is only guaranteed to run on that processor. If you specify a value for -mcpu,
do not also specify a value for -march.
• If you want your code to run on a range of processors from a particular architecture, specify the
architecture using -march. The code runs on any processor implementation of the target architecture,
but performance might be impacted. If you specify a value for -march, do not also specify a value for
-mcpu.
Examples
These examples compile and link the input file helloworld.c:
• To compile for the Armv8‑A architecture in AArch64 state, use:
armclang --target=aarch64-arm-none-eabi -march=armv8-a helloworld.c
• To compile for the Armv8‑R architecture in AArch32 state, use:
armclang --target=arm-arm-none-eabi -march=armv8-r helloworld.c
• To compile for the Armv8‑M architecture mainline profile, use:
armclang --target=arm-arm-none-eabi -march=armv8-m.main helloworld.c
• To compile for a Cortex‑A53 processor in AArch64 state, use:
armclang --target=aarch64-arm-none-eabi -mcpu=cortex-a53 helloworld.c
• To compile for a Cortex‑A53 processor in AArch32 state, use:
armclang --target=arm-arm-none-eabi -mcpu=cortex-a53 helloworld.c
• To compile for a Cortex-M4 processor, use:
armclang --target=arm-arm-none-eabi -mcpu=cortex-m4 helloworld.c
• To compile for a Cortex-M33 processor, with DSP disabled, use:
armclang --target=arm-arm-none-eabi -mcpu=cortex-m33+nodsp helloworld.c
• To target the AArch32 state of an Arm Neoverse N1 processor, use:
armclang --target=arm-arm-none-eabi -mcpu=neoverse-n1 helloworld.c
• To target the AArch64 state of an Arm Neoverse E1 processor, use:
armclang --target=aarch64-arm-none-eabi -mcpu=neoverse-e1 helloworld.c
Related information
--target (armclang)
-march (armclang)
-mcpu (armclang)
-marm (armclang)
-mthumb (armclang)
Summary of armclang command-line options
3.2 Common Arm® Compiler toolchain options
armclang common options
Option Description
-c Performs the compilation step, but not the link step.
-x Specifies the language of the subsequent source files, -xc inputfile.s or -xc++
inputfile.s for example.
-std Specifies the language standard to compile for, -std=c90 for example.
--target=arch-vendor-os-abi Generates code for the selected Execution state (AArch32 or AArch64), for example
--target=aarch64-arm-none-eabi or --target=arm-arm-none-eabi.
-march=name Generates code for the specified architecture, for example -march=armv8-a or
-march=armv7-a.
-march=list Displays a list of all the supported architectures for the selected execution state.
-mcpu=name Generates code for the specified processor, for example -mcpu=cortex-a53,
-mcpu=cortex-a57, or -mcpu=cortex-a15.
-mcpu=list Displays a list of all the supported processors for the selected execution state.
-marm Requests that the compiler targets the A32 instruction set, which consists of 32-bit
wide instructions only. For example,
--target=arm-arm-none-eabi -march=armv7-a -marm. This option
emphasizes performance.
The -marm option is not valid with M-profile or AArch64 targets:
• If you use the -marm option with an M-profile target architecture, the compiler
generates an error and stops, and does not output any code.
• For AArch64 targets, the compiler ignores the -marm option and generates a
warning.
-mthumb Requests that the compiler targets the T32 instruction set, which consists of both
16-bit wide and 32-bit wide instructions. For example,
--target=arm-arm-none-eabi -march=armv8-a -mthumb. This option
emphasizes code density.
The -mthumb option is not valid with AArch64 targets. The compiler ignores the
-mthumb option and generates a warning if used with AArch64 targets.
-mfloat-abi Specifies whether to use hardware instructions or software library functions for
floating-point operations.
-mfpu Specifies the target FPU architecture.
-g Generates DWARF debug tables compatible with the DWARF 4 standard.
-E Executes only the preprocessor step.
-I Adds the specified directories to the list of places that are searched to find included
files.
-o Specifies the name of the output file.
-Onum Specifies the level of performance optimization to use when compiling source files.
-Os Balances code size against code speed.
-Oz Optimizes for code size.
-S Outputs the disassembly of the machine code that the compiler generates.
-### Displays diagnostic output showing the options that would be used to invoke the
compiler and linker. The compilation and link steps are not performed.
Option Description
--scatter=filename Creates an image memory map using the scatter-loading description that the specified
file contains.
--entry Specifies the unique initial entry point of the image.
--info Displays information about linker operation. For example,
--info=sizes,unused,unusedsymbols displays information about all of the
following:
• Code and data sizes for each input object and library member in the image.
• Unused sections that --remove has removed from the code.
• Symbols that were removed with the unused sections.
--list=filename Redirects diagnostics output from options including --info and --map to the
specified file.
--map Displays a memory map containing the address and the size of each load region,
execution region, and input section in the image, including linker-generated input
sections.
--symbols Lists each local and global symbol that is used in the link step, and their values.
--keep=section_id Specifies input sections that unused section elimination must not remove.
--load_addr_map_info Includes the load addresses for execution regions and the input sections within them
in the map file.
Option Description
--debug_symbols Includes debug symbols in the library.
-a pos_name Places new files in the library after the file pos_name.
-b pos_name Places new files in the library before the file pos_name.
Option Description
--elf Selects ELF output mode.
--text [options] Displays image information in text format.
The optional options specify additional information to include in the image
information. Valid options include -c to disassemble code, and -s to print the
symbol and versioning tables.
--info Displays information about specific topics, for example --info=totals lists the
Code, RO Data, RW Data, ZI Data, and Debug sizes for each input object and
library member in the image.
Option Description
--cpu=name Sets the target processor.
-g Generates DWARF debug tables compatible with the DWARF 3 standard.
--fpu=name Selects the target floating-point unit (FPU) architecture.
-o Specifies the name of the output file.
3 Using Common Compiler Options
3.3 Selecting source language options
Note
This topic includes descriptions of [ALPHA] and [COMMUNITY] features. See Support level
definitions on page Appx-A-266.
Source language
By default Arm Compiler treats files with .c extension as C source files. If you want to compile a .c
file, for example file.c, as a C++ source file, use the -xc++ option:
armclang --target=aarch64-arm-none-eabi -march=armv8-a -xc++ file.c
By default Arm Compiler treats files with .cpp extension as C++ source files. If you want to compile
a .cpp file, for example file.cpp, as a C source file, use the -xc option:
armclang --target=aarch64-arm-none-eabi -march=armv8-a -xc file.cpp
The -x option only applies to input files that follow it on the command line.
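To check which language mode a given command line selects, a source file can test the predefined __cplusplus macro, which C++ compilers define and C compilers do not. The following is a minimal sketch; the function name language_mode is an illustrative assumption and is not part of any Arm library.

```c
/* Reports the language mode that this translation unit was compiled in.
 * language_mode is a hypothetical name used only for illustration. */
const char *language_mode(void)
{
#ifdef __cplusplus
    return "C++";   /* for example, file.c compiled with -xc++ */
#else
    return "C";     /* for example, file.cpp compiled with -xc */
#endif
}
```

Calling language_mode() from a test program built with and without -xc++ shows which mode was applied to the file.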
- - c++14 gnu++14
The default language standard for C code is gnu11 [COMMUNITY]. The default language standard for
C++ code is gnu++14. To specify a different source language standard, use the -std=name option.
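The effect of -std=name on C code can be observed through the standard __STDC_VERSION__ macro, which is 199901L for C99 and 201112L for C11, and is not defined for C90. The following sketch assumes a hypothetical helper named c_standard_version.

```c
/* Returns the value of __STDC_VERSION__ for this translation unit, or 0 when
 * the macro is not defined, as is the case when compiling for C90.
 * c_standard_version is a hypothetical name used only for illustration. */
long c_standard_version(void)
{
#ifdef __STDC_VERSION__
    return __STDC_VERSION__;
#else
    return 0L;
#endif
}
```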
Note
Arm does not guarantee the compatibility of C++ compilation units compiled with different major or
minor versions of Arm Compiler and linked into a single image. Therefore, Arm recommends that you
always build your C++ code from source with a single version of the toolchain.
You can mix C++ with C code or C libraries.
Arm Compiler supports various language extensions, including GCC extensions, which you can use in
your source code. The GCC extensions are only available when you specify one of the GCC C or C++
language variants. For more information on language extensions, see the Arm® C Language Extensions in
Arm Compiler.
Because Arm Compiler uses the available language extensions by default, it does not adhere to the strict
ISO standard. To compile to strict ISO standard for the source language, use the -Wpedantic option. This
option generates warnings where the source code violates the ISO standard. Arm Compiler does not
support strict adherence to C++98 or C++03.
If you do not use -Wpedantic, Arm Compiler uses the available language extensions without warning.
However, where language variants produce different behavior, the behavior is that of the language
variant that -std specifies.
Note
Certain compiler optimizations can violate strict adherence to the ISO standard for the language. To
identify when these violations happen, use the -Wpedantic option.
The following example shows the use of a variable length array, which is a C99 feature. In this example,
the function declares an array i, with variable length n.
#include <stdlib.h>
void function(int n) {
    int i[n];
}
Arm Compiler does not warn when compiling the example for C99 with -Wpedantic:
armclang --target=aarch64-arm-none-eabi -march=armv8-a -c -std=c99 -Wpedantic file.c
Arm Compiler does warn about variable length arrays when compiling the example for C90 with
-Wpedantic:
armclang --target=aarch64-arm-none-eabi -march=armv8-a -c -std=c90 -Wpedantic file.c
All supported standard versions For all supported standards, the libc++ library deviates from the standard library as
follows:
• For std::vector<bool>::const_reference, the standards require the
const_reference type to be bool. However, in libc++ the const_reference
type is an IMPLEMENTATION DEFINED, read-only bit reference class.
• For std::bitset<N>, the standards require bool operator[](size_t pos)
const; to return bool. However, in libc++ bool operator[](size_t pos)
const; returns an IMPLEMENTATION DEFINED, read-only bit reference object.
Table 3-7 Exceptions to the support for the language standards (continued)
C11 [COMMUNITY] The base Clang component provides C11 language functionality. However, Arm has
performed no independent testing of these features and therefore these features are
[COMMUNITY] features. Use of C11 library features is unsupported.
C11 is the default language standard for C code. However, use of the new C11 language
features is a community feature. Use the -std option to restrict the language standard if
necessary. Use the -Wc11-extensions option to warn about any use of C11-specific
features.
C++11 • Concurrency constructs or other constructs that are enabled through the following
standard library headers are [ALPHA] supported:
— <thread>
— <mutex>
— <shared_mutex>
— <condition_variable>
— <future>
— <chrono>
— <atomic>
For more details, contact the Arm Support team.
• The thread_local keyword is not supported.
C++14 • Concurrency constructs or other constructs that are enabled through the following
standard library headers are [ALPHA] supported:
— <thread>
— <mutex>
— <shared_mutex>
— <condition_variable>
— <future>
— <chrono>
— <atomic>
For more details, contact the Arm Support team.
• The thread_local keyword is not supported.
Note
gnu++14 is the default language standard for C++ code.
C++17 [COMMUNITY] The base Clang and libc++ components provide C++17 language functionality.
However, Arm has performed no independent testing of these features and therefore
these features are [COMMUNITY] features.
Additional information
See the Arm® Compiler Reference Guide for information about Arm-specific language extensions.
For more information about libc++ support, see Standard C++ library implementation definition, in the
Arm® C and C++ Libraries and Floating-Point Support User Guide.
The Clang documentation provides additional information about language compatibility:
• Language compatibility:
http://clang.llvm.org/compatibility.html
• Language extensions:
http://clang.llvm.org/docs/LanguageExtensions.html
• C++ status:
http://clang.llvm.org/cxx_status.html
Note
The -fsanitize=undefined command-line option is a [COMMUNITY] feature.
Related information
Standard C++ library implementation definition
Arm Compiler Reference Guide
3 Using Common Compiler Options
3.4 Selecting optimization options
Better correlation between source code and generated code -O0 (no optimization)
If you use a higher optimization level for performance, it has a higher impact on the other goals such as
degraded debug experience, increased code size, and increased build time.
If your optimization goal is code size reduction, it has an impact on the other goals such as degraded
debug experience, slower performance, and increased build time.
armclang provides a range of options to help you find a suitable approach for your requirements.
Consider whether code size reduction or faster performance is the goal which matters most for your
application, and then choose an option which matches your goal.
which the compiler might automatically generate vector instructions. It also degrades the debug
experience, and might result in an increased code size compared to -O1.
The differences when using -O2 as compared to -O1 are:
• The threshold at which the compiler believes that it is profitable to inline a call site might increase.
• The amount of loop unrolling that is performed might increase.
• Vector instructions might be generated for simple loops and for correlated sequences of independent
scalar operations.
The creation of vector instructions can be inhibited with the armclang command-line option
-fno-vectorize.
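As an illustration of the kind of simple loop that is a candidate for vector instructions, consider the following sketch. The function name add_arrays is hypothetical, and whether the compiler actually vectorizes the loop depends on the target, the optimization level, and options such as -fno-vectorize.

```c
#include <stddef.h>

/* Element-wise addition of two arrays. Each iteration is independent of the
 * others, and the restrict qualifiers state that the arrays do not overlap,
 * which makes the loop easier for a compiler to vectorize.
 * add_arrays is a hypothetical name used only for illustration. */
void add_arrays(int *restrict dst, const int *restrict a,
                const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

Compiling such a function with -S at -O1 and at -O2 is one way to compare the generated code.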
This level also performs other aggressive optimizations that might violate strict compliance with
language standards.
This level degrades the debug experience, and might result in increased code size compared to -O3.
Examples
The example shows the code generation when using the -O0 optimization option. To perform this
optimization, compile your source file using:
armclang --target=arm-arm-none-eabi -march=armv7-a -O0 -S file.c
The example shows the code generation when using the -O1 optimization option. To perform this
optimization, compile your source file using:
armclang --target=arm-arm-none-eabi -march=armv7-a -O1 -S file.c
The source file contains mostly dead code, such as int x=10 and z=x+y. At optimization level -O0, the
compiler performs no optimization, and therefore generates code for the dead code in the source file.
However, at optimization level -O1, the compiler does not generate code for the dead code in the source
file.
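A function of the following shape illustrates the idea. It is a sketch only, not the source file that the example refers to, and the name live_value is hypothetical. The assignments to x, y, and z never influence the return value, so they are dead code.

```c
/* Illustrative function that contains dead code.
 * live_value is a hypothetical name, not taken from the original example. */
int live_value(int a)
{
    int x = 10;      /* dead: x is used only to compute z */
    int y = 20;      /* dead: y is used only to compute z */
    int z = x + y;   /* dead: z never affects the result */

    (void)z;         /* avoids an unused-variable warning, creates no real use */
    return a + 1;    /* the only computation that affects the result */
}
```

Compiling this function with -O0 -S and then with -O1 -S is one way to see whether instructions for x, y, and z are present in the output.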
Related information
Semihosting for AArch32 and AArch64
3 Using Common Compiler Options
3.5 Building to aid debugging
When linking, there are several armlink options available to help improve the debug view:
• --debug. This option is the default.
• --no_remove to retain all input sections in the final image even if they are unused.
• --bestdebug. When different input objects are compiled with different optimization levels, this
option enables linking for the best debug illusion.
3 Using Common Compiler Options
3.6 Linking object files to produce an executable
where:
options
are linker command-line options.
input-file-list
is a space-separated list of objects, libraries, or symbol definitions (symdefs) files.
For example, to link the object file hello_world.o into an executable image hello_world.axf:
armlink -o hello_world.axf hello_world.o
3.7 Linker options for mapping code and data to target memory
For an image to run correctly on a target, you must place the various parts of the image at the correct
locations in memory. Linker command-line options are available to map the various parts of an image to
target memory.
The options implement the scatter-loading mechanism that describes the memory layout for the image.
The options that you use depend on the complexity of your image:
• For simple images, use the following memory map related options:
— --ro_base to specify the address of both the load and execution region containing the RO output
section.
— --rw_base to specify the address of the execution region containing the RW output section.
— --zi_base to specify the address of the execution region containing the ZI output section.
Note
For objects that include execute-only (XO) sections, the linker provides the --xo_base option to
locate the XO sections. These sections come from objects that are targeted at Armv7‑M or Armv8‑M
architectures, or from objects that are built with the armclang -mthumb option.
• For complex images, use a text format scatter-loading description file. This file is known as a scatter
file, and you specify it with the --scatter option.
Note
You cannot use the memory map related options with the --scatter option.
Examples
The following example shows how to place code and data using the memory map related options:
armlink --ro_base=0x0 --rw_base=0x400000 --zi_base=0x405000 --first="init.o(init)" init.o main.o
Note
In this example, --first is also included to make sure that the initialization routine is executed first.
The following example shows a scatter file, scatter.scat, that defines an equivalent memory map:
LR1 0x0000 0x20000
{
    ER_RO 0x0
    {
        init.o (INIT, +FIRST)
        *(+RO)
    }
    ER_RW 0x400000
    {
        *(+RW)
    }
    ER_ZI 0x405000
    {
        *(+ZI)
    }
}
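If the initialization routine is written in C, one way to give it a section name that a scatter file can select is the section attribute, which Clang and GCC both support. The following is a sketch under that assumption. The names system_init, system_is_initialized, and init_done are hypothetical, and the section name INIT is chosen only to match the scatter file pattern in the example.

```c
/* Sketch: placing an initialization routine in a named section so that a
 * scatter file pattern such as init.o (INIT, +FIRST) can select it.
 * All identifiers here are hypothetical. */
static int init_done;

#if defined(__ELF__)
__attribute__((section("INIT")))   /* applied only for ELF output formats */
#endif
void system_init(void)
{
    init_done = 1;
}

/* Reports whether system_init has run. */
int system_is_initialized(void)
{
    return init_done;
}
```

The armlink --map option can then be used to confirm where the section was placed in the image.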
3 Using Common Compiler Options
3.8 Passing options from the compiler to the linker
In addition, the -Xlinker and -Wl options let you pass options directly to the linker from the compiler
command line. These options perform the same function, but use different syntaxes:
• The -Xlinker option specifies a single option, a single argument, or a single option=argument pair.
If you want to pass multiple options, use multiple -Xlinker options.
• The -Wl, option specifies a comma-separated list of options and arguments or option=argument
pairs.
For example, the following are all equivalent because armlink treats the single option --list=diag.txt
and the two options --list diag.txt equivalently:
-Xlinker --list -Xlinker diag.txt -Xlinker --split
-Wl,--list,diag.txt,--split
-Wl,--list=diag.txt,--split
Note
The -### compiler option produces diagnostic output showing exactly how the compiler and linker are
invoked, displaying the options for each tool. With the -### option, armclang only displays this
diagnostic output. It does not compile source files or invoke armlink.
The following example shows how to use the -Xlinker option to pass the --split option to the linker,
splitting the default load region containing the RO and RW output sections into separate regions:
armclang hello.c --target=aarch64-arm-none-eabi -Xlinker --split
You can use fromelf --text to compare the differences in image content:
armclang hello.c --target=aarch64-arm-none-eabi -o hello_DEFAULT.axf
armclang hello.c --target=aarch64-arm-none-eabi -o hello_SPLIT.axf -Xlinker --split
3 Using Common Compiler Options
3.9 Controlling diagnostic messages
message
The message text. This text might end with a diagnostic flag of the form -Wflag, for example
-Wvla-extension, to identify the error or warning. Only the messages that you can suppress
have an associated flag. Errors that you cannot suppress do not have an associated flag.
An example warning diagnostic message is:
file.c:8:7: warning: variable length arrays are a C99 feature [-Wvla-extension]
int i[n];
^
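If this warning is undesirable in code that must build for C90, one possible alternative to a variable length array is a heap allocation. The following is a sketch under that assumption; the function name sum_with_heap_buffer and its behavior are illustrative only.

```c
#include <stdlib.h>

/* Sums the integers 0..n-1 using a heap-allocated buffer instead of a
 * variable length array. Returns -1 if n is not positive or if the
 * allocation fails. sum_with_heap_buffer is a hypothetical name. */
long sum_with_heap_buffer(int n)
{
    int *i;
    long total = 0;
    int k;

    if (n <= 0) {
        return -1;
    }
    i = malloc((size_t)n * sizeof *i);   /* replaces: int i[n]; */
    if (i == NULL) {
        return -1;
    }
    for (k = 0; k < n; k++) {
        i[k] = k;
        total += i[k];
    }
    free(i);
    return total;
}
```

Because the array size is no longer part of a declaration, this version does not rely on the C99 variable length array feature.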
Option Description
-Werror Turn all warnings into errors.
-Werror=foo Turn warning flag foo into an error.
-Weverything Enable all warnings.
-Wpedantic Generate warnings if code violates strict ISO C and ISO C++.
-pedantic Generate warnings if code violates strict ISO C and ISO C++.
-pedantic-errors Generate errors if code violates strict ISO C and ISO C++.
See Controlling Errors and Warnings in the Clang Compiler User's Manual for full details about
controlling diagnostics with armclang.
    printf("Result of %d plus %d is %d\n", i, x); /* Missing an input argument for the third %d */
    call(); /* This function has not been declared and is therefore an implicit declaration */
    return;
}
By default, armclang checks the format of printf() statements to ensure that the number of % format
specifiers matches the number of data arguments. Therefore armclang generates this diagnostic message:
file.c:9:36: warning: more '%' conversions than data arguments [-Wformat]
printf("Result of %d plus %d is %d\n", i, x);
^
By default, armclang compiles for the gnu11 standard for .c files. This language standard does not
allow implicit function declarations. Therefore armclang generates this diagnostic message:
file.c:11:3: warning: implicit declaration of function 'call' is invalid C99 [-Wimplicit-function-declaration]
call();
^
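Both warnings go away when the format string receives one argument per conversion and the called function is declared before use. The following is a sketch of such a correction, not the original example. It assumes that the third value is the sum i + x, formats into a caller-supplied buffer so that the result can be inspected, and uses hypothetical names (format_result, call_count).

```c
#include <stdio.h>

/* Hypothetical counter, used only so that the effect of call() is observable. */
int call_count = 0;

/* Declared and defined before use, so no implicit declaration occurs. */
void call(void)
{
    call_count++;
}

/* Writes the message into buf. Every %d conversion has a matching argument.
 * The third argument, i + x, is an assumption about the intended value. */
int format_result(char *buf, size_t len, int i, int x)
{
    int written = snprintf(buf, len, "Result of %d plus %d is %d\n", i, x, i + x);
    call();
    return written;
}
```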
Some diagnostic messages are suppressed by default. To see all diagnostic messages, use -Weverything:
armclang --target=aarch64-arm-none-eabi -march=armv8-a -c file.c -Weverything
The compiler only generates a warning for the second instance of #endif foo:
foo.c:8:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif foo /* warning: extra tokens at end of #endif directive */
^
//
1 warning generated.
type
Internal faults indicate an internal problem with the tool. Contact your supplier with
feedback.
Error
Warnings indicate unusual conditions that might indicate a problem, but the tool
continues.
Remark
All the diagnostic messages that are in this format, and any additional information, are in the Arm®
Compiler Errors and Warnings Reference Guide.
armasm only. Uses a shorter form of the diagnostic output. The original source line is not
displayed and the error message text is not wrapped when it is too long to fit on a single line.
--diag_error=tag[,tag]...
Sets the specified diagnostic messages to Error severity. Use --diag_error=warning to treat all
warnings as errors.
--diag_warning=tag[,tag]...
Sets the specified diagnostic messages to Warning severity. Use --diag_warning=error to set
all errors that can be downgraded to warnings.
--remarks
armlink only. Enables the display of remark messages (including any messages redesignated to
remark severity using --diag_remark).
tag is the four-digit diagnostic number, nnnn, with the tool letter prefix, but without the letter suffix
indicating the severity. A full list of tags with the associated suffixes is in the Arm® Compiler Errors and
Warnings Reference Guide.
For example, to downgrade a warning message to Remark severity:
$ armasm test.s --cpu=8-A.32
"test.s", line 55: Warning: A1313W: Missing END directive at end of file
0 Errors, 1 Warning
Related information
-W (armclang)
The LLVM Compiler Infrastructure Project
Clang Compiler User's Manual
3 Using Common Compiler Options
3.10 Selecting floating-point options
Option Description
armclang -mfpu Specify the floating-point architecture to the compiler (ignored with AArch64 targets).
armclang -mfloat-abi Specify the floating-point linkage to the compiler.
armclang -march Specify the target architecture to the compiler. This option automatically selects the default
floating-point architecture.
armclang -mcpu Specify the target processor to the compiler. This option automatically selects the default
floating-point architecture.
armlink --fpu Specify the floating-point architecture to the linker.
To improve performance, the compiler can use floating-point registers instead of the stack. You can
disable this feature with the [COMMUNITY] option -mno-implicit-float.
Note
Avoid specifying both the architecture (-march) and the processor (-mcpu) because specifying both has
the potential to cause a conflict. The compiler infers the correct architecture from the processor.
• If you want to run code on one particular processor, specify the processor using -mcpu. Performance
is optimized, but code is only guaranteed to run on that processor. If you specify a value for -mcpu,
do not also specify a value for -march.
• If you want your code to run on a range of processors from a particular architecture, specify the
architecture using -march. The code runs on any processor implementation of the target architecture,
but performance might be impacted. If you specify a value for -march, do not also specify a value for
-mcpu.
Note
The -mfpu option is ignored with AArch64 targets, for example aarch64-arm-none-eabi. Use the -mcpu
option to override the default FPU for aarch64-arm-none-eabi targets. For example, to prevent the use of
floating-point instructions or floating-point registers for the aarch64-arm-none-eabi target use the
-mcpu=name+nofp+nosimd option. Subsequent use of floating-point data types in this mode is
unsupported.
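Code that must avoid floating-point data types in such a configuration sometimes uses integer fixed-point arithmetic instead. This guide does not prescribe that technique. The following is only a sketch of one common approach, a signed Q16.16 format, and the names q16_16, Q16_ONE, q16_from_int, and q16_mul are hypothetical.

```c
#include <stdint.h>

/* Signed Q16.16 fixed-point value: 16 integer bits and 16 fraction bits.
 * All names in this sketch are hypothetical. */
typedef int32_t q16_16;

#define Q16_ONE ((q16_16)65536)   /* the value 1.0 in Q16.16 */

/* Converts a small integer to Q16.16. The caller must keep v within the
 * range -32768 to 32767 so that the result fits in 32 bits. */
q16_16 q16_from_int(int v)
{
    return (q16_16)(v * Q16_ONE);
}

/* Multiplies two Q16.16 values using a 64-bit intermediate product.
 * Right-shifting a negative value is implementation-defined in C, so this
 * sketch assumes an arithmetic shift. */
q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * (int64_t)b) >> 16);
}
```

For example, q16_mul applied to the representations of 1.5 and 2.0 yields the representation of 3.0.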
See -march and -mcpu in the Arm Compiler Reference Guide for more information.
When compiling for AArch32:
• By default, Arm Compiler uses floating-point hardware that is available on the target, except for
Armv6‑M, which does not have any floating-point hardware.
• To disable the use of floating-point hardware instructions, use the -mfpu=none option.
armclang --target=arm-arm-none-eabi -march=armv8-a -mfpu=none
• On AArch32 targets, using -mfpu=none disables the hardware for both Advanced SIMD and floating-
point arithmetic. You can use -mfpu to selectively enable certain hardware features. For example, if
you want to use the hardware for Advanced SIMD operations on an Armv7 architecture-based
processor, but not for floating-point arithmetic, then use -mfpu=neon.
armclang --target=arm-arm-none-eabi -march=armv7-a -mfpu=neon
• The Armv8.1-M architecture profile has optional support for the M-profile Vector Extension (MVE).
-march and -mcpu support certain MVE floating-point combinations.
See -march, -mcpu, and -mfpu in the Arm Compiler Reference Guide for more information.
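Source code can adapt to whether Advanced SIMD is enabled by testing the ACLE macro __ARM_NEON. The following sketch uses integer Advanced SIMD intrinsics from arm_neon.h when the macro is defined and falls back to a scalar loop otherwise. The function name add4_s32 is hypothetical.

```c
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Adds four 32-bit integers element-wise.
 * add4_s32 is a hypothetical name used only for illustration. */
void add4_s32(int32_t dst[4], const int32_t a[4], const int32_t b[4])
{
#if defined(__ARM_NEON)
    /* Advanced SIMD path: one 128-bit integer vector addition. */
    int32x4_t va = vld1q_s32(a);
    int32x4_t vb = vld1q_s32(b);
    vst1q_s32(dst, vaddq_s32(va, vb));
#else
    /* Scalar path, used when Advanced SIMD is not enabled. */
    for (int i = 0; i < 4; i++) {
        dst[i] = a[i] + b[i];
    }
#endif
}
```

The example uses integer vectors so that it does not depend on hardware floating-point arithmetic being enabled.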
Floating-point linkage
Floating-point linkage refers to how the floating-point arguments are passed to and returned from
function calls.
For AArch64, Arm Compiler always uses hardware linkage. When using hardware linkage, Arm
Compiler passes and returns floating-point values in hardware floating-point registers.
For AArch32, Arm Compiler can use hardware linkage or software linkage. When using software
linkage, Arm Compiler passes and returns floating-point values in general-purpose registers. By default,
Arm Compiler uses software linkage. You can use the -mfloat-abi option to force hardware linkage or
software linkage.
softfp (This value is the default) Software linkage. Use general-purpose registers. Use hardware
floating-point instructions. But if -mfpu=none is specified for AArch32, then use software
libraries.
Code with hardware linkage can be faster than the same code with software linkage. However, code with
software linkage can be more portable because it does not require the hardware floating-point registers.
Hardware floating-point is not available on some architectures such as Armv6‑M, or on processors where
the floating-point hardware might be powered down for energy efficiency reasons.
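The linkage in effect for a translation unit can be inspected with ACLE predefined macros: __ARM_PCS_VFP indicates that floating-point arguments are passed in hardware floating-point registers, and __ARM_PCS indicates the base procedure call standard on AArch32. The following is a sketch; the function names float_linkage and scale are hypothetical.

```c
/* Describes the floating-point linkage of this translation unit.
 * float_linkage is a hypothetical name used only for illustration. */
const char *float_linkage(void)
{
#if defined(__aarch64__)
    return "hardware linkage (AArch64)";
#elif defined(__ARM_PCS_VFP)
    return "hardware linkage (AArch32)";
#elif defined(__ARM_PCS)
    return "software linkage (AArch32)";
#else
    return "not an Arm target";
#endif
}

/* With hardware linkage, x and k are passed in floating-point registers.
 * With software linkage, they are passed in general-purpose registers. */
float scale(float x, float k)
{
    return x * k;
}
```

The source of scale is the same in both cases. Only the calling convention that the compiler generates differs.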
Note
In AArch32 state, if you specify -mfloat-abi=soft, then specifying the -mfpu option does not have an
effect.
See the Arm Compiler Reference Guide for more information on the -mfloat-abi option.
Note
All objects to be linked together must have the same type of linkage. If you link object files that have
hardware linkage with object files that have software linkage, then the image might have unpredictable
behavior. When linking objects, specify the armlink option --fpu=name where name specifies the
correct linkage type and floating-point hardware. This option enables the linker to provide diagnostic
information if it detects different linkage types.
See the Arm Compiler Reference Guide for more information on how the --fpu option specifies the
linkage type and floating-point hardware.
Related information
-mcpu (armclang)
-mfloat-abi (armclang)
-mfpu (armclang)
About floating-point support
3 Using Common Compiler Options
3.11 Compilation tools command-line option rules
armclang follows the same syntax rules as GCC. Some options are preceded by a single dash -, others
by a double dash --. Some options require an = character between the option and the argument, others
require a space character.
Keyword options
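All keyword options, including keyword options with arguments, are preceded by a double dash
--. An = or space character is required between the option and the argument. For example,
--cpu=list and --cpu list are equivalent.
To pass a file whose name starts with a dash, use the POSIX option -- to specify that all
subsequent arguments are treated as filenames, not as options. For example:
armlink -- -ifile_1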
In some Unix shells, you might have to include quotes when using arguments to some command-line
options, for example:
armlink obj1.o --keep='s.o(vect)'
Chapter 4
Writing Optimized Code
To make best use of the optimization capabilities of Arm Compiler, there are various options, pragmas,
attributes, and coding techniques that you can use.
4 Writing Optimized Code
4.1 Effect of the volatile keyword on compiler optimization
register. This register must be written to using a two-register STM instruction, and not by either an STRD
instruction or a pair of STR instructions. There is no guarantee that the compiler selects the access method
required by that register in response to a volatile modifier on the associated variable or pointer type.
If you are writing code that must access the AXI port, or any other memory-mapped location that
requires a particular access strategy, then declaring the location as a volatile variable is not enough.
You must also perform your accesses to the register using an __asm__ statement containing the load or
store instructions you need. For example:
__asm__ volatile("stm %1,{%Q0,%R0}" : : "r"(val), "r"(ptr));
__asm__ volatile("ldm %1,{%Q0,%R0}" : "=r"(val) : "r"(ptr));
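One way to package the store is a small wrapper function. The following is a sketch, not a definitive implementation. The function name write64_two_register is hypothetical, the "memory" clobber is an added assumption intended to stop the compiler reordering other memory accesses around the store, and the plain assignment in the fallback branch exists only so that the code also builds for targets other than AArch32.

```c
#include <stdint.h>

/* Writes a 64-bit value to a memory-mapped location.
 * write64_two_register is a hypothetical name used only for illustration. */
void write64_two_register(volatile uint64_t *ptr, uint64_t val)
{
#if defined(__arm__)
    /* AArch32: force a two-register STM, as the location requires. */
    __asm__ volatile("stm %1,{%Q0,%R0}" : : "r"(val), "r"(ptr) : "memory");
#else
    /* Other targets: ordinary volatile store, with no guarantee about the
     * instruction that the compiler selects. */
    *ptr = val;
#endif
}
```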
Both of these routines increment a counter in a loop until a status flag buffer_full is set to true. The
state of buffer_full can change asynchronously with program flow.
The example on the left does not declare the variable buffer_full as volatile and is therefore wrong.
The example on the right does declare the variable buffer_full as volatile.
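A sketch of the volatile variant, written to be consistent with the description above rather than copied from the original table, might look like the following. Removing the volatile qualifier from the declaration of buffer_full gives the nonvolatile variant that is described as wrong.

```c
/* Status flag that can change asynchronously with program flow, for example
 * from an interrupt handler. The volatile qualifier tells the compiler to
 * reload the flag on every access. */
volatile int buffer_full;

/* Increments a counter until buffer_full becomes nonzero. */
int read_stream(void)
{
    int count = 0;

    while (!buffer_full) {
        count++;
    }
    return count;
}
```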
The following table shows the corresponding disassembly of the machine code that the compiler
produces for each of the examples in Table 4-1 C code for nonvolatile and volatile buffer loops
on page 4-68. The C code for each example is compiled using
armclang --target=arm-arm-none-eabi -march=armv8-a -Os -S.
Nonvolatile version of buffer loop:
read_stream:
    movw r0, :lower16:buffer_full
    movt r0, :upper16:buffer_full
    ldr r1, [r0]
    mvn r0, #0
.LBB0_1:
    add r0, r0, #1
    cmp r1, #0
    beq .LBB0_1 ; infinite loop
    bx lr
Volatile version of buffer loop:
read_stream:
    movw r1, :lower16:buffer_full
    mvn r0, #0
    movt r1, :upper16:buffer_full
.LBB1_1:
    ldr r2, [r1] ; buffer_full
    add r0, r0, #1
    cmp r2, #0
    beq .LBB1_1
    bx lr
In the disassembly of the nonvolatile example, the statement LDR r1, [r0] loads the value of
buffer_full into register r1 outside the loop labeled .LBB0_1. Because buffer_full is not declared as
volatile, the compiler assumes that its value cannot be modified outside the program. Having already
read the value of buffer_full into r1, the compiler omits reloading the variable when optimizations are
enabled, because its value cannot change. The result is the infinite loop labeled .LBB0_1.
In the disassembly of the volatile example, the compiler assumes that the value of buffer_full can
change outside the program and performs no optimization. Therefore, the value of buffer_full is
loaded into register r2 inside the loop labeled .LBB1_1. As a result, the assembly code that is generated
for loop .LBB1_1 is correct.
Related information
Volatile variables
armclang Inline Assembler
Arm Cortex-R7 MPCore Technical Reference Manual
Arm Cortex-R8 MPCore Processor Technical Reference Manual
4 Writing Optimized Code
4.2 Optimizing loops
Loop unrolling
You can reduce the impact of this overhead by unrolling some of the iterations, which in turn reduces the
number of iterations for checking the condition. Use #pragma unroll (n) to unroll time-critical loops
in your source code. However, unrolling loops has the disadvantage of increasing the code size. These
pragmas are only effective at optimization -O2, -O3, -Ofast, and -Omax.
Pragma Description
#pragma unroll (n) Unroll n iterations of the loop.
Note
Manually unrolling loops in source code might hinder the automatic rerolling of loops and other loop
optimizations by the compiler. Arm recommends that you use #pragma unroll instead of manually
unrolling loops. See #pragma unroll[(n)], #pragma unroll_completely in the Arm® Compiler Reference
Guide for more information.
The following examples show code with loop unrolling and code without loop unrolling.
Bit counting loop without unrolling Bit counting loop with unrolling
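The source for the two bit counting functions is not reproduced here. A minimal sketch of what such a pair could look like follows. It assumes the function names countSetBits1 and countSetBits2 that appear in the generated assembly, and an unroll factor of 4. The loop bodies are illustrative and might differ from the code that produced the listing.

```c
/* Counts the set bits in n. The compiler decides whether to unroll. */
int countSetBits1(unsigned int n)
{
    int bits = 0;

    while (n != 0) {
        if (n & 1) {
            bits++;
        }
        n >>= 1;
    }
    return bits;
}

/* Same loop, with a request to unroll four iterations (assumed factor). */
int countSetBits2(unsigned int n)
{
    int bits = 0;

#pragma unroll (4)
    while (n != 0) {
        if (n & 1) {
            bits++;
        }
        n >>= 1;
    }
    return bits;
}
```

Both functions return the same result for every input. Only the shape of the generated loop is expected to differ.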
The following code is the code that Arm Compiler generates for the preceding examples. Copy the
examples into file.c and compile using:
armclang --target=arm-arm-none-eabi -march=armv8-a file.c -O2 -S -o file.s
For the function with loop unrolling, countSetBits2, the generated code is faster but larger in size.
Bit counting loop without unrolling:
countSetBits1:
    mov r1, r0
    mov r0, #0
    cmp r1, #0
    bxeq lr
    mov r2, #0
    mov r0, #0
.LBB0_1:
    and r3, r1, #1
    cmp r2, r1, asr #1
    add r0, r0, r3
    lsr r3, r1, #1
    mov r1, r3
    bne .LBB0_1
    bx lr
Bit counting loop with unrolling:
countSetBits2:
    mov r1, r0
    mov r0, #0
    cmp r1, #0
    bxeq lr
    mov r2, #0
    mov r0, #0
.LBB0_1:
    and r3, r1, #1
    cmp r2, r1, asr #1
    add r0, r0, r3
    beq .LBB0_4
@ BB#2:
    asr r3, r1, #1
    cmp r2, r1, asr #2
    and r3, r3, #1
    add r0, r0, r3
    asrne r3, r1, #2
    andne r3, r3, #1
    addne r0, r0, r3
    cmpne r2, r1, asr #3
    beq .LBB0_4
@ BB#3:
    asr r3, r1, #3
    cmp r2, r1, asr #4
    and r3, r3, #1
    add r0, r0, r3
    asr r3, r1, #4
    mov r1, r3
    bne .LBB0_1
.LBB0_4:
    bx lr
Arm Compiler can unroll loops completely only if the number of iterations is known at compile time.
Loop vectorization
If your target has the Advanced SIMD unit, then Arm Compiler can use the vectorizing engine to
optimize vectorizable sections of the code. At optimization level -O1, you can enable vectorization using
-fvectorize. At higher optimizations, -fvectorize is enabled by default and you can disable it using
-fno-vectorize. See -fvectorize in the Arm® Compiler Reference Guide for more information. When
using -fvectorize with -O1, vectorization might be inhibited in the absence of other optimizations
which might be present at -O2 or higher.
For example, loops that access structures can be vectorized if all parts of the structure are accessed
within the same loop rather than in separate loops. The following examples show a loop that Advanced
SIMD can vectorize, and a loop that cannot be vectorized easily.
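The two loops are not reproduced here. The following is a sketch of what they could look like, assuming the names DoubleBuffer1 and DoubleBuffer2 and the global buffer that appear in the generated assembly. The three-member structure and the array length of 8 are inferred from the 12-byte stride in that assembly and are assumptions, as are the member names.

```c
/* Assumed layout: three ints per element, eight elements. */
typedef struct tBuffer {
    int a;
    int b;
    int c;
} tBuffer;

tBuffer buffer[8];

/* All members are updated in one loop, so consecutive memory is
   touched in order and the loop is a good vectorization candidate. */
void DoubleBuffer1(void)
{
    int i;

    for (i = 0; i < 8; i++) {
        buffer[i].a *= 2;
        buffer[i].b *= 2;
        buffer[i].c *= 2;
    }
}

/* Each member is updated in its own loop, so every loop strides
   through memory and is harder to vectorize. */
void DoubleBuffer2(void)
{
    int i;

    for (i = 0; i < 8; i++)
        buffer[i].a *= 2;
    for (i = 0; i < 8; i++)
        buffer[i].b *= 2;
    for (i = 0; i < 8; i++)
        buffer[i].c *= 2;
}
```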
For each example, copy the code into file.c and compile at optimization level -O2 to enable
auto-vectorization:
armclang --target=arm-arm-none-eabi -march=armv8-a -O2 file.c -S -o file.s
The vectorized assembly code contains the Advanced SIMD instructions, for example vld1, vshl, and
vst1. These Advanced SIMD instructions are not generated when compiling the example with the non-
vectorizable loop.
Vectorizable loop:
DoubleBuffer1:
    .fnstart
@ BB#0:
    movw r0, :lower16:buffer
    movt r0, :upper16:buffer
    vld1.64 {d16, d17}, [r0:128]
    mov r1, r0
    vshl.i32 q8, q8, #1
    vst1.32 {d16, d17}, [r1:128]!
    vld1.64 {d16, d17}, [r1:128]
    vshl.i32 q8, q8, #1
    vst1.64 {d16, d17}, [r1:128]
    add r1, r0, #32
    vld1.64 {d16, d17}, [r1:128]
    vshl.i32 q8, q8, #1
    vst1.64 {d16, d17}, [r1:128]
    add r1, r0, #48
    vld1.64 {d16, d17}, [r1:128]
    vshl.i32 q8, q8, #1
    vst1.64 {d16, d17}, [r1:128]
    add r1, r0, #64
    add r0, r0, #80
    vld1.64 {d16, d17}, [r1:128]
    vshl.i32 q8, q8, #1
    vst1.64 {d16, d17}, [r1:128]
    vld1.64 {d16, d17}, [r0:128]
    vshl.i32 q8, q8, #1
    vst1.64 {d16, d17}, [r0:128]
    bx lr
Non-vectorizable loop:
DoubleBuffer2:
    .fnstart
@ BB#0:
    movw r0, :lower16:buffer
    movt r0, :upper16:buffer
    ldr r1, [r0]
    lsl r1, r1, #1
    str r1, [r0]
    ldr r1, [r0, #12]
    lsl r1, r1, #1
    str r1, [r0, #12]
    ldr r1, [r0, #24]
    lsl r1, r1, #1
    str r1, [r0, #24]
    ldr r1, [r0, #36]
    lsl r1, r1, #1
    str r1, [r0, #36]
    ldr r1, [r0, #48]
    lsl r1, r1, #1
    str r1, [r0, #48]
    ldr r1, [r0, #60]
    lsl r1, r1, #1
    str r1, [r0, #60]
    ldr r1, [r0, #72]
    lsl r1, r1, #1
    str r1, [r0, #72]
    ldr r1, [r0, #84]
    lsl r1, r1, #1
    str r1, [r0, #84]
    ldr r1, [r0, #4]
    lsl r1, r1, #1
    str r1, [r0, #4]
    ldr r1, [r0, #16]
    lsl r1, r1, #1
    ...
    bx lr
Note
Advanced SIMD (Single Instruction Multiple Data), also known as Arm Neon™ technology, is a powerful
vectorizing unit on Armv7‑A and later Application profile architectures. It enables you to write highly
optimized code. You can use intrinsics to directly use the Advanced SIMD capabilities from C or C++
code. The intrinsics and their data types are defined in arm_neon.h. For more information on Advanced
SIMD, see the Arm® C Language Extensions ACLE Q1 2019, Cortex®‑A Series Programmer's Guide, and
Arm® Neon™ Programmer's Guide.
Using -fno-vectorize does not necessarily prevent the compiler from emitting Advanced SIMD
instructions. The compiler or linker might still introduce Advanced SIMD instructions, such as when
linking libraries that contain these instructions.
To prevent the compiler from emitting Advanced SIMD instructions for AArch64 targets, specify
+nosimd using -march or -mcpu:
To prevent the compiler from emitting Advanced SIMD instructions for AArch32 targets, set the option
-mfpu to a value that does not include Advanced SIMD. For example, set -mfpu=fp-armv8.
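The disassembly comparison that follows refers to two sample implementations of a factorial routine, one with an incrementing loop counter and one with a counter that decrements to zero. A minimal sketch of such a pair, assuming the names fact1 and fact2 that appear in the disassembly (the exact source might differ):

```c
/* Incrementing loop: the counter i is compared against n each time. */
int fact1(int n)
{
    int i, fact = 1;

    for (i = 1; i <= n; i++)
        fact *= i;
    return fact;
}

/* Decrementing loop: the counter runs down to zero, so the
   termination test is a comparison with zero. */
int fact2(int n)
{
    unsigned int i, fact = 1;

    for (i = n; i != 0; i--)
        fact *= i;
    return fact;
}
```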
The following table shows the corresponding disassembly for each of the preceding sample
implementations. Generate the disassembly using:
armclang -Os -S --target=arm-arm-none-eabi -march=armv8-a
Incrementing loop:
fact1:
    mov r1, r0
    mov r0, #1
    cmp r1, #1
    bxlt lr
    mov r2, #0
.LBB0_1:
    add r2, r2, #1
    mul r0, r0, r2
    cmp r1, r2
    bne .LBB0_1
    bx lr
Decrementing loop:
fact2:
    mov r1, r0
    mov r0, #1
    cmp r1, #0
    bxeq lr
.LBB1_1:
    mul r0, r0, r1
    subs r1, r1, #1
    bne .LBB1_1
    bx lr
Comparing the disassemblies shows that the ADD and CMP instruction pair in the incrementing loop
disassembly has been replaced with a single SUBS instruction in the decrementing loop disassembly.
Because the SUBS instruction updates the status flags, including the Z flag, there is no requirement for an
explicit CMP r1,r2 instruction.
Also, the variable n does not have to be available for the lifetime of the loop, reducing the number of
registers that have to be maintained. Having fewer registers to maintain eases register allocation. If the
original termination condition involves a function call, each iteration of the loop might call the function,
even if the value it returns remains constant. In this case, counting down to zero is even more important.
For example:
for (...; i < get_limit(); ...);
The technique of initializing the loop counter to the number of iterations that are required, and then
decrementing down to zero, also applies to while and do statements.
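As an illustration of the same idea with a while statement, the following sketch initializes the counter once from a limit function and then counts down to zero. The functions get_limit, sum_up, and sum_down are hypothetical names chosen for this example.

```c
/* Hypothetical limit function. It stands in for any call whose
   result stays constant for the duration of the loop. */
static unsigned int get_limit(void)
{
    return 8u;
}

/* Might call get_limit() on every iteration, unless the compiler
   can prove that the result is loop-invariant. */
unsigned int sum_up(const unsigned int *data)
{
    unsigned int sum = 0;

    for (unsigned int i = 0; i < get_limit(); i++) {
        sum += data[i];
    }
    return sum;
}

/* Calls get_limit() once, then counts down to zero. */
unsigned int sum_down(const unsigned int *data)
{
    unsigned int sum = 0;
    unsigned int n = get_limit();

    while (n != 0) {
        n--;
        sum += data[n];
    }
    return sum;
}
```

Both functions visit the same elements. The second visits them in reverse order, which is only acceptable when the order of processing does not matter.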
Infinite loops
armclang considers infinite loops with no side-effects to be undefined behavior, as stated in the C11 and
C++11 standards. In certain situations armclang deletes or moves infinite loops, resulting in a program
that eventually terminates, or does not behave as expected.
To ensure that a loop executes for an infinite length of time, Arm recommends writing infinite loops in
the following way:
void infinite_loop(void) {
    while (1)
        asm volatile(""); // this line is considered to have side-effects
}
armclang does not delete or move the loop, because it has side-effects.
Related information
-O (armclang)
#pragma unroll
-fvectorize (armclang)
4 Writing Optimized Code
4.3 Inlining functions
__attribute__((always_inline))
    Specify this function attribute on a function definition or declaration to tell the compiler
    to always inline this function, with certain exceptions such as for recursive functions.
    This overrides the -fno-inline-functions option.
__attribute__((noinline))
    Specify this function attribute on a function definition or declaration to tell the compiler
    to not inline the function. This is equivalent to __declspec(noinline).
-fno-inline-functions
    This is a compiler command-line option. Specify this option to the compiler to disable
    inlining. This option overrides the __inline__ hint.
Note
• Arm Compiler only inlines functions within the same compilation unit, unless you use Link Time
Optimization. For more information, see Optimizing across modules with link time optimization
on page 4-87 in the Software Development Guide.
• C++ and C99 provide the inline language keyword. The effect of this inline language keyword is
identical to the effect of using the __inline__ compiler keyword. However, the effect in C99 mode
is different from the effect in C++ or other C that does not adhere to the C99 standard. For more
information, see Inline functions in the Arm Compiler Reference Guide.
• Function inlining normally happens at higher optimization levels, such as -O2, except when you
specify __attribute__((always_inline)).
In the example code, functions bar and row are identical but function row is always inlined. Use the
following compiler commands to compile for -O2 with -fno-inline-functions and without -fno-
inline-functions:
When compiling with -fno-inline-functions, the compiler does not inline the function bar. When
compiling without -fno-inline-functions, the compiler inlines the function bar. However, the
compiler always inlines the function row even though it is identical to function bar.
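The example code is not reproduced here. A sketch of code with the shape described, where bar and row have identical bodies and only row carries the always_inline attribute, could look like the following. The function names bar and row come from the text. The caller foo and the arithmetic are assumptions made for illustration.

```c
/* Ordinary function: inlining is left to the compiler and is
   affected by -fno-inline-functions. */
int bar(int a)
{
    a = a * (a + 1);
    return a;
}

/* Identical body, but always inlined regardless of
   -fno-inline-functions. */
static inline __attribute__((always_inline)) int row(int a)
{
    a = a * (a + 1);
    return a;
}

/* Hypothetical caller that uses both functions. */
int foo(int i)
{
    i = bar(i);
    i = i - 2;
    i = bar(i);
    i++;
    i = row(i);
    i++;
    return i;
}
```

Compiling such a file with -S and searching the output for calls to bar is one way to see whether inlining took place.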
Related information
-fno-inline-functions (armclang)
__inline keyword
__attribute__((always_inline)) function attribute
__attribute__((noinline)) function attribute
4 Writing Optimized Code
4.4 Stack use in C and C++
• Several optimizations can introduce new temporary variables to hold intermediate results. These
optimizations include common subexpression elimination (CSE), live range splitting, and structure
splitting. The compiler tries to allocate these temporary variables to registers. If it cannot, it spills
them to the stack. For more information about what these optimizations do, see Overview of optimizations.
• Generally, code that is compiled for processors that only support 16-bit encoded T32 instructions
makes more use of the stack than A64 code, A32 code, and code that is compiled for processors that
support 32-bit encoded T32 instructions. This is because 16-bit encoded T32 instructions have only
eight registers available for allocation, compared to fourteen for A32 code and 32-bit encoded T32
instructions.
• The AAPCS64 requires that some function arguments are passed through the stack instead of the
registers, depending on their type, size, and order.
Processors for embedded applications have limited memory and therefore the amount of space available
on the stack is also limited. You can use Arm Compiler to determine how much stack space is used by
the functions in your application code. The amount of stack that a function uses depends on factors such
as the number and type of arguments to the function, local variables in the function, and the
optimizations that the compiler performs.
3. Run your application, or a fixed portion of it. Aim to use as much of the stack space as possible in
the test run. For example, try to execute the most deeply nested function calls and the worst case
path that the static analysis finds. Try to generate interrupts where appropriate, so that they are
included in the stack trace.
4. After your application has finished executing, examine the stack area of memory to see how
many of the known values have been overwritten. The used part of the area contains garbage and
the remainder still contains the known values.
5. Count the number of garbage values and multiply by sizeof(value), to give their size, in bytes.
The result of the calculation shows how the size of the stack has grown, in bytes.
• Use a Fixed Virtual Platform (FVP) that corresponds to the target processor or architecture. With a
map file, define a region of memory directly below your stack where access is forbidden. If the stack
overflows into the forbidden region, a data abort occurs, which a debugger can trap.
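The known-value technique in the numbered steps above can be sketched in C as follows. This is an illustration only: the fill value, the function names stack_paint and stack_bytes_used, and the assumption of a full-descending stack whose lowest address is base are all choices made for this example. On real hardware, the region would be the stack memory itself rather than an ordinary array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical known value used to fill the stack region. */
#define STACK_FILL 0xDEADBEEFu

/* Fill the whole region with the known value before the test run. */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_FILL;
    }
}

/* After the run, find the deepest overwritten word. The stack is
   assumed to grow downward from base + words, so untouched words
   remain at the low end of the region. */
size_t stack_bytes_used(const uint32_t *base, size_t words)
{
    size_t untouched = 0;

    while (untouched < words && base[untouched] == STACK_FILL) {
        untouched++;
    }
    return (words - untouched) * sizeof(uint32_t);
}
```

A stack word that happens to hold the fill value is counted as unused, so the result is an estimate rather than a guarantee.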
Copy the code example to file.c and compile it using the following command:
armclang --target=arm-arm-none-eabi -march=armv8-a -c -g file.c -o file.o
Compiling with the -g option generates the DWARF frame information that armlink requires for
estimating the stack use. Run armlink on the object file using --info=stack:
armlink file.o --info=stack
For the example code, armlink shows the amount of stack that the various functions use. Function
foo_mor has more arguments than function foo, and therefore uses more stack.
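The code example is not reproduced here. A sketch of source with the same call structure as the call graph output, using the function names fact, foo, and foo_mor from that output, could look like the following. The function bodies and parameter lists are assumptions.

```c
/* Kept out of line so that it appears as its own node in the call graph. */
__attribute__((noinline)) int fact(int n)
{
    int f = 1;

    while (n > 0) {
        f *= n--;
    }
    return f;
}

/* One argument. */
int foo(int n)
{
    return fact(n);
}

/* Four arguments (assumed count), only the first of which is used. */
int foo_mor(int a, int b, int c, int d)
{
    (void)b;
    (void)c;
    (void)d;
    return fact(a);
}

/* A main() that calls both foo() and foo_mor() completes the call graph. */
```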
You can also examine stack usage using the linker option --callgraph:
armlink file.o --callgraph -o FileImage.axf
This outputs a file called FileImage.htm which contains the stack usage information for the various
functions in the application.
fact (ARM, 84 bytes, Stack size 12 bytes, file.o(.text))
[Stack]
Max Depth = 12
Call Chain = fact
[Called By]
>> foo_mor
>> foo
foo (ARM, 36 bytes, Stack size 8 bytes, file.o(.text))
[Stack]
Max Depth = 20
Call Chain = foo >> fact
[Calls]
>> fact
[Called By]
>> main
foo_mor (ARM, 76 bytes, Stack size 16 bytes, file.o(.text))
[Stack]
Max Depth = 28
Call Chain = foo_mor >> fact
[Calls]
>> fact
[Called By]
>> main
main (ARM, 76 bytes, Stack size 8 bytes, file.o(.text))
[Stack]
Max Depth = 36
Call Chain = main >> foo_mor >> fact
[Calls]
>> foo_mor
>> foo
[Called By]
>> __rt_entry_main (via BLX)
4 Writing Optimized Code
4.5 Packing data structures
For each example use linker option --info=sizes to examine the memory used in file.o.
armlink file.o --info=sizes
The linker output shows the total memory used by the two objects c and d. For example:
      Code (inc. data)   RO Data    RW Data    ZI Data      Debug   Object Name
        36          0          0          0         24          0   str.o
    ---------------------------------------------------------------------------
        36          0         16          0         24          0   Object Totals
Dereferencing such a pointer can be unsafe even when unaligned accesses are supported by the target,
because certain instructions always require word-aligned addresses.
Note
If you take the address of a packed member, in most cases, the compiler generates a warning.
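The caution about pointers to packed members can be illustrated with a small sketch. The structure and function names here are hypothetical. The sketch shows two ways of reading an unaligned member without forming a uint32_t pointer to it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packed structure: x sits at offset 1, so it is unaligned. */
struct __attribute__((packed)) sample {
    char c;
    uint32_t x;
};

/* Access through the structure: the compiler knows that the member is
   packed and can generate code that is safe for an unaligned address. */
uint32_t read_x(const struct sample *s)
{
    return s->x;
}

/* Copy the bytes out instead of creating a uint32_t pointer to the member. */
uint32_t read_x_copy(const struct sample *s)
{
    uint32_t value;

    memcpy(&value, (const unsigned char *)s + offsetof(struct sample, x),
           sizeof value);
    return value;
}
```

By contrast, storing &s->x in a plain uint32_t pointer discards the information that the address might be unaligned.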
Related information
#pragma pack
__attribute__((packed)) type attribute
__attribute__((packed)) variable attribute
4 Writing Optimized Code
4.6 Optimizing for code size or performance
Note
This topic includes descriptions of [ALPHA] features. See Support level definitions
on page Appx-A-266.
Different optimizations often work against each other. That is, techniques for improving code
performance might result in increased code size, and techniques for reducing code size might reduce
performance. For example, the compiler can unroll small loops for higher performance, with the
disadvantage of increased code size.
The default optimization level is -O0. At -O0, armclang does not perform optimization.
The following armclang options help you optimize for code performance:
-O1 | -O2 | -O3
Specify the level of optimization to be used when compiling source files. A higher number
implies a higher level of optimization for performance.
-Ofast
Enables all the optimizations from -O3 along with other aggressive optimizations that might
violate strict compliance with language standards.
-Omax
Enables all the optimizations from -Ofast along with Link-Time Optimization (LTO).
The following armclang options help you optimize for code size:
-Os
Performs optimizations to reduce the code size at the expense of a possible increase in execution
time. This option aims for a balanced code size reduction and fast performance.
-Oz
Optimizes for smaller code size.
-Omin
Minimum image size. Specifically targets minimizing code size. Enables all the optimizations
from level -Oz, together with:
• Link-Time Optimization aimed at removing unused code and data, while also trying to
optimize global memory accesses.
• Virtual function elimination, which is a particular benefit to C++ users.
For more information on optimization levels, see Selecting optimization levels.
Note
You can also set the optimization level for the linker with the armlink option --lto_level. The
optimization levels available for armlink are the same as the armclang optimization levels.
-fshort-enums
Allows the compiler to set the size of an enumeration type to the smallest data type that can hold
all enumerator values.
-fshort-wchar
Sets the size of wchar_t to 2 bytes.
-fno-exceptions
C++ only. Disables the generation of code that is required to support C++ exceptions.
-fno-rtti [ALPHA]
C++ only. Disables the generation of code that is required to support Run Time Type
Information (RTTI) features.
The following armclang option helps you optimize for both code size and code performance:
-flto
Enables Link-Time Optimization (LTO), which enables the linker to make additional
optimizations across multiple source files. See 4.8 Optimizing across modules with Link-Time
Optimization on page 4-87 for more information.
Note
If you want to use LTO when invoking armlink separately, you can use the armlink option
--lto_level to select the LTO optimization level that matches your optimization goal.
In addition, choices you make during coding can affect optimization. For example:
• Optimizing loop termination conditions can improve both code size and performance. In particular,
loops with counters that decrement to zero usually produce smaller, faster code than loops with
incrementing counters.
• Manually unrolling loops by reducing the number of loop iterations, but increasing the amount of
work that is done in each iteration, can improve performance at the expense of code size.
• Reducing debug information in objects and libraries reduces the size of your image.
• Using inline functions offers a trade-off between code size and performance.
• Using intrinsics can improve performance.
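As an illustration of the manual unrolling trade-off in the list above, the following sketch sums an array one element at a time and four elements at a time. The function names are hypothetical. Note that elsewhere this guide recommends #pragma unroll over manual unrolling, so treat this as a demonstration of the trade-off rather than a recommended pattern.

```c
#include <stddef.h>

/* One element per iteration: smallest code, most loop overhead. */
int sum_simple(const int *data, size_t count)
{
    int sum = 0;

    for (size_t i = 0; i < count; i++) {
        sum += data[i];
    }
    return sum;
}

/* Four elements per iteration, followed by a loop for the remainder:
   fewer condition checks, but more code. */
int sum_unrolled(const int *data, size_t count)
{
    int sum = 0;
    size_t i = 0;

    for (; i + 4 <= count; i += 4) {
        sum += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
    }
    for (; i < count; i++) {
        sum += data[i];
    }
    return sum;
}
```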
4 Writing Optimized Code
4.7 Methods of minimizing function parameter passing overhead
4 Writing Optimized Code
4.8 Optimizing across modules with Link-Time Optimization
[Figure: Link-Time Optimization flow, with the labels ELF Object containing Bitcode (.o), ELF Object (.o), Libraries, Link-time optimizer, and libLTO.]
Note
In this figure, ELF Object containing Bitcode is an ELF file that does not contain normal code and data.
Instead, it contains a section that is called .llvmbc that holds LLVM bitcode.
Section .llvmbc is reserved. You must not create an .llvmbc section with, for example
__attribute__((section(".llvmbc"))).
Caution
LTO performs aggressive optimizations by analyzing the dependencies between bitcode format objects.
Such aggressive optimizations can result in the removal of unused variables and functions in the source
code.
To enable LTO:
1. At compilation time, use the armclang option -flto to produce ELF files suitable for LTO. These
ELF files contain bitcode in a .llvmbc section.
Note
The armclang option -Omax automatically enables the -flto option.
2. At link time, use the armlink option --lto to enable LTO for the specified bitcode files.
Note
If you use the -flto option without the -c option, armclang automatically passes the --lto option to
armlink.
In this example:
• The function main() calls an externally defined function foo(), and returns the value that foo()
returns. Because this function is externally defined, the compiler cannot inline or otherwise optimize
it when compiling main.c, without using LTO.
• The file foo.c contains the following functions:
foo()
If the parameter a is nonzero, foo() conditionally calls a function bar().
bar()
This function prints a message.
In this case, foo() is called with the parameter a == 0, so bar() is not called at run time.
Example code that is used in the following procedure:
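// main.c
extern int foo(int a);

int main(void)
{
    return foo(0);
}

// foo.c
#include <stdio.h>

int foo(int a);
void bar(void);

/* foo() calls bar() only when a is nonzero. */
int foo(int a)
{
    if (a == 0)
    {
        return 0;
    }
    else
    {
        bar();
        return 0;
    }
}

void bar(void)
{
    printf("a is non-zero.\n");
}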
Procedure
1. Build the example code with LTO disabled:
armclang --target=arm-arm-none-eabi -march=armv7-a -O2 -c main.c -o main.o
armclang --target=arm-arm-none-eabi -march=armv7-a -O2 -c foo.c -o foo.o
armlink main.o foo.o -o image_without_lto.axf
fromelf --text -c -z image_without_lto.axf
Results:
The compiler cannot inline the call to foo() because it is in a different object from main().
Therefore, the compiler must keep the conditional call to bar() within foo(), because the compiler
does not have any information about the value of the parameter a while foo.c is being compiled:
$a.0
foo
0x00008bd8: e3500000 ..P. CMP r0,#0
0x00008bdc: 0a000004 .... BEQ 0x8bf4 ; foo + 28
0x00008be0: e92d4800 .H-. PUSH {r11,lr}
0x00008be4: e3080c44 D... MOV r0,#0x8c44
0x00008be8: e3400000 ..@. MOVT r0,#0
0x00008bec: fafffd28 (... BLX puts ; 0x8094
0x00008bf0: e8bd4800 .H.. POP {r11,lr}
Also, bar() uses the Arm C library function printf(). In this example, printf() is optimized to
puts() and inlined into foo(). Therefore, the linker must include the relevant C library code to allow
the puts() function to be used. Including the C library code results in a large amount of uncalled
code being included in the image. The output from the fromelf utility shows the resulting overall
image size:
** Object/Image Component Sizes
Results:
Although the compiler does not have any information about the call to foo() from main() when
compiling foo.c, at link time, it is known that:
• foo() is only ever called once, with the parameter a == 0.
• bar() is never called.
• The Arm C library function puts() is never called.
Because LTO is enabled, this extra information is used to make the following optimizations:
• Inlining the call to foo() into main().
• Removing the code to conditionally call bar() from foo() entirely.
• Removing the C library code that allows use of the puts() function.
$a.0
main
0x00008128: e3a00000 .... MOV r0,#0
0x0000812c: e12fff1e ../. BX lr
Also, this optimization means that the overall image size is much lower. The output from the fromelf
utility shows the reduced image size:
** Object/Image Component Sizes
Related references
4.6 Optimizing for code size or performance on page 4-84
4.8 Optimizing across modules with Link-Time Optimization on page 4-87
4.10 How optimization affects the debug experience on page 4-97
Related information
-O (armclang)
4 Writing Optimized Code
4.9 Scatter file section or object placement with Link-Time Optimization
To use scatter file section or object placement with LTO, the following changes must be made to a
project:
• Compile all source files that are built with LTO enabled with -fno-inline-functions.
• Modify each source file that is built with LTO enabled to use #pragma clang section to place all
functions in that source file into sections with a name unique to that source file.
• Modify the scatter file to use section names instead of object file names.
Example code
The following example code is used in the example sections, unless specified otherwise. In this code, all
functions in foo.c must be placed in an execution region EXEC_FOO, and all functions in bar.c must be
placed in an execution region EXEC_BAR:
foo.c:
#include <stdio.h>
void foo_A(void)
{
printf("%s", foo_string1);
}
void foo_B(void)
{
printf("%s", foo_string2);
}
bar.c:
#include <stdio.h>
void bar_A(void)
{
printf("%s", bar_string1);
}
void bar_B(void)
{
printf("%s", bar_string2);
}
main.c:
int main(void)
{
foo_A();
foo_B();
bar_A();
bar_B();
return 0;
}
scatter.sct:
LOAD 0x0
{
EXEC_ANY +0x0
{
.ANY(+RO, +RW, +ZI)
}
The memory map from the listing file image.lst shows that EXEC_FOO and EXEC_BAR contain code from
foo.c and bar.c respectively, as intended:
Execution Region EXEC_FOO (Base: 0x00001000, Size: 0x00000038, Max: 0xffffffff, ABSOLUTE)
Execution Region EXEC_BAR (Base: 0x00001400, Size: 0x00000038, Max: 0xffffffff, ABSOLUTE)
Also, the memory map from the listing file image.lst shows that EXEC_FOO and EXEC_BAR are empty:
Execution Region EXEC_FOO (Base: 0x00001000, Size: 0x00000000, Max: 0xffffffff, ABSOLUTE)
Execution Region EXEC_BAR (Base: 0x00001000, Size: 0x00000000, Max: 0xffffffff, ABSOLUTE)
These execution regions are empty because LTO has inlined all functions within foo.c and bar.c.
Therefore, the functions are no longer available for placement with a scatter-file.
The reason is that, even though function inlining is disabled, all code from main.c, foo.c, and bar.c is
part of the same object file. Therefore, at the final link stage within the LTO process, foo.o and bar.o
do not exist as separate object files.
The memory map in the listing file image.lst shows that the code from foo.c and bar.c is now placed
in the EXEC_ANY execution region instead:
Execution Region EXEC_ANY (Base: 0x00000000, Size: 0x00000da8, Max: 0xffffffff, ABSOLUTE)
lto_llvm_3d77ff.o is the LTO intermediate filename that the linker generates. You can change this
name using the armlink --lto_intermediate_filename command-line option, though that does not
help in this use case. Instead, section names must be used.
Example: Using section names for all functions within a C language source file
The easiest way to specify section names for all functions within a C language source file is to use
#pragma clang section. For this example, rewrite the example code foo.c and bar.c as follows:
foo.c:
#include <stdio.h>
#pragma clang section text="foo_rotext" rodata="foo_rodata"
void foo_A(void)
{
    printf("%s", foo_string1);
}
void foo_B(void)
{
    printf("%s", foo_string2);
}
bar.c:
#include <stdio.h>
void bar_A(void)
{
printf("%s", bar_string1);
}
void bar_B(void)
{
printf("%s", bar_string2);
}
#pragma clang section text="foo_rotext" rodata="foo_rodata" specifies that code and read-
only data (such as the string constants used within the calls to printf() in foo.c) are placed in named
sections:
• foo_rotext for the code that is generated.
• foo_rodata for the read-only data that is generated.
Similar names are specified in bar.c for the code and data generated by that file. You can rewrite
scatter.sct to use these section names as follows:
scatter.sct:
LOAD 0x0
{
EXEC_ANY +0x0
{
.ANY(+RO, +RW, +ZI)
}
Example: Building with LTO enabled, function inlining disabled, and using section names instead of object file names
Build the modified example with:
armclang --target=arm-arm-none-eabi -march=armv7-a -O2 -flto -fno-inline-functions -c foo.c -o foo.o
armclang --target=arm-arm-none-eabi -march=armv7-a -O2 -flto -fno-inline-functions -c bar.c -o bar.o
armclang --target=arm-arm-none-eabi -march=armv7-a -O2 -flto -fno-inline-functions -c main.c -o main.o
armlink --scatter=scatter.sct foo.o bar.o main.o -o image.axf --lto --map --list=image.lst
The linker does not report any warnings. Also, the memory map from the listing file image.lst shows
that EXEC_FOO and EXEC_BAR contain the code from the expected sections:
Execution Region EXEC_FOO (Base: 0x00001000, Size: 0x00000038, Max: 0xffffffff, ABSOLUTE)
Execution Region EXEC_BAR (Base: 0x00001400, Size: 0x00000038, Max: 0xffffffff, ABSOLUTE)
The key difference between this LTO approach and the non-LTO approach with object file names is that
in this approach, the function names are not visible in the listing file. To verify that the sections foo_RO
and bar_RO contain the functions from foo.c and bar.c respectively, examine the symbol table from the
fromelf --text -s output:
The addresses for these functions in the output from the fromelf utility correspond to the execution
region addresses in the memory map from the listing file image.lst. The symbol table also confirms the
location of the char[] constants.
Other considerations
Consider the following approaches:
• If you plan to build a project with LTO eventually, it might be better to use section names instead of
object file names within scatter-files, using the method shown in this example. This approach is
compatible both with and without LTO.
• If you disable LTO, it is better to also remove -fno-inline-functions, because doing so allows the
compiler to perform inlining optimizations.
• If you do not need to disable function inlining entirely, apply __attribute__((noinline)) to
individual functions instead of using -fno-inline-functions. This approach can help achieve a better
balance between explicit code placement and cross-file function inlining optimizations.
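As a minimal sketch of the per-function approach, assuming three hypothetical functions that are not part of the original example: only the function whose placement matters is marked noinline, so the compiler remains free to inline the others.

```c
/* Hypothetical helpers; the names are illustrative only. */

/* Kept as a separate function so that a scatter file can place its section. */
__attribute__((noinline)) int placed_scale(int x)
{
    return x * 3;
}

/* No attribute: the compiler may inline this function into its callers. */
int free_offset(int x)
{
    return x + 1;
}

int compute(int x)
{
    return placed_scale(x) + free_offset(x);
}
```

With this arrangement, -fno-inline-functions is no longer needed on the command line, because the attribute protects only the functions that must remain separate.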
Related references
4.8 Optimizing across modules with Link-Time Optimization
Related information
-fno-inline-functions (armclang)
-flto (armclang)
-O (armclang)
__attribute__((noinline)) function attribute
#pragma clang section
--lto (armlink)
--lto_intermediate_filename (armlink)
Scatter-loading Features
Scatter File Syntax
4.10 How optimization affects the debug experience
Chapter 5
Assembling Assembly Code
Describes how to assemble assembly source code with armclang and armasm.
It contains the following sections:
• 5.1 Assembling armasm and GNU syntax assembly code.
• 5.2 Preprocessing assembly code.
5.1 Assembling armasm and GNU syntax assembly code
The following examples show equivalent armasm and GNU syntax assembly code for incrementing a
register in a loop.
armasm assembler syntax:
main    PROC
        MOV     w5,#0x64        ; W5 = 100
        MOV     w4,#0           ; W4 = 0
        B       test_loop       ; branch to test_loop
loop
        ADD     w5,w5,#1        ; Add 1 to W5
        ADD     w4,w4,#1        ; Add 1 to W4
test_loop
        CMP     w4,#0xa         ; if W4 < 10, branch back to loop
        BLT     loop
        ENDP
        END
You might have legacy assembly source files that use the armasm syntax. Use armasm to assemble legacy
armasm syntax assembly code. Typically, you invoke the armasm assembler as follows:
GNU assembler syntax:
        .section .text,"ax"
        .balign 4
main:
        MOV     w5,#0x64        // W5 = 100
        MOV     w4,#0           // W4 = 0
        B       test_loop       // branch to test_loop
loop:
        ADD     w5,w5,#1        // Add 1 to W5
        ADD     w4,w4,#1        // Add 1 to W4
test_loop:
        CMP     w4,#0xa         // if W4 < 10, branch back to loop
        BLT     loop
        .end
Use GNU syntax for newly created assembly files. Use the armclang integrated assembler to assemble
GNU assembly language source code. Typically, you invoke the armclang assembler as follows:
armclang --target=aarch64-arm-none-eabi -c -o file.o file.S
Related information
GNU Binutils - Using as
Migrating armasm syntax assembly code to GNU syntax
5.2 Preprocessing assembly code
By default, armclang uses the assembly code source file suffix to determine whether to run the C
preprocessor:
• The .s (lowercase) suffix indicates assembly code that does not require preprocessing.
• The .S (uppercase) suffix indicates assembly code that requires preprocessing.
The -x option lets you override the default by specifying the language of the subsequent source files,
rather than inferring the language from the file suffix. Specifically, -x assembler-with-cpp indicates
that the assembly code contains C preprocessor directives and armclang must run the C preprocessor.
The -x option only applies to input files that follow it on the command line.
Note
Do not confuse the .ifdef assembler directive with the preprocessor #ifdef directive:
• The preprocessor #ifdef directive checks for the presence of preprocessor macros. These macros are
defined using the #define preprocessor directive or the armclang -D command-line option.
• The armclang integrated assembler .ifdef directive checks for code symbols. These symbols are
defined using labels or the .set directive.
The preprocessor runs first and performs textual substitutions on the source code. The #ifdef directive
is processed at this stage. The source code is then passed on to the assembler, which processes the
.ifdef directive.
Note
If you want to preprocess assembly files that contain legacy armasm-syntax assembly code, then you
must either:
• Use the .S filename suffix.
• Use separate steps for preprocessing and assembling.
For more information, see Command-line options for preprocessing assembly source code in the
Migration and Compatibility Guide.
Related information
Command-line options for preprocessing assembly source code
-E (armclang)
-x (armclang)
Chapter 6
Using Assembly and Intrinsics in C or C++ Code
All code for a single application can be written in the same source language. This source language is
usually a high-level language such as C or C++ that is compiled to instructions for Arm architectures.
However, in some situations you might need lower-level control than that provided by C or C++.
For example:
• To access features that are not available from C or C++, such as interfacing directly with device
hardware.
• To generate highly optimized code by using intrinsics or inline assembly to write sections of your
code.
There are several ways to have low-level control over the generated code:
• Intrinsics are functions that the compiler provides. An intrinsic function has the appearance of a
function call in C or C++, but compilation replaces the intrinsic by a specific sequence of low-level
instructions.
Note
Arm intrinsics are recognized by Arm compilers, but are not guaranteed to work with any third-party
compiler toolchains.
• Inline assembly lets you write assembly instructions directly in your C/C++ code, without the
overhead of a function call.
• Calling assembly functions from C/C++ lets you write standalone assembly code in a separate source
file. This code is assembled separately to the C/C++ code, and then integrated at link time.
It contains the following sections:
• 6.1 Using intrinsics.
• 6.2 Custom Datapath Extension support.
6.1 Using intrinsics
Using compiler intrinsics, you can achieve more complete coverage of target architecture instructions
than you would from the instruction selection of the compiler.
An intrinsic function has the appearance of a function call in C or C++, but is replaced during
compilation by a specific sequence of low-level instructions. The following example shows how to
access the __qadd saturated add intrinsic:
#include <arm_acle.h> /* Include ACLE intrinsics */
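A minimal sketch of how __qadd might be called, assuming a hypothetical wrapper named sat_add. The portable fallback branch is not part of the original example; it only models the saturating behavior so that the sketch also builds for targets where the ACLE saturating intrinsics are unavailable.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_DSP)
#include <arm_acle.h>           /* Include ACLE intrinsics */
#endif

/* Hypothetical wrapper: returns a + b, clamped to the int32_t range. */
int32_t sat_add(int32_t a, int32_t b)
{
#if defined(__ARM_FEATURE_DSP)
    /* The compiler replaces the intrinsic with a saturating add instruction. */
    return __qadd(a, b);
#else
    /* Assumed reference model of the same result, written in plain C. */
    int64_t wide = (int64_t)a + (int64_t)b;
    if (wide > INT32_MAX) {
        return INT32_MAX;
    }
    if (wide < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)wide;
#endif
}
```

The call site looks like an ordinary function call in both branches, which is the property that intrinsics are designed to provide.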
6.2 Custom Datapath Extension support
These intrinsics are documented in the Custom Datapath Extension section of the Arm C Language
Extensions document.
Example
The following example shows how to use the ACLE intrinsics for CDE:
1. Create the foo.c file containing the following code:
#include <arm_cde.h>
In this file, the function foo() uses the __arm_cx2() ACLE intrinsic for CDE. This intrinsic
generates a CX2 instruction.
A CX2 instruction is a Custom class 2 instruction that computes a value based on a source register, an
immediate, and optionally the original value of the destination register, and writes the result to the
destination register.
For example, the instruction CX2 p0, r0, r1, #2 sends the immediate 2 and the register R1 to the
CDE coprocessor p0, and writes the result returned by p0 to the register R0.
The intrinsic is defined as follows:
uint32_t __arm_cx2(int coproc, uint32_t n, uint32_t imm);
Where:
• coproc is the CDE coprocessor number to use.
• n is the variable to send to the CDE coprocessor via the general-purpose source register operand.
• imm is the compile-time constant immediate value to use.
This intrinsic generates a variant of the CX2 instruction that does not use the destination register value
to compute the result.
2. Compile foo.c with the command:
armclang --target=arm-arm-none-eabi -march=armv8.1-m.main+cdecp0 -O1 -c foo.c -o foo.o
The compiler generates a CX2 instruction with the expected operands, and returns the result of the
instruction in register R0.
3. Run the following fromelf command to examine the output:
fromelf --cpu=8.1-M.Main --coproc0=cde --text -c foo.o
...
** Section #3 '.text.foo' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]
Size : 6 bytes (alignment 4)
Address: 0x00000000
$t.0
[Anonymous symbol #3]
foo
0x00000000: ee400004 @... CX2 p0,r0,r0,#4
0x00000004: 4770 pG BX lr
...
Related information
-march
-mcpu
--coprocN=value (fromelf)
6.3 Writing inline assembly code
int main(void)
{
int a = 1;
int b = 2;
int c = 0;
c = add(a,b);
Note
The inline assembler does not support legacy assembly code written in armasm assembler syntax. See the
Migration and Compatibility Guide for more information about migrating armasm syntax assembly code
to GNU syntax.
Use the volatile qualifier for assembler instructions that have processor side-effects, which the
compiler might be unaware of. The volatile qualifier disables certain compiler optimizations, which
may otherwise lead to the compiler removing the code block. The volatile qualifier is optional, but you
should consider using it around your assembly code blocks to ensure the compiler does not remove them
when compiling with -O1 or above.
code is the assembly instruction, for example "ADD R0, R1, R2". code_template is a template for an
assembly instruction, for example "ADD %[result], %[input_i], %[input_j]".
If you specify a code_template rather than code then you must specify the output_operand_list
before specifying the optional input_operand_list and clobbered_register_list.
output_operand_list is a list of output operands, separated by commas. Each operand consists of a
symbolic name in square brackets, a constraint string, and a C expression in parentheses. In this example,
there is a single output operand: [result] "=r" (res). The list can be empty. For example:
__asm ("ADD R0, %[input_i], %[input_j]"
: /* This is an empty output operand list */
: [input_i] "r" (i), [input_j] "r" (j)
);
input_operand_list is an optional list of input operands, separated by commas. Input operands use the
same syntax as output operands. In this example, there are two input operands: [input_i] "r" (i),
[input_j] "r" (j). The list can be empty.
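The operand lists described above can be sketched as one complete function. This is an illustrative sketch rather than the guide's own listing: the AArch64 branch and the plain C fallback are assumptions added so that the same file builds for other targets.

```c
/* Adds two integers using a code_template with named operands. */
int add(int i, int j)
{
    int res;
#if defined(__arm__)
    /* AArch32: one output operand, then two input operands. */
    __asm ("ADD %[result], %[input_i], %[input_j]"
           : [result] "=r" (res)
           : [input_i] "r" (i), [input_j] "r" (j)
    );
#elif defined(__aarch64__)
    /* AArch64 (assumed variant): %w selects the 32-bit register name. */
    __asm ("ADD %w[result], %w[input_i], %w[input_j]"
           : [result] "=r" (res)
           : [input_i] "r" (i), [input_j] "r" (j)
    );
#else
    /* Portable fallback for non-Arm targets. */
    res = i + j;
#endif
    return res;
}
```

Because the operands are named, the compiler chooses the registers, and the template does not need to hard-code R0, R1, or R2.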
Multiple instructions
You can write multiple instructions within the same __asm statement. This example shows an interrupt
handler written in one __asm statement for an Armv8‑M mainline architecture.
void HardFault_Handler(void)
{
    asm (
        "TST LR, #0x40\n\t"
        "BEQ from_nonsecure\n\t"
        "from_secure:\n\t"
        "TST LR, #0x04\n\t"
        "ITE EQ\n\t"
        "MRSEQ R0, MSP\n\t"
        "MRSNE R0, PSP\n\t"
        "B hard_fault_handler_c\n\t"
        "from_nonsecure:\n\t"
        "MRS R0, CONTROL_NS\n\t"
        "TST R0, #2\n\t"
        "ITE EQ\n\t"
        "MRSEQ R0, MSP_NS\n\t"
        "MRSNE R0, PSP_NS\n\t"
        "B hard_fault_handler_c\n\t"
    );
}
Copy the above handler code to file.c and then you can compile it using:
armclang --target=arm-arm-none-eabi -march=armv8-m.main -S file.c -o file.s
Embedded assembly
You can write embedded assembly using __attribute__((naked)). For more information, see
__attribute__((naked)) in the Arm Compiler Reference Guide.
Related information
armclang Inline Assembler
Migrating armasm syntax assembly code to GNU syntax
Semihosting for AArch32 and AArch64
6.4 Calling assembly functions from C and C++
Note
For code portability, it is better to use intrinsics or inline assembly rather than writing and calling
assembly functions.
Note
armclang requires that you explicitly specify the types of exported symbols using the .type
directive. If the .type directive is not specified in the above example, the linker outputs warnings of
the form:
Warning: L6437W: Relocation #RELA:1 in test.o(.text) with respect to myadd...
int main()
{
    int a = 4;
    int b = 5;
    printf("Adding %d and %d results in %d\n", a, b, myadd(a, b));
    return (0);
}
The AAPCS describes a contract between caller functions and callee functions. For example, for
integer or pointer types, it specifies that:
• Registers R0-R3 pass argument values to the callee function, with subsequent arguments passed
on the stack.
• Register R0 passes the result value back to the caller function.
• Caller functions must preserve R0-R3 and R12, because these registers are allowed to be
corrupted by the callee function.
• Callee functions must preserve R4-R11 and LR, because these registers are not allowed to be
corrupted by the callee function.
For more information, see the Procedure Call Standard for the Arm® Architecture (AAPCS).
4. Compile both source files:
armclang --target=arm-arm-none-eabi -march=armv8-a main.c myadd.s
Related information
Procedure Call Standard for the Arm Architecture
Procedure Call Standard for the Arm 64-bit Architecture
Chapter 7
SVE Coding Considerations with Arm® Compiler
Describes best practices for writing code that uses the SVE and SVE2 features of Arm Compiler.
It contains the following sections:
• 7.1 Embedding SVE assembly code directly into C and C++ code.
• 7.2 Using SVE and SVE2 intrinsics directly in your C code.
7.1 Embedding SVE assembly code directly into C and C++ code
Inline assembly (or inline asm) provides a mechanism for inserting hand-written assembly instructions
into C and C++ code. This lets you vectorize parts of a function by hand without having to write the
entire function in assembly code.
Note
This information assumes that you are familiar with details of the SVE Architecture, including vector-
width agnostic registers, predication, and WHILE operations.
Using inline assembly rather than writing a separate .s file has the following advantages:
• Shifts the burden of handling the procedure call standard (PCS) from the programmer to the compiler.
This includes allocating the stack frame and preserving all necessary callee-saved registers.
• Inline assembly code gives the compiler more information about what the assembly code does.
• The compiler can inline the function that contains the assembly code into its callers.
• Inline assembly code can take immediate operands that depend on C-level constructs, such as the size
of a structure or the byte offset of a particular structure field.
Where:
instructions
is a text string that contains AArch64 assembly instructions, with at least one newline sequence
\n between consecutive instructions.
outputs
is a comma-separated list of outputs from the assembly instructions.
inputs
is a comma-separated list of inputs to the assembly instructions.
side-effects
is a comma-separated list of effects that the assembly instructions have, besides reading from
inputs and writing to outputs.
Additionally, the asm keyword might need to be followed by the volatile keyword.
Outputs
Each entry in outputs has one of the following forms:
[name] "=&register-class" (destination)
[name] "=register-class" (destination)
The first form has the register class preceded by =&. This specifies that the assembly instructions might
read from one of the inputs (specified in the asm statement's inputs section) after writing to the output.
The second form has the register class preceded by =. This specifies that the assembly instructions never
read from inputs in this way. Using the second form is an optimization. It allows the compiler to allocate
the same register to the output as it allocates to one of the inputs.
Both forms specify that the assembly instructions produce an output that is stored in the C object
specified by destination. This can be any scalar value that is valid for the left-hand side of a C
assignment. The register-class field specifies the type of register that the assembly instructions require. It
can be one of:
r
if the register for this output when used within the assembly instructions is a general-purpose
register (x0-x30)
w
if the register for this output when used within the assembly instructions is a SIMD and floating-
point register (v0-v31).
It is not possible at present for outputs to contain an SVE vector or predicate value. All uses of SVE
registers must be internal to the inline assembly block.
It is the responsibility of the compiler to allocate a suitable output register and to copy that register into
the destination after the asm statement is executed. The assembly instructions within the instructions
section of the asm statement can use one of the following forms to refer to the output value:
%[name]
to refer to an r-class output as xN or a w-class output as vN
%w[name]
to refer to an r-class output as wN
%s[name]
to refer to a w-class output as sN
%d[name]
to refer to a w-class output as dN
In all cases N represents the number of the register that the compiler has allocated to the output. The use
of these forms means that it is not necessary for the programmer to anticipate precisely which register is
selected by the compiler. The following example creates a function that returns the value 10. It shows
how the programmer is able to use the %w[res] form to describe the movement of a constant into the
output register without knowing which register is used.
int f()
{
    int result;
    asm("movz %w[res], #10" : [res] "=r" (result));
    return result;
}
In optimized output the compiler picks the return register (0) for res, resulting in the following assembly
code:
movz w0, #10
ret
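The =& form matters as soon as the assembly writes the output before it has finished reading its inputs. The following sketch, which uses a hypothetical function twice_plus and an assumed plain C fallback for non-AArch64 targets, writes res in the first instruction and reads the input b in the second, so res must not share a register with b.

```c
/* Hypothetical example: returns 2*a + b. */
long twice_plus(long a, long b)
{
#if defined(__aarch64__)
    long res;
    /* res is written by LSL while b is still needed by ADD,
       so the output uses =& to keep it out of the input registers. */
    __asm("lsl %[res], %[a], #1\n"
          "add %[res], %[res], %[b]"
          : [res] "=&r" (res)
          : [a] "r" (a), [b] "r" (b));
    return res;
#else
    /* Portable fallback so that the sketch builds on other targets. */
    return 2 * a + b;
#endif
}
```

With a plain = constraint the compiler would be allowed to place res and b in the same register, in which case the first instruction would overwrite b before it is read.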
Inputs
Within an asm statement, each entry in the inputs section has the form:
[name] "operand-type" (value)
This construct specifies that the asm statement uses the scalar C expression value as an input, referred to
within the assembly instructions as name. The operand-type field specifies how the input value is handled
within the assembly instructions. It can be one of the following:
r
if the input is to be placed in a general-purpose register (x0-x30)
w
if the input is to be placed in a SIMD and floating-point register (v0-v31).
[output-name]
if the input is to be placed in the same register as output output-name. In this case the [name]
part of the input specification is redundant and can be omitted. The assembly instructions can
use the forms described in the Outputs section above (%[name], %w[name], %s[name],
%d[name]) to refer to both the input and the output.
i
if the input is an integer constant and is used as an immediate operand. The assembly
instructions use %[name] in place of immediate operand #N, where N is the numerical value of
value.
In the first two cases, it is the responsibility of the compiler to allocate a suitable register and to ensure
that it contains value on entry to the assembly instructions. The assembly instructions must refer to these
registers using the same syntax as for the outputs (%[name], %w[name], %s[name], %d[name]).
It is not possible at present for inputs to contain an SVE vector or predicate value. All uses of SVE
registers must be internal to instructions.
This example shows an asm directive with the same effect as the previous example, except that an i-form
input is used to specify the constant to be assigned to the result.
int f()
{
    int result;
    asm("movz %w[res], %[value]" : [res] "=r" (result) : [value] "i" (10));
    return result;
}
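An i-form input can also carry a constant that is computed from a C-level construct. The following sketch assumes a hypothetical struct packet and function read_length, and passes the byte offset of a structure field as the immediate of a load. The fallback branch is an assumption added for non-AArch64 targets.

```c
#include <stddef.h>

/* Hypothetical structure used only for this illustration. */
struct packet {
    int header;
    int length;
};

/* Reads p->length using an immediate offset supplied by offsetof. */
int read_length(const struct packet *p)
{
#if defined(__aarch64__)
    int value;
    __asm("ldr %w[value], [%[base], %[offset]]"
          : [value] "=r" (value)
          : [base] "r" (p),
            [offset] "i" (offsetof(struct packet, length))
          : "memory");          /* the statement reads from memory */
    return value;
#else
    /* Portable fallback so that the sketch builds on other targets. */
    return p->length;
#endif
}
```

If the layout of the structure changes, the immediate follows automatically, which is harder to guarantee in a separate assembly source file.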
Side effects
Many asm statements have effects other than reading from inputs and writing to outputs. This is
particularly true of asm statements that implement vectorized loops, since most such loops read from or
write to memory. The side-effects section of an asm statement tells the compiler what these additional
effects are. Each entry must be one of the following:
"memory"
if the asm statement reads from or writes to memory. This is necessary even if inputs contain
pointers to the affected memory.
"cc"
if the asm statement modifies the condition-code flags.
"xN"
if the asm statement modifies general-purpose register N.
"vN"
if the asm statement modifies SIMD and floating-point register N.
"zN"
if the asm statement modifies SVE vector register N. Since SVE vector registers extend the
SIMD and floating-point registers, this is equivalent to writing "vN".
"pN"
if the asm statement modifies SVE predicate register N.
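As an illustration of a side-effects list, the following sketch assumes a hypothetical function store_max that compares two values, selects the larger one into a fixed scratch register, and stores it through a pointer. It therefore lists "memory", "cc", and "x9". The fallback branch is an assumption added for non-AArch64 targets.

```c
/* Hypothetical example: stores the larger of a and b to *dst. */
void store_max(int *dst, int a, int b)
{
#if defined(__aarch64__)
    __asm("cmp %w[a], %w[b]\n"              /* modifies the condition flags */
          "csel w9, %w[a], %w[b], gt\n"     /* writes scratch register x9 (as w9) */
          "str w9, [%[dst]]"                /* writes to memory */
          : /* no outputs */
          : [a] "r" (a), [b] "r" (b), [dst] "r" (dst)
          : "memory", "cc", "x9");
#else
    /* Portable fallback so that the sketch builds on other targets. */
    *dst = (a > b) ? a : b;
#endif
}
```

Listing "x9" also prevents the compiler from allocating any of the inputs to that register.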
Use of volatile
Sometimes an asm statement might have dependencies and side effects that cannot be captured by the
asm statement syntax. For example, suppose there are three separate asm statements (not three lines
within a single asm statement), that do the following:
• The first sets the floating-point rounding mode.
• The second executes on the assumption that the rounding mode set by the first statement is in effect.
• The third statement restores the original floating-point rounding mode.
It is important that these statements are executed in order, but the asm statement syntax provides no direct
method for representing the dependency between them. Instead, each statement must add the keyword
volatile after asm. This prevents the compiler from removing the asm statement as dead code, even if
the asm statement does not modify memory and if its results appear to be unused. The compiler always
executes asm volatile statements in their original order.
For example:
asm volatile ("msr fpcr, %[flags]" :: [flags] "r" (new_fpcr_value));
Note
An asm volatile statement must still have a valid side effects list. For example, an asm volatile
statement that modifies memory must still include "memory" in the side-effects section.
Labels
The compiler might output a given asm statement more than once, either as a result of optimizing the
function that contains the asm statement or as a result of inlining that function into some of its callers.
Therefore, asm statements must not define named labels like .loop, since if the asm statement is written
more than once, the output contains more than one definition of label .loop. Instead, the assembler
provides a concept of relative labels. Each relative label is simply a number and is defined in the same
way as a normal label. For example, relative label 1 is defined by:
1:
The assembly code can contain many definitions of the same relative label. Code that refers to a relative
label must add the letter f (forward) to refer to the next definition, or the letter b (backward) to refer
to the previous definition. A typical assembly loop with a pre-loop test therefore has the following
structure, which allows the compiler output to contain many copies of this code without creating any
ambiguity.
...pre-loop test...
b.none 2f
1:
...loop...
b.any 1b
2:
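The same structure can be sketched with scalar instructions. The function sum_1_to_n below is hypothetical, and the plain C fallback is an assumption added for non-AArch64 targets. The pre-loop test branches forward to label 2, and the loop branches backward to label 1.

```c
/* Hypothetical example: returns 1 + 2 + ... + n (0 when n is 0). */
unsigned long sum_1_to_n(unsigned long n)
{
    unsigned long total;
#if defined(__aarch64__)
    unsigned long i;
    __asm("cbz %[i], 2f\n"                  /* pre-loop test: skip when n == 0 */
          "1:\n"
          "add %[total], %[total], %[i]\n"
          "subs %[i], %[i], #1\n"
          "b.ne 1b\n"                       /* back to the previous label 1 */
          "2:\n"
          : [total] "=r" (total), [i] "=r" (i)
          : "[total]" (0UL), "[i]" (n)      /* initial values share the output registers */
          : "cc");
#else
    /* Portable fallback so that the sketch builds on other targets. */
    total = 0;
    for (unsigned long i = n; i != 0; --i) {
        total += i;
    }
#endif
    return total;
}
```

Because the labels are relative, the compiler can inline this function into several callers without producing duplicate label definitions.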
Example
The following example shows a simple function that performs a fused multiply-add operation (x=a∙b+c)
across four passed-in arrays of a size specified by n:
void f(double *restrict x, double *restrict a, double *restrict b, double *restrict c,
       unsigned long n)
{
    for (unsigned long i = 0; i < n; ++i)
    {
        x[i] = fma(a[i], b[i], c[i]);
    }
}
An asm statement that exploits SVE instructions to achieve equivalent behavior might look like the
following:
void f(double *x, double *a, double *b, double *c, unsigned long n)
{
unsigned long i;
asm ("whilelo p0.d, %[i], %[n] \n\
1: \n\
ld1d z0.d, p0/z, [%[a], %[i], lsl #3] \n\
ld1d z1.d, p0/z, [%[b], %[i], lsl #3] \n\
ld1d z2.d, p0/z, [%[c], %[i], lsl #3] \n\
fmla z2.d, p0/m, z0.d, z1.d \n\
st1d z2.d, p0, [%[x], %[i], lsl #3] \n\
uqincd %[i] \n\
whilelo p0.d, %[i], %[n] \n\
b.any 1b"
: [i] "=&r" (i)                               /* output: scratch loop index */
: "[i]" (0),                                  /* input: i starts at zero */
[x] "r" (x), [a] "r" (a), [b] "r" (b), [c] "r" (c), [n] "r" (n)
: "memory", "cc", "p0", "z0", "z1", "z2");    /* side effects */
}
Note
Keeping the restrict qualifiers would be valid but would have no effect.
The input specifier "[i]" (0) indicates that the assembly statements take an input 0 in the same register
as output [i]. In other words, the initial value of [i] must be zero. The use of =& in the specification of
[i] indicates that [i] cannot be allocated to the same register as [x], [a], [b], [c], or [n] (because the
assembly instructions use those inputs after writing to [i]).
In this example, the C variable i is not used after the asm statement. In effect the asm statement is simply
reserving a register that it can use as scratch space. Including "memory" in the side effects list indicates
that the asm statement reads from and writes to memory. The compiler must therefore keep the asm
statement even though i is not used.
7.2 Using SVE and SVE2 intrinsics directly in your C code
Introduction
The Arm C Language Extensions (ACLE) for SVE provide a set of types and accessors for SVE vectors
and predicates, and a function interface for all relevant SVE and SVE2 instructions.
The function interface is more general than the underlying architecture, so not every function maps
directly to an architectural instruction. The intention is to provide a regular interface and leave the
compiler to pick the best mapping to SVE or SVE2 instructions.
The Arm® C Language Extensions for SVE specification has a detailed description of this interface, and
must be used as the primary reference. This section introduces a selection of features to help you get
started with the Arm C Language Extensions for SVE.
All functions and types that are defined in the header file <arm_sve.h> have the prefix sv, to reduce
the chance of collisions with other extensions.
For example, svint64_t represents a vector of 64-bit signed integers, and svfloat16_t represents a
vector of half-precision floating-point numbers.
For most functions, this name is the lowercase name of the SVE instruction. Sometimes, letters
indicating the type or size of data being operated on are omitted, where it can be implied from
the argument types.
Unsigned extending loads add a u to indicate that the data is zero extended, to more explicitly
differentiate them from their signed equivalent.
<disambiguator>
This field distinguishes between different forms of a function, for example:
• To distinguish between addressing modes
• To distinguish forms that take a scalar rather than a vector as the final argument.
A list of types for vectors and predicates, starting with the return type and followed by each
argument type. For example, _s8, _u32, and _f32 represent a signed 8-bit integer, an unsigned
32-bit integer, and a single-precision 32-bit floating-point type, respectively.
Predicate types are represented by, for example, _b8 and _b16, for predicates suitable for 8-bit
and 16-bit types respectively. A predicate type suitable for all element types is represented by
_b. Where a type is not needed to disambiguate between variants of a base function, it is
omitted.
<predication>
This suffix describes the inactive elements in the result of a predicated operation. It can be one
of the following:
• z – Zero predication: Set all inactive elements of the result to zero.
• m – Merge predication: copy all inactive elements from the first vector argument.
• x – ‘Don’t care’ predication. Use this form when you do not care about the inactive
elements. The compiler is then free to choose between zeroing, merging, or unpredicated
forms to give the best code quality, but gives no guarantee of what data is left in inactive
elements.
Addressing modes
Load, store, prefetch, and ADR functions have arguments that describe the memory area being addressed.
The first addressing argument is the base – either a single pointer to an element type, or a 32-bit or 64-bit
vector of addresses. The second argument, when present, offsets the base (or bases) by some number of
bytes, elements, or vectors. This offset argument can be an immediate constant value, a scalar argument,
or a vector of offsets.
Not every combination of the addressing modes exists. The following table gives examples of some
common addressing mode disambiguators, and describes how to interpret the address arguments:
Disambiguator                                   Interpretation
_u32base                                        The base argument is a vector of unsigned 32-bit addresses.
_u64base                                        The base argument is a vector of unsigned 64-bit addresses.
_s32offset, _s64offset, _u32offset, _u64offset  The offset argument is a vector of byte offsets. These offsets are signed or unsigned 32-bit or 64-bit numbers.
_s32index, _s64index, _u32index, _u64index      The offset argument is a vector of element-sized indices. These indices are signed or unsigned 32-bit or 64-bit numbers.
_offset                                         The offset argument is a scalar, and must be treated as a byte offset.
_index                                          The offset argument is a scalar, and must be treated as an index into an array of elements.
_vnum                                           The offset argument is a scalar, and must be treated as an index into an array of SVE vectors.
Short forms
Sometimes, it is possible to omit part of the full name, and still uniquely identify the correct form of a
function, by inspecting the argument types. Where omitting part of the full name is possible, these
simplified forms are provided as aliases to their fully named equivalents, and are used for preference in
the rest of this document.
In the Arm® C Language Extensions for SVE specification, the portion that can be removed is enclosed in
square brackets. For example svclz[_s16]_m has the full name svclz_s16_m, and an overloaded alias,
svclz_m.
SVE2 intrinsics
SVE2 builds on SVE to add data-processing instructions that bring the benefits of scalable long vectors
to a wider class of applications. To enable only the base SVE2 instructions, use the +sve2 option with the
armclang -march or -mcpu options. To enable additional optional SVE2 instructions, use the following
armclang options:
• +sve2-aes to enable scalable vector forms of AESD, AESE, AESIMC, AESMC, PMULLB, and PMULLT
instructions.
• +sve2-bitperm to enable the BDEP, BEXT, and BGRP instructions.
• +sve2-sha3 to enable scalable vector forms of the RAX1 instruction.
• +sve2-sm4 to enable scalable vector forms of SM4E and SM4EKEY instructions.
You can use one or more of these options. Each option also implies +sve2. For example,
+sve2-aes+sve2-bitperm+sve2-sha3+sve2-sm4 enables all base and optional instructions. For clarity,
you can include +sve2 if necessary.
See -march and -mcpu in the Arm® Compiler Reference Guide for more information.
Chapter 8
Mapping Code and Data to the Target
There are various options in Arm Compiler to control how code, data and other sections of the image are
mapped to specific locations on the target.
8.1 What the linker does to create an image
Note
XO sections are supported only for images that are targeted at Armv7‑M or Armv8‑M architectures.
If the location of some code or data lies outside all the regions that are specified in your scatter file, the
linker attempts to create a load and execution region to contain that code or data.
Note
Multiple code and data sections cannot occupy the same area of memory, unless you place them in
separate overlay regions.
8.1.2 Interaction of OVERLAY and PROTECTED attributes with armlink merge options
The OVERLAY and PROTECTED scatter-loading attributes modify the behavior of the armlink options
--merge and --merge_litpools.
The following table describes how the OVERLAY and PROTECTED scatter-loading attributes affect the
armlink options --merge and --merge_litpools. The terms const string and const value have the
following meanings:
const string
A string literal from an ELF section with the SHF_MERGE and SHF_STRINGS flags.
const value
A constant defined in a constant pool where the constant pool is in the same section as the code
that uses it.
Option               No attribute               OVERLAY                                   PROTECTED
--no_merge           Disables the merging of    Disables the merging of all const         Disables the merging of all
                     all const strings.         strings.                                  const strings.
--merge_litpools     Merges all const values.   Prevents merging across regions marked    Prevents merging across regions
                                                OVERLAY. A const in an OVERLAY can be     marked PROTECTED with other
                                                merged into a region that is not marked   regions. const values within a
                                                with either OVERLAY or PROTECTED. const   region are merged.
                                                values within a region are merged.
--no_merge_litpools  Disables the merging of    Disables the merging of all const         Disables the merging of all
                     all const values.          values.                                   const values.
Related information
--merge, --no_merge
--merge_litpools, --no_merge_litpools
Merging identical constants
Load region attributes
Execution region attributes
8.2 Placing data items for target peripherals with a scatter file
To access the peripherals on your target, you must locate the data items that access them at the addresses
of those peripherals.
To make sure that the data items are placed at the correct address for the peripherals, use the
__attribute__((section(".ARM.__at_address"))) variable attribute together with a scatter file.
Procedure
1. Create peripheral.c to place the my_peripheral variable at address 0x10000000.
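#include "stdio.h"

/* Place my_peripheral in the __at section for address 0x10000000. */
int my_peripheral __attribute__((section(".ARM.__at_0x10000000"))) = 0;

int main(void)
{
    printf("%d\n", my_peripheral);
    return 0;
}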
LR_2 0x01000000
{
ER_ZI +0 UNINIT
{
*(.bss)
}
}
LR_3 0x10000000
{
ER_PERIPHERAL 0x10000000 UNINIT
{
*(.ARM.__at_0x10000000)
}
}
Results:
The memory map for load region LR_3 is:
Load Region LR_3 (Base: 0x10000000, Size: 0x00000004, Max: 0xffffffff, ABSOLUTE)
8.3 Placing the stack and heap with a scatter file
Note
• If you re-implement __user_setup_stackheap(), your version does not get invoked when stack and
heap are defined in a scatter file.
• You might have to update your startup code to use the correct initial stack pointer. Some processors,
such as the Cortex-M3 processor, require that you place the initial stack pointer in the vector table.
See Stack and heap configuration in AN179 - Cortex®-M3 Embedded Software Development for more
details.
• You must ensure correct alignment of the stack and heap:
— In AArch32 state, the stack and heap must be 8-byte aligned.
— In AArch64 state, the stack and heap must be 16-byte aligned.
Procedure
1. Define two special execution regions in your scatter file that are named ARM_LIB_HEAP and
ARM_LIB_STACK.
2. Assign the EMPTY attribute to both regions.
Because the stack and heap are in separate regions, the library selects the non-default implementation
of __user_setup_stackheap() that uses the value of the symbols:
• Image$$ARM_LIB_STACK$$ZI$$Base.
• Image$$ARM_LIB_STACK$$ZI$$Limit.
• Image$$ARM_LIB_HEAP$$ZI$$Base.
• Image$$ARM_LIB_HEAP$$ZI$$Limit.
You can specify only one ARM_LIB_STACK or ARM_LIB_HEAP region, and you must allocate a size.
Example:
LOAD_FLASH …
{
…
ARM_LIB_STACK 0x40000 EMPTY -0x20000 ; Stack region growing down
{ }
ARM_LIB_HEAP 0x28000000 EMPTY 0x80000 ; Heap region growing up
{ }
…
}
3. Alternatively, define a single execution region that is named ARM_LIB_STACKHEAP to use a combined
stack and heap region. Assign the EMPTY attribute to the region.
Because the stack and heap are in the same region, __user_setup_stackheap() uses the value of the
symbols Image$$ARM_LIB_STACKHEAP$$ZI$$Base and Image$$ARM_LIB_STACKHEAP$$ZI$$Limit.
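The alignment requirement and the sign convention of the EMPTY length in the example above can be checked with a few lines of C. This is a sketch under stated assumptions: the struct and function names are hypothetical, and the code only models the region arithmetic (a negative EMPTY length means that the given address is the end of the region). It does not read the Image$$ symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the [base, limit) range of an EMPTY region. */
struct region_bounds {
    uint64_t base;
    uint64_t limit;
};

/* Positive length: the region starts at address and grows up.
   Negative length: address is the end of the region, which grows down. */
static struct region_bounds empty_region_bounds(uint64_t address, int64_t length)
{
    struct region_bounds b;
    if (length < 0) {
        b.base = address - (uint64_t)(-length);
        b.limit = address;
    } else {
        b.base = address;
        b.limit = address + (uint64_t)length;
    }
    return b;
}

/* alignment must be a power of two: 8 for AArch32, 16 for AArch64. */
static bool is_aligned(uint64_t value, uint64_t alignment)
{
    return (value & (alignment - 1)) == 0;
}
```

With the values from the example, ARM_LIB_STACK 0x40000 EMPTY -0x20000 spans 0x20000 to 0x40000, and ARM_LIB_HEAP 0x28000000 EMPTY 0x80000 spans 0x28000000 to 0x28080000. Both satisfy the 16-byte alignment that AArch64 state requires.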
8.4 Root region
Example
Root region with the same load and execution address.
LR_1 0x040000 ; load region starts at 0x40000
{ ; start of execution region descriptions
ER_RO 0x040000 ; load address = execution address
{
* (+RO) ; all RO sections (must include section with
; initial entry point)
}
… ; rest of scatter-loading description
}
Example
The following example shows an implicitly defined root region:
LR_1 0x040000 ; load region starts at 0x40000
{ ; start of execution region descriptions
ER_RO 0x040000 ABSOLUTE ; load address = execution address
{
* (+RO) ; all RO sections (must include the section
; containing the initial entry point)
}
… ; rest of scatter-loading description
}
[Figure: a single load region containing *(RO) at 0x4000, an empty (movable) region, and init.o (FIXED) at 0x80000.]
You can use this attribute to place a function or a block of data, for example a constant table or a
checksum, at a fixed address in ROM. This makes it easier to access the function or block of data
through pointers.
If you place two separate blocks of code or data at the start and end of ROM, some of the memory
contents might be unused. For example, you might place some initialization code at the start of ROM and
a checksum at the end of ROM. Use the * or .ANY module selector to flood fill the region between the
end of the initialization block and the start of the data block.
To make your code easier to maintain and debug, use the minimum number of placement specifications
in scatter files. Leave the detailed placement of functions and data to the linker.
Note
In some situations, using FIXED and a single load region is not appropriate. Other
techniques for specifying fixed locations are:
• If your loader can handle multiple load regions, place the RO code or data in its own load region.
• If you do not require the function or data to be at a fixed location in ROM, use ABSOLUTE instead of
FIXED. The loader then copies the data from the load region to the specified address in RAM.
ABSOLUTE is the default attribute.
• To place a data structure at the location of memory-mapped I/O, use two load regions and specify
UNINIT. UNINIT ensures that the memory locations are not initialized to zero.
8.5 Placing functions and data in a named section
Procedure
1. Create a C source file file.c to specify a section name foo for a variable and a section
name .bss.mybss for a zero-initialized variable z, for example:
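#include "stdio.h"

/* RW data placed in section foo; the initial value is illustrative. */
int variable __attribute__((section("foo"))) = 10;
/* Zero-initialized data placed in section .bss.mybss. */
__attribute__((section(".bss.mybss"))) int z;

int main(void)
{
    int x = 4;
    int y = 7;
    z = x + y;
    printf("%d\n", variable);
    printf("%d\n", z);
    return 0;
}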
2. Create a scatter file to place the named section, scatter.scat, for example:
LR_1 0x0
{
ER_RO 0x0 0x4000
{
*(+RO)
}
ER_RW 0x4000 0x2000
{
*(+RW)
}
ER_ZI 0x6000 0x2000
{
*(+ZI)
}
ER_MYBSS 0x8000 0x2000
{
*(.bss.mybss)
}
ADDER 0x08000000
{
file.o (foo) ; select section foo from file.o
}
The ARM_LIB_STACK and ARM_LIB_HEAP regions are required because the program is being linked
with the semihosting libraries.
Note
If you omit file.o (foo) from the scatter file, the linker places the section in the region of the same
type. That is, ER_RW in this example.
Execution Region ADDER (Base: 0x08000000, Size: 0x00000004, Max: 0xffffffff, ABSOLUTE)
Note
• If scatter-loading is not used, the linker places the section foo in the default ER_RW execution
region of the LR_1 load region. It also places the section .bss.mybss in the default execution
region ER_ZI.
• If you have a scatter file that does not include the foo selector, then the linker places the section in
the defined RW execution region.
You can also place a function at a specific address using .ARM.__at_address as the section name.
For example, to place the function sqr at 0x20000, specify:
int sqr(int n1) __attribute__((section(".ARM.__at_0x20000")));
For more information, see 8.7 Placement of functions and data at specific addresses on page 8-136.
Related information
Semihosting for AArch32 and AArch64
8.6 Loading armlink-generated ELF files that have complex scatter-files
The ELF loader copies p_filesz bytes from the file at offset p_offset to the address specified by
p_vaddr. The loader then creates p_memsz - p_filesz bytes of zero-initialized (ZI) data at address
p_vaddr + p_filesz.
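The loader behavior described above can be sketched in C against a flat memory model. This is an illustration only: struct segment is a simplified, hypothetical stand-in for an ELF program header, p_vaddr is treated as an index into a byte array rather than a real address, and load_segment is not an Arm-provided function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of one ELF program header. */
struct segment {
    size_t p_offset;  /* offset of the segment data in the file */
    size_t p_vaddr;   /* destination address (an index into mem here) */
    size_t p_filesz;  /* number of bytes present in the file */
    size_t p_memsz;   /* number of bytes occupied in memory */
};

/* Copy p_filesz bytes, then zero the remaining p_memsz - p_filesz bytes.
   Returns 0 on success, or -1 if the segment does not fit. */
static int load_segment(uint8_t *mem, size_t mem_size,
                        const uint8_t *file, size_t file_size,
                        const struct segment *seg)
{
    if (seg->p_memsz < seg->p_filesz ||
        seg->p_offset > file_size ||
        seg->p_filesz > file_size - seg->p_offset ||
        seg->p_vaddr > mem_size ||
        seg->p_memsz > mem_size - seg->p_vaddr) {
        return -1;
    }
    memcpy(mem + seg->p_vaddr, file + seg->p_offset, seg->p_filesz);
    memset(mem + seg->p_vaddr + seg->p_filesz, 0, seg->p_memsz - seg->p_filesz);
    return 0;
}
```

The sketch makes the ZI rule concrete: the zero-filled area always starts immediately after the copied bytes, so data that must be zero-initialized at an unrelated address cannot be described by the same program header.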
p_vaddr
{
*(+RO)
}
RW_DATA +0
{
*(+RW)
}
/* ZI_DATA is not a root region */
ZI_DATA 0x10000000
{
*(+ZI)
}
}
LR_STACKHEAP 0x20000000
{
ARM_LIB_STACKHEAP +0 EMPTY 0x2000 {}
}
3. Compile and link the example using the following commands:
armclang --target=arm-arm-none-eabi -march=armv7-a -c foo.c -o foo.o
armlink --scatter=scatter.scat foo.o -o foo.axf
4. To examine the program headers, enter the following fromelf command:
fromelf -s -v foo.axf
...
========================================================================
** Program header #0
** Section #5
...
179 foo 0x10000000 Gb 2 Data Hi 0x40000
...
8.7 Placement of functions and data at specific addresses
Note
For images targeted at Armv7‑M or Armv8‑M, the compiler might generate execute-only (XO) sections.
Typically, you create a scatter file that defines an execution region at the required address with a section
description that selects only one section.
To place a function or variable at a specific address, it must be placed in its own section. There are
several ways to place a function or variable in its own section:
• By default, the compiler places each function and variable in individual ELF sections. To override
this default placement, use the -fno-function-sections or -fno-data-sections compiler options.
• Place the function or data item in its own source file.
• Use __attribute__((section("name"))) to place functions and variables in a specially named
section, .ARM.__at_address, where address is the address to place the function or variable. For
example, __attribute__((section(".ARM.__at_0x4000"))).
To place ZI data at a specific address, use the variable attribute
__attribute__((section("name"))) with the special name .bss.ARM.__at_address.
Note
The name of the section is only significant if you are trying to match the section by name in a scatter file.
Without overlays, the linker automatically assigns __at sections when you use the --autoat command-
line option. This option is the default. If you are using overlays, then you cannot use --autoat to place
__at sections.
Note
You cannot use __at section placement with position-independent execution regions.
When linking with the --autoat option, the linker does not place __at sections with scatter-loading
selectors. Instead, the linker places the __at section in a compatible region. If no compatible region is
found, the linker creates a load and execution region for the __at section.
All linker execution regions created by --autoat have the UNINIT scatter-loading attribute. If you
require a Zero-Initialized (ZI) __at section to be zero-initialized, then it must be placed within a
compatible region. A linker execution region created by --autoat must have a base address that is at
least 4-byte aligned. If any region is incorrectly aligned, the linker produces an error message.
A compatible region is one where:
• The __at address lies within the execution region base and limit, where limit is the base address +
maximum size of execution region. If no maximum size is set, the linker sets the limit for placing
__at sections as the current size of the execution region without __at sections plus a constant. The
default value of this constant is 10240 bytes, but you can change the value using the
--max_er_extension command-line option.
• The execution region meets at least one of the following conditions:
— It has a selector that matches the __at section by the standard scatter-loading rules.
— It has at least one section of the same type (RO or RW) as the __at section.
— It does not have the EMPTY attribute.
Note
The linker considers an __at section with type RW compatible with RO.
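The compatibility rules above can be summarized as a small predicate. The following C sketch is a model under stated assumptions, not armlink's implementation: struct exec_region and its fields are hypothetical, the limit is treated as exclusive, and the size of the __at section itself is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default value of the constant described above, in bytes. */
#define DEFAULT_MAX_ER_EXTENSION 10240u

/* Hypothetical summary of an execution region; not an armlink structure. */
struct exec_region {
    uint32_t base;
    uint32_t max_size;      /* 0 means that no maximum size is set */
    uint32_t current_size;  /* size without any __at sections */
    bool selector_matches;  /* a selector matches the __at section */
    bool has_same_type;     /* the region holds a section of the same type */
    bool is_empty;          /* the region has the EMPTY attribute */
};

/* First rule: the __at address lies between the base and the limit. */
static bool at_address_in_range(const struct exec_region *er,
                                uint32_t at_addr, uint32_t extension)
{
    uint64_t span = er->max_size != 0
                        ? (uint64_t)er->max_size
                        : (uint64_t)er->current_size + extension;
    uint64_t limit = (uint64_t)er->base + span;
    return at_addr >= er->base && at_addr < limit;
}

/* Second rule: at least one of the three listed conditions holds. */
static bool is_compatible(const struct exec_region *er,
                          uint32_t at_addr, uint32_t extension)
{
    bool condition_met = er->selector_matches || er->has_same_type || !er->is_empty;
    return at_address_in_range(er, at_addr, extension) && condition_met;
}
```

For a region at 0x8000 with no maximum size and a current size of 0x100, the default extension of 10240 (0x2800) bytes puts the limit at 0xA900 in this model.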
The following example shows the sections .ARM.__at_0x0000 type RO, .ARM.__at_0x4000 type RW,
and .ARM.__at_0x8000 type RW:
// place the RO variable in a section called .ARM.__at_0x0000
const int foo __attribute__((section(".ARM.__at_0x0000"))) = 10;
The following scatter file shows how to place these __at sections automatically:
LR1 0x0
{
ER_RO 0x0 0x4000
{
*(+RO) ; .ARM.__at_0x0000 lies within the bounds of ER_RO
}
ER_RW 0x4000 0x2000
{
*(+RW) ; .ARM.__at_0x4000 lies within the bounds of ER_RW
}
ER_ZI 0x6000 0x2000
{
*(+ZI)
}
}
; The linker creates a load and execution region for the __at section
; .ARM.__at_0x8000 because it lies outside all candidate regions.
The following example shows the placement of the read-only section .ARM.__at_0x2000 and the
read-write section .ARM.__at_0x4000. Load and execution regions are not created automatically in manual
mode. An error is produced if an __at section cannot be placed in an execution region.
The following example shows the placement of the variables in C or C++ code:
// place the RO variable in a section called .ARM.__at_0x2000
const int foo __attribute__((section(".ARM.__at_0x2000"))) = 100;
// place the RW variable in a section called .ARM.__at_0x4000
int bar __attribute__((section(".ARM.__at_0x4000")));
The following scatter file shows how to place __at sections manually:
LR1 0x0
{
ER_RO 0x0 0x2000
{
*(+RO) ; .ARM.__at_0x0000 is selected by +RO
}
ER_RO2 0x2000
{
*(.ARM.__at_0x2000) ; .ARM.__at_0x2000 is selected by the section named
; .ARM.__at_0x2000
}
ER2 0x4000
{
*(+RW, +ZI) ; .ARM.__at_0x4000 is selected by +RW
}
}
Procedure
1. Create a C file abs_address.c to define an integer and a string constant.
unsigned int const number = 0x12345678;
char* const string = "Hello World";
2. Create a scatter file, scatter.scat, to place the constants in separate sections ER_RONUMBERS and
ER_ROSTRINGS.
LR_1 0x040000 ; load region starts at 0x40000
{ ; start of execution region descriptions
ER_RO 0x040000 ; load address = execution address
{
*(+RO +RW) ; all RO sections (must include section with
; initial entry point)
}
ER_RONUMBERS +0
{
*(.rodata.number, +RO-DATA)
}
ER_ROSTRINGS +0
{
*(.rodata.string, .rodata.str1.1, +RO-DATA)
}
; rest of scatter-loading description
4. Run fromelf on the image to view the contents of the output sections.
fromelf -c -d abs_address.axf
Results:
The output contains the following sections:
...
** Section #2 'ER_RONUMBERS' (SHT_PROGBITS) [SHF_ALLOC]
Size : 4 bytes (alignment 4)
Address: 0x00040000
0x040000: 78 56 34 12 xV4.
5. Replace the ER_RONUMBERS and ER_ROSTRINGS sections in the scatter file with the following
ER_RODATA section:
ER_RODATA +0
{
abs_address.o(.rodata.number, .rodata.string, .rodata.str1.1, +RO-DATA)
}
The following procedure describes how to place the jump table in a ROM .rodata section.
Procedure
1. Create a C file jump.c.
Make the PFUNC type a pointer to a void function that has no parameters. You can then use PFUNC to
create an array of constant function pointers.
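/* PFUNC is a pointer to a void function that has no parameters. */
typedef void (*PFUNC)(void);

extern void func0(void);
extern void func1(void);
extern void func2(void);

/* Array of constant function pointers, placed in read-only data. */
const PFUNC table[] = {func0, func1, func2};

void jump(unsigned i)
{
    if (i <= 2)
        table[i]();
}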
3. Run fromelf on the image to view the contents of the output sections.
fromelf -c -d jump.o
Results:
The table is placed in the read-only section .rodata that you can place in ROM as required:
...
** Section #3 '.text.jump' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]
Size : 64 bytes (alignment 4)
Address: 0x00000000
$a.0
[Anonymous symbol #24]
jump
0x00000000: e92d4800 .H-. PUSH {r11,lr}
0x00000004: e24dd008 ..M. SUB sp,sp,#8
0x00000008: e1a01000 .... MOV r1,r0
0x0000000c: e58d0004 .... STR r0,[sp,#4]
0x00000010: e3500002 ..P. CMP r0,#2
0x00000014: e58d1000 .... STR r1,[sp,#0]
0x00000018: 8a000006 .... BHI {pc}+0x20 ; 0x38
0x0000001c: eaffffff .... B {pc}+0x4 ; 0x20
0x00000020: e59d0004 .... LDR r0,[sp,#4]
0x00000024: e3001000 .... MOVW r1,#:LOWER16: table
0x00000028: e3401000 ..@. MOVT r1,#:UPPER16: table
0x0000002c: e7910100 .... LDR r0,[r1,r0,LSL #2]
0x00000030: e12fff30 0./. BLX r0
0x00000034: eaffffff .... B {pc}+0x4 ; 0x38
0x00000038: e28dd008 .... ADD sp,sp,#8
0x0000003c: e8bd8800 .... POP {r11,pc}
...
** Section #7 '.rodata.table' (SHT_PROGBITS) [SHF_ALLOC]
Size : 12 bytes (alignment 4)
Address: 0x00000000
0x000000: 00 00 00 00 00 00 00 00 00 00 00 00 ............
...
{
int squared;
squared=sqr(gValue);
printf("Value squared is: %d\n", squared);
return 0;
}
2. Create the source file function.c containing the following code:
int sqr(int n1)
{
return n1*n1;
}
3. Compile and link the sources:
armclang --target=arm-arm-none-eabi -march=armv8-a -c function.c
armclang --target=arm-arm-none-eabi -march=armv8-a -c main.c
armlink --map function.o main.o -o squared.axf
The --map option displays the memory map of the image. Also, --autoat is the default.
In this example, __attribute__((section(".ARM.__AT_0x5000"))) specifies that the global variable
gValue is to be placed at the absolute address 0x5000. gValue is placed in the execution region
ER$$.ARM.__AT_0x5000 and load region LR$$.ARM.__AT_0x5000.
The ARM_LIB_STACK and ARM_LIB_HEAP regions are required because the program is being linked
with the semihosting libraries.
4. Compile and link the sources:
armclang --target=arm-arm-none-eabi -march=armv8-a -c function.c
armclang --target=arm-arm-none-eabi -march=armv8-a -c main.c
armlink --no_autoat --scatter=scatter.scat --map function.o main.o -o squared.axf
In this example, the size of ER1 is unknown. Therefore, gValue might be placed in ER1 or ER2. To make
sure that gValue is placed in ER2, you must include the corresponding selector in ER2 and link with the
--no_autoat command-line option. If you omit --no_autoat, gValue is placed in a separate load region
LR$$.ARM.__at_0x10000 that contains the execution region ER$$.ARM.__at_0x10000.
Related information
Semihosting for AArch32 and AArch64
8.8 Bare-metal Position Independent Executables
Note
• Bare-metal PIE support is deprecated.
• There is support for -fropi and -frwpi in armclang. You can use these options to create bare-metal
position independent executables.
Position independent code uses PC-relative addressing modes where possible and otherwise accesses
global data via the Global Offset Table (GOT). The address entries in the GOT and initialized pointers in
the data area are updated with the executable load address when the executable runs for the first time.
All objects and libraries that are linked into the image must be compiled to be position independent.
#include <stdio.h>

int main(void)
{
    printf("Hello World!\n");
    return 0;
}
To compile and automatically link this code for bare-metal PIE, use the -fbare-metal-pie option with
armclang.
Alternatively, you can compile with armclang -fbare-metal-pie and link with armlink
--bare_metal_pie as separate steps.
If you are using link time optimization, use the armlink --lto_relocation_model=pic option to tell
the link time optimizer to produce position independent code:
armclang --target=arm-arm-none-eabi -march=armv8-a -flto -fbare-metal-pie -c hello.c -o
hello.bc
armlink --lto --lto_relocation_model=pic --bare_metal_pie hello.bc -o hello
Restrictions
A bare-metal PIE executable must conform to the following:
• AArch32 state only.
• The .got section must be placed in a writable region.
• All references to symbols must be resolved at link time.
• The image must be linked Position Independent with a base address of 0x0.
• The code and data must be linked at a fixed offset from each other.
• The stack must be set up before the runtime relocation routine __arm_relocate_pie_ is called. This
means that the stack initialization code must only use PC-relative addressing if it is part of the image
code.
• It is the responsibility of the target platform that loads the PIE to ensure that the ZI region is zero-
initialized.
• When writing assembly code for position independence, some instructions (LDR, for example) let you
specify a PC-relative address in the form of a label. For example:
LDR r0,=__main
This causes the link step to fail when building with --bare_metal_pie, because the symbol is in a
read-only section. armlink returns an error message, for example:
Error: L6084E: Dynamic relocation from #REL:0 in unwritable section
foo-7cb47a.o(.text.main) of type R_ARM_RELATIVE to symbol main cannot be applied.
got +0 { *(.got) }
er_rw +0 { *(+RW) }
er_zi +0 { *(+ZI) }
; Add any stack and heap section required by the user supplied
; stack/heap initialization routine here
}
The linker generates the DYNAMIC_RELOCATION_TABLE section. This section must be placed in an
execution region called DYNAMIC_RELOCATION_TABLE. This allows the runtime relocation routine
__arm_relocate_pie_ that is provided in the C library to locate the start and end of the table using the
symbols Image$$DYNAMIC_RELOCATION_TABLE$$Base and
Image$$DYNAMIC_RELOCATION_TABLE$$Limit.
When using a scatter file and the default entry code that the C library supplies, the linker requires that
you provide your own routine for initializing the stack and heap. This user-supplied stack and heap
routine is run before the routine __arm_relocate_pie_, so you must ensure that this routine uses only
PC-relative addressing.
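As a conceptual sketch only, the following C function models what the load-time update described in this section amounts to for an image that is linked at base address 0x0. It is not the __arm_relocate_pie_ routine and it does not parse the real DYNAMIC_RELOCATION_TABLE format. The function name and the flat slots array are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a load-time fix-up. Each slot holds an address
   that was computed at link time for an image based at 0x0, for example
   a GOT entry or an initialized pointer in the data area. */
static void apply_load_address(uintptr_t *slots, size_t count,
                               uintptr_t load_address)
{
    for (size_t i = 0; i < count; i++) {
        slots[i] += load_address;  /* link-time address + load address */
    }
}
```

Because the fix-up writes to every slot, the slots must live in writable memory. This is why the restrictions require the .got section to be placed in a writable region.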
Related information
--fpic (armlink)
--pie (armlink)
--bare_metal_pie (armlink)
--ref_pre_init (armlink)
-fbare-metal-pie (armclang)
-fropi (armclang)
-frwpi (armclang)
8.9 Placement of Arm® C and C++ library code
RAM1 0x3000
{
*armlib* (+RO) ; all other Arm-supplied library code
; for example, floating-point libraries
}
RAM2 0x4000
{
* (+RW, +ZI)
}
}
The name armlib indicates the Arm C library files that are located in the directory
install_directory\lib\armlib.
Procedure
1. Create the following C++ program, foo.cpp:
#include <iostream>
2. To place the C++ library code, define the following scatter file, scatter.scat:
LR 0x8000
{
ER1 +0
{
*armlib*(+RO)
}
ER2 +0
{
*libcxx*(+RO)
}
ER3 +0
{
*(+RO)
The name *armlib* matches install_directory\lib\armlib, indicating the Arm C library files
that are located in the armlib directory.
The name *libcxx* matches install_directory\lib\libcxx, indicating the C++ library files that
are located in the libcxx directory.
3. Compile and link the sources:
armclang --target=arm-arm-none-eabi -march=armv8-a -c foo.cpp
armclang --target=arm-arm-none-eabi -march=armv8-a -c main.c
armlink --scatter=scatter.scat --map main.o foo.o -o foo.axf
8.10 Manual placement of unassigned sections
Note
The placement of data can cause some data to be removed and shrink the size of the sections.
When more than one .ANY selector is present in a scatter file, the linker sorts sections in descending size
order. It then takes the unassigned section with the largest size and assigns the section to the most
specific .ANY execution region that has enough free space. For example, .ANY(.text) is judged to be
more specific than .ANY(+RO).
If several execution regions are equally specific, then the section is assigned to the execution region with
the most available remaining space.
For example:
• You might have two equally specific execution regions where one has a size limit of 0x2000 and the
other has no limit. In this case, all the sections are assigned to the second unbounded .ANY region.
• You might have two equally specific execution regions where one has a size limit of 0x2000 and the
other has a size limit of 0x3000. In this case, the first sections to be placed are assigned to the
second .ANY region of size limit 0x3000. This assignment continues until the remaining size of the
second .ANY region is reduced to 0x2000. From this point, sections are assigned alternately between
both .ANY execution regions.
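The second case above can be reproduced with a short simulation. The following C sketch is a simplified model of the rule "assign to the equally specific region with the most remaining space", not the linker's algorithm. The function names are hypothetical, and ties are assumed to go to the region that appears first in the scatter file.

```c
#include <stddef.h>
#include <stdint.h>

/* free_space[i] is the remaining space of the i-th equally specific .ANY
   region, in scatter-file order. Returns n_regions if no region fits. */
static size_t pick_region(const uint32_t *free_space, size_t n_regions,
                          uint32_t section_size)
{
    size_t best = n_regions;
    for (size_t i = 0; i < n_regions; i++) {
        if (free_space[i] < section_size) {
            continue;  /* not enough room in this region */
        }
        /* Strictly greater, so that a tie keeps the earlier region. */
        if (best == n_regions || free_space[i] > free_space[best]) {
            best = i;
        }
    }
    return best;
}

/* sizes[] must already be sorted in descending order. The region chosen
   for section s is stored in out[s]. */
static void assign_sections(uint32_t *free_space, size_t n_regions,
                            const uint32_t *sizes, size_t n_sections,
                            size_t *out)
{
    for (size_t s = 0; s < n_sections; s++) {
        size_t r = pick_region(free_space, n_regions, sizes[s]);
        out[s] = r;
        if (r != n_regions) {
            free_space[r] -= sizes[s];
        }
    }
}
```

With regions of 0x2000 and 0x3000 bytes and seven sections of 0x400 bytes each, the model places the first four sections in the second region. After that, both regions have 0x2000 bytes free and the remaining sections alternate between them.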
You can specify a maximum amount of space to use for unassigned sections with the execution region
attribute ANY_SIZE.
The --any_contingency option prevents the linker from filling the region up to its maximum. It
reserves a portion of the region's size for linker-generated content and fills this contingency area only if
no other regions have space. It is enabled by default for the first_fit and best_fit algorithms,
because they are most likely to exhibit this behavior.
Procedure
• To prioritize the order of multiple .ANY sections use the .ANYnum selector, where num is a positive
integer starting at zero.
The highest priority is given to the selector with the highest integer.
8.10.4 Specify the maximum region size permitted for placing unassigned sections
You can specify the maximum size in a region that armlink can fill with unassigned sections.
Use the execution region attribute ANY_SIZE max_size to specify the maximum size in a region that
armlink can fill with unassigned sections.
Example
The following example shows how to use ANY_SIZE:
LOAD_REGION 0x0 0x3000
{
ER_1 0x0 ANY_SIZE 0xF00 0x1000
{
.ANY
}
ER_2 0x0 ANY_SIZE 0xFB0 0x1000
{
.ANY
}
ER_3 0x0 ANY_SIZE 0x1000 0x1000
{
.ANY
}
}
In this example:
• ER_1 has 0x100 reserved for linker-generated content.
• ER_2 has 0x50 reserved for linker-generated content. That is about the same as the automatic
contingency of --any_contingency.
• ER_3 has no reserved space. Therefore, 100% of the region is filled, with no contingency for veneers.
Omitting the ANY_SIZE parameter causes 98% of the region to be filled, with a two percent
contingency for veneers.
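The reserved amounts in this list follow from subtracting the ANY_SIZE value from the maximum size of the region. As a quick check of that arithmetic, the following C sketch recomputes them. The helper names reserved_bytes and reserved_percent_x100 are hypothetical and are not part of any Arm tool.

```c
#include <stdio.h>

/* Hypothetical helper: bytes left for linker-generated content when
 * ANY_SIZE limits how much of a region .ANY sections can fill. */
unsigned reserved_bytes(unsigned region_max, unsigned any_size)
{
    return region_max - any_size;
}

/* Reserved space as a fraction of the region, in hundredths of a percent. */
unsigned reserved_percent_x100(unsigned region_max, unsigned any_size)
{
    return (reserved_bytes(region_max, any_size) * 10000u) / region_max;
}

/* Prints the reserved space for the three regions in the example. */
void print_reserved(void)
{
    printf("ER_1 reserves 0x%X bytes\n", reserved_bytes(0x1000, 0xF00));
    printf("ER_2 reserves 0x%X bytes\n", reserved_bytes(0x1000, 0xFB0));
    printf("ER_3 reserves 0x%X bytes\n", reserved_bytes(0x1000, 0x1000));
}
```

For ER_2, 0x50 bytes out of 0x1000 is just under two percent, which is why the text compares it to the automatic contingency.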
Name Size
sec1 0x4
sec2 0x4
sec3 0x4
sec4 0x4
sec5 0x4
sec6 0x4
Note
These examples have --any_contingency disabled.
Execution Region ER_2 (Base: 0x00000200, Size: 0x00000008, Max: 0x00000010, ABSOLUTE)
In this example:
• For first_fit, the linker first assigns all the sections it can to ER_1, then moves on to ER_2 because
that is the next available region.
• For next_fit, the linker does the same as first_fit. However, when ER_1 is full it is marked as
FULL and is not considered again. In this example, ER_1 is full. ER_2 is then considered.
• For best_fit, the linker assigns sec1 to ER_1. It then has two regions of equal priority and
specificity, but ER_1 has less space remaining. Therefore, the linker assigns sec2 to ER_1, and
continues assigning sections until ER_1 is full.
Execution Region ER_2 (Base: 0x00000200, Size: 0x0000000c, Max: 0x00000010, ABSOLUTE)
The linker first assigns sec1 to ER_1. It then has two regions of equal specificity and priority. It
assigns sec2 to the one with the most free space, ER_2 in this example. The regions now have the same
amount of space remaining, so the linker assigns sec3 to the first one that appears in the scatter file,
that is ER_1.
Note
The behavior of worst_fit is the default behavior in this version of the linker, and it is the only
algorithm available in earlier linker versions.
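The worst_fit behavior described above can be modeled on a host machine. The following C sketch is a simplified illustration, not the armlink implementation: the type any_region_t and the function worst_fit are hypothetical names, and the sketch assumes that all regions are equally specific and have equal priority.

```c
#include <stddef.h>

/* Hypothetical model of an execution region; not an armlink structure. */
typedef struct {
    unsigned max;   /* maximum size of the region */
    unsigned used;  /* bytes already assigned to the region */
} any_region_t;

/* Simplified worst_fit: each section goes to the region with the most free
 * space. Ties go to the region that appears first in the array, which stands
 * in for the order of regions in the scatter file.
 * Returns 0 on success, or -1 if a section fits in no region. */
int worst_fit(any_region_t *regions, size_t nregions,
              const unsigned *sizes, size_t nsections)
{
    for (size_t s = 0; s < nsections; s++) {
        size_t best = nregions;      /* sentinel: no region chosen yet */
        unsigned best_free = 0;
        for (size_t i = 0; i < nregions; i++) {
            unsigned free_space = regions[i].max - regions[i].used;
            if (free_space >= sizes[s] &&
                (best == nregions || free_space > best_free)) {
                best = i;
                best_free = free_space;
            }
        }
        if (best == nregions)
            return -1;
        regions[best].used += sizes[s];
    }
    return 0;
}
```

With two regions of maximum size 0x10 (an assumption for ER_1, because only the ER_2 listing is shown) and six sections of 0x4 bytes, the sections alternate between the regions and each region ends with 0xC bytes used.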
8.10.6 Example of next_fit algorithm showing behavior of full regions, selectors, and priority
This example shows the operation of the next_fit placement algorithm for RO-CODE sections in
sections.o.
The input section properties and ordering are shown in the following table:
Table 8-2 Input section properties for placement of sections with next_fit
Name Size
sec1 0x14
sec2 0x14
sec3 0x10
sec4 0x4
sec5 0x4
sec6 0x4
.ANY1(+RO-CODE)
}
ER_2 0x200 0x20
{
.ANY2(+RO)
}
ER_3 0x300 0x20
{
.ANY3(+RO)
}
}
Note
This example has --any_contingency disabled.
The next_fit algorithm differs from the others in that it never revisits a region that is considered to
be full. This example also shows the interaction between priority and specificity of selectors, which is
the same for all the algorithms.
Execution Region ER_1 (Base: 0x00000100, Size: 0x00000014, Max: 0x00000020, ABSOLUTE)
Execution Region ER_2 (Base: 0x00000200, Size: 0x0000001c, Max: 0x00000020, ABSOLUTE)
Base Addr Size Type Attr Idx E Section Name Object
Execution Region ER_3 (Base: 0x00000300, Size: 0x00000014, Max: 0x00000020, ABSOLUTE)
In this example:
• The linker places sec1 in ER_1 because ER_1 has the most specific selector. ER_1 now has 0xC bytes
remaining.
• The linker then tries to place sec2 in ER_1, because it has the most specific selector, but there is not
enough space. Therefore, ER_1 is marked as full and is not considered in subsequent placement steps.
The linker chooses ER_3 for sec2 because it has higher priority than ER_2.
• The linker then tries to place sec3 in ER_3. It does not fit, so ER_3 is marked as full and the linker
places sec3 in ER_2.
• The linker now processes sec4. This is 0x4 bytes so it can fit in either ER_1 or ER_3. Because both of
these regions have previously been marked as full, they are not considered. The linker places all
remaining sections in ER_2.
• If another section sec7 of size 0x8 exists, and is processed after sec6, the example fails to link. The
algorithm does not attempt to place the section in ER_1 or ER_3 because they have previously been
marked as full.
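The steps in this list can be reproduced with a small host-side model. The following C sketch is an illustration under stated assumptions, not the armlink algorithm itself: nf_region_t and next_fit are hypothetical names, every section is assumed to match every selector (all sections here are RO-CODE), and the caller is assumed to pass the regions already ranked by specificity and then priority, that is ER_1, ER_3, ER_2.

```c
#include <stddef.h>

/* Hypothetical model of an execution region; not an armlink structure. */
typedef struct {
    const char *name;
    unsigned max;   /* maximum size of the region */
    unsigned used;  /* bytes already assigned */
    int full;       /* set once a section fails to fit; never cleared */
} nf_region_t;

/* Simplified next_fit: try the regions in ranked order, skipping any region
 * already marked full. A region that cannot hold the current section is
 * marked full and is never reconsidered.
 * Returns 0 on success, or -1 if a section cannot be placed. */
int next_fit(nf_region_t *regions, size_t nregions,
             const unsigned *sizes, size_t nsections)
{
    for (size_t s = 0; s < nsections; s++) {
        int placed = 0;
        for (size_t i = 0; i < nregions && !placed; i++) {
            if (regions[i].full)
                continue;
            if (regions[i].max - regions[i].used >= sizes[s]) {
                regions[i].used += sizes[s];
                placed = 1;
            } else {
                regions[i].full = 1;
            }
        }
        if (!placed)
            return -1;
    }
    return 0;
}
```

Running the model on sec1 to sec6 leaves 0x14 bytes in ER_1, 0x14 bytes in ER_3, and 0x1C bytes in ER_2, which matches the region sizes in the listing. A further 0x8-byte section then fails, even though ER_1 and ER_3 each have 0xC bytes free.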
The input section properties and ordering are shown in the following table:
Table 8-3 Input section properties and ordering for sections_a.o and sections_b.o
sections_a.o sections_b.o
The following table shows the order that the sections are processed by the .ANY assignment algorithm.
Name Size
seca_4 0x14
secb_4 0x14
seca_3 0x10
secb_3 0x10
seca_1 0x4
seca_2 0x4
secb_1 0x4
secb_2 0x4
With --any_sort_order=descending_size, sections of the same size use the creation index as a
tiebreaker.
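The ordering in the preceding table can be reproduced with a comparison function that sorts by size first and by creation index second. The following C sketch is only a model: any_section_t and sort_descending_size are hypothetical names, and the creation index is assumed to follow the order in which the sections appear on the command line.

```c
#include <stdlib.h>

/* Hypothetical description of an input section. */
typedef struct {
    const char *name;
    unsigned size;
    unsigned index;  /* creation index, assumed to follow command-line order */
} any_section_t;

/* Larger sections first; equal sizes fall back to the lower creation index.
 * The explicit tiebreaker is needed because qsort() is not a stable sort. */
static int by_descending_size(const void *a, const void *b)
{
    const any_section_t *sa = a;
    const any_section_t *sb = b;
    if (sa->size != sb->size)
        return (sa->size < sb->size) ? 1 : -1;
    return (sa->index > sb->index) - (sa->index < sb->index);
}

/* Sorts the sections into the order modeled for descending_size. */
void sort_descending_size(any_section_t *sections, size_t count)
{
    qsort(sections, count, sizeof sections[0], by_descending_size);
}
```

Sorting the eight sections of sections_a.o and sections_b.o this way gives seca_4, secb_4, seca_3, secb_3, seca_1, seca_2, secb_1, secb_2.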
Command-line example
The following linker command-line options are used for this example:
--any_sort_order=cmdline sections_a.o sections_b.o --scatter scatter.txt
The following table shows the order that the sections are processed by the .ANY assignment algorithm.
Name Size
seca_1 0x4
seca_2 0x4
seca_3 0x10
seca_4 0x14
secb_1 0x4
secb_2 0x4
secb_3 0x10
secb_4 0x14
Figure: .ANY sections and image content filling a region, with the remaining free space and the 2% limit marked.
The downward arrows for prospective padding show that the prospective padding continues to grow as
more sections are added to the .ANY selector.
Prospective padding is dealt with before the two percent veneer contingency.
When the prospective padding is cleared, the priority is set to zero. When the two percent is cleared, the
priority is decremented again.
You can also use the ANY_SIZE keyword on an execution region to specify the maximum amount of
space in the region to set aside for .ANY section assignments.
You can use the armlink command-line option --info=any to get extra information on where the linker
has placed sections. This information can be useful when trying to debug problems.
Note
When there is only one .ANY selector, it might not behave identically to *. The algorithms that are used
to determine the size of the section and place data still run with .ANY, and they try to estimate the
impact of changes that might affect the size of sections. These algorithms do not run if * is used
instead. When either .ANY or * is appropriate, you must not use a single .ANY selector that applies to a
kind of data, such as RO, RW, or ZI, for example .ANY (+RO).
You might see error L6407E generated, for example:
Error: L6407E: Sections of aggregate size 0x128 bytes could not fit into .ANY selector(s).
However, increasing the section size by 0x128 bytes does not necessarily lead to a successful link. The
failure to link is because of the extra data, such as region table entries, that might end up in the region
after adding more sections.
Example
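1. Create the following foo.c program:
#include "stdio.h"
/* The declarations of array, gSquared, and sqr() are missing from this
 * listing. The three definitions marked "assumed" are minimal stand-ins
 * so that the program compiles. */
int array[10];                      /* assumed */
int gSquared;                       /* assumed */
int sqr(int n) { return n * n; }    /* assumed */
struct S {
char A[8];
char B[4];
};
struct S s;
struct S* get()
{
return &s;
}
int main(void) {
int i;
for (i=0; i<10; i++) {
array[i]=i*i;
printf("%d\n", array[i]);
}
gSquared=sqr(i);
printf("%d squared is: %d\n", i, gSquared);
return sizeof(array);
}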
2. Create the following scatter.scat file:
LOAD_REGION 0x0 0x3000
{
ER_1 0x0 0x1000 {
.ANY
}
ER_2 (ImageLimit(ER_1)) 0x1500 {
.ANY
}
ER_3 (ImageLimit(ER_2)) 0x500
{
.ANY
}
ER_4 (ImageLimit(ER_3)) 0x1000
{
*(+RW,+ZI)
}
ARM_LIB_STACK 0x800000 EMPTY -0x10000
{
}
ARM_LIB_HEAP +0 EMPTY 0x10000
{
}
}
3. Compile and link the program as follows:
armclang -c --target=arm-arm-none-eabi -mcpu=cortex-m4 -o foo.o foo.c
armlink --cpu=cortex-m4 --any_contingency --scatter=scatter.scat --info=any -o foo.axf
foo.o
==============================================================================
8.11 Placing veneers with a scatter file
Procedure
1. To place veneers at a specific location, include the linker-generated symbol Veneer$$Code in a
scatter file. At most, one execution region in the scatter file can have the *(Veneer$$Code) section
selector.
If it is safe to do so, the linker places veneer input sections into the region identified by the
*(Veneer$$Code) section selector. It might not be possible for a veneer input section to be assigned
to the region because of address range problems or execution region size limitations. If the veneer
cannot be added to the specified region, it is added to the execution region containing the relocated
input section that generated the veneer.
Note
Instances of *(IWV$$Code) in scatter files from earlier versions of Arm tools are automatically
translated into *(Veneer$$Code). Use *(Veneer$$Code) in new descriptions.
*(Veneer$$Code) is ignored when the amount of code in an execution region exceeds 4MB of 16-bit
T32 code, 16MB of 32-bit T32 code, or 32MB of A32 code.
Note
There are no state-change veneers in A64.
8.12 Preprocessing a scatter file
You can:
• Add preprocessing directives to the top of the scatter file.
• Use simple expression evaluation in the scatter file.
For example, a scatter file, file.scat, might contain:
#! armclang --target=arm-arm-none-eabi -march=armv8-a -E -x c
#define ADDRESS 0x20000000
#include "include_file_1.h"
LR1 ADDRESS
{
…
}
The linker parses the preprocessed scatter file and treats the directives as comments.
You can also use the --predefine command-line option to assign values to constants. For this example:
1. Modify file.scat to delete the directive #define ADDRESS 0x20000000.
2. Specify the command:
armlink --predefine="-DADDRESS=0x20000000" --scatter=file.scat
This section contains the following subsections:
• 8.12.1 Default behavior for armclang -E in a scatter file.
• 8.12.2 Use of other preprocessors in a scatter file.
On Windows, .exe suffixes are handled, so armclang.exe is considered the same as armclang.
Executable names are case insensitive, so ARMCLANG is considered the same as armclang. The portable
way to write scatter file preprocessing lines is to use correct capitalization and omit the .exe suffix.
This means:
• The string must be correctly quoted for the host system. The portable way to do this is to use double-
quotes.
• Single quotes and escaped characters are not supported and might not function correctly.
• The use of a double-quote character in a path name is not supported and might not work.
These rules also apply to any strings passed with the --predefine option.
All preprocessor executables must accept the -o file option to mean output to file and accept the input
as a filename argument on the command line. These options are automatically added to the user
command line by armlink. Any options to redirect preprocessing output in the user-specified command
line are not supported.
8.13 Reserving an empty block of memory
Note
The dummy ZI region that is created for an EMPTY execution region is not initialized to zero at runtime.
If the address is in relative (+offset) form and the length is negative, the linker generates an error.
The following figure shows a diagrammatic representation for this example.
Figure: the heap occupies 0x800000 (Base) to 0x810000 (Limit), and the stack occupies 0x7F0000 (Base) to 0x800000 (Limit).
Note
The EMPTY attribute applies only to an execution region. The linker generates a warning and ignores an
EMPTY attribute that is used in a load region definition.
The linker checks that the address space used for the EMPTY region does not overlap any other execution
region.
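The base and limit addresses of an EMPTY region follow from its address and signed length. The following C sketch models that calculation for the stack and heap values used in this chapter (0x800000 with lengths -0x10000 and 0x10000). The names empty_region_t and empty_region are hypothetical, and the sketch assumes that a negative length makes the given address the limit of the region.

```c
#include <stdint.h>

/* Hypothetical result type: the address range reserved by an EMPTY region. */
typedef struct {
    uint32_t base;
    uint32_t limit;
} empty_region_t;

/* Computes the range reserved by "address EMPTY length".
 * A positive length grows upward from the address.
 * A negative length is assumed to end at the address and grow downward. */
empty_region_t empty_region(uint32_t address, int64_t length)
{
    empty_region_t r;
    if (length < 0) {
        r.base = (uint32_t)((int64_t)address + length);
        r.limit = address;
    } else {
        r.base = address;
        r.limit = (uint32_t)((int64_t)address + length);
    }
    return r;
}
```

For example, empty_region(0x800000, -0x10000) describes a stack from 0x7F0000 to 0x800000, and empty_region(0x800000, 0x10000) describes a heap from 0x800000 to 0x810000.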
8.14 Alignment of regions to page boundaries
Note
• Alignment on an execution region causes both the load address and execution address to be aligned.
• The default page size is 0x8000. To change the page size, specify the --pagesize linker command-
line option.
To produce an ELF file with each execution region starting on a new page, and with code starting on the
next page boundary after the header information:
LR1 0x0 + SizeOfHeaders()
{
ER_RO +0
{
*(+RO)
}
ER_RW AlignExpr(+0, GetPageSize())
{
*(+RW)
}
ER_ZI AlignExpr(+0, GetPageSize())
{
*(+ZI)
}
}
If you set up your ELF file in this way, then you can memory-map it onto an operating system in such a
way that:
• RO and RW data can be given different memory protections, because they are placed in separate
pages.
• The load address that each part of the image expects to run at is related to its offset in the ELF
file, because SizeOfHeaders() is specified for the first load region.
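The effect of AlignExpr(+0, GetPageSize()) in the scatter file above can be pictured as rounding an address up to the next page boundary. The following C sketch is a model of that rounding only. The function name align_up is hypothetical, and the sketch assumes that the alignment is a power of two, as is the case for the default page size of 0x8000.

```c
#include <stdint.h>

/* Rounds an address up to the next multiple of alignment.
 * The alignment is assumed to be a non-zero power of two.
 * An address that is already aligned is returned unchanged. */
uint32_t align_up(uint32_t address, uint32_t alignment)
{
    return (address + alignment - 1u) & ~(alignment - 1u);
}
```

With a page size of 0x8000, a region that would otherwise start at 0x8001 starts at 0x10000 instead, while a region that would start at exactly 0x8000 does not move.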
8.15 Alignment of execution regions and input sections
ALIGNALL
Increases the section alignment of all the sections in an execution region, for example:
ER_DATA … ALIGNALL 8
{
… ;selectors
}
OVERALIGN
Increases the alignment of a specific section, for example:
ER_DATA …
{
*.o(.bar, OVERALIGN 8)
… ;selectors
}
Note
armlink does not OVERALIGN some sections where it might be unsafe to do so. For
more information, see Syntax of an input section description.
Chapter 9
Overlays
Describes the Arm Compiler support for overlays to enable you to have multiple load regions at the same
address.
9.1 Overlay support in Arm® Compiler
9.2 Automatic overlay support
Procedure
1. Declare the functions that you want the armlink automatic overlay mechanism to process.
• In C, use a function attribute, for example:
__attribute__((section(".ARM.overlay1"))) void foo(void) { ... }
• In armclang integrated assembler (GNU) syntax, use the .section directive, for example:
.section .ARM.overlay2,"ax",%progbits
.globl bar
.p2align 2
.type bar,%function
bar: @ @bar
...
.fnend
• In armasm assembler syntax, use the AREA directive, for example:
AREA |.ARM.overlay1|,CODE
foo PROC
...
ENDP
AREA |.ARM.overlay2|,CODE
bar PROC
...
ENDP
Note
You can only overlay code sections. Data sections must never be overlaid.
2. Specify the locations to load the code sections from and to in a scatter file. Use the AUTO_OVERLAY
keyword on one or more execution regions.
The execution regions must not have any section selectors. For example:
OVERLAY_LOAD_REGION 0x10000000
{
OVERLAY_EXECUTE_REGION_A 0x20000000 AUTO_OVERLAY 0x10000 { }
OVERLAY_EXECUTE_REGION_B 0x20010000 AUTO_OVERLAY 0x10000 { }
}
In this example, armlink emits a program header table entry that loads all the overlay data starting at
address 0x10000000. Also, each overlay is relocated so that it runs correctly if copied to address
0x20000000 or 0x20010000. armlink chooses one of these addresses for each overlay.
3. When linking, specify the --overlay_veneers command-line option. This option causes armlink to
arrange function calls between two overlays, or between non-overlaid code and an overlay, to be
diverted through the entry point of an overlay manager.
To permit an overlay-aware debugger to track the overlay that is active, specify the armlink
--emit_debug_overlay_section command-line option.
Related information
__attribute__((section("name"))) function attribute
AREA
Execution region attributes
--emit_debug_overlay_section linker option
--overlay_veneers linker option
information about the target function and the overlay that it is in, and transfers control to the overlay
manager entry point. The overlay manager must then:
• Ensure that the correct overlay is loaded and then transfer control to the target function.
• Restore the stack and registers to the state they were left in by the original BL instruction.
• If the function call originated inside an overlay, make sure that returning from the called function
reloads the overlay being returned to.
Related information
--overlay_veneers linker option
Region$$Count$$AutoOverlay
This symbol points to a single 16-bit integer (an unsigned short) giving the total number of
overlay regions. That is, the number of entries in the arrays Region$$Table$$AutoOverlay and
CurrLoad$$Table$$AutoOverlay.
Overlay$$Map$$AutoOverlay
This symbol points to an array containing a 16-bit integer (an unsigned short) per overlay. For
each overlay, this table indicates which overlay region the overlay expects to be loaded into to
run correctly.
Size$$Table$$AutoOverlay
This symbol points to an array containing a 32-bit word per overlay. For each overlay, this table
gives the exact size of the data for the overlay. This size might be less than the size of its
containing overlay region, because overlays typically do not fill their regions exactly.
In addition to the read-only tables, armlink also provides one piece of read/write memory:
CurrLoad$$Table$$AutoOverlay
This symbol points to an array containing a 16-bit integer (an unsigned short) for each overlay
region. The array is intended for the overlay manager to store the identifier of the currently
loaded overlay in each region. The overlay manager can then avoid reloading an already-loaded
overlay.
All these data tables are optional. If your code does not refer to any particular table, then it is omitted
from the image.
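To illustrate how an overlay manager might use these tables, the following C sketch models them with ordinary arrays on a host machine. Everything in it is a stand-in: overlay_map, size_table, and curr_load play the roles of Overlay$$Map$$AutoOverlay, Size$$Table$$AutoOverlay, and CurrLoad$$Table$$AutoOverlay, the storage and region_memory arrays are invented test data, and the NOT_LOADED sentinel and zero-based overlay IDs are assumptions rather than documented armlink behavior.

```c
#include <stdint.h>
#include <string.h>

#define NUM_OVERLAYS 3
#define NUM_REGIONS  2
#define NOT_LOADED   0xFFFFu   /* hypothetical "nothing loaded" marker */

/* Stand-in for Overlay$$Map$$AutoOverlay: region expected by each overlay. */
const uint16_t overlay_map[NUM_OVERLAYS] = {0, 1, 0};
/* Stand-in for Size$$Table$$AutoOverlay: exact size of each overlay. */
const uint32_t size_table[NUM_OVERLAYS] = {4, 8, 6};
/* Stand-in for CurrLoad$$Table$$AutoOverlay: overlay loaded in each region. */
uint16_t curr_load[NUM_REGIONS] = {NOT_LOADED, NOT_LOADED};

/* Invented backing store and execution memory for the host model. */
const uint8_t storage[NUM_OVERLAYS][8] = {
    {1, 1, 1, 1}, {2, 2, 2, 2, 2, 2, 2, 2}, {3, 3, 3, 3, 3, 3}
};
uint8_t region_memory[NUM_REGIONS][8];

/* Loads an overlay into its region unless it is already there.
 * Returns 1 if a copy was performed, or 0 if the overlay was already loaded. */
int ensure_loaded(uint16_t overlay_id)
{
    uint16_t region = overlay_map[overlay_id];
    if (curr_load[region] == overlay_id)
        return 0;
    memcpy(region_memory[region], storage[overlay_id], size_table[overlay_id]);
    curr_load[region] = overlay_id;
    return 1;
}
```

In this model, overlays 0 and 2 share region 0, so loading one evicts the other, while a repeated request for the overlay that is already present skips the copy.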
Related concepts
9.2 Automatic overlay support on page 9-167
void non_overlaid(void)
{
innermost();
}
int main(void)
{
// Call the overlaid function call_via_ptr() and pass it a pointer
// to non_overlaid(). non_overlaid() then calls the function
// innermost() in another overlay. If call_via_ptr() and innermost()
// are allocated to the same overlay region by the linker, then there
// is no way for call_via_ptr to have been reloaded by the time control
// has to return to it from non_overlaid().
call_via_ptr(non_overlaid);
}
Related concepts
9.2 Automatic overlay support on page 9-167
The armlink --info=auto_overlay option causes the linker to write out a text summary of the
overlays in the image it outputs. The summary consists of the integer ID, start address, and size of each
overlay. You can use this information to extract the overlays from the image, for example from the
fromelf --bin output. You can then put them in a separate peripheral storage system. Therefore, you
still know which chunk of data goes with which overlay ID when you have to load one of them in the
overlay manager.
Related concepts
9.2 Automatic overlay support on page 9-167
Related information
--info linker option
Related information
__attribute__((section("name"))) function attribute
AREA
Execution region attributes
--emit_debug_overlay_section linker option
--overlay_veneers linker option
9.3 Manual overlay support
The C library at startup does not initialize a region that is marked as OVERLAY. The contents of the
memory that is used by the overlay region are the responsibility of an overlay manager. If the region
contains initialized data, use the NOCOMPRESS attribute to prevent RW data compression.
You can use the linker defined symbols to obtain the addresses that are required to copy the code and
data.
You can use the OVERLAY attribute on a single region that is not at the same address as a different region.
Therefore, you can use an overlay region as a method to prevent the initialization of particular regions by
the C library startup code. As with any overlay region, you must manually initialize them in your code.
An overlay region can have a relative base. The behavior of an overlay region with a +offset base
address depends on the regions that precede it and the value of +offset. If they have the same +offset
value, the linker places consecutive +offset regions at the same base address.
When a +offset execution region ER follows a contiguous overlapping block of overlay execution
regions, the base address of ER is:
limit address of the overlapping block of overlay execution regions + offset
The following table shows the effect of +offset when used with the OVERLAY attribute. REGION1 appears
immediately before REGION2 in the scatter file:
The following example shows the use of relative offsets with overlays and the effect on execution region
addresses:
EMB_APP 0x8000
{
CODE 0x8000
{
*(+RO)
}
# REGION1 Base = CODE limit
REGION1 +0 OVERLAY
{
module1.o(*)
}
# REGION2 Base = REGION1 Base
REGION2 +0 OVERLAY
{
module2.o(*)
}
# REGION3 Base = REGION2 Base = REGION1 Base
REGION3 +0 OVERLAY
{
module3.o(*)
}
# REGION4 Base = REGION3 Limit + 4
REGION4 +4 OVERLAY
{
module4.o(*)
}
}
If the length of the non-overlay area is unknown, you can use a zero relative offset to specify the start
address of an overlay so that it is placed immediately after the end of the static section.
Related information
Load region descriptions
Load region attributes
Inheritance rules for load region address attributes
Considerations when using a relative address +offset for a load region
Considerations when using a relative address +offset for execution regions
--emit_debug_overlay_relocs linker option
--emit_debug_overlay_section linker option
ABI for the Arm Architecture: Support for Debugging Overlaid Programs
The overlay manager must ensure that the correct overlay segment is loaded before calling any function
in that segment. If a function from one overlay is called while a different overlay is loaded, then some
kind of runtime failure occurs. If such a failure is a possibility, the linker and compiler do not warn you
because it is not statically determinable. The same is true for a data overlay.
The central component of this overlay manager is a routine to copy code and data from the load address
to the execution address. This routine is based around the following linker defined symbols:
• Load$$execution_region_name$$Base, the load address.
• Image$$execution_region_name$$Base, the execution address.
• Image$$execution_region_name$$Length, the length of the execution region.
The implementation of the overlay manager depends on the system requirements. This procedure shows
a simple method of implementing an overlay manager. The downloadable example contains a
Readme.txt file that describes details of each source file.
The copy routine that is called load_overlay() is implemented in overlay_manager.c. The routine
uses memcpy() and memset() functions to copy CODE and RW data overlays, and to clear ZI data
overlays.
Note
For RW data overlays, it is necessary to disable RW data compression for the whole project. You can
disable compression with the linker command-line option --datacompressor off, or you can mark the
execution region with the attribute NOCOMPRESS.
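The copy routine described above can be tried out on a host machine before moving to target hardware. The following C sketch is a simplified model, not the downloadable example: the arrays code_exec, data_exec, code_load, and data_load are invented stand-ins for the addresses that the linker-defined symbols provide, the field order of overlay_region_t follows the order of the DCD entries in overlay_list.s, and the RW data length is assumed to be the distance from the data base to the ZI base. Unlike the original routine, this version returns a status code instead of printing and calling exit().

```c
#include <string.h>

#define NUM_OVERLAYS 2

/* One table entry per overlay, in the order used by overlay_list.s. */
typedef struct {
    const void *load_ro_base;  /* role of Load$$CODE_x$$Base */
    const void *load_rw_base;  /* role of Load$$DATA_x$$Base */
    void *exec_zi_base;        /* role of Image$$DATA_x$$ZI$$Base */
    unsigned int ro_length;    /* role of Image$$CODE_x$$Length */
    unsigned int zi_length;    /* role of Image$$DATA_x$$ZI$$Length */
} overlay_region_t;

/* Invented host-side memory: shared execution areas and load images. */
unsigned char code_exec[16];
unsigned char data_exec[16];
const unsigned char code_load[NUM_OVERLAYS][4] = { {1, 1, 1, 1}, {2, 2, 2, 2} };
const unsigned char data_load[NUM_OVERLAYS][2] = { {0xA1, 0xA2}, {0xB1, 0xB2} };

const overlay_region_t overlay_regions[NUM_OVERLAYS] = {
    { code_load[0], data_load[0], &data_exec[2], 4, 6 },
    { code_load[1], data_load[1], &data_exec[2], 4, 6 },
};

int current_overlay = 0;

/* Returns 1 if the overlay was copied, 0 if it was already loaded,
 * or -1 if the overlay number is out of range. */
int load_overlay(int n)
{
    const overlay_region_t *selected;
    if (n < 1 || n > NUM_OVERLAYS)
        return -1;
    if (n == current_overlay)
        return 0;
    selected = &overlay_regions[n - 1];
    /* Copy the code overlay to the shared code execution address. */
    memcpy(code_exec, selected->load_ro_base, selected->ro_length);
    /* Copy the RW data; the length is assumed to run up to the ZI base. */
    memcpy(data_exec, selected->load_rw_base,
           (size_t)((unsigned char *)selected->exec_zi_base - data_exec));
    /* Clear the ZI data of the overlay. */
    memset(selected->exec_zi_base, 0, selected->zi_length);
    current_overlay = n;
    return 1;
}
```

Calling load_overlay(1) twice copies the overlay only once, and a later load_overlay(2) replaces both the code and the RW data in the shared execution areas.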
The assembly file overlay_list.s lists all the required symbols. This file defines and exports two
common base addresses and a RAM space that is mapped to the overlay structure table:
code_base
data_base
overlay_regions
As specified in the scatter file, the two functions, func1() and func2(), and their corresponding data are
placed in CODE_ONE, CODE_TWO, DATA_ONE, DATA_TWO regions, respectively. armlink has a special
mechanism for replacing calls to functions with stubs. To use this mechanism, write a small stub for each
function in the overlay that might be called from outside the overlay.
In this example, two stub functions $Sub$$func1() and $Sub$$func2() are created for the two
functions func1() and func2() in overlay_stubs.c. These stubs call the overlay-loading function
load_overlay() to load the corresponding overlay. After the overlay manager finishes its overlay
loading task, the stub function can then call $Super$$func1 to call the loaded function func1() in the
overlay.
Procedure
1. Create the overlay_manager.c program to copy the correct overlay to the runtime addresses.
// overlay_manager.c
/* Basic overlay manager */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int current_overlay = 0;
void load_overlay(int n)
{
const overlay_region_t * selected_region;
if(n == current_overlay)
{
printf("Overlay %d already loaded.\n", n);
return;
}
/* boundary check */
if(n<1 || n>NUM_OVERLAYS)
{
printf("Error - invalid overlay number %d specified\n", n);
exit(1);
}
/* Comment out the next line if your overlays have any static ZI variables
* and should not be reinitialized each time, and move them out of the
* overlay region in your scatter file */
memset(selected_region->exec_zi_base, 0, selected_region->zi_length);
printf("...Done.\n");
2. Create a separate source file for each of the functions func1() and func2().
// func1.c
#include <stdio.h>
#include <stdlib.h>
void func1(void)
{
unsigned int i;
printf("%s\n", func1_string);
for(i = 19; i; i--)
{
func1_values[i] = rand();
foo(i);
printf("%d ", func1_values[i]);
}
printf("\n");
}
// func2.c
#include <stdio.h>
void func2(void)
{
printf("%s\n", func2_string);
foo(func2_values[9]);
}
int main(void)
{
printf("Start of main()...\n");
func1();
func2();
/*
* Call func2() again to demonstrate that we don't need to
* reload the overlay
*/
func2();
func1();
printf("End of main()...\n");
return 0;
}
void foo(int x)
{
return;
}
4. Create overlay_stubs.c to provide two stub functions $Sub$$func1() and $Sub$$func2() for the
two functions func1() and func2().
// overlay_stubs.c
extern void $Super$$func1(void);
extern void $Super$$func2(void);
void $Sub$$func1(void)
{
load_overlay(1);
$Super$$func1();
}
void $Sub$$func2(void)
{
load_overlay(2);
$Super$$func2();
}
IMPORT ||Load$$CODE_ONE$$Base||
IMPORT ||Load$$CODE_TWO$$Base||
IMPORT ||Load$$DATA_ONE$$Base||
IMPORT ||Load$$DATA_TWO$$Base||
IMPORT ||Image$$CODE_ONE$$Base||
IMPORT ||Image$$DATA_ONE$$Base||
IMPORT ||Image$$DATA_ONE$$ZI$$Base||
IMPORT ||Image$$DATA_TWO$$ZI$$Base||
IMPORT ||Image$$CODE_ONE$$Length||
IMPORT ||Image$$CODE_TWO$$Length||
IMPORT ||Image$$DATA_ONE$$ZI$$Length||
IMPORT ||Image$$DATA_TWO$$ZI$$Length||
; Symbols to export
EXPORT code_base
EXPORT data_base
EXPORT overlay_regions
overlay_regions
; overlay 1
DCD ||Load$$CODE_ONE$$Base||
DCD ||Load$$DATA_ONE$$Base||
DCD ||Image$$DATA_ONE$$ZI$$Base||
DCD ||Image$$CODE_ONE$$Length||
DCD ||Image$$DATA_ONE$$ZI$$Length||
; overlay 2
DCD ||Load$$CODE_TWO$$Base||
DCD ||Load$$DATA_TWO$$Base||
DCD ||Image$$DATA_TWO$$ZI$$Base||
DCD ||Image$$CODE_TWO$$Length||
DCD ||Image$$DATA_TWO$$ZI$$Length||
END
return config;
}
RAM_EXEC 0x10000
{
* (+RW, +ZI)
}
Related concepts
9.3 Manual overlay support on page 9-172
Related information
Use of $Super$$ and $Sub$$ to patch symbol definitions
Related concepts
9.1 Overlay support in Arm® Compiler on page 9-166
Related information
Execution region attributes
--emit_debug_overlay_relocs linker option
--emit_debug_overlay_section linker option
Chapter 10
Embedded Software Development
Describes how to develop embedded applications with Arm Compiler, with or without a target system
present.
It contains the following sections:
• 10.1 About embedded software development.
• 10.2 Default compilation tool behavior.
• 10.3 C library structure.
• 10.4 Default memory map.
• 10.5 Application startup.
• 10.6 Tailoring the C library to your target hardware.
• 10.7 Reimplementing C library functions.
• 10.8 Tailoring the image memory map to your target hardware.
• 10.9 About the scatter-loading description syntax.
• 10.10 Root regions.
• 10.11 Placing the stack and heap.
• 10.12 Run-time memory models.
• 10.13 Reset and initialization.
• 10.14 The vector table.
• 10.15 ROM and RAM remapping.
• 10.16 Local memory setup considerations.
• 10.17 Stack pointer initialization.
• 10.18 Hardware initialization.
• 10.19 Execution mode considerations.
• 10.20 Target hardware and the memory map.
• 10.21 Execute-only memory.
• 10.22 Building applications for execute-only memory.
• 10.23 Vector table for ARMv6 and earlier, ARMv7-A and ARMv7-R profiles.
• 10.24 Vector table for M-profile architectures.
• 10.25 Vector Table Offset Register.
• 10.26 Integer division-by-zero errors in C code.
10.1 About embedded software development
10.2 Default compilation tool behavior
10.3 C library structure
For example, the following figure shows the C library implementing the function printf() by writing to
the debugger console window. This implementation is provided by calling _sys_write(), a support
function that executes a semihosting call, resulting in the default behavior using the debugger instead of
target peripherals.
[Figure: C library structure. Your application calls ISO C functions, for example, printf(). The C
library provides input/output, error handling, stack and heap setup, and other functions. At the
device driver level, these functions use semihosting, for example, _sys_write(). Semihosting support
is implemented by the debug agent in the debugging environment.]
10.4 Default memory map
Figure 10-2 Default memory map
[Figure: starting at 0x8000 and moving up in memory, the RO, RW, and ZI regions, followed by the
HEAP, whose base is calculated by the linker.]
Note
Processors that are based on Armv6‑M and Armv7‑M architectures have fixed memory maps. Having
fixed memory maps makes porting software easier between different systems that are based on these
processors.
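[Figure: Linker placement of input sections. The figure shows RO (CODE and DATA), RW, and ZI input
sections, including section A from file1.o, section A from file2.o, and a section B.]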
Generally, the linker sorts the Input sections by attribute (RO, RW, ZI), by name, and then by position in
the input list.
To fully control the placement of code and data, you must use the scatter-loading mechanism.
Related concepts
10.6 Tailoring the C library to your target hardware on page 10-187
Related information
The image structure
Section placement with the linker
About scatter-loading
Scatter file syntax
Cortex-M1 Technical Reference Manual
Cortex-M3 Technical Reference Manual
Semihosting for AArch32 and AArch64
10.5 Application startup
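[Figure: Default initialization sequence. The image entry point is __main, which copies code and
data, copies or decompresses RW data, and initializes ZI data to zeros. __rt_entry then sets up the
application stack and heap, initializes library functions, and calls top-level constructors (C++).
Finally, main() is called; its presence causes the linker to link in library initialization code.]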
__main is responsible for setting up the memory and __rt_entry is responsible for setting up the run-
time environment.
__main performs code and data copying, decompression, and zero initialization of the ZI data. It then
branches to __rt_entry to set up the stack and heap, initialize the library functions and static data, and
call any top level C++ constructors. __rt_entry then branches to main(), the entry to your application.
When the main application has finished executing, __rt_entry shuts down the library, then hands
control back to the debugger.
The function label main() has a special significance. The presence of a main() function forces the linker
to link in the initialization code in __main and __rt_entry. Without a function labeled main(), the
initialization sequence is not linked in, and as a result, some standard C library functionality is not
supported.
Related information
--startup=symbol, --no_startup (armlink)
Arm Compiler C Library Startup and Initialization
10.6 Tailoring the C library to your target hardware
[Figure: Retargeting the C library. The target-independent parts of the C library are used
unchanged. The target-dependent parts are retargeted by user code.]
For example, you have a peripheral I/O device, such as an LCD screen, and want to override the library
implementation of fputc(), which writes to the debugger console, with one that prints to the LCD.
Because this implementation of fputc() is linked in to the final image, the entire printf() family of
functions prints to the LCD.
In a standalone application, you are unlikely to support semihosting operations. Therefore, you must
remove all calls to target-dependent C library functions or re-implement them with non-semihosting
functions.
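The LCD example above can be sketched as follows. This is a minimal illustration, not the library's implementation: lcd_putc(), lcd_contents(), LCD_COLUMNS, and the RAM buffer that stands in for the display are hypothetical names introduced here, and a real retarget usually needs further support definitions that are described in the library documentation.

```c
#include <stdio.h>

/* Hypothetical LCD model: a RAM buffer stands in for the real device. */
#define LCD_COLUMNS 16
static char lcd_line[LCD_COLUMNS + 1];
static unsigned int lcd_pos;

/* Hypothetical driver call; a real driver would write to the LCD controller. */
static void lcd_putc(char c)
{
    if (c == '\n' || lcd_pos >= LCD_COLUMNS) {
        lcd_pos = 0;                /* return to the start of the line */
        if (c == '\n') {
            return;
        }
    }
    lcd_line[lcd_pos++] = c;
    lcd_line[lcd_pos] = '\0';
}

/* Overrides the library fputc(); the stream argument is ignored in this sketch. */
int fputc(int ch, FILE *stream)
{
    (void)stream;
    lcd_putc((char)ch);
    return (unsigned char)ch;
}

/* Accessor used only to inspect the modelled display contents. */
const char *lcd_contents(void)
{
    return lcd_line;
}
```

Because the override is the only fputc() in the image, any library function that is built on fputc() ends up in lcd_putc() without further changes.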
Related information
Using the libraries in a nonsemihosting environment
10.7 Reimplementing C library functions
Use armclang and armar to create a library from your reimplemented printf() function:
armclang --target=arm-arm-none-eabi -c -O2 -march=armv7-a -mfpu=none mylib.c -o mylib.o
armar --create mylib.a mylib.o
void foo(void)
{
printf("Hello, world!\n");
}
Use armclang to build the example application source file using the -nostdlib, -nostdlibinc and
-fno-builtin options. Then use armlink to link the example reimplemented library using the
--no_scanlib option.
If you do not use the -fno-builtin option, then the compiler transforms the printf() function to the
puts() function, and the linker generates an error because it cannot find the puts() function in the
reimplemented library.
armclang --target=arm-arm-none-eabi -c -O2 -march=armv7-a -mfpu=none -nostdlib -nostdlibinc -fno-builtin foo.c -o foo.o
armlink foo.o mylib.a -o image.axf --no_scanlib
Note
If the linker sees a definition of main(), it automatically creates a reference to a startup symbol called
__main. The Arm standard C library defines __main to provide startup code. If you use your own library
instead of the Arm standard C library, then you must provide your implementation of __main or change
the startup symbol using the linker --startup option.
Related concepts
10.3 C library structure on page 10-183
Related information
--startup (armlink)
Run-time ABI for the Arm Architecture
C Library ABI for the Arm Architecture
10.8 Tailoring the image memory map to your target hardware
Related information
Information about scatter files
--scatter=filename (armlink)
Armv7‑M Architecture Reference Manual
Armv6‑M Architecture Reference Manual
Semihosting for AArch32 and AArch64
10.9 About the scatter-loading description syntax
10.10 Root regions
Because these sections are defined as read-only, they are grouped by the * (+RO) wildcard syntax. As a
result, if * (+RO) is specified in a non-root region, these sections must be explicitly declared in a root
region using InRoot$$Sections.
Note
All eXecute In Place (XIP) code must be stored in root regions.
Related information
About placing Arm C and C++ library code
10.11 Placing the stack and heap
Related concepts
10.12 Run-time memory models on page 10-195
Related information
Tailoring the C library to a new execution environment
Specifying stack and heap using the scatter file
10.12 Run-time memory models
One-region model
The application stack and heap grow towards each other in the same region of memory, as the
following figure shows. In this run-time memory model, the heap is checked against the value of the
stack pointer when new heap space is allocated, for example, when malloc() is called.
[Figure: One-region model. STACK and HEAP share one region of memory, with the stack base at
0x40000.]
Two-region model
The stack and heap are placed in separate regions of memory, see the following figure. For example, you
might have a small block of fast RAM that you want to reserve for stack use only. For a two-region
model, you must import __use_two_region_memory.
In this run-time memory model, the heap is checked against the heap limit when new heap space is
allocated.
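[Figure: Two-region model. The HEAP occupies its own region, from the heap base at 0x28000000 to the
heap limit at 0x28080000. The STACK is in a separate region, with the stack base at 0x40000.]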
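The difference between the two heap checks can be pictured with a small model. The following sketch is not the library's allocator: heap_can_grow() and its parameters are hypothetical names used only to show which bound each model compares against. In the one-region model the bound is the current stack pointer; in the two-region model it is the heap limit.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if a heap whose current top is heap_top can grow by size
 * bytes without crossing bound. The comparison is written so that it
 * cannot overflow.
 */
bool heap_can_grow(uintptr_t heap_top, uintptr_t size, uintptr_t bound)
{
    return heap_top <= bound && size <= bound - heap_top;
}
```

With the addresses from the two-region figure, a request of 0x80000 bytes from an empty heap at 0x28000000 just fits below the limit at 0x28080000, and a request one byte larger does not.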
10.13 Reset and initialization
[Figure: Initialization sequence. __main copies or decompresses RW data and initializes ZI data to
zeros. __user_setup_stackheap() sets up the application stack and heap. __rt_entry initializes
library functions and calls top-level constructors (C++). $Sub$$main() enables caches and
interrupts. main() follows; its presence causes the linker to link in library initialization code.
The sequence ends with the exit from the application.]
If you use a scatter file to tailor stack and heap placement, the linker includes a version of the library
heap and stack setup code using the linker defined symbols, ARM_LIB_*, for these region names.
Alternatively you can create your own implementation.
The reset handler is normally a short module coded in assembler that executes immediately on system
startup. As a minimum, your reset handler initializes stack pointers for the modes that your application is
running in. For processors with local memory systems, such as caches, TCMs, MMUs, and MPUs, some
configuration must be done at this stage in the initialization process. After executing, the reset handler
typically branches to __main to begin the C library initialization sequence.
There are some components of system initialization, for example, the enabling of interrupts, that are
generally performed after the C library initialization code has finished executing. The block of code
labeled $Sub$$main() performs these tasks immediately before the main application begins executing.
Related information
About using $Super$$ and $Sub$$ to patch symbol definitions
Specifying stack and heap using the scatter file
10.14 The vector table
The vector table for the microcontroller profiles differs significantly from that of most other Arm
architectures.
Related concepts
10.23 Vector table for ARMv6 and earlier, ARMv7-A and ARMv7-R profiles on page 10-207
10.24 Vector table for M-profile architectures on page 10-208
Related information
Information about scatter files
Scatter-loading images with a simple memory map
10.15 ROM and RAM remapping
Note
This information does not apply to Armv6‑M, Armv7‑M, and Armv8‑M profiles.
Note
This information assumes that an Arm processor begins fetching instructions at 0x0. This is the standard
behavior for systems based on Arm processors. However, some Arm processors, for example the
processors based on the Armv7‑A architecture, can be configured to begin fetching instructions from
0xFFFF0000.
There has to be a valid instruction at 0x0 at startup, so you must have nonvolatile memory located at 0x0
at the moment of power-on reset. One way to achieve this is to have ROM located at 0x0. However, there
are some drawbacks to this configuration.
10.16 Local memory setup considerations
Tightly Coupled Memories (TCM) must also be enabled before branching to __main, normally before
MMU/MPU setup, because you generally want to scatter-load code and data into TCMs. You must be
careful that you do not have to access memory that is masked by the TCMs when they are enabled.
You might also encounter problems with cache coherency if caches are enabled before branching to
__main. Code in __main copies code regions from their load address to their execution address,
essentially treating instructions as data. As a result, some instructions can be cached in the data cache, in
which case they are not visible to the instruction path.
To avoid these coherency problems, enable caches after the C library initialization sequence finishes
executing.
Related information
Cortex-A Series Programmer's Guide for Armv8-A
Cortex-A Series Programmer's Guide for Armv7-A
Cortex-R Series Programmer's Guide for Armv7-R
10.17 Stack pointer initialization
The stack_base symbol can be a hard-coded address, or it can be defined in a separate assembler source
file and located by a scatter file.
The example allocates 256 bytes of stack for Fast Interrupt Request (FIQ) and Interrupt Request (IRQ)
mode, but you can do the same for any other execution mode. To set up the stack pointers, enter each
mode with interrupts disabled, and assign the appropriate value to the stack pointer.
The stack pointer value set up in the reset handler is automatically passed as a parameter to
__user_initial_stackheap() by C library initialization code. Therefore, this value must not be
modified by __user_initial_stackheap().
Related information
Specifying stack and heap using the scatter file
Cortex-M3 Embedded Software Development
10.18 Hardware initialization
The linker replaces the function call to main() with a call to $Sub$$main(). From there you can call a
routine that enables caches and another to enable interrupts.
The code branches to the real main() by calling $Super$$main().
Related information
Use of $Super$$ and $Sub$$ to patch symbol definitions
10.19 Execution mode considerations
Note
This does not apply to Armv6‑M, Armv7‑M, and Armv8‑M profiles.
Much of the functionality that you are likely to implement at startup, both in the reset handler and
$Sub$$main, can only be done while executing in privileged modes, for example, on-chip memory
manipulation, and enabling interrupts.
If you want to run your application in a privileged mode, this is not an issue. Ensure that you change to
the appropriate mode before exiting your reset handler.
If you want to run your application in User mode, however, you can only change to User mode after
completing the necessary tasks in a privileged mode. The most likely place to do this is in
$Sub$$main().
Note
The C library initialization code must use the same stack as the application. If you need to use a non-
User mode in $Sub$$main and User mode in the application, you must exit your reset handler in System
mode, which uses the User mode stack pointer.
10.20 Target hardware and the memory map
Note
You can also use __attribute__((section(".ARM.__at_address"))) to specify the absolute address
of a variable.
It is important that the contents of these registers are not zero-initialized during application startup,
because this is likely to change the state of your system. Marking an execution region with the UNINIT
attribute prevents ZI data in that region from being zero-initialized by __main.
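As a minimal sketch of the note above, the following declaration places a variable at an absolute address with the section attribute. The variable name, the accessor, and the address 0x5000 are assumptions chosen for illustration; substitute an address that is valid in your own memory map.

```c
#include <stdint.h>

/* Assumption: 0x5000 is a free, suitably aligned address on the target. */
const uint32_t board_revision
    __attribute__((section(".ARM.__at_0x5000"))) = 3u;

/* Reads the value back through an ordinary access. */
uint32_t get_board_revision(void)
{
    return board_revision;
}
```

The address is encoded in the section name, so the source needs no pointer arithmetic or casts to reach the variable.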
Related concepts
8.7 Placement of functions and data at specific addresses on page 8-136
Related information
__attribute__((section("name"))) variable attribute
10.21 Execute-only memory
Related tasks
10.22 Building applications for execute-only memory on page 10-206
10.22 Building applications for execute-only memory
Note
LTO does not honor the armclang -mexecute-only option. If you use the armclang -flto or -Omax
options, then the compiler cannot generate execute-only code.
Procedure
1. Compile your C or C++ code using the -mexecute-only option.
Example: armclang --target=arm-arm-none-eabi -march=armv7-m -mexecute-only -c test.c -o test.o
The -mexecute-only option prevents the compiler from generating any data accesses to the code
sections.
To keep code and data in separate sections, the compiler disables the placement of literal pools inline
with code.
Compiled execute-only code sections in the ELF object file are marked with the SHF_ARM_NOREAD
flag.
2. Specify the memory map to the linker using either of the following:
• The +XO selector in a scatter file.
• The armlink --xo-base option on the command-line.
Example: armlink --xo-base=0x8000 test.o -o test.axf
Results:
The XO execution region is placed in a separate load region from the RO, RW, and ZI execution
regions.
Note
If you do not specify --xo-base, then by default:
• The XO execution region is placed immediately before the RO execution region, at address
0x8000.
• All execution regions are in the same load region.
Related concepts
10.21 Execute-only memory on page 10-205
Related information
-mexecute-only (armclang)
--execute_only (armasm)
--xo_base=address (armlink)
AREA directive
10.23 Vector table for ARMv6 and earlier, ARMv7-A and ARMv7-R profiles
The vector table for Armv6 and earlier, Armv7‑A and Armv7‑R profiles consists of branch or load PC
instructions to the relevant handlers.
If required, you can include the FIQ handler at the end of the vector table to ensure it is handled as
efficiently as possible, see the following example. Using a literal pool means that addresses can easily be
modified later if necessary.
This example assumes that you have ROM at location 0x0 on reset. Alternatively, you can use the
scatter-loading mechanism to define the load and execution address of the vector table. In that case, the C
library copies the vector table for you.
Note
The vector table for Armv6 and earlier architectures supports A32 instructions only. Armv6T2 and later
architectures support both T32 instructions and A32 instructions in the vector table. This does not apply
to the Armv6‑M, Armv7‑M, and Armv8‑M profiles.
10.24 Vector table for M-profile architectures
10.25 Vector Table Offset Register
10.26 Integer division-by-zero errors in C code
When integer division by zero is detected, a branch to __aeabi_idiv0() is made. To trap the division by
zero, therefore, you only have to place a breakpoint on __aeabi_idiv0().
The library provides two implementations of __aeabi_idiv0(). The default one does nothing, so if
division by zero is detected, the division function returns zero. However, if you use signal handling, an
alternative implementation is selected that calls __rt_raise(SIGFPE, DIVBYZERO).
If you provide your own version of __aeabi_idiv0(), then the division functions call this function. The
function prototype for __aeabi_idiv0() is:
int __aeabi_idiv0(void);
If __aeabi_idiv0() returns a value, that value is used as the quotient returned by the division function.
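A minimal sketch of a user-provided handler, assuming that you want to count the errors and return a fixed quotient. The counter, the div0_errors() accessor, and the choice of 0 as the returned quotient are illustrative assumptions, not library behavior.

```c
/* Number of integer divisions by zero seen so far (illustrative). */
static volatile unsigned int div0_count;

/* Replaces the library version; the return value becomes the quotient. */
int __aeabi_idiv0(void)
{
    div0_count++;
    return 0;
}

/* Accessor for diagnostics code or a debugger watch. */
unsigned int div0_errors(void)
{
    return div0_count;
}
```

Keeping the handler this small avoids any risk of it performing a division itself, which could re-enter the handler.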
On entry into __aeabi_idiv0(), the link register LR contains the address of the instruction after the call
to the __aeabi_uidiv() division routine in your application code.
The offending line in the source code can be identified by looking up the line of C code in the debugger
at the address given by LR.
If you want to examine parameters and save them for postmortem debugging when trapping
__aeabi_idiv0, you can use the $Super$$ and $Sub$$ mechanism:
1. Prefix __aeabi_idiv0() with $Super$$ to identify the original unpatched function
__aeabi_idiv0().
2. Use __aeabi_idiv0() prefixed with $Super$$ to call the original function directly.
3. Prefix __aeabi_idiv0() with $Sub$$ to identify the new function to be called in place of the
original version of __aeabi_idiv0().
4. Use __aeabi_idiv0() prefixed with $Sub$$ to add processing before or after the original function
__aeabi_idiv0().
The following example shows how to intercept __aeabi_idiv0 using the $Super$$ and $Sub$$
mechanism.
extern void $Super$$__aeabi_idiv0(void);
/* this function is called instead of the original __aeabi_idiv0() */
void $Sub$$__aeabi_idiv0()
{
// insert code to process a divide by zero
...
// call the original __aeabi_idiv0 function
$Super$$__aeabi_idiv0();
}
If you re-implement __rt_raise(), then the library automatically selects the signal-handling
version of __aeabi_idiv0(), which calls __rt_raise(), and includes that version in the final
image.
In that case, when a divide-by-zero error occurs, __aeabi_idiv0() calls __rt_raise(SIGFPE,
DIVBYZERO). Therefore, if you re-implement __rt_raise(), you must check (signal == SIGFPE) &&
(type == DIVBYZERO) to determine if division by zero has occurred.
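The check described above might look like the following sketch. It assumes that DIVBYZERO is supplied by an Arm C library header; the fallback value, the div0_seen flag, and the division_by_zero_seen() accessor are placeholders introduced here so that the fragment is self-contained. The first parameter is named sig rather than signal to avoid hiding the standard signal() function.

```c
#include <signal.h>

/*
 * Assumption: DIVBYZERO normally comes from an Arm C library header.
 * The fallback below is a placeholder value so that this sketch
 * compiles when that header is not included.
 */
#ifndef DIVBYZERO
#define DIVBYZERO 2
#endif

static volatile int div0_seen;

/* Re-implementation of __rt_raise(). */
void __rt_raise(int sig, int type)
{
    if (sig == SIGFPE && type == DIVBYZERO) {
        div0_seen = 1;  /* division by zero: record it for later inspection */
        return;
    }
    /* Other signals: handle or ignore as your application requires. */
}

/* Accessor for diagnostics code. */
int division_by_zero_seen(void)
{
    return div0_seen;
}
```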
Related information
Use of $Super$$ and $Sub$$ to patch symbol definitions
Chapter 11
Building Secure and Non-secure Images Using
Arm®v8-M Security Extensions
Describes how to use the Armv8‑M Security Extensions to build a secure image, and how to allow a
non-secure image to call a secure image.
It contains the following sections:
• 11.1 Overview of building Secure and Non-secure images on page 11-213.
• 11.2 Building a Secure image using the Arm®v8‑M Security Extensions on page 11-216.
• 11.3 Building a Non-secure image that can call a Secure image on page 11-220.
• 11.4 Building a Secure image using a previously generated import library on page 11-222.
11.1 Overview of building Secure and Non-secure images
Note
The Armv8‑M Security Extension is not supported when building Read-Only Position-Independent
(ROPI) and Read-Write Position-Independent (RWPI) images.
To build an image that runs in the Secure state you must include the <arm_cmse.h> header in your code,
and compile using the armclang -mcmse command-line option. Compiling in this way makes the
following features available:
• The Test Target, TT, instruction.
• TT instruction intrinsics.
• Non-secure function pointer intrinsics.
• The __attribute__((cmse_nonsecure_call)) and __attribute__((cmse_nonsecure_entry))
function attributes.
On startup, your Secure code must set up the Security Attribution Unit (SAU) and call the Non-secure
startup code.
void your_api(int p1, int p2, int p3, int p4, int p5) {
    Params params = { p1, p2, p3, p4, p5 };
    your_api_implementation(&params);
}
Here, your_api_implementation(&params) is the call to your existing function, with fewer than the
maximum of 4 arguments allowed.
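A self-contained sketch of the same wrapper, under the assumption that Params is a plain struct with one int member per original argument and that the existing function takes a pointer to it. The struct layout, the body of your_api_implementation(), and the your_api_last_sum() accessor are hypothetical stand-ins so that the fragment compiles on its own.

```c
/* Assumed layout: one member per original argument. */
typedef struct {
    int p1, p2, p3, p4, p5;
} Params;

static int last_sum;    /* illustrative state written by the stand-in below */

/* Stand-in for your existing function, which takes a single pointer. */
void your_api_implementation(const Params *p)
{
    last_sum = p->p1 + p->p2 + p->p3 + p->p4 + p->p5;
}

/* Wrapper: packs five arguments into one struct and passes its address. */
void your_api(int p1, int p2, int p3, int p4, int p5)
{
    Params params = { p1, p2, p3, p4, p5 };
    your_api_implementation(&params);
}

/* Accessor used only to observe the stand-in's result. */
int your_api_last_sum(void)
{
    return last_sum;
}
```

The local struct is named params rather than p1 so that it does not clash with the first parameter of the wrapper.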
The following figure is a graphical representation of the calling sequence, but for clarity, the return from
the entry function is not shown:
Related tasks
11.2 Building a Secure image using the Arm®v8‑M Security Extensions on page 11-216
11.4 Building a Secure image using a previously generated import library on page 11-222
11.3 Building a Non-secure image that can call a Secure image on page 11-220
Related information
Whitepaper - Armv8‑M Architecture Technical Overview
-mcmse
__attribute__((cmse_nonsecure_call)) function attribute
__attribute__((cmse_nonsecure_entry)) function attribute
Predefined macros
TT instruction intrinsics
Non-secure function pointer intrinsics
B instruction
BL instruction
BXNS instruction
SG instruction
TT, TTT, TTA, TTAT instruction
Placement of CMSE veneer sections for a Secure image
11.2 Building a Secure image using the Arm®v8-M Security Extensions
Prerequisites
The following procedure is not a complete example, and assumes that your code sets up the Security
Attribution Unit (SAU) and calls the Non-secure startup code.
Procedure
1. Create an interface header file, myinterface_v1.h, to specify the C linkage for use by Non-secure
code:
Example:
#ifdef __cplusplus
extern "C" {
#endif
int entry1(int x);
int entry2(int x);
#ifdef __cplusplus
}
#endif
2. In the C program for your Secure code, secure.c, include the following:
Example:
#include <arm_cmse.h>
#include "myinterface_v1.h"
In addition to the implementation of the two entry functions, the code defines the function func1()
that is called only by Secure code.
Note
If you are compiling the Secure code as C++, then you must add extern "C" to the functions
declared as __attribute__((cmse_nonsecure_entry)).
4. Enter the following command to see the disassembly of the machine code that armclang generates:
Example:
$ armclang -c --target=arm-arm-none-eabi -march=armv8-m.main -mcmse -S secure.c
.fnstart
@ BB#0:
.save {r7, lr}
push {r7, lr}
...
bl func1
...
pop.w {r7, lr}
...
bxns lr
...
__acle_se_entry2:
entry2:
.fnstart
@ BB#0:
.save {r7, lr}
push {r7, lr}
...
bl entry1
...
pop.w {r7, lr}
bxns lr
...
main:
.fnstart
@ BB#0:
...
movs r0, #0
...
bx lr
...
An entry function does not start with a Secure Gateway (SG) instruction. The two symbols
__acle_se_entry_name and entry_name indicate the start of an entry function to the linker.
5. Create a scatter file containing the Veneer$$CMSE selector to place the entry function veneers in a
Non-Secure Callable (NSC) memory region.
Example:
LOAD_REGION 0x0 0x3000
{
EXEC_R 0x0
{
*(+RO,+RW,+ZI)
}
EXEC_NSCR 0x4000 0x1000
{
*(Veneer$$CMSE)
}
ARM_LIB_STACK 0x700000 EMPTY -0x10000
{
}
ARM_LIB_HEAP +0 EMPTY 0x10000
{
}
}
...
6. Link the object file using the armlink --import-cmse-lib-out command-line option and the scatter
file to create the Secure image:
Example:
$ armlink secure.o -o secure.axf --cpu 8-M.Main --import-cmse-lib-out importlib_v1.o --scatter secure.scf
In addition to the final image, the link in this example also produces the import library,
importlib_v1.o, for use when building a Non-secure image. Assuming that the section with veneers
is placed at address 0x4000, the import library consists of a relocatable file containing only a symbol
table with the following entries:
When you link the relocatable file corresponding to this assembly code into an image, the linker
creates veneers in a section containing only entry veneers.
Note
If you have an import library from a previous build of the Secure image, you can ensure that the
addresses in the output import library do not change when producing a new version of the Secure
image. To ensure that the addresses do not change, specify the --import-cmse-lib-in command-
line option together with the --import-cmse-lib-out option. However, make sure the input and
output libraries have different names.
7. Enter the following command to see the entry veneers that the linker generates:
Example:
$ fromelf --text -s -c secure.axf
The following entry veneers are generated in the EXEC_NSCR execute-only (XO) region for this
example:
...
...
The section with the veneers is aligned on a 32-byte boundary and padded to a 32-byte boundary.
If you do not use a scatter file, the entry veneers are placed in an ER_XO section as the first
execution region, for example:
...
$t
entry1
0x00008000: e97fe97f .... SG ; [0x7e08]
0x00008004: f000b85a ..Z. B.W __acle_se_entry1 ; 0x80bc
entry2
0x00008008: e97fe97f .... SG ; [0x7e10]
0x0000800c: f000b868 ..h. B.W __acle_se_entry2 ; 0x80e0
...
Next Steps
After you have built your Secure image:
1. Pre-load the Secure image onto your device.
2. Deliver your device with the pre-loaded image, together with the import library package, to a party
who develops the Non-secure code for this device. The import library package contains:
• The interface header file, myinterface_v1.h.
• The import library, importlib_v1.o.
Related tasks
11.4 Building a Secure image using a previously generated import library on page 11-222
11.3 Building a Non-secure image that can call a Secure image on page 11-220
Related information
Whitepaper - Armv8‑M Architecture Technical Overview
-c armclang option
-march armclang option
-mcmse armclang option
-S armclang option
--target armclang option
__attribute__((cmse_nonsecure_entry)) function attribute
SG instruction
--cpu armlink option
--import_cmse_lib_in armlink option
--import_cmse_lib_out armlink option
--scatter armlink option
--text fromelf option
11.3 Building a Non-secure image that can call a Secure image
Prerequisites
The following procedure assumes that you have the import library package that is created in
11.2 Building a Secure image using the Arm®v8‑M Security Extensions on page 11-216. The package
provides the C linkage that allows you to compile your Non-secure code as C or C++.
The import library package identifies the entry points for the Secure image.
Procedure
1. Include the interface header file in the C program for your Non-secure code, nonsecure.c, and use
the entry functions as required.
Example:
#include <stdio.h>
#include "myinterface_v1.h"
int main(void) {
    int val1, val2;
    int x = 0; /* Example input; initialized so that a defined value is passed. */
    val1 = entry1(x);
    val2 = entry2(x);
    if (val1 == val2) {
        printf("val2 is equal to val1\n");
    } else {
        printf("val2 is different from val1\n");
    }
    return 0;
}
3. Create a scatter file for the Non-secure image, but without the Non-Secure Callable (NSC) memory
region.
Example:
LOAD_REGION 0x8000 0x3000
{
    ER 0x8000
    {
        *(+RO,+RW,+ZI)
    }
    ARM_LIB_STACK 0x800000 EMPTY -0x10000
    {
    }
    ARM_LIB_HEAP +0 EMPTY 0x10000
    {
    }
}
...
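The addresses covered by the two EMPTY regions in the scatter file above can be worked out with a small host-side C sketch. It assumes the usual scatter-file conventions: a negative EMPTY length means that the given address is the end of the region, and +0 places a region immediately after the previous one. The type region_t and both helper functions are hypothetical names used only for this illustration.

```c
#include <stdint.h>

typedef struct {
    uint32_t base;   /* lowest address of the region */
    uint32_t limit;  /* first address past the end of the region */
} region_t;

/* EMPTY region with a negative length: end_addr is the END of the region. */
region_t empty_region_down(uint32_t end_addr, uint32_t length)
{
    region_t r = { end_addr - length, end_addr };
    return r;
}

/* EMPTY region placed at +0: it starts where the previous region ends. */
region_t empty_region_after(region_t prev, uint32_t length)
{
    region_t r = { prev.limit, prev.limit + length };
    return r;
}
```

Under these assumptions, ARM_LIB_STACK occupies 0x7F0000 up to 0x800000, and ARM_LIB_HEAP occupies 0x800000 up to 0x810000.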
4. Link the object file using the import library, importlib_v1.o, and the scatter file to create the
Non-secure image.
Example:
$ armlink nonsecure.o importlib_v1.o -o nonsecure.axf --cpu=8-M.Main --scatter nonsecure.scat
Related tasks
11.2 Building a Secure image using the Arm®v8‑M Security Extensions on page 11-216
Related information
Whitepaper - Armv8‑M Architecture Technical Overview
-march armclang option
--target armclang option
--cpu armlink option
--scatter armlink option
11.4 Building a Secure image using a previously generated import library
Prerequisites
The following procedure is not a complete example, and assumes that your code sets up the Security
Attribution Unit (SAU) and calls the Non-secure startup code.
The following procedure assumes that you have the import library package that is created in
11.2 Building a Secure image using the Arm®v8‑M Security Extensions on page 11-216.
Procedure
1. Create an interface header file, myinterface_v2.h, to specify the C linkage for use by Non-secure
code:
Example:
#ifdef __cplusplus
extern "C" {
#endif
#ifdef __cplusplus
}
#endif
2. Include the following in the C program for your Secure code, secure.c:
Example:
#include <arm_cmse.h>
#include "myinterface_v2.h"
In addition to the implementation of the two entry functions, the code defines the function func1()
that is called only by Secure code.
Note
If you are compiling the Secure code as C++, then you must add extern "C" to the functions
declared as __attribute__((cmse_nonsecure_entry)).
4. To see the disassembly of the machine code that is generated by armclang, enter:
Example:
$ armclang -c --target arm-arm-none-eabi -march=armv8-m.main -mcmse -S secure.c
...
func1:
.fnstart
...
bx lr
...
__acle_se_entry1:
entry1:
.fnstart
@ BB#0:
.save {r7, lr}
push {r7, lr}
...
bl func1
pop.w {r7, lr}
...
bxns lr
...
__acle_se_entry4:
entry4:
.fnstart
@ BB#0:
.save {r7, lr}
push {r7, lr}
...
bl entry1
...
pop.w {r7, lr}
bxns lr
...
main:
.fnstart
@ BB#0:
...
movs r0, #0
...
bx lr
...
An entry function does not start with a Secure Gateway (SG) instruction. The two symbols
__acle_se_entry_name and entry_name indicate the start of an entry function to the linker.
5. Create a scatter file containing the Veneer$$CMSE selector to place the entry function veneers in a
Non-Secure Callable (NSC) memory region.
Example:
LOAD_REGION 0x0 0x3000
{
    EXEC_R 0x0
    {
        *(+RO,+RW,+ZI)
    }
    EXEC_NSCR 0x4000 0x1000
    {
        *(Veneer$$CMSE)
    }
    ARM_LIB_STACK 0x700000 EMPTY -0x10000
    {
    }
    ARM_LIB_HEAP +0 EMPTY 0x10000
    {
    }
}
...
6. Link the object file using the armlink --import-cmse-lib-out and --import-cmse-lib-in
command-line options, together with the preprocessed scatter file, to create the Secure image:
Example:
$ armlink secure.o -o secure.axf --cpu 8-M.Main --import-cmse-lib-out importlib_v2.o --import-cmse-lib-in importlib_v1.o --scatter secure.scf
In addition to the final image, the link in this example also produces the import library,
importlib_v2.o, for use when building a Non-secure image. Assuming that the section with veneers
is placed at address 0x4000, the import library consists of a relocatable file containing only a symbol
table with the following entries:
When you link the relocatable file corresponding to this assembly code into an image, the linker
creates veneers in a section containing only entry veneers.
7. Enter the following command to see the entry veneers that the linker generates:
Example:
$ fromelf --text -s -c secure.axf
The following entry veneers are generated in the EXEC_NSCR execute-only (XO) region for this
example:
...
$t
entry1
0x00004000: e97fe97f .... SG ; [0x3e08]
0x00004004: f7fcb85e ..^. B __acle_se_entry1 ; 0xc4
entry2
0x00004008: e97fe97f .... SG ; [0x3e10]
0x0000400c: f7fcb86c ..l. B __acle_se_entry2 ; 0xe8
...
entry3
0x00004020: e97fe97f .... SG ; [0x3e28]
0x00004024: f7fcb872 ..r. B __acle_se_entry3 ; 0x10c
entry4
0x00004028: e97fe97f .... SG ; [0x3e30]
0x0000402c: f7fcb888 .... B __acle_se_entry4 ; 0x140
...
The section with the veneers is aligned on a 32-byte boundary and padded to a 32-byte boundary.
If you do not use a scatter file, the entry veneers are placed in an ER_XO section as the first
execution region. The entry veneers for the existing entry points are placed in a CMSE veneer
section. For example:
...
$t
entry3
0x00008000: e97fe97f .... SG ; [0x7e08]
0x00008004: f000b87e ..~. B.W __acle_se_entry3 ; 0x8104
entry4
0x00008008: e97fe97f .... SG ; [0x7e10]
0x0000800c: f000b894 .... B.W __acle_se_entry4 ; 0x8138
...
Address: 0x00004000
$t
entry1
0x00004000: e97fe97f .... SG ; [0x3e08]
0x00004004: f004b85a ..Z. B.W __acle_se_entry1 ; 0x80bc
entry2
0x00004008: e97fe97f .... SG ; [0x3e10]
0x0000400c: f004b868 ..h. B.W __acle_se_entry2 ; 0x80e0
...
Next Steps
After you have built your updated Secure image:
1. Pre-load the updated Secure image onto your device.
2. Deliver your device with the pre-loaded image, together with the new import library package, to a
party who develops the Non-secure code for this device. The import library package contains:
• The interface header file, myinterface_v2.h.
• The import library, importlib_v2.o.
Related tasks
11.2 Building a Secure image using the Arm®v8‑M Security Extensions on page 11-216
11.3 Building a Non-secure image that can call a Secure image on page 11-220
Related information
Whitepaper - Armv8‑M Architecture Technical Overview
-c armclang option
-march armclang option
-mcmse armclang option
-S armclang option
--target armclang option
__attribute__((cmse_nonsecure_entry)) function attribute
SG instruction
--cpu armlink option
--import_cmse_lib_in armlink option
--import_cmse_lib_out armlink option
--scatter armlink option
--text fromelf option
Chapter 12
Overview of the Linker
12.1 About the linker
Note
Be aware of the following:
• Generated code might be different between two Arm Compiler releases.
• For a feature release, there might be significant code generation differences.
• You cannot link A32 or T32 code with A64 code.
Note
The command-line option descriptions and related information in the Arm® Compiler Reference Guide
describe all the features that Arm Compiler supports. Any features not documented are not supported and
are used at your own risk. You are responsible for making sure that any generated code using community
features on page Appx-A-266 is operating correctly.
Related references
Chapter 13 Getting Image Details on page 13-232
Related information
Linker support for creating demand-paged files
Linking Models Supported by armlink
Image Structure and Generation
Linker Optimization Features
Accessing and Managing Symbols with armlink
Scatter-loading Features
BPABI Shared Libraries and Executables
Features of the Base Platform Linking Model
Object files must be formatted as Arm ELF. This format is described in:
• ELF for the Arm® Architecture (IHI 0044).
• ELF for the Arm® 64-bit Architecture (AArch64) (IHI 0056).
Optionally, the following files can be used as input to armlink:
• One or more libraries created by the librarian, armar.
• A symbol definitions file.
• A scatter file.
• A steering file.
• A Secure code import library when building a Non-secure image that needs to call a Secure image.
• A Secure code import library when building a Secure image that has to use the entry addresses in a
previously generated import library.
Related concepts
17.1 About the Arm® Librarian on page 17-257
Related references
Chapter 11 Building Secure and Non-secure Images Using Arm®v8‑M Security Extensions
on page 11-212
Related information
--import_cmse_lib_in=filename
Access symbols in another image
Scatter-loading Features
Scatter File Syntax
Linker Steering File Command Reference
ELF for the Arm Architecture (IHI 0044)
ELF for the Arm 64-bit Architecture (AArch64) (IHI 0056)
Note
You can also use fromelf to convert an ELF executable image to other file formats, or to display,
process, and protect the content of an ELF executable image.
Related references
Chapter 11 Building Secure and Non-secure Images Using Arm®v8‑M Security Extensions
on page 11-212
Chapter 15 Overview of the fromelf Image Converter on page 15-242
Related information
Partial linking model
Section placement with the linker
12.2 armlink command-line syntax
where:
options
input-file-list
12.3 What the linker does when constructing an executable image
Chapter 13
Getting Image Details
Describes how to get image details from the Arm linker, armlink.
It contains the following sections:
• 13.1 Options for getting information about linker-generated files.
• 13.2 Identifying the source of some link errors.
• 13.3 Example of using the --info linker option.
• 13.4 How to find where a symbol is placed when linking.
13.1 Options for getting information about linker-generated files
--map
Displays the image memory map, and contains the address and the size of each load region,
execution region, and input section in the image, including linker-generated input sections. It
also shows how RW data compression is applied.
--show_cmdline
Outputs the command line used by the linker.
--symbols
Displays a list of each local and global symbol used in the link step, and its value.
--verbose
Displays detailed information about the link operation, including the objects that are included
and the libraries that contain them.
--xref
Displays a list of all cross-references between input sections.
13.2 Identifying the source of some link errors
13.3 Example of using the --info linker option
Here, sizes gives a list of the Code and data sizes for each input object and library member in the
image. Using this option implies --info sizes,totals.
The following example shows the output in tabular format with the totals separated out for easy reading:
Image component sizes
      Code (inc. data)   RO Data    RW Data    ZI Data      Debug   Object Name
        30         16          0          0          0          0   foo.o
        56         10        960          0       1024        372   startup_ARMCM7.o
----------------------------------------------------------------------
        88         26        992          0       5120        372   Object Totals
         0          0         32          0       4096          0   (incl. Generated)
         2          0          0          0          0          0   (incl. Padding)
----------------------------------------------------------------------
      Code (inc. data)   RO Data    RW Data    ZI Data      Debug   Library Member Name
         8          0          0          0          0         68   __main.o
         0          0          0          0          0          0   __rtentry.o
        12          0          0          0          0          0   __rtentry2.o
         8          4          0          0          0          0   __rtentry5.o
        52          8          0          0          0          0   __scatter.o
        26          0          0          0          0          0   __scatter_copy.o
        28          0          0          0          0          0   __scatter_zi.o
        10          0          0          0          0         68   defsig_exit.o
        50          0          0          0          0         88   defsig_general.o
        80         58          0          0          0         76   defsig_rtmem_inner.o
        14          0          0          0          0         80   defsig_rtmem_outer.o
        52         38          0          0          0         76   defsig_rtred_inner.o
        14          0          0          0          0         80   defsig_rtred_outer.o
        18          0          0          0          0         80   exit.o
        76          0          0          0          0         88   fclose.o
       470          0          0          0          0         88   flsbuf.o
       236          4          0          0          0        128   fopen.o
        26          0          0          0          0         68   fputc.o
       248          6          0          0          0         84   fseek.o
        66          0          0          0          0         76   ftell.o
        94          0          0          0          0         80   h1_alloc.o
        52          0          0          0          0         68   h1_extend.o
        78          0          0          0          0         80   h1_free.o
        14          0          0          0          0         84   h1_init.o
        80          6          0          4          0         96   heapauxa.o
         4          0          0          0          0        136   hguard.o
         0          0          0          0          0          0   indicate_semi.o
       138          0          0          0          0        168   init_alloc.o
       312         46          0          0          0        112   initio.o
         2          0          0          0          0          0   libinit.o
         6          0          0          0          0          0   libinit2.o
        16          8          0          0          0          0   libinit4.o
         2          0          0          0          0          0   libshutdown.o
         6          0          0          0          0          0   libshutdown2.o
         0          0          0          0         96          0   libspace.o
         0          0          0          0          0          0   maybetermalloc1.o
        44          4          0          0          0         84   puts.o
         8          4          0          0          0         68   rt_errno_addr_intlibspace.o
         8          4          0          0          0         68   rt_heap_descriptor_intlibspace.o
        78          0          0          0          0         80   rt_memclr_w.o
         2          0          0          0          0          0   rtexit.o
        10          0          0          0          0          0   rtexit2.o
        70          0          0          0          0         80   setvbuf.o
       240          6          0          0          0        156   stdio.o
         0          0          0         12        252          0   stdio_streams.o
        62          0          0          0          0         76   strlen.o
        12          4          0          0          0         68   sys_exit.o
       102          0          0          0          0        240   sys_io.o
         0          0         12          0          0          0   sys_io_names.o
        14          0          0          0          0         76   sys_wrch.o
         2          0          0          0          0         68   use_no_semi.o
----------------------------------------------------------------------
      2962        200         14         16        352       3036   Library Totals
        12          0          2          0          4          0   (incl. Padding)
----------------------------------------------------------------------
----------------------------------------------------------------------
2962 200 14 16 352 3036 Library Totals
----------------------------------------------------------------------
==============================================================================
==============================================================================
==============================================================================
In this example:
Code (inc. data)
The number of bytes occupied by the code. In this image, there are 3050 bytes of code. This
value includes 226 bytes of inline data (inc. data), for example, literal pools, and short strings.
RO Data
The number of bytes occupied by the RO data. This value is in addition to the inline data
included in the Code (inc. data) column.
RW Data
The number of bytes occupied by the RW data.
ZI Data
The number of bytes occupied by the ZI data.
Debug
The number of bytes occupied by the debug data, for example, debug Input sections and the
symbol and string table.
Object Totals
The number of bytes occupied by the objects when linked together to generate the image.
(incl. Generated)
armlink might generate image contents, for example, interworking veneers, and Input sections
such as region tables. If the Object Totals row includes this type of data, it is shown in this
row.
Combined across all of the object files (foo.o and startup_ARMCM7.o), the example shows that
there are 992 bytes of RO data, of which 32 bytes are linker-generated RO data.
Note
If the scatter file contains EMPTY regions, the linker might generate ZI data. In the example, the
4096 bytes of ZI data labeled (incl. Generated) correspond to an ARM_LIB_STACKHEAP
execution region used to set up the stack and heap in a scatter file as follows:
ARM_LIB_STACKHEAP +0x0 EMPTY 0x1000 {} ; 4KB stack + heap
Library Totals
The number of bytes occupied by the library members that have been extracted and added to the
image as individual objects.
(incl. Padding)
If necessary, armlink inserts padding to force section alignment. If the Object Totals row
includes this type of data, it is shown in the associated (incl. Padding) row. Similarly, if the
Library Totals row includes this type of data, it is shown in its associated row.
In the example, there are 992 bytes of RO data in the object total, of which 0 bytes is linker-
generated padding, and 14 bytes of RO data in the library total, with 2 bytes of padding.
Grand Totals
Shows the true size of the image. In the example, there are 5120 bytes of ZI data (in Object
Totals) and 352 bytes of ZI data (in Library Totals), giving a total of 5472 bytes.
ROM Totals
Shows the minimum size of ROM required to contain the image. This size does not include ZI
data and debug information that is not stored in the ROM.
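The way the totals combine can be checked with a short host-side C sketch that uses the Object Totals and Library Totals rows from the example output. The type sizes_t and the functions add_sizes() and rom_size() are hypothetical names for this illustration. The ROM figure is computed as Code + RO Data + RW Data and assumes that RW data compression is not applied.

```c
/* One row of the "Image component sizes" table, in bytes. */
typedef struct {
    unsigned code;      /* Code, including inline data */
    unsigned inc_data;  /* inline data already counted in code */
    unsigned ro_data;
    unsigned rw_data;
    unsigned zi_data;
    unsigned debug;
} sizes_t;

/* Totals copied from the example output. */
const sizes_t object_totals  = {   88,  26, 992,  0, 5120,  372 };
const sizes_t library_totals = { 2962, 200,  14, 16,  352, 3036 };

/* Column-wise sum of two rows, as used for the grand totals. */
sizes_t add_sizes(sizes_t a, sizes_t b)
{
    sizes_t t = {
        a.code + b.code,       a.inc_data + b.inc_data,
        a.ro_data + b.ro_data, a.rw_data + b.rw_data,
        a.zi_data + b.zi_data, a.debug + b.debug
    };
    return t;
}

/* Bytes that must be stored in ROM: ZI data and debug data are excluded. */
unsigned rom_size(sizes_t t)
{
    return t.code + t.ro_data + t.rw_data;
}
```

Summing the two rows reproduces the figures quoted in this section: 3050 bytes of code, of which 226 bytes are inline data, and 5472 bytes of ZI data.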
Related references
13.1 Options for getting information about linker-generated files on page 13-233
Related information
--info=topic[,topic,…] (armlink)
13.4 How to find where a symbol is placed when linking
Note
You can also run fromelf -s on the resultant image.
Procedure
1. Create the file s.c containing the following source code:
long long array[10] __attribute__ ((section ("ARRAY")));
int main(void)
{
    return sizeof(array);
}
...
Execution Region ER_RW (Base: 0x00008360, Size: 0x00000050, Max: 0xffffffff, ABSOLUTE)
Chapter 14
SysV Dynamic Linking
Arm Compiler 6 supports the System V (SysV) linking model and can produce SysV shared objects and
executables. The feature allows building programs for SysV-like platforms.
Note
Cortex‑M processors do not support dynamic linking.
14.1 Build a SysV shared object
Procedure
1. Create the file lib.c containing the following code:
__attribute__((visibility("default")))
int lib_func(int a)
{
    return 5 * a;
}
3. Run fromelf with the --only option to see that the function lib_func() has the visibility set to
default and is present in the dynamic symbol table:
fromelf -s --only=.dynsym lib.so
...
** Section #2 '.dynsym' (SHT_DYNSYM) [SHF_ALLOC]
Size : 32 bytes (alignment 4)
Address: 0x00000110
String table #3 '.dynstr'
Last local symbol no. 0
14.2 Build a SysV executable
Prerequisites
Build the lib.so shared library as described in 14.1 Build a SysV shared object on page 14-240.
Procedure
1. Create the file app.c containing the following code:
#include <stdio.h>
int lib_func(int a);
int main(void)
{
    printf("Result: %d.\n", lib_func(3));
    return 0;
}
0 DT_NEEDED 1 (lib.so)
1 DT_HASH 33100 (0x0000814c)
2 DT_STRTAB 33156 (0x00008184)
3 DT_SYMTAB 33124 (0x00008164)
4 DT_STRSZ 17
5 DT_SYMENT 16
6 DT_PLTRELSZ 8
7 DT_PLTGOT 77124 (0x00012d44)
8 DT_DEBUG 0 (0x00000000)
9 DT_JMPREL 33176 (0x00008198)
10 DT_PLTREL 17 (DT_REL)
11 DT_NULL 0
...
When executed, a platform-specific dynamic loader processes information in the dynamic array, loads
lib.so, resolves relocations in all loaded files, and passes control to the main executable. The
program then outputs:
Result: 15.
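As a rough illustration of how a loader uses the dynamic array, the following host-side C sketch walks the entries until the terminating null entry and resolves the first DT_NEEDED value as an offset into the dynamic string table. The type dyn_entry_t, the EX_DT_* constants, and first_needed() are hypothetical names. The tag values 0 and 1 are those of DT_NULL and DT_NEEDED in the generic ELF specification. The string table contents are an assumption that is consistent with the DT_NEEDED value of 1 and the DT_STRSZ value of 17 shown above.

```c
#include <stddef.h>
#include <stdint.h>

/* Tag values mirroring DT_NULL and DT_NEEDED from the ELF specification. */
enum { EX_DT_NULL = 0, EX_DT_NEEDED = 1 };

/* Simplified 32-bit dynamic array entry: a tag and a value. */
typedef struct {
    int32_t  tag;
    uint32_t value;
} dyn_entry_t;

/* Assumed dynamic string table: an empty string, "lib.so", then "lib_func". */
const char example_strtab[] = "\0lib.so\0lib_func";

/* Reduced copy of the dynamic array from the example output. */
const dyn_entry_t example_dyn[] = {
    { EX_DT_NEEDED, 1 },  /* offset of the library name in the string table */
    { EX_DT_NULL,   0 }   /* end of the dynamic array */
};

/* Return the name of the first needed library, or NULL if there is none. */
const char *first_needed(const dyn_entry_t *dyn, const char *strtab)
{
    for (; dyn->tag != EX_DT_NULL; ++dyn) {
        if (dyn->tag == EX_DT_NEEDED)
            return strtab + dyn->value;
    }
    return NULL;
}
```

With the assumed table, first_needed(example_dyn, example_strtab) yields the string lib.so, which is the library that the dynamic loader then has to load before resolving lib_func().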
Chapter 15
Overview of the fromelf Image Converter
Gives an overview of the fromelf image converter provided with Arm Compiler.
It contains the following sections:
• 15.1 About the fromelf image converter.
• 15.2 fromelf execution modes.
• 15.3 Getting help on the fromelf command.
• 15.4 fromelf command-line syntax.
15.1 About the fromelf image converter
Note
If your image is produced without debug information, fromelf cannot:
• Translate the image into other file formats.
• Produce a meaningful disassembly listing.
Note
The command-line option descriptions and related information in the Arm® Compiler Reference Guide
describe all the features that Arm Compiler supports. Any features not documented are not supported and
are used at your own risk. You are responsible for making sure that any generated code using community
features on page Appx-A-266 is operating correctly.
Related concepts
16.3 Options to protect code in image files with fromelf on page 16-250
16.4 Options to protect code in object files with fromelf on page 16-251
Related references
15.2 fromelf execution modes on page 15-244
15.4 fromelf command-line syntax on page 15-246
Related information
fromelf Command-line Options
15.2 fromelf execution modes
15.3 Getting help on the fromelf command
Related references
15.4 fromelf command-line syntax on page 15-246
Related information
--help (fromelf)
15.4 fromelf command-line syntax
Syntax
fromelf options input_file
options
input_file
The ELF file or library file to be processed. When some options are used, multiple input files
can be specified.
Related information
fromelf Command-line Options
input_file (fromelf)
Chapter 16
Using fromelf
Describes how to use the fromelf image converter provided with Arm Compiler.
It contains the following sections:
• 16.1 General considerations when using fromelf.
• 16.2 Examples of processing ELF files in an archive.
• 16.3 Options to protect code in image files with fromelf.
• 16.4 Options to protect code in object files with fromelf.
• 16.5 Option to print specific details of ELF files.
• 16.6 Using fromelf to find where a symbol is placed in an executable ELF image.
16.1 General considerations when using fromelf
16.2 Examples of processing ELF files in an archive
Examples
Consider an archive, test.a, containing the following ELF files:
bmw.o
bmw1.o
call_c_code.o
newtst.o
shapes.o
strmtst.o
This creates an output archive with the name test.a in the subdirectory strip_all.
Example of processing a subset of files in the archive
To remove all debug, comments, notes and symbols from only the shapes.o and the strmtst.o
files in the archive, enter:
fromelf --elf --strip=all test.a(s*.o) -o subset/
This creates an output archive with the name test.a in the subdirectory subset. The archive
contains the processed files together with the remaining files that are unprocessed.
To process the bmw.o, bmw1.o, and newtst.o files in the archive, enter:
fromelf --elf --strip=all test.a(??w*) -o subset/
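To see which archive members the patterns s*.o and ??w* select, the following host-side C sketch implements a generic wildcard match in which * matches any run of characters and ? matches exactly one character. This is only an illustration of the wildcard semantics; wild_match() is a hypothetical function and is not how fromelf is implemented.

```c
#include <stdbool.h>

/* Return true if name matches pattern, where '*' matches any run of
   characters (including none) and '?' matches exactly one character. */
bool wild_match(const char *pattern, const char *name)
{
    if (*pattern == '\0')
        return *name == '\0';
    if (*pattern == '*') {
        /* Let '*' absorb zero, one, two, ... characters of name. */
        for (const char *p = name; ; ++p) {
            if (wild_match(pattern + 1, p))
                return true;
            if (*p == '\0')
                return false;
        }
    }
    if (*name != '\0' && (*pattern == '?' || *pattern == *name))
        return wild_match(pattern + 1, name + 1);
    return false;
}
```

Applied to the members of test.a, s*.o matches shapes.o and strmtst.o, and ??w* matches bmw.o, bmw1.o, and newtst.o, because each of those names has w as its third character.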
Related information
--disassemble (fromelf)
--elf (fromelf)
input_file (fromelf)
--output=destination (fromelf)
--strip=option[,option,…] (fromelf)
16.3 Options to protect code in image files with fromelf
Restrictions
You must use --elf with these options. Because you have to use --elf, you must also use --output.
Effect of the --privacy and --strip options for protecting code in image files
Option: fromelf --elf --privacy
Effect:
• Removes the whole symbol table.
• Removes the .comment section name. This section is marked as [Anonymous Section] in the
fromelf --text output.
• Gives section names a default value. For example, changes code section names to '.text'.
Example
To produce a new ELF executable image with the complete symbol table removed and with the various
section names changed, enter:
fromelf --elf --privacy --output=outfile.axf infile.axf
Related concepts
16.4 Options to protect code in object files with fromelf on page 16-251
Related references
15.4 fromelf command-line syntax on page 15-246
Related information
--elf (fromelf)
--output=destination (fromelf)
--privacy (fromelf)
--strip=option[,option,…] (fromelf)
16.4 Options to protect code in object files with fromelf
Restrictions
You must use --elf with these options. Because you have to use --elf, you must also use --output.
Effect of the --privacy and --strip options for protecting code in object files
fromelf --elf --privacy
• Local symbols: Removes those local symbols that can be removed without loss of functionality.
Symbols that cannot be removed, such as the targets for relocations, are kept. For these symbols,
the names are removed. These are marked as [Anonymous Symbol] in the fromelf --text output.
• Section names: Gives section names a default value. For example, changes code section names to
'.text'.
• Mapping symbols: Present.
• Build attributes: Present.
fromelf --elf --strip=symbols
• Local symbols: Removes those local symbols that can be removed without loss of functionality.
Symbols that cannot be removed, such as the targets for relocations, are kept. For these symbols,
the names are removed. These are marked as [Anonymous Symbol] in the fromelf --text output.
• Section names: Section names remain the same.
• Mapping symbols: Present.
• Build attributes: Present.
fromelf --elf --strip=localsymbols
• Local symbols: Removes those local symbols that can be removed without loss of functionality.
Symbols that cannot be removed, such as the targets for relocations, are kept. For these symbols,
the names are removed. These are marked as [Anonymous Symbol] in the fromelf --text output.
• Section names: Section names remain the same.
• Mapping symbols: Present.
• Build attributes: Present.
Example
To produce a new ELF object with the complete symbol table removed and various section names
changed, enter:
fromelf --elf --privacy --output=outfile.o infile.o
Related concepts
16.3 Options to protect code in image files with fromelf on page 16-250
Related references
15.4 fromelf command-line syntax on page 15-246
Related information
--elf (fromelf)
--output=destination (fromelf)
--privacy (fromelf)
--strip=option[,option,…] (fromelf)
16.5 Option to print specific details of ELF files
Note
You can specify some of the --emit options using the --text option.
Examples
To print the contents of the data sections of an ELF file, infile.axf, enter:
fromelf --emit=data infile.axf
To print relocation information and the dynamic section contents for the ELF file infile2.axf, enter:
fromelf --emit=relocation_tables,dynamic_segment infile2.axf
Related references
15.4 fromelf command-line syntax on page 15-246
Related information
--emit=option[,option,…] (fromelf)
--text (fromelf)
16.6 Using fromelf to find where a symbol is placed in an executable ELF image
You can find where a symbol is placed in an executable ELF image.
To find where a symbol is placed in an ELF image file, use the --text -s -v options to view the
symbol table and detailed information on each segment and section header, for example:
The symbol table identifies the section where the symbol is placed.
Procedure
1. Create the file s.c containing the following source code:
long long arr[10] __attribute__ ((section ("ARRAY")));
int main(void)
{
    return sizeof(arr);
}
The Sec column shows the section where the symbol is placed. In this example, the symbol is placed in section 5.
6. Locate the section identified for the symbol in the fromelf output, for example:
...
====================================
** Section #5
Name : ARRAY
Type : SHT_PROGBITS (0x00000001)
Flags : SHF_ALLOC + SHF_WRITE (0x00000003)
Addr : 0x00000000
File Offset : 88 (0x58)
Size : 80 bytes (0x50)
Link : SHN_UNDEF
Info : 0
Alignment : 8
Entry Size : 0
====================================
...
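The Size value of 80 bytes (0x50) in the section header follows directly from the declaration of arr: ten elements of type long long. The following host-side C sketch shows the arithmetic. It assumes an 8-byte long long, which matches the Alignment value of 8 in the listing, and it omits the section attribute so that it builds with any host compiler. The function arr_size() is a hypothetical helper.

```c
#include <stddef.h>

/* Same array as in s.c, without the section attribute. */
long long arr[10];

/* Bytes occupied by arr: 10 elements of sizeof(long long) bytes each.
   With an 8-byte long long this is 10 * 8 = 80 = 0x50. */
size_t arr_size(void)
{
    return sizeof(arr);
}
```

The same value is what main() in s.c returns, and it is the size that the linker must reserve for the ARRAY section.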
Related information
--text (fromelf)
Chapter 17
Overview of the Arm® Librarian
Gives an overview of the Arm Librarian, armar, provided with Arm Compiler.
It contains the following sections:
• 17.1 About the Arm® Librarian.
• 17.2 Considerations when working with library files.
• 17.3 armar command-line syntax.
• 17.4 Option to get help on the armar command.
17.1 About the Arm® Librarian
Related information
--debug_symbols (armar)
--library=name (armlink)
--libpath=pathlist (armlink)
--library_type=lib (armlink)
--userlibpath=pathlist (armlink)
17.2 Considerations when working with library files
17.3 armar command-line syntax
Syntax
armar options archive [file_list]
options
armar command-line options.
archive
The filename of the library. A library file must always be specified.
file_list
The list of files to be processed.
Related information
armar Command-line Options
archive (armar)
file_list (armar)
17.4 Option to get help on the armar command
Example
To display the help information, enter:
armar --help
Chapter 18
Overview of the armasm Legacy Assembler
Gives an overview of the armasm legacy assembler provided with the Arm Compiler toolchain.
It contains the following sections:
• 18.1 Key features of the armasm assembler.
• 18.2 How the assembler works.
18.1 Key features of the armasm assembler
Note
armasm does not support some architectural features, such as:
• Features of Armv8.4-A and later architectures, even those back-ported to Armv8.2-A and
Armv8.3-A.
• Half-precision floating-point multiply with add or multiply with subtract arithmetic operations. These
instructions are an optional extension in Armv8.2-A and Armv8.3-A, and a mandatory extension in
Armv8.4-A and later. See +fp16fml in the -mcpu command-line option in the Arm Compiler
Reference Guide.
• AArch64 Crypto instructions (for SHA512, SHA3, SM3, SM4). See +crypto in the -mcpu
command-line option in the Arm Compiler Reference Guide.
• AArch64 Scalable Vector Extension (SVE) instructions. See +sve in the -mcpu command-line option
in the Arm Compiler Reference Guide.
Related concepts
18.2 How the assembler works
Related information
About the Unified Assembler Language
Use of macros
armasm Directives Reference
--cpu=name (armasm)
-mcpu
Arm Compiler Instruction Set Reference Guide
18.2 How the assembler works
Related information
Directives that can be omitted in pass 2 of the assembler
Two pass assembler diagnostics
Instruction and directive relocations
--diag_error=tag[,tag,…]
--debug
Appendix A
Supporting reference information
The various features in Arm Compiler might have different levels of support, ranging from fully
supported product features to community features.
A.1 Support level definitions
Product features
Product features are suitable for use in a production environment. The functionality is well-tested, and is
expected to be stable across feature and update releases.
• Arm intends to give advance notice of significant functionality changes to product features.
• If you have a support and maintenance contract, Arm provides full support for use of all product
features.
• Arm welcomes feedback on product features.
• Any issues with product features that Arm encounters or is made aware of are considered for fixing in
future versions of Arm Compiler.
In addition to fully supported product features, some product features are only alpha or beta quality.
Beta product features
Beta product features are implementation complete, but have not been sufficiently tested to be
regarded as suitable for use in production environments.
Beta product features are indicated with [BETA].
• Arm endeavors to document known limitations on beta product features.
• Beta product features are expected to eventually become product features in a future release
of Arm Compiler 6.
• Arm encourages the use of beta product features, and welcomes feedback on them.
• Any issues with beta product features that Arm encounters or is made aware of are
considered for fixing in future versions of Arm Compiler.
Alpha product features
Alpha product features are not implementation complete, and are subject to change in future
releases. Their stability level is therefore lower than that of beta product features.
Alpha product features are indicated with [ALPHA].
• Arm endeavors to document known limitations of alpha product features.
• Arm encourages the use of alpha product features, and welcomes feedback on them.
• Any issues with alpha product features that Arm encounters or is made aware of are
considered for fixing in future versions of Arm Compiler.
Community features
Arm Compiler 6 is built on LLVM technology and preserves the functionality of that technology where
possible. This means that there are additional features available in Arm Compiler that are not listed in the
documentation. These additional features are known as community features. For information on these
community features, see the documentation for the Clang/LLVM project.
Where community features are referenced in the documentation, they are indicated with
[COMMUNITY].
• Arm makes no claims about the quality level or the degree of functionality of these features, except
when explicitly stated in this documentation.
• Functionality might change significantly between feature releases.
• Arm makes no guarantees that community features will remain functional across update releases,
although changes are expected to be unlikely.
Some community features might become product features in the future, but Arm provides no roadmap
for this. Arm is interested in understanding your use of these features, and welcomes feedback on them.
Arm supports customers using these features on a best-effort basis, unless the features are unsupported.
Arm accepts defect reports on these features, but does not guarantee that these issues will be fixed in
future releases.
[Figure: Arm Compiler 6 toolchain components and integration boundaries. Components shown: Arm C
library, LLVM Project libc++, armasm, armclang (LLVM Project clang), and armlink. Inputs and outputs
shown: armasm syntax assembly, C/C++ source code, GNU syntax assembly, source code headers, objects,
scatter/steering/symdefs file, and image.]
The dashed boxes are toolchain components, and any interaction between these components is an
integration boundary. Community features that span an integration boundary might have significant
limitations in functionality. The exception to this is if the interaction is codified in one of the
standards supported by Arm Compiler 6. See Application Binary Interface (ABI) for the Arm®
Architecture. Community features that do not span integration boundaries are more likely to work as
expected.
• Features primarily used when targeting hosted environments such as Linux or BSD might have
significant limitations, or might not be applicable, when targeting bare-metal environments.
• The Clang implementations of compiler features, particularly those that have been present for a long
time in other toolchains, are likely to be mature. The functionality of new features, such as support
for new language features, is likely to be less mature and therefore more likely to have limited
functionality.
Deprecated features
A deprecated feature is one that Arm plans to remove from a future release of Arm Compiler. Arm does
not make any guarantee regarding the testing or maintenance of deprecated features. Therefore, Arm
does not recommend using a feature after it is deprecated.
For information on replacing deprecated features with supported features, refer to the Arm Compiler
documentation and Release Notes.
Unsupported features
With both the product and community feature categories, specific features and use-cases are known not
to function correctly, or are not intended for use with Arm Compiler 6.
Limitations of product features are stated in the documentation. Arm cannot provide an exhaustive list of
unsupported features or use-cases for community features. The known limitations on community features
are listed in Community features.
A.2 Standards compliance in Arm® Compiler
Note
The -fsanitize=undefined command-line option is a [COMMUNITY] feature.
Related information
C++ Status
A.3 Compliance with the ABI for the Arm® Architecture (Base Standard)
The ABI for the Arm Architecture (Base Standard) is a collection of standards. Some of these standards
are open. Some are specific to the Arm architecture.
The Application Binary Interface (ABI) for the Arm® Architecture (Base Standard) (BSABI) regulates the
inter-operation of binary code and development tools in Arm architecture-based execution environments,
ranging from bare metal to major operating systems such as Arm Linux.
By conforming to this standard, objects produced by the toolchain can work together with object libraries
from different producers.
The BSABI consists of a family of specifications including:
AADWARF64
DWARF for the Arm® 64-bit Architecture (AArch64). This ABI uses the DWARF 3 standard to
govern the exchange of debugging data between object producers and debuggers. It also gives
additional rules on how to use DWARF 3, and how it is extended in ways specific to the 64-bit
Arm architecture.
AADWARF
DWARF for the Arm® Architecture. This ABI uses the DWARF 3 standard to govern the
exchange of debugging data between object producers and debuggers.
AAELF64
ELF for the Arm® 64-bit Architecture (AArch64). This specification provides the processor-
specific definitions required by ELF for AArch64-based systems. It builds on the generic ELF
standard to govern the exchange of linkable and executable files between producers and
consumers.
AAELF
ELF for the Arm® Architecture. Builds on the generic ELF standard to govern the exchange of
linkable and executable files between producers and consumers.
AAPCS64
Procedure Call Standard for the Arm® 64-bit Architecture (AArch64). Governs the exchange of
control and data between functions at runtime. There is a variant of the AAPCS for each of the
major execution environment types supported by the toolchain.
AAPCS64 describes a number of different supported data models. Arm Compiler 6 implements
the LP64 data model for AArch64 state.
AAPCS
Procedure Call Standard for the Arm® Architecture. Governs the exchange of control and data
between functions at runtime. There is a variant of the AAPCS for each of the major execution
environment types supported by the toolchain.
BPABI
Base Platform ABI for the Arm® Architecture. Governs the format and content of executable and
shared object files generated by static linkers. Supports platform-specific executable files using
post linking. Provides a base standard for deriving a platform ABI.
CLIBABI
C Library ABI for the Arm® Architecture. Defines an ABI to the C library.
CPPABI64
C++ ABI for the Arm® 64-bit Architecture (AArch64). This specification builds on the generic C++
ABI (originally developed for IA-64) to govern interworking between independent C++ compilers.
DBGOVL
Support for Debugging Overlaid Programs. Defines an extension to the ABI for the Arm
Architecture to support debugging overlaid programs.
EHABI
Exception Handling ABI for the Arm® Architecture. Defines both the language-independent and
C++-specific aspects of how exceptions are thrown and handled.
RTABI
Run-time ABI for the Arm® Architecture. Governs what independently produced objects can
assume of their execution environments by way of floating-point and compiler helper-function
support.
If you are upgrading from a previous toolchain release, ensure that you are using the most recent versions
of the Arm specifications.
A.4 GCC compatibility provided by Arm® Compiler 6
A.5 Locale support in Arm® Compiler
Note
There is no support for Shift-Japanese Industrial Standard (Shift-JIS) encoded files.
A.6 Toolchain environment variables
ARM_TOOL_VARIANT
Required only if you have an Arm Development Studio or DS-5 Development Studio toolkit license
and you are running the Arm Compiler tools outside of that environment.
If you have an ultimate license, set this environment variable to ult to enable the Ultimate
features. See Product and toolkit configuration for more information.
ARM_PRODUCT_DEF
Required only if you have an Arm Development Studio toolkit license and you are running the Arm
Compiler tools outside of the Arm Development Studio environment.
Use this environment variable to specify the location of the product definition file. See Product
and toolkit configuration for more information.
ARMCOMPILER6_ASMOPT
An optional environment variable to define additional assembler options that are to be used
outside your regular makefile.
The options listed appear before any options specified for the armasm command in the makefile.
Therefore, any options specified in the makefile might override the options listed in this
environment variable.
ARMCOMPILER6_CLANGOPT
An optional environment variable to define additional armclang options that are to be used
outside your regular makefile.
The options listed appear before any options specified for the armclang command in the makefile.
Therefore, any options specified in the makefile might override the options listed in this
environment variable.
ARMCOMPILER6_LINKOPT
An optional environment variable to define additional linker options that are to be used outside
your regular makefile.
The options listed appear before any options specified for the armlink command in the makefile.
Therefore, any options specified in the makefile might override the options listed in this
environment variable.
ARMLMD_LICENSE_FILE
This environment variable must be set, and specifies the location of your Arm license file. See
Product and toolkit configuration for more information.
Note
On Windows, the length of ARMLMD_LICENSE_FILE must not exceed 260 characters.
C_INCLUDE_PATH
GCC-compatible environment variable. Adds the specified directories to the list of places that
are searched to find included C files.
COMPILER_PATH
GCC-compatible environment variable. Adds the specified directories to the list of places that
are searched to find subprograms.
CPATH
GCC-compatible environment variable. Adds the specified directories to the list of places that
are searched to find included files regardless of the source language.
CPLUS_INCLUDE_PATH
GCC-compatible environment variable. Adds the specified directories to the list of places that
are searched to find included C++ files.
TMP
Used on Windows platforms to specify the directory to be used for temporary files.
TMPDIR
Used on Red Hat Linux platforms to specify the directory to be used for temporary files.
Related information
Product and toolkit configuration
A.7 Clang and LLVM documentation
See the third_party_licenses.txt file in your installation for details of open source software projects
used.
Note
Although Arm Compiler 6 is based on Clang and LLVM technology, it:
• Is not based on the same revision as any specific release of the open source version of Clang or
LLVM;
• Can contain changes introduced by Arm which are not included in the open source version.
The third_party_licenses.txt file includes GitHub links for the specific revisions in the open source
project which are relevant to the particular version of Arm Compiler.
A.8 Further reading
Arm® publications
Arm periodically provides updates and corrections to its documentation. See Arm® Infocenter for current
errata sheets and addenda, and the Arm Frequently Asked Questions (FAQs).
For full information about the base standard, software interfaces, and standards supported by Arm, see
Application Binary Interface (ABI) for the Arm® Architecture.
In addition, see the following documentation for specific information relating to Arm products:
• Arm® Architecture Reference Manuals.
• Cortex®‑A series processors.
• Cortex®‑R series processors.
• Cortex®‑M series processors.
Other publications
This Arm Compiler tools documentation is not intended to be an introduction to the C or C++
programming languages. It does not try to teach programming in C or C++, and it is not a reference
manual for the C or C++ standards. Other publications provide general information about programming.
The following publications describe the C++ language:
• ISO/IEC 14882:2014, C++ Standard.
• Stroustrup, B., The C++ Programming Language (4th edition, 2013). Addison-Wesley Publishing
Company, Reading, Massachusetts. ISBN 978-0321563842.
The following publications provide general C++ programming information:
• Stroustrup, B., The Design and Evolution of C++ (1994). Addison-Wesley Publishing Company,
Reading, Massachusetts. ISBN 0-201-54330-3.
This book explains how C++ evolved from its first design to the language in use today.
• Vandevoorde, D and Josuttis, N.M. C++ Templates: The Complete Guide (2003). Addison-Wesley
Publishing Company, Reading, Massachusetts. ISBN 0-201-73484-2.
• Meyers, S., Effective C++ (3rd edition, 2005). Addison-Wesley Publishing Company, Reading,
Massachusetts. ISBN 978-0321334879.
This provides short, specific guidelines for effective C++ development.
• Meyers, S., More Effective C++ (2nd edition, 1997). Addison-Wesley Publishing Company, Reading,
Massachusetts. ISBN 0-201-92488-9.
The following publications provide general C programming information:
• ISO/IEC 9899:2011, C Standard.
The standard is available from national standards bodies (for example, AFNOR in France, ANSI in
the USA).
• Kernighan, B.W. and Ritchie, D.M., The C Programming Language (2nd edition, 1988). Prentice-
Hall, Englewood Cliffs, NJ, USA. ISBN 0-13-110362-8.
This book is co-authored by the original designer and implementer of the C language, and is updated
to cover the essentials of ANSI C.
• Harbison, S.P. and Steele, G.L., A C Reference Manual (5th edition, 2002). Prentice-Hall, Englewood
Cliffs, NJ, USA. ISBN 0-13-089592-X.
This is a very thorough reference guide to C, including useful information on ANSI C.
• Plauger, P., The Standard C Library (1991). Prentice-Hall, Englewood Cliffs, NJ, USA. ISBN
0-13-131509-9.
This is a comprehensive treatment of ANSI and ISO standards for the C Library.
• Koenig, A., C Traps and Pitfalls, Addison-Wesley (1989), Reading, Mass. ISBN 0-201-17928-8.
This explains how to avoid the most common traps in C programming. It provides informative
reading at all levels of competence in C.
See The DWARF Debugging Standard web site for the latest information about the Debug With Arbitrary
Record Format (DWARF) debug table standards and ELF specifications.
Appendix B
Arm® Compiler User Guide Changes
Describes the technical changes that have been made to the Arm Compiler User Guide.
B.1 Changes for the Arm® Compiler User Guide
Added information about linking objects compiled with different C or C++ standards.
• 3.3 Selecting source language options.
• 3.6 Linking object files to produce an executable.

Added a topic that describes the interaction of OVERLAY and PROTECTED attributes with armlink
merge options.
• 8.1.2 Interaction of OVERLAY and PROTECTED attributes with armlink merge options.

Added information about the effects of linking with a scatter file having ZI data in an execution
region.
• 8.7.3 Automatic placement of __at sections.

Added a note to include a .balign directive when defining your own sections with the armclang
integrated assembler.
• 1.7 Using the integrated assembler.

Minor improvements to the Getting Started section about compile and link steps, and clarification
of what the clobbered_register_list means when building programs with inline assembly code.
• 1.6 Compiling a Hello World example.
• 6.3 Writing inline assembly code.

Updated the description of the -marm command-line option to clarify that it gives an error, not a
warning, when used with an M-profile architecture.
• 3.2 Common Arm® Compiler toolchain options.

Added a note for the workaround when entry functions or Non-secure function calls have more than
4 arguments.
• 11.1 Overview of building Secure and Non-secure images.

Added chapters about the Scalable Vector Extension (SVE) compiler.
• Chapter 2 Getting Started with the SVE features in Arm® Compiler.
• Chapter 7 SVE Coding Considerations with Arm® Compiler.

Added a note about Arm Compiler and undefined behavior.
• 3.3 Selecting source language options.
• A.2 Standards compliance in Arm® Compiler.

Added a note about not specifying both the architecture (-march) and the processor (-mcpu).
• 3.1 Mandatory armclang options.
• 3.10 Selecting floating-point options.

Added details about the SVE and SVE2 intrinsics support.
• 7.2 Using SVE and SVE2 intrinsics directly in your C code.

Reworded the note about dynamic linking not being supported for Cortex‑M processors.
• Chapter 14 SysV Dynamic Linking.

Added a note clarifying that Arm Compiler 6 is not based on the same revision as any specific
release of the open source version of LLVM and Clang, and may contain Arm-specific changes which
are not included in open source versions.
• A.7 Clang and LLVM documentation.

Updated text and examples to clarify correct naming of sections when using #pragma clang section.
• 4.9 Scatter file section or object placement with Link-Time Optimization.

Added a note that all eXecute In Place (XIP) code must be stored in root regions.
• 8.4 Root region.
• 10.10 Root regions.

Improved explanation of when to use the volatile keyword to prevent unwanted removal of inline
assembler code when building optimized output.
• B.1 Changes for the Arm® Compiler User Guide.
• 6.3 Writing inline assembly code.

Added details of the new -Omin compiler option which minimizes code size.
• 3.4 Selecting optimization options.
• 4.6 Optimizing for code size or performance.

Removed outdated note about using __ARM_use_no_argv with -O0 optimization level in Arm Compiler 6.
The -O0 option now supports argv/argc optimization.
• 3.4 Selecting optimization options.

Added a note for OVERALIGN.
• 8.15 Alignment of execution regions and input sections.