Armv8-A
Agenda
Day 1
1. Armv8-A Overview
2. Exception model
Day 2
3. Memory Management Overview
4. Memory model
Day 3
5. DynamIQ Architecture update (8.1/8.2)
6. Barriers
7. DynamIQ Caches
Day 4
8. DynamIQ Cache Coherency
9. Virtualization
10. Synchronization
Day 5
11. SW GICv3 Programming
12. Booting
Appendix
13. Arm Glossary
14. Services and Support Overview
Copyright © 2023 Arm Limited (or its affiliates). All rights reserved.
Not to be reproduced by any means without prior written consent.
Architecture Overview
Armv8-A
[Figure: Arm architecture evolution, 1990–2011 — Armv4 (Thumb), Armv5 (Jazelle, VFPv2), Armv6 (TrustZone, SIMD), Armv7 (Thumb-2, VFPv3/v4, Adv SIMD, LPAE, Virtualization), and Armv8 with full Armv7 compatibility plus scalar floating point, enhanced crypto, improved virtualization, Secure EL2, vector extensions, bFloat, pointer authentication, and branch target identifiers]
AArch32
• Evolution of Armv7-A
• A32 (Arm) and T32 (Thumb) instruction sets
– Armv8-A adds some new instructions
• Traditional Arm exception model
• Virtual addresses stored in 32-bit registers
AArch64
• New 64-bit general purpose registers (X0 to X30)
• New instructions – A64, fixed length 32-bit instruction set
– Includes SIMD, floating point and crypto instructions
• New exception model
• Virtual addresses now stored in 64-bit registers
Agenda
Privilege levels
AArch64 registers
A64 Instruction Set
AArch64 Exception Model
[Figure: Exception levels and security states — Non-secure: Apps (EL0), Rich OS (EL1), Hypervisor (EL2); Secure: Trusted Services (EL0), Trusted OS (EL1); no EL2 in the Secure world]
When EL3 is using AArch32, in the Secure world the EL1 modes are treated as EL3
• No effect on the Normal world
• An AArch64 OS can host a mix of AArch32 and AArch64 applications
• An AArch32 OS cannot host an AArch64 application
• An AArch64 Hypervisor can host AArch64 and AArch32 OSs
• An AArch32 Hypervisor cannot host AArch64 OSs
Agenda
Privilege levels
AArch64 registers
A64 Instruction Set
AArch64 Exception Model
Register banks
AArch64 provides 31 general purpose registers: R0-R30
• Each register has a 64-bit (Xn) and 32-bit (Wn) form
Separate register file for floating point, SIMD, and crypto operations: V0-V31
2
• Each register has a 128-bit (Qn), 64-bit (Dn), 32-bit (Sn), 16-bit (Hn), and 8-bit (Bn) form
Other registers
AArch64 introduces the “zero” register: XZR and WZR
There are separate link registers for function calls (LR, i.e. X30) and for exception returns (ELR_ELx)
Processor state
AArch64 does not have a direct equivalent of the AArch32 CPSR
• Settings previously held in the CPSR are referred to as “Processor State” (or PSTATE) fields
– These fields are accessed individually
Fields      Description
NZCV        ALU flags
Q           Sticky overflow (AArch32 only)
DAIF        Exception mask bits
SPSel       SP selection (SP_EL0 or SP_ELx, AArch64 only)
CurrentEL   The current exception level
E           Data endianness (AArch32 only)
IL          Illegal flag. When set, all instructions are treated as UNDEFINED
SS          Software stepping bit
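As an illustration (a minimal sketch, not taken from the original slides), the individual PSTATE fields are accessed with MRS/MSR using dedicated special-register names:

    MRS  X0, NZCV          ; read the ALU flags
    MRS  X1, CurrentEL     ; read the current Exception level (in bits [3:2])
    MSR  DAIFSet, #0b0010  ; set the I bit - mask IRQs
    MSR  DAIFClr, #0b0010  ; clear the I bit - unmask IRQs
    MSR  SPSel, #1         ; select SP_ELx as the current stack pointer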
There is a set of rules known as the Procedure Call Standard (PCS) that specifies how registers should be used:
X0-X7        Parameter and Result registers
X8 (XR)      Indirect result location register
X9-X15       Corruptible registers
X16, X17     IP0, IP1 - intra-procedure-call corruptible registers
X18 (PR)     Platform Register
X19-X28      Callee-saved registers
X29 (FP)     Frame Pointer
X30 (LR)     Link Register
The PCS also covers the use of the floating-point and SIMD registers
The AArch64 general purpose registers map onto the AArch32 (banked) registers as follows:

X0-X7          X8-X15           X16-X23          X24-X30
X0:  R0        X8:  R8_usr      X16: LR_irq      X24: R8_fiq
X1:  R1        X9:  R9_usr      X17: SP_irq      X25: R9_fiq
X2:  R2        X10: R10_usr     X18: LR_svc      X26: R10_fiq
X3:  R3        X11: R11_usr     X19: SP_svc      X27: R11_fiq
X4:  R4        X12: R12_usr     X20: LR_abt      X28: R12_fiq
X5:  R5        X13: SP_usr      X21: SP_abt      X29: SP_fiq
X6:  R6        X14: LR_usr      X22: LR_und      X30: LR_fiq
X7:  R7        X15: SP_hyp      X23: SP_und
System control
Use the MRS instruction to read a system register, and MSR instruction to write to a system register
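For example (an illustrative sketch, not from the original deck — the SCTLR bit chosen is just one common case), system registers are named with a suffix giving the lowest Exception level that can access them:

    MRS  X0, MIDR_EL1        ; read the Main ID Register
    MRS  X1, MPIDR_EL1       ; read the core/cluster affinity
    MRS  X2, SCTLR_EL1       ; read the EL1 System Control Register
    ORR  X2, X2, #(1 << 2)   ; set the C bit (enable data/unified caches)
    MSR  SCTLR_EL1, X2       ; write it back
    ISB                      ; make the new control state visible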
ACTLR_ELx
• Auxiliary Control Register
• Controls processor-specific features
SCR_EL3
• Secure Configuration Register
• Controls Secure state and trapping of exceptions to EL3
HCR_EL2
• Hypervisor Configuration Register
• Controls Virtualization settings and trapping of exceptions to EL2
MIDR_EL1
• Main ID Register
• Specifies the type of processor that the code is running on (e.g. part number and revision)
MPIDR_EL1
• Multiprocessor Affinity Register
• Specifies the core and cluster IDs in a multi-core/multi-cluster system
CTR_EL0
• Cache Type register
• Specifies information about the integrated caches (e.g. the line size)
Agenda
Privilege levels
AArch64 registers
A64 Instruction Set
AArch64 Exception Model
A64 overview
AArch64 introduces new A64 instruction set
• Similar set of functionality as traditional A32 (Arm) and T32 (Thumb) ISAs
Syntax similar to A32 and T32
ADD W0, W2, W7 ; 32-bit addition, W0 = (W2 + W7)
ADD X0, X2, X7 ; 64-bit addition, X0 = (X2 + X7)
MOV X0, XZR ; Clear X0 to #0
Most instructions are not conditional
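Where A32 code relied on conditional execution, A64 typically uses conditional select instructions instead. A small illustrative sketch (register choices are arbitrary):

    CMP  W1, W2
    CSEL W0, W1, W2, GT      ; W0 = (W1 > W2) ? W1 : W2, with no branch
    CSET W4, EQ              ; W4 = 1 if the comparison was equal, else 0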
Agenda
Privilege levels
AArch64 registers
A64 Instruction Set
AArch64 Exception Model
AArch64 exceptions
In AArch64, exceptions are split between:
• Synchronous: Data Aborts from the MMU, Permission Faults, Alignment Faults, service call instructions (e.g. SVC), etc
• Asynchronous: IRQs, FIQs, SErrors (System Errors)
On taking an exception, the EL can either stay the same or get higher
• Exceptions are never taken to EL0
Synchronous exceptions are typically taken in the current EL
• HCR_EL2 controls routing to EL2
• SCR_EL3 controls routing to EL3
• Separate bits control routing of IRQs, FIQs, and SErrors
Taking an exception
[Figure: the top-level handler reached through the vector table branches to a handler for the specific exception]
Agenda
Privilege levels
AArch64 registers
A64 Instruction Set
AArch64 Exception Model
AArch64 Memory Model
Memory types
Normal
Device
Alignment
Unaligned data accesses are allowed to address ranges marked as Normal
• Trapping can be enabled separately for EL0/EL1, EL2, and EL3
– Controlled by SCTLR_ELx.A bits
Unaligned data accesses to addresses marked as Device will always trigger an exception
• Synchronous data abort
Instruction fetches must always be aligned
• A64 instructions must be 4-byte aligned (bits [1:0] = 0b00)
• Synchronous exception
Virtual addresses are 64-bit wide, but not all addresses are accessible
• Virtual memory address space split between two translation tables
– Each covering a configurable size, up to 48 bits of address space (TCR_ELx)
• Addresses not covered by either translation table automatically generate translation faults
[Figure: 64-bit virtual address space — TTBR1_EL1 covers the upper (kernel) region and TTBR0_EL1 covers the lower (Application) region from 0x0 up to 0x0000_FFFF_FFFF_FFFF; addresses between the two regions fault]
Each translation regime has its own translation table base and control registers:
• OS (EL1/EL0): TTBR0_EL1, TTBR1_EL1, TCR_EL1
• Hypervisor (EL2): TTBR0_EL2, TCR_EL2
• Secure Monitor (EL3): TTBR0_EL3, TCR_EL3
[Figure: each regime's virtual address space (Application/OS, Hypervisor, Secure Monitor) is mapped through its own translation tables onto the physical address space of RAM, Flash, and Peripherals]
There are two physical address spaces, Secure and Non-secure
These are in theory completely separate
• SP:0x8000 != NP:0x8000
• But most systems instead treat Secure and Non-secure as an attribute for access control
The Normal world can only access the Non-secure physical address space
The Secure world can access both physical address spaces
• Controlled through the translation tables
MPCore configurations
Many implementations of Arm processors have a multi-core configuration
• Multiple cores contained within the same block
Each core has its own MMU configuration, register bank, internal state, and Program Counter
• Core0 might be executing in Non-secure AArch32 EL0 while Core1 is executing in Secure AArch64 EL1
Cores can be powered and brought in and out of reset independently
• ID registers allow for discovery of core affinity
Each core has separate L1 data and instruction caches
• Hardware will maintain coherency between L1 data caches for certain memory types
• Some cache and TLB instructions are broadcast to other cores
• All cores share a common physical memory map
Appendix
Secure and Non-Secure are Security States
• EL3 is always Secure, EL2 is always Non-secure
• EL1/EL0 can be Secure or Non-secure (sometimes S.ELx or NS.ELx are used as shorthand)
A64, A32, and T32 are Instruction Sets
• A64 used when in AArch64
• A32 and T32 used when in AArch32
– In previous architecture versions A32 was called Arm, and T32 was called Thumb
Examples:
• Processor currently executing in EL3 as AArch64, executing A64 instructions
Where a bit can be RES in one Execution State and used in another
• The Architecture defines the bit field as writeable or “stateful”
The Execution State of the highest EL (entered on reset) defines the reset contents of System Registers
• If the highest EL uses AArch64 but lower ELs use AArch32, you may need to initialize Armv7/AArch32 System Registers with the expected Armv7/AArch32 reset values in software before changing EL
Armv8.1-A
Among other enhancements to the memory system architecture, Armv8.1-A adds the Privileged Access Never (PAN) state bit
Armv8.2-A
Armv8.2-A further extends Armv8.1-A
New cache clean operation to point of persistency (AArch64 only)
• DC CVAP, Xt
Execute never support in stage 2 translation extended (AArch64 only)
• Can now specify different attributes for EL1 and EL0
Optional half precision floating point
• Supports IEEE754-2008 formatted half-precision floating point data processing
• Base architecture already had support for converting to fp16 format
Armv8 software support now widely available in the open source community
Linux Kernel
• AArch64 support has been available in mainline for several releases
• Under arch/arm64/
Filesystems
• AArch64 kernel supports both legacy Armv7-A and AArch64 filesystem components
• Some guidance on building file-systems for AArch64 is available here https://wiki.linaro.org/HowTo/Armv8/OpenEmbedded
• Both Fedora and openSUSE have AArch64 releases
Prebuilt versions of gcc are available for download from Arm Developer at:
• https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-a
Linaro provides prebuilt AArch64 GCC toolchain binaries (GCC, GDB, etc.) with Linux and bare-metal library options
• These are available as cross or native toolchains: https://launchpad.net/linaro-toolchain-binaries/
Any version of gcc that cross compiles to AArch64 will work, but:
• Certain processor-specific optimizations and automatic feature detection (using -mcpu=xxx) are only available past certain gcc versions
Arm tools
• The Arm compiler supports AArch64 and is suitable for bare-metal/validation environments
• ArmDS includes debug support for Armv8 hardware and models:
– https://developer.arm.com/tools-and-software/embedded/arm-development-studio
• Fast Models allows the creation of DynamIQ CPU based Arm Virtual Platforms for software development
Thank You
Danke
Gracias
谢谢
ありがとう
Asante
Merci
감사합니다
धन्यवाद
Kiitos
شكراً
ধন্যবাদ
תודה
AArch64 Exception Model
Armv8-A
This module
This module will focus on the AArch64 exception model.
Agenda
• The AArch64 exception model
• Interrupts
• Synchronous exceptions
• SError exceptions
• Exceptions in EL2 and EL3
Exception Levels
• AArch64 has four exception levels, and two security states
AArch64 exceptions
• Synchronous
• Service Calls: SVC, HVC, and SMC (covered later)
• Aborts from MMU (e.g. Permission Faults, Alignment Faults)
• SP and PC alignment checking
• Unallocated instructions
• Asynchronous
2
• IRQ
• FIQ
• SError (System Error)
IRQ and FIQ are interrupt signals into the Arm core
Taking an exception
[Figure: the top-level handler reached through the vector table branches to a handler for the specific exception]
On taking an exception:
• PSTATE is updated (the EL stays the same or gets higher)
• The return address is stored to ELR_ELx
• The PC is set to the vector address
• ESR_ELx is updated with the cause of the exception
  ‐ Only if synchronous or SError exception
On returning (ERET):
• The PC is restored from ELR_ELx
Exception routing
• On taking an exception, the EL can either stay the same or get higher
• Exceptions are never taken to EL0
• Asynchronous exceptions can be configured to be routed to:
  • EL1 – The Rich OS kernel
  • EL2 – The Hypervisor
  • EL3 – The Secure Monitor
[Figure: an IRQ arriving while the Application runs at EL0 can be routed to the Rich OS (EL1) or the Hypervisor (EL2)]
Fields Description
NZCV ALU flags
DAIF Exception mask bits
• When taking an exception PSTATE is stored in the relevant Saved Program Status Register (SPSR)
• SPSR_EL3, SPSR_EL2, SPSR_EL1
• The SPSR also includes a mode field that holds the execution state
• M[4]=0: AArch64, M[4]=1: AArch32
[Figure: an IRQ taken from AArch32 application code at EL0 enters an AArch64 kernel IRQ handler at EL1, then exits back to EL0]
• The SPSR (populated by the core when taking the exception) includes the execution state and EL to return to
• But how does the core decide which execution state to enter when taking the exception?
• This is defined by the RW bit of the control register for the EL above the one that the exception is taken to
• In this example HCR_EL2.RW would configure the execution state for EL1
• For a cold reset the Execution state is determined by a configuration input signal
• For a warm reset the Execution state entered is determined by RMR_ELx.AA64
• The Execution state of all other ELs can be dynamically changed by software
• For an exception return, the SPSR_ELx.M[4] must match the execution state defined by the corresponding RW bit
• ELR_ELx is updated on exception entry
• Use the ERET instruction to return from an exception
• After taking an exception, ELR_ELx contains the preferred return address
• For service calls (e.g. SVC):
‐ The address of the next instruction after the system call instruction
• For other synchronous exceptions:
‐ The address of the instruction that generated the exception
• For asynchronous exceptions:
‐ The address of the first instruction that has not been executed, or not executed fully, as a result of taking the interrupt
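A minimal sketch (illustrative only — stepping over the trapping instruction is appropriate only for specific, emulated cases) of adjusting the return address in a synchronous exception handler:

    MRS  X0, ESR_EL1        ; syndrome - identifies the cause
    MRS  X1, ELR_EL1        ; preferred return address
    ADD  X1, X1, #4         ; step over the 4-byte instruction that trapped
    MSR  ELR_EL1, X1
    ERET                    ; return using ELR_EL1 and SPSR_EL1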
Exception stacks
• Each exception level has its own dedicated stack pointer
AArch32             AArch64
R0-R12              X0-X12
Banked SP and LR    X13-X23
Banked FIQ          X24-X30
• When moving from AArch32 to AArch64:
• Registers not accessible in AArch32 state retain their values from previous AArch64 execution
• For registers that are accessible in both execution states:
‐ Top 32 bits: UNKNOWN
‐ Bottom 32 bits: The value of the mapped AArch32 register
‐ Typically accessed as Wn registers
Vector tables
• Each exception level has its own vector table, pointed to by VBAR_ELn
  ‐ Except EL0, as exceptions are never taken to EL0
• If taken from a lower exception level, the vector block used depends on the execution state of the level below the level that the exception is taken to
  ‐ Example: Exception taken from EL0 to EL2 – the block used depends on the execution state of EL1
VBAR_ELn + offset   Exception type
0x780               SError / vSError     Exception from a lower EL,
0x700               FIQ / vFIQ           where the lower EL is using
0x680               IRQ / vIRQ           AArch32
0x600               Synchronous
0x580               SError / vSError     Exception from a lower EL and
0x500               FIQ / vFIQ           at least one lower EL is
0x480               IRQ / vIRQ           AArch64
0x400               Synchronous
0x380               SError / vSError     Exception from the current EL
0x300               FIQ / vFIQ           while using SP_ELx
0x280               IRQ / vIRQ
0x200               Synchronous
0x180               SError / vSError     Exception from the current EL
0x100               FIQ / vFIQ           while using SP_EL0
0x080               IRQ / vIRQ
0x000               Synchronous
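As an illustration (a sketch assuming an EL1 table; the handler labels are hypothetical), each entry is 0x80 bytes and the table itself must be 2KB aligned, so it can be built from aligned branch stubs and installed by writing VBAR_EL1:

        .balign 0x800
    vector_table_el1:
        b    sync_sp_el0_handler     ; +0x000: Synchronous, current EL, SP_EL0
        .balign 0x80
        b    irq_sp_el0_handler      ; +0x080: IRQ, current EL, SP_EL0
        ; ... the remaining 14 entries follow the same pattern ...

        LDR  X0, =vector_table_el1
        MSR  VBAR_EL1, X0
        ISB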
Agenda
• The AArch64 exception model
• Interrupts
• Synchronous exceptions
• SError exceptions
• Exceptions in EL2 and EL3
Interrupt handling
• The Arm processor has two external interrupt signals
• IRQ and FIQ
• The architecture doesn’t mandate how these signals are used, but FIQ is often reserved for secure interrupt sources
• Processor State (PSTATE) contains interrupt masks for the current exception level
• PSTATE.I – Mask IRQs
• PSTATE.F – Mask FIQs
• On taking an exception to AArch64, all PSTATE interrupt masks are set
• Interrupts can be re-enabled in software to support nested exceptions
[Figure: cores connect to the Generic Interrupt Controller (GIC); the Distributor routes interrupts to the cores]
• The GIC supports routing software generated, private and shared peripheral interrupts between cores in an MP system
[Animation: an IRQ interrupts the main application; the core enters the ASM IRQ handler through the vector table, which calls a C subroutine to identify the interrupt source, clear the interrupt source, and handle the interrupt]
ASM IRQ handler outline:
...
BL  identify_and_clear_source
BL  C_IRQ_Handler
...
LDP X2, X3, [SP], #16     ; Restore PCS corruptible registers
LDP X0, X1, [SP], #16
[Animation: the nested-interrupt version of the handler additionally saves SPSR_EL1 and ELR_EL1 around the call to the C handler; on exit it masks interrupts, restores SPSR_EL1 and ELR_EL1, and restores the corruptible registers before returning]
Nested interrupt handler (ASM portion):
MRS X0, SPSR_EL1
MRS X1, ELR_EL1
; Stack SPSR_EL1 and ELR_EL1
STP X0, X1, [SP, #-16]!
BL  identify_and_clear_source
; Unmask IRQs
MSR DAIFClr, #0b0010
BL  C_IRQ_Handler
; Mask IRQs
MSR DAIFSet, #0b0010
; Restore SPSR_EL1 and ELR_EL1
LDP X0, X1, [SP], #16
MSR SPSR_EL1, X0
MSR ELR_EL1, X1
; (restore the PCS corruptible registers, then ERET)
Agenda
• The AArch64 exception model
• Interrupts
• Synchronous exceptions
• SError exceptions
• Exceptions in EL2 and EL3
Synchronous exceptions
• Synchronous exceptions can occur for a wide variety of reasons
• Aborts from MMU (Permission Faults, Alignment Faults, regions of memory being marked as Faults, etc)
• SP and PC alignment checking
• Unallocated instructions
• Service Calls: SVC , HVC, and SMC
• These exceptions can be part of the normal operation of the OS
• For example: A way for a task to request allocation of more memory
‐ Handler loads new page of code or data
• Or to indicate a fault
• For example: Task attempts to access invalid memory location
‐ Handler terminates the process
System calls
• Some instructions can only be carried out at a specific exception level
• Example:
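A typical illustration (a sketch, not from the original slides — the service-number convention in X8 is just an example): an EL0 application uses SVC to request an OS service, and the EL1 handler identifies the call from ESR_EL1:

    ; EL0 side
    MOV  X8, #4             ; service number (illustrative)
    SVC  #0                 ; synchronous exception into EL1

    ; EL1 handler side
    MRS  X0, ESR_EL1
    UBFX X1, X0, #26, #6    ; EC field - 0b010101 means SVC from AArch64
    AND  X2, X0, #0xFFFF    ; ISS[15:0] holds the SVC immediate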
• The Fault Address Register (FAR_ELx)
  • Holds the faulting virtual address
  • For all synchronous instruction and data aborts and alignment faults
• The Exception Link Register (ELR_ELx)
• Holds the preferred return address
‐ For system calls – the next instruction after the call
‐ In most other cases, the instruction that triggered the exception
• Updated for synchronous exceptions and SErrors (Not updated for IRQ or FIQ)
• The Exception Syndrome Register (ESR_ELx)
  • The ISS (Instruction Specific Syndrome) field gives further information about the cause of the exception
Agenda
• The AArch64 exception model
• Interrupts
• Synchronous exceptions
• SError exceptions
• Exceptions in EL2 and EL3
SError exceptions
• New exception type for Armv8-A processors
Agenda
• The AArch64 exception model
• Interrupts
• Synchronous exceptions
• SError exceptions
• Exceptions in EL2 and EL3
• EL0 – Application
  • EL0 (user) cannot call directly into the Hypervisor or Secure Monitor
  • Can call the OS kernel (EL1) with the SVC instruction
• EL1 – Kernel
• Can call the Hypervisor (EL2) with the HVC instruction
• Can call the secure monitor (EL3) with the SMC instruction
• EL2 - Hypervisor
• Can call the secure monitor (EL3) with the SMC instruction
[Figure: Application (EL0), Rich OS (EL1), Hypervisor (EL2), Secure Monitor (EL3)]
• Example: IRQs routed to EL2 to be handled by the Hypervisor
• Example: Secure interrupts are signalled as FIQs, with FIQs routed to EL3
• For systems that implement EL2 (Hypervisors) or EL3 (Secure Kernels)
• Asynchronous exceptions can be routed to a higher EL to be dealt with by a Hypervisor or Secure kernel
‐ SCR_EL3 specifies exceptions to be routed to EL3
‐ HCR_EL2 specifies exceptions to be routed to EL2
‐ Separate bits control routing of IRQs, FIQs and SErrors
• It is possible to set the bits to indicate an exception should be routed to both EL2 and EL3
• It will then be routed to EL3
Example
• FIQ interrupts can be routed directly to EL3 using SCR_EL3.FIQ
• Other exceptions are routed to EL1 by the combination of the HCR_EL2 and SCR_EL3 values
[Figure: in Non-secure state with SCR_EL3.{EA=0, FIQ=1, IRQ=0}, FIQs from EL0/EL1 are routed to the Secure Monitor at EL3, while IRQs and SErrors are handled by the Kernel at EL1]
[Figure: example — AArch32 and AArch64 apps at EL0 over AArch64 Linux at EL1, with Firmware / Secure Monitor at EL3; Non-secure interrupts (IRQs) are taken in EL1, Secure interrupts (FIQs) are routed to EL3, and the kernel reaches EL3 via SMC]
[Figure: example — when executing in the Secure world (Trusted Services at EL0, Trusted OS at EL1), Secure interrupts (FIQs) are taken in EL1 and Non-secure interrupts (IRQs) are routed to EL3 (Firmware / Secure Monitor)]
• When executing in Secure state, HCR_EL2 does not affect interrupt routing
Memory Management
Armv8-A
Agenda
• Memory Management theory
• Translations at EL2 / EL3
• TLB maintenance
• Read/Write permissions for User/Privileged modes
• Memory types
• Caching/Buffering and access ordering rules for memory accesses
[Figure: memory map with regions such as uncached Peripherals, privileged-access OS space, and user-access Application space]
• For example: Making non-contiguous blocks of physical memory appear as a single block in the virtual address space
[Figure: scattered physical RAM blocks mapped to a contiguous virtual region visible to the OS and Application]
When the MMU is enabled, all accesses made by the core are passed through it
• MMU will use cached translations from the TLB(s), or perform a table walk
• Translation must occur before a cache lookup can complete
[Figure: the Arm core's MMU contains TLBs and a table walk unit; translations are read from translation tables in memory, and accesses then go to the caches and memory]
• The MMU combines the physical address bits from the block table entry with the bottom bits from the original virtual address
[Figure: a single-level translation table — the TTBR points to the table base, the VA indexes an entry, and each entry supplies a PA base plus attributes]
• The second-level table is indexed using the second bitfield of the VA (L2 Index)
• Table entries contain the physical base address of a 64KB block
The last bitfield of the VA provides the offset of the final physical output address
[Figure: two-level lookup — the table base plus L1 index selects an L1 entry pointing to an L2 table; the L2 index selects the block entry; the final VA bits give the offset within the block]
Agenda
• Memory Management theory
• Translations at EL2 / EL3
• TLB maintenance
AArch32 supports two translation table formats:
• Armv7-A Short Descriptor format
  ‐ Provides backward compatibility for legacy code
  ‐ Cannot be used by the Hypervisor (EL2 or 2nd stage translations)
• Armv7-A (LPAE) Long Descriptor format (as used by Cortex-A15)
  ‐ Very similar to the AArch64 format, but only supports a 32-bit input address
Level 0 descriptors only output Table addresses, Level 3 only output Block addresses
• Note: The same descriptor type code has a different format at Level 3
63                      Table Descriptor                       0
Attributes | Next-level Table Address | 1 1

63                 Block Descriptor (Levels 1, 2)              0
Upper Attributes | Output Block Address | Lower Attributes | 0 1
4KB granule
• 4-level look up, 48-bit address, 9 address bits per level (512 entries)
Virtual Address bits: [47:39] Level 0 Table Index | [38:30] Level 1 Table Index | [29:21] Level 2 Table Index | [20:12] Level 3 Table Index | [11:0] Block Offset
• Level 0 entries can point to an L1 Table (no Block entries)
• Level 1 entries can point to an L2 Table or a 1GB Block
• Level 2 entries can point to an L3 Table or a 2MB Block
• Level 3 entries can point to a 4KB Block (no Table entries)
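To make the index arithmetic concrete, a small illustrative sketch (not from the slides) that extracts the four table indices and the page offset from a 48-bit virtual address held in X0:

    UBFX X1, X0, #39, #9    ; Level 0 index = VA[47:39]
    UBFX X2, X0, #30, #9    ; Level 1 index = VA[38:30]
    UBFX X3, X0, #21, #9    ; Level 2 index = VA[29:21]
    UBFX X4, X0, #12, #9    ; Level 3 index = VA[20:12]
    AND  X5, X0, #0xFFF     ; offset within the 4KB page = VA[11:0]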
16KB granule
Virtual Address bits: [47] Level 0 Table Index | [46:36] Level 1 Table Index | [35:25] Level 2 Table Index | [24:14] Level 3 Table Index | [13:0] Block Offset
• Level 0 entries can point to an L1 Table (no Block entries)
• Level 1 entries can point to an L2 Table (no Block entries)
• Level 2 entries can point to an L3 Table or a 32MB Block
• Level 3 entries can point to a 16KB Block (no Table entries)
64KB granule
• 3-level look up, 48-bit address, 13 address bits per level (8192 entries)
  ‐ Top level is a partial table, 6 bits (64 entries)
Virtual Address bits: [47:42] Level 1 Table Index | [41:29] Level 2 Table Index | [28:16] Level 3 Table Index | [15:0] Block Offset
• Input virtual address can be a full 64-bit VA
‐ TTBR1_EL1 selected when upper 16 bits are all 1, TTBR0_EL1 selected when upper 16 bits are all 0
‐ Size of each address region controlled by TnSZ fields in the Translation Control Register (TCR_EL1)
• But, both of these regions must map to within a single 48-bit physical address space
[Figure: TTBR1 covers the top of the 64-bit VA space (down from 0xFFFF_FFFF_FFFF_FFFF), TTBR0 covers the bottom (up from 0x0)]
[Register diagram: TCR_EL1]
IPS
• Intermediate Physical address Size
T1SZ, T0SZ
• Kernel- and User-space Virtual Address Space Size
• Address range = 2^(64 - TnSZ)
TG1, TG0
• Kernel- and User-space Translation Granule
• Permitted address range is 2^(64 - TnSZ)
  ‐ Example: TnSZ = 34, address range is 30 bits
Table Address Range

Level   4KB Granule      16KB Granule     64KB Granule
0       40- to 48-bit    48-bit           –
1       31- to 39-bit    37- to 47-bit    43- to 48-bit
2       25- to 30-bit    26- to 36-bit    30- to 42-bit
3       –                25-bit           25- to 29-bit
[Register diagram: TCR_EL1 — IPS, TG1, SH1, ORGN1, IRGN1, T1SZ, TG0, SH0, ORGN0, IRGN0, T0SZ fields]
You must ensure that the attributes specified in the TCR_EL1 match those specified for the virtual memory region
covering the translation tables
Contiguous block entries
The Contiguous bit marks a set of adjacent translation table entries that can be cached as a single TLB entry; to use it:
• The Blocks must be adjacent (correspond to a contiguous range of VA)
  ‐ 16 adjacent blocks with 4KB granule
  ‐ 32 or 128 adjacent blocks with 16KB granule
  ‐ 32 adjacent blocks with 64KB granule
• Have consistent attributes
• Start on an aligned boundary
• Point to a contiguous output address range at the same level of translation
If these conditions are not met it is considered a programming error which may result in TLB aborts or corrupted
lookups
• For example:
  ‐ If any of the table entries do not have the contiguous bit set
Agenda
• Memory Management theory
• Translations at EL2 / EL3
• TLB maintenance
The Hypervisor and Secure Monitor also have a set of Stage 1 translation tables
• Mapping directly from VA to PA
[Figure: two-stage translation for a Guest OS — Stage 1 (Guest OS tables, TTBRn_EL1) maps the Guest's virtual memory map to the Guest's physical memory map, and Stage 2 (virtualization tables, VTTBR_EL2) maps that onto the real physical memory map; the Hypervisor (TTBR0_EL2) and Secure Monitor (TTBR0_EL3) each have their own single-stage virtual memory maps]
• Allows a single contiguous address space of variable size at the bottom of memory (up to 48-bit)
[Figure: the VTTBR-mapped address space runs from 0x0 up to 0x0000_FFFF_FFFF_FFFF]
The EL2 and EL3 stage 1 translations (TTBR0_EL2 / TTBR0_EL3) similarly allow a single contiguous address space of variable size at the bottom of memory (up to 48-bit)
  ‐ T0SZ[5:0] configures the size of the address space
  ‐ TG0 specifies the translation granule size
  ‐ First level of lookup is implicitly controlled by T0SZ and translation granule size
  ‐ Any access outside the defined address range causes a Translation Fault
[Figure: Hypervisor or Secure Monitor virtual address space from 0x0; addresses above the configured limit fault]
When in the Secure world, the EL1 translation regime has two differences from when it is in the Non-secure state:
• The second stage of translation is disabled
• The EL1 translation regime is capable of pointing to Secure or Non-secure physical addresses
Agenda
• Memory Management theory
• Translations at EL2 / EL3
• TLB maintenance
TLB maintenance
• Whenever the translation tables are modified the TLBs must be manually invalidated
• TLBs cannot cache table entries which result in a translation fault
AArch64 instructions
AArch64 TLB maintenance operations are performed via instructions
TLBI <type><level>{IS} {, <Xt>}
Type
• All - All TLB entries
• VMALL - All TLB entries (stage 1, for current guest OS)
• VMALLS12 - All TLB entries (stage 1 & 2 for current guest OS)
• ASID - Entries that match ASID in Xt
• VA - Entry for virtual address and ASID specified in Xt
• VAA - Entries for virtual address specified in Xt, with any ASID
• There are more…
Level
• En = ELn virtual address space (n can be 3, 2, or 1)
IS
• Inner-shareable operation
Examples:
TLBI VAE1, X0 ; Invalidate address/ASID in x0, for EL1 virtual address space
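A typical update sequence (an illustrative sketch — register choices and the Inner Shareable domain are assumptions) when a translation table entry for the virtual address in X1 has been changed:

    STR  X2, [X3]           ; write the updated descriptor to the table
    DSB  ISHST              ; ensure the write is visible to the table walkers
    UBFX X0, X1, #12, #44   ; Xt[43:0] = VA[55:12]
    TLBI VAAE1IS, X0        ; invalidate that VA, any ASID, EL1, Inner Shareable
    DSB  ISH                ; wait for the invalidation to complete
    ISB                     ; resynchronize the instruction stream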
Memory Model
Armv8-A
Memory model
• The memory map of a typical system is partitioned into logical regions
[Figure: memory map partitioned into regions — uncached Peripherals, privileged-access OS, user Application space]
Descriptor attribute fields (bit positions in a block/page descriptor):
Reserved for Software use [58:55] | UXN [54] | PXN [53] | Contig [52] | nG [11] | AF [10] | SH [9:8] | AP [7:6] | NS [5] | Indx [4:2]

• UXN, PXN   Executable permissions
• AF         Access flag
• nG         Non-global
• SH         Shareability
• AP         Access permissions
• NS         Security (EL3 and Secure-EL1 only)
• Indx       Index into Memory Attribute Indirection Register (MAIR_ELn)
Hierarchical attributes
• The descriptor format provides support for hierarchical attributes
• Used for:
  • Access permissions (APTable)
  • Security (NSTable)
  • Execute permissions (UXNTable, PXNTable)
• Attributes set in a table descriptor apply to all descriptors reached via that next-level table
Agenda
• Types
• Attributes
• Alignment and endianness
• Tagged pointers
Memory Types
Each defined memory region has a specified memory type
The memory type affects how the processor can access the region
• Shareability
• Cacheability
Normal memory gives the best performance because it imposes the fewest restrictions
• Allows the processor to re-order, repeat, and merge accesses
For optimal performance, application code and data should be marked as Normal
• Ordering can still be enforced when required using explicit barrier operations
Address regions marked as Normal can be accessed speculatively
• Data or instructions fetched from memory before being explicitly referenced
• Speculative access may, for example, be caused by:
  ‐ Branch prediction
  ‐ Out-of-order data loads
  ‐ Speculative cache line fills
• There is no requirement for Normal accesses to complete in order with respect to other Normal and Device accesses
• However, a processor must handle address dependencies
Device type imposes more restrictions on the core
Attempting to execute from a region marked as Device is UNPREDICTABLE
• Speculative instruction fetches are covered later…
[Figure: Device Peripherals region in the memory map]
Device regions should always be marked as execute never (XN), preventing speculative instruction fetches to them
Example: accesses to three Device regions — A, B (Device-GRE), and C (Device-nGnRnE):
...
LDR X2, [A+8]
LDR X3, [C]
LDR X4, [B+8]
(the full example also contains an earlier access to region A and to region B)
Effect of ordering rules:
• The two accesses to region A are guaranteed to be in program order with respect to each other
• The two accesses to region B are not guaranteed to be in program order with respect to each other, or with respect to the accesses to regions A and C
The table of types is held in the Memory Attribute Indirection Register (MAIR_ELn)
• Eight entry table, each entry is 8 bits
[Register diagram: MAIR_ELn holds eight 8-bit attribute fields, Attr0 in bits [7:0] through Attr7 in bits [63:56]]
For example:
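As an illustrative sketch (the encodings are standard, but the attribute choices and register use are just an example), MAIR_EL1 could be programmed with three common attributes that descriptors then select by index:

    ; Attr0 = 0xFF : Normal, Inner/Outer Write-Back, Read/Write-Allocate
    ; Attr1 = 0x04 : Device-nGnRE
    ; Attr2 = 0x00 : Device-nGnRnE
    MOV  X0, #0x04FF
    MSR  MAIR_EL1, X0
    ISB
    ; a block/page descriptor selects an attribute through its Indx field,
    ; e.g. Indx = 0 for Normal Write-Back memory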
Device memory types, from weakest to strongest: Device-GRE, Device-nGRE, Device-nGnRE, Device-nGnRnE

Access rules for stronger memory are still legal for weaker memory types
• For example: the Reordering attribute means accesses may be reordered, not that they must be reordered
An implementation may use the same behaviour for different memory types
• Must use the behaviour of the strongest type
• Bus infrastructures may not be able to express all memory types
Software can specify the weakest type necessary for correct operation
Agenda
• Types
• Attributes
• Alignment and endianness
• Tagged pointers
Cacheability
Regions marked as Normal can be Cacheable or Non-cacheable
• Specified as Inner and Outer attributes
• The divide between Inner and Outer is IMPLEMENTATION DEFINED
‐ Typically Inner attributes are used by the integrated caches and Outer attributes are exported onto the bus
[Figure: examples of the Inner/Outer Cacheable divide — e.g. core L1$ and L2$ as Inner Cacheable with an L3$ and system memory as Outer Cacheable, or L1$/L2$ Inner Cacheable with no outer cache]
Shareable
The Shareable attribute is used to define whether a location is shared with multiple processors
Example:
[Figure: cores within each processor cluster form Inner Shareable domains; the processors plus a Mali GPU form an Outer Shareable domain; a single core's private data is Non-shared]
These attributes can define sets of observers for which the Shareability attributes make the data/unified caches
transparent for data accesses
• Requires system to provide coherency management
The Shareability attribute indicates whether other masters will access the data
• Complex systems may make data visible using extra coherency logic
• PEs without coherent caches can share data by forcing memory to not be cached
Software that specifies the correct memory type for the desired behaviour will then work correctly on any
implementation
[Figure: PE0 and PE1 with data caches connected through coherency logic; a separate PE with a non-coherent cache]
Access permissions
Access permissions control whether a region is readable and/or writeable
AP    Unprivileged (EL0)   Privileged (EL1/2/3)
00    No access            Read/write
01    Read/write           Read/write
10    No access            Read-only
11    Read-only            Read-only
Executable
Blocks can be marked as executable or non-executable (XN)
• UXN – Unprivileged Execute Never
• PXN – Privileged Execute Never
• Setting these attributes prevents speculative instruction fetches
Processor can also be configured to treat writeable regions as Execute Never
• SCTLR_EL1.WXN
  ‐ Regions writeable at EL0 treated as XN at EL0 and EL1
  ‐ Regions writeable at EL1 treated as XN at EL1
• SCTLR_EL2/3.WXN
  ‐ Regions writeable at ELn treated as XN at ELn
• SCTLR.UWXN
  ‐ AArch32 only
  ‐ Regions writeable at EL0 treated as XN at EL1
[Figure: memory map — Peripherals and Application Data marked not executable, OS and Application Code executable]
Access flag
The Access Flag (AF) is a bit in each Block descriptor
• Abort handler must manually set the AF bit in the table entry
• Translation tables can be written with the AF bits set if this feature is not being used
• For Non-Global entries the TLBs record the current ASID
• TLB hit only occurs if the ASID in TLB entry matches current ASID
[Figure: a TLB entry tags the VA with the ASID (e.g. a Non-Global entry for VA tag 0xFFE records ASID 0x02 alongside the PA and attributes); the entry only hits when the current ASID matches]
ASIDs
ASID stands for Address Space Identifier
AArch64
• ASID is 8 or 16-bit, controlled by TCR_EL1.AS bit
• Current ASID specified in TTBR0_EL1 or TTBR1_EL1
‐ TCR_EL1 controls which TTBR holds the ASID
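As an illustrative sketch (assuming TCR_EL1.A1 = 0 so that the ASID is taken from TTBR0_EL1), a context switch can change the table base and ASID with a single register write:

    ; X0 = physical base address of the new translation table, X1 = new ASID
    BFI  X0, X1, #48, #16   ; put the ASID into TTBR0_EL1[63:48]
    MSR  TTBR0_EL1, X0      ; switch tables and ASID together
    ISB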
Reserved bits
Entries in the Long Descriptor format include bits reserved for use by the OS
• These bits are guaranteed to be ignored by the hardware
• So can be used to record OS specific information in the translation tables
63                      Table Descriptor                       0
Attributes | Next-level Table Address | 1 1

63                      Block Descriptor                       0
Upper Attributes | Output Block Address | Lower Attributes | 0 1
(bits [58:55] in each descriptor are Reserved for Software use)
Agenda
• Types
• Attributes
• Alignment and endianness
• Tagged pointers
Alignment
An unaligned access is where the address is not aligned to the element size
• All unaligned accesses will be faulted if the SCTLR_ELx.A bit is set for a given EL
Endianness
Data accesses can be little-endian (LE) or big-endian (BE)
• Instruction fetches are always LE
• SCTLR_ELx.EE controls ELx data endianness
• SCTLR_EL1.E0E controls EL0 data endianness
• CPSR.E and SCTLR.EE in AArch32
It is IMPLEMENTATION DEFINED whether a processor supports both LE and BE
• If only little endian is supported, .EE and .E0E bits become RES0
• If only big endian is supported, .EE and .E0E bits become RES1
Agenda
• Types
• Attributes
• Alignment and endianness
• Tagged pointers
Tagged pointers
[Figure: kernel region mapped via TTBR1_EL1 from 0xFFFF_0000_0000_0000 upwards, application region via TTBR0_EL1 from 0x0 up to 0x0000_FFFF_FFFF_FFFF, with a faulting gap between the two regions]
• The top byte of a 64-bit virtual address can be ignored for the purposes of translation
‐ Internally core uses bit [55] to sign-extend address to 64-bit format
• Allows bits [63:56] to be used to store other information
• Enabled through TCR_EL1, and controlled separately for each TTBR
Linux and the Access Flag
pte_mkyoung(pte)    Marks PTE as recently accessed
...
From arch/arm64/include/asm/pgtable.h:
...
PTE_BIT_FUNC(mkold, &= ~PTE_AF);     /* Clears AF bit */
PTE_BIT_FUNC(mkyoung, |= PTE_AF);    /* Sets AF bit */
...
From arch/arm64/include/asm/pgtable-hwdef.h:
...
#define PTE_AF    (_AT(pteval_t, 1) << 10)
...
(In AArch32 the ASID is held in a separate register, CONTEXTIDR)
This is dangerous: there is a race between setting the ASID and changing the TTBRn
• New ASID could be used for walks from old translation tables, or vice versa
DynamIQ Architecture update (Armv8.1 / Armv8.2)
Overview
Architecture versions
▪ Arm DynamIQ CPUs (such as Cortex-A55, Cortex-A75, and Cortex-A76) and Neoverse CPUs (E1 and N1) implement:
▪ Armv8.1 extensions
▪ Armv8.2 extensions
▪ Advanced SIMD and floating point support (Optional in Cortex-A55)
▪ Cryptographic extension (Optional)
▪ RAS extension
▪ Armv8.3 LDAPR instructions
▪ DynamIQ and Neoverse cores are fully backwards compatible with Armv8-A software
Agenda
▪ Large System Extensions
▪ Memory system
▪ Other Changes
▪ DynamIQ and Neoverse cores introduce new atomic memory access instructions to A64
▪ CAS - Compare and swap
▪ LD<OP> - Load and <operation>
▪ SWP - Swap
▪ DynamIQ and Neoverse cores support atomic instructions internally when memory is defined as
  ▪ Inner or Outer Shareable, Inner Write-Back, Outer Write-Back Normal memory with Read allocation hints and Write allocation hints
▪ Compare and Swap: CAS x0, x1, [x2]
    tmp = *x2
    if (tmp == x0) *x2 = x1
    x0 = tmp
▪ Load and Add: LDADD x0, x1, [x2]
    x1 = *x2
    *x2 = x1 + x0
▪ Swap: SWP w0, w1, [x2]
    tmp = *x2
    *x2 = w0
    w1 = tmp
▪ NOTE: These are pseudo code sequences; the actual execution of each instruction is atomic
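For comparison (an illustrative sketch, not from the slides), incrementing a shared counter with an LSE atomic versus the Armv8.0 exclusive-access loop:

    ; LSE (Armv8.1): a single atomic instruction
    MOV   W0, #1
    LDADD W0, W1, [X2]      ; W1 = old value, [X2] = old value + 1

    ; Armv8.0 equivalent using exclusives
retry:
    LDXR  W1, [X2]
    ADD   W3, W1, #1
    STXR  W4, W3, [X2]      ; W4 = 0 on success
    CBNZ  W4, retry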
[Figure: two cores issue LDADD to the same location; the atomic add is performed in the interconnect/memory, updating the value from 6 to 7]
Hardware update of the Access Flag
▪ Accesses to a page/block will cause the Access Flag bit to be set atomically by hardware, without generating an exception
▪ Can be triggered by speculative accesses
▪ Supported in AArch64 only
▪ DirtyBitModifier bit added to block and page descriptors (bit 51)
▪ If a page is marked as RO and DirtyBitModifier==1, writes to the page will cause the AP bits to be updated to RW
▪ Allows software to easily check whether a page has been written to
Agenda
▪ Large System Extensions
▪ Memory system
▪ Other Changes
▪ DynamIQ and Neoverse cores introduce the CnP (Common not Private) bit added to table pointer registers
▪ When set, VMIDs and ASIDs must have the same meaning on all PEs in the Inner Shareable domain
  ▪ Allows for sharing of TLB entries
[Figure: two PEs, each with an MMU and TLB, sharing caches and memory]
▪ Used by Neoverse-E1 for helping to implement SMT (each PE can have different tasks for the same ASID or VM)
▪ Ignored on Cortex-A55 and Cortex-A75
▪ Cortex-A76: only on DSU CHI configs; not supported on DSU ACE configs
Persistent memory
▪ Persistent memory behaves like DRAM
  ▪ Similar read/write random access times, but state persists over power down
▪ Architecture is technology agnostic
▪ DC CVAP pushes dirty data far enough out into the memory system that it will persist over power down
[Figure: cache hierarchy — I$/D$ at the PoU, a private unified cache, and a CHI interconnect with System Level Cache at the PoC]
Agenda
▪ Large System Extensions
▪ Memory system
▪ Other Changes
Other changes
▪ DynamIQ and Neoverse cores improve support for Type 2 hypervisors (discussed later)
▪ Adds support for 16-bit VMID
▪ Vector Saturating Rounding Doubling Multiply Accumulate Returning High Half
▪ Vector Saturating Rounding Doubling Multiply Subtract Returning High Half
▪ Hierarchical attributes can be disabled (AArch64 only)
▪ Controlled by the HPD bits in TCR_ELn
▪ When HPD==1, APTable, PXNTable and UXNTable bit in table entries are ignored by hardware
▪ Bits could potentially re-used by software, similar to the Reserved For Software Usage fields
▪ NSTable bit unaffected
Other changes
▪ The top 4 bits of block/page descriptors can be made IMPLEMENTATION DEFINED (instead of IGNORED)
▪ Controlled by TCR_ELn/VTCR_ELx.HWn bits
▪ Stage 2 translations can now specify separate execute attributes for EL0 and EL1
▪ XN field extended to two bits
15
Other changes
▪ ACTLR2 becomes mandatory in Armv8.2-A
▪ Was previously optional
▪ New bits added to SCTLR_EL1 and SCTLR_EL2 switch between new and legacy behaviour:
▪ LSMAOE – Load/Store Multiple Atomicity and Ordering Enable
▪ nTLSMD – no Trap Load/Store Multiple to Device-nG* regions
Armv8.1 features
[Table: Armv8.1 feature support]
Armv8.2 features

Armv8.2 feature                                    Supported?
Common Not Private translations (CnP)              Limited
EL0 vs EL1 execute never control at Stage 2        Yes
Page based hardware attributes                     Yes
PSTATE control to Modify LDTR/STTR                 Yes
FP16                                               Yes
Larger VA, IPA and PA support                      No
Persistent memory                                  Yes
Statistical profiling extension                    No
VMID aware PIPT instruction cache                  No
RAS extension                                      Yes
Overview
▪ Allow for many more IMPLEMENTATION DEFINED event types
▪ New architecturally defined event types to address the gap to standard software APIs such as Linux perf-events, and support for more levels of TLB and cache analysis
▪ These changes are made retrospectively to Armv8-A
DynamIQ and Neoverse Barriers
Memory model
Normal memory can be accessed speculatively, for example because of:
– Branch prediction
– Out-of-order data loads
– Speculative cache line fills
Speculative instruction fetches are allowed to any region that's executable at some exception level
Access order matters when, for example:
• Sharing data with other cores
  – e.g. mail boxes
• Sharing data with peripherals
  – e.g. DMA operations
• Modifying instruction memory
  – e.g. loading a program into RAM or scatter loading
• Modifying memory management scheme
  – e.g. context switching or demand paging
Where access order is important you may need to use barrier instructions
Barriers
The Arm architecture includes barrier instructions to force access order and access completion at a specific point
This module provides an introduction to barriers and their use, but if you are writing code where ordering is important we recommend also reading:
– Appendix F Barrier Litmus Tests
  ▪ Includes worked examples
Agenda
• Data barriers
• Instruction barriers
• DynamIQ and Neoverse extensions
DMB vs DSB
A Data Memory Barrier (DMB) is less restrictive than a Data Synchronization Barrier (DSB)
DMB
• The DMB only affects the ordering of explicit data accesses
– Data cache operations treated as explicit data accesses
• Ensures that all explicit data accesses before the DMB in program order are observed before any explicit access after the DMB
DSB
• No instruction or explicit data access after a DSB can be started until:
– All explicit data accesses before the DSB in program order have completed
– All cache, branch predictor and TLB maintenance operations issued by the local processor have completed
DMB
Explicit memory accesses before the DMB are observed before any explicit access after the DMB
• Does not guarantee when the operations happen, just the order
LDR X0, [X1]
DMB SY
ADD X2, X2, #1      ; May be executed before or after memory system sees the LDR
STR X3, [X4]        ; Must be seen by memory system after the LDR
The effects of any data/unified cache maintenance operations issued by this core before the DMB are observed by explicit data accesses after the DMB
• No effect on operations broadcast by other cores
DC  CVAC, X5
LDR X0, [X1]        ; Effect of data cache clean might not be seen by this instruction
DMB SY
LDR X2, [X3]        ; Effect of data cache clean is guaranteed to be observed by this instruction
DSB
A DSB is more restrictive than a DMB
In a multi-core system, if a cache/TLB/branch prediction maintenance operation is broadcast, the operation must have completed on all cores that received it
• Operations received by the core via broadcast do not affect DSBs
Different observers
The core’s instruction interface, data interface and MMU table walker are separate observers
• An observer is something that can make memory accesses (e.g. the MMU generates reads to walk translation tables)
    DC CVAU, X0   ; Operations are executed in any order
    IC IVAU, X0   ; despite address dependency. Could lead
                  ; to I cache re-fetching old values!
A DSB instruction is often needed between such operations:
    DC CVAU, X0
    DSB ISH
    IC IVAU, X0   ; I cache now guaranteed to see new values
DMB and DSB take a qualifier option which defines the range of applicability for the barrier
• Optional in Armv7-A, mandatory in Armv8-A
• Shareability domain:
  – Full System
  – Outer Shareable
  – Inner Shareable
  – Non-shareable
Barrier Qualifiers
Qualifier   Ordered Accesses (Before-After)   Shareability Domain
NSHLD       Load-Load, Load-Store             Non-shareable
NSHST       Store-Store                       Non-shareable
NSH         Any-Any                           Non-shareable
ISHLD       Load-Load, Load-Store             Inner Shareable
ISHST       Store-Store                       Inner Shareable
ISH         Any-Any                           Inner Shareable
OSHLD       Load-Load, Load-Store             Outer Shareable
OSHST       Store-Store                       Outer Shareable
OSH         Any-Any                           Outer Shareable
LD          Load-Load, Load-Store             Full System
ST          Store-Store                       Full System
SY          Any-Any                           Full System
; Write a new message into mail box
    STR X5, [X1]
    DMB ISHST

; Wait for available flag
loop:
    LDR  X12, [X2]
    CBNZ X12, loop
P0:
    DC CIVAC, X1       ; Clean & Invalidate region
    DMB SY
    STR W0, [X2]       ; Send flag to E1
    WAIT ([X4] == 1)   ; Has E1 completed?
    DMB SY
    DC IVAC, X1        ; Invalidate region again
    LDR W5, [X1]       ; Read new data

E1:
    WAIT ([R2] == 1)   ; Is P0 ready?
    STR R5, [R1]       ; Save data
    DMB
    STR

After the first data cache clean/invalidate, P0 could speculatively re-fetch the region into the data cache
• If this happened during the writes by E1 the data cache could be populated with the wrong (old) data
• The barrier forces the data to be read after the completion flag is seen
  – BUT then P0 would be reading that data from the data cache!
• Set up data in memory (uncached buffer)
• Write to register to have DMA controller copy data to another location
    STR X5, [X1]   ; Store data to source buffer
    DSB ST
    STR W0, [X4]   ; Write to DMA controller to begin transfer
The DSB is needed to ensure that the data is visible (globally observable) at the point in time that the DMA controller receives the command
Load-Acquire (LDAR)
• All accesses after the LDAR are observed after the LDAR
• Accesses before the LDAR are not affected
    LDR
    STR
    LDAR    ; Accesses can cross a barrier in one direction but not the other
    LDR
    STR
Store-Release (STLR)
• All accesses before the STLR are observed before the STLR
• Accesses after the STLR are not affected
    LDR
    STR
    STLR    ; Accesses can cross a barrier in one direction but not the other
    LDR
    STR
• LDAXR, STLXR
• Remove the need for explicit barrier instructions in synchronization code
    LDR
    STR
    LDAR
    LDR     ; Critical code section
    STR
    STLR
    LDR
    STR
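As a minimal sketch (not from the original slides), a spinlock built from the acquire/release instructions needs no separate DMB around the critical section; register roles are assumptions (X0 holds the lock address):
lock:
    LDAXR W1, [X0]        ; Load-Acquire Exclusive of the lock value
    CBNZ  W1, lock        ; Already locked – try again
    MOV   W1, #1
    STXR  W2, W1, [X0]    ; Try to claim the lock
    CBNZ  W2, lock        ; Exclusivity lost – try again
    ; ... critical code section ...
    STLR  WZR, [X0]       ; Store-Release publishes the critical section's writes and frees the lock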
    LDR
    STR
    STLR    ; Regular accesses can cross the STLR barrier, but LDAR cannot
    LDR
    STR
    LDAR
Agenda
• Data barriers
• Instruction barriers
• DynamIQ and Neoverse extensions
ISB
The Arm architecture defines context as the system registers
The effect of a context-changing operation is only guaranteed to be seen after a context synchronization event, such as:
• Taking an exception
• Returning from an exception
• Instruction Synchronization Barrier (ISB)
An ISB flushes the pipeline, and re-fetches the instructions from the cache (or memory)
• Guarantees that the effects of any completed context-changing operation before the ISB are visible to any instruction after the barrier
• Also guarantees that context-changing operations after the ISB instruction only take effect after the ISB has been executed
ISB example
FPU/Advanced SIMD are enabled in AArch64 by writing the Coprocessor Access Control system register (CPACR_EL1)
The ISB is a context synchronization event which ensures the enable is complete before any subsequent FPU/NEON instructions are executed
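A minimal sketch of that sequence (the CPACR_EL1.FPEN encoding, bits [21:20] = 0b11 for no trapping, is an assumption from the architecture rather than from the slide):
    MRS  X0, CPACR_EL1
    ORR  X0, X0, #(0x3 << 20)   ; Set FPEN so FP/SIMD is not trapped (assumed field position)
    MSR  CPACR_EL1, X0
    ISB                         ; Context synchronization: enable is seen by later instructions
    FMOV D0, XZR                ; FP/SIMD instructions are safe from this point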
    TLBI VAE1IS, X10   ; Invalidate affected VA
    DSB ISH            ; Ensure completion of the TLB invalidation
    ISB                ; Synchronize context on this processor
The DSB is needed to ensure that the maintenance operations complete
The ISB is needed to ensure the effects of those operations are seen by the following instructions
P0 loads a new program into memory, which then gets executed by P0 and P1
P0:
    STR X11, [X1]    ; Save instruction to program memory
    DC CVAU, X1      ; Clean D$ so instruction is visible to I$
                     ; (Note that clean to PoU may be NoP’d)
    DSB ISH          ; Ensure clean completes on all cores
    IC IVAU, X1      ; Discard stale data from I$
P1-Pn:
    WAIT ([X2] == 1) ; Wait for flag signaling completion
There is no barrier between the writes to program memory and the D cache clean
• Both operations specify the same address, and both are data side operations (same observer)
• So they are guaranteed to be in program order
A DSB is needed between the data cache clean and the I cache invalidate
• Although both operations specify the same address, one is a data side operation and the other is an instruction side operation (different observers)
• The DSB forces the data side operation to complete before the instruction side operation starts
In a coherent system, the DSB forces the cache operations to complete not just on P0 but also on the other cores
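Putting the pieces together, a fuller sketch of the whole sequence (the final synchronization steps and the flag store to [X2] are assumptions added for completeness):
P0:
    STR X11, [X1]    ; Save instruction to program memory
    DC  CVAU, X1     ; Clean D$ so instruction is visible to I$
    DSB ISH          ; Ensure the clean completes on all cores
    IC  IVAU, X1     ; Discard stale data from I$
    DSB ISH          ; Ensure the invalidate completes on all cores
    ISB              ; Re-fetch instructions on this core
    MOV W0, #1
    STR W0, [X2]     ; Signal completion to P1-Pn
P1-Pn:
    WAIT ([X2] == 1) ; Wait for flag signaling completion
    ISB              ; Re-fetch instructions before executing the new code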
Agenda
• Data barriers
• Instruction barriers
• DynamIQ and Neoverse extensions
Limited Ordering Regions (LORegions)
Non-secure physical address space is divided into LORegions
• Each LORegion is defined by one or more descriptors
  – The number of supported regions and descriptors is reported by LORID_EL1
Each descriptor is defined by:
• Start address (LORSA_EL1) and end address (LOREA_EL1)
• Region number (LORN_EL1); all the descriptors with the same region number together form a region
LDLAR – Load-Acquire, within LORegion
• Explicit data accesses after the barrier, that access an address within the same LORegion, are observed after the barrier
STLLR – Store-Release, within LORegion
• Explicit data accesses before the barrier, that access an address within the same LORegion, are observed before the barrier
Accesses by PEs on Socket 0 to LORegion0 don’t need to wait for accesses to LORegion1 to complete (and vice versa).
    LDR   x8, B     ; B: Normal memory
    LDLAR x0, A     ; A: within LORegion 1
    STLLR x2, A
    STR   x4, C     ; C: Normal memory
    LDR   x3, A
Accesses to an address in a different LORegion (B, C) are unaffected by the LDLAR/STLLR.
DynamIQ and Neoverse CPUs support a new Load-Acquire instruction with weaker release consistency
• LDAPR / LDAPRB / LDAPRH
• These instructions use a “processor consistent” consistency model
The requirement that Load-Acquires be observed in order with preceding Store-Releases is dropped for these new Load-Acquire instructions
    LDR
    STR
    STLR
    LDAPR   ; May be observed before the preceding STLR
    LDR
    STR
Appendix
At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
(Section 5.1.2.3 of the C Specification, ISO/IEC 9899:TC3)
Examples of sequence points include function calls and accesses to volatile variables
When writing low-level code in C/C++ you may need to consider sequence points as well as the architectural barriers
Where the sender and receiver(s) can both see the GIC, a DMB is enough for correct operation
• The DMB ensures that the second store cannot be observed without the first store also being observable
• For P1 to receive the interrupt, the write to [x4] must be globally visible
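The P0/P1 code itself is not reproduced above; a minimal sketch consistent with the description (register roles are assumptions: X1 points at the shared data, X4 at the GIC register used to raise the interrupt):
P0:
    STR X5, [X1]     ; Store the payload
    DMB SY           ; Order the payload store before the interrupt trigger
    STR W0, [X4]     ; Write to the GIC to raise the interrupt for P1
P1 (in its interrupt handler):
    LDR X5, [X1]     ; Payload is guaranteed observable once the interrupt is seen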
In this example a custom component (IRQGen) is used to generate an interrupt
• This component is only visible to P0, never visible to P1
DynamIQ and Neoverse Caches
Agenda
Caches
• DynamIQ and Neoverse processors are implemented with multiple levels of cache
• Small, separate L1 Instruction and Data caches per core
• Similarly sized unified L2 cache per core
• A larger unified L3 cache per cluster with an integrated snoop filter
The MMU uses translation tables and translation registers to control which memory locations are cached
• Virtual address used to determine the location of data in cache (Tag / Index / Offset)
• Dirty data bit(s): indicate whether the data in the cache line is not coherent with external memory
[Figure: the CPU issues a virtual address; the MMU (TLB / page table walk) produces a physical address used to look up the I-Cache and D-Cache. The address is split into Tag, Set (Index), Word and Byte fields, and each cache line holds a Tag, valid (v) and dirty (d) bits, and the Data.]
Example: LDR X0, [0x0000007C]
1) Cache lookup is performed
2) Cache miss; tag matches fail for given index in both cache ways
3) Cache linefill is performed
4) Victim counter specifies which cache way to use (will evict previous data)
5) Cache returns requested word to the CPU
• CPUs (typically) allocate into L1 caches for a cacheable read (including instruction fetch) or a write
• Allocation into L2 depends on CPU behavior
• For eviction (when allocating a cache line into a full cache), the cache selects the way indicated by the victim counter
  • Victim counter can select by random, round-robin, or least recently used (LRU) – again, depending on the CPU
• There are several options for cache behavior between L1 and L2:
  • Strictly inclusive: Any cache line present in an L1 cache will also be present in the L2
  • Weakly inclusive: Cache line will be allocated in L1 and L2 on a miss, but can later be evicted from L2
  • Fully exclusive: Any cache line present in an L1 cache will not be present in the L2
• For fully exclusive D-caches, data typically only allocates into L2 following an L1 eviction
  • There are some exceptions – preloads, write streaming (details in the TRM for each CPU)
CPU[0] issues Read A:
• L1 miss
• L2 miss
• L3 miss (and snoop filter miss)
• SCU issues Read on master interface
• Data is returned on master interface
• Does not allocate into L3
• Data is allocated into Core 0’s L2 and L1-D; the snoop filter records that core 0 holds A
L3 cache allocation
• Core 0 evicts the line from its cache
  – Core 0 de-allocation, L3 allocation
• Core 1 issues a read, hitting in the L3 cache
  – L3 de-allocation, Core 1 allocation (Exclusive)
• Cache line will be valid in only one cache until accessed by multiple cores
Agenda
• Policies include:
  • Cacheable / Non-cacheable
  • Cache policies for data regions include:
    ‐ Read / Write-allocate
    ‐ Write-Back Cacheable / Write-Through Cacheable
    ‐ Shareability (discussed elsewhere)
• For cache coherency across multi-core systems, specific cache policies and memory attributes must be used
  • Arm Cortex-A processors require Write-Back and Shareable
• Update policies
  • Write-Back: write may update the cache only (and the cache line is marked as dirty)
  • Write-Through: write updates both the cache and the external memory system
• The implementation determines which cache allocation and update policy combinations are available
• SCTLR.C: Data / unified cache enable bit
  ‐ Not exactly a “cache enable”, but allows cores to issue cacheable memory accesses
  ‐ Those accesses can look up and allocate into data/unified caches
  ‐ When SCTLR.C = 0, all Normal memory accesses are downgraded to Normal Non-cacheable (no lookup, no allocate)
• SCTLR.I: Instruction cache enable bit
  ‐ When SCTLR.I = 1:
    ‐ L1 I-cache allocation is possible
    ‐ Allocation into downstream caches (e.g. L2/L3) is possible when the effective memory type is Write-Back and SCTLR.C = 1
• Write-Through: write updates both the cache and the external memory system
  ‐ Write-Through accesses do not produce Dirty data
• Write-Back: writes are allowed to only update the cached copy of the data
  ‐ Write-Back accesses can lead to Dirty data
  ‐ Eviction of a cache line containing Dirty data results in a write to the next level of memory
[Figure: the cores’ L1$/L2$ are in the Inner Cacheable domain; the L3$ and System Level Cache, behind the bus interface, are in the Outer Cacheable domain, in front of system memory.]
Cores can perform speculative reads:
• Meaning the core can potentially automatically load data it thinks will be used
• Core-specific algorithms will determine when speculative read accesses will occur
• For instance: the core will start speculatively pre-fetching data if code performs consecutive loads from memory
• PRFM instructions indicate addresses that are likely to be accessed in the near future
• Helps allocate cache lines before they are needed
• Several variants: PLDL1KEEP, PLIL2STRM, PSTL3STRM, …
The operation is built as <Prefetch type> <Level of cache> <Single-use or multi-use>:
  Prefetch type:            PLD – Prefetch for load, PST – Prefetch for store, PLI – Preload instructions
  Level of cache:           L1 – Level 1 cache, L2 – Level 2 cache, L3 – Level 3 cache
  Single-use or multi-use:  KEEP – Retained or temporal prefetch (allocated in the cache normally)
                            STRM – Streaming or non-temporal prefetch (for data that is used only once)
• AArch32 prefetch hint instructions: PLD Rm, PLI Rm
  – Preload data or instructions from address in Rm to cache
• Behavior can be hinted at by either marking memory in page tables, or by using specific instructions
  – For example, memory not marked as transient in page table will be marked in cache if allocated using those instructions
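For example (a sketch, not from the slide), hinting that the next cache line of a buffer will soon be loaded:
    PRFM PLDL1KEEP, [X0, #64]   ; Prefetch for load into L1, retained (temporal) allocation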
• LDNP and PRFM *L1STRM cause allocation into the L1 data cache
  ‐ But cache lines will be marked as non-temporal/transient
• STNP: Store Non-temporal Pair
  ‐ May or may not allocate into the CPU’s caches
Agenda
Cache maintenance
• Data or instructions may be fetched from the caches rather than from external memory
• Type of operation
  ‐ Invalidate – makes changes in the outer domains visible to the cache user
  ‐ Clean – makes changes in the cache visible to outer domain user(s)
  ‐ Zero – zero a block of memory (only available for data caches)
• Which entries
  ‐ All – the entire cache (not architecturally supported for data/unified caches)
  ‐ MVA or VA – a cache line containing a specific virtual address
  ‐ Set/Way – a specific cache line
• Scope
  ‐ PoC – Point of Coherency
  ‐ PoU – Point of Unification
  ‐ PoP – Point of Persistence
• Shareability
  ‐ Operations that can be broadcast
• PoU: Point at which instruction, data, and TLB accesses see the same copy of memory
• PoC: Point at which all agents see the same copy of memory, generally the memory system
[Figure: for CPU A and CPU B, the Point of Unification is where each core’s I$, D$ and TLB see the same copy of memory (at the per-core L2 cache); the Point of Persistence (PoP) sits at the NVRAM behind the DSU L3 cache (Cortex-A76 shown).]
• Only on DSU CHI configs; not supported on DSU ACE configs
AArch64 instructions
• AArch64 cache maintenance operations are initiated by dedicated instructions, passing an address argument where required
• Operation format: <function> <type> [<point>] {IS}
  Cache:    IC – Instruction Cache          DC – Data Cache
  Function: I – Invalidate                  C – Clean
            CI – Clean & Invalidate         Z – Zero
  Type:     VA – By Address                 SW – By Set/Way
            ALL – Entire cache
  Point:    U – Point of Unification        C – Point of Coherency
            P – Point of Persistence
  IS – Inner Shareable
Maintenance broadcasts
• In multi-core systems, we do not know which core may have a specific address in its caches
• For instance, the core issuing a cache clean/invalidate by VA may not be the core that holds the addressed cache line
Instruction      Description                                                    Broadcast?
IC IVAU, Xt      I-Cache Invalidate by Address to Point of Unification          Based on VA
DC IVAC, Xt      D-Cache Invalidate by Address to Point of Coherency            Based on VA
DC ISW, Xt       D-Cache Invalidate by Set/Way                                  No
DC CVAC, Xt      D-Cache Clean by Address to Point of Coherency                 Based on VA
DC CSW, Xt       D-Cache Clean by Set/Way                                       No
DC CVAU, Xt      D-Cache Clean by Address to Point of Unification               Based on VA
DC CIVAC, Xt     D-Cache Clean & Invalidate by Address to Point of Coherency    Based on VA
DC CISW, Xt      D-Cache Clean & Invalidate by Set/Way                          No
For operations by VA, the Shareability attribute of the address determines whether the operation is broadcast, and to which domain.
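For instance (a sketch; the mapping of the address in X1 as Inner Shareable, Write-Back is the assumption), a clean by VA to the PoC is broadcast to the other cores in the domain:
    DC  CVAC, X1    ; Clean by VA to PoC – broadcast because the VA is Inner Shareable
    DSB ISH         ; Wait for the clean, including the broadcast copies, to complete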
Cache Discovery
Appendix
[Figure: each cache level (Level 1, Level 2, …) has its own line size, number of sets and number of ways.]
• The cache line size is listed in the Cache Type Register (CTR, CTR_EL0)
  • Can be made accessible to EL0 by setting the UCT bit of the System Control Register (SCTLR, SCTLR_EL1)
• Two register accesses are needed to determine the number of sets and ways:
  • Write to the Cache Size Selection Register (CSSELR, CSSELR_EL1) to select which cache you want information for
  • Read the Cache Size ID Register (CCSIDR, CCSIDR_EL1)
• The Data Cache Zero ID Register (DCZID_EL0) contains the block size that will be zeroed by Zero operations
  • SCTLR.DZE/SCTLR_EL1.DZE controls whether execution of the DC ZVA instruction is allowed at EL0
  • HCR.TDZ/HCR_EL2.TDZ controls trapping of the DC ZVA instruction
Non-integrated caches
• CLIDR/CLIDR_EL1 only tell you how many levels of cache are integrated into the core
• The core is not aware of how many cache levels are outside
• For example: if only L1 and L2 are integrated, CLIDR/CLIDR_EL1 will identify 2 levels of cache
• May need to take non-integrated caches (e.g. an external L3 on the AMBA interconnect) into account when performing cache maintenance
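A minimal discovery sketch based on the registers above (the CCSIDR_EL1 field positions are assumptions from the standard register layout, not from the slide):
    MSR  CSSELR_EL1, XZR     ; Select the Level 1 data/unified cache (Level = 0, InD = 0)
    ISB                      ; Ensure the selection is seen by the following read
    MRS  X0, CCSIDR_EL1
    UBFX X1, X0, #0,  #3     ; LineSize  = log2(bytes per line) - 4   (assumed field)
    UBFX X2, X0, #3,  #10    ; Associativity - 1                      (assumed field)
    UBFX X3, X0, #13, #15    ; NumSets - 1                            (assumed field)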
AArch32 Maintenance Instructions
Appendix
AArch32 mnemonics
• AArch32 cache / branch prediction maintenance operations are CP15 operations
  IC – Instruction cache
  DC – Data cache / unified cache
  BP – Branch predictor
  U – PoU
  C – PoC
  IS – Inner Shareable
Examples:
• DCCIMVAC – Data cache clean and invalidate to PoC, by MVA
• BPIMVA – Branch predictor invalidate by MVA
Instruction    Description                                       Broadcast?
DCCMVAU        D-Cache Clean by MVA to PoU                       Maybe*
DCCIMVAC       D-Cache Clean & Invalidate by MVA to PoC          Maybe*
DCCISW         D-Cache Clean & Invalidate by Set/Way             No
ICIALLUIS      I-Cache Invalidate All to PoU Inner Shareable     Yes (inner only)
ICIMVAU        I-Cache Invalidate by MVA to PoU                  Maybe*
TLBIMVA        TLB Invalidate by MVA                             No**
TLBIMVAIS      TLB Invalidate by MVA Inner Shareable             Yes (inner only)
* / ** – see the details of “broadcast by shareability”
Appendix
• The architecture permits instruction fetches to be performed speculatively, and fetched instructions may be cached
  • This applies even when SCTLR_ELn.I=0
• This means you must ALWAYS invalidate the instruction cache(s) after writing to instruction memory.
  ‐ For example:
    DSB ISH        ; Ensure visibility of the data stored
    IC IVAU, Xn    ; Invalidate instruction cache by VA to PoU
    DSB ISH        ; Ensure completion of the invalidations
    ISB
Appendix
• The CPU module will list the policy each CPU implements
• For VIPT and PIPT caches, maintenance is necessary when data is written to a physical address that contains instructions
• If the same physical address is held in a VIPT cache addressed by different virtual addresses:
  • Cache operations by VA are not guaranteed to affect all copies of that PA
  • IC IALLx operations are necessary to affect all aliases
Cache Coherency
Agenda
• Introduction to coherency
[Figure: Core 0 and Core 1, each with their own cache, and a DMA engine all share Memory.]
• Cache coherency is an issue in any system that:
  • Contains one or more caches
  • Has more than one device sharing data in a single cached memory area
Coherency can be maintained in software or in hardware
• Hardware coherency support for SMP Operating Systems running on all of the CPUs
• Data from pages marked as Shareable and Write-Back cacheable can be cached and kept coherent between caches
• Maintenance instructions on one core may be broadcast to other cores
[Figure: two DynamIQ clusters, each with per-core L2 caches and a DSU L3 cache, sharing Memory.]
[Figure: four Neoverse-N1 cores, each with its own D$/I$ and L2, connected through a CHI-based Coherent Interconnect to Memory.]
Shareability domains
• Should be set at system design time
[Figure: groups of CPUs and a GPU arranged into Non-shared, Inner Shareable and Outer Shareable domains.]
• For Cacheable regions, the translation tables specify which domains will access a particular region of memory
• This controls how the processor handles cache coherency
• The translation tables must accurately reflect which domains will access a given location
• Hardware coherency support for an SMP OS running across multiple DynamIQ or Neoverse clusters
• Data from pages marked as Shareable can be cached
• Instruction cache and TLB maintenance operations are broadcast across the interconnect
  – In AMBA these are sent as Distributed Virtual Memory (DVM) messages
  – DVM messages are only sent between masters in the same Inner Shareable domain
• Marking an area as Non-cacheable or Device implicitly marks that region of memory as being accessed by masters in the System domain
[Figure: a CPU cluster (ACE) with per-core D$/I$, L2 and L3 caches, a Mali-T604 GPU with shader cores and a cache (ACE-Lite), and a DMA engine (ACE-Lite), all connected to Memory through the coherent interconnect.]
Agenda
• Introduction to coherency
• Coherency details – multi-processor systems
• On DynamIQ and Neoverse clusters, level 1, 2, and 3 caches are all part of the inner cache domain
• Cache lines are tracked with MESI-style states, including:
  ‐ Shared – Cache line is clean and may be present in more than one L1 cache
  ‐ Invalid – Cache line is invalid
• Note: the Arm architecture does not dictate the mechanisms used to manage coherency
  – This is a micro-architectural detail that can vary significantly between implementations
Worked example: Core 0 and Core 1 each read-modify-write a different word of the same cache line, then Core 0 cleans the line:
Core 0:
    LDR X1, [X0, #0x4]   ; Will make line Exclusive
    ADD X1, X1, #0x1
    STR X1, [X0, #0x4]
    ; Clean cache line
    DC CVAC, X0

Core 1:
    LDR X1, [X0, #0xC]
    ADD X1, X1, #0x3
    STR X1, [X0, #0xC]

[Figure: each core’s caches track the line with MESI states, while the snoop filter in the DSU tracks it with VI states.]
[Animation frames: the snoop filter, built from duplicated TAG RAMs, records which cores hold the line at 0x801C0090 (VI states per core) while each core’s caches use MESI states; after the DC CVAC the dirty data is written back and the L3 copy of the location is updated.]
Agenda
• Introduction to coherency
Multi-cluster coherency
• Arm provides a number of interconnect options for maintaining cross-cluster coherency, including:
  • CCI-XXX Cache Coherent Interconnect
    ‐ Implements AMBA 4 Coherency Extensions (ACE)
    ‐ Supports 1 to 6 ACE masters (depending on product)
    ‐ The ACE protocol supports MOESI-based cache line states for cross-cluster coherency
  • CMN-600 Coherent Mesh Network
    ‐ Implements AMBA 5 Coherent Hub Interconnect (CHI)
    ‐ Supports a larger number of masters – flexible configuration
    ‐ Includes integrated System Level Cache
• The interconnect handles Shareable data transactions and broadcast cache/TLB maintenance operations
  • BROADCASTOUTER signal must be tied high (configurable during system design) to allow coherency operations
  • The interconnect can be programmed dynamically to disable snooping and maintenance operation broadcasts
    ‐ Allows for clusters to be removed from coherency management
    ‐ Example: when an entire cluster (including its L3 cache) is powered down
[Animation frames: Master 0 issues a coherent Read; the Cache Coherent Interconnect snoops the other masters and, on a snoop hit, returns the data directly to Master 0 (if the snoop hit had not occurred, the data would have been fetched from the Bus Slave). In a second sequence, all three masters hold the line 0xAABBCCDD; a coherent write causes snoops to Masters 1 and 2, which respond and invalidate their copies, leaving only Master 0 with a valid copy.]
Virtualization
Agenda
• Introduction
• Armv8-A Recap
What is virtualization?
Virtualization is the ability to create virtual machines that act like real machines
• These virtual machines can run their own OS, often referred to as a Guest OS
A Hypervisor or Virtual Machine Manager controls allocation of physical resources and execution time
• Extended VMID size (8 bits to 16 bits)
• Larger VA/IPA/PA size
• Extended Stage 2 execute permissions
Agenda
• Introduction
• Armv8-A Recap
Armv8-A virtualization
Armv8-A provides architectural support for virtualization
• Support for trapping system register accesses and instructions
• Two stage translation for EL1/0
• Virtual exceptions
• Virtual timer in System Timer architecture
[Figure: Normal world – Apps at EL0 over OS Kernels at EL1, with the Hypervisor at EL2; Secure world – Trusted Services at EL0 over a Trusted Kernel at EL1.]
Support is optional, and there is no support for virtualization in Secure state
Instruction/register trapping
The Hypervisor can be configured to trap certain instructions
Some registers have dedicated controls for virtualization purposes, so no trapping is required
• For example, MIDR_EL1, MPIDR_EL1
[Figure: single-stage view – the virtual memory map, under control of the Guest OS via the TTBR0/1_EL1 translation tables, maps the App and OS Kernel onto a physical memory map of Peripherals, Flash and RAM.]
[Figure: two-stage view – the Stage 1 tables (TTBR0/1_EL1) map guest virtual addresses to an intermediate physical map, and the Stage 2 tables (VTTBR_EL2) map that onto the real RAM, Flash and Peripherals.]
“Physical” addresses from the OS are now called “Intermediate Physical Addresses”
• Translated to real PAs by the second stage of translation, under control of a Hypervisor
The Hypervisor configures and enables stage 2 translations
• All addresses from EL0 and EL1 will still be translated by MMU hardware
• The Hypervisor is not called for each translation
Translation regimes
OS will create translation tables as normal, these are “Stage 1” tables
Stage 2 regime is configured from EL2, but used for EL0 & EL1 accesses
[Figure: App and OS Kernel accesses pass through the Stage 1 tables (TTBRn_EL1) and then the Stage 2 tables (VTTBR_EL2) to reach the physical memory map.]
Once fetched, the entire VA → PA translation may be stored in the TLB
• Intermediate steps of the translation may be cached as well
• This is a decision for the implementation – not architecturally specified
A TLB hit may give the same performance as without stage 2 translations
There is likely to be more pressure on the TLB
• 2nd stage tables will typically map far more memory than 1st stage tables
• Can map in larger blocks to mitigate this
VMID
Each virtual machine is assigned a VMID (Virtual Machine ID)
Legacy support for 8-bit VMID (the maximum VMID size on non-DynamIQ and Neoverse CPUs)
‐ VTCR_EL2.VS selects the VMID size:
  ‐ 0 – 8-bit
  ‐ 1 – 16-bit
Similar to ASIDs, VMIDs are used to tag address translations as belonging to a particular virtual machine
• VMID is significant, even when virtualization is disabled, so it should always be configured
For guest accesses, TLBs store the complete VA→IPA→PA translation in one entry
• VMID ensures that only the correct virtual machine can hit on a TLB entry
• May remove the need to invalidate TLBs on switching between guests
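A configuration sketch (field positions are assumptions from the architecture, not from the slide): select 16-bit VMIDs and tag the Stage 2 tables with a VMID:
    MRS  X0, VTCR_EL2
    ORR  X0, X0, #(1 << 19)      ; VTCR_EL2.VS = 1 selects 16-bit VMIDs (assumed bit position)
    MSR  VTCR_EL2, X0
    LDR  X1, =vm_stage2_tables   ; Base of this VM's Stage 2 tables (hypothetical symbol)
    MOV  X2, #5                  ; VMID chosen for this virtual machine (example value)
    ORR  X1, X1, X2, LSL #48     ; VMID lives in VTTBR_EL2[63:48] when VS == 1 (assumed)
    MSR  VTTBR_EL2, X1
    ISB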
[Figure: Guest OS (A) and Guest OS (B) each contain a UART Driver, running over the Hypervisor.]
Virtualizing exceptions
[Figure: physical FIQ, IRQ and SError exceptions taken while the App (EL0) or Guest OS (EL1) is running can be routed up to the Hypervisor (EL2).]
Virtual exceptions
There are three virtual exceptions: Virtual SError, Virtual IRQ, Virtual FIQ
• Can only be signalled if the corresponding physical exceptions are configured to be routed to EL2
Virtual exceptions are always masked when executing in EL2 and EL3
• When enabled and pending, they will be taken when the core returns to EL0/EL1
• In EL0/EL1, each virtual exception is masked by the corresponding PSTATE mask bits
The GIC provides two extra interfaces to support virtualization:
• Virtual CPU Interface – used by the virtual machine to handle virtual interrupts
• Virtual interface control – used by the Hypervisor to configure the Virtual CPU Interface
[Figure: the interrupt controller’s Distributor connects to the physical and virtual interfaces of CPU 0.]
[Animation frames: an external interrupt source signals the Distributor, which forwards it to the Physical CPU Interface as a physical IRQ; the Hypervisor takes the IRQ at EL2 and programs the virtual interface control so that a vIRQ becomes pending on the Virtual CPU Interface for the Guest OS.]
6. EL2 returns to EL0 or EL1, back to the virtual machine
7. The CPU takes an IRQ exception, and the Guest OS running on the virtual machine reads the interrupt status from the Virtual CPU Interface
Generic Timer
The Counter Module generates a fixed frequency incrementing count, which is distributed to all cores
Virtual Count (CNTVCT_EL0) = Physical Count (CNTPCT_EL0) − Virtual Offset (CNTVOFF_EL2)
Agenda
• Introduction
• Armv8-A Recap
Standard EL2 is not as well suited to Type 2 hypervisors (where the hypervisor runs as part of a host OS)
• EL2 is not suited for running a host OS; it requires lots of system register manipulation
Virtualization Host Extensions (VHE) make it easier to run common kernel code at EL2
• Only available when EL2 is AArch64
When the EL2 Host is enabled (HCR_EL2.E2H==1):
• EL2 virtual address space gains an upper translation region, described by TTBR1_EL2
• Meaning of HCR_EL2.TGE==1 is redefined
  ‐ HCR_EL2.TGE controls whether EL0 uses the EL1&0 or the EL2 translation regime
• EL2 gains ASID support
• Accesses to _EL1 system registers at EL2 are re-directed to the equivalent _EL2 registers
  ‐ _EL1 registers accessible using the _EL12 alias
• Additional virtual counter for EL2
[Figure: with VHE, Guest Applications (EL0, TTBR0_EL1) and the Guest OS (EL1, TTBR1_EL1) are translated by Stage 1 and then by the Stage 2 tables (VTTBR_EL2) into the real physical memory map; Host Applications (EL0, TTBR0_EL2) and the Host OS/Hypervisor (EL2, TTBR1_EL2) use the EL2 translation regime; the Secure Monitor (EL3) uses TTBR0_EL3.]
Exception routing
When HCR_EL2.E2H==1 and HCR_EL2.TGE==1, all exceptions in EL0 are routed to EL2
[Figure: FIQ, IRQ and SError from the Application at EL0 bypass the Guest OS at EL1 and are routed directly to the Hypervisor at EL2.]
Pre-DynamIQ and Neoverse Type 2 virtualization example
• An HVC trap to EL2 is required for any access to EL2 registers
• And for configuring Stage 2 translations
• Moving from Guest to Host OS requires a full context switch of the EL1 system registers
[Figure: a small “lowvisor” at EL2 performs the context switch between Host and VM execution contexts – it configures the VGIC and virtual timer, sets up the Stage 2 translation registers, enables Stage 2 translation and traps on entry to the VM, and disables virtual interrupts, Stage 2 translation and traps when switching back to the Host.]
Running Host OS
Host OS runs at EL2 (Non-secure state)
• Translation regime has split tables and ASIDs
  – HCR_EL2.E2H==1
• “EL1” AT commands and TLB invalidates operate on the EL2 regime and use ASIDs
  – HCR_EL2.TGE==1
• Configuration stored in EL2 system registers
  – Accessed using EL1 opcodes
              HCR_EL2.E2H   HCR_EL2.TGE
  Host OS          1             1
Running Host applications
• HCR_EL2.E2H==1
  – EL2 translation regime has TTBR1
  – Uses ASIDs
• Note: behaviour of TGE is changed to allow EL0 exceptions to be masked by PSTATE.{I,F,A}
              HCR_EL2.E2H   HCR_EL2.TGE
  Host OS          1             1
  Host app         1             1
Running Guest OS
Guest OS runs at EL1 (Non-secure state)
• No Host OS context switch required
  – Host OS configuration still in EL2 registers
  – Guest OS configuration in EL1 registers
• HCR_EL2.TGE==0
  – Allows execution at EL1
  – Allows exception routing to EL1
• HCR_EL2.E2H only matters for EL2
  – Would probably still be set
              HCR_EL2.E2H   HCR_EL2.TGE
  Host OS          1             1
  Host app         1             1
  Guest OS         x             0
Running Guest applications
              HCR_EL2.E2H   HCR_EL2.TGE
  Host OS          1             1
  Host app         1             1
  Guest OS         x             0
  Guest app        x             0
When HCR_EL2.E2H==1 and HCR_EL2.TGE==1:
• Accesses to the EL1 virtual and physical timers access the EL2 timers
• The virtual offset (CNTVOFF_EL2) is treated as zero when the virtual count (CNTVCT_EL0) is read from EL2 and EL0
Synchronization
Agenda
• Synchronization background
• Enforced atomicity
• Measured atomicity
• Local and global exclusive monitors
If two threads increment a shared variable at the same time, the read-modify-write sequences can interleave and one of the updates can be lost
• This is a Bad Thing
Thread 1 / Core 1:              Thread 2 / Core 2:
  Increment:                      Increment:
    LDR X0, =shared                 LDR X0, =shared
    LDR X1, [X0]                    LDR X1, [X0]
    ADD X1, X1, #1                  ADD X1, X1, #1
    STR X1, [X0]                    STR X1, [X0]
Critical sections
Any set of compound operations needing to be atomic can be considered a critical section of code
• This becomes important if multiple threads have access to the same data
Increment:
    BL  lock
    LDR X0, =shared
lock:
    read lock_variable                 ; critical section read
    if (lock_variable is UNLOCKED)
        set lock_variable = LOCKED     ; critical section write
    else
        goto lock
This is still an improvement over not using lock()
• Critical section read–write race has been isolated to this function
• But some way is needed to make the read & write of the lock variable atomic
DynamIQ and Neoverse CPUs provide two mechanisms for atomically modifying data
• Will interwork with legacy processors which do not support the new enforced atomicity operations
  – For example DynamIQ cluster(s) + Cortex-M3 power controller
Agenda
• Synchronization background
• Enforced atomicity
• Measured atomicity
• Local and global exclusive monitors
Support for atomic accesses on other memory types will require fabric / interconnect support
Compare and swap: CAS <Rs>, <Rt>, [<Xn|SP>]
• R = X or W
    tmp = *Xn;
    if (*Xn == Rs)
        *Xn = Rt;
    Rs = tmp;
• Byte and halfword forms use W registers only
  – Results are always zero-extended, not sign-extended
Compare and swap register pair: CASP <Rs>, <Rt>, [<Xn|SP>]
• Same behaviour as for CAS
• <Rs>:<Rs+1> and <Rt>:<Rt+1> are treated as a single 128-bit value
Atomic memory operations: LD<OP> <Rs>, <Rt>, [<Xn|SP>]
• R = X or W
    tmp = *Xn;
    *Xn = *Xn <OP> Rs;
    Rt = tmp;
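As an illustration (a sketch, not from the slides), CAS with acquire semantics can implement the lock() function from earlier:
lock:
    MOV  W1, #0          ; Expected value: UNLOCKED
    MOV  W2, #1          ; New value: LOCKED
    CASA W1, W2, [X0]    ; W1 returns the value that was actually in memory
    CBNZ W1, lock        ; It was already LOCKED – try again
    RET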
<OP> can be:
  EOR  – Atomic exclusive OR          SET  – Atomic bit set
  SMAX – Atomic signed maximum        SMIN – Atomic signed minimum
  UMAX – Atomic unsigned maximum      UMIN – Atomic unsigned minimum
Example usages:
    LDEOR X0, X0, [X14]
    LDCLR X1, X8, [SP]
    STSMAXH W7, [X2]
Swap
Swap memory contents: SWP <Rs>, <Rt>, [<Xn|SP>]
• R = X or W
    tmp = *Xn;
    *Xn = Rs;
    Rt = tmp;
Ordering Requirements
The atomic memory accesses are treated as performing both a load and a store
• For permission checking
• For data watchpoints
Atomic memory access instructions are provided with ordering options, which map to the architectural acquire and release definitions
• Acquire, A; Release, L; Acquire and Release, AL
  – ST<OP> instructions only support the release variant
Example usages:
    CASPAL X0, X0, [X14]
    SWPA X1, X8, [SP]
    STUMAXLH W7, [X2]
The architecture does not specify how the atomics are implemented
• There are a number of possible approaches
[Figure: an LDADD (+1) issued by a core may be forwarded to the interconnect, which updates the value in memory from 6 to 7 and returns the original value.]
#define LOCKED   1
#define UNLOCKED 0

; void lock(lock_t *ptr)
lock:
    MOV W1, #LOCKED
    ; The unsigned maximum of (UNLOCKED, LOCKED) should
    ; return UNLOCKED if we’ve just set LOCKED
5:  LDUMAXA W1, W2, [X0]
    CBNZ W2, 5b
    RET
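The matching unlock is not shown on the slide; a minimal sketch is simply a release store of UNLOCKED:
unlock:
    STLR WZR, [X0]   ; Store UNLOCKED (0) with release semantics
    RET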
Agenda
• Synchronization background
• Enforced atomicity
• Measured atomicity
• Local and global exclusive monitors
• Load Exclusive: LDXR <Rt>, [<Xn|SP>]  (R = X or W)
• Store Exclusive: STXR <Ws>, <Rt>, [<Xn|SP>]
  – Ws indicates whether the store completed successfully (0 = success, 1 = failure, these are the only possible values)
• Exclusive pair forms:
  – LDXP <Rt1>, <Rt2>, [<Xn|SP>]
  – STXP <Ws>, <Rt1>, <Rt2>, [<Xn|SP>]
• Byte and halfword forms – W registers only
An Exclusive Monitor is used to monitor a location between the exclusive load and store
• The LDXR/LDREX instruction causes the Exclusive Monitor to flag the accessed address as “exclusive”
• The STXR/STREX instruction checks whether the Exclusive Monitor is still in the exclusive state
  – Store will only happen if exclusivity check passes
The Exclusive Monitor does not prevent another core or thread from reading or writing the monitored location
• It only monitors for whether the location has been written to since the LDXR/LDREX
[Figure: exclusive monitor state machine – an LDXR moves the monitor from Open to Exclusive; an STXR while the monitor is Open fails.]
; void lock(lock_t *ptr)
lock:
    LDXR W1, [X0]        ; Read the lock value
    CMP  W1, #LOCKED
    B.EQ lock            ; If LOCKED, try again
    ; Attempt to lock
    MOV  W1, #LOCKED
    STXR W2, W1, [X0]    ; Attempt to lock
    CBNZ W2, lock        ; If STXR failed, try again
    RET
Two threads (Thread 0 on P0, Thread 1 on P1) both execute the same lock sequence against the lock at virtual address 0x00080020:

Thread 0:                          Thread 1:
lock:                              lock:
    LDXR W1, [X0]                      LDXR W1, [X0]
    CMP  W1, #LOCKED                   CMP  W1, #LOCKED
    B.EQ lock                          B.EQ lock
    ; Attempt to lock                  ; Attempt to lock
    MOV  W1, #LOCKED                   MOV  W1, #LOCKED
    STXR W2, W1, [X0]                  STXR W2, W1, [X0]
    CBNZ W2, lock                      CBNZ W2, lock
    DMB  SY                            DMB  SY
    RET                                RET

[Animation frames: both threads read UNLOCKED with LDXR, marking their monitors Exclusive; both then attempt the STXR, but only one store succeeds – the other STXR FAILED because its monitor had been cleared – so memory at 0x00080020 becomes LOCKED and the failing thread loops back and retries.]
Agenda
• Synchronization background
• Enforced atomicity
• Measured atomicity
• Local and global exclusive monitors
Non-shared
• Threads running on the same core only
• Uses Local Monitor only
Cacheable + Inner/Outer Shareable
• Threads running on any core within the domain
• Typically Local Monitors + coherency logic
Non-cacheable
• Threads running on different non-coherent cores
• Relies on Global Monitor in memory system
[Figure: each core in a multi-core processor has a Local Monitor; the coherency logic and interconnect connect to memory, where a Global Monitor resides.]
Context switching
The CLREX instruction can be used to clear the Local Exclusive Monitor’s state
It is IMPLEMENTATION DEFINED whether the clearing of the Local Exclusive Monitor also clears the Global Exclusive Monitor
Exclusives Reservation Granule (ERG)
The ERG is the granularity over which the exclusive monitor tags an address
• Placing two locks within one ERG can lead to false negatives
  – An STREX or STXR to either lock clears the exclusivity of both
• Architecturally-correct software will still function correctly – but may be less efficient
• Typically, the ERG is one cache line
[Figure: P0 and P1 each run the lock sequence on two different locks that share one ERG; both Local Monitors are shown Open and each thread sees W1 = LOCKED.]
WFE
If the lock is already taken software can wait for it to become available or yield to the scheduler
If it waits for the lock to become available, the WFE instruction can be used to enter standby mode to save power
• Core will wake on the next unmasked interrupt or on receipt of an Event
Events can be generated by:
• Executing SEV (send event) on any core
• Executing SEVL (send event local) on this core
• The Global Monitor being cleared
• Event stream from Generic Timer
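A lock loop using WFE might look like the following sketch (not from the slides; X0 is assumed to hold the lock address). SEVL primes the event register so the first WFE does not stall, and the LDAXR re-arms the monitor so a later write to the lock wakes the core:
    MOV   W2, #1         ; LOCKED
    SEVL                 ; Prime the event register
1:  WFE                  ; Wait for an event (e.g. the Global Monitor being cleared)
    LDAXR W1, [X0]       ; Load-Acquire Exclusive of the lock value
    CBNZ  W1, 1b         ; Still locked – go back to sleep
    STXR  W3, W2, [X0]   ; Try to claim the lock
    CBNZ  W3, 1b         ; Failed – retry
The holder releases the lock with a store (e.g. STLR WZR, [X0]); that store clears the waiter's Global Monitor and so generates the wake-up event.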
The actual resource that the lock is protecting can still be accessed
• The lock, and any exclusive accesses, are only related to the resource by what the code says
• If a program ignores the lock, there’s nothing in the architecture stopping it from accessing that resource
Programming the GIC
GICv3 and GICv4
Agenda
• Configuring interrupts
• Interrupt handling
GIC versions
GICv1
• Up to 1020 interrupt IDs
• 8-bit priority
• Software Generated Interrupts
• TrustZone support
• e.g. CoreLink™ GIC-390
GICv2
• Improved virtualization
• Improved handling of Group 1 interrupts by secure software
GICv3
• Support for more than 8 cores
• Support for message signalled interrupts
• System Register access to some registers
• Vastly expanded the Interrupt ID space
GICv4
• Direct injection of virtual interrupts
Agenda
• Configuring interrupts
• Interrupt handling
Interrupt types
SPI – Shared Peripheral Interrupt
• Peripheral interrupt, available to all the cores using the interrupt controller
• INTIDs 32 to 1019
PPI – Private Peripheral Interrupt
• Peripheral interrupt which is private to an individual core
• INTIDs 16 to 31
LPI – Locality-specific Peripheral Interrupt (new in GICv3)
• Peripheral interrupt, typically routed by an ITS
• INTIDs 8192+
• Discussed later in the module
Interrupt states
An interrupt is Inactive, Pending, Active, or Active and pending
• Active – the interrupt is being serviced but is not yet complete
Interrupt goes:
• Inactive → Pending when the interrupt is asserted
• Pending → Active when a CPU acknowledges the interrupt by reading the Interrupt Acknowledge Register (IAR)
• Active → Inactive when the same CPU deactivates the interrupt by writing the End of Interrupt Register (EOIR)
Interrupt security
GICv3 supports three group/security settings
• Set individually for each interrupt
Group 0
• Group 0 interrupts are always Secure
• Signalled as FIQ, regardless of current Security state
• Typically used for interrupts for the firmware running at EL3
Secure Group 1
• Signalled as FIQ if core is in Non-secure state
• Signalled as IRQ if core is in Secure state
• Typically used for interrupts for the trusted OS
Non-secure Group 1
• Signalled as FIQ if core is in Secure state
• Signalled as IRQ if core is in Non-secure state
• Typically used for interrupts for the rich OS or Hypervisor
(Diagram: interrupt routing with SCR_EL3.FIQ=1 and SCR_EL3.IRQ=0 – FIQs are taken to the EL3 Secure Monitor's FIQ vector, while the Rich OS and Trusted OS run at EL1)
Agenda
• Development of the GIC architecture (a brief history)
• Programming the GIC
• Initializing the GIC
• Configuring interrupts
• Interrupt handling
Interfaces
(Diagram: the Distributor and ITS, with one Redistributor per core)
Agenda
• Development of the GIC architecture (a brief history)
• Programming the GIC
• Initializing the GIC
• Configuring interrupts
• Interrupt handling
There are also bits to select between GICv3 mode and legacy mode
• Legacy mode gives backwards compatibility with GICv2
• Controlled separately for Secure state (ARE_S) and Non-secure state (ARE_NS)
• These bits need to be set to 1 to enable GIC operation
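A minimal sketch of setting the ARE bits and then the group enables from Secure state, assuming X0 holds the Distributor base address; the GICD_CTLR offset (0x000) and bit positions follow the GICv3 architecture, but check your implementation's integration:

    LDR W1, [X0, #0x000]         ; GICD_CTLR (Secure view)
    ORR W1, W1, #(1 << 4)        ; ARE_S = 1
    ORR W1, W1, #(1 << 5)        ; ARE_NS = 1
    STR W1, [X0, #0x000]
    LDR W1, [X0, #0x000]
    ORR W1, W1, #0x7             ; EnableGrp0, EnableGrp1NS, EnableGrp1S
    STR W1, [X0, #0x000]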
Power management
The GICR_WAKER register is used to record whether a core is awake or asleep
• Awake cores can receive interrupts
• Asleep cores cannot, but the GIC will generate wake requests instead
Marking a core as asleep
• Software writes ProcessorSleep=1
• Software polls ChildrenAsleep until it reads 1
‐ The GIC now considers the core to be asleep
Marking a core as awake
• Software writes ProcessorSleep=0
• Software polls ChildrenAsleep until it reads 0
At reset, all cores are considered to be asleep
(Diagram: the Redistributor sends wake requests to the power controller; IRQ/FIQ reach the core via the CPU interface)
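A minimal sketch of marking a core awake, assuming X0 holds this core's Redistributor RD_base frame; GICR_WAKER is at offset 0x014, with ProcessorSleep in bit 1 and ChildrenAsleep in bit 2:

    LDR W1, [X0, #0x014]         ; read GICR_WAKER
    AND W1, W1, #0xFFFFFFFD      ; clear ProcessorSleep (bit 1) - mark the core awake
    STR W1, [X0, #0x014]
1:  LDR W1, [X0, #0x014]
    TBNZ W1, #2, 1b              ; poll until ChildrenAsleep reads 0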
CPU interface
Like the Distributor, the CPU interface can be set to GICv3 or legacy mode
Group enables
• Controlled by the ICC_IGRPEN<n>_EL1 registers
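A minimal sketch of bringing up the CPU interface at Non-secure EL1 for Group 1 interrupts; the system register names are architectural, but the chosen values (unmask all priorities, enable Group 1 only) are just an example:

    MRS X0, ICC_SRE_EL1
    ORR X0, X0, #1               ; SRE = 1: use the System register interface
    MSR ICC_SRE_EL1, X0
    ISB
    MOV X0, #0xFF
    MSR ICC_PMR_EL1, X0          ; priority mask: allow all priorities
    MOV X0, #1
    MSR ICC_IGRPEN1_EL1, X0      ; enable Group 1 interrupts for this PE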
Agenda
• Development of the GIC architecture (a brief history)
• Programming the GIC
• Initializing the GIC
• Configuring interrupts
• Interrupt handling
Priority
‐ Non-secure state can only access the bottom half of the priority range
• 0x00 is the highest priority, 0xFF is the lowest priority
• A GIC can implement fewer than 8 bits of priority
Configuration
• Whether the interrupt is level-sensitive or edge-triggered
Security/Group
(Registers shown on the slide: GICD_IGROUPR<n>, GICD_IGRPMODR<n>, and GICR_ISENABLER0 on the Redistributor)
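Putting these pieces together, a minimal sketch configuring the SPI with INTID 42 as Non-secure Group 1 at priority 0x80 and enabling it, assuming X0 holds the Distributor base address and affinity routing is enabled; the offsets are the architectural GICD register offsets, and GICD_IGRPMODR is left at 0:

    MOV W1, #0x80
    STRB W1, [X0, #(0x400 + 42)]     ; GICD_IPRIORITYR: one byte per INTID
    LDR W1, [X0, #0x084]             ; GICD_IGROUPR1 covers INTIDs 32-63
    ORR W1, W1, #(1 << 10)           ; bit 10 corresponds to INTID 42
    STR W1, [X0, #0x084]
    MOV W1, #(1 << 10)
    STR W1, [X0, #0x104]             ; GICD_ISENABLER1: write 1 to enable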
LPI configuration
LPI configuration and status is held in tables in memory, not registers
• LPI configuration is global
‐ The Configuration table is pointed to by GICR_PROPBASER
• GICR_PENDBASER specifies the base address and size of the pending status table
‐ One table per Redistributor
Software must:
• Allocate memory for the tables, and set the GICR_PROPBASER and GICR_PENDBASER registers to point at the allocated memory
• Initialize contents of the Configuration table
• Initialize contents of the Pending table (zero the memory)
• Enable the Redistributor(s) by setting GICR_CTLR.EnableLPIs
(Diagram: each Redistributor has its own Pending table; the Configuration table is shared)
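A minimal sketch of those steps, assuming X0 holds the Redistributor RD_base, X1 the physical address of an initialized Configuration table, and X2 the physical address of a zeroed Pending table; the cacheability and shareability fields in GICR_PROPBASER/GICR_PENDBASER are left at 0 here for simplicity:

    ORR X3, X1, #0xF             ; GICR_PROPBASER: table address | (IDbits field = 15)
    STR X3, [X0, #0x070]
    STR X2, [X0, #0x078]         ; GICR_PENDBASER: pending table address
    LDR W3, [X0, #0x000]         ; GICR_CTLR
    ORR W3, W3, #1               ; EnableLPIs = 1
    STR W3, [X0, #0x000]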
ITS
LPIs are message signalled and usually sent via an Interrupt Translation Service (or ITS)
• The ITS uses the Collection ID to index into the Collection Table
‐ Returns the target Redistributor
ITS configuration
The ITS is controlled by a command queue (circular buffer) in memory
• Software maps/remaps interrupts by adding commands to the queue
• Software does not modify the ITS tables directly
Example:
• A timer has DeviceID 5 and sends EventID 0
• We decide to map the interrupt to INTID 8725 and deliver it to Redistributor 6
• The ITT allocated for the timer is at address 0x84500000
• We decide to use collection number 3
The command sequence we need is:
MAPD 5, 0x84500000, 2 – Map DeviceID 5 to an Interrupt Translation Table, specifying 2-bit EventID width
MAPTI 5, 0, 8725, 3 – Map EventID 0 to INTID 8725 and collection 3
MAPC 3, 6 – Map collection 3 to Redistributor 6
SYNC 0x78400000 – Synchronize changes on the Redistributor
Agenda
• Configuring interrupts
• Interrupt handling
Acknowledging interrupts
On taking an interrupt, software must read one of the Interrupt Acknowledge registers
• This returns the INTID of the interrupt and updates the state machine
There are different registers for Group 0 and Group 1 interrupts
• ICC_IAR0_EL1 – for Group 0 interrupts
• ICC_IAR1_EL1 – for Group 1 interrupts
When an interrupt is acknowledged, the Running Priority of the CPU interface takes on the priority of the interrupt
• Current value reported by ICC_RPR_EL1
Reserved INTIDs
A read of the Acknowledge register might return one of the reserved INTIDs:
1020
• When in EL3, the highest priority interrupt targets Secure Group 1
‐ Exception taken to EL3, but the signalled interrupt targets the Secure OS (S.EL1)
1021
• When in EL3, the highest priority interrupt targets Non-secure Group 1
‐ Exception taken to EL3, but the signalled interrupt targets the rich OS or Hypervisor
1023
• There is no pending interrupt (after priority masking) that is targeting this core
• Or, when in Non-secure state, the highest priority pending interrupt is Secure
• Or, when in Secure EL1, the highest priority pending interrupt is Group 0
‐ Currently executing in the secure kernel (S.EL1), the interrupt targets the Monitor (EL3)
Completing an interrupt involves two tasks:
• Priority drop
‐ The Running Priority of the CPU interface returns to the value it had before the interrupt was acknowledged
• Deactivation
‐ Moves the state machine of the INTID, typically from Active to Inactive
Whether these tasks are performed as a single operation or separately depends on the EOIMode bits
• EOIMode == 0
‐ Writing to ICC_EOIRn_EL1 performs both priority drop and deactivation
• EOIMode == 1
‐ Writing to ICC_EOIRn_EL1 only performs priority drop
‐ Writing to ICC_DIR_EL1 performs deactivation
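A minimal sketch of a Group 1 handler with EOIMode == 0, so a single write to ICC_EOIR1_EL1 performs both priority drop and deactivation; dispatch_interrupt is a hypothetical platform routine, and full context save/restore is omitted:

irq_handler:
    MRS X0, ICC_IAR1_EL1         ; acknowledge: returns the INTID, Pending -> Active
    SUB X1, X0, #1020
    CMP X1, #3
    B.LS done                    ; INTIDs 1020-1023 are reserved - nothing to handle
    MOV X19, X0                  ; keep the INTID across the call
    BL dispatch_interrupt        ; hypothetical platform-specific dispatch
    MSR ICC_EOIR1_EL1, X19       ; EOIMode == 0: priority drop and deactivation
done:
    ERET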
Appendix
SBSA usage of PPIs
GICv3 and GICv4
PPI INTIDs
INTID Usage
Appendix
SGI changes
GICv3 and GICv4
SGI ID5
Appendix
Legacy Mode
GICv3 and GICv4
Legacy operation
GICv3 supports legacy operation
Configuration: ARE_NS=1, ARE_S=1 | ARE_NS=1, ARE_S=0 | ARE_NS=0, ARE_S=0
EL2: ICC_SRE_EL2.SRE=1 | ICC_SRE_EL2.SRE=1 | ICC_SRE_EL2.SRE=0
EL3: ICC_SRE_EL3.SRE=1 | ICC_SRE_EL3.SRE=1 | ICC_SRE_EL3.SRE=0
Appendix
Virtualization
GICv3 and GICv4
The CPU interface is split into three parts:
• Physical CPU interface – Used by the Hypervisor and Secure world for handling physical interrupts
• Virtual CPU interface – Used by the virtual machine for handling virtual interrupts
• Virtual interface control – Used by the Hypervisor to configure and generate virtual interrupts, and for task switching
(Diagram: the GIC Distributor and Redistributor feed the physical and virtual CPU interfaces)
(Diagram: an external interrupt source signals the Distributor, which delivers a physical IRQ to the physical CPU interface; with SCR_EL3.IRQ = 0 and HCR_EL2.IMO = 1 the physical IRQ is routed to the Hypervisor at EL2)
(Diagram: the Hypervisor, having taken the physical IRQ, uses the virtual interface control registers to generate a virtual interrupt; the virtual CPU interface then asserts vIRQ)
5. GIC asserts vIRQ signal to CPU
6. EL2 returns to EL0 or EL1, back to the virtual machine
7. CPU takes a virtual IRQ exception, and the Guest OS running on the virtual machine interacts with the Virtual CPU interface
Virtualization in GICv4
GICv4 adds support for direct injection of virtual interrupts
• In some instances, removes the need to enter the Hypervisor
• Requires an ITS, and only supported for LPIs
The Hypervisor tells the ITS in advance about mappings between virtual and physical interrupts
• Mapping includes:
‐ EventID/DeviceID of the physical interrupt
‐ Virtual INTID
‐ Which virtual CPU the virtual interrupt belongs to
‐ Which physical CPU the virtual CPU is expected to be running on
If the virtual CPU is running when the interrupt occurs, the hardware generates a virtual interrupt
• If not, a physical door-bell interrupt is sent instead
• VMAPI and VMAPTI are used to map EventID/DeviceID to a virtual INTID and virtual CPU
‐ Can optionally specify a physical doorbell interrupt
• VMAPP is used to map a virtual core to a physical core
(Diagram: the ITS translation tables and virtual PE table are programmed with the VMAPP and VMAPTI commands)
(Diagram: a message from a peripheral is translated by the ITS and forwarded to the target Redistributor; GICR_VPROPBASER and GICR_VPENDBASER locate the virtual LPI Configuration and pending tables)
• The Redistributor checks the value of GICR_VPENDBASER, which identifies the per-virtual PE pending table
(Diagram: if the target virtual PE is not resident, a physical IRQ is delivered to the Hypervisor via the physical CPU interface)
Booting
Neoverse and DynamIQ CPUs
What is booting?
The processor environment must be initialized before an OS kernel or bare metal app can be run
Agenda
• Booting an Arm Neoverse or DynamIQ processor in AArch64
• Real-world booting
(Register field diagram: bits [31:2] RES0, bit [1] RR, bit [0] AA64)
The start address is IMPLEMENTATION DEFINED
• The DSU cluster's RVBARADDR signals determine the AArch64 start address for both cold and warm resets
System registers for lower exception levels have UNKNOWN reset values
• They must be initialized to safe values before leaving the higher exception level
Can drop to any lower exception level without incrementally moving down
• The target exception level and all levels above must have their execution state configured before moving down
1. Initialize lower level's System Control Register (SCTLR_ELy) to a known safe value
‐ Clear M, C, and I bits; ensure any RES1 bits are set to 1 and any RES0 bits are cleared to 0
2. Configure lower level execution state
‐ SCR_EL3.NS controls security state of EL1
‐ SCR_EL3.RW controls execution state of EL2 and Secure EL1
‐ HCR_EL2.RW controls execution state of Non-secure EL1
3. Configure the Saved Program Status Register (SPSR_ELx) for the current exception level
‐ See next slide
4. Set the current exception level's Exception Link Register (ELR_ELx) to desired entry point
(SPSR_ELx field diagram)
MOV X0, #0x431               ; example value: RW (bit 10) = 1, RES1 bits [5:4], NS (bit 0) = 1
MSR SCR_EL3, X0              ; SCR_EL3.{RW,NS} = {1,1}

LDR X0, =0x3C9               ; “AArch64 EL2 with SP_EL2, mask all DAIF”
MSR SPSR_EL3, X0

ADRP X0, el2_entry           ; Set EL2 entry point address
ADD X0, X0, :lo12:el2_entry  ; (Can be skipped if el2_entry is 4KB aligned)
MSR ELR_EL3, X0

ERET                         ; enter AArch64 EL2 at el2_entry
Details can be found in the relevant implementation's documentation:
• Technical Reference Manual (TRM)
• Errata publications
In legacy AArch32 systems, floating point / SIMD register access was explicitly enabled via FPEXC
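In AArch64 the equivalent control is the CPACR_EL1.FPEN field (with CPTR_EL2/CPTR_EL3 at higher exception levels); a minimal sketch, assuming EL1:

    MRS X0, CPACR_EL1
    ORR X0, X0, #(0x3 << 20)     ; FPEN = 0b11: do not trap FP/SIMD at EL0/EL1
    MSR CPACR_EL1, X0
    ISB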
All Neoverse or DynamIQ implementations will invalidate:
• All levels of implemented cache (L1 / L2 / L3)
• Translation Lookaside Buffers
• Micro-architectural features (table walk buffers, snoop filters etc.)
Normally an ISB is enough to synchronize processor context, but not when enabling the MMU
• The ISB only guarantees that the write to SCTLR_ELn has completed by that point
• Architecturally the write to SCTLR_ELn could complete immediately – if so, where will the next instruction come from?
‐ The original flat-mapped VA-PA mapping of the PC, or the new VA-PA mapping of the PC?
The code enabling the MMU must exist in a valid address space before and after the MMU is enabled
• Arm recommends simply flat-mapping the page containing the write to SCTLR_ELn and the following ISB
Without flat-mapping (VA | PA | Original Mapping | New Mapping):
0x8014 | 0x008014 | MSR SCTLR_EL3, X0 | ???
0x8018 | 0xFEC018 | ISB | ???
0x801C | 0xFEC01C | ??? | B __main
With flat-mapping (VA | PA | Original Mapping | New Mapping):
0x8014 | 0x8014 | MSR SCTLR_EL3, X0 | MSR SCTLR_EL3, X0
0x8018 | 0x8018 | ISB | ISB
0x801C | 0x801C | B __main | B __main
If the write to SCTLR_EL3 completes before reaching the ISB, the processor may execute the instruction at the new VA-PA mapping; flat-mapping the region of memory containing the instruction sequence guarantees that the ISB is the next instruction
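A minimal sketch of the sequence at EL3, assuming TTBR0_EL3, TCR_EL3 and MAIR_EL3 are already programmed and that this code and the ISB are flat-mapped as recommended above; __main is an illustrative label:

    MRS X0, SCTLR_EL3
    ORR X0, X0, #1               ; M = 1: enable the MMU
    MSR SCTLR_EL3, X0
    ISB                          ; flat-mapped along with the MSR above
    B __main                     ; assumption: __main is mapped in the new translation tables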
Additional considerations
Some parts of EL2 must be configured even if you do not have a hypervisor
Some peripherals may need to be at least partially configured while still in the Secure world
• Example: All GIC interrupts are Secure on reset; the Secure world must configure the appropriate interrupts as Non-secure
The Appendix slide lists other key registers that one must be aware of
Agenda
• Booting an Arm Neoverse or DynamIQ processor in AArch64
• Real-world booting
Multi-core processors
Boot code must handle both per-core and processor-wide resources
In SMP systems, one core is designated the “primary” core and is responsible for initializing global resources, such as system peripherals, and for booting an OS kernel
Secondary cores are held in reset or placed in a holding pen until the primary core wakes them (see the sketch below)
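A minimal holding-pen sketch, assuming a flat-mapped, zero-initialized 64-bit mailbox variable that the primary core writes with the secondary entry point before executing SEV; the names are illustrative:

secondary_pen:
    LDR X1, =mailbox
pen_loop:
    WFE                          ; sleep until the primary core sends an event
    LDR X2, [X1]                 ; read the release address
    CBZ X2, pen_loop             ; still zero - keep waiting
    BR X2                        ; jump to the entry point written by the primary core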
Multi-processor systems
Many systems now include multiple clusters of multi-core processors
Shared resources may be:
• Per-cluster (e.g. L3 cache)
• Global (e.g. Interconnect)
Typically only a single core in a single cluster is released from reset on power-up
• Usually core 0, as reported by MPIDR_EL1
Additional setup to consider:
• Configure the cache coherent interconnect
• Configure the system memory controllers
Agenda
• Booting an Arm Neoverse or DynamIQ processor in AArch64
• Real-world booting
(Diagram: a bare-metal image runs its Reset Handler, then C Library Initialization, then main())
Systems running an OS will typically have multiple standalone boot stages
(Diagram: Bootstrap (ROM or Flash) → Bootloader (UEFI / uBoot) → OS Kernel (Linux) → Apps)
Boot process can involve both generic and heavily platform-specific code
Security considerations
• Trusted boot?
• Trusted services?
(Diagram: Bootstrap (ROM or Flash) at EL3 → Bootloader (UEFI / uBoot) at EL2 → OS Kernel (Linux) at EL2 / EL1 → Apps at EL0)
In this simple flow, the bootloader stage runs in the Normal world:
• No trusted boot
• No trusted services
• No runtime firmware services for systems that implement EL3 (all Armv8-A Cortex-A processors)
‐ Runtime firmware services need to run at the highest implemented exception level
Trusted Firmware-A addresses this:
• Designed for reuse and porting to other Armv8-A models and hardware platforms
• https://github.com/Arm-software/Arm-trusted-firmware
Currently targets:
• Foundation/Base Fixed Virtual Platform (FVP) models
• N1-SDP
Trusted Firmware defines a number of bootloader stages with different responsibilities
(Diagram: BL1 Trusted Bootstrap in Trusted ROM, with cold/warm boot detection → BL2 Trusted Bootloader, platform initialization → BL3-1 resident runtime → OS Loader, e.g. UEFI or U-Boot (‘BL3-3’, not part of TF) → Rich OS Kernel, e.g. Linux)
BL1 establishes a chain of trust for Trusted boot
BL2:
• Uses the chain of trust to authenticate other bootloader stage images
• Performs critical platform-specific initialization
BL3-1:
• Acts as the runtime Secure Monitor (SMC) interface for runtime firmware services running at EL3
• Handles world switching to a Trusted OS running at Secure EL1, which in turn provides Trusted services
Trusted Firmware-A
(Diagram: Applications and Trusted Services at EL0; the PSCI Core Interface, Arm System IP Library, and World Switch Library at EL3)
The project documentation includes:
• A Firmware Design Document, which outlines the architecture, boot flow, and functionality of the project
• A Porting Guide, which describes all steps required for porting the project to a new platform
Arm will continue to develop the Trusted Firmware-A project in collaboration with interested parties in order to provide a full reference implementation of PSCI, TBBR, and Secure Monitor code
This will benefit all developers working with Armv8-A TrustZone technology
Appendix
Key registers
This is not an exhaustive list!
Register | Purpose | Suggested setting | When
SCTLR_ELx | System controls | Set RES1, clear rest | Before entering ELx
VBAR_ELx | Vector table base address | Vector table base address | At entry to ELx
GICD_IGROUPR<n> | Set each IRQ’s Group (0/1) | Make NS-EL1 IRQs Group 1 | In the Secure world
GICC_PMR | Priority mask | Value in range [0x80 .. 0xFF] | In the Secure world
SCR_EL3 | Secure Monitor controls | Set RW, NS, RES1, clear rest | Before leaving EL3
HCR_EL2 | Hypervisor controls | Set RW, clear rest | Before entering EL1
VMPIDR_EL2 | MPIDR_EL1 read by EL1 | MPIDR_EL1 read by EL2/EL3 | Before entering EL1
VPIDR_EL2 | MIDR_EL1 read by EL1 | MIDR_EL1 read by EL2/EL3 | Before entering EL1
VTTBR_EL2 | Holds EL1 VMID | Clear all | Before entering EL1
Document references
Power State Coordination Interface (PSCI)
• Standardised SMC interface for CPU hotplug, subsystem deep idle, trusted kernel migration, etc.
• Document number: Arm DEN 0022
• http://infocenter.Arm.com/help/index.jsp?topic=/com.Arm.doc.den0022c/index.html
SMC Calling Convention (SMCCC)
• Defines a common calling mechanism for use with the Secure Monitor Call (SMC) instruction
• Document number: Arm DEN 0028
• http://infocenter.Arm.com/help/index.jsp?topic=/com.Arm.doc.den0028a/index.html
Trusted Board Boot Requirements (TBBR)
• Defines the steps required for booting a trusted system, and aims to aid in both system design and certification efforts
• Available by request under NDA; contact your Arm Partner Manager
The Arm Glossary is a constantly growing and evolving source of Arm reference. It is only available online.
To access it, please visit:
https://developer.arm.com/glossary
Services & Support
We have a range of services available to help reduce risk and shorten time-to-market:
• Design Reviews
• Technical Support
• Training
• Satisfaction rates are consistently high (90%-95%)
• Over 160 highly trained Applications Engineers
• >75% of reviews identified at least 1 critical issue
• Arm Developer: over 1,900 technical documents available
• Training: over 4,400 partners trained so far in 2020
• Arm Community: developer resources and discussion forums
(Map: support locations include Manchester, Warwick, Cambridge*, Munich, Budapest, Sophia Antipolis, Richardson, Austin*, Beijing, Shanghai*, Shenzhen, Tokyo, Seoul, Taipei, Hsinchu, and Bangalore)
Technical support
Resolve technical issues quickly and efficiently
• Keep your project on track
Experienced Arm engineers answer your questions
• Access to a vast amount of Arm knowledge
Consistently high customer satisfaction (90%-95%)
• Constantly monitored through customer surveys
Worldwide support team located in the US, Europe and Asia
• 160+ Applications Engineers covering all product areas
• Global teams providing local support
Online portal to track your support enquiries
• developer.arm.com/support
Enhanced Support option includes:
• Regular contact with a named Applications Engineer
• Quicker turnaround time on support cases
• Service Tokens to be redeemed on onsite support, training, and/or Design Reviews
Also available: Downloads, Articles, Forums, Evaluation products, access to the case tracking system, and Product maintenance
Face-to-face courses: experienced trainers deliver on-site training at a location of your choice. Courses can be customized to meet your needs.
Virtual Classroom: experienced trainers deliver live online training, customized to meet your needs, at a date and time that suits you.
Online Training: on-demand short videos with bite-sized topics. Accessible wherever you are and whenever you need.
Service tokens
A flexible way to get the help you need, when you need it
Overview:
• A flexible way to plan your services budget
• Valid for 12 months
• Example: 20 tokens can cover a 4-day on-site training course
Tokens can be redeemed for Training and On-site support
Arm Developer
One location for all technical content, including:
• Technical product information
• Product documentation
• Developer resources
• Developer downloads
Arm Community
▪ Forum
▪ Blogs
▪ Communities:
  ▪ Android
  ▪ Arm Development Platforms
  ▪ Graphics & Multimedia
  ▪ Internet of Things
  ▪ Processors
  ▪ SoC Design
  ▪ Software Tools
• Guides non-advanced IP users towards best-fit IP for their project
• Displays IP efficiently; intuitively save and start working with IP
• Juxtaposes intra-IP details for fast and accurate analysis
• Simplifies IP parameter analysis, guides configs, and renders RTL
• Reveals custom inter/intra-IP software workload performance
PROJECTS
Saves IP & project information to organize SoC investigation activities and provide expert SoC creation guidance.
COLLABORATE
Invite colleagues, Arm FAEs and Arm Approved Design Partners to collaborate on your projects, providing input and suggestions.
Membership Tiers
• Design Services (Non-commercial Usage)
• SoC Services (Commercial Usage)
Entitlements
• IP in the Arm Flexible Access (AFA) Mainstream Package
• Software Development Tools
• Arm Software training courses under license
• Local language versions of our software-based training courses
• Complementary training
• Public schedules
The training partners have demonstrated that they can deliver Arm architecture and technology training to a high technical level of understanding
Need help?
• Go to arm.com/support
• Go to arm.com/training
• Go to arm.com/design-reviews
• Go to arm.com/arm-approved
Thank You!
Danke!
Merci!
谢谢!
ありがとう!
Gracias!
Kiitos!
© 2023 Arm