LearnTheArchitecture-MemoryManagement-101811 0100 00 en
LearnTheArchitecture-MemoryManagement-101811 0100 00 en
Version 1.0
Memory management 101811_0100_00
Version 1.0
Memory management
Learn the architecture guide
Copyright © 2019 Arm Limited (or its affiliates). All rights reserved.
Release Information
Document History
Your access to the information in this document is conditional upon your acceptance that you will not use or permit others to use
the information for the purposes of determining whether implementations infringe any third party patents.
THIS DOCUMENT IS PROVIDED “AS IS”. ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES, EXPRESS,
IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY,
SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE WITH RESPECT TO THE
DOCUMENT. For the avoidance of doubt, Arm makes no representation with respect to, and has undertaken no analysis to
identify or understand the scope and content of, patents, copyrights, trade secrets, or other rights.
TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR ANY DAMAGES, INCLUDING
WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES,
HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS
DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or disclosure
of this document complies fully with any relevant export laws and regulations to assure that this document or any portion
thereof is not exported, directly or indirectly, in violation of such export laws. Use of the word “partner” in reference to Arm’s
customers is not intended to create or refer to any partnership relationship with any other company. Arm may make changes to
this document at any time and without notice.
If any of the provisions contained in these terms conflict with any of the provisions of any click through or signed written
agreement covering this document with Arm, then the click through or signed written agreement prevails over and supersedes
the conflicting provisions of these terms. This document may be translated into other languages for convenience, and you agree
that if there is any conflict between the English version of this document and any translation, the terms of the English version of
the Agreement shall prevail.
The Arm corporate logo and words marked with ® or ™ are registered trademarks or trademarks of Arm Limited (or its
subsidiaries) in the US and/or elsewhere. All rights reserved. Other brands and names mentioned in this document may be the
trademarks of their respective owners. Please follow Arm’s trademark usage guidelines at
http://www.arm.com/company/policies/trademarks.
3T 3T
Copyright © 2019 Arm Limited (or its affiliates). All rights reserved.
Non-Confidential
Page 2 of 26
Memory management 101811_0100_00
Version 1.0
Copyright © 2019 Arm Limited (or its affiliates). All rights reserved.
LES-PRE-20349
Confidentiality Status
This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license restrictions in
accordance with the terms of the agreement entered into by Arm and the party that Arm delivered this document to.
Product Status
The information in this document is Final, that is for a developed product.
Web Address
https://developer.arm.com 3T
Copyright © 2019 Arm Limited (or its affiliates). All rights reserved.
Non-Confidential
Page 3 of 26
Memory management 101811_0100_00
Version 1.0
Contents
1 Overview .............................................................................................................................................................................................................................................. 5
Copyright © 2019 Arm Limited (or its affiliates). All rights reserved.
Non-Confidential
Page 4 of 26
Memory management 101811_0100_00
Version 1.0
1 Overview
This guide introduces memory translation in Armv8-A, which is key to memory management. It explains how virtual addresses
are translated to physical addresses, the translation table format, and how software manages the Translation Lookaside Buffers
(TLBs).
This information is useful for anyone who is developing low-level code, such as boot code or drivers. It is particularly relevant to
anyone who is writing code to set up or manage the Memory Management Unit (MMU).
At the end of this guide, you can check your knowledge. You will have learned how a virtual address is translated to a physical
address. You will be able to name the different address spaces, and describe how the address spaces map onto the stages of
translation. You will also have learned when software must perform TLB maintenance, and the syntax of TLB maintenance
commands.
Memory management 101811_0100_00
Version 1.0
One benefit of using virtual addresses is that an OS can present multiple fragmented physical regions of memory as a single,
contiguous virtual address space to an application.
Virtual addresses also benefit software developers, who will not know a system's exact memory addresses when writing their
application. With virtual addresses, software developers do not need to concern themselves with the physical memory. The
application knows that it is up to the OS and the hardware to work together to perform the address translation.
In practice, each application can use its own set of virtual addresses that will be mapped to different locations in the physical
system. As the operating system switches between different applications it re-programs the map. This means that the virtual
addresses for the current application will map to the correct physical location in memory.
Virtual addresses are translated to physical addresses through mappings. The mappings between virtual addresses and physical
addresses are stored in translation tables (sometimes referred to as page tables) as this diagram shows:
Translation tables are in memory and are managed by software, typically an OS or hypervisor. The translations tables are not
static, and the tables can be updated as the needs of software change. This changes the mapping between virtual and physical
addresses.
Memory management 101811_0100_00
Version 1.0
• The table walk unit, which contains logic that reads the translation tables from memory.
• Translation Lookaside Buffers (TLBs), which cache recently used translations.
All memory addresses that are issued by software are virtual. These memory addresses are passed to the MMU, which checks
the TLBs for a recently used cached translation. If the MMU does not find a recently cached translation, the table walk unit reads
the appropriate table entry, or entries, from memory, as shown here:
A virtual address must be translated to a physical address before a memory access can take place (because we must know which
physical memory location we are accessing). This need for translation also applies to cached data, because on Armv6 and later
processors, the data caches store data using the physical address (addresses that are physically tagged). Therefore, the address
must be translated before a cache lookup can complete.
Note: Architecture is a behavioral specification. The caches must behave as if they are physically tagged. An implementation
might do something different, as long as this is not software-visible.
Memory management 101811_0100_00
Version 1.0
Entry 0 in the table provides the mapping for block 0, entry 1 provides the mapping for block 1, and so on. Each entry contains
the address of a corresponding block of physical memory and the attributes to use when accessing the physical address.
Memory management 101811_0100_00
Version 1.0
The upper-order bits, which are labeled 'Which entry' in the diagram, tell you which block entry to look in and they are used as an
index into the table. This entry block contains the physical address for the virtual address.
The lower-order bits, which are labeled 'Offset in block' in the diagram, are an offset within that block and are not changed by
the translation.
Memory management 101811_0100_00
Version 1.0
The first table (Level 1 table) divides the virtual address space into large blocks. Each entry in this table can point to an equal-
sized block of physical memory or it can point to another table which subdivides the block into smaller blocks. We call this type of
table a 'multilevel table'. Here we can see an example of a multilevel table that has three levels:
In Armv8-A, the maximum number of levels is four, and the levels are numbered 0 to 3. This multilevel approach allows both
larger blocks and smaller blocks to be described. The characteristics of large and small blocks are as follows:
• Large blocks require fewer levels of reads to translate than small blocks. Plus, large blocks are more efficient to cache in
the TLBs.
• Small blocks give software fine-grain control over memory allocation. However, small blocks are less efficient to cache in
the TLBs. Caching is less efficient because small blocks require multiple reads through the levels to translate.
To manage this trade-off, an OS must balance the efficiency of using large mappings against the flexibility of using smaller
mappings for optimum performance.
Note: The processor does not know the size of the translation when it starts the table lookup. The processor works out the size
of the block that is being translated by performing the table walk.
Memory management 101811_0100_00
Version 1.0
Each of these virtual address spaces is independent, and has its own settings and tables. We often call these settings and tables
'translation regimes'. There are also virtual address spaces for Secure EL0, Secure EL1 and Secure EL2, but they are not shown
in the diagram.
Because there are multiple virtual address spaces, it is important to specify which address space an address is in. For example,
NS.EL2:0x8000 refers to the address 0x8000 in the Non-secure EL2 virtual address space.
The diagram also shows that the virtual addresses from Non-secure EL0 and Non-secure EL1 go through two sets of tables.
These tables support virtualization and allow the hypervisor to virtualize the view of physical memory that is seen by a virtual
machine (VM).
Memory management 101811_0100_00
Version 1.0
In virtualization, we call the set of translations that are controlled by the OS, Stage 1. The Stage 1 tables translate virtual
addresses to intermediate physical addresses (IPAs). In Stage 1 the OS thinks that the IPAs are physical address spaces. However,
the hypervisor controls a second set of translations, which we call Stage 2. This second set of translations translates IPAs to
physical addresses. This diagram shows how the two sets of translations work:
Although there are some minor differences in the table format, the process of Stage 1 and Stage 2 translation is usually the same.
Note: At Arm, we use the address 0x8000 in many of our examples. 0x8000 is also the default address for linking with the Arm
linker, armlink. The address comes from an early microcomputer, the BBC Micro Model B, which had ROM (and sideways RAM)
at the address 0x8000. The BBC Micro Model B was built by a company called Acorn, which developed the Acorn RISC Machine
(ARM), and later became Arm.
This diagram shows the layout of the virtual address space in AArch64:
Memory management 101811_0100_00
Version 1.0
There are two regions for the EL0/EL1 virtual address space: kernel space and application space. These two regions are shown
on the left-hand side of the diagram, with kernel space at the top, and application space, which is labeled 'User space', at the
bottom of the address space. Kernel space and user space have separate translation tables and this means that their mappings
can be kept separate.
There is a single region at the bottom of the address space for all other Exception levels. This region is shown on the right-hand
side of the diagram and is the box with no text in it.
Note: If you set HCR_EL2.E2H to 1 it enables a configuration where a host OS runs in EL2, and the applications of the host OS
run in EL0. In this scenario, EL2 also has an upper and a lower region.
Each region of address space has a size of up to 252 bytes. However, each region can be independently shrunk to a smaller size.
The TnSZ fields in the TCR_ELx registers control the size of the virtual address space. For example, this diagram shows that
TCR_EL1 controls the EL0/EL1 virtual address space:
The virtual address size can also be expressed as a number of address bits:
Therefore, if TCR_EL1.SZ1 is set to 32, the size of the kernel region in the EL0/EL1 virtual address space is 232 bytes
(0xFFFF_FFFF_0000_0000 to 0xFFFF_FFFF_FFFF_FFFF). Any address that is outside of the configured range or ranges will,
when it is accessed, generate an exception as a translation fault. The advantage of this configuration is that we only need to
describe as much of the address space as we want to use, which saves time and space. For example, imagine that the OS kernel
needs 1GB of address space (30-bit address size) for its kernel space. If the OS sets T1SZ to 34, then only the translation table
entries to describe 1GB are created, as 64 – 34 = 30.
Note: All Armv8-A implementations support 48-bit virtual addresses. Support for 52-bit virtual addresses is optional and
reported by ID_AA64MMFR2_EL1. At the time of writing, none of the Arm Cortex-A processors support 52-bit virtual
addresses.
Memory management 101811_0100_00
Version 1.0
Note: In Armv8.0-A, the maximum size for a physical address is 48 bits. This was extended to 52 bits in Armv8.2-A.
The size of the IPA space can be configured in the same way as the virtual address space. VTCR_EL2.T0SZ controls the size. The
maximum size that can be configured is the same as the physical address size that is supported by the processor. This means that
you cannot configure a larger IPA space than the supported physical address space.
5.2. Address Space Identifiers - Tagging translations with the owning process
Many modern OSs have all applications that seem to run from the same address region, this is what we have described as user
space. In practice, different applications require different mappings. This means, for example, that the translation for VA 0x8000
depends on which application is currently running.
Ideally, we would like the translations for different applications to coexist within the Translation Lookaside Buffers (TLBs), to
prevent the need for TLB invalidates on a context switch. But how would the processor know which version of the VA 0x8000
translation to use? In Armv8-A, the answer is Address Space Identifiers (ASIDs).
For the EL0/EL1 virtual address space, translations can be marked as Global (G) or Non-Global (nG) using the nG bit in the
attributes field of the translation table entry. For example, kernel mappings are Global translations, and application mappings are
Non-Global translations. Global translations apply whichever application is currently running. Non-Global translations only
apply with a specific application.
Non-Global mappings are tagged with an ASID in the TLBs. On a TLB lookup, the ASID in the TLB entry is compared with the
currently selected ASID. If they do not match, then the TLB entry is not used. This diagram shows a Global mapping in the kernel
space with no ASID tag and a non-Global mapping in user space with an ASID tag:
Memory management 101811_0100_00
Version 1.0
The diagram shows that TLB entries for multiple applications are allowed to coexist in the cache, and the ASID determines which
entry to use.
The ASID is stored in one of the two TTBRn_EL1 registers. Usually TTBR0_EL1 is used for user space. As a result, a single
register update can change both the ASID and the translation table that it points to.
Note: When virtualization is supported for a security state, EL0/EL1 translations are always tagged with a VMID – even if Stage
2 translation is not enabled. This means that if you are writing initialization code and are not using a hypervisor, it is important to
set a known VMID value before setting up the Stage 1 MMU.
For Armv8.0-A the answer is that they do not have to mean the same thing. There is no requirement for software to use a given
ASID in the same way across multiple processors. For example, ASID 5 might be used by the calculator on one processor and by
the web browser on another processor. This means that a TLB entry that is created by one processor cannot be used by another
processor.
In practice, it is unlikely that software will use ASIDs differently across processors. It is more common for software to use ASIDs
and VMIDs in the same way on all processors in a given system. Therefore, Armv8.2-A introduced the Common not Private (CnP)
bit in the Translation Table Base Register (TTBR). When the CnP bit is set, the software promises to use the ASIDs and VMIDs in
the same way on all processors, which allows the TLB entries that are created by one processor to be used by another.
Note: We have been talking about processors, however, technically, we should be using the term, Processing Element (PE). PE is a
generic term for any machine that implements the Arm architecture. It is important here because there are microarchitectural
reasons why sharing TLBs between processors would be difficult. But within a multithreaded processor, where each hardware
thread is a PE, it is much more desirable to share TLB entries.
Memory management 101811_0100_00
Version 1.0
Note: For purposes of clarity, this diagram does not specify the width of bit fields. You can find this information in the Arm
Architecture Reference Manual Armv8, for Armv8-A architecture profile: The VMSAv8-64 translation table format descriptors.
Each entry is 64 bits and the bottom two bits determine the type of entry.
Notice that some of the table entries are only valid at specific levels. The maximum number of levels of tables is four, which is
why there is no table descriptor for level 3 (or the fourth level), tables. Similarly, there are no Block descriptors or Page
descriptors for level 0. Because level 0 entry covers a large region of virtual address space, it does not make sense to allow
blocks.
Note: The encoding for the Table descriptor at levels 0-2 is the same as the Page descriptor at level 3. This encoding allows
'recursive tables', which point back to themselves. This is useful because it makes it easy to calculate the virtual address of a
particular page table entry so that it can be updated.
Memory management 101811_0100_00
Version 1.0
7 Translation granule
A translation granule is the smallest block of memory that can be described. Nothing smaller can be described, only larger blocks,
which are multiples of the granule.
Armv8-A supports three different granule sizes: 4KB, 16KB, and 64KB.
The granule sizes that a processor supports are IMPLEMENTATION DEFINED and are reported by ID_AA64MMFR0_EL1. All
Arm Cortex-A processors support 4KB and 64KB. The selected granule is the smallest block that can be described in the latest
level table. Larger blocks can also be described. This table shows the different block sizes for each level of table based on the
selected granule:
* There are restrictions on using 52-bit addresses. When the selected granule is 4KB or 16KB, the maximum virtual address
region size is 48 bits. Similarly, output physical addresses are limited to 48 bits. It is only when the 64KB granule is used that the
full 52 bits can be used.
Note: TCR_EL1 has two separate fields that control the granule size for the kernel space and the user space virtual address
ranges. These fields are called TG1 for kernel space and TG0 for user space. A potential problem for programmers is that these
two fields have different encodings.
Memory management 101811_0100_00
Version 1.0
The previous table summarized the block size (size of virtual address range covered by a single entry) for each granule at each
level of table. From the block size, you can work out which bits of the virtual address are used to index each level of table.
Let us take the 4KB granule as an example. This diagram shows the bits that are used to index the different levels of table for a
4KB granule:
Imagine that, for a configuration, you set the size of the virtual address space, TCR_ELx.T0SZ, to 32. Then the size of the virtual
address space, in address bits, is calculated as:
If we look at the previous 4KB granule diagram again, level 0 is indexed by bits 47:39. With a 32-bit address space you do not
have these bits. Therefore, the starting level of translation for your configuration is level 1.
This time, you do not have any other bits that are used to index the level 0 table or the level 1 table, so the starting level of
translation for your configuration is level 2.
As the previous diagram shows, when the size of the virtual address space reduces, you need fewer levels of tables to describe it.
These examples are based on using the 4KB granule. The same principle applies when using 16KB and 64KB granules, but the
address bits change.
Memory management 101811_0100_00
Version 1.0
• SCTLR_ELx
o M - Enable Memory Management Unit (MMU).
o C - Enable for data and unified caches.
o EE - Endianness of translation table walks.
• TTBR0_ELx and TTBR1_ELx
o BADDR - Physical address (PA) (or intermediate physical address, IPA, for EL0/EL1) of start of translation
table.
o ASID - The Address Space Identifier for Non-Global translations.
• TCR_ELx
o PS/IPS - Size of PA or IPA space, the maximum output address size.
o TnSZ - Size of address space covered by table.
o TGn - Granule size.
o SH/IRGN/ORGN - Cacheability and shareability to be used by MMU table walks.
o TBIn - Disabling of table walks to a specific table.
• MAIR_ELx
o Attr - Controls the Type and cacheability in Stage 1 tables.
Note: The TLBs are caches of translations, not caches of the translation tables. The difference is subtle. Several register fields
control how the translation table entries are interpreted. What is in a TLB entry is the interpretation of the translation table
entry given the configuration at the point that the tables were walked. In the Arm Architecture Reference Manual (Arm ARM), such
register fields are described as 'permitted to be cached in a TLB'.
If you change a translation table entry, or the controls that affect how entries are interpreted, then you need to invalidate the
affected entries in the TLB. If you do not invalidate those entries, then the processor might continue to use the old translation.
The processor is not permitted to cache a translation into the TLBs that results in any of the following faults:
As a result, you do not need to issue a TLB invalidate when mapping an address for the first time. However, you do need to issue
a TLB invalidate if you want to do any of the following:
• Unmap an address
Take an address that was previously valid or mapped and mark it as faulting.
Where:
Consider, for example, an OS that is updating an entry in its kernel translation tables. A typical TLB invalidate sequence would
look like this:
The syntax of the AT instruction lets you specify which translation regime to use. For example, EL2 can query the EL0/EL1
translation regime. However, EL1 cannot use the AT instruction to query the EL2 translation regime, as this is a breach of
privilege.
If the requested translation would have caused a fault, no exception is generated. Instead, the type of fault that would have been
generated is recorded in PAR_EL1.
Memory management 101811_0100_00
Version 1.0
A. A stage is the process of translating an input address to an output address. For Stage 1 this is the process of going from VA to
IPA and for Stage 2 going from IPA to PA.
A level refers to the tables in a given stage of translation. It is also how a larger block can be subdivided into smaller blocks.
A. The maximum size of the physical address space is IMPLEMENTATION DEFINED, and up to 52 bits (since Armv8.2-A).
Q. Which register field controls the size of the virtual address space?
A. It invalidates all the TLB entries for the EL3 virtual address space.
Q. Can a translation table entry that causes a Translation Fault be cached in the TLBs?
A. Addresses are flat mapped, so that the input and output addresses are the same.
A. An ASID is an Address Space Identifier, it identifies which application a translation is associated with. Non-Global mappings
(nG=1) are tagged with an ASID in the TLBs.
Memory management 101811_0100_00
Version 1.0
11 Related information
Here are some resources related to material in this guide:
Caches
• This topic is covered in the Caches and coherency guide (coming soon).
Virtualization
• For Armv8-A, this topic is covered in Virtualization.
• This topic is also covered in our R-profile guide, Armv8-R Virtualization.
• Introduction to Armv8-A
• Memory model overview
• Memory types overview
• What does architecture consist of?
Memory management 101811_0100_00
Version 1.0
12 Next steps
This guide has introduced the concept of memory management, explaining the mapping of virtual to physical addresses.
Understanding this information will help you to create your own bare-metal page tables and understand the processes your OS
performs when allocating memory.
Memory types and attributes, such as access permissions, are covered in the Memory Model guide.
As well as the Memory Management Unit (MMU) in the processor, it is increasingly common to have MMUs for non-processor
masters, such as Direct Memory Access (DMA) engines. These are referred to as SMMUs (System MMUs) in Arm systems, and
elsewhere as IOMMU.
To keep learning about the Armv8-A architecture, see more in our series of guides.