Physical Memory Models: how the Linux kernel
addresses physical memory (physical page frames)
Adrian Huang | June, 2022
* Kernel 5.11 (x86_64)
Agenda
• Four physical memory models
✓Purpose: page descriptor <-> PFN (Page Frame Number)
• Sparse memory model
• Sparse Memory Virtual Memmap: subsection
• page->flags
Four Physical Memory Models
• Flat Memory Model (CONFIG_FLATMEM)
✓UMA (Uniform Memory Access) with mostly contiguous physical memory
• Discontinuous Memory Model (CONFIG_DISCONTIGMEM)
✓NUMA (Non-Uniform Memory Access) with mostly contiguous physical memory
✓Removed since v5.14 because sparse memory model can cover this scope
• https://lore.kernel.org/linux-mm/20210602105348.13387-1-rppt@kernel.org/
• Sparse Memory (CONFIG_SPARSEMEM)
✓NUMA with discontiguous physical memory
• Sparse Memory Virtual Memmap (CONFIG_SPARSEMEM_VMEMMAP)
✓NUMA with discontiguous physical memory: a quick way to convert between page struct and PFN
Memory Model – Flat Memory
[Diagram] mem_map: a single page-structure array (struct page #0 … #n) in kernel virtual address space; entry #i describes physical page frame #i.
Note
1. [mem_map] Dynamic page structure array: pre-allocate all page structures based on the number of page frames
✓ Allocate/initialize page structures based on the node’s memory info (struct pglist_data)
▪ Refer to: pglist_data.node_start_pfn & pglist_data.node_spanned_pages
2. Scenario: contiguous page frames (no memory holes) in UMA
3. Drawbacks
✓ Wastes node_mem_map space if there are memory holes
✓ Does not support memory hotplug
4. Check the kernel function alloc_node_mem_map() in mm/page_alloc.c
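With FLATMEM, converting between a page descriptor and a PFN is plain array arithmetic over mem_map. A minimal userspace sketch of that arithmetic (struct page, the array size, and ARCH_PFN_OFFSET below are illustrative stand-ins, not the kernel definitions):

```c
#include <stdio.h>

/* Stand-in for the kernel's struct page (the real one is much larger). */
struct page { unsigned long flags; };

#define ARCH_PFN_OFFSET 0UL          /* first valid PFN on this platform */
static struct page mem_map[1024];    /* pre-allocated descriptor array   */

/* FLATMEM: page descriptor <-> PFN is simple pointer arithmetic. */
static struct page *pfn_to_page(unsigned long pfn)
{
    return &mem_map[pfn - ARCH_PFN_OFFSET];
}

static unsigned long page_to_pfn(struct page *page)
{
    return (unsigned long)(page - mem_map) + ARCH_PFN_OFFSET;
}

int main(void)
{
    struct page *p = pfn_to_page(42);
    printf("pfn 42 -> page %p -> pfn %lu\n", (void *)p, page_to_pfn(p));
    return 0;
}
```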
Memory Model – Flat Memory
Memory Model – Discontinuous Memory
[Diagram] node_data[]: an array of per-node struct pglist_data pointers (NUMA node structures in kernel virtual address space); each node’s node_mem_map is a page-structure array covering that node’s contiguous page frames.
Note
1. [node_mem_map] Dynamic page structure array: pre-allocate all page structures based on the number of page frames
✓ Allocate/initialize page structures based on each node’s memory info (struct pglist_data)
▪ Refer to: pglist_data.node_start_pfn & pglist_data.node_spanned_pages
2. Scenario: each node has contiguous page frames (no memory holes) in NUMA
3. Drawbacks
✓ Wastes node_mem_map space if there are memory holes
✓ Does not support memory hotplug
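Under the discontiguous model the lookup has to resolve the owning node first and then index that node’s node_mem_map. A rough userspace sketch with a made-up two-node layout (the struct pglist_data here keeps only the fields the sketch needs):

```c
#include <stdio.h>

struct page { unsigned long flags; };

/* Simplified stand-in for struct pglist_data. */
struct pglist_data {
    struct page  *node_mem_map;       /* per-node page-descriptor array      */
    unsigned long node_start_pfn;     /* first PFN owned by this node        */
    unsigned long node_spanned_pages; /* number of PFNs spanned by this node */
};

#define MAX_NUMNODES 2
static struct page node0_map[1000], node1_map[1000];
static struct pglist_data node_data[MAX_NUMNODES] = {
    { node0_map,    0, 1000 },        /* node 0: PFN 0    .. 999  */
    { node1_map, 1000, 1000 },        /* node 1: PFN 1000 .. 1999 */
};

/* DISCONTIGMEM-style lookup: resolve the node first, then index its map. */
static int pfn_to_nid(unsigned long pfn)
{
    for (int nid = 0; nid < MAX_NUMNODES; nid++) {
        struct pglist_data *pgdat = &node_data[nid];
        if (pfn >= pgdat->node_start_pfn &&
            pfn < pgdat->node_start_pfn + pgdat->node_spanned_pages)
            return nid;
    }
    return -1;                        /* caller must pass a valid PFN here */
}

static struct page *pfn_to_page(unsigned long pfn)
{
    struct pglist_data *pgdat = &node_data[pfn_to_nid(pfn)];
    return &pgdat->node_mem_map[pfn - pgdat->node_start_pfn];
}

int main(void)
{
    printf("pfn 1500 -> node %d, page %p\n",
           pfn_to_nid(1500), (void *)pfn_to_page(1500));
    return 0;
}
```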
Memory Model – Sparse Memory
[Diagram] **mem_section: a two-level array of section descriptors (struct mem_section * roots, each pointing to an array of struct mem_section); every present section’s page frames are described by a per-section page-structure array (struct page #0 … #n). Node #0 and Node #1 own different sections, and Node #1 can be hot-plugged.
Note
1. [section_mem_map] Dynamic page structure arrays: pre-allocate page structures based on the number of available page frames
✓ Refer to: the memblock structure
2. Supports physical memory hotplug
3. Minimum unit: PAGES_PER_SECTION = 32768
✓ Each memory section addresses 32768 * 4KB (page size) = 128MB of memory
4. [NUMA] Reduces the impact of memory holes thanks to the per-section “struct mem_section” granularity
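With classic SPARSEMEM the PFN selects a section through the two-level **mem_section array, and that section’s map is then indexed. A simplified userspace model using the x86_64 constants above (the real kernel additionally encodes flag bits and a PFN bias into section_mem_map, which is omitted here):

```c
#include <stdio.h>
#include <stdlib.h>

struct page { unsigned long flags; };

/* x86_64 numbers from this deck: SECTION_SIZE_BITS = 27, PAGE_SHIFT = 12. */
#define PFN_SECTION_SHIFT   15                         /* 27 - 12                              */
#define PAGES_PER_SECTION   (1UL << PFN_SECTION_SHIFT) /* 32768 pages = 128 MB                 */
#define SECTIONS_PER_ROOT   256                        /* PAGE_SIZE / sizeof(struct mem_section) */
#define NR_SECTION_ROOTS    2048

struct mem_section { struct page *section_mem_map; };

/* SPARSEMEM_EXTREME: two-level array, roots allocated on demand. */
static struct mem_section *mem_section[NR_SECTION_ROOTS];

static struct mem_section *nr_to_section(unsigned long section_nr)
{
    unsigned long root = section_nr / SECTIONS_PER_ROOT;
    unsigned long idx  = section_nr % SECTIONS_PER_ROOT;
    return mem_section[root] ? &mem_section[root][idx] : NULL;
}

/* Classic sparsemem pfn_to_page: find the section, then index its map.
 * The real kernel stores section_mem_map pre-biased by the section's first
 * PFN so it can index with the raw PFN; here we mask explicitly. */
static struct page *pfn_to_page(unsigned long pfn)
{
    struct mem_section *ms = nr_to_section(pfn >> PFN_SECTION_SHIFT);
    return &ms->section_mem_map[pfn & (PAGES_PER_SECTION - 1)];
}

int main(void)
{
    unsigned long pfn = 5 * PAGES_PER_SECTION + 7;      /* a PFN inside section #5 */
    unsigned long section_nr = pfn >> PFN_SECTION_SHIFT;

    /* "Hot add" section #5: allocate its root and its page-structure array. */
    mem_section[0] = calloc(SECTIONS_PER_ROOT, sizeof(struct mem_section));
    mem_section[0][section_nr].section_mem_map =
        calloc(PAGES_PER_SECTION, sizeof(struct page));

    printf("pfn %lu -> section %lu -> page %p\n",
           pfn, section_nr, (void *)pfn_to_page(pfn));
    return 0;
}
```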
Memory Model – Sparse Memory Virtual Memmap
Memory Section (two-dimension array)
[Diagram] Same two-level mem_section array as in the previous slide, but every section’s page structures are placed in the virtually contiguous memory map (vmemmap), so PFN <-> struct page becomes plain pointer arithmetic.
Note
1. [section_mem_map] Dynamic page structure arrays: pre-allocate page structures based on the number of available page frames
✓ Refer to: the memblock structure
2. Supports physical memory hotplug
3. Minimum unit: PAGES_PER_SECTION = 32768
✓ Each memory section addresses 32768 * 4KB (page size) = 128MB of memory
4. [NUMA] Reduces the impact of memory holes thanks to the per-section “struct mem_section” granularity
5. Employs a virtual memory map (vmemmap / vmemmap_base) – a quick way to convert between page struct and PFN (see the sketch below)
6. Default configuration in the Linux kernel
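Because every present section’s page structures are placed back-to-back in the vmemmap region, the conversion collapses to a single addition; the kernel’s macros for this case (include/asm-generic/memory_model.h) have exactly the shape __pfn_to_page(pfn) = vmemmap + pfn and __page_to_pfn(page) = page - vmemmap. A tiny userspace model:

```c
#include <stdio.h>

struct page { unsigned long flags; };

/* Userspace stand-in: pretend this array starts at vmemmap_base. In the real
 * kernel only the slices belonging to present 128 MB sections are backed by
 * physical pages; the rest of the 1 TB vmemmap region stays unmapped. */
static struct page vmemmap[1 << 16];

/* Same shape as the kernel's CONFIG_SPARSEMEM_VMEMMAP conversion macros. */
#define pfn_to_page(pfn)  (vmemmap + (pfn))
#define page_to_pfn(page) ((unsigned long)((page) - vmemmap))

int main(void)
{
    struct page *p = pfn_to_page(12345);
    printf("pfn 12345 -> page %p -> pfn %lu\n", (void *)p, page_to_pfn(p));
    return 0;
}
```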
Memory Model – Sparse Memory Virtual Memmap: Detail
[Diagram] PFN bit decomposition with SPARSEMEM_EXTREME on x86_64 (SECTION_SIZE_BITS = 27):
• Bits 0-14: page index within a section (PAGES_PER_SECTION = 32768)
• Bits 15-22: section index within a section root (SECTIONS_PER_ROOT = 256)
• Bits 23-33: section root index (NR_SECTION_ROOTS = 2048)
The two-dimensional **mem_section array is indexed by [root #0 … #2047][section #0 … #255]; each present section covers 128 MB of page frames, and its struct page #0 … #32767 live in the contiguous vmemmap. Sections (and their slices of the vmemmap) are hot-added or hot-removed at this 128 MB granularity.
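A small userspace sketch of the bit arithmetic above: decompose a PFN into its section number, section-root index, index within the root, and page offset. The constants are the ones in the diagram; the example PFN 0x450000 is the node-1 start address used later in this deck:

```c
#include <stdio.h>

/* x86_64 SPARSEMEM_EXTREME constants from the diagram above. */
#define PFN_SECTION_SHIFT  15   /* bits  0-14: page index within the section */
#define SECTION_ROOT_SHIFT  8   /* bits 15-22: section index within the root */
#define PAGES_PER_SECTION  (1UL << PFN_SECTION_SHIFT)
#define SECTIONS_PER_ROOT  (1UL << SECTION_ROOT_SHIFT)

int main(void)
{
    unsigned long pfn = 0x450000;                         /* node 1 start in the example */
    unsigned long section_nr = pfn >> PFN_SECTION_SHIFT;  /* which 128 MB section        */
    unsigned long root = section_nr / SECTIONS_PER_ROOT;  /* index into **mem_section    */
    unsigned long idx  = section_nr % SECTIONS_PER_ROOT;  /* index within that root      */
    unsigned long off  = pfn & (PAGES_PER_SECTION - 1);   /* page within the section     */

    /* Prints: section 138 (root 0, index 138), page offset 0 */
    printf("PFN %#lx -> section %lu (root %lu, index %lu), page offset %lu\n",
           pfn, section_nr, root, idx, off);
    return 0;
}
```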
Sparse Memory Model
1. How to know available memory pages in a system?
2. Page Table Configuration for Direct Mapping
3. Sparse Memory Model Initialization – Detail
How to know available memory pages in a system?
BIOS e820 → memblock → zone page-frame allocator
[Call path] e820__memblock_setup() populates memblock from the e820 map; __free_pages_core() releases the available memory from memblock to the zone page-frame allocator.
Zone page allocator details will be discussed in another session: physical memory management
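A rough userspace model of the idea: walk memblock-style (base, size, nid) regions and sum the available page frames. The region values below are a subset of the memory_present() example shown later in this deck:

```c
#include <stdio.h>

/* Minimal model of memblock's 'memory' type: (base, size, nid) regions
 * filled from the BIOS e820 map. */
struct region { unsigned long long base, size; int nid; };

static const struct region memory[] = {
    { 0x1000,        0x9f000,      0 },
    { 0x100000,      0x2ff00000,   0 },
    { 0x30042000,    0x1d6e000,    0 },
    { 0x450000000,   0x3ffc00000,  1 },
};

#define PAGE_SHIFT 12

int main(void)
{
    unsigned long long total_pfns = 0;

    /* Rough equivalent of for_each_mem_pfn_range(): walk every available
     * region and accumulate its page frames. */
    for (unsigned i = 0; i < sizeof(memory) / sizeof(memory[0]); i++) {
        unsigned long long start_pfn = memory[i].base >> PAGE_SHIFT;
        unsigned long long end_pfn   = (memory[i].base + memory[i].size) >> PAGE_SHIFT;
        printf("region %u: PFN %#llx - %#llx (nid %d)\n",
               i, start_pfn, end_pfn, memory[i].nid);
        total_pfns += end_pfn - start_pfn;
    }
    printf("available pages: %llu (~%llu MB)\n",
           total_pfns, total_pfns >> (20 - PAGE_SHIFT));
    return 0;
}
```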
setup_arch() -- Focus on memory portion
setup_arch
Reserve memblock for kernel code +
data/bss sections, page #0 and init ramdisk
e820__memory_setup
Setup init_mm struct for members
‘start_code’, ‘end_code’, ‘end_data’ and ‘brk’
memblock_x86_reserve_range_setup_data
e820__reserve_setup_data
e820__finish_early_params
efi_init
dmi_setup
e820_add_kernel_range
trim_bios_range
max_pfn = e820__end_of_ram_pfn()
kernel_randomize_memory
e820__memblock_setup
init_mem_mapping
x86_init.paging.pagetable_init
early_alloc_pgt_buf
reserve_brk
init_memory_mapping()
• Create 4-level page table (direct mapping) based on
‘memory’ type of memblock configuration.
x86_init.paging.pagetable_init()
• Init sparse
• Init zone structure
x86 - setup_arch() -- init_mem_mapping() – Page Table
Configuration for Direct Mapping
init_mem_mapping
probe_page_size_mask
setup_pcid
memory_map_top_down(ISA_END_ADDRESS, end)
init_memory_mapping(0, ISA_END_ADDRESS, PAGE_KERNEL)
init_range_memory_mapping(start, last_start)
split_mem_range
kernel_physical_mapping_init
add_pfn_range_mapped
early_ioremap_page_table_range_init [x86 only]
load_cr3(swapper_pg_dir)
__flush_tlb_all
init_memory_mapping() -> kernel_physical_mapping_init()
• Create 4-level page table (direct mapping) based on
‘memory’ type of memblock configuration.
split_mem_range()
• Split the input memory range (start and end address) into groups of different page sizes
✓ Try larger page size first
▪ 1G huge page -> 2M huge page -> 4K page
while (last_start > map_start)
init_memory_mapping(start, end, PAGE_KERNEL)
for_each_mem_pfn_range() → memblock stuff
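A userspace sketch of the split_mem_range() idea: carve an input range into unaligned head/tail pieces mapped with 4K pages and a 2M-aligned middle mapped with larger pages (the 1G step is omitted, and the start/end values are arbitrary examples):

```c
#include <stdio.h>

#define SZ_4K  0x1000UL
#define SZ_2M  0x200000UL

static unsigned long round_up_to(unsigned long x, unsigned long a)   { return (x + a - 1) & ~(a - 1); }
static unsigned long round_down_to(unsigned long x, unsigned long a) { return x & ~(a - 1); }

int main(void)
{
    unsigned long start = 0x100000, end = 0x30000000;   /* example physical range */
    unsigned long big_start = round_up_to(start, SZ_2M);
    unsigned long big_end   = round_down_to(end, SZ_2M);

    if (big_start >= big_end) {                /* range too small for 2M pages */
        printf("4K pages : %#lx - %#lx\n", start, end);
        return 0;
    }
    if (start < big_start)                     /* unaligned head */
        printf("4K pages : %#lx - %#lx\n", start, big_start);
    printf("2M pages : %#lx - %#lx\n", big_start, big_end);
    if (big_end < end)                         /* unaligned tail */
        printf("4K pages : %#lx - %#lx\n", big_end, end);
    return 0;
}
```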
Page Table Configuration for Direct Mapping
[Diagram] x86_64 64-bit virtual address layout (4-level page tables) and how physical memory (ZONE_DMA 0-16MB, ZONE_DMA32, ZONE_NORMAL) is direct-mapped into it:
• 0x0000_0000_0000_0000 – 0x0000_7FFF_FFFF_FFFF: user space (128TB)
• Empty space (non-canonical hole)
• 0xFFFF_8000_0000_0000: start of kernel space
✓ Guard hole (8TB)
✓ LDT remap for PTI (0.5TB)
✓ page_offset_base: page frame direct mapping (64TB) – maps all physical memory
✓ Unused hole (0.5TB)
✓ vmalloc_base: vmalloc/ioremap (32TB)
✓ Unused hole (1TB)
✓ vmemmap_base: virtual memory map (1TB) – stores the page frame descriptors (struct page)
✓ …
✓ __START_KERNEL_map = 0xFFFF_FFFF_8000_0000: kernel text mapping from physical address 0; kernel code [.text, .data, …] starts at __START_KERNEL = 0xFFFF_FFFF_8100_0000 (1GB or 512MB)
✓ MODULES_VADDR: modules (1GB or 1.5GB)
✓ FIXADDR_START – FIXADDR_TOP = 0xFFFF_FFFF_FF7F_F000: fix-mapped address space (expanded to 4MB: 05ab1d8a4b36)
✓ Unused hole (2MB) at 0xFFFF_FFFF_FFE0_0000, up to 0xFFFF_FFFF_FFFF_FFFF
Default configuration (can be dynamically configured by KASLR – Kernel Address Space Layout Randomization, "arch/x86/mm/kaslr.c"):
• page_offset_base = 0xFFFF_8880_0000_0000
• vmalloc_base = 0xFFFF_C900_0000_0000
• vmemmap_base = 0xFFFF_EA00_0000_0000
Reference: Documentation/x86/x86_64/mm.rst
Note: Refer to page #5 in the slide deck Decompressed vmlinux: linux kernel initialization from page table configuration perspective
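A small sketch of the address arithmetic this layout enables, using the default (non-KASLR) bases above; sizeof(struct page) = 64 bytes is assumed for illustration:

```c
#include <stdio.h>

#define PAGE_SHIFT        12
#define PAGE_OFFSET_BASE  0xFFFF888000000000ULL   /* direct mapping of all RAM */
#define VMEMMAP_BASE      0xFFFFEA0000000000ULL   /* array of struct page      */
#define STRUCT_PAGE_SIZE  64ULL                   /* assumed descriptor size   */

int main(void)
{
    unsigned long long phys = 0x450000000ULL;     /* node 1 start in the example */
    unsigned long long pfn  = phys >> PAGE_SHIFT;

    unsigned long long direct_va = PAGE_OFFSET_BASE + phys;               /* __va(phys)    */
    unsigned long long page_va   = VMEMMAP_BASE + pfn * STRUCT_PAGE_SIZE; /* &vmemmap[pfn] */

    printf("phys %#llx -> direct-map VA %#llx, struct page at %#llx (pfn %#llx)\n",
           phys, direct_va, page_va, pfn);
    return 0;
}
```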
init_mem_mapping() – Page Table Configuration for Direct Mapping
Note
• 2-socket server with 32GB memory
setup_arch() -- init_mem_mapping() – Page Table
Configuration for Direct Mapping
init_memory_mapping() -> kernel_physical_mapping_init()
• Create 4-level page table (direct mapping) based on
‘memory’ type of the memblock configuration.
x86 - setup_arch() -- x86_init.paging.pagetable_init()
x86_init.paging.pagetable_init
native_pagetable_init
Remove mappings at the end of physical memory
from the boot-time page table
paging_init
pagetable_init
__flush_tlb_all
sparse_init
zone_sizes_init
permanent_kmaps_init
x86_init.paging.pagetable_init
native_pagetable_init
paging_init
sparse_init
zone_sizes_init
x86 vs x86_64 call flows
Configure the number of PFNs for each zone
free_area_init
Sparse Memory Model Initialization: sparse_init()
sparse_init
memblocks_present
pnum_begin = first_present_section_nr();
nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
for_each_mem_pfn_range(..)
memory_present(nid, start, end)
1. for_each_mem_pfn_range(): walk through the available memory ranges
from the memblock subsystem
Allocate the section root’s mem_section array if necessary
for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION)
sparse_index_init
set_section_nid
section_mark_present
Configure ‘ms->section_mem_map’ via
sparse_encode_early_nid()
for_each_present_section_nr(pnum_begin + 1, pnum_end)
sparse_init_nid
sparse_init_nid [Covers the last node]
Mark the present bit for each allocated mem_section
Configure ms->section_mem_map flag bits
1. Allocate a mem_section_usage struct
2. Configure ms->section_mem_map with the valid page descriptor
[During boot]
Temporary: store the nid in
ms->section_mem_map
[During boot]
Temporary: get the nid from ms->section_mem_map
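A userspace model of what memory_present() effectively does here: for every available PFN range, mark each covered 128MB section present. The two ranges below correspond to memblock regions #1 (node 0) and #7 (node 1) from the example that follows:

```c
#include <stdio.h>

#define PFN_SECTION_SHIFT  15
#define PAGES_PER_SECTION  (1UL << PFN_SECTION_SHIFT)
#define NR_SECTIONS        512                      /* enough for this example */

static unsigned char section_present[NR_SECTIONS];  /* stand-in for the P flag */

static void memory_present(int nid, unsigned long start, unsigned long end)
{
    start &= ~(PAGES_PER_SECTION - 1);               /* align down to a section */
    for (unsigned long pfn = start; pfn < end; pfn += PAGES_PER_SECTION)
        section_present[pfn >> PFN_SECTION_SHIFT] = 1;
    printf("nid %d: sections %lu - %lu marked present\n",
           nid, start >> PFN_SECTION_SHIFT, (end - 1) >> PFN_SECTION_SHIFT);
}

int main(void)
{
    memory_present(0, 0x100,    0x30000);    /* region #1: base 0x100000,      size 0x2ff00000   */
    memory_present(1, 0x450000, 0x84fc00);   /* region #7: base 0x4_5000_0000, size 0x3_ffc0_0000 */
    return 0;
}
```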
memory_present(): walking memblock and marking sections present
[Diagram sequence] The memblock ‘memory’ type of the example two-node system holds regions such as:
• memblock_region #0: base = 0x1000, size = 0x9f000, nid = 0
• memblock_region #1: base = 0x100000, size = 0x2ff00000, nid = 0
• memblock_region #2: base = 0x3004_2000, size = 0x1d6_e000, nid = 0
• …
• memblock_region #7: base = 0x4_5000_0000, size = 0x3_ffc0_0000, nid = 1
For every 128MB section touched by a region, memory_present() allocates the section-root entry (struct mem_section * #0 … #2047) on demand and marks the covered struct mem_section present: node 0 progressively populates sections #0, #5, #6, … of section root #0, while node 1 populates sections #138-#255 of root #0 and #0-#9 of root #1. At this stage each touched section still has section_mem_map = 0 and flag bits O=1, P=1, E=0, M=0.
Legend – P: Present, M: Memory map, O: Online, E: Early
sparse_init_nid(): cfg mem_section_map
[Diagram] For each node, sparse_init_nid() handles the node’s present sections (map_count of them) on a per-node basis:
• Allocate the struct mem_section_usage array (#0 … #n) for the node.
• Allocate the page structures for each present mem_section and map them into the page table of the virtual memory map: section #0 of section_roots #0 gets struct page #0 … #32767 starting at vmemmap (VMEMMAP_START = vmemmap_base), section #1 gets struct page #32768 … #65535, and so on.
Note
sparse_init_nid(): cfg mem_section_map (cont.)
[Diagram] After sparse_init_nid(), every present mem_section has its flag bits set (P=1, M=1, O=1, E=1) and section_mem_map pointing at its page-structure array inside the vmemmap:
• Node 0 – section_roots #0: section #0 → struct page #0 … #32767, section #1 → struct page #32768 … #65535, sections #2-6 → … struct page #229375
• Node 1 – section_roots #0: sections #138-255 → struct page #4521984 … #8388607; section_roots #1: sections #0-9 → struct page #8388608 … #8683520
[Diagram] The same 64-bit kernel virtual address layout as shown earlier (Documentation/x86/x86_64/mm.rst): the per-section page-structure arrays from the previous slide all land inside the 1TB virtual memory map region starting at vmemmap = VMEMMAP_START = vmemmap_base (0xFFFF_EA00_0000_0000 by default), so the struct page for PFN n is simply vmemmap + n.
Re-visit sparse memory
Sparse Memory: Refer to section_mem_map
Sparse Memory with vmemmap: Refer to vmemmap
Sparse Memory Virtual Memmap:
subsection
1. Introduction
2. Subsection users?
3. pageblock_flags: pageblock migration type
Sparse Memory Virtual Memmap: subsection (1/4)
[Diagram] PFN bit decomposition extended with subsections (SECTION_SIZE_BITS = 27):
• Bits 0-8: page index within a subsection (PAGES_PER_SUBSECTION = 512)
• Bits 9-14: subsection index within a section (SUBSECTIONS_PER_SECTION = 64, subsection #0 … #63)
• Bits 15-22 / 23-33: section within root / section root index, as before
Each present section’s struct mem_section_usage holds subsection_map[1] (a bitmap with one bit per subsection) and pageblock_flags[]; a section’s struct page #0 … #32767 is split into subsections of 512 pages each (struct page #0 … #511, …, #32256 … #32767).
• subsection_map: bitmap to indicate if the corresponding subsection is valid
• pageblock_flags: pages of a subsection have the same flag (migration type)
sparsemem vmemmap *only*
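A userspace model of the subsection bookkeeping described above: 64 subsections of 512 pages per section, tracked by a one-bit-per-subsection bitmap that a pfn_section_valid()-style check consults (simplified; the kernel keeps this bitmap in struct mem_section_usage):

```c
#include <stdio.h>

#define PFN_SECTION_SHIFT       15
#define PAGES_PER_SUBSECTION    512UL    /* 512 * 4 KB = 2 MB */
#define SUBSECTIONS_PER_SECTION 64

struct mem_section_usage { unsigned long long subsection_map; /* 1 bit per subsection */ };

static int pfn_section_valid(const struct mem_section_usage *usage, unsigned long pfn)
{
    unsigned long idx = (pfn & ((1UL << PFN_SECTION_SHIFT) - 1)) / PAGES_PER_SUBSECTION;
    return (usage->subsection_map >> idx) & 1;
}

int main(void)
{
    struct mem_section_usage usage = { 0 };

    usage.subsection_map |= 1ULL << 3;   /* hot-add subsection #3 (PFNs 1536-2047) */

    printf("pfn 1600 valid? %d\n", pfn_section_valid(&usage, 1600));  /* inside  #3 -> 1 */
    printf("pfn 4000 valid? %d\n", pfn_section_valid(&usage, 4000));  /* outside #3 -> 0 */
    return 0;
}
```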
Sparse Memory Virtual Memmap: subsection (2/4)
Some macros are expanded manually
Note
Sparse Memory Virtual Memmap: subsection (3/4)
• PAGES_PER_SUBSECTION = 512 pages
✓ 512 pages * 4KB = 2MB → 2MB huge page
in x86_64
Sparse Memory Virtual Memmap: subsection (4/4)
• SUBSECTION_SIZE
✓ (1UL << 21) = 2MB → 2MB huge
page in x86_64.
Some macros are expanded manually
Note
subsection: subsection_map users?
• init stage
✓ paging_init -> zone_sizes_init -> free_area_init -> subsection_map_init -> subsection_mask_set
➢ Set the corresponding bitmap bits for the specific subsections
• Reference stage
✓ pfn_section_valid(struct mem_section *ms, unsigned long pfn)
➢ Users
▪ [mm/page_alloc.c: 5089] free_pages -> virt_addr_valid -> __virt_addr_valid -> pfn_valid -> pfn_section_valid
▪ [drivers/char/mem.c: 416] mmap_kmem -> pfn_valid -> pfn_section_valid ➔ /dev/mem (`man mem`)
▪ …
subsection_map users
• Hotplug stage
✓ Add
➢ #A1 [drivers/acpi/acpi_memhotplug.c: 311] acpi_memory_device_add -> acpi_memory_enable_device ->
__add_memory -> add_memory_resource -> arch_add_memory -> add_pages -> __add_pages -> sparse_add_section
-> section_activate -> fill_subsection_map -> subsection_mask_set
➢ #A2 [drivers/dax/kmem.c: 43] dev_dax_kmem_probe -> add_memory_driver_managed -> add_memory_resource ->
same with #A1
✓ Remove
➢ #R1 [drivers/acpi/acpi_memhotplug.c: 311] acpi_memory_device_remove -> __remove_memory ->
try_remove_memory -> arch_remove_memory -> __remove_pages -> __remove_section -> sparse_remove_section ->
section_deactivate -> clear_subsection_map
➢ #R2 [drivers/dax/kmem.c: 139] dev_dax_kmem_remove -> remove_memory -> try_remove_memory -> same with #R1
subsection_map users
subsection: subsection_map users?
pageblock_flags: pageblock migration type
unsigned long pageblock_flags[4] (dynamically allocated): each pageblock’s migration type (MT) takes 4 bits, so every 64-bit element packs the migration types of 16 pageblocks –
• pageblock_flags[0]: migration types of subsections (pageblocks) #0-#15
• pageblock_flags[1]: #16-#31
• pageblock_flags[2]: #32-#47
• pageblock_flags[3]: #48-#63
Migration type is configured in setup_arch -> … -> memmap_init_zone
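A userspace model of how pageblock_flags packs migration types: 4 bits per pageblock, 16 pageblocks per 64-bit word, four words per section (the MIGRATE_MOVABLE value below is illustrative, not the kernel enum):

```c
#include <stdio.h>

#define NR_PAGEBLOCK_BITS   4
#define BLOCKS_PER_WORD     (64 / NR_PAGEBLOCK_BITS)
#define MIGRATE_MOVABLE     1        /* illustrative value */

static unsigned long long pageblock_flags[4];   /* covers one 128 MB section */

static void set_pageblock_migratetype(unsigned int block, unsigned int mt)
{
    unsigned int word  = block / BLOCKS_PER_WORD;
    unsigned int shift = (block % BLOCKS_PER_WORD) * NR_PAGEBLOCK_BITS;

    pageblock_flags[word] &= ~(0xFULL << shift);        /* clear the 4-bit slot */
    pageblock_flags[word] |= (unsigned long long)mt << shift;
}

static unsigned int get_pageblock_migratetype(unsigned int block)
{
    unsigned int word  = block / BLOCKS_PER_WORD;
    unsigned int shift = (block % BLOCKS_PER_WORD) * NR_PAGEBLOCK_BITS;

    return (pageblock_flags[word] >> shift) & 0xF;
}

int main(void)
{
    set_pageblock_migratetype(17, MIGRATE_MOVABLE);     /* lands in pageblock_flags[1] */
    printf("pageblock 17 -> MT %u (word 1 = %#llx)\n",
           get_pageblock_migratetype(17), pageblock_flags[1]);
    return 0;
}
```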
pageblock: set migration type
free_area_init
print zone ranges and early memory node ranges
for_each_mem_pfn_range(..)
print memory range for each memblock
subsection_map_init
mminit_verify_pageflags_layout
setup_nr_node_ids
init_unavailable_mem
for_each_online_node(nid)
free_area_init_node
node_set_state
check_for_memory
get_pfn_range_for_nid
calculate_node_totalpages
pgdat_set_deferred_range
free_area_init_core
free_area_init_core
memmap_init
for (j = 0; j < MAX_NR_ZONES; j++)
memmap_init_zone
subsection_map_init
subsection_mask_set
for (nr = start_sec; nr <= end_sec; nr++)
bitmap_set
calculate arch_zone_{lowest, highest}_possible_pfn[]
for (pfn = start_pfn; pfn < end_pfn;)
set_pageblock_migratetype
__init_single_page
set_pageblock_migratetype
• [System init stage] each pageblock is initialized to MIGRATE_MOVABLE
[Diagram] A zone (example: present_pages = 1311744) is divided into pageblock #0 … #N, where N = round_up(present_pages / pageblock_size) - 1.
pageblock size:
• CONFIG_HUGETLB_PAGE=Y: 512 pages (= huge page size)
• CONFIG_HUGETLB_PAGE=N: 1024 pages (MAX_ORDER - 1)
Example: pageblocks = round_up(1311744 / 512) = 2562 (16 + 2544 + 2 = 2562)
pageblock_flags: pageblock migration type
[CONFIG_HUGETLB_PAGE=y]
pages of subsection = pages of pageblock = 512 pages (order = 9)
page->flags layout
• No sparsemem, or sparsemem vmemmap: | NODE | ZONE | … | FLAGS |
• No sparsemem, or sparsemem vmemmap + last_cpupid: | NODE | ZONE | LAST_CPUPID | … | FLAGS |
• sparsemem: | SECTION | NODE | ZONE | … | FLAGS |
• sparsemem + last_cpupid: | SECTION | NODE | ZONE | LAST_CPUPID | … | FLAGS |
• sparsemem wo/ node: | SECTION | ZONE | … | FLAGS |
1. last_cpupid: Support for NUMA balancing (NUMA-optimizing scheduler)
2. sparsemem: Enabled by CONFIG_SPARSEMEM
Note
…
page->flags layout (bit 63 … bit 0)
page->flags layout: sparsemem vmemmap + last_cpupid
Kernel Configuration: qemu – v5.11 kernel
...
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
…
CONFIG_NR_CPUS=64
…
CONFIG_NODES_SHIFT=10
…
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
…
# CONFIG_KASAN is not set
Resulting bit layout (bit 63 … bit 0):
| NODE (bits 54-63) | ZONE (bits 52-53, 2-bit zone) | LAST_CPUPID (bits 38-51) | … (bits 23-37) | FLAGS (bits 0-22, 23-bit pageflags, enum pageflags) |
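A userspace sketch of extracting the fields from a page->flags word with exactly this layout (NODE at bit 54, ZONE at bit 52, LAST_CPUPID at bit 38, 23 low flag bits); the shift values are specific to this configuration, and the real kernel helpers take a struct page * rather than a raw word:

```c
#include <stdio.h>

/* Field widths and positions for this particular kernel configuration. */
#define NODES_SHIFT         10
#define ZONES_SHIFT          2
#define LAST_CPUPID_SHIFT   14

#define NODES_PGSHIFT       54
#define ZONES_PGSHIFT       52
#define LAST_CPUPID_PGSHIFT 38

static unsigned int page_to_nid(unsigned long flags)
{
    return (flags >> NODES_PGSHIFT) & ((1UL << NODES_SHIFT) - 1);
}

static unsigned int page_zonenum(unsigned long flags)
{
    return (flags >> ZONES_PGSHIFT) & ((1UL << ZONES_SHIFT) - 1);
}

static unsigned int page_cpupid(unsigned long flags)
{
    return (flags >> LAST_CPUPID_PGSHIFT) & ((1UL << LAST_CPUPID_SHIFT) - 1);
}

int main(void)
{
    /* Build a flags word for node 1, zone 2, cpupid 0x123, a few low flag bits. */
    unsigned long flags = (1UL << NODES_PGSHIFT) | (2UL << ZONES_PGSHIFT) |
                          (0x123UL << LAST_CPUPID_PGSHIFT) | 0x5;

    printf("nid=%u zone=%u cpupid=%#x low flag bits=%#lx\n",
           page_to_nid(flags), page_zonenum(flags), page_cpupid(flags),
           flags & 0x7FFFFF);
    return 0;
}
```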
page->flags layout - sparsemem vmemmap + last_cpupid
For comparison, sparsemem + last_cpupid (without vmemmap): | SECTION | NODE | ZONE | LAST_CPUPID | … | FLAGS |
page->flags: section field (sparsemem wo/ vmemmap)
Sparse Memory: Refer to section_mem_map
Memory Model – Sparse Memory (sparsemem wo/ vmemmap)
(Same diagram and notes as the “Memory Model – Sparse Memory” slide above: each present section’s section_mem_map points to its page-structure array, and PAGES_PER_SECTION = 32768, so every mem_section addresses 128MB. Without vmemmap, page_to_pfn()/pfn_to_page() must go through the section field encoded in page->flags to find the owning mem_section.)
Reference
• https://www.kernel.org/doc/html/v5.17/vm/memory-model.html
Backup
/sys/devices/system/memory/block_size_bytes
[Flowchart]
• System memory < 64GB? → Y: block_size_bytes = 0x800_0000 (MIN_MEMORY_BLOCK_SIZE = 128 MB)
• N → !X86_FEATURE_HYPERVISOR (bare metal)? → Y: block_size_bytes = 0x8000_0000 (MAX_BLOCK_SIZE = 2 GB)
• N (running under a hypervisor, e.g. a QEMU guest OS) → find the largest allowed block size that aligns to the memory end (check ‘max_pfn’), in the range 0x8000_0000 down to 0x800_0000
* Ignore the SGI UV system platform
* Source code: arch/x86/mm/init_64.c: probe_memory_block_size()
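A userspace model of the decision flow above (mirroring arch/x86/mm/init_64.c: probe_memory_block_size(), with the SGI UV special case ignored):

```c
#include <stdio.h>

#define MIN_MEMORY_BLOCK_SIZE    (128ULL << 20)   /* 0x800_0000  */
#define MAX_BLOCK_SIZE           (2ULL << 30)     /* 0x8000_0000 */
#define MEM_SIZE_FOR_LARGE_BLOCK (64ULL << 30)    /* 64 GB       */

static unsigned long long probe_memory_block_size(unsigned long long boot_mem_end,
                                                  int on_hypervisor)
{
    unsigned long long bz;

    if (boot_mem_end < MEM_SIZE_FOR_LARGE_BLOCK)
        return MIN_MEMORY_BLOCK_SIZE;             /* < 64 GB: use 128 MB blocks  */
    if (!on_hypervisor)
        return MAX_BLOCK_SIZE;                    /* bare metal: use 2 GB blocks */

    /* Guest: largest power-of-two block that aligns to the memory end. */
    for (bz = MAX_BLOCK_SIZE; bz > MIN_MEMORY_BLOCK_SIZE; bz >>= 1)
        if (boot_mem_end % bz == 0)
            break;
    return bz;
}

int main(void)
{
    printf("32 GB bare metal   : %#llx\n", probe_memory_block_size(32ULL << 30, 0));
    printf("96 GB bare metal   : %#llx\n", probe_memory_block_size(96ULL << 30, 0));
    printf("96.25 GB QEMU guest: %#llx\n",
           probe_memory_block_size((96ULL << 30) + (256ULL << 20), 1));
    return 0;
}
```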
Ad

More Related Content

What's hot (20)

Reverse Mapping (rmap) in Linux Kernel
Reverse Mapping (rmap) in Linux KernelReverse Mapping (rmap) in Linux Kernel
Reverse Mapping (rmap) in Linux Kernel
Adrian Huang
 
Page cache in Linux kernel
Page cache in Linux kernelPage cache in Linux kernel
Page cache in Linux kernel
Adrian Huang
 
malloc & vmalloc in Linux
malloc & vmalloc in Linuxmalloc & vmalloc in Linux
malloc & vmalloc in Linux
Adrian Huang
 
Memory Management with Page Folios
Memory Management with Page FoliosMemory Management with Page Folios
Memory Management with Page Folios
Adrian Huang
 
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
Adrian Huang
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
Gene Chang
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKB
shimosawa
 
Linux Memory Management
Linux Memory ManagementLinux Memory Management
Linux Memory Management
Ni Zo-Ma
 
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
Adrian Huang
 
Kdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysisKdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysis
Buland Singh
 
Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)
Pankaj Suryawanshi
 
semaphore & mutex.pdf
semaphore & mutex.pdfsemaphore & mutex.pdf
semaphore & mutex.pdf
Adrian Huang
 
Block I/O Layer Tracing: blktrace
Block I/O Layer Tracing: blktraceBlock I/O Layer Tracing: blktrace
Block I/O Layer Tracing: blktrace
Babak Farrokhi
 
Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)
shimosawa
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Anne Nicolas
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File System
Adrian Huang
 
Page Cache in Linux 2.6.pdf
Page Cache in Linux 2.6.pdfPage Cache in Linux 2.6.pdf
Page Cache in Linux 2.6.pdf
ycelgemici1
 
Kernel_Crash_Dump_Analysis
Kernel_Crash_Dump_AnalysisKernel_Crash_Dump_Analysis
Kernel_Crash_Dump_Analysis
Buland Singh
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at Netflix
Brendan Gregg
 
Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)
Adrian Huang
 
Reverse Mapping (rmap) in Linux Kernel
Reverse Mapping (rmap) in Linux KernelReverse Mapping (rmap) in Linux Kernel
Reverse Mapping (rmap) in Linux Kernel
Adrian Huang
 
Page cache in Linux kernel
Page cache in Linux kernelPage cache in Linux kernel
Page cache in Linux kernel
Adrian Huang
 
malloc & vmalloc in Linux
malloc & vmalloc in Linuxmalloc & vmalloc in Linux
malloc & vmalloc in Linux
Adrian Huang
 
Memory Management with Page Folios
Memory Management with Page FoliosMemory Management with Page Folios
Memory Management with Page Folios
Adrian Huang
 
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
Adrian Huang
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
Gene Chang
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKB
shimosawa
 
Linux Memory Management
Linux Memory ManagementLinux Memory Management
Linux Memory Management
Ni Zo-Ma
 
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
Adrian Huang
 
Kdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysisKdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysis
Buland Singh
 
Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)
Pankaj Suryawanshi
 
semaphore & mutex.pdf
semaphore & mutex.pdfsemaphore & mutex.pdf
semaphore & mutex.pdf
Adrian Huang
 
Block I/O Layer Tracing: blktrace
Block I/O Layer Tracing: blktraceBlock I/O Layer Tracing: blktrace
Block I/O Layer Tracing: blktrace
Babak Farrokhi
 
Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)
shimosawa
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Anne Nicolas
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File System
Adrian Huang
 
Page Cache in Linux 2.6.pdf
Page Cache in Linux 2.6.pdfPage Cache in Linux 2.6.pdf
Page Cache in Linux 2.6.pdf
ycelgemici1
 
Kernel_Crash_Dump_Analysis
Kernel_Crash_Dump_AnalysisKernel_Crash_Dump_Analysis
Kernel_Crash_Dump_Analysis
Buland Singh
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at Netflix
Brendan Gregg
 
Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)
Adrian Huang
 

Similar to Physical Memory Models.pdf (20)

memory.ppt
memory.pptmemory.ppt
memory.ppt
KalimuthuVelappan
 
memory_mapping.ppt
memory_mapping.pptmemory_mapping.ppt
memory_mapping.ppt
KalimuthuVelappan
 
Memory
MemoryMemory
Memory
Muhammed Mazhar Khan
 
Linux memory
Linux memoryLinux memory
Linux memory
ericrain911
 
Linux kernel memory allocators
Linux kernel memory allocatorsLinux kernel memory allocators
Linux kernel memory allocators
Hao-Ran Liu
 
Windows memory manager internals
Windows memory manager internalsWindows memory manager internals
Windows memory manager internals
Sisimon Soman
 
Sysprog 15
Sysprog 15Sysprog 15
Sysprog 15
Ahmed Mekkawy
 
ch3-pv1-memory-management
ch3-pv1-memory-managementch3-pv1-memory-management
ch3-pv1-memory-management
yushiang fu
 
Kvm performance optimization for ubuntu
Kvm performance optimization for ubuntuKvm performance optimization for ubuntu
Kvm performance optimization for ubuntu
Sim Janghoon
 
Memory_Unit Cache Main Virtual Associative
Memory_Unit Cache Main Virtual AssociativeMemory_Unit Cache Main Virtual Associative
Memory_Unit Cache Main Virtual Associative
RNShukla7
 
PV-Drivers for SeaBIOS using Upstream Qemu
PV-Drivers for SeaBIOS using Upstream QemuPV-Drivers for SeaBIOS using Upstream Qemu
PV-Drivers for SeaBIOS using Upstream Qemu
The Linux Foundation
 
Live memory forensics
Live memory forensicsLive memory forensics
Live memory forensics
Shekh Md Mehedi Hasan
 
EncExec: Secure In-Cache Execution
EncExec: Secure In-Cache ExecutionEncExec: Secure In-Cache Execution
EncExec: Secure In-Cache Execution
Yue Chen
 
Memory management in Linux
Memory management in LinuxMemory management in Linux
Memory management in Linux
Raghu Udiyar
 
Linux Huge Pages
Linux Huge PagesLinux Huge Pages
Linux Huge Pages
Geraldo Netto
 
Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020
Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020
Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020
Eric Lin
 
Vmfs
VmfsVmfs
Vmfs
Erick Treviño
 
virtual memory - Computer operating system
virtual memory - Computer operating systemvirtual memory - Computer operating system
virtual memory - Computer operating system
Electronics - Embedded System
 
INFLOW-2014-NVM-Compression
INFLOW-2014-NVM-CompressionINFLOW-2014-NVM-Compression
INFLOW-2014-NVM-Compression
Dhananjoy ( Joy ) Das
 
Linux Performance Tunning Memory
Linux Performance Tunning MemoryLinux Performance Tunning Memory
Linux Performance Tunning Memory
Shay Cohen
 
Linux kernel memory allocators
Linux kernel memory allocatorsLinux kernel memory allocators
Linux kernel memory allocators
Hao-Ran Liu
 
Windows memory manager internals
Windows memory manager internalsWindows memory manager internals
Windows memory manager internals
Sisimon Soman
 
ch3-pv1-memory-management
ch3-pv1-memory-managementch3-pv1-memory-management
ch3-pv1-memory-management
yushiang fu
 
Kvm performance optimization for ubuntu
Kvm performance optimization for ubuntuKvm performance optimization for ubuntu
Kvm performance optimization for ubuntu
Sim Janghoon
 
Memory_Unit Cache Main Virtual Associative
Memory_Unit Cache Main Virtual AssociativeMemory_Unit Cache Main Virtual Associative
Memory_Unit Cache Main Virtual Associative
RNShukla7
 
PV-Drivers for SeaBIOS using Upstream Qemu
PV-Drivers for SeaBIOS using Upstream QemuPV-Drivers for SeaBIOS using Upstream Qemu
PV-Drivers for SeaBIOS using Upstream Qemu
The Linux Foundation
 
EncExec: Secure In-Cache Execution
EncExec: Secure In-Cache ExecutionEncExec: Secure In-Cache Execution
EncExec: Secure In-Cache Execution
Yue Chen
 
Memory management in Linux
Memory management in LinuxMemory management in Linux
Memory management in Linux
Raghu Udiyar
 
Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020
Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020
Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020
Eric Lin
 
Linux Performance Tunning Memory
Linux Performance Tunning MemoryLinux Performance Tunning Memory
Linux Performance Tunning Memory
Shay Cohen
 
Ad

Recently uploaded (20)

Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
ssuserb14185
 
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Societal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainabilitySocietal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainability
Jordi Cabot
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
Top 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docxTop 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docx
Portli
 
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRYLEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
NidaFarooq10
 
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
Andre Hora
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
University of Hawai‘i at Mānoa
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
Landscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature ReviewLandscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature Review
Hironori Washizaki
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
ssuserb14185
 
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Societal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainabilitySocietal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainability
Jordi Cabot
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
Top 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docxTop 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docx
Portli
 
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRYLEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
NidaFarooq10
 
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
Andre Hora
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
University of Hawai‘i at Mānoa
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
Landscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature ReviewLandscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature Review
Hironori Washizaki
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
Ad

Physical Memory Models.pdf

  • 1. Physical Memory Models: the ways Linux kernel addresses physical memory (physical page frame) Adrian Huang | June, 2022 * Kernel 5.11 (x86_64)
  • 2. Agenda • Four physical memory models ✓Purpose: page descriptor <-> PFN (Page Frame Number) • Sparse memory model • Sparse Memory Virtual Memmap: subsection • page->flags
  • 3. Four Physical Memory Models • Flat Memory Model (CONFIG_FLATMEM) ✓UMA (Uniform Memory Access) with mostly continuous physical memory • Discontinuous Memory Model (CONFIG_DISCONTIGMEM) ✓NUMA (Non-Uniform Memory Access) with mostly continuous physical memory ✓Removed since v5.14 because sparse memory model can cover this scope • https://lore.kernel.org/linux-mm/[email protected]/ • Sparse Memory (CONFIG_SPARSEMEM) ✓NUMA with discontinuous physical memory • Sparse Memory Virtual Memmap (CONFIG_SPARSEMEM_VMEMMAP) ✓NUMA with discontinuous physical memory: A quick way to get page struct and pfn
  • 4. Memory Model – Flat Memory struct page #n .... struct page #1 struct page #0 Dynamic page structure (Kernel Virtual Address Space) struct page *mem_map page frame #n .... page frame #1 page frame #0 Physical Memory Note Page structure array (Kernel Virtual Address Space) 1. [mem_map] Dynamic page structure: pre-allocate all page structures based on the number of page frames ✓ Allocate/Init page structures based on node’s memory info (struct pglist_data) ▪ Refer from: pglist_data.node_start_pfn & pglist_data.node_spanned_pages 2. Scenario: Continuous page frames (no memory holes) in UMA 3. Drawback ✓ Waste node_mem_map space if memory holes ✓ does not support memory hotplug 4. Check kernel function alloc_node_mem_map() in mm/page_alloc.c
  • 5. Memory Model – Flat Memory
  • 6. Memory Model – Discontinuous Memory struct pglist_data * page frame #000 .... page frame #1000 Physical Memory 1. [node_mem_map] Dynamic page structure: pre-allocate all page structures based on the number of page frames ✓ Allocate/Init page structures based on node’s memory info (struct pglist_data) ▪ Refer from: pglist_data.node_start_pfn & pglist_data.node_spanned_pages 2. Scenario: Each node has continuous page frames (no memory holes) in NUMA 3. Drawback ✓ Waste node_mem_map space if memory holes ✓ does not support memory hotplug NUMA Node Structure (Kernel Virtual Address Space) struct pglist_data * struct pglist_data * … struct pglist_data *node_data[] page frame #999 .... page frame #0 struct page #n .... struct page #0 struct page #n .... struct page #0 node_mem_map node_mem_map Node #1 Node #0 Note
  • 7. Memory Model – Sparse Memory struct mem_section page frame .... page frame Physical Memory **mem_section struct mem_section struct mem_section … struct mem_section * page frame .... page frame .... struct page #0 struct page #n .... struct page #0 Node #1 (hotplug) Node #0 … struct mem_section * 1. [section_mem_map] Dynamic page structure: pre-allocate page structures based on the number of available page frames ✓ Refer from: memblock structure 2. Support physical memory hotplug 3. Minimum unit: PAGES_PER_SECTION = 32768 ✓ Each memory section addresses the memory size: 32768 * 4KB (page size) = 128MB 4. [NUMA] : reduce the memory hole impact due to “struct mem_section” Note struct page #m+n-1
  • 8. struct mem_section page frame .... page frame Physical Memory struct mem_section struct mem_section … struct mem_section * page frame .... page frame struct page #m+n-1 .... struct page #m struct page #n .... struct page #0 Node #1 Node #0 … struct mem_section * Memory Model – Sparse Memory Virtual Memmap vmemmap Memory Section (two-dimension array) Note 1. [section_mem_map] Dynamic page structure: pre-allocate page structures based on the number of available page frames ✓ Refer from: memblock structure 2. Support physical memory hotplug 3. Minimum unit: PAGES_PER_SECTION = 32768 ✓ Each memory section addresses the memory size: 32768 * 4KB (page size) = 128MB 4. [NUMA] : reduce the memory hole impact due to “struct mem_section” 5. Employ virtual memory map (vmemmap/ vmemmap_base) – A quick way to get page struct and pfn 6. Default configuration in Linux kernel
  • 9. Memory Model – Sparse Memory Virtual Memmap: Detail SECTIONS_PER_ROOT PAGES_PER_SECTION 0 14 15 22 NR_SECTION_ROOTS 23 33 PFN 63 struct mem_section Physical Memory struct mem_section … struct mem_section * page frame struct page #32767 .... struct page #0 … struct mem_section * vmemmap **mem_section (two-dimension array) struct mem_section struct mem_section … . . . 0 0 0 255 255 struct page .... struct page struct page .... struct page 2047 + … page frame page frame … page frame page frame … Hot add Hot add Hot remove .... + page frame 128 MB PFN
  • 10. SECTIONS_PER_ROOT PAGES_PER_SECTION 0 14 15 22 NR_SECTION_ROOTS 23 33 PFN 63 struct mem_section Physical Memory struct mem_section … struct mem_section * page frame struct page #32767 .... struct page #0 … struct mem_section * vmemmap **mem_section (two-dimension array) struct mem_section struct mem_section … . . . 0 0 0 255 255 struct page .... struct page struct page .... struct page 2047 + … page frame page frame … page frame page frame … Hot add Hot add Hot remove .... + page frame 128 MB PFN Memory Model – Sparse Memory Virtual Memmap: Detail
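The bit ranges drawn above (PFN bits 0-14, 15-22 and 23-33) are the offset inside a section, the index inside one root, and the root index of the two-dimensional mem_section array. A sketch of the lookup with the x86_64 values spelled out; pfn_to_mem_section() is a simplified stand-in for the kernel's pfn_to_section_nr()/__nr_to_section() helpers:

    #define PAGE_SHIFT          12
    #define SECTION_SIZE_BITS   27                                 /* one section covers 128 MB          */
    #define PFN_SECTION_SHIFT   (SECTION_SIZE_BITS - PAGE_SHIFT)   /* 15 -> PFN bits 0-14: page offset   */
    #define SECTIONS_PER_ROOT   256   /* PAGE_SIZE / sizeof(struct mem_section) -> PFN bits 15-22        */

    extern struct mem_section **mem_section;                       /* the two-dimensional array          */

    static struct mem_section *pfn_to_mem_section(unsigned long pfn)
    {
            unsigned long nr = pfn >> PFN_SECTION_SHIFT;           /* global section number              */

            /* root index = PFN bits 23-33, entry inside the root = PFN bits 15-22 */
            return &mem_section[nr / SECTIONS_PER_ROOT][nr % SECTIONS_PER_ROOT];
    }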
  • 11. Sparse Memory Model 1. How to know available memory pages in a system? 2. Page Table Configuration for Direct Mapping 3. Sparse Memory Model Initialization – Detail
  • 12. How to know the available memory pages in a system? BIOS e820 memblock Zone Page Frame Allocator e820__memblock_setup() __free_pages_core() [Call Path] memblock releases the available memory space to the zone page frame allocator. Zone page allocator details will be discussed in another session: physical memory management
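e820__memblock_setup() is essentially a loop that copies the firmware-provided E820 RAM ranges into memblock's 'memory' type. Roughly, condensed from arch/x86/kernel/e820.c (v5.11); overflow checks and E820_TYPE_SOFT_RESERVED handling are omitted here:

    void __init e820__memblock_setup(void)
    {
            int i;

            for (i = 0; i < e820_table->nr_entries; i++) {
                    struct e820_entry *entry = &e820_table->entries[i];

                    if (entry->type != E820_TYPE_RAM &&
                        entry->type != E820_TYPE_RESERVED_KERN)
                            continue;                       /* only usable RAM becomes 'memory' */

                    memblock_add(entry->addr, entry->size); /* feed the memblock allocator      */
            }

            memblock_trim_memory(PAGE_SIZE);                /* throw away partial pages         */
    }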
  • 13. setup_arch() -- Focus on memory portion setup_arch Reserve memblock for kernel code + data/bss sections, page #0 and init ramdisk e820__memory_setup Setup init_mm struct for members ‘start_code’, ‘end_code’, ‘end_data’ and ‘brk’ memblock_x86_reserve_range_setup_data e820__reserve_setup_data e820__finish_early_params efi_init dmi_setup e820_add_kernel_range trim_bios_range max_pfn = e820__end_of_ram_pfn() kernel_randomize_memory e820__memblock_setup init_mem_mapping x86_init.paging.pagetable_init early_alloc_pgt_buf reserve_brk init_memory_mapping() • Create 4-level page table (direct mapping) based on ‘memory’ type of memblock configuration. x86_init.paging.pagetable_init() • Init sparse • Init zone structure
  • 14. x86 - setup_arch() -- init_mem_mapping() – Page Table Configuration for Direct Mapping init_mem_mapping probe_page_size_mask setup_pcid memory_map_top_down(ISA_END_ADDRESS, end) init_memory_mapping(0, ISA_END_ADDRESS, PAGE_KERNEL) init_range_memory_mapping(start, last_start) split_mem_range kernel_physical_mapping_init add_pfn_range_mapped early_ioremap_page_table_range_init [x86 only] load_cr3(swapper_pg_dir) __flush_tlb_all init_memory_mapping() -> kernel_physical_mapping_init() • Create the 4-level page table (direct mapping) based on the ‘memory’ type of the memblock configuration. split_mem_range() • Split the input memory range (start and end address) into groups of different page sizes ✓ Try larger page sizes first ▪ 1G huge page -> 2M huge page -> 4K page while (last_start > map_start) init_memory_mapping(start, end, PAGE_KERNEL) for_each_mem_pfn_range() → memblock stuff
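A sketch of the "largest page size first" idea behind split_mem_range(). This is an illustration only, not the kernel's exact code; map_4k()/map_2m()/map_1g() are hypothetical stand-ins for kernel_physical_mapping_init() at the given page size:

    #define SZ_2M           (2UL << 20)
    #define SZ_1G           (1UL << 30)
    #define ALIGN_UP(x, a)  (((x) + (a) - 1) & ~((a) - 1))
    #define ALIGN_DN(x, a)  ((x) & ~((a) - 1))
    #define MIN(a, b)       ((a) < (b) ? (a) : (b))

    static void split_range(unsigned long start, unsigned long end)
    {
            unsigned long s = start, e;

            e = MIN(ALIGN_UP(s, SZ_2M), end);                    /* 4K head, up to the first 2M boundary */
            if (s < e) { map_4k(s, e); s = e; }

            e = MIN(ALIGN_UP(s, SZ_1G), ALIGN_DN(end, SZ_2M));   /* 2M pieces up to the first 1G boundary */
            if (s < e) { map_2m(s, e); s = e; }

            e = ALIGN_DN(end, SZ_1G);                            /* 1G pages for the aligned middle       */
            if (s < e) { map_1g(s, e); s = e; }

            e = ALIGN_DN(end, SZ_2M);                            /* 2M tail                               */
            if (s < e) { map_2m(s, e); s = e; }

            if (s < end) map_4k(s, end);                         /* 4K tail                               */
    }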
  • 15. Page Table Configuration for Direct Mapping Kernel Space 0x0000_7FFF_FFFF_FFFF 0xFFFF_8000_0000_0000 128TB Page frame direct mapping (64TB) ZONE_DMA ZONE_DMA32 ZONE_NORMAL page_offset_base 0 16MB 64-bit Virtual Address Kernel Virtual Address Physical Memory 0 0xFFFF_FFFF_FFFF_FFFF Guard hole (8TB) LDT remap for PTI (0.5TB) Unused hole (0.5TB) vmalloc/ioremap (32TB) vmalloc_base Unused hole (1TB) Virtual memory map – 1TB (stores page frame descriptors) … vmemmap_base 64TB *page … *page … *page … Page Frame Descriptor vmemmap_base page_offset_base = 0xFFFF_8880_0000_0000 vmalloc_base = 0xFFFF_C900_0000_0000 vmemmap_base = 0xFFFF_EA00_0000_0000 * Can be dynamically configured by KASLR (Kernel Address Space Layout Randomization - "arch/x86/mm/kaslr.c") Default Configuration Kernel text mapping from physical address 0 Kernel code [.text, .data…] Modules __START_KERNEL_map = 0xFFFF_FFFF_8000_0000 __START_KERNEL = 0xFFFF_FFFF_8100_0000 MODULES_VADDR 0xFFFF_8000_0000_0000 Empty Space User Space 128TB 1GB or 512MB 1GB or 1.5GB Fix-mapped address space (Expanded to 4MB: 05ab1d8a4b36) FIXADDR_START Unused hole (2MB) 0xFFFF_FFFF_FFE0_0000 0xFFFF_FFFF_FFFF_FFFF FIXADDR_TOP = 0xFFFF_FFFF_FF7F_F000 Reference: Documentation/x86/x86_64/mm.rst Note: Refer to page #5 in the slide deck Decompressed vmlinux: linux kernel initialization from page table configuration perspective
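With the 64 TB direct mapping shown above, converting between a physical address and its kernel virtual address is just an offset by page_offset_base. A simplified sketch (the real x86_64 __pa() additionally handles addresses inside the kernel text mapping at __START_KERNEL_map):

    /* page_offset_base = 0xFFFF_8880_0000_0000 unless KASLR moves it */
    #define PAGE_OFFSET  page_offset_base

    #define __va(paddr)  ((void *)((unsigned long)(paddr) + PAGE_OFFSET))  /* phys -> direct-map virt */
    #define __pa(vaddr)  ((unsigned long)(vaddr) - PAGE_OFFSET)            /* direct-map virt -> phys */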
  • 16. init_mem_mapping() – Page Table Configuration for Direct Mapping Note • 2-socket server with 32GB memory
  • 18. setup_arch() -- init_mem_mapping() – Page Table Configuration for Direct Mapping init_memory_mapping() -> kernel_physical_mapping_init() • Create 4-level page table (direct mapping) based on ‘memory’ type of the memblock configuration.
  • 19. x86 - setup_arch() -- x86_init.paging.pagetable_init() x86_init.paging.pagetable_init native_pagetable_init Remove mappings at the end of physical memory from the boot-time page table paging_init pagetable_init __flush_tlb_all sparse_init zone_sizes_init permanent_kmaps_init x86_init.paging.pagetable_init native_pagetable_init paging_init sparse_init zone_sizes_init x86 x86_64 configure the number of PFNs for each zone free_area_init
  • 20. Sparse Memory Model Initialization: sparse_init() sparse_init memblocks_present pnum_begin = first_present_section_nr(); nid_begin = sparse_early_nid(__nr_to_section(pnum_begin)); for_each_mem_pfn_range(..) memory_present(nid, start, end) 1. for_each_mem_pfn_range(): Walk through available memory range from memblock subsystem Allocate pointer array of section root if necessary for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) sparse_index_init set_section_nid section_mark_present cfg ‘ms->section_mem_map’ via sparse_encode_early_nid() for_each_present_section_nr(pnum_begin + 1, pnum_end) sparse_init_nid sparse_init_nid [Cover last cpu node] Mark the present bit for each allocated mem_section cfg ms->section_mem_map flag bits 1. Allocate a mem_section_usage struct 2. cfg ms->section_mem_map with the valid page descriptor [During boot] Temporary: Store nid in ms->section_mem_map [During boot] Temporary: get nid in ms->section_mem_map
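The memory_present() step referenced above walks each memblock range one 128 MB section at a time, allocates the section root on demand, and temporarily parks the node id plus the present/online bits in section_mem_map. Slightly simplified from mm/sparse.c (v5.11):

    static void __init memory_present(int nid, unsigned long start, unsigned long end)
    {
            unsigned long pfn;

            start &= PAGE_SECTION_MASK;
            for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
                    unsigned long section = pfn_to_section_nr(pfn);
                    struct mem_section *ms;

                    sparse_index_init(section, nid);   /* allocate the root (256 sections) if needed */
                    set_section_nid(section, nid);

                    ms = __nr_to_section(section);
                    if (!ms->section_mem_map) {
                            /* [During boot] temporary: keep nid here until sparse_init_nid() runs */
                            ms->section_mem_map = sparse_encode_early_nid(nid) |
                                                  SECTION_IS_ONLINE;       /* O=1 */
                            section_mark_present(ms);                      /* P=1 */
                    }
            }
    }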
  • 21. memblock bottom_up current_limit memory reserved memblock_type cnt max total_size *regions name memblock_region #0 base = 0x1000 size = 0x9f000 flags nid = 0 memory_present() memblock_type cnt max total_size *regions name memblock_region #1 base = 0x100000 size = 2ff00000 flags nid = 0 memblock_region #2 base = 0x3004_2000 size = 0x1d6_e000 flags nid = 0 memblock_region #7 base = 0x4_5000_0000 size = 0x3_ffc0_0000 flags nid = 1 struct mem_section * #2047 … struct mem_section * #0 struct mem_section #0 struct mem_section #255 … **mem_section Initialized object Initialized object Uninitialized object
  • 22. memblock bottom_up current_limit memory reserved memblock_type cnt max total_size *regions name memblock_region #0 base = 0x1000 size = 0x9f000 flags nid = 0 memory_present() memblock_type cnt max total_size *regions name memblock_region #1 base = 0x100000 size = 2ff00000 flags nid = 0 memblock_region #2 base = 0x3004_2000 size = 0x1d6_e000 flags nid = 0 struct mem_section #255 … **mem_section struct mem_section section_mem_map=0 struct mem_section_usage *usage O=1 E=0 P=1 M=0 struct mem_section #0 struct mem_section * #2047 … struct mem_section * #0 memblock_region #7 base = 0x4_5000_0000 size = 0x3_ffc0_0000 flags nid = 1 . . . Initialized object Initialized object Uninitialized object P: Present, M: Memory map, O: Online, E: Early
  • 23. memblock bottom_up current_limit memory reserved memblock_type cnt max total_size *regions name memblock_region #0 base = 0x1000 size = 0x9f000 flags nid = 0 memory_present() memblock_type cnt max total_size *regions name memblock_region #1 base = 0x100000 size = 2ff00000 flags nid = 0 memblock_region #2 base = 0x3004_2000 size = 0x1d6_e000 flags nid = 0 … struct mem_section #255 … **mem_section 0 struct mem_section section_mem_map=0 struct mem_section_usage *usage O=1 E=0 P=1 M=0 struct mem_section section_mem_map=0 struct mem_section_usage *usage O=1 E=0 P=1 M=0 struct mem_section #5 struct mem_section #0 . . . struct mem_section * #2047 … struct mem_section * #0 memblock_region #7 base = 0x4_5000_0000 size = 0x3_ffc0_0000 flags nid = 1 . . . Initialized object Initialized object Uninitialized object
  • 24. memblock bottom_up current_limit memory reserved memblock_type cnt max total_size *regions name memblock_region #0 base = 0x1000 size = 0x9f000 flags nid = 0 memory_present() memblock_type cnt max total_size *regions name memblock_region #1 base = 0x100000 size = 2ff00000 flags nid = 0 memblock_region #2 base = 0x3004_2000 size = 0x1d6_e000 flags nid = 0 … struct mem_section #255 … **mem_section 0 struct mem_section section_mem_map=0 struct mem_section_usage *usage O=1 E=0 P=1 M=0 struct mem_section section_mem_map=0 struct mem_section_usage *usage O=1 E=0 P=1 M=0 struct mem_section #5 struct mem_section #0 . . . struct mem_section #6 struct mem_section section_mem_map=0 struct mem_section_usage *usage O=1 E=0 P=1 M=0 struct mem_section * #2047 … struct mem_section * #0 memblock_region #7 base = 0x4_5000_0000 size = 0x3_ffc0_0000 flags nid = 1 . . . Initialized object Initialized object Uninitialized object
  • 25. memblock bottom_up current_limit memory reserved memblock_type cnt max total_size *regions name memblock_region #0 base = 0x1000 size = 0x9f000 flags nid = 0 memory_present() memblock_type cnt max total_size *regions name memblock_region #1 base = 0x100000 size = 2ff00000 flags nid = 0 memblock_region #2 base = 0x3004_2000 size = 0x1d6_e000 flags nid = 0 memblock_region #7 base = 0x4_5000_0000 size = 0x3_ffc0_0000 flags nid = 1 struct mem_section * #2047 … struct mem_section * #0 … struct mem_section #255 … **mem_section struct mem_section section_mem_map=0 struct mem_section_usage *usage O=1 E=0 P=1 M=0 struct mem_section #5 struct mem_section #0 . . . struct mem_section #6 struct mem_section section_mem_map=0 struct mem_section_usage *usage O=1 E=0 P=1 M=0 … struct mem_section #138 struct mem_section * #1 … struct mem_section #9 struct mem_section #0 … struct mem_section #255 struct mem_section section_mem_map=0 struct mem_section_usage *usage O=1 E=0 P=1 M=0 . . . struct mem_section section_mem_map=0 struct mem_section_usage *usage O=1 E=0 P=1 M=0 . . . Initialized object Initialized object Uninitialized object
  • 26. memblock_region #0 base = 0x1000 size = 0x9f000 flags nid = 0 sparse_init_nid(): cfg mem_section_map memblock_region #1 base = 0x100000 size = 2ff00000 flags nid = 0 memblock_region #2 base = 0x3004_2000 size = 0x1d6_e000 flags nid = 0 memblock_region #7 base = 0x4_5000_0000 size = 0x3_ffc0_0000 flags nid = 1 struct mem_section * #2047 … struct mem_section * #0 … struct mem_section #255 … **mem_section struct mem_section #5 struct mem_section #0 struct mem_section #6 … struct mem_section #138 struct mem_section * #1 … struct mem_section #9 struct mem_section #0 … struct mem_section #255 . . . struct page #65535 struct page #32767 struct page #0 struct page #32768 … .... ... vmemmap = VMEMMAP_START = vmemmap_base section #0 section_roots #0 section #1 section_roots #0 struct mem_section_usage #n … struct mem_section_usage #0 Per-node basis Number of available ‘struct mem_section (map_count)’. Initialized object Uninitialized object Allocate page structs for each mem_section and map them to the page table (Virtual Memory Map) Note struct mem_section section_mem_map struct mem_section_usage *usage O=1 E=1 P=1 M=1 . . . struct mem_section section_mem_map struct mem_section_usage *usage O=1 E=1 P=1 M=1 . . . struct mem_section section_mem_map struct mem_section_usage *usage O=1 E=1 P=1 M=1 struct mem_section section_mem_map struct mem_section_usage *usage O=1 E=1 P=0 M=1
  • 27. memblock_region #0 base = 0x1000 size = 0x9f000 flags nid = 0 memblock_region #1 base = 0x100000 size = 2ff00000 flags nid = 0 memblock_region #2 base = 0x3004_2000 size = 0x1d6_e000 flags nid = 0 memblock_region #7 base = 0x4_5000_0000 size = 0x3_ffc0_0000 flags nid = 1 struct mem_section * #2047 … struct mem_section * #0 … struct mem_section #255 … **mem_section struct mem_section section_mem_map struct mem_section_usage *usage O=1 E=1 P=1 M=1 struct mem_section #5 struct mem_section #0 . . . struct mem_section #6 … struct mem_section #138 struct mem_section * #1 … struct mem_section #9 struct mem_section #0 … struct mem_section #255 struct mem_section section_mem_map struct mem_section_usage *usage O=1 E=1 P=1 M=1 . . . struct mem_section section_mem_map struct mem_section_usage *usage O=1 E=1 P=1 M=1 . . . struct page #0 vmemmap = VMEMMAP_START = vmemmap_base section #0, section_roots #0 section #1, section_roots #0 struct mem_section_usage #n … struct mem_section_usage #0 Per-node basis Number of available ‘struct mem_section (map_count)’. … struct page #32767 struct page #32768 … struct page #65535 … struct page #229375 … … struct page #4521984 … struct page #8388607 struct page #8388608 … struct page #8683520 section #2-6, section_roots #0 section #138-255, section_roots #0 … section #0-9, section_roots #1 Initialized object Allocated & Uninitialized object Unallocated object sparse_init_nid(): cfg mem_section_map Allocate page structs for each mem_section and map them to the page table (Virtual Memory Map) Note struct mem_section section_mem_map struct mem_section_usage *usage O=1 E=1 P=0 M=1
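What sparse_init_nid() finally stores in every present section boils down to the following sketch (simplified from sparse_init_one_section()/sparse_encode_mem_map() in mm/sparse.c; init_one_section() is a local stand-in): the memmap pointer is biased by the section's first PFN, so decoding later is just a flag mask plus "coded pointer + PFN".

    static void init_one_section(struct mem_section *ms, unsigned long section_nr,
                                 struct page *memmap, struct mem_section_usage *usage)
    {
            unsigned long first_pfn = section_nr << PFN_SECTION_SHIFT;

            ms->usage = usage;
            /* bias by first_pfn: (decoded pointer) + pfn == &memmap[pfn - first_pfn] */
            ms->section_mem_map = (unsigned long)(memmap - first_pfn) |
                                  SECTION_MARKED_PRESENT |   /* P=1 */
                                  SECTION_HAS_MEM_MAP |      /* M=1 */
                                  SECTION_IS_ONLINE |        /* O=1 */
                                  SECTION_IS_EARLY;          /* E=1, matches the flag bits in the figure */
    }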
  • 29. struct mem_section section_mem_map struct mem_section_usage *usage O=1 E=1 P=1 M=1 . . . struct page #0 vmemmap = VMEMMAP_START = vmemmap_base section #0, section_roots #0 section #1, section_roots #0 … struct page #32767 struct page #32768 … struct page #65535 … struct page #229375 … … struct page #4521984 … struct page #8388607 struct page #8388608 … struct page #8683520 section #2-6, section_roots #0 section #138-255, section_roots #0 … section #0-9, section_roots #1 Re-visit sparse memory Sparse Memory: Refer to section_mem_map Sparse Memory with vmemmap: Refer to vmemmap struct mem_section section_mem_map struct mem_section_usage *usage O=1 E=1 P=0 M=1
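The contrast in code terms: without vmemmap, __pfn_to_page() must first find the mem_section and decode its section_mem_map; with vmemmap it is the single addition shown earlier. Sketch of the plain CONFIG_SPARSEMEM case (modeled on include/asm-generic/memory_model.h):

    /* CONFIG_SPARSEMEM (no vmemmap): decode the per-section coded pointer.
     * The coded pointer is biased by the section's first PFN, so adding the
     * absolute PFN lands on the right struct page. */
    #define __pfn_to_page(pfn)                                                      \
    ({      unsigned long __pfn = (pfn);                                            \
            struct mem_section *__sec = __pfn_to_section(__pfn);                    \
            (struct page *)(__sec->section_mem_map & SECTION_MAP_MASK) + __pfn;     \
    })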
  • 30. Sparse Memory Virtual Memmap: subsection 1. Introduction 2. Subsection users? 3. pageblock_flags: pageblock migration type
  • 31. Sparse Memory Virtual Memmap: subsection (1/4) SECTIONS_PER_ROOT PAGES_PER_SECTION 0 14 15 22 NR_SECTION_ROOTS 23 33 PFN 63 SECTION_SIZE_BITS = 27 PAGES_PER_SUBSECTION SUBSECTIONS_PER _SECTION 14 9 0 8 struct mem_section section_mem_map struct mem_section_usage *usage O=1 E=1 P=1 M=1 subsection #63 … subsection #0 struct mem_section_usage subsection_map[1] (bitmap) pageblock_flags[0] struct page #0 … struct page #511 … … struct page #32767 struct page #32256 subsection subsection section … … • subsection_map: bitmap to indicate if the corresponding subsection is valid • pageblock_flags: pages of a subsection have the same flag (migration type) sparsemem vmemmap *only*
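The subsection constants and the bitmap index follow directly from the bit layout above; a sketch with the x86_64 numbers (mirroring subsection_map_index() in include/linux/mmzone.h):

    #define SUBSECTION_SHIFT        21                                      /* 2 MB, one x86_64 huge page  */
    #define PFN_SUBSECTION_SHIFT    (SUBSECTION_SHIFT - PAGE_SHIFT)         /* 9                           */
    #define PAGES_PER_SUBSECTION    (1UL << PFN_SUBSECTION_SHIFT)           /* 512 pages                   */
    #define SUBSECTIONS_PER_SECTION (1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT))  /* 64 -> 64-bit bitmap */

    static int subsection_map_index(unsigned long pfn)
    {
            /* offset of the pfn inside its section, divided into 512-page chunks */
            return (pfn & (PAGES_PER_SECTION - 1)) / PAGES_PER_SUBSECTION;
    }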
  • 32. Sparse Memory Virtual Memmap: subsection (2/4) Some macros are expanded manually Note
  • 33. Sparse Memory Virtual Memmap: subsection (3/4) SECTIONS_PER_ROOT PAGES_PER_SECTION 0 14 15 22 NR_SECTION_ROOTS 23 33 PFN 63 SECTION_SIZE_BITS = 27 PAGES_PER_SUBSECTION SUBSECTIONS_PER _SECTION 14 9 0 8 struct mem_section section_mem_map struct mem_section_usage *usage O=1 E=1 P=1 M=1 subsection #63 … subsection #0 struct mem_section_usage subsection_map[1] (bitmap) pageblock_flags[0] struct page #0 … struct page #511 … … struct page #32767 struct page #32256 subsection subsection section … … • PAGES_PER_SUBSECTION = 512 pages ✓ 512 pages * 4KB = 2MB → 2MB huge page in x86_64
  • 34. Sparse Memory Virtual Memmap: subsection (4/4) • SUBSECTION_SIZE ✓ (1UL << 21) = 2MB → 2MB huge page in x86_64. SECTIONS_PER_ROOT PAGES_PER_SECTION 0 14 15 22 NR_SECTION_ROOTS 23 33 PFN 63 SECTION_SIZE_BITS = 27 PAGES_PER_SUBSECTION SUBSECTIONS_PER _SECTION 14 9 0 8 Some macros are expanded manually Note
  • 35. subsection: subsection_map users? struct mem_section section_mem_map struct mem_section_usage *usage O=1 E=1 P=1 M=1 subsection #63 … subsection #0 struct mem_section_usage subsection_map[1] (bitmap) pageblock_flags[0] struct page #0 … struct page #511 … … struct page #32767 struct page #32256 subsection subsection section … … • init stage ✓ paging_init -> zone_sizes_init -> free_area_init -> subsection_map_init -> subsection_mask_set ➢ Set the corresponding bit map for the specific subsection • Reference stage ✓ pfn_section_valid(struct mem_section *ms, unsigned long pfn) ➢ Users ▪ [mm/page_alloc.c: 5089] free_pages -> virt_addr_valid -> __virt_addr_valid -> pfn_valid -> pfn_section_valid ▪ [drivers/char/mem.c: 416] mmap_kmem -> pfn_valid -> pfn_section_valid ➔ /dev/mem (`man mem`) ▪ … subsection_map users
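Those pfn_valid() users end up in roughly the following check: the section must exist and be present, and for hot-added (non-early) sections the specific subsection bit must also be set. Simplified from include/linux/mmzone.h (v5.11):

    static int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
    {
            return test_bit(subsection_map_index(pfn), ms->usage->subsection_map);
    }

    static int pfn_valid(unsigned long pfn)
    {
            struct mem_section *ms;

            if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
                    return 0;

            ms = __nr_to_section(pfn_to_section_nr(pfn));
            if (!valid_section(ms))                         /* present bit (P) not set */
                    return 0;

            /* boot-time (early) sections are valid for the whole 128 MB span;
             * hot-added sections are checked at 2 MB subsection granularity   */
            return early_section(ms) || pfn_section_valid(ms, pfn);
    }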
  • 36. struct mem_section section_mem_map struct mem_section_usage *usage O=1 E=1 P=1 M=1 subsection #63 … subsection #0 struct mem_section_usage subsection_map[1] (bitmap) pageblock_flags[0] struct page #0 … struct page #511 … … struct page #32767 struct page #32256 subsection subsection section … … • Hotplug stage ✓ Add ➢ #A1 [drivers/acpi/acpi_memhotplug.c: 311] acpi_memory_device_add -> acpi_memory_enable_device -> __add_memory -> add_memory_resource -> arch_add_memory -> add_pages -> __add_pages -> sparse_add_section -> section_activate -> fill_subsection_map -> subsection_mask_set ➢ #A2 [drivers/dax/kmem.c: 43] dev_dax_kmem_probe -> add_memory_driver_managed -> add_memory_resource -> same with #A1 ✓ Remove ➢ #R1 [drivers/acpi/acpi_memhotplug.c: 311] acpi_memory_device_remove -> __remove_memory -> try_remove_memory -> arch_remove_memory -> __remove_pages -> __remove_section -> sparse_remove_section -> section_deactivate -> clear_subsection_map ➢ #R2 [drivers/dax/kmem.c: 139] dev_dax_kmem_remove -> remove_memory -> try_remove_memory -> same with #R1 subsection_map users subsection: subsection_map users?
  • 37. pageblock_flags: pageblock migration type struct mem_section section_mem_map struct mem_section_usage *usage O=1 E=1 P=1 M=1 subsection #63 … subsection #0 struct mem_section_usage subsection_map[1] (bitmap) pageblock_flags[0] struct page #0 … struct page #511 … … struct page #32767 struct page #32256 subsection subsection section … … unsigned long pageblock_flags[4] 4-bit MT . . . 4-bit MT 4-bit MT 4-bit MT . . . 4-bit MT 4-bit MT 4-bit MT . . . 4-bit MT 4-bit MT 4-bit MT . . . 4-bit MT 4-bit MT [0] Dynamically allocated [1] [2] [3] subsection #0: Migration Type subsection #16: Migration Type subsection #32: Migration Type subsection #48: Migration Type Migration type is configured in setup_arch -> … -> memmap_init_zone
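Reading a pageblock's 4-bit migration type out of pageblock_flags[] is plain bit arithmetic. A sketch condensed from get_pfnblock_flags_mask()/pfn_to_bitidx() in mm/page_alloc.c, with CONFIG_HUGETLB_PAGE assumed, so pageblock_order = 9 and one section holds 64 pageblocks = 4 unsigned longs of flags; pageblock_migratetype() is a local stand-in:

    #define NR_PAGEBLOCK_BITS  4
    #define pageblock_order    9        /* 512-page pageblocks (huge page size) */

    static int pageblock_migratetype(struct mem_section_usage *usage, unsigned long pfn)
    {
            unsigned long block  = (pfn & (PAGES_PER_SECTION - 1)) >> pageblock_order;
            unsigned long bitidx = block * NR_PAGEBLOCK_BITS;       /* bit offset into the bitmap */
            unsigned long word   = usage->pageblock_flags[bitidx / BITS_PER_LONG];

            return (word >> (bitidx % BITS_PER_LONG)) & ((1UL << NR_PAGEBLOCK_BITS) - 1);
    }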
  • 39. pageblock: set migration type free_area_init print zone ranges and early memory node ranges for_each_mem_pfn_range(..) print memory range for each memblock subsection_map_init mminit_verify_pageflags_layout setup_nr_node_ids init_unavailable_mem for_each_online_node(nid) free_area_init_node node_set_state check_for_memory get_pfn_range_for_nid calculate_node_totalpages pgdat_set_deferred_range free_area_init_core free_area_init_core memmap_init for (j = 0; j < MAX_NR_ZONES; j++) memmap_init_zone subsection_map_init subsection_mask_set for (nr = start_sec; nr <= end_sec; nr++) bitmap_set calculate arch_zone_{lowest, highest}_possible_pfn[] for (pfn = start_pfn; pfn < end_pfn;) set_pageblock_migratetype __init_single_page set_pageblock_migratetype • [System init stage] each pageblock is initialized to MIGRATE_MOVABLE
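The initialization itself happens in memmap_init_zone(): every struct page is initialized, and whenever the loop crosses a pageblock boundary the block is stamped MIGRATE_MOVABLE. Roughly (loop fragment condensed from v5.11):

    for (pfn = start_pfn; pfn < end_pfn; pfn++) {
            struct page *page = pfn_to_page(pfn);

            __init_single_page(page, pfn, zone, nid);

            if (IS_ALIGNED(pfn, pageblock_nr_pages))          /* first page of a pageblock */
                    set_pageblock_migratetype(page, MIGRATE_MOVABLE);
    }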
  • 40. A zone (present_pages = 1311744 in this example) is divided into pageblock #0 .. pageblock #N. Pageblock size: CONFIG_HUGETLB_PAGE=y -> 512 pages (= huge page size); CONFIG_HUGETLB_PAGE=n -> 1024 pages (order MAX_ORDER - 1). N = round_up(present_pages / pageblock_size) - 1. Example: pageblocks = round_up(1311744 / 512) = 2562 (16 + 2544 + 2 = 2562)
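For the example above, the pageblock count is a straightforward round-up division; a hypothetical check of the arithmetic:

    unsigned long present_pages  = 1311744;                       /* zone->present_pages     */
    unsigned long pageblock_size = 512;                           /* CONFIG_HUGETLB_PAGE=y   */
    unsigned long pageblocks     = (present_pages + pageblock_size - 1) / pageblock_size;   /* 2562 */
    /* last valid pageblock index N = pageblocks - 1 = 2561 */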
  • 41. pageblock_flags: pageblock migration type struct mem_section section_mem_map struct mem_section_usage *usage O=1 E=1 P=1 M=1 subsection #63 … subsection #0 struct mem_section_usage subsection_map[1] (bitmap) pageblock_flags[0] struct page #0 … struct page #511 … … struct page #32767 struct page #32256 subsection subsection section … … unsigned long pageblock_flags[4] 4-bit MT . . . 4-bit MT 4-bit MT 4-bit MT . . . 4-bit MT 4-bit MT 4-bit MT . . . 4-bit MT 4-bit MT 4-bit MT . . . 4-bit MT 4-bit MT [0] Dynamically allocated [1] [2] [3] subsection #0: Migration Type subsection #16: Migration Type subsection #32: Migration Type subsection #48: Migration Type [CONFIG_HUGETLB_PAGE=y] pages of subsection = pages of pageblock = 512 pages (order = 9)
  • 43. Node Zone … flags Node Zone … flags LAST_CPUPID Node Zone … flags Section Node Zone flags Section Zone … flags Section … LAST_CPUPID No sparsemem or sparsemem vmemmap No sparsemem or sparsemem vmemmap + last_cpupid sparsemem sparsemem + last_cpupid sparsemem wo/ node 1. last_cpupid: Support for NUMA balancing (NUMA-optimizing scheduler) 2. sparsemem: Enabled by CONFIG_SPARSEMEM Note … page->flags layout 0 63
  • 44. page->flags layout: sparsemem vmemmap + last_cpupid Kernel Configuration: qemu – v5.11 kernel ... CONFIG_NUMA_BALANCING=y CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y … CONFIG_NR_CPUS=64 … CONFIG_NODES_SHIFT=10 … CONFIG_SPARSEMEM_MANUAL=y CONFIG_SPARSEMEM=y CONFIG_NEED_MULTIPLE_NODES=y CONFIG_SPARSEMEM_EXTREME=y CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y CONFIG_SPARSEMEM_VMEMMAP=y … # CONFIG_KASAN is not set Node Zone … flags (enum pageflags) LAST_CPUPID 0 22 38 52 54 63 23-bit pageflags 2-bit zone
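With this layout, the node and zone of a page are recovered from page->flags by shifting and masking. A sketch using the bit positions shown above (my_page_to_nid()/my_page_zonenum() are local stand-ins for the kernel's page_to_nid()/page_zonenum(); the widths follow from CONFIG_NODES_SHIFT=10 and the 2-bit zone field of this configuration):

    #define NODES_SHIFT        10
    #define ZONES_SHIFT        2
    #define NODES_PGOFF        (64 - NODES_SHIFT)            /* bits 54..63 */
    #define ZONES_PGOFF        (NODES_PGOFF - ZONES_SHIFT)   /* bits 52..53 */

    static inline int my_page_to_nid(const struct page *page)
    {
            return (page->flags >> NODES_PGOFF) & ((1UL << NODES_SHIFT) - 1);
    }

    static inline int my_page_zonenum(const struct page *page)
    {
            return (page->flags >> ZONES_PGOFF) & ((1UL << ZONES_SHIFT) - 1);
    }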
  • 45. page->flags layout - sparsemem vmemmap + last_cpupid Kernel Configuration: qemu – v5.11 kernel ... CONFIG_NUMA_BALANCING=y CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y … CONFIG_NR_CPUS=64 … CONFIG_NODES_SHIFT=10 … CONFIG_SPARSEMEM_MANUAL=y CONFIG_SPARSEMEM=y CONFIG_NEED_MULTIPLE_NODES=y CONFIG_SPARSEMEM_EXTREME=y CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y CONFIG_SPARSEMEM_VMEMMAP=y … # CONFIG_KASAN is not set Node Zone … flags (enum pageflags) LAST_CPUPID 0 22 38 52 54 63
  • 46. Node Zone flags Section … LAST_CPUPID sparsemem + last_cpupid page->flags: section field (sparsemem wo/ vmemmap) Sparse Memory: Refer to section_mem_map
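Because a plain sparsemem kernel has no global vmemmap array to subtract from, page_to_pfn() has to know which section a page belongs to, and that is exactly what the section field in page->flags provides. A sketch of the reverse lookup (simplified from __page_to_pfn() for CONFIG_SPARSEMEM without vmemmap; my_page_to_pfn() is a local stand-in):

    static unsigned long my_page_to_pfn(const struct page *page)
    {
            unsigned long nr = (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK;
            struct mem_section *ms = __nr_to_section(nr);

            /* section_mem_map is biased by the section's first PFN, so the
             * subtraction directly yields the absolute PFN */
            return page - (struct page *)(ms->section_mem_map & SECTION_MAP_MASK);
    }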
  • 47. Memory Model – Sparse Memory (sparsemem wo/ vmemmap) struct mem_section page frame .... page frame Physical Memory **mem_section struct mem_section struct mem_section … struct mem_section * page frame .... page frame .... struct page #0 struct page #n .... struct page #0 Node #1 (hotplug) Node #0 … struct mem_section * 1. [section_mem_map] Dynamic page structure: pre-allocate page structures based on the number of available page frames ✓ Refer from: memblock structure 2. Support physical memory hotplug 3. Minimum unit: mem_section - PAGES_PER_SECTION = 32768 ✓ Each memory section addresses the memory size: 32768 * 4KB (page size) = 128MB 4. [NUMA] : reduce the memory hole impact due to “struct mem_section” Note struct page #m+n-1
  • 51. /sys/devices/system/memory/block_size_bytes System memory < 64GB? block_size_bytes = 0x800_0000 (MIN_MEMORY_BLOCK_SIZE = 128 MB) block_size_bytes = 0x8000_0000 (MAX_BLOCK_SIZE = 2 GB) * Ignore SGI UV system platform !X86_FEATURE_HYPERVISOR? Find the largest allowed block size that aligns to memory end (check ‘max_pfn’) Range: 0x8000_0000 - 0x800_0000 Y N Y N * Source code: arch/x86/mm/init_64.c: probe_memory_block_size()
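The decision flow above corresponds to roughly the following (condensed from arch/x86/mm/init_64.c: probe_memory_block_size(), v5.11; the SGI UV special case is skipped here as well):

    static unsigned long memory_block_size(void)
    {
            unsigned long boot_mem_end = max_pfn << PAGE_SHIFT;
            unsigned long bz;

            if (boot_mem_end < (64UL << 30))                 /* system memory < 64 GB        */
                    return 128UL << 20;                      /* MIN_MEMORY_BLOCK_SIZE        */

            if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))       /* bare metal                   */
                    return 2UL << 30;                        /* MAX_BLOCK_SIZE               */

            /* virtualized guest: largest power of two (2 GB .. 128 MB) that the
             * end of memory aligns to */
            for (bz = 2UL << 30; bz > (128UL << 20); bz >>= 1)
                    if ((boot_mem_end & (bz - 1)) == 0)
                            break;
            return bz;
    }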
  • 53. /sys/devices/system/memory/block_size_bytes System memory < 64GB? block_size_bytes = 0x800_0000 (MIN_MEMORY_BLOCK_SIZE = 128 MB) block_size_bytes = 0x8000_0000 (MAX_BLOCK_SIZE = 2 GB) * Ignore SGI UV system platform !X86_FEATURE_HYPERVISOR? Find the largest allowed block size that aligns to memory end (check ‘max_pfn’) Range: 0x8000_0000 - 0x800_0000 Y N Y N QEMU – Guest OS * Source code: arch/x86/mm/init_64.c: probe_memory_block_size()