Virtual Memory: Bilkent University Department of Computer Engineering CS342 Operating Systems
Chapter 9
Virtual Memory
Objectives and Outline
Objectives
• To describe the benefits of a virtual memory system
• To explain the concepts of demand paging,
– page-replacement algorithms, and
– allocation of page frames
• To discuss the principle of the working-set model
Outline
• Background
• Demand Paging
• Copy-on-Write
• Page Replacement
• Allocation of Frames
• Thrashing
• Memory-Mapped Files
• Allocating Kernel Memory
• Other Considerations
• Operating-System Examples
Background
• Benefits:
– Only part of the program needs to be in memory for execution
• You can execute more programs concurrently
– Logical address space can therefore be much larger than physical address
space
• You can execute programs larger than physical memory
Virtual Memory That is Larger Than Physical Memory
[Figure: virtual memory (pages 0 to n-1) is mapped through the page table onto physical memory; pages not currently in memory are marked unavailable in the page table and are moved in from disk, where all pages of the program reside.]
A typical virtual-address space layout of a process
[Figure: virtual address space of a process; the stack holds function parameters, local variables, and return addresses.]
Shared Library Using Virtual Memory
Implementing Virtual Memory
– Demand paging
• Bring pages into memory when they are used, i.e. allocate memory for
pages when they are used
– Demand segmentation
• Bring segments into memory when they are used, i.e. allocate memory
for segments when they are used.
Demand Paging
• The pager never brings a page into memory unless that page will be needed
Valid-Invalid Bit
• Each page table entry carries a valid–invalid bit (i = invalid: the page is not in memory)
• During address translation, if the valid–invalid bit in the page table entry is i ⇒ page fault
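The check made during address translation can be sketched in C. This is only an illustrative model: the 4 KB page size, the 8-entry table, and the names `struct pte` and `translate` are assumptions, not part of the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size */
#define NUM_PAGES 8u      /* assumed size of the toy page table */

/* One page table entry: a frame number plus the valid-invalid bit. */
struct pte {
    uint32_t frame;
    bool     valid;       /* true = valid (in memory), false = i */
};

/*
 * Translate a virtual address. Returns true and stores the physical
 * address in *paddr when the page is valid; returns false to signal a
 * page fault when the valid-invalid bit is i.
 */
bool translate(const struct pte table[NUM_PAGES], uint32_t vaddr,
               uint32_t *paddr)
{
    uint32_t page   = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;

    assert(page < NUM_PAGES);   /* address must lie in the toy address space */
    if (!table[page].valid)
        return false;           /* bit is i: page fault */
    *paddr = table[page].frame * PAGE_SIZE + offset;
    return true;
}
```

A real MMU performs this check in hardware and traps to the operating system; the boolean return value here merely stands in for that trap.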
Page Table When Some Pages Are Not in Main Memory
Page Fault
Page Fault (Cont.)
Steps in Handling a Page Fault
Performance of Demand Paging
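$$\text{EAT} = (1 - p) \times \text{memory access time} + p \times \text{page fault service time}$$
where $p$ is the page-fault rate ($0 \le p \le 1$) and
$$\text{page fault service time} = \text{page fault overhead time} + \text{time to swap page out (sometimes)} + \text{time to swap page in} + \text{restart overhead time}$$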
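To get a feel for the formula, the sketch below evaluates EAT in C. The function name and the sample values in the note that follows (200 ns memory access, 8 ms page-fault service time) are assumptions chosen for illustration, not figures from these slides.

```c
#include <assert.h>

/*
 * Effective access time in nanoseconds.
 *   p                 page-fault rate, 0 <= p <= 1
 *   mem_access_ns     memory access time
 *   fault_service_ns  total page-fault service time
 */
double effective_access_time(double p, double mem_access_ns,
                             double fault_service_ns)
{
    assert(p >= 0.0 && p <= 1.0);
    return (1.0 - p) * mem_access_ns + p * fault_service_ns;
}
```

With the assumed values, `effective_access_time(0.001, 200.0, 8000000.0)` comes to roughly 8200 ns, so even one fault per thousand accesses makes memory appear about 40 times slower.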
Demand Paging Example
Process Creation
- Copy-on-Write
Copy-on-Write
• Copy-on-Write (COW) allows both parent and child processes to initially share the same pages in memory
• If either process modifies a shared page, only then is the page copied
• COW allows more efficient process creation as only modified pages are copied
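The copy-on-write rule can be modeled with a per-frame reference count. The following C sketch is a toy simulation under assumed names (`struct frame`, `cow_share`, `cow_write`) and an assumed 16-byte page; a real kernel implements this through write-protected page table entries and the page-fault handler.

```c
#include <assert.h>
#include <string.h>

#define PAGE_BYTES 16   /* tiny page size, for illustration only */
#define MAX_FRAMES 8

struct frame {
    char data[PAGE_BYTES];
    int  refcount;      /* number of processes mapping this frame */
};

static struct frame frames[MAX_FRAMES];
static int frames_used;

/* Allocate a fresh frame holding a copy of src (zero-filled if src is NULL). */
int frame_alloc(const char *src)
{
    assert(frames_used < MAX_FRAMES);
    int f = frames_used++;
    if (src)
        memcpy(frames[f].data, src, PAGE_BYTES);
    else
        memset(frames[f].data, 0, PAGE_BYTES);
    frames[f].refcount = 1;
    return f;
}

/* fork-like sharing: the child maps the same frame; nothing is copied. */
int cow_share(int f)
{
    frames[f].refcount++;
    return f;
}

/*
 * Write one byte through a mapping of frame f. If the frame is shared,
 * copy it first and write to the private copy; otherwise write in place.
 * Returns the frame the writer maps afterwards.
 */
int cow_write(int f, int offset, char value)
{
    assert(offset >= 0 && offset < PAGE_BYTES);
    if (frames[f].refcount > 1) {
        frames[f].refcount--;
        f = frame_alloc(frames[f].data);
    }
    frames[f].data[offset] = value;
    return f;
}
```

Only the first write to a shared frame pays for a copy; later writes by the same process hit its private frame and proceed in place.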
Before Process 1 Modifies Page C
After Process 1 Modifies Page C
What happens if there is no free frame?
• Page replacement – find some page in memory that is not really in use and swap it out
• With page replacement, the same page may be brought into memory several times
Page Replacement
• Use modify (dirty) bit to reduce overhead of page transfers – only modified
pages are written to disk while removing/replacing a page.
Need For Page Replacement
3. Bring the desired page into the (new) free frame; update the page and frame
tables
Page Replacement
Page Replacement Algorithms
1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
Deriving a reference string
• 0100 0432 0101 0612 0102 0103 0104 0101 0611 0102 0103 0104
0101 0610 0102 0103 0104 0609 0102 0105
Graph of Page Faults Versus The Number of Frames
First-In-First-Out (FIFO) Algorithm
• Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• 3 frames (3 pages can be in memory at a time per process)
Frame 1: 1 → 4 → 5
Frame 2: 2 → 1 → 3
Frame 3: 3 → 2 → 4
9 page faults
• 4 frames
Frame 1: 1 → 5 → 4
Frame 2: 2 → 1 → 5
Frame 3: 3 → 2
Frame 4: 4 → 3
10 page faults
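The two fault counts can be checked with a short FIFO simulation. This is a minimal C sketch; the function name `fifo_faults` and the `MAX_FRAMES` limit are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FRAMES 16   /* assumed upper bound for this sketch */

/* Count page faults for FIFO replacement with nframes frames. */
int fifo_faults(const int *refs, int nrefs, int nframes)
{
    int frames[MAX_FRAMES];
    int used = 0;     /* frames filled so far */
    int next = 0;     /* index of the oldest page, replaced next */
    int faults = 0;

    assert(nframes > 0 && nframes <= MAX_FRAMES);
    for (int i = 0; i < nrefs; i++) {
        bool hit = false;
        for (int j = 0; j < used; j++) {
            if (frames[j] == refs[i]) {
                hit = true;
                break;
            }
        }
        if (hit)
            continue;
        faults++;
        if (used < nframes) {
            frames[used++] = refs[i];       /* free frame available */
        } else {
            frames[next] = refs[i];         /* evict the oldest page */
            next = (next + 1) % nframes;
        }
    }
    return faults;
}
```

Run on the reference string above, the function yields 9 faults with 3 frames and 10 with 4, so adding a frame makes FIFO worse here.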
FIFO Illustrating Belady’s Anomaly
Optimal Algorithm
• Replace page that will not be used for longest period of time
• 4 frames example
1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
Frame 1: 1 → 4
Frame 2: 2
Frame 3: 3
Frame 4: 4 → 5
6 page faults
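The optimal policy needs the future of the reference string, which a simulation has available. The C sketch below is an illustration under assumed names (`opt_faults`, `MAX_FRAMES`); ties between pages that are never used again are broken by taking the lowest frame index.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FRAMES 16   /* assumed upper bound for this sketch */

/* Count page faults for optimal (OPT) replacement with nframes frames. */
int opt_faults(const int *refs, int nrefs, int nframes)
{
    int frames[MAX_FRAMES];
    int used = 0, faults = 0;

    assert(nframes > 0 && nframes <= MAX_FRAMES);
    for (int i = 0; i < nrefs; i++) {
        bool hit = false;
        for (int j = 0; j < used; j++) {
            if (frames[j] == refs[i]) {
                hit = true;
                break;
            }
        }
        if (hit)
            continue;
        faults++;
        if (used < nframes) {
            frames[used++] = refs[i];
            continue;
        }
        /* Victim: the resident page whose next use is farthest away
         * (k == nrefs means the page is never used again). */
        int victim = 0, farthest = -1;
        for (int j = 0; j < used; j++) {
            int k = i + 1;
            while (k < nrefs && refs[k] != frames[j])
                k++;
            if (k > farthest) {
                farthest = k;
                victim = j;
            }
        }
        frames[victim] = refs[i];
    }
    return faults;
}
```

For the reference string above with 4 frames the result is 6 faults. OPT cannot be implemented in a real system, but it gives a lower bound against which other algorithms are compared.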
Optimal Page Replacement
Least Recently Used (LRU) Algorithm
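• Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• 4 frames; the first column shows the frames after the initial four loads, each later column after the next replacement
Frame 1: 1 1 1 1 5
Frame 2: 2 2 2 2 2
Frame 3: 3 5 5 4 4
Frame 4: 4 4 3 3 3
8 page faults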
LRU Page Replacement
LRU Algorithm Implementation
• Counter implementation
– Every page entry has a counter; every time page is referenced through this
entry, copy the clock into the counter
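The counter implementation can be sketched in C as follows. The names `lru_faults`, `last_used`, and `MAX_FRAMES` are assumptions for the example; the logical clock is simply incremented on every reference.

```c
#include <assert.h>

#define MAX_FRAMES 16   /* assumed upper bound for this sketch */

/* Count page faults for LRU replacement, using per-page time stamps. */
int lru_faults(const int *refs, int nrefs, int nframes)
{
    int  frames[MAX_FRAMES];
    long last_used[MAX_FRAMES];  /* counter: clock value at last reference */
    long now = 0;                /* logical clock */
    int  used = 0, faults = 0;

    assert(nframes > 0 && nframes <= MAX_FRAMES);
    for (int i = 0; i < nrefs; i++) {
        int slot = -1;
        now++;
        for (int j = 0; j < used; j++) {
            if (frames[j] == refs[i]) {
                slot = j;
                break;
            }
        }
        if (slot < 0) {                      /* page fault */
            faults++;
            if (used < nframes) {
                slot = used++;
            } else {
                slot = 0;                    /* find the smallest counter */
                for (int j = 1; j < used; j++)
                    if (last_used[j] < last_used[slot])
                        slot = j;
            }
            frames[slot] = refs[i];
        }
        last_used[slot] = now;               /* copy the clock into the counter */
    }
    return faults;
}
```

On the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 with 4 frames this gives 8 faults. The linear search for the smallest counter on every replacement is the cost that motivates the stack implementation and the approximation algorithms.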
LRU Algorithm Implementation
Use of a Stack to Record The Most Recent Page References
LRU Approximation Algorithms
• Reference bit
– With each page associate a bit, initially = 0 (not referenced/used)
– When page is referenced, bit set to 1
– Replace the one which is 0 (if one exists)
• We do not know the order, however (several pages may have 0 value)
Second-chance algorithm
• Second chance
– FIFO that checks whether a page has been referenced
– Needs a reference bit
– When a page must be replaced, scan the FIFO list from the head and remove the first page whose reference bit is 0.
– If a page encountered on the way has reference bit 1, clear the bit and move the page to the back of the list; then keep looking for a page whose reference bit is 0.
– May require changing all 1’s to 0’s and then coming back to the beginning of the queue.
R=1 R=1 R=0 R=0 R=1 R=0
Head (oldest) → Tail (youngest)
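Victim selection under second chance can be sketched over a circular array, which is equivalent to moving pages from the head to the tail of the FIFO list. The function name `second_chance_victim` and the `hand` parameter are assumptions made for this C example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Pick a victim among n frames arranged in a circle. *hand is the
 * position of the oldest page (the head of the FIFO list). A page with
 * reference bit 1 gets a second chance: its bit is cleared and the hand
 * moves on, which corresponds to moving the page to the tail.
 * Returns the victim's index and leaves *hand just past it.
 */
int second_chance_victim(bool refbit[], int n, int *hand)
{
    assert(n > 0 && *hand >= 0 && *hand < n);
    while (refbit[*hand]) {
        refbit[*hand] = false;
        *hand = (*hand + 1) % n;
    }
    int victim = *hand;
    *hand = (*hand + 1) % n;
    return victim;
}
```

For the six-page queue shown above (R = 1, 1, 0, 0, 1, 0 from head to tail), the first two pages have their bits cleared and the third page is chosen. The loop always terminates, because after at most one full sweep every bit has been cleared.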
Second-Chance (Clock) Page-Replacement Algorithm
Counting Algorithms
• Keep a counter of the number of references that have been made to each
page
• MFU (most frequently used) algorithm: replaces the page with the largest count, based on the argument that the page with the smallest count was probably just brought in and has yet to be used
Allocation of Frames
Fixed Allocation
• Equal allocation – For example, if there are 100 frames and 5
processes, give each process 20 frames.
• Proportional allocation – Allocate according to the size of process
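$s_i$ = size of process $p_i$
$S = \sum s_i$
$m$ = total number of frames
$a_i$ = allocation for $p_i$ = $\frac{s_i}{S} \times m$
Example: $m = 64$, $s_1 = 10$, $s_2 = 127$
$a_1 = \frac{10}{137} \times 64 \approx 5$
$a_2 = \frac{127}{137} \times 64 \approx 59$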
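The proportional rule is easy to check in code. The C sketch below rounds each share to the nearest integer; the function name and the rounding choice are assumptions, and in general rounded shares need not add up to exactly m.

```c
#include <assert.h>

/*
 * Proportional frame allocation: a[i] = s[i] / S * m, rounded to the
 * nearest integer, where S is the sum of all process sizes.
 *   s  process sizes (n entries)
 *   m  total number of frames
 *   a  output: frames allocated to each process (n entries)
 */
void proportional_allocation(const int *s, int n, int m, int *a)
{
    long total = 0;

    for (int i = 0; i < n; i++)
        total += s[i];
    assert(total > 0);
    for (int i = 0; i < n; i++)
        a[i] = (int)(((long)s[i] * m + total / 2) / total);
}
```

With sizes 10 and 127 and 64 frames, the two processes receive 5 and 59 frames, matching the example.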
Priority Allocation
Global versus Local Allocation
• Global replacement – process selects a replacement frame from the set of all
frames; one process can take a frame from another
• Local replacement – each process selects from only its own set of allocated
frames
Thrashing
• If a process does not have “enough” pages, the page-fault rate is very high.
This leads to:
– low CPU utilization
– operating system thinks that it needs to increase the degree of
multiprogramming
– another process added to the system
Thrashing (Cont.)
Demand Paging and Thrashing
Locality In A Memory-Reference Pattern
Working-Set Model
Working-Set Model
Keeping Track of Working-Set
[Figure: page table with an R_bit and additional in-memory ref_bits (all initially 0) for pages x, y, z, and w, which reside in frames 0 to 3 of physical memory.]
Keeping Track of Working-Set
• Whenever the timer interrupts, copy the reference bits into the in-memory bits and reset all reference bits to 0
– If one of the bits in memory = 1 ⇒ page is in the working set
• Why is this not completely accurate?
– Granularity is 5000 time units (we do not know exactly when within an interval a reference occurred)
• Improvement = 10 bits and an interrupt every 1000 time units.
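The bookkeeping done at each timer interrupt can be sketched in C. The structure and function names are assumptions, as is the choice of two history bits per page; the sketch treats a page as part of the working set if its current reference bit or any saved bit is 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HISTORY_BITS 2   /* assumed number of in-memory bits per page */

struct page_info {
    bool    r_bit;       /* reference bit, set when the page is used */
    uint8_t history;     /* saved reference bits, newest in bit 0 */
};

/* Timer interrupt: shift each R bit into the history, then clear R. */
void timer_tick(struct page_info *pages, int n)
{
    for (int i = 0; i < n; i++) {
        unsigned shifted = ((unsigned)pages[i].history << 1)
                         | (pages[i].r_bit ? 1u : 0u);
        pages[i].history = (uint8_t)(shifted & ((1u << HISTORY_BITS) - 1u));
        pages[i].r_bit = false;
    }
}

/* A page counts as in the working set if any recorded bit is 1. */
bool in_working_set(const struct page_info *p)
{
    return p->r_bit || p->history != 0;
}
```

A page referenced once drops out of the working set after `HISTORY_BITS` further interrupts without a reference, which is where the coarse granularity noted above comes from.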
Keeping Track of Working-Set (example)
[Figure: timeline of timer interrupts for pages 0 to 3, showing each page's R bit and two in-memory history bits after every interrupt, with a page fault marked near the end; the recorded history covers a window labeled ~= 2T.]
Keeping Track of Working-Set (example continued)
[Figure: continuation of the timer-interrupt timeline for pages 0 to 3.]
Page-Fault Frequency (PFF) Scheme
Working Sets and Page Fault Rates
Memory-Mapped Files
• Memory-mapped file I/O allows file I/O to be treated as routine memory access
by mapping a disk block to a page in memory
• A file is initially read using demand paging. A page-sized portion of the file is
read from the file system into a physical page. Subsequent reads/writes
to/from the file are treated as ordinary memory accesses.
• Simplifies file access by treating file I/O through memory rather than through read() and write() system calls
• Also allows several processes to map the same file allowing the pages in
memory to be shared
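On POSIX systems, memory-mapped file I/O is available through `mmap()`. The C sketch below writes a message into a temporary file by storing into a shared mapping, then reads it back with `pread()` to confirm the file changed. The function name `mmap_roundtrip` and the scratch file path are hypothetical, and a POSIX system with a writable `/tmp` is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write msg into a temporary file through a shared mapping, then read it
 * back with pread(). Returns 0 on success, -1 on any failure.
 */
int mmap_roundtrip(const char *msg)
{
    char   path[] = "/tmp/mmap_demo_XXXXXX";  /* hypothetical scratch file */
    size_t len = strlen(msg);
    char   buf[64];
    int    ok = -1;

    assert(len > 0 && len < sizeof buf);
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                   /* file is removed once fd is closed */
    if (ftruncate(fd, (off_t)len) == 0) {
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            memcpy(p, msg, len);    /* file write done as a memory store */
            msync(p, len, MS_SYNC); /* push the change to the file */
            munmap(p, len);
            if (pread(fd, buf, len, 0) == (ssize_t)len &&
                memcmp(buf, msg, len) == 0)
                ok = 0;
        }
    }
    close(fd);
    return ok;
}
```

`MAP_SHARED` is what makes the stores visible in the file and to other processes mapping it; with `MAP_PRIVATE` the changes would stay in a private copy of the page.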
Memory Mapped Files
Memory-Mapped Shared Memory in Windows
Allocating Kernel Memory
Allocating Kernel Memory
– Slab Allocator
Buddy System Allocator
– When smaller allocation needed than is available, current chunk split into
two buddies of next-lower power of 2
• Continue until appropriate sized chunk available
Buddy System Allocator
Example
512 KB of memory (physically contiguous area)
Requests, in order: Alloc A 45 KB, Alloc B 70 KB, Alloc C 50 KB, Alloc D 90 KB, Free C, Free A, Free B, Free D
After the four allocations:
512
– 256
  – 128
    – 64 (A)
    – 64 (C)
  – 128 (B)
– 256
  – 128 (D)
  – 128 (free)
Layout in memory: A C B D
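Two small calculations drive the buddy system: rounding a request up to a power of 2, and locating the buddy of a block so the pair can be merged when both are free. The C sketch below uses assumed function names and works in whatever unit the caller uses (KB in this example).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Smallest power-of-2 block size, at least min_block, that can hold the
 * request. min_block must itself be a power of 2.
 */
size_t buddy_block_size(size_t request, size_t min_block)
{
    size_t size = min_block;

    assert(request > 0 && min_block > 0);
    assert((min_block & (min_block - 1)) == 0);
    while (size < request)
        size *= 2;
    return size;
}

/*
 * Offset of the buddy of the block that starts at offset and has the
 * given power-of-2 size: the two buddies differ only in the bit equal
 * to size.
 */
size_t buddy_of(size_t offset, size_t size)
{
    assert(size > 0 && offset % size == 0);
    return offset ^ size;
}
```

For the requests of 45, 70, 50, and 90 KB this gives blocks of 64, 128, 64, and 128 KB. The 64 KB blocks at offsets 0 and 64 are buddies of each other, which is why freeing both A and C lets them merge back into one 128 KB block.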
Slab Allocator
• Alternate strategy
Slab Allocator
• If slab is full of used objects, next object allocated from empty slab
– If no empty slabs, new slab allocated
• Benefits include
– no fragmentation,
– fast memory request satisfaction
Slabs and Caches
[Figure: cache structures, each linked to its slab structures.]
struct kmem_cache_s {
struct list_head slabs_full; /* points to the full slabs */
struct list_head slabs_partial; /* points to the partial slabs */
struct list_head slabs_free; /* points to the free slabs */
unsigned int objsize; /* size of objects stored in this cache */
unsigned int flags;
unsigned int num;
spinlock_t spinlock;
…
…
};
Slab structure
Layout of Slab Allocator
an object
Slab Allocation
Slab Allocator in Linux
• cat /proc/slabinfo will give info about the current slabs and objects
– cache names: one cache for each different object type
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ip_fib_alias 15 113 32 113 1 : tunables 120 60 8 : slabdata 1 1 0
ip_fib_hash 15 113 32 113 1 : tunables 120 60 8 : slabdata 1 1 0
dm_tio 0 0 16 203 1 : tunables 120 60 8 : slabdata 0 0 0
dm_io 0 0 20 169 1 : tunables 120 60 8 : slabdata 0 0 0
uhci_urb_priv 4 127 28 127 1 : tunables 120 60 8 : slabdata 1 1 0
jbd_4k 0 0 4096 1 1 : tunables 24 12 8 : slabdata 0 0 0
ext3_inode_cache 128604 128696 504 8 1 : tunables 54 27 8 : slabdata 16087 16087 0
ext3_xattr 24084 29562 48 78 1 : tunables 120 60 8 : slabdata 379 379 0
journal_handle 16 169 20 169 1 : tunables 120 60 8 : slabdata 1 1 0
journal_head 75 144 52 72 1 : tunables 120 60 8 : slabdata 2 2 0
revoke_table 2 254 12 254 1 : tunables 120 60 8 : slabdata 1 1 0
revoke_record 0 0 16 203 1 : tunables 120 60 8 : slabdata 0 0 0
scsi_cmd_cache 35 60 320 12 1 : tunables 54 27 8 : slabdata 5 5 0
….
files_cache 104 170 384 10 1 : tunables 54 27 8 : slabdata 17 17 0
signal_cache 134 144 448 9 1 : tunables 54 27 8 : slabdata 16 16 0
sighand_cache 126 126 1344 3 1 : tunables 24 12 8 : slabdata 42 42 0
task_struct 179 195 1392 5 2 : tunables 24 12 8 : slabdata 39 39 0
anon_vma 2428 2540 12 254 1 : tunables 120 60 8 : slabdata 10 10 0
pgd 89 89 4096 1 1 : tunables 24 12 8 : slabdata 89 89 0
pid 170 303 36 101 1 : tunables 120 60 8 : slabdata 3 3 0
• Prepaging
– To reduce the large number of page faults that occurs at process startup
– Prepage all or some of the pages a process will need, before they are
referenced
– But if prepaged pages are unused, I/O and memory were wasted
Other Issues – Page Size
Other Issues – TLB Reach
Other Issues – Program Structure
– Program 2
for (i = 0; i < 128; i++)
    for (j = 0; j < 128; j++)
        data[i][j] = 0;
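The effect of loop order on paging can be estimated with a small simulation. The C sketch below assumes that each row of the 128 x 128 array occupies exactly one page and that the process has a single frame for the array, so every change of row costs one page fault; both assumptions and the function name are made up for this illustration.

```c
#include <assert.h>

#define N 128

/*
 * Count page faults when zeroing an N x N array, assuming one row per
 * page and a single available frame.
 *   row_major != 0: outer loop over rows (i), inner loop over columns (j)
 *   row_major == 0: outer loop over columns (j), inner loop over rows (i)
 */
int faults_for_order(int row_major)
{
    int faults = 0;
    int resident_row = -1;   /* row (page) currently in the frame */

    for (int outer = 0; outer < N; outer++) {
        for (int inner = 0; inner < N; inner++) {
            int row = row_major ? outer : inner;
            if (row != resident_row) {
                faults++;
                resident_row = row;
            }
        }
    }
    return faults;
}
```

Under these assumptions the row-by-row order of Program 2 causes 128 faults, while the opposite loop order touches a different page on every access and causes 128 x 128 = 16,384 faults.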
• Consider I/O - Pages that are used for copying a file from a device must be
locked from being selected for eviction by a page replacement algorithm
Reason Why Frames Used For I/O Must Be In Memory
[Figure: RAM holding pages of Process A and pages of Process B.]
Operating-System Examples
• Windows XP
• Solaris
Windows XP
Solaris
Solaris 2 Page Scanner
References
• The slides here are adapted/modified from the textbook and its slides:
Operating System Concepts, Silberschatz et al., 7th & 8th editions, Wiley.
• Operating System Concepts, 7th and 8th editions, Silberschatz et al. Wiley.
• Modern Operating Systems, Andrew S. Tanenbaum, 3rd edition, 2009.