Class-6 Live Migration of VMs
Cloud Networking
Winter 2025
K. K. Ramakrishnan
[email protected]
SSC 308
Tue-Thur 2:00 pm - 3:20 pm
CS 208: Class 6
Live Migration of Virtual Machines
• Placement problem:
• Data center has 10,000 - 1,000,000 servers
• Customers have 100,000s of VMs
• Each with different CPU and RAM needs
• Which VMs should go where?
• How hard is this problem?
• Often use bin packing algorithms
• But imbalance may result from imperfect predictions and workload variability
VM Checkpointing
• What is a computer?
• What is its "state"?
VM Checkpointing
• A VM is really just (memory + CPU + disk) state
• Save a copy of a VM:
• Pause the VM
• Write out all of its memory pages to disk
• Write out processor state (e.g., registers)
• Make a copy of its disk data
• Can restore it later, at exactly the point where it was paused
• Or create multiple clones based on this saved state
[Figure: a VM's state = memory state + CPU state + disk]
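A toy sketch of the checkpoint/restore idea above, assuming the VM's state can be modeled as a plain Python dict of memory pages and register values (the names, path, and pickle-based format are illustrative, not any hypervisor's actual API):

```python
import pickle

def checkpoint(vm_state, path):
    # pause the VM, then write its memory pages and CPU registers to disk
    with open(path, "wb") as f:
        pickle.dump(vm_state, f)

def restore(path):
    # later: read the saved state back and resume exactly where it paused
    with open(path, "rb") as f:
        return pickle.load(f)

vm = {"memory": {0: b"page0", 1: b"page1"},
      "cpu": {"rip": 0x1000, "rsp": 0x7FF0}}
checkpoint(vm, "/tmp/vm.ckpt")
clone = restore("/tmp/vm.ckpt")   # could also create multiple clones from it
assert clone == vm
```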
VM Migration
• Virtualization abstracts the VM from the
underlying hardware...
• So we can change the hardware completely
without affecting the VM
VM Migration
• Assume disk is accessible to both hosts over network
• Only need to migrate memory and CPU state
• Basic idea for Live Migration:
• Copy all memory pages as the VM is running
• Track what memory pages are written to by the VM during transfer
• Resend all dirty pages.
• Repeat until there are very few pages left.
• Pause and send the final set of pages
[Figure: timeline of migration from Host 1 to Host 2: first copy all memory pages, then iterate, copying the pages dirtied in the meantime]
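A minimal, self-contained simulation of this iterative pre-copy loop (all numbers, names, and the random dirtying model are illustrative assumptions, not Xen's implementation). It also shows why convergence requires the dirtying rate to stay below the link rate:

```python
import random

# Illustrative constants (assumptions, not real Xen parameters)
PAGES = 4096                # VM memory size, in pages
LINK_PAGES_PER_SEC = 1000   # how fast pages cross the network
DIRTY_PAGES_PER_SEC = 200   # how fast the running VM dirties pages
STOP_THRESHOLD = 32         # switch to stop-and-copy below this many pages

def precopy_migrate():
    to_send = set(range(PAGES))                  # round 1: copy all pages
    rnd = 0
    while len(to_send) >= STOP_THRESHOLD and rnd < 30:
        rnd += 1
        round_time = len(to_send) / LINK_PAGES_PER_SEC   # time to push them
        # pages the still-running VM dirties while this round is in flight
        dirtied = int(DIRTY_PAGES_PER_SEC * round_time)
        print(f"round {rnd}: sent {len(to_send)} pages in {round_time:.2f}s")
        to_send = {random.randrange(PAGES) for _ in range(dirtied)}
    # stop-and-copy: pause the VM, send the final dirty set and CPU state
    downtime = len(to_send) / LINK_PAGES_PER_SEC
    print(f"stop-and-copy: {len(to_send)} pages left, downtime ~{downtime:.3f}s")

precopy_migrate()
```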
Significance of Live Migration
Concept:
Migration: Move VM between different physical machines
Live: Without disconnecting client or application (invisible)
Why Live Migrate VMs?
• Load Balancing
• System Maintenance
• Edge computing
• To move computation closer to the data sources
Migrate VM’s memory first or last?
• Pure stop-and-copy
– Freeze VM at source,
– Copy the VM’s pseudo-physical memory contents to target,
– Restart VM at target
– Long downtime.
– Total migration time = downtime
Design(2) - Migrating Memory
Three phases of memory transfer
◦ Push phase
Source VM continues running
Pages are pushed across the network to destination
◦ Stop-and-copy phase
The source VM is stopped; pages are copied across to the destination VM
◦ Pull phase
The new VM executes and, on page faults,
"pulls" pages from the source
Pre-copy
◦ A bounded iterative push phase with a very
short stop-and-copy phase
◦ Avoids the pull phase
Design(3) – Network & Disk
Network
◦ Generate an unsolicited ARP reply
from the migrated host, advertising
that the IP address has moved to a new location
◦ A small number of in-flight packets
may be lost
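As a sketch, an unsolicited (gratuitous) ARP reply like this could be generated with the scapy packet library; the IP, MAC, and interface below are placeholders, and this is not the code Xen itself uses:

```python
from scapy.all import ARP, Ether, sendp

def announce_new_location(vm_ip, vm_mac, iface):
    # Broadcast an ARP reply binding the VM's IP to its MAC, so that
    # switches and peers learn the VM's new location on the network.
    pkt = Ether(dst="ff:ff:ff:ff:ff:ff", src=vm_mac) / ARP(
        op=2, psrc=vm_ip, hwsrc=vm_mac, pdst=vm_ip, hwdst="ff:ff:ff:ff:ff:ff")
    sendp(pkt, iface=iface)

# announce_new_location("10.0.0.42", "00:16:3e:aa:bb:cc", "eth0")  # needs root
```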
Disk
◦ Network-attached Storage(NAS)
Pre-copy Live VM Migration
[Figure: migration timeline: VM running normally on the source host; overhead due to copying during the pre-copy rounds; downtime (VM out of service) during stop-and-copy; then VM running normally on the destination host]
Design(5) – Logical Steps
Stage 0: Pre-Migration
◦ Preselect target host
Stage 1: Reservation
◦ Confirm the resources are available on
destination host
Stage 2: Iterative Pre-copy
◦ In the first iteration, all pages are transferred
from source to destination
◦ In subsequent iterations, only the pages dirtied
during the previous transfer round are copied
Design(6) – Logical Steps
Stage 3: Stop and copy
◦ Stop the running Guest OS at source host
◦ Redirect the network flow to destination host
◦ CPU state and remaining memory pages are
transferred
Stage 4: Commitment
◦ Destination host indicates to source it has
successfully received a consistent OS image
◦ Source host acknowledges; the VM at the source
can now be discarded
Stage 5: Activation
◦ VM starts on Destination host
Pre-copy migration
Challenge in Pre-copy
Every VM has some set of pages
which it updates very frequently,
and the size of this working set
strongly influences the downtime of
the migration.
⇒ Writable Working Set (WWS)
Implementation
There are two kinds of implementations
for initiating and managing state transfer:
◦ Managed Migration
The physical machine has a management VM. (Xen)
A migration daemon running in the management VM
◦ Self Migration
Implemented within the migrating (guest) OS
A small stub required on the destination machine
There are three conditions for ending the
iterative pre-copy phase and switching to stop-and-copy:
◦ The page-dirtying rate exceeds an upper bound.
◦ The remaining working set is small enough.
◦ A limit on the number of rounds is reached.
Implementation Issues(2)
Dynamic Rate-Limiting
◦ First round: transfer rate v = m
(the minimum bandwidth)
◦ Subsequent rounds: v is increased
(e.g., v = v·r), adapting the transfer
rate to the observed dirtying rate
◦ Pre-copy is terminated when v > M
(the maximum bandwidth) or fewer
than 256 KB of dirty pages remain
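A small numeric illustration of the rate-limiting rule above (the multiplicative factor r, the bandwidth values, and the per-round dirty sizes are made-up assumptions; the termination test mirrors the slide):

```python
def precopy_rates(m_mbps, M_mbps, r, dirty_kb_per_round):
    """Print the transfer rate chosen for each pre-copy round."""
    v, rnd = m_mbps, 0
    while True:
        rnd += 1
        remaining_kb = dirty_kb_per_round[min(rnd, len(dirty_kb_per_round)) - 1]
        print(f"round {rnd}: rate {v:.0f} Mbit/s, {remaining_kb} KB still dirty")
        # terminate pre-copy when the rate exceeds M or < 256 KB remains
        if v > M_mbps or remaining_kb < 256:
            return rnd          # next step: stop-and-copy
        v *= r                  # raise the transfer rate for the next round

precopy_rates(m_mbps=100, M_mbps=500, r=1.5,
              dirty_kb_per_round=[8192, 4096, 2048, 1024, 512, 300, 200])
```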
Implementation Issues(3)
Rapid Page Dirtying
◦ Page dirtying is often physically clustered
◦ "Peek" at the pages being dirtied in the current round
◦ Avoids transferring a page multiple times:
• Before transmitting a page, peek into the current round's dirty bitmap
• Skip transmission if the page has already been dirtied in the ongoing round
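A minimal sketch of this "peek" check, assuming the dirty bitmap is simply a Python list of booleans (names and values are illustrative):

```python
def pages_to_transmit(prev_round_dirty, current_dirty_bitmap):
    """Return the pages from the previous round that are still worth sending."""
    send = []
    for page in prev_round_dirty:
        if current_dirty_bitmap[page]:
            continue            # already re-dirtied this round: skip it for now
        send.append(page)
    return send

# toy example: pages 3 and 7 were re-dirtied while this round was running
bitmap = [False] * 16
bitmap[3] = bitmap[7] = True
print(pages_to_transmit([1, 3, 5, 7, 9], bitmap))   # -> [1, 5, 9]
```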
Stunning Rogue Processes
◦ Some processes may produce dirty memory at a very fast rate
◦ E.g., a test program that writes one word in every page was
able to dirty memory at a rate of 320 Gbit/sec
◦ Fork a monitoring thread within the OS kernel when migration begins
◦ Monitor the writable working set of individual processes
◦ If a process dirties memory too fast, "stun" it:
Move non-interactive processes generating dirty pages to a wait queue
• Their execution is paused until migration completes
Implementation Issues(4)
Freeing Page Cache Pages
◦ The guest OS can indicate that some or all of its
free pages need not be transferred
◦ Page cache: a transparent cache for pages
originating from a secondary storage device (disk)
◦ These pages are not transferred in the
first iteration
◦ Reduces total migration time
Evaluation(1)
Dell PE-2650 server-class
machines
Dual Xeon 2 GHz CPUs
2 GB memory
Broadcom TG3 network interface
Gigabit Ethernet
NetApp F840 NAS
XenLinux 2.4.27
Evaluation(2) - Simple Web Server
Continuously serving a single
512KB file to a set of 100 clients
Evaluation(3) - SPECweb99
SPECweb99 – an application-level
benchmark for evaluating web
servers
Evaluation(4)
Quake 3 server – an online
game server with 6 players
◦ Downtime: 50ms
Diabolical Workload
◦ Running a 512 MB host and using a
simple program that writes
constantly to a 256 MB region of
memory
◦ Downtime: 3.5 sec
◦ Rare in the real world
A worst case example
Summary
Minimal impact on running
services
Small downtime with realistic
server workloads
Optimization
Dynamic Rate-Limiting
Rapid Page Dirtying
Para-virtualized Optimization
1. Ballooning mechanism
A technique to dynamically adjust the physical memory of a guest.
A driver in the guest OS, called the balloon driver, allocates pages
from the guest OS and then hands those pages back to Xen.
From the guest OS's perspective it still has all the memory it started with;
there is simply a device driver acting as a memory hog. From Xen's
perspective, the memory the driver asked for is no longer real memory;
it is just empty space (hence "balloon").
When the administrator wants to give memory back to the VM, the balloon
driver asks Xen to fill the empty space with memory again
(shrinking or "deflating" the balloon), and then "frees" the
resulting pages back to the guest OS (the memory is available for use
again).
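A toy model of this ballooning idea (purely illustrative; a real balloon driver runs inside the guest kernel and talks to Xen, which these functions only mimic):

```python
class Guest:
    def __init__(self, total_pages):
        self.total = total_pages
        self.ballooned = 0      # pages currently pinned by the balloon driver

    def usable_pages(self):
        # the guest still believes it owns `total` pages; ballooned pages are
        # simply held by the "memory hog" driver and unusable by applications
        return self.total - self.ballooned

def inflate(guest, hypervisor_free, n):
    # driver allocates n guest pages and hands them back to the hypervisor
    guest.ballooned += n
    return hypervisor_free + n

def deflate(guest, hypervisor_free, n):
    # hypervisor re-populates the balloon; the driver frees pages to the guest OS
    n = min(n, guest.ballooned, hypervisor_free)
    guest.ballooned -= n
    return hypervisor_free - n

g = Guest(total_pages=1024)
xen_free = inflate(g, 0, 256)          # guest now has 768 usable pages
print(g.usable_pages(), xen_free)      # -> 768 256
xen_free = deflate(g, xen_free, 256)   # give the memory back to the guest
print(g.usable_pages(), xen_free)      # -> 1024 0
```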
The catch in Pre-copy:
Dirty page tracking
• Mark the VM's memory pages as read-only after each
iteration; writes then trap, so the dirtied pages can be tracked.
(Non-pageable memory, e.g., guest kernel memory)
Post-copy migration (memory is transferred after execution resumes at the destination):
• Advantage
• Lower network overhead
• Each page is sent only once over the network
• Total migration time is lower for write-intensive
workloads
• Disadvantage
• Cold-start penalty at the destination until the working set
is fetched over the network
Stages of Postcopy Migration
1. First, freeze the VM at the source
2. Migrate CPU state and minimum state to destination
3. Start VM at the target
• but without its memory!
[Figure: the VM's memory pages, addresses 0 to Max]
Prepaging strategies for Post-copy
[Figure: page orderings over the address space (0 to Max): size / LRU ordering]
• Versus pre-copy
• Lower total migration time
• Versus pure (demand-paged) post-copy
• Smaller cold-start penalty due to fewer network-bound page faults
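A minimal sketch of post-copy with prepaging, assuming a simple "bubble" of neighbouring pages is pulled around each fault (the access trace, bubble size, and page numbering are made-up assumptions):

```python
def postcopy_run(access_trace, total_pages, bubble=8):
    resident = set()                # pages already fetched to the destination
    network_faults = 0
    for page in access_trace:       # the VM runs at the destination
        if page not in resident:
            network_faults += 1     # demand-fetch the faulted page from source
            resident.add(page)
        # prepaging: proactively pull a "bubble" of neighbours around the fault
        for p in range(page - bubble, page + bubble + 1):
            if 0 <= p < total_pages:
                resident.add(p)
    return network_faults

trace = [100, 101, 102, 300, 301, 99, 310]
print(postcopy_run(trace, total_pages=1024))   # only 3 network-bound faults
```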
Black-box and Gray-box Strategies
for Virtual Machine Migration
[Figure: request arrivals per minute over time (hours and days), showing highly variable workloads]
Static over-provisioning
Allocate for peak load
Wastes resources
Not suitable for dynamic workloads
Difficult to predict peak resource requirements
Dynamic provisioning
Adjust based on workload
Often done manually
Becoming easier with virtualization
Problem Statement
System Overview
Conclusions
Research Challenges
Sandpiper: automatically detect and mitigate
hotspots through virtual machine migration
When to migrate?
Where to move to?
How much of each resource to allocate?
A migratory bird
[Figure: Sandpiper architecture: a Nucleus runs on each physical machine (PM 1 … PM N) and monitors its VMs; a centralized Control Plane runs the Hotspot Detector, Profiling Engine, and Migration Manager]
Nucleus: one per server; monitors the VMs on that machine
Control Plane: centralized server
Hotspot Detector: detects when a hotspot occurs
Profiling Engine: decides how much of each resource to allocate (e.g., an Apache modelling module)
Migration Manager: determines where to migrate
PM = Physical Machine, VM = Virtual Machine
Black-Box and Gray-Box
Black-box: only data from outside the VM
Completely OS and application agnostic
Dom 0 for monitoring network usage; memory usage (indirectly)
Gray-box: also uses data from inside the VM
Application logs, OS statistics
Is black-box sufficient?
What do we gain from gray-box data?
Outline
Introduction & Motivation
System Overview
Conclusions
Black-box Monitoring
Xen uses a “Driver Domain”
Special VM with network and disk drivers
Nucleus runs here
[Figure: the Nucleus in the driver domain observes each VM's CPU utilization over time; initially not overloaded, then a hotspot is detected]
Resource Profiling – How much?
How much of each resource to give a VM
Create distribution from time series
Provision to meet peaks of recent workload
[Figure: utilization profile: a probability distribution over % utilization built from historical data]
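A sketch of this profile-based provisioning: build a distribution from recent utilization samples and provision to a high percentile of it (the 95th percentile and the sample values are assumptions, not necessarily Sandpiper's exact settings):

```python
def provision(samples, percentile=0.95):
    # build the utilization profile and pick a high percentile of it
    ordered = sorted(samples)
    idx = min(int(percentile * len(ordered)), len(ordered) - 1)
    return ordered[idx]

recent_cpu = [20, 25, 30, 35, 80, 40, 30, 90, 35, 30]   # % utilization samples
print(provision(recent_cpu))   # allocate enough to cover the recent peaks -> 90
```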
Black box: generally know when resources are not enough
What to do if utilization is at 100%?
Gray-box
Request level knowledge can help
Can use application models to determine requirements
Determining Placement – Where to?
Migrate VMs from overloaded to underloaded servers
Volume = 1/(1-cpu) * 1/(1-net) * 1/(1-mem)
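The volume metric from the formula above as a small helper, where cpu, net, and mem are utilizations normalized to [0, 1); the example values are illustrative:

```python
def volume(cpu, net, mem):
    # higher utilization along any dimension inflates the volume
    return (1.0 / (1.0 - cpu)) * (1.0 / (1.0 - net)) * (1.0 / (1.0 - mem))

print(volume(0.9, 0.5, 0.5))   # heavily loaded: volume ≈ 40
print(volume(0.2, 0.1, 0.1))   # lightly loaded: volume ≈ 1.54
```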
Highly loaded servers are targeted first
Swap if necessary
Swap a high-Volume VM for a low-Volume one
Requires 3 migrations, using spare capacity elsewhere
(the two hosts can't support both VMs at once)
Swaps increase the number of hotspots we can resolve
[Figure: PM1 (VM1, VM2, VM3) and PM2 (VM4, VM5) swap VMs through spare capacity]
Outline
Introduction & Motivation
System Overview
Conclusions
Implementation
Use Xen 3.0.2-3 virtualization software
[Testbed: three physical machines, PM 1, PM 2, and PM 3]
Memory Hotspots
Virtual machine runs SpecJBB benchmark
Memory utilization increases over time
Black-box increases VM memory allocation by 32MB if page-
swapping is observed
Gray-box maintains 32 MB free
Significantly reduces page-swapping
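A sketch of the two memory policies described above (the 32 MB values come from the slide; the monitoring inputs and function names are illustrative placeholders):

```python
SWAP_INCREMENT_MB = 32   # black-box: grow the allocation when swapping is seen
FREE_TARGET_MB = 32      # gray-box: keep at least this much free inside the VM

def black_box_step(alloc_mb, swapping_observed):
    # only external symptoms are visible: react after swapping already happens
    return alloc_mb + SWAP_INCREMENT_MB if swapping_observed else alloc_mb

def gray_box_step(alloc_mb, free_mb_inside_vm):
    # OS statistics from inside the VM let us act before swapping starts
    deficit = FREE_TARGET_MB - free_mb_inside_vm
    return alloc_mb + deficit if deficit > 0 else alloc_mb

print(black_box_step(406, swapping_observed=True))    # -> 438
print(gray_box_step(406, free_mb_inside_vm=10))       # -> 428
```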
[Figure: VM RAM allocation (MB) over time (sec) under the black-box and gray-box policies]
[Figure: number of hotspots and time (intervals) to resolve them, Static vs. Sandpiper]
[Figure: measured and predicted utilization of PM1 and PM2 over time (sec)]
Related Work
Menasce and Bennani 2006
Single server resource management
Shirako
Migration used to meet resource policies determined by
application owners
Future work
Improved black-box memory monitoring
Support for replicated services
Virtual Machine Files
File format(1)
.XML File
◦ Saves the VM configuration details
◦ Named with the VM's GUID
File format(2)
.BIN files
◦ This file contains the memory of a
virtual machine or snapshot that is in
a saved state (running programs, data
for those programs, documents being
viewed, etc.)
.VSV files
◦ This file contains the saved state from
the devices associated with the virtual
machine.
File format(3)
.VHD files
◦ These are the virtual hard disk files
for the virtual machine (they store
files, folders, the file system and
disk partitions)
.AVHD files
◦ These are the differencing disk files
used for virtual machine snapshots