Andrey Vagin <avagin@openvz.org>
1 June 2013, Moscow
Linux Containers
Fedora Virtualization Day
Different types of Virtualization
● Virtual Machines
– Emulation (QEMU)
– Paravirtualization (Xen)
– Hardware Virtualization (KVM, ESX)
● OS Level Virtualization
– Containers (Linux Containers, Solaris Zones, BSD Jails)
Virtual Machine (VM)
[Diagram: Hardware, then a Hypervisor, then four guests, each with its own Virtual HW, Kernel, and Apps]
Containers (CT)
[Diagram: Hardware, then one Host Kernel, then four containers, each with its own Namespaces and Apps]
- chroot() on steroids
Comparison: VMs vs CTs
● One real HW, many virtual HW, many OSes
● One real HW, one kernel, many userspace instances
● Full control of the guest OS
● Native performance: [almost] no overhead
● High density
● KSM (Kernel SamePage Merging)
● Use resources on demand
● Dynamic resource allocation
● Naturally share pages
● Depends on hardware (VT-x, VT-d, EPT, etc.)
● Not all functionality is virtualized
● Flexibility
Evolution of the Operating System
● Multitask: many processes
● Multiuser: many users
● Multicontainer: many containers
Containers (CT)
Cgroups
– control resources
● cpu, cpuacct, cpuset
● blkio
● memory
● net_cls
Namespaces
– isolate environments
● MNT
● PID
● NET
● IPC
● User
● UTS
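As a rough illustration of the namespace list above, the sketch below reads `/proc/PID/ns` to show which namespaces a process belongs to. It assumes a Linux system with procfs mounted; the helper name `list_namespaces` is made up for this example and is not part of any tool mentioned in the slides.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_name: link_target} for a process.

    Each entry under /proc/<pid>/ns is a symlink such as 'pid:[4026531836]'.
    Returns an empty dict when the directory does not exist (non-Linux hosts).
    """
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):
        return {}
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Some entries may be unreadable without extra privileges.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, target in list_namespaces().items():
        print(f"{name:10s} {target}")
```

Two processes whose links for a given namespace type point at the same target share that namespace; processes in different containers would normally show different targets.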
How to run a CT
All allowed by default
● unshare, nsenter
● Systemd Lightweight Containers
● LXC
● Libvirt LXC
All restricted by default
● OpenVZ (vzctl-core) (FC19)
vzctl - perform various operations on a container
# yum install -y vzctl-core
# vzctl create 101 --ostemplate fedora-15
# vzctl start 101
# vzctl exec 101 ps axf
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 init
11830 ?        Ss     0:00 syslogd -m 0
11897 ?        Ss     0:00 /usr/sbin/sshd
11943 ?        Ss     0:00 xinetd -stayalive -pidfile ...
12218 ?        Ss     0:00 sendmail: accepting connections
12265 ?        Ss     0:00 sendmail: Queue runner@01:00:00
13362 ?        Ss     0:00 /usr/sbin/httpd
13363 ?        S      0:00  \_ /usr/sbin/httpd
..............................................
 6416 ?        Rs     0:00 ps axf
# vzctl stop 101
# vzctl destroy 101
OpenVZ kernel-only features
● Ploop (snapshots, backups, different formats)
● Second-level quota
● More functional memory accounting
● PFCache (memory deduplication, I/O operation savings)
● Better isolation than FC19 (which lacks userns)
Questions?
http://openvz.org
Andrey Vagin <avagin@openvz.org>
CRIU - Checkpoint/Restore in User-space
What is C/R and how can it be used?
C/R is the ability to save the state of processes and restore it later.
Usage scenarios:
– Failure recovery
– Live migration
– Reboot-less upgrade
– Speed up of slow-boot services
– HPC issues
History
● Berkeley Lab Checkpoint/Restart (BLCR) (2003)
– Load a kernel module and link with a library
● DMTCP: Distributed MultiThreaded CheckPointing (2004-2006)
– Preload a library
● OpenVZ (2005)
– OpenVZ kernel
● Linux Checkpoint/Restart by Oren Laadan (2008)
– A non-mainline kernel
● CRIU (2011)
[Timeline: BLCR 2003, OpenVZ 2005, DMTCP 2007, Linux C/R 2008, CRIU 2011]
How does this work?
[Diagram: crtools connects the kernel objects (namespaces, files, sockets, pipes) and the process tree with the image files]
Kernel interfaces
[Diagram: Dump and Restore on top of the kernel interfaces: syscalls, netlink, /proc/, ptrace]
Dump
● Parasite code
– Receive file descriptors
– Dump memory content
– prctl(), sigaction, pending signals, timers, etc.
● Ptrace
– Freeze processes
– Inject parasite code
● Netlink
– Get information about sockets, netns
● Procfs
– /proc/PID/maps, /proc/PID/map_files/, /proc/PID/status, /proc/PID/mountinfo
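To make the procfs part of the dump step more concrete, here is a minimal sketch of parsing `/proc/PID/maps` to learn which memory regions a task has. It is not how crtools itself is implemented; the `Mapping` type and the function names are hypothetical, and only the standard maps line layout (address range, permissions, offset, device, inode, optional path) is assumed.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int           # first address of the region
    end: int             # address just past the region
    perms: str           # e.g. 'r-xp'
    offset: int          # offset into the backing file
    dev: str             # 'major:minor' of the backing device
    inode: int           # 0 for anonymous mappings
    path: Optional[str]  # backing file or pseudo-name like '[stack]'


def parse_maps_line(line: str) -> Mapping:
    """Parse one line of /proc/<pid>/maps into a Mapping."""
    # The path is the sixth field and may contain spaces, so split at most 5 times.
    fields = line.rstrip("\n").split(None, 5)
    addr, perms, offset, dev, inode = fields[:5]
    start, end = (int(x, 16) for x in addr.split("-"))
    path = fields[5] if len(fields) > 5 else None
    return Mapping(start, end, perms, int(offset, 16), dev, int(inode), path)


def read_maps(pid="self") -> List[Mapping]:
    """Read and parse all mappings of a process (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f]
```

A dumper would walk such a list to decide which address ranges to save; file-backed regions can be matched against the entries in `/proc/PID/map_files/`.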
Restore
● Collect shared objects
● Restore namespaces
● Create a process tree
– Restore SID, PGID
– Restore objects that should be inherited
● Files, sockets, pipes, ...
● Restore per-task properties
● Restore memory
● Call sigreturn
● Awesome
Interesting points
● How to restore shared objects?
– Send file descriptors via Unix sockets
– Map files from /proc/self/map_files/ to restore anonymous shared mappings
● How to restore memory mappings at the correct addresses?
– Map a new code block and a stack
– Unmap crtools' mappings
– Remap the task's mappings to the correct addresses
● How to resume a process?
– Create a signal frame
– Call sigreturn()
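The first point, sending file descriptors via Unix sockets, can be sketched with the standard SCM_RIGHTS mechanism. This is only an illustration of the kernel facility, not CRIU's code: the helper names `send_fd`, `recv_fd`, and `demo` are made up, and for brevity both ends of the socket live in one process, whereas a restorer would pass descriptors between different processes.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one file descriptor over a Unix socket as SCM_RIGHTS ancillary data."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    sock.sendmsg([b"F"], ancillary)  # at least one byte of normal data is sent along


def recv_fd(sock):
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any truncated trailing bytes.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    return fds[0]


def demo():
    """Pass the write end of a pipe through a socket pair and use the copy."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    send_fd(a, w)
    w2 = recv_fd(b)          # a new descriptor number for the same pipe
    os.write(w2, b"hello")   # write through the received descriptor
    data = os.read(r, 5)
    for fd in (r, w, w2):
        os.close(fd)
    a.close()
    b.close()
    return data
```

The received descriptor refers to the same open file as the original, which is what lets one restored task hand a shared file, pipe, or socket to another.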
Kernel impact
● ~140 patches merged
● ~10 patches in flight
● ~11 new features appeared
● ~2 new features to come
New features in the kernel
● Parasite code injection (by Tejun Heo)
– Read task state that a task can normally retrieve only about itself
● The kcmp() system call
– Helps check which kernel objects are shared between processes
● Proc map_files directory
– Find out exactly which file is mapped
– Mapping sharing info
● A bunch of prctl extensions
– Set various private fields on task/mm objects (C/R-only feature)
● Last-pid sysctl
– Restore a task with the desired PID value
New features in the kernel (continued)
● TCP repair mode
– Read the internal state of a TCP connection and reconstruct it from scratch on a freshly created socket
● Socket information dumping via netlink (sock_diag)
– Extensible engine for retrieving socket state
● Virtual net device indexes
– Allows restoring network devices in a namespace
● Socket peeking offset
– Allows peeking at socket queues (reading without removing data from the queue)
● Task memory tracking
– Incremental snapshots, online migration
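The idea of reading a socket queue without draining it can be shown with the long-standing `MSG_PEEK` flag. The sketch below uses only plain `MSG_PEEK`; the peeking offset from the slide (the `SO_PEEK_OFF` socket option) builds on it so that successive peeks can move forward through the queue, and it is deliberately not used here. The function name `peek_then_read` is hypothetical.

```python
import socket


def peek_then_read(payload=b"queued data"):
    """Peek at queued data, then read it for real; both calls see the same bytes."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        a.sendall(payload)
        peeked = b.recv(len(payload), socket.MSG_PEEK)  # data stays in the queue
        consumed = b.recv(len(payload))                 # now the data is removed
        return peeked, consumed
    finally:
        a.close()
        b.close()
```

A checkpointing tool can use this kind of non-destructive read to save the contents of a socket's receive queue while leaving the socket usable by the original task.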
What is already supported?
– x86_64 architecture
– Process tree linkage
– Multi-threaded apps
– All kinds of memory mappings
– Terminals, groups, sessions
– Open files (shared and unlinked)
– Established TCP connections
– Unix sockets, Packet sockets
– Namespaces (net, mount, ipc)
– Non-POSIX files (epoll, inotify)
– Pipes, FIFOs, IPC, ...
– ARM architecture
– Pending signals
– TCP time-stamps
– Iterative snapshots
– VDSO
– LXC and OpenVZ containers
In flight
– POSIX timers
– Convert OpenVZ images
How is CRIU tested?
● ZDTM – a set of unit-tests
● Real-life applications
– Apache, Nginx
– MySQL, MongoDB, Oracle
– Make && gcc
– Tar & gzip
– Screen
– Java
– LXC
– VNC server + GUI applications
Future plans (Feb, 2013)
● Support all kinds of kernel objects
● Merge all in-flight patches into the mainline kernel
● Integrate CRIU with OpenVZ and LXC utilities
● Iterative migration
– Migrate memory content before freezing applications
● Integration in distributions
– CRIU was accepted into Fedora 19
How to use
● ./crtools dump -t pid [<options>]
– checkpoint a process/tree identified by pid
● ./crtools restore -t pid [<options>]
– restore a process/tree identified by pid
● ./crtools show (-D dir)|(-f file) [<options>]
– show dump file(s) contents
● ./crtools check
– checks whether the kernel support is up-to-date
● ./crtools exec -t pid <syscall-string>
– execute a system call in another task
Checkpoint/restore of a VNC server.
Questions?
http://criu.org


Editor's Notes

  • #19: BLCR uses a kernel module and doesn't checkpoint sockets, SysV IPC, zombies, etc. Applications must be linked with a library and executed via a helper. DMTCP uses a helper too, but doesn't require a kernel module. C/R in OpenVZ is used to checkpoint, restore, and migrate OpenVZ containers. It requires the OpenVZ kernel. Linux C/R is very similar to OpenVZ C/R. It is used for checkpoint/restore of LXC. CRIU combines all these projects. It will work on the pure upstream kernel. It's able to dump a task without any preparation.