L4 - Virtualization
Virtualization
Emiliano Casalicchio, Phd
Department of Computer Science
Agenda
[email protected] 2
Why virtualization has been widely explored
• Increased performance and computing capacity
• Underutilized hardware and software resources
• The figure shows a typical pattern of resource usage in a Desktop-as-a-Service environment. The plot refers to a
specific time zone. During working hours (8-18) on working days (Mon-Fri), users are connected and resources are
utilized. During the night no users are connected, except in sporadic cases. With virtualization, during the night
resources can be used for other tasks, e.g. to run desktops for users working in another time zone, or to run
other services. E. Casalicchio, et al. (2015), Cloud Desktop Workload: a Characterization Study, IEEE
International Conference on Cloud Engineering
• Lack of space
• Greening initiatives
• Rise of administrative costs
• Increased security
• Managed execution
• Portability
[Figure 3.1: The virtualization reference model. Labels: Virtual Image, Applications, Guest, VMM or Hypervisor.]
Increased security
• Why does virtualization increase security?
• The virtual machine manager controls and filters the activity of the guest,
thus preventing some harmful operations from being performed
• Resources exposed by the host can then be hidden or simply protected from the
guest
• Sensitive information that is contained in the host can be naturally hidden without
the need to install complex security policies
• Examples
• Applets downloaded from the Internet run in a sandboxed version of the Java
Virtual Machine (JVM). The sandbox filters harmful instructions on the basis of
security policies
• In hardware virtualization, the file system exposed by the VM is completely
separated from the file system of the host
Managed execution
[Figure 3.2: Functions enabled by managed execution. Labels: Virtual Resources, Physical Resources.]
e.g. Cassandra distributed data store
Sensitive information that is contained in the host can be naturally hidden without the need to install
complex security policies. Increased security is a requirement when dealing with untrusted code. For
example, applets downloaded from the Internet run in a sandboxed version of the Java Virtual
Machine (JVM), which provides them with limited access to the hosting operating system
resources. Both the JVM and the .NET runtime provide extensive security policies for customizing
the execution environment of applications. Hardware virtualization solutions such as VMware
Desktop, VirtualBox, and Parallels provide the ability to create a virtual computer with customized
virtual hardware on top of which a new operating system can be installed. By default, the
file system exposed by the virtual computer is completely separated from the one of the host
machine. This becomes the perfect environment for running applications without affecting other
users in the environment.
Portability
• Java or Python code can be executed in any environment
running a JVM or Python VM
• A VM is booted from a VM image that is stored in a specific
disk format
• The disk format of a virtual machine image is the format of the
underlying disk image, e.g.
• aki, ari, ami, vdi, vhd, vhdx, vmdk, …
• You can move VMs around, keeping or converting their image format.
• A Docker container can run on any platform running the Docker
engine.
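Virtualization techniques
[Figure 3.3: A taxonomy of virtualization techniques. Columns: How is it done? / Technique / Virtualization model.
Execution environment, process level: Emulation (Application), High-Level VM (Programming Language), Multiprogramming (Operating System).
Execution environment, system level: Hardware-Assisted Virtualization, Full Virtualization, Paravirtualization, Partial Virtualization (Hardware).
Other branches: Storage, Network, ….]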
Machine reference model
• To understand how the different virtualization techniques work, it is
fundamental to recall the layered structure of a computer system
[Figure: Machine reference model. Labels: Applications, API calls, API, Libraries, ISA.]
Hyper-visor
• The hyper-visor
• literally means that it runs on top of the supervisor mode
• In practice, hypervisors run in supervisor mode
• Challenge: to fully emulate and manage the status of the CPU for guest
operating systems, all the sensitive instructions must be executed in privileged
mode, which requires supervisor mode
• Not true for the original x86 ISA (17 sensitive instructions can be called in user mode) → no isolation
among guest OSs
• With Intel VT and AMD V such instructions have been redesigned as privileged
[Figure 3.5: Security rings and privilege modes. Labels: Ring 3, Ring 2, Ring 1, Ring 0, Privileged Modes, Most Privileged Mode (Supervisor Mode).]
All the current systems support at least two different execution modes: supervisor mode and user
mode. The first mode denotes an execution mode in which all the instructions (privileged and
nonprivileged) can be executed without any restriction. This mode, also called master mode or kernel
mode, is generally used by the operating system (or the hypervisor) to perform sensitive operations on
hardware-level resources. In user mode, there are restrictions to control the machine-level resources. If code
running in user mode invokes the privileged instructions, hardware interrupts occur and trap the potentially
harmful execution of the instruction. Despite this, there might be some instructions that can be invoked
as privileged instructions under some conditions and as nonprivileged instructions under other conditions.
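The isolation problem described above can be phrased as a set condition, in the spirit of the classic Popek and Goldberg requirement: trap-and-emulate works when every sensitive instruction is also privileged, so that it traps when a guest issues it in user mode. The sketch below is only a toy check. The instruction names are a small illustrative sample (not the full list of 17), and treating hardware support as "a larger trapping set" is a simplifying assumption made for this example.

```python
def untrapped_sensitive(sensitive, privileged):
    """Return the sensitive instructions that would NOT trap in user mode."""
    return set(sensitive) - set(privileged)


def is_trap_and_emulate_virtualizable(sensitive, privileged):
    """True when every sensitive instruction is also privileged."""
    return not untrapped_sensitive(sensitive, privileged)


# Illustrative sample only: a few x86 instruction names, not a complete ISA.
sensitive = {"LGDT", "LIDT", "MOV_TO_CR3", "SGDT", "POPF"}

# Original ISA: SGDT and POPF are sensitive but run in user mode without trapping.
privileged_original = {"LGDT", "LIDT", "MOV_TO_CR3", "HLT"}

# Assumption for this sketch: with hardware support, the same instructions
# are intercepted when issued by a guest.
privileged_with_hw_support = privileged_original | {"SGDT", "POPF"}

print(sorted(untrapped_sensitive(sensitive, privileged_original)))
print(is_trap_and_emulate_virtualizable(sensitive, privileged_with_hw_support))
```

A non-empty result from `untrapped_sensitive` is exactly the case in which a guest OS can observe or change machine state without the hypervisor noticing.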
Disambiguation
• VMM - Virtual Machine Manager
• Widely recognized synonym for hypervisor
• Used by Buyya et al.
Type II hypervisors require the support of an operating system to provide virtualization
• The dispatcher
• the entry point of the monitor
• The allocator
• responsible for deciding the system resources to be provided to the VM
• The interpreter
• implements interpreter routines
• executed whenever a virtual machine executes a privileged instruction
• a trap is triggered and the corresponding routine is executed.
[Figure: hypervisor reference architecture. Labels: Virtual Machine Instance, ISA, Instructions (ISA), Dispatcher, Allocator, Interpreter Routines, Virtual Machine Manager.]
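The three modules can be sketched as a toy monitor: every trapped instruction enters through the dispatcher, which looks up an interpreter routine, and routines that change the resources of a VM go through the allocator. This is a minimal illustration under stated assumptions. The class name, the two instruction names (`HLT`, `SET_MEMORY`) and the memory accounting are hypothetical and do not reflect any real hypervisor.

```python
class ToyVMM:
    """Toy virtual machine manager with dispatcher, allocator and interpreter."""

    def __init__(self, total_memory_mb):
        self.free_memory_mb = total_memory_mb
        self.vm_memory_mb = {}
        # Interpreter routines: one handler per (hypothetical) privileged instruction.
        self.interpreter_routines = {
            "HLT": self._emulate_halt,
            "SET_MEMORY": self._emulate_set_memory,
        }

    def dispatch(self, vm, instruction, operand=None):
        """Dispatcher: entry point of the monitor for every trapped instruction."""
        routine = self.interpreter_routines.get(instruction)
        if routine is None:
            raise ValueError(f"no interpreter routine for {instruction!r}")
        return routine(vm, operand)

    def _allocate(self, vm, memory_mb):
        """Allocator: decides which resources are provided to the VM."""
        extra = memory_mb - self.vm_memory_mb.get(vm, 0)
        if extra > self.free_memory_mb:
            return False
        self.free_memory_mb -= extra
        self.vm_memory_mb[vm] = memory_mb
        return True

    def _emulate_halt(self, vm, operand):
        # Only the guest is halted, never the physical machine.
        return "halted"

    def _emulate_set_memory(self, vm, operand):
        return "granted" if self._allocate(vm, operand) else "denied"


vmm = ToyVMM(total_memory_mb=1024)
print(vmm.dispatch("vm1", "SET_MEMORY", 512))
print(vmm.dispatch("vm2", "SET_MEMORY", 1024))
print(vmm.dispatch("vm1", "HLT"))
```

The second request is refused because the allocator, not the guest, owns the decision about physical resources.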
THEOREM 3.1
[Figure 3.3: A taxonomy of virtualization techniques.]
Binary Translation
• Sequences of instructions are translated from a source instruction set to
the target instruction set
• Static
• aims to convert all of the code of an executable file into code that runs on the target
architecture without having to run the code first
• Dynamic
• looks at a short sequence of code, typically on the order of a single instruction or a
short sequence of instructions, then translates it and caches the resulting sequence
• Code is only translated as it is discovered and, when possible, branch instructions are
made to point to already translated and saved code
• More overhead than static translation; re-execution of already translated instructions and
caching are leveraged to make it more efficient
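The effect of caching in dynamic translation can be sketched in a few lines: a block is translated the first time its address is reached, and later executions reuse the cached result. This is a minimal sketch, assuming a made-up source and target instruction set. The `DynamicTranslator` class and the opcode mapping are hypothetical and only illustrate the caching idea.

```python
class DynamicTranslator:
    """Translate blocks lazily and cache them by source address."""

    def __init__(self, translate_block):
        self.translate_block = translate_block
        self.cache = {}          # source address -> translated block
        self.translations = 0    # how many times real translation work was done

    def fetch(self, address, source_block):
        # Translate only on first discovery; afterwards reuse the cached code.
        if address not in self.cache:
            self.cache[address] = self.translate_block(source_block)
            self.translations += 1
        return self.cache[address]


def toy_translate(block):
    # Hypothetical mapping between two made-up instruction sets.
    mapping = {"LOAD": "ld", "ADD": "add", "STORE": "st", "JMP": "b"}
    return [mapping[op] for op in block]


translator = DynamicTranslator(toy_translate)
loop_body = ["LOAD", "ADD", "STORE"]

# The same block is reached three times, as in a loop.
for _ in range(3):
    translated = translator.fetch(0x1000, loop_body)

print(translated, translator.translations)
```

Code that runs repeatedly pays the translation cost once, which is why caching offsets part of the overhead compared with static translation.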
Full virtualization (cont’d)
[Figure 3.3: A taxonomy of virtualization techniques.]
• Reduce the performance penalties experienced by emulating x86 hardware with a VMM
• by design the x86 architecture did not meet the equivalence requirements
• 17 sensitive instructions called in user mode
• early products were using binary translation to trap some sensitive
instructions and provide an emulated version
[Figure 3.8: A hypervisor reference architecture. Labels: ISA, Instructions (ISA), Dispatcher, Interpreter Routines, Allocator.]
Example - Intel VT-x
• VMX root mode is an additional privilege level that VT-x adds to Intel x86
processors; the hypervisor runs in it
• A set of additional instructions is added in order to
• control the start and stop of a VM
• allocate a memory page
• maintain the CPU state for VMs
• Xen, VMware, and Microsoft Virtual PC all implement their hypervisors by
using the VT-x technology.
Example - VMware
• VMware implements full virtualization in
• the desktop environment, by means of Type II hypervisors
• the server environment, by means of Type I hypervisors
• Uses a mix of HW support and dynamic binary translation so that
HW support does not become a bottleneck
• VMware Workstation/Fusion for desktop
• VMware GSX for servers
Example - VMware workstation/fusion
• I/O Instructions are binary translated
• Other instructions are directly managed by the VMM
Example - VMware GSX
Enhancements
• VMware ESX/ESXi can be installed on bare metal
• ESX is based on a modified version of Linux
• ESXi is based on a thin OS layer
[Figure 3.3: A taxonomy of virtualization techniques.]
Xen (open source)
• Paravirtualization
• Domains represent VM instances
• Domain 0 hosts specific control software, which has privileged
access to the host and controls all the other guest operating systems
• it is the first one that is loaded once the virtual machine manager has
completely booted
• it hosts an HTTP server that serves requests for virtual
machine creation, configuration, and termination
• Support for Linux
• Windows is supported only if HW-assisted virtualization is available
KVM (open source)
• KVM is a hardware-assisted para-virtualization tool
• Improves performance and supports unmodified guest OSes
such as Windows, Linux, Solaris, and other UNIX variants
• Part of Linux kernel
Group discussion
• Why is binary translation used in virtualization?
• Why is instruction caching helpful in full virtualization?
• What are the main differences between HW-assisted and
non-HW-assisted virtualization?
• What are the differences between paravirtualization and full
virtualization?
VM migration
• Off-line migration: the VM is stopped, moved …
• Live migration: the VM is transferred without generating service
discontinuity
[Figure 3.10: Live migration and server consolidation. Labels: VM, Virtual Machine Manager, Server A (running), Server B (inactive), After Migration.]
Live migration
Live migration Steps 0 and 1: Start migration
• determining the migrating VM and the destination host.
• users could manually make a VM migrate to an appointed host
• in most circumstances, the migration is automatically started by
strategies such as load balancing and server consolidation.
Live migration Step 2: Transfer memory
• the whole execution state of the VM is stored in
memory
• sending the VM’s memory to the destination node
ensures continuity of the service
1. all of the memory data is transferred in the first
round,
2. then the migration controller recopies the memory
data that was changed during the last round.
3. step 2 is repeated until the dirty portion of the memory is
small enough to handle the final copy.
• because pre-copying is performed iteratively while the VM keeps running, the
execution of programs is not noticeably interrupted.
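The iterative pre-copy loop can be simulated with simple arithmetic: each round sends the pages that are currently dirty, and while that round is in progress the running VM dirties new pages. This is a sketch under simplifying assumptions (constant page dirtying rate, constant bandwidth, integer page counts). The function name and all parameters are hypothetical and are not taken from any real migration controller.

```python
def precopy_rounds(total_pages, dirty_rate, bandwidth, stop_threshold, max_rounds=30):
    """Simulate iterative pre-copy.

    dirty_rate and bandwidth are in pages per second.
    Returns (pages sent in each round, pages left for the final stop-and-copy).
    """
    to_send = total_pages          # round 1 transfers all of the memory
    rounds = []
    for _ in range(max_rounds):
        if to_send <= stop_threshold:
            break                  # small enough: suspend the VM and copy the rest
        rounds.append(to_send)
        # Pages dirtied while this round was being sent:
        # to_send / bandwidth seconds at dirty_rate pages per second.
        to_send = min(total_pages, to_send * dirty_rate // bandwidth)
    return rounds, to_send


rounds, final_copy = precopy_rounds(
    total_pages=100_000, dirty_rate=2_000, bandwidth=10_000, stop_threshold=100
)
print(rounds)      # pages sent per pre-copy round
print(final_copy)  # pages copied while the VM is suspended
```

With these numbers the dirty set shrinks by a factor of five per round. If the dirtying rate reaches the available bandwidth the dirty set never shrinks, which is why the sketch caps the number of rounds.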
Live migration Step 3: Suspend the VM and
copy the last portion of the data
Live migration Steps 4 and 5: Commit
and activate the new host.
1. the VM reloads its state and
resumes the execution of its programs,
and the service provided by this
VM continues.
2. the network connection is redirected
to the new VM and the dependency
on the source host is cleared
3. the whole migration process finishes
by removing the original VM from the
source host.
Live migration (performance impact)
Virtualization techniques
[Figure 3.3: A taxonomy of virtualization techniques.]
Operating system level virtualization:
Containerization
• Containerization: leverages multiprogramming techniques at
OS level
• No virtual machine manager or hypervisor
• No HW emulation, sharing of the same host OS
• Based on
• Namespace - what you can see and use
[Figure: Virtual machines (hardware-level virtualization) vs. containers (operating system-level virtualization). Virtual machine labels: App A, App B, Bins/Libs, Guest OS, VMs, Hypervisor (Type II), Server. Container labels: App A, App B, Bins/Libs, Container manager (e.g. Docker), Server.]
Application Containers vs System
Containers
• A container can be used to distribute and run a single application, a
microservice, or an entire operating system
[Figure: Application containers vs. system containers.]
Namespace
http://man7.org/linux/man-pages/man7/namespaces.7.html
• Wraps a global system resource in an abstraction
• Cgroup: cgroup root directory
• IPC: System V IPC, POSIX message queues
• Network: network devices, stacks, ports, etc.
• Mount: mount points
• PID: process IDs
• User: user and group IDs
• UTS: hostname and NIS domain name
• The processes within the namespace perceive they
have their own isolated instance of the global resource
• Changes to the global resource are
• visible to other processes that are members of the namespace
• invisible to other processes.
• One use of namespaces is to implement containers
• At operating system level, when a process is created
with the clone() system call, one or more namespaces can
be created.
• An argument of the clone() system call specifies
which namespaces to create
• There are other system calls that allow a process to join
another namespace (setns) and to move a process into a
new namespace (unshare)
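On a Linux host, the namespaces a process belongs to can be inspected without special privileges through the symbolic links in `/proc/<pid>/ns`, as described in the namespaces(7) manual page. The sketch below only reads these links and creates no namespace. The helper name `list_namespaces` is hypothetical, and on systems without this directory it simply returns an empty dictionary.

```python
import os
from pathlib import Path


def list_namespaces(pid="self"):
    """Map namespace type (e.g. 'pid', 'net', 'uts') to its identifier string."""
    ns_dir = Path("/proc") / str(pid) / "ns"
    if not ns_dir.is_dir():
        return {}  # not a Linux host, or /proc is not available
    result = {}
    for entry in sorted(ns_dir.iterdir()):
        try:
            # Each entry is a symbolic link whose target names the namespace.
            result[entry.name] = os.readlink(entry)
        except OSError:
            continue  # link not readable; skip it
    return result


for name, ident in list_namespaces().items():
    print(f"{name:20s} {ident}")
```

Two processes that show the same identifier for a given type share that namespace. A process inside a container typically shows different identifiers from a process on the host.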
Cgroups
http://man7.org/linux/man-pages/man7/cgroups.7.html
• Control groups, usually referred to as cgroups, are a Linux kernel feature
that allows processes to be organized into hierarchical groups whose usage
of various types of resources can then be limited and monitored.
• The kernel's cgroup interface is provided through a pseudo-filesystem called
cgroupfs.
• References
• https://facebookmicrosites.github.io/cgroup2/docs/overview
• https://facebookmicrosites.github.io/cgroup2/docs/create-cgroups.html
• (for pro) https://www.kernel.org/doc/Documentation/cgroup-v2.txt
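The cgroup membership of a process is exposed in `/proc/<pid>/cgroup`, where each line has the form `hierarchy-ID:controller-list:cgroup-path` (see cgroups(7)). The following sketch parses text in that format. The type name, the function name and the sample paths are made up for illustration, and the parser works on a string so that it can be tried on any platform.

```python
from typing import NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: tuple
    path: str


def parse_proc_cgroup(text):
    """Parse the content of /proc/<pid>/cgroup into a list of entries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two colons only: the path itself may contain ':'.
        hierarchy_id, controllers, path = line.split(":", 2)
        names = tuple(c for c in controllers.split(",") if c)
        entries.append(CgroupEntry(int(hierarchy_id), names, path))
    return entries


# Hypothetical sample: one cgroup v2 line (empty controller list) and one v1 line.
sample = "0::/user.slice/demo.scope\n5:cpu,cpuacct:/docker/abc\n"
for entry in parse_proc_cgroup(sample):
    print(entry)
```

On a Linux host the same function can be applied to the text read from `/proc/self/cgroup`.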
Cgroup
UnionFS
Charles P. Wright (2004), Unionfs: Bringing Filesystems Together, https://www.linuxjournal.com/article/7714
$ ls /Fruits
Apple Tomato
$ ls /Vegetables
Carrots Tomato
$ cat /Fruits/Tomato
I am botanically a fruit.
$ cat /Vegetables/Tomato
I am horticulturally a vegetable.
# mount -t unionfs -o dirs=/Fruits:/Vegetables none /mnt/healthy
$ ls /mnt/healthy
Apple Carrots Tomato
$ cat /mnt/healthy/Tomato
I am botanically a fruit.
(Here Fruits has higher priority than Vegetables.)
Union FS (cont’d)
• Each branch is assigned a precedence
• A branch with a higher precedence overrides a branch with a lower
precedence.
• Unionfs operates on directories
• If a directory exists in two underlying branches, the contents and attributes of the
Unionfs directory are the combination of the two lower directories
• Unionfs automatically removes any duplicate directory entries
• If a file exists in two branches, the contents and attributes of the Unionfs file are the
same as the file in the higher-priority branch, and the file in the lower-priority branch
is ignored
(Read the /Fruits and /Vegetables example in Charles P. Wright's article)
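The precedence rules above can be mimicked with plain dictionaries, using the same /Fruits and /Vegetables example. This is only a model of the lookup logic, not of a real file system: each branch is a hypothetical dictionary from file name to content, ordered from highest to lowest precedence, and the empty strings are placeholders because the example does not show those file contents.

```python
def union_listdir(branches):
    """Directory listing of the union: merged names, duplicates removed."""
    names = set()
    for branch in branches:
        names.update(branch)
    return sorted(names)


def union_read(branches, name):
    """Return the file from the highest-precedence branch that contains it."""
    for branch in branches:          # branches are ordered by precedence
        if name in branch:
            return branch[name]
    raise FileNotFoundError(name)


fruits = {"Apple": "", "Tomato": "I am botanically a fruit."}
vegetables = {"Carrots": "", "Tomato": "I am horticulturally a vegetable."}

healthy = [fruits, vegetables]       # Fruits has higher priority
print(union_listdir(healthy))
print(union_read(healthy, "Tomato"))
```

Reversing the order of the list gives Vegetables the higher precedence, and `Tomato` then resolves to the vegetable version.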
Union FS: Copy-on-Write Unions
• Unionfs can also mix read-only and read-write branches.
• In this case, the union as a whole is read-write
• Unionfs uses copy-on-write semantics to give the illusion that you can modify files and directories on
read-only branches
• Example
• /mnt/cdrom is read-only
• /tmp/cdpatch is read-write (created by the admin)
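Copy-on-write over a read-only branch can be sketched in the same dictionary model: the first modification of a file copies it up into the read-write branch, and all later reads prefer that copy. The class below is a hypothetical illustration of the idea, with one dictionary standing in for /mnt/cdrom and another for /tmp/cdpatch. It is not how Unionfs is implemented.

```python
class CopyOnWriteUnion:
    """Read-only lower branch plus a read-write upper branch."""

    def __init__(self, read_only_branch):
        self.read_only = read_only_branch   # stands in for /mnt/cdrom, never modified
        self.read_write = {}                # stands in for /tmp/cdpatch

    def read(self, name):
        # The read-write branch has higher precedence.
        if name in self.read_write:
            return self.read_write[name]
        return self.read_only[name]

    def append(self, name, data):
        if name not in self.read_write:
            # Copy-up: bring the original content into the writable branch first.
            self.read_write[name] = self.read_only.get(name, "")
        self.read_write[name] += data


cdrom = {"README": "v1\n"}
union = CopyOnWriteUnion(cdrom)
union.append("README", "patched\n")

print(repr(union.read("README")))  # the union shows the modified file
print(cdrom)                       # the read-only branch is untouched
```

The same mechanism is what lets a container write on top of a read-only image layer without changing the image.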
Containerization in practice: Docker
Containerization with Docker
• Containers run directly within the host
machine’s kernel
• Containers + bare metal
• Containers + virtual machine
• Docker engine is a client/server application
• CLI: command line interface to interact with
the server (Docker daemon)
• The CLI uses the Docker REST API
• The Docker REST API can also be used
separately (called from applications)
• The daemon creates and manages Docker
objects
Courtesy of docker.com
Docker architecture
Example of a docker CLI command
• docker run -i -t ubuntu /bin/bash
• The Docker daemon does the following
• Locates the ubuntu image and, if it is not stored locally, downloads it
• Creates a new container
• Allocates a R/W file system as the final layer of the image
• The file system is isolated from the host file system
• Creates a network interface connected to the default network
• Starts the container and runs the /bin/bash command
[Figure: the container R/W file system layered on top of the Ubuntu image (read only).]
Layers concept
Dockerfile
FROM ubuntu:15.04
COPY . /app
RUN make /app
CMD python /app/app.py
[Figure: image layers, bottom to top: UBUNTU, app, app build, command to run.]
Storage - Container writable layer
• Does not persist when the container is terminated
• Writable layer is tightly coupled with the host where the
container is running – reduced portability
• Reduced performance
Storage - bind mount
• Any area of the host file system
• A file or directory on the host machine is mounted into a
container
• referenced by its full path on the host machine
• if it does not exist, it is created on demand
• Shared with host processes
• Not portable - depends on the filesystem
• Performance usually higher than Volumes
Storage - Volume
• Created and managed by Docker (isolated from the host)
• Stored within a directory on the Docker host
• Can be mounted into multiple containers simultaneously, read-write or read-only
• Advantages
• Easier to back up or migrate than bind mounts.
• Volumes can be more safely shared among multiple containers.
• Volume drivers let you store volumes on remote hosts or cloud providers, to encrypt
the contents of volumes, or to add other functionality.
• New volumes can have their content pre-populated by a container.
• Volumes work on both Linux and Windows containers.
• Volumes can be managed using Docker CLI commands or the Docker API.
Storage - tmpfs
• An area of memory outside the container writable layer
• A memory area in the container namespace or shared with the
host
• Is temporary, and only persisted in the host memory
• Removed when the container stops
• Limitations
• Available only on Linux hosts
• Not sharable among containers
Question - In which use cases (application
scenarios) would you use each of Docker's
storage options?
Volumes (Docker managed) / Bind mount (host FS) / tmpfs (in memory)
Best cases for using storage options
• Volumes
• Sharing data among multiple running containers
• When the Docker host is not guaranteed to have a given directory or file structure
• When you want to store your container’s data on a remote host or a cloud provider, rather
than locally
• When you need to back up, restore, or migrate data from one Docker host to another
• Bind mount
• Sharing configuration files from the host machine to containers
• Sharing source code or build artifacts between a development environment on the
Docker host and a container
• When the file or directory structure of the Docker host is guaranteed to be consistent with
the bind mounts the containers require
• tmpfs mounts
• when you do not want the data to persist either on the host machine or within the
container.
• for security reasons
• to guarantee performance of the container when your application needs to write a large volume of
non-persistent state data
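The three options map onto the `type=` key of the `docker run --mount` flag (`volume`, `bind`, `tmpfs`). As a rough illustration, the helper below only builds the argument strings and runs nothing. The function name, the volume name `appdata` and all paths are hypothetical, and rejecting `read_only` for tmpfs is a conservative choice made for this sketch.

```python
def mount_flag(kind, target, source=None, read_only=False):
    """Build the ['--mount', '...'] arguments for one storage option."""
    if kind not in ("volume", "bind", "tmpfs"):
        raise ValueError(f"unknown mount type: {kind!r}")
    if kind == "bind" and source is None:
        raise ValueError("a bind mount needs the full host path as source")
    if kind == "tmpfs" and (source is not None or read_only):
        raise ValueError("this sketch builds tmpfs mounts with a target only")
    parts = [f"type={kind}"]
    if source is not None:
        parts.append(f"source={source}")
    parts.append(f"target={target}")
    if read_only:
        parts.append("readonly")
    return ["--mount", ",".join(parts)]


# Hypothetical names and paths, one per use case listed above.
shared_data = mount_flag("volume", "/data", source="appdata")
host_config = mount_flag("bind", "/etc/app", source="/home/user/config", read_only=True)
scratch = mount_flag("tmpfs", "/scratch")

command = ["docker", "run", "-d", *shared_data, *host_config, *scratch, "ubuntu", "sleep", "60"]
print(" ".join(command))
```

The printed command line is what one would type in a shell; the helper just makes the choice of storage option explicit in code.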
Docker services
REFERENCE: https://docs.docker.com/engine/swarm/how-swarm-mode-works/services/
Description of a service:
the docker-compose.yml file
Networking
• You can connect containers together, or connect them to non-
Docker workloads
• Docker containers do not even need to be aware that they are
deployed on Docker, or whether their peers are also Docker workloads
or not
• Types of network drivers
• Bridge
• Host
• Overlay
• Macvlan
• none
Network drivers
• bridge:
• The default network driver
• If you don’t specify a driver, this is the type of network you are creating
• Bridge networks are usually used when your applications run in
standalone containers that need to communicate
• host:
• For standalone containers
• Remove network isolation between the container and the Docker host,
and use the host’s networking directly
• For swarm services, host is only available on Docker 17.06 and higher.
Network drivers (cont’d)
• overlay:
• Overlay networks connect multiple Docker daemons together and
enable swarm services to communicate with each other
• You can also use overlay networks to facilitate communication
between a swarm service and a standalone container, or between two
standalone containers on different Docker daemons
• This strategy removes the need to do OS-level routing between these
containers.
Network drivers (cont’d)
• macvlan:
• Macvlan networks allow you to assign a MAC address to a container,
making it appear as a physical device on your network
• The Docker daemon routes traffic to containers by their MAC
addresses
• Using the macvlan driver is sometimes the best choice when dealing
with legacy applications that expect to be directly connected to the
physical network, rather than routed through the Docker host’s
network stack.
• none:
• For this container, disable all networking
Summary of Networking Drivers
• User-defined bridge networks are best when you need multiple
containers to communicate on the same Docker host.
• Host networks are best when the network stack should not be
isolated from the Docker host, but you want other aspects of
the container to be isolated.
• Overlay networks are best when you need containers running
on different Docker hosts to communicate, or when multiple
applications work together using swarm services.
• Macvlan networks are best when you are migrating from a VM
setup or need your containers to look like physical hosts on
your network, each with a unique MAC address.
Outcome
• Docker and its underlying technologies
• Namespace and cgroup
• Union FS
• Roles of Docker daemon, CLI and API
• Layered images
• Storage: volume, bind, tmpfs
• Networking
Prepare yourself for the practical
experience
• Docker documentation is the main source
• Get Docker (here you learn about the many distributions and can
download your preferred version)
• https://docs.docker.com/install/
• Docker overview (an overview, covered in the lecture)
• https://docs.docker.com/engine/docker-overview/
• Docker Storage (volumes, bind mounts, …, covered in the lecture)
• https://docs.docker.com/storage/
• Docker Networking (networking, covered in the lecture)
• https://docs.docker.com/network/
Prepare yourself for the practical
experience
• Work in groups: there is a lot of knowledge to put together
• You should read the documentation listed in the previous slide
• Prepare for the practical experience
• Get Started with Docker
• Part 1 - setup of the Docker environment: https://docs.docker.com/get-started/
• Do Part 1 before Friday (if possible)
• That is, install Docker on your laptop
• Parts 2 - 6 are part of Friday’s (2) practical experience
Other sources
• A set of examples and tutorials
• https://docs.docker.com/samples/
• Suggested
• Docker for beginners
• Docker Swarm mode
Outcome
• Concept of managed execution
• Sharing, aggregation, emulation, isolation
• Taxonomy of Virtualization techniques
• System; HW assisted, full, para
• Process; multiprogramming
• How virtualization works
• Relation among the machine reference model, privileged and
nonprivileged instructions and VMM architecture
• VM migration
• How virtualization is related to cloud computing and what are
the pros and cons of virtualization
Exam questions
• What is the meaning of “managed execution” in virtualization? Describe this concept
and provide some examples.
• Explain how and why system level virtualization (a.k.a. Hypervisor) could increase the
security of a system.
• Describe and discuss the system level virtualization techniques.
• What are the main differences between system level virtualization (a.k.a. Hypervisor) and
operating system level virtualization (a.k.a. Container)? (suggestion: you can explain the
differences referring to technologies you know)
• How does virtual machine live migration work? Describe and comment on the main steps
of the virtual machine live migration process.
• Explain and compare the different techniques to store data in a container
• Describe the Docker architecture.
• Explain why and how containers benefit from the Union File System technology.
• Where are Docker images stored on a Docker host? (do a search by yourself)
• More questions on Docker … will be updated on classroom
Questions?