Xen and the Art of Virtualization
Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris,
Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield
University of Cambridge Computer Laboratory
15 JJ Thomson Avenue, Cambridge, UK, CB3 0FD
{firstname.lastname}@cl.cam.ac.uk
ABSTRACT
General Terms
Design, Measurement, Performance
Keywords
Virtual Machine Monitors, Hypervisors, Paravirtualization
1. INTRODUCTION
Notwithstanding the intricacies of the x86, there are other arguments against full virtualization. In particular, there are situations
in which it is desirable for the hosted operating systems to see real
as well as virtual resources: providing both real and virtual time
allows a guest OS to better support time-sensitive tasks, and to correctly handle TCP timeouts and RTT estimates, while exposing real
machine addresses allows a guest OS to improve performance by
using superpages [30] or page coloring [24].
We avoid the drawbacks of full virtualization by presenting a virtual machine abstraction that is similar but not identical to the underlying hardware, an approach which has been dubbed paravirtualization [43]. This promises improved performance, although
it does require modifications to the guest operating system. It is
important to note, however, that we do not require changes to the
application binary interface (ABI), and hence no modifications are
required to guest applications.
We distill the discussion so far into a set of design principles:
1. Support for unmodified application binaries is essential, or
users will not transition to Xen. Hence we must virtualize all
architectural features required by existing standard ABIs.
2. Supporting full multi-application operating systems is important, as this allows complex server configurations to be
virtualized within a single guest OS instance.
3. Paravirtualization is necessary to obtain high performance
and strong resource isolation on uncooperative machine architectures such as x86.
4. Even on cooperative machine architectures, completely hiding the effects of resource virtualization from guest OSes
risks both correctness and performance.
Note that our paravirtualized x86 abstraction is quite different
from that proposed by the recent Denali project [44]. Denali is designed to support thousands of virtual machines running network
services, the vast majority of which are small-scale and unpopular. In contrast, Xen is intended to scale to approximately 100 virtual machines running industry standard applications and services.
Given these very different goals, it is instructive to contrast Denali's
design choices with our own principles.
Firstly, Denali does not target existing ABIs, and so can elide
certain architectural features from their VM interface. For example, Denali does not fully support x86 segmentation although it is
exported (and widely used¹) in the ABIs of NetBSD, Linux, and
Windows XP.
Secondly, the Denali implementation does not address the problem of supporting application multiplexing, nor multiple address
spaces, within a single guest OS. Rather, applications are linked
explicitly against an instance of the Ilwaco guest OS in a manner
rather reminiscent of a libOS in the Exokernel [23]. Hence each virtual machine essentially hosts a single-user single-application unprotected operating system. In Xen, by contrast, a single virtual
machine hosts a real operating system which may itself securely
multiplex thousands of unmodified user-level processes. Although
a prototype virtual MMU has been developed which may help Denali in this area [44], we are unaware of any published technical
details or evaluation.
Thirdly, in the Denali architecture the VMM performs all paging to and from disk. This is perhaps related to the lack of memory-management support at the virtualization layer. Paging within the VMM is contrary to our goal of performance isolation: malicious virtual machines can encourage thrashing behaviour, unfairly depriving others of CPU time and disk bandwidth. In Xen we expect each guest OS to perform its own paging using its own guaranteed memory reservation and disk allocation.
¹ For example, segments are frequently used by thread libraries to address thread-local data.
Memory Management
  Segmentation          Cannot install fully-privileged segment descriptors and cannot overlap with the top end of the linear address space.
  Paging                Guest OS has direct read access to hardware page tables, but updates are batched and validated by the hypervisor. A domain may be allocated discontiguous machine pages.
CPU
  Protection            Guest OS must run at a lower privilege level than Xen.
  Exceptions            Guest OS must register a descriptor table for exception handlers with Xen. Aside from page faults, the handlers remain the same.
  System Calls          Guest OS may install a fast handler for system calls, allowing direct calls from an application into its guest OS and avoiding indirecting through Xen on every call.
  Interrupts            Hardware interrupts are replaced with a lightweight event system.
  Time                  Each guest OS has a timer interface and is aware of both real and virtual time.
Device I/O
  Network, Disk, etc.   Virtual devices are elegant and simple to access. Data is transferred using asynchronous I/O rings. An event mechanism replaces hardware interrupts for notifications.

Table 1: The paravirtualized x86 interface.
2.1.1 Memory management
Virtualizing memory is undoubtedly the most difficult part of paravirtualizing an architecture, both in terms of the mechanisms required in the hypervisor and the modifications required to port each guest OS. The task is easier if the architecture provides a software-managed TLB, as these can be efficiently virtualized in a simple
manner [13]. A tagged TLB is another useful feature supported
by most server-class RISC architectures, including Alpha, MIPS
and SPARC. Associating an address-space identifier tag with each
TLB entry allows the hypervisor and each guest OS to efficiently
coexist in separate address spaces because there is no need to flush
the entire TLB when transferring execution.
Unfortunately, x86 does not have a software-managed TLB; instead TLB misses are serviced automatically by the processor by
walking the page table structure in hardware. Thus to achieve the
best possible performance, all valid page translations for the current
address space should be present in the hardware-accessible page
table. Moreover, because the TLB is not tagged, address space
switches typically require a complete TLB flush. Given these limitations, we made two decisions: (i) guest OSes are responsible for
allocating and managing the hardware page tables, with minimal
involvement from Xen to ensure safety and isolation; and (ii) Xen
exists in a 64MB section at the top of every address space, thus
avoiding a TLB flush when entering and leaving the hypervisor.
Each time a guest OS requires a new page table, perhaps because a new process is being created, it allocates and initializes a
page from its own memory reservation and registers it with Xen.
At this point the OS must relinquish direct write privileges to the
page-table memory: all subsequent updates must be validated by
Xen. This restricts updates in a number of ways, including only
allowing an OS to map pages that it owns, and disallowing writable
mappings of page tables. Guest OSes may batch update requests to
amortize the overhead of entering the hypervisor. The top 64MB
region of each address space, which is reserved for Xen, is not accessible or remappable by guest OSes. This address region is not
used by any of the common x86 ABIs however, so this restriction
does not break application compatibility.
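As an illustration of how a guest might use this interface, the following C sketch queues page-table updates and flushes them to the hypervisor in batches. The hypercall name, structure layout and batch size are assumptions made for this sketch; they are not taken from the Xen interface headers.

/* A minimal sketch of the batched update scheme described above.  The
 * hypercall name, structure layout and batch size are illustrative. */

#include <stdint.h>

#define MMU_BATCH_SIZE 64            /* flush after this many queued updates */

typedef struct {
    uint64_t ptr;                    /* machine address of the PTE to write  */
    uint64_t val;                    /* new PTE value, validated by Xen      */
} mmu_update_t;

/* Hypothetical hypercall: validate and apply a batch of page-table
 * updates; returns 0 if every update was accepted. */
extern int hypervisor_mmu_update(mmu_update_t *req, unsigned int count);
extern void handle_rejected_update(void);

static mmu_update_t update_queue[MMU_BATCH_SIZE];
static unsigned int queued;

void flush_pte_updates(void)
{
    if (queued == 0)
        return;
    if (hypervisor_mmu_update(update_queue, queued) != 0)
        handle_rejected_update();    /* e.g. a mapping of a page the
                                        domain does not own */
    queued = 0;
}

/* Queue one PTE update; the hypervisor is entered only when the batch
 * fills or the guest explicitly flushes (e.g. before a context switch). */
void queue_pte_update(uint64_t pte_machine_addr, uint64_t new_val)
{
    update_queue[queued].ptr = pte_machine_addr;
    update_queue[queued].val = new_val;
    if (++queued == MMU_BATCH_SIZE)
        flush_pte_updates();
}

Batching in this style is what allows the cost of entering the hypervisor to be amortized over many updates, as discussed in the evaluation of fork and exec performance below.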
Segmentation is virtualized in a similar way, by validating updates to hardware segment descriptor tables. The only restrictions on x86 segment descriptors are: (i) they must have lower privilege than Xen, and (ii) they may not allow any access to the Xen-reserved portion of the address space.
2.1.2 CPU
OS subsection                          # lines
                                       Linux      XP
Architecture-independent                  78    1299
Virtual network driver                   484      --
Virtual block-device driver             1070      --
Xen-specific (non-driver)               1363    3321
Total                                   2995    4620
(Portion of total x86 code base)       1.36%   0.04%

Table 2: The cost, in lines of code, of porting commodity operating systems to Xen (XP virtual device drivers are not yet implemented; see below).
If the handler's code segment is not present or if the handler is not paged into memory, then an appropriate fault will be taken when Xen executes the
iret instruction which returns to the handler. Xen detects these
double faults by checking the faulting program counter value: if
the address resides within the exception-virtualizing code then the
offending guest OS is terminated.
Note that this lazy checking is safe even for the direct system-call handler: access faults will occur when the CPU attempts to
directly jump to the guest OS handler. In this case the faulting
address will be outside Xen (since Xen will never execute a guest
OS system call) and so the fault is virtualized in the normal way.
If propagation of the fault causes a further double fault then the
guest OS is terminated as described above.
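A sketch of the lazy double-fault check described above; the symbol and function names here are illustrative rather than taken from the Xen source.

/* Sketch of the lazy double-fault check: if a fault is raised while Xen
 * is itself delivering an earlier exception, the offending guest is
 * terminated.  All names are illustrative. */

extern char exception_virtualizing_start[];  /* bounds of Xen's code that   */
extern char exception_virtualizing_end[];    /* bounces exceptions to guests */

struct fault_frame { unsigned long eip; /* ... other saved registers ... */ };
struct domain;

extern void terminate_domain(struct domain *d);
extern void deliver_to_guest_handler(struct domain *d, struct fault_frame *f);

void handle_fault(struct domain *faulting_domain, struct fault_frame *frame)
{
    unsigned long pc = frame->eip;

    if (pc >= (unsigned long)exception_virtualizing_start &&
        pc <  (unsigned long)exception_virtualizing_end) {
        /* The fault occurred inside Xen's exception-virtualizing code,
         * e.g. because the guest's handler segment is not present:
         * treat it as a double fault and kill the offending guest. */
        terminate_domain(faulting_domain);
        return;
    }

    /* Otherwise virtualize the fault in the normal way: copy the frame
     * onto the guest's kernel stack and enter its registered handler. */
    deliver_to_guest_handler(faulting_domain, frame);
}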
2.1.3 Device I/O
2.2 The Cost of Porting an OS to Xen
Table 2 demonstrates the cost, in lines of code, of porting commodity operating systems to Xen's paravirtualized x86 environment. Note that our NetBSD port is at a very early stage, and hence
we report no figures here. The XP port is more advanced, but still in
progress; it can execute a number of user-space applications from
a RAM disk, but it currently lacks any virtual I/O drivers. For this
reason, figures for XP's virtual device drivers are not presented.
However, as with Linux, we expect these drivers to be small and
simple due to the idealized hardware abstraction presented by Xen.
Windows XP required a surprising number of modifications to
its architecture-independent OS code because it uses a variety of
structures and unions for accessing page-table entries (PTEs). Each
page-table access had to be separately modified, although some of
[Figure 1 diagram: Xen sits at the lowest layer, exporting a virtual x86 CPU, virtual physical memory, a virtual network and virtual block devices. Above it run Domain0 (XenoLinux, hosting the control interface and control plane software) and further guest domains (XenoLinux, XenoBSD, XenoXP), each with Xeno-aware device drivers and user software.]

Figure 1: The structure of a machine running the Xen hypervisor, hosting a number of different guest operating systems, including Domain0 running control software in a XenoLinux environment.
this process was automated with scripts. In contrast, Linux needed
far fewer modifications to its generic memory system as it uses preprocessor macros to access PTEs; the macro definitions provide
a convenient place to add the translation and hypervisor calls required by paravirtualization.
In both OSes, the architecture-specific sections are effectively
a port of the x86 code to our paravirtualized architecture. This
involved rewriting routines which used privileged instructions, and
removing a large amount of low-level system initialization code.
Again, more changes were required in Windows XP, mainly due
to the presence of legacy 16-bit emulation code and the need for
a somewhat different boot-loading mechanism. Note that the x86-specific code base in XP is substantially larger than in Linux and
hence a larger porting effort should be expected.
3. DETAILED DESIGN
3.1 Control Transfer: Hypercalls and Events
3.2 Data Transfer: I/O Rings
The presence of a hypervisor means there is an additional protection domain between guest OSes and I/O devices, so it is crucial
that a data transfer mechanism be provided that allows data to move
vertically through the system with as little overhead as possible.
Two main factors have shaped the design of our I/O-transfer
mechanism: resource management and event notification. For resource accountability, we attempt to minimize the work required to
demultiplex data to a specific domain when an interrupt is received
from a device; the overhead of managing buffers is carried out later, where computation may be accounted to the appropriate domain. Similarly, memory committed to device I/O is provided by
the relevant domains wherever possible to prevent the crosstalk inherent in shared buffer pools; I/O buffers are protected during data
transfer by pinning the underlying page frames within Xen.
[Figure 2 diagram: an I/O descriptor ring with four pointers: a request producer (shared pointer, updated by the guest OS), a request consumer (private pointer in Xen), a response producer (shared pointer, updated by Xen) and a response consumer (private pointer in the guest OS). Slots cycle through four states: unused descriptors; the request queue (descriptors queued by the VM but not yet accepted by Xen); outstanding descriptors (slots awaiting a response from Xen); and the response queue (descriptors returned by Xen in response to serviced requests).]

Figure 2: The structure of asynchronous I/O rings, which are used for data transfer between Xen and guest OSes.
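To make the ring structure concrete, the following C sketch shows one way a guest front-end could share such a descriptor ring with Xen, using the four pointers shown in Figure 2. The structure layout, field names and the complete_io() callback are assumptions made for this sketch, not the actual Xen ring definitions.

/* Sketch of a shared descriptor ring with producer indices in shared
 * memory and consumer indices kept private to each side. */

#include <stdint.h>

#define RING_SIZE 64                    /* descriptor slots (power of two) */
#define RING_MASK (RING_SIZE - 1)

struct io_request  { uint32_t id; uint32_t op; uint64_t buffer; };
struct io_response { uint32_t id; int32_t status; };

union io_desc {                         /* each slot holds a request, then a response */
    struct io_request  req;
    struct io_response resp;
};

struct io_ring {                        /* lives in memory shared with Xen */
    volatile uint32_t req_prod;         /* shared: advanced by the guest OS */
    volatile uint32_t resp_prod;        /* shared: advanced by Xen          */
    union io_desc ring[RING_SIZE];
};

static uint32_t resp_cons;              /* private to the guest: next response to reap */

extern void complete_io(uint32_t id, int32_t status);

/* Guest side: publish a request if the ring is not full.  The private
 * response consumer bounds how far ahead of Xen the guest may run. */
int guest_post_request(struct io_ring *r, const struct io_request *rq)
{
    if (r->req_prod - resp_cons >= RING_SIZE)
        return -1;                      /* ring full */
    r->ring[r->req_prod & RING_MASK].req = *rq;
    __sync_synchronize();               /* make the descriptor visible before... */
    r->req_prod++;                      /* ...the producer index is advanced */
    return 0;
}

/* Guest side: reap any responses Xen has produced since the last call.
 * The id field lets a response be matched to its original request even
 * if requests are serviced out of order. */
void guest_reap_responses(struct io_ring *r)
{
    while (resp_cons != r->resp_prod) {
        struct io_response *rsp = &r->ring[resp_cons & RING_MASK].resp;
        complete_io(rsp->id, rsp->status);
        resp_cons++;
    }
}

Keeping the consumer indices private to each side means neither party has to trust the other's bookkeeping, and each side can batch its work between updates to the shared producer indices.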
3.3.1 CPU scheduling
Xen currently schedules domains according to the Borrowed Virtual Time (BVT) scheduling algorithm [11]. We chose this particular algorithm since it is both work-conserving and has a special mechanism for low-latency wake-up (or dispatch) of a domain
when it receives an event. Fast dispatch is particularly important
to minimize the effect of virtualization on OS subsystems that are
designed to run in a timely fashion; for example, TCP relies on
the timely delivery of acknowledgments to correctly estimate network round-trip times. BVT provides low-latency dispatch by using virtual-time warping, a mechanism which temporarily violates
ideal fair sharing to favor recently-woken domains. However,
other scheduling algorithms could be trivially implemented over
our generic scheduler abstraction. Per-domain scheduling parameters can be adjusted by management software running in Domain0.
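As a concrete summary of the warping mechanism (our notation, paraphrasing the algorithm described in [11] rather than Xen's implementation): each runnable domain i has an actual virtual time A_i, which advances in proportion to the CPU it consumes scaled by its share, and an effective virtual time

\[
E_i = A_i - \begin{cases} W_i & \text{if domain } i \text{ is currently warped (e.g. recently woken)} \\ 0 & \text{otherwise.} \end{cases}
\]

The scheduler always dispatches the runnable domain with the smallest E_i, so a freshly woken domain with a non-zero warp parameter W_i is run ahead of domains that have been executing, subject to limits on how long and how often it may remain warped.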
3.3.2 Time and timers
Xen provides guest OSes with notions of real time, virtual time
and wall-clock time. Real time is expressed in nanoseconds passed
since machine boot, is maintained to the accuracy of the processor's cycle counter, and can be frequency-locked to an external time source (for example, via NTP). A domain's virtual time only advances while it is executing: this is typically used by the guest OS
scheduler to ensure correct sharing of its timeslice between application processes. Finally, wall-clock time is specified as an offset
to be added to the current real time. This allows the wall-clock time
to be adjusted without affecting the forward progress of real time.
Each guest OS can program a pair of alarm timers, one for real
time and the other for virtual time. Guest OSes are expected to
maintain internal timer queues and use the Xen-provided alarm
timers to trigger the earliest timeout. Timeouts are delivered using Xen's event mechanism.
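A minimal sketch of this arrangement, assuming hypothetical hypervisor_set_real_timer() and hypervisor_real_time() calls in place of Xen's actual timer hypercall:

/* Sketch of a guest OS driving its internal timer queue from the single
 * Xen-provided real-time alarm.  Hypercall names are illustrative. */

#include <stdint.h>
#include <stddef.h>

struct timer {
    uint64_t expires_ns;            /* absolute real time, ns since boot */
    void (*fn)(void *);
    void *arg;
    struct timer *next;             /* queue is sorted, earliest first */
};

static struct timer *timer_queue;

extern void hypervisor_set_real_timer(uint64_t t_ns);  /* request an event at time t */
extern uint64_t hypervisor_real_time(void);            /* ns since machine boot      */

/* Re-arm the single Xen alarm to the earliest pending timeout. */
static void rearm(void)
{
    if (timer_queue != NULL)
        hypervisor_set_real_timer(timer_queue->expires_ns);
}

/* Called from the guest's Xen event handler when the alarm fires. */
void timer_event(void)
{
    uint64_t now = hypervisor_real_time();

    while (timer_queue != NULL && timer_queue->expires_ns <= now) {
        struct timer *t = timer_queue;
        timer_queue = t->next;
        t->fn(t->arg);              /* run the expired callback */
    }
    rearm();                        /* schedule the next earliest timeout */
}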
3.3.3 Virtual address translation
3.3.4 Physical memory
3.3.5 Network
3.3.6 Disk
4. EVALUATION
In this section we present a thorough performance evaluation
of Xen. We begin by benchmarking Xen against a number of alternative virtualization techniques, then compare the total system
throughput executing multiple applications concurrently on a single native operating system against running each application in its
own virtual machine. We then evaluate the performance isolation
Xen provides between guest OSes, and assess the total overhead of
running large numbers of operating systems on the same hardware.
For these measurements, we have used our XenoLinux port (based
on Linux 2.4.21) as this is our most mature guest OS. We expect
the relative overheads for our Windows XP and NetBSD ports to
be similar but have yet to conduct a full evaluation.
There are a number of preexisting solutions for running multiple copies of Linux on the same machine. VMware offers several
commercial products that provide virtual x86 machines on which
unmodified copies of Linux may be booted. The most commonly
used version is VMware Workstation, which consists of a set of
privileged kernel extensions to a host operating system. Both
Windows and Linux hosts are supported. VMware also offer an
enhanced product called ESX Server which replaces the host OS
with a dedicated kernel. By doing so, it gains some performance
benefit over the workstation product. ESX Server also supports a
paravirtualized interface to the network that can be accessed by installing a special device driver (vmxnet) into the guest OS, where
deployment circumstances permit.
We have subjected ESX Server to the benchmark suites described
below, but sadly are prevented from reporting quantitative results
due to the terms of the product's End User License Agreement. Instead we present results from VMware Workstation 3.2, running
on top of a Linux host OS, as it is the most recent VMware product
without that benchmark publication restriction. ESX Server takes
advantage of its native architecture to equal or outperform VMware
Workstation and its hosted architecture. While Xen of course requires guest OSes to be ported, it takes advantage of paravirtualization to noticeably outperform ESX Server.
We also present results for User-mode Linux (UML), an increasingly popular platform for virtual hosting. UML is a port of Linux
to run as a user-space process on a Linux host. Like XenoLinux, the
changes required are restricted to the architecture-dependent code
base. However, the UML code bears little similarity to the native
x86 port due to the very different nature of the execution environments. Although UML can run on an unmodified Linux host, we
present results for the Single Kernel Address Space (skas3) variant that exploits patches to the host OS to improve performance.
We also investigated three other virtualization techniques for running ported versions of Linux on the same x86 machine. Connectix's Virtual PC and forthcoming Virtual Server products (now acquired by Microsoft) are similar in design to VMware's, providing
full x86 virtualization. Since all versions of Virtual PC have benchmarking restrictions in their license agreements we did not subject
them to closer analysis. UMLinux is similar in concept to UML
but is a different code base and has yet to achieve the same level of
performance, so we omit the results. Work to improve the performance of UMLinux through host OS modifications is ongoing [25].
Although Plex86 was originally a general purpose x86 VMM, it has
now been retargeted to support just Linux guest OSes. The guest
OS must be specially compiled to run on Plex86, but the source
changes from native x86 are trivial. The performance of Plex86 is
currently well below the other techniques.
All the experiments were performed on a Dell 2650 dual processor 2.4GHz Xeon server with 2GB RAM, a Broadcom Tigon 3 Gigabit Ethernet NIC, and a single Hitachi DK32EJ 146GB 10k RPM
SCSI disk. Linux version 2.4.21 was used throughout, compiled
for architecture i686 for the native and VMware guest OS experiments, for xeno-i686 when running on Xen, and architecture um
when running on UML. The Xeon processors in the machine support SMT (hyperthreading), but this was disabled because none
of the kernels currently have SMT-aware schedulers. We ensured
that the total amount of memory available to all guest OSes plus
their VMM was equal to the total amount available to native Linux.
The RedHat 7.2 distribution was used throughout, installed on
ext3 file systems. The VMs were configured to use the same disk
partitions in persistent raw mode, which yielded the best performance. Using the same file system image also eliminated potential
differences in disk seek times and transfer rates.
4.1 Relative Performance
[Figure 3 bar chart residue: per-benchmark scores for the four systems, including OSDB-IR (tup/s), OSDB-OLTP (tup/s) and dbench (score), plotted relative to native Linux.]
Figure 3: Relative performance of native Linux (L), XenoLinux (X), VMware workstation 3.2 (V) and User-Mode Linux (U).
memory management. In the case of the VMMs, this system time
is expanded to a greater or lesser degree: whereas Xen incurs a
mere 3% overhead, the other VMMs experience a more significant
slowdown.
Two experiments were performed using the PostgreSQL 7.1.3
database, exercised by the Open Source Database Benchmark suite
(OSDB) in its default configuration. We present results for the
multi-user Information Retrieval (IR) and On-Line Transaction Processing (OLTP) workloads, both measured in tuples per second. A
small modification to the suite's test harness was required to produce correct results, due to a UML bug which loses virtual-timer
interrupts under high load. The benchmark drives the database
via PostgreSQL's native API (callable SQL) over a Unix domain
socket. PostgreSQL places considerable load on the operating system, and this is reflected in the substantial virtualization overheads
experienced by VMware and UML. In particular, the OLTP benchmark requires many synchronous disk operations, resulting in many
protection domain transitions.
The dbench program is a file system benchmark derived from
the industry-standard NetBench. It emulates the load placed on a
file server by Windows 95 clients. Here, we examine the throughput experienced by a single client performing around 90,000 file
system operations.
SPEC WEB99 is a complex application-level benchmark for evaluating web servers and the systems that host them. The workload is
a complex mix of page requests: 30% require dynamic content generation, 16% are HTTP POST operations and 0.5% execute a CGI
script. As the server runs it generates access and POST logs, so
the disk workload is not solely read-only. Measurements therefore
reflect general OS performance, including file system and network,
in addition to the web server itself.
A number of client machines are used to generate load for the
server under test, with each machine simulating a collection of
users concurrently accessing the web site. The benchmark is run
repeatedly with different numbers of simulated users to determine
the maximum number that can be supported. SPEC WEB99 defines
a minimum Quality of Service that simulated users must receive in
order to be conformant and hence count toward the score: users
4.2 Operating System Benchmarks
Config   null call   null I/O   stat   open close   slct TCP   sig inst   sig hndl   fork proc   exec proc   sh proc
L-SMP         0.53       0.81   2.10         3.51       23.2       0.83       2.94         143         601       4k2
L-UP          0.45       0.50   1.28         1.92       5.70       0.68       2.49         110         530       4k0
Xen           0.46       0.50   1.22         1.88       5.69       0.69       1.75         198         768       4k8
VMW           0.73       0.83   1.88         2.99       11.1       1.02       4.63         874         2k3       10k
UML           24.7       25.1   36.1         62.8       39.9       26.0       46.0         21k         33k       58k

Table 3: lmbench: Processes - times in µs.
Config   2p/0K   2p/16K   2p/64K   8p/16K   8p/64K   16p/16K   16p/64K
L-SMP     1.69     1.88     2.03     2.36     26.8      4.79      38.4
L-UP      0.77     0.91     1.06     1.03     24.3      3.61      37.6
Xen       1.97     2.22     2.67     3.07     28.7      7.08      39.4
VMW       18.1     17.6     21.3     22.4     51.6      41.7      72.2
UML       15.5     14.6     14.4     16.3     36.8      23.6      52.0

Table 4: lmbench: Context switching times in µs.
Config   0K File create   0K File delete   10K File create   10K File delete   Mmap lat   Prot fault   Page fault
L-SMP              44.9             24.2               123              45.2       99.0         1.33         1.88
L-UP               32.1             6.08              66.0              12.5       68.0         1.06         1.42
Xen                32.5             5.86              68.2              13.6        139         1.40         2.73
VMW                35.3              9.3              85.6              21.4        620         7.53         12.4
UML                 130             65.7               250               113        1k4         21.8         26.3

Table 5: lmbench: File & VM system latencies in µs.
In 24 of the 37 microbenchmarks, XenoLinux performs similarly to native Linux, tracking the uniprocessor Linux kernel performance closely and outperforming the SMP kernel. In Tables 3
to 5 we show results which exhibit interesting performance variations among the test systems; particularly large penalties for Xen
are shown in bold face.
In the process microbenchmarks (Table 3), Xen exhibits slower
fork, exec and sh performance than native Linux. This is expected,
since these operations require large numbers of page table updates
which must all be verified by Xen. However, the paravirtualization
approach allows XenoLinux to batch update requests. Creating new
page tables presents an ideal case: because there is no reason to
commit pending updates sooner, XenoLinux can amortize each hypercall across 2048 updates (the maximum size of its batch buffer).
Hence each update hypercall constructs 8MB of address space.
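For the standard 4KB x86 page size, the arithmetic (ours, not stated explicitly above) is:

\[ 2048 \ \text{updates} \times 4\,\text{KB per mapped page} = 8\,\text{MB of address space per hypercall}. \]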
Table 4 shows context switch times between different numbers
of processes with different working set sizes. Xen incurs an extra overhead of between 1µs and 3µs, as it executes a hypercall to
change the page table base. However, context switch results for
larger working set sizes (perhaps more representative of real applications) show that the overhead is small compared with cache effects. Unusually, VMware Workstation is inferior to UML on these
microbenchmarks; however, this is one area where enhancements
in ESX Server are able to reduce the overhead.
The mmap latency and page fault latency results shown in Table 5 are interesting since they require two transitions into Xen per
page: one to take the hardware fault and pass the details to the guest
OS, and a second to install the updated page table entry on the guest
OS's behalf. Despite this, the overhead is relatively modest.
One small anomaly in Table 3 is that XenoLinux has lower signal-handling latency than native Linux. This benchmark does not require any calls into Xen at all, and the 0.75µs (30%) speedup is presumably due to a fortuitous cache alignment in XenoLinux, underlining the dangers of taking microbenchmarks too seriously.
4.2.1 Network performance
4.3 Concurrent Virtual Machines
In this section, we compare the performance of running multiple applications in their own guest OS against running them on
the same native operating system. Our focus is on the results using Xen, but we comment on the performance of the other VMMs
where applicable.
Figure 4 shows the results of running 1, 2, 4, 8 and 16 copies
of the SPEC WEB99 benchmark in parallel on a two CPU machine. The native Linux was configured for SMP; on it we ran
multiple copies of Apache as concurrent processes. In Xen's case,
each instance of SPEC WEB99 was run in its own uniprocessor
Linux guest OS (along with an sshd and other management processes). Different TCP port numbers were used for each web server
to enable the copies to be run in parallel. Note that the size of the
SPEC data set required for c simultaneous connections is (25 + (c × 0.66)) × 4.88 MBytes, or approximately 3.3GB for 1000 connections. This is sufficiently large to thoroughly exercise the disk
and buffer cache subsystems.
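As a check on the 3.3GB figure, substituting c = 1000 into the formula gives (our arithmetic):

\[ (25 + 0.66 \times 1000) \times 4.88\,\text{MB} = 685 \times 4.88\,\text{MB} \approx 3343\,\text{MB} \approx 3.3\,\text{GB}. \]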
Achieving good SPEC WEB99 scores requires both high throughput and bounded latency: for example, if a client request gets stalled
due to a badly delayed disk read, then the connection will be classed
as non-conforming and won't contribute to the score. Hence, it is
important that the VMM schedules domains in a timely fashion. By
default, Xen uses a 5ms time slice.
In the case of a single Apache instance, the addition of a sec-
[Figure residue: bar charts of aggregate scores for 1, 2, 4, 8 and 16 concurrent SPEC WEB99 instances (Figure 4), and of aggregate OSDB-IR and OSDB-OLTP throughput for multiple PostgreSQL instances, including an 8(diff) configuration and a -16.3% (non-SMP guest) annotation.]
4.4 Performance Isolation
[Chart residue: normalised throughput for native Linux, XenoLinux with a 50ms time slice, and XenoLinux with a 5ms time slice, plotted against the number of concurrent domains or processes.]
4.5 Scalability
In this section, we examine Xen's ability to scale to its target
of 100 domains. We discuss the memory requirements of running
many instances of a guest OS and associated applications, and measure the CPU performance overhead of their execution.
We evaluated the minimum physical memory requirements of
a domain booted with XenoLinux and running the default set of
RH7.2 daemons, along with an sshd and Apache web server. The
domain was given a reservation of 64MB on boot, limiting the maximum size to which it could grow. The guest OS was instructed to
minimize its memory footprint by returning all pages possible to
Xen. Without any swap space configured, the domain was able to
reduce its memory footprint to 6.2MB; allowing the use of a swap
device reduced this further to 4.2MB. A quiescent domain is able to
stay in this reduced state until an incoming HTTP request or periodic service causes more memory to be required. In this event, the
guest OS will request pages back from Xen, growing its footprint
as required up to its configured ceiling.
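A minimal sketch of how such footprint adjustment might look inside the guest, assuming hypothetical hypervisor_return_page() and hypervisor_claim_page() calls rather than Xen's actual memory-reservation interface:

/* Sketch of a guest returning free pages to Xen and reclaiming them on
 * demand, up to the reservation fixed at domain creation.  Hypercall and
 * allocator names are illustrative. */

extern unsigned long alloc_guest_page(void);           /* returns a pfn, 0 if none free */
extern void free_guest_page(unsigned long pfn);
extern int hypervisor_return_page(unsigned long pfn);  /* give a page back to Xen */
extern int hypervisor_claim_page(unsigned long *pfn);  /* obtain a page from Xen  */

static unsigned long pages_held;          /* current footprint, in pages     */
extern unsigned long reservation_pages;   /* ceiling set at domain creation  */

/* Shrink: hand back every page the guest can spare, down to the target. */
void shrink_footprint(unsigned long target_pages)
{
    while (pages_held > target_pages) {
        unsigned long pfn = alloc_guest_page();
        if (pfn == 0)
            break;                        /* nothing left to give back */
        hypervisor_return_page(pfn);
        pages_held--;
    }
}

/* Grow: reclaim pages from Xen when memory pressure returns, but never
 * beyond the configured reservation. */
int grow_footprint(unsigned long extra_pages)
{
    while (extra_pages-- > 0 && pages_held < reservation_pages) {
        unsigned long pfn;
        if (hypervisor_claim_page(&pfn) != 0)
            return -1;                    /* Xen has no memory to give */
        free_guest_page(pfn);             /* make it available to the guest */
        pages_held++;
    }
    return 0;
}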
This demonstrates that memory usage overhead is unlikely to
be a problem for running 100 domains on a modern server-class machine: far more memory will typically be committed to application data and buffer cache usage than to OS or application text
pages. Xen itself maintains only a fixed 20kB of state per domain,
unlike other VMMs that must maintain shadow page tables etc.
Finally, we examine the overhead of context switching between
large numbers of domains rather than simply between processes.
Figure 6 shows the normalized aggregate throughput obtained when
running a small subset of the SPEC CINT2000 suite concurrently
on between 1 and 128 domains or processes on our dual CPU server.
The line representing native Linux is almost flat, indicating that
for this benchmark there is no loss of aggregate performance when
scheduling between so many processes; Linux identifies them all as
compute bound, and schedules them with long time slices of 50ms
or more. In contrast, the lower line indicates Xen's throughput
5. RELATED WORK
Virtualization has been applied to operating systems both commercially and in research for nearly thirty years. IBM VM/370 [19,
38] first made use of virtualization to allow binary support for legacy
code. VMware [10] and Connectix [8] both virtualize commodity
PC hardware, allowing multiple operating systems to run on a single host. All of these examples implement a full virtualization of
(at least a subset of) the underlying hardware, rather than paravirtualizing and presenting a modified interface to the guest OS. As
shown in our evaluation, the decision to present a full virtualization, although able to more easily support off-the-shelf operating
systems, has detrimental consequences for performance.
The virtual machine monitor approach has also been used by
Disco to allow commodity operating systems to run efficiently on
ccNUMA machines [7, 18]. A small number of changes had to be
made to the hosted operating systems to enable virtualized execution on the MIPS architecture. In addition, certain other changes
were made for performance reasons.
At present, we are aware of two other systems which take the
paravirtualization approach: IBM presently supports a paravirtualized version of Linux for their zSeries mainframes, allowing large
numbers of Linux instances to run simultaneously. Denali [44],
discussed previously, is a contemporary isolation kernel which attempts to provide a system capable of hosting vast numbers of virtualized OS instances.
In addition to Denali, we are aware of two other efforts to use
low-level virtualization to build an infrastructure for distributed
systems. The vMatrix [1] project is based on VMware and aims
to build a platform for moving code between different machines.
As vMatrix is developed above VMware, they are more concerned
with higher-level issues of distribution than those of virtualization
itself. In addition, IBM provides a Managed Hosting service, in
which virtual Linux instances may be rented on IBM mainframes.
The PlanetLab [33] project has constructed a distributed infrastructure which is intended to serve as a testbed for the research and
development of geographically distributed network services. The
platform is targeted at researchers and attempts to divide individual
physical hosts into slivers, providing simultaneous low-level access
to users. The current deployment uses VServers [17] and SILK [4]
to manage sharing within the operating system.
We share some motivation with the operating system extensibility and active networks communities. However, when running
over Xen there is no need to check for safe code, or for guaranteed termination: the only person hurt in either case is the client
http://www.cl.cam.ac.uk/netos/xen
6.2 Conclusion
Xen provides an excellent platform for deploying a wide variety of network-centric services, such as local mirroring of dynamic
web content, media stream transcoding and distribution, multiplayer
game and virtual reality servers, and smart proxies [2] to provide a
less ephemeral network presence for transiently-connected devices.
Xen directly addresses the single largest barrier to the deployment of such services: the present inability to host transient servers
for short periods of time and with low instantiation costs. By allowing 100 operating systems to run on a single server, we reduce the
associated costs by two orders of magnitude. Furthermore, by turning the setup and configuration of each OS into a software concern,
we facilitate much smaller-granularity timescales of hosting.
As our experimental results show in Section 4, the performance
of XenoLinux over Xen is practically equivalent to the performance
of the baseline Linux system. This fact, which comes from the careful design of the interface between the two components, means that
there is no appreciable cost in having the resource management facilities available. Our ongoing work to port the BSD and Windows
XP kernels to operate over Xen is confirming the generality of the
interface that Xen exposes.
Acknowledgments
This work is supported by EPSRC Grant GR/S01894/01 and by
Microsoft. We would like to thank Evangelos Kotsovinos, Anil
Madhavapeddy, Russ Ross and James Scott for their contributions.
7. REFERENCES