Virtualization Administration Guide
Jiri Herrmann
Scott Radvan
Yehuda Zimmerman
Dayle Parker
Laura Novich
Legal Notice
Copyright 2016 Red Hat, Inc.
This document is licensed by Red Hat under the Creative Commons Attribution-ShareAlike 3.0 Unported License. If you distribute this document, or a modified version of it, you must provide attribution to Red Hat, Inc. and provide a link to the original. If the document is modified, all Red Hat trademarks must be removed.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux is the registered trademark of Linus Torvalds in the United States and other countries.
Java is a registered trademark of Oracle and/or its affiliates.
XFS is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js is an official trademark of Joyent. Red Hat Software Collections is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.
Abstract
The Virtualization Administration Guide covers administration of host physical machines, networking, storage, device and guest virtual machine management, and troubleshooting. Note: This document is under development, is subject to substantial change, and is provided only as a preview. The included information and instructions should not be considered complete, and should be used with caution. To expand your expertise, you might also be interested in the Red Hat Enterprise Virtualization (RH318) training course.
Table of Contents
Chapter 1. Server Best Practices
Chapter 2. Security for Virtualization
    2.1. Storage Security Issues
    2.2. SELinux and Virtualization
    2.3. SELinux
    2.4. Virtualization Firewall Information
Chapter 3. sVirt
    3.1. Security and Virtualization
    3.2. sVirt Labeling
Chapter 4. KVM Live Migration
    4.1. Live Migration Requirements
    4.2. Live Migration and Red Hat Enterprise Linux Version Compatibility
    4.3. Shared Storage Example: NFS for a Simple Migration
    4.4. Live KVM Migration with virsh
        4.4.1. Additional Tips for Migration with virsh
Chapter 5. Remote Management of Guests
    5.1. Remote Management with SSH
    5.2. Remote Management Over TLS and SSL
    5.3. Transport Modes
Chapter 6. Overcommitting with KVM
    6.1. Introduction
    6.2. Overcommitting Virtualized CPUs
Chapter 7. KSM
Chapter 8. Advanced Guest Virtual Machine Administration
    8.1. Control Groups (cgroups)
    8.2. Huge Page Support
    8.3. Running Red Hat Enterprise Linux as a Guest Virtual Machine on a Hyper-V Hypervisor
Chapter 9. Guest virtual machine device configuration
    9.1. PCI Devices
Chapter 10. QEMU-img and QEMU Guest Agent
    10.1. Using qemu-img
    10.2. QEMU Guest Agent
        10.2.1. Install and Enable the Guest Agent
    10.3.1. Using libvirt Commands with the QEMU Guest Agent on Windows Guests
    10.4. Setting a Limit on Device Redirection
    10.5. Dynamically Changing a Host Physical Machine or a Network Bridge that is Attached to a Virtual NIC
Chapter 11. Storage Concepts
    11.1. Storage Pools
    11.2. Volumes
Chapter 12. Storage Pools
    12.1. Disk-based Storage Pools
        12.1.1. Creating a Disk-based Storage Pool Using virsh
        12.1.2. Deleting a Storage Pool Using virsh
    12.2. Partition-based Storage Pools
        12.2.1. Creating a Partition-based Storage Pool Using virt-manager
    12.8.3. Configuring the Virtual Machine to Use a vHBA LUN
    12.8.4. Destroying the vHBA Storage Pool
Chapter 13. Volumes
    13.1. Creating Volumes
Chapter 14. Managing guest virtual machines with virsh
    14.1. Generic Commands
        14.1.1. help
        14.1.3. version
        14.1.4. Argument Display
        14.1.5. connect
        14.1.6. Displaying Basic Information
    14.5.4. Editing and Displaying a Description and Title of a Domain
    14.5.19. Displaying a URI for Connection to a Graphical Display
    14.5.23. Creating a Virtual Machine XML Dump (Configuration File)
    14.5.24. Creating a Guest Virtual Machine from a Configuration File
    14.6.1. Adding Multifunction PCI Devices to KVM Guest Virtual Machines
    14.6.7. Displaying the IP Address and Port Number for the VNC Display
    14.7. NUMA Node Management
        14.7.3. Displaying the Amount of Free Memory in a NUMA Cell
    14.8. Starting, Suspending, Resuming, Saving, and Restoring a Guest Virtual Machine
        14.8.8. Updating the Domain XML File that will be Used for Restoring the Guest
    14.9. Shutting Down, Rebooting, and Forcing Shutdown of a Guest Virtual Machine
        14.9.1. Shutting Down a Guest Virtual Machine
        14.9.2. Shutting Down Red Hat Enterprise Linux 6 Guests on a Red Hat Enterprise Linux 7 Host
        14.9.3. Manipulating the libvirt-guests Configuration Settings
    14.10.2. Getting the Domain Name of a Guest Virtual Machine
    14.12.3. Dumping Storage Volume Information to an XML File
    14.13.4. Displaying Information about the Virtual CPU Counts of a Domain
    14.13.9. Displaying Guest Virtual Machine Block Device Information
    14.13.10. Displaying Guest Virtual Machine Network Device Information
    14.15.1.1. Defining and starting a host physical machine interface via an XML file
Chapter 15. Managing Guests with the Virtual Machine Manager (virt-manager)
    15.1. Starting virt-manager
    15.2. The Virtual Machine Manager Main Window
    15.3. The Virtual Hardware Details Window
Chapter 16. Guest Virtual Machine Disk Access with Offline Tools
    16.1. Introduction
    16.2. Terminology
    16.3. Installation
    16.4. The guestfish Shell
    16.10. virt-win-reg: Reading and Editing the Windows Registry
        16.10.1. Introduction
        16.10.2. Installation
Chapter 17. Using Simple Tools for Guest Virtual Machine Management
    17.1. Using virt-viewer
    17.2. remote-viewer
Chapter 18. Virtual Networking
    18.1. Virtual Network Switches
    18.2. Bridged Mode
    18.3. Network Address Translation Mode
        18.3.1. DNS and DHCP
    18.4. Routed Mode
    18.5. Isolated Mode
    18.7.2. Routed Mode
    18.7.3. NAT Mode
    18.7.4. Isolated Mode
    18.8. Managing a Virtual Network
    18.9. Creating a Virtual Network
    18.12.5. Automatic IP Address Detection and DHCP Snooping
        18.12.5.1. Introduction
        18.12.5.2. DHCP Snooping
    18.12.10.6. IPv6
    18.12.10.7. TCP/UDP/SCTP
    18.12.10.8. ICMP
    18.12.10.9. IGMP, ESP, AH, UDPLITE, 'ALL'
    18.12.10.10. TCP/UDP/SCTP over IPv6
Chapter 19. qemu-kvm Commands, Flags, and Arguments
    19.1. Introduction
        Whitelist Format
        Processor Topology
        NUMA System
        Memory Size
        Keyboard Layout
        Guest Name
        Guest UUID
    19.3. Disk Options
        Generic Drive
        Boot Option
        Snapshot Mode
    19.4. Display Options
        Disable Graphics
        VGA Card Emulation
        VNC Display
        Spice Desktop
    19.5. Network Options
        TAP network
    19.6. Device Options
        General Device
        Global Device Setting
        Character Device
        Enable USB
    19.7. Linux/Multiboot Boot
        Kernel File
        Ram Disk
        Command Line Parameter
    19.8. Expert Options
        KVM Virtualization
        Disable Kernel Mode PIT Reinjection
        No Shutdown
        No Reboot
        Serial Port, Monitor, QMP
        RTC
        Watchdog
        Watchdog Reaction
        Audio Help
    19.10. Miscellaneous Options
        Migration
        No Default Configuration
        Device Configuration File
        Loaded Saved State
Chapter 20. Manipulating the Domain XML
    20.1. General Information and Metadata
    20.2. Operating System Booting
        20.2.1. BIOS Bootloader
        20.2.2. Host Physical Machine Bootloader
Chapter 21. Troubleshooting
    21.1. Debugging and Troubleshooting Tools
    21.10. Enabling Intel VT-x and AMD-V Virtualization Hardware Extensions in BIOS
    21.11. KVM Networking Performance
Appendix A. The Virtual Host Metrics Daemon (vhostmd)
Appendix B. Additional Resources
    B.1. Online Resources
    B.2. Installed Documentation
Appendix C. Revision History
1. Create a logical volume. This example creates a 5 gigabyte logical volume named
NewVolumeName on the volume group named volumegroup. This example also assumes
that there is enough disk space. You may have to create additional storage on a network
device and give the guest access to it. Refer to Chapter 13, Volumes for more information.
# lvcreate -n NewVolumeName -L 5G volumegroup
2. Format the NewVolumeName logical volume with a file system that supports extended
attributes, such as ext3.
# mke2fs -j /dev/volumegroup/NewVolumeName
3. Create a new directory for mounting the new logical volume. This directory can be anywhere
on your file system. It is advised not to put it in important system directories (/etc, /var,
/sys) or in home directories (/home or /root). This example uses a directory called
/virtstorage.
# mkdir /virtstorage
4. Mount the logical volume.
# mount /dev/volumegroup/NewVolumeName /virtstorage
5. Set the SELinux type for the folder you just created.
# semanage fcontext -a -t virt_image_t "/virtstorage(/.*)?"
If the targeted policy is used (targeted is the default policy) the command appends a line to
the /etc/selinux/targeted/contexts/files/file_contexts.local file, which
makes the change persistent. The appended line may resemble this:
/virtstorage(/.*)?    system_u:object_r:virt_image_t:s0
6. Run the command to change the type of the mount point (/virtstorage) and all files under
it to virt_image_t (the restorecon and setfiles commands read the files in
/etc/selinux/targeted/contexts/files/).
# restorecon -R -v /virtstorage
Note
Create a new file (using the touch command) on the file system.
# touch /virtstorage/newfile
Verify the file has been relabeled using the following command:
# sudo ls -Z /virtstorage
-rw-------. root root system_u:object_r:virt_image_t:s0 newfile
The output shows that the new file has the correct attribute, virt_image_t.
2.3. SELinux
This section contains topics to consider when using SELinux with your virtualization deployment.
When you deploy system changes or add devices, you must update your SELinux policy
accordingly. To configure an LVM volume for a guest virtual machine, you must modify the SELinux
context for the respective underlying block device and volume group. Make sure that you have
installed the policycoreutils-python package (yum install policycoreutils-python)
before running the command.
# semanage fcontext -a -t virt_image_t -f -b /dev/sda2
# restorecon /dev/sda2
KVM and SELinux
The following table shows the SELinux Booleans which affect KVM when launched by libvirt.

KVM SELinux Booleans

SELinux Boolean      Description
virt_use_comm        Allow virt to use serial/parallel communication ports.
virt_use_fusefs      Allow virt to read FUSE mounted files.
virt_use_nfs         Allow virt to manage NFS mounted files.
virt_use_samba       Allow virt to manage CIFS mounted files.
virt_use_sanlock     Allow sanlock to manage virt lib files.
virt_use_sysfs       Allow virt to manage device configuration (PCI).
virt_use_xserver     Allow virtual machine to interact with the X Window System.
virt_use_usb         Allow virt to use USB devices.
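These Booleans can be inspected and toggled with the standard SELinux tools. The following is a minimal sketch using one Boolean from the table above (getsebool and setsebool are standard policycoreutils utilities):
# getsebool virt_use_nfs
virt_use_nfs --> off
# setsebool -P virt_use_nfs on
The -P option makes the change persistent across reboots.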
Note
Any network service on a guest virtual machine must have the applicable ports open on the
guest virtual machine to allow external access. If a network service on a guest virtual machine
is firewalled it will be inaccessible. Always verify the guest virtual machine's network
configuration first.
ICMP requests must be accepted. ICMP packets are used for network testing. You cannot ping
guest virtual machines if the ICMP packets are blocked.
Port 22 should be open for SSH access and the initial installation.
Ports 80 or 443 (depending on the security settings on the RHEV Manager) are used by the vdsm-reg service to communicate information about the host physical machine.
Ports 5634 to 6166 are used for guest virtual machine console access with the SPICE protocol.
Ports 49152 to 49216 are used for migrations with KVM. Migration may use any port in this range
depending on the number of concurrent migrations occurring.
Enabling IP forwarding (net.ipv4.ip_forward = 1) is also required for shared bridges and
the default bridge. Note that installing libvirt enables this variable, so it will be enabled when the
virtualization packages are installed unless it was manually disabled.
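As an illustrative check (standard sysctl usage, not specific to this guide), you can verify the current value and enable it at runtime as follows:
# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1
# sysctl -w net.ipv4.ip_forward=1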
Note
Note that enabling IP forwarding is not required for physical bridge devices. When a guest
virtual machine is connected through a physical bridge, traffic operates only at Layer 2 and
does not require IP configuration such as IP forwarding.
Chapter 3. sVirt
sVirt is a technology included in Red Hat Enterprise Linux 6 that integrates SELinux and
virtualization. sVirt applies Mandatory Access Control (MAC) to improve security when using guest
virtual machines. This integrated technology improves security and hardens the system against bugs
in the hypervisor. It is particularly helpful in preventing attacks on the host physical machine or on
another guest virtual machine.
This chapter describes how sVirt integrates with virtualization technologies in Red Hat Enterprise
Linux 6.
Non-virtualized Environments
In a non-virtualized environment, host physical machines are separated from each other physically
and each host physical machine has a self-contained environment, consisting of services such as a
web server, or a DNS server. These services communicate directly to their own user space, host
physical machine's kernel and physical hardware, offering their services directly to the network. The
following image represents a non-virtualized environment:
User Space - memory area where all user mode applications and some drivers execute.
Web App (web application server) - delivers web content that can be accessed through a
browser.
Host Kernel - is strictly reserved for running the host physical machine's privileged kernel,
kernel extensions, and most device drivers.
DNS Server - stores DNS records allowing users to access web pages using logical names
instead of IP addresses.
Virtualized Environments
In a virtualized environment, several virtual operating systems can run on a single kernel residing on
a host physical machine. The following image represents a virtualized environment:
SELinux introduces a pluggable security framework for virtualized instances in its implementation of
Mandatory Access Control (MAC). The sVirt framework allows guest virtual machines and their
resources to be uniquely labeled. Once labeled, rules can be applied which can reject access
between different guest virtual machines.
00:00:17 qemu-kvm
The actual disk images are automatically labeled to match the processes, as shown in the following
output:
# ls -lZ /var/lib/libvirt/images/*
system_u:object_r:svirt_image_t:s0:c87,c520   image1
The following table outlines the different context labels that can be assigned when using sVirt:

Table 3.1. sVirt context labels

SELinux Context                          Type / Description
system_u:system_r:svirt_t:MCS1           Guest virtual machine processes. The MCS1 field is a randomly selected MCS range.
system_u:object_r:svirt_image_t:MCS1     Guest virtual machine images. Only svirt_t processes with the same MCS fields can read or write these images.
system_u:object_r:svirt_image_t:s0       Guest virtual machine shared read/write content. All svirt_t processes can write to the svirt_image_t:s0 files.
It is also possible to perform static labeling when using sVirt. Static labels allow the administrator to
select a specific label, including the MCS/MLS field, for a guest virtual machine. Administrators who
run statically-labeled virtualized guest virtual machines are responsible for setting the correct label
on the image files. The guest virtual machine will always be started with that label, and the sVirt
system will never modify the label of a statically-labeled virtual machine's content. This allows the
sVirt component to run in an MLS environment. You can also run multiple guest virtual machines with
different sensitivity levels on a system, depending on your requirements.
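For illustration, a statically labeled guest can be configured with a <seclabel> element in its libvirt domain XML; the label value below is a hypothetical example:
<seclabel type='static' model='selinux' relabel='no'>
  <label>system_u:system_r:svirt_t:s0:c392,c662</label>
</seclabel>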
4.2. Live Migration and Red Hat Enterprise Linux Version Compatibility
Live migration is supported as shown in Table 4.1, Live Migration Compatibility:

Table 4.1. Live Migration Compatibility

Migration Method   Release Type    Example                  Live Migration Support   Notes
Forward            Major release   5.x -> 6.y               Not supported
Forward            Minor release   5.x -> 5.y (y>x, x>=4)   Fully supported          Any issues should be reported
Forward            Minor release   6.x -> 6.y (y>x)         Fully supported          Any issues should be reported
Backward           Major release   6.x -> 5.y               Not supported
Backward           Minor release   5.x -> 5.y (x>y, y>=4)   Supported                Refer to Troubleshooting problems with migration for known issues
Backward           Minor release   6.x -> 6.y (x>y)         Supported                Refer to Troubleshooting problems with migration for known issues
Configuring Network Storage
Configure shared storage and install a guest virtual machine on the shared storage.
Alternatively, use the NFS example in Section 4.3, Shared Storage Example: NFS for a Simple
Migration.
Note that the instructions provided in this section are not meant to replace the detailed
instructions found in the Red Hat Enterprise Linux Storage Administration Guide. Refer to that guide for information on
configuring NFS, opening IP tables, and configuring the firewall.
1. Create a directory for the disk images
This shared directory will contain the disk images for the guest virtual machines. To do this
create a directory in a location different from /var/lib/libvirt/images. For example:
# mkdir /var/lib/libvirt-img/images
2. Add the new directory path to the NFS configuration file
The NFS configuration file is a text file located at /etc/exports. Open the file and add the
path to the new directory you created in step 1:
# echo "/var/lib/libvirt-img/images" >> /etc/exports
3. Start NFS
a. Make sure that the ports for NFS in iptables (2049, for example) are opened and
add NFS to the /etc/hosts.allow file.
b. Start the NFS service:
# service nfs start
4. Mount the shared storage on both the source and the destination
Mount the /var/lib/libvirt/images directory on both the source and destination
system by running the following command twice: once on the source system and again on the
destination system.
# mount source_host:/var/lib/libvirt-img/images /var/lib/libvirt/images
Warning
Make sure that the directories you create in this procedure are compliant with the
requirements as outlined in Section 4.1, Live Migration Requirements. In addition, the
directory may need to be labeled with the correct SELinux label. For more information
consult the NFS chapter in the Red Hat Enterprise Linux Storage Administration Guide.
Note
The DestinationURL parameter for normal migration and peer2peer migration has different
semantics:
normal migration: the DestinationURL is the URL of the target host physical machine as
seen from the source guest virtual machine.
peer2peer migration: DestinationURL is the URL of the target host physical machine as
seen from the source host physical machine.
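For illustration, a typical live migration command of this form (using the guest and host names from this chapter's examples) might look like:
# virsh migrate --live guest1-rhel6-64 qemu+ssh://host2.example.com/system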
Once the command is entered, you will be prompted for the root password of the destination system.
Important
An entry for the destination host physical machine, in the /etc/hosts file on the source
server, is required for migration to succeed. Enter the IP address and hostname for the
destination host physical machine in this file as shown in the following example, substituting
your destination host physical machine's IP address and hostname:
10.0.0.20 host2.example.com
Note
During the migration, the completion percentage indicator number is likely to decrease
multiple times before the process finishes. This is caused by a recalculation of the
overall progress, as source memory pages that are changed after the migration starts
need to be copied again. Therefore, this behavior is expected and does not indicate
any problems with the migration.
4. Verify the guest virtual machine has arrived at the destination host
From the destination system, host2.example.com, verify guest1-rhel6-64 is running:
[root@host2 ~]# virsh list
 Id    Name                 State
----------------------------------
 10    guest1-rhel6-64      running
The live migration is now complete.
Note
libvirt supports a variety of networking methods including TLS/SSL, UNIX sockets, SSH, and
unencrypted TCP. Refer to Chapter 5, Remote Management of Guests for more information on
using other methods.
Note
Non-running guest virtual machines cannot be migrated with the virsh migrate command.
To migrate a non-running guest virtual machine, the following script should be used:
virsh dumpxml Guest1 > Guest1.xml
virsh -c qemu+ssh://<target-system-FQDN> define Guest1.xml
virsh undefine Guest1
#################################################################
#
# Processing controls
#
# The maximum number of concurrent client connections to allow
# over all sockets combined.
#max_clients = 20
Important
The max_clients and max_workers parameter settings affect all guest
virtual machine connections to the libvirtd service. This means that any user that is
using the same guest virtual machine and is performing a migration at the same time
is also subject to the limits set in the max_clients and max_workers
parameter settings. This is why the maximum value needs to be considered carefully
before performing a concurrent live migration.
4. Save the file and restart the service.
Note
There may be cases where a migration connection drops because there are too many
ssh sessions that have been started, but not yet authenticated. By default, sshd allows
only 10 sessions to be in a "pre-authenticated state" at any time. This setting is
controlled by the MaxStartups parameter in the sshd configuration file (located at
/etc/ssh/sshd_config), which may require some adjustment. Adjusting this
parameter should be done with caution, as the limitation is put in place to prevent DoS
attacks (and over-use of resources in general). Setting this value too high will negate
its purpose. To change this parameter, edit the file /etc/ssh/sshd_config, remove
the # from the beginning of the MaxStartups line, and change the 10 (default value)
to a higher number. Remember to save the file and restart the sshd service. For more
information, refer to the sshd_config man page.
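For example, the adjusted line in /etc/ssh/sshd_config might read as follows (the value 50 is illustrative only):
MaxStartups 50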
Connect to the target host physical machine by clicking on the File menu, then click Add Connection.
Figure 4.4. Enter password
4. Migrate guest virtual machines
Open the list of guests inside the source host physical machine (click the small triangle on
the left of the host name) and right-click on the guest that is to be migrated (guest1-rhel6-64 in this example) and click Migrate.
virt-manager now displays the newly migrated guest virtual machine running in the
destination host. The guest virtual machine that was running in the source host physical
machine is now listed in the Shutoff state.
<pool type='iscsi'>
<name>iscsirhel6guest</name>
<source>
<host name='virtlab22.example.com.'/>
<device path='iqn.2001-05.com.iscsivendor:0-8a0906fbab74a06-a700000017a4cc89-rhevh'/>
</source>
<target>
<path>/dev/disk/by-path</path>
</target>
</pool>
...
Figure 4.10. XML configuration for the destination host physical machine
Note
Red Hat Enterprise Virtualization enables remote management of large numbers of virtual
machines. Refer to the Red Hat Enterprise Virtualization documentation for further details.
The following packages are required for ssh access:
openssh
openssh-askpass
openssh-clients
openssh-server
Important
SSH keys are user dependent and may only be used by their owners. A key's owner is the one
who generated it. Keys may not be shared.
virt-manager must be run by the user who owns the keys to connect to the remote host. This
means that if the remote systems are managed by a non-root user, virt-manager must be run in
unprivileged mode. If the remote systems are managed by the local root user, then the SSH
keys must be owned and created by root.
You cannot manage the local host as an unprivileged user with virt-manager.
1. Optional: Changing user
Change user, if required. This example uses the local root user for remotely managing the
other hosts and the local host.
$ su -
2. Generating the SSH key pair
Generate a public key pair on the machine where virt-manager is used. This example uses the
default key location, in the ~/.ssh/ directory.
# ssh-keygen -t rsa
3. Copying the keys to the remote hosts
Remote login without a password, or with a passphrase, requires an SSH key to be
distributed to the systems being managed. Use the ssh-copy-id command to copy the key
to the root user at the system address provided (in the example, [email protected]).
# ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
[email protected]'s password:
Now try logging into the machine with the ssh [email protected] command and
check the .ssh/authorized_keys file to make sure unexpected keys have not been
added.
Repeat for other systems, as required.
4. Optional: Add the passphrase to the ssh-agent
The instructions below describe how to add a passphrase to an existing ssh-agent. It will fail
to run if the ssh-agent is not running. To avoid errors or conflicts, make sure that your SSH
parameters are set correctly. Refer to the Red Hat Enterprise Linux Deployment Guide for more
information.
Add the passphrase for the SSH key to the ssh-agent, if required. On the local host, use the
following command to add the passphrase (if there was one) to enable password-less login.
# ssh-add ~/.ssh/id_rsa
This procedure demonstrates how to issue a certificate with the X.509 CommonName (CN) field set to
the hostname of the server. The CN must match the hostname which clients will be using to connect to
the server. In this example, clients will be connecting to the server using the URI:
qemu://mycommonname/system, so the CN field should be identical, that is, mycommonname.
1. Create a private key for the server.
# certtool --generate-privkey > serverkey.pem
2. Generate a signature for the CA's private key by first creating a template file called
server.info. Make sure that the CN is set to be the same as the server's hostname:
organization = Name of your organization
cn = mycommonname
tls_www_server
encryption_key
signing_key
3. Create the certificate with the following command:
# certtool --generate-certificate --load-privkey serverkey.pem --load-ca-certificate cacert.pem --load-ca-privkey cakey.pem --template server.info --outfile servercert.pem
4. This results in two files being generated:
serverkey.pem - The server's private key
servercert.pem - The server's public key
Make sure to keep the location of the private key secret. To view the contents of the file,
perform the following command:
# certtool -i --infile servercert.pem
When opening this file the CN= parameter should be the same as the CN that you set earlier.
For example, mycommonname.
5. Install the two files in the following locations:
serverkey.pem - the server's private key. Place this file in the following location:
/etc/pki/libvirt/private/serverkey.pem
servercert.pem - the server's certificate. Install it in the following location on the server:
/etc/pki/libvirt/servercert.pem
Procedure 5.3. Issuing a client certificate
1. For every client (that is, any program linked with libvirt, such as virt-manager), you need to issue
a certificate with the X.509 Distinguished Name (DN) set to a suitable name. This needs to be
decided on a corporate level.
For example purposes the following information will be used:
C=USA,ST=North Carolina,L=Raleigh,O=Red Hat,CN=name_of_client
This process is quite similar to Procedure 5.2, Issuing a server certificate, with the following
exceptions noted.
2. Make a private key with the following command:
# certtool --generate-privkey > clientkey.pem
3. Generate a signature for the CA's private key by first creating a template file called
client.info. The file should contain the following (fields should be customized to reflect
your region/location):
your region/location):
country = USA
state = North Carolina
locality = Raleigh
organization = Red Hat
cn = client1
tls_www_client
encryption_key
signing_key
4. Sign the certificate with the following command:
# certtool --generate-certificate --load-privkey clientkey.pem --load-ca-certificate cacert.pem --load-ca-privkey cakey.pem --template client.info --outfile clientcert.pem
5. Install the certificates on the client machine:
# cp clientkey.pem /etc/pki/libvirt/private/clientkey.pem
# cp clientcert.pem /etc/pki/libvirt/clientcert.pem
UNIX Sockets
UNIX domain sockets are only accessible on the local machine. Sockets are not encrypted, and use
UNIX permissions or SELinux for authentication. The standard socket names are
/var/run/libvirt/libvirt-sock and /var/run/libvirt/libvirt-sock-ro (for read-only
connections).
SSH
Transported over a Secure Shell protocol (SSH) connection. Requires Netcat (the nc package) to be
installed. The libvirt daemon (libvirtd) must be running on the remote machine. Port 22 must be
open for SSH access. You should use some sort of SSH key management (for example, the ssh-agent utility) or you will be prompted for a password.
ext
The ext parameter is used for any external program which can make a connection to the remote
machine by means outside the scope of libvirt. This parameter is unsupported.
TCP
Unencrypted TCP/IP socket. Not recommended for production use, this is normally disabled, but an
administrator can enable it for testing or use over a trusted network. The default port is 16509.
The default transport, if no other is specified, is TLS.
Remote URIs
A Uniform Resource Identifier (URI) is used by virsh and libvirt to connect to a remote host. URIs can
also be used with the --connect parameter for the virsh command to execute single commands or
migrations on remote hosts. Remote URIs are formed by taking ordinary local URIs and adding a
hostname and/or transport name. As a special case, using a URI scheme of 'remote' will tell the
remote libvirtd server to probe for the optimal hypervisor driver. This is equivalent to passing a NULL
URI for a local connection.
libvirt URIs take the general form (content in square brackets, "[]", represents optional functions):
driver[+transport]://[username@][hostname][:port]/path[?extraparameters]
Note that if the hypervisor (driver) is QEMU, the path is mandatory. If it is Xen, it is optional.
The following are examples of valid remote URIs:
qemu://hostname/
xen://hostname/
xen+ssh://hostname/
The transport method or the hostname must be provided to target an external location. For more
information refer to http://libvirt.org/guide/html/Application_Development_Guide-Architecture-Remote_URIs.html.
Connect to a remote KVM host named host2, using SSH transport and the SSH username
virtuser. The connect command for each is connect [<name>] [--readonly], where
<name> is a valid URI as explained here. For more information about the virsh connect
command refer to Section 14.1.5, connect.
qemu+ssh://virtuser@host2/
Connect to a remote KVM hypervisor on the host named host2 using TLS.
qemu://host2/
The following extra parameters can be appended to remote URIs:

Name        Transport mode   Description and example usage
name        all modes        The name passed to the remote virConnectOpen function, normally formed by removing the transport, hostname, port number, username, and extra parameters from the remote URI. Example usage: name=qemu:///system
command     ssh and ext      The external command. Required for ext transport; for ssh the default is ssh. Example usage: command=/opt/openssh/bin/ssh
socket      unix and ssh     The path to the UNIX domain socket, which overrides the default. Example usage: socket=/opt/libvirt/run/libvirt/libvirt-sock
netcat      ssh              The name of the netcat command on the remote machine. The default is nc. Example usage: netcat=/opt/netcat/bin/nc
no_verify   tls              If set to a non-zero value, this disables client checks of the server's certificate. Note that to disable server checks of the client's certificate or IP address you must change the libvirtd configuration. Example usage: no_verify=1
no_tty      ssh              If set to a non-zero value, this stops ssh from asking for a password if it cannot log in to the remote machine automatically. Use this when you do not have access to a terminal. Example usage: no_tty=1
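For illustration, extra parameters are appended to the URI as a query string; for example, connecting over SSH without a terminal prompt might use (hostname and username are placeholders):
# virsh -c 'qemu+ssh://virtuser@host2/system?no_tty=1' list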
Important
Overcommitting is not an ideal solution for all memory issues, as the recommended method to
deal with memory shortage is to allocate less memory per guest, so that the sum of all guests'
memory (+4GB for the host operating system) is lower than the host physical machine's physical memory. If
the guest virtual machines need more memory, then increase the guest virtual machines' swap
space allocation. If, however, you decide to overcommit, do so with caution.
Guest virtual machines running on a KVM hypervisor do not have dedicated blocks of physical RAM
assigned to them. Instead, each guest virtual machine functions as a Linux process where the host
physical machine's Linux kernel allocates memory only when requested. In addition, the host
physical machine's memory manager can move the guest virtual machine's memory between its own
physical memory and swap space. This is why overcommitting requires allotting sufficient swap
space on the host physical machine to accommodate all guest virtual machines as well as enough
memory for the host physical machine's processes. As a basic rule, the host physical machine's
operating system requires a maximum of 4GB of memory along with a minimum of 4GB of swap
space. Refer to Example 6.1, Memory overcommit example for more information.
Red Hat Knowledgebase has an article on safely and efficiently determining the size of the swap
partition.
Note
The example below is provided as a guide for configuring swap only. The settings listed may
not be appropriate for your environment.
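As an illustrative sketch only (the figures below are hypothetical, loosely modeled on the kind of calculation Example 6.1 performs, and are not sizing guidance): on a host with 32GB of physical RAM, reserving 4GB for the host operating system leaves 28GB for guests. Provisioning ten guests with 4GB each commits 40GB, so at least 12GB of the committed guest memory, in addition to the host's minimum 4GB of swap, would need to be backed by swap space.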
Note
Overcommitting does not work with all guest virtual machines, but has been found to work in a
desktop virtualization setup with minimal intensive usage or running several identical guest
virtual machines with KSM. It should be noted that configuring swap and memory overcommit
is not a simple plug-in and configure formula, as each environment and setup is different.
Proceed with caution before changing these settings and make sure you completely
understand your environment and setup before making any changes.
For more information on KSM and overcommitting, refer to Chapter 7, KSM.
Important
Do not overcommit memory or CPUs in a production environment without extensive testing.
Applications which use 100% of memory or processing resources may become unstable in
overcommitted environments. Test before deploying.
For more information on how to get the best performance out of your virtual machine, refer to the Red
Hat Enterprise Linux 6 Virtualization Tuning and Optimization Guide.
Chapter 7. KSM
The concept of shared memory is common in modern operating systems. For example, when a
program is first started it shares all of its memory with the parent program. When either the child or
parent program tries to modify this memory, the kernel allocates a new memory region, copies the
original contents and allows the program to modify this new region. This is known as copy on write.
KSM is a new Linux feature which uses this concept in reverse. KSM enables the kernel to examine
two or more already running programs and compare their memory. If any memory regions or pages
are identical, KSM reduces multiple identical memory pages to a single page. This page is then
marked copy on write. If the contents of the page are modified by a guest virtual machine, a new page
is created for that guest virtual machine.
This is useful for virtualization with KVM. When a guest virtual machine is started, it inherits only the
memory from the parent q emu-kvm process. Once the guest virtual machine is running, the contents
of the guest virtual machine operating system image can be shared when guests are running the
same operating system or applications.
Note
The page deduplication technology (used also by the KSM implementation) may introduce
side channels that could potentially be used to leak information across multiple guests. In
case this is a concern, KSM can be disabled on a per-guest basis.
KSM provides enhanced memory speed and utilization. With KSM, common process data is stored in
cache or in main memory. This reduces cache misses for the KVM guests which can improve
performance for some applications and operating systems. Secondly, sharing memory reduces the
overall memory usage of guests which allows for higher densities and greater utilization of
resources.
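The kernel exposes KSM activity through sysfs. As a quick illustration (this is the standard kernel interface, independent of the ksm and ksmtuned services described below), the current sharing statistics can be read as follows:
# cat /sys/kernel/mm/ksm/pages_shared
# cat /sys/kernel/mm/ksm/pages_sharing
A high pages_sharing to pages_shared ratio indicates effective deduplication.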
Note
Starting in Red Hat Enterprise Linux 6.5, KSM is NUMA aware. This allows it to take NUMA
locality into account while coalescing pages, thus preventing performance drops related to
pages being moved to a remote node. Red Hat recommends avoiding cross-node memory
merging when KSM is in use. If KSM is in use, change the
/sys/kernel/mm/ksm/merge_across_nodes tunable to 0 to avoid merging pages across
NUMA nodes. Kernel memory accounting statistics can eventually contradict each other after
large amounts of cross-node merging. As such, numad can become confused after the KSM
daemon merges large amounts of memory. If your system has a large amount of free memory,
you may achieve higher performance by turning off and disabling the KSM daemon. Refer to
the Red Hat Enterprise Linux Performance Tuning Guide for more information on NUMA.
Red Hat Enterprise Linux uses two separate methods for controlling KSM:
The ksm service starts and stops the KSM kernel thread.
The ksmtuned service controls and tunes the ksm service, dynamically managing same-page merging.
The ksmtuned service starts ksm and stops the ksm service if memory sharing is not necessary.
The ksmtuned service must be told with the retune parameter to run when new guests are
created or destroyed.
Both of these services are controlled with the standard service management tools.
The KSM Service
The ksm service is included in the qemu-kvm package. KSM is off by default on Red Hat Enterprise
Linux 6. When using Red Hat Enterprise Linux 6 as a KVM host physical machine, however, it is likely
turned on by the ksm/ksmtuned services.
When the ksm service is not started, KSM shares only 2000 pages. This default is low and provides
limited memory saving benefits.
When the ksm service is started, KSM will share up to half of the host physical machine system's main
memory. Start the ksm service to enable KSM to share more memory.
# service ksm start
Starting ksm:                                              [  OK  ]
The ksm service can be added to the default startup sequence. Make the ksm service persistent with
the chkconfig command.
# chkconfig ksm on
The KSM Tuning Service
The ksmtuned service does not have any options. The ksmtuned service loops and adjusts ksm.
The ksmtuned service is notified by libvirt when a guest virtual machine is created or destroyed.
# service ksmtuned start
Starting ksmtuned:                                         [  OK  ]
The ksmtuned service can be tuned with the retune parameter. The retune parameter instructs
ksmtuned to run tuning functions manually.
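For example, to trigger a manual retune:
# service ksmtuned retune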
Before changing the parameters in the file, there are a few terms that need to be clarified:
thres - Activation threshold, in kbytes. A KSM cycle is triggered when the thres value added to
the sum of all qemu-kvm processes' RSZ exceeds total system memory. This parameter is the
equivalent in kbytes of the percentage defined in KSM_THRES_COEF.
The /etc/ksmtuned.conf file is the configuration file for the ksmtuned service. The file output
below is the default ksmtuned.conf file.
# Configuration file for ksmtuned.
# How long ksmtuned should sleep between tuning adjustments
# KSM_MONITOR_INTERVAL=60
# Millisecond sleep between ksm scans for 16Gb server.
# Smaller servers sleep more, bigger sleep less.
# KSM_SLEEP_MSEC=10
# KSM_NPAGES_BOOST is added to the npages value, when free memory is less
than thres.
# KSM_NPAGES_BOOST=300
# KSM_NPAGES_DECAY Value given is subtracted to the npages value, when
sleep_millisecs - Sleep milliseconds.
KSM tuning activity is stored in the /var/log/ksmtuned log file if the DEBUG=1 line is added to the
/etc/ksmtuned.conf file. The log file location can be changed with the LOGFILE parameter.
Changing the log file location is not advised and may require special configuration of SELinux
settings.
Deactivating KSM
KSM has a performance overhead which may be too large for certain environments or host physical
machine systems.
KSM can be deactivated by stopping the ksmtuned and the ksm services. Stopping the services
deactivates KSM, but the deactivation does not persist after restarting.
# service ksmtuned stop
Stopping ksmtuned:                                         [  OK  ]
# service ksm stop
Stopping ksm:                                              [  OK  ]
Persistently deactivate KSM with the chkconfig command. To turn off the services, run the following
commands:
# chkconfig ksm off
# chkconfig ksmtuned off
Important
Ensure the swap size is sufficient for the committed RAM even with KSM. KSM reduces the RAM
usage of identical or similar guests. Overcommitting guests with KSM without sufficient swap
space may be possible but is not recommended because guest virtual machine memory use
can result in pages becoming unshared.
Note
See the Red Hat Enterprise Linux 7 Virtualization Tuning and Optimization Guide for
instructions on tuning memory performance with huge pages.
8.3. Running Red Hat Enterprise Linux as a Guest Virtual Machine on a Hyper-V Hypervisor
It is possible to run a Red Hat Enterprise Linux guest virtual machine on a Microsoft Windows host
physical machine running the Microsoft Windows Hyper-V hypervisor. In particular, the following
enhancements have been made to allow for easier deployment and management of Red Hat
Enterprise Linux guest virtual machines:
Upgraded VMBUS protocols - VMBUS protocols have been upgraded to Windows 8 level. As part
of this work, VMBUS interrupts can now be processed on all available virtual CPUs in the guest.
Furthermore, the signaling protocol between the Red Hat Enterprise Linux guest virtual machine
and the Windows host physical machine has been optimized.
Synthetic frame buffer driver - Provides enhanced graphics performance and superior resolution
for Red Hat Enterprise Linux desktop users.
Live Virtual Machine Backup support - Provisions uninterrupted backup support for live Red Hat
Enterprise Linux guest virtual machines.
Dynamic expansion of fixed size Linux VHDs - Allows expansion of live mounted fixed size Red
Hat Enterprise Linux VHDs.
For more information, refer to the following article: Enabling Linux Support on Windows Server 2012
R2 Hyper-V.
Note
The Hyper-V hypervisor supports shrinking a GPT-partitioned disk on a Red Hat Enterprise
Linux guest if there is free space after the last partition, by allowing the user to drop the
unused last part of the disk. However, this operation will silently delete the secondary GPT
header on the disk, which may produce error messages when the guest examines the partition
table (for example, when printing the partition table with parted). This is a known limitation of
Hyper-V. As a workaround, it is possible to manually restore the secondary GPT header after
shrinking the GPT disk by using the expert menu in gdisk and the e command. Furthermore,
using the "expand" option in the Hyper-V manager also places the GPT secondary header in
a location other than at the end of disk, but this can be moved with parted. See the gdisk
and parted man pages for more information on these commands.
8.5. Automatically Starting Guest Virtual Machines
This section covers how to make guest virtual machines start automatically during the host physical
machine system's boot phase.
This example uses vi rsh to set a guest virtual machine, TestServer, to automatically start when
the host physical machine boots.
# virsh autostart TestServer
Domain TestServer marked as autostarted
The guest virtual machine now automatically starts with the host physical machine.
To stop a guest virtual machine from automatically booting, use the --disable parameter:
# virsh autostart --disable TestServer
Domain TestServer unmarked as autostarted
The guest virtual machine no longer automatically starts with the host physical machine.
8.6. Disable SMART Disk Monitoring for Guest Virtual Machines
SMART disk monitoring can be safely disabled as virtual disks and the physical storage devices are
managed by the host physical machine.
# service smartd stop
# chkconfig --del smartd
8.8.1. Another Method to Generate a New MAC for Your Guest Virtual Machine
You can also use the built-in modules of python-virtinst to generate a new MAC address and
UUID for use in a guest virtual machine configuration file:
# echo 'import virtinst.util ; print \
virtinst.util.uuidToString(virtinst.util.randomUUID())' | python
# echo 'import virtinst.util ; print virtinst.util.randomMAC()' | python
The script above can also be implemented as a script file as seen below.
#!/usr/bin/env python
# -*- mode: python; -*-
print ""
print "New UUID:"
import virtinst.util ; print virtinst.util.uuidToString(virtinst.util.randomUUID())
print "New MAC:"
import virtinst.util ; print virtinst.util.randomMAC()
print ""
Warning
Virtual memory allows a Linux system to use more memory than there is physical RAM on the
system. Underused processes are swapped out, which allows active processes to use memory,
improving memory utilization. Disabling swap reduces memory utilization as all processes are
stored in physical RAM.
If swap is turned off, do not overcommit guest virtual machines. Overcommitting guest virtual
machines without any swap can cause guest virtual machines or the host physical machine
system to crash.
Table 8.1. offset attribute values

Value        Description
utc          The guest virtual machine clock is synchronized to UTC when booted.
localtime    The guest virtual machine clock is synchronized to the host physical machine's configured timezone when booted.
timezone     The guest virtual machine clock is synchronized to a given timezone, specified by the timezone attribute.
variable     The guest virtual machine clock is synchronized to an arbitrary offset from UTC, specified in seconds by the adjustment attribute.
Note
The value utc is set as the clock offset in a virtual machine by default. However, if the guest
virtual machine clock is run with the localtime value, the clock offset needs to be changed
to a different value in order to have the guest virtual machine clock synchronized with the
host physical machine clock.
The timezone attribute determines which timezone is used for the guest virtual machine clock.
The adjustment attribute provides the delta for guest virtual machine clock synchronization,
in seconds, relative to UTC.
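As an illustrative sketch of how these attributes appear in the libvirt domain XML (the adjustment value is hypothetical; the timer sub-elements are described in the tables that follow):
<clock offset='variable' adjustment='3600' basis='utc'>
  <timer name='rtc' tickpolicy='catchup' track='guest'/>
  <timer name='pit' tickpolicy='delay'/>
</clock>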
Table 8.2. name attribute values

Value        Description
pit          Programmable Interval Timer - a timer with periodic interrupts.
rtc          Real Time Clock - a continuously running timer with periodic interrupts.
tsc          Time Stamp Counter - counts the number of ticks since reset; no interrupts.
kvmclock     KVM clock - the recommended paravirtualized clock source for KVM guest virtual machines.
8.10.2. track
The track attribute specifies what is tracked by the timer. Only valid for a name value of rtc.

Table 8.3. track attribute values

Value    Description
boot     Corresponds to the old host physical machine option; this is an unsupported tracking option.
guest    The RTC always tracks guest virtual machine time.
wall     The RTC always tracks host time.
8.10.3. tickpolicy
The tickpolicy attribute assigns the policy used to pass ticks on to the guest virtual machine. The
following values are accepted:

Table 8.4. tickpolicy attribute values

Value      Description
delay      Continue to deliver ticks at the normal rate; the guest virtual machine time is delayed due to the late ticks.
catchup    Deliver ticks at a higher rate to catch up with the missed ticks.
merge      Merge the missed tick(s) into one tick and inject them together.
discard    Throw away the missed tick(s) and continue with future injection at the default interval.
The mode attribute (valid only for a name value of tsc) accepts the following values:

Table 8.5. mode attribute values

Value      Description
auto       Native if TSC is unstable, otherwise allow native TSC access.
native     Always allow native TSC access.
emulate    Always emulate the TSC.
smpsafe    Always emulate the TSC and interlock SMP.
The present attribute is used to override the default set of timers visible to the guest virtual machine.

Table 8.6. present attribute values

Value    Description
yes      Force this timer to be visible to the guest virtual machine.
no       Force this timer to not be visible to the guest virtual machine.
Note
The PIT clocksource can be used with a 32-bit guest running under a 64-bit host (which
cannot use PIT), with the following conditions:
Guest virtual machine may have only one CPU
APIC timer must be disabled (use the "noapictimer" command line option)
NoHZ mode must be disabled in the guest (use the "nohz=off" command line option)
High resolution timer mode must be disabled in the guest (use the "highres=off" command
line option)
The PIT clocksource is not compatible with either high resolution timer mode, or NoHZ
mode.
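Taken together, an illustrative guest kernel command line under those conditions (appended in the guest's boot loader configuration) might include:
clocksource=pit noapictimer nohz=off highres=off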
This feature is only supported with guest virtual machines running Red Hat Enterprise Linux 6 and is
disabled by default. This feature only works using the Linux perf tool. Make sure the perf package is
installed using the command:
# yum install perf
See the man page on perf for more information on the perf commands.
Note
The number of devices that can be attached to a virtual machine depends on several factors.
One factor is the number of files open by the QEMU process (configured in
/etc/security/limits.conf, which can be overridden by /etc/libvirt/qemu.conf).
Other limiting factors include the number of slots available on the virtual bus, as well as
the system-wide limit on open files set by sysctl.
For more information on specific devices and for limitations refer to Section 20.16, Devices.
Red Hat Enterprise Linux 6 supports PCI hotplug of devices exposed as single function slots to the
virtual machine. Single function host devices and individual functions of multi-function host devices
may be configured to enable this. Configurations exposing devices as multi-function PCI slots to the
virtual machine are recommended only for non-hotplug applications.
Note
Platform support for interrupt remapping is required to fully isolate a guest with assigned
devices from the host. Without such support, the host may be vulnerable to interrupt injection
attacks from a malicious guest. In an environment where guests are trusted, the admin may
opt in to still allow PCI device assignment using the allow_unsafe_interrupts option to
the vfio_iommu_type1 module. This may either be done persistently by adding a .conf file
(for example, local.conf) to /etc/modprobe.d containing the following:
options vfio_iommu_type1 allow_unsafe_interrupts=1
or dynamically using the sysfs entry to do the same:
# echo 1 >
/sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts
grub2-mkconfig -o /etc/grub2.cfg
Note that if you are using a UEFI-based host, the target file should be /etc/grub2-efi.cfg.
4. Ready to use
Reboot the system to enable the changes. Your system is now capable of PCI device
assignment.
Procedure 9.2. Preparing an AMD system for PCI device assignment
1. Enable the AMD IOMMU specifications
The AMD IOMMU specifications are required to use PCI device assignment in Red Hat
Enterprise Linux. These specifications must be enabled in the BIOS. Some system
manufacturers disable these specifications by default.
2. Enable IOMMU kernel support
Append amd_iommu=on to the end of the GRUB_CMDLINE_LINUX line, within the quotes, in
/etc/sysconfig/grub so that AMD IOMMU specifications are enabled at boot.
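For example, the edited line might look as follows (the other options shown are placeholders for whatever your system already uses):

GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet amd_iommu=on"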
3. Regenerate the config file
Regenerate /etc/grub2.cfg by running:
grub2-mkconfig -o /etc/grub2.cfg
Note that if you are using a UEFI-based host, the target file should be /etc/grub2-efi.cfg.
4. Ready to use
Reboot the system to enable the changes. Your system is now capable of PCI device
assignment.
Note
An IOMMU group is determined based on the visibility and isolation of devices from the
perspective of the IOMMU. Each IOMMU group may contain one or more devices. When
multiple devices are present, all endpoints within the IOMMU group must be claimed for
any device within the group to be assigned to a guest. This can be accomplished
either by also assigning the extra endpoints to the guest or by detaching them from the
host driver using virsh nodedev-detach. Devices contained within a single group
may not be split between multiple guests or split between host and guest. Non-endpoint
devices such as PCIe root ports, switch ports, and bridges should not be
detached from the host drivers, and will not interfere with assignment of endpoints.
Devices within an IOMMU group can be determined using the iommuGroup section of
the virsh nodedev-dumpxml output. Each member of the group is provided via a
separate "address" field. This information may also be found in sysfs using the
following:
$ ls /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices/
An example of the output from this would be:
0000:01:00.0
0000:01:00.1
To assign only 0000:01:00.0 to the guest, the unused endpoint should be detached
from the host before starting the guest:
$ virsh nodedev-detach pci_0000_01_00_1
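If the detached endpoint is later needed by the host again, it can be rebound to its host driver with the virsh nodedev-reattach command, for example:

$ virsh nodedev-reattach pci_0000_01_00_1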
Alternately, run virsh attach-device, specifying the virtual machine name and the
guest's XML file:
virsh attach-device guest1-rhel6-64 file.xml
5. Start the virtual machine
# virsh start guest1-rhel6-64
The PCI device should now be successfully assigned to the virtual machine, and accessible to the
guest operating system.
2. Select a PCI device
Select PCI Host Device from the Hardware list on the left.
Select an unused PCI device. If you select a PCI device that is in use by another guest, an
error may result. In this example, a spare 82576 network device is used. Click Finish to
complete setup.
Note
If device assignment fails, there may be other endpoints in the same IOMMU group that are still
attached to the host. There is no way to retrieve group information using virt-manager, but
virsh commands can be used to analyze the bounds of the IOMMU group and, if necessary,
sequester devices.
Refer to the Note in Section 9.1.1, Assigning a PCI Device with virsh for more information on
IOMMU groups and how to detach endpoint devices using virsh.
Identify the PCI device designated for device assignment to the guest virtual machine.
# lspci | grep Ethernet
00:19.0 Ethernet controller: Intel Corporation 82567LM-2 Gigabit Network Connection
01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
The virsh nodedev-list command lists all devices attached to the system, and identifies
each PCI device with a string. To limit output to only PCI devices, run the following command:
# virsh nodedev-list --cap pci
pci_0000_00_00_0
pci_0000_00_01_0
pci_0000_00_03_0
pci_0000_00_07_0
pci_0000_00_10_0
pci_0000_00_10_1
pci_0000_00_14_0
pci_0000_00_14_1
pci_0000_00_14_2
pci_0000_00_14_3
pci_0000_00_19_0
pci_0000_00_1a_0
pci_0000_00_1a_1
pci_0000_00_1a_2
pci_0000_00_1a_7
pci_0000_00_1b_0
pci_0000_00_1c_0
pci_0000_00_1c_1
pci_0000_00_1c_4
pci_0000_00_1d_0
pci_0000_00_1d_1
pci_0000_00_1d_2
pci_0000_00_1d_7
pci_0000_00_1e_0
pci_0000_00_1f_0
pci_0000_00_1f_2
pci_0000_00_1f_3
pci_0000_01_00_0
pci_0000_01_00_1
pci_0000_02_00_0
pci_0000_02_00_1
pci_0000_06_00_0
pci_0000_07_02_0
pci_0000_07_03_0
Record the PCI device number; the number is needed in other steps.
Information on the domain, bus, and function is available from the output of the virsh
nodedev-dumpxml command:
# virsh nodedev-dumpxml pci_0000_01_00_0
<device>
  <name>pci_0000_01_00_0</name>
  <parent>pci_0000_00_01_0</parent>
  <driver>
    <name>igb</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>1</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x10c9'>82576 Gigabit Network Connection</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <iommuGroup number='7'>
      <address domain='0x0000' bus='0x00' slot='0x19' function='0x0'/>
    </iommuGroup>
  </capability>
</device>
Note
If there are multiple endpoints in the IOMMU group and not all of them are assigned to
the guest, you will need to manually detach the other endpoint(s) from the host by
running the following command before you start the guest:
$ virsh nodedev-detach pci_0000_00_19_1
Refer to the Note in Section 9.1.1, Assigning a PCI Device with virsh for more
information on IOMMU groups.
2. Add the device
Use the PCI identifier output from the virsh nodedev-list command as the value for the
--host-device parameter.
virt-install \
--name=guest1-rhel6-64 \
--disk path=/var/lib/libvirt/images/guest1-rhel6-64.img,size=8 \
--nonsparse --graphics spice \
--vcpus=2 --ram=2048 \
--location=http://example1.com/installation_tree/RHEL6.0-Server-x86_64/os \
--nonetworks \
--os-type=linux \
--os-variant=rhel6 \
--host-device=pci_0000_01_00_0
3. Complete the installation
Complete the guest installation. The PCI device should be attached to the guest.
Note
This action cannot be performed when the guest virtual machine is running. You must add the
PCI device to a guest virtual machine that is shut down.
devices). Due to limitations in standard single-port PCI Ethernet card driver design, only SR-IOV
(Single Root I/O Virtualization) virtual function (VF) devices can be assigned in this manner; to
assign a standard single-port PCI or PCIe Ethernet card to a guest, use the traditional <hostdev>
device definition.
To use VFIO device assignment rather than traditional/legacy KVM device assignment (VFIO is a new
method of device assignment that is compatible with UEFI Secure Boot), an <interface type='hostdev'>
interface can have an optional <driver> sub-element with a name attribute set to "vfio". To use legacy
KVM device assignment, you can set name to "kvm" (or simply omit the <driver> element, since
<driver name='kvm'> is currently the default).
Note
Intelligent passthrough of network devices is very similar to the functionality of a standard
<hostdev> device, the difference being that this method allows specifying a MAC address
and <virtualport> for the passed-through device. If these capabilities are not required, if
you have a standard single-port PCI, PCIe, or USB network card that does not support SR-IOV
(and hence would anyway lose the configured MAC address during reset after being assigned
to the guest domain), or if you are using a version of libvirt older than 0.9.11, you should use
standard <hostdev> to assign the device to the guest instead of <interface type='hostdev'/>.
<devices>
  <interface type='hostdev'>
    <driver name='vfio'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </source>
    <mac address='52:54:00:6d:90:02'/>
    <virtualport type='802.1Qbh'>
      <parameters profileid='finance'/>
    </virtualport>
  </interface>
</devices>
Figure 9.6. XML example for PCI device assignment
address prior to assigning the VF to the host physical machine, and you would need to set this each
and every time the guest virtual machine boots. In order to assign this MAC address as well as other
options, refer to the procedure described in Procedure 9.8, Configuring MAC addresses, vLAN, and
virtual ports for assigning PCI devices on SR-IOV.
Procedure 9.8. Configuring MAC addresses, vLAN, and virtual ports for assigning PCI devices on SR-IOV
It is important to note that the <hostdev> element cannot be used for function-specific items like
MAC address assignment, vLAN tag ID assignment, or virtual port assignment, because the <mac>,
<vlan>, and <virtualport> elements are not valid children for <hostdev>. As they are valid for
<interface>, support for a new interface type was added (<interface type='hostdev'>).
This new interface device type behaves as a hybrid of an <interface> and <hostdev>. Thus,
before assigning the PCI device to the guest virtual machine, libvirt initializes the network-specific
hardware/switch that is indicated (such as setting the MAC address, setting a vLAN tag, and/or
associating with an 802.1Qbh switch) in the guest virtual machine's XML configuration file. For
information on setting the vLAN tag, refer to Section 18.14, Setting vLAN Tags.
1. Shut down the guest virtual machine
Using the virsh shutdown command (refer to Section 14.9.1, Shutting Down a Guest Virtual
Machine), shut down the guest virtual machine named guestVM.
# virsh shutdown guestVM
2. Gather information
In order to use <interface type='hostdev'>, you must have an SR-IOV-capable
network card, host physical machine hardware that supports either the Intel VT-d or AMD
IOMMU extensions, and you must know the PCI address of the VF that you wish to assign.
3. Open the XML file for editing
Run the # virsh save-image-edit command to open the XML file for editing (refer to
Section 14.8.10, Edit Domain XML Configuration Files for more information). As you would
want to restore the guest virtual machine to its former running state, the --running option would be
used in this case. The name of the configuration file in this example is guestVM.xml, as the
name of the guest virtual machine is guestVM.
# virsh save-image-edit guestVM.xml --running
The guestVM.xml opens in your default editor.
4. Edit the XML file
Update the configuration file (guestVM.xml) to have a <devices> entry similar to the
following:
<devices>
  ...
  <interface type='hostdev' managed='yes'>
    <source>
      <address type='pci' domain='0x0' bus='0x00' slot='0x07' function='0x0'/>
    </source>
  </interface>
  ...
</devices>
9.1.8. Setting PCI Device Assignment from a Pool of SR-IOV Virtual Functions
Hard-coding the PCI addresses of particular Virtual Functions (VFs) into a guest's configuration has
two serious limitations:
The specified VF must be available any time the guest virtual machine is started, implying that the
administrator must permanently assign each VF to a single guest virtual machine (or modify the
configuration file for every guest virtual machine to specify a currently unused VF's PCI address
each time every guest virtual machine is started).
If the guest virtual machine is moved to another host physical machine, that host physical
machine must have exactly the same hardware in the same location on the PCI bus (or, again, the
guest virtual machine configuration must be modified prior to start).
It is possible to avoid both of these problems by creating a libvirt network with a device pool
containing all the VFs of an SR-IOV device. Once that is done you would configure the guest virtual
machine to reference this network. Each time the guest is started, a single VF will be allocated from
the pool and assigned to the guest virtual machine. When the guest virtual machine is stopped, the
VF will be returned to the pool for use by another guest virtual machine.
<network>
  <name>passthrough</name>
  <!-- This is the name of the file you created -->
  <forward mode='hostdev' managed='yes'>
    <pf dev='myNetDevName'/>
    <!-- Use the netdev name of your SR-IOV device's PF here -->
  </forward>
</network>
<interface type='network'>
  <source network='passthrough'/>
</interface>
<network connections='1'>
  <name>passthrough</name>
  <uuid>a6b49429-d353-d7ad-3185-4451cc786437</uuid>
  <forward mode='hostdev' managed='yes'>
    <pf dev='eth3'/>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x1'/>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x3'/>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x5'/>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x7'/>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x11' function='0x1'/>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x11' function='0x3'/>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x11' function='0x5'/>
  </forward>
</network>
Note
virt-manager should not be used for hot plugging or hot unplugging devices. If you want
to hot plug or hot unplug a USB device, refer to Procedure 14.1, Hotplugging USB devices
for use by the guest virtual machine.
Using USB redirection: USB redirection is best used in cases where there is a host physical
machine that is running in a data center. The user connects to his or her guest virtual machine from
a local machine or thin client. On this local machine there is a SPICE client. The user can attach
any USB device to the thin client, and the SPICE client will redirect the device to the host physical
machine in the data center, so it can be used by the guest virtual machine the user is connected to
from the thin client. For instructions on USB redirection using virt-manager, refer to Section 15.3.1,
Attaching USB Devices to a Guest Virtual Machine. It should be noted that USB redirection is not
possible using the TCP protocol (refer to BZ #1085318).
Important
If a device matches none of the rule filters, redirecting it will not be allowed!
Example 9.1. An example of limiting redirection with a Windows guest virtual machine
1. Prepare a Windows 7 guest virtual machine.
2. Add the following code excerpt to the guest virtual machine's domain XML file:
...
<devices>
<controller type='ide' index='0'/>
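A sketch of the USB redirection elements such an excerpt typically includes follows; the class, vendor, product, and version values shown here are hypothetical, and should be adjusted to the device class you want to allow:

<redirdev bus='usb' type='spicevmc'>
  <alias name='redir0'/>
</redirdev>
<redirfilter>
  <usbdev class='0x08' vendor='0x1234' product='0xbeef' version='2.00' allow='yes'/>
  <usbdev allow='no'/>
</redirfilter>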
piix4-uhci
ehci
ich9-ehci1
ich9-uhci1
ich9-uhci2
ich9-uhci3
vt82c686b-uhci
pci-ohci
nec-xhci
Note
If the USB bus needs to be explicitly disabled for the guest virtual machine,
model='none' may be used.
For controllers that are themselves devices on a PCI or USB bus, an optional sub-element
<address> can specify the exact relationship of the controller to its master bus, with semantics as
shown in Section 9.4, Setting Addresses for Devices.
An optional sub-element <driver> can specify driver-specific options. Currently it only supports
the queues attribute, which specifies the number of queues for the controller. For best performance, it is
recommended to specify a value matching the number of vCPUs.
USB companion controllers have an optional sub-element <master> to specify the exact
relationship of the companion to its master controller. A companion controller is on the same bus as
its master, so the companion index value should be equal.
An example XML which can be used is as follows:
...
<devices>
  <controller type='usb' index='0' model='ich9-ehci1'>
    <address type='pci' domain='0' bus='0' slot='4' function='7'/>
  </controller>
  <controller type='usb' index='0' model='ich9-uhci1'>
    <master startport='0'/>
    <address type='pci' domain='0' bus='0' slot='4' function='0' multifunction='on'/>
  </controller>
  ...
</devices>
...
...
<devices>
  <controller type='pci' index='0' model='pci-root'/>
  <controller type='pci' index='1' model='pci-bridge'>
    <address type='pci' domain='0' bus='0' slot='5' function='0' multifunction='off'/>
  </controller>
</devices>
...
Figure 9.13. Domain XML example for PCI bridge
For machine types which provide an implicit PCI Express (PCIe) bus (for example, the machine types
based on the Q35 chipset), the pcie-root controller with index='0' is auto-added to the domain's
configuration. pcie-root also has no address, but provides 31 slots (numbered 1-31) that can only be
used to attach PCIe devices. In order to connect standard PCI devices on a system which has a
pcie-root controller, a pci controller with model='dmi-to-pci-bridge' is automatically added. A
dmi-to-pci-bridge controller plugs into a PCIe slot (as provided by pcie-root), and itself provides 31
standard PCI slots (which are not hot-pluggable). In order to have hot-pluggable PCI slots in the
guest system, a pci-bridge controller will also be automatically created and connected to one of the
slots of the auto-created dmi-to-pci-bridge controller; all guest devices with PCI addresses that are
auto-determined by libvirt will be placed on this pci-bridge device.
...
<devices>
  <controller type='pci' index='0' model='pcie-root'/>
  <controller type='pci' index='1' model='dmi-to-pci-bridge'>
    <address type='pci' domain='0' bus='0' slot='0xe' function='0'/>
  </controller>
  <controller type='pci' index='2' model='pci-bridge'>
    <address type='pci' domain='0' bus='1' slot='1' function='0'/>
  </controller>
</devices>
...
Address type           Description
type='pci'             PCI addresses have the additional attributes domain, bus, slot, and function.
type='drive'           Drive addresses have the additional attributes controller, bus, target, and unit.
type='virtio-serial'   Virtio-serial addresses have the additional attributes controller, bus, and slot.
type='ccid'            CCID addresses, used for smart cards, have the additional attributes bus and slot.
type='usb'             USB addresses have the additional attributes bus and port.
type='isa'             ISA addresses have the additional attributes iobase and irq.
machine at the location /dev/hwrng. This chardev can then be opened and read to fetch entropy
from the host physical machine. In order for guest virtual machines' applications to benefit from
using randomness from the virtio-rng device transparently, the input from /dev/hwrng must be
relayed to the kernel entropy pool in the guest virtual machine. This can be accomplished if the
information in this location is coupled with the rngd daemon (contained within the rng-tools).
This coupling results in the entropy being routed to the guest virtual machine's /dev/random file.
The process is done manually in Red Hat Enterprise Linux 6 guest virtual machines.
Red Hat Enterprise Linux 6 guest virtual machines are coupled by running the following command:
# rngd -b -r /dev/hwrng -o /dev/random
For more assistance, run the man rngd command for an explanation of the command options
shown here. For further examples, refer to Procedure 9.11, Implementing virtio-rng with the command
line tools for configuring the virtio-rng device.
Note
Windows guest virtual machines require the viorng driver to be installed. Once installed, the
virtual RNG device will work using the CNG (crypto next generation) API provided by Microsoft.
Once the driver is installed, the virtrng device appears in the list of RNG providers.
...
<devices>
  <rng model='virtio'>
    <rate period='2000' bytes='1234'/>
    <backend model='random'>/dev/random</backend>
    <backend model='egd' type='udp'>
      <source mode='bind' service='1234'/>
      <source mode='connect' host='192.0.2.1' service='1234'/>
    </backend>
  </rng>
</devices>
...
Note
Only the qcow2 and vdi formats support consistency checks.
Using the -r option tries to repair any inconsistencies found during the check: when used with
-r leaks, cluster leaks are repaired, and when used with -r all, all kinds of errors are fixed. Note
that this has a risk of choosing the wrong fix or hiding corruption issues that may have already
occurred.
Commit
Commits any changes recorded in the specified file (filename) to the file's base image with the
qemu-img commit command. Optionally, specify the file's format type (format).
# qemu-img commit [-f format] [-t cache] filename
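For instance, to commit the changes recorded in a hypothetical overlay image back into its base image:

# qemu-img commit -f qcow2 overlay.qcow2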
Convert
The convert option is used to convert one recognized image format to another image format.
Command format:
# qemu-img convert [-c] [-p] [-f format] [-t cache] [-O output_format] [-o options] [-S sparse_size] filename output_filename
The -p parameter shows the progress of the command (optional and not for every command), and
the -S option allows for the creation of a sparse file, which is included within the disk image. Sparse
files, for all practical purposes, function like a standard file, except that physical blocks containing
only zeros are not actually allocated on disk. When the operating system sees this file, it treats the
file as existing at its full logical size, even though in reality it takes very little disk space. This is
particularly helpful when creating a disk for a guest virtual machine, as it gives the appearance that
the disk has taken much more disk space than it has. For example, if you set -S to 50GB on a disk
image that is 10GB, then your 10GB of disk space will appear to be 60GB in size even though only
10GB is actually being used.
Convert the disk image filename to disk image output_filename using format output_format.
The disk image can be optionally compressed with the -c option, or encrypted with the -o option by
setting -o encryption. Note that the options available with the -o parameter differ with the selected
format.
Only the qcow2 format supports encryption or compression. qcow2 encryption uses the AES format
with secure 128-bit keys. qcow2 compression is read-only, so if a compressed sector is converted
from qcow2 format, it is written to the new format as uncompressed data.
Image conversion is also useful to get a smaller image when using a format which can grow, such as
qcow or cow. The empty sectors are detected and suppressed from the destination image.
Create
Create the new disk image filename of size size and format format.
# qemu-img create [-f format] [-o options] filename [size][preallocation]
If a base image is specified with -o backing_file=filename, the image will only record
differences between itself and the base image. The backing file will not be modified unless you use
the commit command. No size needs to be specified in this case.
Preallocation is an option that may only be used when creating qcow2 images. Accepted values
include -o preallocation=off|metadata|full|falloc. Images with preallocated metadata are
larger than images without. However, in cases where the image size increases, performance will
improve as the image grows.
It should be noted that using full allocation can take a long time with large images. In cases where
you want full allocation but time is of the essence, using falloc will save you time.
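As an illustration (the file names and sizes here are hypothetical), the following creates a 10GB qcow2 image with preallocated metadata, and then an overlay image that records only the differences from it:

# qemu-img create -f qcow2 -o preallocation=metadata /var/lib/libvirt/images/guest1.qcow2 10G
# qemu-img create -f qcow2 -o backing_file=/var/lib/libvirt/images/guest1.qcow2 /var/lib/libvirt/images/guest1-overlay.qcow2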
Info
The info parameter displays information about a disk image filename. The format for the info
option is as follows:
# qemu-img info [-f format] filename
This command is often used to discover the size reserved on disk, which can be different from the
displayed size. If snapshots are stored in the disk image, they are also displayed. This command will
show, for example, how much space is being taken by a qcow2 image on a block device. You can
check that the image in use is the one that matches the output of the qemu-img info command
with the qemu-img check command. Refer to Section 10.1, Using qemu-img.
# qemu-img info /dev/vg-90.100-sluo/lv-90-100-sluo
image: /dev/vg-90.100-sluo/lv-90-100-sluo
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 0
cluster_size: 65536
Map
The # qemu-img map [-f format] [--output=output_format] filename command
dumps the metadata of the image filename and its backing file chain. Specifically, this command
dumps the allocation state of every sector of a specified file, together with the topmost file that
allocates it in the backing file chain. For example, in a chain of c.qcow2, b.qcow2, and a.qcow2,
a.qcow2 is the original file, b.qcow2 records the changes made on top of a.qcow2, and c.qcow2 is
the delta file on top of b.qcow2. When this chain is created, the image files store the normal image
data, plus information about what is in which file and where it is located within the file. This
information is referred to as the image's metadata. The -f format option is the format of the specified
image file. Formats such as raw, qcow2, vhdx, and vmdk may be used. There are two output options
possible: human and json.
human is the default setting. It is designed to be more readable to the human eye, and as such,
this format should not be parsed. For clarity and simplicity, the default human format only dumps
known-nonzero areas of the file. Known-zero parts of the file are omitted altogether, and likewise
for parts that are not allocated throughout the chain. When the command is executed, qemu-img
output will identify a file from which the data can be read, and the offset in the file. The output is
displayed as a table with four columns, the first three of which are hexadecimal numbers.
# qemu-img map -f qcow2 --output=human /tmp/test.qcow2
Offset          Length          Mapped to       File
0               0x20000         0x50000         /tmp/test.qcow2
0x100000        0x80000         0x70000         /tmp/test.qcow2
0x200000        0x1f0000        0xf0000         /tmp/test.qcow2
0x3c00000       0x20000         0x2e0000        /tmp/test.qcow2
0x3fd0000       0x10000         0x300000        /tmp/test.qcow2
json, or JSON (JavaScript Object Notation), is readable by humans, but as a machine-readable
notation, it is also designed to be parsed. For example, if you want to parse the output of
qemu-img map in a parser, then you should use the option --output=json.
# qemu-img map -f qcow2 --output=json /tmp/test.qcow2
[{ "start": 0, "length": 131072, "depth": 0, "zero": false, "data":
true, "offset": 327680},
{ "start": 131072, "length": 917504, "depth": 0, "zero": true, "data":
false},
For more information on the JSON format, refer to the qemu-img(1) man page.
Rebase
Changes the backing file of an image.
# qemu-img rebase [-f format] [-t cache] [-p] [-u] -b backing_file [-F backing_format] filename
The backing file is changed to backing_file and (if the format of filename supports the feature), the
backing file format is changed to backing_format.
Note
Only the qcow2 format supports changing the backing file (rebase).
There are two different modes in which rebase can operate: safe and unsafe.
Safe mode is used by default and performs a real rebase operation. The new backing file may differ
from the old one, and the qemu-img rebase command will take care of keeping the guest virtual
machine-visible content of filename unchanged. In order to achieve this, any clusters that differ
between backing_file and the old backing file of filename are merged into filename before making any
changes to the backing file.
Note that safe mode is an expensive operation, comparable to converting an image. The old backing
file is required for it to complete successfully.
Unsafe mode is used if the -u option is passed to qemu-img rebase. In this mode, only the
backing file name and format of filename is changed, without any checks taking place on the file
contents. Make sure the new backing file is specified correctly or the guest-visible content of the
image will be corrupted.
This mode is useful for renaming or moving the backing file. It can be used without an accessible old
backing file. For instance, it can be used to fix an image whose backing file has already been moved
or renamed.
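For example (the file names here are hypothetical), after moving a base image to a new path, an overlay can be pointed at it without content checks using unsafe mode:

# qemu-img rebase -u -b /new/path/base.qcow2 overlay.qcow2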
Resize
Change the disk image filename as if it had been created with size size. Only images in raw format
can be resized in both directions, regardless of version. Red Hat Enterprise Linux 6.1 and later adds
the ability to grow (but not shrink) images in qcow2 format.
Use the following to set the size of the disk image filename to size bytes:
# qemu-img resize filename size
You can also resize relative to the current size of the disk image. To give a size relative to the current
size, prefix the number of bytes with + to grow, or - to reduce the size of the disk image by that
number of bytes. Adding a unit suffix allows you to set the image size in kilobytes (K), megabytes (M),
gigabytes (G) or terabytes (T).
# qemu-img resize filename [+|-]size[K|M|G|T]
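For instance, to grow a hypothetical raw image by 10 gigabytes:

# qemu-img resize /var/lib/libvirt/images/guest1.img +10G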
Warning
Before using this command to shrink a disk image, you must use file system and partitioning
tools inside the VM itself to reduce allocated file systems and partition sizes accordingly.
Failure to do so will result in data loss.
After using this command to grow a disk image, you must use file system and partitioning tools
inside the VM to actually begin using the new space on the device.
Snapshot
List, apply, create, or delete an existing snapshot (snapshot) of an image (filename).
# qemu-img snapshot [ -l | -a snapshot | -c snapshot | -d snapshot ] filename
The accepted arguments are as follows: -l lists all snapshots associated with the specified disk
image, -a snapshot reverts the disk image to the specified snapshot, -c snapshot creates a
snapshot of the image, and -d snapshot deletes the specified snapshot.
Supported Formats
qemu-img is designed to convert files to one of the following formats:
raw
Raw disk image format (default). This can be the fastest file-based format. If your file system
supports holes (for example, ext2 or ext3 on Linux or NTFS on Windows), then only the
written sectors will reserve space. Use qemu-img info to obtain the real size used by the
image, or ls -ls on Unix/Linux. Although raw images give optimal performance, only very
basic features are available with a raw image (no snapshots, and so on).
qcow2
QEMU image format, the most versatile format with the best feature set. Use it to have
optional AES encryption, zlib-based compression, support of multiple VM snapshots, and
smaller images, which are useful on file systems that do not support holes (non-NTFS file
systems on Windows). Note that this expansive feature set comes at the cost of
performance.
Although only the formats above can be used to run on a guest virtual machine or host physical
machine, qemu-img also recognizes and supports the following formats in order to convert
from them into either raw or qcow2 format. The format of an image is usually detected automatically.
In addition to converting these formats into raw or qcow2, they can be converted back from raw or
qcow2 to the original format.
bochs
Bochs disk image format.
cloop
Linux Compressed Loop image, useful only to directly reuse compressed CD-ROM images
present, for example, in the Knoppix CD-ROMs.
cow
User Mode Linux Copy On Write image format. The cow format is included only for
compatibility with previous versions. It does not work with Windows.
dmg
Mac disk image format.
nbd
Network block device.
parallels
Parallels virtualization disk image format.
Important
Note that it is only safe to rely on the guest agent when run by trusted guests. An untrusted
guest may maliciously ignore or abuse the guest agent protocol, and although built-in
safeguards exist to prevent a denial of service attack on the host, the host requires guest
cooperation for operations to run as expected.
Note that the QEMU guest agent can be used to enable and disable virtual CPUs (vCPUs) while the
guest is running, thus adjusting the number of vCPUs without using the hot plug and hot unplug
features. Refer to Section 14.13.6, Configuring Virtual CPU Count for more information.
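For instance (a sketch, with guest1 standing in for your domain name), the agent-based method can be invoked with the --guest option of virsh setvcpus:

# virsh setvcpus guest1 2 --guest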
10.2.2. Set t ing up Communicat ion bet ween Guest Agent and Host
The host machine communicates with the guest agent through a VirtIO serial connection between the
host and guest machines. A VirtIO serial channel is connected to the host via a character device
driver (typically a Unix socket), and the guest listens on this serial channel. The following procedure
shows how to set up the host and guest machines for guest agent use.
Note
For instructions on how to set up the QEMU guest agent on Windows guests, refer to the
instructions found here.
<channel type='unix'>
<source mode='bind' path='/var/lib/libvirt/qemu/rhel6.agent'/>
<target type='virtio' name='org.qemu.guest_agent.0'/>
</channel>
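Once the channel is defined and the qemu-guest-agent service is running inside the guest, communication can be verified from the host. A minimal check (a sketch, assuming a domain named rhel6 to match the channel path above) is the guest-ping command, which should return an empty result on success:

# virsh qemu-agent-command rhel6 '{"execute":"guest-ping"}'
{"return":{}}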
If the virtio-serial device resets and qemu-guest-agent has not connected to the channel (generally
caused by a reboot or hotplug), data from the client will be dropped.
If qemu-guest-agent has connected to the channel following a virtio-serial device reset, data from
the client will be queued (and eventually throttled if available buffers are exhausted), regardless of
whether or not qemu-guest-agent is still running or connected.
Note
An application-specific hook script might need various SELinux permissions in order to run
correctly, as is done when the script needs to connect to a socket in order to talk to a
database. In general, local SELinux policies should be developed and installed for such
purposes. Accessing file system nodes should work out of the box, after issuing the
restorecon -FvvR command listed in Table 10.1, QEMU guest agent package contents in
the table row labeled /etc/qemu-ga/fsfreeze-hook.d/.
The qemu-guest-agent binary RPM includes the following files:
Table 10.1. QEMU guest agent package contents
File name                      Description
/etc/rc.d/init.d/qemu-ga       Service control script (start/stop) for the QEMU guest agent.
The main hook script, /usr/libexec/qemu-ga/fsfreeze-hook, logs its own messages, as well
as the application-specific script's standard output and error messages, in the following log file:
/var/log/qemu-ga/fsfreeze-hook.log. For more information, refer to the qemu-guest-agent
wiki page at wiki.qemu.org or libvirt.org.
Note
Windows guest virtual machines require the QEMU guest agent package for Windows,
qemu-guest-agent-win. This agent is required for VSS (Volume Shadow Copy Service) support for
Windows guest virtual machines running on Red Hat Enterprise Linux. More information can
be found here.
Add the following elements to the XML file using the # virsh edit win7x86 command and
save the changes. Note that the source socket name must be unique on the host, named
win7x86.agent in this example:
...
<channel type='unix'>
<source mode='bind'
path='/var/lib/libvirt/qemu/win7x86.agent'/>
<target type='virtio' name='org.qemu.guest_agent.0'/>
<address type='virtio-serial' controller='0' bus='0'
port='1'/>
</channel>
<channel type='spicevmc'>
<target type='virtio' name='com.redhat.spice.0'/>
<address type='virtio-serial' controller='0' bus='0'
port='2'/>
</channel>
...
total 1544
-rw-r--r--. 1 root root 856064 Oct 23 04:58 qemu-ga-x64.msi
-rw-r--r--. 1 root root 724992 Oct 23 04:58 qemu-ga-x86.msi
Example 10.1. Limiting redirection with a Windows guest virtual machine
1. Prepare a Windows 7 guest virtual machine.
2. Add the following code excerpt to the guest virtual machine's XML file:
<redirdev bus='usb' type='spicevmc'>
<alias name='redir0'/>
bridge name     bridge id               STP enabled     interfaces
virbr0          8000.5254007da9f2       yes             virbr0-nic
                                                        vnet0
virbr1          8000.525400682996       yes             virbr1-nic
4. Update the guest virtual machine's network with the new interface parameters with the
following command:
# virsh update-device test1 br1.xml
Device updated successfully
5. On the guest virtual machine, run service network restart. The guest virtual machine
gets a new IP address for virbr1. Check that the guest virtual machine's vnet0 is connected to
the new bridge (virbr1):
# brctl show
bridge name     bridge id               STP enabled     interfaces
virbr0          8000.5254007da9f2       yes             virbr0-nic
virbr1          8000.525400682996       yes             virbr1-nic
                                                        vnet0
Note
Multi-path storage pools should not be created or used as they are not fully supported.
11.2. Volumes
Storage pools are divided into storage volumes. Storage volumes are an abstraction of physical
partitions, LVM logical volumes, file-based disk images and other storage types handled by libvirt.
Storage volumes are presented to guest virtual machines as local storage devices regardless of the
underlying hardware.
Referencing Volumes
To reference a specific volume, three approaches are possible:
The name of the volume and the storage pool
A volume may be referred to by name, along with an identifier for the storage pool it belongs
in. On the virsh command line, this takes the form --pool storage_pool volume_name.
For example, a volume named firstimage in the guest_images pool.
# virsh vol-info --pool guest_images firstimage
Name:           firstimage
Type:           block
Capacity:       20.00 GB
Allocation:     20.00 GB

virsh #
The full path to the storage on the host physical machine system
A volume may also be referred to by its full path on the file system. When using this
approach, a pool identifier does not need to be included.
For example, a volume named secondimage.img, visible to the host physical machine system
as /images/secondimage.img. The image can be referred to as /images/secondimage.img.
# virsh vol-info /images/secondimage.img
Name:           secondimage.img
Type:           file
Capacity:       20.00 GB
Allocation:     136.00 kB
The unique volume key
When a volume is first created in the virtualization system, a unique identifier is generated
and assigned to it. The unique identifier is termed the volume key. The format of this volume
key varies depending on the storage used.
When used with block-based storage such as LVM, the volume key may follow this format:
c3pKz4-qPVc-Xf7M-7WNM-WJc8-qSiz-mtvpGn
When used with file-based storage, the volume key may instead be a copy of the full path to
the volume storage.
/images/secondimage.img
For example, a volume with the volume key of Wlvnf7-a4a3-Tlje-lJDa-9eak-PZBv-LoZuUr:
# virsh vol-info Wlvnf7-a4a3-Tlje-lJDa-9eak-PZBv-LoZuUr
Name:           firstimage
Type:           block
Capacity:       20.00 GB
Allocation:     20.00 GB
virsh provides commands for converting between a volume name, volume path, or volume key:
vol-name
Returns the volume name when provided with a volume path or volume key.
# virsh vol-name /dev/guest_images/firstimage
firstimage
# virsh vol-name Wlvnf7-a4a3-Tlje-lJDa-9eak-PZBv-LoZuUr
vol-path
Returns the volume path when provided with a volume key, or a storage pool identifier and
volume name.
# virsh vol-path Wlvnf7-a4a3-Tlje-lJDa-9eak-PZBv-LoZuUr
/dev/guest_images/firstimage
# virsh vol-path --pool guest_images firstimage
/dev/guest_images/firstimage
The vol-key command
Returns the volume key when provided with a volume path, or a storage pool identifier and
volume name.
# virsh vol-key /dev/guest_images/firstimage
Wlvnf7-a4a3-Tlje-lJDa-9eak-PZBv-LoZuUr
# virsh vol-key --pool guest_images firstimage
Wlvnf7-a4a3-Tlje-lJDa-9eak-PZBv-LoZuUr
Note
Storage pools and volumes are not required for the proper operation of guest virtual
machines. Pools and volumes provide a way for libvirt to ensure that a particular piece of
storage will be available for a guest virtual machine, but some administrators will prefer to
manage their own storage and guest virtual machines will operate properly without any pools
or volumes defined. On systems that do not use pools, system administrators must ensure the
availability of the guest virtual machines' storage using whatever tools they prefer, for
example, adding the NFS share to the host physical machine's fstab so that the share is
mounted at boot time.
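For example (the server name and export path here are hypothetical), such an fstab entry might look like:

nfs.example.com:/exports/vmimages  /var/lib/libvirt/images  nfs  defaults  0 0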
Warning
Guests should not be given write access to whole disks or block devices (for example,
/dev/sdb). Use partitions (for example, /dev/sdb1) or LVM volumes.
If you pass an entire block device to the guest, the guest will likely partition it or create its own
LVM groups on it. This can cause the host physical machine to detect these partitions or LVM
groups and cause errors.
Warning
Dedicating a disk to a storage pool will reformat and erase all data presently stored on the
disk device. It is strongly recommended to back up the storage device before commencing with
the following procedure.
3. Attach the device
Add the storage pool definition using the virsh pool-define command with the XML
configuration file created in the previous step.
# virsh pool-define ~/guest_images_disk.xml
Pool guest_images_disk defined from /root/guest_images_disk.xml
# virsh pool-list --all
Name                 State      Autostart
-----------------------------------------
default              active     yes
guest_images_disk    inactive   no
4. Start the storage pool
Start the storage pool with the virsh pool-start command. Verify the pool is started with
the virsh pool-list --all command.
# virsh pool-start guest_images_disk
Pool guest_images_disk started
# virsh pool-list --all
Name                 State      Autostart
-----------------------------------------
default              active     yes
guest_images_disk    active     no
5. Turn on autostart
Turn on autostart for the storage pool. Autostart configures the libvirtd service to start
the storage pool when the service starts.
# virsh pool-autostart guest_images_disk
Pool guest_images_disk marked as autostarted
# virsh pool-list --all
Name                 State      Autostart
-----------------------------------------
default              active     yes
guest_images_disk    active     yes
6. Verify the storage pool configuration
Verify the storage pool was created correctly, the sizes reported correctly, and the state
reports as running.
# virsh pool-info guest_images_disk
Name:           guest_images_disk
UUID:           551a67c8-5f2a-012c-3844-df29b167431c
State:          running
Capacity:       465.76 GB
Allocation:     0.00
Available:      465.76 GB
# ls -la /dev/sdb
brw-rw----. 1 root disk 8, 16 May 30 14:08 /dev/sdb
# virsh vol-list guest_images_disk
Name                 Path
-----------------------------------------
7. Optional: Remove the temporary configuration file
Remove the temporary storage pool XML configuration file if it is not needed.
# rm ~/guest_images_disk.xml
A disk-based storage pool is now available.
12.2.1. Creating a Partition-based Storage Pool Using virt-manager
This procedure creates a new storage pool using a partition of a storage device.
Procedure 12.1. Creating a partition-based storage pool with virt-manager
1. Open the storage pool settings
a. In the virt-manager graphical interface, select the host physical machine from the
main window.
Open the Edit menu and select Connection Details.
The new storage pool appears in the storage list on the left after a few seconds. Verify the size
is reported as expected, 458.20 GB Free in this example. Verify the State field reports the new
storage pool as Active.
Select the storage pool. In the Autostart field, click the On Boot checkbox. This will make
sure the storage device starts whenever the libvirtd service starts.
Warning
Do not use this procedure to assign an entire disk as a storage pool (for example,
/dev/sdb). Guests should not be given write access to whole disks or block devices. Only
use this method to assign partitions (for example, /dev/sdb1) to storage pools.
Procedure 12.2. Creating pre-formatted block device storage pools using virsh
1. Create the storage pool definition
Use the virsh pool-define-as command to create a new storage pool definition. There are
three options that must be provided to define a pre-formatted disk as a storage pool:
Partition name
The name parameter determines the name of the storage pool. This example uses
the name guest_images_fs.
device
The device parameter with the path attribute specifies the device path of the
storage device. This example uses the partition /dev/sdc1.
mount point
The mountpoint parameter specifies the path on the local file system where the formatted
device will be mounted. If the mount point directory does not exist, the virsh command can
create the directory.
The directory /guest_images is used in this example.
# virsh pool-define-as guest_images_fs fs - - /dev/sdc1 "/guest_images"
Pool guest_images_fs defined
The new pool and mount points are now created.
2. Verify the new pool
List the present storage pools.
# virsh pool-list --all
Name                 State      Autostart
-----------------------------------------
default              active     yes
guest_images_fs      inactive   no
3. Create the mount point
Use the virsh pool-build command to create a mount point for a pre-formatted file
system storage pool.
# virsh pool-build guest_images_fs
Pool guest_images_fs built
# ls -la /guest_images
total 8
drwx------.  2 root root 4096 May 31 19:38 .
dr-xr-xr-x. 25 root root 4096 May 31 19:38 ..
# virsh pool-list --all
Name                 State      Autostart
-----------------------------------------
default              active     yes
guest_images_fs      inactive   no
4. Start the storage pool
Use the virsh pool-start command to mount the file system onto the mount point and
make the pool available for use.
# virsh pool-start guest_images_fs
Pool guest_images_fs started
# virsh pool-list --all
Name                 State      Autostart
-----------------------------------------
default              active     yes
guest_images_fs      active     no
5. Turn on autostart
By default, a storage pool defined with virsh is not set to automatically start each time
libvirtd starts. To remedy this, enable the automatic start with the virsh pool-autostart
command. The storage pool is now automatically started each time libvirtd starts.
# virsh pool-autostart guest_images_fs
Pool guest_images_fs marked as autostarted
# virsh pool-list --all
Name                 State      Autostart
-----------------------------------------
default              active     yes
guest_images_fs      active     yes
6. Verify the storage pool
Verify the storage pool was created correctly, the sizes reported are as expected, and the state
is reported as running. Verify there is a "lost+found" directory in the mount point on the file
system, indicating the device is mounted.
# virsh pool-info guest_images_fs
Name:           guest_images_fs
UUID:           c7466869-e82a-a66c-2187-dc9d6f0877d0
State:          running
Persistent:     yes
Autostart:      yes
Capacity:       458.39 GB
Allocation:     197.91 MB
Available:      458.20 GB
# mount | grep /guest_images
/dev/sdc1 on /guest_images type ext4 (rw)
# ls -la /guest_images
total 24
drwxr-xr-x.  3 root root  4096 May 31 19:47 .
dr-xr-xr-x. 25 root root  4096 May 31 19:38 ..
drwx------.  2 root root 16384 May 31 14:18 lost+found
12.3.1. Creating a Directory-based Storage Pool with virt-manager
1. Create the local directory
a. Optional: Create a new directory for the storage pool
Create the directory on the host physical machine for the storage pool. This example
uses a directory named /guest_images.
# mkdir /guest_images
b. Set directory ownership
Change the user and group ownership of the directory. The directory must be owned
by the root user.
# chown root:root /guest_images
c. Set directory permissions
Change the file permissions of the directory.
# chmod 700 /guest_images
d. Verify the changes
Verify the permissions were modified. The output shows a correctly configured empty
directory.
# ls -la /guest_images
total 8
drwx------.  2 root root 4096 May 28 13:57 .
dr-xr-xr-x. 26 root root 4096 May 28 13:57 ..
2. Configure SELinux file contexts
Configure the correct SELinux context for the new directory. Note that the name of the pool
and the directory do not have to match. However, when you shut down the guest virtual
machine, libvirt has to set the context back to a default value. The context of the directory
determines what this default value is. It is worth explicitly labeling the directory virt_image_t,
so that when the guest virtual machine is shut down, the images get labeled 'virt_image_t' and
are thus isolated from other processes running on the host physical machine.
# semanage fcontext -a -t virt_image_t '/guest_images(/.*)?'
# restorecon -R /guest_images
3. Open the storage pool settings
a. In the virt-manager graphical interface, select the host physical machine from the
main window.
Open the Edit menu and select Connection Details.
# virsh pool-list --all
Name                 State      Autostart
-----------------------------------------
default              active     yes
guest_images         inactive   no
3. Create the local directory
Use the virsh pool-build command to build the directory-based storage pool for the
directory guest_images (for example), as shown:
# virsh pool-build guest_images
Pool guest_images built
# ls -la /guest_images
total 8
drwx------.  2 root root 4096 May 30 02:44 .
dr-xr-xr-x. 26 root root 4096 May 30 02:44 ..
# virsh pool-list --all
Name                 State      Autostart
-----------------------------------------
default              active     yes
guest_images         inactive   no
4. Start the storage pool
Use the virsh command pool-start to enable a directory storage pool, thereby allowing
volumes of the pool to be used as guest disk images.
# virsh pool-start guest_images
Pool guest_images started
# virsh pool-list --all
Name                 State      Autostart
-----------------------------------------
default              active     yes
guest_images         active     no
5. Turn on autostart
Turn on autostart for the storage pool. Autostart configures the libvirtd service to start
the storage pool when the service starts.
# virsh pool-autostart guest_images
Pool guest_images marked as autostarted
# virsh pool-list --all
Name                 State      Autostart
-----------------------------------------
default              active     yes
guest_images         active     yes
6. Verify the storage pool configuration
Verify the storage pool was created correctly, the size is reported correctly, and the state is
reported as running. If you want the pool to be accessible even if the guest virtual machine
is not running, make sure that Persistent is reported as yes. If you want the pool to start
automatically when the service starts, make sure that Autostart is reported as yes.
Note
Thin provisioning is currently not possible with LVM based storage pools.
Note
Please refer to the Red Hat Enterprise Linux Storage Administration Guide for more details on LVM.
Warning
LVM-based storage pools require a full disk partition. If activating a new partition/device with
these procedures, the partition will be formatted and all data will be erased. If using the host's
existing Volume Group (VG) nothing will be erased. It is recommended to back up the storage
device before commencing the following procedure.
12.4.1. Creating an LVM-based Storage Pool with virt-manager
LVM-based storage pools can use existing LVM volume groups or create new LVM volume groups on
a blank partition.
1. Optional: Create a new partition for LVM volumes
These steps describe how to create a new partition and LVM volume group on a new hard
disk drive.
Warning
This procedure will remove all data from the selected storage device.
a. Create a new partition
Use the fdisk command to create a new disk partition from the command line. The
following example creates a new partition that uses the entire disk on the storage
device /dev/sdb.
# fdisk /dev/sdb
Command (m for help):
Press n for a new partition.
b. Press p for a primary partition.
Command action
   e   extended
   p   primary partition (1-4)
c. Choose an available partition number. In this example, the first partition is chosen by
entering 1.
Partition number (1-4): 1
d. Enter the default first cylinder by pressing Enter.
LV       VG           Attr    LSize  Origin Snap%  Move Log Copy%  Convert
volume1  libvirt_lvm  -wi-a-  8.00g
volume2  libvirt_lvm  -wi-a-  8.00g
volume3  libvirt_lvm  -wi-a-  8.00g
Ensure that the /etc/tgt/targets.conf file contains the default-driver iscsi line
to set the driver type to iSCSI. The driver uses iSCSI by default.
Important
This example creates a globally accessible target without access control. Refer to the
scsi-target-utils documentation for information on implementing secure access.
View the new targets to ensure the setup was successful with the tgt-admin --show
command.
# tgt-admin --show
Target 1: iqn.2010-05.com.example.server1:iscsirhel6guest
System information:
    Driver: iscsi
    State: ready
I_T nexus information:
LUN information:
    LUN: 0
        Type: controller
        SCSI ID: IET 00010000
        SCSI SN: beaf10
        Size: 0 MB
        Online: Yes
        Removable media: No
        Backing store type: rdwr
        Backing store path: None
    LUN: 1
        Type: disk
        SCSI ID: IET 00010001
        SCSI SN: beaf11
        Size: 20000 MB
        Online: Yes
        Removable media: No
        Backing store type: rdwr
        Backing store path: /dev/virtstore/virtimage1
    LUN: 2
        Type: disk
        SCSI ID: IET 00010002
        SCSI SN: beaf12
        Size: 10000 MB
        Online: Yes
        Removable media: No
        Backing store type: rdwr
        Backing store path: /var/lib/tgtd/virtualization/virtimage2.img
Account information:
ACL information:
    ALL
Warning
The ACL list is set to all. This allows all systems on the local network to access this
device. It is recommended to set host physical machine access ACLs for production
environments.
Important
Refer to the Red Hat Storage Administration Guide for additional information.
12.8. Using an NPIV Virt ual Adapt er (vHBA) wit h SCSI Devices
NPIV (N_Port ID Virtualization) is a software technology that allows sharing of a single physical
Fibre Channel host bus adapter (HBA).
This allows multiple guests to see the same storage from multiple physical hosts, and thus allows for
easier migration paths for the storage. As a result, there is no need for the migration to create or copy
storage, as long as the correct storage path is specified.
In virtualization, the virtual host bus adapter, or vHBA, controls the LUNs for virtual machines. Each
vHBA is identified by its own WWNN (World Wide Node Name) and WWPN (World Wide Port Name).
The path to the storage is determined by the WWNN and WWPN values.
This section provides instructions for configuring a vHBA on a virtual machine. Note that Red Hat
Enterprise Linux 6 does not support persistent vHBA configuration across host reboots; verify any
vHBA-related settings following a host reboot.
<vports>0</vports>
</capability>
</capability>
</device>
In this example, the <max_vports> value shows there are a total of 127 virtual ports available
for use in the HBA configuration. The <vports> value shows the number of virtual ports
currently being used. These values update after creating a vHBA.
3. Create a vHBA host device
Create an XML file similar to the following (in this example, named vhba_host3.xml) for the
vHBA host.
# cat vhba_host3.xml
<device>
  <parent>scsi_host3</parent>
  <capability type='scsi_host'>
    <capability type='fc_host'>
    </capability>
  </capability>
</device>
The <parent> field specifies the HBA device to associate with this vHBA device. The details
in the <device> tag are used in the next step to create a new vHBA device for the host. See
http://libvirt.org/formatnode.html for more information on the nodedev XML format.
4. Create a new vHBA on the vHBA host device
To create a vHBA on vhba_host3, use the virsh nodedev-create command:
# virsh nodedev-create vhba_host3.xml
Node device scsi_host5 created from vhba_host3.xml
5. Verify the vHBA
Verify the new vHBA's details (scsi_host5) with the virsh nodedev-dumpxml command:
# virsh nodedev-dumpxml scsi_host5
<device>
  <name>scsi_host5</name>
  <path>/sys/devices/pci0000:00/0000:00:04.0/0000:10:00.0/host3/vport-3:0-0/host5</path>
  <parent>scsi_host3</parent>
  <capability type='scsi_host'>
    <host>5</host>
    <capability type='fc_host'>
      <wwnn>5001a4a93526d0a1</wwnn>
      <wwpn>5001a4ace3ee047d</wwpn>
      <fabric_wwn>2002000573de9a81</fabric_wwn>
    </capability>
  </capability>
</device>
Note
Ensure you use the vHBA created in Procedure 12.6, Creating a vHBA as the host
name, modifying the vHBA name scsi_hostN to hostN for the storage pool configuration.
In this example, the vHBA is named scsi_host5, which is specified as <adapter
name='host5'/> in a Red Hat Enterprise Linux 6 libvirt storage pool.
It is recommended to use a stable location for the <path> value, such as one of the
/dev/disk/by-{path|id|uuid|label} locations on your system. More information on
<path> and the elements within <target> can be found at
http://libvirt.org/formatstorage.html.
In this example, the 'scsi' storage pool is defined in a file named vhbapool_host3.xml:
<pool type='scsi'>
  <name>vhbapool_host3</name>
  <uuid>e9392370-2917-565e-692b-d057f46512d6</uuid>
  <capacity unit='bytes'>0</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>0</available>
  <source>
    <adapter name='host5'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
    <permissions>
      <mode>0700</mode>
      <owner>0</owner>
      <group>0</group>
    </permissions>
  </target>
</pool>
2. Define the pool
To define the storage pool (named vhbapool_host3 in this example), use the virsh
pool-define command:
# virsh pool-define vhbapool_host3.xml
Pool vhbapool_host3 defined from vhbapool_host3.xml
3. Start the pool
Start the storage pool with the following command:
# virsh pool-start vhbapool_host3
Pool vhbapool_host3 started
4. Enable autostart
Finally, to ensure that subsequent host reboots will automatically define vHBAs for use in
virtual machines, set the storage pool autostart feature (in this example, for a pool named
vhbapool_host3):
# virsh pool-autostart vhbapool_host3
Chapter 13. Volumes
End     Size    File system  Name     Flags
8590MB  8590MB               primary
17.2GB  8590MB               primary
30.1GB  8590MB               primary
volume3              /dev/sdb3
clone1               /dev/sdb4

Start    End     Size    File system  Name     Flags
4211MB   12.8GB  8595MB               primary
12.8GB   21.4GB  8595MB               primary
21.4GB   30.0GB  8595MB               primary
30.0GB   38.6GB  8595MB               primary
Note
This change will only apply after the guest has been destroyed and restarted. In
addition, persistent devices can only be added to a persistent domain, that is, a
domain whose configuration has been saved with the virsh define command.
If the guest is running, and you want the new device to be added temporarily until the guest is
destroyed, omit the --config option:
# virsh attach-device Guest1 ~/NewStorage.xml
Note
The virsh command allows for an attach-disk command that can set a limited
number of parameters with a simpler syntax and without the need to create an XML file.
The attach-disk command is used in a similar manner to the attach-device
command mentioned previously, as shown:
# virsh attach-disk Guest1 /var/lib/libvirt/images/FileName.img vdb --cache none
Note that the virsh attach-disk command also accepts the --config option.
Note
The following steps are Linux guest specific. Other operating systems handle new
storage devices in different ways. For other systems, refer to that operating system's
documentation.
/myfiles    ext3    defaults    0 0
b. Follow the instruction in the previous section to attach the device to the guest virtual
machine. Alternatively, you can use the virsh attach-disk command, as shown:
# virsh attach-disk Guest1 /dev/sr0 vdc
Note that the following options are available:
The virsh attach-disk command also accepts the --config, --type, and --mode options, as shown:
# virsh attach-disk Guest1 /dev/sr0 vdc --config --type cdrom --mode readonly
Additionally, --type also accepts --type disk in cases where the device is a
hard drive.
3. The guest virtual machine now has a new hard disk device called /dev/vdc on Linux (or
something similar, depending on what the guest virtual machine OS chooses) or D: drive
(for example) on Windows. You can now initialize the disk from the guest virtual machine,
following the standard procedures for the guest virtual machine's operating system. Refer to
Procedure 13.1, Adding file-based storage for an example.
Warning
The host physical machine should not use filesystem labels to identify file systems in
the fstab file, the initrd file or on the kernel command line. Doing so presents a
security risk if less privileged users, such as guest virtual machines, have write access
to whole partitions or LVM volumes, because a guest virtual machine could potentially
write a filesystem label belonging to the host physical machine to its own block device
storage. Upon reboot of the host physical machine, the host physical machine could
then mistakenly use the guest virtual machine's disk as a system disk, which would
compromise the host physical machine system.
It is preferable to use the UUID of a device to identify it in the fstab file, the initrd file
or on the kernel command line. While using UUIDs is still not completely secure on
certain file systems, a similar compromise with UUIDs is significantly less feasible.
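For example, to identify a file system by UUID rather than by label, you might query the UUID and then reference it in fstab; the device name and UUID below are illustrative only:
# blkid /dev/vdb1
/dev/vdb1: UUID="8c4f86c4-3d9c-4f44-9a3e-6b1e6e8a8f5a" TYPE="ext3"
UUID=8c4f86c4-3d9c-4f44-9a3e-6b1e6e8a8f5a    /myfiles    ext3    defaults    0 0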
Important
Guest virtual machines should not be given write access to whole disks or block
devices (for example, /dev/sdb). Guest virtual machines with access to whole block
devices may be able to modify volume labels, which can be used to compromise the
host physical machine system. Use partitions (for example, /dev/sdb1) or LVM
volumes to prevent this issue.
The following example deletes a volume named volume1 from the pool guest_images.
# virsh vol-delete --pool guest_images volume1
Vol volume1 deleted
14.1.1. help
$ virsh help [command | group]
The help command can be used with or without options. When used without options, all commands are listed, one per line. When used with an option, the output is grouped into categories, displaying the keyword for each group.
To display the commands that are only for a specific option, you need to give the keyword for that
group as an option. For example:
$ virsh help pool
Storage Pool (help keyword 'pool'):
    find-storage-pool-sources-as   find potential storage pool sources
    find-storage-pool-sources      discover potential storage pool sources
    pool-autostart                 autostart a pool
    pool-build                     build a pool
    pool-create-as                 create a pool from a set of args
    pool-create                    create a pool from an XML file
    pool-define-as                 define a pool from a set of args
    pool-define                    define (but don't start) a pool from an XML file
    pool-delete                    delete a pool
    pool-destroy                   destroy (stop) a pool
    pool-dumpxml                   pool information in XML
    pool-edit                      edit XML configuration for a storage pool
    pool-info                      storage pool information
    pool-list                      list pools
    pool-name                      convert a pool UUID to pool name
    pool-refresh                   refresh a pool
    pool-start                     start a (previously defined) inactive pool
    pool-undefine                  undefine an inactive pool
    pool-uuid                      convert a pool name to pool UUID
Using the same command with a command option gives the help information on that one specific
command. For example:
$ virsh help vol-path
NAME
  vol-path - returns the volume path for a given volume name or key

SYNOPSIS
  vol-path <vol> [--pool <string>]

OPTIONS
  [--vol] <string>  volume name or key
  --pool <string>   pool name or uuid
14.1.3. version
The version command displays the current libvirt version and displays information about where the
build is from. For example:
$ virsh version
Compiled against library: libvirt 1.1.1
Using library: libvirt 1.1.1
Using API: QEMU 1.1.1
Running hypervisor: QEMU 1.5.3
14.1.5. connect
Connects to a hypervisor session. When the shell is first started this command runs automatically
when the URI parameter is requested by the -c command. The URI specifies how to connect to the
hypervisor. The most commonly used URIs are:
xen:/// - connects to the local Xen hypervisor.
qemu:///system - connects locally as root to the daemon supervising QEMU and KVM
domains.
qemu:///session - connects locally as a user to the user's set of QEMU and KVM domains.
lxc:/// - connects to a local Linux container.
Additional values are available on libvirt's website, http://libvirt.org/uri.html.
The command can be run as follows:
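For example, to connect as root to the local daemon supervising QEMU and KVM domains:
$ virsh connect qemu:///system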
0x17ef Lenovo
0x480f Integrated Webcam [R5U877]
2. Create an XML file and give it a logical name (usb_device.xml, for example). Make sure
you copy the vendor and product IDs exactly as displayed in your search.
<source>
<vendor id='0x17ef'/>
<product id='0x480f'/>
</source>
</hostdev>
...
The type can be either network to indicate a physical network device, or bridge to indicate a
bridge to a device. source is the source of the device. To remove the attached device, use the virsh
detach-device command.
14.5.2. Connecting the Serial Console for the Guest Virtual Machine
The $ virsh console <domain> [--devname <string>] [--force] [--safe]
command connects the virtual serial console for the guest virtual machine. The optional --devname
<string> parameter refers to the device alias of an alternate console, serial, or parallel device
configured for the guest virtual machine. If this parameter is omitted, the primary console will be
opened. The --force option will force the console connection or, when used with disconnect, will
disconnect connections. Using the --safe option will only allow the guest to connect if safe console
handling is supported.
$ virsh console virtual_machine --safe
Target     Source
------------------------------------------------
vda        /VirtualMachines/rhel6.img
hdc        -

vda rd_req             174670
vda rd_bytes           3219440128
vda wr_req             23897
vda wr_bytes           164849664
vda flush_operations   11577
vda rd_total_times     1005410244506
vda wr_total_times     1085306686457
vda flush_total_times  340645193294
14.5.9. Setting Network Interface Bandwidth Parameters
domiftune sets the guest virtual machine's network interface bandwidth parameters. The following
format should be used:
# virsh domiftune domain interface-device [[--config] [--live] | [--current]] [--inbound average,peak,burst] [--outbound average,peak,burst]
The only required parameters are the domain name and interface device of the guest virtual machine;
the --config, --live, and --current options function the same as in Section 14.19, Setting Schedule
Parameters. If no limit is specified, the command queries the current network interface settings. Otherwise, alter the
limits with the following options:
<interface-device> This is mandatory and it will set or query the domain's network interface's
bandwidth parameters. interface-device can be the interface's target name (<target
dev='name'/>), or the MAC address.
If no --inbound or --outbound is specified, this command will query and show the bandwidth
settings. Otherwise, it will set the inbound or outbound bandwidth. average,peak,burst is the same
as in the attach-interface command. Refer to Section 14.3, Attaching Interface Devices.
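For example, a sketch of limiting a guest's inbound bandwidth on a running domain; the domain and interface names and the average,peak,burst values are illustrative only:
# virsh domiftune guest1 vnet0 --inbound 1000,2000,1024 --live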
If --inactive is specified, the result will show the devices that are to be used at the next boot and
will not show those that are currently in use by the running domain. If --details is
specified, the disk type and device value will be included in the table. The information displayed in
this table can be used with the domblkinfo and snapshot-create commands.
# virsh domblklist rhel6 --details
Warning
blockcommit will corrupt any file that depends on the --base option (other than files that
depend on the --top option, as those files now point to the base). To prevent this, do not
commit changes into files shared by more than one guest. The --verbose option allows the
progress to be printed on the screen.
To move the disk image to a new file system on the host, run:
# virsh snapshot-create example-domain --xmlfile /path/to/new.xml --disk-only
followed by:
# virsh blockpull example-domain vda --wait
To use live migration with post-copy storage migration:
On the destination run:
# qemu-img create -f qcow2 -o backing_file=/source-host/vm.img /destination-host/vm.qcow2
On the source run:
# virsh migrate example-domain
On the destination run:
# virsh blockpull example-domain vda --wait
Note
Live image re-sizing will always re-size the image, but may not immediately be picked up by
guests. With recent guest kernels, the size of virtio-blk devices is automatically updated (older
kernels require a guest reboot). With SCSI devices, it is required to manually trigger a re-scan
in the guest with the command echo > /sys/class/scsi_device/0:0:0:0/device/rescan. In addition, with IDE it is
required to reboot the guest before it picks up the new size.
Run the following command: blockresize [domain] [path size] where:
Domain is the unique target name or source file of the domain whose size you want to change
Path size is a scaled integer which defaults to KiB (blocks of 1024 bytes) if there is no suffix.
You must use a suffix of "B" for bytes.
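For example, a sketch of resizing a block device of a running guest; the domain name, path, and size are illustrative only:
# virsh blockresize guest1 /var/lib/libvirt/images/guest1.img 90G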
Note
Live block copy is a feature that is not supported with the version of KVM that is supplied with
Red Hat Enterprise Linux. Live block copy is available with the version of KVM that is supplied
with Red Hat Virtualization. This version of KVM must be running on your physical host
machine in order for the feature to be supported. Contact your representative at Red Hat for
more details.
Live block copy allows you to copy an in-use guest disk image to a destination image and switches
the guest disk image to the destination guest image while the guest is running. While live migration
moves the memory and register state of the guest, the guest disk image must be kept on shared storage. Live block copy
allows you to move the entire guest contents to another host on the fly while the guest is running. Live
block copy may also be used for live migration without requiring permanent shared storage. In this
method the disk image is copied to the destination host after migration, but while the guest is
running.
Live block copy is especially useful for the following applications:
moving the guest image from local storage to a central location
when maintenance is required, guests can be transferred to another location, with no loss of
performance
allows for management of guest images for speed and efficiency
image format conversions can be done without having to shut down the guest
4. A background task that loops over all disk clusters is executed. For each cluster, there are
the following possible cases and actions:
The cluster is already allocated in active and there is nothing to do.
Use bdrv_is_allocated() to follow the backing file chain. If the cluster is read from
base (which is shared) there is nothing to do.
If the bdrv_is_allocated() variant is not feasible, rebase the image and compare the
read data with write data in base in order to decide if a copy is needed.
In all other cases, copy the cluster into active.
5. When the copy has completed, the backing file of active is switched to base (similar to
rebase).
To reduce the length of a backing chain after a series of snapshots, the following commands are
helpful: blockcommit and blockpull. See Section 14.5.15, Using blockcommit to Shorten a
Backing Chain for more information.
<domain type='qemu'>
<uuid>00000000-0000-0000-0000-000000000000</uuid>
<memory>219136</memory>
<currentMemory>219136</currentMemory>
<vcpu>1</vcpu>
<os>
<type arch='i686' machine='pc'>hvm</type>
<boot dev='hd'/>
</os>
<clock offset='utc'/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<devices>
<emulator>/usr/bin/qemu</emulator>
<disk type='block' device='disk'>
<source dev='/dev/HostVG/QEMUGuest1'/>
<target dev='hda' bus='ide'/>
</disk>
</devices>
</domain>
14.5.23. Creating a Virtual Machine XML Dump (Configuration File)
Output a guest virtual machine's XML configuration file with virsh:
# virsh dumpxml {guest-id, guestname or uuid}
This command outputs the guest virtual machine's XML configuration file to standard out (stdout).
You can save the data by piping the output to a file. An example of piping the output to a file called
guest.xml:
# virsh dumpxml GuestID > guest.xml
This file guest.xml can recreate the guest virtual machine (refer to Section 14.6, Editing a Guest
Virtual Machine's configuration file). You can edit this XML configuration file to configure additional
devices or to deploy additional guest virtual machines.
An example of virsh dumpxml output:
# virsh dumpxml guest1-rhel6-64
<domain type='kvm'>
<name>guest1-rhel6-64</name>
<uuid>b8d7388a-bbf2-db3a-e962-b97ca6e514bd</uuid>
<memory>2097152</memory>
<currentMemory>2097152</currentMemory>
<vcpu>2</vcpu>
<os>
<type arch='x86_64' machine='rhel6.2.0'>hvm</type>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
<pae/>
</features>
<clock offset='utc'/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<devices>
<emulator>/usr/libexec/qemu-kvm</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='raw' cache='none' io='threads'/>
<source file='/home/guest-images/guest1-rhel6-64.img'/>
<target dev='vda' bus='virtio'/>
<shareable/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05'
function='0x0'/>
</disk>
<interface type='bridge'>
<mac address='52:54:00:b9:35:a9'/>
<source bridge='br0'/>
<model type='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03'
function='0x0'/>
</interface>
<serial type='pty'>
<target port='0'/>
</serial>
<console type='pty'>
<target type='serial' port='0'/>
</console>
<input type='tablet' bus='usb'/>
<input type='mouse' bus='ps2'/>
<graphics type='vnc' port='-1' autoport='yes'/>
<sound model='ich6'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04'
function='0x0'/>
</sound>
<video>
<model type='cirrus' vram='9216' heads='1'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02'
function='0x0'/>
</video>
<memballoon model='virtio'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06'
function='0x0'/>
</memballoon>
</devices>
</domain>
Note that the <shareable/> flag is set. This indicates the device is expected to be shared between
domains (assuming the hypervisor and OS support this), which means that caching should be
deactivated for that device.
14.5.24. Creating a Guest Virtual Machine from a Configuration File
Guest virtual machines can be created from XML configuration files. You can copy existing XML from
previously created guest virtual machines or use the dumpxml option (refer to Section 14.5.23,
Creating a Virtual Machine XML Dump (Configuration File)). To create a guest virtual machine with
virsh from an XML file:
# virsh create configuration_file.xml
14.6. Editing a Guest Virtual Machine's configuration file
Instead of using the dumpxml option (refer to Section 14.5.23, Creating a Virtual Machine XML
Dump (Configuration File)), guest virtual machines can be edited either while they are running or
while they are offline. The virsh edit command provides this functionality. For example, to edit the
guest virtual machine named rhel6:
# virsh edit rhel6
This opens a text editor. The default text editor is the $EDITOR shell parameter (set to vi by default).
14.6.1. Adding Multifunction PCI Devices to KVM Guest Virtual Machines
This section demonstrates how to add multi-function PCI devices to KVM guest virtual machines.
1. Run the virsh edit [guestname] command to edit the XML configuration file for the
guest virtual machine.
2. In the address type tag, add a multifunction='on' entry for function='0x0'.
This enables the guest virtual machine to use the multifunction PCI devices.
<disk type='file' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source file='/var/lib/libvirt/images/rhel62-1.img'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05'
function='0x0' multifunction='on'/>
</disk>
For a PCI device with two functions, amend the XML configuration file to include a second
device with the same slot number as the first device and a different function number, such as
function='0x1'.
For example:
<disk type='file' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source file='/var/lib/libvirt/images/rhel62-1.img'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05'
function='0x0' multifunction='on'/>
</disk>
<disk type='file' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source file='/var/lib/libvirt/images/rhel62-2.img'/>
<target dev='vdb' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05'
function='0x1'/>
</disk>
3. lspci output from the KVM guest virtual machine shows:
$ lspci
00:05.0 SCSI storage controller: Red Hat, Inc Virtio block device
00:05.1 SCSI storage controller: Red Hat, Inc Virtio block device
# virsh nodeinfo
CPU model:           x86_64
CPU(s):              4
CPU frequency:       1199 MHz
CPU socket(s):       1
Core(s) per socket:  2
Thread(s) per core:  2
NUMA cell(s):        1
Memory size:         3715908 KiB
--nodeset contains a list of NUMA nodes that are used by the host physical machine for
running the domain. The list contains nodes, each separated by a comma, with a dash - used for
node ranges and a caret ^ used for excluding a node.
Only one of the following three options can be used per instance:
--config will take effect on the next boot of a persistent guest virtual machine.
--live will set the scheduler information of a running guest virtual machine.
--current will affect the current state of the guest virtual machine.
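For example, a sketch of restricting a persistent guest to NUMA nodes 0 through 3 while excluding node 2; the domain name is illustrative only:
# virsh numatune guest1 --nodeset 0-3,^2 --config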
net_lo_00_00_00_00_00_00
net_macvtap0_52_54_00_12_fe_50
net_tun0
net_virbr0_nic_52_54_00_03_7d_cb
pci_0000_00_00_0
pci_0000_00_02_0
pci_0000_00_16_0
pci_0000_00_19_0
|
+- net_eth0_f0_de_f1_3a_35_4f
14.8. Starting, Suspending, Resuming, Saving, and Restoring a Guest Virtual Machine
This section provides information on starting, suspending, resuming, saving, and restoring guest
virtual machines.
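As a quick reference, the basic lifecycle commands each take a domain name (the name guest1 here is illustrative only):
# virsh start guest1
# virsh suspend guest1
# virsh resume guest1
# virsh save guest1 guest1.save
# virsh restore guest1.save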
--bypass-cache - used if the domain is in the managedsave state. If this is used, it will restore
the guest virtual machine, avoiding the system cache. Note this will slow down the restore
process.
--force-boot - discards any managedsave options and causes a fresh boot to occur
--pass-fds - is a list of additional options separated by commas, which are passed onto the
guest virtual machine.
--managed-save - this option guarantees that any managed save image is also cleaned up.
Without using this option, attempts to undefine a domain with a managed save image will fail.
--snapshots-metadata - this option guarantees that any snapshots (as shown with
snapshot-list) are also cleaned up when undefining an inactive domain. Note that any
attempts to undefine an inactive domain whose configuration file contains snapshot metadata will
fail. If this option is used and the domain is active, it is ignored.
--storage - using this option requires a comma separated list of volume target names or source
paths of storage volumes to be removed along with the undefined domain. This action will
undefine the storage volume before it is removed. Note that this can only be done with inactive
domains. Note too that this will only work with storage volumes that are managed by libvirt.
--remove-all-storage - in addition to undefining the domain, all associated storage volumes
are deleted.
--wipe-storage - in addition to deleting the storage volume, the contents are wiped.
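For example, a sketch combining these options to undefine a domain together with its managed save image and snapshot metadata; the domain name is illustrative only:
# virsh undefine guest1 --managed-save --snapshots-metadata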
If you want to restore the guest virtual machine directly from the XML file, the virsh restore
command will do just that. You can monitor the process with the domjobinfo command and cancel it with the
domjobabort command.
14.8.8. Updating the Domain XML File that will be Used for Restoring the Guest
The virsh save-image-define file xml [--running | --paused] command will update the
domain XML file that will be used when the specified file is later used during the virsh restore
command. The xml argument must be an XML file name containing the alternative XML with changes
only in the host physical machine specific portions of the domain XML. For example, it can be used to
account for the file naming differences resulting from creating disk snapshots of underlying storage
after the guest was saved. The save image records if the domain should be restored to a running or
paused state. Using the options --running or --paused dictates the state that is to be used.
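For example, a sketch of updating a save image so the domain resumes in a running state; the file names are illustrative only:
# virsh save-image-define guest1.save updated-guest1.xml --running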
14.9. Shutting Down, Rebooting, and Forcing Shutdown of a Guest Virtual Machine
This section provides information about shutting down, rebooting, and forcing shutdown of a guest
virtual machine.
14.9.2. Shutting Down Red Hat Enterprise Linux 6 Guests on a Red Hat Enterprise Linux 7 Host
Installing Red Hat Enterprise Linux 6 guest virtual machines with the Minimal installation
option does not install the acpid package. Red Hat Enterprise Linux 7 no longer requires this
package, as it has been taken over by systemd. However, Red Hat Enterprise Linux 6 guest virtual
machines running on a Red Hat Enterprise Linux 7 host still require it.
Without the acpid package, the Red Hat Enterprise Linux 6 guest virtual machine does not shut down
when the virsh shutdown command is executed. The virsh shutdown command is designed to
gracefully shut down guest virtual machines.
Using virsh shutdown is easier and safer for system administration. Without graceful shut down
with the virsh shutdown command, a system administrator must log into a guest virtual machine
manually or send the Ctrl-Alt-Del key combination to each guest virtual machine.
Note
Other virtualized operating systems may be affected by this issue. The virsh shutdown
command requires that the guest virtual machine operating system is configured to handle
ACPI shut down requests. Many operating systems require additional configuration on the
guest virtual machine operating system to accept ACPI shut down requests.
Set the acpid service to start during the guest virtual machine boot sequence and start the
service:
# chkconfig acpid on
# service acpid start
3. Prepare guest domain XML
Edit the domain XML file to include the following element. Replace the virtio serial port with
org.qemu.guest_agent.0 and use your guest's name instead of $guestname:
<channel type='unix'>
<source mode='bind'
path='/var/lib/libvirt/qemu/{$guestname}.agent'/>
<target type='virtio' name='org.qemu.guest_agent.0'/>
</channel>
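With the channel in place and the guest agent running inside the guest, a graceful shutdown can be requested through the agent; for example (the domain name is illustrative only):
# virsh shutdown guest1 --mode agent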
The configuration file is located in /etc/sysconfig/libvirt-guests. Edit the file,
remove the comment mark (#) and change ON_SHUTDOWN=suspend to
ON_SHUTDOWN=shutdown. Remember to save the change.
$ vi /etc/sysconfig/libvirt-guests

# URIs to check for running guests
# example: URIS='default xen:/// vbox+tcp://host/system lxc:///'
#URIS=default

# action taken on host boot
# - start   all guests which were running on shutdown are started on boot
#           regardless on their autostart settings
# - ignore  library won't start any guests on boot, however, guests marked
#           as autostart will still be automatically started by libvirtd
#ON_BOOT=start
Use the virsh reboot command to reboot a guest virtual machine. The prompt will return once the
reboot has executed. Note that there may be a time lapse until the guest virtual machine returns.
# virsh reboot {domain-id, domain-name or domain-uuid} [--mode method]
You can control the behavior of the rebooting guest virtual machine by modifying the <on_reboot>
element in the guest virtual machine's configuration file. Refer to Section 20.12, Events
Configuration for more information.
By default, the hypervisor will try to pick a suitable shutdown method. To specify an alternative
method, the --mode option can specify a comma separated list which includes initctl, acpi,
agent, and signal. The order in which drivers will try each mode is not related to the order
specified in the command. For strict control over ordering, use a single mode at a time and repeat the
command.
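For example, a sketch of requesting a specific method; the domain name is illustrative only:
# virsh reboot guest1 --mode acpi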
To get the Universally Unique Identifier (UUID) for a guest virtual machine:
# virsh domuuid {domain-id or domain-name}
An example of virsh domuuid output:
# virsh domuuid r5b2-mySQL01
4a4c59a7-ee3f-c781-96e4-288f2862f011
# virsh dominfo vr-rhel6u1-x86_64-kvm
Id:             9
Name:           vr-rhel6u1-x86_64-kvm
UUID:           a03093a1-5da6-a2a2-3baf-a845db2f10b9
OS Type:        hvm
State:          running
CPU(s):         1
CPU time:       21.6s
Max memory:     2097152 kB
Used memory:    1025000 kB
Persistent:     yes
Autostart:      disable
Security model: selinux
Security DOI:   0
Security label: system_u:system_r:svirt_t:s0:c612,c921 (permissive)
The pool-info pool-or-uuid command will list the basic information about the specified
storage pool object. This command requires the name or UUID of the storage pool. To retrieve this
information, use the following command:
pool-list [--inactive] [--all] [--persistent] [--transient] [--autostart] [--no-autostart] [--details] type
This lists all storage pool objects known to libvirt. By default, only active pools are listed; using
the --inactive option lists just the inactive pools, and using the --all option lists all of the
storage pools.
In addition to those options there are several sets of filtering options that can be used to filter the
content of the list. --persistent restricts the list to persistent pools, --transient restricts the list
to transient pools, --autostart restricts the list to autostarting pools and finally --no-autostart
restricts the list to the storage pools that have autostarting disabled.
For all storage pool commands which require a type, the pool types must be separated by comma.
The valid pool types include: dir, fs, netfs, logical, disk, iscsi, scsi, mpath, rbd, and
sheepdog.
The --details option instructs virsh to additionally display pool persistence and capacity
related information where available.
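For example, to list every pool together with its persistence and capacity details:
# virsh pool-list --all --details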
Note
When this command is used with older servers, it is forced to use a series of API calls with an
inherent race, where a pool might not be listed or might appear more than once if it changed
its state between calls while the list was being collected. Newer servers however, do not have
this problem.
The pool-refresh pool-or-uuid command refreshes the list of volumes contained in the pool.
14.11.2.1. Building a storage pool
The pool-build pool-or-uuid --overwrite --no-overwrite command builds a pool with
a specified pool name or UUID. The options --overwrite and --no-overwrite can only be used
for a pool whose type is file system. If neither option is specified, and the pool is a file system type
pool, then the resulting build will only make the directory.
If --no-overwrite is specified, it probes to determine if a file system already exists on the target
device, returning an error if it exists, or using mkfs to format the target device if it does not. If
--overwrite is specified, then the mkfs command is executed and any existing data on the target
device is overwritten.
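For example, a sketch of building a file-system-type pool while formatting its target device; the pool name is illustrative only:
# virsh pool-build guest_images_fs --overwrite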
14.11.2.3. Creating and starting a storage pool from raw parameters
# pool-create-as name --print-xml type source-host source-path source-dev source-name <target> --source-format format
This command creates and starts a pool object name from the raw parameters given.
If --print-xml is specified, then it prints the XML of the storage pool object without creating the
pool. Otherwise, the pool requires a type in order to be built. For all storage pool commands which
require a type, the pool types must be separated by comma. The valid pool types include: dir, fs,
netfs, logical, disk, iscsi, scsi, mpath, rbd, and sheepdog.
In contrast, the following command creates, but does not start, a pool object name from the raw
parameters given:
# pool-define-as name --print-xml type source-host source-path source-dev source-name <target> --source-format format
If --print-xml is specified, then it prints the XML of the pool object without defining the pool.
Otherwise, the pool has to have a specified type. For all storage pool commands which require a
type, the pool types must be separated by comma. The valid pool types include: dir, fs, netfs,
logical, disk, iscsi, scsi, mpath, rbd, and sheepdog.
The pool-start pool-or-uuid command starts the specified storage pool, which was previously defined
but inactive.
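For example, a minimal sketch of defining a directory-type pool from raw parameters and then starting it; the pool name and target path are illustrative only:
# virsh pool-define-as guest_images_dir dir --target /guest_images
# virsh pool-start guest_images_dir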
This method is the only method that should be used to edit an XML configuration file, as it does error
checking before applying.
Note
The version of the scrub binary installed on the host will limit the algorithms that are available.
pool the volume is in. It also requires vol-name-or-key-or-path, which is the name or key or path of the
volume to wipe. The --offset option is the position in the storage volume at which to start writing
the data. --length length dictates an upper limit for the amount of data to be uploaded. An error
will occur if the local-file is greater than the specified --length.
Id  Name           State
----------------------------------
0   Domain-0       running
1   Domain202      paused
2   Domain010      inactive
3   Domain9600     crashed
There are seven states that can be visible using this command:
Running - The running state refers to guest virtual machines which are currently active on a
CPU.
Idle - The idle state indicates that the domain is idle, and may not be running or able to run.
This can occur because the domain is waiting on I/O (a traditional wait state) or has gone
to sleep because there was nothing else for it to do.
Paused - The paused state lists domains that are paused. This occurs if an administrator
uses the pause button in virt-manager or virsh suspend. When a guest virtual machine
is paused it consumes memory and other resources but it is ineligible for scheduling and CPU
resources from the hypervisor.
Shutdown - The shutdown state is for guest virtual machines in the process of shutting down.
The guest virtual machine is sent a shutdown signal and should be in the process of stopping
its operations gracefully. This may not work with all guest virtual machine operating systems;
some operating systems do not respond to these signals.
Shut off - The shut off state indicates that the domain is not running. This can be caused
when a domain completely shuts down or has not been started.
Crashed - The crashed state indicates that the domain has crashed and can only occur if the
guest virtual machine has been configured not to restart on crash.
Dying - Domains in the dying state are in the process of dying, which is a state where the
domain has not completely shut down or crashed.
--managed-save - Although this option alone does not filter the domains, it will list the domains
that have managed save state enabled. In order to actually list the domains separately you will
need to use the --inactive option as well.
--name - If specified, domain names are printed in a list. If --uuid is specified the domain's UUID
is printed instead. Using the option --table specifies that a table style output should be used.
All three commands are mutually exclusive.
--title - This command must be used with --table output. --title will cause an extra column
to be created in the table with the short domain description (title).
--persistent - includes persistent domains in a list. To list transient domains, use the --transient option.
--with-managed-save - lists the domains that have been configured with managed save. To list
the domains without it, use the --without-managed-save option.
--state-running - filters out the domains that are running, --state-paused for paused
domains, --state-shutoff for domains that are turned off, and --state-other lists all states
as a fallback.
--autostart - this option will cause the auto-starting domains to be listed. To list domains with
this feature disabled, use the option --no-autostart.
--with-snapshot - will list the domains whose snapshot images can be listed. To filter for the
domains without a snapshot, use the option --without-snapshot.
Name        State
-----------------------
Domain-0    running
rhelvm      paused
For an example of virsh vcpuinfo output, refer to Section 14.13.2, Displaying Virtual CPU
Information.
VCPU:           1
CPU:            2
State:          running
CPU time:       10889.1s
CPU Affinity:   yyyy
14.13.4. Displaying Information about the Virtual CPU Counts of a Domain
virsh vcpucount requires a domain name or a domain ID. For example:
# virsh vcpucount rhel6
maximum      config         2
maximum      live           2
current      config         2
current      live           2
The following parameters may be set for the virsh setvcpus command:
{domain-name, domain-id or domain-uuid} - Specifies the virtual machine.
count - Specifies the number of virtual CPUs to set.
Note
The count value cannot exceed the number of CPUs that were assigned to the guest
virtual machine when it was created. It may also be limited by the host or the hypervisor.
For Xen, you can only adjust the virtual CPUs of a running domain if the domain is
paravirtualized.
--live - The default option, used if none are specified. The configuration change takes effect on
the running guest virtual machine. This is referred to as a hot plug if the number of vCPUs is
increased, and hot unplug if it is reduced.
Important
The vCPU hot unplug feature is a Technology Preview. Therefore, it is not supported and
not recommended for use in high-value deployments.
--config - The configuration change takes effect on the next reboot of the guest. Both the
--config and --live options may be specified together if supported by the hypervisor.
--current - Configuration change takes effect on the current state of the guest virtual machine. If
used on a running guest, it acts as --live, if used on a shut-down guest, it acts as --config.
--maximum - Sets a maximum vCPU limit that can be hot-plugged on the next reboot of the guest.
As such, it must only be used with the --config option, and not with the --live option.
--guest - Instead of a hot plug or a hot unplug, the QEMU guest agent modifies the vCPU count
directly in the running guest by enabling or disabling vCPUs. This option cannot be used with a
count value higher than the current number of vCPUs in the guest, and configurations set with
--guest are reset when a guest is rebooted.
Note
For information on increasing vCPU performance by using multi-queue, refer to the Red Hat
Enterprise Linux Virtualization Tuning and Optimization Guide.
Example 14.4. vCPU hot plug and hot unplug
To hot-plug a vCPU, run the following command on a guest with a single vCPU:
virsh setvcpus guestVM1 2 --live
This increases the number of vCPUs for guestVM1 to two. The change is performed while guestVM1
is running, as indicated by the --live option.
To hot-unplug one vCPU from the same running guest, run the following:
virsh setvcpus guestVM1 1 --live
Be aware, however, that currently, using vCPU hot unplug can lead to problems with further
modifications of the vCPU count.
14.13.9. Displaying Guest Virtual Machine Block Device Information
Use virsh domblkstat to display block device statistics for a running guest virtual machine.
# virsh domblkstat GuestName block-device
14.13.10. Displaying Guest Virtual Machine Network Device Information
Use virsh domifstat to display network interface statistics for a running guest virtual machine.
# virsh domifstat GuestName interface-device
Warning
The commands in this section are only supported if the machine has the NetworkManager
service disabled, and is using the network service instead.
Often, these host interfaces can then be used by name within domain <interface> elements (such
as a system-created bridge interface), but there is no requirement that host interfaces be tied to any
particular guest configuration XML at all. Many of the commands for host interfaces are similar to the
ones used for domains, and the way to name an interface is either by its name or its MAC address.
However, using a MAC address for an iface option only works when that address is unique (if an
interface and a bridge share the same MAC address, which is often the case, then using that MAC
address results in an error due to ambiguity, and you must resort to a name instead).
14.15.1.1. Defining and starting a host physical machine interface via an XML file
The virsh iface-define file command defines a host interface from an XML file. This command
will only define the interface and will not start it.
virsh iface-define iface.xml
To start an interface which has already been defined, run iface-start interface, where interface
is the interface name.
Note
Live snapshots are not supported in Red Hat Enterprise Linux. There are additional options
available with the virsh snapshot-create command for use with live snapshots which are
visible in libvirt, but not supported in Red Hat Enterprise Linux 6.
The options available in Red Hat Enterprise Linux include:
--redefine specifies that all XML elements produced by snapshot-dumpxml are valid; it can
be used to migrate snapshot hierarchy from one machine to another, to recreate hierarchy for the
case of a transient domain that goes away and is later recreated with the same name and UUID,
or to make slight alterations in the snapshot metadata (such as host-specific aspects of the
domain XML embedded in the snapshot). When this option is supplied, the xmlfile argument is
mandatory, and the domain's current snapshot will not be altered unless the --current option is
also given.
--no-metadata creates the snapshot, but any metadata is immediately discarded (that is, libvirt
does not treat the snapshot as current, and cannot revert to the snapshot unless --redefine is
later used to teach libvirt about the metadata again).
--reuse-external, if used, specifies the location of an existing external XML
snapshot to use. If an existing external snapshot does not already exist, the command will fail to
take a snapshot to avoid losing contents of the existing files.
--no-metadata creates snapshot data but any metadata is immediately discarded (that is, libvirt
does not treat the snapshot as current, and cannot revert to the snapshot unless snapshot-create
is later used to teach libvirt about the metadata again). This option is incompatible with
--print-xml.
Outputs the name of the parent snapshot, if any, for the given snapshot, or for the current snapshot
with --current. To use, run:
# virsh snapshot-parent domain {snapshot | --current}
Warning
Be aware that this is a destructive action; any changes in the domain since the last snapshot
was taken will be lost. Also note that the state of the domain after snapshot-revert is
complete will be the state of the domain at the time the original snapshot was taken.
To revert the snapshot, run:
# snapshot-revert domain {snapshot | --current} [{--running | --paused}]
[--force]
Normally, reverting to a snapshot leaves the domain in the state it was at the time the snapshot was
created, except that a disk snapshot with no guest virtual machine state leaves the domain in an
inactive state. Passing either the --running or --paused option will perform additional state
changes (such as booting an inactive domain, or pausing a running domain). Since transient
domains cannot be inactive, it is required to use one of these options when reverting to a disk
snapshot of a transient domain.
There are two cases where a snapshot revert involves extra risk, which requires the use of
--force to proceed. One is the case of a snapshot that lacks full domain information for reverting
configuration; since libvirt cannot prove that the current configuration matches what was in use at
the time of the snapshot, supplying --force assures libvirt that the snapshot is compatible with the
current configuration (and if it is not, the domain will likely fail to run). The other is the case of
reverting from a running domain to an active state where a new hypervisor has to be created rather
than reusing the existing hypervisor, because it implies drawbacks such as breaking any existing
VNC or Spice connections; this condition happens with an active snapshot that uses a provably
incompatible configuration, as well as with an inactive snapshot that is combined with the --start
or --pause option.
If --metadata is used, it will delete the snapshot's metadata maintained by libvirt, while leaving the
snapshot contents intact for access by external tools; otherwise deleting a snapshot also removes its
data contents from that point in time.
14.16.3. Determining a Compatible CPU Model to Suit a Pool of Host Physical Machines
Now that it is possible to find out what CPU capabilities a single host physical machine has, the next
step is to determine what CPU capabilities are best to expose to the guest virtual machine. If it is
known that the guest virtual machine will never need to be migrated to another host physical
machine, the host physical machine CPU model can be passed straight through unmodified. A
virtualized data center may have a set of configurations that can guarantee all servers will have
100% identical CPUs. Again the host physical machine CPU model can be passed straight through
unmodified. The more common case, though, is where there is variation in CPUs between host
physical machines. In this mixed CPU environment, the lowest common denominator CPU must be
determined. This is not entirely straightforward, so libvirt provides an API for exactly this task. If libvirt
is provided a list of XML documents, each describing a CPU model for a host physical machine,
libvirt will internally convert these to CPUID masks, calculate their intersection, and convert the
CPUID mask result back into an XML CPU description.
Here is an example of what libvirt reports as the capabilities on a basic workstation, when virsh
capabilities is executed:
<capabilities>
<host>
<cpu>
<arch>i686</arch>
<model>pentium3</model>
<topology sockets='1' cores='2' threads='1'/>
<feature name='lahf_lm'/>
<feature name='lm'/>
<feature name='xtpr'/>
<feature name='cx16'/>
<feature name='ssse3'/>
<feature name='tm2'/>
<feature name='est'/>
<feature name='vmx'/>
<feature name='ds_cpl'/>
<feature name='monitor'/>
<feature name='pni'/>
<feature name='pbe'/>
<feature name='tm'/>
<feature name='ht'/>
<feature name='ss'/>
<feature name='sse2'/>
<feature name='acpi'/>
<feature name='ds'/>
<feature name='clflush'/>
<feature name='apic'/>
</cpu>
</host>
</capabilities>
Figure 14.3. Pulling host physical machine's CPU model information
Now compare that to any random server, with the same virsh capabilities command:
<capabilities>
<host>
<cpu>
<arch>x86_64</arch>
<model>phenom</model>
<topology sockets='2' cores='4' threads='1'/>
<feature name='osvw'/>
<feature name='3dnowprefetch'/>
<feature name='misalignsse'/>
<feature name='sse4a'/>
<feature name='abm'/>
<feature name='cr8legacy'/>
<feature name='extapic'/>
<feature name='cmp_legacy'/>
<feature name='lahf_lm'/>
<feature name='rdtscp'/>
<feature name='pdpe1gb'/>
<feature name='popcnt'/>
<feature name='cx16'/>
<feature name='ht'/>
<feature name='vme'/>
</cpu>
...snip...
Figure 14.4. Generate CPU description from a random server
To see if this CPU description is compatible with the previous workstation CPU description, use the
virsh cpu-compare command.
The reduced content was stored in a file named virsh-caps-workstation-cpu-only.xml and
the virsh cpu-compare command can be executed on this file:
# virsh cpu-compare virsh-caps-workstation-cpu-only.xml
Host physical machine CPU is a superset of CPU described in virsh-caps-workstation-cpu-only.xml
As seen in this output, libvirt is correctly reporting that the CPUs are not strictly compatible. This is
because there are several features in the server CPU that are missing in the client CPU. To be able to
migrate between the client and the server, it will be necessary to open the XML file and comment out
some features. To determine which features need to be removed, run the virsh cpu-baseline
command on both-cpus.xml, which contains the CPU information for both machines. Running
# virsh cpu-baseline both-cpus.xml results in:
<cpu match='exact'>
  <model>pentium3</model>
  <feature policy='require' name='lahf_lm'/>
  <feature policy='require' name='lm'/>
  <feature policy='require' name='cx16'/>
  <feature policy='require' name='monitor'/>
  <feature policy='require' name='pni'/>
  <feature policy='require' name='ht'/>
  <feature policy='require' name='sse2'/>
  <feature policy='require' name='clflush'/>
  <feature policy='require' name='apic'/>
</cpu>
memory - The memory controller allows for setting limits on RAM and swap usage and querying
cumulative usage of all processes in the group
cpuset - The CPU set controller binds processes within a group to a set of CPUs and controls
migration between CPUs.
cpuacct - The CPU accounting controller provides information about CPU usage for a group of
processes.
cpu - The CPU scheduler controller controls the prioritization of processes in the group. This is
similar to granting nice level privileges.
devices - The devices controller grants access control lists on character and block devices.
freezer - The freezer controller pauses and resumes execution of processes in the group. This is
similar to SIGSTOP for the whole group.
net_cls - The network class controller manages network utilization by associating processes
with a tc network class.
In creating a group hierarchy, cgroup will leave mount point and directory setup entirely to the
administrator's discretion; this is more complex than just adding some mount points to /etc/fstab.
It is necessary to set up the directory hierarchy and decide how processes get placed within it. This
can be done with the following virsh commands:
schedinfo - described in Section 14.19, Setting Schedule Parameters
blkiotune - described in Section 14.20, Display or Set Block I/O Parameters
domiftune - described in Section 14.5.9, Setting Network Interface Bandwidth Parameters
memtune - described in Section 14.21, Configuring Memory Tuning
The scheduler can be set with any of the following parameters: cpu_shares, vcpu_period and
vcpu_quota.
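For example, a sketch of raising a guest's CPU shares; the domain name and value are illustrative only:
# virsh schedinfo guest1 --set cpu_shares=2048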
This command will configure a virtual network to be started automatically when the guest virtual
machine boots. To run this command:
# virsh net-autostart network [--disable]
This command accepts the --disable option, which disables autostart for the network.
14.22.2. Creating a Virtual Network from an XML File
This command creates a virtual network from an XML file. Refer to libvirt's website to get a description
of the XML network format used by libvirt. In this command file is the path to the XML file. To create the
virtual network from an XML file, run:
# virsh net-create file
14.22.6. Editing a Virtual Network's XML Configuration File
The following command edits the XML configuration file for a network. This is equivalent to:
# virsh net-dumpxml --inactive network > network.xml
vi network.xml (or make changes with your other text editor)
virsh net-define network.xml
except that it does some error checking. The editor used can be supplied by the $VISUAL or
$EDITOR environment variables, and defaults to "vi". To edit the network, run:
# virsh net-edit network
14.22.7. Getting Information about a Virtual Network
This command returns basic information about the network object. To get the network information,
run:
# virsh net-info network
14.22.8. Listing Information about a Virtual Network
Returns the list of active networks. If --all is specified this will also include defined but inactive
networks; if --inactive is specified only the inactive ones will be listed. You may also want to filter
the returned networks by --persistent to list the persistent ones, --transient to list the transient
ones, --autostart to list the ones with autostart enabled, and --no-autostart to list the ones
with autostart disabled.
Note: When talking to older servers, this command is forced to use a series of API calls with an
inherent race, where a pool might not be listed or might appear more than once if it changed state
between calls while the list was being collected. Newer servers do not have this problem.
To list the virtual networks, run:
# net-list [--inactive | --all] [--persistent] [<--transient>] [--autostart] [<--no-autostart>]
14.22.13. Updating an Existing Network Definition File
This command updates the given section of an existing network definition, taking effect immediately,
without needing to destroy and restart the network. The command argument is one of "add-first", "add-last",
"add" (a synonym for add-last), "delete", or "modify". section is one of "bridge", "domain", "ip", "ip-dhcp-host", "ip-dhcp-range", "forward", "forward-interface", "forward-pf", "portgroup", "dns-host",
"dns-txt", or "dns-srv", each section being named by a concatenation of the XML element hierarchy
leading to the element being changed. For example, "ip-dhcp-host" will change a <host> element
that is contained inside a <dhcp> element inside an <ip> element of the network. xml is either the
text of a complete XML element of the type being changed (for example, "<host mac='00:11:22:33:44:55'
ip='192.0.2.1'/>"), or the name of a file that contains a complete XML element. Disambiguation is done
by looking at the first character of the provided text - if the first character is "<", it is XML text; if the first
character is not "<", it is the name of a file that contains the XML text to be used. The --parent-index
option is used to specify which of several parent elements the requested element is in (0-based). For
example, a dhcp <host> element could be in any one of multiple <ip> elements in the network; if a
parent-index isn't provided, the "most appropriate" <ip> element will be selected (usually the only
one that already has a <dhcp> element), but if --parent-index is given, that particular instance of
<ip> will get the modification. If --live is specified, affect a running network. If --config is
specified, affect the next startup of a persistent network. If --current is specified, affect the current
network state. Both --live and --config options may be given, but --current is exclusive. Not
specifying any option is the same as specifying --current.
To update the configuration file, run:
# virsh net-update network command section xml [--parent-index index]
[[--live] [--config] | [--current]]
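For example, a sketch of adding a DHCP host entry to the default network, applied to both the running network and its persistent configuration (the MAC address and IP are taken from the illustration above):
# virsh net-update default add ip-dhcp-host "<host mac='00:11:22:33:44:55' ip='192.0.2.1'/>" --live --config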
Using ssh to manage virtual machines and hosts is discussed further in Section 5.1, Remote Management with SSH.
Note
In order to attach the USB device to the guest virtual machine, you first must attach it to the
host physical machine and confirm that the device is working. If the guest is running, you need
to shut it down before proceeding.
Note
VNC is considered insecure by many security experts; however, several changes have been
made to enable the secure usage of VNC for virtualization on Red Hat Enterprise Linux. The
guest machines only listen to the local host's loopback address (127.0.0.1). This ensures
only those with shell privileges on the host can access virt-manager and the virtual machine
through VNC. Although virt-manager is configured to listen to other public network interfaces
and alternative methods can be configured, it is not recommended.
Remote administration can be performed by tunneling over SSH, which encrypts the traffic.
Although VNC can be configured to access remotely without tunneling over SSH, for security
reasons it is not recommended. To remotely administer the guest, follow the instructions in
Chapter 5, Remote Management of Guests. TLS can provide enterprise level security for
managing guest and host systems.
Your local desktop can intercept key combinations (for example, Ctrl+Alt+F1) to prevent them from
being sent to the guest machine. You can use the Send key menu option to send these sequences.
From the guest machine window, click the Send key menu and select the key sequence to send. In
addition, from this menu you can also capture the screen output.
SPICE is an alternative to VNC available for Red Hat Enterprise Linux.
Important
The hot unplugging feature is only available as a Technology Preview. Therefore, it is
not supported and not recommended for use in high-value deployments.
Warning
You must never use these tools to write to a host physical machine or disk image which is
attached to a running virtual machine, not even to open such a disk image in write mode.
Doing so will result in disk corruption of the guest virtual machine. The tools try to prevent you
from doing this, but do not catch all cases. If there is any suspicion that a guest virtual
machine might be running, it is strongly recommended that the tools not be used, or at least
always use the tools in read-only mode.
Note
Some virtualization commands in Red Hat Enterprise Linux 6 allow you to specify a remote
libvirt connection. For example:
virt-df -c qemu://remote/system -d Guest
However, libguestfs in Red Hat Enterprise Linux 6 cannot access remote guests, and
commands using remote URLs like this do not work as expected. This affects the following Red
Hat Enterprise Linux 6 commands:
guestfish
guestmount
virt-alignment-scan
virt-cat
virt-copy-in
virt-copy-out
virt-df
virt-edit
virt-filesystems
virt-inspector
virt-inspector2
virt-list-filesystems
virt-list-partitions
virt-ls
virt-rescue
virt-sysprep
virt-tar
virt-tar-in
virt-tar-out
virt-win-reg
16.2. Terminology
This section explains the terms used throughout this chapter.
libguestfs (GUEST FileSystem LIBrary) - the underlying C library that provides the basic
functionality for opening disk images, reading and writing files and so on. You can write C
programs directly to this API, but it is quite low level.
guestfish (GUEST Filesystem Interactive SHell) is an interactive shell that you can use
from the command line or from shell scripts. It exposes all of the functionality of the libguestfs API.
Various virt tools are built on top of libguestfs, and these provide a way to perform specific single
tasks from the command line. Tools include virt-df, virt-rescue, virt-resize and virt-edit.
hivex and Augeas are libraries for editing the Windows Registry and Linux configuration files
respectively. Although these are separate from libguestfs, much of the value of libguestfs comes
from the combination of these tools.
guestmount is an interface between libguestfs and FUSE. It is primarily used to mount file
systems from disk images on your host physical machine. This functionality is not necessary, but
can be useful.
Note
libguestfs and guestfish do not require root privileges. You only need to run them as root if the
disk image being accessed needs root to read and/or write.
When you start guestfish interactively (here in read-only mode, with a disk image attached), it will display this prompt:
guestfish --ro -a /path/to/disk/image
Welcome to guestfish, the libguestfs filesystem interactive shell for
editing virtual machine filesystems.
Type: 'help' for help on commands
'man' to read the manual
'quit' to quit the shell
><fs>
At the prompt, type run to initiate the library and attach the disk image. This can take up to 30
seconds the first time it is done. Subsequent starts will complete much faster.
Note
libguestfs will use hardware virtualization acceleration such as KVM (if available) to speed up
this process.
Once the run command has been entered, other commands can be used, as the following section
demonstrates.
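As a brief illustration, the list-filesystems command reports the file systems that libguestfs finds in the image, together with their types (the device names and file system types below are illustrative and depend on the guest):

><fs> run
><fs> list-filesystems
/dev/vda1: ext3
/dev/VolGroup00/LogVol00: ext3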
... without specifying the file system type, for example a string such as "ext3" or "ntfs".
To view the actual contents of a file system, it must first be mounted. This example uses one of the
Windows partitions shown in the previous output (/dev/vda2), which in this case is known to
correspond to the C:\ drive:
><fs> mount-ro /dev/vda2 /
><fs> ll /
total 1834753
drwxrwxrwx  1 root root   4096 Nov  1 11:40 .
drwxr-xr-x 21 root root   4096 Nov 16 21:45 ..
lrwxrwxrwx  2 root root     60 Jul 14  2009 Documents and Settings
drwxrwxrwx  1 root root   4096 Nov 15 18:00 Program Files
drwxrwxrwx  1 root root   4096 Sep 19 10:34 Users
drwxrwxrwx  1 root root  16384 Sep 19 10:34 Windows
You can use guestfish commands such as ls, ll, cat, more, download and tar-out to view and
download files and directories.
Note
There is no concept of a current working directory in this shell. Unlike ordinary shells, you
cannot for example use the cd command to change directories. All paths must be fully
qualified starting at the top with a forward slash (/) character. Use the Tab key to complete
paths.
To exit from the guestfish shell, type exit or enter Ctrl+d.
...  Oct 28 09:09 .
...  Nov 17 15:10 ..
...  Oct 27 22:37 bin
...  Oct 27 21:52 boot
...  Oct 27 21:21 dev
...  Oct 28 09:09 etc
Because guestfish needs to start up the libguestfs back end in order to perform the inspection and
mounting, the run command is not necessary when using the -i option. The -i option works for
many common Linux and Windows guest virtual machines.
Once you are familiar with using guestfish interactively, according to your needs, writing shell scripts
with it may be useful. The following is a simple shell script to add a new MOTD (message of the day)
to a guest:
#!/bin/bash
set -e
guestname="$1"
guestfish -d "$guestname" -i <<'EOF'
write /etc/motd "Welcome to Acme Incorporated."
chmod 0644 /etc/motd
EOF
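Assuming the script above is saved as, for example, add-motd.sh (the file name is illustrative) and made executable, it can be run against a named guest as follows:

# ./add-motd.sh GuestName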
Note
More information about Augeas can be found on the website http://augeas.net.
guestfish can do much more than we can cover in this introductory document. For example, creating
disk images from scratch:
guestfish -N fs
Or copying out whole directories from a disk image:
><fs> copy-out /home /tmp/home
For more information see the man page guestfish(1).
This section describes virt-rescue, which can be considered analogous to a rescue CD for virtual
machines. It boots a guest virtual machine into a rescue shell so that maintenance can be performed
to correct errors and the guest virtual machine can be repaired.
There is some overlap between virt-rescue and guestfish. It is important to distinguish their differing
uses. virt-rescue is for making interactive, ad-hoc changes using ordinary Linux file system tools. It is
particularly suited to rescuing a guest virtual machine that has failed. virt-rescue cannot be scripted.
In contrast, guestfish is particularly useful for making scripted, structured changes through a formal
set of commands (the libguestfs API), although it can also be used interactively.
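As a minimal sketch of an invocation (the guest name and image path are illustrative, and the guest must not be running unless read-only mode is used; see virt-rescue(1) for the exact options available in your version):

# virt-rescue GuestName

or, for a disk image that is not known to libvirt:

# virt-rescue --ro -a /path/to/disk/image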
When you are finished rescuing the guest virtual machine, exit the shell by entering exit or Ctrl+d.
virt-rescue has many command line options. The options most often used are:
--ro: Operate in read-only mode on the guest virtual machine. No changes will be saved. You can
use this to experiment with the guest virtual machine. As soon as you exit from the shell, all of your
changes are discarded.
--network: Enable network access from the rescue shell. Use this if you need to, for example,
download RPM or other files into the guest virtual machine.
16.7. virt-df: Monitoring Disk Usage
This section provides information about monitoring disk usage using virt-df.

# virt-df /dev/vg_guests/RHEL6
Filesystem                      1K-blocks       Used  Available  Use%
RHEL6:/dev/sda1                       ...      10233      85634   11%
RHEL6:/dev/VolGroup00/LogVol00        ...    2272744    4493036   32%
(Where /dev/vg_guests/RHEL6 is a Red Hat Enterprise Linux 6 guest virtual machine disk image.
The path in this case is the host physical machine logical volume where this disk image is located.)
You can also use virt-df on its own to list information about all of your guest virtual machines (that is,
those known to libvirt). The virt-df command recognizes some of the same options as the
standard df, such as -h (human-readable) and -i (show inodes instead of blocks).
virt-df also works on Windows guest virtual machines:

# virt-df -h
Filesystem                                 Size       Used  Available  Use%
F14x64:/dev/sda1                         484.2M      66.3M     392.9M   14%
F14x64:/dev/vg_f14x64/lv_root              7.4G       3.0G       4.4G   41%
RHEL6brewx64:/dev/sda1                   484.2M      52.6M     406.6M   11%
RHEL6brewx64:/dev/vg_rhel6brewx64/lv_root 13.3G       3.4G       9.2G   26%
Win7x32:/dev/sda1                        100.0M      24.1M      75.9M   25%
Win7x32:/dev/sda2                         19.9G       7.4G      12.5G   38%
Note
You can use virt-df safely on live guest virtual machines, since it only needs read-only
access. However, you should not expect the numbers to be precisely the same as those from a
df command running inside the guest virtual machine. This is because what is on disk will be
slightly out of sync with the state of the live guest virtual machine. Nevertheless it should be a
good enough approximation for analysis and monitoring purposes.
virt-df is designed to allow you to integrate the statistics into monitoring tools, databases and so on.
This allows system administrators to generate reports on trends in disk usage, and alerts if a guest
virtual machine is about to run out of disk space. To do this you should use the --csv option to
generate machine-readable Comma-Separated-Values (CSV) output. CSV output is readable by most
databases, spreadsheet software and a variety of other tools and programming languages. The raw
CSV looks like the following:
# virt-df --csv WindowsGuest
Virtual Machine,Filesystem,1K-blocks,Used,Available,Use%
Win7x32,/dev/sda1,102396,24712,77684,24.1%
Win7x32,/dev/sda2,20866940,7786652,13080288,37.3%
For resources and ideas on how to process this output to produce trends and alerts, refer to the
following URL: http://virt-tools.org/learning/advanced-virt-df/.
16.8. virt-resize: Resizing Guest Virtual Machines Offline
This section provides information about resizing offline guest virtual machines.
Filesystem                         Size       Used  Available  Use%
RHEL6:/dev/sda1                     ...      10.0M      83.6M   11%
RHEL6:/dev/VolGroup00/LogVol00      ...       2.2G       4.3G   32%
[#####################################################]
Expanding /dev/sda1 using the 'resize2fs' method
Expanding /dev/sda2 using the 'pvresize' method
Expanding /dev/VolGroup00/LogVol00 using the 'resize2fs' method
5. Try to boot the virtual machine. If it works (and after testing it thoroughly) you can delete the
backup disk. If it fails, shut down the virtual machine, delete the new disk, and rename the
backup disk back to its original name.
6. Use virt-df and/or virt-list-partitions to show the new size:
# virt-df -h /dev/vg_pin/RHEL6
Filesystem                         Size       Used  Available  Use%
RHEL6:/dev/sda1                  484.4M      10.8M     448.6M    3%
RHEL6:/dev/VolGroup00/LogVol00    14.3G       2.2G      11.4G   16%
Resizing guest virtual machines is not an exact science. If virt-resize fails, there are a number of
tips that you can review and attempt in the virt-resize(1) man page. For some older Red Hat Enterprise
Linux guest virtual machines, you may need to pay particular attention to the tip regarding GRUB.
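As a rough sketch of the kind of invocation that produces expansion output like that shown above (the disk paths and the choice of --expand and --LV-expand targets are illustrative assumptions; consult virt-resize(1) before running anything):

# virt-resize --expand /dev/sda2 --LV-expand /dev/VolGroup00/LogVol00 \
    /dev/vg_pin/RHEL6.backup /dev/vg_pin/RHEL6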
16.9. virt-inspector: Inspecting Guest Virtual Machines
This section provides information about inspecting guest virtual machines using virt-inspector.
Note
Red Hat Enterprise Linux 6.2 ships with two variations of this program: virt-inspector is
the original program as found in Red Hat Enterprise Linux 6.0 and is now deprecated
upstream. virt-inspector2 is the same as the new upstream virt-inspector program.
Or as shown here:
virt-inspector --xml GuestName > report.xml
The result will be an XML report (report.xml). The main components of the XML file are a top-level
<operatingsystems> element, usually containing a single <operatingsystem> element, similar to the
following:
<operatingsystems>
<operatingsystem>
<!-- the type of operating system and Linux distribution -->
<name>linux</name>
<distro>rhel</distro>
<!-- the name, version and architecture -->
<product_name>Red Hat Enterprise Linux Server release 6.4
</product_name>
<major_version>6</major_version>
<minor_version>4</minor_version>
<package_format>rpm</package_format>
<package_management>yum</package_management>
<root>/dev/VolGroup/lv_root</root>
<!-- how the filesystems would be mounted when live -->
<mountpoints>
<mountpoint dev="/dev/VolGroup/lv_root">/</mountpoint>
<mountpoint dev="/dev/sda1">/boot</mountpoint>
<mountpoint dev="/dev/VolGroup/lv_swap">swap</mountpoint>
</mountpoints>
<!-- filesystems -->
<filesystem dev="/dev/VolGroup/lv_root">
<label></label>
<uuid>b24d9161-5613-4ab8-8649-f27a8a8068d3</uuid>
<type>ext4</type>
<content>linux-root</content>
<spec>/dev/mapper/VolGroup-lv_root</spec>
</filesystem>
<filesystem dev="/dev/VolGroup/lv_swap">
<type>swap</type>
<spec>/dev/mapper/VolGroup-lv_swap</spec>
</filesystem>
<!-- packages installed -->
<applications>
<application>
<name>firefox</name>
<version>3.5.5</version>
<release>1.fc12</release>
</application>
</applications>
</operatingsystem>
</operatingsystems>
Processing these reports is best done using W3C standard XPath queries. Red Hat Enterprise Linux
6 comes with a command line program (xpath) which can be used for simple instances; however, for
long-term and advanced usage, you should consider using an XPath library along with your favorite
programming language.
As an example, you can list out all file system devices using the following XPath query:
virt-inspector --xml GuestName | xpath //filesystem/@dev
Found 3 nodes:
-- NODE --
dev="/dev/sda1"
-- NODE --
dev="/dev/vg_f12x64/lv_root"
-- NODE --
dev="/dev/vg_f12x64/lv_swap"
Or list the names of all applications installed by entering:
virt-inspector --xml GuestName | xpath //application/name
[...long list...]
Note
Hex-quoting is used for strings because the format does not properly define a portable
encoding method for strings. This is the only way to ensure fidelity when transporting .REG
files from one machine to another.
You can make hex-quoted strings printable by piping the output of virt-win-reg through
this simple Perl script:
perl -MEncode -pe's?hex\((\d+)\):(\S+)?$t=$1;$_=$2;s,\,,,g;"str($t):\"".decode(utf16le=>pack("H*",$_))."\""?eg'
To merge changes into the Windows Registry of an offline guest virtual machine, you must first
prepare a .REG file. There is a great deal of documentation available on how to do this. When
you have prepared a .REG file, enter the following:
# virt-win-reg --merge WindowsGuest input.reg
This will update the registry in the guest virtual machine.
The binding for each language is essentially the same, but with minor syntactic changes. A C
statement:
guestfs_launch (g);
Would appear like the following in Perl:
$g->launch ()
Or like the following in OCaml:
g#launch ()
Only the API from C is detailed in this section.
In the C and C++ bindings, you must manually check for errors. In the other bindings, errors are
converted into exceptions; the additional error checks shown in the examples below are not
necessary for other languages, but conversely you may wish to add code to catch exceptions. Refer
to the following list for some points of interest regarding the architecture of the libguestfs API:
The libguestfs API is synchronous. Each call blocks until it has completed. If you want to make
calls asynchronously, you have to create a thread.
The libguestfs API is not thread safe: each handle should be used only from a single thread, or if
you want to share a handle between threads you should implement your own mutex to ensure that
two threads cannot execute commands on one handle at the same time.
You should not open multiple handles on the same disk image. It is permissible if all the handles
are read-only, but still not recommended.
You should not add a disk image for writing if anything else could be using that disk image (for
example, a live VM). Doing this will cause disk corruption.
Opening a read-only handle on a disk image which is currently in use (for example, by a live VM) is
possible; however, the results may be unpredictable or inconsistent, particularly if the disk image
is being heavily written to at the time you are reading it.
#include <stdio.h>
#include <stdlib.h>
#include <guestfs.h>

int
main (int argc, char *argv[])
{
  guestfs_h *g;

  /* Create the libguestfs handle. */
  g = guestfs_create ();
  if (g == NULL) {
    perror ("failed to create libguestfs handle");
    exit (EXIT_FAILURE);
  }

  /* ... */

  guestfs_close (g);

  exit (EXIT_SUCCESS);
}
Save this program to a file (test.c). Compile this program and run it with the following two
commands:
gcc -Wall test.c -o test -lguestfs
./test
At this stage it should print no output. The rest of this section demonstrates an example showing how
to extend this program to create a new disk image, partition it, format it with an ext4 file system, and
create some files in the file system. The disk image will be called d i sk. i mg and be created in the
current directory.
The outline of the program is:
Create the handle.
Add disk(s) to the handle.
Launch the libguestfs back end.
Create the partition, file system and files.
Close the handle and exit.
Here is the modified program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <guestfs.h>
int
main (int argc, char *argv[])
{
guestfs_h *g;
size_t i;
g = guestfs_create ();
if (g == NULL) {
perror ("failed to create libguestfs handle");
exit (EXIT_FAILURE);
}
/* Create a raw-format sparse disk image, 512 MB in size. */
int fd = open ("disk.img", O_CREAT|O_WRONLY|O_TRUNC|O_NOCTTY, 0666);
if (fd == -1) {
perror ("disk.img");
exit (EXIT_FAILURE);
}
  /* Truncate the file to 512 MB and close it. */
  if (ftruncate (fd, 512 * 1024 * 1024) == -1) {
    perror ("disk.img: truncate");
    exit (EXIT_FAILURE);
  }
  if (close (fd) == -1) {
    perror ("disk.img: close");
    exit (EXIT_FAILURE);
  }
  /* Set the autosync flag so that the disk contents are flushed
   * when the handle is closed (see also the comment near guestfs_close).
   */
  guestfs_set_autosync (g, 1);
  /* Add the disk image to libguestfs. */
  if (guestfs_add_drive_opts (g, "disk.img",
        GUESTFS_ADD_DRIVE_OPTS_FORMAT, "raw", /* raw format */
        GUESTFS_ADD_DRIVE_OPTS_READONLY, 0,   /* for write */
        -1) /* this marks the end of the optional arguments */
      == -1)
    exit (EXIT_FAILURE);
  /* Run the libguestfs back end. */
  if (guestfs_launch (g) == -1)
    exit (EXIT_FAILURE);
  /* Get the list of devices. Because we only added one drive
   * above, we expect that this list should contain a single
   * element.
   */
  char **devices = guestfs_list_devices (g);
  if (devices == NULL)
    exit (EXIT_FAILURE);
  if (devices[0] == NULL || devices[1] != NULL) {
    fprintf (stderr, "error: expected a single device from list-devices\n");
    exit (EXIT_FAILURE);
  }
  /* Partition the disk as one single MBR partition. */
  if (guestfs_part_disk (g, devices[0], "mbr") == -1)
    exit (EXIT_FAILURE);
  /* Get the list of partitions. We expect a single element, which
   * is the partition we have just created.
   */
  char **partitions = guestfs_list_partitions (g);
  if (partitions == NULL)
    exit (EXIT_FAILURE);
  if (partitions[0] == NULL || partitions[1] != NULL) {
    fprintf (stderr, "error: expected a single partition from list-partitions\n");
    exit (EXIT_FAILURE);
  }
/* Create an ext4 filesystem on the partition. */
if (guestfs_mkfs (g, "ext4", partitions[0]) == -1)
exit (EXIT_FAILURE);
/* Now mount the filesystem so that we can add files. */
if (guestfs_mount_options (g, "", partitions[0], "/") == -1)
exit (EXIT_FAILURE);
/* Create some files and directories. */
if (guestfs_touch (g, "/empty") == -1)
exit (EXIT_FAILURE);
const char *message = "Hello, world\n";
if (guestfs_write (g, "/hello", message, strlen (message)) == -1)
exit (EXIT_FAILURE);
if (guestfs_mkdir (g, "/foo") == -1)
exit (EXIT_FAILURE);
/* This uploads the local file /etc/resolv.conf into the disk image.
*/
if (guestfs_upload (g, "/etc/resolv.conf", "/foo/resolv.conf") == -1)
exit (EXIT_FAILURE);
/* Because 'autosync' was set (above) we can just close the handle
* and the disk contents will be synchronized. You can also do
* this manually by calling guestfs_umount_all and guestfs_sync.
*/
guestfs_close (g);
/* Free up the lists. */
for (i = 0; devices[i] != NULL; ++i)
free (devices[i]);
free (devices);
for (i = 0; partitions[i] != NULL; ++i)
free (partitions[i]);
free (partitions);
exit (EXIT_SUCCESS);
}
Compile and run this program with the following two commands:
gcc -Wall test.c -o test -lguestfs
./test
If the program runs to completion successfully then you should be left with a disk image called
d i sk. i mg , which you can examine with guestfish:
guestfish --ro -a disk.img -m /dev/sda1
><fs> ll /
><fs> cat /foo/resolv.conf
By default (for C and C++ bindings only), libguestfs prints errors to stderr. You can change this
behavior by setting an error handler. The guestfs(3) man page discusses this in detail.
16.12. virt-sysprep: Resetting Virtual Machine Settings
The virt-sysprep command line tool can be used to reset or unconfigure a guest virtual machine so
that clones can be made from it. This process involves removing SSH host keys, persistent network
MAC configuration, and user accounts. virt-sysprep can also customize a virtual machine, for
instance by adding SSH keys, users or logos. Each step can be enabled or disabled as required.
The term "sysprep" is derived from the System Preparation tool (sysprep.exe) which is used with
Microsoft Windows systems. Despite this, the tool does not currently work on Windows guests.
Note
libguestfs and guestfish do not require root privileges. You only need to run them as root if the
disk image being accessed needs root access to read and/or write.
The virt-sysprep tool is part of the libguestfs-tools-c package, which is installed with the following
command:
$ yum install libguestfs-tools-c
Alternatively, just the virt-sysprep tool can be installed with the following command:
$ yum install /usr/bin/virt-sysprep
Important
virt-sysprep modifies the guest or disk image in place. To use virt-sysprep, the guest virtual
machine must be offline, so you must shut it down before running the commands. To preserve
the existing contents of the guest virtual machine, you must snapshot, copy or clone the disk
first. Refer to libguestfs.org for more information on copying and cloning disks.
The following commands are available to use with virt-sysprep:
Table 16.1. virt-sysprep commands

--help
    Display a brief help message.
    Example: $ virt-sysprep --help

-a [file] or --add [file]
    Add the specified file (disk image).
    Example: $ virt-sysprep --add /dev/vms/disk.img

-c [URI] or --connect [URI]
    Connect to the given libvirt URI.
    Example: $ virt-sysprep -c qemu:///system

-n or --dry-run or --dryrun
    Perform a read-only "dry run" on the guest virtual machine; no changes are saved.

--enable [operations]
    Enable only the specified (comma-separated) operations.
    Example: $ virt-sysprep --enable ssh-hostkeys,udev-persistent-net

--format [raw|qcow2|auto]
    Specify the format of the disk image. $ virt-sysprep --format raw -a disk.img forces raw
    format (no auto-detection) for disk.img, while $ virt-sysprep --format raw -a disk.img
    --format auto -a another.img forces raw format (no auto-detection) for disk.img and
    reverts to auto-detection for another.img. If you have untrusted raw-format guest disk
    images, you should use this option to specify the disk format. This avoids a possible
    security problem with malicious guests.

--list-operations
    List the operations supported by the virt-sysprep program.

--mount-options
    Set the mount options for each mount point in the guest virtual machine.

-q or --quiet
    Do not print log messages.
    Example: $ virt-sysprep -q

-v or --verbose
    Enable verbose messages for debugging purposes.
    Example: $ virt-sysprep -v

-V or --version
    Display the virt-sysprep version number and exit.
    Example: $ virt-sysprep -V

--root-password
    Set the root password.
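For example, assuming a guest virtual machine named RHEL6Guest that has already been shut down (the name is illustrative), all enabled sysprep operations can be run against it with:

# virt-sysprep -d RHEL6Guest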
Figure 18.2. Linux host physical machine with an interface to a virtual network switch
This virbr0 interface can be viewed with the ifconfig and ip commands like any other interface:
$ ifconfig virbr0
virbr0
Link encap:Ethernet HWaddr 1B:C4:94:CF:FD:17
inet addr:192.168.122.1 Bcast:192.168.122.255
Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:3097 (3.0 KiB)
$ ip addr show virbr0
3: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
state UNKNOWN
link/ether 1b:c4:94:cf:fd:17 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
Warning
The only bonding modes that should be used with a guest virtual machine are Mode 1, Mode
2, and Mode 4. Under no circumstances should Modes 0, 3, 5, or 6 be used. Note also that
mii-monitoring, rather than arp-monitoring, should be used to monitor bonding modes, as
arp-monitoring does not work.
For more information on bonding modes, refer to the knowledgebase article on bonding modes, or
the Red Hat Enterprise Linux 6 Deployment Guide.
For a detailed explanation of bridge_opts parameters, see the Red Hat Enterprise Virtualization
Administration Guide.
Warning
Virtual network switches use NAT configured by iptables rules. Editing these rules while the
switch is running is not recommended, as incorrect rules may result in the switch being unable
to communicate.
If the switch is not running, you can set the public IP range for forward mode NAT in order to create a
port masquerading range by running:
# iptables -j SNAT --to-source [start]-[end]
Note
A virtual network can be restricted to a specific physical interface. This may be useful on a
physical system that has several interfaces (for example, eth0, eth1 and eth2). This is only
useful in routed and NAT modes, and can be defined in the dev=<interface> option, or in
virt-manager when creating a new virtual network.
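As a minimal sketch of this restriction in the network XML (the network name and device are illustrative assumptions):

<network>
  <name>example-net</name>
  <forward mode='nat' dev='eth1'/>
  ...
</network>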
Deploying guest virtual machines which must be easily accessible to an existing physical
network. Placing guest virtual machines on a physical network where they must access services
within an existing broadcast domain, such as DHCP.
Connecting guest virtual machines to an existing network where VLANs are used.
physical network connections. One interface is used for management and accounting, the other is for
the virtual machines to connect through. Each guest has its own public IP address, but the host
physical machines use private IP addresses, as management of the guests can only be performed by
internal administrators. Refer to the following diagram to understand this scenario:
NAT (Network Address Translation) mode is the default mode. It can be used for testing when there is
no need for direct network visibility.
281
private
All packets are sent to the external bridge and will only be delivered to a target VM on the
same host physical machine if they are sent through an external router or gateway and that
device sends them back to the host physical machine. This procedure is followed if either
the source or destination device is in private mode.
passthrough
This feature attaches a virtual function of an SR-IOV capable NIC directly to a VM without
losing the migration capability. All packets are sent to the VF/IF of the configured network
device. Depending on the capabilities of the device, additional prerequisites or limitations
may apply; for example, on Linux this requires kernel 2.6.38 or newer.
Each of the four modes is configured by changing the domain XML file. Once this file is opened,
change the mode setting as shown:
<devices>
...
<interface type='direct'>
<source dev='eth0' mode='vepa'/>
</interface>
</devices>
The network access of directly attached guest virtual machines can be managed by the hardware
switch to which the physical interface of the host physical machine is connected.
The interface can have additional parameters as shown below, if the switch conforms to the IEEE
802.1Qbg standard. The parameters of the virtualport element are documented in more detail in the
IEEE 802.1Qbg standard. The values are network specific and should be provided by the network
administrator. In 802.1Qbg terms, the Virtual Station Interface (VSI) represents the virtual interface of
a virtual machine.
Note that IEEE 802.1Qbg requires a non-zero value for the VLAN ID. Also, if the switch conforms
to the IEEE 802.1Qbh standard, the values are network specific and should be provided by the
network administrator.
Virtual Station Interface types
managerid
The VSI Manager ID identifies the database containing the VSI type and instance
definitions. This is an integer value and the value 0 is reserved.
typeid
The VSI Type ID identifies a VSI type characterizing the network access. VSI types are
typically managed by the network administrator. This is an integer value.
typeidversion
The VSI Type Version allows multiple versions of a VSI Type. This is an integer value.
instanceid
The VSI Instance ID is generated when a VSI instance (that is, a virtual interface of a
virtual machine) is created. This is a globally unique identifier.
profileid
The profile ID contains the name of the port profile that is to be applied onto this interface.
This name is resolved by the port profile database into the network parameters from the port
profile, and those network parameters will be applied to this interface.
Each of the four types is configured by changing the domain XML file. Once this file is opened,
change the mode setting as shown:
<devices>
...
<interface type='direct'>
<source dev='eth0.2' mode='vepa'/>
<virtualport type="802.1Qbg">
<parameters managerid="11" typeid="1193047" typeidversion="2"
instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f"/>
</virtualport>
</interface>
</devices>
The profile ID is shown here:
<devices>
...
<interface type='direct'>
<source dev='eth0' mode='private'/>
<virtualport type='802.1Qbh'>
<parameters profileid='finance'/>
</virtualport>
</interface>
</devices>
...
As previously mentioned, applying network traffic filtering rules can be done on individual network
interfaces that are configured for certain types of network configurations. Supported network types
include:
network
ethernet -- must be used in bridging mode
bridge
Filtering rules are organized in filter chains. These chains can be thought of as having a tree
structure with packet filtering rules as entries in individual chains (branches).
Packets start their filter evaluation in the root chain and can then continue their evaluation in other
chains, return from those chains back into the root chain or be dropped or accepted by a filtering
rule in one of the traversed chains.
Libvirt's network filtering system automatically creates individual root chains for every virtual
machine's network interface on which the user chooses to activate traffic filtering. The user may write
filtering rules that are either directly instantiated in the root chain or may create protocol-specific
filtering chains for efficient evaluation of protocol-specific rules.
The following chains exist:
root
mac
stp (spanning tree protocol)
vlan
arp and rarp
ipv4
ipv6
Multiple chains evaluating the mac, stp, vlan, arp, rarp, ipv4, or ipv6 protocol can be created using
the protocol name only as a prefix in the chain's name.
</rule>
<rule action='accept' direction='inout' priority='650'>
<arp opcode='Reply'/>
</rule>
<rule action='drop' direction='inout' priority='1000'/>
</filter>
The consequence of putting ARP-specific rules in the arp chain, rather than for example in the root
chain, is that packets of protocols other than ARP do not need to be evaluated by ARP protocol-specific
rules. This improves the efficiency of the traffic filtering. However, one must then pay
attention to only putting filtering rules for the given protocol into the chain, since other rules will not
be evaluated. For example, an IPv4 rule will not be evaluated in the ARP chain, since IPv4 protocol
packets will not traverse the ARP chain.
Table 18.1. Filtering chain default priorities

Chain name    Default priority
stp           -810
mac           -800
vlan          -750
ipv4          -700
ipv6          -600
arp           -500
rarp          -400
Note
A chain with a lower priority value is accessed before one with a higher value.
The chains listed in Table 18.1, Filtering chain default priorities can also be
assigned custom priorities by writing a value in the range [-1000 to 1000] into the priority
(XML) attribute in the filter node. Section 18.12.2, Filtering Chains shows the default
priority of -500 for arp chains, for example.
The parameter IP represents the IP address that the operating system inside the virtual machine is
expected to use on the given interface. The IP parameter is special insofar as the libvirt daemon will
try to determine the IP address (and thus the IP parameter's value) that is being used on an interface
if the parameter is not explicitly provided but referenced. For current limitations on IP address
detection, consult Section 18.12.12, Limitations, on how to use this feature and what to expect when
using it. The XML file shown in Section 18.12.2, Filtering Chains contains the filter no-arp-spoofing,
which is an example of using a network filter XML to reference the MAC and IP variables.
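A rough sketch of such a reference (the filter name here is illustrative; the real no-arp-spoofing filter is more elaborate) could drop any outgoing IPv4 packet whose source address differs from the detected $IP:

<filter name='no-ip-spoofing-example' chain='ipv4'>
  <rule action='drop' direction='out'>
    <ip match='no' srcipaddr='$IP'/>
  </rule>
</filter>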
Note that referenced variables are always prefixed with the character $. The format of the value of a
variable must be of the type expected by the filter attribute identified in the XML. In the above example,
the IP parameter must hold a legal IP address in standard format. Failure to provide the correct
structure will result in the filter variable not being replaced with a value, and will prevent a virtual
machine from starting or will prevent an interface from attaching when hotplugging is being used.
Some of the types that are expected for each XML attribute are shown in Example 18.4,
Sample variable types.
The following example rule allows a virtual machine to receive traffic on a set of ports, which are
specified in DSTPORTS, from the set of source IP addresses specified in SRCIPADDRESSES. The rule
generates all combinations of elements of the variable DSTPORTS with those of SRCIPADDRESSES
by using two independent iterators to access their elements.

<rule action='accept' direction='in' priority='500'>
  <ip srcipaddr='$SRCIPADDRESSES[@1]' dstportstart='$DSTPORTS[@2]'/>
</rule>
Assign concrete values to SRCIPADDRESSES and DSTPORTS as shown:
SRCIPADDRESSES = [ 10.0.0.1, 11.1.2.3 ]
DSTPORTS = [ 80, 8080 ]
Assigning values to the variables using $SRCIPADDRESSES[@1] and $DSTPORTS[@2] would
then result in all combinations of addresses and ports being created as shown:
10.0.0.1, 80
10.0.0.1, 8080
11.1.2.3, 80
11.1.2.3, 8080
Accessing the same variables using a single iterator, for example by using the notation
$SRCIPADDRESSES[@1] and $DSTPORTS[@1], would result in parallel access to both lists
and result in the following combination:
10.0.0.1, 80
11.1.2.3, 8080
Note
$VARIABLE is short-hand for $VARIABLE[@0]. The former notation always assumes the role
of an iterator with iterator id="0", as shown in the opening paragraph at the top of
this section.
18.12.5. Automatic IP Address Detection and DHCP Snooping
This section provides information about automatic IP address detection and DHCP snooping.
18.12.5.1. Introduction
The detection of IP addresses used on a virtual machine's interface is automatically activated if the
variable IP is referenced but no value has been assigned to it. The variable CTRL_IP_LEARNING
can be used to specify the IP address learning method to use. Valid values include: any, dhcp, or
none.
The value any instructs libvirt to use any packet to determine the address in use by a virtual machine,
which is the default setting if the variable CTRL_IP_LEARNING is not set. This method will only detect
a single IP address per interface. Once a guest virtual machine's IP address has been detected, its IP
network traffic will be locked to that address, if for example, IP address spoofing is prevented by one
of its filters. In that case, the user of the VM will not be able to change the IP address on the interface
inside the guest virtual machine, which would be considered IP address spoofing. When a guest
virtual machine is migrated to another host physical machine or resumed after a suspend operation,
the first packet sent by the guest virtual machine will again determine the IP address that the guest
virtual machine can use on a particular interface.
The value of dhcp instructs libvirt to only honor DHCP server-assigned addresses with valid leases.
This method supports the detection and usage of multiple IP addresses per interface. When a guest
virtual machine resumes after a suspend operation, any valid IP address leases are applied to its
filters. Otherwise the guest virtual machine is expected to use DHCP to obtain a new IP address.
When a guest virtual machine migrates to another host physical machine, the guest virtual
machine is required to re-run the DHCP protocol.
If CTRL_IP_LEARNING is set to none, libvirt does not do IP address learning, and referencing IP
without assigning it an explicit value is an error.
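A minimal sketch of setting the learning method on an interface (the bridge name is illustrative):

<interface type='bridge'>
  <source bridge='virbr0'/>
  <filterref filter='clean-traffic'>
    <parameter name='CTRL_IP_LEARNING' value='dhcp'/>
  </filterref>
</interface>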
Note
Automatic DHCP detection listens to the DHCP traffic the guest virtual machine exchanges with
the DHCP server of the infrastructure. To avoid denial-of-service attacks on libvirt, the
evaluation of those packets is rate-limited, meaning that a guest virtual machine sending an
excessive number of DHCP packets per second on an interface will not have all of those
packets evaluated and thus filters may not get adapted. Normal DHCP client behavior is
assumed to send a low number of DHCP packets per second. Further, it is important to set up
appropriate filters on all guest virtual machines in the infrastructure to avoid them being able
to send DHCP packets. Therefore, guest virtual machines must either be prevented from
sending UDP and TCP traffic from port 67 to port 68, or the DHCPSERVER variable should be
used on all guest virtual machines to restrict DHCP server messages to only be allowed to
originate from trusted DHCP servers. At the same time, anti-spoofing prevention must be
enabled on all guest virtual machines in the subnet.
</filterref>
</interface>

Variable Name        Definition
MAC                  The MAC address of the interface
IP                   The IP address in use by the interface
IPV6                 Not currently implemented: the IPv6 address in use by the interface
DHCPSERVER           The IP address of a trusted DHCP server
DHCPSERVERV6         Not currently implemented: the IPv6 address of a trusted DHCP server
CTRL_IP_LEARNING     The choice of the IP address detection mode (any, dhcp, or none)
not known to libvirt yet. However, once a virtual machine is started or a network interface referencing
a filter is to be hotplugged, all network filters in the filter tree must be available. Otherwise the virtual
machine will not start or the network interface cannot be attached.
The traffic filtering rule starts with the rule node. This node may contain up to three of the following
attributes:
action is mandatory and can have the following values:
drop (matching the rule silently discards the packet with no further analysis)
reject (matching the rule generates an ICMP reject message with no further analysis)
accept (matching the rule accepts the packet with no further analysis)
return (matching the rule passes this filter, but returns control to the calling filter for further
analysis)
continue (matching the rule goes on to the next rule for further analysis)
direction is mandatory and can have the following values:
in for incoming traffic
out for outgoing traffic
inout for incoming and outgoing traffic
priority is optional. The priority of the rule controls the order in which the rule will be instantiated
relative to other rules. Rules with lower values will be instantiated before rules with higher values.
Valid values are in the range of -1000 to 1000. If this attribute is not provided, priority 500 will be
assigned by default. Note that filtering rules in the root chain are sorted with filters connected to
the root chain following their priorities. This makes it possible to interleave filtering rules with access
to filter chains. Refer to Section 18.12.3, Filtering Chain Priorities for more information.
statematch is optional. Possible values are '0' or 'false' to turn the underlying connection state
matching off. The default setting is 'true' or 1.
For more information see Section 18.12.11, Advanced Filter Configuration Topics.
The above example, Example 18.7, An Example of a clean traffic filter, indicates that the traffic of type
ip will be associated with the chain ipv4 and the rule will have priority=500. If, for example, another
filter is referenced whose traffic of type ip is also associated with the chain ipv4, then that filter's rules
will be ordered relative to the priority=500 of the shown rule.
A rule may contain a single node for filtering of traffic. The above example shows that traffic of type ip
is to be filtered.
the protocol property attribute2 does not match value2 and the protocol property attribute3
matches value3.
Table 18.3. MAC protocol types

Attribute Name    Datatype
srcmacaddr        MAC_ADDR
srcmacmask        MAC_MASK
dstmacaddr        MAC_ADDR
dstmacmask        MAC_MASK
protocolid        UINT16, STRING
comment           STRING
18.12.10.2. VLAN (802.1Q)
Protocol ID: vlan
Rules of this type should go either into the root or vlan chain.
Table 18.4. VLAN protocol types

Attribute Name    Datatype
srcmacaddr        MAC_ADDR
srcmacmask        MAC_MASK
dstmacaddr        MAC_ADDR
dstmacmask        MAC_MASK
vlan-id           UINT16 (0x0-0xfff, 0 - 4095)
encap-protocol    UINT16, STRING
comment           STRING
Rules of this type should go either into the root or stp chain.
Table 18.5. STP protocol types

Attribute Name        Datatype
srcmacaddr            MAC_ADDR
srcmacmask            MAC_MASK
type                  UINT8
flags                 UINT8
root-priority         UINT16
root-priority-hi      UINT16 (0x0-0xfff, 0 - 4095)
root-address          MAC_ADDRESS
root-address-mask     MAC_MASK
root-cost             UINT32
root-cost-hi          UINT32
sender-priority-hi    UINT16
sender-address        MAC_ADDRESS
sender-address-mask   MAC_MASK
port                  UINT16
port_hi               UINT16
msg-age               UINT16
msg-age-hi            UINT16
max-age-hi            UINT16
hello-time            UINT16
hello-time-hi         UINT16
forward-delay         UINT16
forward-delay-hi      UINT16
comment               STRING
18.12.10.4. ARP/RARP
Protocol ID: arp or rarp
Rules of this type should go either into the root or arp/rarp chain.
Table 18.6. ARP and RARP protocol types

Attribute Name    Datatype
srcmacaddr        MAC_ADDR
srcmacmask        MAC_MASK
dstmacaddr        MAC_ADDR
dstmacmask        MAC_MASK
hwtype            UINT16
protocoltype      UINT16
opcode            UINT16, STRING
arpsrcmacaddr     MAC_ADDR
arpdstmacaddr     MAC_ADDR
arpsrcipaddr      IP_ADDR
arpdstipaddr      IP_ADDR
gratuitous        BOOLEAN
comment           STRING
18.12.10.5. IPv4
Protocol ID: ip
Rules of this type should go either into the root or ipv4 chain.
Table 18.7. IPv4 protocol types

Attribute Name    Datatype
srcmacaddr        MAC_ADDR
srcmacmask        MAC_MASK
dstmacaddr        MAC_ADDR
dstmacmask        MAC_MASK
srcipaddr         IP_ADDR
srcipmask         IP_MASK
dstipaddr         IP_ADDR
dstipmask         IP_MASK
protocol          UINT8, STRING
srcportstart      UINT16
srcportend        UINT16
dstportstart      UINT16
dstportend        UINT16
comment           STRING
18.12.10.6. IPv6
Protocol ID: ipv6
Rules of this type should go either into the root or ipv6 chain.
Table 18.8. IPv6 protocol types

Attribute Name    Datatype
srcmacaddr        MAC_ADDR
srcmacmask        MAC_MASK
dstmacaddr        MAC_ADDR
dstmacmask        MAC_MASK
srcipaddr         IP_ADDR
srcipmask         IP_MASK
dstipaddr         IP_ADDR
dstipmask         IP_MASK
protocol          UINT8, STRING
srcportstart      UINT16
srcportend        UINT16
dstportstart      UINT16
dstportend        UINT16
comment           STRING
18.12.10.7. TCP/UDP/SCTP
Protocol ID: tcp, udp, sctp
The chain parameter is ignored for this type of traffic and should either be omitted or set to root.
Table 18.9. TCP/UDP/SCTP protocol types

Attribute Name    Datatype
srcmacaddr        MAC_ADDR
srcipaddr         IP_ADDR
srcipmask         IP_MASK
dstipaddr         IP_ADDR
dstipmask         IP_MASK
srcipfrom         IP_ADDR
srcipto           IP_ADDR
dstipfrom         IP_ADDR
dstipto           IP_ADDR
srcportstart      UINT16
srcportend        UINT16
dstportstart      UINT16
dstportend        UINT16
comment           STRING
state             STRING
flags             STRING
ipset             STRING
ipsetflags        IPSETFLAGS
18.12.10.8. ICMP
Protocol ID: icmp
Note: The chain parameter is ignored for this type of traffic and should either be omitted or set to root.
Table 18.10. ICMP protocol types

Attribute Name    Datatype
srcmacaddr        MAC_ADDR
srcmacmask        MAC_MASK
dstmacaddr        MAC_ADDR
dstmacmask        MAC_MASK
srcipaddr         IP_ADDR
srcipmask         IP_MASK
dstipaddr         IP_ADDR
dstipmask         IP_MASK
srcipfrom         IP_ADDR
srcipto           IP_ADDR
dstipfrom         IP_ADDR
dstipto           IP_ADDR
type              UINT16
code              UINT16
comment           STRING
state             STRING
ipset             STRING
ipsetflags        IPSETFLAGS
18.12.10.9. IGMP, ESP, AH, UDPLITE, 'ALL'
Protocol ID: igmp, esp, ah, udplite, all
The chain parameter is ignored for this type of traffic and should either be omitted or set to root.
Table 18.11. IGMP, ESP, AH, UDPLITE, 'ALL' protocol types

Attribute Name    Datatype
srcmacaddr        MAC_ADDR
srcmacmask        MAC_MASK
dstmacaddr        MAC_ADDR
dstmacmask        MAC_MASK
srcipaddr         IP_ADDR
srcipmask         IP_MASK
dstipaddr         IP_ADDR
dstipmask         IP_MASK
srcipfrom         IP_ADDR
srcipto           IP_ADDR
dstipfrom         IP_ADDR
dstipto           IP_ADDR
comment           STRING
state             STRING
ipset             STRING
ipsetflags        IPSETFLAGS
18.12.10.10. TCP/UDP/SCTP over IPv6
Protocol ID: tcp-ipv6, udp-ipv6, sctp-ipv6
The chain parameter is ignored for this type of traffic and should either be omitted or set to root.
Table 18.12. TCP, UDP, SCTP over IPv6 protocol types

Attribute Name    Datatype
srcmacaddr        MAC_ADDR
srcipaddr         IP_ADDR
srcipmask         IP_MASK
dstipaddr         IP_ADDR
dstipmask         IP_MASK
srcipfrom         IP_ADDR
srcipto           IP_ADDR
dstipfrom         IP_ADDR
dstipto           IP_ADDR
srcportstart      UINT16
srcportend        UINT16
dstportstart      UINT16
dstportend        UINT16
comment           STRING
state             STRING
ipset             STRING
ipsetflags        IPSETFLAGS
18.12.10.11. ICMPv6
Protocol ID: icmpv6
The chain parameter is ignored for this type of traffic and should either be omitted or set to root.
Table 18.13. ICMPv6 protocol types

Attribute Name    Datatype
srcmacaddr        MAC_ADDR
srcipaddr         IP_ADDR
srcipmask         IP_MASK
dstipaddr         IP_ADDR
dstipmask         IP_MASK
srcipfrom         IP_ADDR
srcipto           IP_ADDR
dstipfrom         IP_ADDR
dstipto           IP_ADDR
type              UINT16
code              UINT16
comment           STRING
state             STRING
ipset             STRING
ipsetflags        IPSETFLAGS
18.12.10.12. IGMP, ESP, AH, UDPLITE, 'ALL' over IPv6
Protocol ID: igmp-ipv6, esp-ipv6, ah-ipv6, udplite-ipv6, all-ipv6
The chain parameter is ignored for this type of traffic and should either be omitted or set to root.
Table 18.14. IGMP, ESP, AH, UDPLITE, 'ALL' over IPv6 protocol types

Attribute Name    Datatype
srcmacaddr        MAC_ADDR
srcipaddr         IP_ADDR
srcipmask         IP_MASK
dstipaddr         IP_ADDR
dstipmask         IP_MASK
srcipfrom         IP_ADDR
srcipto           IP_ADDR
dstipfrom         IP_ADDR
dstipto           IP_ADDR
comment           STRING
state             STRING
ipset             STRING
ipsetflags        IPSETFLAGS
18.12.11.1. Connection tracking
The network filtering subsystem (on Linux) makes use of the connection tracking support of iptables.
This helps in enforcing the directionality of network traffic (state match), as well as counting and
limiting the number of simultaneous connections towards a guest virtual machine. As an example, if a
guest virtual machine has TCP port 8080 open as a server, clients may connect to the guest virtual
machine on port 8080. Connection tracking and enforcement of directionality then prevent the guest
virtual machine from initiating a connection from (TCP client) port 8080 back to a remote host
physical machine. More importantly, tracking helps to prevent remote attackers from establishing a
connection back to a guest virtual machine. For example, if the user inside the guest virtual machine
established a connection to port 80 on an attacker site, the attacker will not be able to initiate a
connection from TCP port 80 back towards the guest virtual machine. By default, the connection state
match that enables connection tracking and then enforcement of directionality of traffic is turned on.
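A rule of the following form (the filter name is an illustrative assumption) accepts incoming TCP traffic to port 12345 while turning the connection-state match off:

<filter name='tcp-12345-no-statematch' chain='ipv4'>
  <rule action='accept' direction='in' priority='500' statematch='false'>
    <tcp dstportstart='12345'/>
  </rule>
</filter>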
This now allows incoming traffic to TCP port 12345, but would also enable the initiation from
(client) TCP port 12345 within the VM, which may or may not be desirable.
Note
Limitation rules must be listed in the XML prior to the rules for accepting traffic. According to
the XML file in Example 18.10, XML sample file that sets limits to connections, an additional
rule allowing DNS traffic to go out of the guest virtual machine has been added, to avoid ssh
sessions to port 22 not getting established because of DNS lookup failures by the ssh
daemon. Leaving this rule out may result in the ssh client hanging unexpectedly as it tries to
connect. Additional caution should be used in regards to handling timeouts related to
tracking of traffic. An ICMP ping that the user may have terminated inside the guest virtual
machine may have a long timeout in the host physical machine's connection tracking system
and will therefore not allow another ICMP ping to go through.
The best solution is to tune the timeout in the host physical machine's sysfs with the
following command: # echo 3 > /proc/sys/net/netfilter/nf_conntrack_icmp_timeout. This
command sets the ICMP connection tracking timeout to 3 seconds. The effect of this is that
once one ping is terminated, another one can start after 3 seconds.
If for any reason the guest virtual machine has not properly closed its TCP connection, the
connection will be held open for a longer period of time, especially if the TCP timeout value
was set for a large amount of time on the host physical machine. In addition, any idle
connection may result in a timeout in the connection tracking system, which can be
re-activated once packets are exchanged.
However, if the limit is set too low, newly initiated connections may force an idle connection
into TCP backoff. Therefore, the limit of connections should be set rather high so that
fluctuations in new TCP connections don't cause odd traffic behavior in relation to idle
connections.
Command Name        Description
no-arp-spoofing     Prevents a guest virtual machine from spoofing ARP traffic; only ARP request and reply messages matching the guest virtual machine's own MAC and IP addresses are allowed
allow-dhcp          Allows a guest virtual machine to request an IP address via DHCP (from any DHCP server)
allow-dhcp-server   Allows a guest virtual machine to request an IP address from a specified DHCP server
no-ip-spoofing      Prevents a guest virtual machine from sending IP packets with a source IP address different from its own
no-ip-multicast     Prevents a guest virtual machine from sending IP multicast packets
clean-traffic       Prevents MAC, IP and ARP spoofing; this filter references several of the other filters as building blocks
These filters are only building blocks and require a combination with other filters to provide useful
network traffic filtering. The most used one in the above list is the clean-traffic filter. This filter itself can
for example be combined with the no-ip-multicast filter to prevent virtual machines from sending IP
multicast traffic on top of the prevention of packet spoofing.
Using a Linux host physical machine, all traffic filtering rules created by libvirt's network filtering
subsystem first pass through the filtering support implemented by ebtables, and only afterwards
through iptables or ip6tables filters. If a filter tree has rules with the protocols mac, stp,
vlan, arp, rarp, ipv4, or ipv6, the ebtables rules and values listed will automatically be used first.
Multiple chains for the same protocol can be created. The name of the chain must have a prefix of
one of the previously enumerated protocols. To create an additional chain for handling of ARP
traffic, a chain with the name arp-test can, for example, be specified.
As an example, it is possible to filter on UDP traffic by source and destination ports using the IP
protocol filter and specifying attributes for the protocol, source and destination IP addresses and
ports of UDP packets that are to be accepted. This allows early filtering of UDP traffic with ebtables.
However, once an IP or IPv6 packet, such as a UDP packet, has passed the ebtables layer and there
is at least one rule in a filter tree that instantiates iptables or ip6tables rules, a rule to let the UDP
packet pass must also be provided for those filtering layers. This can be achieved with
a rule containing an appropriate udp or udp-ipv6 traffic filtering node.
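A minimal sketch of such a rule (the filter name, address and ports are illustrative assumptions):

<filter name='udp-accept-example' chain='ipv4'>
  <rule action='accept' direction='inout' priority='500'>
    <udp srcipaddr='10.1.2.3' srcportstart='500' srcportend='510'/>
  </rule>
</filter>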
Migration must occur between libvirt installations of version 0.8.1 or later in order not to lose the
network traffic filters associated with an interface.
VLAN (802.1Q) packets, if sent by a guest virtual machine, cannot be filtered with rules for
protocol IDs arp, rarp, ipv4 and ipv6. They can only be filtered with the protocol IDs mac and vlan.
Therefore, the example filter clean-traffic (Example 18.1, An example of network filtering) will not
work as expected.
...
<devices>
  <interface type='mcast'>
    <mac address='52:54:00:6d:90:01'/>
    <source address='230.0.0.1' port='5558'/>
  </interface>
</devices>
...
...
<devices>
  <interface type='server'>
    <mac address='52:54:00:22:c9:42'/>
    <source address='192.168.0.1' port='5558'/>
  </interface>
  ...
  <interface type='client'>
    <mac address='52:54:00:8b:c9:51'/>
    <source address='192.168.0.1' port='5558'/>
  </interface>
</devices>
...
<network>
<name>ovs-net</name>
<forward mode='bridge'/>
<bridge name='ovsbr0'/>
<virtualport type='openvswitch'>
<parameters interfaceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
</virtualport>
<vlan trunk='yes'>
<tag id='42' nativeMode='untagged'/>
<tag id='47'/>
</vlan>
<portgroup name='dontpanic'>
<vlan>
<tag id='42'/>
</vlan>
</portgroup>
</network>
Figure 18.28. Setting VLAN tag (on supported network types only)
If (and only if) the network type supports VLAN tagging transparent to the guest, an optional <vlan>
element can specify one or more VLAN tags to apply to the traffic of all guests using this network.
(openvswitch and type='hostdev' SR-IOV networks do support transparent VLAN tagging of guest
traffic; everything else, including standard Linux bridges and libvirt's own virtual networks, does not
support it. 802.1Qbh (vn-link) and 802.1Qbg (VEPA) switches provide their own way (outside of
libvirt) to tag guest traffic onto specific VLANs.) As expected, the tag attribute specifies which VLAN tag
to use. If a network has more than one <vlan> element defined, it is assumed that the user wants to
do VLAN trunking using all the specified tags. In the case that VLAN trunking with a single tag is
desired, the optional attribute trunk='yes' can be added to the VLAN element.
For network connections using openvswitch it is possible to configure the 'native-tagged' and
'native-untagged' VLAN modes. This uses the optional nativeMode attribute on the <tag> element:
nativeMode may be set to 'tagged' or 'untagged'. The id attribute of the element sets the native VLAN.
<vlan> elements can also be specified in a <portgroup> element, as well as directly in a domain's
<interface> element. In the case that a VLAN tag is specified in multiple locations, the setting in
<interface> takes precedence, followed by the setting in the <portgroup> selected by the
interface config. The <vlan> in <network> will be selected only if none is given in <portgroup> or
<interface>.
Whitelist Format
<name> - When used in a syntax description, this string should be replaced by a user-defined
value.
[a|b|c] - When used in a syntax description, only one of the strings separated by | is used.
When no comment is present, an option is supported with all possible values.
Emulated Machine
-M <machine-type>
-machine <machine-type>[,<property>[=<value>][,..]]
Processor Topology
-smp <n>[,cores=<ncores>][,threads=<nthreads>][,sockets=<nsocks>][,maxcpus=<maxcpus>]
Hypervisor and guest operating system limits on processor topology apply.
NUMA System
-numa <nodes>[,mem=<size>][,cpus=<cpu[-cpu>]][,nodeid=<node>]
Hypervisor and guest operating system limits on processor topology apply.
Memory Size
-m <megs>
Supported values are limited by guest minimal and maximal values and hypervisor limits.
Guest Name
-name <name>
Guest UUID
-uuid <uuid>
Generic Drive
-drive <option>[,<option>[,<option>[,...]]]
Supported with the following options:
readonly[on|off]
werror[enospc|report|stop|ignore]
rerror[report|stop|ignore]
id=<id>
Id of the drive has the following limitation for if=none:
IDE disk has to have <id> in the following format: drive-ide0-<BUS>-<UNIT>
Example of correct format:
-drive if=none,id=drive-ide0-<BUS>-<UNIT>,... -device ide-drive,drive=drive-ide0-<BUS>-<UNIT>,bus=ide.<BUS>,unit=<UNIT>
file=<file>
Value of <file> is parsed with the following rules:
Passing floppy device as <file> is not supported.
Passing cd-rom device as <file> is supported only with cdrom media type (media=cdrom) and
only as IDE drive (either if=ide or if=none + -device ide-drive).
If <file> is neither block nor character device, it must not contain ':'.
if=<interface>
The following interfaces are supported: none, ide, virtio, floppy.
index=<index>
media=<media>
cache=<cache>
Supported values: none, writeback or writethrough.
copy-on-read=[on|off]
snapshot=[yes|no]
serial=<serial>
aio=<aio>
format=<format>
This option is not required and can be omitted. However, this is not recommended for raw images
because it represents a security risk. Supported formats are:
qcow2
raw
Boot Option
-boot [order=<drives>][,menu=[on|off]]
Snapshot Mode
-snapshot
Disable Graphics
-nographic
VNC Display
-vnc <display>[,<option>[,<option>[,...]]]
Supported display value:
[<host>]:<port>
unix:<path>
share[allow-exclusive|force-shared|ignore]
none - Supported with no other options specified.
Supported options are:
to=<port>
reverse
password
tls
x509=</path/to/certificate/dir> - Supported when tls specified.
x509verify=</path/to/certificate/dir> - Supported when tls specified.
sasl
acl
Spice Desktop
-spice option[,option[,...]]
Supported options are:
port=<number>
addr=<addr>
ipv4
ipv6
password=<secret>
disable-ticketing
disable-copy-paste
tls-port=<number>
x509-dir=</path/to/certificate/dir>
x509-key-file=<file>
x509-key-password=<file>
x509-cert-file=<file>
x509-cacert-file=<file>
x509-dh-key-file=<file>
tls-cipher=<list>
tls-channel[main|display|cursor|inputs|record|playback]
plaintext-channel[main|display|cursor|inputs|record|playback]
image-compression=<compress>
jpeg-wan-compression=<value>
zlib-glz-wan-compression=<value>
streaming-video=[off|all|filter]
agent-mouse=[on|off]
playback-compression=[on|off]
seamless-migration=[on|off]
TAP Network
-netdev tap,id=<id>[,<options>...]
General Device
-device <driver>[,<prop>[=<value>][,...]]
All drivers support the following properties:
id
bus
The following drivers are supported (with available properties):
pci-assign
host
bootindex
configfd
addr
rombar
romfile
multifunction
If the device has multiple functions, all of them need to be assigned to the same guest.
rtl8139
mac
netdev
bootindex
addr
e1000
mac
netdev
bootindex
addr
virtio-net-pci
ioeventfd
vectors
indirect
event_idx
csum
guest_csum
gso
guest_tso4
guest_tso6
guest_ecn
guest_ufo
host_tso4
host_tso6
host_ecn
host_ufo
mrg_rxbuf
status
ctrl_vq
ctrl_rx
ctrl_vlan
ctrl_rx_extra
mac
netdev
bootindex
x-txtimer
x-txburst
tx
addr
qxl
ram_size
vram_size
revision
cmdlog
addr
ide-drive
unit
drive
physical_block_size
bootindex
ver
wwn
virtio-blk-pci
class
drive
logical_block_size
physical_block_size
min_io_size
opt_io_size
bootindex
ioeventfd
vectors
indirect_desc
event_idx
scsi
addr
virtio-scsi-pci - tech preview in 6.3, supported since 6.4.
For Windows guests, Windows Server 2003, which was tech preview, is no longer supported
since 6.5. However, Windows Server 2008 and 2012, and Windows desktop 7 and 8, are fully
supported since 6.5.
vectors
indirect_desc
event_idx
num_queues
addr
isa-debugcon
isa-serial
index
iobase
irq
chardev
virtserialport
nr
chardev
name
virtconsole
nr
chardev
name
virtio-serial-pci
vectors
class
indirect_desc
event_idx
max_ports
flow_control
addr
ES1370
addr
AC97
addr
intel-hda
addr
hda-duplex
cad
hda-micro
cad
hda-output
cad
i6300esb
addr
ib700 - no properties
sga - no properties
virtio-balloon-pci
indirect_desc
event_idx
addr
usb-tablet
migrate
port
usb-kbd
migrate
port
usb-mouse
migrate
port
usb-ccid - supported since 6.2
port
slot
usb-host - tech preview since 6.2
hostbus
hostaddr
hostport
vendorid
productid
isobufs
port
usb-hub - supported since 6.2
port
usb-ehci - tech preview since 6.2
freq
maxframes
port
usb-storage - tech preview since 6.2
drive
bootindex
serial
removable
port
usb-redir - tech preview for 6.3, supported since 6.4
chardev
filter
scsi-cd - tech preview for 6.3, supported since 6.4
drive
logical_block_size
physical_block_size
min_io_size
opt_io_size
bootindex
ver
serial
scsi-id
lun
channel-scsi
wwn
scsi-hd - tech preview for 6.3, supported since 6.4
drive
logical_block_size
physical_block_size
min_io_size
opt_io_size
bootindex
ver
serial
scsi-id
lun
channel-scsi
wwn
scsi-block - tech preview for 6.3, supported since 6.4
drive
bootindex
scsi-disk - tech preview for 6.3
drive=drive
logical_block_size
physical_block_size
min_io_size
opt_io_size
bootindex
ver
serial
scsi-id
lun
channel-scsi
wwn
piix3-usb-uhci
piix4-usb-uhci
Charact e r De vice
- ch ard ev backend,id=<id>[,<options>]
Supported backends are:
n u ll,id=<id> - null device
so cket ,id=<id>,port=<port>[,host=<host>][,to=<to>][,ipv4][,ipv6][,nodelay][,server][,nowait][,telnet]
- tcp socket
so cket ,id=<id>,path=<path>[,server][,nowait][,telnet] - unix socket
f ile,id=<id>,path=<path> - trafit to file.
st d io ,id=<id> - standard i/o
sp icevmc,id=<id>,name=<name> - spice channel
Enable USB
- u sb
Kernel File
-kernel <bzImage>
Note: multiboot images are not supported
RAM Disk
-initrd <file>
No Shutdown
-no-shutdown
No Reboot
-no-reboot
Monitor Redirect
-mon <chardev_id>[,mode=[readline|control]][,default=[on|off]]
RTC
-rtc [base=utc|localtime|date][,clock=host|vm][,driftfix=none|slew]
Watchdog
-watchdog model
Guest Memory Backing
-mem-prealloc -mem-path /dev/hugepages
SMBIOS Entry
-smbios type=0[,vendor=<str>][,version=<str>][,date=<str>][,release=%d.%d]
-smbios type=1[,manufacturer=<str>][,product=<str>][,version=<str>][,serial=<str>][,uuid=<uuid>][,sku=<str>][,family=<str>]
Help
-h
-help
Version
-version
Audio Help
-audio-help
Migration
-incoming
No Default Configuration
-nodefconfig
-nodefaults
Running without -nodefaults is not supported
Loaded Saved State
-loadvm <file>
Element
Description
<name>
<uuid>
<title>
<metadata>
...
<os>
<type>hvm</type>
<loader>/usr/lib/xen/boot/hvmloader</loader>
<boot dev='hd'/>
<boot dev='cdrom'/>
<bootmenu enable='yes'/>
<smbios mode='sysinfo'/>
<bios useserial='yes' rebootTimeout='0'/>
</os>
...
Figure 20.2. BIOS boot loader domain XML
The components of this section of the domain XML are as follows:
Table 20.2. BIOS boot loader elements
Element
Description
<type>
<loader>
<boot>
<bootmenu>
<smbios>
<bios>
...
<bootloader>/usr/bin/pygrub</bootloader>
<bootloader_args>--append single</bootloader_args>
...
Figure 20.3. Host physical machine boot loader domain XML
The components of this section of the domain XML are as follows:
Table 20.3. Host physical machine boot loader elements
Element
Description
<bootloader>
...
<os>
<type>hvm</type>
<loader>/usr/lib/xen/boot/hvmloader</loader>
<kernel>/root/f8-i386-vmlinuz</kernel>
<initrd>/root/f8-i386-initrd</initrd>
<cmdline>console=ttyS0 ks=http://example.com/f8-i386/os/</cmdline>
<dtb>/root/ppc.dtb</dtb>
</os>
...
Figure 20.4. Direct kernel boot
The components of this section of the domain XML are as follows:
Table 20.4. Direct kernel boot elements
Element
Description
<type>
<loader>
<kernel>
...
<os>
<smbios mode='sysinfo'/>
...
</os>
<sysinfo type='smbios'>
<bios>
<entry name='vendor'>LENOVO</entry>
</bios>
<system>
<entry name='manufacturer'>Fedora</entry>
<entry name='vendor'>Virt-Manager</entry>
</system>
</sysinfo>
...
Figure 20.5. SMBIOS system information
The <sysinfo> element has a mandatory attribute type that determines the layout of sub-elements, and may be defined as follows:
smbios - Sub-elements call out specific SMBIOS values, which will affect the guest virtual machine if used in conjunction with the smbios sub-element of the <os> element. Each sub-element of sysinfo names an SMBIOS block, and within those elements can be a list of entry elements that describe a field within the block. The following blocks and entries are recognized:
bios - This is block 0 of SMBIOS, with entry names drawn from vendor, version, date, and release.
<system> - This is block 1 of SMBIOS, with entry names drawn from manufacturer, product, version, serial, uuid, sku, and family. If a uuid entry is provided alongside a top-level uuid element, the two values must match.
<domain>
...
<vcpu placement='static' cpuset="1-4,^3,6" current="1">2</vcpu>
...
</domain>
Figure 20.6. CPU allocation
The <vcpu> element defines the maximum number of virtual CPUs (vCPUs) allocated for the guest virtual machine operating system, which must be between 1 and the maximum supported by the hypervisor. This element can contain an optional cpuset attribute, which is a comma-separated list of physical CPU numbers that domain processes and virtual CPUs can be pinned to by default.
Note that the pinning policy of domain processes and virtual CPUs can be specified separately by using the cputune attribute. If the emulatorpin attribute is specified in <cputune>, the cpuset value specified by <vcpu> will be ignored.
Similarly, virtual CPUs that have set a value for vcpupin cause cpuset settings to be ignored. Virtual CPUs where vcpupin is not specified will be pinned to the physical CPUs specified by cpuset. Each element in the cpuset list is either a single CPU number, a range of CPU numbers, or a caret (^) followed by a CPU number to be excluded from a previous range. The attribute current can be used to specify whether fewer than the maximum number of virtual CPUs should be enabled.
The optional attribute placement can be used to specify the CPU placement mode for the domain process. placement can be set to either static or auto. If you set <vcpu placement='auto'>, the system will query numad, use the settings specified in the <numatune> tag, and ignore any other settings in <vcpu>. If you set <vcpu placement='static'>, the system will use the settings specified in the <vcpu placement> tag instead of the settings in <numatune>.
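As a minimal sketch of automatic placement (the CPU counts shown are illustrative assumptions, not recommendations), the <vcpu> and <numatune> elements might be combined as follows:

<domain>
  ...
  <!-- up to 4 vCPUs, 2 enabled at boot; placement decided by numad -->
  <vcpu placement='auto' current='2'>4</vcpu>
  <numatune>
    <memory mode='strict' placement='auto'/>
  </numatune>
  ...
</domain>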
<domain>
...
<cputune>
<vcpupin vcpu="0" cpuset="1-4,^2"/>
<vcpupin vcpu="1" cpuset="0,1"/>
<vcpupin vcpu="2" cpuset="2,3"/>
<vcpupin vcpu="3" cpuset="0,4"/>
<emulatorpin cpuset="1-3"/>
34 6
<shares>2048</shares>
<period>1000000</period>
<quota>-1</quota>
<emulator_period>1000000</emulator_period>
<emulator_quota>-1</emulator_quota>
</cputune>
...
</domain>
Figure 20.7. CPU tuning
Although all are optional, the components of this section of the domain XML are as follows:
Table 20.5. CPU tuning elements
Element
Description
<cputune>
<vcpupin>
<shares>
<period>
<quota>
<domain>
...
<memoryBacking>
<hugepages/>
</memoryBacking>
...
</domain>
Figure 20.8. Memory backing
<domain>
...
<memtune>
<hard_limit unit='G'>1</hard_limit>
<soft_limit unit='M'>128</soft_limit>
<swap_hard_limit unit='G'>2</swap_hard_limit>
<min_guarantee unit='bytes'>67108864</min_guarantee>
</memtune>
...
</domain>
Figure 20.9. Memory Tuning
Although all are optional, the components of this section of the domain XML are as follows:
Table 20.6. Memory tuning elements
Element
Description
<memtune>
<hard_limit>
<soft_limit>
<swap_hard_limit>
<min_guarantee>
<domain>
...
<numatune>
<memory mode="strict" nodeset="1-4,^3"/>
</numatune>
...
</domain>
Figure 20.10. NUMA node tuning
Although all are optional, the components of this section of the domain XML are as follows:
Table 20.7. NUMA node tuning elements
Element
Description
<numatune>
<memory>
<domain>
...
<blkiotune>
<weight>800</weight>
<device>
<path>/dev/sda</path>
<weight>1000</weight>
</device>
<device>
<path>/dev/sdb</path>
<weight>500</weight>
</device>
</blkiotune>
...
</domain>
Figure 20.11. Block I/O Tuning
Although all are optional, the components of this section of the domain XML are as follows:
Table 20.8. Block I/O tuning elements
Element
Description
<blkiotune>
<weight>
The <resource> element groups together configuration related to resource partitioning; its <partition> child element defines the resource partition in which to place the domain. If no partition is listed, the domain will be placed in a default partition. It is the responsibility of the application or administrator to ensure that the partition exists prior to starting the guest virtual machine. Only the (hypervisor-specific) default partition can be assumed to exist by default.
<resource>
<partition>/virtualmachines/production</partition>
</resource>
Figure 20.12. Resource partitioning
Resource partitions are currently supported by the QEMU and LXC drivers, which map partition paths
to cgroups directories in all mounted controllers.
<cpu match='exact'>
<model fallback='allow'>core2duo</model>
<vendor>Intel</vendor>
<topology sockets='1' cores='2' threads='1'/>
<feature policy='disable' name='lahf_lm'/>
</cpu>
Figure 20.13. CPU model and topology example 1
<cpu mode='host-model'>
<model fallback='forbid'/>
<topology sockets='1' cores='2' threads='1'/>
</cpu>
Figure 20.14. CPU model and topology example 2
<cpu mode='host-passthrough'/>
Figure 20.15. CPU model and topology example 3
In cases where no restrictions need to be put on either the CPU model or its features, a simpler <cpu> element such as the following may be used:
<cpu>
<topology sockets='1' cores='2' threads='1'/>
</cpu>
Figure 20.16. CPU model and topology example 4
The components of this section of the domain XML are as follows:
Table 20.9. CPU model and topology elements
Element
Description
<cpu>
<match>
<mode>
<model>
Specifies the CPU model requested by the guest virtual machine. The list of available CPU models and their definitions can be found in the cpu_map.xml file installed in libvirt's data directory. If a hypervisor is not able to use the exact CPU model, libvirt automatically falls back to the closest model supported by the hypervisor while maintaining the list of CPU features. An optional fallback attribute can be used to forbid this behavior, in which case an attempt to start a domain requesting an unsupported CPU model will fail. Supported values for the fallback attribute are: allow (this is the default) and forbid. The optional vendor_id attribute can be used to set the vendor ID seen by the guest virtual machine. It must be exactly 12 characters long. If not set, the vendor ID of the host physical machine is used. Typical possible values are AuthenticAMD and GenuineIntel.
<vendor>
Specifies the CPU vendor requested by the guest virtual machine. If this element is missing, the guest virtual machine runs on a CPU matching the given features regardless of its vendor. The list of supported vendors can be found in cpu_map.xml.
<topology>
Specifies the requested topology of the virtual CPUs provided to the guest virtual machine. Three non-zero values have to be given for sockets, cores, and threads: the total number of CPU sockets, the number of cores per socket, and the number of threads per core, respectively.
<feature>
Can contain zero or more elements used to fine-tune features provided by the selected CPU model. The list of known feature names can be found in the same file as the CPU models. The meaning of each feature element depends on its policy attribute, which has to be set to one of the following values:
force - forces the feature to be supported by the virtual CPU, regardless of whether it is actually supported by the host physical machine CPU.
require - dictates that guest virtual machine creation will fail unless the feature is supported by the host physical machine CPU. This is the default setting.
optional - the feature is supported by the virtual CPU, but only if it is supported by the host physical machine CPU.
disable - the feature is not supported by the virtual CPU.
forbid - guest virtual machine creation will fail if the feature is supported by the host physical machine CPU.
<cpu>
<numa>
<cell cpus='0-3' memory='512000'/>
<cell cpus='4-7' memory='512000'/>
</numa>
</cpu>
...
Figure 20.17. Guest Virtual Machine NUMA Topology
Each cell element specifies a NUMA cell or NUMA node. cpus specifies the CPU or range of CPUs that are part of the node. memory specifies the node memory in kibibytes (blocks of 1024 bytes). Each cell or node is assigned a cellid or nodeid in increasing order starting from 0.
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<on_lockfailure>poweroff</on_lockfailure>
Figure 20.18. Events configuration
The following collections of elements allow the actions to be specified when a guest virtual machine OS triggers a life cycle operation. A common use case is to force a reboot to be treated as a power off when doing the initial OS installation. This allows the VM to be re-configured for the first post-install boot.
The components of this section of the domain XML are as follows:
Table 20.10. Event configuration elements
State
Description
<on_poweroff>
<on_reboot>
<on_crash>
<on_lockfailure>
...
<pm>
<suspend-to-disk enabled='no'/>
<suspend-to-mem enabled='yes'/>
</pm>
...
Figure 20.19. Power management
The <pm> element can be enabled using the argument yes or disabled using the argument no. BIOS support can be implemented for the S3 ACPI sleep state using the suspend-to-mem element and for the S4 ACPI sleep state using the suspend-to-disk element. If nothing is specified, the hypervisor will be left with its default value.
...
<features>
<pae/>
<acpi/>
<apic/>
<hap/>
<privnet/>
<hyperv>
<relaxed state='on'/>
</hyperv>
</features>
...
Figure 20.20. Feature
The components of this section of the domain XML are as follows:
Table 20.11. Feature elements
State
Description
<pae>
<acpi>
<apic>
<hap>
<hyperv>
20.15. Timekeeping
The guest virtual machine clock is typically initialized from the host physical machine clock. Most operating systems expect the hardware clock to be kept in UTC, which is the default setting. Note that for Windows guest virtual machines, the guest virtual machine must be set in localtime.
...
<clock offset='localtime'>
<timer name='rtc' tickpolicy='catchup' track='guest'>
<catchup threshold='123' slew='120' limit='10000'/>
</timer>
<timer name='pit' tickpolicy='delay'/>
</clock>
...
Figure 20.21. Timekeeping
The components of this section of the domain XML are as follows:
Table 20.12. Timekeeping elements
State
Description
<clock>
See Note
<timer>
See Note
<frequency>
This is an unsigned integer specifying the frequency at which name="tsc" runs.
<mode>
The mode attribute controls how the name="tsc" <timer> is managed, and can be set to: auto, native, emulate, paravirt, or smpsafe. Other timers are always emulated.
<present>
Specifies whether a particular timer is available to the guest virtual machine. Can be set to yes or no.
Note
Each <timer> element must contain a name attribute, and may have the following attributes depending on the name specified.
<name> - selects which timer is being modified. The following values are acceptable: kvmclock (QEMU-KVM), pit (QEMU-KVM), rtc (QEMU-KVM), or tsc (libxl only). Note that platform is currently unsupported.
track - specifies the timer track. The following values are acceptable: boot, guest, or wall. track is only valid for name="rtc".
tickpolicy - determines what happens when the deadline for injecting a tick to the guest virtual machine is missed. The following values can be assigned:
delay - will continue to deliver ticks at the normal rate. The guest virtual machine time will be delayed due to the late tick.
catchup - delivers ticks at a higher rate in order to catch up with the missed tick. The guest virtual machine time is not delayed once catchup is complete. In addition, there can be three optional attributes, each a positive integer: threshold, slew, and limit.
merge - merges the missed tick(s) into one tick and injects them. The guest virtual machine time may be delayed, depending on how the merge is done.
discard - throws away the missed tick(s) and continues with future injection at its default interval setting. The guest virtual machine time may be delayed, unless there is an explicit statement for handling lost ticks.
20.16. Devices
This set of XML elements is used to describe devices provided to the guest virtual machine domain. All of the devices below are indicated as children of the main devices element.
The following virtual devices are supported:
virtio-scsi-pci - PCI bus storage device
virtio-9p-pci - PCI bus storage device
virtio-blk-pci - PCI bus storage device
virtio-net-pci - PCI bus network device also known as virtio-net
virtio-serial-pci - PCI bus input device
virtio-balloon-pci - PCI bus memory balloon device
virtio-rng-pci - PCI bus virtual random number generator device
Important
If a virtio device is created where the number of vectors is set to a value higher than 32, the device behaves as if it was set to a zero value on Red Hat Enterprise Linux 6, but not on Red Hat Enterprise Linux 7. The resulting vector setting mismatch causes a migration error if the number of vectors on any virtio device on either platform is set to 33 or higher. It is therefore not recommended to set the vector value to be greater than 32. All virtio devices with the exception of virtio-balloon-pci and virtio-rng-pci will accept a vector argument.
...
<devices>
<emulator>/usr/lib/xen/bin/qemu-dm</emulator>
</devices>
...
Figure 20.22. Devices - child elements
The contents of the <emulator> element specify the fully qualified path to the device model emulator binary. The capabilities XML specifies the recommended default emulator to use for each particular domain type or architecture combination.
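For instance, on a Red Hat Enterprise Linux KVM host the element typically points at the qemu-kvm binary; a minimal sketch (the path shown is an assumption about a default installation):

<devices>
  <!-- device model emulator for a KVM guest -->
  <emulator>/usr/libexec/qemu-kvm</emulator>
</devices>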
...
<devices>
<disk type='file' snapshot='external'>
<driver name="tap" type="aio" cache="default"/>
<source file='/var/lib/xen/images/fv0' startupPolicy='optional'>
<seclabel relabel='no'/>
</source>
<target dev='hda' bus='ide'/>
<iotune>
<total_bytes_sec>10000000</total_bytes_sec>
<read_iops_sec>400000</read_iops_sec>
<write_iops_sec>100000</write_iops_sec>
</iotune>
<boot order='2'/>
<encryption type='...'>
...
</encryption>
<shareable/>
<serial>
...
</serial>
</disk>
...
<disk type='network'>
<driver name="qemu" type="raw" io="threads" ioeventfd="on"
event_idx="off"/>
<source protocol="sheepdog" name="image_name">
<host name="hostname" port="7000"/>
</source>
<target dev="hdb" bus="ide"/>
<boot order='1'/>
<transient/>
<address type='drive' controller='0' bus='1' unit='0'/>
</disk>
<disk type='network'>
<driver name="qemu" type="raw"/>
<source protocol="rbd" name="image_name2">
<host name="hostname" port="7000"/>
</source>
<target dev="hdd" bus="ide"/>
<auth username='myuser'>
<secret type='ceph' usage='mypassid'/>
</auth>
</disk>
<disk type='block' device='cdrom'>
<driver name='qemu' type='raw'/>
<target dev='hdc' bus='ide' tray='open'/>
<readonly/>
</disk>
<disk type='block' device='lun'>
<driver name='qemu' type='raw'/>
<source dev='/dev/sda'/>
<target dev='sda' bus='scsi'/>
<address type='drive' controller='0' bus='0' target='3' unit='0'/>
</disk>
<disk type='block' device='disk'>
<driver name='qemu' type='raw'/>
<source dev='/dev/sda'/>
<geometry cyls='16383' heads='16' secs='63' trans='lba'/>
<blockio logical_block_size='512' physical_block_size='4096'/>
<target dev='hda' bus='ide'/>
</disk>
<disk type='volume' device='disk'>
<driver name='qemu' type='raw'/>
<source pool='blk-pool0' volume='blk-pool0-vol0'/>
<target dev='hda' bus='ide'/>
</disk>
</devices>
...
Figure 20.23. Devices - Hard drives, floppy disks, CD-ROMs
20.16.1.1. Disk element
The <disk> element is the main container for describing disks. The attribute type can be used with the <disk> element. The following types are allowed:
file
block
dir
network
For more information, see Disk Elements.
20.16.1.2. Source element
If <disk type='file'>, then the file attribute specifies the fully-qualified path to the file holding the disk. If <disk type='block'>, then the dev attribute specifies the path to the host physical machine device to serve as the disk. With both file and block, one or more optional sub-elements seclabel, described below, can be used to override the domain security labeling policy for just that source file. If the disk type is dir, then the dir attribute specifies the fully-qualified path to the directory to use as the disk. If the disk type is network, then the protocol attribute specifies the protocol used to access the requested image; possible values are nbd, rbd, sheepdog, or gluster.
If the protocol attribute is rbd, sheepdog, or gluster, an additional attribute name is mandatory to specify which volume or image will be used. When the disk type is network, the source may have zero or more host sub-elements used to specify the host physical machines to connect to. For a file disk type which represents a CD-ROM or floppy (the device attribute), it is possible to define a policy for what to do with the disk if the source file is not accessible. This is done by setting the startupPolicy attribute to one of the following values (see the sketch below):
mandatory causes a failure if missing for any reason. This is the default setting.
requisite causes a failure if missing on boot up, and drops the disk if missing on migrate/restore/revert.
optional drops the disk if missing at any start attempt.
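For instance, a minimal sketch (the ISO path is a placeholder) of a CD-ROM source that tolerates a missing image at start-up:

<disk type='file' device='cdrom'>
  <driver name='qemu' type='raw'/>
  <!-- the guest still starts if this file is absent -->
  <source file='/var/lib/libvirt/images/install.iso' startupPolicy='optional'/>
  <target dev='hdc' bus='ide'/>
  <readonly/>
</disk>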
20.16.1.3. Mirror element
This element is present if the hypervisor has started a BlockCopy operation, where the <mirror> location in the attribute file will eventually have the same contents as the source, and with the file format in the attribute format (which might differ from the format of the source). If an attribute ready is present, then it is known the disk is ready to pivot; otherwise, the disk is probably still copying. For now, this element is only valid in output; it is ignored on input.
20.16.1.4. Target element
The <target> element controls the bus or device under which the disk is exposed to the guest virtual machine OS. The dev attribute indicates the logical device name. The actual device name specified is not guaranteed to map to the device name in the guest virtual machine OS. The optional bus attribute specifies the type of disk device to emulate; possible values are driver specific, with typical values being ide, scsi, virtio, xen, usb, or sata. If omitted, the bus type is inferred from the style of the device name. For example, a device named 'sda' will typically be exported using a SCSI bus. The optional attribute tray indicates the tray status of removable disks (CD-ROM or floppy disk); the value can be either open or closed. The default setting is closed. For more information, see target Elements.
20.16.1.5. iotune
The optional <iotune> element provides the ability to provide additional per-device I/O tuning, with values that can vary for each device (contrast this to the blkiotune element, which applies globally to the domain). This element has the following optional sub-elements, combined in the sketch after this list. Note that any sub-element not specified at all, or specified with a value of 0, implies no limit.
<total_bytes_sec> - the total throughput limit in bytes per second. This element cannot be used with <read_bytes_sec> or <write_bytes_sec>.
<read_bytes_sec> - the read throughput limit in bytes per second.
<write_bytes_sec> - the write throughput limit in bytes per second.
<total_iops_sec> - the total I/O operations per second. This element cannot be used with <read_iops_sec> or <write_iops_sec>.
<read_iops_sec> - the read I/O operations per second.
<write_iops_sec> - the write I/O operations per second.
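As an illustrative sketch only (the numeric limits are arbitrary), these sub-elements can be combined as long as a total limit is not mixed with its read/write counterparts:

<iotune>
  <!-- separate read/write byte limits, so no total_bytes_sec -->
  <read_bytes_sec>20000000</read_bytes_sec>
  <write_bytes_sec>10000000</write_bytes_sec>
  <!-- a total IOPS limit, so no read_iops_sec/write_iops_sec -->
  <total_iops_sec>500000</total_iops_sec>
</iotune>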
20.16.1.6. driver
The optional <driver> element allows specifying further details related to the hypervisor driver that is used to provide the disk. The following options may be used:
If the hypervisor supports multiple back-end drivers, then the name attribute selects the primary back-end driver name, while the optional type attribute provides the sub-type. For a list of possible types, refer to Driver Elements.
The optional cache attribute controls the cache mechanism. Possible values are: default, none, writethrough, writeback, directsync (similar to writethrough, but it bypasses the host physical machine page cache), and unsafe (the host physical machine may cache all disk I/O, and sync requests from the guest virtual machine are ignored).
The optional error_policy attribute controls how the hypervisor behaves on a disk read or write error. Possible values are stop, report, ignore, and enospace. The default setting of error_policy is report. There is also an optional rerror_policy that controls behavior for read errors only. If no rerror_policy is given, error_policy is used for both read and write errors. If rerror_policy is given, it overrides the error_policy for read errors. Also note that enospace is not a valid policy for read errors, so if error_policy is set to enospace and no rerror_policy is given, the default setting, report, will be used for read errors.
The optional io attribute controls specific policies on I/O; qemu guest virtual machines support threads and native. The optional ioeventfd attribute allows users to set asynchronous handling of domain I/O for the disk device. The default is left to the discretion of the hypervisor. Accepted values are on and off. Enabling this allows the guest virtual machine to be executed while a separate thread handles I/O. Typically, guest virtual machines experiencing high system CPU utilization during I/O will benefit from this. On the other hand, an overloaded host physical machine can increase guest virtual machine I/O latency. Unless you are absolutely certain that the io setting needs to be manipulated, it is highly recommended that you not change the default setting and allow the hypervisor to dictate the setting.
The optional event_idx attribute controls some aspects of device event processing and can be set to either on or off. If it is on, it will reduce the number of interrupts and exits for the guest virtual machine. The default is determined by the hypervisor, and the default setting is on. In situations where this behavior is suboptimal, this attribute provides a way to force the feature off. Unless you are absolutely certain that the event_idx setting needs to be manipulated, it is highly recommended that you not change the default setting and allow the hypervisor to dictate the setting.
The optional copy_on_read attribute controls whether to copy the read backing file into the image file. The accepted values are on and off. copy-on-read avoids accessing the same backing file sectors repeatedly, and is useful when the backing file is over a slow network. By default copy-on-read is off.
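Bringing several of these attributes together, a sketch of a <driver> element (this particular combination is illustrative, not a recommendation) might look like:

<driver name='qemu' type='raw' cache='none' error_policy='stop'
        io='native' ioeventfd='on' event_idx='off' copy_on_read='on'/>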
sheepdog - Specifies one of the sheepdog servers (the default is localhost:7000); one or no host physical machines can be used.
gluster - Specifies a server running a glusterd daemon, and may be used for only one host physical machine. The valid values for the transport attribute are tcp, rdma, or unix. If nothing is specified, tcp is assumed. If the transport is unix, the socket attribute specifies the path to the UNIX socket.
<address> - Ties the disk to a given slot of a controller (see the sketch after this list). The actual <controller> device can often be inferred, but it can also be explicitly specified. The type attribute is mandatory, and is typically pci or drive. For a pci controller, additional attributes for bus, slot, and function must be present, as well as the optional domain and multifunction. multifunction defaults to off. For a drive controller, the additional attributes controller, bus, target, and unit are available, each with a default setting of 0.
auth - Provides the authentication credentials needed to access the source. It includes a mandatory attribute username, which identifies the user name to use during authentication, as well as a sub-element secret with a mandatory attribute type. More information can be found at Device Elements.
geometry - Provides the ability to override geometry settings. This is mostly useful for S390 DASD disks or older DOS disks.
cyls - Specifies the number of cylinders.
heads - Specifies the number of heads.
secs - Specifies the number of sectors per track.
trans - Specifies the BIOS translation mode, and can have the following values: none, lba, or auto.
blockio - Allows the block device to be overridden with any of the block device properties listed below:
blockio options
logical_block_size - reports to the guest virtual machine OS and describes the smallest units for disk I/O.
physical_block_size - reports to the guest virtual machine OS and describes the disk's hardware sector size, which can be relevant for the alignment of disk data.
...
<devices>
<filesystem type='template'>
<source name='my-vm-template'/>
<target dir='/'/>
</filesystem>
<filesystem type='mount' accessmode='passthrough'>
target - Dictates where the source drivers can be accessed in the guest virtual machine. For most drivers this is an automatic mount point, but for QEMU-KVM this is merely an arbitrary string tag that is exported to the guest virtual machine as a hint for where to mount.
readonly - Enables exporting the file system as a read-only mount for the guest virtual machine; by default, read-write access is given.
space_hard_limit - Specifies the maximum space available to this guest virtual machine's file system.
space_soft_limit - Specifies the maximum space available to this guest virtual machine's file system. The container is permitted to exceed its soft limits for a grace period of time. Afterwards the hard limit is enforced. These elements are combined in the sketch below.
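A sketch of a complete <filesystem> definition using these elements (the directory paths and limits are placeholders, not recommendations, and the space limit sub-elements are an assumption about the schema):

<filesystem type='mount' accessmode='passthrough'>
  <source dir='/export/to/guest'/>
  <!-- for QEMU-KVM, target dir is a tag hinting where to mount -->
  <target dir='/import/from/host'/>
  <readonly/>
  <space_hard_limit unit='G'>10</space_hard_limit>
  <space_soft_limit unit='G'>8</space_soft_limit>
</filesystem>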
...
<devices>
<controller type='ide' index='0'/>
<controller type='virtio-serial' index='0' ports='16' vectors='4'/>
<controller type='virtio-serial' index='1'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x0a'
function='0x0'/>
<controller type='scsi' index='0' model='virtio-scsi'
num_queues='8'/>
</controller>
...
</devices>
...
It should be noted that virtio-scsi controllers and drivers will work on both KVM and Windows guest virtual machines. The <controller type='scsi'> also has an attribute num_queues which enables multi-queue support for the number of queues specified.
A "usb" controller has an optional attribute mo d el , which is one of "pi i x3-uhci ", "pi i x4 uhci ", "ehci ", "i ch9 -ehci 1", "i ch9 -uhci 1", "i ch9 -uhci 2", "i ch9 -uhci 3", "vt82c6 86 buhci ", "pci -o hci " or "nec-xhci ". Additionally, if the USB bus needs to be explicitly disabled for
the guest virtual machine, mo d el = ' no ne' may be used. The PowerPC64 " spapr-vio" addresses do
not have an associated controller.
For controllers that are themselves devices on a PCI or USB bus, an optional sub-element ad d ress
can specify the exact relationship of the controller to its master bus, with semantics given above.
USB companion controllers have an optional sub-element master to specify the exact relationship of
the companion to its master controller. A companion controller is on the same bus as its master, so
the companion index value should be equal.
...
<devices>
  <controller type='usb' index='0' model='ich9-ehci1'>
    <address type='pci' domain='0' bus='0' slot='4' function='7'/>
  </controller>
  <controller type='usb' index='0' model='ich9-uhci1'>
    <master startport='0'/>
    <address type='pci' domain='0' bus='0' slot='4' function='0'
multifunction='on'/>
  </controller>
  ...
</devices>
...
...
<devices>
...
<lease>
<lockspace>somearea</lockspace>
<key>somekey</key>
<target path='/some/lease/path' offset='1024'/>
</lease>
...
</devices>
...
Figure 20.27. Devices - device leases
The lease section can have the following arguments:
lockspace - an arbitrary string that identifies the lockspace within which the key is held. Lock managers may impose extra restrictions on the format or length of the lockspace name.
key - an arbitrary string that uniquely identifies the lease to be acquired. Lock managers may impose extra restrictions on the format or length of the key.
target - the fully qualified path of the file associated with the lockspace. The offset specifies where the lease is stored within the file. If the lock manager does not require an offset, set this value to 0.
...
<devices>
<hostdev mode='subsystem' type='usb'>
<source startupPolicy='optional'>
<vendor id='0x1234'/>
<product id='0xbeef'/>
</source>
<boot order='2'/>
</hostdev>
</devices>
...
Figure 20.28. Devices - host physical machine device assignment
Alternatively the following can also be done:
...
<devices>
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address bus='0x06' slot='0x02' function='0x0'/>
</source>
<boot order='1'/>
<rom bar='on' file='/etc/fake/boot.bin'/>
</hostdev>
</devices>
...
Figure 20.29. Devices - host physical machine device assignment alternative
The components of this section of the domain XML are as follows:
Table 20.13. Host physical machine device assignment elements
Parameter
Description
hostdev
source
boot
rom
address
...
<hostdev mode='capabilities' type='storage'>
<source>
<block>/dev/sdf1</block>
</source>
</hostdev>
...
...
<hostdev mode='capabilities' type='misc'>
<source>
<char>/dev/input/event3</char>
</source>
</hostdev>
...
...
<hostdev mode='capabilities' type='net'>
<source>
<interface>eth0</interface>
</source>
</hostdev>
...
Parameter
Description
hostdev
source
...
<devices>
<redirdev bus='usb' type='tcp'>
<source mode='connect' host='localhost' service='4000'/>
<boot order='1'/>
</redirdev>
<redirfilter>
<usbdev class='0x08' vendor='0x1234' product='0xbeef'
version='2.00' allow='yes'/>
<usbdev allow='no'/>
</redirfilter>
</devices>
...
Figure 20.33. Devices - redirected devices
The components of this section of the domain XML are as follows:
Table 20.15. Redirected device elements
Parameter
Description
redirdev
boot
...
<devices>
<smartcard mode='host'/>
<smartcard mode='host-certificates'>
<certificate>cert1</certificate>
<certificate>cert2</certificate>
<certificate>cert3</certificate>
<database>/etc/pki/nssdb/</database>
</smartcard>
<smartcard mode='passthrough' type='tcp'>
<source mode='bind' host='127.0.0.1' service='2001'/>
<protocol type='raw'/>
<address type='ccid' controller='0' slot='0'/>
</smartcard>
<smartcard mode='passthrough' type='spicevmc'/>
</devices>
...
Figure 20.34. Devices - smartcard devices
The smartcard element has a mandatory attribute mode. The following modes are supported; in each mode, the guest virtual machine sees a device on its USB bus that behaves like a physical USB CCID (Chip/Smart Card Interface Device) card.
The mode attributes are as follows:
Table 20.16. Smartcard mode elements
Parameter
Description
mode='host'
Each mode supports an optional sub-element address, which fine-tunes the correlation between the smartcard and a ccid bus controller (refer to Section 20.16.3, Device Addresses).
...
<devices>
<interface type='bridge'>
<source bridge='xenbr0'/>
<mac address='00:16:3e:5d:c7:9e'/>
<script path='vif-bridge'/>
<boot order='1'/>
<rom bar='off'/>
</interface>
</devices>
...
Figure 20.35. Devices - network interfaces
There are several possibilities for specifying a network interface visible to the guest virtual machine. Each subsection below provides more details about common setup options. Additionally, each <interface> element has an optional <address> sub-element that can tie the interface to a particular PCI slot, with attribute type='pci' (refer to Section 20.16.3, Device Addresses).
You may choose to specify no type, but both a profileid (in case the switch is 802.1Qbh) and an interfaceid (in case the switch is Open vSwitch). You may also omit the other attributes, such as managerid, typeid, or profileid, to be filled in from the network's virtualport. If you want to limit a guest virtual machine to connecting only to certain types of switches, you can specify the virtualport type, but still omit some or all of the parameters; in this case, if the host physical machine's network has a different type of virtualport, connection of the interface will fail. The virtual network parameters are defined using management tools that modify the following part of the domain XML:
...
<devices>
<interface type='network'>
<source network='default'/>
</interface>
...
<interface type='network'>
<source network='default' portgroup='engineering'/>
<target dev='vnet7'/>
<mac address="00:11:22:33:44:55"/>
<virtualport>
<parameters instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
</virtualport>
</interface>
</devices>
...
Figure 20.36. Devices - network interfaces - virtual networks
20.16.9.2. Bridge to LAN
Note that this is the recommended configuration for general guest virtual machine connectivity on host physical machines with static wired networking configurations.
Bridge to LAN provides a bridge from the guest virtual machine directly onto the LAN. This assumes there is a bridge device on the host physical machine which has one or more of the host physical machine's physical NICs enslaved. The guest virtual machine will have an associated tun device created with a name of vnetN, which can also be overridden with the <target> element (refer to Section 20.16.9.11, Overriding the target element). The tun device will be enslaved to the bridge. The IP range and network configuration is whatever is used on the LAN. This provides the guest virtual machine full incoming and outgoing network access, just like a physical machine.
On Linux systems, the bridge device is normally a standard Linux host physical machine bridge. On host physical machines that support Open vSwitch, it is also possible to connect to an Open vSwitch bridge device by adding <virtualport type='openvswitch'/> to the interface definition. The Open vSwitch type virtualport accepts two parameters in its parameters element: an interfaceid, which is a standard UUID used to uniquely identify this particular interface to Open vSwitch (if you do not specify one, a random interfaceid will be generated for you when you first define the interface), and an optional profileid, which is sent to Open vSwitch as the interface's <port-profile>. To set the bridge to LAN settings, use a management tool that will configure the following part of the domain XML:
...
<devices>
...
<interface type='bridge'>
<source bridge='br0'/>
</interface>
<interface type='bridge'>
<source bridge='br1'/>
<target dev='vnet7'/>
<mac address="00:11:22:33:44:55"/>
</interface>
<interface type='bridge'>
<source bridge='ovsbr'/>
<virtualport type='openvswitch'>
<parameters profileid='menial' interfaceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
</virtualport>
</interface>
...
</devices>
Figure 20.37. Devices - network interfaces - bridge to LAN
<forward mode='nat'>
<address start='192.0.2.1' end='192.0.2.10'/>
</forward> ...
Figure 20.38. Port Masquerading Range
These values should be set using the iptables commands as shown in Section 18.3, Network Address Translation Mode.
...
<devices>
<interface type='user'/>
...
<interface type='user'>
<mac address="00:11:22:33:44:55"/>
</interface>
</devices>
...
Figure 20.39. Devices - network interfaces - Userspace SLIRP stack
...
<devices>
<interface type='ethernet'/>
...
<interface type='ethernet'>
<target dev='vnet7'/>
<script path='/etc/qemu-ifup-mynet'/>
</interface>
</devices>
...
Figure 20.40. Devices - network interfaces - generic Ethernet connection
...
<devices>
...
<interface type='direct'>
<source dev='eth0' mode='vepa'/>
</interface>
</devices>
...
Figure 20.41. Devices - network interfaces - direct attachment to physical interfaces
The individual modes cause the delivery of packets to behave as shown in Table 20.17, Direct attachment to physical interface elements:
Table 20.17. Direct attachment to physical interface elements
Element
Description
vepa
bridge
private
passthrough
The network access of directly attached virtual machines can be managed by the hardware switch to which the physical interface of the host physical machine is connected.
The interface can have additional parameters as shown below, if the switch conforms to the IEEE 802.1Qbg standard. The parameters of the virtualport element are documented in more detail in the IEEE 802.1Qbg standard. The values are network specific and should be provided by the network administrator. In 802.1Qbg terms, the Virtual Station Interface (VSI) represents the virtual interface of a virtual machine.
Note that IEEE 802.1Qbg requires a non-zero value for the VLAN ID.
Additional elements that can be manipulated are described in Table 20.18, Direct attachment to physical interface additional elements:
Table 20.18. Direct attachment to physical interface additional elements
Element
Description
managerid
typeid
typeidversion
instanceid
profileid
...
<devices>
...
<interface type='direct'>
<source dev='eth0.2' mode='vepa'/>
<virtualport type="802.1Qbg">
<parameters managerid="11" typeid="1193047" typeidversion="2"
instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f"/>
</virtualport>
</interface>
</devices>
...
Figure 20.42. Devices - network interfaces - direct attachment to physical interfaces additional parameters
The interface can have additional parameters as shown below if the switch conforms to the IEEE 802.1Qbh standard. The values are network specific and should be provided by the network administrator.
Additional parameters in the domain XML include:
...
<devices>
...
<interface type='direct'>
<source dev='eth0' mode='private'/>
<virtualport type='802.1Qbh'>
<parameters profileid='finance'/>
</virtualport>
</interface>
</devices>
...
Figure 20.43. Devices - network interfaces - direct attachment to physical interfaces, more additional parameters
The profileid attribute contains the name of the port profile that is to be applied to this interface. This name is resolved by the port profile database into the network parameters from the port profile, and those network parameters will be applied to this interface.
...
<devices>
<interface type='hostdev'>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07'
function='0x0'/>
</source>
<mac address='52:54:00:6d:90:02'/>
<virtualport type='802.1Qbh'>
<parameters profileid='finance'/>
</virtualport>
</interface>
</devices>
...
Figure 20.44. Devices - network interfaces - PCI passthrough
...
<devices>
<interface type='mcast'>
<mac address='52:54:00:6d:90:01'/>
<source address='230.0.0.1' port='5558'/>
</interface>
</devices>
...
Figure 20.45. Devices - network interfaces - multicast tunnel
20.16.9.9. TCP tunnel
Creating a TCP client/server architecture is another way to provide a virtual network where one guest virtual machine provides the server end of the network and all other guest virtual machines are configured as clients. All network traffic between the guest virtual machines is routed through the guest virtual machine that is configured as the server. This model is also available for use by unprivileged users. There is no default DNS or DHCP support and no outgoing network access. To provide outgoing network access, one of the guest virtual machines should have a second NIC which is connected to one of the first four network types, thereby providing the appropriate routing. A TCP tunnel is created by using a management tool to set or change the interface type to server or client, and providing a mac and source address. The result is shown in changes made to the domain XML:
...
<devices>
<interface type='server'>
<mac address='52:54:00:22:c9:42'/>
...
<devices>
<interface type='network'>
<source network='default'/>
<target dev='vnet1'/>
<model type='virtio'/>
<driver name='vhost' txmode='iothread' ioeventfd='on'
event_idx='off'/>
</interface>
</devices>
...
Figure 20.47. Devices - network interfaces - setting NIC driver-specific options
Currently the following attributes are available for the "virtio" NIC driver:
Table 20.19. virtio NIC driver elements
Parameter
Description
name
txmode
ioeventfd
event_idx
...
<devices>
<interface type='network'>
<source network='default'/>
<target dev='vnet1'/>
</interface>
</devices>
...
Figure 20.48. Devices - network interfaces - overriding the target element
If no target is specified, certain hypervisors will automatically generate a name for the created tun device. This name can be manually specified; however, the name must not start with either 'vnet' or 'vif', which are prefixes reserved by libvirt and certain hypervisors. Manually specified targets using these prefixes will be ignored.
...
<devices>
<interface type='network'>
<source network='default'/>
<target dev='vnet1'/>
<boot order='1'/>
</interface>
</devices>
...
Figure 20.49. Specifying boot order
For hypervisors which support it, you can set a specific NIC to be used for the network boot. The order of attributes determines the order in which devices will be tried during the boot sequence. Note that the per-device boot elements cannot be used together with general boot elements in the BIOS bootloader section.
...
<devices>
<interface type='network'>
<source network='default'/>
<target dev='vnet1'/>
<rom bar='on' file='/etc/fake/boot.bin'/>
</interface>
</devices>
...
20.16.9.14. Quality of service
This section of the domain XML provides settings for quality of service. Incoming and outgoing traffic can be shaped independently. The bandwidth element can have at most one inbound and at most one outbound child element. Leaving either of these child elements out results in no QoS being applied in that traffic direction. Therefore, when you want to shape only a domain's incoming traffic, use inbound only, and vice versa.
Each of these elements has one mandatory attribute, average (or floor, as described below). average specifies the average bit rate on the interface being shaped. In addition, there are two optional attributes: peak, which specifies the maximum rate at which the interface can send data, and burst, which specifies the amount of bytes that can be burst at peak speed. Accepted values for these attributes are integer numbers.
The units for the average and peak attributes are kilobytes per second, whereas burst is only set in kilobytes. In addition, inbound traffic can optionally have a floor attribute. This guarantees minimal throughput for shaped interfaces. Using the floor requires that all traffic goes through one point where QoS decisions can take place. As such, it may only be used in cases where <interface type='network'/> has a forward type of route, nat, or no forward at all. It should be noted that within a virtual network, all connected interfaces are required to have at least the inbound QoS set (average at least), but the floor attribute does not require specifying average. However, the peak and burst attributes still require average. At the present time, ingress qdiscs may not have any classes, and therefore floor may only be applied to inbound and not outbound traffic.
To specify the QoS configuration settings, use a management tool to make the following changes to the domain XML:
...
<devices>
<interface type='network'>
<source network='default'/>
<target dev='vnet0'/>
<bandwidth>
<inbound average='1000' peak='5000' floor='200' burst='1024'/>
<outbound average='128' peak='256' burst='256'/>
</bandwidth>
</interface>
<devices>
...
Figure 20.51. Quality of service
...
<devices>
<interface type='bridge'>
<vlan>
<tag id='42'/>
</vlan>
<source bridge='ovsbr0'/>
<virtualport type='openvswitch'>
<parameters interfaceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
</virtualport>
</interface>
<devices>
...
Figure 20.52. Setting VLAN tag (on supported network types only)
If (and only if) the network connection used by the guest virtual machine supports VLAN tagging transparent to the guest virtual machine, an optional vlan element can specify one or more VLAN tags to apply to the guest virtual machine's network traffic. Open vSwitch and type='hostdev' SR-IOV interfaces support transparent VLAN tagging of guest virtual machine traffic; everything else, including standard Linux bridges and libvirt's own virtual networks, does not support it. 802.1Qbh (vn-link) and 802.1Qbg (VEPA) switches provide their own way (outside of libvirt) to tag guest virtual machine traffic onto specific VLANs. To allow for the specification of multiple tags (in the case of VLAN trunking), a sub-element, tag, specifies which VLAN tag to use (for example, <tag id='42'/>). If an interface has more than one vlan element defined, it is assumed that the user wants to do VLAN trunking using all the specified tags. In the case that VLAN trunking with a single tag is desired, the optional attribute trunk='yes' can be added to the top-level vlan element, as in the sketch below.
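A sketch of VLAN trunking with multiple tags (the tag IDs are arbitrary examples) could look like this:

<interface type='bridge'>
  <vlan trunk='yes'>
    <tag id='42'/>
    <tag id='123'/>
  </vlan>
  <source bridge='ovsbr0'/>
  <virtualport type='openvswitch'>
    <parameters interfaceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
  </virtualport>
</interface>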
...
<devices>
<interface type='network'>
<source network='default'/>
<target dev='vnet0'/>
<link state='down'/>
</interface>
<devices>
...
Figure 20.53. Modifying virtual link state
...
<devices>
<input type='mouse' bus='usb'/>
</devices>
...
Figure 20.54. Input devices
The <input> element has one mandatory attribute: type, which can be set to mouse or tablet. The latter provides absolute cursor movement, while the former uses relative movement. The optional bus attribute can be used to refine the exact device type, and can be set to xen (paravirtualized), ps2, or usb.
The input element has an optional sub-element <address>, which can tie the device to a particular PCI slot, as documented above.
...
<devices>
<hub type='usb'/>
</devices>
...
Figure 20.55. Hub devices
The hub element has one mandatory attribute, type, whose value can only be usb. The hub element has an optional sub-element address with type='usb', which can tie the device to a particular controller.
...
<devices>
<graphics type='sdl' display=':0.0'/>
<graphics type='vnc' port='5904'>
<listen type='address' address='192.0.2.1'/>
</graphics>
<graphics type='rdp' autoport='yes' multiUser='yes' />
<graphics type='desktop' fullscreen='yes'/>
<graphics type='spice'>
<listen type='network' network='rednet'/>
</graphics>
</devices>
...
Figure 20.56. Graphical Framebuffers
The graphics element has a mandatory type attribute which takes the value sdl, vnc, spice, rdp, or desktop, as explained below:
Table 20.20. Graphical framebuffer elements
Parameter
Description
sdl
vnc
spice
When SPICE has both a normal and a TLS-secured TCP port configured, it may be desirable to restrict what channels can be run on each port. This is achieved by adding one or more channel elements inside the main graphics element. Valid channel names include main, display, inputs, cursor, playback, record, smartcard, and usbredir.
To specify the SPICE configuration settings, use a management tool to make the following changes to the domain XML:
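A minimal sketch of such a configuration (the channel modes and autoport setting are illustrative assumptions):

<graphics type='spice' port='-1' tlsPort='-1' autoport='yes'>
  <!-- force the main channel onto the TLS-secured port -->
  <channel name='main' mode='secure'/>
  <channel name='record' mode='insecure'/>
</graphics>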
Parameter
Description
rdp
desktop
listen
...
<devices>
<video>
<model type='vga' vram='8192' heads='1'>
<acceleration accel3d='yes' accel2d='yes'/>
</model>
</video>
</devices>
...
Figure 20.58. Video devices
The video element is the container for describing video devices. The components of this section of the domain XML are as follows:
Table 20.22. Video device elements
Parameter
Description
video
model
acceleration
address
...
<devices>
<parallel type='pty'>
<source path='/dev/pts/2'/>
<target port='0'/>
</parallel>
<serial type='pty'>
<source path='/dev/pts/3'/>
<target port='0'/>
</serial>
<console type='pty'>
<source path='/dev/pts/4'/>
<target port='0'/>
</console>
<channel type='unix'>
<source mode='bind' path='/tmp/guestfwd'/>
<target type='guestfwd' address='10.0.2.1' port='4600'/>
</channel>
</devices>
...
Figure 20.59. Consoles, serial, parallel, and channel devices
In each of these directives, the top-level element name (parallel, serial, console, channel) describes
how the device is presented to the guest virtual machine. The guest virtual machine interface is
configured by the target element. The interface presented to the host physical machine is given in the
type attribute of the top-level element. The host physical machine interface is configured by the
source element. The source element may contain an optional seclabel to override the way that
labelling is done on the socket path. If this element is not present, the security label is inherited from
the per-domain setting. Each character device element has an optional sub-element ad d ress which
can tie the device to a particular controller or PCI slot.
...
<devices>
<parallel type='pty'>
<source path='/dev/pts/2'/>
<target port='0'/>
</parallel>
</devices>
...
Figure 20.60. Guest virtual machine interface parallel port
<target> can have a port attribute, which specifies the port number. Ports are numbered starting from 0. There are usually 0, 1, or 2 parallel ports.
To set the serial port, use a management tool to make the following change to the domain XML:
...
<devices>
<serial type='pty'>
<source path='/dev/pts/3'/>
<target port='0'/>
</serial>
</devices>
...
Figure 20.61. Guest virtual machine interface serial port
<target> can have a port attribute, which specifies the port number. Ports are numbered starting from 0. There are usually 0, 1, or 2 serial ports. There is also an optional type attribute, which has two choices for its value: isa-serial or usb-serial. If type is missing, isa-serial will be used by default. For usb-serial, an optional sub-element <address> with type='usb' can tie the device to a particular controller, documented above.
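For example, a minimal sketch (the port values are placeholders) of a usb-serial device tied to a specific controller might be:

<serial type='pty'>
  <target type='usb-serial' port='0'/>
  <!-- tie the device to USB controller 0, port 4 -->
  <address type='usb' bus='0' port='4'/>
</serial>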
The <console> element is used to represent interactive consoles. Depending on the type of guest virtual machine in use, the consoles might be paravirtualized devices, or they might be a clone of a serial device, according to the following rules:
If no targetType attribute is set, then the default device type is according to the hypervisor's rules. The default type will be added when re-querying the XML fed into libvirt. For fully virtualized guest virtual machines, the default device type will usually be a serial port.
If the targetType attribute is serial, and if no <serial> element exists, the console element will be copied to the <serial> element. If a <serial> element does already exist, the console element will be ignored.
If the targetType attribute is not serial, it will be treated normally.
Only the first <console> element may use a targetType of serial. Secondary consoles must all be paravirtualized.
On s390, the console element may use a targetType of sclp or sclplm (line mode). SCLP is the native console type for s390. There is no controller associated with SCLP consoles.
In the example below, a virtio console device is exposed in the guest virtual machine as /dev/hvc[0-7]
(for more information, see http://fedoraproject.org/wiki/Features/VirtioSerial):
...
<devices>
<console type='pty'>
<source path='/dev/pts/4'/>
<target port='0'/>
</console>
<!-- KVM virtio console -->
<console type='pty'>
<source path='/dev/pts/5'/>
<target type='virtio' port='0'/>
</console>
</devices>
...
20.16.16. Channel
This represents a private communication channel between the host physical machine and the guest virtual machine, and is manipulated by making changes to your guest virtual machine using a management tool that results in changes made to the following section of the domain XML:
...
<devices>
<channel type='unix'>
<source mode='bind' path='/tmp/guestfwd'/>
<target type='guestfwd' address='10.0.2.1' port='4600'/>
</channel>
<!-- KVM virtio channel -->
<channel type='pty'>
<target type='virtio' name='arbitrary.virtio.serial.port.name'/>
</channel>
<channel type='unix'>
<source mode='bind' path='/var/lib/libvirt/qemu/f16x86_64.agent'/>
<target type='virtio' name='org.qemu.guest_agent.0'/>
</channel>
<channel type='spicevmc'>
<target type='virtio' name='com.redhat.spice.0'/>
</channel>
</devices>
...
Fig u re 20.6 3. C h an n el
This can be implemented in a variety of ways. The specific type of <channel > is given in the type
attribute of the <targ et> element. D ifferent channel types have different target attributes as follows:
guestfwd - Dictates that TCP traffic sent by the guest virtual machine to a given IP address and
port is forwarded to the channel device on the host physical machine. The target element must
have address and port attributes.
virtio - Paravirtualized virtio channel. <channel> is exposed in the guest virtual machine
under /dev/vport*, and if the optional element name is specified, /dev/virtio-ports/$name
(for more information, see http://fedoraproject.org/wiki/Features/VirtioSerial). The optional element
address can tie the channel to a particular type='virtio-serial' controller, documented
above. With QEMU, if name is "org.qemu.guest_agent.0", then libvirt can interact with a guest
agent installed in the guest virtual machine, for actions such as guest virtual machine shutdown
or file system quiescing (see the example after this list).
spicevmc - Paravirtualized SPICE channel. The domain must also have a SPICE server as a
graphics device, at which point the host physical machine piggy-backs messages across the
main channel. The target element must be present, with attribute type='virtio'; an optional
attribute name controls how the guest virtual machine will have access to the channel, and
defaults to name='com.redhat.spice.0'. The optional <address> element can tie the
channel to a particular type='virtio-serial' controller.
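For instance, once the org.qemu.guest_agent.0 channel shown above is configured and the agent is running in the guest, the host can request a clean shutdown through the agent (a sketch; guest1 is a hypothetical domain name):
# virsh shutdown --mode=agent guest1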
Character devices can also take the form of a domain logfile, device logfile, virtual console, null
device, pseudo TTY (a special case), named pipe, TCP client/server (either a TCP client, or a TCP
server waiting for a client connection), or UDP network console; each is configured with the same
parameter, description, and XML snippet pattern. For example, a UDP network console:
<devices>
<serial type="udp">
<source mode="bind" host="0.0.0.0" service="2445"/>
<source mode="connect" host="0.0.0.0" service="2445"/>
<target port="1"/>
</serial>
</devices>
...
<devices>
<sound model='es1370'/>
</devices>
...
Figure 20.64. Virtual sound card
The sound element has one mandatory attribute, model, which specifies what real sound device is
emulated. Valid values are specific to the underlying hypervisor, though typical choices are
'es1370', 'sb16', 'ac97', and 'ich6'. In addition, a sound element with the ich6 model can have
an optional sub-element <codec> to attach various audio codecs to the audio device. If not specified, a
default codec is attached to allow playback and recording. Valid values are 'duplex'
(advertises a line-in and a line-out) and 'micro' (advertises a speaker and a microphone).
...
<devices>
<sound model='ich6'>
<codec type='micro'/>
</sound>
</devices>
...
Figure 20.65. Sound devices
Each sound element has an optional sub-element <address> which can tie the device to a
particular PCI slot, documented above.
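A minimal sketch (the slot value is hypothetical) of tying a sound device to a specific PCI slot:
...
<devices>
<sound model='ich6'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</sound>
</devices>
...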
...
<devices>
<watchdog model='i6300esb'/>
</devices>
...
...
<devices>
<watchdog model='i6300esb' action='poweroff'/>
</devices>
</domain>
Figure 20.66. Watchdog Device
The following attributes are declared in this XML:
model - The required model attribute specifies what real watchdog device is emulated. Valid
values are specific to the underlying hypervisor.
The model attribute may take the following values:
i6300esb - the recommended device, emulating a PCI Intel 6300ESB
ib700 - emulates an ISA iBase IB700
...
<devices>
<memballoon model='virtio'/>
</devices>
...
Figure 20.67. Memory balloon device
Here is an example where the device is added manually, with static PCI slot 2 requested:
...
<devices>
<memballoon model='virtio'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</memballoon>
</devices>
</domain>
Figure 20.68. Memory balloon device added manually
The required model attribute specifies what type of balloon device is provided. Valid values are
specific to the virtualization platform: 'virtio', which is the default setting with the KVM
hypervisor, or 'xen', which is the default setting with the Xen hypervisor.
<domain type='qemu'>
<name>QEmu-fedora-i686</name>
<uuid>c7a5fdbd-cdaf-9455-926a-d65c16db1809</uuid>
<memory>219200</memory>
<currentMemory>219200</currentMemory>
<vcpu>2</vcpu>
<os>
<type arch='i686' machine='pc'>hvm</type>
<boot dev='cdrom'/>
</os>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<disk type='file' device='cdrom'>
<source file='/home/user/boot.iso'/>
<target dev='hdc'/>
<readonly/>
</disk>
<disk type='file' device='disk'>
<source file='/home/user/fedora.img'/>
<target dev='hda'/>
</disk>
<interface type='network'>
<source network='default'/>
</interface>
<graphics type='vnc' port='-1'/>
</devices>
</domain>
Figure 20.70. Example domain XML config
KVM hardware accelerated guest virtual machine on i686
<domain type='kvm'>
<name>demo2</name>
<uuid>4dea24b3-1d52-d8f3-2516-782e98a23fa0</uuid>
<memory>131072</memory>
<vcpu>1</vcpu>
<os>
<type arch="i686">hvm</type>
</os>
<clock sync="localtime"/>
<devices>
<emulator>/usr/bin/qemu-kvm</emulator>
<disk type='file' device='disk'>
<source file='/var/lib/libvirt/images/demo2.img'/>
<target dev='hda'/>
</disk>
<interface type='network'>
<source network='default'/>
<mac address='24:42:53:21:52:45'/>
</interface>
<graphics type='vnc' port='-1' keymap='de'/>
</devices>
</domain>
Figure 20.71. Example domain XML config
# brctl show
bridge-id               STP enabled     interfaces
-----------------------------------------------------------------------------
virtbr0                 8000.feffffff   yes             eth0

# brctl showmacs virtbr0
port-no         mac-addr                local?          aging timer
1               fe:ff:ff:ff:ff:         yes             0.00
2               fe:ff:ff:fe:ff:         yes             0.00

# brctl showstp virtbr0
virtbr0
bridge-id               8000.fefffffffff
designated-root         8000.fefffffffff
root-port               0                       path-cost               0
max-age                 20.00                   bridge-max-age          20.00
hello-time              2.00                    bridge-hello-time       2.00
forward-delay           0.00                    bridge-forward-delay    0.00
aging-time              300.01
hello-timer             1.43                    tcn-timer               0.00
topology-change-timer   0.00                    gc-timer                0.02
Listed below are some other useful commands for troubleshooting virtualization.
strace is a command which traces system calls and events received and used by another
process.
vncviewer: connect to a VNC server running on your server or a virtual machine. Install
vncviewer using the yum install vnc command.
vncserver: start a remote desktop on your server. Gives you the ability to run graphical user
interfaces such as virt-manager via a remote session. Install vncserver using the yum install
vnc-server command.
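For example, to connect to VNC display 1 on a host (a sketch; the host name and display number are assumptions):
# vncviewer example.com:1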
If you used virsh pool-define to create virtual storage, the image file can be found in the
location the user specified for virsh pool-define. For instructions on how to back up the guest
image files, use the steps described in Procedure 21.1, Creating a backup of the guest virtual
machine's disk image for disaster recovery purposes.
If you are using bridges, you will also need to back up the files located in
/etc/sysconfig/network-scripts/ifcfg-<bridge_name>
Optionally, the guest virtual machine core dump files found in /var/lib/libvirt/qemu/dump
can also be backed up to be used for analyzing the causes of the failure. Note, however, that
these files can be very large for some systems.
Procedure 21.1. Creating a backup of the guest virtual machine's disk image for disaster recovery purposes
This procedure will cover how to back up several different disk image types.
1. To back up only the guest virtual machine disk image, back up the files located in
/var/lib/libvirt/images. To back up guest virtual machine disk images with LVM
logical volumes, run the following command:
# lvcreate --snapshot --name snap --size 8G /dev/vg0/data
This command creates a snapshot volume named snap with a size of 8G as part of a 64G
volume.
2. Create a directory to use as a mount point for the snapshot, using a command similar to this one:
# mkdir /mnt/virt.snapshot
3. Mount the snapshot volume on the directory you created, using the following command:
# mount /dev/vg0/snap /mnt/virt.snapshot
4. Use one of the following commands to back up the volume:
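a. For example (a sketch; the /backup destination is hypothetical), archive the mounted snapshot with tar:
# tar -czvf /backup/virt.snapshot.tar.gz -C /mnt/virt.snapshot .
b. Alternatively, copy the mounted snapshot to a backup location with rsync:
# rsync -a /mnt/virt.snapshot/ /backup/virt.snapshot/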
While the domain (guest virtual machine domain name) and corefilepath (location of the newly
created core dump file) are mandatory, the following arguments are optional:
--live creates a dump file on a running machine and does not pause it.
--crash stops the guest virtual machine and generates the dump file. The main difference is that
the guest virtual machine will not be listed as Stopped, with the reason as Crashed. Note that in
virt-manager the status will be listed as Paused.
--reset will reset the guest virtual machine following a successful dump. Note that these three
switches (--live, --crash, and --reset) are mutually exclusive.
--bypass-cache uses O_DIRECT to bypass the file system cache.
--memory-only the dump file will be saved as an ELF file, and will only include the domain's
memory and CPU common register values. This option is very useful if the domain uses host
devices directly.
--verbose displays the progress of the dump.
The entire dump process may be monitored using the virsh domjobinfo command and can be
canceled by running virsh domjobabort.
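For example, a minimal invocation might look like the following (a sketch; the guest1 domain name and the output path are hypothetical):
# virsh dump guest1 /var/tmp/guest1.core --memory-only --verbose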
21.4. kvm_stat
The kvm_stat command is a python script which retrieves runtime statistics from the kvm kernel
module. The kvm_stat command can be used to diagnose guest behavior visible to kvm, in
particular performance-related issues with guests. Currently, the reported statistics are for the entire
system; the behavior of all running guests is reported. To run this script you need to install the
qemu-kvm-tools package.
The kvm_stat command requires that the kvm kernel module is loaded and debugfs is mounted. If
either of these features is not enabled, the command will output the required steps to enable
debugfs or the kvm module. For example:
# kvm_stat
Please mount debugfs ('mount -t debugfs debugfs /sys/kernel/debug')
and ensure the kvm modules are loaded
Mount debugfs if required:
# mount -t debugfs debugfs /sys/kernel/debug
kvm_stat Output
The kvm_stat command outputs statistics for all guests and the host. The output is updated until the
command is terminated (using Ctrl+c or the q key).
# kvm_stat
kvm statistics

efer_reload                   94        0
exits                    4003074    31272
fpu_reload               1313881    10796
halt_exits                 14050      259
halt_wakeup                 4496      203
host_state_reload        1638354    24893
hypercalls                     0        0
insn_emulation           1093850     1909
insn_emulation_fail            0        0
invlpg                     75569        0
io_exits                 1596984    24509
irq_exits                  21013      363
irq_injections             48039     1222
irq_window                 24656      870
largepages                     0        0
mmio_exits                 11873        0
mmu_cache_miss             42565        8
mmu_flooded                14752        0
mmu_pde_zapped             58730        0
mmu_pte_updated                6        0
mmu_pte_write             138795        0
mmu_recycled                   0        0
mmu_shadow_zapped          40358        0
mmu_unsync                   793        0
nmi_injections                 0        0
nmi_window                     0        0
pf_fixed                  697731     3150
pf_guest                  279349        0
remote_tlb_flush               5        0
request_irq                    0        0
signal_exits                   1        0
tlb_flush                 200190        0
Explanation of variables:
efer_reload
The number of Extended Feature Enable Register (EFER) reloads.
exits
The count of all VMEXIT calls.
fpu_reload
The number of times a VMENTRY reloaded the FPU state. The fpu_reload is incremented
when a guest is using the Floating Point Unit (FPU).
halt_exits
Number of guest exits due to halt calls. This type of exit is usually seen when a guest is
idle.
halt_wakeup
Number of wakeups from a halt.
host_state_reload
Count of full reloads of the host state (currently tallies MSR setup and guest MSR reads).
hypercalls
Number of guest hypervisor service calls.
mmu_unsync
Number of non-synchronized pages which are not yet unlinked.
nmi_injections
Number of Non-maskable Interrupt (NMI) injections to the guest.
nmi_window
Number of guest exits from (outstanding) Non-maskable Interrupt (NMI) windows.
pf_fixed
Number of fixed (non-paging) page table entry (PTE) maps.
pf_guest
Number of page faults injected into guests.
remote_tlb_flush
Number of remote (sibling CPU) Translation Lookaside Buffer (TLB) flush requests.
request_irq
Number of guest interrupt window request exits.
signal_exits
Number of guest exits due to pending signals from the host.
tlb_flush
Number of tlb_flush operations performed by the hypervisor.
Note
The output information from the kvm_stat command is exported by the KVM hypervisor as
pseudo files located in the /sys/kernel/debug/kvm/ directory.
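As a quick check, an individual statistic can be read directly from one of these pseudo files (a sketch; exits is one of the per-statistic files listed in the output above):
# cat /sys/kernel/debug/kvm/exits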
<channel type='unix'>
<source mode='bind'/>
<target type='virtio' name='org.qemu.guest_agent.0'/>
</channel>
Figure 21.1. Configuring the guest agent channel
3. Start the guest virtual machine by running virsh start [domain].
4. Install qemu-guest-agent on the guest virtual machine (yum install qemu-guest-agent)
and make it run automatically at every boot as a service (qemu-guest-agent.service), as
sketched below. Refer to Chapter 10, QEMU-img and QEMU Guest Agent for more information.
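On a Red Hat Enterprise Linux 6 guest this might look like the following (a sketch, assuming SysV init tooling; the service name may differ on some systems):
# chkconfig qemu-guest-agent on
# service qemu-guest-agent start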
You can also use virt-manager to display the virtual text console. In the guest console window,
select Serial 1 in Text Consoles from the View menu.
The virtualization extensions cannot be disabled in the BIOS for AMD-V.
Refer to the following section for instructions on enabling disabled virtualization extensions.
Verify that the virtualization extensions are enabled in BIOS. The BIOS settings for Intel VT or AMD-V are
usually in the Chipset or Processor menus. The menu names may vary from this guide; the
virtualization extension settings may be found in Security Settings or other non-standard menus.
Procedure 21.3. Enabling virtualization extensions in BIOS
1. Reboot the computer and open the system's BIOS menu. This can usually be done by
pressing the delete key, the F1 key, or Alt and F4 keys, depending on the system.
2. Enabling the virtualization extensions in BIOS
Note
Many of the steps below may vary depending on your motherboard, processor type,
chipset and OEM. Refer to your system's accompanying documentation for the correct
information on configuring your system.
a. Open the Processor submenu. The processor settings menu may be hidden in the
Chipset, Advanced CPU Configuration or Northbridge menus.
b. Enable Intel Virtualization Technology (also known as Intel VT-x). AMD-V
extensions cannot be disabled in the BIOS and should already be enabled. The
virtualization extensions may be labeled Virtualization Extensions,
Vanderpool or various other names depending on the OEM and system BIOS.
c. Enable Intel VT-d or AMD IOMMU, if the options are available. Intel VT-d and AMD
IOMMU are used for PCI device assignment.
d. Select Save & Exit.
3. Reboot the machine.
4. When the machine has booted, run cat /proc/cpuinfo | grep -E "vmx|svm".
Specifying --color is optional, but useful if you want the search term highlighted. If the
command produces output, the virtualization extensions are now enabled. If there is no output, your
system may not have the virtualization extensions or the correct BIOS setting enabled.
Note
Note that the virtualized Intel PRO/1000 (e1000) driver is also supported as an emulated
driver choice. To use the e1000 driver, replace virtio in the procedure below with e1000.
For the best performance it is recommended to use the virtio driver.
# cp /tmp/guest-template.xml /tmp/new-guest.xml
# vi /tmp/new-guest.xml
Add the model line in the network interface section:
<interface type='network'>
[output truncated]
<model type='virtio' />
</interface>
3. Create the new virtual machine:
# virsh define /tmp/new-guest.xml
# virsh start new-guest
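Optionally, confirm that the new guest picked up the virtio model (a sketch, reusing the hypothetical new-guest name from above):
# virsh dumpxml new-guest | grep 'model type'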
21.12. Workaround for Creating External Snapshots with libvirt
There are two classes of snapshots for QEMU guests. Internal snapshots are contained completely
within a qcow2 file, and are fully supported by libvirt, allowing for creating, deleting, and reverting of
snapshots. This is the default used by libvirt when creating a snapshot, particularly when no
option is specified. Although this snapshot type takes a bit longer than others to create, it requires
qcow2 disks. Another drawback of this type is that internal snapshots are not receiving active
improvements from QEMU.
External snapshots, on the other hand, work with any type of original disk image, can be taken with
no guest downtime, and are able to receive active improvements from QEMU. In libvirt, they are created
when using the --disk-only option to snapshot-create-as (or when specifying an explicit XML
file to snapshot-create that does the same). At the moment external snapshots are a one-way
operation, as libvirt can create them but cannot do anything further with them.
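For example, taking an external, disk-only snapshot of a running guest might look like the following (a sketch; the guest1 domain and snap1 snapshot names are hypothetical):
# virsh snapshot-create-as guest1 snap1 --disk-only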
Run the lsmod | grep kvm command. If the output includes kvm_intel or kvm_amd, then the kvm
hardware virtualization modules are loaded and your system meets the requirements.
Note
If the libvirt package is installed, the virsh command can output a full list of virtualization
system capabilities. Run virsh capabilities as root to receive the complete list.
Revision History

Tue Aug 9 2016                                  Jiri Herrmann

Revision 1-501          Mon May 02 2016         Jiri Herrmann
Updates for the 6.8 GA release

Revision 1-500          Thu Mar 01 2016         Jiri Herrmann
Multiple updates for the 6.8 beta release

Revision 1-449          Thu Oct 08 2015         Jiri Herrmann
Cleaned up the Revision History

Revision 1-447          Fri Jul 10 2015         Dayle Parker
Updates for the 6.7 GA release.