VDF: Targeted Evolutionary Fuzz Testing of Virtual Devices
1 Introduction
As cloud computing becomes more prevalent, the usage of virtualized guest sys-
tems for rapid and scalable deployment of computing resources is increasing.
Major cloud service providers, such as Amazon Web Services (AWS), Microsoft
Azure, and IBM SoftLayer, continue to grow as demand for cloud computing
resources increases. Amazon, the current market leader in cloud computing,
reported that AWS's net sales exceeded 7.88 billion USD in 2015 [2], which
demonstrates a strong market need for virtualization technology.
This popularity has led to an increased interest in mitigating attacks that tar-
get hypervisors from within the virtualized guest environments that they host.
This document has been approved for public release: 88ABW-2016-3973.
Electronic supplementary material The online version of this chapter (doi:10.1007/978-3-319-66332-6_1) contains supplementary material, which is available to authorized users.
© Springer International Publishing AG 2017
M. Dacier et al. (Eds.): RAID 2017, LNCS 10453, pp. 1–23, 2017.
DOI: 10.1007/978-3-319-66332-6_1
Unfortunately, hypervisors are complex pieces of software that are difficult to test
under every possible set of guest runtime conditions. Virtual hardware devices
used by guests, which are hardware peripherals emulated in software (rather than
directly mapping to physical devices on the host system), are particularly com-
plex and a source of numerous bugs [36]. This has led to the ongoing discovery
of vulnerabilities that exploit these virtual devices to access the host.
Because virtual devices are so closely associated with the hypervisor, if not integrated directly into it, they execute at a higher level of privilege than any code executing within the guest environment. They are not part of the guest environment, per se, but they are privileged subsystems that the guest environment directly interacts with. Under no circumstances should activity originating from within the guest be able to attack and compromise the hypervisor, so effectively identifying potential vulnerabilities in these virtual devices is a difficult, but valuable, problem to consider. However, these virtual devices are written by a number of different authors, and the most complex virtual devices are implemented using thousands of lines of code. Therefore, it is desirable to discover an effective and efficient method to test these devices in a scalable and automated fashion without requiring expert knowledge of each virtual device's state machine and internal details.
Such issues have led to a strong interest in effectively testing virtual device code [9,28] to discover bugs or other behaviors that may lead to vulnerabilities. However, this is a non-trivial task, as virtual devices are often tightly coupled to the hypervisor codebase and may need to pass through a number of device initialization states (i.e., BIOS and guest kernel initialization of the device) before representing the device's state within a running guest system.
Evolutionary fuzzing techniques (e.g., AFL [38]) have recently gained popularity for their effectiveness in discovering crashes and hangs. They are widely used in industry, and most finalists in the DARPA Cyber Grand Challenge used them for vulnerability discovery. Several academic research papers soon appeared to further improve the effectiveness of evolutionary fuzzing, such as AFLFast [21], VUzzer [33], Driller [35], and DeepFuzz [22]. While these efforts greatly improve the state of the art, they aim at finding defects within an entire user-level program and cannot be directly applied to find bugs in virtual devices, for several reasons. First of all, the fuzz testing must be targeted at specific virtual device code, which is a rather small portion of the entire hypervisor code base. It must be in-situ as well, as virtual devices frequently interact with the rest of the hypervisor code. Last but not least, it must be stateful, since virtual devices need to be properly initialized and reach certain states to trigger defects.
To address these unique challenges, we propose Virtual Device Fuzzer (VDF),
a novel fuzz testing framework that provides targeted fuzz testing of interesting
subsystems (virtual devices) within complex programs. VDF enables the testing
of virtual devices within the context of a running hypervisor. It utilizes record
and replay of virtual device memory-mapped I/O (MMIO) activity to create
fuzz testing seed inputs that are guaranteed to reach states of interest and ini-
tialize each virtual device to a known good state from which to begin each test.
Providing proper seed test cases to the fuzzer is important for effectively exploring
the branches of a program [25,34], as a good starting seed will focus the fuzzer's efforts on areas of interest within the program. VDF mutates these seed inputs to generate and replay fuzzed MMIO activity to exercise additional branches of interest.
As a proof of concept, we utilize VDF to test a representative set of eighteen
virtual devices implemented within the QEMU whole-system emulator [19], a
popular type-2 hypervisor that uses a virtualized device model. Whether QEMU
completely emulates the guest CPU or uses another hypervisor, such as KVM [10]
or Xen [18], to execute guest CPU instructions, hardware devices made available
to the guest are software-based devices implemented within QEMU.
In summary, this paper makes the following contributions:
- We propose and develop a targeted, in-situ fuzz testing framework for virtual devices.
- We evaluate VDF by testing eighteen QEMU virtual devices, executing over 2.28 billion test cases in several parallel VDF instances within a cloud environment. This testing discovered a total of 348 crashes and 666 hangs within six of the tested virtual devices. Bug reports and CVEs have been reported to the QEMU maintainers where applicable.
- We devise a test case minimization algorithm to reduce each crash/hang test case to a minimal test case that still reproduces the same bug. The average test case is reduced to only 18.57% of its original size, greatly simplifying the analysis of discovered bugs and revealing duplicate test cases that reproduce the same bug. We also automatically generate source code suitable for reproducing the activity of each test case to aid in the analysis of each discovered bug.
- We analyze the discovered bugs and organize them into four categories: excess host resource usage, invalid data transfers, debugging asserts, and multi-threaded race conditions.
2 Background
Within QEMU, virtual device code registers callback functions with QEMU's virtual memory management unit (MMU). These callback functions expose virtual device functionality to the guest environment and are called when specific memory addresses within the guest memory space are read or written. QEMU uses this mechanism to implement memory-mapped I/O (MMIO), mimicking the MMIO mechanism of physical hardware.
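The dispatch mechanism described above can be sketched as follows. This is a simplified, illustrative model of how an MMU routes guest reads/writes in a mapped range to device callbacks; the class and function names are hypothetical and do not reflect QEMU's actual MemoryRegion API.

```python
# Simplified model of MMIO dispatch: the hypervisor keeps a table of
# (base, size) -> handler mappings. Guest accesses that land in one of
# those ranges invoke virtual-device callbacks instead of touching RAM.
class MMIORegion:
    def __init__(self, base, size, read_cb, write_cb):
        self.base, self.size = base, size
        self.read_cb, self.write_cb = read_cb, write_cb

class MMUModel:
    def __init__(self):
        self.regions = []

    def register(self, region):
        self.regions.append(region)

    def _find(self, addr):
        for r in self.regions:
            if r.base <= addr < r.base + r.size:
                return r
        return None

    def write(self, addr, value, size=4):
        r = self._find(addr)
        if r:                            # MMIO hit: call into the device
            r.write_cb(addr - r.base, value, size)

    def read(self, addr, size=4):
        r = self._find(addr)
        if r:
            return r.read_cb(addr - r.base, size)
        return 0xFFFFFFFF                # unmapped reads float high

# Toy "device" with one register bank at a hypothetical base address.
regs = {0: 0}
dev = MMIORegion(0xE0000000, 0x100,
                 read_cb=lambda off, size: regs.get(off, 0),
                 write_cb=lambda off, val, size: regs.__setitem__(off, val))
mmu = MMUModel()
mmu.register(dev)
mmu.write(0xE0000000, 0x1234)
assert mmu.read(0xE0000000) == 0x1234
```

A real hypervisor performs this lookup on the hot path of every guest memory access, which is why fuzzed MMIO traffic exercises the device code directly.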
We have identified a model for guest activity that attempts to attack these virtual devices:
1. The virtual device is correctly instantiated by the hypervisor and made available to the guest environment.
2. The virtual device is correctly initialized via the guest's BIOS and OS kernel and is brought to a stable state during the guest boot process. Any needed guest kernel device drivers have been loaded and initialized.
3. Once the guest boots, the attacker acquires privileged access within the guest and attempts to attack the virtual devices via memory reads/writes to the MMIO address(es) belonging to these virtual devices.¹

¹ QEMU provides the qtest framework to perform arbitrary read/write activity without the guest. We discuss qtest, and its limitations when fuzz testing, in Sect. 3.
Fig. 1. Device access process for a device request originating from inside of a
QEMU/KVM guest. Note that the highest level of privilege in the guest (ring 0)
is still lower than that of the QEMU process (ring 3).
Fig. 2. The x86 address space layout for port- and memory-mapped I/O.
needed by each PCI device are reported to the BIOS. The BIOS then determines a memory mapping for each register bank that satisfies the MMIO needs of all PCI devices without any overlap. Finally, the BIOS instructs the PCI bus to map specific base addresses to each device's register banks using the PCI base address registers (BARs) of each device.
However, PCI makes the task of virtual device testing more difficult. By default, the BARs for each device contain invalid addresses. Until the BARs are initialized by the BIOS, PCI devices are unusable. The PCI host controller provides two 32-bit registers in the ISA MMIO/PMIO address space for configuring each PCI device BAR². Until the proper read/write sequence is made to these two registers, PCI devices remain unconfigured and inaccessible to the guest environment. Therefore, configuring a virtual PCI-based device involves initializing both the state of the PCI bus and the virtual device.
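The two host-controller registers mentioned above correspond, in the legacy PCI configuration mechanism, to the config-address and config-data ports (0xCF8/0xCFC in the PCI Local Bus Specification; the paper itself does not name the ports). A sketch of how software encodes the target of a configuration access, such as selecting BAR0 of a device:

```python
# Legacy PCI configuration mechanism: software writes a "config address"
# dword to one 32-bit register (port 0xCF8), then reads/writes the
# selected config register through the other (port 0xCFC).
CONFIG_ADDRESS_PORT = 0xCF8
CONFIG_DATA_PORT = 0xCFC

def pci_config_address(bus, device, function, offset):
    """Encode bus/device/function/offset per the PCI local bus spec."""
    assert bus < 256 and device < 32 and function < 8 and offset < 256
    return (0x80000000          # enable bit
            | (bus << 16)
            | (device << 11)
            | (function << 8)
            | (offset & 0xFC))  # dword-aligned register offset

# Selecting BAR0 (config offset 0x10) of bus 0, device 3, function 0:
addr = pci_config_address(0, 3, 0, 0x10)
assert addr == 0x80001810
```

This is exactly the read/write sequence VDF's init set must replay before a fuzzed PCI device becomes reachable at all.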
³ VDF still uses a two-byte branch ID, allowing for 65536 unique branches to be instrumented. In practice, this is more than adequate for virtual device testing.
Fig. 3. VDF's process for performing fuzz testing of QEMU virtual devices.
with one required master instance and one or more optional slave instances. The primary difference between master and slave instances is that the master uses a series of sophisticated mutation strategies (bit/byte swapping, setting bytes to specific values like 0x00 and 0xFF, etc.) to explore the program under test. Slave instances only perform random bit flips throughout the seed data.
Once the seed input has been mutated into a new test case, a new QEMU instance is spawned by AFL. VDF replays the test case in the new QEMU instance and observes whether the mutated data has caused QEMU to crash or hang. VDF does not blindly replay events, but rather performs strict filtering on the mutated seed input during replay. The filter discards malformed events, events describing a read/write outside the range of the current register bank, events referencing an invalid register bank, etc. This prevents mutated data from potentially exercising memory locations unrelated to the virtual device under test. If a test case causes a crash or hang, the test case is logged to disk.
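The filtering step can be sketched as below. The field names and limits are illustrative assumptions, not VDF's actual record format; the point is that every mutated record is validated against the device's known register banks before replay.

```python
# Replay-time filter: mutated seed data is parsed into event records,
# and any record whose fields fall outside the device's valid ranges
# is discarded before it reaches the virtual device.
from collections import namedtuple

Event = namedtuple("Event", "is_write size bank offset data")

VALID_SIZES = {1, 2, 4}

def filter_events(events, bank_sizes):
    """Keep only events that address a real register bank, in range."""
    kept = []
    for ev in events:
        if ev.size not in VALID_SIZES:
            continue                         # malformed access size
        if ev.bank not in bank_sizes:
            continue                         # nonexistent register bank
        if ev.offset + ev.size > bank_sizes[ev.bank]:
            continue                         # access past end of bank
        kept.append(ev)
    return kept

banks = {0: 0x100}                           # one 256-byte register bank
events = [
    Event(True, 4, 0, 0x00, 0x1),            # valid -> kept
    Event(True, 3, 0, 0x00, 0x2),            # bad size -> dropped
    Event(False, 4, 7, 0x00, 0),             # bad bank -> dropped
    Event(True, 4, 0, 0xFE, 0x3),            # overruns bank -> dropped
]
assert len(filter_events(events, banks)) == 1
```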
Finally, in the third step, each of the collected crash and hang test cases is
reduced to a minimal test case capable of reproducing the bug. Both a minimized
test case and source code to reproduce the bug are generated. The minimization
of test cases is described further in Sect. 3.5.
⁴ If only a minimal amount of recorded activity is required, VDF can capture initialization activity via executing a QEMU qtest test case.
Second, the recorded startup activity is partitioned into two sets: an init set
and a seed set. The init set contains any seed input required to initialize the
device for testing, such as PCI BAR setup, and the activity in this set will never
be mutated by the fuzzer. VDF plays back the init set at the start of each test
to return the device to a known, repeatable state. The seed set contains the seed
input that will be mutated by the fuzzer. It can be any read/write sequence
that exercises the device, and it usually originates from user space activity that
exercises the device (playing an audio le, pinging an IP address, etc.).
Even with no guest OS booted or present, a replay of these two sets returns
the virtual device to the same state that it was in immediately after the reg-
ister activity was originally recorded. While the data in the sets could include
timestamps to ensure that the replay occurs at the correct time intervals, VDF
does not do this. Instead, VDF takes the simpler approach of advancing the virtual clock one microsecond for each read or write performed. The difficulty with including timestamps within the seed input is that the value of the timestamp is too easily mutated into very long virtual delays between events. While it is true that some virtual device branches may only be reachable when a larger virtual time interval has passed (such as interrupts that are raised when a device has completed performing some physical event), our observation is that performing a fixed increment of virtual time on each read or write is a reasonable approach.
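The replay scheme above can be sketched in a few lines: the init set is always replayed unmutated to bring the device to a known state, the (possibly mutated) seed set is replayed on top of it, and the virtual clock advances a fixed microsecond per event. The device class here is a stand-in for illustration, not a real QEMU virtual device.

```python
# Minimal model of VDF's two-set replay with fixed virtual-clock steps.
class ToyDevice:
    def __init__(self):
        self.log = []
        self.clock_us = 0
    def reset(self):
        self.log, self.clock_us = [], 0
    def apply(self, ev):
        self.log.append(ev)

def replay(device, init_set, seed_set):
    device.reset()
    for ev in init_set + seed_set:   # init set first, never mutated
        device.apply(ev)
        device.clock_us += 1         # fixed 1 us virtual-clock step

dev = ToyDevice()
replay(dev, init_set=["pci_bar0_setup", "device_enable"],
       seed_set=["write reg 0x10"])
assert dev.log == ["pci_bar0_setup", "device_enable", "write reg 0x10"]
assert dev.clock_us == 3
```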
Event Record Format. VDF event records contain three fields: a header field, base offset field, and data written field. This format captures all data needed to replay an MMIO event and represents this information in a compact format requiring only 3–8 bytes per event. The compactness of each record is an important factor because using a smaller record size decreases the number of bits that can potentially be mutated.
The header is a single byte that captures whether the event is a read or write event, the size of the event (1, 2, or 4 bytes), and which virtual device register bank the event takes place in. The base offset field is one to three bytes in size and holds the offset from the base address. The size of this field will vary from device to device, as some devices have small register bank ranges (requiring only one byte to represent an offset into the register bank) and other devices map much larger register banks and device RAM address ranges (requiring two or three bytes to specify an offset). The data field is one or four bytes in size and holds the data written to a memory location when the header field specifies a write operation. Some devices, such as the floppy disk controller and the serial port, only accept single-byte writes. Most devices accept writes of 1, 2, or 4 bytes, requiring a 4-byte field for those devices to represent the data. For read operations, the data field is ignored.
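A hedged reconstruction of this 3–8 byte record: one header byte (direction, access size, register bank), a device-specific 1–3 byte offset, and 1 or 4 data bytes. The paper specifies only the field sizes; the exact bit layout of the header below is an assumption for illustration.

```python
# Encode/decode an MMIO event record with an assumed header bit layout:
# bit 7 = write flag, bits 5-6 = size code, bits 0-4 = register bank.
SIZE_CODES = {1: 0, 2: 1, 4: 2}

def encode_event(is_write, size, bank, offset, data,
                 offset_bytes=2, data_bytes=4):
    header = (int(is_write) << 7) | (SIZE_CODES[size] << 5) | (bank & 0x1F)
    rec = bytes([header])
    rec += offset.to_bytes(offset_bytes, "little")
    rec += (data if is_write else 0).to_bytes(data_bytes, "little")
    return rec

def decode_event(rec, offset_bytes=2, data_bytes=4):
    header = rec[0]
    is_write = bool(header >> 7)
    size = [1, 2, 4][(header >> 5) & 0x3]
    bank = header & 0x1F
    offset = int.from_bytes(rec[1:1 + offset_bytes], "little")
    data = int.from_bytes(
        rec[1 + offset_bytes:1 + offset_bytes + data_bytes], "little")
    return is_write, size, bank, offset, data

rec = encode_event(True, 4, 0, 0x80, 0xDEADBEEF)
assert len(rec) == 7                  # 1 header + 2 offset + 4 data
assert decode_event(rec) == (True, 4, 0, 0x80, 0xDEADBEEF)
```

Keeping the record this dense matters for fuzzing: with fewer bits per record, a larger fraction of mutations lands on semantically meaningful fields.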
While VDF's record and replay of MMIO activity captures the interaction of the guest environment with virtual devices, some devices may make use of interrupts and DMA. However, we argue that such hardware events are not necessary to recreate the behavior of most devices for fuzz testing. Interrupts are typically produced by a virtual device, rather than consumed, to alert the guest environment that some hardware event has completed. Typically, another
read or write event would be initiated by the guest in reaction to an interrupt, but since we record all this read/write activity, the guest's response to the interrupt is captured without explicitly capturing the interrupt.
DMA events copy data between guest and device RAM. DMA copies typically occur when buffers of data must be copied and the CPU isn't needed to copy this data byte-by-byte. Our observation is that if we are only copying data to be processed, it is not actually necessary to place legitimate data at the correct location within guest RAM and then copy it into the virtual device. It is enough to say that the data has been copied and then move on to the next event. While the size and alignment of the data may have some impact on the behavior of the virtual device, such details are outside the scope of this paper.
IPC protocol. Commands are sent between the test driver and QEMU as plaintext messages, requiring time to parse each string. While this is not a concern for the virtual clock of QEMU, wall-clock-related issues (such as thread race conditions) are less likely to be exposed.
Second, qtest does not provide control over QEMU beyond spawning the new
QEMU instance and sending control messages. It is unable to determine exactly
where a hung QEMU process has become stuck. A hung QEMU also hangs the
qtest test driver process, as the test driver will continue to wait for input from the
non-responsive QEMU. If QEMU crashes, qtest will respond with the feedback
that the test failed. Reproducing the test which triggers the crash may repeat
the crash, but the analyst still has to attach a debugger to the spawned QEMU
instance prior to the crash to understand the crash.
VDF seeks to automate the discovery of any combination of virtual device
MMIO activity that triggers a hang or crash in either the virtual device or some
portion of the hypervisor. qtest excels at running known-good, hard-coded tests
on QEMU virtual devices for repeatable regression testing. But, it becomes less
useful when searching for unknown vulnerabilities, which requires automatically
generating new test cases that cover as many execution paths as possible.
To address these shortcomings, we have developed a new fuzzer QEMU accelerator, based upon qtest, for VDF's event playback. This new accelerator adds approximately 850 LOC to the QEMU codebase. It combines the functionality of the qtest test driver process and the qtest accelerator within QEMU, eliminating the need for a separate test driver process and the IPC between QEMU and the test driver. More importantly, it allows VDF to directly replay read/write events as if each event came directly from within a complete guest environment.
Fig. 4. A sample of the branch coverage data for the AC97 virtual device.
large amount of test data that is not needed to reproduce the discovered issue,
so it is desirable to reduce this test case to the absolute minimum number of
records needed to still trigger the bug. Such a minimal test case simplies the
job of the analyst when using the test case to debug the underlying cause.
AFL provides a test case minimization utility called afl-tmin. afl-tmin seeks to make the test case input smaller while still following the same path of execution through the binary. Unfortunately, this is not useful for reducing the test cases recorded by VDF, which is only interested in reaching the state in which a crash/hang occurs. It has no interest in reaching every state in the test case, but only the states necessary to reach the crash/hang state. Therefore, VDF performs a three-step test case post-processing, seen in Fig. 5, to produce a minimal test case which passes through a minimal number of states from any test case shown to reproduce an issue.
First, the test case file is read into memory and any valid test records in the test case are placed into an ordered dataset in the order in which they appear within the test case. Because the fuzzer lacks semantic understanding of the fields within these records, it produces many records via mutation that contain invalid garbage data. Such invalid records may contain an invalid header field, describe a base offset to a register outside of the register bank for the device, or simply be a truncated record at the end of the test case. After this filtering step, only valid test records remain.
Second, VDF eliminates all records in the dataset that are located after the
point in the test case where the issue is triggered. To do this, it generates a
new test case using all but the last record of the dataset and then attempts to
trigger the issue using this truncated test case. If the issue is still triggered, the
last record is then removed from the dataset and another new truncated test
case is generated in the same fashion. This process is repeated until a truncated
test case is created that no longer triggers the issue, indicating that all dataset records located after the trigger point have been removed.
Third, VDF eliminates any remaining records in the dataset that are not necessary to trigger the issue. Beginning with the first record in the dataset, VDF iterates through each dataset record, generating a new test case using all but the current record. It then attempts to trigger the issue using this generated test case. If the issue is still triggered, the current record is not needed to trigger the issue and is removed from the dataset. Once each dataset record has been visited and the unnecessary records removed, the dataset is written out to disk as the final, minimized test case. In addition, source code is generated that is suitable for reproducing the minimized dataset as a qtest test case.
While simple, VDF's test case minimization is very effective. The 1014 crash and hang test cases produced by the fuzzer during our testing have an average size of 2563.5 bytes each. After reducing these test cases to a minimal state, the average test case size becomes only 476 bytes, a mere 18.57% of the original test case size. On average, each minimal test case is able to trigger an issue by performing approximately 13 read/write operations. This average is misleadingly high due to some outliers, however, as over 92.3% of the minimized test cases perform fewer than six MMIO read/write operations.
4 Evaluation
The configuration used for all evaluations is a cloud-based 8-core 2.0 GHz Intel Xeon E5-2650 CPU instance with 8 GB of RAM. Each instance uses a minimal server installation of Ubuntu 14.04 Linux as its OS. Eight cloud instances were utilized in parallel. Each device was fuzzed within a single cloud instance, with one master fuzzer process and five slave fuzzer processes performing the testing. A similar configuration was used for test case minimization: each cloud instance ran six minimizer processes in parallel to reduce each crash/hang test case.
We selected a set of eighteen virtual devices, shown in Table 2, for our evaluation of VDF. These virtual devices utilize a wide variety of hardware features, such as timers, interrupts, and DMA. Each of these devices provides one or more MMIO interfaces to their control registers, which VDF's fuzzing accelerator interacts with. All devices were evaluated using QEMU v2.5.0⁵, with the exception of the TPM device. The TPM was evaluated using QEMU v2.2.50 with an applied patchset that provides a libtpms emulation [20] of the TPM
⁵ US government approval for the engineering and public release of the research shown in this paper required a time frame of approximately one year. The versions of QEMU identified for this study were originally selected at the start of that process.
Table 2. Virtual devices tested with VDF, with branch coverage and results.

Device class | Device            | Branches of interest | Initial coverage | Final coverage | Crashes found | Hangs found | Tests per instance | Test duration
Audio        | AC97              | 164 | 43.9% |  53.0% |  87 |   0 | 24.0 M | 59d 18h
             | CS4231a           | 109 |  5.5% |  56.0% |   0 |   0 | 29.3 M | 65d 12h
             | ES1370            | 165 | 50.9% |  72.7% |   0 |   0 | 30.8 M | 69d 18h
             | Intel-HDA         | 273 | 43.6% |  58.6% | 238 |   0 | 23.1 M | 59d 12h
             | SoundBlaster 16   | 311 | 26.7% |  81.0% |   0 |   0 | 26.7 M | 58d 13h
Block        | Floppy            | 370 | 44.9% |  70.5% |   0 |   0 | 21.0 M | 57d 15h
Char         | Parallel          |  91 | 30.8% |  42.9% |   0 |   0 | 14.6 M | 25d 12h
             | Serial            | 213 |  2.3% |  44.6% |   0 |   0 | 33.0 M | 62d 12h
IDE          | IDE Core          | 524 | 13.9% |  27.5% |   0 |   0 | 24.9 M | 65d 6h
Network      | EEPro100 (i82550) | 240 | 15.8% |  75.4% |   0 |   0 | 25.7 M | 62d 12h
             | E1000 (82544GC)   | 332 | 13.9% |  81.6% |   0 | 384 | 23.9 M | 61d
             | NE2000 (PCI)      | 145 | 39.3% |  71.7% |   0 |   0 | 25.2 M | 58d 13h
             | PCNET (PCI)       | 487 | 11.5% |  36.1% |   0 |   0 | 25.0 M | 58d 13h
             | RTL8139           | 349 | 12.9% |  63.0% |   0 |   6 | 24.2 M | 58d 12h
SD Card      | SDHCI             | 486 | 18.3% |  90.5% |  14 | 265 | 24.0 M | 62d
TPM          | TPM               | 238 | 26.1% |  67.3% |   9 |  11 |  2.1 M | 36d 12h
Watchdog     | IB700             |  16 | 87.5% | 100.0% |   0 |   0 |  0.3 M | 8h
             | I6300ESB          |  76 | 43.4% |  68.4% |   0 |   0 |  2.1 M | 26h
hardware device [23]. Fewer than 1000 LOC were added to each of these two
QEMU codebases to implement both the fuzzer accelerator and any recording
instrumentation necessary within each tested virtual device.
VDF discovered noteworthy bugs in six virtual devices within the evaluation set, including a known denial-of-service CVE [7] and a new, previously undiscovered denial-of-service CVE [8]. Additional bugs were discovered relating to memory management and thread race conditions, underscoring VDF's ability to discover bugs of a variety of natures utilizing the same techniques and principles.
62.32% of the total branches were covered. The largest increase in average coverage was seen during the first six cumulative hours of testing, where coverage increased from the initial 30.15% to 52.84%. After 2.25 days of cumulative testing, average coverage slows considerably, and only 0.43% more of the total branches are discovered during the next 6.75 cumulative days of testing. While eleven of the eighteen tested devices stopped discovering new branches after only one day of cumulative testing, six of the seven remaining devices continued to discover additional branches until 6.5 cumulative days had elapsed. Only in the serial device were additional branches discovered after nine cumulative days.
Fig. 6. Average percentage of branches covered (left) and average percentage of total
bugs discovered (right) over time during fuzz testing.
Our proposed test case minimization greatly simplifies this process, as many unique bugs identified by VDF minimize to the same set of read/write operations. The ordering of these operations may differ, but the final read/write that triggers the bug remains the same. Each discovered virtual device bug falls into one of four categories: excess resource usage (AC97), invalid data transfers (E1000, RTL8139, SDHCI), debugging asserts (Intel-HDA), and thread race conditions (TPM).
Invalid Data Transfers. Many virtual devices transfer blocks of data. Such transfers are used to move data to and from secondary storage and guest physical memory via DMA. However, invalid data transfers can cause virtual devices to hang in an infinite loop. This type of bug can be difficult to deal with in production systems, as the QEMU process is still running while the guest's virtual clock is in a paused state. If queried, the QEMU process appears to be running
and responsive. The guest remains frozen, causing a denial of service of any
processes running inside of the guest.
VDF discovered test cases that trigger invalid data transfer bugs in the E1000
and RTL8139 virtual network devices and the SDHCI virtual block device. In
each case, a transfer was initiated with either a block size of zero or an invalid
transfer size, leaving each device in a loop that either never terminates or exe-
cutes for an arbitrarily long period of time.
For the E1000 virtual device, the guest sets the device's E1000_TDH and E1000_TDT registers (TX descriptor head and tail, respectively) with offsets into guest memory that designate the current position into a buffer containing transfer operation descriptors. The guest then initiates a transfer using the E1000_TCTL register (TX control). However, if the values placed into the E1000_TDH/TDT registers are too large, then the transfer logic enters an infinite loop. A review of reported CVEs has shown that this issue was already discovered in January 2016 [7] and patched [14].
For the RTL8139 virtual device, the guest resets the device via the ChipCmd (chip control) register. Then, the TxAddr0 (transfer address), CpCmd (C+ mode command), and TxPoll (check transfer descriptors) registers are set to initiate a DMA transfer in the RTL8139's C+ mode. However, if an invalid address is supplied to the TxAddr0 register, QEMU becomes trapped in an endless loop of DMA lookups. This was an undiscovered bug, which has been patched and assigned CVE-2016-8910 [8] as a denial-of-service exploit.
For the SDHCI virtual device, the guest sets the device's SDHC_CMDREG register bit for "data is present" and sets the block size to transfer to zero in the SDHC_BLKSIZE register. The switch case for SDHC_BLKSIZE in the sdhci_write() MMIO callback function in hw/sd/sdhci.c performs a check to determine whether the block size exceeds the maximum allowable block size, but it does not perform a check for a block size of zero. Once the transfer begins, the device becomes stuck in a loop, and the guest environment becomes unresponsive. Luckily, fixes for this issue were integrated into mainline QEMU [12] in December 2015.
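The common shape of all three hangs is a transfer loop that trusts a guest-supplied size. A hedged, device-agnostic illustration of the missing guard (names and limits are hypothetical, not QEMU's code):

```python
# A transfer loop that never validates the guest-supplied block size
# will spin forever when that size is zero -- the pattern behind the
# E1000, RTL8139, and SDHCI hangs described above.
MAX_BLOCK_SIZE = 0x1000

def transfer(total_bytes, block_size):
    """Copy total_bytes in block_size chunks; returns blocks copied."""
    if block_size == 0 or block_size > MAX_BLOCK_SIZE:
        raise ValueError("rejecting invalid guest-supplied block size")
    copied = blocks = 0
    while copied < total_bytes:      # with block_size == 0 this loop
        copied += block_size         # would never terminate
        blocks += 1
    return blocks

assert transfer(4096, 512) == 8
try:
    transfer(4096, 0)
    assert False, "zero block size must be rejected"
except ValueError:
    pass
```

The fix in each real device amounts to adding the equivalent of the first check before entering the copy loop.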
process implementing the TPM via RPC. However, it is also possible to integrate
libtpms directly into QEMU by applying a patchset provided by IBM [23]. This
allows each QEMU instance to own its own TPM instance and directly control
the start-up and shutdown of the TPM via a TPM backend in QEMU.
VDF discovered a hang that is the result of the TPM backend thread pool shutdown occurring before the tasks allocated to the thread pool have all been completed. Without an adequately long call to sleep() or usleep() prior to the thread pool shutdown to force a context switch and allow the thread pool worker threads to complete, the thread pool will hang on shutdown. Because the shutdown of the TPM backend is registered to be called at exit() via an atexit() call, any premature exit() prior to the necessary sleep() or usleep() call will trigger this issue. QEMU's signal handlers are never unregistered, so using a SIGTERM signal to kill QEMU is unsuccessful.
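The ordering problem can be illustrated with a standard worker pool, shown here in Python rather than QEMU's C code: tearing a pool down without waiting for queued tasks mirrors the race above, while waiting for completion (the moral equivalent of the missing sleep() before the atexit() teardown) avoids it.

```python
# Orderly pool shutdown: shutdown(wait=True) blocks until queued tasks
# finish. Tearing the pool down without waiting -- as the buggy TPM
# backend teardown effectively did -- leaves work unfinished or wedged.
from concurrent.futures import ThreadPoolExecutor
import time

results = []

def slow_task(i):
    time.sleep(0.01)                 # stand-in for an expensive TPM op
    results.append(i)

pool = ThreadPoolExecutor(max_workers=2)
for i in range(4):
    pool.submit(slow_task, i)

pool.shutdown(wait=True)             # join workers before process exit
assert sorted(results) == [0, 1, 2, 3]
```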
Note that this thread pool is part of the TPM backend design in QEMU and is not part of the libtpms library that implements the actual TPM emulator. Most likely this design decision was made to have the TPM virtual device run in an asynchronous manner, avoiding any noticeable slowdown in QEMU's execution caused by performing expensive operations in the software TPM. Other, newer TPM pass-through options, such as the Character device in User Space (CUSE) interface to a stand-alone TPM emulator using libtpms [13], should not experience this particular issue.
5 Related Work
Fuzzing has been a well-explored research topic for a number of years. The
original fuzzing paper [32] used random program inputs as seed data for testing
Unix utilities. Later studies on the selection of proper fuzzing seeds [25,34] and
the use of concolic fuzzing to discover software vulnerabilities [17] have both been
used to improve the coverage and discovery of bugs in programs undergoing fuzz
testing. By relying on the record and replay of virtual device activity, VDF
provides proper seed input that is known to execute branches of interest.
Frameworks for testing virtual devices are a fairly recent development. qtest [9] was the first framework to approach the idea of flexible low-level testing of virtual devices. VDF leverages qtest, but improves on the approach to provide better test case throughput and test automation. Tang and Li proposed an approach [36] using a custom BIOS within the guest environment that listened on a virtual serial port to drive testing. VDF's approach relies upon no software executing within the guest environment (BIOS, kernel, etc.), and performs device-specific BIOS-level initialization as part of its init set.
A number of tools utilize record and replay. ReVirt [31] records system events to replay the activity of compromised guest systems to better analyze the nature of the attack. Aftersight [27] records selected system events and then offloads those events to another system for replay and analysis. Its primary contribution of decoupled analysis demonstrates that record and replay facilitates repeated heavyweight analysis after the moment that the event of interest originally occurred. PANDA [30], a much more recent work in this area, uses a modified
6 Conclusion
In this paper, we presented VDF, a system for performing fuzz testing on vir-
tual devices, within the context of a running hypervisor, using record/replay of
memory-mapped I/O events. We used VDF to fuzz test eighteen virtual devices,
generating 1014 crash or hang test cases that reveal bugs in six of the tested
devices. Over 80% of the crashes and hangs were discovered within the rst day
of testing. VDF covered an average of 62.32% of virtual device branches during
testing, and the average test case was minimized to 18.57% of its original size.
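The event-level test case minimization summarized above can be sketched as a simple greedy reduction pass (illustrative only; VDF's actual minimizer is not reproduced here): drop one recorded event at a time, and keep the drop whenever the crash or hang still reproduces on the shorter sequence.

```python
def minimize(events, still_crashes):
    """Greedy one-at-a-time reduction of a crashing event sequence.

    still_crashes(seq) replays seq against the device under test and
    returns True if the crash/hang still occurs. Each event is removed
    in turn; the removal is kept only if the failure still reproduces.
    """
    i = 0
    while i < len(events):
        candidate = events[:i] + events[i + 1:]
        if still_crashes(candidate):
            events = candidate  # event i was unnecessary; retry same index
        else:
            i += 1              # event i is needed to trigger the failure
    return events
```

Each kept removal shortens the replay, so repeated passes of this kind are what drive a crashing test case down toward the small fraction of its original size reported above.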
Acknowledgment. The authors would like to thank the staff of the Griffiss Institute
in Rome, New York for generously allowing the use of their cloud computing resources.
This material is based upon research sponsored by the Air Force Research Lab, Rome
Research Site under agreement number FA8750-15-C-0190.
References
1. Advanced Linux Sound Architecture (ALSA). http://www.alsa-project.org
2. Amazon.com, Inc., Form 10-K 2015. http://www.sec.gov/edgar.shtml
3. CVE-2014-2894: Off-by-one error in the cmd start function in smart self test in
IDE core. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-2894
4. CVE-2015-3456: Floppy disk controller (FDC) allows guest users to cause denial
of service. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-3456
5. CVE-2015-5279: Heap-based buffer overflow in NE2000 virtual device. https://cve.
mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-5279
6. CVE-2015-6855: IDE core does not properly restrict commands. http://cve.mitre.
org/cgi-bin/cvename.cgi?name=CVE-2015-6855
7. CVE-2016-1981: Reserved. https://cve.mitre.org/cgi-bin/cvename.cgi?name=
CVE-2016-1981
8. CVE-2016-8910: Qemu: net: rtl8139: infinite loop while transmit in C+ mode.
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-8910
9. Features/QTest. http://wiki.qemu.org/Features/QTest
10. Kernel-Based Virtual Machine. http://www.linux-kvm.org/
11. PCI - OSDev Wiki. http://wiki.osdev.org/PCI
12. [Qemu-devel] [PATCH 1/2] hw/sd: implement CMD23 (SET BLOCK COUNT)
for MMC compatibility. https://lists.gnu.org/archive/html/qemu-devel/2015-12/
msg00948.html
13. [Qemu-devel] [PATCH 1/5] Provide support for the CUSE TPM. https://lists.
nongnu.org/archive/html/qemu-devel/2015-04/msg01792.html
14. [Qemu-devel] [PATCH] e1000: eliminate infinite loops on out-of-bounds transfer
start. https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg03454.html
15. Qubes OS Project. https://www.qubes-os.org/
16. TrouSerS - The open-source TCG software stack. http://trousers.sourceforge.net
17. Avgerinos, T., Cha, S.K., Lim, B., Hao, T., Brumley, D.: AEG: automatic exploit
generation. In: Proceedings of Network and Distributed System Security Sympo-
sium (NDSS) (2011)
18. Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R.,
Pratt, I., Warfield, A.: Xen and the art of virtualization. ACM SIGOPS Operating
Syst. Rev. 37(5), 164 (2003)
19. Bellard, F.: QEMU, a fast and portable dynamic translator. In: USENIX Annual
Technical Conference, Freenix Track, pp. 41–46 (2005)
20. Berger, S.: libtpms library. https://github.com/stefanberger/libtpms
21. Böhme, M., Pham, V.T., Roychoudhury, A.: Coverage-based greybox fuzzing as
Markov chain. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer
and Communications Security, CCS 2016 (2016)
22. Böttinger, K., Eckert, C.: Deepfuzz: triggering vulnerabilities deeply hidden in
binaries. In: Proceedings of the 13th International Conference on Detection of
Intrusions and Malware, and Vulnerability Assessment, DIMVA 2016 (2016)
23. Bryant, C.: [1/4] tpm: Add TPM NVRAM Implementation (2013). https://
patchwork.ozlabs.org/patch/288936/
24. Cadar, C., Dunbar, D., Engler, D.: KLEE: unassisted and automatic generation of
high-coverage tests for complex systems programs. In: Proceedings of the 8th Sym-
posium on Operating Systems Design and Implementation, pp. 209–224. USENIX
Association (2008)
25. Cha, S.K., Avgerinos, T., Rebert, A., Brumley, D.: Unleashing mayhem on binary
code. In: 2012 IEEE Symposium on Security and Privacy, pp. 380–394. IEEE, May
2012
26. Chipounov, V., Georgescu, V., Zamfir, C., Candea, G.: Selective symbolic execu-
tion. In: Proceedings of Fifth Workshop on Hot Topics in System Dependability,
June, Lisbon, Portugal (2009)
27. Chow, J., Garfinkel, T., Chen, P.M.: Decoupling dynamic program analysis from
execution in virtual environments. In: USENIX Annual Technical Conference, pp.
1–14 (2008)
28. Cong, K., Xie, F., Lei, L.: Symbolic execution of virtual devices. In: 2013 13th
International Conference on Quality Software, pp. 1–10. IEEE, July 2013
29. Corbet, J., Rubini, A., Kroah-Hartman, G.: Linux Device Drivers, 3rd edn.
O'Reilly Media, Inc., Sebastopol (2005)
30. Dolan-Gavitt, B., Hodosh, J., Hulin, P., Leek, T., Whelan, R.: Repeatable Reverse
Engineering for the Greater Good with PANDA. Technical report, Columbia Uni-
versity, MIT Lincoln Laboratory, TR CUCS-023-14 (2014)
31. Dunlap, G.W., King, S.T., Cinar, S., Basrai, M.A., Chen, P.M.: ReVirt: enabling
intrusion analysis through virtual-machine logging and replay. ACM SIGOPS
Operating Syst. Rev. 36(SI), 211–224 (2002)
32. Miller, B.P., Fredriksen, L., So, B.: An empirical study of the reliability of UNIX
utilities. Commun. ACM 33(12), 32–44 (1990)
33. Rawat, S., Jain, V., Kumar, A., Cojocar, L., Giuffrida, C., Bos, H.: VUzzer:
application-aware evolutionary fuzzing. In: NDSS, February 2017
34. Rebert, A., Cha, S.K., Avgerinos, T., Foote, J., Warren, D., Grieco, G., Brumley,
D.: Optimizing seed selection for fuzzing. In: 23rd USENIX Security Symposium
(2014)
35. Stephens, N., Grosen, J., Salls, C., Dutcher, A., Wang, R., Corbetta, J.,
Shoshitaishvili, Y., Kruegel, C., Vigna, G.: Driller: augmenting fuzzing through
selective symbolic execution. In: Proceedings of NDSS 2016, February 2016
36. Tang, J., Li, M.: When virtualization encounter AFL. In: Black Hat Europe (2016)
37. Wu, C., Wang, Z., Jiang, X.: Taming hosted hypervisors with (mostly) deprivileged
execution. In: Network and Distributed System Security Symposium (2013)
38. Zalewski, M.: American Fuzzy Lop Fuzzer. http://lcamtuf.coredump.cx/afl/