Breaking Turtles All the Way Down:

An Exploitation Chain to Break out of VMware ESXi

Hanqing Zhao ‡† , Yanyu Zhang† , Kun Yang∗ , Taesoo Kim ‡

† Chaitin Security Research Lab,


‡ Georgia Institute of Technology,
∗ Tsinghua University

Abstract

VMware ESXi is an enterprise-class, bare-metal hypervisor dedicated to providing state-of-the-art private-cloud infrastructure. Accordingly, the design and implementation of ESXi are of interest to our community, yet a thorough evaluation of its security internals is lacking. In this paper, we give a comprehensive analysis of the guest-to-host attack surfaces of ESXi and its recent security mitigation (i.e., the vSphere sandbox). In particular, we introduce an effective and reliable approach to chain multiple vulnerabilities for exploitation and demonstrate our approach by leveraging two new bugs (i.e., uninitialized stack usages), namely, CVE-2018-6981 and CVE-2018-6982. Our exploit chain is the first public demonstration of a virtual machine escape against VMware ESXi.

1 Introduction

Cloud computing has become the most popular choice for today's online services. The foundation that enables cloud computing is virtualization technology, effectively allowing economy of scale. Some examples include QEMU-KVM, Xen, Hyper-V, and the VMware family, which are the basic building blocks of most public and private cloud infrastructures.

Unfortunately, the large and complicated code base of virtual machines inevitably includes software vulnerabilities such as memory corruptions. Moreover, unlike other software bugs, the bugs in virtualization infrastructures could lead to a break-in of the security boundary between guest and host; thus, the security of the entire cloud environment of the cloud provider might be subverted.

These unique propositions of cloud infrastructures make them an attractive target for attackers, but our community lacks an in-depth analysis and evaluation of their security. Research communities have made several attempts to do this: one example that received considerable attention is a security bug of QEMU-KVM, called VENOM (known as CVE-2015-3456 [8]), which attempted to exploit a bug in a virtual, emulated floppy drive. Other vulnerabilities such as CVE-2015-5165 and CVE-2015-7504 have been disclosed and exploited in public PoC demonstrations [17]. However, most of them, as far as we know, rely on strict preconditions (e.g., supporting a decade-old floppy driver by the cloud provider), which fail to demonstrate exploitation under the default configuration of the underlying virtualization infrastructures. Admittedly, there have been a few public demonstrations against VMware Workstation in the Pwn2Own contest, but their approaches and details of exploitation have not been disclosed to the public.

The situation for enterprise-class, commercial hypervisors, such as VMware ESXi or Hyper-V, is even worse: they are much more attractive targets to attackers, but their internals still remain opaque to the security communities. Such security-by-obscurity approaches taken by commercial type-1 hypervisors require that many practical challenges be addressed. First, the complex internal design and machinery that implement a type-1 hypervisor are not trivial to understand in depth. Second, hypervisors are intensively protected by custom in-house protection schemes, limiting the capability for dynamic analysis. Lastly, non-trivial reverse engineering skills and efforts are required to understand and even navigate system binaries without proper source code and symbols.

In this paper, we share the lessons learned from our in-depth security analysis of ESXi. We first introduce the attack surfaces of ESXi by systematically evaluating attack scenarios. Second, we provide an exhaustive analysis of two previously unknown vulnerabilities we found in ESXi. Last, by leveraging these vulnerabilities, we elaborate a new exploit technique to bypass the deployed mitigation, the vSphere sandbox in ESXi. The constructed exploit is reliable and persistent; it dynamically adapts to the version of ESXi and survives across the system reboot, causing persistent damage to cloud providers.

Overall, this paper makes the following contributions:

1. A systematic review of attack surfaces, mitigations, and vSphere sandbox of VMware ESXi.

2. An in-depth demonstration and analysis of the vulnerabilities used in the first virtual machine escape of VMware ESXi.

3. State-of-the-art and reusable exploitation techniques for manipulating memory layout, constructing an arbitrary-address-write primitive, and achieving persistent exploitation in VMware ESXi.

Threat Model. In this research, we assume that an adversary can execute arbitrary code in the user space and kernel space on the guest OS.

2 Background

2.1 The Architecture of the ESXi

As Figure 1 illustrates, VMware ESXi integrates its own operating system (OS) called VMkernel, providing the functionalities of resource scheduling, I/O stacks, network stacks, storage stacks, and device drivers, and all processes are running on top of it. VMkernel also implements a simple in-memory file system to hold staged patches, configurations, and system logs.

[Figure 1 diagram: sandboxed user worlds (Hostd, Plugin, VMX with virtual hardware, etc.) and the VMM hosting the guest virtual machine sit on top of the user world API; below it, the VMKernel (resource scheduling, I/O stack, VMFS, drivers) runs on the physical hardware.]

Figure 1: The architecture of VMware ESXi [5]. VMKernel, a POSIX-like OS, is designed to multiplex the virtual machines and provides some core fundamentals such as resource scheduling, I/O stacks, file system (VMFS), and drivers. The guest machines can communicate with the host through hypercall, which includes normal hypercalls such as VM-Exit and special hypercalls such as backdoor and VMCI. The term "user world" refers to a process running in the VMkernel. A significant user world is "VMX," a Ring-3 process, because it contains RPC handlers and virtual hardware. (Note that the VMX process and some other processes are sandboxed by the vSphere sandbox.) The virtual machine monitor (VMM) is a process that provides the execution environment for guest virtual machines.

To communicate with the hypervisor, the guest taps into the VMM through VM-Exit in most circumstances. Notably, VMware also introduced another hypercall mechanism called backdoor. Interestingly, although it is named "backdoor," it is merely a communication channel between the guest and the hypervisor. Also, it has been widely applied to VMware products, e.g., some functionalities in open-vm-tools [18] such as drag-and-drop and copy-and-paste are implemented on top of the backdoor mechanism. As Figure 2 illustrates, the guest machine that runs in Ring-3 of a protected-mode OS executes the in or out instruction with a specific port, which raises a corresponding exception. Normally, this results in the crash of the process. However, in a VMware virtual machine, the hypervisor captures the exception and dispatches it to a proper handler on the host OS. Therefore, there are no exceptions in the end. Compared to a normal hypercall, which usually requires CPL ⩽ IOPL, some backdoor requests can be issued from Ring-3 directly. Consequently, a channel for the communication between guest and host can be established. For example, as Figure 3 delineates, backdoor can be leveraged to send data from the guest to the host. By putting required parameters into specific registers, like a simple function call, a process running in the protected mode can invoke the RPC directly. In the sample, we first create a new "RPCI" channel and retrieve the channel number. Then, we can send data to the host through this channel. Based on this mechanism, some high-level and complicated protocols could be developed. Furthermore, in this paper, we also use this feature to reliably manipulate the memory layout. The details are discussed in §3.

[Figure 2 diagram: a Ring-3 process in the guest OS issues In/Out on the special ports; the resulting exception is captured by the hypervisor, dispatched to one of the hypercall handlers 0..N on the host OS, and execution returns to user-land.]

Figure 2: The backdoor remote procedure call. Under the default I/O privilege level, a Ring-3 program should not be able to issue I/O operations. As a result, the in or out instruction should cause the Ring-3 process to fault and crash. However, in this scenario, the hypervisor captures the fault and handles it, supporting the backdoor RPC mechanism.

2.2 Virtual Machine Escape

VM escape is a process of breaking out of a virtual machine from a guest OS, so the guest VM can launch an arbitrary execution with the privilege of the host operating system [21]. Specifically, ESXi completely isolates the guest operating systems from each other by leveraging hardware virtualization technologies, such as Intel VT or AMD-V. Any privileged instructions from the guest operating systems will be captured
and sanitized by the hypervisor. In normal circumstances, the guest cannot execute code or affect security-critical behaviors, such as system configuration and network connections, of other guests or the host. By exploiting the vulnerabilities in ESXi, the adversaries can cross the security boundary between the guest and the host to execute arbitrary code on the host operating system (i.e., virtual machine escape in ESXi).

    /* Creating a new channel */
    asm(
        "movl $0x564d5868,%%eax\n\t"   /* magic bytes: 'VMXh' */
        "movl $0xc9435052,%%ebx\n\t"   /* magic bytes for RPCI */
        "movl $0x1e,%%ecx\n\t"         /* MESSAGE_TYPE_OPEN */
        "movl $0x5658,%%edx\n\t"       /* special I/O port */
        "out %%eax,%%dx\n\t"
        "movl %%edx, %%eax\n\t"        /* ret channel number (EDX(HI)) */
        "movl %%ecx, %%ebx\n\t"        /* success or failure */
        ...
    );

    /* Sending data through a specific channel */
    asm(
        "movl %0, %%edx\n\t"           /* channel number (EDX(HI)) */
        "movl $0x41414141,%%ebx\n\t"   /* 4 bytes to be sent */
        "movl $0x564d5868,%%eax\n\t"   /* magic bytes: 'VMXh' */
        "movl $0x0002001e,%%ecx\n\t"   /* MESSAGE_TYPE_SEND */
        "movw $0x5658,%%dx\n\t"        /* special I/O port */
        "out %%eax,%%dx\n\t"
        "movl %%ecx, %%eax"            /* success or failure */
        ...
    );

Figure 3: By using "RPCI," a sort of backdoor-based mechanism, guest machines can communicate with the host OS. By reading or writing a special I/O port (0x5658/0x5659), a process running in Ring-3 can invoke the RPC directly.

2.3 Security Analysis of the ESXi

Attack Surfaces. The virtualization layer is the most significant part of the lifetime of the guest operating system. Any interaction between the guest OS and the hypervisor is a potential attack vector that could be exploited by adversaries. Generally, the guest can communicate with the host of ESXi in several ways:

1. VMKernel and Core Virtualization Infrastructures: There are some fundamentals such as VM-Exit handlers, memory management, and memory virtualization infrastructures offered by the hypervisor running in the kernel space. An adversary who attacks the hypervisor successfully could take over the kernel of the host operating system directly.

2. Virtual Hardware: To support I/O virtualization, VMware designed a batch of virtual hardware and devices. Most of them are integrated into the VMX process of ESXi [3, 15]. The guest OS can communicate with the virtual hardware through port I/O (PIO) or memory-mapped I/O (MMIO). Table 1 shows some significant virtual hardware integrated in VMware ESXi. Most of them require root privilege in the guest operating system to be opened.

3. RPC Channels: VMware developed some RPC protocols such as backdoor and the VMware Virtual Machine Communication Interface (VMCI) to accelerate the communication between the guest OS and the host OS. The mechanism has been used in VMware's virtual machines for decades, because it does not rely on hardware virtualization extensions. Thus, it also exposes some virtual machine escape attack surfaces. Some RPC handlers exist in the VMX process. By exploiting the bugs in these handlers, adversaries can escape from the guest OS.

    Interface   Category             Privilege
    SVGA2D      Virtual Graphic      ROOT
    SVGA3D      Virtual Graphic      ROOT
    e1000       Virtual Ethernet     ROOT
    e1000e      Virtual Ethernet     ROOT
    VMXNET3     Virtual Ethernet     ROOT
    xHCI        Virtual USB          ROOT
    uHCI        Virtual USB          ROOT
    aHCI        Virtual SATA         ROOT
    Lsilogic    Virtual SCSI         ROOT
    Printer     Virtual COM Device   ROOT

Table 1: Virtual hardware that has been demonstrated to affect guest-to-host interaction. The Privilege field indicates the privilege required to open the device in the guest OS.

Common Mitigations. The host maintains a POSIX-like operating system, and some Linux-like security mitigations are integrated into the host.

1. ASLR: Address Space Layout Randomization (ASLR) [20] was introduced to mitigate the exploitation of memory corruption vulnerabilities. In ESXi, the addresses encompassing the program, stack, heap, and libraries of the user space binaries are randomized. In ESXi's VMX process, which contains most of the virtual hardware, some hardware such as the network card runs in Ring-3. Therefore, an attacker who wants to attack virtual hardware or other Ring-3 services in ESXi first has to leak code pointers (i.e., information leakage) to further hijack the control flow of ESXi.

2. NX/DEP: This option is referred to as Data Execution Prevention (DEP) or No-Execute (NX). It works with the processor to help prevent buffer overflow attacks by blocking code execution from memory that is marked as non-executable [11]. In ESXi, when a process tries to execute shellcode on the stack, heap, or data segments, it will crash.

3. Compact VMX: Compared with VMware Workstation, the type-2 hypervisor developed by VMware, ESXi's
VMX program has less source code, because VMware moved the data package operations into the VMkernel, enhancing the efficiency of ESXi. Meanwhile, it narrows the attack surfaces of VMX.

Sandboxing. Since vSphere 6.5, VMware ESXi has introduced a mandatory access control (MAC), similar to Ubuntu's AppArmor, to enforce the security policy of guest VMs. (VMware vSphere is a commercial name for the whole VMware Suite, and ESXi is the hypervisor server of vSphere.) The sandbox maintains some pre-defined policies that contain white lists of allowed syscalls, sockets, and file permissions in the file system (VMFS). The main functionality of sandboxing is offered by VMKernel, dividing the target into several different restricted domains (app, ioFilter, pluginFramework, globalVM, plugin, tpm2emu). After a specific virtual machine starts, its behaviors are limited. The sandbox ensures the safety and security of VMs by running them in an operational sandbox with strict controls regarding hypervisor capabilities available to them [6]. Even if the adversaries have escaped from the guest operating system, they are still restricted by the vSphere sandbox; thus, it limits the impact of virtual machine escape attacks.

    # Rules applicable for all VMs

    -s genericSys grant
    -s ioctlSys grant
    -s vsiReadSys grant
    ...

    -c unix_socket_create grant
    -c unix_stream_socket_bind grant
    -c unix_dgram_socket_bind grant
    ...

    -p inet_socket_bind all grant
    -p inet_socket_connect loopback grant
    -p inet_socket_connect nonloopback grant
    ...

    -d tpm2emuObj tpm2emuDom file_exec grant

    -r /var/run rw
    -r /var/lock rw
    ...

Figure 4: A sample rule for global VMs. It defines which system calls a VMX process is allowed to call, which network connections a VMX process is allowed to establish, and which directories a VMX process is allowed to read or write.

Figure 4 shows a sample rule for global virtual machines. For instance, if an adversary exploits a vulnerability in virtual hardware that is integrated in the VMX process of VMware Workstation, the adversary can spawn a shell or reverse shell on the host OS, finishing the VM escape. However, in ESXi, even if an adversary gets an arbitrary shellcode execution primitive by exploiting vulnerabilities in the VMX process, the adversary cannot invoke any sensitive syscalls such as execve or establish a remote shell through sockets. Therefore, the sandbox makes the VM escape more difficult. The sandbox of ESXi is a rule-based sandbox, and each virtual machine runs in its own sandbox. However, it already has predefined rules; the users of ESXi do not need to configure it by themselves. The rules dictate what the VMX process running the virtual machine is allowed to do or access, e.g., the VMX process has no access to some sensitive files such as /etc/passwd, and the VMX process cannot run scripts or initiate network connections on the host. Meanwhile, the sandbox restricts which system calls a VMX process is allowed to call; thus, some sensitive calls such as execve and execl are restricted. The complete rules of the sandbox can be found in /etc/vmware/secpolicy/domains/ on the host. By auditing the sandbox profiles, we can get some plausible attack surfaces of the sandboxing in ESXi.

3 VM Escape: Our Approach

Overview. There are two uninitialized usages in the vmxnet3 virtual ethernet card and a logical issue in the sandbox policy. CVE-2018-6982 is used for a code pointer leak, and CVE-2018-6981 for an arbitrary pointer free. Our technique is to chain both for an arbitrary write and, ultimately, control-flow hijacking. Next, we bypass the vSphere sandbox and achieve virtual machine escape.

      void __usercall vmxnet3_reg_cmd(vmxnet3_class *a1,
          __int64 read_or_write, _DWORD *data, __int64 a4, __int64 a5)
      {
        ...
        case 4: // VMXNET3_CMD_UPDATE_MAC_FILTERS
          if ( a1->field_1A20 ) {
    ⋆       dma_memory_create(a1->driver_shared_addr + 8, 0x2B0ui64, 1,
    ⋆           a1->state->field_B8, &page);
            vmxnet3_cmd_update_mac_filters(v6, &page, a5);
    ⋆       destruct_page_struct(&page);
            sub_14017CB30(v6);
          }
          break;
        ...
      }

      char __fastcall dma_memory_create(unsigned __int64 addr, unsigned
          __int64 size, int a3, int a4, page_struct *page)
      {
        unsigned __int64 v5;

        v5 = *(qword_140DAA810 + 12160);
        // check the addr
    ⋆   if ( addr > v5 || !size || size > v5 - addr + 1 )
    ⋆     return 0;
        set_page_struct(addr, size, a3, a4, page);
        return 1;
      }

Figure 5: dma_memory_create() is responsible for creating a page struct used to read/write memory between guest and host. Inside the function, it checks the addr passed by users. If it is invalid, the function will return directly. However, the caller does not check whether the allocation is successful; thus, an uninitialized stack variable could be used in destruct_page_struct().
3.1 Vulnerabilities

The vmxnet3 adapter is the recommended default network adapter in VMware ESXi because it offers the best throughput of all the adapter options. As such, it is likely in use on most virtual machines. In this section, we introduce two memory corruption bugs leveraged in our exploitation chain.

CVE-2018-6981. This bug is caused by an uninitialized use of stack memory in the vmxnet3 virtual ethernet card. As Figure 5 shows, there is an interface (vmxnet3_reg_cmd()) that can execute commands in the MMIO memory of vmxnet3. In the command VMXNET3_CMD_UPDATE_MAC_FILTERS, the dma_memory_create() function creates a page structure used to read/write memory between guest and host. The destruct_page_struct() function is responsible for releasing the memory of the page structure.

According to Figure 5, the page structure is allocated and initialized in the function set_page_struct(). At its beginning, the dma_memory_create() function checks the validity of the physical address given by the guest. If the guest provides an invalid physical address, the function returns immediately. However, after the call to dma_memory_create(), the VMXNET3_CMD_UPDATE_MAC_FILTERS handler fails to check whether the allocation of the page structure is successful, resulting in an uninitialized use in the destruct_page_struct() function.

Technically, this bug can also be turned into an information leakage bug. However, to improve the stability of the exploitation, we decided to chain a separate information leakage bug into the exploitation.

CVE-2018-6982. This bug is also caused by an uninitialized stack variable in the memory of ESXi, and we utilized it to independently retrieve memory address information from the host. There is another command handler in vmxnet3_reg_cmd() called vmxnet3_cmd_get_coalesce(). Figure 6 depicts its core logic. The get_args() function is used to retrieve some data from a memory region of the VMX process. The sanity_check() function requires that v19 satisfy v19 ⩽ 16. Also, the write_back_to_guest() function will write 16 bytes of data into the guest context. Unfortunately, only 8 bytes of them (v20) are initialized.

      bool __fastcall vmxnet3_cmd_get_coalesce(__int64 a1, char a2)
      {
        v17 = 0;
        v26 = __readfsqword(0x28u);
        qmemcpy(&v25, (const void *)(*(_QWORD *)(a1 + 208) + 272LL),
                0x100uLL);
        if ( !(unsigned __int8)get_args(a1, &v18)
          || (_DWORD)v18 != 1 // v18 is controllable
          || !HIDWORD(v18)
          || !v19
    ⋆     || HIDWORD(v18) != 16 // constraint; but v18 is controllable
          || !(unsigned __int8)sanity_check(v19, 16LL) ) {
          return 0;
        }
        if ( !a2 ) { // a2: always zero
          v14 = *(_QWORD *)(a1 + 208);
    ⋆     v20 = 0xFA000000003LL;

          // first 8 bytes of v20 are initialized, but 16 bytes are read
          // (HIDWORD(v18) == 16)
    ⋆     write_back_to_guest(v19, &v20, HIDWORD(v18), 0, *(_DWORD *)
    ⋆         (v14 + 184));
          return 1;
        }
        ...
      }

Figure 6: A code snippet of vmxnet3_cmd_get_coalesce(). The get_args() function reads a memory region of the VMX process. Ultimately, v18 and v19 are controllable. The second parameter of write_back_to_guest() indicates the source buffer, and the third one indicates the size.

3.2 Exploitation

A significant challenge of exploiting uninitialized use bugs is how to control the uninitialized variable. In this section, we illustrate the entire process of turning the uninitialized use bug into arbitrary code execution and how we overcome the challenges.

1) Arbitrary address free primitive. As Figure 7 delineates, first, we leak some addresses to break ASLR through the information leakage bug. Next, the destruct_page_struct() function (Figure 8) frees a field of the uninitialized stack memory. Hence, after filling a pointer into the uninitialized memory, an arbitrary-address-free primitive can be constructed.

2) Arbitrary address write primitive. As Figure 11 illustrates, the metadata of Backdoor-RPC channels exists in the data segment. Therefore, we use this feature to construct an arbitrary-address-write primitive. First, we open several Backdoor-RPC channels; thus, some metadata structures of the channels in the data segment are activated. Second, we fake a glibc fast-bin chunk on them to carry out the House of Spirit Attack.

Specifically, after leaking the address of the data segment, we calculate the addresses of the metadata for the backdoor and put them into the uninitialized stack memory using the function handle_port_io(). For example, in Figure 9, when the size of the data does not exceed 0x8000, the function puts all of the data onto the stack. Next, we use the arbitrary-address-free primitive to free the fake fast-bin chunk.

House of Spirit Attack. Because ESXi uses a variant of glibc to maintain Ring-3's heap, we decide to fake a fast-bin chunk on the global metadata of Backdoor-RPC channels, i.e., the House of Spirit Attack of glibc [1,13,22]. However, glibc has several integrity checks to mitigate memory corruption attacks. To bypass them, we need to construct the fake chunk and pick the size properly.

After investigating, as Figure 10 illustrates, we determine some constraints in the free() function of glibc that need to be satisfied:
1. The ISMMAP bit of the fake chunk is 0.

2. The fake chunk's address is aligned.

3. The size of the fake chunk is between 32 and 128 bytes and is aligned.

4. For the next chunk's size θ: 2 * SIZE_SZ < θ < av->system_mem.

5. The first chunk in the fast-bin is not the fake chunk.

[Figure 7 diagram. Labeled steps: 1. retrieve some information in the .data segment as fingerprints to construct the exploit dynamically; 2. trigger the uninitialized stack memory read; 3. use the leaked information to calculate the address of the .data segment; 4. open several RPC channels and fake a fast chunk; 5. put the address of the fake fast chunk (on .data) into the stack via handle_port_io(); 6. free the fake chunk using the arbitrary-address-free primitive; 7. fastbin of heap; 8. reallocate the fake chunk with operations on other RPC channels; 9. use the fake chunk to corrupt the next chunk's data pointer; 10. arbitrary address write; 11. overwrite qsort's .got.plt address into a stack pivot gadget (.got hijacking); 12. stack pivot; 13. ROP chain on stack; 14. shellcodes (mmap RWX memory, jump to shellcodes); 15. arbitrary code execution.]

Figure 7: Exploiting the vulnerabilities and getting arbitrary shellcode execution privilege

    void __fastcall destruct_page_struct(page_struct *a1)
    {
      int v1; // eax
      page_struct *v2; // rbx
      unsigned int v3; // edi
      __int64 v4; // rbp
      __int64 v5; // rsi
      __int64 v6; // r12
      __int64 v7; // rax

      v1 = a1->ready;
      v2 = a1;
      if ( v1 == 1 )
      {...}
      else
      {
        v3 = 0;
        if ( v1 )
        {
          v4 = 0i64;
          v5 = 0i64;
          do
          {
            ...
          }
          while ( v3 < v2->ready );
        }
        free(v2->field_18); // free the pointer on stack
      }
    }

Figure 8: A code snippet of destruct_page_struct(). We use it to free arbitrary addresses.

    void __usercall handle_port_io(__int64 a1, __int64 a2, __int64 a3)
    { ...
      char *v11; // rsi
      ...
      __int64 v35; // [rsp+A0h] [rbp-8038h]
      __int64 v36; // [rsp+80B0h] [rbp-28h]

      v3 = *(a1 + 4);
      v4 = *(a1 + 13);
      read_or_write = *(a1 + 48);
      ...
      if ( *(a1 + 60) && (v10 = *(a1 + 52) << 12, v10 > 0x8000) )
        v11 = malloc_heap_memory(v10); // copy the data into heap
      else
        v11 = &v35;
      if ( read_or_write & 1 )
      { if ( *(v8 + 60) )
        { ...
          v15 = v11;
          do
          { ...
            memcpy(v15, v18, v17); // copy the data into stack
            ...

Figure 9: A code snippet of handle_port_io(). We use it to spray the stack.

Then, we reallocate the fake chunk by leveraging other Backdoor-RPC channel operations, i.e., when a new channel is opened, the channel allocates a new buffer with a controllable length that is pointed to by the data field in the metadata of the channel. This is a flexible and reusable trick to manipulate the heap of ESXi. Finally, we overwrite the next data pointer
of the next channel and an arbitrary-address-write primitive can be constructed.

    void public_fREe(Void_t* mem)
    {
      mstate ar_ptr;
      mchunkptr p;
      ...
      p = mem2chunk(mem);
      if (chunk_is_mmapped(p)) // check mmap bit
      {
        munmap_chunk(p);
        return;
      }
      ...
      ar_ptr = arena_for_chunk(p);
      ...
      _int_free(ar_ptr, mem);
    }

    void _int_free(mstate av, Void_t* mem)
    {
      mchunkptr p;
      INTERNAL_SIZE_T size;
      mfastbinptr* fb;
      ...
      p = mem2chunk(mem);
      size = chunksize(p);
      ...
      // check current size
      if ((unsigned long)(size) <= (unsigned long)(av->max_fast))
      {
        // check next chunk
        if (chunk_at_offset(p, size)->size <= 2 * SIZE_SZ
            || __builtin_expect(chunksize(chunk_at_offset(p, size))
                                >= av->system_mem, 0))
        {
          errstr = "free(): invalid next size (fast)";
          goto errout;
        }
        ...
        fb = &(av->fastbins[fastbin_index(size)]);
        ...
        p->fd = *fb;
        *fb = p;
      }
    }

Figure 10: To fake a fast-bin chunk successfully, we need to bypass some constraints in glibc.

[Figure 11 diagram: the .data segment holding the channel N and channel N+1 state structures (data field at offset 0x20, size field at offset 0x28), a fake fast-bin chunk (prev size, size) placed over them, and the target address.]

Figure 11: Arbitrary Address Write. By faking a fast-bin chunk on the metadata of Backdoor-RPC channels, we can reallocate the fake chunk through Backdoor-RPC operations. After reallocating, we can overwrite the next channel's metadata to corrupt arbitrary addresses.

3) Code execution. To execute code in the context of the host, we use the arbitrary-address-write primitive to redirect the control flow. First of all, we corrupt the global offset table and overwrite the entry of the qsort function with the address of a stack pivot gadget. Next, a ROP chain is put onto the stack. Meanwhile, the control flow is forwarded to the ROP chain on the pivoted stack. In particular, the ROP chain invokes the mmap syscall to obtain memory from which arbitrary shellcode can be executed.

3.3 Circumventing the vSphere Sandbox

The logical bug in the sandbox policy. By scrutinizing the policies of the sandbox, we determine that there are some loopholes inside it. The sandbox allows the VMX process to read and write the /var/run directory of VMFS, and an internet server (inetd) configuration database exists in /var/run/inetd.conf. In this way, we can bind a shell on a specific port by overwriting the inetd.conf file. Note that files in /var/run/* are not persistent and are copied from the backup firmware in the /bootbank/* directory after rebooting.

Forcing the process to restart. To activate the configuration and spawn a shell, we need to force the inetd process to restart. However, we cannot simply restart the entire OS, because the inetd.conf file is not persistent. Files in the VMFS are copied from the bootbank after the host OS restarts. Fortunately, there is a watchdog that can help us restart some processes. As a result, we use the kill() system call to terminate the inetd process. After that, the watchdog restarts the process, and a bind shell spawns.

3.4 Reliability

Compatibility. To improve the compatibility of our exploitation, we use some fingerprints retrieved from the host OS to adjust the exploitation dynamically. As Figure 7 indicates, we first utilize the information leakage bug to retrieve some memory of the .data segment, using it to determine the version of the running ESXi. Next, we can dynamically construct some crucial payloads (e.g., shellcodes, address of fake chunk, etc.) according to the current version.

Persistency. As depicted in §3.3, after the host OS restarts, the entire VMFS is overwritten again, i.e., files in VMFS are overwritten by the backup firmware stored in /bootbank. In light of this, after getting the root privilege, we can overwrite the files under the /bootbank directory to achieve persistent exploitation.
4 Implementation

The exploit is implemented as a kernel module of 450 LoC. The shellcode embedded in the kernel module spawns a bind shell on a specific port and a reverse shell connecting to our remote server. To launch the escape, we merely need to insert the module into the kernel of the guest OS.

Notably, all of the bugs described above have been reported to the vendor and patched appropriately. The exploit chain now only affects instances of VMware ESXi whose version is earlier than 6.7.0 with build number 10764712.

5 Evaluation

To examine the stability and compatibility of our exploit, we evaluate it on ESXi 6.7 with different build numbers.

Version     Build number   V?   S?   Success   Adaptable
ESXi 6.7    10764712       Y    Y    93.3%     Y
ESXi 6.7    10302608       Y    Y    86.7%     Y
ESXi 6.7    10176752       Y    Y    90.0%     Y
ESXi 6.7    9484548        Y    Y    86.7%     Y
ESXi 6.7    9214924        Y    Y    93.3%     Y
ESXi 6.7    8941472        Y    Y    90.0%     Y
ESXi 6.5    <10719125      Y    Y    N/A       N
ESXi 6.0    Any            N    N    N/A       N/A

V: Vulnerable or not, S: Sandboxed or not

Table 2: The results of the evaluation. The Adaptable field indicates whether the exploit can adapt to the environment automatically. The exploitation chain also works on ESXi 6.5, but we have not adapted it to those versions. The success rate is below 100% because the stack can be polluted by ESXi itself, which causes the arbitrary-address-free primitive to fail.
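The percentages in Table 2 are consistent with a whole number of successful runs out of the 30 runs per guest OS described in the experimental setup. The following Python sketch works through that arithmetic. The per-build success counts it derives (28, 26, and 27) are an inference from the reported rates, not figures stated in the evaluation, and all names in the code are illustrative.

```python
# Reported success rates (percent) per ESXi 6.7 build, copied from Table 2.
REPORTED_RATES = {
    10764712: 93.3,
    10302608: 86.7,
    10176752: 90.0,
    9484548: 86.7,
    9214924: 93.3,
    8941472: 90.0,
}

RUNS_PER_BUILD = 30  # the experimental setup states 30 runs per guest OS


def inferred_successes(rate_percent: float, runs: int = RUNS_PER_BUILD) -> int:
    """Return the whole number of successful runs closest to the reported rate."""
    return round(rate_percent / 100 * runs)


def rate_from_counts(successes: int, runs: int = RUNS_PER_BUILD) -> float:
    """Return the success rate in percent, rounded to one decimal place."""
    return round(100 * successes / runs, 1)


# Assumption: each reported rate corresponds to an integer success count.
counts = {build: inferred_successes(rate) for build, rate in REPORTED_RATES.items()}

# Check that the inferred counts reproduce the reported percentages.
assert all(rate_from_counts(counts[b]) == REPORTED_RATES[b] for b in counts)

total_successes = sum(counts.values())
total_runs = RUNS_PER_BUILD * len(counts)
average_rate = rate_from_counts(total_successes, total_runs)

print(counts)
print(f"{total_successes}/{total_runs} runs succeeded, average {average_rate}%")
```

Under this assumption, 162 of 180 runs succeed, which agrees with the 90% average success rate stated in the conclusion, and the best-performing builds correspond to 28 successful runs out of 30 (93.33%).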

Experimental Setup. We run the exploit on different versions of ESXi. The guest OS is Ubuntu 16.04.3 LTS with 2 cores and 4GB of memory. The host machine has an i9-7980XE processor and 64GB of physical memory. For each guest OS, we run the exploit 30 times.

Results. Table 2 shows the results. The exploit chain has been demonstrated to affect ESXi 6.7 and ESXi 6.5. The exploit also adapts effectively to the targeted environment. In particular, the maximum success rate of our proposed exploit chain is 93.33%.

6 Discussion

Evaluation results show that the success rate of our exploit is not 100%. In this section, we try to determine the reasons. By scrutinizing the memory status, we found two significant factors that limit the success rate of the exploit.

1) Manipulating the uninitialized variables. Because the arbitrary-address-free primitive requires valid addresses, we have to make sure the targeted address is legitimate before triggering the free logic. However, normal activity in ESXi may pollute the stack, so that the “free” operation does not behave as expected.

2) The stability of the heap. After releasing the targeted fake memory chunk, we need to reallocate it immediately. However, if ESXi allocates it before we do, the process crashes.

7 Related work

Virtual Machine Escape. Chaitin Security Research Lab has demonstrated an exploit of VMware Workstation with a single vulnerability in Backdoor-RPC [4]. However, in their tests, the stability is limited by the Low Fragmentation Heap of the Windows 10 operating system, and the maximum success rate is 80%. Keen Security Lab has also demonstrated an exploit for VMware Workstation by using an information leakage bug in Backdoor-RPC and a memory corruption bug in xHCI [9]. For type-1 hypervisors, CVE-2015-7835 [2] has been demonstrated to achieve a virtual machine escape of the Xen hypervisor. Furthermore, Jordan Rabet has presented an in-depth analysis of a successful virtual machine escape of Hyper-V [12].

Exploitation of Uninitialized Use. The key step in exploiting an uninitialized use is to control the uninitialized variables and turn them into other types of vulnerabilities. Halvar Flake proposed an approach to determine all the paths that could overlap in the same stack frame [7]. Also, Kangjie Lu et al. proposed an automated approach that uses targeted stack spraying to facilitate the exploitation of uninitialized uses in the kernel [10].

Mitigations for VM Escape. BitVisor [14] tried to enforce the security of I/O devices while minimizing the code size of the hypervisor by allowing most of the I/O accesses from the guest OS to pass through the hypervisor. NoHype [16] presented a strategy to eliminate the hypervisor attack surface by enabling the guest VMs to run natively on the underlying hardware while maintaining the ability to run multiple VMs concurrently. HyperSafe [19] proposed an approach that applies control-flow integrity to mitigate virtual machine escape attacks. CloudVisor [23] introduced an approach that enforces the separation of resource management from security protection in the virtualization layer.
8 Conclusion

VMware ESXi is one of the state-of-the-art enterprise-class hypervisors. However, there has been no systematic security analysis or successful virtual machine escape until this research. We give a systematic overview of the architecture, attack surfaces, and exploitation approaches of ESXi. Furthermore, we propose a flexible and reusable strategy that leverages the backdoor RPC to manipulate the memory layouts. Our exploitation chain contains three vulnerabilities. Evaluation results show that it is reliable (90% success rate on average).

Acknowledgment

We would like to thank Xiaoshuai Zhang and Hong Hu for helpful suggestions on the paper writing, and the anonymous reviewers for their helpful comments. We also would like to thank VMware for their quick response.

References

[1] blackngel. MALLOC DES-MALEFICARUM, 2009. http://phrack.org/issues/66/10.html.
[2] Jeremie Boutoille. Xen exploitation part 2: XSA-148, from guest to host, 2016. https://bit.ly/2MaCGB4.
[3] Edouard Bugnion, Scott Devine, Mendel Rosenblum, Jeremy Sugerman, and Edward Y Wang. Bringing virtualization to the x86 architecture with the original vmware workstation. ACM Transactions on Computer Systems (TOCS), 30(4):12, 2012.
[4] Amat Cama and Kun Yang. The Weak Bug - Exploiting a Heap Overflow in VMware, 2017. https://bit.ly/2uPYK6A.
[5] Charu Chaubal. The Architecture of VMware ESXi. VMware White Paper, 1(7), 2008.
[6] Adam Eckerle, Mike Foley, Eric Gray, Matthew Meyer, Kyle Ruddy, and Emad Younis. What’s New in VMware vSphere® 6.5, 2016. https://bit.ly/2mwhSGV.
[7] Halvar Flake. Attacks on Uninitialized Local Variables, 2006. https://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-Flake.pdf.
[8] Jason Geffner. VENOM: Virtualized ENVIRONMENT NEGLECTED OPERATIONS MANIPULATION, 2015. https://venom.crowdstrike.com/.
[9] Marco Grassi, Azure Yang, and Jackyxty. A bunch of Red Pills: VMware Escapes, 2018. https://keenlab.tencent.com/en/2018/04/23/A-bunch-of-Red-Pills-VMware-Escapes/.
[10] Kangjie Lu, Marie-Therese Walter, David Pfaff, Stefan Nürnberger, Wenke Lee, and Michael Backes. Unleashing use-before-initialization vulnerabilities in the linux kernel using targeted stack spraying. In 24th Annual Network and Distributed System Security Symposium, NDSS 2017, San Diego, California, USA, February 26 - March 1, 2017, 2017.
[11] Microsoft. DEP/NX Protection, 2018. https://docs.microsoft.com/en-us/windows/desktop/win7appqual/dep-nx-protection.
[12] Jordan Rabet. Hardening hyper-v through offensive security research, 2018. https://ubm.io/2WhwVW5.
[13] Shellphish. how2heap, 2019. https://github.com/shellphish/how2heap.
[14] Takahiro Shinagawa, Hideki Eiraku, Kouichi Tanimoto, Kazumasa Omote, Shoichi Hasegawa, Takashi Horie, Manabu Hirano, Kenichi Kourai, Yoshihiro Oyama, Eiji Kawai, Kenji Kono, Shigeru Chiba, Yasushi Shinjo, and Kazuhiko Kato. Bitvisor: A thin hypervisor for enforcing i/o device security. In Proceedings of the 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, VEE ’09, pages 121–130, New York, NY, USA, 2009. ACM.
[15] Jeremy Sugerman, Ganesh Venkitachalam, and Beng-Hong Lim. Virtualizing i/o devices on vmware workstation’s hosted virtual machine monitor. In USENIX Annual Technical Conference, General Track, pages 1–14, 2001.
[16] Jakub Szefer, Eric Keller, Ruby B Lee, and Jennifer Rexford. Eliminating the hypervisor attack surface for a more secure cloud. In Proceedings of the 18th ACM conference on Computer and communications security, pages 401–412. ACM, 2011.
[17] Mehdi Talbi and Paul Fariello. QEMU Case Study, 2017. http://www.phrack.org/papers/vm-escape-qemu-case-study.html.
[18] VMware. Open-VM-Tools, 2019. https://github.com/vmware/open-vm-tools.
[19] Zhi Wang and Xuxian Jiang. Hypersafe: A lightweight approach to provide lifetime hypervisor control-flow integrity. In 2010 IEEE Symposium on Security and Privacy, pages 380–395. IEEE, 2010.
[20] Wikipedia. Address space layout randomization, 2019. https://en.wikipedia.org/wiki/Address_space_layout_randomization.
[21] Wikipedia. Virtual Machine Escape, 2019. https://bit.ly/2WoFazv.
[22] Tianyi Xie, Yuanyuan Zhang, Juanru Li, Hui Liu, and Dawu Gu. New exploit methods against ptmalloc of glibc. In 2016 IEEE Trustcom/BigDataSE/ISPA, pages 646–653. IEEE, 2016.
[23] Fengzhe Zhang, Jin Chen, Haibo Chen, and Binyu Zang. CloudVisor: retrofitting protection of virtual machines in multi-tenant cloud with nested virtualization. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pages 203–216. ACM, 2011.
