a running web server on the fly, by redirecting data pointers to faux structures, instead of redirecting code pointers to malicious code. We present a complete case study of such data-oriented attacks against the contemporary Nginx web server, and evaluate the covertness of our demonstrated attack in the face of common real-world security-hardened deployment scenarios. Our specific innovations include:

• A flexible and robust instrumentation technique for identifying security-critical data in web server memory.
• An approach for bypassing ASLR using only a linear heap memory disclosure vulnerability.
• Highlighting how an adversary can significantly reduce the work factor involved in server takeover (compared to what is typically considered necessary using contemporary approaches).
• Evaluating the feasibility of such attacks by studying the widespread susceptibility of deployed web servers to vulnerabilities that enable such attacks.

Figure 1: Synchronous vs. asynchronous web servers.

2. Background

Although modern web servers generally carry out a set of straightforward tasks when handling incoming requests (e.g., accepting network connections, parsing client requests, fetching content from a datastore, and generating responses), there have been a number of proposed approaches to implementing this workflow. The differences can be attributed to varying standards for scalability, performance, robustness, and simplicity in design. Designing a web server architecture that is optimized for any of these high-level attributes involves awareness of how to leverage lower-level operating system features (e.g., processes, threads, asynchronous I/O).

One approach relies on using a different process or thread for each connection being serviced. This greatly improves the scalability of servicing requests through synchronous I/O, since the process or thread associated with a given request can be suspended while waiting for an I/O operation to complete — freeing resources which can be dedicated to processing additional requests. In recent years, this model has been popularized by the Apache web server, which forks a separate process to handle each incoming connection, terminating it upon connection closure. One notable optimization of Apache's process-per-request architecture involves preforking a pool of processes on startup to avoid the overhead of forking upon each incoming connection.

While using multiple processes for handling concurrent requests indeed benefits scalability, the heavyweight nature of a process object, as well as the overhead of context switching between processes, means that this model is not satisfactory for web servers that must handle hundreds or thousands of incoming connections concurrently.

In response to demands for highly concurrent web servers, traditional process-based architectures such as Apache have begun to offer thread-based concurrency that allows a single process to service multiple concurrent connections by dedicating a unique thread to each connection. In this way, one thread in a process can block while waiting for an I/O operation to complete at the same time that other threads continue to service other requests. This approach, called worker mode by Apache [4], is a popular alternative to process preforking when scalability to many connections is important, but allocating a thread for each connection is still considered inefficient for many real-world servers [24].

As the demand for web server concurrency has increased, a new architecture has emerged: the asynchronous (event-driven) web server. Under this model, requests are serviced asynchronously by a single (single-threaded) worker process, which uses event-based callback functions to carry out server functionality when needed (e.g., parse request headers, construct response headers). Since blocking on synchronous I/O is not necessary, connections do not need to be associated with a scheduling unit that can be suspended, providing greater scalability. Note that the functionality that enables asynchronous request processing (e.g., chaining processing modules together via callback functions) must be at the core of the overall server architecture and must be incorporated into many design aspects.

Despite the challenges of refactoring its core synchronous processing implementation, Apache recently offered a processing mode known as event [4], which makes further strides to optimize the number of clients that can be handled simultaneously by a single worker process. As we show later, the risks of abandoning web server memory-space isolation between client requests will only become more relevant as Apache continues to refactor its server design to match the impressive scalability performance offered by asynchronous architectures.
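To make the callback-driven model concrete, the sketch below (our illustration, not code from the paper or from Nginx; the connection struct and handle_request callback are hypothetical) shows a single-process epoll loop in which all per-connection state, including the pointer to the next processing step, lives in a heap object rather than in a dedicated thread or process.

/* Minimal sketch (ours, not Nginx code) of a single-process, event-driven
 * server: one epoll loop, with per-connection state and the next processing
 * step ("callback") stored in a heap object instead of a thread or process. */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

struct connection {
    int fd;                              /* client socket                     */
    void (*on_ready)(connection *c);     /* next step to run when fd is ready */
};

static void handle_request(connection *c) {
    char buf[4096];
    ssize_t n = recv(c->fd, buf, sizeof(buf), 0);   /* read the request */
    if (n <= 0) { close(c->fd); delete c; return; }
    const char resp[] = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok";
    send(c->fd, resp, sizeof(resp) - 1, 0);         /* a real server would  */
    close(c->fd);                                   /* chain more callbacks */
    delete c;                                       /* here instead         */
}

int main() {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(8080);
    bind(lfd, (sockaddr *)&addr, sizeof(addr));
    listen(lfd, 128);

    int ep = epoll_create1(0);
    epoll_event ev = {};                  /* data.ptr == NULL marks the listener */
    ev.events = EPOLLIN;
    epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev);

    for (;;) {                            /* the single-threaded event loop */
        epoll_event events[64];
        int n = epoll_wait(ep, events, 64, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.ptr == NULL) {            /* new connection */
                int cfd = accept(lfd, NULL, NULL);
                connection *c = new connection{cfd, handle_request};
                epoll_event cev = {};
                cev.events = EPOLLIN;
                cev.data.ptr = c;
                epoll_ctl(ep, EPOLL_CTL_ADD, cfd, &cev);
            } else {                                     /* ready connection */
                connection *c = (connection *)events[i].data.ptr;
                c->on_ready(c);                          /* run its callback */
            }
        }
    }
}

A production server such as Nginx layers memory pools, request phases, and chained filter modules on top of this skeleton, but the property relevant here is already visible: every request's state is plain data reachable from one long-lived worker process.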
Nginx (pronounced engine-x), the market's most popular asynchronous web server, has garnered widespread adoption as a result of its ground-up design for asynchronous scalability [39]. In fact, although Apache still holds the largest market share, many sites have switched to Nginx in recent years (potentially also incorporating other back-end processing solutions). At the start of the decade in 2010, Apache claimed 71.5% of the web server market, while Nginx was only used by 3.9% of sites. However, as of January 2017, only 50.9% of sites still use Apache, while 32.1% use Nginx. The popularity of Nginx is especially apparent for the busiest websites, as the majority of the busiest 100,000 sites use Nginx over Apache [41].

Figure 1 shows the high-level architectural differences between the industry's two most popular web servers. Critically, the figure shows the difference in how Apache uses process-based isolation to logically separate request processing, while Nginx handles all requests in a single process. This key difference in architectural models has major implications for the susceptibility of these web servers to non-control-data oriented attacks.

2.1. Exploiting Web Servers

Exploiting a web server can be a desirable feat for mounting widespread attacks against unsuspecting clients. Web server exploitation is often the first step in a drive-by download campaign, where the ultimate goal is to use the popularity of a legitimate website to distribute malware once the web server has been compromised. To put the findings of this work in perspective, it is important to understand the requirements for a modern-day exploit chain that seeks to gain system-level control of a victim machine. Due to ubiquitously deployed mitigations such as DEP and ASLR, full system exploitation generally requires an adversary to:

1) Exploit a memory corruption vulnerability to modify the contents of an application's memory.
2) Leverage a memory disclosure bug to circumvent address space randomization.
3) Prepare a code-reuse payload in memory and pivot the stack pointer to the start of this chain.
4) Use the ROP chain to map the location of injected shellcode as executable.
5) Launch a privilege escalation attack against higher-privilege components.

Each of these steps in the exploit chain provides unique challenges to an adversary. In particular, accepting the fact that memory errors will inevitably occur in complex applications written in type-unsafe C/C++ code, the research community has focused heavily on raising the bar for steps 3–5 through DEP and code-reuse defenses, sandbox development, kernel hardening, and many others.

Interestingly, while the absence of untrusted script execution protects web servers from many associated vulnerabilities, the non-trivial logic implementing complex request processing and dynamic content generation exposes a considerable attack surface to adversaries. Indeed, Hu et al. [22] recently showed the feasibility of achieving arbitrary write capabilities against popular server programs, thereby confirming the generally accepted notion that motivated adversaries will find ways to leverage memory corruption exploits (e.g., buffer overflow, use-after-free, double free) in order to achieve the so-called write-what-where capabilities [26]. This scenario — which affords the ability to write an arbitrary value at an arbitrary location in process memory — can be enacted in a variety of ways, such as corrupting stack or heap objects that will be written to in the future. Like Hu et al. [22], we assume the existence of an arbitrary write vulnerability in Nginx for the proof-of-concept exploits presented in Section 6.

Although we assume such arbitrary write capabilities, we do not assume the ability to use memory corruption to gain arbitrary read capabilities. In particular, after extensive research, we found no practical exploits or exploit methodologies that can be leveraged to disclose server memory at an arbitrary address. Although such exploits may exist, we refrain from asserting the theoretically powerful assumption of arbitrary read capabilities due to their rarity and to keep with our goal (§4) of presenting attacks that are feasible in the real world.

On the other hand, there have been instances of server vulnerabilities that disclose a linear swath of heap memory (e.g., Heartbleed (CVE-2014-0160), Cloudbleed [18], Yahoobleed (CVE-2017-9098), CVE-2014-0226, CVE-2012-1180) at an unspecified address. The Heartbleed vulnerability, for example, was one of the most impactful security issues in the last decade, with 24–55% of HTTPS servers in the Alexa Top 1 million sites being initially vulnerable [14]. In early 2017, researchers uncovered the Cloudbleed vulnerability in Cloudflare's CDN service, due to a memory error in an Nginx module used for parsing and modifying HTML pages on-the-fly [18]. This vulnerability serves as a reminder that complex and memory-error-prone processing is employed by cloud-based services within the confines of Nginx's asynchronous architecture. While Heartbleed, Cloudbleed, and similar vulnerabilities do not give the adversary as powerful a primitive as arbitrary read, we show that even a partial linear read of heap memory (whose location is not controlled by the adversary) can be leveraged to undermine ASLR and locate key application structures as a first step in performing powerful data-oriented attacks.

3. Other Related Work

Over a decade ago, Chen et al. [11] highlighted the power of leveraging memory corruption exploits to subvert systems through the manipulation of security-critical non-control-data — all without ever corrupting the control-flow structures of an application. They demonstrated data-oriented attacks against an assortment of widely-used server-side applications, but their approach required manual source code analysis to obtain in-depth semantic knowledge regarding the layout of security-critical data and how its corruption could be leveraged in each application. More recently, Hu et al. [21] showed how to lessen the amount of a priori knowledge needed for pulling off the same attacks
presented by Chen et al. [11]. Their approach, termed data-flow stitching, utilizes taint tracking to compute data flows that occur during application runtime. This approach treats file inputs to the application as data sources and file outputs as data sinks, tracing how critical data is imported to an application from the file system as well as how information generated by the program flows out to the filesystem. Shortly thereafter, Hu et al. [22] highlighted the feasibility of using commonly occurring memory corruption vulnerabilities to gain arbitrary write capabilities in server programs. That work shows how memory errors can be leveraged to achieve write-what-where [26] capabilities in process memory.

None of these works provides a general technique for overcoming ASLR; rather, they require that a pointer to security-critical data is somehow leaked to the adversary by the same memory error that allows for the arbitrary write. Thus, it is unclear how an adversary would adapt the opaque payloads generated by these approaches, even if the locations of modules in the process address space were known through traditional ASLR-bypass techniques. Empowered by the write-what-where [26] capabilities demonstrated by Hu et al. [22], we explore the importance of server process architectures and how they affect data-oriented attacks. This connection has been critically overlooked, and we believe this oversight has dire consequences moving forward.

3.1. Defenses Against Control-Flow Hijacking

As the security community has largely acknowledged that memory corruption vulnerabilities in complex software are inevitable, defensive mitigations have most prominently targeted the control-flow hijacking steps of the exploit chain — including return-oriented programming tactics [34] and related variants. These solutions employ varied techniques to thwart attacks, such as ensuring control-flow integrity (CFI) [1, 29] or employing code diversification (e.g., [5]). These approaches do not protect against data-oriented attacks, as they are exclusively directed towards protecting the executable section of a program from being repurposed for malicious means, and do nothing to enforce the integrity of non-control data that is read or written by the application.

4. Goals And Adversarial Model

Given that asynchronous server architectures such as Nginx handle many client connections in the same long-lived server process, our goal is to show realistic attack scenarios in which data-oriented attacks have expressive power rivaling that of control-flow hijacking exploits against web servers. Moreover, we seek to show that in some respects, data-only attacks are more attractive from an adversarial perspective than attacking control flow, since they tend to be especially covert from a system-monitoring perspective, and also obviate the need for further privilege escalation attacks once the server worker process has been exploited.

4.1. Adversarial Model

As alluded to earlier, recent work [31] has assumed the full powers of arbitrary read and write exploitation against web servers and the ability to trivially defeat ASLR given these primitives. However, our extensive research into the actual remote server exploits seen in the wild — as well as published research on the matter [22] — led us to question that assumption, and instead limit our adversarial model to one in which the adversary has the powers of arbitrary write, but only linear heap disclosure. Critically, unlike prior work, we do not assume the adversary can read data from arbitrary addresses in memory, since we see no supporting evidence for this ability in real-world server exploits. Our attacks are demonstrated against Nginx, the industry leader in scalable, event-based server architectures. For simplicity, we assume the adversary has access to debug symbols, which is a realistic assumption given that the two most popular web servers, namely Apache and Nginx (together accounting for 83% of the market share [42]), are both open source.

5. Approach

Even under the assumption that an adversary can leak heap memory and overwrite arbitrary data in process memory, there are several hurdles that must be overcome to achieve viable data-only attacks against asynchronous web servers. First among these is identifying data that, when overwritten, will have the intended high-level effect of injecting malicious web content that would result in drive-by downloads, or of disabling services that provide privacy and confidentiality. Next, having identified this data, we must find ways to reliably overwrite it to meet the desired objective. Lastly, to fully explore the power of this threat, we seek ways to automate these steps as much as possible.

5.1. Memory Access Tracing

To address the first challenge, we provide a technique for tracing the memory accesses committed by a web server in servicing a request, and explain how these accesses can be inspected to identify data that is critical to server execution as configured by website administrators. In other words, we aim to identify data consulted on every incoming request that, when overwritten, will cause the server to behave differently than expected. Unexpected behaviors include serving malicious drive-by download content along with the original benign web pages, or downgrading the connection security of HTTPS without warning.

Our solution uses Intel's Pin framework [25] to record all reads directed at the .data section of the main executable's memory from the time the server receives an incoming HTTP request until the service of this request is complete. For each read, we also record the instruction pointer which issued the read. Next, in an offline phase, we use debug symbols to construct a timeline of data accesses made when servicing a request, including the variable name and offset in the .data section that was accessed, as well as the function name and offset that issued the access. We trace accesses to the .data section (rather than the heap) because they tend to offer better insight into the high-level operations that take place while a server is processing a request.
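As a concrete illustration of this tracing step, the following minimal Pintool sketch (ours, not the paper's tool) resolves the .data section of the main executable and logs every read that lands inside it, together with the instruction pointer that issued the read. Gating the instrumentation to the window between request receipt and completion, and the offline symbolization of the logged offsets, are omitted here.

// A minimal Pintool sketch of the tracing described above (our illustration,
// not the paper's tool). It locates the .data section of the main executable
// and logs every memory read that targets it, together with the instruction
// pointer issuing the read.
#include "pin.H"
#include <fstream>

static ADDRINT dataStart = 0, dataEnd = 0;
static std::ofstream traceOut("data_reads.log");

static VOID OnImage(IMG img, VOID *v) {
    if (!IMG_IsMainExecutable(img)) return;
    // Find the bounds of the main executable's .data section.
    for (SEC sec = IMG_SecHead(img); SEC_Valid(sec); sec = SEC_Next(sec)) {
        if (SEC_Name(sec) == ".data") {
            dataStart = SEC_Address(sec);
            dataEnd   = dataStart + SEC_Size(sec);
        }
    }
}

static VOID OnRead(ADDRINT ip, ADDRINT ea) {
    // Keep only reads that land inside .data; record the reading instruction.
    if (ea >= dataStart && ea < dataEnd)
        traceOut << std::hex << "ip=0x" << ip
                 << " read=.data+0x" << (ea - dataStart) << std::endl;
}

static VOID OnInstruction(INS ins, VOID *v) {
    if (INS_IsMemoryRead(ins))
        INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)OnRead,
                                 IARG_INST_PTR, IARG_MEMORYREAD_EA, IARG_END);
}

int main(int argc, char *argv[]) {
    if (PIN_Init(argc, argv)) return 1;
    IMG_AddInstrumentFunction(OnImage, 0);
    INS_AddInstrumentFunction(OnInstruction, 0);
    PIN_StartProgram();   // never returns
    return 0;
}

Offsets in the resulting log can then be resolved to variable and function names using the binary's debug symbols, yielding the access timeline described above.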
Figure 2: NGINX exploit diagram. Through program instrumentation and memory analysis, an attacker can locate the entries of interest in the configuration data pointer table, and overwrite them to point to malicious entries. [Diagram: the ngx_http_core_module object (ctx_index: 0) in the .data section; ngx_http_create_request() in the .text section; the config. data pointer table (offsets 0, 1, ...) and core config. on the heap.]
Specifically, the .data section often contains top-level pointers to complex per-module data structures which are spread throughout the heap, and this top level is generally a good starting point for the live memory analysis techniques explained shortly. Moreover, the heap is accessed thousands of times more often during request processing than the .data section, and thus it is more difficult to associate high-level server operations with individual memory accesses. Lastly, even while instrumenting a program it is often difficult to associate individual allocations with the type of object that will reside at the given heap location, thus lessening the advantages provided by debug symbols.

The reader may be wondering why we do not simply conduct manual source code analysis to identify where critical configuration data is accessed in the server program. In fact, we initially took this manual approach, but soon realized that the complex nature of asynchronous web servers (in the way they chain modules and functionality together through callback functions) made it very difficult to manually trace the flow of execution that occurs while handling even the simplest of requests. Said another way, the performance optimization gained by asynchronous server architectures comes at the cost of code simplicity, as every small module of processing that takes place in servicing a request must be chained together through complex data structures rather than following a simple, sequential order. Such modular code design is an essential component of asynchronous web servers, as there are no thread or process objects to save the code execution state of a partially-formed response while waiting on some resource (e.g., a file from disk). Instead, small code modules accomplish simple tasks that can be asynchronously invoked to perform some step towards generating a response. Thus, following the control flow and data accesses of asynchronous web servers through manual source code inspection is a difficult task, and for that reason, we resorted to program instrumentation to help identify security-critical data.

For pedagogical reasons, we note that a sample memory trace for Nginx servicing an HTTP GET request contains fewer than 150 accesses to the .data section, so it is feasible to manually identify data of interest. For example, the 96th access directed at the .data section in our trace originated from ngx_http_access_handler(), which accesses data at offset 0 within the ngx_http_access_module structure. With a quick inspection, it becomes clear that the function is referencing an access control configuration data structure on the heap, using an index stored at ngx_http_access_module + 0 to retrieve the pointer to this data. Given such a memory trace, an adversary can easily hone in on important access-control-related configuration data in memory. While this example may seem overly simple, we found that the additional code paths we identified in Nginx that consult in-memory configuration data structures for other modules (e.g., the SSL module, security headers module, and error and access logging modules) are just as straightforward to analyze.
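The lookup that the trace exposes is easy to model. The self-contained sketch below (ours; the names module_desc, access_conf, and deny_all are stand-ins, loosely patterned on ngx_module_t, ngx_http_access_loc_conf_t, and Nginx's ngx_http_get_module_loc_conf macro) shows a module object in the .data section whose ctx_index field selects that module's slot in a heap-resident configuration pointer table, which is the indirection that Figure 2 depicts.

// Simplified model (ours) of the Nginx configuration lookup: the module
// object holds only an index; the configuration itself lives on the heap
// behind a per-request pointer table.
#include <cstdio>

struct module_desc {            // stand-in for ngx_module_t
    unsigned ctx_index;         // offset 0: slot of this module's config
};

struct access_conf {            // stand-in for ngx_http_access_loc_conf_t
    int deny_all;               // toy policy: 1 = reject every request
};

struct request {                // stand-in for ngx_http_request_t
    void **loc_conf;            // heap-resident config data pointer table
};

// Analogue of Nginx's ngx_http_get_module_loc_conf(r, module), which
// expands to (r)->loc_conf[module.ctx_index].
#define get_module_loc_conf(r, module) ((r)->loc_conf[(module).ctx_index])

static module_desc access_module = { 0 };   // static module object; Nginx's
                                            // ngx_http_access_module is in .data

static int access_handler(request *r) {
    access_conf *alcf = (access_conf *) get_module_loc_conf(r, access_module);
    return alcf->deny_all ? 403 : 200;      // toy access decision
}

int main() {
    access_conf conf = { 0 };               // parsed server configuration
    void *table[1] = { &conf };             // config data pointer table
    request r = { table };
    std::printf("decision: %d\n", access_handler(&r));
    return 0;
}

Because both the index in .data and the pointer-table entry on the heap are ordinary data, corrupting either one redirects the lookup without touching any code pointer.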
5.2. Corrupting Data for a Desired Effect

Armed with the ability to locate sensitive data within a program, the next challenge involves determining how to overwrite that data for the intended degradation of server security — without introducing unstable behavior to the server. In this work, we restrict our attacks to influencing the in-memory representation of configuration data.
Specifically, we seek to understand how different server configuration options cause in-memory data structures to be populated [...] achieved through manual source-code analysis, tracking the data flow of information from configuration file to in-memory structures. However, as discussed in Section 5.1, the complex nature of callback functionality to support [...]

[Figure, panel (a): the ngx_http_access_loc_conf_t structure (alcf), whose rules and rules_un arrays (ngx_array_t, with elts and size fields) point to ngx_http_access_rule_t entries carrying mask and deny values.]
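For reference, the structures in panel (a) can be sketched as follows (our simplification; field names follow the public Nginx source in src/http/modules/ngx_http_access_module.c, but the exact layout varies across versions):

/* Simplified sketches (ours) of the access-module configuration shown in
 * panel (a); not the exact Nginx definitions. */
#include <netinet/in.h>   /* in_addr_t */
#include <stddef.h>       /* size_t    */

/* Stand-in for ngx_array_t, reduced to the fields shown in the figure. */
struct array_sketch {
    void  *elts;      /* contiguous rule entries                   */
    size_t nelts;     /* number of rules                           */
    size_t size;      /* size of one element (16 in the figure)    */
};

/* Stand-in for ngx_http_access_rule_t: one allow/deny directive. */
struct access_rule_sketch {
    in_addr_t mask;   /* network mask applied to the client address   */
    in_addr_t addr;   /* address the masked client address must match */
    unsigned  deny;   /* 1 = deny rule, 0 = allow rule                */
};

/* Stand-in for ngx_http_access_loc_conf_t (alcf in the figure). */
struct access_loc_conf_sketch {
    struct array_sketch *rules;     /* IPv4 rules ("rules" in the figure)    */
    struct array_sketch *rules_un;  /* unix-socket rules ("rules_un" above)  */
};

Each allow or deny directive in the configuration file becomes one rule in this heap-resident array, so the access-control decision for every request is driven entirely by this data.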
6.8. Applicability To Other Modern Web Servers

[...] containing the structures (e.g., the config data pointer table) [...] disclosure that leaks the address of a structure in one worker [...]

[Figure: Apache's ap_http_filter() dereferencing the server configuration struct (server_rec) on the heap, whose config. data pointer table (offset i) leads to the core config. and reference config. data.]
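The lookup depicted in the figure above is Apache's analogue of Nginx's pattern: a module reaches its heap-resident configuration through the module_config pointer vector attached to server_rec, indexed by the module's module_index. A minimal sketch follows (ours, not the paper's code; example_module, example_server_conf, and allow_all are hypothetical, while ap_get_module_config is httpd's real accessor macro, and the module registration boilerplate is omitted):

/* Sketch of Apache's per-module configuration lookup (ours, not the paper's
 * code). example_module, example_server_conf, and allow_all are hypothetical;
 * ap_get_module_config is httpd's real accessor macro, which indexes the
 * module_config pointer vector by the module's module_index. Registration
 * boilerplate (STANDARD20_MODULE_STUFF, config-creation hooks) is omitted. */
#include <httpd.h>
#include <http_config.h>

module AP_MODULE_DECLARE_DATA example_module;    /* module object in .data */

typedef struct {
    int allow_all;                 /* hypothetical setting parsed from config */
} example_server_conf;

static int example_access_checker(request_rec *r)
{
    /* Expands to ((void **)(r->server->module_config))[example_module.module_index] */
    example_server_conf *conf = (example_server_conf *)
        ap_get_module_config(r->server->module_config, &example_module);
    return (conf && conf->allow_all) ? OK : HTTP_FORBIDDEN;
}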