Safeinit: Comprehensive and Practical Mitigation of Uninitialized Read Vulnerabilities
Abstract

Usage of uninitialized values remains a common error in C/C++ code. This results not only in undefined and generally undesired behavior, but is also a cause of information disclosure and other security vulnerabilities. Existing solutions for mitigating such errors are not used in practice, as they are either limited in scope (for example, only protecting the heap) or incur high runtime overhead.

In this paper, we propose SafeInit, a practical protection system which hardens applications against such undefined behavior by guaranteeing initialization of all values on the heap and stack, every time they are allocated or come into scope. Doing so provides comprehensive protection against this class of vulnerabilities in generic programs, including both information disclosure and re-use/logic vulnerabilities.

We show that, with carefully designed compiler optimizations, our implementation achieves sufficiently low overhead (<5% for typical server applications and SPEC CPU2006) to serve as a standard hardening protection in practical settings. Moreover, we show that we can effortlessly apply it to harden non-standard code, such as the Linux kernel, with low runtime overhead.

Permission to freely reproduce all or part of this paper for noncommercial purposes is granted provided that copies bear this notice and the full citation on the first page. Reproduction for commercial purposes is strictly prohibited without the prior written consent of the Internet Society, the first-named author (for reproduction of an entire paper only), and the author's employer if the paper was prepared within the scope of employment.
NDSS '17, 26 February - 1 March 2017, San Diego, CA, USA
Copyright 2017 Internet Society, ISBN 1-1891562-46-0
http://dx.doi.org/10.14722/ndss.2017.23183

I. INTRODUCTION

The use of uninitialized memory in C/C++ programs introduces vulnerabilities that are popular among attackers to manipulate a program's control flow or to disclose information. In addition to the obvious issue of revealing sensitive data, the exposure of metadata has become a more prominent problem in recent years. Information disclosure is increasingly an essential prelude to successful exploits (e.g., to circumvent ASLR or other hardening methods) [56]. Unfortunately, concerns about the performance overhead have made compiler writers reluctant to adopt strong mitigations against this attack vector.

Languages such as Java and C# ensure the definite assignment of variables, requiring them to be initialized on all possible paths of execution. Unfortunately, C and C++ do not enforce this property. As a result, the vast body of existing C/C++ code, which includes many runtimes and libraries for safer languages, is potentially vulnerable to uninitialized read attacks. Today's compiler warnings and static analysis tools flag only a small subset of these uninitialized reads. Worse, false positives in the warnings are common, so programmers often ignore them altogether. Given the growing popularity of uninitialized reads in real-world exploits [21], [53], [8], [50], the current lack of comprehensive protection is concerning. In this paper, we show that automatic initialization of all values on the heap and stack at allocation time is possible with minimal performance penalties.

Worryingly, C/C++ compilers can even introduce new vulnerabilities when taking advantage of the fact that reading uninitialized memory is undefined behavior. In such circumstances, the optimizations applied by modern compilers can remove sanity checks or other code [63]. Worse, recent research [44] has also shown that many programmers are unaware of these dangerous consequences. For C/C++ programs running in production systems, there are few options for preventing attacks that exploit uninitialized read errors. Solutions such as valgrind [54] and MemorySanitizer [58] are in widespread use during the development process. They are much too expensive for use in production systems, even when using complex data-flow analysis to reduce this overhead [65].

Clearing memory: The obvious mitigation for this problem is to always clear memory. For instance, Chow et al. [11] proposed to clear all memory at deallocation time. However, they only obtained acceptable overhead for heap allocations, not for the high-frequency allocations and deallocations on the stack. Moreover, the solution fails to address the problem of undefined behavior. The PaX project [49] offers a limited but very practical solution in the form of Linux kernel patches (including gcc plugins) which protect against common uninitialized value errors. Recently, and concurrently with our work, UniSan [38] proposed more comprehensive protection than this against a narrower threat (information disclosure from the Linux kernel). It uses data-flow analysis to initialize memory and variables which might be disclosed to an attacker. While both of these solutions provide acceptable overhead, neither provides a complete solution for uninitialized values, and both are currently applicable only to the Linux kernel.

In this paper, we describe a comprehensive and practical solution for mitigating these errors in generic programs, by adapting the toolchain to ensure that all stack and heap allocations are always initialized. SafeInit is implemented at the compiler level, where low-overhead static analysis and optimizations are available, and can be enabled using a single compiler flag. We show that the overhead can be reduced to acceptable levels by applying a set of carefully designed
optimizations; for example, these more than halve the overhead of SafeInit on SPEC CINT2006, from 8% down to <4% (with the remaining overhead largely due to excessively complex code, which can be resolved using minimal-effort annotations).

Summarizing, our contributions are:

• We propose SafeInit, a compiler-based solution which, together with a hardened allocator, automatically mitigates uninitialized value reads by ensuring initialization, both on the stack and on the heap.
• We present optimizations which reduce the typical overhead of our solution to minimal levels (<5%), and are straightforward to implement in modern compilers.
• We discuss our prototype implementation of SafeInit, based on clang and LLVM, and show that it can be applied to the majority of real-world C/C++ applications without any additional manual effort.
• We evaluate our work on CPU-intensive (including SPEC CPU2006) and I/O-intensive (server) applications, as well as the Linux kernel, and verify that real-world vulnerabilities are successfully mitigated.

In summary, we argue that SafeInit provides a comprehensive and practical solution to a serious real-world problem. We show that it provides significant advantages compared to existing techniques and tools, and demonstrate that it offers acceptable levels of overhead. We believe this system is sufficiently practical to be useful in production systems, with overhead below the levels typically demanded for industry adoption [60]. We hope to see it become a standard ingredient of the hardening transformations offered by modern compilers.

II. THREAT MODEL

Uninitialized read errors occur when a variable, or memory, is used without having first been initialized. This can occur after a stack variable comes into scope, or after heap memory has been allocated. We consider an attacker seeking to exploit any of the vulnerabilities caused by such reads of uninitialized values, including information disclosure and use of unintended values (such as function pointers). We assume that such potential attackers have a copy of all binaries in use, and are thus aware of details such as the exact stack layout chosen by the compiler.

We assume the program has already been hardened against other classes of vulnerabilities using existing (e.g., memory safety) defenses. Mitigating uninitialized value vulnerabilities can probabilistically mitigate some vulnerabilities caused by other temporal errors (such as pointer use-after-free) and spatial errors (such as out-of-bounds reads). However, existing low-impact solutions such as baggy bounds checking [2] provide superior defenses against such attacks, and we do not consider them in our threat model.

We also only consider C/C++ code. Extending this work to similar languages should be possible, as shown, for example, by existing compiler functionality for local variables in Fortran. In particular, custom assembly-language routines fall outside the scope of our work.

III. BACKGROUND

Memory is constantly reallocated, and thus reused, in almost all applications. On the stack, function activation frames contain data from previous function calls; on the heap, allocations contain data from previously-freed allocations. Issues with uninitialized data arise when such data is not overwritten before being used, extending the lifetime of the old data beyond the point of the new allocation.

Many variables are clearly initialized before they are used. As an example, consider an integer counter used only in a for loop, which is explicitly assigned a new value for every iteration of the loop. We can trivially see that such a variable is always initialized before it is used. On the other hand, consider a variable which is only used if a complicated conditional is true. Its initialization state may itself depend on other conditionals, and resolving them would require executing large portions of the program, or at least extensive optimization and analysis.

Memory may also only be partially initialized. Structure and union types in C are often deliberately left incompletely initialized. For simplicity or performance reasons, arrays are also often allocated with larger sizes than (initially) necessary to store their contents.

In practice, reuse of memory is not only common, but also desirable for performance reasons [17]. When it is unclear whether a variable will be initialized before it is used, the only practical and safe approach is to initialize it in all cases.

A. Sensitive data disclosure

The most obvious danger of information disclosure due to uninitialized data is the disclosure of directly sensitive data. This includes encryption keys, passwords, configuration information and the contents of confidential files. As Chow et al. have discussed [10], data lifetimes can last far longer than we would expect, and many unintentional copies of data may be made.

In fact, even when all copies of such data are apparently explicitly cleared, problems persist. Many programs call memset to clear sensitive data. Unfortunately, if the data is no longer valid and thus no longer used after that point, compilers can (and do) optimize such calls away. Common workarounds which attempt to hide these calls from compiler analysis are often defeated by ever-improving compiler analysis passes. Alternative functions which compilers are prohibited from optimizing away (such as memset_s and explicit_bzero) are not yet commonly available.

If the use of uninitialized data in a program is not directly influenced by untrusted input, it is tempting to conclude that the security consequences of these classes of issues are otherwise minimal. However, experience has shown that a wide range of potential attack vectors must be considered. This varied attack surface means that all uninitialized data vulnerabilities should be taken seriously.

One illustrative example was a vulnerability [29] in PostScript font rendering on Windows, caused by failure to initialize a temporary buffer which could be read by font bytecode. By providing a font which rendered glyphs based
on the contents of this buffer, JavaScript in a web browser could disclose memory by reading back rendered pixels.

Similarly, information disclosure from kernels to userspace programs, or from hypervisors to guest virtual machines, is a common and serious issue [8]. Containers and virtual machines are now a standard component of many systems, whether running code from untrusted parties or serving as a vital layer of sandboxing for untrusted software such as JavaScript in web browsers. As such, even code such as device driver interfaces and emulated devices must be free of security issues.

B. Bypassing security defenses

Some software does not make use of any seemingly sensitive data, or isolates such data sufficiently to avoid disclosing it through uninitialized data issues. Even then, many modern software defenses depend on the secrecy of sensitive metadata, and so information disclosure is still a critical flaw. Stack canaries provide an obvious example; their protection relies on the canary value remaining secret.

This is far from a new problem. In 2008, Microsoft described an arbitrary write vulnerability due to an uninitialized stack variable in Microsoft Excel [45]. In 2010, Kees Cook disclosed [12] an arbitrary Linux kernel memory write vulnerability caused by an uninitialized structure on the stack.

A common mistake is to fail to initialize variables or buffers on the execution path taken when an error is encountered. For example, Samba had a vulnerability [61] caused by failure to check the error value returned by a function before using a pointer value which was only initialized in the error-free path. Similarly, a bug in Microsoft's XML parser [1] made a virtual function call using a pointer stored in a local variable which was not initialized on all execution paths. By spraying the stack with pointers using JavaScript, attackers could control the contents of the memory where the variable was stored. They could then exploit this vulnerability from within a web browser.

It is clear that all of these vulnerabilities must be taken seriously, and that preventing information disclosure addresses only a subset of uninitialized value vulnerabilities.
(a) C code:

    main() {
        int x;
        printf("%d", x);
    }

(b) LLVM bitcode, before mem2reg:

    define @main() {
        %x = alloca i32, align 4
        %0 = load i32, i32* %x
        call @printf(..., i32 %0)
    }

(c) LLVM bitcode, after mem2reg:

    define @main() {
        call @printf(..., i32 undef)
    }

Fig. 1. LLVM transforms uninitialized reads into undef values early in the optimization process; later passes cannot recover the information removed in (c).
F. Undefined behavior

Undefined behavior [63] occurs when a C/C++ program fails to follow the rules imposed by the language. Most importantly in the context of our discussion, this is the case when code reads uninitialized stack variables, or even uninitialized heap allocations. The C/C++ standards state that permissible consequences of undefined behavior include the compiler's code generation ignoring the situation completely, with unpredictable results. However, many programmers are unaware [44] that these consequences can be far more serious for their compiled binaries than simply producing code which reads potentially uninitialized data.

This is not merely a theoretical problem, but a serious practical issue. Modern compiler transforms (such as those used by LLVM [34]) take advantage of this undefined behavior on a large scale to enable the maximum number of optimizations. This is especially true in code which may be expanded from templates and macros and eventually be largely discarded as unreachable. Unfortunately, such transformations may interpret undefined values (and thus, also uninitialized values) as any value which makes optimizations more convenient, even if this makes program logic inconsistent. These situations often only become apparent after other compiler transformations have already been applied. Dynamic analysis tools cannot detect them, since such tools operate on the machine code generated after this process.

Even a very basic level of compiler optimization will cause problems with such code. For example, Figure 1 shows clang/LLVM generating undef values due to an uninitialized local variable. This is caused by the mem2reg pass, which converts local variables to SSA form; this transformation is a prerequisite for almost all other compiler optimization passes or analyses. This illustrates why hardening transformations must be run before any other optimizations, and, most importantly, why they must be performed in the compiler itself.

This also limits the analysis available to such transformations; any analysis must be performed on the initial, unoptimized code from the frontend. This not only reduces the accuracy of any analysis, but also has a serious impact on performance. These problems are particularly troublesome for interprocedural analysis, and in any case, other functions may be unavailable until link-time optimization (LTO) is performed. Attempting to delay all optimization until LTO time has a severe impact on compilation time, making it far less practical.

Fig. 3. High-level overview of SafeInit (diagram labels: Hardened allocator, Binary).

IV. OVERVIEW

SafeInit mitigates uninitialized value problems by forcing the initialization of both heap allocations (after they are allocated) and all stack variables (whenever they come into scope). This is done by modifying the compiler to insert initialization calls directly at all such points.

In order to provide both practical and comprehensive security, this instrumentation must be done within the compiler itself. SafeInit can be enabled by simply passing an additional hardening flag during the compilation process. As can be seen
in Figure 3, this enables an additional compiler pass which adds the necessary initialization.

A naive initialization approach would lead to excessive runtime overhead, and an important element of our system is a customized hardened allocator. This allocator is able to avoid initialization in many cases by taking advantage of extra information, combined with our compiler instrumentation.

Finally, the SafeInit optimizer provides non-invasive transformations and optimizations which we run alongside existing compiler optimizations (themselves modified where necessary). The final component is an extension of existing dead store elimination optimizations. These build on top of our initialization pass and allocator, performing more extensive removal of unnecessary initialization code, and demonstrate that the runtime overhead of our solution can be minimized.

Perhaps most importantly, SafeInit is practical to implement in modern compilers. Our system requires minimal changes and is non-invasive; no new analyses are required, and the extended optimizations we propose are not specific to SafeInit. Our design is also compatible with recent developments such as ThinLTO [28], where later optimization passes may not have access to the IR/bitcode of called functions.

V. MITIGATING UNINITIALIZED VALUES

A. Initialization pass

SafeInit initializes all local variables before their first use. It treats the point at which a variable comes into scope (for example, in a loop) as a newly-allocated variable. We propose inserting initialization code at all such points where a stack variable comes into scope; the necessary scope information is provided by the compiler frontend.

Specifically, SafeInit's stack hardening pass modifies the compiler's intermediate representation (IR) of the code being compiled. It inserts a store instruction (ideally, a memset builtin/intrinsic) after every variable comes into scope; other optimizations can later remove or simplify these. This clears all of the memory allocated for the variable, including any padding within a structure or between array elements.

B. Hardened allocator

SafeInit's hardened allocator ensures that all newly-allocated memory is cleared to zero before being returned to the application. We do this in the allocator for two reasons. The first is safety: we override all heap allocation functions to ensure these hardened allocator functions are always used. The second is performance: the allocator has extra information it can take advantage of.

Importantly, the compiler is aware that our hardened allocator is in use. Any code using allocated memory is no longer making use of undefined behavior, and cannot be modified or removed by the compiler.

All memory pages allocated by the operating system kernel are already cleared to zero, so allocators can take advantage of this and avoid clearing such pages. Tracking this for small allocations has excessive overhead, so small allocations must always be cleared at allocation time. Large allocations, however, are at least several pages in size, and are often allocated by using mmap directly.

Modern operating systems also provide support for clearing regions of such memory directly by releasing the underlying pages (such as MADV_DONTNEED on Linux). While this comes with potential performance downsides [31], it is already used by modern allocators to minimize memory usage, and is ideal for our needs. By ensuring that large allocations are always released, we can ensure they will be cleared even if they are reused for another allocation. This incurs no performance penalty for clearing areas of memory which will not be used.

Our allocator also exports non-initializing variants of allocation functions. The requested memory is not zeroed when these are called, but the gap between the requested allocation size and the true allocated size is always cleared. This is necessary because an application may later make a realloc call which can lead to the re-use of this space. Keeping track of individual allocation sizes instead consumes more memory and leads to excessive runtime overhead (we observed overheads of >5%).

C. Optimizer

Our optimizer design provides several practical optimizations which improve the performance of SafeInit while remaining efficient and non-invasive. The primary goal of the optimizer is to make simple changes which allow the many other standard optimizations available in modern compilers to remove any unnecessary initializations. We hope that SafeInit will become a standard hardening technique, and so it needs to be as practical as possible; in particular, we need to avoid adding complex or invasive analysis.

1) Sinking stores: Ideally, stores to local variables should be as close as possible to their uses. This is important for cache locality, and for minimizing the memory usage of stack frames. Minimizing the lifetime of a variable allows stack coloring algorithms to allocate stack frame space more efficiently.

Our optimizer attempts to move our initialization stores to the dominating point of the uses of a variable. Importantly, this also avoids unnecessary initialization; variables which are unused in certain paths need only be initialized in the paths where they are used. A common example is where variables are only used in error paths, such as with the code in Figure 4. This code path is not executed during normal execution, and we do not need to initialize the buffer until we reach a path in which it will be used.

    char err_msg[MAX_MSG];
    ...
    if (error) {
        setErrMessage(err_msg);
        printf("%s", err_msg);
        return error;
    }

Fig. 4. Typical code using an error message buffer; the buffer need not be initialized unless the branch here is taken.

If this dominating point of the uses of a variable is reachable from itself, and it does not go out of scope when
following this execution path, then it is not an appropriate place for initialization. This typically occurs if the first stores to a variable are inside a loop. To resolve this, we instead use an initialization point which also dominates all the predecessor basic blocks of such dominating points.

2) Detecting initialization: We propose detecting typical code which initializes arrays (or portions of them). This allows other compiler optimizations to remove or shorten previous stores which are revealed to be overwritten.

Typical compiler optimizations perform this only for individual store instructions, or intrinsics such as memset. Modern compiler transforms attempt to convert some loops to memset calls [18], but this is only possible if a single-byte (or in some cases, two-byte) pattern is used. This is insufficient for many common cases, such as initializing an array of (four-byte) integer values to the value 1, as shown in Figure 5. Our design detects such code, treating these loops as if they were single stores which cover the entire range they initialize.

    int buf[50];
    for (int i = 0; i < 50; ++i)
        buf[i] = 1;

Fig. 5. Example of initialization using a loop; buf is fully initialized, but this code cannot be converted into a memset.

3) String buffers: Buffers which store C-style null-terminated strings are often only used in a safe manner, where the data in memory beyond the null terminator is never used. We propose a low-cost check which catches simple cases, such as that in Figure 6: buffers which are only passed to known C library string functions (such as strcpy and strlen) are safe. When initializing with zero, only the first byte of such buffers must be initialized.

    sprintf (t3, "%s%s;", t1, t2);
    strcpy (t1, t3);

Fig. 6. t3 is a safe string buffer (from gcc in SPEC CPU2006) which does not need initialization.

Compilers already know about and detect such built-in string functions, so we can take advantage of their existing infrastructure to detect these functions; there is no need to add annotations. Where the optimizer can prove that the string is always initialized, the initialization can later be removed entirely. However, this often only becomes clear after further optimizations have been applied.

D. Dead Store Elimination

To minimize the performance cost of initialization, SafeInit also includes a variety of improved optimization passes. These are more complex than our other optimizations, and may not always be necessary to obtain low overhead. However, they resolve real situations which we found to introduce unnecessary overhead when using our hardening passes.

In particular, we need so-called dead store elimination (DSE) optimizations. These are a standard class of compiler optimizations [4] which remove stores that are always overwritten by another store without ever being read. We propose DSE-style optimizations which are particularly appropriate for removing initializations. Existing optimizations are often ill-suited to this task, since these situations occur less frequently in other code, and so are less of a priority for compiler development.

Only relatively simple DSE optimizations are available in current compilers, generally limited to statically-sized stores within a single basic block. However, this is an active area of compiler development, and as we will later discuss, support for these forms of complex DSE is slowly being introduced in mainstream compilers. The optimizations we present here serve to demonstrate the importance of this work and the potential performance improvements which are possible from more intensive optimization.

1) Heap clearing: Since all heap allocations are guaranteed to be initialized to zero, our compiler can remove any zero stores to freshly-allocated heap memory. It does so by treating all allocation functions as equivalent to calloc; an example is shown in Figure 7. Similarly, any loads from freshly-allocated memory are known to return zero (rather than being undefined behavior), and we can replace them with a constant value.

    row *r = malloc(sizeof(row));
    r->row_num = 0;
    r->length = 0;
    r->user_word = NULL;

Fig. 7. Example of removed zero stores (from espresso); the memory returned from malloc is already cleared with zero values.

If memory is fully initialized to a non-zero value, our optimizer can also rewrite the allocator call to an alternative allocation function which skips any potential (unnecessary) initialization. However, we want to be sure that only these instances are left uninitialized by our custom allocator.

2) Non-constant-length store removal: Dead store elimination is generally only performed when the stores are of a known constant length. We propose transforming stores with non-constant lengths, which is important to remove unnecessary initializations of both dynamic stack allocations and heap allocations. The simplest such situations are when an entire existing store is overwritten, such as the code in Figure 8. Our DSE also removes stores of a non-constant size, such as some of our initialization stores, when they are entirely overwritten by later stores.

    int buf[n];
    memset(buf, 0, n);
    memset(buf, 1, n);

Fig. 8. Example of an unnecessary non-constant-length store; the first memset can always be removed.

3) Cross-block DSE: Our optimizer also performs dead store elimination across multiple basic blocks. This is an active area of improvement for existing compilers, but is far more relevant when universal initialization is introduced, and is necessary to enable many of the optimizations below.

We need to remove both stores which are partially or completely overwritten by other stores (standard DSE) as well
as stores which are never used (for example, because the value goes out of scope before being used). While we are primarily concerned with memset, we also remove normal stores. New opportunities for rewriting heap functions may also be revealed during this process, and our optimizer also applies these optimizations here.

4) Non-constant-length store shortening: To enable other optimizations, particularly involving string or other library functions, we also attempt to shorten stores by non-constant lengths. For example, if the first x bytes of an array are initialized, then we may be able to shorten an earlier initialization by x bytes, assuming that the value of x does not change between the stores. However, the compiler must either be able to prove that x is never larger than the size of the array, or add an additional (and potentially expensive) bounds check.

In practice, writing beyond the bounds of an array is undefined behavior, and existing compiler optimizations take advantage of this to make assumptions. Suppose execution is always guaranteed to reach the second store after it has reached the first. The compiler can then assume that the second store does not write beyond the size of the array, and thus that the first store may always be shortened.

The conservative approach proposed by our design fails to remove some stores which, in practice, are always safe. As we discuss in the implementation, this turned out to be a serious limitation. The performance overhead of this optimization also means that it is only worthwhile on relatively large stores; we only apply it to stack allocations beyond a fixed size.

5) Write-only buffers: Sometimes, memory is allocated and written to, but never read. Removing unused local variables is known to be an important optimization [27], but typically interprocedural analysis has been unnecessary. A typical example is shown in Figure 9. There, a function requires a memory buffer as an argument for storing results, but the caller never reads from this buffer, simply discarding the content. Our initialization further complicates this by adding a new unnecessary write to initialize such buffers.

    int result_buf[BUF_SIZE];
    return shared_func(data, result_buf);

Fig. 9. Example of a write-only/scratch buffer; initialization is unnecessary if shared_func only writes to the pointer provided as the second argument.

If the called function never reads from the buffer, then the entire buffer is unnecessary. One approach is to clone such

discussed, the dead store optimizations which are vital for acceptable performance are an active area of development, so we based our work on a recent pre-release version of the code (LLVM revision 269558, from mid-May 2016).

A. Initialization pass

We implemented stack clearing as an LLVM pass, which we run before any other optimization pass is allowed to run. Most importantly, it runs before mem2reg, which will introduce undef values when an uninitialized stack variable is read.

Local variables in LLVM are defined by using the alloca instruction. Our pass performs initialization by adding a call to the LLVM memset intrinsic after each of these instructions. This guarantees that the entire allocation is cleared, and these calls are transformed into store instructions where appropriate.

B. Hardened allocator

We implemented our hardened allocator by modifying tcmalloc, a modern high-performance heap allocator [22]. The underlying pages for the allocator are obtained using mmap or sbrk, and are guaranteed to initially be zero. We force the use of MADV_DONTNEED (or equivalent) when freeing any such allocations, and so large heap allocations are always zero, and need not be initialized. The performance overhead of tracking the initialization status of smaller allocations is excessive, so we simply clear all other heap allocations to zero before the allocator returns a pointer.

We also modified LLVM to treat reads from newly-allocated memory as returning zero, rather than undef, when SafeInit is enabled. As discussed, this is vital for avoiding the unpredictable consequences of undefined values.

C. Optimizer

We implemented our proposed sinking stores optimization for stack initialization by moving our inserted memset calls to the dominating point of all uses of the alloca. Uses which do not actually access the variable, such as casts or debug intrinsics, are ignored. When compiling with optimizations enabled, clang will emit lifetime markers which indicate the points at which local variables come into scope. We modified clang to emit appropriate lifetime markers in all circumstances, and insert the initialization after these points.

The alloca instructions corresponding to local variables are placed in the first basic block, which is necessary for the majority of LLVM optimizations to function correctly, and for stack coloring to be performed. However, dynamic allocation
functions and remove the arguments in these cases, enabling of stack space within a function may not be in the first block
removal of the stores. However, this can dramatically increase (such as when an alloca call is made from C/C++ code); in
code size; inlining or cloning can be very expensive, and these circumstances, we have to also ensure that initialization
our design aims to remain practical by avoiding the need for is not performed before the allocation takes place.
any additional interprocedural analysis. Instead, we annotate
allocations and function (pointer) arguments which are only We implemented initialization detection optimization by
written to. If we can then show that portions of memory are adding a new intrinsic function, initialized, which has the
only stored to, and not read, then all the stores can be removed. same store-killing side effects as memset, but is ignored by
code generation. By extending components such as LLVMs
VI. I MPLEMENTATION loop idiom detection to generate this new intrinsic where
replacing code with a memset is not possible, we allow
We implemented a prototype of SafeInit by extending the other existing optimization passes to take advantage of this
clang compiler, and the LLVM compiler framework [35]. As information without the need to modify them individually.
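The non-constant-length store shortening from design step 4 can be illustrated at source level. The sketch below is our own illustration in plain C, not the prototype's LLVM transform; `BUF_SIZE`, `fill_full_clear` and `fill_shortened` are hypothetical names. Both functions leave the buffer in the same state, but the shortened version clears only the bytes that the later x-byte store does not cover, and its `assert` stands in for the bounds check a compiler must add unless it can prove that x never exceeds the buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size of a fixed-size local buffer. */
#define BUF_SIZE 128

/* Before: the hardening pass has cleared the whole buffer, and the
   program then stores x bytes at the start of it. */
void fill_full_clear(unsigned char *out, const unsigned char *src, size_t x) {
    unsigned char buf[BUF_SIZE];
    memset(buf, 0, sizeof buf);          /* inserted initialization */
    memcpy(buf, src, x);                 /* the program's own store */
    memcpy(out, buf, sizeof buf);
}

/* After: the earlier initialization is shortened by x bytes. This is
   only valid when x <= BUF_SIZE; the assert plays the role of the
   bounds check that must be added unless the condition is proven. */
void fill_shortened(unsigned char *out, const unsigned char *src, size_t x) {
    unsigned char buf[BUF_SIZE];
    assert(x <= sizeof buf);
    memset(buf + x, 0, sizeof buf - x);  /* clear only the uncovered tail */
    memcpy(buf, src, x);
    memcpy(out, buf, sizeof buf);
}
```

As described in Section VI-D, the prototype only applies this shortening to memset calls of at least 64 bytes that overwrite an entire object, which is why the sketch uses a buffer larger than that.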
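The allocation policy described in B. Hardened allocator can be approximated in ordinary user code. The following is a simplified sketch under stated assumptions, not the modified tcmalloc: `hardened_malloc`, `hardened_free` and the 256 KiB `LARGE_ALLOC_THRESHOLD` are hypothetical, and large blocks are handed back to the kernel with munmap rather than recycled with MADV_DONTNEED. It shows the two cases: large blocks come from fresh anonymous mappings, which the kernel zero-fills, while every small block is cleared explicitly before it is returned.

```c
#define _DEFAULT_SOURCE                  /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical cut-off: requests at least this large are served
   directly by mmap; everything smaller comes from malloc. */
#define LARGE_ALLOC_THRESHOLD ((size_t)256 * 1024)

/* Header stored before each block so hardened_free knows how it was
   obtained; max_align_t keeps the returned pointer suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} alloc_header;

void *hardened_malloc(size_t size) {
    if (size > SIZE_MAX - sizeof(alloc_header))
        return NULL;                     /* size + header would overflow */
    size_t total = size + sizeof(alloc_header);
    alloc_header *h;
    if (size >= LARGE_ALLOC_THRESHOLD) {
        /* Fresh anonymous mappings are zero-filled by the kernel, so
           large blocks need no explicit clearing. */
        void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        h = p;
    } else {
        /* Tracking which small blocks are still zero would cost more
           than clearing them, so clear unconditionally. */
        h = malloc(total);
        if (h == NULL)
            return NULL;
        memset(h, 0, total);
    }
    h->size = size;
    return h + 1;
}

void hardened_free(void *ptr) {
    if (ptr == NULL)
        return;
    alloc_header *h = (alloc_header *)ptr - 1;
    if (h->size >= LARGE_ALLOC_THRESHOLD)
        munmap(h, h->size + sizeof(alloc_header)); /* pages go back to the OS */
    else
        free(h);
}
```

Unmapping large blocks keeps the sketch short but gives up reuse. An allocator that keeps large regions mapped can instead release their pages with madvise(MADV_DONTNEED); on Linux, private anonymous pages released this way read back as zero, which is the property the text relies on.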
D. Dead Store Elimination

We implemented the other optimizations described above by extending existing LLVM code, keeping our changes minimal where possible. Our implementation of write-only buffers made use of the patch in D18714 (since merged), which added the basic framework for a writeonly attribute.

We also based our implementation of cross-block Dead Store Elimination on the (rejected) patch in D13363. Due to performance regressions, we disable this cross-block DSE for small stores (≤ 8 bytes); we also extended this code to support removing memset calls, and shortening such stores.

Our prototype currently only applies non-constant shortening to memset calls which overwrite an entire object, and requires that they be at least 64 bytes for the efficiency reasons discussed above. LLVM's limited support for range analysis severely limits the current optimization opportunities for such shortening, since in the majority of cases we are unable to prove accesses are safe without performing our own analysis.

Since our goal is to show techniques which are practical to implement without needing additional analysis, we limited ourselves to the typical analyses used by existing in-tree code in such circumstances. These include checking known bits, and making use of the scalar evolution code for loop variables. In turn, these limitations remove opportunities for library call optimizations; we found that even our optimizations for string functions are of limited usefulness (outside of artificial micro-benchmarks) due to the effect of these safety checks.

E. Frame clearing

To put our evaluation into context, we also implemented an alternative compiler hardening pass which clears the portion of each frame reserved for local variables in every function prologue. The performance of this frame clearing provides an estimate of the lower bound for these naive approaches; we apply our normal stack hardening pass to protect non-static (dynamically-allocated) local variables.

This improves performance compared to simply clearing all frames, since we do not clear space reserved for other purposes such as spilled registers (although our optimized clearing code sometimes clears part of this space, for alignment reasons). This approach also fails to provide guarantees for overlapping or re-used variables within the function; any changes to resolve these (such as disabling stack coloring to avoid overlapping variables) resulted in significantly worse performance.

VII. DETECTION

Our hardened toolchain can also be combined with a modern high-performance multi-variant execution system such as [30] to provide a detection tool, inspired by DieHard [5]. We compile multiple versions of the same application, initializing memory to different values in each variant. This allows us to perform high-performance detection of the majority of uses of uninitialized values, including those which would typically be removed by compiler optimizations or only stored in registers, without the false positives resulting from harmless memory reads which do not affect the output. Example usage can be seen in Figure 11.

int deny_access;
if (deny_access) {
    printf("Access denied.");
    return 0;
}
printf("Access granted.");

Fig. 10. (Simplified) example of an uninitialized read which is optimized away by existing compiler transforms; in this example, the code in the branch is typically removed entirely.

$ var-cc -O2 example.c
$ multivar ./example-v0 ./example-v1
! SYSCALL MISMATCH for variants v0 and v1
0: write(1, "Access granted.", 15)
1: write(1, "Access denied.", 14)
== Backtrace ==
ip = 7271a9620 __write+16
...
ip = 727120de9 _IO_printf+153
ip = 4007ce check_access+366

Fig. 11. Example of an uninitialized read being detected, using optimized builds of the code in Figure 10; since there is no uninitialized memory usage in the optimized binaries, tools such as valgrind fail to detect such cases.

Filling memory with a constant value is much faster than using random values, so we fill all uninitialized bytes of memory in each variant with the same constant. Some optimizations are no longer possible when using non-zero values; in particular, we must explicitly fill all heap memory, since the zero pages returned by the kernel are no longer appropriate.

However, multi-variant systems do not necessarily require synchronization (they need not run variants in lockstep); system calls need only be executed for one of the variants, the so-called leader. Since our hardening has already mitigated potential security issues, there is no need to run the variants in lockstep. We initialize the values of the leader process with zero, allowing it to run ahead of the other variants, which reduces the overall runtime impact of this slower initialization.

VIII. EVALUATION

Our benchmarks were run on a (4-core) Intel i7-3770 with 8GB of RAM, running (64-bit) Ubuntu 14.04.1. CPU frequency scaling was disabled, and hyperthreading enabled. Transparent Huge Pages were turned off, due to their extremely unpredictable effect on runtime and memory usage; this is a commonly recommended vendor configuration, and although it has a negative effect on some benchmarks, it does not appear to meaningfully change our overhead figures.

Our baseline configuration is an unmodified version of clang/LLVM, using an unmodified version of tcmalloc. As well as comparing this to SafeInit, we also present results for the naive approach, which simply applies our initialization pass without any of our proposed optimizations, using a hardened allocator which simply zeroes all allocations. We do make use of a modified compiler which performs local variable initialization and ensures that safety is maintained; for
Fig. 12. SPEC CINT2006, runtime overhead (%) when hardening with SafeInit. [Per-benchmark bar chart; series: naive SafeInit, SafeInit.]

Fig. 13. [Per-benchmark bar chart of SPEC CINT2006 runtime overhead (%); series: frame clearing, SafeInit.]

Fig. 14. SPEC CFP2006, runtime overhead (%) for SafeInit. [Per-benchmark bar chart.]

Fig. 15. SPEC CINT2006, runtime overhead (%) of SafeInit's optimizer without hardening applied. [Per-benchmark bar chart.]

baseline compiler. Results for CFP2006 are similar, as shown in Figure 14, with an average overhead of 2.2%.

Table I provides details of the number of allocas (representing the number of local variables, plus occasional copies of arguments or dynamic allocations) for each benchmark. Many initializations are transformed or removed during optimization, but the table contains the number of initializations which are still represented as a memset in the final post-optimization
TABLE I. SPEC CINT2006 details. #inits is the number of large initializations left after existing compiler optimizations (naive) and our optimizer (opt) have run, respectively. Size is the (stripped) binary size in bytes.
Benchmark #allocas #inits (naive) #inits (opt) size (baseline) size (naive) size (optimizer)
astar 790 7 4 43736 43736 (0%) 43736 (0%)
bzip2 679 23 20 80488 84584 (5.1%) 84584 (5.1%)
gcc 31551 650 596 4108712 4133288 (0.6%) 4120992 (0.3%)
gobmk 17039 325 300 3554640 3566928 (0.3%) 3566928 (0.3%)
h264ref 4229 122 122 630664 638856 (1.3%) 638856 (1.3%)
hmmer 3333 19 18 189592 189592 (0%) 189592 (0%)
libquantum 567 3 2 31336 31336 (0%) 31336 (0%)
mcf 184 1 1 19040 19040 (0%) 19040 (0%)
omnetpp 7638 110 110 806712 810808 (0.5%) 814904 (1%)
perlbench 12327 175 167 1272584 1284872 (1%) 1280792 (1%)
sjeng 770 61 48 133976 133976 (0%) 133976 (0%)
xalancbmk 92396 1701 1302 3871528 3908392 (1%) 3892008 (0.5%)
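The size percentages in Table I are relative growth over the baseline binary. The small C sketch below (hypothetical helper names; the three rows are copied from the table) reproduces that arithmetic.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage growth of a lower-is-better metric such as stripped binary
   size: positive means the hardened build is larger than the baseline. */
double size_overhead_pct(long baseline, long hardened) {
    return (double)(hardened - baseline) / (double)baseline * 100.0;
}

/* Three rows copied from Table I (stripped binary sizes in bytes). */
struct size_row {
    const char *name;
    long baseline, naive, optimizer;
};

static const struct size_row table1_rows[] = {
    {"bzip2", 80488, 84584, 84584},
    {"gcc", 4108712, 4133288, 4120992},
    {"xalancbmk", 3871528, 3908392, 3892008},
};

void print_size_overheads(void) {
    size_t n = sizeof table1_rows / sizeof table1_rows[0];
    for (size_t i = 0; i < n; i++) {
        const struct size_row *r = &table1_rows[i];
        printf("%-10s naive %.1f%%  optimizer %.1f%%\n", r->name,
               size_overhead_pct(r->baseline, r->naive),
               size_overhead_pct(r->baseline, r->optimizer));
    }
}
```

For these three rows it prints 5.1%/5.1%, 0.6%/0.3% and 1.0%/0.5%, matching the naive and optimizer size columns of Table I.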
Fig. 16. requests/sec overhead (%) for hardening lighttpd. [Bar chart; series: naive SafeInit, SafeInit; sendfile and writev backends with 4kB, 64kB and 1MB files.]

Fig. 17. requests/sec overhead (%) for hardening nginx. [Bar chart; series: naive SafeInit, SafeInit.]
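Figures 16 and 17 report overhead in requests per second, a higher-is-better metric, so overhead is the relative drop below the baseline; the server experiments in Section VIII-B take the median over 10 runs. The sketch below shows that computation in C; `median` and `throughput_overhead_pct` are hypothetical helpers, not part of apachebench or SafeInit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n > 0 samples. Sorts a private copy so the caller's array is
   unchanged; for even n it averages the two middle values. */
double median(const double *samples, size_t n) {
    assert(n > 0);
    double *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        abort();
    memcpy(tmp, samples, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_double);
    double m = (n % 2 == 1) ? tmp[n / 2]
                            : (tmp[n / 2 - 1] + tmp[n / 2]) / 2.0;
    free(tmp);
    return m;
}

/* Overhead for a higher-is-better metric (requests/sec, MB/s): how far
   the hardened median falls below the baseline median, in percent.
   A negative result means the hardened build was faster. */
double throughput_overhead_pct(const double *baseline, const double *hardened,
                               size_t runs) {
    double b = median(baseline, runs);
    double h = median(hardened, runs);
    return (b - h) / b * 100.0;
}
```

The same convention gives the bandwidth rows of Table III: 5182.4 MB/s against a 4988.09 MB/s baseline yields about -3.9%, meaning the hardened kernel was slightly faster.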
Given the average overhead of 13.5%, it is clear that such frame-based initialization without the benefit of compiler optimizations is too slow. Despite this simpler approach offering considerably less safety, only bzip2 gains a significant performance benefit from these reduced guarantees.

We also investigated another approach for weakening guarantees to improve performance, by increasing the lifetimes of variables inside loops so they would only be initialized once, before the loop. The impact of this on stack coloring and register allocation resulted in worse performance for almost all benchmarks (and an average overhead for CINT2006 of >5%).

B. Servers

We evaluated the overhead of SafeInit for less computationally intensive tasks by using two modern high-performance web servers, nginx (1.10.1) and lighttpd (1.4.41). We built the web servers using LTO and -O3. Since they are I/O bound when used on our 1 Gbps network interface, we benchmarked them using the loopback interface. This is an extreme scenario; in practice, the overhead of SafeInit is insignificant for these servers.

We used apachebench to repeatedly download 4kB, 64kB and 1MB files, for a period of 30 seconds. We enabled pipelining, used 8 concurrent workers, and used CPU affinity to reserve a CPU core for apachebench. We measured the overhead for the median number of requests per second, across 10 runs; we did not see significant amounts of variance.

lighttpd: We attempted to configure lighttpd to optimize throughput, allowing 100 requests per pipelined connection, and evaluated both the sendfile (default) and writev network backends. The results are shown in Figure 16.

Average overhead is minimal when sending the large (1MB) file. In the extreme case of the small 4kB file, where we process almost 70,000 requests per second, overhead is still less than 3%; the majority of execution time here is spent parsing incoming requests and maintaining internal structures.

Much of lighttpd's overhead for these tiny requests is caused by small heap allocations for strings in the chunk queue; only the first byte of these is initialized by the caller, but our hardened allocator clears the entire allocation for safety. The remaining overhead in both cases is due to lighttpd's writev code, which both backends use to write these allocations to the network and which uses a fixed-size stack buffer. Our current optimizer fails to optimize away the unused portion of the buffer, but improved optimizations or minor changes to the code could reduce the overhead further. In fact, older versions of lighttpd used a larger buffer in this code, but recently a sane limit was imposed on the buffer size; such modifications demonstrate how general code improvements can also reduce the overhead imposed by SafeInit.

nginx: We tested nginx both with a default configuration (which is similar to the one we used for lighttpd) and with sendfile enabled (which significantly increases performance for serving the 64kB and 1MB files). All logging was disabled; our overhead is slightly reduced when logs are enabled. The results are shown in Figure 17.

The overhead of full SafeInit, including our optimizer, is noticeably higher with the 64kB files; however, the overhead of SafeInit remains below 5% in all circumstances.

nginx makes use of a custom pool-based memory allocator, which makes it difficult for our optimizer to analyze the code. However, we manually verified that memory is not (by
default) re-used within the pool, to ensure that any potential uninitialized memory vulnerabilities would still be mitigated.

We also ran nginx using our detection tool (using two variants); overhead (above our hardened version) was generally similar to that reported by Koning et al. [30], with a worst-case overhead of <75%.

PHP: We also evaluated a modern high-performance scripting language, PHP 7.0.9. We used the default compiler flags (-O2), since we encountered build system problems when attempting to use LTO. However, PHP makes extensive use of an internal memory allocator, which re-uses memory obtained from our hardened allocator; this reduces our safety guarantees for smaller allocations.

We ran both supplied PHP micro-benchmarks (from the Zend directory). The median of 21 runs (we saw little variation between runs) is shown in Table II; the combination of SafeInit and our new optimizer results in performance improvements of around 3% for both micro-benchmarks. We saw approximately 1% overhead (above the hardened version) when running these benchmarks under our detection system (using two variants).

TABLE II. PHP 7.0.9 micro-benchmark results (in seconds)
Configuration     bench.php        micro_bench.php
baseline          1.029            3.983
new optimizer     1.007 (-2.1%)    3.879 (-2.6%)
naive SafeInit    1.004 (-2.5%)    3.994 (0.3%)
SafeInit          0.999 (-3%)      3.897 (-2.8%)

C. Linux

We built the latest LLVMLinux [37] kernel tree (based on mainline revision f800c25b) using our toolchain. We customized the build system to allow use of LTO, re-enabled built-in clang functions, and modified the gold linker to work around some LTO code generation issues we encountered with symbol ordering.

Since the Linux kernel (inherently) performs its own memory management, it does not get linked with our hardened userspace allocator; our automatic hardening only protects local variables. Protecting other sources of uninitialized data, such as the SLAB and buddy allocators, would require manual changes, and presumably add further overhead; such sanitization is already offered by kernel patches such as grsecurity.

Table III provides a selection of latency and bandwidth figures for typical system calls, using LMbench, a kernel microbenchmarking tool [43]. We ran each benchmark 10 times, with a short warming-up period and a high number of iterations (100) per run, and provide the median result. TCP connections were to localhost, and other parameters were those used by the default LMbench script. The overhead numbers for the hardened kernels include the (negligible) overhead of hardening LMbench itself.

TABLE III. LMbench results. Time in microseconds (bandwidth rows in MB/s), plus % overhead above baseline.
Sub-benchmark     Baseline    w/Optimizer        Stack SafeInit
syscall null      0.0402      0.0402 (0%)        0.0402 (0%)
syscall stat      0.2519      0.2369 (-5.9%)     0.2571 (2.1%)
syscall fstat     0.0739      0.0742 (0.4%)      0.0775 (4.9%)
syscall open      0.7049      0.6778 (-3.8%)     0.7119 (1%)
syscall read      0.0817      0.0819 (0.2%)      0.0819 (0.2%)
syscall write     0.0981      0.0979 (-0.2%)     0.0971 (-1%)
select tcp        4.5882      4.6714 (1.8%)      4.6497 (1.3%)
sig install       0.0964      0.0977 (1.4%)      0.1000 (3.7%)
sig catch         0.6534      0.6495 (-0.6%)     0.6648 (1.7%)
sig prot          0.2220      0.2210 (-0.4%)     0.2350 (5.9%)
proc fork         65.5904     66.6386 (1.6%)     67.7927 (3.4%)
proc exec         208.8846    209.8519 (0.5%)    212.3462 (1.7%)
pipe              3.3500      3.3834 (1%)        3.4145 (1.9%)
tcp               6.7489      6.7163 (-0.5%)     6.6835 (-1%)
bw pipe (MB/s)    4988.09     4974.89 (0.3%)     5182.4 (-3.9%)
bw tcp (MB/s)     8269.34     8245.39 (0.3%)     8350.71 (-1%)

We incur substantial overhead for the stat and open system calls; while this is largely mitigated by the improved performance provided by our optimizer, it is a cause for concern, and we intend to investigate it further, along with fstat and the (signal) protection fault, which is the only system call we saw with overhead >5%.

To evaluate the real-world performance of SafeInit applied to the kernel stack, we hardened both nginx and the kernel with SafeInit, and compared performance to a non-hardened nginx running under a non-hardened kernel. Using the sendfile configuration we discussed above, and again using the loopback interface to provide an extreme situation, we observed overhead of 2.9%, 3% and 4.5% for the 1MB, 64kB and 4kB cases respectively.

We present the numbers above as a view of what is possible with only automatic mitigation, without application-specific knowledge. Our optimizer could be extended with knowledge of heap functions, inline assembly, and core kernel functions such as copy_from_user, which would provide both improved guarantees and more opportunities for optimization.

D. Residual Overhead

The average overhead of CINT2006 is distorted by the performance overhead of two outliers. The most significant is sjeng, a chess program. It stores game moves in large on-stack arrays in several recursive functions, and these arrays are then passed to many other functions, with the size stored in a global variable. This code is so convoluted that, even with manual inspection, we are unable to determine whether or not array elements may be used without being initialized. An appropriate approach might be to refactor or rewrite the code in question, removing such code smells, which would benefit both compiler analysis and manual inspection.

This may be unrealistic in some cases, so we added compiler support for annotating variables and types with a no_zeroinit attribute, and annotated sjeng's move_s type; this single annotation reduced sjeng's runtime overhead to 6.5% (which would, in turn, reduce the mean overhead for CINT2006 to less than 2%), in combination with our full set of optimizations.

lighttpd's buffer preparation function, discussed earlier, could also benefit from such an annotation. However, since lighttpd does not clear the entire buffer, this would also require detailed manual inspection to ensure it was safe; we
do not believe the reduced safety in adding such annotations is justified, given the low overhead of our approach.

We also added a warning pass to our compiler, which can emit warnings (at link time) about large on-stack allocations (by default, >4kB) for which our optimizer failed to remove initialization. Table IV summarizes the results for CINT2006 (excluding the benchmarks which output no warnings). Many of these are not on critical paths for performance, and some are completely unused in practice, such as an 8kB buffer in perlbench described in the source code as "The big, slow and stupid way". These warnings could be combined with profiling to determine which code needs to be refactored or annotated.

TABLE IV. Warning pass output, for CINT2006
Benchmark    #Warnings   Notes
bzip2        4           one is a 4MB buffer added by SPEC
gcc          1
gobmk        8           mostly too complex to analyze
h264ref      7
perlbench    1           unused at runtime
sjeng        19          17 of these are move_s
xalancbmk    16          temporary (wide) string buffers

E. Security

To verify that SafeInit works as expected, we not only considered a variety of real-world vulnerabilities, such as those in Table V, but also created a suite of individual test cases. We manually inspected the bitcode and machine code generated for the relevant code, and also ran our test suite using the detection system described above. We also used valgrind to verify our hardening; for example, we confirmed that all uninitialized value warnings from valgrind disappear when OpenSSL 0.8.9a is hardened with SafeInit.

TABLE V. Verified uninitialized value mitigations
CVE number   Software   Mitigated?   Description
2016-4243    PHP        Yes          Use of uninitialized stack variables, including a pointer.
2016-5337    qemu       Yes          Info disclosure to guest; missing null terminator for stack string buffer.
2016-4486    Linux      Yes          Info disclosure to userspace; uninitialized padding in struct on stack.

As with all compiler optimizations, our improvements may expose latent bugs in other compiler code or in the source being compiled, or even contain bugs themselves. We verified that the benchmarks we ran produced correct results. We also extensively tested our hardened kernel, and where available ran test suites for the compilers and software we hardened (such as PHP). However, the potential for such issues remains an inherent risk when using any modern compiler, as shown by Yang et al. [64]. Formal verification of compilers (e.g., CompCert [36]) or of individual optimization passes (such as the work by Zhao et al. [67] and Deng et al. [16]) could reduce this risk.

However, in total, our SafeInit prototype adds or modifies fewer than 2000 lines of code in LLVM, including some debugging code and around 400 lines of code based on third-party patches. Although our modifications are complex, this is a relatively small amount of code and each component should be individually reviewable; for comparison, our (separate) frame clearing pass alone is more than 350 lines of code.

Our hardening does not prevent programs from reusing memory internally. For example, a stack buffer may be reused for different purposes within the same function, or a custom internal heap allocator may reuse memory without clearing it, as we saw with PHP. Although it would potentially be possible to catch some of these cases using heuristics, or by attaching annotations of some kind, we do not believe it is realistic or reasonable for a compiler to support this.

Clearing variables to zero ensures that any uninitialized pointers are null. An attempt to dereference such a pointer will result in a fault; in such situations, our mitigation has reduced a more serious problem to a denial-of-service vulnerability.

In many cases, code explicitly checks pointers (or other variables) against null or zero, and so clearing variables mitigates issues entirely; when running our detection system, we noticed that many uninitialized pointer dereferences were only triggered in the variant initialized with a non-zero value. For example, Figure 18 shows the output of our detection system executing a proof-of-concept exploit for PHP CVE-2016-4243. Only the variant initialized with a non-zero value attempts to dereference the value (which results in a fault, caught by our system).

$ multivar php poc.php
Starting php-zero (20439)
Starting php-poison (20440)
20440 term sig: Segmentation fault (11)

Fig. 18. Detection output when checking PHP CVE-2016-4243

Initializing all variables with zero also has the potential to activate vulnerabilities which would otherwise have remained dormant. A contrived example could be an insecure variable, which is used to force a check of some kind, but is used uninitialized. This may not be a problem in practice in some environments, where the underlying memory happens to always contain a non-zero value. However, this may change at any time, and since compilers are allowed to transform such undefined behavior, it is always possible that such code may be optimized away.

As stated in our threat model, we only consider C/C++ code; assembly routines fall outside the scope of this work, although typical inline assembly will declare local variables in C/C++ code, which would then be initialized by our prototype.

Since we have implemented our SafeInit prototype as an LLVM pass, other compiler frontends making use of LLVM could also easily benefit from our work; we look forward to experimenting with NVIDIA's upcoming Fortran front-end.

IX. LIMITATIONS

Libraries: For complete protection against all uninitialized value vulnerabilities, all used libraries must also be instrumented. The standard C library used on Linux, glibc, does not build under clang, so our prototype implementation is unable to instrument it; this is a limitation of our specific implementation, not our design. Stepanov et al. [58] state that they implemented interception of close to 300 libc functions in MemorySanitizer; while such knowledge of library functions is not required by SafeInit, having access to the bitcode for libraries would also allow further performance improvements.
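To make the point about library knowledge concrete, the hand-written sketch below shows code as stack initialization would leave it around two kinds of callee (hypothetical names throughout: `read_record_into`, `byte_sum`, `sum_after_memcpy`, `sum_after_library_call`). When the buffer is immediately overwritten by memcpy, whose semantics the compiler knows, the inserted memset is a dead store that dead store elimination can typically remove; when the buffer is filled by a routine in an uninstrumented library that the compiler cannot see into, it must assume the routine may read the buffer or write only part of it, so the memset stays.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RECORD_SIZE 64

/* Hypothetical stand-in for a routine in an uninstrumented library.
   In a real build it would live in a separately compiled library, so the
   optimizer would see only its declaration at the call site below. */
void read_record_into(unsigned char *dst, size_t len) {
    for (size_t i = 0; i < len; i++)
        dst[i] = (unsigned char)(i ^ 0x5a);
}

static unsigned byte_sum(const unsigned char *p, size_t len) {
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];
    return sum;
}

/* Hardened code written out by hand: each memset is what the
   initialization pass would add for the local buffer. */
unsigned sum_after_memcpy(const unsigned char *src) {
    unsigned char rec[RECORD_SIZE];
    memset(rec, 0, sizeof rec);        /* dead store: memcpy overwrites every byte */
    memcpy(rec, src, sizeof rec);
    return byte_sum(rec, sizeof rec);
}

unsigned sum_after_library_call(void) {
    unsigned char rec[RECORD_SIZE];
    memset(rec, 0, sizeof rec);        /* must stay if the callee is opaque:
                                          it might read rec or fill only part */
    read_record_into(rec, sizeof rec);
    return byte_sum(rec, sizeof rec);
}
```

Both functions compute the same result with or without the memset; the difference is only whether the clear can be proven redundant, which is why bitcode for libraries, or built-in knowledge of specific library functions, opens further optimization opportunities.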
Since both the toolchain and C library are usually provided together, we feel it would be reasonable to make small improvements to the C library to mitigate any performance issues for specific functions. However, in any case, we observed no meaningful overhead (<0.1%) when building benchmarks and applications against an unmodified alternative C library (musl).

Performance: Our modified optimizer can cause (small) performance regressions in some code, due to unintended consequences on other optimizations and code generation. For example, removing stores makes functions smaller, and so more likely to fall under the inlining threshold; we can improve performance across all of our benchmarks by modifying the threshold. To be as fair as possible, we presented our results without any such changes. The optimizations proposed in our design and implemented in our prototype are deliberately minimal, without additional analysis, to show they are practical to implement in current compilers; this limits many of the possible transformations. Despite this, we expect the overhead of SafeInit to decrease significantly over time, as the related compiler optimizations continue to improve.

There will inevitably be cases where performance is unacceptable in real-world code, as we saw with sjeng. Where annotations are an unacceptable solution, making changes to the code may be necessary. However, such refactoring can also improve the code in other ways, whether just making it more readable and easier to understand, or, as we saw with lighttpd, by resolving potential memory or performance issues.

Relevant recent developments in LLVM include improvements to loop analysis and optimization [47] as well as transforming entire structure definitions to improve performance [26]. During the development of our project, improvements to LLVM's store optimizations have also continued; for example, one recent patch improved removal of stores which are overwritten by multiple later store instructions, allowing removal of unnecessary initializations when individual members of a structure are initialized. We look forward to seeing how future optimizations further decrease the overhead of our work.

X. RELATED WORK

Detection: Dynamic analysis tools for detecting uses of uninitialized data, such as valgrind's memcheck [54], track the initialized state of each bit of memory and (optionally) the origin of any uninitialized data. The high overhead of this tracking often makes it prohibitive for use during development, and completely impractical to deploy. To keep this overhead manageable, it is almost essential to use optimized binaries, in which undefined behavior may already have introduced undetectable vulnerabilities; other issues, such as re-use of stack memory within functions, further reduce the reliability of this approach.

More recent detection tools using a similar approach include Dr. Memory [7], which significantly reduces overhead by applying optimizations, and MemorySanitizer [58] (MSan), which reduces overhead even further by instrumenting binaries during compilation (using LLVM). The execution time overhead for MSan is reported as 2.5x (with optimized binaries), which is sufficient to make it usable as part of continuous integration for projects such as Chrome, and advancements such as chained origin tracking mean that reported errors require less manual effort to fix. Recent research [65] claims to have reduced MSan overhead even further.

Berger et al. proposed using multi-variant execution to detect uses of uninitialized heap allocations in DieHard [5]. By running multiple variants of the same program, filling newly-allocated heap memory with random values, and providing all variants with identical input, they could attribute any deviation in output to a likely use of uninitialized memory. To obtain reliability against memory errors, they proposed running several variants, and discarding any variant reporting inconsistent results.

Stack clearing: gcc's Fortran compiler provides the -finit-local-zero option, intended only for compatibility with legacy Fortran code. Several C/C++ compilers provide options for automatic initialization of function stack frames, intended only for debugging purposes. As discussed, such stack frame clearing adds a significant performance penalty, and provides fewer guarantees.

Chen et al. presented StackArmor [9], a binary hardening system which isolated function frames containing potentially unsafe buffers using guard pages and random reordering. This makes it more difficult for attackers to predict which data may be present in uninitialized portions of frames, providing probabilistic mitigation of uninitialized data vulnerabilities; they combined it with analysis to add zero-initialization to potentially uninitialized portions of non-isolated frames, but reported a high average overhead of 28% on SPEC CPU2006.

Heap clearing: Heap allocation clearing is an option in some existing allocators, such as jemalloc [20], although it is generally intended only for debugging; for example, the jemalloc documentation warns that it will impact performance negatively. Wang et al. [62] proposed zero-initializing and padding heap allocations at allocation time, by wrapping malloc, to protect against buffer overread vulnerabilities. Araujo and Hamlen [3] suggested just zeroing the first byte of all allocations, giving limited benefits (e.g., for C strings) but adding almost no overhead.

Chow et al. proposed Secure Deallocation [11], which modifies the system C library to zero heap allocations when freed, and modifies compiler code generation to clear stack frames in function epilogues; this provides less comprehensive protection and misses optimization opportunities. They claimed runtime overhead of <7% for heap clearing, but 10%-40% overhead for stack clearing, although their approach does protect against some vulnerabilities outside our threat model.

Heap isolation: Isolating all heap allocations mitigates some classes of memory vulnerabilities, such as overflows; however, this is at best a probabilistic defense, since limited available address space means memory is inevitably reused after a certain point. DieHard [5] allocates memory randomly across an oversized heap, and Archipelago [41] allocates memory across the entire address space. OpenBSD [46] implemented such a random allocator by default, including moving metadata out-of-band, and DieHarder [48] built upon this to increase entropy at an additional performance cost of 20%, due to the cost of memory fragmentation.

Information disclosure defenses: Many defenses have been proposed for protecting sensitive data. TaintEraser [68] uses tainting to track sensitive user input and prevent it from
escaping to the filesystem or network. Harrison and Xu [24] proposed techniques for probabilistically protecting private cryptographic keys against memory disclosure attacks, and SWIPE [23] tracks sensitive data using static analysis and erases it at the end of its lifetime.

Defenses which depend on information hiding to protect pointers or other metadata are particularly vulnerable to information disclosure. Advances such as fine-grained ASLR [25] are rendered useless if uninitialized memory errors can be used to disclose pointers. Defenses such as Code-Pointer Integrity [32], Readactor [15] and ASLR-Guard [39] aim to protect code pointers against more sophisticated disclosure attacks, such as those proposed by Evans et al. [19] and Schuster et al. [51].

Linux kernel: Uninitialized data vulnerabilities in the Linux kernel have received increased attention in recent years. Besides the obvious issue of exposing confidential information, knowledge of kernel addresses has become important for attackers wishing to bypass defenses such as stack canaries (using gcc's StackGuard [14]) and ASLR (kASLR [13]). In 2011, Chen et al. [8] performed an extensive analysis of kernel vulnerabilities and reported that the most common category was uninitialized data errors, almost all of which led to information disclosure. More recently, Peiro et al. [50] provided a more in-depth discussion of such kernel information disclosure vulnerabilities, and presented a technique for identifying stack information disclosures using static analysis. Linux also includes kmemcheck, a dynamic analysis tool for detecting uses of uninitialized heap memory.

grsecurity/PaX: The PaX project [49], as part of the hardened grsecurity Linux patches, provides two different mitigations for potentially uninitialized kernel stack data, using gcc plugins. One annotates structures which may be disclosed to userspace, and initializes any such structures on the stack to prevent accidental information disclosure. The other takes a more aggressive approach, clearing the kernel stack before/after system calls. A gcc plugin tracks the maximum stack depth used for each call. This provides efficient protection against stack re-use between different system calls, although it still theoretically allows an attacker to exploit such issues within a single call. Both grsecurity and recent mainline kernels can also be configured to initialize and/or clear heap allocations.

UniSan: Concurrently with our work, Lu et al. developed UniSan [38], a compiler-based approach for mitigating information disclosure vulnerabilities caused by uninitialized values in the Linux kernel. They propose using static data-flow analysis to trace potential execution paths (after optimizations have been applied), and initializing any variables which cannot be proven to be initialized before potentially being disclosed. They implemented a prototype using LLVM, and manually inspected their analysis results to find and disclose various new uninitialized value disclosure vulnerabilities (some of which we used to verify the correctness of our own work).

Our approach mitigates a wider range of potential uninitialized value vulnerabilities on the stack (such as dereferencing uninitialized pointers [40] or even control-flow-based side-channel attacks [52]), and SafeInit obtains good performance without additional data-flow analysis. However, UniSan's interprocedural analysis and specific knowledge of kernel functions may result in significantly lower overhead in some cases, particularly for the heap. We believe similar results could be obtained by adding knowledge of Linux heap functions and a Linux-specific optimization pass to SafeInit; combining both techniques may also be a promising approach.

XI. CONCLUSION

Uninitialized data vulnerabilities continue to pose a security problem in modern C/C++ software, and ensuring safety against the use of uninitialized values is not as easy as it might seem. Threats ranging from simple information disclosures to serious issues such as arbitrary memory writes, static analysis limitations, and compiler optimizations taking advantage of undefined behavior combine to make this a difficult problem.

We presented a toolchain-based hardening technique, SafeInit, which mitigates uses of uninitialized values in C/C++ programs by ensuring that all local variables and stack allocations are initialized before use. By making use of appropriate optimizations, we showed that the runtime overhead for many applications can be reduced to a level which makes it practical to apply as a standard hardening protection. We also showed that this can be done practically in a modern compiler.

To foster further research in this area, we are open sourcing our SafeInit prototype, which is available at https://github.com/vusec/safeinit. We hope to work towards making SafeInit available as a standard compiler feature, and improving the optimizations it depends upon.

ACKNOWLEDGEMENTS

We would like to thank Kees Cook, Kangjie Lu and the anonymous reviewers for their comments. This work was supported by the European Commission through project H2020 ICT-32-2014 SHARCS under Grant Agreement No. 644571 and by the Netherlands Organisation for Scientific Research through grant NWO 639.023.309 VICI Dowsing.

REFERENCES

[1] "CVE-2012-1889: Vulnerability in Microsoft XML core services could allow remote code execution," 2012.
[2] P. Akritidis, M. Costa, M. Castro, and S. Hand, "Baggy bounds checking: An efficient and backwards-compatible defense against out-of-bounds errors," in USENIX Security, 2009.
[3] F. Araujo and K. Hamlen, "Compiler-instrumented, dynamic secret-redaction of legacy processes for attacker deception," in USENIX Security, 2015.
[4] M. Auslander and M. Hopkins, "An overview of the PL.8 compiler," in SIGPLAN Symposium on Compiler Construction, 1982.
[5] E. D. Berger and B. G. Zorn, "DieHard: probabilistic memory safety for unsafe languages," in PLDI, 2006.
[6] A. Bessey, K. Block, B. Chelf, A. Chou, B. Fulton, S. Hallem, C. Henri-Gros, A. Kamsky, S. McPeak, and D. Engler, "A few billion lines of code later: using static analysis to find bugs in the real world," Communications of the ACM, vol. 53, no. 2, pp. 66-75, 2010.
[7] D. Bruening and Q. Zhao, "Practical memory checking with Dr. Memory," in CGO, 2011.
[8] H. Chen, Y. Mao, X. Wang, D. Zhou, N. Zeldovich, and M. F. Kaashoek, "Linux kernel vulnerabilities: State-of-the-art defenses and open problems," in APSys, 2011.
[9] X. Chen, A. Slowinska, D. Andriesse, H. Bos, and C. Giuffrida, "StackArmor: Comprehensive protection from stack-based memory error vulnerabilities for binaries," in NDSS, 2015.
[10] J. Chow, B. Pfaff, T. Garfinkel, K. Christopher, and M. Rosenblum, "Understanding data lifetime via whole system simulation," in USENIX Security, 2004.
[11] J. Chow, B. Pfaff, T. Garfinkel, and M. Rosenblum, "Shredding your garbage: Reducing data lifetime through secure deallocation," in USENIX Security, 2005.
[12] K. Cook, "Kernel exploitation via uninitialized stack," DEFCON 19, 2011.
[13] K. Cook, "Kernel address space layout randomization," 2013, Linux Security Summit.
[14] C. Cowan, C. Pu, D. Maier, J. Walpole, P. Bakke, S. Beattie, A. Grier, P. Wagle, Q. Zhang, and H. Hinton, "StackGuard: Automatic adaptive detection and prevention of buffer-overflow attacks," in USENIX Security, 1998.
[15] S. Crane, C. Liebchen, A. Homescu, L. Davi, P. Larsen, A.-R. Sadeghi, S. Brunthaler, and M. Franz, "Readactor: Practical code randomization resilient to memory disclosure," in S&P, 2015.
[16] C. Deng and K. S. Namjoshi, "Securing a compiler transformation," in Static Analysis, 2016.
[17] C. Ding and Y. Zhong, "Predicting whole-program locality through reuse distance analysis," in PLDI, 2003.
[18] D. Edelsohn, W. Gellerich, M. Hagog, D. Naishlos, M. Namolaru, E. Pasch, H. Penner, U. Weigand, and A. Zaks, "Contributions to the GNU compiler collection," IBM Systems Journal, 2005.
[19] I. Evans, S. Fingeret, J. Gonzalez, U. Otgonbaatar, T. Tang, H. Shrobe, S. Sidiroglou-Douskos, M. Rinard, and H. Okhravi, "Missing the point(er): On the effectiveness of code pointer integrity," in S&P, 2015.
[20] J. Evans, "A scalable concurrent malloc(3) implementation for FreeBSD," in BSDCan, 2006.
[21] H. Flake, "Attacks on uninitialized local variables," Black Hat Europe, 2006.
[22] S. Ghemawat and P. Menage, "TCMalloc: Thread-caching malloc," 2007.
[23] K. Gondi, P. Bisht, P. Venkatachari, A. P. Sistla, and V. N. Venkatakrishnan, "SWIPE: Eager erasure of sensitive data in large scale systems software," in CODASPY, 2012.
[24] K. Harrison and S. Xu, "Protecting cryptographic keys from memory disclosure attacks," in DSN, 2007.
[25] J. Hiser, A. Nguyen-Tuong, M. Co, M. Hall, and J. W. Davidson, "ILR: Where'd my gadgets go?" in S&P, 2012.
[26] G. Hoflehner, "LLVM performance improvements and headroom," in LLVM Developers' Meeting, 2015.
[27] J. Hubicka, "Interprocedural optimization framework in GCC," in GCC Developers' Summit, 2007.
[28] T. Johnson and D. L. Xinliang, "ThinLTO: A fine-grained demand-driven infrastructure," in EuroLLVM, 2015.
[29] M. Jurczyk, "Enabling QR codes in Internet Explorer, or a story of a cross-platform memory disclosure," 2015.
[30] K. Koning, H. Bos, and C. Giuffrida, "Secure and efficient multi-variant execution using hardware-assisted process virtualization," in DSN, 2016.
[31] B. C. Kuszmaul, "SuperMalloc: a super fast multithreaded malloc for 64-bit machines," in ISMM, 2015.
[32] V. Kuznetsov, L. Szekeres, M. Payer, G. Candea, R. Sekar, and D. Song, "Code-Pointer Integrity," in OSDI, 2014.
[33] W. Landi, "Undecidability of static analysis," ACM Lett. Program. Lang. Syst., 1992.
[34] C. Lattner, "What every C programmer should know about undefined behavior," 2011, LLVM project blog.
[35] C. Lattner and V. Adve, "LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation," in CGO, 2004.
[36] X. Leroy, "Formal verification of a realistic compiler," Communications of the ACM, no. 7, pp. 107-115, 2009.
[37] Linux Foundation, "LLVMLinux project."
[38] K. Lu, C. Song, T. Kim, and W. Lee, "UniSan: Proactive kernel memory initialization to eliminate data leakages," in CCS, 2016.
[39] K. Lu, C. Song, B. Lee, S. P. Chung, T. Kim, and W. Lee, "ASLR-Guard: Stopping address space leakage for code reuse attacks," in CCS, 2015.
[40] K. Lu, M.-T. Walter, D. Pfaff, N. Stefan, W. Lee, and M. Backes, "Unleashing use-before-initialization vulnerabilities in the Linux kernel using targeted stack spraying," in NDSS, 2017.
[41] V. B. Lvin, G. Novark, E. D. Berger, and B. G. Zorn, "Archipelago: trading address space for reliability and security," in ASPLOS, 2008.
[42] V. Makarov, "The integrated register allocator for GCC," in GCC Developers' Summit, 2007.
[43] L. W. McVoy and C. Staelin, "LMbench: Portable tools for performance analysis," in USENIX, 1996.
[44] K. Memarian, J. Matthiesen, J. Lingard, K. Nienhuis, D. Chisnall, R. N. M. Watson, and P. Sewell, "Into the depths of C: Elaborating the de facto standards," in PLDI, 2016.
[45] Microsoft, "MS08-014: The case of the uninitialized stack variable vulnerability," 1998.
[46] O. Moerbeek, "A new malloc(3) for OpenBSD," in EuroBSDCon, 2009.
[47] A. Nemet and M. Zolotukhin, "Advances in loop analysis frameworks and optimizations," in LLVM Developers' Meeting, 2015.
[48] G. Novark and E. D. Berger, "DieHarder: securing the heap," in CCS, 2010.
[49] PaX Team, "PaX - gcc plugins galore," 2013, H2HC.
[50] S. Peiro, M. Munoz, and A. Crespo, "An analysis on the impact and detection of kernel stack infoleaks," Logic Journal of IGPL, 2016.
[51] F. Schuster, T. Tendyck, C. Liebchen, L. Davi, A.-R. Sadeghi, and T. Holz, "Counterfeit object-oriented programming: On the difficulty of preventing code reuse attacks in C++ applications," in S&P, 2015.
[52] J. Seibert, H. Okhravi, and E. Soderstrom, "Information leaks without memory disclosures: Remote side channel attacks on diversified code," in CCS, 2014.
[53] F. J. Serna, "The info leak era on software exploitation," Black Hat USA, 2012.
[54] J. Seward and N. Nethercote, "Using valgrind to detect undefined value errors with bit-precision," in USENIX, 2005.
[55] H. Shacham, M. Page, B. Pfaff, E.-J. Goh, N. Modadugu, and D. Boneh, "On the effectiveness of address-space randomization," in CCS, 2004.
[56] K. Z. Snow, F. Monrose, L. Davi, A. Dmitrienko, C. Liebchen, and A.-R. Sadeghi, "Just-in-time code reuse: On the effectiveness of fine-grained address space layout randomization," in S&P, 2013.
[57] B. Spengler, "Detection, prevention, and containment: A study of grsecurity," 2002, libres Software Meeting.
[58] E. Stepanov and K. Serebryany, "MemorySanitizer: fast detector of uninitialized memory use in C++," in CGO, 2015.
[59] C. Sun, V. Le, and Z. Su, "Finding and analyzing compiler warning defects," in ICSE, 2016.
[60] L. Szekeres, M. Payer, T. Wei, and D. Song, "SoK: Eternal war in memory," in S&P, 2013.
[61] R. van Eeden, "Unexpected code execution in smbd," 2015.
[62] J. Wang, M. Zhao, Q. Zeng, D. Wu, and P. Liu, "Risk assessment of buffer 'heartbleed' over-read vulnerabilities," in DSN, 2015.
[63] X. Wang, H. Chen, A. Cheung, Z. Jia, N. Zeldovich, and M. F. Kaashoek, "Undefined behavior: what happened to my code?" in APSys, 2012.
[64] X. Yang, Y. Chen, E. Eide, and J. Regehr, "Finding and understanding bugs in C compilers," in PLDI, 2011.
[65] D. Ye, Y. Sui, and J. Xue, "Accelerating dynamic detection of uses of undefined values with static value-flow analysis," in CGO, 2014.
[66] S. Yilek, E. Rescorla, H. Shacham, B. Enright, and S. Savage, "When private keys are public: results from the 2008 Debian OpenSSL vulnerability," in IMC, 2009.
[67] J. Zhao, S. Nagarakatte, M. M. Martin, and S. Zdancewic, "Formal verification of SSA-based optimizations for LLVM," in PLDI, 2013.
[68] D. Y. Zhu, J. Jung, D. Song, T. Kohno, and D. Wetherall, "TaintEraser: Protecting sensitive data leaks using application-level taint tracking," ACM SIGOPS Operating Systems Review, 2011.