Program Synthesis in Reverse Engineering
Rolf Rolles
Möbius Strip Reverse Engineering
http://www.msreverseengineering.com
No Such Conference Keynote Speech
November 19th, 2014
Program Synthesis in Reverse Engineering
Overview
Desired Behavior → Program Synthesizer → Program
Program synthesis is an academic discipline devoted to
creating programs automatically, given an expectation of how
the program should behave.
We apply and adapt existing academic work to automate
tasks in reverse engineering.
Program Synthesis
What it is Not
“Can I have a web browser?”
Program Synthesizer
“Sure, here you go!”
Program synthesis:
Requires precise behavioral specifications
Operates on small scales
Is primarily researched on loop-free programs
Program Synthesis
The Simplest Possible Program Synthesizer
Starting from a behavioral specification:
int abs(int x) =
x if x >= 0
-x otherwise
Create a harness for testing abs(x) exhaustively:
#include <limits.h>
int main(void) {
  /* INT_MIN is skipped because -INT_MIN overflows. */
  for(long long x = INT_MIN + 1LL; x <= INT_MAX; ++x) {
    int r = abs((int)x);
    if((x >= 0 && r != x) || (x < 0 && r != -x))
      return -1;
  }
  return 0;
}
Enumerate every possible function abs(int x):
int abs(int x) { return -x; }
int abs(int x) { return ~x; }
int abs(int x) { return -(x+0); }
int abs(int x) { return ~(x+0); }
int abs(int x) { return -(x+1); }
int abs(int x) { return ~(x+1); }
Program Synthesis
The Simplest Possible Program Synthesizer
The brute-force synthesizer:
Is extremely slow, and
Explores an infinite number of programs.
We could improve the situation by carefully choosing which
programs to generate.
Modern program synthesis goes further and does better.
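The brute-force idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the grammar (terminals x, 0, 1; unary - and ~; binary + and max) is hypothetical and chosen so that a solution exists, the inputs are limited to an 8-bit range, and Python's unbounded integers stand in for machine ints.

```python
from itertools import product

# Hypothetical toy grammar, not taken from the talk.
TERMINALS = [("x", lambda x: x), ("0", lambda x: 0), ("1", lambda x: 1)]
UNARY = [("-", lambda a: -a), ("~", lambda a: ~a)]
BINARY = [("+", lambda a, b: a + b), ("max", lambda a, b: max(a, b))]

def grow(exprs):
    """Return exprs plus every expression built by applying one more operator."""
    out = list(exprs)
    for (sym, op), (s, f) in product(UNARY, exprs):
        out.append((f"{sym}({s})", lambda x, op=op, f=f: op(f(x))))
    for (sym, op), (s1, f1), (s2, f2) in product(BINARY, exprs, exprs):
        out.append((f"{sym}({s1}, {s2})",
                    lambda x, op=op, f1=f1, f2=f2: op(f1(x), f2(x))))
    return out

def spec(x):
    """Behavioral specification of abs."""
    return x if x >= 0 else -x

def synthesize(max_depth=2, inputs=range(-128, 128)):
    """Enumerate expressions by depth and return the first that meets the spec."""
    exprs = TERMINALS
    for depth in range(max_depth + 1):
        if depth > 0:
            exprs = grow(exprs)
        for text, fn in exprs:
            if all(fn(x) == spec(x) for x in inputs):
                return text, fn
    return None
```

Calling synthesize() returns a pair of expression text and callable. The candidate count grows from 3 to 27 to 1539 across depths 0 to 2, which is the blow-up the slide warns about.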
Introduction
Ingredients
X86 Disassembly and Assembly
IR Translation
IR Interpreter
SMT Integration
Applications
Enumerative Program Synthesis
CPU Emulator Synthesis
Peephole Superdeobfuscation
Template-Based Program Synthesis
Metamorphic Extraction
Conclusion
Ingredients
X86 Disassembly and Assembly
Bytes                   Assembly
28 D8                   sub al, bl
0F B6 DB                movzx ebx, bl
81 C3 BB BB BB BB       add ebx, 0BBBBBBBBh
01 D8                   add eax, ebx
X86 Disassembler: bytes → assembly
X86 Assembler: assembly → bytes
Given bytes, a disassembler produces X86 assembly. An
assembler does the opposite.
Ingredients
IR Translation
add eax, ecx
IR Translator
vRes = vEax + vEcx;
vZF = vRes == 0;
vSF = vRes <s 0;
vPF = Parity(vRes);
vCF = vRes <u vEax;
vOF = (vEcx ^ vRes)
& (vEax ^ vRes) <s 0;
vAF = (vRes ^ vEax ^ vEcx)
& 0x10 != 0;
vEax = vRes;
IR translators convert instructions to a symbolic representation of
their actual effects when executed.
Ingredients
IR Interpreter
vRes = vEax + vEcx;
vZF = vRes == 0;
vSF = vRes <s 0;
vPF = Parity(vRes);
vCF = vRes <u vEax;
vOF = (vEcx ^ vRes)
& (vEax ^ vRes) <s 0;
vAF = (vRes ^ vEax ^ vEcx)
& 0x10 != 0;
vEax = vRes;
IR for add eax, ecx
Input State:  eax 645EDE7Bh  ecx 0FCD02BDEh  zf 0  of 0  sf 1  af 0  pf 1  cf 0
→ IR Interpreter →
Output State: eax 612F0A59h  ecx 0FCD02BDEh  zf 0  of 0  sf 0  af 1  pf 1  cf 1
The IR interpreter executes IR statements in an input state,
producing an output state.
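As a rough illustration, the IR for add eax, ecx can be evaluated directly in Python. The function and dictionary layout below are assumptions made for this sketch, not the talk's interpreter; the input values are the ones shown on the slide.

```python
MASK = 0xFFFFFFFF

def signed_lt_zero(v):
    """<s 0 on a 32-bit value: true when the top bit is set."""
    return (v >> 31) & 1

def parity(v):
    """x86 PF: 1 when the low byte has an even number of set bits."""
    return 1 if bin(v & 0xFF).count("1") % 2 == 0 else 0

def interpret_add_eax_ecx(state):
    """Evaluate the IR statements for `add eax, ecx` on an input state."""
    eax, ecx = state["eax"], state["ecx"]
    res = (eax + ecx) & MASK                         # vRes = vEax + vEcx
    out = dict(state)
    out["zf"] = int(res == 0)                        # vZF = vRes == 0
    out["sf"] = signed_lt_zero(res)                  # vSF = vRes <s 0
    out["pf"] = parity(res)                          # vPF = Parity(vRes)
    out["cf"] = int(res < eax)                       # vCF = vRes <u vEax
    out["of"] = signed_lt_zero((ecx ^ res) & (eax ^ res))
    out["af"] = int(((res ^ eax ^ ecx) & 0x10) != 0)
    out["eax"] = res                                 # vEax = vRes
    return out

INPUT_STATE = {"eax": 0x645EDE7B, "ecx": 0xFCD02BDE,
               "zf": 0, "of": 0, "sf": 1, "af": 0, "pf": 1, "cf": 0}
OUTPUT_STATE = interpret_add_eax_ecx(INPUT_STATE)
```

With the slide's input state, the result has eax = 612F0A59h with cf = 1 and af = 1, matching the output state above.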
Ingredients
SMT Integration
vRes = vEax + vEcx;
vZF = vRes == 0;
vSF = vRes <s 0;
vPF = Parity(vRes);
vCF = vRes <u vEax;
vOF = (vEcx ^ vRes)
& (vEax ^ vRes) <s 0;
vAF = (vRes ^ vEax ^ vEcx)
& 0x10 != 0;
vEax = vRes;
IR for add eax, ecx
Query: vZF == 1 → SMT Solver → Model: vEax = 0, vEcx = 0
SMT solvers can answer questions about the values of
variables and memory within IR sequences.
We queried for a model of eax and ecx where vZF == 1.
Program Synthesis in Reverse Engineering
CPU Emulator Synthesis
CPU
Emulator
Synthesizer
X86 Assembler
Templates
add eax, ecx
vRes = vEax + vEcx;
vZF = vRes == 0;
vSF = vRes <s 0;
vPF = Parity(vRes);
vCF = vRes <u vEax;
vOF = (vEcx ^ vRes)
& (vEax ^ vRes) <s 0;
vAF = (vRes ^ vEax ^ vEcx)
& 0x10 != 0;
vEax = vEax + vEcx;
CPU Emulator Logic
We present a re-designed version of [2] for generating CPU
emulators.
Program Synthesis in Reverse Engineering
Peephole Superdeobfuscation
Obfuscated Code:
push edx
mov edx, 6F6B5081h
mov esi, 0BC6D38Bh
add esi, edx
pop edx
→ Deobfuscator Generator →
Deobfuscator Rule:
push r32_1
mov r32_1, i32_1
mov r32_2, i32_2
add r32_2, r32_1
pop r32_1
Simplifies To:
mov r32_2, i32_1+i32_2
We apply the ideas of [1] to automatically create
deobfuscators for certain obfuscators.
Program Synthesis in Reverse Engineering
Metamorphic Extraction
Functionality
Extractor
lodsb
add al, bl
xor al, 0D2h
sub al, 0F1h
add bl, al
add al, 0B7h
sub al, 90h
push cx
mov cl, bl
add al, bl
pop cx
add al, 90h
sub al, 0B7h
push ebx
mov ebx, 8716AEF1h
push ecx
push eax
xor dword ptr [esp], ebx
; continued
We apply a technique from [3] to automatically recover
metamorphically-generated code sequences.
CPU Emulator Synthesis
Overview
CPU emulators require accurate descriptions of the operation
of the X86 processor.
Sadly, the Intel manuals are at best vague, at worst wrong.
IF 64-Bit Mode
  THEN
    #UD;
  ELSE
    IF ((AL AND 0FH) > 9) or (AF = 1)
      THEN
        AL ← AL + 6;    (WRONG: should be AX ← AX + 0x106)
        AH ← AH + 1;    (WRONG)
        AF ← 1;
        CF ← 1;
        AL ← AL AND 0FH;
      ELSE
        AF ← 0;
        CF ← 0;
        AL ← AL AND 0FH;
    FI;
FI;
That’s from the first instruction in the manual.
CPU Emulator Synthesis
Overview
Idea: use program synthesis to find instruction descriptions.
add eax, ecx
CPU Emulator Synthesizer
vRes = vEax + vEcx;
vZF = vRes == 0;
vSF = vRes <s 0;
vPF = Parity(vRes);
vCF = vRes <u vEax;
vOF = (vEcx ^ vRes)
& (vEax ^ vRes) <s 0;
vAF = (vRes ^ vEax ^ vEcx)
& 0x10 != 0;
vEax = vEax + vEcx;
CPU Emulator Synthesis
Overview
add eax, ecx → Behavior Sampler → I/O Pair Data
Templates, Substitutions → Hypothesis Generator → Hypotheses
I/O Pair Data + Hypotheses → Empirical Sieve → Candidate Hypotheses
Candidate Hypotheses → Equivalence Checker → Single Result
We describe each component next.
1. Hypothesis Generator
2. Behavior Sampler
3. Empirical Sieve
4. Equivalence Checker
CPU Emulator Synthesis
Hypothesis Generator
We generate a family of expressions from a template expression
and a list of substitutions.
Binop(_,+,_)
  vEax
  Dword(0)
Tree representation of vEax + 0
Specified Substitutions:
Token    Replace With
+        +, -, &, |, ^
vEax     vEax, vEcx
The 5 ∗ 2 = 10 expressions generated from the above:
vEax + 0   vEax - 0   vEax & 0   vEax | 0   vEax ^ 0
vEcx + 0   vEcx - 0   vEcx & 0   vEcx | 0   vEcx ^ 0
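The expansion of a template under a substitution list amounts to a Cartesian product. A minimal sketch follows, assuming a flat tuple encoding of the Binop tree; the encoding and the names expand and show are hypothetical, chosen only for this example.

```python
from itertools import product

# Flat stand-in for the tree Binop(vEax, +, Dword(0)).
TEMPLATE = ("Binop", "vEax", "+", "Dword(0)")
SUBSTITUTIONS = {"+": ["+", "-", "&", "|", "^"], "vEax": ["vEax", "vEcx"]}

def expand(template, substitutions):
    """Generate every expression obtained by replacing substitutable tokens."""
    _, lhs, op, rhs = template
    lhs_choices = substitutions.get(lhs, [lhs])
    op_choices = substitutions.get(op, [op])
    rhs_choices = substitutions.get(rhs, [rhs])
    return [("Binop", l, o, r)
            for l, o, r in product(lhs_choices, op_choices, rhs_choices)]

def show(expr):
    """Render an expression the way the slide prints it."""
    _, lhs, op, rhs = expr
    rhs_text = "0" if rhs == "Dword(0)" else rhs
    return f"{lhs} {op} {rhs_text}"

EXPRESSIONS = [show(e) for e in expand(TEMPLATE, SUBSTITUTIONS)]
```

EXPRESSIONS holds the same ten strings as the table, from "vEax + 0" through "vEcx ^ 0".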
CPU Emulator Synthesis
Hypothesis Generator
A result location is a location modified by an instruction.
A hypothesis is an IR expression of equality between a result
location variable and a generated expression.
For the instruction add eax, ebx, the result locations are
vEaxout, vZFout, vSFout, vPFout, vCFout, vOFout, and vAFout.
vZFout == (vEax + vEbx) <s 0      vZFout == ((vEax + vEbx) ^ vEax) >u 0
vSFout == (vEax + vEbx) <s 0      vSFout == ((vEax + vEbx) ^ vEax) >u 0
vPFout == (vEax + vEbx) <s 0      vPFout == ((vEax + vEbx) ^ vEax) >u 0
vCFout == (vEax + vEbx) <s 0      vCFout == ((vEax + vEbx) ^ vEax) >u 0
vOFout == (vEax + vEbx) <s 0      vOFout == ((vEax + vEbx) ^ vEax) >u 0
vAFout == (vEax + vEbx) <s 0      vAFout == ((vEax + vEbx) ^ vEax) >u 0
Some hypotheses for add eax, ebx
CPU Emulator Synthesis
Behavior Sampler
add eax, ecx + Input State → Behavior Sampler → Output State
Given a set of register and flag values as input:
1. JIT assemble the instruction
2. Set the processor registers and flags to the input state
3. Execute the instruction
4. Collect the values of the registers and flags as output
These I/O pairs are samples of the instruction’s behavior.
CPU Emulator Synthesis
Empirical Sieve
I/O Data:    eaxin = 0, ecxin = 0, cfout = 0
Hypothesis:  CFout == (vEax == vEcx)
Evaluation:  0 == (0 == 0)  →  0 == 1  →  false
If a hypothesis is false in any I/O pair, it cannot describe the
instruction’s behavior, so we discard it.
In the figure above, CFout == (vEax == vEcx) is an invalid
hypothesis because it evaluates to false for the pair listed.
CPU Emulator Synthesis
Empirical Sieve
I/O Pairs + Hypothesis Generator → IR Interpreter → Unfalsified Hypotheses
We generate many hypotheses, test them against the I/O
pairs, and discard any that evaluate to false.
Multiple unfalsified hypotheses may remain after the empirical
sieve. The next component removes more of them.
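A small sketch of the sieve, under stated assumptions: hypotheses are written as Python callables rather than IR expressions, and sample_add is a software model standing in for the hardware behavior sampler. The two I/O pairs are picked by hand so that one wrong hypothesis survives, as in the pipeline figure.

```python
MASK = 0xFFFFFFFF

def sample_add(eax, ecx):
    """Stand-in for the behavior sampler: models CF after `add eax, ecx`."""
    return {"eax": eax, "ecx": ecx, "cf": int(eax + ecx > MASK)}

# Each hypothesis predicts CFout from the input registers.
HYPOTHESES = {
    "CFout == (vEax == vEcx)": lambda eax, ecx: int(eax == ecx),
    "CFout == (vEcx <s 0)": lambda eax, ecx: (ecx >> 31) & 1,
    "CFout == (vEax + vEcx <u vEax)":
        lambda eax, ecx: int(((eax + ecx) & MASK) < eax),
}

def sieve(hypotheses, io_pairs):
    """Keep only hypotheses that agree with every sampled I/O pair."""
    return {name: fn for name, fn in hypotheses.items()
            if all(fn(p["eax"], p["ecx"]) == p["cf"] for p in io_pairs)}

IO_PAIRS = [sample_add(0, 0), sample_add(0xFFFFFFFF, 0x80000000)]
SURVIVORS = sieve(HYPOTHESES, IO_PAIRS)
```

The pair (0, 0) falsifies CFout == (vEax == vEcx), while CFout == (vEcx <s 0) happens to agree with both samples and survives, which is why a further filtering step is needed.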
CPU Emulator Synthesis
Equivalence Checker
Equivalence checking uses an SMT solver to determine
whether two expressions evaluate equally for all inputs. If not,
it finds an input that causes a different evaluation.
Two hypotheses for ZFout:
Expression #1: (vEax+vEcx) == 0
Expression #2: (vEcx+vEax+vEcx+vEax) == 0
We query an SMT solver as to the satisfiability of
((vEax+vEcx) == 0) != ((vEcx+vEax+vEcx+vEax) == 0).
If UNSAT, the hypotheses always behave the same.
If SAT, the solution contains values for vEax and vEcx that
cause the hypotheses to evaluate differently.
CPU Emulator Synthesis
Equivalence Checker
The SMT solver reports satisfiability, and gives a model where
the first hypothesis for ZFout evaluates to 0, and the other to 1.
Model for ((vEax+vEcx) == 0) != ((vEcx+vEax+vEcx+vEax) == 0):
eaxin = 3d800000h
ecxin = 42800000h
If we sample the instruction’s behavior in this input state, one
of the hypotheses must become falsified, since:
If zfout = 1, then ZFout == ((vEax+vEcx) == 0) is falsified.
If zfout = 0, then ZFout == ((vEcx+vEax+vEcx+vEax) == 0) is falsified.
Here the sum is 80000000h, which is nonzero, so zfout = 0 and
hypothesis #2 is falsified.
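The model can be checked by hand or with a few lines of Python. In this sketch the exhaustive 8-bit search is only a stand-in for the SMT solver (an assumption for illustration; a real solver handles the 32-bit query symbolically), and the function names are hypothetical.

```python
def hyp1(eax, ecx, mask):
    """Expression #1: (vEax+vEcx) == 0 at the given bit width."""
    return int(((eax + ecx) & mask) == 0)

def hyp2(eax, ecx, mask):
    """Expression #2: (vEcx+vEax+vEcx+vEax) == 0 at the given bit width."""
    return int(((ecx + eax + ecx + eax) & mask) == 0)

def find_distinguishing_input(bits):
    """Brute-force stand-in for the solver: first input where the two differ."""
    mask = (1 << bits) - 1
    for eax in range(mask + 1):
        for ecx in range(mask + 1):
            if hyp1(eax, ecx, mask) != hyp2(eax, ecx, mask):
                return eax, ecx
    return None  # corresponds to UNSAT: the expressions are equivalent

# The 32-bit model from the slide.
MASK32 = 0xFFFFFFFF
MODEL = (0x3D800000, 0x42800000)
MODEL_RESULTS = (hyp1(*MODEL, MASK32), hyp2(*MODEL, MASK32))
```

MODEL_RESULTS is (0, 1): the sum is 80000000h, so expression #1 is false, while doubling it wraps around to zero and makes expression #2 true. The 8-bit search finds the analogous input (0, 80h).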
CPU Emulator Synthesis
Hypothesis Generator
add eax, ecx → Behavior Sampler → I/O Pair Data
Hypothesis Generator templates:
<out> == <in>,
<out> == <in> <+> <in>,
. . .
Hypotheses:
CFout == (vEax == vEcx),
CFout == (vEcx <s 0),
CFout == (vEax + vEcx <u vEax),
. . .
After the Empirical Sieve:
CFout == (vEcx <s 0),
CFout == (vEax + vEcx <u vEax),
. . .
After the Equivalence Checker:
CFout == (vEax + vEcx <u vEax)
First, we generate hypotheses for the instruction’s effects.
CPU Emulator Synthesis
Behavior Sampler
Next, we collect I/O pairs for an instruction.
CPU Emulator Synthesis
Empirical Sieve
Remove hypotheses that are shown to be false by any I/O pair.
CPU Emulator Synthesis
Equivalence Checking
The equivalence checker removes all but one hypothesis.
Peephole Superdeobfuscation
Rule-Based Obfuscation
Obfuscator Rules
Unobfuscated          Obfuscated            Constraints
add r8_1, r8_2        add r8_1, imm8
                      add r8_1, r8_2
                      sub r8_1, imm8
sub r32_1, r32_2      push r32_3            r32_3 ≠ r32_1
                      mov r32_3, r32_2      r32_1 ≠ esp
                      sub r32_1, r32_3      r32_1 ≠ esp
                      pop r32_3             r32_3 ≠ esp
Some obfuscators use a rule set to randomly permute code.
The ordinary and obfuscated sequences must have the same
(or at least “similar”) effects when executed.
Peephole Superdeobfuscation
Obfuscation
Replace instructions with obfuscated sequences. Each pass rewrites
instructions produced by the previous pass.
Original:
add al, bl
Pass 1:
sub al, 9Fh
add al, bl
add al, 9Fh
Pass 2:
sub al, 9Fh
sub al, 7Fh
add al, bl
add al, 7Fh
push edx
mov dh, 22h
inc dh
dec dh
add dh, 7Dh
add al, dh
pop edx
Pass 3:
push cx
mov ch, 9Fh
sub al, ch
pop cx
sub al, 7Fh
sub al, 70h
add al, bl
add al, 70h
add al, 7Fh
push edx
push ecx
mov ch, 22h
mov dh, ch
pop ecx
inc dh
dec dh
add dh, 7Dh
add al, dh
pop edx
Peephole Superdeobfuscation
Deobfuscation with Inverse Rules
Replace obfuscated sequences with instructions. Applying the inverse
rules walks the same passes in reverse, from pass 3 back to the
original add al, bl.
Peephole Superdeobfuscation
Rule-Based Deobfuscation
Deobfuscator Rules
Obfuscated            Unobfuscated          Constraints
add r8_1, imm8        add r8_1, r8_2
add r8_1, r8_2
sub r8_1, imm8
push r32_3            sub r32_1, r32_2      r32_3 ≠ r32_1
mov r32_3, r32_2                            r32_1 ≠ esp
sub r32_1, r32_3                            r32_1 ≠ esp
pop r32_3                                   r32_3 ≠ esp
We could inspect the code manually, pick out patterns, and
write deobfuscator rules.
This is tedious and error-prone.
We automate the process with program synthesis.
Peephole Superdeobfuscation
The Idea: Obfuscated/Unobfuscated Behavior Must Match
Running example:
Obfuscated: lea esp, [esp-4] / mov [esp], eax
Candidate: push eax
1. Collect obfuscated sequences.
2. Generate I/O pairs by running the obfuscated sequence in the IR interpreter.
   Input State: eax 645EDE7Bh, esp 0FCD02BDEh
   Output State #1: esp 0FCD02BDAh, mem[esp] = 0x645EDE7B
3. Preparing for candidate generation, collect registers and constants
   from the obfuscated sequence.
   Operand Parts: eax, esp, 4
4. Plug the registers and constants into the templates to generate
   candidate replacements.
   Candidate Templates: push r32 / push i32 / add r32, i32
   Candidates: push eax / push esp / push 4 / add esp, 4 / add eax, 4
5. For each candidate, generate I/O pairs in the same input states
   as for the original sequence.
   Output State #2 (push eax): esp 0FCD02BDAh, mem[esp] = 0x645EDE7B
6. Compare the resulting states. We can be lenient for flags and
   stack memory. If they match, the candidate is a potential
   replacement for the obfuscated sequence.
7. If the states matched, use an SMT solver to ensure that the
   sequences are actually equivalent. Learn rules if so.
   Deobfuscator Rule: lea esp, [esp-4] / mov [esp], eax → push eax
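The candidate search can be sketched with a toy interpreter. Everything here is an assumption made for illustration: the tuple encoding of instructions, the tiny instruction subset, the dword-granular memory dictionary, and the second input state. Flags are not modeled, and the final SMT equivalence check is omitted.

```python
MASK = 0xFFFFFFFF

def run(program, regs):
    """Interpret a tiny, hypothetical x86 subset; memory maps address -> dword."""
    regs, mem = dict(regs), {}
    for ins in program:
        op = ins[0]
        if op == "lea":      # ("lea", dst, base, disp): dst = base + disp
            regs[ins[1]] = (regs[ins[2]] + ins[3]) & MASK
        elif op == "store":  # ("store", base, src): mem[base] = src
            mem[regs[ins[1]]] = regs[ins[2]]
        elif op == "push":   # ("push", register name or immediate)
            value = regs[ins[1]] if isinstance(ins[1], str) else ins[1]
            regs["esp"] = (regs["esp"] - 4) & MASK
            mem[regs["esp"]] = value
        elif op == "add":    # ("add", reg, imm)
            regs[ins[1]] = (regs[ins[1]] + ins[2]) & MASK
        else:
            raise ValueError(f"unsupported instruction: {ins!r}")
    return regs, mem

def candidates(registers, constants):
    """Instantiate the templates push r32 / push i32 / add r32, i32."""
    out = [[("push", r)] for r in registers]
    out += [[("push", c)] for c in constants]
    out += [[("add", r, c)] for r in registers for c in constants]
    return out

# lea esp, [esp-4] / mov [esp], eax
OBFUSCATED = [("lea", "esp", "esp", -4), ("store", "esp", "eax")]
INPUT_STATES = [
    {"eax": 0x645EDE7B, "esp": 0xFCD02BDE},  # input state from the slide
    {"eax": 0x00000004, "esp": 0x0012FF00},  # extra, arbitrarily chosen state
]

def matching_candidates():
    """Keep candidates whose output states equal the obfuscated sequence's."""
    expected = [run(OBFUSCATED, s) for s in INPUT_STATES]
    return [c for c in candidates(["eax", "esp"], [4])
            if [run(c, s) for s in INPUT_STATES] == expected]
```

Only push eax survives. In the second state push 4 matches by coincidence, since eax happens to hold 4, which is one reason to test in more than one input state before calling the solver.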
Peephole Superdeobfuscation
Generalization
Deobfuscator Rules
Obfuscated            Deobfuscated     Constraints
lea esp, [esp-4]      push eax
mov [esp], eax
Generalized Rules
Obfuscated            Deobfuscated     Constraints
lea esp, [esp-4]      push r32         r32 ≠ esp
mov [esp], r32
Our deobfuscator rule is specific to the register eax.
In fact, eax can be replaced with any register except esp.
Generalization removes specific register and/or immediate
values from deobfuscation rules.
Peephole Superdeobfuscation
In Action
We iterate through the instructions, trying to simplify those within a window.
Obfuscated input:
sub esp, 4
mov [esp], eax
mov eax, ebp
push ebp
push edi
mov [esp], eax
xor [esp], 7FEE03B1h
pop ebp
sub esp, 4
mov [esp], esi
push edx
mov edx, 6F6B5081h
mov esi, 0BC6D38Bh
add esi, edx
pop edx
1. Learn rule and generalize. sub esp, 4 / mov [esp], eax becomes push eax.
2. No simplification at the next three window positions.
3. Learn rule and generalize. push edi / mov [esp], eax becomes push eax.
4. No simplification.
5. Learn rule and generalize. push eax / xor [esp], 7FEE03B1h / pop ebp
   becomes mov ebp, eax / xor ebp, 7FEE03B1h.
6. No simplification.
7. Apply previous simplification. sub esp, 4 / mov [esp], esi becomes push esi.
8. No simplification at the next four window positions.
9. Learn rule and generalize. The last five instructions become
   mov esi, 7B32240Ch.
Learned rules:
Obfuscated            Deobfuscated              Constraints
sub esp, 4            push r32                  r32 ≠ esp
mov [esp], r32
push r32_1            push r32_2                r32_2 ≠ esp
mov [esp], r32_2
push r32_1            mov r32_2, r32_1          r32_1 ≠ esp
xor [esp], i32        xor r32_2, i32            r32_2 ≠ esp
pop r32_2
push r32_1            mov r32_2, i32_1+i32_2    r32_1 ≠ esp
mov r32_1, i32_1                                r32_2 ≠ esp
mov r32_2, i32_2                                r32_1 ≠ r32_2
add r32_2, r32_1
pop r32_1
Result:
push eax
mov eax, ebp
push ebp
mov ebp, eax
xor ebp, 7FEE03B1h
push esi
mov esi, 7B32240Ch
Continue ad infinitum.
Peephole Superdeobfuscation
System Output
Deobfuscator Rules
Code Generator
C Code, Python Code, OCaml Code
We can turn a set of rules into a program that uses
pattern-matching to implement those transformations.
We can generate a deobfuscator automatically, and also the
obfuscator that created the code in the first place!
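To give a feel for what a generated pattern-matching deobfuscator might look like, here is a hand-written Python sketch. It is not the talk's code generator output: the tuple encoding of instructions, the r32_*/i32_* variable convention, and the rule representation are all assumptions for this example. The four rules are the ones learned in the walkthrough.

```python
MASK = 0xFFFFFFFF
REGS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}

def match(pattern, window):
    """Bind r32_*/i32_* pattern variables against concrete instructions."""
    bind = {}
    for pat, ins in zip(pattern, window):
        if len(pat) != len(ins):
            return None
        for p, v in zip(pat, ins):
            if isinstance(p, str) and p.startswith("r32_"):
                if v not in REGS or bind.setdefault(p, v) != v:
                    return None
            elif isinstance(p, str) and p.startswith("i32_"):
                if not isinstance(v, int) or bind.setdefault(p, v) != v:
                    return None
            elif p != v:          # literal mnemonic, register, or constant
                return None
    return bind

# Each rule: (pattern, constraint on bindings, rewrite from bindings).
RULES = [
    ([("sub", "esp", 4), ("mov", "[esp]", "r32_1")],
     lambda b: b["r32_1"] != "esp",
     lambda b: [("push", b["r32_1"])]),
    ([("push", "r32_1"), ("mov", "[esp]", "r32_2")],
     lambda b: b["r32_2"] != "esp",
     lambda b: [("push", b["r32_2"])]),
    ([("push", "r32_1"), ("xor", "[esp]", "i32_1"), ("pop", "r32_2")],
     lambda b: "esp" not in (b["r32_1"], b["r32_2"]),
     lambda b: [("mov", b["r32_2"], b["r32_1"]),
                ("xor", b["r32_2"], b["i32_1"])]),
    ([("push", "r32_1"), ("mov", "r32_1", "i32_1"), ("mov", "r32_2", "i32_2"),
      ("add", "r32_2", "r32_1"), ("pop", "r32_1")],
     lambda b: "esp" not in (b["r32_1"], b["r32_2"]) and b["r32_1"] != b["r32_2"],
     lambda b: [("mov", b["r32_2"], (b["i32_1"] + b["i32_2"]) & MASK)]),
]

def simplify(code, rules):
    """Apply rules until no window matches; every rewrite shortens the code."""
    code = list(code)
    changed = True
    while changed:
        changed = False
        for i in range(len(code)):
            for pattern, constraint, rewrite in rules:
                window = code[i:i + len(pattern)]
                if len(window) < len(pattern):
                    continue
                bind = match(pattern, window)
                if bind is not None and constraint(bind):
                    code[i:i + len(pattern)] = rewrite(bind)
                    changed = True
                    break
            if changed:
                break
    return code

OBFUSCATED = [
    ("sub", "esp", 4), ("mov", "[esp]", "eax"), ("mov", "eax", "ebp"),
    ("push", "ebp"), ("push", "edi"), ("mov", "[esp]", "eax"),
    ("xor", "[esp]", 0x7FEE03B1), ("pop", "ebp"),
    ("sub", "esp", 4), ("mov", "[esp]", "esi"),
    ("push", "edx"), ("mov", "edx", 0x6F6B5081), ("mov", "esi", 0x0BC6D38B),
    ("add", "esi", "edx"), ("pop", "edx"),
]
```

Running simplify(OBFUSCATED, RULES) reduces the fifteen instructions to seven, ending in mov esi, 7B32240Ch. Reading each rule right to left would give the corresponding obfuscator.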
Peephole Superdeobfuscation
Limitations
mov eax, 12h
or edx, eax
add eax, 34h
jmp @end
bswap ax
bsr edx, 12h
salc
cmc
@end:
xor edx, edx
mov eax, [esp]
add esp, 4
ret
We could simplify this if we knew that edx was dead (i.e., not used
before its next modification).
Ours is a forward analysis; dead code elimination requires
backwards analysis.
Can incorporate a liveness analysis for this purpose.
Template-Based Program Synthesis
Overview
Question: is it possible to create the function x+1 by using two of
~ (not) and/or - (neg)?
Table: All Possible Sequences
y = ~x; z = ~y;        y = -x; z = ~y;
y = ~x; z = -y;        y = -x; z = -y;
Template-Based Program Synthesis
Templates
Problem phrased as a template: what do bop1 and bop2 need to be
set to, so that f(x) == x+1 for all values of x?
bool bop1, bop2;
int f(int x)
{
int y = bop1 ? -x : ~x;
return bop2 ? -y : ~y;
}
Template-Based Program Synthesis
Solving
After phrasing the question appropriately:
English                               Mathematics
Are there values of bop1, bop2        ∃ bop1, bop2 ∈ Bool ·
Such that, for all values of x        ∀ x ∈ BV[32] ·
In the code
  y = bop1 ? -x : ~x                  let y = bop1 ? -x : ~x in
  z = bop2 ? -y : ~y                  let z = bop2 ? -y : ~y in
z == x+1 is always true?              z == x+1
An SMT solver gives bop1 = false, bop2 = true.
I.e., int f(int x) { return -~x; }
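Because the template has only four instantiations, the ∃∀ query can be mimicked by brute force. The sketch below assumes 8-bit values instead of 32-bit ones so that the "for all x" part can be checked exhaustively; a solver would handle the full width symbolically.

```python
from itertools import product

BITS = 8                      # scaled down from 32 bits for exhaustive checking
MASK = (1 << BITS) - 1

def f(x, bop1, bop2):
    """The template: each flag selects neg (True) or not (False)."""
    y = (-x if bop1 else ~x) & MASK
    return (-y if bop2 else ~y) & MASK

def solve():
    """Find bop1, bop2 such that f(x) == x+1 for every x at this width."""
    for bop1, bop2 in product([False, True], repeat=2):
        if all(f(x, bop1, bop2) == (x + 1) & MASK for x in range(MASK + 1)):
            return bop1, bop2
    return None
```

solve() returns (False, True), i.e. y = ~x followed by z = -y, which agrees with the solver's answer above.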
Template-Based Program Synthesis
Extending the Framework
We can extend the idea to use more than two operator types:
char op1;
int f(int x)
{
  int y = op1 == 0 ? -x :
          op1 == 1 ? ~x :
                     x-1;
  return y;
}
Or reference further constants that the solver must provide:
bool op1;
char c1;
int f(int x)
{
  int y = op1 ? x + c1 :
                x ^ c1;
  return y;
}
Metamorphic Extraction
Metamorphic Decoder Generation
A metamorphic engine generates code like the following:
lodsb
op al, bl
op al, i8_1
op al, i8_2
op bl, al
Where each op can be add, sub, or xor, and i8_1 / i8_2 are 8-bit
constants. I.e., 3^4 ∗ 2^8 ∗ 2^8 ≈ 5.3 million possible instances.
Example metamorphic decoder:
lodsb
add al, bl
xor al, 0D2h
sub al, 0F1h
add bl, al
Metamorphic Extraction
Further Obfuscation
Metamorphic code:
lodsb
add al, bl
xor al, 0D2h
sub al, 0F1h
add bl, al
→ Obfuscator →
add al, 0B7h
sub al, 90h
push cx
mov cl, bl
add al, bl
pop cx
add al, 90h
sub al, 0B7h
push ebx
mov ebx, 8716AEF1h
push ecx
push eax
xor dword ptr [esp], ebx
; continued
The metamorphic code is then further obfuscated.
Metamorphic Extraction
Goal
lodsb
op al, bl
op al, i81
op al, i82
op bl, al
add al, 0B7h
sub al, 90h
push cx
mov cl, bl
add al, bl
pop cx
add al, 90h
sub al, 0B7h
push ebx
mov ebx, 8716AEF1h
push ecx
push eax
xor dword ptr [esp], ebx
; continued
Re-Create
Given an obfuscated sequence, we want to determine the
underlying metamorphically-generated function.
I.e., would like to know the ops and i8s.
We re-create the information instead of deobfuscating.
Let’s explore two approaches based on program synthesis.
Metamorphic Extraction
Template for Metamorphic Functionality
Metamorphic code:
lodsb
op al, bl
op al, i8_1
op al, i8_2
op bl, al
Template:
al = Load(vMem,vEsi,8);
a = op1 == 0 ? al + vBl : op1 == 1 ? al - vBl : al ^ vBl;
b = op2 == 0 ? a + c1 : op2 == 1 ? a - c1 : a ^ c1;
c = op3 == 0 ? b + c2 : op3 == 1 ? b - c2 : b ^ c2;
d = op4 == 0 ? vBl + c : op4 == 1 ? vBl - c : vBl ^ c;
X86-related variables:
al is the value read by lodsb (from the address in esi)
vBl is the initial value of bl
d is the final value of bl
Template parameters:
Constants c1, c2
Operators op1, op2, op3, op4
0:add, 1:sub, 2+:xor
Metamorphic Extraction
Straightforward Approach Using ∃ · ∀·
[Figure: the template program, the IR for the obfuscated X86, and the IR assertions are combined into a query for the SMT solver, which returns values for op1, op2, op3, op4, c1, c2.]
Query:
∃ op1, op2, op3, op4, c1, c2 ·
∀ input X86 states ·
d == vBlAfter
We can solve directly using the quantifiers ∃, ∀.
Quantifiers can slow solving, so we show an alternative.
Metamorphic Extraction
Collect I/O Pairs
[Figure: the IR interpreter runs the IR for the obfuscated X86 on an input state and produces an output state.]
Collect I/O pairs for the obfuscated X86 sequence.
Extract the important parts of the states.
Parts of state needed for synthesis:
al = 0x00, vBl = 0x00, vBlAfter = 0xE1
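Collecting I/O pairs can be sketched with a toy interpreter. Everything here is an assumption for illustration: the tuple-based PROGRAM format and the functions run, collect_io_pair, and sample_io_pairs are hypothetical stand-ins for the IR and IR interpreter, the deobfuscated decoder stands in for the obfuscated sequence, and al is taken to be the byte already loaded by lodsb.

```python
import random

# Stand-in for the obfuscated sequence: the example decoder after lodsb.
PROGRAM = [
    ("add", "al", "bl"),
    ("xor", "al", 0xD2),
    ("sub", "al", 0xF1),
    ("add", "bl", "al"),
]

def run(program, state):
    """Execute 8-bit add/sub/xor instructions over a copy of the register state."""
    regs = dict(state)
    for mnemonic, dst, src in program:
        value = regs[src] if isinstance(src, str) else src
        if mnemonic == "add":
            regs[dst] = (regs[dst] + value) & 0xFF
        elif mnemonic == "sub":
            regs[dst] = (regs[dst] - value) & 0xFF
        elif mnemonic == "xor":
            regs[dst] = regs[dst] ^ value
        else:
            raise ValueError(f"unsupported mnemonic: {mnemonic}")
    return regs

def collect_io_pair(program, al, v_bl):
    """Run one input state and keep only the parts needed for synthesis."""
    out = run(program, {"al": al, "bl": v_bl})
    return {"al": al, "vBl": v_bl, "vBlAfter": out["bl"]}

def sample_io_pairs(program, count, seed=0):
    """Collect I/O pairs for randomly chosen input states."""
    rng = random.Random(seed)
    return [collect_io_pair(program, rng.randrange(256), rng.randrange(256))
            for _ in range(count)]

if __name__ == "__main__":
    print(collect_io_pair(PROGRAM, 0x00, 0x00))
```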
Metamorphic Extraction
Create Witnesses
Plug the I/O pair
al = 0x00, vBl = 0x00, vBlAfter = 0xE1
into the template
a = op1 == 0 ? al + vBl : op1 == 1 ? al - vBl : al ^ vBl;
b = op2 == 0 ? a + c1 : op2 == 1 ? a - c1 : a ^ c1;
c = op3 == 0 ? b + c2 : op3 == 1 ? b - c2 : b ^ c2;
d = op4 == 0 ? vBl + c : op4 == 1 ? vBl - c : vBl ^ c;
to obtain a witness:
a_1 = op1 == 0 ? 0x00 + 0x00 : op1 == 1 ? 0x00 - 0x00 : 0x00 ^ 0x00;
b_1 = op2 == 0 ? a_1 + c1 : op2 == 1 ? a_1 - c1 : a_1 ^ c1;
c_1 = op3 == 0 ? b_1 + c2 : op3 == 1 ? b_1 - c2 : b_1 ^ c2;
d_1 = op4 == 0 ? 0x00 + c_1 : op4 == 1 ? 0x00 - c_1 : 0x00 ^ c_1;
assert(d_1 == 0xE1);
Here a_1, b_1, c_1, d_1 are the intermediate values for this witness; c1 and c2 are still the unknown template constants.
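A witness can be viewed as a predicate over the template parameters, obtained by fixing the I/O pair. The sketch below assumes that view; make_witness and apply_op are hypothetical names, and plain Python evaluation replaces the SMT constraint that the slides build.

```python
def apply_op(op, x, y):
    """Operator encoding from the template: 0 is add, 1 is sub, 2 or more is xor."""
    if op == 0:
        return (x + y) & 0xFF
    if op == 1:
        return (x - y) & 0xFF
    return (x ^ y) & 0xFF

def make_witness(al, v_bl, v_bl_after):
    """Fix one I/O pair; the result checks whether parameters reproduce it."""
    def witness(params):
        op1, op2, op3, op4, c1, c2 = params
        a_1 = apply_op(op1, al, v_bl)
        b_1 = apply_op(op2, a_1, c1)
        c_1 = apply_op(op3, b_1, c2)
        d_1 = apply_op(op4, v_bl, c_1)
        return d_1 == v_bl_after
    return witness

if __name__ == "__main__":
    w = make_witness(0x00, 0x00, 0xE1)
    print(w((0, 2, 1, 0, 0xD2, 0xF1)))  # True: the example decoder's parameters
    print(w((0, 0, 0, 0, 0x00, 0xE1)))  # True: also consistent with this one pair
```

The second call shows why a single witness is not enough: many parameter choices agree with one I/O pair, so further witnesses are needed to narrow them down.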
Metamorphic Extraction
Synthesizing Candidate Functions
[Figure: the witnesses are passed to the SMT solver. UNSAT: the function doesn’t exist. SAT: values for op1, op2, op3, op4, c1, c2.]
Query for template parameters that satisfy the witnesses.
If they exist, create a function from them:
a = al ^ vBl;
b = a ^ 0x4A;
c = b ^ 0x85;
d = vBl + c;
Metamorphic Extraction
Equivalence Checking
Our function is only known to be valid for the witnesses seen so far.
Does its behavior always match the X86?
[Figure: the synthesized function, the obfuscated X86, and the IR assertions are combined with the query d != vBlAfter and passed to the SMT solver. UNSAT: the function is valid. SAT: a state causing a difference.]
If the formula is satisfiable, the output is a counterexample:
a state causing a difference in execution.
al = 0x00, vBl = 0x88
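Because only two 8-bit inputs matter here, the equivalence check can be illustrated by exhaustive enumeration instead of an SMT query. This is a sketch under that assumption, not the method used in the talk; synthesized, decoder, and find_counterexample are hypothetical names, and decoder stands in for the obfuscated X86.

```python
from itertools import product

def synthesized(al, v_bl):
    """The candidate function from the synthesis step."""
    a = al ^ v_bl
    b = a ^ 0x4A
    c = b ^ 0x85
    return (v_bl + c) & 0xFF

def decoder(al, v_bl):
    """Effect of the example decoder on bl (stand-in for the obfuscated X86)."""
    al = (al + v_bl) & 0xFF
    al ^= 0xD2
    al = (al - 0xF1) & 0xFF
    return (v_bl + al) & 0xFF

def find_counterexample(f, g):
    """Return an input (al, vBl) on which f and g differ, or None if equivalent."""
    for al, v_bl in product(range(256), repeat=2):
        if f(al, v_bl) != g(al, v_bl):
            return al, v_bl
    return None

if __name__ == "__main__":
    print(find_counterexample(synthesized, decoder))
    # The state from the slide is also a counterexample:
    print(synthesized(0x00, 0x88) != decoder(0x00, 0x88))  # True
```

An SMT solver answers the same question symbolically, which matters once the state space is too large to enumerate.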
Metamorphic Extraction
Refinement
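If the function did not match the X86, use the counterexample state to
create a new witness. Repeat until error or success.
[Figure: refinement loop. The witnesses feed synthesis. Synthesis UNSAT: can’t synthesize. Synthesis SAT: run the equivalence check. Equivalence check SAT: create a new witness and synthesize again. Equivalence check UNSAT: the function is valid.]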
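The whole synthesize, check, refine loop can be sketched in plain Python. This sketch swaps both SMT queries for brute-force enumeration, which is feasible only because the template has about 5.3 million instances and the relevant state is 16 bits. The names template, target, consistent, find_counterexample, and cegis are hypothetical, and target stands in for the obfuscated X86.

```python
from itertools import product

def apply_op(op, x, y):
    """0 is add, 1 is sub, 2 or more is xor; results wrap to 8 bits."""
    if op == 0:
        return (x + y) & 0xFF
    if op == 1:
        return (x - y) & 0xFF
    return x ^ y

def template(params, al, v_bl):
    op1, op2, op3, op4, c1, c2 = params
    a = apply_op(op1, al, v_bl)
    b = apply_op(op2, a, c1)
    c = apply_op(op3, b, c2)
    return apply_op(op4, v_bl, c)

def target(al, v_bl):
    """Stand-in for the obfuscated X86: the example decoder's effect on bl."""
    al = (((al + v_bl) & 0xFF) ^ 0xD2)
    al = (al - 0xF1) & 0xFF
    return (v_bl + al) & 0xFF

def consistent(params, witnesses):
    """True if the parameters reproduce every witness seen so far."""
    for al, v_bl, out in witnesses:
        if template(params, al, v_bl) != out:
            return False
    return True

def find_counterexample(params):
    """Equivalence check by enumeration; returns a differing input or None."""
    for al, v_bl in product(range(256), repeat=2):
        if template(params, al, v_bl) != target(al, v_bl):
            return al, v_bl
    return None

def cegis(initial_inputs):
    witnesses = [(al, v_bl, target(al, v_bl)) for al, v_bl in initial_inputs]
    # Witnesses only accumulate, so a rejected candidate stays rejected and
    # a single pass over the parameter space is enough.
    candidates = product(range(3), range(3), range(3), range(3),
                         range(256), range(256))
    rounds = 0
    for params in candidates:
        if not consistent(params, witnesses):
            continue
        rounds += 1
        cex = find_counterexample(params)
        if cex is None:
            return params, rounds               # function is valid
        witnesses.append((*cex, target(*cex)))  # create a new witness
    return None, rounds                         # can't synthesize

if __name__ == "__main__":
    print(cegis([(0x00, 0x00)]))
```

The returned parameters need not be literally add, xor 0D2h, sub 0F1h, add: any equivalent instance, such as one using add 0x0F in place of sub 0xF1, passes the equivalence check.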
Metamorphic Extraction
In Action
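lodsb
add al, bl
xor al, 0D2h
sub al, 0F1h
add bl, al
The deobfuscated sequence is shown for clarity. Our analysis
works upon obfuscated sequences.
Each iteration synthesizes a program from the witnesses, then tries
to find a counterexample.
Iteration 1
Synthesized program: a = al ^ vBl; b = a ^ 0x4A; c = b ^ 0x85; d = vBl + c;
Counterexample: vBl = 0x88, al = 0x00
Found: generate new witness.
Iteration 2
Synthesized program: a = al + vBl; b = a ^ 0xD9; c = b - 0xE8; d = vBl + c;
Counterexample: vBl = 0x20, al = 0x10
Found: generate new witness.
Iteration 3
Synthesized program: a = al + vBl; b = a ^ 0xD3; c = b + 0x0E; d = vBl + c;
Counterexample: vBl = 0x23, al = 0x08
Found: generate new witness.
Iteration 4
Synthesized program: a = al + vBl; b = a ^ 0xD2; c = b + 0x0F; d = vBl + c;
Counterexample: NONE
Not found: function is valid.
Note that c = b + 0x0F is the same as sub al, 0F1h, since 0x0F == -0xF1 modulo 2^8.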
Conclusion
New Course Offering
New training course offering on SMT-based binary program analysis.
Written for low-level people comfortable programming in
Python; no particular math or CS background required.
Learn what SMT solvers are and how to use them.
Lecture material vividly illustrated like these slides.
Students construct a minimal, yet fully functional SMT-based
program analysis framework in Python.
Dozens of small, guided programming exercises.
Code an SMT solver, X86 → IR translator, ROP compiler*.
Available now!
* ROP compiler application subject to potential replacement pending
forthcoming regulation of the computer security industry.
Questions?
rolf@msreverseengineering.com
Check out Möbius Strip Reverse Engineering at:
http://www.msreverseengineering.com
Program analysis training classes
Reverse engineering training classes
Consulting services
Blog, research archive, and other resources
Thanks
My proofreaders are awesome:
Igor Skochinsky
William Whistler
Vijay D’Silva
People who publish inspiring work are awesome, too.
References
[1] Sorav Bansal and Alex Aiken. Automatic generation of peephole superoptimizers. In ACM SIGPLAN Notices, volume 41, pages 394–403. ACM, 2006.
[2] Patrice Godefroid and Ankur Taly. Automated synthesis of symbolic instruction encodings from I/O samples. In ACM SIGPLAN Notices, volume 47, pages 441–452. ACM, 2012.
[3] Sumit Gulwani, Susmit Jha, Ashish Tiwari, and Ramarathnam Venkatesan. Synthesis of loop-free programs. In ACM SIGPLAN Notices, volume 46, pages 62–73. ACM, 2011.

NSC #2 - D1 01 - Rolf Rolles - Program synthesis in reverse engineering

  • 1. Program Synthesis in Reverse Engineering Rolf Rolles M¨obius Strip Reverse Engineering http://www.msreverseengineering.com No Such Conference Keynote Speech November 19th, 2014
  • 2. Program Synthesis in Reverse Engineering Overview Desired Behavior Program Synthesizer Program Program synthesis is an academic discipline devoted to creating programs automatically, given an expectation of how the program should behave. We apply and adapt existing academic work to automate tasks in reverse engineering.
  • 3. Program Synthesis What it is Not “Can I have a web browser?” Program Synthesizer “Sure, here you go!”
  • 4. Program Synthesis What it is Not “Can I have a web browser?” Program Synthesizer “Sure, here you go!” Program synthesis: Requires precise behavioral specifications Operates on small scales Is primarily researched on loop-free programs
  • 5. Program Synthesis The Simplest Possible Program Synthesizer Starting from a behavioral specification: int abs(int x) = x if x >= 0 -x otherwise
  • 6. Program Synthesis The Simplest Possible Program Synthesizer Starting from a behavioral specification: int abs(int x) = x if x >= 0 -x otherwise Create a harness for testing abs(x) exhaustively: int main(int , char **) { for(int x = 0; x != MAX_INT; ++x) { int r = abs(x); if((x >= 0 && r != x) || r != -x) return -1; } return 0; }
  • 7. Program Synthesis The Simplest Possible Program Synthesizer Starting from a behavioral specification: int abs(int x) = x if x >= 0 -x otherwise Create a harness for testing abs(x) exhaustively: int main(int , char **) { for(int x = 0; x != MAX_INT; ++x) { int r = abs(x); if((x >= 0 && r != x) || r != -x) return -1; } return 0; } Enumerate every possible function abs(int x): int abs(int x) { return -x; }
  • 8. Program Synthesis The Simplest Possible Program Synthesizer Starting from a behavioral specification: int abs(int x) = x if x >= 0 -x otherwise Create a harness for testing abs(x) exhaustively: int main(int , char **) { for(int x = 0; x != MAX_INT; ++x) { int r = abs(x); if((x >= 0 && r != x) || r != -x) return -1; } return 0; } Enumerate every possible function abs(int x): int abs(int x) { return ~x; }
  • 9. Program Synthesis The Simplest Possible Program Synthesizer Starting from a behavioral specification: int abs(int x) = x if x >= 0 -x otherwise Create a harness for testing abs(x) exhaustively: int main(int , char **) { for(int x = 0; x != MAX_INT; ++x) { int r = abs(x); if((x >= 0 && r != x) || r != -x) return -1; } return 0; } Enumerate every possible function abs(int x): int abs(int x) { return -(x+0); }
  • 10. Program Synthesis The Simplest Possible Program Synthesizer Starting from a behavioral specification: int abs(int x) = x if x >= 0 -x otherwise Create a harness for testing abs(x) exhaustively: int main(int , char **) { for(int x = 0; x != MAX_INT; ++x) { int r = abs(x); if((x >= 0 && r != x) || r != -x) return -1; } return 0; } Enumerate every possible function abs(int x): int abs(int x) { return ~(x+0); }
  • 11. Program Synthesis The Simplest Possible Program Synthesizer Starting from a behavioral specification: int abs(int x) = x if x >= 0 -x otherwise Create a harness for testing abs(x) exhaustively: int main(int , char **) { for(int x = 0; x != MAX_INT; ++x) { int r = abs(x); if((x >= 0 && r != x) || r != -x) return -1; } return 0; } Enumerate every possible function abs(int x): int abs(int x) { return -(x+1); }
  • 12. Program Synthesis The Simplest Possible Program Synthesizer Starting from a behavioral specification: int abs(int x) = x if x >= 0 -x otherwise Create a harness for testing abs(x) exhaustively: int main(int , char **) { for(int x = 0; x != MAX_INT; ++x) { int r = abs(x); if((x >= 0 && r != x) || r != -x) return -1; } return 0; } Enumerate every possible function abs(int x): int abs(int x) { return ~(x+1); }
  • 13. Program Synthesis The Simplest Possible Program Synthesizer The brute-force synthesizer: Is extremely slow, and Explores an infinite number of programs. We could improve the situation by carefully choosing which programs to generate. Modern program synthesis goes further and does better.
  • 14. Introduction Ingredients X86 Disassembly and Assembly IR Translation IR Interpreter SMT Integration Applications Enumerative Program Synthesis CPU Emulator Synthesis Peephole Superdeobfuscation Template-Based Program Synthesis Metamorphic Extraction Conclusion
  • 15. Ingredients X86 Disassembly and Assembly 28 D8 0F B6 DB 81 C3 BB BB BB BB 01 D8 X86 Disassembler sub bl, bl movzx ebx, bl add ebx, 0BBBBBBBBh add eax, ebx 28 D8 0F B6 DB 81 C3 BB BB BB BB 01 D8 X86 Assembler sub bl, bl movzx ebx, bl add ebx, 0BBBBBBBBh add eax, ebx Given bytes, a disassembler produces X86 assembly. An assembler does the opposite.
  • 16. Ingredients IR Translation add eax, ecx IR Translator vRes = vEax + vEcx; vZF = vRes == 0; vSF = vRes <s 0; vPF = Parity(vRes); vCF = vRes <u vEax; vOF = (vEcx ^ vRes) & (vEax ^ vRes) <s 0; vAF = (vRes ^ vEax ^ vEcx) & 0x10 != 0; vEax = vRes; IR translators convert instructions to a symbolic representation of their actual effects when executed.
  • 17. Ingredients IR Interpreter vRes = vEax + vEcx; vZF = vRes == 0; vSF = vRes <s 0; vPF = Parity(vRes); vCF = vRes <u vEax; vOF = (vEcx ^ vRes) & (vEax ^ vRes) <s 0; vAF = (vRes ^ vEax ^ vEcx) & 0x10 != 0; vEax = vRes; IR for add eax, ecx eax 645EDE7Bh zf 0 of 0 sf 1 ecx 0FCD02BDEh af 0 pf 1 cf 0 Input State IR Interpreter eax 612F0A59h zf 0 of 0 sf 0 ecx 0FCD02BDEh af 1 pf 1 cf 1 Output State The IR interpreter executes IR statements in an input state, producing an output state.
  • 18. Ingredients SMT Integration vRes = vEax + vEcx; vZF = vRes == 0; vSF = vRes <s 0; vPF = Parity(vRes); vCF = vRes <u vEax; vOF = (vEcx ^ vRes) & (vEax ^ vRes) <s 0; vAF = (vRes ^ vEax ^ vEcx) & 0x10 != 0; vEax = vRes; IR for add eax, ecx vZF == 1 Query SMT Solver vEax = 0, vEcx = 0 SMT solvers can answer questions about the values of variables and memory within IR sequences. We queried for a model of eax and ecx where vZF == 1.
  • 19. Introduction Ingredients Applications Enumerative Program Synthesis CPU Emulator Synthesis Peephole Superdeobfuscation Template-Based Program Synthesis Metamorphic Extraction Conclusion
  • 20. Program Synthesis in Reverse Engineering CPU Emulator Synthesis CPU Emulator Synthesizer X86 Assembler Templates add eax, ecx vRes = vEax + vEcx; vZF = vRes == 0; vSF = vRes <s 0; vPF = Parity(vRes); vCF = vRes <u vEax; vOF = (vEcx ^ vRes) & (vEax ^ vRes) <s 0; vAF = (vRes ^ vEax ^ vEcx) & 0x10 != 0; vEax = vEax + vEcx; CPU Emulator Logic We present a re-designed version of [2] for generating CPU emulators.
  • 21. Program Synthesis in Reverse Engineering Peephole Superdeobfuscation Deobfuscator Generator push edx mov edx, 6F6B5081h mov esi, 0BC6D38Bh add esi, edx pop edx Obfuscated Code push r321 mov r321, i321 mov r322, i322 add r322, r321 pop r321 Simplifies To mov r322, i321+i322 Deobfuscator Rule We apply the ideas of [1] to automatically create deobfuscators for certain obfuscators.
  • 22. Program Synthesis in Reverse Engineering Metamorphic Extraction Functionality Extractor lodsb add al, bl xor al, 0D2h sub al, 0F1h add bl, al add al, 0B7h sub al, 90h push cx mov cl, bl add al, bl pop cx add al, 90h sub al, 0B7h push ebx mov ebx, 8716AEF1h push ecx push eax xor dword ptr [esp], ebx ; continued We apply a technique from [3] to automatically recover metamorphically-generated code sequences.
  • 23. Introduction Ingredients X86 Disassembly and Assembly IR Translation IR Interpreter SMT Integration Applications Enumerative Program Synthesis CPU Emulator Synthesis Peephole Superdeobfuscation Template-Based Program Synthesis Metamorphic Extraction Conclusion
  • 24. CPU Emulator Synthesis Overview CPU emulators require accurate descriptions of the operation of the X86 processor. Sadly, the Intel manuals are at best vague, at worst wrong. IF 64-Bit Mode THEN #UD; ELSE IF ((AL AND 0FH) > 9) or (AF = 1) THEN WRONG: AL ← AL + 6; AX ← AX + 0x106 WRONG: AH ← AH + 1; AF ← 1; CF ← 1; AL ← AL AND 0FH; ELSE AF ← 0; CF ← 0; AL ← AL AND 0FH; FI; FI; That’s from the first instruction in the manual.
  • 25. CPU Emulator Synthesis Overview Idea: use program synthesis to find instruction descriptions. add eax, ecx CPU Emulator Synthesizer vRes = vEax + vEcx; vZF = vRes == 0; vSF = vRes <s 0; vPF = Parity(vRes); vCF = vRes <u vEax; vOF = (vEcx ^ vRes) & (vEax ^ vRes) <s 0; vAF = (vRes ^ vEax ^ vEcx) & 0x10 != 0; vEax = vEax + vEcx;
  • 26. CPU Emulator Synthesis Overview add eax, ecx Behavior Sampler I/O Pair Data Hypothesis Generator Templates, Substitutions Hypotheses Empirical Sieve Candidate Hypotheses Equivalence Checker Single Result We describe each component next. 1. Hypothesis Generator 2. Behavior Sampler 3. Empirical Sieve 4. Equivalence Checker
  • 27. CPU Emulator Synthesis Hypothesis Generator We generate a family of expressions from a template expression and a list of substitutions. Binop(_,+,_) vEax Dword(0) Tree representation of vEax + 0 Token Replace With + +,-,&,|,^ vEax vEax,vEcx Specified Substitutions vEax + 0 vEax - 0 vEax & 0 vEax | 0 vEax ^ 0 vEcx + 0 vEcx - 0 vEcx & 0 vEcx | 0 vEcx ^ 0 The 5 ∗ 2 = 10 expressions generated from the above
  • 28. CPU Emulator Synthesis Hypothesis Generator A result location is a location modified by an instruction. A hypothesis is an IR expression of equality between a result location variable and a generated expression. For the instruction add eax, ebx, the result locations are vEaxout , vZFout , vSFout , vPFout , vCFout , vOFout , and vAFout . vZFout == (vEax + vEbx) <s 0 vZFout == ((vEax + vEbx) ^ vEax) >u 0 vSFout == (vEax + vEbx) <s 0 vSFout == ((vEax + vEbx) ^ vEax) >u 0 vPFout == (vEax + vEbx) <s 0 vPFout == ((vEax + vEbx) ^ vEax) >u 0 vCFout == (vEax + vEbx) <s 0 vCFout == ((vEax + vEbx) ^ vEax) >u 0 vOFout == (vEax + vEbx) <s 0 vOFout == ((vEax + vEbx) ^ vEax) >u 0 vAFout == (vEax + vEbx) <s 0 vAFout == ((vEax + vEbx) ^ vEax) >u 0 Some hypotheses for add eax, ebx
  • 29. CPU Emulator Synthesis Behavior Sampler add eax, ecx Input State Behavior Sampler Output State Given a set of register and flag values as input: 1. JIT assemble the instruction 2. Set the processor registers and flags to the input state 3. Execute the instruction 4. Collect the values of the registers and flags as output These I/O pairs are samples of the instruction’s behavior.
  • 30. CPU Emulator Synthesis Empirical Sieve eaxin = 0 ecxin = 0 cfout = 0 I/O Data CFout == (vEax == vEcx) Hypothesis 0 == (0 == 0) 0 == 1 false Evaluation If a hypothesis is false in any I/O pair, it cannot describe the instruction’s behavior, so we discard it. In the figure above, CFout == (vEax == vEcx) is an invalid hypothesis because it evaluates to false for the pair listed.
  • 31. CPU Emulator Synthesis Empirical Sieve I/O Pairs Hypothesis Generator IR Interpreter Unfalsified Hypotheses We generate many hypotheses, test them against the I/O pairs, and discard any that evaluate to false. Multiple unfalsified hypotheses may remain after the empirical sieve. The next component removes more of them.
  • 32. CPU Emulator Synthesis Equivalence Checker Equivalence checking uses an SMT solver to determine whether two expressions evaluate equally for all inputs. If not, it finds an input that causes a different evaluation. (vEax+vEcx) == 0 Expression #1 (vEcx+vEax+vEcx+vEax) == 0 Expression #2 Two hypotheses for ZFout . We query an SMT solver as to the satisfiability of ((vEax+vEcx) == 0) != ((vEcx+vEax+vEcx+vEax) == 0). If UNSAT, the hypotheses always behave the same. IF SAT, the solution contains values for vEax and vEcx that cause the hypotheses to evaluate differently.
  • 33. CPU Emulator Synthesis Equivalence Checker The SMT solver reports satisfiability, and gives a model where the first hypothesis for ZFout evaluates to 0, and the other to 1. eaxin = 3d800000h ecxin = 42800000h Model for ((vEax+vEcx) == 0) != ((vEcx+vEax+vEcx+vEax) == 0) If we sample the instruction’s behavior in this input state, one of the hypotheses must become falsified, since: If zfout = 1, then (vEax+vEcx) == 0 is false. If zfout = 0, then (vEcx+vEax+vEcx+vEax) == 0 is false. Hypothesis #2 is falsified.
  • 34. CPU Emulator Synthesis Behavior Sampler add eax, ecx Behavior Sampler I/O Pair Data Hypothesis Generator <out> == <in>, <out> == <in> <+> <in>, . . . CFout == (vEax == vEcx), CFout == (vEcx <s 0), CFout == (vEax + vEcx <u vEax), . . .EmpiricalSieve CFout == (vEcx <s 0), CFout == (vEax + vEcx <u vEax), . . . Equivalence Checker CFout == vEax + vEcx <u vEax First, we generate hy- potheses for the instruc- tion’s effects.
  • 35. CPU Emulator Synthesis Hypothesis Generator add eax, ecx Behavior Sampler I/O Pair Data Hypothesis Generator <out> == <in>, <out> == <in> <+> <in>, . . . CFout == (vEax == vEcx), CFout == (vEcx <s 0), CFout == (vEax + vEcx <u vEax), . . .EmpiricalSieve CFout == (vEcx <s 0), CFout == (vEax + vEcx <u vEax), . . . Equivalence Checker CFout == vEax + vEcx <u vEax Next, we collect I/O pairs for an instruction.
  • 36. CPU Emulator Synthesis Empirical Sieve add eax, ecx Behavior Sampler I/O Pair Data Hypothesis Generator <out> == <in>, <out> == <in> <+> <in>, . . . CFout == (vEax == vEcx), CFout == (vEcx <s 0), CFout == (vEax + vEcx <u vEax), . . .Empirical Sieve CFout == (vEcx <s 0), CFout == (vEax + vEcx <u vEax), . . . Equivalence Checker CFout == vEax + vEcx <u vEax Remove hypotheses that are shown to be false by any I/O pair.
  • 37. CPU Emulator Synthesis Equivalence Checking add eax, ecx Behavior Sampler I/O Pair Data Hypothesis Generator <out> == <in>, <out> == <in> <+> <in>, . . . CFout == (vEax == vEcx), CFout == (vEcx <s 0), CFout == (vEax + vEcx <u vEax), . . .Empirical Sieve CFout == (vEcx <s 0), CFout == (vEax + vEcx <u vEax), . . . Equivalence Checker CFout == vEax + vEcx <u vEax The equivalence checker removes all but one hy- pothesis.
  • 38. Introduction Ingredients X86 Disassembly and Assembly IR Translation IR Interpreter SMT Integration Applications Enumerative Program Synthesis CPU Emulator Synthesis Peephole Superdeobfuscation Template-Based Program Synthesis Metamorphic Extraction Conclusion
  • 39. Peephole Superdeobfuscation Rule-Based Obfuscation Unobfuscated Obfuscated Constraints add r81, r82 add r81, imm8 add r81, r82 sub r81, imm8 sub r321, r322 push r323 r323 = r321 mov r323, r322 r321 = esp sub r321, r323 r321 = esp pop r323 r323 = esp Obfuscator Rules Some obfuscators use a rule set to randomly permute code. The ordinary and obfuscated sequences must have the same (or at least “similar”) effects when executed.
  • 40. Peephole Superdeobfuscation add al, bl sub al, 9Fh sub al, 9Fh sub al, 9Fh add al, bl
  • 41. Peephole Superdeobfuscation add al, bl sub al, 9Fh sub al, 9Fh sub al, 9Fh sub al, 9Fh add al, bl add al, bl add al, 9Fh Replace instructions with obfuscated sequences Obfuscation
  • 42. Peephole Superdeobfuscation add al, bl sub al, 9Fh sub al, 9Fh sub al, 9Fh sub al, 9Fh sub al, 9Fh sub al, 7Fh add al, bl add al, bl add al, 7Fh push edx mov dh, 22h inc dh dec dh add dh, 7Dh add al, 9Fh add al, dh pop edx Replace instructions with obfuscated sequences Obfuscation
  • 43. Peephole Superdeobfuscation add al, bl sub al, 9Fh sub al, 9Fh push cx mov ch, 9Fh sub al, 9Fh sub al, ch pop cx sub al, 7Fh sub al, 7Fh sub al, 70h add al, bl add al, bl add al, 70h add al, 7Fh add al, 7Fh push edx push edx push ecx mov ch, 22h mov dh, 22h mov dh, ch pop ecx inc dh inc dh dec dh dec dh add dh, 7Dh add dh, 7Dh add al, dh add al, dh pop edx pop edx Replace instructions with obfuscated sequences Obfuscation
  • 44. Peephole Superdeobfuscation add al, bl sub al, 9Fh sub al, 9Fh push cx mov ch, 9Fh sub al, ch pop cx sub al, 7Fh sub al, 70h add al, bl add al, 70h add al, 7Fh push edx push ecx mov ch, 22h mov dh, ch pop ecx inc dh dec dh add dh, 7Dh add al, dh pop edx
  • 45. Peephole Superdeobfuscation add al, bl sub al, 9Fh sub al, 9Fh push cx mov ch, 9Fh sub al, 9Fh sub al, ch pop cx sub al, 7Fh sub al, 7Fh sub al, 70h add al, bl add al, bl add al, 70h add al, 7Fh add al, 7Fh push edx push edx push ecx mov ch, 22h mov dh, 22h mov dh, ch pop ecx inc dh inc dh dec dh dec dh add dh, 7Dh add dh, 7Dh add al, dh add al, dh pop edx pop edx Replace obfuscated sequences with instructions Deobfuscation with Inverse Rules
  • 46. Peephole Superdeobfuscation add al, bl sub al, 9Fh sub al, 9Fh sub al, 9Fh sub al, 9Fh sub al, 9Fh sub al, 7Fh add al, bl add al, bl add al, 7Fh push edx mov dh, 22h inc dh dec dh add dh, 7Dh add al, 9Fh add al, dh pop edx Replace obfuscated sequences with instructions Deobfuscation with Inverse Rules
  • 47. Peephole Superdeobfuscation add al, bl sub al, 9Fh sub al, 9Fh sub al, 9Fh sub al, 9Fh add al, bl add al, bl add al, 9Fh Replace obfuscated sequences with instructions Deobfuscation with Inverse Rules
  • 48. Peephole Superdeobfuscation add al, bl sub al, 9Fh sub al, 9Fh sub al, 9Fh add al, bl Deobfuscation with Inverse Rules
  • 49. Peephole Superdeobfuscation Rule-Based Deobfuscation Deobfuscator Rules (Obfuscated → Unobfuscated; Constraints): (1) add r8₁, imm8; add r8₁, r8₂; sub r8₁, imm8 → add r8₁, r8₂. (2) push r32₃; mov r32₃, r32₂; sub r32₁, r32₃; pop r32₃ → sub r32₁, r32₂; constraints: r32₃ ≠ r32₁, r32₁ ≠ esp, r32₃ ≠ esp. We could inspect the code manually, pick out patterns, and write deobfuscator rules, but this is tedious and error-prone. We automate the process with program synthesis.
  • 50. Peephole Superdeobfuscation The Idea: Obfuscated/Unobfuscated Behavior Must Match lea esp, [esp-4] mov [esp], eax Obfuscated push eax IR Interpreter eax 645EDE7Bh eax 0FCD02BDEh Output State #1 Output State #2 State Comparison Collect obfuscated sequences.
  • 51. Peephole Superdeobfuscation The Idea: Obfuscated/Unobfuscated Behavior Must Match lea esp, [esp-4] mov [esp], eax Obfuscated push eax IR Interpreter eax 645EDE7Bh esp 0FCD02BDEh Input State esp 0FCD02BDAh mem[esp] = 0x645EDE7B Output State #1 Output State #2 State Comparison Generate I/O pairs.
  • 52. Peephole Superdeobfuscation The Idea: Obfuscated/Unobfuscated Behavior Must Match lea esp, [esp-4] mov [esp], eax Obfuscated push eax Collect Registers / Constants eax 645EDE7Bh eax 0FCD02BDEh Output State #1 eax, esp, 4 Operand Parts Output State #2 State Comparison Preparing for candidate generation, collect registers and constants from the obfuscated sequence.
  • 53. Peephole Superdeobfuscation The Idea: Obfuscated/Unobfuscated Behavior Must Match lea esp, [esp-4] push r32 push i32 add r32, i32 Candidate Templates push eax Candidate push esp push 4 add esp, 4 add eax, 4 IR Interpreter Candidate Enumerator eax 645EDE7Bh eax 0FCD02BDEh Output State #1 eax, esp, 4 Operand Parts Output State #2 State Comparison Plug the registers and constants into the templates to generate candidate replacements.
  • 54. Peephole Superdeobfuscation The Idea: Obfuscated/Unobfuscated Behavior Must Match lea esp, [esp-4] push eax Candidate IR Interpreter eax 645EDE7Bh esp 0FCD02BDEh Input State Output State #1 esp 0FCD02BDAh mem[esp] = 0x645EDE7B Output State #2 State Comparison For each candidate, generate I/O pairs in the same input states as for the original sequence.
  • 55. Peephole Superdeobfuscation The Idea: Obfuscated/Unobfuscated Behavior Must Match lea esp, [esp-4] mov [esp], eax Obfuscated push eax Candidate IR Interpreter eax 645EDE7Bh esp 0FCD02BDEh Input State esp 0FCD02BDAh mem[esp] = 0x645EDE7B Output State #1 esp 0FCD02BDAh mem[esp] = 0x645EDE7B Output State #2 State Comparison Compare the resulting states. We can be lenient for flags and stack memory. If they match, the candidate is a potential replacement for the obfuscated sequence.
  • 56. Peephole Superdeobfuscation The Idea: Obfuscated/Unobfuscated Behavior Must Match lea esp, [esp-4] mov [esp], eax Obfuscated push eax Candidate Equivalence Checker eax 645EDE7Bh eax 0FCD02BDEh Output State #1 Output State #2 State Comparison Obfuscated Deobfuscated lea esp, [esp-4] push eax mov [esp], eax Deobfuscator Rules If the states matched, use an SMT solver to ensure that the sequences are actually equivalent. Learn rules if so.
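The I/O-based filtering walked through on the preceding slides can be sketched as follows. This is a minimal sketch under assumptions, not the talk's IR interpreter: the tuple instruction encoding, the `run` interpreter and its opcode names are hypothetical, memory is modelled as a dictionary of whole dwords, and flags are ignored. The input state is the one shown on the slides.

```python
MASK = 0xFFFFFFFF

def run(program, regs):
    """Tiny interpreter for the few instruction shapes used in this example."""
    regs, mem = dict(regs), {}
    for op, *args in program:
        if op == "lea_esp_sub":      # lea esp, [esp-imm]
            regs["esp"] = (regs["esp"] - args[0]) & MASK
        elif op == "store_esp":      # mov [esp], reg
            mem[regs["esp"]] = regs[args[0]]
        elif op == "push":           # push reg (value is read before esp moves)
            value = regs[args[0]]
            regs["esp"] = (regs["esp"] - 4) & MASK
            mem[regs["esp"]] = value
        elif op == "push_imm":       # push imm32
            regs["esp"] = (regs["esp"] - 4) & MASK
            mem[regs["esp"]] = args[0] & MASK
        elif op == "add_imm":        # add reg, imm32
            regs[args[0]] = (regs[args[0]] + args[1]) & MASK
        else:
            raise ValueError(op)
    return regs, mem

def enumerate_candidates(registers, immediates):
    """Plug operand parts into the templates push r32 / push i32 / add r32, i32."""
    for r in registers:
        yield [("push", r)]
    for i in immediates:
        yield [("push_imm", i)]
    for r in registers:
        for i in immediates:
            yield [("add_imm", r, i)]

def matching_candidates(obfuscated, input_states, registers, immediates):
    """Keep candidates whose output states equal those of the obfuscated code."""
    expected = [run(obfuscated, s) for s in input_states]
    return [c for c in enumerate_candidates(registers, immediates)
            if [run(c, s) for s in input_states] == expected]

OBFUSCATED = [("lea_esp_sub", 4), ("store_esp", "eax")]
STATES = [{"eax": 0x645EDE7B, "esp": 0xFCD02BDE}]
survivors = matching_candidates(OBFUSCATED, STATES, ["eax", "esp"], [4])
print(survivors)
```

Matching on sampled states only makes a candidate a potential replacement; as the slide notes, an SMT solver is still needed to establish actual equivalence.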
  • 57. Peephole Superdeobfuscation Generalization Deobfuscator Rule (Obfuscated → Deobfuscated): lea esp, [esp-4]; mov [esp], eax → push eax. Our deobfuscator rule is specific to the register eax. In fact, eax can be replaced with any register except esp.
  • 58. Peephole Superdeobfuscation Generalization Deobfuscator Rule (Obfuscated → Deobfuscated): lea esp, [esp-4]; mov [esp], eax → push eax. Generalized Rule: lea esp, [esp-4]; mov [esp], r32 → push r32; constraint: r32 ≠ esp. Our deobfuscator rule is specific to the register eax. In fact, eax can be replaced with any register except esp. Generalization removes specific register and/or immediate values from deobfuscation rules.
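To see where the constraint r32 ≠ esp comes from, the sketch below substitutes every 32-bit register into the rule and compares both sides on random states. It is an illustrative sketch only: the two hand-written models and the function names are hypothetical, memory is modelled per dword, and random testing stands in for the solver-backed check the talk describes.

```python
import random

MASK = 0xFFFFFFFF
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def run_obfuscated(state, reg):
    """lea esp, [esp-4] ; mov [esp], reg"""
    regs = dict(state)
    regs["esp"] = (regs["esp"] - 4) & MASK
    return regs, {regs["esp"]: regs[reg]}

def run_push(state, reg):
    """push reg (the pushed value is read before esp is decremented)"""
    regs = dict(state)
    value = regs[reg]
    regs["esp"] = (regs["esp"] - 4) & MASK
    return regs, {regs["esp"]: value}

def registers_where_rule_holds(trials=100, seed=0):
    """Return the registers for which both sequences agree on every sampled state."""
    rng = random.Random(seed)
    states = [{r: rng.getrandbits(32) for r in REGS} for _ in range(trials)]
    return [r for r in REGS
            if all(run_obfuscated(s, r) == run_push(s, r) for s in states)]

print(registers_where_rule_holds())
```

With esp as the operand, the obfuscated sequence stores the already-decremented stack pointer while push stores the original one, so the rule fails for that register alone.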
  • 59. Peephole Superdeobfuscation In Action sub esp, 4; mov [esp], eax; mov eax, ebp; push ebp; push edi; mov [esp], eax; xor [esp], 7FEE03B1h; pop ebp; sub esp, 4; mov [esp], esi; push edx; mov edx, 6F6B5081h; mov esi, 0BC6D38Bh; add esi, edx; pop edx Deobfuscator rules (Obfuscated → Deobfuscated; Constraints): none learned yet. We iterate through the instructions, trying to simplify those within a window. (Icons on the slide mark the bottom of the current window, instructions that can be simplified, and simplifications that have been applied.)
  • 60. Peephole Superdeobfuscation In Action sub esp, 4; mov [esp], eax; mov eax, ebp; push ebp; push edi; mov [esp], eax; xor [esp], 7FEE03B1h; pop ebp; sub esp, 4; mov [esp], esi; push edx; mov edx, 6F6B5081h; mov esi, 0BC6D38Bh; add esi, edx; pop edx Deobfuscator rules (Obfuscated → Deobfuscated; Constraints): (1) sub esp, 4; mov [esp], r32 → push r32; r32 ≠ esp. Learn rule and generalize.
  • 61. Peephole Superdeobfuscation In Action push eax; mov eax, ebp; push ebp; push edi; mov [esp], eax; xor [esp], 7FEE03B1h; pop ebp; sub esp, 4; mov [esp], esi; push edx; mov edx, 6F6B5081h; mov esi, 0BC6D38Bh; add esi, edx; pop edx Deobfuscator rules (Obfuscated → Deobfuscated; Constraints): (1) sub esp, 4; mov [esp], r32 → push r32; r32 ≠ esp. No simplification.
  • 64. Peephole Superdeobfuscation In Action push eax; mov eax, ebp; push ebp; push edi; mov [esp], eax; xor [esp], 7FEE03B1h; pop ebp; sub esp, 4; mov [esp], esi; push edx; mov edx, 6F6B5081h; mov esi, 0BC6D38Bh; add esi, edx; pop edx Deobfuscator rules (Obfuscated → Deobfuscated; Constraints): (1) sub esp, 4; mov [esp], r32 → push r32; r32 ≠ esp. (2) push r32₁; mov [esp], r32₂ → push r32₂; r32₂ ≠ esp. Learn rule and generalize.
  • 65. Peephole Superdeobfuscation In Action push eax; mov eax, ebp; push ebp; push eax; xor [esp], 7FEE03B1h; pop ebp; sub esp, 4; mov [esp], esi; push edx; mov edx, 6F6B5081h; mov esi, 0BC6D38Bh; add esi, edx; pop edx Deobfuscator rules (Obfuscated → Deobfuscated; Constraints): (1) sub esp, 4; mov [esp], r32 → push r32; r32 ≠ esp. (2) push r32₁; mov [esp], r32₂ → push r32₂; r32₂ ≠ esp. No simplification.
  • 66. Peephole Superdeobfuscation In Action push eax; mov eax, ebp; push ebp; push eax; xor [esp], 7FEE03B1h; pop ebp; sub esp, 4; mov [esp], esi; push edx; mov edx, 6F6B5081h; mov esi, 0BC6D38Bh; add esi, edx; pop edx Deobfuscator rules (Obfuscated → Deobfuscated; Constraints): (1) sub esp, 4; mov [esp], r32 → push r32; r32 ≠ esp. (2) push r32₁; mov [esp], r32₂ → push r32₂; r32₂ ≠ esp. (3) push r32₁; xor [esp], i32; pop r32₂ → mov r32₂, r32₁; xor r32₂, i32; r32₁ ≠ esp, r32₂ ≠ esp. Learn rule and generalize.
  • 67. Peephole Superdeobfuscation In Action push eax; mov eax, ebp; push ebp; mov ebp, eax; xor ebp, 7FEE03B1h; sub esp, 4; mov [esp], esi; push edx; mov edx, 6F6B5081h; mov esi, 0BC6D38Bh; add esi, edx; pop edx Deobfuscator rules (Obfuscated → Deobfuscated; Constraints): (1) sub esp, 4; mov [esp], r32 → push r32; r32 ≠ esp. (2) push r32₁; mov [esp], r32₂ → push r32₂; r32₂ ≠ esp. (3) push r32₁; xor [esp], i32; pop r32₂ → mov r32₂, r32₁; xor r32₂, i32; r32₁ ≠ esp, r32₂ ≠ esp. No simplification.
  • 68. Peephole Superdeobfuscation In Action push eax; mov eax, ebp; push ebp; mov ebp, eax; xor ebp, 7FEE03B1h; sub esp, 4; mov [esp], esi; push edx; mov edx, 6F6B5081h; mov esi, 0BC6D38Bh; add esi, edx; pop edx Deobfuscator rules (Obfuscated → Deobfuscated; Constraints): (1) sub esp, 4; mov [esp], r32 → push r32; r32 ≠ esp. (2) push r32₁; mov [esp], r32₂ → push r32₂; r32₂ ≠ esp. (3) push r32₁; xor [esp], i32; pop r32₂ → mov r32₂, r32₁; xor r32₂, i32; r32₁ ≠ esp, r32₂ ≠ esp. Apply previous simplification.
  • 69. Peephole Superdeobfuscation In Action push eax; mov eax, ebp; push ebp; mov ebp, eax; xor ebp, 7FEE03B1h; push esi; push edx; mov edx, 6F6B5081h; mov esi, 0BC6D38Bh; add esi, edx; pop edx Deobfuscator rules (Obfuscated → Deobfuscated; Constraints): (1) sub esp, 4; mov [esp], r32 → push r32; r32 ≠ esp. (2) push r32₁; mov [esp], r32₂ → push r32₂; r32₂ ≠ esp. (3) push r32₁; xor [esp], i32; pop r32₂ → mov r32₂, r32₁; xor r32₂, i32; r32₁ ≠ esp, r32₂ ≠ esp. No simplification.
  • 73. Peephole Superdeobfuscation In Action push eax; mov eax, ebp; push ebp; mov ebp, eax; xor ebp, 7FEE03B1h; push esi; push edx; mov edx, 6F6B5081h; mov esi, 0BC6D38Bh; add esi, edx; pop edx Deobfuscator rules (Obfuscated → Deobfuscated; Constraints): (1) sub esp, 4; mov [esp], r32 → push r32; r32 ≠ esp. (2) push r32₁; mov [esp], r32₂ → push r32₂; r32₂ ≠ esp. (3) push r32₁; xor [esp], i32; pop r32₂ → mov r32₂, r32₁; xor r32₂, i32; r32₁ ≠ esp, r32₂ ≠ esp. (4) push r32₁; mov r32₁, i32₁; mov r32₂, i32₂; add r32₂, r32₁; pop r32₁ → mov r32₂, i32₁+i32₂; r32₁ ≠ esp, r32₂ ≠ esp, r32₁ ≠ r32₂. Learn rule and generalize.
  • 74. Peephole Superdeobfuscation In Action push eax; mov eax, ebp; push ebp; mov ebp, eax; xor ebp, 7FEE03B1h; push esi; mov esi, 7B32240Ch Deobfuscator rules (Obfuscated → Deobfuscated; Constraints): (1) sub esp, 4; mov [esp], r32 → push r32; r32 ≠ esp. (2) push r32₁; mov [esp], r32₂ → push r32₂; r32₂ ≠ esp. (3) push r32₁; xor [esp], i32; pop r32₂ → mov r32₂, r32₁; xor r32₂, i32; r32₁ ≠ esp, r32₂ ≠ esp. (4) push r32₁; mov r32₁, i32₁; mov r32₂, i32₂; add r32₂, r32₁; pop r32₁ → mov r32₂, i32₁+i32₂; r32₁ ≠ esp, r32₂ ≠ esp, r32₁ ≠ r32₂. Continue ad infinitum.
  • 75. Peephole Superdeobfuscation System Output Deobfuscator Rules → Code Generator → C Code, Python Code, OCaml Code We can turn a set of rules into a program that uses pattern-matching to implement those transformations. We can generate a deobfuscator automatically, and also the obfuscator that created the code in the first place!
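A pattern-matching deobfuscator of the kind described here might look like the sketch below. It is hand-written for illustration rather than generated: the tuple instruction encoding, the `$r`/`$i` placeholder syntax, and the `bind`/`simplify` helpers are assumptions, while the four rules and the input program are the ones from the "In Action" slides.

```python
MASK = 0xFFFFFFFF
REGS = {"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"}

def bind(pattern, window):
    """Match a rule pattern against a window; return placeholder bindings or None."""
    env = {}
    for pat, ins in zip(pattern, window):
        if len(pat) != len(ins):
            return None
        for p, v in zip(pat, ins):
            if isinstance(p, str) and p.startswith("$"):
                # $r... binds a register, $i... binds an immediate
                ok = isinstance(v, int) if p[1] == "i" else v in REGS
                if not ok or env.setdefault(p, v) != v:
                    return None
            elif p != v:
                return None
    return env

# Each rule: (obfuscated pattern, constraint check, replacement builder).
RULES = [
    ([("sub", "esp", 4), ("mov", "[esp]", "$r1")],
     lambda e: e["$r1"] != "esp",
     lambda e: [("push", e["$r1"])]),
    ([("push", "$r1"), ("mov", "[esp]", "$r2")],
     lambda e: e["$r2"] != "esp",
     lambda e: [("push", e["$r2"])]),
    ([("push", "$r1"), ("xor", "[esp]", "$i1"), ("pop", "$r2")],
     lambda e: "esp" not in (e["$r1"], e["$r2"]),
     lambda e: [("mov", e["$r2"], e["$r1"]), ("xor", e["$r2"], e["$i1"])]),
    ([("push", "$r1"), ("mov", "$r1", "$i1"), ("mov", "$r2", "$i2"),
      ("add", "$r2", "$r1"), ("pop", "$r1")],
     lambda e: "esp" not in (e["$r1"], e["$r2"]) and e["$r1"] != e["$r2"],
     lambda e: [("mov", e["$r2"], (e["$i1"] + e["$i2"]) & MASK)]),
]

def rewrite_once(program):
    """Apply the first rule that matches anywhere; return None if nothing matches."""
    for i in range(len(program)):
        for pattern, constraint, build in RULES:
            window = program[i:i + len(pattern)]
            if len(window) == len(pattern):
                env = bind(pattern, window)
                if env is not None and constraint(env):
                    return program[:i] + build(env) + program[i + len(pattern):]
    return None

def simplify(program):
    """Rewrite until no rule applies."""
    while (rewritten := rewrite_once(program)) is not None:
        program = rewritten
    return program

PROGRAM = [
    ("sub", "esp", 4), ("mov", "[esp]", "eax"), ("mov", "eax", "ebp"),
    ("push", "ebp"), ("push", "edi"), ("mov", "[esp]", "eax"),
    ("xor", "[esp]", 0x7FEE03B1), ("pop", "ebp"),
    ("sub", "esp", 4), ("mov", "[esp]", "esi"),
    ("push", "edx"), ("mov", "edx", 0x6F6B5081), ("mov", "esi", 0x0BC6D38B),
    ("add", "esi", "edx"), ("pop", "edx"),
]
for instruction in simplify(PROGRAM):
    print(instruction)
```

Swapping the pattern and the replacement of each rule would give the corresponding obfuscator, which is the symmetry the slide points out.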
  • 76. Peephole Superdeobfuscation Limitations mov eax, 12h or edx, eax add eax, 34h jmp @end bswap ax bsr edx, 12h salc cmc @end: xor edx, edx mov eax, [esp] add esp, 4 ret Could simplify if we knew that edx was dead (i.e., not used before its next modification). Ours is a forward analysis; dead code elimination requires backwards analysis. Can incorporate a liveness analysis for this purpose.
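The backward liveness analysis mentioned here can be sketched for straight-line code as follows. This is a simplified illustration under assumptions: the def/use sets are written by hand, flags and memory are ignored, `xor edx, edx` is treated as a pure write, and the four-instruction program is an excerpt of the instructions on the slide rather than the full listing.

```python
def dead_instructions(program, live_out):
    """Backward pass: an instruction is dead if nothing it writes is live afterwards."""
    live = set(live_out)
    dead = []
    for index in range(len(program) - 1, -1, -1):
        _text, defs, uses = program[index]
        if defs and not (defs & live):
            dead.append(index)
            continue                  # a removed instruction contributes no uses
        live = (live - defs) | uses
    return sorted(dead)

# (text, registers written, registers read)
PROGRAM = [
    ("mov eax, 12h", {"eax"}, set()),
    ("or edx, eax",  {"edx"}, {"edx", "eax"}),
    ("add eax, 34h", {"eax"}, {"eax"}),
    ("xor edx, edx", {"edx"}, set()),
]

dead = dead_instructions(PROGRAM, live_out={"eax", "edx"})
print([PROGRAM[i][0] for i in dead])
```

Here `or edx, eax` is reported dead because edx is overwritten before anything reads it, which is the knowledge a purely forward pass lacks.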
  • 77. Introduction Ingredients X86 Disassembly and Assembly IR Translation IR Interpreter SMT Integration Applications Enumerative Program Synthesis CPU Emulator Synthesis Peephole Superdeobfuscation Template-Based Program Synthesis Metamorphic Extraction Conclusion
  • 78. Template-Based Program Synthesis Overview Question: is it possible to create the function x+1 by using two of ~ (not) and/or - (neg)? Table: All Possible Sequences: (1) y = ~x; z = ~y; (2) y = -x; z = ~y; (3) y = ~x; z = -y; (4) y = -x; z = -y;
  • 79. Template-Based Program Synthesis Templates Problem phrased as a template: what do bop1 and bop2 need to be set to, so that f(x) == x+1 for all values of x? bool bop1, bop2; int f(int x) { int y = bop1 ? -x : ~x; return bop2 ? -y : ~y; }
  • 80. Template-Based Program Synthesis Solving After phrasing the question appropriately: English: Are there values of bop1, bop2 such that, for all values of x, in the code y = bop1 ? -x : ~x; z = bop2 ? -y : ~y, z == x+1 is always true? Mathematics: ∃ bop1, bop2 ∈ Bool · ∀ x ∈ BV[32] · let y = bop1 ? -x : ~x in let z = bop2 ? -y : ~y in z == x+1. An SMT solver gives bop1 = false, bop2 = true. I.e., int f(int x) { return -~x; }
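The exists/forall query can be imitated without a solver by brute force on a narrow bit width. The sketch below is an assumption-laden stand-in: it enumerates both template parameters and checks every 8-bit input, whereas the slide poses the question over 32-bit bitvectors and answers it with an SMT solver.

```python
from itertools import product

BITS = 8                      # assumption: 8-bit values keep the check exhaustive
MASK = (1 << BITS) - 1

def f(bop1, bop2, x):
    """The template from the slide, evaluated on BITS-wide values."""
    y = (-x if bop1 else ~x) & MASK
    return (-y if bop2 else ~y) & MASK

def solve():
    """Return every (bop1, bop2) for which f(x) == x + 1 holds on all inputs."""
    return [(bop1, bop2)
            for bop1, bop2 in product((False, True), repeat=2)
            if all(f(bop1, bop2, x) == (x + 1) & MASK for x in range(MASK + 1))]

print(solve())
```

The only surviving assignment corresponds to `-~x`, matching the solver result quoted on the slide.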
  • 81. Template-Based Program Synthesis Extending the Framework We can extend the idea to use more than two operator types: char op1; int f(int x) { int y = op1 == 0 ? -x : op1 == 1 ? ~x : x-1; return y; } Or reference further constants that the solver must provide: bool op1; char c1; int f(int x) { int y = op1 ? x + c1 : x ^ c1; return y; }
  • 82. Introduction Ingredients X86 Disassembly and Assembly IR Translation IR Interpreter SMT Integration Applications Enumerative Program Synthesis CPU Emulator Synthesis Peephole Superdeobfuscation Template-Based Program Synthesis Metamorphic Extraction Conclusion
  • 83. Metamorphic Extraction Metamorphic Decoder Generation A metamorphic engine generates code like the following: lodsb; op al, bl; op al, i8₁; op al, i8₂; op bl, al Where each op can be add, sub, or xor, and i8₁ / i8₂ are 8-bit constants. I.e., 3^4 × 2^8 × 2^8 ≈ 5.3 million possible instances. Example metamorphic decoder: lodsb; add al, bl; xor al, 0D2h; sub al, 0F1h; add bl, al
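The size of the instance space, and what one instance looks like, can be reproduced with a few lines of Python. The generator below is a hypothetical illustration of such an engine (the name `random_decoder` and the output formatting are assumptions), not a reconstruction of any real one.

```python
import random

OPS = ("add", "sub", "xor")

def instance_count():
    """Four operator slots with three choices each, and two 8-bit constants."""
    return len(OPS) ** 4 * 2 ** 8 * 2 ** 8

def random_decoder(rng):
    """Emit one randomly chosen decoder instance as assembly text."""
    op1, op2, op3, op4 = (rng.choice(OPS) for _ in range(4))
    i8_1, i8_2 = rng.randrange(256), rng.randrange(256)
    return ["lodsb",
            f"{op1} al, bl",
            f"{op2} al, 0{i8_1:02X}h",
            f"{op3} al, 0{i8_2:02X}h",
            f"{op4} bl, al"]

print(instance_count())
print("\n".join(random_decoder(random.Random(0))))
```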
  • 84. Metamorphic Extraction Further Obfuscation add al, 0B7h sub al, 90h push cx mov cl, bl add al, bl pop cx add al, 90h sub al, 0B7h push ebx mov ebx, 8716AEF1h push ecx push eax xor dword ptr [esp], ebx ; continued Obfuscator lodsb add al, bl xor al, 0D2h sub al, 0F1h add bl, al The metamorphic code is then further obfuscated.
  • 85. Metamorphic Extraction Goal lodsb op al, bl op al, i8₁ op al, i8₂ op bl, al add al, 0B7h sub al, 90h push cx mov cl, bl add al, bl pop cx add al, 90h sub al, 0B7h push ebx mov ebx, 8716AEF1h push ecx push eax xor dword ptr [esp], ebx ; continued Re-Create Given an obfuscated sequence, we want to determine the underlying metamorphically-generated function. I.e., we would like to know the ops and i8s. We re-create the information instead of deobfuscating. Let’s explore two approaches based on program synthesis.
  • 86. Metamorphic Extraction Template for Metamorphic Functionality lodsb; op al, bl; op al, i8₁; op al, i8₂; op bl, al Template: al = Load(vMem,vEsi,8); a = op1 == 0 ? al + vBl : op1 == 1 ? al - vBl : al ^ vBl; b = op2 == 0 ? a + c1 : op2 == 1 ? a - c1 : a ^ c1; c = op3 == 0 ? b + c2 : op3 == 1 ? b - c2 : b ^ c2; d = op4 == 0 ? vBl + c : op4 == 1 ? vBl - c : vBl ^ c; X86-related variables: al is the value read by lodsb; vBl is the initial value of bl; d is the final value of bl. Template parameters: constants c1, c2; operators op1, op2, op3, op4 (0: add, 1: sub, 2+: xor).
  • 87. Metamorphic Extraction Straightforward Approach Using ∃ · ∀· Template Program IR for obfuscated X86 IR assertions ∃ op1, op2, op3, op4, c1, c2 · ∀ input X86 states · d == vBlAfter Query SMT Solver Values for op1, op2, op3, op4, c1, c2 We can solve directly using the quantifiers ∃, ∀. Quantifiers can slow solving, so we show an alternative.
  • 88. Metamorphic Extraction Collect I/O Pairs IR for obfuscated X86 Input State IR Interpreter Output State Collect I/O pairs for the obfuscated X86 sequence. Extract the important parts of the states. Parts of state needed for synthesis: al 0x00 vBl 0x00 vBlAfter 0xE1
  • 89. Metamorphic Extraction Create Witnesses Plug the I/O pair al 0x00 vBl 0x00 vBlAfter 0xE1 into the template a = op1 == 0 ? al + vBl : op1 == 1 ? al - vBl : al ^ vBl; b = op2 == 0 ? a + c1 : op2 == 1 ? a - c1 : a ^ c1; c = op3 == 0 ? b + c2 : op3 == 1 ? b - c2 : b ^ c2; d = op4 == 0 ? vBl + c : op4 == 1 ? vBl - c : vBl ^ c; to obtain a witness: a_1 = op1 == 0 ? 0x00 + 0x00 : op1 == 1 ? 0x00 - 0x00 : 0x00 ^ 0x00; b_1 = op2 == 0 ? a_1 + c1 : op2 == 1 ? a_1 - c1 : a_1 ^ c1; c_1 = op3 == 0 ? b_1 + c2 : op3 == 1 ? b_1 - c2 : b_1 ^ c2; d_1 = op4 == 0 ? 0x00 + c_1 : op4 == 1 ? 0x00 - c_1 : 0x00 ^ c_1; assert(d_1 == 0xE1);
  • 90. Metamorphic Extraction Synthesizing Candidate Functions Witnesses SMT Solver Function Doesn’t Exist Values for op1, op2, op3, op4, c1, c2 UNSAT SAT Query for template parameters that satisfy the witnesses. If they exist, create a function from them. a = al ^ vBl; b = a ^ 0x4A; c = b ^ 0x85; d = vBl + c;
  • 91. Metamorphic Extraction Equivalence Checking Our function is only valid for the witnesses seen so far. Does its behavior always match the X86? Synthesized Function Obfuscated X86 IR assertions d != vBlAfter Query SMT Solver: UNSAT → Valid; SAT → State Causing Difference. If the formula is satisfiable, the output is a counterexample: a state causing a difference in execution. al 0x00 vBl 0x88
  • 92. Metamorphic Extraction Refinement If the function did not match the X86, use the counterexample state output by the solver to create a new witness. Repeat until error or success. Witnesses → Synthesis: UNSAT → Can’t Synthesize; SAT → Equivalence Check: SAT → Create New Witness (and repeat); UNSAT → Function is Valid.
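The synthesize/check/refine loop can be sketched end to end in Python. This is a sketch under stated assumptions rather than the talk's implementation: brute-force enumeration replaces both SMT queries, `target` is a hand-written stand-in for the obfuscated X86 (it models the example decoder directly), and the trick of solving for c2 from the first witness is an optimization added here to keep the search small.

```python
from itertools import product

MASK = 0xFF
OPS = (lambda x, y: (x + y) & MASK,   # 0: add
       lambda x, y: (x - y) & MASK,   # 1: sub
       lambda x, y: x ^ y)            # 2: xor

def template(params, al, bl):
    """Final value of bl under the template, for parameters (op1..op4, c1, c2)."""
    op1, op2, op3, op4, c1, c2 = params
    a = OPS[op1](al, bl)
    b = OPS[op2](a, c1)
    c = OPS[op3](b, c2)
    return OPS[op4](bl, c)

def target(al, bl):
    """Stand-in for the obfuscated code: add al,bl; xor al,0D2h; sub al,0F1h; add bl,al."""
    al = (al + bl) & MASK
    al ^= 0xD2
    al = (al - 0xF1) & MASK
    return (bl + al) & MASK

def synthesize(witnesses):
    """Find template parameters consistent with every witness, or None."""
    al0, bl0, d0 = witnesses[0]
    for op1, op2, op3, op4, c1 in product(range(3), range(3), range(3),
                                          range(3), range(256)):
        b = OPS[op2](OPS[op1](al0, bl0), c1)
        c = ((d0 - bl0) & MASK, (bl0 - d0) & MASK, bl0 ^ d0)[op4]  # invert op4
        c2 = ((c - b) & MASK, (b - c) & MASK, b ^ c)[op3]          # invert op3
        params = (op1, op2, op3, op4, c1, c2)
        if all(template(params, al, bl) == d for al, bl, d in witnesses):
            return params
    return None

def find_counterexample(params):
    """Exhaustive equivalence check; returns a differing input state or None."""
    for al, bl in product(range(256), repeat=2):
        if template(params, al, bl) != target(al, bl):
            return al, bl
    return None

def refine():
    witnesses = [(0x00, 0x00, target(0x00, 0x00))]
    while True:
        params = synthesize(witnesses)
        if params is None:
            return None, witnesses            # can't synthesize
        counterexample = find_counterexample(params)
        if counterexample is None:
            return params, witnesses          # function is valid
        al, bl = counterexample
        witnesses.append((al, bl, target(al, bl)))

params, witnesses = refine()
print(params, len(witnesses))
```

The parameters found need not be the literal ones in the original decoder; any assignment that behaves identically (for example, adding 0x0F instead of subtracting 0xF1) passes the check.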
  • 93. Metamorphic Extraction In Action lodsb; add al, bl; xor al, 0D2h; sub al, 0F1h; add bl, al (The deobfuscated sequence is shown for clarity. Our analysis works upon obfuscated sequences.) Synthesized program #1: a = al ^ vBl; b = a ^ 0x4A; c = b ^ 0x85; d = vBl + c; Counter-example: not yet searched. Begin with a program synthesized from the witnesses. Try to find a counter-example.
  • 94. Metamorphic Extraction In Action lodsb; add al, bl; xor al, 0D2h; sub al, 0F1h; add bl, al (The deobfuscated sequence is shown for clarity. Our analysis works upon obfuscated sequences.) Synthesized program #1: a = al ^ vBl; b = a ^ 0x4A; c = b ^ 0x85; d = vBl + c; Counter-example #1: vBl = 0x88, al = 0x00. Synthesize a program from the witnesses. Try to find a counter-example. Found: generate new witness.
  • 95. Metamorphic Extraction In Action lodsb; add al, bl; xor al, 0D2h; sub al, 0F1h; add bl, al (The deobfuscated sequence is shown for clarity. Our analysis works upon obfuscated sequences.) Synthesized program #1: a = al ^ vBl; b = a ^ 0x4A; c = b ^ 0x85; d = vBl + c; Counter-example #1: vBl = 0x88, al = 0x00. Synthesized program #2: a = al + vBl; b = a ^ 0xD9; c = b - 0xE8; d = vBl + c; Synthesize a program from the witnesses. Try to find a counter-example.
  • 96. Metamorphic Extraction In Action lodsb; add al, bl; xor al, 0D2h; sub al, 0F1h; add bl, al (The deobfuscated sequence is shown for clarity. Our analysis works upon obfuscated sequences.) Synthesized program #1: a = al ^ vBl; b = a ^ 0x4A; c = b ^ 0x85; d = vBl + c; Counter-example #1: vBl = 0x88, al = 0x00. Synthesized program #2: a = al + vBl; b = a ^ 0xD9; c = b - 0xE8; d = vBl + c; Counter-example #2: vBl = 0x20, al = 0x10. Synthesize a program from the witnesses. Try to find a counter-example. Found: generate new witness.
  • 97. Metamorphic Extraction In Action lodsb; add al, bl; xor al, 0D2h; sub al, 0F1h; add bl, al (The deobfuscated sequence is shown for clarity. Our analysis works upon obfuscated sequences.) Synthesized program #1: a = al ^ vBl; b = a ^ 0x4A; c = b ^ 0x85; d = vBl + c; Counter-example #1: vBl = 0x88, al = 0x00. Synthesized program #2: a = al + vBl; b = a ^ 0xD9; c = b - 0xE8; d = vBl + c; Counter-example #2: vBl = 0x20, al = 0x10. Synthesized program #3: a = al + vBl; b = a ^ 0xD3; c = b + 0x0E; d = vBl + c; Synthesize a program from the witnesses. Try to find a counter-example.
  • 98. Metamorphic Extraction In Action lodsb; add al, bl; xor al, 0D2h; sub al, 0F1h; add bl, al (The deobfuscated sequence is shown for clarity. Our analysis works upon obfuscated sequences.) Synthesized program #1: a = al ^ vBl; b = a ^ 0x4A; c = b ^ 0x85; d = vBl + c; Counter-example #1: vBl = 0x88, al = 0x00. Synthesized program #2: a = al + vBl; b = a ^ 0xD9; c = b - 0xE8; d = vBl + c; Counter-example #2: vBl = 0x20, al = 0x10. Synthesized program #3: a = al + vBl; b = a ^ 0xD3; c = b + 0x0E; d = vBl + c; Counter-example #3: vBl = 0x23, al = 0x08. Synthesize a program from the witnesses. Try to find a counter-example. Found: generate new witness.
  • 99. Metamorphic Extraction In Action lodsb; add al, bl; xor al, 0D2h; sub al, 0F1h; add bl, al (The deobfuscated sequence is shown for clarity. Our analysis works upon obfuscated sequences.) Synthesized program #1: a = al ^ vBl; b = a ^ 0x4A; c = b ^ 0x85; d = vBl + c; Counter-example #1: vBl = 0x88, al = 0x00. Synthesized program #2: a = al + vBl; b = a ^ 0xD9; c = b - 0xE8; d = vBl + c; Counter-example #2: vBl = 0x20, al = 0x10. Synthesized program #3: a = al + vBl; b = a ^ 0xD3; c = b + 0x0E; d = vBl + c; Counter-example #3: vBl = 0x23, al = 0x08. Synthesized program #4: a = al + vBl; b = a ^ 0xD2; c = b + 0x0F; d = vBl + c; Synthesize a program from the witnesses. Try to find a counter-example.
  • 100. Metamorphic Extraction In Action lodsb; add al, bl; xor al, 0D2h; sub al, 0F1h; add bl, al (The deobfuscated sequence is shown for clarity. Our analysis works upon obfuscated sequences.) Synthesized program #1: a = al ^ vBl; b = a ^ 0x4A; c = b ^ 0x85; d = vBl + c; Counter-example #1: vBl = 0x88, al = 0x00. Synthesized program #2: a = al + vBl; b = a ^ 0xD9; c = b - 0xE8; d = vBl + c; Counter-example #2: vBl = 0x20, al = 0x10. Synthesized program #3: a = al + vBl; b = a ^ 0xD3; c = b + 0x0E; d = vBl + c; Counter-example #3: vBl = 0x23, al = 0x08. Synthesized program #4: a = al + vBl; b = a ^ 0xD2; c = b + 0x0F; d = vBl + c; Counter-example #4: NONE. Synthesize a program from the witnesses. Try to find a counter-example. Not found: function is valid.
  • 101. Introduction Ingredients X86 Disassembly and Assembly IR Translation IR Interpreter SMT Integration Applications Enumerative Program Synthesis CPU Emulator Synthesis Peephole Superdeobfuscation Template-Based Program Synthesis Metamorphic Extraction Conclusion
  • 102. New Course Offering New training course offering on SMT-based binary program analysis. Written for low-level people comfortable programming in Python; no particular math or CS background required. Learn what SMT solvers are and how to use them. Lecture material vividly illustrated like these slides. Students construct a minimal, yet fully functional SMT-based program analysis framework in Python. Dozens of small, guided programming exercises. Code an SMT solver, X86 → IR translator, ROP compiler [1]. Available now! [1] ROP compiler application subject to potential replacement pending forthcoming regulation of the computer security industry.
  • 103. Questions? [email protected] Check out Möbius Strip Reverse Engineering at: http://www.msreverseengineering.com Program analysis training classes Reverse engineering training classes Consulting services Blog, research archive, and other resources
  • 104. Thanks My proofreaders are awesome: Igor Skochinsky William Whistler Vijay D’Silva People who publish inspiring work are awesome, too.
  • 105. References Sorav Bansal and Alex Aiken. Automatic generation of peephole superoptimizers. In ACM SIGPLAN Notices, volume 41, pages 394–403. ACM, 2006. Patrice Godefroid and Ankur Taly. Automated synthesis of symbolic instruction encodings from I/O samples. In ACM SIGPLAN Notices, volume 47, pages 441–452. ACM, 2012. Sumit Gulwani, Susmit Jha, Ashish Tiwari, and Ramarathnam Venkatesan. Synthesis of loop-free programs. In ACM SIGPLAN Notices, volume 46, pages 62–73. ACM, 2011.