Chapter 31
Mark Stamp
San Jose State University
All content following this page was uploaded by Mark Stamp on 03 November 2018.
and Patching Exercise
31.5.3 Recommended Reversing Tool for the Java Exercise
31.5.4 Animated Solution to the Java Reversing Exercise
31.6 Basic Antireversing Techniques
31.7 Applying Antireversing Techniques to Wintel Machine Code
31.7.1 Eliminating Symbolic Information in Wintel Machine Code
31.7.2 Basic Obfuscation

ware module that has worked for years and carries several rules of a business in its lines of code; unfortunately, the source code of the application has been lost, and what remains is “native” or “binary” code. Reverse engineering skills are also used to detect and neutralize viruses and malware, and to protect intellectual property. Computer programmers proficient in SRE will be needed should software components like these need to be maintained, enhanced, or reused. It became frightfully apparent during the Y2K crisis that reverse engineering skills were not
31.2 Reverse Engineering in Software Development

businesses to expand to the Internet for what was promised to be limitless potential for new revenue caused the creation of many business-to-consumer (B2C) Web sites.

that a great deal of legacy code is poorly designed and documented [31.3]. It is stated in [31.4] that “COBOL programs are in use globally in governmental and military agencies, in commercial enterprises, and on operating systems such as IBM’s z/OS®, Microsoft’s Windows®, and the POSIX families (Unix/Linux, etc.). In 1997, the Gartner Group reported that 80% of the world’s business ran on COBOL with over 200 billion lines of code in existence and with an estimated 5 billion lines of new code annually.” Since it is cost-prohibitive to rip and replace billions of lines of legacy code, the only reasonable alternative has been to maintain and evolve the code, often with the help of concepts found in SRE. Figure 31.1 illustrates a process a software engineer might follow when maintaining legacy software systems.

Fig. 31.1 Development process for maintaining legacy software

Whenever computer scientists or software engineers are engaged with evolving an existing system, 50–90% of the work effort is spent on program understanding [31.3]. Having engineers spend such a large amount of their time attempting to understand a system before making enhancements is not economically sustainable as a software system continues to grow in size and complexity. To help lessen the cost of program understanding, Ali [31.3] advises that “practice with reverse engineering techniques improves ability to understand a given system quickly and efficiently.”

Even though several tools already exist to aid software engineers with the program understanding process, the tools focus on transferring information about a software system’s design into the mind of the developer [31.1]. The expectation is that the developer has enough skill to efficiently integrate the information into his/her own mental model of the system’s architecture. It is not likely that even the most sophisticated tools can replace experience with building a mental model of existing software; Deursen et al. [31.5] stated that “commercial reverse engineering tools produce various kinds of output, but software engineers usually don’t [know] how to interpret and use these pictures and reports.” The lack of reverse engineering skills in most programmers is a serious risk to the long-term viability of any organization that employs information technology. The problem of software maintenance cannot be dispelled with some clever technique; Weide et al. [31.6] argue “re-engineering code to create a system that will not need to be reverse engineered again in the future – is presently unattainable.”

According to Eilam [31.7], there are four software-development-related reverse engineering scenarios; the scenarios cover a broad spectrum of activities that include software maintenance, reuse, re-engineering, evolution, interoperability, and testing. Figure 31.2 summarizes the software-development-related reverse engineering scenarios. The following are tasks one might perform in each of the reversing scenarios [31.7]:

• Achieving interoperability with proprietary software: Develop applications or device drivers that interoperate with (use) proprietary libraries in operating systems or applications.
• Verification that implementation matches design: Verify that code produced during the forward development process matches the envisioned design by reversing the code back into an abstract design.
656 31 Software Reverse Engineering
• Evaluating software quality and robustness: Ensure the quality of software before purchasing it by performing heuristic analysis of the binaries to check for certain instruction sequences that appear in poor-quality code.
• Legacy software maintenance, re-engineering, and evolution: Recover the design of legacy software modules when the source code is not available to make possible the maintenance, evolution, and reuse of the modules.

[Figure 31.2: development-related software reverse engineering scenarios]

31.3 Reverse Engineering in Software Security

From the perspective of a software company, it is highly desirable that the company’s products are difficult to pirate and reverse engineer. Making software difficult to reverse engineer seems to be in conflict with the idea of being able to recover the software’s design later on for maintenance and evolution. Therefore, software manufacturers usually do not apply anti-reverse-engineering techniques to software until it is shipped to customers, keeping copies of the readable and maintainable code. Software manufacturers will typically only invest time in making software difficult to reverse engineer if there are particularly interesting algorithms that make the product stand out from the competition.

Making software difficult to pirate or reverse engineer is often a moving target and requires special skills and understanding on the part of the developer. Software developers who are given the opportunity to practice anti-reversing techniques might be in a better position to help their employer, or themselves, protect their intellectual property. As stated in [31.3], “to defeat a crook you have to think like one.” By reverse engineering viruses or other malicious software, programmers can learn their inner workings and witness at first hand how vulnerabilities find their way into computer programs. Reversing software that has been infected with a virus is a technique used by the developers of antivirus products to identify and neutralize new viruses or understand the behavior of malware.

Programming languages such as Java, which do not require computer programmers to manage low-level system details, have become ubiquitous. As a result, computer programmers have increasingly lost touch with what happens in a system during execution of programs. Ali [31.3] suggests that programmers can gain a better and deeper understanding of software and hardware through learning reverse engineering concepts. Hackers and crackers have been quite vocal and active in proving that they possess a deeper understanding of low-level system details than their professional counterparts [31.3].

According to Eilam [31.7], there are four software-security-related reverse engineering scenarios; just like development-related reverse engineering, the scenarios cover a broad spectrum of activities that include ensuring that software is safe to deploy and use, protecting clever algorithms or business processes, preventing pirating of software and digital media such as music, movies, and books, and making sure that cryptographic algorithms are not vulnerable to attacks. Figure 31.3 summarizes the software-security-related reverse engineering scenarios. The following are tasks one might perform in each of the reversing scenarios [31.7]:

• Detecting and neutralizing viruses and malware: Detect, analyze, or neutralize (clean) malware, viruses, spyware, and adware.
• Testing cryptographic algorithms for weaknesses: Test the level of data security provided by a given cryptographic algorithm by analyzing it for weaknesses.
• Testing digital rights management or license protection (antireversing): Protect software and media digital rights through application and testing of antireversing techniques.
• Auditing the security of program binaries: Audit a program for security vulnerabilities without access to the source code by scanning instruction sequences for potential exploits.

[Figure 31.3: security-related software reverse engineering scenarios]

31.4 Reversing and Patching Wintel Machine Code

The executable representation of software, otherwise known as machine code, is typically the result of translating a program written in a high-level language, using a compiler, to an object file, a file which contains platform-specific machine instructions. The object file is made executable using a linker, a tool which resolves the external dependencies that the object file has, such as operating system libraries. In contrast to high-level languages, there are low-level languages which are still considered to be high level by a computer’s CPU because the language syntax is still a textual or mnemonic abstraction of the processor’s instruction set. For example, assembly language, a language that uses helpful mnemonics to represent machine instructions, must still be translated to an object file and made executable by a linker. However, the translation from assembly code to machine code is done by an assembler instead of a compiler, reflecting the closeness of the assembly language’s syntax to actual machine code.

The reason why compilers translate programs coded in high-level and low-level languages to machine code is threefold:

1. CPUs only understand machine instructions.
2. Having a CPU dynamically translate higher-level language statements to machine instructions would consume significant, additional CPU time.
3. A CPU that could dynamically translate multiple high-level languages to machine code would be extremely complex, expensive, and cumbersome to maintain; imagine having to update the firmware in your microprocessor every time a bug is fixed or a feature is added to the C++ language!

To relieve a high-level language compiler from the difficult task of generating machine instructions, some compilers do not generate machine code directly; instead, they generate code in a low-level language such as assembly language [31.8]. This allows for a separation of concerns where the compiler does not have to know how to encode and format machine instructions for every target platform or processor; it can instead just concentrate on generating valid assembly code for an assembler on the target platform. Some compilers, such as the C and C++ compilers in the GNU Compiler Collection (GCC), have the option to output the intermediate assembly code that the compiler would otherwise feed to the assembler, allowing advanced programmers to tweak the code [31.9]. Therefore, the C and C++ compilers in GCC are examples of compilers that translate high-level language programs to assembly code instead of machine code; they rely on an assembler to translate their output into instructions the target processor can understand. Gough [31.9] outlined the compilation process undertaken by a GCC compiler to render an executable file as follows:

• Preprocessing: Expand macros in the high-level language source file
• Compilation: Translate the high-level source code to assembly language
• Assembly: Translate assembly language to object code (machine code)
• Linking (Create the final executable):
– Statically or dynamically link together the object code with the object code of the programs and libraries it depends on
– Establish initial relative addresses for the variables, constants, and entry points in the object code.

ping from each assembly language instruction to a machine instruction [31.10]. A tool that translates machine code back into assembly language is called a disassembler. From a reverse engineer’s perspective the next obvious step would be to translate assembly language back to a high-level language, where it would be much less difficult to read, understand, and alter the program. Unfortunately, this is an extremely difficult task for any tool because once high-level-language source code is compiled down to machine code, a great deal of information is lost. For example, one cannot tell by looking at the machine code which high-level language (if any) the machine code originated from. Perhaps knowing a particular quirk about a compiler might help a reverse engineer identify some machine code that it had a hand in creating, but this is not a reliable strategy.

The greatest difficulty in reverse engineering machine code comes from the lack of adequate decompilers, tools that can generate equivalent high-level-language source code from machine code. Eilam [31.7] argues that it should be possible to create good decompilers for binary executables, but recognizes that other experts disagree, raising the point that some information is “irretrievably lost during the compilation process”. Boomerang is a well-known open-source decompiler project that seeks to one day be able to decompile machine code to high-level-language source code with respectable results [31.11]. For those reverse engineers interested in recovering the source code of a program, decompilation may not offer much hope because, as stated in [31.11], “a general decompiler does not attempt to reverse every action of the compiler, rather it transforms the input program repeatedly until the result is high level source code. It therefore won’t recreate the original source file; probably nothing like it”.

To get a sense of the effectiveness of Boomerang

Incidentally, the Boomerang decompiler was unable to produce any output when HelloWorld.exe was built using Microsoft’s Visual C++ 2008 edition compiler.
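The correspondence between assembly mnemonics and machine instructions described above can be illustrated with a small lookup table. The following C++ sketch is not taken from the chapter: the names opcodeTable, assemble, and disassemble are hypothetical, and the table covers only six single-byte 32-bit x86 instructions that take no encoded operands. A real disassembler must also decode prefixes, operands, and variable-length instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy table of single-byte x86 instructions (illustrative subset only).
static const std::map<std::string, std::uint8_t>& opcodeTable() {
    static const std::map<std::string, std::uint8_t> table = {
        {"nop", 0x90}, {"ret", 0xC3}, {"int3", 0xCC},
        {"hlt", 0xF4}, {"push eax", 0x50}, {"pop eax", 0x58},
    };
    return table;
}

// "Assembler": translate each mnemonic into its opcode byte.
std::vector<std::uint8_t> assemble(const std::vector<std::string>& source) {
    std::vector<std::uint8_t> code;
    for (const std::string& line : source) {
        auto it = opcodeTable().find(line);
        assert(it != opcodeTable().end() && "unknown mnemonic");
        code.push_back(it->second);
    }
    return code;
}

// "Disassembler": invert the table and translate bytes back to mnemonics.
// Bytes outside the toy table are reported as "db ?".
std::vector<std::string> disassemble(const std::vector<std::uint8_t>& code) {
    std::map<std::uint8_t, std::string> inverse;
    for (const auto& entry : opcodeTable()) {
        inverse[entry.second] = entry.first;
    }
    std::vector<std::string> listing;
    for (std::uint8_t byte : code) {
        auto it = inverse.find(byte);
        listing.push_back(it != inverse.end() ? it->second : "db ?");
    }
    return listing;
}
```

Because each table entry pairs exactly one mnemonic with one byte, passing the output of assemble to disassemble returns the original listing. Nothing comparable exists for going from machine code back to C++ source, which is why the text treats disassembly as recoverable and decompilation as lossy.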
The full length of the C code generated by Boomerang for the HelloWorld.exe program contained 180 lines of confusing, nonsensical control structures and function calls to undefined methods. It is surprising to see such a poor decompilation result, but as stated in [31.11]: “Machine code decompilation, unlike Java/.NET decompilation, is still a very immature technology.” To ensure that decompilation was given a fair trial, another decompiler was tried on the HelloWorld.exe executable. The Reverse Engineering Compiler, or REC, is both a compiler and a decompiler that claims to be able to produce a “C-like” representation of machine code [31.12]. Unfortunately, the results

high-level-language source code of an executable does not seem feasible; however, because of the one-to-one correspondence between machine code and assembly language statements [31.10], we can obtain a low-level language representation. Fortunately, there are graphical tools available that not only include a disassembler, a tool which generates assembly language from machine code, but that also allow for debugging and altering the machine code during execution.

31.4.2 Wintel Machine Code Reversing and Patching Exercise
create and manage their passwords in a secure and convenient way. Before releasing a limited trial version of the application on our company’s Web site, we would like to understand how difficult it would be for a reverse engineer to circumvent a limitation in the trial version that exists to encourage purchases of the full version; the trial version of the application limits the number of password records a user may create to five.

The C++ version of the Password Vault application (included with this text) was developed to provide a nontrivial application for reversing exercises without the myriad of legal concerns involved with reverse engineering software owned by others. The Password Vault application employs 256-bit AES encryption, using the free cryptographic library crypto++ [31.13], to securely store passwords for multiple users, each in separate, encrypted XML files. By default, the Makefile that is used to build the Password Vault application defines a constant named “TRIALVERSION” which causes the resulting executable to limit the number of password records a user may create to only five, using conditional compilation. This limitation is very similar to limitations found in many shareware and trialware applications that are available on the Internet.

31.4.3 Recommended Reversing Tool for the Wintel Exercise

OllyDbg is a shareware interactive machine code debugger and disassembler for Microsoft Windows® [31.14]. The tool has an emphasis on machine code analysis, which makes it particularly helpful in cases where the source code for the target program is unavailable [31.14]. Figure 31.4 illustrates the OllyDbg graphical workbench. OllyDbg operates as follows: the tool will disassemble a binary executable, generate assembly language instructions from machine code instructions, and perform some heuristic analysis to identify individual functions (methods) and loops. OllyDbg can open an executable directly, or attach to one that is already running. The OllyDbg workbench can display several different windows, which are made visible by selecting them on the View menu bar item. The CPU window, shown in Fig. 31.4, is the default window that is displayed when the OllyDbg workbench is started. Table 31.1 lists the panes of the CPU window along with their respective capabilities; the contents of the table are adapted from the online documentation provided by Yuschuk [31.14] and experience with the tool.
Pane            Capabilities
Disassembler    Edit, debug, test, and patch a binary executable using actions available on a popup menu
                Patch an executable by copying edits to the disassembly back to the binary
Dump            Display the contents of memory or a file in one of 7 predefined formats: byte, text, integer, float, address, disassembly, or PE header
                Set memory breakpoints (triggered when a particular memory location is read from or written to)

– Restore the state of registers and memory on return from a call statement
– Allocate storage for the local variables, parameters, and return value of the called subroutine
– Provide a return address
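As a rough illustration of how a build-time constant such as TRIALVERSION can gate the trial limitation of Sect. 31.4.2 through conditional compilation, consider the following C++ sketch. It is not the Password Vault source: RecordStore, add, size, and kTrialLimit are hypothetical names, and only the constant name TRIALVERSION and the limit of five records come from the exercise description.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record store; every name except TRIALVERSION is an assumption.
class RecordStore {
public:
    // Returns false when the trial limit blocks the insertion.
    bool add(const std::string& record) {
#ifdef TRIALVERSION
        // This check exists in the binary only when the build defines
        // TRIALVERSION, e.g. by passing -DTRIALVERSION to the compiler.
        if (records_.size() >= kTrialLimit) {
            return false;
        }
#endif
        records_.push_back(record);
        return true;
    }

    std::size_t size() const { return records_.size(); }

private:
    static constexpr std::size_t kTrialLimit = 5;
    std::vector<std::string> records_;
};
```

In a build that defines TRIALVERSION, a sixth call to add returns false; in a build without it, the preprocessor drops the comparison before compilation, so the full version contains no limit check at all. In the trial build the check typically compiles down to a comparison followed by a conditional jump, which is the kind of instruction sequence a reverse engineer would look for in a disassembler.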
Fig. 31.5 Sample slide from the machine code reversing animated tutorial
Fig. 31.6 Execution of Java bytecode versus machine code. JVM Java Virtual Machine
31.5 Reversing and Patching Java Bytecode

• Java bytecode: “Bytecode is the intermediate representation of Java programs just as assembler is the intermediate representation of C or C++ programs” [31.16]. Java bytecode contains platform-independent instructions that are translated to platform-specific instructions by a Java Virtual Machine (JVM).

In Sect. 31.4, an attempt to recover the source code of a simple “Hello World” C++ application was unsuccessful when the output of two different compilers was given as input to the Boomerang decompiler. Much more positive results can be achieved for Java bytecode because of its platform-independent design and high-level representation. On Windows®, machine code is typically stored in files with the extensions *.exe and *.dll; the file extensions for machine code vary with the operating system. This is not the case with Java bytecode, as it is always stored in files that have a *.class extension. Related Java classes, such as those for an application or class library, are often bundled together in an archive file with a *.jar extension. The Java Language Specification allows at most one top-level public class to be defined per *.java source file and requires that the bytecode be stored in a file whose name matches TopLevelClassName.class.

31.5.1 Decompiling and Disassembling Java Bytecode

To demonstrate how much more feasible it is to recover Java source code from Java bytecode than it is to recover C++ code from machine code, we decompile the bytecode for the program ListArguments.java using Jad, a Java decompiler [31.17]; we then compare the Java source code generated with the original. Before performing the decompilation, we peek at the bytecode using javap to get an idea of how much information survives the translation from high-level Java source code to the intermediate format of Java bytecode. Algorithm 31.2 contains the source code for ListArguments.java, a simple Java program that echoes each argument passed on the command line to standard output.

Bytecode is stored in a binary format that is not human-readable and therefore must be “disassembled” for it to be read. Recall that the result of disassembling machine code is assembly language that can be converted back into machine code using an assembler; unfortunately, the same does not hold for disassembling Java bytecode. Sun Microsystems’ Java Development Kit (JDK) comes with javap, a command-line tool for disassembling Java bytecode; to say that javap “disassembles” bytecode is a bit of a misnomer since the output of javap is unstructured text which cannot be converted back into bytecode. The output of javap is nonetheless useful as a debugging and performance tuning aid since one can see which JVM instructions are generated from high-level Java language statements.

Algorithm 31.3 lists the Java bytecode for the main method of ListArguments.class; notice that the fully qualified name of each method invoked by the bytecode is preserved. It may seem curious that although ListArguments.java contains no references to the class java.lang.StringBuilder, there are many references to it in the bytecode; this is because the use of the “+” operator to concatenate strings is a convenience offered by the Java language that has no direct representation in bytecode. To perform the concatenation, the bytecode creates a new instance of the StringBuilder class and invokes its append method for each occurrence of the “+” operator in the original Java source code (there are three). A loss of information has indeed occurred, but we will see that it is
still possible to generate Java source code equivalent to the original in function, but not in syntax.

Algorithm 31.4 lists the result of decompiling ListArguments.class using Jad; although the code is different from the original ListArguments.java program, it is functionally equivalent and syntactically correct, which is a much better result than that seen earlier with decompiling machine code.

An advanced programmer who is fluent in the JVM specification could use a hex editor or a program to modify Java bytecode directly, but this is similar to editing machine code directly, which is error-prone and difficult. In Sect. 31.4, which covered reversing and patching of machine code, it was determined through discussion and an animated tutorial that one should work with disassembly to make changes to a binary executable. However, the result of disassembling Java bytecode is a pseudo-assembly language, a language that cannot be compiled or assembled but serves to provide a more abstract, readable representation of the bytecode. Because editing bytecode directly is difficult, and disassembling bytecode results in pseudo-assembly language which cannot be compiled, it would at first seem that losing
Java source code is a more dire situation than losing C++ code, but of course this is not the case since, as we have seen using Jad, Java bytecode can be successfully decompiled to equivalent Java source code.

31.5.2 Java Bytecode Reversing and Patching Exercise

This section introduces an exercise that is the Java bytecode equivalent of that given in Sect. 31.4.2 for Wintel machine code. Imagine that we have just implemented a Java version of the console application Password Vault, which helps computer users create and manage their passwords in a secure and convenient way. Before releasing a limited trial version of the application on our company’s Web site, we would like to understand how difficult it would be for a reverse engineer to circumvent a limitation in the trial version that exists to encourage purchases of the full version; the trial version of the application limits the number of password records a user may create to five.

The Java version of the Password Vault application (included with this text) was developed to provide a nontrivial application for reversing exercises without the myriad of legal concerns involved with reverse engineering software owned by others. The Java version of the Password Vault application employs 128-bit AES encryption, using Sun’s Java Cryptography Extension, to securely store passwords for multiple users, each in separate, encrypted XML files.

31.5.3 Recommended Reversing Tool for the Java Exercise

If using Jad from the command line does not sound appealing, there is a freeware graphical tool built upon Jad called FrontEnd Plus that provides a simple workbench for decompiling classes and
browsing the results [31.17]; it also has a convenient batch mode where multiple Java class files can be decompiled at once. After the Java code generated by Jad has been edited, it is necessary to recompile the source code back to bytecode to integrate the changes. The ability to recompile the Java code generated is not functional in the FrontEnd Plus workbench for some reason, though it is simple enough to do the compilation manually. Next we mention an animated tutorial for reversing a Java implementation of the Password Vault application, which was introduced in Sect. 31.4. Figure 31.7 shows a FrontEnd Plus workbench session containing the decompilation of ListArguments.class.

To demonstrate the use of FrontEnd Plus to reverse engineer and patch Java bytecode, a Java version of the Password Vault application was developed; recall that the animated tutorial in Sect. 31.4 introduced the machine code (C++) version. The Java version of the Password Vault application uses 128-bit instead of 256-bit AES encryption because Sun Microsystems’ standard Java Runtime Environment does not provide 256-bit encryption owing to export controls. A trial limitation of five password records per user is also implemented in the Java version. Unfortunately, Java does not support conditional compilation, so the source code cannot be compiled to omit the trial limitation without manually removing it or using a custom build process.

31.5.4 Animated Solution to the Java Reversing Exercise

Using FrontEnd Plus (and Jad), one can successfully reverse engineer a nontrivial Java application such as
Password Vault, and make permanent changes to the behavior of the bytecode. Again, the purpose of having placed a trial limitation in the Password Vault application is to provide an opportunity for one to observe how easy or difficult it is for a reverse engineer to disable the limitation. Just like for machine code, antireversing strategies can be applied to Java bytecode. We cover some basic, effective strategies for protecting bytecode from being reverse engineered in a later section.

For instructional purposes, an animated solution that demonstrates the complete end-to-end reverse engineering of the Java Password Vault application was created using Qarbon Viewlet Builder and can be viewed using Macromedia Flash Player. The tutorial begins with the Java Password Vault application, FrontEnd Plus, and Sun’s Java JDK v1.6 installed on a Windows® XP machine. Figure 31.8 contains an example slide from the animated tutorial. The animated tutorial, source code, and installer for the Java version of Password Vault can be downloaded from the following locations:

• http://reversingproject.info/repository.php?fileID=5_4_1 (Java bytecode reversing and patching animated solution)
• http://reversingproject.info/repository.php?fileID=5_4_2 (Password Vault Java source code)
• http://reversingproject.info/repository.php?fileID=5_4_3 (Password Vault (Java version) Windows® installer).

Begin viewing the tutorial by extracting password_vault_java_reversing_exercise.zip to a local directory and either running password_vault_java_reversing_exercise.exe, which should launch the standalone version of Macromedia Flash Player, or opening the file password_vault_java_reversing_exercise_viewlet._swf.html in a Web browser.

31.6 Basic Antireversing Techniques

Having seen that it is fairly straightforward for a reverse engineer to disable the trial limitation on the machine code and Java bytecode implementations of the Password Vault application, we now investigate applying antireversing techniques to both implementations to make it significantly more difficult for the trial limitation to be disabled. Although antireversing techniques cannot completely prevent software from being reverse engineered, they act as a deterrent by increasing the challenge for the reverse engineer. Eilam [31.7] stated, “It is never possible to entirely prevent reversing” and “What is possible is to hinder and obstruct reversers by wearing them out and making the process so slow and painful that they give up.” The remainder of this section introduces basic antireversing techniques, two of which are demonstrated in Sects. 31.7 and 31.8.

Although it is not possible to completely prevent software from being reverse engineered, a reasonable goal is to make it as difficult as possible. Implementing antireversing strategies for source code, machine code, and bytecode can have adverse effects on a program’s size, efficiency, and maintainability; therefore, it is important to evaluate whether a particular program warrants the cost of protecting it. The basic antireversing techniques introduced in this section are meant to be applied after production, after the coding for an application is complete and has been tested. These techniques obscure data and logic and therefore are difficult to implement while also working on the actual functionality of the application; doing so could hinder or slow down debugging and, even worse, create a dependency between the meaningful program logic and the antireversing strategies used. Eilam [31.7] described three basic antireversing techniques:

1. Eliminating symbolic information: The first and most obvious step in preventing reverse engineering of a program is to render unrecognizable all symbolic information in machine code or bytecode because such information can be quite useful to a reverse engineer. Symbolic information includes class names, method names, variable names, and string constants that are still readable after a program has been compiled down to machine code or bytecode.
2. Obfuscating the program: Obfuscation includes eliminating symbolic information, but goes much further. Obfuscation strategies include modifying the layout of a program, introducing confusing nonessential logic or control flow, and storing data in difficult-to-interpret organizations or formats. Applying all of these techniques can render a program difficult to reverse; however, care must be taken to ensure the original functionality of the application remains intact.
3. Embedding antidebugger code: Static analysis of machine code is usually carried out using a disassembler and heuristic algorithms that attempt to understand the structure of the program. Active or live analysis of machine code is done using an interactive debugger-disassembler that can attach to a running program and allow a reverse engineer to step through each instruction

the machine code that is not important for the execution of the program, but serves to ease debugging or reuse of it by another program. For example, if a program relies on certain function or method names (as a dynamic link library does), the names of those methods or functions will appear in the .idata
and observe the behavior of the program at key (import data) section of the Windows PE header. In
points during its execution. Live analysis is how production versions of a program, the machine code
most reverse engineers get the job done, so it is does not directly contain any symbolic information
common for developers to want to implement from the original source code – such as method
guards against binary debuggers. names, variable names, or line numbers; the exe-
cutable file only contains the machine instructions
that were produced by the compiler [31.9]. This lack
31.7 Applying Antireversing of information about the connection between the
Techniques to Wintel machine instructions and the original source code
Machine Code is unacceptable for purposes of debugging – this is
why most modern compilers, such as those in GCC,
include an option to insert debugging information
Extreme care must be taken when applying antire-
into the executable file that allows one to trace a fail-
versing techniques because some ultimately change
ure occurring at a particular machine instruction
the machine code or Java bytecode that will be exe-
back to a line in the original source code [31.9].
cuted on the target processor. In the end, if a pro-
To show the various kinds of symbolic informa-
gram does not work, measuring how efficient or
tion that is inserted into machine code to enable de-
difficult to reverse engineer it is becomes meaning-
bugging of an application, the GNU C++ compiler
less [31.18]. Some of the antireversing transforma-
was directed to compile the program Calculator.cpp
tions performed on source code to make it more dif-
with debugging information but to generate assem-
ficult to understand in both source and executable
bly language instead of machine code. The source
formats can make the source code more challenging
code for Calculator.cpp and the assembly language
for a compiler to process because the program no
equivalent generated are given in Algorithm 31.5.
longer looks like something a human would write.
The GNU compiler stores debug information in the
Weinberg [31.18] stated that “any compiler is going
symbol tables (.stabs) section of the Windows PE
to have at least some pathological programs which it
header so that it will be loaded into memory as part
will not compile correctly.” Compiler failures on so-
of the program image. It should be clear from the as-
called pathological programs occur because com-
sembly language generated shown in Algorithm 31.5
piler test cases are most often coded by people –
that the debugging information inserted by GCC CE1
not mechanically generated by a tool that knows
is by no means a replacement for the original source
… continues on to include the most prolific author of such helpful information – the programmer. Recall that in the animated tutorial on reversing Wintel machine code (see Sect. 31.4) the key piece of information that led to the solution was the trial limitation message found in the .rdata (read-only) section of the executable. One can imagine that something as simple as having the Password Vault application load the trial limitation message from a file each time it is needed and immediately clearing it from memory would have prevented the placement of a memory breakpoint on the trial message, which was an anchor for the entire tutorial. An alternative to moving the trial limitation message out of the executable would be to encrypt it so that a search of the dump would not turn up any hits; of course, encrypted symbolic information would need to be decrypted before it is used. Encryption of symbolic information, as was discussed in relation to the Wintel animated tutorial, is an activity related to the obfuscation of a program, which we discuss next.

31.7.2 Basic Obfuscation of Wintel Machine Code

Obfuscating the program calls for performing transformations to the source code and/or machine code that would render it extremely difficult to understand but functionally equivalent to the original. There are many kinds of transformations one can apply with varying levels of effectiveness, and as Eilam [31.7] stated, “an obfuscation transformation will typically have an associated cost (such as): larger code, slower execution time, or increased runtime memory consumption (by the machine code).”

Because of the high-level nature of intermediate languages such as Java and .NET bytecode, there are free obfuscation tools that can perform fairly robust transformations on bytecode so that any attempt to decompile the program will still result in source code that compiles, but is nearly impossible to understand because of the obfuscation techniques that are applied. Kalinovsky [31.19] stated: “Obfuscation (of Java bytecode) is possible for the same reasons that decompiling is possible: Java bytecode is standardized and well documented.” Unfortunately, the situation is very different for machine code because it is not standardized; instruction sets, formats, and program image layouts vary depending on the target platform architecture. The side effect of this is that tools to assist with obfuscating machine code are much more challenging to implement and expensive to acquire; no free tools were found at the time of this writing. One such commercial tool, EXECryptor (http://www.strongbit.com), is an industrial-strength machine code obfuscator that when applied to the machine code for the Password Vault application rendered it extremely difficult to understand. The transformations performed by EXECryptor caused such extreme differences in the machine code, including having compressed parts of it, that it was not possible to line up the differences between the original and obfuscated versions of the machine code to show evidence of the obfuscations. Therefore, to demonstrate machine code obfuscations in a way that is easy to follow, we will perform obfuscations at the source code level and observe the differences in the assembly language generated by the GNU C++ compiler. The key idea here is that the obfuscated program has the same functionality as the original, but is more difficult to understand during live or static analysis attempts. There are no standards for code obfuscation, but it is important to ensure that the obfuscations applied to a program are not easily undone because deobfuscation tools can be used to eliminate easily identified obfuscations [31.7].

Algorithm 31.6 contains the source code and disassembly of VerifyPassword.cpp, a simple C++ program that contains an insecure password check that is no weaker than the implementation of the Password Vault trial limitation check. To find the relevant parts of .text and .rdata sections that are related to the password check, the now familiar technique of setting a breakpoint on a constant in the .rdata section was used.

Using the simple program VerifyPassword.cpp, we now investigate applying obfuscations to make machine code more difficult to reverse engineer. The first obfuscation that will be applied is a data transformation technique which in [31.7] is called “modifying variable encoding.” Essentially this technique prescribes that all meaningful and sensitive constants in a program be stored or represented in an alternative encoding, such as ciphertext. For numerics, one can imagine storing or working with a function of a number instead of the number itself; for example, instead of testing for $\alpha < 10$, we can obscure the test by checking if $1.2^{\alpha} < 1.2^{10}$ instead. To make string constants unreadable in a dump of the .rdata section, we can employ a simple substitution cipher
whose decryption function would become part of the machine code. A simple substitution cipher is an encryption algorithm where each character in the original string is replaced by another using a one-to-one mapping [31.20]. Substitution ciphers are easily broken because the algorithm is the secret [31.21], so although we will use one for ease of demonstration, stronger encryption algorithms should be used in real-world scenarios.

Algorithm 31.7 contains the definition of a simple substitution cipher that shifts each character 13 positions to the right in the local 8-bit ASCII or EBCDIC character set. Ciphertext is generated or read in printable hexadecimal format to allow all members of the character set, including control characters, to be used in the mappings. Note that unlike ROT13 [31.22], this cipher is not its own inverse – meaning that shifting each character an additional 13 positions to the right will not perform decryption.

Using the substitution cipher given in Algorithm 31.7, we replace each string constant in VerifyPassword.cpp with its equivalent ciphertext. Even strings with format modifiers such as “%s” and “%d” can be encrypted as these inserts are not interpreted by methods such as printf and sprintf until execution time. Algorithm 31.8 contains the source code and disassembly for VerifyPasswordObfuscated.exe,
17: {
18: cout << cipher.decryptFromHex(password_bad) << endl;
19: }
20: }
VerifyPasswordObfuscated.exe disassembly (abbreviated):
.rdata section
00445000 35323742383137323746324437443645 527B81727F2D7D6E
00445010 38303830383437433746373134373244 8080847C7F71472D
00445020 00373738323744324538313732374600 .77827D2E81727F.
00445030 36383543353836413244344537303730 685C586A2D4E7070
00445040 37323830383032443734374636453742 7280802D747F6E7B
where each string constant in the program is stored as ciphertext; when the program needs to display a message, the ciphertext is passed to the bundled decryption routine. The transformation we have manually applied removes the helpful information the string constants provided when they were stored in the clear. Given that modern languages have well-documented grammars, it should be possible to develop a tool that automatically extracts and replaces all string constants with ciphertext that is wrapped by a call to the decryption routine.

Once all constants have been stored in an alternative encoding, the next step one could take to further protect the VerifyPassword.cpp program would be to obfuscate the condition in the code that tests for the correct password. Applying transformations to disguise key logic in a program is an activity related to the antireversing technique obfuscating the program. For purposes of demonstration, we will implement some obfuscations to the trial limitation check in the C++ version of the Password Vault application, which was introduced in Sect. 31.4, but first we discuss an additional application of the technique (obfuscating the program) that helps protect intellectual property when proprietary software is shipped as source code.

… intellectual property that is worth protecting, one can perform transformations to the source code which make it difficult to read, but have no impact on the machine code that would ultimately be generated when the program is compiled. To demonstrate source code obfuscation, COBF [31.23], a free C/C++ source code obfuscator, was configured and given VerifyPassword.cpp as input; the results of this are displayed in Algorithm 31.9.

COBF replaces all user-defined method and variable names in the immediate source file with meaningless identifiers. In addition, COBF replaces standard language keywords and library calls with meaningless identifiers; however, these replacements must be undone before compilation. For example, the keyword “if” cannot be left as “lm.” Therefore, COBF generates the cobf.h header file, which includes the necessary substitutions to make the obfuscated source code compilable. Through this process, all user-defined method and variable names within the immediate file are lost, rendering the source code difficult to understand, even if one performs the substitutions prescribed in cobf.h.

Since COBF generates obfuscated source code as a continuous line, any formatting in the source code that served to make it more readable is lost. Although the original formatting cannot be recovered, a code formatter such as Artistic Style [31.24] can be used to format the code using ANSI formatting schemes so that methods and control structures can again be identified via visual inspection. Source code obfuscation is a fairly weak form of intellectual property protection, but it does serve a purpose in real-world scenarios where a given application needs to be built on the end-user’s target computer – instead of being prebuilt and delivered on installation media.

31.7.4 Advanced Obfuscation of Machine Code

To see which instructions are executed when the trial limitation message is displayed, the reverser can choose to record a trace of all the instructions that are executed when execution is resumed. To make it difficult for a reverse engineer to understand the logic of a program through tracing or stepping through instructions, we can employ control flow obfuscations, which introduce confusing, randomized, benign logic that serves to make live and static analysis (debugging and tracing) difficult. The often randomized and recursive nature of effective control flow obfuscations can make traces more difficult to understand and interactive debugging sessions less helpful: randomization makes the execution of the program appear different each time it
is run, whereas recursion makes stepping through code more difficult because of deeply nested procedure calls.

In [31.7], three types of control flow transformations were introduced: computation, aggregation, and ordering. Computation transformations reduce the readability of machine code and, in the case of opaque predicates, can make it difficult for a decompiler to generate equivalent high-level-language source code. Aggregation transformations attempt …

… flow obfuscation to the trial limitation check in the Password Vault application and analyze their potential effectiveness by gathering some statistics on the execution of the obfuscated trial limitation check.

31.7.5 Wintel Machine Code Antireversing Exercise

Apply the antireversing techniques eliminating symbolic information and obfuscating the program, both introduced in Sects. 31.6 and 31.7, to the C/C++ source code of the Password Vault application with the goal of making it more difficult to disable the trial limitation. Rebuild the executable binary for …

… original and obfuscated source code of the Password Vault application. As each antireversing transformation is applied to the source code, important differences and additions are explained through a series
Algorithm 31.10 Encrypted strings are decrypted each time they are displayed
-------------------------------------------------------------------------------
     case __createPasswordRecord: return "Create a Password Record";
==>  case __createPasswordRecord:
         DecryptMessageText("507F726E81722D6E2D5D6E8080847C7F712D5F72707C7F71", _textBuffer);
-------------------------------------------------------------------------------
     case __recordLimitReached: return "Thank you for trying Password Vault! You have reached the maximum number of records allowed in this trial version.";
==>  case __recordLimitReached:
         DecryptMessageText("61756E7B782D867C822D737C7F2D817F86767B742D5D6E8"
                            "080847C7F712D636E8279812E2D667C822D756E83722D7F726E707572712D817572"
                            "2D7A6E85767A827A2D7B827A6F727F2D7C732D7F72707C7F71802D6E79797C84727"
                            "12D767B2D817576802D817F766E792D83727F80767C7B3B", _textBuffer);
-------------------------------------------------------------------------------
void PasswordVaultConsoleUtil::DecryptMessageText(const char *_cipherText, string *_plainTextBuffer)
{
    string cipherText(_cipherText);
    SubstitutionCipher cipher;
    _plainTextBuffer->assign(cipher.decryptFromHex(cipherText));
}
-------------------------------------------------------------------------------
of generated difference reports and memory dumps. Once the antireversing transformations have been applied, we cover the impact they have on the machine code and how reversing the Password Vault application becomes more difficult when these obfuscations make it difficult to find a good starting point and hinder live and static analysis. The obfuscated source code for the Password Vault application is located in the obfuscated_source directory of the archive located at http://reversingproject.info/repository.php?fileID=__.

Encryption of String Literals

To eliminate the obvious starting point of setting an access breakpoint on the trial message, all of the messages issued by the application are stored as encrypted hexadecimal literals that are decrypted each time they are used – keeping the decrypted versions out of memory as much as possible. Algorithm 31.10 gives an example of the necessary code changes to PasswordVaultConsoleUtil.cpp.

The net effect of encrypting the literals is shown in Fig. 31.8. Here a dump of the .rdata section of the Password Vault program image no longer yields the clues it once did. Since the literals are no longer readable, one cannot simply locate and set a breakpoint on the trial limitation message – as was done in the solution to the Wintel machine code reversing exercise – causing a reverser to choose an alternative strategy. Note that more than just the trial limitation message would need to be encrypted, otherwise it would look quite suspicious in a memory dump alongside other nonencrypted strings!

Obfuscating the Numeric Representation

Having obfuscated the string literals in the program image, we will assume that a reverse engineer will need to select the alternative strategy of pausing the program’s execution immediately before specifying the input that causes the trial limitation message to be displayed. Using this strategy, a reverser can either capture a trace of all the machine instructions that are executed when the trial limitation message is displayed, or debug the application – stepping through each machine instruction until a sequence that seems responsible for enforcing the trial limitation is reached. Recall that in the solution to the Wintel machine code reversing exercise, an obvious instruction sequence that tested a memory location for a limit of five password records was found. By using an alternative but equivalent representation of the record limit, we can make the record limit test a bit less obvious. The technique we employ here is to use a function of the record limit instead of the actual value; for example, instead of testing for $\alpha \geq 5$, where α is the number of stored password records and 5 is the record limit, we obscure the limit by testing if $2^{\alpha} \geq 2^{5}$. Algorithm 31.11 gives an example of the necessary code changes to PasswordVault.cpp.

Algorithm 31.11 Encrypted strings are decrypted each time they are displayed

The effects of the source code changes in Algorithm 31.11 on the machine code are shown in Fig. 31.9. A function of the record limit is referenced during execution instead of the limit itself. This type of obfuscation is only as strong as the function used to obscure the actual condition is difficult to unravel. Keep in mind that a reverse engineer will not have the nonobfuscated machine code for reference, so even a very weak function, such as the one used in this solution, may be effective at wasting some of a reverser’s time. The numeric function used here is very simple; more complex functions can be devised that would further decrease the readability of the machine code.

… author using the cyclomatic complexity metric defined by McCabe [31.24] as a general guideline for creating a highly complex control flow graph for the trial limitation check.

The record limit check is abstracted out into the method isRecordLimitReached, which returns whether or not the record limit is reached after having invoked the method isRecordLimitReached_0. The method isRecordLimitReached_0 invokes itself recursively a random number of times, increasing the call stack by a minimum of 16 frames and a maximum of 64 frames. Each invocation of isRecordLimitReached_0 tests whether the record limit has been reached, locally storing the result, before randomly invoking one of the methods isRecordLimitReached_1, isRecordLimitReached_2, or isRecordLimitReached_3. When the call stack is unraveled, isRecordLimitReached_0 finally returns whether or not the record limit is reached in the method isRecordLimitReached. Algorithm 31.12 shows the required code changes to implement the control flow obfuscation. Note that a sum of random numbers returned from methods isRecordLimitReached_1, isRecordLimitReached_2, and isRecordLimitReached_3 is stored in randCallSum, a private attribute of the class; this is to protect against a compiler optimizer discarding the calls because they would otherwise have no effect on the state of any variables in the program.

… limitation check through live or static analysis of the disassembly. Live analysis is hampered more by randomization than static analysis is because the control flow of the trial limitation check is randomized
Fig. 31.9 Record limit comparands are represented as exponents with a base of 2 (computation obfuscation)
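The numeric obfuscation of the record limit can be sketched in isolation as follows. This is a minimal illustration rather than the Password Vault code: the function names and the constant kTrialRecordLimit are assumptions introduced for the example. It relies only on the fact that f(x) = 2^x is strictly increasing, so comparing the function values gives the same answer as comparing the arguments.

```cpp
#include <cmath>
#include <cstddef>

// Assumed name for the limit of five password records used in the chapter.
constexpr std::size_t kTrialRecordLimit = 5;

// Plain test: the limit appears directly as a comparand in the machine code.
inline bool isRecordLimitReachedPlain(std::size_t recordCount) {
    return recordCount >= kTrialRecordLimit;
}

// Obscured test: compare 2^recordCount with 2^5 instead of recordCount with 5.
// Because 2^x is strictly increasing, the result is identical to the plain test.
inline bool isRecordLimitReachedObscured(std::size_t recordCount) {
    return std::pow(2.0, static_cast<double>(recordCount)) >= std::pow(2.0, 5.0);
}
```

One caveat under these assumptions: an optimizing compiler may evaluate the constant expression at build time, in which case the value 32 rather than 5 appears in the machine code. The limit is still disguised, but only weakly, which matches the remark in the text that this function is a very simple one.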
each time it is run; one can imagine the confusion that would arise if breakpoints are not always triggered, or if they are triggered in an unpredictable order.

OllyDbg run traces are captured using the run trace view once the execution of a program has been paused at the desired starting point. To have the trace logged to a file in addition to the view, select “log to file” on the context menu of the run trace view. Begin the trace by selecting “Trace into” on the “Debug” menu; the program will execute, but much more slowly than normal since each instruction must be inspected and added to the run trace view and optional log file. An OllyDbg trace will include all the instructions executed by the program and its operating system dependencies; fortunately the trace is columnar, with each instruction qualified by the name of the module that executed it, so it is possible to postprocess the trace and extract only those instructions executed by a particular module of interest. For example, in the case of the Password Vault traces which we will analyze in this section, the Sed (stream-editor) utility was used to filter the run traces – leaving only instructions executed by the “Password” module.

To analyze the effectiveness of the ordering (control flow) obfuscation, statistics on the differences between three different run traces were gathered using …
Fig. 31.10 Obfuscated control flow logic for testing the password record limit (flowchart: isRecordLimitReached() calls isRecordLimitReached_0(), which evaluates bool reached = (2^records.getSize()) >= (2^5) and then branches on abs(rand()) % 3 to isRecordLimitReached_1() for 0, isRecordLimitReached_2() for 1, or isRecordLimitReached_3() for 2)
… instruction insertions, deletions, or substitutions needed to transform one trace into the other; we have modified LD to consider each instruction instead of each character in the run traces. Figure 31.11 illustrates the significant differences that exist between the traces at the point of the obfuscated trial limitation check. The randomized control flow obfuscation causes significant differences in subsequent executions of the trial limitation check – hopefully creating enough of a deterrent for a reverse engineer by hampering live and static analysis efforts. Table 31.2 contains the statistical data that were gathered for the analysis.

A C++ implementation of LD, written for this solution, can be downloaded from http:// …

… with a modern PC. For reference, the average size of the three traces analyzed in this section is 10 MB, and computing the edit distance between two of them required an average of approximately 20 h of CPU time on an Intel Pentium 1.6 GHz dual-core processor. The LD implementation employed in this analysis uses a dynamic-programming approach that requires O(m) space; note that some reference implementations of LD require O(mn) space since they use an (m + 1) × (n + 1) matrix, which is impractical for large files [31.25]. The approximately 20 h execution time for the LD implementation is mainly because the dynamic-programming algorithm is quite naïve; perhaps an approximation algorithm would perform significantly better.
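The space-saving idea behind the LD computation described above can be sketched as follows. This is not the implementation used for the analysis in this section; it is a minimal two-row dynamic-programming version, written under the assumption that each trace has already been split into one string per instruction. The function name traceEditDistance is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Edit distance between two run traces, counting whole instructions
// (not characters) as the unit of insertion, deletion, and substitution.
// Only two rows of the (m + 1) x (n + 1) table are kept, so the extra
// space is proportional to the length of the shorter trace.
inline std::size_t traceEditDistance(const std::vector<std::string>& a,
                                     const std::vector<std::string>& b) {
    // Lay the shorter trace along the row to minimize the row length.
    const std::vector<std::string>& rowSeq = (a.size() <= b.size()) ? a : b;
    const std::vector<std::string>& colSeq = (a.size() <= b.size()) ? b : a;

    std::vector<std::size_t> prev(rowSeq.size() + 1);
    std::vector<std::size_t> curr(rowSeq.size() + 1);
    for (std::size_t j = 0; j <= rowSeq.size(); ++j) {
        prev[j] = j;  // distance from the empty trace: j insertions
    }

    for (std::size_t i = 1; i <= colSeq.size(); ++i) {
        curr[0] = i;  // distance to the empty trace: i deletions
        for (std::size_t j = 1; j <= rowSeq.size(); ++j) {
            const std::size_t substitution =
                prev[j - 1] + (colSeq[i - 1] == rowSeq[j - 1] ? 0 : 1);
            const std::size_t deletion = prev[j] + 1;
            const std::size_t insertion = curr[j - 1] + 1;
            curr[j] = std::min({substitution, deletion, insertion});
        }
        std::swap(prev, curr);  // the row just computed becomes the previous row
    }
    return prev[rowSeq.size()];
}
```

The running time is still proportional to the product of the two trace lengths, which is in line with the long execution times reported above for 10 MB traces; only the memory requirement is reduced.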
Algorithm 31.12 PasswordVault.cpp: implementation of the control flow obfuscation in Fig. 31.10
------------------------------------------------------------------------
     if (passwordStore.getRecords().size() >= TRIAL_RECORD_LIMIT)
===> if (isRecordLimitReached())
------------------------------------------------------------------------
bool PasswordVault::isRecordLimitReached()
{
    srand(time(NULL));
    // Number of extra recursive calls to make before the check returns.
    controlFlowAltRemain = max(4, abs(rand()) % 64);
    return isRecordLimitReached_0();
}

bool PasswordVault::isRecordLimitReached_0()
{
    // Recurse until the randomly chosen call budget is used up.
    while (controlFlowAltRemain > 0)
    {
        controlFlowAltRemain--;
        isRecordLimitReached_0();
    }

    // Obscured form of the test size() >= 5.
    bool reached = (pow(2.0, (double)passwordStore.getRecords().size()) >= pow(2.0, 5.0));

    randCallSum = 0;

    // Invoke one of three decoy methods at random; storing the result
    // keeps the optimizer from discarding the call.
    switch (abs(rand()) % 3)
    {
        case 0:
            randCallSum += isRecordLimitReached_1();
            break;
        case 1:
            randCallSum += isRecordLimitReached_2();
            break;
        case 2:
            randCallSum += isRecordLimitReached_3();
            break;
    }

    return reached;
}

unsigned int PasswordVault::isRecordLimitReached_1()
{
    return abs(rand());
}

unsigned int PasswordVault::isRecordLimitReached_2()
{
    return abs(rand());
}

unsigned int PasswordVault::isRecordLimitReached_3()
{
    return abs(rand());
}
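Algorithm 31.12 is an ordering (control flow) transformation. The opaque predicates mentioned in Sect. 31.7.4 as a computation transformation can be sketched in a similar spirit. An opaque predicate is a condition whose outcome is known to the person applying the obfuscation but is not obvious from the code. The example below is a hypothetical illustration, not part of the Password Vault solution; the function names are assumptions, and the predicate relies on the fact that the product of two consecutive integers is always even.

```cpp
#include <cstddef>

// Opaquely true predicate: x * (x + 1) is a product of two consecutive
// integers and is therefore always even. Unsigned arithmetic wraps modulo
// a power of two, which preserves parity, so the result holds for every x.
inline bool opaquelyTrue(unsigned int x) {
    return ((x * (x + 1u)) % 2u) == 0u;
}

// Hypothetical guarded version of the trial limitation check. The real
// logic sits behind the opaque predicate; the second return statement is
// a decoy that is never executed but still appears in the disassembly.
inline bool isRecordLimitReachedGuarded(std::size_t recordCount, unsigned int noise) {
    if (opaquelyTrue(noise)) {
        return recordCount >= 5;  // real test
    }
    return (noise % 7u) == 3u;    // dead decoy branch
}
```

A predicate this simple may be folded away by an optimizing compiler or recognized by a deobfuscation tool, so it should be read only as an illustration of the idea; as noted earlier, easily identified obfuscations can be removed automatically [31.7].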
Fig. 31.11 Edit distances between three run traces of the trial limitation check (bar chart; trace pairs compared: Trace 1 → Trace 2, Trace 1 → Trace 3, Trace 2 → Trace 3; vertical axis from 0 to 0.8)

Table 31.2 Statistical data gathered for randomized control flow obfuscation
31.8 Applying Antireversing Techniques to Java Bytecode

… in the constant pool of the bytecode that describes the type of object a collection should contain; this information can then be used at execution time by the JVM to validate the type of each object in the collection. The strategy of having newer Java language constructs result in compatible bytecode with op- …

… correct Java source code for a slightly larger program. Given these results, one does need to be concerned with protecting Java bytecode from decompilation if there is significant intellectual property in the program. The techniques used to protect machine code in the antireversing exercise solution, detailed in Sect. 31.7.6, can also be applied to Java source code to produce bytecode that is obfuscated. Since Java bytecode is standardized and well documented, there are many free Java obfuscation tools available on the Internet, such as SandMark [31.26], ProGuard [31.27], and RetroGuard [31.28], which perform transformations directly on the Java bytecode instead of on the Java source code itself. Obfuscating bytecode is inherently easier than obfuscating source code because bytecode has a significantly stricter and more organized representation than source code – making it much easier to parse. For example, instead of parsing through Java source code looking for string constants to encrypt (protect), one can easily look in the constant pool section of the bytecode. The constant pool section of a Java class file, unlike the .rdata section of Wintel machine code, contains a well-documented table data structure that makes available the name and length of each constant; on the other hand, the .rdata section of Wintel machine code simply contains all the constants in the program in a contiguous, unstructured bytestream.

The variable names, method names, and string literals in the constant pool section of Java bytecode provide a wealth of information to a reverse engineer regarding the structure and operation of the bytecode and hence should be obfuscated to protect the software. Therefore, we now look at applying the technique eliminating symbolic information in the context of Java bytecode.

31.8.1 Eliminating Symbolic Information in Java Bytecode

Variable, class, and method names are all left intact when compiling Java source code to Java bytecode. This is a stark difference from machine code, where variable and method names are not preserved. Sun Microsystems’ Java compiler, javac, provides an option to leave out debugging information in Java bytecode: specifying javac -g:none will exclude information on line numbers, the source file name, …

… obfuscation tool, a high level of protection can be achieved for Java bytecode by applying three transformations: (1) name obfuscation, (2) string encryption, and (3) flow obfuscation. Unfortunately, at the time of this writing, no free-of-charge software tool was found on the Internet that can perform all three of these transformations to Java bytecode. A couple of tools, namely, ProGuard [31.27] and RetroGuard [31.28], are capable of applying transformation 1, and SandMark [31.26], a Java bytecode watermarking and obfuscation research tool, is capable of applying transformation 2, although not easily. Experimentation with SandMark V3.4 was not promising since its “string encoder” obfuscation function only worked on a trivial Java program; it failed when given more substantial input such as some of the classes that implement the Java version of the Password Vault application. It is clear from a survey of existing Java bytecode obfuscators that a full-function, robust, open-source bytecode obfuscator is sorely needed. Zelix Klassmaster, a commercial product capable of all three transformations mentioned above, is said to be the best overall choice of Java bytecode obfuscator in [31.19]. A 30-day evaluation version of Zelix Klassmaster can be downloaded from the company’s Web site.

Of course, one can always make small-scale modifications to Java bytecode with a bytecode editor such as CafeBabe [31.30]. Incidentally, CafeBabe gets its catchy name from the fact that the hexadecimal value 0xCAFEBABE comprises the first four bytes of every Java class file; this value is known as the “magic number” which identifies every valid Java class file. To demonstrate applying transformations to Java bytecode, we will target the bytecode for the program CheckLimitation.java, whose source code is given in Algorithm 31.13; for this demonstration, assume that a reverse engineer is interested in eliminating the limit on the number of passwords and that we are interested in protecting the software.

We begin obfuscating CheckLimitation.java by applying transformation 1, i.e., name obfuscation: rename all variables and methods in the bytecode so they no longer provide hints to a reverser when the bytecode is decompiled or edited. Using ProGuard, we obfuscate the bytecode and then decompile it us-
and local variables. This option offers little to no ing Jad to observe the effectiveness of the obfusca-
Proof number:1
help in fending off a reverse engineer since none of tion; the result of decompiling the obfuscated byte-
the variable names, methods names, or string lit- code using Jad is given Algorithm 31.14. As ex-
erals are obfuscated. According to the documenta- pected, all user-defined variable and method names
tion for Zelix Klassmaster [31.29], a Java bytecode have been changed to meaningless ones; of course,
the names of Java standard library methods must be left as is. ProGuard seems to use a different obfuscation scheme for local variables within a method; it is not clear why the variable “loop” in the main method has been changed to “flag” since it is still a very descriptive name.

Next we further obfuscate the bytecode by applying transformation 2, i.e., string encryption, and we do so by employing the “String Encoder” obfuscation in SandMark to protect the string literals in the program from being understood by a reverser. The “String Encoder” function in SandMark implements an encryption strategy for literals in the bytecode that is similar to the one which was demonstrated at the source code level in the Wintel machine code antireversing background section: each string literal is stored in a weakly encrypted form and decrypted on demand by a bundled decryption function. Algorithm 31.15 contains the Jad decompilation result for the CheckLimitation.java bytecode that was first obfuscated using ProGuard and subsequently obfuscated using the “String Encoder” functionality in SandMark.

We can see that each string literal is decrypted using the Obfuscator class which was generated by SandMark. Because Obfuscator is a public class, it must be generated into a separate file named Obfuscator.class, making it very straightforward for a reverser to isolate, decompile, and learn the encryption algorithm. The danger of giving away the code for the string decryption algorithm is that it could then be used to programmatically update the constant pool section of the bytecode to contain the plaintext versions of each string literal, essentially undoing the obfuscation. Ideally, we would like to prevent a reverser from being able to successfully decompile the obfuscated bytecode; this can be accomplished through control flow obfuscations, which we explore next.

31.8.2 Preventing Decompilation of Java Bytecode

One of the most popular, and fragile, techniques for preventing decompilation involves the use of opaque predicates which introduce false ambiguities into the control flow of a program, tricking a decompiler into traversing garbage bytes that are masquerading as the logic contained in an else clause. Opaque predicates are false branches, branches that appear to be conditional but are really not [31.7]. For example, the conditions “if ( 1 == 1 )” and “if ( 1 == 2 )” implement opaque predicates because the first always evaluates to true, and the second always evaluates to false. The essential element in preventing decompilation with opaque predicates is the insertion of invalid instructions in the else branch of an always-true predicate (or the if-body of an always-false predicate). Since the invalid instructions will never be reached during normal operation of the program, there is no impact on the program’s
operation. The obfuscation only interferes with decompilation, where a naïve decompiler will evaluate both “possibilities” of the opaque predicate and fail on attempting to decompile the invalid, unreachable instructions. Figure 31.12 illustrates how opaque predicates would be used to protect bytecode from decompilation. Unfortunately, this technique, often used in protecting machine code from disassembly, cannot be used with Java bytecode because of the presence of the Java Bytecode Verifier in the JVM. Before executing bytecode, the JVM performs the following checks using single-pass static analysis to ensure that the bytecode has not been tampered with; to understand why this is beneficial, imagine bytecode being executed as it is received over a network connection. The following checks made by the Java Bytecode Verifier are documented in [31.31]:

• Type correctness: Arguments of an instruction, whether on the stack or in registers, should always be of the type expected by the instruction.
• No stack overflow or underflow: Instructions which remove items from the stack should never do so when the stack is empty (or does not contain at least the number of arguments that the instruction will pop off the stack). Likewise, instructions should not attempt to put items on top of the stack when the stack is full (as calculated and declared for each method by the compiler).
• Register initialization: Within a single method, any use of a register must come after the initialization of that register (within the method). That is, there should be at least one store operation to that register before a load operation on that register.
• Object initialization: Creation of object instances must always be followed by a call to one of the possible initialization methods for that object (these are the constructors) before it can be used.
• Access control: Method calls, field accesses, and class references must always adhere to the Java visibility policies for that method, field, or reference. These policies are encoded in the modifiers (private, protected, public, etc.).

                 Opaque predicate template

doWork();        if (1 == 1)
                 {
                     doWork();
                 } else
                 {
                     // garbage bytes
                 }

Fig. 31.12 Usage of opaque predicates to prevent decompilation

On the basis of the high level of bytecode integrity expected by the JVM, introducing garbage or illegal instructions into bytecode is not feasible. However, this technique does remain viable for machine code, though there is some evidence that good disassemblers, such as IDA Pro, do check for rudimentary opaque predicates [31.7]. The authors of SandMark claim that the mere presence of opaque predicates in Java bytecode, without garbage bytes of course, can make decompilation more difficult. Therefore, SandMark implements several different algorithms for sprinkling opaque predicates throughout bytecode. For example, SandMark includes an experimental “irreducibility” obfuscation function which is briefly documented as “insert jumps into a method via opaque predicates so that the control flow graph is irreducible. This inhibits decompilation.” Unfortunately, this was not the case with the program DateTime.java shown in Algorithm 31.16, as Jad was still able to decompile DateTime.class without any problems despite the changes made by SandMark’s “irreducibility” obfuscation. The bytes of the unobfuscated and
obfuscated class files were compared to verify that SandMark did make significant changes; perhaps SandMark does work for special cases, so more investigation is likely warranted. In any event, opaque predicates seem to be far more effective when inserted into machine code because of the absence of any type of verifier that validates all machine instructions in a native binary before allowing it to execute.

SandMark’s approach of using control flow obfuscations that leverage opaque predicates in an attempt to confuse a decompiler is not unique because Zelix Klassmaster, a commercial product, implements this approach as well. When Zelix Klassmaster V5.2.3a was given DateTime.class as input with both “aggressive” control flow and “String Encryption” selected, some interesting results were observed in the corresponding Jad decompilation. Algorithm 31.17 lists the Jad decompilation of Zelix Klassmaster’s attempt at obfuscating DateTime.class. Zelix Klassmaster performed the same kind of name obfuscation seen with ProGuard, except it went a little too far and renamed the main method; this was corrected by manually adding an exception for methods named “main” in the tool. The results of the decompilation show that Zelix Klassmaster’s control flow obfuscation and use of opaque predicates are somewhat effective for this particular example because even though Jad was able to decompile most of the logic in DateTime.class, Zelix Klassmaster’s obfuscation caused Jad to lose the value of the constant DATE_TIME_MASK
17:
18: private static final String a;
19: public static boolean b;
20: public static boolean c;
21:
22: static
23: {
24: "‘?X@MA%O\005@@wY\001ZQw\\\016J\024#T\rK\024>N@\013Gy";
25: -1;
26: goto _L1
27: _L5:
28: a;
29: break MISSING_BLOCK_LABEL_116;
30: _L1:
31: JVM INSTR swap ;
32: toCharArray();
33: JVM INSTR dup ;
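As a rough illustration of the string encryption idea discussed in this section (a literal that is stored in encoded form and only turned back into plaintext at run time, as the static initializer in the listing above appears to do with its encoded string), the following Java sketch stores a literal XOR-encoded and decodes it on demand. The class name StringEncoderSketch, the KEY value, and the single-byte XOR scheme are assumptions made purely for illustration; they are not the algorithms actually used by SandMark or Zelix Klassmaster.

```java
// Minimal sketch of "store encoded, decode on demand" string protection.
// Hypothetical names and scheme; not the algorithm of any real obfuscator.
class StringEncoderSketch {

    // Assumed single-character XOR key, chosen only for this example.
    private static final int KEY = 0x5A;

    // Build-time step: an obfuscator would apply this once and place the
    // result in the constant pool instead of the plaintext literal.
    static String encode(String plain) {
        char[] chars = plain.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = (char) (chars[i] ^ KEY);
        }
        return new String(chars);
    }

    // Run-time step: the bundled decoder that the program calls whenever
    // it needs the plaintext. XOR with the same key is its own inverse.
    static String decode(String encoded) {
        return encode(encoded);
    }

    public static void main(String[] args) {
        // Only the encoded form would appear in the class file.
        String stored = encode("Password limit reached");
        // Prints the original literal after decoding it on demand.
        System.out.println(decode(stored));
    }
}
```

Because the decoder ships with the program, a reverser who isolates decode() can recover every literal, which is the same weakness noted earlier for the separately generated Obfuscator.class.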
when using it on line 12, and to generate a large block of static, invalid code starting at line 22. In Sects. 31.8.3 and 31.8.4 a Java antireversing exercise with a complete animated solution is provided. In the solution, decompilation of Java bytecode is prevented through the use of a class encryption obfuscation implemented by SandMark. Issues regarding the use of this obfuscation technique are discussed in the animated solution.

Use Java bytecode antireversing tools such as ProGuard, SandMark, and CafeBabe on the Java version of the Password Vault application to apply the antireversing techniques eliminating symbolic information and obfuscating the program with the goal of making it more difficult to disable the trial limitation. Instead of attempting to implement a custom control flow obfuscation to inhibit static and dynamic analysis as was done in the solution to the machine code antireversing exercise, apply one or more of the control flow obfuscations available in SandMark and observe their impact by decompiling the obfuscated bytecode using Jad. Show that the Java bytecode reversing solution illustrated in the animated tutorial in Sect. 31.5.4 can no longer be carried out as demonstrated.

For instructional purposes, an animated solution to the exercise in Sect. 31.8.4 that demonstrates the use of antireversing tools mentioned throughout Sect. 31.8 to obfuscate the Java Password Vault application was created using Qarbon Viewlet Builder and can be viewed using Macromedia Flash Player. The tutorial begins with the Java Password Vault
Fig. 31.13 Sample slide from the Java antireversing animated tutorial
application, ProGuard, SandMark, Jad, CafeBabe, and Sun’s Java JDK already installed on a Windows® XP machine. Figure 31.13 contains an example slide from the animated solution. The animated solution for the Java bytecode antireversing exercise can be downloaded from http://reversingproject.info/repository.php?fileID=__.

Begin viewing the tutorial by extracting password_vault_java_antireversing_exercise.zip to a local directory and either running password_vault_java_antireversing_exercise.exe, which should launch the standalone version of Macromedia Flash Player, or opening the file password_vault_java_antireversing_exercise_viewlet_swf.html in a Web browser.

31.9 Conclusion

In this chapter we have covered some of the basic concepts related to reverse engineering and protecting Wintel machine code and Java bytecode. Since many similarities exist between the machine instruction sets of different platforms, and Java bytecode can now be generated using other languages, such as Ruby and Groovy, these concepts can be useful in a more general context. Although the consistent theme throughout the exercises was either the disabling or protection of a trial limitation, which was selected for its obvious appeal, many other, less controversial scenarios can be attempted with the base knowledge gleaned from the exercises. Having learned that it is possible to alter the behavior of machine code or bytecode, one could use this knowledge to fix a bug or even add a new function to an application for which the source code is lost. It is no secret that intellectual property is very important [...]

[...] institutions understand their current technology stack and recommend an integration strategy for new technologies. No less important, of course, are software security issues such as being able to determine how the latest virus or worm infects computer systems. The detection of viruses and spyware deeply leverages reverse engineering skills by requiring both live and static analysis of machine code and bytecode and attempting to determine malicious code sequences.

References

31.1. H.A. Müller, J.H. Jahnke, D.B. Smith, M. Storey, S.R. Tilley, K. Wong: Reverse engineering: A roadmap, Proc. Conference on the Future of Software Engineering, Limerick (2000) pp. 47–60
31.2. G. Canfora, M. Di Penta: New Frontiers of Reverse Engineering, Proc. Future of Software Engineering, Minneapolis (2007) pp. 326–341
31.3. M.R. Ali: Why teach reverse engineering?, ACM SIGSOFT SEN 30(4), 1–4 (2005)
31.4. L. Cunningham: COBOL Reborn (Jul. 9, 2008) [Online], available: http://it.toolbox.com/blogs/oracle-guide/cobol-reborn- (last accessed: Jan. 30th, 2009)
31.5. A.V. Deursen, J. Favre, R. Koschke, J. Rilling: Experiences in Teaching Software Evolution and Program Comprehension, Proc. 11th IEEE Int. Workshop on Program Comprehension, Washington, DC (2003) pp. 2834–284
31.6. B.W. Weide, W.D. Heym, J.E. Hollingsworth: Reverse engineering of legacy code exposed, Proc. 17th Int. Conference on Software Engineering, Seattle (1995) pp. 327–331
31.7. E. Eliam: Secrets of Reverse Engineering (Wiley, Indianapolis 2005)
31.8. Wikipedia contributors: Compiler, Wikipedia, The Free Encyclopedia (Sep. 9th, 2008) [Online], available: http://en.wikipedia.org/w/index.php?title=Compiler&oldid= (last accessed: Sep. 14th, 2008)
[...] 15th, 2008)
31.13. Crypto++® Library 5.5.2: Crypto++ Library is a free C++ class library of cryptographic schemes [Online], available: http://www.cryptopp.com (last accessed: Jun. 15th, 2008)
31.14. O. Yuschuk: OllyDbg v1.1: 32-bit assembler level analysing debugger for Microsoft Windows® [Online], available: http://www.ollydbg.de (last accessed: Feb. 8th, 2008)
31.15. Wikipedia contributors: Machine code, Wikipedia, The Free Encyclopedia (Oct. 21st, 2008) [Online], available: http://en.wikipedia.org/w/index.php?title=Machine_code&oldid= (last accessed: Nov. 1st, 2008)
31.16. P. Haggar: Java bytecode: Understanding bytecode makes you a better programmer, developerWorks (Jul. 1st, 2001) [Online], available: http://www.ibm.com/developerworks/ibm/library/it-haggar_bytecode/ (last accessed: Nov. 1st, 2008)
31.17. P. Kouznetsov: Jad v1.5.8g: Jad is a Java decompiler, i.e. program that reads one or more Java class files and converts them into Java source files which can be compiled again [Online], available: http://www.kpdus.com/jad.html (last accessed: Jun. 15th, 2008)
31.18. G.M. Weinberg: The Psychology of Computer Programming (Dorset House Publishing, New York 1998)
31.19. A. Kalinovsky: Covert Java: Techniques for Decompiling, Patching, and Reverse Engineering (Sam’s Publishing, Indianapolis 2004)
31.20. A. Sinkov: Elementary Cryptanalysis: A Mathematical Approach (The Mathematical Association of America, Washington 1980)
31.21. M. Stamp: Information Security: Principles and Practice (Wiley, Hoboken 2006)
31.22. Wikipedia contributors: ROT13, Wikipedia, The Free Encyclopedia (Feb. 9th, 2009) [Online], available: http://en.wikipedia.org/w/index.php?title=ROT&oldid= (last accessed: Feb. 17th, 2009)
31.23. B. Baier: COBF v1.06: the Freeware C/C++ Sourcecode Obfuscator [Online], available: http://home.arcor.de/bernhard.baier/cobf (last accessed: Jun. 16th, 2008)
31.24. T.J. McCabe: A complexity measure, IEEE Trans. Softw. Eng. 2(4), 308–320 (1976) [Online], available: http://www.literateprogramming.com/mccabe.pdf (last accessed: Mar. 2nd, 2009)
31.25. Wikipedia contributors: Levenshtein distance, Wikipedia, The Free Encyclopedia (Sep. 26th, 2008) [Online], available: http://en.wikipedia.org/w/index.php?title=Levenshtein_distance&oldid= (last accessed: Mar. 4th, 2009)
31.26. The University of Arizona, Department of Computer Science: SandMark: A Tool for the Study of Software Protection Algorithms [Online], available: http://sandmark.cs.arizona.edu (last accessed: Mar. 26th, 2008)
31.27. E. Lafortune: ProGuard v4.3: a Free Java bytecode Shrinker, Optimizer, Obfuscator, and Preverifier [Online], available: http://proguard.sourceforge.net (last accessed: Jan. 7th, 2009)
31.28. Retrologic Systems: RetroGuard v2.3.1 for Java Obfuscation [Online], available: http://www.retrologic.com/retroguard-main.html (last accessed: Jan. 7th, 2009)
31.29. Zelix Pty Ltd: Zelix Klassmaster: Java Bytecode Obfuscator [Online], available: http://www.zelix.com/klassmaster/features.html (last accessed: Jan. 25th, 2009)
31.30. A. G. Shvets: CafeBabe v1.2.7.a: Graphical Classfile Disassembler, Editor, Stripper, Migrator, Compactor and Obfuscator [Online], available: http://www.geocities.com/CapeCanaveral/Hall//programs.html (last accessed: Jan. 15th, 2009)
31.31. M.R. Batchelder: Java Bytecode Obfuscation, M.S. Thesis (Dept. Comp Sci., McGill Univ., Montreal 2007) [Online], available: http://digitool.library.mcgill.ca:/webclient/StreamGate?folder_id=&dvs=~ (last accessed: Mar. 3rd, 2009)
DOI 10.1007/978-3-642-04117-4 Date: 16-Oct-2009
The Author
Teodoro “Ted” Cipresso has worked with enterprise software systems for nearly
9 years to create tools that modernize legacy applications and subsystems through
XML and Web enablement of critical assets. Since joining IBM in June 2000, he
has worked on adding integrated XML support to the COBOL and PL/I languages
and currently works on IBM Rational Developer for System z, an Eclipse-based
integrated development environment that makes development of applications and
Web services more approachable by those new to the mainframe. In addition to
working at IBM, Ted is a graduate student of computer science at San Jose State
University and is working towards completing a thesis on software reverse engi-
neering education.