ABSTRACT
Binary function clone search is an essential capability that enables multiple applications and use cases, including reverse engineering, patch security inspection, threat analysis, vulnerable function detection, etc. As such, a surge of interest has been expressed in designing and implementing techniques to address function similarity on binary executables and firmware images. Although existing approaches have merit in fingerprinting function clones, they present limitations when the target binary code has been subjected to significant code transformation resulting from obfuscation, compiler optimization, and/or cross-compilation to multiple CPU architectures. In this regard, we design and implement a system named BinFinder, which employs a neural network to learn binary function embeddings based on a set of extracted features that are resilient to both code obfuscation and compiler optimization techniques. Our experimental evaluation indicates that BinFinder outperforms state-of-the-art approaches for multi-CPU architectures by a large margin, with 46% higher Recall against Gemini, 55% higher Recall against SAFE, and 28% higher Recall against GMN. With respect to obfuscation and compiler optimization clone search approaches, BinFinder outperforms asm2vec (a single-CPU-architecture approach) with 30% higher Recall and BinMatch (a multi-CPU-architecture approach) with 10% higher Recall. Finally, our work is the first to provide noteworthy results with respect to binary clone search over the tigress obfuscator, which is a well-established open-source obfuscator.

CCS CONCEPTS
• Security and privacy → Software reverse engineering.

KEYWORDS
Binary Code Similarity, Feature Evaluation and Selection

ACM Reference Format:
Abdullah Qasem, Mourad Debbabi, Bernard Lebel, and Marthe Kassouf. 2023. Binary Function Clone Search in the Presence of Code Obfuscation and Optimization over Multi-CPU Architectures. In ACM ASIA Conference on Computer and Communications Security (ASIA CCS '23), July 10–14, 2023, Melbourne, VIC, Australia. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3579856.3582818

1 INTRODUCTION
The most prominent techniques that address function similarity on binary executables and firmware images leverage machine learning [4, 5], deep learning [3, 18, 27], graph theory [4], etc. While these techniques have established merit in fingerprinting binary function clones, they nevertheless present limitations when it comes to code transformation techniques and multiple CPU architectures. The reason underlying this performance degradation is twofold: first, most state-of-the-art techniques rely on features that are not resilient to code transformation; second, the underlying models are limited in fully grasping function semantics. Hence, there is still a pressing need for a more accurate binary clone search system in the presence of advanced code transformations. Nowadays, software companies implement such protections to impede reverse-engineering attempts and to protect against trade secret (intellectual property) theft.

asm2vec [3] is the first attempt to address binary function clone search in the presence of code obfuscation, but only over a single CPU architecture (x86). It also performs poorly over O0-vs-O3 compiler optimization and over different O-LLVM obfuscation techniques, as reported in [7, 9].

BinMatch [7] addresses code obfuscation over multiple architectures, but only over O-LLVM obfuscation techniques, and shows very low accuracy against the Bogus Control Flow (BCF) and Control Flow Flattening (FLA) obfuscation techniques. There are several sophisticated obfuscation techniques supported by open-source obfuscation tools (e.g., tigress [25]) that have not yet been studied in the context of binary function clone search. To address the previously mentioned limitations, we propose a new system called BinFinder: a multi-architecture binary function clone search system that operates in the presence of code obfuscation techniques and compiler optimization levels. BinFinder uses an end-to-end learning model composed of a customized multi-layer perceptron neural network within a Siamese neural network to learn binary function representations. The model is trained on a set of manually engineered interpretable features selected at the binary function level. These features are easy to extract, robust, CPU architecture independent, and resilient to both compiler optimization and code obfuscation techniques. We upload the source code, dataset, and experiment results to this repository1 for evaluation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ASIA CCS '23, July 10–14, 2023, Melbourne, VIC, Australia
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0098-9/23/07. . . $15.00
https://doi.org/10.1145/3579856.3582818

1 http://bit.ly/3GBHl9f
ASIA CCS ’23, July 10–14, 2023, Melbourne, VIC, Australia Abdullah Qasem, Mourad Debbabi, Bernard Lebel, and Marthe Kassouf
The contributions of this work are as follows:

• We identify a set of engineered interpretable binary function features that are resilient to both code optimization and obfuscation techniques on multi-CPU architectures.
• We design a Siamese neural network architecture to train a corresponding neural network model using our newly proposed features to generate binary function embeddings for similarity detection.
• We conduct an extensive evaluation of BinFinder over three scenarios: (1) a single CPU architecture (x86) in the presence of different compiler optimization levels and code obfuscations; (2) multi-CPU architectures where different compiler optimizations are applied; (3) multi-CPU architectures in the presence of different compiler optimization levels and code obfuscations.
• We demonstrate that the overall performance is maintained (with small fluctuations) by performing additional experiments to stress test our approach with respect to several conditions, namely: compiler choice, compiler optimization levels, code obfuscation techniques, targeted CPU architectures, and considered packages (libraries).

2 QUEST FOR RESILIENT FEATURES
Identifying resilient features is an essential step in building an efficient machine learning model. Table 1 lists potential extracted numerical features as well as features widely used in the literature. After an in-depth analysis of the features that survive code transformation, we identify the following resilient features:

• num_callers: the count of binary functions that call the targeted function.
• num_libc_callees: the count of libc functions, such as strlen, memcpy, socket, etc., called by the targeted function.
• num_callees: the count of all functions that are called by the targeted function, including libc calls, where some functions may be called more than once.
• num_unique_callees: the number of unique functions that are called by the targeted function.

To assess the resiliency of each extracted feature, outlined in Table 1, in the presence of compiler optimization or code obfuscation, we calculate the empirical distribution induced by the absolute difference between the targeted feature values (extracted from the original binary functions, compiled with optimization O0) and their related similar functions (compiled with other optimizations or code obfuscations) in our created Dataset-III, outlined in Section 3.4. For example, we calculate the absolute difference between every two similar functions, (f_i, O0, feature=num_libc_callees, compiler=gcc, architecture=x86) and (f_j, SUB, feature=num_libc_callees, compiler=clang, architecture=ARM), as an input to the empirical distribution function. Finally, we use the resulting P(0) and diff_mean as metrics to decide whether the targeted feature is resilient to obfuscation and optimization over multi-CPU architectures. P(0) is the probability that a pair of similar binary functions has the same targeted feature value (i.e., their absolute difference is zero). A high P(0) value means that the targeted feature is resilient to either compiler optimizations or code obfuscations over multi-CPU architectures. On the other hand, the absolute difference mean (diff_mean) indicates to what extent the selected feature value is affected by the targeted compiler optimization or code obfuscation compared to the same feature value extracted from the original similar binary function. Small diff_mean values indicate that the targeted feature is not much affected by either optimizations or obfuscations. In essence, a good candidate feature should have a high P(0) probability and a low absolute difference mean (diff_mean).

Table 1 presents the calculated values of both P(0) and diff_mean across the numerical features extracted from our Dataset-III, which includes several packages cross-compiled to two different CPU architectures (x86, ARM) using different compiler optimizations and code obfuscations, as detailed in Section 3.4. From the table, we can see that num_callers, num_libc_callees, num_callees, and num_unique_callees have very high P(0) values along with very low absolute difference means, which indicates that these features are resilient to code obfuscation and compiler optimization. We can also observe that, on average, 75% of similar binary functions in our dataset have the exact same num_callers value. The remaining 25% of similar binary functions have only a small num_callers difference, with an average diff_mean of 0.6225.
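As a minimal illustration of this procedure (our own sketch, not the authors' code; the feature values below are made up), P(0) and diff_mean for one feature can be computed from pairs of values extracted from similar functions:

```python
def resilience_metrics(pairs):
    """P(0): fraction of similar-function pairs whose feature values match
    exactly; diff_mean: mean absolute difference across all pairs."""
    diffs = [abs(original - transformed) for original, transformed in pairs]
    p0 = sum(1 for d in diffs if d == 0) / len(diffs)
    diff_mean = sum(diffs) / len(diffs)
    return p0, diff_mean

# Hypothetical num_callers values for (O0 original, obfuscated clone) pairs.
pairs = [(3, 3), (5, 5), (2, 4), (7, 7)]
p0, diff_mean = resilience_metrics(pairs)  # p0 = 0.75, diff_mean = 0.5
```

Under this scheme, a feature with high P(0) and low diff_mean across the dataset is kept as a candidate, while one with P(0) below 0.5 and a large diff_mean is discarded.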
Moreover, 91% of similar binary functions have the same num_libc_callees value, while the remaining 9% have only a small difference, with an average diff_mean of 0.18. The feature num_callees has P(0) values greater than 0.6 across the different code optimizations and obfuscations, except over the BCF obfuscation technique, where its P(0) = 0.4. One reason for this behaviour is that BCF changes the CFGs by introducing several new unrelated basic blocks, consisting of function calls, that will never be executed. However, the remaining similar binary functions' num_callees values show only small differences (diff_mean=2.042) compared to their original similar ones. Moreover, we investigated the lists of binary function names called by a targeted function (callees). We find that some functions are called more than once by the targeted functions. Hence, we consider using num_unique_callees as an additional feature. From Table 1, we observe that num_unique_callees P(0) values are greater than 0.7. As per the aforementioned analysis, we highlight that our newly introduced numerical features are excellent candidates.

The remaining extracted features, such as num_Cmp, num_Logic, num_Arithm, and num_constants, have P(0) values below 0.5, and their related average diff_mean is very high. Therefore, they are not good candidate features, and we do not include them in our approach. In addition to the new numerical features outlined in Table 1, we extract the list of libcCalls and the list of Constants from each binary function in our dataset. This is motivated by the fact that we encountered several instances where two dissimilar functions have the same num_libc_callees or num_callers values but do not have the same list of libcCalls or Constants. Moreover, common reverse engineering tools, such as the IDA Pro disassembler, which we use, maintain databases of function signatures named FLIRT [6] to recognize standard functions such as those included at link time.

Regarding the list of callers and the list of callees features, we considered only their length, since we cannot get the exact function caller or callee names when the targeted binaries are stripped; instead, we get a random function name created during disassembly. However, there are a few functions, less than 5% of our collected Dataset-III and generally small ones, that have neither Constants nor libcCalls. Consequently, several dissimilar functions appear similar. To address the issue with these small functions, we decided to utilize their generated assembly functions; however, every CPU architecture has its own assembly language instructions. Therefore, we resort to lifting multi-CPU assembly instructions into an intermediate representation, VEX-IR, using the angr framework [24]. Theoretically, other intermediate representations could be applied in the context of our work, e.g., Valgrind [19]. Besides, code obfuscations and compiler optimizations dramatically affect the number of generated assembly instructions per function. To address this issue, we only consider the unique normalized VEX-IR instructions of each function. To calculate the P(0) corresponding to unique_vex_Instructions, we use the Jaccard distance to compute the difference. From Table 1, we observe that the P(0) of unique_vex_Instructions is small (0.2 on average). Also, its diff_mean is very small (0.15 on average). Our aforementioned thorough analysis indicates that the following features are resilient to code transformation: the list of libcCalls, the list of Constants, num_callees, num_callers, num_libc_callees, num_unique_callees, and the list of unique VEX instructions.

3 BINFINDER APPROACH
In this section, we elaborate all the steps required to design, implement, and test our proposed binary function clone search approach, named BinFinder. The process, as depicted in Figure 1, is divided into four steps: (1) data collection and generation, (2) feature selection and representation, (3) model learning, and (4) query and results. In Step (1), we collect the source code of several software packages along with their reported vulnerabilities in the NIST CVE database2, as outlined in Section 3.4. In Step (2), we disassemble every binary function in our repository to extract and preprocess the selected features required by our proposed neural network model. In Step (3), we train and test our proposed end-to-end Siamese neural network to build an efficient binary function embedding model, required to generate an embedding for every binary function in our repository. In Step (4), given a new binary function f_q (e.g., a newly discovered vulnerability), we initially extract its related features as in Step (2). Then, we generate its embedding using our trained model obtained in learning Step (3). Finally, we compare the generated embedding for the given function f_q against the other binary function embeddings stored in our repository. We use pairwise cosine distance as a measure to retrieve the top-k candidate functions based on the highest cosine similarity scores.

2 https://nvd.nist.gov/vuln/search
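The query step (4) can be sketched as follows — an illustrative ranking by pairwise cosine similarity over toy embeddings, not the authors' implementation (function names and dimensions are made up; the real embeddings are 100-dimensional):

```python
import math

def cosine_similarity(u, v):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_embedding, repository, k=1):
    """Rank repository embeddings by cosine similarity to the query
    and return the names of the k closest binary functions."""
    ranked = sorted(repository,
                    key=lambda name: cosine_similarity(query_embedding,
                                                       repository[name]),
                    reverse=True)
    return ranked[:k]

# Toy 3-dimensional embeddings standing in for the learned ones.
repository = {
    "memcpy_O0": [1.0, 0.0, 0.0],
    "strlen_O0": [0.0, 1.0, 0.0],
    "socket_O0": [0.7, 0.7, 0.0],
}
candidates = top_k([0.9, 0.1, 0.0], repository, k=1)  # ["memcpy_O0"]
```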
3.1 Preprocessing Selected Features
We have two types of features: numerical features and lists of literals. As such, we need to preprocess the selected features for the training and testing processes. The feature libcCalls is a list of function names. The feature Constants is a list of integer numbers. The feature Unique_vex_instructions is a list of normalized VEX-IR keywords: we replace every VEX-IR register name with REG, variable name with TMP, number with CONST, function name with foo, and memory reference with MEM. The remaining four features are numerical integers: the number of callers, the number of callees, the number of unique callees, and the number of libc calls. As such, we need to represent these numerical features as lists of literals to ensure that their representations have the same distribution as the representations of libcCalls, Constants, and unique VEX instructions. In our training experiments, we observe that our model produces less accurate results when the values of its input features have different distributions: the trained neural network model gets confused and assumes that numerical features are more important than literal features. Thus, machine learning practitioners recommend feature preprocessing and normalization. To address this issue, we consider all sequential numbers from one up to the number of callers, number of callees, or number of libcCalls. For example, suppose that the feature num_callees has the value 5; then we represent this feature as '1 2 3 4 5'. This way, the num_callees value is similar to the list of extracted Constants. Then, we treat the resulting list of integer numbers as a list of literals, similar to libcCalls. However, when the num_callers, num_unique_callees, and num_libc_callees feature values are very small, these features will not have a significant impact in terms of deciding the highest similarity, especially when many constants or unique VEX-IR instructions are shared across dissimilar functions. To fix this issue, we assign weights to these features by multiplying the extracted num_callers, num_unique_callees, and num_libc_callees values by 5. We also evaluated multiplying these numbers by 3, 4, 10, 15, and 20. Choosing the value of 5 yields the highest AUC and results in smaller vector sizes.

3.2 Feature Representation
A common representation for a list of keywords is the one-hot vector, which is suitable when there is no ordinal relationship between the list elements. We assign a dedicated tokenizer to each selected feature and end up with seven tokenizers, as shown in Figure 2. For example, we have one tokenizer that receives VEX-IR instructions and produces a one-hot vector of size 266: in our collected Dataset-III, there are 266 normalized unique VEX-IR instructions resulting from lifting the executable binary files cross-compiled to both ARM and x86 architectures. The tokenizer assigned to the libcCalls feature produces a one-hot vector of size 181, as there are 181 unique libcCalls in our collected Dataset-III. The tokenizer assigned to Constants produces a one-hot vector of size 1000; in our analysis, larger numbers are most likely memory addresses. Regarding the num_callers feature, the largest value in our collected Dataset-III is 94; thus, when multiplied by 5, its related vector size is 470. Finally, all the tokenizers' outputs are concatenated to produce a single vector for each binary function.

In our implementation, we utilize the Keras text Tokenizer in Python to preprocess and represent each selected feature as a one-hot vector. The Keras text Tokenizer creates a dictionary consisting of all n unique keywords in a given group of samples. It has four modes to represent a document composed of m words: binary, count, freq, and tfidf. In our experiments, binary mode yields the highest accuracy because it ensures that all feature values lie within the same distribution {0, 1}. In our implementation, if a new system call or instruction is introduced, the related tokenizer will ignore it.

3.3 Siamese Neural Network Architecture
In the literature, binary function fingerprinting is formalized as a similarity problem [27]. We cannot determine the final number of binary functions in advance; therefore, binary function similarity cannot be addressed with traditional classification techniques. To solve such a problem, we need an end-to-end ML technique such as a Siamese neural network.
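The preprocessing and representation scheme of Sections 3.1 and 3.2 can be sketched as follows — a simplified stand-in for the Keras Tokenizer pipeline (the helper names, the toy vocabulary, and the feature values are ours, not the paper's):

```python
def expand_count(n, weight=1):
    """Represent a weighted numeric feature as the literal sequence
    '1', '2', ..., str(n * weight), so it has the same form as the
    list-of-literals features (libcCalls, Constants)."""
    return [str(i) for i in range(1, n * weight + 1)]

def binary_one_hot(tokens, vocab):
    """Binary mode: 1 if a vocabulary token occurs in the sample, else 0.
    Tokens outside the vocabulary are ignored."""
    present = set(tokens)
    return [1 if tok in present else 0 for tok in vocab]

# num_callers = 2, weighted by 5 -> literals '1' .. '10'.
tokens = expand_count(2, weight=5)
vocab = [str(i) for i in range(1, 16)]   # toy vocabulary of size 15
vector = binary_one_hot(tokens, vocab)   # ten 1s followed by five 0s
```

In the full pipeline, one such vector per tokenizer (seven in total) would be concatenated into the single input vector for each binary function.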
Our Siamese neural network architecture is depicted in Figure 3. It is composed of two identical three-layered multi-layer perceptron neural networks [2]. Our designed NN is suitable for our problem: it takes into account our selected features and their newly proposed representation. We experimented with various potential hyper-parameters, including the number of layers, layer sizes, and activation functions, and our final design choice is based on the highest reported AUC. Therefore, we choose the following internal design settings. Each multi-layer perceptron neural network consists of three layers, excluding the input layer: a first layer of size 2048 with ReLU activation, a second layer of size 512 with ReLU activation, and a third layer (an embedding layer) of size 100 without an activation function. Each multi-layer perceptron neural network receives the selected binary function feature representations, as outlined in Section 3.2, to learn and produce the related function embedding at layer three. The Siamese neural network receives the embeddings resulting from its two inner neural networks as input and produces a cosine distance as output. The two multi-layer perceptron neural networks share the same parameters to remain identical during training. They are jointly and iteratively optimized using the following loss function with stochastic gradient descent:

    min Σ_{i=1}^{n} (cos(emd_1 · emd_2 / (||emd_1|| ||emd_2||)) − y_i)^2    (1)

Given n pairs of extracted binary function features (f_1, f_2), each pair is assigned a label y_i = +1 when the functions are similar, and y_i = −1 otherwise. With this loss function, we want to ensure that the embedding of a specific binary function is closer to the embeddings of all its similar binary functions.

3.4 Dataset
In a real-life scenario, the same source code package may be available in different versions and compiled with different optimization options or code obfuscation techniques. We collected seven popular open-source packages to train and test our approach, namely: glibc (2.11 - 2.25), gmp (6.1.0, 6.1.1), gnuBinutils (2.28, 2.29), libcurl (7.32.0 - 7.50.2), openssl (1.0.2s, 1.1.1a), ImageMagic, and zlib (1.2.7.1). For further evaluation, we collected extra packages, as we outline later. We instruct each compiler to emit debug symbols to facilitate building the ground-truth mapping between similar functions for training purposes. For the analysis throughout this paper, we disable inlining and generate the following datasets:

Dataset-I: we instrument (gcc, clang) and O-LLVM to build each package's source code for x86, each time with one of the options O0, O1, O2, O3, FLA, SUB, and BCF. In total, Dataset-I consists of 116,508 binary functions.

Dataset-II: we instrument gcc and clang to compile each package for two different CPU architectures (ARM and x86), each time with one of the optimization options O0, O1, O2, O3. In total, Dataset-II consists of 157,673 binary functions.

Dataset-III: we extend Dataset-I by compiling the binaries for both ARM and x86 CPU architectures. We also instrumented O-LLVM to generate all possible combinations of the four optimization options with the three available obfuscators. In total, Dataset-III consists of 284,491 binary functions.

Dataset-IV: to build this dataset, we configure tigress to obfuscate three packages, namely OpenSSL, Zlib, and Coreutils. tigress receives only a single C/C++ file; to obfuscate a package with tigress, one needs to iterate through every .c file and supply all its required imported library locations and compilation options. To do so, we extract the required information for each package from its related Makefile. Afterwards, we obfuscate all functions within the selected packages using five different obfuscation techniques (the Appendix lists the commands used). Finally, we compile the resulting obfuscated files with gcc for the x86 CPU architecture. In total, Dataset-IV consists of 60,395 binary functions.

Dataset-V: we use an available online dataset composed of 49 packages [11]. Each package is compiled using clang and the O-LLVM obfuscator, each time using O0, O1, O2, O3, FLA, SUB, BCF. In total, this dataset has 164,700 optimized and obfuscated binary functions for the x86 CPU architecture.

For all generated datasets, we use angr [24] to extract VEX-IR instructions and IDA Pro 6.8 [8] to extract the remaining numerical features from each binary function.

3.5 Training
We train our proposed Siamese neural network on Dataset-I and Dataset-III, which are split into training, validation, and testing sets. The training set accounts for 80% of each dataset, while 10% is allocated for validation and testing. We ensure that the packages chosen for training are part of neither validation nor testing, and vice versa, to avoid overfitting and to ensure that the test results accurately portray the generalization capability. Table 3 summarizes the number of functions in each phase. The neural network is implemented using TensorFlow in Python. The Adam optimizer with a learning rate of 0.0001 is employed for training over 100 epochs. Each epoch involves creating two pairs for each binary function in the training set: one with a randomly selected similar function and a label of +1, and the other with a randomly chosen dissimilar function and a label of -1. The training dataset is shuffled and divided into mini-batches of 500 similar and 500 dissimilar pairs. AUC is used to evaluate the network's performance on the validation set, and the model with the highest accuracy is saved. Training time for BinFinder on Dataset-I averages 2 minutes per epoch, reaching a best accuracy of 98% in 30 iterations. Training on Dataset-III takes an average of 7 minutes per epoch, achieving a best accuracy of 97% in 26 iterations. The training and evaluation are conducted on a server equipped with an Intel(R) Xeon(R) CPU E5-2630v3 running at 2.40GHz, 300 GB of memory, and 8 NVIDIA TITAN GPU cards.

4 EVALUATION
In this section, we evaluate our proposed approach. We introduce our evaluation measures in Section 4.1. Then, we present our evaluation results with respect to code obfuscation in Section 4.2 and compiler optimization in Section 4.3.

4.1 Evaluation Measures
When the source code is not available, binary function clone search is addressed in a similar manner to the Information Retrieval (IR) problem.
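The training objective of Equation (1) — the squared difference between the pairwise cosine similarity of the two inner networks' embeddings and the ±1 similarity label — can be written directly in NumPy (a sketch of the loss only, not the authors' TensorFlow training loop; the embeddings below are toy 2-dimensional vectors):

```python
import numpy as np

def siamese_loss(emb1, emb2, labels):
    """Sum of squared differences between the pairwise cosine similarity
    of two embedding batches and their +1/-1 similarity labels."""
    cos = np.sum(emb1 * emb2, axis=1) / (
        np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1))
    return float(np.sum((cos - labels) ** 2))

e1 = np.array([[1.0, 0.0], [0.0, 1.0]])
e2 = np.array([[1.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, -1.0])        # first pair similar, second dissimilar
loss = siamese_loss(e1, e2, y)   # (1 - 1)^2 + (0 - (-1))^2 = 1.0
```

Minimizing this loss pushes the cosine similarity of similar pairs toward +1 and that of dissimilar pairs toward −1, which is what makes cosine-based top-k retrieval meaningful at query time.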
            gmp        OpenSSL    zlib       ImageMagic  Binutils   Coreutils  Findutils  Plotutils  Inetutils  Avg
            M1    M2   M1    M2   M1    M2   M1     M2    M1    M2   M1    M2   M1    M2   M1    M2   M1    M2   M1    M2
O0 vs BCF   0.79  0.84 0.71  0.75 0.87  0.94 -      -     0.8   0.44 0.55  0.57 0.61  0.68 0.78  0.55 0.54  0.59 0.71  0.67
O0 vs FLA   0.92  0.84 0.77  0.72 0.92  0.92 0.8    0.79  0.86  0.86 0.93  0.92 0.84  0.85 1     0.78 0.93  0.94 0.88  0.85
O0 vs SUB   0.93  0.92 0.92  0.88 0.91  0.95 0.85   0.84  0.9   0.89 0.97  0.98 0.93  0.93 0.97  0.87 0.95  0.95 0.92  0.91
Avg         0.88  0.87 0.8   0.78 0.9   0.94 0.825  0.818 0.84  0.66 0.75  0.76 0.75  0.78 0.88  0.69 0.74  0.77 0.84  0.81

Table 2: Impact of obfuscation on BinFinder using precision at top-1

           Training  Validation  Testing  Total
x86        83194     12619       20695    116508
x86 + ARM  209216    36376       38899    284491

Table 3: Dataset-I and Dataset-III split details

Hence, we evaluate BinFinder using Precision and Recall measures from an information retrieval perspective, since AUC is inappropriate in this context: in almost all circumstances, the dataset is extremely skewed, with typically over 99.9% of the binary functions falling in the dissimilar category [15]. The measures that we use are detailed hereafter.

Precision (P) is the fraction of similar functions among the total number of retrieved functions [15]:

    Precision = #Similar Functions Retrieved / #Retrieved Functions    (2)

Recall (R) is the fraction of similar functions retrieved out of all similar functions in the repository [15]:

    Recall = #Similar Functions Retrieved / #Relevant Functions    (3)

The normalized Discounted Cumulative Gain (nDCG) is a measure between 0 and 1:

    nDCG = ( Σ_{i=1}^{k} isSimilar(f_i, q) / log(1 + i) ) / OptimalDCG_k    (4)

where isSimilar(f_i, q) is 1 if f_i is a function similar to q and 0 otherwise, and OptimalDCG_k is the Discounted Cumulative Gain of the optimal query answering. This measure gives a high value to results where similar functions appear in the first positions of the retrieved functions.

4.2 Code Obfuscation
In this section, we evaluate BinFinder in the presence of different code obfuscation techniques implemented by O-LLVM and tigress. An overview of the targeted obfuscation techniques is given in Section A of the Appendix.

4.2.1 O-LLVM Obfuscator. For this evaluation, we train two models. The first model (M1) is trained and tested over binary functions compiled only with different optimization levels in Dataset-I. We then use the resulting model to generate the embeddings for the obfuscated binary functions generated by O-LLVM. This setup aims at evaluating BinFinder's efficiency against binary functions obfuscated using O-LLVM that the model did not encounter during training. The second model (M2) is trained and tested over a mix of optimized and obfuscated binary function samples from Dataset-I, to give our model the opportunity to recognize optimized and obfuscated functions at the same time. For both scenarios, we query every original binary function in Dataset-I and a few selected packages from Dataset-V (due to limited table space) against its similar obfuscated ones, and vice versa. Given a binary function q_i and its embedding resulting from the configuration (clang, O0, x86), we query for its similar binary function embeddings resulting from the configurations (clang, SUB, x86), (clang, FLA, x86), and (clang, BCF, x86) at top-1, using pairwise cosine distance. Based on Table 2, we see that the precision at top-1 fluctuates at the package level across the different code obfuscation techniques. It is worth mentioning that the precision at top-1 is equal to the recall at top-1. For example, the precision at top-1 for the gmp library is different from the precision at top-1 for OpenSSL. Nevertheless, we can see that BinFinder achieves its best result over SUB, with a precision of 93% for M1 and 91% for M2. SUB modifies instruction sequences by adding more instructions in between, but it has a minimal effect on the selected features of BinFinder, such as libcCalls, num_callees, and num_callers. In addition, Table 2 shows that M1 and M2 achieve their lowest accuracy over BCF, with precisions of 71% and 67%, respectively. The reason is that BCF introduces the largest amount of modification in our selected feature num_libc_callees, whose underlying P(0) is 4% lower. Another interesting observation from the table is that M1 usually achieves better accuracy than M2 over both FLA and SUB, for all packages. This is because M1 employs our proposed features without any noise introduced by O-LLVM obfuscation techniques. Overall, on average, both models achieve close precision values, i.e., 84% and 80%, respectively. In the end, the observations mentioned earlier show that BinFinder can identify, with high precision, obfuscated binary functions produced by O-LLVM without needing to see any prior obfuscated samples. Therefore, BinFinder is resilient to the addition of "junk code".

4.2.2 tigress Obfuscator. For this evaluation, we investigate how BinFinder performs against unseen advanced obfuscation techniques. We use the (M1) model detailed earlier to generate the embeddings for all binary functions obfuscated by tigress in Dataset-IV. Afterwards, we search for every original binary function (compiled with O0) to find its similar counterpart (obfuscated by tigress) in Dataset-IV. We summarize the results in Table 4.

Add Opaque: BinFinder achieves its highest Recall over zlib (89%) and 40% over OpenSSL, while achieving a low Recall over Coreutils (12%). To get more insight into our results, we manually inspect our selected features over the obfuscated functions. We find that functions obfuscated by Add Opaque have more callees and unique_vex instructions than their similar non-obfuscated ones. For example, in the case of Coreutils, we observe that the P(0) of num_unique_callees is 0.13 and the P(0) of unique_vex is 0.1. We see that our selected features (unique_vex, num_unique_callees) are
Binary Function Clone Search in the Presence of Code Obfuscation and Optimization over Multi-CPU Architectures ASIA CCS ’23, July 10–14, 2023, Melbourne, VIC, Australia
                    openssl                  zlib                     Coreutils                Avg
                    top-1       top-10       top-1       top-10       top-1       top-10       top-1       top-10
                    O0    O3    O0    O3     O0    O3    O0    O3     O0    O3    O0    O3     O0    O3    O0    O3
Add Opaque          0.4   0.58  0.56  0.77   0.89  0.44  1     0.59   0.13  0.17  0.2   0.27   0.47  0.4   0.59  0.54
EncodeArithmetic    0.42  0.52  0.772 0.78   0.54  0.35  0.84  0.76   0.71  0.43  0.91  0.81   0.56  0.43  0.84  0.78
EncodeLiterals      0.65  0.75  0.85  0.9    0.85  0.84  1     0.98   0.91  0.72  0.99  0.95   0.8   0.77  0.95  0.94
Flatten             0.08  0.45  0.14  0.63   0.19  0.75  0.43  0.96   0.21  0.66  0.29  0.84   0.16  0.62  0.29  0.81
Virtualization      0.18  0.12  0.33  0.17   0.3   0.04  0.55  0.33   0.33  0.25  0.56  0.4    0.27  0.14  0.48  0.3
Avg                 0.346 0.48  0.53  0.65   0.55  0.48  0.76  0.72   0.45  0.45  0.59  0.65   0.45  0.47  0.63  0.68
Table 4: Clone search between original binary functions and ones obfuscated by tigress
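The Recall and nDCG measures of Equations (3) and (4) can be sketched as follows (a minimal illustration; the ranked list and relevant set are toy stand-ins for BinFinder's cosine-ranked retrieval results):

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Eq. (3): fraction of the relevant functions found in the top-k results."""
    hits = sum(1 for f in ranked_ids[:k] if f in relevant_ids)
    return hits / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Eq. (4): DCG of the top-k results normalized by the optimal DCG,
    where isSimilar(f, q) is 1 for relevant functions and 0 otherwise."""
    dcg = sum(1.0 / math.log(1 + i)
              for i, f in enumerate(ranked_ids[:k], start=1)
              if f in relevant_ids)
    # The optimal answering places all relevant functions first.
    ideal_hits = min(k, len(relevant_ids))
    optimal = sum(1.0 / math.log(1 + i) for i in range(1, ideal_hits + 1))
    return dcg / optimal if optimal else 0.0

ranked = ["f3", "f7", "f1", "f9"]   # retrieved functions, best first
relevant = {"f3", "f1"}             # functions actually similar to the query
print(recall_at_k(ranked, relevant, 2))   # 0.5: one of two relevant in top-2
print(ndcg_at_k(ranked, relevant, 4))
```

Because the relevant hit at rank 1 outweighs the one at rank 3, the nDCG here stays above 0.9 even though one relevant function is ranked late.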
We see that our selected features (unique_vex, num_unique_callees) are less resilient in the case of Add Opaque. Based on these observations, we can state that "junk calls" and instruction insertions weaken our Recall. Indeed, our model is sensitive to dissimilar functions that have closely similar features. Overall, BinFinder still achieves an average of 47% Recall at top-1 and 59% Recall at top-10.

Encode Arithmetic: BinFinder reports the highest Recall over Coreutils with 71%. Yet, it reports lower Recall over zlib and OpenSSL with 54% and 42%, respectively. These packages perform many mathematical operations, where more complex mathematical instructions could be added. From Table 5, we can observe that P(0) of unique_vex is 0.33, which indicates that several new instructions have been added by tigress. Overall, BinFinder still achieves an average of 56% Recall at top-1 and 84% Recall at top-10.

Encode Literals: This technique has the lowest impact on our selected features; it only replaces strings and integers. As seen in Table 5, all P(0) values calculated for OpenSSL are more than 0.7 for O0. On average, BinFinder still achieves 81% Recall at top-1 and 95% Recall at top-10.

Flatten: BinFinder reports the lowest performance against the Flatten technique. Flatten heavily affects almost all of our selected features, including unique_vex, constantsList, num_unique_callees, and num_unique_callers; these features' P(0) values are less than 0.5 on average. The num_libc_callees is the only feature resilient against Flatten; its P(0) equals 0.88. In our inspection, we find that functions obfuscated by Flatten have no callees: Flatten uses an indirect call mechanism to call the targeted functions. On average, BinFinder achieves 16% Recall at top-1 and 28% Recall at top-10.

Virtualization: This technique heavily affects our extracted features, including unique_Vex, constantsList, and num_unique_callees. As a result, their P(0) values are less than 0.5. However, Virtualization does not affect num_callers and LibcCalls; the P(0) values of these two features are greater than 0.8 over the three examined packages. On average, BinFinder achieves 27% Recall at top-1 and 48% Recall at top-10.

Compiler optimization effects on tigress. Based on our understanding of the behavior of tigress obfuscation techniques, they could add dead code, substitute mathematical operations with equivalent but more complex ones, etc. Compiler optimizations generally eliminate dead code such as unreachable instructions or unused functions. Therefore, in this section, our goal is to understand the impact of compiler optimizations on tigress obfuscation techniques.

To achieve our goal, we take the obfuscated source files generated by tigress and compile them with the highest compiler optimization level (O3) using gcc. We use the (M1) model explained earlier to generate the embeddings for all binary functions obfuscated by tigress. Then, we search for every original binary function (compiled with O3) to find its similar counterpart (obfuscated by tigress) in our repository, and we report the performance at top-1 and top-10.

Add Opaque: From Table 4, we can see that BinFinder achieves 58% Recall at top-1 and 77% Recall at top-10 over OpenSSL. Noticeably, the optimization O3 considerably reduces the impact of the Add Opaque technique on our approach. As a result, the Recall improved from 40% to 58% over OpenSSL. Using manual analysis, we find that the number of inserted instructions and "junk" callees is reduced. From Table 5, we can see that with respect to O0, P(0) of num_callees is 0.56; in contrast, for O3 it is 0.82. The optimization level O3 removes unnecessary function calls introduced by tigress.

Flatten: Table 4 shows that BinFinder performance increases from 16% to 62% on average. We find that, in contrast to O0, the value of num_callees for obfuscated functions over O3 is not persistently zero. Also, the value of P(0) for unique_Vex improved from 0.09 over O0 to 0.73 over O3, as outlined in Table 5.

The same holds for O3 against Encode Arithmetic and Encode Literals. We can see, in the case of OpenSSL, that BinFinder's Recall performance has increased by more than 10%, as shown in Table 4. For example, against Encode Arithmetic, BinFinder achieves 52% Recall at top-1 and 77% Recall at top-10. Moreover, over Encode Literals, BinFinder achieves 75% Recall at top-1 and 90% Recall at top-10. From our analysis, we conclude that, in some situations, O3 mitigates to some extent the effects of certain tigress code obfuscations (e.g., Flatten, Add Opaque). Conversely, in the case of zlib and Coreutils, we find that O3 optimization hardens some tigress obfuscations, namely Encode Arithmetic and Encode Literals, which results in a drop in the achieved Recall by approximately 20%.
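The per-feature P(0) statistics cited throughout this section can be sketched as follows, assuming (as in the paper's earlier feature analysis) that P(0) denotes the fraction of matched original/transformed function pairs whose feature value is unchanged; the feature values below are hypothetical:

```python
def p_zero(pairs):
    """Fraction of (original, transformed) value pairs with zero difference.
    A P(0) close to 1 means the feature is resilient to the transformation."""
    unchanged = sum(1 for a, b in pairs if a == b)
    return unchanged / len(pairs)

# Hypothetical feature values for three functions, before/after obfuscation.
features = {
    "num_libc_callees":   [(4, 4), (2, 2), (7, 7)],   # untouched -> P(0) = 1.0
    "num_unique_callees": [(3, 9), (1, 1), (5, 12)],  # junk callees inserted
}
for name, pairs in features.items():
    print(name, p_zero(pairs))
```

A transformation that inserts junk calls drives num_unique_callees apart while leaving num_libc_callees intact, which is exactly the resilience pattern the Flatten discussion above describes.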
ASIA CCS ’23, July 10–14, 2023, Melbourne, VIC, Australia Abdullah Qasem, Mourad Debbabi, Bernard Lebel, and Marthe Kassouf
            gmp          gnuBinutils  libcurl      openssl      zlib         Average       #Instructions P(0)   #Nodes P(0)
            gcc   clang  gcc   clang  gcc   clang  gcc   clang  gcc   clang  gcc   clang   gcc    clang         gcc   clang
O0 vs O1    0.712 0.809  0.718 NA     0.558 NA     0.803 0.901  0.819 0.823  0.722 0.844   0.03   0.23          0.36  0.44
O0 vs O2    0.710 0.723  0.704 NA     0.533 NA     0.792 0.877  0.834 0.603  0.715 0.734   0.041  0.24          0.38  0.46
O0 vs O3    0.716 0.730  0.952 NA     0.457 NA     0.776 0.881  0.648 0.559  0.710 0.723   0.046  0.244         0.38  0.46
O1 vs O2    0.924 0.887  0.948 NA     0.750 NA     0.847 0.898  0.953 0.757  0.885 0.848   0.049  0.365         0.44  0.53
O1 vs O3    0.863 0.876  0.970 NA     0.617 NA     0.825 0.893  0.688 0.820  0.792 0.863   0.047  0.364         0.45  0.53
O2 vs O3    0.904 0.974  1     NA     0.718 NA     0.847 0.921  0.775 0.964  0.849 0.953   0.259  0.492         0.50  0.55
Average     0.805 0.833  0.882 NA     0.606 NA     0.815 0.895  0.786 0.754  0.779 0.828   -      -             -     -
Table 6: Binary function clone search with various compiler optimizations: evaluation with precision at top-1
4.3 Compiler Optimization

In this section, we test BinFinder under different compiler optimization levels, investigating the impact of all pairwise combinations of O0–O3 on Dataset-II. For each binary function embedding (q_i) that results, for example, from the configuration (gcc, O0, x86), we search for its similar binary functions from the configurations (gcc, O3, x86) or (gcc, O3, ARM), and vice versa. Then, precision is calculated at top-1 over the retrieved results using pairwise cosine distance. Finally, we calculate the average between the two resulting values. From Table 6, we see that the precision at top-1 over the clang compiler is better compared to gcc: on average, clang precision is 82% while gcc precision is 77%. This observation is explained by the effect of compiler optimization pairs on the number of generated instructions and basic blocks (nodes). We see that gcc P(0) values are lower than clang P(0) values; these differences indicate that gcc modifies the generated functions more than clang does. Moreover, we can see that the precision at top-1 fluctuates at the package level among the different optimization options. For example, the precision at top-1 for the gmp library is significantly different from that for OpenSSL over both gcc and clang. One more observation from Table 6 is that binary function clone search under the (O0, O2) and (O0, O3) setups is more challenging compared to other optimization search pairs over both compilers. Based on Table 1, the O2 and O3 mean absolute differences over all selected features are higher compared to other optimizations. The optimization O3 implicitly includes all O2 optimizations. This indicates that the O2 and O3 optimization options modify their related binary functions to a larger degree than the other compiler optimization options.

5 SEARCHING AGAINST ALL BINARIES

We evaluate BinFinder when both code obfuscations and compiler optimizations are applied in the following three scenarios: on a single CPU architecture (x86), on multi-CPU architectures (x86, ARM) when only compiler optimization levels are applied, and on multi-CPU architectures (x86, ARM) when both compiler optimizations and code obfuscation techniques are applied. Given a binary function q_i, we generate its embedding using the trained BinFinder model. Then, we search for all its similar functions in our repository using pairwise cosine similarity. Afterwards, we sort the retrieved results and, based on the top k candidate functions with the highest cosine similarity scores, we calculate Precision, nDCG, and Recall. Our evaluation is depicted in Figure 4.

Single CPU Architecture (x86). In this scenario, BinFinder is trained and tested over Dataset-I. On average, each binary function has 10 similar functions produced from the same source code but compiled with two compilers (gcc, clang), each with O0–O3 optimizations and three different obfuscation techniques using O-LLVM. For each query binary function q_i in Dataset-I, we look up all its similar functions among 116508 functions and record the BinFinder performance. From Figure 4(a), we see that BinFinder precision is above 80% for k ∈ [1-5]; furthermore, it is above 70% for k ∈ [6-10] and above 50% for k ∈ [10-20]. We also observe from Figure 4(b) that BinFinder nDCG values are above 80% for k ∈ [1-7] and above 70% for k ∈ [8-20]. These observations convey that similar functions appear among the first instances of the retrieved candidate functions from the repository. Besides, from Figure 4(c), we can see that at k=30 BinFinder Recall is 72%, at k=50 Recall is 80%, and at k=100 Recall is 85%. The higher the k we select, the higher the Recall, as depicted in Figure 4(c); at k=200, we have a Recall of 90%.

Multi-CPU Architectures with Compiler Optimization. In this scenario, we evaluate BinFinder over Dataset-II, which contains samples for two CPU architectures (x86 and ARM). On average, every query function q_i in Dataset-II has at least 15 similar functions and 203465 dissimilar functions in Dataset-II. Some selected packages have more than one version; therefore, the number of similar functions could double. From Figure 4(a), we see that BinFinder precision is 90% when k=1, above 80% for k ∈ [1-5], above 70% for k ∈ [6-12], and above 60% for k ∈ [13-17]. We also observe from Figure 4(b) that BinFinder nDCG values are above 80% for k ∈ [1-10] and above 70% for k ∈ [11-20]. Besides, from Figure 4(c), we can see that at k=30 BinFinder Recall is 60%, while at k=50 it is 70%. Moreover, BinFinder Recall is 80% when k=100 and 85% when k=200.

Multi-CPU Architectures with Code Obfuscation and Optimization. In this scenario, BinFinder is trained and tested over Dataset-III. On average, every query q_i has 21 similar functions and 284470 dissimilar functions in Dataset-III. In Figure 4(a), we see that BinFinder precision is 86% for k=1, above 80% for k ∈ [1-3], above 70% for k ∈ [4-10], and above 60% for k ∈ [13-19]. In Figure 4(b), we observe that BinFinder nDCG values are above 80% for k ∈ [1-5] and above 70% for k ∈ [6-20]. Besides, from Figure 4(c), we see that at k=50 Recall is 60%, at k=100 it is 70%, and it is 76% when k=200. Based on the results reported in Figures 4(a), 4(b), and 4(c), we see that binary function clone search on a single architecture is less challenging compared to multi-CPU architectures. Based on the Recall values at k=50, BinFinder achieves 80% on a single architecture, 70% on multi-CPU architectures when only different optimization levels are applied, and 60% when both optimization levels and obfuscations are applied on multi-CPU architectures.
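The retrieval step described above, embedding a query function and ranking repository functions by pairwise cosine similarity, can be sketched as follows (the toy embeddings and function names are illustrative, not actual BinFinder outputs):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k_clones(query_emb, repo, k):
    """Rank repository functions by cosine similarity to the query embedding."""
    ranked = sorted(repo, key=lambda fid: cosine(query_emb, repo[fid]),
                    reverse=True)
    return ranked[:k]

# Toy 3-d embeddings standing in for the model's output vectors.
repo = {
    "zlib_inflate_O3":  [0.9, 0.1, 0.2],
    "zlib_inflate_obf": [0.8, 0.2, 0.1],
    "ssl_read_O0":      [0.1, 0.9, 0.7],
}
query = [0.85, 0.15, 0.15]             # embedding of a zlib_inflate variant
print(top_k_clones(query, repo, k=2))  # the two inflate variants rank first
```

Precision, nDCG, and Recall are then computed over this sorted top-k list, exactly as in the scenarios above.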
Figure 4: Function clone search over multi-CPU architectures in the presence of code obfuscations and optimizations. Panels: (a) Precision, (b) nDCG, (c) Recall, each plotted against the number of nearest results for three scenarios: x86 only, multi-CPU (optimization only), and multi-CPU (optimization & obfuscation).
The reason behind this drop in Recall values in the multi-CPU architecture scenarios is that the num_constants, libcCalls, Callers, and Callees features are more affected on multi-CPU architectures compared to the single-architecture scenario (x86), as shown in Table 1. Overall, our reported figures show that BinFinder is efficient over different CPU architecture configurations.

6 SEARCHING FOR VULNERABILITIES

In this section, we evaluate BinFinder with respect to identifying well-known vulnerable binary functions. We have collected reported vulnerable functions from the CVE for different versions of OpenSSL, zlib, glibc, and libcurl. In total, we have 78 unique vulnerable functions, including the Heartbleed vulnerability. Our dataset has been cross-compiled with two compilers, namely gcc and clang, using different compiler optimizations and O-LLVM code obfuscations. In total, our dataset consists of 1198 vulnerable functions. We take each vulnerable binary function and generate its related embeddings using the trained BinFinder models. Afterwards, we search for all its similar functions in our repositories. Finally, based on the k retrieved candidate functions with the highest cosine similarity score, we calculate Recall, Precision, and nDCG. Our evaluation results are depicted in Figure 5. From Figure 5(a), we can see that BinFinder Precision is 98% over Dataset-I, Dataset-II, and Dataset-III. This indicates that whenever we make a query, BinFinder returns a good result with very high precision. Besides, at k=25, we observe from Figure 5(c) that, over a single architecture, BinFinder can successfully retrieve 80% of all similar vulnerable binary functions in Dataset-I. On the other hand, BinFinder can successfully retrieve 62% of all similar vulnerable binary functions over multi-CPU architectures when different compiler optimizations are applied in Dataset-II. Moreover, it can successfully retrieve 55% of all similar vulnerable binary functions over multi-CPU architectures when different compiler optimizations and code obfuscations are applied in Dataset-III. At k=75, BinFinder is able to successfully retrieve more than 80% of all similar vulnerable binary functions from the different datasets.

7 COMPARISONS TO SIMILAR APPROACHES

Marcelli et al. [17] conducted an empirical study to evaluate binary function similarity approaches based on machine learning techniques. In this section, we follow the same procedure established by the aforementioned study to evaluate our approach. In this regard, we downloaded their datasets. Then, following their evaluation procedure, we create similar training and testing datasets using our implementation to extract our proposed features. Finally, we evaluate and compare BinFinder and report the obtained results.

7.1 Experimental Setup

Marcelli et al. [17] created two datasets (Dataset-A and Dataset-B) representing different challenges in binary function similarity: 1) different compilers and versions, 2) different optimization levels, and 3) different CPU architectures and bitness. The datasets are composed of several projects compiled with different compilers into three CPU architectures: x86, ARM, and MIPS. More details about the datasets are provided in Section B of the Appendix. The goal of Dataset-A is to train and test models, while the goal of Dataset-B is to validate the resulting models trained on Dataset-A on a miscellaneous and extensive group of binaries. The evaluation procedure comprises nine tasks, each evaluating one binary function similarity challenge. For each task, 50K positive and 50K negative pairs of binary functions are randomly selected. For the ranking test, 200 positive pairs and 20K negative pairs are randomly selected, where each positive pair has 100 negative pairs.
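The pair-construction procedure described above can be sketched as follows (a scaled-down, hypothetical illustration; `sample_pairs` and the dataset layout are stand-ins, not the original study's code):

```python
import random

def sample_pairs(variants, n_pos, n_neg, seed=0):
    """variants: {source_function: [binary variants of that function]}.
    Positive pairs: two binaries of the same source function.
    Negative pairs: binaries of two different source functions."""
    rng = random.Random(seed)
    ids = list(variants)
    pos = []
    while len(pos) < n_pos:
        src = rng.choice(ids)
        if len(variants[src]) >= 2:
            pos.append(tuple(rng.sample(variants[src], 2)))
    neg = []
    while len(neg) < n_neg:
        a, b = rng.sample(ids, 2)  # two distinct source functions
        neg.append((rng.choice(variants[a]), rng.choice(variants[b])))
    return pos, neg

# Toy repository: 50 source functions, each built as O0, O3, and obfuscated.
funcs = {f"src{i}": [f"src{i}_O0", f"src{i}_O3", f"src{i}_obf"]
         for i in range(50)}
pos, neg = sample_pairs(funcs, n_pos=100, n_neg=100)
print(len(pos), len(neg))  # 100 100
```

The real evaluation draws 50K/50K pairs per task and restricts the candidate builds according to each task's constraints.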
                                  AUC                                              XM
Approach                          XC    XC+XB  XA    XM    small  medium  large    MRR10  Recall@1
BinFinder                         0.98  0.97   0.98  0.98  0.98   0.98    0.93     0.8    0.73
GMN_OPC-200_e16                   0.86  0.85   0.86  0.86  0.89   0.82    0.79     0.53   0.45
GNN-s2v_GeminiNN_OPC-200_e5       0.78  0.81   0.82  0.81  0.84   0.77    0.79     0.36   0.28
SAFE_ASM-list_e5                  0.8   0.8    0.81  0.81  0.83   0.77    0.77     0.17   0.27
Zeek                              0.84  0.84   0.85  0.84  0.85   0.83    0.87     0.28   0.13
asm2vec                           0.62  0.81   0.74  0.69  0.63   0.7     0.78     0.12   0.07
Table 7: Comparison of state-of-the-art models with BinFinder on Dataset-A tasks
The XM task considers function pairs that come from arbitrary architectures, bitness, compilers, compiler versions, and optimizations. This task reflects comparisons across the whole dataset and is considered the most challenging task. The XA+XO task reflects Dataset-B, as it considers function pairs having dissimilar architectures, bitness, and optimizations but a similar compiler and compiler version. Section B in the Appendix provides more details about the generated tasks.

7.2 Results and Analysis

We compare BinFinder to similar approaches in the literature. We evaluate each approach using three metrics: AUC, MRR10, and Recall@K. Table 7 shows the performance of each examined approach on Dataset-A. It is worth mentioning that the selected model names represent the customized versions of their related approaches that report the best performance, as demonstrated in the empirical study [17]. In Table 7, we can observe that BinFinder outperforms GMN [13], Zeek [23], SAFE [18], asm2vec [3], and Gemini [27] in all tasks generated from Dataset-A. Also, to a small extent, BinFinder still outperforms CodeCMR [29] on the XM task, with 3% higher Recall when K=5, as depicted in Figure 6(a). Regarding the small, medium, and large tasks, which test the targeted models in the presence of various binary function sizes based on their number of basic blocks, we observe in Table 7 that BinFinder has promising performance over varying sizes of binary functions. BinFinder has the same performance over small and medium size functions, with 98% AUC. However, BinFinder shows lower performance over large binary functions, with 93% AUC. Another observation we can derive from Table 7 is that the majority of the models demonstrate quite similar performance when compared using AUC. However, they exhibit varying performance when compared using the ranking metrics (MRR10 and Recall@K), as shown in Figure 6.

We can see from Table 7 that, over the XM task, BinFinder outperforms all examined competing approaches. For example, with k=1, BinFinder reports 73% Recall: 28% higher than GMN [13], 46% higher than Gemini [27], and 55% higher than SAFE [18]. Moreover, when K=5, BinFinder reports 92% Recall while GMN reports 65%.

To validate the performance of BinFinder, we test it over the Dataset-B tasks: XO, XA, and XA+XO. Table 8 presents the performance measures obtained on Dataset-B. From Table 8, we see that BinFinder maintains similar performance in terms of AUC. However, concerning MRR10 and Recall@K over the XA+XO task, the performance is reduced by around 10%, as depicted in Figure 6(b). When compared to the GMN model [13], we find that BinFinder shows lower performance, with 5% lower Recall when K=5. However, when K=20, BinFinder draws a similar performance to GMN, as depicted in Figure 6(b). To gain more insight and provide a better understanding, we investigated the possible reasons behind this observation by examining both datasets in depth. We find that new LibcCalls and new VEX tokens are introduced in Dataset-B. For example, 127 unique LibcCalls appear in Dataset-A, while 136 appear in Dataset-B. Consequently, this could directly impact the number-of-libc-calls and number-of-callees features in our model. We believe that this is a limitation of BinFinder, which occurs because the model does not address unseen system calls. Note that GMN could address this limitation since it relies on CFG structures. However, the GMN approach is not practical in the presence of obfuscation techniques such as flattening (FLA), which significantly modifies the CFG of the targeted functions, as elaborated in Section 2.

7.3 Comparison to Obfuscation Approaches

We compare BinFinder to existing obfuscation-focused methods, asm2vec and BinMatch, using Dataset-V outlined in Section 3.4. From Figure 7 in the Appendix, we can see that BinFinder outperforms asm2vec³, achieving a recall rate of 79% at k=100 compared to asm2vec's 33%. It also reports higher precision at k=2, with 79% against asm2vec's 60%. Precision for BinFinder decreases gently as k increases, while asm2vec's drops sharply. Similar outcomes have been reported in previous studies [7, 9]. In a comparison against BinMatch, a multi-architecture approach, using the same set of packages BinMatch evaluated, BinFinder consistently reports higher average recall values. Specifically, it outperforms BinMatch by 9%, 8%, and 7% at top-1, top-5, and top-10 respectively, affirming BinFinder's superiority.

³ https://github.com/oalieno/asm2vec-pytorch
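The ranking metrics used throughout this comparison can be sketched as follows (a minimal illustration: MRR10 averages the reciprocal rank of the true match when it appears in the top 10, and Recall@K is the fraction of queries whose true match appears in the top K):

```python
def mrr_at_10(ranks):
    """ranks: 1-based rank of the true positive for each query
    (None if it was not retrieved at all)."""
    total = sum(1.0 / r for r in ranks if r is not None and r <= 10)
    return total / len(ranks)

def recall_at_k(ranks, k):
    """Fraction of queries whose true positive appears within the top k."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)

ranks = [1, 3, None, 2]       # true-match ranks for four hypothetical queries
print(mrr_at_10(ranks))       # (1 + 1/3 + 0 + 1/2) / 4
print(recall_at_k(ranks, 1))  # 0.25
```

This is why models with near-identical AUC can still separate sharply on MRR10 and Recall@K: the ranking metrics reward placing the single true match at the very top of 100 candidates, not merely scoring positives above negatives on average.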
Figure 6: A comparison of the Recall at different K values for the Dataset-A XM (left) and Dataset-B XA+XO (right) tasks. Both panels plot Recall@K against K (number of results) for BinFinder, GGSNN_OPC-200_e10, GMN_OPC-200_e16, GNN-s2v_GeminiNN_GeminiFeatures_e5, SAFE_ASM-list_e5, Trex, Zeek, and asm2vec_e10.
Column headers (rotated in the original layout): Statistical, Signature, Semantic, Dynamic, O-LLVM, Distance, Slicing, x86-64, tigress, Graph, Clang, MIPS, ARM, GCC, ICC, VS.

DiscovRE [4] • • • MCS, JD IDA • • • • • •
Genius [5] • • • LSH, JD IDA • • • • •
Gemini [27] • • • GNN IDA • • • •
asm2vec [3] • • • PV-DM IDA • • • •
SAFE [18] • • seq2seq angr, radare2 • • •
BinMatch [7] • • • • semantic angr • • • • •
GMN [13] • • GNN/GMN IDA • • • •
αDiff [14] • • • CNN IDA • •
Zeek [23] • • • MLP pyvex • • • • •
CodeCMR [29] • • • • encoder+GNN+LSTM IDA • • • •
Trex [21] • • • • transformer - • • • • • •
TIKNIB [11] • • Direct comparisons IDA • • • • • •
BinFinder • • MLP angr • • • • • • •
Table 9: A comparison of state-of-the-art related approaches. (•) means that the approach provides the corresponding feature; it is empty otherwise. (MCS) Maximum Common Subgraph Isomorphism, (JD) Jaccard Distance, (LSH) Locality Sensitive Hashing, (GNN) Graph Neural Network, (MLP) Multi-layer Perceptron Neural Network.
8 DISCUSSION

In this section, BinFinder is compared to state-of-the-art approaches considering various factors, including features, analysis, methodology, disassemblers, compilers, CPU architectures, and obfuscation. Table 9 summarizes these aspects for 13 approaches, most of which have been selected by Marcelli et al. [17] as a representative set of studies for binary function similarity. The comparison reveals BinFinder to be the sole approach addressing binary function similarity across different obfuscation techniques like O-LLVM and tigress. Other methods such as asm2vec [3], BinMatch [7], and Trex [21] explore O-LLVM obfuscation but limit their investigations to single or multiple architectures and require a dynamic analysis step.

We can also mention that Zeek [23] and BinFinder look similar, as both employ a Multi-layer Perceptron (MLP) neural network and use lifted VEX-IR instructions. However, they differ in their input features: Zeek extracts strands at the basic-block level to be used as input to an MLP model, whereas BinFinder uses unique_Vex_instructions in addition to six other features that are first engineered for representation and then passed as input to train the MLP network that generates the final embeddings. As a result, both approaches demonstrate different performance behaviors, as detailed in Table 7 and Table 8.

More recent methods, TIKNIB and αDiff, use features similar to BinFinder's but fall short in identifying other potent features and in building machine learning models that address binary function similarity across code transformations and multi-CPU architectures. BinFinder proves to be as efficient as leading approaches on multi-CPU architecture tasks; however, it faces limitations when encountering unseen libc calls. Future work will focus on refining the model using a more comprehensive dataset or considering an out-of-vocabulary solution, as in INNEREYE [30], to improve BinFinder's performance.

9 RELATED WORK

In this section, we review proposed state-of-the-art approaches over multiple CPU architectures. Further details are available in a related survey [1]. DiscovRE [4] uses multi-level filtering based on both numeric and structural features. Inspired by DiscovRE, Genius [5] uses statistical and structural features to create attributed control flow graphs.
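As a rough sketch of how an MLP maps a function's extracted feature vector to an embedding (the two-layer shape, layer sizes, and random weights here are illustrative assumptions, not BinFinder's actual architecture):

```python
import random

def mlp_embed(features, w1, b1, w2, b2):
    """Two-layer MLP: feature vector -> ReLU hidden layer -> embedding."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]

random.seed(0)
n_feat, n_hidden, n_emb = 7, 16, 8    # illustrative sizes only
w1 = [[random.gauss(0, 0.3) for _ in range(n_feat)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[random.gauss(0, 0.3) for _ in range(n_hidden)] for _ in range(n_emb)]
b2 = [0.0] * n_emb

# Hypothetical feature values for one function (e.g. counts of callees,
# callers, libc calls, unique VEX instructions, constants, ...).
func_features = [12, 3, 1, 5, 0, 2, 40]
emb = mlp_embed(func_features, w1, b1, w2, b2)
print(len(emb))  # 8
```

In a trained system the weights come from a similarity objective over positive and negative function pairs; clone search then reduces to cosine similarity between such embedding vectors.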
Figure 7: Evaluating BinFinder against state-of-the-art approaches, namely asm2vec and BinMatch. In this scenario, we take a given function f_i and search for its similar ones within a large set of functions. Panels: (a) Precision, (b) nDCG, (c) Recall, each plotted against the number of nearest results.
Figure 8: Results of similarity testing (ROC curves, true positive rate vs. false positive rate, for the x86-only and x86+ARM settings).

The first three tasks consider cases limited to a single CPU architecture, namely: 1) XO: considers function pairs having dissimilar optimizations but the same compiler, compiler version, and architecture; 2) XC: considers function pairs having dissimilar compilers, compiler versions, and optimizations but similar architecture and bitness; 3) XC+XB: considers function pairs having dissimilar compilers, compiler versions, optimizations, and bitness but similar architecture; 4) XA: considers function pairs having dissimilar architectures and bitness but similar compiler, compiler version, and optimizations; this task represents binary function similarity over firmware images, which are cross-compiled using a single compiler with different optimization levels; 5) XA+XO: this task reflects Dataset-B; it considers function pairs having dissimilar architectures, bitness, and optimizations but similar compiler and compiler versions; 6) XM: considers function pairs that come from arbitrary architectures, bitness, compilers, compiler versions, and optimizations; this task reflects comparisons across the whole dataset and is considered the most difficult task; 7) XM-S: a sub-task of XM that considers only small functions with fewer than 20 basic blocks; 8) XM-M: considers medium-size functions with more than 20 and fewer than 100 basic blocks; 9) XM-L: considers large functions with more than 100 basic blocks.
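Each task above amounts to a filter over which build attributes of a candidate function pair must differ and which must match; a sketch (the attribute names are illustrative):

```python
def matches_task(a, b, must_differ, must_match):
    """a, b: dicts describing how each binary function was built."""
    return (all(a[k] != b[k] for k in must_differ) and
            all(a[k] == b[k] for k in must_match))

# XO: different optimization; same compiler, version, architecture, bitness.
XO = (("opt",), ("compiler", "version", "arch", "bitness"))
# XA: different architecture/bitness; same compiler, version, optimization.
XA = (("arch", "bitness"), ("compiler", "version", "opt"))

f1 = {"compiler": "gcc", "version": "9", "arch": "x86", "bitness": 64, "opt": "O0"}
f2 = {"compiler": "gcc", "version": "9", "arch": "x86", "bitness": 64, "opt": "O3"}
print(matches_task(f1, f2, *XO))  # True: only the optimization level differs
print(matches_task(f1, f2, *XA))  # False: the architectures are identical
```

XM places no constraint at all, which is why it reflects comparisons across the whole dataset; XM-S/M/L further restrict the pair's function size in basic blocks.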