An Adversarial Machine Learning Model Against Android Malware Evasion Attacks
1 Introduction
Due to their mobility and ever-expanding capabilities, smart phones have been widely used to perform daily tasks such as banking and automated home control. In recent years, there has been an exponential growth in the number of smart phone users around the world, and it is estimated that 77.7% of all devices connected to the Internet will be smart phones in 2019 [13]. Designed as an open, free, and programmable operating system, Android is one of the most popular smart phone platforms and dominates the current market share [1]. However, the openness of Android attracts not only developers but also attackers who spread malware.
The rest of the paper is organized as follows. Section 2 defines the problem of
machine learning based Android malware detection. Section 3 describes the eva-
sion attacks under different scenarios and their corresponding implementations.
Section 4 introduces an adversarial learning model against the evasion attacks.
Section 5 systematically evaluates the effectiveness of the proposed methods.
Section 6 discusses the related work. Finally, Section 7 concludes the paper.
Based on the collected Android apps, without loss of generality, we extract API calls from the smali files as features, since API calls are used by apps to access operating system functionality and system resources, and thus can serve as representations of an app's behaviors [12]. To extract the API calls, an Android app is first unzipped to obtain the dex file, and the dex file is then decompiled into smali code (i.e., the interpreted, intermediate code between Java and the DalvikVM [8]) using the well-known reverse engineering tool APKTool [2]. The resulting smali code can then be parsed for API call extraction. For example, the API calls "Lorg/apache/http/HttpRequest;→containsHeader" and "Lorg/apache/http/HttpRequest;→addHeader" can be extracted from "winterbird.apk" (MD5: 53cec6444101d1976af1b253ff5b2226), a theme wallpaper app embedded with malicious code that can steal users' credentials. Note that other feature representations, either static or dynamic, are also applicable and left for our further investigation.
Resting on the extracted API calls, we denote our dataset as $\mathcal{D} = \{x_i, y_i\}_{i=1}^{n}$ of $n$ apps, where $x_i$ is the set of API calls extracted from app $i$, and $y_i \in \{+1, -1, 0\}$ is the class label of app $i$ ($+1$ denotes malicious, $-1$ denotes benign, and $0$ denotes unknown). Let $d$ be the number of all extracted API calls in the dataset $\mathcal{D}$. Each app can then be represented by a binary feature vector $x_i = \langle x_{i1}, \ldots, x_{id} \rangle$, where $x_{ij} = 1$ if app $i$ contains the $j$-th API call and $x_{ij} = 0$ otherwise.
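As an illustration of this pipeline, the following Python sketch (our own plumbing; the paper provides no code, and the helper names and the smali-parsing regular expression are assumptions) decompiles an APK with APKTool, collects API calls from the smali output, and maps an app onto its binary feature vector:

import re
import subprocess
from pathlib import Path

# Matches smali invoke instructions and captures "Lpkg/Class;->method".
API_RE = re.compile(r"invoke-\w+(?:/range)?\s+\{[^}]*\},\s+(L[\w/$]+;->[\w$<>]+)")

def extract_api_calls(apk_path: str, out_dir: str) -> set:
    """Decompile an APK with APKTool and collect the API calls it uses."""
    subprocess.run(["apktool", "d", "-f", apk_path, "-o", out_dir], check=True)
    calls = set()
    for smali_file in Path(out_dir).rglob("*.smali"):
        for line in smali_file.read_text(errors="ignore").splitlines():
            match = API_RE.search(line)
            if match:
                calls.add(match.group(1))
    return calls

def to_binary_vector(app_calls: set, vocabulary: list) -> list:
    """Map an app's API-call set onto the d-dimensional binary vector x_i."""
    return [1 if api in app_calls else 0 for api in vocabulary]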
To implement the evasion attack, the attackers need to choose a relevant subset of API calls for feature manipulation. To evade detection at a lower evasion cost, the attackers may inject the API calls most relevant to benign apps while removing the ones most relevant to malware. To simulate the attacks, we rank each API call using the Max-Relevance algorithm [17], which has been successfully applied in malware detection [27]. Based on a real sample collection of 2,334 Android apps (1,216 malicious and 1,118 benign) obtained from the Comodo Cloud Security Center, 1,022 API calls are extracted.
Figure 1 shows the cumulative distribution function (CDF) of the API calls' relevance scores, from which we can see that (1) different API calls behave differently: some are explicitly relevant to malware, while others have high influence on benign apps; and (2) API calls with extremely low relevance scores (below 0.06 in our case) contribute little or nothing to malware detection, so we exclude them from feature manipulation. We therefore rank the API calls and group them into two sets for feature manipulation: $\mathcal{M}$ (those highly relevant to malware) and $\mathcal{B}$ (those highly relevant to benign apps), in descending order of $I(x, +1)$ and $I(x, -1)$ respectively.
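To make the ranking step concrete, a minimal sketch follows (assuming the binary feature matrix from above; the occurrence-rate rule for assigning a feature to $\mathcal{M}$ or $\mathcal{B}$ is our simplification of ranking by $I(x, +1)$ and $I(x, -1)$):

import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_and_group(X, y, threshold=0.06):
    """X: n x d binary matrix; y: labels in {+1 (malicious), -1 (benign)}."""
    scores = mutual_info_classif(X, y, discrete_features=True)
    keep = np.where(scores >= threshold)[0]      # drop low-relevance API calls
    mal_rate = X[y == +1][:, keep].mean(axis=0)  # frequency among malware
    ben_rate = X[y == -1][:, keep].mean(axis=0)  # frequency among benign apps
    M = keep[mal_rate > ben_rate]                # highly relevant to malware
    B = keep[mal_rate <= ben_rate]               # highly relevant to benign apps
    # Order each group by descending relevance, since the attack consumes the
    # most relevant features first.
    return M[np.argsort(-scores[M])], B[np.argsort(-scores[B])], scores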
which implies the number of malicious apps being misclassified as benign. The underlying idea of our evasion attack EvAttack is to perform feature manipulations with minimum evasion cost while maximizing the total classification loss in Eq. (6). Specifically, we conduct bi-directional feature selection, that is, forward feature addition performed on $\mathcal{B}$ and backward feature elimination performed on $\mathcal{M}$. At each iteration, an API call is selected for addition or elimination depending on how it influences the value of $g(\mathcal{A}(X))$. The evasion attack $\theta = \{\theta^+, \theta^-\}$ is drawn from these iterations, where $\theta^+, \theta^- \in \{0,1\}^d$ (if $API_i$ is selected for elimination (or addition), then $\theta_i^+$ ($\theta_i^-$) $= 1$; otherwise $\theta_i^+$ ($\theta_i^-$) $= 0$). The iterations end when the evasion cost reaches its maximum ($\delta_{max}$), or when all features available for addition and elimination have been manipulated. Given $m = \max(|\mathcal{M}|, |\mathcal{B}|)$, EvAttack requires $O(n_t\, m(\mu^+ + \mu^-))$ queries, where $n_t$ is the number of malicious apps with which the attackers want to evade detection, and $\mu^+$ and $\mu^-$ are the numbers of features selected for elimination and addition respectively ($\mu^+ \ll d$, $\mu^- \ll d$, $m \ll d$).
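The following sketch illustrates the EvAttack loop for a single malicious sample (a simplification under our assumptions: g is any query-able scoring function with positive values meaning malicious, and each manipulation has unit cost):

import numpy as np

def ev_attack(x, g, M, B, delta_max):
    """Greedy bi-directional feature manipulation against classifier score g."""
    x = x.copy()
    theta_plus = np.zeros_like(x)   # theta^+: eliminations drawn from M
    theta_minus = np.zeros_like(x)  # theta^-: additions drawn from B
    cost = 0
    while cost < delta_max:
        # Backward elimination candidates on M, forward addition on B.
        candidates = [(j, 0) for j in M if x[j] == 1] + \
                     [(j, 1) for j in B if x[j] == 0]
        if not candidates:
            break                   # every available feature is manipulated
        best_j, best_v, best_score = None, None, g(x)
        for j, v in candidates:     # pick the single move that lowers g most
            old, x[j] = x[j], v
            score = g(x)
            x[j] = old
            if score < best_score:
                best_j, best_v, best_score = j, v, score
        if best_j is None:
            break                   # no remaining move reduces the score
        x[best_j] = best_v
        (theta_plus if best_v == 0 else theta_minus)[best_j] = 1
        cost += 1                   # unit evasion cost per manipulation
    return x, theta_plus, theta_minus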
In our adversarial learning model, we not only incorporate the evasion attack $\theta$ into the learning algorithm, but also enhance the classifier with a security regularization term based on the evasion cost.
We first define the resilience coefficient of a classifier based on the evasion cost; the resulting secure learning problem is then formulated as:
$$\mathop{\arg\min}_{f, w, h;\, \xi}\; \frac{1}{2}\|y - f\|^2 + \frac{\alpha}{2} f^T S y + \frac{1}{2\beta} w^T w + \frac{1}{2\gamma} h^T h + \xi^T (f - X^T w - h) \qquad (8)$$

The Lagrangian of Eq. 8 is

$$L(f, w, h; \xi) = \frac{1}{2}\|y - f\|^2 + \frac{\alpha}{2} f^T S y + \frac{1}{2\beta} w^T w + \frac{1}{2\gamma} h^T h + \xi^T (f - X^T w - h). \qquad (9)$$
Setting $\frac{\partial L}{\partial w} = 0$, $\frac{\partial L}{\partial h} = 0$, $\frac{\partial L}{\partial \xi} = 0$, and $\frac{\partial L}{\partial f} = 0$, we have

$$w = \beta X \xi, \qquad (10)$$

$$h = \gamma \xi, \qquad (11)$$

$$f = X^T w + h, \qquad (12)$$

$$f = y - \frac{\alpha}{2} S y - \xi. \qquad (13)$$
Based on the derivation from Eqs. 10, 11 and 12, we have

$$f = (\beta X^T X + \gamma I)\, \xi. \qquad (14)$$

Substituting Eq. 14 into Eq. 13, we obtain our adversarial model for malware detection (AdvMD) as:
$$\big((\beta X^T X + \gamma I) + I\big) f = \Big(I - \frac{\alpha}{2} S\Big)(\beta X^T X + \gamma I)\, y. \qquad (15)$$
Since the size of $X$ is $d \times n$, the computational complexity of solving Eq. 15 is $O(n^3)$. If $d < n$, we can apply the Woodbury identity [22] to transform Eq. 15 into Eq. 16 and reduce the complexity to $O(d^3)$.
$$\Big(I + \gamma^{-1} I - \gamma^{-1} X^T \big(\gamma \beta^{-1} I + X X^T\big)^{-1} X\Big) f = \Big(I - \frac{\alpha}{2} S\Big) y. \qquad (16)$$
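Concretely, the transformation instantiates the Woodbury identity $(A + UCV)^{-1} = A^{-1} - A^{-1}U(C^{-1} + VA^{-1}U)^{-1}VA^{-1}$ with $A = \gamma I$, $U = X^T$, $C = \beta I$, and $V = X$:

$$(\beta X^T X + \gamma I)^{-1} = \gamma^{-1} I - \gamma^{-2} X^T \big(\beta^{-1} I + \gamma^{-1} X X^T\big)^{-1} X = \gamma^{-1} I - \gamma^{-1} X^T \big(\gamma \beta^{-1} I + X X^T\big)^{-1} X,$$

so the $n \times n$ inverse implicit in Eq. 15 is replaced by the $d \times d$ inverse of $(\gamma \beta^{-1} I + X X^T)$, which costs $O(d^3)$.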
To solve the secure learning problem in Eq. 15, the conjugate gradient method can be applied; it is also applicable to Eq. 16, provided that all variables are properly initialized.
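As a concrete illustration, a minimal sketch of solving Eq. 15 with SciPy's conjugate gradient routine follows (the function name, the dense NumPy representation, and the warm start are our assumptions; a real implementation would exploit sparsity):

import numpy as np
from scipy.sparse.linalg import cg

def solve_advmd(X, y, S, alpha, beta, gamma):
    """Solve ((beta X^T X + gamma I) + I) f = (I - alpha/2 S)(beta X^T X + gamma I) y."""
    n = X.shape[1]                               # X is d x n, so the system is n x n
    K = beta * X.T @ X + gamma * np.eye(n)       # beta X^T X + gamma I
    A = K + np.eye(n)                            # left-hand side of Eq. 15 (SPD)
    b = (np.eye(n) - 0.5 * alpha * S) @ (K @ y)  # right-hand side of Eq. 15
    f, info = cg(A, b, x0=y.astype(float))       # warm-start CG at the labels
    if info != 0:
        raise RuntimeError("conjugate gradient did not converge")
    return f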
of the collected apps to train the SVM. (3) In the ideal-knowledge (IDK) attack, i.e., $\Gamma = (X, \mathcal{D}, f)$, we conduct EvAttack based on all the collected apps and the learning model in Eq. 3. Note that EvAttack is applied in all these scenarios with the same $\delta_{max}$, and each attack is performed with 10-fold cross-validation. The experimental results, shown in Table 1, illustrate that the performance of the attack significantly depends on the knowledge available to the attackers. In the ideal-knowledge scenario, the FNR of IDK reaches 0.7227 (i.e., 72.27% of the testing malicious apps are misclassified as benign), which is superior to MMC and IPK.
We further compare EvAttack with other attack methods, including: (1) only injecting API calls from $\mathcal{B}$ (Method 1); (2) only eliminating API calls from $\mathcal{M}$ (Method 2); (3) injecting $(1/2 \times \delta_{max})$ API calls from $\mathcal{B}$ and eliminating $(1/2 \times \delta_{max})$ API calls from $\mathcal{M}$ (Method 3); and (4) simulating an anonymous attack by randomly manipulating API calls for addition and elimination (Method 4). The experimental results, averaged over the 10-fold cross-validations and shown in Fig. 2, demonstrate that the performance of the attacks varies with the feature manipulation method (Method 0 is the baseline before attack and Method 5 denotes EvAttack) under the same evasion cost $\delta_{max}$: (1) Method 2 performs worst, with the lowest FNR, indicating that elimination alone is not as effective as the other manipulations; (2) Method 3 performs better than the methods applying only feature addition or elimination, and than the anonymous attack, due to its bi-directional feature manipulation over $\mathcal{B}$ and $\mathcal{M}$; (3) EvAttack greatly improves the FNR to 0.7227 and degrades the detection accuracy (ACC) to 62.50%, outperforming the other four feature manipulation methods thanks to its well-crafted attack strategy.
(2) the classifier under attack; (3) the classifier retrained on the updated training dataset [15,20]; and (4) AdvMD. We conduct 10-fold cross-validations, and the experimental results of the different learning models are shown in Fig. 3. Figure 3(a) compares the FPR, TPR and ACC of the different learning models before/against EvAttack, and Fig. 3(b) illustrates their ROC curves before/against EvAttack. From Fig. 3, we can observe that (i) the retraining techniques can defend against the evasion attacks to some extent, but their performance remains unsatisfactory; while (ii) AdvMD significantly improves the TPR and ACC and brings the malware detection system back up to the desired performance level, with an accuracy of 91.81%, approaching the detection results before the attack. We also implement the anonymous attack by randomly selecting the features to manipulate. Under the anonymous attack, AdvMD has zero knowledge of what the attack is. Even in this case, AdvMD still improves the TPR from 0.7815 to 0.8367. Based on these properties, AdvMD can be a resilient solution for Android malware detection.
6 Related Work
Adversarial machine learning problems have started to be studied from both adversarial and defensive perspectives in domains such as anti-spam and intrusion detection. Lowd and Meek [16] introduced the ACRE framework to study how an adversary can learn sufficient information from the features to construct targeted attacks with minimal adversarial cost. Zhang et al. [30], Li et al. [15], and Biggio et al. [4] took gradient steps to find the closest evasion point $x'$ to the malicious sample $x$. Haghtalab et al. [10] proposed to learn the behavioral model of a boundedly rational attacker by observing how the attacker responds to three defender strategies. To combat evasion attacks, ample research efforts have been devoted to the security of machine learning. Wang et al. [20] modeled the adversary's action as controlling a vector $\alpha$ that modifies the training data $x$. Debarr et al. [7] explored randomization to generalize the learning model by estimating the parameters that best fit the data. Kolcz and Teo [14] investigated a feature reweighting technique to avoid over-weighting any single feature. More recently, robust feature selection methods have also been proposed to counter certain kinds of evasive data manipulation [20,30]. However, few of these works investigate the security of machine learning in Android malware detection. Different from the existing works, we explore adversarial machine learning in Android malware detection by providing a set of evasion attacks to assess the security of the classifier under different attacker capabilities, and by enhancing the learning algorithm with the evasion action and a security regularization term.
7 Conclusion
In this paper, we take an in-depth look at the machine learning based detection model and its evasion attacks. Considering attackers with different levels of knowledge, we implement an evasion attack EvAttack under three scenarios by manipulating an optimal portion of the features to evade detection. Accordingly, an adversarial learning model AdvMD, enhanced by evasion data manipulation and a security regularization term, is presented against these attacks. Three sets of experiments based on the real sample collection from the Comodo Cloud Security Center are conducted to empirically validate the proposed approaches. The experimental results demonstrate that EvAttack can greatly evade the detection, while AdvMD can be a robust and practical solution against evasion attacks in Android malware detection. In our future work, we will further explore poisoning attacks, in which the attackers alter the training process through influence over the training data, as well as resilient detection against them.
Acknowledgments. The authors would like to thank the experts of the Comodo Security Lab for the data collection and helpful discussions. This work is partially supported by the U.S. National Science Foundation under grant CNS-1618629 and Chinese NSF grant 61672157.
References
1. Android, iOS combine for 91 percent of market. http://www.cnet.com
2. APKTool. http://ibotpeaches.github.io/Apktool/
3. Barreno, M., Nelson, B., Sears, R., Joseph, A.D., Tygar, J.D.: Can machine learning
be secure? In: ASIACCS (2006)
4. Biggio, B., Fumera, G., Roli, F.: Evade hard multiple classifier systems. In: Okun, O., Valentini, G. (eds.) Applications of Supervised and Unsupervised Ensemble Methods. Studies in Computational Intelligence, pp. 15–38. Springer, Heidelberg (2009). doi:10.1007/978-3-642-03999-7_2
5. Biggio, B., Fumera, G., Roli, F.: Security evaluation of pattern classifiers under
attack. IEEE TKDE 26(4), 984–996 (2014)
6. Brückner, M., Kanzow, C., Scheffer, T.: Static prediction games for adversarial learning problems. JMLR 13, 2617–2654 (2012)
7. Debarr, D., Sun, H., Wechsler, H.: Adversarial spam detection using the random-
ized hough transform-support vector machine. In: ICMLA 2013, pp. 299–304 (2013)
8. Dex. http://www.openthefile.net/extension/dex
9. Felt, A.P., Finifter, M., Chin, E., Hanna, S., Wagner, D.: A survey of mobile
malware in the wild. In: SPSM (2011)
10. Haghtalab, N., Fang, F., Nguyen, T.H., Sinha, A., Procaccia, A.D., Tambe, M.:
Three strategies to success: learning adversary models in security games. In: IJCAI
(2016)
11. Hou, S., Saas, A., Chen, L., Ye, Y.: Deep4MalDroid: a deep learning framework
for android malware detection based on linux kernel system call graphs. In: WIW
(2016)
12. Hou, S., Saas, A., Ye, Y., Chen, L.: DroidDelver: an android malware detection system using deep belief network based on API call blocks. In: Song, S., Tong, Y. (eds.) WAIM 2016. LNCS, vol. 9998, pp. 54–66. Springer, Cham (2016). doi:10.1007/978-3-319-47121-1_5
13. IDC. http://www.idc.com/getdoc.jsp?containerId=prUS25500515
14. Kolcz, A., Teo, C.H.: Feature weighting for improved classifier robustness. In:
CEAS 2009 (2009)
15. Li, B., Vorobeychik, Y., Chen, X.: A general retraining framework for adversarial
classification. In: NIPS 2016 (2016)
16. Lowd, D., Meek, C.: Adversarial learning. In: KDD, pp. 641–647 (2005)
17. Peng, H., Long, F., Ding, C.: Feature selection based on mutual information: crite-
ria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern
Anal. Mach. Intell. 27(8), 1226–1238 (2005)
18. Roli, F., Biggio, B., Fumera, G.: Pattern recognition systems under attack. In: Ruiz-Shulcloper, J., Sanniti di Baja, G. (eds.) CIARP 2013. LNCS, vol. 8258, pp. 1–8. Springer, Heidelberg (2013). doi:10.1007/978-3-642-41822-8_1
19. Šrndić, N., Laskov, P.: Practical evasion of a learning-based classifier: a case study.
In: SP (2014)
20. Wang, F., Liu, W., Chawla, S.: On sparse feature attacks in adversarial learning.
In: ICDM 2014 (2014)
21. Wood, P.: Internet Security Threat Report 2015. Symantec, California (2015)
22. Woodbury, M.A.: Inverting modified matrices. Statistical Research Group, Prince-
ton University, Princeton, NJ (1950)
23. Wu, D., Mao, C., Wei, T., Lee, H., Wu, K.: DroidMat: android malware detection
through manifest and API calls tracing. In: Asia JCIS (2012)
24. Wu, W., Hung, S.: DroidDolphin: a dynamic Android malware detection framework
using big data and machine learning. In: RACS (2014)
25. Xu, J., Yu, Y., Chen, Z., Cao, B., Dong, W., Guo, Y., Cao, J.: MobSafe: cloud com-
puting based forensic analysis for massive mobile applications using data mining.
Tsinghua Sci. Technol. 18, 418–427 (2013)
26. Yang, C., Xu, Z., Gu, G., Yegneswaran, V., Porras, P.: DroidMiner: automated mining and characterization of fine-grained malicious behaviors in android applications. In: Kutylowski, M., Vaidya, J. (eds.) ESORICS 2014. LNCS, vol. 8712, pp. 163–182. Springer, Cham (2014). doi:10.1007/978-3-319-11203-9_10
27. Ye, Y., Li, D., Li, T., Ye, D.: IMDS: intelligent malware detection system. In: KDD
2007 (2007)
28. Ye, Y., Li, T., Zhu, S., Zhuang, W., Tas, E., Gupta, U., Abdulhayoglu, M.: Com-
bining file content and file relations for cloud based malware detection. In: KDD
2011, pp. 222–230 (2011)
29. Yuan, Z., Lu, Y., Wang, Z., Xue, Y.: Droid-Sec: deep learning in android malware
detection. In: SIGCOMM (2014)
30. Zhang, F., Chan, P.P.K., Biggio, B., Yeung, D.S., Roli, F.: Adversarial feature
selection against evasion attacks. IEEE Trans. Cybern. 46(3), 766–777 (2015)