MalPat: Mining Patterns of Malicious and Benign Android Apps Via Permission-Related APIs

Abstract—The dramatic rise of Android application (app) marketplaces has brought great convenience to mobile users. At the same time, Android malware seizes the opportunity offered by this abundance of apps to steal privacy-sensitive data while pretending to provide the same functionalities as benign apps. To distinguish malware among millions of Android apps, researchers have proposed sophisticated static and dynamic analysis tools that automatically detect and classify malicious apps. Most of these tools, however, rely on manually configured lists of features based on permissions, sensitive resources, intents, etc., which are difficult to come by. To address this problem, we study real-world Android apps to mine hidden patterns of malware and are able to extract highly sensitive APIs that are widely used in Android malware. We also implement an automated malware detection system, MalPat, to fight against malware and assist Android app marketplaces in addressing unknown malicious apps. Comprehensive experiments are conducted on our dataset consisting of 31 185 benign apps and 15 336 malware samples. Experimental results show that MalPat is capable of detecting malware with a high F1 score (98.24%) compared with the state-of-the-art approaches.

Index Terms—Android applications, malware detection, permission-related APIs, random forests, software security.

Manuscript received September 9, 2016; revised April 22, 2017 and October 9, 2017; accepted November 24, 2017. Date of publication December 20, 2017; date of current version March 1, 2018. This work was supported in part by the National Basic Research Program of China (973 Project 2014CB347701), in part by the National Natural Science Foundation of China under Grant 61722214 and Grant 61472338, in part by the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (2016ZT06D211), and in part by the Pearl River S&T Nova Program of Guangzhou (201710010046). Associate Editor: Y. Le Traon. (Corresponding author: Zibin Zheng.)

G. Tao, Z. Zheng, and Z. Guo are with the School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China, and also with the Collaborative Innovation Center of High Performance Computing, National University of Defense Technology, Changsha 410073, China (e-mail: [email protected]; [email protected]; [email protected]).

M. R. Lyu is with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TR.2017.2778147

I. INTRODUCTION

The past few years have witnessed a drastic increase in mobile apps providing various facilities for personal and business use. The proliferation of mobile apps is driven by billions of users who enable developers to earn revenue through advertisements, in-app purchases, etc. With a multitude of apps developed by many independent developers, including untrustworthy ones, it can be hard for users to determine the trustworthiness of these apps. Whenever users install a new app, they run the risk of installing malware. Unlike desktop apps, mobile apps can gain the privilege, once declared (e.g., in the Manifest file on the Android platform), to access sensitive information such as contact lists, SMS messages, GPS, etc. To make full use of the resources of mobile devices and support the abundant functionalities of mobile apps, this mechanism of permission declaration fulfills its job remarkably well. Unexpectedly, however, it also provides the opportunity for malware to hijack and steal sensitive information.

According to the report released by McAfee [1] in March 2016, more than 13 million mobile malware samples had been collected by 2015, and it recorded a 72% increase in new mobile malware samples over the last quarter. Data leaks are the typical behaviors of malware, stealing users' contact lists, personal information, and even money. As more people change their payment habits from cash to mobile banking, it has become a serious challenge to protect mobile users from spiteful attackers who may steal bank account credentials. Rasthofer et al. [2] identified a new malware family, Android/BadAccents, that stole, within two months, the credentials of more than 20 000 bank accounts of users residing in Korea. There are underground groups whose revenues are earned by trading stolen bank account credentials. Symantec reported [3] that one underground group had made $4.3 million in purchases using stolen credit cards over a two-year period.

In such a severe situation for the security and privacy of mobile users, identifying malicious apps and preventing them from stealing sensitive information have posed an especially important challenge. Rasthofer et al. [4] proposed a machine-learning approach, SUSI, to identify lists of sources of sensitive data (e.g., user location) and sinks of potential channels that leak such data to an adversary (e.g., network connection). The categories published by SUSI, along with other new categories of sensitive sources, were adopted by MUDFLOW [5] to identify malware as well as its abnormal usage of sensitive data. Both works focused on the data flows that originate from sensitive sources, which is effective and efficient for identifying abnormal behaviors of malware; but when it comes to classifying malicious apps, these approaches can be inefficient and are hard pressed to assist Android app marketplaces in fighting against malware. Permissions declared in the Manifest file easily capture the intention of apps regarding data usage, which can be utilized to identify malicious behaviors of apps [6]–[8]. Permission-based methods avoid high time and computation costs, but they capture only coarse-grained features of apps. Application programming interfaces (APIs) provided by the Android operating system, in contrast, profoundly demonstrate the full
0018-9529 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications standards/publications/rights/index.html for more information.
356 IEEE TRANSACTIONS ON RELIABILITY, VOL. 67, NO. 1, MARCH 2018
leak the original sensitive information. The impacted data were identified before they left the system at a taint sink. Compared to TaintDroid, more information leaks were found by VetDroid [15], which constructed permission use behaviors to examine internal sensitive behaviors and find information leaks. VetDroid can help identify subtle vulnerabilities in some apps. To capture both the OS-level and Java-level semantics simultaneously and seamlessly, DroidScope [16] collected detailed native and Dalvik instruction traces to track information leakage through both Java and native components. These dynamic methods all aim to conduct taint analysis to detect suspicious behaviors during runtime. With the dramatically increasing number of Android apps, this restricts dynamic methods from high-speed detection. Besides, since we aim to detect malware rather than capture stealthy behaviors, dynamic analysis can be inefficient in assisting mobile app marketplaces to fight against malware.

B. Static Analysis

Differing from dynamic analysis, static approaches can effectively deal with millions of apps and analyze patterns of malicious and benign behaviors without online running. Previous studies on malware detection and classification were conducted via various methods, such as static taint analysis [17], information flow analysis [18], [19], and probabilistic models [7], [20]. DroidRanger [8] utilized a permission-based behavioral footprinting scheme to detect new samples of known Android malware families, and a heuristics-based filtering scheme was adopted to identify certain inherent behaviors of unknown malicious families. Permission-based methods are coarse grained and cannot capture more detailed behaviors of apps; we analyze and elaborate on this in the rest of this paper. To capture fine-grained information, Apposcopy [17] proposed a high-level language to capture signatures describing semantic characteristics of malware families. Based on the extracted signatures, a static analysis was conducted to detect certain malware families. The main concern of Apposcopy is to identify certain families of malware, which differs from the purpose of our work. Based on static data-flow analysis, FlowDroid [21], ded [22], and CHEX [23] were able to precisely detect potential malicious behaviors of Android apps. Such techniques can effectively capture the exact stealthy and malicious behaviors of apps, but they can introduce a very high overhead and be inefficient in fighting against malware.

C. Machine Learning Approach

Classifying malware automatically is an open problem commonly addressed by employing machine learning techniques. Permissions are designed to protect sensitive resources on the Android platform, which directly reflects the sensitive behaviors of Android apps. Bartel et al. [24] found that a large portion of apps suffer from permission gaps, i.e., not all the permissions they declare are actually used. By analyzing permission usage among millions of benign and malicious apps, they could effectively expose abnormal behaviors and finally distinguish malware from various apps. Peng et al. [7] utilized the advantage of permissions and were able to correctly classify malware. However, the permission declaration of Android apps is coarse grained, lacking the ability to capture in-depth behaviors of malicious apps. To this point, Gorla et al. [25] selected a subset of APIs governed by Android permission settings, named sensitive APIs, which were identified by Felt et al. [26]. These sensitive APIs were used as binary features to train an OC-SVM to identify outlier apps. Similarly, DREBIN [6] also utilized APIs as features, but together with several other types of features (hardware components, permissions, intents, etc.). These features were all manually selected and required professional background knowledge. Besides, the final feature set contained about 545 000 features, a very large number that even exceeds the number of samples (123 453 benign apps and 5560 malware samples). It could require more time and effort to extract these features as well as to train machine-learning models on them. Utilizing the categorization of Android APIs published by SUSI [4], Avdiienko et al. [5] added three new categories to detect malware: sensitive resources, intents, nonsensitive sources, and sinks. With these categories as app features, Avdiienko et al. trained a ν-SVM [27] to detect malware (MUDFLOW). Differing from the features used in MUDFLOW, we only adopt the permission-related APIs used by apps, without any other additional information, to detect malware. Moreover, random forests are employed in our malware detection system as the classifier to identify malware. Random forest is a well-known machine learning approach for classification and regression, which consists of a set of binary decision trees [28]. By training multiple decision trees, random forests combine the results from these decision trees with a voting approach. Details of the training process of random forests are illustrated in Section V.

Previous studies focused on data flow can be inefficient for malware detection and classification, resulting in high costs of time and resources in fighting against malware. In contrast, sensitive or critical API-based approaches are easy to construct and can adjust rapidly to changes in malware. However, very few efforts have been made to conduct a thorough analysis of permission-related APIs, which can significantly improve the efficiency of malware detection with easy and practical approaches. In this paper, we try to address this problem and give an empirical study of permission-related APIs.

III. DATASET COLLECTION

To mine the hidden patterns of malware, we study the behaviors of malicious and benign Android apps in the real world. In this section, we briefly describe two sets of Android apps and their characteristics.

Benign apps: To collect benign apps, we used Google Play [11], a leading Android app marketplace. BenignRan, the first dataset, consists of 28 787 apps randomly collected from May to December 2015. The category distribution of BenignRan is shown in Fig. 2. The right part in light gray shows the categories belonging to the Games category. Except for the Tools category, containing 3787 apps, all categories include a similar number of apps. The randomly collected apps may have bias in categories and API usage. To ease such
Fig. 3. Top 25 most used permissions in benign and malicious app dataset.
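Fig. 3 counts the permissions declared by apps in each dataset; the Perm baseline of Section VI-D uses the same information. Declared permissions can be read from the decoded AndroidManifest.xml with Python's standard XML parser. A minimal sketch (the manifest below is a made-up example, not from the paper's dataset):

```python
import xml.etree.ElementTree as ET

# Namespace used for attributes such as android:name in the manifest.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical AndroidManifest.xml content, as recovered by a decompiler:
manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <uses-permission android:name="android.permission.READ_CONTACTS"/>
</manifest>"""

def declared_permissions(xml_text):
    """Collect the permissions an app declares in its manifest."""
    root = ET.fromstring(xml_text)
    return [elem.get(ANDROID_NS + "name")
            for elem in root.iter("uses-permission")]
```

Tallying the output of `declared_permissions` over a whole dataset yields the per-permission frequencies that a Fig. 3-style plot summarizes.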
Fig. 5. Top 25 most used APIs in benign and malicious app dataset.
Fig. 6. Difference of API usage between malicious and benign apps, where
the percentage of the difference is larger than 20%.
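Fig. 6 shows APIs whose usage differs between the two datasets by more than 20%. Such a difference can be computed as the gap, in percentage points, between the fractions of apps in each dataset that call a given API. A sketch over made-up per-app API sets (not the paper's data):

```python
def usage_difference(api, malicious_apps, benign_apps):
    """Percentage-point gap between the shares of malicious and benign
    apps that invoke `api` at least once."""
    frac_mal = sum(api in a for a in malicious_apps) / len(malicious_apps)
    frac_ben = sum(api in a for a in benign_apps) / len(benign_apps)
    return 100.0 * (frac_mal - frac_ben)

# Hypothetical sets of APIs called by each app:
mal = [{"sendTextMessage"}, {"sendTextMessage", "getDeviceId"}, {"getDeviceId"}]
ben = [{"getActiveNetworkInfo"}, {"getDeviceId"}, set()]

# APIs whose absolute difference exceeds 20% would appear in a Fig. 6-style plot:
flagged = sorted(api for api in {"sendTextMessage", "getDeviceId", "getActiveNetworkInfo"}
                 if abs(usage_difference(api, mal, ben)) > 20.0)
```

A positive difference marks an API that malicious apps favor; a negative one marks an API more common among benign apps.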
TABLE I
API-PERMISSION MAPPINGS
TABLE II
USE OF PERMISSIONS AND APIS BY MALICIOUS AND BENIGN APPS:
MANN–WHITNEY TEST (p-VALUE)
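Table II reports Mann–Whitney tests comparing permission and API use between malicious and benign apps. The statistic can be sketched in plain Python from its pair-counting definition; the counts below are made up for illustration, not the paper's data, and in practice `scipy.stats.mannwhitneyu` would be used instead:

```python
import math

def mann_whitney_u(xs, ys):
    """U statistic: #{(x, y): x > y} + 0.5 * #{(x, y): x == y}."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in xs for y in ys)

def mann_whitney_z(xs, ys):
    """Normal approximation of U under the null hypothesis that both
    samples come from the same distribution (no tie correction)."""
    n1, n2 = len(xs), len(ys)
    u = mann_whitney_u(xs, ys)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mean) / sd

# Hypothetical per-app counts of permission-related API calls:
malicious_counts = [12, 15, 9, 14, 11, 13, 16, 10]
benign_counts = [4, 6, 5, 3, 7, 5, 4, 6]

z = mann_whitney_z(malicious_counts, benign_counts)  # |z| > 1.96 => p < 0.05
```

A large |z| (small p-value), as in the paper's Table II, indicates that malicious and benign apps differ significantly in their permission or API use.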
A. Feature Extraction

Algorithm 1: Classifier Training.
Require:
  A: App Set
  F: Feature Set
  L: Label Set
  k: Number of Decision Trees
Ensure:
  C: Random Forests Classifier
1: function TRAIN(A, F, L, k)
2:   for i ← 0, k do
3:     A_i ← Randomly selected N apps
4:     L_i ← Labels of A_i
5:     for each node n in decision tree T_i do
6:       F_i ← Randomly selected m features
7:       f = arg max Gini(A_i, L_i, F_i)
8:       Generate n using feature f
9:     end for
10:  end for
11:  C = ∪_{i=0}^{k} T_i
12:  return C
13: end function

Data samples are the basis of training a model. As described in Section III, we crawled two sets of Android apps, malicious and benign, comprising the datasets of our training process. There are 31 185 apps in the benign app dataset and 15 336 malware samples in the malicious app dataset. With all these malicious and benign apps, we can extract features from the source code of the decompiled files. The installation package of an Android app is the .apk file, which can be disassembled by the well-known decompiling tool Apktool [38]. It recovers the main files, organizing the source code in a particular way in the smali folder, and the methods implemented in the source code have the following format after being decompiled:

android/net/ConnectivityManager;
->getActiveNetworkInfo()

The first part, android/net/ConnectivityManager, presents the package of the invoked method, and the second part is the target method, getActiveNetworkInfo(), used in the app. Based on this, we can traverse all the decompiled source code to extract the APIs employed by the target app, which form the initial feature set. For all the apps in both the malicious and benign app datasets, the numbers of permission-related APIs are extracted as the features to train the classifier. For each app, the feature vector consists of 32 304 items, where each item represents the number of call sites of the corresponding API in the target app. The features we extract for classification can resist code obfuscation that does not obscure API calls of the Android operating system. The manual of an official Android obfuscation tool, ProGuard

[43]. Specifically, the subject function is defined as follows:

Gini(A, f) = (|A_0|/|A|) (1 − Σ_{k=0,1} (|C_k|/|A_0|)^2) + (|A_1|/|A|) (1 − Σ_{k=0,1} (|C_k|/|A_1|)^2)    (3)
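A minimal sketch of the split criterion in (3) and its use in line 7 of Algorithm 1, in plain Python. We read A_0 and A_1 as the apps without and with the candidate feature, C_k as the class-k apps within each subset, and labels 0 = benign, 1 = malicious; since (3) is an impurity, the best split minimizes it. This is our reading of the truncated definitions, not the authors' implementation:

```python
def gini_split(labels_a0, labels_a1):
    """Weighted Gini impurity of a candidate split, per Eq. (3).

    labels_a0, labels_a1: class labels (0 = benign, 1 = malicious) of the
    apps falling into the two subsets A_0 and A_1 induced by feature f.
    """
    def impurity(labels):
        if not labels:
            return 0.0
        return 1.0 - sum((labels.count(c) / len(labels)) ** 2 for c in (0, 1))

    n = len(labels_a0) + len(labels_a1)
    return (len(labels_a0) / n) * impurity(labels_a0) \
         + (len(labels_a1) / n) * impurity(labels_a1)

def best_feature(apps, labels, features):
    """Line 7 of Algorithm 1: choose the feature whose split is purest.

    apps: list of {api_name: call_site_count} dicts; the split puts apps
    that never call the candidate API into A_0 and the rest into A_1.
    """
    def score(f):
        a0 = [y for x, y in zip(apps, labels) if x.get(f, 0) == 0]
        a1 = [y for x, y in zip(apps, labels) if x.get(f, 0) > 0]
        return gini_split(a0, a1)
    return min(features, key=score)

def majority_vote(tree_outputs):
    """Lines 10-11 and the voting step: an app's type is the majority
    output of the k trained decision trees."""
    return max(set(tree_outputs), key=tree_outputs.count)
```

For instance, a perfectly separating feature yields `gini_split([0, 0], [1, 1]) == 0.0`. In practice, a library implementation such as scikit-learn's `RandomForestClassifier` (with `criterion="gini"`) provides the same machinery.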
3) Step 3 (lines 10–11): Construct k decision trees from Steps 1–2; the result is decided by a voting approach, i.e., the type of each app is decided by the majority of the outputs of these decision trees.

After the training process, the parameters of the random forests at each node of each decision tree are set, and the forest has the capability of classifying apps. Therefore, in the testing process, each app with a feature vector can be assigned to a certain type by the trained malware classifier. Our full dataset is split into two parts, for extracting highly sensitive APIs and for training the final classifier, respectively. The full set of APIs is employed as the features in the training process on the first dataset part. After this first process, we are able to extract highly sensitive APIs based on the importance of APIs, i.e., the weights of APIs learned from the first training process. The APIs with large weights are then adopted to retrain the classifier, which is subsequently used to detect malware.

The detection process is based on the trained classifier. When a new app comes, it is decompiled by Apktool [38]. As described above, the source code of the app is stored in the smali folder. We extract all the permission-related APIs, including the number of call sites for each API, as the initial features. Based on the app database, highly sensitive APIs can be mined from thousands of permission-related APIs by training the random forests classifier. Therefore, only these highly sensitive APIs are utilized as the final features of the newly coming app. These features are then employed as the input to obtain the app type.

VI. EVALUATION

In this section, we present the evaluation metrics adopted in our experiments. With the datasets of benign and malicious apps described in Section III, comprehensive experiments are conducted on our malware detection system, MalPat. The comparison with the state-of-the-art approaches is also illustrated in this section. Details of the experiments are given in the following.

A. Evaluation Metrics

To evaluate the performance of malware detection, we use the precision and recall metrics. As malicious apps are positive samples and benign apps are negative samples in our evaluation, we first present three types of values:
1) tp (true positive): the number of malicious apps that are correctly identified as malicious apps.
2) fp (false positive): the number of benign apps that are incorrectly identified as malicious apps.
3) fn (false negative): the number of malicious apps that are incorrectly identified as benign apps.
Therefore, the metrics precision and recall can be calculated as follows:

precision = tp / (tp + fp)    (4)

recall = tp / (tp + fn)    (5)

F1 = 2 · (precision · recall) / (precision + recall)    (6)

Equation (4) denotes how many of the apps identified as malicious by the detection system are truly malicious. The value of precision lies in the interval [0, 1], and a large value indicates the correctness of the malware detection system. Equation (5) denotes how many of the truly malicious apps are correctly identified. The value of recall also lies in the interval [0, 1]. Equation (6) is the F1 score, which is the harmonic mean of precision and recall; its value also lies in the interval [0, 1]. In order to fight against malware, we focus on the correctness of the identification of malicious apps instead of benign apps. Therefore, the precision, recall, and F1 score of malware are adopted as evaluation metrics in the experiments.

B. Experimental Setup

In the experiments, MUDFLOW [5], DREBIN [6], and DroidAPIMiner [9] are employed as the state-of-the-art approaches to compare with our MalPat. For the comparison with these methods, we select the intersection of the dataset used in article [5] and our dataset, which contains 2398 benign apps and 13 840 malware samples. Therefore, our remaining dataset consists of 28 787 benign apps and 1496 malicious apps, and it has no intersection with the dataset adopted to compare with MUDFLOW, DREBIN, and DroidAPIMiner. This remaining dataset is employed in MalPat to extract highly sensitive APIs. Except for the experiments comparing with MUDFLOW, DREBIN, and DroidAPIMiner, the datasets used in the later experiments comparing with the baseline methods are our full malicious and benign datasets, consisting of 15 336 malicious apps and 31 185 benign apps. Besides, in extracting highly sensitive APIs, the dataset is split into two parts with a 1:1 partition, as illustrated in Section V. For all the experiments, we randomly select from 50% to 90% of both the malicious and benign datasets as the training set, and the remaining part is regarded as the test set. We repeat each experiment ten times and average the results. In addition, the number of decision trees trained in the random forests classifier is 200 and remains the same. In Section VI-D, we also study the impact of highly sensitive APIs, i.e., how the number of APIs used to retrain the classifier influences the final result. The partition of training and test sets in the study of the impact of highly sensitive APIs is 9:1. Details of the experiments and discussions are presented in the following sections.

C. Comparison With State-of-the-Art Methods

To demonstrate the effectiveness and efficiency of our automated malware detection system, MalPat, we compare it with existing state-of-the-art approaches. We test two versions of our MalPat system: one using the full set of 32 304 APIs (MalPat) and one employing only the top 50 highly sensitive APIs (MalPat50).

The first method we compare with MalPat is MUDFLOW [5]. We downloaded the source code from the website of MUDFLOW [44] and reran the scripts with their optimal settings on our intersection dataset. The experimental results are shown in Fig. 13. It is obvious that MUDFLOW cannot compete with MalPat under any of the measures. In particular, MalPat50 outperforms MUDFLOW with 3% precision value, 2% recall
Fig. 13. Comparison with state-of-the-art methods. (a) Precision results. (b) Recall results. (c) F1 score results.
rate, and 2% F1 score. The classifier employed in MUDFLOW is a support vector machine (SVM) [45], and the features of apps are extracted according to a manually selected list. MalPat, on the other hand, takes full advantage of the random forests classifier by engaging it during feature set construction, which manages to capture more information about the disparate behaviors of benign and malicious apps.

The second state-of-the-art method we compare with is DREBIN [6], which used APIs as well as other types of features to detect malware. There are two sources of features utilized in DREBIN: the first source of feature sets is the manifest, and the other is the disassembled code. In particular, the feature sets from the disassembled code include API features similar to those we use in MalPat from PScout [26]. From Fig. 13(a), it can be observed that DREBIN's precision results are between those of MalPat and MalPat50, which means DREBIN has a capability of correctly identifying malicious apps similar to MalPat's. Its recall rates in Fig. 13(b) are all lower than those of both versions of MalPat, and the largest difference can be more than 0.02. The recall rate demonstrates the capability of detecting malware among millions of apps; apparently, DREBIN is worse than MalPat in this respect. The reason can be the same as for MUDFLOW: DREBIN uses manually selected features. Still, according to the fair results of DREBIN, which are better than MUDFLOW's, the features employed in DREBIN do show their effectiveness in identifying malware. Our assumption is that the API features employed in DREBIN, which are similar to the ones used in our MalPat, are the key to detecting malware. Thus, we modify the original DREBIN by reducing its feature sets to only the one that contains restricted API calls. We test this method on a 90% training set for 10 times, and the averaged results are a 0.950910 precision value and a 0.972977 recall rate. These results are almost the same as the ones generated by MUDFLOW. This is unexpected but also reasonable, because our MalPat also uses similar API features and achieves the best results.

The last state-of-the-art approach is DroidAPIMiner [9], which also employs Android system calls as its major features. Besides, DroidAPIMiner adds APIs with similar support whose parameters are more frequent in the malware set as well, but the latter part does not make a difference in the final results. Therefore, we extract the APIs that have a usage difference of more than 6% between malicious and benign apps. These features are then employed in DroidAPIMiner to identify malware, and the results are shown in Fig. 13. Clearly, DroidAPIMiner cannot compete with the other approaches, including MalPat; in particular, all its recall rates and F1 scores are worse than those of any other method. We carried out the experiment on DREBIN with restricted API calls in the previous paragraph and observed a surprising result. Comparing with the results of DroidAPIMiner, it can be concluded that restricted API calls, or permission-related APIs, do have a better capability to distinguish malware among millions of apps. This supports our statement in Section I that APIs that are not permission-related may introduce noise affecting feature extraction.

The comparison with existing state-of-the-art approaches demonstrates the effectiveness of MalPat in identifying malware. The highest recall rate achieved by MalPat is 0.9963, which means that only 51 out of 13 840 malicious apps are not detected. In the following section, we study how different features of apps can affect the results of MalPat on our full dataset.

D. Comparison With Baseline

To study the influence of different features of apps on MalPat, we experiment on different sets of API features as in the previous section, where MalPat includes the full 32 304 APIs and MalPat50 contains the top 50 highly sensitive APIs. Besides, the baseline method we compare with is the one using the permissions declared in the Manifest file of the Android platform (Perm). Permission-based methods can avoid high time and computation costs but can only capture the coarse-grained features of Android apps, because there are thousands of Android APIs governed by each permission. Hence, using permissions as the features of apps can miss the full view of app behaviors due to the lack of in-depth information. To demonstrate the weakness of using permissions as features, we conduct the experiments on MalPat with APIs and with permissions, respectively.

As illustrated above, MalPat surpasses the state-of-the-art approaches, MUDFLOW, DREBIN, and DroidAPIMiner, on malware detection, so the experiments on MalPat can directly show the effectiveness of using APIs instead of permissions. Fig. 14 shows the results of MalPat and the Perm method. According to
Fig. 14. Comparison with baseline methods on the full datasets. (a) Precision results. (b) Recall results. (c) F1 score results.
the results shown in Fig. 14, it is obvious that MalPat with APIs as features outperforms the one using permissions under all percentages of the training set. The precision results of MalPat, shown in Fig. 14(a), are all between 0.90 and 0.94, and there is no big change as the training set increases. The difference between MalPat and Perm is near 0.1 when 50% of the apps are used for training. One observation is that the precision results of MalPat are better than those of MalPat50, but MalPat50 surpasses MalPat in recall. It is interesting that fewer APIs employed as features can improve the recall rate, which means that highly sensitive APIs actually represent the malicious behaviors of malware and have the capability to identify malware. As observed, the permissions of Android apps lack the ability to capture fine-grained features, which can be addressed by adopting the APIs governed by them. Permission-related APIs not only capture more detailed behaviors of apps but also retain features similar to those of permissions.

E. Impact of Highly Sensitive APIs

In our model training process, we aim to extract highly sensitive APIs so as to detect malware with as few features as possible. As discussed in Section IV-B-1, different APIs have different importance in capturing malicious behaviors and identifying malware. Therefore, to study the impact of different numbers of APIs as features, we conduct experiments on MalPat with the number of APIs ranging from 10 to 200. As illustrated in Section VI-B, we split our full dataset into two parts with a 1:1 partition. The first part is used to train the classifier to extract the importance of the different APIs. On the second part, we utilize the importance of all the APIs obtained on the first part to retrain MalPat. Training on different apps reduces the possibility that MalPat has prior knowledge of the features of benign and malicious apps.

Fig. 15. Impact of highly sensitive APIs on the efficiency of MalPat.

The results of MalPat with different numbers of APIs are shown in Fig. 15. According to the figure, we can see that the largest difference between the maximum and minimum values of precision is larger than 0.025, which means that the number of APIs used as features influences the precision of MalPat. MalPat with 50 APIs as features peaks at a precision of 0.9172, and even with only 20 APIs it reaches a precision of 0.9024. As for recall, the top result is achieved with 20 APIs, and the change in the recall rates of MalPat as the number of APIs increases is larger than that of the precision results.

Moreover, there is one special case that should be noted. The recall rate reaches its largest value with 20 APIs and then drops from 0.8918 to 0.8696 as the number of APIs increases from 20 to 200. Although there is a small increase when the number of APIs grows from 100 to 200, from the blue line we can observe that MalPat with the full set of APIs as features has a lower recall rate than the one with 50 APIs. The numbers of malicious and benign apps are both similar to the number of all the APIs. Therefore, if all the APIs are adopted as the features to train the classifier, it may overfit the training set and misclassify malicious apps. On the other hand, too few features can also hurt the classification results. According to the F1 score results in Fig. 15, MalPat with 50 APIs achieves the best results, which means that this number of APIs is able to distinguish malicious and benign apps with high precision and recall. Moreover, based on the 50 highly sensitive APIs, MalPat can be trained within 1 min on the whole dataset.

VII. CONCLUSION

To fight against malware, we study malicious and benign Android apps in the real world to mine hidden patterns of malware. Previous research work mainly focused on permissions, sensitive resources, intents, etc., and very few efforts have been made to address the malware detection problem from the API perspective. To fill this gap, we analyze the behaviors of malicious apps in terms of API usage compared with benign apps. Utilizing fine-grained features, we are able to mine the patterns
of malware and extract highly sensitive APIs by training the random forest classifier. To assist Android app marketplaces, we propose an automated malware detection system, MalPat. Comprehensive experiments are conducted on a large-scale dataset we collected from the Internet, consisting of 31 185 benign apps and 15 336 malicious apps. Compared with the state-of-the-art approaches MUDFLOW, DREBIN, and DroidAPIMiner, MalPat outperforms them in both precision and recall. Based on the small feature set of 50 highly sensitive APIs, MalPat achieves an F1 score of 98.24%. The experimental results show the effectiveness and efficiency of MalPat, and the highly sensitive APIs mined from malicious and benign apps reflect patterns of malware that can be used to identify malware.
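The approach summarized above (binary API-usage features, a random forest classifier, and a small set of highly sensitive APIs selected by feature importance) can be sketched as follows. This is an illustrative sketch on synthetic data with placeholder API names, not the authors' implementation; scikit-learn is assumed, and the real system extracts API calls from APK files.

```python
# Sketch: rank permission-related APIs by random forest feature importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder API names for illustration only.
API_NAMES = [
    "TelephonyManager.getDeviceId",
    "SmsManager.sendTextMessage",
    "LocationManager.getLastKnownLocation",
    "ActivityManager.getRunningTasks",
]

# Each app is a binary vector: 1 if the app calls the API, 0 otherwise.
# In this synthetic set, malware (label 1) always calls the first two APIs,
# while the last two APIs are used at random by both classes.
X_mal = np.hstack([np.ones((50, 2)), rng.integers(0, 2, size=(50, 2))])
X_ben = np.hstack([np.zeros((50, 2)), rng.integers(0, 2, size=(50, 2))])
X = np.vstack([X_mal, X_ben])
y = np.array([1] * 50 + [0] * 50)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Keep the top-k APIs by importance as the "highly sensitive" feature set.
k = 2
top_k = np.argsort(clf.feature_importances_)[::-1][:k]
sensitive_apis = [API_NAMES[i] for i in top_k]
print(sensitive_apis)
```

Varying k in a sketch like this mirrors the experiment above on the number of APIs used as features.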
Despite the efficiency of MalPat, a great deal of work remains to improve pattern mining and malware detection. Therefore, we will focus on the following topics in future work. First, a larger scale dataset of apps needs to be collected to avoid the overfitting problem, which leads to the second topic: due to the limited number of existing malware samples collected on the Internet, mining patterns of malicious apps is hard to scale up. To address this problem, we will mainly focus on benign apps, studying their behavior patterns to exclude malware. Third, MalPat only considers the difference between malicious and benign apps but neglects the categories of benign apps, which may affect the identification of benign apps due to their category features; we plan to overcome this by taking the categories of benign apps into consideration in the detection of malware. In future work, we will improve the capability of MalPat and assist Android app marketplaces to fight against malware efficiently.
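For reference, the F1 score reported in the evaluation is the harmonic mean of precision and recall; a minimal helper implementing the standard definition (not the authors' code):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# When precision and recall are equal, F1 equals that common value.
print(round(f1_score(0.9824, 0.9824), 4))
```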
ACKNOWLEDGMENT

The authors would like to thank L. Huang for his assistance in building the MalPat system.
TAO et al.: MALPAT: MINING PATTERNS OF MALICIOUS AND BENIGN ANDROID APPS VIA PERMISSION-RELATED APIS 369
Guanhong Tao received the B.Eng. degree in computer science and technology from Zhejiang University, Hangzhou, China, in 2014. He is currently working toward the Master's degree at the School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China. His research interests include mobile computing, program analysis, and software security.

Zibin Zheng (SM'16) received the Ph.D. degree in computer science and engineering from the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong, in 2010. He is currently an Associate Professor with the School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China. His research interests include services computing, software engineering, and blockchain. Dr. Zheng received the Outstanding Thesis Award of CUHK in 2012, the ACM SIGSOFT Distinguished Paper Award at ICSE 2010, and the Best Student Paper Award at ICWS 2010.

Ziying Guo is currently working toward the undergraduate degree at the School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China. Her research interests include mobile computing and data mining.

Michael R. Lyu (F'04) received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1981; the M.S. degree in computer engineering from the University of California, Santa Barbara, CA, USA, in 1985; and the Ph.D. degree in computer science from the University of California, Los Angeles, CA, USA, in 1988. He is currently a Professor with the Department of Computer Science and Engineering, The Chinese University of Hong Kong. He is also the Director of the Video over Internet and Wireless (VIEW) Technologies Laboratory. His research interests include software reliability engineering, distributed systems, fault-tolerant computing, mobile networks, web technologies, multimedia information processing, and e-commerce systems. Dr. Lyu is a fellow of the ACM and AAAS and a Croucher Senior Research Fellow.