
IEEE TRANSACTIONS ON RELIABILITY, VOL. 67, NO. 1, MARCH 2018

MalPat: Mining Patterns of Malicious and Benign Android Apps via Permission-Related APIs

Guanhong Tao, Zibin Zheng, Senior Member, IEEE, Ziying Guo, and Michael R. Lyu, Fellow, IEEE

Abstract—The dramatic rise of Android application (app) marketplaces has significantly added to the convenience of mobile users. Consequently, with the advantage of numerous Android apps, Android malware seizes the opportunity to steal privacy-sensitive data by pretending to provide the same functionalities as benign apps do. To distinguish malware from millions of Android apps, researchers have proposed sophisticated static and dynamic analysis tools to automatically detect and classify malicious apps. Most of these tools, however, rely on manual configuration of lists of features based on permissions, sensitive resources, intents, etc., which are difficult to come by. To address this problem, we study real-world Android apps to mine hidden patterns of malware and are able to extract highly sensitive APIs that are widely used in Android malware. We also implement an automated malware detection system, MalPat, to fight against malware and assist Android app marketplaces in addressing unknown malicious apps. Comprehensive experiments are conducted on our dataset consisting of 31 185 benign apps and 15 336 malware samples. Experimental results show that MalPat is capable of detecting malware with a high F1 score (98.24%) compared with the state-of-the-art approaches.

Index Terms—Android applications, malware detection, permission-related APIs, random forests, software security.

Manuscript received September 9, 2016; revised April 22, 2017 and October 9, 2017; accepted November 24, 2017. Date of publication December 20, 2017; date of current version March 1, 2018. This work was supported in part by the National Basic Research Program of China (973 Project 2014CB347701), in part by the National Natural Science Foundation of China under Grant 61722214 and Grant 61472338, in part by the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (2016ZT06D211), and in part by the Pearl River S&T Nova Program of Guangzhou (201710010046). Associate Editor: Y. Le Traon. (Corresponding author: Zibin Zheng.)

G. Tao, Z. Zheng, and Z. Guo are with the School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China, and also with the Collaborative Innovation Center of High Performance Computing, National University of Defense Technology, Changsha 410073, China (e-mail: [email protected]; [email protected]; [email protected]).

M. R. Lyu is with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TR.2017.2778147

0018-9529 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

THE past few years have witnessed the drastic increase of mobile apps providing various facilities for personal and business use. The proliferation of mobile apps is due to billions of users who enable developers to earn revenue through advertisements, in-app purchases, etc. With a multitude of apps developed by many independent developers, including unfriendly ones, it can be hard for users to determine the trustworthiness of these apps. Whenever users install a new app, they are under the risk of installing malware. Unlike desktop apps, mobile apps can have the privilege, once declared (e.g., in the Manifest file of the Android platform), to access sensitive information such as contact lists, SMS messages, GPS, etc. To make full use of the resources of mobile devices and support the abundant functionalities of mobile apps, such a mechanism of permission declaration remarkably fulfills its job. Unexpectedly, however, it also provides the opportunity for malware to hijack and steal sensitive information.

According to the report released by McAfee [1] in March 2016, more than 13 million mobile malware samples were collected by 2015, and it recorded a 72% increase in new mobile malware samples as compared to the last quarter. Data leaks are the typical behaviors of malware to steal users' contact lists, personal information, and even money. As more people change their payment habits from cash to mobile banking, it has become a serious challenge to protect mobile users from spiteful attackers who may steal bank account credentials. Rasthofer et al. [2] identified a new malware family, Android/BadAccents, that stole, within two months, the credentials of more than 20 000 bank accounts of users residing in Korea. There are some underground groups whose revenues are earned by trading stolen bank account credentials. Symantec reported [3] that one underground group had made $4.3 million in purchases using stolen credit cards over a two-year period.

Under such a severe situation for the security and privacy of mobile users, identifying malicious apps and defending against them stealing sensitive information have posed an especially important challenge. Rasthofer et al. [4] proposed a machine-learning approach, SUSI, to identify lists of sources of sensitive data (e.g., user location) and sinks of potential channels to leak such data to an adversary (e.g., network connection). The categories published by SUSI, along with other new categories of sensitive sources, were adopted by MUDFLOW [5] to identify malware as well as their abnormal usage of sensitive data. Their work both focused on the data flows that originate from sensitive sources, which were effective and efficient in identifying abnormal behaviors of malware; but when it comes to classifying malicious apps, these approaches can be inefficient and hard to use to assist Android app marketplaces in fighting against malware. Permissions declared in the Manifest file easily capture the intention of apps for data usage, which can be utilized to identify malicious behaviors of apps [6]–[8]. Permission-based methods avoid the high cost of time and computation, but they only capture coarse-grained features of apps. Application programming interfaces (APIs) provided by the Android operating system, in contrast, profoundly demonstrate the full

picture of app behaviors in data usage. DroidAPIMiner [9] conducted an analysis to extract API-level features by statically counting the numbers of APIs used in malware and benign apps to capture their different usage. However, the frequency analysis adopted by DroidAPIMiner may miss some key features. Furthermore, APIs that are not permission related can be of less value, and may introduce noise affecting feature extraction. In addition, DroidAPIMiner also utilized data flow analysis to obtain the frequent parameters, which exposes the same issue as SUSI [4] and MUDFLOW [5] did. DREBIN [6] also adopted APIs as features but together with many other types of features, such as hardware components, permissions, intents, etc. These features were all manually selected and required professional background knowledge. Besides, the final feature set contained about 545 000 features, which was a very large number of features and even more than the number of samples (123 453 benign apps and 5560 malware samples). It could require more time and effort to extract these features as well as to train machine-learning models on them. Instead of using raw features of APIs, Zhang et al. [10] constructed a weighted dependency graph of APIs to represent the features of apps. Their approach was based on data flow analysis and the graph generation process was complex, which could be inefficient to assist app marketplaces. Therefore, we employ a machine learning approach to automatically extract permission-related APIs that can be utilized to distinguish malware from millions of apps.

Fig. 1. Detecting malware via permission-related APIs. Starting from a collection of malicious and benign apps, we mine the patterns of malware. Based on extracted highly sensitive APIs, we train a two-class classifier to identify unknown apps.

In this paper, we study malicious and benign Android apps to mine hidden patterns of malware from a fine-grained perspective. Android's main security mechanisms are based on sandboxes, which control physical and virtual resources of mobile devices to restrict the privilege of mobile apps, and the permission mechanism is one of them. As permissions represent the sensitive usage of Android resources, we focus on permission-related APIs currently employed in various Android apps. Differing from permission-based methods, our study captures fine-grained features extracted from APIs that contain more information about apps' behaviors. In addition, unlike API-level approaches, we neglect those APIs unprotected by permissions as they may introduce noise into the classification process and increase the computation cost. Our study includes three parts, illustrated in Fig. 1 and detailed later in the paper. First, we collect real-world Android apps consisting of 31 185 benign apps and 15 336 malware samples from the Internet. Second, based on the analysis of our collected datasets, we extract hidden patterns of malware with respect to benign apps in terms of Android APIs.1 Finally, with the extracted features, malicious apps can be effectively detected and classified by malware detection systems. Consequently, we present MalPat, an automated malware detection system, which mines malware patterns from known malware samples automatically, and identifies malware efficiently.

The contributions of this paper are summarized in the following:
1) We present a thorough study on the different usage of permissions and APIs using statistical analysis techniques, and reveal their capabilities of differentiating malware and benign apps.
2) We mine the hidden patterns of malware by extracting highly sensitive APIs protected by permissions and analyzing the correlation of different APIs.
3) We make full use of the characteristics of the machine learning approach by engaging it much earlier in the system architecture to learn the potential features from existing data.
4) We propose MalPat,2 an automated malware detection system, which extracts crucial features from tens of thousands of apps automatically, to assist Android app marketplaces to fight against malware.
5) We conduct comprehensive experiments on a large-scale dataset consisting of 31 185 benign apps crawled from Google Play [11] and 15 336 malware samples collected from VirusShare [12] and Contagio [13]. The experimental results show that MalPat can detect malware with an F1 score of 0.9824 using only 50 APIs, and outperforms the state-of-the-art approaches.

The rest of this paper is organized as follows. Section II discusses the related work addressing privacy and security problems on the Android platform. Section III describes the dataset that we have collected. In Section IV, we statistically analyze the usage of permissions and APIs, mine the hidden patterns of malware, and discuss highly sensitive APIs. We then present our malware detection system, MalPat, in Section V. Comprehensive experiments on the real-world dataset are conducted in Section VI. Finally, the conclusion and future work are described in Section VII.

1 In the rest of this paper, API and Android API both refer to the permission-related APIs on the Android platform.
2 http://malpat.inpluslab.com

II. RELATED WORK

Malware detection and classification are challenging problems, especially on mobile platforms. Researchers have paid great efforts to address these problems in various ways. In this section, we discuss the previous work addressing malware problems.

A. Dynamic Analysis

In-depth analysis of malware and app behaviors is carried out by many researchers from a dynamic perspective. TaintDroid [14] identified sensitive information at a taint source, and tracked, dynamically, the impact of labeled data on other data that might

leak the original sensitive information. The impacted data were identified before they left the system at a taint sink. Compared to TaintDroid, more information leaks were found by VetDroid [15]. It constructed permission use behaviors to examine the internal sensitive behaviors and find information leaks. VetDroid can help identify subtle vulnerabilities in some apps. To capture both the OS-level and Java-level semantics simultaneously and seamlessly, DroidScope [16] collected detailed native and Dalvik instruction traces to track information leakage through both Java and native components. These dynamic methods all aim to conduct taint analysis to detect suspicious behaviors during runtime. The dramatically increasing number of Android apps, however, restricts dynamic methods from high-speed detection. Besides, as we aim to detect malware instead of capturing stealthy behaviors, dynamic analysis can be inefficient for assisting mobile app marketplaces in fighting against malware.

B. Static Analysis

Differing from dynamic analysis, static approaches can effectively deal with millions of apps and analyze patterns of malicious and benign behaviors without online running. Previous studies focused on malware detection and classification were conducted via various methods, such as static taint analysis [17], information flow analysis [18], [19], and probabilistic models [7], [20], etc. DroidRanger [8] utilized a permission-based behavioral footprinting scheme to detect new samples of known Android malware families, and a heuristics-based filtering scheme was adopted to identify certain inherent behaviors of unknown malicious families. The permission-based method is coarse grained and cannot capture more detailed behaviors of apps. We analyze and elaborate more details in the rest of this paper. To capture fine-grained information, Apposcopy [17] proposed a high-level language to capture the signatures describing semantic characteristics of malware families. Based on the extracted signatures, a static analysis was conducted to detect certain malware families. The main concern of Apposcopy is to identify certain families of malware, which is different from the purpose of our work. Based on static data-flow analysis, FlowDroid [21], ded [22], and CHEX [23] were able to precisely detect potential malicious behaviors of Android apps. Such techniques can effectively capture the exactly stealthy and malicious behaviors of apps, but they can introduce a very high overhead and be inefficient in fighting against malware.

C. Machine Learning Approach

Classifying malware automatically is an open problem commonly addressed by employing machine learning techniques. Permissions are designed to protect sensitive resources on the Android platform, which directly demonstrates the sensitive behaviors of Android apps. Bartel et al. [24] found that a large part of apps suffer from permission gaps, i.e., not all the permissions they declared were actually used. By analyzing permission usage among millions of benign and malicious apps, they can effectively expose abnormal behaviors and finally distinguish malware from various apps. Peng et al. [7] utilized the advantage of permissions and were able to correctly classify malware. However, the permission declaration of Android apps is coarse grained, lacking the ability to capture in-depth behaviors of malicious apps. To this point, Gorla et al. [25] selected a subset of APIs that were governed by Android permission settings, named sensitive APIs, which were identified by Felt et al. [26]. These sensitive APIs were used as binary features to train an OC-SVM to identify outlier apps. Similarly, DREBIN [6] also utilized APIs as features but with more other types of features (hardware components, permissions, intents, etc.). These features were all manually selected and required professional background knowledge. Besides, the final feature set contained about 545 000 features, which was a very large number of features and even more than the number of samples (123 453 benign apps and 5560 malware samples). It could require more time and effort to extract these features as well as to train machine-learning models on them. Utilizing the categorization of Android APIs published by SUSI [4], Avdiienko et al. [5] added three new categories to detect malware: sensitive resources, intents, nonsensitive sources, and sinks. With these categories as app features, Avdiienko et al. trained a ν-SVM [27] to detect malware (MUDFLOW). Differing from the features used in MUDFLOW, we only adopt the permission-related APIs used by apps, without any other additional information, to detect malware. Moreover, random forests are employed in our malware detection system as the classifier to identify malware. Random forest is a well-known machine learning approach for classification and regression, which consists of a set of binary decision trees [28]. By training multiple decision trees, random forests combine the results from these decision trees with a voting approach. Details of the training process of random forests are illustrated in Section V.

Previous studies focused on data flow can be inefficient for malware detection and classification, resulting in high costs of time and resources in fighting against malware. In contrast, sensitive or critical API-based approaches are easy to construct and can adjust rapidly according to the change of malware. However, very few efforts have been made in conducting a thorough analysis of permission-related APIs, which can significantly improve the efficiency of malware detection with easy and practical approaches. In this paper, we try to address this problem and give an empirical study of permission-related APIs.

III. DATASET COLLECTION

To mine the hidden patterns of malware, we study the behaviors of malicious and benign Android apps in the real world. In this section, we briefly describe two sets of Android apps and their characteristics.

Benign apps: To collect the benign apps, we used Google Play [11], a leading Android app marketplace in the world. BenignRan, the first dataset, consists of 28 787 apps randomly collected from May to December 2015. The category distribution of BenignRan is shown in Fig. 2. The right part in light gray are the categories belonging to the Games category. Except for the Tools category, containing 3787 apps, the other categories all include an almost similar amount of apps. The randomly collected apps may have bias on categories and API usage. To ease such

potential influence, we also downloaded the top 100 most popular free apps for each of the 30 app categories based on the same list Avdiienko et al. [5] had collected as of March 1st, 2014. Although some apps could not be found due to unknown reasons, we were still able to collect 2398 apps, which formed our second benign dataset, BenignPop. These two benign datasets are both employed in our experiments, comprising the whole benign dataset, BenignAll, with 31 185 Android apps, and the detailed discussions are presented in Section VI.

Fig. 2. Categories of Android apps collected from Google Play, where the green histograms are the categories belonging to the Games category.

Malicious apps: Our malicious apps were collected from two sources:
1) A set consisting of 24 317 malicious apps was obtained from VirusShare [12] covering the period from May 6th, 2013 to March 24th, 2014. As some of these apps could not be correctly decompiled or the .apk files were missing, 14 843 apps were left for further study and evaluation.
2) 493 Android malware samples provided by Contagio [13] were also adopted in our dataset. These malicious apps were collected from October 2010 to January 2016.
Hence, the malware dataset in total contains 15 336 malicious apps. It should be noted that there is no overlap between the benign and malicious app datasets.

IV. PATTERN MINING

In this section, we first make a rough analysis of the permission and API distributions of Android apps, respectively. In the API-permission mapping list extracted by PScout [29], there are 32 304 APIs governed by 71 permissions. As APIs differ across versions of the Android operating system, we adopt the API mapping list of Jelly Bean (Android 4.1) in the following analysis. The influence of API change, which has been studied by previous work [30], is out of the scope of this paper. By analyzing permissions and APIs used in malicious and benign apps, we can obtain a brief statistical result indicating the difference between these two types of apps.

In order to understand to what extent benign and malicious apps differ with respect to permissions and APIs, we study two research questions in the following:
1) RQ1: Is the usage of permissions in malware significantly different from that of benign apps? This research question aims at revealing the permission usage patterns of malware as compared to benign apps. The conjecture is that the usage of permissions would not significantly differentiate malware from benign apps. This conjecture is based on the previous work carried out by Felt et al. [26] pointing out that apps tend to request extra privileges (declare more permissions than they actually need). Thus, we test the following null hypothesis:
H0p: There is no significant difference between the permissions used by malicious and benign apps.
2) RQ2: Is the usage of APIs in malware significantly different from that of benign apps? This research question is similar to RQ1; however, it considers APIs instead of permissions as the main factor to analyze. It is intuitive that apps tend to request more permissions in case of further use of privileged resources (e.g., APIs), but may never actually invoke the corresponding APIs. Based on this reasonable fact, APIs could reveal the real behaviors of apps to some extent. Specifically, we test the null hypothesis:
H0a: There is no significant difference between the APIs used by malicious and benign apps.

In the following sections, a detailed analysis of permissions and APIs is conducted by comparing their use in the corresponding benign and malicious apps. Specifically, in Section IV-C, we use the Mann–Whitney test [31] to capture the statistical significance of the different usage of permissions and APIs in malware and benign apps.

As to mining the hidden patterns of malware compared to benign apps, we utilize machine learning approaches and successfully extract highly sensitive APIs.

A. Permission Distribution

An overview study of privacy and security is conducted on the benign and malicious app datasets, and Fig. 3 gives the result of the permission usage in these apps. The top 25 most frequently used permissions are illustrated in Fig. 3 and sorted by the order of usage percentage in the benign app dataset.

As shown in Fig. 3, some of the highly utilized permissions are adopted by both benign and malicious apps, which can hardly be distinguished between these two types. Nevertheless, there are a few permissions that are frequently used by malicious apps, such as ACCESS_WIFI_STATE, SEND_SMS, GET_TASKS, etc. WiFi state can be utilized to obtain the network information as well as the location context of users, which may be the reason why it is highly used. As to SEND_SMS, it is obvious that bank verification and other sensitive information are protected via messages. The last permission seems unreasonable to be widely adopted in malicious apps. However, if we take a look into the APIs protected by the permission GET_TASKS, we can find one of them, i.e., getRecentTasks(). As illustrated in the API documents [32], this method returns a list of the tasks that the user has recently launched, which can be used by malware for malicious purposes, such as monitoring user behaviors, attacking certain tasks, etc. This method is no longer available to third-party applications, but there may still remain

such methods that can be used for malicious purposes, which is studied in this paper.

Fig. 3. Top 25 most used permissions in the benign and malicious app datasets.

To obtain a deeper analysis of the difference in permission usage between malicious and benign apps, we compare the percentages of the two datasets. Fig. 4 shows the results where the percentage of the differences is larger than 10%, and the positive values represent that more malicious apps use the current permission than benign apps. There are 12 permissions denoting a big difference between malicious and benign apps. The differences of three permissions, READ_EXTERNAL_STORAGE, WRITE_SYNC_SETTINGS, and USE_CREDENTIALS, even surpass 20%, but these three permissions are used more by benign apps rather than malware, which is very different from the study conducted by Peng et al. [7]. The malware dataset used in their study contained only 378 apps, which may cause unpredictable deviations. Based on our datasets, three permissions expose a significant difference between malicious and benign apps, which can be utilized to distinguish their types. Such coarse-grained features, however, cannot effectively differentiate malicious and benign apps, which is discussed in Section VI.

Fig. 4. Difference of permission usage between malicious and benign apps, where the percentage of the differences is larger than 10%.

Therefore, regarding the research question RQ1, it can be observed that some permissions present the ability of distinguishing malware and benign apps, but it is still hard to determine whether permissions actually possess such a feature in the big picture. We try to answer this question in Section IV-C. In the following section, we study the difference between these two types of apps from a fine-grained perspective, i.e., API usage.

B. API Distribution

Fig. 5. Top 25 most used APIs in the benign and malicious app datasets.

Statistical patterns of API usage are studied, and the top 25 most frequently used APIs in both the benign and malicious app datasets are illustrated in Fig. 5. There are 31 APIs in the figure, where the whole mapping of API IDs ranging from 0 to 32303 is listed on our website [33]. We can

Fig. 6. Difference of API usage between malicious and benign apps, where
the percentage of the difference is larger than 20%.

Fig. 7. Mann–Whitney test (p-value) on Android permissions. The red line


observe that most of the top used APIs are not distinguish- represents the significant level (α = 0.05).
able as they are all widely employed in both benign and mali-
cious apps. However, API IDs, 14824, 15057, and 15225, are
more frequently used in malware comparing with benign apps. detection. The use of APIs do show a certain level of differ-
75.15% malicious apps use the method, getDeviceID(), from ence between malware and benign apps. However, similar to
android.telephony.TelephonyManager package. The method the analysis in the last section, we still cannot conclude the
getDeviceID() returns the unique device ID, e.g., the IMEI for significant difference regarding research question RQ2 without
GSM and the MEID or ESN for CDMA phones [34], which can more detailed and thorough analysis, which is carried out in the
be used to identify mobile devices as the tag to trace anything Section IV-C.
occurring on those devices. If such sensitive identifications are
utilized for malicious purposes, it can be dangerous and risky
to the privacy and security of mobile users. Therefore, meth- C. Permissions Versus APIs
ods that may lead to unsecure information leaks should be paid In this section, we use Mann–Whitney test [31] to analyze
close attention, especially the ones that are frequently employed statistical significance of the different usage between malicious
in malware rather than benign apps. and benign apps. The significance level to reject the null hypoth-
To find the big difference of API usage between two types esis is set as α = 0.05. As the API-permission mapping list [29]
of apps, we select those APIs whose differences are larger with 71 permissions and 32 304 APIs are employed in the study,
than 20% in Fig. 6. The methods whose API IDs are 9723, we apply tests on each permission and each API of malicious
14824, 15057, 15225, and 17626 are more frequently en- and benign apps, as well as the average usage of them in the
gaged in malicious apps. Specifically, the difference of the following three sections.
method 15057 is larger than 40%. We list these APIs with 1) Permission test: We separate the malicious and benign
large differences in Table I. The method with API ID 15057 is apps into two groups, and use the Mann–Whitney test to analyze
getSubscriberId() from the same package as getDeviceID(). Differing from getDeviceID(), the method getSubscriberId() can be much more dangerous, as it returns the unique subscriber ID, e.g., the IMSI for a GSM phone [34]. According to Wikipedia [35], the International Mobile Subscriber Identity (IMSI) is used in any mobile network that interconnects with other networks. In particular, for GSM, UMTS, and LTE networks, the IMSI number is provisioned in the SIM card, which means that it can be utilized to mark any SIM card, i.e., any mobile user. Hence, it is essential and important to mine the API usage patterns of malicious and benign apps. Table I also lists the permissions that govern those APIs. Some methods, such as API IDs 6004 and 9723, are protected by more than one permission. As the protection of APIs overlaps across permissions, features based on permissions can hardly reflect the patterns of apps clearly and correctly or detect malware. Therefore, we employ fine-grained features, i.e., permission-related APIs, to mine hidden patterns of malicious apps for efficient malware detection.

statistical significance on every permission. More specifically, for one permission, if an app (malicious or benign) declared this permission, then the sample value is set to 1; otherwise, it is set to 0. Therefore, in one test, two sets of samples represent one specific permission's usage in malicious and benign apps, respectively, and the range of each sample value is {0, 1}. Fig. 7 shows the p-values of all the tests on permissions, and the red line represents the cutoff level (α = 0.05). Since we apply multiple tests on permissions, we adjust these p-values using Holm's correction procedure [36]. The adjusting procedure sorts the p-values of the n tests in ascending order and then multiplies them by n, n − 1, . . . , 1, respectively. For instance, let p(1) ≤ p(2) ≤ · · · ≤ p(n) be the p-values of the n tests on permissions, and let 0 < α < 1 be the significance level. The assessment of the n hypotheses is performed as follows:

p(1) > α/n,  p(2) > α/(n − 1),  ...,  p(n) > α/1.    (1)
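The step-down adjustment described above can be sketched as follows. The helper `holm_adjust` is illustrative (not code from the paper); it multiplies the i-th smallest p-value by (n − i) remaining hypotheses and additionally enforces the usual monotonicity of adjusted p-values:

```python
def holm_adjust(pvalues):
    """Holm's step-down correction: sort ascending, multiply the i-th
    smallest p-value by the number of remaining hypotheses, and keep
    the adjusted values nondecreasing in rank order (capped at 1)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running = 0.0
    for rank, idx in enumerate(order):
        running = max(running, min(1.0, (n - rank) * pvalues[idx]))
        adjusted[idx] = running
    return adjusted

# Raw p-values from three hypothetical permission tests.
raw = [0.0100, 0.0400, 0.0300]
print([round(p, 6) for p in holm_adjust(raw)])  # [0.03, 0.06, 0.06]
```

A test is then counted as significant when its adjusted p-value stays below α, which is how the 31 significant permissions below are obtained.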
TAO et al.: MALPAT: MINING PATTERNS OF MALICIOUS AND BENIGN ANDROID APPS VIA PERMISSION-RELATED APIS 361
TABLE I
API-PERMISSION MAPPINGS

API ID | Package Name | Method Name | Permission
6004 | com.android.providers.media.MediaProvider | openFileAndEnforcePathPermissionsHelper() | READ_EXTERNAL_STORAGE, WRITE_EXTERNAL_STORAGE
7483 | com.android.browser.GoogleAccountLogin | <init>() | INTERNET
9723 | android.telephony.TelephonyManager | getCellLocation() | ACCESS_COARSE_LOCATION, ACCESS_FINE_LOCATION
9737 | android.accounts.AccountManagerService | getAuthToken() | USE_CREDENTIALS
10094 | android.accounts.AccountManagerService | invalidateAuthToken() | USE_CREDENTIALS, MANAGE_ACCOUNTS
11784 | android.content.ContentService | removePeriodicSync() | WRITE_SYNC_SETTINGS
14824 | android.telephony.TelephonyManager | getDeviceId() | READ_PHONE_STATE
15057 | android.telephony.TelephonyManager | getSubscriberId() | READ_PHONE_STATE
15225 | android.telephony.TelephonyManager | getLine1Number() | READ_PHONE_STATE
15610 | com.android.contacts.activities.ContactDetailActivity | onAttachFragment() | READ_PHONE_STATE, GET_ACCOUNTS, READ_SYNC_SETTINGS
17626 | android.telephony.SmsManager | sendTextMessage() | SEND_SMS
20199 | com.android.calendar.DayView | <init>() | READ_CALENDAR
21058 | android.net.ConnectivityManager | isActiveNetworkMetered() | ACCESS_NETWORK_STATE
24662 | android.widget.VideoView | pause() | WAKE_LOCK
24677 | android.widget.VideoView | stopPlayback() | WAKE_LOCK
24713 | android.widget.VideoView | start() | WAKE_LOCK
26195 | android.webkit.WebViewClassic | drawContent() | WAKE_LOCK
26603 | android.webkit.WebViewClassic | onPause() | WAKE_LOCK
26828 | android.widget.VideoView | setVideoPath() | WAKE_LOCK
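To illustrate the many-to-one relation that makes permission-level features coarse, the sketch below inverts a few Table I rows into a permission-to-APIs index; only a subset of the rows is included, and the variable names are ours:

```python
from collections import defaultdict

# A few (API ID, method, permissions) rows taken from Table I.
rows = [
    (14824, "getDeviceId()", ["READ_PHONE_STATE"]),
    (15057, "getSubscriberId()", ["READ_PHONE_STATE"]),
    (15225, "getLine1Number()", ["READ_PHONE_STATE"]),
    (9723, "getCellLocation()", ["ACCESS_COARSE_LOCATION", "ACCESS_FINE_LOCATION"]),
    (24662, "pause()", ["WAKE_LOCK"]),
    (24713, "start()", ["WAKE_LOCK"]),
]

# Invert to permission -> governed API IDs: one permission covers many
# APIs, so a declared permission alone cannot tell their uses apart.
by_permission = defaultdict(list)
for api_id, method, permissions in rows:
    for perm in permissions:
        by_permission[perm].append(api_id)

print(sorted(by_permission["READ_PHONE_STATE"]))  # [14824, 15057, 15225]
```

Even in this tiny excerpt, READ_PHONE_STATE alone cannot distinguish reading the device ID from reading the subscriber ID or the phone number, which is exactly the ambiguity the API-level features avoid.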
After the adjustment, we can notice that the usage of 31 (out of 71) permissions in malware exhibits a statistically significant difference compared with benign apps (p-values < 0.05). From the above analysis, we cannot directly accept or reject the null hypothesis H0p, as not all the tests present statistical significance. However, the 31 permissions that show statistical significance can be employed as major features in analyzing malware and are worth further study.
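Each of these per-feature comparisons is a two-sample Mann-Whitney test. A minimal stdlib sketch is given below, using the normal approximation and omitting tie corrections to the variance for brevity; the samples are made up (call-site counts, as used for the API tests in the next section):

```python
import math

def mann_whitney_u(xs, ys):
    """Two-sided Mann-Whitney test via the normal approximation.
    U counts, over all (x, y) pairs, how often x ranks above y
    (ties count one half); the tie correction is omitted."""
    n1, n2 = len(xs), len(ys)
    u = sum((x > y) + 0.5 * (x == y) for x in xs for y in ys)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, p

# Call sites of one hypothetical API in malicious vs. benign apps.
u, p = mann_whitney_u([3, 5, 4, 6, 2, 5, 7, 4], [0, 0, 1, 0, 2, 1, 0, 1])
print(u, p < 0.05)  # 63.5 True
```

A production implementation would use a library routine with tie and continuity corrections; the sketch only shows the rank-based mechanics behind the p-values plotted in Figs. 7 and 8.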
2) API test: Similar to the last section, this section focuses on the tests on APIs instead of permissions. For one specific API, the number of call sites of this API in the app (malicious or benign) gives the sample value of this app. The sample value is set to 0 if this API is never used in the app. Therefore, in each test, two sets of samples represent one specific API's usage in malicious and benign apps, respectively, and the range of each sample value is [0, +∞). Fig. 8 illustrates the p-values of all the tests on APIs, and the red line represents the cutoff level (α = 0.05). From the figure, we can observe that most tests on APIs have p-values larger than α, which indicates that most test results cannot reject the null hypothesis H0a. Since we apply multiple tests on APIs as well, the p-values are also adjusted with Holm's correction procedure. After adjustment, we notice that the usage of 106 (out of 32 304) APIs in malware exhibits a statistically significant difference compared with benign apps (p-values < 0.05). From the above analysis, only a very small portion of API usage shows statistical significance between malware and benign apps. In addition, with these test results, we cannot directly accept or reject the null hypothesis H0a, but the usage of these 106 APIs in malware is worth further study, as it provides finer grained and more numerous features than permissions.

Fig. 8. Mann-Whitney test (p-value) on Android APIs. The red line represents the significance level (α = 0.05).

3) Comparison test: The previous two sections study the statistical significance of each permission and API one by one, and the conclusions are based on intuitive analysis. In this section, we investigate the different usage of permissions and APIs between malicious and benign apps by considering all the employment of these features. For each permission, we compute the average declaration in all malicious and benign apps, respectively. Thus, two sets of samples represent the average declaration of permissions in malware (malware_perm) and benign apps (benign_perm), and the range of sample values lies in [0, 1]. The first line of Table II reports the result of the Mann-Whitney
362 IEEE TRANSACTIONS ON RELIABILITY, VOL. 67, NO. 1, MARCH 2018
TABLE II
USE OF PERMISSIONS AND APIS BY MALICIOUS AND BENIGN APPS: MANN-WHITNEY TEST (p-VALUE)

Test | p-value
malware_perm versus benign_perm | 0.3442
malware_API versus benign_API | <0.0001
test (p-value) on average permission usage. Apparently, the average usage of permissions in malware does not show a statistically significant difference compared with benign apps (p-value > 0.05). Hence, from the perspective of average permission usage, we cannot reject the null hypothesis H0p, and the conclusion is that there is no significant difference between the permissions on average used by malicious and benign apps. For each API, we compute the average call sites in all malicious and benign apps, respectively. Thus, two sets of samples represent the average call sites of APIs in malware (malware_API) and benign apps (benign_API), and the range of sample values lies in [0, +∞). The second line in Table II shows the p-value of the Mann-Whitney test on average API usage. As we can notice from the table, the usage of APIs in malware exhibits a statistically significant difference compared with benign apps (p-value < 0.05). Therefore, from the perspective of average API usage, we can reject the null hypothesis H0a and conclude that APIs used by malware are on average significantly different from APIs used by benign apps.

Summarizing, the usage of APIs in malware demonstrates statistical significance compared with benign apps. In analyzing the behaviors of malicious and benign apps, APIs are a nonnegligible factor with respect to permissions. Therefore, in this paper, we mainly focus on the API usage patterns of malicious and benign apps. More details are analyzed in the following sections.

D. Hidden Patterns

To mine the hidden patterns of malware, we study the APIs extracted by a trained classifier that are highly sensitive in malware classification, and analyze the co-used APIs in both the malicious and benign app datasets. Some specific APIs have the capability to distinguish malware from millions of Android apps. Therefore, training a classifier gives a full picture of the API usage in different types of apps and identifies malicious patterns via highly sensitive APIs. As co-used APIs can represent the usage patterns of malware stealing users' sensitive information, the difference in co-used APIs between malicious and benign apps is an effective and efficient way to attack the problem. In the following sections, highly sensitive APIs extracted through a trained classifier are presented, and co-used APIs are studied based on both the malicious and benign app datasets.

1) Highly sensitive APIs: The permission mechanism is utilized to protect sensitive information on the Android platform so as to prevent unpredictable data leaking by malware. However, there are hundreds and thousands of APIs governed by only one permission, which makes it hard to distinguish the behaviors of apps through one permission as the requirements of functionalities. Permission-related APIs, on the other hand, not only hold the features of permissions, but are also more distinguishable among millions of apps. Therefore, we take these permission-related APIs as the features of apps and train a random forests classifier. The random forests classifier can learn the importance of different APIs in the training process and output the weights of APIs. The training process is presented in Algorithm 1 of Section V. The weights given by the classifier indicate the importance of APIs in classifying malicious and benign apps. As shown in Fig. 9, the weights of different APIs trained and tested on real-world datasets range from 0 to 0.045. Most of the APIs cannot be used to classify malicious and benign apps, as their weights are all zero. Only a few of these APIs have the ability to distinguish the types of apps. The weight distribution of APIs is listed in Table III. Among the 32 304 permission-related APIs, only 2939 APIs have nonzero weights. The weights of the 26 APIs listed on our website [33] are over 0.01, which shows their importance in identifying malware. We compared these 26 APIs with the most different API usage listed in Table I and found that 8 of them are the same. The API 17626, with the highest weight of 0.042, is also among the eight highly sensitive APIs. As listed in Table I, the API 17626 is the method sendTextMessage() from package android.telephony.SmsManager, which is also identified as a suspicious API call in DREBIN [6]. Other highly sensitive APIs, such as getDeviceID(), getSubscriberId(), and getLine1Number(), are all highly dangerous APIs that have been discussed in the previous sections. These APIs with high weights are especially important in identifying malware. Hence, we utilize this kind of APIs to retrain our classifier, which is discussed in Section V.

Fig. 9. Weights of Android APIs.

TABLE III
API WEIGHT DISTRIBUTION

Weight | >0.01 | >0.001 | >0.0001 | >0 | All
APIs | 26 | 137 | 315 | 2939 | 32 304
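The thresholding behind the weight distribution in Table III can be sketched as follows. The weight values here are made up for illustration (only API 17626's weight of 0.042 appears in the text); in MalPat they come from the trained random forests classifier:

```python
# Hypothetical importance weights keyed by API ID; in practice these
# are the per-feature importances learned by the classifier.
weights = {17626: 0.042, 15057: 0.031, 14824: 0.012,
           15225: 0.002, 9723: 0.0004, 11784: 0.0}

def count_above(weights, threshold):
    """Count APIs whose learned weight exceeds the threshold."""
    return sum(1 for w in weights.values() if w > threshold)

for t in (0.01, 0.001, 0.0001):
    print(t, count_above(weights, t))

# Highly sensitive APIs: the top-k APIs by descending weight.
top3 = sorted(weights, key=weights.get, reverse=True)[:3]
print(top3)  # [17626, 15057, 14824]
```

Selecting the top-k APIs by weight in this way is how the reduced feature set used by MalPat50 (k = 50) is obtained later in the paper.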
Fig. 10. Distribution of the correlation of co-APIs λ in datasets. (a) Distribution of λ value in the malicious app dataset. (b) Distribution of λ value in the benign app dataset.
Fig. 11. Top 20 most different co-API usages between malicious and benign apps. A positive percentage denotes that more malicious apps employ the current co-API compared with benign apps, and vice versa.

2) Co-used APIs: To investigate the correlation between different APIs, we first define the correlation value of co-used APIs (co-APIs) in the following.

Definition 1 (Co-API): Given two APIs ai and aj, if ai and aj are both employed in the same (benign or malicious) app, these two APIs are called a co-API. The correlation of co-APIs is the likelihood that they are used together in the app dataset, which is calculated as the co-API value λ using the Jaccard measure [37]:

λ = D(ai ∩ aj) / D(ai ∪ aj)    (2)

where D(ai ∩ aj) indicates the number of unique apps that used APIs ai and aj together, and D(ai ∪ aj) is the number of apps that used either API ai or aj.
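Computed over per-API app sets, the measure in (2) reduces to a set operation; the app identifiers below are made up for illustration:

```python
def co_api_lambda(apps_using_a, apps_using_b):
    """Jaccard correlation of two APIs, as in (2): the number of apps
    using both APIs divided by the number of apps using either."""
    both = apps_using_a & apps_using_b
    either = apps_using_a | apps_using_b
    return len(both) / len(either) if either else 0.0

# Hypothetical app sets: which apps invoke API 14837 and API 15057.
apps_14837 = {"app1", "app2", "app3"}
apps_15057 = {"app2", "app3", "app4"}
print(co_api_lambda(apps_14837, apps_15057))  # 0.5
```

A λ of 1.0 therefore means the two APIs never appear apart in the dataset, which is the case examined in Fig. 10.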
Fig. 10 shows the co-API usage in malicious and benign apps, respectively. Fig. 10(a) shows the correlation value λ against the ratio of co-APIs. From the figure, we can notice that about 40% of co-APIs have a correlation value of 1.0. But we found that most of these co-APIs with λ = 1.0 were only employed in one malicious app. As to benign apps, shown in Fig. 10(b), there is not a large part of the dataset with a λ value of 1.0. Only about 10% of API pairs have a correlation of 1.0 in benign apps. To compare the difference in co-API usage between malicious and benign apps, we select the top 20 most different co-APIs employed in the malicious and benign app datasets. As illustrated in Fig. 11, all 20 co-APIs have a difference larger than 35%, which makes them significant patterns indicating app behaviors.

A positive value of the difference denotes that the current co-API is employed more in malware than in benign apps. The largest difference is achieved by the co-API <14837, 15057>, which is employed by 43.08% more apps in the malicious app dataset. The API 14837 is the method <init>() from package com.android.emailcommon.service.EmailServiceProxy. We cannot find the official document about this API, but from the name of the method, we can conjecture that this API is used to set email services, more specifically, the proxy of email services. Obviously, this is a sensitive API that can be hijacked by malicious behaviors. The co-used API 15057, discussed in Section IV-A, is the method getSubscriberId(), which is also a very dangerous API. From the top 20 most different co-APIs, we observe that 8 of them contain the API 15057, and these co-APIs are all employed more in malicious apps than in benign apps. This observation indicates the hidden patterns of malware and can be utilized to identify malicious apps. As the correlation of APIs is considered in the training process of random forests, the co-APIs are not regarded as features in the classifier.

Fig. 12. Architecture of MalPat.

V. MALWARE DETECTION

To assist Android app marketplaces in fighting against malware, we propose an automated malware detection system, MalPat,3 to detect any suspicious Android apps. Permission-related APIs are adopted as the main features in MalPat to classify malicious and benign apps based on their unique usage patterns. As shown in Fig. 12, there are two main parts of MalPat to detect malware: model training and malware detection. A newly coming Android app is first decompiled to extract permission-related APIs, and highly sensitive APIs are selected as the features. Apps in the database consisting of malicious and benign apps are engaged to train the classifier. The detection process is based on the trained classifier. Detailed explanations are presented in the following.

3 http://malpat.inpluslab.com
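The API-extraction step in the pipeline above can be illustrated as follows. The regex and the sample smali lines are simplified illustrations (real invoke lines also carry opcode variants and full type descriptors), and the helper names are ours:

```python
import re
from collections import Counter

# Matches the Lpackage/Class;->method( portion of a smali invoke line.
CALL = re.compile(r"L([\w/$]+);->([\w$<>]+)\(")

def count_call_sites(smali_text):
    """Count call sites per (package.Class, method) in decompiled smali."""
    counts = Counter()
    for cls, method in CALL.findall(smali_text):
        counts[(cls.replace("/", "."), method)] += 1
    return counts

smali = """
invoke-virtual {v0}, Landroid/net/ConnectivityManager;->getActiveNetworkInfo()Landroid/net/NetworkInfo;
invoke-virtual {v1}, Landroid/telephony/TelephonyManager;->getSubscriberId()Ljava/lang/String;
invoke-virtual {v1}, Landroid/telephony/TelephonyManager;->getSubscriberId()Ljava/lang/String;
"""
counts = count_call_sites(smali)
print(counts[("android.telephony.TelephonyManager", "getSubscriberId")])  # 2
```

Intersecting the counted methods with the permission-related API list (e.g., from PScout) and keeping the call-site counts yields the per-app feature vector described in the next section.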
A. Feature Extraction

Data samples are the base of training a model. As described in Section III, we crawled two sets of Android apps, malicious and benign, comprising the datasets of our training process. There are 31 185 apps in the benign app dataset and 15 336 malware samples in the malicious app dataset. With all these malicious and benign apps, we can extract features from the source codes of the decompiled files. The installation package of an Android app is the .apk file, which can be disassembled by the well-known decompiling tool Apktool [38]. It recovers the main files, organizing the source codes in a particular way in the smali folder, and the methods invoked in the source codes are in the following format after being decompiled:

android/net/ConnectivityManager;->getActiveNetworkInfo()

The first part, android/net/ConnectivityManager, gives the package of the invoked method, and the second part is the target method, getActiveNetworkInfo(), used in the app. Based on this, we can traverse all the decompiled source codes to extract the employed APIs of the target app, which form the initial feature set. For all the apps in both the malicious and benign app datasets, the numbers of permission-related APIs are extracted as the features to train the classifier. For each app, the feature consists of 32 304 items, where each item represents the number of call sites of the corresponding API in the target app. The features we extracted for classification can resist code obfuscation that does not obscure API calls of the Android operating system. The manual of an official Android obfuscation tool, ProGuard [39], explicitly confirms this. Benign apps are regarded as negative samples and malicious apps as positive samples. With the extracted features, we utilize random forests to train the malware classifier, and details are illustrated in the following section.

B. Malware Classifier

Random forests, proposed by Amit et al. [40] and Ho [41] independently, have been widely utilized in classification and regression. To construct a random forest is to train a set of decision trees [28], [42] separately and to combine them with a voting approach. We formalize our problem as a binary classification of apps. Each app a is described by the API features f = (f1, f2, ..., fi, ..., fn), where fi denotes the call sites of API i. In the training step, a set of labels is given to determine the type of each app, where 1 denotes malware and 0 denotes benign apps. The construction of random forests consists of a collection of decision trees, and the number of decision trees is set manually. Each decision tree is constructed in a top-down fashion starting from the root. At each node of the decision tree, it splits the training set into two subsets with different labels by minimizing the uncertainty of the class labels. The uncertainty is evaluated in our classifier by computing the Gini impurity [43]. Specifically, the objective function is defined as follows:

Gini(A, f) = (|A0|/|A|) [1 − Σ_{k=0,1} (|Ck|/|A0|)²] + (|A1|/|A|) [1 − Σ_{k=0,1} (|Ck|/|A1|)²]    (3)

where A denotes the sample set containing malicious and benign apps at a specific tree node. A0 and A1 represent the subsets of A (A0 ∪ A1 = A) that are classified as benign and malicious apps, respectively. The feature f is adopted to split the sample set at the node. Ck is the sample set belonging to class k. Each trained decision tree outputs a classification result, and the final result of the random forest combines all the results from these decision trees using a voting process. The voting process is based on the majority rule, i.e., selecting alternatives with more than half of the votes.

Algorithm 1: Classifier Training.
Require:
  A : App Set
  F : Feature Set
  L : Label Set
  k : Number of Decision Trees
Ensure:
  C : Random Forests Classifier
 1: function Train(A, F, L, k)
 2:   for i ← 0, k do
 3:     Ai ← Randomly selected N apps
 4:     Li ← Labels of Ai
 5:     for each node n in decision tree Ti do
 6:       Fi ← Randomly selected m features
 7:       f = arg min Gini(Ai, Li, Fi)
 8:       Generate n using feature f
 9:     end for
10:   end for
11:   C = ⋃_{i=0}^{k} Ti
12:   return C
13: end function

Algorithm 1 shows the training process of the random forests classifier, which includes the following steps:
1) Step 1 (lines 2-4): Select N apps from the full app dataset A randomly as the initial dataset for each decision tree Ti of the random forest.
2) Step 2 (lines 5-9): Let M be the size of the feature set F. For each node n of the decision tree Ti, m APIs are selected randomly as the feature set Fi to compute the Gini impurity [43], where m ≪ M. The feature f with the minimum value of Gini impurity is selected as the best feature to generate the node of the decision tree. It should be noted that the value of m is the same during the construction of each decision tree.
3) Step 3 (lines 10-11): Construct k decision trees from steps 1-2, and the result is decided by the voting approach, i.e., the type of each app is decided by the majority result of all the outputs of these decision trees.

After the training process, the parameters of the random forest at each node of each decision tree are set and have the capability of classifying apps. Therefore, in the testing process, each app with a feature vector can be determined into a certain type with the trained malware classifier. Our full dataset is split into two parts for extracting highly sensitive APIs and training the final classifier, respectively. The full set of APIs is employed as the features in the training process on the first dataset part. After the first process, we are able to extract highly sensitive APIs based on the importance of APIs, i.e., the weights of APIs learned from the first training process. Therefore, the APIs with large weights are adopted to retrain the classifier, which is subsequently used to detect malware.

The detection process is based on the trained classifier. When a new app comes, it is decompiled by Apktool [38]. As described above, the source codes of the app are all stored in the smali folder. We extract all the permission-related APIs, including the number of call sites for each API, as the initial features. Based on the app database, highly sensitive APIs can be mined from thousands of permission-related APIs by training the random forests classifier. Therefore, only these highly sensitive APIs are utilized as the final features of the newly coming app. These features are then employed as the input to obtain the app type.

VI. EVALUATION

In this section, we present the evaluation metrics adopted in our experiments. With the datasets of benign and malicious apps described in Section III, comprehensive experiments are conducted on our malware detection system, MalPat. The comparison with the state-of-the-art approaches is also illustrated in this section. Details of the experiments are given in the following.

A. Evaluation Metrics

To evaluate the performance of malware detection, we use the precision and recall metrics. As malicious apps are positive samples and benign apps are negative samples in our evaluation, we first present three types of values:
1) tp (true positive): The number of malicious apps that are correctly identified as malicious apps.
2) fp (false positive): The number of benign apps that are incorrectly identified as malicious apps.
3) fn (false negative): The number of malicious apps that are incorrectly identified as benign apps.
Therefore, the metrics precision and recall can be calculated as follows:

precision = tp / (tp + fp)    (4)

recall = tp / (tp + fn)    (5)

F1 = 2 · (precision · recall) / (precision + recall).    (6)

Equation (4) denotes how many of the apps identified as malicious by the detection system are true malware. The value of precision is in the interval [0, 1], and a large value indicates the correctness of the malware detection system. Equation (5) denotes how many of the true malicious apps are correctly identified. The value of recall is also in the interval [0, 1]. Equation (6) is the F1 score, which is the harmonic mean of precision and recall. The value of the F1 score is also in the interval [0, 1]. In order to fight against malware, we focus on the correctness of the identification of malicious apps instead of benign apps. Therefore, the precision, recall, and F1 score of malware are adopted as the evaluation metrics in the experiments.

B. Experimental Setup

In the experiments, MUDFLOW [5], DREBIN [6], and DroidAPIMiner [9] are employed as the state-of-the-art approaches to compare with our MalPat. In comparison with these methods, we select the intersection of the dataset used in article [5] and our dataset, which contains 2398 benign apps and 13 840 malware samples. Therefore, our remaining dataset consists of 28 787 benign apps and 1496 malicious apps, and it has no intersection with the dataset adopted to compare with MUDFLOW, DREBIN, and DroidAPIMiner. This remaining dataset is employed in our MalPat to extract highly sensitive APIs. Except for the experiments comparing with MUDFLOW, DREBIN, and DroidAPIMiner, the datasets used in the later experiments comparing with the baseline methods are our full malicious and benign datasets, consisting of 15 336 malicious apps and 31 185 benign apps. Besides, in extracting highly sensitive APIs, the dataset is split into two parts with a partition of 1:1, as illustrated in Section V. For all the experiments, we randomly select from 50% to 90% of both the malicious and benign datasets as the training set, and the remaining part is regarded as the test set. We repeat each experiment ten times and average the results. In addition, the number of decision trees trained in the random forests classifier is 200 and remains the same. In Section VI-D, we also study the impact of highly sensitive APIs, i.e., how the number of APIs used to retrain the classifier influences the final result. The partition of training and test sets in the study of the impact of highly sensitive APIs is 9:1. Details of the experiments and discussions are presented in the following sections.

C. Comparison With State-of-the-Art Methods

To demonstrate the effectiveness and efficiency of our automated malware detection system, MalPat, we compare it with existing state-of-the-art approaches. We test two versions of our MalPat system: one using the full set of 32 304 APIs (MalPat) and one employing only the top 50 highly sensitive APIs (MalPat50).

The first method we compare with MalPat is MUDFLOW [5]. We downloaded the source code from the website of MUDFLOW [44] and reran the scripts with its optimal settings on our intersection dataset. The experimental results are shown in Fig. 13. It is obvious that MUDFLOW cannot compete with MalPat under all the measures. Especially, MalPat50 outperforms MUDFLOW with 3% precision value, 2% recall
Fig. 13. Comparison with state-of-the-art methods. (a) Precision results. (b) Recall results. (c) F1 score results.
rate, and 2% F1 score. The classifier employed in MUDFLOW is a support vector machine (SVM) [45], and the features of apps are extracted according to a manually selected list. MalPat, on the other hand, takes full advantage of the random forests classifier by engaging it during the feature set construction, which manages to capture more information about the disparate behaviors of benign and malicious apps.

The second state-of-the-art method to compare with is DREBIN [6], which used APIs as well as other types of features to detect malware. There are two sources of features utilized in DREBIN. The first source of feature sets is the manifest, and the other is the disassembled code. Especially, the feature sets from the disassembled code include API features similar to those we used in MalPat from PScout [26]. From Fig. 13(a), it can be observed that DREBIN's precision results are between MalPat and MalPat50, which means DREBIN has a capability of correctly identifying malicious apps similar to MalPat's. Its recall rates in Fig. 13(b) are all lower than both versions of MalPat, and the largest difference can be more than 0.02. The recall rate demonstrates the capability of detecting malware from millions of apps. Apparently, DREBIN is worse than MalPat on this functionality. The reason can be the same as for MUDFLOW, where DREBIN uses manually selected features; but according to the fair results of DREBIN, which are better than MUDFLOW's, the features employed in DREBIN do show their effectiveness in identifying malware. We have an assumption that the API features employed in DREBIN, which are similar to the ones used in our MalPat, are the key to detecting malware. Thus, we modify the original DREBIN by reducing its feature sets to only the one that contains restricted API calls. We test this method on a 90% training set ten times, and the averaged results are a 0.950910 precision value and a 0.972977 recall rate. These results are almost the same as the ones generated by MUDFLOW. It is unexpected but also reasonable, because our MalPat also uses similar API features and achieves the best results.

The last state-of-the-art approach is DroidAPIMiner [9]. It also employs Android system calls as its major features. Besides, DroidAPIMiner adds APIs with similar support whose parameters are more frequent in the malware set as well, but the latter part does not make a difference in the final results. Therefore, we extract the APIs that have a usage difference of more than 6% between malicious and benign apps. These features are then employed in DroidAPIMiner to identify malware, and the results are shown in Fig. 13. Clearly, DroidAPIMiner could not compete with the other approaches, including MalPat. Especially, all its recall rates and F1 scores are worse than any of the other methods. We carried out the experiment on DREBIN with restricted API calls in the previous paragraph, and we observed a surprising result. By comparing with the results of DroidAPIMiner, it can be explained that restricted API calls, or permission-related APIs, do have a better capability to distinguish malware from millions of apps. This proves our statement in Section I that APIs that are not permission-related may introduce noise affecting feature extraction.

By comparing with existing state-of-the-art approaches, we demonstrate the effectiveness of MalPat in identifying malware. The highest recall rate achieved by MalPat is 0.9963, which means that only 51 out of 13 840 malicious apps are not detected. In the following section, we study how different features of apps can affect the results of MalPat on our full dataset.

D. Comparison With Baseline

To study the influence of different features of apps on MalPat, we experiment on different sets of API features as in the previous section, where MalPat includes the full 32 304 APIs and MalPat50 contains the top 50 highly sensitive APIs. Besides, a baseline method we compare with is the one using permissions declared in the Manifest file of the Android platform (Perm). Permission-based methods can avoid the high cost of time and computation; however, they can only capture the coarse-grained features of Android apps, because there are thousands of Android APIs governed by each permission. Hence, using permissions as the features of apps can miss the full view of app behaviors due to the lack of in-depth information. To demonstrate the weakness of using permissions as features, we conduct the experiments on our MalPat with both APIs and permissions, respectively.

As illustrated above, MalPat surpasses the state-of-the-art approaches, MUDFLOW, DREBIN, and DroidAPIMiner, on malware detection, so the experiments on MalPat can directly show the effectiveness of using APIs instead of permissions. Fig. 14 shows the results of the MalPat and Perm methods. According to
Fig. 14. Comparison with baseline methods on the full datasets. (a) Precision results. (b) Recall results. (c) F1 score results.
the results shown in Fig. 14, it is obvious that MalPat with APIs as features outperforms the one using permissions under all percentages of the training set. The precision results of MalPat, shown in Fig. 14(a), are all between 0.90 and 0.94, and there is no big change with the increase of the training set. The difference between MalPat and Perm is near 0.1 when 50% of the apps are used for training. One observation is that the precision results of MalPat are better than those of MalPat50, but MalPat50 surpasses MalPat on the recall results. It is interesting that fewer APIs employed as features can improve the recall rate, which means that highly sensitive APIs actually represent the malicious behaviors of malware and have the capability to identify malware. As observed, permissions of Android apps lack the ability to capture fine-grained features, which can be addressed by adopting the APIs governed by them. Permission-related APIs not only capture more detailed behaviors of apps, but also have similar features as permissions do.

E. Impact of Highly Sensitive APIs

In our model training process, we aim to extract highly sensitive APIs so as to detect malware with as few features as possible. As discussed in Section IV-B-1, different APIs have different importance in capturing malicious behaviors and identifying malware. Therefore, to study the impact of different numbers of APIs as features, we conduct experiments on MalPat with the number of APIs ranging from 10 to 200. As illustrated in Section VI-B, we split our full dataset into two parts with a partition of 1:1. The first part is used to train the classifier to extract the importance of different APIs. In the second part, we utilize the importance of all the APIs obtained in the first part to retrain MalPat. Training on different apps reduces the possibility that MalPat has prior knowledge of the features of benign and malicious apps. The results of MalPat with different numbers of APIs are shown in Fig. 15. According to the figure, we can see that the largest difference between the maximum and minimum values of precision is larger than 0.025, which means that the number of APIs used as features influences the precision of MalPat. MalPat with 50 APIs as features peaks at a precision result of 0.9172, and even with only 20 APIs it has a precision result of 0.9024. As to the recall results, the top result is achieved with 20 APIs, and the change in the recall rates of MalPat with the increase of the API number is larger than that of the precision results.

Fig. 15. Impact of highly sensitive APIs on the efficiency of MalPat.

Moreover, there is one special case that should be noted. The recall rate reaches its largest value with 20 APIs, and then it drops from 0.8918 to 0.8696 as the number of APIs increases from 20 to 200. Although there is a small increase when the number of APIs increases from 100 to 200, from the blue line we can observe that MalPat with the full set of APIs as features has a lower recall rate than the one with 50 APIs. The numbers of malicious and benign apps are both similar to the number of all the APIs. Therefore, if all the APIs are adopted as the features to train the classifier, it may overfit on the training set and can misclassify malicious apps. On the other hand, too few features can also affect the classification results. According to the F1 score results in Fig. 15, MalPat with 50 APIs achieves the best results, which means that this number of APIs is able to distinguish malicious and benign apps with high precision and recall. Moreover, based on the 50 highly sensitive APIs, MalPat can be trained within 1 min on the whole dataset.

VII. CONCLUSION

To fight against malware, we study malicious and benign Android apps in the real world to mine hidden patterns of malware. Previous research work mainly focused on permissions, sensitive resources, intents, etc., and very few efforts have been proposed for addressing the malware detection problem from the API perspective. To fill this gap, we analyze the behaviors of malicious apps in terms of API usage, comparing with benign apps. Utilizing fine-grained features, we are able to mine the patterns
of malware and extract highly sensitive APIs by training the random forests classifier. To assist Android app marketplaces, we propose an automated malware detection system, MalPat. Comprehensive experiments are conducted on a large-scale dataset we collected from the Internet, consisting of 31 185 benign apps and 15 336 malicious apps. Compared with the state-of-the-art approaches MUDFLOW, DREBIN, and DroidAPIMiner, MalPat outperforms them all in both precision and recall. Based on the small feature set of 50 highly sensitive APIs, MalPat achieves an F1 score of 98.24%. Experimental results show the effectiveness and efficiency of MalPat, and the highly sensitive APIs mined from malicious and benign apps reflect patterns of malware that make it possible to identify malware.

Despite the efficiency of MalPat, a great deal of work remains to improve pattern mining and malware detection, so we will focus on the following topics in future work. First, a larger-scale dataset of apps needs to be collected to avoid the overfitting problem, which leads to the second topic: because of the limited number of existing malware samples available on the Internet, mining patterns of malicious apps is hard to scale; to address this, we will mainly focus on benign apps and study their behavior patterns in order to exclude malware. Third, MalPat considers only the difference between malicious and benign apps and neglects the categories of benign apps, which may affect their identification due to category-specific features; we plan to overcome this by taking the categories of benign apps into consideration during malware detection. In future work, we will improve the capability of MalPat and assist Android app marketplaces in fighting malware efficiently.

ACKNOWLEDGMENT

The authors would like to thank L. Huang for his assistance in building the MalPat system.

REFERENCES

[1] "McAfee Labs Threats Report March 2016," [Online]. Available: http://www.mcafee.com/us/resources/reports/rp-quarterly-threats-mar-2016.pdf
[2] S. Rasthofer, I. Asrar, S. Huber, and E. Bodden, "How current android malware seeks to evade automated code analysis," in Proc. 9th Int. Conf. Inf. Security Theory Practice, 2015, pp. 187–202.
[3] "Symantec Report on the Underground Economy July 07–June 08," 2008. [Online]. Available: https://www.symantec.com/content/en/us/about/media/pdfs/Underground_Econ_Report.pdf
[4] S. Rasthofer, S. Arzt, and E. Bodden, "A machine-learning approach for classifying and categorizing android sources and sinks," in Proc. 21st Annu. Netw. Distrib. Syst. Security Symp., 2014, pp. 1–15.
[5] V. Avdiienko et al., "Mining apps for abnormal usage of sensitive data," in Proc. 37th IEEE Int. Conf. Softw. Eng., 2015, pp. 426–436.
[6] D. Arp, M. Spreitzenbarth, M. Hubner, H. Gascon, and K. Rieck, "DREBIN: Effective and explainable detection of android malware in your pocket," in Proc. 21st Annu. Netw. Distrib. Syst. Security Symp., 2014, pp. 1–12.
[7] H. Peng et al., "Using probabilistic generative models for ranking risks of android apps," in Proc. 19th ACM Conf. Comput. Commun. Security, 2012, pp. 241–252.
[8] Y. Zhou, Z. Wang, W. Zhou, and X. Jiang, "Hey, you, get off of my market: Detecting malicious apps in official and alternative android markets," in Proc. 19th Annu. Netw. Distrib. Syst. Security Symp., 2012, pp. 50–62.
[9] Y. Aafer, W. Du, and H. Yin, "DroidAPIMiner: Mining API-level features for robust malware detection in android," in Proc. 9th Int. Conf. Security Privacy Commun. Netw., 2013, pp. 86–103.
[10] M. Zhang, Y. Duan, H. Yin, and Z. Zhao, "Semantics-aware android malware classification using weighted contextual API dependency graphs," in Proc. 21st ACM Conf. Comput. Commun. Security, 2014, pp. 1105–1116.
[11] "Google Play," [Online]. Available: https://play.google.com/store/apps
[12] "VirusShare.com," [Online]. Available: https://virusshare.com/
[13] "Contagio Mobile," [Online]. Available: http://contagiominidump.blogspot.com/
[14] W. Enck et al., "TaintDroid: An information-flow tracking system for realtime privacy monitoring on smartphones," ACM Trans. Comput. Syst., vol. 32, no. 2, p. 5, 2014.
[15] Y. Zhang et al., "Vetting undesirable behaviors in android apps with permission use analysis," in Proc. 20th ACM Conf. Comput. Commun. Security, 2013, pp. 611–622.
[16] L. K. Yan and H. Yin, "DroidScope: Seamlessly reconstructing the OS and Dalvik semantic views for dynamic android malware analysis," in Proc. 21st USENIX Security Symp., 2012, pp. 569–584.
[17] Y. Feng, S. Anand, I. Dillig, and A. Aiken, "Apposcopy: Semantics-based detection of android malware through static analysis," in Proc. 22nd ACM SIGSOFT Int. Symp. Found. Softw. Eng., 2014, pp. 576–587.
[18] I. Roy, D. E. Porter, M. D. Bond, K. S. McKinley, and E. Witchel, "Laminar: Practical fine-grained decentralized information flow control," in Proc. 30th ACM SIGPLAN Conf. Program. Lang. Des. Implementation, 2009, pp. 63–74.
[19] X. Xiao, N. Tillmann, M. Fähndrich, J. de Halleux, and M. Moskal, "User-aware privacy control via extended static-information-flow analysis," in Proc. 27th IEEE/ACM Int. Conf. Autom. Softw. Eng., 2012, pp. 80–89.
[20] O. Tripp and J. Rubin, "A Bayesian approach to privacy enforcement in smartphones," in Proc. 23rd USENIX Security Symp., 2014, pp. 175–190.
[21] S. Arzt et al., "FlowDroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps," in Proc. 35th ACM SIGPLAN Conf. Program. Lang. Des. Implementation, 2014, pp. 259–269.
[22] W. Enck, D. Octeau, P. McDaniel, and S. Chaudhuri, "A study of android application security," in Proc. 20th USENIX Security Symp., 2011, pp. 21–21.
[23] L. Lu, Z. Li, Z. Wu, W. Lee, and G. Jiang, "CHEX: Statically vetting android apps for component hijacking vulnerabilities," in Proc. 19th ACM Conf. Comput. Commun. Security, 2012, pp. 229–240.
[24] A. Bartel, J. Klein, Y. L. Traon, and M. Monperrus, "Automatically securing permission-based software by reducing the attack surface: An application to android," in Proc. 27th IEEE/ACM Int. Conf. Autom. Softw. Eng., 2012, pp. 274–277.
[25] A. Gorla, I. Tavecchia, F. Gross, and A. Zeller, "Checking app behavior against app descriptions," in Proc. 36th Int. Conf. Softw. Eng., 2014, pp. 1025–1035.
[26] A. P. Felt, E. Chin, S. Hanna, D. Song, and D. Wagner, "Android permissions demystified," in Proc. 18th ACM Conf. Comput. Commun. Security, 2011, pp. 627–638.
[27] P.-H. Chen, C.-J. Lin, and B. Schölkopf, "A tutorial on ν-support vector machines," Appl. Stoch. Models Bus. Ind., vol. 21, no. 2, pp. 111–136, 2005.
[28] J. R. Quinlan, "Induction of decision trees," Mach. Learn., vol. 1, no. 1, pp. 81–106, 1986.
[29] K. W. Y. Au, Y. F. Zhou, Z. Huang, and D. Lie, "PScout: Analyzing the android permission specification," in Proc. 19th ACM Conf. Comput. Commun. Security, 2012, pp. 217–228.
[30] M. L. Vásquez, G. Bavota, C. Bernal-Cárdenas, M. D. Penta, R. Oliveto, and D. Poshyvanyk, "API change and fault proneness: A threat to the success of android apps," in Proc. 9th Joint Meeting Eur. Softw. Eng. Conf. ACM SIGSOFT Symp. Found. Softw. Eng., 2013, pp. 477–487.
[31] W. J. Conover, Practical Nonparametric Statistics, 3rd ed. Hoboken, NJ, USA: Wiley, 1998.
[32] "ActivityManager," [Online]. Available: http://developer.android.com/reference/android/app/ActivityManager.html
[33] "API Mappings," [Online]. Available: http://www.inpluslab.com/mappings.html
[34] "TelephonyManager," [Online]. Available: http://developer.android.com/reference/android/telephony/TelephonyManager.html
[35] "IMSI," [Online]. Available: https://en.wikipedia.org/wiki/International_mobile_subscriber_identity
[36] S. Holm, "A simple sequentially rejective multiple test procedure," Scand. J. Statist., pp. 65–70, 1979.
[37] P. Jaccard, "Étude comparative de la distribution florale dans une portion des Alpes et du Jura," Bulletin de la Société Vaudoise des Sciences Naturelles, vol. 37, pp. 547–579, 1901.
[38] C. Tumbleson and R. Wiśniewski, "Apktool," [Online]. Available: http://ibotpeaches.github.io/Apktool/
[39] "ProGuard," [Online]. Available: http://developer.android.com/tools/help/proguard.html
[40] Y. Amit and D. Geman, "Shape quantization and recognition with randomized trees," Neural Comput., vol. 9, no. 7, pp. 1545–1588, 1997.
[41] T. K. Ho, "Random decision forests," in Proc. 3rd Int. Conf. Document Anal. Recognit., 1995, pp. 278–282.
[42] J. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA, USA: Morgan Kaufmann, 1993.
[43] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen, Classification and Regression Trees. Belmont, CA, USA: Wadsworth, 1984.
[44] V. Avdiienko et al., "MUDFLOW," [Online]. Available: https://www.st.cs.uni-saarland.de/appmining/mudflow/
[45] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.

Guanhong Tao received the B.Eng. degree in computer science and technology from Zhejiang University, Hangzhou, China, in 2014. He is currently working toward the Master's degree in the School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China.
His research interests include mobile computing, program analysis, and software security.

Zibin Zheng (SM'16) received the Ph.D. degree in computer science and engineering from the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong, in 2010.
He is currently an Associate Professor with the School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China. His research interests include services computing, software engineering, and blockchain.
Dr. Zheng received the Outstanding Thesis Award of CUHK in 2012, the ACM SIGSOFT Distinguished Paper Award at ICSE 2010, and the Best Student Paper Award at ICWS 2010.

Ziying Guo is currently working toward the undergraduate degree at the School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China.
Her research interests include mobile computing and data mining.

Michael R. Lyu (F'04) received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1981; the M.S. degree in computer engineering from the University of California, Santa Barbara, CA, USA, in 1985; and the Ph.D. degree in computer science from the University of California, Los Angeles, CA, USA, in 1988.
He is currently a Professor with the Department of Computer Science and Engineering, The Chinese University of Hong Kong. He is also the Director of the Video over Internet and Wireless (VIEW) Technologies Laboratory. His research interests include software reliability engineering, distributed systems, fault-tolerant computing, mobile networks, web technologies, multimedia information processing, and e-commerce systems.
Dr. Lyu is a fellow of the ACM and the AAAS, and a Croucher Senior Research Fellow.