Detection of Malicious Android Apps Using Machine Learning Techniques
ABSTRACT
Lately, dangerous malware on mobile devices has been spreading rapidly, particularly repackaged Android malware. Although understanding Android malware through dynamic analysis can give a comprehensive view, it still incurs a high cost in environment preparation and manual effort during investigation. Android is the most widely used openly available smartphone OS, and its permission-declaration access-control mechanism cannot detect the behavior of malware. Detecting such malware presents distinctive challenges because of the limited resources available and the restricted privileges granted to the user, but it also presents distinctive opportunities in the information that must be attached to every application. In this article, we present a machine learning-based system for detecting malware on Android devices. We propose a code-behavior, signature-based malware detection framework using an SVM algorithm, which can detect malicious code and its variants effectively at runtime and extend the malware characteristics database dynamically. Experimental results show that the approach achieves a high detection rate with low false positive and false negative rates, and that its power and performance impact on the host system is negligible. Our system extracts a number of features and trains a Support Vector Machine in an offline (off-device) manner, so as to leverage the greater computing power of a server or cluster of servers.
Keywords: Malware analysis, Machine Learning, Neural Networks, Support Vector Machine.
www.trendytechjournals.com
16
International Journal of Trendy Research in Engineering and Technology
Volume 4 Issue 7 December 2020
ISSN NO 2582-0958
____________________________________________________________________________
DESCRIPTION
ANALYSIS
The user selects the APK file to be tested, and it is sent to cloud storage. There, the applications are queued in sequence and wait to be executed. The APK is then sent to the Cloud Engine, where features of the APK such as permissions and API calls are extracted and passed to the machine learning model. The features are analyzed, and based on the findings a report is generated and sent back to the user.
Fig 4.3: Flow of the Project
DESIGN
Firstly, we collect a number of malicious and benign Android applications by retrieving their APKs online. Next, we perform feature extraction by enumerating the API calls, permissions and activities of the APKs to get a fair idea of the behavior of each app. We then create a dataset from the extracted features by compiling them into CSV files for future use and reference. After that, we select specific features that appear anomalous or act as red flags, collect and store them in new CSV files, and train our machine learning model on them. Finally, the last phase is to test the model on the data and evaluate our findings based on the results.
IMPLEMENTATION METHODOLOGY
(Data Collection and Feature Filtering)
1. Collect all applications in separate folders, one containing benign applications and one containing suspicious applications.
2. Using the “glob” module in Python, create an array of file paths for further processing.
3. Analyze each application in the array using the “pyaxmlparser” and “androguard” frameworks.
4. Extract the following in the analysis phase:
a. Permissions
b. Activities
c. Intents
d. API calls
5. Taking these four attributes into consideration, a program maps all attributes to a CSV file and assigns a class label to each application.
6. Once the CSV files are generated, analyze them for redundancy and, if any is found, eliminate the entire duplicate row.
7. Another program extracts the complete set of permissions from these APK files. These permissions serve as attributes in the dataset CSV file (if a permission is present it is marked as 1, otherwise it is marked as 0).
8. An N-bit vector, one bit per permission attribute, is extracted from each line of the CSV file; these vectors serve as input to the machine learning algorithm.
(Machine Learning)
(SVM)
1. Import data in the form of a CSV file using the pandas framework.
2. Using a train-test split, divide the entire dataset in a ratio of 3:1 (75% of the data for training and 25% for testing).
3. Design an SVM model with the input format in mind.
4. Select the ‘rbf’ (radial basis function) kernel for this binary (benign vs. malware) classification task.
5. Evaluate accuracy using a confusion matrix.
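The feature-filtering steps (binary permission columns, duplicate-row removal, one N-bit vector per CSV line) and the SVM steps above can be sketched end to end as follows. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: it uses only the Python standard library instead of pandas, and it swaps in a bias-free soft-margin SVM trained by dual coordinate ascent with an RBF kernel (in practice a library implementation such as scikit-learn's `SVC(kernel="rbf")` would normally be used; the paper does not state which library it used). The helper names, the values `gamma=0.5`, `C=10.0` and `sweeps=50`, the +1/-1 label encoding, and the toy permission sets are all hypothetical.

```python
import math
import random


def build_vectors(apps):
    """Turn (permission_set, label) pairs into N-bit rows, one bit per distinct permission.

    A bit is 1 if the app requests that permission and 0 otherwise. Exact duplicate
    rows (same bits, same label) are dropped, mirroring the redundancy-removal step.
    """
    columns = sorted({p for perms, _ in apps for p in perms})
    seen, X, y = set(), [], []
    for perms, label in apps:
        row = tuple(1 if c in perms else 0 for c in columns)
        if (row, label) not in seen:
            seen.add((row, label))
            X.append(list(row))
            y.append(label)
    return columns, X, y


def split_75_25(X, y, seed=0):
    """Shuffle once, then keep 75% of the rows for training and 25% for testing."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    cut = int(0.75 * len(order))
    train, test = order[:cut], order[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def rbf(a, b, gamma):
    """Radial basis function kernel: exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def decision(alpha, X, y, x, gamma):
    """SVM score sum_j alpha_j * y_j * K(x_j, x); rows with alpha_j == 0 contribute nothing."""
    return sum(a * yj * rbf(xj, x, gamma) for a, yj, xj in zip(alpha, y, X) if a)


def train_svm(X, y, gamma=0.5, C=10.0, sweeps=50):
    """Bias-free soft-margin SVM fitted by dual coordinate ascent; labels are +1 / -1.

    Each update maximizes the dual objective in alpha_i alone
    (step = (1 - y_i * f(x_i)) / K(x_i, x_i), and K(x_i, x_i) = 1 for RBF),
    then clips alpha_i to the box [0, C].
    """
    alpha = [0.0] * len(X)
    for _ in range(sweeps):
        for i in range(len(X)):
            margin = y[i] * decision(alpha, X, y, X[i], gamma)
            alpha[i] = min(C, max(0.0, alpha[i] + (1.0 - margin)))
    return alpha


def predict(alpha, X, y, x, gamma=0.5):
    """+1 (malware) if the score is positive, otherwise -1 (benign)."""
    return 1 if decision(alpha, X, y, x, gamma) > 0 else -1


def confusion_matrix(y_true, y_pred):
    """2x2 counts with malware (+1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    return {"tp": pairs.count((1, 1)), "fn": pairs.count((1, -1)),
            "fp": pairs.count((-1, 1)), "tn": pairs.count((-1, -1))}


if __name__ == "__main__":
    # Made-up toy data, NOT the paper's dataset: every "malware" app requests the
    # SMS/boot permissions, no "benign" app does; other permissions are random.
    rng = random.Random(1)
    common = ["INTERNET", "CAMERA", "VIBRATE", "WAKE_LOCK", "READ_CONTACTS"]
    risky = {"SEND_SMS", "READ_SMS", "RECEIVE_BOOT_COMPLETED"}
    apps = []
    for k in range(40):
        perms = {p for p in common if rng.random() < 0.5}
        label = 1 if k % 2 == 0 else -1
        apps.append((perms | risky if label == 1 else perms, label))
    columns, X, y = build_vectors(apps)
    X_train, y_train, X_test, y_test = split_75_25(X, y)
    alpha = train_svm(X_train, y_train)
    y_pred = [predict(alpha, X_train, y_train, x) for x in X_test]
    cm = confusion_matrix(y_test, y_pred)
    print(len(columns), "permission columns;", cm)
    print("accuracy:", (cm["tp"] + cm["tn"]) / len(y_test))
```

Dropping the bias term is a simplifying choice: without it the dual has no equality constraint, so each update is a single clipped one-variable step rather than the paired updates that SMO-style solvers need.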
(KNN)
1. Import data in the form of a CSV file using the pandas framework.
2. Using a train-test split, divide the entire dataset in a ratio of 3:1 (75% of the data for training and 25% for testing).
3. Design a KNN model with the input format in mind.
4. Use this model together with a decision tree.
5. Use the decision tree to further split the detected malware into families.
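The KNN steps describe a two-stage classifier: KNN separates benign from malicious apps, and a decision tree then splits the malware into families. The sketch below shows that arrangement using only the Python standard library. It assumes k = 3, Hamming distance on the N-bit permission vectors, and an ID3-style information-gain tree; the paper specifies none of these, and the helper names, the family labels ("sms", "spy") and the toy rows are hypothetical. Library classes such as scikit-learn's `KNeighborsClassifier` and `DecisionTreeClassifier` would be the usual practical choice.

```python
import math
from collections import Counter


def hamming(a, b):
    """Number of permission bits on which two N-bit vectors differ."""
    return sum(ai != bi for ai, bi in zip(a, b))


def knn_predict(X_train, y_train, x, k=3):
    """Majority label among the k nearest training rows (Hamming distance).

    sorted() is stable, so equally distant rows keep their training order; a tied
    vote goes to the label that appears first among those nearest rows.
    """
    nearest = sorted(range(len(X_train)), key=lambda i: hamming(X_train[i], x))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def build_tree(X, y, features=None):
    """ID3-style decision tree on 0/1 features, splitting on the highest information gain.

    A leaf is a label; an internal node is {"feature": f, 0: subtree, 1: subtree}.
    """
    if features is None:
        features = list(range(len(X[0])))
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or not features:
        return majority

    def gain(f):
        parts = [[lab for row, lab in zip(X, y) if row[f] == v] for v in (0, 1)]
        return entropy(y) - sum(len(p) / len(y) * entropy(p) for p in parts if p)

    best = max(features, key=gain)
    if gain(best) <= 0:
        return majority  # no remaining bit separates the labels
    rest = [f for f in features if f != best]
    node = {"feature": best}
    for v in (0, 1):
        sub = [(row, lab) for row, lab in zip(X, y) if row[best] == v]
        node[v] = build_tree([r for r, _ in sub], [lab for _, lab in sub], rest) if sub else majority
    return node


def tree_predict(node, x):
    """Follow the 0/1 branches until a leaf label is reached."""
    while isinstance(node, dict):
        node = node[x[node["feature"]]]
    return node


def two_stage(x, X_train, y_train, family_tree, k=3):
    """Stage 1: KNN says benign or malware; stage 2: the tree assigns a family to malware."""
    if knn_predict(X_train, y_train, x, k) != "malware":
        return "benign"
    return tree_predict(family_tree, x)


# Made-up toy data, NOT the paper's dataset.
# Bit order: SEND_SMS, READ_SMS, READ_CONTACTS, INTERNET.
X_TRAIN = [[1, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 1], [0, 0, 1, 1],
           [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
Y_TRAIN = ["malware"] * 4 + ["benign"] * 4
FAMILIES = ["sms", "sms", "spy", "spy"]  # hypothetical families of the 4 malware rows
FAMILY_TREE = build_tree(X_TRAIN[:4], FAMILIES)  # tree sees malware rows only

if __name__ == "__main__":
    for query in ([0, 0, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0]):
        print(query, "->", two_stage(query, X_TRAIN, Y_TRAIN, FAMILY_TREE))
```

Training the tree only on rows already labeled as malware means the family model never has to learn the benign class, which stage 1 already handles.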
2) Software Details
Python 3.x
TensorFlow 2.2.x
Android Studio 5
Future Scope
CONCLUSION
In this work, we have proposed using the permissions and API calls of Android applications to detect malware and malicious code on Android-based mobile platforms. Ours is a novel approach to distinguishing and detecting Android malware with different intentions. It is effective in that it can distinguish variants of Android malware that serve distinct purposes. The proposed framework extracts permissions from Android applications and combines them with API calls to characterize each application as a high-dimensional feature vector. By applying learning methods to the collected datasets, we derive classification models that classify apps as benign or malware. Experiments on real-world data demonstrate good performance of the framework for malware detection.
APPENDIX
SVM - Support Vector Machine
UI - User Interface
API - Application Programming Interface
KNN - K-Nearest Neighbors
CSV - Comma-Separated Values
RBF - Radial Basis Function
APK - Android Application Package