2020 IEEE INTERNATIONAL CONFERENCE ON

ADVANCES AND DEVELOPMENTS IN ELECTRICAL AND ELECTRONICS ENGINEERING (ICADEE 2020)

A study on deep learning approaches over Malware detection


¹PM Kavitha, ²Dr. B. Muruganantham
¹Research Scholar, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Kattankulathur, Chennai, India. E-mail: [email protected]
²Assistant Professor, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Kattankulathur, Chennai, India. E-mail: [email protected]

2020 IEEE International Conference on Advances and Developments in Electrical and Electronics Engineering (ICADEE) | 978-1-7281-9251-2/20/$31.00 ©2020 IEEE | DOI: 10.1109/ICADEE51157.2020.9368924

ABSTRACT

As technology adoption rises, the internet has grown tremendously, which in turn increases the need for data storage. While data is being uploaded or downloaded, it can be infected by different types of malware. The use of technology will become more complex as heterogeneous technologies emerge. Various machine learning mechanisms exist to support the effective use of these technologies. In this paper, we survey deep learning algorithms applied to the detection of infection. First, the basic study of the related content is discussed. Then, an overview of deep learning algorithms and categories of infections is presented. This survey presents a brief report on deep learning methodologies and malware detection.

INDEX TERMS: CNN, RNN, Static analysis, Dynamic analysis, Hybrid analysis.

I. INTRODUCTION

Deep learning is a part of the machine learning family. It is based on the neural network model of artificial intelligence. Deep learning is a technology that teaches a computer to do exactly what a human expects of it. It belongs to the AI family and follows a hierarchical learning methodology, which is broadly classified into supervised learning and unsupervised learning. Deep learning has been described as the black box of artificial intelligence [8]. A malicious program is one that replicates itself and modifies other computer programs. Anti-virus packages were designed to detect the existence of such malware by finding a match with virus definition information that is updated from time to time. This is called signature-based malware detection [1,21]. A program that causes infection is referred to as malware, a virus, spam, etc. Detection and identification of malware is the most important problem in cyber security [1]. Machine learning offers the flexibility to reduce much of the manual effort needed with traditional approaches to malware analysis, as well as increased accuracy in malware detection and classification [2]. This paper argues that deep learning algorithms are an efficient approach to various cyber security problems. Compared with many traditional methods and machine learning methods, deep learning algorithms are considered a robust way to solve problems [16]. This paper concludes that rapid and automatic detection of malware is needed to detect the new categories of malware being generated in day-to-day data storage [12].

A. RELATED WORK

The application of deep learning to security problems has been an attractive area over the years. Researchers have given a detailed description of the applications of deep learning techniques in the domain of security threat detection [1,4,29]. Deep learning techniques now play a vital role in the area of security. There are various ways in which data may get infected. [24] There is considerable noise in data storage, within which the class of each infection needs to be identified. There are a handful of major varieties of infection, namely viruses, worms, malware, etc. Work on these infections normally falls into one of two categories: detection of infection or protection from infection. Of the infections mentioned above, malware is the major infection that affects data. [12] Malware analysis is the end point of security concerns. Malware detection using a machine learning approach involves two major phases: feature extraction and automatic detection [24]. Malware detection methods fall into two categories: static analysis and dynamic analysis. The various classes of infection, and their detection using multiple deep learning functions and techniques, have been discussed in many research works.

B. NEED OF STUDY

This survey may lead to a new approach in deep learning. Many papers describe the various approaches to machine learning. In addition, various applications of machine learning algorithms have been discussed in advanced surveys. ML and data mining (DM) methods for cyber security intrusion detection have been researched [2,7,29]. Klaine et al. surveyed machine learning algorithms and their solutions in self-organizing cellular networks and gave a valuable classification and comparison of machine learning techniques [9,7,29]. [29] Machine learning techniques have been applied in various domains, but there are still a few other areas where further applications deserve attention [11]. Jun Feng Xie et al. surveyed the application of machine learning algorithms in SDN [11]. [7] Hodo et al. [19] also focus on ML-based intrusion detection systems (IDS). The main difference between [17] and [12] is the deep learning-based IDS, which is also described briefly in [12]. The static and dynamic features are integrated, and the integrated
Authorized licensed use limited to: SRM University. Downloaded on March 15,2021 at 04:57:31 UTC from IEEE Xplore. Restrictions apply.

feature vector is used for training and classification through a machine learning approach [19]. DL is known as the black box of AI [8], and it offers many additional advantages that machine learning fails to provide. Various surveys have projected the need for DL against various security threats.

II. DEEP LEARNING

A Introduction
AI has a major impact on emerging technology. AI has a few subsets, including machine learning, which serves as an introduction to deep learning [15]. DL consists of interconnected neurons. Deep learning is AI's most important technology and is composed of a collection of neurons. The structure of a neural network is processed using connected layers. A network of neurons is formed by multiple layers: an input layer, an output layer and hidden layers. All the layers between the input and output layers are referred to as hidden layers. A network joining more than two layers is described as deep. Neurons are connected with each other. The strength of the signal passed from the input layer to the next layer depends on the weight, the bias and the activation function. The more layers a network has, the more complex the learning it can perform.

B Deep Learning Approaches
Deep learning algorithms may be categorised as supervised learning, semi-supervised learning and unsupervised learning. They have also been classified into reinforcement learning and deep reinforcement learning [15]. Learning proceeds in two major phases. In the first phase, a nonlinear transformation is applied to the input data to create a statistical model of the data as output. In the second phase, the model is improved using a mathematical method known as the derivative. These two phases are repeated multiple times until the expected output is reached. The various deep learning approaches are Deep Supervised Learning, Deep Semi-Supervised Learning, Deep Unsupervised Learning, Reinforcement Learning and Deep Reinforcement Learning.

Fig 1: Simple Structure of Deep Neural Network

C Architecture of Deep Learning Network
The architecture of deep learning is broadly classified, based on the neural network, as Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) [21], Feed Forward Neural Network (FFN) and Reinforcement Learning. There are various combinations of neurons with multiple layers of neurons. The neural network architecture of DL is a combination of neurons arranged in various layers, in which multiple hidden layers lie between the input and output layers.

Convolutional Neural Network (CNN)
[15] CNN evolved in the late 80's but was held back by hardware limitations for the computation. In the 90's, a gradient-based algorithm was used with CNN and the results were promising. For 2D and 3D images, CNN captures the exact shape and dimensions of the figure and improves the handling of dimensionality. CNNs are trained with a gradient-based learning algorithm. The feature extractor and the classifier are the most important parts of a CNN. The steps followed in a CNN to obtain the optimal output are convolution, followed by max pooling, followed by flattening, which finally yields the optimal solution. Pooling normally divides the image into small pieces of data. CNN helps the machine with feature extraction. Features propagate from the lower layers to the higher layers. The output of the final feature extraction layer serves as the input to the fully connected layers.

Fig 2: Sample Structure of Convolutional Neural Network

Recurrent Neural Network (RNN)
[17] Although RNN evolved in the 80's, its full potential was realised only in the late 90's. A recurrent neural network uses its internal state as an input. An RNN is made of a single node that is applied repeatedly. RNN uses feed-forward propagation at each step. The output of one step acts as an input to the next step. It is therefore suited to processing sequential data. RNN converts independent activations into dependent activations. RNN is further classified as Fully Recurrent Network, Recursive Neural Network and Neural History. In a fully recurrent network, which resembles a multi-layered perceptron, every single feedback connection carries its own weight. Recursive neural networks generalise the linear structure into a hierarchical structure.

Artificial Neural Network (ANN)
An ANN is represented as a directed graph. Each node of the graph acts similarly to a biological neuron. The network can be expanded by adding collections of neurons either vertically or horizontally. It is a combination of layers, with multiple hidden layers present between the input and output layers. Neurons form an interconnection of nodes, and each node may be referred to as a perceptron.
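The description of a layered network, in which the signal passed on by each node depends on its weights, bias and activation function, can be made concrete with a short sketch. The Python below is only an illustration under stated assumptions: the function names (`neuron`, `layer`, `forward`), the two-input network and all weight values are hypothetical and do not come from the paper.

```python
import math

def relu(z):
    """Rectified linear unit: keeps positive values, clamps negatives to 0."""
    return max(0.0, z)

def sigmoid(z):
    """Squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    """One node (perceptron): weighted sum of inputs plus bias, then activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def layer(inputs, weight_rows, biases, activation):
    """One layer: every node sees the same inputs but has its own weights and bias."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]

def forward(inputs, layers):
    """Pass the signal through the layers in order (input -> hidden -> output)."""
    signal = inputs
    for weight_rows, biases, activation in layers:
        signal = layer(signal, weight_rows, biases, activation)
    return signal

# Hypothetical network: 2 inputs, one hidden layer of 2 nodes, 1 output node.
network = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),  # hidden layer
    ([[1.0, 1.0]], [0.0], sigmoid),                  # output layer
]
output = forward([2.0, 1.0], network)
```

Adding further `(weights, biases, activation)` entries to `network` deepens the model, which is the sense in which more layers allow more complex learning. Training, that is, adjusting the weights with a gradient-based algorithm, is deliberately left out of this sketch.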

III. MALWARE

A Classification
[8] Malware is a program that infects other programs. The most common classes of malware are:

Types of Malware | Feature | Damage | Area of Application
Virus [8],[3] | Creates infection without the awareness of the user | Performance degradation | Contagious threat
Worms [8],[26] | Standalone malicious software | Issues in storage and network performance | Contagious threat
Trojan [8],[3] | Takes unauthorised control of the computer | Steals passwords; money theft; file modification | Masked threat and technique
Rootkit [8],[23] | Masking technique | Steals passwords; installs a keylogger | Masked threat
Spyware | Keeps track of the user's system without their knowledge | Some capture the entire network; misuse of encryption keys | Financial threat
Keylogger | A typical class of spyware that records keystrokes, browser cookies and files on the drive to assemble personal details | Used in online forgeries; steals user names | Financial threat

Table I: Classification and applications of malware

B Malware Analysis Technique
Malware analysis techniques protect users from the security threats posed by various malware attacks [19]. Malware analysis has three major classes: static, dynamic and hybrid.

Static Analysis
This is also referred to as static code analysis [19]. It is used in software debugging, without executing the program.

Dynamic Analysis
Dynamic analysis executes the program to test its behaviour and learn its functionality [19]. The observations may also include IP addresses, domain names and so on.

Hybrid Analysis
This is the combination of the static and dynamic analysis techniques. Here both software debugging and testing of the software's functionality are performed.

Factors | STATIC Analysis | DYNAMIC Analysis | HYBRID Analysis
Examines | Without execution | With execution | Tight integration of both
Tools and technique | Attack modeling, source code analyser, obfuscated code detection | Network scanner, sniffer, fuzz tester, digital forensic | Mobile sandbox, Andrubis
Accuracy level | High | Low | Better than static and dynamic
Target code execution | Cannot detect new malware | Able to detect new/unknown malware | Able to detect new malware
Limitations | Limited signatures; can detect existing malware | More time and power consumption | High cost

Table II: Comparison of Analysis technique

C Malware detection Approach
Malware detection is classified into two broad categories: signature based and heuristic based. The signature-based detection approach is further classified into hash signature and byte signature. The heuristic-based detection approach is classified into static and dynamic techniques. [15], [23] Signature-based detection helps to track down the detailed log of the system and helps in classifying a malicious program. Heuristic-based detection uses rules or algorithms to detect a malicious program.
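The difference between hash signatures, byte signatures and heuristic rules can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the signature sets, the behaviour labels such as `write_remote_memory`, the threshold and the `classify` function are hypothetical and are not taken from the paper or from any real anti-virus product.

```python
import hashlib

# Hypothetical signature database, for illustration only.
KNOWN_BAD_SHA256 = {hashlib.sha256(b"EVIL-SAMPLE-1").hexdigest()}
KNOWN_BAD_BYTE_PATTERNS = [b"\xde\xad\xbe\xef", b"DROP_PAYLOAD"]

# Hypothetical behaviour labels and threshold for the heuristic rule.
SUSPICIOUS_CALLS = {"open_process", "write_remote_memory", "create_remote_thread"}
HEURISTIC_THRESHOLD = 2

def hash_signature_match(data: bytes) -> bool:
    """Hash signature: the digest of the whole file is in the known-bad set."""
    return hashlib.sha256(data).hexdigest() in KNOWN_BAD_SHA256

def byte_signature_match(data: bytes) -> bool:
    """Byte signature: a known-bad byte sequence occurs anywhere in the file."""
    return any(pattern in data for pattern in KNOWN_BAD_BYTE_PATTERNS)

def heuristic_match(observed_calls) -> bool:
    """Heuristic rule: flag a sample that makes several suspicious calls."""
    hits = SUSPICIOUS_CALLS.intersection(observed_calls)
    return len(hits) >= HEURISTIC_THRESHOLD

def classify(data: bytes, observed_calls=()) -> str:
    """Try the cheapest and most specific checks first, then the heuristic rule."""
    if hash_signature_match(data):
        return "signature:hash"
    if byte_signature_match(data):
        return "signature:byte"
    if heuristic_match(observed_calls):
        return "heuristic"
    return "clean"
```

In this sketch `classify(b"EVIL-SAMPLE-2")` returns `"clean"` even though the sample differs from a known one by a single byte. This is the limitation Table II attributes to signature-limited analysis, and it is the gap that heuristic and learning-based detectors are meant to close.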


Fig 3.1: Detection Technique

Family/Variant | Approach/Classifier Used | Features | Result Accuracy
[24] Backdoor, Worm and Trojan | Static analysis | API call sequence | 97.19%
[24] Backdoor, Worm, Packed, PUP and Trojan | Dynamic analysis | API call sequence | 99.8%
[24] Backdoor, Worm, Packed, PUP and Trojan | Hybrid analysis | API call sequence | 94.9%

Table III: Malware detection - Static, Dynamic and Hybrid analysis

D Deep Learning Algorithms in Malware detection
[1] The most popular machine learning algorithms used in the detection of infections are SVM, random forest, naive Bayes, multi-layer perceptron, KNN, AdaBoost and decision tree. However, the algorithms that are most efficient in detection are the deep learning algorithms: Restricted Boltzmann Machine (RBM), CNN, RNN, Deep Belief Network (DBN) and autoencoder. An activation function is used in CNN to transform the summed weighted input into the output.

Conclusion
Deep learning, being an extension of machine learning, follows various methodologies. Various algorithms and infection categories have been discussed. Infection-free data is the major goal in data transmission. This paper has discussed DL algorithms for malware detection and described the impact of the various algorithms on detecting infections.

Algorithm | Architecture | Formula / Activation Function
RBM | Two layered: 1. Stochastic visible unit. 2. Stochastic observable unit. | $a_i = \sum w_{ij} x_i$, where $w$ = weight of the connection between $i$ and $j$, and $x$ = 0 or 1
DBN | Multi-layered. Evidence is considered as an approximation of facts. | $P(v) = \sum P(k \mid w)\, P(v \mid k_1, w)$, where $P(k \mid w)$ = updated design of the aggregate posterior distribution
CNN | Variants of multi-layered: 1. A convolutional filter is involved. 2. Fully connected network. | ReLU. Range: [0, infinity). Function: monotonic (slope-curve)
RNN | Multi-layered perceptron, looping structure | -

Table IV: Deep Learning Algorithm classification.

Reference:

[1] Ankur Singh Bist, Kateryna Chumachenko, "Machine Learning Methods for Malware Detection and Classification," International Journal of Computer Science and Information Security (IJCSIS), Vol. 16, No. 3, March 2018.

[2] A. L. Buczak and E. Guven, "A survey of data mining and machine learning methods for cyber security intrusion detection," IEEE Communications Surveys Tuts., vol. 18, no. 2, pp. 1153–1176, 2nd Quart., 2016.

[3] D. Castelvecchi, "Can we open the black box of AI?" Nature News, vol. 538, no. 7623, p. 20, 2016.

[4] Dolly Uppal, Vishakha Mehra and Vinod Verma, "Basic survey on Malware Analysis, Tools and Techniques," International Journal on Computational Sciences & Applications (IJCSA), Vol. 4, No. 1, February 2014.

[5] Erkam Guresen, Gulgun Kayakutlu, "Definition of artificial neural networks with comparison to other networks," Procedia Computer Science 3 (2011) 426–433.

[6] E. Hodo, X. Bellekens, A. Hamilton, C. Tachtatzis, and R. Atkinson, "Shallow and deep networks intrusion detection system: A taxonomy and survey," arXiv preprint arXiv:1701.02145, 2017.

[7] Jun Feng Xie, F. Richard Yu, Tao Huang, Renchao Xie, Jiang Liu, "A Survey of Machine Learning Techniques Applied to Software Defined Networking (SDN): Research Issues and Challenges," IEEE Communications Surveys & Tutorials, vol. 21, no. 1, First Quarter 2019.

[8] M. Karresand, "Separating Trojan horses, viruses, and worms - A proposed taxonomy of software weapons," in IEEE Systems, Man and Cybernetics Society Information Assurance Workshop, 2003, pp. 127–134.

[9] P. V. Klaine, M. A. Imran, O. Onireti, and R. D. Souza, "A survey of machine learning techniques applied to self-organizing cellular networks," IEEE Communications Surveys Tuts., vol. 19, no. 4, pp. 2392–2431, 4th Quart., 2017.


[10] W. Knight, The Dark Secret at the Heart of AI. Cambridge, MA, USA: MIT Technology Review, 2017.

[11] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, 1998, 86, 2278–2324.

[12] Matilda Rhode, Pete Burnap, Kevin Jones, "Early-stage malware prediction using recurrent neural networks," Elsevier, Computers & Security, 2018, 578–594.

[13] G. A. N. Mohamed and N. B. Ithnin, "Survey on Representation Techniques for Malware Detection System," Am. J. Appl. Sci., vol. 14, no. 11, pp. 1049–1069, 2017.

[14] Mohana, Dr. S. M. Jagatheesan, "Survey on Permission Based Android Malware Detection Techniques," IJEDR 2019, Volume 7, Issue 3, ISSN: 2321-9939.

[15] Mohammed Harun Babu R, Vinayakumar R, Soman KP, "A short review on Applications of Deep learning for Cyber security."

[16] Mohammed Harun Babu R, Vinayakumar R, Soman KP, "A short review on Applications of Deep learning for Cyber security," arXiv:1812.06292.

[17] A. Patcha and J.-M. Park, "An overview of anomaly detection techniques: Existing solutions and latest technological trends," Computer Network., vol. 51, no. 12, pp. 3448–3470, 2007.

[18] Quan Le, Oisín Boydell, Brian Mac Namee, Mark Scanlon, "Deep learning at the shallow end: Malware classification for non-domain experts," ScienceDirect, DFRWS 2018 USA, Proceedings of the Eighteenth Annual DFRWS USA.

[19] P. V. Shijo, A. Salim, "Integrated static and dynamic analysis for malware detection," Elsevier, Procedia Computer Science 46 (2015) 804–811.

[20] Rami Sihwail, Khairuddin Omar, K. A. Z. Ariffin, "A Survey on Malware Analysis Techniques: Static, Dynamic, Hybrid and Memory Analysis," International Journal on Advanced Engineering Information Technology, Vol. 8 (2018), No. 4-2, ISSN: 2088-5334.

[21] R. Vinayakumar and Mamoun Alazab, "Robust Intelligent Malware Detection Using Deep Learning," IEEE Access, April 18, 2019. Digital Object Identifier 10.1109/ACCESS.2019.2906934.

[22] R. Vinaykumar and Mamoun Alazab, "Deep Learning Approach for Intelligent Intrusion Detection System," April 18, 2019. Digital Object Identifier 10.1109/ACCESS.2019.2906934.

[23] "Using Decision Trees to Improve Signature-Based Intrusion Detection," in G. Vigna, E. Jonsson, and C. Kruegel (Eds.): RAID 2003, LNCS 2820, pp. 173–191, Springer-Verlag Berlin Heidelberg, 2003.

[24] Weijie Han, Jingfeng Xue, Yong Wang, Lu Huang, Zixiao Kong, Limin, "MalDAE: Detecting and explaining malware based on correlation and fusion of static and dynamic characteristics," Elsevier, Computers & Security, 2019, 208–233.

[25] Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals, "Recurrent Neural Network Regularization," arXiv:1409.2329 [cs.NE].

[26] X. Wang, W. Yu, A. Champion, X. Fu, and D. Xuan, "Detecting worms via mining dynamic program execution," in Proceedings of the 3rd International Conference on Security and Privacy in Communication Networks, SecureComm, 2007, pp. 412–421.

[27] A. Zaki and B. Humphrey, "Unveiling the kernel: Rootkit discovery using selective automated kernel memory differencing," Virus Bull., no. September, pp. 239–256, 2014.

[28] Md Zahangir Alom, Tarek M. Taha, Chris Yakopcic, "A State-of-the-Art Survey on Deep Learning Theory and Architectures," MDPI Electronics, published 5 March 2019.

[29] "A Survey of Machine Learning Techniques Applied to Software Defined Networking (SDN): Research Issues and Challenges," IEEE Communications Surveys & Tutorials, 2018.