
A Survey of Malware Detection Using Deep Learning

Ahmed Bensaoud^{a,∗}, Jugal Kalita^{a} and Mahmoud Bensaoud^{a}

^{a} Department of Computer Science, University of Colorado Colorado Springs, CO, USA

ARTICLE INFO

Keywords: Malware Detection; Multi-task Learning; Malware Image; Generative Adversarial Networks; Mobile Malware; Convolutional Neural Network

arXiv:2407.19153v1 [cs.CR] 27 Jul 2024

ABSTRACT

The problem of malicious software (malware) detection and classification is complex, and no approach is perfect; much work remains to be done. Unlike most other research areas, standard benchmarks are difficult to find for malware detection. This paper investigates recent advances in malware detection on MacOS, Windows, iOS, Android, and Linux using deep learning (DL). We examine DL in text and image classification and the use of pre-trained and multi-task learning models to obtain high accuracy, and we ask which approach would be best if a standard benchmark dataset were available. We discuss the issues and challenges in malware detection using DL classifiers by reviewing the effectiveness of these classifiers and their inability to explain their decisions and actions to DL developers, which presents the need for Explainable Machine Learning (XAI) or Interpretable Machine Learning (IML) programs. Additionally, we discuss the impact of adversarial attacks on deep learning models, which negatively affect their generalization capabilities and result in poor performance on unseen data. We believe there is a need to train and test the effectiveness and efficiency of the current state-of-the-art deep learning models on different malware datasets. We examine eight popular DL approaches on various datasets. This survey will help researchers develop a general understanding of malware recognition using deep learning.

1. Introduction

Operating systems such as Windows, Android, Linux, and MacOS are updated every few weeks to protect against critical vulnerabilities. Malware authors, in turn, are always looking for new ways to refine their malicious code to overcome the new operating system updates. Every operating system is vulnerable. In addition, since operating systems run on desktops and servers, and even on routers, security cameras, drones, and other devices, the biggest problem is the diversity of systems to protect, because all these devices are very different.

Almost every day, there is a new story about malicious software in the news. For example, in October 2022, cyberattacks by a Russia-based hacker group known as Killnet targeted government service websites of the states of Colorado, Alabama, Alaska, Delaware, Connecticut, Florida, Mississippi, and Kansas^1. Also in 2022, hackers working on behalf of the Chinese government stole $20 million in COVID relief benefits^2. The growing exposure of sensitive data to cyber-attacks, cyber-threats, cyber-crimes, and malware needs to be countered. Fig. 1 shows the countries attacked by malware in 2023 and the top origins of this malware^3.

Researchers have used deep learning to classify malware samples since it generalizes well to unseen data. Our survey focuses on static, dynamic, and hybrid malware detection methods on Windows, Android, Linux, MacOS, and iOS. We describe the strengths and weaknesses of deep learning models for malware detection. Most recent research uses deep neural networks (DNNs) for malware classification and achieves high success. State-of-the-art DNN models have been developed against modern malware such as Zeus, Fleeceware, RaaS, Mount Locker, REvil, LockBit, Cryptesla, Snugy, and Shlayer.

The contributions of this paper are as follows:

• It gives the big picture of how hackers attack (Sections 2, 3, 4, 5).

• It presents how to generate images from malware files (Section 6).

• It discusses deep learning models for malware image classification (Section 7).

• It describes feature reduction that can improve performance (Section 8).

• It discusses transfer learning approaches in the classification of malware and what needs to improve for better performance (Section 9).

• It reviews the use of natural language processing in malware classification (Section 10).

• It presents deep learning models for cryptographic ransomware (Section 11).

• It shows how we can know whether to trust the results of a DL model using Explainable Artificial Intelligence, XAI (Section 12).

• It discusses the significant challenge to reliability and security posed by adversarial attacks on deep learning models (Section 13).

∗ Corresponding author: [email protected] (A. Bensaoud); [email protected] (J. Kalita); [email protected] (M. Bensaoud)
1 https://www.nbcnews.com/tech/security/colorado-state-websites-struggle-russian-hackers-vow-attack-rcna51012
2 https://www.nbcnews.com/tech/security/china-hacked-least-six-us-state-governments-report-says-rcna19255
3 https://attackmap.sonicwall.com/live-attack-map

Bensaoud et al.: Preprint submitted to Elsevier Page 1


Fig. 1: Worldwide attacks

In the rest of this paper, we discuss avenues for future research and examine the EfficientNet B0, B1, B2, B3, B4, B5, B6, and B7 models on malware image datasets for classification.

2. Mechanics of Malware Attacks

The hacker has one goal, which is to get malware installed onto a victim's computer. Because most computers are protected by some type of firewall, direct attacks are difficult or impossible to perform. Therefore, attackers attempt to trick the computer into running the malicious code. The most common way to do this is by using documents or executable files. For instance, a hacker may send a phishing email to the victim with a malicious document attachment or a link to a website where the malicious document is located. Once the victim opens the document, embedded exploits or scripts run and download or extract more malware. This is the real malware the hacker wants to run on the victim's system and is often something like a backdoor or ransomware. Malicious documents are therefore usually not the final piece of malware in an attack, but one of the compromise vectors used by the hacker to get onto the system. As an example, below we discuss how a PDF document can be used to initiate an attack.

2.1. PDF and Document Files

When analyzing a PDF, we find three things: Objects, which form the structure of the PDF; Keywords, which control how the PDF works; and Data, which is stored or encoded within the PDF.

• Objects are the building blocks of a PDF. Every PDF starts with a Header, which needs to be present in the first 1024 bytes of the document. Some hackers take advantage of this by putting unrelated data within the first 1024 bytes. This is a very simple technique to try to avoid signature-based detection. PDFs are composed of objects; each object holds specific data within the document or performs a specific function. Each object starts with two numbers, followed by the keyword obj, and ends with endobj. There are many kinds of objects, such as font objects, image objects, and even objects that contain metadata.

• There are many keywords that begin with a / and describe how the PDF works. Keywords related to malicious activity include /OpenAction and /AA (Additional Actions), both of which indicate an automatic action to be performed when the document is viewed^4. Such a keyword points to another object that is automatically opened or executed when the PDF is opened. Malicious PDFs have /OpenAction pointing to some malicious JavaScript, or to an object containing an exploit; whenever one opens the document, the system is automatically compromised. The /JavaScript or /JS keyword indicates the presence of JavaScript code. Malicious PDFs usually contain malicious JavaScript to launch an exploit or download additional malware. Some objects can be referred to by a /Name instead of their number. Some PDFs can have files embedded, indicated by the keywords /EmbeddedFile, /URI, or /SubmitForm. A /URI is accessed or downloaded when the object is loaded.

4 https://blog.didierstevens.com/programs/pdf-tools/
• PDFs are very flexible and can encode and store data in multiple ways, so hackers can encode and hide their data. For example, names are case sensitive but can be fully or partially hex encoded: the # sign followed by two hex characters represents one hex-encoded character. Data can also be octal encoded, that is, represented by its base-eight number: an octal-encoded character is written as a \ followed by three digits between 0 and 7. Hackers can mix hex, octal, and ASCII data together, which makes it possible to hide data such as JavaScript code or URLs.

Names and strings can be encoded, but data streams can be modified and encoded further using filters. Filters are algorithms that are applied to the data to encode or compress it within the PDF. Multiple filters can be used in PDFs, such as /ASCIIHexDecode, hex encoding of characters; /LZWDecode, the LZW compression algorithm; /FlateDecode, Zlib compression; /ASCII85Decode, ASCII base-85 representation; and /Crypt, various encryption algorithms. For example, in Fig. 2, we have a PDF document with three objects. Object 1 is a catalog that has an OpenAction referring to version 0 of Object 2, which means that as soon as the document is opened, Object 2 will be run. Object 2 contains a JavaScript keyword, but we do not see any JavaScript code in this object because the JavaScript keyword refers to another object, Object 3. Object 3 is a stream object, as indicated by the stream keyword, and has been ASCIIHex encoded and compressed with the Zlib compression algorithm. We have thus been able to determine that JavaScript will be executed as soon as the PDF opens, but we do not yet know what the JavaScript's goal is. If this is a malicious PDF, it can cause problems. In Fig. 3, the JavaScript code references two host names, performs an HTTP GET request to each, saves an executable file, and finally runs it.

Fig. 2: PDF format example.

Is malicious JavaScript used only in documents? No: it is used everywhere. Malicious JavaScript is used in web pages that are created by web attack kits that perform drive-by downloads. The user opens a website that has been compromised or loads a malicious ad, which then loads malicious JavaScript. Without JavaScript, it is difficult for hackers to get their exploit to work.

Fig. 3: Malicious JavaScript code

Most hackers try to hide what their script is doing using obfuscation techniques. Most techniques used to obfuscate scripts can be broken down into four different categories. How the format of a program is obfuscated is shown in Fig. 4; approaches include adding extra lines of code, obfuscating the data, and substituting variable names.

Fig. 4: Obfuscated malicious JavaScript code.

3. Nature of Malware Code

The nature of malware code encompasses various characteristics and behaviors that define its purpose and functionality. Malware, short for malicious software, refers to any code or program designed with malicious intent to compromise systems, steal information, or disrupt normal operations. The nature of malware code can vary depending on its specific type and objectives, but some common attributes include:

3.1. Obfuscation

Obfuscation is an attempt by the author of a piece of code to obscure its meaning, to make it unclear, or to make it very difficult to analyze. It may use encryption or compression to hide its true intentions or to evade signature-based detection by security software.
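The hex-encoded names described in Section 2.1 are a simple instance of such obfuscation: /JavaScript can be written as /J#61vaScript and still be understood by a PDF reader. The following is a minimal sketch, not a complete PDF parser, of how a static scanner might undo this encoding before counting suspicious keywords. The function names and the keyword list are illustrative assumptions; the list only repeats keywords mentioned in Section 2.1.

```python
import re

# Keywords named in Section 2.1; an illustrative list, not an exhaustive one.
SUSPICIOUS = [b"/OpenAction", b"/AA", b"/JavaScript", b"/JS",
              b"/EmbeddedFile", b"/SubmitForm"]

# A name is a '/' followed by characters that are neither whitespace nor delimiters.
_NAME = re.compile(rb"/[^\s/<>\[\]()%]+")
# A hex escape inside a name: '#' followed by exactly two hex digits.
_HEX = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(name: bytes) -> bytes:
    """Replace every #xx escape in a PDF name with the byte it encodes."""
    return _HEX.sub(lambda m: bytes([int(m.group(1), 16)]), name)


def count_keywords(pdf_bytes: bytes) -> dict:
    """Count the suspicious names in raw PDF bytes after undoing hex escapes."""
    counts = {kw.decode(): 0 for kw in SUSPICIOUS}
    for match in _NAME.finditer(pdf_bytes):
        name = normalize_name(match.group(0)).decode("latin-1")
        if name in counts:
            counts[name] += 1
    return counts


# Hypothetical two-object document; the second name is partially hex encoded.
sample = (b"1 0 obj << /Type /Catalog /OpenAction 2 0 R >> endobj\n"
          b"2 0 obj << /S /J#61vaScript /JS 3 0 R >> endobj")
print(count_keywords(sample))
```

A scanner that compared raw bytes against the keyword list without the normalization step would miss the encoded /JavaScript name in this sample.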
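Section 2.1 describes Object 3 of Fig. 2 as a stream that has been ASCII-hex encoded and compressed with Zlib. Such a filter chain can be reversed with the Python standard library. This is a minimal sketch under the assumption that the raw stream bytes have already been extracted from the object; `decode_stream`, the `DECODERS` table, and the sample payload are hypothetical, and only two of the filters listed in Section 2.1 are handled.

```python
import binascii
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """Undo /ASCIIHexDecode: hex digits, optional whitespace, '>' as end marker."""
    hex_digits = b"".join(data.split(b">")[0].split())
    if len(hex_digits) % 2:          # a missing final digit is treated as 0
        hex_digits += b"0"
    return binascii.unhexlify(hex_digits)


def flate_decode(data: bytes) -> bytes:
    """Undo /FlateDecode (zlib/deflate compression)."""
    return zlib.decompress(data)


DECODERS = {"/ASCIIHexDecode": ascii_hex_decode, "/FlateDecode": flate_decode}


def decode_stream(raw: bytes, filters: list) -> bytes:
    """Apply the decoders in the order given by the stream's filter list."""
    for name in filters:
        raw = DECODERS[name](raw)
    return raw


# Build a stream the way the Fig. 2 example is described, then reverse it.
script = b"app.alert('hello');"
encoded = binascii.hexlify(zlib.compress(script)) + b">"
print(decode_stream(encoded, ["/ASCIIHexDecode", "/FlateDecode"]))
```

Only after this decoding step does the embedded JavaScript become visible to a keyword- or signature-based check.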
3.2. Payload Delivery
Malware code typically carries a payload, which is the malicious action it intends to execute. This can range from stealing sensitive information (e.g., financial data, login credentials) to launching distributed denial-of-service (DDoS) attacks, encrypting files for ransom (ransomware), or providing backdoor access for remote control.

3.3. Command and Control (C&C)


Many malware strains establish communication
channels with remote servers or command-and-control
infrastructure. This allows attackers to remotely control
and manage the infected systems, update the malware,
and receive stolen data.
3.4. Self-Replication

Many malware strains possess the ability to self-replicate, allowing them to spread across networks, devices, or files. This replication can occur through various means, such as attaching to legitimate files, exploiting vulnerabilities, or utilizing network resources.

3.5. Exploitation

Malware leverages vulnerabilities and weaknesses in software, operating systems, or user behavior to gain unauthorized access or control. It can exploit security flaws, network vulnerabilities, or social engineering techniques to compromise systems and execute malicious actions.

3.6. Polymorphism

Some malware utilizes polymorphic or metamorphic techniques to dynamically change its code structure or appearance while preserving its functionality. This makes it more challenging for antivirus software to detect and block.

3.7. Ransomware

Ransomware usually combines cryptography with malware. How does it work? The hacker sends the file to an unknowing victim. When the victim opens the file, it executes the malware's payload and encrypts victim data such as photos, documents, multimedia, files, and even confidential records. The hacker often forces the victim to pay in cryptocurrency, in most cases Bitcoin.

The WannaCry ransomware has worm-like properties and is also known by names such as WannaCrypt, WanaCrypt0r, WCRY, WanaDecrypt0r, and WCrypt. Each encrypted file is locked by a different key, which is in turn protected with the RSA algorithm; this makes the file inaccessible to an owner who does not have the keys. The WannaCry virus can encrypt a large number of file types. An exhaustive list is given in Appendix A.

The ransomware replaces the desktop wallpaper with the ransom note file by modifying the Windows registry. It holds all files hostage to demand ransom payments of $300, and later $600, in the Bitcoin cryptocurrency, as shown in Fig. 5.

Fig. 5: Oops, your files have been encrypted!

As an example, on May 11, 2022, Costa Rica's newly elected president had to declare a state of national emergency due to a ransomware attack carried out by the Conti ransomware gang. They requested $10 million, but the demand changed to $20 million after Costa Rica refused to pay^5. As another example, in October 2022, a ransomware gang accessed data on 270,000 patients from a Louisiana hospital system^6.

5 https://securityintelligence.com/news/costa-rica-state-emergency-ransomware/
6 https://www.cnn.com/2022/12/28/politics/hackers-access-data-louisiana-hospital-system-ransomware/index.html

Understanding the nature of malware code is crucial for developing effective defense mechanisms and mitigating its impact. It enables security professionals to develop robust detection methods, implement security best practices, and respond promptly to evolving threats.

4. Overview of Malware Detection

Malware detection methods are divided into three types: static, dynamic, and hybrid [1]. Static methods inspect an executable file without running it, while dynamic methods must run the executable file and analyze its behavior inside a controlled environment. In hybrid methods, information about the malware is collected from both static and dynamic analysis.

Some security researchers use static features obtained by decompiling the target file. Naik et al. [2] proposed a fuzzy-import hashing technique based on static analysis for malware detection. Mohamad et al. [3] proposed machine learning classifiers based on permission-based features for static analysis to detect Android malware.

In contrast to static analysis, dynamic analysis includes monitoring of dynamic system behavior, snapshots, debugging, and so on. Kim et al. [4] presented a new encoding technique for dynamic features to identify anomalous events using Convolutional Neural Networks (CNNs).

Security researchers have also extracted combined features from different parts of malware files. Bai et al. [5] extracted features from static and dynamic analysis

Table 1
Syslog and Windows log

Syslog | Windows Logs
IETF standard | Event log
Timestamp | Contains source, event ID, and log level
Standard for network equipment logging | Logs application, security, and network events from a machine or server
Device-ID, severity level, message number, message text | Timestamp, user, computer, and process ID
Can be customized on network equipment for different events and severity levels | Used in most enterprise environments running Windows

of Android apps and applied a deep learning technique. Chaulagain et al. [6] presented a deep learning-based hybrid analysis technique by collecting different artifacts during static and dynamic analysis to train the deep learning models.

5. Data for Malware Detection

Numerous system logs of the activities of machines such as phones, tablets, laptops, and other devices are generated by the operating system and other infrastructure software. The data are created and stored on the local device and sent to remote servers. By analyzing log data, we can not only detect breaches or suspicious activity but also track behavior through the network. Log data allow us to track security events, troubleshoot the infrastructure, and optimize the environment and the machines. Log data can take many different forms, such as syslog, authentication logs, local security event logs, network asset logs, and system logs. One of the goals in malware detection is to be able to read, search, and analyze the data efficiently and effectively.

Table 1 lists some useful information contained in syslog and Windows logs. Both kinds of logs have many components, in different formats, that help in an investigation.

6. Generating Malware Images for Deep Learning

Several tools can visualize and edit a binary file in hexadecimal or ASCII formats, such as IDA Pro^7, x32/x64 Debugger^8, HxD^9, PE-bear^10, Yara^11, Fiddler^12, Metadata^13, XOR analysis^14, and Embedded strings^15.

7 https://hex-rays.com/ida-pro
8 https://x64dbg.com/#start
9 https://mh-nexus.de/en/hxd
10 https://hshrzd.wordpress.com/pe-bear
11 https://yara.readthedocs.io/en/stable
12 https://www.telerik.com/purchase/fiddler
13 https://www.malwarebytes.com/glossary/metadata
14 https://eternal-todo.com/var/scripts/xorbruteforcer
15 https://virustotal.github.io/yara/

A malware file or its code can be used to generate an image by converting the binary, octal, hexadecimal, or decimal values into a two-dimensional matrix of pixels. The image can be grayscale or RGB. In grayscale, pixels are black and white values in the range [0-255], where 0 represents black and 255 represents white.

Gray image feature: The machine stores images in a matrix of numbers. These numbers, or pixel values, denote the intensity or brightness of the pixel. Smaller numbers (close to zero) represent black, and larger numbers (closer to 255) denote white (see Fig. 6).

Fig. 6: Malware feature representation in grayscale image

RGB images: There are three matrices or channels (Red, Green, Blue), where each matrix has values between 0 and 255. These three colors are combined in various ways to represent one of 16,777,216 possible colors (see Fig. 7).

Fig. 7: Malware feature representation in RGB image

Malware can be converted to images in different ways. Yuan et al. [7] converted malware binaries into
Markov images by computing the transfer probability of bytes, where each pixel is generated by Equation 1:

\[
p_{m,n} = P(n \mid m) = \frac{f(m,n)}{\sum_{n=0}^{255} f(m,n)}, \qquad m, n \in \{0, 1, \ldots, 255\}. \tag{1}
\]

Mohammed et al. [8] used a vector of 16-bit signed hexadecimal numbers to represent a 256 × 256 image. Then, they computed bi-gram frequency counts, which they used as pixel intensity values. A full-frame Discrete Cosine Transform (DCT) [9] was computed to de-sparsify the result, and the bigram-DCT was used to represent the output image. Euh et al. [10] proposed the Window Entropy Map (WEM) to visualize malware as an image. They calculated the entropy for each byte to measure the degree of uncertainty. Ni et al. [11] converted malware code into gray images using SimHash [12] and then encoded them. They mapped SimHash values to pixels and then converted them to grayscale images.

7. Image Classification for Malware Detection

Deep learning can solve diverse "vision" problems, including malware image classification tasks. Deep learning can extract features automatically, obviating manual feature extraction. The content of the malware executable file is first converted into a digital image. Nataraj et al. [13] visualized the byte codes of samples from 25 malware families as grayscale images. Several visualization techniques have been used for malware classification. The basic idea used in these methods is to explore the distinguishing patterns in malware images. In addition, the visualization techniques help find the correlations among different malware families. Some existing approaches generate grayscale images and others generate RGB images. Most existing approaches use global features to generate malware images.

Yuan et al. [7] proposed a method based on Markov images according to the byte transfer probability matrix. They used a CNN to classify Markov malware images without scaling. Narayanan and Davuluru [14] proposed an ensemble approach using RNN and CNN architectures for malware image classification. Images were generated from assembly compiled files and classified using CNNs. Zhu et al. [15] proposed a Task-Aware Meta Learning-based Siamese Neural Network to classify obfuscated malware images. Their model showed high effectiveness in detecting unique malware signatures to classify obfuscated malware. Chauhan et al. [16] visualized malware files in different color modes: RGB, HSV, grayscale, and BGR. They used a support vector machine (SVM) to classify these malware images, with an accuracy of 96% in all modes. Darem et al. [17] designed a semi-supervised method based on malware images and feature engineering for obfuscated malware detection. The model achieved 99.12% accuracy on obfuscated malware detection. Asam et al. [18] proposed two malware image classification approaches called Deep Feature Space-based Malware classification (DFS-MC) and Deep Boosted Feature Space-based Malware classification (DBFS-MC). The approach achieved a good accuracy of 98.61% on the MalImg malware dataset.

Xiao et al. [19] presented a visualization method called Colored Label boxes (CoLab) to mark each section in a PE file and convert the file to a malware image. The authors built a composed CoLab image and used VGG16 and a support vector machine for classification. The model was applied to two datasets, VX-Heaven^16 and BIG-2015, with 96.59% and 98.94% average accuracies, respectively. A comparison of the reviewed malware image classification approaches is given in Table 2.

16 https://archive.org/download/vxheavens-2010-05-18

8. Feature Reduction for Efficient Malware Detection

Feature reduction reduces the number of variables or features in the representation of a data example. Approaches to feature reduction can be divided into two subcategories: a) feature selection, which includes methods such as wrappers, filters, and embedded methods, and b) feature extraction, which includes methods such as Principal Components Analysis [26]. How does feature reduction improve performance? It does so by reducing the number of features that are considered for analysis.

In feature extraction, we start with $n$ features $x_1, x_2, x_3, \ldots, x_n$, which we map to a lower dimensional space to get the new features $z_1, z_2, z_3, \ldots, z_m$, where $m < n$. Each of the new features is usually a linear combination of the original feature set $x_1, x_2, x_3, \ldots, x_n$. Thus, each new feature is obtained as a function $F(X)$ of the original feature set $X$. This projects a higher dimensional feature space onto a lower dimensional feature space, so that the smaller feature set may lead to better or faster classification (see Equation 2).

\[
\begin{bmatrix} z_1 & \ldots & z_m \end{bmatrix}^{\top} = F\left( \begin{bmatrix} x_1 & \ldots & x_n \end{bmatrix}^{\top} \right) \tag{2}
\]

In feature selection, we choose a subset of the features, in contrast to feature extraction, where we map the original features to a lower dimensional space. The smaller feature set can help produce better as well as faster classification. To perform the mapping in feature extraction, we need to find a projection matrix $W$ such that $\bar{Z} = W^{T}\bar{X}$. We expect from such a projection that the new features are uncorrelated, non-redundant, and cannot be reduced further. Next, we need the features to have large variance. Why? Because if a feature takes similar values for all the instances, that feature cannot be used as a discriminator.
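The projection with a matrix W and the preference for uncorrelated, high-variance features can be illustrated with PCA. The following is a minimal NumPy sketch on synthetic data; the function name `pca_project`, the random data, and the choice of m = 2 are assumptions for illustration. Rows of X are samples here, so the projection is written as X W rather than as W^T applied to a column vector.

```python
import numpy as np


def pca_project(X: np.ndarray, m: int):
    """Project the n-dimensional rows of X onto the m directions of largest variance."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)      # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]       # indices of the m largest eigenvalues
    W = eigvecs[:, order]                       # n x m projection matrix
    Z = X_centered @ W                          # new features, one row per sample
    return Z, W


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Make feature 3 nearly redundant with feature 0.
X[:, 3] = 2.0 * X[:, 0] + 0.01 * rng.normal(size=200)

Z, W = pca_project(X, m=2)
print(Z.shape)
print(np.round(np.cov(Z, rowvar=False), 3))     # off-diagonal entries are close to 0
```

The covariance matrix of the projected features is diagonal up to rounding, which is the "uncorrelated" property described above, and the columns of Z are ordered by decreasing variance.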
Table 2
Comparative performance summary of Transfer Learning models for malware image classification.

Reference Features Model Files Accuracy Dataset


Çayır et al. [20] gray-scale images CapsNet PE 98.63% Malimg
Çayır et al. [20] gray-scale images RCNF PE 98.72% Malimg
Go et al. [21] gray-scale images ResNeXt PE 98.32% Malimg
Bensaoud et al. [22] gray-scale images Inception V3 PE 99.24% Malimg
El-Shafai et al. [23] gray-scale images VGG16 PE 99.97% Malimg
Hemalatha et al. [24] gray-scale images DenseNet PE 98.23% Malimg
Hemalatha et al. [24] gray-scale images DenseNet PE 98.46% BIG 2015
Lo et al. [25] gray-scale images Xception PE 99.03% Malimg
Lo et al. [25] gray-scale images Xception PE 99.17% BIG 2015
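The gray-scale inputs listed in Table 2 come from the byte-to-pixel conversion of Section 6. The sketch below shows two such conversions with NumPy: a plain grayscale image in which each byte becomes one pixel, and a Markov image following Equation 1, where f(m, n) counts how often byte n follows byte m. The function names, the default width of 256, and the zero padding of the last row are assumptions for illustration, not the settings of any cited paper.

```python
import numpy as np


def bytes_to_grayscale(data: bytes, width: int = 256) -> np.ndarray:
    """Reshape raw bytes into a 2-D uint8 matrix (0 = black, 255 = white)."""
    buf = np.frombuffer(data, dtype=np.uint8)
    height = -(-len(buf) // width)                      # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)   # zero-pad the last row
    padded[: len(buf)] = buf
    return padded.reshape(height, width)


def markov_image(data: bytes) -> np.ndarray:
    """256 x 256 matrix with p[m, n] = f(m, n) / sum_n f(m, n), as in Equation 1."""
    buf = np.frombuffer(data, dtype=np.uint8)
    freq = np.zeros((256, 256), dtype=np.float64)
    np.add.at(freq, (buf[:-1], buf[1:]), 1.0)           # count byte pairs (m, n)
    row_sums = freq.sum(axis=1, keepdims=True)
    # Rows for bytes that never occur as a predecessor stay all zero.
    return np.divide(freq, row_sums, out=np.zeros_like(freq), where=row_sums > 0)


# A few hypothetical bytes standing in for the contents of an executable file.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x4D, 0x5A, 0x00, 0x00])
print(bytes_to_grayscale(sample, width=4))
print(markov_image(sample)[0x4D, 0x5A])
```

Unlike the plain grayscale image, the Markov image always has a fixed 256 × 256 size regardless of file length, which is consistent with the report above that Markov images were classified without scaling.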

Feature extraction methods such as Principal Component Analysis (PCA) [26], GIST [27], Hu Moments [28], Color Histogram [29], Haralick texture [30], Discrete Wavelet Transform (DWT) [31], Independent Component Analysis (ICA) [32], Linear Discriminant Analysis (LDA) [33], Oriented FAST and Rotated BRIEF (ORB) [34], Speeded Up Robust Feature (SURF) [35], Scale Invariant Feature Transform (SIFT) [36], Dense Scale Invariant Feature Transform (D-SIFT) [36], Local Binary Patterns (LBPs) [37], and KAZE [38] have been combined with machine learning, including deep learning. These methods successfully filter the characteristics of malware files.

Azad et al. [39] proposed a method named DEEPSEL (Deep Feature Selection) to identify malicious code from 39 unique malware families. Their model achieved an accuracy of 83.6% and an F-measure of 82.5%. Tobiyama et al. [40] proposed feature extraction based on system calls. A Recurrent Neural Network was used to extract features and a Convolutional Neural Network to classify these features.

9. Deep Transfer Learning Models for Malware Detection

Transfer learning takes place when we have a source model with some pre-trained knowledge, and this knowledge is used as the foundation to build a new model [41]. For example, using a very large pre-trained convolutional neural network usually involves saving a network that was previously trained on some large dataset, typically on a large-scale image classification task, using a dataset like ImageNet [42]. After training a network on the ImageNet dataset, we can re-purpose this trained network. Research papers have discussed applying these pre-trained networks to malware image datasets [43, 44, 45, 46] that are generated from PE and APK malware files, which are quite different from each other.

Malware image datasets are very different from ImageNet, which is normally used to pre-train the model. The ImageNet dataset and a malware image dataset contain visually completely different images. However, pre-training still seems to help. Training a machine learning algorithm on large datasets can be done in two ways, as discussed below.

9.1. Using feature extraction

Feature extraction, discussed earlier, is a practical, common, and resource-light way of using pre-trained networks. It takes the convolutional base of a previously trained network, runs the malware data through it, and then trains a new classifier on top of the output. As shown in Fig. 8, we can choose a network such as VGG16 [47] that has been trained on ImageNet, as an example. The input, fed at the bottom, passes up through the trained convolutional base, which represents the CNN region of VGG16. The trained classifier resides in the dense region, and the prediction is made by this dense region at the end. Usually, we have 1000 neurons at the end to predict the actual ImageNet classes. We take this ImageNet-trained model as the base and remove the classifier layer, keeping the convolutional layers of the pre-trained model along with their weights. In the next step, we attach on top a new classifier that has new dense layers for malware classification. The weights of the base are frozen, which means that during training the malware input passes through convolutional layers that keep their prior weights. However, all dense layers are randomly initialized, and the interconnection weights for these layers are learned during the new training process for detecting malware.

Why remove the original dense layers? It has been observed that the representations learned by the convolutional base are generic and therefore reusable for a variety of tasks, while the dense layers are specific to the classes of the original task.

9.2. Using fine tuning

Fine-tuning involves changing some of the convolutional layers by learning new weights. In Fig. 9, we have a network divided into three regions. The yellow region is a pre-trained model. The green region represents our dense layers, for which we need to learn the weights. During training with a library such as Keras [48] or TensorFlow [49], we can select certain layers and freeze the weights of those layers.

For example, we can select convolutional block one and then freeze all the weights of the convolutional layers in this block only. This means that during training, everything else will change, but the weights of
the convolutional layers in this block will not change. Similarly, we can keep frozen the convolutional layers of the next block, as well as blocks three and four, if we so wish. Then, we can fine-tune the convolutional layers that are closer to the dense layer. As a result, the initial layers of representation are kept constant, but new representations are learned by later layers (yellow region) as their weights change, evolve, and get updated.

Fig. 8: Feature Extraction for Transfer Learning

Fig. 9: Fine Tuning of Transfer Learning

Fig. 10 shows how to train the model on an image dataset. We randomly initialize the model and then train it on dataset X, which is a large-scale image dataset. This is the pre-training step. Next, we train the model on dataset Y; this dataset is typically smaller than dataset X. This is the fine-tuning step.

The state-of-the-art transfer learning models we have trained and evaluated for malware classification are EffNet B0, B1, B2, B3, B4, B5, B6, and B7 [57]; Inception-V4 [58]; Xception [59]; and CapsNet [60], as shown in Table 3. The datasets used are our RGB malware image dataset and two other datasets, namely the Malimg Dataset [13] and the Microsoft Malware Dataset [61]. The accuracy and loss curves for EffNet B1, B2, B3, B4, B5, B6, and B7 are shown in Appendix B, and those for EffNet B0 are shown in Fig. 11.

We found that the Inception-V4 model is the most effective in classifying malware images among the ten models. In addition, the training time for each model increases with the size of the input images, since the number of network cells held in GPU RAM grows quickly.

Thus, fine-tuning means unfreezing a few of the top layers of a frozen model base used for feature extraction. We jointly train the newly added top part of the model (green region), consisting of dense layers, and the top convolutional layers (yellow region), for which we have unfrozen the weights.

Why fine-tune in this manner? Because we slightly adjust the more abstract representations of the model being reused to make them more relevant for the
problem at hand. Sudhakar and Kumar [50] redesigned 9.3. Analysis of Transfer Learning for
ResNet50 [51] by changing the last layer with a fully Malware Classification
connected dense layer to detect unknown malware We found that transfer learning based image clas-
samples without feature engineering. Go et al. [52] sification, with a small number of parameters to re-
proposed a visualization approach to classify the mal- train successfully to classify malware images. On the
ware families by using a ResNeXt50 pre-trained model. other hand, we argue that scaled up wider and deeper
The model achieved 98.86% accuracy on the Mal- transfer models with more parameters builds a new
img dataset [13]. Çayır et al. [20] built an ensem- model that may improve performance. Inception-V3
ble pre-trained capsule network (CapsNet) [53] based and Inception-V4 for malware detection and classifi-
on the bootstrap aggregating approach. The model cation avoid the inefficiencies in classifying unknown
was trained and tested on two public datasets, Mal- malware grayscale and RGB images among transfer
img, and BIG2015. Their model achieved F-Score learning classification model. There are many transfer
96.6% on the Malimg dataset [13] and 98.20% on learning models techniques such as batch normalization
the BIG2015 dataset17 . Bensaoud et al. [22] used [62], skip connections [63] that are designed to help
six convolutional neural network models for malware in training, but the accuracy still needs to improve.
classification. Comparison among these models shows For instance, ResNet-101 and ResNet-50 have similar
that the transfer learning model called Inception-V3 accuracies in terms of malware detection even though
[54] achieved the current state-of-the-art in malware they have very different deep networks [64].
classification. Khan et al. [55] evaluated ResNet and
GoogleNet [56] models for malware detection by con-
verting an APK bytecode into grayscale image. Table 10. Natural Language Processing for
3 summarizes the most transfer learning models for Malware Classification
malware classification. We conclude that CNN transfer Natural Language Processing (NLP) extracts valu-
learning models can be fine-tuned to specific image able information so that a program is able to read,
sizes that are robust enough and accurate to use mal- understand and derive meaning from human language
ware image classification. text or speech. Malware data contain executable files,
17
https://www.kaggle.com/c/malware-classification Microsoft Word files, macro files, logs from different
operating systems, emails, network activities, etc. Many
8
Table 3
Fine-tuned pre-trained models applied on different malware image datasets.

                    |           Setting            |          Average Accuracy           | Our Dataset
Pre-trained Model   | Samples  Resize image  Epoch | Malimg  Microsoft Challenge  Drebin | Accuracy
EffNet B0           | 30,000   224           200   | 92.72%  90.45%               87.23% | 94.59%
EffNet B1           | 30,000   240           200   | 95.64%  93.65%               88.91% | 95.89%
EffNet B2           | 20,000   260           200   | 93.84%  91.78%               86.82% | 94.12%
EffNet B3           | 15,000   300           400   | 90.32%  94.19%               89.35% | 95.73%
EffNet B4           | 20,000   380           400   | 95.63%  96.68%               90.59% | 97.98%
EffNet B5           | 25,000   456           400   | 80.19%  87.54%               84.23% | 94.68%
EffNet B6           | 40,000   528           400   | 85.67%  83.82%               85.43% | 93.54%
EffNet B7           | 30,000   600           1000  | 82.76%  80.76%               90.57% | 88.45%
Inception V4        | 20,000   229           300   | 95.98%  93.21%               88.93% | 96.39%
Xception            | 20,000   229           200   | 89.50%  90.84%               84.39% | 93.53%
CapsNet             | 3,000    256           100   | 88.64%  72.69%               78.68% | 92.65%

Fig. 10: Transfer Learning steps
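
The pre-training and fine-tuning steps of Fig. 10 can be illustrated with a deliberately tiny model. The following is a minimal sketch in plain Python, not the Keras/TensorFlow setup discussed in the text: the two-parameter model, the synthetic datasets X and Y, the learning rate, and all function names are assumptions made only for illustration. Both parameters are trained on the larger dataset X, then the "base" parameter is frozen and only the "head" parameter is updated on the smaller dataset Y.

```python
import random


def predict(params, x):
    # Toy "network": a base feature extractor followed by a head.
    return params["head"] * (params["base"] * x)


def mse(params, data):
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)


def train(params, data, trainable, epochs=200, lr=0.05):
    """Gradient descent that updates only the parameters named in `trainable`."""
    for _ in range(epochs):
        grads = {"base": 0.0, "head": 0.0}
        for x, y in data:
            err = predict(params, x) - y
            grads["base"] += 2 * err * params["head"] * x / len(data)
            grads["head"] += 2 * err * params["base"] * x / len(data)
        for name in trainable:  # frozen parameters are simply skipped
            params[name] -= lr * grads[name]
    return params


# Random initialization (seeded so the run is repeatable).
rng = random.Random(0)
params = {"base": rng.uniform(0.5, 1.0), "head": rng.uniform(0.5, 1.0)}

# Dataset X: larger, used for pre-training (hypothetical target y = 2x).
dataset_x = [(i / 50, 2 * i / 50) for i in range(1, 51)]
# Dataset Y: smaller, used for fine-tuning (hypothetical target y = 3x).
dataset_y = [(i / 5, 3 * i / 5) for i in range(1, 6)]

# Pre-training step: every parameter is trainable.
train(params, dataset_x, trainable=("base", "head"))
base_after_pretraining = params["base"]
loss_y_before = mse(params, dataset_y)

# Fine-tuning step: the base is frozen, only the head is updated.
train(params, dataset_y, trainable=("head",))
loss_y_after = mse(params, dataset_y)
```

In a real framework the same idea is usually expressed by marking layers as non-trainable rather than by filtering parameter names; the filter here only stands in for that mechanism.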

Fig. 11: Training and testing for accuracy and loss of EfficientNetB0

of these files contain extensive amounts of text; some others contain
snippets of text mixed with code and other information. NLP can be used
to enhance malware classification due to the extensive use of text or
text-like content within malware. A critical requirement for malware
text classification is using effective text representation in the form
of text encoding. The initial step in text encoding is preprocessing by
removing redundant opcode or API fragments and discarding unnecessary
text. After tokenization, there are different types of non-sequential
text representations [65] such as Bag of Words (BoW), Term
Frequency-Inverse Document Frequency (TF-IDF) matrices, Term Document
Matrices (TDM), n-grams, one-hot encoding, ASCII representations, and
modern word embeddings such as Word2vec [66] and Sent2vec [67]. Table 4
presents
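
The encoding steps listed in Table 4 (start with the domain, extract the second level, convert to a character sequence, translate to numeric values, pad) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `encode_domain`, the padding length, and in particular the character-to-integer vocabulary are assumptions, so the integer assigned to a given character may differ from the values printed in Table 4.

```python
# Hypothetical vocabulary: 'a' -> 1 ... 'z' -> 26, then digits and '-'.
# Index 0 is reserved for padding.
VOCAB = {ch: i + 1 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz0123456789-")}
UNKNOWN = len(VOCAB) + 1  # index for any character outside the vocabulary


def encode_domain(domain, max_len=16):
    """Encode a domain name as a fixed-length list of integers."""
    labels = domain.lower().strip(".").split(".")
    # Second-level label: the part just left of the top-level domain.
    # Simplification: multi-part suffixes such as "co.uk" are not handled.
    second_level = labels[-2] if len(labels) >= 2 else labels[0]
    chars = list(second_level)                        # convert to sequence
    codes = [VOCAB.get(ch, UNKNOWN) for ch in chars]  # characters to integers
    codes = codes[-max_len:]                          # truncate if too long
    return [0] * (max_len - len(codes)) + codes       # left-pad with zeros


encoded = encode_domain("www.uccs.edu", max_len=8)
```

Left-padding keeps the characters at the end of the vector, which matches the padded row of Table 4; right-padding would work equally well as long as it is applied consistently.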
Table 4
The steps of encoding the domain by NLP.

Domain                      Notes
www.uccs.edu                Start with domain
uccs                        Extract second level
["u","c","c","s"]           Convert to sequence
[21,3,3,18]                 Translate character to numeric values
[0,0,0,....,0,21,3,3,18]    Pad sequence

text representation methods used in malware classification. Current word
embeddings, when used in malware classification, do not carry much
semantic and contextual significance. Bensaoud and Kalita [68] proposed
a novel model for malware classification using API calls and opcodes,
incorporating a combined Convolutional Neural Network and Long
Short-Term Memory architecture. By transforming features into N-gram
sequences and experimenting with various deep learning architectures,
including Swin-T and Sequencer2D-L, the method achieves a high accuracy
of 99.91%, surpassing state-of-the-art performance. Mimura and Ito [69]
designed NLP-based malware detection by using printable ASCII strings.
The model can effectively detect packed malware and anti-debugging
techniques. Sequence to Sequence neural models are commonly used for
natural language processing and are therefore used for malware detection
as well.

10.1. Sequence to Sequence Neural Models
Attention mechanism [70] has achieved high performance in sequential
learning applications such as machine translation [71], image
recognition [72], text summarization [73], and text classification [74].
Attention mechanism was designed to improve the performance of the
encoder-decoder machine translation approach [75]. The encoder and
decoder are usually many stacked RNN layers such as LSTM, as shown in
Fig. 12. The encoder converts the text into a fixed-length vector while
the decoder generates the translation text from this vector. The
sequence {x_1, x_2, ..., x_n} can either be a representation of text or
image, as shown in Fig. 13. In case of sequences, Recurrent Neural
Networks (RNNs) can take two sequences with the same or arbitrary
lengths. In Fig. 14, the encoder creates a compressed representation of
the input, called the context vector, while the decoder gets the context
vector to generate the output sequence. In this approach, the network is
incapable of remembering dependencies in long sentences. This is because
the context vector needs to handle potentially long sentences, and a
short overall representation does not have the capacity to store many
potential dependencies.
Attention in encoder-decoder: Bahdanau et al. [76] proposed an
encoder-decoder attention mechanism framework for machine translation. A
single fixed context vector is created by an RNN by encoding the input
sequence. Rather than using just the fixed vector, we can also use each
state of the encoder along with the current decoder state to generate a
dynamic context vector. There are two benefits; the first benefit is
encoding information contained in a sequence of vectors, not just in one
single context vector. The second benefit is to choose a subset of these
vectors adaptively while decoding the translation.

Fig. 12: Encoder and decoder

Fig. 13: Encoder and decoder include RNNs

Fig. 14: Encoder and decoder include RNNs with attention mechanism

An attention mechanism is another Lego block that can be used in any
deep learning model. Vaswani et al. [77] showed that an attention
mechanism is apparently the only Lego block one needs. It improved the
performance of a language translation model by dynamically choosing
important parts of the input sequence that matter at a certain point in
the output sequence. We can
entirely replace traditional Recurrent Neural Network (RNN) blocks with
an attention mechanism block. When dealing with sequential data, the
attention mechanisms allow models to not only perform better but also
train faster.
Applying attention mechanism in malware classification: Or-Meir et al.
[78] added an attention mechanism to an LSTM model, which improved
accuracy in malware classification. Yakura et al. [79] proposed a method
using a Convolutional Neural Network with Attention Mechanism for
malware image classification. Mimura and Ohminami [80] proposed a
sliding local attention mechanism model (SLAM) based on API execution
sequence. Ma et al. [81] proposed a malware classification framework
(ACNN) based on two sections within the malware text, the assembly code
and binary code, and converted them into multi-dimensional features. A
CNN with attention mechanism for classification has a higher malware
image classification accuracy than conventional methods [79]. To build
predictive models using LSTM and attention mechanism for malware
classification, we need to add an embedding layer followed by an LSTM
layer and dense layers. This approach is superior to capturing long
Windows API call sequences and using them directly [82] (see Fig. 15).
Longer malware sequences can be handled by attention mechanisms that can
help detect short repeating patterns and other dependencies [83]. While
attention mechanism improves accuracy, it suffers from heavy
computation.

Fig. 15: LSTM with attention mechanism for malware classification

Table 5 shows various approaches and their corresponding accuracies. The
methods presented, including MAPAS, MaMaDroid, Deep Generative Model,
DeepWare, Multi-Modal Deep Learning, and Deep Multi-Task Learning,
employ diverse techniques such as API call graph analysis, static
analysis, and hybrid deep generative models. Note that these methods are
evaluated on distinct datasets, so the comparisons are not based on the
same dataset. The authors aim to convey the effectiveness of these
models in detecting malware across different datasets and scenarios.
However, a comprehensive overview of the comparative performance of
these methods is needed, highlighting their strengths and capabilities
in addressing the challenges of malware detection.

11. Deep Learning for Cryptographic Ransomware
Cryptography has been used traditionally for military and government
use, to keep secrets from the enemy. Today most of us use cryptography
when we use commercial websites or services. For example, we use it to
protect our emails. A lot of countries try to control the export of
cryptography to make sure that good cryptographic algorithms are not in
the hands of criminals, enemies, or adversaries. This is the idea behind
export administration and regulations as codified in the International
Traffic in Arms Regulations (ITAR) (footnote 18). In addition, we have
various agreements like the Wassenaar Arrangement (footnote 19), where a
number of countries got together and developed an agreement for what
cryptographic elements can be exported and imported without any type of
restrictions. This agreement allows publicly available cryptographic
algorithms to be distributed freely. Cryptography provides various
security capabilities for us.

18 https://csrc.nist.gov/glossary/term/itar
19 https://www.federalregister.gov/documents/2022/08/15/2022-17125/implementation-of-certain-2021-wassenaar-arrangement-decisions-on-four-section-1758-technologies
20 https://csrc.nist.gov/glossary/term/sha
21 https://csrc.nist.gov/glossary/term/md5

• Confidentiality: To protect our intellectual property from somebody
  else being able to get hold of it.
• Non-repudiation: To repudiate is to deny. For example, if we use
  digital signatures, we can provide proof that the message came from
  the person who signed. We can link the signed document to a trusted
  person, which gives us trust or assurance in the world of e-contracts
  and e-commerce. The signer cannot repudiate or deny being the source
  of the document.
• Integrity: Hashing provides integrity, to know that a message was not
  changed either accidentally or intentionally as it was transmitted or
  stored. Integrity is built into the implementation of electronic
  communication services today using hash functions such as the SHA
  algorithms (footnote 20) and MD5 (footnote 21).
• Proof-of-Origin: Cryptography can be used to prove where a message
  came from, the idea of Proof-of-Origin.
• Authenticity: The idea is to ensure that communication is with the
  intended person. For example, if we go to a bank's website, then we
  want to be sure that the website is truly of that
Table 5
State-of-the-art deep learning models.

Ref                       | Deep Learning Approach     | OS                             | Features                                                      | Accuracy
Kim et al. [84]           | MAPAS                      | Android                        | API call graphs                                               | 91.27%
Onwuzurike et al. [85]    | MaMaDroid                  | Android                        | API calls                                                     | 84.99%
Kim and Cho [86]          | Deep Generative Model      | Android                        | Dalvik code, API call, Malware images, developers' signature  | 97.47%
Olani et al. [87]         | DeepWare                   | Windows/Linux                  | HPC                                                           | 96.8%
Lian et al. [88]          | Multi-Modal Deep Learning  | Windows                        | Grayscale image, Byte/Entropy Histogram                       | 97.01%
Bensaoud and Kalita [89]  | Deep multi-task learning   | Windows, Android, Linux, MacOS | Grayscale, color image                                        | 99.97%

  bank, not that of an impostor or somebody else masquerading as that
  bank.

11.1. Operations of Cryptography
Cryptographic algorithms come in three basic flavors: Symmetric,
Asymmetric, and Hash algorithms. Each of these different types of
algorithms serves a different purpose, but all work together in a
cryptography system.
Cryptography is key to keeping communicated information secret by
converting it into an unreadable code that is hard to break. To encrypt
or encipher is to take a plaintext message and convert it into something
unreadable to anyone who does not have a key. To decrypt or decipher is
the reverse step.
In Fig. 16, the basic action includes plaintext being fed into a
cryptosystem. This process is used to encrypt and decrypt a message. It
contains an algorithm that uses a mathematical process to convert a
message from plaintext to ciphertext and then back again. The algorithm
includes a key or a cryptovariable. The variable is used by the
algorithm during the encryption and decryption processes. Typically the
key is a secret password, passphrase, or PIN chosen either by the person
or by the tool that encrypts the message. This combination of the key
(or a cryptovariable) and the algorithm in the cryptosystem produces a
unique ciphertext.
In the symmetric algorithm family, a symmetric key is one that is a
shared secret between the sender and receiver of the information. The
same key used for encryption is also used for decryption. It is not safe
to send a copy of the key along with the message that it encrypts. We
need to use another mode of communication to transmit the key. For
example, Ahmed sends the symmetric key to Bryan using a certain secure
mode of communication. Once Bryan has the key, Ahmed can encrypt the
plaintext message into ciphertext and send it over a public network to
Bryan with confidence that it will remain encrypted until Bryan decides
to decrypt with the received key.

Fig. 16: Crypto Action

Multiple attacks, such as a man-in-the-middle attack, brute force
attack, biclique attack, ciphertext only attack, known plaintext attack,
chosen plaintext attack, chosen ciphertext attack, and chosen text
attack can discover the key to find the plaintext. Attackers know the
mathematical relationship of the keys for some algorithms, such as
Advanced Encryption Standard (AES) [90], Triple DES [91], Blowfish [92],
and Rivest-Shamir-Adleman (RSA) [93]. We perform cryptanalysis using
statistical measures to try to get the cipher type, but a cryptanalyst
can only try a limited number of solvers via trial and error to test
whether the ciphertext was encrypted using a specific cipher. Machine
learning can tell us what type a cipher is [94]. The cipher type
detection problem is a classification problem. We can use statistical
values as features for machine learning.

11.2. Connection Between Deep Learning and Cryptography
A neural network can deal with the complexity of computation applied to
perform cryptography. Instead of giving an image to a neural network, we
can give ciphertext to the neural network to classify the kind of
algorithm that was used to obtain the ciphertext, as shown in Fig. 17.
To build a machine learning model, we can represent different features
of the cipher, which cryptanalysts usually use to identify them. We
Fig. 17: Cryptocurrency malware detection using machine learning

need to put an intermediate layer between the network and ciphertext
that computes the features, such as Unigram frequencies, Bigram
frequencies, Index of Coincidence (IoC), HasDoubleLetters, etc., and
then we can train the network with millions of ciphertexts and all
American Cryptogram Association (ACA) cipher types. For example, in Fig.
18, the three blue neural networks are given the frequencies of N-grams
(1-grams, 2-grams, 3-grams, 4-grams, etc.), and the green neural network
computes HasDoubleLetters. Then we have a hidden layer that connects the
input and output layers. Finally, in this case the designed neural
network shows 90% Seriated Playfair, and the green neural network shows
10% Bazeries. Baksi [95] designed a machine-learning model for
differential attacks on the non-Markov 8-round GIMLI cipher and GIMLI
hash. They applied multi-layer perceptron (MLP), Convolutional Neural
Networks (CNN), and Long Short-Term Memory (LSTM).
The ransomware families that encrypt data and force the victim to make
payment via cryptocurrency include WannaCry, Locky, Stop, CryptoJoker,
CryptoWall, TeslaCrypt, Dharma, Locker, Cerber, and GandCrab. Recently,
deep learning algorithms have been used for cryptography [96]. Ding et
al. [97] proposed DeepEDN to fulfill the process of encrypting and
decrypting medical images. Kim et al. [98] proposed detection of
cryptographic ransomware using a Convolutional Neural Network. Their
model prevents crypto-ransomware infection by detecting a block cipher
algorithm. Sharmeen et al. [99] proposed an approach to extract the
intrinsic attack characteristics of unlabeled ransomware samples using a
deep learning-based unsupervised learned model. Fischer et al. [100]
designed a tool to detect security vulnerabilities of cryptographic APIs
in Android, achieving an average AUC-ROC of 99.2%.

12. Explainable Artificial Intelligence (XAI)
Explainable Artificial Intelligence (XAI) is a rapidly emerging field
that focuses on creating transparent and interpretable models (see Fig.
19). In the context of malware detection, XAI can help security experts
and analysts understand how a machine learning model arrived at its
decisions, making it easier to identify and understand false positives
and false negatives. By applying XAI techniques, such as Local
Interpretable Model-Agnostic Explanations (LIME) [101] or Deep Learning
Important Features (DeepLIFT) [102], security teams can gain insights
into the most important features and decision-making processes of the
model. This can help them identify areas where the model may be
vulnerable to evasion or identify new malware strains that the model may
have missed. Ultimately, XAI can improve the trustworthiness and
reliability of machine learning models for malware detection, enabling
more effective threat detection and response.
Nadeem et al. [103] provided a comprehensive survey and analysis of the
current state of research on explainable machine learning (XAI)
techniques for computer security applications. The paper highlights the
Fig. 18: Detect the Cipher Type With Neural Networks
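
The statistical features that Section 11.2 proposes to feed into the cipher-type classifier (unigram frequencies, bigram frequencies, Index of Coincidence, HasDoubleLetters) are simple to compute. The sketch below is a minimal illustration in plain Python under stated assumptions: the function names, the restriction to the letters A-Z, the reading of HasDoubleLetters as "two identical adjacent letters", and the layout of the feature vector are choices made here, not details taken from the cited work.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase


def letters_only(ciphertext):
    """Keep only A-Z (upper-cased); spaces, digits and punctuation are dropped."""
    return [c for c in ciphertext.upper() if c in ALPHABET]


def unigram_frequencies(letters):
    """Relative frequency of each of the 26 letters, in alphabetical order."""
    counts = Counter(letters)
    total = len(letters) or 1
    return [counts[c] / total for c in ALPHABET]


def bigram_frequencies(letters):
    """Relative frequency of each adjacent letter pair that actually occurs."""
    pairs = Counter(a + b for a, b in zip(letters, letters[1:]))
    total = sum(pairs.values()) or 1
    return {pair: n / total for pair, n in pairs.items()}


def index_of_coincidence(letters):
    """Probability that two letters drawn without replacement are equal."""
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def has_double_letters(letters):
    """True if any letter is immediately followed by the same letter."""
    return any(a == b for a, b in zip(letters, letters[1:]))


def cipher_features(ciphertext):
    """Assumed layout: 26 unigram frequencies, then IoC, then HasDoubleLetters."""
    letters = letters_only(ciphertext)
    return unigram_frequencies(letters) + [
        index_of_coincidence(letters),
        float(has_double_letters(letters)),
    ]
```

A vector such as `cipher_features(ciphertext)` would then be the input to the classifier; bigram and higher-order n-gram frequencies can be appended in the same way at the cost of a much larger input layer.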

Fig. 19: Using explainable artificial intelligence in deep learning

challenges and opportunities for adopting XAI in the security domain and
discusses several approaches for designing and evaluating explainable
machine learning models. Vivek et al. [104] proposed an approach for
detecting ATM fraud using explainable artificial intelligence (XAI) and
causal inference techniques. They presented a detailed analysis of the
proposed method and highlighted its effectiveness in improving the
accuracy and interpretability of ATM fraud detection systems. Kinkead et
al. [105] proposed an approach that uses LIME to identify important
locations in the opcode sequence that are deemed significant by the
Convolutional Neural Network (CNN). McLaughlin et al. [106] used LRP
[107] and DeepLIFT [102] methods to identify the opcode sequences for
most malware families, and they demonstrated that the CNN, while using
the DAMD dataset, learned patterns from the underlying opcode
representation. Hooker et al. [108] proposed a method to remove relevant
features detected by an XAI approach and verify the accuracy
degradation. Lin et al. [109] presented seven different XAI methods and
automated the evaluation of the correctness of explanation techniques.
The first four XAI methods are white-box approaches to determine the
importance of input features: Backpropagation (BP), Guided
Backpropagation (GBP), Gradient-weighted Class Activation Mapping
(GCAM), and Guided GCAM (GGCAM). The last three are black-box approaches
that observe an essential feature in the output probability using
perturbed samples of the input: Occlusion Sensitivity (OCC), Feature
Ablation (FA), and Local Interpretable Model-Agnostic Explanations
(LIME).
Guo et al. [110] proposed an approach called Explaining Deep Learning
based Security Applications (LEMNA) for security applications, which
generates interpretable features to explain how input samples are
classified. Kuppa and Le-Khac [111] presented a comprehensive analysis
of the vulnerability of XAI methods to adversarial attacks in the
context of cybersecurity, discussing potential risks associated with
deploying XAI models in real-world applications, and proposing a
framework for designing robust and secure XAI systems. Rao and Mane
[112] proposed an approach to protect and analyze systems against the
alarm-flooding problem using the NSL-KDD dataset. They included a
Security Information and Event Management (SIEM) system to generate a
zero-shot method for detecting alarm labels specific to adversarial
attacks. Although explainable artificial intelligence (XAI) has gained
significant attention, its effectiveness in malware detection still
requires further investigation to fully comprehend its performance.

13. Adversarial Attack on Deep Neural Networks
Adversarial examples refer to maliciously crafted inputs to machine
learning models designed to deceive the model into making incorrect
predictions. Deep detection in this context refers to the use of deep
learning
models for detecting and classifying objects or patterns in the input
data. Adversarial examples can be specifically crafted to evade deep
detection models and cause them to misclassify or miss the target
objects or patterns. Therefore, adversarial examples can be seen as a
type of attack on deep detection models. Adversarial examples can be
generated using a variety of techniques, including optimization-based
approaches and perturbation-based approaches, and can be used for
various objectives, including evasion attacks and poisoning attacks.
Zhong et al. [113] proposed a novel adversarial malware example
generation method called Malfox, which uses conditional generative
adversarial networks (conv-GANs) to generate camouflaged adversarial
examples against black-box detectors. The presented method was evaluated
on two real-world malware detection systems, and the results showed that
Malfox achieved high attack success rates while maintaining low
detection rates. Zhao et al. [114] proposed a new method called SAGE for
steering the adversarial generation of examples with accelerations. The
technique combines the advantages of gradient-based and gradient-free
methods to generate more effective and efficient adversarial examples.
The development of defense mechanisms against adversarial attacks is a
computationally expensive process, which can potentially affect the
performance of the deep learning model. In addition, adversarial
examples can impact the generalization ability of deep learning models,
resulting in poor performance on new and unseen data. Moreover,
generating adversarial examples can be computationally intensive,
especially for large datasets and complex models, which can hinder the
practical deployment of deep learning models in real-world applications.
Thus, further research is required to improve the efficiency and
effectiveness of defense mechanisms, as well as the generalization
ability and robustness of deep learning models to adversarial attacks.
Hu and Tan [115] proposed a method to generate adversarial malware
examples using Generative Adversarial Networks (GANs) for black-box
attacks. Their results show that the generated adversarial malware
samples can evade detection by existing machine learning models while
maintaining high similarity to the original malware. Ling et al. [116]
conducted a survey of the state-of-the-art in adversarial attacks
against Windows PE malware detection, covering various types of attacks
and defense mechanisms. The authors also provided insights on potential
future research directions in this area. Xu et al. [117] proposed a
semi-black-box adversarial sample attack framework called Ofei that can
generate adversarial samples against Android apps deployed on a DLAAS
platform. The framework utilizes a multi-objective optimization
algorithm to generate robust and stealthy adversarial samples. Qiao et
al. [118] proposed an adversarial detection method for ELF malware using
model interpretation and showed that their method can effectively
identify adversarial ELF malware with high accuracy. The proposed
approach combines random forests and LIME to identify the most important
features and thus improve the interpretability and robustness of the
model. Meenakshi and Maragatham [119] proposed a defensive technique
using Curvelet transform to recognize adversarial iris images,
optimizing the image classification accuracy. The designed method was
shown to be effective against several existing adversarial attacks on
iris recognition systems. Pintor et al. [120] introduced a method for
debugging and improving the optimization of adversarial examples by
identifying and analyzing the indicators of attack failure. The proposed
method can help to improve the robustness of deep learning models
against adversarial attacks.

14. Conclusion
Machine learning has started to gain the attention of malware detection
researchers, notably in malware image classification and cipher
cryptanalysis. However, more experimentation is required to understand
the capabilities and limitations of deep learning when used to
detect/classify malware. Deep learning can reduce the need for static
and dynamic analysis and discover suspicious patterns. In the future,
researchers may consider developing more accurate, robust, scalable, and
efficient deep learning models for malware detection systems for various
operating systems. Finally, multi-task learning and transfer learning
can provide valuable results in classifying all types of malware.
Furthermore, we show that the significant challenges of deep learning
approaches that need to be considered are hyperparameter optimization,
fine-tuning, and size and quality of datasets when features are
overweighted or overrepresented. We also illustrate the opportunities
and challenges of XAI in deep learning as well as future research
directions in the context of malware detection. Finally, we presented
the idea of adversarial attacks on deep neural networks by introducing
small, carefully crafted perturbations to input data in order to cause
misclassification or reduce model performance.

References
[1] A. Damodaran, F. Di Troia, C. A. Visaggio, T. H. Austin, M. Stamp, A comparison of static, dynamic, and hybrid analysis for malware detection, Journal of Computer Virology and Hacking Techniques 13 (2017) 1–12.
[2] N. Naik, P. Jenkins, N. Savage, L. Yang, T. Boongoen, N. Iam-On, Fuzzy-import hashing: A static analysis technique for malware detection, Forensic Science International: Digital Investigation 37 (2021) 301139.
[3] Mohamad, J. Arif, M. F. Ab Razak, S. Awang, S. R. Tuan Mat, N. S. N. Ismail, A. Firdaus, A static analysis approach for android permission-based malware detection systems, PloS one 16 (2021) e0257968.
[4] T. Kim, S. C. Suh, H. Kim, J. Kim, J. Kim, An encoding technique for cnn-based network anomaly detection, in: 2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 2960–2965. doi:10.1109/BigData.2018.8622568.
[5] Y. Bai, Z. Xing, D. Ma, X. Li, Z. Feng, Comparative analysis of feature representations and machine learning methods in
android family classification, Computer Networks 184 (2021) 107639.
[6] D. Chaulagain, P. Poudel, P. Pathak, S. Roy, D. Caragea, G. Liu, X. Ou, Hybrid analysis of android apps for security vetting using deep learning, in: 2020 IEEE Conference on Communications and Network Security (CNS), 2020, pp. 1–9. doi:10.1109/CNS48642.2020.9162341.
[7] B. Yuan, J. Wang, D. Liu, W. Guo, P. Wu, X. Bao, Byte-level malware classification based on markov images and deep learning, Computers & Security 92 (2020) 101740.
[8] T. M. Mohammed, L. Nataraj, S. Chikkagoudar, S. Chandrasekaran, B. Manjunath, Malware detection using frequency domain-based image visualization and deep learning, arXiv preprint arXiv:2101.10578 (2021).
[9] S. A. Khayam, The discrete cosine transform (dct): theory and application, Michigan State University 114 (2003) 1–31.
[10] S. Euh, H. Lee, D. Kim, D. Hwang, Comparative analysis of low-dimensional features and tree-based ensembles for malware detection systems, IEEE Access 8 (2020) 76796–76808.
[11] S. Ni, Q. Qian, R. Zhang, Malware identification using visualization images and deep learning, Computers & Security 77 (2018) 871–885.
[12] M. S. Charikar, Similarity estimation techniques from rounding algorithms, in: Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, 2002, pp. 380–388.
[13] L. Nataraj, S. Karthikeyan, G. Jacob, B. S. Manjunath, Malware images: visualization and automatic classification, in: Proceedings of the 8th international symposium on visualization for cyber security, 2011, pp. 1–7.
[14] B. N. Narayanan, V. S. P. Davuluru, Ensemble malware classification system using deep neural networks, Electronics 9 (2020) 721.
[15] J. Zhu, J. Jang-Jaccard, A. Singh, P. A. Watters, S. Camtepe, Task-aware meta learning-based siamese neural network for classifying obfuscated malware, arXiv preprint arXiv:2110.13409 (2021).
[25] W. W. Lo, X. Yang, Y. Wang, An xception convolutional neural network for malware classification with transfer learning, in: 2019 10th IFIP International Conference on New Technologies, Mobility and Security (NTMS), 2019, pp. 1–5. doi:10.1109/NTMS.2019.8763852.
[26] N. Barath, D. Ouboti, M. Temesguen, Pattern recognition algorithms for malware classification, in: proceeding of 2016 IEEE Conference of Aerospace and Electronics, 2016, pp. 338–342.
[27] A. Oliva, A. Torralba, Modeling the shape of the scene: A holistic representation of the spatial envelope, International journal of computer vision 42 (2001) 145–175.
[28] M.-K. Hu, Visual pattern recognition by moment invariants, IRE transactions on information theory 8 (1962) 179–187.
[29] M. J. Swain, D. H. Ballard, Color indexing, International journal of computer vision 7 (1991) 11–32.
[30] W.-C. Lin, J. Hays, C. Wu, V. Kwatra, Y. Liu, A comparison study of four texture synthesis algorithms on regular and near-regular textures, Tech. Rep. (2004).
[31] K. Kancherla, J. Donahue, S. Mukkamala, Packer identification using byte plot and markov plot, Journal of Computer Virology and Hacking Techniques 12 (2016) 101–111.
[32] J. Herault, C. Jutten, Space or time adaptive signal processing by neural network models, in: AIP conference proceedings, American Institute of Physics, 1986, pp. 206–211.
[33] Z. Fan, Y. Xu, D. Zhang, Local linear discriminant analysis framework using sample neighbors, IEEE Transactions on Neural Networks 22 (2011) 1119–1132.
[34] E. Rublee, V. Rabaud, K. Konolige, G. Bradski, Orb: An efficient alternative to sift or surf, in: 2011 International Conference on Computer Vision, 2011, pp. 2564–2571. doi:10.1109/ICCV.2011.6126544.
[35] H. Bay, T. Tuytelaars, L. Van Gool, Surf: Speeded up robust features, in: European conference on computer vision, Springer, 2006, pp. 404–417.
[36] D. G. Lowe, Object recognition from local scale-invariant features, in: Proceedings of the seventh IEEE international conference on computer vision, volume 2, Ieee, 1999, pp.
[16] D. Chauhan, H. Singh, H. Hooda, R. Gupta, Classification 1150–1157.
of malware using visualization techniques, in: International [37] T. Ojala, M. Pietikäinen, D. Harwood, A comparative study
Conference on Innovative Computing and Communications, of texture measures with classification based on featured dis-
Springer, 2022, pp. 739–750. tributions, Pattern recognition 29 (1996) 51–59.
[17] A. Darem, J. Abawajy, A. Makkar, A. Alhashmi, S. Alanazi, [38] P. F. Alcantarilla, A. Bartoli, A. J. Davison, Kaze features, in:
Visualization and deep-learning-based malware variant detec- European conference on computer vision, Springer, 2012, pp.
tion using opcode-level features, Future Generation Computer 214–227.
Systems 125 (2021) 314–323. [39] M. A. Azad, F. Riaz, A. Aftab, S. K. J. Rizvi, J. Arshad,
[18] M. Asam, S. J. Hussain, M. Mohatram, S. H. Khan, T. Jamal, H. F. Atlam, Deepsel: A novel feature selection for early
A. Zafar, A. Khan, M. U. Ali, U. Zahoora, Detection of ex- identification of malware in mobile applications, Future
ceptional malware variants using deep boosted feature spaces Generation Computer Systems 129 (2022) 54–63.
and machine learning, Applied Sciences 11 (2021) 10464. [40] S. Tobiyama, Y. Yamaguchi, H. Shimada, T. Ikuse, T. Yagi,
[19] M. Xiao, C. Guo, G. Shen, Y. Cui, C. Jiang, Image-based Malware detection with deep neural network using process
malware classification using section distribution information, behavior, in: 2016 IEEE 40th Annual Computer Software and
Computers & Security 110 (2021) 102420. Applications Conference (COMPSAC), volume 2, 2016, pp.
[20] A. Çayır, U. Ünal, H. Dağ, Random capsnet forest model for 577–582. doi:10.1109/COMPSAC.2016.151.
imbalanced malware type classification task, Computers & [41] R. Ye, Q. Dai, Implementing transfer learning across different
Security 102 (2021) 102133. datasets for time series forecasting, Pattern Recognition 109
[21] J. H. Go, T. Jan, M. Mohanty, O. P. Patel, D. Puthal, (2021) 107617.
M. Prasad, Visualization approach for malware classification [42] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma,
with resnext, in: 2020 IEEE Congress on Evolutionary Com- Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., Im-
putation (CEC), 2020, pp. 1–7. doi:10.1109/CEC48606.2020. agenet large scale visual recognition challenge, International
9185490. journal of computer vision 115 (2015) 211–252.
[22] A. Bensaoud, N. Abudawaood, J. Kalita, Classifying malware [43] D. Vasan, M. Alazab, S. Wassan, H. Naeem, B. Safaei,
images with convolutional neural network models, Interna- Q. Zheng, Imcfn: Image-based malware classification using
tional Journal of Network Security 22 (2020) 1022–1031. fine-tuned convolutional neural network architecture, Com-
[23] W. El-Shafai, I. Almomani, A. AlKhayer, Visualized mal- puter Networks 171 (2020) 107138.
ware multi-classification framework using fine-tuned cnn- [44] E. Rezende, G. Ruppert, T. Carvalho, F. Ramos, P. De Geus,
based transfer learning models, Applied Sciences 11 (2021). Malicious software classification using transfer learning of
[24] J. Hemalatha, S. A. Roseline, S. Geetha, S. Kadry, R. Damaše- resnet-50 deep neural network, in: 2017 16th IEEE Inter-
vičius, An efficient densenet-based deep learning model for national Conference on Machine Learning and Applications
malware detection, Entropy 23 (2021) 344. (ICMLA), IEEE, 2017, pp. 1011–1014.

16
[45] N. Bhodia, P. Prajapati, F. Di Troia, M. Stamp, Transfer learn- [64] S. Eum, H. Lee, H. Kwon, Going deeper with cnn in
ing for image-based malware classification, arXiv preprint malicious crowd event classification, in: Signal Processing,
arXiv:1903.11551 (2019). Sensor/Information Fusion, and Target Recognition XXVII,
[46] Y. Qiao, B. Zhang, W. Zhang, Malware classification method volume 10646, International Society for Optics and Photonics,
based on word vector of bytes and multilayer perception, in: 2018, p. 1064616.
ICC 2020-2020 IEEE International Conference on Communi- [65] D. Jurafsky, J. H. Martin, Speech and Language Processing,
cations (ICC), IEEE, 2020, pp. 1–6. 3rd ed., Prentice Hall, 2021.
[47] K. Simonyan, A. Zisserman, Very deep convolutional net- [66] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation
works for large-scale image recognition, arXiv preprint of word representations in vector space, arXiv preprint
arXiv:1409.1556 (2014). arXiv:1301.3781 (2013).
[48] N. Ketkar, E. Santana, Deep learning with Python, volume 1, [67] M. Pagliardini, P. Gupta, M. Jaggi, Unsupervised learning
Springer, 2017. of sentence embeddings using compositional n-gram features,
[49] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, arXiv preprint arXiv:1703.02507 (2017).
M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., Ten- [68] A. Bensaoud, J. Kalita, Cnn-lstm and transfer learning models
sorflow: A system for large-scale machine learning, in: for malware classification based on opcodes and api calls,
12th {USENIX} symposium on operating systems design and Knowledge-Based Systems (2024) 111543.
implementation ({OSDI} 16), 2016, pp. 265–283. [69] M. Mimura, R. Ito, Applying nlp techniques to malware
[50] Sudhakar, S. Kumar, Mcft-cnn: Malware classification with detection in a practical environment, International Journal of
fine-tune convolution neural networks using traditional and Information Security (2021) 1–13.
transfer learning in internet of things, Future Generation [70] M.-T. Luong, H. Pham, C. D. Manning, Effective approaches
Computer Systems 125 (2021) 334–351. to attention-based neural machine translation, arXiv preprint
[51] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for arXiv:1508.04025 (2015).
image recognition, in: Proceedings of the IEEE conference on [71] Z. Lu, X. Li, Y. Liu, C. Zhou, J. Cui, B. Wang, M. Zhang,
computer vision and pattern recognition, 2016, pp. 770–778. J. Su, Exploring multi-stage information interactions for
[52] J. H. Go, T. Jan, M. Mohanty, O. P. Patel, D. Puthal, multi-source neural machine translation, IEEE/ACM Trans-
M. Prasad, Visualization approach for malware classification actions on Audio, Speech, and Language Processing (2021)
with resnext, in: 2020 IEEE Congress on Evolutionary Com- 1–1.
putation (CEC), 2020, pp. 1–7. doi:10.1109/CEC48606.2020. [72] Y. Gao, H. Gong, X. Ding, B. Guo, Image recognition based
9185490. on mixed attention mechanism in smart home appliances, in:
[53] U. Von Luxburg, I. Guyon, S. Bengio, H. Wallach, R. Fergus, 2021 IEEE 5th Advanced Information Technology, Electronic
S. Vishwanathan, R. Garnett, Advances in neural information and Automation Control Conference (IAEAC), volume 5,
processing systems 30, in: 31st annual conference on neural 2021, pp. 1501–1505. doi:10.1109/IAEAC50856.2021.9391092.
information processing systems (NIPS 2017), Long Beach, [73] R. Z. AlMazrouei, J. Nelci, S. A. Salloum, K. Shaalan, Feasi-
California, USA, 2017, pp. 4–9. bility of using attention mechanism in abstractive summariza-
[54] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, tion, in: International Conference on Emerging Technologies
Rethinking the inception architecture for computer vision, in: and Intelligent Systems, Springer, 2021, pp. 13–20.
Proceedings of the IEEE conference on computer vision and [74] Z. Niu, G. Zhong, H. Yu, A review on the attention mechanism
pattern recognition, 2016, pp. 2818–2826. of deep learning, Neurocomputing 452 (2021) 48–62.
[55] R. U. Khan, X. Zhang, R. Kumar, Analysis of resnet and [75] S. Ren, L. Zhou, S. Liu, F. Wei, M. Zhou, S. Ma, Semface:
googlenet models for malware detection, Journal of Computer Pre-training encoder and decoder with a semantic interface for
Virology and Hacking Techniques 15 (2019) 29–37. neural machine translation, in: Proceedings of the 59th Annual
[56] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, Meeting of the Association for Computational Linguistics and
D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with the 11th International Joint Conference on Natural Language
convolutions, in: Proceedings of the IEEE conference on Processing (Volume 1: Long Papers), 2021, pp. 4518–4527.
computer vision and pattern recognition, 2015, pp. 1–9. [76] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation
[57] M. Tan, Q. Le, Efficientnet: Rethinking model scaling for by jointly learning to align and translate, arXiv preprint
convolutional neural networks, in: International Conference arXiv:1409.0473 (2014).
on Machine Learning, PMLR, 2019, pp. 6105–6114. [77] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones,
[58] C. Szegedy, S. Ioffe, V. Vanhoucke, A. Alemi, Inception- A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you
v4, inception-resnet and the impact of residual connections on need, in: Advances in neural information processing systems,
learning, Proceedings of the AAAI Conference on Artificial 2017, pp. 5998–6008.
Intelligence 31 (2017). [78] O. Or-Meir, A. Cohen, Y. Elovici, L. Rokach, N. Nissim,
[59] F. Chollet, Xception: Deep learning with depthwise separable Pay attention: Improving classification of pe malware using
convolutions, in: Proceedings of the IEEE conference on attention mechanisms based on system call analysis, in: 2021
computer vision and pattern recognition, 2017, pp. 1251– International Joint Conference on Neural Networks (IJCNN),
1258. 2021, pp. 1–8. doi:10.1109/IJCNN52387.2021.9533481.
[60] S. Sabour, N. Frosst, G. E. Hinton, Dynamic routing between [79] H. Yakura, S. Shinozaki, R. Nishimura, Y. Oyama, J. Sakuma,
capsules, arXiv preprint arXiv:1710.09829 (2017). Neural malware analysis with attention mechanism, Comput-
[61] D. Gibert, C. Mateu, J. Planes, The rise of machine learning ers & Security 87 (2019) 101592.
for detection and classification of malware: Research devel- [80] M. Mimura, T. Ohminami, Using lsi to detect unknown
opments, trends and challenges, Journal of Network and malicious vba macros, Journal of Information Processing 28
Computer Applications 153 (2020) 102526. (2020) 493–501.
[62] V. Kocaman, O. M. Shir, T. Bäck, Improving model accuracy [81] X. Ma, S. Guo, H. Li, Z. Pan, J. Qiu, Y. Ding, F. Chen,
for imbalanced image classification tasks by adding a final How to make attention mechanisms more practical in malware
batch normalization layer: An empirical study, in: 2020 classification, IEEE Access 7 (2019) 155270–155280.
25th International Conference on Pattern Recognition (ICPR), [82] Girinoto, H. Setiawan, P. A. W. Putro, Y. R. Pramadi, Com-
2021, pp. 10404–10411. doi:10.1109/ICPR48806.2021.9412907. parison of lstm architecture for malware classification, in:
[63] S. Alaraimi, K. E. Okedu, H. Tianfield, R. Holden, O. Uth- 2020 International Conference on Informatics, Multimedia,
mani, Transfer learning networks with skip connections Cyber and Information System (ICIMCIS), 2020, pp. 93–97.
for classification of brain tumors, International Journal of doi:10.1109/ICIMCIS51567.2020.9354301.
Imaging Systems and Technology (2021).

17
[83] R. Agrawal, J. W. Stokes, K. Selvaraj, M. Marinescu, Atten- the 22nd ACM SIGKDD international conference on knowl-
tion in recurrent neural networks for ransomware detection, edge discovery and data mining, 2016, pp. 1135–1144.
in: ICASSP 2019 - 2019 IEEE International Conference on [102] A. Shrikumar, P. Greenside, A. Kundaje, Learning important
Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. features through propagating activation differences, in: Inter-
3222–3226. doi:10.1109/ICASSP.2019.8682899. national conference on machine learning, PMLR, 2017, pp.
[84] J. Kim, Y. Ban, E. Ko, H. Cho, J. H. Yi, Mapas: a practical 3145–3153.
deep learning-based android malware detection system, Inter- [103] A. Nadeem, D. Vos, C. Cao, L. Pajola, S. Dieck, R. Baumgart-
national Journal of Information Security (2022) 1–14. ner, S. Verwer, Sok: Explainable machine learning for com-
[85] L. Onwuzurike, E. Mariconti, P. Andriotis, E. D. Cristofaro, puter security applications, arXiv preprint arXiv:2208.10605
G. Ross, G. Stringhini, Mamadroid: Detecting android mal- (2022).
ware by building markov chains of behavioral models (ex- [104] Y. Vivek, V. Ravi, A. A. Mane, L. R. Naidu, Explainable
tended version), ACM Transactions on Privacy and Security artificial intelligence and causal inference based atm fraud
(TOPS) 22 (2019) 1–34. detection, arXiv preprint arXiv:2211.10595 (2022).
[86] J.-Y. Kim, S.-B. Cho, Obfuscated malware detection using [105] M. Kinkead, S. Millar, N. McLaughlin, P. O’Kane, Towards
deep generative model based on global/local features, Com- explainable cnns for android malware detection, Procedia
puters & Security 112 (2022) 102501. Computer Science 184 (2021) 959–965.
[87] G. Olani, C.-F. Wu, Y.-H. Chang, W.-K. Shih, Deepware: [106] N. McLaughlin, J. Martinez del Rincon, B. Kang, S. Yerima,
Imaging performance counters with deep learning to detect P. Miller, S. Sezer, Y. Safaei, E. Trickel, Z. Zhao, A. Doupé,
ransomware, IEEE Transactions on Computers (2022) 1–1. et al., Deep android malware detection, in: Proceedings of the
[88] W. Lian, G. Nie, Y. Kang, B. Jia, Y. Zhang, Cryptomining seventh ACM on conference on data and application security
malware detection based on edge computing-oriented multi- and privacy, 2017, pp. 301–308.
modal features deep learning, China Communications 19 [107] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller,
(2022) 174–185. W. Samek, On pixel-wise explanations for non-linear classifier
[89] A. Bensaoud, J. Kalita, Deep multi-task learning for malware decisions by layer-wise relevance propagation, PloS one 10
image classification, Journal of Information Security and (2015) e0130140.
Applications 64 (2022) 103057. [108] S. Hooker, D. Erhan, P.-J. Kindermans, B. Kim, A benchmark
[90] S. Heron, Advanced encryption standard (aes), Network for interpretability methods in deep neural networks, Ad-
Security 2009 (2009) 8–12. vances in neural information processing systems 32 (2019).
[91] S. B. Sasi, N. Sivanandam, A survey on cryptography using [109] Y.-S. Lin, W.-C. Lee, Z. B. Celik, What do you see? evalua-
optimization algorithms in wsns, Indian Journal of Science tion of explainable artificial intelligence (xai) interpretability
and Technology 8 (2015) 216. through neural backdoors, in: Proceedings of the 27th ACM
[92] M. Mahendra, P. S. Prabha, Classification of security levels SIGKDD Conference on Knowledge Discovery & Data Min-
to enhance the data sharing transmissions using blowfish ing, 2021, pp. 1027–1035.
algorithm in comparison with data encryption standard, in: [110] W. Guo, D. Mu, J. Xu, P. Su, G. Wang, X. Xing, Lemna:
2022 International Conference on Sustainable Computing and Explaining deep learning based security applications, in: pro-
Data Communication Systems (ICSCDS), IEEE, 2022, pp. ceedings of the 2018 ACM SIGSAC conference on computer
1154–1160. and communications security, 2018, pp. 364–379.
[93] C. M. Kota, C. Aissi, Implementation of the rsa algorithm [111] A. Kuppa, N.-A. Le-Khac, Black box attacks on explainable
and its cryptanalysis, in: proceedings of the 2002 ASEE Gulf- artificial intelligence (xai) methods in cyber security, in: 2020
Southwest Annual Conference, 2002, pp. 20–22. International Joint Conference on Neural Networks (IJCNN),
[94] T. R. Lee, J. S. Teh, N. Jamil, J. L. S. Yan, J. Chen, Lightweight IEEE, 2020, pp. 1–8.
block cipher security evaluation based on machine learning [112] D. Rao, S. Mane, Zero-shot learning approach to adap-
classifiers and active s-boxes, IEEE Access 9 (2021) 134052– tive cybersecurity using explainable ai, arXiv preprint
134064. arXiv:2106.14647 (2021).
[95] A. Baksi, Machine learning-assisted differential distinguishers [113] F. Zhong, X. Cheng, D. Yu, B. Gong, S. Song, J. Yu, Malfox:
for lightweight ciphers, in: Classical and Physical Security camouflaged adversarial malware example generation based
of Symmetric Key Cryptographic Algorithms, Springer, 2022, on conv-gans against black-box detectors, IEEE Transactions
pp. 141–162. on Computers (2023).
[96] S. Kok, A. Azween, N. Jhanjhi, Evaluation metric for crypto- [114] Z. Zhao, Z. Li, F. Zhang, Z. Yang, S. Luo, T. Li, R. Zhang,
ransomware detection using machine learning, Journal of K. Ren, Sage: Steering the adversarial generation of examples
Information Security and Applications 55 (2020) 102646. with accelerations, IEEE Transactions on Information Foren-
[97] Y. Ding, G. Wu, D. Chen, N. Zhang, L. Gong, M. Cao, sics and Security 18 (2023) 789–803.
Z. Qin, Deepedn: a deep-learning-based image encryption [115] W. Hu, Y. Tan, Generating adversarial malware examples for
and decryption network for internet of medical things, IEEE black-box attacks based on gan, in: Data Mining and Big Data:
Internet of Things Journal 8 (2020) 1504–1518. 7th International Conference, DMBD 2022, Beijing, China,
[98] H. Kim, J. Park, H. Kwon, K. Jang, H. Seo, Convolutional November 21–24, 2022, Proceedings, Part II, Springer, 2023,
neural network-based cryptography ransomware detection for pp. 409–423.
low-end embedded processors, Mathematics 9 (2021) 705. [116] X. Ling, L. Wu, J. Zhang, Z. Qu, W. Deng, X. Chen, Y. Qian,
[99] S. Sharmeen, Y. A. Ahmed, S. Huda, B. Ş. Koçer, M. M. C. Wu, S. Ji, T. Luo, et al., Adversarial attacks against
Hassan, Avoiding future digital extortion through robust windows pe malware detection: A survey of the state-of-the-
protection against ransomware threats using deep learning art, Computers & Security (2023) 103134.
based adaptive approaches, IEEE Access 8 (2020) 24522– [117] G. Xu, G. Xin, L. Jiao, J. Liu, S. Liu, M. Feng, X. Zheng,
24534. Ofei: A semi-black-box android adversarial sample attack
[100] F. Fischer, H. Xiao, C.-Y. Kao, Y. Stachelscheid, B. Johnson, framework against dlaas, IEEE Transactions on Computers
D. Razar, P. Fawkesley, N. Buckley, K. Böttinger, P. Muntean, (2023).
et al., Stack overflow considered helpful! deep learning [118] Y. Qiao, W. Zhang, Z. Tian, L. T. Yang, Y. Liu, M. Alazab,
security nudges towards stronger cryptography, in: 28th Adversarial elf malware detection method using model inter-
{USENIX} Security Symposium ({USENIX} Security 19), pretation, IEEE Transactions on Industrial Informatics 19
2019, pp. 339–356. (2022) 605–615.
[101] M. T. Ribeiro, S. Singh, C. Guestrin, " why should i trust you?" [119] K. Meenakshi, G. Maragatham, An optimised defensive
explaining the predictions of any classifier, in: Proceedings of technique to recognize adversarial iris images using curvelet

18
transform, Intelligent Automation & Soft Computing 35
(2023) 627–643.
[120] M. Pintor, L. Demetrio, A. Sotgiu, A. Demontis, N. Carlini,
B. Biggio, F. Roli, Indicators of attack failure: Debugging and
improving optimization of adversarial examples, Advances
in Neural Information Processing Systems 35 (2022) 23063–
23076.

15. Appendix A: File Types


.tbk, .jpeg, .brd, .dot, .jpg, .rtf, .doc, .js, .sch,
.3dm, .mp3, .sh, .3ds, .key, .sldm, .3g2, .lay, .sldm,
.mkv, .std, .asp, .mml, .sti, .avi, .mov, .stw, .backup,
.jsp, .suo, .bak, .mp4, .svg, .bat, .mpeg, .swf, .bmp,
.mpg, .sxc, .rb, .msg, .sxd, .bz2, .myd, .sxi, .c,
.myi, .sxm, .cgm, .nef, .sxw, .class, .odb, .tar, .cmd,
.odg, .123, .onetoc2, .odp, .tgz, .crt, .ods, .tif, .3gp,
.lay6, .sldx, .7z, .ldf, .slk, .vsd, .m3u, .sln, .aes,
.m4u, .snt, .ai, .max, .sql, .ppam, .mdb, .sqlite3,
.asc, .mdf, .sqlitedb, .asf, .mid, .stc, .asm, .cs, .odt,
.tiff, .csr, .cpp, .txt, .csv, .pas, .vmx, .docb, .pdf,
.vob, .docm, .pem, .accdb, .docx, .pfx, .vsdx, .602,
.p12, .wav, .dotm, .pl, .wb2, .dotx, .png, .wk1,
.dwg, .pot, .xltx, .edb, .potm, .wma, .eml, .potx,
.wmv, .fla, .ARC, .xlc, .flv, .pps, .xlm, .frm, .ppsm,
.xls, .gif, .ppsx, .xlsb, .gpg, .ppt, .xlsm, .gz, .pptm,
.xlsx, .h, .pptx, .xlt, .hwp, .ps1, .xltm, .ibd, .psd,
.wks, .iso, .pst, .xlw, .jar, .rar, .djvu, .java, .raw,
.ost, .uop, .db, .otg, .uot, .dbf, .otp, .vb, .dch, .ots,
.vbs, .der, .ott, .vcd, .dif, .php, .vdi, .dip, .PAQ,
.vmdk, .zip

16. Appendix B: The Accuracy and Loss Curve Plots

Fig. 20: Training and testing for accuracy and loss of EfficientNetB1

Fig. 21: Training and testing for accuracy and loss of EfficientNetB2

Fig. 22: Training and testing for accuracy and loss of EfficientNetB3

Fig. 23: Training and testing for accuracy and loss of EfficientNetB4

Fig. 24: Training and testing for accuracy and loss of EfficientNetB5

Fig. 25: Training and testing for accuracy and loss of EfficientNetB6

Fig. 26: Training and testing for accuracy and loss of EfficientNetB7
