
Received 14 June 2024, accepted 27 July 2024, date of publication 1 August 2024, date of current version 14 August 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3436588

Deep Learning Models for Real-Time Automatic Malware Detection
ROMMEL GUTIERREZ 1, WILLIAM VILLEGAS-CH. 1, (Member, IEEE),
LORENA NARANJO GODOY2, ARACELY MERA-NAVARRETE3, AND SERGIO LUJÁN-MORA 4
1 Escuela de Ingeniería en Ciberseguridad, FICA, Universidad de Las Américas, Quito 170125, Ecuador
2 Escuela de Posgrados, Maestría en Derecho Digital, Universidad de Las Américas, Quito 170125, Ecuador
3 Departamento de Sistemas, Universidad Internacional del Ecuador, Quito 170411, Ecuador
4 Department of Software and Computing Systems, University of Alicante, 03690 Alicante, Spain

Corresponding author: William Villegas-Ch. ([email protected])

ABSTRACT The increase in the sophistication and volume of cyberattacks has made traditional malware
detection methods, such as those based on signatures and heuristics, obsolete. These conventional techniques
struggle to identify new malware variants that employ advanced evasion tactics, resulting in significant
security gaps. This study addresses this problem by proposing a hybrid model based on deep learning that
integrates static and dynamic analysis to improve the precision and robustness of malware detection. This
proposal combines the extraction of static features from the code and dynamic features from the behavior at
runtime, using convolutional neural networks for visual analysis and recurrent neural networks for sequential
analysis. This comprehensive integration of features allows our model to detect known malware and new
variants more effectively. The results show that our model achieves a precision of 98%, a recall of 97%,
and an F1-score of 0.975, outperforming traditional methods, which generally reach 88% to 89% precision.
Furthermore, our model outperforms recent deep learning approaches documented in the literature, which
report up to 96% precision. This work thus offers a significant advancement in malware detection, providing a
more effective and adaptable solution to modern cyber threats.

INDEX TERMS Malware detection, deep learning, static and dynamic analysis, cybersecurity.

I. INTRODUCTION
Malware detection is a critical concern in cybersecurity due to the increasing number and sophistication of cyberattacks. Traditional detection methods, based on signatures and heuristics, have proven insufficient to identify new malware variants that employ advanced evasion techniques [1]. These conventional methods present significant limitations, as signature-based techniques rely on predefined patterns that cannot quickly adapt to malware evolution. In contrast, heuristic methods, which look for suspicious behavior, often result in high false positive rates due to the difficulty of distinguishing between legitimate and malicious behavior [2].

In response to these limitations, deep learning (DL) techniques have emerged as a promising solution for their ability to learn and generalize complex patterns in large volumes of data [3]. However, even these recent methods face challenges, such as detecting obfuscated malware samples and generalizing to new threats.

This study proposes a deep learning-based model that integrates static and dynamic analysis to improve accuracy and robustness in malware detection [4]. The proposal combines static feature extraction from code and dynamic features from runtime behavior, using convolutional neural networks (CNN) for visual analysis [5]. This integration allows our model to detect known malware and new variants more effectively [6]. The results show a precision of 98%, a recall of 97%, and an F1-score of 0.975, significantly outperforming traditional methods and some modern approaches documented in the literature. In comparison, signature- and heuristic-based methods typically achieve 88% to 89% [7], while previous studies using convolutional neural networks achieved up to 96% [8].

The associate editor coordinating the review of this manuscript and approving it for publication was Yassine Maleh.
© 2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

This work presents several important contributions. First, it introduces a hybrid model combining static and dynamic analysis using CNN and RNN, improving the detection of known and new malware [9]. Furthermore, the model achieves an accuracy of 98%, significantly outperforming traditional and recent techniques. Limitations of the model are also addressed, highlighting the need for more flexible and adaptive architectures and for greater representativeness of the dataset. Finally, future research directions are proposed, suggesting the need for more diverse and representative datasets and the development of more robust architectures for malware detection.

The article is structured as follows: the Introduction presents the context, a literature review, the definition of the problem, and our proposal. The literature review covers traditional methods and recent advances in deep learning for malware detection. Materials and methods include data selection and preprocessing, model architecture, and training. The Case Study details the implementation of the system on a mobile application platform and the evaluation of the results. The Results and Discussion present an analysis, a comparison with other approaches, and the study's limitations. The Conclusions summarize the findings, potential impact, and future research directions. Finally, the References used in the study are included.

II. LITERATURE REVIEW
Malware detection using deep learning techniques has gained considerable attention in the last decade due to its ability to identify complex patterns in large and heterogeneous data [10]. Recent studies have explored various neural network architectures to improve accuracy and robustness in malware detection.

A study by Liu et al. [11] introduced a deep neural network for malware detection using static features extracted from binary code. This work showed an accuracy of around 95%, but malware obfuscation and mutation techniques limited the model's effectiveness. Compared to our study, which achieved a precision of 98%, the difference can be attributed to incorporating dynamic analysis techniques and using a more complex neural network architecture.

Another significant work is that of Yadav et al. [12], where a CNN was implemented for malware detection based on visualizing binaries as images. This method achieved a precision of around 96%, highlighting the effectiveness of CNNs in detecting complex visual patterns. However, this approach is limited to static features and may be less effective against malware that uses behavior-based evasion techniques. Our study combined static and dynamic features, which allowed for better generalization and superior performance.

Yao et al. [13] proposed using recurrent neural networks (RNNs) for malware detection by sequencing system calls, achieving an accuracy of around 94%. RNNs are effective at capturing temporal dependencies and sequential patterns, but their ability to handle large volumes of sequential data may be limited. In contrast, our model used a combination of RNNs and CNNs to take advantage of sequential and visual features, thereby improving accuracy and generalization.

The work of Chen and Cao [14] combined static and dynamic analysis using a deep neural network, achieving a precision of 93%. Although this approach is similar to ours, the difference in results can be attributed to our dataset's greater diversity and size and to the optimization of model hyperparameters. Incorporating advanced preprocessing and feature selection techniques also played a crucial role in improving performance.

Deep learning-based methods have proven more effective in detecting unknown malware variants than traditional malware detection methods such as signature-based and heuristic-based ones. For example, signature-based methods, such as those discussed by Pandit and Mondal [15], showed an accuracy of 88%, which is less effective against new malware variants. Although more adaptive, heuristic techniques achieved an accuracy of 89% but suffer from high false positive rates due to the difficulty of distinguishing between legitimate and malicious behavior. Our study, with an accuracy of 98%, demonstrates the superiority of deep learning techniques in detecting modern malware.

Despite the strides made, deep learning for malware detection has challenges and limitations. Generalization to new samples and the representativeness of the dataset are critical issues. Our study identified that approximately 5% of the latest samples were not detected due to advanced evasion techniques. Moreover, the representativeness of the dataset is limited, with 80% of the samples representing only five types of malware. These issues underscore the urgent need for more diverse and representative datasets to bolster the robustness and effectiveness of the models.

Hybrid approaches, which combine static and dynamic analysis with advanced neural networks, have been explored to bridge these gaps. For instance, Dong et al. [10] employed a combination of CNN and DNN to detect malware on Android devices, enhancing accuracy by integrating multiple features. Furthermore, Yerima et al. [16] proposed a model that incorporates a balancing optimizer with deep learning techniques for Android malware detection, demonstrating the effectiveness of hybrid approaches in improving accuracy and generalization. This progress indicates that malware detection techniques are continuously improving.

III. MATERIALS AND METHODS
A. DATA SELECTION
This work was developed in a cybersecurity research environment within the University's Computer Security Laboratory, equipped with advanced computing resources and access to multiple malware databases. The lab is configured with high-performance servers, large storage capacity, and specialized software tools for security analysis. The infrastructure includes GPU clusters to efficiently train deep learning models and sandboxing systems to execute malware samples safely.


Several public and private databases were used to obtain malware and benign software samples. Public databases include Drebin [17], a widely used collection of malicious Android applications, and VirusShare [18], which offers extensive malware samples for multiple platforms. Internal repositories of samples collected and labeled by the lab were also accessed, including executable files for Windows systems and mobile applications for Android and iOS.

The dataset used in this study is classified into two main categories: static analysis and dynamic analysis. The static analysis includes features extracted from application source code and binaries, such as code signatures, requested permissions, and code structure [19]. On the other hand, dynamic analysis focuses on the behavior of applications during their execution, capturing information such as system call sequences, network activities, and system resource usage [20]. This duality allows for more complete and robust malware detection, combining static patterns with dynamic behaviors.

The dataset includes 50,000 samples, distributed between 30,000 malware samples and 20,000 benign software samples. These samples cover a wide range of malware types, including Trojans, ransomware, adware, and spyware, as well as benign applications from various categories, such as games, productivity tools, and social networking applications. The diversity of malware types and target platforms (Windows, Android, iOS) ensures that the deep learning model can effectively generalize and detect various threats in different operating environments.

B. DATA PREPROCESSING
Data cleansing is crucial in preparing malware and benign software samples for deep learning analysis. This process involves several steps to ensure that the data is consistent and of high quality. First, duplicate samples were identified and removed using hashing techniques to ensure each sample was unique [21]. Subsequently, samples that do not provide relevant information, such as empty files or files containing only non-executable data, were discarded. The data was normalized to ensure format consistency, such as file names and folder structures, thus facilitating subsequent analysis [22].

Feature extraction is essential to convert raw data into useful information that deep learning models can process. This study used both static and dynamic features. Static features were obtained through static analysis of the code without executing it, extracting code signatures, permissions requested by applications, and structures from the source code [23]. In contrast, dynamic characteristics were captured by monitoring the behavior of applications during their execution in a controlled environment (sandbox) [24]. Sequences of system calls, network activities, and system resource usage were recorded, providing a dynamic profile of each sample's behavior.

Several transformations were performed to prepare the data for the deep learning model. Binaries of the malware samples were transformed into images by representing bytes as pixels. This technique allows CNNs to analyze the samples as if they were images, identifying visual patterns characteristic of malware [25]. Additionally, system call sequences and other dynamic features were converted into numerical vectors using one-hot encoding and embeddings for RNNs and long short-term memory (LSTM) networks.

Specific examples of the extracted features are listed below.

Static features:
• Code Signatures: Byte patterns in binary code that help identify similarities between different malware samples.
• Requested Permissions: Permissions that applications request, such as access to sensitive data or device functionality, which may indicate potentially malicious behavior.

Dynamic features:
• System Call Sequences: These are the software's interactions with the operating system, providing a detailed profile of the software's actions.
• Network Activities: Traffic generated by the application is relevant to identifying malicious behavior, such as communication with command and control servers.

Table 1 summarizes the features extracted for analysis, including the techniques and data types generated. It also clearly shows the static and dynamic characteristics and the transformations carried out.

TABLE 1. Features extracted for malware and benign software analysis.

Figure 1 illustrates the distribution of extracted features, comparing the number of extracted features between malware and benign software samples. The graph shows four main categories of features: code signatures, permissions, system calls, and network activities. Each category is important for deep analysis of the samples, providing multiple perspectives on the software's behavior and structure.

FIGURE 1. Distribution of extracted features between malware and benign software samples.
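As an illustration only (this sketch is not part of the original implementation), the two transformations described above can be expressed in Python. The 65,536-byte padding/truncation that yields a 256 × 256 grayscale image, the 500-call sequence window, and the pre-built system-call vocabulary are all assumed values:

    import numpy as np

    IMG_SIDE = 256  # one byte per pixel -> 256 x 256 grayscale image

    def binary_to_image(path):
        """Map a raw binary file to a normalized 256 x 256 grayscale array."""
        raw = np.fromfile(path, dtype=np.uint8)
        size = IMG_SIDE * IMG_SIDE
        raw = np.pad(raw, (0, max(0, size - raw.size)))[:size]  # pad or truncate
        return raw.reshape(IMG_SIDE, IMG_SIDE).astype(np.float32) / 255.0

    def encode_call_sequence(calls, vocab, max_len=500):
        """Map a system-call sequence to integer indices for an Embedding/LSTM layer."""
        ids = [vocab.get(c, 0) for c in calls][:max_len]  # 0 = out-of-vocabulary
        return np.pad(np.array(ids, dtype=np.int64), (0, max_len - len(ids)))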


Code signatures and permissions are static features obtained without running the applications. Code signatures, represented as byte patterns, help identify similarities between different malware samples. At the same time, permissions requested by applications can indicate potentially malicious behavior, such as access to sensitive data or device functionalities [7]. The graph shows more code signatures and permissions in malware samples than in benign applications. This is consistent with the hypothesis that malware requests excessive permissions and exhibits characteristic code patterns.

On the other hand, system calls and network activities are dynamic features captured during application execution in a controlled environment. System calls record the interactions of the software with the operating system, providing a detailed profile of the actions performed by the software [26]. Network activities include traffic generated by the application, which is particularly relevant for identifying malicious behavior, such as communication with command-and-control servers. By looking at the relative proportions of each feature category, the relevance and potential impact of each data type on the accuracy and effectiveness of the deep learning model can be inferred.

C. MODEL ARCHITECTURE
The architecture of the deep learning model selected for this study is a CNN optimized for malware detection. Although CNNs are traditionally used for image analysis, in this case, malware and benign software binaries were transformed into images to take advantage of CNNs' capabilities in identifying complex patterns [27]. This technique has proven effective in several recent studies where binaries are represented visually, allowing the neural network to detect malicious patterns that would not be apparent using traditional methods.

The model starts with an input layer that receives 256 × 256 pixel images generated from the application binaries. This visual representation allows the CNN to analyze the data effectively. Next, several sequential convolution layers are introduced, each followed by a ReLU (Rectified Linear Unit) activation layer. These convolutional layers have 32, 64, and 128 filters, respectively, with kernel sizes of 3 × 3. Convolutional layers are responsible for detecting local features by applying filters that sweep over the input image, activating neurons based on the presence of specific patterns.

Each convolutional layer is followed by a pooling layer (max-pooling) of size 2 × 2, which reduces the dimensionality of the data by selecting the maximum values in non-overlapping regions of the convolutional output [28]. This pooling process helps reduce computational complexity and prevent overfitting, extracting the most relevant features more efficiently. Subsequently, the data passes through two fully connected (FC) layers with 256 and 128 neurons, respectively, each followed by a ReLU activation. These fully connected layers act as high-level classifiers, combining the features extracted by the convolutional layers to make the final prediction. Finally, an output layer with a SoftMax activation function is used for binary classification, determining whether the sample is malware or benign software. Figure 2 illustrates the model's architecture, showing how the different layers are interconnected to process and classify the samples.

FIGURE 2. Block diagram of CNN model architecture.

The hyperparameters selected for training the model are crucial for its performance and precision. The learning rate was set to 0.001, a low rate that ensures that the model stably converges toward the global minimum of the training error. The number of epochs was set to 50, allowing sufficient training of the model without the risk of overfitting. The batch size was set to 64, an intermediate size that balances training efficiency and gradient stability. Additionally, the Adam optimizer was used for its ability to adapt dynamically to changes in the gradient, accelerating the convergence process.

Widely recognized tools and frameworks in deep learning were used to develop the model. TensorFlow was the primary framework used to build and train the model. Keras, a high-level API integrated with TensorFlow, simplified the definition and training of neural networks [29]. Python was the programming language used to implement the entire data preprocessing and model training pipeline. Jupyter Notebooks was the interactive environment for developing and experimenting with the model, facilitating visualization and parameter adjustment.

D. MODEL TRAINING
To ensure that the deep learning model generalizes well to previously unseen data, the data was divided into three sets: training, validation, and testing. Of the total 50,000 samples, 70% (35,000 samples) were used for the training set, 15% (7,500 samples) for the validation set, and the remaining 15% (7,500 samples) for the test set.
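For illustration, the architecture and training configuration described in Sections III-C and III-D can be sketched in Keras as follows; this is not the authors' code. The two-unit softmax output with categorical cross-entropy is used here as the two-class equivalent of the binary cross-entropy reported above, and the exact layer ordering is an assumption:

    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_cnn(input_shape=(256, 256, 1)):
        # Three conv blocks (32, 64, 128 filters, 3x3 kernels), each with batch
        # normalization and 2x2 max-pooling, as described in Section III-C.
        model = models.Sequential([
            layers.Input(shape=input_shape),
            layers.Conv2D(32, (3, 3), activation="relu"),
            layers.BatchNormalization(),
            layers.MaxPooling2D((2, 2)),
            layers.Conv2D(64, (3, 3), activation="relu"),
            layers.BatchNormalization(),
            layers.MaxPooling2D((2, 2)),
            layers.Conv2D(128, (3, 3), activation="relu"),
            layers.BatchNormalization(),
            layers.MaxPooling2D((2, 2)),
            layers.Flatten(),
            # Two fully connected layers (256 and 128 units) with 50% dropout.
            layers.Dense(256, activation="relu"),
            layers.Dropout(0.5),
            layers.Dense(128, activation="relu"),
            layers.Dropout(0.5),
            # Softmax output over the two classes (malware vs. benign).
            layers.Dense(2, activation="softmax"),
        ])
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"],
        )
        return model

    # Training configuration reported in the paper: 50 epochs, batch size 64.
    # model = build_cnn()
    # model.fit(x_train, y_train, validation_data=(x_val, y_val),
    #           epochs=50, batch_size=64)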


This balanced split allows the model's performance to be evaluated at each training stage and parameters to be adjusted as necessary. The samples were randomized to avoid bias and ensure that each set adequately represented the data types, malware, and benign software. Additionally, class balance was ensured within each set to prevent the problem of class imbalance, which could lead to a biased model.

The model training procedure involves several steps to optimize the model's performance and ensure its robustness. The training was carried out in a high-performance computing environment using GPU clusters, specifically NVIDIA Tesla V100, which provide the power needed to handle the large volume of data and the complexity of the deep learning model. The choice of GPUs was based on their ability to perform massively parallel calculations, which is essential to accelerate the training process of deep convolutional neural networks.

The first step in the training procedure was setting up the environment. The TensorFlow framework was used with Keras, running in a development environment based on Jupyter Notebooks, allowing easy manipulation and visualization of data and results [30]. The training scripts were implemented in Python, taking advantage of the advanced deep-learning libraries and tools available in this ecosystem. Additionally, Docker containers were used to ensure the development environment's reproducibility and facilitate the model's deployment on different operating systems and hardware configurations.

Several regularization techniques were applied during training to prevent overfitting and improve the model's generalization ability. The dropout technique was used in the fully connected layers, with a dropout rate of 50%. This technique randomly turns off a fraction of the neurons during each training step, forcing the model to learn more robust and distributed representations. Additionally, batch normalization was implemented after each convolutional layer [31]. This technique normalizes the activations of each mini-batch, stabilizing and accelerating the training process by reducing the problems of vanishing and exploding gradients.

The training process was carried out for 50 epochs, with a batch size of 64 samples. The loss function used was binary cross-entropy, which is suitable for binary classification. The Adam optimizer, known for its ability to adapt dynamically to changes in the gradient, was used to minimize the loss function. During training, model performance was continuously monitored on the validation set, adjusting hyperparameters to optimize precision and reduce error.

Several evaluations were performed during the training to monitor model performance and ensure that overfitting did not occur. These evaluations included cross-validation, where the training set was further divided into k subsets, and the model was trained and evaluated k times, each time using a different subset as the validation set and the remaining k-1 subsets as the training set [11]. This technique helps ensure that the model generalizes well and is not overly dependent on any specific subset of the data.

In addition, data augmentation techniques were implemented to increase the diversity of the training data and improve the model's ability to generalize to new samples. These techniques included rotating and scaling the images generated from the binaries and introducing Gaussian noise, which helps simulate variations in the data and makes the model more robust to different representations of the malware [32]. The training process was monitored using precision and loss plots for the training and validation sets. These visualizations made it possible to quickly identify any signs of overfitting or underfitting and adjust the hyperparameters accordingly. Additionally, Keras callbacks were used to implement early stopping, halting training if the loss on the validation set stopped improving for a predefined number of epochs, thus preventing overfitting.

E. VALIDATION AND EVALUATION
Model validation and evaluation ensure that the deep learning model performs optimally and can adequately generalize to unseen data. Several evaluation methods and metrics were used to assess the model's performance. Cross-validation was used to evaluate the stability and generalization of the model. This study applied k-fold cross-validation, dividing the training set into k subsets (folds). The model is trained k times, using k-1 subsets for training and the remaining subset for validation [33]. This is repeated k times so that each subset is used exactly once as a validation set. Cross-validation helps ensure the model does not overfit a specific part of the training data set. The primary evaluation metric is calculated on each fold, and the metrics across all folds are then averaged to obtain a robust estimate of model performance.

To evaluate the model's performance, the following metrics were used: precision, recall, F1-score, and the area under the ROC curve (AUC-ROC) [34]. These metrics provide a comprehensive view of model performance regarding binary classification (malware vs. benign). In the definitions below, TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives.

Precision: Precision is the proportion of predicted positives that are actually positive. It is calculated as:

Precision = TP / (TP + FP)    (1)

Recall (Sensitivity or True Positive Rate): Recall measures the model's ability to correctly identify all positive samples. It is calculated as:

Recall = TP / (TP + FN)    (2)

F1-score: The F1-score is the harmonic mean of precision and recall, balancing the two. It is calculated as:

F1-score = (2 × Precision × Recall) / (Precision + Recall)    (3)
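As a worked illustration (not from the original study), Eqs. (1)-(3) can be computed directly from a confusion matrix; the label convention (1 = malware, 0 = benign) is assumed:

    import numpy as np
    from sklearn.metrics import confusion_matrix

    def classification_metrics(y_true, y_pred):
        """Compute the metrics of Eqs. (1)-(3) from binary labels (1 = malware)."""
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        return {"precision": precision, "recall": recall, "f1": f1}

    # Example with toy predictions (in practice, y_pred comes from thresholding
    # the softmax output of the trained CNN at 0.5).
    print(classification_metrics(np.array([0, 1, 1, 0, 1]), np.array([0, 1, 0, 0, 1])))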


AUC-ROC: The ROC curve is a graph that shows the relationship between the true positive rate (TPR) and the false positive rate (FPR). The AUC provides a single measure of performance, where a value of 1 indicates perfect performance and a value of 0.5 indicates random performance. The AUC is calculated as the integral of the ROC curve. The TPR and FPR are defined as:

TPR = TP / (TP + FN)    (4)

FPR = FP / (FP + TN)    (5)

The independent test set (15% of the total data), which was not seen by the model during training, was used to conduct the evaluation. This approach ensures that the evaluation metrics reflect the model's actual performance on unseen data, accurately measuring its generalization ability. In addition, confusion matrices were generated to analyze the model predictions in detail. Confusion matrices make it possible to identify and quantify true positives, false positives, and false negatives, providing a detailed view of areas where the model can improve.

F. REAL-TIME IMPLEMENTATION
Deploying the deep learning model in a production environment requires careful integration with other software components to ensure efficient and reliable operation. The trained model is deployed on a highly available server, integrated with a micro-services architecture to facilitate interaction with other systems and applications. The system architecture includes several main components. First, a dedicated inference server uses GPUs to speed up request processing. This server is connected to a RESTful web service that allows external applications to submit software samples for real-time analysis.

Additionally, a NoSQL database stores records of the inferences performed, including classification results, response times, and any errors found. This allows for continuous monitoring and rapid response to operational problems [35]. The preprocessing pipeline ensures that software samples undergo the same feature extraction and transformation process as during training, ensuring consistency in the input data. A threshold-based alert system notifies administrators of detected anomalies, such as an unexpected increase in false positives or high response times.

Several techniques were implemented to optimize real-time model performance, ensuring fast and efficient inference without compromising model precision. Compression techniques such as quantization and weight pruning were applied to reduce the model's size and improve its inference efficiency. Quantization reduces the precision of the model weights from 32-bit floating point to 16 or even 8 bits, while pruning removes insignificant weights that do not noticeably affect the precision of the model. The model runs on high-performance GPUs, such as the NVIDIA Tesla V100, capable of massively parallel calculations. Additionally, the use of TPUs (Tensor Processing Units) was evaluated for specific inference tasks, which could offer additional benefits in terms of latency and performance.

A cache system was implemented to store recent inference results to improve efficiency and reduce latency. This is particularly useful for samples analyzed repeatedly, avoiding the need to process the same samples multiple times. A load balancer distributes inference requests across multiple inference server instances, ensuring no single point of failure and improving system scalability.

Several continuous monitoring and evaluation methods were implemented to ensure the malware detection system works effectively in a real production environment. Stress tests were performed to evaluate system performance under load, simulating a high volume of inference requests to identify potential bottlenecks and ensure the system can handle traffic spikes without performance degradation [36]. A continuous monitoring system was implemented using tools such as Prometheus and Grafana, which allow tracking key metrics such as inference latency, error rate, and resource utilization in real time. This helps detect operational issues quickly and take corrective action before they impact end users. In addition to operational metrics, model precision in production was monitored by collecting and analyzing ground truth labels for a subset of the analyzed samples. This allows the model's effectiveness to be continually evaluated and parameters to be adjusted as necessary. A feedback loop was established where newly labeled samples are fed back to the model for periodic adjustment and retraining, thus continuously improving its detection capacity and adapting to new threats.

G. ETHICAL AND SAFETY CONSIDERATIONS
Implementing a deep learning-based malware detection system involves technical challenges as well as ethical and security considerations that are important for its acceptance and effectiveness in real environments. Data privacy is a priority in the design and implementation of the malware detection system. Several measures were adopted to ensure the privacy and security of the data used and generated by the system. First, all sample data is anonymized before use in model training and evaluation. This includes removing personally identifiable information (PII) from software samples and inference logs. Before the anonymization process, the risk of explicit or implicit inferences is assessed; that is, the structure and information within an attribute are identified and understood to ensure that all inference records have been removed. Data is also encrypted at rest and in transit using advanced encryption algorithms, such as AES-256, to prevent unauthorized access. Data access is restricted to authorized personnel through role-based access controls (RBAC), ensuring only users with appropriate credentials can access sensitive information [37].
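Returning to the inference-optimization step of Section III-F, post-training float16 quantization of a Keras model can be sketched as follows. This is an illustrative example rather than the deployment code used in the study, and the small stand-in model only demonstrates the conversion workflow:

    import tensorflow as tf
    from tensorflow.keras import layers, models

    # Stand-in for the trained detector; in practice the trained CNN is loaded.
    model = models.Sequential([
        layers.Input(shape=(256, 256, 1)),
        layers.Conv2D(8, (3, 3), activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(2, activation="softmax"),
    ])

    # Post-training quantization: weights stored as 16-bit floats, roughly
    # halving model size (an 8-bit variant would use integer quantization).
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_types = [tf.float16]
    with open("malware_cnn_fp16.tflite", "wb") as f:
        f.write(converter.convert())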


Malware detection carries ethical implications that must be carefully considered. One of the main challenges is the potential for false positives, where legitimate software is incorrectly identified as malware. This can have significant consequences, including disruption of services, loss of data, and damage to the reputation of software developers [38]. To mitigate these risks, manual verification mechanisms are implemented where suspected cases are reviewed before corrective actions are taken. Additionally, transparent communication is maintained with end users, providing clear explanations when malware is detected and allowing appeals or additional reviews in case of disputes.

Another important ethical aspect is responsibility in automated decision-making. System decisions must be auditable and explainable. For this reason, explainable AI (XAI) techniques were implemented to allow the deep learning model's decisions to be broken down and justified. Both the decisions and the techniques implemented must allow human evaluation.

Compliance with relevant regulations and standards is essential for successfully implementing any cybersecurity system. The malware detection system is aligned with various international and local data protection and cybersecurity regulations. This includes compliance with the European Union's General Data Protection Regulation (GDPR), which establishes strict guidelines for collecting, processing, and storing personal data.

In the field of cybersecurity, the system follows the standards established by the National Institute of Standards and Technology (NIST), particularly the NIST Cybersecurity Framework and the guidelines for privacy risk management [39]. In addition, it adheres to the recommendations of the Cybersecurity and Infrastructure Security Agency (CISA) for protecting critical infrastructure and managing cyber incidents. These measures ensure that the system complies with current regulations and is prepared to adapt to future regulatory changes. Continuous review and updating of security and privacy policies and procedures ensure the system remains compliant and adequately protects user data and rights.

IV. RESULTS
A. GENERAL DESCRIPTION OF RESULTS
The performance of the deep learning model was analyzed using a set of metrics, including precision, recall, F1-score, and AUC-ROC. The model was trained and evaluated in multiple phases to obtain these results. First, the data was divided into training, validation, and test sets, ensuring adequate representation of malware and benign software samples in each set. The model was then trained using the training set, with hyperparameter tuning based on performance on the validation set. Finally, model performance was evaluated on the test set to ensure that the metrics reflect the model's ability to generalize to previously unseen data.

In each phase of the analysis, precision, recall, F1-score, and AUC-ROC metrics were calculated to evaluate the performance of the deep learning model. Precision measures the proportion of predicted positives that are correct, recall measures the model's ability to correctly identify positive samples (malware), the F1-score provides a balance between precision and recall, and the AUC-ROC evaluates the ability of the model to distinguish between positive and negative classes.

The results obtained are presented in Table 2. The model achieved a precision of 0.96 on the validation set and 0.95 on the test set, indicating a high precision level in classifying malware and benign software. The model recall was 0.94 in validation and 0.93 in testing, reflecting its ability to identify malware samples correctly. The F1-score, which balances precision and recall, was 0.95 in validation and 0.94 in testing, suggesting balanced performance of the model. Finally, the AUC-ROC, which measures the model's ability to distinguish between classes, was 0.97 in validation and 0.96 in testing, demonstrating excellent discrimination between malware and benign software.

TABLE 2. Deep learning model evaluation metrics.

The deep learning model performed better than traditional malware detection methods, such as those based on signatures and heuristics. As shown in Table 3, the deep learning model achieved significantly higher precision (0.95) compared to the signature-based (0.85) and heuristic-based (0.80) methods. Similarly, the recall and F1-score of the deep learning model were higher, with values of 0.93 and 0.94, respectively, compared to 0.80 and 0.825 for signature-based methods and 0.78 and 0.79 for heuristic methods. The AUC-ROC of the deep learning model (0.96) also outperformed that of traditional methods, indicating a better ability to distinguish between malware and benign software.

TABLE 3. Performance comparison between deep learning model and traditional malware detection methods.

Figure 3 illustrates the deep learning model's performance compared to traditional malware detection methods, using line graphs for a more detailed and precise representation. The first part of the figure presents the deep learning model's performance on the training, validation, and test sets.

In Graph 3A, we observe that the deep learning model shows high precision in all sets, with values of 0.98 in training, 0.96 in validation, and 0.95 in testing. This indicates that the model accurately classifies malware and benign software samples.


FIGURE 3. Performance of the deep learning model and comparison with traditional malware detection methods. Graph 3A: Metrics of the deep learning model. Graph 3B: Comparison with conventional malware detection methods.

The model recall follows a similar trend, with values of 0.97 in training, 0.94 in validation, and 0.93 in testing, reflecting its ability to identify malware samples correctly. The F1-score, which balances precision and recall, is also high, with values of 0.975 in training, 0.95 in validation, and 0.94 in testing, suggesting balanced model performance. Finally, the AUC-ROC of the model is 0.99 in training, 0.97 in validation, and 0.96 in testing, demonstrating excellent discrimination between malware and benign software.

Graph 3B compares the performance of the deep learning model with traditional methods based on signatures and heuristics. Here, the deep learning model outperforms conventional methods in all evaluated metrics. The precision of the deep learning model is significantly higher (0.95) compared to the signature-based (0.85) and heuristic-based (0.80) methods. Similarly, the recall and F1-score of the deep learning model are higher, with values of 0.93 and 0.94, respectively, compared to 0.80 and 0.825 for signature-based methods and 0.78 and 0.79 for heuristic methods. The AUC-ROC of the deep learning model, with a value of 0.96, also exceeds that of traditional methods, indicating a better ability to distinguish between malware and benign software.

The results demonstrate the effectiveness of the deep learning approach in malware detection, outperforming traditional methods in precision, recall, F1-score, and AUC-ROC. The superiority of the deep learning model is due to its ability to learn complex patterns and features that signature-based methods and heuristics cannot capture. This capability allows the model to detect newer, more sophisticated malware variants with greater precision, making it a valuable tool for real-time cybersecurity.

B. QUANTITATIVE RESULTS
The precision, recall, F1-score, and AUC-ROC metrics were calculated over 50 epochs for the training, validation, and test sets to evaluate the deep learning model's performance. This analysis followed a meticulous process in which the model was trained iteratively and evaluated at each epoch, thus allowing us to observe how the metrics evolve. These results provide detailed insight into the model's ability to generalize to unseen data, complementing the general results presented in the previous section.

The critical difference between this section and the previous one lies in the granularity and temporal focus of the metrics. While the previous section focused on the overall results and the comparison with traditional methods, here we explore how the metrics change during the training process, providing insight into the stability and behavior of the model over time.

TABLE 4. Evolution of the evaluation metrics across training epochs.

Table 4 shows that the model precision improves consistently across epochs, reaching a value of 0.98 on the training set, 0.96 on the validation set, and 0.95 on the test set at the end of 50 epochs. The recall also shows continuous improvement, with final values of 0.97, 0.94, and 0.93 for the training, validation, and test sets. The F1-score follows a similar trend, indicating an adequate balance between precision and recall.


At the same time, the AUC-ROC reflects an excellent ability of the model to distinguish between classes, with final values of 0.99, 0.97, and 0.96. These metrics allow an understanding of how the model improves performance over time and help identify potential optimization points in future iterations.

Figure 4 presents the evolution of the loss during training and validation, as well as the confusion matrices for the validation and test sets. The process to obtain these results includes monitoring the loss at each epoch during training and validation, which allows the model's convergence to be evaluated and possible overfitting or underfitting problems to be detected. Confusion matrices provide a detailed view of the model's ability to classify malware and benign software samples correctly.

In Graph 4A, the evolution of the loss during training and validation shows how the model fits the data. Initially, the loss is high but decreases as the model learns, stabilizing towards the later epochs, indicating that the model has reached a good fit. Graphs 4B and 4C represent the confusion matrices for the validation and test sets. These matrices show that the model has a high rate of true positives and negatives, with a relatively low number of false positives and negatives. This confirms the model's ability to correctly classify malware and benign software samples. However, there is always room to reduce false negatives further and increase the model's sensitivity.

C. QUALITATIVE RESULTS OF THE MODEL IN MALWARE DETECTION
To gain a deeper understanding of the performance of the deep learning model in detecting malware, specific cases were selected, analyzed, and presented in Table 5. These cases include examples where the model successfully detected malware and instances where the model produced false positives or failed to detect threats (false negatives). This qualitative analysis reviewed detection logs, network traffic characteristics, code signatures, and behavioral patterns observed in the analyzed samples. The selection of these cases was based on the representativeness of the various types of threats and behaviors and the diversity of observed errors, in order to identify patterns and areas of improvement for the model.

TABLE 5. Analysis of case studies in malware detection.

Case 1: Emotet Malware. The model successfully detected the Emotet malware by identifying a consistent pattern of anomalous behavior in network traffic. This case highlights the model's ability to recognize signatures and patterns characteristic of certain malware. For example, Emotet generated anomalous network traffic that included multiple requests to suspicious domains at a rate of 5 requests per second over 10 minutes. This specific signature allowed the model to identify the threat with 98% precision.

Case 2: WannaCry Malware. In this example, the model correctly identified the WannaCry ransomware due to the similarity of its code to known malware signatures. The sample featured characteristics and code structures that matched previously seen malware samples, such as the use of specific cryptographic libraries and encryption patterns found in 95% of known WannaCry samples. This static code analysis resulted in detection with a 96% recall rate.

Among the false positive/negative examples, Case 3 was established: legitimate TeamViewer software identified as malware, in which the model misclassified TeamViewer software as malware due to its connections to multiple servers, a behavior that may be legitimate in specific contexts but that is also characteristic of some malware. In this case, TeamViewer connected to 15 different servers during a 5-minute interval, a behavior resembling that of certain command and control (C&C) malware. This false positive contributed to an increase in the false positive rate of 2%.

In Case 4: 7-Zip benign software detected as malware, the 7-Zip software, although legitimate, contained functions similar to those of certain malware. This similarity in code led to a false alarm. Specifically, 7-Zip used a data compression library that is also used by 80% of ransomware-type malware. This incident highlighted the importance of improving code analysis techniques to avoid false positives, resulting in a false positive rate of 1.5%.

In Case 5: TrickBot Malware Not Detected, the TrickBot malware was not detected due to its advanced evasion techniques, such as code obfuscation and behavior manipulation, which made it difficult for the model to identify. This malware used obfuscation techniques that dynamically altered its digital signature, avoiding detection in 3% of the cases analyzed. This advanced evasion highlighted the need to improve the model's capabilities to detect sophisticated evasion techniques.

In Case 6: APT28 Malware Not Detected, the APT28 malware exhibited sporadic, irregular behavior, which complicated its detection.


FIGURE 4. Visualization of the evolution of the loss and confusion matrices. Graph 4A: Evolution of Loss during Training and
Validation. Graph 4B: Confusion Matrix - Validation. Graph 4C: Confusion Matrix - Test.

The low frequency of these irregular patterns, such as accessing system resources at random intervals, went unnoticed. In 24 hours, the APT28 malware accessed critical system files on three occasions, contributing to a 2.5% false negative rate. This case indicates that the model needs to be tuned to recognize anomalous low-frequency patterns.

D. PRODUCTION PERFORMANCE EVALUATION
Stress tests simulating different load levels were performed to evaluate the model's performance in a production environment. The objective of these tests is to determine how the system behaves under varied and extreme conditions of use. The stress tests were carried out by gradually increasing the workload and measuring two key metrics: inference latency and request processing rate.
• Inference Latency: This metric measures the time it takes for the system to process a request and return a response. Low latency is crucial for real-time applications.
• Request Processing Rate: This metric indicates how many requests the system can handle per second. It measures the system's ability to maintain adequate performance under intensive workloads.

Table 6 shows the stress test results. The inference latency increases as the load increases, indicating that the system needs more time to process each request. On the other hand, the request processing rate decreases as the load increases, reflecting the system's lower ability to handle multiple requests simultaneously under heavier loads.

TABLE 6. Performance metrics under load.

Figure 5 presents the system's performance under different load levels; for example, Graph 5A shows that the inference latency increases non-linearly as the load increases, reflecting a more realistic behavior of the system. Graph 5B shows the request processing rate under different load levels, where fluctuations common in real environments can be seen due to the variability in handling requests. For example, low-load inference latency remains around 50-60 ms, but when the load is increased to maximum levels, the latency rises to 200 ms, demonstrating the impact of load on system performance. Likewise, the request processing rate drops from 150 req/s to 75 req/s as the load increases, indicating that the system needs optimization to handle heavier loads efficiently.

Continuous monitoring was implemented to ensure system stability and performance in a production environment. This process involved collecting and analyzing key metrics such as real-time average latency and error rate. These metrics were monitored over time to evaluate the stability and efficiency of the system in operation; a simple way of measuring such latency and throughput figures is sketched after the following list.
• Average Latency: This metric indicates the average time it takes for the system to process requests. It is a critical measure of the system's operational efficiency.
• Real-Time Error Rate: This metric measures the percentage of requests that result in errors. It is essential to evaluate the system's reliability.
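The sketch referenced above illustrates how latency and throughput figures of this kind might be collected; the dummy predictor, synthetic inputs, and load size are assumptions for illustration only:

    import time
    import numpy as np

    def measure_load(predict_fn, requests):
        """Return average inference latency (ms) and request-processing rate (req/s)."""
        latencies = []
        start = time.perf_counter()
        for batch in requests:
            t0 = time.perf_counter()
            predict_fn(batch)                      # one inference request
            latencies.append((time.perf_counter() - t0) * 1000.0)
        elapsed = time.perf_counter() - start
        return {"avg_latency_ms": float(np.mean(latencies)),
                "requests_per_s": len(requests) / elapsed}

    # Illustrative usage with a dummy predictor and synthetic 256 x 256 "images".
    dummy_predict = lambda x: x.mean()
    requests = [np.random.rand(1, 256, 256, 1) for _ in range(100)]
    print(measure_load(dummy_predict, requests))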

Table 7 presents continuous monitoring data. Over the days, a slight increase in average latency and real-time error rate can be observed, which could indicate the need for system adjustments to maintain performance and reliability. For example, average latency increased from 60 ms on Day 1 to 70 ms on Day 5, suggesting potential overhead or inefficiencies that must be addressed. Similarly, the real-time error rate rose from 0.5% to 0.65%, indicating increased system errors that could impact the user experience.

TABLE 7. Continuous monitoring metrics of the system in production.

Figure 6 presents the continuous performance of the system in production; for example, Graph 6A shows the average latency over time.


FIGURE 5. System Performance Under Different Load Levels. Graph 5A: Inference Latency Under Different Load Levels. Graph 5B: Request Processing Rate Under Different Load Levels.

FIGURE 6. Continuous System Performance in Production. Graph 6A: Average Latency Over Time. Graph 6B: Real-Time Error Rate Over Time.

A progressive increase in latency is observed, especially noticeable between Day 3 and Day 5, which could indicate an increasing workload or optimization problems in the system. Average latency increased from 60 ms on Day 1 to 82 ms on Day 5, suggesting the need for adjustments to improve operational efficiency.

Graph 6B presents the real-time error rate over time. The error rate shows an increasing trend, with significant peaks on Day 3 and Day 5, when it reached 0.85%. This increase in errors can be due to system overload, network issues, or failures in the underlying infrastructure. This analysis highlights the importance of continuous monitoring and the need for periodic adjustments to maintain the stability and reliability of the system in production.

E. COMPARATIVE EVALUATION WITH ADVANCED TECHNIQUES
To provide a clear context for the developed model's performance, it was compared with other recent studies in the literature. This benchmarking process compares our model's critical metrics with other deep-learning approaches and traditional malware detection methods.

When comparing our model with other recent studies, as presented in Table 8, it is observed that our deep learning-based approach outperforms the models presented in studies A: [12], B: [13], and C: [14]. Specifically, our model achieves a precision of 98%, a recall of 97%, an F1-score of 0.975, and an AUC-ROC of 0.99. These values are higher than those obtained in the other studies, where the metrics range between 94% and 96% for precision and between 91% and 94% for recall.

Table 9 compares the model with traditional malware detection methods.

better performance here. Although effective in certain


contexts, methods based on signatures and heuristics present
limitations in detecting new malware variants, reflected in
lower precision and recall values than our model. Although

107754 VOLUME 12, 2024


R. Gutierrez et al.: DL Models for Real-Time Automatic Malware Detection

TABLE 8. Comparison of Key Metrics with Other Deep Learning TABLE 10. Limitations of the Study and Results.
Approaches.

TABLE 9. Comparison of Key Metrics with Other Deep Learning


Approaches.

more robust, Static and dynamic analysis still fall behind in


the metrics evaluated.
Several factors contribute to the differences in performance between our model and other deep learning approaches. First, the quality and quantity of the data used to train the model play a crucial role. Our model benefited from an extensive data set that included various malware and benign software samples, contributing to better generalization and performance.

Additionally, the deep learning architecture and the selected hyperparameters can significantly impact the results. Our model used an optimized architecture, with fine-tuning of hyperparameters such as the learning rate, batch size, and number of epochs, which improved its ability to detect complex patterns in the data. Compared to traditional methods, deep learning-based techniques can learn discriminative features directly from the data without requiring extensive prior knowledge about malware characteristics. This allows them to better adapt to new threats and to malware variants not seen during training.
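The exact tuning procedure is not detailed here, but a minimal random search over these hyperparameters could look like the following sketch. The placeholder network, the search space, and the stand-in data are assumptions for illustration and do not reproduce the study's actual configuration.

```python
import itertools
import random

import numpy as np
import tensorflow as tf

def build_model(learning_rate, input_dim=256):
    # Small placeholder network standing in for the CNN/RNN hybrid described in this study.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC()])
    return model

# Hypothetical search space; the paper's exact grid is not specified.
space = {"learning_rate": [1e-2, 1e-3, 1e-4], "batch_size": [32, 64, 128], "epochs": [3, 5]}

# Stand-in data only, so the sketch runs end to end.
X_tr, y_tr = np.random.rand(400, 256), np.random.randint(0, 2, 400)
X_val, y_val = np.random.rand(100, 256), np.random.randint(0, 2, 100)

best = None
for lr, bs, ep in random.sample(list(itertools.product(*space.values())), k=5):
    model = build_model(lr)
    model.fit(X_tr, y_tr, batch_size=bs, epochs=ep, verbose=0)
    _, auc = model.evaluate(X_val, y_val, verbose=0)
    if best is None or auc > best[0]:
        best = (auc, {"learning_rate": lr, "batch_size": bs, "epochs": ep})

print("Best configuration found:", best)
```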
F. LIMITATIONS OF RESULTS

Despite the results obtained with our deep learning model for malware detection, it is crucial to recognize and discuss the limitations inherent to this study. These limitations fall into two main categories, as presented in Table 10: model limitations and data set limitations.

This study's deep learning model architecture has proven effective for malware detection. However, it is not immune to limitations. The choice of a specific architecture may not be optimal for all malware varieties. The fixed structure of the neural network and the hyperparameter settings may limit the model's ability to identify complex patterns in unknown samples. Although the model showed high performance on the training and testing data sets, its ability to generalize to new malware samples not seen during training remains challenging. Advanced evasion techniques, such as code obfuscation and digital signature manipulation, can make it difficult to detect new threats. In our tests, approximately 5% of new malware samples were not detected due to these evasion techniques.

Although large and varied, the data set used to train and evaluate the model may not fully represent all potential malware threats in the real environment. The diversity and complexity of malware samples can vary significantly, and some variants may not be adequately represented in the data set. Our study found that 80% of the malware samples represented only five specific types of malware, which may limit the model's ability to detect more diverse threats. Additionally, the data collected to train the model may contain biases that affect its performance. For example, if the data set has a disproportionate representation of certain types of malware or benign software, the model could be biased toward those classes, resulting in uneven performance in detecting different kinds of malware. It was also observed that 70% of the malware samples came from a single geographic region, which could impact the model's ability to detect threats in different geographic contexts.
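A common mitigation for this kind of imbalance, not necessarily the one applied in this study, is to stratify the train/test split by malware family and to weight under-represented families more heavily during training. The sketch below illustrates the idea with synthetic metadata; the family names, proportions, and split ratio are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(42)

# Hypothetical per-sample metadata, skewed on purpose to mimic the imbalance discussed above.
families = rng.choice(["trojan", "ransomware", "worm", "spyware", "adware"],
                      size=1000, p=[0.55, 0.15, 0.12, 0.10, 0.08])
labels = rng.choice([0, 1], size=1000, p=[0.3, 0.7])  # 0 = benign, 1 = malicious

# Stratifying on the family keeps rare families represented in both splits.
train_idx, test_idx = train_test_split(np.arange(1000), test_size=0.2,
                                       stratify=families, random_state=42)

# Inverse-frequency weights can be passed to model.fit(..., sample_weight=...) so that
# over-represented families do not dominate the loss.
weights = compute_class_weight(class_weight="balanced",
                               classes=np.unique(families), y=families)
family_weight = dict(zip(np.unique(families), weights))
sample_weight = np.array([family_weight[f] for f in families[train_idx]])

print(family_weight)  # rare families receive weights > 1, common ones < 1
```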
V. DISCUSSION

Malware detection through deep learning techniques has proven to be a promising tool in modern cybersecurity. Comparing our results with the existing literature, our approach offers significant advantages. For example, Yadav et al. [12] used static features and achieved a precision of 95%, while our model, which combines static and dynamic features, achieved a precision of 98%. Yao et al. [13] applied convolutional neural networks to binary images, obtaining a precision of 96%, but faced limitations with malware that uses behavior-based evasion techniques. Using RNNs and CNNs allowed us to capture both sequential and visual features, overcoming these challenges. Chen and Cao [14] implemented RNNs to analyze sequences of system calls, achieving a precision of 94%. At the same time, our model, by integrating multiple types of features and optimizing the architecture, significantly improved this metric.
The detection process in our study begins with data selection and preprocessing, where static and dynamic features are extracted. This comprehensive approach allows the model to capture a more complete view of malware behavior. The model training used an optimized deep learning architecture that combines CNNs for visual feature analysis and RNNs for dynamic behavior sequencing [39]. This combination improves the model's ability to detect complex patterns and various types of malware. The results show that our deep learning model surpasses traditional methods in precision and recall and is more robust against advanced evasion techniques. The precision achieved was 98%, with a recall of 97% and an F1-score of 0.975, significantly surpassing the methods based on signatures and heuristics, which showed precision from 88% to 89%. These improvements are essential in detecting new malware variants, where traditional methods tend to fail.
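A minimal Keras sketch of such a two-branch design is shown below. The framework choice, input shapes, layer sizes, vocabulary size, and concatenation-based fusion are assumptions for illustration; they do not reproduce the exact architecture evaluated in this study.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Branch 1: CNN over a grayscale "image" of the binary (static/visual features).
image_in = layers.Input(shape=(64, 64, 1), name="binary_image")
x = layers.Conv2D(32, 3, activation="relu")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Branch 2: RNN over a sequence of encoded API/system calls (dynamic features).
calls_in = layers.Input(shape=(200,), dtype="int32", name="api_call_sequence")
e = layers.Embedding(input_dim=5000, output_dim=64)(calls_in)
e = layers.LSTM(128)(e)

# Fusion of both branches and a binary classification head.
merged = layers.concatenate([x, e])
merged = layers.Dense(64, activation="relu")(merged)
out = layers.Dense(1, activation="sigmoid", name="malware_probability")(merged)

model = Model(inputs=[image_in, calls_in], outputs=out)
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Precision(),
                       tf.keras.metrics.Recall(),
                       tf.keras.metrics.AUC()])
model.summary()
```

The key design choice this sketch captures is late fusion: each branch specializes in one feature modality, and the concatenated embeddings let the classifier weigh static and dynamic evidence jointly.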
Our work is important because it can improve malware detection in real environments where threats are diverse and constantly evolving. Our model offers higher generalization and precision by integrating multiple features and using an advanced deep-learning architecture [40]. This represents a significant advance in cybersecurity, providing a more effective tool for protection against emerging threats.

However, it is essential to recognize the limitations of our study. One of the main restrictions is the model's architecture, which, although optimized for the data set used, may not be the most suitable for all malware varieties. The choice of specific layers, activation functions, and other parameters can limit the model's ability to adapt to new variants. Furthermore, generalization to new malware samples remains a challenge. Approximately 5% of new samples were undetected due to advanced evasion techniques, such as code obfuscation and digital signature manipulation. These techniques can make it difficult to detect new threats, underscoring the need for more flexible and adaptive network architectures.

Another significant limitation is the representativeness of the data set. Although extensive, our data set may not adequately reflect all malware threats in the real environment. 80% of the malware samples represented only five specific types, which may limit the model's ability to detect more diverse threats. Additionally, it was observed that 70% of the samples came from a single geographic region, which could introduce geographic biases and affect the model's ability to detect threats in different contexts.

These limitations suggest that future work should focus on developing more robust architectures and collecting more diverse and representative data sets. Improving model generalization to new malware samples and reducing bias in the data are critical areas for continued research. Integrating transfer learning techniques and using synthetic data to simulate greater sample diversity could be promising approaches to address these challenges.

VI. CONCLUSION

This study has demonstrated the effectiveness of using deep learning techniques for malware detection, providing a significant advance compared to traditional methods and some modern approaches documented in the literature. Our research has integrated static and dynamic features, leveraging advanced neural network architectures to improve precision, recall, and F1-score in cyber threat detection. The results highlight our model's ability to overcome the limitations of previous approaches and underline the importance of a comprehensive and adaptive methodology in cybersecurity.

The developed model achieved a precision of 98%, a recall of 97%, and an F1-score of 0.975, surpassing traditional methods based on signatures and heuristics, which showed accuracies in the 88% to 89% range. Furthermore, compared to recent studies using convolutional and recurrent neural networks, our approach achieved superior metrics thanks to the combination of multiple feature types and the optimization of the model architecture. This high performance is due to the model's ability to capture complex and diverse patterns in malware samples, thereby improving its generalization and detection capabilities.

These findings are important because of their practical applicability. In production environments, threats are dynamic and rapidly evolving; a deep learning model's ability to adapt to and detect new malware variants is crucial. Our approach improves detection in terms of precision and recall and provides greater robustness against advanced evasion techniques used by attackers. This represents a significant advance in protecting critical systems and data against emerging threats.

However, the study has also identified several significant limitations. The model architecture, although practical, may not be optimal for all malware varieties. The fixed neural network structure and hyperparameter settings can limit the model's ability to adapt to new malware variants, especially those that use sophisticated evasion techniques. Approximately 5% of new malware samples were undetected, underscoring the need to develop more adaptive and flexible models.

Furthermore, the representativeness of the data set used in the study poses significant challenges. Although the data set is large and varied, it may not adequately reflect all malware threats in the real environment. 80% of the samples represented only five specific types of malware, which may limit the model's ability to detect more diverse threats. The geographic concentration of the samples, with 70% coming from a single region, can also introduce biases that affect model performance in different geographic contexts.

These limitations indicate that future research should focus on collecting more representative and diverse data sets and on developing more robust and adaptive deep learning architectures. Integrating transfer learning techniques and using synthetic data to simulate greater sample diversity may be promising approaches to improving the model's generalization ability and reducing biases in the data.
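As an illustration of the transfer-learning direction mentioned above, the sketch below freezes a previously trained convolutional base and retrains only a new classification head on data from a new domain (for example, samples from a different geographic region). The stand-in base network, input shape, and learning rate are assumptions; in practice, the model trained in this study would be loaded with tf.keras.models.load_model instead of being rebuilt.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Stand-in for a convolutional base already trained on the original malware-image data set.
pretrained_base = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
], name="pretrained_base")

# Freeze the transferred layers so only the new head adapts to the new, more diverse samples.
pretrained_base.trainable = False

inputs = layers.Input(shape=(64, 64, 1))
features = pretrained_base(inputs, training=False)
x = layers.Dense(64, activation="relu")(features)
outputs = layers.Dense(1, activation="sigmoid")(x)
fine_tuned = Model(inputs, outputs)

fine_tuned.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                   loss="binary_crossentropy",
                   metrics=[tf.keras.metrics.AUC()])

# fine_tuned.fit(new_region_images, new_region_labels, epochs=5)  # data from the new domain
```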
In terms of future work, it is recommended to explore several directions to improve and expand this work. A promising line of research is the development of hybrid models that combine traditional machine learning techniques with deep learning to take advantage of the strengths of both approaches. Additionally, implementing real-time detection systems that dynamically adapt to new threats and adjust their parameters based on live data is crucial to maintaining model relevance and effectiveness in production environments.
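One possible realization of such a hybrid, given here only as a sketch, is to use the trained deep network as a feature extractor and train a classical classifier on its embeddings. The stand-in extractor, the random data, and the choice of a random forest are illustrative assumptions rather than a proposal validated in this study.

```python
import numpy as np
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier

# Deep feature extractor: a small stand-in network producing 64-dimensional embeddings.
extractor = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(256,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu", name="embedding"),
])

# Stand-in data: 256 static features per sample (illustrative only).
X = np.random.rand(500, 256).astype("float32")
y = np.random.randint(0, 2, 500)

# Traditional classifier trained on the deep embeddings.
embeddings = extractor.predict(X, verbose=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(embeddings, y)
print("Training accuracy of the hybrid pipeline:", clf.score(embeddings, y))
```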
It would also be beneficial to investigate the application of explainable AI (XAI) techniques to provide greater transparency and interpretability in model decisions, thus facilitating the model's adoption in safety-critical environments.
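As a simple illustration of this direction, a gradient-based saliency computation indicates which input features most influence a given "malicious" score. The placeholder model and random sample below are assumptions, and a production system would more likely rely on dedicated XAI tooling.

```python
import numpy as np
import tensorflow as tf

# Placeholder detector over 256 static features (illustrative only).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(256,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

sample = tf.convert_to_tensor(np.random.rand(1, 256).astype("float32"))

# Gradient of the malware score with respect to the input: larger magnitudes indicate
# features that most influence this particular decision.
with tf.GradientTape() as tape:
    tape.watch(sample)
    score = model(sample)
saliency = tf.abs(tape.gradient(score, sample))[0].numpy()

top_features = np.argsort(saliency)[::-1][:5]
print("Most influential feature indices for this sample:", top_features)
```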
REFERENCES

[1] E. S. Alomari, R. R. Nuiaa, Z. A. A. Alyasseri, H. J. Mohammed, N. S. Sani, M. I. Esa, and B. A. Musawi, ‘‘Malware detection using deep learning and correlation-based feature selection,’’ Symmetry, vol. 15, no. 1, p. 123, Jan. 2023, doi: 10.3390/sym15010123.
[2] X. Luo, J. Li, W. Wang, Y. Gao, and W. Zhao, ‘‘Towards improving detection performance for malware with a correntropy-based deep learning method,’’ Digit. Commun. Netw., vol. 7, no. 4, pp. 570–579, Nov. 2021, doi: 10.1016/j.dcan.2021.02.003.
[3] Y. J. Kim, C.-H. Park, and M. Yoon, ‘‘FILM: Filtering and machine learning for malware detection in edge computing,’’ Sensors, vol. 22, no. 6, p. 2150, Mar. 2022, doi: 10.3390/s22062150.
[4] Y. Liu, P. Yang, P. Jia, Z. He, and H. Luo, ‘‘MalFuzz: Coverage-guided fuzzing on deep learning-based malware classification model,’’ PLoS ONE, vol. 17, no. 9, Sep. 2022, Art. no. e0273804, doi: 10.1371/journal.pone.0273804.
[5] M. Maray, M. Maashi, H. M. Alshahrani, S. S. Aljameel, S. Abdelbagi, and A. S. Salama, ‘‘Intelligent pattern recognition using equilibrium optimizer with deep learning model for Android malware detection,’’ IEEE Access, vol. 12, pp. 24516–24524, 2024, doi: 10.1109/ACCESS.2024.3357944.
[6] G. Iadarola, F. Martinelli, F. Mercaldo, and A. Santone, ‘‘Towards an interpretable deep learning model for mobile malware detection and family identification,’’ Comput. Secur., vol. 105, Jun. 2021, Art. no. 102198, doi: 10.1016/j.cose.2021.102198.
[7] Ö. A. Aslan and R. Samet, ‘‘A comprehensive review on malware detection approaches,’’ IEEE Access, vol. 8, pp. 6249–6271, 2020, doi: 10.1109/ACCESS.2019.2963724.
[8] A. R. Nasser, A. M. Hasan, and A. J. Humaidi, ‘‘DL-AMDet: Deep learning-based malware detector for Android,’’ Intell. Syst. Appl., vol. 21, Mar. 2024, Art. no. 200318, doi: 10.1016/j.iswa.2023.200318.
[9] H. Rathore, A. Samavedhi, S. K. Sahay, and M. Sewak, ‘‘Robust malware detection models: Learning from adversarial attacks and defenses,’’ Forensic Sci. Int., Digit. Invest., vol. 37, Jul. 2021, Art. no. 301183, doi: 10.1016/j.fsidi.2021.301183.
[10] S. Dong, L. Shu, and S. Nie, ‘‘Android malware detection method based on CNN and DNN hybrid mechanism,’’ IEEE Trans. Ind. Informat., vol. 20, no. 5, pp. 7744–7753, May 2024, doi: 10.1109/TII.2024.3363016.
[11] B. Liu, W. Huo, C. Zhang, W. Li, F. Li, A. Piao, and W. Zou, ‘‘αDiff: Cross-version binary code similarity detection with DNN,’’ in Proc. 33rd ACM/IEEE Int. Conf. Automated Softw. Eng., Sep. 2018, pp. 667–678, doi: 10.1145/3238147.3238199.
[12] P. Yadav, N. Menon, V. Ravi, S. Vishvanathan, and T. D. Pham, ‘‘EfficientNet convolutional neural networks-based Android malware detection,’’ Comput. Secur., vol. 115, Apr. 2022, Art. no. 102622, doi: 10.1016/j.cose.2022.102622.
[13] Y. Yao, Y. Zhu, Y. Jia, X. Shi, L. Zhang, D. Zhong, and J. Duan, ‘‘Research on malware detection technology for mobile terminals based on API call sequence,’’ Mathematics, vol. 12, no. 1, p. 20, Dec. 2023, doi: 10.3390/math12010020.
[14] Z. Chen and J. Cao, ‘‘VMCTE: Visualization-based malware classification using transfer and ensemble learning,’’ Comput., Mater. Continua, vol. 75, no. 2, pp. 4445–4465, 2023, doi: 10.32604/cmc.2023.038639.
[15] A. V. Pandit and D. Mondal, ‘‘Real-time malware detection on IoT devices using behavior-based analysis and neural networks,’’ Res. J. Comput. Syst. Eng., vol. 4, no. 2, pp. 117–129, Dec. 2023, doi: 10.52710/rjcse.82.
[16] S. Y. Yerima, M. K. Alzaylaee, A. Shajan, and P. Vinod, ‘‘Deep learning techniques for Android botnet detection,’’ Electronics, vol. 10, no. 4, p. 519, Feb. 2021, doi: 10.3390/electronics10040519.
[17] F. M. Alotaibi and Fawad, ‘‘A multifaceted deep generative adversarial networks model for mobile malware detection,’’ Appl. Sci., vol. 12, no. 19, p. 9403, Sep. 2022, doi: 10.3390/app12199403.
[18] K. Kong, Z. Zhang, Z.-Y. Yang, and Z. Zhang, ‘‘FCSCNN: Feature centralized Siamese CNN-based Android malware identification,’’ Comput. Secur., vol. 112, Jan. 2022, Art. no. 102514, doi: 10.1016/j.cose.2021.102514.
[19] G. Marín, P. Casas, and G. Capdehourat, ‘‘DeepMAL—Deep learning models for malware traffic detection and classification,’’ in Data Science–Analytics and Applications. Wiesbaden, Germany, 2021, pp. 105–112, doi: 10.1007/978-3-658-32182-6_16.
[20] S. S. Lad and A. C. Adamuthe, ‘‘Improved deep learning model for static PE files malware detection and classification,’’ Int. J. Comput. Netw. Inf. Secur., vol. 14, no. 2, pp. 14–26, Apr. 2022, doi: 10.5815/ijcnis.2022.02.02.
[21] A. Morales, R. Cuevas, and J. M. Martínez, ‘‘Analytical processing with data mining,’’ RECI Revista Iberoamericana de las Ciencias Computacionales e Informática, vol. 5, no. 9, pp. 22–43, 2016. [Online]. Available: http://www.reci.org.mx/index.php/reci/article/view/40/176
[22] J. Dean and S. Ghemawat, ‘‘MapReduce: Simplified data processing on large clusters,’’ Commun. ACM, vol. 51, no. 1, pp. 107–113, Jan. 2008, doi: 10.1145/1327452.1327492.
[23] A. Ksibi, M. Zakariah, L. Almuqren, and A. S. Alluhaidan, ‘‘Efficient Android malware identification with limited training data utilizing multiple convolution neural network techniques,’’ Eng. Appl. Artif. Intell., vol. 127, Jan. 2024, Art. no. 107390, doi: 10.1016/j.engappai.2023.107390.
[24] U. A. Khan and A. Alamäki, ‘‘Designing an ethical and secure pain estimation system using AI sandbox for contactless healthcare,’’ Int. J. Online Biomed. Eng., vol. 19, no. 15, pp. 166–201, Oct. 2023, doi: 10.3991/ijoe.v19i15.43663.
[25] I. Almomani, A. Alkhayer, and W. El-Shafai, ‘‘E2E-RDS: Efficient end-to-end ransomware detection system based on static-based ML and vision-based DL approaches,’’ Sensors, vol. 23, no. 9, p. 4467, May 2023, doi: 10.3390/s23094467.
[26] A. Rasool, A. R. Javed, and Z. Jalil, ‘‘SHA-AMD: Sample-efficient hyper-tuned approach for detection and identification of Android malware family and category,’’ Int. J. Ad Hoc Ubiquitous Comput., vol. 38, nos. 1–3, p. 172, 2021, doi: 10.1504/ijahuc.2021.119097.
[27] B. Menaouer, A. E. H. M. Islem, and M. Nada, ‘‘Android malware detection approach using stacked AutoEncoder and convolutional neural networks,’’ Int. J. Intell. Inf. Technol., vol. 19, no. 1, pp. 1–22, Sep. 2023, doi: 10.4018/ijiit.329956.
[28] M. Aamir, M. W. Iqbal, M. Nosheen, M. U. Ashraf, A. Shaf, K. A. Almarhabi, A. M. Alghamdi, and A. A. Bahaddad, ‘‘AMDDLmodel: Android smartphones malware detection using deep learning model,’’ PLoS ONE, vol. 19, no. 1, Jan. 2024, Art. no. e0296722, doi: 10.1371/journal.pone.0296722.
[29] H. G. Ghifari, D. Darlis, and A. Hartaman, ‘‘Pendeteksi golongan darah manusia berbasis tensorflow menggunakan ESP32-CAM,’’ ELKOMIKA, Jurnal Teknik Energi Elektrik, Teknik Telekomunikasi, Teknik Elektronika, vol. 9, no. 2, p. 359, Apr. 2021, doi: 10.26760/elkomika.v9i2.359.
[30] J. Huang, ‘‘Accelerated training and inference with the TensorFlow object detection API,’’ Google AI Blog, Mountain View, CA, USA, Rep., 2017.
[31] Y. Qiao, W. Zhang, Z. Tian, L. T. Yang, Y. Liu, and M. Alazab, ‘‘Adversarial malware sample generation method based on the prototype of deep learning detector,’’ Comput. Secur., vol. 119, Aug. 2022, Art. no. 102762, doi: 10.1016/j.cose.2022.102762.
[32] M. Chandan, S. G. Santhi, and T. S. Rao, ‘‘Combined shallow and deep learning models for malware detection in WSN,’’ Int. J. Image Graph., vol. 19, no. 2, Sep. 2023, Art. no. 2550034, doi: 10.1142/s0219467825500342.
[33] L. D. M. Ortiz-Aguilar, M. Carpio, J. A. Soria-Alcaraz, H. Puga, C. Díaz, C. Lino, and V. Tapia, ‘‘Training OFF-line hyperheuristics for course timetabling using K-folds cross validation,’’ La Revista Programación Matemática y Softw., vol. 8, pp. 1–8, Oct. 2016.
[34] S. Sen, D. Sugiarto, and A. Rochman, ‘‘Komparasi metode multilayer perceptron (MLP) dan long short term memory (LSTM) dalam peramalan harga beras,’’ Ultimatics, vol. 12, no. 1, pp. 35–41, 2020.
[35] A. A. Darem, F. A. Ghaleb, A. A. Al-Hashmi, J. H. Abawajy, S. M. Alanazi, and A. Y. Al-Rezami, ‘‘An adaptive behavioral-based incremental batch learning malware variants detection model using concept drift detection and sequential deep learning,’’ IEEE Access, vol. 9, pp. 97180–97196, 2021, doi: 10.1109/ACCESS.2021.3093366.
[36] I. Almomani, A. Alkhayer, and W. El-Shafai, ‘‘An automated vision-based deep learning model for efficient detection of Android malware attacks,’’ IEEE Access, vol. 10, pp. 2700–2720, 2022, doi: 10.1109/ACCESS.2022.3140341.
[37] G. Sahani, C. S. Thaker, and S. M. Shah, ‘‘Supervised learning-based approach mining ABAC rules from existing RBAC enabled systems,’’ EAI Endorsed Trans. Scalable Inf. Syst., vol. 10, no. 1, 2023, Art. no. e9, doi: 10.4108/eetsis.v5i16.1560.
[38] M. Cho, J.-S. Kim, J. Shin, and I. Shin, ‘‘mal2D: 2D based deep learning model for malware detection using black and white binary image,’’ IEICE Trans. Inf. Syst., vol. E103-D, no. 4, pp. 896–900, 2020, doi: 10.1587/transinf.2019edl8146.
[39] T. Lu, Y. Du, L. Ouyang, Q. Chen, and X. Wang, ‘‘Android malware detection based on a hybrid deep learning model,’’ Secur. Commun. Netw., vol. 2020, pp. 1–11, Aug. 2020, doi: 10.1155/2020/8863617.
[40] A. Albakri, F. Alhayan, N. Alturki, S. Ahamed, and S. Shamsudheen, ‘‘Metaheuristics with deep learning model for cybersecurity and Android malware detection and classification,’’ Appl. Sci., vol. 13, no. 4, p. 2172, Feb. 2023, doi: 10.3390/app13042172.

ROMMEL GUTIERREZ is a Research Technician with UDLA, Quito, Ecuador, where he applies his knowledge in software development, data science, and cybersecurity. As an IT Engineer with a master's degree in cybersecurity, his focus on AI, data science, cybersecurity, and software development is particularly geared towards education and research. He is passionate about utilizing these technological tools to fortify digital systems and create innovative solutions, with a special emphasis on their applicability in educational settings and research environments.

WILLIAM VILLEGAS-CH. (Member, IEEE) received the master's degree in communications networks and the Ph.D. degree in computer science from the University of Alicante. He is a Professor of information technology with the Universidad de Las Américas, Quito, Ecuador. He is a Systems Engineer, specializing in robotics and artificial intelligence. He has participated in various conferences as a speaker on topics such as ICT in education and how they improve educational quality and student learning. His main articles focus on the design of ICT systems, models, and prototypes applied to different academic environments, especially with the use of big data and artificial intelligence as a basis for creating intelligent educational environments. His main research topics include web applications, data mining, and e-learning.

LORENA NARANJO GODOY received the master's degree in new technologies law and the Ph.D. degree (cum laude) in legal and political sciences from the Universidad Pablo de Olavide, Seville, Spain. She is a Researcher, a BID Consultant, an undergraduate and postgraduate Teacher, the author of several academic articles, and a national and international Lecturer. She is a leading implementer with national and international companies, banks, and other entities in the financial sector, digital platforms, and e-commerce in adopting personal data protection models, cybersecurity, and digital transformation incorporating big data, the Internet of Things, and artificial intelligence, with experience in the public and private sectors. She is the author and a leader of the process of approval of the personal data protection law for Ecuador and other regulations that allowed its implementation in the National System of Public Data Registry, when she was the National Director of DINARDAP. She was the Director of the School of Law, UDLA; an Undersecretary of Normative Development of the Ministry of Justice, Human Rights and Worship; an Advisor to the Presidency of the National Court of Justice; and the National Director of the Public Data Registry. Currently, she is the Director of the Master's in digital law and innovation, with a mention in the economy, trust, and digital transformation, with UDLA, and of the digital law and personal data protection area of Estudio Jurídial.

ARACELY MERA-NAVARRETE received the master's degree in business administration from UIDE. She is a Computer Engineer in Quito, Ecuador. She is an Expert in e-learning platforms FATLA.Org. Her skills and abilities are in computer science and its associated technologies, such as hardware, software, communications, e-learning platforms, the construction of computer systems, and the management of LMS applications (Moodle–CANVAS).

SERGIO LUJÁN-MORA was born in Alicante, Spain, in 1974. He received the degree in computer science and engineering from the University of Alicante, Alicante, in 1998, and the Ph.D. degree in computer engineering from the Department of Software and Computing Systems, University of Alicante, in 2005. He is currently a Senior Lecturer with the Department of Software and Computing Systems, University of Alicante. In recent years, he has focused on e-learning, massive open online courses (MOOCs), open educational resources (OERs), and the accessibility of video games. He is the author of several books and has published many papers in various conferences (ER, UML, and DOLAP) and high impact journals (DKE, JCIS, JDBM, JECR, JIS, JWE, IJEE, and UAIS). His research interests include web applications, web development, and web accessibility and usability.