DL 2P DDoSADF
Keywords: Deep learning; Autoencoder; Reconstruction error; Deep neural network; CICDDoS2019 dataset; DDoS-AT 2022 dataset

In today's tech-driven world, while Internet-based applications drive social progress, their architectural weaknesses, inadequate security measures, lack of network segmentation, unsecured IoT devices, etc., offer ample opportunities for attackers to launch a multitude of attacks on their services. Despite numerous security solutions, the frequent changes in the methods employed by attackers make it challenging for security systems to stay up to date. Moreover, the existing machine learning approaches are confined to known attack patterns and necessitate annotated data. This paper proposes a deep learning-based two-phase DDoS attack detection framework named DL-2P-DDoSADF. The proposed framework has been validated using the CICDDoS2019 and DDoS-AT-2022 datasets. In the first phase, an Autoencoder (AE) is trained using legitimate traffic, and a threshold value is set using the Reconstruction Error (RE). Test data comprising legitimate and attack traffic is used to validate the efficacy of the proposed approach. In the first phase, the trained AE model lets traffic predicted as legitimate pass through the network, whereas traffic predicted as attack proceeds to the second phase, where the type of attack is classified. The performance and efficacy of various deep learning approaches, namely Deep Neural Network (DNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU), are compared as part of the second phase. The autoencoder achieved an accuracy of 99% on both datasets in the first phase. The DNN produced an overall accuracy of 97% and 96% for the CICDDoS2019 and DDoS-AT-2022 datasets, respectively, for multiclass classification. The DNN model performed better than the LSTM and GRU models in the second phase.
1. Introduction

In the past decade, there has been a significant increase in the number of individuals who use the Internet. It is estimated that India will have as many as 900 million Internet users by 2025 [1]. However, this impressive growth comes with less secure devices and unprotected paths for cyberattacks [2]. A cyberattack is an unauthorized effort to access and destroy information within a computer system or to disrupt the entire computer network [3]. Cybercriminals use various methods such as malware, phishing, SQL injection, and Distributed Denial of Service (DDoS) attacks, depending on the weaknesses of the system and network. The most prevalent type of cyberattack is a DDoS attack, which involves overwhelming a target or server with internet traffic to disrupt its services [3]. DDoS attacks are becoming a significant concern for the cyber world as they are easy to carry out but difficult to detect. This has made DDoS attacks a powerful tool for cybercriminals.

In February 2022, websites for the Ministry of Defense and Privat Bank in Ukraine were among those affected by DDoS attacks [4]. According to data from Q1 2022, 53.64% of these attacks were UDP floods [4]. During the third quarter of 2022, DDoS threats grew, particularly those executed by experienced hackers [5]. In September 2022, Google reported that it thwarted a massive DDoS attack that generated 46 million requests per second [6].

Therefore, based on the incidents mentioned above, it is crucial to detect DDoS attacks promptly so that administrators can take preventive measures on time.

In the age of big data and the Internet of Things (IoT), generating a high volume of unlabeled data is common [14], but labeling this data, especially network traffic, is a complex task. This results in a scarcity of labeled data for supervised machine learning (ML) [14]. Most of the current research in this field uses supervised machine learning to classify network traffic. The accuracy of machine learning models
∗ Corresponding author at: Department of Information Technology, UIET, Panjab University, Chandigarh, India.
E-mail address: [email protected] (M. Mittal).
https://doi.org/10.1016/j.jisa.2023.103609
Table 1
Comparison of existing work.
References | Approach used | Feature extraction | Detection or classification | Dataset used | Results
Choi et al. [7] | AE | AE | AE | NSL-KDD | Accuracy: 0.9170, F1-score: 0.9071, Specificity: 0.9815, Precision: 0.9768, Recall: 0.8468
Yang et al. [8] | AE | AE | AE | Synthetic dataset (SYNT), UNB 2017, MAWI | SYNT: Detection rate: 98.32, FPR: 0.38. UNB: Detection rate: 94.10, FPR: 1.88
Meira et al. [9] | AE | AE | AE | NSL-KDD and ISCX | NSL-KDD: F1 score: 63, Precision: 53, Recall: 77, AUC: 83.65. ISCX: F1 score: 69, Precision: 76, Recall: 63, AUC: 80.44
Tang et al. [10] | LightGBM-AE | LightGBM | AE | NSL-KDD | Accuracy: 89.82, Recall: 90.16, Precision: 91.81, F1-score: 90.98
Song et al. [11] | AE | AE | AE | NSL-KDD, IoTID20, and N-BaIoT | NSL-KDD: Accuracy: 0.887, TPR: 0.851, F1 score: 0.895, FPR: 0.066
Hou et al. [12] | NAE and DNN | NAE | NAE and DNN | NSL-KDD, BoT-IoT, and N-BaIoT | NSL-KDD: Accuracy: 90.03, Recall: 94.88, Precision: 88.44, F1 score: 92.21
Aktar et al. [13] | AE | AE | AE | NSL-KDD, CIC-IDS2017, CIC-DDoS2019 | CIC-IDS2017: Precision: 92.46, Recall: 92.45, F1-Score: 92.45, Accuracy: 92.45, AUC: 92.45. NSL-KDD: Precision: 96.10, Recall: 96.08, F1-Score: 96.08, Accuracy: 96.08, AUC: 96.08. CICDDoS2019: Accuracy: 93.41%–97.58%.
is high when the training and evaluation data have similar patterns. However, in real-life scenarios, attackers use new patterns that these models cannot detect precisely. An Autoencoder (AE) can predict zero-day attacks if utilized as a classifier.

Previous studies have used AE either for binary classification or feature extraction (as shown in Table 1). AE is inadequate for identifying specific forms of DDoS attacks when used for categorization and incapable of discovering unknown or previously unseen attacks when used for feature extraction. Hence, there is a need for a technique that can both detect previously unknown attacks and categorize attacks into various classes.

Most of the current techniques for detecting DDoS attacks rely on datasets that do not combine different categories of DDoS attacks with varied attack rates and flash traffic. Therefore, a dataset that incorporates these features is needed to achieve more realistic results.

To overcome the above-mentioned limitations, we propose a Deep Learning based Two-Phase DDoS attack Detection Framework named DL-2P-DDoSADF that distinguishes itself from prior methods by combining the AE with other deep learning techniques, specifically the Deep Neural Network (DNN). The performance of Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) was also assessed for classifying attacks, but the classification performance of DNN was found to be better than that of LSTM and GRU. As a result, DNN was selected for the second phase. Thus, in the first phase, AE detects zero-day attacks, while in the subsequent phase, the DNN categorizes the type of attack.

To our knowledge, none of the known methods (as listed in Table 1) utilize a two-phase approach similar to ours. The proposed method is beneficial in two ways: first, the autoencoder only needs legitimate traffic for training, making it effective for detecting new or zero-day attacks. Second, it is effective in detecting different types of DDoS attacks. The main contributions of this paper are listed as follows:

• Proposed a deep learning-based two-phase DDoS attack detection framework named DL-2P-DDoSADF:
– Autoencoder is trained using legitimate data in the first phase and performs binary classification.
– In the second phase, the attack traffic detected by AE is fed into the trained DNN model to detect different DDoS attack types. In addition, the second phase filters out the normal traffic that may have been misclassified as an attack by AE.
• The various DL models (AE, DNN, LSTM, and GRU) are optimized by tuning hyperparameters using an automated process.
• A dataset named DDoS-AT-2022 (DDoS attacks at the Application and Transport layer) has been generated as part of the work, comprising a mixture of benign traffic, flash traffic, and different types of DDoS attacks at the application and transport layers with varying attack rates.
• The performance of the proposed approach has been validated using various evaluation metrics on two datasets, viz. CICDDoS2019 and DDoS-AT-2022.

The rest of the paper is organized as follows: Section 2 discusses the DL-based DDoS attack detection techniques found in prior research; Section 3 presents the DL-based two-phase DDoS attack detection framework; Section 4 outlines the research methodology; Section 5 shows the results and discussions; and finally, Section 6 concludes the paper and suggests future directions.

2. Related work

Several researchers have proposed DL-based approaches to identify DDoS attacks. In the proposed approach, the Autoencoder (AE) has been utilized as the classifier. Therefore, the existing literature on AEs as binary classifiers is reviewed as follows:

Choi et al. [7] employed different variations of autoencoders, such as Stacked Autoencoder (SAE), Denoising Autoencoder (DAE), and Variational Autoencoder (VAE), to identify intrusions in the NSL-KDD dataset. The basic AE had one hidden layer comprising 32 units, and it employed the ReLU activation function. The training dataset was split into three parts with normal instances making up 99%, 95%, and 90% and abnormal instances making up the remaining 1%, 5%, and 10%, respectively. The results showed that the basic Autoencoder model was the most effective, achieving an accuracy of 91.70%, an F1 score of 0.9071, a specificity of 0.9815, a precision of 0.9768, and a recall of 0.8468 using a training dataset with a normal to abnormal instance ratio of 99% to 1%.

Yang et al. [8] created an AE-based DDoS attack Detection Framework, named AE-D3F, which is unsupervised and straightforward to use. The model requires only normal data to be constructed. The AE model comprises an input layer, three hidden layers, and one output layer. The layer sizes are 27 neurons for the input layer, followed by 24, 16, 24, and 27 neurons for the three hidden layers and output layer, respectively. The activation function employed is leaky ReLU. For optimization, the model uses the Adam optimizer, while the mean squared error (MSE) serves as the loss function. Additionally, the batch size is configured to 32. The threshold has been calculated as 𝛥𝑟𝑒 = c
M. Mittal et al. Journal of Information Security and Applications 78 (2023) 103609
* 𝛿𝑎𝑣𝑔, where 𝛿𝑎𝑣𝑔 denotes the average reconstruction error of normal samples, while c is a fixed value. Experiments conducted on synthetic and public datasets (SYNT, UNB 2017, MAWI) revealed that normal traffic patterns remain consistent within a specific network environment, but cannot be applied to another network. The AE-D3F model can achieve nearly 100% detection rate with less than 0.5% FPR on both known and unknown attack test sets, but it is necessary to set the RE threshold value appropriately.

Meira et al. [9] claimed that the majority of the algorithms for intrusion detection are supervised techniques that have gaps in their capability to detect unseen attacks. Therefore, the authors used several unsupervised learning approaches to detect unseen attacks. In this work, six unsupervised algorithms were used: "AE, Nearest Neighbor, K-Means, Isolation Forest, Support Vector Machines (SVM), and Scaled Convex Hull". The AE architecture is defined as having three hidden layers with 50, 5, and 50 units, respectively. The hyperbolic tangent activation function is applied, and the training is done for 20 epochs. The threshold value used is 0.002. These algorithms are assessed using the NSL-KDD and ISCX datasets. The results showed that all the techniques used are able to detect most of the anomalies with suitable performance.

Tang et al. [10] proposed a Network Intrusion Detection System (IDS) in which the LightGBM approach is applied for feature selection and the AE is used for training and detection. Intrusions are detected by the proposed model using RE. The appropriate threshold was set according to the RE to distinguish between benign and attack traffic. The AE architecture consists of three hidden layers with 48, 32, and 16 neurons, respectively. Each layer uses the ReLU activation function, and the optimizer employed is Adam. The model is trained for 20 epochs with a learning rate of 0.004. The threshold value for the model is determined as the maximum of the difference between TPR and FPR. The Variational Autoencoder (VAE) and DAE were also evaluated in addition to the AE. These are then compared with existing ML techniques like "XGBoost, Decision Tree (DT), Random Forest (RF), K-Nearest Neighbors (KNN), and GBDT". All these approaches were evaluated over the NSL-KDD dataset. The results showed that LightGBM-AE can differentiate between benign and attack traffic better than the other approaches. The accuracy of LightGBM-AE is 89.82%, the recall is 90.16%, the precision is 91.81%, and the F1-score is 90.98%.

Song et al. [11] optimized the structure and hyperparameter settings of the AE, the threshold value, and the latent size of the autoencoder for detection. The AE architecture utilized for the NSL-KDD dataset consists of five hidden layers, comprising both the encoder and decoder. These layers had neuron counts of 32, 16, 4, 16, and 32, respectively. The threshold metric adopted was based on the Z-score derived from the standard normal distribution. They evaluated the approach over the NSL-KDD, IoTID20, and N-BaIoT datasets. The best F1-score, 0.895, was obtained using a latent size of 4 over the NSL-KDD dataset.

Hou et al. [12] proposed a DL-based hybrid detection approach. First, they designed a nonsymmetric autoencoder (NAE) to extract the patterns (or characteristics) of benign traffic and to detect anomalies. The NAE is designed to extract the latent features of the network traffic. Its structure comprises an encoder and a decoder. The encoder consists of two parts with two different convolutional neural networks (CNN); hence it extracts the hidden information of traffic from two different perspectives. The decoder has five linear layers to reconstruct the input. The authors also proposed a scheme to extract latent features using the NAE encoder and used those features to train the DNN model for detection. The DNN is trained with all (benign as well as attack) network traffic using latent features extracted through the NAE encoder. The proposed hybrid scheme takes the detection results from both the NAE and DNN models. If either of the two detects an anomaly, then the traffic is counted as attack traffic. The proposed scheme has been evaluated over the NSL-KDD, BoT-IoT, and N-BaIoT datasets. The results over the three datasets are as follows. NSL-KDD: accuracy 90.03%, recall 94.88%, precision 88.44%, F1 score 92.21%; N-BaIoT: accuracy 99.51, recall 99.81, precision 99.32, F1 score 99.56; BoT-IoT: accuracy 99.80, recall 99.85, precision 99.95, F1 score 99.90. The results show good performance of the proposed hybrid detection model compared to several other cutting-edge detection methods.

Aktar et al. [13] proposed a Deep Contractive Autoencoder (DCAE) to detect DDoS attacks more effectively than traditional IDSs. The DCAE, based on deep learning, is trained using benign instances, allowing it to reconstruct benign inputs with minimal RE. However, for DDoS attacks, the RE will be high. The authors differentiate between benign and DDoS attacks by using the RE as a metric, employing the contractive loss, which includes an additional penalty term alongside the traditional reconstruction loss function of Autoencoders (AEs). The authors determined the threshold value for their model by analyzing the range of minimum and maximum RE from the training set. They evaluated their proposed model using three datasets: CIC-IDS2017, NSL-KDD, and CIC-DDoS2019, focusing exclusively on DoS/DDoS-related attacks. The model was configured with 100 epochs, a batch size of 32, and utilized the Adam optimizer with the contractive loss function. For the NSL-KDD dataset, two hidden layers with 60 and 30 neurons were used, and for CIC-IDS2017, 32 and 16 neurons were used. They evaluated the CIC-DDoS2019 dataset with seven types of traffic and achieved accuracy ranging from 93.41% to 97.58%. On the NSL-KDD and CIC-IDS2017 datasets, the model achieved accuracy rates of 96.08% and 92.45%, respectively. The proposed model outperformed other deep learning models (Basic AE, Variational AE, and LSTM AE) on all three datasets.

After a comprehensive review of the literature, several research gaps that require attention have been identified, as outlined below:

• Most existing methods use a dataset that lacks diversity, with no combination of various types of DDoS attacks and attack rates, as well as flash traffic. A more comprehensive and representative dataset is needed to achieve more accurate results.
• Few detection methods aim to identify various types of DDoS attacks and their associated attack rates, as well as flash traffic.
• The distinction between legitimate traffic and various types of DDoS attacks, which have different attack rates and flash traffic, is challenging due to their shared behavioral characteristics.
• Most of the prior research that utilized AEs as classifiers (as listed in Table 1) has a high False Positive Rate (FPR) and is unable to perform multiclass classification for identifying various types of DDoS attacks.

3. Proposed approach

Several learning paradigms exist, including supervised, unsupervised, semi-supervised, and reinforcement learning. We have used unsupervised learning in phase-I of the proposed approach. In supervised learning, the samples come with known category labels, and the objective is to establish a relationship between the sample features and these labels [15]. Typically, having a greater number of training samples enhances the accuracy of classification and the ability of the trained model to perform well on new, unseen data [15]. However, in the current era of big data and the IoT, a vast amount of unlabeled data is being generated, making the task of labeling this data quite formidable. This is where unsupervised learning becomes valuable, as it can handle such unlabeled data more effectively than supervised learning. Semi-supervised learning addresses the issue of inadequate labeled samples by incorporating numerous unlabeled samples along with a small set of labeled ones to train the classifier [15–18]. Consequently, these algorithms draw knowledge from both labeled
Phase II: DNN, LSTM and GRU for Attack Type Identification
Table 2
Details of training and testing data for the CICDDoS2019 and DDoS-AT-2022 datasets.
Dataset | Approach | Train data | Test data
CICDDoS2019 | Autoencoder | BENIGN: 101227 | Attacks: 3864747, BENIGN: 11134
CICDDoS2019 | DNN, LSTM, GRU | DrDoS MSSQL: 2197825, DrDoS NetBIOS: 1981478, DrDoS UDP: 1546136, DrDoS LDAP: 1071571, Syn: 690036, UDP-lag: 165336, BENIGN: 84679, Total: 7737061 | Benign: 1586, LDAP: 374474, MSSQL: 1116712, NetBIOS: 704085, UDP: 756380, Syn: 911886, UDPLag: 268
DDoS-AT-2022 | Autoencoder | Benign: 17928 | TCP-Syn flood: 19999, UDP flood: 19999, Slow read: 19999, HTTP flood: 19999, Flash traffic: 19999, Slow header: 8050, HTTP low volume low rate: 7800, Slow body: 7506, TCP Syn low: 6000, Benign: 8486
DDoS-AT-2022 | DNN, LSTM, GRU | UDP flood: 50001, HTTP flood: 50001, TCP-Syn flood: 50001, Flash traffic: 50001, Slow read: 50001, Slow body: 8678, Slow header: 8099, TCP Syn low: 16159, HTTP low volume low rate: 8023, Benign: 30832 | Benign: 1209, Low-rate attack: 13745, Slow attack: 35360, Flood attack: 59769, Flash traffic: 19950
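The DDoS-AT-2022 rows of Table 2 list per-attack counts for the Autoencoder test data but grouped counts for the DNN, LSTM, and GRU test data. The following is a minimal Python sketch of that roll-up, given for orientation only. The dictionary and function names are illustrative assumptions; only the counts and the five class labels (0 benign, 1 low-rate, 2 slow, 3 flood, 4 flash) are taken from the paper.

```python
from collections import Counter

# Five-class grouping of DDoS-AT-2022 traffic types (class ids as stated in the paper).
# The dictionary names below are hypothetical and chosen for this sketch.
CLASS_OF = {
    "Benign": 0,
    "HTTP low volume low rate": 1, "TCP Syn low": 1,
    "Slow read": 2, "Slow body": 2, "Slow header": 2,
    "UDP flood": 3, "HTTP flood": 3, "TCP-Syn flood": 3,
    "Flash traffic": 4,
}
CLASS_NAME = {0: "Benign", 1: "Low-rate attack", 2: "Slow attack",
              3: "Flood attack", 4: "Flash traffic"}

# Autoencoder test counts for DDoS-AT-2022, copied from Table 2.
AE_TEST_COUNTS = {
    "TCP-Syn flood": 19999, "UDP flood": 19999, "Slow read": 19999,
    "HTTP flood": 19999, "Flash traffic": 19999, "Slow header": 8050,
    "HTTP low volume low rate": 7800, "Slow body": 7506,
    "TCP Syn low": 6000, "Benign": 8486,
}

def group_counts(counts):
    """Sum per-traffic-type record counts into the five output classes."""
    grouped = Counter()
    for traffic_type, n in counts.items():
        grouped[CLASS_OF[traffic_type]] += n
    return dict(grouped)

grouped = group_counts(AE_TEST_COUNTS)
for class_id in sorted(grouped):
    print(class_id, CLASS_NAME[class_id], grouped[class_id])
```

Each grouped total is at least as large as the corresponding DNN, LSTM, and GRU test count in Table 2, which is consistent with Phase II receiving only the records that Phase I flags as attacks.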
Second phase:
In the second phase, DNN, LSTM, and GRU models are used to classify the types of attacks. These models have been trained using both benign and attack traffic, as described in Table 2. The hyperparameters of the DNN, LSTM, and GRU models are optimized using the Talos tool [24] from a set of parameter space boundaries. These trained models are used to test the traffic identified as an attack in Phase I, and they then identify the type of DDoS attack. Previous works [25–28] have shown that DNN, LSTM, and GRU perform well, so these approaches are chosen in this study.

4. Research methodology

The methodology followed in this paper is shown in Fig. 1. The sequential methodology of the proposed approach is explained below:

4.1. Dataset used

In this research, we employ two datasets, namely CICDDoS2019 [29] and the self-generated DDoS-AT-2022 [30]. The choice of these datasets is driven by their distinct characteristics. The CICDDoS2019 dataset is utilized as it serves as a recent benchmark and encompasses TCP/UDP-based attacks. On the other hand, we include the DDoS-AT-2022 dataset due to its diversity, containing legitimate traffic, flash traffic, and various DDoS attacks occurring at both the application and transport layers. Furthermore, the dataset includes attacks with different rates, such as low, slow, and flood attacks. By using these two datasets, we aim to demonstrate the efficacy of our proposed approach across diverse types of traffic. These datasets are explained below:

• Details about the training data used for DNN, LSTM, and GRU: To train DNN, LSTM, and GRU, seven types of traffic ("BENIGN, DrDoS UDP, DrDoS MSSQL, DrDoS LDAP, DrDoS NetBIOS, Syn, UDP-lag") are taken from the 12th January data of the CICDDoS2019 dataset, and the training dataset created for the Autoencoder is also utilized (as shown in Table 2).
• Details about the test data: To evaluate the proposed approach, seven types of traffic (BENIGN, LDAP, MSSQL, NetBIOS, UDP, Syn, UDPLag) are taken from the 11th March data, and 20% of it is utilized as the testing data (as shown in Table 2).

4.1.2. DDoS-AT-2022 dataset [30]

This section outlines the design of the testbed used to generate the DDoS-AT-2022 dataset. The architecture/design of the testbed is shown in Fig. 2.

The network traffic has been obtained from the DDoS-Testbed (Fig. 2). The testbed has 4 physical PCs split into two groups of two PCs each, with each running Ubuntu and Kali Linux operating systems. In addition, there are 3 D-Link routers, 3 Layer 2 switches, and a Layer 3 switch, along with a Linux server that functions as the web server victim. To capture data, an Intel Xeon Linux server (known as the capturing server) is connected to the Manage Switch. The Manage Switch is connected to two attack networks (routers 2 and 3) and one victim network. To increase the number of virtual nodes, the CORE Emulator [31] is used and installed on three PCs. For a detailed description of the DDoS-AT-2022 dataset, refer to [30].

The DDoS-AT-2022 dataset has been used to train and assess diverse deep-learning models, as indicated in Table 2. The DDoS-AT-2022 dataset comprises various types of DDoS attacks, including UDP flood, HTTP flood, TCP-Syn flood, Flash traffic, Slow read, Slow body, Slow header, TCP Syn low, and HTTP low volume low rate, as well as benign traffic.
The input features that are used for the DDoS-AT-2022 dataset are the same as those used in the CICDDoS2019 dataset, except for two features: inbound and Fwd Header Length.1, which are dropped. The preprocessing steps are similar to those applied to the CICDDoS2019 dataset, except for the type and number of attacks.

In this paper, the generated dataset is divided into five classes: benign traffic is labeled as 0; low-volume low-rate HTTP and TCP Syn low are grouped as low-rate attacks and labeled as 1; Slow read, Slow body, and Slow header are grouped as slow attacks and labeled as 2; UDP flood, HTTP flood, and TCP-Syn flood are grouped as flood attacks and labeled as 3; and flash traffic is labeled as 4. Thus, multiclass classification has five output classes: benign (0), low-rate attack (1), slow attack (2), flood attack (3), and flash traffic (4).

4.3. Hyperparameter values

In this section, we tune the hyperparameter values of the models used in the proposed framework for the CICDDoS2019 and DDoS-AT 2022 datasets. The process is explained below:

4.3.1. CICDDoS2019 dataset

In this study, the Talos tool [24] is utilized to tune the hyperparameters of the models. First, the models are created and their functionality is tested [24]. In the next step, hyperparameter space boundaries for AE, DNN, LSTM, and GRU are established in the parameters dictionary, as depicted in Tables 3, 4, 5, and 6. The experiment was then run using the Scan() function, and results were evaluated using Evaluate(). After experimenting with different hyperparameter values, the models with the best results are identified at specific hyperparameter values, as shown in Tables 3, 4, 5, and 6. These models were then used to predict the test data.

4.3.2. DDoS-AT-2022 dataset

The configurations of the models used in the proposed framework for the DDoS-AT 2022 dataset are explained below:

• Autoencoder: The structure of the AE model for the DDoS-AT-2022 dataset is identical to that used for the CICDDoS2019 dataset, as depicted in Table 3. The only differences are the number of epochs used and the threshold value. The number of epochs utilized is 380, and the threshold value used for testing data is 0.02133163307724133.
• DNN: The architecture of the DNN model for the DDoS-AT-2022 dataset is the same as that used for the CICDDoS2019 dataset, as depicted in Table 4. The only difference is the number of epochs, which is 70 in this case.
• LSTM: The structure of the LSTM model for the DDoS-AT-2022 dataset differs slightly from the architecture used for the CICDDoS2019 dataset, as shown in Table 5. The number of hidden layers is set to 4, with neuron sizes of 60, 50, 40, and 30, respectively. The batch size used is 1024, and the number of epochs used is 9. There are 5 neurons in the output layer. The remaining parameters are the same as those used for the CICDDoS2019 dataset.
• GRU: The design of the GRU model for the DDoS-AT-2022 dataset differs slightly from the architecture used for the CICDDoS2019 dataset, as shown in Table 6. The model uses 140 GRU units with 3 hidden layers. The number of neurons in each hidden layer is 90, 50, and 30, respectively. The batch size used is 1000. The training was done for 6 epochs, and there are 5 neurons in the output layer. The remaining parameters are the same as those used for the CICDDoS2019 dataset.

4.4. Testing using the trained model

As illustrated in Fig. 1, the trained AE model is utilized to categorize test data as benign or an attack. Traffic that is predicted as benign is
Table 3
AE hyperparameter space boundaries.
S. no. Hyperparameters Values Best value chosen
1. First neuron 128, 100, 69, 64, 84, 32 64
2. Second neuron 64, 50, 32, 16, 18, 8 32
3. Third neuron 64, 50, 32, 30, 16, 8 32
4. Fourth neuron 50, 32, 18, 16, 8 32
5. Fifth neuron 32, 16 32
6. Sixth neuron 32, 16 16
7. Seventh neuron 32, 16, 8 16
8. Eighth neuron 16, 8 8
9. Ninth neuron 16, 8 8
10. Tenth neuron 16, 8 –
11. Eleventh neuron 16, 8 –
12. Twelfth neuron 8, 6 –
13. Thirteenth neuron 8 –
14. Activation Function tanh tanh
15. Bottleneck Layer 32, 16, 8 8
16. No. of encoder and decoder layers 4, 5, 6, 7, 8, 9, 10, 11 9
17. Dropout rate 0.0, 0.2, 0.3, 0.03, 0.4, 0.04, 0.5, 0.05 0.05
18. Batch size 512 512
19. Optimizer adam adam
20. Epochs 5, 10, 20, 50, 80, 100, 120 5
21. Loss Mean squared error Mean squared error
22. Threshold value – 0.0397644828300993
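As an illustration of what "hyperparameter space boundaries" means in practice, part of Table 3 can be written as a dictionary that maps each hyperparameter to its candidate values, which is the general shape of the parameters dictionary mentioned in the text. The sketch below is an assumption-laden example and not the authors' code: the key names are hypothetical, only a subset of the rows is included, and the exhaustive enumeration with `itertools.product` stands in for the search itself rather than reproducing the Talos API.

```python
from itertools import product

# Subset of the AE search space from Table 3.
# Key names are hypothetical; candidate values are copied from the table.
ae_params = {
    "first_neuron": [128, 100, 69, 64, 84, 32],
    "bottleneck": [32, 16, 8],
    "dropout": [0.0, 0.2, 0.3, 0.03, 0.4, 0.04, 0.5, 0.05],
    "batch_size": [512],
    "epochs": [5, 10, 20, 50, 80, 100, 120],
    "activation": ["tanh"],
    "optimizer": ["adam"],
}

def grid(space):
    """Yield every combination of candidate values as a config dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid(ae_params))
print(len(configs))  # number of candidate configurations in this subset

# The "best value chosen" column of Table 3, restricted to the same keys.
best = {"first_neuron": 64, "bottleneck": 8, "dropout": 0.05,
        "batch_size": 512, "epochs": 5, "activation": "tanh",
        "optimizer": "adam"}
print(best in configs)
```

Even this subset produces over a thousand combinations, which is why an automated search is used instead of manual tuning.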
Table 4
DNN hyperparameter space boundaries.
S. No. Hyperparameters Values Best value chosen
1. First hidden neuron 160, 128, 69 69
2. Remaining hidden neurons 60, 50 50
3. No. of hidden layers 5, 4 4
4. Dropout rate 0, 0.01, 0.1, 0.02, 0.2, 0.03, 0.3 0
5. Batch size 1000, 700, 512, 128 512
6. Optimizer adam adam
7. Kernel initializer uniform uniform
8. Epochs 50, 20, 10, 2 20
9. Activation function in last layer softmax softmax
10. Loss categorical crossentropy categorical crossentropy
11. No. of the neuron at the output layer 7 7
Table 5
LSTM hyperparameter space boundaries.
S. no. Hyperparameters Values Best value chosen
1. LSTM units in layer 1 80 80
2. LSTM units in layer 2 60, 50 –
3. Hidden layers 2, 3 3
4. Neuron in the first hidden layer 60, 50, 30 60
5. Neuron in the second hidden layer 40, 50, 30 40
6. Neuron in the third hidden layer 50, 30 30
7. Activation for layers relu relu
8. Batch Size 2000, 1000, 700, 512, 256, 128 2000
9. Optimizer adam adam
10. Learning rate 0.001, 0.0001 0.001
11. Epochs 1, 5, 10 5
12. No. of the neuron at the output layer 7 7
13. Activation function in last layer softmax softmax
14. Loss categorical crossentropy categorical crossentropy
permitted to pass through the network, while predicted attack traffic is further classified into various types of DDoS attacks using trained DNN, LSTM, and GRU models. Algorithms 1 to 3 provide a thorough description of the proposed approach using the CICDDoS2019 dataset. The process is the same for the DDoS-AT-2022 dataset, except for the number of records and classes. Algorithm 1 outlines the training approach of the AE for the CICDDoS2019 dataset. Algorithm 2 elaborates on the training strategy of the DNN, LSTM, and GRU models using the CICDDoS2019 dataset. Subsequently, Algorithm 3 describes the testing procedure of the proposed approach, DL-2P-DDoSADF, over the CICDDoS2019 dataset.

The results of this approach are described in Section 5.

4.5. Performance metrics

The paper assesses the most widely used performance metrics such as accuracy, precision, recall, F-measure (also known as F1-score), and AUC-ROC.

Accuracy: This is determined by the proportion of correct predictions made by the model across all classes [33]. The formula for calculating this is as follows: (TP+TN)/Total, where TP is true positives and TN is true negatives.

Precision: It is the proportion of correctly classified positive records among the total number of records predicted to be positive. It can be expressed as TP/(TP+FP), where FP is false positives [34].
Table 6
GRU hyperparameter space boundaries.
S. no. Hyperparameters Values Best value chosen
1. GRU units 70, 80 80
2. Hidden layers 2, 3 3
3. Neuron in the first hidden layer 50, 60 50
4. Neuron in the second hidden layer 50, 40 50
5. Neuron in the third hidden layer 50, 30 30
6. Activation for Layers relu relu
7. Batch Size 2000, 1000, 700, 512, 256, 128 2000
8. Optimizer adam adam
9. Learning rate 0.001, 0.0001 0.001
10. Epochs 1, 5, 10 10
11. No. of the neuron at the output layer 7 7
12. Activation function in last layer softmax softmax
13. Loss categorical crossentropy categorical crossentropy
Algorithm 1 The training algorithm for AE model (first phase) of proposed DL-2P-DDoSADF for CICDDoS2019 dataset
Input:
CSV files: BENIGN traffic from the 12th January of the CICDDoS2019 dataset
Output:
Trained model: AE_model
1: Read the CSV files dated January 12th from the CICDDoS2019 dataset.
2: Extract BENIGN traffic from the read CSV files.
3: Select the 69 features to be used for the AE.
4: Drop the records which have NaN and infinity values.
5: Got BENIGN ← 101227 as final training data.
6: Normalize the data using MinMaxScaler().
7: Split the dataset into train and validation with ratio 80:20.
8: Hyperparameter tuning using Talos tool:
(1) Preparation and testing of the model.
(2) Setting the parameter space boundaries in the Params dictionary.
(3) Configure the experiment and run the hyperparameters with Scan().
(4) Evaluate the result.
(5) Got the acceptable model.
9: Got the trained AE model from Step 8.
10: Threshold value chosen: 0.0397644828300993.
11: Save the trained AE_model.
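The way the threshold from step 10 is used at detection time can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the reconstruction error is assumed to be the mean squared error between a normalized record and its reconstruction, a record is assumed to be flagged when the error is strictly above the threshold, and `toy_reconstruct` is a hypothetical stand-in for the trained AE_model.

```python
from typing import Callable, List, Sequence

# Threshold value reported in step 10 of Algorithm 1.
THRESHOLD = 0.0397644828300993


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between a record and its reconstruction.

    Assumption: the paper's reconstruction error (RE) is taken to be the MSE.
    """
    if len(x) != len(x_hat):
        raise ValueError("record and reconstruction must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def is_attack(
    x: Sequence[float],
    reconstruct: Callable[[Sequence[float]], List[float]],
    threshold: float = THRESHOLD,
) -> bool:
    """Flag a record as attack when its RE exceeds the threshold (assumed rule)."""
    return reconstruction_error(x, reconstruct(x)) > threshold


def toy_reconstruct(x: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the trained AE.

    It reproduces values inside [0.4, 0.6] exactly and clips everything else,
    mimicking a model that only learned to rebuild "normal" records.
    """
    return [min(max(v, 0.4), 0.6) for v in x]


if __name__ == "__main__":
    benign_like = [0.5, 0.45, 0.55, 0.5]
    attack_like = [1.0, 0.0, 1.0, 0.0]
    print(is_attack(benign_like, toy_reconstruct))
    print(is_attack(attack_like, toy_reconstruct))
```

Because the AE is trained on benign traffic only, records it cannot rebuild well produce a large error, which is what the comparison against the threshold exploits.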
True positive rate (TPR): It is also known as Recall [34]. It can be expressed as TP/(TP + FN), where FN is false negatives.
F-measure: It is also known as the F1 score. It can be represented as 2 × Precision × Recall/(Precision + Recall).
AUC: The area under the ROC curve summarizes the performance of classification models across different threshold values. Its formula is given below [35,36]: AUC = ((Recall − False Alarm) + 100)/200.
5. Results and discussions
5.1. Experimental environment
The experiments utilized an Ubuntu operating system (OS) (22.04.1 LTS), an Intel® Xeon(R) Gold 5220R CPU @ 2.20 GHz × 48, and 128 GiB RAM with an NVIDIA Corporation GA104GL [RTX A4000] graphics card. The deep learning models were developed using Python 3.9.7 and the TensorFlow and Keras libraries. The results of the proposed approach for both datasets are explained below:
5.2. Results of proposed DL-2P-DDoSADF over CICDDoS2019 dataset
5.2.1. Results of binary classifier (Autoencoder)
The trained AE model has been tested on the test data, as described in Table 2, using the CICDDoS2019 dataset. The results are presented as two confusion matrices, shown in Table 7, for benign and attack data respectively. The performance metrics reported with these matrices indicate that the model is effective in detecting DDoS attacks; however, it has a lower recall value compared to other metrics for benign data. The reason for this is that the AE (phase-I of the proposed approach) has incorrectly identified some benign traffic as attack traffic. In order to improve the results, a second phase was implemented to filter out normal traffic that was misclassified by the autoencoder and to identify specific types of DDoS attacks.
Table 7
Confusion matrix and performance metrics over CICDDoS2019 dataset.
Confusion matrix and performance metrics w.r.t. benign
Actual values | Predicted: Benign (0) | Predicted: Attack (1)
Benign (0) | 9548 | 1586
Attack (1) | 942 | 3 863 805
Performance metrics: Accuracy: 0.9993, Precision: 0.9102, Recall: 0.8575, F1-Score: 0.8830
Confusion matrix and performance metrics w.r.t. attack
Actual values | Predicted: Attack (1) | Predicted: Benign (0)
Attack (1) | 3 863 805 | 942
Benign (0) | 1586 | 9548
Performance metrics: Accuracy: 0.9993, Precision: 0.9995, Recall: 0.9997, F1-Score: 0.9995
Table 2 demonstrates that there are a total of 3,864,747 attack flows and 11,134 benign flows in the test data. In the first phase, the AE functioned as a binary classifier and correctly identified 9548
Algorithm 2 The training algorithm for second phase of proposed DL-2P-DDoSADF for CICDDoS2019 dataset
Input:
CSV files: BENIGN, DrDoS-UDP, DrDoS-MSSQL, DrDoS-LDAP, DrDoS-NetBIOS, Syn, UDP-lag from the 12th January of the CICDDoS2019 dataset and the training dataset created for the AE (in Algorithm 1)
Output:
Trained models: DNN, LSTM, and GRU models
1: Read CSV files of BENIGN, DrDoS-UDP, DrDoS-MSSQL, DrDoS-LDAP, DrDoS-NetBIOS, Syn, UDP-lag from the 12th January of the CICDDoS2019 dataset and the training dataset created for the AE (in Algorithm 1).
2: Select the 69 features to be used for the DNN, LSTM, and GRU.
3: Drop the records which have NaN and infinity values.
4: Got the following as final training data.
(1) Benign ← 84679
(2) DrDoS-MSSQL ← 2197825
(3) DrDoS-NetBIOS ← 1981478
(4) DrDoS-UDP ← 1546136
(5) DrDoS-LDAP ← 1071571
(6) Syn ← 690036
(7) UDP-lag ← 165336
5: Normalize the data using MinMaxScaler().
6: The to_categorical function converts the classes "BENIGN = 0, DrDoS LDAP = 1, DrDoS MSSQL = 2, DrDoS NetBIOS = 3, DrDoS UDP = 4, Syn = 5, UDP-Lag = 6" into a binary class matrix.
7: Hyperparameter tuning using Talos tool:
(1) Preparation and testing of the models (DNN, LSTM, and GRU).
(2) Setting the parameter space boundaries in the Params dictionary.
(3) Configure the experiment and run the hyperparameters with Scan().
(4) Evaluate the result.
(5) Got the acceptable models.
8: Got the trained DNN, LSTM, and GRU models from Step 7.
9: Save the trained DNN, LSTM, and GRU models.
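Steps 3, 5, and 6 of Algorithm 2 can be illustrated with a small dependency-free sketch. It is not the authors' code: the paper uses scikit-learn's MinMaxScaler and Keras's to_categorical, whereas the helper functions below (`drop_non_finite`, `min_max_scale`, `one_hot`) are hypothetical pure-Python equivalents written only to show what each step does to the data. Mapping a constant column to 0.0 is an assumption of this sketch.

```python
import math

# Class indices as listed in step 6 of Algorithm 2.
CLASS_INDEX = {
    "BENIGN": 0,
    "DrDoS LDAP": 1,
    "DrDoS MSSQL": 2,
    "DrDoS NetBIOS": 3,
    "DrDoS UDP": 4,
    "Syn": 5,
    "UDP-Lag": 6,
}


def drop_non_finite(rows, labels):
    """Step 3: keep only records whose features are all finite (no NaN or inf)."""
    kept = [
        (row, label)
        for row, label in zip(rows, labels)
        if all(math.isfinite(value) for value in row)
    ]
    return [row for row, _ in kept], [label for _, label in kept]


def min_max_scale(rows):
    """Step 5: rescale every feature column to the range [0, 1].

    A constant column is mapped to 0.0 (an assumption of this sketch).
    """
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]


def one_hot(label, num_classes=len(CLASS_INDEX)):
    """Step 6: turn a class label into a row of the binary class matrix."""
    vector = [0] * num_classes
    vector[CLASS_INDEX[label]] = 1
    return vector


if __name__ == "__main__":
    rows = [[1.0, 10.0], [3.0, float("nan")], [5.0, 30.0], [float("inf"), 20.0]]
    labels = ["BENIGN", "Syn", "UDP-Lag", "DrDoS UDP"]
    clean_rows, clean_labels = drop_non_finite(rows, labels)
    print(min_max_scale(clean_rows))
    print([one_hot(label) for label in clean_labels])
```

In practice the minimum and maximum learned from the training data would be reused unchanged on the test data, so that both are scaled consistently.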
Table 8
Classwise performance metrics using DNN, LSTM, and GRU over CICDDoS2019 dataset.
Class type | Precision (DNN) | Recall (DNN) | F1-score (DNN) | Precision (LSTM) | Recall (LSTM) | F1-score (LSTM) | Precision (GRU) | Recall (GRU) | F1-score (GRU) | Support
Benign | 0.90 | 0.97 | 0.93 | 0.82 | 0.88 | 0.85 | 0.77 | 0.92 | 0.84 | 1586
LDAP | 0.94 | 0.98 | 0.96 | 0.93 | 0.99 | 0.96 | 0.94 | 0.98 | 0.96 | 374 474
MSSQL | 0.98 | 0.96 | 0.97 | 0.98 | 0.91 | 0.95 | 0.98 | 0.94 | 0.96 | 1 116 712
NetBIOS | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 704 085
UDP | 0.98 | 0.98 | 0.98 | 0.91 | 0.97 | 0.94 | 0.94 | 0.98 | 0.96 | 756 380
Syn | 1.00 | 0.93 | 0.96 | 1.00 | 1.00 | 1.00 | 1.00 | 0.93 | 0.96 | 911 886
UDPLag | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 268
Macro avg | 0.83 | 0.83 | 0.83 | 0.81 | 0.82 | 0.81 | 0.80 | 0.82 | 0.81 | 3 865 391
Weighted avg | 0.98 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.98 | 0.96 | 0.97 | 3 865 391
Accuracy: DNN 0.97, LSTM 0.97, GRU 0.96 (support 3 865 391)
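The gap between the macro and weighted averages in Table 8 follows directly from the class supports. The short sketch below recomputes both averages for the DNN F1-scores from the per-class values in the table; the function names are illustrative, and the inputs are the rounded table entries, so the results agree with the table only to two decimals.

```python
# Per-class DNN F1-scores and supports transcribed from Table 8.
DNN_F1 = {
    "Benign": 0.93,
    "LDAP": 0.96,
    "MSSQL": 0.97,
    "NetBIOS": 1.00,
    "UDP": 0.98,
    "Syn": 0.96,
    "UDPLag": 0.00,
}
SUPPORT = {
    "Benign": 1586,
    "LDAP": 374474,
    "MSSQL": 1116712,
    "NetBIOS": 704085,
    "UDP": 756380,
    "Syn": 911886,
    "UDPLag": 268,
}


def macro_average(scores):
    """Unweighted mean: every class counts equally, whatever its size."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores, support):
    """Support-weighted mean: each class counts in proportion to its records."""
    total = sum(support.values())
    return sum(scores[name] * support[name] for name in scores) / total


if __name__ == "__main__":
    print("macro F1:   ", round(macro_average(DNN_F1), 2))
    print("weighted F1:", round(weighted_average(DNN_F1, SUPPORT), 2))
```

The UDPLag class has only 268 records and an F1-score of zero, so it lowers the macro average by a full seventh while barely affecting the weighted average.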
Table 9
Confusion matrix and performance metrics over DDoS-AT-2022 dataset.
Confusion matrix and performance metrics w.r.t. benign
Actual values | Predicted: Benign (0) | Predicted: Attack (1)
Benign (0) | 7229 | 1209
Attack (1) | 50 | 128 824
Performance metrics: Accuracy: 0.9908, Precision: 0.9931, Recall: 0.8567, F1-Score: 0.9198
Confusion matrix and performance metrics w.r.t. attack
Actual values | Predicted: Attack (1) | Predicted: Benign (0)
Attack (1) | 128 824 | 50
Benign (0) | 1209 | 7229
Performance metrics: Accuracy: 0.9908, Precision: 0.9907, Recall: 0.9996, F1-Score: 0.9951
flows out of 11,134 benign flows and 3,863,805 flows out of 3,864,747
attack flows, as shown in Table 7. The predicted benign flows, 9548
+ 942, were permitted to pass through the network while the attack
flows, 3,863,805 + 1586, were further processed in the second phase
of detection.
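The metric definitions from Section 4 can be checked against the Table 7 counts with a few lines of arithmetic. The sketch below is illustrative (the helper name `binary_metrics` is hypothetical); it treats one class at a time as the positive class and also derives how many flows leave the pipeline after phase I and how many continue to phase II.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall, F1, TNR and FPR from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "tnr": tn / (tn + fp),
        "fpr": fp / (fp + tn),
    }


# Counts from Table 7 (CICDDoS2019 test data).
BENIGN_AS_BENIGN = 9548
BENIGN_AS_ATTACK = 1586
ATTACK_AS_BENIGN = 942
ATTACK_AS_ATTACK = 3863805

# Metrics with benign as the positive class, then with attack as the positive class.
wrt_benign = binary_metrics(
    tp=BENIGN_AS_BENIGN, fn=BENIGN_AS_ATTACK, fp=ATTACK_AS_BENIGN, tn=ATTACK_AS_ATTACK
)
wrt_attack = binary_metrics(
    tp=ATTACK_AS_ATTACK, fn=ATTACK_AS_BENIGN, fp=BENIGN_AS_ATTACK, tn=BENIGN_AS_BENIGN
)

# Flows predicted benign pass through; flows predicted attack go to phase II.
passed_to_network = BENIGN_AS_BENIGN + ATTACK_AS_BENIGN
sent_to_phase_two = ATTACK_AS_ATTACK + BENIGN_AS_ATTACK

if __name__ == "__main__":
    for name, metrics in (("benign", wrt_benign), ("attack", wrt_attack)):
        print(name, {key: round(value, 4) for key, value in metrics.items()})
    print("passed:", passed_to_network, "phase II:", sent_to_phase_two)
```

The computed values agree with Table 7 up to rounding in the last reported digit, and the phase II count matches the total support in Table 8.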
Table 10
Classwise performance metrics using DNN, LSTM, and GRU model over DDoS-AT-2022 dataset.
Class type | Precision (DNN) | Recall (DNN) | F1-score (DNN) | Precision (LSTM) | Recall (LSTM) | F1-score (LSTM) | Precision (GRU) | Recall (GRU) | F1-score (GRU) | Support
Benign | 0.99 | 1.00 | 1.00 | 0.39 | 0.73 | 0.50 | 0.95 | 0.89 | 0.92 | 1209
Low-rate | 0.98 | 0.65 | 0.78 | 0.81 | 0.73 | 0.77 | 0.97 | 0.58 | 0.72 | 13 745
Slow rate | 0.89 | 0.98 | 0.93 | 0.92 | 0.88 | 0.90 | 0.87 | 0.94 | 0.90 | 35 360
Flood | 0.99 | 1.00 | 0.99 | 0.96 | 0.99 | 0.98 | 0.96 | 0.99 | 0.98 | 59 769
Flash traffic | 0.99 | 1.00 | 1.00 | 0.99 | 1.00 | 0.99 | 0.99 | 1.00 | 0.99 | 19 950
Macro avg | 0.97 | 0.93 | 0.94 | 0.81 | 0.87 | 0.83 | 0.95 | 0.88 | 0.90 | 130 033
Weighted avg | 0.96 | 0.96 | 0.95 | 0.93 | 0.93 | 0.93 | 0.94 | 0.94 | 0.93 | 130 033
Accuracy: DNN 0.96, LSTM 0.93, GRU 0.94 (support 130 033)
Table 11
Comparison of proposed approach with existing work.
References | Approach used | Dataset used | Accuracy | Precision | Recall | F1-score | TNR | FPR
Choi et al. [7] | AE | NSL-KDD | 91.70 | 97.68 | 84.68 | 90.71 | 98.15 | –
Yang et al. [8] | AE | Synthetic (SYNT) | – | – | 98.32 | – | – | 0.38
Yang et al. [8] | AE | UNB 2017 | – | – | 94.10 | – | – | 1.88
Meira et al. [9] | AE | NSL-KDD | – | 53 | 77 | 63 | – | –
Meira et al. [9] | AE | ISCX | – | 76 | 63 | 69 | – | –
Tang et al. [10] | LightGBM-AE | NSL-KDD | 89.82 | 91.81 | 90.16 | 90.98 | – | –
Song et al. [11] | AE | NSL-KDD | 88.7 | – | 85.1 | 89.5 | – | 0.066
Hou et al. [12] | NAE and DNN | NSL-KDD | 90.03 | 88.44 | 94.88 | 92.21 | – | –
Aktar et al. [13] | AE | NSL-KDD | 96.08 | 96.10 | 96.08 | 96.08 | – | –
Aktar et al. [13] | AE | CIC-IDS2017 | 92.45 | 92.46 | 92.45 | 92.45 | – | –
Aktar et al. [13] | AE | CIC-DDoS2019 | 93.41%–97.58% | – | – | – | – | –
DL-2P-DDoSADF (Proposed approach) | AE (binary) and DNN (multiclass). Phase-I: AE (binary) | CICDDoS2019 | 99.93 | 91.02 | 85.75 | 88.30 | 99.97 | 0.024
DL-2P-DDoSADF (Proposed approach) | Phase-II: DNN (multiclass) | CICDDoS2019 | 97 | 98 | 97 | 97 | – | –
DL-2P-DDoSADF (Proposed approach) | Phase-I: AE (binary) | DDoS-AT-2022 | 99.08 | 99.31 | 85.67 | 91.98 | 99.96 | 0.038
DL-2P-DDoSADF (Proposed approach) | Phase-II: DNN (multiclass) | DDoS-AT-2022 | 96 | 96 | 96 | 95 | – | –
Fig. 4. Comparison of overall accuracy for DNN, LSTM and GRU models over CICDDoS2019 dataset.
5.3. Results of proposed DL-2P-DDoSADF over DDoS-AT-2022 dataset
The DDoS-AT-2022 dataset contains a blend of benign traffic, flash traffic, and various DDoS attacks at the transport and application layers with varying attack rates (such as low, slow, and flood). Our proposed DL-2P-DDoSADF has successfully predicted all these types of traffic. The results for the AE are presented in Table 9. The AE has a lower recall (85.67% in the confusion matrix w.r.t. benign), meaning that some benign traffic is wrongly classified as attack traffic. To address this limitation, we have introduced a second phase in the proposed approach to filter out the normal traffic that was mistakenly classified as attack traffic by the AE. Additionally, this phase aims to identify the types of attacks accurately. The remaining AE metrics in Table 9 are all above 0.91. The predicted attacks by the AE model are fed to the DNN, LSTM, and GRU models to classify them into five classes: benign (0), low-rate attack (1), slow attack (2), flood attack (3), and flash traffic (4). Also, these models are able to filter out benign traffic that has been misclassified as attack traffic by the AE during the first phase.
The DNN model provides the best results for multiclass classification compared to the LSTM and GRU models. It attains the highest F1-score for every class (0, 1, 2, 3, 4), with an overall accuracy of 96% for multiclass classification. The DNN model performed worse on low-rate attacks than on the other classes (recall of 0.65), because low-rate traffic is very difficult to detect: its volume and patterns are similar to those of legitimate user traffic. The LSTM model has worse results in predicting class 0 (i.e., benign traffic) because the number of benign records is very low, and these records are randomly distributed over the test data. Consequently, the limited number of benign records is insufficient to provide the model with enough information to identify dominant patterns effectively. The GRU results are comparable to those of the DNN model. The results of the DNN, LSTM, and GRU models are presented in Table 10. The F1-score of class 1 (low-rate attacks) is below 80% for the DNN model (78%). The LSTM model performed poorly in predicting benign traffic and low-rate attacks, as the F1 scores of classes 0 and 1 (50% and 77% respectively) are low compared to the other classes. The GRU model has an F1 score of 72% for low-rate attacks. Additionally, all three models (DNN, LSTM, and GRU) performed well in identifying flash traffic. The DNN model has the highest AUC-ROC values for all classes compared to the LSTM and GRU models. The ROC-AUC curves for these models are shown in Figs. 5(a)–5(c).
Table 10 compares the precision, recall, and F1 scores of the DNN, LSTM, and GRU models on the DDoS-AT-2022 dataset, specifically for individual DDoS attacks, flash traffic, and benign traffic. The results reveal that the DNN model outperformed the LSTM and GRU models in class-wise classification, as indicated by its higher F1 score in every class and its generally higher precision and recall.
Furthermore, Fig. 6 illustrates the overall accuracy for multiclass classification on the DDoS-AT-2022 dataset. The results indicate that the DNN model achieved a higher overall accuracy (96%) than both the LSTM (93%) and GRU (94%) models.
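The two-phase flow described in this section can be summarized in a short routing sketch. It is a hedged illustration rather than the authors' testing procedure: `toy_reconstruction_error` and `toy_classify` are hypothetical stand-ins for the trained AE and the trained DNN, and the threshold value used in the demo is made up, since the DDoS-AT-2022 threshold is not given in this section.

```python
CLASS_NAMES = ["benign", "low-rate attack", "slow attack", "flood attack", "flash traffic"]


def two_phase_decision(record, reconstruction_error, classify, threshold):
    """Route one record through both phases and return (phase, class name).

    Phase I: records whose reconstruction error does not exceed the threshold
    are treated as benign and passed through.
    Phase II: every other record is classified into one of the five classes;
    a phase-II "benign" result recovers traffic that phase I flagged by mistake.
    """
    if reconstruction_error(record) <= threshold:
        return 1, "benign"
    scores = classify(record)
    best = max(range(len(scores)), key=scores.__getitem__)
    return 2, CLASS_NAMES[best]


def toy_reconstruction_error(record):
    """Hypothetical stand-in for the AE's reconstruction error."""
    return max(record)


def toy_classify(record):
    """Hypothetical stand-in for the DNN's softmax output over five classes."""
    if record[0] > 0.8:
        return [0.05, 0.05, 0.10, 0.70, 0.10]
    return [0.90, 0.025, 0.025, 0.025, 0.025]


if __name__ == "__main__":
    demo_threshold = 0.05  # made-up value for the demo only
    for record in ([0.01, 0.02], [0.5, 0.1], [0.9, 0.9]):
        print(record, two_phase_decision(
            record, toy_reconstruction_error, toy_classify, demo_threshold
        ))
```

Keeping a benign class in the second-phase classifier is what allows the pipeline to correct first-phase false alarms instead of only labelling attack types.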
Fig. 5. AUC-ROC curve for DNN, LSTM, and GRU models over DDoS-AT-2022 dataset.
References
[1] India to have 900 million internet users by 2025: Report | IBEF. 2022, https://www.ibef.org/news/india-to-have-900-million-internet-users-by-2025-report.
[2] Patil NV, Krishna CR, Kumar K. Distributed frameworks for detecting distributed denial of service attacks: A comprehensive review, challenges and future directions. Concurr Comput: Pract Exper 2021;33:e6197.
[3] What is a distributed denial-of-service (DDoS) attack? | Cloudflare. 2022, https://www.cloudflare.com/learning/ddos/what-is-a-ddos-attack/.
[4] Kaspersky DDoS report, Q1 2022 | Securelist. 2022, https://securelist.com/ddos-attacks-in-q1-2022/106358/.
[5] Hacktivists step back giving way to professionals: a look at DDoS in Q3 2022 | Kaspersky. 2022, https://www.kaspersky.com/about/press-releases/2022_hacktivists-step-back-giving-way-to-professionals-a-look-at-ddos-in-q3-2022.
[6] 20+ DDoS attack statistics and facts for 2018–2022. 2022, https://www.comparitech.com/blog/information-security/ddos-statistics-facts/.
[7] Choi H, Kim M, Lee G, Kim W. Unsupervised learning approach for network intrusion detection system using autoencoders. J Supercomput 2019;75(9):5597–621.
[8] Yang K, Zhang J, Xu Y, Chao J. DDoS attacks detection with autoencoder. In: Proceedings of IEEE/IFIP network operations and management symposium 2020: Management in the age of softwarization and artificial intelligence, NOMS 2020. Institute of Electrical and Electronics Engineers Inc.; 2020.
[9] Meira J, Andrade R, Praça I, Carneiro J, Bolón-Canedo V, Alonso-Betanzos A, Marreiros G. Performance evaluation of unsupervised techniques in cyber-attack anomaly detection. J Ambient Intell Humaniz Comput 2020;11(11):4477–89.
[10] Tang C, Luktarhan N, Zhao Y. An efficient intrusion detection method based on lightgbm and autoencoder. Symmetry 2020;12(9):1458.
[11] Song Y, Hyun S, Cheong YG. Analysis of autoencoders for network intrusion detection. Sensors 2021;21(13):4294.
[12] Hou Y, Fu Y, Guo J, Xu J, Liu R, Xiang X. Hybrid intrusion detection model based on a designed autoencoder. J Ambient Intell Humaniz Comput 2022;1:1–11.
[13] Aktar S, Yasin Nur A. Towards DDoS attack detection using deep learning approach. Comput Secur 2023;129:103251.
[14] Lopes IO, Zou D, Abdulqadder IH, Ruambo FA, Yuan B, Jin H. Effective network intrusion detection via representation learning: A Denoising AutoEncoder approach. Comput Commun 2022;194:55–65.
[15] Sun G, Wen Y, Li Y. Instance segmentation using semi-supervised learning for fire recognition. Heliyon 2022;8(12):e12375.
[16] Zha W, Hu L, Duan C, Li Y. Semi-supervised learning-based satellite remote sensing object detection method for power transmission towers. Energy Rep 2023;9:15–27.
[17] Menezes GK, Astolfi G, Martins JAC, Castelão Tetila E, da Silva Oliveira Junior A, Gonçalves DN, Marcato Junior J, Silva JA, Li J, Gonçalves WN, Pistori H. Pseudo-label semi-supervised learning for soybean monitoring. Smart Agric Technol 2023;4:100216.
[18] Aamir M, Ali Zaidi SM. Clustering based semi-supervised machine learning for DDoS attack classification. J King Saud Univ - Comput Inf Sci 2021;33(4):436–46.
[19] Reinforcement learning - GeeksforGeeks. 2023, https://www.geeksforgeeks.org/what-is-reinforcement-learning/.
[20] Klar M, Glatt M, Aurich JC. Performance comparison of reinforcement learning and metaheuristics for factory layout planning. CIRP J Manuf Sci Technol 2023;45:10–25.
[21] Xu X, Hu H, Liu Y, Tan J, Zhang H, Song H. Moving target defense of routing randomization with deep reinforcement learning against eavesdropping attack. Digit Commun Netw 2022;8(3):373–87.
[22] Ajao LA, Apeh ST. Secure edge computing vulnerabilities in smart cities sustainability using petri net and genetic algorithm-based reinforcement learning. Intell Syst Appl 2023;18:200216.
[23] Ko I, Chambers D, Barrett E. Adaptable feature-selecting and threshold-moving complete autoencoder for DDoS flood attack mitigation. J Inf Secur Appl 2020;55:102647.
[24] GitHub - autonomio/talos: Hyperparameter Optimization for TensorFlow, Keras and PyTorch. 2022, https://github.com/autonomio/talos.
[25] Yungaicela-Naula NM, Vargas-Rosales C, Perez-Diaz JA. SDN-based architecture for transport and application layer DDoS attack detection by using machine and deep learning. IEEE Access 2021;9:108495–512.
[26] Assis MV, Carvalho LF, Lloret J, Proença ML. A GRU deep learning system against attacks in software defined networks. J Netw Comput Appl 2021;177:102942.
[27] Amaizu GC, Nwakanma CI, Bhardwaj S, Lee JM, Kim DS. Composite and efficient DDoS attack detection framework for B5G networks. Comput Netw 2021;188:107871.
[28] Cil AE, Yildiz K, Buldu A. Detection of DDoS attacks with feed forward based deep neural network model. Expert Syst Appl 2021;169:114520.
[29] Sharafaldin I, Lashkari AH, Hakak S, Ghorbani AA. Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy. In: Proceedings - International Carnahan conference on security technology, Vol. 2019-October. Institute of Electrical and Electronics Engineers Inc.; 2019.
[30] Mittal M, Kumar K, Behal S. DDoS-AT-2022: a distributed denial of service attack dataset for evaluating DDoS defense system. Proc Indian Nat Sci Acad 2023;1–19, 2023.
[31] CORE. The CORE Emulator. 2016, http://www.nrl.navy.mil/itd/ncs/products/core.
[32] sklearn.preprocessing.MinMaxScaler - scikit-learn 1.2.0 documentation. 2022, https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html.
[33] Accuracy, Recall, Precision, F-Score & Specificity, which to optimize on? | by Salma Ghoneim | Towards Data Science. 2022, https://towardsdatascience.com/accuracy-recall-precision-f-score-specificity-which-to-optimize-on-867d3f11124.
[34] Precision and recall in machine learning - Javatpoint. 2022, https://www.javatpoint.com/precision-and-recall-in-machine-learning.
[35] Han J, Kamber M, Pei J. Data mining: Concepts and techniques. Elsevier Inc.; 2012.
[36] Bhuvaneswari Amma NG, Subramanian S. VCDeepFL: Vector convolutional deep feature learning approach for identification of known and unknown denial of service attacks. In: IEEE region 10 annual international conference, Proceedings/TENCON, Vol. 2018-October. Institute of Electrical and Electronics Engineers Inc.; 2019, p. 640–5.