
Journal of Information Security and Applications 78 (2023) 103609

Contents lists available at ScienceDirect

Journal of Information Security and Applications


journal homepage: www.elsevier.com/locate/jisa

DL-2P-DDoSADF: Deep learning-based two-phase DDoS attack detection framework

Meenakshi Mittal a,b,∗, Krishan Kumar a, Sunny Behal c
a Department of Information Technology, UIET, Panjab University, Chandigarh, India
b Department of Computer Science and Technology, Central University of Punjab, Bathinda, Punjab, India
c Department of CSE, Shaheed Bhagat Singh State University, Ferozepur, Punjab, India

ARTICLE INFO

Keywords: Deep learning; Autoencoder; Reconstruction error; Deep neural network; CICDDoS2019 dataset; DDoS-AT 2022 dataset

ABSTRACT

In today’s tech-driven world, while Internet-based applications drive social progress, their architectural weaknesses, inadequate security measures, lack of network segmentation, unsecured IoT devices etc., offer ample opportunities for attackers to launch a multitude of attacks on their services. Despite numerous security solutions, the frequent changes in the methods employed by attackers present a challenge for security systems to stay up to date. Moreover, the existing machine learning approaches are confined to known attack patterns and necessitate annotated data. This paper proposes a deep learning-based two-phase DDoS attack detection framework named DL-2P-DDoSADF. The proposed framework has been validated using the CICDDoS2019 and DDoS-AT-2022 datasets. In the first phase, Autoencoder (AE) has been trained using the legitimate traffic and threshold value has been set using Reconstruction Error (RE). The test data comprising legitimate and attack traffic has been used to validate the proposed approach efficacy. The initial phase entails utilizing a trained AE model to enable the passage of predicted legitimate traffic through the network. In contrast, the predicted attack traffic proceeds to the second phase to classify the type of attack it represents. The performance and efficacy of various deep learning approaches: Deep Neural Network (DNN), Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) are compared as part of the second phase. The autoencoder displayed an accuracy level of 99% in detecting both datasets in the initial phase. It has been observed that the DNN produced an overall accuracy of 97% and 96% for the CICDDoS2019 and DDoS-AT-2022 datasets, respectively, for multiclass classification. The DNN model performed better than LSTM and GRU models in the second phase.

1. Introduction

In the past decade, there has been a significant increase in the number of individuals who use the Internet. It is estimated that India will have as many as 900 million Internet users by 2025 [1]. However, this impressive growth comes with less secure devices and unprotected paths for cyberattacks [2]. A cyberattack is an effort made without proper authorization to access and destroy information within a computer system or disrupt the entire computer network [3]. Cybercriminals use various methods such as malware, phishing, SQL injection, and Distributed Denial of Service (DDoS) attacks, depending on the system's and network's weaknesses. The most prevalent type of cyberattack is a DDoS attack, which involves overwhelming a target or server with internet traffic to disrupt its services [3]. DDoS attacks are becoming a significant concern for the cyber world as they are easy to carry out but difficult to detect. This has made DDoS attacks a powerful tool for cybercriminals.

In February 2022, websites for the Ministry of Defense and Privat Bank in Ukraine were among those affected by DDoS attacks [4]. According to data from Q1 2022, 53.64% of these attacks were UDP floods [4]. During the third quarter of 2022, DDoS threats grew, particularly those executed by experienced hackers [5]. In September 2022, Google reported that it thwarted a massive DDoS attack that generated 46 million requests per second [6].

Therefore, based on the incidents mentioned above, it is crucial to detect DDoS attacks promptly so that administrators can take preventive measures on time.

In the age of big data and the Internet of Things (IoT), generating a high volume of unlabeled data is common [14], but labeling this data, especially network traffic, is a complex task. This results in a scarcity of labeled data for supervised machine learning (ML) [14]. Most of the current research in this field uses supervised machine learning to classify network traffic. The accuracy of machine learning models

∗ Corresponding author at: Department of Information Technology, UIET, Panjab University, Chandigarh, India.
E-mail address: [email protected] (M. Mittal).

https://doi.org/10.1016/j.jisa.2023.103609

2214-2126/© 2023 Elsevier Ltd. All rights reserved.


M. Mittal et al. Journal of Information Security and Applications 78 (2023) 103609

Table 1
Comparison of existing work.

| References | Approach used | Feature extraction | Detection or classification | Dataset used | Results |
|---|---|---|---|---|---|
| Choi et al. [7] | AE | AE | AE | NSL-KDD | Accuracy: 0.9170, F1-score: 0.9071, Specificity: 0.9815, Precision: 0.9768, Recall: 0.8468 |
| Yang et al. [8] | AE | AE | AE | Synthetic dataset (SYNT), UNB 2017, MAWI | SYNT: Detection rate: 98.32, FPR: 0.38. UNB: Detection rate: 94.10, FPR: 1.88 |
| Meira et al. [9] | AE | AE | AE | NSL-KDD and ISCX | NSL-KDD: F1 score: 63, Precision: 53, Recall: 77, AUC: 83.65. ISCX: F1 score: 69, Precision: 76, Recall: 63, AUC: 80.44 |
| Tang et al. [10] | LightGBM-AE | LightGBM | AE | NSL-KDD | Accuracy: 89.82, Recall: 90.16, Precision: 91.81, F1-score: 90.98 |
| Song et al. [11] | AE | AE | AE | NSL-KDD, IoTID20, and N-BaIoT | NSL-KDD: Accuracy: 0.887, TPR: 0.851, F1 score: 0.895, FPR: 0.066 |
| Hou et al. [12] | NAE and DNN | NAE | NAE and DNN | NSL-KDD, BoT-IoT, and N-baIoT | NSL-KDD: Accuracy: 90.03, Recall: 94.88, Precision: 88.44, F1 score: 92.21 |
| Aktar et al. [13] | AE | AE | AE | NSL-KDD, CIC-IDS2017, CIC-DDoS2019 | CIC-IDS2017: Precision: 92.46, Recall: 92.45, F1-Score: 92.45, Accuracy: 92.45, AUC: 92.45. NSL-KDD: Precision: 96.10, Recall: 96.08, F1-Score: 96.08, Accuracy: 96.08, AUC: 96.08. CICDDoS2019: Accuracy: 93.41%–97.58%. |

is high when the training and evaluation data have similar patterns. However, in real-life scenarios, attackers use new patterns that these models cannot detect precisely. Autoencoder (AE) can predict zero-day attacks if utilized as a classifier.

Previous studies have used AE either for binary classification or feature extraction (as shown in Table 1). AE is inadequate for identifying specific forms of DDoS attacks when used for categorization and incapable of discovering unknown or previously unseen attacks when used for feature extraction. Hence, there is a need for a technique that can both detect previously unknown attacks and categorize attacks into various classes.

Most of the current techniques for detecting DDoS attacks rely on datasets that are deficient in the amalgamation of divergent categories of DDoS attacks, which display a diversity of attack rates and flash traffic. Therefore, it is imperative to have a dataset that incorporates these features to achieve a more plausible outcome.

To overcome the above-mentioned limitations, we propose a Deep Learning based Two-Phase DDoS attack Detection Framework named DL-2P-DDoSADF that distinguishes itself from prior methods by combining the AE with other deep learning techniques, specifically the Deep Neural Network (DNN) technique. The performance of Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) was also assessed for detecting attacks, but it was found that the categorical effectiveness of DNN was better than that of LSTM and GRU. As a result, DNN was selected for the second phase. Thus, in the first phase, AE detects zero-day attacks, while in the subsequent phase, the DNN categorizes the type of attack.

To our knowledge, no known methods (as listed in Table 1) utilize a two-phase approach similar to our proposed approach. Our proposed method is beneficial in two ways: first, the autoencoder only needs legitimate traffic for training, making it effective for detecting new or zero-day attacks. Second, it is effective in detecting different types of DDoS attacks. The main contributions of this paper are listed as follows:

• Proposed a deep learning-based two-phase DDoS attack detection framework named DL-2P-DDoSADF:
– Autoencoder is trained using legitimate data in the first phase and performs binary classification.
– In the second phase, the attack traffic detected by AE is fed into the trained DNN model to detect different DDoS attack types. In addition, the second phase filters out the normal traffic that may have been misclassified as an attack by AE.
• The various DL models (AE, DNN, LSTM, and GRU) are optimized by tuning hyperparameters using an automated process.
• A dataset named DDoS-AT-2022 (DDoS attacks at the Application and Transport layer) has been generated as part of the work, comprising a mixture of benign traffic, flash traffic, and different types of DDoS attacks at the application and transport layers with varying attack rates.
• The proposed approach’s performance has been validated using various evaluation metrics on the datasets viz: CICDDoS2019 and DDoS-AT-2022.

The rest of the paper has been organized as follows: Section 2 discusses the DL-based DDoS attacks detection techniques found in prior research; Section 3 presents the DL-based two-phase DDoS attacks detection framework; Section 4 outlines the research methodology; Section 5 shows the results and discussions, and finally, Section 6 concludes and suggests future directions for the paper.

2. Related work

Several researchers have proposed DL-based approaches to identify DDoS attacks. In the proposed approach, the Autoencoder (AE) has been utilized as the classifier. Therefore, the existing literature on AEs as binary classifiers is listed as follows:

Choi et al. [7] have employed different variations of autoencoders such as Stacked Autoencoder (SAE), Denoising Autoencoder (DAE), and Variational Autoencoder (VAE) to identify intrusions in the NSL-KDD dataset. The basic AE had one hidden layer comprising 32 units, and it employed the ReLU activation function. The training dataset was split into three parts with normal instances making up 99%, 95%, and 90% and abnormal instances making up the remaining 1%, 5%, and 10% respectively. The results showed that the basic Autoencoder model was the most effective, achieving an accuracy of 91.70%, an F1 score of 0.9071, a specificity of 0.9815, a precision of 0.9768, and a recall of 0.8468 using a training dataset with a normal to abnormal instance ratio of 99% to 1%.

Yang et al. [8] have created an AE-based DDoS attack detection framework, named AE-D3F, which is unsupervised and straightforward to use. The model requires only normal data to be constructed. The AE model comprises an input layer, three hidden layers, and one output layer. The layer sizes are 27 neurons for the input layer, followed by 24, 16, 24, and 27 neurons for the three hidden layers and output layer, respectively. The activation function employed is leaky ReLU. For optimization, the model uses the Adam optimizer, while the mean squared error (MSE) serves as the loss function. Additionally, the batch size is configured to 32. The threshold has been calculated as 𝛥𝑟𝑒 = c


* 𝛿𝑎𝑣𝑔, where 𝛿𝑎𝑣𝑔 denotes the average reconstruction error of normal samples, while c remains a fixed value. Experiments conducted on synthetic and public datasets (SYNT, UNB 2017, MAWI) revealed that normal traffic patterns remain consistent within a specific network environment, but cannot be applied to another network. The AE-D3F model can achieve nearly 100% detection rate with less than 0.5% FPR on both known and unknown attack test sets, but it is necessary to set the RE threshold value appropriately.

Meira et al. [9] claimed that the majority of the algorithms for intrusion detection are supervised techniques that have gaps in their capability to detect unseen attacks. Therefore, the authors have used several unsupervised learning approaches to detect unseen attacks. In this work, six unsupervised algorithms have been used: ‘‘AE, Nearest Neighbor, K-Means, Isolation Forest, Support Vector Machines (SVM), and Scaled Convex Hull’’. The AE architecture is defined as having three hidden layers with 50, 5, and 50 units, respectively. The hyperbolic tangent activation function is applied, and the training is done for 20 epochs. The threshold value used is 0.002. These algorithms are assessed using the NSL-KDD and ISCX datasets. The results showed that all the techniques used are able to detect most of the anomalies with suitable performance.

Tang et al. [10] have proposed a Network Intrusion Detection System (IDS) in which the LightGBM approach is applied for feature selection and the AE is used for training and detection. The detection of intrusion is done through the proposed model using RE. The appropriate threshold was set according to the RE to distinguish between benign and attack traffic. The AE architecture consists of three hidden layers with 48, 32, and 16 neurons, respectively. Each layer uses the ReLU activation function, and the optimizer employed is Adam. The model is trained for 20 epochs with a learning rate of 0.004. The threshold value for the model is determined as the maximum of the difference between TPR and FPR. The VAE and DAE were also evaluated in addition to the AE. These are then compared with existing ML techniques like ‘‘XGBoost, Decision Tree (DT), Random Forest (RF), K-Nearest Neighbors (KNN), and GBDT’’. All these approaches were evaluated over the NSL-KDD dataset. The results showed that LightGBM-AE differentiates between benign and attack traffic better than the other approaches. The accuracy of LightGBM-AE is 89.82%, the recall is 90.16%, the precision is 91.81%, and the F1-score is 90.98%.

Song et al. [11] have optimized the structure and hyperparameter settings of the AE, the threshold value, and the latent size of the autoencoder for detection. The AE architecture utilized for the NSL-KDD dataset consists of five hidden layers, comprising both the encoder and decoder. These layers had neuron counts of 32, 16, 4, 16, and 32, respectively. The threshold metric adopted was based on the Z-score derived from the standard normal distribution. They evaluated the approach over NSL-KDD, IoTID20, and N-BaIoT datasets. The best F1-score is 0.895 obtained using the latent size of 4 over the NSL-KDD dataset.

Hou et al. [12] have proposed a DL-based hybrid detection approach. First, they designed a nonsymmetric autoencoder (NAE) to extract the patterns (or characteristics) of benign traffic and to detect anomalies. The NAE is designed to extract the latent features of the network traffic. Its structure comprises an encoder and a decoder. The encoder comprises two parts with two different convolutional neural networks (CNN); hence it extracts the hidden information of traffic from two different perspectives. The decoder has five linear layers to reconstruct the input. The authors have also proposed a scheme to extract latent features using the NAE encoder and used those features to train the DNN model for detection. DNN is trained with all (benign as well as attack) network traffic using latent features which are extracted through the NAE encoder. The proposed hybrid scheme takes the detection results from NAE and DNN models. If either of the two detects an anomaly then the traffic will be counted as the attack traffic. The proposed scheme has been evaluated over the NSL-KDD, BoT-IoT, and N-baIoT datasets. The results over the three datasets are as follows. NSL-KDD: accuracy 90.03%, recall 94.88%, precision 88.44%, F1 score 92.21%; N-baIoT: accuracy 99.51, recall 99.81, precision 99.32, F1 score 99.56; BoT-IoT: accuracy 99.80, recall 99.85, precision 99.95, F1 score 99.90. The results show good performance of the proposed hybrid detection model compared to several other cutting-edge detection methods.

Aktar et al. [13] proposed a Deep Contractive Autoencoder (DCAE) to detect DDoS attacks more effectively than traditional IDSs. The DCAE, based on deep learning, is trained using benign instances, allowing it to reconstruct benign inputs with minimal RE. However, for DDoS attacks, the RE will be high. The authors differentiate between benign and DDoS attacks by using the RE as a metric, employing the contractive loss, which includes an additional penalty term alongside the traditional reconstruction loss function of Autoencoders (AEs). The authors determined the threshold value for their model by analyzing the range of minimum and maximum RE from the training set. They evaluated their proposed model using three datasets: CIC-IDS2017, NSL-KDD, and CIC-DDoS2019, focusing exclusively on DoS/DDoS-related attacks. The model was configured with 100 epochs, a batch size of 32, and utilized the Adam optimizer with the contractive loss function. For the NSL-KDD dataset, two hidden layers with 60 and 30 neurons were used, and for CIC-IDS2017, 32 and 16 neurons were used. They evaluated the CIC-DDoS2019 dataset with seven types of traffic and achieved accuracy ranging from 93.41% to 97.58%. On the NSL-KDD and CIC-IDS2017 datasets, the model achieved accuracy rates of 96.08% and 92.45%, respectively. The proposed model outperformed other deep learning models (Basic AE, Variational AE, and LSTM AE) on all three datasets.

After conducting a comprehensive review of the literature, it has been noted that several research gaps require attention, as outlined below:

• Most existing methods use a dataset that lacks diversity, with no combination of various types of DDoS attacks and attack rates, as well as flash traffic. A more comprehensive and representative dataset is needed to achieve more accurate results.
• Few detection methods exist that aim to identify various types of DDoS attacks and their associated attack rates, as well as flash traffic.
• The distinction between legitimate traffic and various types of DDoS attacks, which have different attack rates and flash traffic, is challenging due to their shared behavioral characteristics.
• Most of the prior research that utilized AEs as classifiers (as listed in Table 1) has a high False Positive Rate (FPR) and is unable to perform multiclass classification for identifying various types of DDoS attacks.

3. Proposed approach

Several learning paradigms exist, including supervised, unsupervised, semi-supervised, and reinforcement learning. However, we have used unsupervised learning in phase-I of the proposed approach. In supervised learning, the samples come with known category labels, and the objective is to establish a relationship between the sample features and these labels [15]. Typically, having a greater number of training samples enhances the accuracy of classification and the ability of the trained model to perform well on new, unseen data [15]. However, in the current era of big data and the IoT, a vast amount of unlabeled data is being generated, making the task of labeling this data quite formidable. This is where unsupervised learning becomes valuable as it can handle such unlabeled data more effectively compared to supervised learning. Semi-supervised learning addresses the issue of inadequate labeled samples by incorporating numerous unlabeled samples along with a small set of labeled ones to train the classifier [15–18]. Consequently, these algorithms draw knowledge from both labeled


and unlabeled data, which adds complexity to the learning process. As a result, implementing and training semi-supervised learning models can become more challenging. Reinforcement learning is a self-sufficient, self-learning system that primarily learns through a process of trial and error [19]. It takes actions to maximize rewards, effectively learning by practical experience to achieve optimal results. Due to the trial-and-error nature of this learning approach, reinforcement learning algorithms may occasionally make unfavorable choices, posing challenges during training [20–22]. As a result, in Phase-I of the proposed approach, we opted to employ unsupervised learning instead of relying on supervised, semi-supervised, or reinforcement learning methods.

Fig. 1. DL-2P-DDoSADF methodology.

In this study, we present a DL-based two-phase framework for detecting DDoS attacks named DL-2P-DDoSADF. Here are the steps of the algorithm for the proposed approach for detecting DDoS attacks:

Phase I: Autoencoder (AE) for Zero-day Attack Detection

• Take normal traffic dataset and preprocess it for the AE input format.
• Train the AE using the preprocessed data to reconstruct normal network traffic.
• Set a threshold reconstruction error to classify the test data as attack or benign traffic.
• If the input traffic is classified as an attack, proceed to Phase II. Otherwise, consider it normal traffic and permit it to traverse the network.

Phase II: DNN, LSTM and GRU for Attack Type Identification

• Take the dataset and preprocess it to be compatible with the input format of DNN, LSTM, and GRU approaches.
• Train DNN, LSTM, and GRU approaches on both benign and attack traffic data to classify different types of DDoS attacks.
• Use the trained DNN, LSTM, and GRU models to test the traffic identified as an attack in Phase I. Then these models identify the type of DDoS attack.
• Select the best-performing model for multiclass classification from the DNN, LSTM, and GRU models.

A detailed description of the aforementioned algorithm is given below:

First phase:

According to [12], AE is an unsupervised deep-learning approach that can learn an efficient representation of input data via training. It comprises two parts: the encoder and the decoder. The encoder reduces the input dimensions to obtain its hidden layer representation and the decoder takes those hidden layer representations to reconstruct the input. AE is trained so that the reconstruction error (RE) between the input and the reconstructed input can be reduced. The RE is the difference between the input and the output (the reconstructed input), and it is used to create a threshold for categorization purposes [23]. In this paper, AE is trained with normal traffic (as shown in Table 2) and thus, RE for normal traffic will be low. When attack traffic is fed into the well-trained AE, the RE for it would be high. Therefore, we have set the threshold value to detect DDoS attacks. If the RE of the testing data is greater than the threshold value then it is predicted as attack traffic, otherwise benign.

In the first phase, the AE separates the traffic into two categories: benign and attack. The reasons for selecting the AE are:

• Adequate in situations where there is no attack data available for training: AE is appropriate when there is a lack of attack data for training purposes, as it can still be used effectively when only benign traffic data is available.
• Extract the features automatically: The AE is utilized for classifying traffic into benign and attack categories by extracting the features automatically. Additionally, it helps in reducing the features and these features are then used by other classifiers to categorize the traffic.
• Appropriate when labeled data is not available: The RE of features is utilized to differentiate test data into benign and attack categories. The AE is trained on the features of benign traffic and unlabeled data, making it useful when labeled data is not available.
• Appropriate when dealing with a substantial amount of data: Being a DL approach, the AE is equipped to handle a significant amount of data. With modern networks generating large amounts of data, there is a demand for detecting DDoS attacks from this substantial volume, making AE useful in this context.
• Appropriate for detecting new, anonymous, or zero-day attacks: This method is useful in detecting new, anonymous, or zero-day attacks because the AE is only trained using benign traffic, and then uses a threshold value to identify DDoS attacks. As a result, this approach has the ability to detect zero-day attacks.

As illustrated in Fig. 1, the AE operates on pre-processed data and functions as a binary classifier. Traffic that is predicted to be benign is permitted to traverse the network, while traffic that is predicted to be an attack is further processed in a second phase to identify the type of attack.
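The reconstruction-error rule and the hand-off between the two phases can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's implementation: a linear autoencoder fitted by truncated SVD stands in for the deep AE, the traffic is synthetic, the threshold is taken as the 99th percentile of the benign training RE (the paper's own threshold procedure is not reproduced here), and `classify_attack` is a hypothetical stub for the Phase II classifier.

```python
import numpy as np


def fit_linear_autoencoder(benign, latent_dim):
    """Fit a linear encoder/decoder on benign rows via truncated SVD (stand-in for a deep AE)."""
    mean = benign.mean(axis=0)
    # Rows of vt are principal directions; keep the top `latent_dim` of them.
    _, _, vt = np.linalg.svd(benign - mean, full_matrices=False)
    return mean, vt[:latent_dim]


def reconstruction_error(x, mean, components):
    """Per-sample mean squared error between the input and its reconstruction."""
    latent = (x - mean) @ components.T       # encode
    restored = latent @ components + mean    # decode
    return np.mean((x - restored) ** 2, axis=1)


def two_phase_route(x, mean, components, threshold, classify_attack):
    """Phase I flags rows with RE > threshold; Phase II labels only the flagged rows."""
    re = reconstruction_error(x, mean, components)
    flagged = re > threshold
    labels = np.zeros(len(x), dtype=int)     # 0 = benign, allowed through
    if flagged.any():
        labels[flagged] = classify_attack(x[flagged])
    return labels, re


rng = np.random.default_rng(0)
# Synthetic "benign" traffic lying close to a 2-D subspace of a 6-D feature space.
basis = rng.normal(size=(2, 6))
benign_train = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 6))
benign_test = rng.normal(size=(100, 2)) @ basis + 0.01 * rng.normal(size=(100, 6))
attack_test = 3.0 * rng.normal(size=(100, 6))  # samples far from the benign subspace

mean, components = fit_linear_autoencoder(benign_train, latent_dim=2)
# Assumed threshold rule: 99th percentile of the benign training RE.
threshold = np.percentile(reconstruction_error(benign_train, mean, components), 99)

labels, re = two_phase_route(
    np.vstack([benign_test, attack_test]), mean, components, threshold,
    classify_attack=lambda rows: np.ones(len(rows), dtype=int),  # hypothetical Phase II stub
)
```

Because the model only ever sees benign rows during fitting, anything it cannot reconstruct well is flagged, which is the property the text relies on for unseen attacks. In a full pipeline the stub would be replaced by a trained multiclass model, which may also return the benign label for rows that Phase I flagged by mistake.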


Table 2
Details of training and testing data for the CICDDoS2019 and DDoS-AT-2022 datasets.

| Dataset | Approach | Train data | Test data |
|---|---|---|---|
| CICDDoS2019 | Autoencoder | BENIGN: 101227 | Attacks: 3864747, BENIGN: 11134 |
| CICDDoS2019 | DNN, LSTM, GRU | DrDoS MSSQL: 2197825, DrDoS NetBIOS: 1981478, DrDoS UDP: 1546136, DrDoS LDAP: 1071571, Syn: 690036, UDP-lag: 165336, BENIGN: 84679, Total: 7737061 | Benign: 1586, LDAP: 374474, MSSQL: 1116712, NetBIOS: 704085, UDP: 756380, Syn: 911886, UDPLag: 268 |
| DDoS-AT-2022 | Autoencoder | Benign: 17928 | TCP-Syn flood: 19999, UDP flood: 19999, Slow read: 19999, HTTP flood: 19999, Flash traffic: 19999, Slow header: 8050, HTTP low volume low rate: 7800, Slow body: 7506, TCP Syn low: 6000, Benign: 8486 |
| DDoS-AT-2022 | DNN, LSTM, GRU | UDP flood: 50001, HTTP flood: 50001, TCP-Syn flood: 50001, Flash traffic: 50001, Slow read: 50001, Slow body: 8678, Slow header: 8099, TCP Syn low: 16159, HTTP low volume low rate: 8023, Benign: 30832 | Benign: 1209, Low-rate attack: 13745, Slow attack: 35360, Flood attack: 59769, Flash traffic: 19950 |

Second phase:

In the second phase, DNN, LSTM, and GRU models are used to classify the types of attacks. These methods have been trained using both benign and attack traffic, as described in Table 2. The hyperparameters of the DNN, LSTM, and GRU models are optimized using the Talos tool [24] from a set of parameter space boundaries. These trained models are used to test the traffic identified as an attack in Phase I and then these models identify the type of DDoS attack. Previous works [25–28] have shown that DNN, LSTM, and GRU perform well, so these approaches are chosen in this study.

4. Research methodology

The methodology followed in this paper is shown in Fig. 1. The sequential methodology of the proposed approach is explained below:

4.1. Dataset used

In this research, we employ two datasets, namely CICDDoS2019 [29] and self-generated DDoS-AT-2022 [30]. The choice of these datasets is driven by their distinct characteristics. The CICDDoS2019 dataset is being utilized as it serves as a recent benchmark and encompasses TCP/UDP based attacks. On the other hand, we include the DDoS-AT-2022 dataset due to its diversity, containing legitimate traffic, flash traffic, and various DDoS attacks occurring at both the application and transport layers. Furthermore, the dataset includes attacks with different rates, such as low, slow, and flood attacks. By using these two datasets, we aim to demonstrate the efficacy of our proposed approach across diverse types of traffic. These datasets are explained below:

4.1.1. CICDDoS2019 dataset

The CICDDoS2019 dataset contains DDoS attacks generated through TCP/UDP-based protocols, which are divided into two classes: exploitation-based and reflection-based attacks. During the training day, 12 types of DDoS attacks were generated, including NTP, WebDDoS, DNS, SYN, NetBIOS, LDAP, UDP, MSSQL, SSDP, TFTP, SNMP, and UDP-Lag. During the testing day, 7 attacks were generated: NetBIOS, UDP-Lag, LDAP, PortScan, MSSQL, UDP, and SYN [29]. The dataset has 86 features.

• Details about the training data used for the AE: To train the autoencoder, benign traffic is extracted from the 12th January data of the CICDDoS2019 dataset, and additional benign traffic is taken from 80% of the 11th March data (as shown in Table 2).
• Details about the training data used for DNN, LSTM, and GRU: To train DNN, LSTM, and GRU, seven types of traffic (‘‘BENIGN, DrDoS UDP, DrDoS MSSQL, DrDoS LDAP, DrDoS NetBIOS, Syn, UDP-lag’’) are taken from the 12th January data of the CICDDoS2019 dataset, and the training dataset created for the Autoencoder is also utilized (as shown in Table 2).
• Details about the test data: To evaluate the proposed approach, seven types of traffic (BENIGN, LDAP, MSSQL, NetBIOS, UDP, Syn, UDPLag) are taken from the 11th March data and 20% of it is utilized as the testing data (as shown in Table 2).

4.1.2. DDoS-AT-2022 dataset [30]

This section outlines the design of the testbed used to generate the DDoS-AT-2022 dataset. The architecture/design of the testbed is shown in Fig. 2.

The network traffic has been obtained from the DDoS-Testbed (Fig. 2). The testbed has 4 physical PCs split into two groups of two PCs each, with each running Ubuntu and Kali Linux operating systems. In addition, there are 3 D-Link routers, 3 Layer 2 switches, and a Layer 3 switch, along with a Linux server that functions as the web server victim. To capture data, an Intel Xeon Linux server (known as the capturing server) is connected to the Manage Switch. The Manage Switch is connected to two attack networks (2 and 3 router) and one victim network. In order to increase the virtual nodes, the CORE Emulator [31] is used and installed on three PCs. For a detailed description of the DDoS-AT-2022 dataset, refer to [30].

The DDoS-AT-2022 dataset has been used to train and assess diverse deep-learning models, as indicated in Table 2. The DDoS-AT-2022 dataset comprises various types of DDoS attacks, including UDP flood, HTTP flood, TCP-Syn flood, Flash traffic, Slow read, Slow body, Slow header, TCP Syn low, and HTTP low volume low rate, as well as benign traffic.

4.2. Preprocessing

According to the paper [28], certain features such as Flow ID, Dst IP, Src IP, Dst Port, Src Port, and Protocol were not used in the CICDDoS2019 dataset. Additionally, 11 features (‘‘Timestamp, Bwd PSH Flags, Fwd Bytes/Bulk Avg, Fwd Bulk Rate Avg, Bwd Bulk Rate Avg, Fwd URG Flags, Bwd Packet/Bulk Avg, Bwd Bytes/Bulk Avg, Fwd Packet/Bulk Avg, Bwd URG Flags, and SimillarHTTP’’) were removed from the dataset [28]. In this study, the remaining 69 features are used for training and evaluating the model [28]. These values are then normalized using the MinMaxScaler() [32]. In the labels, benign is represented as 0 and the different types of attacks are represented as ‘‘DrDoS LDAP = 1, DrDoS MSSQL = 2, DrDoS NetBIOS = 3, DrDoS UDP = 4, Syn = 5, UDP-Lag = 6’’.
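The preprocessing steps described for CICDDoS2019 (dropping identifier-like columns, min-max scaling, integer label encoding) can be sketched as follows. This is a minimal illustration under assumptions: the scaling is written out in NumPy rather than calling the scaler cited in the text, the statistics are learned on the training split only (a common convention that the text does not state), and the toy table with its column names and values is made up.

```python
import numpy as np

# Label encoding quoted in the text for CICDDoS2019 (benign = 0).
LABELS = {
    "BENIGN": 0, "DrDoS LDAP": 1, "DrDoS MSSQL": 2, "DrDoS NetBIOS": 3,
    "DrDoS UDP": 4, "Syn": 5, "UDP-Lag": 6,
}
# Identifier-like columns the text says were not used.
UNUSED = {"Flow ID", "Src IP", "Dst IP", "Src Port", "Dst Port", "Protocol"}


def keep_columns(header, unused=UNUSED):
    """Return the indices of the columns that are not in the unused set."""
    return [i for i, name in enumerate(header) if name not in unused]


def min_max_fit(train):
    """Learn the per-feature minimum and range from the training split."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return lo, span


def min_max_apply(x, lo, span):
    """Scale features with the statistics learned by min_max_fit."""
    return (x - lo) / span


def encode_labels(names, mapping=LABELS):
    """Map class names to the integer codes listed in the text."""
    return np.array([mapping[name] for name in names], dtype=int)


# Toy table for illustration only; real flows have 69 retained features.
header = ["Flow ID", "Flow Duration", "Protocol", "Tot Fwd Pkts"]
rows = np.array([[1.0, 10.0, 17.0, 5.0],
                 [2.0, 30.0, 17.0, 5.0],
                 [3.0, 20.0, 6.0, 5.0]])

cols = keep_columns(header)
features = rows[:, cols]
lo, span = min_max_fit(features)
scaled = min_max_apply(features, lo, span)
codes = encode_labels(["BENIGN", "Syn", "UDP-Lag"])
```

Reusing `lo` and `span` on the test split keeps the test data from influencing the scaling; test values outside the training range then fall outside [0, 1], which is expected.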


Fig. 2. DDoS-testbed architecture for DDoS-AT-2022 dataset [30].

The input features that are used for the DDoS-AT-2022 dataset are the same as those used in the CICDDoS2019 dataset, except for two features, inbound and Fwd Header Length.1, which are dropped. The preprocessing steps are similar to those applied to the CICDDoS2019 dataset, except for the type and number of attacks.

In this paper, the generated dataset is divided into five classes: benign traffic is labeled as 0; low-volume low-rate HTTP and TCP Syn low are grouped as low-rate attacks and labeled as 1; Slow read, Slow body, and Slow header are grouped as slow attacks and labeled as 2; UDP flood, HTTP flood, and TCP-Syn flood are grouped as flood attacks and labeled as 3; and flash traffic is labeled as 4. Thus, the multiclass classifier has five output classes: benign (0), low-rate attack (1), slow attack (2), flood attack (3), and flash traffic (4).

4.3. Hyperparameter values

In this section, we tune the hyperparameter values of the models used in the proposed framework for the CICDDoS2019 and DDoS-AT 2022 datasets. The process is explained below:

4.3.1. CICDDoS2019 dataset
In this study, the talos tool [24] is utilized to tune the hyperparameters of the models. Firstly, the models are created and their functionality is tested [24]. In the next step, hyperparameter space boundaries for AE, DNN, LSTM, and GRU are established in the parameters dictionary, as depicted in Tables 3, 4, 5, and 6. The experiment was then run using the Scan() function and the results were evaluated using Evaluate(). After experimenting with different hyperparameter values, the models with the best results are identified at specific hyperparameter values, as shown in Tables 3, 4, 5, and 6. These models were then used to predict the test data.

4.3.2. DDoS-AT-2022 dataset
The configurations of the models used in the proposed framework for the DDoS-AT 2022 dataset are explained below:

• Autoencoder: The structure of the AE model for the DDoS-AT-2022 dataset is identical to that used for the CICDDoS2019 dataset, as depicted in Table 3. The only differences are the number of epochs used and the threshold value. The number of epochs utilized is 380, and the threshold value used for testing data is 0.02133163307724133.
• DNN: The architecture of the DNN model for the DDoS-AT-2022 dataset is the same as that used for the CICDDoS2019 dataset, as depicted in Table 4. The only difference is the number of epochs, which is 70 in this case.
• LSTM: The structure of the LSTM model for the DDoS-AT-2022 dataset differs slightly from the architecture used for the CICDDoS2019 dataset, as shown in Table 5. The number of hidden layers is set to 4, with neuron sizes of 60, 50, 40, and 30, respectively. The batch size used is 1024, and the number of epochs used is 9. There are 5 neurons in the output layer. The remaining parameters are the same as those used for the CICDDoS2019 dataset.
• GRU: The design of the GRU model for the DDoS-AT-2022 dataset differs slightly from the architecture used for the CICDDoS2019 dataset, as shown in Table 6. The model uses 140 GRU units with 3 hidden layers. The number of neurons in each hidden layer is 90, 50, and 30, respectively. The batch size used is 1000. The training was done for 6 epochs, and there are 5 neurons in the output layer. The remaining parameters are the same as those used for the CICDDoS2019 dataset.

4.4. Testing using the trained model

As illustrated in Fig. 1, the trained AE model is utilized to categorize test data as benign or an attack. Traffic that is predicted as benign is


Table 3
AE hyperparameter space boundaries.
S. no. Hyperparameters Values Best value chosen
1. First neuron 128, 100, 69, 64, 84, 32 64
2. Second neuron 64, 50, 32, 16, 18, 8 32
3. Third neuron 64, 50, 32, 30, 16, 8 32
4. Fourth neuron 50, 32, 18, 16, 8 32
5. Fifth neuron 32, 16 32
6. Sixth neuron 32, 16 16
7. Seventh neuron 32, 16, 8 16
8. Eighth neuron 16, 8 8
9. Ninth neuron 16, 8 8
10. Tenth neuron 16, 8 –
11. Eleventh neuron 16, 8 –
12. Twelfth neuron 8, 6 –
13. Thirteenth neuron 8 –
14. Activation Function tanh tanh
15. Bottleneck Layer 32, 16, 8 8
16. No. of encoder and decoder layers 4, 5, 6, 7, 8, 9, 10, 11 9
17. Dropout rate 0.0, 0.2, 0.3, 0.03, 0.4, 0.04, 0.5, 0.05 0.05
18. Batch size 512 512
19. Optimizer adam adam
20. Epochs 5, 10, 20, 50, 80, 100, 120 5
21. Loss Mean squared error Mean squared error
22. Threshold value – 0.0397644828300993

Table 4
DNN hyperparameter space boundaries.
S. No. Hyperparameters Values Best value chosen
1. First hidden neuron 160, 128, 69 69
2. Remaining hidden neurons 60, 50 50
3. No. of hidden layers 5, 4 4
4. Dropout rate 0, 0.01, 0.1, 0.02, 0.2, 0.03, 0.3 0
5. Batch size 1000, 700, 512, 128 512
6. Optimizer adam adam
7. Kernel initializer uniform uniform
8. Epochs 50, 20, 10, 2 20
9. Activation function in last layer softmax softmax
10. Loss categorical crossentropy categorical crossentropy
11. No. of the neuron at the output layer 7 7

Table 5
LSTM hyperparameter space boundaries.
S. no. Hyperparameters Values Best value chosen
1. LSTM units in layer 1 80 80
2. LSTM units in layer 2 60, 50 –
3. Hidden layers 2, 3 3
4. Neuron in the first hidden layer 60, 50, 30 60
5. Neuron in the second hidden layer 40, 50, 30 40
6. Neuron in the third hidden layer 50, 30 30
7. Activation for layers relu relu
8. Batch Size 2000, 1000, 700, 512, 256, 128 2000
9. Optimizer adam adam
10. Learning rate 0.001, 0.0001 0.001
11. Epochs 1, 5, 10 5
12. No. of the neuron at the output layer 7 7
13. Activation function in last layer softmax softmax
14. Loss categorical crossentropy categorical crossentropy

permitted to pass through the network, while predicted attack traffic is further classified into the various types of DDoS attacks using the trained DNN, LSTM, and GRU models. Algorithms 1 to 3 provide a thorough description of the proposed approach using the CICDDoS2019 dataset. The process is the same for the DDoS-AT-2022 dataset, except for the number of records and classes. Algorithm 1 outlines the training approach of the AE for the CICDDoS2019 dataset. Algorithm 2 elaborates on the training strategy of the DNN, LSTM, and GRU models using the CICDDoS2019 dataset. Subsequently, Algorithm 3 describes the testing procedure of the proposed approach, DL-2P-DDoSADF, over the CICDDoS2019 dataset.
The results of this approach are described in Section 5.

4.5. Performance metrics

The paper assesses the most widely used performance metrics such as accuracy, precision, recall, F-measure (also known as F1-score), and AUC-ROC.
Accuracy: This is the proportion of correct predictions made by the model out of all predictions [33]. It is calculated as (TP+TN)/Total, where TP is true positives and TN is true negatives.
Precision: It is the proportion of correctly classified positive records among the total number of records predicted to be positive. It can be expressed as TP/(TP+FP), where FP is false positives [34].
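The metric definitions in Section 4.5 can be checked against the confusion-matrix counts reported in Table 7. The sketch below treats benign as the positive class; the function name is hypothetical, and the formulas are the standard ones given in the text. The values in Table 7 appear to be truncated rather than rounded, so the last digit may differ slightly.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts from Table 7 with benign as the positive class:
# 9548 benign flows predicted benign (TP), 1586 benign flows predicted attack (FN),
# 942 attack flows predicted benign (FP), 3 863 805 attack flows predicted attack (TN).
benign_view = binary_metrics(tp=9548, fp=942, fn=1586, tn=3863805)

for name, value in benign_view.items():
    print(f"{name}: {value:.4f}")
```

Swapping the roles of the two classes (attack as positive) gives the second half of Table 7 from the same four counts.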


Table 6
GRU hyperparameter space boundaries.
S. no. Hyperparameters Values Best value chosen
1. GRU units 70, 80 80
2. Hidden layers 2, 3 3
3. Neuron in the first hidden layer 50, 60 50
4. Neuron in the second hidden layer 50, 40 50
5. Neuron in the third hidden layer 50, 30 30
6. Activation for Layers relu relu
7. Batch Size 2000, 1000, 700, 512, 256, 128 2000
8. Optimizer adam adam
9. Learning rate 0.001, 0.0001 0.001
10. Epochs 1, 5, 10 10
11. No. of the neuron at the output layer 7 7
12. Activation function in last layer softmax softmax
13. Loss categorical crossentropy categorical crossentropy
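To give a sense of the size of the search space implied by Table 6, the sketch below enumerates every combination of the multi-valued GRU hyperparameters. It is an illustration only: it uses plain Python rather than the Talos API, the dictionary keys are hypothetical names, and the text does not state whether the full grid or only a sample of it was evaluated.

```python
from itertools import product

# Multi-valued entries of Table 6 (single-valued settings such as the
# optimizer or the loss are omitted because they do not enlarge the grid).
gru_space = {
    "gru_units": [70, 80],
    "hidden_layers": [2, 3],
    "first_hidden": [50, 60],
    "second_hidden": [50, 40],
    "third_hidden": [50, 30],
    "batch_size": [2000, 1000, 700, 512, 256, 128],
    "learning_rate": [0.001, 0.0001],
    "epochs": [1, 5, 10],
}


def expand_grid(space):
    """Return one dict per combination of hyperparameter values."""
    keys = list(space)
    return [
        dict(zip(keys, values))
        for values in product(*(space[key] for key in keys))
    ]


grid = expand_grid(gru_space)

# Best configuration reported in the last column of Table 6.
best_reported = {
    "gru_units": 80,
    "hidden_layers": 3,
    "first_hidden": 50,
    "second_hidden": 50,
    "third_hidden": 30,
    "batch_size": 2000,
    "learning_rate": 0.001,
    "epochs": 10,
}

print(len(grid), "candidate configurations")
```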

Algorithm 1 The training algorithm for the AE model (first phase) of the proposed DL-2P-DDoSADF for the CICDDoS2019 dataset
Input:
CSV files: BENIGN traffic from the 12th January of the CICDDoS2019 dataset
Output:
Trained model: AE_model

1: Read the csv files dated January 12th from the CICDDoS2019 dataset.
2: Extract BENIGN traffic from the read csv files.
3: Select the 69 features to be used for the AE.
4: Drop the records that have NaN or infinity values.
5: Obtain BENIGN ← 101227 records as the final training data.
6: Normalize the data using MinMaxScaler().
7: Split the dataset into training and validation sets with ratio 80:20.
8: Tune the hyperparameters using the Talos tool:
(1) Prepare and test the model.
(2) Set the parameter space boundaries in the params dictionary.
(3) Configure the experiment and run the hyperparameter search with Scan().
(4) Evaluate the result.
(5) Obtain the acceptable model.
9: Obtain the trained AE model from Step 8.
10: Threshold value chosen: 0.0397644828300993.
11: Save the trained AE_model.
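The way the threshold from Step 10 is applied at test time can be sketched as follows. This is a minimal illustration under two assumptions that the text does not spell out: the reconstruction error is the per-record mean squared error (matching the MSE loss in Table 3), and a record is treated as benign when its error does not exceed the threshold. The function names and toy vectors are hypothetical.

```python
# Threshold reported for the CICDDoS2019 dataset (Table 3, Algorithm 1).
THRESHOLD = 0.0397644828300993


def reconstruction_error(original, reconstructed):
    """Mean squared error between a record and its AE reconstruction."""
    squared = [(a - b) ** 2 for a, b in zip(original, reconstructed)]
    return sum(squared) / len(squared)


def is_benign(original, reconstructed, threshold=THRESHOLD):
    """Assumed decision rule: low reconstruction error means benign."""
    return reconstruction_error(original, reconstructed) <= threshold


# Toy example with three normalized features (hypothetical values).
record = [0.10, 0.50, 0.90]
good_reconstruction = [0.12, 0.48, 0.91]   # small error, treated as benign
poor_reconstruction = [0.60, 0.10, 0.30]   # large error, treated as attack

print(is_benign(record, good_reconstruction))
print(is_benign(record, poor_reconstruction))
```

Because the AE is trained on benign traffic only, attack flows are expected to reconstruct poorly, which is what makes a single error threshold usable as a binary classifier.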

True positive rate (TPR): It is also known as Recall [34]. It can be expressed as TP/(TP+FN), where FN is false negatives.

F-measure: It is also known as the F1 score. It can be represented as 2∗Recall∗Precision/(Recall+Precision) [33].

AUC-ROC curve: The AUC-ROC curve evaluates the effectiveness of classification models across different threshold values. Its formula is given below [35,36]: AUC = ((Recall - False Alarm) + 100)/200.

5. Results and discussions

5.1. Experimental environment

The experiments utilized an Ubuntu operating system (OS) (22.04.1 LTS), an Intel® Xeon® Gold 5220R CPU × 48, and 128 GiB RAM with an NVIDIA GA104GL [RTX A4000] graphics card. The deep learning models are developed using Python 3.9.7 and the TensorFlow and Keras libraries. The results of the proposed approach for both datasets are explained below:

5.2. Results of proposed DL-2P-DDoSADF over CICDDoS2019 dataset

5.2.1. Results of binary classifier (Autoencoder)
The trained AE model has been tested on the test data, as described in Table 2, using the CICDDoS 2019 dataset. The results are then presented as two confusion matrices, shown in Table 7 for benign and attack data respectively. The performance metrics shown in these tables indicate that the model is effective in detecting DDoS attacks; however, it has a lower recall value compared to other metrics for benign data. The reason for this is that the AE (phase-I of the proposed approach) has incorrectly identified some benign traffic as attack traffic. In order to improve the results, a second phase was implemented to filter out normal traffic that was misclassified by the autoencoder and to identify specific types of DDoS attacks.

Table 7
Confusion matrix and performance metrics over CICDDoS2019 dataset.
Confusion matrix and performance metrics w.r.t. benign
Actual \ Predicted    Benign (0)    Attack (1)
Benign (0)            9548          1586
Attack (1)            942           3 863 805
Performance metrics: Accuracy: 0.9993, Precision: 0.9102, Recall: 0.8575, F1-Score: 0.8830
Confusion matrix and performance metrics w.r.t. attack
Actual \ Predicted    Attack (1)    Benign (0)
Attack (1)            3 863 805     942
Benign (0)            1586          9548
Performance metrics: Accuracy: 0.9993, Precision: 0.9995, Recall: 0.9997, F1-Score: 0.9995

Table 2 demonstrates that there are a total of 3,864,747 attack flows and 11,134 benign flows in the test data. In the first phase, the AE functioned as a binary classifier and correctly identified 9548


Algorithm 2 The training algorithm for the second phase of the proposed DL-2P-DDoSADF for the CICDDoS2019 dataset
Input:
CSV files: BENIGN, DrDoS-UDP, DrDoS-MSSQL, DrDoS-LDAP, DrDoS-NetBIOS, Syn, UDP-lag from the 12th January of the CICDDoS2019 dataset and the training dataset created for the AE (in Algorithm 1)
Output:
Trained models: DNN, LSTM, and GRU models
1: Read the csv files of BENIGN, DrDoS-UDP, DrDoS-MSSQL, DrDoS-LDAP, DrDoS-NetBIOS, Syn, UDP-lag from the 12th January of the CICDDoS2019 dataset and the training dataset created for the AE (in Algorithm 1).
2: Select the 69 features to be used for the DNN, LSTM, and GRU.
3: Drop the records that have NaN or infinity values.
4: Obtain the following as the final training data:
(1) Benign ← 84679
(2) DrDoS-MSSQL ← 2197825
(3) DrDoS-NetBIOS ← 1981478
(4) DrDoS-UDP ← 1546136
(5) DrDoS-LDAP ← 1071571
(6) Syn ← 690036
(7) UDP-lag ← 165336
5: Normalize the data using MinMaxScaler().
6: Use the to_categorical function to convert the classes "BENIGN=0, DrDoS LDAP=1, DrDoS MSSQL=2, DrDoS NetBIOS=3, DrDoS UDP=4, Syn=5, UDP-Lag=6" into a binary class matrix.
7: Tune the hyperparameters using the Talos tool:
(1) Prepare and test the models (DNN, LSTM, and GRU).
(2) Set the parameter space boundaries in the params dictionary.
(3) Configure the experiment and run the hyperparameter search with Scan().
(4) Evaluate the result.
(5) Obtain the acceptable models.
8: Obtain the trained DNN, LSTM, and GRU models from Step 7.
9: Save the trained DNN, LSTM, and GRU models.
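Step 6 of Algorithm 2 turns integer class labels into a binary class matrix (one-hot encoding), which is what a softmax output layer with categorical cross-entropy loss expects. The sketch below reproduces that transformation in plain Python for illustration; it is not the Keras implementation, and the function name is hypothetical.

```python
def to_one_hot(labels, num_classes):
    """Convert integer class labels into rows of a binary class matrix."""
    matrix = []
    for label in labels:
        row = [0] * num_classes
        row[label] = 1  # a single 1 marks the class of this record
        matrix.append(row)
    return matrix


# Seven classes as in Algorithm 2: BENIGN=0 ... UDP-Lag=6.
one_hot = to_one_hot([0, 5, 6], num_classes=7)
for row in one_hot:
    print(row)
```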

Algorithm 3 The testing algorithm of the proposed DL-2P-DDoSADF for the CICDDoS2019 dataset

Input:
CSV files: BENIGN, LDAP, MSSQL, NetBIOS, UDP, Syn, UDPLag traffic from the 11th March of the CICDDoS2019 dataset
Trained models: AE, DNN, LSTM, and GRU models
Set threshold value = 0.0397644828300993
Output:
Prediction: predicted Binary and Multiclass
1: Read the csv files of BENIGN, LDAP, MSSQL, NetBIOS, UDP, Syn, UDPLag traffic from the 11th March of the CICDDoS2019 dataset.
2: Extract 20% of the data from the read csv files.
3: Select the 69 features to be used for the trained AE, DNN, LSTM, and GRU.
4: Label BENIGN = 0, DrDoS LDAP = 1, DrDoS MSSQL = 2, DrDoS NetBIOS = 3, DrDoS UDP = 4, Syn = 5, UDP-Lag = 6.
5: Drop the records that have NaN or infinity values.
6: After Step 5, keep a copy of the records in a variable named real for later use.
7: (1) Copy the target output (labels) of the records into the label variable.
(2) Copy the selected (69) features of the records into the test variable.
8: Convert all records with a target output (label) of 0 to true, and set all other records to false in the label variable.
9: Normalize the data in the test variable using MinMaxScaler().
10: Test the data of Step 9 using the trained AE model.
11: Compare the reconstruction error obtained in Step 10 with the threshold value that was set earlier, and store the resulting comparison output in the yhat variable.
12: Compute the performance metrics for binary classification by comparing the yhat and label (in Step 8) variables.
13: Retrieve the test data that was previously saved in the real variable.
14: Remove the records from the real variable that have a true value in the yhat variable.
15: (1) Copy the target output (label) of the records into the Y variable.
(2) Copy the selected (69) features of the records into the X variable.
16: (1) Normalize the data (in X) using MinMaxScaler().
(2) Use the to_categorical function on Y to convert the classes "BENIGN=0, LDAP=1, MSSQL=2, NetBIOS=3, UDP=4, Syn=5, UDPLag=6" into a binary class matrix.
17: Test the data (of Step 16.(1)) using the trained DNN, LSTM, and GRU models individually.
18: Compute the performance metrics for multiclass classification.
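The control flow of Algorithm 3 can be summarized as a two-stage filter. The sketch below is a simplified illustration, not the authors' implementation: the two callables stand in for the trained AE (with its threshold) and for one of the trained multiclass models, and the record fields `re` and `hint` are hypothetical placeholders for a reconstruction error and a class prediction.

```python
def two_phase_detect(records, is_benign, classify):
    """Phase 1 passes predicted-benign records; phase 2 labels the rest."""
    passed = []
    second_phase = []
    for record in records:
        if is_benign(record):
            passed.append(record)  # allowed through the network
        else:
            second_phase.append((record, classify(record)))
    return passed, second_phase


# Stand-ins for the trained models (assumptions for illustration only).
def toy_is_benign(record):
    return record["re"] <= 0.04


def toy_classify(record):
    return record["hint"]


toy_records = [
    {"re": 0.01, "hint": 0},  # benign, passed by phase 1
    {"re": 0.30, "hint": 5},  # attack, classified as Syn (5) in phase 2
    {"re": 0.09, "hint": 0},  # benign flagged by phase 1, recovered as class 0
]

passed, flagged = two_phase_detect(toy_records, toy_is_benign, toy_classify)
print(len(passed), [label for _, label in flagged])
```

The third toy record shows the role of the second phase described in the text: benign traffic that the AE misclassifies as an attack can still be recognized as class 0 by the multiclass model.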


Table 8
Classwise performance metrics using DNN, LSTM, and GRU over CICDDoS2019 dataset.
Class type Precision (DNN) Recall (DNN) F1-score (DNN) Precision (LSTM) Recall (LSTM) F1-score (LSTM) Precision (GRU) Recall (GRU) F1-score (GRU) Support
Benign 0.90 0.97 0.93 0.82 0.88 0.85 0.77 0.92 0.84 1586
LDAP 0.94 0.98 0.96 0.93 0.99 0.96 0.94 0.98 0.96 374 474
MSSQL 0.98 0.96 0.97 0.98 0.91 0.95 0.98 0.94 0.96 1 116 712
NetBIOS 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 704 085
UDP 0.98 0.98 0.98 0.91 0.97 0.94 0.94 0.98 0.96 756 380
Syn 1.00 0.93 0.96 1.00 1.00 1.00 1.00 0.93 0.96 911 886
UDPLag 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 268
Macro avg 0.83 0.83 0.83 0.81 0.82 0.81 0.80 0.82 0.81 3 865 391
Weighted avg 0.98 0.97 0.97 0.97 0.97 0.97 0.98 0.96 0.97 3 865 391
Accuracy 0.97 0.97 0.96 3 865 391
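The gap between the macro and weighted averages in Table 8 follows from how each is computed: the macro average gives every class equal weight, so the UDPLag score of 0.00 pulls it down, while the weighted average scales each class by its support, so the 268 UDPLag records barely matter. The sketch below recomputes both for the DNN F1-scores; the values are the rounded ones from Table 8, and the variable names are hypothetical.

```python
# (class, DNN F1-score, support) as listed in Table 8.
dnn_rows = [
    ("Benign", 0.93, 1586),
    ("LDAP", 0.96, 374474),
    ("MSSQL", 0.97, 1116712),
    ("NetBIOS", 1.00, 704085),
    ("UDP", 0.98, 756380),
    ("Syn", 0.96, 911886),
    ("UDPLag", 0.00, 268),
]

total_support = sum(support for _, _, support in dnn_rows)

# Macro average: plain mean over classes.
macro_f1 = sum(f1 for _, f1, _ in dnn_rows) / len(dnn_rows)

# Weighted average: mean over classes weighted by support.
weighted_f1 = sum(f1 * support for _, f1, support in dnn_rows) / total_support

print(total_support, round(macro_f1, 2), round(weighted_f1, 2))
```

The total support also equals 3,863,805 + 1586, the number of flows that the AE forwarded to the second phase.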

Table 9
Confusion matrix and performance metrics over DDoS-AT-2022 dataset.
Confusion matrix and performance metrics w.r.t. benign
Actual \ Predicted    Benign (0)    Attack (1)
Benign (0)            7229          1209
Attack (1)            50            128 824
Performance metrics: Accuracy: 0.9908, Precision: 0.9931, Recall: 0.8567, F1-Score: 0.9198
Confusion matrix and performance metrics w.r.t. attack
Actual \ Predicted    Attack (1)    Benign (0)
Attack (1)            128 824       50
Benign (0)            1209          7229
Performance metrics: Accuracy: 0.9908, Precision: 0.9907, Recall: 0.9996, F1-Score: 0.9951

flows out of 11,134 benign flows and 3,863,805 flows out of 3,864,747
attack flows, as shown in Table 7. The predicted benign flows, 9548
+ 942, were permitted to pass through the network while the attack
flows, 3,863,805 + 1586, were further processed in the second phase
of detection.

5.2.2. Results of multiclass classifiers (DNN, LSTM, GRU)


The DNN model is utilized to detect the type of DDoS attacks and
benign traffic from the (3 863 805 + 1586) flows that were predicted as
an attack by the AE. The predicted attack traffic also includes benign
traffic that was incorrectly labeled as an attack by the AE.
Table 8 displays the classwise evaluation metrics for the DNN
model, and Fig. 3(a) depicts the AUC-ROC curve. The outcomes demon-
strate that the DNN model is unable to predict the UDP-lag attack due
to the limited number of UDP-lag records in the test data. Furthermore,
these records are dispersed randomly throughout the test data, and
their limited number does not provide sufficient information for the
model to identify the dominant pattern [14]. The performance of the
UDP-lag as shown by the AUC-ROC curve is not adequate. This suggests
that the classifier is not accurately distinguishing between positive and
negative cases. The DNN model demonstrates impressive performance
on benign traffic despite having a limited number of records. In addi-
tion to that, it showed commendable performance for various types of
traffic, such as LDAP, MSSQL, NetBIOS, UDP, and Syn. The AUC-ROC
results were satisfactory for all of them, except for UDP-lag traffic.
Additionally, we evaluated the results using both LSTM and GRU models after conducting binary classification in phase I. The class-specific results for the LSTM and GRU models can be seen in Table 8. These models are also unable to predict the UDP-lag attack. The LSTM and GRU models exhibit inferior performance for benign traffic in comparison to the DNN model. Figs. 3(b) and 3(c) show the AUC-ROC curves for LSTM and GRU, respectively. However, when utilizing the GRU model, the results of the UDP-lag attack, as depicted by the AUC-ROC curve, are superior compared to those obtained with the DNN and LSTM models.

Fig. 3. AUC-ROC curve for DNN, LSTM, and GRU models over CICDDoS2019 dataset.

Table 8 displays the precision, recall, and F1 score comparison for the DNN, LSTM, and GRU models on the CICDDoS2019 dataset, specifically for individual DDoS attacks and benign traffic. The DNN model has shown better performance than the LSTM and GRU models in class-wise classification for most classes, as observed through higher precision, recall, and F1 scores.
Additionally, the overall accuracy for multiclass classification over the CICDDoS2019 dataset is illustrated in Fig. 4. The results indicate that the DNN and LSTM models attained a higher overall accuracy (97%) than the GRU model, which achieved an overall accuracy of 96% for multiclass classification.


Table 10
Classwise performance metrics using DNN, LSTM, and GRU model over DDoS-AT-2022 dataset.
Class type Precision (DNN) Recall (DNN) F1-score (DNN) Precision (LSTM) Recall (LSTM) F1-score (LSTM) Precision (GRU) Recall (GRU) F1-score (GRU) Support
Benign 0.99 1.00 1.00 0.39 0.73 0.50 0.95 0.89 0.92 1209
Low-rate 0.98 0.65 0.78 0.81 0.73 0.77 0.97 0.58 0.72 13 745
Slow rate 0.89 0.98 0.93 0.92 0.88 0.90 0.87 0.94 0.90 35 360
Flood 0.99 1.00 0.99 0.96 0.99 0.98 0.96 0.99 0.98 59 769
Flash traffic 0.99 1.00 1.00 0.99 1.00 0.99 0.99 1.00 0.99 19 950
Macro avg 0.97 0.93 0.94 0.81 0.87 0.83 0.95 0.88 0.90 130 033
Weighted avg 0.96 0.96 0.95 0.93 0.93 0.93 0.94 0.94 0.93 130 033
Accuracy 0.96 0.93 0.94 130 033

Table 11
Comparison of proposed approach with existing work.
References Approach used Dataset used Accuracy Precision Recall F1-score TNR FPR
Choi et al. [7] AE NSL-KDD 91.70 97.68 84.68 90.71 98.15 –
Yang et al. [8] AE Synthetic (SYNT) – – 98.32 – – 0.38
UNB 2017 – – 94.10 – – 1.88
Meira et al. [9] AE NSL-KDD – 53 77 63 – –
ISCX – 76 63 69 – –
Tang et al. [10] LightGBM-AE NSL-KDD 89.82 91.81 90.16 90.98 – –
Song et al. [11] AE NSL-KDD 88.7 – 85.1 89.5 – 0.066
Hou et al. [12] NAE and DNN NSL-KDD 90.03 88.44 94.88 92.21 – –
Aktar et al. [13] AE NSL-KDD 96.08 96.10 96.08 96.08 – –
AE CIC-IDS2017 92.45 92.46 92.45 92.45 – –
AE CIC-DDoS2019 93.41%-97.58% – – – – –
DL-2P-DDoSADF (Proposed approach) Phase-I: AE (binary) CICDDoS 2019 99.93 91.02 85.75 88.30 99.97 0.024
DL-2P-DDoSADF (Proposed approach) Phase-II: DNN (multiclass) CICDDoS 2019 97 98 97 97 – –
DL-2P-DDoSADF (Proposed approach) Phase-I: AE (binary) DDoS-AT-2022 99.08 99.31 85.67 91.98 99.96 0.038
DL-2P-DDoSADF (Proposed approach) Phase-II: DNN (multiclass) DDoS-AT-2022 96 96 96 95 – –

Fig. 4. Comparison of overall accuracy for DNN, LSTM and GRU models over CICDDoS2019 dataset.

5.3. Results of proposed DL-2P-DDoSADF over DDoS-AT-2022 dataset

The DDoS-AT-2022 dataset contains a blend of benign traffic, flash traffic, and various DDoS attacks at the transport and application layer with varying attack rates (such as low, slow, and flood). Our proposed DL-2P-DDoSADF has successfully predicted all these types of traffic. The results for the AE are presented in Table 9. The AE has a lower recall rate (85.67% in the confusion matrix w.r.t. benign), meaning that some benign traffic is wrongly classified as attack traffic. To address this limitation, we have introduced a second phase in the proposed approach to filter out the normal traffic that was mistakenly classified as attack traffic by the AE. Additionally, this phase aims to identify the types of attacks accurately. The rest of the AE results are good. The attacks predicted by the AE model are fed to the DNN, LSTM, and GRU models to classify them into five classes: benign (0), low-rate attack (1), slow attack (2), flood attack (3), and flash traffic (4). Also, these models are able to filter out benign traffic that has been misclassified as attack traffic by the AE during the first phase.
The DNN model provides the best results for multiclass classification compared to the LSTM and GRU models. The DNN model achieves the highest F1-score for every class (0, 1, 2, 3, 4), with an overall accuracy of 96% for multiclass classification. The DNN model performed worse on low-rate attacks than on other attacks, which is expected because low-rate traffic is very difficult to detect due to its volume and patterns being similar to legitimate user traffic. The LSTM model has worse results in predicting class 0 (i.e., benign traffic) because the number of records of benign traffic is very low, and these records are randomly distributed over the test data. Consequently, the limited number of benign records is insufficient to provide the model with enough information to identify dominant patterns effectively. The GRU results are comparable to those of the DNN model. The results of the DNN, LSTM, and GRU models are presented in Table 10. The F1-score of class 1 (low-rate attacks) is less than 80% in the case of the DNN model. The LSTM model performed poorly in predicting benign traffic and low-rate attacks, as the F1 scores of classes 0 and 1 are low (50% and 77%, respectively) compared to other classes. The GRU model has an F1 score of 72% for low-rate attacks. Additionally, all three models (DNN, LSTM, and GRU) performed well in identifying flash traffic. The DNN model has the highest AUC-ROC values for all classes compared to the LSTM and GRU models. The AUC-ROC curves for these models are shown in Figs. 5(a)–5(c).
Table 10 compares the precision, recall, and F1 scores of the DNN, LSTM, and GRU models on the DDoS-AT 2022 dataset, specifically for individual DDoS attacks, flash traffic, and benign traffic. The results reveal that the DNN model outperformed the LSTM and GRU models in class-wise classification, as indicated by its higher F1 scores.
Furthermore, Fig. 6 illustrates the overall accuracy for multiclass classification on the DDoS-AT 2022 dataset. The results indicate that the DNN model achieved a higher overall accuracy (96%) than both the LSTM (93%) and GRU (94%) models.


Fig. 5. AUC-ROC curve for DNN, LSTM, and GRU models over DDoS-AT-2022 dataset.

Fig. 6. Comparison of overall accuracy for DNN, LSTM and GRU models over DDoS-AT-2022 dataset.

5.4. Comparative analysis with existing work

We compared our proposed approach, DL-2P-DDoSADF, with existing methods. The basic ANN model configuration consisted of five hidden layers with 64, 32, 32, 16, and 16 neurons, respectively. The model was trained for five epochs. The evaluation results showed an accuracy of 99%, precision of 67%, recall of 64%, and an F1-score of 65%. On the other hand, the basic DNN model configuration used two hidden layers, each with 50 neurons. The Adam optimizer was employed, and it achieved 80% accuracy in a multiclass classification task. The results presented in Table 11 demonstrate that our approach outperformed existing methods. S. Aktar and A.Y. Nur [13] examined seven types of traffic, namely LDAP, UDP, MSSQL, PORTMAP, SYN, UDPLAG, and NETBIOS, from the CICDDoS 2019 dataset. Each of these seven types was individually analyzed using their approach, and the reported accuracy for these traffic types ranges from 93.41% to 97.58%. The DL-2P-DDoSADF showed good results for all performance metrics except the recall value of benign traffic for both datasets in phase I. Recall is defined as the proportion of records of a particular class that the model correctly identifies. The recall values for benign traffic for the CICDDoS2019 and DDoS-AT 2022 datasets are 85.75% and 85.67%, respectively, because the AE model misclassified some benign traffic as attack traffic. Therefore, we deployed a second phase for two reasons: first, to filter out benign traffic misclassified by the AE model, and second, to classify traffic into multiclass categories. In this research, the first phase (i.e., the AE) of the proposed approach demonstrates an accuracy of 99.93% for the CICDDoS2019 dataset and 99.08% for the DDoS-AT-2022 dataset. In the second phase, the DNN model showed the best overall multiclass classification accuracy of 97% and 96% for the CICDDoS2019 and DDoS-AT 2022 datasets, respectively.

6. Conclusion and future work

In the current age of big data and IoT, the amount of network traffic has rapidly grown, but most of it is unlabeled, and it is difficult to label such a large amount of network traffic. This results in a limited amount of labeled data for supervised learning. As network traffic continues to grow, cyberattacks, especially DDoS attacks, are becoming increasingly common and pose a serious threat to internet users. This research proposed DL-2P-DDoSADF to address this issue. The first phase uses an Autoencoder to overcome the limited labeled data challenge and acts as a binary classifier. In the second phase, a supervised DL approach detects different DDoS attacks and helps filter out the normal traffic that the AE misclassifies. The proposed DL-2P-DDoSADF has been tested using the CICDDoS2019 and DDoS-AT-2022 datasets. In the first phase, an Autoencoder model is trained on legitimate traffic, and a threshold value is set using the reconstruction error (RE). In the second phase, the performance of various deep learning algorithms (DNN, LSTM, and GRU) is compared. In the first phase, the autoencoder demonstrated an accuracy of about 99% in binary detection on both datasets. At the same time, the DNN achieved an overall accuracy of 97% and 96% for multiclass classification over the CICDDoS2019 and DDoS-AT 2022 datasets, respectively. The DNN model outperformed the LSTM and GRU models. Future work would entail the implementation of this technique in a real-time environment.

CRediT authorship contribution statement

Meenakshi Mittal: Writing – original draft, Conceptualization, Methodology, Formal analysis, Validation, Investigation, Writing – review & editing. Krishan Kumar: Conceptualization, Supervision, Writing – review & editing. Sunny Behal: Conceptualization, Supervision, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

I would like to express my gratitude to Dr. Satwinder Singh, HoD of the Department of Computer Science and Technology at the Central University of Punjab, Ghudda, Bathinda. Their support in providing resources of the Indian Council of Medical Research (ICMR) project (IRIS No/Proposal ID 2021-6329) was invaluable in evaluating Deep Learning models over the datasets.


References

[1] India to have 900 million internet users by 2025: Report | IBEF. 2022, https://www.ibef.org/news/india-to-have-900-million-internet-users-by-2025-report.
[2] Patil NV, Krishna CR, Kumar K. Distributed frameworks for detecting distributed denial of service attacks: A comprehensive review, challenges and future directions. Concurr Comput: Pract Exper 2021;33:e6197.
[3] What is a distributed denial-of-service (DDoS) attack? | Cloudflare. 2022, https://www.cloudflare.com/learning/ddos/what-is-a-ddos-attack/.
[4] Kaspersky DDoS report, Q1 2022 | Securelist. 2022, https://securelist.com/ddos-attacks-in-q1-2022/106358/.
[5] Hacktivists step back giving way to professionals: a look at DDoS in Q3 2022 | Kaspersky. 2022, https://www.kaspersky.com/about/press-releases/2022_hacktivists-step-back-giving-way-to-professionals-a-look-at-ddos-in-q3-2022.
[6] 20+ DDoS attack statistics and facts for 2018–2022. 2022, https://www.comparitech.com/blog/information-security/ddos-statistics-facts/.
[7] Choi H, Kim M, Lee G, Kim W. Unsupervised learning approach for network intrusion detection system using autoencoders. J Supercomput 2019;75(9):5597–621.
[8] Yang K, Zhang J, Xu Y, Chao J. DDoS attacks detection with autoencoder. In: Proceedings of IEEE/IFIP network operations and management symposium 2020: Management in the age of softwarization and artificial intelligence, NOMS 2020. Institute of Electrical and Electronics Engineers Inc.; 2020.
[9] Meira J, Andrade R, Praça I, Carneiro J, Bolón-Canedo V, Alonso-Betanzos A, Marreiros G. Performance evaluation of unsupervised techniques in cyber-attack anomaly detection. J Ambient Intell Humaniz Comput 2020;11(11):4477–89.
[10] Tang C, Luktarhan N, Zhao Y. An efficient intrusion detection method based on lightgbm and autoencoder. Symmetry 2020;12(9):1458.
[11] Song Y, Hyun S, Cheong YG. Analysis of autoencoders for network intrusion detection. Sensors 2021;21(13):4294.
[12] Hou Y, Fu Y, Guo J, Xu J, Liu R, Xiang X. Hybrid intrusion detection model based on a designed autoencoder. J Ambient Intell Humaniz Comput 2022;1:1–11.
[13] Aktar S, Yasin Nur A. Towards DDoS attack detection using deep learning approach. Comput Secur 2023;129:103251.
[14] Lopes IO, Zou D, Abdulqadder IH, Ruambo FA, Yuan B, Jin H. Effective network intrusion detection via representation learning: A Denoising AutoEncoder approach. Comput Commun 2022;194:55–65.
[20] Klar M, Glatt M, Aurich JC. Performance comparison of reinforcement learning and metaheuristics for factory layout planning. CIRP J Manuf Sci Technol 2023;45:10–25.
[21] Xu X, Hu H, Liu Y, Tan J, Zhang H, Song H. Moving target defense of routing randomization with deep reinforcement learning against eavesdropping attack. Digit Commun Netw 2022;8(3):373–87.
[22] Ajao LA, Apeh ST. Secure edge computing vulnerabilities in smart cities sustainability using petri net and genetic algorithm-based reinforcement learning. Intell Syst Appl 2023;18:200216.
[23] Ko I, Chambers D, Barrett E. Adaptable feature-selecting and threshold-moving complete autoencoder for DDoS flood attack mitigation. J Inf Secur Appl 2020;55:102647.
[24] GitHub - autonomio/talos: Hyperparameter Optimization for TensorFlow, Keras and PyTorch. 2022, https://github.com/autonomio/talos.
[25] Yungaicela-Naula NM, Vargas-Rosales C, Perez-Diaz JA. SDN-based architecture for transport and application layer DDoS attack detection by using machine and deep learning. IEEE Access 2021;9:108495–512.
[26] Assis MV, Carvalho LF, Lloret J, Proença ML. A GRU deep learning system against attacks in software defined networks. J Netw Comput Appl 2021;177:102942.
[27] Amaizu GC, Nwakanma CI, Bhardwaj S, Lee JM, Kim DS. Composite and efficient DDoS attack detection framework for B5G networks. Comput Netw 2021;188:107871.
[28] Cil AE, Yildiz K, Buldu A. Detection of DDoS attacks with feed forward based deep neural network model. Expert Syst Appl 2021;169:114520.
[29] Sharafaldin I, Lashkari AH, Hakak S, Ghorbani AA. Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy. In: Proceedings - International Carnahan conference on security technology, Vol. 2019-October. Institute of Electrical and Electronics Engineers Inc.; 2019.
[30] Mittal M, Kumar K, Behal S. DDoS-AT-2022: a distributed denial of service attack dataset for evaluating DDoS defense system. Proc Indian Nat Sci Acad 2023;1–19, 2023.
[31] CORE. The CORE Emulator. 2016, http://www.nrl.navy.mil/itd/ncs/products/core.
[32] sklearn.preprocessing.MinMaxScaler - scikit-learn 1.2.0 documentation. 2022, https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html.
[15] Sun G, Wen Y, Li Y. Instance segmentation using semi-supervised learning for [33] Accuracy, Recall, Precision, F-Score & Specificity, which to optimize on? | by
fire recognition. Heliyon 2022;8(12):e12375. Salma Ghoneim | Towards Data Science. 2022, https://towardsdatascience.com/
[16] Zha W, Hu L, Duan C, Li Y. Semi-supervised learning-based satellite remote accuracy-recall-precision-f-score-specificity-which-to-optimize-on-867d3f11124.
sensing object detection method for power transmission towers. Energy Rep [34] Precision and recall in machine learning - Javatpoint. 2022, https://www.
2023;9:15–27. javatpoint.com/precision-and-recall-in-machine-learning.
[17] Menezes GK, Astolfi G, Martins JAC, Castelão Tetila E, da Silva Oliveira Junior A, [35] Han J, Kamber M, Pei J. Data mining: Concepts and techniques. Elsevier Inc.;
Gonçalves DN, Marcato Junior J, Silva JA, Li J, Gonçalves WN, Pistori H. Pseudo- 2012.
label semi-supervised learning for soybean monitoring. Smart Agric Technol [36] Bhuvaneswari Amma NG, Subramanian S. VCDeepFL: Vector convolutional deep
2023;4:100216. feature learning approach for identification of known and unknown denial
[18] Aamir M, Ali Zaidi SM. Clustering based semi-supervised machine learning for
of service attacks. In: IEEE region 10 annual international conference, Pro-
DDoS attack classification. J King Saud Univ - Comput Inf Sci 2021;33(4):436–46.
ceedings/TENCON, Vol. 2018-October. Institute of Electrical and Electronics
[19] Reinforcement learning - GeeksforGeeks. 2023, https://www.geeksforgeeks.org/
Engineers Inc.; 2019, p. 640–5.
what-is-reinforcement-learning/.
