Enhancing Cybersecurity: Machine Learning Approaches for Predicting DDoS Attack
Department of Computer Science, American International University-Bangladesh. Dhaka, Bangladesh.
KEYWORDS
Attack, Cyber Security, DDoS, Machine Learning, CIC-DDoS2019 Dataset, Security

ARTICLE HISTORY
Received 3 April 2024
Received in revised form 14 June 2024
Accepted 22 June 2024
Available online 4 July 2024

ABSTRACT
Network security has always been challenging, particularly with regard to the detection and prevention of Distributed Denial of Service (DDoS) attacks. DDoS attacks threaten a network by denying its availability to the legitimate users who need a particular server. In this type of cyber-attack, a network is flooded with a huge amount of traffic that overwhelms the system and makes it unavailable. The attack focuses on making the service unavailable to rightful users, without breaching the security perimeter. In a DDoS attack, a master computer compromises a network of vulnerable computers and uses these captured zombie computers to send a huge quantity of packets to a server. Researchers have suggested various machine learning (ML) algorithms to detect such attacks. To study and analyse DDoS attacks, including how often they hit a server, their possible patterns, and their types, researchers have used the CIC-DDoS2019 dataset, which is utilized to train and evaluate ML models for detecting DDoS attacks. In this paper, the primary objective is to propose a refined version of the DDoS dataset for investigation and to evaluate the performance of various state-of-the-art classifiers, namely Gaussian Naïve Bayes (GNB), Bernoulli Naïve Bayes (BNB), Random Forest (RF), ID3 Decision Tree (ID3 DT), Logistic Regression (LR), K-Nearest Neighbors (KNN), AdaBoost, CART, and Bagging Classifier, in detecting DDoS attacks accurately. In addition, the experiments showed that DDoS attacks can be identified even more accurately if the attacks are labelled in a binary way rather than categorized into the 13 different types present in the dataset.
This type of attack can cause significant financial and resource damage to a company. DDoS attacks have gradually become common at present, when so much infrastructure and so many services rely on the internet.

Some examples of DDoS attacks are DNS, NTP, SSDP, LDAP, SNMP, TFTP, MSSQL, Portmap, SYN, NetBIOS, UDP, and UDP-Lag. These attacks are launched at several network layers. The attacker can use either a single computer or many computers as bots to launch an attack at the network layer. More than one attack can also hit a server at the same time. When the attacker uses multiple computers, these computers are called bots or zombies.

Fig. 3. DDoS Attack on Different Network Layer

2. LITERATURE REVIEW
This part covers a brief analysis of DDoS attacks and the present research in this area.

DDoS attacks are intended [10] to make the server unable to provide its general services, resulting in a network failure. This is frustrating for users who rely on these services, but some measures can be taken to prevent and mitigate these attacks. Direct DDoS and Reflection-based DDoS are two common types of DDoS attacks [11, 12]. A DDoS attack is launched to make network and system resources unavailable to legitimate users [13].

M. Hariharan et al. [14] aimed to identify DDoS attacks by utilizing the C5.0 ML algorithm. The objective was to compare the results with those obtained by other classifiers such as NB and C4.5 DT. The study helps determine the effectiveness of the C5.0 algorithm in identifying DDoS attacks and how it compares to other commonly used classifiers.
K. Narasimha Mallikarjunan et al. [15] created a real-time dataset and used a Naive Bayes technique for identification, evaluating its accuracy against other methods such as RF and J48.

Iman Sharafaldin et al. [16] thoroughly analyzed the existing datasets and proposed a taxonomy for DDoS attacks. They then presented the CICDDoS2019 dataset, which includes 11 DDoS attacks and remedies all current shortcomings. They also recommended a detection and classification approach based on a set of network flow features, with a prediction percentage of 78% for ID3, 77% for RF, 41% for NB, and 25% for LR.

Sagar Dhanraj Pande et al. [17] classified normal and attack samples using the RF algorithm. The classification accuracy was 99.76%, which indicates that the algorithm was highly effective in distinguishing between the two types of samples.

Fig. 2. DDoS Attack
Akinul Islam Jony et al./ Malaysian Journal of Science and Advanced Technology 251
Kimmi Kumari and M. Mrunalini [18] used the CAIDA 2007 dataset for exploratory research. The Naïve Bayes (NB) and LR algorithms were implemented using the Weka data mining platform. The results of this study were carefully analyzed and compared in order to draw valid conclusions, with a percentage of 99% for both algorithms.

Marwane Zekri et al. [19] implemented a highly effective DDoS detection system that utilizes algorithms such as Naive Bayes, C4.5, and K-Means to reduce the DDoS threat. This system also incorporates signature detection techniques, which enable it to generate a decision tree (DT) for automatic and accurate identification of signature attacks for DDoS flooding attacks.

Raniyah Wazirali and Rami Ahmad [20] evaluated the use of ML approaches on WSN node traffic and how they impact the overall WSN network lifetime. The authors thoroughly analyzed the performance of different ML classifiers such as KNN, LR, SVM (Support Vector Machine), DT, NB, Gboost, LSTM (Long Short-Term Memory) (e.g., [24]), and MLP (Multi-Layer Perceptron) on WSN datasets of different sizes. To accurately assess the effectiveness of these algorithms, the authors used various performance metrics such as Accuracy, F1-score, FPR (False Positive Rate), FNR (False Negative Rate), and training execution time.

Salim Salmi and Lahcen Oughdir [21] demonstrated several deep learning (DL) algorithms, namely DNN, CNN, RNN, and CNN+RNN based IDS, trained on the WSN-DS dataset, for identifying 4 types of DoS attacks (Blackhole, Grayhole, Flooding, and Scheduling) that affect WSNs.

Mohamed Idhammad et al. [10] present an innovative approach for detecting DDoS attacks online using network Entropy estimation, Co-clustering, Information Gain Ratio, and the Extra-Trees technique. They used three public datasets, NSL-KDD, UNB ISCX 12, and UNSW-NB15, and achieved predictive accuracies of 98.23%, 99.88%, and 93.71%, respectively. Moreover, the false positive rates were minimal, with values of 0.33%, 0.35%, and 0.46%, respectively.

Rami J. Alzahrani and Ahmed Alzahrani [22] implemented six ML algorithms, namely NB, KNN, DT, SVM, RF, and LR, using WEKA tools to identify DDoS attacks from the same CICDDoS2019 dataset. The best accuracy was obtained by the DT and RF methods, with 99% accuracy.

The datasets used in the previous research were unable to execute and capture all the DDoS attacks presented in the CIC-DDoS2019 dataset, and so do not contain enough data to match the credibility of that dataset. The dataset created in this paper was used to obtain accuracy through various ML algorithms, but only random data from every attack was chosen to create an idealized dataset. This research study highlights the difference in accuracy when the target variables are modified. This paper also reports the accuracy, recall, precision, and F1 score of various ML techniques using the CIC-DDoS2019 dataset.

3. METHODS AND MATERIALS

Fig. 4. Working Procedure

3.1 Dataset
The dataset is about DDoS attacks [23]. DDoS attacks are a significant threat to network security. The attacks overwhelm the target network with malicious traffic, making it inaccessible to users. The dataset used in this study was obtained from the Canadian Institute for Cybersecurity [23]. The dataset has 87 feature columns and 1 target column, totaling 88 columns. There are 14 different categorical values in the target column: NTP, DNS, LDAP, MSSQL, NetBIOS, SNMP, SSDP, UDP, UDP-Lag, WebDDoS, SYN, TFTP, Portmap, and BENIGN (which represents non-attack traffic). The Canadian Institute for Cybersecurity has created two datasets, one for training and the other for testing [16]. The training dataset was captured on January 12th, starting at 10:30 AM and ending at 5:15 PM. This dataset includes 12 types of DDoS attacks: NTP, DNS, LDAP, MSSQL, NetBIOS, SNMP, SSDP, UDP, UDP-Lag, WebDDoS, SYN, and TFTP. The testing dataset was captured on March 11th, starting at 9:40 AM and ending at 5:35 PM. This dataset includes 7 types of attacks: Portmap, NetBIOS, LDAP, MSSQL, UDP, UDP-Lag, and SYN.

3.2 Dataset Preprocessing
To propose the modified version of the dataset, both the training and testing datasets were used. From the training dataset, 50,000 records were randomly selected for each type of attack (NTP, DNS, LDAP, MSSQL, NetBIOS, SNMP, SSDP, UDP, UDP-Lag, SYN, and TFTP), and all BENIGN data were included. Additionally, 50,000 Portmap attack records were randomly collected from the testing dataset, along with all of its benign data. WebDDoS data was removed from the dataset since very few records were available for
WebDDoS. After merging all datasets, a new dataset was created that contained 661597 records. The new dataset has 87 feature columns and 1 target column with 13 target variables. After removing the object datatype columns during data cleaning, the dataset was left with a total of 82 columns. The infinity values were then replaced with NaN, and the corresponding rows were removed from the dataset. The final dataset contains 638455 records. The target values in the final dataset were renamed from their original names (DrDoS_DNS, DrDoS_LDAP, etc.) to more concise names (DNS, LDAP, etc.).

3.3.2 Random Forest (RF)
RF is a well-known [17] ML technique, established by Leo Breiman, that is commonly used for classification. RF combines [16] the concepts of DT and ensemble learning. It uses a forest of several DTs, each using randomly selected attributes as its input. To prevent the DTs [25] from being identical, in an RF a subset of characteristics is randomly selected for each node. The rest of the parameters are then used for the DT within the forest.

$$P(Y = 1 \mid X) = \frac{1}{1 + e^{-(b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n)}} \tag{2}$$
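Equation (2) can be evaluated numerically with a short sketch. The function name, the coefficient values, the feature values, and the 0.5 decision threshold below are illustrative assumptions; none of them are taken from the paper or the dataset.

```python
import math


def logistic_probability(coefficients, features):
    """Return P(Y = 1 | X) as written in Eq. (2).

    coefficients: [b0, b1, ..., bn], where b0 is the intercept.
    features:     [X1, ..., Xn].
    """
    if len(coefficients) != len(features) + 1:
        raise ValueError("expected one intercept plus one coefficient per feature")
    # Linear term b0 + b1*X1 + ... + bn*Xn
    z = coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], features))
    # Evaluate 1 / (1 + e^(-z)) without overflowing for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients and a hypothetical two-feature record.
b = [-1.0, 0.5, 2.0]
x = [2.0, 0.0]
p = logistic_probability(b, x)   # z = -1 + 0.5*2 + 2*0 = 0, so p = 0.5
label = 1 if p >= 0.5 else 0     # the 0.5 threshold is an assumption
```

A record is assigned to the positive class when the probability reaches the chosen threshold; shifting that threshold trades false positives against false negatives.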
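The preprocessing steps of Section 3.2 can be sketched on a toy table as follows. This is a minimal standard-library sketch under stated assumptions, not the authors' code: the helper names, the `Label` and `is_attack` keys, the toy rows, and the fixed random seed are hypothetical, and only the `DrDoS_` prefix and the `BENIGN` value come from the text. The last step derives a binary target of the kind the abstract contrasts with the multi-class target.

```python
import math
import random
from collections import defaultdict


def _is_number(value):
    """True for int/float values (bool excluded); inf and NaN still count."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def preprocess(rows, label_key="Label", per_class=50_000, seed=0):
    """Apply the Section 3.2 steps to a list of dict records (hypothetical helper)."""
    rng = random.Random(seed)

    # 1. Randomly keep at most `per_class` records per attack type; keep all BENIGN.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    sampled = []
    for label, group in by_label.items():
        if label != "BENIGN" and len(group) > per_class:
            group = rng.sample(group, per_class)
        sampled.extend(group)
    if not sampled:
        return []

    # 2. Drop object-typed (non-numeric) feature columns.
    feature_keys = [
        key for key in sampled[0]
        if key != label_key and all(_is_number(r[key]) for r in sampled)
    ]

    # 3. Remove rows holding infinity or NaN in any remaining feature.
    cleaned = [
        r for r in sampled
        if all(math.isfinite(r[key]) for key in feature_keys)
    ]

    # 4. Shorten label names (DrDoS_DNS -> DNS) and add a binary target.
    prefix = "DrDoS_"
    result = []
    for r in cleaned:
        label = r[label_key]
        if label.startswith(prefix):
            label = label[len(prefix):]
        out = {key: r[key] for key in feature_keys}
        out[label_key] = label
        out["is_attack"] = 0 if label == "BENIGN" else 1
        result.append(out)
    return result


# Toy records; the column names and values are illustrative only.
rows = [
    {"Flow Duration": 10, "Flow Bytes/s": 5.0, "Flow ID": "a-1", "Label": "DrDoS_DNS"},
    {"Flow Duration": 12, "Flow Bytes/s": float("inf"), "Flow ID": "a-2", "Label": "DrDoS_DNS"},
    {"Flow Duration": 7, "Flow Bytes/s": 2.5, "Flow ID": "b-1", "Label": "DrDoS_LDAP"},
    {"Flow Duration": 3, "Flow Bytes/s": 1.0, "Flow ID": "c-1", "Label": "BENIGN"},
]
clean = preprocess(rows, per_class=2)
```

In this toy run the string-valued `Flow ID` column is dropped, the row containing infinity is removed, and the remaining labels are shortened. On the real data a dataframe library would be the more usual choice; plain Python is used here only to keep the sketch dependency-free.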
that the proposed way of work is not only feasible but also delivers superior performance as compared to various recent and relevant approaches that have been documented in the literature [29].

This model still operates more as a damage control system [14] than as a prevention system. It seems that detection only occurs after the damage has already been done. Also, the predictions are made only by the ML algorithm. This model does not showcase a new way of detecting DDoS attacks.

In the future, a thorough analysis is expected to be conducted using DL algorithms. This research hopes to find improved ways to stop cyber-attacks more effectively, as well as new techniques and efficient algorithms for stopping DDoS attacks. This project aims to investigate the real-time execution and verification of the method to address the problem at hand [30].

REFERENCES
[1] S. Chakraborty, P. Kumar, and B. Sinha, "A study on DDoS attacks, danger and its prevention," Int. J. Res. Anal. Rev., vol. 6, no. 2, pp. 10-15, 2019.
[2] K. H. Zaboon and A. A. Abdullah, "A Review of the Common DDoS Attack: Types and Protection Approaches Based on Artificial Intelligence," Fusion: Practice and Applications, vol. 7, no. 1, pp. 08-08, Dec. 2021.
[3] L. E. Jaramillo, "Malware detection and mitigation techniques: Lessons learned from Mirai DDOS attack," Journal of Information Systems Engineering & Management, vol. 3, no. 3, pp. 19, Jul. 16, 2018.
[4] A. I. Jony and S. A. Hamim, "Navigating the Cyber Threat Landscape: A Comprehensive Analysis of Attacks and Security in the Digital Age," Journal of Information Technology and Cyber Security, vol. 1, no. 2, pp. 53-67, 2023.
[5] I. V. Kotenko and A. V. Ulanov, "Agent-based simulation of DDoS attacks and defense mechanisms," Journal of Computing, vol. 4, no. 2, pp. 16-37, 2005.
[6] Q. Yan, F. R. Yu, Q. Gong, and J. Li, "Software-defined networking (SDN) and distributed denial of service (DDoS) attacks in cloud computing environments: A survey, some research issues, and challenges," IEEE communications surveys & tutorials, vol. 18, no. 1, pp. 602-622, Oct. 5, 2015.
[7] Cisco, "Annual Internet Report (2018–2023) White Paper," Accessed June 11, 2020. [Online]. Available: https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html.
[8] A. I. Jony and A. K. B. Arnob, "Securing the Internet of Things - Evaluating Machine Learning Algorithms for Detecting IoT Cyberattacks using CIC-IoT2023 Dataset," International Journal of Information Technology and Computer Science, 2024. (In Press).
[9] S. S. Shanto, Z. Ahmed, and A. I. Jony, "Mining User Opinions: A Balanced Bangla Sentiment Analysis Dataset for E-Commerce," Malaysian Journal of Science and Advanced Technology, vol. 3, no. 4, pp. 272-279, 2023.
[10] Z. Chao-Yang, "DOS attack analysis and study of new measures to prevent," in 2011 International Conference on Intelligence Science and Information Engineering, IEEE, Aug. 2011, pp. 426-429.
[11] M. Idhammad, K. Afdel, and M. Belouch, "Semi-supervised machine learning approach for DDoS detection," Applied Intelligence, vol. 48, pp. 3193-3208, Oct. 2018.
[12] D. Tang and X. Kuang, "Distributed denial of service attacks and defense mechanisms," in IOP Conference Series: Materials Science and Engineering, vol. 612, no. 5, p. 052046, Oct. 2019.
[13] N. Tripathi, "DoS and DDoS Attacks: Impact, Analysis and Countermeasures."
[14] M. Hariharan, H.K. Abhishek, and B.G. Prasad, "DDoS attack detection using C5.0 machine learning algorithm," IJ Wireless and Microwave Technologies, vol. 1, pp. 52-59, 2019.
[15] K. Narasimha Mallikarjunan, A. Bhuvaneshwaran, K. Sundarakantham, and S. Mercy Shalinie, "DDAM: detecting DDoS attacks using machine learning approach," in Computational Intelligence: Theories, Applications and Future Directions-Volume I: ICCI-2017, pp. 261-273, Singapore, Aug. 2018.
[16] I. Sharafaldin, A.H. Lashkari, S. Hakak, and A.A. Ghorbani, "Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy," in 2019 International Carnahan Conference on Security Technology (ICCST), pp. 1-8, Oct. 2019.
[17] S. Pande, A. Khamparia, D. Gupta, and D.N. Thanh, "DDOS detection using machine learning technique," in Recent Studies on Computational Intelligence: Doctoral Symposium on Computational Intelligence (DoSCI 2020), pp. 59-68, Springer Singapore, 2021.
[18] K. Kumari and M. Mrunalini, "Detecting Denial of Service attacks using machine learning algorithms," Journal of Big Data, vol. 9, no. 1, pp. 1-7, Dec. 2022.
[19] M. Zekri, S. El Kafhali, N. Aboutabit, and Y. Saadi, "DDoS attack detection using machine learning techniques in cloud computing environments," in 2017 3rd international conference of cloud computing technologies and applications (CloudTech), pp. 1-7, Oct. 2017.
[20] R. Wazirali and R. Ahmad, "Machine Learning Approaches to Detect DoS and Their Effect on WSNs Lifetime," Computers, Materials & Continua, vol. 70, no. 3, Mar. 2022.
[21] S. Salmi and L. Oughdir, "Performance evaluation of deep learning techniques for DoS attacks detection in wireless sensor network," Journal of Big Data, vol. 10, no. 1, pp. 1-25, Dec. 2023.
[22] R. J. Alzahrani and A. Alzahrani, "Security analysis of DDoS attacks using machine learning algorithms in networks traffic," Electronics, vol. 10, no. 23, p. 2919, Nov. 25, 2021.
[23] University of New Brunswick, "Canadian Institute for Cybersecurity DDoS Attack Dataset (2019)," [Online]. Available: https://www.unb.ca/cic/datasets/ddos-2019.html.
[24] A. I. Jony and A. K. B. Arnob, "A long short-term memory based approach for detecting cyber attacks in IoT using CIC-IoT2023 dataset," Journal of Edge Computing, vol. 3, no. 1, pp. 28-42, 2024. Available from: https://doi.org/10.55056/jec.648.
[25] X. D. Hoang and Q. C. Nguyen, "Botnet detection based on machine learning techniques using DNS query data," Future Internet, vol. 10, no. 5, p. 43, May 18, 2018.
[26] T. H. Kim, D. C. Park, D. M. Woo, T. Jeong, and S. Y. Min, "Multi-class classifier-based AdaBoost algorithm," in Intelligent Science and Intelligent Data Engineering: Second Sino-foreign-interchange Workshop, IScIDE 2011, Xi'an, China, October 23-25, 2011, Revised Selected Papers 2 2012, pp. 122-127.
[27] S. Bashir, U. Qamar, F. H. Khan, and M. Y. Javed, "An efficient rule-based classification of Diabetes using ID3, C4.5, & CART ensembles," in 2014 12th International Conference on Frontiers of Information Technology, Dec. 17, 2014, pp. 226-231.
[28] S. Sikkanan and M. Kasthuri, "Denial-of-service and botnet analysis, detection, and mitigation," in Research Anthology on Combating Denial-of-Service Attacks, 2021, pp. 20-48.
[29] F. S. Lima Filho, F. A. Silveira, A. de Medeiros Brito Junior, G. Vargas-Solar, and L. F. Silveira, "Smart detection: an online approach for DoS/DDoS attack detection using machine learning," Security and Communication Networks, vol. 2019, pp. 1-5, Oct. 13, 2019.
[30] C. Kemp, C. Calvert, T. M. Khoshgoftaar, and J. L. Leevy, "An approach to application-layer DoS detection," Journal of Big Data, vol. 10, no. 1, p. 22, Feb. 13, 2023.