Final Thesis 29.01.2025
Introduction
1.1 Overview
Distributed Denial-of-Service (DDoS) attacks are one of the most prevalent and
destructive cybersecurity threats in existence today. These attacks operate by
flooding a target system with traffic. This deluge frequently originates from a botnet,
which is a collection of infected devices under remote attacker control [2]. By
creating an unusual spike in requests, DDoS attacks make it impossible for the target
system to serve real users. The attacks primarily aim to exhaust important resources
like memory, CPU power, and network bandwidth, which can result in serious service
disruptions, monetary losses, and reputational damage to a company. The effects of
these attacks can be severe and pervasive, regardless of whether they are motivated
by extortion, economic sabotage, or political goals [4][2].
A standard TCP connection uses a three-way handshake to make sure that
both the client and server are prepared to communicate [2]. This process starts when
the client sends a connection request, called a SYN packet, to the server. The server
acknowledges this request with a SYN-ACK packet, and the client completes the
handshake by sending back an acknowledgment, called an ACK packet [6]. However,
DDoS attackers take advantage of this mechanism to disrupt normal operations. A
typical example is the TCP-SYN flood attack, in which the attacker overloads the
server with a massive number of SYN packets without completing the handshake.
These incomplete connections drain server resources [8] such as memory and
processing power, as the system must maintain these half-open connections until they
time out. With the server's resources fully utilized, legitimate clients are
unable to establish a connection, leading to denial of service [3][9].
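The resource exhaustion described above can be illustrated with a small simulation. The following is a minimal sketch, not a model of any real TCP stack: the class name `SynBacklog`, the queue capacity, and the timeout are all assumptions chosen for illustration. It shows how a queue of half-open connections fills up when SYN packets are never followed by an ACK, and how it drains again once the entries time out.

```python
from collections import OrderedDict


class SynBacklog:
    """Toy model of a server's queue of half-open TCP connections (hypothetical)."""

    def __init__(self, capacity, timeout):
        self.capacity = capacity        # maximum number of half-open entries
        self.timeout = timeout          # seconds before a half-open entry expires
        self.half_open = OrderedDict()  # (ip, port) -> arrival time, oldest first

    def _expire(self, now):
        # Drop entries whose handshake was not completed within the timeout.
        while self.half_open:
            _, arrived = next(iter(self.half_open.items()))
            if now - arrived < self.timeout:
                break
            self.half_open.popitem(last=False)

    def on_syn(self, client, now):
        """Return True if the SYN is accepted (a SYN-ACK would be sent)."""
        self._expire(now)
        if client in self.half_open:
            return True                 # retransmitted SYN, entry already held
        if len(self.half_open) >= self.capacity:
            return False                # backlog full: the request is dropped
        self.half_open[client] = now
        return True

    def on_ack(self, client, now):
        """Return True if the ACK completes a pending handshake."""
        self._expire(now)
        return self.half_open.pop(client, None) is not None


def flood(backlog, count, now):
    """Send `count` SYNs from distinct spoofed sources; return how many were accepted."""
    return sum(backlog.on_syn(("10.0.0.1", 1024 + i), now) for i in range(count))
```

With a capacity of 128 and a 30-second timeout, calling `flood(backlog, 1000, now=0.0)` fills the queue after the first 128 requests, so a legitimate SYN arriving one second later is refused, while the same SYN arriving after the timeout is accepted again.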
The rapid growth of interconnected devices, including Internet of Things (IoT)
systems and cloud infrastructure, has greatly expanded the potential for DDoS threats
[44]. As these technologies become more widespread, attackers can tap into an
increasing number of vulnerable devices to launch large-scale attacks. To make
matters worse, modern DDoS attacks have evolved to be more sophisticated, using
adaptive strategies like multi-vector attacks that hit multiple layers of a system at once
[7]. Traditional defensive measures, such as firewalls, intrusion detection systems and
rate-limiting techniques, often fall short against these advancing threats. This
highlights the pressing need for more advanced and dynamic defense mechanisms [5].
[Figure: labels Users, C&C, Bots, Victim Server]
and lessens the impact on legitimate users [23]. As cyber threats keep evolving,
incorporating machine learning into DDoS defense systems is a vital step in ensuring
the resilience of modern digital infrastructures [23][1].
Figure 1.2: Global Top Daily DDoS Attacks on the Digital Attack Map
bypassing these defenses by employing advanced and adaptive tactics, such as
multi-vector attacks or imitating legitimate user behavior [82].
Machine learning (ML) has emerged as a promising solution to tackle these issues.
Unlike traditional methods, ML algorithms can process large volumes of real-time
network traffic data to identify patterns and detect subtle anomalies that might signal a
DDoS attack. The benefits of using machine learning for DDoS detection include:
1. Anomaly Detection in Network Traffic: Machine learning models are
capable of identifying unusual patterns in network traffic, such as sudden
increases in request rates or atypical geographic traffic distributions, which
often signal potential attacks [79][76].
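The idea of flagging a sudden increase in request rate can be sketched with a simple statistical baseline. This is an illustrative stand-in rather than a trained model, and it is not the method evaluated in this study: the function name `rate_anomalies`, the window length, and the threshold are assumptions. Each per-interval request count is compared with the mean and standard deviation of the preceding window.

```python
from statistics import mean, pstdev


def rate_anomalies(counts, window=5, threshold=3.0):
    """Return indices whose request count is far above the preceding window.

    counts: requests observed per time interval (hypothetical input format).
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        # Guard against a flat baseline, where sigma would be zero.
        z = (counts[i] - mu) / max(sigma, 1.0)
        if z > threshold:
            flagged.append(i)
    return flagged
```

A fixed threshold of this kind cannot distinguish a flash crowd from an attack, which is one reason learned models that combine several traffic features are of interest.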
This research aims to investigate how effective machine learning techniques are in
identifying and mitigating certain types of DDoS attacks. The study concentrates on
three prevalent attack vectors:
1. UDP Flood Attacks: These attacks inundate the target system with a massive
influx of User Datagram Protocol (UDP) packets, which drains bandwidth and
resources [56] [51].
2. TCP_SYN Flood Attacks: By taking advantage of the TCP three-way
handshake, these attacks send a multitude of synchronization (SYN) requests
without finalizing the connection, thereby consuming server resources [54][46].
3. ICMP Flood Attacks: Commonly referred to as ping floods, these attacks
generate a large number of Internet Control Message Protocol (ICMP) requests,
which can overwhelm the network infrastructure and diminish its ability to
process legitimate requests [41] [57].
The results of this study are intended to aid in the creation of more robust, adaptive,
and efficient DDoS detection systems that utilize machine learning. Such progress is
essential for protecting modern digital infrastructures, ensuring service availability,
and preserving user trust amid the ever-changing landscape of cyber threats [31] [78].
1.3 Background
This study examines Distributed Denial-of-Service attacks in the context of network
security. It discusses the use of machine learning to detect DDoS attacks and reviews
current defense methods in cybersecurity. It also lists and explains the
network security terms that are used in the next chapter.
1.3.1.1 Network Security Terminology
In the realm of security, certain terms carry specific meanings, while others may lack
clear definitions. Key vocabulary associated with computer security includes:
1. Vulnerability: A defect or shortcoming in a system's design, architecture,
execution, operation, or upkeep [75]. Because every system has vulnerabilities,
strong defenses must be in place to deal with the related risks.
2. Threat: An actor or circumstance with the means and intent to take advantage of a
weakness. A company's reputation and finances may suffer if a threat materializes into an
attack, so it is important to handle threats carefully [73] [71].
3. Attack: The use or exploitation of a weakness. The act itself is neither
good nor bad: both adversaries and well-meaning people looking for answers can
attack a system [68].
4. Attacker: The entity initiating an attack, synonymous with a threat. Attackers
exploit system vulnerabilities using appropriate tools and techniques.
5. Exploit: A technique or piece of code that takes advantage of a weakness in an
attack. While not every vulnerability has an exploit, a single vulnerability can lead to
several attacks [16].
6. Target: The individual, business, or system that the exploit directly targets
and affects. Certain exploits have both primary and secondary targets
[1] [7].
7. Attack vector: The route taken by an attacker using various tools and
strategies to reach their target. For example, writing passwords on paper
introduces an alternative attack vector for acquiring system access [15].
8. Defender: A mechanism or procedure that reduces or stops an attack. Automated
systems are often deployed to safeguard company networks from cyber threats
[53][67].
9. Compromise: The successful exploitation of a target by an attacker,
resulting in a system or network being rendered useless for legitimate users
[71].
10. Risk: A qualitative evaluation of the probability that an attacker
will successfully breach a system by evading defenses through the use of an
exploit [66]. Every layer of the network architecture should undergo risk
analysis, and suitable defenses against potential attacks should be put in place
[54].
1.3.1.2 Implementing Network Security
In the OSI model, each layer functions independently and communicates only with
the layers directly above and below it. While security vulnerabilities can be present in
any of these layers, this discussion focuses specifically on the network layer and the
related security measures. The following steps are outlined in Top-Down Network
Design to help create a secure network:
Identify network assets: This involves recognizing every element of the network,
including switches, routers, operating systems, applications, and network data. It's
essential to understand the risks that could arise from these assets being compromised
or accessed inappropriately [61].
Analyze security risks: Threats can range from malicious intruders to
unsuspecting individuals installing harmful software from the Internet. Risks from
hostile attackers may include denial of service attacks, data theft, and data
manipulation [67].
Examine trade-offs and security requirements: Implementing the CIA triad
(Confidentiality, Integrity, and Availability) is a common standard for security. Often,
compromises must be made between CPU power, network performance, redundancy,
and other factors to achieve the desired level of security [54] [62].
Create a security strategy: This high-level document outlines the organization's
recommended course of action to meet security needs. It considers network risks and
assets while aligning with the organization's objectives. The plan should include
network topology and a description of network services, such as provider, access,
management, and more [39] [67].
Create a security policy: A security policy is a formal statement that outlines the rules for individuals
who have access to an organization's technology and information assets. While the
specifics of security policies can differ from one organization to another, they
generally address key elements along with items unique to the organization.
Establish security procedures: Procedures outline the processes for configuration,
login, auditing, maintenance, and incident management, effectively implementing the
security policies [80]. They should be designed with the needs of end users, network
administrators, and security administrators in mind, offering clear guidance on how to
address issues, including detection and response protocols.
Ensure security: It is crucial to enforce the established protocols through scheduled
audits, reviewing audit logs, responding to incidents, monitoring literature and alerts,
conducting security tests, training security administrators, and updating policies and
plans [76][42].
1.3.2 DDoS Attack
Because defending against DDoS attacks is complex, there is no single standard
defense that an organization can adopt. Attribution is also difficult because DDoS
traffic mimics normal traffic, only at a much larger scale [7] [53].
1.3.2.2 Launching a DDoS Attack
A DDoS attack can start in several ways, with the most common method being the
relentless sending of packets from an attacker to a victim server. This flood of data
drains essential resources, making it difficult for legitimate users to access the
targeted services. Another common tactic involves sending a series of malformed
packets, which can cause victim servers to freeze or restart [41] [3]. Additionally,
attackers might take control of machines within a victim's network, leading to
resource depletion and making the network unavailable for both internal and external
services. The methods used to carry out these attacks are varied and often hard to
predict, typically becoming clear only after the attacks have begun. Figure 1.3.2.2
provides an example of a DDoS attack.
Figure 1.3.2.2: Example of a DDoS attack
A DDoS attack typically unfolds in several distinct phases and involves four main
participants: the attacker, the controller, the zombies, and the victim [21]. The attacker
starts the process by scanning for vulnerable ports on machines that can be accessed
remotely [71]. Once a vulnerability is found, the attacker sends out malicious code
that, when executed on the targeted machine, replicates itself and initiates the attack.
The malware can also spread by disguising itself as a legitimate internet packet, like
an email attachment [21]. Throughout this process, the attacker retains remote control,
using spoofing techniques to avoid detection and make it difficult to identify the
machines involved [62].
i. Connectionless Nature of UDP: Unlike TCP, UDP does not establish a
connection before sending data. Each UDP packet is independent, and the
recipient does not send acknowledgments. This lack of connection setup
overhead makes UDP flood attacks relatively easy to execute.
ii. Volume of Packets: The effectiveness of a UDP flood attack relies on the
sheer volume of UDP packets sent to the target. The attacker may use various
means to generate a massive number of packets, including the use of botnets,
amplification techniques, or simply saturating the target with a high volume of
traffic [76][62].
iii. Port Exhaustion: The attacker may target specific UDP ports on the victim
server or flood it indiscriminately with packets to random ports. This can lead
to port exhaustion, where legitimate services on the target server become
inaccessible due to the overwhelming volume of incoming traffic.
iv. Distributed Denial of Service (DDoS): UDP flood attacks are often carried
out in a distributed manner using multiple compromised devices or a botnet.
This distributed approach increases the scale and impact of the attack, making
it more challenging for the target to mitigate the incoming traffic [41][6].
v. Detection and Mitigation: Detecting and mitigating UDP flood attacks can
be challenging because of the connectionless nature of UDP. Traditional
security measures, such as firewalls, may struggle to differentiate between
legitimate and malicious UDP traffic. Specialized DDoS mitigation solutions
are often employed to identify and filter out the malicious traffic, allowing
legitimate traffic to reach its intended destination [61][51].
vi. Reflective Amplification: In some cases, attackers leverage reflective
amplification techniques, such as DNS amplification or NTP amplification, to
magnify the volume of their attack traffic. This involves sending requests to
third-party servers that, in turn, send larger responses to the target, amplifying
the impact of the attack [33][41].
mitigation services. Regular monitoring and timely response to unusual patterns of
network traffic are crucial in mitigating the impact of such attacks.
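The effect of reflective amplification on attack volume can be made concrete with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the request and response sizes used in the example are assumed values rather than measurements. It computes the ratio of response size to request size and the resulting traffic rate at the victim.

```python
def amplification_factor(request_bytes, response_bytes):
    """Bytes reflected to the victim per byte sent by the attacker."""
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    return response_bytes / request_bytes


def reflected_bandwidth(attacker_bps, request_bytes, response_bytes):
    """Traffic rate arriving at the victim for a given attacker sending rate."""
    return attacker_bps * amplification_factor(request_bytes, response_bytes)
```

For example, under the assumption of a 60-byte request that triggers a 3000-byte response, the factor is 50, so an attacker sending 10 Mbit/s of spoofed requests would direct roughly 500 Mbit/s at the victim.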
2. TCP_SYN Flood:
This attack manipulates the TCP three-way handshake, causing the server to wait idly and
degrading service for valid users. A TCP_SYN flood attack is a type of Distributed Denial
of Service (DDoS) attack that exploits the TCP (Transmission Control Protocol)
three-way handshake process to overwhelm a target server and disrupt its services
[31]. In a typical TCP_SYN flood attack, the attacker floods the target server with a
high volume of TCP SYN (synchronization) packets [74], exploiting the fact that the
server must allocate resources and maintain state information for each incoming
connection attempt.
Here are key aspects of a TCP_SYN flood attack:
performance or complete unavailability of services for legitimate users [33] [56].
iv. Impact on Legitimate Users: As the server becomes inundated with SYN
packets from the attacker, it has fewer resources available to process
legitimate connection requests. This can lead to delays, timeouts, or service
unavailability for legitimate users trying to establish connections with the
targeted server [31].
v. Distributed Nature: TCP_SYN flood attacks are often executed in a
distributed manner, using multiple compromised devices or a botnet to amplify
the volume of attack traffic. This distributed approach makes it more
challenging for the target to distinguish between legitimate and malicious
connection requests [13].
vi. Detection and Mitigation: Detecting and mitigating TCP_SYN flood attacks
requires specialized security measures. Intrusion detection systems (IDS),
firewalls, and DDoS mitigation solutions are commonly used to identify and
filter out malicious SYN packets. Additionally, some solutions implement rate
limiting and connection tracking to differentiate between normal and abnormal
traffic patterns [4].
vii. TCP SYN Cookies: Some servers employ TCP SYN cookies as a defense
mechanism against SYN flood attacks. TCP SYN cookies allow a server to
handle connection requests without allocating resources until the three-way
handshake is completed. This helps prevent resource exhaustion during the
initial stages of a connection attempt.
Defending against TCP_SYN flood attacks involves a combination of network
security best practices, proper configuration of firewalls and intrusion prevention
systems, and the use of dedicated DDoS mitigation services to filter out malicious
traffic and ensure the availability of services for legitimate users. Regular monitoring
and timely response to anomalous network patterns are crucial in mitigating the
impact of such attacks [44].
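The SYN cookie mechanism mentioned in the list above can be sketched as follows. This is a simplified illustration and not the encoding used by any particular operating system: real implementations also encode values such as the maximum segment size, and the secret, the 64-second time slot, and the function names here are assumptions. The point it demonstrates is that the server derives its initial sequence number from the connection details and a secret, so it can validate the final ACK without having stored any per-connection state.

```python
import hashlib
import hmac
import struct

SECRET = b"demo-secret"  # assumption: a secret known only to the server
SLOT_SECONDS = 64        # assumption: coarse time slot bounding cookie lifetime


def _cookie(src_ip, src_port, dst_ip, dst_port, slot):
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}@{slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    # Truncate to 32 bits, the width of a TCP sequence number.
    return struct.unpack("!I", digest[:4])[0]


def make_syn_cookie(src_ip, src_port, dst_ip, dst_port, now):
    """Sequence number to send in the SYN-ACK; nothing is stored on the server."""
    return _cookie(src_ip, src_port, dst_ip, dst_port, int(now) // SLOT_SECONDS)


def check_ack(src_ip, src_port, dst_ip, dst_port, ack_number, now):
    """Accept the ACK if it acknowledges a cookie from the current or previous slot."""
    slot = int(now) // SLOT_SECONDS
    claimed = (ack_number - 1) % 2**32  # the client acknowledges cookie + 1
    return any(
        claimed == _cookie(src_ip, src_port, dst_ip, dst_port, s)
        for s in (slot, slot - 1)
    )
```

Because a spoofed source never receives the SYN-ACK, it cannot return the matching acknowledgment number, and the server commits memory only for clients that complete the handshake.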
3. ICMP Flood:
This attack floods the target with ICMP ping traffic and, in one variant, takes
advantage of the IP protocol's maximum packet size by sending oversized packets. An
ICMP (Internet Control Message Protocol) flood attack, commonly known as a ping
flood, is a type of Distributed Denial of Service (DDoS) attack that exploits the ICMP
protocol to overwhelm a target's network or system with an excessive volume of ping
requests [56]. A related variant, the Ping of Death (PoD), takes advantage of the IP
(Internet Protocol) maximum packet size by sending unusually large ICMP packets,
causing disruption and potentially leading to a denial of service.
i. ICMP Protocol and Ping: ICMP is a network layer protocol used for
diagnostic purposes and error reporting in IP networks. The Ping utility, based
on ICMP Echo Request and Echo Reply messages, is commonly used to test
network connectivity. In an ICMP flood attack, the attacker leverages the Ping
utility to send a massive number of ICMP Echo Request packets to the target
[77].
ii. Ping of Death (PoD): The "Ping of Death" refers to a specific type of ICMP
flood attack that involves sending ICMP packets larger than the maximum
allowable size specified by the IP protocol. Historically, some systems had
vulnerabilities that allowed them to be crashed or disrupted when attempting
to process oversized ICMP packets, leading to the term "Ping of Death" [1].
iii. Packet Fragmentation: In an ICMP flood attack, the attacker may exploit the
IP protocol's ability to fragment packets. By sending oversized ICMP packets
that need to be fragmented to traverse the network, the attacker can cause the
target system to expend resources reassembling these packets. This can result
in the target's resources being overwhelmed, leading to degraded performance
or service unavailability [12][4].
iv. Amplification Factor: Similar to other DDoS attacks, ICMP flood attacks can
be executed in a distributed manner using multiple compromised devices or a
botnet. The distributed nature of the attack increases the volume of ICMP
traffic directed at the target, amplifying its impact.
v. Network Congestion: The high volume of ICMP packets generated by the
attack can lead to network congestion, impacting not only the targeted system
but also the surrounding network infrastructure. Legitimate traffic may
experience delays or packet loss as a result of the increased network load [41] [6].
vi. Detection and Mitigation: Detecting and mitigating ICMP flood attacks
requires specialized security measures. Network intrusion detection systems
(IDS), firewalls, and DDoS mitigation solutions are commonly used to
identify and filter out malicious ICMP traffic. Rate limiting, traffic shaping,
and IP filtering may also be employed to differentiate between normal and
abnormal traffic patterns.
vii. Preventing IP Fragmentation Vulnerabilities: Modern systems and network
devices are designed to handle fragmented packets appropriately, and many of
the vulnerabilities that allowed for the exploitation of the Ping of Death have
been addressed through software updates and security patches. It is crucial for
organizations to keep their systems up to date with the latest security patches
to mitigate vulnerabilities associated with oversized ICMP packets [52].
Defending against ICMP flood attacks involves a combination of network security
best practices, proper configuration of firewalls and intrusion prevention systems, and
the use of dedicated DDoS mitigation services to filter out malicious traffic. Regular
monitoring and timely response to anomalous network patterns are essential for
mitigating the impact of such attacks.
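The oversized-packet condition behind the Ping of Death can be expressed as a simple bounds check. The sketch below is a minimal illustration with hypothetical function names, and it assumes a 20-byte IPv4 header without options. It relies on two properties of IPv4: the total length field is 16 bits wide, so a datagram cannot legitimately exceed 65,535 bytes, and the fragment offset field counts in units of 8 bytes.

```python
MAX_IP_PACKET = 65535  # largest value of the 16-bit IPv4 total length field


def reassembled_end(fragment_offset_units, payload_len):
    """Byte position where this fragment's payload ends after reassembly.

    The IPv4 fragment offset field counts in units of 8 bytes.
    """
    return fragment_offset_units * 8 + payload_len


def is_oversized_fragment(fragment_offset_units, payload_len, header_len=20):
    """True if accepting this fragment would exceed the maximum datagram size."""
    end = header_len + reassembled_end(fragment_offset_units, payload_len)
    return end > MAX_IP_PACKET
```

A fragment with the maximum offset value of 8191 and a 1480-byte payload would end well beyond the limit, which is the kind of fragment a patched reassembly routine rejects before copying it into a buffer.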
Detecting and mitigating DDoS attacks, including UDP flood, TCP_SYN flood, and
ICMP flood attacks, is crucial for maintaining the availability and performance of
online services. Accounting for all three attack types is essential for building
effective and adaptive defense mechanisms.
Supervised learning techniques, whether in classification or regression, play a crucial
role in training models to recognize patterns associated with these attacks, allowing
for timely and accurate identification and response [62].
The following reasons explain why these DDoS attacks are important considerations
for detection with supervised learning techniques such as classification
and regression:
1. Diverse Attack Vectors: UDP flood attacks, TCP_SYN flood attacks, and
ICMP flood attacks represent diverse attack vectors targeting different aspects
of network communication. Supervised learning algorithms can be trained to
recognize patterns specific to each type of attack, allowing for the
development of detection models tailored to the characteristics of these attacks
[31].
2. Commonality in Traffic Patterns: DDoS attacks often exhibit distinctive
traffic patterns that can be identified through machine learning. By utilizing
supervised learning techniques, classifiers can be trained to distinguish
between normal and malicious traffic based on features such as packet rates,
packet sizes, and connection initiation behavior. This enables the detection of
anomalous activity associated with UDP, TCP_SYN, and ICMP flood attacks
[22].
3. Feature Extraction: Machine learning models for DDoS detection often rely
on feature extraction from network traffic data. Features could include
attributes like packet size, frequency of connection requests, or changes in
traffic patterns over time. Supervised learning allows the model to learn the
importance of various features in distinguishing between benign and malicious
traffic, contributing to accurate classification or regression.
4. Real-Time Detection: DDoS attacks can have rapid and dynamic
characteristics, requiring real-time detection and response mechanisms.
Supervised learning models, once trained, can operate in real-time to analyze
incoming traffic and make decisions on whether certain patterns are indicative
of an ongoing attack [33]. This capability is essential for promptly mitigating
the impact of DDoS attacks.
5. Adaptability to Evolving Threats: DDoS attack techniques are constantly
evolving, and attackers often employ new strategies to bypass traditional
security measures. Supervised learning models can adapt to new threats by
continuously updating their training data and learning from the evolving
characteristics of both normal and malicious traffic.
6. False Positive Reduction: The use of supervised learning allows for the
creation of models that can minimize false positives. By training on diverse
datasets that include both normal and attack scenarios, the model learns to
make more accurate predictions, reducing the likelihood of misclassifying
legitimate traffic as malicious [45].
7. Regression for Traffic Volume Prediction: In addition to classification,
regression techniques can be valuable for predicting the volume of incoming
traffic. This is particularly relevant for DDoS attacks, where a sudden surge in
traffic is a key indicator. Regression models can estimate expected traffic
volumes based on historical data, helping organizations proactively prepare for
potential attacks [21].
8. Integration with DDoS Mitigation Strategies: Supervised learning models
can be integrated with DDoS mitigation strategies to automate the process of
identifying and mitigating attacks. By combining detection and mitigation
efforts, organizations can respond quickly to minimize the impact of DDoS
attacks on their services.
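The feature extraction and classification steps in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the feature set or implementation used in this study: the packet record format (a dict with `size` and `syn` keys), the three features, and the function names are hypothetical, and the classifier is a bare K-Nearest Neighbors vote written with the standard library only.

```python
from collections import Counter
from math import dist


def window_features(packets):
    """Summarize one time window as (packet count, mean size, SYN fraction).

    Each packet is a hypothetical dict with 'size' (bytes) and 'syn' (bool) keys.
    """
    n = len(packets)
    if n == 0:
        return (0.0, 0.0, 0.0)
    mean_size = sum(p["size"] for p in packets) / n
    syn_fraction = sum(1 for p in packets if p["syn"]) / n
    return (float(n), mean_size, syn_fraction)


def knn_predict(train, query, k=3):
    """Majority label among the k training points nearest to the query.

    train: list of (feature_tuple, label) pairs.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

The features are left unscaled here for brevity, so the packet count dominates the Euclidean distance. In practice the features would normally be normalized before training a distance-based classifier.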
1.4 Motivation
This work focuses on detecting DDoS attacks in computer networks using a limited
set of features through a machine learning approach. By utilizing fewer
characteristics, we can process network packets more quickly to determine whether
they are part of an attack or just regular traffic, which is essential for the rapid
identification of DDoS attacks [45].
The study examines three major types of DDoS attacks: ICMP flood, TCP_SYN flood, and
UDP flood. By analyzing the distinct features of each attack vector, we aim to customize
machine learning algorithms for precise and timely detection. The chosen algorithms,
Decision Tree (DT), Multi-Layer Perceptron (MLP), Logistic Regression (LR), and
K-Nearest Neighbors (KNN), represent a variety of techniques that bring different
strengths to pattern recognition and classification.
The practical implications of this research are as significant as its potential to advance
the field of cybersecurity. Effectively implementing machine learning-based DDoS
detection systems can enable organizations to proactively safeguard against cyber
threats, ensuring the smooth operation of essential online services. Additionally, the
findings from this research could play a role in the ongoing conversation about
enhancing cybersecurity frameworks worldwide.
1.5 Contributions
This study contributes to the field of cybersecurity by applying machine learning
techniques to the detection of DDoS attacks. The main contributions can be
summarized as follows:
1. Analysis of attack types: The study provides a detailed analysis of several types
of DDoS attacks, including ICMP Flood, TCP SYN Flood, and UDP Flood. This
understanding allows for a more nuanced evaluation of how effectively machine
learning algorithms can identify these specific attack vectors.
2. Practical application of algorithms: The research applies several machine
learning algorithms, namely K-Nearest Neighbors (KNN), Decision Tree, Multi-layer
Perceptron, and Logistic Regression. The implementation is done in Python, which
makes the code and methodologies available to the broader community.
3. Benchmark dataset: The study utilizes the KDD99 dataset, a well-known
benchmark for intrusion detection. This dataset presents a range of real-world
scenarios that can be used to evaluate the performance of machine learning
algorithms in detecting DDoS attacks.
4. Comparative analysis: The study compares KNN, Decision Tree, Multi-layer
Perceptron, and Logistic Regression, assessing their performance through metrics
such as accuracy, precision, false positive rate (FPR), error rate, F1-Score, and
ROC curve. This systematic evaluation provides insights into their strengths and
weaknesses and helps identify the most effective algorithm for detecting DDoS
attacks by establishing a clear set of criteria.
5. Attack simulation: The research includes practical implementations of DDoS
attacks that simulate real-world scenarios, which enhances the study's relevance by
testing the algorithms in conditions that reflect actual cyber threats.
6. Guidance for practitioners: The findings offer guidance for cybersecurity
professionals and researchers, highlighting machine learning algorithms that are
effective in detecting DDoS attacks. This knowledge can aid in developing more
resilient and adaptable cybersecurity systems.
In conclusion, the research deepens the understanding of DDoS attack detection by
integrating a thorough analysis of specific attack types, practical applications of
machine learning algorithms, and a comprehensive evaluation using established
metrics. The results contribute to ongoing efforts to strengthen cybersecurity
measures against evolving cyber threats.
Simulating various DDoS attacks, including TCP-SYN Flood, UDP Flood, and ICMP
Flood, to create realistic scenarios for assessing the effectiveness of machine learning
algorithms in detecting and mitigating these cyber threats.
Evaluating and comparing the performance of machine learning algorithms using the
KDD99 dataset. The algorithms will be assessed based on key metrics such as
accuracy, recall, precision, and F1-score, along with ROC curves to provide a
comprehensive view of their ability to detect DDoS attacks.
By achieving these objectives, the study aims to offer valuable insights into the
application of machine learning in cybersecurity, particularly in the context of DDoS
attack detection. The results will not only deepen our understanding of algorithm
performance but also guide the development of more robust and effective defense
mechanisms against evolving cyber threats.
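The evaluation metrics named in this chapter follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts in the usage note are illustrative assumptions and are not results from this study. Attack traffic is treated as the positive class.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp: attacks flagged as attacks      fp: normal traffic flagged as attack
    tn: normal traffic passed as normal fn: attacks missed
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        "f1": f1,
    }
```

As an illustration with made-up counts, `detection_metrics(tp=90, fp=10, tn=880, fn=20)` gives an accuracy of 0.97 and a precision of 0.90, while recall is about 0.82. The gap shows why accuracy alone can overstate detection quality when normal traffic dominates the dataset.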
Chapter 2
Literature Review
2.1 Overview
DDoS attacks have been on the rise in recent years, making it important to have reliable and
effective detection systems. Machine learning has shown great potential in improving
detection methods to keep up with the constantly changing nature of these attacks. This
literature review summarizes key studies in this area, focusing on their methods, datasets, and
performance results.
In the context of Internet of Things (IoT) devices, which are increasingly targeted by
botnet-driven DDoS attacks, Doshi et al. [17] developed lightweight models designed for
resource-constrained devices. These models were optimized for real-time detection,
balancing computational efficiency with accuracy. Kirubavathi and Anitha [15] focused on
Android-based botnets, using structural analysis and machine learning to detect botnet
activity in its early stages, often a precursor to large-scale DDoS attacks. Bhushan and Gupta
[11] highlighted the integration of machine learning with SDN for DDoS mitigation in cloud
environments, achieving improved scalability and flexibility.
Ensemble and hybrid approaches have also been explored to enhance detection capabilities.
Das et al. [22] proposed an ensemble-based model that integrated multiple machine learning
algorithms to improve the robustness and accuracy of DDoS detection. Tuan et al. [27]
evaluated various machine learning models, including neural networks and support vector
machines, for botnet-based DDoS detection, concluding that neural networks outperformed
traditional classifiers in terms of detection speed and accuracy. Cao et al. [13] presented a
genetic algorithm-based solution to address the limitations of static detection models,
particularly for protecting Hadoop clusters under attack.
Despite these advancements, several challenges remain. Bhushan and Gupta [11] noted the
difficulty of maintaining low false-positive rates in real-time environments with dynamic and
diverse traffic. Idhammad et al. [16] suggested that distributed systems integrated with
machine learning could improve scalability and response times. Doshi et al. [17] emphasized
the importance of lightweight detection models for IoT environments to ensure both
computational efficiency and accuracy. These studies collectively underscore the potential of
machine learning in combating modern DDoS threats while pointing to areas such as
real-time scalability, resilience, and adaptability as critical directions for future research.
In a similar vein, Bhushan and Gupta [11] focused on the use of Software-Defined
Networking (SDN) for DDoS mitigation in cloud environments. Their approach highlighted
SDN’s potential in handling large-scale network traffic by offering centralized control and
efficient resource management. By decoupling the control plane from the data plane,
SDN-based networks can provide more dynamic and responsive management of network resources,
which is crucial when defending against distributed attacks. Their findings suggest that SDN,
when combined with distributed systems, can offer enhanced performance and scalability for
DDoS mitigation in cloud infrastructures. The study underlines the need for distributed
approaches to manage increasingly sophisticated and large-scale attacks effectively.
Pillutla and Arjunan [14] introduced an innovative method for DDoS mitigation based on
fuzzy self-organizing maps (FSM) within a cloud environment. This method, designed for
distributed systems, aims to reduce false positives while maintaining high detection accuracy.
By utilizing FSM, their approach adapts to varying attack patterns, providing a flexible
mechanism for real-time traffic analysis in cloud computing. Furthermore, their system’s
ability to dynamically classify and detect malicious traffic in large, distributed systems
presents a significant advantage over traditional methods. However, the approach could
benefit from further optimization and integration with other machine learning models to
improve its adaptability to emerging DDoS attack strategies.
Idhammad et al. [16] explored a semi-supervised machine learning approach for DDoS
detection, which incorporates both labeled and unlabeled data to enhance the detection
process. Their method takes advantage of distributed systems to efficiently process large-
scale datasets in real-time. The combination of supervised and unsupervised learning
techniques allows the system to adapt to a variety of attack patterns while maintaining a low
false positive rate. This technique is especially suitable for cloud environments, where the
volume and variety of traffic are continuously evolving. Further advancements could involve
integrating additional data sources and refining the semi-supervised learning model to handle
more complex attack vectors.
Tuan et al. [27] also contributed to the field by evaluating the effectiveness of machine
learning techniques in detecting DDoS attacks driven by botnets. Their research highlighted
the importance of distributed systems like Hadoop for large-scale DDoS attack detection.
Hadoop's ability to process massive volumes of data in parallel makes it an excellent choice
for real-time detection in cloud environments, where botnets can generate significant traffic
volumes. Their findings emphasize the scalability and efficiency of distributed data
processing systems in combating complex DDoS attacks. Expanding the approach to
incorporate other machine learning algorithms and classifiers could further improve detection
accuracy and adaptability in the face of evolving DDoS tactics.
Bhushan and Gupta [11] used Multi-Layer Perceptrons (MLP) for detecting DDoS attacks in
Software-Defined Networking (SDN) environments. MLP, a type of deep learning model,
was found to improve detection accuracy by learning complex patterns in large datasets.
Idhammad et al. [16] implemented Logistic Regression to classify traffic as either attack or
legitimate, demonstrating its ability to offer reliable and computationally efficient results.
Logistic Regression is a linear classifier, and its performance in binary classification tasks for
DDoS detection was emphasized in their findings.
Basic Idea: KNN is a non-parametric classification algorithm that classifies data points based
on the majority class of their nearest neighbors in the feature space.
Implementation Points:
Strengths: Simple to implement and requires no explicit training phase; most effective when
features are well scaled and the feature space is not too high-dimensional.
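The majority-vote idea behind KNN can be illustrated with a minimal sketch. The function name `knn_predict`, the choice of Euclidean distance, the value k = 3, and the toy flow values are assumptions made for illustration only; they are not taken from the experiments in this thesis.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample.
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest samples and count their class labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical, hand-made flows: (normalized packet rate, normalized duration).
train_X = [(0.10, 0.80), (0.15, 0.70), (0.20, 0.90),
           (0.90, 0.10), (0.85, 0.20), (0.95, 0.05)]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]

print(knn_predict(train_X, train_y, (0.88, 0.12)))
print(knn_predict(train_X, train_y, (0.12, 0.85)))
```

Because every prediction scans the whole training set, the cost of this naive version grows linearly with the number of stored samples.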
Implementation Points:
Splitting Criteria: Use criteria like Gini impurity or Information Gain to
decide the best feature at each node.
Tree Pruning: After the tree is built, prune branches to prevent overfitting
and improve the generalization ability.
Handling Imbalanced Data: Decision Trees can struggle with imbalanced
data, so techniques like class balancing or cost-sensitive learning may be
necessary [6].
Strengths: Easy to interpret, requires little data preprocessing, and is capable of handling
both numerical and categorical data.
Weaknesses: Can overfit if the tree is too deep and struggles with capturing complex
relationships without pruning or ensemble methods.
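The splitting criteria mentioned above can be sketched as follows. The helper names (`gini`, `entropy`, `split_quality`) and the toy packet-rate values are hypothetical. The sketch evaluates a single threshold on a single numeric feature, which is only one step of building a full tree.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits, used for Information Gain."""
    if not labels:
        return 0.0
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_quality(values, labels, threshold, impurity=gini):
    """Impurity reduction obtained by splitting on `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(labels) - weighted

# Hypothetical packet rates (packets per second) and their labels.
rates = [10, 12, 15, 200, 220, 250]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]

print(split_quality(rates, labels, threshold=100))                    # Gini decrease
print(split_quality(rates, labels, threshold=100, impurity=entropy))  # information gain
print(split_quality(rates, labels, threshold=12))                     # a weaker split
```

A tree-building procedure would evaluate many candidate thresholds per feature and keep the one with the largest impurity reduction at each node.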
Figure 2.3.2 Decision Tree Algorithm
2.3.3 Multi-layer Perceptron Algorithm
Multi-Layer Perceptrons (MLP) are a class of feedforward neural networks designed to solve
supervised learning problems, including DDoS detection. By utilizing multiple layers of
neurons, MLPs can learn complex, non-linear relationships between input features, making
them highly effective for identifying DDoS attack patterns in network traffic [3]. Below are
the key elements and steps in implementing MLPs for DDoS detection:
Key Features of MLP:
Architecture: An MLP consists of three types of layers: an input layer, one or more hidden
layers, and an output layer. Each neuron in the hidden layers computes a weighted
sum of the inputs and applies an activation function, such as Sigmoid or ReLU, to
generate an output. The final output layer produces the classification result (e.g.,
attack or normal traffic). These networks are trained using a backpropagation
algorithm, where the network adjusts weights based on the error between predicted
and actual values [11][17].
Training Process: MLPs are trained using labeled data, in which each network traffic
record is marked as either legitimate or attack. The backpropagation algorithm
calculates the gradient of the loss function with respect to each weight in the
network, and an optimizer such as stochastic gradient descent (SGD) then adjusts the
weights to reduce the error. Repeating this process over many iterations allows the
network to learn to make accurate predictions [11][16].
Steps for Implementing MLP for DDoS Detection:
1. Data Preprocessing:
i Feature Selection: MLPs require relevant features from the network traffic data.
Features like packet size, flow duration, source/destination IP, protocol types, and
number of packets are essential for effective DDoS detection [11][17].
ii Normalization: MLPs perform better when the input data is normalized to a
standard scale. Normalization improves the convergence rate of the model and
prevents certain features from dominating the learning process due to their larger
numerical values [11][16].
2. Model Construction:
i Input Layer: The number of neurons in the input layer corresponds to the number
of features in the dataset. Each neuron receives a feature value as input.
ii Hidden Layers: The number and size of hidden layers are critical in defining the
complexity of the model. Bhushan and Gupta [11] recommend experimenting
with different configurations to optimize performance.
iii Activation Function: Popular activation functions include Sigmoid, ReLU, and
Tanh. Sigmoid is often used in binary classification tasks because it outputs
probabilities between 0 and 1, while ReLU is favored for its ability to efficiently
handle non-linearities in large datasets [11][17].
3. Training the Model:
i Backpropagation: The MLP uses the backpropagation algorithm to adjust weights
during training. The loss function (such as mean squared error or cross-entropy) is
calculated, and the gradient descent method is employed to update the weights and
reduce the error.
ii Optimization: Common optimizers include Stochastic Gradient Descent (SGD)
and Adam. These methods help fine-tune the weights by minimizing the loss
function and improving model accuracy [11].
4. Model Evaluation:
i Cross-validation: After training the model, cross-validation is performed to
evaluate its performance. This technique splits the data into multiple subsets,
training the model on some while validating it on others to prevent overfitting
[11].
ii Metrics: Performance metrics such as accuracy, precision, recall, and F1 score are
used to assess the model's ability to correctly classify network traffic. High
precision and recall are especially important in DDoS detection to minimize false
positives and false negatives [17].
5. Testing and Deployment:
i Real-time Testing: Once the model is trained and evaluated, it is deployed in a
real-time environment where it classifies incoming traffic as normal or suspicious.
The MLP model's performance is continuously monitored to ensure it adapts to
evolving attack patterns [1].
ii Threshold Setting: A decision threshold is applied to classify traffic. If the model's
output for a traffic sample exceeds the predefined threshold, the system flags it as a
potential DDoS attack [11].
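The training steps above can be sketched for a very small network. This is an illustrative sketch only, not the model used in this thesis: it assumes one hidden layer with tanh activation, a sigmoid output, cross-entropy loss, per-sample SGD, and a hand-made toy dataset. The names `forward` and `train_mlp`, the learning rate, and the number of epochs are all hypothetical choices.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Return the hidden activations and the output probability for one sample."""
    W1, b1, W2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, output

def train_mlp(X, y, n_hidden=3, lr=0.5, epochs=300, seed=0):
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for x, t in zip(X, y):
            hidden, p = forward((W1, b1, W2, b2), x)
            # Cross-entropy loss, clipped to avoid log(0).
            p_safe = min(max(p, 1e-12), 1 - 1e-12)
            epoch_loss -= t * math.log(p_safe) + (1 - t) * math.log(1 - p_safe)
            # Backpropagation: error at the output, then at each hidden neuron.
            delta_out = p - t
            delta_hid = [delta_out * W2[j] * (1 - hidden[j] ** 2)
                         for j in range(n_hidden)]
            # Gradient descent update for every weight and bias.
            for j in range(n_hidden):
                W2[j] -= lr * delta_out * hidden[j]
                b1[j] -= lr * delta_hid[j]
                for i in range(n_in):
                    W1[j][i] -= lr * delta_hid[j] * x[i]
            b2 -= lr * delta_out
        losses.append(epoch_loss / len(X))
    return (W1, b1, W2, b2), losses

# Hypothetical normalized features: (packet rate, flow duration); 1 = attack.
X = [(0.10, 0.80), (0.15, 0.70), (0.20, 0.90),
     (0.90, 0.10), (0.85, 0.20), (0.95, 0.05)]
y = [0, 0, 0, 1, 1, 1]

params, losses = train_mlp(X, y)
print("first epoch loss:", losses[0], "last epoch loss:", losses[-1])
_, prob = forward(params, (0.92, 0.08))
print("flag as attack:", prob > 0.5)  # decision threshold of 0.5
```

In practice a library implementation would normally be used instead; the sketch only makes the forward pass, the backpropagated error terms, and the threshold step explicit.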
Logistic Regression is a widely used machine learning technique for binary classification
tasks, such as distinguishing between normal network traffic and DDoS (Distributed Denial
of Service) attack traffic. It models the relationship between the dependent variable (binary
output, in this case, whether the traffic is a DDoS attack or not) and one or more independent
variables (features such as packet rate, connection duration, etc.) [43]. The logistic regression
model calculates the probability of an instance belonging to a particular class by applying the
logistic (sigmoid) function to the weighted sum of input features.
Logistic regression is attractive for DDoS detection because of its simplicity, interpretability,
and efficiency. It works particularly well when the decision boundary between normal and
attack traffic can be approximated by a linear function, which is often the case with structured
network traffic data. However, its performance can decrease if the data relationships are non-
linear or too complex.
The fundamental steps for implementing Logistic Regression for DDoS detection include:
1. Data Preparation
2. Model Setup
i Define the Output: Logistic regression is a binary classification model that
predicts the probability of the given input belonging to one of the two classes:
DDoS attack or normal traffic [17]. The output value ranges from 0 to 1,
representing the likelihood of the input belonging to the "attack" class.
ii Sigmoid Function: The key to logistic regression is the sigmoid activation
function, which transforms the output into a probability value. The formula for
the sigmoid function is:
$$\sigma(Z) = \frac{1}{1 + e^{-Z}}$$
where Z is the weighted sum of the input features, and e is the base of the
natural logarithm [11].
3. Training the Model
i Optimization: The goal of training the logistic regression model is to find the
optimal weights (parameters) that minimize the prediction error. This is done
by using the logistic loss function (also called cross-entropy loss) which
measures the difference between predicted probabilities and actual labels
(attack or normal). The model is trained by minimizing this loss using
optimization algorithms like Gradient Descent [17].
ii Gradient Descent: Gradient descent is used to iteratively adjust the weights of
the logistic regression model by moving in the direction of the steepest
decrease of the loss function. The update rule for the weight vector w is:
$$w \leftarrow w - \alpha \nabla_w J(w)$$
where α is the learning rate, and J(w) is the loss function [11].
4. Model Evaluation
i Cross-validation: To assess the model’s performance and avoid overfitting, it
is common to perform k-fold cross-validation. In this process, the dataset is
divided into k subsets, and the model is trained and validated on different
subsets, which helps check that the model generalizes well to unseen data [17].
ii Performance Metrics: After training the model, various metrics are used to
evaluate its effectiveness, including:
5. Deployment
i Real-time Classification: Once the model is trained, it can be deployed in a
real-time environment where it classifies incoming network traffic as either
normal or an attack. The model uses the learned weights and applies the
sigmoid function to classify new instances [17].
ii Thresholding: A decision threshold is applied to the output of the sigmoid
function to make a classification decision. Typically, if the probability is
greater than 0.5, the model classifies the traffic as a DDoS attack; otherwise, it
is classified as normal traffic. The threshold can be adjusted based on the
desired balance between false positives and false negatives [11]. Many
industries, including marketing, banking, and healthcare, employ logistic
regression extensively for applications such as credit scoring, customer
churn prediction, and spam identification.
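The steps above (sigmoid output, cross-entropy loss, gradient descent update, and thresholding) can be combined in a short sketch. It is written from scratch for illustration and is not the implementation used in this thesis. The function names, the learning rate of 0.5, the 2000 iterations, and the toy feature values are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability that sample x belongs to the attack class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def log_loss(w, b, X, y):
    """Mean cross-entropy loss J(w) over the dataset."""
    total = 0.0
    for x, t in zip(X, y):
        p = min(max(predict_proba(w, b, x), 1e-12), 1 - 1e-12)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(X)

def fit(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent: w <- w - lr * gradient of J(w)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errors = [predict_proba(w, b, x) - t for x, t in zip(X, y)]
        grad_w = [sum(e * x[j] for e, x in zip(errors, X)) / n
                  for j in range(len(w))]
        grad_b = sum(errors) / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def classify(w, b, x, threshold=0.5):
    """Return 1 (attack) if the probability exceeds the threshold, else 0."""
    return 1 if predict_proba(w, b, x) > threshold else 0

# Hypothetical normalized features: (packet rate, connection count); 1 = attack.
X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.30),
     (0.90, 0.80), (0.80, 0.90), (0.85, 0.70)]
y = [0, 0, 0, 1, 1, 1]

w, b = fit(X, y)
print("loss before:", log_loss([0.0, 0.0], 0.0, X, y))
print("loss after: ", log_loss(w, b, X, y))
print([classify(w, b, x) for x in X])
```

Raising the threshold above 0.5 makes the classifier flag fewer flows, which reduces false positives at the cost of more missed attacks; lowering it has the opposite effect.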
Figure 2.3.4 Logistic Regression Algorithm
Numerous studies have highlighted the application of supervised, unsupervised, and hybrid
ML approaches for DDoS detection. For instance, Bhushan and Gupta [11] explored the
mitigation of DDoS attacks in software-defined networking (SDN) environments by
employing ML techniques to analyze and filter malicious traffic. Their approach
demonstrated improved scalability and adaptability, key factors for dynamic cloud
environments. Similarly, Idhammad et al. [16] proposed a semi-supervised ML method that
combines labeled and unlabeled data, significantly enhancing detection accuracy, particularly
in scenarios with limited labeled datasets. Doshi et al. [17] focused on the use of ML
algorithms for detecting DDoS attacks in IoT environments, showcasing their effectiveness in
identifying anomalies in consumer-grade devices.
Among the ML techniques, K-Nearest Neighbors (KNN) has been effective in identifying
malicious patterns based on similarity measures. Decision Trees (DT) have proven valuable
for their interpretability and ability to handle complex decision boundaries. Logistic
Regression (LR) is often used for its simplicity and efficiency in binary classification tasks,
such as distinguishing between normal and attack traffic. Multilayer Perceptrons (MLPs), as a
form of deep learning, have demonstrated their ability to learn intricate patterns from high-
dimensional data, as shown in the works of Pillutla and Arjunan [14]. These algorithms form
the foundation for many contemporary DDoS detection systems.
To test and refine detection methods, researchers have simulated various DDoS attack types,
including UDP Flood, TCP-SYN Flood, and ICMP Flood. These attack types mimic real-
world scenarios and challenge detection systems to respond effectively. For example, Tuan et
al. [19] conducted simulations involving botnet-driven DDoS attacks, analyzing the
performance of ML algorithms under high-volume traffic conditions. Pillutla and Arjunan
[14] integrated fuzzy self-organizing maps to detect these attack types, highlighting their
potential in identifying subtle anomalies that traditional rule-based systems might overlook.
Simulating diverse attack scenarios ensures the robustness and adaptability of ML-based
detection mechanisms.
The KDD99 dataset is a benchmark for evaluating the performance of intrusion detection
systems (IDS). Its rich set of features, including network traffic data and labeled attack types,
makes it a common choice for training and testing ML algorithms. Researchers such as Doshi et
al. [17] and Kirubavathi and Anitha [15] have extensively used KDD99 to validate their
models. Doshi et al. [17] demonstrated how feature selection and optimization on KDD99
could significantly enhance detection accuracy and precision. Meanwhile, Kirubavathi and
Anitha [15] utilized the dataset to analyze the behaviors of Android botnets, showing how
ML models can adapt to specific attack scenarios. The dataset’s diversity also allows for
comparative studies across different ML approaches, enabling researchers to assess their
models in terms of accuracy, error rates, and computational efficiency.
While significant progress has been made, challenges remain in developing scalable and
adaptable detection systems. Studies such as those by Bhushan and Gupta [11] and Pillutla
and Arjunan [14] highlight the need for real-time processing and efficient resource utilization
in cloud and SDN environments. Additionally, balancing detection accuracy with low false
positive rates remains a critical focus. The KDD99 dataset, while widely used, also presents
limitations in terms of its age and relevance to emerging attack types, prompting the need for
updated datasets and real-world testing scenarios.
The growing prevalence of Distributed Denial of Service (DDoS) attacks in modern computing
environments has spurred extensive research into detection and mitigation mechanisms.
Machine learning (ML) algorithms have emerged as critical tools for identifying and
preventing such attacks due to their ability to process large volumes of network data,
identify patterns, and adapt to evolving threats. The literature has particularly focused on
the implementation of ML algorithms, the simulation of various DDoS attack types, and
performance evaluation using benchmark datasets such as KDD99.
Author Name | Year | Learning Method | Attack Identified | Dataset Used
K, Pradeep & Kumar, Pavan & J, Pradeepa & S, Prashantha & Khan, Saad. [1] | 2025 | Logistic Regression, Random Forest, and Neural Network classifiers. | Distributed Denial-of-Service (DDoS) attack | KDD Cup 1999, NSL-KDD
Raihan Putra Janivasya, Ika Dyah Agustia Rachmawati. [2] | 2024 | Random Forest, SVM, Logistic Regression, or Decision Trees. | Distributed Denial-of-Service (DDoS) attack | CICIDS DDoS 2017, KDD Cup 1999
Sahosh, Zerin & Faheem, Azraf & Tuba, Marzana & Ahmed, Md & Tasnim, Syed. [3] | 2024 | Random Forest, Support Vector Machine (SVM), Neural Networks, and Decision Trees | Distributed Denial-of-Service (DDoS) attack | KDD Cup 1999, NSL-KDD
Wu, Yeefong. [4] | 2023 | Random Forest, SVM, Logistic Regression, or Decision Trees. | Distributed Denial-of-Service (DDoS) attack | CICIDS DDoS 2017, KDD Cup 1999
Hashim, Baydaa & Sallehudin, Hasimi & Safie, Nurhizam & Safie, Hizam & Murhg, Hamed & Abdelghany, Shaymaa [5] | 2023 | Random Forest, Support Vector Machine (SVM), or K-Nearest Neighbors (KNN) | Distributed Denial-of-Service (DDoS) attack | CICIDS DDoS dataset
Kumari, K., Mrunalini, M. [6] | 2022 | Logistic Regression and Naive Bayes algorithms. | Distributed Denial-of-Service (DDoS) attack | CAIDA 2007 Dataset
Kishore, Dasari & Devarakonda, Nagaraju. [7] | 2021 | Logistic Regression, Decision Tree, Random Forest, AdaBoost, Gradient Boost, KNN, and Naive Bayes. | Distributed Denial-of-Service (DDoS) attack | CIC-DDoS2019
Borah, Rituparna & Sarmah, Satyajit & Choudhury, Nitin & Mahanta [8] | 2023 | K-Nearest Neighbour (KNN), Random Forest | Distributed Denial-of-Service (DDoS) attack | CICDDoS2017 dataset
Author Name | Year | Learning Method | Attack Identified | Dataset Used
Tom Ball. [12] | 2018 | SDN based Cloud | DDoS attack | KDD99
Wang Y, Li J, Zhao Y, Cao N, Li G, Zhu P, Sun Q. [13] | 2018 | Neural Network Model | DDoS attack | KDD99
Pillutla H, Arjunan A / Jha S / Pritam N [14] [20] [21] | 2018 / 2019 / 2019 | Dempster's tandem rule | Maps-based DDoS | KDD99
Kirubavathi G / Homayoun S [15] [19] | 2018 / 2019 | Structural interpretation learning | DDoS attack | KDD99
Idhammad M / Doshi R / Co N [16] [17] [18] | 2018 / 2018 / 2018 | An online consecutive tractor trailer method | DDoS attack | UNSW-NB15, NSL-KDD, and UNB ISCX 12
Son NTK / Khan MMT / S Das [22] [23] [24] | 2019 / 2019 / 2019 | DDoS assaults indepth assessed the condensed feature set | DDoS attack | NSL-KDD
Li Q, Meng L [25] | 2019 | PCA and New Detection Model | DDoS attack | PCA-RNN
Chapter 3
Methodology
3.1 Overview
To tackle the constantly changing threat of Distributed Denial of Service (DDoS) attacks,
this study uses a thorough methodology that incorporates machine learning algorithms.
The suggested approach emphasizes the analysis of unusual network traffic patterns to
effectively differentiate between legitimate and harmful activities. By utilizing advanced
feature selection methods, strong model architectures, and careful evaluation strategies,
the system aims to provide accurate and efficient detection of DDoS attacks while
reducing false positives.
Feature Selection
Feature selection is an essential step in the process, aimed at identifying and prioritizing
the most relevant attributes from network traffic data. By reducing dimensionality and
concentrating on key variables, these attributes enhance model performance. The main
features considered in this study include:
Dataset Preparation:
The study makes use of the KDD Cup 1999 dataset, which is well-known for effectively
representing both normal and attack traffic. This dataset provides a structured framework
for training and evaluation.
Dataset Structure:
It consists of 41 features that are divided into Basic, Content, and
Traffic categories.
It encompasses various types of attacks, such as DoS, R2L, and U2R.
Preprocessing Steps:
This includes addressing missing values and eliminating duplicates.
Numerical features are normalized using Min-Max Scaling to fit within
a range of [0, 1].
Categorical variables are encoded (for example, Protocol Type: TCP =
1, UDP = 2).
Furthermore, the dataset was divided into training (70%), testing (20%), and validation
(10%) subsets to facilitate a thorough evaluation process.
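The 70/20/10 division described above can be sketched as a shuffled split. The function name `split_dataset`, the fixed random seed, and the use of `round` for the subset sizes are assumptions made for illustration; the stand-in rows are simply integers.

```python
import random

def split_dataset(rows, train=0.7, test=0.2, seed=42):
    """Shuffle the rows and split them into training, testing, and validation
    subsets; whatever remains after train and test becomes validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    train_set = shuffled[:n_train]
    test_set = shuffled[n_train:n_train + n_test]
    val_set = shuffled[n_train + n_test:]
    return train_set, test_set, val_set

rows = list(range(100))  # stand-in for 100 preprocessed records
train_set, test_set, val_set = split_dataset(rows)
print(len(train_set), len(test_set), len(val_set))
```

For a heavily imbalanced dataset, a stratified split that preserves the attack-to-normal ratio in each subset may be preferable to this plain shuffle.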
Model Selection:
A variety of machine learning models were examined to create an effective detection
mechanism:
Traditional Algorithms:
Logistic Regression (LR): A statistical approach used for binary
classification.
K-Nearest Neighbors (KNN): A non-parametric method that identifies
patterns based on proximity measures.
Advanced Techniques:
Random Forest (RF): An ensemble learning technique known for its
high accuracy and robustness.
Multi-Layer Perceptrons (MLP): A deep learning framework that can
capture complex relationships.
To enhance model performance, hyperparameter tuning methods like grid search were
utilized, concentrating on parameters such as the number of trees in Random Forest and
the number of neurons in MLP.
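Grid search evaluates every combination of candidate hyperparameter values and keeps the best-scoring one. The sketch below shows only that enumeration. The scoring function `toy_validation_score` is a hypothetical stand-in for training a model and measuring its validation accuracy, and the candidate values (including the learning rate, which the text above does not mention) are assumptions.

```python
import math
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every combination in `param_grid` and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_validation_score(hidden_neurons, learning_rate):
    # Hypothetical stand-in: a real run would train an MLP with these settings
    # and return its validation score. This toy surface peaks at (32, 0.01).
    return -abs(hidden_neurons - 32) / 32 - abs(math.log10(learning_rate) + 2)

param_grid = {
    "hidden_neurons": [16, 32, 64],
    "learning_rate": [0.1, 0.01, 0.001],
}
best_params, best_score = grid_search(param_grid, toy_validation_score)
print(best_params)
```

The number of evaluations is the product of the list lengths (nine here), so the grid should stay small when each evaluation involves a full training run.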
Performance Evaluation:
The trained models are then thoroughly evaluated for performance using a testing dataset
that mimics real-world situations. We assess how well the models can accurately identify
DDoS attacks while keeping false alarms to a minimum. If necessary, we make
adjustments to the methodology to ensure the best possible outcomes. [28][33][38]
DDoS attacks are mostly determined by two key factors: bandwidth, which measures a
communication channel's data capacity, and throughput, which measures the successful
transfer of data from a source to a destination. [50]
3.3 Dataset
3.3.1 KDD99 dataset and its features
The KDD99 dataset provides a standard set of audit data that includes a wide variety of
intrusions simulated in a military network environment. The KDD Cup 99 dataset has been
used extensively since 1999 and remains a key benchmark for assessing anomaly detection
techniques [20]. The dataset is available in two versions: the full version, which includes
around 500 million packets with 41 features each, and a reduced version, which makes up
20% of the original dataset and includes about 500,000 rows with the same structure.
Of the 41 features, nine are basic features of individual TCP connections, while thirteen
are content features within a connection that are derived from domain knowledge. Table
(3.3.1) provides a detailed explanation of each feature. All features that can be obtained
from a TCP/IP connection are considered basic features [20]. Traffic-related features
fall into two categories [20].
hot | Content | Number of "hot" indicators
srv_serror_rate | Traffic | % of connections that have "SYN" errors
Unlike most DoS and probing attacks, Remote-to-Local (R2L) and User-to-Root (U2R)
attacks do not follow frequent sequential patterns. DoS and probing attacks typically
involve a large number of connections to a specific server or servers in a brief period
of time. In contrast, R2L and U2R attacks typically involve a single connection and
are embedded in the data portion of the packets [20]. These attacks are therefore
identified using content features that examine the data portion for unusual activity.
3.3.2 Data Pre-Processing
Protocol type, service, and flag are the only three of the 28 features that have
categorical values; all other features are numerical. The categorical features are
converted to numeric values to simplify feature selection in the following stage and to
help identify the most important features. For each such feature, the unique values in
its column are identified and replaced with integers assigned sequentially,
starting at 1. Table (3.3.2) below is the reference table for this conversion.
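The integer assignment described above can be sketched as follows. The helper names `build_mapping` and `encode` are hypothetical, and assigning codes in order of first appearance is an assumption; the actual codes used in this study are those listed in Table (3.3.2).

```python
def build_mapping(column):
    """Assign each distinct value an integer code, starting at 1,
    in order of first appearance."""
    mapping = {}
    for value in column:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping

def encode(column, mapping):
    """Replace every categorical value with its integer code."""
    return [mapping[value] for value in column]

# Hypothetical excerpt of the protocol_type column.
protocol_type = ["tcp", "udp", "tcp", "icmp", "udp"]
mapping = build_mapping(protocol_type)
print(mapping)
print(encode(protocol_type, mapping))
```

The mapping should be built once on the training data and reused unchanged for the test data so that the same category always receives the same code.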
Table 3.3.2 Conversion of categorical values to numerical values
Protocol type Flag Service
As previously mentioned, DDoS attacks may take several forms. The class variable of the
KDD Cup 1999 dataset records the type of attack associated with each packet [79].
Because the specific attack type is not needed for this research, each packet's class
variable is relabeled as either "Normal" or "Attack". Figure (3.3.1) displays the
percentage of attack and normal packets in the 20% KDD dataset.
After normalization, all feature values fall within the 0–1 range, which makes the
dataset suitable for the statistical procedures that underlie many feature selection
techniques [80]. The next section covers four distinct feature selection methods, each
of which provides a ranked list of features from most to least significant.
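Min-Max scaling maps each value x of a feature to (x - min) / (max - min), so that the smallest value becomes 0 and the largest becomes 1. The sketch below is illustrative; the function names and sample values are assumptions, and mapping a constant column to 0.0 is one possible convention.

```python
def fit_min_max(values):
    """Learn the minimum and maximum of a feature from the training data."""
    return min(values), max(values)

def min_max_transform(values, lo, hi):
    """Scale values with x' = (x - lo) / (hi - lo)."""
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical byte counts for one feature.
train_bytes = [0, 50, 100]
lo, hi = fit_min_max(train_bytes)
print(min_max_transform(train_bytes, lo, hi))

# The same lo and hi are reused for unseen data to keep the scale consistent.
print(min_max_transform([25, 75], lo, hi))
```

Test values outside the training range will fall outside [0, 1] under this scheme, which may need clipping depending on the model.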
Figure 3.4 Architecture of the ML model for identifying DDoS attacks
3.5.2 Testing the Model:
1) Data Preprocessing (Test Set): Apply the same preprocessing steps to the
testing set that were used for the training set to ensure consistency.
2) Model Evaluation: Feed the preprocessed testing data to the trained model
and evaluate its performance using metrics such as accuracy, precision,
recall, F1-score, and the area under the Receiver Operating Characteristic
(ROC) curve.
3) Confusion Matrix: Compute the confusion matrix, which reports the number of
true positives, true negatives, false positives, and false negatives. This
shows how well the model classifies each type of instance.
4) Tuning and Optimization: Fine-tune hyperparameters or consider feature
engineering to optimize model performance. This step might involve adjusting
parameters based on performance metrics or using techniques like grid search.
Continuous monitoring and periodic retraining are essential to adapt the model
to evolving network conditions and emerging DDoS attack patterns.
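The confusion matrix step can be sketched by counting the four outcomes directly. The function name `confusion_counts` and the example labels are hypothetical; label 1 is assumed to denote attack traffic and 0 normal traffic.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # attack correctly flagged
            else:
                fp += 1  # normal traffic flagged as attack (false alarm)
        else:
            if actual == positive:
                fn += 1  # attack that was missed
            else:
                tn += 1  # normal traffic correctly passed
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}

# Hypothetical test labels and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))
```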
3.6 Evaluation Metrics
1) Accuracy: Accuracy is the proportion of correctly classified instances among
all instances, (TP + TN) / (TP + TN + FP + FN). It is a fundamental metric for
evaluating the model's overall performance.
2) Precision: Precision is the ratio of true positives to all predicted
positives, TP / (TP + FP). It measures how reliable the positive predictions
are.
3) Sensitivity (Recall): Recall is the ratio of true positives to all actual
positives, TP / (TP + FN). It evaluates the model's ability to find every
positive case.
4) F1-Score: The F1-score is the harmonic mean of precision and recall. It
provides a balanced evaluation that considers both false negatives and false
positives.
5) Receiver Operating Characteristic (ROC) Curve: The ROC curve plots the
true positive rate against the false positive rate at various thresholds. The Area
Under the Curve (AUC) summarizes overall performance across thresholds. [21] [27]
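The metrics above can be computed from the four confusion-matrix counts, as the following sketch shows. The function names and example numbers are hypothetical. The AUC is computed here through its pairwise interpretation (the fraction of attack/normal pairs in which the attack sample receives the higher score, with ties counted as one half), which is a simple alternative to integrating the ROC curve and is practical only for small samples.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts and scores for illustration.
metrics = classification_metrics(tp=3, tn=3, fp=1, fn=1)
print(metrics)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(auc)
```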
1.2.1 Testbed
• System Specification
1) Device name DESKTOP-ACEO563
2) Processor 8th Gen Intel(R) Core(TM) i5-835U 1.30 GHz
3) Installed RAM 16.00 GB (15.73 GB usable)
4) Storage 256 GB SSD
5) Display 14.6-inch FHD (1920x1080)
6) Graphics Integrated Intel UHD Graphics
7) Connectivity Wi-Fi 6, Bluetooth 5.0, USB-C, HDMI
8) Pre-installed Software Windows 11, Microsoft Office 365
9) System type 64-bit operating system, x64-based processor
• Language and version: Python 3.13.0.
• Platforms: Jupyter Notebook and IDLE
• Packages/Libraries: numpy, sklearn, pickle, tqdm, pandas, seaborn and matplotlib.
References
[2] Raihan Putra Janisavya and Ika Dyah Agustina Machmawati, "DDoS
Detection Using Random Forest, SVM, and Decision Trees." Proceedings of
the 12th International Conference on Cyber Security and Cloud Computing
(CSCloud), pp. 45-52, 2024. DOI: 10.1109/CSCloud.2024.00012.
[3] Sahosh Zerin, Faheem Azraf, Marzana Tuba, Md Ahmed, and Syed
Tasnim, "Machine Learning Approaches for DDoS Attack Detection." Journal
of Network Security, vol. 10, no. 3, pp. 123-130, 2024. DOI:
10.1016/j.jns.2024.03.005.
[4] Yefeng Wu, "DDoS Attack Identification Using Random Forest, SVM, and
Logistic Regression." IEEE Transactions on Information Forensics and
Security, vol. 18, pp. 789-798, 2023. DOI: 10.1109/TIFS.2023.3267890.
[5] Baydaa Hashim, Hasimi Sallehudin, Nurhizam Safie, Hizam Safie, Hame
Murhg, and Shaymaa Abdelghany, "DDoS Attack Detection Using Machine
Learning Models." International Journal of Advanced Computer Science and
Applications, vol. 14, no. 1, pp. 25-32, 2023. DOI:
10.14569/IJACSA.2023.0140104.
[6] K. Kumari and M. Mrunalini, "Logistic Regression and Naive Bayes for DDoS
Detection." Proceedings of the 2022 International Conference on Machine
Learning and Cyber Security (MLCS), pp. 78-84, 2022. DOI:
10.1109/MLCS.2022.00015.
[8] Riturparna Borah, Satyajit Samah, Nitin Choudhury, and [Author's Full Name]
Mahanta, "K-NN and Random Forest for Detecting DDoS Attacks."
International Journal of Network Security & Its Applications, vol. 15, no. 2,
pp. 45-54, 2023. DOI: 10.5121/ijnsa.2023.15204.
[14] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, “A deep learning approach to
network intrusion detection,” IEEE Transactions on Emerging Topics in
Computational Intelligence, vol. 2, no. 1, pp. 41–50, 2018.
[17] Mydoom, “Mydoom lesson: Take proactive steps to prevent DDoS attacks |
Computerworld.”
http://www.computerworld.com/article/2574799/security0/mydoom-lesson–takeproactive-steps-to-prevent-ddos-attacks.html.
Date last accessed: June 2, 2017.
http://www.theguardian.com/media/2010/dec/08/operation-payback-mastercardwebsite-wikileaks.
Date last accessed: June 2, 2016.
"https://www.abusix.com/blog/5-biggest-ddos-attacks-of-the-past-decade".
https://www.tripwire.com/state-of-security/featured/5-notable-ddos-attacks-2017/.
Date last accessed: June 10, 2018.
Alsirhani, S. Sampalli, and P. Bodorik, “DDoS Detection System: Using a Set of
Classification Algorithms Controlled by Fuzzy Logic System in Apache Spark,”
IEEE Transactions on Network and Service Management, 2019.
[26] Jia, X. Huang, R. Liu, and Y. Ma, “A DDoS Attack Detection Method
Based on Hybrid Heterogeneous Multiclassifier Ensemble Learning,” J.
Electr. Comput. Eng., vol. 2017, 2017.
[27] S. M. T. Nezhad, M. Nazari, and E. A. Gharavol, “A Novel DoS and
DDoS Attacks Detection Algorithm Using ARIMA Time Series Model and
Chaotic System in Computer Networks,” IEEE Commun. Lett., vol. 20, no. 4,
pp. 700–703, 2016.
[32] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, “Long short term memory
recurrent neural network classifier for intrusion detection,” in 2016
International Conference on Platform Technology and Service (PlatCon), pp.
1–5, Feb 2016.
Review, vol. 1, no. 1, p. 8, 2018.
[38] C. Yin, Y. Zhu, J. Fei, and X. He, “A Deep Learning Approach for
Intrusion Detection Using Recurrent Neural Networks,” IEEE Access, vol. 5,
pp. 21954–21961, 2017.
[40] J. Choi, C. Choi, B. Ko, D. Choi, and P. Kim, “Detecting Web based
DDoS Attack using MapReduce operations in Cloud Computing Environment,” J.
Internet Serv. Inf. Secur., no. 8111, pp. 28–37, 2013.
[48] Sahay R, Blanc G, Zhang Z, Debar H. Aroma: an SDN based
autonomic DDoS mitigation framework. Computers & Security. 2017;70:1–18.
https://doi.org/10.1016/j.cose.2017.07.008.
[50] Wang TS, Lin HT, Cheng WT, Chen CY. DBod: Clustering and
detecting DGA-based botnets using DNS traffic analysis. Computers & Security.
2017;64:1–15.
[51] Ali ST, McCorry P, Lee PHJ, Hao F. ZombieCoin 2.0: managing
next-generation Botnets using Bitcoin. Int J Inform Security. 2017;17:411.
[54] Tom Ball. Malicious Botnets responsible for 40% of global login
attempts. 2018. https://www.cbronline.com/news/malicious-botnets-login
[57] Kirubavathi G, Anitha R. Structural analysis and detection of android
Botnets using machine learning techniques. Int J Inf Secur. 2018;17(2):153–67.
[64] Son NTK, Dong NP, Long HV, Son LH, Khastan A. Linear quadratic
regulator problem governed by granular neutrosophic fractional differential
equations. ISA Trans. 2019. https://doi.org/10.1016/j.isatra.2019.08.006.
[65] Khan MMT, Singh K, Son LH, Abdel-Basset M, Long HV, Singh SP,
“A novel and comprehensive trust estimation clustering based approach for
large scale wireless sensor networks”. 2019. IEEE. pp. 58221–58240.
[69] Tuan TA, Long HV, Son LH, Kumar R, Priyadarshini I, Son NTK.
Performance evaluation of Botnet DDoS attack detection using machine
learning. Evol Intell. 2020;13:283–94.
[71] “kddcup99.html,” [Online]. Available:
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
[80] Tang, Guiji, Xiaolong Wang, and Yuling He. "A novel method of fault
diagnosis for rolling bearing based on dual tree complex wavelet packet
transform and improved multiscale permutation entropy." Mathematical
Problems in Engineering 2016 (2016).