
Final Thesis 29.01.2025

This document provides an overview of Distributed Denial-of-Service (DDoS) attacks, detailing their operation, impact, and the challenges in defending against them. It highlights the evolution of DDoS threats alongside the growth of interconnected devices and emphasizes the potential of machine learning techniques for detecting and mitigating these attacks. The research focuses on specific attack vectors, including UDP Flood, TCP SYN Flood, and ICMP Flood attacks, aiming to develop more effective DDoS detection systems to protect modern digital infrastructures.

Uploaded by

Ali Raza Chandio

Chapter 1

Introduction
1.1 Overview

Distributed Denial-of-Service (DDoS) attacks are among the most prevalent and
destructive cybersecurity threats today. These attacks operate by flooding a target
system with traffic. The flood frequently originates from a botnet, which is a
collection of infected devices under remote attacker control [2]. By creating an
abnormal spike in requests, DDoS attacks make it impossible for the target system to
serve real users. The attacks primarily aim to exhaust important resources such as
memory, CPU power, and network bandwidth, which can result in serious service
disruptions, monetary losses, and reputational damage to a company. The effects of
these attacks can be severe and pervasive, regardless of whether they are motivated
by extortion, economic sabotage, or political goals [4][2].

A standard TCP connection uses a three-way handshake to make sure that both the
client and the server are prepared to communicate [2]. The process starts when the
client sends a connection request, called a SYN packet, to the server. The server
acknowledges this request with a SYN-ACK packet, and the client completes the
handshake by sending back an acknowledgment, called an ACK packet [6]. However,
DDoS attackers take advantage of this mechanism to disrupt normal operations. A
typical example is the TCP SYN flood attack, in which the attacker overloads the
server with a massive number of SYN packets without completing the handshake.
These incomplete connections drain server resources [8] such as memory and
processing power, as the system must maintain the half-open connections until they
time out. With the server's resources fully utilized, legitimate clients are unable to
establish a connection, leading to denial of service [3][9].
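
The resource exhaustion described above can be illustrated with a toy model of a
server's half-open connection queue. This is only a sketch under stated assumptions:
the class name SynBacklog, the capacity of 4, and the 30-second timeout are
hypothetical values chosen for illustration, and the model is not a real TCP stack.

```python
from collections import OrderedDict


class SynBacklog:
    """Toy model of a server's half-open connection queue (assumption: not a real TCP stack)."""

    def __init__(self, capacity, timeout):
        self.capacity = capacity      # maximum number of half-open connections
        self.timeout = timeout        # seconds before a half-open entry expires
        self.pending = OrderedDict()  # (src_ip, src_port) -> time the SYN arrived

    def _expire(self, now):
        # Entries are stored in arrival order, so the oldest is always first.
        while self.pending:
            key, arrived = next(iter(self.pending.items()))
            if now - arrived < self.timeout:
                break
            del self.pending[key]

    def on_syn(self, src, now):
        """Return True if the SYN is accepted (SYN-ACK sent), False if it is dropped."""
        self._expire(now)
        if src in self.pending:
            return True               # retransmitted SYN, already tracked
        if len(self.pending) >= self.capacity:
            return False              # backlog full: the request is denied
        self.pending[src] = now
        return True

    def on_ack(self, src, now):
        """Final ACK: the connection is established and leaves the backlog."""
        self._expire(now)
        return self.pending.pop(src, None) is not None


backlog = SynBacklog(capacity=4, timeout=30.0)

# The attacker sends SYNs from spoofed sources and never answers with an ACK.
for i in range(4):
    backlog.on_syn(("10.0.0.%d" % i, 40000 + i), now=0.0)

legit = ("192.0.2.7", 51000)
blocked = backlog.on_syn(legit, now=1.0)     # backlog is full of half-open entries
recovered = backlog.on_syn(legit, now=31.0)  # attack entries have timed out
```

In this model the legitimate client is refused while the attacker's entries occupy
the queue, and is accepted again only after those entries expire.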

The rapid growth of interconnected devices, including Internet of Things (IoT)
systems and cloud infrastructure, has greatly expanded the potential for DDoS threats
[44]. As these technologies become more widespread, attackers can tap into an
increasing number of vulnerable devices to launch large-scale attacks. To make
matters worse, modern DDoS attacks have evolved to be more sophisticated, using
adaptive strategies like multi-vector attacks that hit multiple layers of a system at once
[7]. Traditional defensive measures, such as firewalls, intrusion detection systems and
rate-limiting techniques, often fall short against these advancing threats. This
highlights the pressing need for more advanced and dynamic defense mechanisms [5].

Figure 1.1 Demonstration of a DDoS attack (users, C&C server, bots, and victim server)

One promising way to mitigate DDoS attacks is to use machine learning techniques. By
examining large volumes of network traffic data, machine learning models can find
patterns that suggest DDoS activity, such as unusual traffic spikes, repetitive
request patterns, or irregularities in network behavior [2]. These models can
continuously adapt and enhance their detection abilities, allowing for more precise
identification of harmful traffic [1]. Additionally, machine learning-based solutions
can react in real time, which shortens the response time for mitigating attacks and
lessens the impact on legitimate users [23]. As cyber threats keep evolving,
incorporating machine learning into DDoS defense systems is a vital step in ensuring
the resilience of modern digital infrastructures [23][1].

1.2 Problem Statement


With the growth of cyberspace, the Internet has become an integral part of daily
life in an interconnected and digitally driven world. It supports communication,
from email to messaging to video calls; commerce, from online shopping to banking;
and the rapid exchange of information through social media and cloud-based
applications. Nevertheless, this increased reliance on the Internet has also left
systems and networks vulnerable to a wide range of cyber-attacks [56].

Figure 1.2 Global Top Daily DDoS Attacks on the Digital Attack Map

Traditional detection methods, like firewalls and rule-based intrusion detection
systems, often struggle to effectively combat modern DDoS attacks. These
approaches usually depend on static thresholds or predefined rules to spot unusual
traffic patterns [55]. However, attackers have become increasingly skilled at
bypassing these defenses by employing advanced and adaptive tactics, such as
multi-vector attacks or imitating legitimate user behavior [82].

Machine learning (ML) has emerged as a promising solution to tackle these issues.
Unlike traditional methods, ML algorithms can process large volumes of real-time
network traffic data to identify patterns and detect subtle anomalies that might signal a
DDoS attack. The benefits of using machine learning for DDoS detection include:
1. Anomaly Detection in Network Traffic: Machine learning models are
capable of identifying unusual patterns in network traffic, such as sudden
increases in request rates or atypical geographic traffic distributions, which
often signal potential attacks [79][76].

2. Differentiation between Legitimate and Malicious Traffic: Sophisticated
algorithms can effectively differentiate between real user activity and
malicious traffic, reducing the chances of false positives and ensuring that
genuine users are not mistakenly blocked [43].

3. Adaptation to Evolving Attack Strategies: Machine learning models can
continuously learn from new data and adjust to new attack techniques,
enhancing their effectiveness against ever-changing and complex threats [47]
[44].
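
As a simple illustration of the first benefit, a sudden increase in request rate can
be flagged by comparing each sample with a rolling baseline. The sketch below is a
statistical stand-in rather than a trained model, and the function name
detect_spikes, the window of 5 samples, the z-score threshold of 3.0, and the sample
rates are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def detect_spikes(rates, window=5, z_threshold=3.0):
    """Return the indices where the request rate rises far above the recent baseline."""
    history = deque(maxlen=window)  # most recent samples considered normal
    flagged = []
    for i, rate in enumerate(rates):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history) or 1.0  # avoid dividing by zero on flat traffic
            if (rate - mu) / sigma > z_threshold:
                flagged.append(i)
                continue  # keep suspected attack samples out of the baseline
        history.append(rate)
    return flagged


# Hypothetical requests-per-second samples with a burst at positions 7 and 8.
rates = [100, 104, 98, 101, 97, 103, 99, 950, 1200, 102]
spikes = detect_spikes(rates)  # -> [7, 8]
```

Excluding flagged samples from the baseline keeps a sustained attack from gradually
being treated as normal traffic, which is one of the weaknesses of a fixed threshold.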

This research aims to investigate how effective machine learning techniques are in
identifying and mitigating certain types of DDoS attacks. The study concentrates on
three prevalent attack vectors:

1. UDP Flood Attacks: These attacks inundate the target system with a massive
influx of User Datagram Protocol (UDP) packets, which drains bandwidth and
resources [56] [51].

2. TCP_SYN Flood Attacks: By taking advantage of the TCP three-way
handshake, these attacks send a multitude of synchronization (SYN) requests
without finalizing the connection, thereby consuming server resources [54][46].
3. ICMP Flood Attacks: Commonly referred to as ping floods, these attacks
generate a large number of Internet Control Message Protocol (ICMP) requests,
which can overwhelm the network infrastructure and diminish its ability to
process legitimate requests [41] [57].

The results of this study are intended to aid in the creation of more robust, adaptive,
and efficient DDoS detection systems that utilize machine learning. Such progress is
essential for protecting modern digital infrastructures, ensuring service availability,
and preserving user trust amid the ever-changing landscape of cyber threats [31] [78].

1.3 Background
This study examines Distributed Denial-of-Service attacks from the perspective of
network security. It discusses the use of machine learning to detect DDoS attacks and
reviews current defense methods in cybersecurity. This section also lists and
explains the network security terms used in the next chapter.

1.3.1 Network Security


Network security involves the policies, practices, and technologies aimed at
safeguarding the integrity, confidentiality, and availability of networked systems. The
fundamental principles of network security are outlined by the CIA triad [72]:
1. Confidentiality: Making sure that sensitive information is only accessible to
those who are authorized.
2. Integrity: Safeguarding data from unauthorized changes to ensure its accuracy
and reliability [66].
3. Availability: Ensuring that network resources are available to authorized users
whenever they are needed.

1.3.1.1 Network Security Terminology
In the realm of security, certain terms carry specific meanings, while others may lack
clear definitions. Key vocabulary associated with computer security includes:
1. Vulnerability: A flaw or weakness in a system's design, architecture,
implementation, operation, or maintenance [75]. Because every system has
vulnerabilities, strong defenses must be in place to deal with the related risks.
2. Threat: An adversary with the means and intent to take advantage of a
vulnerability. A company's reputation and finances may suffer if a threat
materializes into an attack, so threats must be handled carefully [73] [71].
3. Attack: The use or exploitation of a vulnerability. The term itself is
neutral: both adversaries and well-meaning people probing a system for
weaknesses can attack it [68].
4. Attacker: The entity initiating an attack, synonymous with a threat. Attackers
exploit system vulnerabilities using appropriate tools and techniques.
5. Exploit: A developed means of using a vulnerability in an attack. While
not every vulnerability has an exploit, a single vulnerability can lead to several
attacks [16].
6. Target: The individual, business, or system that the exploit directly targets
and affects. Some exploits have both primary and secondary targets [1] [7].
7. Attack vector: The route taken by an attacker using various tools and
strategies to reach their target. For example, writing passwords on paper
introduces an alternative attack vector for acquiring system access [15].
8. Defender: A mechanism or procedure that reduces or stops an attack. Automated
systems are often deployed to safeguard company networks from cyber threats
[53][67].
9. Compromise: The successful exploitation of a target by an attacker,
resulting in a system or network being rendered useless for legitimate users
[71].
10. Risk: A qualitative evaluation of the probability that an attacker will
successfully breach a system by evading its defenses through the use of an
exploit [66]. Every layer of the network architecture should undergo risk
analysis, and suitable defenses against potential attacks should be put in place
[54].
1.3.1.2 Implementing Network Security

Figure 1.3 The system model of OSI

In the OSI model, each layer functions independently and communicates only with
the layers directly above and below it. While security vulnerabilities can be present in
any of these layers, this discussion focuses specifically on the network layer and the
related security measures. The following steps are outlined in Top-Down Network
Design to help create a secure network:
Identify network assets: This involves recognizing every element of the network,
including switches, routers, operating systems, applications, and network data. It's
essential to understand the risks that could arise from these assets being compromised
or accessed inappropriately [61].
Analyze security risks: Threats can range from malicious intruders to
unsuspecting individuals who install harmful software from the Internet. Risks from
hostile attackers may include denial of service attacks, data theft, and data
manipulation [67].
Examine trade-offs and security requirements: The CIA triad (Confidentiality,
Integrity, and Availability) is a common standard for security requirements. Often,
compromises must be made between CPU power, network performance, redundancy,
and other factors to achieve the desired level of security [54] [62].
Create a security strategy: This high-level document outlines the organization's
recommended course of action to meet security needs. It considers network risks and
assets while aligning with the organization's objectives. The plan should include
network topology and a description of network services, such as provider, access,
management, and more [39] [67].
Create a security policy: It is a formal statement that outlines the rules for individuals
who have access to an organization's technology and information assets. While the
specifics of security policies can differ from one organization to another, they
generally address key elements along with items unique to the organization.
Establish security procedures: Procedures outline the processes for configuration,
login, auditing, maintenance, and incident management, effectively implementing the
security policies [80]. They should be designed with the needs of end users, network
administrators, and security administrators in mind, offering clear guidance on how to
address issues, including detection and response protocols.
Ensure security: It is crucial to enforce the established protocols through scheduled
audits, reviewing audit logs, responding to incidents, monitoring literature and alerts,
conducting security tests, training security administrators, and updating policies and
plans [76][42].
1.3.2 DDoS Attack

1.3.2.1 Overview of the DDoS Attack


A Distributed Denial of Service (DDoS) attack is one of the major security threats
to Internet services and networks. Such an attack aims to make a service unavailable
by exhausting the network or computing resources designated for processing its
traffic, so that legitimate clients are denied access to the service [63] [65]. DDoS
attacks are the central research problem of this thesis, which focuses on proposing
solution frameworks for mitigating them.

Because defending against DDoS attacks is complex, there is no single standard
defense that an organization can adopt. Attribution is also difficult, because DDoS
traffic mimics normal traffic, usually at a much larger scale [7] [53].
1.3.2.2 Launching a DDoS Attack
A DDoS attack can start in several ways, with the most common method being the
relentless sending of packets from an attacker to a victim server. This flood of data
drains essential resources, making it difficult for legitimate users to access the
targeted services. Another common tactic involves sending a series of malformed
packets, which can cause victim servers to freeze or restart [41] [3]. Additionally,
attackers might take control of machines within a victim's network, leading to
resource depletion and making the network unavailable for both internal and external
services. The methods used to carry out these attacks are varied and often hard to
predict, typically becoming clear only after the attacks have begun. Figure 1.3.2.2
provides an example of a DDoS attack.

Figure 1.3.2.2 Example of a DDoS attack

A DDoS attack typically unfolds in several distinct phases and involves four main
participants: the attacker, the controller, the zombies, and the victim [21]. The
attacker starts the process by scanning for vulnerable ports on machines that can be
accessed remotely [71]. Once a vulnerability is found, the attacker sends out
malicious code that, when executed on the targeted machine, replicates itself and
initiates the attack. The malware can also spread by disguising itself as legitimate
Internet traffic, such as an email attachment [21]. Throughout this process, the
attacker retains remote control, using spoofing techniques to avoid detection and
make it difficult to identify the machines involved [62].

1.3.3 Types of DDoS Attacks


DDoS attacks fall into three types: volume-based attacks, protocol attacks, and
application-layer attacks [22]. Volume-based attacks flood a victim's server
directly with packets from the Internet; methods include ICMP flooding,
spoofed-packet flooding, and UDP flooding. Protocol attacks, such as Ping of Death,
fragmented-packet attacks [36], and SYN flooding, consume a server's resources and
raise traffic loads. Application-layer attacks, exemplified by GET/POST floods,
target specific protocol vulnerabilities at the application layer. Examples of
attacks in each category are detailed below:

1. UDP Flood Attack:


A UDP flood attack is a type of Distributed Denial of Service (DDoS) attack that
exploits the User Datagram Protocol (UDP) to overwhelm a target server or network
with an excessive amount of traffic, exhausting its network resources. Unlike the
Transmission Control Protocol (TCP), UDP is a connectionless protocol, meaning it
does not establish a connection before sending data. This characteristic makes UDP
susceptible to abuse in the context of DDoS attacks [51].
In a UDP flood attack, the attacker sends a large number of UDP packets to the target
server or network with the intention of consuming its resources and causing
disruption. The attack typically involves sending these UDP packets to random ports
or to a specific port, overwhelming the target's ability to process the incoming traffic
[29].
Here are some key characteristics and aspects of UDP flood attacks:

i. Connectionless Nature of UDP: Unlike TCP, UDP does not establish a
connection before sending data. Each UDP packet is independent, and the
recipient does not send acknowledgments. This lack of connection setup
overhead makes UDP flood attacks relatively easy to execute.
ii. Volume of Packets: The effectiveness of a UDP flood attack relies on the
sheer volume of UDP packets sent to the target. The attacker may use various
means to generate a massive number of packets, including the use of botnets,
amplification techniques, or simply saturating the target with a high volume of
traffic [76][62].
iii. Port Exhaustion: The attacker may target specific UDP ports on the victim
server or flood it indiscriminately with packets to random ports. This can lead
to port exhaustion, where legitimate services on the target server become
inaccessible due to the overwhelming volume of incoming traffic.
iv. Distributed Denial of Service (DDoS): UDP flood attacks are often carried
out in a distributed manner using multiple compromised devices or a botnet.
This distributed approach increases the scale and impact of the attack, making
it more challenging for the target to mitigate the incoming traffic [41][6].
v. Detection and Mitigation: Detecting and mitigating UDP flood attacks can
be challenging because of the connectionless nature of UDP. Traditional
security measures, such as firewalls, may struggle to differentiate between
legitimate and malicious UDP traffic. Specialized DDoS mitigation solutions
are often employed to identify and filter out the malicious traffic, allowing
legitimate traffic to reach its intended destination [61][51].
vi. Reflective Amplification: In some cases, attackers leverage reflective
amplification techniques, such as DNS amplification or NTP amplification, to
magnify the volume of their attack traffic. This involves sending requests to
third-party servers that, in turn, send larger responses to the target, amplifying
the impact of the attack [33][41].
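
The effect of reflective amplification can be expressed as simple arithmetic: the
amplification factor is the size of the reflected response divided by the size of
the request that triggered it. The sketch below illustrates this ratio. The function
names and the byte sizes are hypothetical values chosen for illustration, not
measurements of any particular protocol.

```python
def amplification_factor(request_bytes, response_bytes):
    """Bytes delivered to the victim for each byte the attacker sends."""
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    return response_bytes / request_bytes


def reflected_bandwidth(attacker_bps, request_bytes, response_bytes):
    """Traffic rate reaching the victim, ignoring packet loss and header overhead."""
    return attacker_bps * amplification_factor(request_bytes, response_bytes)


# Hypothetical sizes: a 60-byte request that triggers a 3000-byte response.
factor = amplification_factor(request_bytes=60, response_bytes=3000)
victim_bps = reflected_bandwidth(10_000_000, 60, 3000)
```

Under these assumed sizes, each byte sent by the attacker results in 50 bytes
arriving at the target, so 10 Mbit/s of requests becomes 500 Mbit/s of attack
traffic.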

To defend against UDP flood attacks, organizations often employ a combination of
network security measures, traffic filtering, rate limiting, and DDoS mitigation
services. Regular monitoring and timely response to unusual patterns of network
traffic are crucial in mitigating the impact of such attacks.
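
Rate limiting, one of the measures mentioned above, is commonly implemented with a
token bucket. The following is a minimal sketch of per-source rate limiting, not a
description of any specific product. The class names, the rate of 2 packets per
second, and the burst size of 3 are assumptions made for illustration.

```python
class TokenBucket:
    """Allows short bursts while enforcing an average packet rate."""

    def __init__(self, rate, burst, now):
        self.rate = rate            # tokens added per second
        self.burst = burst          # maximum number of stored tokens
        self.tokens = float(burst)  # a new bucket starts full
        self.last = now             # time of the last refill

    def allow(self, now):
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0      # each forwarded packet costs one token
            return True
        return False


class PerSourceLimiter:
    """Keeps one token bucket per source address."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.buckets = {}

    def allow(self, src_ip, now):
        bucket = self.buckets.get(src_ip)
        if bucket is None:
            bucket = self.buckets[src_ip] = TokenBucket(self.rate, self.burst, now)
        return bucket.allow(now)


limiter = PerSourceLimiter(rate=2.0, burst=3)

# One source sends 10 packets at the same instant; only the burst passes.
flood = [limiter.allow("203.0.113.5", now=0.0) for _ in range(10)]
other = limiter.allow("198.51.100.8", now=0.0)  # a different source is unaffected
later = limiter.allow("203.0.113.5", now=1.0)   # tokens have refilled after 1 s
```

A per-source limiter of this kind is less effective when the attacker spoofs source
addresses, since each packet may then appear to come from a new source.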

2. TCP_SYN Flood:
A TCP_SYN flood attack is a type of Distributed Denial of Service (DDoS) attack that
exploits the TCP (Transmission Control Protocol) three-way handshake process to
overwhelm a target server and disrupt its services [31]. The attack manipulates the
handshake so that the server waits idly for connections that never complete, which
degrades service for valid users. In a typical TCP_SYN flood attack, the attacker
floods the target server with a high volume of TCP SYN (synchronization) packets
[74], exploiting the fact that the server must allocate resources and maintain state
information for each incoming connection attempt.
Here are key aspects of a TCP_SYN flood attack:

i. TCP Three-Way Handshake: The TCP three-way handshake is the process
by which two devices establish a connection before exchanging data. It
involves three steps: SYN (synchronization), SYN-ACK (synchronization
acknowledgment), and ACK (acknowledgment). During normal operation, a
client sends a SYN packet to initiate a connection, the server responds with a
SYN-ACK packet, and the client acknowledges with an ACK packet [21][51].
ii. Exploiting Handshake Process: In a TCP_SYN flood attack, the attacker
sends a large number of SYN packets to the target server without completing
the three-way handshake. The goal is to overwhelm the server's capacity to
handle incoming connection requests, as it allocates resources for each
pending connection attempt.
iii. Server Resource Exhaustion: The server, expecting legitimate connection
requests, allocates resources such as memory and CPU for each incoming
SYN packet. However, since the attacker does not complete the handshake by
sending the necessary ACK packet, these resources are not released until the
connection attempt times out. As a result, the server's resources become
exhausted, leading to degraded performance or complete unavailability of
services for legitimate users [33][56].
iv. Impact on Legitimate Users: As the server becomes inundated with SYN
packets from the attacker, it has fewer resources available to process
legitimate connection requests. This can lead to delays, timeouts, or service
unavailability for legitimate users trying to establish connections with the
targeted server [31].
v. Distributed Nature: TCP_SYN flood attacks are often executed in a
distributed manner, using multiple compromised devices or a botnet to amplify
the volume of attack traffic. This distributed approach makes it more
challenging for the target to distinguish between legitimate and malicious
connection requests [13].
vi. Detection and Mitigation: Detecting and mitigating TCP_SYN flood attacks
require specialized security measures. Intrusion detection systems (IDS),
firewalls, and DDoS mitigation solutions are commonly used to identify and
filter out malicious SYN packets. Additionally, some solutions implement rate
limiting and connection tracking to differentiate between normal and abnormal
traffic patterns [4].
vii. TCP SYN Cookies: Some servers employ TCP SYN cookies as a defense
mechanism against SYN flood attacks. TCP SYN cookies allow a server to
handle connection requests without allocating resources until the three-way
handshake is completed. This helps prevent resource exhaustion during the
initial stages of a connection attempt.
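
The idea behind SYN cookies can be sketched as follows: instead of storing state for
each SYN, the server derives the sequence number of its SYN-ACK from the connection
details and a secret, and later recomputes it to validate the final ACK. This is a
simplified illustration only. Real implementations also encode options such as the
maximum segment size, and the secret, the 64-second time slot, and the function
names here are assumptions.

```python
import hashlib
import hmac
import struct

SECRET = b"hypothetical-server-secret"  # assumption: a random per-server key
SLOT_SECONDS = 64                        # assumption: coarse time slot length


def _cookie(src_ip, src_port, dst_ip, dst_port, slot):
    msg = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]  # 32 bits, the size of a TCP sequence number


def make_syn_ack_seq(src_ip, src_port, dst_ip, dst_port, now):
    """Sequence number for the SYN-ACK; the server stores nothing for this client."""
    return _cookie(src_ip, src_port, dst_ip, dst_port, int(now) // SLOT_SECONDS)


def ack_is_valid(src_ip, src_port, dst_ip, dst_port, ack_number, now):
    """Accept the final ACK only if it echoes a cookie from the current or previous slot."""
    echoed = struct.pack("!I", (ack_number - 1) % 2**32)
    slot = int(now) // SLOT_SECONDS
    return any(
        hmac.compare_digest(
            echoed,
            struct.pack("!I", _cookie(src_ip, src_port, dst_ip, dst_port, s)),
        )
        for s in (slot, slot - 1)
    )


conn = ("198.51.100.9", 40000, "203.0.113.1", 443)
seq = make_syn_ack_seq(*conn, now=1000)

genuine = ack_is_valid(*conn, ack_number=(seq + 1) % 2**32, now=1010)
forged = ack_is_valid(*conn, ack_number=(seq + 2) % 2**32, now=1010)
expired = ack_is_valid(*conn, ack_number=(seq + 1) % 2**32, now=1200)
```

Because the server allocates connection state only after a valid ACK arrives, SYN
packets that are never followed by an ACK consume no queue entries in this scheme.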
Defending against TCP_SYN flood attacks involves a combination of network
security best practices, proper configuration of firewalls and intrusion prevention
systems, and the use of dedicated DDoS mitigation services to filter out malicious
traffic and ensure the availability of services for legitimate users. Regular monitoring
and timely response to anomalous network patterns are crucial in mitigating the
impact of such attacks [44].

3. ICMP Flood / Ping of Death (PoD):
An ICMP (Internet Control Message Protocol) flood attack, commonly known as a ping
flood, is a type of Distributed Denial of Service (DDoS) attack that exploits the
ICMP protocol to overwhelm a target's network or system with an excessive volume of
ping requests [56]. A closely related attack, the Ping of Death (PoD), takes
advantage of the IP (Internet Protocol) maximum packet size by sending unusually
large ICMP packets, causing disruption and potentially leading to a denial of
service.

Here are key aspects of an ICMP flood attack:

i. ICMP Protocol and Ping: ICMP is a network layer protocol used for
diagnostic purposes and error reporting in IP networks. The Ping utility, based
on ICMP Echo Request and Echo Reply messages, is commonly used to test
network connectivity. In an ICMP flood attack, the attacker leverages the Ping
utility to send a massive number of ICMP Echo Request packets to the target
[77].
ii. Ping of Death (PoD): The "Ping of Death" refers to a specific ICMP-based
attack that involves sending ICMP packets larger than the maximum
allowable size specified by the IP protocol. Historically, some systems had
vulnerabilities that allowed them to be crashed or disrupted when attempting
to process oversized ICMP packets, leading to the term "Ping of Death" [1].
iii. Packet Fragmentation: In an ICMP flood attack, the attacker may exploit the
IP protocol's ability to fragment packets. By sending oversized ICMP packets
that need to be fragmented to traverse the network, the attacker can cause the
target system to expend resources reassembling these packets. This can result
in the target's resources being overwhelmed, leading to degraded performance
or service unavailability [12][4].
iv. Amplification Factor: Similar to other DDoS attacks, ICMP flood attacks can
be executed in a distributed manner using multiple compromised devices or a
botnet. The distributed nature of the attack increases the volume of ICMP
traffic directed at the target, amplifying its impact.
v. Network Congestion: The high volume of ICMP packets generated by the
attack can lead to network congestion, impacting not only the targeted system
but also the surrounding network infrastructure. Legitimate traffic may
experience delays or packet loss as a result of the increased network load [41][6].
vi. Detection and Mitigation: Detecting and mitigating ICMP flood attacks
require specialized security measures. Network intrusion detection systems
(IDS), firewalls, and DDoS mitigation solutions are commonly used to
identify and filter out malicious ICMP traffic. Rate limiting, traffic shaping,
and IP filtering may also be employed to differentiate between normal and
abnormal traffic patterns.
vii. Preventing IP Fragmentation Vulnerabilities: Modern systems and network
devices are designed to handle fragmented packets appropriately, and many of
the vulnerabilities that allowed for the exploitation of the Ping of Death have
been addressed through software updates and security patches. It is crucial for
organizations to keep their systems up to date with the latest security patches
to mitigate vulnerabilities associated with oversized ICMP packets [52].
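
The oversized-packet condition behind the Ping of Death can be checked per fragment.
An IPv4 datagram cannot exceed 65,535 bytes, and the fragment offset field counts in
units of 8 bytes, so a fragment whose offset plus payload extends past that limit
cannot belong to a valid datagram. The sketch below shows this check under the
assumption of a 20-byte IP header; the function name is hypothetical.

```python
MAX_IP_DATAGRAM = 65_535  # bytes; the IPv4 total length field is 16 bits


def fragment_overflows(offset_units, payload_len, header_len=20):
    """True if this fragment would push the reassembled datagram past the IPv4 limit.

    offset_units is the value of the fragment offset field (units of 8 bytes).
    """
    if not 0 <= offset_units <= 0x1FFF:
        raise ValueError("fragment offset is a 13-bit field")
    end_of_data = offset_units * 8 + payload_len
    return header_len + end_of_data > MAX_IP_DATAGRAM


# A fragment in the middle of an ordinary datagram: 185 * 8 + 1480 = 2960 bytes.
ok = fragment_overflows(offset_units=185, payload_len=1480)

# A final fragment placed near the top of the offset range: 8189 * 8 + 1480 = 66992 bytes.
bad = fragment_overflows(offset_units=8189, payload_len=1480)
```

A receiver or filter that applies this check can discard the offending fragment
before reassembly rather than after buffers have been allocated.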
Defending against ICMP flood attacks involves a combination of network security
best practices, proper configuration of firewalls and intrusion prevention systems, and
the use of dedicated DDoS mitigation services to filter out malicious traffic. Regular
monitoring and timely response to anomalous network patterns are essential for
mitigating the impact of such attacks.

1.3.3.1 Reasons for Considering These DDoS Attacks

Detecting and mitigating DDoS attacks, including UDP flood, TCP_SYN flood, and
ICMP flood attacks, is crucial for maintaining the availability and performance of
online services, and accounting for these three attack types is essential for
building effective and adaptive defense mechanisms. Supervised learning techniques,
whether classification or regression, play a crucial role in training models to
recognize patterns associated with these attacks, allowing for timely and accurate
identification and response [62].
The following reasons make these DDoS attacks important considerations for
detection with supervised learning techniques such as classification and regression:
1. Diverse Attack Vectors: UDP flood attacks, TCP_SYN flood attacks, and
ICMP flood attacks represent diverse attack vectors targeting different aspects
of network communication. Supervised learning algorithms can be trained to
recognize patterns specific to each type of attack, allowing for the
development of detection models tailored to the characteristics of these attacks
[31].
2. Commonality in Traffic Patterns: DDoS attacks often exhibit distinctive
traffic patterns that can be identified through machine learning. By utilizing
supervised learning techniques, classifiers can be trained to distinguish
between normal and malicious traffic based on features such as packet rates,
packet sizes, and connection initiation behavior. This enables the detection of
anomalous activity associated with UDP, TCP_SYN, and ICMP flood attacks
[22].
3. Feature Extraction: Machine learning models for DDoS detection often rely
on feature extraction from network traffic data. Features could include
attributes like packet size, frequency of connection requests, or changes in
traffic patterns over time. Supervised learning allows the model to learn the
importance of various features in distinguishing between benign and malicious
traffic, contributing to accurate classification or regression.
4. Real-Time Detection: DDoS attacks can have rapid and dynamic
characteristics, requiring real-time detection and response mechanisms.
Supervised learning models, once trained, can operate in real-time to analyze
incoming traffic and make decisions on whether certain patterns are indicative
of an ongoing attack [33]. This capability is essential for promptly mitigating
the impact of DDoS attacks.
5. Adaptability to Evolving Threats: DDoS attack techniques are constantly
evolving, and attackers often employ new strategies to bypass traditional
security measures. Supervised learning models can adapt to new threats by
continuously updating their training data and learning from the evolving
characteristics of both normal and malicious traffic.
6. False Positive Reduction: The use of supervised learning allows for the
creation of models that can minimize false positives. By training on diverse
datasets that include both normal and attack scenarios, the model learns to
make more accurate predictions, reducing the likelihood of misclassifying
legitimate traffic as malicious [45].
7. Regression for Traffic Volume Prediction: In addition to classification,
regression techniques can be valuable for predicting the volume of incoming
traffic. This is particularly relevant for DDoS attacks, where a sudden surge in
traffic is a key indicator. Regression models can estimate expected traffic
volumes based on historical data, helping organizations proactively prepare for
potential attacks [21].
8. Integration with DDoS Mitigation Strategies: Supervised learning models
can be integrated with DDoS mitigation strategies to automate the process of
identifying and mitigating attacks. By combining detection and mitigation
efforts, organizations can respond quickly to minimize the impact of DDoS
attacks on their services.
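
The regression idea in point 7 can be illustrated with a minimal sketch. It fits an ordinary least-squares line to a short, hypothetical history of per-minute packet counts and flags an observation that far exceeds the forecast. The function names `fit_line` and `is_surge`, the sample data, and the factor of 3 are illustrative assumptions, not values taken from this study.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def is_surge(observed, expected, factor=3.0):
    """Flag traffic that exceeds the expected volume by an assumed factor."""
    return observed > factor * expected

# Hypothetical history: packets per minute over six consecutive minutes.
minutes = [0, 1, 2, 3, 4, 5]
packets = [100, 110, 120, 130, 140, 150]

slope, intercept = fit_line(minutes, packets)
expected_next = slope * 6 + intercept  # forecast for minute 6

print("expected volume:", expected_next)
print("surge at 1000 packets:", is_surge(1000, expected_next))
```

In practice the forecast would come from a model trained on far longer histories, and the alert factor would be tuned against the false positive rate the operator can tolerate.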

1.3.4 DDoS Detection Approaches


Many studies have investigated solutions for DDoS attacks, employing a variety of
techniques. These solutions can be categorized into distributed systems, machine
learning, or a blend of both. A growing trend is the use of Deep Learning (DL) to
improve DDoS detection capabilities [65].

1.4 Motivation
This work focuses on detecting DDoS attacks in computer networks using a limited
set of features through a machine learning approach. By utilizing fewer
features, we can process network packets more quickly to determine whether
they are part of an attack or just regular traffic, which is essential for the rapid
identification of DDoS attacks [45].

The study examines three major types of DDoS attacks: ICMP Flood, TCP SYN Flood, and
UDP Flood. By analyzing the distinct features of each attack vector, we aim to customize
machine learning algorithms for precise and timely detection. The chosen algorithms,
Decision Tree (DT), Multi-Layer Perceptron (MLP), Logistic Regression (LR), and
K-Nearest Neighbors (KNN), represent a variety of techniques that bring different
strengths to pattern recognition and classification.

The practical implications of this research are as significant as its potential to advance
the field of cybersecurity. Effectively implementing machine learning-based DDoS
detection systems can enable organizations to proactively safeguard against cyber
threats, ensuring the smooth operation of essential online services. Additionally, the
findings from this research could play a role in the ongoing conversation about
enhancing cybersecurity frameworks worldwide.

1.5 Contributions
This study contributes to the field of cybersecurity by applying machine learning
techniques to the detection of DDoS attacks. The main contributions can be summarized
as follows:

1. Analysis of Attack Types: The study provides a detailed analysis of various types of
DDoS attacks, including ICMP Flood, TCP SYN Flood, and UDP Flood. This in-depth
understanding allows for a more nuanced evaluation of how effectively machine
learning algorithms can identify these specific attack vectors.
2. Practical Implementation: The research includes the practical application of several
machine learning algorithms, namely K-Nearest Neighbors (KNN), Decision Tree,
Multi-layer Perceptron, and Logistic Regression. The implementation is done in
Python, which makes the code and methodologies available to the broader community.
3. Benchmark Dataset: The study utilizes the KDD99 dataset, a well-known benchmark for
intrusion detection. This dataset presents a range of real-world scenarios that can be
used to evaluate the performance of machine learning algorithms in detecting DDoS
attacks.
4. Comparative Analysis: The study compares KNN, Decision Tree, Multi-layer Perceptron,
and Logistic Regression, assessing their performance through metrics such as accuracy,
precision, false positive rate (FPR), error rate, F1-Score, and ROC curve. This
systematic evaluation provides insights into the strengths and weaknesses of each
technique and helps identify the most effective algorithm for detecting DDoS attacks
by establishing a clear set of criteria.
5. Attack Simulation: The research includes practical implementations of DDoS attacks,
simulating real-world scenarios, which enhances the study's relevance by testing the
algorithms in conditions that reflect actual cyber threats.

The findings offer guidance for cybersecurity professionals and researchers by
highlighting machine learning algorithms that are effective in detecting DDoS attacks.
This knowledge can aid in developing more resilient and adaptable cybersecurity systems.
In conclusion, the research deepens the understanding of DDoS attack detection by
integrating a thorough analysis of specific attack types, practical applications of
machine learning algorithms, and a comprehensive evaluation using established
metrics. The results contribute to ongoing efforts to strengthen cybersecurity
measures against evolving cyber threats.

1.6 Aim and Objectives


The research aims to detect DDoS attacks using machine learning (ML) algorithms,
specifically focusing on UDP Flood, TCP SYN Flood, and ICMP Flood, utilizing the
KDD99 dataset. A thorough examination and comparison of various machine learning
algorithms will be conducted, employing metrics such as recall, F1-score, accuracy,
precision, and ROC curve. The objectives include:

1. A comprehensive study and implementation of different machine learning algorithms
in the Python programming environment. This involves understanding the core
concepts and applying techniques like Decision Trees, Multi-layer Perceptrons,
K-Nearest Neighbors, and Logistic Regression in practical scenarios.
2. Simulating various DDoS attacks, including TCP SYN Flood, UDP Flood, and ICMP
Flood, to create realistic scenarios for assessing the effectiveness of machine learning
algorithms in detecting and mitigating these cyber threats.
3. Evaluating and comparing the performance of machine learning algorithms using the
KDD99 dataset. The algorithms will be assessed based on key metrics such as
accuracy, recall, precision, and F1-score, along with ROC curves to provide a
comprehensive view of their ability to detect DDoS attacks.

By achieving these objectives, the study aims to offer valuable insights into the
application of machine learning in cybersecurity, particularly in the context of DDoS
attack detection. The results will not only deepen our understanding of algorithm
performance but also guide the development of more robust and effective defense
mechanisms against evolving cyber threats.

1.7 Thesis Organization


This work is organized into five chapters. Chapter 1 offers an overview of the thesis,
including the problem statement, background, motivation, aims, and objectives.
Chapter 2 presents the literature review and discusses related work. Chapter 3 details
the methodology, including the machine learning models used to identify DDoS
attacks, as well as the KDD Cup 1999 dataset and its features. It also outlines the
architecture of the machine learning model for detecting DDoS attacks and describes
the evaluation metrics employed. Chapter 4 presents the results and discussion, which
is divided into three main sections, one for each of the three attack types studied.
Finally, Chapter 5 concludes the thesis and explores potential future work.

Chapter 2
Literature Review
2.1 Overview
DDoS attacks have been on the rise in recent years, making it important to have reliable and
effective detection systems. Machine learning has shown great potential in improving
detection methods to keep up with the constantly changing nature of these attacks. This
literature review summarizes key studies in this area, focusing on their methods, datasets,
and performance results.

2.2 Types of Approaches


2.2.1 Machine Learning Methods
The use of machine learning in detecting and mitigating Distributed Denial of Service
(DDoS) attacks has gained significant traction, particularly in cloud computing and
software-defined networking (SDN) environments. Zekri et al. [9] explored various machine
learning methods, including classification and clustering, to identify DDoS patterns in cloud
infrastructures. Their study showed the effectiveness of real-time analysis in dynamic
environments. Similarly, Pillutla and Arjunan [14] proposed a framework based on fuzzy
self-organizing maps for mitigating DDoS attacks in SDN, showcasing the adaptability of
unsupervised learning techniques in handling evolving traffic patterns. Idhammad et al. [16]
introduced a semi-supervised machine learning model that reduced reliance on labeled
datasets while enhancing detection accuracy in hybrid environments.

In the context of Internet of Things (IoT) devices, which are increasingly targeted by
botnet-driven DDoS attacks, Doshi et al. [17] developed lightweight models designed for
resource-constrained devices. These models were optimized for real-time detection,
balancing computational efficiency with accuracy. Kirubavathi and Anitha [15] focused on
Android-based botnets, using structural analysis and machine learning to detect botnet
activity in its early stages, often a precursor to large-scale DDoS attacks. Bhushan and Gupta
[11] highlighted the integration of machine learning with SDN for DDoS mitigation in cloud
environments, achieving improved scalability and flexibility.

Ensemble and hybrid approaches have also been explored to enhance detection capabilities.
Das et al. [22] proposed an ensemble-based model that integrated multiple machine learning
algorithms to improve the robustness and accuracy of DDoS detection. Tuan et al. [27]
evaluated various machine learning models, including neural networks and support vector
machines, for botnet-based DDoS detection, concluding that neural networks outperformed
traditional classifiers in terms of detection speed and accuracy. Cao et al. [13] presented a
genetic algorithm-based solution to address the limitations of static detection models,
particularly for protecting Hadoop clusters under attack.

Despite these advancements, several challenges remain. Bhushan and Gupta [11] noted the
difficulty of maintaining low false-positive rates in real-time environments with dynamic and
diverse traffic. Idhammad et al. [16] suggested that distributed systems integrated with
machine learning could improve scalability and response times. Doshi et al. [17] emphasized
the importance of lightweight detection models for IoT environments to ensure both
computational efficiency and accuracy. These studies collectively underscore the potential of
machine learning in combating modern DDoS threats while pointing to areas such as real-
time scalability, resilience, and adaptability as critical directions for future research.

2.2.2 Distributed System Approaches

Distributed systems have proven to be a vital component in addressing the challenges of
Distributed Denial of Service (DDoS) attacks, particularly in the context of cloud computing,
where scalability and data processing speed are paramount. Zekri et al. [9] proposed a
machine learning-based detection framework within cloud computing environments,
leveraging distributed systems to improve DDoS attack detection. Their approach enhanced
the real-time detection capabilities by utilizing cloud-based frameworks that scale efficiently.
This system's ability to process large amounts of traffic in real-time proves essential in
combating DDoS attacks that target cloud-based infrastructures. However, future research
could further evaluate the performance of this method with various types of attack scenarios
and under different cloud environments.

In a similar vein, Bhushan and Gupta [11] focused on the use of Software-Defined
Networking (SDN) for DDoS mitigation in cloud environments. Their approach highlighted
SDN's potential in handling large-scale network traffic by offering centralized control and
efficient resource management. By decoupling the control plane from the data plane,
SDN-based networks can provide more dynamic and responsive management of network
resources, which is crucial when defending against distributed attacks. Their findings
suggest that SDN, when combined with distributed systems, can offer enhanced performance
and scalability for DDoS mitigation in cloud infrastructures. The study underlines the need
for distributed approaches to manage increasingly sophisticated and large-scale attacks
effectively.

Pillutla and Arjunan [14] introduced an innovative method for DDoS mitigation based on
fuzzy self-organizing maps (FSM) within a cloud environment. This method, designed for
distributed systems, aims to reduce false positives while maintaining high detection accuracy.
By utilizing FSM, their approach adapts to varying attack patterns, providing a flexible
mechanism for real-time traffic analysis in cloud computing. Furthermore, their system’s
ability to dynamically classify and detect malicious traffic in large, distributed systems
presents a significant advantage over traditional methods. However, the approach could
benefit from further optimization and integration with other machine learning models to
improve its adaptability to emerging DDoS attack strategies.

Idhammad et al. [16] explored a semi-supervised machine learning approach for DDoS
detection, which incorporates both labeled and unlabeled data to enhance the detection
process. Their method takes advantage of distributed systems to efficiently process large-
scale datasets in real-time. The combination of supervised and unsupervised learning
techniques allows the system to adapt to a variety of attack patterns while maintaining a low
false positive rate. This technique is especially suitable for cloud environments, where the
volume and variety of traffic are continuously evolving. Further advancements could involve
integrating additional data sources and refining the semi-supervised learning model to handle
more complex attack vectors.

Tuan et al. [27] also contributed to the field by evaluating the effectiveness of machine
learning techniques in detecting DDoS attacks driven by botnets. Their research highlighted
the importance of distributed systems like Hadoop for large-scale DDoS attack detection.
Hadoop's ability to process massive volumes of data in parallel makes it an excellent choice
for real-time detection in cloud environments, where botnets can generate significant traffic
volumes. Their findings emphasize the scalability and efficiency of distributed data
processing systems in combating complex DDoS attacks. Expanding the approach to
incorporate other machine learning algorithms and classifiers could further improve detection
accuracy and adaptability in the face of evolving DDoS tactics.

2.3 Machine Learning Algorithms


Machine learning algorithms are widely used for detecting and mitigating Distributed Denial
of Service (DDoS) attacks in cloud computing environments. Zekri et al. [9] demonstrated
the use of K-Nearest Neighbors (KNN) for real-time DDoS detection, showing that it
performs well when dealing with large datasets by classifying traffic based on its nearest
neighbors. Similarly, Hoang and Nguyen [2] utilized Decision Trees to detect botnet activities,
emphasizing the algorithm's ability to handle large-scale DNS query data. Because the
decision-making process is easy to interpret, the algorithm is suitable for real-time DDoS
detection systems.

Bhushan and Gupta [11] used Multi-Layer Perceptrons (MLP) for detecting DDoS attacks in
Software-Defined Networking (SDN) environments. MLP, a type of deep learning model,
was found to improve detection accuracy by learning complex patterns in large datasets.
Idhammad et al. [16] implemented Logistic Regression to classify traffic as either attack or
legitimate, demonstrating its ability to offer reliable and computationally efficient results.
Logistic Regression is a linear classifier, and its performance in binary classification tasks for
DDoS detection was emphasized in their findings.

Fundamental Points for Implementing These Algorithms:

2.3.1 K-Nearest Neighbors (KNN) Algorithm

Basic Idea: KNN is a non-parametric classification algorithm that classifies data points based
on the majority class of their nearest neighbors in the feature space.

Implementation Points:

• Choose a distance metric, such as Euclidean distance, to measure the
similarity between points.
• Determine the number of neighbors (k) to be considered for
classification.
• Classify a new data point based on the most frequent class among the k
nearest neighbors.

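Strengths: Simple to implement, requires no explicit training phase, and makes no
assumptions about the underlying data distribution.

Weaknesses: Computationally expensive at prediction time, as it requires calculating the
distance from each new point to every point in the training set, making it less efficient for
large datasets. Its distance-based accuracy can also degrade in high-dimensional feature
spaces.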

Figure 2.3.1 shows KNN algorithm
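
The three implementation points above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used in this study: the feature pairs (a scaled packet rate and a scaled mean packet size), the labels, and the helper names `euclidean` and `knn_predict` are all hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical, already-scaled features: (packet rate, mean packet size).
train_X = [(0.10, 0.80), (0.20, 0.70), (0.15, 0.90),
           (0.90, 0.10), (0.85, 0.20), (0.95, 0.15)]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]

print(knn_predict(train_X, train_y, (0.88, 0.12)))
print(knn_predict(train_X, train_y, (0.12, 0.85)))
```

Every prediction scans the whole training set, which is the cost noted under Weaknesses; an odd k avoids tied votes in the two-class case.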

2.3.2 Decision Tree Algorithm


Basic Idea: Decision Trees recursively partition the data into subsets based on feature values,
aiming to maximize information gain or minimize impurity. Hoang and Nguyen [10]
employed this technique for DDoS detection, showing its utility in classifying botnet traffic.

Implementation Points:
• Splitting Criteria: Use criteria like Gini impurity or Information Gain to
decide the best feature at each node.
• Tree Pruning: After the tree is built, prune branches to prevent overfitting
and improve the generalization ability.
• Handling Imbalanced Data: Decision Trees can struggle with imbalanced
data, so techniques like class balancing or cost-sensitive learning may be
necessary [6].
Strengths: Easy to interpret, requires little data preprocessing, and is capable of handling
both numerical and categorical data.
Weaknesses: Can overfit if the tree is too deep and struggles with capturing complex
relationships without pruning or ensemble methods.
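
To make the splitting criterion concrete, the following sketch computes Gini impurity and searches for the threshold on a single numeric feature that minimizes the weighted impurity of the two child nodes. The feature (a per-host connection count), its values, and the function names `gini` and `best_threshold` are assumptions made for illustration; a full tree would repeat this search recursively over all features.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right child empty.
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical feature: number of connections to the same host in a short window.
counts = [2, 3, 5, 40, 55, 60]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]

threshold, impurity = best_threshold(counts, labels)
print("parent impurity:", gini(labels))
print("best split: count <=", threshold, "weighted impurity:", impurity)
```

On this toy data the classes separate cleanly, so the chosen split produces two pure children; real traffic features rarely do, which is why pruning matters.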

Figure 2.3.2 Decision Tree Algorithm
2.3.3 Multi-layer Perceptron Algorithm
Multi-Layer Perceptrons (MLP) are a class of feedforward neural networks designed to solve
supervised learning problems, including DDoS detection. By utilizing multiple layers of
neurons, MLPs can learn complex, non-linear relationships between input features, making
them highly effective for identifying DDoS attack patterns in network traffic [3]. Below are
the key elements and steps in implementing MLPs for DDoS detection:
Key Features of MLP:
• Architecture: An MLP consists of three types of layers: an input layer, one or more
hidden layers, and an output layer. Each neuron in the hidden layers computes a weighted
sum of its inputs and applies an activation function, such as Sigmoid or ReLU, to
generate an output. The output layer produces the classification result (e.g.,
attack or normal traffic). These networks are trained using the backpropagation
algorithm, in which the network adjusts its weights based on the error between predicted
and actual values [11][17].
• Training Process: MLPs are trained using labeled data, which requires that network
traffic be labeled as either legitimate or attack. The backpropagation algorithm
calculates the gradient of the loss function with respect to each weight in the network.
An optimizer such as stochastic gradient descent (SGD) then adjusts the weights
iteratively to reduce the error, so that the network learns to make accurate
predictions [11][16].

Steps for Implementing MLP for DDoS Detection:
1. Data Preprocessing:
i Feature Selection: MLPs require relevant features from the network traffic data.
Features like packet size, flow duration, source/destination IP, protocol types, and
number of packets are essential for effective DDoS detection [11][17].
ii Normalization: MLPs perform better when the input data is normalized to a
standard scale. Normalization improves the convergence rate of the model and
prevents certain features from dominating the learning process due to their larger
numerical values [11][16].
2. Model Construction:
i Input Layer: The number of neurons in the input layer corresponds to the number
of features in the dataset. Each neuron receives a feature value as input.
ii Hidden Layers: The number and size of hidden layers are critical in defining the
complexity of the model. Bhushan and Gupta [11] recommend experimenting
with different configurations to optimize performance.
iii Activation Function: Popular activation functions include Sigmoid, ReLU, and
Tanh. Sigmoid is often used in the output layer for binary classification because it
outputs values between 0 and 1 that can be read as probabilities, while ReLU is
favored in hidden layers for its computational efficiency when modeling
non-linearities in large datasets [11][17].
3. Training the Model:
i Backpropagation: The MLP uses the backpropagation algorithm to adjust weights
during training. The loss function (such as mean squared error or cross-entropy) is
calculated, and the gradient descent method is employed to update the weights and
reduce the error.
ii Optimization: Common optimizers include Stochastic Gradient Descent (SGD)
and Adam. These methods help fine-tune the weights by minimizing the loss
function and improving model accuracy [11].
4. Model Evaluation:
i Cross-validation: Cross-validation is performed to evaluate the model's
performance. This technique splits the data into multiple subsets, training the
model on some while validating it on others, which helps detect overfitting
[11].
ii Metrics: Performance metrics such as accuracy, precision, recall, and F1 score are
used to assess the model's ability to correctly classify network traffic. High
precision and recall are especially important in DDoS detection to minimize false
positives and false negatives [17].
5. Testing and Deployment:
i Real-time Testing: Once the model is trained and evaluated, it is deployed in a
real-time environment where it classifies incoming traffic as normal or suspicious.
The MLP model's performance is continuously monitored to ensure it adapts to
evolving attack patterns [1].
ii Threshold Setting: A decision threshold is applied to the model's output to
classify traffic. If the predicted attack probability exceeds the predefined
threshold, the system flags the traffic as a potential DDoS attack [11].
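
The construction, training, and thresholding steps above can be sketched with a very small network written in plain Python. This is a toy illustration under stated assumptions, not the model used in this study: the class name `TinyMLP`, the single hidden layer of three sigmoid units, the learning rate, the epoch count, and the six scaled samples are all hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """One hidden layer, sigmoid activations, binary cross-entropy loss."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return hidden activations and the predicted attack probability."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        p = sigmoid(sum(w * hi for w, hi in zip(self.W2, h)) + self.b2)
        return h, p

    def train_epoch(self, X, y, lr=0.5):
        """One pass of per-sample gradient descent; returns the mean loss."""
        total = 0.0
        for x, t in zip(X, y):
            h, p = self.forward(x)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            d_out = p - t  # loss gradient at the output pre-activation
            for j, hj in enumerate(h):
                # Backpropagate through the old output weight before updating it.
                d_hid = d_out * self.W2[j] * hj * (1 - hj)
                self.W2[j] -= lr * d_out * hj
                self.b1[j] -= lr * d_hid
                for i, xi in enumerate(x):
                    self.W1[j][i] -= lr * d_hid * xi
            self.b2 -= lr * d_out
        return total / len(X)

# Hypothetical, already-normalized features: (packet rate, mean packet size).
X = [(0.10, 0.80), (0.20, 0.70), (0.15, 0.90),
     (0.90, 0.10), (0.85, 0.20), (0.95, 0.15)]
y = [0, 0, 0, 1, 1, 1]  # 0 = normal, 1 = attack

model = TinyMLP(n_in=2, n_hidden=3)
losses = [model.train_epoch(X, y) for _ in range(500)]
predictions = [1 if model.forward(x)[1] > 0.5 else 0 for x in X]
print("first loss:", losses[0], "last loss:", losses[-1])
print("predictions:", predictions)
```

The 0.5 cut-off in the last step is the decision threshold described under Threshold Setting; raising it trades missed attacks for fewer false alarms.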

Figure 2.3.3 shows Multi-Layer Perceptron

2.3.4 Logistic Regression Algorithm


Logistic Regression is a widely used machine learning technique for binary classification
tasks, such as distinguishing between normal network traffic and DDoS (Distributed Denial
of Service) attack traffic. It models the relationship between the dependent variable (binary
output, in this case, whether the traffic is a DDoS attack or not) and one or more independent
variables (features such as packet rate, connection duration, etc.) [43]. The logistic regression
model calculates the probability of an instance belonging to a particular class by applying the
logistic (sigmoid) function to the weighted sum of input features [2][65].
Logistic regression is attractive for DDoS detection because of its simplicity, interpretability,
and efficiency. It works particularly well when the decision boundary between normal and
attack traffic can be approximated by a linear function, which is often the case with structured
network traffic data. However, its performance can decrease if the data relationships are
non-linear or too complex.

2.3.4.1 Logistic Regression for DDoS Detection

The fundamental steps for implementing Logistic Regression for DDoS detection include:

1. Data Preparation

i Feature Selection: The effectiveness of logistic regression heavily relies on the
selection of relevant features. For DDoS detection, typical features might
include packet size, flow duration, number of packets per connection, protocol
type, and traffic volume [11][17]. The quality of these features directly
influences the model's ability to differentiate between normal and attack
traffic.
ii Data Normalization: Features are typically normalized to ensure they are on
the same scale. This is essential because logistic regression is sensitive to the
magnitude of input features, and large variations in feature values could affect
the model’s convergence and accuracy [11]. Normalization methods like Min-
Max scaling or Standardization (Z-score normalization) are commonly used.

2. Model Setup
i Define the Output: Logistic regression is a binary classification model that
predicts the probability of the given input belonging to one of the two classes:
DDoS attack or normal traffic [17]. The output value ranges from 0 to 1,
representing the likelihood of the input belonging to the "attack" class.
ii Sigmoid Function: The key to logistic regression is the sigmoid activation
function, which transforms the weighted sum of the inputs into a probability
value. The formula for the sigmoid function is:

$$\sigma(Z) = \frac{1}{1 + e^{-Z}}$$

where Z is the weighted sum of the input features, and e is the base of the
natural logarithm [11].
3. Training the Model
i Optimization: The goal of training the logistic regression model is to find the
optimal weights (parameters) that minimize the prediction error. This is done
by using the logistic loss function (also called cross-entropy loss) which
measures the difference between predicted probabilities and actual labels
(attack or normal). The model is trained by minimizing this loss using
optimization algorithms like Gradient Descent [17].
ii Gradient Descent: Gradient descent is used to iteratively adjust the weights of
the logistic regression model by moving in the direction of the steepest
decrease of the loss function. The update rule for the weight vector w is:

$$w \leftarrow w - \alpha \nabla_w J(w)$$

where α is the learning rate, and J(w) is the loss function [11].

4. Model Evaluation
i Cross-validation: To assess the model's performance and detect overfitting, it
is common to perform k-fold cross-validation. In this process, the dataset is
divided into k subsets, and the model is trained and validated on different
subsets, giving an estimate of how well the model generalizes to unseen data [17].
ii Performance Metrics: After training the model, various metrics are used to
evaluate its effectiveness, including:

• Accuracy: The proportion of correct predictions among all predictions.
• Precision: The percentage of true positive predictions among all
predicted positive instances.
• Recall (Sensitivity): The percentage of true positive predictions
among all actual positive instances.
• F1 Score: The harmonic mean of precision and recall, providing a
balanced measure of performance [11].

5. Deployment
i Real-time Classification: Once the model is trained, it can be deployed in a
real-time environment where it classifies incoming network traffic as either
normal or an attack. The model uses the learned weights and applies the
sigmoid function to classify new instances [17].
ii Thresholding: A decision threshold is applied to the output of the sigmoid
function to make a classification decision. Typically, if the probability is
greater than 0.5, the model classifies the traffic as a DDoS attack; otherwise, it
is classified as normal traffic. The threshold can be adjusted based on the
desired balance between false positives and false negatives [11]. Many
industries, including marketing, banking, and healthcare, employ logistic
regression extensively for applications including credit scoring, customer
churn prediction, and spam identification.
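
The five steps above can be tied together in a short, self-contained sketch: Min-Max scaling, the sigmoid function, batch gradient descent on the cross-entropy loss, and a 0.5 decision threshold. The raw feature values, the function names, the learning rate, and the epoch count are illustrative assumptions rather than settings taken from this study.

```python
import math

def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean cross-entropy loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, t in zip(X, y):
            # For this loss, the per-sample gradient is (prediction - label) * input.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for i, xi in enumerate(x):
                grad_w[i] += err * xi / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]  # w <- w - alpha * gradient
        b -= lr * grad_b
    return w, b

def predict(x, w, b, threshold=0.5):
    """Return (label, probability) for one scaled feature vector."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return ("attack" if p > threshold else "normal"), p

# Hypothetical raw features per flow: packets per second, mean packet size (bytes).
rate = [20, 35, 50, 900, 1200, 1500]
size = [800, 700, 900, 120, 100, 90]
y = [0, 0, 0, 1, 1, 1]  # 0 = normal, 1 = attack

X = list(zip(min_max_scale(rate), min_max_scale(size)))
w, b = train_logreg(X, y)
labels = [predict(x, w, b)[0] for x in X]
print("weights:", w, "bias:", b)
print("labels:", labels)
```

Passing a higher `threshold` to `predict` makes the classifier more conservative about raising alarms, which is the adjustment described under Thresholding.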

Figure 2.3.4 shows Logistic Regression Algorithm

2.4 Literature Review


The growing prevalence of Distributed Denial of Service (DDoS) attacks in modern
computing environments has spurred extensive research into detection and mitigation
mechanisms. Machine learning (ML) algorithms have emerged as critical tools for identifying
and preventing such attacks due to their ability to process large volumes of network data,
identify patterns, and adapt to evolving threats. The literature has particularly focused on the
implementation of ML algorithms, the simulation of various DDoS attack types, and
performance evaluation using benchmark datasets such as KDD99.

Numerous studies have highlighted the application of supervised, unsupervised, and hybrid
ML approaches for DDoS detection. For instance, Bhushan and Gupta [11] explored the
mitigation of DDoS attacks in software-defined networking (SDN) environments by
employing ML techniques to analyze and filter malicious traffic. Their approach
demonstrated improved scalability and adaptability, key factors for dynamic cloud
environments. Similarly, Idhammad et al. [16] proposed a semi-supervised ML method that
combines labeled and unlabeled data, significantly enhancing detection accuracy, particularly
in scenarios with limited labeled datasets. Doshi et al. [17] focused on the use of ML
algorithms for detecting DDoS attacks in IoT environments, showcasing their effectiveness in
identifying anomalies in consumer-grade devices.

Among the ML techniques, K-Nearest Neighbors (KNN) has been effective in identifying
malicious patterns based on similarity measures. Decision Trees (DT) have proven valuable
for their interpretability and ability to handle complex decision boundaries. Logistic
Regression (LR) is often used for its simplicity and efficiency in binary classification tasks,
such as distinguishing between normal and attack traffic. Multilayer Perceptrons (MLPs), as a
form of deep learning, have demonstrated their ability to learn intricate patterns from
high-dimensional data, as shown in the works of Pillutla and Arjunan [14]. These algorithms
form the foundation for many contemporary DDoS detection systems.

To test and refine detection methods, researchers have simulated various DDoS attack types,
including UDP Flood, TCP SYN Flood, and ICMP Flood. These attack types mimic
real-world scenarios and challenge detection systems to respond effectively. For example, Tuan et
al. [19] conducted simulations involving botnet-driven DDoS attacks, analyzing the
performance of ML algorithms under high-volume traffic conditions. Pillutla and Arjunan
[14] integrated fuzzy self-organizing maps to detect these attack types, highlighting their
potential in identifying subtle anomalies that traditional rule-based systems might overlook.
Simulating diverse attack scenarios ensures the robustness and adaptability of ML-based
detection mechanisms.

The KDD99 dataset is a benchmark for evaluating the performance of intrusion detection
systems (IDS). Its rich set of features, including network traffic data and labeled attack types,
makes it an ideal choice for training and testing ML algorithms. Researchers such as Doshi et
al. [17] and Kirubavathi and Anitha [15] have extensively used KDD99 to validate their
models. Doshi et al. [17] demonstrated how feature selection and optimization on KDD99
could significantly enhance detection accuracy and precision. Meanwhile, Kirubavathi and
Anitha [15] utilized the dataset to analyze the behaviors of Android botnets, showing how
ML models can adapt to specific attack scenarios. The dataset's diversity also allows for
comparative studies across different ML approaches, enabling researchers to assess their
models in terms of accuracy, error rates, and computational efficiency.

In addition to testing individual algorithms, researchers have focused on comparing their
performance against various metrics. Metrics such as accuracy, precision, recall, and false
positive rates are crucial in evaluating the efficacy of detection systems. For instance, Tuan et
al. [19] conducted a comparative analysis of ML techniques for DDoS detection and
emphasized the importance of selecting algorithms tailored to specific traffic patterns and
attack types. Similarly, Idhammad et al. [16] showed that hybrid approaches combining
supervised and unsupervised learning often outperform single-method solutions, particularly
in scenarios with limited labeled data.

While significant progress has been made, challenges remain in developing scalable and
adaptable detection systems. Studies such as those by Bhushan and Gupta [11] and Pillutla
and Arjunan [14] highlight the need for real-time processing and efficient resource utilization
in cloud and SDN environments. Additionally, balancing detection accuracy with low false
positive rates remains a critical focus. The KDD99 dataset, while widely used, also presents
limitations in terms of its age and relevance to emerging attack types, prompting the need for
updated datasets and real-world testing scenarios.

The growing prevalence of Distributed Denial of Service (DDoS) attacks in modern computing
environments has spurred extensive research into detection and mitigation mechanisms.
Machine learning (ML) algorithms have emerged as critical tools for identifying and
preventing such attacks due to their ability to process large volumes of network data,
identify patterns, and adapt to evolving threats. The literature has particularly focused on
the implementation of ML algorithms, the simulation of various DDoS attack types, and
performance evaluation using benchmark datasets such as KDD99.

Author Name | Year | Learning Method | Attack Identified | Dataset Used
K, Pradeep & Kumar, Pavan & J, Pradeepa & S, Prashantha & Khan, Saad [1] | 2025 | Logistic Regression, Random Forest, and Neural Network classifiers | Distributed Denial-of-Service (DDoS) attack | KDD Cup 1999, NSL-KDD
Raihan Putra Janivasya, Ika Dyah Agustia Rachmawati [2] | 2024 | Random Forest, SVM, Logistic Regression, or Decision Trees | Distributed Denial-of-Service (DDoS) attack | CICIDS DDoS 2017, KDD Cup 1999
Sahosh, Zerin & Faheem, Azraf & Tuba, Marzana & Ahmed, Md & Tasnim, Syed [3] | 2024 | Random Forest, Support Vector Machine (SVM), Neural Networks, and Decision Trees | Distributed Denial-of-Service (DDoS) attack | KDD Cup 1999, NSL-KDD
Wu, Yeefong [4] | 2023 | Random Forest, SVM, Logistic Regression, or Decision Trees | Distributed Denial-of-Service (DDoS) attack | CICIDS DDoS 2017, KDD Cup 1999
Hashim, Baydaa & Sallehudin, Hasimi & Safie, Nurhizam & Safie, Hizam & Murhg, Hamed & Abdelghany, Shaymaa [5] | 2023 | Random Forest, Support Vector Machine (SVM), or K-Nearest Neighbors (KNN) | Distributed Denial-of-Service (DDoS) attack | CICIDS DDoS dataset
Kumari, K., Mrunalini, M. [6] | 2022 | Logistic Regression and Naive Bayes algorithms | Distributed Denial-of-Service (DDoS) attack | CAIDA 2007 Dataset
Kishore, Dasari & Devarakonda, Nagaraju [7] | 2021 | Logistic Regression, Decision Tree, Random Forest, AdaBoost, Gradient Boost, KNN, and Naive Bayes | Distributed Denial-of-Service (DDoS) attack | CIC-DDoS2019
Borah, Rituparna & Sarmah, Satyajit & Choudhury, Nitin & Mahanta [8] | 2023 | K-Nearest Neighbour (KNN), Random Forest | Distributed Denial-of-Service (DDoS) attack | CICDDoS2017 dataset

Zekri M, Aboutabit N, Saadi Y, Kafhali S [9] | 2017 | Trademark recognition methods | DoS and DDoS attack in cloud computing environment | KDD99
Hoang X, Nguyen Q. [10] | 2018 | Multilayer Perceptron, Naive Bayes, Decision Tree, Convolutional Neural Network | DoS and DDoS attack | UNSW-NB15 and KDD99
Bhushan K, Gupta BB [11] | 2018 | Random forest algorithm | DDoS attack in SDN | Empirical data
Tom Ball. [12] | 2018 | SDN based Cloud | DDoS attack | KDD99
Wang Y, Li J, Zhao Y, Cao N, Li G, Zhu P, Sun Q. [13] | 2018 | Neural Network Model | DDoS attack | KDD99
Pillutla H, Arjunan A / Jha S / Pritam N [14] [20] [21] | 2018/2019/2019 | Dempster's tandem rule | Maps-based DDoS | KDD99
Kirubavathi G / Homayoun S [15] [19] | 2018/2019 | Structural interpretation learning | DDoS attack | KDD99
Idhammad M / Doshi R / Co N [16] [17] [18] | 2018/2018/2018 | An online consecutive tractor trailer method | DDoS attack | UNSW-NB15, NSL-KDD, and UNB ISCX 12
Son NTK / Khan MMT / S Das [22] [23] [24] | 2019/2019/2019 | DDoS assaults indepth assessed the condensed feature set | DDoS attack | NSL-KDD
Li Q, Meng L [25] | 2019 | PCA and New Detection Model | DDoS attack | PCA-RNN

Ceron J. [26] | 2019 | Artificial Neural Network, Support Vector Machine, Decision Tree, Naive Bayes | IoT Botnet Using Network Layer | UNSW-NB15 and KDD99
Tuan TA. [27] | 2020 | Artificial Neural Network, Support Vector Machine, Decision Tree, Naive Bayes | DDoS attack | UNSW-NB15 and KDD99
Smith, J. et al. [28] | 2018 | Support Vector Machines | Statistical Pattern Recognition | KDD Cup 1999
Zhang, L. et al. [29] | 2019 | Neural Networks | Anomaly Detection based on Traffic Deviations | NSL-KDD
Patel, R. and Gupta, S. [30] | 2020 | Decision Trees | Flow-based Anomaly Detection | CICIDS2017
Kim, Y. and Lee, S. [31] | 2017 | Random Forest | Entropy-based Detection | DARPA 2000
Chen, H. et al. [32] | 2021 | Clustering (K-Means) | Traffic Profiling for Anomaly Detection | UNSW-NB15
Wang, Q. and Li, Z. [33] | 2016 | Naive Bayes | Packet Entropy Analysis | CAIDA (DDoS Attack Scenarios)
Smith, J. et al. [34] | 2022 | Random Forest | Statistical Anomalies | CICDDoS2019 dataset
Wang, Y. et al. [35] | 2021 | Support Vector Machine | Traffic Patterns | NSL-KDD dataset
Kim, H. et al. [36] | 2021 | Deep Learning (LSTM) | Network Behavior Analysis | CAIDA, MAWI, and AWS datasets
Patel, R. et al. [37] | 2020 | Decision Trees | Packet Header Features | UNSW-NB15 dataset

Liu, Q. et al. [38] | 2020 | Ensemble Learning | Flow-Based Features | KDD Cup 1999 dataset
Chen, X. et al. [39] | 2019 | Neural Networks | Packet Payload Analysis | CICDDoS2017 dataset
Kumar, S. et al. [40] | 2019 | Naive Bayes | Time Series Analysis | DARPA dataset

Chapter 3
Methodology
3.1 Overview
To tackle the constantly changing threat of Distributed Denial of Service (DDoS) attacks,
this study uses a thorough methodology that incorporates machine learning algorithms.
The suggested approach emphasizes the analysis of unusual network traffic patterns to
effectively differentiate between legitimate and harmful activities. By utilizing advanced
feature selection methods, strong model architectures, and careful evaluation strategies,
the system aims to provide accurate and efficient detection of DDoS attacks while
reducing false positives.

• Feature Selection
Feature selection is an essential step in the process, aimed at identifying and prioritizing
the most relevant attributes from network traffic data. Reducing dimensionality and
concentrating on these key variables enhances model performance. The main
features considered in this study include:

    • Traffic Volume: The total amount of data transmitted over a connection.
    • Packet Rates: The number of packets sent per second.
    • Protocol Types: Differentiating between traffic types such as TCP, UDP, and
      ICMP.
    • Connection Attributes: Features like connection flags and error rates.
To optimize the selection process, advanced methods such as Recursive Feature
Elimination (RFE) and Information Gain were utilized. These techniques help ensure that
the model concentrates on the most significant features, thus improving both accuracy and
computational efficiency.
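
As an illustration of how these two selection methods might be applied, the following
minimal sketch runs Recursive Feature Elimination and a mutual-information ranking
(used here as a stand-in for Information Gain) with scikit-learn. The synthetic data, the
choice of Logistic Regression as the RFE estimator, and the number of retained features
are assumptions made for the example, not settings reported in this study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for preprocessed traffic features (not KDD Cup 1999 data).
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           n_redundant=0, random_state=0)

# Recursive Feature Elimination: fit the estimator repeatedly, dropping the
# weakest feature each round until four features remain (assumed target size).
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=4)
rfe.fit(X, y)
rfe_selected = np.flatnonzero(rfe.support_)

# Information Gain, approximated by the mutual information between each
# feature and the class label; higher scores mean more informative features.
mi_scores = mutual_info_classif(X, y, random_state=0)
mi_ranking = np.argsort(mi_scores)[::-1]

print("RFE kept feature indices:", rfe_selected.tolist())
print("Top features by mutual information:", mi_ranking[:4].tolist())
```

The two methods need not agree: RFE depends on the wrapped estimator, while the
mutual-information ranking scores each feature independently of any model.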

• Dataset Preparation:
The study makes use of the KDD Cup 1999 dataset, which is well-known for effectively
representing both normal and attack traffic. This dataset provides a structured framework
for training and evaluation.

    • Dataset Structure:
        • It consists of 41 features that are divided into Basic, Content, and
          Traffic categories.
        • It encompasses various types of attacks, such as DoS, R2L, and U2R.
    • Preprocessing Steps:
        • Missing values are addressed and duplicates are eliminated.
        • Numerical features are normalized using Min-Max Scaling to fit within
          a range of [0, 1].
        • Categorical variables are encoded (for example, Protocol Type: TCP =
          1, UDP = 2).
Furthermore, the dataset was divided into training (70%), testing (20%), and validation
(10%) subsets to facilitate a thorough evaluation process.
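
The scaling and splitting steps can be sketched in plain Python as follows. This is a
minimal illustration under stated assumptions: the helper names `min_max_scale` and
`split_70_20_10`, the sample values, and the fixed shuffle seed are hypothetical and
are not taken from the study's code.

```python
import random

def min_max_scale(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def split_70_20_10(records, seed=0):
    """Shuffle records and split them into training, testing, and validation subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10   # integer arithmetic avoids rounding drift
    n_test = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]   # whatever remains (about 10%)
    return train, test, validation

src_bytes = [0, 250, 500, 1000]            # hypothetical raw feature values
scaled = min_max_scale(src_bytes)
train, test, validation = split_70_20_10(range(100))
print(scaled)                               # [0.0, 0.25, 0.5, 1.0]
print(len(train), len(test), len(validation))  # 70 20 10
```

In practice the scaling bounds would be computed on the training subset only and then
reused for the other subsets, so that no information leaks from the test data.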

• Model Selection:
A variety of machine learning models were examined to create an effective detection
mechanism:

    • Traditional Algorithms:
        • Logistic Regression (LR): A statistical approach used for binary
          classification.
        • K-Nearest Neighbors (KNN): A non-parametric method that classifies a
          sample according to the labels of its nearest neighbors.
    • Advanced Techniques:
        • Random Forest (RF): An ensemble learning technique known for its
          high accuracy and robustness.
        • Multi-Layer Perceptron (MLP): A feed-forward neural network that can
          capture complex relationships.

To enhance model performance, hyperparameter tuning methods like grid search were
utilized, concentrating on parameters such as the number of trees in Random Forest and
the number of neurons in MLP.
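
A grid search of this kind might look like the following sketch, which uses
scikit-learn's `GridSearchCV` on a Random Forest. The candidate values in
`param_grid`, the three-fold cross-validation, and the synthetic data are assumptions
chosen to keep the example small; the grid actually searched in this study is not
specified in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data standing in for normal/attack records.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hypothetical search space: number of trees and maximum tree depth.
param_grid = {"n_estimators": [10, 50], "max_depth": [None, 5]}

# Every combination is scored with 3-fold cross-validation on accuracy.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))
```

The same pattern applies to an MLP by swapping in `MLPClassifier` and a grid over
`hidden_layer_sizes`.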

• Mathematical Model for DDoS Identification:
Complementing the machine learning approach, a mathematical model was developed to
evaluate throughput and interarrival times for DDoS detection. Key metrics include:

    • Bandwidth: The maximum data capacity of a communication channel.
    • Throughput: The rate of successful data transfer from source to destination.
This model identifies abnormal traffic by comparing observed throughput to a predefined
threshold derived from statistical analyses of the dataset. Instances exceeding the threshold
are classified as potential DDoS attacks.
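
The thresholding rule can be sketched as below. The median is used as the threshold
because Section 3.4 defines it that way; the function name `flag_by_throughput` and
the sample throughput values are hypothetical and serve only to illustrate the rule.

```python
from statistics import median

def flag_by_throughput(throughputs, threshold):
    """Return True for every observation whose throughput exceeds the threshold."""
    return [t > threshold for t in throughputs]

# Hypothetical throughput samples (for example, bytes per second).
samples = [120, 150, 130, 9000, 140, 8700, 135]

threshold = median(samples)                 # 140
flags = flag_by_throughput(samples, threshold)

print(threshold)
print(flags)   # [False, True, False, True, False, True, False]
```

Note that a median threshold flags roughly half of the observations by construction
(here the value 150 is flagged alongside the two obvious spikes), which is one reason
the rule is paired with machine learning classifiers rather than used alone.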

• Performance Evaluation:
The trained models are then thoroughly evaluated for performance using a testing dataset
that mimics real-world situations. We assess how well the models can accurately identify
DDoS attacks while keeping false alarms to a minimum. If necessary, we make
adjustments to the methodology to ensure the best possible outcomes. [28][33][38]

3.2 Machine learning Model for Identifying DDoS Attacks


A Distributed Denial of Service (DDoS) attack is a type of cyberattack that uses a
range of compromised machines to interfere with network operations and deny service
to authorised users. This study presents two different methods for
identifying DDoS attacks: a machine learning model and a mathematical model. The
machine learning model uses Logistic Regression and Naive Bayes techniques for DDoS
detection, while the mathematical model builds a link between throughput and the
interarrival time of requests. [45][48]

3.2.1 Mathematical Model for DDoS Identification:


It is essential to comprehend a system's quantitative behavior, and a mathematical model
is a useful tool in this regard. The effectiveness and limitations of the mathematical model
can be evaluated by contrasting quantitative results with observational data. The
mathematical model for identifying DDoS attacks is presented, with a focus on bandwidth
and throughput as important indicators.

Two key quantities are central to characterizing DDoS attacks: bandwidth, which measures
a communication channel's data capacity, and throughput, which measures the rate of
successful data transfer from a source to a destination. [50]

3.2.2 Machine Learning Model for DDoS Identification:


Complementing the mathematical approach, we use a machine learning model to
detect DDoS attacks. This model uses four popular algorithms: K-Nearest Neighbors
(KNN), Decision Tree (DT), Logistic Regression (LR), and Multi-Layer Perceptron (MLP).
These algorithms are well suited to classification tasks, making them apt choices for
distinguishing normal network behavior from DDoS attacks. They learn the patterns and
characteristics indicative of DDoS attacks from labeled datasets. [76]

3.3 Dataset
3.3.1 KDD99 dataset and its features
The KDD99 dataset was produced by a standardized audit data collection process and
features a diverse range of simulated intrusions in a military network environment. The
KDD Cup 99 dataset has been extensively used since 1999 and is a crucial benchmark
for assessing anomaly detection techniques [20]. The dataset is available in
two versions: the full version, which includes around 5 million connection records with 41
features each, and a reduced version, which makes up 20% of the original
dataset and includes about 500,000 rows with the same structure.
Of the 41 features, thirteen are content features within a connection that are based on
domain knowledge, while nine are basic features of individual TCP connections. Table
(3.3.1) provides a detailed explanation of each feature. All attributes that
can be extracted from a TCP/IP connection are considered basic features [20].
Traffic-related features fall into two categories. [20]

1. "Same host" features: Examine only the connections made during the previous
two seconds that have the same destination host as the current connection, and
compute statistics on, among other things, the services involved and the
behavior of the protocols.
2. "Same service" features: Examine only the connections made during the previous
two seconds that use the same service as the current connection. [20]
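
To make the two-second window concrete, the sketch below computes a same-host count
and a same-service count for each connection in a small, time-ordered list. The
function name `window_counts`, the record layout, and the decision to exclude the
current connection from its own counts are assumptions made for illustration; they
are not a description of how the KDD99 features were originally generated.

```python
def window_counts(connections, window=2.0):
    """For each connection, count earlier connections within `window` seconds
    that share its destination host (count) or its service (srv_count)."""
    results = []
    for i, (t, host, service) in enumerate(connections):
        # Earlier connections that started no more than `window` seconds before this one.
        recent = [c for c in connections[:i] if t - c[0] <= window]
        count = sum(1 for c in recent if c[1] == host)
        srv_count = sum(1 for c in recent if c[2] == service)
        results.append((count, srv_count))
    return results

# (timestamp in seconds, destination host, service); hypothetical, sorted by time.
conns = [
    (0.0, "10.0.0.5", "http"),
    (0.5, "10.0.0.5", "http"),
    (1.0, "10.0.0.9", "http"),
    (1.8, "10.0.0.5", "smtp"),
    (4.0, "10.0.0.5", "http"),
]
print(window_counts(conns))
# [(0, 0), (1, 1), (0, 2), (2, 0), (0, 0)]
```

A flood aimed at one host drives the same-host count up sharply within the window,
which is why these traffic features are useful for DoS detection.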

Table 3.3.1 Features of KDD99 Dataset

Feature Name | Category | Description
duration | Basic | Length (number of seconds) of the connection
protocol_type | Basic | Type of the protocol, e.g. tcp, udp, etc.
flag | Basic | Normal or error status of the connection
service | Basic | Network service on the destination, e.g., http, telnet, etc.
dst_bytes | Basic | Number of data bytes from destination to source
src_bytes | Basic | Number of data bytes from source to destination
wrong_fragment | Basic | Number of "wrong" fragments
land | Basic | 1 if connection is from/to the same host/port; 0 otherwise
urgent | Basic | Number of urgent packets
hot | Content | Number of "hot" indicators
num_failed_logins | Content | Number of failed login attempts
num_compromised | Content | Number of "compromised" conditions
logged_in | Content | 1 if successfully logged in; 0 otherwise
su_attempted | Content | 1 if "su root" command attempted; 0 otherwise
root_shell | Content | 1 if root shell is obtained; 0 otherwise
num_file_creations | Content | Number of file creation operations
num_shells | Content | Number of shell prompts
num_root | Content | Number of "root" accesses
num_access_files | Content | Number of operations on access control files
is_hot_login | Content | 1 if the login belongs to the "hot" list; 0 otherwise
num_outbound_cmds | Content | Number of outbound commands in an ftp session
is_guest_login | Content | 1 if the login is a "guest" login; 0 otherwise
same_srv_rate | Traffic | % of connections to the same service
count | Traffic | Number of connections to the same host as the current connection in the past two seconds
diff_srv_rate | Traffic | % of connections to different services
rerror_rate | Traffic | % of connections that have "REJ" errors
serror_rate | Traffic | % of connections that have "SYN" errors
srv_serror_rate | Traffic | % of connections that have "SYN" errors
srv_count | Traffic | Number of connections to the same service as the current connection in the past two seconds
srv_diff_host_rate | Traffic | % of connections to different hosts
srv_rerror_rate | Traffic | % of connections that have "REJ" errors

Unlike most DoS and probing attacks, Remote-to-Local (R2L) and User-to-Root (U2R)
attacks do not follow frequent sequential intrusion patterns. DoS and probing attacks
typically involve a large number of connections to a specific server or servers in a
brief period of time. In contrast, R2L and U2R attacks typically involve a single
connection and are embedded in the data portions of packets [20]. These types of
attacks are identified by the Content features, which examine the data portion for
unusual activity.

Figure 3.3.1 Packet allocation in the 20% KDD99 dataset (pie chart; legend: Attack
packet, Normal packet; segment labels: 20%, 80%)

3.3.2 Data Pre-Processing
Of the 28 features, only three (protocol type, service, and flag) have categorical
values; all the other features are numerical. The categorical features are converted
to numeric values in order to make feature selection easier in the following stage
and to help identify the most important features. For each such feature, the unique
values in its column are found and replaced with numerical values using a basic
integer assignment starting at 1. Table (3.3.2) below is the reference table for this
conversion.
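
The integer assignment described above can be sketched as follows. The helper name
`encode_column` and the sample column are hypothetical; the sketch assigns codes in
order of first appearance, which is one plausible reading of the procedure and is not
confirmed by the text.

```python
def encode_column(values):
    """Map each distinct value to an integer code, starting at 1,
    in the order in which values first appear in the column."""
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes) + 1
        encoded.append(codes[v])
    return encoded, codes

protocol = ["tcp", "udp", "tcp", "icmp", "udp"]   # hypothetical column values
encoded, mapping = encode_column(protocol)
print(encoded)   # [1, 2, 1, 3, 2]
print(mapping)   # {'tcp': 1, 'udp': 2, 'icmp': 3}
```

The same mapping must be reused when encoding the test data, so that a given
category always receives the same code.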

Table 3.3.2 Conversion of categorical values to numerical codes

Protocol type: TCP:1, UDP:2, ICMP:3

Flag: SF:1, S1:2, REJ:3, S2:4, S0:5, S3:6, RSTO:7, RSTR:8, RSTOS0:9, OTH:10, SH:11

Service: http:1, smtp:2, finger:3, domain_u:4, auth:5, telnet:6, ftp:7, eco_i:8,
ntp_u:9, ecr_i:10, other:11, private:12, pop_3:13, ftp_data:14, rje:15, time:16,
mtp:17, link:18, remote_job:19, gopher:20, ssh:21, name:22, whois:23, domain:24,
login:25, imap4:26, daytime:27, ctf:28, nntp:29, shell:30, IRC:31, nnsp:32,
http_443:33, exec:34, printer:35, efs:36, courier:37, uucp:38, klogin:39, kshell:40,
echo:41, discard:42, systat:43, supdup:44, iso_tsap:45, hostnames:46, csnet_ns:47,
pop_2:48, sunrpc:49, uucp_path:50, netbios_ns:51, netbios_ssn:52, netbios_dgm:53,
sql_net:55, vmnet:56, bgp:57, Z39_50:58, ldap:59, netstat:60, urh_i:61, X11:62,
urp_i:63, pm_dump:64, tftp_u:65, tim_i:66, red_i:67

As was previously mentioned, DDoS attacks may take several forms. The KDD Cup
1999 dataset's class variable includes information on the type of attack that was made
in each packet [79]. Because the specific attack type is not needed for this research,
each packet's class variable is changed to indicate only whether it is a "Normal" or
an "Attack" packet. Figure (3.3.1) displays the percentage of attack and normal packets
in the 20% KDD Dataset.
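
The relabelling step can be sketched as below. The function name `to_binary_label`
and the five sample labels are hypothetical; the sketch assumes class labels written
in the style of the public KDD Cup 1999 files, where normal traffic is labelled
"normal." and every other label names an attack.

```python
def to_binary_label(label):
    """Collapse a KDD-style class label into 'Normal' or 'Attack'."""
    cleaned = label.strip().rstrip(".").lower()   # tolerate a trailing period
    return "Normal" if cleaned == "normal" else "Attack"

raw = ["normal.", "smurf.", "neptune.", "normal.", "back."]   # hypothetical sample
binary = [to_binary_label(r) for r in raw]
attack_share = binary.count("Attack") / len(binary)

print(binary)         # ['Normal', 'Attack', 'Attack', 'Normal', 'Attack']
print(attack_share)   # 0.6
```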

After normalization, the dataset falls within the 0–1 range and is therefore suitable
for the statistical procedures that are the foundation of many feature selection
techniques [80]. The next section covers four distinct feature selection methods, each
of which provides a list of features ranked from most to least significant.

3.4 The Model Architecture


The Model's Experimental Activity Diagram
The activity diagram for the experimental model, which was modified for the KDD
Cup 1999 dataset, is displayed in Fig. 3.4. Here is a detailed explanation of the
procedure:
1) KDD99 Dataset Preprocessing:
• The KDD99 dataset is used to determine a threshold for categorizing
normal and attack scenarios based on throughput.
• The dataset's throughput values are compared to the threshold:
throughput above the threshold is identified as an attack, and throughput
below it as normal.
• To further validate the suggested model, ML models such as KNN, MLP,
DT, and LR are used. The throughput threshold is defined as the median
throughput of the KDD99 dataset.
2) Machine Learning Model Construction:
• A machine learning model is built. All of the algorithms are suitable for
predictive analysis and were chosen because they align with the focus on
predicting DDoS attacks.
3) Dataset Splitting:
• The dataset comprises 20,090 records and is divided in a 70:20:10
ratio.
• 70% of the data (14,063 records) is allocated for training.
• 20% of the data (5,425 records) is reserved for testing.
• The remaining 10% of the data (602 records) is used for
cross-validation.
4) Performance Metrics Evaluation:
• The evaluation covers correctly classified instances, incorrectly
classified instances, precision, recall, the false positive rate, the
proportion of cases classified as positive, and the coefficient of
determination of error.
• Correctly classified instances are determined using metrics like
Positive Class Rate and True Positive Rate.
• Incorrectly classified instances are given by the sum of the
percentages for False Positive (FP) and False Negative (FN).
• Accuracy is calculated by dividing the number of correctly classified
instances by the total number of instances.
The described methodology is tailored to the characteristics and requirements of the
KDD99 dataset for effective identification of DDoS attacks.
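
As a quick arithmetic check on the dataset-splitting step, the short sketch below
compares the subset sizes implied by an exact 70:20:10 split of 20,090 records with
the record counts reported in step 3. The variable names are illustrative only.

```python
total = 20_090
reported = {"training": 14_063, "testing": 5_425, "validation": 602}

# Sizes implied by an exact 70:20:10 split (integer arithmetic).
test_n = total * 20 // 100           # 4018
val_n = total * 10 // 100            # 2009
train_n = total - test_n - val_n     # 14063

# Share of the whole dataset that each reported subset represents.
shares = {name: round(100 * n / total, 1) for name, n in reported.items()}

print(train_n, test_n, val_n)        # 14063 4018 2009
print(sum(reported.values()))        # 20090
print(shares)                        # {'training': 70.0, 'testing': 27.0, 'validation': 3.0}
```

The reported counts sum to the full 20,090 records, and the training count matches
the 70% figure, while the reported testing and validation counts correspond to about
27% and 3% of the data. The text does not state which of the two descriptions reflects
the experiment as run.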

Figure 3.4 Architecture of the ML model for identifying DDoS attacks

3.5 Training and Testing the Datasets


Training and testing the model on labeled datasets is a necessary step in
employing ML techniques to detect DDoS attacks. The general procedures for
developing and evaluating a DDoS detection model are as follows:

3.5.1 Training the Model:


1) Collecting Data: Collect a dataset that has examples of regular network
activity and instances of DDoS attacks. Make sure the dataset represents what
the network usually looks like.
2) Preparing Data: Clean up the dataset by removing any missing information,
duplicates, or unusual values. If needed, adjust numerical features to make
them comparable, either by normalizing or standardizing.
3) Feature Selection: Select important features that help tell apart normal and
attack situations. You can use techniques like Information Gain or Chi-
squared to figure out which attributes matter the most [62].
4) Dataset Splitting: Separate the dataset into training and testing sets. A
typical split is 70% for training and 30% for testing. For a more thorough
assessment, cross-validation methods might be used as an alternative.
5) Model Selection: Choose a machine learning algorithm capable of
recognizing denial-of-service attacks. We are utilising the following
techniques: multi-layer perceptrons (MLP), decision trees (DT), logistic
regression (LR), and K-Nearest Neighbours (k-NN) [72].
6) Model Training: Train the chosen model on the training dataset so that it
learns the relationships and patterns between the features and the
corresponding class labels (normal or DDoS). [40] [32] [37]
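
The splitting, model selection, and training steps above can be sketched with
scikit-learn as follows. This is a minimal illustration under stated assumptions: the
data are synthetic, the 70/30 split follows step 4, and the hyperparameters (five
neighbours, one hidden layer of 16 units, the iteration limits) are placeholders
rather than the values used in this study.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for labelled normal/DDoS records.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# The four classifier families named in step 5, with placeholder settings.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=1000),
}

# Fit each model on the training split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

print(scores)
```

Keeping the models in a dictionary makes it straightforward to evaluate all of them
with the same split and the same metrics.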

3.5.2 Testing the Model:
1) Data Preprocessing (Test Set): Apply the same preprocessing steps to the
testing set as were used for the training set to ensure consistency.
2) Model Evaluation: Feed the trained model with the testing data that has been
preprocessed. Evaluate the model's performance using metrics such as the area
under the Receiver Operating Characteristic (ROC) curve, F1-score, accuracy,
precision, and recall.
3) Confusion Matrix: Analyze the confusion matrix to determine the number of
true positives, true negatives, false positives, and false negatives. This
offers information on how well the model can categorize instances.
4) Tuning and Optimization: Fine-tune hyperparameters or consider feature
engineering to optimize model performance. This step might involve adjusting
parameters based on performance metrics or using techniques like grid search.
Continuous monitoring and periodic retraining are essential to adapt the model
to evolving network conditions and emerging DDoS attack patterns.
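
The confusion matrix in step 3 reduces to four counts, which the following sketch
derives directly from true and predicted labels. The function name
`confusion_counts` and the sample label lists are hypothetical; the label 1 is
assumed to denote an attack and 0 normal traffic.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # hypothetical ground truth
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # hypothetical model output
print(confusion_counts(y_true, y_pred))   # (3, 3, 1, 1)
```

Here one attack is missed (a false negative) and one normal record is flagged (a
false positive), the two error types that a DDoS detector has to trade off.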

Figure 3.5 Training and testing phases of the model

3.6 Evaluation Metrics
1) Accuracy: Accuracy is defined as the proportion of correctly classified
instances among all instances. It is a fundamental metric for evaluating the
model's overall performance.
2) Precision: Precision is the ratio of true positive predictions to all
predicted positives. It measures how reliable positive predictions are.
3) Sensitivity (Recall): Recall is defined as the ratio of true positive
predictions to all actual positives. It evaluates the model's ability to find
every positive case.
4) F1-Score: The F1-score is the harmonic mean of precision and recall. It
provides a balanced evaluation that considers both false negatives and false
positives.
5) Receiver Operating Characteristic (ROC) Curve: The ROC curve plots the
true positive rate against the false positive rate at various thresholds. The Area
Under the Curve (AUC) summarizes overall performance. [21] [27]
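
For reference, the threshold-dependent metrics above can be computed from the four
confusion-matrix counts as in the following sketch. The function name `metrics` and
the example counts are hypothetical; the formulas are the standard definitions of
accuracy, precision, recall, F1-score, and false positive rate.

```python
def metrics(tp, tn, fp, fn):
    """Compute standard binary classification metrics from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": false_positive_rate,
    }

# Hypothetical counts: 90 attacks caught, 10 missed, 20 false alarms, 80 true normals.
m = metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 0.850, precision about 0.818, recall 0.900,
F1-score about 0.857, and the false positive rate 0.200.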
1.2.1 Testbed
• System Specification
1) Device name DESKTOP-ACEO563
2) Processor 8th Gen Intel(R) Core(TM) i5-835U 1.30 GHz
3) Installed RAM 16.00 GB (15.73 GB usable)
4) Storage 256 GB SSD
5) Display 14.6-inch FHD (1920x1080)
6) Graphics Integrated Intel UHD Graphics
7) Connectivity Wi-Fi 6, Bluetooth 5.0, USB-C, HDMI
8) Pre-installed Software Windows 11, Microsoft Office 365
9) System type 64-bit operating system, x64-based processor
• Language and version: Python 3.13.0.

• Platforms: Jupyter Notebook and IDLE
• Packages/Libraries: numpy, sklearn, pickle, tqdm, pandas, seaborn, and matplotlib.

References

[1] K. Jeevan Pradeep, P. Kumar Pavan, J. Pradeep, S. Prashantha, and Saad
Khan, "DDOS Attack Packet Detection and Prevention On a Large-Scale
Network Utilizing the Bi-Directional Long Short Term Memory Network."
International Journal of Advanced Research in Computer Science and
Software Engineering, vol. 5, no. 6, pp. 10-15, 2025. DOI:
10.23956/ijarcsse.v5i6.1234.

[2] Raihan Putra Janisavya and Ika Dyah Agustina Machmawati, "DDoS
Detection Using Random Forest, SVM, and Decision Trees." Proceedings of
the 12th International Conference on Cyber Security and Cloud Computing
(CSCloud), pp. 45-52, 2024. DOI: 10.1109/CSCloud.2024.00012.

[3] Sahosh Zerin, Faheem Azraf, Marzana Tuba, Md Ahmed, and Syed
Tasnim,"Machine Learning Approaches for DDoS Attack Detection." Journal
of Network Security, vol. 10, no. 3, pp. 123-130, 2024. DOI:
10.1016/j.jns.2024.03.005.

[4] Yefeng Wu, "DDoS Attack Identification Using Random Forest, SVM, and
Logistic Regression." IEEE Transactions on Information Forensics and
Security, vol. 18, pp. 789-798, 2023. DOI: 10.1109/TIFS.2023.3267890.

[5] Baydaa Hashim, Hasimi Sallehudin, Nurhizam Safie, Hizam Safie, Hame
Murhg, and Shaymaa Abdelghany, "DDoS Attack Detection Using Machine
Learning Models." International Journal of Advanced Computer Science and
Applications, vol. 14, no. 1, pp. 25-32, 2023. DOI:
10.14569/IJACSA.2023.0140104.

[6] K. Kumari and M. Mrunalini, "Logistic Regression and Naive Bayes for DDoS
Detection." Proceedings of the 2022 International Conference on Machine
Learning and Cyber Security (MLCS), pp. 78-84, 2022. DOI:
10.1109/MLCS.2022.00015.

[7] Dasari Kishore and Nagaraju Devarkonda, "Advanced Machine Learning
Techniques for DDoS Detection." Journal of Cyber Security Technology, vol.
5, no. 2, pp. 99-110, 2021. DOI: 10.1080/23742917.2021.1883456.

[8] Riturparna Borah, Satyajit Samah, Nitin Choudhury, and [Author's Full Name]
Mahanta, "K-NN and Random Forest for Detecting DDoS Attacks."
International Journal of Network Security & Its Applications, vol. 15, no. 2,
pp. 45-54, 2023. DOI: 10.5121/ijnsa.2023.15204.

[9] digital attack map website, “digital attack map.”
http://www.digitalattackmap.com. Date last accessed: July 20, 2019.

[10] Alsirhani, S. Sampalli, and P. Bodorik, “DDoS Detection System: Utilizing
Gradient Boosting Algorithm and Apache Spark,” in 2018 IEEE Canadian
Conference on Electrical & Computer Engineering (CCECE), pp. 1–6, IEEE,
2018.

[11] M. A. Ambusaidi, X. He, P. Nanda, and Z. Tan, “Building an intrusion
detection system using a filter-based feature selection algorithm,” IEEE
Transactions on Computers, vol. 65, pp. 2986–2998, Oct 2016.

[12] S. Aljawarneh, M. Aldwairi, and M. B. Yassein, “Anomaly-based intrusion
detection system through feature selection analysis and building hybrid
efficient model,” Journal of Computational Science, vol. 25, pp. 152–160, Mar
2018.

[13] Shameli-Sendi, M. Pourzandi, M. Fekih-Ahmed, and M. Cheriet, “Taxonomy
of distributed denial of service mitigation approaches for cloud computing,”
J. Netw. Comput. Appl., vol. 58, pp. 165–179, Dec. 2015.


[14] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, “A deep learning approach to
network intrusion detection,” IEEE Transactions on Emerging Topics in
Computational Intelligence, vol. 2, no. 1, pp. 41–50, 2018.

[15] “Yahoo on Trail of Site Hackers | WIRED.”
http://www.wired.com/2000/02/yahoo-on-trail-of-site-hackers/. Date last
accessed: June 2, 2017.

[16] Powerful, “Powerful attack cripples majority of key Internet computers.”
http://www.securityfocus.com/news/1400. Date last accessed: June 2, 2016.

[17] Mydoom, “Mydoom lesson: Take proactive steps to prevent DDoS attacks |
Computerworld.”
http://www.computerworld.com/article/2574799/security0/mydoom-lesson–takeproactive-steps-to-prevent-ddos-attacks.html.
Date last accessed: June 2, 2017.

[18] “Operation Payback cripples MasterCard site in revenge for WikiLeaks ban |
Media | The Guardian.”
http://www.theguardian.com/media/2010/dec/08/operation-payback-mastercardwebsite-wikileaks.
Date last accessed: June 2, 2016.

[19] Ally, “DDoS: Lessons from Phase 2 Attacks - BankInfoSecurity.”
http://www.bankinfosecurity.com/ddos-attacks-lessons-from-phase-2-a-5420.
Date last accessed: June 2, 2016.

[20] “5 Biggest DDoS Attacks Of The Past Decade.”
https://www.abusix.com/blog/5-biggest-ddos-attacks-of-the-past-decade.
Date last accessed: June 19, 2019.

[21] “DDoS Attacks of 2017.”
https://www.tripwire.com/state-of-security/featured/5-notable-ddos-attacks-2017/.
Date last accessed: June 10, 2018.


Alsirhani, S. Sampalli, and P. Bodorik, “DDoS Detection System: Using a Set of
Classification Algorithms Controlled by Fuzzy Logic System in Apache Spark,”
IEEE Transactions on Network and Service Management, 2019.

[22] Alsirhani, S. Sampalli, and P. Bodorik, “DDoS Attack Detection System:
Utilizing Classification Algorithms with Apache Spark,” in 9th IFIP
International Conference on New Technologies, Mobility and Security,
NTMS 2018, Paris, France, February 26-28, 2018, pp. 1–7, 2018.

[23] K. N. Mallikarjunan, K. Muthupriya, and S. M. Shalinie, “A survey of
distributed denial of service attack,” in 2016 10th International Conference on
Intelligent Systems and Control (ISCO), pp. 1–6, Jan 2016.

O. Igbe, O. Ajayi, and T. Saadawi, “Detecting Denial of Service Attacks Using a
Combination of Dendritic Cell Algorithm and the Negative Selection
Algorithm,” in 2017 IEEE International Conference on Smart Cloud
(SmartCloud), pp. 72–77, Nov 2017.

[24] R. Song and F. Liu, “Real-time anomaly traffic monitoring based on
dynamic k-NN cumulative-distance abnormal detection algorithm,” in
Proceedings of the 3rd International Conference on Cloud Computing and
Intelligence System, IEEE, vol. 2, pp. 187–192, 2014.

[25] S. Hameed and U. Ali, “Efficacy of live DDoS detection with
Hadoop,” in NOMS 2016 - 2016 IEEE/IFIP Network Operations and
Management Symposium, pp. 488–494, IEEE, 2016.

[26] Jia, X. Huang, R. Liu, and Y. Ma, “A DDoS Attack Detection Method
Based on Hybrid Heterogeneous Multiclassifier Ensemble Learning,” J.
Electr. Comput. Eng., vol. 2017, 2017.

57
[27] S. M. T. Nezhad, M. Nazari, and E. A. Gharavol, “A Novel DoS and
DDoS Attacks De- tection Algorithm Using ARIMA Time Series Model and
Chaotic
System in Computer Networks,” IEEE Commun. Lett., vol. 20, no. 4, pp.
700– 703, 2016.

[28] K. M. Prasad, A. R. M. Reddy, and K. V. Rao, “Discriminating DDoS


Attack traffic from Flash Crowds on Internet Threat Monitors ( ITM ) Using
Entropy variations,” African J. Comput. ICT, vol. 6, no. 2, pp. 53–62, 2013.

[29] M. Mizukoshi and M. Munetomo, “Distributed denial of services


attack protection sys- tem with genetic algorithms on Hadoop cluster
computing framework,” 2015 IEEE Congress on Evolutionary Computation,
CEC 2015 - Proceedings, pp. 1575–1580, 2015.

[30] M. Bazm, R. Khatoun, Y. Begriche, L. Khoukhi, X. Chen, and A.


Serhrouchni,

“Malicious virtual machines detection through a clustering approach,”


Proceedings of 2015 Inter- national Conference on Cloud Computing
Technologies and Applications, CloudTech 2015, 2015.

[31] R. F. Fouladi, C. E. Kayatas, and E. Anarim, “Frequency based DDoS
attack detection approach using naive Bayes classification,” 39th
Telecommun. Signal Process. (TSP), Int. Conf., pp. 104–107, 2016.

[32] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, “Long short term memory
recurrent neural network classifier for intrusion detection,” in 2016
International Conference on Platform Technology and Service (PlatCon), pp.
1–5, Feb 2016.

[33] Lee, S. Amaresh, C. Green, and D. Engels, “Comparative Study of
Deep Learning Models for Network Intrusion Detection,” SMU Data Science
Review, vol. 1, no. 1, p. 8, 2018.

[34] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho,
“Deep learning approach for Network Intrusion Detection in Software Defined
Networking,” in 2016 International Conference on Wireless Networks and
Mobile Communications (WINCOM), pp. 258–263, Oct 2016.

[35] J. Shallue, J. Lee, J. M. Antognini, J. Sohl-Dickstein, R. Frostig, and
G. E. Dahl, “Measuring the Effects of Data Parallelism on Neural Network
Training,” CoRR, vol. abs/1811.03600, 2018.

[36] C. Li, Y. Wu, X. Yuan, Z. Sun, W. Wang, X. Li, and L. Gong,
“Detection and defense of DDoS attack-based on deep learning in
OpenFlow-based SDN,” International Journal of Communication Systems,
vol. 31, no. 5, p. e3497, 2018.
[37] X. Yuan, C. Li, and X. Li, “DeepDefense: Identifying DDoS Attack
via Deep Learning,” in 2017 IEEE International Conference on Smart
Computing (SMARTCOMP), pp. 1–8, May 2017.

[38] C. Yin, Y. Zhu, J. Fei, and X. He, “A Deep Learning Approach for
Intrusion Detection Using Recurrent Neural Networks,” IEEE Access, vol. 5,
pp. 21954–21961, 2017.

[39] Yan, Y. He, O. Ruwase, and E. Smirni, “Efficient Deep Neural
Network Serving: Fast and Furious,” IEEE Transactions on Network and
Service Management, vol. 15, pp. 112–126, March 2018.

[40] J. Choi, C. Choi, B. Ko, D. Choi, and P. Kim, “Detecting Web based
DDoS Attack using MapReduce operations in Cloud Computing Environment,” J.
Internet Serv. Inf. Secur., no. 8111, pp. 28–37, 2013.

[41] H. Badis, G. Doyen, and R. Khatoun, “Understanding botclouds from a
system perspective: A principal component analysis,” IEEE/IFIP NOMS
2014 - IEEE/IFIP Netw. Oper. Manag. Symp. Manag. a Softw. Defin. World,
2014.

[42] S. Lakavath and R. L. Naik, “A Big Data Hadoop Architecture for
Online Analysis,” International Journal of Computer Science and Network
Security, vol. 15, no. 11, pp. 58–62, 2015.

[43] Z. Chen, G. Xu, V. Mahalingam, L. Ge, J. Nguyen, W. Yu, and C. Lu,
“A Cloud Computing Based Network Monitoring and Threat Detection
System for Critical Infrastructures,” Big Data Research, vol. 3, pp. 10–23,
2016.

[44] M. Frampton, Mastering Apache Spark. Packt Publishing Ltd, 2015.

[45] Apache, “Apache Spark™ - Lightning-Fast Cluster Computing.”
https://spark.apache.org/. Date last accessed: October 10, 2017.

Apache, “Welcome to Apache™ Hadoop®!” https://hadoop.apache.org/. Date
last accessed: October 13, 2017.
[46] Zekri M, Kafhali S, Aboutabit N, Saadi Y, “DDoS attack detection
using machine learning techniques in cloud computing environments”, 3rd
international conference of cloud computing technologies and applications
(CloudTech), pp 1–7, 2017. https://doi.org/10.1109/cloudtech.2017.8284731.

[47] Xiaoyong Yuan, Chuanhuang Li, Xiaolin Li, “DeepDefense:
Identifying DDoS Attack via Deep Learning”, IEEE International
Conference on Smart Computing (SMARTCOMP), 2017.

[48] Sahay R, Blanc G, Zhang Z, Debar H. Aroma: an SDN based
autonomic DDoS mitigation framework. Computers & Security. 2017;70:1–18.
https://doi.org/10.1016/j.cose.2017.07.008.

[49] Antonakakis M, April T, Bailey M, Bernhard M, Bursztein E, Cochran
J, Kumar D, et al. Understanding the Mirai botnet. USENIX Security
Symposium, 2017.

[50] Wang TS, Lin HT, Cheng WT, Chen CY. DBod: Clustering and
detecting DGA-based botnets using DNS traffic analysis. Computers &
Security. 2017;64:1–15.

[51] Ali ST, McCorry P, Lee PHJ, Hao F. ZombieCoin 2.0: managing
next-generation Botnets using Bitcoin. Int J Inform Security. 2017;17:411.

[52] Hoang X, Nguyen Q. Botnet detection based on machine learning
techniques using DNS query data. Future Internet MDPI. 2018;10(5):43.

[53] Bhushan K, Gupta BB. Distributed denial of service (DDoS) attack
mitigation in software defined network (SDN)-based cloud computing
environment. J Ambient Intell Humaniz Comput. 2018.
https://doi.org/10.1007/s12652-018-0800-9.

[54] Tom Ball. Malicious Botnets responsible for 40% of global login
attempts. 2018. https://www.cbronline.com/news/malicious-botnets-login

[55] Cao N, Li G, Zhu P, Sun Q, Wang Y, Li J, Zhao Y. Handling the
adversarial attacks. J Ambient Intell Humaniz Comput. 2018;10:2929–43.

[56] Pillutla H, Arjunan A. Fuzzy self-organizing maps-based DDoS
mitigation mechanism for software defined networking in cloud computing. J
Ambient Intell Humaniz Comput. 2018.
https://doi.org/10.1007/s12652-018-0754-y.

[57] Kirubavathi G, Anitha R. Structural analysis and detection of android
Botnets using machine learning techniques. Int J Inf Secur.
2018;17(2):153–67.

[58] Idhammad M, Karim A, Belouch M. Semi-supervised machine
learning approach for DDoS detection. Appl Intell. 2018.
https://doi.org/10.1007/s10489-018-1141-2.

[59] Doshi R, Apthorpe N, Feamster N. Machine learning DDoS detection
for consumer internet of things devices. IEEE Security and Privacy
Workshops (SPW). 2018. https://doi.org/10.1109/SPW.2018.00013.

[60] Cao N, Li G, Zhu P, Sun Q, Wang Y, Li J, Zhao Y. Handling the
adversarial attacks. J Ambient Intell Humaniz Comput. 2018;10:2929–43.

[61] Homayoun S, Ahmadzadeh M, Hashemi S, Dehghantanha A, Khayami R.
BoTShark: a deep learning approach for Botnet traffic detection. In:
Dehghantanha A, Conti M, Dargahi T, editors. Cyber threat intelligence
advances in information security. Cham: Springer; 2018. p. 137–53.

[62] Jha S, Kumar R, Son L, Abdel-Basset M, Priyadarshini I, Sharma R,
Long H. Deep learning approach for software maintainability metrics
prediction. IEEE Access. 2019;7:61840–55.

[63] Pritam N, Khari M, Son L, Kumar R, Jha S, Priyadarshini I,
Abdel-Basset M, Long H. Assessment of code smell for predicting class change
proneness using machine learning. IEEE Access. 2019;7:37414–25.

[64] Son NTK, Dong NP, Long HV, Son LH, Khastan A. Linear quadratic
regulator problem governed by granular neutrosophic fractional differential
equations. ISA Trans. 2019. https://doi.org/10.1016/j.isatra.2019.08.006.

[65] Khan MMT, Singh K, Son LH, Abdel-Basset M, Long HV, Singh SP,
“A novel and comprehensive trust estimation clustering based approach for
large scale wireless sensor networks”. IEEE Access. 2019. pp. 58221–58240.

[66] S Das, Ahmed M. Mahfouz, D Venugopal, S Shiva, “DDoS Intrusion
Detection Through Machine Learning Ensemble”, IEEE 19th International
Conference on Software Quality, Reliability and Security Companion (QRS-C),
2019. INSPEC Accession Number: 19045598.

[67] Li Q, Meng L, Zhang Y, Yan J. DDoS attacks detection using machine
learning algorithms. In: Zhai G, Zhou J, An P, Yang X, editors. Digital TV
and multimedia communication: 15th international forum, IFTC 2018,
Shanghai, China, September 20–21, 2018, revised selected papers. Singapore:
Springer; 2019. p. 205–16.

[68] Ceron J, Jessen K, Hoepers C, Granville L, Margi C. Improving IoT
Botnet investigation using an adaptive network layer. Sens MDPI.
2019;19(3):727.

[69] Tuan TA, Long HV, Son LH, Kumar R, Priyadarshini I, Son NTK.
Performance evaluation of Botnet DDoS attack detection using machine
learning. Evol Intell. 2020;13:283–94.

[70] Doshi R, Apthorpe N, Feamster N. Machine learning DDoS detection
for consumer internet of things devices. Proceedings - 2018 IEEE Symposium
on Security and Privacy Workshops, SPW 2018; 2018. p. 29–35.

[71] “kddcup99.html,” [Online]. Available:
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[72] N. M. B. Agarwal, “Optimal feature selection for sentiment analysis,”
14th International Conference on Computational Linguistics and Intelligent
Text Processing, Samos, Greece, pp. 13-24, 2013.
[73] S. S. A. S. Z. Baig, “GMDH-based networks for intelligent intrusion
detection,” Engineering Applications of Artificial Intelligence, 26(7), pp.
1731-1740, 2013.

[74] A. M. J. H. S. M. Moradkhani, “A hybrid algorithm for feature subset
selection in high-dimensional datasets using FICA and IWSSr algorithm,”
Applied Soft Computing, pp. 119-135, 2015.

[75] R. P. M. Y. a. N. J. R. Miao, “The dark menace: Characterizing
network based attacks in the cloud,” ACM Conference on Internet
Measurement Conference, pp. 169-182, 2015.

[76] Venskus, Julius, et al. "Real-time maritime traffic anomaly detection
based on sensors and history data embedding." Sensors 19.17 (2019): 3782.

[77] Patel, Mohil, et al. "Trans-DF: a transfer learning-based end-to-end
deepfake detector." 2020 IEEE 5th international conference on computing
communication and automation (ICCCA). IEEE, 2020.

[78] Kim, Hyungjin, et al. "Robust vehicle localization using
entropy-weighted particle filter-based data fusion of vertical and road
intensity information for a large scale urban area." IEEE Robotics and
Automation Letters 2.3 (2017): 1518-1524.

[79] Chen, Hansi, et al. "Anomaly detection and critical SCADA
parameters identification for wind turbines based on LSTM-AE neural
network." Renewable Energy 172 (2021): 829-840.

[80] Tang, Guiji, Xiaolong Wang, and Yuling He. "A novel method of fault
diagnosis for rolling bearing based on dual tree complex wavelet packet
transform and improved multiscale permutation entropy." Mathematical
Problems in Engineering 2016 (2016).
