Group 11
In 2016, the Mirai botnet DDoS attack exploited IoT devices, crippling major websites like
Twitter and Netflix by flooding DNS provider Dyn with traffic. This attack underscored the
growing threat posed by the proliferation of connected devices.
Another notable incident occurred in 2018 when GitHub faced a record-breaking 1.35 Tbps
attack, leveraging Memcached servers to amplify traffic. These historical events illustrate the
evolving tactics and increasing scale of DDoS attacks.
Each incident prompted advancements in defensive measures, from improved traffic filtering
to deploying more sophisticated intrusion detection systems. Understanding these pivotal
moments provides crucial insight into the persistent and adaptive nature of DoS threats,
emphasizing the need for continuous innovation in cybersecurity defenses.
The distribution of hosts that defines a DDoS provides the attacker multiple advantages:
● They can leverage the greater volume of machines to execute a more disruptive attack
● The location of the attack is difficult to detect due to the random distribution of
attacking systems (often worldwide and from otherwise legitimate systems)
● The true attacking party is challenging to identify, as they are disguised behind many
(mostly compromised) systems
DDoS attacks are challenging to mitigate because blocking one source does not stop the attack.
They require more sophisticated solutions, such as traffic analysis, rate limiting, and using
content delivery networks (CDNs) to distribute and absorb the traffic load.
The most common denial of service (DoS) attacks are flooding attacks, which involve
sending more traffic to a network address than the system is designed to handle. (Buffer
overflow attacks, discussed below, instead send more data to a program's buffer than it can
hold.) Flooding can manifest in various forms, including:
● ICMP flood: This attack targets misconfigured network devices by sending spoofed
packets that ping every computer on the targeted network, causing the network to
amplify the traffic; this variant is known as the Smurf attack. (The ping of death is a
different ICMP-based attack that crashes systems with malformed, oversized packets.)
● SYN flood: In this attack, a request to connect to a server is sent, but the handshake is
never completed. This continues until all open ports are saturated with requests, making
none available for legitimate users to connect to.
Malicious actors exploit buffer overflow vulnerabilities by overloading a buffer with data,
leading to system crashes and unpredictable behavior. Attackers may also inject malicious code
to gain unauthorized access and compromise sensitive information.
Flood Attacks
Attackers overwhelm a network with excessive traffic, disrupting legitimate requests. This
often involves botnets and strains the target's resources, as seen in the 2016 Dyn attack.
Mitigation strategies include rate limiting, traffic analysis, firewalls, content delivery networks,
redundancy, proactive monitoring, and anomaly detection.
Protocol Attacks
Attackers exploit weaknesses in network protocols to disrupt services, often targeting TCP/IP
layers:
● SYN flood attacks overwhelm servers and exhaust resources by sending numerous
connection requests without completing the handshake.
● DNS amplification attacks leverage vulnerable DNS servers to amplify traffic, directing
it to the target.
Volumetric Attacks
Attackers inundate networks with massive volumes of traffic, overwhelming bandwidth and
server capacity. Botnets, comprising thousands of compromised devices, generate this flood,
challenging detection and mitigation.
Common tactics include UDP floods, which exploit the connectionless nature of the protocol,
and ICMP floods, which bombard the target with echo requests. These attacks can peak at
terabits per second, crippling even robust infrastructures.
Effective defenses involve deploying robust traffic filtering, leveraging content delivery
networks (CDNs) to absorb excess traffic, and utilizing scrubbing centers to cleanse incoming
data. Constant monitoring and adaptive rate limiting can enhance resilience against these high-
volume onslaughts.
Cloud-Based Attacks
DoS attacks on cloud resources often target the hypervisor layer or take the form of crypto-jacking.
Hypervisor DoS Attacks:
How: These attacks exploit vulnerabilities in the hypervisor layer, which manages and
allocates resources to virtual machines (VMs).
Impact: If successful, the hypervisor can crash, rendering all VMs on that host
inaccessible.
Result: The entire cloud infrastructure becomes unavailable, affecting services and
users.
Hypercall Attacks:
How: Attackers send specially crafted requests to the cloud hypervisor, aiming to
extract information or execute malicious code.
Impact: If the hypervisor processes these malicious hypercalls, it can lead to resource
exhaustion or system instability.
Hyperjacking:
How: An attacker installs a rogue hypervisor beneath the original one. The rogue
hypervisor remains undetected, allowing the attacker to gain control of the target
hypervisor and its resources.
Impact: With control of the hypervisor, the attacker can manipulate the VM's behavior,
consume resources, or launch further attacks.
Crypto-jacking:
Impact: Crypto-jacking depletes available resources, such as CPU, RAM, and network
bandwidth, making a VM unresponsive.
Cybercriminals use botnets, networks of compromised devices, for large-scale DDoS attacks.
Infected devices bombard targets with overwhelming traffic without their owners knowing.
Malware infiltrates devices through phishing emails, malicious downloads, or unpatched
software. Compromised devices become part of a botnet, controlled remotely by the attacker.
Mirai, a notorious botnet, has taken down major websites with massive traffic floods.
Hackers employ a variety of sophisticated tools and scripts to launch DoS attacks. LOIC (Low
Orbit Ion Cannon) and HOIC (High Orbit Ion Cannon) are popular open-source tools that
enable users to flood targets with HTTP, TCP, or UDP requests. Script kiddies often use these
tools due to their ease of use.
Advanced attackers might deploy custom Python or Perl scripts to exploit specific
vulnerabilities. These scripts can automate the process, launching highly targeted attacks that
bypass traditional defenses. Tools like Metasploit also provide modules for DoS attacks,
allowing attackers to integrate them into broader exploitation frameworks.
Amplification Techniques
Attackers exploit amplification techniques to magnify the volume of traffic directed at a target,
overwhelming its resources. By leveraging protocols like DNS, NTP, and SSDP, they send
small requests with spoofed IP addresses, causing servers to respond with significantly larger
replies to the victim.
This method, known as reflection, can exponentially increase the attack's impact. For example,
a 1-byte request can generate a 100-byte response, creating a 100:1 amplification ratio.
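The arithmetic behind reflection and amplification is straightforward. A small sketch with hypothetical request/response sizes (the 60-byte request and 3,000-byte response below are illustrative examples, not measurements of any specific protocol):

```python
# Illustrative calculation of a reflection attack's bandwidth amplification
# factor (BAF). All sizes and rates here are hypothetical examples.

def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Ratio of the reflected response size to the attacker's request size."""
    return response_bytes / request_bytes

def reflected_volume(attacker_bps: float, baf: float) -> float:
    """Traffic volume the victim receives for a given attacker send rate."""
    return attacker_bps * baf

# A 60-byte request that elicits a 3,000-byte response gives a 50:1 ratio:
baf = amplification_factor(60, 3000)
print(baf)  # 50.0

# An attacker pushing 10 Mbit/s through such reflectors hits the victim with:
print(reflected_volume(10e6, baf))  # 500000000.0, i.e. 500 Mbit/s
```

This is why small spoofed requests to open reflectors can translate into attack volumes far beyond the attacker's own upstream bandwidth.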
Sudden spikes in traffic often signal a DoS attack, overwhelming network resources and
causing service disruptions. Unusual patterns, such as repeated requests from a single IP
address or a surge in incomplete connections, also indicate malicious activity. Degraded system
performance, including slow response times and frequent crashes, further highlights potential
threats.
Monitoring tools that analyze traffic in real time can identify these anomalies and provide
critical insights. Machine learning algorithms enhance detection by recognizing deviations
from normal behavior, enabling quicker responses. Accurate identification of these indicators
is vital for mitigating the impact of DoS attacks and maintaining system integrity.
Real-time traffic analysis helps detect DoS attacks by monitoring data packets for irregularities.
Advanced systems use machine learning to differentiate between legitimate traffic and
potential threats, with automated alerts for immediate response. Effective traffic analysis
detects ongoing attacks and provides valuable data for strengthening defenses against future
threats.
Machine learning algorithms analyze behavioral patterns to distinguish between normal and
malicious user activity. Legitimate traffic displays consistent, predictable patterns, while
malicious traffic often shows erratic spikes and unusual request types.
Deep packet inspection (DPI) scrutinizes data at a granular level to identify anomalies that
signal potential threats. Whitelisting known IP addresses and employing rate limiting further
refine traffic differentiation.
Use deep packet inspection (DPI) to analyze data packets for malicious signatures and
anomalies.
Implement web application firewalls (WAFs) to filter and monitor HTTP traffic,
blocking harmful requests before they reach the server.
Utilize intrusion detection systems (IDS) and intrusion prevention systems (IPS) to
detect and prevent suspicious activities in real time.
Employ Secure Sockets Layer (SSL) encryption to protect data integrity and
confidentiality, making it harder for attackers to intercept and manipulate traffic.
Integrate machine learning algorithms and artificial intelligence (AI) to identify and
adapt to new attack patterns, enhancing the strength of your defenses against
sophisticated threats.
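Several of the techniques above boil down to flagging deviations from a learned baseline. A toy sketch of that idea, using made-up request counts and an arbitrary 3-sigma threshold (real IDS/ML pipelines are far more sophisticated):

```python
# Minimal anomaly-detection sketch: flag traffic whose per-second request
# count deviates from an attack-free baseline by more than k std deviations.
from statistics import mean, stdev

def fit_baseline(counts):
    """Learn the mean and standard deviation from attack-free traffic."""
    return mean(counts), stdev(counts)

def is_anomalous(count, baseline, k=3.0):
    """True if the count lies more than k sigma away from the baseline mean."""
    mu, sigma = baseline
    return abs(count - mu) > k * sigma

normal = [98, 102, 97, 105, 99, 101, 103, 96, 100, 104]
baseline = fit_baseline(normal)

print(is_anomalous(101, baseline))   # False: within the normal range
print(is_anomalous(5000, baseline))  # True: a sudden flood-like spike
```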
Rate Limiting and Traffic Filtering
Set rate limits to throttle incoming requests to prevent overwhelming your servers. This
approach helps manage a user's requests within a specific timeframe, effectively mitigating
potential denial of service (DoS) attacks.
Implement traffic filtering to distinguish between legitimate and malicious traffic, using criteria
such as IP reputation and request patterns. By employing these measures, you can ensure
genuine users maintain access while blocking harmful traffic. Real-time monitoring tools can
adjust rate limits and filtering rules dynamically, providing an adaptive defense mechanism
against evolving threats.
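The rate-limiting idea above is often implemented with a token bucket: each client earns tokens at a steady rate, spends one per request, and is throttled when the bucket empties. A minimal sketch, with illustrative rate and burst parameters:

```python
# Token-bucket rate limiter sketch. The rate/capacity values are examples.
class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Refill tokens for elapsed time, then spend one if available."""
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # request throttled

bucket = TokenBucket(rate=5, capacity=10)  # 5 req/s, bursts of up to 10
# 20 requests arriving in the same instant: only the burst capacity passes.
results = [bucket.allow(now=0.0) for _ in range(20)]
print(results.count(True))  # 10
# One second later, 5 tokens have been refilled, so 5 more pass:
print(sum(bucket.allow(now=1.0) for _ in range(20)))  # 5
```

Per-client buckets (keyed by IP) give exactly the "requests within a specific timeframe" behavior described above.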
Deploy anycast networks to distribute traffic across multiple servers, reducing the risk of a
single point of failure. By routing requests to the nearest or least congested server, anycast
enhances load balancing and minimizes latency. This strategy improves user experience and
mitigates the impact of DoS attacks by dispersing malicious traffic.
Incident Response and Recovery Plans
Organizations must establish robust incident response and recovery plans to counteract and
recover from DoS attacks swiftly. Rapid identification of attack vectors and immediate
isolation of affected systems are crucial.
Employ automated real-time monitoring and alerting tools to ensure swift detection and
response. Develop a comprehensive recovery strategy that includes data backups, system
redundancies, and predefined communication protocols. Regularly update and test these plans
to adapt to evolving threats.
3. Financial Services: Preventing downtime for online banking systems and transaction
gateways.
Use Cases
1. Smart Homes:
o Protecting devices like smart speakers, thermostats, and security cameras from
malicious traffic.
2. Industrial IoT (IIoT):
o Safeguarding industrial controllers and sensors from attacks that can disrupt
manufacturing processes.
3. Smart Cities:
4. Healthcare IoT:
DoS (Denial of Service) and DDoS (Distributed Denial of Service) attacks are cyberattacks aimed
at disrupting the normal functioning of a system, server, or network by overwhelming it with
excessive traffic or requests. While a DoS attack originates from a single source, a DDoS attack
leverages multiple compromised devices, often part of a botnet, to launch a coordinated assault.
These attacks exploit vulnerabilities in exposed ports and services to crash servers, degrade
performance, or make services unavailable to legitimate users. Common techniques include
TCP SYN floods, UDP floods, Ping of Death, and HTTP request floods, targeting websites,
APIs, DNS servers, or other internet-exposed services. DDoS attacks are particularly
challenging to mitigate due to their
distributed nature, making it hard to distinguish between malicious and legitimate traffic.
Motivations range from financial gain and hacktivism to cyberwarfare and personal vendettas.
Effective defense strategies include using firewalls, traffic filtering, rate limiting, CDNs, and
specialized DDoS protection services, alongside proactive measures like traffic monitoring and
incident response plans. In this report we will also discuss how the Snort intrusion
detection engine works.
First of all, what is a DoS attack? As written above, a DoS (denial of service) attack is one in
which a single attacker, using either their own IP address or a spoofed one, sends a very large
number of packets in a short time, on the order of 100,000 packets per second. Attached below is
a picture of what happens normally.
In a DoS attack that uses TCP SYN flooding, the attacker sends SYN packets from a spoofed IP
address, which may not even exist. When the target server replies with a SYN/ACK, the spoofed
address never completes the handshake, so each connection is left in a half-open state. The
attacker keeps sending packets continuously, so the server's connection table stays full of
half-open connections. When a legitimate user then requests a service, what is displayed is
"nothing": the page keeps buffering, and it continues to do so even in a new tab.
DDoS attacks are very similar, except that instead of one computer, many compromised
machines send packets at the same time. This makes the traffic very hard for the server to
handle, and it may even crash. In some cases the attackers also steal data or follow up with
phishing attacks. With that covered, let's move on to the attacking part.
There are various types of DoS/DDoS attacks. We will be performing a TCP SYN flood, a
UDP flood, and an ICMP flood. In TCP SYN flooding, the three-way handshake never
completes. In UDP flooding, the attacker sends a large volume of UDP packets to random ports
on the target system, forcing the system to process and respond to each packet even if no
application is listening on that port, consuming system resources. Normally, ICMP
echo-request and echo-reply messages are used to ping a network device in order to diagnose
its health and the connectivity between the sender and the device. By flooding the target with
request packets, the network is forced to respond with an equal number of reply packets.
The most important things we need are an attacking device and a target device; in our case the
attacking is done from a Kali Linux VM and the target device is Metasploitable 2. We also
need to verify whether a TCP flood or any other flood actually happens, which we measure
using Wireshark.
-----------------------------------------------------------------------------------------------------------------
Metasploitable 2
Why do we need Metasploitable 2? Metasploitable 2 is an intentionally vulnerable Linux
virtual machine designed for security training, exploit testing, and general target practice,
which makes it a safe, legal environment for trying different types of attacks, as done for this
report. Unlike other vulnerable virtual machines, Metasploitable 2 focuses on vulnerabilities at
the operating system and network services layer instead of custom vulnerable applications.
Attached below is the homepage of the virtual machine, from which we find that the IP address
for the vulnerable website testing is 192.168.56.101. We now have our target device.
Kali Linux
This distribution has several hundred tools, configurations, and scripts with industry-specific
modifications that allow users to focus on tasks such as computer forensics, reverse engineering,
and vulnerability detection, instead of dealing with unrelated activities.
This distribution is specifically tailored to the needs of experienced penetration testers, so therefore
all documentation on this site assumes prior knowledge of, and familiarity with, the Linux
operating system in general.
-----------------------------------------------------------------------------------------------------------------
DVWA
DVWA, also known as the Damn Vulnerable Web App, is the site we are going to attack.
Damn Vulnerable Web Application (DVWA) is a PHP/MySQL web application that is damn
vulnerable. Its main goal is to be an aid for security professionals to test their skills and tools in
a legal environment, helping web developers better understand the processes of securing web
applications. DVWA is one of the hyperlinks on the homepage of our target IP address,
192.168.56.101. Attached below are the homepage of the IP address, DVWA, and the starting
page of DVWA.
Wireshark is the world's foremost network protocol analyzer. It lets you see what's happening on
your network at a microscopic level. It is the de facto (and often de jure) standard across many
industries and educational institutions.
We used Wireshark to count the packets and analyse whether the DoS attacks are
occurring or not.
-----------------------------------------------------------------------------------------------------------------
Procedure
● While entering each of the commands, turn on Wireshark and check whether the
attacks are taking place or not.
● Export the data and analyze it in Wireshark.
During TCP flooding, 54,071 TCP packets were transmitted from random sources without any
three-way handshake, and the payload sent was about 50 bytes. Attached below are the
structure of the TCP segment, the captured packet data, and an explanation of each field. One
thing to note is that all the packets are going to 192.168.56.101; hence it is TCP flooding.
During UDP flooding, 69,543 UDP packets were transmitted from random sources, with a
payload of about 50 bytes. Attached below are the structure of the UDP datagram, the captured
packet data, and an explanation of each field.
During ICMP flooding, 48,314 ICMP packets were transmitted from random sources, with a
payload of about 50 bytes. Attached below are the structure of the ICMP packet, the captured
packet data, and an explanation of each field.
Snort is an open-source Intrusion Detection and Prevention System (IDPS) developed by Cisco. It
monitors network traffic in real-time to detect and respond to malicious activities or security policy
violations. Snort is widely used for network security due to its flexibility and extensive rule-based
detection capabilities.
3. Intrusion Prevention:
Snort can also run inline as an intrusion prevention system, actively dropping or blocking
traffic that matches its rules.
Snort primarily operates as a signature-based and rule-based detection engine, meaning it relies on
predefined patterns (signatures) or rules to identify threats.
● Signature-Based Detection:
→Effective for detecting well-documented threats but may struggle with new, unknown
attacks.
● Rule-Based Detection:
→Uses predefined or custom rules to identify suspicious traffic (e.g., port scans, buffer
overflows).
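For illustration, a minimal Snort rule in the style described here might flag a possible SYN flood by counting inbound SYNs per destination. The SID, port, and thresholds below are arbitrary examples, not recommended values:

```text
alert tcp any any -> $HOME_NET 80 (msg:"Possible TCP SYN flood"; \
    flags:S; detection_filter:track by_dst, count 500, seconds 3; \
    sid:1000001; rev:1;)
```

The `flags:S` option matches bare SYN packets, and `detection_filter` suppresses the alert until more than 500 such packets hit one destination within 3 seconds, which is what distinguishes a flood from ordinary connection setup.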
Edge Model: Lightweight Real-Time Detection
Primary Objective:
The edge device (e.g., gateway or router) runs a lightweight CNN-LSTM model to
monitor network traffic and detect DoS attacks in real time. It’s optimized for fast
detection with minimal computational load, enabling timely responses.
Key Features:
1. CNN for Feature Extraction: The CNN processes key network traffic features (e.g.,
src_bytes, dst_bytes, serror_rate). Kernels focus on features indicative of DoS
behaviour.
2. LSTM for Time-Series Analysis: The LSTM analyzes traffic patterns over time,
identifying anomalies like sudden SYN errors or traffic surges that signal potential
attacks.
3. Overfitting for Sensitivity: The edge model is overfitted to known DoS patterns,
prioritizing fast detection over precision, which results in more false positives but
fewer missed attacks.
Response Workflow:
1. No Attack Detected: If no attack is found, the edge device either takes no action or
sends a "no attack" signal for confirmation.
2. Attack Detected: Detected attacks trigger encryption of traffic data, which is then
sent to the cloud for deeper analysis and verification.
Cloud Model: Heavy Cross-Verification for Final Decision
Primary Objective:
The cloud model cross-verifies flagged traffic from the edge device using
computationally intensive algorithms to accurately confirm DoS attacks before alerting
the user.
Key Features:
1. Decryption of Data: The cloud model securely decrypts the data sent from the edge
device for further analysis.
2. Cross-Validation: It uses advanced machine learning and anomaly detection
algorithms, leveraging historical data to ensure high accuracy.
3. Attack Confirmation: The model filters out false positives and determines if an
actual attack is occurring.
Response Workflow:
1. Verified Attack: If confirmed, an accurate alert is sent to the user for defensive
action.
2. False Alarm: If no attack is found, the cloud logs it as a false alarm and prevents
unnecessary user notifications.
● Optimized Communication: The edge device only transmits data when suspicious
activity is detected, reducing communication overhead and conserving bandwidth.
● Attack Alerts: Users receive alerts only when the cloud model confirms a DoS attack,
preventing false positives from overwhelming them.
● System Health Monitoring: The edge device may periodically send health reports to
reassure users that the system is functioning normally when no attacks are detected.
In this setup, the edge model is intentionally overfitted to known DoS attack patterns, leading
to:
● Reducing False Negatives: The model's high sensitivity lowers the chance of missed
attacks.
● Increasing False Positives: While this raises the likelihood of normal traffic being
flagged, the cloud model's secondary verification minimizes the impact of these false
positives.
SOLUTION PIPELINE:
(Flowchart: the edge model's NO/YES detection decision feeds the cloud model's NO/YES
verification decision, following the workflow described above.)
Introduction
This report provides a comparison between the proposed Edge-to-Cloud DoS detection model
and current industry-standard methods. The proposed system employs a lightweight CNN-
LSTM model on edge devices for fast, local detection of DoS attacks and an ANN-based
verification model in the cloud. This hybrid approach aims to achieve a balance between
responsiveness, resource efficiency, and detection accuracy.
6. Pulse Traffic Analysis is a technique used to detect patterns in network traffic that
involve periodic bursts, often linked to malicious activities like DoS attacks (e.g.,
DNSbomb). It involves analyzing the timing, volume, and frequency of traffic to
identify irregular pulses that deviate from normal baselines. Tools like statistical
models, temporal metrics, and machine learning algorithms enhance detection
accuracy. This approach is crucial in distinguishing between legitimate high-traffic
events and attack-related traffic anomalies, enabling proactive mitigation.
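A toy sketch of the pulse-detection idea under simplifying assumptions (a fixed burst threshold, one-second buckets, made-up packet counts): find the seconds where traffic bursts above the baseline, then check whether the gaps between bursts are nearly constant.

```python
# Pulse traffic analysis sketch: periodic bursts in a per-second packet-count
# series suggest pulsing attacks (e.g., DNSbomb-style). Thresholds are
# illustrative, not taken from any production tool.
from statistics import pstdev

def burst_times(counts, threshold):
    """Indices (seconds) where traffic exceeds the burst threshold."""
    return [t for t, c in enumerate(counts) if c > threshold]

def is_periodic(times, jitter=0.5):
    """True if inter-burst gaps are nearly constant (low spread)."""
    if len(times) < 3:
        return False
    gaps = [b - a for a, b in zip(times, times[1:])]
    return pstdev(gaps) <= jitter

# Quiet baseline (~10 pkt/s) with a large burst every 5 seconds:
series = [10] * 30
for t in range(0, 30, 5):
    series[t] = 500

times = burst_times(series, threshold=100)
print(times)               # [0, 5, 10, 15, 20, 25]
print(is_periodic(times))  # True: regular pulses deviating from baseline
```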
1. Signature-Based Detection
Comparison: Signature-based systems rely on pre-defined attack patterns,
offering efficient real-time detection of known DoS attacks. However, they
lack adaptability to novel or evolving attack types.
Advantages of Proposed Solution: The CNN-LSTM model at the edge adapts
to new attacks by learning high-level and sequential features from the NSL-
KDD dataset. The LSTM component enables detection of temporal patterns
beyond static signatures. The cloud-based Random Forest (RF) further
validates edge predictions by leveraging structured tabular data analysis.
Drawbacks of Signature-Based Detection: Signature-based methods are
limited to known attacks and can be evaded by variations. The proposed edge-
cloud solution dynamically detects both known and unknown patterns,
offering adaptability and resilience.
Sri Nithya Bandi Bhavya Reddy Sama
Pranjal Bhardwaj Sreyash Somesh Mishra
2. Anomaly-Based Detection
Introduction
The NSL-KDD dataset has emerged as a standard benchmark for network intrusion detection
research. It addresses key limitations of its predecessor, the KDD’99 dataset, which faced
criticism for issues such as high redundancy and skewed record frequency. These shortcomings
led to biased performance evaluation of machine learning models. By mitigating these concerns,
the NSL-KDD dataset provides a more realistic and effective framework for assessing and
benchmarking machine learning models and cybersecurity techniques. This makes it an
invaluable resource for developing robust network intrusion detection systems.
Dataset Structure
The dataset consists of labeled network connection records, classified as either "normal" or
specific network attacks. Each connection is described using multiple features that encapsulate
its attributes and behaviors. To facilitate efficient evaluation, the dataset is organized into two
subsets:
· KDDTrain+: This subset contains labeled network connections for model training and
development.
· KDDTest+: This testing set includes labeled connections, with some attack types not
present in the training set. This setup evaluates how well models generalize to unseen
attack types.
The dataset describes each network connection with 41 features, grouped into three categories:
Basic Features: These describe general properties of each connection, such as:
Content Features: These analyze the payload of connections to detect specific attack
activities:
Traffic Features: These are derived from statistical observations over time, highlighting
patterns like:
The dataset categorizes network intrusions into four main attack types:
For this study, the focus is on detecting DoS attacks, with other attack types treated as normal
conditions.
The target label in each connection record identifies the connection as either "normal" or a
specific attack type. One significant challenge posed by the NSL-KDD dataset is its inherent
class imbalance, a common trait in network intrusion datasets. This imbalance indicates that
certain types of attacks, like DoS and Probe, are much more frequent than others. This
imbalance was used to our advantage: the edge model trained on this dataset is inherently
better at detecting DoS attacks, reducing the number of false negatives.
Data Preprocessing for NSL-KDD Dataset
Objective of Preprocessing
The primary goal of preprocessing the NSL-KDD dataset was to prepare it for the effective
training of machine learning models, specifically for detecting Denial-of-Service (DoS) attacks.
The process addressed critical steps such as handling categorical data, normalizing numerical
features, mapping attack types, and selecting features to enhance model accuracy and
computational efficiency.
The raw dataset was loaded, and appropriate column names were assigned to ensure clarity and
correctness in feature identification. These columns included key attributes such as connection
duration, protocol type, service type, connection status (flag), and various traffic statistics, all of
which describe network behaviors.
Categorical features like protocol_type, service, and flag were transformed into numerical
representations using label encoding. This step was essential for making the data compatible with
machine learning models, which typically require numerical inputs. Each category was mapped
to a unique integer, preserving the underlying information while enabling effective computation.
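The label-encoding step might be sketched as follows. This plain-Python version mirrors what a library encoder such as scikit-learn's LabelEncoder does; the protocol values are hypothetical examples:

```python
# Label encoding sketch: map each distinct category to a unique integer,
# preserving the information while making the column numeric.
def label_encode(values):
    """Assign integers to categories in order of first appearance."""
    mapping = {}
    encoded = []
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
        encoded.append(mapping[v])
    return encoded, mapping

protocols = ["tcp", "udp", "tcp", "icmp", "udp"]
encoded, mapping = label_encode(protocols)
print(mapping)   # {'tcp': 0, 'udp': 1, 'icmp': 2}
print(encoded)   # [0, 1, 0, 2, 1]
```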
The dataset's attack column, originally comprising multiple network intrusion types, was mapped
to a binary classification: connections belonging to the DoS category were labeled as attacks, and
all others as normal. The DoS category covers the following attack types:
· land: Spoofs source and destination IP addresses to cause denial of service.
· neptune: A SYN flood attack designed to overwhelm network resources.
· pod (Ping of Death): Sends oversized packets to crash the target system.
· smurf: Exploits ICMP echo requests to flood a target with traffic.
· teardrop: Exploits IP fragmentation vulnerabilities to crash systems.
Other types of attacks, such as Remote-to-Local (R2L), User-to-Root (U2R), and Probe attacks,
were considered part of normal traffic for this specific training purpose.
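The binary mapping described above can be sketched directly from the listed attack types (the function and variable names are our own):

```python
# Binary labeling of the NSL-KDD attack column as described in the text:
# the listed DoS subtypes become 1, everything else (normal, R2L, U2R,
# Probe) becomes 0 for this training purpose.
DOS_ATTACKS = {"land", "neptune", "pod", "smurf", "teardrop"}

def to_binary_label(attack: str) -> int:
    """1 for DoS traffic, 0 for everything treated as normal."""
    return 1 if attack in DOS_ATTACKS else 0

labels = ["normal", "neptune", "smurf", "buffer_overflow", "teardrop"]
print([to_binary_label(a) for a in labels])  # [0, 1, 1, 0, 1]
```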
Feature Normalization
Numerical features were normalized using the MinMaxScaler technique to scale values between
0 and 1. This process ensured that features with larger ranges, such as src_bytes and dst_bytes,
did not disproportionately influence model training. Excluding categorical and target columns,
this normalization step enhanced the numerical stability of the model and provided a
standardized feature space for machine learning.
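A plain-Python sketch of the min-max scaling applied here (the same transform MinMaxScaler performs), using hypothetical src_bytes values:

```python
# Min-max normalization: rescale a feature column to [0, 1] so that
# large-range features such as src_bytes do not dominate training.
def min_max_scale(column):
    """Apply (x - min) / (max - min); constant columns map to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

src_bytes = [0, 500, 1000, 250]
print(min_max_scale(src_bytes))  # [0.0, 0.5, 1.0, 0.25]
```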
Feature Analysis and Selection
Methods used:
· Mean analysis
· Plotting features
A mean analysis was conducted to identify features with the most significant differences between
normal traffic and DoS attack traffic. This step aimed to prioritize features for the edge model,
focusing computational resources on high-impact attributes.
Features with the highest differences, in order. These values were obtained by subtracting the
normalized mean value of each feature under normal conditions from its mean value under
attack conditions.
1. srv_serror_rate: 0.731816 selected
2. serror_rate: 0.731626 selected
3. dst_host_serror_rate: 0.731474 selected
4. same_srv_rate: 0.740047 selected
5. dst_host_same_srv_rate: 0.62412 selected
6. dst_host_srv_count: 0.549149411 selected
7. count: 0.28987182 selected
8. dst_host_count: 0.381967844 selected
9. last_flag: 0.012914 selected
10. wrong_fragment: 0.021433333 selected
11. srv_diff_host_rate: 0.14406
12. rerror_rate: 0.050139 selected
13. srv_rerror_rate: 0.05012 selected
14. dst_host_rerror_rate: 0.060276
15. num_failed_logins: 0.00026
16. num_root: 8.43E-05
17. num_compromised: 7.12E-05
18. num_access_files: 0.001044444
19. su_attempted: 0.00125
20. hot: 0.003472728
21. root_shell: 0.0031
22. num_shells: 0.0003
23. num_file_creations: 0.000302326
24. urgent: 0.0001
25. land: 0.0001
26. is_guest_login: 0.0139
27. num_outbound_cmds: 0
28. is_host_login: 0
29. dst_bytes: 3.25E-06
30. src_bytes: 1.26E-05
31. duration: 0.009985751
Those features with “selected” written next to them were selected for training the edge model.
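The mean analysis behind this ranking can be sketched as follows; the toy data here is illustrative and does not reproduce the values listed above:

```python
# Feature ranking by mean difference: for each feature, take the absolute
# difference between its mean under attack traffic and under normal
# traffic (on normalized data), then sort descending.
def mean_difference(normal_rows, attack_rows, feature_names):
    diffs = {}
    for i, name in enumerate(feature_names):
        normal_mean = sum(r[i] for r in normal_rows) / len(normal_rows)
        attack_mean = sum(r[i] for r in attack_rows) / len(attack_rows)
        diffs[name] = abs(attack_mean - normal_mean)
    return sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)

features = ["serror_rate", "duration", "count"]
normal = [[0.05, 0.40, 0.06], [0.07, 0.60, 0.05]]
attack = [[0.95, 0.41, 0.35], [0.90, 0.59, 0.34]]
ranking = mean_difference(normal, attack, features)
print(ranking[0][0])  # 'serror_rate': the largest normal/attack separation
```

Features at the top of such a ranking are the ones worth the edge model's limited compute, which is exactly how the "selected" set above was chosen.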
Explanation (serror_rate): Under normal conditions, SYN packets are typically acknowledged,
resulting in a low serror_rate. During DoS attacks, many SYN packets are sent without
completing the handshake, leading to connection failures and a high serror_rate.
7. Number of Connections to the Same Host (count):
· Normal: 0.059363014
· DoS Attack: 0.349234834
Explanation: The count feature represents the number of connections made to the
same host. Normal traffic has fewer connections, but during a DoS attack, especially
with a focus on one host, the count of connections drastically increases as attackers
target the same host repeatedly.
11. REJ Error Rate (rerror_rate):
· Normal: 0.098998
· DoS Attack: 0.149137
Explanation: rerror_rate measures the rate of rejected connections due to errors. In
normal conditions, some connection requests may be rejected. However, during a
DoS attack, more requests are invalid or malicious, increasing the rerror_rate as the
server rejects those attempts.
Objective of Plotting Features
Visualize feature distributions and mean differences between normal and DoS traffic, enabling
effective feature selection and preprocessing validation. This ensures data quality and guides the
edge model to focus on high-impact attributes, improving accuracy and reducing computational
overhead.
Sample outputs from plotting.
Color coding:
Edge model
CNN+LSTM model was chosen for its ability to efficiently extract critical features through
CNNs, while LSTMs effectively capture temporal patterns in network traffic, enabling early
detection of DoS attacks with minimal computational overhead on resource-constrained edge
devices.
1. Efficient Feature Extraction with CNN : The use of Convolutional Neural Networks
(CNNs) in the edge model allows for efficient processing of network traffic data by
leveraging convolutional kernels. These kernels extract critical features such as
abnormal byte rates, error patterns, and traffic surges, which are indicative of potential
Denial-of-Service (DoS) attacks. The feature compression provided by CNNs minimizes
data complexity while retaining essential information, ensuring computational overhead
is reduced. This lightweight design is particularly suited for resource-constrained edge
devices, enabling real-time anomaly detection without overwhelming hardware
capabilities.
1. Data Preprocessing and Feature Selection
The training pipeline begins with preprocessing the NSL-KDD dataset to standardize its format
and enhance computational efficiency:
· Feature Selection: Using mean analysis, we identified features with the highest
variance between normal and attack traffic, such as srv_serror_rate, serror_rate, and
dst_host_serror_rate. These features provide critical insights into DoS behavior.
· Scaling Factors: Selected features were scaled by a factor of 1.5 to emphasize their
importance during training while still considering all available features. This
approach ensures that critical features have a stronger influence on the model without
discarding other potentially useful attributes.
· Sliding Window Technique: To simulate sequential behavior, the data was formatted
into overlapping sequences of five rows. Each sequence represents a
pseudo-time-series input, enabling the LSTM component to analyze temporal patterns
across connection events.
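The scaling and sliding-window steps can be sketched as follows (the 1.5 factor comes from the text; the array shapes are illustrative):

```python
import numpy as np

def scale_and_window(X, boost_cols, factor=1.5, window=5):
    """Emphasize selected feature columns, then build overlapping
    sequences of `window` rows for pseudo-time-series input."""
    X = X.copy().astype(float)
    X[:, boost_cols] *= factor  # boost high-variance features such as serror_rate
    # Overlapping windows with stride 1: (n - window + 1, window, n_features)
    seqs = np.stack([X[i:i + window] for i in range(len(X) - window + 1)])
    return seqs

X = np.arange(20, dtype=float).reshape(10, 2)  # 10 rows, 2 features
seqs = scale_and_window(X, boost_cols=[1], window=5)
print(seqs.shape)  # (6, 5, 2)
```

Each of the six sequences overlaps the next by four rows, so the LSTM sees every connection event in several temporal contexts.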
2. Model Architecture
The architecture combines the strengths of Convolutional Neural Networks (CNNs) and Long
Short-Term Memory (LSTM) networks:
· CNN Component:
o Extracts spatial features by applying convolutional filters to the input sequences.
The CNN reduces dimensionality and computational overhead while
highlighting patterns within feature groups.
o Pooling layers further compress the feature space, ensuring efficient processing
on resource-constrained systems.
· LSTM Component:
o Processes the reduced features from the CNN over the simulated time-series
sequences, capturing temporal dependencies critical for detecting evolving
attack patterns.
o By analyzing changes in features like count and serror_rate across time steps,
the LSTM identifies sequential anomalies indicative of DoS activity.
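The CNN and LSTM components described above can be sketched in PyTorch (a minimal illustration with assumed layer sizes, not the exact architecture used):

```python
import torch
import torch.nn as nn

class EdgeCNNLSTM(nn.Module):
    """CNN extracts per-window feature patterns; LSTM captures how they
    evolve across the 5-step sliding-window sequence."""
    def __init__(self, n_features=43, hidden=32, n_classes=2):
        super().__init__()
        # Treat features as channels and window steps as sequence length
        self.conv = nn.Conv1d(n_features, 16, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(input_size=16, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):              # x: (batch, window, n_features)
        x = x.transpose(1, 2)          # -> (batch, n_features, window)
        x = torch.relu(self.conv(x))   # -> (batch, 16, window)
        x = x.transpose(1, 2)          # -> (batch, window, 16)
        _, (h, _) = self.lstm(x)       # h: (1, batch, hidden)
        return self.fc(h[-1])          # -> (batch, n_classes)

model = EdgeCNNLSTM()
logits = model(torch.randn(8, 5, 43))  # batch of 8 five-row windows
print(logits.shape)  # torch.Size([8, 2])
```

The single Conv1d layer and small hidden size keep the parameter count low, which is the point of the lightweight edge design.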
3. Modification of Loss Function and Class Imbalance
To address the imbalance in the NSL-KDD dataset, where normal traffic significantly outweighs
attack traffic, the loss function was modified to compensate for the skewed class distribution.
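The report does not reproduce the exact reweighting here; a common approach (shown as an assumption, not the report's code) is inverse-frequency class weights applied to the loss:

```python
import numpy as np

# Hypothetical label distribution: normal (0) heavily outweighs attack (1)
labels = np.array([0] * 900 + [1] * 100)

# Inverse-frequency weights: the rare class gets a proportionally larger
# weight, so misclassifying an attack costs more than misclassifying
# normal traffic during training
counts = np.bincount(labels)
weights = len(labels) / (len(counts) * counts)
print(weights)  # class-1 weight is 9x the class-0 weight for a 9:1 split
```

These weights would typically be passed to the loss function (e.g., as per-class weights in a cross-entropy loss) so gradient updates are not dominated by the majority class.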
4. Computational Efficiency
Given the edge constraints of real-time DoS detection systems, computational efficiency was a
key consideration throughout the design.
5. Training Strategy
· Batch Processing: DataLoader efficiently handles training samples in batches, reducing
memory overhead while ensuring stable gradient updates.
· Sliding Window Sampling: By overlapping sequences during preprocessing, the model
learns both static and evolving patterns, improving its generalization to unseen data.
· Regularization: Dropout layers and weight decay are incorporated into the training
process to mitigate overfitting, ensuring the model performs well on test datasets.
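As a framework-free illustration of the batch processing described above (a simplified stand-in for PyTorch's DataLoader):

```python
import numpy as np

def batches(X, y, batch_size=4, shuffle=True, seed=0):
    """Yield (X_batch, y_batch) pairs, mimicking a DataLoader's role of
    bounding memory use and stabilizing gradient updates."""
    idx = np.arange(len(X))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    for start in range(0, len(idx), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], y[sel]

X, y = np.zeros((10, 3)), np.arange(10)
sizes = [len(yb) for _, yb in batches(X, y)]
print(sizes)  # [4, 4, 2]
```

Every sample appears exactly once per epoch, while the bounded batch size keeps peak memory constant regardless of dataset size.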
Model Results:
The following results were obtained from training the CNN+LSTM edge model from scratch on the NSL-KDD
dataset for 10 epochs. The training accuracy reached 99.73%, with accuracy consistently
increasing and loss steadily decreasing. This trend indicates that the model is successfully
learning and generalizing well from the data, without overfitting. The improvement in accuracy
coupled with the reduction in loss further supports the model's ability to differentiate between
normal traffic and DoS attacks effectively.
Results from unseen test data:
The results presented were obtained from testing the CNN+LSTM model on unseen test data
from the NSL-KDD dataset. The test accuracy achieved was 92.71%, with a relatively low
number of false negatives (approximately 180), indicating that the model effectively identifies
DoS attacks. The RAM usage for testing 22,540 samples was minimal, at just 4.199 MB,
demonstrating that the model is suitable for deployment on edge devices with limited
computational resources. Additionally, the model's inference time was 0.548 seconds, enabling
real-time responses for DoS attack detection, which is crucial for edge-based applications.
These results confirm that the model not only performs well but is also efficient enough to
operate in real-time environments.
Suggested methods to improve accuracy:
To reduce false negatives (DoS attacks classified as normal), adjusting the classification
threshold can help. By lowering the threshold, the model becomes more sensitive to potential
DoS attacks, detecting them earlier—even with lower confidence scores. Although this might
increase false positives, it ensures that more attacks are flagged proactively.
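A sketch of this threshold adjustment (probabilities and threshold values are illustrative):

```python
import numpy as np

# Hypothetical attack probabilities output by the model for six flows
probs = np.array([0.95, 0.60, 0.45, 0.30, 0.10, 0.48])

default = (probs >= 0.5).astype(int)    # standard decision threshold
sensitive = (probs >= 0.4).astype(int)  # lowered: flags borderline attacks

print(default.sum(), sensitive.sum())  # 2 4
```

Lowering the threshold from 0.5 to 0.4 converts the two borderline flows (0.45 and 0.48) from "normal" to "attack", trading extra false positives for fewer missed attacks.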
3. Hyperparameter Tuning
Optimizing hyperparameters such as the number of LSTM layers, CNN filter sizes, and
learning rates using techniques like Grid Search can significantly improve model accuracy.
Cross-validation should be used to ensure the model generalizes well to unseen data and avoids
overfitting.
Implementing online or incremental learning allows the model to adapt to evolving attack
patterns without retraining from scratch. This can improve the system’s ability to detect new and
emerging DoS attacks in real-time, keeping the model up-to-date.
Challenges in Implementing Suggested Improvements
While several methods, including threshold adjustment and real-time adaptive learning, were
considered and tested for enhancing model performance, these solutions were not incorporated
due to constraints of the current implementation environment and device hardware.
In addition, the NSL-KDD dataset does not contain time-series information, which limited
our ability to fully utilize the LSTM component of the model for sequential pattern recognition.
However, LSTM was still incorporated by using a sliding window technique to simulate time-series
data, capturing sequential dependencies in network traffic despite the dataset's tabular structure.
I did not train the model on the CIC-IDS2017 dataset, as it is too large to be managed efficiently
on my device. However, the modular design of the model and the training methodology ensure
that the same steps can easily be adapted to train a new model on this or other more advanced
datasets.
During the simulation of a DoS attack, data was collected using Wireshark to analyze network
activity. However, the predictive model failed to produce accurate results as the input data
extracted from Wireshark included only a subset of features, such as IP addresses and the
number of bytes transmitted between nodes. The model requires a comprehensive set of 43
features to make reliable predictions, which were not captured in the current data collection
process. Additionally, the model was trained on connection-level information, whereas the data
obtained from Wireshark was at the packet level. This necessitated aggregation of packet-level
data into connection-level metrics, introducing further complexity and potential inaccuracies. To
ensure effective testing and prediction, advanced data collection tools capable of capturing all
necessary features at the appropriate level of granularity are essential.
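The packet-to-connection aggregation described above can be sketched with pandas (column names are illustrative of a Wireshark CSV export, not an exact schema):

```python
import pandas as pd

# Illustrative packet-level records, similar to a Wireshark export
packets = pd.DataFrame({
    "src": ["10.0.0.2", "10.0.0.2", "10.0.0.3", "10.0.0.2"],
    "dst": ["10.0.0.5", "10.0.0.5", "10.0.0.5", "10.0.0.5"],
    "bytes": [60, 1500, 60, 40],
})

# Aggregate packets into per-connection (src, dst) metrics, the level of
# granularity the model was trained on
conns = packets.groupby(["src", "dst"], as_index=False).agg(
    n_packets=("bytes", "size"),
    total_bytes=("bytes", "sum"),
)
print(conns)
```

Even this simple roll-up recovers only two of the 43 features the model expects, which illustrates why a richer collection pipeline is needed.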
Model: Random Forest (RF)
Architecture: Ensemble of Decision Trees
Strengths for Edge Deployment:
- High interpretability for anomaly detection in tabular data.
Limitations for Edge Deployment:
- Memory-intensive and unsuitable for real-time edge applications.
- Lacks support for sequential or temporal patterns.

Model: Support Vector Machine (SVM)
Architecture: Kernel-based classifier
Strengths for Edge Deployment:
- Compact model size for small datasets.
Limitations for Edge Deployment:
- Poor scalability for large datasets.
- Limited adaptability to edge environments with high-dimensional data.
Results from comparison
a) Traditional Machine Learning Models: While models like Random Forest and
Gradient Boosting Machines perform well with structured datasets, their high
memory and computational requirements make them impractical for edge
deployment.
b) Simpler Neural Networks: Models such as MLPs and SVMs are computationally
efficient but lack the ability to process sequential or temporal data, reducing their
effectiveness in detecting evolving attack patterns.
c) RNN-Based Models: Although RNNs capture sequential dependencies, they often
demand greater computational resources and are prone to gradient-related issues,
limiting their applicability in edge scenarios.
4. Adaptability:
The modular design of the CNN-LSTM model allows it to be easily adapted to other datasets or
evolving threats. This flexibility ensures long-term viability for edge deployments as network
traffic patterns change.
Cloud-Based Random Forest Model for DoS Attack Detection
The cloud-based machine learning model leverages a Random Forest classifier to efficiently
detect Denial-of-Service (DoS) attacks, utilizing the NSL-KDD dataset as its foundation.
It is designed specifically for scalable cloud infrastructures. The model prioritizes accuracy,
robustness, and real-time threat analysis, making it a powerful tool for modern cybersecurity
challenges.
By incorporating strategies for large-scale data processing and resource optimization, the
system ensures seamless integration into distributed environments.
4. Cross-Validation
Use K-Fold cross-validation to evaluate the Random Forest model’s performance
across multiple splits of the dataset, reducing overfitting.
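A minimal sketch of this evaluation with scikit-learn, using synthetic data as a stand-in for NSL-KDD:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the NSL-KDD features (not the real dataset)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
print(scores.mean().round(3))
```

Averaging accuracy across the five held-out folds gives a more honest estimate of generalization than a single train/test split.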
Computational Efficiency
1. Inference Time and Resource Utilization
Low Latency:
o The ANN processes batches of data with minimal latency, suitable for real-
time applications.
Memory Usage:
o Compared to Random Forest, the ANN exhibits comparable memory usage
but excels in adaptability to new data patterns.
2. Optimize Hyperparameters
Learning Rate: Use a learning rate scheduler to decay the learning rate during
training.
Batch Size: Experiment with different batch sizes. Smaller batches can improve
convergence, while larger batches stabilize updates.
Epochs: Train the model for more epochs while monitoring for overfitting using
validation metrics.
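A learning-rate decay schedule can be as simple as the following (the decay factor is illustrative):

```python
def decayed_lr(initial_lr, epoch, gamma=0.9):
    """Exponential learning-rate decay: lr shrinks by `gamma` each epoch,
    letting training take large steps early and fine steps late."""
    return initial_lr * gamma ** epoch

lrs = [round(decayed_lr(0.01, e), 5) for e in range(4)]
print(lrs)  # [0.01, 0.009, 0.0081, 0.00729]
```

Frameworks provide equivalent built-in schedulers; the point is that the rate decays smoothly rather than staying fixed across epochs.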
1. Model Performance
Training and Test Accuracy
o Random Forest:
Training Accuracy: 99.97% (0.9997)
Test Accuracy: 98.32% (0.9832)
o Artificial Neural Network:
Training Accuracy: 99.96% (0.9996)
Test Accuracy: 94.48% (0.9448)
The Random Forest model's test accuracy of 98.32% outperforms the ANN's 94.48%,
demonstrating its ability to generalize better to unseen data.
This difference is especially significant in real-world applications, where
generalization is critical for detecting previously unseen attack patterns while
minimizing false negatives.
High test accuracy also ensures enhanced reliability in detecting malicious activities
across IoT networks.
Metrics Summary (RF vs ANN)
o Although the ANN demonstrates high recall (indicating a strong ability to
detect DoS attacks effectively), its precision of 83.25% reveals a
comparatively higher rate of false positives. This could lead to unnecessary
alerts or disruptions in a real-time detection system.
o In contrast, the Random Forest model achieves superior precision (95.89%)
and recall (97.60%), striking a better balance between accurately identifying
attacks and minimizing false positives.
o In a cloud environment, where both computational efficiency and predictive
reliability are critical, Random Forest’s ability to reduce false positives
provides a clear operational advantage.
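The precision/recall trade-off above can be made concrete with a toy confusion matrix (counts invented for illustration):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP): how many flagged flows are real attacks.
    Recall = TP/(TP+FN): how many real attacks get flagged."""
    return tp / (tp + fp), tp / (tp + fn)

# Invented counts: many false positives -> high recall but lower precision,
# the pattern the ANN exhibits above
p, r = precision_recall(tp=90, fp=18, fn=2)
print(round(p, 3), round(r, 3))  # 0.833 0.978
```

A detector with these counts misses almost no attacks but raises one spurious alert for every five real ones, which is costly in an operational cloud environment.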
Inference Time
o Random Forest’s architecture supports parallel inference across its
decision trees, enabling faster predictions. This feature is particularly
advantageous in cloud deployments, where latency-sensitive applications
demand rapid responses to detected anomalies.
Conclusion
Based on empirical results and practical considerations, the Random Forest model is the
optimal choice for cloud-based DoS attack detection in IoT networks. Its superior test
accuracy, efficiency with tabular data, low resource demands, and seamless scalability make
it a robust solution for safeguarding IoT environments against the growing threat of DoS
attacks.
Model: Convolutional Neural Network (CNN)
Architecture: Specialized for image and spatial data; convolutions extract features → pooling layers reduce dimensions → fully connected layers classify.
Strengths for Cloud Deployment:
- Best for image, video, and spatial data.
- Supported by cloud GPUs/TPUs for high-speed computation.
- Pre-trained models like ResNet available for transfer learning.
Limitations:
- Not suitable for tabular data.
- Training is resource-intensive (time and compute).
- Requires advanced cloud infrastructure with large-scale parallelism.
Enhanced communication protocols between edge and cloud systems are essential to address
the growing demands of modern IoT and AI-driven applications. Traditional communication
methods often result in high latency, excessive bandwidth usage, and increased energy
consumption due to the constant transfer of data between edge devices and the cloud.
Enhanced protocols enable selective data transmission, ensuring that only necessary
information is sent to the cloud for processing, reducing communication overhead and
conserving resources. Additionally, they incorporate security features like end-to-end
encryption and integrity checks to safeguard sensitive data, which is crucial in industries
handling confidential information. These protocols strike a balance between edge autonomy
and cloud computation, ensuring scalable, efficient, and secure communication in distributed
systems.
Gurupriya D
3. Security Considerations
Implement lightweight encryption protocols such as AES-128 for data before transmission.
Include checksums or hash-based message authentication codes (HMACs) to verify that data
received by the cloud has not been tampered with during transmission.
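The HMAC integrity check can be sketched with Python's standard library (AES encryption itself requires a third-party library, so only the integrity check is shown; the key and payload are invented):

```python
import hashlib
import hmac

key = b"shared-secret"  # pre-shared between edge device and cloud
payload = b'{"src": "10.0.0.2", "event": "dos_alert"}'

# Edge device attaches an HMAC tag; the cloud recomputes and compares
tag = hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify(key, payload, tag):
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)  # constant-time comparison

print(verify(key, payload, tag))          # True
print(verify(key, payload + b"x", tag))   # False: tampering detected
```

Any modification to the payload in transit changes the recomputed digest, so the cloud rejects the message.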
With the growing adoption of IoT and edge computing, efficient communication between
edge devices and cloud systems is crucial. Traditional models often transmit all data to the
cloud, leading to high bandwidth usage, latency, and energy consumption. The Small-Big
Model Framework addresses these challenges by enabling selective communication, reducing
unnecessary data transmission while maintaining high performance.
Introduction to the Small-Big Model Framework
The Small-Big Model Framework is a communication strategy designed to optimize data
transmission and processing between edge devices and cloud systems. This framework
addresses the challenges of traditional edge-cloud architectures, including high bandwidth
usage, latency issues, and limited scalability. By enabling selective communication, the
framework ensures that only critical data is transmitted to the cloud, while simpler tasks are
processed locally on edge devices. This approach significantly reduces communication
overhead while maintaining the accuracy and efficiency of the system, making it ideal for
applications in IoT, surveillance, and industrial automation.
Communication Workflow
The Small-Big Model Framework introduces an innovative approach to communication
between edge devices and cloud systems, ensuring optimal resource utilization and efficient
data processing. The workflow involves four primary stages: edge processing, selective
transmission, cloud processing, and result integration. Each stage is meticulously designed to
balance computational efficiency and accuracy, reducing unnecessary data transfers while
maintaining high detection performance.
1. Edge Processing
The edge device captures raw input data, such as images or sensor readings, and preprocesses
it to ensure compatibility with the edge model. A lightweight Small Model runs locally on the
edge device to analyze the input data.
This model is optimized for speed and energy efficiency, capable of making decisions for
straightforward cases (e.g., objects with high confidence levels or simple classifications).
A confidence threshold is applied to predictions:
If the confidence score exceeds the threshold, the data is processed and resolved
locally.
If the confidence score falls below the threshold, the data is flagged as a "difficult
case" for further processing.
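The confidence-threshold routing can be sketched as follows (the threshold value is illustrative):

```python
def route(confidence, threshold=0.8):
    """Resolve locally when the Small Model is confident; otherwise flag
    the input as a 'difficult case' for the cloud-side Big Model."""
    return "local" if confidence >= threshold else "cloud"

print([route(c) for c in (0.95, 0.60)])  # ['local', 'cloud']
```

Tuning the threshold directly trades bandwidth (fewer cloud offloads) against accuracy (more cases resolved by the weaker local model).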
2. Difficult-Case Discrimination
The Difficult-Case Discriminator (DCD) evaluates flagged data to ensure only essential
information is sent to the cloud.
The flagged data is packaged efficiently, often compressed or reduced to its most critical
features, to minimize bandwidth usage.
The DCD is an integral part of the framework, allowing the system to maintain a balance
between bandwidth savings and data integrity.
3. Selective Transmission
Transmission of Flagged Data:
Only flagged "difficult cases" are transmitted to the cloud. This selective
communication drastically reduces the volume of data sent compared to traditional
systems that offload all data.
Lightweight serialization techniques, such as Protocol Buffers or MessagePack, are
often used to further compress the data packets before transmission.
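The framework uses formats such as Protocol Buffers or MessagePack; the sketch below uses only the standard library (json + zlib) to illustrate the pack-compress-restore round trip for a flagged case (field names are invented):

```python
import json
import zlib

# A flagged "difficult case", reduced to its most critical features
flagged = {"device": "cam-07", "confidence": 0.41, "features": [0.9, 0.1, 0.7]}

raw = json.dumps(flagged).encode()      # serialize to bytes
packed = zlib.compress(raw)             # compress before transmission
restored = json.loads(zlib.decompress(packed))  # cloud side unpacks

print(len(raw), restored == flagged)
```

Binary formats like MessagePack typically beat compressed JSON on both size and speed, which is why they are preferred for constrained links.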
Communication Optimization:
1. Dataset Selection
For the testing phase, various image and sensor-based datasets were selected to reflect
different levels of complexity and real-world applicability:
VOC (Pascal Visual Object Classes): A widely used benchmark for object detection
tasks, the VOC dataset provides labeled data on images with various object classes,
making it ideal for testing edge-based object recognition models.
COCO (Common Objects in Context): This dataset consists of diverse images with
annotations for object detection, segmentation, and captioning. It includes more
complex scenarios with objects in varied contexts, making it suitable for the cloud-
based Big Model's processing.
HELMET: A specialized dataset designed for object detection, particularly in
environments like industrial settings where precise detection of equipment or human
activity is required.
These datasets offer a mix of simple and complex tasks, allowing us to test the ability of the
Difficult-Case Discriminator (DCD) to distinguish between easily resolvable cases and those
that require cloud offloading.
3. Testing Process
The testing phase followed a clear progression:
Initial Edge Processing: The Small Model processed input from the datasets directly
on the edge device. For simple object detection cases, such as identifying well-defined
objects in clear environments, the Small Model handled the processing locally.
Flagging Difficult Cases: For more complex or uncertain cases, such as detecting
partially occluded objects or objects in cluttered backgrounds, the Small Model
flagged the data as a "difficult case." These cases were then forwarded to the cloud for
detailed analysis by the Big Model.
Data Transmission: The flagged data was serialized and compressed using
lightweight techniques like Protocol Buffers or MessagePack, ensuring efficient
transmission. The data was sent over low-latency communication protocols, such as
MQTT or WebSocket, to minimize the delay between edge and cloud processing.
Cloud Processing and Feedback: Once the data reached the cloud, the Big Model
performed more comprehensive processing, refining the detection and making more
precise predictions. The results were sent back to the edge device for integration.
Output Integration: The edge device combined its local processing results with the
cloud-based results to generate a final output, ensuring the system provided both real-
time responses for simple cases and high-accuracy results for complex scenarios.
4. Evaluation Metrics
The success of the Small-Big Model Framework was evaluated using a range of metrics:
Accuracy: This was measured using standard object detection metrics like mean
average precision (mAP). For the datasets tested, the framework achieved a mAP
ranging from 91.22% to 92.52%, indicating high accuracy in both local and cloud-
based processing.
Latency: The latency was evaluated based on the time taken for the system to process
data from the edge device to the cloud and back. The use of local processing ensured
minimal delays for straightforward cases, while the cloud-based processing added
slight delays for more complex cases. However, these delays were offset by the
efficiency of selective transmission and data compression.
Bandwidth Efficiency: The transmission of flagged data was optimized to reduce
bandwidth consumption. About 50% of the images were processed locally on the
edge, with only the difficult cases sent to the cloud, significantly reducing the amount
of data transmitted and conserving network bandwidth.
Energy Consumption: The edge devices were evaluated for energy efficiency by
measuring the power consumption during local processing and when transmitting
data. By minimizing the number of transmissions to the cloud, the system reduced the
overall energy usage on the edge device.
5. Results
Data Transmission Efficiency: By only transmitting difficult cases, the framework reduced
the amount of data sent to the cloud by 50% compared to traditional systems that would send
all data regardless of complexity.
Performance Gains: The Small-Big Model achieved a high balance of accuracy, bandwidth
efficiency, and latency, making it suitable for real-time applications where edge devices have
limited resources but require robust cloud-based support.
Benefits of the Framework
The Small-Big Model Framework offers several key benefits:
Bandwidth Optimization: By processing easy cases locally, the framework
significantly reduces the amount of data transmitted to the cloud.
Reduced Latency: Local processing ensures quicker response times for
straightforward cases, while only complex cases experience cloud-related delays.
Scalability: The framework is adaptable to a wide range of edge devices, making it
suitable for diverse applications.
Energy Efficiency: By minimizing cloud dependency, the framework reduces energy
consumption on both the edge device and the cloud.
Conclusion
The Small-Big Model Framework provides a practical and efficient solution for edge-to-
cloud communication, addressing critical challenges in bandwidth usage, latency, and energy
consumption. Its innovative approach to selective data transmission ensures that the system
remains scalable and accurate, even in resource-constrained environments. This framework
holds great promise for applications requiring real-time processing and efficient resource
management.
XAI and Blockchain-Powered Edge-to-Cloud System
for DoS Mitigation in IoT
The Vulnerability of IoT Networks:
The sources emphasize the rapid proliferation of IoT devices and their inherent vulnerabilities,
making them prime targets for DoS attacks. These vulnerabilities stem from factors such as:
● Limited Resources: IoT devices typically have constrained processing power, memory, and
energy, making them susceptible to attacks that overwhelm their resources.
● Insecure Communication: Many IoT devices rely on wireless communication protocols
that lack robust security measures, making them prone to interception and manipulation.
● Lack of Standardization: The diverse range of IoT devices and protocols often leads to
inconsistencies in security implementations, creating vulnerabilities that attackers can exploit.
The sources highlight the importance of explainability in AI-based security systems. XAI
techniques, such as SHAP (SHapley Additive exPlanations), provide insights into the
decision-making process of AI models, enhancing trust and enabling informed responses to
detected anomalies.
● Understanding Feature Importance: XAI helps identify the most influential features
contributing to the detection of DoS attacks. This allows security analysts to:
○ Fine-tune detection models for greater accuracy and efficiency.
○ Establish security policies based on thresholds for critical features.
○ Gain a deeper understanding of the nature and characteristics of attacks.
● Reducing False Positives: By explaining why a particular network flow is flagged as an
anomaly, XAI helps distinguish between legitimate traffic bursts and genuine DoS attacks. This
reduces the likelihood of mistakenly blocking harmless traffic.
Bhavya
Key Components of XAI Implementation
○ Example:
■ A spike in traffic from a sensor during maintenance hours may be flagged
as an anomaly. However, XAI can contextualize this spike using historical
data and explain it as non-malicious.
● Visualization Tools
○ Purpose: To present explanations in an intuitive and actionable manner.
○ Techniques:
■ SHAP Summary Plots: Show the average contribution of each feature
across all flagged instances.
■ SHAP Waterfall Plots: Break down the cumulative effect of features
leading to a specific prediction.
■ Feature Importance Heatmaps: Visualize the importance of various
features across different traffic samples.
Waterfall plot:
2. Blockchain for Secure Data Management and Collaboration:
The sources propose leveraging blockchain technology to enhance security and collaboration in
the IoT ecosystem:
● Immutable and Transparent Logging: Blockchain provides a tamper-proof and
auditable record of detected DoS events and blacklisted IP addresses. This:
○ Enhances accountability and trust among network participants.
○ Facilitates forensic analysis and investigation of attacks.
○ Prevents attackers from manipulating or erasing evidence of their actions.
● Decentralized Threat Intelligence: Blockchain enables secure sharing of threat
intelligence among distributed IoT devices and network nodes. This allows:
○ Real-time updates on emerging threats and attack patterns.
○ Collaborative mitigation efforts to block malicious traffic at multiple points.
○ Increased resilience against attacks targeting individual devices or nodes.
● Decentralized Anomaly Verification
○ Purpose:
■ Reduces reliance on a single point of failure by distributing the
verification of detected anomalies across multiple nodes.
○ How It Works:
■ Each Blockchain node validates anomalies reported by edge devices by
cross-referencing the traffic patterns logged in the ledger.
■ If a consensus is reached among nodes, the anomaly is flagged as verified,
triggering appropriate responses.
○ Application in DoS Detection:
■ Helps in detecting Distributed DoS (DDoS) attacks by correlating logs
from multiple devices to identify coordinated patterns.
● Immutable Forensic Records
○ Purpose:
■ Provides a tamper-proof history of network events for post-attack analysis
and compliance reporting.
○ How It Works:
■ Once logged, entries cannot be altered or deleted, ensuring that all records
are trustworthy and auditable.
○ Use Case:
■ In regulatory environments (e.g., healthcare IoT), Blockchain-based logs
can demonstrate compliance with cybersecurity standards.
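A simplified, illustrative sketch of such tamper-evident logging (a hash chain in plain Python, not a real blockchain, with invented event fields):

```python
import hashlib
import json

def append_block(chain, event):
    """Append an event whose hash covers the previous block's hash,
    making any retroactive edit detectable."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    chain.append({"event": event, "prev": prev_hash,
                  "hash": hashlib.sha256(body.encode()).hexdigest()})

def is_valid(chain):
    """Recompute every hash; any mismatch reveals tampering."""
    for i, block in enumerate(chain):
        prev = chain[i - 1]["hash"] if i else "0" * 64
        body = json.dumps({"event": block["event"], "prev": prev}, sort_keys=True)
        if block["prev"] != prev or \
           block["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
    return True

log = []
append_block(log, {"src": "10.0.0.9", "type": "DoS"})
append_block(log, {"src": "10.0.0.4", "type": "DoS"})
print(is_valid(log))          # True
log[0]["event"]["src"] = "spoofed"
print(is_valid(log))          # False: tampering breaks the chain
```

A real blockchain adds distributed consensus on top of this hashing idea, so no single node can rewrite the log even if it controls its own copy.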
3. Edge-to-Cloud Architecture for Efficient and Scalable DoS
Mitigation:
Integrating XAI and Blockchain into an Edge-to-Cloud architecture optimizes DoS detection and
mitigation in IoT networks:
● Edge Computing for Rapid Local Detection: Lightweight XAI-enabled detection
models can be deployed on resource-constrained edge devices for real-time anomaly detection.
This enables:
○ Quick identification and isolation of potential DoS attacks close to the source.
○ Reduced latency in response time, minimizing the impact of attacks.
○ Offloading computationally intensive tasks from the cloud.
● Cloud Computing for Verification and Collaboration: The cloud serves as a
central hub for:
○ Verifying anomalies detected at the edge using more sophisticated XAI models.
○ Maintaining the blockchain network for secure data storage and collaboration.
○ Coordinating mitigation efforts across the entire IoT ecosystem.
1. Framework Overview
The proposed system leverages Explainable AI (XAI) and Blockchain technologies within an
Edge-to-Cloud architecture to enhance the detection and mitigation of Denial-of-Service (DoS)
attacks. The framework consists of two primary layers: the edge layer, responsible for real-time
detection and explanation of anomalies, and the cloud layer, which performs comprehensive
analysis and coordinated responses. Blockchain underpins the entire system, ensuring secure,
tamper-proof communication and logging.
2. Dataset Preparation
For implementation, we use multiple datasets, such as CICDDoS2019 and CIC-IoT2023. We
also use the NSL-KDD dataset, a benchmark dataset for network intrusion detection,
supplemented with synthetic IoT-specific traffic to simulate diverse attack scenarios. The
datasets include attributes such as:
Preprocessing Steps:
3. Edge Layer Implementation
The edge layer focuses on real-time detection of anomalous traffic using a lightweight machine
learning model. A CNN-LSTM model is suitable here, as it captures both spatial and temporal
patterns in the traffic data.
● Model Functionality:
○ The CNN extracts spatial features (e.g., traffic intensity patterns).
○ The LSTM identifies sequential anomalies, such as traffic bursts indicative of
DoS attacks.
○ The output is a binary or multi-class prediction (e.g., Normal, DoS).
● XAI Integration:
○ SHAP (SHapley Additive exPlanations) is employed to provide interpretability.
○ For example, if a traffic sample is flagged as a DoS attack, SHAP can show that
high packet rates and low inter-packet arrival times were the key contributing
factors.
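In practice the shap library computes these attributions; for intuition, Shapley values have a closed form for a linear model, phi_i = w_i * (x_i - E[x_i]). The sketch below (toy weights and features, invented for illustration) verifies the efficiency property that the attributions sum to f(x) - E[f(x)]:

```python
import numpy as np

# Toy linear "detector": f(x) = w . x, features = [packet_rate, inter_arrival]
w = np.array([2.0, -1.0])
background = np.array([[0.1, 0.5], [0.3, 0.7]])  # baseline traffic samples
x = np.array([0.9, 0.1])                          # flagged traffic sample

# Exact Shapley values for a linear model: phi_i = w_i * (x_i - E[x_i])
phi = w * (x - background.mean(axis=0))
print(phi)  # per-feature contribution to the flagged prediction

# Efficiency property: attributions sum to the deviation from the baseline
assert np.isclose(phi.sum(), w @ x - (background @ w).mean())
```

Here the high packet rate contributes most of the anomaly score, mirroring the kind of explanation SHAP would surface for a flagged DoS sample.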
4. Blockchain Integration
● Traffic Logging:
○ Each detected anomaly is logged with attributes such as source IP, destination IP,
timestamp, and detected attack type.
○ Logs are distributed across Blockchain nodes, ensuring data immutability.
● Smart Contracts:
○ Pre-programmed rules automate actions such as:
■ Blocking malicious IPs.
■ Throttling traffic from flagged devices.
○ For instance, a detected DoS attack triggers a smart contract to notify all
connected devices to isolate the source node.
5. Cloud Layer Implementation
The cloud layer aggregates and verifies anomalies flagged by edge devices using advanced
analytical models.
● Model Usage:
○ Graph Neural Networks (GNNs) analyze the IoT network's topology.
○ Nodes represent devices, and edges represent traffic flows.
○ The model detects patterns indicative of coordinated DoS attacks, such as
abnormal traffic from clusters of devices.
● Global Analysis:
○ Consolidates data from multiple edge nodes.
○ Correlates localized anomalies to detect distributed DoS (DDoS) attacks.
● Blockchain Validation:
○ Cloud systems query Blockchain logs to validate edge-detected anomalies.
○ If multiple edge devices report consistent anomalies, the cloud flags them as
verified attacks.
● Challenges:
○ The computational overhead of Blockchain in resource-constrained IoT devices.
○ Balancing the latency of real-time detection with the need for detailed XAI
explanations.
○ Scaling the framework for large IoT networks with millions of devices.
● Future Work:
○ Explore lightweight Blockchain implementations such as Directed Acyclic
Graphs (DAGs) for reduced computational demands.
○ Optimize XAI techniques for faster explanations without sacrificing
interpretability.
○ Incorporate federated learning to train detection models collaboratively across
devices while preserving data privacy.
○ Deploy the framework in real-world IoT environments, such as smart cities or
healthcare systems, to validate scalability and robustness.
8. Conclusion
The proposed Edge-to-Cloud system combining XAI and Blockchain addresses critical
challenges in IoT network security against DoS attacks. By ensuring real-time detection,
transparency, and secure communication, the framework offers a robust solution for modern IoT
environments. While challenges remain in terms of computational efficiency and scalability, the
integration of these technologies marks a significant step forward in safeguarding IoT networks
from evolving cyber threats.
References
Cao, Zhiqiang, et al. "Edge-Cloud collaborated object detection via difficult-case discriminator." 2023
IEEE 43rd International Conference on Distributed Computing Systems (ICDCS). IEEE, 2023.
https://arxiv.org/abs/2108.12858
Andriulo, Francesco Cosimo, et al. "Edge Computing and Cloud Computing for Internet of Things: A
Review." Informatics. Vol. 11. No. 4. MDPI, 2024.
https://doi.org/10.3390/informatics11040071
https://www.wireshark.org/about.html
https://www.kali.org/docs/introduction/what-is-kali-linux/
https://youtu.be/KytAmziXs4k?si=Lx4uJJtFEVYo-Ea4
Shah, Syed Ali Raza, and Biju Issac. "Performance comparison of intrusion detection systems and
application of machine learning to Snort system." Future Generation Computer Systems 80 (2018):
157-170. https://doi.org/10.1016/j.future.2017.10.016
Manikumar, D. V. V. S., and B. Uma Maheswari. "Blockchain based DDoS mitigation using machine
learning techniques." 2020 Second international conference on inventive research in computing
applications (ICIRCA). IEEE, 2020. https://doi.org/10.1109/ICIRCA48905.2020.9183092
Kumari, Pooja, et al. "Leveraging blockchain and machine learning to counter DDoS attacks over IoT
network." Multimedia Tools and Applications (2024): 1-25.
https://link.springer.com/article/10.1007/s11042-024-18842-4
Kumar, Prabhat, et al. "Blockchain and explainable AI for enhanced decision making in cyber threat
detection." Software: Practice and Experience (2024).
https://onlinelibrary.wiley.com/doi/full/10.1002/spe.3319
Kalutharage, Chathuranga Sampath, et al. "Explainable AI-based DDOS attack identification method for
IoT networks." Computers 12.2 (2023): 32. https://doi.org/10.3390/computers12020032
Almadhor, Ahmad, et al. "Strengthening network DDOS attack detection in heterogeneous IoT
environment with federated XAI learning approach." Scientific Reports 14.1 (2024): 24322.
https://www.nature.com/articles/s41598-024-76016-6
Contribution of the members:
NAME CONTRIBUTION
Sri Nithya Bandi - Introduction and description of DoS and DDoS attacks
- Different types of attacks on edge device and cloud
system
- Differences of DoS attack detection on edge device
and cloud system
- DoS attack detection techniques analysis on edge device
and Cloud system
- Challenges of Detection on Edge vs. Cloud
- Current Industry Standards & comparison with proposed
solution
Sai Varun Ragi - What is flooding DoS attacks and their types
- Performing a DoS attack on Metasploitable2 using
Kali Linux and capturing the packets via Wireshark
- Short comparison of different types of attacks
- Snort IDS - One of the current industry standards for
detection of DoS attacks
Sreyash Somesh Mishra - Cloud-based Random Forest model for DoS attack
detection
- Cloud-based ANN model for DoS attack detection
- Justification for employing Random forest model for
cloud based DoS attack detection in IoT networks
- Comparison with some other models for cloud
deployment
- Dataset information & Data pre-processing for
NSL-KDD dataset
- Current Industry Standards & comparison with proposed
solution