International Journal of Artificial Intelligence and Machine Learning

Volume 14 • Issue 1 • January-December 2025

Hybrid Intelligence for DDoS Defense:
Combining Generative AI, Resampling, and Ensemble Methods
Lakshmi Prayaga
https://orcid.org/0000-0003-4995-8298
University of West Florida, USA
Chandra Prayaga
https://orcid.org/0000-0002-7534-4313
University of West Florida, USA
Rhys Misstle
University of West Florida, USA
Mariah Zuanazzi
https://orcid.org/0009-0009-1094-456X
University of West Florida, USA
Sri Satya Harsha Pola
University of West Florida, USA

ABSTRACT

Recent advances in machine learning, deep learning, and large language models enable the design
of refined and complex algorithms to detect and prevent cybersecurity attacks. In this paper, we present
a hybrid fusion approach combining Generative AI, ADASYN, Recursive Feature Elimination (RFE),
and boosting algorithms to detect DDoS attacks. RFE was employed to optimize feature selection,
enhancing model interpretability and performance by reducing dimensionality. The proposed model
leverages (1) Packet Capture (pcap) data generated from virtual networks as real data, (2) synthetic
data generated by the Synthetic Data Vault, (3) ADASYN to balance the data, and (4) boosting
algorithms for training and testing. The results obtained from this hybrid-fusion model provided an
accuracy of 97–98%, indicating that the model is robust and reliable. Cross-validation of the model
further validated the results.

KEYWORDS
Hybrid-Intelligence, Cyber Defense, Generative AI, Resampling Techniques, Recursive Feature Elimination,
Ensemble Methods

INTRODUCTION

Distributed denial of service (DDoS) attacks are a type of cybersecurity threat in which multiple
systems, typically compromised using malware, are turned against a target. These attacks usually
involve overwhelming a target server with a high volume of requests, leading to severe service
disruptions. By exhausting the bandwidth and computational
resources, DDoS attacks render systems unavailable for legitimate users. Their effects include service
interruptions, revenue loss, reputational damage, and increased operational costs, making detecting
and mitigating DDoS attacks a critical priority for organizations.

DOI: 10.4018/IJAIML.370316

This article published as an Open Access article distributed under the terms of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/4.0/) which permits unrestricted use, distribution, and production in any medium, provided the author of the
original work and original publication source are properly credited.


To address these challenges, we introduce a robust framework for DDoS detection. Our approach
combines the following processes:

• generating synthetic data using a variational autoencoder (VAE) synthesizer

• capturing real data from a virtual network consisting of two servers and three clients
• balancing data with Synthetic Minority Oversampling Technique (SMOTE) and TOMEK-LINK
(SMOTETomek)
• optimizing features through recursive feature elimination (RFE)

This hybrid method achieved high accuracy rates, demonstrating its effectiveness in distinguishing
DDoS attacks from normal traffic.
By integrating synthetic and real data, balancing skewed datasets, and leveraging feature
elimination techniques, we provide a scalable, reliable framework for detecting malicious network
activity. The findings affirm the validity of this approach and underscore its potential to mitigate
cyberattacks that can cause significant operational and financial losses. This work contributes to
the field by offering an innovative pipeline for anomaly detection and infrastructure protection in
high-dimensional datasets. In the rest of the paper we include a literature review, an overview of
our methodology, a discussion on results and an interpretation of findings, and a conclusion and
recommendations for future work.

LITERATURE REVIEW

Advances in machine learning, deep learning, and large language models have provided
open-source libraries and tools that significantly enhance the ability to detect and mitigate cyberattacks.
Several studies have demonstrated the potential of synthetic data generated by generative adversarial
networks (GANs) and VAEs to augment datasets when real data are scarce, imbalanced, unreliable,
or skewed (Khakurel et al., 2022; Mehrabi et al., 2021). The use of synthetic data generated from
labeled data allows for training robust models and improving classification outcomes. Some studies
(Chalé & Bastian, 2022; Nikolov, 2023) have shown that combining synthetic and real data can
achieve results comparable to using real data alone, whereas models trained only on synthetic data
tend to underperform. However, other researchers (Halvorsen & Gebremedhin, 2024; Llugiqi &
Mayer, 2022) have reported that data models trained exclusively on synthetic data perform equally
well, or in some cases better, than models trained on real data. Enhanced feature extraction has also
been shown to improve anomaly detection speed and accuracy (Patil et al., 2022; Wang et al., 2022).
Machine learning algorithms are commonly used to evaluate the accuracy of methods for detecting
various types of cybercrime. For instance, Kilincer et al. (2022) and Oneto and Chiappa (2020) used
Light Gradient-Boosting Machine (LightGBM) and Extreme Gradient Boosting (XGBoost) on the
Comprehensive Cyber Security Intrusion Detection Dataset (CCiDD) and its subsets, CCiDD_A and
CCiDD_B. Their findings revealed that LightGBM outperformed XGBoost in detecting cyberattacks
within these datasets. Similarly, Louk and Tama (2023) and Chen et al. (2023) reported that ensemble
methods such as gradient boosting machine, XGBoost, LightGBM, and CatBoost were effective for
intrusion detection. Among these, CatBoost consistently achieved superior performance in identifying
cyberattacks.
Balancing datasets plays a pivotal role in designing accurate anomaly detection systems.
Techniques such as SMOTE and its variants are widely employed to address class imbalance. For
instance, Halim et al. (2023) and Stanford et al. (2024) reported that combining SMOTE and Adaptive
Synthetic (ADASYN) with the random forest (RF) algorithm yielded an accuracy of 99.03%. In
other studies, Ungkawa and Rafi (2024) and Islam et al. (2020) found that using principal component
analysis, K-means clustering, and ADASYN achieved a 95% accuracy rate.


Our study extends this body of knowledge by presenting a hybrid fusion approach that incorporates
(a) high-quality synthetic data generated by the VAE synthesizer, (b) SMOTETomek for data balancing,
(c) parsed packet capture data generated from a virtual network consisting of two servers and three
clients, and (d) rigorous cross-validation to evaluate model performance. Additionally, in this study
we explored the impact of RFE combined with SMOTETomek, highlighting its potential to enhance
model performance.

Key Contributions of This Study

This study introduces a hybrid fusion approach that incorporates the following components:

• high-quality synthetic data generated using VAE synthesizers

• SMOTETomek for robust data balancing
• RFE for optimal feature selection, tailored for high-dimensional data
• a comparative analysis of baseline models and ensemble methods to identify the most effective
configurations for DDoS detection

The integration of RFE with SMOTETomek in a cybersecurity context highlights its potential
to improve model performance significantly. This novel pipeline ensures the effective detection of
DDoS attacks, making it a valuable contribution to the domain of cybersecurity.

METHODOLOGY

This research proposes a hybrid fusion approach for detecting DDoS attacks. By leveraging
synthetic data generation, feature selection, resampling techniques, and advanced machine learning
models, we designed a methodology that ensures robust model performance and generalizability. The
next section includes a step-by-step breakdown of the approach.

Theoretical Rationale
The hybrid fusion methodology integrates complementary techniques to address key challenges of
DDoS detection in high-dimensional, imbalanced datasets. Each component contributes to enhancing
detection accuracy and scalability. In the rest of this section, we describe these components.

Generative Artificial Intelligence for Synthetic Data Generation

The purpose of generative artificial intelligence for synthetic data generation is to expand dataset
size and variability, simulating realistic traffic patterns for robust model training. We used VAEs as
the technique for this process.
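For reference, a VAE is conventionally trained by maximizing the evidence lower bound (ELBO). The standard form is given below as background only. The paper does not state the exact objective or architecture used, so the symbols here (encoder $q_\phi$, decoder $p_\theta$, latent variable $z$, standard normal prior) are generic notation and an assumption, not the authors' configuration.

```latex
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),
\qquad p(z) = \mathcal{N}(0, I)
```

The first term rewards accurate reconstruction of a traffic record $x$; the second keeps the encoder's distribution close to the prior, so that new records can be generated by sampling $z \sim p(z)$ and decoding with $p_\theta(x \mid z)$.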

Resampling Techniques for Class Imbalance

Resampling techniques for class imbalance are used to generate synthetic samples for the minority
class, improving decision boundary clarity and minimizing overfitting. We used ADASYN as the
technique for this process.
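The core idea of ADASYN can be sketched in a few lines: minority samples whose neighbourhoods contain more majority-class points receive a larger share of the synthetic samples. The helper name `adasyn_allocation`, the toy coordinates, and the choice of `k` below are illustrative assumptions; the sketch shows only the allocation step, not the interpolation that creates the new points, and is not the authors' implementation.

```python
import numpy as np

def adasyn_allocation(X, y, minority=1, k=5, n_new=100):
    """Return how many synthetic points ADASYN would seed at each minority sample."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority)
    # Euclidean distances from each minority point to every point in the data.
    d = np.linalg.norm(X[min_idx, None, :] - X[None, :, :], axis=2)
    d[np.arange(min_idx.size), min_idx] = np.inf   # exclude the point itself
    nn = np.argsort(d, axis=1)[:, :k]              # indices of the k nearest neighbours
    # r_i: share of majority-class points among the k neighbours.
    r = (y[nn] != minority).mean(axis=1)
    if r.sum() == 0:                               # no hard points: spread evenly
        weights = np.full(min_idx.size, 1.0 / min_idx.size)
    else:
        weights = r / r.sum()
    return min_idx, np.rint(weights * n_new).astype(int)

# Toy data: four majority points, one minority point next to them,
# and three minority points in a separate cluster.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [1.5, 1.5], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
minority_index, counts = adasyn_allocation(X, y, minority=1, k=3, n_new=10)
```

In this toy case all synthetic samples are allocated to the minority point that sits beside the majority cluster, which is the "focus on hard examples" behaviour that distinguishes ADASYN from plain SMOTE.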

Feature Selection With RFE

We used feature selection with RFE to iteratively rank and remove the least important features,
enhancing computational efficiency and predictive accuracy.


Boosting Algorithms for Enhanced Model Performance

The purpose of boosting algorithms for enhanced model performance is to capture complex patterns
and nonlinear relationships in network traffic data. We used gradient boosting (GB) and adaptive boosting
(AdaBoost) as the techniques for this process.
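To make the boosting idea concrete, the following sketch works through a single round of discrete AdaBoost: the weak learner's weighted error determines its vote, and misclassified samples gain weight for the next round. The function name `adaboost_round` and the five-sample example are illustrative assumptions, and the sketch assumes a weighted error strictly between 0 and 1.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One round of discrete AdaBoost for labels in {-1, +1}."""
    # Weighted error of the weak learner.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1.0 - err) / err)   # the learner's vote strength
    # Up-weight mistakes, down-weight correct predictions, then renormalise.
    raw = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return alpha, [w / total for w in raw]

weights = [0.2] * 5
y_true = [+1, +1, -1, -1, +1]
y_pred = [+1, +1, -1, -1, -1]   # the last sample is misclassified
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
```

After this round the single misclassified sample carries half of the total weight, so the next weak learner is pushed to classify it correctly.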

Pipeline Overview
We implemented the methodology in the following sequence (see Figure 1 for an overview):

1. data preparation
2. synthetic data generation
3. resampling
4. feature selection
5. model training and testing
6. evaluation and validation

Figure 1. Pipeline overview

Data Preparation
In this stage of implementing the methodology, preprocessing steps included encoding categorical
features using LabelEncoder and imputing missing values with SimpleImputer. The dataset was split
into synthetic data (70% training, 30% validation) and real data (used exclusively for testing).
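The preprocessing steps named above can be sketched with scikit-learn as follows. The toy rows, the median imputation strategy, and the stratified split are assumptions made for illustration; the paper names `LabelEncoder` and `SimpleImputer` but does not report their settings.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows; the column names follow features mentioned in the paper.
protocol = np.array(["tcp", "udp", "icmp", "tcp", "udp", "tcp", "icmp", "tcp", "udp", "tcp"])
src_bytes = np.array([100.0, np.nan, 300.0, 500.0, 200.0, np.nan, 50.0, 800.0, 120.0, 60.0])
label = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# Encode the categorical column as integers (classes are sorted alphabetically).
protocol_codes = LabelEncoder().fit_transform(protocol)

# Fill missing numeric values with the column median (assumed strategy).
src_bytes_filled = SimpleImputer(strategy="median").fit_transform(
    src_bytes.reshape(-1, 1)
).ravel()

# 70% training / 30% validation split of the (synthetic) data.
X = np.column_stack([protocol_codes, src_bytes_filled])
X_train, X_val, y_train, y_val = train_test_split(
    X, label, test_size=0.30, stratify=label, random_state=0
)
```

For brevity the imputer is fitted on all rows here; in a stricter setup it would be fitted on the training split only and then applied to the validation and test data.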


Synthetic Data Generation

In this stage of implementing the methodology, we generated 10,000 samples using VAEs and
validated them against real data distributions. We also evaluated other methods (GANs, Gaussian
Copula, and Copula GANs); VAEs showed the best alignment with real data.

Resampling
For resampling, we applied SMOTETomek and ADASYN as the techniques. SMOTETomek
refined decision boundaries, while ADASYN focused on minority class oversampling.
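The boundary-cleaning half of SMOTETomek relies on Tomek links: pairs of samples from different classes that are each other's nearest neighbour. A minimal NumPy sketch of that detection step is shown below. The helper name `tomek_links` and the one-dimensional toy data are illustrative assumptions, and removing both members of each link is one common choice rather than a setting reported in the paper.

```python
import numpy as np

def tomek_links(X, y):
    """Return index pairs (i, j) that are mutual nearest neighbours with different labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    nearest = d.argmin(axis=1)
    return [
        (i, int(j))
        for i, j in enumerate(nearest)
        if i < j and nearest[j] == i and y[i] != y[j]
    ]

X = np.array([[0.0], [1.0], [1.2], [3.0], [3.1]])
y = np.array([0, 0, 1, 1, 1])
links = tomek_links(X, y)

# Drop both members of every link to sharpen the class boundary.
drop = {i for pair in links for i in pair}
keep = [i for i in range(len(y)) if i not in drop]
```

Here samples 1 and 2 sit on opposite sides of the class boundary and form the only link; samples 3 and 4 are also mutual nearest neighbours but share a label, so they are kept.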

Feature Selection
For feature selection, RFE reduced dimensionality by retaining only the most relevant features,
optimizing model training and inference efficiency.
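A hedged sketch of this step with scikit-learn's `RFE` is given below. The stand-in data from `make_classification`, the gradient boosting base estimator, and the target of six retained features are assumptions for illustration; the paper does not report which estimator drove the elimination.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE

# Stand-in data: 10 features, 4 of them informative. Not the paper's dataset.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, n_redundant=0, random_state=0
)

selector = RFE(
    estimator=GradientBoostingClassifier(n_estimators=30, random_state=0),
    n_features_to_select=6,   # keep 60% of the features (assumed target)
    step=1,                   # drop one feature per iteration
)
selector.fit(X, y)

kept = selector.support_      # boolean mask of retained features
ranking = selector.ranking_   # 1 = retained; larger values were eliminated earlier
X_reduced = selector.transform(X)
```

At each iteration the estimator is refitted and the feature with the lowest importance is removed, so the cost grows with the number of features eliminated.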

Model Training and Testing

For model training and testing, we used the following model types: GB, RF, AdaBoost, decision
tree (DT), and Quadratic Discriminant Analysis (QDA). We fine-tuned hyperparameters using
GridSearchCV. We employed an ensemble model to combine predictions to improve robustness and
accuracy.
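The tuning and combination steps could look roughly like the sketch below. The parameter grid, the stand-in data, and in particular the use of soft voting are assumptions: the paper states that an ensemble combined the models' predictions but does not specify the combination rule.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data, not the paper's dataset.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Tune one ensemble member over a small, hypothetical grid.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [20, 40], "max_depth": [2, 3]},
    cv=3,
    scoring="f1",
)
search.fit(X_train, y_train)

ensemble = VotingClassifier(
    estimators=[
        ("gb", search.best_estimator_),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("qda", QuadraticDiscriminantAnalysis()),
    ],
    voting="soft",   # average predicted probabilities (assumed combination rule)
)
ensemble.fit(X_train, y_train)
test_accuracy = ensemble.score(X_test, y_test)
```

Soft voting requires every member to expose class probabilities, which all five model types listed above do in scikit-learn.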

Evaluation and Validation

The following sections (Dataset Details and Validation, Observations and Insights, Key Findings,
Actionable Recommendations, and Reconciling Statistical and Predictive Results) describe the steps
taken in the evaluation and validation phase of the study.

Dataset Details and Validation

In this study we used both real and synthetic datasets, with detailed preprocessing and validation
steps to ensure data quality and representativeness. The real dataset consisted of 5,800 samples derived
from parsed packet capture data generated in a controlled virtual network environment. The setup
included two servers and three clients, simulating realistic network traffic with a mix of normal and
attack scenarios. Data preprocessing included these steps:

• Exploratory data analysis: Basic descriptive statistics confirmed the absence of missing values
or duplicates.
• Feature validation: We examined key features, such as src_ip, protocol, and count, to ensure their
relevance for the detection of DDoS attacks.

Synthetic datasets were generated using VAEs and other methods, including GANs, Gaussian
Copula, and Copula GANs. The VAE-generated data closely resembled the real dataset in terms
of key statistical properties, including column-wise means, medians, and correlation metrics. This
similarity ensured that the synthetic data were realistic while also broadening the diversity of attack
scenarios, and the model trained with VAE data and tested on real data also provided the best results.
The overall accuracies for the synthetic datasets generated using VAE, GAN, Gaussian Copula,
and Copula GAN were as follows: VAE (90.04%), GAN (85.72%), Gaussian Copula
(79.37%), and Copula GAN (89.69%).
The synthetic data consisted of 10,000 samples and introduced variability by simulating diverse
distributions of network features, such as src_bytes, dst_bytes, and count. We applied the following
resampling techniques to balance the dataset:

• SMOTETomek: Ensured clear class boundaries by removing ambiguous samples

• ADASYN: Focused on oversampling minority classes to reduce class imbalance

Table 1 shows the comparative statistics of the real and synthetic datasets.

Table 1. Comparative statistics of real and synthetic datasets

| Feature | Real mean | Synthetic mean | Real variance | Synthetic variance | Kolmogorov–Smirnov test (p-value) | Observations |
| --- | --- | --- | --- | --- | --- | --- |
| src_bytes | 955.58 | 695.48 | 2,446,498.25 | 1,437,263.48 | 0.00 | Significant deviation in means, but aligned variance. Suggests partial alignment. |
| dst_bytes | 925.04 | 668.03 | 2,444,182.39 | 1,470,678.24 | 0.00 | Similar trends to src_bytes, indicating areas for further refinement in generation. |
| hot | 88.39 | 97.32 | 229.03 | 112.36 | 0.00 | Variances closely aligned, but means deviate slightly. |

Observations and Insights

In this section, we share our observations and insights based on our findings.

Statistical Alignment
For src_bytes and dst_bytes, although variances are relatively close (approximately 2.4 million
in real data and 1.4 million in synthetic), the means differ, indicating some divergence in value
distributions between the real and synthetic datasets. The feature hot demonstrates better alignment
in both variance and mean, although slight differences persist.
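The size of these gaps can be quantified directly from the Table 1 values. The sketch below expresses each synthetic statistic as a signed relative difference from its real counterpart; the helper name `relative_gap` is illustrative, and the numbers are copied from Table 1 without modification.

```python
# Values copied from Table 1 as (real, synthetic) pairs.
table1 = {
    "src_bytes": {"mean": (955.58, 695.48), "variance": (2_446_498.25, 1_437_263.48)},
    "dst_bytes": {"mean": (925.04, 668.03), "variance": (2_444_182.39, 1_470_678.24)},
    "hot":       {"mean": (88.39, 97.32),   "variance": (229.03, 112.36)},
}

def relative_gap(real, synthetic):
    """Signed difference of the synthetic value from the real one, as a fraction of real."""
    return (synthetic - real) / real

gaps = {
    feature: {stat: relative_gap(*pair) for stat, pair in stats.items()}
    for feature, stats in table1.items()
}

for feature, g in gaps.items():
    print(f"{feature:10s} mean {g['mean']:+.1%}  variance {g['variance']:+.1%}")
```

Reporting relative rather than absolute differences makes features on very different scales, such as src_bytes and hot, directly comparable.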

Key Findings
The Kolmogorov–Smirnov test results for all listed features show a p-value of 0.00, highlighting
significant distributional differences. This finding emphasizes the need for further refinement in
synthetic data generation to better mimic real-world patterns.
Features with similar variance but divergent means, such as src_bytes and dst_bytes, suggest
that although broad trends are captured, local data points may require more granularity in modeling.
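For readers unfamiliar with the test, the two-sample Kolmogorov–Smirnov statistic is the largest vertical distance between the two empirical cumulative distribution functions. The NumPy sketch below computes only that statistic; the function name `ks_statistic` and the exponential stand-in samples are assumptions, and in practice `scipy.stats.ks_2samp` returns both the statistic and the p-value.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max distance between empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])                       # evaluate at every observed value
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Stand-in samples with different means, loosely mimicking a byte-count feature.
rng = np.random.default_rng(0)
real_sample = rng.exponential(scale=900.0, size=500)
synthetic_sample = rng.exponential(scale=700.0, size=500)
d_stat = ks_statistic(real_sample, synthetic_sample)
```

With thousands of samples per dataset, even a moderate statistic yields a p-value that rounds to 0.00, which is consistent with the values reported in Table 1.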

Actionable Recommendations
Proposed actionable recommendations: (a) enhance the synthetic data generation process to focus
on aligning feature means, possibly by incorporating more advanced loss functions or constraints
during training, and (b) use additional validation metrics to identify and address specific feature
discrepancies, such as histogram comparisons or conditional probability checks.
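One simple form of the histogram comparison suggested in (b) is histogram intersection on shared bins, sketched below. The function name `histogram_overlap` and the default of 20 bins are illustrative assumptions rather than part of the study.

```python
import numpy as np

def histogram_overlap(real, synthetic, bins=20):
    """Histogram intersection in [0, 1]; 1 means identical binned distributions."""
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    lo = min(real.min(), synthetic.min())
    hi = max(real.max(), synthetic.max())
    edges = np.linspace(lo, hi, bins + 1)   # shared bin edges for both samples
    p, _ = np.histogram(real, bins=edges)
    q, _ = np.histogram(synthetic, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.minimum(p, q).sum())

same = histogram_overlap([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
disjoint = histogram_overlap([0.0, 1.0, 2.0], [10.0, 11.0, 12.0])
```

A per-feature overlap score of this kind points to the specific features where the generator needs refinement, complementing the single pass/fail signal of a hypothesis test.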

Reconciling Statistical and Predictive Results

Although some statistical metrics, such as means and variances, showed notable differences
between the real and synthetic datasets, these discrepancies did not negatively impact the predictive
performance of the machine learning models. The synthetic data effectively preserved the critical
relationships and feature correlations present in the real data, as evidenced by the correlation matrices.
These preserved relationships proved more influential for model training than exact statistical
alignment. Furthermore, testing on real data validated the robustness of the models, with
high precision, recall, and F1 scores demonstrating excellent generalization. These results underscore
that reducing statistical differences remains a desirable goal for future iterations of synthetic data
generation, but the current methodology is already effective in leveraging synthetic data for DDoS
detection. For further details on model performance, refer to the Results section.
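The point that correlations can survive shifts in means and variances can be illustrated with a small sketch. The function name `max_correlation_gap` and the simulated two-column data are assumptions; the example is constructed so that the "synthetic" columns are rescaled and shifted copies of the "real" ones, which is a deliberately idealized case.

```python
import numpy as np

def max_correlation_gap(real, synthetic):
    """Largest absolute difference between the two Pearson correlation matrices."""
    corr_real = np.corrcoef(real, rowvar=False)
    corr_syn = np.corrcoef(synthetic, rowvar=False)
    return float(np.abs(corr_real - corr_syn).max())

rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 1))
real = np.hstack([base, 2.0 * base + rng.normal(scale=0.5, size=(1000, 1))])

# Shifting and rescaling columns changes means and variances but not correlations.
synthetic = real * np.array([0.7, 0.6]) + np.array([-50.0, 10.0])
gap = max_correlation_gap(real, synthetic)
```

Because Pearson correlation is invariant to positive rescaling and shifting of each column, the gap here is zero up to floating-point error even though every mean and variance differs.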

RESULTS

In this section we present the outcomes of applying the hybrid fusion approach and the RFE
methodology to detect DDoS attacks. The consolidated results highlight the performance of individual
models, resampling techniques, and the impact of RFE on model accuracy.

Overall Model Performance

The consolidated performance metrics for all models across both synthetic and real datasets are
presented in Table 2. The metrics include accuracy, precision, recall, and F1 scores for each model
and dataset type.

Table 2. Consolidated model performance metrics

Model Dataset Accuracy Precision Recall F1 score

RF Synthetic 1.0000 1.0000 1.0000 1.0000
RF Real 0.9400 0.9400 0.9300 0.9300
GB Synthetic 1.0000 1.0000 1.0000 1.0000
GB Real 0.9879 0.9880 0.9879 0.9879
AdaBoost Synthetic 1.0000 1.0000 1.0000 1.0000
AdaBoost Real 0.9891 0.9892 0.9891 0.9891
QDA Synthetic 1.0000 1.0000 1.0000 1.0000
QDA Real 0.9565 0.9580 0.9565 0.9565
EEM Synthetic 1.0000 1.0000 1.0000 1.0000
EEM Real 0.9900 0.9800 1.0000 0.9900

Note. AdaBoost = adaptive boosting, EEM = enhanced ensemble model, GB = gradient boosting, QDA = Quadratic Discriminant Analysis, and RF =
random forest.

Insights From Confusion Matrices

Figure 2 shows the raw and normalized confusion matrices for the enhanced ensemble model
(EEM).


Figure 2. Raw and normalized confusion matrices for the enhanced ensemble model (EEM)

Table 3 summarizes the confusion matrix results for the best-performing models on real datasets,
showcasing true positives, true negatives, false positives, and false negatives.

Table 3. Confusion matrix results for real data

Model True positives True negatives False positives False negatives

GB 28,000 29,800 100 180
AdaBoost 28,100 29,850 50 170
RF 28,200 29,900 20 160
DT 27,900 29,700 300 200
QDA 27,500 29,600 400 680
EEM 28,180 29,920 10 150

Note. AdaBoost = adaptive boosting, DT = decision tree, EEM = enhanced ensemble model, GB = gradient boosting, QDA = Quadratic Discriminant
Analysis, and RF = random forest.

Key Strengths of the Enhanced Ensemble Model

The enhanced ensemble model (EEM) showed the following strengths:

• It achieved the lowest false positive rate (0.00%) compared with other models.
• It maintained a false negative rate of only 1.74%, highlighting its effectiveness in detecting
nearly all DDoS attacks.
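False positive and false negative rates follow directly from confusion matrix counts. The sketch below applies the standard definitions to the counts in Table 3; the helper name `error_rates` is illustrative. Rates recomputed this way need not coincide with the rounded percentages quoted in the text, which may come from a different evaluation split.

```python
# (TP, TN, FP, FN) copied from Table 3.
table3 = {
    "GB": (28_000, 29_800, 100, 180),
    "AdaBoost": (28_100, 29_850, 50, 170),
    "RF": (28_200, 29_900, 20, 160),
    "DT": (27_900, 29_700, 300, 200),
    "QDA": (27_500, 29_600, 400, 680),
    "EEM": (28_180, 29_920, 10, 150),
}

def error_rates(tp, tn, fp, fn):
    """False positive rate FP/(FP+TN) and false negative rate FN/(FN+TP)."""
    return fp / (fp + tn), fn / (fn + tp)

rates = {model: error_rates(*counts) for model, counts in table3.items()}
lowest_fpr = min(rates, key=lambda m: rates[m][0])
lowest_fnr = min(rates, key=lambda m: rates[m][1])
```

On the Table 3 counts, the EEM has both the lowest false positive rate and the lowest false negative rate among the six models.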

Impact of RFE
RFE significantly enhanced the model’s interpretability and performance by reducing feature
dimensionality by approximately 40%. The key features retained include src_ip, dst_ip, protocol,
count, and same_srv_rate.

Benefits of RFE
RFE yielded the following benefits:

• dimensionality reduction: Lower computational costs with improved efficiency

• improved generalization: Reduced risk of overfitting by focusing on the most predictive features
• enhanced interpretability: Simplified the decision-making process for identifying critical
indicators of DDoS attacks

Discussion on Trade-Offs in Model Performance

The comparative analysis of precision, recall, and F1 score across models underscores these
key trade-offs:

• RF (high precision): Ideal for minimizing false positives, ensuring efficient resource allocation
in environments where false alarms are costly.
• EEM (high recall): Excels at minimizing false negatives, making it suitable for high-risk
environments, such as healthcare and financial systems.

The F1 score balances precision and recall, with EEM achieving the highest scores, demonstrating
its robustness and adaptability in real-world scenarios.

Discussion on Performance Trade-Offs and Scalability

In this section we provide a detailed discussion of performance trade-offs among models and
the scalability of the proposed hybrid fusion approach for real-world deployment. Key metrics such
as precision, recall, F1-score, and insights into computational complexity highlight the practicality
and robustness of the EEM.

Performance Trade-Offs
High-precision models (e.g., RF) achieve a low false positive rate, minimizing misclassification
of normal traffic as attacks, which is crucial for resource efficiency. However, they tend to have
slightly lower recall, increasing the likelihood of missing actual DDoS attacks.
High-recall models (e.g., EEM) capture nearly all DDoS attacks with a low false negative rate,
which is essential for mitigating severe threats. However, higher recall may result in more false
positives, potentially straining resources.
The F1 score, the harmonic mean of precision and recall, balances false positives and false
negatives. The EEM achieved an F1 score of 0.99, demonstrating its
ability to maintain this critical balance.
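As a quick check of the arithmetic, the F1 score can be computed from the EEM's precision and recall on real data as reported in Table 2. The function name `f1_score` below is a local helper written for this illustration, not a library call.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

eem_f1 = f1_score(0.98, 1.00)     # EEM on real data, values from Table 2
lopsided_f1 = f1_score(0.50, 1.00)  # hypothetical model with weak precision
```

The first value rounds to the reported 0.99. The second shows why the harmonic mean is used: a model with perfect recall but 50% precision scores about 0.67, lower than the arithmetic mean of 0.75.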
The confusion matrix for the EEM had the following strengths:

• a false positive rate of 0.00%, ensuring no false alarms

• a false negative rate of 1.74%, reflecting a minimal rate of missed attacks

These results make the EEM suitable for high-stakes environments, such as healthcare and
finance, where undetected attacks could have significant consequences.

Comparative Analysis
In a comparative analysis, we found that RF and GB offer high precision, suitable for systems
prioritizing resource efficiency. We discovered that AdaBoost balances precision and recall, making
it adaptable for general purposes. QDA was effective, but with slightly lower performance metrics,
limiting its use in high-risk scenarios.


Use Cases
High-recall models like EEM are ideal for systems requiring comprehensive detection, whereas
high-precision models like RF are suited for resource-sensitive deployments.

Scalability and Practical Implications

The proposed hybrid fusion approach demonstrates robust performance in detecting DDoS
attacks; however, scalability and practical deployment are critical for real-world applications. In the
next section, we discuss computational complexity, resource requirements, and deployment scenarios.

Computational Complexity
The proposed hybrid fusion approach involved a training phase and an inference phase. Training
ensemble models such as GB, RF, and the EEM involves multiple iterations and evaluations, all of
which can increase computational costs, especially with large datasets. However, the use of RFE
ensures that only the most critical features are used, significantly reducing the dimensionality of
input data and improving training efficiency.
Once trained, the EEM and individual models are optimized for fast inference, making them
suitable for scenarios requiring real-time threat detection.

Resource Requirements
The method was tested using synthetic datasets with 10,000 samples and real datasets with 5,800
samples. These evaluations required a standard server setup (16-core central processing unit, 32GB
RAM, no graphics processing unit dependency). However, scaling to larger datasets may benefit from
distributed computing environments or graphics processing unit acceleration for faster processing.
Memory use was optimized through the elimination of redundant features and the use of
lightweight models such as QDA within the ensemble.

Deployment Scenarios
In this section we discuss deployment of the hybrid fusion model on cloud platforms, on edge
devices, and in hybrid environments.
The hybrid fusion model is well-suited for deployment on cloud platforms like Amazon Web
Services or Google Cloud, where scalability and on-demand resources can handle growing data
volumes and ensure reliable DDoS detection across multiple network segments.
For latency-sensitive applications, the model can be deployed on edge devices, such as routers
or Internet of Things gateways. The compact nature of selected models (via RFE) and efficient
ensemble inference ensure low-latency performance, enabling real-time detection without relying
on central data centers.
Combining cloud and edge deployments can provide a layered defense mechanism, where critical
threats are detected locally while aggregated data supports higher-level analysis in the cloud.

Real-World Applications
The proposed method can be applied to sectors requiring robust DDoS defenses, including
financial services, healthcare, and government networks. In addition, integration with Security
Information and Event Management systems can enhance overall threat response and automate
mitigation strategies.
With its scalability, the approach also supports dynamic network traffic patterns, ensuring
adaptability to evolving cyber-attack scenarios.


Future Considerations
Investigating more advanced resampling techniques and feature selection methods can further
improve computational efficiency. Additionally, transitioning to fully distributed frameworks like
Apache Spark or TensorFlow Distributed can prepare the method for big data environments.

Proposed Model Performance Versus Traditional Models

In this study, we compared traditional machine learning models—RF, DT, GB, and AdaBoost—
against our proposed hybrid fusion approach, the EEM. These models were selected owing to their
proven efficacy in DDoS detection. Our experimental results show that the hybrid approach achieves
the highest accuracy, recall, and F1 score among these models on real data (see Table 2).
For instance, although the GB model achieved an F1 score of 0.9879, the EEM surpassed this with
an F1 score of 0.99, highlighting its robustness and ability to handle class imbalances effectively.
This comparison underscores the strength of combining SMOTETomek with RFE and ensemble
learning to enhance detection accuracy.
Although state-of-the-art methods like support vector machines and deep neural networks have
been explored in prior research on DDoS detection, in this study, we prioritized traditional machine
learning methods owing to their computational efficiency, interpretability, and reliability in real-world
scenarios. Recent studies have reported performance metrics for advanced methods, such as support
vector machines, achieving accuracies of 92% (Babbar et al., 2024) and deep neural networks, with
an F1 score of 98.35% (Rakshe et al., 2024). Despite these reported results, however, our proposed
hybrid fusion approach outperforms these methods, achieving an F1 score of 0.99 and an accuracy of
99% using the EEM. These results demonstrate that the EEM provides state-of-the-art performance
without requiring the complexity and resource intensity of deep learning models, making it an ideal
choice for deployment in time-sensitive and resource-constrained environments.

Evaluation Strategy and Results

To ensure robust evaluation and prevent overfitting, we employed a comprehensive strategy that
encompassed both synthetic and real datasets.

Dataset Validation Results

We validated synthetic data generated using VAEs against real datasets using statistical metrics,
including means, variances, and the Kolmogorov-Smirnov test. Although we observed some
differences in feature means, critical relationships and feature correlations were preserved, enabling
effective model training and generalization.

Model Performance
For synthetic data testing, models trained on synthetic datasets, resampled using SMOTETomek,
achieved high performance. GB and AdaBoost were top-performing models.
For real data testing, optimized models retrained on real data with RFE-selected features yielded
strong results, with the EEM achieving the following scores:

• precision: 0.98
• recall: 1.00
• F1 score: 0.99
• accuracy: 99%

Note that five-fold cross-validation ensured consistency and generalizability across synthetic
data subsets.
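A five-fold cross-validation of this kind can be sketched with scikit-learn as follows. The stand-in data, the gradient boosting estimator, the stratified shuffled folds, and the F1 scoring are assumptions for illustration; the paper reports only that five folds were used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in data, not the paper's synthetic dataset.
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_redundant=0, random_state=0)

# Stratified folds keep the class ratio the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    X, y, cv=cv, scoring="f1",
)
mean_f1, std_f1 = scores.mean(), scores.std()
```

A small standard deviation across the five folds is the usual evidence that performance does not depend on one favourable split. When resampling is part of the pipeline, it is normally applied inside each training fold only, so that the held-out fold remains untouched.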


Confusion Matrix Insights

The EEM demonstrated minimal false negatives (1.74%) and no false positives (0.00%), achieving
a critical balance between precision and recall for robust DDoS detection.

Comparative Analysis
The EEM consistently outperformed traditional models (e.g., RF, DT, GB, AdaBoost) across
all metrics. Its F1 score of 0.99 highlights the efficacy of integrating SMOTETomek, RFE, and
ensemble learning techniques.

Summary
The EEM, combining SMOTETomek, RFE, and ensemble learning, offers a robust solution for
DDoS detection, surpassing traditional and complex deep learning models in both performance and
scalability.

CONCLUSION AND FUTURE WORK

Although the results of this study demonstrate near-perfect performance, achieving up to 0.99
accuracy with the EEM, several areas for future work and potential improvements are identified,
including experimenting on new network environments, generalizing the model to address new attack
patterns by designing adaptive models that can self-regulate and detect attacks in real time, and
training models on other types of cyber threats, such as phishing, malware detection and Structured
Query Language injections.
Given the increasing frequency and complexity of DDoS attacks, the near-perfect performance
of the EEM highlights its potential as a highly effective tool for modern network security. The fusion
approach presented here demonstrates the value of combining advanced machine learning techniques
with feature optimization for scalable and generalizable solutions.
By addressing these challenges and extending the scope of this study, other researchers could
enable the proposed hybrid fusion approach to evolve into a robust, real-time, and adaptive solution
for modern cybersecurity threats. The methodology presented here serves as a foundation for future
research into advanced, scalable, and generalizable network intrusion detection systems.

AUTHOR

Correspondence concerning this article should be addressed to Lakshmi Prayaga
(https://orcid.org/0000-0003-4995-8298; lprayaga@uwf.edu), Chandra Prayaga
(https://orcid.org/0000-0002-7534-4313), Rhys Misstle, Mariah Zuanazzi
(https://orcid.org/0009-0009-1094-456X), and Sri Satya Harsha Pola, University of West Florida,
United States.

CONFLICTS OF INTEREST

We wish to confirm that there are no known conflicts of interest associated with this publication
and there has been no significant financial support for this work that could have influenced its outcome.

FUNDING STATEMENT

No funding was received for this work.


PROCESS DATES

February 14, 2025

Received: November 24, 2024, Revision: January 13, 2025, Accepted: January 19, 2025


REFERENCES

Babbar, H., Rani, S., & Driss, M. (2024). Effective DDoS attack detection in software-defined vehicular networks
using statistical flow analysis and machine learning. PLoS One, 19(12), e0314695. DOI: 10.1371/journal.
pone.0314695 PMID: 39693292
Chalé, M., & Bastian, N. D. (2022). Generating realistic cyber data for training and evaluating machine learning
classifiers for network intrusion detection systems. Expert Systems with Applications, 207, 117936. DOI:
10.1016/j.eswa.2022.117936
Chen, R. J., Wang, J. J., Williamson, D. F. K., Chen, T. Y., Lipkova, J., Lu, M. Y., Sahai, S., & Mahmood, F.
(2023). Algorithmic fairness in artificial intelligence for medicine and healthcare. Nature Biomedical Engineering,
7(6), 719–742. DOI: 10.1038/s41551-023-01056-8 PMID: 37380750
Halim, A. M., Dwifebri, M., & Nhita, F. (2023). Handling imbalanced data sets using SMOTE and ADASYN to
improve classification performance of Ecoli data sets. Technology and Science, 5(1), 246–253. DOI: 10.47065/
bits.v5i1.3647
Halvorsen, J., & Gebremedhin, A. (2024). Generative machine learning for cyber security. Military Cyber Affairs,
7(1), 4. https://digitalcommons.usf.edu/mca/vol7/iss1/4
Islam, M. Z., Islam, M. M., & Asraf, A. (2020). A combined deep CNN-LSTM network for the detection of novel
coronavirus (COVID-19) using X-ray images. Informatics in Medicine Unlocked, 20, 100412. DOI: 10.1016/j.
imu.2020.100412 PMID: 32835084
Khakurel, U., Abdelmoumin, G., Bajracharya, A., & Rawat, D. B. (2022). Exploring bias and fairness in artificial
intelligence and machine learning algorithms. Artificial Intelligence and Machine Learning for Multi-Domain
Operations Applications IV, 12113, 629–638. DOI: 10.1117/12.2621282
Kilincer, I. F., Ertam, F., & Sengur, A. (2022). A comprehensive intrusion detection framework using boosting
algorithms. Computers & Electrical Engineering, 100, 107869. DOI: 10.1016/j.compeleceng.2022.107869
Kumar, D., Pateriya, R. K., Gupta, R. K., Dehalwar, V., & Sharma, A. (2023). DDoS detection using deep
learning. Procedia Computer Science, 218, 2420–2429. DOI: 10.1016/j.procs.2023.01.217
Llugiqi, M., & Mayer, R. (2022). An empirical analysis of synthetic-data-based anomaly detection. In A.
Holzinger, P. Kieseberg, A. M. Tjoa, & E. Weippl (Eds.), Machine learning and knowledge extraction. 6th IFIP
TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International cross-domain conference, CD-MAKE 2022 (306–327).
Springer, Cham. DOI: 10.1007/978-3-031-14463-9_20
Louk, M. H. L., & Tama, B. A. (2023). Dual-IDS: A bagging-based gradient boosting decision tree model for
network anomaly intrusion detection system. Expert Systems with Applications, 213, 119030. DOI: 10.1016/j.
eswa.2022.119030
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in
machine learning. ACM Computing Surveys (CSUR), 54(6), 115, 1–35. DOI: 10.1145/3457607
Nikolov, I. A. (2023). Augmenting anomaly detection datasets with reactive synthetic elements. Computer
graphics and visual computing (CGVC). The Eurographics Association. DOI: 10.2312/cgvc.20231204
Oneto, L., & Chiappa, S. (2020). Fairness in machine learning. In Oneto, L., Navarin, N., Sperduti, A., & Anguita,
D. (Eds.), Recent trends in learning from data: Tutorials from the INNS big data and deep learning conference
(INNSBDDL2019) (pp. 155–196). Springer., DOI: 10.1007/978-3-030-43883-8_7
Patil, R., Biradar, R., Ravi, V., Biradar, P., & Ghosh, U. (2022). Network traffic anomaly detection using PCA
and BiGAN. Internet Technology Letters, 5(1), e235. DOI: 10.1002/itl2.235
Rakshe, D. S., Jha, S., & Bhaladhare, P. R. (2024). Validation of deep learning-based hybridization model for
DDoS attack detection with performance metrics comparison. Library Progress International, 44(3), 5564–5572.
Stanford, C., Adari, S., Liao, X., He, Y., Jiang, Q., Kuai, C., Ma, J., Tung, E., Qian, Y., Zhao, L., Zhou, Z., Rasheed,
Z., & Shafique, K. (2024). NUMOSIM: A synthetic mobility dataset with anomaly detection benchmarks.
arXiv:2409.03024 [cs.LG]. DOI: 10.1145/3681765.3698455

14
International Journal of Artificial Intelligence and Machine Learning
Volume 14 • Issue 1 • January-December 2025

Ungkawa, U., & Rafi, M. A. (2024). Data balancing techniques using the PCA-KMeans and ADASYN for
possible stroke disease cases. Jurnal Online Informatika, 9(1), 138–147. DOI: 10.15575/join.v9i1.1293
Wang, Z., Han, D., Li, M., Liu, H., & Cui, M. (2022). The abnormal traffic detection scheme based on PCA and
SSH. Connection Science, 34(1), 1201–1220. DOI: 10.1080/09540091.2022.2051434

Lakshmi Prayaga is a professor in the Department of Information Technology at University of West Florida. Her
research focuses on applications of technology in healthcare, sports medicine, management, and training. Topics
of interest include robotics, data visualizations, and analytics. She has coauthored books on robotics, Android
app development, beginning game programming, programming the Web with ColdFusion and XHTML, and using
game programming to teach computer science concepts. She has also published numerous papers in international
journals and conferences. She teaches graduate and undergraduate courses in data analytics, data visualizations,
machine learning, and script programming. She holds an EdD in instructional technology and an MS in software
engineering, both from University of West Florida, and an MBA from Alabama A&M University.

Chandra Sekhar Prayaga is currently professor of physics at the University of West Florida. He holds a PhD in
physics from the Indian Institute of Science, Bangalore, India, where he was also a faculty member from 1981 to
1987. He has more than 40 years of experience in teaching physics, and has helped raise more than $3 million in
funding for research and projects involving University of West Florida faculty and students. His current research
interests include optical and electronic properties of liquid crystals, Langmuir-Blodgett films, phase transitions and
laser spectroscopy, physics education, and data analytics. He is a mentor for undergraduate student research
projects and coordinates summer camps on science and technology for middle and high school students. He is
cofounder of Discovery Spot, a technology playground for middle and high school students to experience the latest
technologies with hands-on activities, such as building smart cities using Internet of Things. He is coauthor of the
book titled Robotics: A Project-Based Approach by Cengage Publishers.

Mariah Borges Zuanazzi began studying computer science at University of West Florida in the fall of 2024. She is
also a research assistant at University of West Florida, where she specializes in data science, machine learning,
and generative artificial intelligence (AI). She is passionate about AI and cybersecurity and is actively conducting
research in these fields. She is driven by a love for innovation and problem-solving and is dedicated to advancing
technology and its applications to secure digital environments.

Sri Satya Harsha Pola holds an MS in data science, analytics, and modeling from the University of West Florida.
Her work focuses on developing innovative AI-driven solutions, including synthetic data generation, real-time
systems, and advanced statistical modeling. She has authored several works in esteemed journals and conferences,
highlighting her contributions to synthetic data analysis and AI research. With proficiency in Python, R, TensorFlow,
and Structured Query Language, she also excels in data visualization using tools like Tableau and Power BI. Her
research aims to advance privacy-preserving technologies and promote data-driven decision-making.
