
AI Based Threat Detection System IEEE Report 1 1

The document presents an AI-based threat detection system developed by a team from Amrita Vishwa Vidyapeetham, utilizing machine learning techniques to classify network traffic as normal or malicious. It addresses the limitations of traditional intrusion detection systems by employing advanced algorithms such as Gradient Boosted Trees and Multi-Layer Perceptron, along with feature engineering methods like Principal Component Analysis. The system is designed for scalability and user-friendliness, integrating with a Flask application for real-time predictions.

Uploaded by dandu.dharmaraju
Copyright © All Rights Reserved

Page 1 of 14 - Cover Page Submission ID trn:oid:::1:3133765712

Kavitha C. R.
Paper
security paper
Publication
Amrita Vishwa Vidyapeetham

Document Details
Submission ID: trn:oid:::1:3133765712
Submission Date: Jan 21, 2025, 9:29 AM GMT+5:30
Download Date: Jan 21, 2025, 9:30 AM GMT+5:30
File Name: AI_based_Threat_Detection_System_IEEE_Report_1_1.docx
File Size: 1.3 MB
9 Pages
3,967 Words
24,138 Characters



Integrity Overview

12% Overall Similarity
The combined total of all matches, including overlapping sources, for each database.

Filtered from the Report
Bibliography

Match Groups
44 Not Cited or Quoted (11%): Matches with neither in-text citation nor quotation marks
2 Missing Quotations (0%): Matches that are still very similar to source material
0 Missing Citation (0%): Matches that have quotation marks, but no in-text citation
0 Cited and Quoted (0%): Matches with in-text citation present, but no quotation marks

Top Sources
7% Internet sources
8% Publications
2% Submitted works (Student Papers)

Integrity Flags
0 Integrity Flags for Review
No suspicious text manipulations found.
Our system's algorithms look deeply at a document for any inconsistencies that would set it apart from a normal submission. If we notice something strange, we flag it for you to review.
A Flag is not necessarily an indicator of a problem. However, we'd recommend you focus your attention there for further review.


Top Sources
The sources with the highest number of matches within the submission. Overlapping sources will not be displayed.

1. Internet: www.mdpi.com (2%)
2. Publication: H.L. Gururaj, Francesco Flammini, S. Srividhya, M.L. Chayadevi, Sheba Selvam. "Co… (1%)
3. Publication: V. Sharmila, S. Kannadhasan, A. Rajiv Kannan, P. Sivakumar, V. Vennila. "Challeng… (<1%)
4. Student papers: Amrita Vishwa Vidyapeetham (<1%)
5. Publication: "Computer, Communication, and Signal Processing. Smart Solutions Towards SD… (<1%)
6. Internet: www.researchsquare.com (<1%)
7. Internet: arxiv.org (<1%)
8. Internet: test-www.iqvia.com (<1%)
9. Publication: Ahmed H. Ali, Ahmed Ali Hagag. "An enhanced AI-based model for financial fraud… (<1%)
10. Student papers: Georgia Institute of Technology Main Campus (<1%)


11. Publication: Lei Yan, Wei Tian, Hong Wang, Xing Hao, Zuyi Li. "Robust event detection for resi… (<1%)
12. Publication: M Karthigha, L Latha, Sripriyan K. "Intelligent Honeypot-based IDS for Cyber Atta… (<1%)
13. Publication: Singh Ajeet Kumar, Manish Verma, Vishal Kumar, Golu Kumar. "Chapter 33 Identi… (<1%)
14. Internet: 1login.easychair.org (<1%)
15. Internet: journal.esrgroups.org (<1%)
16. Internet: github.com (<1%)
17. Internet: www.coursehero.com (<1%)
18. Publication: Mateusz Kazimierczak, Nuzaira Habib, Jonathan H. Chan, Thanyathorn Thanapatt… (<1%)
19. Internet: cps-vo.org (<1%)
20. Internet: wjahr.com (<1%)
21. Internet: www.frontiersin.org (<1%)
22. Internet: www.preprints.org (<1%)
23. Publication: Baker, Monika Joanna. "Applying Machine Learning Techniques to Clinical Proble… (<1%)
24. Publication: Gasimova, Aydan. "Performance Comparison of Weak and Strong Learners in Det… (<1%)


25. Publication: Jaya Banerjee, Durbar Chakraborty, Baisakhi Chakraborty, Anupam Basu. "Applic… (<1%)
26. Internet: journals.plos.org (<1%)
27. Internet: ksp.etri.re.kr (<1%)
28. Internet: mdpi-res.com (<1%)
29. Internet: readera.org (<1%)
30. Internet: sos-vo.org (<1%)
31. Internet: www.ijana.in (<1%)
32. Publication: Medha Mohan Ambali Parambil, Jaloliddin Rustamov, Soha Galalaldin Ahmed, Za… (<1%)


AI Based Threat Detection System


Appikonda Shyam Sai Venkata Agastya, Bandi Rishikesh Kumar, Dandu Sasi Sathvik Varma, Gangu Chirudeep, Kavitha C. R.*
Department of Computer Science & Engineering
Amrita School of Computing, Bengaluru
Amrita Vishwa Vidyapeetham, India
*Corresponding Author: [email protected],
[email protected], [email protected],
[email protected], [email protected].

Abstract— With the rapid growth of network cyber threats, there is a growing need for advanced, scalable, and highly accurate threat detection mechanisms. This paper presents an AI-based threat detection system that uses machine learning and feature engineering techniques to classify network traffic as either normal or malicious. The system incorporates state-of-the-art algorithms, including Gradient Boosted Trees and a Multi-Layer Perceptron, and achieves high accuracy (up to 99.6% with Gradient Boosted Trees) through optimized preprocessing steps such as Principal Component Analysis and Chi-Square feature selection. A Flask application and a Python GUI provide user-friendly interfaces for testing the system through real-time prediction and validation.

Keywords— Network Threat Detection, Machine Learning, Gradient Boosted Trees, Multi-Layer Perceptron, Feature Engineering, Principal Component Analysis, Cybersecurity.

I. INTRODUCTION

Digital networks and interconnected systems have expanded rapidly, bringing advances in communication and in connected devices. Unfortunately, this growth has also brought an alarming rise in the sophistication of cyber threats, from malware and phishing to large-scale Distributed Denial of Service (DDoS) attacks and Advanced Persistent Threats (APTs). These attacks can severely damage organizations, governments, and individuals, and they are designed to compromise the confidentiality, integrity, and availability of critical data and systems.

Current rule-based intrusion detection systems (IDS) struggle to adapt to rapidly changing attack patterns and to the sheer volume of network traffic they must inspect. As attackers continue to innovate, these systems have difficulty detecting novel threats, handling large-scale data, and responding quickly. In response to these limitations, Artificial Intelligence (AI) and Machine Learning (ML) offer powerful tools that can potentially analyze complex patterns, spot anomalies, and adapt to new attack strategies.

In this paper, we present an AI-Based Threat Detection System that analyzes network traffic and classifies it as normal or malicious. The system uses a range of machine learning algorithms, such as ensemble models and neural networks, to identify both known and unknown threats. To improve model performance and computational efficiency, it applies advanced feature engineering techniques, including Principal Component Analysis (PCA) for dimensionality reduction and Chi-Square feature selection. These methods allow the system to process large datasets effectively and to identify the feature characteristics that are most critical for recognizing threats.

The system is designed to be both user-friendly and scalable, and it integrates easily with testing interfaces, namely a Flask-based web application and a Python GUI. These interfaces are useful in practical cybersecurity scenarios because they let the user enter feature values manually and obtain real-time predictions.

By applying AI and ML techniques, this work seeks to address some challenges of the modern cybersecurity landscape by introducing a new, robust, and scalable threat detection system.

II. LITERATURE SURVEY

The growth of cyber threats has led to extensive research on AI-driven, integrated systems for cyber threat detection. These works span diverse areas such as financial networks and smart infrastructures, and they apply novel methodologies and technologies to enhance cybersecurity. This section reviews several important works in this domain and discusses the research gaps that motivate the proposed project.

Kuldeep Singh and Lakshmi Sevukamoorthy [1] proposed strengthening the security of financial networks by combining blockchain technology with AI. Addressing the growing cyber threats faced by financial institutions, the authors argued that cybersecurity frameworks must be robust. One of their findings is the lack of comprehensive frameworks that combine the advantages of blockchain and AI for threat detection. Their research fills this gap by demonstrating that a secure and resilient system can be built from immutable blockchain ledgers and intelligent AI-based threat detection mechanisms.

Marc Schmitt [2] investigated AI-based malware and intrusion detection in smart infrastructures and digital industries, highlighting the urgent need to protect increasingly interconnected environments against sophisticated cyber threats. Schmitt also pointed out the difficulty of integrating AI/ML models into complex internal digital ecosystems. The identified gap calls for solutions that improve detection accuracy and interface seamlessly with existing infrastructures and operations without disrupting them.

Yisroel Mirsky et al. [3] examined the threat of offensive AI, highlighting how AI-capable adversaries might exploit vulnerabilities in organizational systems. The work offers a structured perspective on offensive AI tactics in relation to the cyber kill chain and on what they mean for security. The authors found that the most glaring gap concerns this emerging threat itself: there is little strategic insight into how to defend against offensive AI, and they proposed ways to assess and mitigate such threats as they emerge.

Bo-Xiang Wang and Jiann-Liang Chen [4] built an AI-powered network threat detection system using 52 features derived from network interactions, including message-based, host-based, and geography-based data. The aim was to prevent command-line-based threats over remote network connections and to improve detection accuracy and effectiveness. Although this comprehensive system was successful, the authors noted that greater gains would have been possible with further optimization and refined detection algorithms.

In Software-Defined Networking (SDN), Francesco Salatino et al. [5] proposed an intrusion detection system based on artificial intelligence techniques for detecting Distributed Denial of Service (DDoS) attacks. The authors combine advanced Machine Learning (ML) and Deep Learning (DL) methods to improve detection accuracy without increasing computational complexity. The research gap they indicate is the need for better feature analysis and selection to further reduce computational requirements and improve the scalability of the system.

Shamsher Ullah et al. [6] focused on Android malware detection, drawing attention to the exponential increase in cyber threats against Android devices. They also pointed out that current machine learning models are deficient in transparency and interpretability, a shortcoming that is especially glaring from the perspective of explainable AI (XAI). According to their study, XAI techniques help demystify the decision-making processes of ML models and supply actionable insights to end users and stakeholders.

Sonu Preetam et al. [7] proposed a behaviour-based threat modelling approach with explanations for intelligent decision making. To overcome the various issues associated with traditional intrusion detection systems, the authors aimed to make such systems scalable and real-time. The gap they demonstrated is the need for a model that integrates diverse data sources, correlates tactics, techniques and procedures (TTPs) with advanced AI techniques, and ultimately delivers real-time, explainable threat detection.

In the context of 5G networks, Thulitha Senevirathna et al. [8] investigated the vulnerabilities of Explainable AI (XAI) methods in Network Intrusion Detection Systems (NIDS). Their results showed that XAI methods are vulnerable to scaffolding attacks and that effective solutions for detecting such sophisticated adversarial attacks are lacking.

Jonghoon Lee et al. [9] proposed a cyber threat detection system based on artificial neural networks using event profiles. They addressed the challenge of analyzing vast amounts of security event data, in which false positives are frequent and real-world threats are difficult to detect. The identified gap is that existing methods generally fail to generalize well across multiple datasets and do not adequately reduce false alarms.

Viraj Rathod et al. [10] applied an AI- and ML-based anomaly detection system to detect adversarial behaviors using the EMBER dataset. They identified the need for systems that couple real-time response mechanisms with AI-driven anomaly detection, a gap that, if properly filled, would greatly improve responsiveness and accuracy in a dynamic threat environment.

The literature review shows that there has been excellent progress in the development of AI-based threat detection systems, but critical gaps remain. The limitations include a lack of integrated blockchain and AI frameworks [1], the difficulty of seamlessly deploying AI/ML models in complex ecosystems [2], and the demand for transparency and interpretability in AI models [6, 8]. Further work is needed on feature analysis and selection [5], generalization across different datasets [9], and real-time response mechanisms [10] to alleviate current limitations. The proposed system addresses these gaps and contributes to cybersecurity research through an AI-based threat detection system focused on scalability, interpretability, and real-time response. It builds on the strengths of, and lessons learned from, existing works while extending existing tools with new ideas to overcome the limitations of current approaches.

III. METHODOLOGY

The proposed AI-Based Threat Detection System follows a systematic methodology comprising data preprocessing, feature engineering, and machine learning model training and evaluation. Figure 1 illustrates the architecture of this methodology, showing its modular design and the flow between components. This section describes each step and the role it plays in achieving accurate and efficient threat detection.
Figure 1 Architecture

Implementation Flow

The architecture of the proposed methodology is shown in Figure 1. The workflow starts with raw data collection and proceeds through preprocessing and feature engineering to model training. The best-performing model is then integrated into the testing interfaces, completing a workflow that runs from data input to threat prediction.
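The final step of this workflow, promoting the best-performing model into the testing interfaces, can be sketched in standard-library Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the 70/30 holdout split described under Model Training and Evaluation, and the names `split_70_30`, `majority_trainer`, `stump_trainer`, and `select_best` are hypothetical. The two toy trainers only stand in for the real learners (GBT, MLP, and the others listed under Machine Learning Models).

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

Row = List[float]
Model = Callable[[Row], int]                       # feature vector -> label (0 = normal, 1 = malicious)
Trainer = Callable[[List[Row], List[int]], Model]  # fits a model on training rows and labels


def split_70_30(X: List[Row], y: List[int], seed: int = 42
                ) -> Tuple[List[Row], List[int], List[Row], List[int]]:
    """Shuffle once, then keep 70% of the rows for training and 30% for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * len(idx))
    train, test = idx[:cut], idx[cut:]
    return [X[i] for i in train], [y[i] for i in train], [X[i] for i in test], [y[i] for i in test]


def majority_trainer(X: List[Row], y: List[int]) -> Model:
    """Hypothetical baseline: always predict the most frequent training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda row: label


def stump_trainer(X: List[Row], y: List[int]) -> Model:
    """Hypothetical toy learner: the single-feature threshold with the best training accuracy."""
    best_acc, best_j, best_t = -1.0, 0, 0.0
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            acc = sum(int(row[j] > t) == label for row, label in zip(X, y)) / len(y)
            if acc > best_acc:
                best_acc, best_j, best_t = acc, j, t
    return lambda row: int(row[best_j] > best_t)


def select_best(trainers: Dict[str, Trainer], X_train: List[Row], y_train: List[int],
                X_test: List[Row], y_test: List[int]) -> Tuple[str, Model, float]:
    """Train every candidate, score it on the held-out split, and return the most accurate one."""
    scored = {}
    for name, train in trainers.items():
        model = train(X_train, y_train)
        acc = sum(model(row) == label for row, label in zip(X_test, y_test)) / len(y_test)
        scored[name] = (model, acc)
    best = max(scored, key=lambda name: scored[name][1])
    return best, scored[best][0], scored[best][1]
```

In use, `select_best({"majority": majority_trainer, "stump": stump_trainer}, X_tr, y_tr, X_te, y_te)` returns the winner's name, its fitted model, and its holdout accuracy; the returned model is what an interface would then call for each prediction.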

Dataset Description

The dataset contains network traffic records with both categorical and numerical features, such as protocol types, service ports, and connection flags. Each instance is labeled as normal or malicious, so that supervised machine learning models can be trained to classify threats. The main challenges are high-dimensional data, imbalanced class distributions, and heterogeneous feature types.
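To make these heterogeneous feature types concrete, the sketch below illustrates the preprocessing tasks listed under Data Preprocessing using only the Python standard library. It covers frequency-ordered string indexing for categorical columns, mean imputation for missing numeric values, and Min-Max and Z-score scaling. The function names and the choice of mean imputation are assumptions. The paper does not state which imputation strategy or library was used; the term "String Indexing" resembles Spark MLlib's StringIndexer, but this is not confirmed.

```python
import statistics
from typing import Dict, List, Optional


def string_index(values: List[str]) -> Dict[str, int]:
    """Map each category to an integer index: most frequent first, ties broken alphabetically."""
    counts: Dict[str, int] = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {v: i for i, v in enumerate(ordered)}


def impute_mean(column: List[Optional[float]]) -> List[float]:
    """Fill missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max_scale(column: List[float]) -> List[float]:
    """Rescale to the range [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def z_score(column: List[float]) -> List[float]:
    """Standardize to zero mean and unit (population) standard deviation."""
    mu = statistics.mean(column)
    sigma = statistics.pstdev(column)
    if sigma == 0:
        return [0.0] * len(column)
    return [(v - mu) / sigma for v in column]


# In a real pipeline the index map, min/max, and mean/std would be computed on the
# training split only and then reused unchanged for the test split.
protocols = ["tcp", "udp", "tcp", "icmp", "tcp", "udp"]  # illustrative values
index = string_index(protocols)          # {'tcp': 0, 'udp': 1, 'icmp': 2}
encoded = [index[p] for p in protocols]  # [0, 1, 0, 2, 0, 1]
print(impute_mean([1.0, None, 3.0]))     # [1.0, 2.0, 3.0]
print(min_max_scale([2.0, 4.0, 6.0]))    # [0.0, 0.5, 1.0]
```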

Data Preprocessing


Preprocessing transforms raw or unstructured data into a structured format suitable for machine learning algorithms. The tasks include:

1. String Indexing: Converts categorical attributes (e.g., protocol type and service flags) into numerical indices that machine learning models can handle.

2. Handling Missing Values: Preserves data integrity by either imputing or removing missing entries.

3. Scaling and Normalization: Min-Max scaling or Z-score normalization brings numerical features to a uniform range, which helps the models converge.

Feature Engineering

Feature engineering improves the quality of the dataset while reducing computational overhead. The following techniques are applied:

1. Principal Component Analysis (PCA): Reduces dimensionality by finding principal components that retain the maximum variance in fewer features.

2. Chi-Square Feature Selection: Identifies the statistically significant features that contribute most to the classification task and removes irrelevant or redundant attributes.

3. SMOTE (Synthetic Minority Over-sampling Technique): Addresses class imbalance by generating synthetic samples for the minority class.

Machine Learning Models

A variety of machine learning models are implemented to classify network traffic accurately:

1. Gradient Boosted Trees (GBT): Iteratively combines weak learners into a robust ensemble model that achieves high performance even on complex datasets.

2. Multi-Layer Perceptron (MLP): A neural network that learns non-linear relationships, which makes it effective for anomaly detection.

3. Support Vector Machines (SVM): Binary classifiers that construct separating hyperplanes in high-dimensional feature spaces.

4. Random Forest: An ensemble of multiple decision trees for robust classification.

5. Decision Tree: Splits the data on the most important feature at each step, giving a simple and interpretable model.

6. Naive Bayes: A probabilistic model that serves as a fast baseline classifier and is effective for categorical data.

7. Logistic Regression: A linear model for binary classification that serves as a simple, interpretable baseline.

Model Training and Evaluation

The dataset is split into training (70%) and testing (30%) subsets for model development and evaluation.

1. Training Phase: The machine learning models are trained on the preprocessed training data and then tuned for best performance.

2. Evaluation Metrics: Models are assessed with accuracy, precision, recall, F1-score, and the confusion matrix. Together, these metrics ensure that the models strike a balance between correctly flagging true threats and avoiding false positives, without missing genuine threats.

3. Cross-Validation: The robustness and generalization of the models are validated with k-fold cross-validation on different subsets of the data.

Testing Interfaces

To validate the system, two user-friendly testing interfaces are developed:

1. Flask Application: A web interface in which the user enters feature values to obtain immediate predictions from the trained model.

2. Python GUI: A local desktop application for interactive testing, in which the user enters a feature vector and receives the classification immediately.

Flask Application for Testing

The system uses the Flask web framework to create a web application in which the system can be tested. Feature values entered through the interface are collected and passed to the trained model for prediction.

Key Features of the Flask Application:


1. Input Form: The user is given a form with a field for every required feature. Each field maps one-to-one to an attribute used by the machine learning model.

2. Prediction Output: When the user submits the form, the Flask application posts the data to the backend, where it is passed to the ML model. The system then displays either 'Normal (No Threat)' or 'Anomaly (Intrusion Threat)' on the screen.

Functionality Workflow:

1. The user connects to the Flask application through a web browser.

2. The user enters the feature values manually.

3. The frontend only collects the input; the values are sent to the backend model for classification.

4. The result is displayed immediately in the same interface.

Figure 2 Flask Application to test the system

Figure 2 shows a screenshot of the Flask application interface, highlighting the input fields and the prediction result.
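The input-form and prediction-output behavior described above can be sketched as a framework-independent handler that a Flask route would call. This is a minimal sketch under stated assumptions: every feature is entered as a number in the model's feature order, the model returns 0 for normal and 1 for anomalous traffic, and the names `form_to_features`, `handle_prediction`, `FEATURE_NAMES`, `trained_model`, the `/predict` route, and `index.html` are all hypothetical. The Flask wiring is shown only as a comment so that the sketch runs without Flask installed.

```python
from typing import Callable, List, Mapping, Sequence

# Output strings as described for the Flask interface.
LABELS = {0: "Normal (No Threat)", 1: "Anomaly (Intrusion Threat)"}


def form_to_features(form: Mapping[str, str], feature_names: Sequence[str]) -> List[float]:
    """Read one value per model feature, in the model's feature order (one-to-one mapping)."""
    missing = [name for name in feature_names if not str(form.get(name, "")).strip()]
    if missing:
        raise ValueError("missing required features: " + ", ".join(missing))
    return [float(form[name]) for name in feature_names]


def handle_prediction(form: Mapping[str, str], feature_names: Sequence[str],
                      model: Callable[[List[float]], int]) -> str:
    """Validate the submitted form, run the model, and return the text shown to the user."""
    return LABELS[model(form_to_features(form, feature_names))]


# Hypothetical Flask wiring (not executed here; route and template names are assumptions):
#   @app.route("/predict", methods=["POST"])
#   def predict():
#       result = handle_prediction(request.form, FEATURE_NAMES, trained_model)
#       return render_template("index.html", result=result)
```

Keeping validation and labeling outside the route function means the same handler could also back the Python GUI, and it can be tested without starting a web server.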

Python GUI for Testing

An alternative testing interface is built as a Python GUI, which is more convenient in desktop environments. It works like the Flask application but runs as an interactive, standalone desktop application.

Key Features of the Python GUI:

1. Graphical Input Fields: Users can conveniently enter feature values through a clean graphical interface.

2. Real-Time Prediction: The prediction appears as soon as the user submits the input.

3. Reset Option: A reset button clears the inputs so that new feature values can be tested without restarting the application.


Figure 3 Python GUI to test the system

Figure 3 displays a screenshot of the Python GUI interface, showcasing the input fields and the prediction result area.

Functionality Workflow:

1. The Python GUI opens on the user's desktop.

2. The user enters feature values manually into the GUI input fields.

3. After submission, the input is fed to the trained model and the result is displayed in the GUI.

Both interfaces are used to test the system's accuracy and usability in practical scenarios.

We build a scalable and robust threat detection system by combining advanced preprocessing, feature engineering, and a range of machine learning models. The modular design facilitates future enhancements to meet emerging cybersecurity challenges, and the testing interfaces demonstrate the design's usability and practical applicability.

IV. RESULTS AND DISCUSSION

Results of Machine Learning Algorithms

This section presents a detailed evaluation of the AI-Based Threat Detection System using various machine learning models under different preprocessing configurations. Performance metrics, including accuracy, precision, recall, and F1-score, are analyzed to determine the best-performing models. Tables 1 through 4 and Figures 4 through 11 present the evaluation metrics and comparison graphs for each configuration, giving a comprehensive view of the system's effectiveness.

Table 1 Evaluation Metrics of Machine Learning Algorithms

Algorithm                Accuracy (%)   Precision (%)   Recall (%)   F1 Score (%)
Naive Bayes              72.353420      76.441927       72.353420    71.729593
SVM                      95.697611      95.702568       95.697611    95.695478
Decision Tree            98.439197      98.444107       98.439197    98.438586
Random Forest            98.276330      98.289491       98.276330    98.275212
MLP                      96.647666      96.706550       96.647666    96.642367
Logistic Regression      95.317590      95.327916       95.317590    95.314340
Gradient-Boosted Tree    99.619978      99.621049       99.619978    99.619916

The performance of the machine learning algorithms on the raw dataset is presented in Table 1. Gradient Boosted Trees (GBT) achieved the highest accuracy, 99.6%, making it the best-performing model in this configuration.

Accuracy: GBT emerged as the most accurate model.

Precision, Recall, and F1-Score: GBT consistently outperformed the other models across these metrics.

Figures 4 through 7 present comparison graphs for accuracy, precision, recall, and F1-score, respectively, demonstrating the relative performance of all algorithms on the raw dataset.
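In every row of Tables 1 through 4, recall equals accuracy. This is what one would expect if per-class metrics were averaged with weights proportional to class support (the convention of scikit-learn's average="weighted" option), because weighted recall then reduces algebraically to accuracy: Σ_c (n_c / n) · (TP_c / n_c) = (Σ_c TP_c) / n. The paper does not state its averaging mode, so this is an inference from the tables. The standard-library sketch below computes support-weighted precision, recall, and F1 and exhibits the identity; the function name `weighted_scores` is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_scores(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall, and F1 (weight = class frequency in y_true)."""
    n = len(y_true)
    support = Counter(y_true)
    precision = recall = f1 = 0.0
    for c, n_c in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted_c = sum(1 for p in y_pred if p == c)
        p_c = tp / predicted_c if predicted_c else 0.0   # precision for class c
        r_c = tp / n_c                                   # recall for class c
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = n_c / n
        precision += weight * p_c
        recall += weight * r_c                           # adds tp / n, so the total is accuracy
        f1 += weight * f_c
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Example: weighted_scores([0, 0, 0, 1, 1], [0, 0, 1, 1, 0]) gives accuracy and recall
# both equal to 0.6 (up to floating-point rounding).
```

Weighted precision and F1 do not collapse to accuracy in general, which matches the tables, where those columns differ slightly from the accuracy column.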


Figure 4 Comparison graph for accuracies on machine learning algorithms
Figure 5 Comparison graph for Precision scores on machine learning algorithms
Figure 7 Comparison graph for F1 scores on machine learning algorithms

Results when PCA is performed

Table 2 PCA Results

Algorithm                Accuracy (%)   Precision (%)   Recall (%)   F1 Score (%)
SVM                      94.536489      94.536688       94.536489    94.536585
Decision Tree            96.844181      96.850596       96.844181    96.841893
Random Forest            96.469428      96.583230       96.469428    96.458673
MLP                      99.408284      99.408357       99.408284    99.408305
Logistic Regression      94.902066      94.902875       94.902066    94.900487
Gradient-Boosted Tree    98.121814      98.133233       98.121814    98.120693

When Principal Component Analysis (PCA) was applied to the dataset for dimensionality reduction, the Multi-Layer Perceptron (MLP) model performed best, achieving an accuracy of 99.4%. The evaluation metrics for all models under this configuration are shown in Table 2.

MLP: Achieved high scores across all metrics, including precision, recall, and F1-score, demonstrating its ability to generalize well with reduced dimensionality.
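Because PCA underlies the two best MLP configurations, here is a compact, didactic sketch of it in standard-library Python. It centers the training data, forms the sample covariance matrix, and extracts the leading eigenvectors by power iteration with deflation. Library implementations (for example scikit-learn's PCA) use SVD-based solvers instead. The number of components `k` and the names `fit_pca` and `transform` are assumptions, since the paper does not report how many components were kept.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def fit_pca(X: Matrix, k: int, iters: int = 200, seed: int = 0) -> Tuple[List[float], Matrix, List[float]]:
    """Return training column means, the top-k principal axes (unit vectors), and the variance along each."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d) of the centered data.
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    axes: Matrix = []
    variances: List[float] = []
    for _ in range(k):
        v = [rng.uniform(0.1, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            # Power iteration: apply C and renormalize; v tends to C's dominant eigenvector.
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # nothing left to explain (e.g. k exceeds the rank of the data)
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v is the variance along the unit axis v.
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        axes.append(v)
        variances.append(lam)
        # Deflation: subtract this component so the next pass finds the next-largest one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, axes, variances


def transform(X: Matrix, means: List[float], axes: Matrix) -> Matrix:
    """Project rows onto the principal axes, centering with the training means."""
    return [[sum((row[j] - means[j]) * a[j] for j in range(len(means))) for a in axes] for row in X]
```

Fitting on the training split and reusing its `means` and `axes` for the test split keeps test information out of the dimensionality reduction.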

Figure 6 Comparison graph for Recall scores on machine learning algorithms

Results when Chi-Square Selection is performed

Table 3 Chi-Square Selection Results

Algorithm                Accuracy (%)   Precision (%)   Recall (%)   F1 Score (%)
Naive Bayes              72.268245      76.226531       72.268245    71.825852
SVM                      95.226824      95.227188       95.226824    95.226991
Decision Tree            98.836292      98.837667       98.836292    98.836511
Random Forest            98.796844      98.817761       98.796844    98.795536
MLP                      96.824458      96.86172        96.824458    96.819258
Logistic Regression      95.266272      95.26553        95.266272    95.265766
Gradient-Boosted Tree    99.526627      99.527006       99.526627    99.526676

With Chi-Square Feature Selection applied, Gradient Boosted Trees (GBT) again achieved the best performance, with an accuracy of 99.5%. The evaluation metrics for this configuration are detailed in Table 3.

GBT: Retained precision, recall, and F1-scores of about 99.5%, marginally below its raw-data results in Table 1 but still the highest in this configuration, reinforcing its ability to classify threats effectively with a reduced feature set.
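The Chi-Square selection step can be illustrated with Pearson's chi-square statistic computed on the contingency table of each categorical feature against the class label, keeping the features with the largest statistic. This is a sketch under assumptions. The paper does not specify the chi-square variant, the library, or how many features were retained, and the names `chi_square` and `select_k_best` are hypothetical. (scikit-learn's `chi2`, for instance, computes a related statistic from non-negative feature values rather than from category counts.)

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def chi_square(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic of the feature-value x class-label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    value_counts = Counter(feature)
    label_counts = Counter(labels)
    stat = 0.0
    for value, n_value in value_counts.items():
        for label, n_label in label_counts.items():
            expected = n_value * n_label / n  # count expected if feature and label were independent
            stat += (observed[(value, label)] - expected) ** 2 / expected
    return stat


def select_k_best(columns: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable], k: int) -> List[str]:
    """Names of the k features with the largest chi-square statistic (most label-dependent)."""
    scores = {name: chi_square(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy example (hypothetical columns): "flag" separates the classes perfectly, "protocol" not at all.
labels = [0, 0, 1, 1]
cols = {"flag": ["S0", "S0", "SF", "SF"], "protocol": ["tcp", "udp", "tcp", "udp"]}
print(select_k_best(cols, labels, k=1))  # ['flag']
```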

Results when PCA is followed by Chi-Square Feature Selection

Table 4 Results of PCA followed by Chi-Square Selection

Algorithm                Accuracy (%)   Precision (%)   Recall (%)   F1 Score (%)
SVM                      94.930966      94.936025       94.930966    94.932422
Decision Tree            97.100592      97.160715       97.100592    97.094559
Random Forest            96.646943      96.749411       96.646943    96.637386
MLP                      99.447732      99.448701       99.447732    99.447827
Logistic Regression      94.990138      94.994401       94.990138    94.986298
Gradient-Boosted Tree    98.560158      98.560378       98.560158    98.560233

When PCA was followed by Chi-Square Feature Selection, the Multi-Layer Perceptron (MLP) model again performed best, achieving an accuracy of 99.4%. The evaluation metrics for all models under this configuration are summarized in Table 4.

MLP: Exhibited strong performance on all metrics, including precision, recall, and F1-score, showcasing its robustness when handling reduced and optimized feature sets.

Figures 8 through 11 provide comparison graphs for accuracy, precision, recall, and F1-score, respectively, for the machine learning algorithms when PCA followed by Chi-Square selection was applied.

Figure 8 Comparison graph for accuracies on machine learning algorithms when PCA followed by Chi-Square selection is performed
Figure 9 Comparison graph for Precision scores on machine learning algorithms when PCA followed by Chi-Square selection is performed
Figure 10 Comparison graph for Recall scores on machine learning algorithms when PCA followed by Chi-Square selection is performed
Figure 11 Comparison graph for F1 scores on machine learning algorithms when PCA followed by Chi-Square selection is performed

The evaluation highlights Gradient Boosted Trees (GBT) and the Multi-Layer Perceptron (MLP) as the top-performing models under different configurations. Feature engineering techniques such as PCA and Chi-Square selection substantially improve some models by reducing noise and focusing on the most relevant features. For example, MLP accuracy rises from 96.6% on the raw data to 99.4% with PCA, while GBT obtains its best result (99.6%) on the raw data. The results validate the effectiveness of the proposed methodology in detecting anomalies and intrusion threats with high accuracy. Figures 4 through 11 provide detailed visual representations of the evaluation metrics and comparisons across different configurations.

V. CONCLUSION

The AI-Based Threat Detection System was developed to address the need for advanced, scalable, and accurate mechanisms for detecting and classifying network traffic anomalies in the presence of evolving cyber threats. The system employs state-of-the-art machine learning algorithms and robust feature engineering techniques, which yield high accuracy across numerous configurations and indicate that the system can be an effective and flexible tool in real-world operation.

Key results show that Gradient Boosted Trees (GBT) performed best on the raw data, with 99.6% accuracy, and that the Multi-Layer Perceptron (MLP) performed best after Principal Component Analysis (PCA), with or without subsequent Chi-Square feature selection (99.4% accuracy in both cases). These results demonstrate that feature optimization can improve accuracy, precision, recall, and F1-scores, most notably for the MLP.

The usability of the system was further validated through the Flask and Python GUI interfaces, which allow immediate live predictions during manual testing. These interfaces support practical applicability and improve the user experience for cybersecurity professionals.

Finally, the study concludes that combining machine learning with feature engineering and user-friendly interfaces yields a robust and scalable solution for detecting network threats. This work lays a solid foundation for future improvements, such as integration with real-time monitoring systems, to help address today's cybersecurity challenges.

VI. REFERENCES

[1] Singh, Kuldeep, and Lakshmi Sevukamoorthy. "Blockchain and AI-Based Threat Detection for Enhanced Security in Financial Networks." 2023 IEEE Technology & Engineering Management Conference-Asia Pacific (TEMSCON-ASPAC). IEEE, 2023.
[2] Schmitt, Marc. "Securing the Digital World: Protecting smart infrastructures and digital industries with Artificial Intelligence (AI)-enabled malware and intrusion detection." Journal of Industrial Information Integration 36 (2023): 100520.
[3] Mirsky, Yisroel, et al. "The threat of offensive AI to organizations." Computers & Security 124 (2023): 103006.
[4] Wang, Bo-Xiang, Jiann-Liang Chen, and Chiao-Lin Yu. "An AI-powered network threat detection system." IEEE Access 10 (2022): 54029-54037.
[5] Salatino, Francesco, et al. "Detecting DDoS Attacks Through AI driven SDN Intrusion Detection System." 2024 IEEE 21st Consumer Communications & Networking Conference (CCNC). IEEE, 2024.
[6] Ullah, Shamsher, et al. "The revolution and vision of explainable AI for android malware detection and protection." Internet of Things (2024): 101320.
[7] Preetam, Sonu, et al. "An Approach for Intelligent Behaviour-Based Threat Modelling with Explanations." 2023 IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN). IEEE, 2023.
[8] Senevirathna, Thulitha, et al. "Deceiving Post-hoc Explainable AI (XAI) Methods in Network Intrusion Detection." 2024 IEEE 21st Consumer Communications & Networking Conference (CCNC). IEEE, 2024.
[9] Lee, Jonghoon, et al. "Cyber threat detection based on artificial neural networks using event profiles." IEEE Access 7 (2019): 165607-165626.
[10] Rathod, Viraj, Chandresh Parekh, and Dharati Dholariya. "AI & ML Based Anamoly Detection and Response Using Ember Dataset." 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO). IEEE, 2021.
[11] Sidarth V., and Kavitha C. R. "Network Intrusion Detection System Using Stacking and Boosting Ensemble Methods." Proceedings of the 3rd International Conference on Inventive Research in Computing Applications (ICIRCA 2021), 2021, pp. 357-363.
[12] Shanmukha Aditya G., Kruthika B., Shinu M. Rajagopal, and C. R. Kavitha. "Homomorphic Encryption for Secure Data Analysis: A Hybrid Approach using PKCS1_OAEP Padding." 2nd International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT 2024), January 2024.
