Ids Iot Paper
1. Abstract
In contemporary network environments, ensuring robust security against
malicious attacks poses significant challenges to Intrusion Detection
Systems (IDS). Despite their critical role in safeguarding network
integrity, IDS often encounter performance degradation due to evolving
attack methodologies and increasing network complexities. To mitigate
these challenges and bolster network security, innovative approaches
are required.
This study explores techniques for enhancing IDS performance by
leveraging advanced Machine Learning technologies. An extensive
Internet of Things (IoT) dataset, encompassing both normal network
traffic and various anomaly attack types, is carefully preprocessed to
identify the features most strongly correlated with the target column,
which denotes the attack category. The emphasis is placed on
discerning the dependent features crucial for accurate intrusion detection.
Subsequently, diverse classification algorithms are employed to evaluate
their efficacy in accurately identifying and classifying network intrusions
based on the identified features. By comparing the performance of
different classifiers, the study aims to ascertain the most suitable
algorithm for robust and efficient intrusion detection in modern networks.
Furthermore, the study elucidates the transformative potential of
advanced technologies in fortifying network security amidst evolving
cyber threats. By harnessing innovative approaches, such as Machine
2. Introduction
The proliferation of interconnected devices and digital infrastructures has
heightened the importance of network security in safeguarding sensitive
information, critical assets, and organizational operations. As cyber
threats continue to evolve in sophistication and complexity, organizations
are faced with the formidable task of fortifying their networks against a
myriad of potential vulnerabilities and attacks. In this context, the role of
network security technologies, including Intrusion Detection Systems
(IDS), is paramount in mitigating risks, detecting anomalies, and
preserving the integrity of digital ecosystems. This section delves into
the multifaceted landscape of network security, exploring key challenges,
strategies, and technologies aimed at fortifying network defenses and
mitigating cyber threats.
Challenges in Network Security:
Ensuring robust network security poses formidable challenges for
organizations in the face of evolving cyber threats, growing network
complexity, and the proliferation of interconnected devices. One of the
primary challenges is the dynamic nature of cyber threats, which
continuously adapt in response to advances in technology and security
measures. From sophisticated malware attacks to stealthy infiltration
attempts, organizations must contend with a diverse array of threats
that target vulnerabilities in network infrastructure, software
applications, and user endpoints.
3. Dataset Description
Overview: The IoT Network Intrusion Dataset is a collection of network
traffic data captured from Internet of Things (IoT) devices in a simulated
environment. It is intended for research on and analysis of network
security, particularly in IoT ecosystems, and is freely available for
academic projects.
Content:
● The dataset contains network traffic logs from various IoT devices,
such as smart home devices, Wi-Fi cameras, smartphones, laptops,
and tablets, connected to a smart home router.
● Each record in the dataset describes a network transaction,
including source and destination IP addresses, source and
destination port numbers, protocol, timestamp, and number of
packets sent, among other fields.
● Attributes include sent packet length, maximum packet length,
and minimum packet length, among others.
Data Format:
Link to Dataset:
https://sites.google.com/view/iot-network-intrusion-dataset/home
4. Literature Survey
In recent years, the increasing complexity and diversity of cyber threats
have made intrusion detection a critical component of cybersecurity.
5.2 mutual_info_classif: -
Mutual information is a feature selection criterion, available in
scikit-learn as mutual_info_classif, that scores how strongly each
feature in a feature set depends on the class label. Selecting the
highest-scoring features improves performance by reducing noise and
mitigating overfitting. It also improves interpretability by
concentrating on the factors most relevant to the underlying data
patterns.
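For a feature X and the target Y, the standard definition of mutual
information is:
I(X; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x) \, p(y)}
Here:
p(x, y) is the joint probability distribution function of X and Y
p(x) and p(y) are the marginal probability distribution functions of X and
Y, respectively.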
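For discrete features, the mutual information between a feature and the attack label can be computed directly from observed frequencies. The following is a minimal sketch using only the Python standard library. The feature names, values, and labels are hypothetical toy data, not taken from the dataset. It is a plug-in estimate and not the estimator used inside scikit-learn's mutual_info_classif, which relies on a nearest-neighbor method for continuous features, so the two may give different scores.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: two discrete features and an attack label.
labels = ["normal", "normal", "dos", "dos", "scan", "scan"]
features = {
    "protocol": ["tcp", "tcp", "udp", "udp", "icmp", "icmp"],  # determines the label
    "flag": [0, 1, 0, 1, 0, 1],                                # independent of the label
}

# Score every feature against the label and rank from most to least informative.
scores = {name: mutual_information(col, labels) for name, col in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy case the "protocol" feature fully determines the label, so it is ranked first, while "flag" carries no information about the label and scores zero.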
5.4 XGBoost: -
XGBoost, short for "Extreme Gradient Boosting," is an optimized,
distributed gradient boosting library designed for efficient and
scalable training of machine learning models. It is an ensemble
learning method that combines the predictions of multiple weak models
to produce a robust and accurate prediction. XGBoost is widely used in
the machine learning community because it handles large datasets well
and achieves state-of-the-art performance on tasks such as
classification and regression. Notably, XGBoost handles missing values
natively, which reduces the preprocessing needed for real-world data.
It also has built-in support for parallel processing, which allows
models to be trained on large datasets within reasonable time frames.
The mathematical formulation of XGBoost starts from:
i) A training dataset \{(x_i, y_i)\}_{i=1}^{n}, where x_i represents the
feature vector and y_i represents the target variable.
ii) An objective function L(\hat{y}, y) to minimize, typically a
differentiable loss function such as squared error or log loss.
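The idea of combining many weak models to minimize a differentiable loss can be illustrated with plain gradient boosting. The sketch below is not XGBoost itself: it omits regularization, second-order gradient information, and full decision trees. It assumes a squared-error loss and a single numeric feature, and uses one-split "stumps" as the weak models. All function names and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals (squared error)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides of the split non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]  # (threshold, left_value, right_value)


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.3):
    """Additively fit stumps to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)  # initial constant prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient of the loss is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data with a step-like relationship.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]

baseline = fit_boosting(xs, ys, n_rounds=0)  # constant prediction only
boosted = fit_boosting(xs, ys, n_rounds=50)  # constant plus 50 stumps
```

Comparing mse(baseline, xs, ys) with mse(boosted, xs, ys) shows how the ensemble of weak stumps reduces the training loss relative to a single constant prediction.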
6. Proposed Model
Our proposed model combines unsupervised clustering and boosting
algorithms to enhance intrusion detection in network systems. Our
methodology is built on the IoT intrusion dataset, a
Accuracy:
In machine learning, the accuracy score is calculated from a confusion
matrix, a table that summarizes a classification model's performance by
comparing the predicted and actual output values. Here, the rows of the
matrix represent the predicted output and the columns represent the
actual output. The values on the diagonal are the correctly classified
instances, while the values off the diagonal are the incorrectly
classified instances. From the confusion matrix, one can determine the
model's accuracy, precision, recall, and other performance metrics. The
accuracy score is the proportion of correctly predicted output labels out
of the total number of output labels, and it is one of the most commonly
used metrics for evaluating a classification model.
Writing TP, TN, FP, and FN for the numbers of true positives, true
negatives, false positives, and false negatives, accuracy is given by the
following formula:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
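For a multi-class problem, the same idea applies: accuracy is the sum of the diagonal of the confusion matrix divided by the total number of instances. The sketch below builds a confusion matrix with rows as predicted classes and columns as actual classes, following the convention described above. The class names and label lists are hypothetical toy data.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows index the predicted class, columns index the actual class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical toy labels for three traffic classes.
labels = ["normal", "dos", "scan"]
actual = ["normal", "normal", "dos", "dos", "scan", "scan", "scan", "normal"]
predicted = ["normal", "dos", "dos", "dos", "scan", "normal", "scan", "normal"]

matrix = confusion_matrix(actual, predicted, labels)
acc = accuracy(matrix)
```

Note that some libraries, including scikit-learn, use the opposite orientation (rows for actual classes, columns for predicted classes). Accuracy is unaffected because the diagonal is the same either way.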
Precision: -
Precision is the ratio of correctly predicted positive instances to the
total number of positive predictions made by a classification model. It
can be computed from the confusion matrix using the following formula:
Precision = True Positives / (True Positives + False Positives)
Here, True Positives are the correctly predicted positive instances, and
False Positives are the negative instances that the model incorrectly
classified as positive.
Recall: -
Recall is the ratio of true positives to the total number of actual
positive cases in the dataset. It is also known as sensitivity or the
true positive rate.
Recall can be calculated from the confusion matrix using the following
equation:
Recall = True Positives / (True Positives + False Negatives)
Here, True Positives are the cases where the model correctly predicted
the positive class, while False Negatives are the positive cases that
the model incorrectly assigned to the negative class.
F1 Score: -
The F1 score is a performance metric that combines precision and recall
to provide an overall measure of a classification model's performance. It
is the harmonic mean of precision and recall, with values ranging from 0
to 1, where a higher score indicates better performance.
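As the harmonic mean, the F1 score can be written as F1 = 2 × Precision × Recall / (Precision + Recall). The sketch below computes precision, recall, and F1 from raw counts. The counts are hypothetical, and the zero-denominator handling (returning 0.0) is an assumption of this sketch rather than something the paper specifies. As a consistency check, it also recomputes F1 from the Class 1 precision and recall reported for the first result set in Section 8.

```python
def precision(tp, fp):
    """True positives over all positive predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """True positives over all actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for one class.
tp, fp, fn = 90, 10, 30
p = precision(tp, fp)  # 90 / 100
r = recall(tp, fn)     # 90 / 120
f1 = f1_score(p, r)

# Class 1 precision and recall as reported in the first result set of Section 8.
reported_f1 = f1_score(0.935238, 0.931933)
```

The recomputed value agrees with the reported Class 1 F1-Score of 0.933583 to six decimal places.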
8. Result
Where:
In our case, the calculated inertia value was 1.024. Because inertia
measures how far points lie from their cluster centers, a low value
indicates tight grouping within each cluster, which is consistent with
well-separated clusters.
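Inertia is commonly defined, as in k-means clustering, as the sum of squared Euclidean distances from each point to its nearest cluster center. The sketch below assumes that definition, since the paper's own formula is not reproduced here. The points and centroids are hypothetical toy values.

```python
def inertia(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for point in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(point, centroid))
            for centroid in centroids
        )
    return total


# Hypothetical toy data: two tight, well-separated groups of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids = [(0.0, 0.5), (10.0, 10.5)]

value = inertia(points, centroids)
```

Because inertia depends on the scale of the features and the number of points, values are only comparable across runs on the same preprocessed data.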
In the second layer of our machine learning model, our primary objective
is to identify the specific type of attack carried by a malicious
packet. This could be a Denial-of-Service (DoS) attack that aims to
overwhelm a system with traffic, an attempt to infect a device with the
Mirai botnet, a Man-in-the-Middle (MITM) exploit designed to eavesdrop
on communication, or a network scan. To achieve this fine-grained
classification, we use ensemble machine learning algorithms.
o Accuracy: 0.989670
o Class 0:
▪ Precision : 0.999964
▪ Recall : 0.999637
▪ F1-Score : 0.999801
o Class 1:
▪ Precision : 0.935238
▪ Recall : 0.931933
▪ F1-Score : 0.933583
o Class 2:
▪ Precision : 0.991107
▪ Recall : 0.995378
▪ F1-Score : 0.993238
o Class 3:
▪ Precision : 0.999233
▪ Recall : 0.990452
▪ F1-Score : 0.994823
o Class 4:
▪ Precision : 0.994956
▪ Recall : 0.977599
▪ F1-Score : 0.986201
o Accuracy: 0.989281
o Class 0:
▪ Precision : 1.000000
▪ Recall : 0.999982
▪ F1-Score : 0.999991
o Class 1:
▪ Precision : 0.938513
▪ Recall : 0.919863
▪ F1-Score : 0.929095
o Class 2:
▪ Precision : 0.990383
▪ Recall : 0.995907
▪ F1-Score : 0.993137
o Class 3:
▪ Precision : 0.997574
▪ Recall : 0.992524
▪ F1-Score : 0.995043
o Class 4:
▪ Precision : 0.994624
▪ Recall : 0.975845
▪ F1-Score : 0.985145