Intrusion Detection System for IoT Environment using Ensemble Approaches

Aamir S. Ahanger, Sajad M. Khan, Faheem Syeed Masoodi
Department of Computer Science, University of Kashmir, Srinagar, India

Abstract— To ensure that important assets are available and secure within a protected network architecture, Intrusion Detection Systems (IDS) are commonly used. However, current IDS algorithms often struggle to perform effectively. To address this, machine learning has been employed to enhance IDS efficiency. The main challenge with IDS classification is the large amount of irrelevant and redundant data in high-dimensional datasets, which makes it impossible for a single classifier to identify all types of attacks effectively. Thus, a novel ensemble IDS approach is proposed in this study. The approach uses Random Forest (RF) for dimensionality reduction to select the optimal subset of the initial dataset. An ensemble learning method is then used for intrusion detection and identification. The proposed RF method outperformed other state-of-the-art approaches in several parameters with an accuracy of 99%, as demonstrated by experimental results on the IoTID20 dataset. The approach was evaluated using several performance criteria, including Accuracy, Precision, Recall, and F1-score.

Keywords— Intrusion detection, Internet of Things (IoT), Feature Selection, Machine learning, Dataset, Random Forest, ensemble approach.

I. INTRODUCTION

The Internet of Things (IoT) is expanding quickly and is becoming increasingly important in our daily lives. IoT nodes connect to the Internet by using an IP address. As a result, users of various social networking platforms will be able to connect to and share devices [1]. This broad range of IoT applications raises concerns about security and privacy. Without a secure and reliable IoT ecosystem, emerging IoT applications cannot be widely implemented. IoT security concerns include issues with privacy, authentication, management, and information storage, as well as the usual security issues that the Internet, cellular networks, and WSNs face. All of these problems and weaknesses make IoT applications a prime target. Attacks on privacy and security have occurred all across the world.

The Mirai attack, which launched a DDoS attack in the 4th quarter of 2016, compromised around 2.5 million Internet-connected devices. After Mirai, Hajime and Reaper are two other noteworthy botnet attacks against a sizable IoT device network [2]. IoT hardware is less powerful and less secure than conventional hardware. IoT devices offer a backdoor for adversaries to infiltrate residential and commercial networks, giving them simple access to user data. The Internet of Things is also becoming more than just physical objects. Implanting IoT devices into people to monitor the health of their organs has been tried successfully on a number of occasions [3]. Attackers may target these devices in an effort to manipulate data or track down a specific person. Although no such attack has yet occurred in practice, a compromise of these devices could be quite harmful. Security in the IoT refers to the process of protecting Internet-connected devices and the network by monitoring, recognising, and defending against threats and breaches that could endanger the security of network resources. These threats may jeopardise the availability, confidentiality, and integrity of data [4], [5]. Therefore, it is crucial to identify these flaws and threats as soon as possible to protect data privacy. Traditional security measures such as user authentication, firewalls, encryption, and access control in data transfer have limits when it comes to fully defending networks and devices against more sophisticated attacks. First, the considerable amount of redundant and irrelevant data in high-dimensional datasets presents the main challenge in an IDS's classification process. Second, it is implausible that a single classifier is capable of accurately classifying every attack type. It is crucial to create an IDS that is capable of spotting sophisticated network attacks. Additionally, an adequate dataset must be available to assess IDS performance before the system is deployed in the real world [6].

In order to train and evaluate their model, researchers require a dataset. This remains challenging because few datasets are freely accessible, and some of them lack thoroughness and completeness. Machine learning (ML) technology has improved enormously owing to the growing capability of computing devices. Because ML classifiers significantly improve a system's accuracy and robustness, they are used to flag multiple attacks in the security field [7], [8]. In this study, we propose a new ensemble intrusion detection system (IDS) using ML approaches that entails two steps: feature selection and intrusion detection. In the first step, the ideal subset is selected from the original dataset (to remove the redundant and irrelevant data present) by using a dimensionality reduction approach, namely RF. In the second step, we provide an ensemble learning technique for identifying and detecting intrusions. XGBoost and RF, both effective techniques for predictive modeling, are used because a single classifier cannot effectively identify every kind of attack.

II. RELATED WORK

The NBIPS proposed by Kumar et al. examines network activity streams to spot and stop instances of misuse. Both

Authorized licensed use limited to: FLORIDA INTERNATIONAL UNIVERSITY. Downloaded on August 18,2023 at 18:32:54 UTC from IEEE Xplore. Restrictions apply.
passive and inline network-based IPS sensors are available for installation. An inline sensor is placed so that it can monitor the traffic passing through it. The sensors are put in place to thwart attacks by employing an IoT protocol with signature-based traffic blocking [9]. The AE-IDS method, proposed by Li et al. [10], is based on the random forest algorithm and deep learning technology. The training set is created using this technique, which also makes use of feature grouping and feature selection. According to the experimental results, the suggested method outperforms conventional ML-based IDS in terms of ease of training, adaptability, and detection accuracy. Talita et al. applied Naïve Bayes (NB) as the classifier and particle swarm optimization (PSO) as the attribute selection method on the KDD CUP'99 benchmark dataset. PSO is used for feature selection because solving the IDS problem on the full dataset is relatively costly in terms of time and memory. When 38 features are included, they obtain the best classification result, with an accuracy of 99.12%. The authors suggest parameter optimization as a future enhancement to further improve classifier performance [11]. Jiang et al. propose the PSO-Xgboost model, which generally achieves better classification accuracy than rival models such as Xgboost, Bagging, Adaboost, and Random Forest. When recognizing attack subtypes with low attack frequency, such as U2R and R2L, the PSO-Xgboost model performs better than other comparable models in terms of accuracy, recall, macro-average, and mean average precision (mAP) [12]. The model in [13] has two stages: feature selection and attack identification. A relevant subset of features is found during the feature selection stage using logistic regression and a genetic algorithm. During the attack detection phase, an artificial neural network (ANN) is used, trained with particle optimization and gravitational search techniques [13]. In [14], four distinct attribute subsets taken from the NSL-KDD dataset were used to assess the system's performance. The accuracy of the detection technique for the various attack classes was compared in order to determine which algorithm is the most effective for a given attack class [14]. Generative DL models were trained on the IoT-23 dataset to recognize a range of attacks such as DDoS and numerous botnets such as Okiru, Mirai, and Torii. The models were trained using over 1.8 million flows. The generative models outperform conventional ML methods such as Random Forests. Models based on BiGAN and AAE both yielded an F1-score of 0.99. A BiGAN was also trained to recognize previously unseen (zero-day) attacks, with an F1-score ranging from 0.85 to 1 [15]. The limitations of single classifiers have led to the recent proposal of ensemble techniques to address them. A highly scalable and practical ensemble model based on majority voting was proposed, which can be used in real time to analyse network data and anticipate attacks. It attained accuracies of 99% for DoS, 97.2% for Probe, 97.2% for R2L, and 93.2% for U2R, indicating that the model is successful at identifying intrusions [16].

III. MATERIALS AND METHODS

In recent years, the importance of network security has grown significantly, and the protection of IoT devices requires a security method that can provide reasonable assurance for integrity, authentication, secrecy, and non-repudiation. Machine learning techniques have greatly benefited anomaly-based intrusion detection systems. The classification approach involves using data examples to train and build a model that can accurately categorize new instances.

TABLE I. DISTRIBUTION OF BINARY, CATEGORY INSTANCES IN THE IOTID20 DATASET

Binary     Instances    Category             Instances
Normal     40073        Normal               40073
Anomaly    585710       DoS                  59391
                        Mirai                415677
                        MITM ARP Spoofing    35377
                        Scan                 75265

A series of simple steps was combined to achieve this goal. The dataset was pre-processed to prepare it for use by learning algorithms. The dataset was split into training and test sets in a 65:35 ratio. The training set was used during the training phase of the ensemble learning algorithms. Finally, the test set was used to evaluate the final model.

In order to train and evaluate the model, researchers require a dataset [17]. The IoTID20 dataset was created to track down cyber-attacks on IoT networks. This dataset was produced using communication data and fresh information regarding the detection of network interference. It consists of 83 IoT network features and three labels: binary, category, and subcategory [18]. The records are grouped into five distinct classes: Mirai, Scan, DoS, Normal, and MITM. The distribution of attack and baseline records is shown in Table I.

For ML/DL approaches, preparing the data is a crucial step. Data is preprocessed into a format that is appropriate for any learning model. Cleaning, label encoding, feature engineering, normalization, and data separation are all included in this step.

Cleaning the dataset: before training a model, it is necessary to check the dataset for empty and undefined instances. The dataset in this experiment was checked using Pandas, a Python library. There are some missing values in the IoTID20 dataset. All instances with missing values were removed from the dataset to make it clean.

Label encoding is a well-known technique for dealing with categorical values. It assigns each categorical value a different numeric value. The categorical features were converted to numeric values using the label encoder approach.

Normalization is the process of converting the numerical column values in a dataset to a common scale without distorting the differences in their ranges. The data were normalized to the range 0 to 1 using the min-max method.

Feature selection is a technique for reducing the number of input variables to a model. It is the procedure of automatically selecting pertinent features for the machine learning model. The process of feature selection involves keeping important features without changing them. ML is a subcategory of AI that is closely related to computational statistics and places a strong emphasis on prediction. ML models can learn baseline behavioral profiles on their own, which are subsequently used to spot significant deviations.
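The preprocessing steps described in this section (removing records with missing values, label encoding, min-max normalization to the range 0 to 1, and a 65:35 train/test split) can be sketched as follows. This is a minimal, dependency-free illustration rather than the authors' implementation: the paper reports using Pandas and scikit-learn, while the helper functions, the toy records, and the column names `proto`, `pkt_len`, and `label` below are hypothetical and do not reflect the real IoTID20 schema.

```python
import random


def drop_missing(rows):
    """Keep only the rows in which no field is None or an empty string."""
    return [r for r in rows if all(v is not None and v != "" for v in r.values())]


def label_encode(values):
    """Assign each distinct categorical value an integer code, in sorted order."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def min_max_scale(values):
    """Rescale numbers linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def split_train_test(rows, train_fraction=0.65, seed=0):
    """Shuffle a copy of the rows, then cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy records; the column names are hypothetical, not the IoTID20 schema.
records = [
    {"proto": ("tcp", "udp")[i % 2], "pkt_len": 40 + 10 * i,
     "label": "Normal" if i % 3 == 0 else "Anomaly"}
    for i in range(20)
]
records.append({"proto": "tcp", "pkt_len": None, "label": "Normal"})  # incomplete row

clean = drop_missing(records)                                   # cleaning
proto_codes, proto_map = label_encode([r["proto"] for r in clean])
label_codes, label_map = label_encode([r["label"] for r in clean])
scaled = min_max_scale([r["pkt_len"] for r in clean])           # normalization

dataset = [
    {"proto": p, "pkt_len": x, "label": y}
    for p, x, y in zip(proto_codes, scaled, label_codes)
]
train_rows, test_rows = split_train_test(dataset)               # 65:35 split
print(len(clean), len(train_rows), len(test_rows), proto_map, label_map)
```

With Pandas and scikit-learn, the same steps would typically map onto `DataFrame.dropna`, `LabelEncoder`, `MinMaxScaler`, and `train_test_split`. The sketch scales the whole column before splitting, as the text describes; a common alternative is to compute the scaling bounds on the training portion only, so that no information from the test set influences preprocessing.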

936 2023 10th International Conference on Computing for Sustainable Global Development (INDIACom)
Ensemble learning is a method for solving specific computational intelligence problems by strategically generating and combining a number of models, such as classifiers or experts. The main purpose of using ensemble learning is to enhance a model's performance (in terms of classification, prediction, function approximation, etc.) or to lessen the possibility of mistakenly choosing a subpar model. This study considers two ensemble approaches, XGBoost and RF. The supervised ML algorithm RF creates a forest from a number of decision trees (DTs), thus living up to its name. RF builds the DTs using data samples selected at random, obtains a prediction from each tree, and selects the best answer by voting [19]. With a highly scalable training method that avoids overfitting, XGBoost uses shallow decision trees that are built sequentially to generate accurate results [20].

The training procedure is iterative, with further trees being built as needed to account for the residuals or errors of earlier trees, which are then added to earlier trees to obtain the final prediction. Since attack patterns change daily, it is necessary to increase the accuracy of current machine learning algorithms in order to detect new threats [21].

Machine learning methods have tremendously benefited the creation and advancement of anomaly-based intrusion detection systems. The aim of the classification method is to use the data instances to train and build a model, which will then be used to accurately classify new instances [22]. The entire structure is shown in Fig. 1. The first and most important stage is to gather the dataset, inspect it carefully, and assess the features and their data types. Intrusion detection is a proactive security defense technique that is capable of efficiently defending the security of cyberspace. Systems for detecting intrusions have advanced along with data analysis technology. ML techniques allow IoT systems to make informed decisions by utilizing the data produced by the IoT networks.

Fig. 1. Architecture of the IDS System

IV. EXPERIMENT AND RESULTS

The experiments were carried out on a personal TOSHIBA laptop with an Intel(R) Core(TM) i5 CPU M 480 @ 2.67 GHz processor, 4 GB of RAM, and the 64-bit Windows 7 Ultimate operating system. The models were run in Google Colab Notebook, a hosted notebook environment. The scikit-learn library is used to implement the ensemble models. The Pandas and NumPy libraries are employed for data loading and cleaning.

In the experiment, accuracy, a popular multi-class performance metric, is assessed in order to analyze the performance of the models that were run. Precision, Recall, and F1-score are also calculated. These measures depend on the four fundamental indicators of model quality, TP, TN, FP, and FN (true positives, true negatives, false positives, and false negatives), as well as other metrics [23]. The performance is described below, along with a brief comparison of the testing accuracy that results from these procedures. Tables II and III show the testing recall, precision, accuracy, and F1-score of each ensemble model on the IoTID20 dataset, using all features and the 42 selected features respectively. These numbers show that RF achieves the highest test accuracy among the ensemble models (RF and XGBoost), reaching 99% with the 42 selected features. Every dataset comes with its own unique set of features. If a dataset contains numerous features, insignificant features that have no bearing on the output label must be removed; otherwise, over- and under-fitting will have a negative impact on the classifier's performance and execution time. The random forest methodology was employed in order to filter the most relevant variables. The dataset examined contains 83 variables. Using the embedded technique, we select 42 features to check the performance. Tables II and III include the findings of our experimental procedures.

TABLE II. PERFORMANCE OF RF AND XGBOOST USING ALL FEATURES OF IOTID20 DATASET

Classifier       Accuracy    Precision    Recall    F1-Score
Random Forest    98.20%      97.00%       96.40%    97.00%
XGBoost          97.14%      97.60%       93.80%    95.60%

TABLE III. PERFORMANCE OF RF AND XGBOOST USING ONLY 42 FEATURES OF THE IOTID20 DATASET

Classifier       Accuracy    Precision    Recall    F1-Score
Random Forest    99.00%      98.40%       98.00%    97.80%
XGBoost          98.47%      97.60%       93.80%    95.60%

V. CONCLUSION

In this paper, we have presented a data analysis method for the detection of intrusions in an IoT environment. As a broad introduction to the potential hazards posed by IoT, we start with the most recent versions of various intrusion detection systems. The paper then goes through two ensemble strategies for identifying IoT attacks in a known or even unknown setting. Based on the experimental examination, it can be said that among the ensemble techniques under study, RF performs the best. Additional research into the issue based on real-time data and power-time optimization is necessary. The next step in this study involves the use of deep learning algorithms, which are expected to improve the correctness and effectiveness of packet classification and recognition across a wide range of packet types sent over a network.

REFERENCES

[1] M. Frustaci, P. Pace, G. Aloi, and G. Fortino, “Evaluating critical security issues of the IoT world: Present and future challenges,” IEEE Internet Things J, vol. 5, no. 4, pp. 2483–2495, 2018, doi: 10.1109/JIOT.2017.2767291.
[2] S. Mani, V. Saravanan, T. S. Lawrence, G. R. Sakthidharan, and M. Veluchamy, “Advanced Security Model for Internet of Things Environment,” International Journal of Recent Technology and Engineering, vol. 8, no. 6, pp. 3387–3392, 2020, doi: 10.35940/ijrte.F8815.038620.
[3] G. Yang et al., “IoT-Based Remote Pain Monitoring System: From Device to Cloud Platform,” IEEE J Biomed Health Inform, vol. 22, no. 6, pp. 1711–1719, 2018, doi: 10.1109/JBHI.2017.2776351.

[4] T. A. Teli, F. Masoodi and R. Yousuf, “Security Concerns and Privacy Preservation in Blockchain based IoT Systems: Opportunities and Challenges,” in Proc. of the ICICNIS 2020, 2021.
[5] I. S. Thaseen, B. Poorva, and P. S. Ushasree, “Network Intrusion Detection using Machine Learning Techniques,” in Proc. of the International Conference on Emerging Trends in Information Technology and Engineering, 2020, doi: 10.1109/ic-ETITE47903.2020.148.
[6] F. Masoodi, S. Alam, and S. T. Siddiqui, “Security & Privacy Threats, Attacks and Countermeasures in Internet of Things,” Int. J. Netw. Secur. Its Appl., vol. 11, no. 02, pp. 67–77, 2019, doi: 10.5121/ijnsa.2019.11205.
[7] H. Alqahtani, I. H. Sarker, A. Kalim, S. M. Minhaz Hossain, S. Ikhlaq, and S. Hossain, “Cyber intrusion detection using machine learning classification techniques,” in Proc. of the International Conference on Computing Science, Communication and Security, 2020, pp. 121–131.
[8] F. S. Masoodi, I. Abrar, and A. M. Bamhdi, “An Effective Intrusion Detection System using Homogeneous Ensemble Techniques,” Int. J. Inf. Secur. Priv., vol. 16, no. 1, pp. 1–18, 2021, doi: 10.4018/ijisp.2022010112.
[9] A. Kumar, K. Abhishek, M. R. Ghalib, A. Shankar, and X. Cheng, “Intrusion detection and prevention system for an IoT environment,” Digital Communications and Networks, vol. 8, no. 4, pp. 540–551, 2022.
[10] X. Li, W. Chen, Q. Zhang, and L. Wu, “Building Auto-Encoder Intrusion Detection System based on random forest feature selection,” Comput. Secur., vol. 95, 2020, doi: 10.1016/j.cose.2020.101851.
[11] A. S. Talita, O. S. Nataza, and Z. Rustam, “Naïve Bayes Classifier and Particle Swarm Optimization Feature Selection Method for Classifying Intrusion Detection System Dataset,” J Phys Conf Ser, vol. 1752, no. 1, 2021, doi: 10.1088/1742-6596/1752/1/012021.
[12] H. Jiang, Z. He, G. Ye, and H. Zhang, “Network Intrusion Detection Based on PSO-Xgboost Model,” IEEE Access, vol. 8, pp. 58392–58401, 2020, doi: 10.1109/ACCESS.2020.2982418.
[13] S. Hosseini, “A new machine learning method consisting of GA-LR and ANN for attack detection,” Wireless Networks, vol. 26, no. 6, pp. 4149–4162, 2020, doi: 10.1007/s11276-020-02321-3.
[14] F. Masoodi and others, “Machine Learning for Classification analysis of Intrusion Detection on NSL-KDD Dataset,” Turkish Journal of Computer and Mathematics Education (TURCOMAT), vol. 12, no. 10, pp. 2286–2293, 2021.
[15] N. Abdalgawad and S. Member, “Generative Deep Learning to Detect Cyberattacks for the IoT-23 Dataset,” IEEE Access, vol. 10, pp. 6430–6441, 2022, doi: 10.1109/ACCESS.2021.3140015.
[16] A. M. Bamhdi, I. Abrar, and F. Masoodi, “An ensemble based approach for effective intrusion detection using majority voting,” Telkomnika (Telecommunication Computing Electronics and Control), vol. 19, no. 2, pp. 664–671, 2021, doi: 10.12928/TELKOMNIKA.v19i2.18325.
[17] I. Ullah and Q. H. Mahmoud, “A Scheme for Generating a Dataset for Anomalous Activity Detection in IoT Networks,” in Lecture Notes in Computer Science, 2020, pp. 508–520, doi: 10.1007/978-3-030-47358-7_52.
[18] I. Abrar, Z. Ayub, F. Masoodi, and A. M. Bamhdi, “A Machine Learning Approach for Intrusion Detection System on NSL-KDD Dataset,” in Proc. of the Int. Conf. Smart Electron Commun. (ICOSEC-2020), 2020, pp. 919–924, doi: 10.1109/ICOSEC49089.2020.9215232.
[19] C. Ambikavathi and S. K. Srivatsa, “Predictor Selection and Attack Classification using Random Forest for Intrusion Detection,” Journal of Scientific and Industrial Research, vol. 79, no. 05, pp. 365–368, 2020.
[20] S. S. Dhaliwal, A. al Nahid, and R. Abbas, “Effective intrusion detection system using XGBoost,” Information (Switzerland), vol. 9, no. 7, 2018, doi: 10.3390/info9070149.
[21] A. S. Ahanger, S. M. Khan, and F. Masoodi, “An Effective Intrusion Detection System using Supervised Machine Learning Techniques,” in Proc. of the 5th International Conference on Computing Methodologies and Communication (ICCMC-2021), 2021, pp. 1639–1644, doi: 10.1109/ICCMC51019.2021.9418291.
[22] V. Hassija, V. Chamola, V. Saxena, D. Jain, P. Goyal, and B. Sikdar, “A Survey on IoT Security: Application Areas, Security Threats, and Solution Architectures,” IEEE Access, vol. 7, pp. 82721–82743, 2019.
[23] M. Grandini, E. Bagli, and G. Visani, “Metrics for Multi-Class Classification: an Overview,” 2020, [Online]. Available: http://arxiv.org/abs/2008.05756.
