Intrusion Detection System For IoT Environment Using Ensemble Approaches
Abstract— To ensure that important assets are available and secure within a protected network architecture, Intrusion Detection Systems (IDS) are commonly used. However, current IDS algorithms often struggle to perform effectively. To address this, machine learning has been employed to enhance IDS efficiency. The main challenge with IDS classification is the large amount of irrelevant and redundant data in high-dimensional datasets, which makes it hard for a single classifier to identify all types of attacks effectively. Thus, a novel ensemble IDS approach was proposed in this study. The approach involved using Random Forest (RF) for dimensionality reduction to select the optimal subset of the initial dataset. An ensemble learning method was then used for intrusion detection and identification. The proposed RF method outperformed other state-of-the-art approaches in several parameters with an accuracy of 99%, as demonstrated by experimental results on the IoTID20 dataset. The approach was evaluated using several performance criteria, including Accuracy, Precision, Recall, and F1-score.

Keywords— Intrusion detection, Internet of Things (IoT), Feature Selection, Machine learning, Dataset, Random Forest, ensemble approach.

I. INTRODUCTION

The Internet of Things (IoT) is expanding quickly and is becoming increasingly important in our daily lives. IoT nodes connect to the Internet by using an IP address. As a result, users of various social networking platforms will be able to connect to and share devices [1]. This broad range of IoT applications raises concerns about security and privacy. Without a secure and reliable IoT ecosystem, emerging IoT applications cannot be widely implemented. IoT security concerns include privacy, authentication, management, and information storage, as well as the usual security issues that the Internet, cellular networks, and wireless sensor networks (WSNs) face. All of these problems and weaknesses make IoT applications a prime target. Attacks on privacy and security have occurred all across the world.

The Mirai attack, which launched a DDoS attack in the 4th quarter of 2016, compromised around 2.5 million Internet-connected devices. After Mirai, Hajime and Reaper are two other noteworthy botnet attacks against sizable IoT device networks [2]. IoT hardware is less powerful and less secure than conventional hardware. IoT devices offer a backdoor for attackers to infiltrate residential and commercial networks, giving them simple access to user data. The Internet of Things is also becoming more than just physical objects or things. Implanting IoT devices into people to monitor the health of their organs has been tried successfully on a number of occasions [3]. Attackers may target these devices in an effort to manipulate data or track down a specific person. Although no such attack has yet been reported, a compromise of these devices could be very harmful. Security in the IoT refers to the process of protecting Internet-connected devices and the network by monitoring, recognising, and defending against threats and breaches that could endanger the security of network resources. These threats may jeopardise the availability, confidentiality, and integrity of data [4], [5]. Therefore, it is crucial to identify these flaws and threats as soon as possible to protect data privacy. Traditional security measures such as user authentication, firewalls, encryption, and access control in data transfer have limits when it comes to completely defending networks and devices from more sophisticated attacks. First, the considerable amount of redundant and irrelevant data in high-dimensional datasets presents the main challenge in an IDS's classification process. Second, it is implausible that a single classifier can accurately classify every attack type. It is crucial to create an IDS that is capable of spotting sophisticated network attacks. Additionally, a sufficient dataset must be available to assess IDS performance before the system is deployed in the real world [6].

Researchers require a dataset to train and evaluate their models. This remains challenging because few datasets are freely accessible, and some of them lack thoroughness and completeness. Machine learning (ML) technology has improved enormously thanks to the growing capability of computing devices. Because ML classifiers significantly improve a system's accuracy and robustness, they are used to flag multiple attacks in the security field [7], [8]. In this study, we propose a new ensemble intrusion detection system (IDS) using ML approaches that entails two steps: feature selection and intrusion detection. In the first step, the ideal feature subset is selected from the original dataset, removing redundant and irrelevant data, with RF as the dimensionality reduction approach. In the second step, an ensemble learning technique identifies and detects intrusions. XGBoost and RF are used because they are among the most effective techniques for predictive modeling and because a single classifier cannot effectively identify every kind of attack.

II. RELATED WORK

The NBIPS proposed by Kumar et al. examines network activity streams to spot and stop instances of misuse. Both
936 2023 10th International Conference on Computing for Sustainable Global Development (INDIACom)
Authorized licensed use limited to: FLORIDA INTERNATIONAL UNIVERSITY. Downloaded on August 18,2023 at 18:32:54 UTC from IEEE Xplore. Restrictions apply.
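The ensemble step outlined in the Introduction can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes one-split decision stumps in place of full decision trees, and the function names, the sample data, and the tree count are all hypothetical. It shows only two ingredients commonly associated with a random forest, namely training each tree on randomly drawn samples and choosing the final label by majority vote.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels):
    """Fit a one-split tree: pick the (feature, threshold) with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        # Skip the largest value so the right-hand side is never empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no usable split: every sampled row is identical
        m = majority(labels)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """Each stump votes; the most common label wins."""
    return majority([predict_stump(s, row) for s in forest])


# Hypothetical two-feature traffic records (not taken from IoTID20).
rows = [[0.1, 5], [0.2, 3], [0.3, 4], [0.9, 5], [0.8, 3], [0.7, 4]]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
forest = fit_forest(rows, labels)
print(predict_forest(forest, [0.05, 4]), predict_forest(forest, [0.95, 4]))
```

Voting over many trees means that a single tree trained on an unlucky sample has little influence on the final decision, which is the motivation usually given for preferring an ensemble to one classifier.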
Ensemble learning is a method for solving specific computational intelligence problems by strategically generating and combining a number of models, such as classifiers or experts. The main purpose of using ensemble learning is to enhance a model's performance (in terms of classification, prediction, function approximation, etc.) or to lessen the possibility of mistakenly choosing a subpar model. This study considers two ensemble approaches, XGBoost and RF. RF is a supervised ML algorithm that, as its name suggests, builds a forest from a number of decision trees (DTs). Each DT is built from data samples selected at random and produces a prediction, and RF selects the final answer by voting [19]. XGBoost builds shallow decision trees sequentially, using a highly scalable training method designed to limit overfitting, to generate accurate results [20].

The training procedure is iterative: further trees are built as needed to account for the residuals or errors of earlier trees, and their outputs are added to those of the earlier trees to obtain the final prediction. Since attack patterns change daily, it is necessary to increase the accuracy of current machine learning algorithms in order to detect new threats [21].

Machine learning methods have tremendously benefited the creation and advancement of anomaly-based intrusion detection systems. The aim of the classification method is to use the data instances to train and build a model, which will then be used to accurately classify new instances [22]. The entire structure is shown in Fig. 1. The first and most important stage is to gather the dataset, inspect it carefully, and assess the features and their data types. Intrusion detection is a proactive security defense technique that is capable of efficiently defending the security of cyberspace. Systems for detecting intrusions have advanced along with data analysis technology. ML techniques allow IoT systems to make sound decisions by utilizing the data produced by IoT networks.

Fig. 1. Architecture of the IDS System

performances of the models that were run. The Precision, Recall, and F1-score are also calculated. These measures depend on four fundamental confusion-matrix counts: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) [23]. The performance is described below, along with a brief comparison of the testing accuracy that results from these procedures. Table II shows the testing recall, precision, accuracy, and F1-score of each ensemble model using all features of the IoTID20 dataset. These numbers show that RF achieves the highest test accuracy among the ensemble models (RF and XGBoost), reaching 99% with the reduced feature set (Table III). Every dataset comes with its own unique set of features. If a dataset contains numerous features, insignificant features that have no bearing on the output label must be removed; otherwise, over- and under-fitting will have a negative impact on the classifier's performance and execution time. The random forest methodology was employed to filter the most relevant variables. The dataset examined contains 83 variables. Using the embedded technique, we select 42 features to check the performance. Tables II and III include the findings of our experimental procedures.

TABLE II. PERFORMANCE OF RF AND XGBOOST USING ALL FEATURES OF THE IOTID20 DATASET

Classifier      Accuracy   Precision   Recall    F1-Score
Random Forest   98.20%     97.00%      96.40%    97.00%
XGBoost         97.14%     97.60%      93.80%    95.60%

TABLE III. PERFORMANCE OF RF AND XGBOOST USING ONLY 42 FEATURES OF THE IOTID20 DATASET

Classifier      Accuracy   Precision   Recall    F1-Score
Random Forest   99.00%     98.40%      98.00%    97.80%
XGBoost         98.47%     97.60%      93.80%    95.60%

V. CONCLUSION

In this paper, we have presented a data analysis method for detecting intrusions in an IoT environment. As a broad introduction to the potential hazards posed by the IoT, we started with recent intrusion detection systems. The paper then examined two ensemble strategies for identifying IoT attacks in a known or even unknown setting. Based on the experimental examination, RF performs best among the ensemble techniques under study. Additional research based on real-time data and power-time optimization is still necessary. The next step in this line of work involves the use of deep learning algorithms, which are expected to improve the correctness and effectiveness of packet classification and recognition across a wide range of packet types sent over a network.
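The evaluation measures reported in Tables II and III follow from the four confusion-matrix counts. As a minimal sketch, assuming a binary attack/normal setting and the standard definitions (the paper does not say how multi-class results were averaged), they can be computed as follows. The function name and the counts in the example are hypothetical and are not taken from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Precision penalises false alarms and recall penalises missed attacks, so reporting both alongside accuracy matters when attack and normal traffic are unevenly represented.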
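The feature-filtering step, which reduces the 83 variables to 42, can be pictured as ranking features by an importance score and keeping the top k. The sketch below is a simplified stand-in and not the authors' procedure: it scores each feature by the largest Gini impurity reduction achievable with a single threshold split, whereas a random forest accumulates such reductions over many trees and splits. The function names and the sample data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_gain(column, labels):
    """Largest impurity reduction obtainable with one threshold on this column."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # Skip the largest value so the right-hand side is never empty.
    for t in sorted(set(column))[:-1]:
        left = [y for x, y in zip(column, labels) if x <= t]
        right = [y for x, y in zip(column, labels) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def select_top_k(rows, labels, k):
    """Return the kept column indices, all scores, and the reduced rows."""
    n_features = len(rows[0])
    scores = [best_split_gain([r[f] for r in rows], labels) for f in range(n_features)]
    ranked = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    keep = sorted(ranked[:k])  # preserve the original column order
    reduced = [[r[f] for f in keep] for r in rows]
    return keep, scores, reduced


# Hypothetical data: column 0 separates the classes, column 1 is constant,
# column 2 is only weakly related to the label.
rows = [[0.1, 1, 7], [0.2, 1, 3], [0.3, 1, 7], [0.9, 1, 3], [0.8, 1, 7], [0.7, 1, 3]]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
keep, scores, reduced = select_top_k(rows, labels, k=2)
print(keep, reduced[0])
```

In the setting described in the paper, k would be 42 and the scores would come from the trained random forest rather than from this single-split approximation.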
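The iterative training described for the boosted model, in which each new tree fits the residuals of the trees before it and the outputs are summed, can be sketched with plain gradient boosting on squared error. This is an illustrative assumption and not XGBoost itself, which additionally uses regularisation and second-order gradient information. The one-dimensional input, the function names, the learning rate, and the data are hypothetical.

```python
def fit_regression_stump(xs, targets):
    """One-split regressor: choose the threshold minimising squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all inputs identical: predict the mean everywhere
        mean = sum(targets) / len(targets)
        return (float("inf"), mean, mean)
    return best[1:]


def stump_value(stump, x):
    t, left_mean, right_mean = stump
    return left_mean if x <= t else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_regression_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_value(stump, x) for p, x in zip(preds, xs)]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}


def predict(model, x):
    """Final prediction: base value plus the scaled sum of all stump outputs."""
    correction = sum(stump_value(s, x) for s in model["stumps"])
    return model["base"] + model["learning_rate"] * correction


# Hypothetical scores: 0 for normal traffic, 1 for attack traffic.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model = boost(xs, ys)
print(round(predict(model, 1), 4), round(predict(model, 6), 4))
```

Each round shrinks the remaining error by a fixed fraction on this toy data, which illustrates why later trees in a boosted ensemble concentrate on what earlier trees got wrong.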
[4] T. A. Teli, F. Masoodi, and R. Yousuf, “Security Concerns and Privacy Preservation in Blockchain based IoT Systems: Opportunities and Challenges,” in Proc. of the ICICNIS 2020, 2021.
[5] I. S. Thaseen, B. Poorva, and P. S. Ushasree, “Network Intrusion Detection using Machine Learning Techniques,” in Proc. of the International Conference on Emerging Trends in Information Technology and Engineering, 2020, doi: 10.1109/ic-ETITE47903.2020.148.
[6] F. Masoodi, S. Alam, and S. T. Siddiqui, “Security & Privacy Threats, Attacks and Countermeasures in Internet of Things,” Int. J. Netw. Secur. Its Appl., vol. 11, no. 02, pp. 67–77, 2019, doi: 10.5121/ijnsa.2019.11205.
[7] H. Alqahtani, I. H. Sarker, A. Kalim, S. M. Minhaz Hossain, S. Ikhlaq, and S. Hossain, “Cyber intrusion detection using machine learning classification techniques,” in Proc. of the International Conference on Computing Science, Communication and Security, 2020, pp. 121–131.
[8] F. S. Masoodi, I. Abrar, and A. M. Bamhdi, “An Effective Intrusion Detection System using Homogeneous Ensemble Techniques,” Int. J. Inf. Secur. Priv., vol. 16, no. 1, pp. 1–18, 2021, doi: 10.4018/ijisp.2022010112.
[9] A. Kumar, K. Abhishek, M. R. Ghalib, A. Shankar, and X. Cheng, “Intrusion detection and prevention system for an IoT environment,” Digital Communications and Networks, vol. 8, no. 4, pp. 540–551, 2022.
[10] X. Li, W. Chen, Q. Zhang, and L. Wu, “Building Auto-Encoder Intrusion Detection System based on random forest feature selection,” Comput. Secur., vol. 95, 2020, doi: 10.1016/j.cose.2020.101851.
[11] A. S. Talita, O. S. Nataza, and Z. Rustam, “Naïve Bayes Classifier and Particle Swarm Optimization Feature Selection Method for Classifying Intrusion Detection System Dataset,” J. Phys. Conf. Ser., vol. 1752, no. 1, 2021, doi: 10.1088/1742-6596/1752/1/012021.
[12] H. Jiang, Z. He, G. Ye, and H. Zhang, “Network Intrusion Detection Based on PSO-Xgboost Model,” IEEE Access, vol. 8, pp. 58392–58401, 2020, doi: 10.1109/ACCESS.2020.2982418.
[13] S. Hosseini, “A new machine learning method consisting of GA-LR and ANN for attack detection,” Wireless Networks, vol. 26, no. 6, pp. 4149–4162, 2020, doi: 10.1007/s11276-020-02321-3.
[14] F. Masoodi et al., “Machine Learning for Classification analysis of Intrusion Detection on NSL-KDD Dataset,” Turkish Journal of Computer and Mathematics Education (TURCOMAT), vol. 12, no. 10, pp. 2286–2293, 2021.
[15] N. Abdalgawad et al., “Generative Deep Learning to Detect Cyberattacks for the IoT-23 Dataset,” IEEE Access, vol. 10, pp. 6430–6441, 2022, doi: 10.1109/ACCESS.2021.3140015.
[16] A. M. Bamhdi, I. Abrar, and F. Masoodi, “An ensemble based approach for effective intrusion detection using majority voting,” Telkomnika (Telecommunication Computing Electronics and Control), vol. 19, no. 2, pp. 664–671, 2021, doi: 10.12928/TELKOMNIKA.v19i2.18325.
[17] I. Ullah and Q. H. Mahmoud, “A Scheme for Generating a Dataset for Anomalous Activity Detection in IoT Networks,” in Lecture Notes in Computer Science, 2020, pp. 508–520, doi: 10.1007/978-3-030-47358-7_52.
[18] I. Abrar, Z. Ayub, F. Masoodi, and A. M. Bamhdi, “A Machine Learning Approach for Intrusion Detection System on NSL-KDD Dataset,” in Proc. of the Int. Conf. Smart Electron Commun. (ICOSEC-2020), 2020, pp. 919–924, doi: 10.1109/ICOSEC49089.2020.9215232.
[19] C. Ambikavathi and S. K. Srivatsa, “Predictor Selection and Attack Classification using Random Forest for Intrusion Detection,” Journal of Scientific and Industrial Research, vol. 79, no. 05, pp. 365–368, 2020.
[20] S. S. Dhaliwal, A. al Nahid, and R. Abbas, “Effective intrusion detection system using XGBoost,” Information (Switzerland), vol. 9, no. 7, 2018, doi: 10.3390/info9070149.
[21] A. S. Ahanger, S. M. Khan, and F. Masoodi, “An Effective Intrusion Detection System using Supervised Machine Learning Techniques,” in Proc. of the 5th International Conference on Computing Methodologies and Communication (ICCMC-2021), 2021, pp. 1639–1644, doi: 10.1109/ICCMC51019.2021.9418291.
[22] V. Hassija, V. Chamola, V. Saxena, D. Jain, P. Goyal, and B. Sikdar, “A Survey on IoT Security: Application Areas, Security Threats, and Solution Architectures,” IEEE Access, vol. 7, pp. 82721–82743, 2019.
[23] M. Grandini, E. Bagli, and G. Visani, “Metrics for Multi-Class Classification: an Overview,” 2020, [Online]. Available: http://arxiv.org/abs/2008.05756.