Enhancing Industrial IoT Security: A Comprehensive Approach To Intrusion Detection Using PCA-Driven Decision Trees
1st Ahmad Houkan, Dept. of Electrical Engineering
2nd Ashwin Kumar Sahoo, Dept. of Electrical Engineering
3rd Sarada Prasad Gochhayat, Dept. of Electrical Engineering
Abstract:
In the era of information technology, network security is of paramount importance, and robust
intrusion detection systems (IDS) are essential. Cyber threats are evolving rapidly, demanding
advanced techniques to safeguard critical networks and systems. This article presents a
comprehensive approach to building an effective IDS, focusing on feature engineering, dimensionality
reduction, and a systematic evaluation of classifiers. By applying Principal Component
Analysis (PCA) for dimensionality reduction and conducting thorough classifier assessments, this
research aims to improve network security and intrusion detection.
Introduction:
In the dynamic landscape of the Industrial Internet of Things (IIoT), the integration of smart technologies
into industrial processes has evolved swiftly, as exemplified by the real-life IoT applications in Fig. 1.
This shift presents a critical challenge: ensuring the security of IIoT-based systems in the face of
diverse cyber threats [1].
The increasing adoption of IIoT introduces a spectrum of potential risks, from common attacks such as
Brute Force and Port Scanning to sophisticated threats such as Denial of Service (DoS), Distributed
Denial of Service (DDoS), Man-in-the-Middle (MITM), Remote-to-Local (R2L), Probing (Probe), and
User-to-Root (U2R) attacks, as well as vulnerabilities in operating systems. Security breaches in IIoT
not only compromise sensitive industrial data but also pose a substantial risk to operational
efficiency and safety [2].
In response to this pressing need within the IIoT landscape, designing robust systems that safeguard
industrial networks becomes imperative. Numerous tools and methodologies have been devised to
counteract cyber threats, encompassing a range of solutions such as firewalls, spam filters, anti-malware
solutions, unified threat management, intrusion detection systems (IDS), and intrusion prevention
systems.
Within this context, signature-based detection is a well-established approach in Network Intrusion
Detection Systems (NIDS) for identifying known intrusions in IIoT environments. This method relies on
predefined signatures or patterns of known malicious activities. While anomaly-based detection excels
at identifying unknown intrusions, signature-based detection provides a specific and effective means
of recognizing and blocking known threats [3].
In our research on IIoT security, we focus on signature-based detection, employing Decision Trees (DT)
for classification. We also incorporate Principal Component Analysis (PCA) for feature reduction, which
improves the efficiency of the intrusion detection system. A data pre-processing step, involving
cleaning, encoding, and normalization, prepares the dataset for analysis.
As the Industrial Internet of Things continues to evolve, our research aims to contribute by refining
and strengthening signature-based detection using DT classification, PCA for feature reduction, and a
comprehensive data pre-processing step within the specific context of IIoT security.
Related Work:
Intrusion detection systems (IDS) play a crucial role in safeguarding network and system security,
especially within the realm of IoT (Internet of Things) networks. Here, we examine and evaluate several
significant studies in the field of intrusion detection, each employing unique methodologies, algorithms,
and datasets. The studies presented below offer valuable insights and innovations within the domain of
intrusion detection systems:
Sarwar et al. (2022) focused on designing an advanced intrusion detection system for IoT networks.
Their multifaceted approach involved data preprocessing, feature selection/reduction, and
classification, using Particle Swarm Optimization (PSO), XGBoost, and Random Forest (RF)
algorithms. Experiments on the IoTID20 dataset yielded an accuracy of 98% in binary classification
and 83% in multiclass classification. The study underscores the pivotal role of data preprocessing
and feature selection in enhancing intrusion detection accuracy [4].
Kasongo and Sun (2020) conducted a comprehensive performance analysis of intrusion detection
systems, focusing on feature selection methods using the UNSW-NB15 dataset. Their methodology
included data splitting, Min-Max scaling, and employing XGBoost for feature selection. The study
incorporated the application of Decision Tree (DT) and Artificial Neural Networks (ANN) as
classification algorithms. The outcomes demonstrated noteworthy accuracy rates in binary (up to 90.85%)
and multiclass (up to 77.51%) classifications, underscoring the significance of feature selection for
augmenting intrusion detection performance [5].
Akhther et al. (2023) introduced an intrusion detection system for IoT based on the Least Squares
Support Vector Machine (LSSVM). Their methodology included data preprocessing, encompassing
min-max normalization, equal-width discretization, and correlation-based feature selection. In
comparisons with Support Vector Machine (SVM) and Random Forest, LSSVM outperformed the
alternatives with an accuracy of 97.73%. This study highlights the potential of LSSVM in IoT
intrusion detection and the significance of efficient data preprocessing [6].
Omar and George (2021) presented a lightweight machine learning-based solution for IoT intrusion
detection. Their approach involved data preprocessing, feature reduction, model selection, and
hyperparameter tuning using GridSearchCV. Working with the IoTID20 dataset, the study reduced the
feature set to 34 relevant features and employed a Decision Tree as the classification model,
achieving an accuracy of 98.9%. This study underscores the importance of feature reduction and model
selection, particularly in resource-constrained IoT environments [7].
Compared with these studies, the approach adopted here combines PCA with a DT classifier. The
combination achieves high accuracy together with feature reduction in intrusion detection for IoT
networks. Integrating PCA for dimensionality reduction with Decision Trees is particularly compelling
because it suggests potential scalability and resource efficiency, making it a promising avenue for
intrusion detection in the complex and diverse realm of IoT security.
Table 1: Performance Analysis Table for Classification Methods
Reference | Methods | Algorithms | Dataset | Accuracy | Classification Type
[4] | Data Preprocessing, Feature Selection/Reduction, Classification | PSO, XGB, Random Forest (RF) | IoTID20 | 98% (Binary), 83% (Multiclass) | Binary, Multiclass
[5] | Data Splitting, Min-Max Scaling, Feature Selection with XGBoost | Decision Tree (DT), Artificial Neural Networks (ANN) | UNSW-NB15 | Binary: 90.85% (19 features), 88.13% (42 features); Multiclass: 77.51% (19 features), 75.62% (42 features) | Binary, Multiclass
[6] | Data Preprocessing, Feature Selection with LSSVM | Least Squares Support Vector Machine (LSSVM), SVM, Random Forest | IoTID20 | LSSVM: 97.73%, SVM: 95.28%, Random Forest: 89.26% | Binary
[7] | Data Preprocessing, Feature Reduction, Model Selection, Hyperparameter Tuning with GridSearchCV | Decision Tree | IoTID20 | 98.9% | Binary
Proposed Approach:
The proposed framework is depicted in Figure 2, which illustrates three key stages: pre-processing,
feature reduction using PCA, and modeling with subsequent assessment. In the pre-processing stage,
the IoTID20 dataset is loaded and then cleaned and standardized as a preparatory measure.
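As an illustration of the pre-processing stage, the following minimal Python sketch applies cleaning, label encoding, and standardization to a toy list of records. The row layout, the helper name `preprocess`, and the sample values are assumptions made for illustration; they are not taken from the IoTID20 dataset or from the authors' implementation.

```python
import math

def preprocess(rows):
    """Clean, encode, and standardize toy records.

    Each row is (feature_1, ..., feature_n, label); this layout is hypothetical.
    """
    # Cleaning: drop rows with missing or non-finite feature values.
    clean = [r for r in rows
             if all(v is not None and math.isfinite(v) for v in r[:-1])]

    # Encoding: map each distinct string label to an integer.
    classes = sorted({r[-1] for r in clean})
    label_to_id = {c: i for i, c in enumerate(classes)}
    y = [label_to_id[r[-1]] for r in clean]

    # Standardization: zero mean and unit variance per feature column.
    n_features = len(clean[0]) - 1
    X = [list(r[:-1]) for r in clean]
    for j in range(n_features):
        col = [x[j] for x in X]
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        for x in X:
            x[j] = (x[j] - mean) / std if std > 0 else 0.0
    return X, y, label_to_id

rows = [
    (10.0, 200.0, "Normal"),
    (12.0, None, "Anomaly"),   # removed during cleaning
    (14.0, 400.0, "Anomaly"),
    (18.0, 600.0, "Normal"),
]
X, y, label_to_id = preprocess(rows)
```

Standardizing before PCA matters because PCA is sensitive to feature scale: without it, features with large numeric ranges would dominate the principal components.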
Algorithm 1 details the application of Principal Component Analysis (PCA) to the IoTID20 dataset.
PCA is used as a dimensionality reduction method, transforming high-dimensional data into a
lower-dimensional space while preserving as much of the data variance as possible. The process
comprises standardizing the features, computing the covariance matrix, identifying the principal
components, and constructing a projection matrix for dimensionality reduction.
Algorithm 1: Principal Component Analysis (PCA) on the IoTID20 Dataset
Input: Pre-processed feature vectors of the IoTID20 dataset, D_IoTID20
Output: Reduced-dimensionality dataset
1 Standardize the dataset features.
2 Calculate the covariance matrix.
3 Compute the eigenvectors and eigenvalues.
4 Arrange the eigenvalues in descending order.
5 Choose the top k eigenvectors associated with the largest eigenvalues.
6 Construct the projection matrix from the selected k eigenvectors.
7 Project the initial dataset onto the reduced-dimensional subspace.
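The PCA steps listed above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pca_reduce`, the toy data, and the choice k = 2 are all hypothetical.

```python
import numpy as np

def pca_reduce(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    # Step 1: standardize each feature (constant columns are only centered).
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - mean) / std

    # Step 2: covariance matrix of the standardized features.
    cov = np.cov(Z, rowvar=False)

    # Step 3: eigen-decomposition (eigh is suited to symmetric matrices).
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Step 4: sort eigenvalues, and their eigenvectors, in descending order.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Steps 5-6: the top-k eigenvectors form the projection matrix W.
    W = eigvecs[:, :k]

    # Step 7: project the data onto the k-dimensional subspace.
    explained = eigvals[:k].sum() / eigvals.sum()
    return Z @ W, W, explained

# Toy data (assumption): 5 features, 3 of which are noisy linear
# combinations of the first two, so 2 components capture almost everything.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
mix = np.array([[1.0, 1.0, 2.0],
                [1.0, -1.0, 0.0]])
X = np.hstack([base, base @ mix + 0.01 * rng.normal(size=(200, 3))])
X_reduced, W, explained = pca_reduce(X, k=2)
```

The returned `explained` value is the fraction of total variance retained by the k components, which is one common way to choose k.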
Algorithm 2 outlines the workflow for implementing a Decision Tree (DT) classification model on a
dataset processed through Principal Component Analysis (PCA) and generating a Confusion Matrix. The
steps involve dataset splitting, DT application on the PCA-processed data, model training and evaluation,
prediction calculation, and the assessment of model accuracy and performance. The algorithm culminates
in the computation of the Confusion Matrix, which enables an in-depth analysis of True Positive (TP),
True Negative (TN), False Positive (FP), and False Negative (FN) values. This comprehensive approach
yields essential performance metrics such as Accuracy, Precision, Recall, and F1 Score, offering a
detailed insight into the DT classifier's performance post-PCA reduction.
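The workflow described above could be sketched with scikit-learn as shown below. This is an illustrative sketch only: the synthetic data from `make_classification` stands in for the pre-processed dataset, and the 70/30 split, the 10 principal components, and the default tree settings are assumptions rather than values reported in this paper. The sketch fits the scaler and PCA on the training split only, which is one common way to keep the test set unseen.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for the pre-processed dataset (assumption).
X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                           random_state=0)

# Split first so that scaling and PCA are fitted on training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=10, random_state=0).fit(scaler.transform(X_train))
Z_train = pca.transform(scaler.transform(X_train))
Z_test = pca.transform(scaler.transform(X_test))

# Train the Decision Tree on the PCA-reduced features and predict.
clf = DecisionTreeClassifier(random_state=0).fit(Z_train, y_train)
y_pred = clf.predict(Z_test)

# For labels [0, 1], ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
```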
Algorithm 2: Decision Tree (DT) Classification Model with PCA-Processed Input and Confusion Matrix
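Accuracy (AC):
$$\mathrm{AC} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision (P):
$$P = \frac{TP}{TP + FP}$$
Recall:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
F-Measure:
$$F\text{-}\mathrm{measure} = 2 \times \frac{P \times \mathrm{Recall}}{P + \mathrm{Recall}}$$
Error Rate:
$$\mathrm{Error\ Rate} = 1 - \mathrm{AC}$$
Specificity:
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$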
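The metric definitions above can be computed directly from the four confusion-matrix counts. The following Python sketch shows one way to do so; the function name `classification_metrics` and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the six metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. the data contains
    positives, negatives, and at least one positive prediction.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    error_rate = 1 - accuracy
    specificity = tn / (tn + fp)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "error_rate": error_rate,
        "specificity": specificity,
    }

# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, tn=880, fp=10, fn=20)
```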
Table 2: Performance Metrics for DT Classifier on Dataset Evaluation
Metric Value
Training Time 10.6876 Sec
Testing Time 0.1340 Sec
Accuracy-DT-Classifier 99.8114%
Accuracy 99.80%
Error Rate 0.20%
Recall 98.20%
Specificity 99.90%
Precision 98.80%
F-measure 98.50%
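As a quick consistency check, and assuming the percentages in Table 2 are rounded to two decimal places, the reported F-measure and error rate follow from the reported precision, recall, and accuracy:

```latex
F\text{-}\mathrm{measure} = 2 \times \frac{0.988 \times 0.982}{0.988 + 0.982}
                          = \frac{1.940432}{1.970} \approx 0.9850,
\qquad
\mathrm{Error\ Rate} = 1 - 0.9980 = 0.0020
```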
The area under the ROC curve (AUC) is 0.9906, as depicted in Figure 5.
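For readers unfamiliar with the AUC, it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way in plain Python; the function name `roc_auc` and the toy labels and scores are hypothetical and unrelated to the reported value.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

This pairwise form is quadratic in the number of samples, so it is suited to illustration rather than to large test sets.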
Conclusion:
An attack detection system for IoT networks was developed using a PCA-DT model. In simulations
conducted in MATLAB, the system achieved 99.8% accuracy in binary classification. Future work
includes extending the model to multi-class classification and evaluating multiple algorithms. The
accuracy obtained provides a strong foundation for expanding the model to cover diverse attack
classes within IoT networks.
References:
[1] Houkan, A., Sahoo, A. K., & Nayak, S. (2023). Industry 5.0 and sustainable development in
the developing world. 2023 IEEE 3rd International Conference on Sustainable Energy and
Future Electric Transportation (SEFET). doi:10.1109/sefet57834.2023.10245935
[2] M. A. Ferrag, L. Maglaras, A. Ahmim, M. Derdour and H. Janicke, "RDTIDS: Rules and
decision tree-based intrusion detection system for internet-of-things networks," Future
Internet, vol. 12, no. 3, p. 44, 2020.
[3] Khraisat, A. and Alazab, A. (2021) ‘A critical review of intrusion detection systems in the
internet of things: Techniques, deployment strategy, validation strategy, attacks, public
datasets and challenges’, Cybersecurity, 4(1). doi:10.1186/s42400-021-00077-7.
[4] Sarwar, Asima, et al. "Design of an Advanced Intrusion Detection System for IoT Networks."
2022 2nd International Conference on Artificial Intelligence (ICAI). IEEE, 2022.
[5] Kasongo, Sydney M., and Yanxia Sun. "Performance analysis of intrusion detection systems
using a feature selection method on the UNSW-NB15 dataset." Journal of Big Data 7
(2020): 1-20.
[7] Omar, Mawloud, and Laurent George. "Toward a lightweight machine learning based solution
against cyber-intrusions for IoT." 2021 IEEE 46th Conference on Local Computer
Networks (LCN). IEEE, 2021.
[8] Islam, M. J., Ahmad, S., Haque, F., Reaz, M. B. I., Bhuiyan, M. A. S., & Islam, M. R. (2022).
Application of Min-Max Normalization on Subject-Invariant EMG Pattern
Recognition. IEEE Transactions on Instrumentation and Measurement. DOI:
10.1109/TIM.2022.322028
[10] J. Long, S. Zhang and C. Li, "Evolving deep echo state networks for intelligent fault diagnosis," IEEE
Transactions on Industrial Informatics, vol. 16, no. 7, pp. 4928-4937, 2020.