Enhancing Industrial IoT Security: A Comprehensive Approach To Intrusion Detection Using PCA-Driven Decision Trees

This document proposes an intrusion detection system for Industrial IoT networks using principal component analysis (PCA) and decision trees. It first discusses the need for robust intrusion detection in IIoT due to evolving cyber threats. It then reviews related work applying techniques like data preprocessing, feature selection, and machine learning algorithms. The proposed approach involves preprocessing data, reducing dimensions with PCA, and classifying intrusions using decision trees. This combination of PCA and decision trees could provide high accuracy intrusion detection suited for resource-constrained IIoT environments.

Enhancing Industrial IoT Security: A Comprehensive Approach to Intrusion Detection Using PCA-Driven Decision Trees

1st Ahmad Houkan, 2nd Ashwin Kumar Sahoo, 3rd Sarada Prasad Gochhayat
Dept. of Electrical Engineering, C. V. Raman Global University, Bhubaneswar, India
[email protected], [email protected], [email protected]

Abstract:
In the era of information technology, network security is of paramount importance, and robust
intrusion detection systems (IDS) are essential. Cyber threats are evolving rapidly, demanding
advanced techniques to safeguard critical networks and systems. This paper presents a
comprehensive approach to building an effective IDS, focusing on feature engineering, dimensionality
reduction, and a systematic evaluation of classifiers. By applying Principal Component
Analysis (PCA) for dimensionality reduction and conducting thorough classifier assessments, this
research aims to improve network security and intrusion detection.
Introduction:
In the dynamic landscape of the Industrial Internet of Things (IIoT), the integration of smart technologies
into industrial processes has undergone a swift and transformative evolution, as exemplified by real-life
IoT applications in Fig. 1. This shift presents a critical challenge—ensuring the security of IIoT-based
systems in the face of diverse cyber threats.[1]
The increasing adoption of IIoT introduces a spectrum of potential risks, from common attacks like Brute
Force and Port Scanning to sophisticated threats such as Denial of Service (DoS), Distributed Denial of
Service (DDoS), Man-in-the-Middle (MITM), Remote-to-Local (R2L), Probing (Probe), User-to-Root
(U2R), and vulnerabilities in operating systems. Security breaches in IIoT not only compromise sensitive
industrial data but also pose a substantial risk to operational efficiency and safety.[2]
In response to this pressing need within the IIoT landscape, designing robust systems that safeguard
industrial networks becomes imperative. Numerous tools and methodologies have been devised to
counteract cyber threats, encompassing a range of solutions such as firewalls, spam filters, anti-malware
solutions, unified threat management, intrusion detection systems (IDS), and intrusion prevention
systems.
Within this context, Signature-based detection emerges as a powerful approach in Network Intrusion
Detection Systems (NIDS), specifically tailored for identifying known intrusions in IIoT environments.
This method relies on predefined signatures or patterns of known malicious activities. While Anomaly-
based detection excels at identifying unknown intrusions, Signature-based detection provides a specific
and effective means of recognizing and blocking known threats.[3]
Our research on IIoT security centers on Signature-based detection, employing Decision Trees (DT)
for classification. We also apply Principal Component Analysis (PCA) for feature reduction to
improve the efficiency of the intrusion detection system. A data pre-processing step, involving
cleaning, encoding, and normalization, prepares the dataset for analysis.
As the Industrial Internet of Things continues to evolve, presenting unique challenges and
opportunities, our research aims to contribute by strengthening Signature-based detection through
DT classification, PCA-based feature reduction, and a comprehensive data pre-processing step
within the specific context of IIoT security.

Fig. 1. IoT Applications Landscape

Related Work:
Intrusion detection systems (IDS) play a crucial role in safeguarding network and system security,
especially within the realm of IoT (Internet of Things) networks. Here, we examine and evaluate several
significant studies in the field of intrusion detection, each employing unique methodologies, algorithms,
and datasets. The studies presented below offer valuable insights and innovations within the domain of
intrusion detection systems:
Sarwar et al. (2022) focused on designing an advanced intrusion detection system for IoT networks.
Their multifaceted approach involved data preprocessing, feature selection/reduction, and
classification, using Particle Swarm Optimization (PSO), XGBoost, and Random Forest (RF)
algorithms. Experiments on the IoTID20 dataset yielded an accuracy of 98% in binary classification
and 83% in multiclass classification. The study underscores the pivotal role of data preprocessing
and feature selection in enhancing intrusion detection accuracy [4].
Kasongo and Sun (2020) conducted a comprehensive performance analysis of intrusion detection
systems, focusing on feature selection methods using the UNSW-NB15 dataset. Their methodology
included data splitting, Min-Max scaling, and employing XGBoost for feature selection. The study
incorporated the application of Decision Tree (DT) and Artificial Neural Networks (ANN) as
classification algorithms. The outcomes demonstrated noteworthy accuracy rates in binary (up to 90.85%)
and multiclass (up to 77.51%) classifications, underscoring the significance of feature selection for
augmenting intrusion detection performance [5].
Akhther et al. (2023) introduced a novel intrusion detection system for IoT based on Least Square
Support Vector Machine (LSSVM). Their methodology included data preprocessing, encompassing min-
max normalization, equal-width discretization, and correlation-based feature selection. Through
comparisons with Support Vector Machine (SVM) and Random Forest, the study achieved remarkable
accuracy rates, with LSSVM outperforming alternatives at 97.73%. This study highlights the potential of
LSSVM in IoT intrusion detection and the significance of efficient data preprocessing.[6]
Omar and George (2021) presented a lightweight machine learning-based solution for IoT intrusion
detection. Their approach involved data preprocessing, feature reduction, model selection, and
hyperparameter tuning using GridSearchCV. Working with the IoTID20 dataset, the study reduced the
feature set to 34 relevant features and employed a Decision Tree as the classification model,
achieving an accuracy of 98.9%. This study underscores the importance of feature reduction and
model selection, particularly in resource-constrained IoT environments [7].
Compared with these studies, the approach we adopt combines PCA (Principal Component Analysis)
for dimensionality reduction with a Decision Tree (DT) classifier. This combination achieves high
accuracy with a substantially reduced feature set in the context of intrusion detection for IoT
networks, which suggests potential scalability and resource efficiency. It is therefore a
promising avenue for enhancing intrusion detection, especially in the complex and diverse realm of
IoT security.
Table 1: Performance Analysis Table for Classification Methods

| Reference | Methods | Algorithms | Dataset | Accuracy | Classification Type |
|---|---|---|---|---|---|
| [4] | Data Preprocessing, Feature Selection/Reduction, Classification | PSO, XGB, Random Forest (RF) | IoTID20 | 98% (Binary), 83% (Multiclass) | Binary, Multiclass |
| [5] | Data Splitting, Min-Max Scaling, Feature Selection with XGBoost | Decision Tree (DT), Artificial Neural Networks (ANN) | UNSW-NB15 | Binary: 90.85% (19 features), 88.13% (42 features); Multiclass: 77.51% (19 features), 75.62% (42 features) | Binary, Multiclass |
| [6] | Data Preprocessing, Feature Selection with LSSVM | Least Squares Support Vector Machine (LSSVM), SVM, Random Forest | IoTID20 | LSSVM: 97.73%, SVM: 95.28%, Random Forest: 89.26% | Binary |
| [7] | Data Preprocessing, Feature Reduction, Model Selection, Hyperparameter Tuning with GridSearchCV | Decision Tree | IoTID20 | 98.9% | Binary |

III. PROPOSED APPROACH
The structure of the proposed framework is depicted in Figure 2, illustrating four key stages:
pre-processing, feature reduction using PCA, modeling, and subsequent assessment. In the
pre-processing stage, the IoTID20 dataset is loaded and then undergoes cleaning and normalization
as a preparatory measure.

Fig. 2. Proposed Dataset Analysis Framework


The original training dataset undergoes a process of feature reduction using the PCA algorithm to
generate optimal feature vectors. These resulting optimal feature vectors are then applied to train the DT
model.
Before deploying the model in real-world scenarios, it is essential to train it on a data sample to adapt to
actual network traffic patterns. The IoTID20 dataset is employed for this purpose, but prior to
constructing a model, the dataset undergoes necessary cleaning and normalization. Data preprocessing not
only enhances data quality but also strengthens the precision of the model being trained. This includes
tasks such as rectifying missing values, eliminating duplicates, and resolving structural errors. Following
the cleaning process, features are normalized through Min-Max scaling, as depicted in the following
equation [8].
$$V' = \frac{v - \min_A}{\max_A - \min_A}\,(\mathit{new\_max}_A - \mathit{new\_min}_A) + \mathit{new\_min}_A$$
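The Min-Max scaling equation above can be illustrated with a minimal Python sketch. The function name `min_max_scale`, the default target range [0, 1], and the sample values are assumptions made for illustration; they are not taken from the paper's MATLAB implementation.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale numbers from [min_A, max_A] to [new_min, new_max]."""
    min_a, max_a = min(values), max(values)
    if max_a == min_a:
        # Constant feature: the formula would divide by zero, so map to new_min.
        return [new_min for _ in values]
    return [
        (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min
        for v in values
    ]


# Hypothetical packet-length feature (illustrative values only).
packet_lengths = [60, 120, 1500, 780]
scaled = min_max_scale(packet_lengths)
print(scaled)
```

The smallest value maps to `new_min` and the largest to `new_max`, so every feature ends up on a common scale before PCA is applied.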
The IoTID20 dataset comprises 86 attributes, with three specifically designated for classification,
resulting in a total of 83 features. This dataset includes three distinct label features: binary, category, and
subcategory. Notably, it covers four primary types of attacks: Scan, Mirai, Denial of Service (DoS), and
Man in the Middle (MITM). The detailed breakdowns and variations of these attacks are visually
presented in Fig. 3.

Fig. 3. Hierarchical Attack Classification in the IoTID20 Dataset


These attacks threaten critical security requirements such as authentication, confidentiality,
integrity, and availability. Given the potential severity of these implications for IoT
applications [9], the accurate identification of such attacks is essential. Real-world datasets
are intricate because of their high dimensionality, as data is collected from a multitude of IoT
devices and sensors. When constructing machine learning models, selecting a set of effective and
non-redundant features is critical, since the quality of features significantly influences the
performance of the machine learning classifier [10].

Algorithm 1 details the application of Principal Component Analysis (PCA) to the IoTID20 dataset.
PCA is a dimensionality reduction method that transforms high-dimensional data into a
lower-dimensional space while preserving as much of the data variance as possible. The process
comprises standardizing the features, computing the covariance matrix, identifying the principal
components, and constructing a projection matrix for dimensionality reduction.

Algorithm 1: Principal Component Analysis (PCA) Applied on IoTID20 Dataset

Input: Pre-processed features vector of IoTID20 dataset as D_IoTID20
Output: Reduced dimensionality dataset
1 Standardize the dataset features.
2 Calculate the covariance matrix.
3 Compute the eigenvectors and eigenvalues.
4 Arrange the eigenvalues in descending order.
5 Choose the top k eigenvectors associated with the largest eigenvalues.
6 Construct the projection matrix from the selected k eigenvectors.
7 Project the initial dataset onto the reduced-dimensional subspace.
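The steps of Algorithm 1 can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the authors' MATLAB code: the function name `pca_reduce`, the synthetic input, and the choice of `k` are assumptions, and the random data merely stands in for pre-processed IoTID20 features.

```python
import numpy as np


def pca_reduce(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    # Step 1: standardize each feature to zero mean and unit variance.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Z = (X - mean) / std

    # Step 2: covariance matrix of the standardized features.
    cov = np.cov(Z, rowvar=False)

    # Step 3: eigen-decomposition (eigh is intended for symmetric matrices).
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Step 4: sort eigenvalues and matching eigenvectors in descending order.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Steps 5-6: keep the top-k eigenvectors as the projection matrix W.
    W = eigvecs[:, :k]

    # Step 7: project the data into the k-dimensional subspace.
    explained = eigvals[:k] / eigvals.sum()
    return Z @ W, explained


rng = np.random.default_rng(0)
# Synthetic stand-in: 8 correlated features generated from 3 latent ones.
base = rng.normal(size=(200, 3))
X = np.hstack([base, base @ rng.normal(size=(3, 5))])
X_reduced, explained = pca_reduce(X, k=3)
print(X_reduced.shape, explained.sum())
```

Because the eight synthetic features are linear combinations of three latent ones, three components retain essentially all of the variance. On real traffic data the retained share depends on `k`.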
Algorithm 2 outlines the workflow for implementing a Decision Tree (DT) classification model on a
dataset processed through Principal Component Analysis (PCA) and generating a Confusion Matrix. The
steps involve dataset splitting, DT application on the PCA-processed data, model training and evaluation,
prediction calculation, and the assessment of model accuracy and performance. The algorithm culminates
in the computation of the Confusion Matrix, which enables an in-depth analysis of True Positive (TP),
True Negative (TN), False Positive (FP), and False Negative (FN) values. This comprehensive approach
yields essential performance metrics such as Accuracy, Precision, Recall, and F1 Score, offering a
detailed insight into the DT classifier's performance post-PCA reduction.

Algorithm 2: Decision Tree (DT) Classification Model with PCA-Processed Input and Confusion Matrix

Input: PCA-reduced dataset D_PCA_IoTID20
Output: Trained DT model, Confusion Matrix


1 Divide the dataset into training and testing sets.
2 Utilize Decision Trees (DT) on the dataset processed through Principal Component Analysis (PCA).
3 Train the Decision Trees (DT) model using the training dataset.
4 Assess the model's performance on the testing dataset.
5 Calculate predictions using the DT model on unseen data.
6 Evaluate the accuracy and overall performance of the model.
7 Calculate the Confusion Matrix using predictions and actual labels.
8 Analyze TP, TN, FP, and FN.
9 Derive performance metrics: Accuracy, Precision, Recall, F1 Score.
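The workflow of Algorithm 2 can be sketched with scikit-learn as follows. This is a hedged illustration rather than the paper's implementation, which was carried out in MATLAB: the data is synthetic (generated with `make_classification` as a stand-in for IoTID20), and the sample count, the fixed `random_state`, and the default tree settings are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for IoTID20 (assumption: 83 features).
X, y = make_classification(n_samples=2000, n_features=83, n_informative=20,
                           random_state=0)

# Step 1: 70/30 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Steps 2-3: standardize, reduce to 12 principal components, train the tree.
model = make_pipeline(StandardScaler(), PCA(n_components=12),
                      DecisionTreeClassifier(random_state=0))
model.fit(X_train, y_train)

# Steps 4-5: predict on the held-out test set.
y_pred = model.predict(X_test)

# Steps 7-8: confusion matrix; for binary labels ravel() yields TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

# Steps 6 and 9: derive the performance metrics from the four counts.
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = (2 * precision * recall / (precision + recall)
      if (precision + recall) else 0.0)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

One design choice differs from the algorithm as written: the pipeline fits the scaler and PCA on the training split only, so that no information from the test set leaks into the projection. Algorithm 2 instead takes an already PCA-reduced dataset as input.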

IV. IMPLEMENTATION AND RESULTS EVALUATION


This section will cover the experimental setup, the evaluation metrics employed to assess the model's
performance, the experimental measurements, and ultimately the outcomes of evaluating the proposed
model.
A. Experimental Setup
The proposed model's performance assessment was carried out on an ASUS laptop running Windows
10 Home Single Language. The laptop is equipped with an Intel(R) Core(TM) i5-8250U CPU operating
at 1.60 GHz (with a turbo speed of 1.80 GHz) and 12.0 GB of RAM, of which 11.9 GB is usable.
For simulation and analysis tasks, MATLAB 2023b was utilized to perform various experiments. This
environment provided a robust platform for the creation and evaluation of feature selection and
classification algorithms.
B. Evaluation Metrics
In our research, we assess the effectiveness of our work based on various parameters: accuracy (AC), F-
measure, precision (P), error rate, recall, ROC curve value (AUC), and specificity [19]. Additionally, we
consider testing time and training time. We have derived a confusion matrix for our model, as illustrated
in Figure 4.

Fig. 4. Confusion Matrix of PCA-based DT classifier.

Accuracy (AC):
$$AC = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision (P):
$$P = \frac{TP}{TP + FP}$$
Recall:
$$Recall = \frac{TP}{TP + FN}$$
F-Measure:
$$F\text{-}measure = 2 \times \frac{P \times Recall}{P + Recall}$$
Error Rate:
$$Error\ Rate = 1 - AC$$
Specificity:
$$Specificity = \frac{TN}{TN + FP}$$
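As a small worked illustration of the formulas above, the following Python sketch derives all six metrics from the four confusion-matrix counts. The helper names and the counts are hypothetical; they are not the values behind Fig. 4.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics from confusion-matrix counts."""
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": _ratio(2 * precision * recall, precision + recall),
        "error_rate": 1 - accuracy,
        "specificity": _ratio(tn, tn + fp),
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, tn=880, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Returning 0.0 for an empty denominator is one possible convention. It avoids a division error when, for example, the classifier predicts no positives at all.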
Table 2: Performance Metrics for DT Classifier on Dataset Evaluation

| Metric | Value |
|---|---|
| Training Time | 10.6876 s |
| Testing Time | 0.1340 s |
| Accuracy-DT-Classifier | 99.8114% |
| Accuracy | 99.80% |
| Error Rate | 0.20% |
| Recall | 98.20% |
| Specificity | 99.90% |
| Precision | 98.80% |
| F-measure | 98.50% |

Next, we compute the area under the ROC curve (AUC), obtaining a value of 0.9906, as depicted in Figure 5.

Fig. 5. ROC of PCA-based DT Classifier for Binary classification.
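For readers unfamiliar with the AUC, the following Python sketch computes it through its rank interpretation: the probability that a randomly chosen attack sample receives a higher score than a randomly chosen normal sample, with ties counted as one half. The function name, labels, and scores are hypothetical, and this pairwise formulation is one standard way to obtain the value, not necessarily the routine used in the paper.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = attack, 0 = normal) and classifier scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of samples. For a dataset as large as IoTID20, a sort-based computation would be preferable.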


C. Experimental Results
The study employed a train-test split validation methodology to assess performance on the IoTID20
dataset, which comprises 625,783 records and 83 features. With a 70-30 training-testing ratio, our
approach used PCA to condense the feature set to 12 components and trained the DT model on these
reduced features, focusing on the binary classification of attacks.
Figure 6 shows the importance of each column after PCA is applied, emphasizing the reduction from
83 to 12 features. The first two columns account for 74.42% of the total importance, the first
four columns for 88.74%, and the first 12 columns together for 99.12%.
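If the importance values in Fig. 6 are read as explained-variance ratios of the principal components (an assumption on our part), the number of components to retain can be chosen from the cumulative sum, as in the following Python sketch. The function name and the eigenvalue spectrum are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def components_for_variance(eigenvalues, threshold):
    """Smallest k whose top-k eigenvalues explain at least `threshold` of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for k, running in enumerate(accumulate(ordered), start=1):
        if running / total >= threshold:
            return k
    return len(ordered)  # fall back to keeping every component


# Hypothetical eigenvalue spectrum (illustrative values only).
eigenvalues = [5.0, 2.5, 1.0, 0.8, 0.4, 0.2, 0.1]
k = components_for_variance(eigenvalues, 0.95)
print(k)
```

Applied to the figures reported above, a 99% threshold would be reached at 12 components, whereas four components would cover only 88.74%.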

Fig. 6. Visualizing Feature Importance


Figure 7 presents a direct comparison of binary classification accuracy between the proposed
model and previous methodologies. The proposed model attains the highest accuracy: combining PCA
for feature reduction with the DT model improves on the binary accuracies of the prior algorithms
listed in Table 1. This comparison supports the effectiveness of our approach in distinguishing
attack traffic from normal traffic.
Together, Figures 6 and 7 underscore the impact of PCA-driven feature reduction on the model's
performance, supporting its effectiveness in classifying attacks in the IoTID20 dataset.
Fig. 7. Comparison of Binary Classification Accuracy: Proposed Model vs. Previous Algorithms

Conclusion:
An IoT network attack detection system was developed using a PCA-DT model. In simulations
conducted in MATLAB, it achieved 99.8% accuracy in binary classification. Future work includes
extending the system to multi-class classification and evaluating multiple algorithms. The high
binary accuracy provides a strong foundation for expanding the model's capabilities to cover
diverse attack classifications within IoT networks.

References:

[1] Houkan, A., Sahoo, A. K., & Nayak, S. (2023). Industry 5.0 and sustainable development in
the developing world. 2023 IEEE 3rd International Conference on Sustainable Energy and
Future Electric Transportation (SEFET). doi:10.1109/sefet57834.2023.10245935

[2] M. A. Ferrag, L. Maglaras, A. Ahmim, M. Derdour and H. Janicke, "RDTIDS: Rules and
decision tree-based intrusion detection system for internet-of-things networks," Future
Internet, vol. 12, no. 3, p. 44, 2020.

[3] Khraisat, A. and Alazab, A. (2021) 'A critical review of intrusion detection systems in the
internet of things: Techniques, deployment strategy, validation strategy, attacks, public
datasets and challenges', Cybersecurity, 4(1). doi:10.1186/s42400-021-00077-7.

[4] Sarwar, Asima, et al. "Design of an Advanced Intrusion Detection System for IoT Networks."
2022 2nd International Conference on Artificial Intelligence (ICAI). IEEE, 2022.

[5] Kasongo, Sydney M., and Yanxia Sun. "Performance analysis of intrusion detection systems
using a feature selection method on the UNSW-NB15 dataset." Journal of Big Data 7
(2020): 1-20.

[6] Akhther, Parveen, A. Maryposonia, and V. S. Prasanth. "Least Square Support Vector
Machine based Intrusion Detection System in IoT." 2023 7th International Conference on
Intelligent Computing and Control Systems (ICICCS). IEEE, 2023.

[7] Omar, Mawloud, and Laurent George. "Toward a lightweight machine learning based solution
against cyber-intrusions for IoT." 2021 IEEE 46th Conference on Local Computer
Networks (LCN). IEEE, 2021.

[8] Islam, M. J., Ahmad, S., Haque, F., Reaz, M. B. I., Bhuiyan, M. A. S., & Islam, M. R. (2022).
Application of Min-Max Normalization on Subject-Invariant EMG Pattern
Recognition. IEEE Transactions on Instrumentation and Measurement. DOI:
10.1109/TIM.2022.322028

[9] R. Qaddoura, A. M. Al-Zoubi, I. Almomani and H. Faris, "Predicting different types of
imbalanced intrusion activities based on a multistage deep learning approach," International
Conference on Information Technology (ICIT), pp. 858-863, 2021.

[10] J. Long, S. Zhang and C. Li, "Evolving deep echo state networks for intelligent fault
diagnosis," IEEE Transactions on Industrial Informatics, vol. 16, no. 7, pp. 4928-4937, 2020.
