Expert Systems With Applications: Bhawana Sharma, Lokesh Sharma, Chhagan Lal, Satyabrata Roy
Keywords: Intrusion detection system; DL; Deep neural network; Convolution neural network; XAI; Local interpretable model-agnostic explanations; Shapley additive explanations

Abstract: The Internet of Things (IoT) is currently seeing tremendous growth due to new technologies and big data. Research in the field of IoT security is an emerging topic. IoT networks are becoming more vulnerable to new assaults as a result of the growth in devices and the production of massive data. In order to recognize the attacks, an intrusion detection system is required. In this work, we suggested a Deep Learning (DL) model for intrusion detection to categorize various attacks in the dataset. We used a filter-based approach to pick out the most important aspects and limit the number of features, and we built two different deep-learning models for intrusion detection. For model training and testing, we used two publicly accessible datasets, NSL-KDD and UNSW-NB 15. First, we applied the dataset on the Deep neural network (DNN) model and then the same dataset on Convolution Neural Network (CNN) model. For both datasets, the DL model had a better accuracy rate. Because DL models are opaque and challenging to comprehend, we applied the idea of explainable Artificial Intelligence (AI) to provide a model explanation. To increase confidence in the DNN model, we applied the explainable AI (XAI) Local Interpretable Model-agnostic Explanations (LIME) method, and for better understanding, we also applied Shapley Additive Explanations (SHAP).
∗ Corresponding author.
E-mail addresses: [email protected] (B. Sharma), [email protected] (L. Sharma), [email protected] (C. Lal),
[email protected] (S. Roy).
https://doi.org/10.1016/j.eswa.2023.121751
Received 15 June 2023; Received in revised form 22 August 2023; Accepted 19 September 2023
Available online 25 September 2023
0957-4174/© 2023 Elsevier Ltd. All rights reserved.
B. Sharma et al. Expert Systems With Applications 238 (2024) 121751
• Perception layer: In this layer, sensors and devices, termed things in the IoT network, collect useful data, which is transmitted to the network layer after processing. There are three major security issues: disturbance of signals, tampering of hardware, and constrained IoT devices and sensors. Signals are transmitted via wireless technologies, so there is a risk of signal disturbance, and thus the efficiency of signals is compromised. Furthermore, a physical attack can weaken hardware components because IoT devices and sensors function in an external environment. The third problem is that the limited power, storage, and processing capacity of IoT devices and sensors make them susceptible to attacks. Sleep deprivation, node capturing, data injection, eavesdropping, and interference by sending noise signals can affect confidentiality. We can mitigate these issues by encrypting data from one end to another.
• Network layer: This layer transfers data using modern technologies, including Bluetooth, Zigbee, Wi-Fi, and "Long-Term Evolution (LTE)", as well as cloud computing platforms, network gateways, routers, and switches. Eavesdropping, traffic analysis, and monitoring are three major security challenges in the network layer. Massive traffic overload can render the target inaccessible to authorized users. DoS attacks and sinkhole attacks can cause a device to shut down and stop working. Data secrecy can be hampered by Man-in-the-Middle (MitM) attacks. We can prevent eavesdropping and heavy traffic bombardment by using a network object with the appropriate protocols and software to monitor the network.
• Application layer: This layer provides application-specific services to the system. It offers a range of applications where IoT systems are installed, such as smart parking, smart healthcare, smart homes, smart cities, etc. It monitors different applications and other layers of the IoT system.
The main security issue in this layer is the authentication of different mechanisms used by various applications. IoT involves a lot of connected devices or things in the network, so there is a need to monitor and manage shared data. There are risks of phishing, malicious scripts, and SYN flooding. Different protocols, such as "Message Queuing Telemetry Transport (MQTT)" and "Constrained Application Protocol (CoAP)", are used to mitigate these issues (Al Nafea & Almaiah, 2021; Altulaihan, Almaiah, & Aljughaiman, 2022).

Nowadays, ML and DL techniques are widely used for anomaly-based detection, where models learn to determine the normal behavior in the training phase (Chaabouni, Mosbah, Zemmari, Sauvignac, & Faruki, 2019). Before applying ML and DL techniques, we intelligently select features to attain the maximum accuracy with the fewest features possible. By lowering the number of features, we can speed up training, which lowers the cost of computation and storage.

In this paper, we employed DL models to detect intrusions based on anomaly detection, which depends on the behavior of the system. The DNN model classifies the normal/attack categories in multi-class classification. Since the ML and DL models are black boxes, we explain and interpret the models using the concept of explainable AI. Explainable AI aims to explain and interpret the models and find what makes the model arrive at such predictions. Most ML and DL models are considered black boxes, and it is not easy to understand them. Researchers are working in this direction to develop methods for explaining the black box ML and DL models. For greater comprehension and model explainability, the XAI approaches LIME and SHAP are frequently utilized today. The major contributions of this research document are:

3. Analyze the model, contrast it with other models, and fine-tune it using various hyper-parameters.
4. Explain the model using the concept of Explainable AI and identify important features and the effects of a feature on the prediction/detection results using LIME and SHAP.

The subsequent part of the paper is organized as follows: Section 2 describes the state-of-the-art review of ML/DL techniques used for detecting the intrusion and the concept of explainable AI. Section 3 details the research design and methodology, based on deep learning techniques, to classify normal/attack classes in IoT networks. Section 4 shows the evaluation and analysis of the model's accuracy and evaluation metrics. Section 5 includes the model explanation. Section 6 offers insights into the future work and limitations of the model, and Section 7 summarizes the work done in the paper and includes the future work.

2. Related works

In this section, we present a systematic literature review of intrusion detection systems based on different ML/DL techniques. With the vast expansion of IoT networks, security and privacy are the prime areas which need to be considered. Researchers analyze the network and identify different attacks in order to take measures to protect the network. Deep learning and Machine Learning are widely used in the field of intrusion detection systems (Karatas, Demir, & Sahingoz, 2020; Khan & Herrmann, 2019; Ma, 2020).

2.1. IDS studies based on ML and DL techniques

With the recent advancement in ML/DL techniques, models constructed using these techniques are used by researchers for intrusion detection systems. Researchers have applied different ML models for intrusion detection, such as "K-Nearest Neighbor (KNN)" (Xu et al., 2018) and "Support Vector Machine (SVM)" (Teng, Wu, Zhu, Teng, & Zhang, 2017), and evaluated the models using the KDD99, NSL-KDD, and DARPA datasets.

On the openly accessible NSL-KDD and UNSW-NB15 datasets, Fenanir, Semchedine, and Baadache (2019) implemented various ML-based models, utilized a filter approach to pick features, and used a Decision Tree (DT) to get the maximum accuracy. A lightweight IDS model was suggested by the author. The characteristics were selected using the filtering process, and the data was then categorized using ML techniques. According to the authors' comparison of several machine learning techniques, the DT generates the most effective classification model on a variety of datasets. The characteristics were selected using many datasets, various threshold values, and correlation filter methods such as the "Pearson Correlation Coefficient (PCC)", "Kendall Correlation Coefficient (KCC)" and "Spearman correlation coefficient (SCC)". The author used a number of ML methods, including DT, SVM, and "Logistic Regression (LR)", for classification on multiple datasets, including UNSW-NB15, KDD99, and NSL-KDD.

Sun et al. (2020) created an LSTM-CNN model for classification using the hybrid method concept. To deal with the dataset's uneven distribution of the target class, the author employed the weight optimization strategy. The model was tested using the CICIDS2017 dataset and was 98.67% accurate. Hassan, Gumaei, Alsanad, Alrubaian, and Fortino (2020) also suggested a hybrid deep neural network model that integrates CNN and LSTM. Model performance was evaluated based on accuracy parameters using the openly accessible UNSW-NB15 dataset, and an overall accuracy of 97% was attained.
A CNN-based IDS model was proposed by Xiao, Xing, Zhang, and Zhao (2019); evaluated on the KDD Cup 99 dataset, it had a 94% accuracy rate. Denial of Service (DoS) attacks were the subject of Kim, Kim, Kim, Shim, and Choi (2020)'s CNN and RNN model, which had an accuracy of almost 99 percent when evaluated on the KDD Cup dataset and 91 percent on the CICIDS2018 dataset.

On the publicly accessible dataset UNSW-NB15, which was used to evaluate Kasongo and Sun (2020)'s DNN model, the model's accuracy for multi-class classification was 77.16%. On the publicly accessible dataset NSL-KDD cup, 97% accuracy was attained using the DNN model that Liang et al. (2019) presented. The DNN model was created by Thamilarasu and Chawla (2019), who then tested it on their own dataset and attained great precision. Ge, Syed, Fu, Baig, and Robles-Kelly (2021) developed the FNN model, evaluated it using the BoT-IoT dataset, and achieved a multi-class classification accuracy of above 99%. The DNN model was created by Vinayakumar et al. (2019), and it achieved a 78 percent accuracy rate on the publicly accessible dataset NSL-KDD.

Nagisetty and Gupta (2019) suggested many DL models, including CNN, DNN, MLP, and autoencoder. The models were tested using the open-source datasets UNSW-NB15 and NSL-KDD, and the DNN model outperformed the others in terms of accuracy. Qiu et al. (2020) developed a DL-based model in which DoS assaults might be generated by a little modification in the characteristics. On the DMD-2018 dataset, the deep learning CNN and RNN models proposed by Vinayakumar et al. (2020) yielded 99% accuracy. A DNN model with varying numbers of neurons and hidden layers that are customized with different learning rates was proposed by the author. Model performance was evaluated on several publicly accessible datasets, including binary-class datasets with attack and normal classes and multi-class datasets with various attacks and normal classes. On the KDDCup99 dataset and the NSL-KDD dataset, the authors' model had an accuracy of 93% and 78%, respectively, after they reduced the number of features and tested it. With the feature reduction that Zhou, Han, Liu, He, and Wang (2018) recommended, their DL model had a 93% accuracy rate. Meidan et al. (2018) suggested a deep autoencoder and used the Mirai dataset to train the model. The model was then adjusted using various hyperparameters.

In 31, Kasongo et al. developed the FNN model and used the filter approach to choose the features. Following that, the model was adjusted using a variety of hyperparameters and parameters, including the learning rate and the number of neurons in hidden layers. The model was evaluated and contrasted with other ML approaches using the NSL-KDD dataset. The author's model, which included three hidden layers with 30 neurons each and was tested on a binary classification dataset, had an accuracy of 88 percent. Similar results were obtained for multi-class classification utilizing 3 hidden layers and 150 neurons, which yielded an accuracy of 86.19%.

Almaiah et al. used the Frequency Particle Swarm Optimization (FPSO) approach in Almaiah and Almomani (2020) to identify the characteristics of the Shamoon attack. The Shamoon addresses industrial data, while the Fog nodes supply medical, educational, and industrial data. The author studied the Shamoon attack's movement and discovered that it follows the shortest path, so the source of the assault can be determined by locating the initial node.

In Siam et al. (2021), Siam et al. proposed an IoT-based smart health monitoring system that uses sensors to evaluate temperature, blood oxygen levels, and heart rate. The Advanced Encryption Standard (AES) technique is then used to encrypt the data before it is delivered to the organization for decryption. A 95% confidence interval was achieved using the suggested procedure.

Almaiah et al. applied the blockchain concept for IIoT and proposed the deep learning model in Almaiah, Hajjej, Ali, Pasha, and Almomani (2022). The proposed model outperformed the current consensus protocol employed in the benchmark models, according to the results of the improvement in the blockchain's existing consensus protocol.

In Ali et al. (2022), Ali et al. put out a blockchain-based model for the health care system and for the protection of data that uses a homomorphic encryption method. Using the Hyperledger Caliper, a hybrid DNN model with binary spring search (BSS) was put into practice for both intrusion detection and blockchain. The proposed approach obtained shorter confirmation time and lower computational cost for security compared to benchmark models.

Al Hwaitat et al. (2020) proposed the Particle Swarm Optimization (PSO) algorithm and compared the model to the existing optimization approaches. The program was improved to detect jamming attacks, which are the most prevalent kind of DoS attack. The outcome has shown that the suggested strategy produced superior outcomes in terms of the coverage area and the least fitness value. In Fatani et al. (2023), Fatani et al. proposed a deep learning model with an optimization technique. The author employed the CNN model for feature extraction, the growth optimizer modified version (MGO) for feature selection, and the whale optimization algorithm for the search process. An experiment using several datasets revealed that the MGO performed better than other strategies. The KDD dataset experiment shows that the training accuracy is 99.9% while the testing accuracy is 92.04%. Similarly, the training accuracy on the NSL KDD dataset is 99.214%. In contrast, the testing accuracy is 76.72%, demonstrating that the model is overfit: it performs well on the training dataset, but accuracy decreases on the testing dataset.

Similarly, in Abd Elaziz, Al-qaness, Dahou, Ibrahim, and Abd El-Latif (2023), the author builds the CNN-CapSA model for intrusion detection using a combination of a deep learning model and the swarm intelligence method. The author used a deep learning model to find optimal features, and then an optimizer based on a swarm intelligence method called the Capuchin Search Algorithm (CapSA) was applied for efficient feature selection. The experiment was conducted on four different datasets.

2.2. Explainable AI and IDS studies based on model explanation

Explainable AI is being researched because ML and DL models are opaque and challenging to grasp, making it difficult to interpret model predictions. Explainable AI explains the model's predictions, fostering model transparency and confidence. The idea of explainable AI is a new one, and it entails employing model explanation techniques to explain the models created and the contributions of each feature to the prediction (Ribeiro, Singh, & Guestrin, 2016; Samek, Wiegand, & Müller, 2017).

In Zhou, Hooker, and Wang (2021), Zhou et al. recommended stable LIME to be used for explaining models and deployed a random forest classifier to the data set containing breast cancer data, and it achieved 95% accuracy.

2.3. Local interpretable model-agnostic explanations (LIME)

LIME gives the user the model interpretation, which clarifies the forecast on a given instance. Having confidence in the model comes from understanding it, and LIME explains the predictions the model made. LIME checks the model using an instance's explanation as a basis. Eq. (1) provides the formula for LIME, which minimizes loss L and determines how closely the explanation resembles the original model. \epsilon(x) is the explanation for instance x given by the model g.

$$\epsilon(x) = \underset{g \in G}{\arg\min}\; L(f, g, \pi_x) + \Omega(g) \tag{1}$$

where g represents the interpretable model, g \in G; G represents the family of interpretable models; \pi_x is the proximity measure of the neighborhood we used to explain the instance; \Omega(g) represents the complexity of the model, e.g., the number of features; and f represents the probability of x belonging to a specific class.
2.4. Shapley additive explanations (SHAP)

SHAP explains the model using Shapley values based on the feature importance. For a given data point, the contribution of feature j is calculated using the Shapley value formula in Eq. (2).

$$\phi_j = \sum_{S \subseteq P \setminus \{j\}} \frac{|S|!\,(|P| - |S| - 1)!}{|P|!}\,\bigl(v(S \cup \{j\}) - v(S)\bigr) \tag{2}$$

where P is the set of all features in the dataset, S is a subset of the features that excludes feature j, and v(S) is the contribution of the subset S.

In summary, the review of the literature reveals that the researchers have proposed a variety of machine learning and deep learning techniques for intrusion detection and evaluated the model based on accuracy, precision, recall, and F1 score metrics using various benchmark datasets, but they have not provided any metrics for the model's trustworthiness and ability to explain the model. The feature selection also minimizes the number of features, which lowers the model's complexity. It also reduces the amount of time needed to test and train the model, which improves the model's performance. The model's accuracy and performance can be improved by integrating filter-based selection with deep learning methods. Since trustworthiness is not discussed in the literature, we set our study's goal as making the IDS more trustworthy. The model must be trusted to be used in the real world. By utilizing the techniques of explainability, this study seeks to expand the field of XAI.

The literature review revealed that the described approaches' accuracy is high but that the DL and ML models are complex and that comprehension of the prediction is necessary in order to place trust in the model. Each feature contribution should be explained in the prediction. We can use the concept of explainable AI to explain each feature's prediction and contribution, which builds trust in the model. Secondly, the training time and complexity of the model should be reduced by reducing the features. Feature selection techniques could be used, which reduce the number of input variables to the model by eliminating redundant and irrelevant features and thus reducing both model training time and complexity.

3. Proposed framework

This section presents the workflow of the framework for detecting attacks in the network, as shown in Fig. 1. The main steps are Dataset Description, Data Preprocessing to encode and normalize the data, Feature Selection to identify important features and thus reduce the input variables, Feature Preprocessing to transform the dataset, training and testing the proposed deep learning method, and finally explanation of the model. We explain them in detail as follows:

3.1. Dataset description

Using a network analyzer tool, raw traffic is gathered, and the features are then extracted. The researchers used publicly accessible datasets to test the DL models for intrusion detection systems. One of them, the NSL-KDD dataset, is made up of 42 features in all, 38 of which are numerical values, 3 of which are nominal values, and one of which is a label indicating the normal/attack type category. The other, the UNSW-NB15 dataset, has 44 characteristics in total, including 4 categorical values, 39 numerical values, and one label indicating the category of normal/attack.

3.2. Data preprocessing

In this stage, the dataset is transformed and normalized after redundant data has been eliminated from it. During data transformation, various encoding techniques are used to translate the nominal values of the characteristics in the dataset into numerical values. Label encoding and one hot encoding are the two most popular encoding techniques. Data must be normalized such that values fall within the range of 0 to 1, enhancing the model's accuracy and performance. Min–Max normalization is used to normalize data (Sharma, Sharma, & Lal, 2022b).

3.3. Feature selection

Features are the input variables to the ML/DL models. The model is trained after selecting the important features, and the method is called the feature selection technique, which subsequently decreases the feature columns in the dataset, thus reducing storage and computing costs. Different feature selection approaches are:
Table 1
NSL-KDD CUP dataset attack statistics showing number of records in every normal/attack category.
Attack_class | Attack_SubType | No. of records
"Denial of Service (DoS)" | "apache2", "land", "pod", "smurf", "udpstorm", "worm", "back", "mailbomb", "teardrop", "neptune", "processtable" | 45 927
"Probe" | "ipsweep", "mscan", "portsweep", "satan", "nmap", "saint" | 11 656
Remote to Local (R2L) | "ftp_write", "httptunnel", "imap", "named", "phf", "sendmail", "snmpgetattack", "snmpguess", "warezclient", "warezmaster", "xlock", "guess_passwd", "multihop", "spy", "xsnoop" | 995
User to Root (U2R) | "buffer_overflow", "perl", "ps", "sqlattack", "xterm", "loadmodule", "rootkit" | 52
Normal | | 67 342
Sum of records | | 125 972
Table 2
The features of NSL-KDD cup dataset.
S.No Feature name S.No Feature name S.No Feature name
1 protocol_type 15 num_shells 29 srv_rerror_rate
2 src_bytes 16 num_access_files 30 root_shell
3 rerror_rate 17 serror_rate 31 dst_host_diff_srv_rate
4 is_guest_login 18 dst_host_serror_rate 32 num_root
5 srv_serror_rate 19 duration 33 dst_host_same_src_port_rate
6 diff_srv_rate 20 count 34 is_host_login
7 service 21 srv_count 35 dst_host_srv_serror_rate
8 num_failed_logins 22 wrong_fragment 36 num_file_creations
9 dst_host_count 23 dst_bytes 37 dst_host_srv_rerror_rate
10 num_compromised 24 land 38 num_outbound_cmds
11 dst_host_same_srv_rate 25 hot 39 dst_host_srv_count
12 su_attempted 26 urgent 40 same_srv_rate
13 Flag 27 srv_diff_host_rate 41 dst_host_srv_diff_host_rate
14 dst_host_rerror_rate 28 logged_in 42 class
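As an illustration of the label encoding and Min–Max normalization steps described in Section 3.2, the sketch below encodes a nominal column such as protocol_type and rescales a numeric column such as src_bytes to the range 0 to 1. The helper names `label_encode` and `min_max_normalize`, the alphabetical code assignment, and the toy values are all assumptions made for this example; they are not the authors' code or data.

```python
from typing import Dict, List


def label_encode(values: List[str]) -> List[int]:
    """Map each distinct nominal value to an integer code (alphabetical order)."""
    codes: Dict[str, int] = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def min_max_normalize(column: List[float]) -> List[float]:
    """Rescale a numeric column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical toy records (not taken from NSL-KDD).
protocol_type = ["tcp", "udp", "icmp", "tcp"]
src_bytes = [0.0, 250.0, 1000.0, 500.0]

encoded = label_encode(protocol_type)  # icmp -> 0, tcp -> 1, udp -> 2
scaled_codes = min_max_normalize([float(c) for c in encoded])
scaled_bytes = min_max_normalize(src_bytes)
```

Label encoding keeps one column per nominal feature, whereas one hot encoding would add one column per distinct value; which of the two suits a given feature is a design choice this sketch does not settle.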
Table 3
The features of UNSW-NB 15 dataset.
S.No Feature name S.No Feature name S.No Feature name
1 Dur 16 djit 31 trans_depth
2 dpkts 17 ct_dst_src_ltm 32 response_body_len
3 spkts 18 ct_ftp_cmd 33 ct_srv_src
4 dbytes 19 ct_src_ltm 34 is_ftp_login
5 sbytes 20 is_sm_ips_ports 35 ct_dst_ltm
6 rate 21 swin 36 ct_dst_sport_ltm
7 Sttl 22 attack_cat 37 ct_src_dport_ltm
8 dttl 23 Stcpb 38 ct_state_ttl
9 sload 24 dtcpb 39 ct_flw_http_mthd
10 dload 25 dwin 40 ct_srv_dst
11 sloss 26 tcprtt 41 proto
12 dloss 27 synack 42 service
13 sinpkt 28 ackdat 43 state
14 dinpkt 29 smean 44 label
15 sjit 30 dmean

column. Furthermore, we get F_new as the new value, which lies in the range of 0 to 1. After label encoding, we then normalized the datasets using min–max normalization.

4.2.1. Feature selection

After transforming the dataset to the common scale, we select the important features by the feature selection techniques. We choose the correlation-based filter method to select the features from various feature selection techniques. Feature selection reduces the computational time and increases the storage efficiency. Using the correlation-based filter method, where the features are chosen based on correlation score, we picked the features. We applied the Pearson correlation method to find the association between the features. The similarity measure between two features/variables, F1 and F2, is given by Eq. (4) given below:
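In its standard form, the Pearson coefficient between two feature columns is their covariance divided by the product of their standard deviations. A minimal sketch of a correlation-based filter built on it follows, using the 0.95 threshold this paper reports. The function names `pearson` and `drop_correlated`, the use of the absolute value of the coefficient, the keep-the-first-seen rule, and the toy columns are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import Dict, List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(columns: Dict[str, List[float]], threshold: float = 0.95) -> List[str]:
    """Keep a feature only if |r| with every already-kept feature is <= threshold."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy columns; the names are borrowed from Table 2 for readability only.
data = {
    "serror_rate": [0.0, 0.1, 0.5, 1.0],
    "srv_serror_rate": [0.0, 0.1, 0.5, 1.0],  # identical to serror_rate, so redundant
    "duration": [3.0, 1.0, 4.0, 1.0],
}
selected = drop_correlated(data)
```

Here `srv_serror_rate` is perfectly correlated with `serror_rate` and is dropped, while `duration` is only weakly correlated and is kept. Which member of a correlated pair to drop is a free choice; this sketch simply keeps whichever column appears first.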
contains 38 features. The correlation matrices of the features of the NSL-KDD and UNSW-NB15 datasets are shown in Figs. 2 and 3, respectively, and highlight the highly correlated features. Feature pairs whose correlation exceeds the threshold of 0.95 are regarded as redundant, and one feature of each such pair is dropped. Fig. 4 shows the architecture of the DNN model, representing the layers and neurons, while Fig. 5 presents the architecture of the CNN model, representing the convolution and pooling layers with their filter sizes. We presented the confusion matrix for classification in Fig. 6.

4.3. Feature preprocessing

The processed data is then made compatible with the model for training after feature selection. The datasets are applied as input to the DNN model and are compatible with it. However, the data is transformed into 1D vector form for the 1D CNN model, which is then used as input. For the 2D CNN model, the data must be transformed into a 2D matrix before it is compatible. After encoding and scaling, the processed data is converted into 2D matrix form, which is then given as input to the CNN model. The dataset is transformed into a 2D matrix depending on the number of features. A record containing X features is reshaped into an N × N matrix; if X is not a perfect square, the record is padded with '0' up to the nearest larger perfect square.

The NSL-KDDnew dataset, after feature selection, has a total of 36 features; therefore, in order to input the dataset into the 2D CNN model, each record with its 36 features is transformed into a 6 × 6 matrix (6 × 6 pixels). The UNSW-NBnew
Table 5
Model performance evaluation metrics.
Accuracy A = (TrueP+TrueN)/(TrueP+TrueN+FalseP+FalseN)
Precision P = TrueP/(TrueP+FalseP)
Recall R = TrueP/(TrueP+FalseN)
F1 Score F = 2PR/(P+R)
the training set and validation set for training the model, and then the
model trained is tested on the testing set.
Each of the three dense hidden layers in our DNN model, each with
64 neurons, contains five neurons in it. Dense hidden layers receive
ReLu activation, but the last dense layer receives softmax activation, as
seen in Fig. 4. Fig. 4 illustrates how we used the Adam optimizer. The
sparse categorical loss function is used to optimize the model during
training. The probability between the real value and predicted values
is compared, and the loss, which measures how far the predicted value
is from the actual value, is determined. Loss minimization is the goal
during the model training. To prevent the model from being overfitting,
we tweaked it using several hyperparameters like weight decay and
dropout rate.
With a kernel size of (3, 3), ReLu activation function, and Max
pooling with a pool size of (2, 2), we create a 2D-CNN model with three
convolution layers and 64, 32, and 32 neurons. According to Fig. 5, the
number of neurons in the last layer depends on the number of classes
in the dataset, and there are 5 classes in the dataset; the dense layer,
which is at the last of the model, has 5 neurons. The softmax activation
function, which converts the n real values into values between 0 and
1, is utilized at the last layer. Additionally, we chose a 3 kernel size
Fig. 4. Architecture of the DNN model. and a 2 pooling size for the 1D-CNN model. With the help of various
hyperparameters, including kernel size and dropout rate, the model was
tweaked.
dataset contains 38 features and is transformed into a 7 × 7 matrix with To prevent overfitting, we trained the model with various hyperpa-
a padding of 11 pixels. We then split/divide the dataset into 25%–75% rameters, a weight decay of 0.0001, a dropout rate of 0, and a learning
testing and training sets. rate of 0.001 for epochs of 20 (Sharma, Sharma, & Lal, 2022a). Using a
test dataset, we evaluated the model. For the NSL-KDDnew dataset, the
4.4. Training and testing the model accuracy obtained by the DNN, 1D-CNN, and 2D-CNN models is 0.993,
0.992, and 0.994, respectively. With the UNSW-NBnew dataset as our
Following encoding, scaling, and transformation, the transformed, test subject, we were able to get accuracy scores of 0.80, 0.80, and 0.81
compatible dataset is used as the input for the training model, and for the DNN, 1D-CNN, and 2D-CNN models, respectively.
the model is trained using the training dataset. We trained DNN and
CNN models by randomly putting 75% rows in the training dataset 6. Result analysis
and then further split into putting 60% rows for the training and 15%
rows for the validation of the model. 25% dataset is applied for testing The performance of deep learning models is evaluated using differ-
the model. The DNN model contains dense hidden layers, and the CNN model contains three kinds of layers: a convolution layer, then a pooling layer, and finally a fully connected layer for classification. The convolution layer applies a filter (kernel) with a given stride and padding, followed by an activation function, and its output is termed a 'feature map'. Pooling, such as max pooling or average pooling, is applied to reduce the dimensionality of each feature map. The output of the convolution and pooling layers is then passed to a fully connected layer. The model is trained on the training dataset during the training phase. The trained model is then evaluated in the testing phase on a separate testing dataset that contains normal and attack classes, to check that the training and testing accuracies are nearly comparable. The model is overfit if training accuracy is high and testing accuracy is low.

5. Experimental setup

We utilized the TensorFlow library to build the models after uploading the dataset to Google's Colaboratory. After feature selection and data preparation, the dataset is divided into three sets, comprising 60%, 15%, and 25% of the total dataset. These sets are referred to as

ent parameters. In our experiment, to evaluate the performance of the model, we used the following evaluation metrics, where C1 and C2 are the two different classes:
• True-positive (TrueP): an instance that belongs to class C1 and is correctly classified as class C1.
• False-positive (FalseP): an instance that belongs to class C2 and is incorrectly classified as class C1.
• True-negative (TrueN): an instance that belongs to class C2 and is correctly classified as class C2.
• False-negative (FalseN): an instance that belongs to class C1 and is incorrectly classified as class C2.
We presented the confusion matrix for classification in Fig. 6. We also calculated the following evaluation metrics, which are tabulated in Table 5:
• Precision (P): measures how precise the model is, as the number of actual positives out of the total number of predicted positives.
• Recall (R): measures how many of the actual positives are predicted as positive.
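The four outcome counts and the two metrics defined above can be sketched in a few lines. This is a minimal illustration rather than the authors' evaluation code: the function names and the label lists are hypothetical, and the class passed as `positive` plays the role of C1.

```python
from collections import Counter

def binary_counts(y_true, y_pred, positive):
    """Count TrueP, FalseP, TrueN and FalseN, treating `positive` as class C1."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TrueP" if actual == positive else "FalseP"] += 1
        else:
            counts["FalseN" if actual == positive else "TrueN"] += 1
    return counts

def precision(counts):
    """Actual positives among everything predicted positive."""
    denom = counts["TrueP"] + counts["FalseP"]
    return counts["TrueP"] / denom if denom else 0.0

def recall(counts):
    """Predicted positives among everything actually positive."""
    denom = counts["TrueP"] + counts["FalseN"]
    return counts["TrueP"] / denom if denom else 0.0

# Hypothetical labels for illustration only (not taken from NSL-KDD or UNSW-NB 15).
y_true = ["attack", "attack", "attack", "normal", "normal", "normal", "normal", "attack"]
y_pred = ["attack", "attack", "normal", "normal", "attack", "normal", "normal", "attack"]

counts = binary_counts(y_true, y_pred, positive="attack")
p = precision(counts)
r = recall(counts)
print(dict(counts), f"precision={p:.2f}", f"recall={r:.2f}")
```

For a multiclass problem the same counts can be obtained per class by treating each class in turn as C1 and all remaining classes as C2.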
Fig. 7. Confusion matrix of (a) DNN, (b) 1D CNN and (c) 2D CNN model for NSL-KDDnew dataset.
Fig. 8. (a) Precision, (b) Recall and (c) F1-Score of DNN, 1D-CNN and 2D-CNN model for NSL-KDDnew.
Fig. 9. Confusion matrix of DNN, 1D CNN and 2D CNN model for UNSW-NBnew.
Fig. 10. (a) Precision, (b) Recall and (c) F1-Score of DNN, 1D-CNN and 2D-CNN model for UNSW-NBnew.
Table 6
Evaluation metrics of DNN, 1D-CNN and 2D-CNN model for NSL-KDDnew.
Classification report
           DNN model                     1D CNN model                  2D CNN model
Attacks    Precision  Recall  F1-Score   Precision  Recall  F1-Score   Precision  Recall  F1-Score
DoS        1.00       0.99    1.00       1.00       0.99    0.99       1.00       1.00    1.00
Normal     0.99       1.00    0.99       0.98       0.99    0.99       0.99       1.00    0.99
Probe      0.99       0.99    0.99       0.98       0.99    0.98       0.98       0.99    0.99
R2L        0.80       0.54    0.06       0.85       0.70    0.77       0.93       0.51    0.66
U2R        0.83       0.38    0.53       0.58       0.54    0.56       0.60       0.23    0.33
Accuracy   0.99                          0.99                          0.99
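Overall accuracy can hide weak results on rare classes. As a small sketch, the snippet below takes the per-class recall values reported for the DNN model in Table 6 and computes their unweighted (macro) average, in which every class counts equally. The helper name `macro_average` is an assumption introduced for illustration and does not appear in the paper.

```python
# Per-class recall of the DNN model as reported in Table 6 (NSL-KDDnew).
dnn_recall = {"DoS": 0.99, "Normal": 1.00, "Probe": 0.99, "R2L": 0.54, "U2R": 0.38}
reported_accuracy = 0.99

def macro_average(per_class):
    """Unweighted mean over classes: each class counts equally, whatever its size."""
    return sum(per_class.values()) / len(per_class)

macro_recall = macro_average(dnn_recall)
weakest = min(dnn_recall, key=dnn_recall.get)

print(f"accuracy={reported_accuracy:.2f}, macro recall={macro_recall:.2f}, weakest class={weakest}")
```

The macro-averaged recall comes out near 0.78, well below the reported accuracy of 0.99, because the R2L and U2R classes are recalled far less often than DoS, Normal, and Probe.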
Table 7
Evaluation metrics of DNN, 1D-CNN and 2D-CNN model for UNSW-NBnew dataset.
Classification report
           DNN model                     1D CNN model                  2D CNN model
Attacks    Precision  Recall  F1-Score   Precision  Recall  F1-Score   Precision  Recall  F1-Score
DoS        0.49       0.03    0.06       0.48       0.06    0.10       0.57       0.04    0.07
Exploits   0.67       0.90    0.77       0.64       0.94    0.76       0.69       0.91    0.78
Fuzzers    0.59       0.81    0.68       0.65       0.67    0.66       0.60       0.80    0.69
Generic    1.00       0.97    0.98       0.99       0.97    0.98       1.00       0.97    0.99
Normal     0.93       0.79    0.85       0.91       0.81    0.85       0.91       0.81    0.86
Accuracy   0.80                          0.80                          0.81
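The F1-Score column combines precision and recall. Assuming the standard definition (the harmonic mean of the two, which this part of the text does not spell out), the sketch below recomputes F1 for the DNN columns of Table 7 from the rounded precision and recall values. The function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) for the DNN model, copied from Table 7.
dnn_rows = {
    "DoS": (0.49, 0.03, 0.06),
    "Exploits": (0.67, 0.90, 0.77),
    "Fuzzers": (0.59, 0.81, 0.68),
    "Generic": (1.00, 0.97, 0.98),
    "Normal": (0.93, 0.79, 0.85),
}

for name, (p, r, reported) in dnn_rows.items():
    print(f"{name}: computed F1={f1_score(p, r):.2f}, reported F1={reported:.2f}")
```

Because the harmonic mean is dominated by the smaller of the two values, the DoS class ends up with a very low F1 despite a precision of 0.49: its recall is only 0.03.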
Fig. 11. (a) Accuracy vs. Epochs for NSL-KDDnew dataset (b) Loss vs. Epochs for NSL-KDDnew dataset (c) Accuracy vs. Epochs for UNSW-NBnew dataset (d) Loss vs. Epochs of
UNSW-NBnew dataset.
chart, it shows the features of interest, and on the right, it shows the values of the features. Orange indicates a positive impact of the feature, and blue indicates a negative impact.
Fig. 13(a) shows an instance whose actual value is Normal and whose predicted value is Normal. On the left, we see that the model predicted Normal with a prediction probability of 100%, and the middle shows the top ten features. The right side of the bar chart shows the features that help to predict the instance as Normal, and the left side shows the features that help to predict the instance as Not Normal. For predicting the instance as Normal, the features wrong_fragment, hot, serror_rate, rerror_rate, and su_attempted have values ≤ 0.00, and the weights assigned are 0.40, 0.39, 0.30, 0.23, and 0.16, respectively. For predicting the instance as Not Normal, the features protocol_type and num_shells have values ≤ 0.50 and 0.00, respectively, and the weights assigned are 0.14 and 0.09. The values of the features are shown on the right side for the selected instance. Logged_in has a value of 1.00, and the weight assigned in the bar chart is 0.26; the value of protocol_type is 0.5, and the weight assigned is 0.14. The total value is 0.26 for Normal and 0.07 for Not Normal, and since the value for Normal is greater than that for Not Normal, the model predicted the instance as Normal. Similarly, the actual value is DoS, and the predicted value is DoS with a prediction probability of 100%, as shown in Fig. 13(b).
For UNSW-NB 15, the actual value is Normal, and the predicted value is Normal with a prediction probability of 100%, as shown in Fig. 14(a). Another instance selected from the testing set shows that the actual value is Exploits and the predicted value is Exploits with a prediction probability of 51%, as shown in Fig. 14(b).
• SHAP Model Explanation: SHAP is widely used for explaining models and understanding how the features are related to the predictions. SHAP provides both local and global explanations. In a local explanation, we select a particular instance and explain
the model prediction by showing each feature's contribution to the prediction for the selected instance. In the global explanation, we explain the model's predictions using the contribution of each feature to the predictions.
SHAP calculates Shapley values, which show the impact of the features on the model predictions. We selected a particular instance and calculated its SHAP values. Fig. 15 shows the local plot of the DoS instance, showing each feature's contribution to the prediction. The plot shows the base value; features with a positive impact on the prediction are in red, and features with a negative impact are in blue. The base value in the plot is the average of all prediction values. Each strip in the plot shows how far a feature pushes the predicted value closer to or farther from the base value.
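The way per-feature contributions push a prediction away from the base value can be illustrated with an exact Shapley computation on a tiny example. This is a sketch under stated assumptions, not the SHAP library and not the paper's DNN: `toy_model` is a hypothetical scoring function, and the base value is taken as the model output at a single reference input rather than the average over all predictions.

```python
from itertools import permutations

def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings.

    Features not yet revealed keep their baseline value (a simplifying assumption).
    """
    n = len(instance)
    contrib = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = instance[i]          # reveal feature i
            value = model(current)
            contrib[i] += value - previous    # marginal contribution of feature i
            previous = value
    return [c / len(orderings) for c in contrib]

# Hypothetical scoring function standing in for a trained model.
def toy_model(x):
    return 0.2 + 0.5 * x[0] - 0.3 * x[1] + 0.4 * x[0] * x[2]

baseline = [0.0, 0.0, 0.0]   # reference input; its output plays the role of the base value
instance = [1.0, 1.0, 1.0]   # the instance being explained

phi = shapley_values(toy_model, instance, baseline)
base_value = toy_model(baseline)
prediction = toy_model(instance)

for i, value in enumerate(phi):
    direction = "raises" if value > 0 else "lowers"
    print(f"feature {i}: {value:+.2f} ({direction} the prediction)")
print(f"base value {base_value:.2f} + contributions {sum(phi):.2f} = prediction {prediction:.2f}")
```

The contributions sum exactly to the difference between the prediction and the base value, which is the property a local SHAP plot visualizes: positive contributions correspond to the red strips and negative ones to the blue strips. Enumerating all orderings is only feasible for a handful of features, so practical tools rely on approximations.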
Fig. 15. The local plot of the DoS instance of NSL-KDDnew, showing each feature's contribution to the prediction.
Fig. 16. The local plot of the normal instance of UNSW-NBnew, showing each feature's contribution to the prediction.
7. Conclusions

In this manuscript, we studied the different layers of IoT and the main security issues in each layer, proposed a deep learning model for intrusion detection, and explained the model using explainable AI to build trust in it. In multiclass classification, our deep learning models achieved high overall accuracy on both datasets (0.99 on NSL-KDD and 0.80 to 0.81 on UNSW-NB 15; Tables 6 and 7). We also explained the DNN model using LIME and SHAP. The datasets applied to the model are not balanced, so the accuracy for classes holding a majority of the records is high compared with that for the minority classes; our future work is to resolve this issue and improve the accuracy for minority classes. Our proposed method achieved high accuracy with reduced training time. However, several issues still need to be improved.

The proposed model can be applied to other datasets, and its predictions can be explained using the LIME and SHAP methods. We reduced the number of features through feature selection, which reduced the number of inputs and decreased the computational cost. Our future work is to apply other feature reduction techniques and find the optimal number of features that gives the model high accuracy. The limitation of our model is the class imbalance in the dataset. We need to resolve it by generating synthetic data using GANs, applying other feature reduction techniques, and finding the best possible set of features. We applied LIME and SHAP only to explain the model; after visualizing the outcomes of SHAP and LIME, we can make changes to the DNN model and find the best model for the dataset.

CRediT authorship contribution statement

Bhawana Sharma: Conceptualization, Methodology, Software, Writing – original draft. Lokesh Sharma: Visualization, Investigation, Supervision. Chhagan Lal: Supervision, Software. Satyabrata Roy: Writing – review & editing, Supervision, Investigation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.