Support Based Graph Framework For Effective Intrusion Detection
Support Based Graph Framework For Effective Intrusion Detection
Research Article
Keywords: Normalization, Features selection, Optimization, Support value measure, Classi cation
DOI: https://doi.org/10.21203/rs.3.rs-1035364/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Read Full License
SUPPORT BASED GRAPH FRAMEWORK FOR EFFECTIVE INTRUSION
DETECTION AND CLASSIFICATION
*1
Rahul B Adhao, 2 Vinod K Pachghare
*1
Researcher at Department of Computer Engineering, College of Engineering Pune (COEP) India.
2
Associate Professor at Department of Computer Engineering, College of Engineering Pune (COEP) India.
*Email Id: [email protected]
ABSTRACT
Intrusion Detection System is one of the worthwhile areas for researchers for a long.
Numbers of researchers have worked for increasing the efficiency of Intrusion Detection
Systems. But still, many challenges are present in modern Intrusion Detection Systems. One
of the major challenges is controlling the false positive rate. In this paper, we have presented
an efficient soft computing framework for the classification of intrusion detection dataset to
diminish a false positive rate. The proposed processing steps are described as; the input data
is at first pre-processed by the normalization process. Afterward, optimal features are chosen
for the dimensionality decrease utilizing krill herd optimization. Here, the effective feature
assortment is utilized to enhance classification accuracy. Support value is then estimated
from ideally chosen features and lastly, a support value-based graph is created for the
powerful classification of data into intrusion or normal. The exploratory outcomes
demonstrate that the presented technique outperforms the existing techniques regarding
different performance examinations like execution time, accuracy, false-positive rate, and
their intrusion detection model increases the detection rate and decreases the false rate.
Declarations
1. INTRODUCTION
With the progress of the Internet in the modern world, security threats to computer systems
and the network have improved a lot. The security threats influence system security
administrations. To control security threats number of innovations are established and
organized in administrations, for example, firewall, anti-virus software, message encryption,
secured software protocols, and so on. Likewise, this Intrusion Detection is a significant
innovation that has existed for a long time [1-3]. Intrusion detection is one of the significant
presentations of outlier identification that is utilized to recognize the system attacks by
opponents. Intrusion Detection Systems (IDSs) are fundamental to guarantee system security
[4, 5]. The generally utilized methodologies for intrusion detection are the anomaly and
signature dependent methodologies [6]. Anomaly-dependent IDSs gain proficiency with the
benchmark for the system conduct and any occasion that falls outside the accepted behavior
is confirmed as a malicious event. The signature dependent IDSs acquire the normal and
anomalous events to identify the attacks with their types [7].
The Intrusion detection system (IDS) is a significant component of secure information
systems [8, 9]. Intruders in the network are attempting to access the unapproved resources in
the system [10]. It is highly required to screen and examine the actions of the user and
framework behaviors. Essentially, by adjusting the arrangement of the system parameters, the
conduct of the framework could be unpredictable. Subsequently, the framework must be
furnished with the highlights for the intermittent observing and its conduct standards both for
normal and abnormal activities [11-13]. Machine learning techniques can be effective in
detecting intrusions. Numerous Intrusion Detection Systems are displayed dependent on
machine learning strategies [14].
Machine learning is a common term for depicting a lot of optimization and processing
techniques that are lenient of roughness and vulnerability. Currently, a machine learning
framework has been stretched out for executing a successful interruption location framework.
Machine learning approaches are exceptionally useful and enhanced in current intrusion
detection [15-17]. Precisely, support vector machines, neural networks, decision trees to have
powerful important plans in anomaly detection structures to improve the characterization
execution and speed [18]. The key components of machine learning procedures are Fuzzy
Logic (FL), Artificial Neural Networks (ANNs), Probabilistic Reasoning (PR), and Genetic
Algorithms (GAs). The idea behind the application of soft computing techniques and
particularly ANNs in implementing IDSs is to include an intelligent agent in the system that
is capable of disclosing the latent patterns in abnormal and normal connection audit records
and to generalize the patterns to new connection records of the same class [19, 20].
Optimum features are selected from the normalized information using krill herd
optimization for the features dimensionality reduction.
Effective feature selection using krill herd optimization enhances the classification
accuracy by diminishing the false positive rate. Support value is estimated for
effectually selected features.
Support based graph is constructed for the effective classification of data into
intrusion or normal.
The structure of the manuscript is sorted out as pursues: Section 2 reviews the literature
works in regards to the proposed strategy. In section 3, a short discussion about the proposed
system is given, section 4 examines the exploratory outcomes, and section 5 finishes up the
paper.
2. RELATED WORK
Majjed Al-Qatf et al. [21] proposed a powerful deep learning method STL-IDS dependent
on the self-trained learning (STL) system. The presented methodology was utilized to feature
learning and dimensionality reduction and improves the prediction accuracy of support vector
machines (SVM) concerning attacks. The presented approach was assembled utilizing the
sparse auto-encoder system, which was an effective learning algorithm for reconstructing a
new feature representation in an unsupervised method. After the pre-processing step, the new
feature is fed into the SVM to enhance its prediction capacity for intrusion and classification
accuracy.
Farrukh Aslam Khan et al. [22] presented a novel two-stage deep learning (TSDL)
dependent on a stacked auto-encoder for proficient system intrusion detection. The model
contains 2 decision steps: the first step is responsible for classifying network traffic as normal
or abnormal using a probability score value. Secondly, it was utilized in the final decision
step as an additional. The presented model was able to learn useful feature representations
from large amounts of unlabelled data and classifies them automatically and efficiently.
Chuanlong Yin et al. [23] presented an intrusion detection system-dependent based on deep
learning, and we propose a deep learning approach for intrusion detection using recurrent
neural networks (RNN-IDS). Besides, they investigated the performance of the design in
binary classification and multiclass classification, and the number of neurons and different
learning rate impacts on the performance of the proposed model. They analyzed the presented
strategy with existing soft computing techniques presented by previous researchers on the
benchmark data set.
Jie Gu et al. [24] presented a methodology for intrusion detection using an SVM ensemble
with feature extension. Precisely, the logarithm marginal density ratios transformation was
implemented on the original features to obtain new and better-quality transformed training
data; the SVM ensemble was then used to build the intrusion detection model. Exploratory
outcomes demonstrate that the presented technique can achieve a good and robust execution.
Haipeng Yao et al. [25] presented an MSML framework that incorporates components such
as, pure cluster mining, pattern discovery, fine-grained classification, and model updating. In
the pure cluster module, they presented knowledge of pure cluster formation and presented a
hierarchical semi-supervised k-means calculation mean to discover all the unadulterated
clusters. In the pattern discovery model, they defined the unknown pattern and apply a
cluster-based technique intending to locate those unknown patterns. At that point, a test was
sentenced to mark known examples or unlabelled unknown patterns. The fine-grained
classification module can achieve fine-grained classification for those unknown pattern
samples.
3. PROPOSED METHODOLOGY
In this paper, we have proposed an efficient methodology for the classification of intrusion
detection data. The input information is first pre-processed by the normalization procedure.
Afterward, the finest features are chosen from the normalized information for the highlights
dimensionality decrease utilizing krill herd optimization. Here, the viable feature
determination enhances the classification precision by diminishing the false positive rate. At
that point, support value is estimated for effectually chosen features, and then a support based
graph is constructed for the effective classification of data into intrusion or normal. The block
diagram of the presented methodology is given in figure 1.
DATABASE
FEATURES
PREPROCESSING
SELECTION
INTRUSION
DETECTION
SUPPORT BASED
CLASSIFICATION
GRAPH
NORMAL INTRUSION
3.2 PREPROCESSING
3.2.1 Normalization
Normalization performs the direct change of input information to fit into a particular range.
Here, Min-max normalization is utilized for the standardization of data which linearly
transforms the data. Min-Max normalization is regularly done through the accompanying
condition,
Y Ymin
Y (1)
Ymax Ymin
Where, Ymin and Ymax are the minimum and maximum values in Y , and Y is the set of values
in the dataset.
Begin
Define the size of the populace ( S ' ) and cycle ( Ĉ max )
Initialization
Set cycle C ' 1 ;
Initialize the cluster information as input and population data
~
S 1,2,3,.....S ' Of krill arbitrarily.
Fitness assessment
Evaluate each krill as specified by the krill location
While C' Cˆ max do
Class the populace/krill from finest to extremely worst.
for i 1 : S ' do
Perform the accompanying motion calculations,
1) Movement actuated by the krill
2) Foraging action
3) Physical dispersion
Update the krill location in the inquiry space.
Evaluate each krill according to its location.
end for i
Categorize the krill from the finest to the poorest and locate the present
best.
Cˆ max C'1 .
End while
Estimate the krill finest result.
End
Accuracy
Accuracy is determined as the quantity of every single right prediction (TN TP) divided
by the absolute number of a dataset ( TN TP FN FP ). It quantifies the degree of
accurateness of information classification. Accuracy is ascertained by utilizing the condition
(10),
(TN TP)
Accuracy (10)
(TN TP FN FP)
Where TN is a true negative, TP is the true positive, FP is the false positive, and FN is the
false negative.
Sensitivity
Sensitivity is the number of true positives that are viably recognized by a classification test.
It demonstrates how extraordinary the test is at classifying the information. Sensitivity is
computed by utilizing the condition (11).
Sensitivit y TP/(TP FN) (11)
Specificity
Specificity is the number of true negatives effectively-recognized through classification
tests. It recommends how great the test is at distinguishing normal data. Specificity is
computed by utilizing the condition (12).
Specificity TN/(TN FP) (12)
F-measure
It is an estimate of a test's precision. The F measure picks up its best value at 1
accompanied by the most unpleasant at 0. It is determined by the condition (15).
2TP
F
2TP FP FN (15)
The features chosen of the presented work utilizing the krill herd optimization is contrasted
with the existing GA, PSO optimization algorithms, and the features selection without
optimization. The comparison examination provided in table 3 demonstrates that the
performance of the proposed feature selection is enhanced than existing techniques.
Table 3: Comparison table of presented optimization with existing optimization utilizing
KDD-CUP 99 dataset
Proposed PSO GA
Measures
Accuracy 99.2 87.5 78.8
Sensitivity 98.68 79.56 70.20
Specificity 99.63 97.31 91.37
Precision 99.55 97.34 92.25
Recall 98.68 79.56 70.20
F measure 99.11 87.56 79.73
FPR 0.36 0.26 8.62
FNR 1.31 20.43 29.79
Kappa 98.53 76.67 59.82
Rank sum 85 63 28
The classification performance outcomes are given in figure 4 and the feature selection
utilizing different optimization techniques is provided in figure 5 proves that the proposed
work accuracy is much greater than the existing techniques.
Figure 4: Comparison graph regarding accuracy with various classifiers
The performance of the proposed work features selection is contrasted with the different
existing optimization techniques utilizing CIC IDS 2017 dataset is in table 5 and the obtained
results prove that the proposed work performance is more prominent than the existing
techniques in every performance measure.
Table 5: Comparison table of proposed optimization with existing optimization utilizing
CIC IDS 2017 dataset
Measures Proposed PSO GA
Accuracy 99.5 87.2 80.5
Sensitivity 99.4 86.32 82.37
Specificity 99.59 88.11 78.82
Precision 99.6 88.4 77.6
Recall 99.4 86.32 82.37
F measure 99.5 87.35 79.91
FPR 0.40 11.88 21.17
FNR 0.59 13.67 17.62
Kappa 99 74.4 61
Rank sum 96.43 59.16 19.45
The comparison graphs in figure 7 and figure 8 portray that the presented technique is
superior to the number of previous optimization algorithms regarding accuracy.
Figure 9: ROC curve with different optimizations utilizing CIC IDS 2017 dataset
The ROC curve comparison of the presented work with existing techniques utilizing the
CIC IDS 2017 dataset is depicted in figure 9. It depicts that the proposed work detection
performance is better than the existing technique through the ROC curve.
The comparison table 7 delineates the proposed work feature selection utilizing krill herd
optimization results in a better outcome than the existing techniques for different execution
measures. Here, the performance of the proposed work is examined with the ISCX IDS 2012
dataset.
Table 7: Comparison table of proposed optimization with existing optimization utilizing
ISCX IDS 2012 dataset
Measures Proposed PSO GA
Accuracy 99.5 87.1 83.6
Sensitivity 99.5 91.67 90.25
Specificity 99.49 68.20 58.57
Precision 99.87 92.25 89.12
Recall 98.50 91.67 90.25
F measure 99.68 91.96 89.68
FPR 0.50 31.79 41.42
FNR 0.49 8.3 9.74
Kappa 97.53 53.49 45.04
Rank sum 86.65 77.89 57.97
The proposed work accuracy for ISCX IDS 2012 dataset is examined with different
classifiers and optimization algorithms than the existing techniques are demonstrated by
figure 10 and figure 11. It displays that the accuracy of the proposed work with the ISCX IDS
2012 dataset is vastly improved than the existing techniques.
Figure 10: Comparison graph in terms of accuracy with various classifiers
The proposed work features selection using the krill herd optimization is compared with the
existing optimization algorithms in table 9 and the comparison analysis proves that the
performance of the proposed work is improved than the existing techniques.
Table 9: Comparison table of proposed optimization with existing optimization using
CICDDOS2019 dataset
Measures Proposed PSO GA
Accuracy 99.6 87.5 81.6
Sensitivity 99.2 74.13 62.89
Specificity 99.73 92.17 88.03
Precision 99.2 76.8 64.4
Recall 99.2 74.13 62.89
F measure 99.2 75.44 63.63
FPR 0.26 7.82 11.96
FNR 0.8 25.86 37.10
Kappa 99.46 81.81 72.03
Rank sum 100 64.41 75.77
The classification performance and the optimization algorithms performance using the
CICDDOS2019 dataset in figure 13 and figure 14 proves that the proposed work accuracy is
much greater than the existing techniques.
Figure 15: ROC curve with various techniques using CICDDOS2019 dataset
The ROC curve with various existing techniques in intrusion detection is given in figure 15
is analyzed with the CICDDOS2019 dataset. It displays that the detection performance of the
proposed support value-based classification is better than the existing classifications.
5. CONCLUSION
In this paper, we have presented a support value-based graph classification for the
categorization of data into normal or intrusion. Moreover, an optimal feature selection
utilizing krill herd optimization yields superior outcomes for effectively choosing the
features. In the presented technique, the input data is pre-processed and the features are
ideally chosen to utilize optimization. Lastly, an effective support value-based graph
classification is efficiently categorized the data into normal or intrusion. The exploratory
outcomes exhibit that our presented classification outperforms the existing SVM, Naive
Bayes, and random forest classifiers concerning performance metrics such as accuracy,
sensitivity, specificity, precision, recall, F-measure, FPR, FNR, Kappa, and Rank sum
measures.
REFERENCES
[1] Gnanaprasanambikai, L., and Nagarajan Munusamy. "Data Pre-Processing and
Classification for Traffic Anomaly Intrusion Detection Using NSLKDD Dataset" Cybernetics
and Information Technologies 18, no. 3 (2018): 111-119.
[2] Vinutha, Poornima, A Survey - Comparative Study on Intrusion detection System,
"International Journal of Advanced Research in Computer and Communication Engineering
Vol. 4, 2015.
[3] Binbusayyis, Adel, and Thavavel Vaiyapuri. "Identifying and Benchmarking Key
Features for Cyber Intrusion Detection: An Ensemble Approach." IEEE Access 7 (2019):
106495-106513.
[4] Anwar, Shahid, Jasni Mohamad Zain, Mohamad Fadli Zolkipli, Zakira Inayat, Suleman
Khan, Bokolo Anthony, and Victor Chang. "From intrusion detection to an intrusion response
system: fundamentals, requirements, and future directions." Algorithms 10, no. 2 (2017): 39.
[5] Ashfaq, Rana Aamir Raza, Xi-Zhao Wang, Joshua Zhexue Huang, Haider Abbas, and Yu-
Lin He. "Fuzziness based semi-supervised learning approach for intrusion detection system."
Information Sciences 378 (2017): 484-497.
[6] Aziz, Amira Sayed A., E. L. Sanaa, and Aboul Ella Hassanien. "Comparison of
classification techniques applied for network intrusion detection and classification." Journal
of Applied Logic24 (2017): 109-118.
[7] Upasani, Nilam, and Hari Om. "A modified neuro-fuzzy classifier and its parallel
implementation on modern GPUs for real time intrusion detection" Applied Soft Computing
(2019): 105595.
[8] Ben-Asher, Noam, and Cleotilde Gonzalez "Effects of cyber security knowledge on attack
detection." Computers in Human Behavior 48 (2015): 51-61.
[9] Manzoor, Ishfaq, and Neeraj Kumar. "A feature reduced intrusion detection system using
ANN classifier." Expert Systems with Applications 88 (2017): 249-257.
[10] Kevric, Jasmin, Samed Jukic, and Abdulhamit Subasi. "An effective combining
classifier approach using tree algorithms for network intrusion detection." Neural Computing
and Applications 28, no. 1 (2017): 1051-1058.
[11] Selvakumar, B., and K. Muneeswaran. "Firefly algorithm based feature selection for
network intrusion detection." Computers & Security 81 (2019): 148-155.
[12] Aljawarneh, Shadi, Monther Aldwairi, and Muneer Bani Yassein. "Anomaly-based
intrusion detection system through feature selection analysis and building hybrid efficient
model" Journal of Computational Science 25 (2018): 152-160.
[13] Ambusaidi, Mohammed A., Xiangjian He, Priyadarsi Nanda, and Zhiyuan Tan.
"Building an intrusion detection system using a filter-based feature selection algorithm" IEEE
transactions on computers 65, no. 10 (2016): 2986-2998.
[14] Anjum Khan, Anjana Nigam,"Analysis of Intrusion Detection and Classification using
Machine Learning Approaches", International journal of online science, 2015.
[15] Bhuyan, Monowar H., Dhruba Kumar Bhattacharyya, and Jugal K. Kalita. "Network
anomaly detection: methods, systems and tools." IEEE communications surveys &
tutorials16, no. 1 (2014): 303-336.
[16] Ahmed, Mohiuddin, Abdun Naser Mahmood, and Jiankun Hu. "A survey of network
anomaly detection techniques" Journal of Network and Computer Applications 60 (2016):
19-31.
[17] Shahreza, M. Lotfi, D. Moazzami, B. Moshiri, and M. R. Delavar. "Anomaly detection
using a self-organizing map and particle swarm optimization." Scientia Iranica 18, no. 6
(2011): 1460-1468.
[18] Aslahi-Shahri, B. M., Rasoul Rahmani, M. Chizari, A. Maralani, M. Eslami, M. J.
Golkar, and A. Ebrahimi. "A hybrid method consisting of GA and SVM for intrusion
detection system" Neural computing and applications 27, no. 6 (2016): 1669-1676.
[19] Srinivas Mishra, Sateesh Kumar Pradhan, Subhendu Kumar Rath, "Performance
Analysis of Network Intrusion Detection System using Back Propagation for Feed Forward
Neural Network in MATLAB/SIMULINK" International Journal of Computational
Engineering Research (IJCER), Vol. 8, 2018.
[20] Sunita, Swain, Badajena J. Chandrakanta, and Rout Chinmayee. "A hybrid approach of
intrusion detection using ANN and FCM." European Journal of Advances in Engineering and
Technology 3, no. 2 (2016): 6-14.
[21] Al-Qatf, Majjed, Yu Lasheng, Mohammed Al-Habib, and Kamal Al-Sabahi. "Deep
Learning Approach Combining Sparse Autoencoder With SVM for Network Intrusion
Detection" IEEE Access 6 (2018): 52843-52856.
[22] Khan, Farrukh Aslam, Abdu Gumaei, Abdelouahid Derhab, and Amir Hussain. "A
Novel Two-Stage Deep Learning Model for Efficient Network Intrusion Detection." IEEE
Access 7 (2019): 30373-30385.
[23] Yin, Chuanlong, Yuefei Zhu, Jinlong Fei, and Xinzheng He "A deep learning approach
for intrusion detection using recurrent neural networks." Ieee Access 5 (2017): 21954-21961.
[24] Gu, Jie, Lihong Wang, Huiwen Wang, and Shanshan Wang. "A novel approach to
intrusion detection using SVM ensemble with feature augmentation" Computers & Security
(2019).
[25] Yao, Haipeng, Danyang Fu, Peiying Zhang, Maozhen Li, and Yunjie Liu. "MSML: A
Novel Multilevel Semi-Supervised Machine Learning Framework for Intrusion Detection
System." IEEE Internet of Things Journal 6, no. 2 (2018): 1949-1959.
[26]https://github.com/defcom17/NSL_KDD/blob/master/Original%20NSL%20KDD%20Zi
p.zip
[27] https://www.unb.ca/cic/datasets/
[28] http://ids-hogzilla.org/dataset/
[29] https://www.unb.ca/cic/datasets/ddos-2019.html
[30] Almomani, Omar. "A Feature Selection Model for Network Intrusion Detection System
Based on PSO, GWO, FFA and GA Algorithms." Symmetry 12.6 (2020): 1046.
[31] Abualigah, Laith Mohammad Qasim. Feature selection and enhanced krill herd
algorithm for text document clustering. Berlin: Springer, 2019.