Feature Selection and Comparison of Classification Algorithms for Wireless Sensor Networks
https://doi.org/10.1007/s12652-021-03411-6
ORIGINAL RESEARCH
Received: 15 January 2021 / Accepted: 26 July 2021 / Published online: 3 August 2021
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021
Abstract
Wireless sensor networks (WSNs) are developing at an incredible pace because they offer cost-effective solutions for applications such as military and medical monitoring. A WSN consists of a large number of nodes that operate under constraints such as limited computation capacity and limited battery capacity. WSNs are exposed to many attacks, one of which is the distributed denial-of-service (DDoS) attack. Many studies have shown that reducing the redundancy among the features of a dataset can make a model more accurate and efficient. In this paper, correlation-based feature selection, principal component analysis, linear discriminant analysis, recursive feature elimination, and univariate feature selection are used for feature selection, and the results obtained with these techniques are compared. A novel technique for feature selection is introduced, which combines five feature selection techniques as a stack. After applying the feature selection techniques, the model is trained with five machine learning algorithms, namely SVM, perceptron, K-nearest neighbor, stochastic gradient descent, and XGBoost. Finally, the model is evaluated with the help of K-fold cross-validation. Among all the techniques, the best accuracy of 99.87% is achieved with the XGBoost classifier after selecting the best eleven features from the KDD dataset.
Keywords Wireless sensor network (WSN) · Feature extraction · Recursive feature elimination (RFE) · Univariate feature selection · XGBoost · Machine learning (ML)
technique with the most promising result has been implemented and evaluated in this work. In this paper, a novel stack-based feature selection method is proposed, which helps to extract the features that improve the detection accuracy.

A DDoS attack can quickly exhaust the available resources in various ways, such as flooding. The danger is considerably greater in military and medical applications. Designing a security model that takes these limitations into account while still providing security is a major challenge nowadays.

The contribution of this research paper is summarized as follows:

• The importance of feature selection techniques is discussed and explored with five different techniques.
• A novel stack-based approach for feature selection is proposed.
• A comparative analysis with different machine learning algorithms is performed.

The remainder of the paper is structured as follows. The related work on our proposed plan is covered in Sect. 2. Section 3 describes the proposed methodology, which includes the dataset description, preprocessing, feature selection, feature scaling, and the model description. Section 4 describes the experiment and the analysis of the results, along with the result comparison. Section 5 presents the conclusion and future scope.

2 Related work

Many studies have shown that by using less redundant datasets, or by selecting relevant features from a dataset, the accuracy and overall performance of a model can be enhanced. Researchers have used different feature selection techniques to achieve this, and some of them have also used the hybridization of various techniques, as well as feature ranking techniques, to find the most important features of the dataset. Vinutha and Basavaraju (2018) suggested that there is no need to use all features of the dataset to build a model. They tested their model using information gain, Chi-square-based attribute selection, the symmetrical uncertainty feature selection technique, and gain ratio attribute evaluation, and then finally applied a Naive Bayes classifier (Vinutha and Basavaraju 2018). Calix and Sankara (2013) proposed that a highly accurate classification model can be developed with the NSL-KDD dataset by using a Support Vector Machine classifier based upon the best-ranked features. They claimed that ranking features is very effective for classification. Chae and Choi (2014) used gain ratio, information gain, and CFS for feature selection, which gave impressive results with decision tree classifiers. They obtained an accuracy of 99.79% with 22 features using their proposed method; the accuracy of their proposed system was better than that of the other existing feature selection techniques. Gündüz and Çeter (2018) used four different classification algorithms and compared their results. Algorithms such as support vector machines, multilayer perceptron networks, the fuzzy unordered rule induction algorithm (FURIA), and decision trees were used in their study. They concluded that the best-first search algorithm (BFS) and CFS yield the best features for the model. They selected the 11 best features from the 41 features of the KDDCUP99 dataset and applied the classification algorithms; FURIA gave the best result. Abd-Eldayem et al. (2014) proposed an HTTP service-based IDS in their work. They focused on building a high-performance IDS for HTTP services. They used the NSL-KDD dataset and a Naive Bayes classifier to build their model. The model gave a 37% FN and 6.6% FP rate on training data and 24% FN and 4.6% FP on test data. Aburomman et al. (2016) analyzed some hybrid and ensemble techniques. They used ensemble techniques of both similar and different forms. They also worked on ensembles based on voting techniques because, generally, these algorithms give accurate results. Aljawarneh et al. (2018) proposed a method with hybrid algorithms consisting of classifiers such as Naive Bayes, Random Tree, Meta Tagging, Decision Stump, AdaBoostM1, Pegging, J48, and REPTree. They used the Vote algorithm with information gain to filter the data of the NSL-KDD dataset. They obtained 99.81% accuracy with the binary-class dataset and 98.56% accuracy with the multiclass dataset. Amiri et al. (2011) proposed a framework with a least-squares support vector machine to increase the accuracy of IDS. They examined the results given by the algorithm on the features selected by using the linear correlation coefficient and also by using mutual information. The obtained results showed that the model is efficient in detecting R2L and U2R attacks. Attia et al. (2018) proposed a very efficient model to detect any misbehavior on the network. They showed the difference between the existing mechanism and their proposed mechanism of the IDS system. The false rate of their model is approximately 5%, and its detection rate is 95%. Selvakumar and Muneeswaran (2019) analyzed the dataset with mutual information and a Bayesian network-based classifier. They selected ten features using the firefly nature-inspired algorithm. They also presented graphs showing the difference between training and testing time on all features. Borkar et al. (2019) used the RRF algorithm to select the best features of the dataset and developed a model with a support vector machine classifier. They utilized a high-level security mechanism to secure the transmission between packets and sensors. This model gave good performance in classifying the attacks. Chebrolu et al. (2005) examined the dataset with a combination of feature selection techniques. They selected features using the Markov blanket model and decision tree.
They trained their model with a hybridization of Regression Trees, Classification, and Bayesian networks. They claimed that the accuracy of detecting Probe and DOS attacks is 100%, and that of U2R and R2L is 99.47%. Chen et al. (2006) used a Flexible Neural Tree model for breast cancer classification and intrusion detection systems. They claimed that their model gave fair detection accuracy with 4, 8, 10, and 12 features. They focused on minimizing the number of features and obtained the best accuracy for their dataset. Javaid et al. (2016) implemented self-taught learning (STL) by using deep learning techniques. The method combines a sparse auto-encoder with softmax regression. For the experimental study, the NSL-KDD dataset was used in their work, and binary classification results were obtained for the f-score. Potluri and Diedrich (2016) implemented a deep learning approach based on the DNN method. Out of 41 features, they used only 27 for the experimentation and obtained mixed results. You et al. (2016) used a deep learning approach along with the RNN technique. Experimentation was done on a well-known dataset, and a comparative analysis was performed against SVM and Naive Bayes-based methods. They claimed to obtain promising results, with an accuracy of 92.7%. Alrawashdeh and Purdy (2016) implemented deep learning based upon unsupervised feature reduction. A logistic regression classifier was used for the experimentation. The popular KDD Cup 99 dataset was used in this work, and the results obtained had a detection rate of 97.90%. Dong and Wang (2016) used deep learning methods along with a traffic anomaly technique. They tried to address the problems associated with the dataset and also claimed to obtain promising results. Zhao et al. (2019) implemented deep learning methods for machine health monitoring. Four deep learning methods were compared and analyzed, and the experiment was carried out on a well-known dataset, obtaining good results. Alkasassbeh et al. (2016) designed a framework based on the multilayer perceptron (MLP) technique. The tests were carried out for SIDDoS and HTTP flood-based attacks, and the obtained results were claimed to be around 98%. Tesfahun and Bhaskari (2013) designed a technique based on oversampling. The popular NSL-KDD dataset was used for the experiment, and features were selected based on information gain. The Random Forest machine learning algorithm was used for the classification experiments. Table 1 provides a comparison between existing approaches; in it, the research work and state of the art of various researchers are discussed comparatively. The parameters considered for comparison are dataset, strength, limitation, etc.

Even though many methodologies have been designed to detect DDoS attacks with high accuracy, there are still many areas where improvements can be made. These areas include dependency on human operators, lack of freely available datasets, long training and computing times, extensive preprocessing of datasets, etc. (Pande and Bhagat 2016; Madhavan et al. 2021; Khasawneh et al. 2020; Abualigah et al. 2021; Safaldin et al. 2021). Still, there are many areas that need attention. In this work, a novel feature selection technique is proposed, which executes different selection techniques one after another as a stack and reduces the time required for classification, because the number of features obtained after applying the feature selection techniques is smaller.

3 Methodology

3.1 Dataset description

The selection of important features from the dataset plays a crucial role in any ML model's success. For the implementation, the popular KDDCup99 dataset is used for intrusion classification. This dataset contains forty-one different features, including content-type, basic-type, and traffic-type features. It was developed based upon the DARPA'98 IDS evaluation program.

All the records in the KDD dataset are categorized in two ways:

1. Normal records
2. Records labeled with a kind of intrusion

There are four categories of intrusion in the dataset (Hariharan et al. 2019; Nkiama et al. 2016):

• Denial-of-service attack (DOS): DOS is the oldest form of cyber extortion attack. Basically, in this attack, the attacker makes the server so busy that the machine or software denies access to the genuine user.
• Remote to local attack (R2L): for example, a specific version of ncftp, a prominent FTP client, exploits a bug: a directory includes a directory with a very long name, and one or more commands contained in the name are then executed (unintentionally) by the FTP client with the user's permission. Example: password guessing, etc.
• Unauthorized access to local superuser (root) privileges (U2R): the attacker gets access to the root of the system and performs various attacks and unauthorized attempts. Example: various buffer overflow attacks, Perl, rootkit, etc.
• Probe: this attack is an attempt to gather information about a network of computers for the apparent purpose of evading its security controls, for example, sending an empty message just to see whether a destination exists. Ping is a common tool to send such a probe. Example: port scanning, SATAN, SAINT, portsweep, etc.
Table 1 Comparison of various existing methodologies

| Author (venue) | Year | Technique | Approach | Dataset | Strength | Limitation |
|---|---|---|---|---|---|---|
| Potluri et al. (IEEE) | 2016 | Deep neural network (DNN) | DNN | NSL-KDD | The evaluation was done on a different processor | Performance metrics were not discussed |
| You et al. (IEEE) | 2016 | Recurrent neural networks (RNN) | Automatic security auditing tool for short messages (SMS) | Collected short messages | Improvement over the SVM algorithm | Obtained accuracy is low |
| Alrawashdeh et al. (IEEE) | 2016 | Restricted Boltzmann machine (RBM) | RBM with 2 hidden layers | KDDCUP'99 | Classification was done using 5 classes | Lack of feature reduction |
| Dong et al. (IEEE) | 2016 | Deep learning methods | Synthetic minority over-sampling technique (SMOTE) | KDD-99 | Problems related to imbalanced datasets are overcome by oversampling | Only precision results are compared and discussed |
| Zekri et al. (IEEE) | 2017 | C4.5 algorithm | Wireshark tool used to monitor on VirtualBox | Hping3-generated attack-based data | The approach was focused on the flooding-based attack | Considered only signature-based attacks |
| Shah et al. (Future Generation Computer Systems) | 2017 | Snort adaptive plug-in | Different learning algorithms | NSA Snort IDS, DARPA IDS, NSL-KDD IDS | The performance was compared on two IDSs | Obtained simulation result gives lower accuracy |
| Idhammad et al. (Springer Nature) | 2018 | Semi-supervised ML approach | Applied entropy estimation, information gain ratio, co-clustering | NSL-KDD, UNB ISCX 12 and UNSW-NB15 | A time-based sliding window was used for entropy | Both supervised and unsupervised techniques are used, which requires more computation time |
| Doshi et al. (IEEE) | 2018 | Packet-level machine learning | 5 ML algorithms were used | Simulated self-generated dataset | Tested and compared results with 5 ML classifiers: KNN, LSVM, DT, RF, NN | Limited features were selected, specifically to avoid overhead |
| Hariharan et al. (Modern Education and Computer Science Press) | 2019 | C5.0 decision tree (DT) | Offline detection model | Simulated dataset obtained using VMware | The accuracy obtained was significantly high | The model is not proactive |
| Aamir et al. (Elsevier) | 2019 | Clustering-based semi-supervised machine learning | Clustering approach | CICIDS2017 | Various ML algorithms are used in a diversified manner | Fewer features were considered |
| Dayananda et al. (Springer Nature) | 2019 | Access control list (ACL) | Firewall | Self-simulated | Network- and application-level defense mechanism | Cannot handle more frequent traffic due to dependency on the firewall |
| Mallikarjunan et al. (Springer Nature) | 2019 | Naive Bayes | Classification based upon NB | Self-simulated dataset | Gives promising results compared to J48 and RF | If features are increased, accuracy will be reduced drastically |
| Khamparia et al. (Library Hi Tech) | 2020 | Hybrid technique | Anomaly-based approach | Orkut, Twitter, etc. | Combination of statistical and semi-supervised learning | No feature selection technique was applied |
| Pande et al. (Springer Nature) | 2021 | Random Forest | Weka tool | NSL-KDD | Accuracy around 99% | Used only a partial dataset |
| Pande et al. (World Journal of Engineering) | 2021 | Artificial neural network (ANN) | Deep learning | NSL-KDD | Accuracy around 99.99% | Computation time required to train the model is higher |
3.2 Preprocessing

Reducing redundancy in the dataset during preprocessing offers several benefits:

• It decreases the size of the dataset.
• It decreases the risk of overfitting.
• It decreases the misleading effect of redundant data.
• It decreases the time needed to train the model.
• It improves the accuracy of the model.

Redundancy can be decreased by dropping irrelevant or partially relevant features from the dataset (Khamparia et al. 2020; Ghosh et al. 2014; Hasan et al. 2016). This is the most important step for almost every framework that uses a dataset with high redundancy or a large number of columns, because training the model on irrelevant features may negatively affect the model's accuracy (Pande et al. 2021a, b; Pande and Gadicha 2015; Pande and Khamparia 2019; Madhavan et al. 2021). Figure 1 depicts the stacked feature selection techniques used in the proposed model.

Pseudocode of the proposed technique (using all the feature selection algorithms as a stack):

Input: set of 41 features from the KDD'99 Cup dataset
Output: best selected feature subset
1: Apply correlation-based feature selection, flagging the features whose pairwise correlation is greater than or equal to 0.7 or less than or equal to −0.7.
2: Calculate Pearson's correlation coefficient using Eq. (1).
3: Select the feature subset that satisfies the threshold.
4: Repeatedly apply feature selection from the stack of univariate, RFE, PCA, and LDA techniques on the features obtained from step 3.
3.3 Feature selection

For feature selection, five different techniques are used, namely correlation-based feature selection, linear discriminant analysis (LDA), univariate feature selection, recursive feature elimination (RFE), and principal component analysis (PCA). All the feature selection techniques are also used together as a stack. Initially, the correlation-based feature selection technique is applied to the dataset. The presence of correlated features in a dataset can decrease the performance of the model and affect its accuracy, so these features need to be dropped from the dataset. To build the correlation matrix, Pearson's correlation coefficient (PCC) is used.

Pearson's correlation If the covariance of two variables is divided by the product of the standard deviations of the two variables, the output obtained is the PCC:

Coefficient = covariance(x, y) / (stdv(x) ∗ stdv(y))    (1)

where x is the data and y is a random variable.

By using these coefficients, the relationship between the features can be understood. The value of the coefficient always ranges from −1 to 1. In the KDD dataset, values in the range −0.5 to +0.5 have shown a significant correlation. Measuring this relationship between each pair of features in the dataset and putting the values in a matrix yields a symmetric correlation matrix. Based on this matrix, correlated features are dropped: features whose correlation is greater than or equal to 0.7 or less than or equal to −0.7 are removed. The number of features is 41 before correlation-based feature selection and 28 afterwards.
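As a concrete illustration, the correlation filter of steps 1–3 can be sketched with pandas; the `df` DataFrame, the helper name, and the keep-first-drop-second policy are illustrative assumptions, while the ±0.7 threshold comes from the text.

```python
# Hypothetical sketch of the correlation-based filter (steps 1-3 of the
# pseudocode above). Which member of a correlated pair is kept is an
# assumption; the paper only specifies the |r| >= 0.7 threshold.
import pandas as pd

def drop_correlated_features(df: pd.DataFrame, threshold: float = 0.7) -> pd.DataFrame:
    """Drop one feature from every pair whose Pearson |r| >= threshold."""
    corr = df.corr(method="pearson").abs()   # symmetric correlation matrix
    cols = corr.columns
    to_drop = set()
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] >= threshold:  # covers r >= 0.7 or r <= -0.7
                to_drop.add(cols[j])          # keep the first, drop the second
    return df.drop(columns=sorted(to_drop))

# e.g. features = drop_correlated_features(kdd_features)  # 41 -> 28 columns
```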
Univariate feature selection This technique evaluates each feature individually against the target by means of a statistical test; here the chi-square statistic is used to score the features:

χ² = Σ_{i=1..f} Σ_{j=1..c} (s_ij − μ_ij)² / μ_ij    (2)

where s_ij is the ith value of the feature along with the instances, and

μ_ij = (s_*j × s_i*) / s    (3)

where s_i* is the ith value of the specific feature, s_*j is the number of instances in class j, and s is the number of class instances. Figure 3 depicts the 13 features selected from the 41 features of the KDD dataset when the univariate technique is implemented separately.
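A minimal sketch of this univariate step with scikit-learn's chi-square scorer, which implements the statistic of Eq. (2); `X` and `y` are assumed to be the non-negative feature matrix and the labels, and k = 13 mirrors the count reported above.

```python
# Sketch of univariate selection with the chi-square test; chi2 requires a
# non-negative X (e.g. after min-max scaling).
from sklearn.feature_selection import SelectKBest, chi2

selector = SelectKBest(score_func=chi2, k=13)
X_univariate = selector.fit_transform(X, y)  # keeps the 13 highest-scoring features
selected_mask = selector.get_support()       # boolean mask over the original columns
```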
Recursive feature elimination (RFE) The name of this technique is self-explanatory: it works in a loop and removes a few features in each iteration. If a dataset has high collinearity and dependencies, RFE can be the best algorithm to eliminate such features. RFE first evaluates the importance of the features and then ranks them accordingly; the weakest features are then eliminated. RFE mainly takes two arguments, the classifier and the number of features to be selected. A logistic regression classifier is used in the proposed framework, as it takes comparatively little time to train the model. RFE trains the model using the provided classifier and calculates the accuracy while eliminating the unwanted features. RFE takes more time compared to univariate feature selection because it trains the model until the end of the loop.
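A sketch of RFE with the two arguments described above, a logistic-regression estimator and a target feature count; `X`, `y`, and the count of 17 are illustrative.

```python
# Sketch of recursive feature elimination as described in the text.
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=17)
X_rfe = rfe.fit_transform(X, y)  # iteratively drops the weakest features
print(rfe.ranking_)              # rank 1 marks the selected features
```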
Linear discriminant analysis (LDA) LDA is a widely used feature selection technique. It mainly removes the redundant and dependent features from the dataset and consists of three main stages. In the first stage, it calculates the difference between the averages of the different classes; this difference is known as the between-class variance. In the second stage, it calculates the difference between the average and the sample values of each class; this difference is known as the within-class variance. In the third stage, it selects the features that have a greater between-class variance and a smaller within-class variance. LDA is also broadly used in the fields of bioinformatics and chemistry. Consider that the probability density function of x, with mean vector μ_i and variance–covariance matrix Σ (the same for all populations), is multivariate normal in population π_i. For this scenario, the normal probability density function is calculated as given below:

P(X | π_i) = (1 / ((2π)^(p/2) |Σ|^(1/2))) exp(−(1/2) (X − μ_i)′ Σ⁻¹ (X − μ_i))    (4)
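A hedged sketch of LDA as a reducer. Note that scikit-learn's implementation yields at most (number of classes − 1) discriminant components, so the component count shown is illustrative and depends on how many attack classes are kept.

```python
# Sketch of LDA-based reduction; n_components is capped at n_classes - 1
# (four components for the five KDD classes), so it is an assumption here.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis(n_components=4)  # <= n_classes - 1
X_lda = lda.fit_transform(X, y)                   # projects onto the discriminant axes
```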
Principal component analysis (PCA) PCA is widely used in unsupervised learning. The approach of this technique is very simple, but it can make a fair difference in accuracy compared with a model trained on all the features. Initially, it calculates the covariance of the data points and arranges the values in a matrix. It then calculates the eigenvectors and eigenvalues of that matrix and sorts the eigenvectors in descending order of their eigenvalues. Furthermore, it selects the most promising features for training the model and projects the original dataset onto the selected number of eigenvectors. PCA is also used in the fields of medical science and chemistry, and it is used as well to reduce distortion in a graph. The formulas below describe the covariance computation of X and Y, where X and Y are matrices of size m and P is a linear transformation; X is the original dataset, and Y represents the transformed (re-represented) dataset:

PX = Y    (5)

cov(X, Y) = (1 / (n − 1)) Σ_{i=0..n} (X_i − x̄)(Y_i − ȳ)    (6)
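A sketch of Eqs. (5)–(6) via scikit-learn, which performs the covariance eigendecomposition internally; n_components = 16 matches the count used later in the stacked pipeline.

```python
# Sketch of the PCA step: the data are projected onto the top eigenvectors of
# the covariance matrix (Y = PX, with the rows of P being eigenvectors).
from sklearn.decomposition import PCA

pca = PCA(n_components=16)
Y = pca.fit_transform(X)                    # projection onto 16 components
print(pca.explained_variance_ratio_.sum())  # variance retained by the projection
```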
neighbors and differentiates them in a class. It comes under
the category of the supervised algorithm. It identifies the
closest neighbor by using the Euclidean distance formula.
The implementation of this technique is simple to under-
stand. Initially, separate the dataset in training and testing
13
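A sketch of the 0–1 normalization with scikit-learn's MinMaxScaler; fitting on the training split only (an assumption, not stated in the text) avoids leaking test statistics.

```python
# Sketch of min-max normalization into the 0-1 range described above.
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
X_train_scaled = scaler.fit_transform(X_train)  # learns per-feature min/max
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics
```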
3.5 Model description

The preprocessed dataset is split into training and testing sets. Five different classifiers are used for training the model: SVM, perceptron, K-nearest neighbor, stochastic gradient descent, and XGBoost.

Perceptron The perceptron algorithm works like a neural cell in the human body. It accepts the training data as a node. It consists of two kinds of parameters, weights and biases, and a function called the activation function. It runs several times in a loop, and each time it changes the values of the weights and biases to minimize the loss between the predicted and actual values, while the activation function decides whether the particular neuron should fire for the output layer or not. The formula below is used for calculating the activation:

Activation = sum(weight[i] ∗ z[i]) + Bias    (7)

where i is the index number. The prediction will be equal to one if the activation value is greater than or equal to zero, and it will be zero if the activation value is less than zero.
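A minimal NumPy rendering of Eq. (7) and the threshold rule; the function and argument names are illustrative.

```python
# Sketch of the perceptron decision rule: activation = w.z + bias, predict 1
# when the activation is >= 0 and 0 otherwise.
import numpy as np

def perceptron_predict(z: np.ndarray, weights: np.ndarray, bias: float) -> int:
    activation = np.dot(weights, z) + bias  # sum(weight[i] * z[i]) + Bias
    return 1 if activation >= 0.0 else 0
```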
Stochastic gradient descent Minimization of a function can be achieved by checking the gradients of the loss function with the gradient descent algorithm; after checking, the algorithm updates the weights of the function. This algorithm is beneficial for minimizing the error by updating the weights, and such algorithms are also called optimization algorithms. The learning rate has to be given as an argument to the classifier so that it makes the changes accordingly; the default value of the learning rate in the classifier is 0.01. The formula to update the weights is given below:

Weight = weight + learn_rate ∗ (expected − predicted) ∗ a    (8)

where a is the input variable.
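Eq. (8) as a one-line helper, assuming the 0.01 default learning rate mentioned above; the names mirror the formula.

```python
# Sketch of the per-sample SGD weight update of Eq. (8).
def sgd_update(weight: float, expected: float, predicted: float,
               a: float, learn_rate: float = 0.01) -> float:
    return weight + learn_rate * (expected - predicted) * a
```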
K-nearest neighbor This algorithm finds the nearest neighbors and uses them to assign a class. It comes under the category of supervised algorithms. It identifies the closest neighbors by using the Euclidean distance formula. The implementation of this technique is simple to understand: initially, separate the dataset into training and testing sets and choose the important features from the training data. Then, find the distance between all the points by using the Euclidean distance formula and store the distances in a list. Further, sort that list, select the first n values (the number of neighbors needed), and then allocate a class to each test point based upon the majority class among the chosen points. The following are the various distance measures possible in KNN, along with their standard formulas:

Euclidean distance: sqrt( Σ_{i=1..f} (X_i − Y_i)² )    (9)

Manhattan distance: Σ_{i=1..f} |X_i − Y_i|    (10)

Minkowski distance: ( Σ_{i=1..f} |X_i − Y_i|^q )^(1/q)    (11)

where X and Y are two distinct points and f is the number of instance points.
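The three distance measures of Eqs. (9)–(11), sketched in NumPy; note that Minkowski reduces to Manhattan at q = 1 and to Euclidean at q = 2.

```python
# The KNN distance measures of Eqs. (9)-(11) as small NumPy helpers.
import numpy as np

def euclidean(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.sqrt(np.sum((x - y) ** 2)))

def manhattan(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.sum(np.abs(x - y)))

def minkowski(x: np.ndarray, y: np.ndarray, q: float) -> float:
    return float(np.sum(np.abs(x - y) ** q) ** (1.0 / q))
```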
XGBoost Extreme Gradient Boosting (XGBoost) is a widely used algorithm for the classification of large datasets in a minimal amount of time. This algorithm has many advantages, which explain its current popularity. It performs parallel computing, so users get their results faster, and the XGBoost classifier outperformed the other classifiers and accomplished the most noteworthy results in our comparison. The parameters of the classifier can also be tuned to enhance the results.

F(∅) = L(∅) + Ω(∅)    (12)

where ∅ refers to the parameters of the model, L(∅) is a differentiable convex loss function, and Ω(∅) is a regularization term that penalizes complex models (Zekri et al. 2017).
The sequence of all the feature selection techniques is very important, because selecting the features with different combinations affects the accuracy of the models. These feature selection techniques are also arranged in such a manner that the computation time of the model can be optimized; if the sequence of the feature selection algorithms is changed, the computation time increases gradually.
4 Experiment and result analysis
Each classifier used for the model is trained eleven times, each time with a different number of features. Furthermore, every evaluated model is then validated by using K-fold cross-validation. It is a highly reliable validation technique because the results it generates are less biased. The function cross_val_score() is used with 10 folds for K-fold cross-validation. The experiment is performed on a system with 8 GB of RAM and an Intel i5 8th-generation quad-core processor with a 1.6 GHz clock speed.
After performing correlation-based feature selection on the dataset, 28 features are identified, and the model is trained with each classifier on these 28 features. Further, 17 features are selected from these 28 features by using each of the four feature selection techniques, and the model is trained with each of the classifiers. Later, 11 features are selected from the 28 features by using the four feature selection techniques, and the training of the model is again carried out with each of the classifiers. Finally, all of these techniques are used as a stack: first, 23 features are selected from the 28 features by using univariate feature selection; then, from those 23 features, the 20 best features are selected by RFE; from those 20 features, 16 features are selected by principal component analysis (PCA); from those 16 features, 11 features are selected by LDA; and the model is then trained with each classifier on these 11 features. At last, two feature selection techniques, RFE and LDA, are chosen to perform feature selection as a stack: first, 17 features are selected from the 28 features by RFE; from those 17 features, the 11 best features are selected by LDA; and the model is then trained with each of the classifiers. A sketch of the full stack is given below.
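Under the stated counts, the full stack can be sketched as follows; `X28` (the 28 correlation-filtered columns), the logistic-regression estimator, and the final LDA component count are illustrative assumptions rather than the authors' exact configuration.

```python
# Hedged sketch of the 28 -> 23 -> 20 -> 16 -> 11 stack described above.
from sklearn.feature_selection import SelectKBest, chi2, RFE
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X23 = SelectKBest(chi2, k=23).fit_transform(X28, y)        # univariate (chi2 needs X >= 0)
X20 = RFE(LogisticRegression(max_iter=1000),
          n_features_to_select=20).fit_transform(X23, y)   # recursive elimination
X16 = PCA(n_components=16).fit_transform(X20)              # principal components
# Final LDA stage; scikit-learn caps LDA at n_classes - 1 components, so the
# paper's 11-feature output cannot be reproduced verbatim with this class and
# the count below is illustrative.
X_final = LinearDiscriminantAnalysis(n_components=4).fit_transform(X16, y)
```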
[Figure: obtained accuracy of each classifier with 17 selected features; groups on the x-axis: UFS, RFE, PCA, LDA.]

K-fold cross-validation is used to evaluate the models. K-fold cross-validation splits the training set into 10 folds (the default value of the parameter), trains the model on 9 folds, and then tests it on the last remaining fold. Moreover, because of the 10 different folds, it yields ten different accuracies, and it finally calculates the mean of these accuracies to obtain the final accuracy of the model. Table 3 gives a comparative analysis of the obtained accuracy with other existing techniques.
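A minimal sketch of this evaluation, reusing the cross_val_score() function named above; `X_final` (from the earlier stack sketch) and the untuned XGBClassifier are illustrative stand-ins for any of the five models.

```python
# Sketch of the 10-fold evaluation: ten per-fold accuracies, then their mean.
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

scores = cross_val_score(XGBClassifier(), X_final, y, cv=10)  # ten accuracies
print(scores.mean())  # mean accuracy, e.g. the reported 99.87% for XGBoost
```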
The XGBoost classifier fuses many tree models, each with low individual reliability, and through the regular iteration of the model it produces a substantially correct result with minimal false positives. Moreover, XGBoost can scale beyond billions of examples while using many fewer resources than existing methods. Additionally, it can be run out-of-core, which conserves the controller's memory resources (Chen et al. 2018; Pande et al. 2021a, b). The complete proposed architecture is depicted in Fig. 4. The obtained accuracy results using 17 and 11 features are depicted in Figs. 5 and 10, respectively. The accuracies of all classifiers can be seen in Fig. 6.

[Figure: obtained accuracy of each classifier with 11 selected features; groups on the x-axis: UFS, RFE, PCA, LDA, All as Stack, RFE-LDA as stack. Bold reflects the best-obtained results, achieved with the KNN and XGBoost methods.]
Table 3 Accuracy comparison with other techniques

| Author name | Classifiers used | Results (%) |
|---|---|---|
| You et al. | RNN | 92.7 |
| Alrawashdeh et al. | RBM | 97.9 |
| Zekri et al. | C4.5 | 98.8 |
| Muhammad Aamir et al. | KNN, SVM, RF | 99.66 |
| Li et al. | AutoEncoder + DBN | 92.10 |
| Gao et al. | DBN | 93.49 |
| Proposed technique | Stack-based approach | 99.87 |

Table 2 depicts the results obtained after applying the various combinations of feature selection techniques along with the various training models. Five different training algorithms were used, and their results were compared. Overall, it was observed that the KNN and XGBoost algorithms gave more promising results in all scenarios. In Table 3, the obtained accuracies are compared with other existing techniques.
[Figure: accuracies of all classifiers across the eleven feature-selection configurations; x-axis: 1: Correlation Matrix (28), 2: UFS (17), 3: RFE (17), 4: PCA (17), 5: LDA (17), 6: UFS (11), 7: RFE (11), 8: PCA (11), 9: LDA (11), 10: All as stack (11), 11: RFE-LDA as stack (11).]
Amiri F, Yousefi MR, Lucas C, Shakery A, Yazdani N (2011) Mutual information-based feature selection for intrusion detection systems. J Netw Comput Appl 34(4):1184–1199
Attia M, Senouci SM, Sedjelmaci H, Aglzim EH, Chrenko D (2018) An efficient intrusion detection system against cyber-physical attacks in the smart grid. Comput Electr Eng 68:499–512
Borkar GM, Patil LH, Dalgade D, Hutke A (2019) A novel clustering approach and adaptive SVM classifier for intrusion detection in WSN: a data mining concept. Sustain Comput Inform Syst 23:120–135
Calix RA, Sankara R (2013) Feature ranking and support vector machines classification analysis of the NSL-KDD intrusion detection corpus. In: International Florida artificial intelligence research society conference, pp 292–295
Chae HS, Choi SH (2014) Feature selection for efficient intrusion detection using attribute ratio. Int J Comput Commun 8:134–139
Chebrolu S, Abraham A, Thomas JP (2005) Feature deduction and ensemble design of intrusion detection systems. Comput Secur 24(4):295–307
Chen Y, Abraham A, Yang B (2006) Feature selection and classification using a flexible neural tree. Neurocomputing 70(1–3):305–313
Chen Z, Jiang F, Cheng Y, Gu X, Liu W, Peng J (2018) XGBoost classifier for DDoS attack detection and analysis in SDN-based cloud. In: 2018 IEEE international conference on big data and smart computing (BigComp). IEEE, pp 251–256
Cui J, Wang M, Luo Y, Zhong H (2019) DDoS detection and defense mechanism based on cognitive-inspired computing in SDN. Future Gener Comput Syst 97:275–283
Dayanandam G, Rao TV, Babu DB, Durga SN (2019) DDoS attacks—analysis and prevention. In: Innovations in computer science and engineering. Springer, Singapore, pp 1–10
Dong B, Wang X (2016) Comparison deep learning method to traditional methods using for network intrusion detection. In: 2016 8th IEEE international conference on communication software and networks (ICCSN). IEEE, pp 581–585
Doshi R, Apthorpe N, Feamster N (2018) Machine learning DDoS detection for the consumer internet of things devices. In: 2018 IEEE security and privacy workshops (SPW). IEEE, pp 29–35
Fotue D, Melakessou F, Labiod H, Engel T (2011) Mini-sink mobility with diversity-based routing in wireless sensor networks. In: Proceedings of the 8th ACM symposium on performance evaluation of wireless ad hoc, sensor, and ubiquitous networks, pp 9–16
Gao N, Gao L, Gao Q, Wang H (2014) An intrusion detection model based on deep belief networks. In: 2014 second international conference on advanced cloud and big data. IEEE, pp 247–252
Ghosh P, Debnath C, Metia D, Dutta R (2014) An efficient hybrid multilevel intrusion detection system in a cloud environment. IOSR J Comput Eng 16(4):16–26
Gündüz SY, Çeter MN (2018) Feature selection and comparison of classification algorithms for intrusion detection. Anadolu Univ J Sci Technol A Appl Sci Eng 19(1):206–218. https://doi.org/10.18038/aubtda.356705
Hariharan M, Abhishek HK, Prasad BG (2019) DDoS attack detection using C5.0 machine learning algorithm. IJ Wirel Microwave Technol 1:52–59
Hasan MAM, Nasser M, Ahmad S, Molla KI (2016) Feature selection for intrusion detection using random forest. J Inf Secur 7(3):129–140
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. Accessed 28 Oct 1999
Idhammad M, Afdel K, Belouch M (2018) Semi-supervised machine learning approach for DDoS detection. Appl Intell 48(10):3193–3208
Javaid A, Niyaz Q, Sun W, Alam M (2016) A deep learning approach for network intrusion detection systems. EAI Endorsed Trans Secur Saf 3(9):e2
Khamparia A, Pande S, Gupta D, Khanna A, Sangaiah AK (2020) Multi-level framework for anomaly detection in social networking. Library Hi Tech
Khasawneh AM, Kaiwartya O, Abualigah LM, Lloret J (2020) Green computing in underwater wireless sensor networks pressure-centric energy modeling. IEEE Syst J 14(4):4735–4745
Kumari S, Khan MK, Atiquzzaman M (2015) User authentication schemes for wireless sensor networks: a review. Ad Hoc Netw 27:159–194
Li Y, Ma R, Jiao R (2015) A hybrid malicious code detection method based on deep learning. Int J Secur Appl 9(5):205–216
Li C, Wu Y, Yuan X, Sun Z, Wang W, Li X, Gong L (2018) Detection and defense of DDoS attack based on deep learning in OpenFlow-based SDN. Int J Commun Syst 31(5):e3497
López J, Zhou J (eds) (2008) Wireless sensor network security, vol 1. IOS Press, Amsterdam
Madhavan MV, Pande S, Umekar P, Mahore T, Kalyankar D (2021) Comparative analysis of detection of email spam with the aid of machine learning approaches. In: IOP conference series: materials science and engineering, vol 1022, no 1. IOP Publishing, p 012113
Mallikarjunan KN, Bhuvaneshwaran A, Sundarakantham K, Shalinie SM (2019) DDAM: detecting DDoS attacks using a machine learning approach. In: Computational intelligence: theories, applications and future directions, vol I. Springer, Singapore, pp 261–273
Nkiama H, Said SZM, Saidu M (2016) A subset feature elimination mechanism for the intrusion detection system. Int J Adv Comput Sci Appl 7(4):148–157
Ozdemir S, Xiao Y (2009) Secure data aggregation in wireless sensor networks: a comprehensive overview. Comput Netw 53(12):2022–2037
Pande S, Gadicha AB (2015) Prevention mechanism on DDOS attacks by using multilevel filtering of distributed firewalls. Int J Recent Innov Trends Comput Commun 3(3):1005–1008
Pande SD, Khamparia A (2019) A review on detection of DDOS attack using machine learning and deep learning techniques. Think India J 22(16):2035–2043
Pande SD, Bhagat VB (2016) Hybrid wireless network approach for QoS. Int J Recent Innov Trends Comput Commun 4(4):327–332
Pande S, Khamparia A, Gupta D, Thanh DN (2021a) DDOS detection using machine learning technique. In: Recent studies on computational intelligence. Springer, Singapore, pp 59–68
Pande S, Khamparia A, Gupta D (2021b) An intrusion detection system for healthcare systems using machine and deep learning. World J Eng. https://doi.org/10.1108/WJE-04-2021-0204
Potluri S, Diedrich C (2016) Accelerated deep neural networks for the enhanced intrusion detection system. In: 2016 IEEE 21st international conference on emerging technologies and factory automation (ETFA). IEEE, pp 1–8
Safaldin M, Otair M, Abualigah L (2021) Improved binary gray wolf optimizer and SVM for intrusion detection system in wireless sensor networks. J Ambient Intell Humaniz Comput 12(2):1559–1576
Selvakumar B, Muneeswaran K (2019) Firefly algorithm-based feature selection for network intrusion detection. Comput Secur 81:148–155
Shah SAR, Issac B (2018) Performance comparison of intrusion detection systems and application of machine learning to Snort system. Future Gener Comput Syst 80:157–170
Tesfahun A, Bhaskari DL (2013) Intrusion detection using random forests classifier with SMOTE and feature reduction. In: 2013 international conference on cloud & ubiquitous computing & emerging technologies. IEEE, pp 127–132
Tripathy S, Nandi S (2008) Defense against outside attacks in wireless sensor networks. Comput Commun 31(4):818–826
Vinutha HP, Basavaraju P (2018) Analysis of feature selection and ensemble classifier methods for intrusion detection. Int J Natural Comput Res 7(1):57–72
You L, Li Y, Wang Y, Zhang J, Yang Y (2016) A deep learning-based RNNs model for an automatic security audit of short messages. In: 2016 16th international symposium on communications and information technologies (ISCIT). IEEE, pp 225–229
Zekri M, El Kafhali S, Aboutabit N, Saadi Y (2017) DDoS attack detection using machine learning techniques in cloud computing environments. In: 2017 3rd international conference of cloud computing technologies and applications (CloudTech). IEEE, pp 1–7. https://doi.org/10.1109/CloudTech.2017.8284731
Zhao R, Yan R, Chen Z, Mao K, Wang P, Gao RX (2019) Deep learning and its applications to machine health monitoring. Mech Syst Signal Process 115:213–237

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.