0% found this document useful (0 votes)

40 views

Feature Selection and Comparison of Classifcation Algorithms

Uploaded by

ankeeonline

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

40 views

Feature Selection and Comparison of Classifcation Algorithms

Uploaded by

ankeeonline

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 13

Journal of Ambient Intelligence and Humanized Computing (2023) 14:1977–1989

https://doi.org/10.1007/s12652-021-03411-6

ORIGINAL RESEARCH

Feature selection and comparison of classification algorithms

for wireless sensor networks
Sagar Pande1 · Aditya Khamparia2 · Deepak Gupta3

Received: 15 January 2021 / Accepted: 26 July 2021 / Published online: 3 August 2021
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021

Abstract
Wireless sensor networks (WSNs) are developing at an incredible pace because of their cost-effective solutions for applica-
tions like military and medical. WSN consists of a large number of nodes that have to suffer from constraints like limited
computation capacity and limited battery capacity. There are a lot of attacks in WSNs; one of them is the distributed denial
of service attack. Many studies have shown that decreasing the redundancy of relevant features from a dataset can make a
model more accurate and efficient. In this paper, correlation-based feature selection, principal component analysis, linear
discriminant analysis, recursive feature elimination, and univariate feature selection are used for feature selection. Results
are compared after selecting features using these techniques. A novel technique for feature selection is introduced, which
combines five feature selection techniques as a stack. After implementing the feature selection techniques, the model is
trained with five machine learning algorithms, namely SVM, perceptron, K-nearest neighbor, stochastic gradient descent,
and XGBoost. Finally, the model is evaluated with the help of K-fold cross-validation. Among all of the techniques best
accuracy of 99.87% is achieved with the XGBoost classifier after selecting the best eleven features from the KDD dataset.

Keywords Wireless sensor network (WSN) · Feature extraction · Recursive feature elimination (RFE) · Univariate feature
selection · XGBoost · Machine learning (ML)

1 Introduction Numerous sensors are used in a WSN to transmit the col-

lected information to a processing station remotely. They
Wireless sensor network (WSN) is a network of sensor nodes can detect temperature, sound, vibration, and weight. WSNs
that transmit sensed data across a wireless world (Fortue have become more common when the demand for wireless
et al. 2011; Ozdemir and Xiao 2009). A sensor is an object remote monitoring and control increases in different fields.
used to collect data on a physical object or the occurrence WSNs are widely used in many regions, including mili-
of an event. The sensor hub is comprised of detection, acti- tary applications, environmental disasters, wildlife surveil-
vation, and power segments where they are coordinated on lance, safety surveillance, and so on (Kumari et al. 2015).
a single or various sheet framing an installed framework. WSNs’ primary objective is to get meaningful data in a sta-
ble environment (Tripathy and Nandi 2008). Providing and
maintaining data protection is an important issue as there
* Sagar Pande are numerous challenges to the security of these networks.
[email protected] These attacks can be classified as distributed-denial-of-ser-
Aditya Khamparia vice attacks (DDoS), server compromise impersonation, and
[email protected] protocol-specific attacks.
Deepak Gupta The DDoS attack, among them, tends to hamper sensor
[email protected] communication by blocking one or more network devices
1 that perform routing functions. If meaningful data sharing
School of Computer Science and Engineering, Lovely
Professional University, Punjab, India between nodes is prevented, the network cannot function
2 following its intent. It is, therefore, important for WSNs
Babasaheb Bhimrao Ambedkar University, Lucknow, India
to accurately detect and evaluate DDoS attacks and take
3
Department of Computer Science Engineering, Maharaja pre-attack security measures (López et al. 2008). A novel
Agrasen Institute of Technology, New Delhi, India

13
Vol.:(0123456789)
1978 S. Pande et al.

technique with the most promising result has been imple- their proposed method. The accuracy obtained by using their
mented and evaluated in this work. In this paper, novel proposed system was better than the other existing feature
stacked-based feature selection is proposed which is helping selection techniques. Gündüz and Çeter (2018) used four
to extract the features for improving the detection accuracy. different classification algorithms and compared their result.
A DDoS attack can quickly exhaust the available Algorithms like support vector machines, multilayer per-
resources using various ways, such as flooding. The dan- ceptron network, fuzzy unordered rule induction algorithm
ger is considerably more when we talk about military and (FURIA), and decision trees were used in this study. They
medical applications. To exhibit a security model that will concluded that the best-first search algorithm (BFS) and
think about these limitations and give security is a major CFS are giving the best features for the model. They selected
challenge nowadays. the 11 best features from 41 features of the KDDCUP99
The contribution of this research paper is summarized as dataset and applied classification algorithms. FURIA gives
the best result. Abd-Eldayem et al. (2014) proposed HTTP
• The importance of feature selection techniques is dis- service-based IDS in their work. They focused on making
cussed and explored with five different techniques. high-performance IDS for HTTP services. They used the
• A stacked based novel approach for feature selection is NSL-KDD dataset and Naive Bayes Classifier to make their
proposed. model. The model gave 37% FN and 6.6% FP rate on train-
• Comparative analysis with different machine learning ing data and 24% FN and 4.6% FP on test data. Aburomman
algorithms is performed. et al. (2016) analyzed some hybrid and ensemble techniques.
They used ensemble techniques of both similar as well as
The remainder of the paper will be structured as fol- different forms. They also worked on ensemble based on
lows. The related work on our proposed plan is covered voting techniques because, generally, these algorithms give
in Sect. 2. Section 3 describes the methodology proposed, accurate results. Aljawarneh et al. (2018) proposed a method
which includes dataset description, preprocessing, feature with hybrid algorithms consisting of classifiers like Naive
selection, feature scaling, and model description. Section 4 Bayes, Random Tree, Meta Tagging, Decision Stump, Ada-
describes the experiment and the analysis of the results, BoostM1, Pegging, J48, and REPTree. They used the Vote
along with the result comparison. Section 5 describes the algorithm with information gain to filter the data on the
conclusion and future scope. NSL-KDD dataset. They have got 99.81% accuracy with
binary class and 98.56% accuracy with a multiclass dataset.
Amiri et al. (2011) proposed a framework with the least
2 Related work square support vector machine to increase the accuracy of
IDS. They examined the results given by the algorithm on
Many studies have shown that by using less redundant data- the features selected by using the linear correlation coeffi-
sets or by selecting relevant features from a dataset the accu- cient and also by using mutual information. Obtained results
racy and overall performance of the model can be enhanced. have shown that the model is efficient in detecting R2L and
Researchers have used different feature selection techniques U2R attacks. Attia et al. (2018) proposed a very efficient
to achieve this, and some of them have also used the hybridi- model to detect any misbehaving on the network. They dis-
zation of various techniques as well as feature ranking tech- played the difference between the existing mechanism and
niques to find the most important features from the dataset. their proposed mechanism of the IDS system. The false rate
Vinutha and Basavaraju (2018) suggested that there is no of their model is approximately 5%, and the detection rate
need to use all features from the dataset to make the model. of their model is 95%. Selvakumar and Muneeswaran (2019)
They tested their model by using information gain, Chi- analyzed the dataset with Mutual Information and Bayesian
square based attribute selection, symmetrical uncertainty Network-based classifier. They selected ten features using
feature selection technique, and gain ratio attribute evalua- a firefly nature-inspired algorithm. They have also used
tion and then finally applied Naive Bayes classifier to clas- the y graphs showing the difference between training and
sify the situation (Vinutha and Basavaraju 2018). Calix and testing time on all features. Borkar et al. (2019) used the
Sankara (2013) proposed that a highly accurate classifica- RRF algorithm to select the best features of the dataset and
tion model can be developed with the NSL-KDD dataset by developed a model with a support vector machine classifier.
using the Support Vector Machine classifier based upon the They utilized a high-level security mechanism to secure the
best-ranked features. They claimed that ranking features are transmission between packets and sensors. This model has
very effective for classification. Chae and Choi (2014) used given good performance in classifying the attacks. Chebrolu
gain ratio, information gain, and CFS for feature selection et al. (2005) examined the dataset with a combination of
which gave impressive results with decision tree classifiers. feature selection techniques. They selected features using
They got an accuracy of 99.79% with 22 features by using the Markov blanket model and decision tree. They trained

13
Feature selection and comparison of classification algorithms for wireless sensor networks 1979

their model with hybridization of Regression Trees, Clas- available datasets, long training and computing time, a lot
sification, and Bayesian networks. They claimed that the of preprocessing of datasets, etc. (Pande and Bhagat 2016;
accuracy of detecting Probe and DOS attacks is 100%, and Madhavan et al. 2021; Khasawneh et al. 2020; Abualigah
U2R and R2L are 99.47% respectively. Chen et al. (2006) et al. 2021; Safaldin et al. 2021; Abualigah et al. 2021). Still,
used a Flexible Neural Tree model for breast cancer clas- there are lots of areas that need to be focused upon. In this
sification and intrusion detection systems. They claimed work, a novel feature selection technique is proposed, which
that their model gave fair detection accuracy with 4, 12, will execute different selection techniques one after another
8, 10 features. They focused on minimizing the number of as a stack and will reduce the time required to classify the
features and got the best accuracy for their dataset. Javaid model as the number of features obtained after applying fea-
et al. (2016) implemented self-taught learning (STL) by ture selection techniques will be less.
using deep learning techniques. The method combines
sparse auto-encoder along with softmax regression. For the
experimental study, the NSL-KDD dataset was used in their 3 Methodology
work, and the binary classification results were obtained for
the f-score. Potluri and Diedrich (2016) implemented a deep 3.1 Dataset description
learning approach based on the DNN method. They filter out
41 features, amongst which they have used only 27 features The selection of important features from the dataset plays
for the experimentation. They have got mixed results. You a crucial role in any ML model’s success. For the imple-
et al. (2016) used a deep learning approach along with the mentation purpose, the most popular KDDCup99 dataset
RNN technique. Experimentation was done on the famous is used for intrusion classification. This dataset contains
dataset. Comparative analysis was done on SVM and Naïve forty-one different features including content type, basic
based methods. They claimed to obtained promising results type, and traffic type features. It was developed based upon
with an accuracy of 92.7%. Alrawashdeh and Purdy (2016) the DARPA’98 IDS evaluation program.
implemented deep learning-based upon unsupervised fea- All the records in KDD dataset are categorized into two
ture reduction. The logistic regression classifier was used for ways:
experimentation. The popular dataset KDD Cup 99 was used
in this work, and the results obtained had a detection rate of 1. Normal intrusion
97.90%. Dong and Wang (2016) used deep learning meth- 2. Kind of intrusion
ods along with the traffic anomaly technique. They tried to
address the problems associated with the dataset. Also, they There are four categories of intrusion in the dataset (Hari-
claimed to obtained promising results. Zhao et al. (2019) haran et al. 2019; Nkiama et al. 2016); ; :
implemented deep learning methods for the health monitor-
ing machine. Four deep learning methods were compared • Denial-of-service attack (DOS) DOS is the oldest form
and analyzed. The experiment was carried out on a famous of cyber extortion attack. Basically, in this attack, the
dataset obtaining good results. Alkasassbeh et al. (2016) attacker makes the server very busy, and because of that
designed a framework based on the Multilayer Perceptron machine or software, it denies access to the genuine user.
technique (MLP). The tests were carried out for SIDDoS • Remote to local attack (R2L) In a specific version of
and HTTP flood-based attacks. The obtained results were ncftp, the prominent FTP client, exploits a bug. The
claimed to be around 98%. Tesfahun and Bhaskari (2013) directory includes a directory with a very long name;
designed a technique based on the Oversampling technique. one or more commands contained in the name are then
The popular NSL-KDD dataset was used for the experiment, executed (unintentionally) by the FTP client with the
and the selection of features was made based on informa- user’s permission. Example: Passwords Guessing etc.
tion gain. Machine learning Random Forest algorithm was • Unauthorized access to local superuser (root) privileges
used for experimentation while performing classification. (U2R) It gets access to the root of the system and per-
Table 1 provides a comparison between existing approaches. forms various attacks and unauthorized attempts. Exam-
In Table 1 research work and state of art performed by the ple: various Buffer overflow attacks, Perl, rootkit, etc.
various researcher are discussed comparatively. The param- • Probe This assault is an endeavor to assemble data
eters considered for comparison are dataset, strength, limi- about a system of PCs for the obvious motivation behind
tation, etc. evading its security controls. For example, while send-
Even though many methodologies have been designed ing an empty message just to see if there is a destina-
to detect DDOS attacks with high accuracy, still there are tion. Ping is a common tool to send such a sample. Exam-
many areas where improvements can be made. These areas ple—port scanning, SATAN, SAINT, portweep, etc.
may include dependency on human operators, lack of freely

13
Table 1 Comparison of various existing methodologies
1980

Authors Year Methods used Methodology Dataset Strength Limitations

13
Potluri et al. (IEEE) 2016 Deep neural network (DNN) DNN NSL-KDD The evaluation was done on Performance metrics were not
a different processor discussed
You et al. (IEEE) 2016 Recurrent neural networks Automatic security auditing Collected short messages Improvement has been done Obtained accuracy is less
(RNN) tool for short messages over the SVM algorithm
(SMS)
Alrawashdeh et al. (IEEE) 2016 Restricted Boltzmann RBM with 2 hidden layers KDDCUP’99 The classification was done Lack of feature reduction
machine (RBM) using 5 classes
Dong et al. (IEEE) 2016 Deep learning methods Synthetic minority over-sam- KDD-99 Problems related to Only precision results are
pling technique (SMOTE) imbalanced datasets are compared and discussed
overcome by using the
oversampling method
Zekri et al. (IEEE) 2017 C.4.5 algorithm Wireshark tool was used to Hping3 generated attack- The approach was focused on Considered only signature-
monitor on virtual box based data was used the flooding-based attack based attacks
Shah et al. (future generation 2017 Snort adaptive plug-in Different learning algorithms NSA Snort IDS, DARPA The performance was com- Obtained simulation result
computer systems) IDS, NSL-KDD IDS pared on two IDSs gives less accuracy
Idhammad et al. (Springer 2018 Semi-supervised ML Applied entropy estimation, NSL-KDD, UNB ISCX 12 A time-based sliding window Both supervised and unsu-
Nature) approach information gain ratio, co- and UNSW-NB15 was used for entropy pervised techniques are
clustering used which requires more
computation time
Doshi et al. (IEEE) 2018 Packet-level machine learn- 5 ML algorithms were used Simulated self-generated Tested and compared results Limited features were selected
ing dataset with 5 ML classifiers KN, specifically to avoid over-
LSVM, DT, RF, NN head
Hariharan et al. (modern 2019 C 5.0 decision tree (DT) Offline detection model Simulated dataset obtained The accuracy obtained was The model is not proactive
education and computer using VMware significantly high
science press)
Aamir et al. (Elsevier) 2019 Clustering-based semi- The clustering approach CICIDS2017 Various ML algorithms Less number of features were
supervised machine are used in a diversified considered
learning manner
Dayananda et al. (Springer 2019 Access control list (ACL) Firewall Self-simulated Network and application- Cannot handle more frequent
Nature) level defense mechanism traffic due to dependency on
the firewall
Mallikarjunan et al. 2019 Naive Bayes Classification based upon Self-simulating dataset Gives promising results com- If features are increased,
(Springer Nature) NB pared to J 48 and RF accuracy will be reduced
drastically
Khamparia et al. (Library 2020 Hybrid technique Anomaly-based approach Orkut, Twitter, etc. Combination of statisti- No feature selection technique
Hi-Tech) cal and semi-supervised was applied
learning
Pande et. al. (Springer 2021 Random Forest Weka tool NSL-KDD Accuracy around 99% Used only partial dataset
Nature)
Pande et al. (World Journal 2021 Artificial neural network Deep learning NSL-KDD Accuracy around 99.99% Computation time required to
of Engineering) (ANN) train the model is more
S. Pande et al.
Feature selection and comparison of classification algorithms for wireless sensor networks 1981

3.2 Preprocessing

In this section, the raw dataset goes through a cleaning

process to make it in a format that can be used to train a
machine learning model. While making an efficient model
dataset must be in a proper format. In the KDD dataset,
some features are of an object type. For training a model,
the dataset should not contain any object-type data. Some
of the features in the KDD dataset are of an object type, so
these features must be converted to float type. A class named
LabelEncoder was used for this purpose. It encodes the fea-
tures from object data type to float data type.

3.3 Feature selection

In this process, the less redundant or more relevant features

from a dataset are selected. It is essential to select the rel-
evant features because they can affect the performance and
accuracy of the model [36].
Decreasing redundancy and selecting more suitable fea- Fig. 1 Stacked based feature selection
tures helps to:

• Decreases size of the dataset For feature selection, five different techniques have been
• Decreases the risk of overfitting used namely correlation-based feature selection, linear
• Decreases the misleading of the data discriminant analysis (LDA), univariate feature selection,
• Decreases the time of training the model recursive feature elimination (RFE), principle component
• Improves the accuracy of the model analysis (PCA). All the feature selection techniques are also
used as a stack. Initially, the Correlation-based feature selec-
Redundancy can be decreased by dropping irrelevant or tion technique is applied to the dataset. The availability of
partially relevant features from the dataset (Khamparia et al. a Correlated feature in a dataset can decrease the perfor-
2020; Ghosh et al. 2014; Hasan et al. 2016). It is the most mance of the model, and also it can affect the accuracy of
important step for almost every framework which uses a the model, so these features need to be dropped from the
dataset having high redundancy or having a large number of dataset. To make a correlation matrix Pearson’s correlation
columns. Because training the model on irrelevant features coefficient (PCC) is used.
may negatively affect the model’s accuracy (Pande et al. Pearson’s correlation If the covariance of two values is
2021a, b; Pande and Gadicha 2015; Pande and Khamparia divided and divided value is then multiplied with the stand-
2019; Madhavan et al. 2021). Figure 1 depicts the stacked ard deviation of each value, and then the output obtained is
based feature selection techniques used in the proposed the PCC.
model.
covariance(x, y)
Pseudocode of the proposed technique (using all the fea- Coefficient = (1)
(stdv(x) ∗ stdv(y))
ture selection algorithms as a stack)
Input: Set of 41 features from KDD’99 Cup dataset where x is data and y is a random variable.
Output: Best Selected Features Subset By using these coefficients, the relationship between the
1: Select the features having a value greater than or equal features can be understood. The value of the coefficients
to 0.7 and less than or equal to-0.7 and apply Correlation- always ranges from − 1 to 1. In the KDD dataset, the values
based feature selection. between − 0.5 to + 0.5 ranges have shown a significant cor-
2: Calculate Pearson’s Correlation Coefficient using relation. By measuring this relationship and putting those
equation 1. values in a matrix between each pair of values available
3: Select the features subset which satisfies the threshold. in the dataset will form a symmetric matrix. Based on this
4: Repeatedly apply the feature selection from the stack observation, correlating features are dropped. Features hav-
of univariate, RFE, PCA, LDA on the features obtained from ing their value greater than or equal to 0.7 and less than or
step3. equal to − 0.7 are dropped. The number of features before

13
1982 S. Pande et al.

dropping the correlated features was 41, and after dropping,

28 features are left. Figure 2 depicts the correlation matrix
of the KDD dataset.
Univariate feature selection Feature, which has the most
stable relationship between the label features of the dataset,
can be called an important feature. These features can have
a huge impact on the accuracy of the model. For the selec-
tion of these features, statistical approaches can be applied.
Different statistical approaches can be implemented with the
help of a class named SelectKBest. In this framework, the
Chi2 (Chi-squared) test has been implemented for selecting
hi2 score technique is implemented on the
the best features. C
dataset with f features and c classes by using the following
Fig. 3 Features selected using univariate feature selection technique
formula:

∑f ∑c (sij − 𝝁ij )2 the 41 features of the KDD dataset when the Univariate tech-
𝝌2 = (2)
i=1 j=1 𝝁ij nique is implemented separately.
Recursive feature elimination (RFE) The name of this
where sij is the ith value of the feature along with the technique is self-explanatory. It works in a loop and removes
instances. And a few features in each loop. If a dataset has high co-linearity
s∗j si∗ and dependencies, then RFE can be the best algorithm to
μij = , (3) eliminate these features. RFE first evaluates the importance
s
of the features and then ranked them accordingly. Elimina-
where si∗ is the ith value of the specific feature, s∗j is the tion of the weakest features is performed in this step. RFE
number of instances in class j, and s is the number of class mainly takes two arguments, the classifier and the num-
instances. Figure 3 depicts the 13 selected features among ber of features to be selected. A logistic regression classifier
is used in the proposed framework as it takes comparatively

Fig. 2 Plot of correlation matrix

13
Feature selection and comparison of classification algorithms for wireless sensor networks 1983

less time for training the model. RFE trains the model 3.4 Feature scaling
using the classifier provided and calculates the accuracy by
eliminating the unwanted features. RFE takes more time to Feature scaling is an essential step for making an accurate
compare to univariate feature selection because it trains the machine learning model because the feature can contain
model until the end of the loop. large values, and the model needs to be trained on these
Linear discriminant analysis (LDA) LDA is a widely used values, which can weigh more on the final result. Feature
feature selection technique. This technique mainly removes scaling is done by using the normalization technique (NT).
the redundant and dependent features from the dataset. It is a process of converting the data points into 0–1 range
Mainly, it consists of three main stages. In the first stage, it without disturbing the relation between them. For better
calculates the difference between the averages of different accuracy, this step plays a crucial role and hence needs to
classes. This difference is known as between class variance. be implemented before training the model.
In the second stage, it calculates the difference between the
average and the sample values of each class. This differ- 3.5 Model description
ence is known as within class variance. In the third stage, it
selects the features that have greater between class variance The preprocessed dataset is split into training and test-
and less within class variance. LDA is also broadly used ing sets. Five different classifiers are used for training the
in the field of bioinformatics and chemistry. Consider that model—SVM, Perceptron, K-Nearest Neighbor, Stochastic
the probability density function of x with mean vector μi Gradient Descent, and XGBoost.
and variance–covariance matrix (same for all populations) Perceptron The perceptron algorithm works like a neural
is multivariate normal in population πi. For this scenario, cell present in our body. It accepts the training data as a
the normal function of probability density is calculated as node. It consists of 2 parameters weights and biases and a
given below: function called the activation function. It runs several times
� � in a loop, and every time it changes the value of weights and
1 1
P(X∕𝝅i) = ∏ p∕2 ∑ exp − (X − 𝝁i)� (X − 𝝁i) biases to minimize the loss between predicted and actual
(2 ) � �1∕2 2
value, and the activation function decides that the particular
(4) neuron should be fired for the output layer or not. The below
Principle component analysis (PCA) PCA is widely used formula is used for calculating the activation function.
in unsupervised learning. The approach of this technique
is very simple, but it can make a fair difference between Activation = sum(weight[i] ∗ z[i]) + Bias (7)
the accuracy of the model trained when applied to all the [Where i is the index number]
features. Initially, it calculates the covariance of data points The prediction will be equal to one of the activation value
and arranged them in a matrix form. Further, it calculates is greater than or equal to zero, and it will be zero if the
the Eigen Vector and Eigen Value of that matrix. Then, it activation value is less than zero.
arranges all Eigen Vectors according to their Eigen Values Stochastic gradient descent Minimization of a function
in descending order. Furthermore, it selects the most promis- by checking the gradients of loss functions can be obtained
ing features for training the model. It converts the original by using the Gradient Descent Algorithm. After checking,
dataset into the selected number of Eigen Vectors. PCA is it updates the weights of the function. To minimize the error
also used in the field of medical science and chemistry. It is by updating the weights, this algorithm is beneficial. These
also used to reduce the distortion from a graph. The below algorithms are also called optimization algorithms. The
formula describes the covariance computation of X and Y. learning rate has to be given as an argument to the classifier
Where X and Y are matrices of m, and P is a linear transfor- to make the changes accordingly. The default value of the
mation. X is the original data set point, and Y represents the learning rate in the classifier is 0.01. The formula to update
Re-deployment dataset. the weights is given below:
PX = Y (5) Weight = weight + learn_rate ∗ (expected − predicted)∗a
(8)
n
1 ∑( )( ) where a is the input variable.
cov(X, Y) = Xi − x Yi − y (6)
n − 1 i=0 K-nearest neighbor This algorithm finds the nearest
neighbors and differentiates them in a class. It comes under
the category of the supervised algorithm. It identifies the
closest neighbor by using the Euclidean distance formula.
The implementation of this technique is simple to under-
stand. Initially, separate the dataset in training and testing

13
1984 S. Pande et al.

set, and choose important features that are required to select XGBoost: Extreme Gradient Boosting (XGBoost) is a
from the training data. Then, find the distance between all widely used algorithm for the Classification of large datasets
the points by using the Euclidean distance formula and store with a minimal amount of time. There are many advantages
it in a list. Further, the sort that lists and selects the first n of this algorithm which causes its popularity these days. It
values (number of features needed) from the dataset and then performs parallel computing due to which users get their
allocates a class to test the points based upon the majority of results faster. XGBoost classifier outperformed and accom-
classes available in the points that have been chosen. Follow- plished the most noteworthy results when compared with
ing are the various distance measuring techniques possible different classifiers. Parameters of the classifier can also be
in KNN along with their standard formulas: tuned to enhance the results.
√
∑f F() = L(�) + 𝛀(�) (12)
Euclidean distance function (Xi − Y i )2 (9)
i=1
L(�) and 𝛀(�), ∅ refers to the different parameters in the
above equation.
∑f
Manhattan distance function | X − Y i || (10) Where L(�) is a differentiable convex loss function, and
i=1 | i
𝛀(�), is a regularized term that penalizes complex models
1
(Zekri et al. 2017).
∑f The sequence of all the feature selection techniques is
Minkowski distance function( | | q q
( X − Y i |) ) (11)
i=1 | i very important because selecting the features with the dif-
where X and Y are two distinct points and f is the number ferent combinations is affecting the accuracy of the models.
of instance points. Also, these feature selection techniques are arranged in such

Fig. 4 Model training and validation

13
Feature selection and comparison of classification algorithms for wireless sensor networks 1985

a manner that the computation time of the model can be of the classifiers. Finally, all of these techniques are used
optimized. If we change the sequence of the feature selec- as a stack i.e., first, 23 features are selected from the 28
tion algorithms then the computation time will be increased features by using univariate feature selection, and then from
gradually. those 23 features, 20 best features are selected by RFE and
from those 20 features, 16 features are selected by principal
component analysis (PCA) and from those 16 features, 11
4 Experiment and result analysis features are selected by LDA and then the model is trained
with each classifier on these 11 features. At last, two feature
Each classifier used for the model is trained eleven times by selection techniques, RFE, and LDA are chosen to perform
using a different number of features. Furthermore, the entire feature scaling as a stack. The first 17 features from the 28
evaluated model is then validated by using K-fold cross-val- features are selected by RFE, and from those 17 features,
idation. It is the most reliable validation technique because 11 best features are selected by LDA and then the trained
the results generated by this technique will always be less model with each of the classifiers is implemented. K-fold
biased. The function cross_val_score() is used with 10 folds cross validation is used to evaluate the models. K-fold cross-
for K-fold cross-validation. The experiment is performed on validation splits the training set into 10 folds (default value
a system having the configuration of 8 GB of RAM, Intel of the parameter), and it trains the model on 9 folds and then
i5 8th generation quad-core processor with 1.6 GHz clock tests with the last remaining fold. Moreover, because of 10
speed. different folds, it gives ten different accuracies, and finally,
After performing Correlations-based feature selection it calculates the mean of the accuracies to get the final accu-
on the dataset, 28 features are identified, and the model is racy of the model. Table 3 gives a comparative analysis of
trained with each classifier on these 28 features. Further, 17 the obtained accuracy with other existing techniques.
features are selected from these 28 features by using four XGBoost classifier is a classifier that fuses plenty of tree
feature scaling techniques, and then the model is trained versions with lesser distinction reliability, and it creates a
with each of the classifiers. Later, 11 features are selected substantial correct and minimal False Positive item through
from the 28 features by using four feature scaling tech- the regular iteration of the model. However, XGBoost could
niques, and training of the model is carried out with each scale up beyond billions of good examples while using much

105.00%
100.00%
95.00%
90.00%
85.00%
80.00%
75.00%
70.00%
65.00%
60.00%
55.00%
50.00%
45.00%
40.00%
35.00%
30.00%
25.00%
20.00%
15.00%
10.00%
5.00%
0.00%
UFS RFE PCA LDA

SVM Perceptron KNN SGD XGBoost

Fig. 5 Accuracies with 17 features

13
1986 S. Pande et al.

100.00%
95.00%
90.00%
85.00%
80.00%
75.00%
70.00%
65.00%
60.00%
55.00%
50.00%
45.00%
40.00%
35.00%
30.00%
25.00%
20.00%
15.00%
10.00%
5.00%
0.00%
UFS RFE PCA LDA All as Stack RFE-LDA as stack

SVM Perceptron KNN SGB XGBoost

Fig. 6 Accuracies with 11 features

Table 2 Accuracies using K-fold cross-validation

Classifier used No. of features SVM (%) Perceptron (%) KNN (%) SGD (%) XGBoost (%)
Method used

After dropping correlations 28 77.82 25.44 99.82 22.32 99.96

By using univariaite feature selection 17 63.77 25.41 99.81 26.79 99.93
By using RFE 17 86.25 28.43 99.87 29.34 99.94
By using PCA 17 96.5 23.33 99.82 12.74 99.94
By using LDA 17 99.18 98.31 99.79 98.87 99.86
By using univariate feature selection 11 91.35 27.44 99.79 23.57 99.91
By using RFE 11 98.82 87.54 99.11 88.84 99.43
By using PCA 11 95.93 23.33 99.82 7.34 99.9
By using LDA 11 99.16 98.58 99.79 98.48 99.84
By using all the selection algorithms as a stack 11 99.13 98.62 99.86 98.63 99.87
By using RFE and LDA as a stack 11 99.16 98.59 99.81 98.65 99.82

Bold reflects the best-obtained results using KNN and XGBoost Method

Table 3 Accuracy comparison with other techniques fewer resources than existing methods. Additionally, it can
be estimated on the out-of-core, which will conserve the
Author name Classifiers used Results (%)
controller’s memory resources (Chen et al. 2018; Pande et al.
You et al. RNN 92.7 2021a, b). The complete proposed architecture is depicted in
Alrawashdeh et al. RBM 97.9 Fig. 4. Obtained accuracy results using 17 and 11 features
Zekri et al. C4.5 98.8 are depicted in Figs. 5 and 10, respectively. Accuracies of
Muhammad Aamir et al. KNN, SVM, RF 99.66 all classifiers can be seen in Fig. 6.
Li et al. AutoEncoder + DBN 92.10 Table 2 depicts the results obtained after applying various
Gao et al. DBN 93.49 combinations of feature selection techniques along with the
Proposed technique Stack based approach 99.87 various training models. Five different training algorithms
were used and their results were compared. Overall, it was
observed that KNN and XGBoost algorithm was giving
more promising results in all the scenarios. In Table 3, the

13
Feature selection and comparison of classification algorithms for wireless sensor networks 1987

SVM Perceptron KNN SGB XGBoost

110.00%
105.00%
100.00%
95.00%
90.00%
85.00%
80.00%
75.00%
70.00%
65.00%
60.00%
55.00%
50.00%
45.00%
40.00%
35.00%
30.00%
25.00%
20.00%
15.00%
10.00%
5.00%
0.00%
1 2 3 4 5 6 7 8 9 10 11

1: Correlaon Matrix (28), 2: UFS(17), 3: RFE(17), 4: PCA(17), 5:LDA(17), 6: UFS(11), 7: RFE(11), 8: PCA(11), 9: LDA(11), 10: All as stack(11), 11: RFE-LDA as stack(11)

Fig. 7 Accuracies of classifiers by using different feature selection techniques

proposed technique was compared with existing methodolo- References

gies (Fig. 7).
Aamir M, Zaidi SMA (2019) Clustering-based semi-supervised
machine learning for DDoS attack classification. J King Saud
Univ Comput Inf Sci 33(4):436–446. https://doi.org/10.1016/j.
5 Conclusion jksuci.2019.02.003
Abd-Eldayem MM (2014) A proposed HTTP service based IDS. Egypt
The proposed model is trained several times with the dif- Info J 15(1):13–24. https://doi.org/10.1016/j.eij.2014.01.001
ferent number of features selected by implementing various Abdeldayem EH, Ibrahim AS, Ahmed AM, Genedi ES, Tantawy WH
(2015) Positive remodeling index by MSCT coronary angiogra-
algorithms, and their results are compared. To obtain more phy: a prognostic factor for early detection of plaque rupture and
reliable results, the model is tested using the K-fold cross- vulnerability. Egypt J Radiol Nucl Med 46(1):13–24
validation technique. The highest accuracy of 99.96% is Abualigah L, Diabat A (2021) Advances in sine cosine algorithm: a
achieved by using XGBoost for the best 28 features selected comprehensive survey. Artif Intell Rev 54:2567–2608. https://d oi.
org/10.1007/s10462-020-09909-3
after applying the correlations-based feature selection Abualigah L, Diabat A, Mirjalili S, Abd Elaziz M, Gandomi AH (2021)
method. Along with this, 99.87% accuracy is obtained with The arithmetic optimization algorithm. Comput Methods Appl
11 features when it is implemented as a stack. Although, Mech Eng 376:113609
according to K-fold cross-validation, XGBoost is giving Aburomman AA, Reaz MBI (2017) A survey of intrusion detection
systems based on ensemble and hybrid classifiers. Comput Secur
more than 99.8% accuracy in almost all 11 cases. Obtained 65:135–152
results also signify that better accuracy is achieved when Aljawarneh S, Aldwairi M, Yassein MB (2018) Anomaly-based intru-
RFE and LDA are used as a stack along with the correlation sion detection system through feature selection analysis and build-
technique. KNN and XGBoost both are giving fair efficien- ing a hybrid efficient model. J Comput Sci 25:152–160
Alkasassbeh M, Al-Naymat G, Hassanat A, Almseidin M (2016)
cies, but the overall performance of XGBoost is better than Detecting distributed denial of service attacks using data mining
KNN. Computation time is one of the major concerns in a techniques. Int J Adv Comput Sci Appl 7(1):436–445
large datasets like KDD and NSL-KDD. To overcome this Alrawashdeh K, Purdy C (2016) Toward an online anomaly intrusion
issue, nature inspired algorithm can be used for optimiza- detection system based on deep learning. In: 2016 15th IEEE
international conference on machine learning and applications
tion, and training, either ensemble technique or deep learn- (ICMLA). IEEE, pp 195–200
ing algorithms can be used.

13
1988 S. Pande et al.

Amiri F, Yousefi MR, Lucas C, Shakery A, Yazdani N (2011) Mutual Khamparia A, Pande S, Gupta D, Khanna A, Sangaiah AK (2020)
information-based feature selection for intrusion detection sys- Multi-level framework for anomaly detection in social network-
tems. J Netw Comput Appl 34(4):1184–1199 ing. Library Hi-Tech
Attia M, Senouci SM, Sedjelmaci H, Aglzim EH, Chrenko D (2018) An Khasawneh AM, Kaiwartya O, Abualigah LM, Lloret J (2020) Green
efficient intrusion detection system against cyber-physical attacks computing in underwater wireless sensor networks pressure-cen-
in the smart grid. Comput Electr Eng 68:499–512 tric energy modeling. IEEE Syst J 14(4):4735–4745
Borkar GM, Patil LH, Dalgade D, Hutke A (2019) A novel cluster- Kumari S, Khan MK, Atiquzzaman M (2015) User authentication
ing approach and adaptive SVM classifier for intrusion detection schemes for wireless sensor networks: a review. Ad Hoc Netw
in WSN: a data mining concept. Sustain Comput Inform Syst 27:159–194
23:120–135 Li Y, Ma R, Jiao R (2015) A hybrid malicious code detection method
Calix RA, Sankara R (2013) Feature ranking and support vector based on deep learning. Int J Secur Appl 9(5):205–216
machines classification analysis of the NSL-KDD intrusion detec- Li C, Wu Y, Yuan X, Sun Z, Wang W, Li X, Gong L (2018) Detection
tion corpus. In: International florida artificial intelligence research and defense of DDoS attack–based on deep learning in OpenFlow-
society conference, pp 292–295 based SDN. Int J Commun Syst 31(5):e3497
Chae HS, Choi SH (2014) Feature selection for efficient intrusion López J, Zhou J (eds) (2008) Wireless sensor network security, vol 1.
detection using attribute ratio. Int J Comput Commun 8:134–139 Ios Press, Amsterdam
Chebrolu S, Abraham A, Thomas JP (2005) Feature deduction and Madhavan MV, Pande S, Umekar P, Mahore T, Kalyankar D (2021)
ensemble design of intrusion detection systems. Comput Secur Comparative analysis of detection of email spam with the aid of
24(4):295–307 machine learning approaches. In: IOP conference series: materi-
Chen Y, Abraham A, Yang B (2006) Feature selection and classification als science and engineering, vol 1022, no 1. IOP Publishing, pp
using a flexible neural tree. Neurocomputing 70(1–3):305–313 012113
Chen Z, Jiang F, Cheng Y, Gu X, Liu W, Peng J (2018) XGBoost clas- Mallikarjunan KN, Bhuvaneshwaran A, Sundarakantham K, Shalinie
sifier for DDoS attack detection and analysis in SDN-based cloud. SM (2019) DDAM: detecting DDoS attacks using a machine
In: 2018 IEEE international conference on big data and smart learning approach. In: Computational intelligence: theories,
computing (big comp). IEEE, pp 251–256 applications and future directions, vol I. Springer, Singapore, pp
Cui J, Wang M, Luo Y, Zhong H (2019) DDoS detection and defense 261–273
mechanism based on cognitive-inspired computing in SDN. Nkiama H, Said SZM, Saidu M (2016) A subset feature elimination
Future Gener Comput Syst 97:275–283 mechanism for the intrusion detection system. Int J Adv Comput
Dayanandam G, Rao TV, Babu DB, Durga SN (2019) DDoS attacks— Sci Appl 7(4):148–157
analysis and prevention. In: Innovations in computer science and Ozdemir S, Xiao Y (2009) Secure data aggregation in wireless
engineering. Springer, Singapore, pp 1–10 sensor networks: A comprehensive overview. Comput Netw
Dong B, Wang X (2016) Comparison deep learning method to tradi- 53(12):2022–2037
tional methods using for network intrusion detection. In: 2016 8th Pande S, Gadicha AB (2015) Prevention mechanism on DDOS attacks
IEEE international conference on communication software and by using multilevel filtering of distributed firewalls. International
networks (ICCSN). IEEE, pp 581–585 Journal on Recent and Innovation Trends in Computing and Com-
Doshi R, Apthorpe N, Feamster N (2018) Machine learning DDoS munication 3(3):1005–1008
detection for the consumer internet of things devices. In: 2018 Pande SD, Khamparia A (2019) A review on detection of DDOS attack
IEEE security and privacy workshops (SPW). IEEE, pp 29–35 using machine learning and deep learning techniques. Think India
Fotue D, Melakessou F, Labiod H, Engel T (2011) Mini-sink mobility J 22(16):2035–2043
with diversity-based routing in wireless sensor networks. In: Pro- Pande SD, Bhagat VB (2016) Hybrid wireless network approach for
ceedings of the 8th ACM symposium on performance evaluation QoS. Int J Recent Innov Trends Comput Commun 4(4):327–332
of wireless ad hoc, sensor, and ubiquitous networks, pp 9–16 Pande S, Khamparia A, Gupta D, Thanh DN (2021a) DDOS detection
Gao N, Gao L, Gao Q, Wang H (2014) An intrusion detection model using machine learning technique. In: Recent studies on compu-
based on deep belief networks. In: 2014 second international con- tational intelligence. Springer, Singapore, pp 59–68
ference on advanced cloud and big data. IEEE, pp 247–252 Pande S, Khamparia A, Gupta D (2021b) An intrusion detection system
Ghosh P, Debnath C, Metia D, Dutta R (2014) An efficient hybrid mul- for healthcare systems using machine and deep learning. World J
tilevel intrusion detection system in a cloud environment. IOSR J Eng. https://doi.org/10.1108/WJE-04-2021-0204
Comput Eng 16(4):16–26 Potluri S, Diedrich C (2016) Accelerated deep neural networks for the
Gündüz SY, Çeter MN (2018) Feature selection and comparison of enhanced intrusion detection system. In: 2016 IEEE 21st interna-
classification algorithms for intrusion detection. Anadolu Univ tional conference on emerging technologies and factory automa-
J Sci Technol A Appl Sci Eng 19(1):206–218. https://doi.org/10. tion (ETFA). IEEE, pp 1–8
18038/aubtda.356705 Safaldin M, Otair M, Abualigah L (2021) Improved binary gray wolf
Hariharan M, Abhishek HK, Prasad BG (2019) DDoS attack detec- optimizer and SVM for intrusion detection system in wireless sen-
tion using C5.0 machine learning algorithm. IJ Wirel Microwave sor networks. J Ambient Intell Humaniz Comput 12(2):1559–1576
Technol 1:52–59 Selvakumar B, Muneeswaran K (2019) Firefly algorithm-based fea-
Hasan MAM, Nasser M, Ahmad S, Molla KI (2016) Feature selec- ture selection for network intrusion detection. Comput Secur
tion for intrusion detection using random forest. J Inf Secur 81:148–155
7(3):129–140 Shah SAR, Issac B (2018) Performance comparison of intrusion detec-
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. Accessed tion systems and application of machine learning to Snort system.
28 Oct 1999 Future Gener Comput Syst 80:157–170
Idhammad M, Afdel K, Belouch M (2018) Semi-supervised Tesfahun A, Bhaskari DL (2013) Intrusion detection using random
machine learning approach for DDoS detection. Appl Intell forests classifier with SMOTE and feature reduction. In: 2013
48(10):3193–3208 international conference on cloud & ubiquitous computing &
Javaid A, Niyaz Q, Sun W, Alam M (2016) A deep learning approach emerging technologies. IEEE, pp 127–132
for network intrusion detection systems. EAI Endorsed Trans Tripathy S, Nandi S (2008) Defense against outside attacks in wireless
Secur Saf 3(9):e2 sensor networks. Comput Commun 31(4):818–826

13
Feature selection and comparison of classification algorithms for wireless sensor networks 1989

Vinutha HP, Basavaraju P (2018) Analysis of feature selection and Zhao R, Yan R, Chen Z, Mao K, Wang P, Gao RX (2019) Deep learn-
ensemble classifier methods for intrusion detection. Int J Natural ing and its applications to machine health monitoring. Mech Syst
Comput Res 7(1):57–72 Signal Process 115:213–237
You L, Li Y, Wang Y, Zhang J, Yang Y (2016) A deep learning-based
RNNs model for an automatic security audit of short messages. Publisher’s Note Springer Nature remains neutral with regard to
In: 2016 16th international symposium on communications and jurisdictional claims in published maps and institutional affiliations.
information technologies (ISCIT). IEEE, pp 225–229
Zekri M, El Kafhali S, Aboutabit N, Saadi Y (2017) DDoS attack
detection using machine learning techniques in cloud comput-
ing environments. In: 2017 3rd international conference of cloud
computing technologies and applications (CloudTech). IEEE, pp
1–7. https://doi.org/10.1109/CloudTech.2017.8284731

Research Proposal For MS (CS) Thesis
100% (1)
Research Proposal For MS (CS) Thesis
9 pages
FAIR2019 Paper 69
No ratings yet
FAIR2019 Paper 69
12 pages
2018 Computers and Security Journal Paper
No ratings yet
2018 Computers and Security Journal Paper
21 pages
Comparison of Single and Ensemble Intrusion Detection Techniques Using Multiple Datasets
No ratings yet
Comparison of Single and Ensemble Intrusion Detection Techniques Using Multiple Datasets
10 pages
An Ensemble Approach For Feature Selection of Cyber Attack Dataset
No ratings yet
An Ensemble Approach For Feature Selection of Cyber Attack Dataset
7 pages
Feature Selection and Intrusion Classification in NSL-KDD Cup 99 Dataset Employing SVMs
No ratings yet
Feature Selection and Intrusion Classification in NSL-KDD Cup 99 Dataset Employing SVMs
6 pages
Recall, Precision
No ratings yet
Recall, Precision
7 pages
Proofreading
No ratings yet
Proofreading
23 pages
s40537-023-00694-8
No ratings yet
s40537-023-00694-8
26 pages
Next-gen Network Attack Detection With Machine Learning and Deep Learning Techniques
No ratings yet
Next-gen Network Attack Detection With Machine Learning and Deep Learning Techniques
5 pages
Network Traf Fic Classification Using Multiclass Classi Fier
No ratings yet
Network Traf Fic Classification Using Multiclass Classi Fier
10 pages
Improving network intrusion detection by identifying effective features based on probabilistic dependency trees and evolutionary algorithm
No ratings yet
Improving network intrusion detection by identifying effective features based on probabilistic dependency trees and evolutionary algorithm
13 pages
s40537-024-00887-9
No ratings yet
s40537-024-00887-9
25 pages
SSRN Id2376652
No ratings yet
SSRN Id2376652
8 pages
Survey 2006
No ratings yet
Survey 2006
15 pages
Machine Learning Based Intrusion Detection Systems Using HGWCSO and ETSVM Techniques
No ratings yet
Machine Learning Based Intrusion Detection Systems Using HGWCSO and ETSVM Techniques
4 pages
Feature Selection Approach For Intrusion Detection System Based On Pollination Algorithm
No ratings yet
Feature Selection Approach For Intrusion Detection System Based On Pollination Algorithm
5 pages
Intrusion Detection System Based On Support Vector Machines and The Two-Phase Bat Algorithm
No ratings yet
Intrusion Detection System Based On Support Vector Machines and The Two-Phase Bat Algorithm
16 pages
Project Sample
No ratings yet
Project Sample
55 pages
A Study of Machine Learning Algorithms For DDoS Detection
No ratings yet
A Study of Machine Learning Algorithms For DDoS Detection
7 pages
Manjunath_jusstuu
No ratings yet
Manjunath_jusstuu
11 pages
Intrusion Detection
No ratings yet
Intrusion Detection
7 pages
An Ensemble Approach For Intrusion Detection System Using Machine Learning Algorithms
No ratings yet
An Ensemble Approach For Intrusion Detection System Using Machine Learning Algorithms
4 pages
4. a Mechatronics System Based on Feature Selection and AI For
No ratings yet
4. a Mechatronics System Based on Feature Selection and AI For
13 pages
Enhanced Intrusion Detection System Usin
No ratings yet
Enhanced Intrusion Detection System Usin
8 pages
Sada
No ratings yet
Sada
11 pages
2012-Elsiver-An Efficient Intrusion Detection System Based On Support Vector Machines
No ratings yet
2012-Elsiver-An Efficient Intrusion Detection System Based On Support Vector Machines
7 pages
DattaDeshmukhecs 2014 6892542
No ratings yet
DattaDeshmukhecs 2014 6892542
7 pages
Seguridad
No ratings yet
Seguridad
29 pages
A Subset Feature Elimination Mechanism For Intrusion Detection System
No ratings yet
A Subset Feature Elimination Mechanism For Intrusion Detection System
10 pages
Intrusion Detection System IDS Developme
No ratings yet
Intrusion Detection System IDS Developme
17 pages
Intrusion Detection System (Ids) Development Using Tree - Based Machine Learning Algorithms
No ratings yet
Intrusion Detection System (Ids) Development Using Tree - Based Machine Learning Algorithms
17 pages
Intrusion Detection System (IDS) Development Using Tree-Based Machine Learning Algorithms
No ratings yet
Intrusion Detection System (IDS) Development Using Tree-Based Machine Learning Algorithms
17 pages
Ijnsa 040604
No ratings yet
Ijnsa 040604
16 pages
Applsci 14 01044 v3
No ratings yet
Applsci 14 01044 v3
25 pages
2015-Elsevier-Multi-objective-optimization-of-shared-nearest-neighbor-similarity-for-feature-selection
No ratings yet
2015-Elsevier-Multi-objective-optimization-of-shared-nearest-neighbor-similarity-for-feature-selection
12 pages
Sat - 35.Pdf - Detection of Attacks (DoS Probe) Using Genetic Algorithm
No ratings yet
Sat - 35.Pdf - Detection of Attacks (DoS Probe) Using Genetic Algorithm
11 pages
ContentServer
No ratings yet
ContentServer
33 pages
2.1(4)
No ratings yet
2.1(4)
9 pages
INDEX1
No ratings yet
INDEX1
15 pages
Feature Selection For Intrusion Detection Using Neural Networks and Support Vector Machines
No ratings yet
Feature Selection For Intrusion Detection Using Neural Networks and Support Vector Machines
17 pages
Futureinternet 12 00054 v2 PDF
No ratings yet
Futureinternet 12 00054 v2 PDF
14 pages
Using A Long Short Term Memory Recurrent
No ratings yet
Using A Long Short Term Memory Recurrent
21 pages
11 (1)
No ratings yet
11 (1)
6 pages
Analysis of Network Traffic Features For Anomaly Detection: Félix Iglesias Tanja Zseby
No ratings yet
Analysis of Network Traffic Features For Anomaly Detection: Félix Iglesias Tanja Zseby
26 pages
8
No ratings yet
8
1 page
2023 Scopus Ensemble Based Dimensionality
No ratings yet
2023 Scopus Ensemble Based Dimensionality
5 pages
Anomaly Detection by Using CFS Subset and Neural Network With WEKA Tools
No ratings yet
Anomaly Detection by Using CFS Subset and Neural Network With WEKA Tools
8 pages
Time-Aware Detection Systems: Proceedings
No ratings yet
Time-Aware Detection Systems: Proceedings
3 pages
Toward Integrating Feature Selection Algorithms For Classification and Clustering-M7s PDF
No ratings yet
Toward Integrating Feature Selection Algorithms For Classification and Clustering-M7s PDF
12 pages
Intrusion Detection - Full Doc Java
No ratings yet
Intrusion Detection - Full Doc Java
88 pages
eTasci
No ratings yet
eTasci
26 pages
Systematic Approach To Intrusion Evaluation Using The Rough Set Based Classification
No ratings yet
Systematic Approach To Intrusion Evaluation Using The Rough Set Based Classification
6 pages
Ijcse V2i3p4
No ratings yet
Ijcse V2i3p4
6 pages
Article 13
No ratings yet
Article 13
6 pages
Selecting Critical Features For Data Classification Based On Machine Learning Methods
No ratings yet
Selecting Critical Features For Data Classification Based On Machine Learning Methods
26 pages
Project Day Correction
No ratings yet
Project Day Correction
49 pages
Ke 2021 J. Phys. Conf. Ser. 2113 012074
No ratings yet
Ke 2021 J. Phys. Conf. Ser. 2113 012074
14 pages
Journal Tiis 12-10 TIISVol12No10-24
No ratings yet
Journal Tiis 12-10 TIISVol12No10-24
22 pages
Artificial Intelligence and Natural Algorithms
From Everand
Artificial Intelligence and Natural Algorithms
Rijwan Khan
No ratings yet
An Iot Based Smart Power Mangement System For Technical University
No ratings yet
An Iot Based Smart Power Mangement System For Technical University
7 pages
Data Aggregation Using Compressive Sensing For Energy Efficient
No ratings yet
Data Aggregation Using Compressive Sensing For Energy Efficient
10 pages
Distributed Systems and Technologies: Chapter 1
100% (1)
Distributed Systems and Technologies: Chapter 1
37 pages
A Wearable Nutrition Monitoring System
No ratings yet
A Wearable Nutrition Monitoring System
6 pages
Energy Consumption of Wireless IoT Nodes (PDFDrive)
No ratings yet
Energy Consumption of Wireless IoT Nodes (PDFDrive)
151 pages
Parikrama Polytechnic Mandatory Disclosure PDF
No ratings yet
Parikrama Polytechnic Mandatory Disclosure PDF
140 pages
Icics 2017
No ratings yet
Icics 2017
88 pages
Research Interests of Professors of ECE Department 2024-25
No ratings yet
Research Interests of Professors of ECE Department 2024-25
12 pages
Media To Upload16624
No ratings yet
Media To Upload16624
10 pages
Neuro-OPS-79 IoT Based Smart Irrigation System Using Arduino
No ratings yet
Neuro-OPS-79 IoT Based Smart Irrigation System Using Arduino
7 pages
All Question Paper
No ratings yet
All Question Paper
6 pages
WSN Chapter 3
No ratings yet
WSN Chapter 3
75 pages
IoT-Based EHealth Data Acquisition System
No ratings yet
IoT-Based EHealth Data Acquisition System
5 pages
Seminar Report
No ratings yet
Seminar Report
34 pages
Network Protocols, Schemes, and Mechanisms For Internet of Things (IoT) Features, Open Challenges, and Trends
No ratings yet
Network Protocols, Schemes, and Mechanisms For Internet of Things (IoT) Features, Open Challenges, and Trends
25 pages
Vishnu KP
No ratings yet
Vishnu KP
22 pages
Rc5 Algorithm: Potential Cipher Solution For Security in Wireless Body Sensor Networks (WBSN)
No ratings yet
Rc5 Algorithm: Potential Cipher Solution For Security in Wireless Body Sensor Networks (WBSN)
7 pages
IOT AND WSM MODULE 1 AND 2 NOTES
No ratings yet
IOT AND WSM MODULE 1 AND 2 NOTES
37 pages
p3 PDF
No ratings yet
p3 PDF
5 pages
Call For Book Chapter (Springer)
100% (1)
Call For Book Chapter (Springer)
3 pages
Exercise 5 Arduino To Arduino Communication
No ratings yet
Exercise 5 Arduino To Arduino Communication
7 pages
Tutorial On Contiki OS, Cooja and FIT Iot-Lab: Internet of Things
No ratings yet
Tutorial On Contiki OS, Cooja and FIT Iot-Lab: Internet of Things
31 pages
Lecture 42 - Sensor-Cloud - I
No ratings yet
Lecture 42 - Sensor-Cloud - I
14 pages
9783319706870
100% (1)
9783319706870
383 pages
EC2015 7semand8th PDF
No ratings yet
EC2015 7semand8th PDF
47 pages
Planning The Maintenance of Green
No ratings yet
Planning The Maintenance of Green
17 pages
A Comparative Study of Wireless Protocols With Li-Fi Technology: A Survey
No ratings yet
A Comparative Study of Wireless Protocols With Li-Fi Technology: A Survey
6 pages
Unit - IV
No ratings yet
Unit - IV
70 pages
A GSM, WSN and Embedded Based Kitchen Monitoring System
No ratings yet
A GSM, WSN and Embedded Based Kitchen Monitoring System
2 pages
Darwish 2017
No ratings yet
Darwish 2017
16 pages

Feature Selection and Comparison of Classifcation Algorithms

Uploaded by

Feature Selection and Comparison of Classifcation Algorithms

Uploaded by

Journal of Ambient Intelligence and Humanized Computing (2023) 14:1977–1989

Feature selection and comparison of classification algorithms

1 Introduction Numerous sensors are used in a WSN to transmit the col-

Authors Year Methods used Methodology Dataset Strength Limitations

In this section, the raw dataset goes through a cleaning

In this process, the less redundant or more relevant features

dropping the correlated features was 41, and after dropping,

Fig. 2 Plot of correlation matrix

Fig. 4 Model training and validation

SVM Perceptron KNN SGD XGBoost

Fig. 5 Accuracies with 17 features

SVM Perceptron KNN SGB XGBoost

Fig. 6 Accuracies with 11 features

Table 2 Accuracies using K-fold cross-validation

After dropping correlations 28 77.82 25.44 99.82 22.32 99.96

SVM Perceptron KNN SGB XGBoost

Fig. 7 Accuracies of classifiers by using different feature selection techniques

proposed technique was compared with existing methodolo- References

You might also like