Wireless Sensor Network Intrusion Detection System Based On MK-ELM

This document discusses a wireless sensor network intrusion detection system based on a multi-kernel extreme learning machine (MK-ELM). It proposes using a hierarchical clustering model for wireless sensor networks and a MK-ELM classification algorithm to improve detection accuracy while reducing false alarms and detection time. The system was able to guarantee high detection accuracy while dramatically reducing detection time, making it well-suited for resource-constrained wireless sensor networks.


Soft Computing

https://doi.org/10.1007/s00500-020-04678-1

METHODOLOGIES AND APPLICATION

Wireless sensor network intrusion detection system based on MK-ELM


Wenjie Zhang1 • Dezhi Han1 • Kuan-Ching Li2 • Francisco Isidro Massetto3

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
Advances in digital electronics, wireless communications, and electro-mechanical systems technology have revolutionized society and the economy across the globe by enabling the development of low-cost, low-power, multi-functional sensor nodes, from which sensor networks are built by leveraging the sensing, data processing, and communication capabilities of these nodes. However, the energy of wireless sensor network (WSN) nodes is limited, and the detection accuracy of existing intrusion detection systems for WSNs is low. To reduce the energy consumption of nodes in WSNs during detection, we propose a hierarchical intrusion detection model that clusters the nodes in a WSN according to their functions. Furthermore, to improve the accuracy with which the WSN intrusion detection system detects abnormal behavior and to reduce the false alarm rate, we use the kernel extreme learning machine classification algorithm and synthesize multi-kernel functions in accordance with the Mercer property. We obtain the optimal linear combination by testing and applying the multi-kernel function, and build a multi-kernel extreme learning machine for WSN intrusion detection systems. Simulation results show that the system not only guarantees high detection accuracy but also dramatically reduces the detection time, making it well suited for resource-constrained WSNs.

Keywords Wireless sensor networks · Intrusion detection · Kernel extreme learning machine · Multi-kernel learning

Communicated by V. Loia.

✉ Kuan-Ching Li
[email protected]
Dezhi Han
[email protected]
Francisco Isidro Massetto
[email protected]

1 College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
2 Department of Computer Science and Information Engineering (CSIE), Providence University, Taichung 43301, Taiwan
3 Center for Cognition and Complex Systems, Universidade Federal do ABC (UFABC), Santo André, SP 09210-580, Brazil

1 Introduction

With the increased connectivity between networks, the openness of the deployment area of wireless sensor networks and the broadcast nature of wireless communication make the network vulnerable to external attacks or intrusions, severely increasing the exposure to risks that threaten the availability of information systems in the network infrastructure. Intrusion detection technology is widely used in network security protection; it collects and analyzes data from the network to detect abnormal behavior in the network.

According to the detection method used, intrusion detection in wireless sensor networks (WSNs) is classified into two categories: anomaly detection and misuse detection. The former is based on a mathematical model: a standard network model is established from a profile of normal network behavior, and the system calculates whether specific feature values of network behavior deviate from their average values. If the threshold is exceeded, an intrusion is determined to have occurred. Anomaly detection methods include (1) anomaly detection based on data mining, (2) anomaly detection based on machine learning, and (3) anomaly detection based on clustering. The latter is based on an information base: it establishes a data state information base for known attack behavior and one or more matching patterns for each intrusion. When user behavior matches a pattern found in the information base, the known intrusion can be detected quickly.
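The two detection styles described above can be contrasted with a minimal sketch. Everything in it is hypothetical and only illustrative: the baseline traffic values, the three-standard-deviation threshold `k`, the signature base, and the function names `is_anomalous` and `matches_signature` are assumptions, not part of the system proposed in this paper.

```python
from statistics import mean, pstdev

def is_anomalous(value, baseline, k=3.0):
    """Anomaly detection: flag a value that deviates from the baseline mean
    by more than k standard deviations (k is an assumed threshold)."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    return abs(value - mu) > k * sigma

def matches_signature(event, signatures):
    """Misuse detection: return the name of the first known attack pattern
    whose fields all appear in the event, or None if nothing matches."""
    for name, pattern in signatures.items():
        if all(event.get(key) == val for key, val in pattern.items()):
            return name
    return None

# Hypothetical packets-per-second readings observed under normal traffic.
baseline = [98, 102, 101, 99, 100]
# Hypothetical information base of known attack patterns.
signatures = {"syn_flood": {"flag": "S0", "service": "http"}}

print(is_anomalous(100, baseline))
print(is_anomalous(400, baseline))
print(matches_signature({"flag": "S0", "service": "http", "src": "n7"}, signatures))
```

The anomaly check needs only a profile of normal behavior and can react to unseen attacks, whereas the signature check only recognizes what is already in the information base.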

W. Zhang et al.

Neural networks offer self-learning ability, classification ability, and good robustness, and have attracted a large number of scholars to investigate neural-network-based intrusion detection algorithms, with good results. Shone et al. (2018) proposed a deep learning approach to network intrusion detection, which reduces the training time of samples and has high accuracy and detection rate. Yin et al. (2017) proposed an intrusion detection algorithm based on a recurrent neural network and studied its performance in binary and multi-class classification, as well as the effect of the number of neurons and the learning rate on model performance.

Kernel extreme learning machine (KELM) is an excellent classification algorithm among artificial neural networks (ANNs). Huang and Chen (2007) studied the least-squares support vector machine (LS-SVM) and found that the kernel function has clear advantages in dealing with large-scale complex data. Later, Huang et al. introduced the kernel function into ELM to construct the kernel extreme learning machine (KELM) with the least-squares optimal solution. Compared with the extreme learning machine (ELM) (Huang et al. 2006), KELM does not need the number of hidden layer nodes to be set, since the kernel function represents the unknown nonlinear feature mapping of the hidden layer, and the regularized least-squares algorithm calculates the output weights of the network (Liang et al. 2019). The fast calculation speed and high classification accuracy of KELM algorithms make them attractive. In this paper, we propose an intrusion detection system based on the multi-kernel extreme learning machine (MK-ELM) for clustered WSN environments, which shows a high detection rate, a low false positive rate, and low energy consumption. Experimental results show promising performance with very fast learning speed.

The remainder of this paper is organized as follows. Related work is introduced in Sect. 2, the proposed multi-kernel extreme learning machine (MK-ELM) construction approach is described in Sect. 3, and the design of the WSN intrusion detection system based on MK-ELM is presented in Sect. 4. In Sect. 5, experimental results and discussions are presented. Finally, concluding remarks and future work are given in Sect. 6.

2 Related work

Wireless sensor networks are attracting significant interest, and their applications are being investigated in many research fields. The security of wireless sensor networks is becoming more and more important, so intrusion detection for them is particularly significant. There are several existing works in the field of WSN intrusion detection systems. In this section, we discuss the most notable recent works.

In previous studies, WSN intrusion detection mostly used single-point independent detection; for example, Silva et al. (2005) proposed and deployed a detection algorithm on a single detection node. In the analysis of intrusion detection, once the number of failures caused by non-compliance with rules exceeds the number of failures caused by accidental network reasons, an intrusion is determined to have occurred. Based on linear prediction theory, Han et al. (2010) constructed a Markov mathematical prediction model on a single sensor node. If the absolute value of the difference between the actual and predicted network traffic is higher than the preset threshold, an attack is determined. Because nodes in a WSN are resource-constrained, single-point independent detection is not applicable.

For planar networks, peer-to-peer cooperative detection is mainly used. Ping et al. (2015) designed a multi-agent intrusion detection system based on immune theory. The monitoring agent is deployed on each node, and the decision agent matches the collected data features. Once an attack is determined on a sensor node, the nearby Killer agent is activated, and the Killer agent responds and isolates the anomalous node. Hierarchical detection is mainly used for heterogeneous networks; Rani and Jayakumar (2017) proposed a hierarchical intrusion detection method that layers the WSN progressively. The sensor nodes are set as the first layer, the aggregation nodes as the second layer, and the upper base station as the third layer, which is responsible for anomaly detection on the received information, analyzing data, and judging whether an intrusion has occurred.

Due to the random initialization of the ELM algorithm, it is difficult to build a sample-based nonlinear model. KELM solves this problem and shows good robustness to model parameters. Zhang et al. (2014) proposed online modeling of the kernel extreme learning machine based on fast leave-one-out cross-validation. Experimental results show that the proposed algorithm improves the detection rate of the original kernel extreme learning machine, though the random selection of the dataset has a high impact on the classification performance. Tang et al. (2016) proposed an extreme learning machine for multi-layer perceptrons and tested it on the KDD CUP 99 dataset, where its performance compared favorably with previous results. Wang et al. (2018) applied the equality-constrained-optimization-based extreme learning machine to network intrusion detection and proposed an adaptive optimization criterion for hidden neurons, which effectively establishes a model with a high attack detection rate and fast learning speed.

Borkar et al. (2019) presented an efficient clustering technique called the adaptive chicken swarm optimization algorithm. Through this adaptive method, the lifetime and scalability of the WSN are improved, and the time consumption is also greatly reduced. In addition, a two-stage classification method called adaptive SVM is proposed, which uses an acknowledgment-based method to report malicious sensor nodes. Their work concluded that the hierarchical intrusion detection model offers better accuracy than the conventional method. Dai and Pan (2019) proposed an improved DBN-ELM integrated intrusion detection classifier; the model uses the feature extraction of the DBN to represent the learning network, and the ELM and final learning are determined by majority vote. Although this algorithm improves the accuracy and reduces the false alarm rate, it increases the complexity of the algorithm.

The above references focus either on solving WSN intrusion detection problems with earlier learning algorithms or on solving intrusion detection problems with machine learning methods in traditional networks. Therefore, this paper proposes an intrusion detection method for WSNs based on a multi-kernel extreme learning machine (MK-ELM). In contrast to previous works, the MK-ELM model is applied for classification. As there is no need for iterative training, this algorithm is fast and time-saving. In addition, the NSL-KDD and UNSW-NB 15 datasets are used to train and test the model, and the experimental results are compared with SVM (Maleh et al. 2015) and basic ELM (Zhang et al. 2014; Zhang 2014).

Based on the issues mentioned above, the contributions of this paper are twofold. The first is an investigation of the intrusion detection system model and the proposal of a hierarchical intrusion detection model for clustered WSNs; the second is the design of a suitable multiple kernel function applied to the extreme learning machine, based on the theory of multi-kernel functions. By training and adjusting the required parameters, the resulting multi-kernel extreme learning machine (MK-ELM) meets the efficiency requirements of the intrusion detection system design.

3 ELM algorithm

3.1 Kernel extreme learning machine

The network of the extreme learning machine is a single-hidden-layer feed-forward neural network (SLFN). As shown in Fig. 1, $m$ is the number of input layer nodes, $L$ is the number of hidden layer nodes, and $n$ is the number of output layer nodes. The training samples are $x_1, x_2, \ldots, x_p$, and the corresponding labels are $t_1, t_2, \ldots, t_p$. $g(x)$ represents the activation function of the hidden layer, $w$ is the weight matrix of size $m \times L$, $w_i$ represents the weight vector between the $i$-th node of the hidden layer and the input layer, and $b_i$ is the bias value of the $i$-th node in the hidden layer. $\beta$ is the weight matrix of size $L \times n$ between the hidden layer and the output layer, and $\beta_i$ represents the weight vector between the $i$-th node of the hidden layer and the output layer. The values of $w_i$ and $b_i$ are randomly generated, and solving for the output weights reduces to computing a Moore–Penrose generalized inverse. Thus, the ELM can directly generate a globally optimal solution, and the solution speed is fast (Huang et al. 2015).

Fig. 1 Basic ELM structure

The output of an ELM network with $L$ hidden neurons is expressed as:

$$y = \sum_{i=1}^{L} \beta_i \, g(w_i^T x + b_i) \tag{1}$$

Moreover, the hidden layer output matrix is as follows:

$$H = \begin{bmatrix} g(w_1^T x_1 + b_1) & \cdots & g(w_L^T x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1^T x_m + b_1) & \cdots & g(w_L^T x_m + b_L) \end{bmatrix} \tag{2}$$

where $h(x_i) = g(w_1^T x_i + b_1)$ is a function of hidden layer node mapping, which is only related to $x_i$.

Next, the optimization objective of ELM is as follows:

$$\min_{\beta} \; \|H\beta - T\|^2 + \frac{C}{2}\|\beta\|^2 \tag{3}$$

where $C$ is the regularization parameter, and Eq. (3) can be solved as:

$$\beta = (H^T H + CI)^{\dagger} H^T T \tag{4}$$

where $(H^T H + CI)^{\dagger}$ is the Moore–Penrose generalized inverse of $(H^T H + CI)$.

Accordingly, the classification of the extreme learning machine can be expressed as:

$$f(x) = \operatorname{sign}\left( h(x) H^T \left( \frac{I}{C} + H H^T \right)^{-1} T \right) \tag{5}$$
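The training procedure behind Eqs. (1)–(4) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the sigmoid activation, the uniform range of the random weights, the values of `L` and `C`, the toy data, and the helper names (`matmul`, `solve`, `hidden_output`, `train_elm`, `predict`) are all hypothetical. Because $H^T H + CI$ is invertible for $C > 0$, the sketch solves a linear system instead of forming the generalized inverse explicitly.

```python
import math
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(a) + list(b) for a, b in zip(A, B)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def hidden_output(X, W, b):
    """Hidden-layer output matrix H of Eq. (2), with a sigmoid as g (an assumption)."""
    return [[1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + bi)))
             for w, bi in zip(W, b)] for x in X]

def train_elm(X, T, L=20, C=1e-3, seed=0):
    """Draw w_i and b_i at random, then obtain beta from Eq. (4)."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in X[0]] for _ in range(L)]
    b = [rng.uniform(-1, 1) for _ in range(L)]
    H = hidden_output(X, W, b)
    Ht = [list(col) for col in zip(*H)]
    A = matmul(Ht, H)
    for i in range(L):
        A[i][i] += C                      # H^T H + C I
    beta = solve(A, matmul(Ht, T))        # beta = (H^T H + C I)^{-1} H^T T
    return W, b, beta

def predict(X, W, b, beta):
    """Sign of the network output of Eq. (1) for each sample."""
    return [[1 if v >= 0 else -1 for v in row]
            for row in matmul(hidden_output(X, W, b), beta)]

# Toy data (not intrusion data): the label follows the sign of the first feature.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
T = [[-1.0], [-1.0], [1.0], [1.0]]
W, b, beta = train_elm(X, T)
print(predict(X, W, b, beta))
```

Only `beta` is learned; the hidden layer stays at its random initialization, which is why no iterative training is needed.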

As shown in Fig. 1, the activation function in the output matrix $H$ of the hidden layer is unknown, so kernel functions can be introduced into ELM. A kernel function is built to replace $HH^T$, as presented in Eq. (6):

$$HH^T(i, j) = K(x_i, x_j) \tag{6}$$

$$HH^T = \Omega_{ELM} = \begin{bmatrix} K(x_1, x_1) & \cdots & K(x_1, x_N) \\ \vdots & \ddots & \vdots \\ K(x_N, x_1) & \cdots & K(x_N, x_N) \end{bmatrix} \tag{7}$$

$$h(x)H^T = \begin{bmatrix} K(x, x_1) \\ \vdots \\ K(x, x_N) \end{bmatrix} \tag{8}$$

Fig. 2 Linear combination schematic diagram of multi-kernel functions
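As a rough illustration of Eqs. (6)–(8), the sketch below builds the kernel matrix $\Omega_{ELM}$ and the kernel vector, then evaluates the decision rule of Eq. (5) with $HH^T$ and $h(x)H^T$ replaced by them, so that only kernel evaluations are needed. The Gaussian kernel, its width `delta`, the value of `C`, the toy XOR-style data, and all function names are assumptions made for this example.

```python
import math

def gaussian_kernel(x, z, delta=1.0):
    """K(x, z) = exp(-||x - z||^2 / delta^2); the kernel choice is an assumption."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / delta ** 2)

def kernel_matrix(X, kernel):
    """Omega_ELM of Eq. (7): entry (i, j) is K(x_i, x_j)."""
    return [[kernel(xi, xj) for xj in X] for xi in X]

def kernel_vector(x, X, kernel):
    """Eq. (8): the vector [K(x, x_1), ..., K(x, x_N)]."""
    return [kernel(x, xi) for xi in X]

def solve(A, y):
    """Solve A a = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [yi] for row, yi in zip(A, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]

def fit(X, t, kernel, C):
    """Return alpha = (I/C + Omega_ELM)^{-1} T, the right-hand factor of Eq. (5)."""
    omega = kernel_matrix(X, kernel)
    n = len(X)
    A = [[omega[i][j] + (1.0 / C if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(A, t)

def classify(x, X, alpha, kernel):
    """Sign of the kernel vector of Eq. (8) applied to alpha."""
    score = sum(k * a for k, a in zip(kernel_vector(x, X, kernel), alpha))
    return 1 if score >= 0 else -1

# XOR-style toy labels (not intrusion data); not separable by a linear model.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
t = [-1.0, 1.0, 1.0, -1.0]
alpha = fit(X, t, gaussian_kernel, C=100.0)
print([classify(x, X, alpha, gaussian_kernel) for x in X])
```

Note that no hidden layer size appears anywhere: the kernel stands in for the unknown feature mapping, which is the property the text attributes to KELM.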
By substituting Eqs. (6)–(8) into Eq. (5), the classification function can be equivalently written as:

$$f(x) = \operatorname{sign}\left( \begin{bmatrix} K(x, x_1) \\ \vdots \\ K(x, x_N) \end{bmatrix}^T \left( \frac{I}{C} + \Omega_{ELM} \right)^{-1} T \right) \tag{9}$$

As shown in Eq. (9), the kernel extreme learning machine (KELM), as an optimization solution that combines machine learning theory with standard optimization methods, has better generalization performance due to relatively weak optimization constraints. In the kernel implementation of the extreme learning machine, the feature mapping $g(x_i)$ of the hidden layer is usually unknown, and the corresponding kernel (e.g., $K(x, x_i) = \exp(-\|x - x_i\|^2 / \delta^2)$) is generally given. Compared with the traditional support vector machine (SVM), KELM has better performance owing to fewer constraints (Cao et al. 2014).

3.2 Multi-kernel learning theory

By combining kernel functions with different characteristics, the advantages of multi-kernel functions can be obtained, such as better mapping performance. Mercer's theorem (Girolami 2002) gives a sufficient condition for constructing kernel functions: any symmetric positive semi-definite function can be used as a kernel function. Different kernel functions have different effects on the performance of the constructed MK-ELM. A weighted multi-kernel synthesis method has therefore emerged, in which a multi-kernel function is obtained by the linear combination of multiple kernels. Figure 2 is the schematic diagram of this composition.

Next, the linear combination of multi-kernel functions is described mathematically. Suppose that $K(x, x_i)$ is a known kernel function and $\hat{K}(x, x_i)$ is its normalized form; the kernel function can be normalized as follows:

$$\hat{K}(x, x_i) = \frac{K(x, x_i)}{\sqrt{K(x, x)\,K(x_i, x_i)}}$$

Using the symbols introduced above, the following synthetic kernels can be defined:

(a) Direct summation kernel:

$$K(x, x_i) = \sum_{j=1}^{M} \hat{K}_j(x, x_i) \tag{10}$$

(b) Weighted summation kernel:

$$K(x, x_i) = \sum_{j=1}^{M} \beta_j \hat{K}_j(x, x_i), \quad \text{s.t. } \beta_j \ge 0, \; \sum_{j=1}^{M} \beta_j = 1 \tag{11}$$

(c) Weighted polynomial extended kernel:

$$K(x, x_i) = \mu \hat{K}_1^{p}(x, x_i) + (1 - \mu) \hat{K}_2^{p}(x, x_i) \tag{12}$$

where $K^{p}(x, x_i)$ is a polynomial extended kernel of $K(x, x_i)$.

3.3 The proposed MK-ELM

The KELM algorithm combines the advantages of the extreme learning machine with the generalization performance of the support vector machine method, but it still relies on single-kernel learning. Its classification performance is affected by the type of kernel function and the selection of kernel parameters. Because KELM is built on a single kernel function, it has drawbacks such as poor robustness and low detection accuracy. In accordance with the Mercer property of kernel functions, the multi-kernel learning method is combined with ELM.
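The synthetic kernels of Eqs. (10)–(12) can be sketched as higher-order functions that take kernels and return a new kernel. The sketch makes several assumptions that are not confirmed by the text: normalization is taken to be the usual $K(x, z)/\sqrt{K(x, x)K(z, z)}$, $K^{p}$ is read as the $p$-th power of the kernel value, and the Gaussian and polynomial base kernels, their parameters, and all function names are chosen only for illustration.

```python
import math

def gaussian(x, z, delta=1.0):
    """Gaussian base kernel (assumed width delta)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / delta ** 2)

def polynomial(x, z, d=2):
    """Polynomial base kernel (x . z + 1)^d (assumed degree d)."""
    return (sum(a * b for a, b in zip(x, z)) + 1) ** d

def normalize(kernel):
    """Return K_hat(x, z) = K(x, z) / sqrt(K(x, x) * K(z, z))."""
    def k_hat(x, z):
        return kernel(x, z) / math.sqrt(kernel(x, x) * kernel(z, z))
    return k_hat

def direct_sum(kernels):
    """Eq. (10): unweighted sum of normalized kernels."""
    return lambda x, z: sum(k(x, z) for k in kernels)

def weighted_sum(kernels, weights):
    """Eq. (11): weights must be non-negative and sum to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return lambda x, z: sum(w * k(x, z) for w, k in zip(weights, kernels))

def weighted_poly(k1, k2, mu, p):
    """Eq. (12), reading K^p as the p-th power of the kernel value (an assumption)."""
    return lambda x, z: mu * k1(x, z) ** p + (1 - mu) * k2(x, z) ** p

k1 = normalize(gaussian)
k2 = normalize(polynomial)
x, z = [1.0, 0.0], [0.0, 1.0]

mixed = weighted_sum([k1, k2], [0.5, 0.5])
print(mixed(x, x))   # each normalized kernel equals 1 on identical inputs
print(mixed(x, z))
print(direct_sum([k1, k2])(x, z))
print(weighted_poly(k1, k2, mu=0.5, p=2)(x, z))
```

Normalizing first keeps the base kernels on a common scale, so the weights in Eq. (11) express relative importance rather than compensating for different magnitudes.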

This combination gives the proposed multi-kernel extreme learning machine (MK-ELM) model, whose algorithm is derived next.

A linear combination of multi-kernel functions usually adopts several kernels, e.g., the linear kernel $K(x, x_i) = x \cdot x_i$, the Gaussian kernel $K(x, x_i) = \exp(-\|x - x_i\|^2 / \delta^2)$, and the polynomial kernel $K(x, x_i) = (x \cdot x_i + 1)^d$. The obtained multi-kernel function overcomes the deficiencies of single-kernel functions, and its form is as follows:

$$K(x, x_i) = \mu_1 K_1(x, x_i) + \mu_2 K_2(x, x_i) + \mu_3 K_3(x, x_i) + \cdots + \mu_m K_m(x, x_i), \quad \text{s.t. } \sum_{k=1}^{m} \mu_k = 1, \; \forall \mu_k \ge 0 \tag{13}$$

where each kernel function in Eq. (13) may be a different type of kernel, such as a Gaussian kernel and a wavelet kernel, or the same type of kernel with different kernel parameters.

The optimization problem of the multi-kernel extreme learning machine can be described as:

$$\begin{aligned} \min L_{MKELM} &= \frac{1}{2} \sum_{k} \frac{1}{\mu_k} \|w_k\|^2 + C \, \frac{1}{2} \sum_{i=1}^{N} \xi_i^2 \\ \text{s.t.} \quad & \sum_{k} K(x, x_i) \, w_k = t_i - \xi_i, \quad i = 1, \ldots, N; \\ & \sum_{k} \mu_k = 1 \end{aligned} \tag{14}$$

where $w_k$ is the feature weight corresponding to the adopted kernel function $K(x, x_i)$, $\xi_i$ is the prediction error of sample $i$, and $C$ is the regularization parameter that balances model complexity and predictive performance.

By replacing the kernel matrix of Eq. (9) in the ELM with the newly constructed multi-kernel function, the multi-kernel extreme learning machine (MK-ELM) algorithm is obtained as follows:

– Step 1 Initialize the sample set $N$, $N = \{(x_i, t_i) \mid x_i \in R^m, t_i \in R^m, i = 1, 2, \ldots, N\}$,
– Step 2 Using the multi-kernel formula of Eq. (13), combine different single-kernel functions, select the optimal kernel function combination, and determine the regularization parameter $C$ and the kernel parameters; construct the optimal MK-ELM,
– Step 3 Randomly generate the weights $w$ and bias $b$ between the input layer and the hidden layer according to the MK-ELM algorithm,
– Step 4 Input the training samples, and calculate the hidden layer output matrix $H$ and the output layer weight matrix $\beta$ by Eqs. (2) and (4),
– Step 5 Input the testing samples, and test the performance of MK-ELM through a large number of experiments; compare the performance of MK-ELM composed of different single-kernel functions,
– Step 6 Output the classification result and obtain the MK-ELM classifier.

The process of the MK-ELM algorithm is illustrated in Fig. 3.

Fig. 3 Flowchart of MK-ELM algorithm

4 WSN intrusion detection system

4.1 Wireless sensor network model

The wireless sensor network (WSN) consists of sensor nodes, cluster head nodes, sink nodes, and a management node (Butun et al. 2014). To ensure the stable operation of the network, the WSN is clustered to simplify management. A large number of sensing nodes are deployed in the monitoring area, and they form a network in a self-organizing manner; the cluster head nodes transmit the collected information to the sink node through multi-hop relay, and it then reaches the management node through satellite or the Internet. The user can remotely configure or manage the network through the management node and issue monitoring tasks.

The WSN network model is shown in Fig. 4. The model consists of three parts: a large number of sensor nodes distributed in the monitoring area, a small number of scattered sink nodes, and the management node. The functions of each part are as follows:

(1) Sensor node As the basis of the WSN, its main task is to collect the data information of each range, process the collected information, and then transmit the
information to the upper node; this part includes common sensor nodes and cluster head nodes,
(2) Sink node It fuses the data information sent by the cluster head nodes and then transmits it to the management node through other network channels such as the Internet and satellite,
(3) Management node It is directly oriented to the user. It is used to observe the running status of the network, perform intrusion detection and analysis on the data information sent by the sink nodes, and take corresponding actions. In addition, the management node can actively send a query request to the WSN.

Fig. 4 WSN model (monitoring areas with sensor nodes and cluster head nodes, sink nodes, satellite or the Internet, management node)

The assumptions about the wireless sensor network environment in this paper are fourfold:

(1) The wireless sensor network is a clustered network. The common nodes in a cluster can communicate directly with the cluster head nodes. The cluster head nodes can communicate with the sink nodes directly or through other cluster heads, and the sink nodes can communicate directly with the management node,
(2) Each node is static, has a unique identifier ID, and belongs to only one cluster. The cluster head node is necessarily of the same kind as the common node, whereas the sink node has more resources and energy,
(3) The data transmission model of the network is hybrid, including the sustainability model and the event-driven transmission model,
(4) The state of a node is one of a sleep state, a monitoring state, and an active state.

4.2 Hierarchical intrusion detection model

As WSNs have many restrictions, it is necessary to consider the following aspects when designing an intrusion detection model:

(1) Energy saving The overall design of the model should not be complicated. Otherwise, it significantly increases the energy cost of the network in actual applications and eventually shortens the life cycle of the network,
(2) Detection accuracy Since the design of the intrusion detection system should consider real-time and application requirements, improving the accuracy of intrusion detection and reducing the false negative rate and false positive rate have become the main concerns of intrusion detection systems,
(3) Processing power for large traffic networks WSNs have gradually entered people's lives and have been applied in several fields, which leads to an increasing amount of data to be processed. Therefore, the intrusion detection system should also be capable of operating under massive data traffic.

Because WSNs differ from traditional computer networks in aspects such as terminal type, data transmission, and network topology, traditional network intrusion detection methods are no longer applicable. The devices in a WSN are divided into three layers according to their characteristics, and the resulting framework for WSN intrusion detection is depicted in Fig. 5. The wireless network intrusion detection model based on MK-ELM adopts the anomaly detection method, and according to the characteristics of the kernel
function, a multi-kernel extreme learning machine is constructed to overcome the limitations of the single-kernel extreme learning machine in dealing with intrusion detection. Because the features of the intrusion data are large and unevenly distributed, data preprocessing is conducted, and the processed data are used as the dataset for intrusion detection.

Fig. 5 WSN intrusion detection framework (perception layer: collect and summarize data; data aggregation layer: data preprocessing; core control layer: intrusion detection and exception handling)

The networks in WSNs are mostly heterogeneous. That is, different types of nodes in the network have different responsibilities (Bao et al. 2012; Liang et al. 2020). For instance, in a clustered hierarchical WSN the capacity of the sink node is higher than that of the cluster head node and the common nodes in the cluster, while the management node is at a terminal location and its energy can be regarded as infinite relative to the other types of nodes. Based on this fact, the hierarchical intrusion detection method takes into account the difference in functions between nodes and maximizes the use of nodes in the wireless sensor network. Given the limited energy of nodes in a WSN and the goal of saving network energy, only simple data collection and aggregation work is carried out on ordinary nodes. Next, the cluster head node transmits the collected data to the sink node for data preprocessing. Finally, the detection and anomaly processing modules are implemented on the management node, whose energy is regarded as infinite. Such an arrangement can effectively reduce energy consumption in intrusion detection systems. The functions of each level are as follows:

(1) Perception layer It contains common sensor nodes and cluster head nodes. Common sensor nodes in the monitoring area are used to sense and collect data, and cluster head nodes are used to summarize data sent by the current common nodes. This layer collects network packets, TCP/IP traffic packets, and other information,
(2) Data aggregation layer This layer is composed of sink nodes. First, data sent by the network and cluster head nodes are collected. Next, the data are fused, and the intrusion detection features are extracted. The preprocessed data are sent to the core control layer for analysis and judgment. As the format of the collected network data is not uniform, and some data information cannot be recognized by the classifier, the data should be digitized or vectorized in advance. Each dimension feature value corresponds to a number, and the next step is normalization. After vectorization and normalization, the data feature values of each dimension are distributed in [0, 1],
(3) Core control layer This layer consists of management nodes, including intrusion detection modules and anomaly handling modules,
(4) Intrusion detection module It is responsible for receiving data information from the sink node and for intrusion judgment. As this module is the core part of the intrusion detection system, the accuracy and timeliness of its data analysis affect the performance of the whole intrusion detection system. The module uses the MK-ELM detection algorithm as a classifier to predict and classify the testing dataset,
(5) Anomaly handling module The output of the intrusion detection module is sent to the anomaly handling module, in which the final response is analyzed and the corresponding measures are taken. To reach the best decisions before the output of the intrusion detection model is acted upon, two aspects need to be considered: the accuracy of the detection model and the possible classification of the attack. Because the false alarm rate of the MK-ELM can reduce the accuracy of the detection module, setting an alarm threshold can improve the detection precision of the module. The WSN is monitored using the intrusion detection module of the core control layer, and the output is sent to the anomaly handling module. Whenever the number of abnormal data records detected reaches the alarm threshold, the node that requests the maximum number of records is found through the node routing table (Dai et al. 2018), according to the data collected by the common sensor nodes of the sensing layer, and is regarded as an abnormal node. The message is sent to the management node of the IDS
core control layer, and the user decides to remove the neighbor nodes from the routing table of the abnormal nodes. Next, the abnormal nodes are removed from the system, so the abnormal nodes are ignored in subsequent communication, according to the routing protocol.

With energy saving, high detection accuracy, and large-scale network detection capability in mind, our intrusion detection model is based on a hierarchical WSN. Because the energy of the management node is regarded as unlimited, we run the intrusion detection algorithm model on the management node. By contrast, in the two-step faulty node detection algorithm based on spatial–temporal cooperation for WSNs (Dai et al. 2018), the detection algorithm is placed on the sink node. Due to the limited energy of the sink node, it is more reasonable to place the intrusion detection algorithm model on the management node. At present, many intrusion detection systems use SVM and basic ELM for intrusion detection, but their accuracy is lower and their false alarm rate is higher than those of MK-ELM. From the perspective of WSN node energy and intrusion detection indicators, the proposed algorithm has great advantages.

5 Simulation experiments

The experiment is performed on a personal computer configured with an Intel 4-core i5-6500 3.2 GHz CPU, 6 MB cache, 8 GB memory, and GPU acceleration disabled, running Windows 7. The software environment is MATLAB R2014b.

5.1 Dataset

5.1.1 NSL-KDD dataset

The KDD Cup99 dataset is the most widely used dataset for evaluating intrusion detection systems. Nevertheless, this dataset is duplicative, redundant, and imbalanced, which seriously affects the performance of the evaluated intrusion detection systems. As shown in Table 1, this paper uses the NSL-KDD dataset, which is a less biased subset of the KDD Cup99 dataset and covers the KDDTrain+ dataset as the training set and the KDDTest+ dataset as the testing set, which contain different normal records and four different types of attack records. This dataset contains 148,517 samples, with 41 features and 5 categories.

Table 1 Different classifications in the NSL-KDD dataset

NSL-KDD     All types   Normal   Dos      Probe    R2L   U2R
KDDTrain+   125,973     67,343   45,927   11,656   995   52

(1) Vectorization of symbol features

First, some non-numeric features, such as the 'protocol_type,' 'service,' and 'flag' features, must be converted into numeric form. For instance, the three protocol types TCP, UDP, and ICMP are converted into binary feature vectors: TCP{1,0,0}, UDP{0,1,0}, and ICMP{0,0,1}. The service type feature is extended to 70-dimensional features, and the flag feature is extended to 11-dimensional features. As a result, the 41-dimensional features map into 122-dimensional features after transformation. Similarly, to test the accuracy of the machine learning classification algorithms, the 40 types of labels are grouped into 5 categories: Normal, Dos, Probe, U2R, and R2L.

(2) Normalization of digital features

In the NSL-KDD dataset, there are 41 feature items with different value ranges. Some values higher than $10^6$ dramatically affect ELM performance and cause smaller features to be easily ignored. Therefore, to facilitate numerical calculation and to prevent features with large values from dominating the training process, it is necessary to normalize the feature data. Normalization maps the original data to a standard range so that the data are transformed into the interval [0, 1]. The min–max normalization formula is as follows:

$$x' = \frac{x_i - x_{min}}{x_{max} - x_{min}} \tag{15}$$

where $x_i$ denotes the value to be normalized, $x_{min}$ denotes the minimum value in a dimension, $x_{max}$ denotes the maximum value in a dimension, and $x'$ denotes the normalized value.

5.1.2 UNSW-NB 15 dataset

The UNSW-NB 15 dataset was created by the UNSW cybersecurity laboratory using the IXIA PerfectStorm tool and released in 2015. It contains real modern normal traffic and synthetic abnormal network traffic generated in a synthetic environment. UNSW-NB 15 represents nine major families of attacks and contains 49 features and their labels. As depicted in Table 2, a partition of this dataset is configured as two sets, a training set and a testing set. The major disadvantage of
KDDTest? 22,544 9711 7458 2421 2754 200 NSL-KDD is that it does not represent the current low
footprint attack scenarios. Thus, to further validate the


performance of the proposed method, the UNSW-NB 15 dataset is applied to validate the indicators selected.

Similar to the NSL-KDD dataset, we first digitize the characteristic features of UNSW-NB 15 and then normalize the data features. There are two forms of label processing: division into 2 categories and into 10 categories. In the first form, the size of the training set and testing set is 82,332 × 44 and 175,341 × 44 dimensions, respectively, and in the second form the size of the training set and testing set is, respectively, 82,332 × 52 and 175,341 × 52 dimensions. The ten-category labels of UNSW-NB 15 are shown in Table 3.

Table 2 Statistics of the UNSW-NB 15 dataset
All types    Training set             Testing set
             Sample       %           Sample      %
Normal       65,000       37.08       37,000      44.94
Attack       110,341      62.92       45,332      55.06
Total        175,341      100.00      82,332      100.00

Table 3 Ten-category labels of UNSW-NB 15
Label type        Vectorization
Normal            1000000000
Analysis          0100000000
Backdoor          0010000000
Dos               0001000000
Exploits          0000100000
Fuzzers           0000010000
Generic           0000001000
Reconnaissance    0000000100
Shellcode         0000000010
Worms             0000000001

5.2 Multi-kernel function parameters setting

We extensively tested the weights of commonly used kernel functions; the results are shown in Table 4. Following the optimal parameters reported in other publications, some of the parameters are fixed in advance (Cheng et al. 2012). This experiment is based on the following parameters: the regularization parameter C = 217, the kernel parameter of the RBF kernel = 100, and the kernel parameter of the multiquadric kernel = 75. In addition, 5000 different training samples and 5000 different testing samples are chosen from KDDTrain+ and KDDTest+ in NSL-KDD, respectively. Similarly, we take 5000 different samples each from the training set and testing set of UNSW-NB 15. The experimental accuracy on KDDTest+ is obtained in the five-category experiment, whereas the accuracy on UNSW-NB 15 is obtained in the two-category experiment. In searching for the best-performing multi-kernel function, the control variable method is adopted. We take into account the performance of the different combinations and select the MK-ELM with the best performance. As can be seen in Table 4, the best accuracy is achieved when the weight of the RBF kernel is 0.3 and the weight of the multiquadric kernel is 0.7. Therefore, the optimal combination of the multi-kernel function selected is 0.3 RBF kernel + 0.7 multiquadric kernel.

5.3 Performance evaluation

5.3.1 Evaluation metrics

The accuracy (AC) is used to measure the performance of the proposed MK-ELM intrusion detection algorithm; it is the overall rate of correct decisions about whether an attack occurred. Three further performance metrics are used: true positive rate (TPR), false positive rate (FPR), and false negative rate (FNR), which represent the rate of attack cases identified correctly, the rate of no-attack cases identified as attacks by the system, and the rate of attack cases identified as normal ones, respectively.
The receiver operating characteristic (ROC) curve helps in visualizing a classifier's performance by plotting the true positive rate against the false positive rate of the classifier. The area under the ROC curve gives an estimate of the average performance of the classifier: the larger the area, the better the performance. The calculation methods are:

\[ AC = \frac{TP + TN}{TP + TN + FP + FN} \tag{16} \]

\[ TPR = \frac{TP}{TP + FN} \tag{17} \]

\[ FPR = \frac{FP}{FP + TN} \tag{18} \]

\[ FNR = \frac{FN}{FN + TP} \tag{19} \]

where TP denotes attack (positive) samples correctly predicted as attacks by the model, TN denotes normal (negative) samples correctly predicted as normal, FP denotes normal samples incorrectly predicted as attacks, and FN denotes attack samples incorrectly predicted as normal.

5.3.2 Experiment results and discussion

(1) MK-ELM experiment results
The NSL-KDD dataset was used in the first phase of the experiment, where 14,000 pieces of training data are


randomly selected from 'KDDTrain+' for network training, and another 14,000 pieces of different data are randomly selected from 'KDDTest+' for testing. The training set and the testing set both undergo the above data preprocessing and become 14,000 × 127 and 14,000 × 127 dimensions. Figure 6a shows the confusion matrix of the MK-ELM on the selected testing set in the five-category classification experiments. Experiments show that the accuracy of the model is 98.3%. Other indicators derived from this 5 × 5 confusion matrix are summarized in Tables 5 and 6. Figure 6b depicts the ROC curves for the 5 classes: except for U2R, the curves of the other four classes reach more than 90% with a false positive rate below 5%; the U2R exception is due to the tiny number of U2R samples, which leads to false alarms. The true positive rate of U2R is significantly improved compared with the literature (Huang et al. 2017) and reaches a detection rate of 50%, as derived from the confusion matrix shown in Fig. 6a. Based on the 5-category confusion matrix, Table 5 shows the binary-category confusion matrix of Dos.

Table 4 The optimal combination of multi-kernel function
(C = 217, RBF_para = 100, mq_para = 75, training/testing = 5000/5000)
No.   RBF_kernel (u1)   lin_kernel (u2)   mq_kernel (u3)   Accuracy KDDTest+ (%)   Accuracy UNSW-NB 15 (%)
1     0.0               1.0               0.0              0.92                    0.83
2     0.0               0.5               0.5              0.87                    0.76
3     0.0               0.0               1.0              0.88                    0.69
4     0.1               0.9               0.0              0.91                    0.72
5     0.1               0.5               0.4              0.85                    0.63
6     0.1               0.0               0.9              0.89                    0.75
7     0.2               0.8               0.0              0.85                    0.72
8     0.2               0.5               0.3              0.86                    0.71
9     0.2               0.0               0.8              0.95                    0.91
10    0.3               0.7               0.0              0.89                    0.79
11    0.3               0.5               0.2              0.87                    0.79
12    0.3               0.0               0.7              0.98                    0.92
13    0.4               0.6               0.0              0.97                    0.85
14    0.4               0.5               0.1              0.82                    0.66
15    0.4               0.0               0.6              0.97                    0.90
16    0.5               0.5               0.0              0.95                    0.83
17    0.5               0.4               0.1              0.81                    0.69
18    0.5               0.0               0.5              0.92                    0.86
19    0.6               0.4               0.0              0.91                    0.81
20    0.6               0.3               0.1              0.82                    0.74
21    0.6               0.0               0.4              0.94                    0.79
22    0.7               0.3               0.0              0.91                    0.86
23    0.7               0.2               0.1              0.84                    0.66
24    0.7               0.0               0.3              0.98                    0.85
25    0.8               0.2               0.0              0.96                    0.87
26    0.8               0.1               0.1              0.83                    0.69
27    0.8               0.0               0.2              0.95                    0.88
28    0.9               0.1               0.0              0.94                    0.89
29    0.9               0.0               0.1              0.95                    0.91
30    1.0               0.0               0.0              0.97                    0.89

(2) Experiment comparison among different schemes
The parameters of the SVM algorithm are set as follows: the typical kernel functions are the polynomial kernel $K(x, x_i) = (x \cdot x_i + 1)^d$ and the Gaussian kernel


$K(x, x_i) = \exp(-\|x - x_i\|^2 / d^2)$, where $d$ is the degree of the polynomial kernel and $d^2$ is the bandwidth of the Gaussian kernel. In the experiments performed, the cost parameter C and kernel parameter $d^2$ were chosen from $[2^{24}, 2^{9}, 2^{8}, \ldots, 2^{25}]$ and $[2^{24}, 2^{9}, 2^{8}, \ldots, 2^{25}]$, respectively, for each dataset. Therefore, 50 × 50 = 2500 combinations were tested for each dataset, and the selected set of parameters is applied to the test dataset. An SVM implementation called LIBSVM-3 was used in this work. For basic ELM, the number of hidden neurons is 400 and the hidden neurons are sigmoidal additive nodes. As random values are assigned to some of the parameters in basic ELM, the output is not fixed. The number of hidden neurons is kept at 400 and the value of C in Eq. (5) is chosen. Hence, for SVM, basic ELM, and the proposed MK-ELM algorithm, 50 trials are conducted for each dataset and the average testing accuracy is recorded.

Fig. 6 a Confusion matrix for the five-category and b ROC curve for the five-category

Table 5 Dos confusion matrix
Predicted    Actual positive    Actual negative
Positive     5096 (TP)          43 (FP)
Negative     102 (FN)           8759 (TN)

Table 6 Results of the evaluation metrics for the five-category
Intrusion type    TPR (%)    FPR (%)    FNR (%)
Dos               98.04      0.49       1.96
Probe             95.67      0.47       4.33
R2L               76.12      0.11       23.88
U2R               50.00      0.00       50.00

Table 7 The detection rate of three methods
Intrusion type    Detection rate (%)
                  SVM      ELM      MK-ELM
Normal            97.73    97.92    99.12
Dos               96.24    97.15    98.03
Probe             93.75    94.54    95.74
R2L               55.26    65.03    76.15
U2R               30.73    23.02    50.00

Table 7 shows the average detection rate of the three algorithms over 50 runs when both the training dataset and the test dataset contain 14,000 samples. Results show that the detection rate of the SVM algorithm is comparable to that of the basic ELM algorithm. The detection rate of the proposed algorithm is the highest, about 2% higher than the other two algorithms. For comparison with the literature (Cheng et al. 2012), 1000, 2000, 4000, 8000, and 14,000 test data are selected. As presented in Table 8, the proposed MK-


ELM accuracy is higher than that of SVM and ELM under the same conditions. The time consumed by SVM, ELM, and MK-ELM is also measured; in each case, it refers to the total time spent in training and testing.

Table 8 The accuracy of three methods in different data size
Data size            SVM             ELM             MK-ELM
Training/testing     Accuracy (%)    Accuracy (%)    Accuracy (%)
1000/1000            97.58           96.83           97.85
2000/2000            98.31           97.07           98.88
4000/4000            98.69           97.00           98.92
8000/8000            98.02           96.79           98.79
14,000/14,000        97.02           96.43           98.34

For the timing experiments, 5000, 10,000, 15,000, 20,000, and 25,000 test data are selected. From the results presented in Fig. 7, we can observe that the basic ELM performs better than SVM in terms of time consumption. Moreover, due to the performance degradation of SVM, the proposed MK-ELM outperforms SVM in terms of speed and accuracy, which demonstrates that the proposed MK-ELM method scales better than SVM when classifying multi-class traffic for intrusion detection.

In the second phase of the experiment, the UNSW-NB 15 dataset is used to evaluate the performance of the proposed method; its statistical distribution is shown in Table 2. A different set of 14,000 pieces of data is randomly selected from each of the preprocessed training set and testing set. The binary-category model is evaluated by treating all kinds of attacks as a single attack class, to allow a comparison with SVM and basic ELM. The binary-category confusion matrices of the three algorithms on the UNSW-NB 15 dataset are shown in Tables 9, 10, and 11.

Table 9 Binary-category confusion of SVM
Predicted    Actual positive    Actual negative
Positive     7959 (TP)          105 (FP)
Negative     1547 (FN)          4389 (TN)

Table 10 Binary-category confusion of ELM
Predicted    Actual positive    Actual negative
Positive     7899 (TP)          172 (FP)
Negative     1522 (FN)          4407 (TN)

Table 11 Binary-category confusion of MK-ELM
Predicted    Actual positive    Actual negative
Positive     8437 (TP)          108 (FP)
Negative     998 (FN)           4457 (TN)

The comparison of the proposed MK-ELM model with the other models on UNSW-NB 15 is shown in Table 12. The accuracy of SVM and ELM is similar, while the accuracy of the proposed MK-ELM algorithm is the highest. However, compared with the NSL-KDD dataset, the accuracy of all three algorithms has decreased.

The labels of the UNSW-NB 15 dataset are also vectorized into 10 categories, and ten-category classification tests are performed. Samples are randomly selected from the preprocessed training set and testing set in amounts of 10,000, 15,000, 20,000, and 25,000, respectively. The accuracy of the three algorithms discussed above is depicted in Fig. 8.
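The preprocessing used for both datasets (vectorizing symbolic features and labels into binary vectors, then applying the min–max normalization of Eq. (15)) can be sketched as follows. This is a minimal NumPy illustration rather than the MATLAB code used in the experiments; the helper names `one_hot` and `min_max_normalize`, the handling of constant columns, and the toy numeric values are assumptions made for the example.

```python
import numpy as np

def one_hot(values, categories):
    """Map each symbolic value to a binary indicator vector over `categories`."""
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        out[row, index[v]] = 1.0
    return out

def min_max_normalize(X):
    """Column-wise Eq. (15): (x_i - x_min) / (x_max - x_min), giving values in [0, 1]."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = x_max - x_min
    # Assumption: a constant column is mapped to 0 instead of dividing by zero.
    span[span == 0] = 1.0
    return (X - x_min) / span

# Protocol types as in the text: TCP{1,0,0}, UDP{0,1,0}, ICMP{0,0,1}.
protocols = one_hot(["TCP", "UDP", "ICMP"], ["TCP", "UDP", "ICMP"])

# Ten-category labels in the order of Table 3.
UNSW_LABELS = ["Normal", "Analysis", "Backdoor", "Dos", "Exploits",
               "Fuzzers", "Generic", "Reconnaissance", "Shellcode", "Worms"]
labels = one_hot(["Normal", "Worms"], UNSW_LABELS)

# Toy numeric columns (hypothetical values) with very different ranges.
numeric = min_max_normalize([[0.0, 2.0e6], [5.0, 1.0e6], [10.0, 0.0]])
```

After normalization, both toy columns lie in [0, 1], so a feature with values in the millions no longer dominates one with values below ten.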

Table 12 Comparison of the proposed model with other models using UNSW-NB 15 in selected four evaluation indicators
Evaluation indicators (%)    SVM      ELM      MK-ELM
AC                           88.20    87.90    92.10
TPR                          83.73    83.84    89.42
FPR                          2.34     3.76     2.37
FNR                          16.27    16.16    10.58
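As a cross-check, the indicators in Table 12 can be recomputed from the confusion matrices in Tables 9, 10, and 11 with Eqs. (16)–(19), and the Dos row of Table 6 can be recomputed from Table 5 in the same way. The sketch below is an illustration only; the function name `detection_metrics` and the dictionary layout are assumptions, while the counts are copied from the tables.

```python
def detection_metrics(tp, fp, fn, tn):
    """Eqs. (16)-(19), returned as percentages."""
    total = tp + fp + fn + tn
    return {
        "AC": 100.0 * (tp + tn) / total,
        "TPR": 100.0 * tp / (tp + fn),
        "FPR": 100.0 * fp / (fp + tn),
        "FNR": 100.0 * fn / (fn + tp),
    }

# Counts taken from Tables 9-11 (UNSW-NB 15, binary) and Table 5 (Dos vs. rest).
confusion = {
    "SVM": dict(tp=7959, fp=105, fn=1547, tn=4389),
    "ELM": dict(tp=7899, fp=172, fn=1522, tn=4407),
    "MK-ELM": dict(tp=8437, fp=108, fn=998, tn=4457),
    "Dos (Table 5)": dict(tp=5096, fp=43, fn=102, tn=8759),
}

results = {
    name: {key: round(value, 2) for key, value in detection_metrics(**counts).items()}
    for name, counts in confusion.items()
}

for name, metrics in results.items():
    print(name, metrics)
```

The AC value computed for the Dos entry is a one-vs-rest accuracy for that class and is therefore not the same quantity as the five-category accuracy of 98.3% reported for the model as a whole.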

Fig. 7 Time consumption of three algorithms
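For reference, the selected multi-kernel function (0.3 RBF kernel + 0.7 multiquadric kernel, Table 4) and a kernel ELM classifier built on it can be sketched as follows. This is a sketch under stated assumptions, not the authors' MATLAB implementation: the kernel forms $\exp(-\|x-y\|^2/\text{rbf\_para})$ and $\sqrt{\|x-y\|^2 + \text{mq\_para}^2}$, the function names `multi_kernel`, `train_kelm`, and `predict_kelm`, and the toy data and toy parameter values are all assumptions; the training step uses the usual regularized kernel ELM closed form, in which the output weights solve $(I/C + \Omega)\,\beta = T$.

```python
import numpy as np

def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def multi_kernel(A, B, w_rbf=0.3, w_mq=0.7, rbf_para=100.0, mq_para=75.0):
    """Weighted sum of an RBF kernel and a multiquadric kernel (forms assumed)."""
    d2 = sq_dists(A, B)
    k_rbf = np.exp(-d2 / rbf_para)       # assumed RBF form
    k_mq = np.sqrt(d2 + mq_para ** 2)    # assumed multiquadric form
    return w_rbf * k_rbf + w_mq * k_mq

def train_kelm(X, T, C, **kernel_args):
    """Solve (I/C + Omega) beta = T, where Omega is the training kernel matrix."""
    omega = multi_kernel(X, X, **kernel_args)
    return np.linalg.solve(np.eye(len(X)) / C + omega, T)

def predict_kelm(X_new, X_train, beta, **kernel_args):
    """Class scores for new samples; the predicted class is the argmax per row."""
    return multi_kernel(X_new, X_train, **kernel_args) @ beta

# Toy two-class data with one-hot targets (hypothetical values).
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
T_train = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
toy = dict(rbf_para=1.0, mq_para=1.0)  # toy kernel parameters, not the paper's

beta = train_kelm(X_train, T_train, C=1e3, **toy)
scores = predict_kelm(np.array([[0.2, 0.5], [5.1, 5.4]]), X_train, beta, **toy)
predicted = scores.argmax(axis=1)
```

Training here amounts to a single linear solve over the training kernel matrix, with no iterative optimization or random hidden-layer weights.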


Fig. 8 The AC of three algorithms in UNSW-NB 15 dataset

When testing with the UNSW-NB 15 dataset, the proposed MK-ELM algorithm achieves the best overall performance, whether measured by the two-category accuracy, true positive rate, false positive rate, and false negative rate, or by the ten-category accuracy for the different data sizes. Taking into consideration the detection and evaluation indexes of the three algorithms and their respective time consumption on the different datasets, the MK-ELM algorithm proposed in this paper has significant advantages and is promising for WSN environments.

6 Conclusion remarks and future work

Based on the KELM algorithm and the multi-kernel theory, the optimal multi-kernel function 0.3 RBF kernel + 0.7 multiquadric kernel is chosen, and the intrusion detection algorithm MK-ELM is proposed for clustered WSN environments, yielding a hierarchical WSN intrusion detection system model. The classification algorithm is compared with the SVM-based multi-classification algorithm and the basic ELM algorithm. Simulation results show that the proposed method improves the detection rate in comparison with the basic ELM algorithm, dramatically shortens the detection time compared to the multi-classification algorithm of SVM, and thus addresses both the low detection rate of the basic ELM algorithm and the time-consuming detection of the SVM algorithm. It therefore provides a new approach for intrusion detection in WSNs. Even though this model has a high detection rate, it still needs to improve the detection of multiple intrusion patterns. In future work, we will focus on other classes of attacks in WSNs and further consider the energy consumption of communication among nodes, aiming at reducing network energy consumption and improving the overall performance of WSNs.

Acknowledgements Authors of this manuscript are grateful to the valuable comments provided by external reviewers and international experts for the improvement of technical and organization sections.

Funding This work is supported by the National Natural Science Foundation of China (Nos. 61672338 and 61873160).

Compliance with ethical standards

Conflict of interest All the authors declare that they have no conflict of interest.

Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.

References

Bao F, Chen R, Chang MJ et al (2012) Hierarchical trust management for wireless sensor networks and its applications to trust-based routing and intrusion detection. IEEE Trans Netw Serv Manag 9(2):169–183
Borkar GM, Patil LH, Dalgade D et al (2019) A novel clustering approach and adaptive SVM classifier for intrusion detection in WSN: a data mining concept. Sustain Comput Inform Syst 23:120–135
Butun I, Morgera SD, Sankar R (2014) A survey of intrusion detection systems in wireless sensor networks. IEEE Commun Surv Tutor 16(1):266–282
Cao LL, Huang WB, Sun FC (2014) Optimization-based extreme learning machine with multi-kernel learning approach for classification. IEEE Comput Soc 14:3564–3569
Cheng C, Tay WP, Huang GB (2012) Extreme learning machines for intrusion detection. In: the 2012 international joint conference on neural networks (IJCNN). IEEE, pp 1–8
Dai JJ, Tao Y, Yang FY (2018) A novel intrusion detection system based on IABRBFSVM for wireless sensor networks. Procedia Comput Sci 131:1113–1121
Girolami M (2002) Mercer kernel-based clustering in feature space. IEEE Trans Neural Netw 13(3):780–784
Han Z, Zhang W, Chen Z (2010) A Markov-based intrusion detection scheme for wireless sensor networks. Comput Eng Sci 9:009
Huang GB, Chen L (2007) Convex incremental extreme learning machine. Neurocomputing 70(16):3056–3062
Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(16):489–501
Huang G, Huang GB, Song S et al (2015) Trends in extreme learning machines: a review. Neural Netw 61:32–48
Huang SH, Chen WZ, Li J (2017) Network intrusion detection based on extreme learning machine and principal component analysis. J Jilin Univ (Inf Sci Ed) 35(5):576–583
Liang W, Li K-C et al (2019a) An industrial network intrusion detection algorithm based on multi-feature data clustering optimization model. IEEE Trans Ind Inform. https://doi.org/10.1109/TII.2019.2946791
Liang W, Tang M, Long J, Peng X, Xu J, Li K-C (2019b) A secure fabric blockchain-based data transmission technique for industrial internet-of-things. IEEE Trans Ind Inform 15(6):3582–3592


Maleh Y, Ezzati A, Qasmaoui Y et al (2015) A global hybrid intrusion detection system for wireless sensor networks. Procedia Comput Sci 52:1047–1052
Rani TP, Jayakumar C (2017) Unique identity and localization based replica node detection in hierarchical wireless sensor networks. Comput Electr Eng 64:148–162
Shone N, Ngoc TN, Phai VD et al (2018) A deep learning approach to network intrusion detection. IEEE Trans Emerg Top Comput Intell 2(1):41–50
Silva AAPD, Martins MH, Rocha BP et al (2005) Decentralized intrusion detection in wireless sensor networks. In: Proceedings of the 1st ACM international workshop on quality of service & security in wireless and mobile networks, pp 16–23
Tang J, Deng C, Huang GB (2016) Extreme learning machine for multilayer perceptron. IEEE Trans Neural Netw Learn Syst 27(4):809–821
Wang CR, Xu RF, Lee SJ et al (2018) Network intrusion detection using equality constrained-optimization-based extreme learning machines. Knowl Based Syst 147:68–80
Yin C, Zhu Y, Fei J et al (2017) A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access 5(2):21954–21961
Zhang Z (2014) Efficient computer intrusion detection method based on artificial bee colony optimized kernel extreme learning machine. Indones J Electr Eng Comput Sci 12(3):1954–1959
Zhang YT, Ma C, Li ZN et al (2014) Online modeling of kernel extreme learning machine based on fast leave-one-out cross-validation. Shanghai Jiaotong Univ (Sci) 48:641–646

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
