ABSTRACT Intrusion detection can identify unknown attacks from network traffic and has been an effective means of network security. Nowadays, existing methods for network anomaly detection are usually based on traditional machine learning models, such as KNN, SVM, etc. Although these methods can obtain some outstanding features, they achieve relatively low accuracy and rely heavily on manual design of traffic features, which has become obsolete in the age of big data. To solve the problems of low accuracy and feature engineering in intrusion detection, a traffic anomaly detection model, BAT, is proposed. The BAT model combines BLSTM (Bidirectional Long Short-Term Memory) and an attention mechanism. The attention mechanism is used to screen the network flow vector composed of packet vectors generated by the BLSTM model, which can obtain the key features for network traffic classification. In addition, we adopt multiple convolutional layers to capture the local features of traffic data. As multiple convolutional layers are used to process data samples, we refer to the BAT model as BAT-MC. The softmax classifier is used for network traffic classification. The proposed end-to-end model does not use any feature engineering skills and can automatically learn the key features of the hierarchy. It can well describe the network traffic behavior and improve the ability of anomaly detection effectively. We test our model on a public benchmark dataset, and the experimental results demonstrate that our model has better performance than other comparison methods.
INDEX TERMS Network traffic, intrusion detection, deep learning, BLSTM, attention mechanism.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020 29575
T. Su et al.: BAT: Deep Learning Methods on Network Intrusion Detection Using NSL-KDD Dataset
neural network with traffic data as image. This method does not need manually designed features, and directly takes the original traffic as the input data to the classifier. In [10], the authors provide an analysis of the viability of Recurrent Neural Networks (RNN) to detect the behavior of network traffic by modeling it as a sequence of states that change over time. In [11], the authors verify the performance of the Long Short-Term Memory (LSTM) network in classifying intrusion traffic. Experimental results show that LSTM can learn all the attack classes hidden in the training data. All the above methods treat the entire network traffic as a whole consisting of a sequence of traffic bytes. They don’t make full use of domain knowledge of network traffic. For example, CNN converts continuous network traffic into images for processing, which is equivalent to treating traffic flows as independent and ignoring the internal relations of network traffic. Firstly, network traffic has a hierarchical structure. Specifically, network traffic is a traffic unit composed of multiple data packets, and a data packet is a traffic unit composed of multiple bytes. Secondly, traffic features in the same and different packets are significantly different. Sequential features between different packets need to be extracted independently. In other words, not all traffic features are equally important for traffic classification in the process of extracting features on a certain network traffic.
However, few prior works have utilized the above-mentioned structure of network traffic. Inspired by these characteristics, in this paper, we propose and demonstrate our method to analyze network traffic in an overall view. Network traffic is generally collected at fixed time intervals. Repeating this collecting process m times, we can get the network traffic $X'$, where $X' = (x'_1, x'_2, \ldots, x'_m)$ is a matrix with m data packets. Each $x'$ represents a data packet, and each data packet is seen as a whole consisting of a sequence of traffic bytes. Before entering the data into the BAT model, the original data is preprocessed by multiple convolutional layers. Global features can be obtained as the number of convolutional layers increases. With the preprocessing, we get an abstract representation of network traffic $X$ from $X'$. In order to make full use of domain knowledge of network traffic, we propose a deep learning model BAT-MC that mainly combines bidirectional long short-term memory (BLSTM) [12] and attention mechanism [13]. BLSTM is used to learn the characteristics of each packet and get the vector corresponding to each packet. Attention mechanism is then used to perform feature learning on the sequence data composed of the packet vectors to obtain the fine-grained features. Up to now, we have finished the key feature extraction of network traffic via attention mechanism. The whole process of feature learning does not use any feature engineering skills. The automatically learnt key features can better describe the traffic behavior, which can effectively improve the anomaly detection capability. Finally, a fully connected network and a softmax function are applied to the obtained fine-grained features for anomaly detection. To verify the effectiveness of the BAT-MC network, it is comprehensively evaluated on the NSL-KDD dataset and gets the best results. The accuracy of the BAT-MC network can reach 84.25%, which is about 4.12% and 2.96% higher than the existing CNN and RNN models, respectively.
The following are some of the key contributions and findings of our work:
1) We propose an end-to-end deep learning model, BAT-MC, that is composed of BLSTM and an attention mechanism. BAT-MC can well solve the problem of intrusion detection and provides a new research method for intrusion detection.
2) We introduce the attention mechanism into the BLSTM model to highlight the key input. Attention mechanism conducts feature learning on sequential data composed of data packet vectors. The obtained feature information is reasonable and accurate.
3) We compare the performance of BAT-MC with traditional deep learning methods. The BAT-MC model can extract information from each packet. By making full use of the structure information of network traffic, the BAT-MC model can capture features more comprehensively.
4) We evaluate our proposed network on the real NSL-KDD dataset. The experimental results show that the performance of BAT-MC is better than the traditional methods.
The rest of the paper is organized as follows: In Section 2, we give a brief overview of the related work, especially how intelligent algorithms facilitate the development of intrusion detection. In Section 3, we present details of the proposed BAT-MC model. In Section 4, we explain the experimental setup and present our results. The performance of the BAT-MC model is compared with other machine learning methods both in binary classification and multiclass classification. Section 5 draws the conclusions.
II. RELATED WORKS
The intrusion detection technology can be divided into three major categories: pattern matching methods, traditional machine learning methods and deep learning methods.
At the beginning, people mainly used pattern matching algorithms for intrusion detection. The pattern matching algorithm [14], [15] is the core algorithm of intrusion detection systems based on feature matching. Most algorithms have been considered for use in the past. In [16], the authors make a summary of pattern matching algorithms in Intrusion Detection Systems: KMP algorithm, BM algorithm, BMH algorithm, BMHS algorithm, AC algorithm and AC-BM algorithm. Experiments show that the improved algorithm can accelerate the matching speed and has a good time performance. In [17], the Naive approach, the Knuth-Morris-Pratt algorithm and the Rabin-Karp algorithm are compared in order to check which of them is most efficient in pattern/intrusion detection. Pcap files have been used as datasets in order to determine the efficiency of the algorithms by taking into consideration their respective running times. These traditional pattern
recognition algorithms have serious defects, which cannot achieve the effect of intrusion detection. Finding an efficient algorithm that reaches high efficiency and low false positive rates is still the focus of current work. With the development of artificial intelligence, the application of intelligent algorithms for intrusion detection has become a new research hotspot.
The traffic anomaly detection methods based on machine learning have achieved a lot of success. In [18], the authors propose a new method of feature selection and classification based on support vector machine (SVM). Experimental results on the NSL-KDD cup 99 intrusion detection data set showed that the classification accuracy of this method with all training features reached 99%. In [19], the authors combine k-means clustering with a KNN classifier. The experimental results on the NSL-KDD dataset show that this method greatly improves the performance of the KNN classifier. In [20], the authors propose a new framework to combine misuse and anomaly detection, in which they apply the random forests algorithm. Experimental results show that the overall detection rate of the hybrid system is 94.7% and the overall false positive rate is 2%. In [21], the performance of the NSL-KDD dataset is evaluated via an Artificial Neural Network (ANN). The detection rates obtained are 81.2% and 79.9% for the intrusion detection and attack type classification tasks, respectively, on the NSL-KDD dataset. In [22], an intrusion detection method based on decision tree (DT) is proposed. Experimental results of feature selection using the correlation-based feature selection (CFS) subset evaluation method show that the DT based intrusion detection system has a higher accuracy. As described above, machine learning methods have been proposed and have achieved success for intrusion detection systems. However, these methods require large-scale preprocessing and complex feature engineering of traffic data. It is difficult to solve the massive intrusion data classification problem using traditional machine learning methods.
With the superior performance of deep learning in image recognition [23], [24] and speech recognition [25], [26], traffic anomaly detection methods based on deep learning have been proposed. In [27], the authors use Self-taught Learning (STL) on the NSL-KDD dataset for network intrusion detection. Testing results show that their 5-class classification achieved an average f-score of 75.76%. In [28], the authors propose an intrusion detection method using deep belief network (DBN) and probabilistic neural network (PNN). The experimental result on the KDD CUP 1999 dataset shows that the method performs better than the traditional PNN, PCA-PNN and unoptimized DBN-PNN. Similarly, [29] and [30] train the DBN as a classifier to detect intrusions. In [31], the authors propose a novel network intrusion detection model utilizing convolutional neural networks (CNNs). The CNN model not only reduces the false alarm rate (FAR) but also improves the accuracy of the class with small numbers. In [32], an artificial intelligence (AI) intrusion detection system using a deep neural network (DNN) is investigated and tested with the KDD Cup 99 dataset in response to ever-evolving network attacks. The results show a significantly high accuracy and detection rate, averaging 99%. However, current deep learning methods don’t make full use of the structured information of network traffic. Network traffic is essentially a kind of time series data. Similar to the structure of letters, words, sentences and paragraphs in natural language processing (NLP), network traffic is composed of multiple data packets and each data packet is a set of multiple bytes.
In this paper, drawing on the application methods of deep learning in NLP, we adopt phased processing. The BLSTM is used to learn the sequential features in the data packet to obtain a vector corresponding to each data packet. Then, the attention layer is used to perform feature learning on the sequential data composed of the packet vectors. Attention can select the characteristics that are helpful for more accurate network traffic classification and produce a network flow vector. Through the two learning phases of BLSTM and attention on the time series features, the BAT-MC model finally outputs a network flow vector, which contains structured information of network traffic. Hence, the BAT-MC model makes full use of the structure information of network traffic.
III. PROPOSED WORK
As shown in Figure 1, the BAT-MC model consists of five components, including the input layer, multiple convolutional layers, BLSTM layer, attention layer and output layer, from bottom to top. At the input layer, the BAT-MC model converts each traffic byte into a one-hot data format. Each traffic byte is encoded as an n-dimensional vector. After the traffic byte is converted into a numerical form, we perform normalization operations. At the multiple convolutional layers, we convert the numerical data into traffic images. The convolutional operation is used as a feature extractor that takes an image representation of the data packet. At the BLSTM layer, the BLSTM model, which connects the forward LSTM and the backward LSTM, is used to extract features on the traffic bytes of each packet. The BLSTM model can learn the sequential characteristics within the traffic bytes because BLSTM is suitable to the structure of network traffic. In the attention layer, the attention mechanism is used to analyze the importance of packet vectors to obtain fine-grained features which are more salient for malicious traffic detection. At the output layer, the features generated by the attention mechanism are then imported into a fully connected layer for feature fusion, which obtains the key features that accurately characterize network traffic behavior. Finally, the fused features are fed into a classifier to get the final recognition results.
A. DATA PREPROCESSING LAYER
There are three symbolic data types in NSL-KDD data features: protocol type, flag and service. We use a one-hot encoder to map these features into binary vectors.
One-Hot Processing: The NSL-KDD dataset is processed by the one-hot method to transform symbolic features into numerical features. For example, the second feature of the
FIGURE 1. The Architecture of BAT-MC model. The whole architecture is divided into five
parts.
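The preprocessing performed at the input layer (one-hot encoding of symbolic features, then scaling of numeric features into [0, 1]) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `one_hot` and `min_max_scale` and the sample numeric values are hypothetical, and the category order simply follows the tcp/udp/icmp example given in the text.

```python
def one_hot(value, categories):
    """Return a binary list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def min_max_scale(values):
    """Scale numeric values into [0, 1] as r' = (r - r_min) / (r_max - r_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: avoid division by zero (an assumption of this sketch).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Category order assumed to match the example in the text.
PROTOCOL_TYPES = ["tcp", "udp", "icmp"]

encoded = [one_hot(p, PROTOCOL_TYPES) for p in ["tcp", "udp", "icmp"]]
scaled = min_max_scale([0, 5, 10])  # hypothetical numeric feature column
```

In practice the minimum and maximum would be taken from the training set only and then reused for the test set, so that both are scaled consistently.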
NSL-KDD data sample is protocol type. The protocol type has three values: tcp, udp, and icmp. The one-hot method converts these values into binary codes that can be recognized by a computer, where tcp is [1, 0, 0], udp is [0, 1, 0], and icmp is [0, 0, 1].
Normalization Processing: The value of the original data may be too large, resulting in problems such as large values swamping small ones (loss of precision), data processing overflows, inconsistent weights and so on. We use standard scaler to normalize the continuous data into the range [0, 1]. Normalization processing eliminates the influence of the measurement unit on the model training, and makes the training result more dependent on the characteristics of the data itself. The formula is shown in equation (1) and equation (2).
$$ r' = \frac{r - r_{\min}}{r_{\max} - r_{\min}}, \tag{1} $$
$$ r_{\max} = \max\{r\}, \tag{2} $$
where $r$ stands for the numeric feature value, $r_{\min}$ stands for the minimal value of the feature, $r_{\max}$ stands for the maximal value, and $r'$ stands for the value after normalization.
B. MULTIPLE CONVOLUTIONAL LAYERS
After the above processing operations, the convolutional layer is used to capture the local features of traffic data. The convolutional layer [33], [34] is the most important part of the CNN, which convolves the input images (or feature maps) with multiple convolutional kernels to create different feature maps. According to [35], the shallower convolutional layers, whose receptive field is narrow, can extract local information, while the deeper layers can capture global information with a larger field of view. Hence, as the number of convolutional layers increases, the scale of the convolutional feature gradually becomes coarser. In this paper, the input of the convolutional layer can be formulated as a tensor of size H × W × 1, where H and W denote the height and width of the data yielded by normalization processing. Suppose we have a layer of N units as input, which is followed by a convolutional layer. If we use a filter w of width m, the convolutional output will have (N − m + 1) units. The convolutional calculation process is shown in equation (3).
$$ x^{l,j}_{i,k} = f\left(b_j + \sum_{a=1}^{m} w^{j}_{a,k}\, r^{l-1,j}_{i+(k-1)\times s+a-1}\right), \tag{3} $$
where $x^{l,j}_{i,k}$ is the ith unit of the jth feature map of the kth section in the lth layer, and s is the range of section. f is a non-linear mapping, usually the hyperbolic tangent function, tanh(·).
C. BLSTM LAYER
For the time series data composed of traffic bytes, BLSTM can effectively use the context information of data for feature learning. The BLSTM is used to learn the time series feature in the data packet. Traffic bytes of each data packet are sequentially input into a BLSTM, which finally obtains a packet vector. BLSTM is an enhanced version of LSTM (Long Short-Term Memory) [36], [37]. The BLSTM model is used to extract coarse-grained features by connecting forward LSTM and backward LSTM. LSTM is designed by the input
gate i, the forget gate f and the output gate o to control how to overwrite the information by comparing the inner memory cell C when new information arrives [38]. When information enters an LSTM network, we can judge whether it is useful according to relevant rules. Only the information that passes this check is retained, and inconsistent information is forgotten through the forget gate. Given an input sequence $x = (x_0, \ldots, x_t)$ at time t, the hidden states of a BLSTM layer, $h = (h_0, \ldots, h_t)$, can be derived as follows.
The forget gate takes the output of the hidden layer $h_{t-1}$ at the previous moment and the input $x_t$ at the current moment as input to selectively forget information in the cell state $C_t$, which can be expressed as:
$$ f_t = \mathrm{sigmoid}(W_{xf} x_t + W_{hf} h_{t-1} + b_f), \tag{4} $$
The input gate cooperates with a tanh function to control the addition of new information. tanh generates a new candidate vector. The input gate generates a value from 0 to 1 for each item in $\tilde{C}_t$ to control how much new information will be added, which can be expressed as:
$$ C_t = \mathrm{sigmoid}(f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t), \tag{5} $$
$$ i_t = \mathrm{sigmoid}(W_{xi} x_t + W_{hi} h_{t-1} + b_i), \tag{6} $$
$$ \tilde{C}_t = \tanh(W_c x_t + W_c h_{t-1} + b_c), \tag{7} $$
The output gate is used to control how much of the current unit state will be filtered out, which can be expressed as:
$$ o_t = \mathrm{sigmoid}(W_{xo} x_t + W_{ho} h_{t-1} + b_o), \tag{8} $$
For the BLSTM model at time t, the hidden state $h_t$, which is a packet vector generated from each packet, can be defined as the concatenation of $\overleftarrow{h}_t$ and $\overrightarrow{h}_t$, which can be expressed as:
$$ h_t = \overleftarrow{h}_t + \overrightarrow{h}_t, \tag{9} $$
$$ \overrightarrow{h}_t = \tanh(W_{x\overrightarrow{h}}\, x_t + W_{\overrightarrow{h}\overrightarrow{h}}\, \overrightarrow{h}_{t-1} + b_{\overrightarrow{h}}), \tag{10} $$
$$ \overleftarrow{h}_t = \tanh(W_{x\overleftarrow{h}}\, x_t + W_{\overleftarrow{h}\overleftarrow{h}}\, \overleftarrow{h}_{t-1} + b_{\overleftarrow{h}}), \tag{11} $$
where '·' denotes the pointwise product, x represents the input of the heterogeneous time series data, $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are the hidden states of the forward LSTM layer and backward LSTM layer at time t, all the matrices W are the connection weights between two units, and b are bias vectors.
FIGURE 2. The architecture of BLSTM model.
D. ATTENTION LAYER
BLSTM eventually generates a packet vector for each packet. These packet vectors are arranged in the order of interaction between the two parties in the network stream to form a sequence of packet vectors. The relationships within packet vectors will be learned by the attention layer. Similarly to [39], the attention mechanism is used to adjust the probability of packet vectors so that our model pays more attention to important features. Firstly, the packet vectors $h_t$ extracted by the BLSTM model are used to obtain their implicit representation $u_t$ through a nonlinear transformation, which can be expressed as:
$$ u_t = \tanh(W_w h_t + b_w), \tag{12} $$
We next measure the importance of packet vectors based on the similarity of the representation $u_t$ with a context vector $u_w$ and obtain the normalized importance weight coefficient $\alpha_t$. $u_w$ is a randomly initialized matrix that can focus on important information over $u_t$. The weight coefficient for the above coarse-grained features can be expressed as:
$$ \alpha_t = \frac{\exp(u_t^T u_w)}{\sum_t \exp(u_t^T u_w)}, \tag{13} $$
Finally, the fine-grained feature s can be computed via the weighted sum of $h_t$ based on $\alpha_t$. s can be expressed as:
$$ s = \sum_t \alpha_t h_t, \tag{14} $$
The fine-grained feature vector s generated from the attention mechanism is used for malicious traffic recognition with a softmax classifier, which can be expressed as:
$$ y = \mathrm{softmax}(W_h s + b_h), \tag{15} $$
where $W_h$ represents the weight matrix of the classifier, which can map s to a new vector with length h. h is the number of categories of network traffic.
E. MODEL TRAINING
Training the proposed network contains a forward pass and a backward pass.
Forward Propagation: The BAT-MC model is mainly composed of a BLSTM layer and an attention layer, each of which presents a different structure and thus plays a different role in the whole model. The forward propagation [40], [41] is conducted from the BLSTM layer to the attention layer. The input of the current model is obtained by the processing of the previous model. After the completion of forward propagation, the final recognition result is obtained. The NSL-KDD dataset is defined as $X$. The divided training dataset and testing dataset can be expressed as $x_1, x_2, x_3$. After the one-hot operation and normalization operation, every sample is converted into a format $X''$ that is acceptable to the BAT-MC model. Meanwhile, we set the cell state vector size as $S_{state}$. In summary, the abnormal traffic detection algorithm based on the BAT-MC model is summarized as Algorithm 1. The objective function of our model is the cross-entropy based cost
function [42]. The goal of training this model is to minimize the cross entropy of the expected and actual outputs for all activities. The formula is shown in (16):
$$ C = -\sum_{i} \sum_{j} \left[ y_i^j \ln a_i^j + (1 - y_i^j) \ln(1 - a_i^j) \right], \tag{16} $$
where i is the index of network traffic, j is the traffic category, a is the actual category of network traffic and y is the predicted category.
Algorithm 1 BAT-MC Intrusion Detection Algorithms
3 $x''_1, x''_2, x''_3$ = normalization($x'_1, x'_2, x'_3$);
4 conduct convolutional processing;
5 for t = 1; t ≤ T; do
6   create backward LSTM cell by $S_{state}$;
7   create forward LSTM cell by $S_{state}$;
8   connect BLSTM_net by backward LSTM cell and forward LSTM cell;
9   initialize BLSTM_net by seed;
10   get hidden states $h_t$ of the BLSTM_net;
11 end
12 add a full connection layer, whose value is 320;
13 add a dropout, whose value is 0.1;
14 for each hidden state in 1:$h_t$; do
15   obtain the implicit representation $u_t$ of $h_t$ through a nonlinear transformation;
16   generate a random initialization matrix $u_w$;
17   obtain the normalized importance weight coefficient $\alpha_t$;
18   get the fine-grained feature s via $\alpha_t$ and $h_t$;
19 end
20 add a full connection layer, whose value is 1024;
21 add a full connection layer, whose value is 10;
22 P = BAT-MC_net($X''$);
23 get Loss by $p_i$ and $y_i$;
24 update BAT-MC_net by Adam with loss and η
25 return accuracy, f1-score;
Backward Propagation: The model is trained with Adam [43]. The gradients used by Adam are calculated by the back-propagation algorithm. Error differentials are back-propagated with the forward-backward algorithm. Back-Propagation Through Time (BPTT) [44], [45] is applied to calculate the error differentials. In this paper, we use the BPTT algorithm to obtain the derivatives of the objective function with respect to all the weights, and minimize the objective function by stochastic gradient descent.
IV. EVALUATION
In this section, we first determine the parameters of BAT-MC to obtain the optimal model through experiments carried out on a public dataset: the NSL-KDD dataset [46], [47]. Then, we analyze the performance of the BAT-MC model. Finally, in order to verify the advancement and practicability of the BAT-MC model, we compare the performance of this model with some state-of-the-art works.
A. BENCHMARK DATASETS
The final result of network traffic anomaly detection is closely related to the dataset. The NSL-KDD dataset is an enhanced version of the KDD Cup 1999 dataset [48], [49], which is widely used in intrusion detection experiments. The NSL-KDD dataset not only effectively solves the inherent redundant records problems of the KDD Cup 1999 dataset but also makes the number of records reasonable in the training dataset and testing dataset. The NSL-KDD dataset is mainly composed of the KDDTrain+ training dataset and the KDDTest+ and KDDTest-21 testing datasets, which allows a reasonable comparison of the experimental results of different methods. As shown in Table 1, the NSL-KDD dataset contains normal records and four different types of abnormal records. The KDDTest-21 dataset is a subset of KDDTest+ and is more difficult to classify.
TABLE 1. Different classifications in the NSL-KDD dataset.
Network traffic is generally collected at fixed time intervals. Essentially, network traffic data is a kind of time series data. Network traffic is a traffic unit composed of multiple data packets. Each data packet is seen as a whole consisting of a sequence of traffic bytes. There are 41 features and 1 class label for every data packet. It can be described in the following form: $x = (b_0, \ldots, b_i, \ldots)$, where $b_i$ is the i-th feature in a data packet, and x represents the continuous features of a data packet. These features include basic features (1-10), content features (11-22) and traffic features (23-41) [50]. According to its characteristics, there are four types of attacks in this dataset: DoS (Denial of Service attacks), R2L (Remote to Local attacks), U2R (User to Root attacks), and Probe (Probing attacks).
B. EVALUATION METRIC
In this paper, Accuracy (A) is used to evaluate the BAT-MC model. In addition to accuracy, true positive rate (TPR) and false positive rate (FPR) are also introduced [51]. These three indicators are commonly used in the research field of network traffic anomaly detection, and their calculation formulas are shown below. True Positive (TP) represents the correct classification of an intruder. False Positive (FP) represents the incorrect classification of a normal user taken as an intruder. True Negative (TN) represents a normal
user classified correctly. False Negative (FN) represents an instance where the intruder is incorrectly classified as a normal user.
Accuracy represents the proportion of correctly classified samples to the total number of samples. The evaluation metric is defined as follows:
$$ \text{accuracy}, A = \frac{TP + TN}{TP + FP + FN + TN}. \tag{17} $$
True Positive Rate (TPR): as the equivalent of the Detection Rate (DR), it represents the percentage of the number of records correctly identified over the total number of anomaly records.
$$ DR = TPR = \frac{TP}{TP + FN}. \tag{18} $$
False Positive Rate (FPR) represents the number of records rejected incorrectly divided by the total number of normal records. The evaluation metric is defined as follows:
$$ FPR = \frac{FP}{FP + TN}. \tag{19} $$
C. EXPERIMENTAL SETTINGS
In order to test the performance of the BAT-MC model proposed in this paper, the NSL-KDD dataset is used for verification. The data samples of the NSL-KDD dataset are divided into two parts: one is used to build a classifier and is called the training dataset. The other is used to evaluate the classifier and is called the testing dataset. There are 125,973 records in the training set and 22,543 records in the testing set. Table 2 shows the distribution of training and testing records for the (normal/attack) type of network traffic.
TABLE 2. Distribution of training and testing records.
TABLE 3. Hyperparameters of the end-to-end learning model.
In the experiment of identifying malicious traffic, when there are 80 hidden nodes in the BAT-MC model, the accuracy of BAT-MC on the KDDTest+ dataset is higher. Meanwhile, the learning rate is set to 0.01 and the number of training epochs is 100. The confusion matrices generated by the BAT-MC model on the KDDTest+ dataset are shown in Figure 3 and Figure 4, which represent the experimental results of the BAT-MC model for the 2-class and 5-class classification, respectively. The experimental results show that most samples are concentrated on the diagonal of the confusion matrix, indicating that the overall classification performance is very high. However, it can be intuitively seen from the confusion matrix in Figure 3 that the BAT-MC network achieves good detection performance in distinguishing normal traffic from attack traffic (only 51 samples are false positives), but there is still further improvement in
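The evaluation metrics in equations (17) to (19) can be computed directly from the four cells of a 2-class confusion matrix. The sketch below is illustrative only: the function name `binary_metrics` and the counts passed to it are hypothetical and are not values from the experiments reported here.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, TPR, FPR) from confusion-matrix counts.

    tp: attacks classified as attacks      fn: attacks classified as normal
    fp: normal records flagged as attacks  tn: normal records classified as normal
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total  # equation (17)
    tpr = tp / (tp + fn)          # equation (18), also called detection rate
    fpr = fp / (fp + tn)          # equation (19)
    return accuracy, tpr, fpr


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
acc, tpr, fpr = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

With these counts, accuracy is 185/200, TPR is 90/100 and FPR is 5/100. A detector can reach a low FPR while still missing many attacks, which is why TPR is reported alongside it.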
This model effectively avoids the problem of manually designed features. The performance of the BAT-MC method is tested on the KDDTest+ and KDDTest-21 datasets. Experimental results on the NSL-KDD dataset indicate that the BAT-MC model achieves high accuracy. Comparisons with some standard classifiers show that the BAT-MC model’s results are very promising when compared to other current deep learning-based methods. Hence, we believe that the proposed method is a powerful tool for the intrusion detection problem.
REFERENCES
[1] B. B. Zarpelo, R. S Miani, C. T. Kawakani, and S. C. de Alvarenga, “A survey of intrusion detection in Internet of Things,” J. Netw. Comput. Appl., vol. 84, pp. 25–37, Apr. 2017.
[2] B. Mukherjee, L. T. Heberlein, and K. N. Levitt, “Network intrusion detection,” IEEE Netw., vol. 8, no. 3, pp. 26–41, May 1994.
[3] S. Kishorwagh, V. K. Pachghare, and S. R. Kolhe, “Survey on intrusion detection system using machine learning techniques,” Int. J. Control Automat., vol. 78, no. 16, pp. 30–37, Sep. 2013.
[4] N. Sultana, N. Chilamkurti, W. Peng, and R. Alhadad, “Survey on SDN based network intrusion detection system using machine learning approaches,” Peer-to-Peer Netw. Appl., vol. 12, no. 2, pp. 493–501, Mar. 2019.
[5] M. Panda, A. Abraham, S. Das, and M. R. Patra, “Network intrusion detection system: A machine learning approach,” Intell. Decis. Technol., vol. 5, no. 4, pp. 347–356, 2011.
[6] W. Li, P. Yi, Y. Wu, L. Pan, and J. Li, “A new intrusion detection system based on KNN classification algorithm in wireless sensor network,” J. Electr. Comput. Eng., vol. 2014, pp. 1–8, Jun. 2014.
[7] S. Garg and S. Batra, “A novel ensembled technique for anomaly detection,” Int. J. Commun. Syst., vol. 30, no. 11, p. e3248, Jul. 2017.
[8] F. Kuang, W. Xu, and S. Zhang, “A novel hybrid KPCA and SVM with GA model for intrusion detection,” Appl. Soft Comput., vol. 18, pp. 178–184, May 2014.
[9] W. Wang, M. Zhu, X. Zeng, X. Ye, and Y. Sheng, “Malware traffic classification using convolutional neural network for representation learning,” in Proc. Int. Conf. Inf. Netw. (ICOIN), 2017, pp. 712–717.
[10] P. Torres, C. Catania, S. Garcia, and C. G. Garino, “An analysis of Recurrent Neural Networks for Botnet detection behavior,” in Proc. IEEE
[19] H. Shapoorifard and P. Shamsinejad, “Intrusion detection using a novel hybrid method incorporating an improved KNN,” Int. J. Control Automat., vol. 173, no. 1, pp. 5–9, Sep. 2017.
[20] J. Zhang, M. Zulkernine, and A. Haque, “Random-forests-based network intrusion detection systems,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 38, no. 5, pp. 649–659, Sep. 2008.
[21] B. Ingre and A. Yadav, “2015 international conference on signal processing and communication engineering systems (spaces),” in Proc. Int. Conf. Signal Process. Commun. Eng. Syst., 2015, pp. 1–15.
[22] B. Ingre, A. Yadav, and A. K. Soni, “Decision tree based intrusion detection system for NSL-KDD dataset,” in Proc. Int. Conf. Inf. Commun. Technol. Intell. Syst., 2017, pp. 207–218.
[23] M. Asadi-Aghbolaghi, A. Clapes, M. Bellantonio, H. J. Escalante, V. Ponce-Lopez, X. Baro, I. Guyon, S. Kasaei, and S. Escalera, “A survey on deep learning based approaches for action and gesture recognition in image sequences,” in Proc. 12th IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), May 2017, pp. 476–483.
[24] Z. Yan, “Multi-instance multi-stage deep learning for medical image recognition,” Deep Learn. Med. Image Anal., pp. 83–104, Jan. 2017.
[25] Z. Zhang, J. Geiger, J. Pohjalainen, A. E.-D. Mousa, W. Jin, and B. Schuller, “Deep learning for environmentally robust speech recognition,” ACM Trans. Intell. Syst. Technol., vol. 9, no. 5, pp. 1–28, 2017.
[26] K. Noda, Y. Yamaguchi, K. Nakadai, H. G. Okuno, and T. Ogata, “Audio-visual speech recognition using deep learning,” Appl. Intell., vol. 42, no. 4, pp. 722–737, Jun. 2015.
[27] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, “A deep learning approach for network intrusion detection system,” in Proc. 9th EAI Int. Conf. Bio-Inspired Inf. Commun. Technol. (BIONETICS), 2016, pp. 21–26.
[28] G. Zhao, C. Zhang, and L. Zheng, “Intrusion detection using deep belief network and probabilistic neural network,” in Proc. IEEE Int. Conf. Comput. Sci. Eng. (CSE), IEEE Int. Conf. Embedded Ubiquitous Comput. (EUC), Jul. 2017, pp. 639–642.
[29] N. Gao, L. Gao, Q. Gao, and H. Wang, “An intrusion detection model based on deep belief networks,” in Proc. 2nd Int. Conf. Adv. Cloud Big Data, Nov. 2014, pp. 247–252.
[30] M. Z. Alom, V. Bontupalli, and T. M. Taha, “Intrusion detection using deep belief networks,” in Proc. Nat. Aerosp. Electron. Conf. (NAECON), Jun. 2015, pp. 247–252.
[31] K. Wu, Z. Chen, and W. Li, “A novel intrusion detection model for a massive network using convolutional neural networks,” IEEE Access, vol. 6, pp. 50850–50859, 2018.
[32] J. Kim, N. Shin, S. Y. Jo, and S. Hyun Kim, “Method of intrusion detection using deep neural network,” in Proc. IEEE Int. Conf. Big Data Smart Comput. (BigComp), Feb. 2017, pp. 313–316.
Biennial Congr. Argentina (ARGENCON), Jun. 2016, pp. 1–6. [33] A. Tatsuma and M. Aono, ‘‘Food image recognition using covariance of
[11] R. C. Staudemeyer and C. W. Omlin, ‘‘ACM press the south African insti- convolutional layer feature maps,’’ IEICE Trans. Inf. Syst., vol. E99.D,
tute for computer scientists and information technologists conference - east no. 6, pp. 1711–1715, 2016.
London, south Africa (2013.10.07-2013.10.09) proceedings of the south [34] Z. Yu, T. Li, G. Luo, H. Fujita, N. Yu, and Y. Pan, ‘‘Convolutional networks
African institute for computer scientists and information technologists co,’’ with cross-layer neurons for image recognition,’’ Inf. Sci., vols. 433–434,
in Proc. South African Inst. Comput. Scientists Inf. Technol. Conf., 2013, pp. 241–254, Apr. 2018.
pp. 252–261. [35] W. Luo, Y. Li, R. Urtasun, and R. Zemel, ‘‘Understanding the effective
[12] S. Cornegruta, R. Bakewell, S. Withey, and G. Montana, ‘‘Modelling radi- receptive field in deep convolutional neural networks,’’ in Proc. Adv.
ological language with bidirectional long short-term memory networks,’’ Neural Inf. Process. Syst., 2016, pp. 4898–4906.
in Proc. 7th Int. Workshop Health Text Mining Inf. Anal., 2016, pp. 1–11. [36] K. Greff, R. K. Srivastava, J. Koutnik, B. R. Steunebrink, and J. Schmidhu-
[13] O. Firat, K. Cho, and Y. Bengio, ‘‘Multi-way, multilingual neural machine ber, ‘‘LSTM: A search space odyssey,’’ IEEE Trans. Neural Netw. Learn.
translation with a shared attention mechanism,’’ in Proc. Conf. North Amer. Syst., vol. 28, no. 10, pp. 2222–2232, Oct. 2017.
Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol., 2016, pp. 1–10. [37] F. Ordóñez and D. Roggen, ‘‘Deep convolutional and LSTM recurrent
[14] H. Zhang, ‘‘Design of intrusion detection system based on a new pattern neural networks for multimodal wearable activity recognition,’’ Sensors,
matching algorithm,’’ in Proc. Int. Conf. Comput. Eng. Technol., Jan. 2009, vol. 16, no. 1, p. 115, Jan. 2016.
pp. 545–548. [38] F. A. Gers, J. Schmidhuber, and F. Cummins, ‘‘Learning to forget:
[15] C. Yin, ‘‘An improved BM pattern matching algorithm in intrusion Continual prediction with LSTM,’’ Neural Comput., vol. 12, no. 10,
detection system,’’ Appl. Mech. Mater., vols. 148–149, pp. 1145–1148, pp. 2451–2471, Oct. 2000.
Jan. 2012. [39] N. Pappas and A. Popescu-Belis, ‘‘Multilingual hierarchical attention net-
[16] P.-F. Wu and H.-J. Shen, ‘‘The research and amelioration of pattern- works for document classification,’’ in Proc. IJCNLP, 2017, pp. 1–11.
matching algorithm in intrusion detection system,’’ in Proc. IEEE 14th Int. [40] Y. Hua, Z. Zhao, R. Li, X. Chen, Z. Liu, and H. Zhang, ‘‘Deep learning
Conf. High Perform. Comput. Commun., IEEE 9th Int. Conf. Embedded with long short-term memory for time series prediction,’’ IEEE Commun.
Softw. Syst., Jun. 2012, pp. 1712–1715. Mag., vol. 57, no. 6, pp. 114–119, Jun. 2019.
[17] V. Dagar, V. Prakash, and T. Bhatia, ‘‘Analysis of pattern matching algo- [41] S. Iamsa-at and P. Horata, ‘‘Handwritten character recognition using his-
rithms in network intrusion detection systems,’’ in Proc. 2nd Int. Conf. Adv. tograms of oriented gradient features in deep learning of artificial neural
Comput., Commun., Autom. (ICACCA), Sep. 2016, pp. 1–5. network,’’ in Proc. Int. Conf. IT Converg. Secur. (ICITCS), Dec. 2013.
[18] M. S. Pervez and D. M. Farid, ‘‘Feature selection and intrusion classifi- [42] A. Boubezoul and S. Paris, ‘‘Application of global optimization meth-
cation in NSL-KDD cup 99 dataset employing SVMs,’’ in Proc. 8th Int. ods to model and feature selection,’’ Pattern Recognit., vol. 45, no. 10,
Conf. Softw., Knowl., Inf. Manage. Appl. (SKIMA, Dec. 2014, pp. 1–6. pp. 3676–3686, Oct. 2012.
[43] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980. [Online]. Available: https://arxiv.org/abs/1412.6980
[44] M. Zeng, L. T. Nguyen, B. Yu, O. J. Mengshoel, J. Zhu, P. Wu, and J. Zhang, “Convolutional neural networks for human activity recognition using mobile sensors,” in Proc. 6th Int. Conf. Mobile Comput., Appl. Services, 2014, pp. 197–205.
[45] A. Graves, S. Fernández, and F. Gomez, “Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks,” in Proc. Int. Conf. Mach. Learn., 2006, pp. 369–376.
[46] S. Revathi and A. Malathi, “A detailed analysis on NSL-KDD dataset using various machine learning techniques for intrusion detection,” Int. J. Eng. Res. Technol., vol. 2, no. 12, pp. 1848–1853, 2013.
[47] D. H. Deshmukh, T. Ghorpade, and P. Padiya, “Improving classification using preprocessing and machine learning algorithms on NSL-KDD dataset,” in Proc. Int. Conf. Commun., Inf. Comput. Technol. (ICCICT), Jan. 2015, pp. 1–6.
[48] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed analysis of the KDD CUP 99 data set,” in Proc. IEEE Symp. Comput. Intell. Secur. Defense Appl., Jul. 2009, pp. 1–6.
[49] V. Engen, J. Vincent, and K. Phalp, “Exploring discrepancies in findings obtained with the KDD Cup ’99 data set,” Intell. Data Anal., vol. 15, no. 2, pp. 251–276, Mar. 2011.
[50] L. Dhanabal and S. P. Shantharajah, “A study on NSL-KDD dataset for intrusion detection system based on classification algorithms,” vol. 4, no. 6, pp. 446–452, 2015.
[51] E. M. Stock, J. D. Stamey, R. Sankaranarayanan, D. M. Young, R. Muwonge, and M. Arbyn, “Estimation of disease prevalence, true positive rate, and false positive rate of two screening tests when disease verification is applied on only screen-positives: A hierarchical model using multi-center data,” Cancer Epidemiol., vol. 36, no. 2, pp. 153–160, Apr. 2012.
[52] C. Yin, Y. Zhu, J. Fei, and X. He, “A deep learning approach for intrusion detection using recurrent neural networks,” IEEE Access, vol. 5, pp. 21954–21961, 2017.
[53] T. A. Tang, L. Mhamdi, D. Mclernon, S. A. R. Zaidi, and M. Ghogho, “Deep learning approach for network intrusion detection in software defined networking,” in Proc. Int. Conf. Wireless Netw. Mobile Commun. (WINCOM), Oct. 2016.
[54] Y. Ding and Y. Zhai, “Intrusion detection system for NSL-KDD dataset using convolutional neural networks,” in Proc. 2nd Int. Conf. Comput. Sci. Artif. Intell. (CSAI), 2018, pp. 81–85.

TONGTONG SU was born in 1992. He received the master’s degree in computer science from the School of Computer and Information Engineering, Tianjin Normal University, in 2019. His main research interests include machine learning and pattern recognition.

HUAZHI SUN received the Ph.D. degree from the University of Science and Technology of Beijing, China, in 2008. He is currently a Professor with the School of Computer and Information Engineering, Tianjin Normal University, China. His main research interests include mobile computing and distributed computing.

JINQI ZHU received the Ph.D. degree in computer science from the University of Electronic Science and Technology of China (UESTC), China, in 2009. In 2013, she joined Nanyang Technological University (NTU) as a Visiting Scholar, under the supervision of Dr. Y. G. Wen. She is currently an Associate Professor with the School of Computer and Information Engineering, Tianjin Normal University, China. Her main research interests include mobile computing, vehicular networks, and security networks.

SHENG WANG is currently pursuing the master’s degree with the Academy of Computer and Information Engineering, Tianjin Normal University, China. His current research interests include network technology, big data analysis, and deep learning.

YABO LI is currently pursuing the master’s degree in computer application technology with Tianjin Normal University. Her main research interests include wireless self-organizing networks and mobile computing.