A network intrusion detection method based on semantic Re-encoding and
ARTICLE INFO

Keywords: Intrusion detection; Semantic re-encoding; Deep learning

ABSTRACT

In recent years, with the increase of human activities in cyberspace, intrusion events such as network penetration, detection and attack have become more frequent and more hidden. Traditional rule-based intrusion detection methods are not enough to deal with increasingly complex network intrusion flows, while the generalization ability of intrusion detection systems based on classical machine learning methods is still insufficient and their false alarm rate is high. To address this problem, we observe that normal network traffic and intrusion network traffic differ clearly in several semantic dimensions, even though intrusion traffic is becoming more and more covert. We therefore propose a new intrusion detection method, named SRDLM, based on semantic re-encoding and deep learning. The SRDLM method re-encodes the semantics of network traffic to increase the distinguishability of traffic, and uses deep learning to enhance the generalization ability of the algorithm, thus effectively improving its accuracy and robustness. The accuracy of the SRDLM algorithm for Web character injection network attack detection is over 99%. On the NSL-KDD data set, the average performance is improved by more than 8% compared with traditional machine learning methods.
∗ Corresponding author.
E-mail addresses: [email protected] (Z. Wu), [email protected] (J. Wang), [email protected] (L. Hu), [email protected] (Z. Zhang), [email protected] (H. Wu).
https://doi.org/10.1016/j.jnca.2020.102688
Received 28 November 2019; Received in revised form 18 March 2020; Accepted 29 April 2020
Available online 5 May 2020
1084-8045/© 2020 Elsevier Ltd. All rights reserved.
Z. Wu et al. Journal of Network and Computer Applications 164 (2020) 102688
deep learning. Semantic re-encoding technology attempts to re-express the semantic space of intrusion traffic in order to increase the distinguishability of abnormal traffic. On the basis of semantic re-encoding, deep learning technology is used to enhance the generalization ability of the intrusion detection model. The main contributions of this work are as follows:

1. We find that the semantics of network traffic are different. Normal network traffic and attack network traffic often have significant differences in narrative semantics. Based on this, a semantic re-encoding method for intrusion network flow is designed, which can effectively increase the distinguishability of abnormal network traffic.
2. We design a deep learning-based detection model for intrusion traffic, which enhances the generalization capabilities of intrusion detection models.

Experimental results show that our approach achieves competitive performance.
The rest of this paper is organized as follows: Section 2 introduces the related works. Section 3 describes our proposed method in detail. Section 4 shows experimental performances. Finally, the conclusion is presented in Section 5.

the performance may not be so good on other intrusion datasets. Gao et al. (2019) apply several base classifiers, including decision tree, random forest, kNN and DNN, and design an ensemble adaptive voting algorithm. Zhang et al. (2018) propose an XGBoost method based on a stacked sparse autoencoder network (SSAE-XGB). The SSAE is used to learn the latent representation of the original data, and the ensemble binary tree does the classification. A weak point of this method is the interpretability of its features.
Yang et al. (2019) apply DCGAN to generate new samples and then use a modified LSTM classifier to classify the datasets. This method showed that augmenting the dataset is important. However, a GAN needs huge compute resources, and the generated data lacks interpretability.
Compared with previous works, we propose a combined algorithm based on semantic re-encoding and deep learning. The semantic re-encoding is an effective feature extraction algorithm for intrusion detection with good feature interpretability, and the deep learning provides good generalization ability.

3. Proposed methodology

3.1. Problem formulation
Internet uses the HTTP protocol to express its semantics in the form of the character stream. Furthermore, the network traffic of some non-HTTP protocols can also be analyzed through its protocol format and then be transformed into a character stream.
Thus, step 1 is "Transform data to character stream".
Step 2, Word Segmentation. A word table needs to be extracted from the raw character stream. We use delimiters such as breakpoint punctuation and special characters to split the character stream into word strings, and then collect the words into a word table. For example, in the HTTP protocol, '&', '|', '\', '?', '‖' are delimiters. As delimiters vary with the network protocol, the delimiter set needs to be updated continuously. According to the original word list, all records in the character stream are converted into word arrays, and each access string is rearranged into a record composed of words in the word list. Abnormal and normal samples from the access traffic are processed in this way, forming a sample record set with segmented words.
Step 3, Word Table Reordering. We want to get a new word table onto which the original word table is mapped. As shown in Fig. 1, the new word table is ordered by the difference between positive and negative samples. We therefore calculate the difference between the positive and the negative samples and define it as the comprehensive word frequency (CWF). We calculate the positive and negative Word Frequency (WF) in the original word table separately and sort the word array according to the word frequency. If a word appears multiple times in a row, it is counted once. We set thresholds T1 and T2; it can be seen from Fig. 1 that T1 > 0 and T2 < 0. If CWF > T1 or CWF < T2, the WF difference between positive and negative samples is large, and we do one-to-one re-encoding. If T2 < CWF < T1, we do many-to-one re-encoding: the multiple words with T2 < CWF < T1 are combined into a single word WordM. Unknown words are also encoded as WordM. Sorting the original word table by CWF, we get the new word table, named Word-table-after.
Step 4, Word Re-mapping. Positive and negative sample records are remapped to new equal-length sample records based on the new word table (Word-table-after). As shown in Fig. 2, in the whole mapping process multiple words can be mapped to one word, which is called word bag mapping. When n words in the sample map to the same word, the value at the corresponding position of that word is increased by n. The mapping procedure turns any word sequence of unequal length into a word sequence of equal length. When all the positive and negative samples have been re-mapped, we get a set of training samples of equal length.
Step 5, Word Sequence Re-projection. Word sequence re-projection can be used as an optional step to make the new word sequence more distinguishable in semantic space. The re-projection function can be as follows:

$$S = \{(x_i, y_i)\}, \quad 1 \le i \le m, \quad y \in \{0, 1\} \tag{1}$$

Here m is the number of training samples, and $x_i$ is a record (word sequence) in the training set. The length of the word bag is n, and $y \in \{0, 1\}$ is the label of the sample.
First, calculate the means $\mu_j$ of the positive and negative samples respectively, where j is the class type (positive or negative) and $m_j$ is the number of samples of class j.

$$\mu_j = \frac{1}{m_j} \sum_{(x_i, y_i) \in S,\; y_i = j} x_i, \quad j \in \{0, 1\}, \quad 1 \le i \le m \tag{2}$$

Second, calculate the intra-class dispersion matrix $S_w$ of the positive and negative samples. $X_0$ is the set of samples in the positive set, and $X_1$ is the set of samples in the negative one.

$$S_w = \sum_{x \in X_0} (x - \mu_0)(x - \mu_0)^T + \sum_{x \in X_1} (x - \mu_1)(x - \mu_1)^T \tag{3}$$

Third, calculate the inter-class dispersion matrix between the positive and negative samples.

$$S_d = (\mu_0 - \mu_1)(\mu_0 - \mu_1)^T \tag{4}$$

Finally, calculate the projection vector w that separates the positive and negative sample spaces. w is the eigenvector corresponding to the largest eigenvalue of the matrix $S_w^{-1} S_d$. Then we do the dimensionality reduction operation, and the new feature space is $w^T x_i$.

$$S_1 = \{(x_{i1}, y_{i1})\}, \quad 1 \le i1 \le m, \quad y \in \{0, 1\} \tag{5}$$

$x_{i1}$ is a record in the new training set, and its dimension is n1. If the previous semantic re-encoding already reflects the difference between normal and attack traffic well, then this step can further widen the spatial distance between the positive and negative samples. The algorithm is formalized in Algorithm 1.
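Steps 2 to 4 above can be illustrated with a minimal sketch. It is not the authors' implementation: the delimiter set, the normalization of WF (count divided by the total word count of the class), the definition CWF = WF(positive) - WF(negative), the threshold values and all function names are assumptions made only for this example.

```python
import re
from collections import Counter

# Assumed delimiter set for HTTP-like strings; the paper notes that the
# real set depends on the protocol and must be maintained.
DELIMITERS = re.compile(r"[&|\\?=\s]+")

def segment(record):
    """Step 2: split a character stream into a list of words."""
    return [w for w in DELIMITERS.split(record) if w]

def collapse_runs(words):
    """A word repeated several times in a row is counted once."""
    return [w for i, w in enumerate(words) if i == 0 or words[i - 1] != w]

def word_frequency(records):
    """Relative word frequency over one class (assumed normalization)."""
    counts = Counter()
    for record in records:
        counts.update(collapse_runs(segment(record)))
    total = sum(counts.values()) or 1
    return {w: c / total for w, c in counts.items()}

def build_word_table(pos_records, neg_records, t1, t2):
    """Step 3: keep words with CWF > T1 or CWF < T2, ordered by CWF.

    Every other word (and any unknown word) is later merged into WordM.
    """
    wf_pos = word_frequency(pos_records)
    wf_neg = word_frequency(neg_records)
    cwf = {w: wf_pos.get(w, 0.0) - wf_neg.get(w, 0.0)
           for w in set(wf_pos) | set(wf_neg)}
    kept = sorted((w for w, v in cwf.items() if v > t1 or v < t2),
                  key=lambda w: (-cwf[w], w))
    return {w: i for i, w in enumerate(kept)}

def remap(record, table):
    """Step 4: map a record to an equal-length word-bag vector.

    The last position is WordM; n words mapping to one slot add n to it.
    """
    wordm = len(table)
    vector = [0] * (wordm + 1)
    for word in segment(record):
        vector[table.get(word, wordm)] += 1
    return vector

# Hypothetical toy traffic: positive = normal, negative = attack.
normal = ["id=12&page=home", "id=7&page=news"]
attack = ["id=1 union select&page=home", "id=1 union select password"]

table = build_word_table(normal, attack, t1=0.1, t2=-0.1)
print(table)
print(remap("id=1 union select&page=home", table))
print(remap("id=99&page=about", table))
```

Whatever the length of the input string, every record is mapped to a vector of the same length (the number of kept words plus one slot for WordM), which is what allows a fixed-size classifier input.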
n = v − vn (7)
s=v∗v (8)
Sometimes, even for the same type of attack, the network traffic it generates has many different types of characteristics. For example, in the NSL-KDD attack dataset, the DoS attack type is not balanced. As shown in Fig. 8, the DoS and Normal data sets are processed by the PCA (principal component analysis) algorithm for easy observation. The red dots are the DoS dataset, and the blue dots are the Normal dataset. It can be seen from Fig. 8 that the red dots overlap the blue dots, and that the DoS data are distributed in three different regions. It is difficult for the learning algorithm to learn three different distribution areas for the same type of attack traffic at the same time.
We describe the steps of Algorithm 3 in the following.
The first step in Algorithm 3 is to use K-means to get more negative sets. In other words, we separate the negative sets into more groups. The positive sample set is still one group. For example, we change the attack type DoS into DoS1, DoS2, and DoS3. The purpose of this step is to make the distribution of each sample set closer to a multi-dimensional normal distribution.
The second step: multi-type binary classifiers (BCs) are used to divide the positive and negative samples. Each BC consists of a pre-projection module + a specific training set + a ResNet network. We assume that the pre-projection module runs first and that the resulting network flow feature space has a better, more specific feature distribution. The new distribution can be taken as a multidimensional normal distribution, and it can be effectively distinguished by the BCs. Although ResNet can be stacked to very deep layers, we think that even a network that is not very deep is enough to handle this problem. The structure of the BCs is shown in Fig. 9. There are several BCs and each of them classifies one type of sample. For an incoming connection x, the BCs make their decisions: if BCi considers that x is normal, we define BCi(x) = 1, and if it considers that x is abnormal, we define BCi(x) = 0.
The third step is to collect the results of the BCs, and then to calculate the sum of the results.
The final step is to judge if the result is more than 1, then the x is a normal sample, or else the x is an attack sample. That is, if any of the four outcomes votes the connection to be an attack type, then we make the decision that the connection is an attack. The transformation function is as follows:

$$\sum_{i=0}^{3} BC_i(x) \ge 1 \tag{9}$$

We formalize this algorithm in Algorithm 3.

Algorithm 3 Multi-space Projection algorithm.
1: Pre-project and get several training sets.
2: Train the BCs for classification using the different training sets.
3: Aggregate the results of the BCs.
4: Final judgement.

4. Experimental results and analysis

This experiment examines two data sets. One is the dataset collected especially for web attacks in the Hangdian Security Lab, which contains both normal and abnormal HTTP streams and is named Hduxss_data1.0. The other is NSL-KDD, which is considered to be the benchmark evaluation data set in the field of intrusion detection. The experiment is performed on PyTorch 1.0 using a computer with a 2080 Ti GPU, Ubuntu 18.04 as the operating system, and 32 GB of memory.

4.1. Hduxss_data1.0 data set

The Hduxss_data1.0 data set consists of three types of data:

1. Attack samples generated by SQLMAP (SQLMAP is an open source penetration testing tool that automates the process of detecting and exploiting SQL injection flaws and taking over database servers). The total number of this type is 810,000.
2. Attack samples manually and automatically generated by the XSSLESS tool; the total number of this type is 11,000.
3. Normal request samples collected through the Firefox browser while browsing various web pages. We extract parameters from them and obtain 130,000 normal samples.

For normal samples and abnormal samples, we randomly select 50% of the data for training and 50% of the data for testing, and repeat 10 rounds of testing to obtain the average. The results are shown in Table 3. In the test process, all data is processed by SRDLM Algorithm 1, and the outputs are then classified by support vector machine, naive Bayes, and SRDLM Algorithm 3. It can be seen from Table 3 that all the machine learning methods involved in the test achieved accuracy and F1 values of more than 99%, which is mainly because Algorithm 1 has a good feature separation effect on the above attack traffic. At the same time, the SRDLM algorithm achieved the best recognition performance compared with the classical machine learning algorithms. The
Table 3
The result of Hduxss_data1.0 dataset.

Method                           Accuracy   Precision   Recall   F1
Algorithm 1 + SVM                99.3%      99.4%       99.2%    0.993
Algorithm 1 + Naive Bayes        99.0%      99.1%       99.1%    0.991
SRDLM (including Algorithm 1)    99.5%      99.7%       99.5%    0.996
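The F1 column of Table 3 is consistent with the usual definition of F1 as the harmonic mean of precision and recall. The short check below recomputes it from the precision and recall columns; the function name is chosen for this example only, and the standard F1 definition is assumed since the paper does not restate it.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from Table 3.
table3 = {
    "Algorithm 1 + SVM": (0.994, 0.992),
    "Algorithm 1 + Naive Bayes": (0.991, 0.991),
    "SRDLM (including Algorithm 1)": (0.997, 0.995),
}

for method, (precision, recall) in table3.items():
    print(f"{method}: F1 = {f1_score(precision, recall):.3f}")
```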
Table 4
The data distribution of KDDTrain+ and KDDTest+.

             Total     Normal   DoS      Probe    R2L    U2R
KDDTrain+    125973    37343    45927    11656    995    52
KDDTest+     22544     9711     7458     2421     2754   200
Table 5
The correct predicted number of all types on KDDTest+.

Type      kNN     kNN rate   Shallow CNN   Shallow CNN rate   ResNet   ResNet rate
Normal    9478    97.60%     9126          96.38%             8067     89.04%
DoS       6094    87.71%     6315          84.67%             7099     95.19%
Probe     1863    76.95%     2116          87.40%             2353     97.19%
R2L       149     5.41%      70            2.54%              1036     37.62%
U2R       115     57.5%      112           56%                119      59.5%
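As a reading aid, the rate columns of Table 5 relate the number of correctly predicted samples to the per-class totals of KDDTest+ in Table 4. The sketch below shows this arithmetic for the attack classes in the ResNet column; the dictionary and function names are chosen for this example only.

```python
# Per-class sizes of KDDTest+ (Table 4) and correctly predicted counts
# for the ResNet column (Table 5), attack classes only.
kddtest_totals = {"DoS": 7458, "Probe": 2421, "R2L": 2754, "U2R": 200}
resnet_correct = {"DoS": 7099, "Probe": 2353, "R2L": 1036, "U2R": 119}

def detection_rate(correct, total):
    """Share of a class that was predicted correctly, in percent."""
    return 100.0 * correct / total

for attack_type, total in kddtest_totals.items():
    rate = detection_rate(resnet_correct[attack_type], total)
    print(f"{attack_type}: {rate:.2f}%")
```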
the most to the determination of the sample type. Then, the performance of the classifiers on R2L with respect to features 34, 35 and 36 was studied. The result is shown in Fig. 12. In Fig. 12, the x-axis, y-axis, and z-axis are features 34, 35, and 36, named 'dst_host_same_srv_rate', 'dst_host_diff_srv_rate' and 'dst_host_same_src_port_rate', respectively. There are three types of dots in the figure. The green dots and the orange dots are the R2L test samples that were wrongly and correctly predicted, respectively, by the models we trained, and the red dots are the R2L training samples. The abbreviated names of the three types of dots in Fig. 12 are 'R2L_test_F', 'R2L_test_T' and 'R2L_train'. The orange dots in Fig. 12(c) clearly far outnumber those in Fig. 12(a) and (b). From Fig. 12(c), it can be seen that the trained ResNet model learns some of the attack traffic when feature 34 is in (0–0.6), feature 35 is in (0–1) and feature 36 is in (0–0.7). For kNN and Shallow CNN, in Fig. 12(a) and (b), correctly predicted samples can hardly be seen. We then created 5000 new samples whose features 34, 35 and 36 were in the intervals (0–0.6), (0–1) and (0–0.7), respectively, while the other features were the same as those of one of the R2L samples in the training set. From Fig. 13, 67% of these samples were predicted correctly by the ResNet model. In the same test on the kNN and Shallow CNN models, 7% of the samples were predicted correctly by the kNN model, while all 5000 created samples were predicted as the opposite type by the Shallow CNN model. Based on these experiments, we infer that the kNN model and the Shallow CNN model do not learn the mapping from features 34, 35 and 36 to the classification result. In contrast, the ResNet model successfully generalizes from the training samples to the nearby region. However, there are still regions to which it does not generalize well, which means the ResNet model still has some limitations in generalization. Compared with the other two models, the ResNet model has better generalization ability.
Table 6
The architecture of the ResNet used in the proposed method.

Output map size   26∗26   13∗13   7∗7
#layers           1+2n    2n      2n
#filters          2       4       8

Table 7
Accuracy for binary classification based on training data.

Depth of ResNet   Accuracy
ResNet_8          97
ResNet_20         97.35
ResNet_56         97.5
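The layer counts in Table 6 and the depths in Table 7 can be related by a small calculation. The sketch below assumes, following the CIFAR-style ResNet of He et al. (2016), that the quoted depth is the 6n + 1 convolutional layers of Table 6 plus one final fully connected layer; that extra layer is an assumption of this example and is not stated in the tables. The function names are illustrative.

```python
def resnet_depth(n):
    """Total depth for n residual blocks per stage (two layers per block)."""
    conv_layers = (1 + 2 * n) + 2 * n + 2 * n  # the three stages of Table 6
    return conv_layers + 1  # assumed final fully connected layer

def blocks_per_stage(depth):
    """Inverse mapping: recover n from a depth of the form 6n + 2."""
    if depth < 8 or (depth - 2) % 6 != 0:
        raise ValueError("depth must be of the form 6n + 2 with n >= 1")
    return (depth - 2) // 6

for depth in (8, 20, 56):
    print(f"ResNet_{depth}: n = {blocks_per_stage(depth)}")
```

Under this assumption, ResNet_8, ResNet_20 and ResNet_56 correspond to n = 1, 3 and 9.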
20, 56-layer networks. From Table 7, it can be seen that deepening the network brings hardly any improvement in accuracy. It can be speculated that the depth of the 8-layer ResNet network is enough, and that the remaining data that could not be accurately identified is data that the training set does not cover at all. A deeper network also means more time. As a result, we chose the 8-layer ResNet in the following experiments.

4.2.3. Evaluating the impact of the semantic re-encoding and re-mapping

Three kinds of dimensions are employed to test the performance of the feature re-encoding algorithm: 40∗4, 172∗4 and 194∗4. For the 40∗4 dimension, we remove the unusable feature 7 from the training and test sets, and then copy 3 times to form a 40∗4 dimension. For the 172∗4 and 194∗4 dimensions, Algorithm 2 is used to cut the two dimensions to 676 dimensions, which can be transformed into 26∗26.
In this section, we use the 40∗4, 172∗4 and 194∗4 dimension datasets to evaluate the impact of re-encoding on the binary classification of NSL-KDD.
First, KDDTrain+ is used as the training set, and KDDTest+ is used as the test set. We set the batch size of the training set to 16384. The learning rate is 0.01, the optimization method is Adam, the weight decay is 0.1, and the number of epochs is 10000. It can be seen from Fig. 14 that the accuracy of the 172∗4 and 194∗4 dimensions is better than that of the 40∗4 dimension. This indicates the effectiveness of semantic re-encoding.
There still seem to be some false alarms and missed alarms. In order to improve generalization, we create some predicted data according to the features of intrusion traffic and put them into the training set. We then use this new training dataset for training. The hyper-parameters are the same as before. As shown in Fig. 14, the semantically re-encoded data set performed better than the original data set.
We want to find out why there are samples that cannot be classified correctly. So we mixed KDDTrain+ with KDDTest+ and split the result into 10 folds with the same distribution of the different types of network traffic. Each time, 9 of them are used as the training set and 1 is used as the test set. After 10 rounds of training and testing, we get the average accuracy. From Fig. 14, it can be seen that the cross-validated detection rate is clearly higher than the other two, and the re-encoded dataset is a little better than the 40∗4 dimension. The reason may be that the samples in KDDTrain+ are not very typical. Some sample types of KDDTest+ do not appear at all in KDDTrain+, and the attack samples themselves are significantly different, which makes it difficult to predict KDDTest+ based on KDDTrain+.
We also compare with a larger convolution in the first layer, by setting the first convolution layer size to 7∗7 and contrasting it with the 3∗3 network. Algorithm 3 is used. Table 8 shows the results of the BCs, where 40∗4_3 means the 40∗4 dimension with a first convolution of 3∗3, and 40∗4_7 means the 40∗4 dimension with a first convolution of 7∗7. Referring to Fig. 8, the DoS dataset is separated into 3 parts. In the experiments, it is found that although a single DoS part is easy to classify, the effect of combining the DoS BCs with the BCs of the other attacks is not good. For example, when using accuracy as the classifier design criterion, the results of DoS classification have little impact on the final classification results after merging the results of all BCs. Through experiments, it is found that using the precision criterion to design the DoS classifiers is the most helpful to the result of multi-BC fusion. Therefore, the precision criterion is used for the design of the DoS classifiers, and the rest of the classifiers use the accuracy criterion. The precisions of the DoS classifiers are shown in Table 8. As shown in Fig. 15, the network whose first convolution is 7∗7 outperforms the 3∗3 one, and the best accuracy of this experiment is 94.03%.

Fig. 15. The accuracy for binary classification based on KDDTrain+ and KDDTest+.
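The 10-fold split with the same traffic-type distribution in every fold, as used in the cross-validation experiment of Section 4.2.3, can be sketched as follows. This is a minimal illustration using only the standard library and is not the authors' code; the function names, the fixed seed and the toy label list are assumptions of this example.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that keep the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the samples of each class round-robin over the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train, test) index lists: 9 folds for training, 1 for testing."""
    folds = stratified_folds(labels, k, seed)
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

# Hypothetical toy label list standing in for the merged data set.
toy_labels = ["Normal"] * 50 + ["DoS"] * 30 + ["R2L"] * 20
for train, test in cross_validation_splits(toy_labels):
    print(len(train), len(test))
```

The average accuracy reported in the text would then be the mean of the accuracies obtained on the k test folds.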
Table 8
Accuracy for binary classification based on KDDTrain+ and KDDTest+ (all values in %).

Model                      Normal     R2L&U2R    Probe      DoS precision           Accuracy
                           accuracy   accuracy   accuracy   DoS1    DoS2    DoS3
Normal ResNet (40∗4_3)     84.67      91.27      95         99.8    92.2    75.1    88.75
SRDLM (172∗4_3)            90.08      95.57      94.54      98.6    91.36   94.3    91.65
SRDLM (194∗4_3)            91.87      93.7       95.6       99.6    99.4    95.6    93.37
Normal ResNet (40∗4_7)     90.04      89.02      93.64      99.7    59      93.3    90.2
SRDLM (172∗4_7)            88.4       94.9       94         99.9    84.7    89.5    91.73
SRDLM (194∗4_7)            90.13      94.8       95.9       95.5    97      93.7    94.03
Table 9
Performance of the proposed method for testing KDDTest+.

Method                          Accuracy   Precision   Recall   F1
kNN                             79.06%     67.8%       97.7%    0.8008
DBN                             63.9%      60.45%      46.8%    0.5276
Shallow CNN                     79.72%     68.92%      96.37%   0.8037
ANN (Ingre and Yadav, 2015)     81.16%     96.59%      69.35%   0.8073
RNN-IDS (Yin et al., 2017)      83.28%     96.92%      72.95%   0.8324
SRDLM                           94.03%     95.37%      90.53%   0.9288

the SRDLM, the classification result is unsatisfactory. It can be inferred that the semantic re-encoding and ResNet widened the gap between the attack and the normal samples, and that the deep learning algorithm has the ability to fit nonlinear operations. Comparing Shallow CNN with SRDLM clearly shows the advantage of semantic re-encoding and the generalization ability of ResNet. In addition, as ResNet has the strengths of a CNN, it takes less running time than RNN-IDS. Compared with several current algorithms, the SRDLM algorithm can effectively learn the latent features of the attack samples of each attack type and make the boundaries of the re-encoded semantic space clearer, thus leading to good accuracy. Table 10 shows the computational complexity of related typical algorithms. As can be seen in Table 10, the complexity of kNN is O(m∗n). When classifying test samples, the algorithm complexity of kNN is positively related to the size of the existing sample set, while the CNN and SRDLM algorithms are not limited by the size of the existing sample set. The complexity of the rest of the algorithms in Table 9 has nothing to do with the size of the training set, but is related to the size of the learning model. The complexity of CNN is O(n∗r∗c∗l), and that of SRDLM is just k times that of the CNN model because of the computation of the k BCs. The r, c, k, and l values of the SRDLM algorithm are all <10. By introducing GPU parallel computing, the algorithm's running time can be reduced to an acceptable level.

5. Conclusions

This paper proposes an SRDLM intrusion detection method based on semantic re-encoding and deep learning. The SRDLM algorithm has advantages in dealing with anomaly detection of network traffic with a huge semantic coding space and negligible word order. However, for network traffic from which features have already been extracted, semantic re-encoding technology brings limited performance improvement in traffic detection. Semantic re-encoding technology can be combined with deep learning technology to achieve better network traffic detection results. This paper studies the ResNet network architecture and combines ResNet with semantic re-encoding to effectively improve the generalization ability of the network anomaly detection model. The follow-up work will study the prediction of abnormal network traffic to enhance the robustness of the network detection model.

Acknowledgement

This research is supported by National Natural Science Foundation of China (No.61772162), Key Projects of NSFC Joint Fund of China (No.U1866209), National Natural Science Foundation of China (No.61602144), National Key R&D Program of China (No.2018YFB0804102).

References

Aburomman, A.A., Reaz, M.B.I., 2017. A survey of intrusion detection systems based on ensemble and hybrid classifiers. Comput. Secur. 65, 135–152.
Al-Qatf, M., Lasheng, Y., Al-Habib, M., Al-Sabahi, K., 2018. Deep learning approach combining sparse autoencoder with svm for network intrusion detection. IEEE Access 6, 52843–52856.
Bhuyan, M.H., Bhattacharyya, D.K., Kalita, J.K., 2013. Network anomaly detection: methods, systems and tools. IEEE Commun. Surv. Tutor. 16, 303–336.
Blanco, R., Malagón, P., Cilla, J.J., Moya, J.M., 2018. Multiclass network attack classifier using cnn tuned with genetic algorithms. In: 28th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), pp. 177–182.
Chouhan, N., Khan, A., et al., 2019. Network anomaly detection using channel boosted and residual learning based deep convolutional neural network. Appl. Soft Comput. 83, 105612.
Deshmukh, D.H., Ghorpade, T., Padiya, P., 2015. Improving classification using preprocessing and machine learning algorithms on nsl-kdd dataset. In: International Conference on Communication, Information & Computing Technology (ICCICT). IEEE, pp. 1–6.
Dhanabal, L., Shantharajah, S., 2015. A study on nsl-kdd dataset for intrusion detection system based on classification algorithms. Int. J. Adv. Res. Comput. Commun. Eng. 4, 446–452.
Gao, X., Shan, C., Hu, C., Niu, Z., Liu, Z., 2019. An adaptive ensemble machine learning model for intrusion detection. IEEE Access 7, 82512–82521.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
Heba, F.E., Darwish, A., Hassanien, A.E., Abraham, A., 2010. Principle components analysis and support vector machine based intrusion detection system. In: 10th International Conference on Intelligent Systems Design and Applications. IEEE, pp. 363–367.
Hsu, C.-M., Hsieh, H.-Y., Prakosa, S., Azhari, M., Leu, J.-S., 2018. Using long-short-term memory based convolutional neural networks for network intrusion detection. In: WICON, pp. 86–94.
Ingre, B., Yadav, A., 2015. Performance analysis of nsl-kdd dataset using ann. In: International Conference on Signal Processing and Communication Engineering Systems. IEEE, pp. 92–96.
Jaiganesh, V., Mangayarkarasi, S., Sumathi, P., 2013. Intrusion detection systems: a survey and analysis of classification techniques. Int. J. Adv. Res. Comput. Commun. Eng. 2, 1629–1635.
Kabir, E., Hu, J., Wang, H., Zhuo, G., 2018. A novel statistical technique for intrusion detection systems. Future Generat. Comput. Syst. 79, 303–318.
Le, T.-T.-H., Kim, Y., Kim, H., et al., 2019. Network intrusion detection based on novel feature selection model and various recurrent neural networks. Appl. Sci. 9, 1392.
Li, L., Yu, Y., Bai, S., Hou, Y., Chen, X., 2017. An effective two-step intrusion detection approach based on binary classification and k-nn. IEEE Access 6, 12060–12073.
Maaten, L.v. d., Hinton, G., 2008. Visualizing data using t-sne. J. Mach. Learn. Res. 9, 2579–2605.
Moustafa, N., Hu, J., Slay, J., 2019. A holistic review of network anomaly detection systems: a comprehensive survey. J. Netw. Comput. Appl. 128, 33–55.
Naoum, R.S., Al-Sultani, Z.N., 2012. Learning vector quantization (lvq) and k-nearest neighbor for intrusion classification. World Comput. Sci. Inf. Technol. J. (WCSIT) 2, 105–109.
Naseer, S., Saleem, Y., Khalid, S., Bashir, M.K., Han, J., Iqbal, M.M., Han, K., 2018. Enhanced network anomaly detection based on deep neural networks. IEEE Access 6, 48231–48246.
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D., 2017. Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626.
Shrivas, A.K., Dewangan, A.K., 2014. An ensemble model for classification of attacks with feature selection based on kdd99 and nsl-kdd data set. Int. J. Comput. Appl. 99, 8–13.
Vinayakumar, R., Soman, K., Poornachandran, P., 2017. Applying convolutional neural network for network intrusion detection. In: International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, pp. 1222–1228.
Wu, K., Chen, Z., Li, W., 2018. A novel intrusion detection model for a massive network using convolutional neural networks. IEEE Access 6, 50850–50859.
Yang, J., Li, T., Liang, G., He, W., Zhao, Y., 2019. A simple recurrent unit model based intrusion detection system with dcgan. IEEE Access 7, 83286–83296.
Yin, C., Zhu, Y., Fei, J., He, X., 2017. A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access 5, 21954–21961.
Zhang, B., Yu, Y., Li, J., 2018. Network intrusion detection based on stacked sparse autoencoder and binary tree ensemble method. In: IEEE International Conference on Communications Workshops (ICC Workshops). IEEE, pp. 1–6.
Zhang, Y., Li, P., Wang, X., 2019. Intrusion detection for iot based on improved genetic algorithm and deep belief network. IEEE Access 7, 31711–31722.

Zhendong Wu received the M.S. degree and the PhD degree in Computer Science and Technology from the Zhejiang University, Hangzhou, China. Currently, he is an Associate Professor with the School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China. His current research interests include biometrics, biological cryptography, machine intelligence and natural language research.

Jingjing Wang is currently pursuing the master degree in Information security at Hangzhou Dianzi University, Hangzhou, China. Her research interests include data mining, deep learning and intrusion detection.

Liqin Hu received the Ph.D. degree in mathematics from the Nanjing University of Aeronautics and Astronautics, Nanjing, China. She is a lecturer of the School of Cyberspace Security at Hangzhou Dianzi University. Her research interests include cryptography, and coding theory.

Zhang Zhang is currently pursuing the master degree in School of Systems Science at Beijing Normal University, Beijing, China. His research interests include complex system and Machine learning techniques.

Han Wu is currently pursuing the master degree in Cyberspace security at Hangzhou Dianzi University, Hangzhou, China. His research interests include computer vision, deep learning and datamining.