Journal of Network and Computer Applications 164 (2020) 102688

A network intrusion detection method based on semantic Re-encoding and deep learning
Zhendong Wu ∗ , Jingjing Wang, Liqing Hu, Zhang Zhang, Han Wu
School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China

ARTICLE INFO

Keywords: Intrusion detection; Semantic re-encoding; Deep learning

ABSTRACT

In recent years, with the increase of human activities in cyberspace, intrusion events such as network penetration, detection and attack have become more frequent and more hidden. Traditional intrusion detection methods, which rely on rules, are not enough to deal with the increasingly complex network intrusion flow. At the same time, the generalization ability of intrusion detection systems based on classical machine learning methods is still insufficient, and their false alarm rate is high. To address this problem, we observe that normal network traffic and intrusion network traffic are clearly different in several semantic dimensions, even though intrusion traffic is becoming more and more covert. We then propose a new intrusion detection method, named SRDLM, based on semantic re-encoding and deep learning. The SRDLM method re-encodes the semantics of network traffic, increases the distinguishability of traffic, and enhances the generalization ability of the algorithm by using deep learning technology, thus effectively improving the accuracy and robustness of the algorithm. The accuracy of the SRDLM algorithm for Web character injection network attack detection is over 99%. On the NSL-KDD data set, the average performance is improved by more than 8% compared with traditional machine learning methods.

1. Introduction

With the development of information technology, people now enjoy the convenience of networks. Meanwhile, the number and scale of security threats are growing rapidly, causing great damage to network resources and leaking private data. The methods and features of network intrusion are constantly changing and developing. Intrusion detection therefore remains an important research issue.

Intrusion detection technology has been continuously studied by researchers (Moustafa et al., 2019; Bhuyan et al., 2013; Jaiganesh et al., 2013; Aburomman and Reaz, 2017; Kabir et al., 2018). In general, intrusion detection can be treated as a classification problem that classifies incoming network traffic as normal or attack. Existing intrusion detection models mainly combine existing machine learning methods with intrusion detection data sets. The intrusion detection data set is treated as a general big data set and is fed directly into existing machine learning models to train the intrusion detection classifier. Current learning methods can be broadly classified into three types: traditional machine learning based methods, deep learning based methods, and hybrid methods.

The traditional machine learning methods include Support Vector Machine (SVM), k-Nearest Neighbor (kNN), Decision Trees, and so on. As collected data sets become larger and larger, deep learning-based approaches are gaining much attention since they can learn the computational process in depth and may lead to better generalization capabilities. These include Deep Belief Network (DBN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), AutoEncoder, and so on. To further improve recognition accuracy, combining various data classification methods into a hybrid classifier has also been studied. A large number of experiments have shown that hybrid techniques display better detection performance on specific data sets. Because of their specific classifiers and merge methods, they can achieve higher precision and detection rates than a single method.

Through continuous efforts, researchers are now able to design high-accuracy detectors for fixed intrusion data sets. However, because network intrusion traffic changes continuously and dynamically, high accuracy on fixed data sets alone cannot guarantee excellent detection performance on dynamic traffic. Our work analyzes the detectability of dynamic intrusion traffic, and we then propose an effective intrusion detection algorithm based on semantic re-encoding and

∗ Corresponding author.
E-mail addresses: [email protected] (Z. Wu), [email protected] (J. Wang), [email protected] (L. Hu), [email protected] (Z. Zhang), [email protected] (H. Wu).

https://doi.org/10.1016/j.jnca.2020.102688
Received 28 November 2019; Received in revised form 18 March 2020; Accepted 29 April 2020
Available online 5 May 2020
1084-8045/© 2020 Elsevier Ltd. All rights reserved.

deep learning. Semantic re-encoding technology attempts to re-express the semantic space of intrusion traffic in order to increase the distinguishability of abnormal traffic. On the basis of semantic re-encoding, deep learning technology is used to enhance the generalization ability of the intrusion detection model. The main contributions of this work are as follows:

1. We find that the semantics of network traffic differ: normal network traffic and attack network traffic often have significant differences in narrative semantics. Based on this, a semantic re-encoding method for intrusion network flow is designed, which can effectively increase the distinguishability of abnormal network traffic.
2. We design a deep learning-based detection model for intrusion traffic, which enhances the generalization capabilities of intrusion detection models.

Experimental results show that our approach achieves competitive performance.

The rest of this paper is organized as follows: Section 2 introduces the related works. Section 3 describes our proposed method in detail. Section 4 shows experimental performances. Finally, the conclusion is presented in Section 5.

2. Related work

Previously, many researchers applied purely traditional classifiers to the intrusion detection field, such as Naïve Bayes, SVM, decision trees and kNN (Dhanabal and Shantharajah, 2015; Deshmukh et al., 2015; Heba et al., 2010; Naoum and Al-Sultani, 2012). These methods have achieved a great deal and laid a solid foundation for later research.

Much research has been conducted since the rise of deep learning. Researchers have put a lot of work into dataset preprocessing as well as the search for good classifiers (Vinayakumar et al., 2017; Yin et al., 2017; Wu et al., 2018; Naseer et al., 2018; Hsu et al., 2018; Blanco et al., 2018). Various methods have been employed to improve the performance of intrusion detection based on the public dataset NSL-KDD. Ingre and Yadav (2015) propose a simple Artificial Neural Network method for intrusion detection. They test the dataset on classifiers with various numbers of layers and also perform feature selection on the dataset. Zhang et al. (2019) combine a genetic algorithm with a deep belief network, which effectively improves the detection rate and reduces the complexity of the classifier. The genetic algorithm is used to select the appropriate network structure, and the DBN is then used to classify the samples of the dataset. Al-Qatf et al. (2018) develop a new method that uses a sparse autoencoder for feature learning and dimensionality reduction. The learned features are then fed into an SVM to get the final classification result. The paper uses an unsupervised method to reduce the feature dimension. However, the effect of unsupervised learning lacks a reliable basis. Le et al. (2019) apply a feature selection model to test the performance of various RNN models on the dataset. The feature selection model generates the best feature subset from the original feature set. Combined with the LSTM, the model achieved good performance on the IDS test dataset. Chouhan et al. (Chouhan Khan et al., 2019) propose a novel Channel Boosted and Residual Learning architecture for deep convolutional neural networks. The experiment shows that the method can improve the classification accuracy. The above methods applied various preprocessing to specific intrusion datasets, but they did not deliberately highlight the difference between normal traffic and attack traffic.

Many researchers apply hybrid approaches to the dataset to get good performance (Li et al., 2017; Gao et al., 2019; Shrivas and Dewangan, 2014). Li et al. (2017) propose a two-step hybrid approach. Step 1 uses the C4.5 algorithm to obtain exact labels for most of the samples. Step 2 employs kNN to classify the remaining uncertain ones. The model showed high accuracy but still lacks generalization ability, and the performance may not be as good on other intrusion datasets. Gao et al. (2019) apply several base classifiers, including decision tree, random forest, kNN and DNN, and design an ensemble adaptive voting algorithm. Zhang et al. (2018) propose an XGBoost method based on a stacked sparse autoencoder network (SSAE-XGB). The SSAE is used to learn the latent representation of the original data, and the ensemble of binary trees does the classification. This method is weak in feature interpretability.

Yang et al. (2019) apply DCGAN to generate new samples and then use a modified LSTM classifier to classify the datasets. This work showed that dataset augmentation is important. However, GAN needs huge compute resources, and the generated data lacks interpretability.

Compared with previous works, we propose a combined algorithm based on semantic re-encoding and deep learning. Semantic re-encoding is an effective feature extraction algorithm for intrusion detection with good feature interpretability, and deep learning is used to gain good generalization ability.

3. Proposed methodology

3.1. Problem formulation

As people's activities in cyberspace become more frequent, network intrusion traffic shows a trend of continuous dynamic change, which often makes detection models designed for fixed datasets unsatisfactory. More importantly, dynamically changing network intrusion traffic has a large number of hidden and burst features showing discontinuity, while current mainstream deep learning models are better at characterizing continuously changing data features. How to improve the discrimination ability of the network intrusion detection model, as well as its ability to adapt to the dynamic changes of the detection objects, therefore becomes an important research issue. In this study, we attempt to incorporate some knowledge of network intrusion traffic into the intrusion classifier design.

There are two types of IDS data sets. One is the protocol representation string of the original data, and the other is the feature set extracted by experts from the original data applied to the host. For the original data stream, by observing the attack and defense characteristics of the network application, it can be found that for a network attack acting on a specific application, the normal traffic and the attack traffic are significantly different in their narrative semantics. In other words, the attacker will inevitably change the original logic of the network application at some level. As long as that logic can be clearly expressed, it forms a specific network traffic characteristic. This suggests that we can effectively differentiate network intrusion traffic with an effective semantic transformation. For feature-extracted data streams, such as the NSL-KDD datasets, the direct semantic differences between attack and normal traffic are not easily characterized, but semantic differences between attacks and normal traffic can still be expressed through some escaping combinations.

The details of the proposed method based on semantic re-encoding and deep learning (SRDLM) will be described in the next subsections.

3.2. Network traffic semantic re-encoding

In our method, both raw and converted network traffic can be semantically re-encoded, although the specific methods differ. We therefore first judge whether the dataset is made up of raw data or not, and then re-encode each kind separately. For characterizable network traffic, such as web traffic, we resequence the network traffic characteristics into a new symbol space by re-encoding.

3.2.1. Semantic re-encoding for raw data

Existing raw network traffic data can easily be converted into the form of a character stream. For example, the network application on the


Fig. 1. The procedure of reordering the word table.

Internet uses the HTTP protocol to express its semantics in the form of a character stream. Furthermore, the network traffic of some non-HTTP protocols can also be analyzed through its protocol format and then be transformed into a character stream.

Thus, step 1 is "Transform data to character stream".

Step 2, Word Segmentation. A word table needs to be extracted from the raw character stream. We use delimiters such as punctuation and special characters to split the character stream into word strings, and then collect the words into a word table. For example, in the HTTP protocol, '&', '|', '\', '?', '‖' are delimiters. As delimiters vary with the network protocol, the delimiter set needs to be updated continuously. According to the original word list, all records in the character stream are converted into word arrays, and each access string is rearranged into a record composed of words in the word list. Abnormal and normal samples from the access traffic are processed in this way, forming a sample record set of segmented words.

Step 3, Word Table Reordering. We want a new word table onto which the original word table is mapped. As shown in Fig. 1, the new word table is ordered by the difference between positive and negative samples. We therefore calculate the difference between the positive and the negative samples and define it as the comprehensive word frequency (CWF). We calculate the positive and negative Word Frequency (WF) in the original word table separately and sort the word array according to the word frequency. If a word appears multiple times in a row, it is counted once. We set thresholds T1 and T2; as can be seen from Fig. 1, T1 > 0 and T2 < 0. If CWF > T1 or CWF < T2, the WF difference between positive and negative samples is large, so we apply one-to-one re-encoding. If T2 < CWF < T1, we apply many-to-one re-encoding: all the words with T2 < CWF < T1 are combined into a single word, WordM. Unknown words are also encoded as WordM. Sorting the original word table by CWF, we get the new word table, named Word-table-after.

Step 4, Word Re-mapping. Positive and negative sample records are remapped to new equal-length sample records based on the new word table (Word-table-after). As shown in Fig. 2, in the mapping process multiple words can be mapped to one word, which is called word bag mapping. When n words in the sample map to the same word, the value at the corresponding position of the word is increased by n. The mapping procedure turns any word sequence of unequal length into a word sequence of equal length. When all the positive and negative samples have been remapped, we get a set of training samples of equal length.

Step 5, Word Sequence Re-projection. Word sequence re-projection can be used as an optional step to make the new word sequence more distinguishable in semantic space. The re-projection function can be as follows:

$$S = \{(x_i, y_i)\}, \quad 1 \le i \le m, \quad y_i \in \{0, 1\} \tag{1}$$

Here m is the number of training samples, and x_i is a record (word sequence) in the training set. The length of the word bag is n, and y_i ∈ {0, 1} is the label of the sample.

First, calculate the means of the positive and negative samples, 𝜇_j, where j denotes the class (positive or negative) and m_j is the number of samples in class j:

$$\mu_j = \frac{1}{m_j} \sum_{(x_i, y_i) \in S,\; y_i = j} x_i, \quad j \in \{0, 1\}, \quad 1 \le i \le m \tag{2}$$

Second, calculate the intra-class dispersion matrix S_w of the positive and negative samples. X_0 is the set of positive samples, and X_1 is the set of negative samples.

$$S_w = \sum_{x \in X_0} (x - \mu_0)(x - \mu_0)^T + \sum_{x \in X_1} (x - \mu_1)(x - \mu_1)^T \tag{3}$$

Third, calculate the inter-class dispersion matrix between the positive and negative samples:

$$S_d = (\mu_0 - \mu_1)(\mu_0 - \mu_1)^T \tag{4}$$

Finally, calculate the projection vector W that separates the positive and negative sample spaces. W is the eigenvector corresponding to the largest eigenvalue of the matrix $S_w^{-1} S_d$. We then perform the dimensionality reduction, and the new feature space is $W^T x_i$:

$$S_1 = \{(x_{i_1}, y_{i_1})\}, \quad 1 \le i_1 \le m, \quad y \in \{0, 1\} \tag{5}$$

Here x_{i_1} is a record in the training set, with dimension n_1. If the previous semantic re-encoding already reflects the difference between normal and attack traffic well, then this step can further widen the spatial distance between the positive and negative samples. The algorithm is formalized in Algorithm 1.
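As a side note on Step 5, the eigenvector in question has a closed form, which may make the re-projection easier to implement. The short derivation below is a sketch that assumes $S_w$ is invertible (an assumption not stated in the text); the scalar $\lambda$ is introduced here only for illustration.

```latex
% S_d has rank one, so for any vector W:
S_w^{-1} S_d W
  = S_w^{-1} (\mu_0 - \mu_1) \underbrace{(\mu_0 - \mu_1)^T W}_{\text{scalar}}

% Hence every eigenvector with a nonzero eigenvalue is parallel to
W \propto S_w^{-1} (\mu_0 - \mu_1),
\qquad
\lambda = (\mu_0 - \mu_1)^T S_w^{-1} (\mu_0 - \mu_1)

% All other eigenvalues of S_w^{-1} S_d are zero, so when \lambda > 0
% this W is the eigenvector of the largest eigenvalue.
```

Under that assumption, one linear solve with $S_w$ is enough and no full eigendecomposition is needed.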

3
Z. Wu et al. Journal of Network and Computer Applications 164 (2020) 102688

Fig. 2. Re-mapping a sample into a word bag.

Algorithm 1 Network character stream semantic re-encoding algorithm.
1: Transform data to the character stream.
2: Word segmentation.
3: Word Table Reordering.
4: Word Re-mapping.
5: Word Sequence Re-projection. (Optional)
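
Steps 2 to 4 of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy records, and the threshold values are hypothetical, and it assumes that a word is counted once per record and that index 0 of the word bag is the shared WordM slot.

```python
import re
from collections import Counter

# Delimiters named in the text ('&', '|', '\', '?'); extend per protocol.
DELIMS = re.compile(r"[&|\\?]+")

def segment(record):
    """Step 2: split a character stream into words."""
    return [w for w in DELIMS.split(record) if w]

def build_word_table(pos_records, neg_records, t1, t2):
    """Step 3: order words by CWF and keep only the discriminative ones."""
    # Assumption: a word is counted once per record.
    wf_pos = Counter(w for r in pos_records for w in set(segment(r)))
    wf_neg = Counter(w for r in neg_records for w in set(segment(r)))
    cwf = {w: wf_pos[w] - wf_neg[w] for w in set(wf_pos) | set(wf_neg)}
    kept = sorted((w for w, c in cwf.items() if c > t1 or c < t2),
                  key=lambda w: (-cwf[w], w))
    # Index 0 is the shared WordM slot (many-to-one); kept words get 1..k.
    return {w: i + 1 for i, w in enumerate(kept)}

def remap(record, table):
    """Step 4: map a record to an equal-length word-bag count vector."""
    bag = [0] * (len(table) + 1)
    for w in segment(record):
        bag[table.get(w, 0)] += 1  # merged or unknown words fall into WordM
    return bag

# Toy records (hypothetical): normal traffic is the positive class.
normal = ["home&list&view", "home&view"]
attack = ["home&union&select", "view&union&select|drop"]
table = build_word_table(normal, attack, t1=1, t2=-1)
sample = remap("home&union&select&select", table)
```

Every record, whatever its length, is mapped to a vector of length `len(table) + 1`, which is the equal-length property that Step 4 relies on.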

3.2.2. Semantic re-encoding for converted network traffic


For data streams extracted from raw data, such as NSL_KDD, a set of m records is used to express the network traffic:

$$\{(X^1, y^1), (X^2, y^2), \ldots, (X^m, y^m)\}$$

where X is the feature vector and y is the label. Semantic re-encoding can still be used. However, because the original semantics have escaped, the effect of semantic re-encoding is not as good as on the raw data stream.

We slightly modify Algorithm 1 and apply it to the NSL_KDD dataset. NSL_KDD consists of 41 original feature dimensions, and we perform semantic re-encoding on each feature dimension separately. The specific process is as follows.

For a symbolic feature, one-hot encoding is used to transform the feature space. For example, the protocol type feature has three values: TCP, UDP and ICMP. After one-hot encoding, the three values are converted into (0,0,1), (0,1,0) and (1,0,0) respectively.

For a continuous numerical feature i, we first normalize the feature to the range 0–1. We then divide the range equally into Nf intervals and count the feature frequency (FF) of normal and attack samples, that is, of the positive and negative types. We treat all Nf intervals as potential features. We call the difference between the positive feature frequency and the negative feature frequency the comprehensive feature frequency (CFF). We set a threshold T = [T1, T2], with T1 > 0 and T2 < 0, as the criterion for whether to segment the feature. According to the threshold, we can convert the feature to a discrete type. The specific operation is as follows. First, examine the CFF of each interval: if CFF > T1 or CFF < T2, add the upper limit of the interval to Interval_i. In other words, the samples can be classified according to the intervals in Interval_i of feature i. Second, apply one-hot encoding. The transformation function is x_i → v_j v_{j+1} … v_{j+t}.

For example, Fig. 3 shows the distribution of the original data for feature 29. We set Nf = 20, so a single interval has size 0.05, and set T = [10000, −500]. We take normal as the positive type. For the intervals (0–0.05), (0.05–0.1) and (0.1–0.15), CFF < −500. For the intervals between 0.15 and 0.95, the CFF is between T1 and T2, so we merge these intervals into one feature. For the interval (0.95–1), CFF > 10000. So we can set T29 = [0.05, 0.1, 0.15, 0.95]. The result of this step is shown in Table 1. This step is almost the same operation as step 3 of Algorithm 1, Word Table Reordering.

Fig. 3. Histogram of scaled feature 29.

Table 1
Transformation of numeric feature.

Feature interval    Middle code    New feature code
0–0.05              1              00001
0.05–0.1            2              00010
0.1–0.15            3              00100
0.15–0.95           4              01000
0.95–1              5              10000

For NSL-KDD, after the preceding reordering step, the original 41-dimensional data becomes 169-dimensional data. As shown in Fig. 4, the t-SNE (Maaten and Hinton, 2008) algorithm is used to represent the multi-dimensional data in a 3-dimensional space. It can be seen that, compared with the 41-dimensional data, the 169-dimensional data of the normal type and attack type can be separated more easily after Word Table Reordering. In other words, in the 169-dimensional feature space, the inter-code space is broadened. However, some samples still overlap in the 169 dimensions. We add some PCA features to pull the overlapping samples apart.

Fig. 4. (a) 41-dimensional data in 3-dimensional representation. (b) 169-dimensional data in 3-dimensional representation.

We list some of the symbols in Table 2 to describe the algorithm. After adding Npca features, the original n dimensions are converted to N dimensions:

$$N = n + \sum_{i=1}^{n} \mathrm{len\_inter}_i + N_{pca}$$

Further, the N-dimensional data is extended to 4N-dimensional data, which consists of four parts (v, p, n, s). As shown in Fig. 5, the original n-dimensional vector X is semantically re-encoded to get the N-dimensional vector v. We calculate the mean feature map of the positive training data (v_p) and of the negative training data (v_n). Then (p, n, s) is calculated as follows:

$$p = v - v_p \tag{6}$$

$$n = v - v_n \tag{7}$$

$$s = v \ast v \tag{8}$$

The algorithm is formalized by Algorithm 2.

Fig. 5. Convert the original n dimension to a N dimension feature space.

Table 2
Symbol description.

Symbol         Description
f_type         The format type of feature (0 for discrete, 1 for numeric).
n              The total number of original features.
T              Threshold of the CFF. T = [T1, T2].
Nf             Number of equal intervals into which a feature is divided.
Interval_i     Interval list of feature i.
len_inter_i    The length of the interval list of feature i.
Npca           The number of features after PCA.
N              The total number of output new features.
d_num_i        The total number of feature types of feature i. Only valid for discrete features.

Algorithm 2 Semantic re-encoding for data extracted from raw data.
Input: code_i, Npca
Output: final_code
1: Initial: interval list Interval_i and len_inter_i
2: if f_type == 0 then
3:     len_inter_i = d_num_i
4:     goto 15
5: else
6:     while j ≤ Nf do
7:         if CFF_j > T1 or CFF_j < T2 then
8:             Interval_i.add(j ∗ 1/Nf)
9:         end if
10:    end while
11:    len_inter_i = len(Interval_i)
12: end if
13: code_i = normalization(code_i)
14: code_i = threshold_segment(code_i, Interval_i)
15: new_code_i = one_hot(code_i, len_inter_i)
16: v = add_pca_feature(new_code_i, Npca)
17: final_code = calculate p, n, s, and combine v, p, n, s
18: return final_code

3.2.3. The effectiveness analysis of the semantic re-encoding and re-mapping

We think of network traffic as a character stream of words. It is assumed that all attack traffic and normal traffic can be distinguished by a sequence of finite-length words. Let q be a prime and n a positive integer. A q-ary (n, K, d) code is used to represent the codeword space formed by a sequence of words, where 1 ≤ d ≤ n, n represents the code length, K represents the number of code words, and d represents the distance between codes. According to the assumption, all attack and normal traffic consists of finite-length character sequences, so the total number of code words K can be a specific value. According to the Singleton bound, $K \le q^{n-d+1}$. According to the discussion in Sections 3.2.1 and 3.2.2, the semantic re-encoding and re-mapping can increase the code word length, that is, they can increase the value of n. Then $q^d \le q^{n+1}/K$, so as the value of n increases, the value of d can increase accordingly. Meanwhile, the semantic re-encoding and re-mapping can increase the average inter-symbol distance between positive and negative samples of the training set. Therefore, if the semantic re-encoding and re-mapping are proper, the average inter-symbol distance between positive and negative samples can be improved, thereby increasing classification accuracy.

Fig. 6. The architecture of 8-layer ResNet.

Fig. 7. Residual learning: a building block.
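
To close Section 3.2, the numeric-feature re-encoding of Section 3.2.2 (the feature 29 example and Table 1) can be sketched as follows. This is an illustrative reading rather than the authors' code: the function names and the CFF values are made up, a boundary is assumed to be kept whenever one of its two neighbouring intervals is significant, and a value that falls exactly on a boundary is assumed to belong to the upper segment.

```python
from bisect import bisect_right

def cut_points(cff, t1, t2):
    """Return the boundaries that isolate every significant interval."""
    nf = len(cff)
    significant = [c > t1 or c < t2 for c in cff]
    # Assumption: keep a boundary when either neighbour is significant,
    # so runs of non-significant intervals merge into one segment.
    return [(j + 1) / nf for j in range(nf - 1)
            if significant[j] or significant[j + 1]]

def one_hot(value, cuts):
    """Encode a normalized value as the bit string used in Table 1."""
    idx = bisect_right(cuts, value)  # 0-based segment index
    n = len(cuts) + 1
    return "0" * (n - 1 - idx) + "1" + "0" * idx

# Illustrative CFF values only (not from the paper): three strongly negative
# intervals, sixteen insignificant ones, and one strongly positive interval.
cff = [-900, -800, -700] + [50] * 16 + [12000]
cuts = cut_points(cff, t1=10000, t2=-500)
codes = [one_hot(v, cuts) for v in (0.02, 0.07, 0.5, 0.99)]
```

With Nf = 20 and T = [10000, −500], these made-up CFF values reproduce the four cut points given for feature 29 and the five codes of Table 1.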

3.3. Deep learning model

We choose ResNet (He et al., 2016) as the deep learning framework for the method because it shows good generalization ability for unknown network intrusion traffic.

ResNet is based on Convolutional Neural Networks (CNN). CNNs are very similar to ordinary neural networks, and they all consist of neurons with learnable weights and biases. As shown in Fig. 6, ResNet is made up of building blocks with a "curved curve". From Fig. 7, we can see that the building blocks are made up of an identity mapping and a residual mapping. Together, these two mappings address the problem of accuracy decreasing as the network deepens. The identity mapping is the "curved curve" part. Formally, denoting the desired underlying mapping as H(x), we let the stacked nonlinear layers fit another mapping, F(x) ≔ H(x) − x. The original mapping is recast into F(x) + x.

We hypothesize that it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping. In the extreme, if an identity mapping were optimal, it would be easier to push the residual to zero than to fit an identity mapping with a stack of nonlinear layers. The formulation F(x) + x can be realized by feedforward neural networks with "shortcut connections". Shortcut connections are those skipping one or more layers. Identity shortcut connections add neither extra parameters nor computational complexity. The deep residual network can be regarded as an ensemble of shallow neural networks of different depths. Deepening the network does not cause the network to degrade, while the residual design can raise the generalization ability.

Fig. 8. Dos & Normal hybrid distribution in 2D and 3D values.

3.4. Multi-space projection

There are many types of network intrusion traffic. For different types of traffic characteristics, it is generally difficult for a single spatial transformation to achieve proper feature extraction. In order to further separate the feature space formed by positive and negative samples, we propose Algorithm 3, which uses multi-space projection.
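
As a rough orientation before the detailed steps of this section, the final voting stage of a multi-classifier scheme of this kind might look like the sketch below. It is only an illustration under stated assumptions: the three lambda "classifiers" are hypothetical stand-ins for trained binary classifiers (the paper uses ResNet-based ones), each returns 1 for normal and 0 for attack, and a single attack vote is assumed to be enough to flag the connection.

```python
def final_judge(x, classifiers):
    """Aggregate the binary classifiers' votes for one connection x."""
    votes = [bc(x) for bc in classifiers]  # 1 = normal, 0 = attack
    # Assumption: one attack vote is enough to flag the connection.
    return "normal" if sum(votes) == len(votes) else "attack"

# Hypothetical stand-ins for trained classifiers. Each one flags a different
# region of a 1-D feature, mimicking attack sub-clusters such as DoS1..DoS3.
bcs = [
    lambda x: 0 if x < 0.1 else 1,
    lambda x: 0 if 0.4 < x < 0.5 else 1,
    lambda x: 0 if x > 0.9 else 1,
]

decisions = [final_judge(x, bcs) for x in (0.3, 0.45, 0.95)]
```

Splitting one attack class into several sub-clusters, each with its own classifier, lets every classifier learn a single compact region instead of several disjoint ones.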

6
Z. Wu et al. Journal of Network and Computer Applications 164 (2020) 102688

Fig. 9. The architecture of classifier.

Sometimes, even for the same type of attack, the network traffic it generates has many different types of characteristics. For example, in the NSL-KDD attack dataset, the DoS attack type is not balanced. As shown in Fig. 8, the DoS and Normal data sets are processed by the PCA (principal component analysis) algorithm for easy observation. Red is the DoS dataset, and blue is the Normal dataset. It can be seen from Fig. 8 that the red dots overlap the blue dots, and that the DoS data are distributed in three different regions. It is difficult for the learning algorithm to learn three different distribution areas for the same type of attack traffic at the same time.

We describe the steps of Algorithm 3 in the following.

The first step in Algorithm 3 is to use K-means to obtain more negative sets. In other words, we separate the negative set into more groups. The positive sample set remains a single group. For example, we change the attack type DoS into DoS1, DoS2 and DoS3. The purpose of this step is to make the distribution of each sample class closer to a multi-dimensional normal distribution.

The second step: multi-type binary classifiers (BCs) are used to divide the positive and negative samples. Each BC consists of a pre-projection module, a specific training set and a ResNet network. Because the pre-projection module runs first, the network flow feature space obtained has a better specific feature distribution. The new distribution can be taken as a multidimensional normal distribution, and it can be effectively distinguished by the BCs. Although ResNet can be stacked to very deep layers, we think that even a network that is not very deep is sufficient to handle this problem. The structure of the BCs is shown in Fig. 9. There are several BCs, and each of them classifies one type of sample. For an incoming connection x, the BCs make their decisions: if BC_i considers that x belongs to the normal class, we define BC_i(x) = 1, and if it considers that x belongs to the abnormal class, we define BC_i(x) = 0.

The third step is to collect the results of the BCs and calculate their sum.

The final step is to judge: if the result is more than 1, then x is a normal sample, or else x is an attack sample. That is, if any of the four outcomes votes the connection to be an attack type, then we make the decision that the connection is an attack. The transformation function is as follows:

$$\sum_{i=0}^{3} BC_i(x) \ge 1 \tag{9}$$

We formalize this algorithm in Algorithm 3.

Algorithm 3 Multi-space Projection algorithm.
1: Pre-project the data and obtain several training sets.
2: Train the BCs for classification using the different training sets.
3: Aggregate the results of the BCs.
4: Final judgment.

4. Experimental results and analysis

This experiment examines two data sets. One is a dataset collected specifically for web attacks in the Hangdian Security Lab, named Hduxss_data1.0, which contains both normal and abnormal HTTP streams. The other is NSL-KDD, which is considered the benchmark evaluation data set in the field of intrusion detection. The experiments are performed with PyTorch 1.0 on a computer with a 2080 Ti GPU, Ubuntu 18.04 and 32 GB of memory.

4.1. Hduxss_data1.0 data set

The Hduxss_data1.0 data set consists of three types of data:

1. Attack samples generated by SQLMAP (an open source penetration testing tool that automates the process of detecting and exploiting SQL injection flaws and taking over database servers). The total number of this type is 810,000.
2. Attack samples manually and automatically generated by the XSSLESS tool. The total number of this type is 11,000.
3. Normal request samples collected through the Firefox browser while browsing various web pages. We extract parameters from them and obtain 130,000 normal samples.

For normal samples and abnormal samples, we randomly select 50% of the data for training and 50% for testing, and repeat 10 rounds of testing to obtain the average. The results are shown in Table 3. In the test process, all data is processed by SRDLM Algorithm 1, and the outputs are then classified by support vector machine, naive Bayes, and SRDLM Algorithm 3. It can be seen from Table 3 that all the machine learning methods involved in the test achieved accuracy and F1 values of more than 99%, which is mainly because Algorithm 1 has a good feature separation effect on the above attack traffic. At the same time, the SRDLM algorithm achieved the best recognition performance compared with the classical machine learning algorithms. The


Table 3
The result of Hduxss_data1.0 dataset.
Method                          Accuracy  Precision  Recall  F1
Algorithm 1 + SVM               99.3%     99.4%      99.2%   0.993
Algorithm 1 + Naive Bayes       99.0%     99.1%      99.1%   0.991
SRDLM (including Algorithm 1)   99.5%     99.7%      99.5%   0.996
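
The F1 column of Table 3 can be cross-checked against the reported precision and recall using the standard definition F1 = 2PR / (P + R). The following is a minimal sketch of that check; the function name `f1_score` and the dictionary layout are illustrative choices, not part of the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in Table 3, converted to fractions.
table3 = {
    "Algorithm 1 + SVM": (0.994, 0.992),
    "Algorithm 1 + Naive Bayes": (0.991, 0.991),
    "SRDLM (including Algorithm 1)": (0.997, 0.995),
}

# Recompute F1 for each method, rounded to the three decimals used in the table.
f1_values = {name: round(f1_score(p, r), 3) for name, (p, r) in table3.items()}

for name, value in f1_values.items():
    print(f"{name}: F1 = {value:.3f}")
```

Rounded to three decimals, the recomputed values agree with the F1 column of Table 3.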

Fig. 10. The data demand of SRDLM algorithm in Hduxss_data Set.

SRDLM Algorithm 1 performs a round of semantic re-encoding on HTTP data streams, highlighting the special semantics of network data and increasing the distinguishability of data streams. Afterwards, no matter which classifier is used (SVM, Bayes, or SRDLM), good detection performance can be achieved.
The algorithm stability test results are shown in Fig. 10. The algorithm requires little data: even when only one thousandth of the data is used for training, it still achieves high classification accuracy (>98%). As the training set grows, the accuracy gradually approaches 1. Tests show that the SRDLM algorithm has good stability.

4.2. NSL-KDD data set

Each record of the NSL-KDD dataset is a vector that contains 41 features and a label that marks the type: normal or attack. The attack types include four categories: DoS, Probe, U2R, and R2L. We use KDDTrain+ and KDDTest+ in our experiment. The data distribution is shown in Table 4.

4.2.1. The generalization ability of ResNet

ResNet shows a good generalization ability in experiments. We train kNN, Shallow CNN, and ResNet models on KDDTrain+ to classify the normal and attack traffic in KDDTest+. For the convenience of transformation, we remove the rarely used 7th feature from the 41 features; it has zero values in almost all samples in the training set. The remaining 40 features are transformed into a 5 ∗ 8 matrix, which serves as the input matrix of a CNN model. The experiment result is shown in Table 5. The table shows the number of correctly predicted samples and the proportion of this number among the test-set samples of the same type. The error rate of the R2L type is the highest among the 5 types. This is mainly because the R2L types in KDDTrain+ and KDDTest+ are very different, so it is difficult for the classifier to generalize. As can be seen from Table 5, the ResNet network has better average generalization ability for various types of positive and negative samples. The generalization ability of the three models on the R2L type is then analyzed as follows.

Fig. 11. Heatmap of an R2L sample on the ResNet model.

We use Grad-CAM (Selvaraju et al., 2017) to locate important regions with class discrimination in the input sample. As shown in Fig. 11, the result of Grad-CAM is a heat map. The red area indicates that the pixels of this part contribute the most to the classification result. As can be seen from Fig. 11, features 34, 35, and 36 contribute

Table 4
The data distribution of KDDTrain+ and KDDTest+.
            Total    Normal   DoS     Probe   R2L    U2R
KDDTrain+   125973   67343    45927   11656   995    52
KDDTest+    22544    9711     7458    2421    2754   200

Table 5
The correct predicted number of all types on KDDTest+.
         kNN    kNN rate   Shallow CNN   Shallow CNN rate   ResNet   ResNet rate
Normal   9478   97.60%     9126          96.38%             8067     89.04%
DoS      6094   87.71%     6315          84.67%             7099     95.19%
Probe    1863   76.95%     2116          87.40%             2353     97.19%
R2L      149    5.41%      70            2.54%              1036     37.62%
U2R      115    57.5%      112           56%                119      59.5%
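
The input preparation described in Section 4.2.1 (drop the 7th of the 41 features, then arrange the remaining 40 values as a 5 ∗ 8 matrix) can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say in which order the 40 values fill the matrix, so row-major order is assumed here, and the function name `to_matrix` and the toy record are hypothetical.

```python
from typing import List

N_FEATURES = 41       # features per NSL-KDD record
DROPPED_FEATURE = 7   # 1-based index of the feature removed in Section 4.2.1
ROWS, COLS = 5, 8     # target matrix shape (5 * 8 = 40)


def to_matrix(record: List[float]) -> List[List[float]]:
    """Drop the 7th feature and reshape the remaining 40 values to 5 x 8.

    Row-major filling is an assumption; the paper does not state the order.
    """
    if len(record) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(record)}")
    kept = [v for i, v in enumerate(record, start=1) if i != DROPPED_FEATURE]
    return [kept[r * COLS:(r + 1) * COLS] for r in range(ROWS)]


# Toy record whose values equal their 1-based feature index, for easy inspection.
record = [float(i) for i in range(1, N_FEATURES + 1)]
matrix = to_matrix(record)

for row in matrix:
    print(row)
```

With the toy record, the value 7.0 is absent from the first row, which makes the dropped feature easy to see.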


Fig. 13. The ResNet result on new ‘R2L’ samples.

the most to the determination of the sample type. We then studied the performance of the classifiers on R2L through features 34, 35, and 36. The result is shown in Fig. 12. In Fig. 12, the x-axis, y-axis, and z-axis are features 34, 35, and 36 respectively, named ‘dst_host_same_srv_rate’, ‘dst_host_diff_srv_rate’ and ‘dst_host_same_src_port_rate’. There are three types of dots in the figure. The green dots and the orange dots are the R2L test samples that the trained models predicted wrongly and correctly, respectively, and the red dots are the R2L training samples. The three types of dots are labeled ‘R2L_test_F’, ‘R2L_test_T’ and ‘R2L_train’ in Fig. 12. Obviously, the orange dots in Fig. 12(c) are far more numerous than those in Fig. 12(a) and (b). From Fig. 12(c), it can be seen that the trained ResNet model learns some of the attack traffic when feature 34 is in (0–0.6), feature 35 is in (0–1), and feature 36 is in (0–0.7), whereas for kNN and Shallow CNN, Fig. 12(a) and (b) show hardly any correctly predicted samples. We then created 5000 new samples in which features 34, 35, and 36 lay in the intervals (0–0.6), (0–1), and (0–0.7) respectively, while the other features were the same as those of one of the R2L samples in the training set. As Fig. 13 shows, 67% of these samples were predicted correctly by the ResNet model. In the same test on the kNN and Shallow CNN models, 7% of the samples were predicted correctly by the kNN model, while all 5000 created samples were predicted as the opposite type by the Shallow CNN model. Based on these experiments, we infer that the kNN model and the Shallow CNN model do not learn the mapping from features 34, 35, and 36 to the classification result, while the ResNet model successfully generalizes from the training samples to the nearby region. There are still regions to which it does not generalize well, which means that the ResNet model still has some limitations in generalization. Compared with the other two models, the ResNet model has better generalization ability.
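
The construction of the 5000 probe samples described above can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the paper does not say how values were drawn within the intervals, so uniform sampling is assumed; the base record `base_r2l` is a hypothetical placeholder with only a few fields; and `random.uniform` may return an interval endpoint, which differs slightly from the open intervals in the text.

```python
import random
from typing import Dict, List

# Intervals reported in Section 4.2.1 for features 34, 35, and 36.
INTERVALS = {
    "dst_host_same_srv_rate": (0.0, 0.6),
    "dst_host_diff_srv_rate": (0.0, 1.0),
    "dst_host_same_src_port_rate": (0.0, 0.7),
}


def make_samples(base: Dict[str, float], n: int, seed: int = 0) -> List[Dict[str, float]]:
    """Copy one base record n times, resampling only the three features.

    Uniform sampling is an assumption; the paper does not state the distribution.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        sample = dict(base)  # all other features stay identical to the base record
        for name, (low, high) in INTERVALS.items():
            sample[name] = rng.uniform(low, high)
        samples.append(sample)
    return samples


# Hypothetical stand-in for one R2L training record (only a few fields shown).
base_r2l = {
    "duration": 0.0,
    "dst_host_same_srv_rate": 1.0,
    "dst_host_diff_srv_rate": 0.0,
    "dst_host_same_src_port_rate": 1.0,
}
new_samples = make_samples(base_r2l, 5000)
print(len(new_samples), new_samples[0])
```

Each generated record would then be passed to the trained kNN, Shallow CNN, and ResNet models to count how many are classified as attacks.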

Fig. 12. (a) The kNN result on R2L. (b) The Shallow CNN result on R2L. (c) The ResNet result on R2L. ‘R2L_test_F’ means the wrongly predicted R2L samples in the test set. ‘R2L_test_T’ means the correctly predicted R2L samples in the test set. ‘R2L_train’ means the R2L samples in the train set.

4.2.2. Changing the depth of the network

Based on the theoretical analysis of CNNs, the depth of the network matters much for the classification accuracy and the training time. Thus, we design an experiment to choose an appropriate depth.
For the ResNet, we set the numbers of filters to 2, 4, and 8 respectively. The network starts with a 3∗3 convolution layer and ends with global average pooling, a 2-way fully-connected layer, and softmax. There are 6n+2 stacked weighted layers in total. Table 6 summarizes the architecture. The mix of KDDTrain+ and KDDTest+ is used as the training set, and KDDTest+ is the testing set. We compare 8,


Table 6
The architecture of the ResNet used in the proposed method.
Output map size   26∗26   13∗13   7∗7
#layers           1+2n    2n      2n
#filters          2       4       8
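
The layer count 6n+2 follows from Table 6: the three stages contribute (1+2n) + 2n + 2n = 6n+1 convolution layers, and the final fully-connected layer adds one more. A minimal sketch of this arithmetic is given below; the values n = 1, 3, 9 are inferred from the 8-, 20-, and 56-layer networks compared in the text and are not stated explicitly in the paper, and the function name `resnet_depth` is illustrative.

```python
def resnet_depth(n: int) -> int:
    """Number of weighted layers for a given n, following Table 6."""
    conv_layers = (1 + 2 * n) + 2 * n + 2 * n  # stages at 26*26, 13*13, and 7*7
    fc_layers = 1                              # final 2-way fully-connected layer
    return conv_layers + fc_layers


# n values inferred from the depths compared in the experiment (assumption).
depths = {n: resnet_depth(n) for n in (1, 3, 9)}
print(depths)  # {1: 8, 3: 20, 9: 56}
```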

Table 7
Accuracy for binary classification based on training data.
Depth of ResNet   Accuracy (%)
ResNet_8          97
ResNet_20         97.35
ResNet_56         97.5

20, 56-layer networks. From Table 7, it can be seen that deepening the network brings hardly any improvement in accuracy. It can be speculated that the depth of the 8-layer ResNet network is enough, and that the remaining data that could not be accurately identified is data that the training set does not cover at all. A deeper network means more time. As a result, we chose the 8-layer ResNet in the following experiments.

4.2.3. Evaluating the impact of the semantic re-encoding and re-mapping
Three kinds of dimensions are employed to test the performance of the feature re-encoding algorithm: 40∗4, 172∗4, and 194∗4. For the 40∗4 dimension, we remove the unusable feature 7 from the training and test sets, and then copy the remaining features 3 times to form a 40∗4 dimension. For the 172∗4 and 194∗4 dimensions, Algorithm 2 is used to cut the two dimensions to 676 dimensions, which can be transformed into 26∗26. In this section, we use the 40∗4, 172∗4, and 194∗4 dimension datasets to evaluate the re-encoding impact on the binary classification of NSL-KDD.
First, KDDTrain+ is used as the training set, and KDDTest+ is used as the test set. We set the batch size of the training set to 16384. The learning rate is 0.01, the optimization method is Adam, the weight decay is 0.1, and the number of epochs is 10000. It can be seen from Fig. 14 that the accuracy of the 172∗4 and 194∗4 dimensions is better than that of the 40∗4 dimension. This indicates the effectiveness of semantic re-encoding.

Fig. 15. The accuracy for binary classification based on KDDTrain+ and KDDTest+.

There still seem to be some false alarms and missed alarms. In order to improve the generalization ability, we create some predicted data according to the features of intrusion traffic and put them into the training set. We then train on this new training dataset. The hyper-parameters are the same as before. As shown in Fig. 14, the semantically re-encoded data set performed better than the original data set.
We want to find out why some samples cannot be classified correctly. So we mixed KDDTrain+ with KDDTest+ and split the result into 10 folds with the same distribution of the different types of network traffic. Each time, 9 of them are used as the training set and 1 is used as the test set. After 10 rounds of training and testing, we take the average accuracy. From Fig. 14, it can be seen that the cross-validated detection rate is clearly higher than the other two, and that the re-encoded dataset is a little better than the 40∗4 dimension. The reason may be that the samples in KDDTrain+ are not very typical. Some sample types of KDDTest+ do not appear at all in KDDTrain+, and the attack samples themselves are significantly different, which makes it difficult to predict KDDTest+ based on KDDTrain+.
We also compare against a larger convolution in the first layer, by setting the first convolution layer size to 7∗7 and contrasting it with the 3∗3 network. Algorithm 3 is used. Table 8 shows the result of the BCs, where 40∗4_3 denotes the 40∗4 dimension with a first convolution of 3∗3, and 40∗4_7 denotes the 40∗4 dimension with a first convolution of 7∗7. Referring to Fig. 8, the DoS dataset is separated into 3 parts. In the course of the experiment, it is found that although a single DoS type is easy to classify, the effect of combining the BCs of DoS with those of the other attacks is not good. For example, when accuracy is used as the classifier design criterion, the results of the DoS classification have little impact on the final classification results after merging the results of all BCs. Through the experiment, it is found that using the precision standard to design the DoS classifiers is the most helpful to the result of the multi-BC fusion. Therefore, in the experimental process, the precision standard is used for the design of the DoS classifiers, and the rest of the classifiers are designed using the accuracy standard. The precisions of the DoS classifiers are shown in Table 8. As shown in Fig. 15, the network whose first convolution is 7∗7 outperforms the one with the 3∗3 size, and the best accuracy of this experiment is 94.03%.
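
The 10-fold split "with the same distribution of different types of network traffic" used in Section 4.2.3 corresponds to stratified cross-validation. The sketch below shows one simple way to build such folds with the standard library only; it is not the authors' code, the round-robin assignment is an assumed strategy, and the toy label list stands in for the merged KDDTrain+ and KDDTest+ labels.

```python
import random
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[str], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, keeping class proportions roughly equal."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_label.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def train_test_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists: one fold for testing, the rest for training."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Toy labels standing in for the merged data set (hypothetical proportions).
labels = ["normal"] * 50 + ["DoS"] * 30 + ["Probe"] * 20
folds = stratified_folds(labels)
splits = list(train_test_splits(folds))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

The reported accuracy would then be the mean of the per-split accuracies over the 10 splits.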

Fig. 14. The accuracy for binary classification based on KDDTest+ dataset with different dimensions.

4.2.4. Discussion and additional comparisons

The performance comparison between different classification algorithms is shown in Table 9 and Table 10. From Table 9, the experimental results show that SRDLM improves the performance by more than 8% compared with other classifiers. While comparing the ANN with


Table 8
Accuracy for binary classification based on KDDTrain+ and KDDTest+.
Model                    Normal     R2L&U2R    Probe      DoS precision          Accuracy
                         accuracy   accuracy   accuracy   DoS1    DoS2    DoS3
Normal ResNet (40∗4_3)   84.67      91.27      95         99.8    92.2    75.1   88.75
SRDLM (172∗4_3)          90.08      95.57      94.54      98.6    91.36   94.3   91.65
SRDLM (194∗4_3)          91.87      93.7       95.6       99.6    99.4    95.6   93.37
Normal ResNet (40∗4_7)   90.04      89.02      93.64      99.7    59      93.3   90.2
SRDLM (172∗4_7)          88.4       94.9       94         99.9    84.7    89.5   91.73
SRDLM (194∗4_7)          90.13      94.8       95.9       95.5    97      93.7   94.03

the SRDLM, the classification result is unsatisfactory. It can be inferred that the semantic re-encoding and ResNet increased the separation between the attack and the normal samples, and that the deep learning algorithm has the ability to fit nonlinear operations. Comparing Shallow CNN with SRDLM, the advantage of semantic re-encoding and the generalization ability of ResNet are clearly shown. In addition, as the ResNet has the strengths of a CNN, it takes less time to run than RNN-IDS. Compared with several current algorithms, the SRDLM algorithm can effectively learn the latent features of the attack samples of each attack type and make the boundaries in the re-encoded semantic space clearer, thus leading to good accuracy. Table 10 shows the computational complexity of related typical algorithms. As can be seen in Table 10, the complexity of kNN is O(m∗n). When classifying test samples, the complexity of kNN grows with the number of existing samples, while the CNN and SRDLM algorithms are not limited by the size of existing sample sets. The complexity of the remaining algorithms in Table 9 does not depend on the size of the training set, but is related to the size of the learning model. The complexity of CNN is O(n∗r∗c∗l), and that of SRDLM is just k times that of the CNN model because of the computation of the k BCs. The r, c, k, and l values of the SRDLM algorithm are all <10. By introducing GPU parallel computing, the algorithm’s running time can be reduced to an acceptable level.

Table 9
Performance of the proposed method for testing KDDTest+.
Method                         Accuracy   Precision   Recall   F1
kNN                            79.06%     67.8%       97.7%    0.8008
DBN                            63.9%      60.45%      46.8%    0.5276
Shallow CNN                    79.72%     68.92%      96.37%   0.8037
ANN (Ingre and Yadav, 2015)    81.16%     96.59%      69.35%   0.8073
RNN-IDS (Yin et al., 2017)     83.28%     96.92%      72.95%   0.8324
SRDLM                          94.03%     95.37%      90.53%   0.9288

Table 10
Complexity for different types of algorithms. m is the number of training samples, n is the dimension of the input samples, r is the kernel size of the convolutions, l is the number of network layers, c is the number of convolution channels, and k is the number of BCs.
Algorithm   Complexity
kNN         O(m∗n)
CNN         O(n∗r∗c∗l)
SRDLM       O(n∗r∗c∗l∗k)

5. Conclusions

This paper proposes an SRDLM intrusion detection method based on semantic re-encoding and deep learning. The SRDLM algorithm has advantages in dealing with anomaly detection of network traffic with huge semantic coding space and negligible word order. However, for network traffic from which features have already been extracted, semantic re-encoding technology offers limited performance improvement in traffic detection. Semantic re-encoding technology can be combined with deep learning technology to achieve better network traffic detection results. This paper studies the ResNet network architecture and combines ResNet with semantic re-encoding to effectively improve the generalization ability of the network anomaly detection model. The follow-up work will study the prediction of network abnormal traffic to enhance the robustness of the network detection model.

CRediT authorship contribution statement

Zhendong Wu: Methodology, Writing - original draft, Writing - review & editing, Software, Resources. Jingjing Wang: Software, Writing - original draft. Liqing Hu: Investigation, Formal analysis. Zhang Zhang: Validation. Han Wu: Validation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This research is supported by National Natural Science Foundation of China (No.61772162), Key Projects of NSFC Joint Fund of China (No.U1866209), National Natural Science Foundation of China (No.61602144), National Key R&D Program of China (No.2018YFB0804102).

References

Aburomman, A.A., Reaz, M.B.I., 2017. A survey of intrusion detection systems based on ensemble and hybrid classifiers. Comput. Secur. 65, 135–152.
Al-Qatf, M., Lasheng, Y., Al-Habib, M., Al-Sabahi, K., 2018. Deep learning approach combining sparse autoencoder with svm for network intrusion detection. IEEE Access 6, 52843–52856.
Bhuyan, M.H., Bhattacharyya, D.K., Kalita, J.K., 2013. Network anomaly detection: methods, systems and tools. IEEE Commun. Surv. Tutor. 16, 303–336.
Blanco, R., Malagón, P., Cilla, J.J., Moya, J.M., 2018. Multiclass network attack classifier using cnn tuned with genetic algorithms. In: 28th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), pp. 177–182.
Chouhan, N., Khan, A., et al., 2019. Network anomaly detection using channel boosted and residual learning based deep convolutional neural network. Appl. Soft Comput. 83, 105612.
Deshmukh, D.H., Ghorpade, T., Padiya, P., 2015. Improving classification using preprocessing and machine learning algorithms on nsl-kdd dataset. In: International Conference on Communication, Information & Computing Technology (ICCICT). IEEE, pp. 1–6.
Dhanabal, L., Shantharajah, S., 2015. A study on nsl-kdd dataset for intrusion detection system based on classification algorithms. Int. J. Adv. Res. Comput. Commun. Eng. 4, 446–452.
Gao, X., Shan, C., Hu, C., Niu, Z., Liu, Z., 2019. An adaptive ensemble machine learning model for intrusion detection. IEEE Access 7, 82512–82521.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.


Heba, F.E., Darwish, A., Hassanien, A.E., Abraham, A., 2010. Principle components analysis and support vector machine based intrusion detection system. In: 10th International Conference on Intelligent Systems Design and Applications. IEEE, pp. 363–367.
Hsu, C.-M., Hsieh, H.-Y., Prakosa, S., Azhari, M., Leu, J.-S., 2018. Using long-short-term memory based convolutional neural networks for network intrusion detection. In: WICON, pp. 86–94.
Ingre, B., Yadav, A., 2015. Performance analysis of nsl-kdd dataset using ann. In: International Conference on Signal Processing and Communication Engineering Systems. IEEE, pp. 92–96.
Jaiganesh, V., Mangayarkarasi, S., Sumathi, P., 2013. Intrusion detection systems: a survey and analysis of classification techniques. Int. J. Adv. Res. Comput. Commun. Eng. 2, 1629–1635.
Kabir, E., Hu, J., Wang, H., Zhuo, G., 2018. A novel statistical technique for intrusion detection systems. Future Generat. Comput. Syst. 79, 303–318.
Le, T.-T.-H., Kim, Y., Kim, H., et al., 2019. Network intrusion detection based on novel feature selection model and various recurrent neural networks. Appl. Sci. 9, 1392.
Li, L., Yu, Y., Bai, S., Hou, Y., Chen, X., 2017. An effective two-step intrusion detection approach based on binary classification and k-nn. IEEE Access 6, 12060–12073.
Maaten, L.v. d., Hinton, G., 2008. Visualizing data using t-sne. J. Mach. Learn. Res. 9, 2579–2605.
Moustafa, N., Hu, J., Slay, J., 2019. A holistic review of network anomaly detection systems: a comprehensive survey. J. Netw. Comput. Appl. 128, 33–55.
Naoum, R.S., Al-Sultani, Z.N., 2012. Learning vector quantization (lvq) and k-nearest neighbor for intrusion classification. World Comput. Sci. Inf. Technol. J. (WCSIT) 2, 105–109.
Naseer, S., Saleem, Y., Khalid, S., Bashir, M.K., Han, J., Iqbal, M.M., Han, K., 2018. Enhanced network anomaly detection based on deep neural networks. IEEE Access 6, 48231–48246.
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D., 2017. Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626.
Shrivas, A.K., Dewangan, A.K., 2014. An ensemble model for classification of attacks with feature selection based on kdd99 and nsl-kdd data set. Int. J. Comput. Appl. 99, 8–13.
Vinayakumar, R., Soman, K., Poornachandran, P., 2017. Applying convolutional neural network for network intrusion detection. In: International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, pp. 1222–1228.
Wu, K., Chen, Z., Li, W., 2018. A novel intrusion detection model for a massive network using convolutional neural networks. IEEE Access 6, 50850–50859.
Yang, J., Li, T., Liang, G., He, W., Zhao, Y., 2019. A simple recurrent unit model based intrusion detection system with dcgan. IEEE Access 7, 83286–83296.
Yin, C., Zhu, Y., Fei, J., He, X., 2017. A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access 5, 21954–21961.
Zhang, B., Yu, Y., Li, J., 2018. Network intrusion detection based on stacked sparse autoencoder and binary tree ensemble method. In: IEEE International Conference on Communications Workshops (ICC Workshops). IEEE, pp. 1–6.
Zhang, Y., Li, P., Wang, X., 2019. Intrusion detection for iot based on improved genetic algorithm and deep belief network. IEEE Access 7, 31711–31722.

Zhendong Wu received the M.S. degree and the PhD degree in Computer Science and Technology from the Zhejiang University, Hangzhou, China. Currently, he is an Associate Professor with the School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China. His current research interests include biometrics, biological cryptography, machine intelligence and natural language research.

Jingjing Wang is currently pursuing the master degree in Information security at Hangzhou Dianzi University, Hangzhou, China. Her research interests include data mining, deep learning and intrusion detection.

Liqin Hu received the Ph.D. degree in mathematics from the Nanjing University of Aeronautics and Astronautics, Nanjing, China. She is a lecturer of the School of Cyberspace Security at Hangzhou Dianzi University. Her research interests include cryptography, and coding theory.

Zhang Zhang is currently pursuing the master degree in School of Systems Science at Beijing Normal University, Beijing, China. His research interests include complex system and Machine learning techniques.

Han Wu is currently pursuing the master degree in Cyberspace security at Hangzhou Dianzi University, Hangzhou, China. His research interests include computer vision, deep learning and datamining.
