An Enhanced AI-Based Network Intrusion Detection System Using Generative Adversarial Networks
Abstract—As communication technology advances, various heterogeneous data are communicated in distributed environments through network systems. Meanwhile, along with the development of communication technology, the attack surface has expanded, and concerns regarding network security have increased. Accordingly, to deal with potential threats, research on network intrusion detection systems (NIDSs) has been actively conducted. Among the various NIDS technologies, recent interest is focused on artificial intelligence (AI)-based anomaly detection systems, and various models have been proposed to improve the performance of NIDS. However, there still exists the problem of data imbalance, in which AI models cannot sufficiently learn malicious behavior and thus fail to detect network threats accurately. In this study, we propose a novel AI-based NIDS that can efficiently resolve the data imbalance problem and improve the performance of previous systems. To address the aforementioned problem, we leveraged a state-of-the-art generative model that can generate plausible synthetic data for minor attack traffic. In particular, we focused on reconstruction-error- and Wasserstein-distance-based generative adversarial networks and autoencoder-driven deep learning models. To demonstrate the effectiveness of our system, we performed comprehensive evaluations over various data sets and demonstrated that the proposed systems significantly outperform previous AI-based NIDSs.

Index Terms—Anomaly detection, generative adversarial network (GAN), network intrusion detection system (NIDS), network security.

I. INTRODUCTION

WITH the development of fifth-generation (5G) mobile communication technology, which diversifies access environments and constructs distributed networks, various heterogeneous data are communicated through network systems. In general, these data originate from diverse domains, such as sensors, computers, and the Internet of Things (IoT), and the capacity of network systems has been expanded to process these data reliably. However, as access points are diversified, the attack surface expands, leaving network systems vulnerable to potential threats. Moreover, cyber-attack techniques have become more complex and sophisticated, and the frequency of attacks has also increased. Accordingly, the importance of cybersecurity is emphasized, and various studies have been actively conducted to prevent potential network threats.

One of the fundamental challenges in cybersecurity is the detection of network threats, and various results have been reported in the field of network intrusion detection systems (NIDSs). In particular, the most recent studies have focused on applying artificial intelligence (AI) technology to NIDS, and AI-based intrusion detection systems have achieved remarkable performance. Initially, the research primarily focused on applying traditional machine learning models, such as decision trees (DTs) [1] and support vector machines (SVMs) [2], to existing intrusion detection systems, and it has since been extended to deep learning approaches [3], such as convolutional neural networks (CNNs), long short-term memory (LSTM), and autoencoders. Although these results have achieved remarkable performance in detecting anomalies, there still exist limitations in deploying them in real systems.

In general, most network flow data is normal traffic, and malicious behavior that can cause service failure occurs rarely. Moreover, within the category of malicious behavior, most of the data are well-known attacks, and specific types of attacks are extremely rare. Owing to this data imbalance problem, AI models deployed in NIDS cannot sufficiently learn the characteristics of specific network threats, which may leave network systems vulnerable to attacks because of poor detection performance.

In this study, to address this inherent problem, we propose a novel AI-based NIDS that can resolve the data imbalance problem and improve the performance of previous systems. Specifically, we leveraged a state-of-the-art deep learning architecture, generative adversarial networks (GANs) [4], to generate synthetic network traffic data. In particular, we focused on the reconstruction-error- and Wasserstein-distance-based GAN architecture [5], which can generate plausible synthetic data for minor attack traffic. By combining the generative model with anomaly detection models, we demonstrated that the proposed systems
outperformed previous results in terms of classification performance.

The entire architecture of our system consists of four main stages (see Fig. 1): 1) preprocessing; 2) generative model training; 3) autoencoder training; and 4) predictive model training. In the preprocessing stage, the system refines the raw data set into a format that deep learning models can learn. After preprocessing, the system sequentially trains generative models and an autoencoder model, where the trained generative models are utilized to train the autoencoder model. Finally, the system trains predictive models by applying the trained generative models and the encoder of the trained autoencoder, where the generative models are used to generate scarce data and the encoder is used as a feature extractor. For the classifier models, we consider three deep learning models that have been widely utilized in AI-based NIDS: 1) deep neural networks (DNNs); 2) CNNs; and 3) LSTM models. To evaluate our system, we experimented with four network flow data sets covering different scenarios: 1) NSL-KDD [6], [7]; 2) UNSW-NB15 [8]; 3) an IoT data set [9]; and 4) a real-world data set. Through experiments on these various data sets, we show that the proposed system outperformed previous results. Moreover, we demonstrate that our methodology can improve the performance of existing AI-based NIDS by resolving the data imbalance problem.

The main contributions of the proposed approach can be summarized as follows.
1) By combining a state-of-the-art GAN model that can generate plausible synthetic data and measure the convergence of training, we show that the proposed system outperforms existing AI-based NIDS in terms of detection rate.
2) Through comparative experiments with various deep learning models, we show that the detection performance for rare attacks can be improved by applying our methodology as a base module.
3) By experimenting with data sets collected from various scenarios, we show that the proposed system can be effectively applied to real-world environments.

The remainder of this article is organized as follows. Section II briefly reviews related research from the perspective of NIDS based on machine learning and deep learning approaches, and Section III provides a background with a focus on autoencoders and GANs. In Section IV, we describe our methodology and the proposed framework as well as the four main stages in detail. In Section V, we evaluate the proposed system in various environments and present experimental results with detailed analysis. Finally, we present concluding remarks and future work directions in Section VI.

II. RELATED WORK

In the field of AI-based NIDSs, many studies have been conducted to apply machine learning and deep learning technologies to anomaly detection. Ingre and Yadav [10] proposed a multilayer perceptron-based intrusion detection system and showed that the proposed approach achieved 81% and 79.9% accuracy in experiments on the NSL-KDD data set for binary and multiclassification, respectively. Gao et al. [11]
proposed a semi-supervised learning approach for NIDSs based on fuzzy and ensemble learning and reported that the proposed system achieved 84.54% accuracy on the NSL-KDD data set. By applying the deep belief network (DBN) model, Alrawashdeh and Purdy [12] developed an anomaly intrusion detection system and showed that the proposed DBN-based IDS exhibited a superior classification performance on subsampled testing sets (sampled subsets of the original data set). Considering the software-defined networking environment, Tang et al. [13] proposed a DNN-based anomaly detection system and reported that the DNN-based approach outperformed traditional machine learning approaches (e.g., Naïve Bayes, SVM, and DT). Imamverdiyev and Abdullayeva [14] proposed a restricted Boltzmann machine (RBM)-based intrusion detection system and showed that the Gaussian–Bernoulli RBM model outperformed other RBM-based models (such as the Bernoulli–Bernoulli RBM and DBN). From the perspective of utilizing both behavioral features (network traffic characteristics) and content features (payload information), Zhong et al. [15] introduced a big data and tree architecture-driven deep learning system into the intrusion detection system, where the authors combined shallow learning and deep learning strategies and showed that the system is particularly effective at detecting subtle patterns of intrusion attacks. With an ensemble-model-like approach, Haghighat and Li [16] proposed an intrusion detection system based on deep learning and voting mechanisms. They aggregated the best model results and showed that the system can provide more accurate detections. Moreover, they showed that false alarms can be reduced by up to 75% compared to conventional deep learning approaches. Considering data streams in industrial IoT environments, Yang et al. [17] proposed a tree structure-based anomaly detection system, where the authors incorporated window sliding, detection strategy changing, and model updating mechanisms into the locality-sensitive hashing-based iForest model [18], [19] to handle the infiniteness of data streams in real-time scenarios. Similarly, Qi et al. [20] proposed an intrusion detection system for multiaspect data streams by combining locality-sensitive hashing, isolation forest, and principal component analysis (PCA) techniques. They showed that the proposed system can effectively detect group anomalies while dealing with multiaspect data and can process each data row faster than previous approaches.

From the perspective of dealing with time-series data, several results have been reported focusing on recurrent models. Kim et al. [21] proposed an LSTM-based IDS model and proved the efficiency of the proposed IDS. Yin et al. [22] proposed a recurrent neural network-based intrusion detection system and achieved 83.3% and 81.3% accuracy in binary and multiclassification, respectively. Xu et al. [23] developed a recurrent neural network-based intrusion detection model and reported that the gated recurrent unit was more suitable as a memory unit for intrusion detection than the LSTM unit. Considering supervisory control and data acquisition (SCADA) networks, Gao et al. [24] proposed an omni-intrusion detection system that combined LSTM and a feedforward neural network through an ensemble approach and showed that the proposed system can effectively detect intrusion attacks regardless of temporal correlation. Moreover, they demonstrated that the proposed omni-IDS outperformed previous deep learning approaches through experiments on a SCADA testbed.

In addition to the previous approach of applying supervised learning as an anomaly detection model, several studies have focused on the application of unsupervised learning, especially autoencoder models. Javaid et al. [25] proposed a sparse autoencoder-based NIDS and reported that the proposed model achieved 79.1% accuracy for multiclassification on the NSL-KDD data set. Similarly, Yan and Han [26] leveraged the sparse autoencoder model to extract high-level feature representations of intrusive behavior information and demonstrated that the stacked sparse autoencoder model could be applied as an efficient feature extraction method. Shone et al. [27] proposed a stacked nonsymmetric deep autoencoder-based intrusion detection system and showed that the proposed model could achieve 85.42% accuracy in multiclassification. As one of the significant results, Ieracitano et al. [28] proposed autoencoder-based and LSTM-based IDS models and compared their performance with conventional machine learning models. Through experiments on the NSL-KDD data set, they reported that the proposed autoencoder-based systems outperformed other models and achieved 84.21% and 87% accuracy for binary and multiclassification, respectively.

As another approach to applying unsupervised learning, several studies have investigated using generative models to improve the performance of existing NIDS. In particular, they have focused on applying the basic GAN [4], which is based on the Jensen–Shannon divergence (or Kullback–Leibler divergence) [29], [30], [31]. Thereafter, along with the development of various GAN models, studies have been conducted to apply appropriate GAN models for specific purposes. Li et al. [32] and Lee et al. [33] utilized the Wasserstein divergence-based GAN model to generate synthetic data, and Dlamini and Fahim [34] proposed a conditional GAN-based anomaly detection model to improve the classification performance in the minority classes. Focusing on specific industrial environments, Li et al. [35] and Alabugin and Sokolov [36] proposed LSTM-GAN and bidirectional GAN-based anomaly detection models, respectively. Through experiments on the secure water treatment (SWaT) data set, they demonstrated that GAN models could be effectively applied to IDS. Siniosoglou et al. [37] proposed an anomaly detection model that could simultaneously detect anomalies and categorize the attack types. They encapsulated the autoencoder architecture into the structure of the basic GAN model (i.e., deploying the encoder as a discriminator and the decoder as a generator) and proved the efficiency of the proposed model in various smart grid environments.

Unlike previous GAN approaches that are based on the distance between data distributions, we considered the reconstruction error-based GAN model to generate more plausible synthetic data. In particular, we leveraged the boundary equilibrium GAN (BEGAN) model [5], which is based on the
and synthetic samples), the objective of BEGAN is defined based on the Wasserstein distance between the reconstruction error distributions as follows:

    L_D = L(x; θ_D) − k_t · L(G(z; θ_G); θ_D)
    L_G = L(G(z; θ_G); θ_D)                                               (5)
    k_{t+1} = k_t + λ_k · (γ · L(x; θ_D) − L(G(z; θ_G); θ_D))

where the hyperparameter γ ∈ [0, 1] is the diversity ratio,¹ and λ_k serves as the learning rate for k. Note that L(·) denotes the reconstruction error of the autoencoder, and t indicates the iteration step.

¹Originally, the diversity ratio γ is defined as γ = E[L(G(z))]/E[L(x)].
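The per-step bookkeeping implied by (5) is small enough to sketch directly. The following minimal Python sketch assumes the per-batch reconstruction errors L(x) and L(G(z)) have already been computed by the discriminator autoencoder; the default values of γ and λ_k, and the clipping of k_t to [0, 1], are assumptions borrowed from common BEGAN implementations rather than values stated in this article. The same two quantities also yield the equilibrium-based convergence measure that Section IV-B later uses as a termination criterion.

```python
import numpy as np

def began_step(err_real, err_fake, k, gamma=0.5, lambda_k=0.001):
    """One bookkeeping step of (5): err_real = L(x), err_fake = L(G(z))."""
    loss_d = err_real - k * err_fake                      # L_D
    loss_g = err_fake                                     # L_G
    k_next = k + lambda_k * (gamma * err_real - err_fake)
    k_next = float(np.clip(k_next, 0.0, 1.0))             # k_t is commonly kept in [0, 1]
    return loss_d, loss_g, k_next

def convergence_measure(err_real, err_fake, gamma=0.5):
    """Equilibrium-based measure M = L(x) + |gamma * L(x) - L(G(z))|."""
    return err_real + abs(gamma * err_real - err_fake)

# toy usage with made-up per-batch reconstruction errors
err_x, err_gz = 0.08, 0.15
loss_d, loss_g, k = began_step(err_x, err_gz, k=0.0)
M = convergence_measure(err_x, err_gz)
```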
IV. PROPOSED METHODOLOGY

As shown in Fig. 1, the entire architecture of the proposed AI-based NIDS consists of four main streams: 1) preprocessing; 2) generative model training; 3) autoencoder training; and 4) predictive model training. In this section, we describe the proposed methodology and each module (process) in detail.

A. Preprocessing

Before building and training AI models, the system refines a given raw data set via the preprocessing module, which consists of three subprocesses: 1) outlier analysis; 2) one-hot encoding; and 3) feature scaling.

In the outlier analysis phase, the system eliminates outliers, which can negatively affect model training. Typically, outliers are detected by quantifying the statistical distribution of the data set via robust measures of scale. There are several standard robust measures of scale for detecting outliers, such as the interquartile range (IQR) and the median absolute deviation (MAD). Among these measures, we leveraged the MAD. For a numeric attribute A = {x_1, x_2, ..., x_n}, the MAD of the attribute is defined as follows:

    MAD = median(|x_i − median(A)|).                                       (6)

We assume that numeric attributes appearing in the data set follow a normal distribution. Then, a consistent estimator σ̂ of the standard deviation is 1.4826 × MAD (the constant 1.4826 ≈ 1/Φ⁻¹(3/4) is what makes the MAD-based estimate consistent for normally distributed data). In terms of this estimator, we determine that, for a given numeric attribute, values exceeding 10 × σ̂ are outliers. Obviously, outlier analysis is performed only on the numerical attributes and is conducted independently for each class. Note that outlier removal should be performed before feature scaling, as scaling can otherwise obscure information about outliers.

After filtering out the outliers, the system transforms nominal attributes into one-hot vectors. Each nominal (categorical) attribute is represented as a binary vector whose size equals the number of attribute values, where 1 is assigned only to the position corresponding to the expressed value and 0 to all others. For example, in the case of the "protocol" attribute (commonly included in network traffic data) with the values tcp, udp, and icmp, the attribute is transformed into a binary vector of length 3, and the attribute values are converted into [1, 0, 0], [0, 1, 0], and [0, 0, 1], respectively. Together with the one-hot encoding process, the system scales the numeric attributes. In general, normalization (e.g., [28]) and standardization (e.g., [24]) can be considered for scaling numeric features. Between these two approaches, we adopted the min-max normalization method.² The normalization function f_A(·) for a numeric attribute A, which maps every x ∈ A into the range [0, 1], can be defined as follows:

    f_A(x_i) = x̃_i = (x_i − min_j x_j) / (max_j x_j − min_j x_j)           (7)

where x_i denotes the ith attribute value in the attribute A.

²In our experiments, there was no significant difference between the two feature scaling methods in terms of the performance of the detection models.

In general, existing deep-learning-based approaches consider feature extraction (e.g., PCA, the Pearson correlation coefficient, etc.) at this step to feed the model as many informative features as possible, and, consequently, feature extraction can significantly impact the performance of models in anomaly detection. However, we do not consider a computational feature extraction process, as our framework embeds an autoencoder model that can replace the functionalities of feature extraction. Note that, in our framework, a model with a computational feature extraction process did not show significant improvement compared with the model without feature extraction. A detailed description of deploying the autoencoder as a feature extractor is presented later.
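As a concrete illustration of the three preprocessing steps, the sketch below applies MAD-based outlier removal per class (Eq. (6), with σ̂ = 1.4826 × MAD and a 10σ̂ cutoff), one-hot encodes nominal attributes, and min-max normalizes numeric attributes (Eq. (7)). This is a minimal sketch: the column names, the pandas-based implementation, and the handling of constant attributes are illustrative assumptions, not the authors' code.

```python
import numpy as np
import pandas as pd

def mad_outlier_mask(values, factor=10.0):
    """True for values within factor * sigma_hat of the median,
    where sigma_hat = 1.4826 * MAD (Eq. (6))."""
    med = np.median(values)
    sigma_hat = 1.4826 * np.median(np.abs(values - med))
    if sigma_hat == 0:
        return np.ones(len(values), dtype=bool)
    return np.abs(values - med) <= factor * sigma_hat

def preprocess(df, numeric_cols, nominal_cols, label_col="label"):
    # 1) outlier removal: per numeric attribute, independently for each class
    keep = pd.Series(True, index=df.index)
    for _, group in df.groupby(label_col):
        for col in numeric_cols:
            mask = mad_outlier_mask(group[col].to_numpy())
            keep.loc[group.index[~mask]] = False
    df = df.loc[keep].copy()

    # 2) one-hot encoding of nominal attributes (e.g., protocol: tcp/udp/icmp)
    df = pd.get_dummies(df, columns=nominal_cols)

    # 3) min-max normalization of numeric attributes into [0, 1] (Eq. (7))
    for col in numeric_cols:
        lo, hi = df[col].min(), df[col].max()
        df[col] = 0.0 if hi == lo else (df[col] - lo) / (hi - lo)
    return df
```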
B. Synthetic Data Generation With Generative Model

The synthetic data generation module builds and trains generative models using the data set refined in the data preprocessing module. For the generative model, we utilize a state-of-the-art GAN model, BEGAN, which is based on the concept of autoencoders and a reconstruction-error-based objective function. For the model architecture, we built the discriminator as a symmetric autoencoder model with five layers and the generator with the same architecture as the decoder of the discriminator (autoencoder). Fig. 4 illustrates the entire architecture of the BEGAN model. Before training the BEGAN model, the system first splits the given data set according to the classes and then builds a generative model for each split subdata set. That is, generative models are built in a number equal to the number of classes, and (after training) each generative model produces only synthetic data corresponding to a particular class.

One of the important factors that must be considered when applying GAN models to NIDS is the determination of the termination criteria of training, which has a significant impact on the performance of anomaly detection, as it is directly related to the quality of the synthetic data on which the detection model is trained. Determining the termination criteria requires tracking the training convergence, and this is a difficult problem, as the objective function of GAN models is defined to have the properties of a zero-sum game. In general, monitoring the training progress has been conducted indirectly through visual inspection of the synthetic (generated) data. However, even this approach is not feasible in NIDS environments because the data being handled are not in the form of an image. Fortunately, unlike other GAN models, BEGAN can approximate the convergence of training through the concept of equilibrium, and this characteristic facilitates the determination of the criteria for training termination. The convergence measure M of BEGAN is formulated as follows:

    M = L(x) + |γ · L(x) − L(G(z))|                                        (8)

where L(·) is the reconstruction error function, and γ is the diversity ratio.

By utilizing the convergence measure, the system terminates the generative model's training process. That is, when training the generative model, the system takes a threshold as an input parameter and terminates the training process if the convergence measure M outputs a value less than the given threshold. In the experiment, we set the threshold of the convergence measure M to 0.058.³

³In the learning process, if the convergence measure M does not fall below the given threshold, the process may fall into an infinite loop. To prevent this, we additionally set a maximum number of iterations.

After training the generative models, the system generates synthetic data according to the classes using the trained generators and integrates the generated data into the original training data set. This expanded data set is used to train the autoencoder and detection model in the next stage. Note that, although we designed the synthetic data generation module to build multiple generative models according to the number of classes, it can be built as a single model by integrating the concept of the conditional GAN architecture [41], where class attributes are embedded in the input space.

C. Learning the Autoencoder and Detection Model

To build the intrusion detection model, the system first trains an autoencoder model that can provide feature extraction and dimensionality reduction functionalities. In our framework, we designed the autoencoder to possess the same architecture as the discriminator of the generative model. Because the deployed generative model is BEGAN, the discriminator has the form of an autoencoder, as depicted above, and is compatible in terms of the model architecture, as it handles the same data format as the detection model. After building an autoencoder model, the system trains it using the expanded data set composed in the previous module and then utilizes the trained encoder as the feature extraction module. Algorithm 1 presents the detailed process of autoencoder training, where m_i (1 ≤ i ≤ k) indicates the amount of synthetic data to be generated for class i. Note that the trained encoder is placed at the forefront (input layer) of the detection models as a feature extractor and is set not to learn any further when training the detection models (i.e., we fix the model parameters of the trained encoder when training the detection models).

Algorithm 1 Autoencoder Training With Generators
Input: training dataset D_train, a set of generators G
1: Initialize autoencoder parameters θ⁰_AE
2: for G_i ∈ G, where 1 ≤ i ≤ k do
3:    sample z = {z_j}_{j=1,...,m_i} from the latent space
4:    D̂_i = G_i(z)
5: end for
6: D̃ = D_train ∪ D̂_1 ∪ ··· ∪ D̂_k
7: θ_AE = Train_Autoencoder(θ⁰_AE, D̃)
8: θ_enc = Extract_Encoder(θ_AE)
Output: trained encoder θ_enc
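A minimal PyTorch realization of Algorithm 1 is sketched below, assuming the trained per-class generators are callables that map latent noise to synthetic feature rows and that the data are already preprocessed tensors; the layer sizes, latent dimension, and training settings are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Symmetric autoencoder; layer sizes are illustrative placeholders."""
    def __init__(self, n_features, hidden=80, latent=50):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, latent), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent, hidden), nn.ReLU(),
            nn.Linear(hidden, n_features),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_autoencoder_with_generators(train_x, generators, sizes,
                                      latent_dim=64, epochs=10, lr=1e-3):
    """Algorithm 1: expand the training set with per-class synthetic data,
    train the autoencoder on the expanded set, and return the frozen encoder."""
    # lines 2-5: sample m_i latent vectors for each class generator G_i
    synthetic = [g(torch.randn(m, latent_dim)).detach()
                 for g, m in zip(generators, sizes)]
    # line 6: D~ = D_train U D^_1 U ... U D^_k
    expanded = torch.cat([train_x] + synthetic, dim=0)

    # line 7: train the autoencoder on the expanded data set
    ae = AutoEncoder(expanded.shape[1])
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(expanded), batch_size=256, shuffle=True)
    for _ in range(epochs):
        for (batch,) in loader:
            loss = nn.functional.mse_loss(ae(batch), batch)  # reconstruction error
            opt.zero_grad()
            loss.backward()
            opt.step()

    # line 8 and the note above: extract the encoder and freeze its parameters
    encoder = ae.encoder
    for p in encoder.parameters():
        p.requires_grad = False
    return encoder
```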
For the detection models, we utilized basic DNN, CNN, and LSTM classifiers. We designed the DNN model to possess two hidden layers, and it could naturally process the refined network traffic data in terms of the model training and classification task. In the case of the CNN model, because the model was originally designed to be more suitable for analyzing image data, it required additional transformation processes in the input data space or in the layers of the model, depending on the approach followed. In our system, we built the CNN model with one-dimensional (1-D) convolutional layers to process the network traffic data, rather than converting the input data (i.e., network traffic data) into a 2-D space. As shown in Fig. 5, we configured the CNN classifier to have two 1-D convolutional layers and one fully connected layer. For LSTM, we designed the model to possess two recurrent layers with LSTM units and a fully connected layer, as shown in Fig. 6. LSTM is known to be particularly effective

Fig. 5. CNN model architecture in our system.
Fig. 6. LSTM model architecture in our system.

Algorithm 2 (fragment)
9: W_θ_enc = Train_Classifier(W⁰_θ_enc, D̃)
Output: trained classifier W_θ_enc

can be set differently depending on the weight of each class. Note that the process of combining with the trained encoder (lines 7 and 8 in Algorithm 2) can be omitted depending on the predictive model.

From the perspective of the entire framework, the system sequentially processes the data preprocessing, synthetic data generation, and detection model training modules, and we refer to the whole system as G-DNNAE, G-CNNAE, or G-LSTM, according to the type of detection model. Additionally, we subdivide the whole system into subsystems for a comprehensive comparison. In particular, we consider the DNN, CNN, and LSTM models as naïve deep learning models, and DNNAE and CNNAE, which are models combined with the autoencoder, as advanced deep learning models. In the experiment, we conducted a comparative analysis of G-LSTM, G-DNNAE, and G-CNNAE against these subsystems.

V. EXPERIMENTS AND EVALUATIONS

In this section, we first review the target data sets and describe the detailed implementation of each component. Then, we present the experimental results with comparative analysis and evaluate the proposed systems.

A. Data Set Description

In this work, we focused on three network traffic data sets that are widely used as benchmarks in the field of intrusion detection systems. Furthermore, we collected real data from a large enterprise system and analyzed the performance of the proposed model on the real data set.

1) NSL-KDD Data Set: The NSL-KDD data set is a refined version of the KDDcup99 data set [6], [7] and consists of training and testing data sets, KDDTrain and KDDTest, with 125 973 and 22 544 rows, respectively.⁴ In each data point, there exist 41 attributes (3 nominal, 6 binary, and 32 numeric attributes) presenting different features of the network flow and a label indicating an attack type or normal behavior. For the attack types, there exist four distinct attack profiles: 1) Denial of Service (DoS); 2) Probing; 3) Remote to Local (R2L); and 4) User to Root (U2R). DoS is an attack that depletes resources by sending excessive traffic to the target system, thereby rendering it incapable of handling legitimate network traffic or service access. In the case of a probing attack, the attacker's objective is to gain information about the target system (e.g., scanning ports in use and sweeping IP addresses). R2L is an attack that attempts to obtain local access from a remote machine by sending fraudulent remote traffic to the target, and behaviors such as password guessing and HTTP tunneling are considered R2L attacks. In the case of U2R, an attacker first gains access to the target system as an honest user and then attempts to gain root privileges by causing system faults (e.g., buffer overflow and rootkit). Table I presents the entire distribution of the NSL-KDD data set with respect to the classes (attack classes and normal).

⁴The original configuration of the data set includes several subdata sets. However, we only present the main training and testing data sets.

2) UNSW-NB15 Data Set: Together with the NSL-KDD data set presented above, the UNSW-NB15 data set [8], which was created by the IXIA PerfectStorm tool, has been widely
TABLE I: DATA DISTRIBUTION IN NSL-KDD
TABLE II: DATA DISTRIBUTION IN UNSW-NB15
TABLE III: DISTRIBUTION OF RAW SECURITY EVENTS IN THE REAL DATA SET

positives are relatively high (see [43] for a detailed description of the collected real data set). Note that, although there were several detailed classes of detected attacks, each data point was categorized as Normal or Threat only (related to the privacy issues of the enterprise).

B. Implementation and Hyperparameters Tuning

As described in the previous section, we set the discriminator of the generative model to be a symmetric autoencoder model with three layers. For this model, we constructed the first hidden layer with 80 neurons and a latent space dimension of size 50. Therefore, the generator is set to have a latent space of size 50 and a hidden layer of size 80. Additionally, we applied batch normalization to each hidden layer for the stability of learning and used the rectified linear unit (ReLU) as the activation function. Note that, because we configured the autoencoder as a feature extractor with the same architecture as the discriminator, the above configuration corresponds to that of the autoencoder as well. In the case of the generative model, we set the convergence threshold to 0.058 and terminated training when the convergence measure fell below the given threshold or the number of epochs reached 250. For autoencoder learning, we set the default number of epochs to 300 and stopped training when the reconstruction accuracy was above 0.97.

For the classifier models, we deployed three distinct deep learning models: DNN, CNN, and LSTM. Considering the number of features, we explored the depth of the models up to three layers. In the experiment, the one-layer structure showed high volatility, and the three-layer structure showed a tendency to overfit. As a result, the models were most stable in the two-layer structure and showed the highest performance. For the DNN model, we set the first hidden layer to have 32 neurons and the second layer to have 16 neurons. For CNN, we used a 1-D-CNN model with two convolutional layers. The convolutional layers are configured to have 32 convolution filters with windows of size 5, and a fully connected layer of 16 neurons follows. Additionally, we applied a max-pooling layer with windows of size 3 to the first convolutional layer and a batch normalization layer after each convolutional layer. For the activation function, we used ReLU as in the generative model. In the case of LSTM, we connected 64 LSTM cells in each layer and concatenated a fully connected layer with 32 neurons. For these detection models, we set the default number of epochs to 300 and applied an early stopping technique (we stopped learning when the relative differences of loss were less than 10⁻⁶ consecutively for 35 epochs [24]).

We utilized two additional basic machine learning models as comparative models.
1) SVM is a supervised learning model based on statistical learning theory that aims to locate the best hyperplane that can optimally separate input domains according to the classes. In the experiment, we implemented the linear kernel SVM model [2].
2) DT is a nonparametric supervised learning model that recursively splits input domains based on the correlation between each feature and the class. In this study, we implemented the C4.5 algorithm [1].

For a more extensive comparison, we subdivided the components of our system, DNN, CNN, LSTM, DNNAE, and CNNAE, and utilized them as comparative models alongside the whole system. Note that we regard these submodels as corresponding to existing AI-based NIDS.⁷ In particular, DNN, CNN, and LSTM are considered naïve deep learning approaches. In the case of DNNAE and CNNAE, they are considered advanced deep learning approaches combined with autoencoders.

⁷Although the detailed architecture and configurations may differ from those of the previous approaches, we stress that the implemented models are comparable to, or outperform, the existing systems in terms of performance.

In the experiment, we utilized four metrics to evaluate the performance of the AI models: Accuracy, Precision, Recall, and F1-score. Accuracy refers to the fraction of correctly inferred results and is commonly used to quantify the performance of AI models. For a given class in a data set, Precision presents the fraction of positive values inferred by the model that are correct, while Recall refers to the fraction of data with positive values that are correctly inferred by the model. The F1-score is the harmonic mean of Precision and Recall. The formulas of these metrics are defined as follows:

    1) Accuracy = (TP + TN) / (TP + FP + TN + FN)
    2) Precision = TP / (TP + FP)
    3) Recall = TP / (TP + FN)
    4) F1-score = 2 × (Precision × Recall) / (Precision + Recall)

where TP, TN, FN, and FP denote the true positives, true negatives, false negatives, and false positives, respectively.

Using these metrics, we evaluated each model on the experimental data sets. Note that, although we built the models with a stable structure, there was still the issue of volatility. Accordingly, with respect to comparison and evaluation, we independently trained each model 100 times and display the results for the model with the best detection rate on the test data set.
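The four metrics above can be computed directly from the confusion-matrix counts; a minimal helper, with zero-division guards added as an assumption:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall, and F1-score from (one-vs-rest) counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1
```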
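To make the configuration described in this subsection concrete, the following is a minimal PyTorch sketch of one of the generator-combined detection models (a G-CNNAE-style classifier): the trained encoder is fixed at the input, followed by the 1-D CNN head described above. The padding, the pooling position, the output layer, and the class count are assumptions, not details taken from the article.

```python
import torch
import torch.nn as nn

class GCNNAE(nn.Module):
    """Frozen encoder followed by a 1-D CNN head: two Conv1d layers with
    32 filters of width 5, max pooling of width 3 after the first convolution,
    batch normalization after each convolution, a 16-neuron fully connected
    layer, and ReLU activations (class name and output layer are illustrative)."""
    def __init__(self, encoder, latent_dim=50, n_classes=5):
        super().__init__()
        self.encoder = encoder  # parameters already frozen (feature extractor)
        self.head = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, padding=2),
            nn.BatchNorm1d(32), nn.ReLU(),
            nn.MaxPool1d(3),
            nn.Conv1d(32, 32, kernel_size=5, padding=2),
            nn.BatchNorm1d(32), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * (latent_dim // 3), 16), nn.ReLU(),
            nn.Linear(16, n_classes),
        )

    def forward(self, x):
        z = self.encoder(x)    # extracted features, shape (batch, latent_dim)
        z = z.unsqueeze(1)     # (batch, 1, latent_dim) for 1-D convolution
        return self.head(z)

# usage sketch: model = GCNNAE(frozen_encoder); logits trained with cross-entropy
```

Following the description above, the DNN and LSTM variants would replace this head with 32/16-neuron fully connected layers and with two 64-cell LSTM layers plus a 32-neuron fully connected layer, respectively.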
TABLE IV: BINARY CLASSIFICATION RESULTS FOR THE TEST DATA SET IN NSL-KDD
TABLE V: MULTICLASSIFICATION RESULTS FOR THE TEST DATA SET IN NSL-KDD
TABLE VI: CLASSIFICATION ACCURACY FOR EACH THREAT CLASS ON THE UNSW-NB15 DATA SET
TABLE VII: EXPERIMENTAL RESULTS ON THE IOT-23 DATA SET FOR MULTICLASSIFICATION TASKS
(which possessed weights of approximately 1% within the distribution) compared with the other models. Through experiments on the UNSW-NB15 data set, which contains more diverse classes, we found that the proposed model could improve the classification performance for the major classes. Moreover, we found that the implemented generative model could further improve the classification performance for minor and extremely minor classes.

Although the proposed framework can improve the classification performance, there is still the problem of relatively low detection rates for some classes. In particular, all the experimented models were observed to have relatively low detection rates for the DoS class, even in the LSTM-based model, which is suitable for detecting temporally correlated attacks. Regarding these results, we infer that the domain space between classes is heavily overlapping [34], resulting in low detection rates for some classes.

E. Experiments on the IoT Data Set

To evaluate the performance of the proposed systems in IoT environments, we conducted experiments on the IoT-23 data set. As described above, we utilized the data set collected in the Mirai botnet scenario (CTU-IoT-Malware-Capture-34-1) and intentionally simulated an extreme data imbalance scenario. For evaluation, we randomly split the data set into training and test data sets at a ratio of 7:3 in each class (i.e., 84 855 training data and 36 367 test data). In the experiments on the proposed system, we generated synthetic data to attain a total size of 30 000 for each malicious class in the training data set and evaluated the performance of all models using the previously separated test data (36 367 rows).

Fig. 10. Comparison of multiclassification results on the IoT-23 data set.

Table VII presents the experimental results for the multiclassification task on the IoT-23 data set, and Fig. 10 shows a comparison of the experimental results. Overall, all the models achieved an accuracy greater than 93%, and the models were observed to have perfect classification performance for the DDoS class, even in the naïve deep learning approach. Moreover, we observed that there was no significant difference in performance between the advanced models and the naïve models. In the case of the C&C class, all models showed 100% precision. For the proposed models, all the generator-combined models showed the same performance and achieved a significant improvement in recall, with a probability of 80%. These results are presumably due to the fact that the IoT data set is very simple and has features that contain powerful information related to the nature of the attack (e.g., "history"). In addition, regarding these results, we conjecture that the trained generative model generated plausible data points that fall within a certain region of the C&C distribution (appearing in the test data set, but not in the training data set) and partially covered the missing region in the (extended) training data set. Moreover, since there is a portion of the data in the corresponding region of the test data set, we estimate that this is why G-LSTM, G-DNNAE, and G-CNNAE performed significantly better than the other models. For the PortScan class, which is extremely minor, all models achieved 100% recall, and the proposed systems achieved the highest precision value, with a probability of 90.4%.

F. Experiments on the Collected Real Data Set

To analyze the feasibility of the proposed system in a real environment, we collected real network flow data with raw security events from a large enterprise system and conducted experiments on this real data set. As in the above experiment, we randomly split the collected data set into training and test data sets at a ratio of 7:3 in both the normal and abnormal classes (i.e., 3 347 639 training data and 1 434 703 test data). Note that
TABLE VIII: EXPERIMENTAL RESULTS ON THE REAL DATA SET FOR BINARY CLASSIFICATION TASKS

Fig. 11. Comparison of binary classification results on the real data set.

we only considered the binary classification scenario in the experiments on the real environment. As shown in Table VIII, the data set possesses a severe imbalance between the normal and abnormal classes. In the experiments on the proposed system, we generated synthetic data for the abnormal class to be the same size as the normal class and evaluated the performance of all models using the previously partitioned test data set, as in the previous experiments.

Table VIII presents the experimental results on the real data set, and Fig. 11 shows a comparison of the experimental results. First, it can be seen that all models achieve superior performance in terms of accuracy, as the data set consists of 95.1% normal data and 4.9% anomalous data. Moreover, there was no significant difference between the naïve and advanced models in terms of classification performance, as in the experiment on the IoT data set. From the perspective of each class, the models achieved high F1-scores for normal data as expected, but relatively low recall values were measured for abnormal data. In the case of the proposed models, G-DNNAE and G-CNNAE achieved 93.8% F1-scores in the abnormal class, and we observed that the deployed generative model could significantly improve the classification performance of minor classes even in the real system.

G. Evaluation

Through comprehensive experiments on various data sets, we demonstrated that the proposed system significantly outperforms previous deep learning approaches and showed that the classification performance for minor classes can be greatly improved through the generative model. In particular, the proposed models showed a noticeable performance improvement for the R2L and Probe classes on the NSL-KDD data set. In addition, we confirmed that the proposed model can significantly improve the detection rate for most classes on the UNSW-NB15 data set. Moreover, through experiments on the IoT data set, we observed that our system can efficiently detect network threats in a distributed environment. To demonstrate the feasibility in real-world environments, we collected real data and tested our system in the binary classification scenario. Through experiments on the real data set, we demonstrated that the proposed model could improve the detection performance of network anomalies by resolving the data imbalance problem, and that the proposed system can be effectively applied in real-world environments.

VI. CONCLUSION

In this study, we presented a novel AI-based NIDS that can efficiently resolve the data imbalance problem and improve the classification performance of previous systems. To address the data imbalance problem, we leveraged a state-of-the-art generative model that can generate plausible synthetic data and measure the convergence of training. Moreover, we implemented autoencoder-driven detection models based on DNN and CNN and demonstrated that the proposed models outperform previous machine learning and deep learning approaches. The proposed system was analyzed on various data sets, including two benchmark data sets, an IoT data set, and a real data set. In particular, the proposed models achieved accuracies of up to 93.2% and 87% on the NSL-KDD data set and the UNSW-NB15 data set, respectively, and showed remarkable performance improvement in the minor classes. In addition, through experiments on an IoT data set, we demonstrated that the proposed system can efficiently detect network
threats in a distributed environment. Moreover, in order to investigate the feasibility in real-world environments, we collected real data from a large enterprise system and evaluated the proposed model on the collected data set. Through this experiment, we demonstrated that the proposed model can significantly improve the detection rate of network threats by resolving the data imbalance problem in a real environment.

In the future, considering practical distributed environments, we will focus on applying our framework to federated learning systems and ensemble AI systems to enhance network threat detection. In addition, we will study adversarial attacks that can bypass AI-based NIDS through vulnerabilities in AI models and conduct research on enhanced NIDS that can resist these attacks in real-world environments.

REFERENCES

[1] J. R. Quinlan, C4.5: Programs for Machine Learning (Morgan Kaufmann Series in Machine Learning). San Mateo, CA, USA: Morgan Kaufmann, 1993.
[2] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, U.K.: Cambridge Univ. Press, 2000.
[3] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[4] I. J. Goodfellow et al., "Generative adversarial nets," in Proc. 27th Int. Conf. Neural Inf. Process. Syst. (NIPS), 2014, pp. 2672–2680.
[5] D. Berthelot, T. Schumm, and L. Metz, "BEGAN: Boundary equilibrium generative adversarial networks," 2017, arXiv:1703.10717.
[6] S. Hettich and S. D. Bay, "KDD Cup 1999 data," 1999. [Online]. Available: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[7] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in Proc. IEEE Symp. Comput. Intell. Secur. Defense Appl., Jul. 2009, pp. 1–6.
[8] N. Moustafa and J. Slay, "UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in Proc. Military Commun. Inf. Syst. Conf. (MilCIS), 2015, pp. 1–6.
[9] A. Parmisano, S. Garcia, and M. J. Erquiaga, "A labeled dataset with malicious and benign IoT network traffic," 2020. [Online]. Available: https://www.stratosphereips.org/datasets-iot23
[10] B. Ingre and A. Yadav, "Performance analysis of NSL-KDD dataset using ANN," in Proc. Int. Conf. Signal Process. Commun. Eng. Syst., Andhra Pradesh, India, Jan. 2015, pp. 92–96.
[11] Y. Gao, Y. Liu, Y. Jin, J. Chen, and H. Wu, "A novel semi-supervised learning approach for network intrusion detection on cloud-based robotic system," IEEE Access, vol. 6, pp. 50927–50938, 2018.
[12] K. Alrawashdeh and C. Purdy, "Toward an online anomaly intrusion detection system based on deep learning," in Proc. IEEE 15th Int. Conf. Mach. Learn. Appl. (ICMLA), Anaheim, CA, USA, 2016, pp. 195–200.
[13] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, "Deep learning approach for network intrusion detection in software defined networking," in Proc. Int. Conf. Wireless Netw. Mobile Commun. (WINCOM), 2016, pp. 258–263.
[14] Y. Imamverdiyev and F. Abdullayeva, "Deep learning method for denial of service attack detection based on restricted Boltzmann machine," Big Data, vol. 6, no. 2, pp. 159–169, Jun. 2018.
[15] W. Zhong, N. Yu, and C. Ai, "Applying big data based deep learning system to intrusion detection," Big Data Min. Anal., vol. 3, no. 3, pp. 181–195, Sep. 2020.
[16] M. H. Haghighat and J. Li, "Intrusion detection system using voting-based neural network," Tsinghua Sci. Technol., vol. 26, no. 4, pp. 484–495, Aug. 2021.
[17] Y. Yang et al., "ASTREAM: Data-stream-driven scalable anomaly detection with accuracy guarantee in IIoT environment," IEEE Trans. Netw. Sci. Eng., early access, Mar. 8, 2022, doi: 10.1109/TNSE.2022.3157730.
[18] F. T. Liu, K. M. Ting, and Z.-H. Zhou, "Isolation-based anomaly detection," ACM Trans. Knowl. Discov. Data, vol. 6, no. 1, pp. 1–39, Mar. 2012.
[19] X. Zhang et al., "LSHiForest: A generic framework for fast tree isolation based ensemble anomaly analysis," in Proc. IEEE 33rd Int. Conf. Data Eng. (ICDE), Apr. 2017, pp. 983–994.
[20] L. Qi, Y. Yang, X. Zhou, W. Rafique, and J. Ma, "Fast anomaly identification based on multi-aspect data streams for intelligent intrusion detection toward secure industry 4.0," IEEE Trans. Ind. Informat., vol. 18, no. 9, pp. 6503–6511, Sep. 2022.
[21] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, "Long short term memory recurrent neural network classifier for intrusion detection," in Proc. Int. Conf. Platform Technol. Service (PlatCon), 2016, pp. 1–5.
[22] C. Yin, Y. Zhu, J. Fei, and X. He, "A deep learning approach for intrusion detection using recurrent neural networks," IEEE Access, vol. 5, pp. 21954–21961, 2017.
[23] C. Xu, J. Shen, X. Du, and F. Zhang, "An intrusion detection system using a deep neural network with gated recurrent units," IEEE Access, vol. 6, pp. 48697–48707, 2018.
[24] J. Gao et al., "Omni SCADA intrusion detection using deep learning algorithms," IEEE Internet Things J., vol. 8, no. 2, pp. 951–961, Jan. 2021.
[25] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, "A deep learning approach for network intrusion detection system," EAI Endorsed Trans. Security Safety, vol. 3, no. 9, p. e2, May 2016.
[26] B. Yan and G. Han, "Effective feature extraction via stacked sparse autoencoder to improve intrusion detection system," IEEE Access, vol. 6, pp. 41238–41248, 2018.
[27] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, "A deep learning approach to network intrusion detection," IEEE Trans. Emerg. Topics Comput. Intell., vol. 2, no. 1, pp. 41–50, Feb. 2018.
[28] C. Ieracitano, A. Adeel, F. C. Morabito, and A. Hussain, "A novel statistical analysis and autoencoder driven intelligent intrusion detection approach," Neurocomputing, vol. 387, pp. 51–62, Apr. 2020.
[29] J. Y. Kim, S. J. Bu, and S. B. Cho, "Malware detection using deep transferred generative adversarial networks," in Proc. Int. Conf. Neural Inf. Process., 2017, pp. 556–564.
[30] M. H. Shahriar, N. I. Haque, M. A. Rahman, and M. Alonso, "G-IDS: Generative adversarial networks assisted intrusion detection system," in Proc. IEEE 44th Annu. Comput., Softw., Appl. Conf. (COMPSAC), Jul. 2020, pp. 376–385.
[31] I. Yilmaz, R. Masum, and A. Siraj, "Addressing imbalanced data problem with generative adversarial network for intrusion detection," in Proc. IEEE 21st Int. Conf. Inf. Reuse Integr. Data Sci. (IRI), Las Vegas, NV, USA, 2020, pp. 25–30.
[32] D. Li, D. Kotani, and Y. Okabe, "Improving attack detection performance in NIDS using GAN," in Proc. IEEE 44th Annu. Comput., Softw., Appl. Conf. (COMPSAC), Jul. 2020, pp. 817–825.
[33] W. Lee, B. Noh, Y. Kim, and K. Jeong, "Generation of network traffic using WGAN-GP and a DFT filter for resolving data imbalance," in Proc. Int. Conf. Internet Distrib. Comput. Syst. (IDCS), Oct. 2019, pp. 306–317.
[34] G. Dlamini and M. Fahim, "DGM: A data generative model to improve minority class presence in anomaly detection domain," Neural Comput. Appl., vol. 33, pp. 13635–13646, Apr. 2021.
[35] D. Li, D. Chen, J. Goh, and S.-K. Ng, "Anomaly detection with generative adversarial networks for multivariate time series," 2018, arXiv:1809.04758.
[36] S. K. Alabugin and A. N. Sokolov, "Applying of generative adversarial networks for anomaly detection in industrial control systems," in Proc. Global Smart Ind. Conf. (GloSIC), Nov. 2020, pp. 199–203.
[37] I. Siniosoglou, P. Radoglou-Grammatikis, G. Efstathopoulos, P. Fouliras, and P. Sarigiannidis, "A unified deep learning anomaly detection and classification approach for smart grid environments," IEEE Trans. Netw. Service Manage., vol. 18, no. 2, pp. 1137–1151, Jun. 2021.
[38] D. E. Rumelhart and J. L. McClelland, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations, vol. 1. Cambridge, MA, USA: MIT Press, 1987, pp. 318–362.
[39] G. E. Hinton and R. S. Zemel, "Autoencoders, minimum description length and Helmholtz free energy," in Proc. 6th Int. Conf. Neural Inf. Process. Syst., 1993, pp. 3–10.
[40] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," 2016, arXiv:1511.06434.
[41] M. Mirza and S. Osindero, "Conditional generative adversarial nets," 2014, arXiv:1411.1784.
[42] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein generative adversarial networks," in Proc. 34th Int. Conf. Mach. Learn. (ICML), 2017, pp. 214–223.
[43] J. Lee, J. Kim, I. Kim, and K. Han, "Cyber threat detection based on artificial neural networks using event profiles," IEEE Access, vol. 7, pp. 165607–165626, 2019.
Cheolhee Park received the B.S. degree from the Department of Applied Mathematics, Kongju National University, Gongju, South Korea, in 2014, and the M.S. and Ph.D. degrees from the Department of Mathematics, Kongju National University, in 2017 and 2021, respectively. He joined the Electronics and Telecommunications Research Institute, Daejeon, South Korea, in 2021, where he is currently working as a Researcher. His research interests include data privacy, differential privacy, machine learning, deep learning, AI security, and network security.

Jong-Geun Park received the B.S. and M.S. degrees from the Department of Industrial Engineering, Sungkyunkwan University, Seoul, Republic of Korea, in 1997 and 1999, respectively, and the Ph.D. degree from the Department of Computer Engineering, Chungnam National University, Daejeon, Republic of Korea, in 2013. From 1999 to 2001, he was a Researcher with ADD, Daejeon. Then, he joined the Electronics and Telecommunications Research Institute, Daejeon, in 2001, where he is currently working as a Principal Researcher. He is currently interested in mobile network security, SDN/NFV,
cloud security, and AI security.