
Yin et al. Cybersecurity (2021) 4:22
https://doi.org/10.1186/s42400-021-00090-w

Cybersecurity

Research | Open Access

VAECGAN: a generating framework for long-term prediction in multivariate time series

Xiang Yin1, Yanni Han2*, Zhen Xu2 and Jie Liu3

Abstract
Long-term prediction is still a difficult problem in data mining. People usually rely on various Recurrent Neural Network methods for prediction; however, as the prediction step increases, the accuracy of the prediction decreases rapidly. In order to improve the accuracy of long-term prediction, we propose the Variational Auto-Encoder Conditional Generative Adversarial Network (VAECGAN) framework. Our model is divided into three parts. The first part is the encoder net, which encodes the exogenous sequence into latent space vectors and fully preserves the information carried by the exogenous sequence. The second part is the generator net, which is responsible for generating the prediction data. In the third part, the discriminator net is used to classify and give feedback, adjusting the data generation and improving the prediction accuracy. Finally, extensive empirical studies on five real-world datasets (NASDAQ, SML, Energy, EEG, KDDCUP) demonstrate the effectiveness and robustness of our proposed approach.
Keywords: Long-term prediction, Multivariate time series, Attention mechanism, Generating framework

Introduction
As countries around the world strengthen the construction of modern information infrastructure and promote the development of big data and the Internet of Things, more and more information is collected through sensor devices. The security of network data has gradually become an important problem. Network managers deploy a large number of security devices in the network to prevent various attacks. In order to enhance the security of the network, more and more researchers are also involved in network security situation analysis. Among them, the prediction of time series data can effectively evaluate and measure potential threats in the network. Through this technology, the system can be used for analysis, prediction, decision-making and control, such as automatic allocation of resources in the network, network attack early warning (Qu et al. 2005), security situation prediction (Liu et al. 2021), anomaly detection (Li et al. 2019) and so on.

In recent years, Artificial Neural Networks (ANNs) have been widely used in time series prediction. Users do not need to specify the functional form relating independent and dependent variables when building ANNs, parameters can be estimated with the back-propagation algorithm, and in theory an ANN can represent any complex continuous function. Among them, the Recurrent Neural Network (RNN) and sequence-to-sequence models (Sutskever et al. 2014) have achieved great success in the field of sequence data and have attracted the attention of researchers. An RNN adopts a chain structure to simulate the dynamic behavior of a time series and retains its long-term patterns through gate-like structures. At present, more and more people use RNNs for time series prediction, including Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber 1997) and the Gated Recurrent Unit (GRU) (Cho et al. 2014). Several studies have shown success with variants of these models (Zhu and Laptev 2017; Laptev et al. 2017; Maddix et al. 2018).

*Correspondence: [email protected]
2 Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Full list of author information is available at the end of the article

© The Author(s). 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit
to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The
images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

In software engineering projects, long-term forecasting is particularly important for system requirement management, storage maintenance and scheduling planning. Multi-step-ahead prediction refers to predicting multiple future time steps of a variable based on past and present data. Real-world applications often entail a mixture of short-term and long-term repeating patterns. Related research on the long-term prediction of time series mainly focuses on trend prediction. A hybrid neural network has been proposed to predict the trend of time series (Lin et al. 2017), and in some practical applications people try to predict the trend of stock prices (Xu and Cohen 2018). However, these algorithms do not make full use of the information provided by exogenous (driving) series. Qin et al. (2017) and Liu et al. (2020) proposed neural network architectures based on the encoder-decoder network to address this problem. However, as the prediction step increases, the complexity of the prediction grows and the prediction accuracy decreases. Zhang et al. (2019) tried to use a Generative Adversarial Network (GAN) architecture to solve the prediction problem, proposing a GAN model with a Multi-Layer Perceptron (MLP) as the discriminator network and an LSTM as the generator network for financial forecasting. However, these methods are based on the recursive application of a single-step prediction model for multi-step prediction, so any prediction errors continue to accumulate. In general, we face a challenge: using the observed time series of the past to predict an unknown time series far into the future, where the larger the prediction step, the harder the problem.

In order to cope with the above challenge, we propose VAECGAN (Variational Auto-Encoder Conditional Generative Adversarial Network). We use the encoder of a VAE to encode the driving series into the latent space and provide it to the generator, so that the latent space is no longer random noise and contains part of the information in the driving series. In the generation stage, LSTM and attention are used to generate prediction data that follows the same temporal trend as the past data. In the discrimination stage, convolution layers are mainly used to extract data features and discriminate between the generated data and the true data. The main contributions of this paper are as follows:

(1) A framework, VAECGAN, is introduced for long-term prediction. In the model, the encoder of the VAE encodes the driving series into the latent space, so that the latent space contains part of the information in the driving series. The CGAN module improves the ability of the VAE module to generate time series data.

(2) We propose a dynamic weight clipping method. Dynamic weight clipping makes the discriminator more stable. The experiments in the "Experiment" section also prove the effectiveness of the clipping.

The remainder of this paper is organized as follows: the "Related work" section introduces the related background and the basic idea of our work. The "Problem statement" section describes the problem statement of the paper. In "The VAECGAN model" section, we present the details of our model, including the Encoder network, the Generator network and the Discriminator network. Experiments are given in the "Experiment" section. We conclude our work and give a glimpse of future work in the "Conclusion and future work" section.

Related work
In recent years, multiple studies have straightforwardly inherited the GAN framework within the temporal setting. Mogren proposed a Recurrent Neural Network architecture, Continuous-RNN-GAN (C-RNN-GAN) (Mogren 2016), which uses adversarial training to model the whole joint probability of a sequence and generate data sequences; this model was demonstrated by training on classical music sequences in MIDI format. The Recurrent Conditional GAN (RCGAN) (Esteban et al. 2017) is a medical data generation framework. It follows the architecture of the traditional GAN model, in which the generator and discriminator are replaced by RNNs, so the RCGAN model can generate real-valued sequence data constrained by some conditions. EEG-GAN (Hartmann et al. 2018) is a framework for generating brain signals. Its authors improve the Wasserstein-GAN model to stabilize training and investigate a range of architectural choices critical for time series generation (most notably up- and down-sampling). EEG-GAN opens up new possibilities for new applications, not only for data augmentation, but also for spatial or temporal oversampling (proposed by Corley and Huang in 2018) or recovery of damaged signals. Time-series Generative Adversarial Networks (Yoon et al. 2019) is also a data generation approach, generating realistic time-series data by combining the flexibility of the unsupervised paradigm with the control afforded by supervised training. However, all these works generate data with the same temporal trend; they do not predict future data.

In some studies, representation learning is used to obtain compact encodings for prediction tasks, and several works have explored the benefit of combining autoencoders with adversarial training. Larsen et al. (2015) is proposed for learning similarity measures, and Makhzani et al. (2015) for improving generative capability. However, all these works are applied to image generation, not time series data generation.

By contrast, we choose the Conditional GAN (CGAN) (Mirza and Osindero 2014) as our basic framework, but, unlike CGAN, we encode the exogenous sequence of the temporal data through a Variational Auto-Encoder (VAE) network instead of using random noise as the input of the generator. In the encoder stage, we input the exogenous sequence data, adjust the data weights through the attention mechanism, and encode them with the LSTM network. In the decoder/generator stage, we use the encoding result as the latent space and the target sequence data as the label input to the network, and decode (generate) data through the LSTM and attention functions. In the discrimination stage, we extract the characteristics of the data through convolution layers and optimize the discriminator network in a dynamic way. The framework is shown in Fig. 1.

Problem statement
Based on the concept of adversarial training, GAN is a deep learning framework that generates data through game learning. In theory, it can learn any complex probability distribution. Because GAN can produce high-quality images, it has achieved great success in the field of image generation. The essence of GAN is to generate data consistent with the distribution of real data. Long-term forecasting also generates a series of future data with characteristics similar to those of the current data. This inspires us to apply GAN to time series distribution learning in order to generate the future time distribution. However, if the prediction data is generated directly from random noise Z, its quality is not very good. We therefore use the VAE model to encode the exogenous sequence of the data from its original distribution to a normal distribution, so the latent space contains the exogenous sequence information of the data. Meanwhile, the Decoder net is not only used for decoding, but can also serve as a Generator network. Here, $x^k = (x^k_1, x^k_2, \ldots, x^k_T) \in \mathbb{R}^T$ represents the $k$-th driving series of length $T$, and $x_t = (x^1_t, x^2_t, \ldots, x^n_t) \in \mathbb{R}^n$ denotes the vector of $n$ driving series at time $t$. Self-attention is used to process the driving series $x_t$ so that the weights among the driving series can be captured at time $t$. Meanwhile, the input-attention mechanism is also used to adaptively select the temporal correlation of the driving series $x^k$. As shown in Fig. 1, an LSTM function then processes the outputs of the self-attention and the input-attention and yields the result as the latent space $Z$.

Fig. 1 The architecture of the VAECGAN model
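To make the notation above concrete, the following minimal sketch (ours, not from the paper's code) builds a toy driving-series window and shows the two views used in the rest of the paper: one driving series $x^k$ across the window, and the vector $x_t$ of all driving series at a single time step. The sizes (T=10, n=81) simply mirror the NASDAQ setting described later.

```python
# Minimal illustration of the driving-series notation; the array contents are random.
import numpy as np

T, n = 10, 81                 # window length and number of driving (exogenous) series
X = np.random.randn(T, n)     # one training window, shape (T, n)

x_k = X[:, 0]                 # x^k in R^T: the k-th driving series over the whole window
x_t = X[4, :]                 # x_t in R^n: all driving series observed at one time step

print(x_k.shape, x_t.shape)   # (10,) (81,)
```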



$$Z = E(x_1, x_2, \ldots, x_T) \tag{1}$$

where $E(\cdot)$ is the Encoder network. We put the target sequence and the latent space into the Generator net to generate the target sequence. Then, given the $T$ target values, i.e., $Y = (y_1, y_2, \ldots, y_T) \in \mathbb{R}^T$, where $T$ is the length of the window size we define, $Y$ denotes all target values during the past $T$ time steps. In the Generator (Decoder) stage, the temporal-attention mechanism is used to automatically select the time steps of the encoder result. The prediction value $\hat{Y} = (\hat{y}_{T+1}, \hat{y}_{T+2}, \ldots, \hat{y}_{T+\Delta})$ is then calculated from the latent space $Z$ and the target series $Y$. Given the previous readings, the target series $\hat{y}$ is predicted as

$$\hat{y} = F(y_1, y_2, \ldots, y_T, Z) \tag{2}$$

where $F(\cdot)$ is a nonlinear function and $\Delta$ represents the number of prediction time steps. In order to get better prediction results, we use the real values $Y = (y_{T+1}, y_{T+2}, \ldots, y_{T+\Delta})$ and the prediction value $\hat{y}$ to train the Discriminator net, and add category labels $L = (L_{T+1}, L_{T+2}, \ldots, L_{T+\Delta})$ as conditional variables to guide the Discriminator net. Specifically, the Discriminator is trained to minimize the average least square error between its per-time-step predictions and the labels of the sequence:

$$D_{loss} = LS(CNN(y), L) \tag{3}$$

where $LS(\cdot)$ is the least square function and $L$ is a vector of 1s, or 0s, for the sequence. The generator is trained to 'trick' the discriminator into classifying its outputs as true data; that is, it hopes to minimize the least square error between the discriminator's predictions on generated sequences and the 'true' label, the vector of 1s (written as $\mathbf{1}$):

$$G_{loss} = D_{loss}(CNN(Z), \mathbf{1}) \tag{4}$$

The VAECGAN model
The VAECGAN model is composed of three networks: the encoder network, the generator network and the discriminator network. Figure 2 shows the architecture of the three parts. The encoder network processes the driving series and generates the latent space, which keeps the relationship information. The generator network uses the latent space and the target series to generate the prediction series. The discriminator network classifies data as real or fake.

The Encoder network
The encoder network is composed of input attention, self-attention and an LSTM network. Figure 2a shows the encoder architecture. In time series prediction, long sequence inputs are not friendly to the Encoder-Decoder model, so we can better predict the target value by extracting important information from the driving series through input attention. Given the $T$-step input series $x^k = (x^k_1, x^k_2, \ldots, x^k_T) \in \mathbb{R}^T$, where $T$ is the time window, we can compute the attention weights by formulas (5) and (6):

$$e^k_t = v_e^{\top} \tanh\left(W_e[h_{t-1}; s_{t-1}] + U_e x^k\right) \tag{5}$$

$$\alpha^k_t = \frac{\exp(e^k_t)}{\sum_{i=1}^{n} \exp(e^i_t)} \tag{6}$$

where $v_e \in \mathbb{R}^T$, $W_e \in \mathbb{R}^{T \times 2m}$ and $U_e \in \mathbb{R}^{T \times T}$ are parameters to learn, and $\alpha^k_t$ is the attention weight at time $t$. A softmax function ensures that the attention weights sum to 1. In order to extract the series adaptively, we multiply the attention weights with the temporal series by formula (7):

$$\tilde{x}^1 = \left(\alpha^k_1 x^k_1, \alpha^k_2 x^k_2, \ldots, \alpha^k_T x^k_T\right) \tag{7}$$

Self-attention has been used to study textual representation and has achieved great success (Vaswani et al. 2017; Yin et al. 2020). In this paper, self-attention dynamically adjusts the importance of the driving series, yielding a unique adjustment coefficient for each driving series. We introduce an attention layer whose attention matrix captures the similarity of any token with respect to all neighboring tokens in an input sequence. Given the input driving series $x_t = (x^1_t, x^2_t, \ldots, x^k_t)$, the attention mechanism is implemented as follows:

$$g_t = \tanh(W_g x_t + b_g) \tag{8}$$

$$\tilde{\alpha}^k_t = \mathrm{Sigmoid}(W_{\alpha} g_t + b_{\alpha}) \tag{9}$$

where $W_g \in \mathbb{R}^m$ and $W_{\alpha} \in \mathbb{R}^{T \times m}$ are parameters to learn, and $b_g$ and $b_{\alpha}$ are the bias vectors. Then we multiply the attention weight coefficients with the attributes of the driving series to reflect the different importance of the different attributes of the driving series:

$$\tilde{x}^2 = \left(\tilde{\alpha}^1_t x^1_t, \tilde{\alpha}^2_t x^2_t, \ldots, \tilde{\alpha}^k_t x^k_t\right)^{\top} \tag{10}$$

Since $\tilde{x}^2$ will be concatenated with $\tilde{x}^1$, we take the transpose of $\tilde{x}^2$ so that they have the same shape.

After calculating the self-attention and input-attention, we feed the results into the latent space using the function $f_1$, which is an LSTM unit:

$$\tilde{h}^1_t = f_1(\tilde{h}^1_{t-1}, \tilde{x}^1_t) \tag{11}$$

$$\tilde{h}^2_t = f_1(\tilde{h}^2_{t-1}, \tilde{x}^2_t) \tag{12}$$

$$Z = [\tilde{h}^1_t; \tilde{h}^2_t] \tag{13}$$

where $[\tilde{h}^1_t; \tilde{h}^2_t]$ is the concatenation of the two hidden states, and $Z$ is the input for the generator net.
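The following condensed PyTorch-style sketch illustrates the encoder step described above (Eqs. 5-13). It is illustrative only: the class name, layer sizes and the exact wiring of the attention scores are our assumptions, not the authors' released implementation, and the input-attention state is shown for a single (initial) step rather than unrolled over time.

```python
# Toy encoder: input attention + self attention + two LSTM passes -> latent code Z.
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    def __init__(self, n_series, T, m):
        super().__init__()
        self.input_attn = nn.Linear(2 * m + T, 1)        # scores e_t^k from [h; s] and x^k (Eqs. 5-6, merged form)
        self.self_attn_g = nn.Linear(n_series, m)        # g_t = tanh(W_g x_t + b_g)        (Eq. 8)
        self.self_attn_a = nn.Linear(m, n_series)        # alpha~_t = sigmoid(W_a g_t + b_a) (Eq. 9)
        self.lstm1 = nn.LSTM(n_series, m, batch_first=True)   # f1 over input-attended series (Eq. 11)
        self.lstm2 = nn.LSTM(n_series, m, batch_first=True)   # f1 over self-attended series  (Eq. 12)
        self.m = m

    def forward(self, x):                                 # x: (batch, T, n_series)
        b, T, n = x.shape
        h = x.new_zeros(b, self.m)                         # previous hidden state h_{t-1}
        s = x.new_zeros(b, self.m)                         # previous cell state s_{t-1}
        hs = torch.cat([h, s], dim=-1).unsqueeze(1).expand(b, n, 2 * self.m)
        scores = self.input_attn(torch.cat([hs, x.transpose(1, 2)], dim=-1)).squeeze(-1)
        alpha = torch.softmax(scores, dim=-1)              # one weight per driving series
        x1 = x * alpha.unsqueeze(1)                        # re-weighted driving series (Eq. 7)
        gate = torch.sigmoid(self.self_attn_a(torch.tanh(self.self_attn_g(x))))
        x2 = x * gate                                      # self-attended series (Eq. 10)
        _, (h1, _) = self.lstm1(x1)
        _, (h2, _) = self.lstm2(x2)
        return torch.cat([h1[-1], h2[-1]], dim=-1)         # latent code Z (Eq. 13)

Z = ToyEncoder(n_series=81, T=10, m=64)(torch.randn(4, 10, 81))
print(Z.shape)   # torch.Size([4, 128])
```

In this sketch the two hidden states are concatenated into a single latent vector per window, which is the role the paper assigns to Z as the generator's input.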

Fig. 2 The architecture of the Encoder network (a), Generator network (b) and Discriminator network (c)

The generator network
The generator network is composed of a sequential attention mechanism and an LSTM network. Figure 2b shows the network framework. Temporal attention is employed to adaptively select the hidden states of all time steps of the related encoder. The attention weight $\beta^i_t$ of each time step $t$ is calculated from the previous hidden state $d_{t-1}$ and the cell state of the LSTM unit $s_{t-1}$:

$$l^i_t = v_d^{\top} \tanh\left(W_d[d_{t-1}; s_{t-1}] + U_d Z\right) \tag{14}$$

$$\beta^i_t = \frac{\exp(l^i_t)}{\sum_{j=1}^{T-1} \exp(l^j_t)} \tag{15}$$

where $[d_{t-1}; s_{t-1}] \in \mathbb{R}^{2p}$ represents the concatenation of the previous hidden state $d_{t-1}$ and the cell state of the LSTM unit $s_{t-1}$, and $v_d \in \mathbb{R}^m$, $W_d \in \mathbb{R}^{m \times m}$ and $U_d \in \mathbb{R}^{m \times m}$ are parameters to learn. The context vector $c_t$ is computed by formula (16):

$$c_t = \sum_{t=1}^{T-1} \beta^i_t Z \tag{16}$$

Then the context vector $c_{t-1}$ is combined with the given target series $y_{t-1}$ by formula (17):

$$\tilde{y}_t = \tilde{W}^{\top}[y_{t-1}; c_{t-1}] + \tilde{b} \tag{17}$$

where $[y_{t-1}; c_{t-1}] \in \mathbb{R}^{m+1}$ represents the concatenation of the target series $y_{t-1}$ and the weighted-sum context vector $c_{t-1}$, and $\tilde{W} \in \mathbb{R}^{m+1}$ and $\tilde{b} \in \mathbb{R}$ are parameters. An LSTM unit is used to update the decoder hidden state $d_t$ at time $t$ by formula (18):

$$d_t = f_1(d_{t-1}, \tilde{y}_{t-1}) \tag{18}$$

The temporal dependence can be captured with the LSTM unit $f_1$.

The discriminator network
Figure 2c shows the Discriminator architecture. The discriminator network consists of convolutional neural network layers and a sigmoid activation function layer. The convolutional part is mainly composed of three 1-D convolution layers, which can better capture the interesting features and discriminate the real data from the generated data.

In the GAN model, the discriminator network is mainly used to judge whether the input data is real or generated, and it needs to adjust its parameters to give judgments that are as accurate as possible. At the same time, the generator network is mainly used to generate data that simulates the real data as closely as possible in order to confuse the discriminator network. The LSGAN model (Mao et al. 2017) proposed the least square method as the loss, which remedies the low quality of the data generated by the traditional GAN. Generally, taking cross entropy as the loss function means the generator stops optimizing data that the discriminator network already judges as true, even if that data does not fully conform to the trend of the real data.

Table 1 The MAE and RMSE indexes of time series prediction over the five datasets (Δ=50). For each model, the first row is the MAE and the second row is the RMSE.

Models     SML2010   NASDAQ100   Energy   EEG      KDDCUP
LSTM       0.7924    0.4362      1.3416   0.3977   0.3556
           0.7145    0.3389      0.5968   0.4304   0.3928
Seq2Seq    0.7594    0.3491      1.2385   0.2814   0.3014
           0.6451    0.3369      0.6147   0.3816   0.3684
DARNN      0.5134    0.1554      1.1561   0.2951   0.2579
           0.3938    0.2881      0.5912   0.3817   0.3177
TCN        0.5614    0.0993      1.1154   0.2411   0.1935
           0.4989    0.2412      0.5983   0.3591   0.2881
DSTPRNN    0.3254    0.1217      1.0471   0.2589   0.1753
           0.2589    0.2621      0.5477   0.3127   0.1829
VAE        0.4388    0.1898      1.2378   0.2426   0.2253
           0.4136    0.2716      0.5471   0.3802   0.2968
VAECGAN    0.3115    0.0579      0.8294   0.2193   0.1654
           0.1638    0.1911      0.4516   0.3452   0.1901

Why does this phenomenon happen? The main reason is that the generator network has already achieved its goal of confusing the discriminator network as much as possible, so the cross entropy loss is very small and cannot be optimized further. The least square method is different: it remains possible to further reduce the least square loss, so the generator network keeps generating data that is more like the real data while still confusing the discriminator network. Meanwhile, the least square method also makes the GAN training process more stable. Therefore, we think that using the least square method as the loss function can effectively improve the quality and stability of the generated data. The least square loss functions are expressed as follows:

$$\min_{D} J(D) = \min_{D} \frac{1}{2}\mathbb{E}_{x \sim p_r}\left[(D(x) - a)^2\right] + \frac{1}{2}\mathbb{E}_{z \sim p_z}\left[(D(G(Z)) - b)^2\right] \tag{19}$$

$$\min_{G} J(G) = \min_{G} \frac{1}{2}\mathbb{E}_{z \sim p_z}\left[(D(G(z)) - c)^2\right] \tag{20}$$

where $D(\cdot)$ represents the discriminator network and $G(\cdot)$ represents the generator network. The input series data generate the latent space $Z$ through the Encoder network. The constant $a$ (1) represents the real data label, the constant $b$ (0) represents the generated data label, and the constant $c$ (1) is the value with which the discriminator judges the generated data to be real.
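A short sketch of the least-squares (LSGAN-style) objectives in Eqs. (19)-(20) is given below, with a = c = 1 for real labels and b = 0 for generated labels, as stated above. The function and variable names (d_loss, d_real, d_fake, ...) are placeholders, not the authors' code.

```python
# Least-squares GAN objectives, assuming d_real / d_fake are discriminator scores.
import torch

def d_loss(d_real, d_fake, a=1.0, b=0.0):
    # Discriminator: push scores on real data toward a and on generated data toward b
    return 0.5 * torch.mean((d_real - a) ** 2) + 0.5 * torch.mean((d_fake - b) ** 2)

def g_loss(d_fake, c=1.0):
    # Generator: push the discriminator's scores on generated data toward c
    return 0.5 * torch.mean((d_fake - c) ** 2)

# Toy usage with random discriminator outputs
print(d_loss(torch.rand(8, 1), torch.rand(8, 1)).item(),
      g_loss(torch.rand(8, 1)).item())
```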

Table 2 The MAE and RMSE indexes of time series prediction over the five datasets (Δ=120). For each model, the first row is the MAE and the second row is the RMSE.

Models     SML2010   NASDAQ100   Energy   EEG      KDDCUP
LSTM       0.8015    0.4751      1.3270   0.4091   0.4174
           0.7023    0.4635      0.8197   0.4933   0.5352
Seq2Seq    0.8012    0.3921      1.3653   0.3674   0.4228
           0.6974    0.4562      0.8214   0.5017   0.4811
DARNN      0.5327    0.3393      1.2671   0.3298   0.3973
           0.5526    0.4074      0.7489   0.4666   0.3596
TCN        0.5387    0.2565      1.1232   0.2713   0.2667
           0.5001    0.3672      0.6152   0.3938   0.3231
DSTPRNN    0.4543    0.2285      1.0910   0.2821   0.2588
           0.4109    0.3613      0.6457   0.3667   0.3182
VAE        0.5879    0.2915      1.2218   0.3017   0.2884
           0.5454    0.4237      0.7322   0.4511   0.3469
VAECGAN    0.4499    0.2193      0.8411   0.2562   0.2418
           0.3965    0.3782      0.6267   0.3599   0.3118

Fig. 3 The detail of the time series prediction results over the SML dataset

In WGAN (Arjovsky et al. 2017), in order to satisfy the Lipschitz condition, weight clipping is used to limit the weights of the whole network to a certain range (c=0.01). This method has been proved to be simple and to perform well, but it also introduces problems: weight clipping limits the capacity of the network, making it difficult to simulate complex functions. In addition, an inappropriate setting of the weight clipping range causes the vanishing gradient problem; only when the clipping range is set appropriately can a suitable gradient value be returned. Therefore, this paper uses a dynamic clipping strategy to solve this problem.

Fig. 4 MAE vs. length of the prediction time steps (Δ) over the NASDAQ (a) and SML (b) datasets

First, the weight values of the whole network are obtained. Then the weight values at the first θ percent and the last θ percent are calculated, and these values are used as the dividing lines for clipping (a brief illustrative sketch of this clipping step is given after the data set descriptions below). Algorithm 1 describes the training process of VAECGAN.

Algorithm 1: VAECGAN training process. The VAE model consists of the Encoder network and the Decoder (Generator) network. The GAN model consists of the Generator network and the Discriminator network.
Require: θ, the clipping threshold; TrueData, the real-world data; ω, the weights in the discriminator net; TrueLabel, a vector of 1s; FakeLabel, a vector of 0s.
For i in GanEpoch do:
    Latent ← VAE.Encoder
    FakeData ← GAN.Generator
    For τ in DisEpoch do:
        DlossTrue ← GAN.Discriminator.Train(TrueData, TrueLabel)
        DlossFake ← GAN.Discriminator.Train(FakeData, FakeLabel)
        ω ← clip(ω, ω[θ × len(ω)], ω[(1 − θ) × len(ω)])
    End for
    For τ in GenEpoch do:
        Gloss ← GAN.Generator.Train(TrueLabel)
    End for
    For τ in VaeEpoch do:
        VAE.Train()
    End for
End for

Experiment
In order to evaluate the effectiveness of our model, we conduct experiments on five public datasets. The parameter settings of our proposed VAECGAN model and the evaluation metrics are introduced. Then, we adopt five different baseline models for comparison. Moreover, we show the comparison results between VAECGAN and the other baselines and study the parameter sensitivity of the clipping threshold.

Data sets
SML2010 Data Set (SML) (SML2010 2014): The dataset is collected from a monitoring system mounted in a domestic house. The data were sampled every minute and smoothed with 15-minute means. In our experiments, the target value is the indoor (room) temperature, and 18 other features are selected as the driving series.
NASDAQ 100 Data Set (NASDAQ) (NASDAQ100 2017): This subset of the full NASDAQ 100 stock dataset includes 81 major corporations, with missing data filled by linear interpolation. The NASDAQ 100 index value is used as the target series. The data cover 105 days of minute-by-minute stock data from July 26 to December 22, 2016. Each day contains 390 data points, except for 210 data points on November 25 and 180 data points on December 22. In our experiments, the last column is the target series and the other 80 columns are driving series.
Appliances Energy Prediction Data Set (Energy) (Candanedo Ibarra et al. 2017): The dataset is sampled at 10-minute intervals for about 4.5 months. In our experiment, we employ the appliances energy use as the target series, delete the date attribute, and employ the other attributes as driving series.
EEG Steady-State Visual Evoked Potential Signals Data Set (EEG) (EEG 2018): This dataset consists of 30 subjects performing a Brain Computer Interface for Steady State Visual Evoked Potentials (BCI-SSVEP); we only use the visual image search data from the first subject. In our experiment, we use O1 as the target value and the other 13 signal attributes coming from the electrodes as exogenous series.
KDDCUP: This is the data set used for the Third International Knowledge Discovery and Data Mining Tools Competition, held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.
In our experiments, the last 20% of the points are the test data. Of the remaining 80% of the data, the first 80% of the points are the training data and the last 20% are the validation data. In order to make each feature contribute equally to the results, normalization is used to preprocess the data.
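Returning to the clipping step referenced in Algorithm 1 above, the following sketch shows one way the dynamic weight clipping could be realized: instead of the fixed range used in WGAN, the clipping bounds are taken from the θ-th and (1 − θ)-th percentiles of the discriminator's own weights. The function name and the choice to pool all parameters into one sorted vector are our assumptions, not the authors' code.

```python
# Dynamic weight clipping: clip every discriminator weight to the [theta, 1-theta] percentile range.
import torch

def dynamic_clip_(discriminator, theta=0.2):
    # Gather every weight of the discriminator into one sorted 1-D vector
    flat = torch.cat([p.data.view(-1) for p in discriminator.parameters()])
    sorted_w, _ = torch.sort(flat)
    lo = sorted_w[int(theta * len(sorted_w))].item()            # value at the first theta percent
    hi = sorted_w[int((1 - theta) * len(sorted_w)) - 1].item()  # value at the last theta percent
    for p in discriminator.parameters():
        p.data.clamp_(lo, hi)                                   # clip all weights in place to [lo, hi]
    return lo, hi

# Toy usage: clip a small linear "discriminator", as one would after each discriminator update step
print(dynamic_clip_(torch.nn.Linear(4, 1), theta=0.2))
```

In a full training loop this call would follow each discriminator update inside the DisEpoch loop of Algorithm 1.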

Parameter settings and evaluation metrics
Hyper-parameters: In this experiment, we set seven parameters according to previous work (Qin et al. 2017; Liu et al. 2020). The Adam optimizer (Kingma and Ba 2014), with the learning rate set to 0.0001 and the batch size set to 128, is used to train the generator network and the discriminator network. In the VAECGAN model, the length of the window size T is chosen from the values 5, 8, 10, 13, 15 and 20; the prediction results show that T = 10 is the best choice. For simplicity, the hidden units (m) of the encoder network and the hidden units (p) of the generator network have the same size, searched over 16, 32, 64, 128 and 256. When m = p = 64 or 128, our approach achieves the best performance over the test set. The choice of the clipping threshold θ is examined in a later part.
Evaluation Metrics: In order to compare the effectiveness of various time series prediction algorithms, we use two common criteria to evaluate our model, namely the root mean squared error (RMSE) (Plutowski et al. 1996) and the mean absolute error (MAE), which are widely used in regression tasks. The two measurements are defined below:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y^i_t - \hat{y}^i_t\right)^2} \tag{21}$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|y^i_t - \hat{y}^i_t\right| \tag{22}$$

where $y_t$ is the true target at time $t$ and $\hat{y}_t$ is the predicted value at time $t$.

Baseline models
LSTM (Hochreiter and Schmidhuber 1997): LSTM is a variant of the RNN that overcomes the vanishing gradient limitation of RNNs. Since it can capture long-term dependence, it has achieved good results in many time series tasks.
Seq2Seq (Sutskever et al. 2014): Seq2Seq is an Encoder-Decoder model with a sequence of inputs and a sequence of outputs. The encoder neural network turns a variable-length input sequence into a fixed-length vector, and the decoder neural network decodes the vector into a variable-length output sequence. This method performs well in machine translation, text translation and other NLP tasks.
DARNN (Qin et al. 2017): This algorithm, proposed in 2017, shows state-of-the-art performance in single-step time series prediction. With its dual-stage attention scheme, the DARNN model can not only make predictions effectively, but can also be easily interpreted.
TCN (Bai et al. 2018): The Temporal Convolutional Network combines modeling ability in the time domain with the feature extraction ability of convolutions with few parameters. It runs faster than an RNN and is good at capturing temporal dependency.
DSTPRNN (Liu et al. 2020): The Dual-Stage Two-Phase based RNN is inspired by the human attention mechanism. The first phase produces violent but decentralized response weights, while the second phase leads to stationary and concentrated response weights. Multiple attentions are employed on the target series to boost the long-term dependence.
VAE: This method uses the combination of the encoder net and generator net mentioned in this paper. This method is also used to train the encoder network.

Performance comparison
In this section, our proposed model is compared with the other five baseline models on five datasets. The prediction results with fifty time steps are recorded in Table 1. The LSTM units in the LSTM model and Seq2Seq model are 64; the other models are consistent with their papers. The first line represents the MAE, the second line represents the RMSE, and the best result is displayed in boldface.
In Tables 1 and 2, the prediction accuracy decreases with a long step size. The LSTM model and Seq2Seq model can capture the temporal dependence to a certain extent, but in the Seq2Seq model the series data are mapped to a fixed-dimension vector in the encoder stage, so some information in the series data is lost. The DARNN model is proposed for short-term prediction and is also not good enough for long-term prediction. In contrast, although the VAE model also belongs to the encoder-decoder family, the input attention mechanism and the self-attention mechanism can retain the data information to a large extent in the encoder stage; therefore, the prediction result of the VAE model is more accurate than that of the Seq2Seq model. Although the TCN model captures the temporal dependence with convolution layers, its performance is not good for long-term prediction; because the transfer learning ability of the TCN model is poor, its prediction quality varies across databases. The DSTPRNN model adopts a two-stage attention mechanism, which can effectively capture temporal and spatial dependence, and therefore shows good prediction performance. The performance of the VAE model is worse than that of the DSTPRNN model. However, after adding the CGAN module, the prediction effect of the VAECGAN model is greatly improved. This also proves that the training and feedback from the discriminator network can generate better prediction data.
It can be clearly seen from Fig. 3 that the red curve (VAECGAN) is more consistent with the purple curve (real values). This shows that our model is more accurate than the other three models. At the same time, it can be seen that VAECGAN maintained the same trend as the real values, while TCN fluctuated a lot. Due to the poor performance of the LSTM and Seq2Seq algorithms, we omit them.
In Fig. 4, we show the prediction effect of the various models on the SML dataset and the NASDAQ dataset. It can be seen that with the increase of the prediction step Δ, the prediction accuracy of all the models is reduced to varying degrees. This phenomenon also confirms the difficulty that the prediction accuracy decreases as the prediction step increases. But the VAECGAN model is more stable than the other baseline models, and with the increase of the step size, the VAECGAN model performs better.

Fig. 5 The loss of the discriminator network

Meanwhile, the performance of the VAECGAN model is no worse than that of the TCN model and DSTPRNN model in short-term prediction, and the predicted value of the VAECGAN model is closer to the real value than the other models at the corners. Therefore, it can be concluded that the performance of the VAECGAN model in prediction is better.
In order to verify the contribution of the dynamic weight clipping strategy to the stability of the discriminator, we compare the loss value of the discriminator with and without weight clipping on the SML dataset, as shown in Fig. 5. It is obvious that the yellow line is more stable and converges faster than the blue line, which indicates that the discriminator network in the model with the weight clipping strategy has better stability and a faster convergence effect.
In order to evaluate the sensitivity and effectiveness of the dynamic weight clipping strategy, we test the effect of different thresholds (from 0.1 to 0.3) on the prediction results. Figure 6 shows the effect of the weight clipping thresholds. As shown in the figure, the prediction results fluctuate with the change of the clipping threshold θ.

Fig. 6 MAE vs. clipping threshold over the NASDAQ (a) and SML (b) datasets

Therefore, defining an appropriate threshold can effectively improve the prediction effect. The prediction result of the model is best when the clipping threshold θ is equal to 0.2.

Conclusion and future work
In this study, a new framework, VAECGAN, for long-term prediction in multivariate time series has been proposed. The encoder module is used to deal with the multidimensional driving sequence data, and the results of the encoder network are input into the generator as the latent space. Compared with generating data from noise, more relevant information is retained in the latent space. Meanwhile, in order to improve the accuracy of prediction, the discriminator network is used to feed the result back to the generator network. We also verify the contribution of the dynamic threshold to data generation and determine the most suitable clipping threshold. Finally, we conduct the evaluation with five open real-world data sets. It is proved that the model achieves the best performance in long-term prediction on the evaluation metrics of MAE and RMSE in comparison with the five baselines.
In future work, we will continue to study the use of the GAN framework to generate long-term data, in order to solve the problem that the algorithm in this paper sometimes generates duplicate data. We will also adjust the data generation methods to improve the accuracy of short-term prediction.

Acknowledgements
We would like to thank the anonymous reviewers for detailed comments and useful feedback.

Authors' contributions
The first author constructed the scheme and wrote the manuscript. The second author reviewed the manuscript and checked the validity of the scheme. She also proofread the manuscript and corrected the grammar mistakes. The third author and the fourth author joined the discussion of the work. All authors read and approved the final manuscript.

Funding
This work is supported by the Youth Talent Star of the Institute of Information Engineering, Chinese Academy of Sciences (Y7Z0091105), and in part by the National Natural Science Foundation of China under Grant 61771469.

Availability of data and materials
Not applicable.

Declarations

Competing interests
The authors declare that they have no competing interests.

Author details
1 Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China. 2 Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China. 3 Network Information Department, China Mobile Communications Group Co., Ltd, Beijing, China.

Received: 26 January 2021  Accepted: 15 April 2021

References
Arjovsky M, Chintala S, Bottou L (2017) Wasserstein GAN. arXiv abs/1701.07875
Bai S, Kolter JZ, Koltun V (2018) An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. CoRR abs/1803.01271
Candanedo Ibarra L, Feldheim V, Deramaix D (2017) Data driven prediction models of energy use of appliances in a low-energy house. Energy Build 140. https://doi.org/10.1016/j.enbuild.2017.01.083
Cho K, van Merriënboer B, Gulcehre C, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. https://doi.org/10.3115/v1/D14-1179
EEG (2018). https://archive.ics.uci.edu/ml/datasets/EEG+Steady-State+Visual+Evoked+Potential+Signals. Accessed 2018
Esteban C, Hyland SL, Rätsch G (2017) Real-valued (medical) time series generation with recurrent conditional GANs
Hartmann K, Schirrmeister R, Ball T (2018) EEG-GAN: Generative adversarial networks for electroencephalographic (EEG) brain signals
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735-80. https://doi.org/10.1162/neco.1997.9.8.1735
Kingma D, Ba J (2014) Adam: A method for stochastic optimization. Int Conf Learn Represent 24:1-5
Laptev N, Yosinski J, Li EL, Smyl S (2017) Time-series extreme event forecasting with neural networks at Uber. In: International Conference on Machine Learning. pp 1-5
Larsen ABL, Sønderby SK, Winther O (2015) Autoencoding beyond pixels using a learned similarity metric. CoRR abs/1512.09300
Li D, Chen D, Jin B, Shi L, Goh J, Ng S-K (2019) MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. pp 703-716. https://doi.org/10.1007/978-3-030-30490-4_56
Lin T, Guo T, Aberer K (2017) Hybrid neural networks for learning the trend in time series. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17. pp 2273-2279. https://doi.org/10.24963/ijcai.2017/316
Liu D, Cheng J, Yuan Z, Wang C, Huang X, Yin H, Lin N, Niu H (2021) Prediction methods for energy internet security situation based on hybrid neural network. IOP Conf Ser Earth Environ Sci 645:012085. https://doi.org/10.1088/1755-1315/645/1/012085
Liu Y, Gong C, Yang L, Chen Y (2020) DSTP-RNN: A dual-stage two-phase attention-based recurrent neural network for long-term and multivariate time series prediction. Expert Syst Appl 143:113082. https://doi.org/10.1016/j.eswa.2019.113082
Maddix D, Wang Y, Smola A (2018) Deep factors with Gaussian processes for forecasting. CoRR abs/1812.00098. https://arxiv.org/abs/1812.00098
Makhzani A, Shlens J, Jaitly N, Goodfellow IJ (2015) Adversarial autoencoders. CoRR abs/1511.05644
Mao X, Li Q, Xie H, Lau RK, Wang Z, Smolley S (2017) Least squares generative adversarial networks. In: 2017 IEEE International Conference on Computer Vision (ICCV). IEEE Computer Society, Los Alamitos. pp 2813-2821. https://doi.ieeecomputersociety.org/10.1109/ICCV.2017.304
Mirza M, Osindero S (2014) Conditional generative adversarial nets. CoRR abs/1411.1784. https://arxiv.org/abs/1411.1784
Mogren O (2016) C-RNN-GAN: Continuous recurrent neural networks with adversarial training. CoRR abs/1611.09904. https://arxiv.org/abs/1611.09904
NASDAQ100 (2017). https://cseweb.ucsd.edu/~yaq007/NASDAQ100stockdata.html. Accessed 2017
Plutowski M, Cottrell GW, White H (1996) Experience with selecting exemplars from clean data. Neural Netw 9(2):273-294. https://doi.org/10.1016/0893-6080(95)00099-2
Qin Y, Song D, Chen H, Cheng W, Jiang G, Cottrell GW (2017) A dual-stage attention-based recurrent neural network for time series prediction. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17. pp 2627-2633. https://doi.org/10.24963/ijcai.2017/366
Qu G, Hariri S, Yousif M (2005) Multivariate statistical analysis for network attacks detection. p 9. https://doi.org/10.1109/AICCSA.2005.1387011
SML2010 (2014). http://archive.ics.uci.edu/ml/datasets/SML2010. Accessed 2014
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. MIT Press, Cambridge. pp 3104-3112
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17. Curran Associates Inc., Red Hook. pp 6000-6010
Xu Y, Cohen SB (2018) Stock movement prediction from tweets and historical prices. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Melbourne. pp 1970-1979. https://www.aclweb.org/anthology/P18-1183
Yin X, Han Y, Sun H, Xu Z, Yu H, Duan X (2020) A multivariate time series prediction schema based on multi-attention in recurrent neural network. In: IEEE Symposium on Computers and Communications, ISCC 2020, Rennes, France, July 7-10, 2020. pp 1-7. https://doi.org/10.1109/ISCC50000.2020.9219721
Yoon J, Jarrett D, van der Schaar M (2019) Time-series generative adversarial networks. In: Wallach HM, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E, Garnett R (eds) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada. pp 5509-5519. https://proceedings.neurips.cc/paper/2019/hash/c9efe5f26cd17ba6216bbe2a7d26d490-Abstract.html
Zhang K, Zhong G, Dong J, Wang S, Wang Y (2019) Stock market prediction based on generative adversarial network. Procedia Comput Sci 147:400-406. https://doi.org/10.1016/j.procs.2019.01.256
Zhu L, Laptev N (2017) Deep and confident prediction for time series at Uber. pp 103-110. https://doi.org/10.1109/ICDMW.2017.19

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
