TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks
Fig. 1. An illustration of time series anomaly detection using unsupervised learning. Given a multivariate time series, the goal is to find a set of anomalous time segments that have unusual values and do not follow the expected temporal patterns.
a model that either predicts or reconstructs a time series signal, and makes a comparison between the real and the predicted or reconstructed values. High prediction or reconstruction errors suggest the presence of anomalies.

Deep learning methods [10] are extremely capable of handling non-linearity in complex temporal correlations, and have excellent learning ability. For this reason, they have been used in a number of time series anomaly detection methods [6], [11], [12], including tools created by companies such as Microsoft [8]. Generative Adversarial Networks (GANs) [13] have also been shown to be very successful at generating time series sequences and outperforming state-of-the-art benchmarks [14]. Such a proliferation of methods invites the question: do these new, complex approaches actually perform better than a simple baseline statistical method? To evaluate the new methods, we used 11 datasets (real and synthetic) that collectively have 492 signals and thousands of known anomalies to set up a benchmarking system (see the details in Section VI and Table IV). We implemented 5 of the most recent deep learning techniques introduced between 2016 and 2019 and compared their performance with that of a baseline method from the 1970s, ARIMA. While some methods were able to beat ARIMA on 50% of the datasets, two methods failed to outperform it at all (cf. Table I).

One of the foundational challenges of deep learning-based approaches is that their remarkable ability to fit data carries the risk that they fit anomalous data as well. For example, autoencoders using an L2 objective function can fit and reconstruct data extremely accurately, thus fitting the anomalies as well. On the other hand, GANs may be ineffective at learning a Generator that fully captures the data's hidden distribution, thus causing false alarms. Here, we mix the two methods, creating a more nuanced approach. Additionally, works in this domain frequently emphasize improving the deep learning model itself. However, as we show in this paper, improving the post-processing steps can aid significantly in reducing the number of false positives.

In this work, we introduce a novel GAN architecture, TadGAN, for the time series domain. We use TadGAN to reconstruct time series and assess errors in a contextual manner to identify anomalies. We explore different ways to compute anomaly scores based on the outputs from Generators and Critics. We benchmark our method against several well-known classical and deep learning-based methods on eleven time series datasets. The detailed results can be found in Table IV.

The key contributions of this paper are as follows:
• We propose a novel unsupervised GAN-reconstruction-based anomaly detection method for time series data. In particular, we introduce a cycle-consistent GAN architecture for time-series-to-time-series mapping.
• We identify two time series similarity measures suitable for evaluating the contextual similarity between original and GAN-reconstructed sequences. Our novel approach leverages the GAN's Generator and Critic to compute robust anomaly scores at every time step.
• We conduct an extensive evaluation using 11 time series datasets from 3 reputable entities (NASA, Yahoo, and Numenta), demonstrating that our approach outperforms 8 other baselines. We further provide several insights into anomaly detection for time series data using GANs.
• We develop a benchmarking system for time series anomaly detection. The system is open-sourced and can be extended with additional approaches and datasets.¹ At the time of this writing, the benchmark includes 9 anomaly detection pipelines, 13 datasets, and 2 evaluation mechanisms.

¹ The software is available at GitHub (https://github.com/signals-dev/Orion).

The rest of this paper is structured as follows. We formally lay out the problem of time series anomaly detection in Section II. Section III presents an overview of the related literature. Section IV introduces the details of our GAN model. We describe how to use GANs for anomaly detection in Section V, and evaluate our proposed framework in Section VI. Finally, Section VII summarizes the paper and reports our key findings.

II. UNSUPERVISED TIME SERIES ANOMALY DETECTION

Given a time series X = (x_1, x_2, ..., x_T), where x_i ∈ R^{M×1} indicates M types of measurements at time step i, the goal of unsupervised time series anomaly detection is to find a set of anomalous time segments A_seq = {a_seq^1, a_seq^2, ..., a_seq^k}, where a_seq^i is a continuous sequence of data points in time that show anomalous or unusual behaviors (Figure 1) – values within the segment that appear not to comply with the expected temporal behavior of the signal. A few aspects of this problem make it both distinct from and more difficult than time series classification [15] or supervised time series anomaly detection [16], as well as more pertinent to many industrial applications. We highlight them here:
– No a priori knowledge of anomalies or possible anomalies: Unlike with supervised time series anomaly detection, we do not have any previously identified "known anomalies" with which to train and optimize the model. Rather, we train the model to learn the time series patterns, ask it to detect anomalies, and then check whether the detector identified anything relevant to end users.
– Non-availability of "normal baselines": For many real-world systems, such as wind turbines and aircraft engines, simulation engines can produce a signal that resembles normal conditions, which can be tweaked for different control regimes or to account for degradation or aging. Such simulation engines are often physics-based and provide "normal baselines," which can be used to train models such that any deviations from them are considered anomalous. Unsupervised time series anomaly detection strategies do not rely on the availability of such baselines, instead learning time series patterns from real-world signals – signals that may themselves include anomalies or problematic patterns.
– Not all detected anomalies are problematic: Detected "anomalies" may not actually indicate problems, and could instead result from external phenomena (such as sudden shifts in environmental conditions), auxiliary information (such as the fact that a test run is being performed), or other variables that the algorithm did not consider, such as regime or control setting changes. Ultimately, it is up to the end user, the domain expert, to assess whether the anomalies identified by the model are problematic. Figure 1 highlights how a trained unsupervised machine learning model can be used in real time on incoming data.
– No clear segmentation possible: Many signals, such as those associated with periodic time series, can be segmented – for example, an electrocardiogram (ECG) signal can be separated into similar segments that pertain to periods [16], [17]. The resulting segment clusters may reveal different collective patterns, along with anomalous patterns. We focus on signals that cannot be clearly segmented, making these approaches unfeasible. The length of a_seq^i is also variable and is not known a priori, which further increases the difficulty.
– How do we evaluate these competing approaches? For this, we rely on several datasets that contain "known anomalies," the details of which are introduced in Section VI-A. Presumably, the "anomalies" are time segments that have been manually identified as such by some combination of algorithmic approaches and human expert annotation. These "anomalies" are used to evaluate the efficacy of our proposed unsupervised models. More details about this can be found in Section VI-B3.

III. RELATED WORK

Over the past several years, the rich variety of anomaly types, data types and application scenarios has spurred a range of anomaly detection approaches [1], [18]–[20]. In this section, we discuss some of the unsupervised approaches. The simplest of these are out-of-limit methods, which flag regions where values exceed a certain threshold [21], [22]. While these methods are intuitive, they are inflexible and incapable of detecting contextual anomalies. To overcome this, more advanced techniques have been proposed, namely proximity-based, prediction-based, and reconstruction-based anomaly detection (Table II).

TABLE II: Unsupervised approaches to time series anomaly detection.
Methodology            | Papers
Proximity              | [23]–[25]
Prediction             | [2], [6], [26], [27]
Reconstruction         | [5], [28]–[30]
Reconstruction (GANs)  | [7], [14], [31]

A. Anomaly Detection for Time Series Data

Proximity-based methods first use a distance measure to quantify similarities between objects – single data points for point anomalies, or fixed-length sequences of data points for collective anomalies. Objects that are distant from others are considered anomalies. This detection type can be further divided into distance-based methods, such as K-Nearest Neighbors (KNN) [24] – which use a given radius to define the neighbors of an object, and the number of neighbors to determine an anomaly score – and density-based methods, such as Local Outlier Factor (LOF) [23] and Clustering-Based Local Outlier Factor [25], which further consider the density of an object and that of its neighbors. There are two major drawbacks to applying proximity-based methods to time series data: (1) a priori knowledge about anomaly duration and the number of anomalies is required; (2) these methods are unable to capture temporal correlations.

Prediction-based methods learn a predictive model to fit the given time series data, and then use that model to predict future values. A data point is identified as an anomaly if the difference between its predicted input and the original input exceeds a certain threshold. Statistical models, such as ARIMA [26], Holt-Winters [26], and FDA [27], can serve this purpose, but are sensitive to parameter selection, and often require strong assumptions and extensive domain knowledge about the data. Machine learning-based approaches attempt to overcome these limitations. [2] introduce Hierarchical Temporal Memory (HTM), an unsupervised online sequence memory algorithm, to detect anomalies in streaming data. HTM encodes the current input to a hidden state and predicts the next hidden state. A prediction error is measured by computing the difference between the current hidden state and the predicted hidden state. Hundman et al. [6] propose Long Short-Term Memory (LSTM) Recurrent Neural Networks to predict future time steps and flag large deviations from predictions.
Reconstruction-based methods learn a model to capture the latent structure (low-dimensional representations) of the given time series data and then create a synthetic reconstruction of the data. Reconstruction-based methods assume that anomalies lose information when they are mapped to a lower-dimensional space and thereby cannot be effectively reconstructed; thus, high reconstruction errors suggest a high chance of being anomalous.

Principal Component Analysis (PCA) [28], a dimensionality-reduction technique, can be used to reconstruct data, but this is limited to linear reconstruction and requires data to be highly correlated and to follow a Gaussian distribution [29]. More recently, deep learning-based techniques have been investigated, including those that use Auto-Encoders (AE) [30], Variational Auto-Encoders (VAE) [30] and LSTM Encoder-Decoders [5]. However, without proper regularization, these reconstruction-based methods can easily become overfitted, resulting in low performance. In this work, we propose the use of adversarial learning to allow for time series reconstruction. We introduce an intuitive approach for regularizing reconstruction errors. The trained Generators can be directly used to reconstruct more concise time series data – thereby providing more accurate reconstruction errors – while the Critics can offer scores as a powerful complement to the reconstruction errors when computing an anomaly score.

B. Anomaly Detection Using GANs

Generative adversarial networks can successfully perform many image-related tasks, including image generation [13], image translation [32], and video prediction [33], and researchers have recently demonstrated the effectiveness of GANs for anomaly detection in images [34], [35].

Adversarial learning for images. Schlegl et al. [36] use the Critic network in a GAN to detect anomalies in medical images. They also attempt to use the reconstruction loss as an additional anomaly detection method, and find the inverse mapping from the data space to the latent space. This mapping is done in a separate step, after the GAN is trained. However, Zenati et al. [37] indicate that this method has proven impractical for large data sets or real-time applications. They propose a bi-directional GAN for anomaly detection in tabular and image data sets, which allows for simultaneous training of the inverse mapping through an encoding network. The idea of training both encoder and decoder networks was developed by Donahue et al. [38] and Dumoulin et al. [39], who show how to achieve bidirectional GANs by trying to match joint distributions. In an optimal situation, the joint distributions are the same, and the Encoder and Decoder must be inverses of each other. A cycle-consistent GAN was introduced by Zhu et al. [32], who have two networks try to map into opposite dimensions, such that samples can be mapped from one space to the other and vice versa.

Adversarial learning for time series. Prior GAN-related work has rarely involved time series data, because the complex temporal correlations within this type of data pose significant challenges to generative modeling. Three works published in 2019 are of note. First, to use GANs for anomaly detection in time series, Li et al. [7] propose using a vanilla GAN model to capture the distribution of a multivariate time series, and using the Critic to detect anomalies. Another approach in this line is BeatGAN [31], an Encoder-Decoder GAN architecture that allows for the use of the reconstruction error for anomaly detection in heartbeat signals. More recently, Yoon et al. [14] propose a time series GAN which adopts the same idea but introduces temporal embeddings to assist network training. However, their work is designed for time series representation learning instead of anomaly detection. To the best of our knowledge, we are the first to introduce cycle-consistent GAN architectures for time series data, such that Generators can be directly used for time series reconstructions. In addition, we systematically investigate how to utilize Critic and Generator outputs for anomaly score computation. A complete framework of time series anomaly detection is introduced to work with GANs.

IV. ADVERSARIAL LEARNING FOR TIME SERIES RECONSTRUCTION

The core idea behind reconstruction-based anomaly detection methods is to learn a model that can encode a data point (in our case, a segment of a time series) and then decode the encoded one (i.e., reconstruct it). An effective model should not be able to reconstruct anomalies as well as "normal" instances, because anomalies will lose information during encoding. In our model, we learn two mapping functions between two domains X and Z, namely E : X → Z and G : Z → X (Fig. 2). X denotes the input data domain, describing the given training samples {x_i^{1...t}}_{i=1}^N, x_i^{1...t} ∈ X. Z represents the latent domain, where we sample random vectors z to represent white noise. We follow a standard multivariate normal distribution, i.e., z ∼ P_Z = N(0, 1). For notational convenience we use x_i to denote a time sequence of length t starting at time step i. With the mapping functions, we can reconstruct the input time series: x_i → E(x_i) → G(E(x_i)) ≈ x̂_i.

We propose leveraging adversarial learning approaches to obtain the two mapping functions E and G. As illustrated in Fig. 2, we view the two mapping functions as Generators. Note that E serves as an Encoder, which maps the time series sequences into the latent space, while G serves as a Decoder, which transforms the latent space into the reconstructed time series. We further introduce two adversarial Critics (aka discriminators), Cx and Cz. The goal of Cx is to distinguish between the real time series sequences from X and the generated time series sequences from G(z), while Cz measures the performance of the mapping into latent space. In other words, G tries to fool Cx by generating real-looking sequences. Thus, our high-level objective consists of two terms: (1) Wasserstein losses [40], to match the distribution of generated time series sequences to the data distribution in the target domain; and (2) cycle consistency losses [32], to prevent contradiction between E and G.
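As a concrete illustration of this setup, the sketch below lays out the four networks E, G, Cx and Cz as Keras models. Layer types and sizes follow the configuration reported later in Section VI-B2 (a 1-layer bidirectional LSTM with 100 units for E, a 2-layer bidirectional LSTM with 64 units each for G, and a 1-D convolutional layer in each Critic); the dense heads, dropout rate and activations are illustrative assumptions, not the authors' exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

t, k = 100, 20  # input window length (domain X) and latent dimension (domain Z)

def build_E():  # Encoder E : X -> Z
    return models.Sequential([
        layers.Bidirectional(layers.LSTM(100, return_sequences=True), input_shape=(t, 1)),
        layers.Flatten(),
        layers.Dense(k),          # assumed projection head to the latent sequence
        layers.Reshape((k, 1)),
    ], name="E")

def build_G():  # Decoder G : Z -> X
    return models.Sequential([
        layers.Bidirectional(layers.LSTM(64, return_sequences=True), input_shape=(k, 1)),
        layers.Dropout(0.2),      # "dropout is applied"; the rate is an assumption
        layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
        layers.Flatten(),
        layers.Dense(t),          # assumed upsampling head back to window length t
        layers.Reshape((t, 1)),
    ], name="G")

def build_Cx():  # Critic on the time series domain
    return models.Sequential([
        layers.Conv1D(64, 5, activation="relu", input_shape=(t, 1)),
        layers.Flatten(),
        layers.Dense(1),          # unbounded Wasserstein score, no sigmoid
    ], name="Cx")

def build_Cz():  # Critic on the latent domain
    return models.Sequential([
        layers.Conv1D(64, 5, activation="relu", input_shape=(k, 1)),
        layers.Flatten(),
        layers.Dense(1),
    ], name="Cz")

E, G, Cx, Cz = build_E(), build_G(), build_Cx(), build_Cz()
x_hat = G(E(tf.zeros((1, t, 1))))  # traces the reconstruction path x -> E(x) -> G(E(x))
```

The last line simply traces the reconstruction path used throughout the paper; the two loss terms that train these networks are introduced next.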
A. Wasserstein Loss

The original formulation of GAN, which applies the standard adversarial losses (Eq. 1), suffers from the mode collapse problem.

L = E_{x∼P_X}[log Cx(x)] + E_{z∼P_Z}[log(1 − Cx(G(z)))]    (1)

where Cx produces a probability score from 0 to 1 indicating the realness of the input time series sequence. To be specific, the Generator tends to learn only a small fraction of the variability of the data, such that it cannot perfectly converge to the target distribution. This is mainly because the Generator prefers to produce those samples that have already been found to be good at fooling the Critic, and is reluctant to produce new ones, even though new ones might be helpful to capture other "modes" in the data.

To overcome this limitation, we apply Wasserstein loss [40] as the adversarial loss to train the GAN. We make use of the Wasserstein-1 distance when training the Critic network. Formally, let P_X be the distribution over X. For the mapping function G : Z → X and its Critic Cx, we have the following objective:

min_G max_{Cx ∈ C_x} V_X(Cx, G)    (2)

where

V_X(Cx, G) = E_{x∼P_X}[Cx(x)] − E_{z∼P_Z}[Cx(G(z))]    (3)

and C_x denotes the set of 1-Lipschitz functions. In practice, this constraint is enforced with a gradient-penalty term gp(·, ·) when updating the Critic, which penalizes Critic gradients not equal to 1 (cf. line 5 of Algorithm 1).

Following a similar approach, we introduce a Wasserstein loss for the mapping function E : X → Z and its Critic Cz. The objective is expressed as:

min_E max_{Cz ∈ C_z} V_Z(Cz, E)    (4)

The purpose of the second Critic Cz is to distinguish between random latent samples z ∼ P_Z and encoded samples E(x) with x ∼ P_X. We present the model type and architecture for E, G, Cx, Cz in Section VI-B.

B. Cycle Consistency Loss

The purpose of our GAN is to reconstruct the input time series: x_i → E(x_i) → G(E(x_i)) ≈ x̂_i. However, training the GAN with adversarial losses (i.e., Wasserstein losses) alone cannot guarantee mapping an individual input x_i to a desired output z_i which will be further mapped back to x̂_i. To reduce the possible mapping function search space, we adapt the cycle consistency loss to time series reconstruction, which was first introduced by Zhu et al. [32] for image translation tasks. We train the generative networks E and G with the adapted cycle consistency loss by minimizing the L2 norm of the difference between the original and the reconstructed samples:

V_L2(E, G) = E_{x∼P_X}[ ||x − G(E(x))||_2 ]    (5)

Combining (2), (4) and (5), the full training objective becomes:

min_{E,G} max_{Cx ∈ C_x, Cz ∈ C_z} V_X(Cx, G) + V_Z(Cz, E) + V_L2(E, G)    (6)

Fig. 2. Model architecture: Generator E serves as an Encoder which maps the time series sequences into the latent space, while Generator G serves as a Decoder that transforms the latent space into the reconstructed time series. Critic Cx distinguishes between real time series sequences from X and generated time series sequences from G(z), whereas Critic Cz measures the goodness of the mapping into the latent space.

The full architecture of our model can be seen in Figure 2. The benefits of this architecture with respect to anomaly detection are twofold. First, we have a Critic Cx that is trained to distinguish between real and fake time series sequences, hence the score of the Critic can directly serve as an anomaly measure. Second, the two Generators trained with cycle consistency loss allow us to encode and decode a time series sequence. The difference between the original sequence and the decoded sequence can be used as a second anomaly detection measure. For detailed training steps, please refer to the pseudo-code in Algorithm 1 (cf. lines 1–14). The following section will introduce the details of using TadGAN for anomaly detection.
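To make these loss terms concrete, the following sketch implements them for one batch: the two Wasserstein Critic losses with a gradient penalty (lines 5 and 7 of Algorithm 1) and the generator loss with the L2 cycle-consistency term (line 12). It assumes the E, G, Cx, Cz models from the earlier sketch; the penalty weight and the interpolation scheme are standard WGAN-GP choices assumed here, not values stated in the paper.

```python
import tensorflow as tf

def gradient_penalty(critic, real, fake):
    # Push the Critic's gradient norm toward 1 on random interpolates of real
    # and generated samples (the gp(., .) term in Algorithm 1).
    eps = tf.random.uniform(tf.shape(real)[:1])[:, None, None]
    interp = eps * real + (1.0 - eps) * fake
    with tf.GradientTape() as tape:
        tape.watch(interp)
        scores = critic(interp)
    grads = tape.gradient(scores, interp)
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2]) + 1e-12)
    return tf.reduce_mean((norms - 1.0) ** 2)

def critic_x_loss(x, z, gp_weight=10.0):
    # Cx maximizes E[Cx(x)] - E[Cx(G(z))]; we minimize the negation plus the penalty (line 5).
    fake = G(z)
    return (tf.reduce_mean(Cx(fake)) - tf.reduce_mean(Cx(x))
            + gp_weight * gradient_penalty(Cx, x, fake))

def critic_z_loss(x, z, gp_weight=10.0):
    # Cz maximizes E[Cz(z)] - E[Cz(E(x))] (line 7).
    enc = E(x)
    return (tf.reduce_mean(Cz(enc)) - tf.reduce_mean(Cz(z))
            + gp_weight * gradient_penalty(Cz, z, enc))

def generator_loss(x, z):
    # E and G minimize the two Wasserstein terms plus the L2 cycle-consistency
    # term ||x - G(E(x))||_2 (Eqs. 2, 4, 5; line 12 of Algorithm 1).
    recon = G(E(x))
    cycle = tf.reduce_mean(tf.norm(tf.squeeze(x - recon, axis=-1), axis=-1))
    return -tf.reduce_mean(Cx(G(z))) - tf.reduce_mean(Cz(E(x))) + cycle
```

In the alternating scheme of Algorithm 1, the two Critic losses are minimized n_critic times per epoch before a single update of E and G with the generator loss.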
V. TIME-SERIES GAN FOR ANOMALY DETECTION (TADGAN)

Let us assume that the given time series is X = (x_1, x_2, ..., x_T), where x_i ∈ R^{M×1} indicates M types of measurements at time step i. For simplicity, we use M = 1 in the later description. Therefore, X is now a univariate time series and x_i is a scalar. The same steps can be applied for multivariate time series (i.e., when M > 1).

To obtain the training samples, we introduce a sliding window with window size t and step size s to divide the original time series into N sub-sequences X = {x_i^{1...t}}_{i=1}^N, where N = (T − t)/s. In practice, it is difficult to know the ground truth, and anomalous data points are rare. Hence, we assume all the training sample points are normal. In addition, we generate Z = {z_i^{1...k}}_{i=1}^N from a random space following a normal distribution, where k denotes the dimension of the latent space. Then, we feed X and Z to our GAN model and train it with the objective defined in (6). With the trained model, we are able to compute anomaly scores (or likelihoods) at every time step by leveraging the reconstruction error and Critic output (cf. lines 15–20).

Algorithm 1: TadGAN
Require: m, batch size; epoch, number of iterations over the data; n_critic, number of iterations of the critic per epoch; η, step size.
1  for each epoch do
2    for κ = 0, ..., n_critic do
3      Sample {x_i^{1...t}}_{i=1}^m from the real data.
4      Sample {z_i^{1...k}}_{i=1}^m from the random latent space.
5      g_wCx = ∇_wCx [ (1/m) Σ_{i=1}^m Cx(x_i) − (1/m) Σ_{i=1}^m Cx(G(z_i)) + gp(x_i, G(z_i)) ]
6      wCx = wCx + η · adam(wCx, g_wCx)
7      g_wCz = ∇_wCz [ (1/m) Σ_{i=1}^m Cz(z_i) − (1/m) Σ_{i=1}^m Cz(E(x_i)) + gp(z_i, E(x_i)) ]
8      wCz = wCz + η · adam(wCz, g_wCz)
9    end
10   Sample {x_i^{1...t}}_{i=1}^m from the real data.
11   Sample {z_i^{1...k}}_{i=1}^m from the random latent space.
12   g_wG,E = ∇_{wG,wE} [ (1/m) Σ Cx(x_i) − (1/m) Σ Cx(G(z_i)) + (1/m) Σ Cz(z_i) − (1/m) Σ Cz(E(x_i)) + (1/m) Σ ||x_i − G(E(x_i))||_2 ]
13   wG,E = wG,E + η · adam(wG,E, g_wG,E)
14 end
15 X = {x_i^{1...t}}_{i=1}^n
16 for i = 1, ..., n do
17   x̂_i = G(E(x_i))
18   RE(x_i) = f(x_i, x̂_i)
19   score = α·Z_RE(x_i) + (1 − α)·Z_Cx(x̂_i)
20 end

A. Estimating Anomaly Scores Using Reconstruction Errors

Given a sequence x_i^{1...t} of length t (denoted as x_i later), TadGAN generates a reconstructed sequence of the same length: x_i → E(x_i) → G(E(x_i)) ≈ x̂_i. Therefore, for each time point j, we have a collection of reconstructed values {x̂_i^q, i + q = j}. We take the median of the collection as the final reconstructed value x̂_j. Note that in preliminary experiments, we found that using the median achieved better performance than using the mean. Now, the reconstructed time series is (x̂_1, x̂_2, ..., x̂_T). Here we propose three different types of functions (cf. line 18) for computing the reconstruction error at each time step (assuming the interval between neighboring time steps is the same).

Point-wise difference. This is the most intuitive way to define the reconstruction error; it computes the difference between the true value and the reconstructed value at every time step:

s_t = x_t − x̂_t    (7)

Area difference. This is applied over windows of a certain length to measure the similarity between local regions. It is defined as the average difference between the areas beneath two curves of length l:

s_t = (1 / (2l)) · | ∫_{t−l}^{t+l} (x_t − x̂_t) dx |    (8)

Although this seems intuitive, it is not often used in this context – however, we will show in our experiments that this approach works well in many cases. Compared with the point-wise difference, the area difference is good at identifying regions where small differences exist over a long period of time. Since we are only given fixed samples of the functions, we use the trapezoidal rule to calculate the definite integral in the implementation.

Dynamic time warping (DTW). DTW aims to calculate the optimal match between two given time sequences [42] and is used to measure the similarity between local regions. We have two time series X = (x_{t−l}, x_{t−l+1}, ..., x_{t+l}) and X̂ = (x̂_{t−l}, x̂_{t−l+1}, ..., x̂_{t+l}), and let W ∈ R^{2l×2l} be a matrix such that the (i, j)-th element is a distance measure between x_i and x̂_j, denoted as w_k. We want to find the warp path W* = (w_1, w_2, ..., w_K) that defines the minimum distance between the two curves, subject to boundary conditions at the start and end, as well as constraints on continuity and monotonicity. The DTW distance between time series X and X̂ is defined as follows:

s_t = W* = DTW(X, X̂) = min_W [ (1/K) √( Σ_{k=1}^K w_k ) ]    (9)

Similar to the area difference, DTW is able to identify regions of small difference over a long period of time, but DTW can handle time-shift issues as well.
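The three error functions above, together with the median aggregation over overlapping windows, can be sketched in a few lines of NumPy. This is an illustrative implementation only: the `fastdtw` package is an assumed stand-in for a DTW routine, the half-window length l is an arbitrary default, and the per-step normalization of the DTW distance is a simplification of Eq. (9).

```python
import numpy as np
from fastdtw import fastdtw   # assumed stand-in for a DTW routine

def aggregate_median(windows, step=1):
    # Every time step j is covered by several reconstructed windows; take the
    # median of all reconstructed values that fall on j (Sec. V-A).
    n, t = windows.shape
    T = (n - 1) * step + t
    buckets = [[] for _ in range(T)]
    for i, w in enumerate(windows):
        for q, value in enumerate(w):
            buckets[i * step + q].append(value)
    return np.array([np.median(b) for b in buckets])

def pointwise_error(x, x_hat):
    return np.abs(x - x_hat)                                  # Eq. (7)

def area_error(x, x_hat, l=10):
    s = np.zeros(len(x))
    for j in range(len(x)):
        lo, hi = max(0, j - l), min(len(x), j + l)
        s[j] = abs(np.trapz((x - x_hat)[lo:hi])) / (2 * l)     # Eq. (8), trapezoidal rule
    return s

def dtw_error(x, x_hat, l=10):
    s = np.zeros(len(x))
    for j in range(len(x)):
        lo, hi = max(0, j - l), min(len(x), j + l)
        dist, _ = fastdtw(x[lo:hi], x_hat[lo:hi], dist=lambda a, b: abs(a - b))
        s[j] = dist / (hi - lo)                                # Eq. (9), up to normalization details
    return s
```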
NASA Yahoo S5 NAB
Property SMAP MSL A1 A2 A3 A4 Art AdEx AWS Traf Tweets
# SIGNALS 53 27 67 100 100 100 6 5 17 7 10
# ANOMALIES 67 36 178 200 939 835 6 11 30 14 33
point (len = 1) 0 0 68 33 935 833 0 0 0 0 0
collective (len > 1) 67 36 110 167 4 2 6 11 30 14 33
# ANOMALY POINTS 54696 7766 1669 466 943 837 2418 795 6312 1560 15651
# out-of-dist 18126 642 861 153 21 49 123 15 210 86 520
(% tot.) 33.1% 8.3% 51.6% 32.8% 2.2% 5.9% 5.1% 1.9% 3.3% 5.5% 3.3%
# DATA POINTS 562800 132046 94866 142100 168000 168000 24192 7965 67644 15662 158511
IS SYNTHETIC ? X X X X
TABLE III: Dataset summary. Overall, the benchmark dataset contains a total of 492 signals and 2,349 anomalies.
B. Estimating Anomaly Scores with Critic Outputs

During the training process, the Critic Cx has to distinguish between real input sequences and synthetic ones. Because we use the Wasserstein-1 distance when training Cx, the outputs can be seen as an indicator of how real (larger value) or fake (smaller value) a sequence is. Therefore, once the Critic is trained, it can directly serve as an anomaly measure for time series sequences.

Similar to the reconstruction errors, at time step j we have a collection of Critic scores (c_i^q, i + q = j). We apply kernel density estimation (KDE) on the collection and then take the maximum value as the smoothed value c_j. Now the Critic score sequence is (c_1, c_2, ..., c_T). We show in our experiments that the Critic does indeed assign different scores to anomalous regions than to normal regions. This allows for the use of thresholding techniques to identify anomalous regions.

C. Combining Both Scores

The reconstruction errors RE(x) and Critic outputs Cx(x) cannot be directly used together as anomaly scores. Intuitively, a larger RE(x) and a smaller Cx(x) indicate higher anomaly scores. Therefore, we first compute the mean and standard deviation of RE(x) and Cx(x), and then calculate their respective z-scores Z_RE(x) and Z_Cx(x) to normalize both. Larger z-scores indicate higher anomaly scores.

We have explored different ways to leverage Z_RE(x) and Z_Cx(x). As shown in Table V (rows 1–4), we first tested the three types of Z_RE(x) and Z_Cx(x) individually. We then explored two different ways to combine them (row 5 to the last row). First, we attempt to merge them into a single value a(x) with a convex combination (cf. line 19) [7], [36]:

a(x) = α·Z_RE(x) + (1 − α)·Z_Cx(x)    (10)

where α controls the relative importance of the two terms (by default α = 0.5). Second, we try to multiply both scores to emphasize high values:

a(x) = α·Z_RE(x)·Z_Cx(x)    (11)

where α = 1 by default. Both methods result in robust anomaly scores. The results are reported in Section VI-C.

D. Identifying Anomalous Sequences

Finding anomalous sequences with locally adaptive thresholding: Once we obtain anomaly scores at every time step, thresholding techniques can be applied to identify anomalous sequences. We use sliding windows to compute thresholds, and empirically set the window size as T/3 and the step size as T/(3·10). This is helpful for identifying contextual anomalies whose contextual information is usually unknown. The sliding window size determines the number of historical anomaly scores used to evaluate the current threshold. For each sliding window, we use a simple static threshold defined as 4 standard deviations from the mean of the window. We can then identify those points whose anomaly score is larger than the threshold as anomalous. Thus, continuous time points compose anomalous sequences (or windows): {a_seq^i, i = 1, 2, ..., K}, where a_seq^i = (a_start(i), ..., a_end(i)).

Mitigating false positives: The use of sliding windows can increase recall of anomalies but may also produce many false positives. We employ an anomaly pruning approach inspired by Hundman et al. [6] to mitigate false positives. First, for each anomalous sequence, we use the maximum anomaly score to represent it, obtaining a set of maximum values {a_max^i, i = 1, 2, ..., K}. Once these values are sorted in descending order, we can compute the decrease percent p^i = (a_max^{i−1} − a_max^i) / a_max^{i−1}. When the first p^i does not exceed a certain threshold θ (by default θ = 0.1), we reclassify all subsequent sequences (i.e., {a_seq^j, i ≤ j ≤ K}) as normal.
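The post-processing just described (Sections V-C and V-D) amounts to a few array operations. The sketch below shows one way to implement it: z-score normalization and combination of the two scores, locally adaptive thresholding over sliding windows, grouping of flagged points into sequences, and pruning. The defaults (window T/3, step T/30, 4 standard deviations, θ = 0.1) follow the text; the sign convention for the Critic z-score and the helper structure are illustrative assumptions.

```python
import numpy as np

def zscore(s):
    return (s - s.mean()) / s.std()

def combine_scores(re_scores, critic_scores, alpha=1.0, multiply=True):
    # Larger reconstruction error and smaller Critic output mean "more anomalous";
    # negating the Critic z-score (an illustrative choice) makes both point the same way.
    z_re, z_c = zscore(re_scores), zscore(-critic_scores)
    return alpha * z_re * z_c if multiply else alpha * z_re + (1 - alpha) * z_c

def sliding_threshold(scores, k=4.0):
    # Locally adaptive thresholding: window size T/3, step size T/30,
    # threshold = mean + 4 standard deviations within each window.
    T = len(scores)
    window, step = T // 3, max(T // 30, 1)
    flagged = np.zeros(T, dtype=bool)
    for start in range(0, T - window + 1, step):
        w = scores[start:start + window]
        flagged[start:start + window] |= w > w.mean() + k * w.std()
    return flagged

def to_sequences(flagged):
    # Group consecutive flagged time steps into (start, end) windows.
    idx = np.flatnonzero(flagged)
    if idx.size == 0:
        return []
    breaks = np.where(np.diff(idx) > 1)[0]
    starts = np.r_[idx[:1], idx[breaks + 1]]
    ends = np.r_[idx[breaks], idx[-1:]]
    return list(zip(starts, ends))

def prune(sequences, scores, theta=0.1):
    # Anomaly pruning inspired by Hundman et al. [6]: represent each sequence by its
    # maximum score, sort descending, and reclassify everything after the first
    # decrease percent that does not exceed theta as normal.
    if not sequences:
        return []
    max_scores = np.array([scores[s:e + 1].max() for s, e in sequences])
    order = np.argsort(max_scores)[::-1]
    keep = [sequences[order[0]]]
    for prev, cur in zip(order[:-1], order[1:]):
        decrease = (max_scores[prev] - max_scores[cur]) / max_scores[prev]
        if decrease <= theta:
            break
        keep.append(sequences[cur])
    return keep
```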
VI. EXPERIMENTAL RESULTS

A. Datasets

To measure the performance of TadGAN, we evaluate it on multiple time series datasets. In total, we have collected 11 datasets (a total of 492 signals) across a variety of application domains. We use spacecraft telemetry signals provided by NASA², consisting of two datasets: Mars Science Laboratory (MSL) and Soil Moisture Active Passive (SMAP). In addition, we use Yahoo S5, which contains four different sub-datasets.³ The A1 dataset is based on real production traffic to Yahoo computing systems, while A2, A3 and A4 are all synthetic datasets. Lastly, we use the Numenta Anomaly Benchmark (NAB). NAB [43] includes multiple types of time series data from various application domains.⁴ We have picked five datasets: Art, AdEx, AWS, Traf, and Tweets.

² Spacecraft telemetry data: https://s3-us-west-2.amazonaws.com/telemanom/data.zip
³ Yahoo S5 data can be requested here: https://webscope.sandbox.yahoo.com/catalog.php?datatype=s&did=70
⁴ NAB data: https://github.com/numenta/NAB/tree/master/data

Datasets from different sources contain different numbers of signals and anomalies, and the locations of anomalies are known for each signal. Basic information for each dataset is summarized in Table III. For each dataset, we present the total number of signals and the number of anomalies pertaining to them. We also observe whether the anomalies in the dataset are single "point" anomalies, or one or more collections. In order to suss out the ease of anomaly identification, we measure how out-of-the-ordinary each anomaly point is by categorizing it as "out-of-dist" if it falls 4 standard deviations away from the mean of all the data for a signal. As each dataset has some quality that makes detecting its anomalies more challenging, this diverse selection will help us identify the effectiveness and limitations of each baseline.

B. Experimental Setup

1) Data preparation: For each dataset, we first normalize the data to [−1, 1]. Then we find a proper interval over which to aggregate the data, such that we have several thousand equally spaced points in time for each signal. We then set a window size t = 100 and step size s = 1 to obtain training samples for TadGAN. Because many signals in the Yahoo datasets contain linear trends, we apply a simple detrending function (which subtracts the result of a linear least-squares fit to the signal) before training and testing.

2) Architecture: In our experiments, inputs to TadGAN are time series sequences of length 100 (domain X), and the latent space (domain Z) is 20-dimensional. We use a 1-layer bidirectional Long Short-Term Memory (LSTM) with 100 hidden units as Generator E, and a 2-layer bidirectional LSTM with 64 hidden units each as Generator G, where dropout is applied. We add a 1-D convolutional layer to both Critics, with the intention of capturing local temporal features that can determine how anomalous a sequence is. The model is trained on a specific signal from one dataset for 2000 iterations, with a batch size of 64.

3) Evaluation metrics: We measure the performance of the different methods using the commonly used metrics Precision, Recall and F1-score. In many real-world application scenarios, anomalies are rare and usually window-based (i.e., a continuous sequence of points; see Sec. V-D). From the perspective of end users, the best outcome is to receive timely true alarms without too many false positives (FPs), as these may waste time and resources. To penalize high FPs and reward timely true alarms, we use the following window-based rules: (1) If a known anomalous window overlaps any predicted windows, a TP is recorded. (2) If a known anomalous window does not overlap any predicted windows, an FN is recorded. (3) If a predicted window does not overlap any labeled anomalous region, an FP is recorded. This method is also used in Hundman et al.'s work [6].

4) Baselines: The baseline methods can be divided into three categories: prediction-based methods, reconstruction-based methods, and online commercial tools.

ARIMA (Prediction-based). An autoregressive integrated moving average (ARIMA) model is a popular statistical analysis model that learns autocorrelations in the time series for future value prediction. We use point-wise prediction errors as the anomaly scores to detect anomalies.

HTM (Prediction-based). Hierarchical Temporal Memory (HTM) [2] has shown better performance than many statistical analysis models in the Numenta Anomaly Benchmark. It encodes the current input to a hidden state and predicts the next hidden state. Prediction errors are computed as the differences between the predicted state and the true state, which are then used as the anomaly scores for anomaly detection.

LSTM (Prediction-based). The neural network used in our experiments consists of two LSTM layers with 80 units each, and a subsequent dense layer with one unit which predicts the value at the next time step (similar to the one used by Hundman et al. [6]). Point-wise prediction errors are used for anomaly detection.

AutoEncoder (Reconstruction-based). Our approach can be viewed as a special instance of "adversarial autoencoders" [44], E ◦ G : X → X. Thus, we compare our method with standard autoencoders with dense layers or LSTM layers [5]. The dense autoencoder consists of three dense layers with 60, 20 and 60 units, respectively. The LSTM autoencoder contains two LSTM layers, each with 60 units. Again, a point-wise reconstruction error is used to detect anomalies.

MAD-GAN (Reconstruction-based). This method [7] uses a vanilla GAN along with an optimal-instance searching strategy in latent space to support multivariate time series reconstruction. We use MAD-GAN to compute the anomaly scores at every time step and then apply the same anomaly detection method introduced in Sec. V-D to find anomalies.

Microsoft Azure Anomaly Detector (Commercial tool). Microsoft uses Spectral Residual Convolutional Neural Networks (SR-CNN), in which the models are applied serially [8]. The SR model is responsible for saliency detection, and the CNN is responsible for learning a discriminating threshold. The output of the model is a sequence of binary labels attributed to each timestamp.

Amazon DeepAR (Commercial tool). DeepAR is a probabilistic forecasting model with autoregressive recurrent networks [9]. We use this model in a similar manner to LSTM, in that it is a prediction-based approach. Anomaly scores are presented as the regression errors, which are computed as the distance between the median of the predicted value and the true value.
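Before turning to the results, the window-based rules of Section VI-B3 translate directly into a short scoring routine. The sketch below is an illustrative implementation of the three overlap rules and the resulting F1 computation, not the benchmark's exact code.

```python
def overlaps(a, b):
    # Two (start, end) windows overlap if neither ends before the other begins.
    return a[0] <= b[1] and b[0] <= a[1]

def window_based_f1(true_windows, pred_windows):
    tp = sum(any(overlaps(t, p) for p in pred_windows) for t in true_windows)  # rule (1)
    fn = len(true_windows) - tp                                                # rule (2)
    fp = sum(not any(overlaps(p, t) for t in true_windows) for p in pred_windows)  # rule (3)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# e.g. window_based_f1([(100, 130), (400, 420)], [(95, 110), (300, 310)]) -> 0.5
```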
NASA Yahoo S5 NAB
Baseline MSL SMAP A1 A2 A3 A4 Art AdEx AWS Traf Tweets Mean±SD
TadGAN 0.623 0.704 0.8 0.867 0.685 0.6 0.8 0.8 0.644 0.486 0.609 0.700±0.123
(P) LSTM 0.46 0.69 0.744 0.98 0.772 0.645 0.375 0.538 0.474 0.634 0.543 0.623±0.163
(P) Arima 0.492 0.42 0.726 0.836 0.815 0.703 0.353 0.583 0.518 0.571 0.567 0.599±0.148
(C) DeepAR 0.583 0.453 0.532 0.929 0.467 0.454 0.545 0.615 0.39 0.6 0.542 0.555±0.130
(R) LSTM AE 0.507 0.672 0.608 0.871 0.248 0.163 0.545 0.571 0.764 0.552 0.542 0.549±0.193
(P) HTM 0.412 0.557 0.588 0.662 0.325 0.287 0.455 0.519 0.571 0.474 0.526 0.489±0.108
(R) Dense AE 0.507 0.7 0.472 0.294 0.074 0.09 0.444 0.267 0.64 0.333 0.057 0.353±0.212
(R) MAD-GAN 0.111 0.128 0.37 0.439 0.589 0.464 0.324 0.297 0.273 0.412 0.444 0.35±0.137
(C) MS Azure 0.218 0.118 0.352 0.612 0.257 0.204 0.125 0.066 0.173 0.166 0.118 0.219±0.145
TABLE IV: F1-scores of baseline models using window-based rules. Color encodes the performance of the F1 score: the range from 0 to 1 is evenly divided into 10 bins, each associated with one color; from dark red to dark blue, the F1 score increases from 0 to 1.
C. Benchmarking Results

TadGAN outperformed all the baseline methods by having the highest averaged F1 score (0.7) across all the datasets. Table IV ranks all the methods based on their averaged F1 scores (the last column) across the eleven datasets. The second-best (LSTM, 0.623) and third-best (ARIMA, 0.599) methods are both prediction-based, and TadGAN outperformed them by 12.46% and 16.86%, respectively, in averaged F1 score.

Baseline models in comparison to ARIMA. Figure 3 depicts the performance of all baseline models with respect to ARIMA. It shows how much improvement in F1-score is gained by each model. The F1-score presented is the average across the eleven datasets. TadGAN achieves the best overall improvement, with over 15% improvement in score, followed by LSTM with a little over 4%. It is worth noting that all the remaining models struggle to beat ARIMA.

Fig. 3. Comparing average F1-scores of baseline models across all datasets to ARIMA. The x-axis represents the percentage of improvement over the ARIMA score by each of the baseline models. (Values shown in the chart: TadGAN +15.3%, LSTM +4.1%, DeepAR −7.2%, LSTM AE −8.2%, HTM −18.3%, Dense AE −41.1%, MAD-GAN −41.5%, MS Azure −63.4%.)

Synthetic data vs. real-world datasets. Although TadGAN outperforms all baselines on average, we note that it ranks below ARIMA when detecting anomalies within synthetic datasets with point anomalies. Specifically, TadGAN achieved an average of 0.717 while ARIMA scored an average of 0.784. However, TadGAN still produces competitive results in both scenarios.

How well do AutoEncoders perform? To view the superiority of GANs, we compare our method to other reconstruction-based methods such as LSTM AE and Dense AE. One striking result is that the autoencoder alone does not perform well on point anomalies. We observe this as LSTM AE and Dense AE obtained average F1 scores on A3 and A4 of 0.205 and 0.082, respectively, while TadGAN and MAD-GAN achieved higher scores of 0.643 and 0.527, respectively. One potential reason could be that AutoEncoders optimize an L2 objective and strictly attempt to fit the data, with the result that anomalies get fitted as well. However, adversarial learning does not have this type of issue.

TadGAN vs. MAD-GAN. Overall, TadGAN (0.7) outperformed MAD-GAN (0.219) significantly. This fully demonstrates the value of the forward cycle-consistency loss (Eq. 5), which prevents the contradiction between the two Generators E and G and paves the most direct way to the optimal z_i that corresponds to the testing sample x_i. MAD-GAN uses only a vanilla GAN and does not include any regularization mechanisms to guarantee the mapping route x_i → z_i → x̂_i. Their approach to finding the optimal z_i is to first sample a random z from the latent space and then optimize it with the gradient descent algorithm by optimizing the anomaly detection loss.

D. Ablation Study

We evaluated multiple variations of TadGAN, using different anomaly score computation methods for each (Sec. V-C). The results are summarized in Table V. Here we report some noteworthy insights.

Using the Critic alone is unstable, because it has the lowest average F1 score (0.29) and the highest standard deviation (0.237). While only using the Critic can achieve a good performance on some datasets, such as SMAP and Art, its performance may also be unexpectedly bad, such as on A2, A3, A4, AdEx, and Traf. No clear shared characteristics are identified among these five datasets (see Table III). For example, some datasets contain only collective anomalies (Traf, AdEx), while other datasets, like A3 and A4, have point anomalies as the majority type.
NASA Yahoo S5 NAB
Variation MSL SMAP A1 A2 A3 A4 Art AdEx AWS Traf Tweets Mean±SD
Critic 0.393 0.672 0.285 0.118 0.008 0.024 0.625 0 0.35 0.167 0.548 0.290±0.237
Point 0.585 0.588 0.674 0.758 0.628 0.6 0.588 0.611 0.551 0.383 0.571 0.594±0.086
Area 0.525 0.655 0.681 0.82 0.567 0.523 0.625 0.645 0.59 0.435 0.559 0.602±0.096
DTW 0.514 0.581 0.697 0.794 0.613 0.547 0.714 0.69 0.633 0.455 0.559 0.618±0.095
Critic×Point 0.619 0.675 0.703 0.75 0.685 0.536 0.588 0.579 0.576 0.4 0.59 0.609±0.091
Critic+Point 0.529 0.653 0.8 0.78 0.571 0.44 0.625 0.595 0.644 0.439 0.592 0.606±0.111
Critic×Area 0.578 0.704 0.719 0.867 0.587 0.46 0.8 0.6 0.6 0.4 0.571 0.625±0.131
Critic+Area 0.493 0.692 0.789 0.847 0.483 0.367 0.75 0.75 0.607 0.474 0.6 0.623±0.148
Critic×DTW 0.623 0.68 0.667 0.82 0.631 0.497 0.667 0.667 0.61 0.455 0.605 0.629±0.091
Critic+DTW 0.462 0.658 0.735 0.857 0.523 0.388 0.667 0.8 0.632 0.486 0.609 0.620±0.139
Mean 0.532 0.655 0.675 0.741 0.529 0.438 0.664 0.593 0.579 0.409 0.580
SD 0.068 0.039 0.137 0.211 0.182 0.154 0.067 0.209 0.081 0.087 0.02
TABLE V: F1-scores of all the variations of our model.
One explanation could be that the Critic's behavior is unpredictable when confronted with anomalies (x ≁ P_X), because it is only taught to distinguish real time segments (x ∼ P_X) from generated ones.

DTW slightly outperforms the other two reconstruction error types. Among all variations, Critic×DTW has the best score (0.629). Further, its standard deviation is smaller than most of the other variations except for Point, indicating that this combination is more stable than others. Therefore, this combination should be the safe choice when encountering new datasets without labels.

Combining Critic outputs and reconstruction errors does improve performance in most cases. In all datasets except A4, combinations achieve the best performance. Let us take the MSL dataset as an example. We observe that when using DTW alone, the F1 score is 0.514. Combining this with the Critic score, we obtain a score of 0.623, despite the fact that the F1 score when using the Critic alone is 0.393. In addition, we find that after combining the Critic scores, the averaged F1 score improves for each of the individual reconstruction error computation methods. However, one interesting pattern is that for dataset A4, which consists mostly of point anomalies, using only point-wise errors achieves the best performance.

Multiplication is a better option than convex combination. Multiplication consistently leads to a higher averaged F1 score than convex combination does when using the same reconstruction error type (e.g., Critic×Point vs. Critic+Point). Multiplication also has consistently smaller standard deviations. Thus, multiplication is the recommended way to combine reconstruction scores and Critic scores. This can be explained by the fact that multiplication can better amplify high anomaly scores.

E. Limitations and Discussion

such as Time-Series GAN [14]. Due to our modular design, any reconstruction-based algorithm for time series can employ our anomaly scoring method for time series anomaly detection. In the future, we plan to investigate various strategies for time series reconstruction and compare their performance to the current state of the art. Moreover, it is worth understanding how better signal reconstruction affects the performance of anomaly detection. In fact, it is expected that better reconstruction might overfit to anomalies. Therefore, further experiments are required to understand the relationship between reconstruction and detecting anomalies.

VII. CONCLUSION

In this paper, we presented a novel framework, TadGAN, that allows for time series reconstruction and effective anomaly detection, showing how GANs can be effectively used for anomaly detection in time series data. We explored point-wise and window-based methods to compute reconstruction errors. We further proposed two different ways to combine reconstruction errors and Critic outputs to obtain anomaly scores at every time step. We have also tested several anomaly-scoring techniques and reported the best-suited one in this work. Our experimental results showed that (1) TadGAN outperformed all the baseline methods by having the highest averaged F1 score across all the datasets, and showed superior performance over baseline methods in 6 out of 11 datasets; (2) window-based reconstruction errors outperformed the point-wise method; and (3) the combination of both reconstruction errors and Critic outputs offers more robust anomaly scores, which help to reduce the number of false positives as well as increase the number of true positives. Finally, our code is open source and is available as a tool for benchmarking time series datasets for anomaly detection.