453677-anomaly-detection-using-prediction-error-cc5b2ed6

This paper presents a novel method for video anomaly detection using a spatio-temporal convolutional LSTM architecture that focuses on prediction error rather than reconstruction error. Experiments on five benchmark datasets demonstrate that this approach achieves comparable performance to state-of-the-art methods while being more efficient in terms of model parameters. The results indicate that using prediction significantly enhances performance in detecting anomalies in video sequences.

UD - JOURNAL OF SCIENCE AND TECHNOLOGY: ISSUE ON INFORMATION AND COMMUNICATIONS TECHNOLOGY, VOL. 20, NO. 6.2, 2022

Anomaly Detection Using Prediction Error with Spatio-Temporal Convolutional LSTM
Hanh T. M. Tran*, David Hogg

Abstract—In this paper, we propose a novel method for video anomaly detection motivated by an existing architecture
for sequence-to-sequence prediction and reconstruction using a spatio-temporal convolutional Long Short-Term Memory
(convLSTM). As in previous work on anomaly detection, anomalies arise as spatially localised failures in reconstruction or
prediction. In experiments with five benchmark datasets, we show that using prediction gives superior performance to using
reconstruction. We also compare performance with different length input/output sequences. Overall, our results using prediction
are comparable with the state of the art on the benchmark datasets.

Index Terms—Convolutional LSTM; convolutional autoencoder; prediction error; reconstruction error; anomaly detection.

Hanh T. M. Tran is with the University of Danang - University of Science and Technology, Vietnam (e-mail: [email protected]).
David Hogg is with the School of Computing, University of Leeds, United Kingdom.
*Corresponding author: Hanh T. M. Tran (e-mail: [email protected])
Manuscript received February 04, 2022; revised April 24, 2022; accepted May 17, 2022.
Digital Object Identifier 10.31130/ud-jst.2022.289E
ISSN 1859-1531

1. Introduction

Automatically detecting abnormal events in video has been widely studied in recent years due to its broad range of applications, including wide-area surveillance and health monitoring. This problem is different from event detection where the event is clearly defined, since an anomaly is by definition unknown in advance and may arise from unfamiliar activities or activities in unfamiliar contexts.

The standard approach to anomaly detection has been to learn spatio-temporal models of normal activity using hand-crafted features [1]–[6] or deep feature representations [7], [8]. An abnormality is detected when spatio-temporal patterns are observed that do not conform to the model of normality. Many different low-level features using dense optical flow (e.g., histograms [5], MHOF [3]) and other patterns of spatio-temporal gradient [4] have been used in the past. A model of normality is learned using these features extracted from training data and then used to determine numerical abnormality scores in test data. The model may be of several different kinds, including probabilistic models (e.g., mixture of probabilistic PCA [1], mixture of dynamic texture [2]), domain-based models (e.g., one-class SVM [6]), sparse coding [3] and Sparse Combination Learning (SCL) [4]. All of these methods have been used for anomaly detection and localization within the image frame.

Recently, deep learning architectures have been successfully applied in many computer vision tasks including video anomaly detection. A key advantage of deep learning methods is that they can learn feature representations directly from training data without prior definition. For example, this can be done in an unsupervised manner using auto-encoders [7]–[9]. A stacked de-noising autoencoder can be used to learn appearance and motion features for anomaly detection [8]. A winner-take-all sparsity constraint combined within the autoencoder has been shown to produce flow-features that are more discriminative for a one-class SVM [7] that is trained separately on the compressed representations learnt by the autoencoder.

End-to-end deep learning approaches have also been proposed for anomaly detection [10]–[12], [14], [15]. The convolutional autoencoder (convolutional AE) can be used to learn a model of normality from video; reconstruction error [10]–[12] or prediction error [14], [15] then provides a local measure for anomaly detection. A Generative Adversarial Network (GAN) can be employed to generate a normal distribution over some datasets [11], [14] by jointly optimising with a discriminator that competes to distinguish what is a real normal sample from what is a generated one. The motion dynamic can be learnt using a multi-channel approach, fusing appearance and motion information, and a cross-channel task, forcing the generator to transform raw-pixel data into motion information and vice versa [11], or using FlowNet combined with U-net for a single frame prediction [14]. The combination of a convolutional autoencoder and U-net has also been used to build a two-stream network with a shared encoder in which one decoder is for a single frame reconstruction and one is for translating an image to optical flow [15]. Another spatio-temporal autoencoder has been proposed for video anomaly detection [16]. The results show that applying 3D convolution in the encoder and 3D deconvolution in the decoder helps to enhance the capability of extracting motion patterns over the temporal dimension.

A memory module has been introduced into the AE to address these problems [17], [18]. The encoder inputs a normal video frame and extracts feature maps. The encoding features are then used to retrieve prototypical normal patterns in the memory items and to update the memory. Then the feature maps and aggregated memory items are fed into the decoder for reconstructing the input video frame or predicting the next frame. Using cosine similarity and the softmax function for matching probability between incoming encoding features and memory items, the global memory can be read and written to. Since normal patterns in training and testing sets may be different, the memory items are updated during training and testing time, with the use of a predefined threshold to prevent updating on anomaly patterns [18]. However, it is impossible to find an optimal threshold to distinguish between normal and abnormal patterns under various scenarios. Meta-learning methodology is introduced into a Dynamic Prototype Unit (DPU) to learn prototypes for encoding normal dynamics and to enable fast adaptation to a new scene with only a few training frames [19]. As in previous work [18], the DPU inputs the encoding feature maps, which are outputs of the encoder part of U-net, to generate a pool of dynamic prototypes. However, it is trained in a fully differentiable attention manner in which attention mapping functions are implemented as fully connected layers and updated by gradient descent. After training the AE backbone using only frame prediction loss, the DPU module is trained with the meta-training phase using frame pairs sampled from videos of diverse scenes. In the testing phase, in order to adapt the model to a new scene, the first few frames of the sequence in this scene are used to construct K-shot input-output frame pairs. The results show that the DPU is more memory-efficient than the memory module in previous work [17], [18].

Another approach to learning regular spatio-temporal patterns is to use a convolutional LSTM [13], [20]. The motivation is that reconstruction over a longer duration using the memory of the LSTM should capture more complex flow patterns. A convolutional network is used to encode each frame, and the encoded tensors are then fed to convolutional LSTMs to memorize the change of appearance, which corresponds to motion information [20]. Two Deconvolutional Networks (DeconvNet) are used, one for reconstructing past frames and identifying whether an anomaly occurs, and one for reconstructing the current frame. Thus the reconstruction error is an indicator of the change in appearance or motion. The temporal unit in [13], [20] is applied on the final spatial stage, which encodes high level representations. Interleaving RNNs between spatial convolution layers has recently been shown to improve performance on precipitation now-casting [21]. The model can learn temporal information on hierarchical spatial representations from low-level to high-level. In our work, we adopt the same architecture, except that we retain convolutional LSTMs instead of the more complex trajGRU RNN [21]. Our results show a comparable level of performance to the state of the art on benchmark datasets with fewer model parameters than state of the art models. Moreover, using prediction gives better performance than reconstruction. Finally, performance varies as expected with different prediction windows.

2. Architecture

Fig. 1: The encoding-decoding structure used for future prediction or reconstruction with video volumes of $\tau$ frames.

Figure 1 illustrates the encoding-decoding structure for future prediction or reconstruction, motivated by earlier work [21] and adapted for anomaly detection. At each time step, the network takes a video volume of $\tau$ video frames $F_{t-\tau+1}, \ldots, F_t$, and generates an output volume of the same size, predicting the future $F_{t+1}, \ldots, F_{t+\tau}$ or reconstructing the input in reverse order $F_t, \ldots, F_{t-\tau+1}$.
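
As a rough illustration of the two target choices described above, the following Python sketch enumerates the frame indices of an input volume and of its target for a given time step and volume length. The function names `volume_indices` and `all_volumes`, the 0-based indexing and the sliding step of one frame are assumptions made for this example; they are not taken from the released code.

```python
from typing import List, Tuple


def volume_indices(t: int, tau: int, mode: str = "prediction") -> Tuple[List[int], List[int]]:
    """Return (input, target) frame indices for the volume whose last input frame is t."""
    if tau < 1:
        raise ValueError("tau must be a positive integer")
    inputs = list(range(t - tau + 1, t + 1))        # F_{t-tau+1}, ..., F_t
    if mode == "prediction":
        targets = list(range(t + 1, t + tau + 1))   # F_{t+1}, ..., F_{t+tau}
    elif mode == "reconstruction":
        targets = inputs[::-1]                      # F_t, ..., F_{t-tau+1}
    else:
        raise ValueError("mode must be 'prediction' or 'reconstruction'")
    return inputs, targets


def all_volumes(num_frames: int, tau: int, mode: str = "prediction"):
    """Enumerate every volume that fits in a video of num_frames frames (0-based).

    Assumption: volumes slide by one frame at a time.
    """
    # Prediction needs tau future frames after t; reconstruction needs none.
    last_t = num_frames - 1 - (tau if mode == "prediction" else 0)
    return [volume_indices(t, tau, mode) for t in range(tau - 1, last_t + 1)]


if __name__ == "__main__":
    print(volume_indices(4, 3, "prediction"))      # ([2, 3, 4], [5, 6, 7])
    print(volume_indices(4, 3, "reconstruction"))  # ([2, 3, 4], [4, 3, 2])
```

The only difference between the two modes is the target list, which reflects the fact that the same network is reused for both tasks and only the training target changes.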
2.1. Encoding-decoding model

The structure consists of two networks, an encoding network and a decoding network (Fig. 1). The encoder contains three convolutional layers, each followed by leaky ReLU with negative slope equal to 0.2 [22]. In order to do down-sampling, we use all three convolutional layers with stride. The strided convolution allows the network to learn its own spatial down-sampling. Similarly, three deconvolution layers are used in the decoder to learn its own spatial up-sampling. The goal of temporal encoding is to capture and compress changes due to motion in the input sequence into encoding hidden states that allow the decoder to reconstruct the input or predict the future.

Spatio-temporal LSTM cells [23] are employed as a temporal encoder/decoder. At each time $t$, the convolutional LSTM (convLSTM) module receives as input a new video frame after projection in the spatial feature space. This is used together with the memory content and output of the previous step $t - 1$ to compute new memory activations. Interleaving multiple convLSTMs between convolutional layers helps the model learn spatio-temporal dynamic information at different levels. The high level states capture global spatio-temporal representations while the lower level states retain the detail of local spatio-temporal representations. After the last frame is read, the decoding LSTMs take corresponding states from the encoder as their initial states and output an estimate for the target sequence (Fig. 1). Combining the low-level states with the up-sampling outputs as the initial states and inputs of the decoding LSTMs helps to aggregate low-level information into the up-scaling data stream. Therefore, the output contains details on both background and object (Fig. 3).

2.2. Input data layer

The input to the model is a video volume consisting of $\tau$ consecutive frames. Each frame is extracted from raw video, converted to a gray-scale image and resized to 227 × 227. The pixel values are scaled to the range [0, 1]. We stack $\tau$ frames in the 4th dimension into video volumes and use them as the input of size 227 × 227 × 1 × $\tau$ to the encoder. Following [10], we generate more video sequences by concatenating frames with skipping strides of 1, 2 and 3, thereby simulating faster motion patterns. Although speed can be important in anomaly detection, we still carry out this augmentation to minimise over-fitting and to have a fair comparison with [10], [13]. Unlike [10], we do not stack precomputed optical flow into our input volume, in the expectation that the network can learn the necessary patterns of motion.

3. Training

The weights $W_l$ and biases $b_l$ of each layer $l$ are learned by minimizing the regularized least squares error:

$$\frac{1}{2N\tau} \sum_{n=1}^{N} \lVert \theta_n - \hat{\theta}_n \rVert_2^2 + \frac{\lambda}{2} \sum_{l} \lVert W_l \rVert_2^2 \qquad (1)$$

where $\hat{\theta}_n$ is the predicted frame sequence (or the reconstructed frame sequence) from the model and $\theta_n$ is the target sequence. The first term is the prediction error (or the reconstruction error) and the second term regularizes the weights. $\lambda$ is a hyper-parameter used to balance the importance of the two terms.

The weights in each convolutional layer are initialized from a zero-mean Gaussian distribution with standard deviation calculated from the number of input channels and the spatial filter size of the layer [24]. This is a robust initialization method that particularly considers the rectifier nonlinearities. We initialize the weights for convLSTM using a zero-mean Gaussian distribution with a fixed standard deviation of 0.01. The biases for all layers are initialized to zero. The input-to-hidden and hidden-to-hidden convolutional filters in the convLSTM cell are the same size.

3.1. Anomalous event detection

The Adam [25] method is used to optimize the error in Eq. 1 with batch size $N = 4$, momentum parameters of 0.9 and 0.999, and weight decay $\lambda = 5 \times 10^{-4}$ [26]. We train a network separately on each dataset so that the model learns the specific normal patterns. An event may be normal in one dataset but abnormal in another. For example, people going towards the turnstile to enter the subway station is normal in the Subway Entrance dataset but abnormal in the Subway Exit dataset. We start training the model with a learning rate of $10^{-4}$. After 80 epochs, we stop training and use the model for anomaly detection.

4. Regularity score for anomaly detection

Once the model is trained, the prediction error between each output frame $\hat{F}_i$ and the target frame $F_i$ in the video sequence is computed, then the errors of all $\tau$ frames are summed up to form the prediction error for a volume as follows:

$$e(t) = \sum_{i=t+1}^{t+\tau} \lVert \hat{F}_i - F_i \rVert_2 \qquad (2)$$

The prediction error is then normalized to compute a regularity score $s(t)$ of a testing volume as follows [10]:

$$s(t) = 1 - \frac{e(t) - \min_{t'} e(t')}{\max_{t'} e(t')} \qquad (3)$$

where $\min_{t'} e(t')$ and $\max_{t'} e(t')$ are calculated over the prediction errors of all volumes in the same test video. If the regularity score $s(t)$ is less than a threshold, the corresponding test volume is abnormal.

Fig. 2: The regularity score of a video sequence of the CUHK Avenue dataset [4]. The score decreases when an anomaly (a running man) appears on the scene.

We also use the same architecture for reconstruction in our experiments. Instead of using the next $\tau$ frames as the target sequence, we use the input sequence in reverse order as the target. Replacing the target sequence in Eq. 2, we obtain the reconstruction error and use it for anomaly detection with the reconstruction model.

5. Experiments

Our method is evaluated both quantitatively and qualitatively. We modify and use Caffe [27] for all our experiments. Code and trained models are available at https://github.com/t2mhanh/convLSTM_Prediction_AnomalyDetection.

5.1. Datasets

Our models are trained on five of the most commonly used datasets for anomaly detection: UCSD (UCSDPed1 and UCSDPed2) [2], CUHK Avenue [4] and Subway (Entrance and Exit) [5]. The UCSD and CUHK datasets have separate training videos which contain mostly normal events. The first 12 minutes of Subway Entrance and the first 5 minutes of Subway Exit are used for training.

5.2. Anomalous event detection

Two performance metrics are employed for evaluation and comparison with state of the art results: Equal Error Rate (EER) and Area Under the ROC Curve (AUC). The regularity score of each volume determines whether it is normal or abnormal. We follow the intuition that testing video volumes containing normal events generate high regularity scores (Eq. 3) since they are similar to training data. A testing video sequence containing an anomaly gives a lower score. Setting different thresholds on the regularity score, volumes are classified into those that contain an anomaly and those that do not. These predictions are compared with ground-truth to give the EER and the AUC of the resulting ROC curve (TPR versus FPR) generated by varying an acceptance threshold. Good performance has a low EER and high AUC.

Table 1 shows that the model trained for prediction performs comparably to state of the art results. Performance on UCSDPed1 is relatively poor, whilst for CUHK Avenue, the AUC is better than most methods, except FlowNet-Unet-GAN [14], MemAE [17], LMN [18] and MPD [19]. However, MemAE [17], LMN [18] and MPD [19] have more parameters than our models, as shown in Table 3.

TABLE 1: Performance comparison with the state of the art. Entries are AUC/EER (%).

Method                    UCSDPed1    UCSDPed2    CUHK Avenue   Subway Entrance   Subway Exit
Conv-WTA [7]              91.6/14.8   95/9.5      81/26.5       -                 -
AMDN [8]                  92.1/16     90.8/17.1   -             -                 -
GAN [11]                  -           93.5/15.6   -             -                 -
Conv-AE [10]              81/27.9     90/21.7     70.2/25.1     94.3/26.0         80.7/9.9
ST-AE [13]                89.9/12.5   87.4/12.0   80.3/20.7     84.7/23.7         94.0/9.5
Past-Current-LSTM [20]    75.5/-      88.1/-      77/-          93.3/-            87.7/-
STAE-3D [16]              92.3/15.3   91.2/16.7   77.1/33.8     -                 -
FlowNet-Unet-GAN [14]     83.1/-      95.4/-      85.1/-        -                 -
Two-streams AE [15]       -           94.1/-      83.3/-        -                 -
MemAE [17]                -           96.2/-      86.9/-        -                 -
LMN [18]                  -           97/-        88.5/-        -                 -
MPD* [19]                 83.2/-      95.1/-      84.0/-        -                 -
MPD [19]                  85.1/-      96.9/-      89.5/-        -                 -
Ours (prediction)         80.8/25.1   92.3/14.4   84.8/22.4     90.2/15.9         95/8

TABLE 2: Comparison of AUC/EER with different models. $\tau$ is the number of frames in an input sequence and a target sequence. Entries are AUC/EER (%).

Method                    UCSDPed1    UCSDPed2    CUHK Avenue
Reconstruction            75.6/28.9   87.5/17.1   81.4/26.1
Prediction, $\tau = 2$    78.3/27.1   86.1/21.1   85.1/22.5
Prediction, $\tau = 5$    80.8/25.1   92.3/14.4   84.8/22.4
Prediction, $\tau = 8$    79/26.5     89.6/18.5   83.2/23.2

Table 2 shows the results when different models are used. In the table, “Reconstruction” is for a model trained for reconstructing a sequence of 5 frames and “Prediction” is for models trained to predict $\tau$ frames.
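
The scoring and evaluation steps of Sections 4 and 5.2 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the released implementation: frames are represented as flat sequences of pixel values, the function names are hypothetical, the ROC curve is built from the observed score values only, and the EER is approximated by the operating point where the false positive rate is closest to the false negative rate, without interpolation. Whether thresholds are swept per video or over pooled scores is not specified in the text, so the sketch simply takes one list of scores and labels.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # assumption: one frame flattened to a sequence of pixel values


def volume_error(predicted: Sequence[Frame], target: Sequence[Frame]) -> float:
    """Eq. (2): sum over the frames of a volume of the L2 norm of the difference."""
    return sum(
        math.sqrt(sum((p - q) ** 2 for p, q in zip(f_hat, f)))
        for f_hat, f in zip(predicted, target)
    )


def regularity_scores(errors: Sequence[float]) -> List[float]:
    """Eq. (3): s(t) = 1 - (e(t) - min e) / max e, over the volumes of one test video."""
    lo, hi = min(errors), max(errors)
    if hi == 0:  # guard added for this sketch: every volume predicted perfectly
        return [1.0 for _ in errors]
    return [1.0 - (e - lo) / hi for e in errors]


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by sweeping a threshold over the regularity scores.

    labels: 1 = volume contains an anomaly, 0 = normal.
    A volume is flagged abnormal when its score is below the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one normal and one abnormal volume")
    points = []
    for threshold in sorted(set(scores)) + [math.inf]:
        flagged = [s < threshold for s in scores]
        tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
        fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
        points.append((fp / negatives, tp / positives))
    return points  # runs from (0, 0) to (1, 1)


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def eer(points: Sequence[Tuple[float, float]]) -> float:
    """Approximate EER: the operating point where FPR is closest to FNR = 1 - TPR."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1.0 - p[1])))
    return (fpr + (1.0 - tpr)) / 2.0
```

For example, per-volume errors of 2, 4 and 8 within one test video give regularity scores of 1.0, 0.75 and 0.25, so the volume with the largest error is the first to fall below any threshold.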

The model trained for future prediction gives better results than the reconstruction model. This may be because prediction will always try to draw back to normality, whereas reconstruction has already seen the anomalous sequence that it is asked to reproduce. The quality comparison between reconstruction and prediction is shown in Fig. 3.

Fig. 3: Prediction and reconstruction of the third frame out of 5 (middle), compared to the target frame (left); accumulated per-pixel error over 5 frames as a blue-green-red colour map (right). Ground truth anomalies are shown as rectangles. Taken from UCSDPed2, UCSDPed1 and CUHK Avenue. Best viewed in color.
Panels:
Reconstruction (error e(t) = 21.57) - UCSDPed2 - biker
Prediction (error e(t) = 41.03) - UCSDPed2 - biker
Reconstruction (error e(t) = 30.28) - UCSDPed1 - car
Prediction (error e(t) = 66.25) - UCSDPed1 - car
Reconstruction (error e(t) = 11) - CUHK Avenue - running
Prediction (error e(t) = 59.81) - CUHK Avenue - running

The number of model parameters for our method and for different end-to-end trainable models in the state of the art are compared in Table 3. We achieve 75 fps for anomaly detection with a GeForce GTX TITAN X, faster than other state of the art methods with the same setting [18].

TABLE 3: Comparison of model complexity and testing speed.

Methods           Parameters (M)   FPS
Conv-AE [10]      8.4              -
ST-AE [13]        1.1              -
STAE-3D [16]      0.5              -
MemAE [17]        6.2              45
LMN [18]          15.0             67
Ours              0.85             75

As can be seen in Fig. 3, the future prediction of a biker is worse than the prediction of a pedestrian. Because the model is trained mostly on video sequences containing pedestrians, the predicted biker looks similar to a pedestrian. Here the prediction error is significantly larger than the reconstruction error.

6. Conclusion

We have adapted a state of the art predictive encoder-decoder deep network to detect abnormal events in video. We evaluate detection performance using both sequence prediction and reconstruction, and show that prediction gives superior performance on anomaly detection. For the prediction model, we obtain performance competitive with state of the art methods on five standard datasets. Finally, we evaluate performance across different prediction windows, encompassing varying levels of motion complexity. Our future work includes investigating the fusion of gray-scale images and optical flow on input.

Acknowledgment

This work was supported by The Murata Science Foundation and The University of Danang - University of Science and Technology, code number of Project: T2021-02-05MSF.

References

[1] Kim, J., and K. Grauman, "Observe locally, infer globally: a space-time MRF for detecting abnormal activities with incremental updates," In 2009 IEEE conference on computer vision and pattern recognition, pp. 2921-2928, 2009.
[2] Mahadevan, Vijay, Weixin Li, Viral Bhalodia, and Nuno Vasconcelos, "Anomaly detection in crowded scenes," In 2010 IEEE computer society conference on computer vision and pattern recognition, pp. 1975-1981, IEEE, 2010.
[3] Cong, Yang, Junsong Yuan, and Ji Liu, "Sparse reconstruction cost for abnormal event detection," In CVPR 2011, pp. 3449-3456, IEEE, 2011.
[4] Lu, Cewu, Jianping Shi, and Jiaya Jia, "Abnormal event detection at 150 fps in matlab," In Proceedings of the IEEE international conference on computer vision, pp. 2720-2727, 2013.
[5] Adam, Amit, Ehud Rivlin, Ilan Shimshoni, and David Reinitz, "Robust real-time unusual event detection using multiple fixed-location monitors," IEEE transactions on pattern analysis and machine intelligence 30, no. 3 (2008): 555-560.
[6] Wang, Siqi, En Zhu, Jianping Yin, and Fatih Porikli, "Anomaly detection in crowded scenes by SL-HOF descriptor and foreground classification," In 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 3398-3403, IEEE, 2016.
[7] Tran, Hanh TM, and David Hogg, "Anomaly detection using a convolutional winner-take-all autoencoder," In Proceedings of the British Machine Vision Conference 2017, British Machine Vision Association, 2017.
[8] Xu, Dan, Elisa Ricci, Yan Yan, Jingkuan Song, and Nicu Sebe, "Learning deep representations of appearance and motion for anomalous event detection," arXiv preprint arXiv:1510.01553 (2015).
[9] Tran, Thi Minh Hanh, "Anomaly Detection in Video," PhD diss., University of Leeds, 2018.
[10] Hasan, Mahmudul, Jonghyun Choi, Jan Neumann, Amit K. Roy-Chowdhury, and Larry S. Davis, "Learning temporal regularity in video sequences," In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 733-742, 2016.
[11] Ravanbakhsh, Mahdyar, Enver Sangineto, Moin Nabi, and Nicu Sebe, "Training adversarial discriminators for cross-channel abnormal event detection in crowds," In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1896-1904, IEEE, 2019.
[12] Zhao, Bin, Li Fei-Fei, and Eric P. Xing, "Online detection of unusual events in videos via dynamic sparse coding," In CVPR 2011, pp. 3313-3320, IEEE, 2011.
[13] Chong, Yong Shean, and Yong Haur Tay, "Abnormal event detection in videos using spatiotemporal autoencoder," In International symposium on neural networks, pp. 189-196, Springer, Cham, 2017.
[14] Liu, Wen, Weixin Luo, Dongze Lian, and Shenghua Gao, "Future frame prediction for anomaly detection–a new baseline," In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 6536-6545, 2018.
[15] Nguyen, Trong-Nguyen, and Jean Meunier, "Anomaly detection in video sequence with appearance-motion correspondence," In Proceedings of the IEEE/CVF international conference on computer vision, pp. 1273-1283, 2019.
[16] Zhao, Yiru, Bing Deng, Chen Shen, Yao Liu, Hongtao Lu, and Xian-Sheng Hua, "Spatio-temporal autoencoder for video anomaly detection," In Proceedings of the 25th ACM international conference on Multimedia, pp. 1933-1941, 2017.
[17] Gong, Dong, Lingqiao Liu, Vuong Le, Budhaditya Saha, Moussa Reda Mansour, Svetha Venkatesh, and Anton van den Hengel, "Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection," In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1705-1714, 2019.
[18] Park, Hyunjong, Jongyoun Noh, and Bumsub Ham, "Learning memory-guided normality for anomaly detection," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14372-14381, 2020.
[19] Lv, Hui, Chen Chen, Zhen Cui, Chunyan Xu, Yong Li, and Jian Yang, "Learning normal dynamics in videos with meta prototype network," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15425-15434, 2021.
[20] Luo, Weixin, Wen Liu, and Shenghua Gao, "Remembering history with convolutional lstm for anomaly detection," In 2017 IEEE International Conference on Multimedia and Expo (ICME), pp. 439-444, IEEE, 2017.
[21] Shi, Xingjian, Zhihan Gao, Leonard Lausen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, and Wang-chun Woo, "Deep learning for precipitation nowcasting: A benchmark and a new model," Advances in neural information processing systems, 30 (2017).
[22] Maas, Andrew L., Awni Y. Hannun, and Andrew Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," In Proc. icml, vol. 30, no. 1, p. 3, 2013.
[23] Shi, Xingjian, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo, "Convolutional LSTM network: A machine learning approach for precipitation nowcasting," Advances in neural information processing systems, 28 (2015).
[24] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification," In Proceedings of the IEEE international conference on computer vision, pp. 1026-1034, 2015.
[25] Kingma, Diederik P., and Jimmy Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980 (2014).
[26] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "Imagenet classification with deep convolutional neural networks," Advances in neural information processing systems, 25 (2012).
[27] Jia, Yangqing, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell, "Caffe: Convolutional architecture for fast feature embedding," In Proceedings of the 22nd ACM international conference on Multimedia, pp. 675-678, 2014.

Hanh T. M. Tran is currently a Lecturer with the Department of Electronics and Telecommunications, the University of Danang - University of Science and Technology, Vietnam, which she joined in 2009. She received the B.Eng. and M.Eng. degrees in Electronics and Telecommunications from the University of Danang - University of Science and Technology in 2008 and 2011, respectively. She obtained the Ph.D. degree from the University of Leeds, United Kingdom, in 2018. She was a Visiting Researcher with the Arizona State University, Arizona, USA, in 2012. Her main research interests include image/video processing, machine learning, deep learning, anomaly detection, object detection and recognition.

David Hogg is Professor of Artificial Intelligence at the University of Leeds. His research is on artificial intelligence and particularly in computer vision. He has been Pro-Vice-Chancellor for Research and Innovation at the University of Leeds, visiting professor at the MIT Media Lab, chair of the EPSRC ICT Strategic Advisory Team, and chair of the Academic Advisory Group of the Worldwide Universities Network. He is a Fellow of the European Association for Artificial Intelligence (EurAI), a Distinguished Fellow of the British Machine Vision Association, and a Fellow of the International Association for Pattern Recognition. He is Director of the UKRI Centre for Doctoral Training in Artificial Intelligence for Medical Diagnosis and Care at the University of Leeds.