A Survey of Deep Learning Solutions for Anomaly Detection in Surveillance Videos
All content following this page was uploaded by Stephen T Njenga on 06 October 2022.
Abstract- Deep learning has proven to be a landmark computing approach in the computer vision domain. Hence, it has been widely applied to solve complex cognitive tasks such as the detection of anomalies in surveillance videos. Anomaly detection in this case is the identification of abnormal events in surveillance videos that can be deemed security incidents or threats. Deep learning solutions for anomaly detection have outperformed traditional machine learning solutions. This review attempts to provide a holistic benchmarking of the deep learning solutions for video anomaly detection published since 2016. The paper identifies the learning technique, the datasets used and the overall model accuracy. Reviewed papers were organised into five deep learning methods, namely autoencoders, continual learning, transfer learning, reinforcement learning and ensemble learning. Current and emerging trends are discussed as well.
Keywords- Deep Learning; Anomaly Detection; Anomaly Detection in Videos; Intelligence Video Surveillance; Deep Anomaly Detection.
I. INTRODUCTION
In the recent past, the use of surveillance cameras has rapidly increased to enhance public safety. Unfortunately, the ability of security forces to monitor surveillance footage has not kept up with the speed at which surveillance data is generated or with its volume [1].
This has created a critical problem in the utilization of surveillance footage, since more human monitors are required as the number of surveillance cameras increases. The monitoring task requires dedicated attention because anomalous events are very rare. Hence, human monitors might fail to flag security incidents.
Anomaly detection refers to the act of identifying improper behaviors in surveillance videos. Security surveillance considers anomalies to be security incidents or threats. For instance, some of the most common anomalies include violence, abuse, theft, traffic accidents, explosions, fighting, shooting, weapons, stealing, vandalism and shoplifting [2]. Anomaly detection in videos is complex at several levels: the subjectivity of the definition of an anomaly, the rarity of anomalies in videos, the large volume of video data and the high computational power required. This research problem has spiked research interest, with many scholars making contributions on how to achieve intelligent surveillance. This paper brings together such contributions for the betterment of video anomaly detection.
Deep learning is a subset of machine learning that is based on variants of artificial neural networks [3]. To address more complex, cognitively intensive problems, many layers of neural networks are stacked together. The stacking of many layers to create a deep network is referred to as deep learning. As the name 'deep' implies, deep learning is all about scale: larger neural networks are trained with huge amounts of data, and their performance and accuracy continue to increase [4]. This is notably different from other machine learning techniques, which reach a plateau in their performance. Like machine learning, deep learning algorithms can be categorized as supervised, semi-supervised and unsupervised.
Researchers have utilized supervised learning methods to develop models that detect specific anomalous events, for instance traffic accident detectors [5],[6], violence detectors, home intrusion detectors [7] and shoplifting detectors [8]. Unfortunately, these early solutions have limited use, since they cannot be generalized to detect other abnormal events or actions.
To address these shortcomings of the supervised models, other researchers proposed to use unsupervised learning algorithms. For instance, Sultani and others [2] propose Multiple Instance Learning, which can be generalized across a variety of anomalies. Chong [9] also proposes the Conv2DLSTM Autoencoder as a solution to the generalization shortcomings.
The purpose of this article is to provide a comprehensive benchmarking of the deep learning solutions implemented for video anomaly detection, discover the underlying learning techniques, and identify trends in the design of deep learning models and gaps in the existing solutions.
This paper examined the deep learning models published since 2016. Only deep learning solutions for anomaly detection in surveillance videos were considered. 30 papers were reviewed and summarized for better analysis. Datasets used for training and testing, model accuracy, learning techniques and the
www.ijcit.com 184
International Journal of Computer and Information Technology (ISSN: 2279 – 0764)
Volume 10 – Issue 5, September 2021
underlying deep learning algorithms are identified and compared for ranking.
The structure of this paper is as follows. Section II describes other review papers in the field of video anomaly detection. This work is put into context with other surveys, and the gaps present and the contributions are discussed. Section III describes the findings of the systematic review. Deep learning models and their different approaches to solving the anomaly detection problem are analyzed. The models are clustered into renowned deep learning design architectures and learning techniques. A roadmap of the research and the rapid improvements is detailed as well. Section IV introduces the noted trends in deep video anomaly detection, the datasets and the evaluation procedures. Finally, Section V wraps up with a conclusion and future expectations.
II. RELATED WORKS
This paper is concerned with a review of deep learning solutions for video anomaly detection problems. Publications outside that scope are not considered. Several reviews were found to be close to this delimitation.
A survey of video anomaly detection by Lomte and others investigated all solutions made for video anomaly detection, both machine learning and deep learning. The review considered very few papers, hence the need for a comprehensive review [10].
A survey by Mohammadi, Fathy and Sabokrou [11] on deep video anomaly detection concentrates on the technologies and learning techniques in deep learning. Their survey considered algorithms and models ranging from traditional models like Histogram of Oriented Gradients (HOG) to deep learning models like autoencoders. The work focuses more on the internal working of the models than on the publications in that area.
Other related work includes the publication by Monika Singh on video anomaly detection [12]. This paper investigates the image-based techniques used to detect anomalies. Image-based techniques span from image recognition to what are referred to as sparse representations. The paper goes further to group the techniques based on a single object and on multiple trajectories and motion. This paper is limited to sparse-based solutions and it focuses more on the inner working mechanisms of this technology.
Other works with similar interests include Ramchandran [13], which concentrates on video scenes and the definition of anomalies. Another paper that cannot be ignored is the survey on traffic anomalies by Santhosh [14]. This paper analyses the computer vision-based techniques that seek to understand traffic violations and on-road anomalies. Their work analyses the techniques, frameworks, datasets and gaps. This work is limited to traffic anomalies only.
Since deep learning is the backbone of this study, it is important to understand what deep learning models entail and their different design architectures. Deep learning models are composed of multi-layered neural network variants. The architecture of deep learning models is extremely flexible since the models can differ in the number of layers, filter size and dimension, as well as the basic constructs [4]. Most of the models used in the detection of anomalies in videos have ConvNets, ConvLSTM and 3DCNN as the basic building blocks.
A layer within a deep learning model is composed of interconnected nodes (neurons). A node may or may not be connected to all other nodes in the adjacent layers. Data fed through the model passes through each layer and is transformed into an abstract representation, also known as extracted features. The training process sets the weights across the different transformation functions. The model then modifies the weights using backpropagation, where the output error is traced back to the input and the weights are adjusted along the way [3].
Researchers have utilized this knowledge to design models using deep learning frameworks like PyTorch, Keras, TensorFlow and others. A model can be composed of different deep learning algorithms, where several algorithms are stacked together to produce a model. For instance, a model may combine Conv2D and ConvLSTM layers to obtain a model that works for sequential problems using high-dimensional data like videos. It can be noted that the nature of the problem inspires the design of the model.
Our literature survey provides a comprehensive road map of deep anomaly detection in videos. The works published in this area are clustered according to the learning technique. The methodologies used to detect anomalies are analyzed and compared to uncover the rapid changes and their motivations. This paper explores more publications than its peers and aims at establishing the trends and some ranking of the best model.
III. DEEP LEARNING MODELS
This section discusses the landmark publications in deep learning anomaly detection in surveillance videos that have received much attention. The section is arranged according to the learning techniques present in the reviewed papers. Autoencoders, transfer learning, ensemble learning, reinforcement learning and continual learning are the large thematic areas.
A. Transfer Learning
Transfer learning describes the handover of knowledge from one model to another. This approach uses an already pre-trained model to solve a different task. Transfer learning is very useful when there is a scarcity of data or computational resources, since it allows models to use less data by reusing the learned weights from the pre-trained model. Other advantages of transfer learning include improved accuracy over the base model and a reduction of the training time.
Transfer learning was found to be a growing trend in video anomaly detection and deep learning. One strategy for implementing transfer learning is feature extraction. Pre-trained models were used to extract features from labelled video and imagery data. The pre-trained models used mostly
in the reviewed models include the deep 3-dimensional convolutional network (C3D) model [2], the Inception V3 module [15], I3D and You Only Look Once Version 3 (YOLOv3).
C3D borrows from BVLC Caffe, which was modified to support 3D convolution and pooling [16]. The C3D model was trained on UCF-101 and Sports-1M videos to extract features from videos, which can be very useful in downsampling the dataset for effective processing. The 3D convolution extracts both temporal and spatial features covering the motion of objects, human scene detection and their interactions [16]. The pre-trained model has been used in various models. For instance, Sultani and others [2] utilize C3D for feature extraction in their paper. The model takes a video as input and extracts a tensor of 4096 features.
Pre-trained models have been widely used to extract features from videos in anomaly detection research. Other feature extractor models that were found include Inflated 3D (I3D), which was trained on the Kinetics-400 dataset. I3D is composed of two streams of inflated 3D ConvNets [17]. 2D kernels were inflated by adding a time dimension to the filters and kernels, from N x N to N x N x N. It extracts features from videos and gives an output of shape 1024. By default, the frames fed to the model should be of size 224x224 and the video should be recorded at 25 frames per second (fps) [17].
Another important feature extractor used by researchers is You Only Look Once Version 3 (YOLOv3), a deep convolutional neural network that identifies specific objects in videos or images [18]. YOLOv3 is an improved version of YOLOv2 that borrows heavily from the Darknet model, which was trained on ImageNet. YOLOv3 stacks 53 further layers on the 53-layer Darknet to form a deep 106-layer network. Object detection in the model happens at three different locations: the first detection happens at the 82nd layer, the second at the 94th layer and the third at the 106th layer, each applying a 1x1 detection kernel to a feature map of a different scale. The model also predicts bounding boxes, draws them around the objects and labels the objects [18]. This detector was used to extract objects from videos; the extracted objects were used to define the anomalous and normal scenes on which anomaly detection was based.
Transfer learning was identified in the following papers. Sultani [2] used the pre-trained C3D model combined with a light classifier, i.e. a Support Vector Machine (SVM), to assign a ranking score to the normal and abnormal instances. The C3D model was used for feature extraction. Motion and trajectory features were extracted from the real-world UCF Crime dataset. The model was trained on both normal and abnormal videos, which were used to generate the ranking bags of normal and abnormal instances. A novel ranking loss function is used to estimate the anomaly level for every video [2].
Nazare and others [19] also explored the use of pre-trained CNNs in anomaly detection. Nazare explored several CNN networks including VGG-16, ResNet-50, Xception and DenseNet-121. Their paper investigated the role of pre-trained image classifiers in feature extraction to solve the problem of anomaly detection. The paper found that the Xception model outperforms its counterparts and can be used for feature extraction, even though the approach as a whole performs poorly compared to other anomaly detection methods. Other examples of transfer learning found in the review include [20], [21], [22], [23], [15], [24].
In this review, 30% of the 30 papers reviewed had adopted the transfer learning paradigm by using pre-trained models to improve the performance of the new models. The design architecture of pre-trained models is also borrowed to create other models.
B. Autoencoders
Autoencoders are a substantial part of the survey: out of the 30 papers reviewed, 11 were found to have used the autoencoder model design paradigm, which is about 37% of the reviewed papers. Thus, autoencoders can be considered a growing trend in video anomaly detection. Autoencoders are widely used due to their unsupervised nature, that is, their ability to learn from unlabeled data without human supervision. The golden idea behind autoencoders is the reconstruction error that arises when reconstructing abnormal frames. The reconstruction error of irregular videos is larger than that of regular videos. This idea is applied in designing models that detect anomalies in videos.
The autoencoders found in the review have different architectures and deep learning algorithms. For instance, Nguyen and Meunier [25] integrate a Conv-AE and an Inception module to form a deep autoencoder that detects the appearance and motion features from the videos. The decoder part of the model has two units, one dedicated to motion and one to appearance.
The autoencoder of Duman and Erdem [26] is composed of a Convolutional Autoencoder and a Convolutional LSTM. This framework uses optical flow to extract features of speed and trajectory from the videos. The optical flow output is fed to the autoencoder, which returns the reconstructed optical flow map. The reconstructed output is subtracted from the input to obtain the mean squared error, which is used to calculate the regularity score that indicates the abnormality level of every frame.
The unsupervised solution of Ramchandran and Sangaiah [27] for anomaly detection in crowded scenes was based on an autoencoder. The model was built from Conv-LSTM. Raw image sequences and edge image sequences were used to train the model.
The Spatial-Temporal autoencoder is another variation of the autoencoders encountered in the review. This model was introduced by Zhao and others [28] in their paper named Spatio-Temporal AutoEncoder for Video Anomaly Detection. Their model is composed of 3D convolutional layers. The architecture of the network is made up of an encoder and two decoder branches. The decoder branches consist of the prediction branch and the reconstruction branch. The two branches are used to create prediction loss function and
reconstruction loss function that are used to estimate the regularity score for locating anomalies.
The autoencoder of Pawar and Attar is a hybrid of a convolutional autoencoder and an LSTM autoencoder [29]. This presents another design paradigm of combining two different autoencoders to create a seamless model. The convolutional part takes care of the image part while the LSTM preserves the sequence. Reconstruction error is used to model the regularity score [29].
The Variational Autoencoder is an improvement on autoencoders that employs probabilistic modelling to select the best reconstruction from the latent space. Unlike normal autoencoders, which encode the latent space as a single point, variational autoencoders generate the latent space as a distribution. Wu and others [30] exploited this architecture to create a two-stream variational autoencoder to detect anomalies in both local and streaming videos.
Other unique autoencoders found include Bhakat and Ramakrishnan [31], Mahmudul Hasan and others [32], Sabokrou and Fathy [33] and another case of a Spatial-temporal autoencoder by Chong and Tay [9]. The Spatial-temporal autoencoder is different due to its building constructs. It employs conv2d layers wrapped in time-distributed layers for the spatial part and convlstm2d for the temporal part.
C. Ensemble Learning
Other reviewed models are isolated cases of ensemble learning, which combines multiple learning algorithms to get better predictive performance than the constituent learning algorithms alone. For instance, Zahid and others [15] is a typical case of ensemble and transfer learning. The model combines both a 3D convolutional network and a Fully Connected (FC) network. Vu and others [34] is another case of ensemble learning that combines Conditional Generative Adversarial Networks, R-CNN and Support Vector Machines (SVM).
D. Continual Learning
Continual learning describes a non-stop learning mechanism that learns step by step while maintaining the previously learnt knowledge. Other deep learning models tend to catastrophically forget the existing knowledge when they learn new observations. The model forgets when the new data to be learnt differs significantly from the previous observations. This causes the new information to overwrite the previous knowledge in the common internal representation of the neural network. To curb the catastrophic forgetting problem, a solution that regularizes the whole network to preserve the trained knowledge was proposed. This type of learning has been used for anomaly detection by Doshi [35].
Continual learning was just one part of the model; it enabled newly learnt anomalies to be added to the knowledge without losing the previous knowledge. The anomaly detection part utilized the Euclidean distance in a k-nearest neighbors (kNN) search to identify anomalies that lie away from the nominal manifold [35].
E. Reinforcement Learning
Reinforcement learning is another rare technique, found in a single paper during the review. This technique describes a sequential decision-making system in which an agent makes choices and receives a reward when it makes the right choice. This enables the agent to acquire new behavior and skills incrementally [36]. The learning process is cyclic since it involves repeating a series of steps. In the first step, the agent perceives the environment and acquires a new state and a reward. In the second step, the agent chooses the next course of action. In the third step, the agent sends the action to the environment, and finally it modifies its internal state based on the previous state and the agent's actions [36].
This learning technique has been applied by Aberkane [37] to detect anomalies in videos. Aberkane and Elarbi used a Deep Q Learning Network (DQN) to locate anomalies in videos. The model design borrows heavily from the Multiple Instance Learning of Sultani [2]. The DQN enables the agent to learn how anomalies are detected and recognized in the videos. The DQN is composed of a fully connected layer that calculates, for every video clip in the anomalous and normal bags, the probability that the clip contains an anomaly [37].
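The perceive, choose, act and update cycle described for reinforcement learning can be illustrated with a minimal sketch. The code below is not the DQN of Aberkane and Elarbi [37]; it swaps in plain tabular Q-learning on a hypothetical five-state corridor, and the names `step`, `choose_action`, `train` and `greedy_policy`, as well as all hyperparameter values, are assumptions made purely for illustration.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice; ties between the two actions are broken at random."""
    left, right = q[state]
    if rng.random() < epsilon or left == right:
        return rng.choice(ACTIONS)
    return 0 if left > right else 1


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; returns the learned table q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0                                            # perceive the start state
        for _ in range(100):                                 # cap on steps per episode
            action = choose_action(q, state, epsilon, rng)   # choose an action
            nxt, reward, done = step(state, action)          # act, observe state and reward
            future = 0.0 if done else gamma * max(q[nxt])
            # update the internal estimate towards reward + discounted future value
            q[state][action] += alpha * (reward + future - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Best action for every non-terminal state under the learned table."""
    return [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]


if __name__ == "__main__":
    table = train()
    print(greedy_policy(table))
```

In a DQN the table `q` is replaced by a neural network that estimates the same action values; the surrounding loop keeps the same shape.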
A variety of models were discovered from the empirical review. All the models selected for the empirical review had some deep learning components in them. Some models combine deep learning with traditional machine learning algorithms like the Support Vector Machine (SVM) [2]. The most popular underlying deep learning algorithms are 3DCNN, ConvLSTM and ConvNets.
IV. DATASETS, EVALUATION METRICS AND TRENDS
A. Datasets
Publicly available anomaly detection video datasets were used for most of the experiments in the reviewed works, chiefly the UCF Crime dataset, UCSD Ped1 & Ped2, and the Avenue dataset. The UCF Crime dataset, introduced by Sultani [2], comprises 1,900 long, untrimmed videos (128 hours in total). It is composed of real-life anomalies like Arrest, Arson, Abuse and many others. Both the training set and the testing set contain abnormal and normal videos, although the usage of both classes depends on the nature of the model to be trained. For instance, in the autoencoder model only the normal videos are used for training, while Sultani's model was trained on both normal and abnormal videos [9].
The University of California San Diego (UCSD) Ped1 & Ped2 datasets are used for training and testing [48]. UCSD Ped1 is composed of 70 videos, with 34 as the training set and 36 as the testing set. The scenery of the videos is a group of people walking in a park. Anomalies include non-pedestrian entities like bikers, skaters, carts and wheelchairs, as well as people walking in the grass area.
The Avenue dataset [49] contains 16 training and 21 testing video clips. A total of 30652 frames are available in the dataset. These videos are captured on a campus street using a still camera. Strange actions, like people running or riding a bike in the walkway, are the abnormal events presented.
Other popular datasets found in the papers include the UMN and ShanghaiTech video datasets. The ShanghaiTech [50] Campus dataset is composed of 130 abnormal events within 13 different scenes. In total, ShanghaiTech contains over 270,000 frames. The University of Minnesota (UMN) dataset was introduced by researchers developing anomaly detection models for crowded scenes. It therefore contains crowd escape and panic events across 11 videos with 3 different scenes [51].
B. Evaluation Criteria
This subsection highlights some of the most popular model evaluation methods encountered during the review. Most models have employed the Receiver Operating Characteristic (ROC) curve and its resultant Area Under the Curve (AUC). The ROC curve plots the true positive rate against the false positive rate [52]. It captures the trade-off between the sensitivity and the specificity of the model.
The Equal Error Rate (EER) metric was found in many of the papers as a measure that quantifies the errors present in the model predictions. It is the error rate at the threshold where the false acceptance rate equals the false rejection rate [52]. The model accuracy is high if the EER is low.
The F1 score was used in some papers. This metric is used to measure the accuracy of binary classification. Researchers have adopted it for anomaly detection by treating detection as binary classification at the frame level. It is calculated as twice the product of precision and recall divided by the sum of precision and recall, so it takes both false positives and false negatives into account [52]. The F1 score lies between 0 and 1, and the higher the F1 value, the better the model.
C. Trends
It can be noted that transfer learning and autoencoders account for the biggest part of the reviewed papers. The ability to transfer knowledge has made it easier to develop and deploy models faster, and the ability to improve on the performance of the base model has made transfer learning more attractive.
The ability of autoencoders to learn with minimal or no supervision at all has led researchers to use this design technique to implement different deep learning models that have shown performance comparable to other models in the same area.
The emerging trends will be based on the ability of the model to continually learn new knowledge, since the nature of anomalies is subjective. Detection of anomalies in real time and the ability to progressively learn novel observations are important aspects of video surveillance.
Hence, continual and reinforcement learning are positioned as the new direction. Currently, only a little work has been done, and due to the nature of anomaly detection, more interest is expected to move that way to enable online anomaly detection and sequential improvement of the models.
V. CONCLUSION
In this study, we have analyzed the deep learning solutions implemented to solve anomaly detection in videos. Deep learning models were classified according to the learning techniques and design architecture. The mechanism of anomaly detection within the papers was discussed as well. The most popular datasets used for training and testing the models were discussed, as well as the evaluation criteria used in the reviewed works.
It can be noted that hybrid models work better, as seen in both ensembles and transfer learning. There are emerging trends in a model's ability to constantly improve its behavior on novel observations, hence reinforcement and continual learning are expected to receive much attention. Future work in this research domain should focus on online anomaly detection problems, for real-time detection, and on continued learning to optimize the model behavior.
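As a supplementary illustration of the evaluation criteria in Section IV-B, the following minimal sketch computes the ROC points, AUC, EER and F1 score for frame-level labels and anomaly scores. It assumes binary labels (1 = anomalous frame) with at least one frame of each class; the function names and the nearest-point approximation of the EER are choices made for this example and are not taken from the reviewed papers.

```python
def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    Assumes y_true contains at least one 0 and at least one 1.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # frames sharing the same score are admitted together
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def equal_error_rate(points):
    """Approximate EER: the ROC point where FPR is closest to FNR = 1 - TPR."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1.0 - p[1])))
    return (fpr + (1.0 - tpr)) / 2.0


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    pts = roc_points(labels, scores)
    print(auc(pts), equal_error_rate(pts))
    print(f1_score(labels, [1 if s >= 0.5 else 0 for s in scores]))
```

A higher AUC and F1 score and a lower EER indicate a better detector, which matches the way these metrics are read in the reviewed works.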
VI. REFERENCES
[1] R. Yadav and M. Rai, "Advanced Intelligent Video Surveillance System (AIVSS): A Future Aspect," Research Gate, 2018.
[2] W. Sultani, C. Chen and M. Shah, "Real-World Anomaly Detection in Surveillance Videos," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6479-6488, 2018.
[3] A. Borner, "What is Deep Learning and How Does it Work? | Content Simplicity," 2019. [Online]. Available: https://contentsimplicity.com/what-is-deep-learning-and-how-does-it-work/.
[4] J. Brownlee, "What is Deep Learning?," 16 August 2019. [Online]. Available: https://machinelearningmastery.com/what-is-deep-learning/.
[5] M. U. Farooq, N. A. Khan and M. S. Ali, "Unsupervised Video Surveillance for Anomaly Detection of Street Traffic," (IJACSA) International Journal of Advanced Computer Science and Applications, pp. 270-275, 2017.
[6] K. T. Nguyen, D. T. Dinh, M. N. Do and M. T. Tran, "Anomaly Detection in Traffic Surveillance Videos with GAN-based Future Frame Prediction," Proceedings of the 2020 International Conference on Multimedia, pp. 457-463, 2020.
[7] A. Kushwaha, A. Mishra, K. Kamble, R. Janbhare and A. Pokhare, "Theft Detection using Machine Learning," IOSR Journal of Engineering (IOSRJEN), pp. 67-71, 2018.
[8] K. Wiggers, "AI Guardsman uses computer vision to spot shoplifters," 26 June 2018. [Online]. Available: https://venturebeat.com/2018/06/26/ai-guardsman-uses-computer-vision-to-spot-shoplifters/.
[9] Y. S. Chong and Y. H. Tay, "Abnormal Event Detection in Videos Using Spatiotemporal Autoencoder," arXiv, vol. 1701, no. 01546v1, 2017.
[10] V. Lomte, S. Singh, S. Patil, S. Patil and D. Pahurkar, "A Survey on Real World Anomaly Detection in Live Video Surveillance Techniques," International Journal of Research in Engineering, Science and Management, vol. 2, no. 2, pp. 2581-5792, 2019.
[11] B. Mohammadi, M. Fathy and M. Sabokrou, "Image/Video Deep Anomaly Detection: A Survey," Computing Research Repository (CoRR), vol. abs/2103.01739, 2021.
[12] M. Singh, "A Survey on Video Anomaly Detection," International Journal of Engineering Research & Technology (IJERT), vol. 5, no. 10, 2017.
[13] A. Ramchandran and A. K. Sangaiah, "Unsupervised deep learning system for local anomaly event detection in crowded scenes," Multimedia Tools and Applications, vol. 79, no. 47/48, pp. 35275-35295, 2020.
[14] K. K. Santhosh, D. P. Dogra and P. P. Roy, "Anomaly Detection in Road Traffic Using Visual Surveillance: A Survey," ACM Computing Surveys, vol. 53, no. 6, pp. 1-26, 2019.
[15] Y. Zahid, M. A. Tahir and M. N. Durrani, "Ensemble Learning Using Bagging And Inception-V3 For Anomaly Detection In Surveillance Videos," in 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 2020.
[16] D. Tran, L. Bourdev, R. Fergus, L. Torresani and M. Paluri, "Learning Spatiotemporal Features with 3D Convolutional Networks," IEEE International Conference on Computer Vision (ICCV), pp. 4489-4497, 2015.
[17] J. Carreira and A. Zisserman, "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset," Computing Research Repository, vol. abs/1705.07750, 2018.
[18] V. Meel, "YOLOv3: Real-Time Object Detection Algorithm (What’s New?)," viso.ai, 25 Feb 2021. [Online]. Available: https://viso.ai/deep-
detection," Journal of Intelligent & Fuzzy Systems, vol. 36, no. 3, pp. 1967-1975, 2019.
[21] L. P. Cinelli, "Anomaly Detection in Surveillance Videos Using Deep Residual Networks," Universidade Federal do Rio de Janeiro, Rio de Janeiro, 2017.
[22] K. Doshi and Y. Yilmaz, "Any-Shot Sequential Anomaly Detection in Surveillance Videos," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 934-935, 2020.
[23] K. Doshi and Y. Yilmaz, "Online anomaly detection in surveillance videos with asymptotic bound on false alarm rate," Pattern Recognition, vol. 114, p. 107865, 2021.
[24] W. Ullah, A. Ullah, T. Hussain, Z. A. Khan and S. W. Baik, "An Efficient Anomaly Recognition Framework Using an Attention Residual LSTM in Surveillance Videos," AI-Enabled Advanced Sensing for Human Action and Activity Recognition, vol. 21, 2021.
[25] T.-N. Nguyen and J. Meunier, "Anomaly Detection in Video Sequence with Appearance-Motion Correspondence," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
[26] E. Duman and O. A. Erdem, "Anomaly Detection in Videos Using Optical Flow and Convolutional Autoencoder," IEEE Access, vol. 7, pp. 183914-183923, 2019.
[27] B. Ramachandra, M. J. Jones and R. R. Vatsavai, "A Survey of Single-Scene," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
[28] Y. Zhao, B. Deng, C. Shen, Y. Liu, H. Lu and X.-S. Hua, "Spatio-Temporal AutoEncoder for Video Anomaly Detection," Proceedings of the 25th ACM international conference on Multimedia, pp. 1933-1941, 2017.
[29] K. V. Pawar and V. Attar, "Deep learning approaches for video-based anomalous activity detection," World Wide Web, p. 22, 27 May 2018.
[30] H. Wu, J. Shao, X. Xu, F. Shen and H. Shen, "A System for Spatiotemporal Anomaly Localization in Surveillance Videos," Proceedings of the 25th ACM international conference on Multimedia, pp. 1225-1226, 2017.
[31] S. Bhakat and G. Ramakrishnan, "Anomaly Detection in Surveillance Videos," Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, pp. 252-255, 2019.
[32] M. Hasan, J. Choi, J. Neumann, A. K. Roy-Chowdhury and L. S. Davis, "Learning Temporal Regularity in Video Sequences," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 733-742, 2016.
[33] M. F. M. M. Sabokrou, "Video anomaly detection and localisation based on the sparsity and reconstruction error of auto-encoder," Electronics Letters, vol. 52, no. 13, pp. 1122-1124, 2016.
[34] T.-H. Vu, J. Boonaert, S. Ambellouis and A. Taleb-Ahmed, "Multi-Channel Generative Framework and Supervised Learning for Anomaly Detection in Surveillance Videos," Human Activity Recognition Based on Image Sensors and Deep Learning, vol. 21, no. 9, p. 3179, 2021.
[35] K. Doshi and Y. Yilmaz, "Continual Learning for Anomaly Detection in Surveillance Videos," in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 2020.
[36] J. Torres, "A gentle introduction to Deep Reinforcement Learning," Towards Data Science, 15 May 2020. [Online]. Available: https://towardsdatascience.com/drl-01-a-gentle-introduction-to-deep-reinforcement-learning-405b79866bf4. [Accessed 18 August 2021].
[37] S. Aberkane and M. Elarbi, "Deep Reinforcement Learning for Real-world Anomaly Detection in Surveillance Videos," in 2019 6th International Conference on Image and Signal Processing and their
learning/yolov3-overview/. [Accessed 17 August 2021]. Applications (ISPA), Mostaganem, Algeria, 2019.
[19] T. S. Nazare, R. F. de Mello and M. A. Ponti, "Are pre-trained CNNs [38] R. Chalapathy, A. K. Menon and S. Chawla, "Robust, Deep and
good feature extractors for anomaly detection in surveillance videos?," Inductive Anomaly Detection," Machine Learning and Knowledge
eprint arXiv, no. 1811.08495v1, 2018. Discovery in Databases, vol. 10534, 2017.
[20] S. Bansod and A. Nandedkar, "Transfer learning for video anomaly [39] K. Kavikuil and J. Amudha, "Leveraging Deep Learning for Anomaly
Detection in Video Surveillance," First International Conference on Artificial Intelligence and Cognitive Computing, vol. 815, no. I, pp. 239-247, 2018.
[40] K. Liu, M. Zhu, H. Fu, H. Ma and T.-S. Chua, "Enhancing Anomaly Detection in Surveillance Videos with Transfer Learning from Action Recognition," Proceedings of the 28th ACM International Conference on Multimedia, pp. 4664-4668, 2020.
[41] L. P. Cinelli, L. A. Thomaz, A. F. d. Silva, E. A. B. d. Silva and S. L. Netto, "Foreground Segmentation for Anomaly Detection in Surveillance Videos Using Deep Residual Networks," XXXV Simpósio Brasileiro de Telecomunicações e Processamento de Sinais, pp. 3-6, 2017.
[42] W. Ullah, A. Ullah, I. U. Haq, K. Muhammad, M. Sajjad and S. W. Baik, "CNN features with bi-directional LSTM for real-time anomaly detection in surveillance networks," Multimedia Tools and Applications, p. 16979–16995, 2021.
[43] M. Murugesan and S. Thilagamani, "Efficient anomaly detection in surveillance videos based on multi-layer perception recurrent neural network," in Microprocessors and Microsystems, 2020.
[44] V. A. Karishma Pawar, "Application of Deep Learning for Crowd Anomaly Detection from Surveillance Videos," in 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 2021.
[45] Y. Zhao, B. Deng, C. Shen, Y. Liu, H. Lu and X.-S. Hua, "Spatio-Temporal AutoEncoder for Video Anomaly Detection," Proceedings of the 25th ACM international conference on Multimedia, pp. 1933-1941, 2017.
[46] N. Nasaruddin, K. Muchtar, A. Afdhal and A. P. J. Dwiyantoro, "Deep anomaly detection through visual attention in surveillance videos," Journal of Big Data, vol. 7, no. 87, 2020.
[47] A. Khaleghi and M. S. Moin, "Improved anomaly detection in surveillance videos based on a deep learning method," in 2018 8th Conference of AI & Robotics and 10th RoboCup Iranopen International Symposium (IRANOPEN), Qazvin, Iran, 2018.
[48] UCSD, "UCSD Anomaly Detection Dataset," UCSD, 2014. [Online]. Available: http://www.svcl.ucsd.edu/projects/anomaly/dataset.html. [Accessed 10 May 2021].
[49] C. Lu, J. Shi and J. Jia, "Avenue Dataset for Abnormal Event Detection," The Chinese University of Hong Kong, 2013. [Online]. Available: http://www.cse.cuhk.edu.hk/leojia/projects/detectabnormal/dataset.html. [Accessed 10 May 2021].
[50] W. Liu, W. Luo, D. Lian and S. Gao, "Future Frame Prediction for Anomaly Detection -- A New Baseline," in 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 2018.
[51] R. Mehran, A. Oyama and M. Shah, "Abnormal Crowd Behavior Detection using Social Force Model," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, 2009.
[52] T. Kanstren, "A Look at Precision, Recall, and F1-Score," Towards Data Science, 09 September 2012. [Online]. Available: https://towardsdatascience.com/a-look-at-precision-recall-and-f1-score-36b5fd0dd3ec. [Accessed 18 August 2021].
[53] H. M. Kun Liu, "Exploring Background-bias for Anomaly Detection in Surveillance Videos," Proceedings of the 27th ACM International Conference on Multimedia, pp. 1490-1499, 2019.
[54] R. V. H. M. Colque, C. Caetano and M. T. L. d. Andrade, "Histograms of Optical Flow Orientation and Magnitude and Entropy to Detect Anomalous Events in Videos," IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 3, pp. 673-682, 2017.
[55] A. Sarkar, "Human Activity and Behavior Recognition in Videos. A Brief Review," 2014. [Online]. Available: https://www.grin.com/document/276054.
[56] A. Kushwaha, A. Mishra, K. Kamble and R. Janbhare, "Theft-Detection using Motion Sensing Camera," International Journal of Innovative Science and Research Technology, pp. 90-97, 2017.
[57] M. Sabokrou, M. Fayyaz, M. Klette and R. Fathy, "Deep-Cascade: Cascading 3D Deep Neural Networks for Fast Anomaly Detection and Localization in Crowded Scenes," IEEE Transactions on Image Processing, pp. 1992-2004, 2017.
[58] M. Sabokrou, M. Fayyaz, M. Fathy, Z. Moayed and R. Klette, "Deep-Anomaly: Fully Convolutional Neural Network for Fast Anomaly Detection in Crowded Scenes," Computer Vision and Image Understanding, pp. 1-25, 2018.
[59] W. Badr, "Auto-Encoder: What Is It? And What Is It Used For? (Part 1)," Towards Data Science, 22 April 2019. [Online]. Available: https://towardsdatascience.com/auto-encoder-what-is-it-and-what-is-it-used-for-part-1-3e5c6f017726. [Accessed 10 May 2021].