
2020 IEEE International Conference on Convergence to Digital World – Quo Vadis (ICCDW 2020)

A Comprehensive Study and Detection of Anomalies for Autonomous Video Surveillance Using Neuromorphic Computing and Self-Learning Algorithm
Akansha Bhargava, Electronics Engineering, Atharva College of Engineering, Mumbai, India
Gauri Salunkhe, EXTC Engineering, Atharva College of Engineering, Mumbai, India
Kishor Bhosale, Electronics Engineering, Atharva College of Engineering, Mumbai, India
2020 International Conference on Convergence to Digital World - Quo Vadis (ICCDW) | 978-1-7281-4635-5/20/$31.00 ©2020 IEEE | DOI: 10.1109/ICCDW45521.2020.9318650


Abstract— Video analytics is widely applied in the field of surveillance. With recent advances in technology, deep learning networks have been incorporated into video action detection. A traditional CNN extracts 2D spatial features from an image, but for video the CNN must also exploit temporal information. In this work we propose to perform instance segmentation on video clips and to predict actions with the help of deep learning. We also aim to present an implementation of an algorithm that can detect anomalies in a real-time video feed.

Keywords— CCTV, action detection, anomalies, video surveillance, convolutional neural network.

I. INTRODUCTION

CCTV cameras have proved to be a boon to society: from household security to large institutions, they give the ability to monitor live situations. At times it is necessary to analyze the behavior of people and vehicles to find out what is happening in a frame. Operators are employed to investigate specific events by searching and monitoring manually, which is a cumbersome task. In general, one has to go through the video clips to deduce an abnormal event [22]. Traditional CCTV systems depend fundamentally on the operator's agility to find abnormalities. Hence, rather than serving as a preventive tool, the surveillance camera feed is stored and used for post-event analysis. Three methodologies can be discussed to understand an abnormality in the region of interest: the environment methodology, the individual activity methodology, and the people activity methodology [2]. The environment methodology gives an idea of the complete scene, usually without detecting people or crowds. In the individual activity methodology, the system first detects the human and then decides whether that individual is behaving abnormally. The last approach, the crowd event methodology, uses information about a group of people as a single unit, which does not necessarily require information about the individuals. This approach can be useful because it can otherwise be difficult to model the complex relationships of pedestrians in proximity to each other [21].

Video surveillance systems are widely used to monitor many kinds of events. Given the importance of monitoring and inspection, video surveillance has proved to be a great advantage for automated security solutions. Automated CCTV detection of abnormalities is only possible if an algorithm can learn on its own from unlabeled data, using the inherent structure of its input. An automatic surveillance system is meant to raise an alarm when it detects any abnormal activity by a person or a group. Similarly, in human-computer interaction the machine has to learn and respond to a person according to their communication, actions, and gestures. Both involve real-time human action detection, to which different techniques and algorithms have been applied. This paper discusses these research efforts and their contribution to the field.

Still images provide spatial information; videos contain temporal information in addition to spatial information. Action recognition involves distinguishing various actions in video clips (sequences of 2D frames). One trivial approach is to treat video frames as images and apply image classification to every frame; aggregating the per-frame classification scores then gives a final score. In this way researchers can recognize the individual actions in a video clip. However, most successful methods rely on a restricted feature set, which is a disadvantage because they capture only the space-time relations of one individual and not those among different individuals. This paper aims to discuss different methods of human action recognition and to integrate the novel concept of context awareness into the modeling of the network.

Human action recognition has become essential in many applications, ranging from surveillance and automatic video indexing to human-computer interaction. This field of computer vision has attracted many researchers to deal with the challenges involved. Recent approaches have demonstrated great performance in recognizing individual actions. However, individual actions matter less when we consider a group of people or outdoor scenarios such as a subway or metro station. In such areas the activity is driven by the actions of more than one individual, and the correlations among them define various viewpoints for different activities.
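The frame-by-frame baseline mentioned in the Introduction (classify every frame, then aggregate the scores) can be sketched as follows. This is only a minimal illustration and not the method proposed in this paper. It assumes that some per-frame classifier has already produced one score per class for each frame, it uses plain averaging as the aggregation rule (other rules, such as voting, would also fit the description), and the label names and score values are made up for the example.

```python
import numpy as np


def aggregate_frame_scores(frame_scores):
    """Average per-frame class scores into one clip-level score vector."""
    scores = np.asarray(frame_scores, dtype=float)
    if scores.ndim != 2:
        raise ValueError("expected shape (num_frames, num_classes)")
    return scores.mean(axis=0)


def predict_clip_action(frame_scores, labels):
    """Return the label with the highest averaged score, plus the scores."""
    clip_scores = aggregate_frame_scores(frame_scores)
    return labels[int(np.argmax(clip_scores))], clip_scores


# Hypothetical labels and per-frame scores for a four-frame clip.
labels = ["walking", "running", "falling"]
frame_scores = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.3, 0.5],  # one frame disagrees with the rest
    [0.5, 0.4, 0.1],
]
action, clip_scores = predict_clip_action(frame_scores, labels)
print(action, clip_scores)
```

Averaging treats every frame independently, so the order of the frames has no effect on the result. This is exactly the temporal information that the approaches discussed in this paper try to exploit.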


Authorized licensed use limited to: Swinburne University of Technology. Downloaded on March 30,2021 at 13:59:10 UTC from IEEE Xplore. Restrictions apply.
II. RELATED WORK

Video surveillance methods available today are based on perceiving objects in space and time, using traditional methods for feature detection. An analysis of semantic features is discussed in [7]: a human activity is broken down into smaller elements, which gives better results on complex data. However, this method offers no solution for activities that resemble each other in appearance.

An abnormal motion detection algorithm is implemented in [11]; it uses macroblock motion vectors to analyze behavior in real time.

In [12], MPEG motion vectors together with human detection and posture recognition are used for activity recognition. However, that work does not perform real-time detection.

In [20], a computing model for text recognition is discussed in which simple and fuzzy pattern detection modules are used to accomplish comprehensive identification of the text.

The hardware architecture resembles the neurons of the human brain. An autonomous anomaly detection framework on heterogeneous multicore platforms is discussed in [21], where the Bayesian property of the system made it possible to detect anomalies with fewer training sets.

Video recognition research depends mostly on advances in image recognition methodologies, which are frequently modified and transformed to work on video data. A large family of video action recognition methods is based on local spatio-temporal features, as well as on 2D and 3D neural networks and even deep networks.

Li Wei et al. [6] recognized context information as one of the most influential aspects of human actions and used context models in their algorithm. They introduced a two-level scene context descriptor that describes the environment of the centered target at the global and local levels.

In this work a deep model for human activity recognition is introduced: we propose a model based on a deep belief network that can perform detection in an unsupervised manner. The contribution of [24] is three-fold. First, a two-stream ConvNet architecture is proposed that incorporates spatial and temporal networks. Second, it is shown that a ConvNet achieves good performance when trained on multiple frames. Lastly, it is shown that multitask learning, applied to datasets of dissimilar actions, increases the amount of training data and improves the performance on both. In [25] the authors present an HMM-based approach that uses thresholding and voting for activity segmentation and recognition. Discrete video frames are described by motion and shape features. The authors represent each activity by a set of Hidden Markov Models, where each model is derived from videos captured at different angles to capture deviating motions. This helps in training a model on different activities, so that two activities can be identified simultaneously.

Ta et al. [5] proposed an approach that uses graph matching to recognize an action. They designed a video analysis method that takes two inputs with different space-time relations. Features are defined and extracted, and from them hypergraphs are formed, in which an edge can join more than two nodes. The action identification problem is thus converted into finding matching nodes in the scene under observation. Only local features particular to one given set are enumerated and considered to detect the actions concerned, rather than the whole background.

Incremental Learning: Humans keep learning continually through their perception, but machines are limited because training sets are finite and the AI-based systems available today follow supervised learning [4]. Many self-learning algorithms can be found in the literature. Anomaly recognition and detection (AnRAD) is modeled on the human neural system and gives probabilistic inferences [21]. That paper discusses vehicle behavior detection in which 16000 vehicles were monitored and testing took less than 0.2 ms. These capabilities of deep learning can be used for automated video surveillance.

Deep Learning: CCTV systems generate large datasets. Traditional machine learning approaches do not fully deliver the image detection needed, whereas deep learning methods prove superior when dealing with large datasets. These algorithms are able to emulate the human brain's capability to perceive, discern, assimilate, and make decisions, especially for extremely complex problems. Different deep learning (DL) algorithms that can provide better tagging in videos are discussed in [3].

III. VIDEO ANALYTICS IMPLEMENTATION

The survey above has presented a comprehensive overview of state-of-the-art research on, and solutions for, video surveillance systems. Installing CCTV cameras across multiple locations and then analyzing the video feed manually is a cumbersome task. Predicting activities [1] from a video feed is easy for a human, but a machine has to implement unsupervised learning to detect the anomalies. The proposed system aims to address this limitation by using incremental learning. Video content analysis includes motion detection, shape recognition, style detection, tamper detection, and video tracking [1-22]. Detection of intruders, or pattern classification, is still a challenge for automated CCTV cameras, and relying on a human operator is an inefficient way to monitor vigilantly: humans lose 95% of their ability to concentrate after twenty-five minutes. In this research, we propose a video content analysis technique that follows the human method of analysis but overcomes the shortcomings of human behavior. It uses predictive analysis and machine learning techniques to implement human cognition and detect abnormalities with a low rate of false detections. The research is based on neuromorphic engineering [17], a self-learning approach that learns continuously.

The proposed approach will emulate human brain behavior and will learn to detect any unusual activity. The human brain still outperforms computers at spotting abnormalities, developing language skills, or solving CAPTCHAs, so a lot of research in neuromorphic computing is still going on. Making computers behave like a human brain cannot be done with traditional transistor-based computers; it requires a silicon-based neuron structure for deep learning.

A video can be considered as multiple frames put together, which form a spatio-temporal multidimensional array corresponding to a feature vector. For initial training, feature vectors for the normal condition are fed to the first layer, and this layer is associated with two sub-layers: a teaching layer and a general layer. The teaching layer is built on GSOM (growing self-organizing map) functionality [21], and the general layer is the input for the next learning layer. Once the training phase is complete, two outcomes are predicted, which give information about the winning node and the neighboring node [2]. All the weight values are then combined using fuzzy logic into one output, which forms the learning for the next layer, and a group of similar weights is formed. A new pathway is discovered whenever a new kind of input is provided to the algorithm. Hence normal behavior in the video follows an existing path, whereas an abnormality forms a new pathway, which can be used to set off the alarm in the video surveillance system.

IV. METHODOLOGY

A deep neural network provides a way to analyze video content using unsupervised learning. We propose a deep belief network (DBN) for learning. A DBN can be trained on unlabeled data in an unsupervised manner. DBNs can extract features and categorize them, which finds applications in various image processing fields, and they can be extended to feature extraction from video clips. Restricted Boltzmann machines (RBMs) are the basic building blocks of a DBN. An RBM has two layers, hidden and visible, and RBMs can be stacked on top of each other. The stacked RBMs are trained separately, each on a different representation of the data: the output of every layer becomes the input to the next layer, and the input goes through a nonlinear transformation.

Once the layer-by-layer training is done, the model is fine-tuned with back-propagation, which adjusts the weights so that the defined set is classified optimally.

TABLE I. DEEP LEARNING METHODS VS HARDWARE PLATFORM

Sr. No.   Deep Learning Methods   Hardware Platform
1.        CNN                     HPC
2.        DBN                     Neuromorphic
3.        Boltzmann Machine       Quantum

V. TRAINING

Training the neural network requires datasets. We will use real-time data collected from CCTV cameras installed in a home network. This raw data can be converted to a structured format to create datasets. The dataset, which consists of both valid and invalid (infected) entries, needs to be separated into training and testing datasets. The research will aim at generating datasets from a live camera feed. The data will be split into training (60%), validation (20%), and testing (20%) sets. The DBN will first be trained on unlabeled data, i.e. in an unsupervised manner, and then with supervised training on the same dataset. The validation dataset will be used for early stopping.

For some applications, obtaining a labeled dataset is a really tedious job, so unsupervised learning becomes a necessity. Deep belief networks are generative models that can be trained on unlabeled data and are hence unsupervised models. In a DBN, individual layers are connected to each other. Scaling these kinds of models is still a big task, but they address the problem at hand.

Fig. 1. Flow Diagram (blocks in order: Temporal Data, Neuromorphic Computer, DBN Input Layer, DBN Layer, DBN Layer, Output Layer, Output)

This dataset will be used to simulate the neural network, which is trained on the same datasets and will be tested for anomalies in the video feed, separating out the actions that are not normal. Since the proposed research is based on a home network, face recognition will have a limited dataset, and the research will focus on burglary and on intruders entering the house without permission. Training an RBM uses the contrastive divergence method proposed by Geoffrey Hinton. An RBM has only two layers, hidden and visible. The visible units are initialized to a training vector, and the hidden units are updated in parallel given the visible units. The hidden units are in turn used to update the visible units in parallel; this is the "reconstruction" step. The hidden units are then updated again given the reconstructed visible units, and finally the weights are updated. This procedure trains one RBM; further RBMs trained with the same method are stacked on top of it, eventually yielding the DBN [26].
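The contrastive divergence procedure described in the Training section, and the stacking of trained RBMs into a DBN, can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the implementation used in this work. It assumes binary (Bernoulli) units, a single reconstruction step (CD-1), full-batch updates, and toy random data in place of features from a camera feed. The class RBM, the function train_stack, and all layer sizes, epoch counts, and learning rates are hypothetical choices made for illustration. The supervised back-propagation fine-tuning step is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli RBM with one visible and one hidden layer (illustrative)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_vis = np.zeros(n_visible)
        self.b_hid = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1_step(self, v0, lr=0.1):
        """One CD-1 update on a batch; returns the reconstruction error."""
        # Update all hidden units in parallel from the training vectors.
        h0_prob = self.hidden_probs(v0)
        h0 = (self.rng.random(h0_prob.shape) < h0_prob).astype(float)
        # Reconstruction step: update the visible units from the hidden ones.
        v1 = self.visible_probs(h0)
        # Update the hidden units again from the reconstruction.
        h1_prob = self.hidden_probs(v1)
        # Weight update: data statistics minus reconstruction statistics.
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0_prob - v1.T @ h1_prob) / n
        self.b_vis += lr * (v0 - v1).mean(axis=0)
        self.b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))


def train_stack(data, layer_sizes, epochs=20, lr=0.1, seed=0):
    """Greedy layer-wise training: each RBM learns from the one below it."""
    rng = np.random.default_rng(seed)
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(layer_input.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(layer_input, lr)
        rbms.append(rbm)
        # The output of this layer becomes the input of the next layer.
        layer_input = rbm.hidden_probs(layer_input)
    return rbms


# Toy binary data standing in for frame features (assumption).
data = (np.random.default_rng(1).random((64, 12)) < 0.5).astype(float)
rbms = train_stack(data, layer_sizes=[8, 4])
features = data
for rbm in rbms:
    features = rbm.hidden_probs(features)
print(features.shape)
```

The hidden states are sampled when driving the reconstruction, while probabilities are used for the statistics in the weight update. This is a common practical choice; other variants of contrastive divergence are possible.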

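The 60/20/20 split and the validation-based early stopping mentioned in the Training section could look like the sketch below. It is an illustration only. The function names split_dataset and should_stop, the patience value, and the toy array standing in for camera data are all assumptions, and the stopping rule shown (no improvement of the validation loss for a fixed number of epochs) is one common choice rather than the rule used in this work.

```python
import numpy as np


def split_dataset(samples, train=0.6, val=0.2, seed=0):
    """Shuffle and split samples into training, validation, and test sets."""
    samples = np.asarray(samples)
    order = np.random.default_rng(seed).permutation(len(samples))
    n_train = int(round(train * len(samples)))
    n_val = int(round(val * len(samples)))
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]  # the remainder, 20% by default
    return samples[train_idx], samples[val_idx], samples[test_idx]


def should_stop(val_losses, patience=3):
    """True if the validation loss has not improved in the last epochs."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before


# Toy stand-in for 100 structured samples from a camera feed (assumption).
clips = np.arange(100).reshape(100, 1)
train_set, val_set, test_set = split_dataset(clips)
print(len(train_set), len(val_set), len(test_set))
```

For a continuous video feed, neighboring frames are highly correlated, so splitting by clip or by recording time rather than by individual frame may give a more honest test set.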
VI. EXPECTED OUTCOME

The proposed algorithm will facilitate an unsupervised learning methodology that discerns the essential characteristics, or feature set, of a given video frame to provide advanced assistance in the surveillance field. It will be based on a self-learning algorithm, which would help to design a better system that can adapt to a changing environment.

VII. LIMITATIONS

• Hardware limitations of neuromorphic chips.
• The neuron structure and the input weights of the deep neural network must be designed during the training process.
• Computation time is high for a DBN.

VIII. FUTURE SCOPE

Although video-based human activity recognition has seen a lot of progress over the last few years, some apparent performance issues still make real-world deployment challenging. More specifically:

• The standpoint issue: the orientation of the camera plays a crucial role in viewing the scene properly. It is important to place the camera where it can survey the entire area of interest.
• Background reduction is also a major challenge when anomalies have to be detected in a live feed. The automated camera must be able to reason about its surroundings, and any irregularity or inconsistency must be reported.
• An individual's appearance can change depending on what the person is wearing or where the person is. For example, if a person is wearing a cap or only half of the face is visible, recognizing the person and distinguishing activities in an abnormal scene becomes difficult.
• The available datasets define only simple actions and cannot distinguish between certain human emotions. For example, people also cry out of happiness, but differentiating this from a panic condition will require more than the available datasets and different ways of capturing these emotions.

REFERENCES

[1] Gowsikhaa D., X Abirami, V and Baskaran R., "Automated human behavior analysis from surveillance videos: a survey," Artificial Intelligence Review, vol. 42, no. 4, pp. 747–765, Dec. 2014.
[2] Rashmika Nawaratne, Tharindu Bandaragoda, Achini Adikari, Damminda Alahakoon, Daswin De Silva, "Incremental Knowledge Acquisition and Self-Learning for Autonomous Video Surveillance," in 2017 43rd Annual Conference of the IEEE Industrial Electronics Society.
[3] Sargano A. B., Angelov P., and Habib Z., "A Comprehensive Review on Handcrafted and Learning-Based Action Representation Approaches for Human Activity Recognition," Applied Sciences, vol. 7, no. 1, p. 110, Jan. 2017.
[4] Silva D. D. and Alahakoon D., "Incremental knowledge acquisition and self-learning from text," (2010) International Joint Conference on Neural Networks (IJCNN), pp. 1–8.
[5] D. D. Silva, X. Yu, D. Alahakoon, and G. Holmes, "Semi-supervised classification of characterized patterns for demand forecasting using smart electricity meters," (2011) International Conference on Electrical Machines and Systems, pp. 1–6.
[6] D. D. Silva, X. Yu, D. Alahakoon, and G. Holmes, "Incremental pattern characterization learning and forecasting for electricity consumption using smart meters," (2011) IEEE International Symposium on Industrial Electronics, pp. 807–812.
[7] T. Bandaragoda, D. De Silva, and D. Alahakoon, "Automatic event detection in microblogs using incremental machine learning," Journal of the Association for Information Science and Technology, Apr. 2017.
[8] Z. Xu, H. R. Wu, X. Yu, and Z. Man, "Adaptive surveillance video noise suppression," in 2011 24th Canadian Conference on Electrical and Computer Engineering (CCECE), 2011, pp. 000985–000988.
[9] Tran, K. N., Yan, X., Kakadiaris, I. A., and Shah S. K., "A group contextual model for activity recognition in crowded scenes," (2015) in VISAPP.
[10] Wang, X. and Ji, Q. (2015). Video event recognition with deep hierarchical context model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4418–4427.
[11] N. Kiryati, T. R. Raviv, Y. Ivanchenko, and S. Rochel, "Real-time abnormal motion detection in surveillance video," in 2008 19th International Conference on Pattern Recognition, 2008, pp. 1–4.
[12] Ozer B., Wolf W. and Akansu A. N., "Human activity detection in MPEG sequences," Workshop on Human Motion, pp. 61, 2000.
[13] J. M. Grant and P. J. Flynn, "Crowd Scene Understanding from Video: A Survey," ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 13, no. 2, p. 19:1–19:23, Mar. 2017.
[14] Z. Wu, T. Yao, Y. Fu, and Y.-G. Jiang, "Deep Learning for Video Classification and Captioning," arXiv:1609.06782 [cs], Sep. 2016.
[15] "Comparing image tagging services: Google Vision, Microsoft Cognitive Services, Amazon Rekognition and Clarifai," Filestack Blog, 14-Mar-2017.
[16] "Cognitive Services—Intelligence Applications | Microsoft Azure." [Online].
[17] L. Shi et al., "Development of a neuromorphic computing system," 2015 IEEE International Electron Devices Meeting (IEDM), Washington, DC, 2015, pp. 4.3.1-4.3.4. doi: 10.1109/IEDM.2015.7409624
[18] Noceti N. and Odone F., "Unsupervised Video Surveillance," in Computer Vision – ACCV 2010 Workshops, 2010, pp. 84–93.
[19] F. H. Hamker, "Life-long learning Cell Structures: continuously learning without catastrophic interference," (2001) Neural Networks, vol. 14, no. 4, pp. 551–573.
[20] Q. Qiu, "A massive parallel neuromorphic computing model for intelligent text recognition," 2012 IEEE International SOC Conference, Niagara Falls, NY, 2012, pp. 293-293.
[21] Q. Qiu, "AnRAD: A Neuromorphic Anomaly Detection Framework for Massive Concurrent Data Streams," in IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 5, pp. 1622-1636, May 2018.
[22] Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., & Chen, T. (2018). Recent advances in convolutional neural networks. Pattern Recognition, 77, 354-377.
[23] Li Wei & Shishir K. Shah, "Human Activity Recognition Using Deep Neural Network With Contextual Information."
[24] Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional networks for action recognition in videos." In Advances in Neural Information Processing Systems, pp. 568-576. 2014.
[25] Feng Niu, Mohamed Abdel, "HMM based Segmentation & Recognition of Human Activities From Video Sequences." IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 449–459.
[26] G. E. Hinton (2010). A Practical Guide to Training Restricted Boltzmann Machines. Tech. Rep. UTML TR 2010-003.

