Anomalous Activity Detection For Intelligent Visual Surveillance
SUBMITTED BY
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
We express our sincere gratitude to all the faculty members of the Department of
Computer Science and Engineering, SCT College of Engineering, Thiruvananthapuram
for their relentless support and inspiration. We are ever-grateful to our families, friends
and well-wishers for their immense goodwill and words of motivation.
We would like to express a note of deep obligation to our guide, Smt. Kutty Malu
VK, Assistant Professor, Department of Computer Science and Engineering, SCT College
of Engineering, for her excellent guidance and valuable suggestions. It was indeed a
privilege to work under her during the entire duration of this preliminary study. She has
immensely helped us with her knowledge and stimulating suggestions to shape this study,
refine arguments and present it to the best of our abilities.
LIST OF FIGURES
1 INTRODUCTION
1.1 Problem statement
1.2 Motivation
1.3 Objectives
1.4 CAD model
1.5 Application Scenarios
1.6 Overview
2 LITERATURE REVIEW
3 DATASET
3.1 UCF-Crime Dataset
3.2 Performance Comparison
4 DESIGN
4.1 System Architecture
4.2 Activity Diagram
4.3 UML Use case Diagram
4.4 Anomalous Activity Detection Framework
4.5 Requirement Analysis
5 ACTION PLAN
5.1 Methodology
5.2 Schedule
5.3 Future Work
6 SYSTEM VALIDATION
6.1 Conference call with Tata Elxsi on 28th September 2018
6.2 Conference call with Tata Elxsi on 2nd November 2018
7 CONCLUSION
REFERENCES
LIST OF ABBREVIATIONS
LIST OF FIGURES
CHAPTER 1
INTRODUCTION
'This area is under video surveillance' has become a common sign in public places today. Around 300 million video security cameras are in operation worldwide, recording roughly 700 petabytes of surveillance video daily, and video security equipment is worth 20 billion US dollars. Yet fewer than 0.1 % of these 300 million cameras are intelligent; the rest are "blind" and merely record video for post-incident manual analysis. In this scenario, dependence on human monitors is increasing while the reliability of the surveillance system is decreasing. We have too many cameras and too few human monitors to analyze the events. Security is at stake if the attention levels of the human monitors degenerate to unacceptable levels. Our aim is to enable surveillance cameras to "see" and "understand" their environment instantly and report to concerned users in case of events that need attention. We aspire to improve the reliability of the surveillance system so that it can ensure maximum human benefit and security.
1.1 Problem statement
Approximately 300 million surveillance cameras today are blind and merely record video for post-incident manual analysis. In this scenario, dependence on human monitors is increasing while the reliability of the surveillance system is decreasing. Our aim is to address this gap by enabling surveillance cameras to detect and report anomalous events in real time.
1.2 Motivation
2. Prevention is better than cure: Intelligent visual surveillance systems for anomalous activity detection can detect anomalous events as they unfold and trigger timely alerts, thereby ensuring maximum security. This can be contrasted with their less-intelligent counterparts, which require post-incident manual analysis after an anomalous event has occurred.
3. Safety first: The proposed approach follows a safety-first principle that aims to provide maximum safety for maximum human benefit.
1.3 Objectives
• Real-time monitoring
• Analyze and structure data using computer vision and deep learning techniques
1.5 Application Scenarios
Some sample application scenarios of the proposed system are discussed in this section:
• Pedestrian-accident detection
1.6 Overview
This report mainly discusses the current state of the project. The Literature Review discusses the various research papers and concepts the team evaluated to solve the problem; each team member was given different topics to review, and the three most relevant papers have been reviewed, discussing the valuable contribution of each. The Dataset chapter discusses the dataset we are considering for this project and its advantages over similar datasets. The Design chapter discusses the various use cases and requirements of the project and how they are addressed by our design. The Action Plan chapter discusses the project management style adopted, how the timeline for the project has been designed, and the various principles followed by the project. The System Validation chapter summarizes the meetings the team had with Tata Elxsi to validate the project ideas and avoid time wasted on wrong decisions; their contribution was necessary for the project to reach this far. The Conclusion chapter sums up the discussion and outlines the work still to be taken up, and the citations for the research papers are presented in the References chapter.
CHAPTER 2
LITERATURE REVIEW
Anomaly Detection. Anomaly detection is one of the most challenging and long-standing problems in computer vision. [1] proposes a multiple instance learning (MIL) solution to anomaly detection that leverages only weakly labeled training videos, along with a MIL ranking loss with sparsity and smoothness constraints that lets a deep network learn anomaly scores for video segments. [1] is among the first papers to formulate video anomaly detection in the context of MIL, and the anomaly detection framework used in this project takes inspiration from it. The major contributions of [1] include: (1) a MIL solution to anomaly detection that leverages only weakly labeled training videos; (2) the introduction of a large-scale video anomaly detection dataset consisting of 1900 real-world surveillance videos of 13 different anomalous events and normal activities captured by surveillance cameras; (3) experimental results on the new dataset showing that the proposed method achieves superior performance compared to state-of-the-art anomaly detection approaches; (4) the new dataset also serves as a challenging benchmark for activity recognition on untrimmed videos, due to the complexity of activities and large intra-class variations.
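The MIL ranking objective described above can be sketched as follows. This is an illustrative pure-Python version under our own assumptions (a hinge margin of 1 and example weighting constants), not the exact formulation or code of [1]:

```python
def mil_ranking_loss(pos_scores, neg_scores,
                     lambda_smooth=8e-5, lambda_sparse=8e-5):
    """Hinge ranking loss between the highest-scored segment of an
    anomalous (positive) bag and of a normal (negative) bag, plus
    temporal-smoothness and sparsity terms on the positive bag."""
    # Rank the max positive instance above the max negative instance.
    hinge = max(0.0, 1.0 - max(pos_scores) + max(neg_scores))
    # Smoothness: adjacent segments should have similar scores.
    smooth = sum((a - b) ** 2 for a, b in zip(pos_scores, pos_scores[1:]))
    # Sparsity: only a few segments of an anomalous video are anomalous.
    sparse = sum(pos_scores)
    return hinge + lambda_smooth * smooth + lambda_sparse * sparse
```

A well-separated pair of bags (positive scores peaking near 1, negative scores near 0) drives the hinge term towards zero, so minimizing this loss pushes the network to score anomalous segments above normal ones.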
Conventional 2D convolutional networks process each frame independently, with kernels that cover only the spatial dimensions because they are of size (3, 3), etc., and therefore cannot capture temporal information. However, with the right training, this paper has shown how features can be extracted from videos using a 3D CNN architecture called C3D. C3D consists of 8 layers of 3D convolutions and 5 layers of pooling. The network was trained on the Sports-1M dataset and achieved remarkable performance compared to previous networks. Since then, the C3D network with the Sports-1M weights has been used for various computer vision tasks on videos, such as scene recognition and action recognition, with just a linear SVM on top of its feature maps. C3D has also proven very effective at predicting anomaly scores on the weakly labeled UCF-Crime anomaly classification dataset when combined with multiple instance learning. Our work focuses on using these feature maps to improve accuracy and classify anomalies.
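To make the 2D-versus-3D distinction concrete, the toy numpy sketch below applies a single 3D kernel over a video volume; the kernel size and values are illustrative only and are not the actual C3D architecture:

```python
import numpy as np

def conv3d_valid(video, kernel):
    """Naive valid-mode 3D convolution over a (T, H, W) video volume.
    Unlike a 2D kernel, the kernel also spans the temporal axis, so each
    output value depends on several consecutive frames."""
    t, h, w = video.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((t - kt + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                patch = video[i:i + kt, j:j + kh, k:k + kw]
                out[i, j, k] = np.sum(patch * kernel)
    return out

# A temporal-difference kernel: responds to motion between frames,
# something no purely spatial (3, 3) kernel can express.
motion_kernel = np.zeros((2, 3, 3))
motion_kernel[0, 1, 1] = -1.0
motion_kernel[1, 1, 1] = 1.0
```

A static video gives an all-zero response under this kernel, while a scene whose brightness changes between frames gives a non-zero response, which is exactly the kind of temporal feature a 2D convolution cannot extract.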
Another approach for anomaly detection we considered was to perform action recognition and then tune the network to detect anomalies based on the recognized actions. The approach discussed in [3] was found suitable and was another method evaluated to solve our problem; however, it had drawbacks for the specific use case we are considering. The framework works fairly well for action recognition. It uses RGB-D data, which contains skeletal poses along with the normal video data, processed in two streams: the human skeletal data is passed through an RNN comprising Bi-GRU units, while the normal video data is passed through a CNN with standard 2D convolutions and pooling operations. The outputs of the two networks are combined and classification is done using a linear SVM. This framework has performed fairly well on action recognition tasks. Since CCTV data is only 2D video, our initial solution was to estimate the poses of people in the video and feed them to the RNN as in this architecture. This had drawbacks: it performs poorly on anomalies caused by non-human objects such as cars, and it makes the network very computationally expensive.
Another approach we evaluated was to take inspiration from the method proposed in [4], after weighing its merits and demerits. The solution discussed in [4] is to consider the poses of the people in the frames, capture their variation across frames using an RNN, and classify the result using a linear SVM. For extracting the poses, OpenPose was found to be the most efficient algorithm. It follows a bottom-up approach in which heat maps of the various key points are generated and then combined using the probability of occurrence of each key point with respect to the others. This method performs fairly well even with some occlusion and clothing variation, and it is fast. However, problems arise when only half the body is in the frame; so even though it responds well to multiple people in the frame, the method would fail for anomaly detection. It also does not consider anomalies caused by other objects such as cars or weapons. For these reasons, this method would not give good results for our problem.
Ranking. The approach proposed in [1] utilizes a deep multiple instance ranking strategy [7]. A small background study was conducted on the concepts of multiple instance ranking, and to delve deeper we found it imperative to study multiple instance learning. The work in [5] presents two new formulations of multiple-instance learning as a maximum margin problem. The proposed extensions of the Support Vector Machine (SVM) learning approach lead to mixed integer quadratic programs that can be solved heuristically. This generalization of SVMs makes a state-of-the-art classification technique, including non-linear classification via kernels, available to an area that had until then been largely dominated by special-purpose methods. The paper also presents experimental results on a pharmaceutical dataset and on applications in automated image indexing and document categorization.
Deep Learning. The survey in [6] aims to introduce the most fundamental concepts of deep learning for computer vision, in particular CNNs, Triplet Networks, Auto-Encoders (AEs) and Generative Adversarial Networks (GANs), including their architectures, inner workings and optimization.
Real-Time Streaming. The most pivotal area to explore for implementing a real-time anomaly detection system is the streaming of real-time video data. Efficient data pipelines must be used to reduce latency and to ensure security and reliability. First, we need to acquire the video data from the CCTVs; to collect this data in real time, a Kafka cluster can be used. Apache Kafka is an open-source stream-processing software platform developed by the Apache Software Foundation, written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
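As a sketch of how frames might enter such a pipeline, the snippet below packs a frame and its metadata into a byte message, with a helper that would publish it via the third-party kafka-python client. The topic name, broker address and message layout are our own assumptions, not a fixed part of the design:

```python
import json
import time

def encode_frame_message(frame_bytes, camera_id, timestamp=None):
    """Prefix raw frame bytes with a JSON header (camera id, timestamp)
    so the consumer can reconstruct ordering and provenance."""
    header = json.dumps({
        "camera_id": camera_id,
        "timestamp": timestamp if timestamp is not None else time.time(),
    }).encode("utf-8")
    # 4-byte big-endian header length, then the header, then the payload.
    return len(header).to_bytes(4, "big") + header + frame_bytes

def decode_frame_message(message):
    """Inverse of encode_frame_message."""
    header_len = int.from_bytes(message[:4], "big")
    header = json.loads(message[4:4 + header_len].decode("utf-8"))
    return header, message[4 + header_len:]

def stream_frames(frames, broker="localhost:9092", topic="cctv-frames"):
    """Publish encoded frames; needs a running broker and kafka-python."""
    from kafka import KafkaProducer  # third-party client (assumed installed)
    producer = KafkaProducer(bootstrap_servers=broker)
    for frame in frames:
        producer.send(topic, encode_frame_message(frame, "cam-01"))
    producer.flush()
```

The encode/decode pair is independent of Kafka itself, so the same framing could be reused with any transport the project settles on.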
Another method that could solve the real-time streaming problem is discussed in [9]. Large-scale data processing frameworks like Apache Spark are becoming more popular for processing large amounts of data, whether in a local or a cloud-deployed cluster, so it is important to understand the architecture, workings and benefits of this analytics engine. Apache Spark is one of the most prominent big data processing platforms: an open-source, general-purpose, large-scale data processing framework. It focuses on high-speed cluster computing and provides extensible and interactive analysis through high-level APIs. We concluded from our analysis that Spark can deliver high performance through its efficient storage, speed, fault tolerance and dynamic nature. Also, since it is polyglot, it is easy to use. Our system can therefore consider using Spark as a data analytics platform.
We also looked at RDDs, as explained in the seminal paper [10], to better understand the architecture of Spark. Resilient Distributed Datasets (RDDs) are a distributed memory abstraction that enables fault-tolerant, in-memory computation on large clusters; understanding this data structure is key to understanding how Spark works. Although earlier frameworks provide numerous abstractions for accessing a cluster's computational resources, they lack abstractions for leveraging distributed memory. This makes them inefficient for an important class of emerging applications: those that reuse intermediate results across multiple computations. Data reuse is common in many iterative machine learning and graph algorithms, including PageRank, K-means clustering, and logistic regression. Another compelling use case is interactive data mining, where a user runs multiple ad-hoc queries on the same subset of the data. An overview of the basic concepts of RDDs, transformations and actions is required to use Spark efficiently as the analytical engine.
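To illustrate the transformation/action distinction without requiring a Spark installation, here is a toy Python class that mimics RDD laziness: transformations only record the pipeline, and nothing executes until an action is called. It is a conceptual sketch, not the pyspark API:

```python
class ToyRDD:
    """Minimal stand-in for an RDD: transformations are lazy,
    actions trigger evaluation of the recorded pipeline."""

    def __init__(self, data, ops=None):
        self._data = data
        self._ops = ops or []

    # --- transformations: return a new ToyRDD, compute nothing ---
    def map(self, fn):
        return ToyRDD(self._data, self._ops + [("map", fn)])

    def filter(self, pred):
        return ToyRDD(self._data, self._ops + [("filter", pred)])

    # --- actions: run the pipeline and return a concrete result ---
    def collect(self):
        items = iter(self._data)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    def count(self):
        return len(self.collect())

# e.g. keep anomaly scores above a threshold, then rescale them
scores = ToyRDD([1, 7, 4, 9])
high = scores.filter(lambda s: s > 5).map(lambda s: s * 10)  # lazy: nothing runs yet
result = high.collect()  # action: the whole pipeline executes here
```

In real Spark the same laziness lets the engine plan and distribute the pipeline across the cluster before any data moves; the toy version only captures the programming model.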
9
CHAPTER 3
DATASET
3.1 UCF-Crime Dataset
The UCF-Crime dataset contains 13 realistic anomaly classes: Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism. These anomalies were selected by [1] because they have a significant impact on public safety. The anomaly classes are shown in Fig 3.2.
Figure 3.2: Anomaly classes in UCF-Crime. The numbers in brackets show the number of training samples, while the remaining samples account for the number of test videos.
Figure 3.4: Sample frames from anomaly classes in UCF-Crime
3.2 Performance Comparison
This section provides a performance comparison of UCF-Crime with other existing datasets; the table shown in Fig 3.5 presents this comparison.
The UCF-Crime dataset uses weakly labeled training videos. That is, we only know the video-level labels, i.e. whether a video is normal or contains an anomaly somewhere, but not where. Experimental results on this dataset, as presented in [1], show that the proposed method achieves superior performance compared to state-of-the-art anomaly detection approaches.
The disadvantages of existing datasets for anomaly detection, as found through this performance comparison, are listed below:
The advantages of UCF-Crime over existing datasets are as follows:
CHAPTER 4
DESIGN
4.1 System Architecture
A system architecture is the conceptual model that defines the structure, behavior, and other views of a system. Understanding the system architecture is integral to identifying the sub-modules of the problem at hand and, moreover, the relationships between these components.
The diagram in Fig 4.1 has three broad divisions: the data acquisition stage, the deep learning stage and the data analytics stage. Video data collected by the CCTV is locally processed to generate image frames. These frames are transferred to the deep learning block for scoring. An anomaly score calculated by the deep learning framework is used to decide whether the streamed video contains anomalies. A particular threshold is set, and if the anomaly score exceeds this threshold, an
Figure 4.1: End to End System architecture.
alert or notification is sent to the end user. The timestamp and anomaly score of all streamed videos are saved in a database. The information in this database is analyzed to draw inferences, for example about the frequency of anomalies during a particular time period or the relationship between locality and anomaly frequency. These inferences are made available to the end user on request.
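The threshold-and-alert logic described above can be sketched as follows; the threshold value, record layout and notify hook are illustrative assumptions rather than finalized design choices:

```python
from dataclasses import dataclass, field

ANOMALY_THRESHOLD = 0.5  # assumed value; would be tuned in practice

@dataclass
class ScoreLog:
    """Stores every (timestamp, score) pair and flags threshold crossings."""
    records: list = field(default_factory=list)
    alerts: list = field(default_factory=list)

    def ingest(self, timestamp, score, notify=lambda ts, s: None):
        # Every score is stored, whether or not it crosses the threshold.
        self.records.append((timestamp, score))
        if score > ANOMALY_THRESHOLD:
            self.alerts.append((timestamp, score))
            notify(timestamp, score)  # e.g. push a notification to the admin

log = ScoreLog()
for ts, s in [(0, 0.12), (1, 0.08), (2, 0.91), (3, 0.30)]:
    log.ingest(ts, s)
```

Storing every score, not just the alerts, is what later enables the analytics platform to chart anomaly frequency over arbitrary time periods.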
4.2 Activity Diagram
The activity diagram shown in Fig 4.2 begins with video acquisition from the CCTV cameras. Some local processing is then done to group frames according to the architecture and encode them for sending. The data is streamed to the server that hosts the model. The model receives each block of frames along with its timestamp and predicts the corresponding anomaly score. The anomaly scores are then compared against a defined threshold and, if found to be greater, a notification is sent to the CCTV admin. Whether or not the threshold is crossed, the predicted anomaly score is stored in the database. The user can access the anomaly scores over a required time frame and visualize them through the online analytics platform we are deploying as a web app. The web app fetches the data from the database for the requested time interval and visualizes it on the analytics platform for the user.
4.3 UML Use Case Diagram
The principal use case of this project is to get near-real-time notifications for anomalies occurring under the camera; the anomaly and the notification are decided by the model. Once a notification is received, the admin can log into the web application and get a detailed report of the anomaly scores of that particular camera along with the times of occurrence.
The report also contains the camera data, which can be used to direct the admin to the corresponding CCTV management portal. This use case aims to reduce delayed response times for an anomaly occurring under a camera. Another use case of our project is analyzing the anomaly scores for a particular time period: the admin can log in to the online portal and monitor the various occurrences of anomalies under the camera, as well as their frequency. This helps the admin better understand the root cause of an anomaly. For example, if the camera is in an anomaly-prone area such as a prison, possible patterns can be inferred from the various anomalies; in the context of a traffic camera, a high rate of accidents in a particular season can yield valuable inferences. For this, the admin logs into the web server and selects the time period for which he wishes to see the anomaly scores, which are then visualized in the form of intuitive graphs. One additional advantage is that even if an anomaly misses the threshold and goes undetected, examining the anomaly graph for spikes or high levels just below the threshold can save a lot of time when locating the time of occurrence of an undetected anomaly.
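The near-threshold inspection described above could be sketched as a simple scan over the stored score series; the margin parameter is our own illustrative choice:

```python
def near_threshold_spikes(scores, threshold=0.5, margin=0.1):
    """Return (index, score) pairs whose score fell just below the
    alert threshold -- candidates for a missed (undetected) anomaly."""
    lo = threshold - margin
    return [(i, s) for i, s in enumerate(scores) if lo <= s < threshold]

# e.g. hourly anomaly scores fetched from the database for one camera
series = [0.05, 0.12, 0.47, 0.08, 0.44, 0.92, 0.10]
candidates = near_threshold_spikes(series)  # 0.92 already alerted; 0.47 and 0.44 did not
```

In the web app the same scan would simply drive highlighting on the anomaly graph, pointing the admin at time windows worth reviewing manually.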
4.4 Anomalous Activity Detection Framework
Surveillance cameras are increasingly being used in public places to increase public safety, but there is a glaring deficiency in their utilization and an unworkable ratio of cameras to human monitors. The goal of a practical anomaly detection system is to signal in a timely manner an activity that deviates from normal patterns and to identify the time window of the occurring anomaly. Anomaly detection can therefore be considered coarse-level video understanding, which filters anomalies out of normal patterns; once an anomaly is detected, it can further be categorized into one of the specific activities using classification techniques. Anomaly detection should be done with minimum supervision. A small step towards addressing anomaly detection is to develop algorithms that detect a specific anomalous event, for example a violence detector or a traffic accident detector. However, such solutions obviously cannot be generalized to detect other anomalous events and are therefore of limited use in practice. The approach discussed in [1] proposes to learn anomalies through the deep multiple instance ranking framework [7] by leveraging weakly labeled training videos, i.e. the training labels (anomalous or normal) are at video level instead of clip level.
Fig 4.4 shows the flow diagram of the proposed anomaly detection approach. Given positive (containing an anomaly somewhere) and negative (containing no anomaly) videos, we divide each of them into multiple temporal video segments. Each video is then represented as a bag, and each temporal segment represents an instance in the bag. After extracting C3D features for the video segments, we train a fully connected neural network using a novel ranking loss function that computes the ranking loss between the highest-scored instances (shown in red) in the positive bag and the negative bag.
This approach mainly utilizes the concepts of multiple instance learning [5] and 3D convolutional neural networks [2]. Multiple-instance learning (MIL) is a generalization of supervised classification in which training class labels are associated with sets of patterns, or bags, instead of individual patterns. Here, an image can be viewed as a bag of local image patches or image regions; since annotating whole images is far less time-consuming than marking relevant image regions, the ability to deal with this type of weakly annotated data is very desirable. The dataset taken for this approach is UCF-Crime, which has weakly labeled training videos where we only know the video-level labels and not clip-level labels. Instead of using 2D convolutions across frames, we use 3D convolutions on the video volume; here, the 3D convolutional neural network acts as a feature extractor for the video data.
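The bag construction step can be sketched as below; the choice of 32 segments and the mean pooling of per-clip C3D features are illustrative assumptions, not a confirmed detail of our pipeline:

```python
import numpy as np

def video_to_bag(clip_features, n_segments=32):
    """Split a sequence of per-clip feature vectors into n_segments
    temporal segments (the MIL instances), averaging the clips that
    fall into each segment."""
    clip_features = np.asarray(clip_features, dtype=float)
    n_clips, dim = clip_features.shape
    # np.array_split tolerates n_clips not divisible by n_segments.
    chunks = np.array_split(np.arange(n_clips), n_segments)
    bag, last = [], np.zeros(dim)
    for idx in chunks:
        if len(idx) > 0:  # reuse the previous segment if a chunk is empty
            last = clip_features[idx].mean(axis=0)
        bag.append(last)
    return np.stack(bag)  # shape: (n_segments, dim)
```

Each row of the returned array is one instance of the bag; during training, the ranking loss then compares the highest-scored row of a positive bag against that of a negative bag.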
4.5 Requirement Analysis
The project contains several components: the camera for video acquisition, the streaming pipeline that sends video to the server, the deep learning model, the database, and the web framework. Several parameters were considered when selecting the right tools for each of these components, as follows:
• Video Acquisition: This will be done using a standard HD camera, probably the webcam of a laptop.
• Video Streaming: Video streaming is an integral part of this project because the processing is done on a remote server, so it is very important to transmit the data with integrity, in near real time and without failures. For these reasons, the Apache Kafka streaming framework will be used; the video data will be encoded as int64 and sent to the server, and Kafka is an efficient way to do that.
• Deep Learning Model: The project implements not only deployment of the deep learning model but also training of the architecture, and since the model is computationally expensive, it is very important that a fast library is used. For this reason, the project plans to use PyTorch as the deep learning library. PyTorch also allows porting the model to C++, which ensures faster performance and makes anomaly detection near real time.
• Database: The project requires maintaining a database entry for the anomaly scores at their different timestamps. This means extensive writes at a rate of about one entry per second, so it is important to choose a database that can handle such a load. Although the write rate is high, the schema is very simple. Considering all these elements, MySQL has been chosen as our database; if it does not meet our additional requirement of aggregation queries, we will port to MongoDB.
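A sketch of the score table and its per-second writes is shown below. We use Python's built-in sqlite3 as a stand-in so the snippet is self-contained, but the same schema and parameterized statements apply to MySQL; the table and column names are our own assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL server

conn.execute("""
    CREATE TABLE anomaly_scores (
        camera_id TEXT NOT NULL,
        ts        REAL NOT NULL,   -- unix timestamp of the frame block
        score     REAL NOT NULL    -- model output in [0, 1]
    )
""")

def record_score(camera_id, ts, score):
    # One insert per second per camera: simple schema, write-heavy load.
    conn.execute("INSERT INTO anomaly_scores VALUES (?, ?, ?)",
                 (camera_id, ts, score))

def scores_between(camera_id, start_ts, end_ts):
    """Fetch scores for the analytics platform's chosen time interval."""
    cur = conn.execute(
        "SELECT ts, score FROM anomaly_scores "
        "WHERE camera_id = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (camera_id, start_ts, end_ts))
    return cur.fetchall()

for i, s in enumerate([0.1, 0.2, 0.8, 0.3]):
    record_score("cam-01", 1000.0 + i, s)
```

The flat (camera, timestamp, score) layout keeps inserts cheap, while the range query is exactly what the web app needs to plot a chosen time period.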
• Web Server: The project's analytics platform will be a web server. It will issue queries to the database and visualize the data using a front-end JavaScript framework such as ReactJS or Vue.js. The back end will also be handled in JavaScript, given its rising popularity and reliability; our backend framework of choice is NodeJS.
CHAPTER 5
ACTION PLAN
The project comprises an array of sub-tasks and a limited time in which to carry them out in order to finish on time with the specified features, so managing the various tasks by scheduling them and allotting them to the right people is crucial. With components as varied as streaming, the web application, the deep learning framework and cloud deployment, a modern project management methodology was required for successful completion.
5.1 Methodology
With the rising popularity of agile development, and since our project had relatively few iterations, we decided to adopt the Agile-Kanban methodology for managing the project, using the free platform Ora Project Management. Our basic Agile-Kanban board consists of four columns: To-Do, In-Progress, Testing, and Backlog; tasks that are tested and validated are moved to a Done section. In this methodology, each task moves along as if through an assembly line, an approach that has proved very effective not only in the software industry but in others as well. The tool lets us specify various attributes of each task: a detailed description, a checklist of things that ensure completion, the type of task, the deadline, the person it is assigned to, and so on. Overall it is a powerful free tool that has eased project tracking and management for us. We have adopted daily stand-ups to keep track of everyone's progress, which has helped us keep the project flowing alongside other academic tasks and is very effective at surfacing problems and hurdles; we also record the minutes of these short stand-up sessions. We are also planning to deploy the system in a Docker container and roll out updates to our product, thereby adopting some DevOps principles.
5.2 Schedule
To estimate the time required for each task and prepare a timeline, the project was analyzed during the initial phase and the tasks were plotted on a Gantt chart. This gave the team a lot of insight into scheduling the tasks and estimating delays in the overall schedule; the detailed Gantt chart can be seen in Fig 5.1 and Fig 5.2. Work on the project began in June and is expected to be completed by March. The project has also been divided into phases according to the nature of the tasks, as follows:
• Model Training Phase: The project is currently in the model training phase. Tasks such as setting up the cloud platform, setting up the streaming of video data and coding the model precede the actual training. The project is halfway through this phase and expects to move to the next phase by the end of December.
• Deployment Phase (Backend): This phase deals with the deployment of the tested model on the cloud. Here, the model will be validated against a set of test cases for both performance and accuracy; we will also test the database.
• End-to-End Testing and Evaluation: This phase completes all testing and the final evaluation of the model, with enough time allotted to find and correct issues. The project is expected to be completed by the end of February, and the report and the paper by early March.
Figure 5.1: Project schedule-1
Figure 5.2: Project schedule-2
5.3 Future Work
The work done so far is summarized in the phases above; the major part of the project work is yet to be completed. Having received the necessary hardware support in the form of cloud servers, the team is all set to start training and evaluating the models. The project aims at training the model from scratch in addition to using standard weights. Other remaining tasks include testing the database and the analytics platform. Each team member is assigned tasks in different areas of the project, and coordination in completing the tasks has so far been very good; this flow is expected to continue until the end of the project. The team will actively review for possible issues, and the timeline allows enough time to fix them.
CHAPTER 6
SYSTEM VALIDATION
The complexity of the project demanded that we validate our proposed solutions and system design, so that no unnecessary time would be lost from our limited schedule on wrong decisions. The team approached Tata Elxsi for assistance in system validation; with their help, every phase of the project was thoroughly evaluated and the necessary changes were made. Two conference calls were arranged to discuss the matter, one during the idea conceptualization phase and the other during the training phase. Tata Elxsi's valuable contribution in validating the system helped us overcome otherwise tedious hurdles.
6.1 Conference call with Tata Elxsi on 28th September 2018
We started by giving them an overview of our project, including the detailed problem statement and the proposed architecture of the system designed to solve it. The team also presented the idea of using pose estimation and the RNN approach to identify anomalies. We discussed the various challenges of this approach arising from the lack of a proper dataset, the training and performance of the proposed model, and the selection of the right hardware for running it.
6.1.2 Queries by Tata Elxsi
The first query Tata Elxsi raised was why the project was considering only poses and not the video data; they also enquired whether C3D had been considered for video analysis, as poses alone would not be suitable. The significance and performance of RNNs relative to CNNs was another query. Other queries included clarifications about the dataset for training the model and the type of learning approach the team was going to undertake.
The Tata Elxsi R&D team remarked that the team had indeed taken a novel approach to the problem; however, it would only give good results from an academic point of view. That is, models and concepts like this, trained on a limited dataset, often perform fairly well on the considered test set but fail to give the expected performance in real-world scenarios. Apart from this, they suggested making our problem statement generic and then shortlisting the most common anomalies, and including data with occlusion or night surveillance along with sufficient pre-processing to rectify some of the common problems. The most important suggestion, however, was to consider C3D as a feature extractor and perform anomaly classification from its features.
6.2 Conference call with Tata Elxsi on 2nd November 2018
As in the previous meeting, we presented a detailed overview of the development of the project, including the discovery of the UCF-Crime dataset and the robustness of applying MIL to C3D-extracted features on it. The team also raised various clarifications regarding the selection of the right deep learning framework and other components.
The major suggestion from Tata Elxsi concerned the choice of deep learning framework. TensorFlow, Keras and PyTorch were suggested: TensorFlow is very robust to deploy and is used extensively for that purpose; Keras is very easy to code and prototype with but not very suitable for deployment; PyTorch has a record of being faster, but it is not extensively used in production, so issues are bound to occur. For the streaming part, the Elxsi team was unsure, but suggested that Apache Kafka would be a good option. Apart from this, they gave general suggestions on the representation of the system design and other aspects, and suggested testing both MySQL and MongoDB against our requirements. For the web application, Elxsi preferred the JavaScript framework NodeJS. These suggestions saved the team a lot of trial-and-error in meeting the requirements.
CHAPTER 7
CONCLUSION
Concern for public safety and security is rising everywhere, and surveillance cameras are increasingly being used in public places to increase safety. But there is a glaring deficiency in the utilization of surveillance cameras and an unworkable ratio of cameras to human monitors. The goal of a practical anomaly detection system is to signal in a timely manner an activity that deviates from normal patterns and to identify the time window of the occurring anomaly. Anomaly detection can therefore be considered coarse-level video understanding, which filters anomalies out of normal patterns; once an anomaly is detected, it can further be categorized into one of the specific activities using classification techniques. Through this project, we seek to enhance the reliability of surveillance systems, which in turn would provide maximum security to the users. We aim to achieve this by providing real-time triggers to the user when the intelligent surveillance system detects any anomalous event that needs attention, such as theft, violence, arson or vandalism. We also look forward to establishing a robust data analytics platform that derives meaningful insights about anomalous activities for the user through a web application; this provision can be made available either on a scheduled basis or on the user's request. The proposed system aims at reducing strenuous human labour and avoiding any negligence by human monitors. The proposed system, Anomalous Activity Detection for Intelligent Visual Surveillance, is bound to redefine notions of safety and security in public places.
REFERENCES
[1] Waqas Sultani, Chen Chen, Mubarak Shah. Real-world Anomaly Detection in Surveillance Videos. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[3] Rui Zhao, Haider Ali, Patrick van der Smagt. Two-Stream RNN/CNN for Action Recognition in 3D Videos. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017.
[5] S. Andrews, I. Tsochantaridis, and T. Hofmann. Support vector machines for multiple-instance learning. In NIPS, pages 577–584, Cambridge, MA, USA, 2002. MIT Press.
[6] M. Ponti et al. Everything You Wanted to Know about Deep Learning for Computer Vision but Were Afraid to Ask. In SIBGRAPI-T, 2017.
[8] H. Chen, F. Luo, L. Zhao and Y. Li. Design and Implementation of Real-Time Video Big Data Platform based on Spark Streaming. In CSAE, 2017.