AI Praneeth IPR.docx

The document outlines a research project focused on developing a transformer-based anomaly detection system for surveillance videos, addressing limitations of traditional methods. It details the methodology, including data collection, feature extraction, and model evaluation using various deep learning architectures. The study aims to enhance accuracy and efficiency in detecting unusual behaviors, contributing to advancements in security and automated monitoring systems.


Praneeth_IPR.docx (Turnitin submission)

Document Details
Submission ID: trn:oid:::3618:84269615
Submission Date: Mar 3, 2025, 11:48 AM UTC
Download Date: Mar 3, 2025, 11:49 AM UTC
File Name: Praneeth_IPR.docx
File Size: 4.7 MB
Length: 32 pages, 3,880 words, 25,282 characters

UNIVERSITY OF HERTFORDSHIRE
School of Physics, Engineering, and Computer Science

MSc Advanced Computer Science (7COM1039)

Module Code:
Date: 03-03-2025

Anomalous Behavior Detection in Surveillance Videos Using Transformer-Based Learning Models

Name: Praneeth Kumar Mudiyala
Student ID: 23034478
Supervisor: Yury Savateev


Abstract

Anomalous behavior detection in surveillance videos is a critical aspect of modern security and
monitoring systems. Traditional approaches, such as motion-based detection and handcrafted
feature extraction, often fail to capture complex patterns in real-world scenarios. Recent
advancements in deep learning, particularly Vision Transformers (ViTs) and TimeSformer,
offer a promising solution by leveraging self-attention mechanisms to model long-range
dependencies in video data.

This research focuses on developing a novel transformer-based anomaly detection system
that improves the accuracy and efficiency of identifying unusual activities in surveillance
footage. The project follows a structured methodology, beginning with data collection and
preprocessing using the UCF-Crime dataset. Feature extraction techniques such as optical
flow, background subtraction, and histogram-based methods are implemented to enhance
anomaly recognition.

The study explores different deep learning architectures, including CNN-based feature
extractors (ResNet, EfficientNet) and transformer-based models (ViViT, TimeSformer). A
hybrid CNN-Transformer approach is also examined to evaluate its effectiveness in
spatiotemporal anomaly detection. The performance of these models will be assessed using key
metrics such as precision, recall, F1-score, and AUC-ROC to compare their accuracy,
computational efficiency, and real-time applicability.

The project aims to address key challenges such as high false positive rates, real-time
processing constraints, and model interpretability. Upon completion, the research is
expected to provide a scalable and robust anomaly detection system for real-world
applications in security, public safety, and automated surveillance monitoring. The findings
will contribute to the advancement of deep learning-driven video analysis, demonstrating the
potential of transformer-based architectures in improving the accuracy of anomaly detection
in complex environments.


Contents

Abstract
1. Introduction
2. Literature Review
    2.1 Review of Related Work
    2.2 Analysis of Literature
    2.3 Research Gap
3. Proposed Methodology
    3.1 Proposed Practical Investigation
    3.2 Technical Work
    3.3 Tools and Techniques
    3.4 Deliverables
    3.5 Ethical, Legal, Professional, and Social Issues
4. PROGRESS TO DATE AND CHALLENGES FACED
    4.1 Progress to Date
    4.2 Challenges Faced
5. NEXT STEPS
6. Project Timeline
    6.1 Project Timeline
    6.2 Gantt Chart
7. Conclusion
8. References
Appendix A: Technologies Used
Appendix B: Code Snippets


1. Introduction

1.1 Background

With the increasing demand for intelligent surveillance systems, detecting anomalous behavior
in real-time has become a critical challenge in public safety and security. Traditional methods
rely on manually defined rules or handcrafted features, which often struggle to capture complex
and unpredictable behaviors. Deep learning approaches, particularly CNN-LSTM models, have
improved anomaly detection but still face limitations in handling long-term dependencies in
video sequences.

Recent advancements in transformer-based architectures, such as Vision Transformers (ViTs)
and TimeSformer, have shown promising results in video analysis by efficiently capturing
spatial and temporal relationships. This project aims to explore the potential of transformer
models for anomaly detection in surveillance videos by leveraging their ability to process
long-range dependencies and improve feature representation.

1.2 Research Questions

This research will address the following key questions:

1. How can transformer-based models enhance anomalous behavior detection compared
   to traditional deep learning methods?

2. What feature extraction techniques are most effective for detecting anomalies in
   surveillance footage?

1.3 Aim

The aim of this project is to develop an anomaly detection system using transformer-based
learning models to improve the accuracy and efficiency of detecting unusual behaviors in
surveillance footage.

1.4 Objectives

• Analyze and preprocess surveillance video datasets, including frame extraction and
  feature normalization.
• Develop a feature extraction pipeline using CNN-based models and transformers.
• Implement self-attention mechanisms for effective spatiotemporal modeling.
• Train and optimize transformer-based models for anomaly detection.
• Evaluate model performance using precision, recall, F1-score, and AUC-ROC metrics.
• Compare results with traditional deep learning models such as CNN-LSTM and 3D CNNs.
• Optimize the model for real-time deployment in surveillance applications.

1.5 Expected Outcomes

• A working anomalous behavior detection system using transformer-based models.
• A comparative study on the effectiveness of transformers versus traditional deep
  learning models.
• A well-defined feature extraction and preprocessing pipeline for anomaly detection.
• A research report detailing the findings, performance analysis, and model evaluation.
• A deployable model optimized for real-time surveillance applications.


2. Literature Review

Anomalous behavior detection in surveillance videos is a critical area in computer vision and
artificial intelligence. Several research studies have explored machine learning and deep
learning techniques to enhance anomaly detection in real-time environments. Traditional
methods often rely on handcrafted features and statistical approaches, while modern deep
learning-based techniques use CNNs, RNNs, and transformers for improved accuracy.

This literature review explores key contributions in anomaly detection, comparing different
methodologies, highlighting research gaps, and analyzing the effectiveness of various
models used in video-based anomaly detection.

2.1 Review of Related Work

Older methods for anomaly detection in video surveillance were based on motion-based
detection, trajectory analysis, and handcrafted features. These techniques relied on
statistical models, background subtraction, and optical flow to detect irregular patterns.
However, these methods were highly dependent on predefined thresholds and struggled with
complex activities, varying lighting conditions, and occlusions.

Fan et al. (2020) proposed a real-time abnormal behavior detection method using traditional
computer vision techniques. Their approach involved motion estimation and trajectory
analysis, but it suffered from high false positive rates when dealing with crowded
environments.

With the advancements in deep learning, researchers have adopted CNNs, RNNs, and 3D
CNNs to extract spatiotemporal features from video sequences.

Qasim and Verdu (2023) developed an anomaly detection system using deep convolutional
and recurrent models, combining CNN feature extraction with LSTMs to capture temporal
dependencies in video data. Their approach improved accuracy but was computationally
expensive.

Mehmood (2021) proposed an efficient anomaly detection framework using pre-trained 2D
CNNs. The study demonstrated that transfer learning could significantly reduce training time
and improve model generalization. However, CNNs alone failed to capture long-range
temporal relationships in video sequences.

Maqsood et al. (2021) used 3D CNNs to process video data and detect anomalies. This
approach improved spatiotemporal feature extraction but required large datasets to prevent
overfitting.

Recently, transformer-based models have gained popularity for video anomaly detection due
to their ability to capture long-range dependencies and contextual relationships.

Choudhry et al. (2023) conducted a comprehensive survey on machine learning methods
for anomaly detection in surveillance videos, emphasizing the advantages of self-attention
mechanisms in transformers.

Mishra et al. (2024) explored skeletal video anomaly detection using deep learning. Their
study highlighted the potential of ViTs (Vision Transformers) and TimeSformer for
improving anomaly detection by leveraging global feature representations.

Mukto et al. (2024) designed a real-time crime monitoring system using deep learning
techniques, integrating CNNs and transformers to achieve faster and more accurate anomaly
classification.

Ali (2023) proposed a real-time video anomaly detection system for smart surveillance
using transformer-based architectures. The study found that attention mechanisms in ViTs
helped the model learn contextual features effectively, leading to better anomaly
classification results.

2.2 Analysis of Literature

A comparison of different methodologies in anomaly detection is summarized in the table
below:

| Study | Approach Used | Advantages | Limitations |
|---|---|---|---|
| Fan et al. (2020) | Motion-based detection | Real-time processing | High false positive rate |
| Qasim & Verdu (2023) | CNN + LSTM | Captures temporal dependencies | Computationally expensive |
| Mehmood (2021) | Pretrained 2D CNNs | Faster training, good generalization | Cannot handle long-term dependencies |
| Maqsood et al. (2021) | 3D CNN | Improved feature extraction | Needs large datasets to avoid overfitting |
| Choudhry et al. (2023) | Transformer-based models | Self-attention for anomaly detection | Requires high computational power |
| Mishra et al. (2024) | Vision Transformers (ViTs) | Captures global dependencies | Higher latency in real-time applications |
| Ali (2023) | ViTs for smart surveillance | Better contextual learning | Requires optimization for real-time use |

The comparison suggests that transformers handle long-range dependencies better than
traditional CNNs and LSTMs, but they require more computational resources and further
optimization before they are practical for real-time applications.

2.3 Research Gap

Despite recent advancements, several challenges remain in video anomaly detection:

1. High False Positives and False Negatives: Many deep learning models still
   misclassify normal behavior as anomalies or fail to detect subtle anomalies.

2. Computational Complexity: Transformer models, while powerful, are often
   computationally expensive and require high-end GPUs for training.

3. Limited Dataset Availability: Many anomaly detection models are trained on small,
   imbalanced datasets, limiting their real-world generalization.

4. Real-Time Constraints: Existing deep learning methods struggle with real-time
   anomaly detection, making them impractical for live surveillance systems.

5. Lack of Explainability: Many transformer-based models operate as black-box
   systems, making it difficult to understand why an event was classified as anomalous.

3. Proposed Methodology

The proposed anomaly detection system is built on the UCF-Crime dataset. Pre-processing
involves feature extraction, image resizing, and noise removal. Feature engineering is used
to extract motion features. Model training uses CNN and RNN models to identify anomalies.
Finally, performance is measured using AUC-ROC, precision, recall, and error rate. Fig. 1
outlines this architecture for anomaly detection in real-time surveillance.

Fig.1 Architectural Flow Diagram


3.1 Proposed Practical Investigation

The research methodology follows a structured approach involving data collection,
preprocessing, feature extraction, model development, training, and evaluation. The
proposed method aims to explore transformer-based models for anomalous behavior detection
in surveillance videos.

Data Collection and Preparation

• The dataset used is UCF-Crime, a large-scale video dataset with labeled normal and
  anomalous activities.
• Frames are extracted from videos using OpenCV while maintaining temporal consistency.
• Preprocessing includes resizing, normalization, background subtraction, and
  optical flow computation.
• Data augmentation techniques are applied to handle class imbalances.

Model Development

• CNN-based feature extractors (ResNet, EfficientNet) are used for spatial feature
  learning.
• Transformer models (ViViT, TimeSformer) are integrated for temporal sequence
  modeling.
• A hybrid CNN-Transformer architecture is explored for improved performance.

Model Evaluation

• Performance metrics: Precision, Recall, F1-score, and AUC-ROC.
• Comparison with traditional deep learning models (CNN-LSTM, 3D CNNs).
• Computational efficiency analysis for real-time applications.
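
The evaluation metrics listed above can be sketched in plain Python. This is a minimal illustration, assuming binary clip-level labels (1 = anomalous, 0 = normal) and a per-clip anomaly score; the function names and the 0.5 threshold used in the usage note are assumptions made for the example, not part of the project code.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = anomalous, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc_roc(y_true, scores):
    """AUC-ROC as the chance that a random anomalous clip outscores a random
    normal clip (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Precision, recall, and F1 depend on the threshold used to turn scores into predictions (for example, `y_pred = [1 if s >= 0.5 else 0 for s in scores]`), whereas AUC-ROC is threshold-free, which is one reason to report both.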

3.2 Technical Work

The technical implementation consists of multiple steps to develop and evaluate the
transformer-based anomaly detection model.

Data Preprocessing

• Extract frames at specific intervals from crime videos.
• Apply background subtraction and frame differencing for motion analysis.
• Normalize and resize frames to 64×64 or 224×224 pixels.
• Convert frames into feature embeddings using CNNs before passing to transformers.
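
The interval sampling, normalization, and frame differencing steps above can be illustrated with a small dependency-free sketch. It assumes frames are already decoded into 2D lists of 8-bit grayscale values (in the project, OpenCV would supply them); the function names and the 0.1 threshold are hypothetical choices for the example.

```python
def sample_frame_indices(total_frames, step):
    """Indices of the frames kept when sampling every `step`-th frame."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return list(range(0, total_frames, step))


def normalize_frame(frame):
    """Scale 8-bit grayscale pixel values from [0, 255] to [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in frame]


def frame_difference_mask(prev_frame, next_frame, threshold=0.1):
    """Binary motion mask: 1 where the absolute pixel change exceeds `threshold`.
    Both frames are expected to be normalized and of equal size."""
    return [
        [1 if abs(b - a) > threshold else 0 for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_frame, next_frame)
    ]
```

For a 10-frame clip sampled with `step=4`, the kept indices are 0, 4, and 8. Applying `frame_difference_mask` to two consecutive normalized frames marks only the pixels that changed by more than the threshold, which gives a rough motion cue before any learned model is involved.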

Model Architecture Design

• Implement CNN-based feature extractors (ResNet, EfficientNet) to learn spatial
  features.
• Use Vision Transformers (ViTs) and TimeSformer for sequence modeling.
• Design a hybrid CNN-Transformer model to capture both spatial and temporal
  dependencies.
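
The sequence-modeling step relies on self-attention over per-frame embeddings. The sketch below shows scaled dot-product self-attention in plain Python under a strong simplification: queries, keys, and values are the embeddings themselves (identity projections), with a single head and no positional encoding. Real ViT or TimeSformer layers use learned projections and multiple heads, so this illustrates the mechanism only and is not the project's architecture.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention with Q = K = V = embeddings.

    `embeddings` is a list of per-frame feature vectors of equal length.
    Returns (outputs, weights), where weights[i][j] is how much frame i
    attends to frame j.
    """
    dim = len(embeddings[0])
    scale = math.sqrt(dim)
    weights = []
    for query in embeddings:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in embeddings]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[d] for w, value in zip(row, embeddings)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights
```

Because every frame attends to every other frame in one step, a dependency between frames far apart in time does not have to pass through a long recurrent chain, which is the property the report contrasts with CNN-LSTM models.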

Transfer Learning Adaptation

• Utilize pre-trained CNN models for feature extraction.
• Fine-tune transformer models on the UCF-Crime dataset.

Training Process

• Train the model using AdamW optimizer with learning rate scheduling.
• Apply data augmentation to improve generalization.
• Evaluate using a train-validation-test split.
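
To make the optimizer step concrete, the following sketch shows one common form of learning rate scheduling (linear warmup followed by cosine decay) and a single AdamW update for a scalar parameter. The report does not state which schedule or hyperparameters are used, so the schedule shape and every default value here are assumptions for illustration.

```python
import math


def lr_at_step(step, total_steps, base_lr=1e-4, warmup_steps=100, min_lr=0.0):
    """Linear warmup to `base_lr`, then cosine decay to `min_lr`."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def adamw_step(param, grad, m, v, step, lr,
               beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter; `step` counts from 1."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** step)             # bias correction
    v_hat = v / (1 - beta2 ** step)
    param = param - lr * weight_decay * param   # decoupled weight decay
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

The weight decay term is applied directly to the parameter rather than being folded into the gradient, which is what distinguishes AdamW from Adam with L2 regularization. In a deep learning framework both pieces would normally come from the library's optimizer and scheduler classes rather than hand-written code.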

3.3 Tools and Techniques

The research involves various deep learning tools and frameworks for model development
and evaluation.

• Programming Language: Python
• Deep Learning Frameworks: TensorFlow, PyTorch
• Computer Vision Libraries: OpenCV, scikit-image
• Data Processing: NumPy, Pandas
• Visualization: Matplotlib, Seaborn
• Pretrained Models: ResNet, EfficientNet, Vision Transformers
• Hardware Used: GPU (NVIDIA CUDA) for faster training

3.4 Deliverables

The key deliverables of this research are:

• A research proposal detailing the project objectives and methodology.
• Interim Progress Report (IPR) covering data preprocessing and preliminary results.
• A project plan outlining the timeline and milestones.
• A final research report summarizing the findings, model evaluations, and conclusions.
• Python implementation files (.py/.ipynb) containing preprocessing and model
  training code.

3.5 Ethical, Legal, Professional, and Social Issues

Ethical Issues

• The research ensures compliance with data privacy regulations when using
  surveillance datasets.
• No personally identifiable information (PII) is used in training or evaluation.
• Ethical AI principles are followed to avoid bias in anomaly detection models.

Legal Issues

• The dataset used (UCF-Crime) is publicly available for research purposes.
• All models and code implementations comply with open-source licensing
  agreements.
• The study follows GDPR and ethical AI guidelines for responsible use of surveillance
  data.

Professional Issues

• The research aligns with IEEE standards for computer vision-based security
  applications.
• The project follows best practices in model development, evaluation, and
  deployment.

Social Issues

• The system aims to enhance public safety by improving real-time surveillance
  monitoring.
• Reducing false positives ensures the system does not unfairly flag normal activities
  as crimes.
• Ethical concerns regarding mass surveillance and individual privacy are addressed by
  ensuring the model is used responsibly.


4. PROGRESS TO DATE AND CHALLENGES FACED

4.1 Progress to Date

The research has progressed significantly in the initial stages, focusing on data collection,
preprocessing, and feature extraction. The following tasks have been successfully
completed:

• Dataset Analysis: The UCF-Crime dataset has been selected for model training and
  evaluation. The dataset has been analyzed to understand class distributions, imbalance
  issues, and data availability.

• Frame Extraction & Sampling: Frames have been successfully extracted from video
  files using OpenCV, ensuring temporal consistency. Sampling techniques have been
  applied to reduce redundancy and retain only meaningful frames.

• Preprocessing Techniques Implemented:
    o Resizing & Normalization of frames to a consistent size (64×64 and 224×224).
    o Grayscale conversion (if required) for edge-based feature extraction.
    o Optical flow computation to track motion changes.
    o Frame differencing and background subtraction to detect anomalies.
    o Histogram of Oriented Gradients (HOG) and ORB keypoint detection
      applied for feature extraction.

• Exploratory Data Analysis (EDA):
    o Class imbalance visualization using bar charts and pie charts.
    o Heatmap analysis of anomaly frequencies.
    o Comparison of motion-based versus static background anomaly detection.

• Preliminary Model Selection:
    o CNN-based feature extraction techniques have been explored.
    o Initial tests with ViTs and TimeSformer have been conducted.
    o Hybrid approaches combining CNNs with transformers are under
      investigation.

The current focus is on finalizing the feature extraction pipeline and preparing the dataset for
model training.

4.2 Challenges Faced

1. Class Imbalance in Dataset

• The dataset is highly imbalanced, with fewer samples for some anomaly types.
• Oversampling techniques such as Synthetic Minority Over-Sampling Technique
  (SMOTE) and data augmentation are being explored to balance the dataset.
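
The core idea behind SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified, dependency-free version of that idea, assuming it is applied to fixed-length feature vectors (for example, CNN embeddings of clips) rather than raw video. The function name and defaults are hypothetical, and an established library implementation would be the practical choice.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create `n_new` synthetic feature vectors by interpolating between
    minority-class samples and their nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to `base`.
        others = [x for x in minority if x is not base]
        others.sort(key=lambda x: sum((a - b) ** 2 for a, b in zip(base, x)))
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

Each synthetic vector lies on the line segment between two real minority samples, so the new points stay inside the region the minority class already occupies. Synthetic samples should be generated from the training split only, so that validation and test data remain untouched.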

2. Computational Constraints

• Transformer models require high computational power, making training on large
  video datasets challenging.
• Solutions being considered: Cloud computing (Google Colab Pro, AWS), model
  pruning, and quantization to optimize performance.
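
Quantization reduces model size and inference cost by storing weights at lower precision. As a rough illustration of the principle, the sketch below applies symmetric per-tensor int8 quantization to a flat list of weights. It is a toy version under stated assumptions (one scale per tensor, no calibration or activation quantization); deployment toolchains handle these details, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 codes plus one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map int8 codes back to approximate float weights."""
    return [c * scale for c in codes]
```

Each weight is stored in 8 bits instead of 32, at the cost of a rounding error of at most half the scale per weight. Whether that error is acceptable for anomaly detection accuracy has to be checked empirically after quantization.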

3. Temporal Dependencies in Video Data

• Unlike static images, anomalies in videos occur over multiple frames, requiring a
  sequence-based approach.
• Solution: Using TimeSformer and ViViT, which handle long-term dependencies
  more efficiently than CNN-LSTM models.

4. Real-Time Processing Constraints

• Some anomaly detection techniques are computationally expensive, making real-time
  implementation difficult.
• Solution: Optimizing the model using TensorFlow Lite and ONNX for real-time
  inference.


5. Overfitting in Deep Learning Models

• Due to the limited number of anomalous samples, deep learning models tend to overfit
  on training data.
• Solution: Dropout regularization, data augmentation, and transfer learning to
  improve generalization.

6. Lack of Interpretability in Transformer Models

• Transformers are known for their black-box nature, making it hard to explain model
  decisions.
• Solution: Integrating attention maps to highlight which frames contribute to
  anomaly classification.
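
One simple way to turn attention weights into a frame-level explanation is to average the attention each frame receives across heads and rank the frames. The sketch below assumes the model exposes, for each head, a list of attention weights from a classification token to every frame; that input format and the function name are assumptions for illustration, and attention weights are only an approximate indicator of what drove a prediction.

```python
def top_attended_frames(attention_by_head, k=3):
    """Average per-frame attention over heads and return the indices of the
    k highest-weighted frames together with the averaged weights."""
    num_heads = len(attention_by_head)
    num_frames = len(attention_by_head[0])
    mean_attention = [
        sum(head[i] for head in attention_by_head) / num_heads
        for i in range(num_frames)
    ]
    ranked = sorted(range(num_frames), key=lambda i: mean_attention[i], reverse=True)
    return ranked[:k], mean_attention
```

The returned indices can be mapped back to timestamps so that a reviewer sees which moments of the clip the model weighted most when it flagged an anomaly.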

Despite these challenges, significant progress has been made in preprocessing and model
selection. The next phase will focus on model training, hyperparameter tuning, and
performance evaluation.


5. NEXT STEPS

Moving forward, the next phase of the project will involve model training, optimization, and
evaluation. The key steps include:

1. Finalizing the Feature Extraction Pipeline

• Complete the extraction of motion-based and texture-based features.
• Ensure optimal preprocessing settings for ViTs and CNN-Transformer models.

2. Implementing and Training Transformer-Based Models

• Train ViViT, TimeSformer, and Swin Transformer on the preprocessed dataset.
• Compare performance with CNN-LSTM and 3D CNNs to analyze improvements.
• Use transfer learning with pre-trained vision models for better feature extraction.

3. Hyperparameter Optimization

• Fine-tune batch size, learning rate, dropout rate, and number of attention heads in
  transformer models.
• Use Grid Search and Bayesian Optimization to find the best hyperparameter
  combinations.
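
Grid search evaluates every combination of candidate hyperparameter values and keeps the best one. The sketch below shows the pattern with the standard library. The `toy_validation_score` function is a hypothetical stand-in for "train the model with this configuration and return a validation metric", and the candidate values are illustrative only.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in `param_grid`; return the best config and score."""
    names = list(param_grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_validation_score(config):
    """Hypothetical scoring function; peaks at lr=1e-4, dropout=0.1, 8 heads."""
    return (
        -abs(config["learning_rate"] - 1e-4) * 1e3
        - abs(config["dropout"] - 0.1)
        - abs(config["num_heads"] - 8) * 0.01
    )
```

The number of runs is the product of the list lengths (3 × 2 × 2 = 12 for a grid with three learning rates, two dropout rates, and two head counts), which grows quickly for video transformers. That cost is the usual motivation for moving to Bayesian optimization once the promising region is known.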

4. Model Evaluation and Benchmarking

• Compare the transformer-based model with traditional deep learning models.
• Evaluate performance using Precision, Recall, F1-score, AUC-ROC, and
  computational efficiency metrics.
• Analyze results and refine the model based on false positives and false negatives.

5. Real-Time Deployment Considerations

• Convert the best-performing model into a TensorFlow Lite or ONNX format for
  real-time deployment.
• Test real-time anomaly detection on live CCTV feeds or pre-recorded video streams.
• Optimize the system for low-latency inference and scalability.
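
Low-latency claims need a measurement. A minimal way to estimate per-frame latency and throughput is sketched below; `dummy_infer` is a hypothetical placeholder for the deployed model's forward pass, and the warm-up count is an assumption.

```python
import time


def measure_latency(infer, frames, warmup=2):
    """Return (mean seconds per frame, frames per second) for `infer` over `frames`."""
    for frame in frames[:warmup]:
        infer(frame)  # warm-up calls are not timed
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    mean_latency = elapsed / len(frames)
    fps = len(frames) / elapsed if elapsed > 0 else float("inf")
    return mean_latency, fps


def dummy_infer(frame):
    """Hypothetical placeholder for a model's forward pass."""
    return sum(frame) / len(frame)
```

Comparing the measured frames per second against the camera's frame rate gives a direct check on whether a given model and export format can keep up with a live feed.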

6. Prepare Final Report and Document Findings

• Document model performance, experimental results, and conclusions.
• Compare results with existing anomaly detection techniques.
• Finalize the research paper and prepare for submission.


6. Project Timeline

6.1 Project Timeline

This section outlines the key phases of the project, covering research, data preprocessing,
model training, evaluation, and final report preparation. The timeline ensures that each
stage is completed within the given timeframe.
Table 6.1: Project Timeline

| Phase | Task Name | Duration | Start Date | End Date |
|---|---|---|---|---|
| Phase 1: Research Planning | Selection of research topic | 3 days | 20 Jan 2025 | 22 Jan 2025 |
| | Identify the problem domain | 5 days | 23 Jan 2025 | 29 Jan 2025 |
| | Create research questions | 3 days | 30 Jan 2025 | 1 Feb 2025 |
| | Define research aim & objectives | 5 days | 4 Feb 2025 | 8 Feb 2025 |
| | Research ethics review | 6 days | 10 Feb 2025 | 17 Feb 2025 |
| Phase 2: Literature Review & Data Preparation | Conduct literature review | 7 days | 18 Feb 2025 | 24 Feb 2025 |
| | Identify research gaps | 4 days | 25 Feb 2025 | 28 Feb 2025 |
| | Interim Progress Report (IPR) submission | 5 days | 1 Mar 2025 | 7 Mar 2025 |
| Phase 3: Data Preprocessing & Model Development | Dataset selection & frame extraction | 5 days | 10 Mar 2025 | 15 Mar 2025 |
| | Feature extraction (Optical Flow, HOG, Background Subtraction) | 7 days | 16 Mar 2025 | 23 Mar 2025 |
| | Model selection & architecture design | 7 days | 24 Mar 2025 | 31 Mar 2025 |
| Phase 4: Model Training & Evaluation | Train deep learning models (ViTs, TimeSformer, CNNs) | 10 days | 1 Apr 2025 | 10 Apr 2025 |
| | Hyperparameter tuning & model optimization | 7 days | 11 Apr 2025 | 18 Apr 2025 |
| | Model evaluation using performance metrics | 7 days | 19 Apr 2025 | 26 Apr 2025 |
| Phase 5: Documentation & Report Submission | Drafting Final Project Report (FPR) | 10 days | 1 May 2025 | 10 May 2025 |
| | Proofreading & finalizing the report | 5 days | 11 May 2025 | 15 May 2025 |
| | Project submission & Viva preparation | 5 days | 16 May 2025 | 21 May 2025 |

6.2 Gantt Chart

To better illustrate the timeline, the Gantt Chart below provides a visual representation of
the project's progress. Each bar represents the duration of a task, with start and end dates
clearly marked.

Figure 2: Gantt Chart for Project Timeline


7. Conclusion

This project aims to develop an anomalous behavior detection system using transformer-
based learning models for video surveillance. The research has progressed through several
critical phases, including data collection, preprocessing, literature review, and initial
feature extraction. The study addresses the limitations of traditional anomaly detection
methods by exploring Vision Transformers (ViTs), TimeSformer, and hybrid CNN-
Transformer architectures.
Through detailed preprocessing, key features such as optical flow, background subtraction,
and Histogram of Oriented Gradients (HOG) descriptors have been extracted to enhance model performance. The
methodology ensures a structured approach for improving the accuracy and efficiency of
anomaly detection. The research also considers real-time processing challenges,
computational efficiency, and model interpretability, aiming to optimize transformer models
for practical deployment.
The next phase will focus on model training, evaluation, and optimization to compare
transformers with traditional CNN-LSTM models. A rigorous performance assessment will
be conducted using precision, recall, F1-score, and AUC-ROC metrics. Further, efforts will
be made to improve model generalization, reduce false positives, and ensure real-time
applicability in surveillance systems.
Upon completion, this study will contribute to the advancement of intelligent security systems
by developing a robust deep learning-based anomaly detection framework. The findings
will support future research in AI-driven video surveillance, real-time anomaly detection,
and the adoption of transformers in spatiotemporal video analysis.
This research is expected to bridge the gap between deep learning advancements and real-
world security applications, providing a scalable, accurate, and efficient anomaly
detection system for practical use.




Appendix A: Technologies Used


1. Programming Language and Environment: Python is used for development, with Google Colab as the notebook environment.

2. Machine Learning and Deep Learning Frameworks

● TensorFlow/Keras is used for training deep learning models such as CNNs and
LSTMs.

● OpenCV has been employed for pre-processing, frame extraction, and motion analysis.

● Scikit-learn helps in statistical analysis and implementing baseline machine learning
models.

3. Data Handling and Processing: Pandas and NumPy for data handling; Matplotlib, Seaborn, and Plotly for visualization.

4. Video Pre-processing and Feature Engineering

● Motion & Anomaly Detection is implemented using Optical Flow, Frame Difference, and
Background Subtraction (MOG2, Static), which highlight movement anomalies.

● Feature Extraction is performed using HOG, Canny Edge Detection, and ORB.
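
Of the motion cues listed above, frame differencing is the simplest to illustrate. The sketch below uses NumPy on synthetic frames so that it is self-contained. The project itself uses OpenCV for this step, and the threshold of 25 and the 8x8 frame size are illustrative assumptions.

```python
import numpy as np


def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Binary motion mask from two grayscale uint8 frames."""
    # Widen to int16 so the subtraction cannot wrap around.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > threshold).astype(np.uint8)


def motion_ratio(mask):
    """Fraction of pixels flagged as moving."""
    return float(mask.mean())


# Synthetic 8x8 frames: a bright 2x2 block moves one pixel to the right.
prev_frame = np.zeros((8, 8), dtype=np.uint8)
curr_frame = np.zeros((8, 8), dtype=np.uint8)
prev_frame[3:5, 2:4] = 200
curr_frame[3:5, 3:5] = 200

mask = frame_difference_mask(prev_frame, curr_frame)
print(int(mask.sum()), motion_ratio(mask))
```

Only the pixels the block left and the pixels it entered are flagged. The overlapping column is unchanged between frames, which is why frame differencing highlights motion boundaries rather than whole objects.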

5. Deep Learning-Based Feature Extraction

CNN Backbone (ResNet, EfficientNet), LSTM-Based Temporal Modelling, and Transformer-Based
Global Attention, which identifies long-range dependencies across frames to enhance anomaly
detection accuracy.
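
The idea of global attention across frames can be shown with a single-head scaled dot-product self-attention layer. This is a minimal NumPy sketch, not the project's model: the projection matrices are random placeholders, and the dimensions (16 frames, 32-dimensional features, 8-dimensional head) are assumptions. In a full pipeline, the per-frame features would come from the CNN backbone.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(frame_features, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over frames.

    frame_features has shape (num_frames, d_model), one row per frame.
    Returns (output, weights); weights[i, j] is how strongly frame i
    attends to frame j.
    """
    q = frame_features @ w_q
    k = frame_features @ w_k
    v = frame_features @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
num_frames, d_model, d_head = 16, 32, 8
features = rng.normal(size=(num_frames, d_model))
# Placeholder projections; a trained model would learn these.
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) * 0.1 for _ in range(3))

output, weights = self_attention(features, w_q, w_k, w_v)
print(output.shape, weights.shape)
```

Because every frame attends to every other frame in one step, the first and last frames of a clip interact directly. An LSTM, by contrast, has to carry that information through each intermediate time step.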


Appendix B: Code Snippets

The following are screenshots of the steps used to clean and prepare data.

2.1 Class Distribution of Data


2.2 Percentage distribution of test and train data

2.3 Percentage Distribution of Crime Types


2.4 Number of samples in the dataset for different crime types


2.5 Percentage Distribution of Crime Types in Training Set


2.6 Crime Frequency

2.7 Data pre-processing


2.7.1 Frame extraction, sampling, Resizing & Normalization

2.7.2 Implementation of Optical flow

2.7.3 Frame Difference and Background subtraction methods


2.7.4 HOG feature extraction and edge detection


2.8 Sample frames with Description

