Indian Institute of Information Technology, Allahabad

CCTV Face Dataset Creation and Face Recognition Using Deep Learning

Hritu Raj (MHC2022007)


Under Supervision of
Dr. Shivram Dubey
(Asst. Professor IIITA)
Outline
● Introduction
● Motivation
● Problem Statement
● Literature review
● Proposed Dataset
● Proposed Architecture
● Experimental Setup
● Results
● Conclusion
● Future Scope
Introduction
➢ CCTV systems enhance surveillance by
monitoring public and private spaces for
security.
➢ Low-resolution footage often limits the
effectiveness of these systems, making
accurate recognition difficult.
➢ This project aims to improve recognition
accuracy by using super-resolution
techniques with deep learning-based human
subject recognition.
Figure 1
Image Source: https://www.asmag.com/showpost/30408.aspx
Motivation

➢ Enhance surveillance for defense and security.


➢ Utilize deep learning and super-resolution.
➢ Tackle poor lighting, low resolution, and varying angles.
➢ Boost precise identification for defense decisions.
➢ Compare and evaluate super-resolution models.
Problem Statement

➢ Low-resolution images have limited details, making information extraction difficult.
➢ Recognizing faces in low-resolution images is challenging even for humans.
➢ Facial recognition on small face regions is particularly difficult due to lost details.
Figure 2: CCTV image (a), low-resolution face (b) [2]
Literature Review
| S.No. | Title | Method | Dataset | Research Gap |
|---|---|---|---|---|
| 1 | DeepFace: Closing the Gap to Human-Level Performance in Face Verification [3] | 9-layer CNN for face verification tasks. Incorporates techniques such as multi-task learning and supervised pre-training on a large-scale dataset. | Labeled Faces in the Wild (LFW) dataset, containing more than 13,000 labeled images of faces. | Accuracy is low. |
| 2 | VGGFace: A Deep Face Recognition Model [4] | Deep CNN architecture (VGG-16 model) adapted for face recognition tasks. The network is trained to learn discriminative features directly from face images. | VGGFace dataset, which includes over 2.6 million images from over 2,600 identities. | VGGFace highlighted challenges in handling variations in facial expressions, occlusions, and aging. |
| 3 | VGGFace2: A Dataset for Recognising Faces across Pose and Age [5] | Uses deep CNNs based on the VGG-16 and VGG-19 models. | VGGFace2 dataset, consisting of over 3.3 million images from over 9,000 identities. | Face recognition accuracy varies with race and skin tone. |
| 4 | Dlib Face Recognition: A Robust and Efficient Face Recognition Library [6] | Combines machine learning algorithms, including a face detection pipeline based on Histogram of Oriented Gradients (HOG) features and a face recognition model using deep metric learning techniques. | LFW (Labeled Faces in the Wild), CIFAR-10 | Does not work properly with low-resolution images. |
| 5 | ArcFace: Additive Angular Margin Loss for Deep Face Recognition [7] | Introduced the Additive Angular Margin (ArcFace) loss. | LFW (Labeled Faces in the Wild), MegaFace, and MS1M. | Does not work well at low resolution. |
| 6 | MobileFaceNet: Efficient CNN Model for Face Recognition on Mobile Devices [8] | MobileFaceNet is a compact CNN architecture designed for face recognition tasks on mobile and embedded devices. It reduces model size and computational complexity. | CASIA-WebFace, MegaFace, and LFW (Labeled Faces in the Wild). | Because the model is lightweight, accuracy is slightly lower and can be improved. |
Proposed Dataset
➢ Created for facial recognition in low-resolution
CCTV.
➢ Designed to develop and evaluate recognition
systems.
➢ Focused on low-resolution environments.
Process of Dataset Creation
1. Raw Data Collection
➢ A CCTV camera from TVT company, with a frame rate of 20 FPS and a resolution of 1910×1077, was used.
➢ Videos were recorded using a screen recorder on Linux.
Figure 3: Proposed Dataset
Proposed Dataset
2. Preprocessing
➢ Irrelevant videos were removed.
➢ Frames were extracted from the videos using OpenCV [9].
➢ MTCNN [10] was used to detect and crop faces from the frames.
➢ Cropped images are 130x170 pixels.
3. Filtering and Manual Annotation
➢ Manual verification and human annotation for accuracy.
➢ Faces matched to corresponding HD images.
➢ Ensured precision and reliability.

Figure 4. Dataset creation process (stages shown: Raw Videos, OpenCV, Extracted Frames, MTCNN, Extracted Faces With Anomaly, Final Data)
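The frame-extraction and cropping steps can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the script used for the dataset: the function names, the `every_n` sampling parameter, the `(x, y, w, h)` box format, and the reading of 130x170 as width x height are all hypothetical. The face detector (MTCNN [10] in the slides) is passed in as a caller-supplied function, since its exact API depends on the package used. The only OpenCV calls are `cv2.VideoCapture`, `read`, `release`, and `cv2.resize`.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in pixels; assumed box format

CROP_SIZE = (130, 170)  # (width, height); the order is an assumption


def clamp_box(box: Box, frame_w: int, frame_h: int) -> Box:
    """Clip a detector box to the frame so slicing never goes out of bounds."""
    x, y, w, h = box
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(frame_w, x + w), min(frame_h, y + h)
    return x0, y0, max(0, x1 - x0), max(0, y1 - y0)


def iter_frames(video_path: str, every_n: int = 1):
    """Yield every `every_n`-th frame of a video as a BGR array."""
    import cv2  # imported lazily so clamp_box works without OpenCV installed

    cap = cv2.VideoCapture(video_path)
    index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of video or read error
                break
            if index % every_n == 0:
                yield frame
            index += 1
    finally:
        cap.release()


def crop_faces(frame, detect):
    """Crop and resize every face box returned by `detect(frame)`."""
    import cv2

    frame_h, frame_w = frame.shape[:2]
    crops = []
    for box in detect(frame):
        x, y, w, h = clamp_box(box, frame_w, frame_h)
        if w == 0 or h == 0:
            continue  # box lay entirely outside the frame
        crops.append(cv2.resize(frame[y:y + h, x:x + w], CROP_SIZE))
    return crops
```

With a detector wrapped as `detect(frame) -> list of boxes`, the crops for one video would be collected by looping over `iter_frames(path)` and calling `crop_faces` on each frame; manual filtering and annotation then happen on the saved crops.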
Proposed Dataset
Dataset Characteristics
Table 1: CVBL-CCTV-FACE Dataset details

| Category | Value |
|---|---|
| Total Videos | 58 |
| Videos with Single Person | 31 |
| Videos with Multiple Individuals | 27 |
| Low-Resolution Faces Extracted | 8,144 |
| High-Definition Face Images | 102 |
| Unique Identities | 102 |
| Males | 95 |
| Females | 7 |
| Age Range | 18-30 years |

Challenges
The dataset presents several challenges common in real-world CCTV footage, including:
➢ blurred faces
➢ individuals wearing caps
➢ partial faces
➢ side profiles
➢ head-down poses
These conditions highlight the need for advanced super-resolution and recognition models to improve accuracy.

Figure 5: Challenges in Dataset
Dataset
DroneSURF Dataset
➢ The dataset was captured by drone.

Table 2: DroneSURF dataset details [11]

| Category | Value |
|---|---|
| Videos | 200 |
| Subjects | 58 |
| Low-resolution frames | 411451 |
| Low-resolution faces | 786813 |
| High-resolution faces | 232 |
Flow Diagram
Figure 6: Flow diagram (Input → Crop Faces → Enhanced Faces using Super-Resolution → Recognition)


Proposed Architecture

Figure 7: Architecture for the proposed method. (a) MTCNN [10] (b) Super-resolution (c) Recognition (FaceNet [14])
Proposed Architecture
Pipeline

➢ Face Detection and Cropping (MTCNN [10])


○ Uses MTCNN [10] for accurate face detection in input images.
○ Identifies potential face regions and refines detections for precise localization.
○ Outputs cropped low-resolution (LR) facial images.
➢ Super-Resolution Enhancement:
○ Applies advanced super-resolution techniques (e.g., GFPGAN [12], CodeFormer [13]) to LR facial images.
○ Upscales images to enhance resolution and recover high-frequency details.
○ Produces high-quality super-resolved (SR) facial images with improved clarity.
➢ Facial Recognition (FaceNet):
○ Utilizes FaceNet [14] deep neural network for facial recognition.
○ Processes SR and high-resolution (HR) reference images to generate embeddings.
○ Compares embeddings to compute distance scores for accurate recognition.
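The embedding-comparison step can be illustrated with a small sketch. It assumes embeddings are plain lists of floats, that distance is Euclidean on L2-normalised vectors, and that a probe face is assigned to the nearest HR reference. The function names, the `threshold` value, and the toy 3-D vectors in the usage note are hypothetical and not taken from the slides; real FaceNet [14] embeddings have far more dimensions.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [v / norm for v in vec]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(
    probe: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    threshold: float = 1.0,  # hypothetical cut-off, would need tuning
) -> Tuple[Optional[str], float]:
    """Return (identity, distance) of the nearest reference, or (None, distance)
    when even the nearest reference is farther than `threshold`."""
    probe_n = l2_normalize(probe)
    best_id, best_dist = None, float("inf")
    for identity, ref in gallery.items():
        dist = euclidean(probe_n, l2_normalize(ref))
        if dist < best_dist:
            best_id, best_dist = identity, dist
    if best_dist > threshold:
        return None, best_dist
    return best_id, best_dist


def accuracy_percent(probes, gallery, threshold: float = 1.0) -> float:
    """Percentage of (embedding, true_id) probes assigned the correct identity."""
    correct = sum(
        1 for emb, true_id in probes
        if identify(emb, gallery, threshold)[0] == true_id
    )
    return 100.0 * correct / len(probes)
```

For example, with `gallery = {"a": [1, 0, 0], "b": [0, 1, 0]}` the probe `[0.9, 0.1, 0]` is matched to `"a"`, while `[0, 0, 1]` is equally far from both references and is rejected by the threshold.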
FaceNet Architecture

Figure 8: FaceNet Architecture
Image Source: https://medium.com/analytics-vidhya/introduction-to-facenet-a-unified-embedding-for-face-recognition-and-clustering-dbdac8e6f02
Other Face Recognition Models
➢ Evaluated two additional models: ArcFace [7] and Dlib [6].
➢ Dlib failed to generate embeddings for most faces.
➢ ArcFace performed slightly better, generating more embeddings successfully.
➢ FaceNet performed the best.

Table 3: Number of embeddings generated by each model on DroneSURF [11] (Morning category)

| Model | No. of Embeddings |
|---|---|
| Dlib [6] | 18064 |
| ArcFace [7] | 27657 |
| FaceNet [14] | 66042 |
| Total | 66633 |
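To make the comparison easier to read, the counts in Table 3 can be turned into coverage percentages. The sketch below assumes that the "Total" row is the number of face crops each model was asked to embed; the slides do not state this explicitly, and the variable and function names are illustrative.

```python
# Counts copied from Table 3 (DroneSURF, Morning category).
TOTAL_FACES = 66633  # assumed to be the number of face crops attempted

EMBEDDINGS = {"Dlib": 18064, "ArcFace": 27657, "FaceNet": 66042}


def coverage_percent(count: int, total: int = TOTAL_FACES) -> float:
    """Share of faces for which a model produced an embedding, in percent."""
    return round(100.0 * count / total, 2)


coverage = {model: coverage_percent(n) for model, n in EMBEDDINGS.items()}

# Print models from lowest to highest coverage.
for model, pct in sorted(coverage.items(), key=lambda kv: kv[1]):
    print(f"{model:8s} {pct:6.2f}% of faces embedded")
```

Under that assumption, FaceNet embeds nearly every face while Dlib and ArcFace each cover fewer than half, which is consistent with FaceNet being chosen for the recognition stage.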
Experimental Setup
Hardware Configuration
● GPUs: NVIDIA RTX 4090 GPUs with CUDA support for accelerated computation.
● CPU: High-performance multi-core CPU for handling preprocessing and data management tasks.
● Storage: High-capacity SSDs for fast data access and storage of large datasets and model
checkpoints.

Software Configuration
● Operating System: Ubuntu 20.04 LTS
● Deep Learning Frameworks: TensorFlow and PyTorch for implementing and training models.
● CUDA and cuDNN: NVIDIA CUDA and cuDNN libraries for GPU acceleration.
● Python: Python 3.10 as the primary programming language, with various libraries for data handling
and preprocessing.
Illustrative Result

Figure 9: Illustrative result (panels: Input, Cropped, Enhanced)


Result

Table 4: Accuracy on CVBL-CCTV-FACE Dataset

| S.No | Super-Resolution Model | Accuracy, pretrained (%) | Accuracy, fine-tuned (%) |
|---|---|---|---|
| 1 | Without Super Resolution | 64.93 | -- |
| 2 | CodeFormer [13] | 58.23 | 68.62 |
| 3 | GFPGAN [12] | 61.54 | 70.12 |
| 4 | ESRGAN [15] | 60.13 | 61.23 |
| 5 | Real-ESRGAN [16] | 59.39 | 62.97 |


Result

Table 5: Accuracy on DroneSURF [11] Dataset

| S.No | Super-Resolution Model | Accuracy, pretrained (%) | Accuracy, fine-tuned (%) |
|---|---|---|---|
| 1 | Without Super Resolution | 42.81 | -- |
| 2 | GFPGAN [12] | 55.23 | 69.16 |
| 3 | CodeFormer [13] | 54.26 | 59.56 |
| 4 | ESRGAN [15] | 32.45 | 35.30 |
| 5 | Real-ESRGAN [16] | 38.61 | 41.23 |
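The effect of each super-resolution model is easier to judge as a change in percentage points relative to the "Without Super Resolution" row. The sketch below only re-uses the numbers from Tables 4 and 5; the dictionary layout and function names are illustrative assumptions.

```python
# Baseline accuracy (%) without super-resolution, from Tables 4 and 5.
BASELINE = {"CVBL-CCTV-FACE": 64.93, "DroneSURF": 42.81}

# (pretrained, fine-tuned) accuracy in percent, from Tables 4 and 5.
RESULTS = {
    "CVBL-CCTV-FACE": {
        "CodeFormer": (58.23, 68.62),
        "GFPGAN": (61.54, 70.12),
        "ESRGAN": (60.13, 61.23),
        "Real-ESRGAN": (59.39, 62.97),
    },
    "DroneSURF": {
        "GFPGAN": (55.23, 69.16),
        "CodeFormer": (54.26, 59.56),
        "ESRGAN": (32.45, 35.30),
        "Real-ESRGAN": (38.61, 41.23),
    },
}


def gains(dataset: str) -> dict:
    """Change in accuracy (percentage points) versus the no-SR baseline,
    as (pretrained_gain, fine_tuned_gain) per model."""
    base = BASELINE[dataset]
    return {
        model: (round(pre - base, 2), round(ft - base, 2))
        for model, (pre, ft) in RESULTS[dataset].items()
    }


def best_finetuned(dataset: str) -> str:
    """Model with the highest fine-tuned accuracy on a dataset."""
    scores = RESULTS[dataset]
    return max(scores, key=lambda model: scores[model][1])


for name in RESULTS:
    print(name, "best fine-tuned model:", best_finetuned(name))
    for model, (pre_gain, ft_gain) in gains(name).items():
        print(f"  {model:12s} pretrained {pre_gain:+.2f}  fine-tuned {ft_gain:+.2f}")
```

Read this way, the tables show that on CVBL-CCTV-FACE every pretrained model scores below the baseline and only fine-tuned CodeFormer and GFPGAN exceed it, whereas on DroneSURF GFPGAN and CodeFormer improve on the baseline even before fine-tuning.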


Result
Table 6: Accuracy of Unpaired SR [17] on CVBL-CCTV-FACE Dataset

| S.No | Model Name | R-Acc (%) | Pretrained (%) | Fine-tuned (%) |
|---|---|---|---|---|
| 1 | Unpaired SR 20x20 | 5.71 | 14.56 | 32.56 |
| 2 | Unpaired SR 32x32 | 11.58 | 16.46 | 40.76 |
| 3 | Unpaired SR 50x50 | 38.80 | 34.15 | 43.73 |

Table 7: Accuracy of Unpaired SR [17] on DroneSURF [11] Dataset

| S.No | Model Name | R-Acc (%) | Pretrained (%) | Fine-tuned (%) |
|---|---|---|---|---|
| 1 | Unpaired SR 20x20 | 8.24 | 12.23 | 17.14 |
| 2 | Unpaired SR 32x32 | 18.01 | 21.28 | 32.71 |
| 3 | Unpaired SR 50x50 | 35.60 | 33.26 | 41.20 |


Conclusion

➢ This study combined super-resolution techniques with the FaceNet [14] facial recognition system to enhance human identification in low-resolution CCTV footage.
➢ By evaluating super-resolution models on the CVBL-CCTV-FACE and DroneSURF [11] datasets, we demonstrated improvements in recognition accuracy after fine-tuning: with GFPGAN [12], accuracy rose from 64.93% to 70.12% on CVBL-CCTV-FACE and from 42.81% to 69.16% on DroneSURF.
➢ Notably, GFPGAN [12] achieved the highest accuracy, and the Unpaired SR [17] and CodeFormer [13] models showed substantial gains with higher-resolution inputs.
Future Scope
➢ The super-resolution model and recognition model can be fused for better
accuracy.
➢ The models can be made lightweight so that they can be deployed on any device.
➢ Size of CVBL-CCTV-Face dataset can be increased.
References
[1] "How AI-Based Super-Resolution Enhances CCTV Footage," asmag.com, 2024. Available at: https://www.asmag.com/showpost/30408.aspx. Accessed: 12-July-2024.

[2] chowdhuri, D., K-S, S., M, R. & pradeep reddy, C. (2012). Very Low Resolution Face Recognition Problem. IEEE Transactions on
Image Processing, 21(1):327–340. doi: 10.1109/tip.2011.2162423

[3] Taigman, Y., Yang, M., Ranzato, M., & Wolf, L. (2014). DeepFace: Closing the gap to human-level performance in face verification. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Parkhi, O. M., Vedaldi, A., & Zisserman, A. (2015). Deep Face Recognition. In British Machine Vision Conference (BMVC).

[5] Cao, Q., Shen, L., Xie, W., Parkhi, O. M., & Zisserman, A. (2018). VGGFace2: A dataset for recognising faces across pose and age.
International Conference on Automatic Face and Gesture Recognition (FG).

[6] King, D. E. (2009). Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research, 10, 1755-1758.

[7] Deng, J., Guo, J., Xue, N., & Zafeiriou, S. (2019). ArcFace: Additive angular margin loss for deep face recognition. Proceedings of
the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Chen, Y., Wen, F., Zhu, W., Zhang, Y., & Wang, Z. (2018). MobileFaceNets: Efficient CNNs for accurate real-time face verification on
mobile devices. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] OpenCV: Open Source Computer Vision Library. https://opencv.org/


References
[10] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, Yu Qiao, "Joint Face Detection and Alignment Using Multi-task Cascaded Convolutional
Networks", IEEE Signal Processing Letters, 2016.

[11] I. Kalra, M. Singh, S. Nagpal, R. Singh, M. Vatsa, and P. B. Sujit, "DroneSURF: Benchmark Dataset for Drone-based Face Recognition," in
IEEE Xplore. IIIT-Delhi, India.

[12] Xintao Wang, Yu Li, Honglun Zhang, Ying Shan, "Towards Real-World Blind Face Restoration with Generative Facial Prior," in Proceedings of
the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021.

[13] Qing Guo, Xiaoming Li, Yulun Zhang, Yun Fu, Thomas H. Li, "CodeFormer: Towards Robust Face Restoration with Codebook Lookup
Transformer," in Proceedings of the 30th ACM International Conference on Multimedia (MM '22), 2022.

[14] Florian Schroff, Dmitry Kalenichenko, James Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering," in Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

[15] Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, Xiaoou Tang, "ESRGAN: Enhanced
Super-Resolution Generative Adversarial Networks," in Proceedings of the European Conference on Computer Vision Workshops (ECCVW),
2018.

[16] Xintao Wang, Liangbin Xie, Chao Dong, Ying Shan, "Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data,"
in Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2021.

[17] Bulat, A., Yang, J., and Tzimiropoulos, G. (2021). "To learn image super-resolution, use a GAN to learn how to do image degradation first."
arXiv preprint arXiv:2003.04047.
Thank You
