2021 3rd International Conference on Advances in Computing, Communication Control and Networking (ICAC3N)

Masked Face Recognition using Deep Learning Model

DOI: 10.1109/ICAC3N53548.2021.9725368 | 978-1-6654-3811-7/21/$31.00 ©2021 IEEE

Manoj Kumar
Department of Computer Science & Engineering
Delhi Technological University
Delhi, India
[email protected]

Rachit Mann
Department of Software Engineering
Delhi Technological University
Delhi, India
[email protected]

Abstract— Face masks have become part of our day-to-day activities, but they create a problem for existing face recognition techniques. Existing face recognition techniques vary from simple ML techniques such as SVM, PCA, etc. to state-of-the-art models such as ResNet, VGG, etc. In this paper, we study the effects of masked faces on the performance of face recognition techniques. Face recognition itself is divided into face verification and face identification. This study is performed for the face identification task using different deep learning models. In particular, we study pre-trained deep learning models that are trained using transfer learning for the face identification task. A custom dataset consisting of 65 subjects is used; it is a subset of the VGGFace2 dataset in which the subjects' faces are augmented with masks, and it does not contain faces without masks. We use different popular pre-trained models such as VGG16, InceptionV3, etc., re-train them on the custom dataset, and analyze the gathered results. Apart from the pre-trained models, a new model is proposed for the masked face identification task.

Keywords— VGG, SVM, PCA, ResNet, Inception, InceptionResNet, MobileNet, RggNet, Transfer Learning

I. INTRODUCTION

In the AI field, deep learning is being used to solve a variety of problems in domains such as image pattern analysis. One such field is face recognition, currently one of the most popular applications, whether in a traffic ticketing system or personal device authentication. Face recognition is the process of identifying or verifying a person in videos and images. A basic face recognition process requires a source providing an input image and matches that input against a set of faces available in a database. The face recognition process has become more sophisticated with the use of modern approaches utilizing machine learning.

Face recognition techniques vary from simple ML techniques to complex neural networks, with state-of-the-art systems based on deep learning models. In face recognition, a mask on the face results in false predictions. The majority of face recognition systems require the complete face to identify an individual, and face masks cover most of the facial features that act as unique identifiers for existing systems. The nose and mouth are two important identifiers that are hidden under a mask, resulting in false predictions by existing face recognition techniques. This study has been performed to understand the resulting degradation in performance, examining deep learning models that are available for transfer learning. Early ML techniques require features to be extracted from the inputs, but with deep learning this step is performed automatically: deep learning models extract the features of the face from the input images and utilize these features to identify or verify the identity of a person. We use masked faces only, instead of a combination of the actual face and the masked face, to understand the performance of the models under observation. We analyzed the deep learning models available for transfer learning and chose the top seven model architectures for the study. The selected models are trained and tested using transfer learning, with the input and prediction layers tuned for the dataset under observation. Apart from these models, we introduce our own model based on the ResNet architecture.

This paper is organized in the following manner. Section II covers the existing work in the face recognition field with respect to deep learning and masked faces, along with the gaps this paper addresses. Section III describes the proposed architecture and the dataset selected for this study, along with the performance criteria and the pre-trained models selected for comparison. Section IV includes the plots and tables recorded after the execution of the experiment. Section V presents the conclusion of the study and possible extensions for future work.

II. RELATED WORK

Initially, face recognition techniques used machine learning methods such as PCA, NMF, and SVM. An NMF-based framework for partial face recognition achieved 95.17% accuracy by using SFNMF for the bottom region of the face, i.e., faces with occlusion over the eye region [1]. A similar NMF framework was used to detect occlusion on faces, with 91.9% accuracy on images with sunglasses but dropping to 79.4% with scarves [2]. MCF-based methods utilized prior knowledge to compensate for occlusion on the faces [3]. An early deep learning approach used very few layers (five, to be precise) to detect occluded faces in videos; this five-layer model achieved an F1-score of 0.803 with recall of 0.925 and precision of 0.71, reported higher than other techniques under consideration such as SVM and AdaBoost [4]. MaskNet was among the first deep learning models to achieve an accuracy of 96.6%, reported better than traditional approaches used for face recognition, but this approach was not designed to test face recognition on masked faces [5]. A similar deep learning approach was pursued with the intent of an optimized solution: the system introduced batch normalization in the initial and final layers along with a softmax classifier on the fully connected layers, yielding higher accuracy than existing systems [6]. The ideas discussed so far used deep learning models for face recognition, but none of them suggested using deep learning models to handle occlusion on faces. Wan, W. & Chen, J. proposed using deep learning models for occluded faces, and MaskNet was proposed as a result of their study.

MaskNet provided a clear distinction between occluded and non-occluded areas of the face. Occlusion introduces corruption into the training process; to tackle this, MaskNet assigned more weight to non-occluded regions, thereby decreasing the impact of occluded areas during training. As a result, an improvement of 1.4% to 3% was observed [7]. Guo, G. & Zhang, N. presented the various challenges of the face recognition field in unrestricted environments for deep learning models. The effect of input image quality on results was studied, and it was found that higher-quality images yield a higher recognition rate compared to medium- and low-quality images, where the improvement was marginal. One of the biggest challenges unraveled was keeping the level of quality in check for better recognition [8]. Another important factor affecting the recognition rate is the set of features available on the face. Elmahmudi, A., & Ugail, H. studied the effect of partial faces on facial recognition: partial parts such as the cheek, nose, forehead, and mouth have low recognition rates, whereas for the top half of the face, the right or left half, and three-quarters of the face, recognition rates can reach 100%. That experiment utilized the VGG-Face architecture in combination with SVM [9], and it led us to design another experiment with different deep learning models. Individual models such as InceptionResNetV1 [10], AlexNet [11], and YOLOv3 [12] have been studied for occlusion-based face recognition. The MaskedFace-Net dataset [13] was created in the COVID-19 context and consists of good-quality images labeled as properly and improperly masked. An attention-based approach (CBAM) [14] utilized cropping of masked faces in a fixed ratio; 0.7 was found to be the most optimal ratio for cropping the faces and obtaining the maximum performance from the models on the test cases used in that experiment. The review paper by Adjabi, I., Ouahabi, A., Benzaoui, A., and Taleb-Ahmed, A. [15] presented the latest open challenges in the field of face recognition, and occlusion-based face recognition remains a big open challenge. To accommodate occlusion, we use masked faces: instead of feeding actual faces along with masked images to the network, we feed only the masked (and cropped) images to understand and study the performance of the models under consideration.

III. EXPERIMENTAL SETUP

In this experiment, knowledge of the actual (unmasked) face is not used, in order to test the capabilities of the models when only masked faces are available. This section is divided into four sub-sections. Sub-section A contains the details of the proposed architecture used in this study, sub-section B describes the dataset, sub-section C describes the models under observation, and sub-section D lists the performance metrics utilized for the study.

A. Proposed RggNet Model Architecture

In the experiment, a modified version of the ResNet architecture is utilized to improve upon the existing models. The modified version adds layers to the shortcut paths of a basic ResNet block: a convolution layer is added to the shortcut path, introduced in such a fashion that the shortcut paths in combination resemble a basic VGG block. This enables the proposed model to learn an identity function that ensures the higher layers perform better than the lower layers; the higher layers achieve a better understanding of the features provided by the lower layers due to the layer introduced in the shortcut path. Figure 1 represents the proposed RggNet model architecture. The RggNet model has three sub-blocks arranged in the ResNet50V2 manner. The identity block of the original ResNet has been modified: our identity block, represented as IDENTITY*, has a convolution layer in place of the direct shortcut available in the original ResNet50V2 model.

Fig. 1. RggNet architecture and its sub-blocks: (a) CONV block, (b) IDENTITY* block, (c) Pool block, (d) RggNet model.
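To make the modification concrete, the sketch below shows one way the IDENTITY* block described above could be written with the Keras functional API, assuming 3×3 convolutions and pre-activation (batch normalization + ReLU) in the ResNet50V2 style; the filter counts, kernel sizes, and layer ordering are illustrative assumptions rather than the exact configuration used in the paper.

```python
# Illustrative sketch of the IDENTITY* block (assumed hyper-parameters):
# a pre-activation residual branch, but with a convolution in the shortcut
# path instead of a direct identity skip, so the two paths together
# resemble a small VGG-style stack.
import tensorflow as tf
from tensorflow.keras import layers

def identity_star_block(x, filters, kernel_size=3):
    """Residual-style block whose shortcut path contains a convolution."""
    # Main branch: two conv layers with pre-activation, as in ResNet50V2.
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)

    # Shortcut branch: a convolution replaces the direct identity connection.
    shortcut = layers.Conv2D(filters, kernel_size, padding="same")(x)

    return layers.Add()([y, shortcut])

# Example usage on a feature map of shape (56, 56, 64).
inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = identity_star_block(inputs, filters=64)
block = tf.keras.Model(inputs, outputs)
```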


B. Dataset

In the experiment, a modified version of the VGGFace2 dataset is utilized. Part of the dataset is pre-processed so that masks are augmented onto the faces of the individuals. The augmentation of masks takes place in four steps:

(a) Finding facial landmarks using the face-alignment library.
(b) Triangulating the extracted mask points to generate a mask database.
(c) Augmenting the masks on the subjects using the facial landmarks for pose estimation.
(d) Cropping the masked faces in the ratio of 0.7 [14].

The custom dataset finally contains 22,647 images belonging to 65 classes. Hold-out validation is used in the ratio 8:2: the training set consists of 18,092 images and the validation set consists of 4,555 images. The dataset has been generated under the assumption that face images without the mask are not available.
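As an illustration of step (d), the snippet below crops an aligned face image so that only the upper fraction of a detected face box is kept. The reading of the 0.7 ratio as "retain the top 70% of the face box height" is our assumption based on [14], and the face box and file name are hypothetical values rather than details taken from the paper.

```python
# Sketch of the ratio-based cropping in step (d), assuming the 0.7 ratio
# means "keep the top 70% of the face bounding box" (our reading of [14]).
from PIL import Image

def crop_masked_face(image_path, face_box, ratio=0.7):
    """Crop an image to the upper `ratio` portion of a face bounding box.

    face_box is (left, top, right, bottom) in pixel coordinates, e.g. as
    derived from the facial landmarks found in step (a).
    """
    left, top, right, bottom = face_box
    cropped_height = int((bottom - top) * ratio)
    image = Image.open(image_path)
    # Keep only the upper part of the face, the region not hidden by the mask.
    return image.crop((left, top, right, top + cropped_height))

# Hypothetical usage: a 160x160 face box located at (20, 30).
# cropped = crop_masked_face("subject_0001.jpg", face_box=(20, 30, 180, 190))
```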
C. Models Under Observation

In this experiment, pre-trained deep learning models are re-trained using transfer learning for masked face recognition. The following pre-trained models and the proposed model are under observation:

• ResNet50V2
• VGG16
• InceptionV3
• Xception
• InceptionResNetV2
• MobileNetV2
• DenseNet201
• RggNet (our proposed model)

TABLE I. MODELS AND THEIR PARAMETERS

  Model               Parameters
  RggNet              43,269,953
  ResNet50V2          25,613,800
  VGG16               138,357,544
  InceptionV3         23,851,784
  Xception            22,910,480
  InceptionResNetV2   55,873,736
  MobileNetV2         3,538,984
  DenseNet201         20,242,984
Table II lists the various hyper-parameters used for training and testing the pre-trained models in the proposed experiment.

TABLE II. HYPERPARAMETERS OF THE PRE-TRAINED MODELS

  Parameter                           Description
  Validation Strategy                 Hold-out validation in ratio 8:2 for training and testing
  Pre-Training Dataset                ImageNet
  Batch Size                          32
  Optimizer                           Adam optimizer
  Loss Function                       Categorical cross-entropy
  Maximum number of training epochs   25
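The following sketch shows one way such a transfer-learning setup could be assembled in Keras for the 65-class masked-face dataset, using the hyper-parameters from Table II. The 224×224 input size, the GlobalAveragePooling classification head, and the directory-based data loading are illustrative assumptions, not details reported in the paper.

```python
# Sketch of the transfer-learning setup implied by Table II: an ImageNet
# pre-trained backbone whose prediction layer is replaced by a 65-class
# softmax, trained with Adam and categorical cross-entropy.
import tensorflow as tf

NUM_CLASSES = 65

base = tf.keras.applications.ResNet50V2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="categorical_crossentropy",
    metrics=[
        "accuracy",
        tf.keras.metrics.TopKCategoricalAccuracy(k=5, name="top5_accuracy"),
        tf.keras.metrics.AUC(name="auc"),
        tf.keras.metrics.Precision(name="precision"),
        tf.keras.metrics.Recall(name="recall"),
    ],
)

# Hypothetical directory layout: one sub-folder of masked-face images per subject.
# train_ds = tf.keras.utils.image_dataset_from_directory(
#     "masked_faces/train", image_size=(224, 224), batch_size=32,
#     label_mode="categorical")
# model.fit(train_ds, epochs=25)
```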
D. Performance Metrics

Table III lists the performance metrics used for analyzing the performance of the models.

TABLE III. PERFORMANCE METRICS

  Metric           Description
  Accuracy         The ratio of correct outcomes to the total outcomes of the experiment.
  Top-5 Accuracy   The ratio of outcomes in which the actual class is among the top 5 predictions made.
  AUC              The approximate area under the ROC curve, calculated using the Riemann sum method.
  Precision        The ratio of correct positive outcomes to the total predicted positive outcomes for a class, i.e., how many predicted positives are actually correct.
  Recall           The ratio of correct positive outcomes to the total actual positives for a class, i.e., how many actual instances of a class are correctly predicted.

Fig. 2. Confusion matrix along with performance metrics for a class.

Since accuracy alone cannot define the performance of a model, AUC, precision, and recall are also chosen for this experiment. Precision and recall, in combination with accuracy, help to understand the nature of the outcomes. AUC helps determine the power of the model in terms of understanding the dataset: an AUC of 0.5 means the predictions made by the model are no better than those of a random model, so AUC indicates the usefulness of the model.
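A minimal sketch of how these metrics could be computed from model outputs with scikit-learn is shown below; the variable names, the toy three-class arrays, and the macro averaging across classes are illustrative assumptions.

```python
# Sketch of computing the Table III metrics from predictions (assumed
# variable names; macro averaging across classes as an example choice).
import numpy as np
from sklearn.metrics import (accuracy_score, top_k_accuracy_score,
                             roc_auc_score, precision_score, recall_score)

# y_true: integer class labels, shape (n_samples,)
# y_prob: predicted class probabilities, shape (n_samples, n_classes)
y_true = np.array([0, 2, 1])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.2, 0.5, 0.3]])
y_pred = y_prob.argmax(axis=1)

print("accuracy :", accuracy_score(y_true, y_pred))
print("top-k    :", top_k_accuracy_score(y_true, y_prob, k=2))  # k=5 in the real experiment
print("auc      :", roc_auc_score(y_true, y_prob, multi_class="ovr"))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
```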
IV. RESULTS AND DISCUSSION

After the successful execution of the experiment, the following results were obtained.

A. Performance of the models under observation

Table IV presents the outcome of the execution of the experiment.


TABLE IV. PERFORMANCE RESULTS OF DIFFERENT TRAINED MODELS

  Model                Accuracy  Top-5 Accuracy  AUC    Precision  Recall
  RggNet (our model)   0.608     0.841           0.953  0.777      0.519
  ResNet50V2           0.506     0.774           0.770  0.507      0.506
  VGG16                0.465     0.740           0.915  0.694      0.350
  InceptionV3          0.357     0.657           0.777  0.385      0.346
  Xception             0.363     0.643           0.726  0.371      0.360
  InceptionResNetV2    0.436     0.714           0.801  0.462      0.425
  MobileNetV2          0.400     0.678           0.726  0.403      0.399
  DenseNet201          0.467     0.740           0.785  0.477      0.464

B. Important observations based on the performance metric values obtained

From the above table, we can see that our model (RggNet) has performed better than the rest of the models in every aspect. The following plots have been generated based on the results:

Fig. 3. Accuracy of different models.
Fig. 4. Top-5 accuracy of different models.
Fig. 5. AUC of different models.
Fig. 6. Precision of different models.
Fig. 7. Recall of different models.

Based on the above plots, the following observations are made for the pre-trained models:


a) ResNet50V2 outperforms every other pre-trained model in terms of accuracy and top-5 accuracy while also maintaining high precision and recall.

b) In terms of precision, VGG16 has the highest value, i.e., VGG16 produces very few false positive predictions. The AUC value of VGG16 is also very high (>0.9), indicating that it is very capable of differentiating between positive and negative instances, or in this case, of picking the person out of the crowd.

c) ResNet50V2 is found to be more capable of masked face recognition than VGG16 when all the performance factors are taken into consideration, as it scores highest among the pre-trained models in 3 out of 5 performance criteria and second highest in one of the remaining criteria.

Now considering our model for comparison, the following observations are made:

a) RggNet combines the advantages of both the ResNet50V2 and the VGG16 models.

b) RggNet outperforms every other model in every performance criterion.

c) The top-5 accuracy of RggNet is more than 0.8, meaning that the actual identity of the person under observation is usually among the top-5 predictions made by our model.

d) In terms of precision, RggNet has a value greater than 0.75, i.e., RggNet has a very low false prediction rate over the positive instances.

e) The AUC of 0.953 for RggNet shows that the predictions made by the model are not random; the model has identified the features needed to predict the identity of the person.

V. CONCLUSION AND FUTURE SCOPE

Existing pre-trained deep learning models have been studied and compared in this study, and it has been found that ResNet50V2 has the most potential to solve the problem of masked faces compared to any other pre-trained deep learning model available. The proposed architecture derived from ResNet has shown better accuracy and other performance parameters compared to all the pre-trained models under observation. Along with this, the performance of different deep learning models has been studied, and a baseline has been generated for the state-of-the-art pre-trained models to build upon.

In the future, optimization of the existing deep learning models for performance improvement can be performed, and the proposed model can be enhanced further by the use of layer optimization in future iterations. Currently, transfer learning is utilized for faster training; further fine-tuning of layers can be performed, i.e., the scope of fine-tuning these models can be studied in the future. The results can act as a baseline for developing a better deep learning model for the masked face recognition problem without using knowledge of the actual face.

REFERENCES

[1] H. F. Neo, C. C. Teo, and A. B. J. Teoh, "Development of Partial Face Recognition Framework," Seventh International Conference on Computer Graphics, Imaging and Visualization, 2010.
[2] Y. Su, Y. Yang, Z. Guo, and W. Yang, "Face recognition with occlusion," 3rd IAPR Asian Conference on Pattern Recognition (ACPR), 2015.
[3] E. J. He, J. A. Fernandez, B. V. K. V. Kumar, and M. Alkanhal, "Masked correlation filters for partially occluded face recognition," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016.
[4] S. Lin, L. Cai, X. Lin, and R. Ji, "Masked face detection via a modified LeNet," Neurocomputing, vol. 218, pp. 197–202, 2016. [Online]. Available: https://dx.doi.org/10.1016/j.neucom.2016.08.056
[5] M. Wang, Z. Wang, and J. Li, "Deep convolutional neural network applies to face recognition in small and medium databases," 4th International Conference on Systems and Informatics (ICSAI), 2017.
[6] M. Coskun, A. Ucar, O. Yildirim, and Y. Demir, "Face recognition based on convolutional neural network," International Conference on Modern Electrical and Energy Systems (MEES), 2017.
[7] W. Wan and J. Chen, "Occlusion robust face recognition based on mask learning," IEEE International Conference on Image Processing (ICIP), 2017.
[8] G. Guo and N. Zhang, "What Is the Challenge for Deep Learning in Unconstrained Face Recognition?" 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG), 2018.
[9] A. Elmahmudi and H. Ugail, "Experiments on Deep Face Recognition Using Partial Faces," International Conference on Cyberworlds (CW), 2018.
[10] G. Wu, J. Tao, and X. Xu, "Occluded Face Recognition Based on the Deep Learning," Chinese Control And Decision Conference (CCDC), 2019.
[11] S. Khan, E. Ahmed, M. H. Javed, S. A. A. Shah, and S. U. Ali, "Transfer Learning of a Neural Network Using Deep Learning to Perform Face Recognition," International Conference on Electrical, Communication, and Computer Engineering (ICECCE), Swat, Pakistan, 2019, pp. 1-5.
[12] M. R. Bhuiyan, S. A. Khushbu, and M. S. Islam, "A Deep Learning Based Assistive System to Classify COVID-19 Face Mask for Human Safety with YOLOv3," 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), 2020.
[13] A. Cabani, K. Hammoudi, H. Benhabiles, and M. Melkemi, "MaskedFace-Net – A dataset of correctly/incorrectly masked face images in the context of COVID-19," Smart Health, vol. 19, 2021.
[14] Y. Li, K. Guo, Y. Lu, and L. Liu, "Cropping and attention based approach for masked face recognition," Applied Intelligence, 2021.
[15] I. Adjabi, A. Ouahabi, A. Benzaoui, and A. Taleb-Ahmed, "Past, Present, and Future of Face Recognition: A Review," Electronics, vol. 9, 1188, 2020.
