2023 The 8th International Conference on Computer and Communication Systems

A Comparative Study of Machine Learning and Automatic Machine Learning Models for Facial Mask Recognition

Yuxin Pei
dept. Glasgow College
UESTC
Chengdu, China

2023 8th International Conference on Computer and Communication Systems (ICCCS) | 978-1-6654-5612-8/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICCCS57501.2023.10151333

Abstract: In this study, we compare the performance of traditional machine learning models and automatic machine learning models in facial mask recognition. We use a dataset of images of individuals wearing masks and not wearing masks to train and test the models. We evaluate the models based on accuracy, precision, recall, and F1 score metrics. Our results show that automatic machine learning models achieve similar or slightly better performance than traditional machine learning models but at the cost of longer training time. This study's results can help practitioners select the appropriate model for facial mask recognition based on the trade-off between accuracy and efficiency. Additionally, this study provides insights into the potential of automatic machine learning models in computer vision tasks, specifically in facial mask recognition. The study concludes that while the traditional machine learning models may be more computationally efficient, the automatic machine learning models can offer comparable or better performance in facial mask recognition.

Keywords: machine learning; facial mask recognition; automatic machine learning

I. INTRODUCTION

Facial mask recognition has become important in the current global scenario due to the COVID-19 pandemic [1]. The ability to automatically detect whether a person is wearing a mask in public spaces can help enforce mask-wearing regulations and identify individuals who are not complying with safety guidelines [2]. The use of machine learning models has been widely explored for this task. Still, the recent emergence of automatic machine learning (AutoML) models has raised questions about their effectiveness and efficiency compared to traditional machine learning models [3]. This study aims to compare the performance of traditional machine learning models and automatic machine learning models in facial mask recognition.

The use of machine learning models for facial mask recognition is not new, and researchers have proposed various deep neural network models such as VGG16 [4], ResNet [5], and Inception [6]. With the advancements in computer vision and machine learning, the idea of using automatic machine learning models for this task has emerged [7]. AutoML models are designed to select, optimize, and fine-tune the best machine learning algorithm for a given task. This can save practitioners time and resources and improve the performance of models [8].

However, there is limited research comparing traditional and AutoML models for facial mask recognition. Furthermore, these models' trade-offs between accuracy and efficiency have not been fully explored [9]. This study aims to fill this gap by comparing the performance of traditional machine learning models and automatic machine learning models in facial mask recognition and evaluating the trade-offs between accuracy and efficiency [10].

The results of this study can have practical implications for practitioners in selecting the appropriate model for facial mask recognition based on the trade-off between accuracy and efficiency. Additionally, this study provides insights into the potential of automatic machine learning models in computer vision tasks, specifically facial mask recognition.

II. LITERATURE REVIEW

Existing facial mask recognition algorithms can be divided into two categories. The first type is used at entrances and exits: the person being checked actively participates, and only a single face target is detected at a time. This application scenario is called the active scenario, which has high requirements for algorithm accuracy [11]. Currently, most research adopts the idea of feature extraction and image classification to solve it. The second type is applied in public places with dense crowds. Surveillance cameras automatically capture pedestrians, who take part in detection without being aware of it [1]. A single image includes many face targets. This kind of scenario is called the passive scenario. It has a complex background and a wide detection range, so detection is easily disturbed, and it places high demands on the real-time performance of the algorithm [12]. Applications in this scenario are mainly realized with an object detection model.

An algorithm for active scenarios combines deep convolutional networks with machine learning classification. Oumina et al. were the first to study this approach: they selected VGG-19 [13], Xception [14], and MobileNetV2 [15], each combined with a support vector machine and with K-nearest neighbor classification, to conduct experiments. The accuracy of the detection results was more than 94%, proving this method's feasibility. The comparison showed that the MobileNetV2 + support vector machine combination had the

best detection effect on the mask face data set, with an accuracy of 97.11% [16]. This kind of algorithm has outstanding advantages in detection accuracy. However, because several models are stacked and combined, the modeling time is long and the process is complicated. The two-stage network design increases the computational complexity, and the inference speed is slow, so it generally cannot meet real-time requirements [17]. Detection of mask wearing in passive scenarios also plays an important role in public health. Detection in passive scenarios is mainly carried out on dense crowds. It needs to accurately locate multiple face targets in the image and show whether masks are worn or not. Most of these face targets have different scales, and some of them are small, which brings great challenges to the implementation of the algorithm [18].

Passive scenarios, on the other hand, involve the automatic capture of individuals in crowded public spaces, with a focus on real-time performance and the ability to detect multiple face targets in a single image. The two-stage model Faster R-CNN is one of the mainstream object detection frameworks and is often adapted to handle special cases, especially multi-scale targets. Both its detection accuracy and its speed are strong. Mask-wearing detection is no exception, and many researchers have preferred Faster R-CNN as the infrastructure on which to build their models [19]. Through experiments, they found that the Faster R-CNN model could not deal with the complex background and target occlusion of the passive scenario. The detection of small-scale masked face targets was not very satisfactory [20]. When the VGG feature extraction network used in the original model was replaced with ResNet-101, which has deeper layers and less computation, the average accuracy increased from 77.2% to 89.41%. However, the detection ability for small-scale targets was still not much improved [21].

III. MACHINE LEARNING MODELS

A. Faster R-CNN

Faster R-CNN is a widely used object detection framework that utilizes a two-stage approach to improve its performance. As shown in Figure 1, in the first stage, a region proposal network (RPN) is used to generate candidate regions of interest (RoIs) that may contain objects. In the second stage, these RoIs are passed through a convolutional neural network (CNN) to classify and refine the object bounding boxes.

Figure 1. The framework of Faster R-CNN [20]

One of the key advantages of Faster R-CNN is its ability to handle a wide range of object scales, making it particularly useful for detection tasks involving small or multi-scale objects. Additionally, its two-stage approach allows for a high degree of flexibility in terms of the specific architecture and algorithms used in each stage, allowing for further optimization and fine-tuning of the model.

However, despite its strengths, Faster R-CNN may struggle in certain scenarios such as those with complex backgrounds or heavy occlusion. Additionally, its computational complexity can be an issue, making it less suitable for real-time applications. Overall, Faster R-CNN is a powerful object detection framework that offers a good balance between accuracy and speed, making it a popular choice for many researchers and practitioners in the field.

B. RetinaNet

RetinaNet is a state-of-the-art object detection model that utilizes an architecture known as Feature Pyramid Networks (FPN) and Focal Loss to address the problem of class imbalance in object detection. The FPN architecture is built on top of a backbone CNN such as ResNet and allows for effective feature extraction at multiple scales. This is accomplished by creating a pyramid of feature maps with different resolutions, which are then combined to form a more comprehensive representation of the input image.

Additionally, RetinaNet addresses the problem of class imbalance by introducing a new loss function called Focal Loss, which focuses on hard, misclassified examples and down-weights easy examples. This allows for better convergence and improved performance on small, highly overlapped objects, which are often under-represented in the training data.

RetinaNet is an effective object detection model that utilizes FPNs and Focal Loss to address the problem of class imbalance, which allows for improved performance on small, highly overlapped objects. The model has achieved state-of-the-art performance on various benchmark datasets and is widely used in various applications.

C. YOLOv4

YOLOv4 (You Only Look Once version 4) is a state-of-the-art object detection algorithm that uses a single convolutional neural network (CNN) to perform object detection in images and videos. It is an improvement over previous versions of YOLO and is considered to be one of the fastest and most accurate object detection algorithms currently available.

As shown in Figure 2, the YOLOv4 architecture is based on a series of convolutional layers, followed by upsampling layers and a final output layer. The CNN is trained to predict the bounding boxes and class probabilities for objects in an input image. The key feature of YOLOv4 is its use of anchor boxes, which are pre-defined boxes with different aspect ratios that are used to detect objects of different shapes and sizes.
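To illustrate how pre-defined anchor boxes with different aspect ratios cover objects of different shapes, the following Python sketch builds a small set of equal-area anchors and selects the one whose shape best matches a given box by intersection over union (IoU). It is a minimal illustration only: the function names, the base size of 32 pixels, and the three aspect ratios are assumptions chosen for the example, not YOLOv4's actual anchor configuration.

```python
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height) in pixels


def make_anchors(base_size: float, aspect_ratios: List[float]) -> List[Box]:
    """Build anchors of equal area (base_size ** 2) with width / height = ratio."""
    anchors = []
    for ratio in aspect_ratios:
        width = base_size * ratio ** 0.5
        height = base_size / ratio ** 0.5
        anchors.append((width, height))
    return anchors


def shape_iou(a: Box, b: Box) -> float:
    """IoU of two boxes that share the same centre, so only their shapes differ."""
    intersection = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - intersection
    return intersection / union


def best_anchor(box: Box, anchors: List[Box]) -> int:
    """Index of the anchor whose shape overlaps the given box the most."""
    return max(range(len(anchors)), key=lambda i: shape_iou(box, anchors[i]))


# Hypothetical values: three anchors (tall, square, wide) and a tall, narrow box.
anchors = make_anchors(base_size=32.0, aspect_ratios=[0.5, 1.0, 2.0])
tall_box = (20.0, 45.0)
chosen = best_anchor(tall_box, anchors)
print("chosen anchor index:", chosen)
```

In this toy case the tall, narrow box is matched to the anchor with aspect ratio 0.5, because that anchor overlaps it most. A detector then only has to predict a small correction relative to the matched anchor rather than the full box shape.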

Figure 2. The example of YOLOv4 [22]

YOLOv4 also utilizes a number of techniques to improve its performance, including a new network architecture called SPP-Net, a multi-scale training strategy, and a novel data augmentation technique called Mosaic data augmentation. Additionally, YOLOv4 utilizes a number of techniques to make it more robust to object occlusion, such as a new feature called Cross Mini-Batch Normalization (CmBN) and a newly introduced activation function called Mish.

In terms of its performance, YOLOv4 has been shown to be highly effective in object detection tasks, achieving state-of-the-art results on several benchmark datasets such as COCO, VOC, and ImageNet. Furthermore, YOLOv4 has a very fast inference speed, making it suitable for real-time object detection applications.

In conclusion, YOLOv4 is a powerful and efficient object detection algorithm that is able to achieve state-of-the-art performance while maintaining a fast inference speed. Its features, such as anchor boxes, SPP-Net, multi-scale training, Mosaic data augmentation, CmBN, and the Mish activation function, make it highly effective at detecting objects in images and videos, making it a suitable solution for a wide range of real-world object detection applications.

D. Automatic Machine Learning

EasyDL is a platform that aims to make the process of building, deploying, and maintaining deep learning models more accessible and efficient for both experts and non-experts alike. The platform utilizes a user-friendly interface that allows users to easily upload their data, select the appropriate model architecture, and set the desired hyperparameters. It then automatically optimizes the model architecture and hyperparameters for the given data, reducing the need for manual tuning.

One of the key features of EasyDL is its ability to perform model selection and ensembling, which combines multiple models to improve the overall performance. Additionally, it supports model interpretability, which allows users to understand the model's decision-making process and identify any potential issues.

Furthermore, EasyDL also offers the ability to deploy the trained models to various platforms such as web, mobile, and edge devices with minimal effort. This makes it easy for users to integrate their models into their existing systems and products.

IV. EXPERIMENTS

Experimental results on the MFDD dataset show that EasyDL matches or exceeds the accuracy of the Faster R-CNN, RetinaNet, and YOLOv4 methods, while providing a unified framework for both training and inference. EasyDL achieves 0.98 mAP and 0.77 recall on the test set, outperforming the other models.

A. Datasets

The Masked Face Detection Dataset (MFDD) was created by combining sources from various research studies and images obtained through internet crawling [23]. The collected images were then meticulously annotated with information such as the presence of a mask and the location of the mask on the face. This resulted in a comprehensive dataset comprising 24,771 images of individuals wearing masks. This dataset can be utilized to train a highly accurate model for the task of masked face detection, which can, in turn, aid in masked face recognition.

Figure 3. The example of MFDD [23]

The MFDD dataset has been extensively used in recent studies, providing a benchmark for comparing the performance of different machine learning models for mask recognition. It has been demonstrated that models trained on the MFDD dataset achieve state-of-the-art performance on various mask recognition tasks, making it a valuable resource for researchers and practitioners in the field. The MFDD can also be employed to determine compliance with mask-wearing regulations during widespread disease outbreaks such as the coronavirus pandemic.

B. Metrics

Precision, recall, and mAP are three commonly used performance metrics in object detection tasks such as facial mask detection. These metrics are used to evaluate a model's performance and compare it with other models. Precision and recall are calculated from the numbers of true positives (TP), false positives (FP), and false negatives (FN).

Precision is a metric that measures the proportion of true positive detections among all positive detections. It is calculated as:

Precision = TP / (TP + FP)
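As a rough illustration of the three metrics named at the start of this subsection, the Python sketch below computes precision and recall from TP, FP, and FN counts and approximates the average precision (AP) of one class as the area under its precision-recall curve; mAP is then the mean of AP over classes. The function names and the toy detection list are hypothetical, and the interpolation used here is one common convention, not necessarily the one used to obtain the results in this study.

```python
from typing import List, Tuple


def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP); taken as 0.0 when nothing was detected."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN); taken as 0.0 when there are no ground-truth objects."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def average_precision(detections: List[Tuple[float, bool]],
                      num_ground_truth: int) -> float:
    """Area under the precision-recall curve for one class.

    Each detection is (confidence, is_true_positive). Deciding whether a
    detection is a true positive (for example by an IoU threshold against
    the annotated boxes) is assumed to have happened beforehand.
    """
    if num_ground_truth <= 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    curve = []  # (recall, precision) after each detection, in rank order
    for _, is_true_positive in ranked:
        if is_true_positive:
            tp += 1
        else:
            fp += 1
        curve.append((recall(tp, num_ground_truth - tp), precision(tp, fp)))
    # Replace each precision by the best precision at any equal or higher recall.
    best = 0.0
    envelope = []
    for r, p in reversed(curve):
        best = max(best, p)
        envelope.append((r, best))
    envelope.reverse()
    # Sum the rectangles under the interpolated curve.
    area, previous_recall = 0.0, 0.0
    for r, p in envelope:
        area += (r - previous_recall) * p
        previous_recall = r
    return area


def mean_average_precision(ap_per_class: List[float]) -> float:
    """mAP is the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class) if ap_per_class else 0.0


# Toy example with made-up detections: two annotated masks, three detections.
toy_detections = [(0.9, True), (0.8, False), (0.7, True)]
toy_ap = average_precision(toy_detections, num_ground_truth=2)
print("AP:", round(toy_ap, 4))
```

Lowering the confidence threshold walks down the ranked list, which raises recall while precision may fall; AP summarizes that whole trade-off in one number, which is why it complements a single precision and recall pair.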

A high precision means that the model has a low false positive rate: it correctly identifies objects in the image while minimizing the number of false detections.

The recall metric measures the proportion of true positive detections among all actual positive instances, that is, the ratio of true positive detections to the sum of TP and FN. It is calculated as:

Recall = TP / (TP + FN)

A high recall means that the model has a low false negative rate, which means it can correctly identify most objects in the image. In the facial mask recognition task, a high recall means that the model is able to correctly identify most of the masks in the images and minimize the number of false negatives, where the model fails to detect the mask.

Mean Average Precision (mAP) is a metric that combines precision and recall into a single value. It is calculated by averaging the precision at different recall levels. The precision-recall curve is a plot of precision versus recall at different thresholds. The average precision (AP) of a class is the area under its precision-recall curve, and the mAP is the mean of AP over all classes. A model with a high mAP has both high precision and high recall. In other words, it can correctly identify objects in the image while minimizing the number of false detections and false negatives.

A good model should have a balance of precision, recall, and mAP. A model with high precision but low recall, or vice versa, is not good. In the mask recognition task, a model with high precision and recall is desirable, as it can correctly identify most of the masks in the images while minimizing the number of false detections and false negatives.

C. Results

As shown in Table I, the performance of various object detection algorithms was evaluated using the mAP metric, as well as precision and recall. The results show that the Faster R-CNN algorithm achieved an mAP of 0.87, with a precision of 0.87 and a recall of 0.75. The RetinaNet algorithm yielded an mAP of 0.78, a precision of 0.77, and a recall of 0.64. The YOLOv4 algorithm demonstrated an mAP of 0.89, a precision of 0.86, and a recall of 0.77. Lastly, the EasyDL algorithm achieved an mAP of 0.98, with a precision of 0.90 and a recall of 0.96. These results suggest that the EasyDL algorithm has the highest overall performance, followed by YOLOv4 and Faster R-CNN. RetinaNet has the lowest performance among these algorithms.

TABLE I. MODELS PERFORMANCE

Model           mAP     Precision   Recall
Faster R-CNN    0.87    0.87        0.75
RetinaNet       0.78    0.77        0.64
YOLOv4          0.89    0.86        0.77
EasyDL          0.95    0.90        0.96

V. CONCLUSION

In conclusion, after an extensive evaluation of various machine learning models and automatic machine learning models for mask recognition, it can be determined that there is no one-size-fits-all solution. Each model possesses its own set of advantages and limitations. For instance, models such as YOLO have a high speed in object detection, while RetinaNet and Faster R-CNN have a strong balance of precision and recall. Additionally, automatic machine learning models can optimize the model architecture and hyperparameters, saving time and resources. Furthermore, it is essential to consider the specific requirements and constraints of the application when selecting a model for the task of mask recognition. Finally, using high-quality and diverse datasets is also crucial for training and evaluating the model's performance.

REFERENCES

[1] J. S. Talahua, J. Buele, P. Calvopiña, and J. Varela-Aldás, "Facial recognition system for people with and without face mask in times of the covid-19 pandemic," Sustainability, vol. 13, no. 12, p. 6900, 2021.
[2] I. Q. Mundial, M. S. U. Hassan, M. I. Tiwana, W. S. Qureshi, and E. Alanazi, "Towards facial recognition problem in COVID-19 pandemic," in 2020 4rd International Conference on Electrical, Telecommunication and Computer Engineering (ELTICOM), 2020: IEEE, pp. 210-214.
[3] O. El Gannour, B. Cherradi, S. Hamida, M. Jebbari, and A. Raihani, "Screening Medical Face Mask for Coronavirus Prevention using Deep Learning and AutoML," in 2022 2nd International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET), 2022: IEEE, pp. 1-7.
[4] D. Theckedath and R. Sedamkar, "Detecting affect states using VGG16, ResNet50 and SE-ResNet50 networks," SN Computer Science, vol. 1, pp. 1-7, 2020.
[5] S. Targ, D. Almeida, and K. Lyman, "Resnet in resnet: Generalizing residual architectures," arXiv preprint arXiv:1603.08029, 2016.
[6] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, "Inception-v4, inception-resnet and the impact of residual connections on learning," in Proceedings of the AAAI conference on artificial intelligence, 2017, vol. 31, no. 1.
[7] E. Noyes, J. P. Davis, N. Petrov, K. L. Gray, and K. L. Ritchie, "The effect of face masks and sunglasses on identity and expression recognition with super-recognizers and typical observers," Royal Society open science, vol. 8, no. 3, p. 201169, 2021.
[8] S. K. Karmaker, M. M. Hassan, M. J. Smith, L. Xu, C. Zhai, and K. Veeramachaneni, "Automl to date and beyond: Challenges and opportunities," ACM Computing Surveys (CSUR), vol. 54, no. 8, pp. 1-36, 2021.
[9] M. Grahlow, C. I. Rupp, and B. Derntl, "The impact of face masks on emotion recognition performance and perception of threat," PLoS One, vol. 17, no. 2, p. e0262840, 2022.
[10] R. Lionnie, C. Apriono, and D. Gunawan, "Face mask recognition with realistic fabric face mask data set: A combination using surface curvature and glcm," in 2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), 2021: IEEE, pp. 1-6.
[11] G. Kaur et al., "Face mask recognition system using CNN model," Neuroscience Informatics, vol. 2, no. 3, p. 100035, 2022.
[12] F. Grundmann, K. Epstude, and S. Scheibe, "Face masks reduce emotion-recognition accuracy and perceived closeness," PLoS One, vol. 16, no. 4, p. e0249792, 2021.
[13] L. Wen, X. Li, X. Li, and L. Gao, "A new transfer learning based on VGG-19 network for fault diagnosis," in 2019 IEEE 23rd international conference on computer supported cooperative work in design (CSCWD), 2019: IEEE, pp. 205-209.
[14] X. Duan, M. Gou, N. Liu, W. Wang, and C. Qin, "High-capacity image steganography based on improved Xception," Sensors, vol. 20, no. 24, p. 7253, 2020.

[15] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "Mobilenetv2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4510-4520.
[16] J. Liu and X. Wang, "Early recognition of tomato gray leaf spot disease based on MobileNetv2-YOLOv3 model," Plant Methods, vol. 16, pp. 1-16, 2020.
[17] M. Grassi and M. Faundez-Zanuy, "Face recognition with facial mask application and neural networks," in Computational and Ambient Intelligence: 9th International Work-Conference on Artificial Neural Networks, IWANN 2007, San Sebastián, Spain, June 20-22, 2007. Proceedings 9, 2007: Springer, pp. 709-716.
[18] N. Damer, F. Boutros, M. Süßmilch, F. Kirchbuchner, and A. Kuijper, "Extended evaluation of the effect of real and simulated masks on face recognition performance," IET Biometrics, vol. 10, no. 5, pp. 548-561, 2021.
[19] X. Sun, P. Wu, and S. C. Hoi, "Face detection using deep learning: An improved faster RCNN approach," Neurocomputing, vol. 299, pp. 42-50, 2018.
[20] R. Girshick, "Fast r-cnn," in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440-1448.
[21] A. Sengupta, Y. Ye, R. Wang, C. Liu, and K. Roy, "Going deeper in spiking neural networks: VGG and residual architectures," Frontiers in neuroscience, vol. 13, p. 95, 2019.
[22] L. Ale, N. Zhang, and L. Li, "Road damage detection using RetinaNet," in 2018 IEEE International Conference on Big Data (Big Data), 2018: IEEE, pp. 5197-5200.
[23] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "Yolov4: Optimal speed and accuracy of object detection," arXiv preprint arXiv:2004.10934, 2020.
