A Comparative Study of Machine Learning and Automatic Machine Learning Models For Facial Mask Recognition
Abstract— In this study, we compare the performance of traditional machine learning models and automatic machine learning models in facial mask recognition. We use a dataset of images of individuals wearing masks and not wearing masks to train and test the models. We evaluate the models based on accuracy, precision, recall, and F1 score metrics. Our results show that automatic machine learning models achieve similar or slightly better performance than traditional machine learning models but at the cost of longer training time. This study’s results can help practitioners select the appropriate model for facial mask recognition based on the trade-off between accuracy and efficiency. Additionally, this study provides insights into the potential of automatic machine learning models in computer vision tasks, specifically in facial mask recognition. The study concludes that while the traditional machine learning models may be more computationally efficient, the automatic machine learning models can offer comparable or better performance in facial mask recognition.

practitioners time and resources and improve the performance of models [8].

However, there is limited research comparing traditional and autoML models for facial mask recognition. Furthermore, the focus on these models’ trade-offs between accuracy and efficiency has not been fully explored [9]. This study aims to fill this gap by comparing the performance of traditional machine learning models and automatic machine learning models in facial mask recognition and evaluating the trade-offs between accuracy and efficiency [10].

The results of this study can have practical implications for practitioners in selecting the appropriate model for facial mask recognition based on the trade-off between accuracy and efficiency. Additionally, this study provides insights into the potential of automatic machine learning models in computer vision tasks, specifically facial mask recognition.
Authorized licensed use limited to: Sri Sai Ram Engineering College. Downloaded on November 19,2024 at 08:41:23 UTC from IEEE Xplore. Restrictions apply.
Figure 2. The example of YOLOv4 [22]

YOLOv4 also utilizes a number of techniques to improve its performance, including a spatial pyramid pooling (SPP) module, a multi-scale training strategy, and a data augmentation technique called Mosaic data augmentation. Additionally, YOLOv4 utilizes a number of techniques to make it more robust to object occlusion, such as Cross Mini-Batch Normalization (CmBN) and the Mish activation function.

In terms of its performance, YOLOv4 has been shown to be highly effective in object detection tasks, achieving state-of-the-art results on several benchmark datasets such as COCO, VOC, and ImageNet. Furthermore, YOLOv4 has a very fast inference speed, making it suitable for real-time object detection applications.

In conclusion, YOLOv4 is a powerful and efficient object detection algorithm that is able to achieve state-of-the-art performance while maintaining a fast inference speed. Its features, such as anchor boxes, SPP, multi-scale training, Mosaic data augmentation, CmBN, and the Mish activation function, make it highly effective at detecting objects in images and videos, making it a suitable solution for a wide range of real-world object detection applications.

D. Automatic Machine Learning

EasyDL is a platform that aims to make the process of building, deploying, and maintaining deep learning models more accessible and efficient for both experts and non-experts alike. The platform utilizes a user-friendly interface that allows users to easily upload their data, select the appropriate model architecture, and set the desired hyperparameters. It then automatically optimizes the model architecture and hyperparameters for the given data, reducing the need for manual tuning.

One of the key features of EasyDL is its ability to perform model selection and ensembling, which combines multiple models to improve the overall performance. Additionally, it supports model interpretability, which allows users to understand the model’s decision-making process and identify any potential issues.

Furthermore, EasyDL also offers the ability to deploy the trained models to various platforms such as web, mobile, and edge devices with minimal effort. This makes it easy for users to integrate their models into their existing systems and products.

IV. EXPERIMENTS

Experimental results on the MFDD dataset confirm that EasyDL is at least comparable in accuracy to the Faster R-CNN, RetinaNet, and YOLOv4 methods, while providing a unified framework for both training and inference. EasyDL achieves 0.98 mAP and 0.77 recall on the test set, outperforming the other models.

A. Datasets

The Masked Face Detection Dataset (MFDD) was created by combining sources from various research studies and images obtained through internet crawling [23]. The collected images were then meticulously annotated with information such as the presence of a mask and the location of the mask on the face. This resulted in a comprehensive dataset comprising 24,771 images of individuals wearing masks. This dataset can be utilized to train a highly accurate model for the task of masked face detection, which can, in turn, aid in masked face recognition.

Figure 3. The example of MFDD [23]

The MFDD dataset has been extensively used in recent studies, providing a benchmark for comparing the performance of different machine learning models for mask recognition. It has been demonstrated that models trained on the MFDD dataset achieve state-of-the-art performance on various mask recognition tasks, making it a valuable resource for researchers and practitioners in the field. The MFDD can also be employed to determine compliance with mask-wearing regulations during widespread disease outbreaks such as the coronavirus pandemic.

B. Metrics

Precision, recall, and mAP are three commonly used performance metrics in object detection tasks such as facial mask detection. These metrics are used to evaluate and compare a model’s performance with other models. Precision and recall are calculated from the counts of true positives (TP), false positives (FP), and false negatives (FN); true negatives (TN) do not enter either formula.

Precision is a metric that measures the proportion of true positive detections among all positive detections. It is calculated as:

Precision = TP / (TP + FP)
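As a minimal sketch of how the metrics in this section can be computed, the Python code below implements precision and recall from raw counts, and average precision (AP) as the area under the precision-recall curve for a single class. The paper does not state the IoU threshold or the interpolation scheme behind its reported mAP, so the all-point interpolation used here, the input format (a list of confidence scores paired with a true-positive flag, produced by some prior matching step), and all function names are assumptions made for illustration only.

```python
from typing import List, Tuple


def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP); 0.0 when the model made no detections."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN); 0.0 when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def average_precision(
    detections: List[Tuple[float, bool]], num_ground_truth: int
) -> float:
    """Area under the precision-recall curve for one class.

    detections: hypothetical (confidence, is_true_positive) pairs, assumed to
    come from an earlier step that matched predicted boxes to ground truth.
    num_ground_truth: number of annotated objects of this class.
    """
    if num_ground_truth == 0 or not detections:
        return 0.0

    # Sweep the confidence threshold from high to low.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)

    # All-point interpolation: make precision non-increasing in recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the rectangles under the interpolated curve.
    ap = 0.0
    prev_recall = 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: List[float]) -> float:
    """mAP is the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0


if __name__ == "__main__":
    # Toy example: three detections ranked by confidence, two masks annotated.
    toy = [(0.9, True), (0.8, False), (0.7, True)]
    print(precision(tp=2, fp=1), recall(tp=2, fn=0))
    print(average_precision(toy, num_ground_truth=2))
```

With a single "mask" class, mAP reduces to the AP of that class; with several classes (for example, masked and unmasked faces), the per-class values would be averaged by `mean_average_precision`.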
A high precision means that few of the model’s detections are false positives: it correctly identifies objects in the image while minimizing the number of false detections.

The recall metric measures the proportion of true positive detections among all actual positive instances, that is, the ratio of true positives to the sum of true positives and false negatives. It is calculated as:

Recall = TP / (TP + FN)

A high recall means that the model has a low false negative rate, which means it can correctly identify most objects in the image. In the facial mask recognition task, a high recall means that the model is able to correctly identify most of the masks in the images and minimize the number of false negatives, where the model fails to detect the mask.

Mean Average Precision (mAP) is a metric that combines precision and recall into a single value. The precision-recall curve is a plot of precision versus recall at different confidence thresholds. The average precision (AP) of a class is the area under this curve, which amounts to averaging the precision at different recall levels, and the mAP is the mean of the AP values over all classes. A model that has a high mAP has high precision and recall. In other words, it can correctly identify objects in the image while minimizing the number of false detections and false negatives.

A good model should balance precision, recall, and mAP. A model with high precision but low recall, or vice versa, is not desirable. In the mask recognition task, a model with high precision and recall is desirable, as it can correctly identify most of the masks in the images while minimizing the number of false detections and false negatives.

C. Results

As shown in Table I, the performance of various object detection algorithms was evaluated using the mAP metric, as well as precision and recall. The results show that the Faster R-CNN algorithm achieved an mAP of 0.87, with a precision of 0.87 and a recall of 0.75. The RetinaNet algorithm yielded an mAP of 0.78, a precision of 0.77, and a recall of 0.64. The YOLOv4 algorithm demonstrated an mAP of 0.89, a precision of 0.86, and a recall of 0.77. Lastly, the EasyDL algorithm achieved an mAP of 0.98, with a precision of 0.90 and a recall of 0.96. These results suggest that the EasyDL algorithm has the highest overall performance, followed by YOLOv4 and Faster R-CNN. RetinaNet has the lowest performance among these algorithms.

TABLE I. MODELS PERFORMANCE

Models          mAP     Precision   Recall
Faster R-CNN    0.87    0.87        0.75
RetinaNet       0.78    0.77        0.64
YOLOv4          0.89    0.86        0.77
EasyDL          0.95    0.90        0.96

V. CONCLUSION

In conclusion, after an extensive evaluation of various machine learning models and automatic machine learning models for mask recognition, it can be determined that there is no one-size-fits-all solution. Each model possesses its own set of advantages and limitations. For instance, models such as YOLO have a high speed in object detection, while RetinaNet and Faster R-CNN have a strong balance of precision and recall. Additionally, automatic machine learning models can optimize the model architecture and hyperparameters, saving time and resources. Furthermore, it is essential to consider the specific requirements and constraints of the application when selecting a model for the task of mask recognition. Finally, using high-quality and diverse datasets is also crucial for training and evaluating the model’s performance.

REFERENCES

[1] J. S. Talahua, J. Buele, P. Calvopiña, and J. Varela-Aldás, "Facial recognition system for people with and without face mask in times of the covid-19 pandemic," Sustainability, vol. 13, no. 12, p. 6900, 2021.
[2] I. Q. Mundial, M. S. U. Hassan, M. I. Tiwana, W. S. Qureshi, and E. Alanazi, "Towards facial recognition problem in COVID-19 pandemic," in 2020 4rd International Conference on Electrical, Telecommunication and Computer Engineering (ELTICOM), 2020: IEEE, pp. 210-214.
[3] O. El Gannour, B. Cherradi, S. Hamida, M. Jebbari, and A. Raihani, "Screening Medical Face Mask for Coronavirus Prevention using Deep Learning and AutoML," in 2022 2nd International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET), 2022: IEEE, pp. 1-7.
[4] D. Theckedath and R. Sedamkar, "Detecting affect states using VGG16, ResNet50 and SE-ResNet50 networks," SN Computer Science, vol. 1, pp. 1-7, 2020.
[5] S. Targ, D. Almeida, and K. Lyman, "Resnet in resnet: Generalizing residual architectures," arXiv preprint arXiv:1603.08029, 2016.
[6] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, "Inception-v4, inception-resnet and the impact of residual connections on learning," in Proceedings of the AAAI conference on artificial intelligence, 2017, vol. 31, no. 1.
[7] E. Noyes, J. P. Davis, N. Petrov, K. L. Gray, and K. L. Ritchie, "The effect of face masks and sunglasses on identity and expression recognition with super-recognizers and typical observers," Royal Society open science, vol. 8, no. 3, p. 201169, 2021.
[8] S. K. Karmaker, M. M. Hassan, M. J. Smith, L. Xu, C. Zhai, and K. Veeramachaneni, "Automl to date and beyond: Challenges and opportunities," ACM Computing Surveys (CSUR), vol. 54, no. 8, pp. 1-36, 2021.
[9] M. Grahlow, C. I. Rupp, and B. Derntl, "The impact of face masks on emotion recognition performance and perception of threat," PLoS One, vol. 17, no. 2, p. e0262840, 2022.
[10] R. Lionnie, C. Apriono, and D. Gunawan, "Face mask recognition with realistic fabric face mask data set: A combination using surface curvature and glcm," in 2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), 2021: IEEE, pp. 1-6.
[11] G. Kaur et al., "Face mask recognition system using CNN model," Neuroscience Informatics, vol. 2, no. 3, p. 100035, 2022.
[12] F. Grundmann, K. Epstude, and S. Scheibe, "Face masks reduce emotion-recognition accuracy and perceived closeness," Plos one, vol. 16, no. 4, p. e0249792, 2021.
[13] L. Wen, X. Li, X. Li, and L. Gao, "A new transfer learning based on VGG-19 network for fault diagnosis," in 2019 IEEE 23rd international conference on computer supported cooperative work in design (CSCWD), 2019: IEEE, pp. 205-209.
[14] X. Duan, M. Gou, N. Liu, W. Wang, and C. Qin, "High-capacity image steganography based on improved Xception," Sensors, vol. 20, no. 24, p. 7253, 2020.
[15] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "Mobilenetv2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4510-4520.
[16] J. Liu and X. Wang, "Early recognition of tomato gray leaf spot disease based on MobileNetv2-YOLOv3 model," Plant Methods, vol. 16, pp. 1-16, 2020.
[17] M. Grassi and M. Faundez-Zanuy, "Face recognition with facial mask application and neural networks," in Computational and Ambient Intelligence: 9th International Work-Conference on Artificial Neural Networks, IWANN 2007, San Sebastián, Spain, June 20-22, 2007. Proceedings 9, 2007: Springer, pp. 709-716.
[18] N. Damer, F. Boutros, M. Süßmilch, F. Kirchbuchner, and A. Kuijper, "Extended evaluation of the effect of real and simulated masks on face recognition performance," Iet Biometrics, vol. 10, no. 5, pp. 548-561, 2021.
[19] X. Sun, P. Wu, and S. C. Hoi, "Face detection using deep learning: An improved faster RCNN approach," Neurocomputing, vol. 299, pp. 42-50, 2018.
[20] R. Girshick, "Fast r-cnn," in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440-1448.
[21] A. Sengupta, Y. Ye, R. Wang, C. Liu, and K. Roy, "Going deeper in spiking neural networks: VGG and residual architectures," Frontiers in neuroscience, vol. 13, p. 95, 2019.
[22] L. Ale, N. Zhang, and L. Li, "Road damage detection using RetinaNet," in 2018 IEEE International Conference on Big Data (Big Data), 2018: IEEE, pp. 5197-5200.
[23] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "Yolov4: Optimal speed and accuracy of object detection," arXiv preprint arXiv:2004.10934, 2020.