Research Paper
Desu Fu1, Lin Gao1*, Tao Hu1, Shukun Wang1, Wei Liu1
1 School of Information Engineering, Hubei MinZu University, Enshi, Hubei, China
* email: [email protected]
Abstract. The traditional helmet detection algorithms used in the power industry have low precision and poor robustness. To address this problem, a helmet detection algorithm based on an improved YOLOv5 (You Only Look Once) is put forward in this paper. First, the YOLOv5 network structure is improved: by increasing the size of the feature map, one scale is added to the original three, and the added 160*160 feature map can be used for the detection of small targets. Second, K-means is used to re-cluster the helmet data set to obtain more suitable prior anchor boxes. The experimental results illustrate that, compared with the initial model, the average precision of the improved YOLOv5 algorithm increases by 2.9%, reaching 95%, and the precision of helmet recognition increases by 2.4%, reaching 94.6%. The algorithm reduces the missed-detection and false-detection rates of the original network on small targets, and is practical and effective. It satisfies the requirements of real-time detection and plays a role in promoting safety in the power industry.
1. Introduction
Our first impression of power workers at work is that they wear safety helmets, whether the weather is sunny, rainy or snowy. If power workers do not wear safety helmets during operation, they may be hit by objects falling from above, injure their heads in a fall from height, or suffer an electric shock to the head. The safety helmet is therefore a safety guarantee for workers in the power industry.
Power workers must wear safety helmets to enter the operation area, but manual supervision is time-consuming and laborious, and close-range supervision carries risks in some work scenarios. An intelligent real-time safety-helmet detection system for power workers is therefore particularly important: it not only realizes the automation and digitization of safety supervision and monitoring, but also improves the safety of power workers, which gives it practical development significance.
The development of target detection technology is divided into two periods, which can be called the traditional detection period and the deep learning-based detection period [1]. The traditional period is represented by the VJ (Viola-Jones) face detector [2], the HOG + SVM (Histogram of Oriented Gradients + Support Vector Machine) algorithm [3] and the DPM (Deformable Part Model) algorithm [4]. For example, in 2014, Liu Xiaohui [5] combined SVM and skin-color detection to identify helmets. The deep learning-based period is represented by R-CNN (Region-Convolutional Neural Networks) [6], Fast R-CNN [7], Faster R-CNN [8], SPP-Net (Spatial Pyramid Pooling-Net) [9], YOLO [10], SSD (Single Shot MultiBox Detector) [11], etc. These algorithms are divided into two-stage and one-stage methods: the former mainly includes R-CNN, SPP-Net, Fast R-CNN and Faster R-CNN, and the latter mainly includes YOLO, SSD, etc. In the
two-stage algorithm, the first-level network extracts features from the candidate regions, and the second-level network classifies and precisely regresses the selected regions; the detection accuracy is high, but the running speed is slow. In the one-stage algorithm, classification and regression are completed by a single network, with no candidate-region step required; the running speed is fast, at the cost of slightly lower detection accuracy. YOLO is a representative one-stage algorithm; because of its fast running speed, it is suitable for real-time detection and is very popular in practice. YOLOv5 is the latest and best-performing version, so its application and research are of great importance.

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. Published under licence by IOP Publishing Ltd.
ICCBDAI-2021, IOP Publishing. Journal of Physics: Conference Series 2171 (2022) 012006, doi:10.1088/1742-6596/2171/1/012006
enhances the ability of network feature fusion. In addition, other parts of the network have also been adjusted.
Finally, the prediction part of YOLOv5 is innovative to a certain extent: it adds PANet [19] to better combine the advantages of low-level and high-level features, effectively alleviating the multi-scale problem.
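The top-down plus bottom-up fusion performed by a PANet-style neck can be illustrated at the shape level. The sketch below is not the paper's implementation: the strides, channel counts, and use of plain numpy concatenation (in place of learned convolutions that would normally reduce channels) are illustrative assumptions.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling: (H, W, C) -> (2H, 2W, C).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def downsample2x(x):
    # Stride-2 subsampling stands in for a stride-2 convolution.
    return x[::2, ::2, :]

def pan_fuse(c3, c4, c5):
    """Shape-level sketch of FPN top-down + PAN bottom-up fusion.
    c3, c4, c5 are backbone features at strides 8, 16 and 32."""
    # Top-down path (FPN): propagate high-level semantics to larger maps.
    p4 = np.concatenate([c4, upsample2x(c5)], axis=-1)
    p3 = np.concatenate([c3, upsample2x(p4)], axis=-1)
    # Bottom-up path (PAN): propagate precise localisation back up.
    n4 = np.concatenate([p4, downsample2x(p3)], axis=-1)
    n5 = np.concatenate([c5, downsample2x(n4)], axis=-1)
    return p3, n4, n5

# Toy backbone outputs for a 640x640 input.
c3 = np.zeros((80, 80, 128))
c4 = np.zeros((40, 40, 256))
c5 = np.zeros((20, 20, 512))
p3, n4, n5 = pan_fuse(c3, c4, c5)
```

In a real network each concatenation is followed by convolutions that compress the channel dimension; here the channels simply accumulate, which is enough to show how information from every backbone level reaches every output scale.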
In this way, helmets can be accurately detected across multiple scales even when their size in the frame changes. The improved YOLOv5 model is shown in Figure 2.
Fig. 2 The network structure of our improved YOLOv5 (the red box marks the improved part)
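To make the scale addition concrete: each detection head's grid is the input resolution divided by its stride, so, assuming the standard 640*640 YOLOv5 input size (an assumption, since the input size is not stated here), the added 160*160 map corresponds to a stride-4 head whose cells cover only 4*4-pixel patches, fine enough to give distant small helmets their own grid cells.

```python
def grid_sizes(img_size, strides):
    # Feature-map side length at each detection-head stride.
    return [img_size // s for s in strides]

# Original three heads at strides 32, 16, 8; the improvement adds stride 4.
original = grid_sizes(640, [32, 16, 8])      # 20x20, 40x40, 80x80 grids
improved = grid_sizes(640, [32, 16, 8, 4])   # adds the 160x160 grid
```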
Among them, γ is a parameter that measures the consistency of the aspect ratio, α is a trade-off parameter, w^gt/h^gt is the aspect ratio of the ground-truth bounding box, and w/h is the aspect ratio of the predicted box. On the basis of IoU, CIOU additionally considers the overlap of the boxes, the center distance and the aspect-ratio scale information, which makes the final prediction better.
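The CIOU metric described above can be sketched for axis-aligned boxes in (x1, y1, x2, y2) form. The function below is a generic implementation of the published CIoU formula (writing v for the aspect-ratio consistency term), not the paper's code:

```python
import math

def ciou(box_a, box_b):
    """CIoU between a predicted box box_a and ground-truth box box_b,
    each given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Plain IoU: intersection area over union area.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)
    # Squared centre distance over squared diagonal of the enclosing box.
    rho2 = ((ax1 + ax2) - (bx1 + bx2)) ** 2 / 4 \
         + ((ay1 + ay2) - (by1 + by2)) ** 2 / 4
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (math.atan((bx2 - bx1) / (by2 - by1))
                              - math.atan((ax2 - ax1) / (ay2 - ay1))) ** 2
    alpha = v / (1 - iou + v) if v > 0 else 0.0
    return iou - rho2 / c2 - alpha * v
```

For identical boxes CIoU equals 1; for disjoint boxes the centre-distance penalty pushes it below 0, which is what lets the loss 1 − CIoU keep pulling non-overlapping predictions toward the target.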
The data set in this paper was collected by means of web crawlers; the training set contains 15,900 images, the validation set more than 2,000 and the test set more than 500. The images were annotated with the Labelme tool, using two label classes, "helmet" and "no helmet". The distribution of the data-set labels, training set, test set and data samples is illustrated in Figure 4.
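The prior anchor boxes mentioned in the abstract come from re-clustering this data set's box sizes with K-means. A toy sketch using plain Euclidean k-means on (width, height) pairs follows; note that YOLO-style anchor clustering often uses a 1 − IoU distance instead, and the box sizes below are invented for illustration.

```python
import random

def kmeans_anchors(wh, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchor priors
    with plain Euclidean k-means."""
    random.seed(seed)
    centers = random.sample(wh, k)
    for _ in range(iters):
        # Assign each box to its nearest centre.
        clusters = [[] for _ in range(k)]
        for w, h in wh:
            j = min(range(k), key=lambda i: (w - centers[i][0]) ** 2
                                          + (h - centers[i][1]) ** 2)
            clusters[j].append((w, h))
        # Move each centre to the mean of its cluster (keep empty ones).
        centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers)

# Toy box sizes: small distant helmets and larger near ones.
boxes = [(8, 9), (10, 11), (9, 10), (40, 42), (44, 40), (42, 45)]
anchors = kmeans_anchors(boxes, 2)
```

With k set to the total number of anchors (12 for a four-scale, three-anchors-per-scale model), the sorted centres can be assigned smallest-first to the finest detection scale.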
Data-augmentation methods such as rotation, contrast change, flipping, cropping and scaling were used to enhance the self-made data set. To limit the influence of uneven sample distribution, we use precision and recall as metrics: precision reflects the quality of the prediction results, while recall reflects coverage of the actual positive samples. The relevant formulas are as follows:
Precision = TP / (TP + FP)    (4)
Recall = TP / (TP + FN)    (5)
In the formulas, TP (true positives) is the number of positive samples correctly predicted as positive, i.e., the model's correct predictions; FP (false positives) is the number of negative samples wrongly predicted as positive, i.e., the model's incorrect predictions; and FN (false negatives) is the number of positive samples wrongly predicted as negative, i.e., positive samples missed by the model.
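These definitions can be checked in a few lines; the counts below are invented for illustration only.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Example: 90 helmets detected correctly, 10 false alarms, 5 missed.
p, r = precision_recall(tp=90, fp=10, fn=5)
```

Here the 10 false alarms lower only precision, while the 5 missed helmets lower only recall, which is why the two metrics are reported together.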
All the experiments in this article use warmup training, which keeps the model structure stable and avoids oscillation caused by a high initial learning rate. After the warmup stage, the cosine annealing algorithm is used to update the learning rate, so as to obtain a better network model. The curves of Precision, mAP_0.5 and mAP_0.5:0.95 for the original and improved YOLOv5 models are shown in Figures 5, 6 and 7, where the higher curve is the result of the improved model and the lower one is the result of the original model.
It is observed that the precision and mAP values of the improved YOLOv5 network model are higher than those of the original, which effectively reduces the false-detection and missed-detection rates. The helmet detection performance of the improved model and the original model is compared in Table 3.
Tab. 3 Comparison of detection performance of the initial and improved YOLOv5 models

Algorithm             Parameters   Model size/MB   Precision/%   Recall/%   Speed/fps
the initial YOLOv5    7,066,239    13.7            92.2          94         32.6
the improved YOLOv5   7,851,796    15.3            94.6          97         29.2
From the table, we can read that the improved model has more parameters than the initial one, but the model sizes differ little. The improved model's detection speed is 29.2 fps, only 3.4 fps less than the original, which does not greatly affect detection. However, the improved model's detection precision increases by 2.4% over the original, reaching 94.6%, and its recall increases by 3%, reaching 97%. By comparison and analysis of the results, the improved YOLOv5 fully meets the real-time detection requirements, and its detection precision and recall are better than those of the original. Therefore, the improved YOLOv5 network model in this article is effective for detecting whether power workers wear helmets.
Fig. 8 The effect of the initial YOLOv5 and the improved model in low light background
Fig. 9 The effect of the initial YOLOv5 and the improved model in the presence of obstruction
Fig. 10 The effect of the initial YOLOv5 and the improved model in the case of missed detection
Fig. 11 The effect of the initial YOLOv5 and the improved model in the case of false detection
Fig. 12 The effect of the initial YOLOv5 and the improved model with remote small targets
Fig. 13 The effect of the initial YOLOv5 and the improved model with dense small targets
5. Conclusion
Aiming at the poor detection effect and low accuracy for power workers' helmets, this paper puts forward an improved YOLOv5 algorithm, which optimizes the feature-fusion layer and the multi-scale detection layer and adds a fusion scale for small-target recognition, greatly improving the detection of small and even dense small targets. By adding the 160*160 feature map, the detection accuracy of the network model for small targets is improved markedly, and the false-detection and missed-detection rates are reduced. The K-means clustering method is also used to find suitable candidate anchor boxes for small targets such as helmets. Comparative experiments show that the improved YOLOv5 network model still meets real-time detection requirements, achieves higher precision and recall, and is obviously more stable.
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant
61562025 and Grant 61962019.
References
[1] Zou Z X, Shi Z W, Guo Y H and Ye J P 2019 Object detection in 20 years: A survey J.
Computer Vision and Pattern Recognition
[2] Viola P and Jones M 2001 Robust real-time face detection Proceedings Eighth IEEE
International Conference on Computer Vision (ICCV) pp 747-747
[3] Llorca D F, Arroyo R and Sotelo M A 2013 Vehicle logo recognition in traffic images using
HOG features and SVM Proceedings of International IEEE Conference on Intelligent
Transportation Systems (ITSC) pp 2229-2234
[4] Felzenszwalb P F, Girshick R B, McAllester D and Ramanan D 2010 Object Detection with
Discriminatively Trained Part-Based Models J. IEEE Transactions on Pattern Analysis and
Machine Intelligence 32(9) pp 1627-1645
[5] Liu X H and Ye X N 2014 The application of skin color detection and Hu moment in helmet
recognition J. East China University of Science and Technology (Natural Science Edition)
40(03) pp 365-370
[6] Kido S, Hirano Y and Hashimoto N 2018 Detection and classification of lung abnormalities by
use of convolutional neural network(CNN) and regions with CNN features (R-CNN)
International Workshop on Advanced Image Technology (IWAIT) pp 1-4
[7] Girshick R 2015 Fast R-CNN IEEE International Conference on Computer Vision (ICCV) pp
1440-1448
[8] Ren S, He K, Girshick R and Sun J 2017 Faster R-CNN: Towards Real-Time Object Detection
with Region Proposal Networks IEEE Transactions on Pattern Analysis and Machine
Intelligence vol 39 pp 1137-1149
[9] He K, Zhang X, Ren S and Sun J 2015 Spatial Pyramid Pooling in Deep Convolutional
Networks for Visual Recognition IEEE Transactions on Pattern Analysis and Machine
Intelligence vol 37 pp 1904-1916
[10] Redmon J, Divvala S, Girshick R and Farhadi A 2016 You Only Look Once: Unified,
Real-Time Object Detection IEEE Conference on Computer Vision and Pattern Recognition
(CVPR) pp 779-788
[11] Poirson P, Ammirato P, Fu C, Liu W, Kos̆ecká J and Berg A C 2016 Fast Single Shot Detection
and Pose Estimation Fourth International Conference on 3D Vision (3DV) pp 676-684
[12] Redmon J and Farhadi A 2017 YOLO9000: Better, Faster, Stronger IEEE Conference on
Computer Vision and Pattern Recognition (CVPR) pp 6517-6525
[13] Redmon J and Farhadi A 2018 YOLOv3: an incremental improvement J. Computer Vision and Pattern Recognition
[14] Bochkovskiy A, Wang C and Liao H 2020 YOLOv4: optimal speed and accuracy of object
detection J. Computer Vision and Pattern Recognition
[15] Jocher G 2020 YOLOv5 https://github.com/ultralytics/yolov5
[16] Tan M X and Le Q V 2019 EfficientNet: Rethinking Model Scaling for Convolutional Neural
Networks International Conference on Machine Learning vol 97 pp 6105-6114
[17] Huang Z, Zhong Z, Sun L and Huo Q 2019 Mask R-CNN With Pyramid Attention Network for
Scene Text Detection IEEE Winter Conference on Applications of Computer Vision (WACV)
pp 764-772
[18] Wang C, Liao H M, Wu Y, Chen P, Hsieh J and Yeh I 2020 CSPNet: A New Backbone that can
Enhance Learning Capability of CNN IEEE/CVF Conference on Computer Vision and
Pattern Recognition Workshops (CVPRW) pp 1571-1580
[19] Yang J, Fu X, Hu Y, Huang Y, Ding X and Paisley J 2017 PanNet: A Deep Network Architecture
for Pan-Sharpening IEEE International Conference on Computer Vision (ICCV) pp
1753-1761